Interactive Voice 2026: The Future of Kiosks

Last Updated on May 2, 2026 by Craig Allen Keefner


We like to say that interactive touchscreens have “peaked” in usage. Newer modalities such as voice and gesture interaction are emerging as replacements. When was the last time you used a touchscreen at a drive-thru? How about voice?

Here is our starting basis at TIG (The Industry Group).

Insight by Intel

By Craig Keefner

TIG has been covering the self-service and kiosk industry since 1995, before most voice AI vendors existed. The methodology here is built on primary research, operator conversations, hardware teardowns, and decades of watching vendor claims hit field conditions. Voice has been a key component of ADA accessibility since the 1990s (primarily at ATMs). Most kiosks and POS units today come with an audio jack, but not many of those jacks actually work.

Craig Keefner has been in QSR lanes, in kiosk integrator shops, and in procurement meetings where the $6,000/yr SaaS quote arrived without anyone mentioning the $15,000–$65,000 outdoor digital menu board (ODMB) underneath it. That background is what sets the cost model in this report apart from anything produced by a firm that has never touched the hardware.
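To see why a SaaS quote alone understates cost, here is a minimal Python sketch that annualizes the stack per location. Only the $6,000/yr SaaS figure and the $15,000–$65,000 ODMB range come from the text above; the five-year straight-line amortization and the `annual_full_stack_cost` helper are hypothetical assumptions for illustration, not TIG's cost model.

```python
# Minimal sketch: annualizing a drive-thru voice AI stack per location.
# From the text: $6,000/yr SaaS quote and $15,000-$65,000 ODMB capital cost.
# Hypothetical: the 5-year straight-line amortization and the helper itself.

def annual_full_stack_cost(saas_per_year: float,
                           odmb_capex: float,
                           amortization_years: float,
                           other_annual: float = 0.0) -> float:
    """Yearly cost = SaaS fee + straight-line ODMB amortization + other costs."""
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    return saas_per_year + odmb_capex / amortization_years + other_annual

SAAS = 6_000                           # quoted SaaS fee per year (from the text)
ODMB_LOW, ODMB_HIGH = 15_000, 65_000   # ODMB capital range (from the text)
YEARS = 5                              # hypothetical amortization period

for capex in (ODMB_LOW, ODMB_HIGH):
    cost = annual_full_stack_cost(SAAS, capex, YEARS)
    print(f"ODMB ${capex:,}: ${cost:,.0f}/yr vs ${SAAS:,}/yr quoted")
```

Under these assumptions the annualized total runs roughly 1.5× to 3.2× the quoted SaaS fee; a shorter amortization period or added headset and installation costs would widen the gap further.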

Thanks to Acrelec and URway.

First question: how many kiosks and drive-thru screens are there in the world, and what percentage of each comes with a microphone that can accept speech?

We have a 25-page research brief available for purchase.

Executive Overview — Voice AI in Self-Service 2026

This TIG premium research brief provides a field-corrected view of Voice AI in self-service, cutting through inflated vendor narratives and misapplied market metrics. Built on lane-level drive-thru analysis and real deployment data, the report reframes how operators, vendors, and investors should evaluate the true opportunity.

At its core, the report makes a critical distinction: Voice AI is a drive-thru story—not a kiosk story. While adoption in drive-thru environments continues to expand—led overwhelmingly by large QSR chains—penetration, ROI, and scalability are frequently overstated due to flawed measurement models and unrealistic labor assumptions.

What This Report Delivers

Corrected Market Sizing
- Reframes penetration using lanes (microphones) instead of screens—eliminating 2–3× overstatement common in analyst reports
- Establishes a realistic installed base of ~80,000 US voice-enabled drive-thru lanes, heavily concentrated in McDonald’s deployments
Reality Check on Kiosk Voice
- Quantifies actual microphone penetration at ~5% of new kiosks—not the 15–25% often cited
- Demonstrates why retrofit economics fail once touch-based ordering is stable and accurate
True Cost Structure
- Breaks down the full stack: headset systems, ODMB infrastructure, and SaaS voice AI layers
- Documents real-world costs of $18K–$20K per location annually, challenging the “labor replacement” narrative
The Retrofit Problem
- Identifies the hidden cascade of upgrades (OCB → ODMB → conduit → headset → AI) that derail budgets and timelines
- Explains why most “simple upgrades” become multi-system capital projects
Edge vs Cloud Inflection Point
- Positions edge inference as the structural solution to SaaS cost pressure
- Flags GPU supply constraints as the gating factor for near-term adoption
Buyer Decision Framework
- Separates enterprise-ready deployments from SMB marketing claims
- Provides operators with a realistic ROI lens based on throughput, not headcount elimination
Review of Conversational AI engines
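The retrofit cascade listed above (OCB → ODMB → conduit → headset → AI) can be read as a dependency chain. The Python sketch below assumes, hypothetically, that each step is a prerequisite of the next, so reaching voice AI means upgrading every earlier step a site does not already have. The chain order comes from the text; the strict-prerequisite interpretation and the `required_upgrades` helper are illustrative assumptions.

```python
# Hypothetical model of the retrofit cascade: each step is treated as a
# prerequisite of the next, so deploying voice AI requires every earlier
# step to be ready. The order of the chain comes from the text.
RETROFIT_CHAIN = ["OCB", "ODMB", "conduit", "headset", "AI"]

def required_upgrades(ready: set[str], target: str = "AI") -> list[str]:
    """Return the steps, in chain order, that must be upgraded to reach target."""
    if target not in RETROFIT_CHAIN:
        raise ValueError(f"unknown component: {target}")
    needed = RETROFIT_CHAIN[: RETROFIT_CHAIN.index(target) + 1]
    return [step for step in needed if step not in ready]

# A site that only has a modern headset system still faces four projects.
print(required_upgrades({"headset"}))
```

The point of the model is that a "simple AI upgrade" is only simple for a site that already has every upstream component in place.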

Executive Summary, p. 3
Drive-Thru Anatomy: Lanes, Screens & Microphones, p. 4
Voice AI Penetration — Corrected Field Estimates, p. 5
The Retrofit Problem, p. 6
Kiosk Voice Reality, p. 7
Cost Structure & Edge Computing Shift, p. 8
Competitive Pricing Teardown, p. 9
Buyer Decision Framework, p. 12
Vendor Landscape, p. 14
Toast Drive-Thru: SMB Claims vs. Reality, p. 14
Strategic Outlook 2026–2028, p. 15
Intel Insights: Conversational AI Engine Market View, p. 16
Voice AI Implementation Maturity Score (VAIMS), p. 18
TIG Consensus Partners, p. 21

Strategic Positioning

The report’s central thesis is pragmatic:

Drive-thru voice AI = real, but concentrated and capital-intensive
Kiosk voice AI = compliance-driven, not ROI-driven
Edge AI = the unlock, but supply-constrained
SMB adoption = not yet economically viable

It also makes a critical industry callout:

The Voice AI market is being consistently misread by analysts who conflate drive-thru, kiosk, and phone-based voice into a single TAM.

Who This Is For

QSR enterprise operators evaluating AI ordering investments
Kiosk manufacturers and software providers navigating voice integration strategy
Payments, POS, and edge AI vendors positioning for next-gen deployments
Investors seeking ground-truth market sizing vs. hype cycles

Bottom Line

Voice AI is not a universal interface shift—it is a context-specific optimization.

If you measure it correctly (per lane, not per screen), price it correctly (full stack, not SaaS only), and deploy it where it fits (drive-thru, not dining room), the opportunity is real.

If you don’t—you will overestimate the market, overspend on retrofits, and misread ROI.
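Measuring per lane rather than per screen matters because a lane has one order-point microphone but may carry several screens. This Python sketch shows the effect. The 80,000-lane figure is TIG's US estimate from above; the 2.5 screens-per-lane average and the `voice_endpoints` helper are hypothetical, chosen only to fall inside the 2–3× overstatement range the report describes.

```python
# Hypothetical sketch: counting screens instead of lanes inflates the
# voice-enabled installed base. Each lane has one order-point microphone
# but may have several screens (e.g. menu board plus order confirmation board).

def voice_endpoints(voice_lanes: int, screens_per_lane: float, per: str) -> float:
    """Count voice-enabled endpoints either per lane or per screen."""
    if per == "lane":
        return float(voice_lanes)
    if per == "screen":
        return voice_lanes * screens_per_lane
    raise ValueError("per must be 'lane' or 'screen'")

LANES = 80_000   # TIG's US voice-enabled lane estimate (from the text)
SCREENS = 2.5    # hypothetical average screens per lane

by_lane = voice_endpoints(LANES, SCREENS, "lane")
by_screen = voice_endpoints(LANES, SCREENS, "screen")
print(f"per lane: {by_lane:,.0f}  per screen: {by_screen:,.0f}  "
      f"overstatement: {by_screen / by_lane:.1f}x")
```

In this simple model the overstatement factor is exactly the average number of screens per lane, which is why the lane (microphone) is the unit worth counting.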

NEXT QUESTION

In the US we have kiosks for self-order, medical kiosks for check-in, medical or telehealth kiosks for diagnostics, and all types of directory and wayfinding kiosks. Where is the biggest growth in interactive voice?

Short answer:
👉 The biggest growth in interactive voice is NOT in kiosks.
👉 It’s in high-throughput, labor-constrained, transaction-heavy environments—led by drive-thru, followed by service triage (healthcare + enterprise), and then selective in-store use cases.

Let’s break it down.

1) Where Voice Is Actually Scaling (Ranked)

#1 — Drive-thru (Clear #1 by a mile)

This is no longer experimental—it’s scaling.

Chains like Wendy’s, White Castle, and Checkers already expanding deployments
Dairy Queen rolling out voice AI across thousands of locations after pilots
Accuracy now ~90–95% in the best cases

Why it wins:

Fixed menu → constrained vocabulary
Known workflow → order → confirm → pay
Labor shortage → immediate ROI
Existing headset ordering has already trained customers to order by voice

👉 This is the only segment where voice is becoming default.
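The "fixed menu" and "known workflow" points above can be made concrete with a small state machine. This Python sketch is hypothetical: the three-item menu, the `run_order` helper, and the use of plain text in place of speech recognition output are all assumptions. It shows how a constrained vocabulary plus a fixed order → confirm → pay flow narrows what the system has to understand at each step.

```python
import re
from enum import Enum, auto

class Stage(Enum):
    ORDER = auto()
    CONFIRM = auto()
    PAY = auto()
    DONE = auto()

# Hypothetical fixed menu: the constrained vocabulary the recognizer must cover.
MENU = {"burger", "fries", "shake"}

# The known workflow as fixed transitions: order -> confirm -> pay -> done.
NEXT = {Stage.ORDER: Stage.CONFIRM, Stage.CONFIRM: Stage.PAY, Stage.PAY: Stage.DONE}

def tokens(text: str) -> list[str]:
    """Lowercase word tokens; stands in for speech recognition output."""
    return re.findall(r"[a-z]+", text.lower())

def run_order(utterances: list[str]) -> tuple[Stage, list[str]]:
    """Walk transcribed utterances through the fixed order flow."""
    stage, items = Stage.ORDER, []
    for text in utterances:
        words = tokens(text)
        if stage is Stage.ORDER:
            found = [w for w in words if w in MENU]  # off-menu words are ignored
            if found:
                items.extend(found)
                stage = NEXT[stage]
        elif stage is Stage.CONFIRM:
            if "yes" in words:
                stage = NEXT[stage]
            else:
                stage, items = Stage.ORDER, []  # rejected: start the order over
        elif stage is Stage.PAY:
            stage = NEXT[stage]  # payment itself is out of scope for this sketch
    return stage, items

print(run_order(["I'd like a burger and fries", "Yes, that's right", "card"]))
```

At each stage only a handful of words matter, which is the structural reason drive-thru accuracy can approach the figures cited above while open-ended kiosk conversation remains harder.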

#2 — Call replacement / “front door” voice (Healthcare + Enterprise)

This is the sleeper—and arguably bigger long-term.

Voice agent deployments up 340% YoY
Majority of enterprises now running production voice systems
Cost reduction up to 30%+ vs human support

In our vertical:

Hospital “Digital Front Door”
- appointment scheduling
- triage
- check-in pre-processing
Insurance / benefits navigation
Telehealth intake (not diagnosis UI yet—intake + routing)

👉 This is where the volume is exploding—not on the kiosk, but before the kiosk.

#3 — QSR in-store kiosks (emerging but selective)

This is where people think growth is—but it’s actually constrained.

Where voice works:

Accessibility (Vispero / JAWS use case)
Hands-free ordering (dirty hands, ADA, hygiene)
Upsell / guided ordering

Where it struggles:

Noise + chaos
Multi-user environments
Privacy concerns

👉 Result:

Voice is additive, not primary modality
Penetration growing, but slowly (commonly cited at ~15–25% of new installs; TIG's corrected field estimate is closer to ~5%)

#4 — Healthcare kiosks (very targeted growth)

Not general use—specific workflows only.

High-growth use cases:

ADA / accessibility compliance (speech + audio guidance)
Wayfinding + concierge in hospitals
Symptom intake (guided, structured—not open-ended AI)

Constraints:

HIPAA + privacy
Ambient noise
Liability of misinterpretation

👉 Voice here = assistive + guided, not conversational free-form
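"Guided, not free-form" can be illustrated with a single closed-question intake step. In this hypothetical Python sketch, the question, the three keyword options, the two-attempt limit, and the `guided_step` helper are all assumptions. The kiosk accepts only an unambiguous answer from a fixed list and hands off to touch or staff rather than guessing, which is one way to limit the misinterpretation liability noted above.

```python
import re
from typing import Optional

# Hypothetical closed question: "Is your visit for a scheduled appointment,
# a walk-in, or a lab test?" Only these keywords are accepted as answers.
OPTIONS = {"appointment": "SCHEDULED", "walk": "WALK_IN", "lab": "LAB"}

def match_option(utterance: str) -> Optional[str]:
    """Return the single option named in the utterance, else None."""
    words = re.findall(r"[a-z]+", utterance.lower())
    hits = {OPTIONS[w] for w in words if w in OPTIONS}
    return hits.pop() if len(hits) == 1 else None  # zero or ambiguous -> None

def guided_step(utterances: list[str], max_attempts: int = 2) -> str:
    """Re-prompt up to max_attempts times, then fall back to touch or staff."""
    for text in utterances[:max_attempts]:
        choice = match_option(text)
        if choice is not None:
            return choice
    return "HANDOFF"
```

For example, `guided_step(["I'm a walk-in"])` returns `"WALK_IN"`, while an ambiguous reply such as "appointment or lab" is treated as no answer and re-prompted.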

#5 — Wayfinding / retail concierge (limited but real)

This is the “demo that finally works—but only sometimes.”

Where it works:

Airports / large venues
Malls / casinos
Multilingual environments

Why it’s still niche:

People default to touch or mobile
Voice feels awkward in public spaces

2) The Real Pattern

Voice growth follows a very predictable rule:

[Figure: interactive voice]

3) Our Installed Base Lens

From the TIG model (~3M AI-relevant endpoints):

Drive-thru → 0.6M (high voice penetration)
QSR kiosks → 1.6M (low voice penetration)
Healthcare → 0.45M (targeted voice)
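The three segments listed do not add up to the ~3M total, so it helps to compute each share and the unallocated remainder. This Python sketch uses only the figures above; the "Not broken out" label and the `segment_shares` helper are hypothetical, and because the total is approximate, the remainder is only as precise as that ~3M figure.

```python
# Segment sizes in millions of endpoints, as listed in the TIG model above.
INSTALLED_BASE_M = 3.0
SEGMENTS_M = {"Drive-thru": 0.6, "QSR kiosks": 1.6, "Healthcare": 0.45}

def segment_shares(total: float, segments: dict[str, float]) -> dict[str, float]:
    """Share of the total per segment, plus the remainder not broken out."""
    shares = {name: units / total for name, units in segments.items()}
    shares["Not broken out"] = (total - sum(segments.values())) / total
    return shares

for name, share in segment_shares(INSTALLED_BASE_M, SEGMENTS_M).items():
    print(f"{name:<15} {share:6.1%}")
```

On these figures QSR kiosks are just over half the base, drive-thru about a fifth, and roughly 0.35M endpoints fall outside the three named segments.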

Translation:

👉 Most voice growth is NOT adding new endpoints
👉 It’s:

upgrading drive-thru
replacing call centers
selectively augmenting kiosks

4) What Will Actually Grow Fastest (Next 3 Years)

Tier 1 (Explosive growth)

Drive-thru AI ordering
Voice agents (call replacement / front door)

Tier 2 (Strategic growth)

Accessibility voice for kiosks (ADA / EAA driven)
Hybrid kiosk (touch + voice + mobile handoff)

Tier 3 (Selective / niche)

Wayfinding voice
Diagnostic voice (regulated, slower)

5) Bottom Line

The biggest growth in interactive voice isn’t happening at the kiosk—it’s happening before the kiosk and at the drive-thru.

Drive-thru is the only environment where voice is becoming the primary interface, while enterprise and healthcare are rapidly shifting voice to the “front door” for triage and routing.

Kiosks will adopt voice, but as part of a multi-modal stack—not as a replacement for touch.

Conversational AI Engine Market View

SoundHound is the most publicized engine in self-service, so here is a breakdown of its main competitors, organized by category:

Tech Giants (Toughest Competition)

Google (Google Assistant / Android Automotive) — deeply embedded in cars and Android devices
Amazon (Alexa / AWS) — strong in smart home and expanding into automotive
Apple (Siri) — tight ecosystem integration via iPhone and CarPlay

Direct Voice AI Rivals

Cerence (CRNC) — the dominant automotive voice AI player, powering ~51% of new cars and embedded in 500M+ vehicles; originally spun out of Nuance Communications
Nuance (now owned by Microsoft) — long-standing enterprise voice/NLP powerhouse
Deepgram — AI speech-to-text and audio understanding, popular with developers
Sensory — embedded/edge voice AI across automotive, medical, and consumer electronics
Picovoice — developer-focused voice AI platform (STT, wake words, speaker recognition)

Restaurant / QSR Vertical

Presto (PRST) — drive-thru voice AI, competes directly in fast food ordering
Valyant AI — conversational AI specifically for Quick Serve Restaurants

Enterprise / Contact Center

Retell AI — voice agents via API
Talkdesk — cloud contact center with voice AI
Nextiva — telecom platform with AI voice features
Flip CX — voice AI for customer service

Edge/Specialized

Snips (acquired by Sonos) — on-device voice AI with no cloud dependency
My Voice AI — speaker verification on edge devices

The most relevant head-to-head competition is with Cerence in automotive and the big three (Google, Amazon, Apple) across all verticals. Given our kiosk and digital signage focus, Presto and Valyant AI are worth watching as they compete in the self-service ordering space where SoundHound is also pushing hard.

Related Resources

The Three We Watch

Sodaclick

A small UK-based company (under 25 employees) that punches well above its weight. Sodaclick covers the full stack (conversational voice AI plus digital signage content), built specifically for kiosks, drive-thrus, and self-service. Key differentiators:

96 languages and variants
Edge deployment via ASUS NUC / Intel hardware (runs locally without cloud dependency — big deal for reliability)
Showcased at Google ChromeOS Experience Center in San Jose and Samsung’s booth at ISE 2024
Partners include Axiomtek, ASUS, Samsung, HP, Advantech
Already deployed in Australian QSR chain Olivers (220+ stores)

SoundHound

Publicly traded (SOUN), much larger, revenue growing 200%+ YoY. Strong in automotive and restaurant voice ordering. Their Houndify platform lets brands build custom voice AI. More enterprise-scale but less focused purely on the kiosk/self-service hardware layer.

ElevenLabs

Primarily a voice synthesis and conversational AI platform — best-in-class voice quality and naturalness. Their ElevenAgents product is the relevant piece for self-service. More of an underlying engine that others (including potentially Sodaclick) could build on, rather than a turnkey kiosk solution.

Matrix Comparison

Comparison criteria: built for kiosks or for digital signage, edge capability, quality, scale, and partnerships.
See Gumroad to purchase

Addendums

Modern AI drive-thru platforms are evolving beyond simple voice ordering into fully orchestrated systems. Solutions like FLOW DRIVE combine conversational AI, real-time visual order confirmation, dynamic upselling engines, and queue-aware optimization. The result is not just automation, but a system that actively balances speed, accuracy, and revenue, adjusting its behavior in real time based on traffic conditions. Critically, voice is only one component of a broader multimodal architecture that includes digital signage, POS integration, and AI-driven personalization. Here is a PDF from an experienced deployer that explains this approach: M4B WAVE Drive Thru_v5
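Queue-aware upselling can be sketched as a policy that scales back offers as the lane backs up. This Python example is hypothetical: FLOW DRIVE's actual logic is not described here, and the thresholds, the `LaneState` fields, and the `upsell_offers` helper are assumptions chosen only to show the speed-versus-revenue trade-off the paragraph above describes.

```python
from dataclasses import dataclass

@dataclass
class LaneState:
    cars_in_queue: int
    avg_service_seconds: float

# Hypothetical thresholds, not taken from any vendor's product.
MAX_QUEUE_FOR_UPSELL = 4
MAX_SERVICE_SECONDS = 240.0

def upsell_offers(state: LaneState, max_offers: int = 2) -> int:
    """Return how many upsell prompts to attempt given current lane traffic."""
    if (state.cars_in_queue > MAX_QUEUE_FOR_UPSELL
            or state.avg_service_seconds > MAX_SERVICE_SECONDS):
        return 0                      # lane backed up: protect throughput
    if state.cars_in_queue > MAX_QUEUE_FOR_UPSELL // 2:
        return min(1, max_offers)     # moderate traffic: one offer only
    return max_offers                 # light traffic: full upsell script
```

A real system would tune these thresholds against measured service times; the sketch only shows that upsell behavior becomes a function of live traffic rather than a fixed script.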

voice maturity model credit

Useful Reading and References

Case-Study-Taco-Bell-April-2026 — Omilia “generally” talks about 900 Taco Bell restaurants
Sep 2025 — Taco Bell Reconsiders AI
Financial news for YUM
