Last Updated on May 2, 2026 by Craig Allen Keefner
We like to say interactive touchscreens have “peaked” as far as usage. The new modalities such as Voice interaction or Gesture are emerging as replacements. When was the last time you used a touchscreen at a drive-thru? How about voice?
Here is our starting basis for TIG and The Industry Group.
Insight by Intel
TIG has been covering the self-service and kiosk industry since 1995, before most voice AI vendors existed. The methodology here is built on primary research, operator conversations, hardware teardowns, and two decades of watching vendor claims hit field conditions. Voice has been a key component of ADA compliance since the 1990s (primarily ATMs). Most kiosks and POS units these days come with an audio jack, but not that many actually work.
Craig Keefner has been in QSR lanes, in kiosk integrator shops, and in procurement meetings where the $6,000/yr SaaS quote arrived without anyone mentioning the $15,000-$65,000 ODMB underneath it. That background is what makes the cost model in this report different from anything produced by a firm that has never touched the hardware.
First question: how many kiosks and drive-thru screens are there in the world, and what percentage of each comes with a microphone to accept speech?
We have a 25-page research brief available for purchase.
Executive Overview — Voice AI in Self-Service 2026
This TIG premium research brief provides a field-corrected view of Voice AI in self-service, cutting through inflated vendor narratives and misapplied market metrics. Built on lane-level drive-thru analysis and real deployment data, the report reframes how operators, vendors, and investors should evaluate the true opportunity.
At its core, the report makes a critical distinction: Voice AI is a drive-thru story—not a kiosk story. While adoption in drive-thru environments continues to expand—led overwhelmingly by large QSR chains—penetration, ROI, and scalability are frequently overstated due to flawed measurement models and unrealistic labor assumptions.
What This Report Delivers
- Corrected Market Sizing
  - Reframes penetration using lanes (microphones) instead of screens, eliminating the 2–3× overstatement common in analyst reports
  - Establishes a realistic installed base of ~80,000 US voice-enabled drive-thru lanes, heavily concentrated in McDonald’s deployments
- Reality Check on Kiosk Voice
  - Quantifies actual microphone penetration at ~5% of new kiosks, not the 15–25% often cited
  - Demonstrates why retrofit economics fail once touch-based ordering is stable and accurate
- True Cost Structure
  - Breaks down the full stack: headset systems, ODMB infrastructure, and SaaS voice AI layers
  - Documents real-world costs of $18K–$20K per location annually, challenging the “labor replacement” narrative
- The Retrofit Problem
  - Identifies the hidden cascade of upgrades (OCB → ODMB → conduit → headset → AI) that derail budgets and timelines
  - Explains why most “simple upgrades” become multi-system capital projects
- Edge vs Cloud Inflection Point
  - Positions edge inference as the structural solution to SaaS cost pressure
  - Flags GPU supply constraints as the gating factor for near-term adoption
- Buyer Decision Framework
  - Separates enterprise-ready deployments from SMB marketing claims
  - Provides operators with a realistic ROI lens based on throughput, not headcount elimination
- Review of Conversational AI engines
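The lanes-versus-screens correction can be illustrated with a small Python sketch. The site data below is hypothetical, and the one-microphone-per-lane and two-to-three-screens-per-lane figures are assumptions chosen only to show how counting screens instead of microphones inflates a voice-enabled count. None of these numbers come from the report.

```python
from dataclasses import dataclass


@dataclass
class Site:
    lanes: int             # assumed: one order-point microphone per lane
    screens_per_lane: int  # assumed: e.g. pre-sell board, menu board, confirmation screen


# Hypothetical voice-enabled sites, for illustration only.
sites = [
    Site(lanes=1, screens_per_lane=2),
    Site(lanes=2, screens_per_lane=3),
    Site(lanes=2, screens_per_lane=2),
]

# Counting microphones: one voice endpoint per lane.
lane_count = sum(s.lanes for s in sites)

# Counting screens: every screen in a voice-enabled lane is tallied.
screen_count = sum(s.lanes * s.screens_per_lane for s in sites)

print(f"Counted by lane (microphone): {lane_count}")
print(f"Counted by screen:            {screen_count}")
print(f"Overstatement factor:         {screen_count / lane_count:.1f}x")
```

With these made-up sites the screen-based count is 2.4 times the lane-based count, which is the kind of gap the per-lane method is meant to remove.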
CONTENTS
- Executive Summary 3
- Drive-Thru Anatomy: Lanes, Screens & Microphones 4
- Voice AI Penetration — Corrected Field Estimates 5
- The Retrofit Problem 6
- Kiosk Voice Reality 7
- Cost Structure & Edge Computing Shift 8
- Competitive Pricing Teardown 9
- Buyer Decision Framework 12
- Vendor Landscape 14
- Toast Drive-Thru: SMB Claims vs. Reality 14
- Strategic Outlook 2026–2028 15
- Intel Insights: Conversational AI Engine Market View 16
- Voice AI Implementation Maturity Score (VAIMS) 18
- TIG Consensus Partners 21
Strategic Positioning
The report’s central thesis is pragmatic:
- Drive-thru voice AI = real, but concentrated and capital-intensive
- Kiosk voice AI = compliance-driven, not ROI-driven
- Edge AI = the unlock, but supply-constrained
- SMB adoption = not yet economically viable
It also makes a critical industry callout:
The Voice AI market is being consistently misread by analysts who conflate drive-thru, kiosk, and phone-based voice into a single TAM.
Who This Is For
- QSR enterprise operators evaluating AI ordering investments
- Kiosk manufacturers and software providers navigating voice integration strategy
- Payments, POS, and edge AI vendors positioning for next-gen deployments
- Investors seeking ground-truth market sizing vs. hype cycles
Bottom Line
Voice AI is not a universal interface shift—it is a context-specific optimization.
If you measure it correctly (per lane, not per screen), price it correctly (full stack, not SaaS only), and deploy it where it fits (drive-thru, not dining room), the opportunity is real.
If you don’t—you will overestimate the market, overspend on retrofits, and misread ROI.
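To make the "full stack, not SaaS only" point concrete, here is a minimal Python sketch of an annualized cost per location. It uses the $6,000/yr SaaS quote and the $15,000-$65,000 ODMB range mentioned earlier. The five-year amortization period and the function name are assumptions for illustration, not figures or methods from the report, and headset, conduit, and installation costs are left out.

```python
def annualized_cost(saas_per_year, odmb_capex, amortization_years=5):
    """Annual cost per location: SaaS fee plus ODMB hardware spread over its life.

    amortization_years is an assumed value, not a figure from the report.
    """
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    return saas_per_year + odmb_capex / amortization_years


SAAS_QUOTE = 6_000             # $/yr, the SaaS-only quote
ODMB_RANGE = (15_000, 65_000)  # $ one-time, low and high end of the ODMB range

low = annualized_cost(SAAS_QUOTE, ODMB_RANGE[0])
high = annualized_cost(SAAS_QUOTE, ODMB_RANGE[1])

print(f"SaaS only:  ${SAAS_QUOTE:,.0f} per year")
print(f"Full stack: ${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions the full-stack figure lands between $9,000 and $19,000 per year, 1.5 to roughly 3 times the SaaS-only quote, before headsets, conduit, or installation are counted.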
NEXT QUESTION
In the US we have kiosks for self-order, medical kiosks for check-in, medical or telehealth kiosks for diagnostics, plus all types of directory and wayfinding kiosks. Where is the biggest growth in interactive voice?
Short answer:
👉 The biggest growth in interactive voice is NOT in kiosks.
👉 It’s in high-throughput, labor-constrained, transaction-heavy environments—led by drive-thru, followed by service triage (healthcare + enterprise), and then selective in-store use cases.
Let’s break it down if we can.
1) Where Voice Is Actually Scaling (Ranked)
#1 — Drive-thru (Clear #1 by a mile)
This is no longer experimental—it’s scaling.
- Chains like Wendy’s, White Castle, and Checkers are already expanding deployments
- Dairy Queen is rolling out voice AI across thousands of locations after pilots
- Accuracy now ~90–95% in best cases
Why it wins:
- Fixed menu → constrained vocabulary
- Known workflow → order → confirm → pay
- Labor shortage → immediate ROI
- Headset already trained customers for voice
👉 This is the only segment where voice is becoming default.
#2 — Call replacement / “front door” voice (Healthcare + Enterprise)
This is the sleeper—and arguably bigger long-term.
- Voice agent deployments up 340% YoY
- Majority of enterprises now running production voice systems
- Cost reduction up to 30%+ vs human support
In our vertical:
- Hospital “Digital Front Door”
  - appointment scheduling
  - triage
  - check-in pre-processing
- Insurance / benefits navigation
- Telehealth intake (not diagnosis UI yet; intake + routing)
👉 This is where the volume is exploding—not on the kiosk, but before the kiosk.
#3 — QSR in-store kiosks (emerging but selective)
This is where people think growth is—but it’s actually constrained.
Where voice works:
- Accessibility (Vispero / JAWS use case)
- Hands-free ordering (dirty hands, ADA, hygiene)
- Upsell / guided ordering
Where it struggles:
- Noise + chaos
- Multi-user environments
- Privacy concerns
👉 Result:
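- Voice is additive, not the primary modality
- Penetration is growing, but slowly (often cited at ~15–25% of new installs, though the TIG brief summarized above puts actual microphone penetration at ~5% of new kiosks)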
#4 — Healthcare kiosks (very targeted growth)
Not general use—specific workflows only.
High-growth use cases:
- ADA / accessibility compliance (speech + audio guidance)
- Wayfinding + concierge in hospitals
- Symptom intake (guided, structured—not open-ended AI)
Constraints:
- HIPAA + privacy
- ambient noise
- liability of misinterpretation
👉 Voice here = assistive + guided, not conversational free-form
#5 — Wayfinding / retail concierge (limited but real)
This is the “demo that finally works—but only sometimes.”
Where it works:
- Airports / large venues
- malls / casinos
- multilingual environments
Why it’s still niche:
- People default to touch or mobile
- Voice feels awkward in public spaces
2) The Real Pattern
Voice growth follows a very predictable rule:
3) Our Installed Base Lens
From our TIG model (~3M AI-relevant endpoints):
- Drive-thru → 0.6M (high voice penetration)
- QSR kiosks → 1.6M (low voice penetration)
- Healthcare → 0.45M (targeted voice)
Translation:
👉 Most voice growth is NOT adding new endpoints
👉 It’s:
- upgrading drive-thru
- replacing call centers
- selectively augmenting kiosks
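The segment figures above can be tallied with a short Python sketch. It uses only the three segment sizes listed in this section; the variable names are illustrative, and nothing is assumed about the voice penetration rate inside each segment.

```python
# Segment sizes in millions of endpoints, as listed above.
segments = {
    "Drive-thru": 0.60,
    "QSR kiosks": 1.60,
    "Healthcare": 0.45,
}

listed_total = sum(segments.values())

# Show each segment's share of the endpoints that are itemized here.
for name, size in segments.items():
    share = size / listed_total
    print(f"{name:<12} {size:.2f}M  {share:.0%} of listed endpoints")

print(f"Listed total: {listed_total:.2f}M")
```

The listed segments add up to about 2.65M, so the ~3M headline presumably covers endpoints not itemized here. QSR kiosks, the low-penetration segment, make up about 60% of the listed base, which is consistent with voice growth coming from upgrades rather than new endpoints.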
4) What Will Actually Grow Fastest (Next 3 Years)
Tier 1 (Explosive growth)
- Drive-thru AI ordering
- Voice agents (call replacement / front door)
Tier 2 (Strategic growth)
- Accessibility voice for kiosks (ADA / EAA driven)
- Hybrid kiosk (touch + voice + mobile handoff)
Tier 3 (Selective / niche)
- Wayfinding voice
- Diagnostic voice (regulated, slower)
5) Bottom Line
The biggest growth in interactive voice isn’t happening at the kiosk—it’s happening before the kiosk and at the drive-thru.
Drive-thru is the only environment where voice is becoming the primary interface, while enterprise and healthcare are rapidly shifting voice to the “front door” for triage and routing.
Kiosks will adopt voice, but as part of a multi-modal stack—not as a replacement for touch.
Conversational AI Engine Market View
SoundHound is the most publicized engine in self-service, so here is a breakdown of its main competitors, organized by category:
Tech Giants (Toughest Competition)
- Google (Google Assistant / Android Automotive): deeply embedded in cars and Android devices
- Amazon (Alexa / AWS): strong in smart home and expanding into automotive
- Apple (Siri): tight ecosystem integration via iPhone and CarPlay
Direct Voice AI Rivals
- Cerence (CRNC): the dominant automotive voice AI player, powering ~51% of new cars and embedded in 500M+ vehicles; originally spun out of Nuance Communications
- Nuance (now owned by Microsoft): long-standing enterprise voice/NLP powerhouse
- Deepgram: AI speech-to-text and audio understanding, popular with developers
- Sensory: embedded/edge voice AI across automotive, medical, and consumer electronics
- Picovoice: developer-focused voice AI platform (STT, wake words, speaker recognition)
Restaurant / QSR Vertical
- Presto (PRST): drive-thru voice AI, competes directly in fast food ordering
- Valyant AI: conversational AI specifically for Quick Serve Restaurants
Enterprise / Contact Center
- Retell AI: voice agents via API
- Talkdesk: cloud contact center with voice AI
- Nextiva: telecom platform with AI voice features
- Flip CX: voice AI for customer service
Edge/Specialized
- Snips (acquired by Sonos): on-device voice AI with no cloud dependency
- My Voice AI: speaker verification on edge devices
The most relevant head-to-head competition is with Cerence in automotive and the big three (Google, Amazon, Apple) across all verticals. Given our kiosk and digital signage focus, Presto and Valyant AI are worth watching as they compete in the self-service ordering space where SoundHound is also pushing hard.
Related Resources
- Voice Order AI – Conversational Self Order
- AI Connect Bar – Conversational AI Audio Gets a Leg Up
- How Kiosks Meet EAA 2025 Compliance with Conversational Voice AI
- AI Assist for Voice Order Kiosks
- Voice Order Kiosk: Enhance Customer Experience
- AI Kiosk Assist and AI Voice Enabled – Kiosk Industry
- Voice Command kiosk – Kiosk Manufacturer Association
- Alexa Self-Order Voice Command Voice Response QSR
The Three We Watch
Sodaclick
UK-based, small company (under 25 employees) but punching well above its weight. They cover the full stack — conversational voice AI plus digital signage content — specifically built for kiosks, drive-thrus, and self-service. Key differentiators:
- 96 languages and variants
- Edge deployment via ASUS NUC / Intel hardware (runs locally without cloud dependency, a big deal for reliability)
- Showcased at Google ChromeOS Experience Center in San Jose and Samsung’s booth at ISE 2024
- Partners include Axiomtek, ASUS, Samsung, HP, Advantech
- Already deployed in Australian QSR chain Olivers (220+ stores)
SoundHound
Publicly traded (SOUN), much larger, revenue growing 200%+ YoY. Strong in automotive and restaurant voice ordering. Their Houndify platform lets brands build custom voice AI. More enterprise-scale but less focused purely on the kiosk/self-service hardware layer.
ElevenLabs
Primarily a voice synthesis and conversational AI platform — best-in-class voice quality and naturalness. Their ElevenAgents product is the relevant piece for self-service. More of an underlying engine that others (including potentially Sodaclick) could build on, rather than a turnkey kiosk solution.
Matrix Comparison
- Compares whether each is built for kiosks or for digital signage, and how each rates on edge deployment, quality, scale, and partnerships
- See Gumroad to purchase
Addendums
- Modern AI drive-thru platforms are evolving beyond simple voice ordering into fully orchestrated systems. Solutions like FLOW DRIVE combine conversational AI, real-time visual order confirmation, dynamic upselling engines, and queue-aware optimization. The result is not just automation, but a system that actively balances speed, accuracy, and revenue, adjusting behavior in real time based on traffic conditions. Critically, voice is only one component of a broader multimodal architecture that includes digital signage, POS integration, and AI-driven personalization. Here is a PDF explaining this from an experienced deployer: M4B WAVE Drive Thru_v5
Useful Reading and References
- Case-Study-Taco-Bell-April-2026 — Omilia “generally” talks about 900 restaurants of Taco Bell
- Sep 2025 — Taco Bell Reconsiders AI
- Financial news for YUM
