Voice — the new modality for Self-Service – How Popular?

By Craig Allen Keefner | April 19, 2026

Last Updated on May 2, 2026 by Craig Allen Keefner

We like to say that interactive touchscreens have “peaked” in usage. New modalities such as voice interaction and gesture are emerging as replacements. When was the last time you used a touchscreen at a drive-thru? How about voice?

Here is our starting basis at TIG (The Industry Group).

Insight by Intel

By Craig Keefner

TIG has been covering the self-service and kiosk industry since 1995, before most voice AI vendors existed. The methodology here is built on primary research, operator conversations, hardware teardowns, and three decades of watching vendor claims meet field conditions. Voice has been a key component of ADA accessibility since the 1990s (primarily at ATMs). Most kiosks and POS units today come with an audio jack, but relatively few of those jacks actually work.

Craig Keefner has been in QSR lanes, in kiosk integrator shops, and in procurement meetings where the $6,000/yr SaaS quote arrived without anyone mentioning the $15,000-$65,000 outdoor digital menu board (ODMB) underneath it. That background is what makes the cost model in this report different from anything produced by a firm that has never touched the hardware.
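The gap between the SaaS quote and the real bill shows up with simple arithmetic. This is a minimal sketch, not the report's cost model: it takes the $6,000/yr SaaS quote and the $15,000-$65,000 ODMB range from the paragraph above and assumes (our assumption, not a TIG figure) that the ODMB is amortized straight-line over five years. Headset systems, conduit, and installation labor are left out.

```python
# Hypothetical annualized cost per drive-thru location. The SaaS quote and the
# ODMB range come from the text; the 5-year straight-line amortization is an
# assumption, and headsets, conduit and installation are deliberately excluded.
SAAS_PER_YEAR = 6_000                 # $/yr voice AI subscription, from the text
ODMB_LOW, ODMB_HIGH = 15_000, 65_000  # one-time ODMB cost range, from the text
AMORTIZE_YEARS = 5                    # assumed useful life, not from the text


def annualized_cost(saas_per_year: float, odmb_capex: float, years: int) -> float:
    """SaaS fee plus ODMB hardware spread evenly over `years`."""
    if years <= 0:
        raise ValueError("years must be positive")
    return saas_per_year + odmb_capex / years


low = annualized_cost(SAAS_PER_YEAR, ODMB_LOW, AMORTIZE_YEARS)
high = annualized_cost(SAAS_PER_YEAR, ODMB_HIGH, AMORTIZE_YEARS)
print(f"${low:,.0f} to ${high:,.0f} per year")  # prints "$9,000 to $19,000 per year"
```

Even before headsets and installation, this hypothetical annual figure runs from 1.5x to more than 3x the SaaS line alone.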

Thanks to Acrelec and URway.

First question: how many kiosks and drive-thru screens are there in the world, and what percentage of each comes with a microphone to accept speech?

We have a 25-page research brief available for purchase.

Executive Overview — Voice AI in Self-Service 2026

This TIG premium research brief provides a field-corrected view of Voice AI in self-service, cutting through inflated vendor narratives and misapplied market metrics. Built on lane-level drive-thru analysis and real deployment data, the report reframes how operators, vendors, and investors should evaluate the true opportunity.

At its core, the report makes a critical distinction: Voice AI is a drive-thru story—not a kiosk story. While adoption in drive-thru environments continues to expand—led overwhelmingly by large QSR chains—penetration, ROI, and scalability are frequently overstated due to flawed measurement models and unrealistic labor assumptions.

What This Report Delivers

  • Corrected Market Sizing
    • Reframes penetration using lanes (microphones) instead of screens—eliminating 2–3× overstatement common in analyst reports
    • Establishes a realistic installed base of ~80,000 US voice-enabled drive-thru lanes, heavily concentrated in McDonald’s deployments
  • Reality Check on Kiosk Voice
    • Quantifies actual microphone penetration at ~5% of new kiosks—not the 15–25% often cited
    • Demonstrates why retrofit economics fail once touch-based ordering is stable and accurate
  • True Cost Structure
    • Breaks down the full stack: headset systems, ODMB infrastructure, and SaaS voice AI layers
    • Documents real-world costs of $18K–$20K per location annually, challenging the “labor replacement” narrative
  • The Retrofit Problem
    • Identifies the hidden cascade of upgrades (OCB → ODMB → conduit → headset → AI) that derail budgets and timelines
    • Explains why most “simple upgrades” become multi-system capital projects
  • Edge vs Cloud Inflection Point
    • Positions edge inference as the structural solution to SaaS cost pressure
    • Flags GPU supply constraints as the gating factor for near-term adoption
  • Buyer Decision Framework
    • Separates enterprise-ready deployments from SMB marketing claims
    • Provides operators with a realistic ROI lens based on throughput, not headcount elimination
  • Review of Conversational AI engines
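The lane-versus-screen distinction comes down to counting. A minimal sketch, assuming the overstatement arises because a lane has one microphone but may carry two or three screens (for example a preview board, a main menu board, and an order confirmation board). Only the ~80,000 lane figure comes from the text; the screens-per-lane values are illustrative assumptions.

```python
# Sketch: counting screens instead of lanes inflates the voice-enabled base.
# VOICE_LANES is the ~80,000 US figure from the text; the screens-per-lane
# values are illustrative assumptions, not measured data.
VOICE_LANES = 80_000  # one microphone (speaker post) per lane


def screen_count(lanes: int, screens_per_lane: int) -> int:
    """Endpoints reported if every screen in a voice lane is counted."""
    return lanes * screens_per_lane


for screens_per_lane in (1, 2, 3):  # hypothetical lane configurations
    reported = screen_count(VOICE_LANES, screens_per_lane)
    overstatement = reported / VOICE_LANES
    print(f"{screens_per_lane} screen(s)/lane: {reported:,} endpoints ({overstatement:.0f}x)")
```

Counting microphones keeps the denominator tied to where a customer can actually speak an order; counting screens multiplies the same 80,000 lanes into 160,000 or 240,000 apparent "voice endpoints".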

CONTENTS

  1. Executive Summary
  2. Drive-Thru Anatomy: Lanes, Screens & Microphones
  3. Voice AI Penetration — Corrected Field Estimates
  4. The Retrofit Problem
  5. Kiosk Voice Reality
  6. Cost Structure & Edge Computing Shift
  7. Competitive Pricing Teardown
  8. Buyer Decision Framework
  9. Vendor Landscape
  10. Toast Drive-Thru: SMB Claims vs. Reality
  11. Strategic Outlook 2026–2028
  12. Intel Insights: Conversational AI Engine Market View
  13. Voice AI Implementation Maturity Score (VAIMS)
  14. TIG Consensus Partners

Strategic Positioning

The report’s central thesis is pragmatic:

  • Drive-thru voice AI = real, but concentrated and capital-intensive
  • Kiosk voice AI = compliance-driven, not ROI-driven
  • Edge AI = the unlock, but supply-constrained
  • SMB adoption = not yet economically viable

It also makes a critical industry callout:

The Voice AI market is being consistently misread by analysts who conflate drive-thru, kiosk, and phone-based voice into a single TAM.

Who This Is For

  • QSR enterprise operators evaluating AI ordering investments
  • Kiosk manufacturers and software providers navigating voice integration strategy
  • Payments, POS, and edge AI vendors positioning for next-gen deployments
  • Investors seeking ground-truth market sizing vs. hype cycles

Bottom Line

Voice AI is not a universal interface shift—it is a context-specific optimization.

If you measure it correctly (per lane, not per screen), price it correctly (full stack, not SaaS only), and deploy it where it fits (drive-thru, not dining room), the opportunity is real.

If you don’t—you will overestimate the market, overspend on retrofits, and misread ROI.

NEXT QUESTION

In the US we have self-order kiosks, medical check-in kiosks, medical or telehealth kiosks for diagnostics, and all types of directory and wayfinding kiosks. Where is the biggest growth in interactive voice?

Short answer:
👉 The biggest growth in interactive voice is NOT in kiosks.
👉 It’s in high-throughput, labor-constrained, transaction-heavy environments—led by drive-thru, followed by service triage (healthcare + enterprise), and then selective in-store use cases.

Let’s break it down if we can.


1) Where Voice Is Actually Scaling (Ranked)

#1 — Drive-thru (Clear #1 by a mile)

This is no longer experimental—it’s scaling.

  • Chains like Wendy’s, White Castle, and Checkers are already expanding deployments
  • Dairy Queen is rolling out voice AI across thousands of locations after pilots
  • Accuracy is now ~90–95% in the best cases

Why it wins:

  • Fixed menu → constrained vocabulary
  • Known workflow → order → confirm → pay
  • Labor shortage → immediate ROI
  • Headset-based ordering has already trained customers to order by voice

👉 This is the only segment where voice is becoming default.


#2 — Call replacement / “front door” voice (Healthcare + Enterprise)

This is the sleeper—and arguably bigger long-term.

  • Voice agent deployments up 340% YoY
  • Majority of enterprises now running production voice systems
  • Cost reduction up to 30%+ vs human support

In our vertical:

  • Hospital “Digital Front Door”
    • appointment scheduling
    • triage
    • check-in pre-processing
  • Insurance / benefits navigation
  • Telehealth intake (not diagnosis UI yet—intake + routing)

👉 This is where the volume is exploding—not on the kiosk, but before the kiosk.


#3 — QSR in-store kiosks (emerging but selective)

This is where people think growth is—but it’s actually constrained.

Where voice works:

  • Accessibility (Vispero / JAWS use case)
  • Hands-free ordering (dirty hands, ADA, hygiene)
  • Upsell / guided ordering

Where it struggles:

  • Noise + chaos
  • Multi-user environments
  • Privacy concerns

👉 Result:

  • Voice is additive, not primary modality
  • Penetration growing, but slowly (~5% of new installs by TIG’s field estimate, versus the ~15–25% often cited)

#4 — Healthcare kiosks (very targeted growth)

Not general use—specific workflows only.

High-growth use cases:

Constraints:

  • HIPAA + privacy
  • ambient noise
  • liability of misinterpretation

👉 Voice here = assistive + guided, not conversational free-form


#5 — Wayfinding / retail concierge (limited but real)

This is the “demo that finally works—but only sometimes.”

Where it works:

  • Airports / large venues
  • malls / casinos
  • multilingual environments

Why it’s still niche:

  • People default to touch or mobile
  • Voice feels awkward in public spaces

2) The Real Pattern

Voice growth follows a very predictable rule:

[Figure: interactive voice]

3) Our Installed Base Lens

From the TIG model (~3M AI-relevant endpoints):

  • Drive-thru → 0.6M (high voice penetration)
  • QSR kiosks → 1.6M (low voice penetration)
  • Healthcare → 0.45M (targeted voice)

Translation:

👉 Most voice growth is NOT adding new endpoints
👉 It’s:

  • upgrading drive-thru
  • replacing call centers
  • selectively augmenting kiosks
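A quick share calculation makes the installed-base point concrete. This sketch uses only the three segment figures and the ~3M total quoted above; lumping the remainder into a single "Other" bucket is our assumption, since the text does not say what it contains.

```python
# Hypothetical share calculation over the TIG installed-base figures
# (millions of AI-relevant endpoints). Only the three named segments and the
# ~3M total come from the text; the "Other" bucket is just the remainder.
TOTAL_M = 3.0  # approximate total, from the text

endpoints_m = {
    "Drive-thru": 0.6,
    "QSR kiosks": 1.6,
    "Healthcare": 0.45,
}
# Whatever the named segments do not cover (assumed, not broken out in the text)
endpoints_m["Other"] = round(TOTAL_M - sum(endpoints_m.values()), 2)

shares = {segment: count / TOTAL_M for segment, count in endpoints_m.items()}

for segment, count in endpoints_m.items():
    print(f"{segment:<12}{count:>6.2f}M {shares[segment]:>7.1%}")
```

Drive-thru is about a fifth of the endpoints in this lens, yet it is the segment where voice is becoming the default interface, while QSR kiosks make up roughly half the base with low voice penetration.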

4) What Will Actually Grow Fastest (Next 3 Years)

Tier 1 (Explosive growth)

  1. Drive-thru AI ordering
  2. Voice agents (call replacement / front door)

Tier 2 (Strategic growth)

  1. Accessibility voice for kiosks (ADA / EAA driven)
  2. Hybrid kiosk (touch + voice + mobile handoff)

Tier 3 (Selective / niche)

  1. Wayfinding voice
  2. Diagnostic voice (regulated, slower)

5) Bottom Line

The biggest growth in interactive voice isn’t happening at the kiosk—it’s happening before the kiosk and at the drive-thru.

Drive-thru is the only environment where voice is becoming the primary interface, while enterprise and healthcare are rapidly shifting voice to the “front door” for triage and routing.

Kiosks will adopt voice, but as part of a multi-modal stack—not as a replacement for touch.

Conversational AI Engine Market View

SoundHound is the most publicized engine in self-service, so here’s a breakdown of SoundHound’s main competitors, organized by category:

Tech Giants (Toughest Competition)

  • Google (Google Assistant / Android Automotive) — deeply embedded in cars and Android devices

  • Amazon (Alexa / AWS) — strong in smart home and expanding into automotive

  • Apple (Siri) — tight ecosystem integration via iPhone and CarPlay

Direct Voice AI Rivals

  • Cerence (CRNC) — the dominant automotive voice AI player, powering ~51% of new cars and embedded in 500M+ vehicles; originally spun out of Nuance Communications

  • Nuance (now owned by Microsoft) — long-standing enterprise voice/NLP powerhouse

  • Deepgram — AI speech-to-text and audio understanding, popular with developers

  • Sensory — embedded/edge voice AI across automotive, medical, and consumer electronics

  • Picovoice — developer-focused voice AI platform (STT, wake words, speaker recognition)

Restaurant / QSR Vertical

  • Presto (PRST) — drive-thru voice AI, competes directly in fast food ordering

  • Valyant AI — conversational AI specifically for Quick Serve Restaurants

Enterprise / Contact Center

  • Retell AI — voice agents via API

  • Talkdesk — cloud contact center with voice AI

  • Nextiva — telecom platform with AI voice features

  • Flip CX — voice AI for customer service

Edge/Specialized

  • Snips (acquired by Sonos) — on-device voice AI with no cloud dependency

  • My Voice AI — speaker verification on edge devices

The most relevant head-to-head competition is with Cerence in automotive and the big three (Google, Amazon, Apple) across all verticals. Given our kiosk and digital signage focus, Presto and Valyant AI are worth watching as they compete in the self-service ordering space where SoundHound is also pushing hard.


The Three We Watch

Sodaclick

A small UK-based company (under 25 employees), but punching well above its weight. It covers the full stack, conversational voice AI plus digital signage content, built specifically for kiosks, drive-thrus, and self-service. Key differentiators:

  • 96 languages and variants

  • Edge deployment via ASUS NUC / Intel hardware (runs locally without cloud dependency — big deal for reliability)

  • Showcased at Google ChromeOS Experience Center in San Jose and Samsung’s booth at ISE 2024

  • Partners include Axiomtek, ASUS, Samsung, HP, Advantech

  • Already deployed in Australian QSR chain Olivers (220+ stores)

SoundHound

Publicly traded (SOUN), much larger, revenue growing 200%+ YoY. Strong in automotive and restaurant voice ordering. Their Houndify platform lets brands build custom voice AI. More enterprise-scale but less focused purely on the kiosk/self-service hardware layer.

ElevenLabs

Primarily a voice synthesis and conversational AI platform — best-in-class voice quality and naturalness. Their ElevenAgents product is the relevant piece for self-service. More of an underlying engine that others (including potentially Sodaclick) could build on, rather than a turnkey kiosk solution.

Matrix Comparison

  • Are they built for kiosks or for digital signage, and how do they rate on edge deployment, quality, scale, and partnerships?
  • See Gumroad to purchase

Addendums

  • Modern AI drive-thru platforms are evolving beyond simple voice ordering into fully orchestrated systems. Solutions like FLOW DRIVE combine conversational AI, real-time visual order confirmation, dynamic upselling engines, and queue-aware optimization. The result is not just automation but a system that actively balances speed, accuracy, and revenue, adjusting its behavior in real time based on traffic conditions. Critically, voice is only one component of a broader multimodal architecture that includes digital signage, POS integration, and AI-driven personalization. Here is a PDF from an experienced deployer explaining the approach: M4B WAVE Drive Thru_v5
[Figure: voice maturity model]


Author: Craig Allen Keefner

With over 40 years in the industry, Craig is considered one of the top experts in the field. Kiosk projects include the Verizon Bill Pay kiosk and thousands of others. Craig co-founded kioskmarketplace and formed the KMA. He currently manages The Industry Group. Note that the point of view here is not necessarily the stance of the Kiosk Association or kma.global.