Voice response promises to add a new form of interactivity to self-service devices, but some hurdles will need to be overcome first.
By Richard Slawsky, contributor
When we think of interactive kiosks, what typically comes to mind is the touch-enabled displays that are a nearly ubiquitous component of today’s self-service devices. Trained in part by the tap, pinch and swipe actions that are the main feature of smartphones, we’ve come to expect to be able to interact with kiosks through touch. Although touch-enabled displays have been around in one form or another for more than 50 years, it’s only recently that they have become mainstream thanks in part to Apple’s introduction of the iPhone.
Over the past few years, though, the concept of interactivity has taken on a new dimension. Driven in part by home automation devices such as Amazon’s Echo and Google’s Home, people are becoming increasingly comfortable with a new way of interacting with self-service devices: by voice.
A growing number of technology vendors have been introducing voice-enabled kiosks over the past few years. The question remains, though: what does the future hold for interactive voice response and what needs will it fill when it comes to interactive kiosks?
Challenges slowing adoption
Simply put, an interactive voice response (VR) system is a computer interface that accepts input by voice rather than mouse, keyboard or touch. The technology has been around since at least the 1970s but has become increasingly widespread as large organizations deploy such systems to handle customer service. And when VR is combined with artificial intelligence, it’s becoming increasingly difficult to distinguish it from communication with a live person.
When it comes to self-service kiosks, a quick Internet search shows dozens of vendors offering devices outfitted with a VR interface. Such interfaces are touted as a way to provide access for those with limited hand mobility as well as those who can’t read. As is the case with on-screen touch menus, it’s relatively easy to incorporate a variety of languages into VR, allowing the deployer to serve those with a limited command of English.
But while the technology improves on nearly a daily basis, it may be a while before VR-enabled kiosks become commonplace. One of the key reasons is that deploying VR will mean either retrofitting existing kiosks with new hardware or deploying new devices outfitted with the technology.
“Voice recognition is ready for kiosks and companies like Zivelo are already looking at ways to begin rolling the technology out on a wider scale,” said Rob Carpenter, CEO and Founder of Valyant AI, an enterprise-grade conversational AI platform for the quick-serve restaurant industry.
“The biggest hindrance to adoption and scale is going to be the inclusion of microphones and speakers in kiosks, which are required for conversational AI, but hadn’t been included in past hardware iterations because they weren’t needed at the time,” Carpenter said.
The environment where the kiosk will be located will also be a consideration.
“It’ll be important to look at the hardware’s ability to handle conversational AI (it’ll need embedded microphones and speakers), but it’s also important to consider the noise level in the environments,” Carpenter said.
“Conversational AI might struggle in high traffic areas like airports where there is so much noise it’s hard for the AI to hear the customer,” he said. “It’s very likely that for the highest and best use of conversational AI in kiosks, it may also require other capabilities like lip reading and triangulating the customer in a physical space to separate out disparate noise channels.”
As such, deployers will need to incorporate design considerations that include microphone arrays focused on specific areas where a user might be standing. They’ll also need to incorporate design considerations beyond the kiosk itself, including noise-absorbing carpet and walls in the area where the device will be located.
Privacy concerns will come into play as well. Amazon’s Echo devices, for example, store a record of what they hear when activated. And while such recording is only supposed to occur when the user says a “wake” word such as Alexa, anyone who owns such a device knows similar words can prompt a wakeup as well. In addition, when someone is using a VR-enabled kiosk there’s a distinct possibility that nearby sounds will be picked up and recorded as well.
“[It’s a concern] not only for the person ordering train tickets, but for the person who might be standing next to that person who’s having a quite high-level conversation on the phone with a business colleague—or his mistress,” said Nicky Shaw, North American distribution manager with Storm Interface. Storm designs, develops, manufactures and markets heavy-duty keypads, keyboards, and custom computer interface devices, including those that provide accessibility for those with disabilities.
“Now that’s also been picked up and sent to the cloud,” she said. “Privacy needs to be given more consideration in my view because just deploying a microphone on a kiosk with no visible or audible means of letting people know it’s always on needs to be factored into the design.”
The protocols and practices for implementing voice in kiosks are not addressed in any U.S. Access Board standards, so the KMA, working with Storm, has proposed a voice framework that covers accessibility and more. The Access Board can consider that framework as a baseline when it creates actual standards. In that sense, the KMA is setting the table for the board.
The degree to which companies mine voice data for advertising information creates its own set of privacy concerns. Most voice user interfaces require cloud processing services, and any time voice data leaves the device, the process becomes more susceptible to a privacy breach.
Relying on a third party’s voice platform can also create branding issues, with potential confusion as to who exactly the kiosk represents. Is it the foodservice operator, ticket seller or retailer, or is it a company such as Google or Amazon?
And at the end of the day, making it easy for the average person to use will go a long way toward determining how successful VR in interactive kiosks will be.
“Voice input is the collection method, while the platform collecting the command is the brain/processing power to take the correct actions,” said Tomer Mann, EVP for Milpitas, Calif.-based software company 22Miles.
“We are moving forward with integration but there is a long way to go,” Mann said. “We have the input command solution but the processing machine learning technology needs to improve. It will happen with a few more iterations and innovation.”
One of the obvious applications for VR in self-service kiosks is for accessibility, enabling their use by those with impaired vision or limited hand mobility.
VR can also be used to create the “wow” experience business operators are looking for. Imagine, for example, the opening of the latest blockbuster superhero movie.
“Let’s say a video wall at the theater senses that someone is approaching,” said Sanjeev Varshney, director, Global SAP with Secaucus, N.J.-based Cyntralabs, a developer of integrated solutions that help retailers drive sales.
“It could display a character from the movie, who says something such as ‘What movie would you like to see?’” he said. “The character could then point to a card reader and say ‘Just insert your credit card here’ and have the tickets printed out or have an SMS sent to your phone.”
“One driver for voice relates to efficient and faster transactions,” said Joe Gianelli, CEO and cofounder of Santa Cruz, Calif.-based Aaware Inc., a developer of technology that enables voice interfaces.
Consider tasks that may require an excessive amount of screen navigation or drilling down, Gianelli said. Voice is usually much more efficient if the user needs to navigate beyond three levels of touch.
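Gianelli’s rule of thumb can be illustrated with a minimal Python sketch. The menu contents, the `touch_depth` and `suggest_voice` helpers, and the `TOUCH_DEPTH_LIMIT` constant are all hypothetical names invented for this illustration; the three-level limit simply encodes the figure quoted above and is not a measured threshold.

```python
# Hypothetical kiosk menu: nested dicts are submenus, None marks a selectable item.
MENU = {
    "Tickets": {
        "Regional": {
            "Weekday": {"Adult": None, "Child": None},
            "Weekend": {"Adult": None},
        },
        "Airport Express": None,
    },
    "Help": None,
}

# Assumption: the "three levels of touch" rule of thumb quoted in the article.
TOUCH_DEPTH_LIMIT = 3


def touch_depth(menu, target, depth=1):
    """Return the number of taps needed to reach the first match for `target`,
    or None if the label does not appear anywhere in the menu."""
    for label, submenu in menu.items():
        if label == target:
            return depth
        if isinstance(submenu, dict):
            found = touch_depth(submenu, target, depth + 1)
            if found is not None:
                return found
    return None


def suggest_voice(menu, target):
    """True when reaching `target` by touch goes beyond the depth limit."""
    depth = touch_depth(menu, target)
    return depth is not None and depth > TOUCH_DEPTH_LIMIT
```

In this made-up menu, a child’s weekday regional ticket sits four taps deep, so `suggest_voice(MENU, "Child")` flags it as a candidate for a spoken shortcut, while “Airport Express” at two taps does not qualify.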
Of course, VR won’t be a catch-all solution. Still, VR could be part of a menu of accessibility options.
“Speech command technology will never replace the need for other interface devices because people with speech impediments won’t be able to use it, just like there are people who are blind and can’t use a touchscreen,” Shaw said.
“A deployer would still need to provide tactile interface devices as well as the speech command,” she said. “This needs to be seen as another element in multimodal accessibility. There’s not a one-size-fits all solution.”
“The technology is in its infancy, but with further innovations and feature updates, the solutions will only be more agile to day-to-day user experiences,” Mann said.
“Technology is getting there,” he said. “22Miles just wants to stay ahead of that innovation as we do with all other digital or content-triggering capabilities.”
And when it comes to industries, some of the key applications insiders are seeing are in the ticketing and restaurant ordering fields, with initial results showing promise. Catalogue lookup in a retail setting might also be a prime candidate.
“Imagine being able to find, filter and sort any item through voice,” Carpenter said. “It would eliminate the tedious tasks of searching through pages and pages of items to find your favorites. Just tell it what you want and then be on your way.”
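As a rough illustration of the “find, filter and sort” idea, the Python sketch below applies a spoken request to a small catalogue. It assumes the speech has already been transcribed to text by some other component; the catalogue entries, the keyword rules and the `parse_request` and `search` functions are hypothetical, and a production system would rely on a far more capable language-understanding platform than simple keyword matching.

```python
import re

# Hypothetical catalogue; a real deployment would query the retailer's own data.
CATALOG = [
    {"name": "Trail Runner", "category": "shoes", "price": 89.0},
    {"name": "City Sneaker", "category": "shoes", "price": 59.0},
    {"name": "Rain Jacket", "category": "jackets", "price": 120.0},
    {"name": "Court Classic", "category": "shoes", "price": 75.0},
]


def parse_request(transcript):
    """Turn an already-transcribed utterance into a simple query dict."""
    text = transcript.lower()
    query = {"category": None, "max_price": None, "sort_desc": False}
    # Find: match any known category name mentioned in the utterance.
    for category in sorted({item["category"] for item in CATALOG}):
        if category in text:
            query["category"] = category
    # Filter: recognise phrases such as "under 80" or "under $80".
    price = re.search(r"under \$?(\d+(?:\.\d+)?)", text)
    if price:
        query["max_price"] = float(price.group(1))
    # Sort: default is cheapest first unless the customer asks otherwise.
    if "most expensive" in text or "highest price" in text:
        query["sort_desc"] = True
    return query


def search(transcript):
    """Filter the catalogue and sort it by price according to the request."""
    q = parse_request(transcript)
    items = [
        item for item in CATALOG
        if (q["category"] is None or item["category"] == q["category"])
        and (q["max_price"] is None or item["price"] < q["max_price"])
    ]
    return sorted(items, key=lambda item: item["price"], reverse=q["sort_desc"])
```

With this sample data, a request such as “Show me shoes under 80 dollars” returns the two cheaper pairs of shoes, lowest price first, without the customer paging through any menus.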