Kiosk Interactivity Redefined – Kiosk Voice Recognition and Voice Command

Voice response promises to add a new form of interactivity to self-service devices, but there are some hurdles that will need to be overcome.

By Richard Slawsky, contributor

When we think of interactive kiosks, what typically comes to mind is the touch-enabled displays that are a nearly ubiquitous component of today’s self-service devices. Trained in part by the tap, pinch and swipe actions that are the main feature of smartphones, we’ve come to expect to be able to interact with kiosks through touch. Although touch-enabled displays have been around in one form or another for more than 50 years, it’s only recently that they have become mainstream thanks in part to Apple’s introduction of the iPhone. 

Over the past few years, though, the concept of interactivity has taken on a new dimension. Driven in part by home automation devices such as Amazon’s Echo and Google’s Home, people are becoming increasingly comfortable with a new way of interacting with self-service devices: by voice.

A growing number of technology vendors have been introducing voice-enabled kiosks over the past few years. The question remains, though: what does the future hold for interactive voice response and what needs will it fill when it comes to interactive kiosks?

Challenges slowing adoption

Simply put, an interactive voice response (VR) system is a computer interface that accepts input by voice rather than mouse, keyboard or touch. The technology has been around at least since the 1970s but has become increasingly widespread as large organizations deploy such systems to handle customer service. And when combined with artificial intelligence, it’s becoming increasingly difficult to distinguish VR from communication with a live person.

When it comes to self-service kiosks, a quick Internet search shows dozens of vendors offering devices outfitted with a VR interface. Such interfaces are touted as a way to provide access for those with limited hand mobility as well as those who can’t read. As is the case with on-screen touch menus, it’s relatively easy to incorporate a variety of languages into VR, allowing the deployer to serve those with a limited command of English.

But while the technology improves on nearly a daily basis, it may be a while before VR-enabled kiosks become commonplace. One of the key reasons is that deploying VR will mean either retrofitting existing kiosks with new hardware or deploying new devices outfitted with the technology.

“Voice recognition is ready for kiosks and companies like Zivelo are already looking at ways to begin rolling the technology out on a wider scale,” said Rob Carpenter, CEO and Founder of Valyant AI, an enterprise-grade conversational AI platform for the quick-serve restaurant industry. 

“The biggest hindrance to adoption and scale is going to be the inclusion of microphones and speakers in kiosks, which are required for conversational AI, but hadn’t been included in past hardware iterations because they weren’t needed at the time,” Carpenter said.

The environment where the kiosk will be located will also be a consideration.

“It’ll be important to look at the hardware’s ability to handle conversational AI (it’ll need embedded microphones and speakers), but it’s also important to consider the noise level in the environments,” Carpenter said.

“Conversational AI might struggle in high traffic areas like airports where there is so much noise it’s hard for the AI to hear the customer,” he said. “It’s very likely that for the highest and best use of conversational AI in kiosks, it may also require other capabilities like lip reading and triangulating the customer in a physical space to separate out disparate noise channels.”

As such, deployers will need to incorporate design considerations that include microphone arrays focused on specific areas where a user might be standing. They’ll also need to incorporate design considerations beyond the kiosk itself, including noise-absorbing carpet and walls in the area where the device will be located.

Privacy Concerns

Privacy concerns will come into play as well. Amazon’s Echo devices, for example, store a record of what they hear when activated. And while such recording is only supposed to occur when the user says a “wake” word such as Alexa, anyone who owns such a device knows similar words can prompt a wakeup as well. In addition, when someone is using a VR-enabled kiosk there’s a distinct possibility that nearby sounds will be picked up and recorded as well.

“[It’s a concern] not only for the person ordering train tickets, but for the person who might be standing next to that person who’s having a quite high-level conversation on the phone with a business colleague—or his mistress,” said Nicky Shaw, North American distribution manager with Storm Interface. Storm designs, develops, manufactures and markets heavy-duty keypads, keyboards, and custom computer interface devices, including those that provide accessibility for those with disabilities.

“Now that’s also been picked up and sent to the cloud,” she said. “Privacy needs to be given more consideration in my view because just deploying a microphone on a kiosk with no visible or audible means of letting people know it’s always on needs to be factored into the design.”

Accessibility Protocol

The protocols and practices for implementing voice in kiosks are not addressed in any U.S. Access Board standards, so the Kiosk Manufacturer Association (KMA), working with Storm, has proposed a voice framework for accessibility and more. The Access Board can consider this framework as a baseline when it creates actual standards. In that sense, the KMA is setting the table for the Access Board.

The KMA guidelines for voice are our suggested best practice for self-service.

The degree to which companies mine voice data for advertising information creates its own set of privacy concerns. Because most voice user interfaces require cloud processing services, the process becomes more susceptible to a privacy breach any time voice data leaves the device.

Reliance on cloud services can also create branding issues, with potential confusion as to who exactly the kiosk represents. Is it the foodservice operator, ticket seller or retailer, or is it a company such as Google or Amazon?

And at the end of the day, making the technology easy for the average person to use will go a long way toward determining how successful VR in interactive kiosks will be.

“Voice input is the collection method, while the platform collecting the command is the brain/processing power to take the correct actions,” said Tomer Mann, EVP for Milpitas, Calif.-based software company 22Miles.

“We are moving forward with integration but there is a long way to go,” Mann said.  “We have the input command solution but the processing machine learning technology needs to improve. It will happen with a few more iterations and innovation.”

Applications Impact

One of the obvious applications for VR in self-service kiosks is for accessibility, enabling their use by those with impaired vision or limited hand mobility.

VR can also be used to create the “wow” experience business operators are looking for. Imagine, for example, the opening of the latest blockbuster superhero movie.

“Let’s say a video wall at the theater senses that someone is approaching,” said Sanjeev Varshney, director, Global SAP with Secaucus, N.J.-based Cyntralabs, a developer of integrated solutions that help retailers drive sales.

“It could display a character from the movie, who says something such as ‘what movie would you like to see?’” he said. “The character could then point to a card reader and say ‘just insert your credit card here’ and have the tickets printed out or have an SMS sent to your phone.”

“One driver for voice relates to efficient and faster transactions,” said Joe Gianelli, CEO & cofounder of Santa Cruz, Calif.-based Aaware Inc., a developer of technology that enables voice interfaces.

Consider tasks that may require an excessive amount of screen navigation or drilling down, Gianelli said. Voice is usually much more efficient if the user needs to navigate beyond three levels of touch.

Of course, VR won’t be a catch-all solution. Still, VR could be part of a menu of accessibility options.

“Speech command technology will never replace the need for other interface devices because people with speech impediments won’t be able to use it, just like there are people who are blind and can’t use a touchscreen,” Shaw said. 

“A deployer would still need to provide tactile interface devices as well as the speech command,” she said. “This needs to be seen as another element in multimodal accessibility. There’s not a one-size-fits-all solution.”

“The technology is at its infancy, but with further innovations and feature updates, the solutions will only be more agile to day-to-day user experiences,” Mann said.

“Technology is getting there,” he said. “22Miles just wants to stay ahead of that innovation as we do it all other digital or content triggering capabilities.”

And when it comes to industries, some of the key applications insiders are seeing are in the ticketing and restaurant ordering fields, with initial results showing promise. Catalogue lookup in a retail setting might also be a prime candidate.

“Imagine being able to find, filter and sort any item through voice,” Carpenter said. “It would eliminate the tedious tasks of searching through pages and pages of items to find your favorites. Just tell it what you want and then be on your way.”

More Information

WHITEPAPER – VOICE RECOGNITION & SPEECH COMMAND ASSISTIVE INTERFACE

MASTERCARD ZIVELO VOICE ORDERING WITH AI

KROGER LAUNCHES VOICE ASSISTANT ORDERING FOR GROCERY ECOMMERCE

ALEXA SELF-ORDER VOICE COMMAND VOICE RESPONSE QSR W/ CUSTOMER & EMPLOYEE. BEACON TECH

Whitepaper – Voice Recognition & Speech Command Assistive Interface

The Use of Voice Recognition and Speech Command Technology as an Assistive Interface for ICT in Public Spaces.

A whitepaper published by Peter W Jarvis (Senior Executive VP, Storm Interface) and Nicky Shaw (Operations Manager North America).

September 2018.

Introduction.

The emergence and increasing use of smart speakers (AI) in the home environment has delivered significant benefits for those with mobility, sensory, cognitive or dexterity impairment. For millions of disabled people, voice recognition and speech command technology, allied with audible confirmation and presentation of requested information, permits more informed decision making and personal control of their immediate environment.

This improved access to information and control opens a new world of communication, entertainment, education and opportunity for those who are unable to see, read or interact with content presented on a display screen and for those who lack the mobility or dexterity to manipulate tactile system interface devices (such as keyboards, trackballs or touch screens etc.). Speech Command Technology creates significant new opportunities for independent living.

This improved accessibility also creates unique challenges for system designers, legislating authorities and those concerned about privacy and misuse of personal data. As Voice Recognition and Speech Command technology moves beyond the domestic environment into public spaces and the urban infrastructure we will need new guidelines to increase public awareness and new regulation to protect the general population against the misuse of recorded information.

This whitepaper explores the implementation and integration of Speech Command technology within ICT kiosks and self-service applications. It is intended to provide a framework for a proposed Code-of-Practice (CoP). This CoP is to be drafted for public consultation and possible adoption by the Kiosk Manufacturer Association (KMA) as an addendum to its Accessibility Guidelines.

To illustrate certain devices or technologies there are some references in this document to products manufactured by Storm Interface. These are intended as exemplars only. Other brands and products are available.

1. Who’s Listening

1.1 When a private citizen purchases a connected smart speaker device for home use, he/she makes an informed decision to install that device into their home environment. Before connecting their new device to the manufacturer’s cloud-based AI applications new customers are required to agree and accept many terms and conditions of service. By doing so they make a decision to accept a listening device into their home; albeit with an option to mute that device or switch it off at any time. The customer knows where the device is located, what its connected status is and how to switch it off.

1.2 However, to overcome the latency (delay) inherent in delivering cloud-based AI services to a device that has just been switched on, these devices (by default) usually remain in a powered and connected configuration. Amazon have referred to this default configuration as “Always on, always ready”. This configuration is sometimes referred to by more cynical commentators as “Always on, always listening”. The device needs to be configured in this way to operate as an effective ‘hands free’ Voice Recognition and Speech Commanded information system.

2. In a Public Environment.

2.1 Speech Command and Voice Recognition technology will provide an effective and valuable improvement in accessibility to public ICT systems. Applications such as public transport ticketing and airline check-in terminals would be typical examples.

2.2 As part of a multi-modal approach to accessibility, Speech Command will provide an additional option for those with disabilities (and those without) to confirm their biometric identity and to interface with the kiosk’s application software. The kiosk user will be able to choose from a combination of tactile, audible or visual interface devices to best meet their specific accessibility needs.

2.3 However, it will be essential that all kiosk users and those members of the public in proximity to the kiosk be made aware that the terminal includes Voice Recognition and/or Speech Command technology and that the Speech Command facility is “on and listening”. This awareness is essential for two reasons:

2.3.1 To inform the kiosk user that Speech Command / Voice Recognition technology is available for their use and convenience.

2.3.2 To warn members of the public (in proximity) that their conversations may/will be picked up by the Speech Command / Voice Recognition facility and may be transmitted to a remote server for analysis, processing and possible retention.

2.4 This awareness must be provided for members of the public who are sighted, partially sighted, non-sighted or hearing impaired.

3. A Universal Symbol

3.1 It is proposed that a universally recognized symbol for Speech Command functionality be adopted by the Kiosk and Self Service industry.

3.2 The symbol’s purpose is to indicate the presence of Voice Recognition or Speech Command technology.

3.3 Storm Interface have designed a high contrast, highly visible and tactilely discernible symbol that can be easily applied to the kiosk. During the development of this logo, Storm Interface worked closely with the UK’s Royal National Institute of Blind People (RNIB). Feedback received from the RNIB has influenced the logo design, both to aid recognition and ease of use and to ensure that all contours and edges are rounded to make it comfortable to the touch.

3.4 As with any new logo, but in particular tactile logos, people will need to learn its meaning. This highlights the importance of introducing a standard logo which can be used across all kiosks and sectors to ensure that blind people need only learn one symbol.

3.5 When Voice Recognition or Speech Commanded services are activated the symbol will be illuminated with bright white LEDs.

3.6 The applied symbol should be positioned such that it can be easily seen or tactilely located as a user approaches or addresses the kiosk.

3.7 When the kiosk is in home screen or screen saver mode, with no detected user activity, an audible signal or statement to indicate the presence of an activated Voice Recognition or Speech Command facility should be played periodically. Alternatively, a proximity sensing device could be used to un-mute a Voice Recognition or Speech Command device only when a kiosk user approaches the kiosk interface zone.

3.7.1 Similar audible indicators of a functioning Voice Recognition or Speech Command technology should also be given when such a facility is activated (switched on or un-muted) after a period of non-functionality.

3.8 A proposed specification for the symbol is reproduced below. Storm Interface and the RNIB propose to make this symbol available as a “free-to-use” graphic device. Storm Interface propose to offer a physical, manufactured version of the graphic device, in the form of an illuminated tile, for sale to and use by kiosk manufacturers, specifiers or operators.

Figure 1: Images courtesy of Keymat Technology Ltd. All rights recognized.

Voice Recognition Symbol
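
The indicator behaviour described in 3.5 to 3.7.1 can be sketched as a small controller. This is an illustrative sketch only, not part of the proposed Code-of-Practice: the class name `ListeningIndicator`, the `proximity_gated` flag and the 60-second cue interval are assumptions made for the example, since the whitepaper does not specify how often the audible signal should be played.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorOutput:
    led_on: bool            # 3.5: symbol illuminated while the facility is active
    mic_muted: bool         # True when the speech facility is not listening
    play_audible_cue: bool  # 3.7 / 3.7.1: audible signal or statement


class ListeningIndicator:
    """Hypothetical controller for the 'on and listening' cues in section 3."""

    def __init__(self, proximity_gated: bool, idle_cue_interval_s: float = 60.0):
        # proximity_gated=True models the alternative in 3.7: the microphone
        # is un-muted only while a user is in the kiosk interface zone.
        self.proximity_gated = proximity_gated
        self.idle_cue_interval_s = idle_cue_interval_s  # assumed value
        self._listening = False
        self._last_cue_s = 0.0

    def update(self, now_s: float, user_nearby: bool, user_active: bool) -> IndicatorOutput:
        listening = user_nearby if self.proximity_gated else True
        cue = False
        if listening and not self._listening:
            # 3.7.1: announce whenever the facility is switched on or un-muted.
            cue = True
        elif listening and not user_active:
            # 3.7: repeat the cue periodically while the kiosk is idle.
            if now_s - self._last_cue_s >= self.idle_cue_interval_s:
                cue = True
        if cue:
            self._last_cue_s = now_s
        self._listening = listening
        return IndicatorOutput(led_on=listening, mic_muted=not listening,
                               play_audible_cue=cue)
```

Calling `update` from the kiosk's main loop with the current time and sensor readings would yield the LED, mute and audible-cue states for that moment; how those outputs drive real hardware is left open here.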

4. Hardware

4.1 Microphones

4.1.1 Kiosks that offer Speech Command or Voice Recognition technology must support and provide the means for voice input.

4.1.2 This should be by provision of a suitable standard connection point for an audio headset or earpiece (equipped with its own microphone) and by provision of a suitable microphone (or microphone array) permanently installed as a fixture of the kiosk.

4.1.3 In many public kiosk locations or applications it will be necessary to employ advanced noise cancelling and beam focusing technology to enable effective operation of the Speech Command or Voice Recognition technology.

4.1.4 Connection of a headset or assistive hearing device (equipped with its own integrated microphone) should be detected by the host kiosk and the functionality of any permanently installed microphone (or microphone array) should be automatically adjusted to accommodate and allow correct functioning of the headset or hearing aid device.

4.1.5 To facilitate reliable and continued functionality, provision and installation of audio device connection points and/or permanently installed microphone devices should accommodate requirements for regular sanitation (wash-down) procedures and should resist the hard use and abuse associated with ICT installations in public spaces. As a minimum requirement, water and dust resistance in accordance with IP54 (or equivalent) should be achieved. A minimum impact resistance of 10J should be achieved.

Figure 2. Beam array microphone for outdoor or unsupervised public environments. Other brands and products are available.

Beam Array Microphone
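
The headset detection described in 4.1.4 might be sketched as a simple routing decision. The whitepaper says only that the fixed microphone should be "automatically adjusted"; muting the fixed array while a headset microphone is in use is an assumption made for this example, and the names `VoiceInput`, `MicRouting` and `select_voice_input` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class VoiceInput(Enum):
    HEADSET_MIC = "headset_mic"
    KIOSK_ARRAY = "kiosk_array"


@dataclass(frozen=True)
class MicRouting:
    active_input: VoiceInput
    kiosk_array_muted: bool


def select_voice_input(headset_connected: bool, headset_has_mic: bool) -> MicRouting:
    """Choose the voice input source when a headset is plugged in or removed."""
    if headset_connected and headset_has_mic:
        # Assumed adjustment: mute the fixed array so it neither competes with
        # the headset microphone nor picks up nearby conversations.
        return MicRouting(VoiceInput.HEADSET_MIC, kiosk_array_muted=True)
    # No headset, or a listen-only headset: keep using the fixed array.
    return MicRouting(VoiceInput.KIOSK_ARRAY, kiosk_array_muted=False)
```

A listen-only earpiece leaves the fixed array active in this sketch, because the user would otherwise have no way to speak to the kiosk.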

4.2 Speakers

4.2.1 Kiosks that offer Speech Command or Voice Recognition technology must support and provide the means for audible reproduction of sound or speech.

4.2.2 This should be by provision of a suitable connection point for an audio headset or earpiece and by provision of a suitable amplified speaker system permanently installed as a fixture of the kiosk.

4.2.3 In many public kiosk locations or applications it will be necessary to employ sound directing or sound focusing technology to prevent noise pollution or irritation to those in the local vicinity of the kiosk.

4.2.4 Connection of a headset or assistive hearing device (equipped with its own integrated speakers) should be detected by the host kiosk and the functionality of any permanently installed amplified speakers should be automatically adjusted to accommodate and allow correct functioning of the headset or hearing aid device.

4.2.5 Tactilely discernible sound volume controls must be easily accessible to those using assistive headsets, earpieces or hearing aid devices. Tactile sound volume controls should be accessible and functioning throughout the kiosk user session. Wherever possible, tactilely discernible controls should be suitably shaped to enable function with headsticks or assistive easy-grip styli.

Figure 3. Tactilely discernible sound volume controls must be easily accessible to those using assistive headsets, earpieces or hearing aid devices and those using headsticks or easy-grip styli.

Volume Control

4.2.6 To facilitate reliable and continued functionality, provision and installation of audio device connection points and/or permanently installed amplified speakers should accommodate requirements for regular sanitation (wash-down) procedures and should resist the hard use and abuse associated with ICT installations in public spaces. A minimum requirement for water and dust resistance in accordance with IP54 (or equivalent) should be achieved. A minimum impact resistance of 10J should be achieved.
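
A specifier could screen candidate audio components against the minimums stated in 4.1.5 and 4.2.6 with a check along these lines. This is a rough sketch under stated assumptions: the function names are hypothetical, and treating any higher IP digit as satisfying a lower one is a simplification, so the rating standard itself should be consulted before relying on the result.

```python
import re
from typing import Tuple

MIN_SOLIDS_DIGIT = 5    # first IP digit required by IP54
MIN_LIQUIDS_DIGIT = 4   # second IP digit required by IP54
MIN_IMPACT_JOULES = 10.0


def parse_ip_code(code: str) -> Tuple[int, int]:
    """Split a two-digit IP code such as 'IP54' into (solids, liquids)."""
    match = re.fullmatch(r"IP(\d)(\d)", code.strip().upper())
    if match is None:
        # Codes with an 'X' placeholder or extra letters are rejected here
        # rather than guessed at.
        raise ValueError(f"unsupported IP code: {code!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimums(ip_code: str, impact_resistance_j: float) -> bool:
    """True if a component meets IP54 (or better) and 10 J impact resistance."""
    solids, liquids = parse_ip_code(ip_code)
    return (solids >= MIN_SOLIDS_DIGIT
            and liquids >= MIN_LIQUIDS_DIGIT
            and impact_resistance_j >= MIN_IMPACT_JOULES)
```

For example, `meets_minimums("IP65", 20.0)` passes the check, while a component rated IP44 or one with only 5 J of impact resistance does not.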

4.3 Wireless Devices

4.3.1 For those kiosk users who prefer to use wireless headsets, earbuds or implants in preference to wired devices with a cable and jack-plug connector, it should be possible to connect a personal wireless transponder (powered by a button cell battery) into the jack-plug socket. These personal devices provide encrypted communication between the transponder and a paired personal headset. The transponder would be removed and retained by the kiosk user when the kiosk session is completed.

Figure 4: Compact wireless transponder. These devices can be paired with a wireless headset or earpiece to provide a private listening capability. The transponder can be plugged directly into the kiosk’s audio jack socket. Other brands and types of transponder are available.

wireless transponder

5. Conclusions:

The emergence of Voice Recognition as a means of biometric confirmation of identity, coinciding with the profound impact of AI on speech commanded ICT, will drive adoption of speech command technology in public spaces and applications. Whereas this presents many challenges and risks to privacy and protection of personal data, it will lead to a new era of equality in access to information, freedom and independence for those with disabilities. It will be necessary for accessibility mandates, regulation and standards to be adapted in support of this revolutionary change in the way humans interface with the digital world. Speech Command Technology creates significant new opportunities for independent living.


Copyright Peter W Jarvis 2018. All rights retained.
Contact: Peter Jarvis: peterJ@storm-interface.com
Nicky Shaw: nickys@storm-interface.com


Feedback Form for General Public and Working Group

The Kiosk Manufacturer Association (KMA) welcomes comments from any and all regarding this proposed framework.