A voice-user interface (VUI) makes spoken human interaction with computers possible. It uses speech recognition to understand spoken commands and answer questions, and typically uses text to speech to play a reply. A voice command device is a device controlled with a voice user interface.
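The loop described above has three stages: recognize the speech, interpret it, and speak a reply. The following is a minimal sketch of that loop, not the architecture of any particular product. The `recognize`, `understand`, and `synthesize` stand-ins are hypothetical. A real system would call a speech recognizer and a speech synthesizer at the points where these stubs simply pass text through.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str     # what the recognizer heard
    reply_text: str     # what the system decided to say
    reply_audio: bytes  # what would be played back to the user


def run_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],
    understand: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """One VUI turn: speech recognition -> interpretation -> text to speech."""
    transcript = recognize(audio)
    reply_text = understand(transcript)
    return VoiceTurn(transcript, reply_text, synthesize(reply_text))


# Hypothetical stand-ins so the sketch runs without any speech engine.
def fake_recognize(audio: bytes) -> str:
    # Pretend the "audio" already contains UTF-8 text.
    return audio.decode("utf-8").strip().lower()


def fake_understand(transcript: str) -> str:
    if transcript.startswith("what time"):
        return "It is noon."
    return "Sorry, I didn't catch that."


def fake_synthesize(text: str) -> bytes:
    # A real synthesizer would return audio samples, not encoded text.
    return text.encode("utf-8")


turn = run_turn(b"What time is it?", fake_recognize, fake_understand, fake_synthesize)
print(turn.reply_text)  # It is noon.
```

Passing the three stages in as functions keeps the loop independent of any one recognizer or synthesizer.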
Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants (which route phone calls to the correct extension) and interactive voice response systems (which conduct more complicated transactions over the phone) can respond to the pressing of keypad buttons via DTMF tones. Systems with a full voice user interface, however, allow callers to speak requests and responses without having to press any buttons.
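DTMF encodes each keypad button as the sum of two sine tones. One tone comes from a low-frequency group that identifies the key's row, and the other comes from a high-frequency group that identifies its column. An automated attendant or IVR system identifies the key by detecting which pair is present. The sketch below uses the standard frequency table for the twelve telephone keys and synthesizes one key's tone. The fourth column (1633 Hz, keys A to D) is omitted. The 8 kHz sample rate and 0.1 s duration are illustrative assumptions.

```python
import math

# Standard DTMF frequencies (Hz) for the 12-key telephone keypad.
# The fourth column (1633 Hz, keys A to D) is omitted here.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_pair(key: str) -> tuple:
    """Return the (low, high) frequency pair for one keypad key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF keypad key: {key!r}")


def dtmf_samples(key: str, rate: int = 8000, seconds: float = 0.1) -> list:
    """Synthesize a key's tone as the average of its two sine components."""
    low, high = dtmf_pair(key)
    n = round(rate * seconds)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / rate)
               + math.sin(2 * math.pi * high * i / rate))
        for i in range(n)
    ]


print(dtmf_pair("5"))          # (770, 1336)
print(len(dtmf_samples("#")))  # 800
```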
Newer voice command devices are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation.
Overview
A VUI is the interface to any speech application. Only a short time ago, controlling a machine by simply talking to it was science fiction. Until recently, this area was considered to be artificial intelligence. However, advances in technologies such as text-to-speech, speech-to-text, natural language processing, and cloud services in general have contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and in many situations people are taking advantage of the value that these hands-free, eyes-free interfaces provide.
VUIs need to respond to input reliably, or their users will reject them and often ridicule them. Designing a good VUI requires interdisciplinary talents in computer science, linguistics, and human factors psychology, all of which are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed and the target audience that will use the final system. The more closely the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.
A VUI designed for the general public should emphasize ease of use and provide plenty of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers) should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, and eliminate unnecessary iterations. They should also allow elaborate "mixed initiative dialogs", which enable callers to enter several pieces of information in a single utterance, in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
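A mixed-initiative dialog can be approximated as slot filling. The system extracts whatever pieces of information it recognizes from a single utterance, in any order, and then prompts only for the slots that are still missing. The following is a minimal sketch under stated assumptions. The three-slot funds-transfer task is hypothetical, and simple regular expressions stand in for the speech grammar or language-understanding model a real system would use.

```python
import re
from typing import Optional

# Hypothetical slot patterns for a funds-transfer task. A production system
# would use a speech grammar or a trained language-understanding model.
SLOT_PATTERNS = {
    "amount": re.compile(r"\$?(\d+(?:\.\d{2})?)"),
    "source": re.compile(r"\bfrom (?:my )?(checking|savings)\b"),
    "target": re.compile(r"\bto (?:my )?(checking|savings)\b"),
}

PROMPTS = {
    "amount": "How much would you like to transfer?",
    "source": "Which account should the money come from?",
    "target": "Which account should it go to?",
}


def fill_slots(utterance: str, slots: dict) -> dict:
    """Return a copy of `slots` plus every slot found in the utterance, in any order."""
    text = utterance.lower()
    filled = dict(slots)
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in filled:
            filled[name] = match.group(1)
    return filled


def next_prompt(slots: dict) -> Optional[str]:
    """Ask only for what is still missing; None means every slot is filled."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return PROMPTS[name]
    return None


# A power user supplies everything at once, in an arbitrary order.
slots = fill_slots("Move $200 to savings from my checking", {})
print(slots, next_prompt(slots))

# A first-time caller gives one piece and is prompted for the rest.
slots = fill_slots("from savings please", {})
print(next_prompt(slots))
```

The same code serves both audiences described above. Power users can skip the prompts entirely, while other callers are guided one question at a time.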
Not all business processes lend themselves equally well to speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the other hand, speech is well suited to handling quick and routine transactions, such as changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.
History
Early applications for VUI included voice-activated dialing of phones, either directly or through a (typically Bluetooth) headset or vehicle audio system.
In 2007, a CNN business article reported that voice command was an industry worth over a billion dollars and that companies like Google and Apple were trying to create speech recognition features. Since the article was published, a variety of voice command devices have appeared. In addition, Google created a text-to-speech engine called Pico TTS, and Apple has released Siri. Voice command devices are becoming more widely available, and new ways of using the human voice continue to emerge. For example, Business Week suggested that the future remote controller would be the human voice. Currently, Xbox Live allows such features, and Steve Jobs hinted at such a feature on the new Apple TV.
Voice command software products on computing devices
Both Apple Mac and Windows PC provide built-in speech recognition features in their latest operating systems.
Microsoft Windows
Two Microsoft operating systems, Windows 7 and Windows Vista, provide speech recognition capabilities. Microsoft integrated voice commands into its operating systems for people who want to limit their use of the mouse and keyboard but still want to maintain or increase their overall productivity.
Windows Vista
With Windows Vista voice control, a user may dictate documents and emails in mainstream applications, start and switch between applications, control the operating system, format documents, save documents, edit files, efficiently correct errors, and fill out forms on the Web. The speech recognition software learns automatically every time a user uses it. Speech recognition is available in English (U.S.), English (U.K.), German (Germany), French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified). The software also comes with an interactive tutorial, which can be used to train both the user and the speech recognition engine.
Windows 7
In addition to all the features provided in Windows Vista, Windows 7 provides a wizard for setting up the microphone and a tutorial on how to use the feature.
Mac OS X
All Mac OS X computers come with speech recognition software pre-installed. The software is user-independent, and it allows a user to "navigate menus and enter keyboard shortcuts; speak checkbox names, radio button names, list items, and button names; and open, close, control, and switch among applications." However, the Apple website recommends that users buy a commercial product called Dictate.
Commercial products
A user who is not satisfied with the built-in speech recognition software, or whose OS has none, may try a commercial product. Options include Braina Pro or Dragon NaturallySpeaking for Windows PCs, and Dictate, the Mac OS version of the same software.
Voice command mobile devices
Any mobile device running Android OS, Microsoft Windows Phone, iOS 9 or later, or BlackBerry OS provides voice command capabilities. In addition to the built-in speech recognition software in each mobile phone's operating system, a user may download third-party voice command applications from that operating system's application store: the Apple App Store, Google Play, Windows Phone Marketplace (initially Windows Marketplace for Mobile), or BlackBerry App World.
Android OS
Google has developed an open source operating system called Android, which allows a user to use voice commands to send text messages, listen to music, get directions, call businesses, call contacts, send email, view a map, go to websites, write a note, and search Google.
The speech recognition software has been available on all devices since Android 2.2 "Froyo", but the language setting must be English.
Google allows the user to change the language. The first time the user uses the speech recognition feature, they are asked whether they would like their voice data to be attached to their Google account. If the user opts into this service, Google can train the software to the user's voice.
Google introduced the Google Assistant with Android 7.0 "Nougat". It is much more advanced than the older version.
Amazon.com has the Echo, which uses Amazon's custom version of Android to provide a voice interface.
Microsoft Windows
Windows Phone is Microsoft's mobile operating system. On Windows Phone 7.5, the speech app is user-independent and can be used to call someone from your contact list, call any phone number, redial the last number, send a text message, call your voice mail, open an application, read appointments, query phone status, and search the web.
Speech can also be used during a phone call to press a number, turn the speakerphone on, or call someone, which puts the current call on hold.
Windows 10 introduces Cortana, a voice control system that replaces the voice control formerly used on Windows phones.
iOS
Apple added Voice Control to its family of iOS devices as a new feature of iPhone OS 3. The iPhone 4S, iPad 3, iPad Mini 1G, iPad Air, iPad Pro 1G, iPod Touch 5G, and later devices all come with a more advanced voice assistant called Siri. Voice Control can still be enabled through the Settings menu of newer devices. Siri is a user-independent, built-in speech recognition feature that allows a user to issue voice commands. With Siri, a user may send a text message, check the weather, set a reminder, find information, schedule meetings, send an email, find a contact, set an alarm, get directions, track stocks, set a timer, and ask for examples of sample voice command queries. Siri also works with Bluetooth and wired headphones.
Amazon Alexa
In 2014, Amazon introduced the Alexa smart home device. It was initially intended mainly as a smart speaker that the consumer could control by voice. It later gained the ability to control home appliances by voice, and many appliances, including light bulbs and temperature controls, can now be controlled with Alexa. Through voice control, Alexa can connect to smart home technology, allowing a user to lock their house, control the temperature, and activate various devices. A user can also simply ask Alexa a question, and Alexa searches for, finds, and recites the answer.
Speech recognition in cars
As car technology improves, more features will be added to cars, and these features will most likely distract drivers. According to CNET, voice commands for cars should allow a driver to issue commands without being distracted. CNET stated that Nuance was suggesting it would in the future create software that resembled Siri, but for cars. In 2011, most speech recognition software on the market had only about 50 to 60 voice commands, but Ford Sync had 10,000. However, CNET suggested that even 10,000 voice commands were not sufficient, given the complexity and variety of tasks a user may want to perform while driving. Voice command for cars differs from voice command for mobile phones and computers because a driver may use the feature to look for nearby restaurants, gas stations, driving directions, road conditions, and the location of the nearest hotel. Currently, drivers can issue voice commands both to a portable GPS unit, such as a Garmin, and to a car manufacturer's navigation system.
List of voice command systems provided by motor manufacturers:
* Ford Sync
* Lexus Voice Command
* Chrysler UConnect
* Honda Accord
* GM IntelliLink
* BMW
* Mercedes
* Pioneer
* Harman
* Hyundai
Non-verbal input
While most voice user interfaces are designed to support interaction through spoken human language, there have also been recent explorations in designing interfaces that take non-verbal human sounds as input. In these systems, the user controls the interface by emitting non-speech sounds such as humming, whistling, or blowing into a microphone.
One such example of a non-verbal voice user interface is Blendie, an interactive art installation created by Kelly Dobson. The piece comprised a classic 1950s-era blender which was retrofitted to respond to microphone input. To control the blender, the user must mimic the whirring mechanical sounds that a blender typically makes: the blender will spin slowly in response to a user’s low-pitched growl, and increase in speed as the user makes higher-pitched vocal sounds.
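As described, Blendie's behavior amounts to estimating the pitch of the incoming sound and mapping it to a motor speed. The sketch below is not Dobson's implementation. It assumes a crude zero-crossing pitch estimate, which is adequate only for clean, nearly periodic input such as a steady hum. It also uses a hypothetical linear mapping from an 80 to 400 Hz vocal range onto speed levels 0 to 10.

```python
import math


def zero_crossing_pitch(samples: list, rate: int) -> float:
    """Rough pitch estimate in Hz: a periodic tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / rate
    return crossings / (2 * duration)


def pitch_to_speed(pitch_hz: float, low: float = 80.0, high: float = 400.0,
                   max_speed: int = 10) -> int:
    """Map pitch linearly onto blender speed levels, clamped to [0, max_speed]."""
    fraction = (pitch_hz - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)
    return round(fraction * max_speed)


rate = 8000  # samples per second (assumed)
# One second each of a synthetic low "growl" and a higher-pitched sound.
growl = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
whine = [math.sin(2 * math.pi * 350 * n / rate) for n in range(rate)]
print(pitch_to_speed(zero_crossing_pitch(growl, rate)))
print(pitch_to_speed(zero_crossing_pitch(whine, rate)))
```

Real vocal input is noisy and rich in harmonics. A practical version would more likely use autocorrelation or another robust pitch tracker before applying the same kind of mapping.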
Another example is VoiceDraw, a research system that enables digital drawing for individuals with limited motor abilities. VoiceDraw allows users to “paint” strokes on a digital canvas by modulating vowel sounds, which are mapped to brush directions. Modulating other paralinguistic features (e.g. the loudness of their voice) allows the user to control different features of the drawing, such as the thickness of the brush stroke.
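As described, VoiceDraw maps vowel sounds to brush directions and loudness to stroke thickness. The following is a minimal sketch of that mapping. The four vowel labels and their directions are hypothetical, since the actual system's vowel set is not given here. Vowel classification is assumed to happen upstream, and loudness is measured as root-mean-square (RMS) amplitude of samples in [-1, 1].

```python
import math

# Hypothetical vowel-to-direction table; an upstream classifier is assumed
# to label each short audio frame with one of these vowel classes.
VOWEL_DIRECTIONS = {
    "ee": (0.0, -1.0),  # up (screen y grows downward)
    "oo": (0.0, 1.0),   # down
    "ah": (-1.0, 0.0),  # left
    "ae": (1.0, 0.0),   # right
}


def rms(frame: list) -> float:
    """Root-mean-square amplitude, used here as a loudness measure."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def brush_step(vowel: str, frame: list, speed: float = 5.0,
               min_width: float = 1.0, max_width: float = 12.0):
    """Return (dx, dy, width) for one frame, or None if the vowel is unknown."""
    direction = VOWEL_DIRECTIONS.get(vowel)
    if direction is None:
        return None
    loudness = min(rms(frame), 1.0)  # samples assumed to lie in [-1, 1]
    width = min_width + loudness * (max_width - min_width)
    return direction[0] * speed, direction[1] * speed, width
```

Calling `brush_step` once per short audio frame and summing the `(dx, dy)` offsets would trace a stroke. Louder frames widen the stroke, and unrecognized sounds leave the brush where it is.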
Other approaches include adopting non-verbal sounds to augment touch-based interfaces (e.g. on a mobile phone) to support new types of gestures that wouldn’t be possible with finger input alone.
Design challenges
Voice interfaces pose a substantial number of challenges for usability. In contrast to graphical user interfaces (GUIs), best practices for voice interface design are still emergent.
Discoverability
With purely audio-based interaction, voice user interfaces tend to suffer from low discoverability: it is difficult for users to understand the scope of a system’s capabilities. To convey what is possible without a visual display, the system would need to enumerate the available options, which can become tedious or infeasible. Low discoverability often leads users to report confusion over what they are “allowed” to say, or to expect more or less from the system's understanding than it actually provides.
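One way to see this trade-off is to consider how much a system must say to enumerate its options by audio alone. The following is a minimal sketch of a hypothetical prompt builder. It reads options a few at a time and offers a "more" command, so each prompt stays short while the full list remains reachable. The chunk size of three and the prompt wording are assumptions.

```python
def paged_prompts(options: list, per_prompt: int = 3) -> list:
    """Split a capability list into short spoken prompts, each offering 'more'."""
    if per_prompt < 1:
        raise ValueError("per_prompt must be at least 1")
    prompts = []
    for start in range(0, len(options), per_prompt):
        chunk = options[start:start + per_prompt]
        if len(chunk) <= 2:
            spoken = " or ".join(chunk)
        else:
            spoken = ", ".join(chunk[:-1]) + ", or " + chunk[-1]
        has_more = start + per_prompt < len(options)
        suffix = " Or say 'more' to hear other options." if has_more else ""
        prompts.append(f"You can say {spoken}.{suffix}")
    return prompts


for line in paged_prompts(["check balance", "transfer funds", "pay a bill",
                           "find a branch", "talk to an agent"]):
    print(line)
```

Paging keeps each prompt short, but it does not solve the underlying problem. Users still cannot tell what lies behind "more" without listening to it.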
Transcription
While speech recognition technology has improved considerably in recent years, voice user interfaces still suffer from parsing or transcription errors in which a user’s speech is not interpreted correctly. These errors tend to be especially prevalent when the speech content uses technical vocabulary (e.g. medical terminology) or unconventional spellings, such as the names of musical artists or songs.
Understanding
Effective system design to maximize conversational understanding remains an open area of research. Voice user interfaces that interpret and manage conversational state are challenging to design. The difficulty lies in integrating complex natural language processing tasks such as coreference resolution, named-entity recognition, information retrieval, and dialog management. Most voice assistants today execute single commands very well, but they are limited in their ability to manage dialogue beyond a narrow task or a couple of turns in a conversation.
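The gap between single commands and multi-turn dialogue appears even in a toy example. A follow-up such as "text her" only makes sense if the system remembers who was mentioned in the previous turn. The sketch below carries that state forward with a deliberately naive rule: any third-person pronoun refers to the most recently mentioned contact. The contact list and command phrasing are hypothetical. Real coreference resolution is far harder than this.

```python
import re
from dataclasses import dataclass
from typing import Optional

PRONOUNS = {"him", "her", "them"}


@dataclass
class DialogState:
    """Tracks the most recently mentioned contact across turns."""
    contacts: set
    last_contact: Optional[str] = None

    def handle(self, utterance: str) -> str:
        match = re.match(r"\s*(call|text)\s+(\w+)", utterance.lower())
        if match is None:
            return "Sorry, I can only call or text someone."
        action, target = match.groups()
        if target in PRONOUNS:
            # Naive coreference: a pronoun refers to the last contact named.
            if self.last_contact is None:
                return f"Who would you like to {action}?"
            target = self.last_contact
        elif target not in self.contacts:
            return f"I couldn't find {target.title()} in your contacts."
        self.last_contact = target  # carry the referent into the next turn
        verb = "Calling" if action == "call" else "Texting"
        return f"{verb} {target.title()}."


state = DialogState(contacts={"alice", "bob"})
print(state.handle("Call Alice"))  # Calling Alice.
print(state.handle("Text her"))    # Texting Alice.
```

Because the rule ignores gender, number, and any entity that is not a contact, a second turn such as "text him" after "call Alice" would still resolve to Alice. That brittleness is exactly what limits assistants to a couple of turns.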
Future uses
Pocket-size devices, such as PDAs or mobile phones, currently rely on small buttons for user input. These are either built into the device or are part of a touch-screen interface, such as that of the Apple iPod Touch and the iPhone Siri Application. Extensive button-pressing on devices with such small buttons can be tedious and inaccurate, so an easy-to-use, accurate, and reliable VUI could be a major breakthrough in making these devices easier to use. Such a VUI would also benefit users of laptop- and desktop-sized computers, because it could help address several problems currently associated with keyboard and mouse use. These include repetitive-strain injuries such as carpal tunnel syndrome, the difficulty visually impaired users face in navigating and entering text within digital interfaces, and slow typing by inexperienced keyboard users. Moreover, keyboard use typically requires sitting or standing stationary in front of the connected display. By contrast, a VUI would let the user move around far more freely, since speech input eliminates the need to look at a keyboard.
Such developments could change the design of current machines and have far-reaching implications for how users interact with them. Hand-held devices could be designed with larger, easier-to-view screens, as no keyboard would be required. Touch-screen devices would no longer need to split the display between content and an on-screen keyboard, allowing the content to be viewed full-screen. Laptop computers could be roughly halved in size, as the keyboard half would be eliminated and all internal components integrated behind the display, effectively resulting in a simple tablet computer. Desktop computers would consist of a CPU and screen, saving desk space otherwise occupied by the keyboard and eliminating sliding keyboard trays built under the desk's surface. Television remote controls and keypads on dozens of other devices, from microwave ovens to photocopiers, could also be eliminated.
Numerous challenges would have to be overcome, however, for such developments to occur. First, the VUI would have to be sophisticated enough to distinguish intended input, such as commands, from background conversation; otherwise, false input would be registered and the connected device would behave erratically. A standard prompt, such as the famous "Computer!" call by characters in science fiction TV shows and films such as ''Star Trek'', could activate the VUI and prepare it to receive further input from the same speaker. Conceivably, the VUI could also include a human-like representation, for instance a voice or even an on-screen character, that responds (e.g., "Yes, Vamshi?") and continues to communicate back and forth with the user in order to clarify the input received and ensure accuracy.
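The activation scheme described above can be sketched as a gate over transcribed utterances: speech is ignored until an utterance begins with the wake word, after which follow-ups from the same speaker are accepted for a short window. The wake word "computer", the eight-second window and the speaker labels (assumed to come from a separate speaker-identification step) are illustrative assumptions; deployed assistants usually spot the wake word in the audio signal itself rather than in a transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeWordGate:
    """Hypothetical gate that passes on only utterances addressed to the VUI."""
    wake_word: str = "computer"
    window_s: float = 8.0                 # assumed follow-up window, in seconds
    active_speaker: Optional[str] = None  # speaker who last said the wake word
    active_until: float = 0.0

    def process(self, speaker: str, text: str, t: float) -> Optional[str]:
        """Return the command to act on, or None if the utterance is ignored.

        speaker: label from an assumed speaker-identification step.
        text: transcribed utterance; t: its timestamp in seconds.
        """
        words = text.strip().split(maxsplit=1)
        first = words[0].strip(",.!?").lower() if words else ""
        if first == self.wake_word:
            # Wake word heard: open a listening window for this speaker only.
            self.active_speaker = speaker
            self.active_until = t + self.window_s
            command = words[1].strip() if len(words) > 1 else ""
            return command or None  # a bare "Computer!" only activates
        if speaker == self.active_speaker and t <= self.active_until:
            # Follow-up from the same speaker inside the window; extend it.
            self.active_until = t + self.window_s
            return text.strip()
        # Anything else is treated as background conversation.
        return None
```

Tying the window to the speaker who said the wake word is one simple way to keep background conversation from being registered as input; it does not by itself handle two people addressing the device at once.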
Second, the VUI would have to work in concert with highly sophisticated software in order to accurately process requests and retrieve information or carry out actions according to the particular user's preferences. For instance, if Samantha prefers information from a particular newspaper, and prefers that the information be summarized in point form, she might say, "Computer, find me some information about the flooding in southern China last night"; in response, a VUI familiar with her preferences would "find" facts about "flooding" in "southern China" from that source, convert them into point form, and deliver them to her on screen and/or by voice, complete with a citation. Therefore, accurate speech-recognition software, along with some degree of artificial intelligence on the part of the machine associated with the VUI, would be required.
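The Samantha example can be sketched as a small pipeline that combines a transcribed request with a stored user profile: keep only articles from the preferred source, require every content word of the request to appear in the article, and render the result in the preferred format with a citation. The `Profile` and `Article` types, the stopword list, the in-memory article list and the source name used in testing are hypothetical placeholders for the speech-recognition, language-understanding and retrieval components the text refers to.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Hypothetical stored preferences for one user."""
    name: str
    preferred_source: str
    point_form: bool


@dataclass
class Article:
    """Hypothetical pre-fetched article from some news source."""
    source: str
    title: str
    sentences: List[str]


# Filler words dropped from the request. A real system would parse "last night"
# as a date constraint; this sketch simply discards it.
STOPWORDS = {"computer", "find", "me", "some", "information", "about",
             "the", "in", "last", "night"}


def words_of(text: str) -> set:
    """Lower-case alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(command: str, profile: Profile, articles: List[Article]) -> str:
    """Answer a transcribed request using the user's stored preferences."""
    terms = words_of(command) - STOPWORDS
    for art in articles:
        if art.source != profile.preferred_source:
            continue  # honour the user's preferred source
        if terms <= words_of(art.title + " " + " ".join(art.sentences)):
            if profile.point_form:
                body = "\n".join(f"- {s}" for s in art.sentences)
            else:
                body = " ".join(art.sentences)
            # Always close with a citation of the source.
            return f'{body}\n(Source: {art.source}, "{art.title}")'
    return f"Sorry, {profile.name}, I found nothing on that in {profile.preferred_source}."
```

Here "point form" merely lists the stored sentences as bullets; genuine summarization, and spoken rather than on-screen delivery, would require further components.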
Privacy implications
Privacy concerns arise because voice commands are available to the providers of voice-user interfaces in unencrypted form, and can therefore be shared with third parties or processed in unauthorized or unexpected ways. In addition to the linguistic content of recorded speech, a user’s manner of expression and voice characteristics can implicitly contain information about his or her biometric identity, personality traits, body shape, physical and mental health condition, sex, gender, moods and emotions, socioeconomic status and geographical origin.
See also
* Speech recognition
* Speech synthesis
* List of speech recognition software
* Natural-language user interface
* User interface design
* Voice browser
* Voice command
* Speech recognition in Linux
* Linguatronic
* Home automation
* Voice computing
External links
* Voice Interfaces: Assessing the Potential, by Jakob Nielsen
* The Rise of Voice: A Timeline
* Voice First Glossary of Terms
* Voice First A Reading List