A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employ one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channels.
The elements of a dialogue system are not precisely defined, because the idea is still under research; however, a dialogue system is distinct from a chatbot. The typical GUI wizard engages in a sort of dialogue, but it includes very few of the common dialogue system components, and its dialogue state is trivial.
Background
Dialogue systems based only on written text processing date from the early 1960s; the first ''speaking'' dialogue system was produced by a DARPA project in the USA in 1977. After this five-year project ended, some European projects produced the first dialogue systems able to speak several languages (including French, German and Italian). [Alberto Ciaramella, ''A prototype performance evaluation report'', Sundial work package 8000 (1993).] Those first systems were used in the telecom industry to provide various phone services in specific domains, e.g. automated agenda and train timetable services.
Components
Which components are included in a dialogue system, and how those components divide up responsibilities, differs from system to system. Central to any dialogue system is the dialogue manager, a component that manages the state of the dialogue and the dialogue strategy. A typical activity cycle in a dialogue system contains the following phases:
# The user speaks, and the input is converted to plain text by the system's input recogniser/decoder, which may include:
#* automatic speech recogniser (ASR)
#* gesture recogniser
#* handwriting recogniser
# The text is analysed by a natural language understanding (NLU) unit, which may include:
#* proper name identification
#* part-of-speech tagging
#* syntactic/semantic parser
# The semantic information is analysed by the dialogue manager, which keeps the history and state of the dialogue and manages the general flow of the conversation.
# Usually, the dialogue manager contacts one or more task managers, which have knowledge of the specific task domain.
# The dialogue manager produces output using an output generator, which may include:
#* natural language generator
#* gesture generator
#* layout manager
# Finally, the output is rendered using an output renderer, which may include:
#* text-to-speech engine (TTS)
#* talking head
#* robot or avatar
Dialogue systems that are based on a text-only interface (e.g. text-based chat) contain only stages 2–5.
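The activity cycle above can be illustrated for a text-only system, which covers stages 2 to 5. The following Python code is a minimal sketch, not a reference implementation: the function names, the keyword-spotting stand-in for NLU, the train timetable and the prompt templates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """History and slot values kept by the dialogue manager."""
    history: list = field(default_factory=list)
    slots: dict = field(default_factory=dict)


def understand(text: str) -> dict:
    """Stage 2 (NLU): toy keyword spotting standing in for tagging and parsing."""
    words = text.lower().replace("?", "").split()
    frame = {}
    if "train" in words:
        frame["intent"] = "train_times"
    if "to" in words and words.index("to") + 1 < len(words):
        frame["destination"] = words[words.index("to") + 1]
    return frame


def lookup_trains(destination: str) -> list:
    """Stage 4 (task manager): hypothetical domain knowledge, hard-coded here."""
    timetable = {"paris": ["08:15", "12:40"], "rome": ["09:05"]}
    return timetable.get(destination, [])


def generate(act: dict) -> str:
    """Stage 5 (output generator): template-based natural language generation."""
    if act["type"] == "ask_destination":
        return "Where would you like to go?"
    if act["type"] == "inform_times":
        times = ", ".join(act["times"])
        return f"Trains to {act['destination'].title()} leave at {times}."
    return "Sorry, I did not understand that."


def dialogue_manager(state: DialogueState, frame: dict) -> dict:
    """Stage 3: update the dialogue state and choose the next system act."""
    state.history.append(frame)
    state.slots.update(frame)
    if state.slots.get("intent") != "train_times":
        return {"type": "not_understood"}
    destination = state.slots.get("destination")
    if destination is None:
        return {"type": "ask_destination"}
    times = lookup_trains(destination)
    if not times:
        return {"type": "not_understood"}
    return {"type": "inform_times", "destination": destination, "times": times}


def respond(state: DialogueState, user_text: str) -> str:
    """Run stages 2 to 5 for one user turn."""
    return generate(dialogue_manager(state, understand(user_text)))


state = DialogueState()
print(respond(state, "When is the next train?"))
print(respond(state, "to Paris"))
```

Because the state object persists between turns, the second turn can supply only the destination and still be interpreted as part of the earlier train request. A spoken system would wrap this loop with an input recogniser (stage 1) and an output renderer (stage 6).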
Types of systems
Dialogue systems fall into the following categories, which are listed here along a few dimensions. Many of the categories overlap and the distinctions may not be well established.
* by modality
** text-based
** spoken dialogue system
** graphical user interface
** multi-modal
* by device
** telephone-based systems
** PDA systems
** in-car systems
** robot systems
** desktop/laptop systems
*** native
*** in-browser systems
*** in-virtual machine
** in-virtual environment
** robots
* by style
** command-based
** menu-driven
** natural language
** speech graffiti
* by initiative
** system initiative
** user initiative
** mixed initiative
Natural dialogue systems
''"A Natural Dialogue System is a form of dialogue system that tries to improve usability and user satisfaction by imitating human behaviour"'' (Berg, 2014). It addresses the features of a human-to-human dialogue (e.g. sub-dialogues and topic changes) and aims to integrate them into dialogue systems for human-machine interaction. Often, (spoken) dialogue systems require the user to adapt to the system because the system understands only a very limited vocabulary, cannot react to topic changes, and does not allow the user to influence the dialogue flow. Mixed initiative is a way to give the user an active part in the dialogue instead of only answering questions. However, the mere existence of mixed initiative is not sufficient for a system to be classified as a natural dialogue system. Other important aspects include:
* Adaptivity of the system
* Support of implicit confirmation
* Usage of verification questions
* Possibilities to correct information that has already been given
* Over-informativeness (giving more information than has been asked for)
* Support of negations
* Understanding of references by analysing discourse and anaphora
* Natural language generation to prevent monotonous and recurring prompts
* Adaptive and situation-aware formulation
* Social behaviour (greetings, the same level of formality as the user, politeness)
* Quality of speech recognition and synthesis
Although most of these aspects are the subject of many different research projects, there is a lack of tools that support the development of dialogue systems addressing these topics. Existing tools include VoiceXML, which focuses on interactive voice response systems and is the basis for many spoken dialogue systems in industry (customer support applications), and AIML, which is known for the A.L.I.C.E. chatbot, but neither integrates linguistic features such as dialogue acts or language generation. NADIA (a research prototype) therefore gives an idea of how to fill that gap, combining some of the aforementioned aspects such as natural language generation, adaptive formulation, and sub-dialogues.
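Two of the aspects listed above, implicit confirmation and verification questions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the recognition confidence score, the 0.8 threshold and the prompt wording are hypothetical and are not taken from NADIA or any other system mentioned here.

```python
def confirmation_prompt(value: str, confidence: float, next_question: str,
                        threshold: float = 0.8) -> str:
    """Choose between a verification question and an implicit confirmation."""
    if confidence < threshold:
        # Low confidence: ask an explicit verification question.
        return f"Did you say {value}?"
    # High confidence: repeat the value inside the next prompt, so the user
    # can still correct it without a separate confirmation turn.
    return f"OK, {value}. {next_question}"


print(confirmation_prompt("Rome", 0.55, "When do you want to travel?"))
print(confirmation_prompt("Rome", 0.95, "When do you want to travel?"))
```

The implicit form keeps the dialogue shorter, while the explicit question is reserved for cases where a misunderstanding is more likely.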
Performance
Some authors measure a dialogue system's performance as the percentage of sentences that are understood completely correctly, determined by comparing the model of each sentence with a reference (this measure is called ''Concept Sentence Accuracy'' or ''Sentence Understanding'').
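One possible reading of this measure can be sketched in Python, assuming that the "model" of a sentence is a set of slot-value pairs and that a sentence counts as right only when the whole model matches a reference. The function name and the example frames are hypothetical.

```python
def concept_sentence_accuracy(hypotheses, references):
    """Percentage of sentences whose whole semantic frame matches the reference."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not references:
        raise ValueError("at least one sentence is required")
    correct = sum(1 for hyp, ref in zip(hypotheses, references) if hyp == ref)
    return 100.0 * correct / len(references)


references = [
    {"intent": "train_times", "destination": "paris"},
    {"intent": "train_times", "destination": "rome"},
    {"intent": "agenda", "day": "monday"},
    {"intent": "agenda", "day": "friday"},
]
hypotheses = [
    {"intent": "train_times", "destination": "paris"},
    {"intent": "train_times", "destination": "rome"},
    {"intent": "agenda", "day": "monday"},
    {"intent": "agenda", "day": "sunday"},  # one wrong slot: whole sentence counts as wrong
]
print(concept_sentence_accuracy(hypotheses, references))  # 75.0
```

The measure is strict: a single wrong slot makes the whole sentence count as an error, which is why it is reported per sentence rather than per concept.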
Applications
Dialogue systems can support a broad range of applications in business enterprises, education, government, healthcare, and entertainment. For example:
* Responding to customers' questions about products and services via a company's website or intranet portal
* Customer service agent knowledge base: Allows agents to type in a customer's question and guide them with a response
* Guided selling: Facilitating transactions by providing answers and guidance in the sales process, particularly for complex products being sold to novice customers
* Help desk: Responding to internal employee questions, e.g., responding to HR questions
* Website navigation: Guiding customers to relevant portions of complex websites—a Website concierge
* Technical support: Responding to technical problems, such as diagnosing a problem with a product or device
* Personalised service: Conversational agents can leverage internal and external databases to personalise interactions, such as answering questions about account balances, providing portfolio information, or delivering frequent flier or membership information
* Training or education: They can provide problem-solving advice while the user learns
* Simple dialogue systems are widely used to decrease the human workload in call centers. In this and other industrial telephony applications, the functionality provided by dialogue systems is known as interactive voice response or IVR.
* Supporting scientists in data manipulation and analysis tasks, for example in genomics.
In some cases, conversational agents can interact with users using artificial characters. These agents are then referred to as embodied agents.
Toolkits and architectures
A survey of current frameworks, languages and technologies for defining dialogue systems.
See also
* Call avoidance
References
Further reading
* Will, Thomas (2007). ''Creating a Dynamic Speech Dialogue''. VDM Verlag Dr. Müller. ISBN 978-3-8364-4990-8.