VoiceXML

	VoiceXML VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. VoiceXML applications are developed and deployed in a manner analogous to how a web browser interprets and visually renders the Hypertext Markup Language (HTML) it receives from a web server. VoiceXML documents are interpreted by a voice browser, and in common deployment architectures, users interact with voice browsers via the public switched telephone network (PSTN). The VoiceXML document format is based on Extensible Markup Language (XML). It is a standard developed by the World Wide Web Consortium (W3C). Usage: VoiceXML applications are commonly used in many industries and segments of commerce. These applications include order inquiry, package tracking, driving directions, emergency notification, wake-up, flight tracking, voice ...
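A minimal sketch, assuming Python's standard xml.etree.ElementTree module, of the kind of document a voice browser fetches and interprets: a single `<form>` whose `<block>` speaks one `<prompt>`. The helper names, the form id, and the choice of `version="2.1"` are illustrative assumptions; a real application would add `<field>` elements, grammars, and round-trips to a web server.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.x namespace


def _q(name: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{name}"


def build_greeting(text: str) -> str:
    """Return a one-form VoiceXML document that speaks `text` and then ends."""
    ET.register_namespace("", VXML_NS)  # serialize as xmlns="..." instead of ns0:
    root = ET.Element(_q("vxml"), {"version": "2.1"})
    form = ET.SubElement(root, _q("form"), {"id": "greeting"})  # hypothetical id
    block = ET.SubElement(form, _q("block"))
    prompt = ET.SubElement(block, _q("prompt"))
    prompt.text = text  # rendered by text-to-speech or mapped to recorded audio
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting("Welcome to the order inquiry line."))
```

A web server would typically generate such a document per request, much as it generates HTML for a visual browser.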
	SCXML SCXML stands for State Chart XML: State Machine Notation for Control Abstraction. It is an XML-based markup language that provides a generic state-machine-based execution environment based on Harel statecharts. SCXML can describe complex finite state machines; for example, it can express notations such as sub-states, parallel states, synchronization, and concurrency. Goals: The objective of this standard is to generalize state diagram notations that are already used in other XML contexts. For example, SCXML notations are expected to replace the state machine notations used in the next version, CCXML 2.0 (CCXML is an XML standard designed to provide telephony support to VoiceXML). SCXML could also be used as a multimodal control language in the Multimodal Interaction Activity. One of the goals of the language is to ensure compatibility with CCXML and an easy path for existing CCXML scripts to be converted to SCXML wi ...
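To make the state-machine idea concrete, here is a deliberately simplified Python sketch that reads a flat SCXML document and steps through it. It is not a conforming SCXML processor: it assumes only top-level `<state>` elements with `<transition event target>` children, matches event names exactly (SCXML itself also matches dot-separated event prefixes), and ignores sub-states, `<parallel>`, and executable content. The call-handling states and event names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2005/07/scxml}"

# Hypothetical call-handling chart using flat states only.
CALL_CHART = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <transition event="call.incoming" target="ringing"/>
  </state>
  <state id="ringing">
    <transition event="call.answered" target="connected"/>
    <transition event="call.hangup" target="idle"/>
  </state>
  <state id="connected">
    <transition event="call.hangup" target="idle"/>
  </state>
</scxml>"""


class FlatStateMachine:
    """Toy interpreter for top-level <state>/<transition> elements only."""

    def __init__(self, scxml_text: str):
        root = ET.fromstring(scxml_text)
        states = root.findall(f"{NS}state")
        self.table = {}
        for state in states:
            row = {}
            for t in state.findall(f"{NS}transition"):
                # The first matching transition in document order wins, as in SCXML.
                row.setdefault(t.get("event"), t.get("target"))
            self.table[state.get("id")] = row
        # Without an `initial` attribute, SCXML starts in the first child state.
        self.current = root.get("initial") or states[0].get("id")

    def send(self, event: str) -> str:
        """Take the matching transition, if any; unmatched events are ignored."""
        target = self.table[self.current].get(event)
        if target is not None:
            self.current = target
        return self.current


if __name__ == "__main__":
    machine = FlatStateMachine(CALL_CHART)
    for ev in ["call.incoming", "call.answered", "call.hangup"]:
        print(ev, "->", machine.send(ev))
```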
	Voice Browser A voice browser is a software application that presents an interactive voice user interface to the user in a manner analogous to the functioning of a web browser interpreting Hypertext Markup Language (HTML). Dialog documents interpreted by voice browsers are often encoded in standards-based markup languages, such as Voice Extensible Markup Language (VoiceXML), a standard by the World Wide Web Consortium. A voice browser presents information aurally, using pre-recorded audio file playback or text-to-speech synthesis software. A voice browser obtains information using speech recognition and keypad entry, such as dual-tone multi-frequency (DTMF) detection. As speech recognition and web technologies have matured, voice applications have been deployed commercially in many industries, and voice browsers are supplanting traditional proprietary interactive voice response (IVR) systems. Voice browser software is delivered in a variety of implementation models. Systems that present a voice browser to a user, typic ...
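The keypad-entry path can be illustrated with DTMF detection. The Python sketch below synthesizes the standard two-tone signal for a key (one low-group row frequency plus one high-group column frequency) and recovers the key by measuring the power at the eight DTMF frequencies with the Goertzel algorithm. The 8 kHz sample rate, 400-sample frame, and simple silence check are assumptions; production detectors also check signal level, the level difference ("twist") between the two tones, and tone duration.

```python
import math

# Standard DTMF keypad: each key is one low-group (row) tone plus one high-group (column) tone, in Hz.
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]


def dtmf_tone(key, sample_rate=8000, n=400, amplitude=0.5):
    """Synthesize n samples of the two-tone signal for `key`."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            f_low, f_high = LOW[r], HIGH[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    return [amplitude * (math.sin(2 * math.pi * f_low * t / sample_rate)
                         + math.sin(2 * math.pi * f_high * t / sample_rate))
            for t in range(n)]


def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal's spectrum at `freq` (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1  # second-order recurrence
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, sample_rate=8000):
    """Return the key whose row and column tones are strongest, or None for silence."""
    if sum(x * x for x in samples) < 1e-9:
        return None
    low = [goertzel_power(samples, sample_rate, f) for f in LOW]
    high = [goertzel_power(samples, sample_rate, f) for f in HIGH]
    return KEYS[low.index(max(low))][high.index(max(high))]


if __name__ == "__main__":
    print(detect_dtmf(dtmf_tone("5")))
```

Goertzel is used here instead of a full FFT because only eight frequencies need to be measured.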
	Call Control eXtensible Markup Language Call Control eXtensible Markup Language (CCXML) is an XML standard designed to provide asynchronous event-based telephony support to VoiceXML. It is a W3C Recommendation, adopted on May 10, 2011. Whereas VoiceXML is designed to provide a voice user interface to a voice browser, CCXML is designed to tell the voice browser how to handle telephony control of the voice channel. The two XML applications are wholly separate, and neither requires the other to be implemented; however, they have been designed with interoperability in mind. Status and Future: CCXML 1.0 reached the status of a Proposed Recommendation before its adoption as a Recommendation. The transition from Candidate Recommendation to Proposed Recommendation took 1 year, while the transition from Last Call Working Draft to Candidate Recommendation took just over 3 years. As CCXML makes extensive use of the concepts of events and transitions, it is expected that the state machines used in the next CCXML 2.0 version will take advantage of ...
	Interactive Voice Response Interactive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through voice input and DTMF tones entered on a keypad. In telecommunications, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which they can inquire about services through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed. IVR systems deployed in the network are sized to handle large call volumes and are also used for outbound calling, as IVR systems are more intelligent than many predictive dialer systems. IVR systems can be used standalone to create self-service solutions for mobile purchases, banking payments, services, retail orders, utilities, travel information and weather conditions. In combination with systems such as an automated attendant and an automatic call distributor (ACD), call routing can be optimized for a better ca ...
	Speech Recognition Grammar Specification Speech Recognition Grammar Specification (SRGS) is a W3C standard for how speech recognition grammars are specified. A speech recognition grammar is a set of word patterns that tells a speech recognition system what to expect a human to say. For instance, if you call an auto-attendant application, it will prompt you for the name of a person (with the expectation that your call will be transferred to that person's phone). It will then start up a speech recognizer, giving it a speech recognition grammar. This grammar contains the names of the people in the auto attendant's directory and a collection of sentence patterns that are the typical responses from callers to the prompt. SRGS specifies two alternative but equivalent syntaxes, one based on XML and one using Augmented BNF (ABNF) format. In practice, the XML syntax is used more frequently. Both the ABNF and XML forms have the expressive power of a context-free grammar. A grammar processor that does not support recursive grammars ...
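As a rough illustration of the XML syntax, the Python sketch below enumerates every word sequence accepted by a tiny auto-attendant grammar. It supports only plain words, `<one-of>`, `<item>` (with an optional `repeat="0-1"`), and local `<ruleref uri="#...">` references; `<tag>` content is skipped, and a recursive rule reference raises an error, because a recursive rule can describe an unbounded set of phrases that cannot be listed exhaustively. The directory names, rule ids, and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical auto-attendant grammar: an optional "call" followed by a name.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="request">
  <rule id="request">
    <item repeat="0-1">call</item>
    <ruleref uri="#person"/>
  </rule>
  <rule id="person">
    <one-of>
      <item>Alice Smith</item>
      <item>Bob Jones</item>
    </one-of>
  </rule>
</grammar>"""


def _words(text):
    """Split element text into a tuple of tokens (empty for None or whitespace)."""
    return tuple(text.split()) if text else ()


def expand(elem, rules, active):
    """Return the set of token sequences that `elem` can match."""
    tag = elem.tag.replace(NS, "")
    if tag == "tag":  # semantic interpretation scripts are not spoken words
        return {()}
    if tag == "one-of":  # alternatives: union of the child items
        return set().union(*(expand(c, rules, active) for c in elem))
    if tag == "ruleref":  # only local references such as uri="#person"
        name = elem.get("uri", "").lstrip("#")
        if name in active:
            raise ValueError(f"recursive rule {name!r} not supported")
        return expand(rules[name], rules, active | {name})
    # <rule> and <item>: a sequence of text tokens and child expansions.
    results = {_words(elem.text)}
    for child in elem:
        results = {a + b for a in results for b in expand(child, rules, active)}
        results = {a + _words(child.tail) for a in results}
    repeat = elem.get("repeat")
    if repeat == "0-1":  # optional item
        results.add(())
    elif repeat is not None:
        raise NotImplementedError(f"repeat={repeat!r}")
    return results


def accepted_phrases(grammar_xml):
    """Enumerate every phrase a non-recursive SRGS XML grammar accepts."""
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.findall(f"{NS}rule")}
    start = root.get("root")
    return {" ".join(p) for p in expand(rules[start], rules, {start})}


if __name__ == "__main__":
    for phrase in sorted(accepted_phrases(GRAMMAR)):
        print(phrase)
```

A real recognizer does not enumerate phrases like this; it compiles the grammar into a search network, but the set of accepted sentences is the same.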
	Pronunciation Lexicon Specification The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use. It allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages, such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML). Usage: An example PLS document gives the spellings "judgment" and "judgement" with the single IPA pronunciation ˈdʒʌdʒ.mənt, and the spellings "fiancé" and "fiance" with two pronunciations, fiˈɒns.eɪ and ˌfiː.ɑːnˈseɪ ...
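The lookup a synthesizer or recognizer might build from such a document can be sketched in Python. The PLS markup below is a reconstruction around the two example entries, assuming the PLS 1.0 namespace and its `lexeme`, `grapheme`, and `phoneme` elements; the `xml:lang="en-GB"` value and the `load_lexicon` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Reconstructed example; xml:lang="en-GB" is an assumption.
LEXICON = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(pls_text):
    """Map each spelling (grapheme) to its pronunciations, in document order."""
    root = ET.fromstring(pls_text)
    table = {}
    for lexeme in root.iter(f"{PLS_NS}lexeme"):
        phonemes = [p.text.strip() for p in lexeme.findall(f"{PLS_NS}phoneme")]
        # Every grapheme in a lexeme shares the lexeme's pronunciations.
        for grapheme in lexeme.findall(f"{PLS_NS}grapheme"):
            table.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return table


if __name__ == "__main__":
    lexicon = load_lexicon(LEXICON)
    print(lexicon["fiance"])  # both alternative pronunciations
```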
	Semantic Interpretation For Speech Recognition Semantic Interpretation for Speech Recognition (SISR) defines the syntax and semantics of annotations to grammar rules in the Speech Recognition Grammar Specification (SRGS). It has been a World Wide Web Consortium Recommendation since 5 April 2007. Building upon SRGS grammars, it allows voice browsers, via ECMAScript, to semantically interpret complex grammars and return the information to the application. For example, it allows an utterance like "I would like a Coca-cola and three large pizzas with pepperoni and mushrooms." to be interpreted into an object that an application can understand, when the utterance is matched against a grammar that includes SISR markup in addition to the standard SRGS grammar in XML format. In such a grammar, the words "I would like a" are followed by a tag script that builds the drink result (out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;), then the word "and", then a tag script beginning out.pizza=rules.pizz ...
	Speech Synthesis Markup Language Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a recommendation of the W3C's Voice Browser Working Group. SSML is often embedded in VoiceXML scripts to drive interactive telephony systems, but it may also be used on its own, for example to create audio books. For desktop applications, other markup languages are popular, including Apple's embedded speech commands and Microsoft's SAPI text-to-speech (TTS) markup, also an XML language. SSML is also used to produce speech via Azure Cognitive Services' Text to Speech API and when writing third-party skills for Google Assistant or Amazon Alexa. SSML is based on the Java Speech Markup Language (JSML) developed by Sun Microsystems, although the current recommendation was developed mostly by speech synthesis vendors. It covers virtually all aspects of synthesis, although some areas have been left unspecified, so each vendor accepts a different variant of the language. Also, in ...
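A short Python sketch of generating SSML, assuming the SSML 1.1 namespace and its `<speak>`, `<s>` (sentence), and `<break time>` elements. The `build_ssml` helper, its 500 ms default pause, and the `en-US` language tag are hypothetical choices for illustration; individual engines may accept different subsets of the language.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_ssml(sentences, pause_ms=500, lang="en-US"):
    """Wrap sentences in an SSML <speak> document with a <break> between them."""
    ET.register_namespace("", SSML_NS)  # emit xmlns="..." rather than ns0:
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: lang})
    for i, sentence in enumerate(sentences):
        if i:  # pause between consecutive sentences
            ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": f"{pause_ms}ms"})
        s = ET.SubElement(speak, f"{{{SSML_NS}}}s")
        s.text = sentence
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Your balance is ready.", "Goodbye."]))
```

The same `<speak>` content could be embedded inside a VoiceXML `<prompt>` or sent on its own to a TTS service that accepts SSML.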
	Speech Synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. The quality of a speech synthesizer is judged by its similarity to ...
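The concatenation idea can be shown with a toy Python sketch that joins stored per-word recordings, one of the unit sizes mentioned above, and cross-fades a few samples at each joint to avoid audible clicks. The unit inventory, sample values, and 4-sample overlap are made up for illustration; real concatenative systems choose among many candidate units and smooth pitch and duration at the joins.

```python
# Hypothetical unit database: word -> recorded samples (made-up values).
UNITS = {
    "hello": [0.0, 0.4, 0.8, 0.4, 0.0, -0.4, -0.8, -0.4, 0.0, 0.2],
    "world": [0.0, -0.3, -0.6, -0.3, 0.0, 0.3, 0.6, 0.3, 0.0, -0.1],
}


def concatenate_units(text, units, overlap=4):
    """Join stored per-word recordings, cross-fading `overlap` samples at each joint."""
    out = []
    for word in text.lower().split():
        unit = units[word]  # KeyError if the word has no stored recording
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight ramps up for the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


if __name__ == "__main__":
    samples = concatenate_units("Hello world", UNITS)
    print(len(samples))  # 10 + 10 - 4 overlapping samples
```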
	Speech Recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies enabling computers to recognize spoken language and translate it into text, with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research from the fields of computer science, linguistics and computer engineering. The reverse process is speech synthesis. Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems. Systems that use training are called "speaker-dependent". Speech recognition ...
	MSML The Media Server Markup Language (MSML) is used to control and invoke many different types of services on IP Media Servers and is described in RFC 5707. Clients can use it to define how multimedia sessions interact on a Media Server and to apply services to individuals or groups of users. MSML can be used, for example, to control Media Server conferencing features such as video layout and audio mixing, create sidebar conferences or personal mixes, and set the properties of media streams. Clients can also use MSML to define media processing dialogs, which may be used as parts of application interactions with users or conferences. Transformation of media streams to and from users or conferences, as well as IVR dialogs, are examples of such interactions, which are specified using MSML. MSML clients may also invoke dialogs with individual users or with groups of conference participants using VoiceXML. The fundamental model with MSML is that the Media Server is an appliance that is s ...
	World Wide Web Consortium The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 and led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working together in the development of standards for the World Wide Web. , W3C had 459 members. W3C also engages in education and outreach, develops software and serves as an open forum for discussion about the Web. History: The World Wide Web Consortium (W3C) was founded in 1994 by Tim Berners-Lee after he left the European Organization for Nuclear Research (CERN) in October 1994. It was founded at the Massachusetts Institute of Technology (MIT) Laboratory for Computer Science with support from the European Commission and the Defense Advanced Research Projects Agency (DARPA), which had pioneered the ARPANET, one of the predecessors to the Internet. It was located in Technology Square until 2004, when it moved, with the MIT Computer Science and Artificial ...