Loquendo is a multinational computer software technology corporation, headquartered in Turin, Italy, that provides speech recognition, speech synthesis, speaker verification and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab (formerly CSELT), also had offices in the United Kingdom, Spain, Germany, France, and the United States. Its products can be found in portable and in-car navigation devices, assistive devices for the differently abled, smartphones, ebook readers, talking ATMs, computer games, voice-controlled domestic appliances and others. Its voice synthesis and speech recognition systems are used in an e-health application as part of the virtual assistant of the Government Health Services of Spain's Junta de Andalucía. Loquendo's products have received several awards, including being named a Speech Technologies Speech Engine Leader in 2007, 2008, and 2009. The company was rated 'Market Leader' by Speech Technologies in 2009 and 2010. On 30 September 2011, Nuance announced that it had acquired Loquendo.

History

Loquendo was originally a research group created in the mid-seventies by managers at IRI-STET in the CSELT laboratories in Turin, before becoming a company in its own right in 2001.

Speech synthesis

Building on the recommendations of the University of Padua, and applying the technique of so-called diphones (the union of a consonant and a vowel, of which Italian has 150 in total), the voice technology group led by Giulio Modena created in 1975 the first highly intelligible speech synthesizer able to speak (and sing) Italian. It was called MUSA (MUltichannel Speaking Automaton), and it demonstrated what was possible with the technology of the time. The results achieved in those years were condensed into a 45 rpm audio disc published in 1978 and distributed in thousands of copies through the mass media. After a short spoken self-presentation of the system, the audio track contained a humorous Italian version of the song "Frère Jacques", performed in polyphony (a cappella) with several singing voices (MUSA could manage up to 8 synthesis channels in parallel).

The evolution of this prototype, with an increase in the number of diphones (to about 1000), refinement of the language analysis tools, and improved waveform management, led to a marked improvement in the synthetic voice. This in turn led to the creation of the first "voice synthesizer" integrated circuit, developed internally at CSELT, which was manufactured by SGS and catalogued as a peripheral for Zilog's Z80 microprocessor (with the code M8950). Later, in the nineties, "ELOQUENS" was born: a multi-platform software speech synthesizer for various operating systems (DOS, Windows, System 7, Unix, OS/2) and for telephone boards with very large numbers of channels, such as those used by the Italian telephone operator to build its reverse directory service (used to obtain a subscriber's identity and address from their telephone number).

Towards the end of the 1990s speech synthesis took a new approach: instead of relying on diphones, it used the selection and concatenation of acoustic units of variable length, an approach made possible by the increased power of computers and especially by the increasing capacity of mass storage systems. This resulted in "ACTOR" ("The human sounding voice"), which began to reach a large audience thanks to the number of telephone services and applications created by Loquendo-related companies. In 2000 the synthesizer was released from the research labs as a commercial product, including a number of editing tools to produce synthetic audio enriched with emotions. It was also released as a software library for use in various products, from small portable devices such as mobile phones, navigators and palm computers to multichannel, multilingual telephone servers for (semi)automatic call centers.

The Loquendo speech synthesis has become an internet meme on YouTube, though it is more common in Spanish-language videos. It is often used in creepypastas and parody dubbings (often with vulgar language).

Speech recognition

Shortly after the start of the research into speech synthesis, the group began research on speech recognition, and at the beginning of the eighties it produced the first prototype, able to recognize the ten digits and a few simple commands. Applying hidden Markov models in 1984 led to the development of a speech recognizer that could recognize connected words and sentences, created in collaboration with ELSAG, another company in the IRI-STET group. Also in collaboration with ELSAG, RIPAC (RIconoscimento PArlato Connesso) was presented in 1986: an early microprocessor designed to perform connected speech recognition. This processor had VLSI levels of integration and was composed of 70,000 transistors.

The need to produce speaker-independent speech recognizers for telephone applications led to the creation of speech databases with the recorded voices of hundreds of different people. In 1987 the first large database, obtained by recording the voices of more than 1000 people calling from all over Italy with an automatic procedure, was used in the creation of a specially crafted phone server at the CSELT labs. The recorded material allowed the training of Markov models and, together with sophisticated algorithms, led to the development of "AURIS", the first commercial recognizer that could run on a variety of devices with digital signal processors (DSPs).

In the nineties a large cross-European collaboration began and, along with a dozen other companies and universities, a very large speech database was collected throughout Europe, with the voices of more than 65,000 people. This material, combined with a new mixed approach of Markov models and neural networks, led to "FLEXUS", the first flexible-vocabulary speech recognizer, which allowed many varied telephone services to use automatic speech recognition in their human interfaces. Merging "FLEXUS" and "ACTOR" into a single system created "Dialogos", allowing the creation of cutting-edge telephone services.

The birth of Loquendo as a company led to the development of many languages and to the release of the recognizer as a software library for creating various telephony applications. The company also introduced several systems for writing finite-state grammars and natural language models. The speech database recording campaigns continued, moving on from Europe to Mediterranean countries, to South, Central and North America, and finally to countries in the Far East. Overall, countless hours of speech have been recorded by contacting hundreds of thousands of people in these regions. The recordings have been collected over fixed telephone networks, in moving vehicles over mobile phones, and with high-quality microphones in domestic environments for consumer applications such as video games, appliances, and home automation in general.

Speaker recognition

(Image: CSELT portable mobile phone prototype with speech recogniser)

Research into speaker recognition began in the early eighties. Later, in the mid-2000s, speech databases tailored for this task became available. In collaboration with the Politecnico di Torino, the group began experiments on two different fronts: speaker identification and speaker verification. The success of this research also pushed the company to develop products specifically for these tasks through the enabling platforms described below.

Speech coding

Research into speech coding started even before the work on speech recognition and synthesis. It aimed to build equipment such as codecs and echo cancellers, in order to increase as much as possible the number of telephone conversations that can flow through a single cable (or satellite connection) without losing voice intelligibility. In the late seventies, studies and experiments led to the creation of algorithms to encode the telephone speech signal and to the establishment of the European CCITT standard known as A-law encoding (an 8-bit logarithmic encoding law, "A", for 8 kHz band-limited audio signals). This standard was then used for 64 kbit/s ISDN telephone lines. In subsequent years the group built more powerful codecs (used in telephone exchanges) and, within the pan-European GSM consortium, the codec used in second-generation mobile phones. At the same time they built a system to transmit high-quality signals in spite of the 8 kHz band limit of the telephone cables, which was useful for audio and video conference applications.
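As an illustration of the logarithmic encoding law mentioned above, the following is a minimal sketch of A-law companding in Python. It uses the continuous A-law formula with the commonly cited parameter A = 87.6 and a plain uniform 8-bit quantizer; this is a simplifying assumption made for readability, not the segmented bit layout of the actual telephony standard, and the function names are hypothetical. Sampling at 8000 samples per second with 8 bits per sample gives the 64 kbit/s rate of an ISDN voice channel.

```python
import math

A = 87.6  # commonly cited A-law compression parameter (assumed here)
_DENOM = 1.0 + math.log(A)


def alaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    ax = min(abs(x), 1.0)
    if ax < 1.0 / A:
        y = A * ax / _DENOM                     # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / _DENOM   # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = min(abs(y), 1.0)
    if ay < 1.0 / _DENOM:
        x = ay * _DENOM / A
    else:
        x = math.exp(ay * _DENOM - 1.0) / A
    return math.copysign(x, y)


def encode_8bit(x: float) -> int:
    """Illustrative quantizer: companded value scaled to -127..127."""
    return round(alaw_compress(x) * 127)


def decode_8bit(code: int) -> float:
    """Recover an approximate sample from the illustrative 8-bit code."""
    return alaw_expand(code / 127)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        code = encode_8bit(sample)
        print(sample, code, decode_8bit(code))
```

Because the curve is logarithmic for larger amplitudes, quiet samples receive proportionally more of the 256 available codes than a uniform quantizer would give them, which is what keeps speech intelligible at 8 bits per sample.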

Enabling platforms

In the late nineties, the development of the Internet in the form known today (hypertext resident on different servers that span the planet in one big network) led to the need to make these texts available by voice over the phone. At the same time, interactive voice response (IVR) became increasingly popular, using hardware and software tools to quickly develop new telephony applications. It became evident that the previous development models, which had produced complex systems such as the automation of the directory inquiry service or the Automatic Information Service Stations, were too rigid and would not easily allow the development of new applications. There was therefore a need for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. To this end a special working group was created to develop a voice browser prototype, shown to the public at SMAU 2000 under the name "VoxNauta". It was such a success that Telecom Italia decided to close its original research labs and create Loquendo on 1 February 2001.

Over the years "VoxNauta" was further developed in various scalable forms, from small servers to large enterprise systems with thousands of lines, and it has been installed in hundreds of companies around the world. The emergence of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting servers hosting the speech technologies to servers hosting the telephone boards pushed development towards software-only products and led to the creation of Speech Server software, hosting text-to-speech and speech-recognizer engines from Loquendo. This continuing research and development has led Loquendo to be one of the most widely known brands in the field of speech synthesis and voice recognition.

The brand

The name Loquendo was devised by the wife of the founding CEO, Silvano Giorcelli, while the logo was created by the graphic department. When displayed as an animated GIF, the three ripples above the "O" turn on in sequence, giving the sense of the emission of sound. The brand has not been protected by the company: there are other Italian companies whose names derive directly from Loquendo, and this has contributed to its widespread use, even at the expense of competing brands.

Sale of the company

Over the years there have been rumors of the sale of Loquendo to other companies. The most recent was in the summer of 2011, when it was announced that two US-based multinational companies, Nuance and Avaya, were looking into the possibility of a takeover. As Nuance was a direct competitor of the Italian company, Loquendo workers were worried about the possible dismemberment of research and development and the disappearance from Italy of an excellent brand with forty years of experience. A purchase by Avaya seemed more desirable, as its activities were complementary to those carried on by Loquendo; Avaya in fact did not own any speech technology and could therefore have been very interested in the possibility of in-house development rather than purchasing it from outside companies. These reports were followed with great interest by the workers, by local authorities in Turin and Piedmont, and by the entire international scientific community. On 13 August 2011, Telecom Italia publicly announced the sale of its entire stake in Loquendo to Nuance for 53 million euros. ((it) Luca Davi, "Telecom Italia cede Loquendo al gruppo Nuance", Il Sole 24 ORE, 14 August 2011)

Products

* speech synthesis
* speaker verification

References

Bibliography

*(it) Luigi Bonavoglia, "CSELT trent'anni", Ed. CSELT, 199
*(it) Roberto Billi (curator), with the following authors of CSELT: Agostino Appendino, Giancario Babini, Paolo Baggia, Roberto Billi, Alfredo Biocca, Pier Giorgio Bosco, Franco Canavesio, Giuseppe Castagneri, Alberto Ciaramella, Morena Danieli, Fulvio Faraci, Luciano Fissore, Roberto Gemello, Elisabetta Gerbino, Egidio Giachin, Giorgio Micca, Roberto Montagna, Luciano Nebbia, Silvia Quazza, Daniele Roffinella, Luciano Rosboch, Claudio Rullent, Pier Luigi Salza, Stefano Sandri, "Tecnologie vocali per l'interazione uomo-macchina. Nuovi servizi a portata di voce", Ed. Telecom Lab, 1995
*(en) Pirani, Giancarlo, ed., Advanced algorithms and architectures for speech understanding, Vol. 1, Springer Science & Business Media, 2013, ISBN 978-3-540-53402-0
*(it) "Quarant'anni d'innovazione", ed. Millennium s.r.l. (supplement to no. 224 of Media Duemila, 2005)
*(it) torinowireless.it
*(it) smau.it
*(it) corriere.it
*(it) isticom.it
*(it) deputatids.it
*(it) h-care.eu
*(it) Forum P.A., 17–20 May 2010, AVAYA press kit

External links

Loquendo website