Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, usually by modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.
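As a rough illustration of why vocal tract shape matters acoustically, the sketch below computes the resonances of the simplest possible tract representation: a lossless uniform tube closed at the glottis and open at the lips. The tract length (17.5 cm), the speed of sound (350 m/s), and the function name are assumptions chosen for illustration and do not come from any synthesizer described in this article.

```python
# Assumed values for illustration only (not taken from the article).
SPEED_OF_SOUND_M_S = 350.0
TRACT_LENGTH_M = 0.175


def uniform_tube_resonances(length_m, speed_m_s, count):
    """Resonances (Hz) of a lossless uniform tube closed at one end, open at the other.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_m_s / (4.0 * length_m) for n in range(1, count + 1)]


formants = uniform_tube_resonances(TRACT_LENGTH_M, SPEED_OF_SOUND_M_S, 3)
print([round(f) for f in formants])
```

Moving the articulators changes the tube's cross-sectional area along its length, which shifts these resonances away from the evenly spaced values of the uniform tube.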

Mechanical talking heads

There is a long history of attempts to build mechanical "talking heads". Gerbert (d. 1003), Albertus Magnus (1198–1280) and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734–1804), who published an account of his research in 1791.

Electrical vocal tract analogs

The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations were made by, among others, Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) made an analog computer simulation.

Haskins and Maeda models

The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another frequently used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
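The general idea behind a factor-based approach can be sketched as a mean shape plus a weighted sum of a few fixed deformation patterns, so that a handful of weights controls the whole contour. The numbers, factor labels, and function name below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the factors of the Maeda model, which are derived from measured vocal tract shapes.

```python
# Hypothetical data for illustration only; not taken from any published model.
MEAN_SHAPE = [1.0, 1.5, 2.0, 1.5, 1.0]      # neutral contour, five sample points
FACTORS = [
    [0.5, 0.25, 0.0, -0.25, -0.5],          # factor 1: front-back tilt
    [-0.25, 0.0, 0.5, 0.0, -0.25],          # factor 2: bulge in the middle
]


def shape_from_weights(mean_shape, factors, weights):
    """Return mean_shape plus the weighted sum of the factor patterns."""
    shape = list(mean_shape)
    for weight, pattern in zip(weights, factors):
        for i, value in enumerate(pattern):
            shape[i] += weight * value
    return shape


neutral = shape_from_weights(MEAN_SHAPE, FACTORS, [0.0, 0.0])
deformed = shape_from_weights(MEAN_SHAPE, FACTORS, [1.0, 2.0])
print(neutral)
print(deformed)
```

With all weights at zero the mean shape is returned unchanged; each nonzero weight adds a scaled copy of its pattern, which is what lets a small number of control parameters describe many shapes.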

Modern models

Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed. Examples include the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data, and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. A geometrically based 3D articulatory speech synthesizer has been developed by Peter Birkholz (VocalTractLab). The Directions Into Velocities of Articulators (DIVA) model, a feedforward control approach which takes the neural computations underlying speech production into consideration, was developed by Frank H. Guenther at Boston University. The ArtiSynth project, headed by Sidney Fels at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico, Yohan Payan and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.

Commercial models

One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the GNU General Public License, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Rene Carré's "distinctive region model" (see "Real-time articulatory speech-synthesis-by-rules").
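A waveguide or transmission-line analog treats the tract as a chain of short tube sections, with partial reflection wherever the cross-sectional area changes. The sketch below shows one generic junction of that kind, in the style of a Kelly and Lochbaum tube model with pressure waves. It is not the gnuspeech implementation; the area values, function names, and sign convention are assumptions made for illustration.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between sections.

    For a wave travelling from a section of area a into one of area b,
    r = (a - b) / (a + b). Other sign conventions exist in the literature.
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def scatter(r, forward_in, backward_in):
    """One scattering step at a junction with reflection coefficient r.

    forward_in arrives from the glottis side, backward_in from the lip side.
    Returns (forward_out, backward_out), the waves leaving the junction.
    """
    forward_out = (1 + r) * forward_in - r * backward_in
    backward_out = r * forward_in + (1 - r) * backward_in
    return forward_out, backward_out


# Hypothetical area function (arbitrary units): narrow, wide, wide, narrow.
areas = [1.0, 3.0, 3.0, 1.0]
coeffs = reflection_coefficients(areas)
print(coeffs)
print(scatter(coeffs[0], 1.0, 0.0))
```

Where two neighbouring sections have equal area the coefficient is zero and waves pass through unchanged; a full simulation would apply this step at every junction on each sample and add boundary conditions at the glottis and lips.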

Bibliography

* Baxter, Brent, and William J. Strong. (1969). WINDBAG—a vocal-tract analog speech synthesizer. ''Journal of the Acoustical Society of America'', 45, 309(A).
* Birkholz, P., Jackel, D., & Kröger, B. J. (2007). Simulation of losses due to turbulence in the time-varying vocal system. ''IEEE Transactions on Audio, Speech, and Language Processing'', 15, 1218–1225.
* Birkholz, P., Jackel, D., & Kröger, B. J. (2006). Construction and control of a three-dimensional vocal tract model. ''Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006)'' (Toulouse, France), pp. 873–876.
* Coker, C. H. (1968). Speech synthesis with a parametric articulatory model. ''Proc. Speech. Symp., Kyoto, Japan'', paper A-4.
* Dennis, Jack B. (1963). Computer control of an analog vocal tract. ''Journal of the Acoustical Society of America'', 35, 1115(A).
* Engwall, O. (2003). Combining MRI, EMA & EPG measurements in a three-dimensional tongue model. ''Speech Communication'', 41, 303–329.
* Fant, C. Gunnar M. (1960). ''Acoustic theory of speech production''. The Hague, Mouton.
* Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Unpublished doctoral dissertation, MIT, Cambridge, MA.
* Honda, Takashi, Seiichi Inoue, and Yasuo Ogawa. (1968). A hybrid control system of a human vocal tract simulator. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 175–8. Tokyo, International Council of Scientific Unions.
* Kelly, John L., and Carol Lochbaum. (1962). Speech synthesis. ''Proceedings of the Speech Communications Seminar'', paper F7. Stockholm, Speech Transmission Laboratory, Royal Institute of Technology.
* Kempelen, Wolfgang R. Von. (1791). ''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine''. Wien, J. B. Degen.
* Maeda, S. (1988). Improved articulatory model. ''Journal of the Acoustical Society of America'', 84, Sup. 1, S146.
* Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle and A. Marchal (Eds.), ''Speech Production and Speech Modelling'', Kluwer Academic, Dordrecht, 131–149.
* Matsui, Eiichi. (1968). Computer-simulated vocal organs. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 151–4. Tokyo, International Council of Scientific Unions.
* Mermelstein, Paul. (1969). Computer simulation of articulatory activity in speech production. ''Proceedings of the International Joint Conference on Artificial Intelligence'', Washington, D.C., 1969, ed. by D. E. Walker and L. M. Norton. New York, Gordon & Breach.
* Rubin, P., Saltzman, E., Goldstein, L., McGowan, R., Tiede, M., & Browman, C. (1996). CASY and extensions to the task-dynamic model. ''Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Producing Modeling - 4th Speech Production Seminar'', 125–128.

External links

* Praat
* Introduction to Articulatory Speech Synthesis
* or a description from the BBC