Articulatory Synthesis

Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. The shape of the vocal tract can be controlled in a number of ways, usually by modifying the position of the speech articulators, such as the tongue, jaw, and lips. Speech is created by digitally simulating the flow of air through the representation of the vocal tract.


Mechanical talking heads

There is a long history of attempts to build mechanical "talking heads". Gerbert (d. 1003), Albertus Magnus (1198–1280) and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Wolfgang von Kempelen (1734–1804), who published an account of his research in 1791.


Electrical vocal tract analogs

The first electrical vocal tract analogs were static, like those of Dunn (1950), Ken Stevens and colleagues (1953), and Gunnar Fant (1960). Rosen (1958) built a dynamic vocal tract analog (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968) and Baxter and Strong (1969) have also described hardware vocal-tract analogs. Kelly and Lochbaum (1962) made the first computer simulation; later digital computer simulations have been made, e.g. by Nakata and Mitsuoka (1965), Matsui (1968) and Paul Mermelstein (1971). Honda et al. (1968) have made an analog computer simulation.
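
Digital simulations in the tradition of Kelly and Lochbaum are usually described as a chain of short tube sections, with part of each travelling wave reflected wherever the cross-sectional area changes. The following is a minimal sketch of one such junction, assuming lossless sections and pressure waves; the function names and the example areas are illustrative and are not taken from any of the systems cited above, and other texts use the opposite sign convention.

```python
from typing import List, Sequence, Tuple


def reflection_coefficients(areas: Sequence[float]) -> List[float]:
    """Pressure-wave reflection coefficient at each junction.

    For adjacent sections with areas a0 (glottis side) and a1 (lip side),
    the coefficient is (a0 - a1) / (a0 + a1). A uniform tube gives zeros.
    """
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def scatter(k: float, forward_in: float, backward_in: float) -> Tuple[float, float]:
    """Scatter two incoming pressure waves at one junction.

    forward_in arrives from the glottis side, backward_in from the lip side.
    Returns (forward_out, backward_out). Pressure is continuous across the
    junction: forward_in + backward_out == forward_out + backward_in.
    """
    forward_out = (1.0 + k) * forward_in - k * backward_in
    backward_out = k * forward_in + (1.0 - k) * backward_in
    return forward_out, backward_out


# Hypothetical area function (arbitrary units) that widens toward the lips.
example_areas = [1.0, 3.0, 3.0]
example_k = reflection_coefficients(example_areas)
```

A full synthesizer would apply `scatter` at every junction on every sample and add delays, losses, and boundary conditions at the glottis and lips, all of which are omitted here.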


Haskins and Maeda models

The first software articulatory synthesizer regularly used for laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. This synthesizer, known as ASY, was a computational model of speech production based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another frequently used model is that of Shinji Maeda, which uses a factor-based approach to control tongue shape.
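
To make the idea of factor-based control concrete, the following is a minimal sketch, assuming that a tongue contour can be modeled as a mean shape plus a weighted sum of factor vectors. The function name `tongue_contour`, the three-point contour, and all numeric values are hypothetical and chosen only for illustration; they are not Maeda's published factors.

```python
from typing import List, Sequence


def tongue_contour(
    mean: Sequence[float],
    factors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return mean[i] + sum_j weights[j] * factors[j][i] for each point i."""
    if len(factors) != len(weights):
        raise ValueError("need exactly one weight per factor")
    if any(len(f) != len(mean) for f in factors):
        raise ValueError("each factor must have one entry per contour point")
    return [
        m + sum(w * f[i] for w, f in zip(weights, factors))
        for i, m in enumerate(mean)
    ]


# Hypothetical three-point contour with two made-up factors.
mean_shape = [1.0, 2.0, 1.0]
factor_vectors = [
    [1.0, 0.0, -1.0],  # illustrative "front-back" factor
    [0.0, 1.0, 0.0],   # illustrative "raising" factor
]
contour = tongue_contour(mean_shape, factor_vectors, [0.5, -1.0])
```

The appeal of this kind of parameterization is that a handful of weights, rather than every point on the contour, is enough to drive the shape over time.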


Modern models

Recent progress in speech production imaging, articulatory control modeling, and tongue biomechanics modeling has led to changes in the way articulatory synthesis is performed. Examples include the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein, which matches midsagittal vocal tracts to actual magnetic resonance imaging (MRI) data, and uses MRI data to construct a 3D model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. A geometrically based 3D articulatory speech synthesizer has been developed by Peter Birkholz (VocalTractLab). The Directions Into Velocities of Articulators (DIVA) model, a feedforward control approach which takes the neural computations underlying speech production into consideration, was developed by Frank H. Guenther at Boston University. The ArtiSynth project, headed by Sidney Fels at the University of British Columbia, is a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by a number of scientists, including Reiner Wilhelms-Tricarico, Yohan Payan and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.


Commercial models

One of the few commercial articulatory speech synthesis systems is the NeXT-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary, where much of the original research was conducted. Following the demise of the various incarnations of NeXT (started by Steve Jobs in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under a GNU General Public License, with work continuing as gnuspeech. The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by René Carré's "distinctive region model" (see "Real-time articulatory speech-synthesis-by-rules").


See also

* Articulatory phonetics
* Articulatory phonology
* Neurocomputational speech processing
* Praat
* Speech synthesis


Bibliography

* Baxter, Brent, and William J. Strong. (1969). WINDBAG—a vocal-tract analog speech synthesizer. ''Journal of the Acoustical Society of America'', 45, 309(A).
* Birkholz, P., Jackel, D., and Kröger, B. J. (2007). Simulation of losses due to turbulence in the time-varying vocal system. ''IEEE Transactions on Audio, Speech, and Language Processing'', 15, 1218–1225.
* Birkholz, P., Jackel, D., and Kröger, B. J. (2006). Construction and control of a three-dimensional vocal tract model. ''Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006)'' (Toulouse, France), pp. 873–876.
* Coker, C. H. (1968). Speech synthesis with a parametric articulatory model. ''Proc. Speech. Symp., Kyoto, Japan'', paper A-4.
* Dennis, Jack B. (1963). Computer control of an analog vocal tract. ''Journal of the Acoustical Society of America'', 35, 1115(A).
* Engwall, O. (2003). Combining MRI, EMA & EPG measurements in a three-dimensional tongue model. ''Speech Communication'', 41, 303–329.
* Fant, C. Gunnar M. (1960). ''Acoustic theory of speech production''. The Hague, Mouton.
* Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Unpublished doctoral dissertation, MIT, Cambridge, MA.
* Honda, Takashi, Seiichi Inoue, and Yasuo Ogawa. (1968). A hybrid control system of a human vocal tract simulator. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 175–8. Tokyo, International Council of Scientific Unions.
* Kelly, John L., and Carol Lochbaum. (1962). Speech synthesis. ''Proceedings of the Speech Communications Seminar'', paper F7. Stockholm, Speech Transmission Laboratory, Royal Institute of Technology.
* Kempelen, Wolfgang R. Von. (1791). ''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine''. Wien, J. B. Degen.
* Maeda, S. (1988). Improved articulatory model. ''Journal of the Acoustical Society of America'', 84, Sup. 1, S146.
* Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle and A. Marchal (Eds.), ''Speech Production and Speech Modelling'', Kluwer Academic, Dordrecht, 131–149.
* Matsui, Eiichi. (1968). Computer-simulated vocal organs. ''Reports of the 6th International Congress on Acoustics'', ed. by Y. Kohasi, pp. 151–4. Tokyo, International Council of Scientific Unions.
* Mermelstein, Paul. (1969). Computer simulation of articulatory activity in speech production. ''Proceedings of the International Joint Conference on Artificial Intelligence'', Washington, D.C., 1969, ed. by D. E. Walker and L. M. Norton. New York, Gordon & Breach.
* Rubin, P., Saltzman, E., Goldstein, L., McGowan, R., Tiede, M., & Browman, C. (1996). CASY and extensions to the task-dynamic model. ''Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar'', 125–128.


External links

* Introduction to Articulatory Speech Synthesis
* Pink Trombone bare-handed speech synthesis online tool