Source–filter Model

The source–filter model represents speech as a combination of a sound source, such as the vocal cords, and a linear acoustic filter, the vocal tract. While only an approximation, the model is widely used in applications such as speech synthesis and speech analysis because of its relative simplicity. It is also related to linear prediction. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis. Fant built on the work of Tsutomu Chiba and Masato Kajiyama, who first showed the relationship between a vowel's acoustic properties and the shape of the vocal tract. An important assumption often made when using the source–filter model is that the source and the filter are independent. In such cases, the model is more accurately referred to as the "independent source–filter model".


History

In 1942, Chiba and Kajiyama published their research on vowel acoustics and the vocal tract in their book, ''The Vowel: Its nature and structure''. By creating models of the vocal tract using X-ray photography, they were able to predict the formant frequencies of different vowels, establishing a relationship between the two. Gunnar Fant, a pioneering speech scientist, used Chiba and Kajiyama's research involving X-ray photography of the vocal tract to interpret his own data of Russian speech sounds in ''Acoustic Theory of Speech Production'', which established the source–filter model.


Applications

To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have at least one source due to mostly periodic glottal excitation, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, for example, tongue position and lip protrusion. Fricatives, on the other hand, have at least one source due to turbulent noise produced at a constriction in the oral cavity or pharynx. So-called ''voiced fricatives'' have two sources: one at the glottis and one at the supra-glottal constriction.
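The correspondence between an impulse train in the time domain and harmonics in the frequency domain can be checked numerically. The following is a minimal sketch, assuming an idealized unit impulse train and a naive discrete Fourier transform. The function names and the numeric values (an 8 kHz sampling rate and a 100 Hz fundamental) are illustrative assumptions, not part of the model itself.

```python
import cmath

def impulse_train(n_samples, period):
    """Idealized voiced source (assumption): one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def dft_magnitudes(x):
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| for every bin k."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

sample_rate = 8000                           # Hz, assumed for illustration
f0 = 100                                     # Hz, assumed fundamental frequency
period = sample_rate // f0                   # 80 samples per glottal cycle
source = impulse_train(4 * period, period)   # exactly four cycles (320 samples)

mags = dft_magnitudes(source)
bin_hz = sample_rate / len(source)           # 25 Hz per DFT bin

# Frequencies below Nyquist that carry energy: the DC term plus multiples of f0.
harmonics_hz = [k * bin_hz for k, m in enumerate(mags[: len(mags) // 2]) if m > 1e-6]
print(harmonics_hz[:5])   # [0.0, 100.0, 200.0, 300.0, 400.0]
```

Because the analysis window holds a whole number of cycles, all energy falls exactly on bins at multiples of the fundamental, each with the same magnitude. A real glottal pulse is smoother than a unit impulse, so its harmonics fall off in amplitude with frequency.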


Speech synthesis

In implementations of the source–filter model of speech production, the sound source, or excitation signal, is often modelled as a periodic impulse train for voiced speech, or as white noise for unvoiced speech. The vocal tract filter is, in the simplest case, approximated by an all-pole filter whose coefficients are obtained by linear prediction, that is, by minimizing the mean-squared prediction error on the speech signal to be reproduced. Convolving the excitation signal with the filter's impulse response then produces the synthesised speech.
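The analysis and synthesis steps described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not a reference implementation: the "vocal tract" is a single two-pole resonator (a real vowel needs several formants), linear prediction uses the autocorrelation method with the Levinson-Durbin recursion, and all function names and numeric values are hypothetical.

```python
import math

def impulse_train(n_samples, period):
    """Voiced excitation (assumption): a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients [a1, a2] of a two-pole resonator A(z) = 1 + a1 z^-1 + a2 z^-2."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius
    theta = 2.0 * math.pi * freq_hz / sample_rate         # pole angle
    return [-2.0 * r * math.cos(theta), r * r]

def all_pole_filter(excitation, a):
    """Apply 1/A(z): y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y

def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin.

    Returns a[0..order-1] minimizing the mean-squared error of the
    prediction x[n] ~ -sum_k a[k] * x[n-1-k].
    """
    r = autocorrelation(x, order)
    a, err = [], r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[k] * r[i - k] for k in range(i))
        refl = -acc / err                                  # reflection coefficient
        a = [a[k] + refl * a[i - 1 - k] for k in range(i)] + [refl]
        err *= 1.0 - refl * refl
    return a

sample_rate = 8000                                # Hz, assumed
a_true = resonator(500.0, 100.0, sample_rate)     # hypothetical single formant

# Analysis: estimate the filter from a signal it produced.
recorded = all_pole_filter([1.0] + [0.0] * 799, a_true)
a_est = lpc(recorded, order=2)

# Synthesis: drive the estimated filter with a 100 Hz impulse train.
excitation = impulse_train(800, period=80)
speech = all_pole_filter(excitation, a_est)
print(a_true, a_est)
```

Running the recursive difference equation is equivalent to convolving the excitation with the filter's impulse response. Replacing `impulse_train` with a white-noise sequence would give the unvoiced case with the same filter.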


Modeling human speech production

In human speech production, the sound source is the vocal folds, which can produce a periodic sound when constricted or an aperiodic (white noise) sound when relaxed. The filter is the rest of the vocal tract, which can change shape through manipulation of the pharynx, mouth, and nasal cavity. Fant roughly compares the source and filter to phonation and articulation, respectively. The source produces a number of harmonics of varying amplitudes, which travel through the vocal tract and are either amplified or attenuated to produce a speech sound.
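How a filter amplifies some harmonics and attenuates others can be illustrated by evaluating a filter's gain at each harmonic frequency. The sketch below assumes a single two-pole resonator standing in for the vocal tract and a source whose harmonic amplitudes fall off as 1/k. Both are simplifying assumptions chosen for illustration, and the names and numbers are hypothetical.

```python
import cmath
import math

def resonator(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients [a1, a2] of a two-pole resonator A(z) = 1 + a1 z^-1 + a2 z^-2."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [-2.0 * r * math.cos(theta), r * r]

def gain(a, freq_hz, sample_rate):
    """Magnitude response |1 / A(e^{jw})| of the all-pole filter at freq_hz."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    denom = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / abs(denom)

sample_rate = 8000                           # Hz, assumed
f0 = 100                                     # Hz, assumed fundamental
a = resonator(500.0, 80.0, sample_rate)      # hypothetical formant near 500 Hz

harmonics_hz = [f0 * k for k in range(1, 31)]          # 100 Hz ... 3000 Hz
source_amps = [1.0 / k for k in range(1, 31)]          # assumed source roll-off
gains = [gain(a, f, sample_rate) for f in harmonics_hz]
output_amps = [s * g for s, g in zip(source_amps, gains)]

strongest = harmonics_hz[gains.index(max(gains))]
for f, g in zip(harmonics_hz, gains):
    label = "amplified" if g > 1.0 else "attenuated"
    print(f"{f:5d} Hz  gain {g:7.3f}  {label}")
print("largest gain at", strongest, "Hz")
```

Harmonics close to the resonance receive a gain above 1, while those far above it receive a gain below 1. The output spectrum is the source spectrum multiplied by these gains, harmonic by harmonic.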


See also

* Inverse filter

