In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmonic sounds, with this definition, the formant frequency is sometimes taken as that of the harmonic that is most augmented by a resonance. The difference between these two definitions resides in whether "formants" characterise the production mechanisms of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs slightly from the associated resonance frequency, except when, by luck, harmonics are aligned with the resonance frequency, or when the sound source is mostly non-harmonic, as in whispering and vocal fry.
A room can be said to have formants characteristic of that particular room, due to its resonances, i.e., to the way sound reflects from its walls and objects. Room formants of this nature reinforce themselves by emphasizing specific frequencies and absorbing others, as exploited, for example, by Alvin Lucier in his piece ''I Am Sitting in a Room''. In acoustic digital signal processing, the way a collection of formants (such as a room) affects a signal can be represented by an impulse response.
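As a minimal sketch of the impulse-response idea, the Python example below builds an impulse response from three idealised resonances and applies it to a noise source by convolution. It assumes NumPy is available; the resonance frequencies and bandwidths are hypothetical values chosen only for illustration, and modelling each resonance as an exponentially decaying sinusoid is a simplifying assumption rather than a description of any real room or vocal tract.

```python
import numpy as np

def resonator_impulse_response(freq_hz, bandwidth_hz, sample_rate, duration_s=0.05):
    """Impulse response of one idealised resonance: a decaying sinusoid."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    # A wider bandwidth means a faster decay in time.
    return np.exp(-np.pi * bandwidth_hz * t) * np.sin(2 * np.pi * freq_hz * t)

sample_rate = 16000

# Hypothetical (frequency, bandwidth) pairs in Hz, for illustration only.
resonances = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]

# The combined effect of all resonances is the sum of their impulse responses.
impulse_response = sum(
    resonator_impulse_response(f, b, sample_rate) for f, b in resonances
)

# A source (here 0.5 s of white noise) is shaped by convolving it with
# the impulse response; the resonances filter the source, they do not create it.
rng = np.random.default_rng(0)
source = rng.standard_normal(sample_rate // 2)
filtered = np.convolve(source, impulse_response)
```

The spectrum of `filtered` shows broad peaks near the chosen resonance frequencies, whatever source is used.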
In both speech and rooms, formants are characteristic features of the resonances of the space. They are said to be ''excited'' by acoustic sources such as the voice, and they shape (filter) the sources' sounds, but they are not sources themselves.
History
From an acoustic point of view, phonetics had a serious problem with the idea that the effective length of the vocal tract changed vowels. Indeed, when the length of the vocal tract changes, all the acoustic resonators formed by mouth cavities are scaled, and so are their resonance frequencies. Therefore, it was unclear how vowels could depend on frequencies when talkers with different vocal tract lengths, for instance bass and soprano singers, can produce sounds that are perceived as belonging to the same phonetic category. There had to be some way to normalize the spectral information underpinning the vowel identity.
Hermann suggested a solution to this problem in 1894, coining the term “formant”. A vowel, according to him, is a special acoustic phenomenon, depending on the intermittent production of a special partial, or “formant”, or “characteristique” feature. The frequency of the “formant” may vary a little without altering the character of the vowel. For “long e” (''ee'' or ''iy'') for example, the lowest-frequency “formant” may vary from 350 to 440 Hz even in the same person.
[McKendrick, J. G. (1903). Experimental phonetics. In Annual report of the board of regents of the Smithsonian institution for the year ending June 30, 1902 (pp. 241–259). Smithsonian Institution.]
Phonetics
Formants are distinctive frequency components of the acoustic signal produced by speech, musical instruments or singing. The information that humans require to distinguish between speech sounds can be represented purely quantitatively by specifying peaks in the frequency spectrum.
Most of these formants are produced by tube and chamber resonance, but a few whistle tones derive from periodic collapse of Venturi effect low-pressure zones.
The formant with the lowest frequency is called ''F''1, the second ''F''2, the third ''F''3, and so forth. The fundamental frequency or pitch of the voice is sometimes referred to as ''F''0, but it is not a formant. Most often the two first formants, ''F''1 and ''F''2, are sufficient to identify the vowel. The relationship between the perceived vowel quality and the first two formant frequencies can be appreciated by listening to "artificial vowels" that are generated by passing a click train (to simulate the glottal pulse train) through a pair of bandpass filters (to simulate vocal tract resonances).
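The artificial-vowel construction described above can be sketched in Python as follows. This is a minimal illustration assuming NumPy; the function names, the two-pole resonator design, the parallel arrangement of the two filters, the bandwidths, and the formant values passed in at the end are all assumptions made for the example, not details given in the text.

```python
import numpy as np

def click_train(f0_hz, duration_s, sample_rate):
    """Unit impulses one pitch period apart (a crude glottal pulse train)."""
    n = int(round(duration_s * sample_rate))
    signal = np.zeros(n)
    period = int(round(sample_rate / f0_hz))
    signal[::period] = 1.0
    return signal

def resonator(x, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole band-pass filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)   # pole radius
    a1 = 2.0 * r * np.cos(2.0 * np.pi * freq_hz / sample_rate)
    a2 = -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def artificial_vowel(f1_hz, f2_hz, f0_hz=100.0, duration_s=0.5, sample_rate=16000):
    """Click train passed through two resonators standing in for F1 and F2."""
    source = click_train(f0_hz, duration_s, sample_rate)
    # Assumption: the two filters act in parallel and their outputs are summed.
    out = (resonator(source, f1_hz, 80.0, sample_rate)
           + resonator(source, f2_hz, 100.0, sample_rate))
    return out / np.max(np.abs(out))   # normalise to the range [-1, 1]

# Rough, illustrative formant values only (low F1 with high F2, then the reverse).
vowel_a = artificial_vowel(300.0, 2300.0)
vowel_b = artificial_vowel(750.0, 1200.0)
```

Writing the two arrays to audio files and listening to them gives two clearly different vowel-like sounds at the same pitch, since only the filter frequencies differ.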
Front vowels have higher ''F''2, while low vowels have higher ''F''1. Lip rounding tends to lower ''F''1 and ''F''2 in back vowels and ''F''2 and ''F''3 in front vowels.
Nasal consonants usually have an additional formant around 2500 Hz. The liquid [l] usually has an extra formant at 1500 Hz, whereas the English "r" sound ([ɹ]) is distinguished by a very low third formant (well below 2000 Hz).
Plosives (and, to some degree, fricatives) modify the placement of formants in the surrounding vowels. Bilabial sounds (such as /b/ and /p/ in "ball" or "sap") cause a lowering of the formants; on spectrograms, velar sounds (/k/ and /ɡ/ in English) almost always show ''F''2 and ''F''3 coming together in a 'velar pinch' before the velar and separating from the same 'pinch' as the velar is released; alveolar sounds (English /t/ and /d/) cause fewer systematic changes in neighbouring vowel formants, depending partially on exactly which vowel is present. The time course of these changes in vowel formant frequencies is referred to as 'formant transitions'.
In normal voiced speech, the underlying vibration produced by the vocal folds resembles a sawtooth wave, rich in harmonic overtones. If the fundamental frequency or (more often) one of the overtones is higher than a resonance frequency of the system, then the resonance will be only weakly excited and the formant usually imparted by that resonance will be mostly lost. This is most apparent in the case of soprano opera singers, who sing at pitches high enough that their vowels become very hard to distinguish.
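The effect of a high fundamental can be illustrated numerically. The sketch below, assuming NumPy, evaluates how strongly the harmonic nearest to a single resonance is amplified for a low and a high fundamental. The resonance at 500 Hz with an 80 Hz bandwidth is a hypothetical value, and the bell-shaped gain curve is a simple approximation of a resonance peak chosen for illustration.

```python
import numpy as np

def resonance_gain(freq_hz, centre_hz, bandwidth_hz):
    """Approximate gain of one resonance, normalised to 1 at its centre."""
    half_bw = bandwidth_hz / 2.0
    return half_bw / np.sqrt((freq_hz - centre_hz) ** 2 + half_bw ** 2)

def strongest_harmonic(f0_hz, centre_hz, bandwidth_hz, max_hz=5000.0):
    """Return (frequency, gain) of the harmonic most amplified by the resonance."""
    harmonics = f0_hz * np.arange(1, int(max_hz // f0_hz) + 1)
    gains = resonance_gain(harmonics, centre_hz, bandwidth_hz)
    best = int(np.argmax(gains))
    return harmonics[best], gains[best]

# Hypothetical resonance: 500 Hz centre frequency, 80 Hz bandwidth.
low_voice = strongest_harmonic(110.0, 500.0, 80.0)    # closely spaced harmonics
high_voice = strongest_harmonic(880.0, 500.0, 80.0)   # fundamental above the resonance

print(low_voice)    # nearest harmonic is 550 Hz, gain about 0.62
print(high_voice)   # nearest harmonic is 880 Hz, gain about 0.10
```

With a 110 Hz fundamental some harmonic always falls close to the resonance, whereas with an 880 Hz fundamental even the lowest harmonic lies well above it, so the resonance leaves little trace in the produced sound.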
Control of resonances is an essential component of the vocal technique known as overtone singing, in which the performer sings a low fundamental tone, and creates sharp resonances to select upper harmonics, giving the impression of several tones being sung at once.
Spectrograms may be used to visualise formants. In spectrograms, it can be hard to distinguish formants from naturally occurring harmonics when one sings. However, one can hear the natural formants in a vowel shape through atonal techniques such as vocal fry.
Formant estimation
Formants, whether they are seen as acoustic resonances of the vocal tract, or as local maxima in the speech spectrum, like band-pass filters, are defined by their frequency and by their spectral width (bandwidth).
Different methods exist to obtain this information. Formant frequencies, in their acoustic definition, can be estimated from the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer. However, to estimate the acoustic resonances of the vocal tract (i.e. the speech definition of formants) from a speech recording, one can use ''linear predictive coding''. An intermediate approach consists in extracting the spectral envelope by neutralizing the fundamental frequency, and only then looking for local maxima in the spectral envelope.
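One common way to apply linear predictive coding to this problem is to fit an all-pole model to a frame of the signal and read resonance frequencies and bandwidths from the pole positions. The Python sketch below, assuming NumPy, shows that idea using the autocorrelation method. The function names, the model order, and the thresholds used to discard poles are assumptions made for illustration, and the test signal is synthetic noise filtered by two known resonances (loosely comparable to whispering), not a real recording.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., ap] of the error filter."""
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]   # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])             # normal equations
    return np.concatenate(([1.0], a))

def estimate_formants(frame, sample_rate, order):
    """Resonance frequencies and bandwidths (Hz) from the LPC pole positions."""
    poles = np.roots(lpc_coefficients(frame, order))
    poles = poles[np.imag(poles) > 0]                     # one pole per conjugate pair
    freqs = np.angle(poles) * sample_rate / (2 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * sample_rate / np.pi
    keep = (freqs > 90) & (bandwidths < 400)              # assumed plausibility limits
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bandwidths[keep][idx]

def synthetic_whisper(formants, sample_rate, n_samples, seed=0):
    """White noise passed through an all-pole filter with known resonances."""
    poles = []
    for freq_hz, bw_hz in formants:
        radius = np.exp(-np.pi * bw_hz / sample_rate)
        angle = 2 * np.pi * freq_hz / sample_rate
        poles += [radius * np.exp(1j * angle), radius * np.exp(-1j * angle)]
    denom = np.real(np.poly(poles))
    noise = np.random.default_rng(seed).standard_normal(n_samples)
    out = np.zeros(n_samples)
    for n in range(n_samples):
        acc = noise[n]
        for k in range(1, min(n, len(denom) - 1) + 1):
            acc -= denom[k] * out[n - k]
        out[n] = acc
    return out

# Hypothetical resonances at 700 Hz and 2000 Hz, sampled at 8 kHz.
frame = synthetic_whisper([(700.0, 80.0), (2000.0, 120.0)], 8000, 4000)
est_freqs, est_bandwidths = estimate_formants(frame, 8000, order=4)
```

On this synthetic signal the estimated frequencies should land close to the two resonances used to generate it. Real speech usually calls for a higher model order and short analysis frames, and dedicated tools such as Praat implement more careful versions of this procedure.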
Formant plots

The first two formants are important in determining the quality of vowels, and are frequently said to correspond to the open/close (or low/high) and front/back dimensions (which have traditionally been associated with the shape and position of the tongue). Thus the first formant ''F''1 has a higher frequency for an open or low vowel such as [a] and a lower frequency for a closed or high vowel such as [i] or [u]; and the second formant ''F''2 has a higher frequency for a front vowel such as [i] and a lower frequency for a back vowel such as [u].
Vowels will almost always have four or more distinguishable formants, and sometimes more than six. However, the first two formants are the most important in determining vowel quality and are often plotted against each other in vowel diagrams, though this simplification fails to capture some aspects of vowel quality such as rounding.
Many writers have addressed the problem of finding an optimal alignment of the positions of vowels on formant plots with those on the conventional vowel quadrilateral. The pioneering work of Ladefoged used the Mel scale because this scale was claimed to correspond more closely to the auditory scale of pitch than to the acoustic measure of fundamental frequency expressed in Hertz. Two alternatives to the Mel scale are the Bark scale and the ERB-rate scale. Another widely adopted strategy is plotting the difference between ''F''1 and ''F''2 rather than ''F''2 on the horizontal axis.
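The conversions involved in such plots can be sketched in Python as below, assuming NumPy. The formulas are commonly used published approximations of the mel, Bark and ERB-rate scales; several variants of each exist, and these are not necessarily the ones used in the work cited above. The helper `plot_coordinates` is hypothetical, and taking the ''F''2 minus ''F''1 difference after conversion to the auditory scale is an assumption.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Mel scale, common logarithmic approximation (1000 Hz maps to about 1000 mel)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def hz_to_bark(f_hz):
    """Bark scale, Traunmueller's approximation."""
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def hz_to_erb_rate(f_hz):
    """ERB-rate scale, Glasberg and Moore's approximation."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def plot_coordinates(f1_hz, f2_hz, scale=hz_to_bark):
    """Hypothetical helper: (horizontal, vertical) = (F2 - F1, F1) on a given scale."""
    f1 = scale(f1_hz)
    f2 = scale(f2_hz)
    return f2 - f1, f1

# Illustrative formant values only, not measurements.
x, y = plot_coordinates(300.0, 2300.0, scale=hz_to_mel)
```

Vowel charts drawn this way conventionally reverse both axes so that the layout resembles the vowel quadrilateral, with close front vowels at the top left.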
Singer's formant
Studies of the frequency spectrum of trained speakers and classical singers, especially male singers, indicate a clear formant around 3000 Hz (between 2800 and 3400 Hz) that is absent in speech or in the spectra of untrained speakers or singers. It is thought to be associated with one or more of the higher resonances of the vocal tract.
It is this increase in energy at 3000 Hz which allows singers to be heard and understood over an orchestra. This formant is actively developed through vocal training, for instance through so-called ''voce di strega'' or "witch's voice" exercises and is caused by a part of the vocal tract acting as a resonator.
In classical music and vocal pedagogy, this phenomenon is also known as ''squillo''.
See also
* Formant synthesis
* Human voice
* Linear predictive coding
* Praat
* Timbre
* Vocoder
References
External links
Formants for fun and profit
A discussion of the three different meanings of the word 'formant'
from the University of New South Wales