A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or oscillating signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.
PDAs are used in various contexts (e.g. phonetics, music information retrieval, speech coding, musical performance systems) and so there may be different demands placed upon the algorithm. There is as yet no single ideal PDA, so a variety of algorithms exist, most falling broadly into the classes given below.
A PDA typically estimates the period of a quasiperiodic signal, then inverts that value to give the frequency.
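As a worked example of this inversion (the symbols are introduced here for illustration): if the estimated period is T_0 seconds, or equivalently a lag of τ samples at sampling rate f_s, then

```latex
f_0 = \frac{1}{T_0} = \frac{f_s}{\tau},
\qquad \text{e.g.}\quad \tau = 200,\ f_s = 44\,100\ \text{Hz} \;\Rightarrow\; f_0 = \frac{44\,100}{200} = 220.5\ \text{Hz}.
```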
General approaches
One simple approach would be to measure the distance between zero-crossing points of the signal (i.e. the zero-crossing rate). However, this does not work well with complicated waveforms composed of multiple sine waves with differing periods, or with noisy data. Nevertheless, there are cases in which zero-crossing can be a useful measure, e.g. in some speech applications where a single source is assumed. The algorithm's simplicity makes it "cheap" to implement.
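A minimal sketch of the zero-crossing idea, assuming a single clean source; the function name `zero_crossing_pitch` is hypothetical. It averages the spacing between upward (negative to non-negative) crossings, which equals one period for a pure tone. Harmonics or noise add extra crossings, which is why the estimate degrades on complicated waveforms.

```python
import math


def zero_crossing_pitch(x, fs):
    """Estimate f0 (Hz) from the mean spacing of upward zero crossings.

    Assumes one clean, roughly sinusoidal source; harmonics or noise add
    extra crossings and bias the estimate upward.
    """
    # Sample indices where the signal goes from negative to non-negative.
    ups = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    if len(ups) < 2:
        return None  # fewer than two crossings: no period can be measured
    # Mean distance between successive upward crossings, in samples.
    period = (ups[-1] - ups[0]) / (len(ups) - 1)
    return fs / period


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 220 * n / fs + 0.3) for n in range(fs)]
    print(zero_crossing_pitch(tone, fs))  # close to 220
```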
More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASMDF (average squared mean difference function), and other similar autocorrelation algorithms work this way. These algorithms can give quite accurate results for highly periodic signals. However, they suffer from false detections (often octave errors, in which the reported pitch is an octave above or below the true one), can sometimes cope badly with noisy signals (depending on the implementation), and, in their basic implementations, do not deal well with polyphonic sounds (which involve multiple musical notes of different pitches).
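The trial-period comparison can be sketched with the AMDF, assuming a single analysis frame and a lag search range derived from hypothetical `fmin`/`fmax` bounds. For each lag τ the sketch averages |x[n] − x[n+τ]| and takes the lag with the smallest value as the period. Multiples of the true period (2τ, 3τ, …) also give small differences, so taking the global minimum over a wide lag range is one way such a basic implementation can produce octave errors.

```python
import math


def amdf(x, tau):
    """Average magnitude difference between the frame and itself shifted by tau samples."""
    n = len(x) - tau
    return sum(abs(x[i] - x[i + tau]) for i in range(n)) / n


def amdf_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Return fs / (lag with the smallest AMDF) within [fs/fmax, fs/fmin]."""
    tau_min = max(1, int(fs / fmax))
    tau_max = min(len(x) - 1, int(fs / fmin))
    # min() keeps the first lag if several tie; a practical detector needs
    # extra logic to avoid landing on a multiple of the true period.
    best_tau = min(range(tau_min, tau_max + 1), key=lambda t: amdf(x, t))
    return fs / best_tau


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1024)]
    print(amdf_pitch(frame, fs, fmin=150.0, fmax=1000.0))  # 200.0
```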
Current time-domain pitch detection algorithms tend to build upon the basic methods mentioned above, with additional refinements to bring the performance more in line with a human assessment of pitch. For example, the YIN algorithm and the MPM algorithm are both based upon autocorrelation.
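A simplified, YIN-style refinement can be sketched as three steps following the description by de Cheveigné and Kawahara: a squared difference function, its cumulative mean normalization, and an absolute threshold that picks the first sufficiently deep dip rather than the global minimum. This is not a reference implementation: the later interpolation and best-local-estimate steps are omitted, and the function name, default bounds and 0.1 threshold are assumptions for illustration.

```python
import math


def yin_pitch(x, fs, fmin=60.0, fmax=500.0, threshold=0.1):
    """Simplified YIN-style f0 estimate for one frame, or None if unvoiced."""
    tau_max = min(len(x) // 2, int(fs / fmin))
    tau_min = max(2, int(fs / fmax))
    w = len(x) - tau_max  # same number of summed terms for every lag
    # Squared difference function d(tau).
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(w))
    # Cumulative mean normalized difference: d'(tau) = d(tau) / mean(d(1..tau)).
    dn = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        dn[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Absolute threshold: first lag below it, then slide down to the local minimum.
    for tau in range(tau_min, tau_max + 1):
        if dn[tau] < threshold:
            while tau + 1 <= tau_max and dn[tau + 1] < dn[tau]:
                tau += 1
            return fs / tau
    return None  # no dip below the threshold: treat the frame as unvoiced


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1024)]
    print(yin_pitch(frame, fs))  # 200.0
```

Because the first qualifying dip is taken, lags at multiples of the period are never reached once the true period passes the threshold, which is how this normalization reduces the octave errors of a plain global-minimum search.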
Frequency-domain approaches
In the frequency domain, polyphonic detection is possible, usually using the periodogram to convert the signal to an estimate of the frequency spectrum. This requires more processing power as the desired accuracy increases, although the well-known efficiency of the FFT, a key part of the periodogram algorithm, makes it suitably efficient for many purposes.
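A minimal sketch of periodogram peak picking, using a naive O(N²) DFT so that it needs only the Python standard library; practical implementations use the FFT instead. The bin spacing is fs/N, so at a fixed sample rate a finer frequency resolution needs a longer frame and more computation. The function names are hypothetical, and reporting the single largest bin is a simplification: when a harmonic is the strongest partial, that harmonic is reported instead of the fundamental.

```python
import cmath
import math


def periodogram(x):
    """|DFT(x)|^2 / N for bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def peak_frequency(x, fs):
    """Centre frequency of the strongest non-DC bin; resolution is fs / len(x)."""
    p = periodogram(x)
    k = max(range(1, len(p)), key=lambda i: p[i])
    return k * fs / len(x)


if __name__ == "__main__":
    fs, n = 8000, 400  # 20 Hz bin spacing
    tone = [math.sin(2 * math.pi * 445 * t / fs) for t in range(n)]
    print(peak_frequency(tone, fs))  # 440.0, the nearest bin centre
```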
Popular frequency-domain algorithms include: the harmonic product spectrum (see "Pitch Detection Algorithms", an online resource from Connexions); cepstral analysis; maximum likelihood, which attempts to match the frequency-domain characteristics to predefined frequency maps (useful for detecting the pitch of fixed-tuning instruments); and the detection of peaks due to harmonic series.
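The harmonic product spectrum can be sketched as follows, again with a naive DFT so it runs with only the Python standard library. Multiplying |X[k]|·|X[2k]|·…·|X[Hk]| lines up the harmonics of f0 at the bin of f0 itself, so the product peaks there even when a higher harmonic is the strongest single peak. The function names, the choice H = 3 and the unwindowed frame are assumptions for illustration.

```python
import cmath
import math


def magnitude_spectrum(x):
    """|DFT(x)| for bins 0..N/2 (naive O(N^2) DFT; an FFT would be used in practice)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def hps_pitch(x, fs, harmonics=3):
    """Harmonic product spectrum: argmax over k of |X[k]| * |X[2k]| * ... * |X[Hk]|."""
    mag = magnitude_spectrum(x)
    top = (len(mag) - 1) // harmonics  # largest k whose H-th multiple is still a bin
    k = max(range(1, top + 1),
            key=lambda b: math.prod(mag[h * b] for h in range(1, harmonics + 1)))
    return k * fs / len(x)


if __name__ == "__main__":
    fs, n = 8000, 400  # 20 Hz bins
    # Fundamental at 200 Hz is weaker than its 400 Hz and 600 Hz harmonics.
    sig = [0.3 * math.sin(2 * math.pi * 200 * t / fs)
           + math.sin(2 * math.pi * 400 * t / fs)
           + 0.8 * math.sin(2 * math.pi * 600 * t / fs) for t in range(n)]
    print(hps_pitch(sig, fs))  # 200.0, although the largest single bin is at 400 Hz
```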
To improve on the pitch estimate derived from the discrete Fourier spectrum, techniques such as spectral reassignment (phase based) or Grandke interpolation (magnitude based) can be used to go beyond the precision provided by the FFT bins. Another phase-based approach is offered by Brown and Puckette.
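Grandke interpolation can be sketched for a Hann-windowed frame. If α is the ratio of the larger neighboring bin magnitude to the peak-bin magnitude, the offset of the true frequency from the peak bin, in bins, is δ = (2α − 1)/(α + 1), taken towards that neighbor; this follows from the large-N shape of the Hann window's main lobe. The Python below (naive DFT, hypothetical function names) is an illustrative sketch rather than Grandke's own code, and the phase-based methods are not shown.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window, the form assumed by the interpolation formula."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(x):
    """|DFT(x)| for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def grandke_frequency(x, fs):
    """Peak frequency refined to a fraction of a bin with the Hann-window formula."""
    n = len(x)
    mag = magnitude_spectrum([a * w for a, w in zip(x, hann(n))])
    k = max(range(1, len(mag) - 1), key=lambda i: mag[i])
    # Interpolate towards whichever neighbor of the peak bin is larger.
    side = 1 if mag[k + 1] >= mag[k - 1] else -1
    alpha = mag[k + side] / mag[k]
    delta = side * (2 * alpha - 1) / (alpha + 1)
    return (k + delta) * fs / n


if __name__ == "__main__":
    fs, n = 8000, 512  # bin spacing 15.625 Hz
    f_true = 1004.6875  # 0.3 bin above bin 64 (bin 64 alone reads 1000 Hz)
    tone = [math.sin(2 * math.pi * f_true * t / fs) for t in range(n)]
    print(grandke_frequency(tone, fs))  # close to f_true
```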
Spectral/temporal approaches
Spectral/temporal pitch detection algorithms, e.g. the YAAPT pitch tracking algorithm (Stephen A. Zahorian and Hongbing Hu, "YAAPT Pitch Tracking MATLAB Function"), are based upon a combination of time-domain processing using an autocorrelation function such as normalized cross-correlation, and frequency-domain processing utilizing spectral information to identify the pitch. Then, among the candidates estimated from the two domains, a final pitch track can be computed using dynamic programming. The advantage of these approaches is that tracking errors in one domain can be reduced by the processing in the other domain.
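The dynamic-programming step can be sketched as a Viterbi-style search over per-frame candidates. In this sketch each frame supplies hypothetical (frequency, local cost) pairs, as if merged from time-domain and frequency-domain estimators, and a jump between frames costs `jump_weight` times the absolute log2 frequency ratio, so a one-octave jump costs `jump_weight`. These cost definitions are assumptions for illustration, not YAAPT's actual cost functions.

```python
import math


def track_pitch(candidates, jump_weight=1.0):
    """Pick one f0 per frame from `candidates`, minimizing total path cost.

    candidates[t] is a list of (f0_hz, local_cost) pairs for frame t. The
    path cost is the sum of local costs plus jump_weight * |log2(f_t / f_prev)|
    for every frame-to-frame transition.
    """
    def jump(f_prev, f):
        return jump_weight * abs(math.log2(f / f_prev))

    # best[t][i]: cheapest total cost of a path ending at candidate i of frame t.
    best = [[c for _, c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        row, ptr = [], []
        for f, c in candidates[t]:
            j = min(range(len(prev)), key=lambda p: best[t - 1][p] + jump(prev[p][0], f))
            row.append(c + best[t - 1][j] + jump(prev[j][0], f))
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest candidate in the last frame.
    i = min(range(len(best[-1])), key=lambda k: best[-1][k])
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][i][0])
        i = back[t][i]
    return path[::-1]


if __name__ == "__main__":
    # Frame 1's cheapest local candidate (400 Hz) is an octave error.
    frames = [
        [(200.0, 0.1), (400.0, 0.5)],
        [(400.0, 0.3), (200.0, 0.35)],
        [(200.0, 0.1), (100.0, 0.6)],
    ]
    print(track_pitch(frames))                   # [200.0, 200.0, 200.0]
    print(track_pitch(frames, jump_weight=0.0))  # [200.0, 400.0, 200.0]
```

With `jump_weight=0.0` the search reduces to choosing the locally cheapest candidate per frame, which keeps the octave error; the continuity term is what removes it.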
Speech pitch detection
The fundamental frequency of speech can vary from 40 Hz for low-pitched voices to 600 Hz for high-pitched voices.
Autocorrelation methods need at least two pitch periods to detect pitch. This means that in order to detect a fundamental frequency of 40 Hz, whose period is 25 milliseconds (ms), at least 50 ms of the speech signal must be analyzed. However, speech with a higher fundamental frequency may not keep the same fundamental frequency throughout a 50 ms window.
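The window-length arithmetic can be checked with a small helper; the function name and the 16 kHz sampling rate in the example are assumptions for illustration. It returns the shortest window, in milliseconds and in samples, that spans the required number of periods of the lowest expected fundamental.

```python
import math


def min_window(f0_min_hz, fs_hz, periods=2):
    """Shortest window, as (milliseconds, samples), spanning `periods` cycles of f0_min_hz."""
    ms = 1000.0 * periods / f0_min_hz
    samples = math.ceil(periods * fs_hz / f0_min_hz)
    return ms, samples


if __name__ == "__main__":
    ms, samples = min_window(40, 16000)
    print(ms, samples)  # 50.0 800
    # The same 50 ms window spans 600 * 0.050 = 30 periods of a 600 Hz voice.
    print(600 * ms / 1000.0)
```

The contrast in the last line is the source of the difficulty: a window long enough for the lowest voices covers many periods of a high voice, during which its pitch may drift.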
See also
* Auto-Tune
* Beat detection
* Frequency estimation
* Linear predictive coding
* MUSIC (algorithm)
* Sinusoidal model
External links
* Alain de Cheveigné and Hideki Kawahara: YIN, a fundamental frequency estimator for speech and music
* AudioContentAnalysis.org: Matlab code for various pitch detection algorithms