A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or oscillating signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.

PDAs are used in various contexts (e.g. phonetics, music information retrieval, speech coding, musical performance systems), so different demands may be placed upon the algorithm. There is as yet no single ideal PDA, so a variety of algorithms exist, most falling broadly into the classes given below. A PDA typically estimates the period of a quasiperiodic signal, then inverts that value to give the frequency.

General approaches

One simple approach is to measure the distance between zero-crossing points of the signal (i.e. the zero-crossing rate). However, this does not work well with noisy data or with complicated waveforms composed of multiple sine waves with differing periods. Nevertheless, there are cases in which zero-crossing can be a useful measure, e.g. in some speech applications where a single source is assumed. The algorithm's simplicity makes it "cheap" to implement.

More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASMDF (average squared mean difference function), and other similar autocorrelation algorithms work this way. These algorithms can give quite accurate results for highly periodic signals. However, they have false detection problems (often "octave errors"), can sometimes cope badly with noisy signals (depending on the implementation), and, in their basic implementations, do not deal well with polyphonic sounds (which involve multiple musical notes of different pitches).

Current time-domain pitch detection algorithms tend to build upon the basic methods mentioned above, with additional refinements to bring the performance more in line with a human assessment of pitch. For example, the YIN algorithm and the MPM algorithm are both based upon autocorrelation.
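The two time-domain ideas described above, counting zero crossings and searching for the lag at which the signal best matches a shifted copy of itself, can be sketched as follows. This is a minimal illustration in plain Python, not the YIN or MPM algorithm; the function names, the 50 to 500 Hz search range, and the synthetic test tones are assumptions made for the example.

```python
import math

def zero_crossing_pitch(signal, sample_rate):
    """Estimate frequency from the zero-crossing rate (two crossings per period)."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(signal) - 1) / sample_rate  # seconds spanned by the samples
    return crossings / (2.0 * duration)

def autocorrelation_pitch(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Pick the trial period (lag) with the largest autocorrelation, then invert it."""
    n = len(signal)
    min_lag = max(1, int(sample_rate / f_max))      # shortest period tried
    max_lag = min(int(sample_rate / f_min), n - 1)  # longest period tried
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Compare the signal with a copy of itself offset by `lag` samples.
        score = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag  # period in samples -> frequency in Hz

# Hypothetical test tones: a pure 200 Hz sine and a 200 Hz tone with two harmonics.
sample_rate = 8000
pure = [math.sin(2 * math.pi * 200 * i / sample_rate + 0.3) for i in range(2048)]
rich = [
    math.sin(t) + 0.5 * math.sin(2 * t) + 0.3 * math.sin(3 * t)
    for t in (2 * math.pi * 200 * i / sample_rate for i in range(2048))
]

print(zero_crossing_pitch(pure, sample_rate))
print(autocorrelation_pitch(rich, sample_rate))
```

Because the raw autocorrelation sum shrinks as the lag grows, this sketch leans toward the shortest matching period, which is one simple guard against picking a multiple of the true period. Practical detectors add normalization, thresholds, and interpolation on top of this.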

Frequency-domain approaches

In the frequency domain, polyphonic detection is possible, usually utilizing the periodogram to convert the signal to an estimate of the frequency spectrum. This requires more processing power as the desired accuracy increases, although the well-known efficiency of the FFT, a key part of the periodogram algorithm, makes it suitably efficient for many purposes.

Popular frequency domain algorithms include: the harmonic product spectrum (see "Pitch Detection Algorithms", online resource from Connexions); cepstral analysis; maximum likelihood, which attempts to match the frequency domain characteristics to pre-defined frequency maps (useful for detecting pitch of fixed tuning instruments); and the detection of peaks due to harmonic series.

To improve on the pitch estimate derived from the discrete Fourier spectrum, techniques such as spectral reassignment (phase based) or Grandke interpolation (magnitude based) can be used to go beyond the precision provided by the FFT bins. Another phase-based approach is offered by Brown and Puckette.
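As an illustration of the harmonic product spectrum named above, the sketch below multiplies the magnitude spectrum by copies of itself sampled at integer multiples of each bin, so that a bin whose harmonics all carry energy stands out. It is a simplified example under stated assumptions: the recursive FFT, the function names, the three-harmonic product, and the 8192 Hz sample rate (chosen so that 200 Hz falls exactly on a bin) are illustrative choices, not taken from a particular implementation.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def hps_pitch(frame, sample_rate, num_harmonics=3):
    """Return the frequency of the bin that maximizes the harmonic product."""
    n = len(frame)
    # A Hann window reduces leakage between neighbouring bins.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, hann)])
    magnitude = [abs(c) for c in spectrum[: n // 2]]
    best_bin, best_value = 1, 0.0
    for k in range(1, len(magnitude) // num_harmonics):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= magnitude[k * h]  # energy at the h-th multiple of bin k
        if product > best_value:
            best_bin, best_value = k, product
    return best_bin * sample_rate / n  # bin index -> Hz

# Hypothetical test frame: 200 Hz fundamental plus its 2nd and 3rd harmonics.
sample_rate = 8192
frame = [
    math.sin(t) + 0.5 * math.sin(2 * t) + 0.3 * math.sin(3 * t)
    for t in (2 * math.pi * 200 * i / sample_rate for i in range(1024))
]
print(hps_pitch(frame, sample_rate))
```

The estimate is quantized to the bin spacing (here 8 Hz), which is why refinements such as the interpolation and reassignment techniques mentioned above are used when finer precision is needed.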

Spectral/temporal approaches

Spectral/temporal pitch detection algorithms, e.g. the YAAPT pitch tracking algorithm (Stephen A. Zahorian and Hongbing Hu, "YAAPT Pitch Tracking MATLAB Function"), are based upon a combination of time domain processing using an autocorrelation function such as normalized cross correlation, and frequency domain processing utilizing spectral information to identify the pitch. Then, among the candidates estimated from the two domains, a final pitch track can be computed using dynamic programming. The advantage of these approaches is that the tracking error in one domain can be reduced by the process in the other domain.
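The dynamic programming step can be pictured with a generic Viterbi-style search over per-frame candidates. The sketch below is not YAAPT's actual cost function: the candidate format (frequency, local cost), the octave-distance transition penalty, and the `jump_weight` parameter are assumptions chosen only to show how a smooth track can win over locally attractive but inconsistent candidates.

```python
import math

def best_pitch_track(frames, jump_weight=1.0):
    """Choose one candidate per frame, minimizing local cost plus pitch jumps.

    frames: list of frames, each a list of (frequency_hz, local_cost) pairs.
    """
    # cost[j]: cheapest total cost of any track ending at candidate j.
    cost = [local for _, local in frames[0]]
    back = []  # back[t][j]: best predecessor in frame t for candidate j of frame t + 1
    for prev, cur in zip(frames, frames[1:]):
        new_cost, pointers = [], []
        for freq, local in cur:
            # Transition penalty: distance in octaves between consecutive pitches.
            options = [
                cost[i] + jump_weight * abs(math.log2(freq / prev_freq))
                for i, (prev_freq, _) in enumerate(prev)
            ]
            i_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[i_best] + local)
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    track = [frames[-1][j][0]]
    for t in range(len(back) - 1, -1, -1):
        j = back[t][j]
        track.append(frames[t][j][0])
    return track[::-1]

# Hypothetical candidates: the middle frame slightly prefers an octave-high value.
frames = [
    [(200.0, 0.1), (400.0, 0.5)],
    [(205.0, 0.3), (410.0, 0.2)],
    [(210.0, 0.1), (420.0, 0.6)],
]
print(best_pitch_track(frames))
```

In this example the middle frame's cheapest candidate on its own is 410 Hz, but the jump penalty makes the continuous track through 205 Hz cheaper overall.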

Speech pitch detection

The fundamental frequency of speech can vary from 40 Hz for low-pitched voices to 600 Hz for high-pitched voices. Autocorrelation methods need at least two pitch periods to detect pitch. This means that in order to detect a fundamental frequency of 40 Hz, at least 50 milliseconds (ms) of the speech signal must be analyzed. However, during 50 ms, speech with higher fundamental frequencies may not necessarily have the same fundamental frequency throughout the window.
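The window-length arithmetic above can be checked with a short calculation. The helper names below are hypothetical; the two-period requirement and the 40 Hz and 600 Hz limits come from the text.

```python
def min_window_ms(f0_min_hz, periods=2):
    """Shortest analysis window (ms) holding `periods` cycles of f0_min_hz."""
    return 1000.0 * periods / f0_min_hz

def periods_in_window(f0_hz, window_ms):
    """Number of pitch periods of f0_hz that fit in a window of window_ms."""
    return f0_hz * window_ms / 1000.0

print(min_window_ms(40))           # 50.0 ms for a 40 Hz voice
print(periods_in_window(600, 50))  # 30.0 periods of a 600 Hz voice
```

A 50 ms window therefore spans about 30 periods of a 600 Hz voice, which leaves ample time for the pitch to drift within a single analysis frame.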

External links

Alain de Cheveigne and Hideki Kawahara: YIN, a fundamental frequency estimator for speech and music

AudioContentAnalysis.org: Matlab code for various pitch detection algorithms