A phase vocoder is a type of vocoder-purposed algorithm which can interpolate information present in the frequency and time domains of audio signals by using phase information extracted from a frequency transform. The computer algorithm allows frequency-domain modifications to a digital sound file (typically time expansion/compression and pitch shifting).

At the heart of the phase vocoder is the short-time Fourier transform (STFT), typically coded using fast Fourier transforms. The STFT converts a time-domain representation of sound into a time-frequency representation (the "analysis" phase), allowing modifications to the amplitudes or phases of specific frequency components of the sound, before resynthesis of the time-frequency domain representation into the time domain by the inverse STFT. The time evolution of the resynthesized sound can be changed by modifying the time position of the STFT frames prior to the resynthesis operation, allowing for time-scale modification of the original sound file.
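The analysis, modification and resynthesis chain can be illustrated with a minimal NumPy sketch of time-scale modification. It is only one possible arrangement: the function names (`stft`, `istft`, `time_stretch`), the Hann window, and the frame size and hop (`n_fft=1024`, `hop=256`) are assumptions chosen for illustration, not part of any particular implementation. The sketch propagates phase independently in each bin, so it does nothing to preserve coherence between neighbouring bins.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    """Inverse STFT by windowed overlap-add, normalised by the summed window."""
    window = np.hanning(n_fft)
    out = np.zeros(n_fft + hop * (spec.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Change duration by a factor of about 1 / rate while keeping pitch."""
    spec = stft(x, n_fft, hop)
    n_bins = spec.shape[1]
    # Fractional analysis-frame position read for each synthesis frame.
    steps = np.arange(0, spec.shape[0] - 1, rate)
    # Phase advance per hop expected at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(spec[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate magnitudes between the two neighbouring analysis frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation of the measured phase advance from the bin centre,
        # wrapped to [-pi, pi], gives the per-bin instantaneous frequency.
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase at the synthesis hop (per-bin phase propagation).
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)

# Usage: stretch one second of a 440 Hz tone to roughly twice its length.
sr = 22050
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
stretched = time_stretch(tone, rate=0.5)
```

With `rate` below 1 the synthesis frames read the analysis frames more slowly than they were recorded, which lengthens the sound; resampling the stretched result is one common way to turn the same operation into a pitch shift.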

Phase coherence problem

The main problem that has to be solved for all cases of manipulation of the STFT is that individual signal components (sinusoids, impulses) are spread over multiple frames and multiple STFT frequency locations (bins). This is because the STFT analysis is done using overlapping analysis windows. The windowing results in spectral leakage, such that the information of individual sinusoidal components is spread over adjacent STFT bins. To avoid border effects from the tapering of the analysis windows, STFT analysis windows overlap in time. Because of this time overlap, adjacent STFT analyses are strongly correlated (a sinusoid present in the analysis frame at time "t" will be present in the subsequent frames as well). Signal transformation with the phase vocoder therefore requires that all modifications made in the STFT representation preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and time frames (horizontal coherence). Except in the case of extremely simple synthetic sounds, these correlations can be preserved only approximately, and since the invention of the phase vocoder, research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. The phase coherence problem was investigated for quite a while before appropriate solutions emerged.
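Both effects can be observed numerically on a single sinusoid. The following sketch is illustrative only: the sample rate, frame size, hop and test frequency are arbitrary assumptions, and the variable names are hypothetical. It measures how much of the sinusoid leaks into the bins next to the peak, and checks that the phase advance between two overlapping frames matches the value predicted from the sinusoid's frequency in each of those bins.

```python
import numpy as np

sr, n_fft, hop = 8000, 512, 128            # assumed analysis parameters
freq = 1000.0 + 0.3 * sr / n_fft           # deliberately between two bin centres
t = np.arange(n_fft + hop) / sr
x = np.sin(2 * np.pi * freq * t)
window = np.hanning(n_fft)

# Two overlapping analysis frames, one hop apart.
frame_a = np.fft.rfft(x[:n_fft] * window)
frame_b = np.fft.rfft(x[hop:hop + n_fft] * window)

# Spectral leakage: the bins next to the peak carry a large share of the sinusoid.
peak = int(np.argmax(np.abs(frame_a)))
bins = slice(peak - 1, peak + 2)
leak_ratio = np.abs(frame_a[bins]) / np.abs(frame_a[peak])

# Horizontal coherence: every bin dominated by the sinusoid advances in phase
# by 2*pi*freq*hop/sr (modulo 2*pi) from one frame to the next.
expected = 2 * np.pi * freq * hop / sr
measured = np.angle(frame_b[bins]) - np.angle(frame_a[bins])
error = np.angle(np.exp(1j * (measured - expected)))   # wrapped difference

print("magnitude relative to peak:", np.round(leak_ratio, 3))
print("largest phase-advance error (rad):", np.max(np.abs(error)))
```

A modification that changes the phase of one of these bins without adjusting its neighbours, or that changes the frame spacing without adjusting the phase advance, breaks these relationships, which is the coherence problem described above.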

History

The phase vocoder was introduced in 1966 by Flanagan as an algorithm that would preserve horizontal coherence between the phases of bins that represent sinusoidal components. This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, and time stretching with this system therefore produced sound signals that lacked clarity.

The optimal reconstruction of the sound signal from the STFT after amplitude modifications was proposed by Griffin and Lim in 1984. This algorithm does not consider the problem of producing a coherent STFT, but it does allow finding the sound signal whose STFT is as close as possible to the modified STFT, even if the modified STFT is not coherent (does not represent any signal).

Vertical coherence remained a major issue for the quality of time-scaling operations until 1999, when Laroche and Dolson proposed a means to preserve phase consistency across spectral bins. Their proposal was a turning point in phase vocoder history: it has been shown that, by ensuring vertical phase consistency, very high quality time-scaling transformations can be obtained. The algorithm proposed by Laroche did not allow preservation of vertical phase coherence for sound onsets (note onsets). A solution to this problem was proposed by Roebel.

An example of a software implementation of phase vocoder based signal transformation that uses means similar to those described here to achieve high quality signal transformation is Ircam's SuperVP.

Use in music

British composer Trevor Wishart used phase vocoder analyses and transformations of a human voice as the basis for his composition ''Vox 5'' (part of his larger Vox Cycle). ''Transfigured Wind'' by American composer Roger Reynolds uses the phase vocoder to perform time-stretching of flute sounds. The music of JoAnn Kuchera-Morin makes some of the earliest and most extensive use of phase vocoder transformations, such as in ''Dreampaths'' (1989). Roads, Curtis (2004). ''Microsound'', p. 318. MIT Press.

References

External links

The Phase Vocoder: A Tutorial - A good description of the phase vocoder
New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
A new Approach to Transient Processing in the Phase Vocoder
- Phase vocoder description with figures and equations
