Harmonic Vector Excitation Coding (HVXC) is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in both fixed and variable bit rate modes at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

HVXC was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999. An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000). The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at the low bit rates of 2 and 4 kbit/s; bit rates above 4 kbit/s, as well as 3.85 kbit/s, are covered by CELP.
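The figures above translate into small per-frame budgets. The following is a minimal arithmetic sketch, assuming the bit rate is spent in 20 ms frames (the LPC adaptation interval given under Technology); the constant and function names are illustrative and do not come from the standard.

```python
SAMPLE_RATE_HZ = 8000   # sampling frequency stated for HVXC
FRAME_MS = 20           # assumed frame length: the LPC adaptation interval
DELAY_MS = 36           # total algorithmic delay stated for HVXC


def bits_per_frame(bit_rate_bps, frame_ms=FRAME_MS):
    """Number of bits available for one frame at a given fixed bit rate."""
    return bit_rate_bps * frame_ms // 1000


def samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples covered by a duration given in milliseconds."""
    return sample_rate_hz * duration_ms // 1000


budget_2k = bits_per_frame(2000)    # bits per frame at 2 kbit/s
budget_4k = bits_per_frame(4000)    # bits per frame at 4 kbit/s
frame_samples = samples(FRAME_MS)   # samples per 20 ms frame
delay_samples = samples(DELAY_MS)   # samples spanned by the 36 ms delay
```

Under these assumptions, every parameter described below (LSPs, pitch, harmonic amplitudes or codebook indices) has to fit into a few dozen bits per frame.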


Technology


Linear Predictive Coding

HVXC uses linear predictive coding (LPC) with block-wise adaptation every 20 ms. The LPC parameters are transformed to line spectral pair (LSP) coefficients, which are jointly quantized. The LPC residual signal is classified as either voiced or unvoiced. For voiced speech, the residual is coded in a parametric representation (operating as a vocoder), while for unvoiced speech, the residual waveform is quantized (thus operating as a hybrid speech codec).
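As a rough illustration of how LPC coefficients and a residual can be obtained for one 20 ms block, the sketch below uses the textbook autocorrelation method with the Levinson-Durbin recursion. The article does not say which estimation method HVXC uses, so the method, the prediction order of 10, and all function names are assumptions; LSP conversion and quantization are omitted.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] is the sum of x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the predictor coefficients a[1..order],
    where the prediction is x_hat[n] = sum over k of a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or perfectly predictable block
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(frame, coeffs):
    """Prediction error e[n] = x[n] - x_hat[n], assuming zeros before the frame."""
    residual = []
    for n, sample in enumerate(frame):
        pred = sum(c * frame[n - k - 1] for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        residual.append(sample - pred)
    return residual


# Hypothetical test block: 160 samples (20 ms at 8 kHz) of a noisy 200 Hz tone.
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 200 * n / 8000) + 0.01 * rng.gauss(0.0, 1.0)
         for n in range(160)]
coeffs = levinson_durbin(autocorrelation(frame, 10), 10)
residual = lpc_residual(frame, coeffs)
```

The residual carries much less energy than the input block, which is why the remaining stages only need to describe this signal rather than the full waveform.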


Voiced (Harmonic) Residual Coding

In voiced segments, the residual signal is represented by two parameters: the pitch period and the spectral envelope. The pitch period is estimated from the peak values of the autocorrelation of the residual signal. In this process, the residual signal is compared against shifted copies of itself, and the shift that yields the greatest similarity, measured by linear dependence, is taken as the pitch period.

The spectral envelope is represented by a set of amplitude values, one per harmonic. To extract these values, the LPC residual signal is transformed into the DFT domain. The DFT spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT coefficients from (m - 1/2)ω0 to (m + 1/2)ω0, where ω0 is the pitch frequency. The amplitude value for the m-th harmonic is chosen to optimally represent these DFT coefficients. Phase information is discarded in this process. The spectral envelope is then coded using variable-dimension weighted vector quantization. This process is also referred to as Harmonic VQ.

To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three different modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are distinguished. The degree of voicing is determined by the value of the normalized autocorrelation function at a shift of one pitch period. Depending on the chosen mode, the decoder adds a different amount of band-pass Gaussian noise to the synthesized harmonic signal.
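The pitch search and the per-harmonic amplitude extraction described in this section can be sketched as follows. This is an illustration under several assumptions, not the procedure from the standard: the lag range of 20 to 147 samples, the naive DFT, the use of band energy as the "optimal" amplitude, and all function names are hypothetical. The score returned by `estimate_pitch` is the normalized autocorrelation at the chosen lag, which is the quantity the text uses as the degree of voicing.

```python
import cmath
import math


def normalized_autocorr(x, lag):
    """Normalized correlation between x and a copy of itself shifted by `lag`."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def estimate_pitch(residual, min_lag=20, max_lag=147):
    """Return (lag, score) for the shift with the highest normalized autocorrelation."""
    best = max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_autocorr(residual, lag))
    return best, normalized_autocorr(residual, best)


def dft_magnitudes(x):
    """Magnitudes |X[k]| for bins k = 0 .. N/2, computed with a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def harmonic_amplitudes(residual, lag):
    """One amplitude per harmonic, taken from the DFT band between
    (m - 1/2) * w0 and (m + 1/2) * w0. Phase is discarded."""
    n = len(residual)
    mags = dft_magnitudes(residual)
    bins_per_harmonic = n / lag            # pitch frequency w0 expressed in DFT bins
    amps = []
    for m in range(1, lag // 2 + 1):
        lo = math.ceil((m - 0.5) * bins_per_harmonic)
        hi = min(math.ceil((m + 0.5) * bins_per_harmonic), len(mags))
        energy = sum(mags[k] ** 2 for k in range(lo, hi))
        # Scaling chosen so that a sinusoid lying exactly on a DFT bin
        # below the Nyquist frequency yields its own amplitude.
        amps.append(2.0 / n * math.sqrt(energy))
    return amps


# Hypothetical voiced residual: period of 40 samples, three harmonics.
one_period = [1.0 * math.sin(2 * math.pi * t / 40)
              + 0.5 * math.sin(2 * math.pi * 2 * t / 40)
              + 0.25 * math.sin(2 * math.pi * 3 * t / 40) for t in range(40)]
signal = one_period * 6                    # 240 samples
lag, voicing = estimate_pitch(signal)
amps = harmonic_amplitudes(signal, lag)
```

Plain autocorrelation peaks also occur at multiples of the true period, so a practical estimator would need extra logic to avoid picking a doubled or tripled lag; the sketch simply keeps the first maximum.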


Voiceless (VXC) Residual Coding

Unvoiced segments are encoded according to the CELP scheme, which is also referred to as vector excitation coding (VXC). The CELP coding in HVXC is performed using only a stochastic codebook. Other CELP codecs additionally use a dynamic codebook to perform long-term prediction of voiced segments. However, since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.
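A stochastic codebook search can be illustrated with a simplified gain-shape search: for each codebook vector, compute the gain that best matches the target and keep the vector that leaves the smallest squared error. This sketch matches the target directly and leaves out the synthesis filter and perceptual weighting that CELP coders normally apply. The Gaussian codebook, its size, and the function names are assumptions made for illustration.

```python
import random


def make_codebook(size, dim, seed=0):
    """Hypothetical stochastic codebook: `size` vectors of Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def search_codebook(target, codebook):
    """Return (index, gain) minimizing the squared error
    between `target` and gain * codebook[index]."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for i, vec in enumerate(codebook):
        energy = sum(v * v for v in vec)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, vec))
        # With the optimal gain corr / energy, the error shrinks by corr^2 / energy,
        # so the largest score corresponds to the smallest remaining error.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain


codebook = make_codebook(size=64, dim=40)
target = [0.5 * v for v in codebook[17]]   # a scaled copy of entry 17
index, gain = search_codebook(target, codebook)
```

Only the index and a quantized gain would need to be transmitted; the decoder holds the same codebook and rebuilds the excitation from them.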


See also

* Opus (audio format)

