Cepstral mean and variance normalization (CMVN) is a computationally efficient normalization technique for robust speech recognition. The performance of CMVN is known to degrade for short utterances. This is due to insufficient data for parameter estimation and loss of discriminable information as all utterances are forced to have zero mean and unit variance.
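As an illustration of the zero-mean, unit-variance constraint described above, here is a minimal per-utterance sketch in Python with NumPy. The function name `cmvn`, the `eps` floor, and the synthetic input are assumptions made for this example and are not taken from any particular toolkit.

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance.

    features: array of shape (num_frames, num_coeffs) for ONE utterance.
    eps: small floor (an assumption of this sketch) that avoids division
         by zero when a coefficient is constant over the utterance.
    """
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0)   # per-coefficient mean over frames
    std = features.std(axis=0)     # per-coefficient standard deviation
    return (features - mean) / np.maximum(std, eps)

# Synthetic stand-in for cepstral features: 200 frames, 13 coefficients.
rng = np.random.default_rng(0)
utterance = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
normalized = cmvn(utterance)
```

Because the mean and standard deviation are estimated from the frames of a single utterance, an utterance with only a handful of frames yields noisy estimates, which is one way to see why performance degrades for short utterances.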
CMVN reduces the distortion caused by noise contamination, for robust feature extraction, by linearly transforming the cepstral coefficients to have the same segmental statistics. Cepstral normalization has been effective in CMU Sphinx for maintaining a high level of recognition accuracy over a wide variety of acoustical environments.
[Liu, F., Stern, R., Huang, X., and Acero, A. (1993). Efficient cepstral normalization for robust speech recognition. Proc. ARPA Workshop on Human Language Technology, Princeton, NJ.]
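One possible reading of "segmental statistics" is that the mean and variance are estimated over a window of frames rather than over the whole utterance. The sketch below follows that reading. The function name `sliding_cmvn`, the centered window, and the default window length are assumptions chosen for illustration, not details from the cited work.

```python
import numpy as np

def sliding_cmvn(features, window=301, eps=1e-8):
    """Normalize each frame with statistics from a centered window of frames.

    features: array of shape (num_frames, num_coeffs).
    window: window length in frames (hypothetical default for this sketch).
    eps: floor on the standard deviation to avoid division by zero.
    """
    features = np.asarray(features, dtype=np.float64)
    num_frames = features.shape[0]
    half = window // 2
    out = np.empty_like(features)
    for t in range(num_frames):
        lo = max(0, t - half)                # window is truncated at the edges
        hi = min(num_frames, t + half + 1)
        segment = features[lo:hi]
        mean = segment.mean(axis=0)
        std = segment.std(axis=0)
        out[t] = (features[t] - mean) / np.maximum(std, eps)
    return out
```

When the window covers the whole utterance, this reduces to ordinary per-utterance CMVN; shorter windows track slowly changing conditions at the cost of noisier estimates.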
Cepstral Normalization Techniques
Several algorithms achieve cepstral normalization in different ways.
Fixed codeword-dependent cepstral normalization (FCDCN)
FCDCN was developed to provide greater recognition accuracy than SNR-dependent cepstral normalization (SDCN) while being more computationally efficient than the codeword-dependent cepstral normalization (CDCN) algorithm. The FCDCN algorithm applies an additive correction that depends on the instantaneous signal-to-noise ratio (SNR) of the input (like SDCN), but that can also vary from codeword to codeword (like CDCN).
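The additive correction described above can be sketched as a table lookup indexed by a quantized SNR level and a codeword. In this sketch the codeword for each frame is assumed to be the one that lies closest to the frame after correction; the function name `fcdcn_compensate`, the array layouts, and that selection rule are assumptions of the example. How the correction table is trained and how the instantaneous SNR is estimated and quantized are not covered here.

```python
import numpy as np

def fcdcn_compensate(z, snr_bins, codebook, corrections):
    """Apply an SNR- and codeword-dependent additive correction per frame.

    z:           (T, D) noisy cepstral vectors.
    snr_bins:    (T,) integer index of the quantized instantaneous SNR.
    codebook:    (K, D) codewords (assumed to come from clean speech).
    corrections: (L, K, D) correction vectors, one per SNR bin and codeword
                 (assumed to be pre-trained; training is out of scope).
    """
    z = np.asarray(z, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    corrections = np.asarray(corrections, dtype=np.float64)
    out = np.empty_like(z)
    for t, (frame, l) in enumerate(zip(z, snr_bins)):
        candidates = frame + corrections[l]                  # (K, D)
        dist = np.sum((candidates - codebook) ** 2, axis=1)  # distance to each codeword
        k = int(np.argmin(dist))                             # assumed selection rule
        out[t] = candidates[k]
    return out

# Toy example: two codewords, two SNR bins, two-dimensional "cepstra".
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
corrections = np.array([
    [[0.0, 0.0], [0.0, 0.0]],      # SNR bin 0: no correction
    [[-1.0, -1.0], [1.0, 1.0]],    # SNR bin 1: codeword-dependent shift
])
noisy = np.array([[1.0, 1.0], [9.0, 9.0]])
compensated = fcdcn_compensate(noisy, [1, 1], codebook, corrections)
```

In the toy data the same SNR bin produces a different shift for each frame because the two frames map to different codewords, which is the property that distinguishes this scheme from a correction that depends on SNR alone.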
Multiple Fixed Codeword-dependent Cepstral Normalization (MFCDCN)
MFCDCN is a simple extension of the FCDCN algorithm that does not need environment-specific training. In MFCDCN, compensation vectors are pre-computed in parallel for a set of target environments, using the FCDCN algorithm.
Incremental Multiple Fixed Codeword-dependent Cepstral Normalization (IMFCDCN)
While environment selection for the compensation vectors of MFCDCN is generally performed on an utterance-by-utterance basis, IMFCDCN improves on it by allowing the classification process to make use of cepstral vectors from previous utterances in a given session.
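The difference between utterance-by-utterance and incremental environment selection can be sketched as follows. The selection criterion used here (pick the environment whose compensation vectors leave the smallest residual distortion against the codebook) is an assumption of this example, as are the names `residual_distortion` and `IncrementalEnvironmentSelector`. The only point it is meant to illustrate is that evidence accumulates across the utterances of a session.

```python
import numpy as np

def residual_distortion(z, snr_bins, codebook, corrections):
    """Sum over frames of the smallest squared distance between a
    compensated frame and its codeword (assumed selection criterion)."""
    z = np.asarray(z, dtype=np.float64)
    snr_bins = np.asarray(snr_bins)
    candidates = z[:, None, :] + corrections[snr_bins]            # (T, K, D)
    dist = np.sum((candidates - codebook[None]) ** 2, axis=2)     # (T, K)
    return float(dist.min(axis=1).sum())

class IncrementalEnvironmentSelector:
    """Keeps a running distortion total per environment for one session."""

    def __init__(self, codebook, corrections_by_env):
        self.codebook = np.asarray(codebook, dtype=np.float64)
        self.corrections_by_env = [
            np.asarray(c, dtype=np.float64) for c in corrections_by_env
        ]
        self.total = np.zeros(len(self.corrections_by_env))

    def update(self, z, snr_bins):
        """Add one utterance and return the index of the environment
        that is best over all utterances seen so far."""
        for e, corr in enumerate(self.corrections_by_env):
            self.total[e] += residual_distortion(z, snr_bins, self.codebook, corr)
        return int(np.argmin(self.total))

# Toy session: two candidate environments, one SNR bin, two codewords.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
env_a = np.zeros((1, 2, 2))          # environment 0: no shift
env_b = -np.ones((1, 2, 2))          # environment 1: shift of -1 per coefficient
selector = IncrementalEnvironmentSelector(codebook, [env_a, env_b])
first = selector.update(np.array([[1.0, 1.0], [11.0, 11.0]]), [0, 0])
second = selector.update(np.array([[0.0, 0.0]]), [0])
```

In this toy session the short second utterance on its own would favor environment 0, but the accumulated totals still favor environment 1, which shows how earlier utterances can stabilize the decision for a short one.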
Cepstral Noise Subtraction
Automatic speech recognition (ASR) describes the steps of transcribing speech utterances, represented as acoustic waveforms, into written words. CMVN has been used in a range of applications because it has been shown to improve speech recognition results in different environments. CMVN can reduce differences between test and training data produced by channel distortions and colorations. It has also been found to reduce differences in feature representation between speakers, and it can partly reduce the influence of background noise.
[Rehr, R., & Gerkmann, T. (2015). Cepstral noise subtraction for robust automatic speech recognition. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).]