CMVN
Cepstral mean and variance normalization (CMVN) is a computationally efficient normalization technique for robust speech recognition. CMVN minimizes distortion caused by noise contamination by linearly transforming the cepstral coefficients so that all utterances share the same segmental statistics. Its performance is known to degrade for short utterances, both because there is insufficient data for parameter estimation and because discriminative information is lost when every utterance is forced to zero mean and unit variance. Cepstral normalization has been effective in CMU Sphinx for maintaining a high level of recognition accuracy over a wide variety of acoustical environments. Liu, F., Stern, R., Huang, X., and Acero, A. ...
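As a concrete illustration, the per-utterance transform can be sketched in a few lines of NumPy; the feature matrix shape, frame count, and coefficient order below are illustrative assumptions, not taken from any particular toolkit's API:

    import numpy as np

    def cmvn(features, eps=1e-10):
        """Cepstral mean and variance normalization over one utterance.

        features: (num_frames, num_coeffs) array of cepstral coefficients.
        Returns the features normalized to zero mean and unit variance per
        coefficient, using statistics computed over the whole utterance.
        """
        mean = features.mean(axis=0)
        std = features.std(axis=0)
        # For short utterances these statistics are poorly estimated,
        # which is the degradation noted above.
        return (features - mean) / (std + eps)  # eps guards constant dims

    # Toy usage: 200 frames of 13 cepstral coefficients (random placeholders).
    utterance = np.random.randn(200, 13) * 2.0 + 5.0
    normalized = cmvn(utterance)
    print(normalized.mean(axis=0).round(6))  # ~0 per coefficient
    print(normalized.std(axis=0).round(6))   # ~1 per coefficient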


Cepstrum
In Fourier analysis, the cepstrum (plural ''cepstra'', adjective ''cepstral'') is the result of computing the inverse Fourier transform (IFT) of the logarithm of the estimated signal spectrum. The method is a tool for investigating periodic structures in frequency spectra. The ''power cepstrum'' has applications in the analysis of human speech. The term ''cepstrum'' was derived by reversing the first four letters of ''spectrum''. Operations on cepstra are labelled ''quefrency analysis'' (or ''quefrency alanysis''), ''liftering'', or ''cepstral analysis'' (B. P. Bogert, M. J. R. Healy, and J. W. Tukey, "The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking", in ''Proceedings of the Symposium on Time Series Analysis'', M. Rosenblatt, Ed., Chapter 15, pp. 209-243, New York: Wiley, 1963). It may be pronounced /ˈkɛpstrʌm/ or /ˈsɛpstrʌm/, the second having the advantage of avoiding confusion with ''kepstrum''. Origin: The concept of the cep ...
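The definition above translates directly into code: estimate a spectrum, take its logarithm, and apply the inverse FFT. The sketch below computes the real cepstrum of a toy signal; the noise-plus-echo signal and the plain FFT spectrum estimate are assumptions chosen to make the echo peak easy to see:

    import numpy as np

    def real_cepstrum(signal):
        """Real cepstrum: inverse FFT of the log magnitude spectrum."""
        spectrum = np.fft.fft(signal)
        log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
        return np.fft.ifft(log_mag).real

    # Toy demonstration: white noise plus a delayed echo. The echo's
    # periodic ripple in the log spectrum becomes a sharp peak in the
    # cepstrum at the quefrency (lag) equal to the delay.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8192)
    delay = 400  # samples
    x[delay:] += 0.5 * x[:-delay]
    c = real_cepstrum(x)
    print(np.argmax(c[100:1000]) + 100)  # expected: 400, the echo delay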


Computation
Computation is any type of arithmetic or non-arithmetic calculation that follows a well-defined model (e.g., an algorithm). Mechanical or electronic devices (or, historically, people) that perform computations are known as ''computers''. An especially well-known discipline of the study of computation is computer science. Physical process of computation: Computation can be seen as a purely physical process occurring inside a closed physical system called a computer. Examples of such physical systems are digital computers, mechanical computers, quantum computers, DNA computers, molecular computers, microfluidics-based computers, analog computers, and wetware computers. This point of view has been adopted by the physics of computation, a branch of theoretical physics, as well as the field of natural computing. An even more radical point of view, pancomputationalism, is the postulate of digital physics that argues that the evolution of the universe is itself ...


Audio Normalization
Audio normalization is the application of a constant amount of gain to an audio recording to bring the amplitude to a target level (the norm). Because the same amount of gain is applied across the entire recording, the signal-to-noise ratio and relative dynamics are unchanged. Normalization is one of the functions commonly provided by a digital audio workstation. Two principal types of audio normalization exist. Peak normalization adjusts the recording based on the highest signal level present in the recording. Loudness normalization adjusts the recording based on perceived loudness. Normalization differs from dynamic range compression, which applies varying levels of gain over a recording to fit the level within a minimum and maximum range. Normalization adjusts the gain by a constant value across the entire recording. Peak normalization: One type of normalization is peak normalization, wherein the gain is changed to bring the highest PCM sample value or analog signal peak to ...
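A minimal sketch of peak normalization in NumPy, assuming floating-point samples in [-1.0, 1.0] and a target peak expressed in dBFS (both assumptions made for illustration; loudness normalization would instead require a perceptual measure such as LUFS):

    import numpy as np

    def peak_normalize(samples, target_dbfs=-1.0):
        """Apply one constant gain so the highest absolute sample hits
        the target peak level. SNR and relative dynamics are unchanged
        because every sample is scaled identically."""
        peak = np.max(np.abs(samples))
        if peak == 0:
            return samples  # silence: nothing to normalize
        target_linear = 10.0 ** (target_dbfs / 20.0)
        return samples * (target_linear / peak)

    # Toy usage: a quiet sine wave brought up to -1 dBFS.
    t = np.linspace(0, 1, 44100, endpoint=False)
    quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
    loud = peak_normalize(quiet, target_dbfs=-1.0)
    print(round(20 * np.log10(np.max(np.abs(loud))), 2))  # -1.0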




Speech Recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers, with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis. Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems; systems that use training are called "speaker-dependent". Speech recognition ap ...


Degradation (telecommunications)
In telecommunication, degradation is the loss of quality of an electronic signal, which may be categorized as either ''graceful'' or ''catastrophic'', and has the following meanings: (1) the deterioration in quality, level, or standard of performance of a functional unit; (2) in communications, a condition in which one or more of the required performance parameters fall outside predetermined limits, resulting in a lower quality of service. There are several forms and causes of degradation in electric signals, both in the time domain and in the physical domain, including runt pulse, voltage spike, jitter, wander, swim, drift, glitch, ringing, crosstalk, antenna effect (not the same antenna effect as in IC manufacturing), and phase noise. Degradation usually refers to a reduction in quality of an analog or digital signal. When a signal is being transmitted or received, it undergoes undesirable changes; these changes are called degradation. Degradation is usually caused by: ...



Utterance
In spoken language analysis, an utterance is a continuous piece of speech, often beginning and ending with a clear pause. In the case of oral languages, it is generally, but not always, bounded by silence. Utterances do not exist in written language; only their representations do. They can be represented and delineated in written language in many ways. In oral/spoken language, utterances have several characteristics, such as paralinguistic features, which are aspects of speech such as facial expression, gesture, and posture. Prosodic features include stress, intonation, and tone of voice. Utterances may also exhibit ellipsis, the omission of words that the listener mentally inserts to fill gaps. Moreover, other aspects of utterances found in spoken languages are non-fluency features, including voiced/unvoiced pauses (e.g., "umm"), tag questions, and false starts, when someone begins uttering again to correct themselves. Other features include fillers (e.g., "and stuff"), accent/dialect, deic ...


Parameter Estimation
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their values affect the distribution of the measured data. An ''estimator'' attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered: the probabilistic approach assumes that the measured data are random with a probability distribution dependent on the parameters of interest, while the set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. For example, suppose it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters. Alternatively, it ...
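To make the voter example concrete, here is a small simulation sketch of the probabilistic approach: the sample proportion is the estimator, and its sampling variability shrinks as the sample grows. The true proportion and sample sizes are invented purely for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    true_p = 0.37  # unknown in reality; fixed here only to simulate polling

    for n in (100, 1000, 10000):
        sample = rng.binomial(1, true_p, size=n)  # n polled voters (1 = yes)
        p_hat = sample.mean()                     # sample-proportion estimator
        std_err = np.sqrt(p_hat * (1 - p_hat) / n)  # estimator's standard error
        print(f"n={n:6d}  p_hat={p_hat:.3f}  std_err={std_err:.4f}")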



Mean (mathematics)
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set. For a data set, the ''arithmetic mean'', also known as the "arithmetic average", is a measure of central tendency of a finite set of numbers: specifically, the sum of the values divided by the number of values. The arithmetic mean of a set of numbers x_1, x_2, \ldots, x_n is typically denoted using an overhead bar, \bar{x}. If the data set were based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is called the ''sample mean'' (\bar{x}) to distinguish it from the mean, or expected value, of the underlying distribution, the ''population mean'' (denoted \mu or \mu_x) (Underhill, L.G. & Bradfield, D. (1998), ''Introstat'', Juta and Company Ltd., p. 181). Outside probability and statistics, a wide range of other notions of mean are ...
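The definition above, \bar{x} = (x_1 + x_2 + \cdots + x_n)/n, in a few lines of NumPy (the data values are placeholders):

    import numpy as np

    data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    x_bar = data.sum() / data.size   # the definition: sum of values / count
    print(x_bar)                     # 5.0
    print(np.mean(data))             # same result via the library routine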



Variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by \sigma^2, s^2, \operatorname{Var}(X), V(X), or \mathbb{V}(X). An advantage of variance as a measure of dispersion is that it is more amenable to algebraic manipulation than other measures of dispersion such as the expected absolute deviatio ...
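A short sketch distinguishing the population variance (divide by n) from the unbiased sample variance (divide by n-1), reusing the toy data set from the mean example above:

    import numpy as np

    data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    mean = data.mean()
    deviations_sq = (data - mean) ** 2   # squared deviations from the mean

    pop_var = deviations_sq.sum() / data.size           # sigma^2: divide by n
    sample_var = deviations_sq.sum() / (data.size - 1)  # s^2: divide by n-1

    print(pop_var)               # 4.0
    print(np.var(data))          # 4.0 (NumPy defaults to ddof=0)
    print(sample_var)            # ~4.571
    print(np.var(data, ddof=1))  # same, with Bessel's correction
    print(np.sqrt(pop_var))      # 2.0, the standard deviation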


Feature Extraction
In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations. Feature extraction is related to dimensionality reduction. When the input data to an algorithm is too large to be processed and it is suspected to be redundant (e.g. the same measurement in both feet and meters, or the repetitiveness of images presented as pixels), then it can be transformed into a reduced set of features (also named a feature vector). Determining a subset of the initial features is called feature selection. The selected features are expected to contain the relevant information from the input data, so that the desired task can be performed by using this reduced representation instead of the complete initial data. General: Feature extract ...
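As a small sketch of the idea, the example below reduces a redundant two-column data set (the same measurement in feet and meters, as mentioned above) to a single derived feature via principal component analysis; the synthetic data and the SVD-based PCA are illustrative assumptions, not a prescribed method:

    import numpy as np

    rng = np.random.default_rng(0)
    meters = rng.uniform(1.0, 2.0, size=100)
    feet = meters * 3.28084 + rng.normal(0, 0.01, size=100)  # redundant copy
    X = np.column_stack([meters, feet])           # initial measured data

    # PCA via SVD: project onto the direction of maximum variance.
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    feature = X_centered @ Vt[0]                  # one derived feature

    explained = S[0] ** 2 / np.sum(S ** 2)
    print(f"variance captured by 1 feature: {explained:.4%}")  # ~100%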




CMU Sphinx
CMU Sphinx, also called Sphinx for short, is the general term to describe a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain). In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and, later, Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. In addition, the available resources include software for acoustic model training, language model compilation, and a public-domain pronunciation dictionary, cmudict. Sphinx encompasses a number of software systems, described below. Sphinx: Sphinx is a continuous-speech, speaker-independent recognition system making use of hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-indepe ...
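For orientation, decoding an utterance with the older pocketsphinx Python bindings looks roughly like the sketch below. The model paths and the raw audio file are placeholders, and the API details vary between pocketsphinx releases, so treat this as an outline under those assumptions rather than a definitive recipe:

    from pocketsphinx import Decoder  # older pocketsphinx-python bindings

    # Placeholder paths: an acoustic model directory, an n-gram language
    # model, and the cmudict pronunciation dictionary.
    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')
    config.set_string('-lm', 'model/en-us.lm.bin')
    config.set_string('-dict', 'model/cmudict-en-us.dict')
    decoder = Decoder(config)

    # Feed raw 16-bit, 16 kHz mono PCM audio to the decoder.
    with open('utterance.raw', 'rb') as f:
        decoder.start_utt()
        decoder.process_raw(f.read(), False, True)  # full utterance at once
        decoder.end_utt()

    if decoder.hyp() is not None:
        print(decoder.hyp().hypstr)  # best-scoring word sequence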