Speech Coding
Speech coding is an application of data compression to digital audio signals containing speech. It applies speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, and combines this with generic data compression algorithms to represent the resulting model parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP). The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used techniques in VoIP applications are LPC and the modified discrete cosine transform (MDCT). The techniques employed in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 to 3500 Hz is transmitted but the re ...

Data Compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction or line coding, the means for mapping data onto a sig ...
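As a small illustration of the lossless case, the sketch below uses Python's standard `zlib` module (DEFLATE, chosen here only as a convenient example; the summary above does not name a specific algorithm) to show that redundant input shrinks and that decoding restores it exactly. The sample data is made up.

```python
import zlib

def roundtrip(data: bytes):
    """Encode, then decode; return (encoded size in bytes, decoded bytes)."""
    encoded = zlib.compress(data)        # encoder: exploits statistical redundancy
    decoded = zlib.decompress(encoded)   # decoder: reverses the process
    return len(encoded), decoded

# Hypothetical, highly redundant input.
sample = b"speech " * 500
size, restored = roundtrip(sample)
```

Here `restored` equals `sample` byte for byte, which is what distinguishes lossless from lossy coding; how much `size` shrinks depends entirely on how redundant the input is.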

ATRAC
Adaptive Transform Acoustic Coding (ATRAC) is a family of proprietary audio compression algorithms developed by Sony. MiniDisc was the first commercial product to incorporate ATRAC, in 1992. ATRAC allowed a relatively small disc like MiniDisc to have the same running time as a CD while storing audio information with minimal perceptible loss in quality. Improvements to the codec in the form of ATRAC3, ATRAC3plus, and ATRAC Advanced Lossless followed in 1999, 2002, and 2006 respectively. Files in ATRAC3 format originally had the extension; however, in most cases, the files would be stored in an OpenMG Audio container using the extension . Previously, files that were encrypted with OpenMG had the extension, which was replaced by starting in SonicStage v2.1. Encryption is no longer compulsory as of v3.2. Other MiniDisc manufacturers such as Sharp and Panasonic also implemented their own versions of the ATRAC codec. History ATRAC was developed for Sony's MiniDisc format. ATR ...

Secure Voice
Secure voice (alternatively secure speech or ciphony) is a term in cryptography for the encryption of voice communication over a range of communication types such as radio, telephone or IP. History The implementation of voice encryption dates back to World War II when secure communication was paramount to the US armed forces. During that time, noise was simply added to a voice signal to prevent enemies from listening to the conversations. Noise was added by playing a record of noise in sync with the voice signal and when the voice signal reached the receiver, the noise signal was subtracted out, leaving the original voice signal. In order to subtract out the noise, the receiver needed to have exactly the same noise signal and the noise records were only made in pairs; one for the transmitter and one for the receiver. Having only two copies of records made it impossible for the wrong receiver to decrypt the signal. To implement the system, the army contracted Bell ...

Delta Modulation
Delta modulation (DM, ΔM, or Δ-modulation) is an analog-to-digital and digital-to-analog signal conversion technique used for transmission of voice information where quality is not of primary importance. DM is the simplest form of differential pulse-code modulation (DPCM) where the difference between successive samples is encoded into n-bit data streams. In delta modulation, the transmitted data are reduced to a 1-bit data stream representing either up (↗) or down (↘). Its main features are:
* The analog signal is approximated with a series of segments.
* Each segment of the approximated signal is compared to the preceding bits and the successive bits are determined by this comparison.
* Only the change of information is sent, that is, only an increase or decrease of the signal amplitude from the previous sample is sent whereas a no-change condition causes the modulated signal to remain at the same ↗ or ↘ state of the previous sample.
To achieve high signal-to-noise r ...
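The 1-bit up/down scheme can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the fixed step size, the starting value of 0.0, and the function names are all assumptions made for the example.

```python
def delta_modulate(samples, step=0.1):
    """Encode samples as 1-bit decisions: 1 = up, 0 = down.

    A running staircase approximation is compared with each input sample
    and moved one fixed step towards it. `step` is an assumed parameter.
    """
    approx = 0.0  # assumed starting level, shared with the decoder
    bits = []
    for s in samples:
        bit = 1 if s > approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step=0.1):
    """Rebuild the staircase approximation from the 1-bit stream."""
    approx = 0.0
    out = []
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out
```

The decoder only ever sees the bits, so it must use the same step size and starting level as the encoder. If the input changes by more than one step per sample, the staircase cannot keep up and the reconstruction lags behind the signal.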

Fundamental Frequency
The fundamental frequency, often referred to simply as the ''fundamental'' (abbreviated as $f_0$ or $f_1$), is defined as the lowest frequency of a periodic waveform. In music, the fundamental is the musical pitch of a note that is perceived as the lowest partial present. In terms of a superposition of sinusoids, the fundamental frequency is the lowest-frequency sinusoid in the sum of harmonically related frequencies, or the frequency of the difference between adjacent frequencies. In some contexts, the fundamental is usually abbreviated as $f_0$, indicating the lowest frequency counting from zero. In other contexts, it is more common to abbreviate it as $f_1$, the first harmonic. (The second harmonic is then $f_2 = 2 \cdot f_1$, etc.) According to Benward and Saker's ''Music: In Theory and Practice'': Explanation All sinusoidal and many non-sinusoidal waveforms repeat exactly over time – they are per ...
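For a set of exactly harmonic partials, the two descriptions above (lowest harmonically related frequency, and spacing between adjacent partials) can be checked with simple arithmetic. The sketch below assumes integer frequencies in hertz and ideal harmonics; real signals need a proper pitch estimator, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd

def fundamental_hz(partials_hz):
    """Greatest common divisor of integer partial frequencies, in Hz."""
    return reduce(gcd, partials_hz)

def adjacent_spacing_hz(partials_hz):
    """Differences between neighbouring partials after sorting."""
    ordered = sorted(partials_hz)
    return [b - a for a, b in zip(ordered, ordered[1:])]

# Hypothetical example: first three harmonics of a 440 Hz tone.
partials = [440, 880, 1320]
f1 = fundamental_hz(partials)
spacing = adjacent_spacing_hz(partials)
```

With these inputs both views agree: the common divisor and every adjacent spacing equal the lowest partial.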

Periodic Waveform
A periodic function, also called a periodic waveform (or simply periodic wave), is a function that repeats its values at regular intervals or periods. The repeatable part of the function or waveform is called a ''cycle''. For example, the trigonometric functions, which repeat at intervals of $2\pi$ radians, are periodic functions. Periodic functions are used throughout science to describe oscillations, waves, and other phenomena that exhibit periodicity. Any function that is not periodic is called ''aperiodic''. Definition A function $f$ is said to be periodic if, for some nonzero constant $P$, it is the case that $f(x+P) = f(x)$ for all values of $x$ in the domain. A nonzero constant $P$ for which this is the case is called a period of the function. If there exists a least positive constant $P$ with this property, it is called the fundamental period (also primitive period, basic period, or prime period). Often, "the" period of a function is used to mean its fundamental period. A function ...
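The defining condition f(x+P) = f(x) can be spot-checked numerically. The helper below is a sketch with assumed names and tolerance; passing on a finite set of sample points is only evidence of periodicity, not a proof.

```python
import math

def is_period(f, P, xs, tol=1e-9):
    """Return True if f(x + P) matches f(x) at every sample point in xs."""
    return all(math.isclose(f(x + P), f(x), abs_tol=tol) for x in xs)

# Sample points chosen arbitrarily for the illustration.
xs = [i * 0.1 for i in range(-50, 51)]

sine_2pi = is_period(math.sin, 2 * math.pi, xs)  # the fundamental period of sine
sine_4pi = is_period(math.sin, 4 * math.pi, xs)  # a multiple of a period is also a period
sine_pi = is_period(math.sin, math.pi, xs)       # sin(x + pi) = -sin(x), so not a period
```

The second check illustrates why the definition distinguishes "a period" from the fundamental period: every positive multiple of 2π also satisfies the condition.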

Audio Bit Depth
In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc, which can support up to 24 bits per sample. In basic implementations, variations in bit depth primarily affect the noise level from quantization error—thus the signal-to-noise ratio (SNR) and dynamic range. However, techniques such as dithering, noise shaping, and oversampling can mitigate these effects without changing the bit depth. Bit depth also affects bit rate and file size. Bit depth is useful for describing PCM digital signals. Non-PCM formats, such as those using lossy compression, do not have associated bit depths. Binary representation A PCM signal is a sequence of digital audio samples containing the data providing the necessary information to reconst ...
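The link between bit depth, dynamic range, and bit rate can be made concrete with two short formulas. The sketch assumes ideal uniform quantization, for which the ratio between the largest and smallest representable level is 2^Q for Q bits, i.e. roughly 6.02 dB per bit. The 44.1 kHz stereo figures for CD audio are supplied here as an illustration and are not stated in the summary above.

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of a Q-bit uniform quantizer: 20*log10(2**Q)."""
    return 20 * math.log10(2 ** bits)

def pcm_bit_rate(sample_rate_hz, bits, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits * channels

cd_range = dynamic_range_db(16)          # about 96.3 dB
hires_range = dynamic_range_db(24)       # about 144.5 dB
cd_rate = pcm_bit_rate(44_100, 16, 2)    # 1,411,200 bit/s for assumed CD parameters
```

These are theoretical ceilings; as the summary notes, dithering, noise shaping, and oversampling change the perceived noise floor without changing the bit depth.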

Digital Telephony
Telephony is the field of technology involving the development, application, and deployment of telecommunications services for the purpose of electronic transmission of voice, fax, or data, between distant parties. The history of telephony is intimately linked to the invention and development of the telephone. Telephony is commonly referred to as the construction or operation of telephones and telephonic systems and as a system of telecommunications in which telephonic equipment is employed in the transmission of speech or other sound between points, with or without the use of wires. The term is also used frequently to refer to computer hardware, software, and computer network systems, that perform functions traditionally performed by telephone equipment. In this context the technology is specifically referred to as Internet telephony, or voice over Internet Protocol (VoIP). Overview The first telephones were connected directly in pairs: each user had a separate telephone wire ...

μ-law Algorithm
The μ-law algorithm (sometimes written mu-law, often abbreviated as u-law) is a companding algorithm, primarily used in 8-bit PCM digital telecommunications systems in North America and Japan. It is one of the two companding algorithms in the G.711 standard from ITU-T, the other being the similar A-law. A-law is used in regions where digital telecommunication signals are carried on E-1 circuits, e.g. Europe. The terms PCMU, G711u or G711MU are used for G.711 μ-law. Companding algorithms reduce the dynamic range of an audio signal. In analog systems, this can increase the signal-to-noise ratio (SNR) achieved during transmission; in the digital domain, it can reduce the quantization error (hence increasing the signal-to-quantization-noise ratio). These SNR increases can be traded instead for reduced bandwidth for equivalent SNR. At the cost of a reduced peak SNR, it can be mathematically shown that μ-law's non-linear q ...
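To show how companding reduces dynamic range, here is a sketch of the continuous μ-law curve, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) for inputs in [-1, 1], with μ = 255 as used for 8-bit telephony. The formula is not given in the truncated summary above and is supplied here; the code models only the smooth curve, not the segmented 8-bit encoding that G.711 actually specifies, and the function names are hypothetical.

```python
import math

MU = 255.0  # value used with 8-bit mu-law telephony

def mu_compress(x, mu=MU):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse curve: sgn(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

quiet = mu_compress(0.01)        # small inputs are boosted before quantization
recovered = mu_expand(quiet)     # expansion undoes the boost
```

Because quiet samples occupy a larger share of the output range, a uniform quantizer applied after compression spends more of its levels on them, which is where the improvement in signal-to-quantization-noise ratio for low-level speech comes from.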

A-law
An A-law algorithm is a standard companding algorithm, used in European 8-bit PCM digital communications systems to optimize, i.e. modify, the dynamic range of an analog signal for digitizing. It is one of the two companding algorithms in the G.711 standard from ITU-T, the other being the similar μ-law, used in North America and Japan. For a given input $x$, the equation for A-law encoding is as follows:
$$F(x) = \operatorname{sgn}(x) \begin{cases} \dfrac{A|x|}{1 + \ln(A)}, & |x| < \dfrac{1}{A}, \\[1ex] \dfrac{1 + \ln(A|x|)}{1 + \ln(A)}, & \dfrac{1}{A} \leq |x| \leq 1, \end{cases}$$
where $A$ is the compression parameter. In Europe, $A = 87.6$. A-law expansion is given by the inverse function:
$$F^{-1}(y) = \operatorname{sgn}(y) \begin{cases} \dfrac{|y|(1 + \ln(A))}{A}, & |y| < \dfrac{1}{1 + \ln(A)}, \\[1ex] \dfrac{e^{-1 + |y|(1 + \ln(A))}}{A}, & \dfrac{1}{1 + \ln(A)} \leq |y| < 1. \end{cases}$$
The reason for this encoding is that the wide ...
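The piecewise A-law encoding and its inverse translate directly into code. The sketch below implements the continuous curves with A = 87.6 for inputs in [-1, 1]; it does not perform the 8-bit quantization used in G.711, and the function names are hypothetical.

```python
import math

A = 87.6  # compression parameter used in Europe

def a_compress(x, a=A):
    """A-law compression: linear below 1/A, logarithmic from 1/A up to 1."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom
    else:
        y = (1.0 + math.log(a * ax)) / denom
    return math.copysign(y, x)

def a_expand(y, a=A):
    """A-law expansion: inverse of a_compress, branch boundary at 1/(1 + ln A)."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)

knee = a_compress(1.0 / A)  # both branches meet here at 1 / (1 + ln A)
```

The two branches agree at |x| = 1/A, so the curve is continuous, and a full-scale input of 1 maps to exactly 1.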

Deep Learning Speech Synthesis
Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. Formulation Given an input text or some sequence of linguistic units $Y$, the target speech $X$ can be derived by $X = \arg\max P(X \mid Y, \theta)$, where $\theta$ is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 ...
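The two loss functions named above are simple to state in code. This is a plain-Python sketch over flat lists of feature values; the function names and the sample numbers are made up, and a real acoustic model would compute the same quantities over tensors inside a deep learning framework.

```python
def l1_loss(predicted, target):
    """Mean absolute error (MAE) between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)

def l2_loss(predicted, target):
    """Mean squared error (MSE) between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# Hypothetical acoustic feature values for one frame.
predicted = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 5.0]
mae = l1_loss(predicted, target)   # (0 + 0 + 2) / 3
mse = l2_loss(predicted, target)   # (0 + 0 + 4) / 3
```

MSE penalizes the single large error more heavily than MAE does, which is one practical difference between the two choices.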

Machine Learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions. Within a subdiscipline in machine learning, advances in the field of deep learning have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance. ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics. Statistics and mathematical optimisation (mathematical programming) methods comprise the foundations of machine learning. Data mining is a related field of study, focusing on exploratory data analysi ...