G.729

	G.729 G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as ''Coding of speech at 8 kbit/s using code-excited linear prediction'' speech coding (CS-ACELP), and was introduced in 1996. The wide-band extension of G.729 is called G.729.1, which equals G.729 Annex J. Because of its low bandwidth requirements, G.729 is mostly used in voice over Internet Protocol (VoIP) applications when bandwidth must be conserved. Standard G.729 operates at a bit rate of 8 kbit/s, but extensions provide rates of 6.4 kbit/s (Annex D, F, H, I, C+) and 11.8 kbit/s (Annex E, G, H, I, C+) for worse and better speech quality, respectively. G.729 has been extended with various features, commonly designated as G.729a and G.729b: * G.729: This is the original codec using a high-complexity algorithm. * G.729A or Annex A: This version has a medium complexity, and is compatible with G.729. It provides a slightly lower vo ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Voice Over Internet Protocol Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet telephony, broadband telephony, and broadband phone service specifically refer to the provisioning of communications services (voice, fax, SMS, voice-messaging) over the Internet, rather than via the public switched telephone network (PSTN), also known as plain old telephone service (POTS). Overview The steps and principles involved in originating VoIP telephone calls are similar to traditional digital telephony and involve signaling, channel setup, digitization of the analog voice signals, and encoding. Instead of being transmitted over a circuit-switched network, the digital information is packetized and transmission occurs as IP packets over a packet-switched network. They transport media streams using special media delivery protoc ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Voice Activity Detection Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech section of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth. VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language. It was first investigat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Speech Coding Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Some applications of speech coding are mobile telephony and voice over IP (VoIP). The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques. The techniques employed in speech coding are similar to those used in audio data compression and audio coding where knowledge in psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 to 3500 Hz is transmitted but the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Algebraic Code Excited Linear Prediction Algebraic code-excited linear prediction (ACELP) is a speech coding algorithm in which a limited set of pulses is distributed as excitation to a linear prediction filter. It is a linear predictive coding (LPC) algorithm that is based on the code-excited linear prediction (CELP) method and has an algebraic structure. ACELP was developed in 1989 by the researchers at the Université de Sherbrooke in Canada. The ACELP method is widely employed in current speech coding standards such as AMR, EFR, AMR-WB (G.722.2), VMR-WB, EVRC, EVRC-B, SMV, TETRA, PCS 1900, MPEG-4 CELP and ITU-T G-series standards G.729, G.729.1 (first coding stage) and G.723.1. The ACELP algorithm is also used in the proprietary ACELP.net codec. Audible Inc. use a modified version for their speaking books. It is also used in conference-calling software, speech compression tools and has become one of the 3GPP formats. The ACELP patent expired in 2018 and is now royalty-free. Features The main advantage of ACE ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Vocoder A vocoder (, a portmanteau of ''voice'' and ''encoder'') is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation. The vocoder was invented in 1938 by Homer Dudley at Bell Labs as a means of synthesizing human speech. This work was developed into the channel vocoder which was used as a voice codec for telecommunications for speech coding to conserve bandwidth in transmission. By encrypting the control signals, voice transmission can be secured against interception. Its primary use in this fashion is for secure radio communication. The advantage of this method of encryption is that none of the original signal is sent, only envelopes of the bandpass filters. The receiving unit needs to be set up in the same filter configuration to re-synthesize a version of the original signal spectrum. The vocoder has also been used extensively as an electronic musical instrument. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Code-excited Linear Prediction Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction (RELP) and linear predictive coding (LPC) vocoders (e.g., FS-1015). Along with its variants, such as algebraic CELP, relaxed CELP, low-delay CELP and vector sum excited linear prediction, it is currently the most widely used speech coding algorithm. It is also used in MPEG-4 Audio speech coding. CELP is commonly used as a generic term for a class of algorithms and not for a particular codec. Background The CELP algorithm is based on four main ideas: * Using the source-filter model of speech production through linear prediction (LP) (see the textbook "speech coding algorithm"); * Using an adaptive and a fixed codebook as the input (excitation) of the LP model; * Performing a search in ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Modified Discrete Cosine Transform The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used lossy compression technique in audio data compression. It is employed in most modern audio coding standards, including MP3, Dolby Digital (AC-3), Vorbis (Ogg), Windows Media Audio (WMA), ATRAC, Cook, Advanced Audio Coding (AAC), High-Definition Coding (HDC), LDAC, Dolby AC-4, and MPEG-H 3D Audio, as well as speech coding standards such ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	PSQM Perceptual Speech Quality Measure (PSQM) is a computational and modeling algorithm defined in Recommendation ITU-T P.861 that objectively evaluates and quantifies voice quality of voice-band (300 – 3400 Hz) speech codecs. It may be used to rank the performance of these speech codecs with differing speech input levels, talkers, bit rates and transcodings. P.861 was withdrawn and replaced by Recommendation ITU-T P.862 ( PESQ), which contains an improved speech assessment algorithm. Why it is used Using the PSQM standard allows automated, simulation-based test methodologies to objectively rate both speech clarity and transmitted voice quality. Various software and/or hardware products have been developed to facilitate this testing. This results in considerable savings in cost and time over the traditional practice of using large groups of people to subjectively evaluate voice signals and assess voice quality. Moreover, it yields objective results that are reliable and reproduci ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Mean Opinion Score Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual "values on a predefined scale that a subject assigns to his opinion of the performance of a system quality". Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated. MOS is a commonly used measure for video, audio, and audiovisual quality evaluation, but not restricted to those modalities. ITU-T has defined several ways of referring to a MOS in RecommendatioITU-T P.800.1 depending on whether the score was obtained from audiovisual, conversational, listening, talking, or video quality tests. Rating scales and mathematical definition The MOS is expressed as a single rational number, typically in the range 1–5, where 1 is lowest perceived quality, and 5 is the highest perceived quality. Other MOS rang ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Session Description Protocol The Session Description Protocol (SDP) is a format for describing multimedia communication sessions for the purposes of announcement and invitation. Its predominant use is in support of streaming media applications, such as voice over IP (VoIP) and video conferencing. SDP does not deliver any media streams itself but is used between endpoints for negotiation of network metrics, media types, and other associated properties. The set of properties and parameters is called a ''session profile''. SDP is extensible for the support of new media types and formats. SDP was originally a component of the Session Announcement Protocol (SAP), but found other uses in conjunction with the Real-time Transport Protocol (RTP), the Real-time Streaming Protocol (RTSP), Session Initiation Protocol (SIP), and as a standalone protocol for describing multicast sessions. The IETF published the original specification as a Proposed Standard in April 1998. Revised specifications were released in 2006 (RF ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Silence Silence is the absence of ambient audible sound, the emission of sounds of such low intensity that they do not draw attention to themselves, or the state of having ceased to produce sounds; this latter sense can be extended to apply to the cessation or absence of any form of communication, whether through speech or other medium. Sometimes speakers fall silent when they hesitate in searching for a word, or interrupt themselves before correcting themselves. Discourse analysis shows that people use brief silences to mark the boundaries of prosodic units, in turn-taking, or as reactive tokens, e.g., as a sign of displeasure, disagreement, embarrassment, desire to think, confusion, and the like. Relatively prolonged intervals of silence can be used in rituals; in some religious disciplines, people maintain silence for protracted periods, or even for the rest of their lives, as an ascetic means of spiritual transformation. Rhetorical practice Silence may become an effective rhet ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	ITU-T The ITU Telecommunication Standardization Sector (ITU-T) is one of the three sectors (divisions or units) of the International Telecommunication Union (ITU). It is responsible for coordinating standards for telecommunications and Information Communication Technology Information and communications technology (ICT) is an extensional term for information technology (IT) that stresses the role of unified communications and the integration of telecommunications (telephone lines and wireless signals) and computers, ... such as X.509 for cybersecurity, Y.3172 and Y.3173 for machine learning, and H.264/MPEG-4 AVC for video compression, between its Member States, Private Sector Members, and Academia Members. The first meeting of the World Telecommunication Standardization Assembly (WTSA), the sector's governing conference, took place on 1 March of that year. ITU-T has a permanent secretariat called the Telecommunication Standardization Bureau (TSB), which is based at the ITU headquarters ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]