
Speex is an audio compression codec specifically tuned for the reproduction of human speech. It is also a free software speech codec that may be used in VoIP applications and podcasts. It is based on the CELP speech coding algorithm (Xiph.Org, "Introduction to CELP Coding", retrieved 2009-09-01). Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.

The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project. Speex is a lossy format, i.e. quality is permanently degraded to reduce file size.

The Speex project was created on February 13, 2002. The first development versions of Speex were released under the LGPL license, but as of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license. Speex 1.0 was announced on March 24, 2003, after a year of development (Xiph.Org, 2003-03-24, "Speex reaches 1.0; Xiph.Org now a 501(c)(3) Non-Profit Organization", retrieved 2009-09-01). The last stable version of the Speex encoder and decoder is 1.2.0. Xiph.Org now considers Speex obsolete; its successor is the more modern Opus codec, which uses the SILK format under license from Microsoft and surpasses Speex's performance in most areas except at the lowest sample rates.


Description

Speex is targeted at voice over IP (VoIP) and file-based compression. The design goal was a codec optimized for high-quality speech at a low bit rate. To achieve this, the codec supports multiple bit rates and three bandwidths: ultra-wideband (32 kHz sampling rate), wideband (16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate). Since Speex was designed for VoIP instead of cell phone use, the codec must be robust to lost packets, but not to corrupted ones. All this led to the choice of code-excited linear prediction (CELP) as the encoding technique for Speex. One of the main reasons is that CELP has long proven that it can do the job and scales well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s).

The main characteristics can be summarized as follows:
* Free software/open-source, patent-free and royalty-free.
* Integration of narrowband and wideband in the same bit-stream.
* Wide range of bit rates available (from 2 kbit/s to 44 kbit/s).
* Dynamic bit rate switching and variable bit-rate (VBR).
* Voice activity detection (VAD, integrated with VBR) (not working from version 1.2).
* Variable complexity.
* Ultra-wideband mode at 32 kHz (up to 48 kHz).
* Intensity stereo encoding option.
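As a rough illustration of the figures above, the following Python sketch computes how many samples and how many coded bits fall into one frame for each bandwidth. The 20 ms frame duration is an assumption that is not stated in this section (it is the frame length commonly quoted for Speex). The names `FRAME_MS`, `SAMPLING_RATES_HZ`, `samples_per_frame` and `bits_per_frame` are hypothetical helpers, not part of any Speex API. The 2 kbit/s and 44 kbit/s values are the nominal range bounds from the list, not actual codec modes.

```python
# Assumption: one Speex frame covers 20 ms of audio (not stated in the text above).
FRAME_MS = 20

# Sampling rates named in the description, in Hz.
SAMPLING_RATES_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "ultra-wideband": 32_000,
}


def samples_per_frame(rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of PCM samples covered by one frame."""
    return rate_hz * frame_ms // 1000


def bits_per_frame(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Coded bits available per frame at a constant bit rate."""
    return bitrate_bps * frame_ms // 1000


if __name__ == "__main__":
    for band, rate in SAMPLING_RATES_HZ.items():
        print(f"{band}: {samples_per_frame(rate)} samples per frame")
    # Nominal lower and upper bounds of the bit-rate range quoted above.
    for kbps in (2, 44):
        print(f"{kbps} kbit/s: {bits_per_frame(kbps * 1000)} bits per frame")
```

Under the 20 ms assumption, a narrowband frame holds 160 samples, and the quoted bit-rate range corresponds to roughly 40 to 880 bits per frame.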


Features

;Sampling rate: Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate used to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.
;Quality: Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real (floating point) number.
;Complexity (variable): With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way similar to the -1 to -9 options of gzip compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real-time.
;Variable bit-rate (VBR): Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or a better quality for a certain bit rate. Despite its advantages, VBR has three main drawbacks. First, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified, at least in a controlled setting with a small dictionary of phrases, by analysing the pattern of variation of the bit rate.
;Average bit-rate (ABR): Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
;Voice Activity Detection (VAD): When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG). The last version in which VAD worked properly is 1.1.12; since version 1.2 it has been replaced with a simple "any activity" detection.
;Discontinuous transmission (DTX): Discontinuous transmission is an addition to VAD/VBR operation which allows transmission to cease completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).
;Perceptual enhancement: Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound objectively further from the original (in terms of signal-to-noise ratio), but in the end it still sounds better (subjective improvement).
;Algorithmic delay: Every codec introduces a delay in the transmission.
For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.
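The delay and DTX figures above can be cross-checked with a little arithmetic. The Python sketch below derives the frame duration from the DTX numbers (5 bits per frame at 250 bit/s) and then subtracts it from the quoted delays to estimate the look-ahead. The function names are hypothetical, and the resulting look-ahead values are inferred from the figures in this section rather than taken from the Speex documentation.

```python
def frame_duration_ms(bits_per_frame: int, bitrate_bps: int) -> float:
    """Frame duration implied by a per-frame bit count and a bit rate."""
    return 1000 * bits_per_frame / bitrate_bps


def look_ahead_ms(total_delay_ms: float, frame_ms: float) -> float:
    """Look-ahead, given that algorithmic delay = frame size + look-ahead."""
    return total_delay_ms - frame_ms


# DTX: 5 bits per missing frame correspond to 250 bit/s.
FRAME_MS = frame_duration_ms(5, 250)

# Algorithmic delays quoted in the text, in milliseconds.
DELAYS_MS = {"narrowband": 30, "wideband": 34}

if __name__ == "__main__":
    print(f"frame duration: {FRAME_MS} ms")
    for band, delay in DELAYS_MS.items():
        print(f"{band}: look-ahead {look_ahead_ms(delay, FRAME_MS)} ms")
```

With these inputs the implied frame duration is 20 ms, which leaves an estimated 10 ms of look-ahead in narrowband and 14 ms in wideband.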


Applications

There is a large base of applications supporting the Speex codec. Examples include:
* Streaming applications such as teleconferencing (e.g. TeamSpeak, Mumble)
* VoIP systems (e.g. Asterisk)
* Video games (e.g. Xbox Live, as announced by Ralph Giles, the Theora codec maintainer, on LugRadio episode 29; Civilization 4; DropMix vocal tracks; ...)
* Audio processing applications

Most of these are based on the DirectShow filter or OpenACM codec (e.g. Microsoft NetMeeting) on Microsoft Windows, or Xiph.org's reference implementation, libspeex, on Linux (e.g. Ekiga). There are also plugins for many audio players. See the plugin and software page on the speex.org site for more details.

The media type for Speex is audio/ogg when contained in Ogg, and audio/speex (previously audio/x-speex) when transported through RTP or without a container.

The United States Army's Land Warrior system, designed by General Dynamics, also uses Speex for VoIP on an EPLRS radio designed by Raytheon.

The Ear Bible is a single-ear headphone with a built-in Speex player with 1 GB of flash memory, preloaded with a recording of the New American Standard Bible.

ASL Safety & Security's Linux-based VIPA OS software, which is used in long line public address systems and voice alarm systems at major international air transport hubs and rail networks, also uses Speex.

The Rockbox project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10.

The Vernier LabQuest handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone.

The Google Mobile App for iPhone currently incorporates Speex. It has also been suggested that the new Google voice search iPhone app is using Speex to transmit voice to Google servers for interpretation.

Adobe Flash Player supports Speex starting with Flash Player 10.0.12.36, released in October 2008. Because of some bugs in Flash Player, the first recommended version for Speex support is 10.0.22.87. Speex in Flash Player can be used for both kinds of communication, through Flash Media Server or P2P. Speex can be decoded or converted to any format, unlike Nellymoser audio, which was the only speech format in previous versions of Flash Player. Speex can also be used in the Flash Video container format (.flv), starting with version 10 of the Video File Format Specification (published in November 2008).

The JavaSonics ListenUp voice recorder uses Speex to compress voice messages that are recorded in a browser and then uploaded to a web server. Primary applications are language training, transcription and social networking.

Speex is used as the voice compression algorithm in the Siri voice assistant on the iPhone 4S. Since speech recognition occurs on Apple's servers, the Speex codec is used to minimize network bandwidth.


See also

* Comparison of audio coding formats
* Opus (audio format) - successor of Speex


Sources

This article uses material from the Speex Codec Manual, which is copyright © Jean-Marc Valin and licensed under the terms of the GFDL.


External links

* RTP Payload Format for the Speex Codec
* Official Speex homepage
* Plugin & software page
* JSpeex, a port of Speex to the Java platform
* NSpeex, a port of Speex to the .NET platform and Silverlight, based on JSpeex
* CSpeex, a port of Speex to the .NET platform, based on JSpeex
* Ogg Media Types
* http://dirac.epucfe.eu/projets/wakka.php?wiki=P12AB10 - Speex Encoder Player (César MBUMBA)