Speex is an audio compression codec specifically tuned for the reproduction of human speech and also a

free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...

speech codec Speech coding is an application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic d ...

that may be used on

VoIP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet t ...

applications and

podcast A podcast is a program made available in digital format for download over the Internet. For example, an episodic series of digital audio or video files that a user can download to a personal device to listen to at a time of their choosing ...

s. It is based on the

CELP Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algori ...

speech coding algorithm.Xiph.Or
Introduction to CELP Coding
Retrieved 2009-09-01 Speex claims to be free of any

patent A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited period of time in exchange for publishing an enabling disclosure of the invention."A p ...

restrictions and is licensed under the revised (3-clause)

BSD license BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lic ...

. It may be used with the

Ogg Ogg is a free, open container format maintained by the Xiph.Org Foundation. The authors of the Ogg format state that it is unrestricted by software patents and is designed to provide for efficient streaming and manipulation of high-quality d ...

container format or directly transmitted over UDP/ RTP. It may also be used with the

FLV Flash Video is a container file format used to deliver digital video content (e.g., TV shows, movies, etc.) over the Internet using Adobe Flash Player version 6 and newer. Flash Video content may also be embedded within SWF files. There are t ...

container format. The Speex designers see their project as complementary to the

Vorbis Vorbis is a free and open-source software project headed by the Xiph.Org Foundation. The project produces an audio coding format and software reference encoder/decoder (codec) for lossy audio compression. Vorbis is most commonly used in conjun ...

general-purpose audio compression project. Speex is a

lossy In information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size ...

format, ''i.e.'' quality is permanently degraded to reduce file size. The Speex project was created on February 13, 2002. The first development versions of Speex were released under

LGPL The GNU Lesser General Public License (LGPL) is a free-software license published by the Free Software Foundation (FSF). The license allows developers and companies to use and integrate a software component released under the LGPL into their own ...

license, but as of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license. Speex 1.0 was announced on March 24, 2003, after a year of development.Xiph.Org (2003-03-24
Speex reaches 1.0; Xiph.Org now a 501(c)(3) Non-Profit Organization
Retrieved 2009-09-01 The last stable version of Speex encoder and decoder is 1.2.0. Xiph.Org now considers Speex obsolete; its successor is the more modern

Opus ''Opus'' (pl. ''opera'') is a Latin word meaning "work". Italian equivalents are ''opera'' (singular) and ''opere'' (pl.). Opus or OPUS may refer to: Arts and entertainment Music * Opus number, (abbr. Op.) specifying order of (usually) publicatio ...

codec, which uses the

SILK Silk is a natural protein fiber, some forms of which can be woven into textiles. The protein fiber of silk is composed mainly of fibroin and is produced by certain insect larvae to form cocoons. The best-known silk is obtained from the coc ...

format under license from

Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...

and surpasses its performance in most areas except at the lowest sample rates.

Description

Speex is targeted at

voice over IP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of speech, voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms In ...

(VoIP) and file-based compression. The design goals have been to make a codec that would be optimized for high quality speech and low bit rate. To achieve this the codec uses multiple bit rates, and supports ultra-wideband (32

kHz The hertz (symbol: Hz) is the unit of frequency in the International System of Units (SI), equivalent to one event (or cycle) per second. The hertz is an SI derived unit whose expression in terms of SI base units is s−1, meaning that on ...

sampling rate In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave to a sequence of "samples". A sample is a value of the signal at a point in time and/or spac ...

wideband In communications, a system is wideband when the message bandwidth significantly exceeds the coherence bandwidth of the Channel (communications), channel. Some communication links have such a high Bit rate, data rate that they are forced to use a ...

(16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate). Since Speex was designed for VoIP instead of cell phone use, the codec must be robust to lost packets, but not to corrupted ones. All this led to the choice of

code excited linear prediction Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algori ...

(CELP) as the encoding technique to use for Speex. One of the main reasons is that CELP has long proven that it could do the job and scale well to both low

bit rate In telecommunications and computing, bit rate (bitrate or as a variable ''R'') is the number of bits that are conveyed or processed per unit of time. The bit rate is expressed in the unit bit per second (symbol: bit/s), often in conjunction w ...

s (as evidenced by DoD CELP @ 4.8 kbit/s) and high bit rates (as with G.728 @ 16 kbit/s). The main characteristics can be summarized as follows: *

Free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...

open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...

and

royalty Royalty may refer to: * Any individual monarch, such as a king, queen, emperor, empress, etc. * Royal family, the immediate family of a king or queen regnant, and sometimes his or her extended family * Royalty payment for use of such things as int ...

-free. * Integration of narrowband and wideband in the same bit-stream. * Wide range of bit rates available (from 2 kbit/s to 44 kbit/s). * Dynamic bit rate switching and variable bit-rate (VBR). *

Voice activity detection Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speech coding and speech recognition. I ...

(VAD, integrated with VBR) (not working from version 1.2). * Variable complexity. * Ultra-wideband mode at 32 kHz (up to 48 kHz). * Intensity stereo encoding option.

Features

;Sampling rate: Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate to transmit

telephone A telephone is a telecommunications device that permits two or more users to conduct a conversation when they are too far apart to be easily heard directly. A telephone converts sound, typically and most efficiently the human voice, into e ...

calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband. ;Quality: Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...

, while for variable bit-rate (VBR), the parameter is a real (

floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...

) number. ;Complexity (variable): With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way similar to the -1 to -9 options to

gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and in ...

compression Compression may refer to: Physical science *Compression (physics), size reduction due to forces *Compression member, a structural element such as a column *Compressibility, susceptibility to compression * Gas compression *Compression ratio, of a ...

utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 is about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like

DTMF Dual-tone multi-frequency signaling (DTMF) is a telecommunication signaling system using the voice-frequency band over telephone lines between telephone equipment and other communications devices and switching centers. DTMF was first developed ...

tones, or if encoding is not in real-time. ; Variable bit-rate (VBR): Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the example of Speex, sounds like

vowel A vowel is a syllabic speech sound pronounced without any stricture in the vocal tract. Vowels are one of the two principal classes of speech sounds, the other being the consonant. Vowels vary in quality, in loudness and also in quantity (leng ...

s and high-energy transients require a higher bit rate to achieve good quality, while

fricative A fricative is a consonant produced by forcing air through a narrow channel made by placing two articulators close together. These may be the lower lip against the upper teeth, in the case of ; the back of the tongue against the soft palate in t ...

s (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve lower bit rate for the same quality, or a better quality for a certain bit rate. Despite its advantages, VBR has three main drawbacks: first, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like

(VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified, at least in a controlled setting with a small dictionary of phrases, by analysing the pattern of variation of the bit rate. ;Average bit-rate (ABR): Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bitrate. ;

Voice Activity Detection Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speech coding and speech recognition. I ...

(VAD): When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "

comfort noise Comfort noise (or comfort tone) is synthetic background noise used in radio and wireless communications to fill the artificial silence in a transmission resulting from voice activity detection or from the audio clarity of modern digital lines. ...

generation" (CNG). Last version VAD was working fine is 1.1.12, since v 1.2 it has been replaced with simple Any Activity Detection. ;

Discontinuous transmission Discontinuous transmission (DTX) is a means by which a mobile telephone is temporarily shut off or muted while the phone lacks a voice input. Misconception A common misconception is that DTX improves capacity by freeing up TDMA time slots for us ...

(DTX): Discontinuous transmission is an addition to VAD/VBR operation which allows ceasing transmitting completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s). ;Perceptual enhancement: Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (signal-to-noise ratio), but in the end it still sounds better (subjective improvement). ;Algorithmic delay: Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.

Applications

Opus quality comparison colorblind compatible

There are a large base of applications supporting the Speex codec. Examples include: *

Streaming Streaming media is multimedia that is delivered and consumed in a continuous manner from a source, with little or no intermediate storage in network elements. ''Streaming'' refers to the delivery method of content, rather than the content it ...

applications like

teleconference A teleconference is the live exchange of information among several people remote from one another but linked by a telecommunications system. Terms such as audio conferencing, telephone conferencing and phone conferencing are also sometimes used t ...

(e.g. TeamSpeak, Mumble) * VoIP systems (e.g.

Asterisk The asterisk ( ), from Late Latin , from Ancient Greek , ''asteriskos'', "little star", is a typographical symbol. It is so called because it resembles a conventional image of a heraldic star. Computer scientists and mathematicians often voc ...

) * Videogames (e.g.

Xbox Live The Xbox network, formerly and still sometimes branded as Xbox Live, is an Internet, online multiplayer video game, multiplayer gaming and digital media delivery service created and operated by Microsoft. It was first made available to the Xbox ...

,As announced by Ralph Giles, the

Theora Theora is a free file format, free Lossy compression, lossy video compression format. It is developed by the Xiph.Org Foundation and distributed without licensing fees alongside their other free and open media projects, including the Vorbis ...

codec maintainer, on LugRadiobr>episode 29
/ref> '' Civilization 4'', ''

DropMix ''DropMix'' is a music mixing game developed by Harmonix and published by Hasbro, no longer supported in either iOS or Android app stores. The game uses a mix of physical cards with chips embedded with near field communication, a specialized elec ...

'' vocal tracks, ...) * Audio processing applications. Most of these are based on the

DirectShow DirectShow (sometimes abbreviated as DS or DShow), codename Quartz, is a multimedia framework and API produced by Microsoft for software developers to perform various operations with media files or streams. It is the replacement for Microsoft's ea ...

filter or OpenACM codec (e.g.

Microsoft NetMeeting Microsoft NetMeeting is a discontinued VoIP and multi-point videoconferencing client included in many versions of Microsoft Windows (from Windows 95 OSR2 to Windows Vista). It uses the H.323 protocol for videoconferencing, and is interoperable wit ...

) on

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...

, or Xiph.org's reference implementation, libvorbis, on

Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...

(e.g.

Ekiga Ekiga (formerly called GnomeMeeting) is a VoIP and video conferencing application for GNOME and Microsoft Windows. It is distributed as free software under the terms of the GNU GPL-2.0-or-later. It was the default VoIP client in Ubuntu until Octob ...

). There are also plugins for many audio players. See the plugin and software page on the speex.org site for more details. The media type for Speex is audio/ogg while contained by Ogg, and audio/speex (previously audio/x-speex) when transported through RTP or without container. The

United States Army The United States Army (USA) is the land service branch of the United States Armed Forces. It is one of the eight U.S. uniformed services, and is designated as the Army of the United States in the U.S. Constitution.Article II, section 2, cla ...

Land Warrior Land Warrior was a United States Army program, launched in 1994, cancelled in 2007 U.S. Army Budget Request Documents FY2008 (page 4) but restarted in 2008,http://www.army-technology.com/projects/land_warrior/ Land Warrior at Army-Technology.com ...

system, designed by

General Dynamics General Dynamics Corporation (GD) is an American publicly traded, aerospace and defense corporation headquartered in Reston, Virginia. As of 2020, it was the fifth-largest defense contractor in the world by arms sales, and 5th largest in the Uni ...

, also uses Speex for VoIP on an

EPLRS The Enhanced Position Location Reporting System (EPLRS) is a secure, jam-resistant, computer-controlled communications network that distributes near real-time tactical information, generally integrated into radio sets, and coordinated by a Network C ...

radio designed by

Raytheon Raytheon Technologies Corporation is an American multinational aerospace and defense conglomerate headquartered in Arlington, Virginia. It is one of the largest aerospace and defense manufacturers in the world by revenue and market capitaliza ...

. The Ear Bible is a single-ear headphone with a built-in Speex player with 1 GB of flash memory, preloaded with a recording of the

New American Standard Bible The New American Standard Bible (NASB) is an English translation of the Bible. Published by the Lockman Foundation, the complete NASB was released in 1971. The NASB relies on recently published critical editions of the original Hebrew and Gre ...

. ASL Safety & Security's Linux based VIPA OS software which is used in long line public address systems and voice alarm systems at major international air transport hubs and rail networks. The

Rockbox Rockbox is a free and open-source software replacement for the OEM firmware in various forms of digital audio players (DAPs) with an original kernel. It offers an alternative to the player's operating system, in many cases without removing the or ...

project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10. The Vernier LabQuest handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone. The Google Mobile App for iPhone currently incorporates Speex. It has also been suggested that the new

Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...

voice search iPhone app is using Speex to transmit voice to Google servers for interpretation. Adobe

Flash Player Adobe Flash Player (known in Internet Explorer, Firefox, and Google Chrome as Shockwave Flash) is computer software for viewing multimedia contents, executing rich Internet applications, and streaming audio and video content created on the ...

supports Speex starting with Flash Player 10.0.12.36, released in October 2008. Because of some bugs in Flash Player, the first recommended version for Speex support is 10.0.22.87 and later. Speex in Flash Player can be used for both kind of communication, through

Flash Media Server Adobe Media Server (AMS) is a proprietary data and media server from Adobe Systems (originally a Macromedia product). This server works with the Flash Player and HTML5 runtime to create media driven, multiuser RIAs ( Rich Internet Applications ...

P2P P2P may refer to: * Pay to play, where money is exchanged for services * Peer-to-peer, a distributed application architecture in computing or networking ** List of P2P protocols * Phenylacetone, an organic compound commonly known as P2P * Poin ...

. Speex can be decoded or converted to any format unlike Nellymoser audio, which was the only speech format in previous versions of Flash Player. Speex can be also used in the

Flash Video Flash Video is a container file format used to deliver digital video content (e.g., TV shows, movies, etc.) over the Internet using Adobe Flash Player version 6 and newer. Flash Video content may also be embedded within SWF files. There ar ...

container format (.flv), starting with version 10 of Video File Format Specification (published in November 2008). The JavaSonics ListenUp voice recorder uses Speex to compress voice messages that are recorded in a browser and then uploaded to a web server. Primary applications are language training, transcription and social networking. Speex is used as the voice compression algorithm in the

Siri Siri ( ) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems. It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questio ...

voice assistance on the

iPhone 4S The iPhone 4S (originally styled as iPhone 4 S, retroactively stylized with a lowercase 's' as iPhone 4s as of September 2013) is a smartphone that was designed and marketed by Apple Inc. It is the List of iOS devices, fifth generation o ...

. Since text-to-speech occurs on Apple's servers, the Speex codec is used to minimize network bandwidth.

Sources

''This article uses material from th
Speex Codec Manual
which is copyright © Jean-Marc Valin and licensed under the terms of the

GFDL The GNU Free Documentation License (GNU FDL or simply GFDL) is a copyleft license for free documentation, designed by the Free Software Foundation (FSF) for the GNU Project. It is similar to the GNU General Public License, giving readers the r ...

.''

References

External links