The Info List - Speex





Speex is an audio compression format specifically tuned for the reproduction of human speech, and also a free software speech codec that may be used in VoIP applications and podcasts.[6] It is based on the CELP speech coding algorithm.[7] Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.[8]

The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project. Speex is a lossy format, i.e. quality is permanently degraded to reduce file size.

The Speex project was created on February 13, 2002.[9] The first development versions of Speex were released under the LGPL license, but as of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license.[10] Speex 1.0 was announced on March 24, 2003, after a year of development.[11] The last stable version of the Speex encoder and decoder is 1.2.0.[3] Xiph.Org now considers Speex obsolete; its successor is the more modern Opus codec, which surpasses its performance in all areas.[12]


Description

Speex is targeted at voice over IP (VoIP) and file-based compression. The design goals were to make a codec optimized for high-quality speech at low bit rates. To achieve this, the codec supports multiple bit rates as well as ultra-wideband (32 kHz sampling rate), wideband (16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate) operation. Since Speex was designed for VoIP rather than cell phone use, the codec must be robust to lost packets, but not to corrupted ones. All this led to the choice of code-excited linear prediction (CELP) as the encoding technique for Speex.[7] One of the main reasons is that CELP has long proven that it can do the job and scale well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s).

The main characteristics can be summarized as follows:

Free software/open-source, patent and royalty-free.
Integration of narrowband and wideband in the same bit-stream.
Wide range of bit rates available (from 2 kbit/s to 44 kbit/s).
Dynamic bit rate switching and variable bit-rate (VBR).
Voice activity detection (VAD, integrated with VBR) (not working from version 1.2).
Variable complexity.
Ultra-wideband mode at 32 kHz (up to 48 kHz).
Intensity stereo encoding option.

Features

Sampling rate
Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate used to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

Quality
Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real (floating point) number.

Complexity (variable)
With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way similar to the -1 to -9 options of the gzip compression utility. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4,[13] though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real-time.

Variable bit-rate (VBR)
Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or a better quality for a given bit rate. Despite its advantages, VBR has three main drawbacks. First, because only the quality is specified, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified by analysing the pattern of variation of the bit rate, at least in a controlled setting with a small dictionary of phrases.[14]

Average bit-rate (ABR)
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.

Voice Activity Detection (VAD)
When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG). The last version in which VAD worked properly is 1.1.12; since version 1.2 it has been replaced with a simple Any Activity Detection.
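The average bit-rate idea described above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the libspeex algorithm: `toy_frame_bits` is a made-up stand-in for a VBR encoder, the 20 ms frame duration is assumed, and the names `abr_encode`, `base_quality` and `gain` are hypothetical. The loop tracks how far the bits spent so far have drifted from the budget and lowers or raises the VBR quality for the next frame accordingly, without ever re-encoding a frame.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds


def toy_frame_bits(quality, difficulty):
    """Hypothetical VBR encoder model: harder frames and higher quality cost more bits."""
    return round(40 + 50 * quality * difficulty)


def abr_encode(difficulties, target_bps, base_quality=5.0, gain=0.01):
    """Adjust VBR quality frame by frame so the average bit rate approaches target_bps."""
    target_bits = target_bps * FRAME_MS / 1000  # bit budget per frame
    drift = 0.0       # bits spent so far minus the budget so far
    total_bits = 0
    qualities = []
    for difficulty in difficulties:
        # Over budget lowers the quality, under budget raises it; keep it in 0..10.
        quality = min(10.0, max(0.0, base_quality - gain * drift))
        bits = toy_frame_bits(quality, difficulty)
        drift += bits - target_bits
        total_bits += bits
        qualities.append(quality)
    avg_bps = total_bits * 1000 / (len(difficulties) * FRAME_MS)
    return avg_bps, qualities


# Ten seconds of alternating "easy" and "hard" frames, aiming for 8 kbit/s.
avg_bps, qualities = abr_encode([0.3, 0.9] * 250, target_bps=8000)
print(f"average bit rate: {avg_bps:.0f} bit/s")
```

Because the quality is corrected only after each frame has been coded, the quality wanders around the value a well-chosen fixed VBR setting would have used, which is consistent with the slightly lower global quality noted above.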
Discontinuous transmission (DTX)
Discontinuous transmission is an addition to VAD/VBR operation that allows transmission to stop completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).

Perceptual enhancement
Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement moves the sound objectively further from the original (in terms of signal-to-noise ratio), but in the end it still sounds better (a subjective improvement).

Algorithmic delay
Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.
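The figures quoted for DTX and algorithmic delay can be checked with a little arithmetic. The sketch below assumes a frame duration of 20 ms, which the text above does not state; the function names are illustrative only. Under that assumption, 5 bits per frame works out to the quoted 250 bit/s, and the look-ahead implied by each total delay is simply the delay minus the frame duration.

```python
FRAME_MS = 20  # assumption: 20 ms frames (not stated in the text above)


def bit_rate_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Bit rate produced by sending bits_per_frame once per frame."""
    return bits_per_frame * 1000 / frame_ms


def implied_lookahead_ms(total_delay_ms, frame_ms=FRAME_MS):
    """Look-ahead implied by: algorithmic delay = frame size + look-ahead."""
    return total_delay_ms - frame_ms


print(bit_rate_bps(5))           # 250.0 (DTX marker frames in a file)
print(implied_lookahead_ms(30))  # 10 (narrowband)
print(implied_lookahead_ms(34))  # 14 (wideband)
```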

Applications

There is a large base of applications supporting the Speex codec. Examples include:

Streaming applications like teleconferencing (e.g. TeamSpeak, Mumble)
VoIP systems (e.g. Asterisk)
Videogames (e.g. Xbox Live,[15] Civilization 4, DropMix vocal tracks, ...)
Audio processing applications.

Most of these are based on the DirectShow filter or OpenACM codec (e.g. Microsoft NetMeeting) on Microsoft Windows, or Xiph.org's reference implementation, libspeex, on Linux (e.g. Ekiga). There are also plugins for many audio players. See the plugin and software page on the speex.org site for more details.[16]

The media type for Speex is audio/ogg when contained in Ogg, and audio/speex (previously audio/x-speex) when transported through RTP or without a container.

The United States Army's Land Warrior system, designed by General Dynamics, also uses Speex for VoIP on an EPLRS radio designed by Raytheon.

The Ear Bible[17] is a single-ear headphone with a built-in Speex player and 1 GB of flash memory,[18] preloaded with a recording of the New American Standard Bible.

ASL Safety & Security's[19] Linux-based VIPA OS software[20] is used in long line public address systems and voice alarm systems at major international air transport hubs and rail networks.

The Rockbox project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10.

The Vernier LabQuest[21] handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone.

The Google Mobile App for iPhone currently incorporates Speex.[22] It has also been suggested that the new Google voice search iPhone app is using Speex to transmit voice to Google servers for interpretation.[23]

Adobe Flash Player supports Speex starting with Flash Player 10.0.12.36, released in October 2008.[24] Because of some bugs in Flash Player, the first recommended version for Speex support is 10.0.22.87. Speex in Flash Player can be used for both kinds of communication, through Flash Media Server or P2P. Speex can be decoded or converted to any format, unlike Nellymoser audio, which was the only speech format in previous versions of Flash Player.[25][26] Speex can also be used in the Flash Video container format (.flv), starting with version 10 of the Video File Format Specification (published in November 2008).[27]

The JavaSonics ListenUp[28] voice recorder uses Speex to compress voice messages that are recorded in a browser and then uploaded to a web server. Primary applications are language training, transcription and social networking.

Speex is used as the voice compression algorithm in the Siri voice assistant on the iPhone 4S.[29] Since speech recognition occurs on Apple's servers, the Speex codec is used to minimize network bandwidth.

See also

Comparison of audio coding formats
Opus (audio format) - successor of Speex

Sources

This article uses material from the Speex Codec Manual, which is copyright © Jean-Marc Valin and licensed under the terms of the GFDL.

References

[1] "PlayOgg! - FSF - Free Software Foundation". 2010-03-17. Retrieved 2013-10-01.
[2] Jean-Marc Valin (2009). "people.xiph.org - personal webspace of the xiphs - Jean-Marc Valin". Xiph.Org. Retrieved 2009-09-11.
[3] "Speex News". Xiph.Org Foundation. Retrieved 2017-04-11.
[4] "The Speex Codec Manual - Speex License". Xiph.Org Foundation. Retrieved 2009-09-01.
[5] "Sample Xiph.Org Variant of the BSD License". Xiph.Org Foundation. Retrieved 2009-08-29.
[6] Xiph.Org. Speex: A Free Codec For Free Speech. Retrieved 2009-09-01.
[7] Xiph.Org. Introduction to CELP Coding. Retrieved 2009-09-01.
[8] Adobe. FLV format specification. Retrieved 2016-04-18.
[9] Xiph.Org. Speex releases - pre-1.0 - NEWS and ChangeLog in speex-0.0.1.tar.gz. Retrieved 2009-09-01.
[10] Xiph.Org. Speex FAQ – Under what license is Speex released? Retrieved 2009-09-01.
[11] Xiph.Org (2003-03-24). Speex reaches 1.0; Xiph.Org now a 501(c)(3) Non-Profit Organization. Retrieved 2009-09-01.
[12] Speex homepage. Retrieved 2017-04-11.
[13] Codec Description.
[14] Charles V. Wright, Lucas Ballard, Scott E. Coull, Fabian Monrose, Gerald M. Masson. Spot me if you can: Uncovering Spoken Phrases in Encrypted VoIP Conversations.
[15] As announced by Ralph Giles, the Theora codec maintainer, on LugRadio episode 29.
[16] "A free codec for free speech". Speex. Retrieved 2012-12-29.
[17] Lascelles, LLC. "The worlds most convenient Audio Bible". Ear Bible. Retrieved 2012-12-29.
[18] Lascelles, LLC. "Support". Ear Bible. Retrieved 2012-12-29.
[19] "PA/VA, PSIM Software and Station Management Systems > ASL Safety & Security". Asl-control.co.uk. Retrieved 2012-12-29.
[20] IPAM 400: IP Based Intelligent Public Address Amplifier - User Manual.
[21] "LabQuest 2 > Vernier Software & Technology". Vernier.com. 2012-05-23. Retrieved 2012-12-29.
[22] "Legal Notices". Google Inc. Retrieved 2014-12-05.
[23] Deconstructing Google Mobile's Voice Search on the iPhone.
[24] Adobe (2008). Flash Player 10 Datasheet. Retrieved 2009-09-01.
[25] AskMeFlash.com (2009-05-10). Speex for Flash. Retrieved 2009-08-12.
[26] AskMeFlash.com (2009-05-10). Speex vs Nellymoser. Retrieved 2009-08-12.
[27] Adobe Systems Incorporated (November 2008). "Video File Format Specification, Version 10" (PDF). Adobe Systems Incorporated. Retrieved 2014-12-05.
[28] Phil Burk. "JavaSonics ListenUp voice recording Applet for Java that uploads messages to a web server". Javasonics.com. Retrieved 2012-12-29.
[29] "Applidium — News". Applidium.com. Retrieved 2012-12-29.

External links

RFC 5574 – RTP Payload Format for the Speex Codec
Official Speex homepage
Plugin & software page
JSpeex is a port of Speex to the Java platform
NSpeex is a port of Speex to the .NET platform and Silverlight based on JSpeex
CSpeex is a port of Speex to the .NET platform based on JSpeex
RFC 5334 – Ogg Media Types
http://dirac.epucfe.eu/projets/wakka.php?wiki=P12AB10 - Speex Encoder Player (César MBUMBA)
