Speex is an audio compression format specifically tuned for the
reproduction of human speech and also a free software speech codec
that may be used in VoIP applications and podcasts.[6] It is based on
the CELP speech coding algorithm.[7]
Speex claims to be free of any
patent restrictions and is licensed under the revised (3-clause) BSD
license. It may be used with the Ogg container format or directly
transmitted over UDP/RTP. It may also be used with the FLV container
format.[8]
The Speex designers see their project as complementary to the Vorbis
general-purpose audio compression project.
Speex is a lossy format, i.e. quality is permanently degraded to
reduce file size.
The Speex project was created on February 13, 2002.[9] The first
development versions of Speex were released under the LGPL license,
but as of version 1.0 beta 1, Speex is released under Xiph's version
of the (revised) BSD license.[10]
Speex 1.0 was announced on March 24, 2003,
after a year of development.[11] The last stable version of Speex
encoder and decoder is 1.2.0.[3]
Xiph.Org now considers Speex obsolete; its successor is the more
modern Opus codec, which surpasses its performance in all areas.[12]
Description
Speex is targeted at voice over IP (VoIP) and file-based compression.
The design goal was a codec optimized for high-quality speech at low
bit rates. To achieve this, the codec uses multiple bit rates and
supports ultra-wideband (32 kHz sampling rate), wideband (16 kHz
sampling rate) and narrowband (telephone quality, 8 kHz sampling
rate). Since Speex was designed for VoIP instead of cell phone use,
the codec must be robust to lost packets, but not to corrupted ones.
All this led to the choice of code excited linear prediction (CELP) as
the encoding technique to use for Speex.[7] One of the main reasons is
that CELP has long proven that it could do the job and scale well to
both low bit rates (as evidenced by DoD CELP @ 4.8 kbit/s) and high
bit rates (as with G.728 @ 16 kbit/s). The main characteristics can be
summarized as follows:
Free software/open-source, patent and royalty-free.
Integration of narrowband and wideband in the same bit-stream.
Wide range of bit rates available (from 2 kbit/s to 44 kbit/s).
Dynamic bit rate switching and variable bit-rate (VBR).
Voice activity detection (VAD, integrated with VBR) (not working from version 1.2).
Variable complexity.
Ultra-wideband mode at 32 kHz (up to 48 kHz).
Intensity stereo encoding option.
Features
Sampling rate
Speex is mainly designed for three different sampling rates:
8 kHz (the same sampling rate used to transmit telephone calls),
16 kHz, and 32 kHz. These are respectively referred to as
narrowband, wideband and ultra-wideband.
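The mapping between sampling rate and band name can be written down as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the 20 ms frame length is an assumption that the paragraph above does not state.

```python
# Band names as given in the text, keyed by sampling rate in Hz.
BANDS = {
    8000: "narrowband",
    16000: "wideband",
    32000: "ultra-wideband",
}

# Assumption (not stated in the text): one frame covers 20 ms of audio.
FRAME_MS = 20


def band_for(sample_rate_hz: int) -> str:
    """Return the band name for one of the three main sampling rates."""
    if sample_rate_hz not in BANDS:
        raise ValueError(f"not a main Speex sampling rate: {sample_rate_hz}")
    return BANDS[sample_rate_hz]


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of samples in one frame under the 20 ms assumption."""
    return sample_rate_hz * FRAME_MS // 1000


for rate in sorted(BANDS):
    print(rate, band_for(rate), samples_per_frame(rate))
```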
Quality
Speex encoding is controlled most of the time by a quality parameter
that ranges from 0 to 10. In constant bit-rate (CBR) operation, the
quality parameter is an integer, while for variable bit-rate (VBR),
the parameter is a real (floating point) number.
Complexity (variable)
With Speex, it is possible to vary the complexity allowed for the
encoder. This is done by controlling how the search is performed with
an integer ranging from 1 to 10 in a way similar to the -1 to -9
options to gzip compression utilities. For normal use, the noise level
at complexity 1 is between 1 and 2 dB higher than at complexity
10, but the CPU requirements for complexity 10 are about five times
higher than for complexity 1. In practice, the best trade-off is
between complexity 2 and 4,[13] though higher settings are often
useful when encoding non-speech sounds like DTMF tones, or if encoding
is not in real-time.
Variable bit-rate (VBR)
Variable bit-rate (VBR) allows a codec to change its bit rate
dynamically to adapt to the "difficulty" of the audio being encoded.
In the case of Speex, sounds like vowels and high-energy transients
require a higher bit rate to achieve good quality, while fricatives
(e.g. s and f sounds) can be coded adequately with fewer bits. For
this reason, VBR can achieve lower bit rate for the same quality, or a
better quality for a certain bit rate. Despite its advantages, VBR has
three main drawbacks: first, by only specifying quality, there is no
guarantee about the final average bit-rate. Second, for some real-time
applications like voice over IP (VoIP), what counts is the maximum
bit-rate, which must be low enough for the communication channel.
Third, encryption of VBR-encoded speech may not ensure complete
privacy, as phrases can still be identified, at least in a controlled
setting with a small dictionary of phrases,[14] by analysing the
pattern of variation of the bit rate.
Average bit-rate (ABR)
Average bit-rate solves one of the problems of VBR, as it dynamically
adjusts VBR quality in order to meet a specific target bit-rate.
Because the quality/bit-rate is adjusted in real-time (open-loop), the
global quality will be slightly lower than that obtained by encoding
in VBR with exactly the right quality setting to meet the target
average bitrate.
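The idea of steering VBR quality toward a target rate can be illustrated with a toy feedback loop. This is a minimal sketch and not the algorithm used by the Speex encoder: the bit-cost model `toy_frame_bits`, the controller gain and the 20 ms frame length are all assumptions made up for the illustration.

```python
from itertools import cycle, islice


def toy_frame_bits(quality: float, difficulty: float) -> int:
    """Hypothetical VBR cost model: more bits for higher quality or harder audio."""
    return round((40 + 30 * quality) * difficulty)


def encode_abr(difficulties, target_bps, frame_ms=20, gain=0.5):
    """Adjust the quality after every frame so the running average bit rate
    drifts toward target_bps. Returns (average bit rate, final quality)."""
    quality = 5.0
    total_bits = 0
    average_bps = 0.0
    for count, difficulty in enumerate(difficulties, start=1):
        total_bits += toy_frame_bits(quality, difficulty)
        average_bps = total_bits * 1000 / (count * frame_ms)
        # Under target: raise quality. Over target: lower it.
        error_kbps = (target_bps - average_bps) / 1000
        quality = min(10.0, max(0.0, quality + gain * error_kbps))
    return average_bps, quality


# Alternate easy, medium and hard frames for one minute of audio.
frames = islice(cycle([0.6, 1.0, 1.4]), 3000)
average_bps, final_quality = encode_abr(frames, target_bps=8000)
print(f"average: {average_bps:.0f} bit/s, final quality: {final_quality:.2f}")
```

Because the quality is corrected only after each frame has been encoded, the early frames are coded at a setting that later turns out to be wrong, which is the source of the small quality penalty described above.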
Voice Activity Detection (VAD)
When enabled, voice activity detection detects whether the audio being
encoded is speech or silence/background noise. VAD is always
implicitly activated when encoding in VBR, so the option is only
useful in non-VBR operation. In this case, Speex detects non-speech
periods and encodes them with just enough bits to reproduce the
background noise. This is called "comfort noise generation" (CNG).
The last version in which VAD worked properly was 1.1.12; since
version 1.2 it has been replaced with a simple Any Activity Detection.
Discontinuous transmission (DTX)
Discontinuous transmission is an addition to VAD/VBR operation that
allows transmission to stop completely when the background noise is
stationary. In a file, 5 bits are used for each missing frame
(corresponding to 250 bit/s).
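The 250 bit/s figure follows from the frame rate. The sketch below reproduces the arithmetic, assuming 20 ms frames (50 frames per second); the frame length is an assumption not stated in the paragraph above, and the function names are illustrative.

```python
DTX_BITS_PER_MISSING_FRAME = 5  # figure quoted in the text
FRAME_MS = 20                   # assumption: one frame covers 20 ms


def dtx_bit_rate(bits_per_frame: int = DTX_BITS_PER_MISSING_FRAME,
                 frame_ms: int = FRAME_MS) -> float:
    """Bit rate in bit/s of a file that contains only missing-frame markers."""
    frames_per_second = 1000 / frame_ms
    return bits_per_frame * frames_per_second


def dtx_bytes(seconds: float) -> float:
    """Bytes needed to store a stretch of stationary background noise."""
    return dtx_bit_rate() * seconds / 8


print(dtx_bit_rate())  # bit/s while nothing is transmitted
print(dtx_bytes(60))   # bytes for one minute of stationary noise
```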
Perceptual enhancement
Perceptual enhancement is a part of the decoder which, when turned on,
tries to reduce (the perception of) the noise produced by the
coding/decoding process. In most cases, perceptual enhancement makes
the sound further from the original objectively (signal-to-noise
ratio), but in the end it still sounds better (subjective
improvement).
Algorithmic delay
Every codec introduces a delay in the transmission. For Speex, this
delay is equal to the frame size, plus some amount of "look-ahead"
required to process each frame. In narrowband operation (8 kHz),
the delay is 30 ms, while for wideband (16 kHz), the delay
is 34 ms. These values do not account for the CPU time it takes
to encode or decode the frames.
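The delay figures can be split into frame size and look-ahead. The sketch below assumes 20 ms frames, which the text does not state, and derives the look-ahead by subtraction from the quoted totals, so the look-ahead values are inferred rather than taken from the source.

```python
FRAME_MS = 20  # assumption: one frame covers 20 ms

# (sampling rate in Hz, total algorithmic delay in ms) as quoted in the text.
QUOTED_DELAY = {
    "narrowband": (8000, 30),
    "wideband": (16000, 34),
}


def delay_breakdown(band: str) -> dict:
    """Split the quoted delay into frame and look-ahead parts."""
    rate_hz, total_ms = QUOTED_DELAY[band]
    lookahead_ms = total_ms - FRAME_MS  # inferred, not given in the text
    return {
        "frame_samples": rate_hz * FRAME_MS // 1000,
        "lookahead_ms": lookahead_ms,
        "lookahead_samples": rate_hz * lookahead_ms // 1000,
        "total_samples": rate_hz * total_ms // 1000,
    }


for band in QUOTED_DELAY:
    print(band, delay_breakdown(band))
```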
Applications
There is a large base of applications supporting the Speex codec.
Examples include:
Streaming applications like teleconference (e.g. TeamSpeak, Mumble)
VoIP systems (e.g. Asterisk)
Videogames (e.g. Xbox Live,[15] Civilization 4, DropMix vocal tracks, ...)
Audio processing applications.
Most of these are based on the DirectShow filter or OpenACM codec
(e.g. Microsoft NetMeeting) on Microsoft Windows, or Xiph.org's
reference implementation, libspeex, on Linux (e.g. Ekiga). There are
also plugins for many audio players. See the plugin and software page
on the speex.org site for more details.[16]
The media type for Speex is audio/ogg while contained by Ogg, and
audio/speex (previously audio/x-speex) when transported through RTP or
without container.
The United States Army's Land Warrior system, designed by General
Dynamics, also uses Speex for VoIP on an EPLRS radio designed by
Raytheon.
The Ear Bible[17] is a single-ear headphone with a built-in Speex
player with 1 GB of flash memory,[18] preloaded with a recording of
the New American Standard Bible.
ASL Safety & Security's[19] Linux-based VIPA OS software,[20] which
is used in long line public address systems and voice alarm systems at
major international air transport hubs and rail networks, also uses
Speex.
The Rockbox project uses Speex for its voice interface. It can also
play Speex files on supported players, such as the Apple iPod or the
iRiver H10.
The Vernier LabQuest[21] handheld data acquisition device for science
education uses Speex for voice annotations created by students and
teachers using either the built-in or an external microphone.
The Google Mobile App for iPhone currently incorporates Speex.[22] It
has also been suggested that the new Google voice search iPhone app is
using Speex to transmit voice to Google servers for
interpretation.[23]
Adobe Flash Player supports Speex starting with Flash Player
10.0.12.36, released in October 2008.[24] Because of some bugs in
Flash Player, the recommended versions for Speex support are
10.0.22.87 and later. Speex in Flash Player can be used for both kinds
of communication, through Flash Media Server or P2P. Speex can be
decoded or converted to any format, unlike Nellymoser audio, which was
the only speech format in previous versions of Flash Player.[25][26]
Speex can also be used in the Flash Video container format (.flv),
starting with version 10 of the Video File Format Specification
(published in November 2008).[27]
The JavaSonics ListenUp[28] voice recorder uses Speex to compress
voice messages that are recorded in a browser and then uploaded to a
web server. Primary applications are language training, transcription
and social networking.
Speex is used as the voice compression algorithm in the Siri voice
assistant on the iPhone 4S.[29] Since speech recognition occurs on
Apple's servers, the Speex codec is used to minimize network
bandwidth.
See also
Comparison of audio coding formats
Opus (audio format) - successor of Speex
Sources
This article uses material from the Speex Codec Manual which is
copyright © Jean-Marc Valin and licensed under the terms of the GFDL.
References
[1] "PlayOgg! - FSF - Free Software Foundation". 2010-03-17. Retrieved 2013-10-01.
[2] Jean-Marc Valin (2009). "people.xiph.org - personal webspace of the xiphs - Jean-Marc Valin". Xiph.Org. Retrieved 2009-09-11.
[3] "Speex News". Xiph.Org Foundation. Retrieved 2017-04-11.
[4] "The Speex Codec Manual - Speex License". Xiph.Org Foundation. Retrieved 2009-09-01.
[5] "Sample Xiph.Org Variant of the BSD License". Xiph.Org Foundation. Retrieved 2009-08-29.
[6] Xiph.Org. Speex: A Free Codec For Free Speech. Retrieved 2009-09-01.
[7] Xiph.Org. Introduction to CELP Coding. Retrieved 2009-09-01.
[8] Adobe. FLV format specification. Retrieved 2016-04-18.
[9] Xiph.Org. Speex releases - pre-1.0 - NEWS and ChangeLog in speex-0.0.1.tar.gz. Retrieved 2009-09-01.
[10] Xiph.Org. Speex FAQ – Under what license is Speex released? Retrieved 2009-09-01.
[11] Xiph.Org (2003-03-24). Speex reaches 1.0; Xiph.Org now a 501(c)(3) Non-Profit Organization. Retrieved 2009-09-01.
[12] Speex homepage. Retrieved 2017-04-11.
[13] Codec Description.
[14] Charles V. Wright, Lucas Ballard, Scott E. Coull, Fabian Monrose, Gerald M. Masson. Spot me if you can: Uncovering Spoken Phrases in Encrypted VoIP Conversations.
[15] As announced by Ralph Giles, the Theora codec maintainer, on LugRadio episode 29.
[16] "A free codec for free speech". Speex. Retrieved 2012-12-29.
[17] Lascelles, LLC. "The worlds most convenient Audio Bible". Ear Bible. Retrieved 2012-12-29.
[18] Lascelles, LLC. "Support". Ear Bible. Retrieved 2012-12-29.
[19] "PA/VA, PSIM Software and Station Management Systems > ASL Safety & Security". Asl-control.co.uk. Retrieved 2012-12-29.
[20] IPAM 400: IP Based Intelligent Public Address Amplifier - User Manual.
[21] "LabQuest 2 > Vernier Software & Technology". Vernier.com. 2012-05-23. Retrieved 2012-12-29.
[22] "Legal Notices". Google Inc. Retrieved 2014-12-05.
[23] Deconstructing Google Mobile's Voice Search on the iPhone.
[24] Adobe (2008). Flash Player 10 Datasheet. Retrieved 2009-09-01.
[25] AskMeFlash.com (2009-05-10). Speex for Flash. Retrieved 2009-08-12.
[26] AskMeFlash.com (2009-05-10). Speex vs Nellymoser. Retrieved 2009-08-12.
[27] Adobe Systems Incorporated (November 2008). "Video File Format Specification, Version 10" (PDF). Adobe Systems Incorporated. Retrieved 2014-12-05.
[28] Phil Burk. "JavaSonics ListenUp voice recording Applet for Java that uploads messages to a web server". Javasonics.com. Retrieved 2012-12-29.
[29] "Applidium — News". Applidium.com. Retrieved 2012-12-29.
External links
RFC 5574 – RTP Payload Format for the Speex Codec
Official Speex homepage
Plugin & software page
JSpeex is a port of Speex to the Java platform
NSpeex is a port of Speex to the .NET platform and Silverlight based on JSpeex
CSpeex is a port of Speex to the .NET platform based on JSpeex
RFC 5334 – Ogg Media Types
http://dirac.epucfe.eu/projets/wakka.php?wiki=P12AB10 - Speex Encoder Player (César MBUMBA)