HOME

TheInfoList



OR:

Codec 2 is a low-bitrate speech audio
codec A codec is a computer hardware or software component that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder o ...
(
speech coding Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic da ...
) that is
patent A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited period of time in exchange for publishing an sufficiency of disclosure, enabling discl ...
free and
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
. Codec 2 compresses speech using
sinusoidal A sine wave, sinusoidal wave, or sinusoid (symbol: ∿) is a periodic wave whose waveform (shape) is the trigonometric sine function. In mechanics, as a linear motion over time, this is '' simple harmonic motion''; as rotation, it correspond ...
coding, a method specialized for human
speech Speech is the use of the human voice as a medium for language. Spoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon. There are many different intentional speech acts, suc ...
. Bit rates of 3200 to 450 bit/s have been successfully created. Codec 2 was designed to be used for
amateur radio Amateur radio, also known as ham radio, is the use of the radio frequency radio spectrum, spectrum for purposes of non-commercial exchange of messages, wireless experimentation, self-training, private recreation, radiosport, contesting, and emer ...
and other high compression voice applications.


Overview

The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from Opus). Codec 2 consists of 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450 bit/s codec modes. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality. The speech codec uses 16-bit
PCM Pulse-code modulation (PCM) is a method used to Digital signal (signal processing), digitally represent analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony and other digital audio application ...
sampled audio, and outputs packed digital bytes. When sent packed digital bytes, it outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz. The
reference implementation In the software development process, a reference implementation (or, less frequently, sample implementation or model implementation) is a program that implements all requirements from a corresponding specification. The reference implementation ...
is open source and is freely available in a
GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
repository. The source code is released under the terms of version 2.1 of the
GNU Lesser General Public License The GNU Lesser General Public License (LGPL) is a free-software license published by the Free Software Foundation (FSF). The license allows developers and companies to use and integrate a software component released under the LGPL into their own ...
(LGPL). It is programmed in C and current source code requires
floating-point arithmetic In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
, although the algorithm itself does not require this. The reference software package also includes a frequency-division multiplex digital voice software modem and a graphical user interface based on
WxWidgets wxWidgets (formerly wxWindows) is a widget toolkit and tools library for creating graphical user interfaces (GUIs) for cross-platform applications. wxWidgets enables a program's GUI code to compile and run on several computer platforms with no s ...
. The software is developed on
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
and a port for
Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
created with
Cygwin Cygwin ( ) is a free and open-source Unix-like environment and command-line interface (CLI) for Microsoft Windows. The project also provides a software repository containing open-source packages. Cygwin allows source code for Unix-like operati ...
is offered in addition to an Apple
MacOS macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
version. The codec has been presented in various conferences and has received the 2012
ARRL The American Radio Relay League (ARRL) is the largest membership association of amateur radio enthusiasts in the United States. ARRL is a non-profit organization and was co-founded on April 6, 1914, by Hiram Percy Maxim and Clarence D. Tuska of H ...
Technical Innovation Award, and the Linux Australia Conference's Best Presentation Award.


Technology

Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is declared
voiced Voice or voicing is a term used in phonetics and phonology to characterize speech sounds (usually consonants). Speech sounds can be described as either voiceless (otherwise known as ''unvoiced'') or voiced. The term, however, is used to refe ...
(vowel) or unvoiced (consonant). Codec 2 uses sinusoidal coding to model speech, which is closely related to that of multi-band excitation codecs. Sinusoidal coding is based on regularities (periodicity) in the pattern of overtone frequencies and layers harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes called Line spectral pairs, or LSP, on top of a determined
fundamental frequency The fundamental frequency, often referred to simply as the ''fundamental'' (abbreviated as 0 or 1 ), is defined as the lowest frequency of a Periodic signal, periodic waveform. In music, the fundamental is the musical pitch (music), pitch of a n ...
of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the
harmonics In physics, acoustics, and telecommunications, a harmonic is a sinusoidal wave with a frequency that is a positive integer multiple of the ''fundamental frequency'' of a periodic signal. The fundamental frequency is also called the ''1st harm ...
are encoded, and with the LSP's are exchanged across a channel in a digital format. The LSP coefficients represent the
Linear Predictive Coding Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model ...
(LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters. The digital bytes are in a bit-field format that have been packed together into bytes. These bit fields are also optionally
gray code The reflected binary code (RBC), also known as reflected binary (RB) or Gray code after Frank Gray (researcher), Frank Gray, is an ordering of the binary numeral system such that two successive values differ in only one bit (binary digit). For ...
d before being grouped together. The gray coding may be useful if sending raw, but normally an application will just burst the bit fields out. The bit fields make up the various parameters that are stored or exchanged (pitch, energy, voicing Booleans, LSP's, etc.). For example, Mode 3200, has 20 ms of audio converted to 64 bits. So 64 bits will be output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. These 64 bits are sent as 8 bytes to the application, which has to unwrap the bit fields, or send the bytes over a data channel. Another example is Mode 1300, which is sent 40 ms of audio, and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are sent as 7 bytes to the application or data channel.


Adoption

Codec 2 is currently used in several radios and Software Defined Radio Systems * FreeDV * FlexRadio 6000 series * SM1000 * Quisk * M17 Project Codec2 has also been integrated into FreeSWITCH and there's a patch available for support in
Asterisk The asterisk ( ), from Late Latin , from Ancient Greek , , "little star", is a Typography, typographical symbol. It is so called because it resembles a conventional image of a star (heraldry), heraldic star. Computer scientists and Mathematici ...
. There was an FM-to-Codec2 digital voice repeater in earth orbit on amateur radio
CubeSat A CubeSat is a class of small satellite with a form factor of cubes. CubeSats have a mass of no more than per unit,, url=https://static1.squarespace.com/static/5418c831e4b0fa4ecac1bacd/t/5f24997b6deea10cc52bb016/1596234122437/CDS+REV14+2020-07-3 ...
''LilacSat-1'' (call sign ON02CN, QB50 constellation), which was launched and subsequently deployed from the
International Space Station The International Space Station (ISS) is a large space station that was Assembly of the International Space Station, assembled and is maintained in low Earth orbit by a collaboration of five space agencies and their contractors: NASA (United ...
in 2017.


History

The prominent
free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...
advocate and radio amateur Bruce Perens lobbied for the creation of a free speech codec for operation at less than 5 kbit/s. Since he did not have the background himself, he approached Jean-Marc Valin in 2008, who introduced him to lead developer David Grant Rowe, who has worked with Valin on
Speex {{More citations needed, date=May 2025 The Speex project is an attempt to create a free software speech codec, unencumbered by patent restrictions. Speex is licensed under the BSD License and is used with the Xiph.org Foundation's Ogg containe ...
on several occasions. Rowe himself was also a radio amateur (amateur radio
call sign In broadcasting and radio communications, a call sign (also known as a call name or call letters—and historically as a call signal—or abbreviated as a call) is a unique identifier for a transmitter station. A call sign can be formally as ...
VK5DGR) and had experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first satellite telephony systems ( Mobilesat). He agreed to the task and announced his decision to work on a format on August 21, 2009. He built on the research and findings from his doctoral thesis. The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln labs) from the mid-1980s. In August 2010, David Rowe published version 0.1 alpha. Version 0.2 was released towards the end of 2011, introducing a mode with 1,400 bits/s and significant improvements in quantization. In January 2012, at
linux.conf.au linux.conf.au (often abbreviated as lca or LCA) is Australasia's regional Linux and open source conference. It is a roaming conference, held in a different Australian or New Zealand city every year, coordinated by Linux Australia and organised ...
, Jean-Marc Valin helped improve the quantization of line spectral pairs, which Rowe is less familiar with. After several changes to the available bit rate modes in winter and spring 2011/2012, 2,400, 1,400 and 1,200 bit/s modes were available after May of that year. Codec 2 700C, a new mode with a bit rate of 700 bit/s, was finished in early 2017. In July 2018 an experimental 450 bit/s mode was demonstrated, which was developed as part of a master thesis at the University of Erlangen-Nuremberg. By clever training of the vector quantization the data rate could be further reduced based on the principle of the 700C mode.


References


External links


Official website

FreeDV
{{Compression Software Implementations Speech codecs Free audio codecs