Codec 2 is a low-bitrate speech audio codec (speech coding) that is patent-free and open source. Codec 2 compresses speech using sinusoidal coding, a method specialized for human speech. Modes with bit rates from 3200 down to 450 bit/s have been implemented. Codec 2 was designed for amateur radio and other high-compression voice applications.


Overview

The codec was developed by David Grant Rowe, with support and cooperation of other researchers (e.g., Jean-Marc Valin from Opus). Codec 2 consists of 3200, 2400, 1600, 1400, 1300, 1200, 700 and 450 bit/s codec modes. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality. The encoder takes 16-bit PCM sampled audio and outputs packed bytes; the decoder takes packed bytes and outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz.

The reference implementation is open source and is freely available in a GitHub repository. The source code is released under the terms of version 2.1 of the GNU Lesser General Public License (LGPL). It is written in C, and the current source code requires floating-point arithmetic, although the algorithm itself does not. The reference software package also includes a frequency-division multiplex digital voice software modem and a graphical user interface based on wxWidgets. The software is developed on Linux; a port for Microsoft Windows created with Cygwin and an Apple macOS version are also offered. The codec has been presented at various conferences and has received the 2012 ARRL Technical Innovation Award and the Linux Australia Conference's Best Presentation Award.
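As a rough illustration of how much compression the modes listed above imply, the following sketch compares each mode's bit rate with that of uncompressed 16-bit PCM at 8 kHz. The function and constant names are illustrative assumptions and are not part of the Codec 2 library.

```python
# Illustrative arithmetic only; these names are not part of the Codec 2 API.
SAMPLE_RATE_HZ = 8000       # fixed sample rate stated in the article
BITS_PER_SAMPLE = 16        # 16-bit PCM input
MODE_BIT_RATES = [3200, 2400, 1600, 1400, 1300, 1200, 700, 450]  # bit/s


def pcm_bit_rate() -> int:
    """Bit rate of the uncompressed mono PCM input, in bit/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def compression_ratio(mode_bit_rate: int) -> float:
    """Ratio of the raw PCM bit rate to the bit rate of a codec mode."""
    return pcm_bit_rate() / mode_bit_rate


for rate in MODE_BIT_RATES:
    print(f"{rate:>5} bit/s  ->  {compression_ratio(rate):6.1f} : 1")
```

Under these assumptions, the 3200 bit/s mode corresponds to a 40:1 reduction relative to the raw PCM stream.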


Technology

Internally, parametric audio coding algorithms operate on 10 ms PCM frames using a model of the human voice. Each of these audio segments is declared voiced (vowel) or unvoiced (consonant). Codec 2 uses sinusoidal coding to model speech, which is closely related to that of multi-band excitation codecs. Sinusoidal coding is based on regularities (periodicity) in the pattern of overtone frequencies and layers harmonic sinusoids. Spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes called line spectral pairs, or LSPs, on top of a determined fundamental frequency of the speaker's voice (pitch). The (quantised) pitch and the amplitude (energy) of the harmonics are encoded and, together with the LSPs, exchanged across a channel in a digital format. The LSP coefficients represent the linear predictive coding (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.

The encoded output consists of bit fields that are packed together into bytes. These bit fields are also optionally Gray coded before being grouped together. Gray coding may be useful if the bits are sent raw, but normally an application will just burst the bit fields out. The bit fields make up the various parameters that are stored or exchanged (pitch, energy, voicing booleans, LSPs, etc.). For example, Mode 3200 converts 20 ms of audio to 64 bits. So 64 bits are output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. These 64 bits are sent as 8 bytes to the application, which has to unwrap the bit fields or send the bytes over a data channel. Another example is Mode 1300, which takes 40 ms of audio and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are sent as 7 bytes to the application or data channel.
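The frame arithmetic and bit-field packing described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names, the big-endian packing order, the zero padding of the last byte, and the field widths in the usage example are all assumptions chosen only so that the totals come to 64 and 52 bits.

```python
from typing import List, Sequence, Tuple


def frame_size(bit_rate: int, frame_ms: int) -> Tuple[int, int]:
    """Return (bits per frame, bytes per frame) for a bit rate and frame length."""
    bits = bit_rate * frame_ms // 1000
    return bits, (bits + 7) // 8  # round up to whole bytes


def binary_to_gray(value: int) -> int:
    """Reflected binary (Gray) code of a non-negative integer."""
    return value ^ (value >> 1)


def gray_to_binary(gray: int) -> int:
    """Inverse of binary_to_gray."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def pack_fields(fields: Sequence[Tuple[int, int]], use_gray: bool = False) -> bytes:
    """Pack (value, width) bit fields MSB-first; zero-pad the final byte."""
    acc, total_bits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        if use_gray:
            value = binary_to_gray(value)
        acc = (acc << width) | value
        total_bits += width
    padding = (-total_bits) % 8
    return (acc << padding).to_bytes((total_bits + padding) // 8, "big")


def unpack_fields(data: bytes, widths: Sequence[int], use_gray: bool = False) -> List[int]:
    """Recover the field values written by pack_fields."""
    remaining = sum(widths)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - remaining)
    values = []
    for width in widths:
        remaining -= width
        value = (acc >> remaining) & ((1 << width) - 1)
        values.append(gray_to_binary(value) if use_gray else value)
    return values


# Hypothetical 64-bit layout: two voicing flags, pitch, energy, ten 5-bit fields.
example_fields = [(1, 1), (0, 1), (100, 7), (17, 5)] + [(i, 5) for i in range(10)]
example_frame = pack_fields(example_fields, use_gray=True)
print(frame_size(3200, 20), frame_size(1300, 40), len(example_frame))
```

With this layout the packed frame is 8 bytes long, matching the Mode 3200 figure, while a 52-bit layout would occupy 7 bytes with 4 padding bits in the last byte.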


Adoption

Codec 2 is currently used in several radios and software-defined radio systems:
* FreeDV
* FlexRadio 6000 series
* SM1000
* Quisk
* M17 Project

Codec 2 has also been integrated into FreeSWITCH, and there is a patch available for support in Asterisk. There was an FM-to-Codec 2 digital voice repeater in Earth orbit on the amateur radio CubeSat LilacSat-1 (call sign ON02CN, QB50 constellation), which was launched and subsequently deployed from the International Space Station in 2017.


History

The prominent free software advocate and radio amateur Bruce Perens lobbied for the creation of a free speech codec for operation at less than 5 kbit/s. Since he did not have the background himself, he approached Jean-Marc Valin in 2008, who introduced him to lead developer David Grant Rowe, who had worked with Valin on Speex on several occasions. Rowe himself was also a radio amateur (amateur radio call sign VK5DGR) and had experience in creating and using voice codecs and other signal processing algorithms for speech signals. He obtained a PhD in speech coding in the 1990s and was involved in the development of one of the first satellite telephony systems (Mobilesat). He agreed to the task and announced his decision to work on a format on August 21, 2009. He built on the research and findings from his doctoral thesis. The underlying sinusoidal modelling goes back to developments by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln Labs) from the mid-1980s.

In August 2010, David Rowe published version 0.1 alpha. Version 0.2 was released towards the end of 2011, introducing a mode with 1,400 bit/s and significant improvements in quantization. In January 2012, at linux.conf.au, Jean-Marc Valin helped improve the quantization of line spectral pairs, with which Rowe was less familiar. After several changes to the available bit rate modes in winter and spring 2011/2012, 2,400, 1,400 and 1,200 bit/s modes were available after May 2012. Codec 2 700C, a new mode with a bit rate of 700 bit/s, was finished in early 2017. In July 2018 an experimental 450 bit/s mode was demonstrated, which was developed as part of a master's thesis at the University of Erlangen-Nuremberg. Building on the principle of the 700C mode, careful training of the vector quantization reduced the data rate further.


External links


Official website



FreeDV