Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.
LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.
Overview
LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
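The analysis step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular codec: it assumes the autocorrelation method with the Levinson-Durbin recursion for estimating the predictor, and the function names (`autocorrelation`, `levinson_durbin`, `inverse_filter`) are hypothetical names chosen for this sketch. Windowing and pitch estimation are left out.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Estimate predictor coefficients from autocorrelation values.

    The prediction is x_hat[n] = sum_k a[k] * x[n - 1 - k].
    Returns (a, err), where err is the remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:          # silent frame: nothing left to predict
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err           # reflection coefficient of this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(signal, a):
    """Remove the predicted part of each sample; what is left is the residue."""
    residue = []
    for n in range(len(signal)):
        predicted = sum(a[k] * signal[n - 1 - k]
                        for k in range(len(a)) if n - 1 - k >= 0)
        residue.append(signal[n] - predicted)
    return residue
```

For a frame that really is the output of a resonant tube excited by a single pulse, the residue collapses to little more than that pulse, which is why it is cheaper to encode than the waveform itself.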
The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
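The synthesis direction can be sketched the same way. The example below assumes an idealized buzz made of evenly spaced impulses and an all-pole filter standing in for the tube; `pulse_train` and `synthesis_filter` are hypothetical names for this sketch, and a real synthesizer would also mix in noise for voiceless sounds.

```python
def pulse_train(length, period, gain=1.0):
    """Idealized buzz: one impulse of height `gain` every `period` samples."""
    return [gain if n % period == 0 else 0.0 for n in range(length)]


def synthesis_filter(excitation, a):
    """All-pole filter: y[n] = excitation[n] + sum_k a[k] * y[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(a[k] * out[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e + feedback)
    return out


# Source (buzz) run through the filter (tube) gives the synthetic signal.
speech_like = synthesis_filter(pulse_train(400, 80), [1.058, -0.81])
```

The pulse period sets the pitch and the gain sets the loudness, while the coefficient list carries the formant information.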
Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
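As a rough sketch of the framing step, the snippet below cuts a signal into non-overlapping frames and provides a Hamming window that could be applied to each frame before analysis. The 8000 Hz sample rate used in the usage line is an assumption (a common telephone-band rate), as are the function names; practical coders often use overlapping frames instead.

```python
import math


def split_into_frames(signal, sample_rate, frames_per_second=50):
    """Cut `signal` into consecutive frames of sample_rate // frames_per_second samples."""
    hop = sample_rate // frames_per_second
    if hop <= 0:
        raise ValueError("frame rate is higher than the sample rate")
    return [signal[i:i + hop] for i in range(0, len(signal) - hop + 1, hop)]


def hamming(length):
    """Hamming window, used to taper the frame edges before analysis."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


# One second of audio at an assumed 8000 Hz, 50 frames per second.
frames = split_into_frames([0.0] * 8000, 8000, 50)
windowed = [[s * w for s, w in zip(frame, hamming(len(frame)))] for frame in frames]
```

At these assumed settings each frame holds 160 samples, or 20 ms of signal.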
Early history
Linear prediction (signal estimation) goes back to at least the 1940s, when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise.
Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias in 1955 published two papers on predictive coding of signals.
Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone (NTT) in 1966, and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.
In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.
LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s and 1980s. In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear. This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993. Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.
LPC is the basis for voice-over-IP (VoIP) technology. In 1972, Bob Kahn of ARPA with Jim Forgie of Lincoln Laboratory (LL) and Dave Walden of BBN Technologies started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory.
LPC coefficient representations
LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.
There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.
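The link between these representations and stability can be illustrated with a short sketch. It assumes the predictor convention x̂[n] = Σ a[k]·x[n−1−k] and uses the step-down recursion to turn direct coefficients into reflection coefficients; the synthesis filter is stable exactly when every reflection coefficient has magnitude below 1. The log area ratio is taken here as log((1 + k)/(1 − k)); sign conventions differ between texts, and all function names are hypothetical.

```python
import math


def reflection_coefficients(a):
    """Step-down recursion from direct predictor coefficients.

    Returns the list of reflection coefficients, or None when some |k| >= 1,
    which means the corresponding synthesis filter is not stable.
    """
    a = list(a)
    ks = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        k = a[m - 1]
        if abs(k) >= 1.0:
            return None
        ks[m - 1] = k
        # Coefficients of the predictor one order lower.
        a = [(a[j] + k * a[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return ks


def is_stable(a):
    return reflection_coefficients(a) is not None


def reflection_to_lar(k):
    """Log area ratio (one common sign convention, assumed here)."""
    return math.log((1.0 + k) / (1.0 - k))


def lar_to_reflection(g):
    """Inverse mapping; any finite LAR maps back into (-1, 1)."""
    return math.tanh(g / 2.0)
```

With these helpers, `is_stable([1.98, -0.9801])` holds while `is_stable([1.98, -1.0001])` does not, so an error of about 0.02 in one direct coefficient is enough to break the filter. A quantized log area ratio, by contrast, always decodes to a reflection coefficient strictly inside (−1, 1).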
Applications
LPC is the most widely used method in speech coding and speech synthesis. It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.
LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.
LPC predictors are used in lossless audio codecs such as Shorten, MPEG-4 ALS and FLAC, and in the SILK audio codec.
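How a predictor serves lossless coding can be sketched with integer arithmetic. The example assumes a fixed second-order predictor, 2·x[n−1] − x[n−2], similar in spirit to the simple fixed predictors such codecs offer; it is not the bitstream format of any of them, the function names are hypothetical, and the entropy coding of the residual is omitted.

```python
def encode(samples):
    """Replace each integer sample by its prediction error."""
    residual = []
    for n, s in enumerate(samples):
        if n < 2:
            residual.append(s)  # warm-up samples are stored verbatim
        else:
            predicted = 2 * samples[n - 1] - samples[n - 2]
            residual.append(s - predicted)
    return residual


def decode(residual):
    """Rebuild the samples exactly by repeating the same prediction."""
    samples = []
    for n, e in enumerate(residual):
        if n < 2:
            samples.append(e)
        else:
            samples.append(e + 2 * samples[n - 1] - samples[n - 2])
    return samples
```

Because the decoder forms the same integer prediction from already decoded samples, the round trip is exact, and the residual values are typically much smaller than the samples and therefore cheaper to store.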
LPC has received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.
See also
* Akaike information criterion
* Audio compression
* Code-excited linear prediction (CELP)
* FS-1015
* FS-1016
* Generalized filtering
* Linear prediction
* Linear predictive analysis
* Pitch estimation
* Warped linear predictive coding
External links
* real-time LPC analysis/synthesis learning software
* 30 years later Dr Richard Wiggins Talks Speak & Spell development