Linear predictive coding (LPC) is a method used mostly in

audio signal processing Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting ...

and

speech processing Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to ...

for representing the

spectral envelope In signal processing, the power spectrum S_(f) of a continuous time signal x(t) describes the distribution of power into frequency components f composing that signal. According to Fourier analysis, any physical signal can be decomposed into ...

of a

digital Digital usually refers to something using discrete digits, often binary digits. Businesses *Digital bank, a form of financial institution *Digital Equipment Corporation (DEC) or Digital, a computer company *Digital Research (DR or DRI), a software ...

signal A signal is both the process and the result of transmission of data over some media accomplished by embedding some variation. Signals are important in multiple subject fields including signal processing, information theory and biology. In ...

speech Speech is the use of the human voice as a medium for language. Spoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon. There are many different intentional speech acts, suc ...

in compressed form, using the information of a

linear In mathematics, the term ''linear'' is used in two distinct senses for two different properties: * linearity of a '' function'' (or '' mapping''); * linearity of a '' polynomial''. An example of a linear function is the function defined by f(x) ...

predictive model Predictive modelling uses statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive mod ...

. LPC is the most widely used method in

speech coding Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic da ...

and

speech synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal langua ...

. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low

bit rate In telecommunications and computing, bit rate (bitrate or as a variable ''R'') is the number of bits that are conveyed or processed per unit of time. The bit rate is expressed in the unit bit per second (symbol: bit/s), often in conjunction ...

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for

voiced Voice or voicing is a term used in phonetics and phonology to characterize speech sounds (usually consonants). Speech sounds can be described as either voiceless (otherwise known as ''unvoiced'') or voiced. The term, however, is used to refe ...

sounds), with occasional added hissing and popping sounds (for

voiceless In linguistics, voicelessness is the property of sounds being pronounced without the larynx vibrating. Phonologically, it is a type of phonation, which contrasts with other states of the larynx, but some object that the word phonation implies v ...

sounds such as

sibilant Sibilants (from 'hissing') are fricative and affricate consonants of higher amplitude and pitch, made by directing a stream of air with the tongue towards the teeth. Examples of sibilants are the consonants at the beginning of the English w ...

s and

plosive In phonetics, a plosive, also known as an occlusive or simply a stop, is a pulmonic consonant in which the vocal tract is blocked so that all airflow ceases. The occlusion may be made with the tongue tip or blade (, ), tongue body (, ), lip ...

s). Although apparently crude, this

Source–filter model The source–filter model represents speech as a combination of a sound source, such as the vocal cords, and a linear acoustic filter, the vocal tract. While only an approximation, the model is widely used in a number of applications such as speech ...

is actually a close approximation of the reality of speech production. The

glottis The glottis (: glottises or glottides) is the opening between the vocal folds (the rima glottidis). The glottis is crucial in producing sound from the vocal folds. Etymology From Ancient Greek ''γλωττίς'' (glōttís), derived from ''γ ...

(the space between the vocal folds) produces the buzz, which is characterized by its intensity (

loudness In acoustics, loudness is the subjectivity, subjective perception of sound pressure. More formally, it is defined as the "attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud". The relat ...

) and

frequency Frequency is the number of occurrences of a repeating event per unit of time. Frequency is an important parameter used in science and engineering to specify the rate of oscillatory and vibratory phenomena, such as mechanical vibrations, audio ...

(pitch). The

vocal tract The vocal tract is the cavity in human bodies and in animals where the sound produced at the sound source (larynx in mammals; syrinx in birds) is filtered. In birds, it consists of the trachea, the syrinx, the oral cavity, the upper part of t ...

(the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to

formant In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmo ...

s, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue. The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give an intelligible speech with good compression.

Early history

Linear prediction (signal estimation) goes back to at least the 1940s when

Norbert Wiener Norbert Wiener (November 26, 1894 – March 18, 1964) was an American computer scientist, mathematician, and philosopher. He became a professor of mathematics at the Massachusetts Institute of Technology ( MIT). A child prodigy, Wiener late ...

developed a mathematical theory for calculating the best

filters Filtration is a physical process that separates solid matter and fluid from a mixture. Filter, filtering, filters or filtration may also refer to: Science and technology Computing * Filter (higher-order function), in functional programming * Fil ...

and predictors for detecting signals hidden in noise. Soon after

Claude Shannon Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American mathematician, electrical engineer, computer scientist, cryptographer and inventor known as the "father of information theory" and the man who laid the foundations of th ...

established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison.

Peter Elias Peter Elias (November 23, 1923 – December 7, 2001) was a pioneer in the field of information theory. Born in New Brunswick, New Jersey, he was a member of the Massachusetts Institute of Technology faculty from 1953 to 1991. In 1955, Elias introdu ...

in 1955 published two papers on predictive coding of signals. Linear predictors were applied to speech analysis independently by

Fumitada Itakura is a Japanese scientist. He did pioneering work in statistical signal processing, and its application to speech analysis, synthesis and coding, including the development of the linear predictive coding (LPC) and line spectral pairs (LSP) metho ...

Nagoya University , abbreviated to or NU, is a Japanese national research university located in Chikusa-ku, Nagoya. It was established in 1939 as the last of the nine Imperial Universities in the then Empire of Japan, and is now a Designated National Universit ...

and Shuzo Saito of

Nippon Telegraph and Telephone (NTT) is a Japanese telecommunications holding company headquartered in Tokyo, Japan. Ranked 55th in ''Fortune'' Global 500, NTT is the fourth largest telecommunications company in the world in terms of revenue, as well as the third largest pu ...

in 1966 and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on

maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...

; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on

principle of maximum entropy The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition ...

. In 1969, Itakura and Saito introduced method based on

partial correlation In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two ...

(PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the

Acoustical Society of America The Acoustical Society of America (ASA) is an international scientific society founded in 1929 dedicated to generating, disseminating and promoting the knowledge of acoustics and its practical applications. The Society is primarily a voluntary org ...

. In 1971, realtime LPC using

16-bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two ...

LPC hardware was demonstrated by

Philco-Ford Philco (an acronym for Philadelphia Battery Company) is an American electronics industry, electronics manufacturer headquartered in Philadelphia. Philco was a pioneer in battery, radio, and television production. In 1961, the company was purchase ...

; four units were sold. LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s1980s. In 1978, Atal and Vishwanath ''et al.'' of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech

codec A codec is a computer hardware or software component that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder o ...

called adaptive predictive coding, which used a

psychoacoustic Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, ...

coding algorithm exploiting the masking properties of the human ear. This later became the basis for the

perceptual coding Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, ...

technique used by the

MP3 MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio developed largely by the Fraunhofer Society in Germany under the lead of Karlheinz Brandenburg. It was designed to greatly reduce the amount ...

audio compression format, introduced in 1993.

Code-excited linear prediction Code-excited linear prediction (CELP) is a linear predictive speech coding algorithm originally proposed by Manfred R. Schroeder and Bishnu S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algori ...

(CELP) was developed by Schroeder and Atal in 1985. LPC is the basis for

voice-over-IP Voice over Internet Protocol (VoIP), also known as IP telephony, is a set of technologies used primarily for voice communication sessions over Internet Protocol (IP) networks, such as the Internet. VoIP enables voice calls to be transmitted as ...

(VoIP) technology. In 1972,

Bob Kahn Robert Elliot Kahn (born December 23, 1938) is an American electrical engineer who, along with Vint Cerf, first proposed the Transmission Control Protocol (TCP) and the Internet Protocol (IP), the fundamental communication protocols at the hea ...

of ARPA with Jim Forgie of

Lincoln Laboratory The MIT Lincoln Laboratory, located in Lexington, Massachusetts, is a United States Department of Defense federally funded research and development center chartered to apply advanced technology to problems of national security. Research and dev ...

(LL) and Dave Walden of

BBN Technologies Raytheon BBN (originally Bolt, Beranek and Newman, Inc.) is an American research and development company based in Cambridge, Massachusetts. In 1966, the Franklin Institute awarded the firm the Frank P. Brown Medal, in 1999 BBN received the ...

started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400

bit The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...

/ s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the

ARPANET The Advanced Research Projects Agency Network (ARPANET) was the first wide-area packet-switched network with distributed control and one of the first computer networks to implement the TCP/IP protocol suite. Both technologies became the tec ...

at 3500 bit/s between Culler-Harrison and Lincoln Laboratory.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see

linear prediction Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples. In digital signal processing, linear prediction is often called linear predictive coding (LPC) and ...

for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable. There are more advanced representations such as log area ratios (LAR),

line spectral pairs Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent linear predictive coding, linear prediction coefficients (LPC) for transmission over a channel. LSPs have several properties (e.g. smaller sensitivity to quantizatio ...

(LSP) decomposition and

reflection coefficient In physics and electrical engineering the reflection coefficient is a parameter that describes how much of a wave is reflected by an impedance discontinuity in the transmission medium. It is equal to the ratio of the amplitude of the reflected ...

s. Of these, especially LSP decomposition has gained popularity since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.

Applications

LPC is the most widely used method in

and

. It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, such as in the

GSM The Global System for Mobile Communications (GSM) is a family of standards to describe the protocols for second-generation (2G) digital cellular networks, as used by mobile devices such as mobile phones and Mobile broadband modem, mobile broadba ...

standard, for example. It is also used for secure wireless, where voice must be

digitize Digitization is the process of converting information into a digital (i.e. computer-readable) format.Collins Dictionary. (n.d.). Definition of 'digitize'. Retrieved December 15, 2021, from https://www.collinsdictionary.com/dictionary/english/ ...

encrypted In cryptography, encryption (more specifically, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the information, known as plain ...

and sent over a narrow voice channel; an early example of this is the US government's Navajo I. LPC synthesis can be used to construct

vocoder A vocoder (, a portmanteau of ''vo''ice and en''coder'') is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation. The vocoder wa ...

s where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in

electronic music Electronic music broadly is a group of music genres that employ electronic musical instruments, circuitry-based music technology and software, or general-purpose electronics (such as personal computers) in its creation. It includes both music ...

Paul Lansky Paul Lansky (born June 18, 1944, in New York City) is an American composer. Biography Paul Lansky (born 1944) is an American composer. He was educated at Manhattan's High School of Music and Art, Queens College and Princeton University, studyi ...

made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy. LPC predictors are used in Shorten,

MPEG-4 ALS MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/ IEC 14496-3:2005/Amd 2:2006 in 20 ...

FLAC FLAC (; Free Lossless Audio Codec) is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference software ...

SILK Silk is a natural fiber, natural protein fiber, some forms of which can be weaving, woven into textiles. The protein fiber of silk is composed mainly of fibroin and is most commonly produced by certain insect larvae to form cocoon (silk), c ...

audio codec An audio codec is a device or computer program capable of encoding or decoding a digital data stream (a codec) that encodes or decodes audio. In software, an audio codec is a computer program implementing an algorithm that compresses and decompres ...

, and other

lossless Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statisti ...

audio codecs. LPC has received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.

References

External links

real-time LPC analysis/synthesis learning software30 years later Dr Richard Wiggins Talks Speak & Spell development

{{DEFAULTSORT:Linear Predictive Coding Audio codecs Lossy compression algorithms Speech codecs Digital signal processing Japanese inventions Data compression