Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.

The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
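The analysis and synthesis steps described above can be sketched for a single frame. This is a minimal illustration, not a complete coder: it assumes the autocorrelation method with the Levinson-Durbin recursion (one common way to estimate the filter, not the only one), a 10th-order predictor, a 240-sample frame at 8 kHz, and a synthetic test signal. The function names `lpc_analyze`, `inverse_filter` and `synthesize` are hypothetical, and pitch and gain estimation are left out.

```python
import numpy as np

def lpc_analyze(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns a = [1, a_1, ..., a_p], the coefficients of the inverse filter
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the prediction error energy.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # i-th reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def inverse_filter(frame, a):
    """Remove the modeled resonances: residue[n] = sum_k a[k] * frame[n - k]."""
    return np.convolve(frame, a)[:len(frame)]

def synthesize(residue, a):
    """Run the residue through the all-pole filter 1 / A(z)."""
    order = len(a) - 1
    out = np.zeros(len(residue))
    for n in range(len(residue)):
        acc = residue[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Synthetic stand-in for one 30 ms frame (assumed values, not real speech).
fs = 8000
t = np.arange(240) / fs
rng = np.random.default_rng(0)
frame = (np.sin(2 * np.pi * 500 * t)
         + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.05 * rng.standard_normal(240)) * np.hamming(240)

a, err = lpc_analyze(frame, order=10)
residue = inverse_filter(frame, a)
rebuilt = synthesize(residue, a)
```

With the unquantized residue the synthesis filter undoes the inverse filter, so `rebuilt` matches `frame` up to rounding error. The compression in a real coder comes from replacing the residue with a compact description (for example pitch and gain) and quantizing the coefficients.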

Early history

Linear prediction (signal estimation) goes back to at least the 1940s, when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise. Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias in 1955 published two papers on predictive coding of signals.

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966, and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.

In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold. LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s and 1980s. In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear. This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993.

Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.

LPC is the basis for voice-over-IP (VoIP) technology. In 1972, Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory's informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.
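The sensitivity of direct coefficients, and the way reflection coefficients and log area ratios avoid it, can be illustrated with a short sketch. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the LAR definition log((1 + k) / (1 - k)); sign conventions differ between texts. All function names and the second-order example values are hypothetical. The conversion uses the standard step-down and step-up recursions, and relies on the fact that the synthesis filter 1/A(z) is stable exactly when every reflection coefficient has magnitude below 1.

```python
import numpy as np

def reflection_from_predictor(a):
    """Step-down recursion: [1, a_1..a_p] -> reflection coefficients k_1..k_p."""
    a = np.asarray(a, dtype=float)
    cur = a[1:] / a[0]
    p = len(cur)
    k = np.zeros(p)
    for i in range(p, 0, -1):
        ki = cur[i - 1]
        if abs(ki) >= 1.0:
            raise ValueError("synthesis filter 1/A(z) is not stable")
        k[i - 1] = ki
        prev = cur[:i - 1]
        cur = (prev - ki * prev[::-1]) / (1.0 - ki * ki)
    return k

def predictor_from_reflection(k):
    """Step-up recursion: reflection coefficients -> [1, a_1..a_p]."""
    a = np.array([1.0])
    for ki in k:
        ext = np.concatenate([a, [0.0]])
        a = ext + ki * ext[::-1]
    return a

def lar_from_reflection(k):
    """Log area ratios (one common sign convention)."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 + k) / (1.0 - k))

def reflection_from_lar(lar):
    """Inverse mapping; any finite LAR gives |k| < 1."""
    return np.tanh(np.asarray(lar, dtype=float) / 2.0)

def is_stable(a):
    try:
        reflection_from_predictor(a)
        return True
    except ValueError:
        return False

# A stable second-order predictor with poles close to the unit circle.
a = np.array([1.0, -1.9, 0.95])
k = reflection_from_predictor(a)

# An error of about 3% in one direct coefficient pushes a pole outside.
a_perturbed = np.array([1.0, -1.96, 0.95])
direct_ok = is_stable(a_perturbed)

# Coarsely quantizing the LARs instead still yields a stable filter.
lar_quantized = np.round(lar_from_reflection(k) * 4) / 4
a_from_lar = predictor_from_reflection(reflection_from_lar(lar_quantized))
lar_ok = is_stable(a_from_lar)
```

Here `direct_ok` comes out false while `lar_ok` stays true: the quantized LARs change the spectrum slightly but cannot produce an unstable filter, which is the property the text attributes to these representations.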

Applications

LPC is the most widely used method in speech coding and speech synthesis. It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, the SILK audio codec, and other lossless audio codecs.

LPC received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.
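The use of linear predictors in lossless codecs can be sketched as follows. The sketch assumes integer samples, integer predictor coefficients with a right shift, and warm-up samples stored verbatim; the names `encode` and `decode` are hypothetical, and this is not the bitstream format of Shorten, FLAC or any other codec named above. The example predictor 2x[n-1] - x[n-2] is a simple fixed second-order polynomial predictor.

```python
import math

def predict(history, n, coeffs, shift):
    """Integer prediction of sample n from the samples before it."""
    total = sum(c * history[n - 1 - k] for k, c in enumerate(coeffs))
    return total >> shift

def encode(samples, coeffs, shift=0):
    """Replace each sample after the warm-up with its prediction error."""
    order = len(coeffs)
    x = [int(s) for s in samples]
    residual = x[:order]                    # warm-up samples kept as they are
    for n in range(order, len(x)):
        residual.append(x[n] - predict(x, n, coeffs, shift))
    return residual

def decode(residual, coeffs, shift=0):
    """Rebuild the samples by repeating the same integer prediction."""
    order = len(coeffs)
    x = list(residual[:order])
    for n in range(order, len(residual)):
        x.append(residual[n] + predict(x, n, coeffs, shift))
    return x

# Assumed test signal: a 440 Hz tone sampled at 8 kHz, rounded to integers.
samples = [round(1000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(200)]
coeffs = [2, -1]                            # predicts 2*x[n-1] - x[n-2]

residual = encode(samples, coeffs)
restored = decode(residual, coeffs)
```

Because the encoder and decoder compute the identical integer prediction, `restored` equals `samples` exactly. The residual values are much smaller than the samples for a smooth signal, so an entropy coder (not shown) can store them in fewer bits.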

External links

real-time LPC analysis/synthesis learning software

30 years later Dr Richard Wiggins Talks Speak & Spell development