Linear predictive coding (LPC) is a method used mostly in

audio signal processing Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves— longitudinal waves which travel through air, consist ...

and speech processing for representing the spectral envelope of a

digital Digital usually refers to something using discrete digits, often binary digits. Technology and computing Hardware *Digital electronics, electronic circuits which operate using digital signals ** Digital camera, which captures and stores digital ...

signal of

speech Speech is a human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words (that is, all English words sound different from all French words, even if they are th ...

in compressed form, using the information of a

linear Linearity is the property of a mathematical relationship ('' function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear ...

predictive model. LPC is the most widely used method in speech coding and

speech synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal langua ...

. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for

voiceless In linguistics, voicelessness is the property of sounds being pronounced without the larynx vibrating. Phonologically, it is a type of phonation, which contrasts with other states of the larynx, but some object that the word phonation implies ...

sounds such as

sibilant Sibilants are fricative consonants of higher amplitude and pitch, made by directing a stream of air with the tongue towards the teeth. Examples of sibilants are the consonants at the beginning of the English words ''sip'', ''zip'', ''ship'', and ...

s and

plosive In phonetics, a plosive, also known as an occlusive or simply a stop, is a pulmonic consonant in which the vocal tract is blocked so that all airflow ceases. The occlusion may be made with the tongue tip or blade (, ), tongue body (, ), lip ...

s). Although apparently crude, this Source–filter model is actually a close approximation of the reality of speech production. The

glottis The glottis is the opening between the vocal folds (the rima glottidis). The glottis is crucial in producing vowels and voiced consonants. Etymology From Ancient Greek ''γλωττίς'' (glōttís), derived from ''γλῶττα'' (glôtta), v ...

(the space between the vocal folds) produces the buzz, which is characterized by its intensity (

loudness In acoustics, loudness is the subjective perception of sound pressure. More formally, it is defined as, "That attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud". The relation of ph ...

) and

frequency Frequency is the number of occurrences of a repeating event per unit of time. It is also occasionally referred to as ''temporal frequency'' for clarity, and is distinct from ''angular frequency''. Frequency is measured in hertz (Hz) which is eq ...

(pitch). The

vocal tract The vocal tract is the cavity in human bodies and in animals where the sound produced at the sound source ( larynx in mammals; syrinx in birds) is filtered. In birds it consists of the trachea, the syrinx, the oral cavity, the upper part of th ...

(the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to

formant In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmoni ...

s, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue. The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give an intelligible speech with good compression.

Early history

Linear prediction (signal estimation) goes back to at least 1940s when

Norbert Wiener Norbert Wiener (November 26, 1894 – March 18, 1964) was an American mathematician and philosopher. He was a professor of mathematics at the Massachusetts Institute of Technology (MIT). A child prodigy, Wiener later became an early researcher ...

developed a mathematical theory for calculating the best

filters Filter, filtering or filters may refer to: Science and technology Computing * Filter (higher-order function), in functional programming * Filter (software), a computer program to process a data stream * Filter (video), a software component that ...

and predictors for detecting signals hidden in noise. Soon after

Claude Shannon Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American mathematician, electrical engineer, and cryptographer known as a "father of information theory". As a 21-year-old master's degree student at the Massachusetts I ...

established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias in 1955 published two papers on predictive coding of signals. Linear predictors were applied to speech analysis independently by Fumitada Itakura of

Nagoya University , abbreviated to or NU, is a Japanese national research university located in Chikusa-ku, Nagoya. It was the seventh Imperial University in Japan, one of the first five Designated National University and selected as a Top Type university of ...

and Shuzo Saito of

Nippon Telegraph and Telephone , commonly known as NTT, is a Japanese telecommunications company headquartered in Tokyo, Japan. Ranked 55th in ''Fortune'' Global 500, NTT is the fourth largest telecommunications company in the world in terms of revenue, as well as the third la ...

in 1966 and in 1967 by

Bishnu S. Atal Bishnu S. Atal (born 1933) is an Indian physicist and engineer. He is a noted researcher in acoustics, and is best known for developments in speech coding. He advanced linear predictive coding (LPC) during the late 1960s to 1970s, and developed ...

, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on

maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed stati ...

; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on

principle of maximum entropy The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition ...

. In 1969, Itakura and Saito introduced method based on

partial correlation In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two ...

(PARCOR),

Glen Culler Glen Jacob Culler (July 7, 1927 – May 3, 2003) was an American professor of electrical engineering and an important early innovator in the development of the Internet. Culler joined the University of California, Santa Barbara (UCSB) mathematics ...

proposed real-time speech encoding, and

presented an LPC speech coder at the Annual Meeting of the

Acoustical Society of America The Acoustical Society of America (ASA) is an international scientific society founded in 1929 dedicated to generating, disseminating and promoting the knowledge of acoustics and its practical applications. The Society is primarily a voluntary org ...

. In 1971, realtime LPC using 16-bit LPC hardware was demonstrated by

Philco-Ford Philco (an acronym for Philadelphia Battery Company) is an American electronics manufacturer headquartered in Philadelphia. Philco was a pioneer in battery, radio, and television production. In 1961, the company was purchased by Ford and, from 19 ...

; four units were sold. LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s1980s. In 1978, Atal and Vishwanath ''et al.'' of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech

codec A codec is a device or computer program that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder on a signal or ...

called

adaptive predictive coding Adaptive predictive coding (APC) is a narrowband analog-to-digital conversion that uses a one-level or multilevel sampling system in which the value of the signal at each sampling instant is predicted according to a linear function of the past valu ...

, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear. This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993. Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985. LPC is the basis for

voice-over-IP Voice over Internet Protocol (VoIP), also called IP telephony, is a method and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. The terms Internet ...

(VoIP) technology. In 1972,

Bob Kahn Robert Elliot Kahn (born December 23, 1938) is an American electrical engineer who, along with Vint Cerf, first proposed the Transmission Control Protocol (TCP) and the Internet Protocol (IP), the fundamental communication protocols at the hea ...

of ARPA, with Jim Forgie (

Lincoln Laboratory The MIT Lincoln Laboratory, located in Lexington, Massachusetts, is a United States Department of Defense federally funded research and development center chartered to apply advanced technology to problems of national security. Research and d ...

, LL) and Dave Walden (

BBN Technologies Raytheon BBN (originally Bolt Beranek and Newman Inc.) is an American research and development company, based next to Fresh Pond in Cambridge, Massachusetts, United States. In 1966, the Franklin Institute awarded the firm the Frank P. Brow ...

), started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the

ARPANET The Advanced Research Projects Agency Network (ARPANET) was the first wide-area packet-switched network with distributed control and one of the first networks to implement the TCP/IP protocol suite. Both technologies became the technical fou ...

at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the

Network Voice Protocol The Network Voice Protocol (NVP) was a pioneering computer network protocol for transporting human speech over packetized communications networks. It was an early example of Voice over Internet Protocol technology. History NVP was first defin ...

, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable. There are more advanced representations such as

log area ratio Log area ratios (LAR) can be used to represent reflection coefficients (another form for linear prediction coefficients) for transmission over a channel. While not as efficient as line spectral pairs (LSPs), log area ratios are much simpler to comp ...

s (LAR), line spectral pairs (LSP) decomposition and

reflection coefficient In physics and electrical engineering the reflection coefficient is a parameter that describes how much of a wave is reflected by an impedance discontinuity in the transmission medium. It is equal to the ratio of the amplitude of the reflected ...

s. Of these, especially LSP decomposition has gained popularity since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.

Applications

LPC is the most widely used method in speech coding and

. It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, such as in the

GSM The Global System for Mobile Communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe the protocols for second-generation ( 2G) digital cellular networks used by mobile devices such ...

standard, for example. It is also used for

secure Secure may refer to: * Security, being protected against danger or loss(es) ** Physical security, security measures that are designed to deny unauthorized access to facilities, equipment, and resources **Information security, defending information ...

wireless, where voice must be

digitize DigitizationTech Target. (2011, April). Definition: digitization. ''WhatIs.com''. Retrieved December 15, 2021, from https://whatis.techtarget.com/definition/digitization is the process of converting information into a digital (i.e. computer-r ...

d, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I. LPC synthesis can be used to construct

vocoder A vocoder (, a portmanteau of ''voice'' and ''encoder'') is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation. The vocoder ...

s where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in

electronic music Electronic music is a genre of music that employs electronic musical instruments, digital instruments, or circuitry-based music technology in its creation. It includes both music made using electronic and electromechanical means ( electro ...

Paul Lansky Paul Lansky (born June 18, 1944, in New York) is an American composer. Biography Paul Lansky (born 1944) is an American composer. He was educated at Manhattan's High School of Music and Art, Queens College and Princeton University, studying wit ...

made the well-known computer music piece notjustmoreidlechatter using linear predictive coding

A 10th-order LPC was used in the popular 1980s Speak & Spell (game), Speak & Spell educational toy. LPC predictors are used in Shorten, MPEG-4 ALS,

FLAC FLAC (; Free Lossless Audio Codec) is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference softwa ...

SILK Silk is a natural protein fiber, some forms of which can be woven into textiles. The protein fiber of silk is composed mainly of fibroin and is produced by certain insect larvae to form cocoons. The best-known silk is obtained from th ...

audio codec, and other lossless audio codecs. LPC received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.

References

Robert M. Gray, IEEE Signal Processing Society, Distinguished Lecturer Program

External links

real-time LPC analysis/synthesis learning software30 years later Dr Richard Wiggins Talks Speak & Spell development
{{DEFAULTSORT:Linear Predictive Coding Audio codecs Lossy compression algorithms Speech codecs Digital signal processing Japanese inventions