Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique and a useful method for encoding good-quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal that remains after the filtered modeled signal is subtracted is called the residue. The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted somewhere else.

LPC synthesizes the speech signal by reversing the process: it uses the buzz parameters and the residue to create a source signal, uses the formants to create a filter (which represents the tube), and runs the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
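The analysis and synthesis steps described above can be sketched for a single frame as follows. This is a minimal illustration, not a reference implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion to estimate the filter (the text does not say which estimation method is used), it does not estimate pitch or loudness, and all function names and the toy test signal are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Estimate predictor coefficients from autocorrelation values.

    Returns (a, err), where sample n is predicted as
    sum(a[k] * x[n - 1 - k] for k in range(order)).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def predict(history, coeffs, n):
    """Linear prediction of sample n from the samples before it."""
    return sum(c * history[n - 1 - k]
               for k, c in enumerate(coeffs) if n - 1 - k >= 0)


def inverse_filter(frame, coeffs):
    """Analysis: remove the predictable (formant) part, leaving the residue."""
    return [x - predict(frame, coeffs, n) for n, x in enumerate(frame)]


def synthesize(residual, coeffs):
    """Synthesis: run the residue back through the all-pole filter."""
    out = []
    for n, e in enumerate(residual):
        out.append(e + predict(out, coeffs, n))
    return out


# Toy "voiced" frame (an assumption for illustration): an impulse train
# standing in for the buzz, passed through a two-pole resonator standing
# in for the tube.
frame = []
for n in range(240):
    buzz = 1.0 if n % 40 == 0 else 0.0
    prev1 = frame[n - 1] if n >= 1 else 0.0
    prev2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(buzz + 1.3 * prev1 - 0.8 * prev2)

coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
residual = inverse_filter(frame, coeffs)
rebuilt = synthesize(residual, coeffs)
```

In this sketch the residue carries less energy than the frame itself, and feeding it back through the synthesis filter reproduces the frame up to floating-point rounding. A practical coder would additionally window the frame, quantize the parameters, and replace or compress the residue.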

Early history

Linear prediction (signal estimation) goes back to at least the 1940s, when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise. Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias published two papers on predictive coding of signals in 1955.

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966, and by Bishnu S. Atal, Manfred R. Schroeder and John Burg in 1967. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.

In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.

LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s and 1980s. In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear. This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993. Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.

LPC is the basis for voice-over-IP (VoIP) technology. In 1972, Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to an informal Lincoln Laboratory history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmitting the filter coefficients directly (see linear prediction for a definition of the coefficients) is undesirable, since they are very sensitive to errors: a very small error can distort the whole spectrum or, worse, make the prediction filter unstable. More robust representations exist, such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor and keeps spectral errors local for small coefficient deviations.
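The robustness of these representations can be illustrated with reflection coefficients and log area ratios. The sketch below relies on the standard result that an all-pole filter built from reflection coefficients is stable when every coefficient has magnitude below 1; because any finite log area ratio maps back into that range, coarse quantization of the log area ratios cannot produce an unstable filter. The function names, the sign convention for the log area ratio (conventions differ between texts) and the quantizer step size are assumptions made for illustration.

```python
import math


def step_up(reflection):
    """Convert reflection coefficients to direct-form predictor coefficients."""
    a = []
    for i, k in enumerate(reflection):
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
    return a


def log_area_ratios(reflection):
    """Map each reflection coefficient in (-1, 1) to an unbounded value."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection]


def from_log_area_ratios(lars):
    """Inverse mapping; the result always lies strictly inside (-1, 1)."""
    return [math.tanh(g / 2.0) for g in lars]


# Hypothetical example values, not taken from any real codec.
reflection = [0.9, -0.6, 0.3]
step = 0.25  # hypothetical quantizer step size
quantized = [round(g / step) * step for g in log_area_ratios(reflection)]
recovered = from_log_area_ratios(quantized)
predictor = step_up(recovered)
```

Here `recovered` differs slightly from `reflection` because of the quantization, but every value still has magnitude below 1, so the filter described by `predictor` remains stable. Quantizing the direct-form coefficients with the same step size offers no such guarantee.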

Applications

LPC is the most widely used method in speech coding and speech synthesis. It is generally used for speech analysis and resynthesis. Phone companies use it as a form of voice compression, for example in the GSM standard. It is also used for secure wireless communication, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders in which musical instruments serve as the excitation signal for the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding.

A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy. LPC predictors are used in Shorten, MPEG-4 ALS, FLAC and other lossless audio codecs, as well as in the SILK speech codec. LPC has also received some attention as a tool for the tonal analysis of violins and other stringed musical instruments.
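The role of a linear predictor in lossless coding can be sketched with integer arithmetic: the encoder stores only the prediction error, and the decoder runs the same predictor to rebuild the samples exactly. The fixed second-order predictor, the function names and the sample values below are assumptions chosen for illustration; real codecs typically select or fit a predictor per block and entropy-code the residual, which is not shown here.

```python
def encode(samples):
    """Residual of a fixed predictor: pred[n] = 2*x[n-1] - x[n-2]."""
    residual = []
    for n, x in enumerate(samples):
        p1 = samples[n - 1] if n >= 1 else 0
        p2 = samples[n - 2] if n >= 2 else 0
        residual.append(x - (2 * p1 - p2))
    return residual


def decode(residual):
    """Rebuild the samples by adding each residual to the same prediction."""
    out = []
    for n, e in enumerate(residual):
        p1 = out[n - 1] if n >= 1 else 0
        p2 = out[n - 2] if n >= 2 else 0
        out.append(e + (2 * p1 - p2))
    return out


# Hypothetical smooth integer samples.
samples = [0, 3, 7, 12, 18, 25, 33, 42]
residual = encode(samples)   # [0, 3, 1, 1, 1, 1, 1, 1]
restored = decode(residual)  # identical to samples
```

For a smooth signal the residual values are much smaller than the samples, so they can be stored in fewer bits, while the integer arithmetic guarantees exact reconstruction.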

External links

Real-time LPC analysis/synthesis learning software
30 years later Dr Richard Wiggins Talks Speak & Spell development