Linear predictive coding (LPC) is a method used mostly in

audio signal processing Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting ...

and speech processing for representing the

spectral envelope In signal processing, the power spectrum S_(f) of a continuous time signal x(t) describes the distribution of power into frequency components f composing that signal. According to Fourier analysis, any physical signal can be decomposed into ...

of a

digital Digital usually refers to something using discrete digits, often binary digits. Businesses *Digital bank, a form of financial institution *Digital Equipment Corporation (DEC) or Digital, a computer company *Digital Research (DR or DRI), a software ...

signal A signal is both the process and the result of transmission of data over some media accomplished by embedding some variation. Signals are important in multiple subject fields including signal processing, information theory and biology. In ...

speech Speech is the use of the human voice as a medium for language. Spoken language combines vowel and consonant sounds to form units of meaning like words, which belong to a language's lexicon. There are many different intentional speech acts, suc ...

in compressed form, using the information of a

linear In mathematics, the term ''linear'' is used in two distinct senses for two different properties: * linearity of a '' function'' (or '' mapping''); * linearity of a '' polynomial''. An example of a linear function is the function defined by f(x) ...

predictive model. LPC is the most widely used method in

speech coding Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic da ...

and

speech synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal langua ...

. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low

bit rate In telecommunications and computing, bit rate (bitrate or as a variable ''R'') is the number of bits that are conveyed or processed per unit of time. The bit rate is expressed in the unit bit per second (symbol: bit/s), often in conjunction ...

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for

voiced Voice or voicing is a term used in phonetics and phonology to characterize speech sounds (usually consonants). Speech sounds can be described as either voiceless (otherwise known as ''unvoiced'') or voiced. The term, however, is used to refe ...

sounds), with occasional added hissing and popping sounds (for

voiceless In linguistics, voicelessness is the property of sounds being pronounced without the larynx vibrating. Phonologically, it is a type of phonation, which contrasts with other states of the larynx, but some object that the word phonation implies v ...

sounds such as

sibilant Sibilants (from 'hissing') are fricative and affricate consonants of higher amplitude and pitch, made by directing a stream of air with the tongue towards the teeth. Examples of sibilants are the consonants at the beginning of the English w ...

s and

plosive In phonetics, a plosive, also known as an occlusive or simply a stop, is a pulmonic consonant in which the vocal tract is blocked so that all airflow ceases. The occlusion may be made with the tongue tip or blade (, ), tongue body (, ), lip ...

s). Although apparently crude, this Source–filter model is actually a close approximation of the reality of speech production. The

glottis The glottis (: glottises or glottides) is the opening between the vocal folds (the rima glottidis). The glottis is crucial in producing sound from the vocal folds. Etymology From Ancient Greek ''γλωττίς'' (glōttís), derived from ''γ ...

(the space between the vocal folds) produces the buzz, which is characterized by its intensity (

loudness In acoustics, loudness is the subjectivity, subjective perception of sound pressure. More formally, it is defined as the "attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud". The relat ...

) and

frequency Frequency is the number of occurrences of a repeating event per unit of time. Frequency is an important parameter used in science and engineering to specify the rate of oscillatory and vibratory phenomena, such as mechanical vibrations, audio ...

(pitch). The

vocal tract The vocal tract is the cavity in human bodies and in animals where the sound produced at the sound source (larynx in mammals; syrinx in birds) is filtered. In birds, it consists of the trachea, the syrinx, the oral cavity, the upper part of t ...

(the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to

formant In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmo ...

s, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue. The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give an intelligible speech with good compression.

Early history

Linear prediction (signal estimation) goes back to at least the 1940s when

Norbert Wiener Norbert Wiener (November 26, 1894 – March 18, 1964) was an American computer scientist, mathematician, and philosopher. He became a professor of mathematics at the Massachusetts Institute of Technology ( MIT). A child prodigy, Wiener late ...

developed a mathematical theory for calculating the best

filters Filtration is a physical process that separates solid matter and fluid from a mixture. Filter, filtering, filters or filtration may also refer to: Science and technology Computing * Filter (higher-order function), in functional programming * Fil ...

and predictors for detecting signals hidden in noise. Soon after

Claude Shannon Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American mathematician, electrical engineer, computer scientist, cryptographer and inventor known as the "father of information theory" and the man who laid the foundations of th ...

established a general theory of coding, work on predictive coding was done by C. Chapin Cutler, Bernard M. Oliver and Henry C. Harrison. Peter Elias in 1955 published two papers on predictive coding of signals. Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of

Nippon Telegraph and Telephone (NTT) is a Japanese telecommunications holding company headquartered in Tokyo, Japan. Ranked 55th in ''Fortune'' Global 500, NTT is the fourth largest telecommunications company in the world in terms of revenue, as well as the third largest pu ...

in 1966 and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on

maximum likelihood estimation In statistics, maximum likelihood estimation (MLE) is a method of estimation theory, estimating the Statistical parameter, parameters of an assumed probability distribution, given some observed data. This is achieved by Mathematical optimization, ...

; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on principle of maximum entropy. In 1969, Itakura and Saito introduced method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the

Acoustical Society of America The Acoustical Society of America (ASA) is an international scientific society founded in 1929 dedicated to generating, disseminating and promoting the knowledge of acoustics and its practical applications. The Society is primarily a voluntary org ...

. In 1971, realtime LPC using

16-bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two ...

LPC hardware was demonstrated by Philco-Ford; four units were sold. LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s1980s. In 1978, Atal and Vishwanath ''et al.'' of BBN developed the first variable-rate LPC algorithm. The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech

codec A codec is a computer hardware or software component that encodes or decodes a data stream or signal. ''Codec'' is a portmanteau of coder/decoder. In electronic communications, an endec is a device that acts as both an encoder and a decoder o ...

called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear. This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993. Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985. LPC is the basis for voice-over-IP (VoIP) technology. In 1972, Bob Kahn of ARPA with Jim Forgie of Lincoln Laboratory (LL) and Dave Walden of

BBN Technologies Raytheon BBN (originally Bolt, Beranek and Newman, Inc.) is an American research and development company based in Cambridge, Massachusetts. In 1966, the Franklin Institute awarded the firm the Frank P. Brown Medal, in 1999 BBN received the ...

started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400 bit/ s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the

ARPANET The Advanced Research Projects Agency Network (ARPANET) was the first wide-area packet-switched network with distributed control and one of the first computer networks to implement the TCP/IP protocol suite. Both technologies became the tec ...

at 3500 bit/s between Culler-Harrison and Lincoln Laboratory.

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable. There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, especially LSP decomposition has gained popularity since it ensures the stability of the predictor, and spectral errors are local for small coefficient deviations.

Applications

LPC is the most widely used method in

and

. It is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, such as in the

GSM The Global System for Mobile Communications (GSM) is a family of standards to describe the protocols for second-generation (2G) digital cellular networks, as used by mobile devices such as mobile phones and Mobile broadband modem, mobile broadba ...

standard, for example. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I. LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in

electronic music Electronic music broadly is a group of music genres that employ electronic musical instruments, circuitry-based music technology and software, or general-purpose electronics (such as personal computers) in its creation. It includes both music ...

. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy. LPC predictors are used in Shorten,

MPEG-4 ALS MPEG-4 Audio Lossless Coding, also known as MPEG-4 ALS, is an extension to the MPEG-4 Part 3 audio standard to allow lossless audio compression. The extension was finalized in December 2005 and published as ISO/ IEC 14496-3:2005/Amd 2:2006 in 20 ...

FLAC FLAC (; Free Lossless Audio Codec) is an audio coding format for lossless compression of digital audio, developed by the Xiph.Org Foundation, and is also the name of the free software project producing the FLAC tools, the reference software ...

SILK Silk is a natural fiber, natural protein fiber, some forms of which can be weaving, woven into textiles. The protein fiber of silk is composed mainly of fibroin and is most commonly produced by certain insect larvae to form cocoon (silk), c ...

audio codec, and other lossless audio codecs. LPC has received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.

References

External links

real-time LPC analysis/synthesis learning software30 years later Dr Richard Wiggins Talks Speak & Spell development

{{DEFAULTSORT:Linear Predictive Coding Audio codecs Lossy compression algorithms Speech codecs Digital signal processing Japanese inventions Data compression