[dsp HISTORY] Bishnu S. Atal

The History of Linear Predictionl

n 1965, while attending a seminar on coding (LPC) method, then the multi- an airplane. Since the plane moves, one information theory as part of my pulse LPC and the code-excited LPC. must predict its position at the time the Ph.D. course work at the Polytechnic shell will reach the plane. Wiener’s work Institute of Brooklyn, New York, I PREDICTION AND appeared in his famous monograph [2] came across a paper [1] that intro- PREDICTIVE CODING published in 1949. Iduced me to the concept of predictive The concept of prediction was at least a At about the same time, Claude coding. At the time, there would have quarter of a century old by the time I Shannon made a major contribution [3] been no way to foresee how this concept learned about it. In the 1940s, Norbert to the theory of communication of sig- would influence my work over the years. Wiener developed a mathematical theory nals. His work established a mathe- Looking back, that paper and the ideas for calculating the best filters and pre- matical framework for coding and that it generated must have been the dictors for detecting signals hidden in transmission of signals. Shannon also force that started the ball rolling. My noise. Wiener worked during the described a system for efficient encoding story, told next, recollects the events that Second World War on the problem of of English text based on the predictability led to proposing the linear prediction aiming antiaircraft guns to shoot down of the English language. Following the work of Shannon and Wiener, Peter Elias published two papers EDITORS’ INTRODUCTION [1], [4] in 1955 on predictive coding of Bishnu S. Atal was born on 10 May 1933 in Kanpur, India. He obtained a B.S. degree in signals. Predictive coding is a remarkably physics (1952) from the University of Lucknow, India, a diploma in electrical communi- simple concept, where prediction is used cation engineering (1955) from the Indian Institute of Science, Bangalore, and a Ph.D. degree (1968) in electrical engineering from the Polytechnic Institute of Brooklyn, to achieve efficient coding of signals. New York. He was a lecturer in acoustics at the Department of Electrical (The prediction could be linear or non- Communication Engineering, Indian Institute of Science, Bangalore (1957–1960). Next, linear, but linear prediction is the sim- Dr. Atal was with Bell Labs (1961–1996) and AT&T Labs Research (1997–2002), Florham plest. Moreover, a comprehensive Park, New Jersey, where he was a technology director. He became a Bell Laboratories mathematical theory exists for applying Fellow (1994) and an AT&T Fellow (1997). Since 2002, he has been an affiliate profes- linear prediction to signals.) In predictive sor with the Department of Electrical Engineering, University of Washington, Seattle. coding, both the transmitter and the Dr. Atal’s research work has spanned various aspects of digital signal processing with receiver store the past values of the application to the general area of . He coedited the books Advances transmitted signal, and from them pre- in Speech Processing (1991), Papers in Speech Communication: Speech Processing dict the current value of the signal. The (1991), Speech Production (1991), Speech Perception (1991), and Speech and Audio transmitter does not transmit the signal Coding for Wireless and Network Applications (1993). He is the recipient of many awards, including the IEEE Centennial Medal (1984), the IEEE Morris N. Liebmann but the encoded prediction error (predic- Award (1986), the IEEE Signal Processing Society Award (1993), and the Benjamin tion residual), which is the difference Franklin Medal in Electrical Engineering (2003). between the signal and its predicted When he does not meditate on professional topics (his happiest professional value. At the receiver, this transmitted moment so far was the invention of multipulse ), Bishnu Atal prediction error is added to the predicted enjoys traveling, collecting stamps, and reading. His reading tastes are diverse, from value to recover the signal. For efficient Indian history books and the famous epic of The Mahabharata, translated from its coding, the successive terms of the pre- fundamental Sanskrit form and edited by J.A.B. Van Buitenen, to famous speeches in diction error should be uncorrelated and Lend Me Your Ears, edited by former presidential speech writer William Safire, and the entropy of its distribution should be successful habits of visionary companies in Built to Last by James Collins and Jerry as small as possible. Porras. In his story, Dr. Atal tells the tale of his work on linear prediction, work that has When I came across Elias’s paper also proved to be built to last. —Adriana Dumitras and George Moschytz while attending the seminar on informa- “DSP History” column editors tion theory mentioned earlier, I found [email protected], [email protected] the concept of predictive coding to be very interesting. However, there were

IEEE SIGNAL PROCESSING MAGAZINE [154] MARCH 2006 1053-5888/06/$20.00©2006IEEE two problems. First, my colleagues at work for speech signals. A first step in ms. Next, an 8-tap linear predictor, pre- Bell Labs in the speech research area determining the usefulness of predictive dicting over a short interval of 1 ms, was showed no interest. Speech compression coding for reducing the bit rate for trans- used to predict the samples of the predic- research at that time was primarily in mission of speech over digital channels is tion error that remained after the first the area of channel (voice to find out if the first-order entropy of prediction. These eight predictor coeffi- coder), a device invented by Homer the distribution of prediction error signal cients were also adjusted every 5 ms. We Dudley in 1930s. Dudley said that the is significantly smaller than the corre- called the method adaptive predictive real information in speech was carried by sponding entropy of the speech signal; coding [5]–[7] (others would call our low-frequency modulation signals corre- smaller entropy of the prediction error method simply linear predictive coding) sponding to slow motion of the vocal could produce a lower bit rate. and demonstrated its speech quality at organs and, therefore, speech can be I wrote a program and the results the IEEE International Conference on compressed by extracting such signals were encouraging. For speech sampled at Speech Communication held in Boston from speech. Although the channel 6.67 kHz, the first-order entropy of pre- in 1967, using two-level encoding of the did not produce speech of suffi- diction error turned out to be 1.3 b/ prediction error. The audience heard the ciently good quality for telephone appli- sample as compared to 3.3 b/sample for signals illustrated in Figure 1, i.e., Figure cations, they were used during World the speech signal. Since the speech char- 1(a), the original speech signal, Figure War II to provide secure voice communi- acteristics vary with time, the linear pre- 1(b), the noise-like transmitted predic- cation. They remained the central theme dictor had to be adaptive. The prediction tion error, and Figure 1(c), the recon- of speech coding research for about 35 was done in two steps. First, the predic- structed speech signal. Many people years. Second, at the time my work at tion was done over a time interval com- found it hard to believe that a noise-like Bell Labs was primarily in the area of parable to a pitch period using a linear signal could recreate both periodic room acoustics. My knowledge of speech predictor consisting of an adjustable voiced speech and nonperiodic unvoiced processing was rudimentary, and my delay and gain factor, adjusted every 5 speech at the receiver. The predictive knowledge in the area of speech com- pression was practically zero. Both of these problems would disappear faster than I thought.

LINEAR PREDICTIVE CODING Original Speech Just a few months later, in 1966, I was one day in Manfred R. Schroeder’s office at Bell Labs when John Pierce brought a (a) tape showing a new speech time com- pression system. Schroeder was not impressed. After listening to the tape, he Transmitted said that there had to be a better way of Prediction Error compressing speech. Manfred mentioned the work in image coding by Chape (b) Cutler at Bell Labs based on differential pulse code modulation (DPCM) tech- nique, which was a simplified version of predictive coding. Our discussions that Reconstructed afternoon kept me thinking. Since my Speech recently started Ph.D. thesis work focused on automatic speaker recogni- (c) tion, I hesitated to start a side project on speech compression at that time. Also, I 02550 had doubts whether I could add anything Time (ms) useful to this crowded field of research. However, Manfred’s remarks at our meet- ing made a deep impression. Waiting at [FIG1] The waveforms for (a) the original speech signal, (b) the transmitted prediction the subway station for a train to error signal, and (c) the reconstructed speech signal in the adaptive predictive coder. The Brooklyn, I convinced myself that I prediction error was quantized by a two-level quantizer whose step size was adjusted once every 5 ms. The prediction combined two predictors: one predicting over a relatively should do some exploratory investigation long time interval comparable to a pitch period and another predicting over a shorter to determine if predictive coding could interval of 1 ms.

IEEE SIGNAL PROCESSING MAGAZINE [155] MARCH 2006 [dsp HISTORY] continued coder produced natural-sounding speech our case, the prediction was adaptive and order predictors are about 10 and 20 db, and speech quality was good, except for was conducted over a long time interval, respectively, below the average speech the presence of a low-level crackling at least as long as a pitch period. spectrum for voiced speech. A small noise that could be heard with careful Prediction over a long time interval is value of the prediction error is necessary listening over headphones. necessary to produce a “white” noise-like for producing small quantizing noise in a Further research on adaptive predic- prediction error. Figure 2(a) shows the predictive coding system. tive coding brought the bit rate for high- spectrum of the original speech signal, Independently of the work at Bell quality speech coding to 16 kb/s, a Figure 2(b) shows the spectrum of the Labs on predictive coding, in 1966 reduction by a factor of four over the prediction error with a 16th order pre- Fumitada Itakura and Shuzo Saito at pulse code modulation (PCM) rate. By dictor, and Figure 2(c) shows the spec- NTT, Japan, developed a statistical contrast, predictive coding systems such trum of the prediction error with a 128th approach for the estimation of speech as DPCM, which have been used earlier order predictor for a frame of voiced spectral density using a maximum likeli- for speech coding, used a fixed predictor speech. The spectrum envelope of predic- hood method [8], [9]. Their work was and only a few past samples for predic- tion error with a 16th order predictor is originally presented at conferences in tion. Consequently, they could not pro- flat, but the spectral fine structure is not. Japan and, therefore, was not known duce high-quality speech at bit rates Moreover, the average spectral levels of worldwide. The mathematics behind significantly lower than the PCM rate. In the prediction error with 16th and 128th their statistical approach were slightly different than that of linear prediction, but the overall results were identical. Based on their statistical approach, 60 Itakura and Saito introduced new speech parameters such as the partial autocorre- Speech lation (PARCOR) coefficients for efficient encoding of linear prediction coeffi- dB cients. Later, Itakura discovered the line spectrum pairs, which are now widely used in speech coding applications.

0 FROM LPC THEORY TO APPLICATIONS (a) LPC rapidly became a very popular topic in speech research. A large number of Prediction Error p=16 40 people contributed valuable ideas for the application of the basic theory of linear prediction to speech analysis and syn- dB thesis. The excitement was evident at practically every technical meeting. 0 Research on LPC vocoders gained momentum partly due to increased (b) funding from the U.S. government and Prediction Error p=128 its selection for the 2.4 kb/s secure-voice 30 standard LPC10 [10]. LPC required a lot dB of computations when it started being applied to speech. Fortunately, comput- 0 er technology was rapidly evolving. By 1973, the first compact real-time LPC vocoder had been implemented at Philco-Ford. In 1978, Texas Instruments 01234 (c) introduced a popular LPC-based toy that was called “Speak and Spell.” Frequency (kHz) Although LPC vocoders produced [FIG2] The spectrum of (a) the original speech signal, (b) the prediction error with a 16th intelligible speech at low bit rates, the order predictor, and (c) the prediction error with a 128th order predictor for a frame of speech quality was not good enough for voiced speech. The average spectral levels of the prediction error with 16th and 128th order predictors are about 10 and 20 db, respectively, below the average speech commercial . The need for spectrum for voiced speech. high-quality speech coding was on the

IEEE SIGNAL PROCESSING MAGAZINE [156] MARCH 2006 horizon as commercial telephony began computationally tractable solution was duce high-quality speech at 8 kb/s and developing in new directions. In 1977, obtained by determining the location even lower. They form the basis of most Bell Labs constructed and operated a and amplitude of pulses, one pulse at a current international standards for digi- prototype cellular system for mobile time, thereby converting a problem with tal speech transmission and provide communication. Two years later, the first many unknowns into a problem with speech coding for hundreds of millions commercial cellular telephone system only two unknowns. The results were of cell phones and computers worldwide. began to operate in Tokyo. It became startling. When I heard the synthetic The introduction of linear prediction clear that the increasing demand for speech from a multipulse LPC synthesizer, techniques started a new era in speech cellular phones could not be met without it sounded just like the original and com- processing about 40 years ago. Since reducing the bit rate for speech trans- pletely natural, with no background then, these techniques have found mission. How to produce high-quality noise or distortions. Using multipulse numerous applications. We were fortu- speech at low bit rates was still LPC, we brought the bit rate for high nate that, by the time LPC methods unresolved. quality speech to 9.6 kb/s. became the focus of speech processing research, the digital hardware was evolv- EXTENSIONS: MULTIPULSE LPC EXTENSIONS: CODE-EXCITED LPC ing at a revolutionary pace with the Synthesizing speech of high quality on The multipulse idea quickly evolved into invention of integrated circuits (IC) by computers was a difficult problem and code-excited linear prediction (CELP) Jack Kilby in 1958 and the discovery of the topic of a meeting that I had on the [12], [13]. Ideally, the transmitted signal Moore’s law by Gordon Moore in 1965. afternoon of 20 February 1981 with Joel in predictive coders must be random and The advances in IC design leading to fast Remde. He was a linguist by training and therefore the pulses for the multipulse (DSP) chips and, an expert system-level programmer. I synthesizer could be selected from a in speech coding, made possible the spent a few hours talking to Joel explain- codebook populated with “random white large-scale deployment of speech com- ing the problem, but he was not noise” sequences. The searches for select- pression technology, from cell phones to impressed. Instead, he tried to grasp the ing pulse sequences in CELP coders voice-over-IP telephones. The progress in problem by asking probing questions of a required a large number of computa- discovering novel techniques for speech fundamental nature. After many discus- tions; the first simulation of CELP in processing is likely to continue. The IC sions, we figured out that one could pro- 1983 required over 150 s on a Cray-1 and DSP revolutions are still going duce speech of any desired quality by supercomputer to process 1 s of speech. strong and will provide big opportunities providing a sufficient number of pulses The processing capabilities of digital for applying sophisticated speech pro- at the input of an all-pole filter. That was hardware (microprocessors and digital cessing algorithms that take advantage of the multipulse idea [11] for speech cod- signal processors) increased roughly 100 the exciting and evolving digital telecom- ing and it focused the speech coding times over the next ten years and, by munication environment. research on a different track: speech cod- 1993, the CELP coders were implement- ing became basically a problem of gener- ed for real-time operation on a single ating a pulse sequence that will produce DSP chip. CELP coders are able to pro- (continued on page 161) at the synthesizer a speech signal that to human ears will sound identical to the original speech. The basic philosophy of Original Speech multipulse LPC is illustrated in Figure 3. The synthetic speech samples at the out- put of an all-pole filter are compared Weighted sn with the corresponding samples of the Error Multipulse ^ E un Speech sn Weighting w original speech signal and the resulting Excitation – Synthesizer Filter W error signal is weighted to produce an Generator approximate measure of the perceptual difference between the original and syn- Error thetic speech signals. Joel left that Friday Synthetic Speech afternoon for a two-week vacation in Egypt and I got busy developing the pro- cedure for multipulse analysis. [FIG3] Block diagram of the basic multipulse analysis. A speech synthesizer, typically an In general, a procedure for multipulse all-pole LPC filter, produces samples of synthetic speech. The synthetic speech samples Sˆ n analysis would be impractical if one are compared with the corresponding speech samples Sn of the original speech signal to seeks to determine all the pulses at once produce an error signal. The error signal is then weighted to produce an approximate measure E of the perceptual difference between the original and synthetic speech even over a short interval of time (5– w signals. The multipulse excitation generator produces a sequence of excitation pulses un 10 ms). I discovered that an efficient and that minimizes the weighted error.

IEEE SIGNAL PROCESSING MAGAZINE [157] MARCH 2006 different phases. However, this increase use of simple concepts whenever possi- AUTHOR in design/code complexity probably does ble. (“Things should be described as sim- Mark Borgerding is a principal engineer not outweigh the meager cost of multi- ply as possible, but no simpler.”—A. at 3dB Labs, Inc., a small company spe- plying by a complex phasor. Einstein.) We owe it to ourselves as engi- cializing in DSP consulting and contract If coarse-grained mixing is unaccept- neers to realize those simple concepts as engineering services. He is often found able, mixing in the time domain is a bet- efficiently as possible. lurking on the comp.dsp newsgroup or ter solution. The general solution to The familiar and simple concepts tinkering with his KISSFFT library. allow multiple channels with multiple shown in Figure 2 may be used for the mixing frequencies is to postpone the design of mixed, filtered, and decimated REFERENCES mixing operation until the filtered, deci- channels. The design may be implement- [1] A. Oppenheimer and R. Schafer, Discrete-Time mated data is back in the time domain. ed more efficiently using the equivalent Signal Processing. Upper Saddle River, NJ: Prentice- Hall, 1989. If mixing is performed in the time structure shown in Figure 3. [2] L. Rabiner and B. Gold, Theory and Application domain: of Digital Signal Processing. Englewood Cliffs, NJ: ■ All filters must be specified in SUMMARY Prentice-Hall, 1975. terms of the input frequency (i.e., In this article, we outlined considera- [3] R. Lyons, Understanding Digital Signal Processing, 2/E. Upper Saddle River, NJ: Prentice- nonshifted) spectrum. tions for implementing multiple OS Hall, 2004. ■ The complex sinusoid used for mix- channels with decimation and mixing in [4] S. Orfanidis, Introduction to Signal Processing. ing the output signal must be created the frequency domain, as well as supply- Englewood Cliffs, NJ: Prentice-Hall, 1995.

at the output rate. ing recommendations for choosing FFT [5] M. Frerking, Digital Signal Processing in size. We also provided implementation Communication Systems. New York: Chapman & Hall, 1994. PUTTING IT ALL TOGETHER guidance to streamline this powerful [6] R. Crochiere and L. Rabiner, Multirate Digital By making efficient implementations of multichannel filtering, down-conversion, Signal Processing. Englewood Cliffs, NJ: Prentice- conceptually simple tools, we help our- and decimation process. Hall, 1983. selves to create simple designs that are as [7] M. Boucheret, I. Mortensen, and H. Favaro, “Fast efficient as they are easy to describe. ACKNOWLEDGMENTS convolution filter banks for satellite payloads with on-board processing,” IEEE J. Select. Areas. Humans are affected greatly by the sim- I would like to thank my wife, Elaine, for Commun., vol. 17, no. 2, pp. 238–248, Feb. 1999. plicity of the concepts and tools used in helping me find the time to write this, [8] S. Muramatsu and H. Kiya, “Extended overlap- designing and describing a system. We and David Evans, for being a DSP mentor add and -save methods for multirate signal process- ing,” IEEE Trans. Signal Processing, vol. 45, no. 9, owe it to ourselves as humans to make and sounding board. pp. 2376–2380, Sep. 1997. [SP]

[dsp HISTORY] continued from page 157

REFERENCES [1] P. Elias, “Predictive coding I,” IRE Trans. Inform. [6] B.S. Atal and M.R. Schroeder, “Adaptive predic- [10] T.E. Tremain, “The government standard linear Theory, vol. IT-1 no. 1, pp. 16–24, Mar. 1955. tive coding of speech,” Bell Syst. Tech. J., vol. 49 no. predictive coding algorithm: LPC10,” Speech 8, pp. 1973–1986, Oct. 1970. Technol., vol. 1, pp. 40–49, Apr. 1982. [2] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, [7] B.S. Atal and S.L. Hanauer, “Speech analysis and [11] B.S. Atal and J.R. Remde, “A new model of MA: MIT Press, 1949. synthesis by linear prediction of the speech wave,” J. LPC excitation for producing natural-sounding Acoust. Soc. Amer., vol. 50, pp. 637–655, Aug. 1971. speech at low bit rates,” in Proc. ICASSP’82, May [3] C.E. Shannon, “A mathematical theory of com- 1982, pp. 614–617. munication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, [8] S. Saito, Fukumura, and F. Itakura, “Theoretical 623–656, 1948. consideration of the statistical optimum recognition [12] B.S. Atal and M.R. Schroeder, “Stochastic coding of the spectral density of speech”, J. Acoust. Soc. of speech signals at very low bit rates,” in Proc. Int. [4] P. Elias, “Predictive coding II,” IRE Trans. Inform. Japan, Jan. 1967. Conf. Commun., ICC’84, May 1984, pp. 1610–1613. Theory, vol. IT-1 no. 1, pp. 24–33, Mar. 1955. [9] F. Itakura and S. Saito, “A statistical method for [13] M.R. Schroeder and B.S. Atal, “Code-excited lin- [5] B.S. Atal and M.R. Schroeder, “Predictive coding estimation of speech spectral density and formant ear prediction (CELP): High-quality speech at very of speech,” in Proc. 1967 Conf. Communications and frequencies,” Electron. Commun. Japan, vol. 53-A, low bit rates,” in Proc ICASSP’85, Mar. 1985, pp. Proc., Nov. 1967, pp. 360–361. pp. 36–43, 1970. 937–940. [SP]

IEEE SIGNAL PROCESSING MAGAZINE [161] MARCH 2006