The History of Linear Prediction the History of Linear Predictionl
Total Page:16
File Type:pdf, Size:1020Kb
[dsp HISTORY] Bishnu S. Atal The History of Linear Predictionl n 1965, while attending a seminar on coding (LPC) method, then the multi- an airplane. Since the plane moves, one information theory as part of my pulse LPC and the code-excited LPC. must predict its position at the time the Ph.D. course work at the Polytechnic shell will reach the plane. Wiener’s work Institute of Brooklyn, New York, I PREDICTION AND appeared in his famous monograph [2] came across a paper [1] that intro- PREDICTIVE CODING published in 1949. Iduced me to the concept of predictive The concept of prediction was at least a At about the same time, Claude coding. At the time, there would have quarter of a century old by the time I Shannon made a major contribution [3] been no way to foresee how this concept learned about it. In the 1940s, Norbert to the theory of communication of sig- would influence my work over the years. Wiener developed a mathematical theory nals. His work established a mathe- Looking back, that paper and the ideas for calculating the best filters and pre- matical framework for coding and that it generated must have been the dictors for detecting signals hidden in transmission of signals. Shannon also force that started the ball rolling. My noise. Wiener worked during the described a system for efficient encoding story, told next, recollects the events that Second World War on the problem of of English text based on the predictability led to proposing the linear prediction aiming antiaircraft guns to shoot down of the English language. Following the work of Shannon and Wiener, Peter Elias published two papers EDITORS’ INTRODUCTION [1], [4] in 1955 on predictive coding of Bishnu S. Atal was born on 10 May 1933 in Kanpur, India. He obtained a B.S. degree in signals. Predictive coding is a remarkably physics (1952) from the University of Lucknow, India, a diploma in electrical communi- simple concept, where prediction is used cation engineering (1955) from the Indian Institute of Science, Bangalore, and a Ph.D. degree (1968) in electrical engineering from the Polytechnic Institute of Brooklyn, to achieve efficient coding of signals. New York. He was a lecturer in acoustics at the Department of Electrical (The prediction could be linear or non- Communication Engineering, Indian Institute of Science, Bangalore (1957–1960). Next, linear, but linear prediction is the sim- Dr. Atal was with Bell Labs (1961–1996) and AT&T Labs Research (1997–2002), Florham plest. Moreover, a comprehensive Park, New Jersey, where he was a technology director. He became a Bell Laboratories mathematical theory exists for applying Fellow (1994) and an AT&T Fellow (1997). Since 2002, he has been an affiliate profes- linear prediction to signals.) In predictive sor with the Department of Electrical Engineering, University of Washington, Seattle. coding, both the transmitter and the Dr. Atal’s research work has spanned various aspects of digital signal processing with receiver store the past values of the application to the general area of speech processing. He coedited the books Advances transmitted signal, and from them pre- in Speech Processing (1991), Papers in Speech Communication: Speech Processing dict the current value of the signal. The (1991), Speech Production (1991), Speech Perception (1991), and Speech and Audio transmitter does not transmit the signal Coding for Wireless and Network Applications (1993). He is the recipient of many awards, including the IEEE Centennial Medal (1984), the IEEE Morris N. Liebmann but the encoded prediction error (predic- Award (1986), the IEEE Signal Processing Society Award (1993), and the Benjamin tion residual), which is the difference Franklin Medal in Electrical Engineering (2003). between the signal and its predicted When he does not meditate on professional topics (his happiest professional value. At the receiver, this transmitted moment so far was the invention of multipulse linear predictive coding), Bishnu Atal prediction error is added to the predicted enjoys traveling, collecting stamps, and reading. His reading tastes are diverse, from value to recover the signal. For efficient Indian history books and the famous epic of The Mahabharata, translated from its coding, the successive terms of the pre- fundamental Sanskrit form and edited by J.A.B. Van Buitenen, to famous speeches in diction error should be uncorrelated and Lend Me Your Ears, edited by former presidential speech writer William Safire, and the entropy of its distribution should be successful habits of visionary companies in Built to Last by James Collins and Jerry as small as possible. Porras. In his story, Dr. Atal tells the tale of his work on linear prediction, work that has When I came across Elias’s paper also proved to be built to last. —Adriana Dumitras and George Moschytz while attending the seminar on informa- “DSP History” column editors tion theory mentioned earlier, I found [email protected], [email protected] the concept of predictive coding to be very interesting. However, there were IEEE SIGNAL PROCESSING MAGAZINE [154] MARCH 2006 1053-5888/06/$20.00©2006IEEE two problems. First, my colleagues at work for speech signals. A first step in ms. Next, an 8-tap linear predictor, pre- Bell Labs in the speech research area determining the usefulness of predictive dicting over a short interval of 1 ms, was showed no interest. Speech compression coding for reducing the bit rate for trans- used to predict the samples of the predic- research at that time was primarily in mission of speech over digital channels is tion error that remained after the first the area of channel vocoder (voice to find out if the first-order entropy of prediction. These eight predictor coeffi- coder), a device invented by Homer the distribution of prediction error signal cients were also adjusted every 5 ms. We Dudley in 1930s. Dudley said that the is significantly smaller than the corre- called the method adaptive predictive real information in speech was carried by sponding entropy of the speech signal; coding [5]–[7] (others would call our low-frequency modulation signals corre- smaller entropy of the prediction error method simply linear predictive coding) sponding to slow motion of the vocal could produce a lower bit rate. and demonstrated its speech quality at organs and, therefore, speech can be I wrote a program and the results the IEEE International Conference on compressed by extracting such signals were encouraging. For speech sampled at Speech Communication held in Boston from speech. Although the channel 6.67 kHz, the first-order entropy of pre- in 1967, using two-level encoding of the vocoders did not produce speech of suffi- diction error turned out to be 1.3 b/ prediction error. The audience heard the ciently good quality for telephone appli- sample as compared to 3.3 b/sample for signals illustrated in Figure 1, i.e., Figure cations, they were used during World the speech signal. Since the speech char- 1(a), the original speech signal, Figure War II to provide secure voice communi- acteristics vary with time, the linear pre- 1(b), the noise-like transmitted predic- cation. They remained the central theme dictor had to be adaptive. The prediction tion error, and Figure 1(c), the recon- of speech coding research for about 35 was done in two steps. First, the predic- structed speech signal. Many people years. Second, at the time my work at tion was done over a time interval com- found it hard to believe that a noise-like Bell Labs was primarily in the area of parable to a pitch period using a linear signal could recreate both periodic room acoustics. My knowledge of speech predictor consisting of an adjustable voiced speech and nonperiodic unvoiced processing was rudimentary, and my delay and gain factor, adjusted every 5 speech at the receiver. The predictive knowledge in the area of speech com- pression was practically zero. Both of these problems would disappear faster than I thought. LINEAR PREDICTIVE CODING Original Speech Just a few months later, in 1966, I was one day in Manfred R. Schroeder’s office at Bell Labs when John Pierce brought a (a) tape showing a new speech time com- pression system. Schroeder was not impressed. After listening to the tape, he Transmitted said that there had to be a better way of Prediction Error compressing speech. Manfred mentioned the work in image coding by Chape (b) Cutler at Bell Labs based on differential pulse code modulation (DPCM) tech- nique, which was a simplified version of predictive coding. Our discussions that Reconstructed afternoon kept me thinking. Since my Speech recently started Ph.D. thesis work focused on automatic speaker recogni- (c) tion, I hesitated to start a side project on speech compression at that time. Also, I 02550 had doubts whether I could add anything Time (ms) useful to this crowded field of research. However, Manfred’s remarks at our meet- ing made a deep impression. Waiting at [FIG1] The waveforms for (a) the original speech signal, (b) the transmitted prediction the subway station for a train to error signal, and (c) the reconstructed speech signal in the adaptive predictive coder. The Brooklyn, I convinced myself that I prediction error was quantized by a two-level quantizer whose step size was adjusted once every 5 ms. The prediction combined two predictors: one predicting over a relatively should do some exploratory investigation long time interval comparable to a pitch period and another predicting over a shorter to determine if predictive coding could interval of 1 ms. IEEE SIGNAL PROCESSING MAGAZINE [155] MARCH 2006 [dsp HISTORY] continued coder produced natural-sounding speech our case, the prediction was adaptive and order predictors are about 10 and 20 db, and speech quality was good, except for was conducted over a long time interval, respectively, below the average speech the presence of a low-level crackling at least as long as a pitch period.