PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER885 1982 A Historical Perspective of Spectrum Estimation

ENDERS A. ROBINSON

Invited Paper

Alwhrct-The prehistory of spectral estimation has its mots in an- times, credit for the empirical discovery of spectra goes to the cient times with the development of the calendar and the clock The diversified genius of Sir Isaac Newton [ 11. But the great in- work of F’ythagom in 600 B.C. on the laws of musical harmony found mathematical expression in the eighteenthcentury in terms of the wave terest in spectral analysis made its appearanceonly a little equation. The strueto understand the solution of the wave equation more than a century ago. The prominent German chemist was fhlly resolved by Jean Baptiste Joseph de Fourier in 1807 with Robert Wilhelm Bunsen (18 1 1-1899) repeated Newton’s his introduction of the Fourier series TheFourier theory was ex- experiment of the glass prism. Only Bunsen did not use the tended to the case of arbitrary orthogollpl functions by Stmn and sun’s rays Newton did. Newtonhad found that aray of Liowillein 1836. The Stum+Liouville theory led to the greatest as empirical sum of spectral analysis yet obbhed, namely the formulo sunlight is expanded into a band of many colors, the spectrum tion of quantum mechnnics as given by Heisenberg and SchrMngm in of the rainbow. In Bunsen’s experiment, the role of pure sun- 1925 and 1926. In 1929 John von Neumann put the spectral theory of light was replaced by the burning of an old rag that had been the atom on a Turn mathematical foundation in his spectral represent, soaked in a salt solution (sodium chloride). The beautiful rain- tion theorm in Hilbert space. Meanwhile, Wiener developed the mathe- matical theory of Brownian mavement in 1923, and in 1930 he in- bow of Newton did not appear. The spectrum, which Bunsen duced genernlized harmonic analysis, that is, the spectral repreaentation saw, only exhibited a few narrow lines, nothing more. One of of a stationuy random process The cOmmon ground of the spectral the lines was a bright yellow. repreaentations of von Neumm and Wiener is the Hilbert space; the Bunsen conveyed this result to Gustav Robert Kirchhoff von Neumm result is for a Hermitian operator, whereas the Wiener (1824-1887),another well-known German scientist.They result is for a unitary operator. Thus these two spectral representations are dated by the Cayley-MBbius transformation. In 1942 Wiener ap knew that the role of the glass prism consisted only in sorting plied his methods to problems of prediction and filtering, and his work the incident rays of light into their respective wavelengths (the wasinterpreted and extended by Norman Levinson. Wienerin his process known as dispenion). The Newton rainbow was the empiricalwork put mm emphasison the autocorrelationfunction extended continuous band of the solar spectrum; it indicates than on the power spectrum that all wavelengths of visible light are present in pure sunlight. Themodern history of spectral eaimation begins with the break- through of J. W. Tukey in 1949, which is the statistical counterpart of The yellow line, which appeared when the light source was a the breakthrough of Fourier 142 years earlier. This result made pos burning rag, indicated thatthe spectrum of tablesalt con- sible an active development of empirical spectral analysis by research taineda single specific wavelength. Further experiments workers in dl scientifiidisciplines However, spectral analysiswas showed that this yellow line belonged to the element sodium. computationally expensive. A major computational breakthrough oc- No matter what the substance in which sodium appeared, that curred with the publication in 1965 of the fast dge rithm by J. S. Cooley and J. W. Tukey. The Cooley-Tukey method element made its whereabouts known by its bright yellow made it practical to do signal pmcessiy on waveforms in either the spectral line. As time went on, it was found that every chemi- time or thefrequency domain, sometlung never practical with con- cal element has its own characteristic spectrum, and that the tinuous systems. The Fourier trpnaOrm became not just a theoretical spectrum of a given element is always the same, no matter in description, but a tod With the development of the fast Fouriertram form the feld of empirical .spectral analysis grew from obscurity to what compound or substance the element is found. Thus the importance, and is now a major discipline. Further important contribu- spectrum identifies the element, and in this waywe can tell tions were the introduction of maximum entropy spectral analysis by what elements are in substances fromthe distant stars to John Burg in 1967, the development ofspectral windows by Emmanuel microscopic objects. Parzen and others starting in the 195O’s, the statistical work of Maurice Priestley and his shod, hypothesis testing in time series analysis by The successes of spectral analysis were colossal. However, Peter Whiffle startiug in 1951, the Box-Jenkins approach by George the spectral theory of the elements could not be explained by Box and G. M. Jenkins in 1970, and autorepshe spectral estimation classical physics. As we know, quantum physics was born and and order-determining criteria by E. Parzen and H. Akaike starting in spectral theory was explained in 1925 and 1926 by the work the 1960’9 To these statistical contributions mustbe added the equdly importantengineering contributions to empirid spectrum analysis, of Werner Heisenberg (1901-1976) and Erwin Schrijdinger which are not treated at dl in this paper, but form the subject matter (1887-1961). In this paper, we will show how spectral theory of the other papen in this special issue. developed in the path to thisgreat achievement. Although most of the glamour of spectral theory has been I. INTRODUCTION associated with quantum physics, we will not neglect the paral- PECTRAL estimation has its roots in ancient times, with lel pathtaken in classical physics. Although thetwo paths the determination of the length of the day, the phases of began diverging with the work of Charles Sturm (1803-1855) the moon, and the length of the year. The calendar and and Joseph Liouville (1809-1882) on the spectral theory of s differential equations, we wiU see that thefinal results, namely, the clock resulted from empirical spectral analysis. In modem the spectral representation of John von Neumann(1903-1957) Manuscript received January 22, 1982; revised April 28, 1982. Thls for quantumphysics, and that of (1894-1964) paper is dedicated to Norman Levinson, August 11, 1912-October 10, for classical physics, are intimately related, 1975, my teacher in differential equations. Because light has high frequencies, our instruments cannot Theauthor is with the Departmentof Theoretical and Applied Mechanics and theDepartment of Geological Sciences, Cornell Uni- respondfast enough to directly measure the waveforms. In- versity, Ithaca, NY 14853. stead, the instruments measure the amount of energy in the

0018-9219/82/0900-0885$00.750 1982 IEEE 886 PROCEEDINGSVOL. IEEE, OF THE 70, NO. 9, SEPTEMBER 1982

frequencybands of interest. The measurementand analysis borhood of a certain point x = a, into an infinite series whose of the spectra of other typesof signals, however, take different coefficients are the successive derivatives of the function at the forms. With lower frequency signals, such as mechanical vibra- given point tions, speech, sonar signals, seismic traces, cardiograms, stock market data, and so on, we can measure the signals as func- tions of time (that is, as time series) and then find the spectra by computation. With the advent of the digital computer, Thus analytic functions are functions which can be differenti- numerical spectrum estimation has become an important field ated to any degree. We know that the definition of the deriva- of study. tive of any order at the pointx = a does not require more than Let us saya few words aboutthe terms“spectrum” and knowledge of the function in anarbitrarily small neighbor- “spectral.” Sir Isaac Newton introducedthe scientific term hood of thepoint x = u. The astonishing property of the “spectrum” using the Latin word for an image. Today, in Taylor series is that the shape of the function at a finite dis- English, we have the word specter meaning ghost or appari- tance h from the point x = u is uniquely determined by the tion, and the corresponding adjective spectral. We also have behavior of the function in the infinitesimal vicinity of the the scientific word spectrum and the dictionary lists the word point x = u. Thus the Taylor series implies that an analytic spectrul as the corresponding adjective. Thus“spectral” has function has a very strong interconnected structure; by study- two meanings. Many feel that we should be careful to use ing the function in a small vicinity of the point x = a, we can “spectrum” in place of “spectral” a) whenever the reference precisely predict what happens at the point x =(I + h, which is to data or physical phenomena, and b) whenever the word is at a finite distance from the point of study. This property, modifies “estimation.”They feel thatthe word “spectral,” however, is restricted to the classof analytic functions. The withits unnecessary ghostly interpretations, should be con- best known analytic functions are, of course, the sine and fined to those usages in a mathematical discipline where the cosine functions, the polynomials, and the rational functions term is deeply embedded. (away from their poles). The material in this paper through Section XI1 surveying the period fromantiquity through Levinson and Wiener can be 111. THE DANIELBERNOULLI SOLUTION OF THE described as “The Prehistory of Spectrum Estimation” to em- WAVE EQUATION phasize that spectrum estimation is interpreted estimation as Thegreat Greek mathematicianPythagoras (ca 600 B.C.) from data.The remaining sections may be described as was the fmt to consider a purely physical problem in which “Some Pioneering Contributions to the Development of Meth- spectrum analysis made its appearance. Pythagoras studied the ods of Spectrum Estimation.” laws of musical harmony by generating pure sine vibrationson Modem spectrum estimation began with the breakthrough a vibrating string, fixed atits two endpoints. This problem for the analysis of short time series made by J. W. Tukey in excitedscientists since ancientdays, butthe mathematical 1949. This work led to a great blossoming forth of spectrum turningpoint came in the eighteenth century when it was analysis techniques. Despite the advances in digital computing recognized that the vertical displacement u(x, t)of the vibrat- machinery, such computations were still expensive. The next ing string satisfies the wave equation greatbreakthrough occurred with the discovery of the fast Fourier transform in 1965 independently by J. W. Cooley and a2u 1 aZu -_--- - 0. J. W. Tukey and by Gordon Sande. This development, in con- ax2 c2 at2 junction with silicon chip technology, has brought spectrum analysis to bear on a wide range of problems. Another break- Here x is the horizontal coordinate and t is the time. The con- through occurred with the introduction of maximumentropy stant c is a physical quantity characteristic of the material of methods into spectrumanalysis by John Burg in 1967. the string, and represents the velocity of the traveling waves on the string. Because the endpoints x = 0 and x = 71 are fixed, we 11. TAYLORSERIES have the boundary conditions At the time when calculus was introduced in theseventeenth u(0, t) = u(s, t)= 0. century by Newton and Leibnitz, the concept of amathe- matical “function” entailed restricted properties, which in the (Note that for simplicity, we have taken the string to be of course of time were gradually made less severe. In those days, length r.) The problem of constructing the solution of the the observations of natural events seemed to indicate that con- wave equation was thenattacked by some of the greatest tinuous relations always existedbetween physical variables. of all time, and in so doing, they paved the This viewwas reinforced by the formulation of the laws of way for the theory of spectrumanalysis. nature on the basisof differentialequations, as exemplified One of the finest results was that of Daniel Bernoulli (1 700- by Newton’s laws. Thus it became commonplace to assume 1782) [3] in 1738. He introduced the method of separation thatany function describing physical phenomena would be of variables in which atrial solution is constructed as the differentiable. The idea of a functionthat changes in some product of a function of x alone, and a function of t alone. capricious or random way, and thus does not allow any ana- Thus, he wrote lytic formula forits representation, did notenter into the u(x, t)= X(x) T(t). thinking of the mathematicians of that time. It was thus very natural for Brook Taylor (1 685-1 73 1)[ 21, a contemporary of Putting this trial solution in the differential equation and solv- Newton, to introduce the concept of “analytic function.” The ing, he found thesolutions Taylor series expands an analytic function an infinite sum- as cos kx cos kcr, cos kx sin kct mation of component functions. More precisely, the Taylor series expands a function f(x), which is analytic in the neigh- sin kx cos kct, sin kx sin kct. ROBINSON: HISTORICAL PERSPECTIVE OF SPECTRUM ESTIMATION

However, the boundary condition at x = 0 excludes the solu- grange (1736-1813) [SI is tions involving cos kx, and so the possible solutionsare re- duced to the two choices n A, = 2 J f(x) sin nx dx. sin kx cos kct, sin kx sin kct. n The boundary condition at x = n requires that the value of k This is the point at which the question stood at the startof the to be an integer. In view of the linearity of the wave equation, nineteenth century. any superposition of solutions gives a solution. Bernoulli thus gave the following solution: Iv. JEAN BAPTISTE JOSEPH DE FOURIERAND THE . SINUSOIDAL SPECTRALTHEORY 00 On December 21, 1807 theengineer Jean Baptiste Joseph de U(X, t)= sin kX(Ak COS kct + Bk sin kct) k=1 Fourier (1768-1830) [ 61 addressed the French Academy and made a claim that appeared incredible to the eminent mathe- where the Ak and Bk are arbitrary constants. Bernoulli made maticians who were members of the Academy. As it turned the claim that this infinite sum is the general solution of the out, oneof the greatest advances in the historyof , equation for the vibrating string. Theimplications of Ber- an innovation which was to occupy much of the attention of noulli’sclaim were startling. From the principles of mechan- the mathematical community for over a century, was made by ics it was knownthat the initial displacementand initial an engineer. Fourier said at that historic meeting that an arbi- velocity of the string could be prescribed in an arbitrary way. trary function, defined over a finite interval by any rough and That is, it was known that at the initial time t = 0, both u(x, 0) even discontinuous graph, could be represented as an infinite and &(x, 0) could have any functional form. (Note that the summation ofcosine andsine functions. The distinguished dot over a function indicates differentiation with respect to and brilliant academicians questioned the validity of Fourier’s time, so & represents the velocity of the string in the vertical theorem,for they believed that any superposition of cosine direction.) However, Bernoulli’s solution gives explicit expres- and sine functions could only give an analytic function, that sions for initial displacement and initial velocity, namely, is, an infinitely differentiable function, An analytic function,

00 of course, could never be discontinuous, and thus was very far U(X, 0) = x A& sin kx removed from some arbitrarily drawn graph. In fact, Taylor’s k=1 theorem stated that an analytic function had the property that, given its shape in an inffitesimal interval, the continuation of its course to the right and left by finite amounts was uniquely determined(the so-calledprocess of analyticcontinuation). The academicians and the other great mathematicians of the Thus Bernoulli‘s solution implied that each of two arbitrary time could not reconcile the property of analytic continuation functions u (x, 0) and ti (x, 0) could be expanded in theinterval with Fourier’s theorem. How could the physical reasoning of 0

gives knowledge of thefunction in the entire range. The image of itsformer self once it is onthe unit circle. The Taylor series requiresunlimited differentiability ata point, Taylor series breaks down on the unit circle, but its counter- whereas the Fourier series does not demand any differentiabil- part, the Fourierseries in 8, is still valid. The theoremof ity properties whatever. Fourier is true;science could blossom. Surprisinglyenough, the chasm between the Taylor series and the Fourier series is bridged by means of the z-transform, V. THE STURM-LIOUVILLESPECTRAL THEORY OF which is thefundamental transform used inthe theory of DIFFERENTIALEQUATIONS digital . Let us consider an analytic function Following the great innovation of Fourier in 1807, the re- f(z) of the complex variable markable properties of Fourier series were gradually developed throughoutthe nineteenth century and intothe twentieth century.The Fourier series as introduced by Fourieris an We now expand the function in a Taylor series (in the variable expansion in terms of cosines and sines, which represent an z-l) about the point z-l = 0, to obtain the z-transform orthogonalset of functions. However, there are many other sets of orthogonal functions, and so today any such expansion in terms of orthogonal functions is called a Fourier series. As f(z) = anz-. n=O we will see, some sets of orthogonal functions can be stochas- tic, and it turns out that the corresponding Fourier series play The radius of convergence of this series extends from z-' = 0 an important role in statistical spectralanalysis. to the fust singular point, say, z0' . A singular point is a point First, however, let us look at the important generalizations where the function ceases to be analytic. The region of con- made by the Frenchmathematicians Charles Sturm (1803- vergence of the Taylor series expansion of f(z) is the region in 1855)[7] andJoseph Liouville (1809-1882) [8] in the the z-plane outside the circle of radius Izol; that is, the region decade of the 1830's. Let us now briefly look at the Sturm- of convergence is for all points z such that Iz-'~ < IzO'l or Liouville theory of differentialequations. The vibration of equivalently IzI > IzoI. any infinitely long right circular cylinder of radius one can be Let us now write the Taylor series expansion for points on described by a second-order differential equation. Let us con- the unit circle z = cos 8 - j sin 8. We have sidera simple case, namely, the differentialequation (the one-dimensional Helmholtz equation)

n=o U"(X) + kZ u (x) = 0. which is in the form of a complex Fourier series inthe angle 8. The Helmholtz equation can be obtained by taking the tem- Three cases can occur. In the fitcase, the singular point zo is poral Fourier transform of the wave equation, which set off inside the unit circle in the z-plane. In this case, the function the search for the theory of Fourier. Here k is the wavenum- is analytic on the unit circle and the Fourier series thus is an ber which is equal to w/c where w is the temporal frequency. analytic representation of this analytic function, The French In the Helmholtz equation, k2 is some undetermined param- Academy believed this was the only case. In the second case, eter. The variable x is the central angle of the cylinder, and so the singular point zo is outside the unit circle. In this case, the x lies in the range -R and A. Because the points x = -n and Taylor series does not represent the function, and so we will x = A represent the same point on the cylinder, we must have not consider the case further. The third case is the interesting the two boundary conditions one, and is the case which resolves the mathematical contre versy which led up to Fourier's discovery in 1807. When the u(-A) = u(a) singular point zo lies on the unit circle, the Taylor series will u'(-A) = u'(n). not converge at some or all of the points on the unit circle. Thus the Taylor series defines an analytic function, which is The general solution of the differential equation is differentiable to anyorder outside the unit circle, but the u(x)=A coskx+Bsinkx. function becomes nonanalytic at some or all of the points on the unit circle. The Fourier series in 8 is the Taylor series for The two boundary conditions restrict thechoice of the param- z on the unit circle, and thus the Fourier series represents a eter kZ to thediscrete set of values function in the variable 8, which is nonanalytic at some or all k2 = 0, lZ,2', 3', * of the points in its range -A Q 0 < A. A small modification of the Fourier coefficients that would move the singular point zo which are called the eigenvalues of the Helmholtz equation. from on the unitcircle to justinside the unit circlewould Thecorresponding solutions of theequation, namely,the change anonanalytic Fourier representation to ananalytic functions one. The amazing thing is that it is enough to move the singu- uk(X)=A coskx+Bsinkx larity from the periphery of the unit circle to the inside by an arbitrarily small amount, in order to change the given nondif- are called the egenfinctions. These eigenfunctions are the co ferentiable function in 8 to one which can be differentiated sine and sine functions which Fourier had used to construct any number of times. Thus the mistake of the great French his Fourier series. These functions represent the characteristic mathematicians of the prestigious French Academywho wanted vibrational modes of the cylinder, which can only vibrate in to restrict the validity of Fourier series to analytic functions thisdiscrete set of wavenumbers k = 0, 1, 2, * . Thusthe depended entirely on that extremely small but fiite distance Sturm-LiouvUe theory has given the answer to why the dis- from a point on the periphery to a point just inside the unit crete set of cosine and sine functions were the correct ones for circle. A function can be extremely smooth right up to the Fourier to use in a problem which stemmed from the wave unit circle, and then disintegrate into a rough and distorted equation. ROBINSON: HISTORICAL PERSPECTIVE OF SPECTRUM ESTIMATION 889

Furthermore, the Sturm-Liouville theory gives us added in- sight to spectral analysis and, in fact, is the foundation of the lb#(x) dx = 0 spectraltheory of differentialequations. Most of the eigen- value problems of mathematical physics are characterized by so thatthe eigenfunctions forman orthonormal set.The differential operators H of the form orthonormal property can be written more concisely as

I" $j(X) &(X) dx = 8jk The physical problems we consider require that the function A (x)be positive within the given interval. Let us now form the where 6jk is the Kronecker delta function. operation VHU- uHv, which is Let us now represent an arbitrary function f(x) in the form d of the infinite expansion VHU- UHU= - [A(x)(Lu' - uv')] . dx m f(x)= ck$k(x). We notice that the right-hand side is a total derivative, and so k=l we have As we have previously mentioned, such an expansion is called a Fourier series in honor of the pioneering work of Fourier. (UHU- uHu)dx = [A(x)(uu'- UU')];. The Fourier coefficients Ck are obtained by multiplying both T sides by $j(x) and integrating. The result is Any differential operator H, which allows the transformation of such an integral(as on the left) intoa pure boundary term(as cj = f(x)$j(x) dx. on the right), is called self-adjoint. Thus the Sturm-LiouviUe operator H is self-adjoint. Often we may prescribe boundary 6 conditions so that the right-hand side vanishes; such boundary Undercertain general conditions, it canbe shownthat the conditions are called self-adjoint. We then have a self-adjoint orthonormal set is complete, so that the above Fourier expan- problem,namely, a problem characterized by aself-adjoint sionactually converges tothe function f(x). Supposenow operator H and self-adjointboundary conditions. We then that f(x) is the solution to the inhomogeneousdifferential have the identity in the functionsu (x)and u (x)given by equation Hfb)= P(x>. (VHU- UHV) dx= 0 In terms of linear system theory, p(x) is the input andf(x) is 1 the output. Now substitute u = f and v = $k into Green's which called Green's identity. is identity, We obtain The eigenvalue problemassociated with the self-adjoint operator H starts with thedifferential equation ($kHf - fHh)dx = H$ = A$. lb 0

A solution satisfying the boundary conditions does not exist which is for all values of A, but only for a certain selected set hi called the eigenvalues. This setconsists of aninfinite number of eigenvalues hi whichare all realand which tend to infiity lb($kP - f hk$k) dx = 0. with i. We generallyarrange these eigenvalues in increasing order to obtain the infinite sequence (called the spectrum) The above equation can be written as

b b hl, AZ, X39 *' * 1 together with thecorresponding eigenfunctions f@k#kPdx* dx=

$1,$2,#3>"*. We recognize theleft-hand side as the expression forthe We nowconsider two different eigenvalues hi, hk andtheir Fourier coefficient ck. Thus corresponding eigenfunctions $j, (bk. Ifwe substitute u = $j b and u = $k into Green's identity, we obtain ck = $k(t)P(t)dg.

[(h,@j$k-hk$k$j)dx=O We nowsubstitute thisexpression for ck into theFourier series to obtain which gives the orthogonality condition

ib&(x) $k(x) dx = 0, for j # k. If we denote the expression in brackets by G(x, t),then this By normalization, we can require that equation is ago PROCEEDINGS OF THE IEEE, VOL. 70, NO, 9, SEPTEMBER 1982

b of these energies is embodied as a quantum of electromagnetic f(x)= 1 P(t) G(x,f) 4. energy, thephoton. if the energy diminishes, a photon is born. If the energy increases, a photon or a quantum of energy This is the integral form of the input-output relationship, and from some other field has been absorbed just before the jump. we recognize In quantum mechanics, an electron is represented by a prob- ability density function. (The probability density function is found as the squaredmagnitude 1#12 of aprobability wave function 9.) An electron jump has a probability that depends upon the shapes of the probability density functions that cor- as the impulse response function or Green’s function (under respond to the electron prior to and after the jump. The prob- the given boundaryconditions), concepta originated by ability of a jump is, generally speaking, greater for the stronger George Green (1793-1841) [9]. This equation exhibits the overlapping ordeeper interpenetration of these probability impulse response function of a linear system in terms of its densityfunctions, The laws that divide electrontransitions in atoms into more probable and less probable ones are called spectrum AI, X2, X3, * * - . We can confirm that the Green’s function is indeed the impulse response by setting the input selection rules. It is in this jumping of electrons that photons p(x) equal to the impulse 6(x - xo). Then the output is are born. These photons enter a spectroscope, get sorted into types, and produce the spectral lines. Themore photonsthat an atom emits in asecond, the ~6(1.-xo)C(X,f)d~=G(X,Xo) brighter the spectral lines. If thenumber of atoms remains constant,then the brightness of the spectral lines depends upon the statistical frequency of electron jumps in the atoms. and so G(x, XO) represents the output at due to an impulse x And this statistical frequency is detemed by the probability at XO. Since the differential equation representsan input- distribution of jumps. It is in this way that an atomic spec- output system, we see that the Green’s function satisfies trum consisting of a number of lines of different brightnesses is %(x, XO) = 6 (x - xo), generated. One can make the observation that the spectrum estimation This equation shows that the Green’s function G(x, xo)is the problem (the subject matter of this special issue of Proceedings inverse of the differential operator H. of the IEEE) is not central to the spectral representation in We have thus reviewed the spectral theory of differential quantum mechanics. This situation was broughtforcibly to operators, and now we can look at the most spectacular ap the writer’s attention several years ago at the U.S. Air Force plication of spectral estimation-quantum physics. Geophysics Library at Hanscom Field, MA, which is one of the best scientific libraries inthe world. Themany shelves VI. SCHRODINGERSPECTRAL THEORYOF THE ATOM devoted to “spectra” consisted of a mixture of both kinds of The Sturm-Liouville theory of the expansion of functions in books, but no book devoted to a discussion of the relation- terms of orthogonal functions found numerous physical ap ship between the two areas of spectral theory. plications in the work of Lord Rayleigh (1842-1919). Such Spectral estimation in quantum mechanics is based on the expansions occur throughout the studyof the elastic vibrations edifice of spectroscopy, which is an instrumentational science. of solids and in the theoryof sound. In the history of physics, In1891, the physicist A.A. Michelson developed an inter- a decisive breakthroughoccurred when Erwin Schrodinger ferometer,a device producing the superposition of alight (1887-1961) [ 101 showed in 1926 that the vibrations occur- signal on top of itself with a prescribed delay. In one series ring within theatom can be understood by means of the of experiments, Michelson first bandpass filtered a light signal Sturm-Liouville theory. Let us now explain how the wave by passing it through a prism. He then used the interferometer mechanics of Schrodinger describes the spectral lines of the to measure the visibility of the superimposed signal as a func- atom. An equivalent matrix mechanics was formulated a year tion of delay. Theresulting curve was the autocovariance before Schrodinger by Werner Heisenberg (1901-1976) [ 111. function of the original signal. Michelson then used a mechan- Before quantum theory, classical physics was at an impasse. ical harmonic analyzer to compute the Fourier transform of It could not explain the existence of atomic spectra. For ex- the visibility curve; that is, he estimated the power spectrum of ample, the bright yellow spectral line of sodium discovered by the signal, Michelson’s experiments were done to examine the Bunsen means that the radiation of its atoms produces a dis- fine structure of spectral lines of light. Thus in those early crete frequency wo. If we assume that this line is emitted by days, the present day dichotomy of spectrum estimation had an electron, then the laws of classical physics state that such not yet materialized. an electron should emit not a discrete line at oo,but a whole Thetechnique of spectral analysis in physics developed spectrum of lines at all frequencies w, and with no discontinu- rapidly in the twentieth century, and the instruments became ities in the spectrum. That is, classical physics predicts that the morepowerful and sensitive. The spectroscopists came up spectrum of an electron should be continuous as is the spec- with the following question for theoreticians, namely, the trum of the sun. Yet Bunsen observed the discrete spectrum question of why spectral lines are somewhat fat, not inf~- of sodium as evidenced by the bright yellow line. (As we will tesimally thin. soon see, this line observed by Bunsen is actually a doublet, It was recognized that a photon corresponds to a line at one which Bunsen was unable to resolve with the means available frequency w. The question was why the lines on a photo- to him.) graphicplate of aspectroscope come out somewhatbroad- Quantum mechanics allows us to see the atom from a new ened, not slender. The answer was found in the wave prop point of view. Quantum mechanics says that atomic electrons erty of the electron and the Heisenberg uncertainty principle. jump from one energy state to another,and that the difference The initial energy of an electronin an atom refers to a RO BINSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 891 stationarystate, and so does the final energy.However, SiOnal displacement x, the time-independent Schriidinger equa- an electron jump is in violation of some steady state. As soon tion is as this occurs, the Heisenberg principle takes over.If we let At designate the lifetime of an electron between jumps, then H$ = A$ the uncertainty of photon energy is AE - h/At, where h is where His defied as the differential operator Planck’s constant. Using Planck’s formula for energy quanta, the uncertainty AE of the energy is proportional to the un- H=~-xd2 2 certainty AO of the frequency of the photon dx and A is defied as h AE = -Ao. 2.rr A= -2E Thus the spectral lines have a width AO which is inversely -KwO proportional to the time of the “settled life” of the electron Here $ is the probability wave function, the constant E the in the atom is energy, h = 27cH is Planck’s constant, and the constant oois 2a the natural frequency. The problem of finding the probability AO-- wave function $ is a Sturm-Liouville problem. The solution At ‘ gives the eigenvalues as 1, 3, 5, 7, * , and so we write In other words, the more “settled” or quiescent the life of the electron in the atom, the narrower the spectral lines. That is Ak = (2k + l), for k = 0, 1,2, * . why at high temperatures and pressures, when many of the Thus the eigenenergies are atomic electrons are unsettled, the spectral lines broaden out and become smeared. Thus an individualspectral line has a Ek = iliuOhk=lfoo(k + i), for k = 0,1,2, * * . finitewidth associated with thermal motion and collision The corresponding eigenfunctions are broadening. This is notonly important in physics, but it relates very importantly to the topic of spectrum estimation $k = Ck hk(x) edX2fi, for k = 0,1,2, * * * in special issue. Real “lines” have finitewidth. This this where is a normalization constant, and hk(x)is the Hermite means that real lines behave like narrow-band noises and not Ck polynomial of order k. The discrete set of eigenenergies Eo, likeeither single frequencies or aconstant-amplitude lightly El, E2, * * * represent the discrete lines observed in the spec- frequency-modulated signal. trum. Thus quantum mechanics, through the use of Sturm- Let us now return the discussion of the yellow sodium line Liouville theory, is able to explain the existence of atomic which Bunsenobserved. Thesodium D line is adoublet. spectra. However, certain mathematical difficulties remained; Moreover thesodium spectrum contains four lines in the the history of their resolution is given in the next section, visible range, and two more inthe nearultraviolet, strong enough to be useful for analytic chemistry. The sodium spec- WI. THE VON NEUMANNSPECTRAL, trum contains 29 lines of astrophysical interest between the REPRESENTATION THEOREM D lines and 4390 ii (still in the visible). In finite-dimensional space, the following eigenvalue prob- We might say that Bunsen over a century ago was performing lem is posed. Given an Hermitian matrix H, frnd all column- spectrumestimation. Hewas unable to resolve the two fre- vector solutions$ of the characteristic equation quencies present in the doublet, even as today a person doing spectrumestimation mighthave the sameproblem in some H$ = A$ other situation. Bunsenmissed the many other lines in Also where A is a constant also to be determined. That is, given H, the sodium atom, even today a person doing spectrum esti- as find $ and Thesolutions dl, ,$, are called the eigen- mation might not find some features without theuse of modem h solutions (assumed to be normalized), and the corresponding techniques. As spectrosopic instruments became better, these real numbers AI, *, A, are called the eigenvaluesof the lines werediscovered. Now anotherquestion, however, has matrix H. The totality of the eigenvalues A2, * * A,, in come up. Many spectral lines, which, it would seem, should Al , , order of increasingmagnitude, is called the spectrum. Now correspond to a single frequency,actually turned out to be the states of a number of very close-lying lines. The fact that write the eigenequations thesodium line is adoublet is a case inpoint. The fine D H$k = Ak$k (for k = 1, * a, n) structures of spectral lines (doublets, etc.) were revealed only because of the great advances in spectral techniques. In turn, in the form of the matrix equation electron spin wasdiscovered in order to explainthese “fine HU = UA. qualities” in spectra.Let us briefly give the reason.When spectra are generated, the statesof two electrons with opposite Because the eigensolutions are orthonormal,the matrix U spins can have slightly different energies. As a result, the spec- (whichhas the eigensolutions as it columns) is unitary, i.e., tral line is doubled; in place of one line we have twin lines with =I identical brightnesses. Such twins are usually born only when UUT the outer electron shell hasone electron. If the number of where I is the identity matrix.(The superscript T indicates electrons in this shell increases, we can have triplets and even complex conjugate transpose.) The matrix A is diagonalma- larger families of the former spectralline, trix,with the spectrum along its diagonal. Thus this eigen- Let us now consider the quantum mechanical formulation of value problem can be described as the problem of finding a the harmonic oscillator problem. In terms of the nondimen- unitary matrix U that reduces H to areal diagonal matrix, 892 PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER 1982

Le., This equation represents thevon Neumann spectral representa- tion of the Hermitian matrix U-'HU = A. H Let us now analyze this equation. Ifwe strip the u and v (Note: In case His real, then H is a symmetric matrix and U from this equation, we are left with

is an orthogonal matrix.) 00 Althoughthe unitary matrix U, whose columnsare the XX(A)dA eigensolutions @i, is not uniquely determined by H, John von .=I, Neumann [ 121 in 1929 exploited the unitary nature of U to reformulatethe eigenvalue problem.The von Neumann re which, in matrix notation, is formulation, which is called the spectral representation prob- lem, yields the same results as the eigenvalue problem in finite- dimensional space, but has the advantage that it can be ex- We can write therow vector u as tended to Hilbert space. We recall that the diagonal matrix A is defined to be the matrix with the eigenvalues, ordered by increasing magnitude, along its main diagonal and zeros off the diagonal. Because of thisordering, thematrix A is uniquelydetermined for any which is given Hermitian matrix H. Because some eigenvalues may be u=uP~+UPZ +.**+Up,. repeated, let us relabel them as A1, Xz, * , X, (with m < n), where each hi is now distinct. Consequently for a given H, Finally, we can write we have the unique decomposition W A=AlQ1 +XzQz +*"+AmQm HU =[= AX(h)UdA where Qi is a diagonal matrix with 1's in those places on its main diagonal in which A, occursin A and 0's elsewhere. which is The sum of the Qfgives the identity matrix HU=X~P~U+A~P~U+...+X,P,V. I=Ql +Qz +*"+Q,. Let us now consider functions of the matrix H. First, we We now define the matrixPi as consider the square ofH. We have

Pi = UQiU-l (for j = 1,2, * * * , m), Hz=(AIPl +.**+A,P,)2=A:P1 t..-tA~Prn

A projectionmatrix is defined as a Hermitian idempotent 00 matrix. Because Qi is Hermitian (Qi = QT) and idempotent AzX(A) dX. = =I, (QiQi Qi), it follows that Qi is aprojection matrix. Be- cause Pi is Hermitian (Pi = Pi') and idempotent We see that squaring H results in squaring the A inside the PIPI = UQjU-' UQjU-' = UQjQjU-' =Pi integral.In general, if we forma function of H, thenthe result is that the same function of A is taken within the in- it follows that Pi is a projection matrix. Since for i #j tegral sign; that is PiPi = UQiU-lUQiU-' 0 f(H) f(A) X(N dA. it follows that Pi + Pi is a projection matrix and the space =[I spanned by Pi is orthogonal to the space spanned by PI. Let us now define the function X(A) of the continuous variable Theabove spectral representation was derived for finite Aas dimensional space, that is, a space in which the elements u and u are vectors and the Hermitian operator His a matrix. X(A)=P,6(A- A1)+Pz6(h- A2)+...+Pm6(A- A,). One of the major achievements of von Neumann was the de velopment of the concept of the infinitely-dimensional space, This function is thecontinuous representation of thesuite which hecalled Hilbertspace in honor of the great mathe of projection matrices Pz, * PI, , P,. matician David Hilbert(1862-1943). We nowlet u and u We now consider the quadratic form uHv where u is a row representselements in Hilbert space, and let H representa vector and u is a column vector. We have Hermitianoperator. A Hilbertspace is characterizedby an

UHU=UUAU-~U=UU(X~Q,+A~Q~ +..a+ X,Q,,,)U-~U innerproduct (or dot product). The inner product of the elements u and u is denoted by (u, u). If we let H operate on =u(h1P1 +AzPz +"'+X,P,>U theelement u, we obtaina new element Hv. Theinner =~~up~ut~~up~u+~~~t~,up,v. product of theelements u and Hv is denotedby (u,Hu). This inner product is the counterpart of the quadratic form The essence of the von Neumann spectral representationlies in uHv infinite-dimensional space. Once we establishthis con- the fact that the components uPiu are numerically invariant nection,it turns out that the von Neumannspectral repre- for given u, H, and u. In this way, the nonuniqueness of the sentation has exactly the same form in Hilbert spaceas it does unitary matrix U appearing in the eigenvalue decomposition infinite-dimensional space. Thus in Hilbert space, we also is bypassed. We see that we can write the quadratic form as have an operator X( X), which is the continuous representation the integral of the suite of projection operators associated with the Her- mitianoperator H. Whereas infinite-dimensional space, we made use of the quadratic form uX(A) u, we now make use of itscounterpart (u, X(A) v> in Hilbertspace. Thus the von ROB INSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 893

Neumann spectral representation in Hilbert space is and 1926, there was a crisis in abstract mathematics because the physics of quantum mechanics could not be adequately formulated in terms of the existing mathematical framework. (u, = u> dh. Hu) Ah, X(A) This situation was rectified in 1929 by von Neumann [ 12J I, who laid the mathematical foundations of quantum mechanics Let us now lookat somehistory. In general, there is no in terms of Hilbert space. There is an apocryphal story that quadratically integrable solution to the eigenvalue problem in the young John von Neumann, who was barely past being a Hilbert space. This circumstance, however, bothered no one teenager, and had not yet earned his doctorate, was lecturing working in physics. Wavelet solutions (i.e., quadratically in- in GBttingen. Of course, most of the famous physicsts present tegrable superpositions of eigenfunctions with eigenvalues in regarded his work as too abstract, but the great a small neighborhood) were used from the start, appearing in Hilbert was in the audience. As thestory goes, the elderly the works of de Broglie and Schrodinger from 1924. Hilbert leaned over and whispered into Professor Courant’s One of the authors cited in the Reference Section knew von ear: “What is this Hilbert space?” Another even more apoo Neumannpersonally, studied his work assiduously, and cer- ryphal story goes as follows. A group of physicists came to tainly regards him as one of the truly great founders of quan- von Neumann and described a problem in physics which they tum theory. However, there was never a “crisis in physics” could not solve. After thinking for a while, von Neumann in that was resolved by the von Neumann spectral representation his head came up with the numerical answer which agreed with theorem. Most peopledoing the practical calculations to be the experimental result, which the physicists knew but had not compared with experiment had never heard of the theorem, told him. They were very impressed and they blurted out which was for them at such a high level of abstraction that it “Dr. von Neumann, the general solution involves solving an had no bearing on what they were doing. infinite set of nonlinearpartial differential equations. Cer- Throughout this essay we have traced the development of tainly you have found some mathematical shortcut!” von Neu- spectral theory, from the analytic functions of Brook Taylor, mann answered “NO, I solved the infinite set.” to the nondifferentiable functions of Jean Baptiste Joseph de von Neumann [ 131 showed that from a mathematical point Fourier,and now tothe more general operatorsof Hilbert of view, it is the spectralrepresentation that is required in space. At each stage, these developments were mathematical quantum mechanics rather than the solution of the eigenvalue in nature,but they laid thefoundations for subsequent ad- problem as such. In this sense, spectral theory represents the vances in physics. Reasoning in mathematics and reasoning in key to the understanding of the atom. In fact, von Neumann physics often appear quite different. When a major physical [ 13J has shown that the spectral representation enters so es- breakthrough occurs, such as inquantum mechanics inthe sentially into all quantum mechanical concepts thatits 192O’s, and a flood of exciting new physical results come out, existence cannot be dispensed with. His establishment of the certainly the work of mathematicians in establishing existence spectral representation of the Hermitian operator H is one of and uniqueness theorems might seem somewhat irrelevant. the great achievements in mathematics, and a milestone in the For a moment let us go back to Sir Isaac Newton. It is often history of spectral theory. said thatthe unique greatness of Newton’s mind andwork consists in the combination of a supreme experimental with a VIII. EINSTEIN-WIENER THEORYOF BROWNIAN MOTION supreme mathematical genius. It is also often said thatthe A highly interesting kinetic phenomenonknown as Brownian distinctive feature of Newtonian science consists precisely in movement was first reportedin 1827 by the distinguished the linking together of mathematics and experiment, that is, botanist, Robert Brown, who found that “extremely minute in the mathematical treatment of experimental or (as in particles of solid matter when suspended in pure water exhibit astronomy, geophysics, or wherever experimentscannot be motions for which I am unable to account and which, from performed) observationaldata. Yet, although correct, this their irregularityand seeming independence, resemble in a description does not seem to be quite complete; there is more remarkable degree, the less rapid motions of some of the sim- in the work of Newton than mathematicsand experiment. plest animalcules of infusions.” This type of irregular zigzag There is also a deep intuition and insight in his interpretation movement is typified by the dancing of dust particles in a of nature. beam of light. The cause of Brownian movement was long in In today’s science, specialization has gone far. Physicists doubt,but with the development of thekinetic theory of use mathematics; they formulate problems, devise methods of matter came the realization that the particles move because solution, andperform long derivations and calculations, but they are bombarded unequally on different sides by the rap generally they are not interested in creating new mathematics. idly moving molecules of the fluid in which they are sus- The discovery and purification of abstract concepts and prin- pended. The Brownian movement never ceases. The detailed ciples is particularly in the realm of mathematics. John von physical theory of Brownian movement was worked outin Neumann (1903-1957) is a prime example of a mathematician 1904 by M. von Smoluchowski [ 141, and in a more final form doing physics. When he did physics, he thought and calculated in1905 by AlbertEinstein [ 15 I. In1923, Norbert Wiener likea physicist, only faster.. He understood all branches of [ 161 developed the mathematical theory of Brownian move- physics, as well as chemistry and astronomy, but mainly he ment, which today is the basis of the mathematical model of had a talentfor introducing only those mathematical ideas white noise incontinuous time. White noise is defined as a that were relevant to thephysics at hand. The introduction of stationaryrandom process which has a constant spectral abstract Hilbert space theory in quantum mechanics, chiefly powerdensity. Theconcept of the white noise process, as by von Neumann, made possible the construction of a solid given by the Einstein-Wiener theory of Brownian motion, is theory on the basis of the powerful intuitive ideas of Dirac important inall theoretical studies of spectrum analysis. and other physicists. In practice, a signal is of frnite duration, and usually can be The physics of quantum theory cannot be mathematically digitized on a grid fineenough for interpolation to be ade- formulatedin finite-dimensional space but requires Hilbert quate. In this sense, the set of data representinga signal is space. After the work of Heisenberg and SchrBdinger in 1925 really finite. Accordingly, we do nothave to go to continuous 894 PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER 1982 time or to infinite time unless 1) we so wish or 2) we gain from question becomes it. In other words, as long as we stay finite, we do not need the Einstein-Weiner theory. With this caveatemptor, let us f(t)e(t) dr. now discuss this theory. [I A white noise process in continuous time cannot be repre- sented by the ordinary types of mathematical functions which As is usual statistical practice, let E denote the mathematical one meets in calculus, Instead,white noise can only be expectation operator. Since this operator is linear, it may be interchangedwith integral signs (providedcertain regularity represented by what mathematicians call a generalized func conditions hold). The expectation of the above integral is tion. The most familiar example of generalized function is the Dirac delta function,which is often defined as 00 f(r) e(t) dt=ls f(t)Ee(t)dt. 0, for tfto 6(t- to)= I 00, for t = to Because we wantwhite noise to have zeromean, we let 00 Se(t) = 0, and so the above integral is zero. Let us next con- 6(t- to)dt= 1. sider the variance given by

The most important property of the delta function is its sift- E[Jmf(t)a(t)dt]2-00 =s[I_f(t)€(t)df/Pf(~)e(~)d~] -00 ing property, that is, its ability to isolate or reproduce a par- f(t) ticular value of anordinary function according to the 00 convolution formula = f(t)f(T)E[dT)€(TI1 dtd7.

Now we come to the key point. We want white noise to be uncorrelated at two different time points, but at the same time If one feels uncomfortablewith generalized functions,then we want the variance of white noise to produce an impulse so one can often avoid them by using Lebesgue-Stieltjes integrals. as to make the above integral have a nonzero value. Thus the For example, the Heaviside step function H(r) is an ordinary key element is to define the covariance E[e(t)e(7)I as being function equal to zero fort < 0 and to one for f 2 0. Since equal to 6 (t - 7). Then the above integral becomes dH(t)= 6 (t)dt 00 f(t) f(7) 6 (t - T)dt d7 = f2 (t)dt. the above convolution formula becomes theLebesgue-Stieltjes 1, integral L We can therefore make the following definition. A generalized random function e(t) is white noise provided that Ee(t) = 0 [I [I f(t - to)dH(t) = f(to). and Ee(t) E(T) = 6 (t - T). For a long time such a randompro- cess was regarded as improper. As we know, the delta function This Lebesgue-Stieltjes integral involves onlyordinary can be approximatedarbitrarily close byordinary fun@ functions. tions. Likewise, the white noise process e(t) can be approxi- Let us now look at a white noise process which we denote mated arbitrarily close by ordinary random processes. by e(t). It is a generalized randomfunction. Again let f(t) Because one never uses the white noise process in isolation be an ordinary function, andconsider the convolution integral but only in integrals, the white noise process can be avoided by the use of the Lebesgue-Stieltjes integral, just as the Dirac 00 delta function can be so avoided. However, as we have said, f(t - to)e(t) dt. we will not follow theLebesgue-Stieltjes approach here. Let us now consider white noise e(n) for discrete (integer) time n. White noise in discrete timeis not a generalized random Let d (t)be the integrated whitenoise process, so that we may write process, for e(n) is merely a sequence of zeremean, constant- variance, uncorrelated random variables. However, the Fourier d8(t) = e(t)dt. transform of discrete white noise is a generalized random pro- cess, which we denote by E(o). We have The integrated white noise process d (t)is an ordinary random function, and the above convolution becomes the Legesgue- Stieltjes integral ~(o)=2 e(n)e-’wn (for -a Q o Q n). n =-m

00 We can easily verify that E(w)has zero mean. The covariance f(t - to)& (t). 1- of E(w)is Wiener formulated everything. in terms of Lebesgue-Stieltjes integralswith ordinary functions. However, we are going to n k take a strictly engineering approach and formulate things in terms of ordinary integrals, but with generalized functions. Without loss of generality in the discussion which follows, nk we can forconvenience let to = 0, so thatthe integral k Because Ee(n) e(k)= (i.e., theKronecker delta function, RO BINSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 895 which is one when n = k and zero otherwise), we have the suggestion of periodicity still quite clear to the eye. If the errors are increased in magnitude, the graph becomes more ir- B[E*(u)E(C()I = e-in(p-u) = 27r6 (c( - a). regular, the suggestionof periodicitymore obscure, and we n have onlysufficiently to increase theerrors to mask com- That is, the covariance is a Dirac deltafunction. Thus we pletely any appearance of periodicity. But, however large the come to an important result: The Fourier transform of a white errors, Schuster’s periodogram analysis is applicable to such a noise process in (infinitely extended) discrete time n is a white time series, and given asufficient number of observations noise process in the continuous variable o. (It is easy to show should yield a close approximation to the period and ampli- thatthe correspondingresult holds forthe caseof awhite tude of the underlying sinusoidal wave. noise process in continuous time.) In other words, the Fourier Yule reasoned inthe following way. Considera case in transform of a very rough (white) process in time is a very whichperiodogram analysis is applied to atime series gen- rough(white) process infrequency. The Fourier transform erated by some physical phenomenon in the expectation of preserves (saves) information and does not smooth (destroy) elicitingone or more true periodicities.Then it seemed to information. Today this result is second-nature to an engineer, Yule that in such a case there would be a tendency to start but when Wiener obtained this result in 1923 it was startling. withthe initial hypothesisthat the true periodicitiesare Wiener unlocked the spectral theory of the most random of maskedsolely by additiverandom noise. As we well know, processes (white noise), and now thestage was set for applying additive random noise does not in any way disturb the steady this result to the more smooth processes which are generated course of the underlying sinusoidal function or functions. It bymany physical phenomena. Wiener madethis application is true that the periodogram itself will indicate the truth or in 1930 under the name of generalized harmonic analysis, but otherwise of thehypothesis made, but Yulesaw no reason before we give its history we will break our train of thought for assuming it to be the hypothesis most likelyu priori. and look at the innovative work of Yule in 1927. Yule’s work At thispoint, Yule introducedthe concept of anihput- at the time seemed modest. While most mathematicians and output feed-back model, The amplitude of a simple harmonic physicists weredeveloping general methods to deal with the pendulumwith damping (in discrete approximation) can be infinite and the infinitesimal in spectrum analysis, Yulewas represented by the homogeneous difference equation developing a simple model with a finite number of parameters b(n)+u1b(n- l)+uzb(n- 2)=0. (i.e., a finite parameter model) in order to handle spectrum analysis in those cases where this model was appropriate. This Here b(n) is the amplitude at discrete (integer) time n. Errors model of Yule is known as the autoregressive (AR) process. of observation would cause superposed fluctuations on b(n), but Yuleobserved that, byimprovement of apparatusand automatic methods of recording, errors of observation can be IX. YULE AUTOREGRESSIVESPECTRUM ESTIMATIONMETHOD practically eliminated: An initial impulse or disturbance would set the pendulum in motion, and the solutionof the difference At the turn of the twentieth century, Sir Arthur Schuster equation wouldgive the impulse response. The initial condi- [ 171 introduced a numerical method of spectrum analysis for tions are b (n) = 0 for n < 0 and b (0) = 1. The characteristic empirical time series. Let x(n) represent the value of a time equation of the difference equation is series at discrete (integer) time n. Given N observations of the time series from n = 1 to n =N, then Schuster’s method con- E’ +ulE +u2 = 0. sisted of computing the periodogrum P(o)defined as From physical considerations, we knowthat the impulse 1 response is a damped oscillation, so the roots El and E2 of the ~(o)=-Ix(l)e-iw +x(2)e-iw2 +**a +x(N)e-juN12. characteristic equation must be complex with magnitude less N than one. This condition is equivalent to the condition that For example, suppose that the time series consists of a sinu- uz < 1 and 4u2 - a: > 0. The solution of the difference equa- soid of frequency oo with superposed errors; then, the per- tion thus comes outto be iodogram would show a peak at o = oo.Thus by computing sin(n+ l)oo the periodogram, the peaks wouldshow the location of the b (n)= eAn frequencies of the underlyingsinusoidal motion. Until the sin a0 work ofYule (1871-1951)in 1927 [18],the Schuster per- where iodogram approach was the only numerical method of empiri- cal spectrum analysis. However, manyempirical time series A = 0.5 In ~2 observed in nature yielded a periodogram that was very erratic and did notexhibit any dominant peaks. This led Yule to wo = tan-’ [-a;’ dW. devise his autoregressive method of spectrum analysis. In The damped oscillation b (n) is the impulse response function. those days, empirical spectrum analysis was called the investi- Thefrequency 00 is thefundamental frequency of the im- gation of periodicities in disturbed series. His main application pulse response function. was thedetermination of thespectrum of Wolfer’s sunspot Aswe mentioned above, Yule ruled out superposed errors. time series. Now, however, he allows a driving function (or input)of white G. Udny Yule in1927 introduced the concept of afinite noise, which he describes in the following way. The apparatus parameter model for a stationary random process in his funda- is left to itself, and unfortunately boys get into the room and mental paper on the investigation of the periodicities in time start pelting the pendulum with peas, sometimes from one side series with special reference to Wolfer’s sunspot numbers. If and sometimes from the other. He states that the motion is we consider a curve representing a sinusoidal function of time now affected, not by superposed fluctuations, but by driving and superpose on the ordinate small random errors, then the disturbances. As aresult, the graph willbe of anentirely only effect is to make the graph somewhat irregular, leaving different kind than a graph in thecase of a sinusoid withsuper- 896 PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER 1982 posed errors, The pendulumand pea graph will remain sur- was motivated by the work of researchers in optics, especially prisingly smooth,but amplitudeand phase will vary con- that of Rayleigh and Schuster. However, Wiener demonstrated tinuously, as governed by the inhomogeneousdifference that the domain of generalized harmonic analysis was much equation broader than optics. Among Wiener’s results were the writing down of the precise definitions of and the relationship be- x(n)talx(n- 1)+(12x(n- 2)=€(n) tween the autocovariance function and the power spectrum. where ~(n)is the white noise input. The solution of this dif- Thetheorem that these two functions make up a Fourier ference equation is transform pair is today known as the Wiener-Khintchine theorem [221. 00 Mention should be made of the basic fact that the existence x(n) = b(k) ~(n- k) of the spectrum follows from the properties of positive defi- k=0 nite functions. Bochner’s theorem on the spectral representa- where b (k)is the impulse response function given above. tion of positive definite functions provides adirect mathe- Yule thus created a model with a finite number of param- matical unification of spectral theories in Hilbert space and in eters, namely, the coefficients al and a2 of the difference stationary time series. equation. Given an empirical time series x(n),he uses the The writer several times in the 1950’s discussed with Pre method of regression analysis to fiid these two coefficients. fessor Wiener why his 1930 paper was not more accepted and Because he regresses x(n) on its own past instead of on other used by the mathematical profession at the time. As with all variables, it is a self-regression or autoregression. The least things, Wiener looked at history quite objectively and with his squares normal equations involve the empirical autocorrelation characteristic concern for and love of people. In retrospect, it coefficients of the time series, and today these equations are seems it was not until the publication of Wiener’s book Cyber- called the Yule-Walker equations. netics (231 in 1948 and also the nonclassified publication of Yule carried out his autoregressive analysis on Wolfer’s sun- his book Time Series [24] in 1949 that the general scientific spot numbers, which are a sequence of yearly observations of community was able to grasp the overall plan and implications sunspot observations. He used the numbers over the period of Wiener’s contributions. 1749-1 924 and obtained the autoregressive equation (with the The following passage from Wiener’s 1933 book The Fourier mean value removed) Integral [251 indicates the philosophy ofWiener’s thinking and his great personal appeal: x(n)- 1.34254x(n - 1)+0.65504x(n- 2)=~(n) “Physically speaking, this is the total energy of that portion and thus of the oscillation lying within the interval in question. As X = 0.5 In 0.65504 = -0.21 154 this determines the energy-distribution of the spectrum, we may briefly call it the “spectrum.” The author sees no com- wo = 33.963’ per year. pelling reason to avoid a physical terminology in pure mathe- Hence, the dominant period is 36O0/wO= 10.60 years. Yule matics when a mathematical concept corresponds closely to states that his autoregressive method represents an alternative a concept already familiar in physics. When a new term is to method of estimating the spectrum, as opposed to the Schuster be invented to describe anidea new tothe pure mathe- periodogram. In fact, his autoregressive model gives him an matician, it is by all means better to avoid needless duplica- estimate not only of the power spectrum but of an amplitude- tion, and to choose the designation already current,The and-phase spectrum “spectrum” of this book merely amounts to rendering pre- cise the notion familiar to the physicist, and may well be 00 1 as known by the same name.” Let us now define a stationary random process. We could which, for the sunspotnumbers, is either use discrete or continuous time, but for convenience let us use discrete (integer) time n. Let the process be denoted by 1 B(w)= x(n), which, we will assume, haszero mean. The process is 1 - 1,34254 e-iw + 0.65504 e-2iw called (second-order) stationary, provided thatits autoce variance function The magnitude IB(o)l and the phase d(o) are given by the equation @(k) =lTx*(n)x(n + k) depends only upon the time-shift k. Here, as always, the super- script asterisk indicates complex-conjugate. The normalized The power spectrum is the square of the magnitude spectrum, autocovariance function is called the autocorrelation function. that is, IB(o)12. The peak is close tothe fundamental fre- However, Wiener generally used the term autocorrelation for quency wo = 33.963’ per year. Exceptin exploration gee @(k), whether it was normalized ornot, Nevertheless, it is physics [ 19 I, [201, where Yule’s amplitude-and-phase spee confusing to keep using the term “autocorrelation” with two trum B(o) is physically the spectrum of the minimum-delay different meanings. It is better to use the term “autocovari- seismic wavelet, Yule’s spectralestimation method received ance” wherever it is appropriate. scant attention until the1960’s. A white noise process is stationary. In the case of continu- ous time, its autocorrelation is 6 (t) (the Dirac delta), whereas X. WIENER’S GENERALIZEDHARMONIC ANALYSIS in the case of discrete time, its autocorrelation is 6k (the Norbert Wiener [ 2 11 published in 1930 his classic paper, Kronecker delta). As we have seen, the Fourier transform “Generalized Harmonic Analysis,” whichhe personally con- E(w) of white noise in time is white in frequency; thatis, the sidered his finest work. In his introduction, he states that he autoconrelation in frequency is the Dirac delta function ROB INSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 891

reservations of the mathematicians. It is true that the physi- Theproblems confronting empirical workers inspectral cists spoke of 6 (t)as an ordinary function, which it cannot be analysis in the fmt part of the twentieth century werecen- in any precisesense, and that they treated it as an ordinary tered around the Schuster periodogram. Schuster introduced function by integrating it and even differentiating it. But the this concept at the turn of the century, and until Yule’s work physicists were justified because they only used 6(t) inside integralswith sufficiently-differentiable functions f(t). For in 1927, it was the only method available to carry out empiri- example, the derivative of 6 (t) always appeared inside in- cal spectral analysis. Supposethat weobserve astationary an tegral, and the integral was integrated by parts follows: random process for a very long time, so that we obtain a time as series x(n) for n = 1, 2, * * ,N, where Nis verylarge. Schuster 00 then computed theperiodogram Jm 6’(t)f(t) dt = - 6 (t)f’(t) dt = -f’(O). -00 I-00 The physicistsnever used the delta function except to map functions to real numbers. In this sense, they employed the where X(w)is the discrete Fourier transform machinery but not the words of distributiontheory, which was devised expressly in order to give delta functions a sound basis. It was the French mathematician L. Schwartz who after n=l World War I1 created a systematic theory of generalized func- tions and explained it in well-known monograph Thiorie (Today we can compute X(w) very rapidly by means of the his des Distn’butions in 1950 and 195 1. From then on the theory Cooley-Tukey fast Fourier transform, but then it was a for- ofgeneralized functions wasdeveloped intensivelyby many midable task.) In the case when the stationary process is made mathematicians. This precipitate development of distribution up of sinusoidal waves withsuperimposed white noise, the theory received its main stimulusfrom therequirements of periodogram is effective in picking out the discrete frequencies mathematics and theoretical physics, in particular the theory of the sinusoids. Buta purely nondeterministic stationary of differentialequations and quantum physics,Generalized process is generated by the convolution formula (input-output functions possess a number of remarkable properties that ex- relation) tendthe capabilities of classical mathematical analysis, For

00 example, any generalized function turns out to be infiitely differentiable(in the generalized meaning),convergent series k=O of generalized functionsmay be differentiated termwise an infinite number of times, the Fourier transform of a general- (Here we interpret b(k) as the impulse response function of a filter, the white noise process E(n) as the input to the filter, izedfunction always exists,and so on. For thisreason, the usesof generalized functiontechniques substantially expand andthe stationary process x(n) as theoutput). For such a the range of problems that can betackled and leads to ap process, the Schuster periodogram P(o) is extremely rough, and oftencannot readily be interpreted.Empirical spectral preciable simplifications that make otherwise difficult opera- tions automatic, analysis was at an impasse, Most of the time series observed in nature could not be analyzed by the methods available in As scienceadvances, its theoretical statements seem to re- 1930. quire an ever higher level of mathematics. When he gave his Now comes Wiener in1930 with generalized harmonic theoretical prediction of the existence of antiparticles in 1931 analysis, In brief, Wiener in1930 knew how to take the (hoc. Roy. SOC. London, Ser. A, vol. 133, pp.60-72) Dirac Fourier transform of a stationary random process, a milestone wrote, “It seems likely that this process of increasing abstrac- in the use of Fourier methods. Wiener’s generalized harmonic tion will continue in the future and that advance in physics is analysis makes use of a generalized random function, namely, to be associated with a continual modification and generaliza- the Einstein-Wiener (whitenoise) process. Inorder toput tion of the axioms at the base of mathematics rather than with a logical development of any one mathematical scheme on a Wiener’s work into context, we will now give a small digression onthe most widely known generalized function:the Dirac fixedfoundation.” Subsequent developments in theoretical delta function. physicshave corroboratedthis view. Inthis essay, we have The impuhe(Dirac delta) function hadbeen knownfor seen that since the time of Newton, the search for and the manyyears prior toits use by Dirac [26]in 1928. It was study of mathematicalmodels of physical phenomena have known by Heaviside [27]. However, it took the stature of a made it necessary to resort to a wide range of mathematical great physicist, Paul Dirac, to decree in 1928 the useof the toolsand have thusstimulated the development ofvarious impulse function in physics. In those early days, people used areas of mathematics. Now let us return to Norbert Wiener in 1930. to talkabout 6(t) a function of t in theordinary sense as Physicists areconcerned with unlocking the mysteries of whose integral with f(t)produces that is, f(0); nature, and the impulse (Dirac delta) function eases their task. The impulse function is the simplest of the generalized func- tions. Onecan imagine the plight of Wiener inthe matha

J-00 maticalcommunity when he introduced generalized random functionsinto the mathematical literature as early as 1923, This idea used to cause great distress to mathematicians, some and especially in his 1930 paper. of whom even declared that Dirac was wrong despite the fact Let us now give the gist ofWiener’s generalizedharmonic that he kept getting consistent and useful results, The physi- analysis. As we know,convolution in thetime domain cor- cists rejected these extreme criticisms and followed their inttti- responds to multiplication in the frequency domain. Thus in tion. We can now see why the physicists succeeded despite the terms of Fourier transforms, the above input-output convolu- 898 PROCEEDINGSIEEE, OF THE VOL. IO, NO. 9, SEPTEMBER 1982

tion integral becomes a calendar, to the work of the great mathematicians who for- mulated the wave equation in the eighteenth century, it took X(0)= B(0)E(o). thousands of years.Then the work of Bernoulli,Euler, and In this equation, E(w) is the Fouriertransform of white Fourier came, and the result was a spectral theory in terms noise, so that E(o) is ageneralized function that is white of sinusoidal functions, in place at the beginning of the nine- (i.e., very rough) in frequency.The filter’s transfer func- teenth century. The theory was extended to the case of arbi- tion E(o)is a smooth well-behaved (ordinary) function. The traryorthogonal functions by Sturm and Liouville, andthis product X(w)is also very rough. led to the greatest empirical success of spectral analysis yet Let us nowtake the inverse Fouriertransform of X(&)). obtained: the physical results of spectral estimation that un- It is locked the secret of the atom. Credit for this result belongs to Heisenbergand Schriidinger in 1925 and 1926. Then in 1= 1929, the work of von Neumann put the spectral theory of x (n)= ~1,ejwnX(w) do the atom on a fim mathematical foundation in his spectral representation theorem. The spectral work of von Neumann which is represents the cumulation of this line of research in quantum physics. Meanwhile, Rayleigh andSchuster at the beginning 1” of the twentieth century were applying the original sinusoidal x(n)= -2.rr ejW”B(w)E(w) dw. methods of Fourier to the analysis of datain the realm of classical physics. However, the periodogramapproach of This formula represents Wiener’s generalized harmonic analysis Schuster did not work well for purely nondeterministicstation- of x (n); that is, it is the spectral representation of the station- ary random processes, and this led Yule in 1927 to develop a ary random process x(n). It involves the smooth filter trans- spectraltheory for a subclass known as autoregressive pro- fer function B(w) and the very rough(white in frequency) cesses. Meanwhile, Wiener haddeveloped the mathematical process E(w). We thus see thatthe spectralrepresentation theory of Brownian movementin 1923, and in 1930 he in- requires Wiener’s generalized randomfunction E(o),which troducedgeneralized harmonic analysis, that is, the spectral came out of his studies of Brownian movement. representation of a stationary random process. Thus in 1930, Wiener’s generalized harmonic analysis (i.e., spectral repre- we have two spectral theories, one represented by the spectral sentation) explains why the periodogram of Schuster did not representation theorem of von Neumann and the other by the workfor convolutional processes. Because the periodogram spectral representation theorem of Wiener, It is the purpose (as the number of observations becomes large)is of thissection, with the benefit of hindsight, of course, to indicate the relationshipbetween von Neumannand Wiener 1 spectral theories. P(w)= - IX(0)l2 N The common ground is the Hilbert space. As we have seen, the von Neumannresult is thespectral representation of a it follows that the periodogram has the intrinsic roughness of Hermitian operator H in Hilbert space. The Schrodinger equa- the X(w)process. (It was not until thework of J. Tukey [34] tion is written in terms of a Hermitian operator, and this equa- in 1949 that a means was found to overcomethis problem; tion governs the spectrum of atoms and molecules. Now let Tukey’s breakthrough was of epoch proportions.) us, however, leave this Hilbert space and look at another one. Wiener in his 1930 paper gave the following method, which The other Hilbert space is one defined by the probability mea- was standarduntil the work of Tukeyin 1949. Wiener’s sure that governs the stationary random process in question. method was intended for very longtime series. Itconsisted we know, a Hilbert space is specified by an inner (or dot) of computing the autocovariance function as the time average As product. The elements of the Hilbert space are random vari- ables, and the inner product is defied as the expected value given by for -p Q k Gp, where p is less than the data length N, and (x,y)=Ex*y. then computing the powerspectrum @(a) as the Fourier transform (The superscript asterisk indicates the complex conjugate.) In this Hilbert space, a stationary process is defined as follows. We use discrete (integer) time n, although a similar develop ment can be made in the case of continuous time. A sequence k=-p of random variables x (n)in Hilbert space is called a stationary This Fouriertransform relationship between autocovariance random process if its autocorrelation and power spectrum, aswe have observed, is now called the @(k) = (x(n), x(n+ k)) Wiener-Khinchin theorem. depends only upon the time-shift k and not on absolute time Whereas von Neumann’s work in quantum physics in 1929 n. This definition implies that the elements x(n) of the pro- received instant acclaim and well-deserved recognitionby cess are generated recursivelyby a unitary operator; thatis, physicistsand mathematicians, Wiener’s workin 1930 lay dormant. However, now withthe benefit of hindsight, it is Ux(n)=x(n+ 1) worthwhile for us to reconcile these two approaches to spec- so that tral estimation. This we will do in the nextsection. x(n + k) = Ukx(n). XI. RECONCILIATION OF THE Two SPECTRALTHEORIES Because a unitary operator represents a rotation, we see that a We have come a long way in the history of spectral estima- stationary random process traces out a spiral in Hilbert space, tion to this point. From the work of the ancients in deriving the so-called Wiener spiral. We now come to the connection R OBINSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 899 that we are seeking, namely, the fact that the Cayley-Mobius Although Wiener’s “GeneralHarmonic Analysis” did not transformation[28] of aHermitian operator is a unitary have immediateinfluence, his TimeSeries book, which was operator. Thus there is a one-bone correspondence between written in a more understandable style, did among those who Hermitian operators and unitaryoperators in Hilbert space. had access to the book in 1942 and thegeneral public in 1949. The von Neumann spectral representation is for a Hermitian As we will now see, a great deal of credit for the dissemination operator. Ifwe takeits Cayley-Mobius transformation, we ofWiener’s ideas belongs to his former student and his col- obtain the corresponding spectral representation for the uni- league, Professor N. Levinson. tary operator U. This spectral representation has the form Levinson’s initial contact with Wiener was in Wiener’s course in 1933-34 on Fourier Series and Integrals, which is described u=-1= in Levinson’s own words as follows: 2n 1,ejWQ(u) do “I became acquainted with Wiener in September 1933 while where Q(o)represents a family of projection operators as a still an undergraduate student of , when function of circular frequency o. Thus the process has the I enrolled in his graduate course. It was at that time really a representation seminar course. At that level he was amost stimulating teacher. He would actually carry out his research atthe 1= blackboard. As soon as I displayed a slight comprehension x(n)= U”x(0)= -2n 1. eiw”Q(u) x(0) do. of what he was doing, hehanded me the manuscript of Paley-Wiener for revision. I found a gap in a proof and We now make the identification proved a lemma to set it right. Wiener thereupon set down at his typewriter, typed mylemma, affixed myname and Uu)x (0) = X(u) sentit off to ajournal. A prominent professor does not often act a secretary for a young student.” and we obtain as N. Levinson, adynamic and brilliant mathematicianand a 1= warm and kind person, made important and permanent con- x(n) = ,I, e’W”X(o) do. tributions to engineering and applied science. The Levinson theorem in quantum mechanics illustrates his ability to grasp This equation is Wiener’s generalized harmonic analysis of the the relationship between physical concepts and mathematical process. Thus wehave the connection we sought;the two structure, Few have thisinsight, and nowhere is itbetter spectralrepresentations are related by the Cayley-MBbius demonstrated thanin the two expository papers writtenin transformation. 1942 by Levinson soon afterthe restrictedpublication of Wiener’s Time Series book. These two papers were published XII. WIENER-LEVINSONPREDICTION THEORY in 1947 in the Journal of Mathematical Physics, and thus they Early in 1940, Wiener became involved in defense work at represented the first public disclosure of Wiener’s time series MIT and, in particular, he became interested in the design of results. Later these two papers also appeared as Appendices C fire-control apparatus for anti-aircraft guns. The problem was and B inthe unrestrictedpublication ofWiener’s book in to build into the control system of the gun some mechanical 1949 by MIT Press 1241. device to aim the gun automatically. The problem, in effect, An appreciation of Levinson’s contribution can be gained in was made up of two parts: a mathematical part, which con- historical perspective. The 1942 edition of Wiener’s book was sisted of predicting the future position of an airplane from its bound in a yellow paper cover, and because of its difficult observed past positions, and an engineering part, which con- mathematics, it came to be known among engineers as the sisted of realizing the mathematical solution in the form of an “yellow peril” (a term familiar to mathematicians as applying actual physical device. Wiener recognized that it was not pos- to afamous series of advanced texts). However clear in a sible to develop a perfect universal predictor, and so he formu- conceptual way the building of an actual device was to Wiener, latedthe mathematicalproblem on astatistical basis.He there were few engineers at that time who were able to grasp defined the optimum predictor as the one that minimizes the Wiener‘s mathematical solution, much less to realize it in the mean-square predictionerror. The minimization led tothe form of a physical device. At this point, Levinson stepped in Wiener-Hopf integral equation, which represented the comple- and wrote “A heuristic exposition of Wiener’s mathematical tion of the mathematical part of the problem. As to the engi- theory of predictionand filtering,” [301, one of his two neering part, Wiener immediately recognized thatit was classic applied papers on explaining Wiener’s work. Levinson possible to devise ahardware apparatusthat represents the describes his paper as an expositoryaccount ofWiener’s solution to the Wiener-Hopf equation. As Wiener [29] states theory. Levinson’s earlier training was in electrical engineer- in his autobiography (p. 245): “It was not hard to devise ap ing, so he understood hardware design methods. In the paper, paratus to realize in the metalwhat we had figured out on Levinson shows in an elementary way why the Wiener-Hopf paper. All that we had todo was make a quite simple as- equationcannot be solved by use of theFourier transform sembly of electric inductances, voltage resistances, and capaci- theorem, Then, in a natural way, he introduces the spectral tors, acting on a small electric motor of the sort which you can factorization and obtains the explicit solution for the predic- buy from any instrument company.” Wiener’s mathematical tion operator and, more generally, for the filter operator. This results [ 241 were published in 1942 as a classified report to masterpiece of expositionopened up these methods tothe Section D2of the National Defense Research Committee. engineering profession. This report is Wiener’s famous TimeSeries book, which we Levinson’s other classic applied paper is entitled, “The Wiener mentioned previously. Its full title is Extrapolation,Inter- RMS (root mean square)error criterion in filter design and polation,and Smoothing of StationaryTime Series with prediction.” 131 I As before, let us try to put thispaper in EngineeringApplications, and it was republished as an un- historical perspective. In 1942, the Army Air Force Weather restricted document in 1949 by MIT Press, Cambridge. Division negotiated a contract with MIT to perform statistical 900 PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER 1982 analyses of meteorological and climatological data, particularly siumon Autocorrelation Analysis Applied to Physical Prob in relationship to weather forecasting, and to conduct research lems” held at Woods Hole, MA in June 1949, sponsored by the into the applicationof statistical techniques to long-range fore- Office of Naval Research, The high point of this meeting was casting [ 321. Professor G.P. Wadsworth of the MIT Mathe- thepaper by Tukey [34], Before Tukey’s work,the power maticsDepartment was in charge of thismeteorological spectracomputed from empirical autocorrelation functions project. The basic idea was to collect and sort large amounts were too erratic. to be of any use in formulating physical hy- of numerical meteorological data and to forecast by analogy, potheses. Not only did Tukey show correctly how to compute much like the forecasts made on television today, in which the power spectra from empirical data, buthe also laid the statisti- weatherman looks at the data appearing on the satellite picture cal framework for the analysis of short-time series, as opposed of the earth. Wadsworth’s methodhad merit, but the data to the very long ones envisaged by Wiener and Levinson. required was just not available in the 1940’s.Wiener’s Time Wadsworth was also director of the MIT section of the U.S. Series book was completedat about the same time as this Naval Operations Evaluation Group, a project started in World MIT Meteorological Project was starting up. Since the weather WarI1 which initiated the useof operations research in the data available occurredat discrete intervals of time,the United States. By 1950, Wadsworth was applying operations continuous-time methods of Wiener were not directly applic- research methods to industry and had established himself as able. As aresult, Wadsworth asked Levinson to writeup a one of the highest paid consultantsin the United States. discreteform of Wiener’s theory, Theresult wasLevinson’s There were so many industrial people waiting to see him in “Wiener RMS” paper with the Levinson recursion. However, his outer office at MIT that one had to make an appointment use was never made of Wiener-Levinson prediction theory by with his secretary many weeks in advance to see him in his the MIT Meteorological Project,and Levinson’s papersat inner office for just 5 or 10 minutes at the most. The writer dormant. began as one ofWadsworth‘s researchassistants in the MIT Inorder to understandwhy Levinson’s methods were not Mathematics Department in September 1950, and he was as- used in the 1940’s, one must look at the computing facilities signed to work in seismology by Professor Wadsworth. Mobil available at the time. The actual realization of these methods Oil made available eight seismic records, and the writer im- would have to be carried out by people using hand calculators. mediately got a very lonely feeling, especially at MIT at night A hand calculator could add, subtract, multiply, and divide, digitizing the Mobil seismic records with a ruler and pencil, but had no memory except an accumulator. Thus the result Except for Wadsworth, in 1950 nobody at MIT or in the oil of each separate calculation had to be transferred to paper by industry thought that theanalysis of digital seismic data would hand, a drawn-out, time-consuming process. In contrast to the ever be feasible. hardware devices working in real time, as envisaged by Wiener, Fortunately Tukey took an interest in the seismic project thehand calculator was apoor substitute. As Wiener [24, and conveyed his research ideas by mail. The first empirical p. 1021 states: “Much less important, though of real interest, results were the computation of the Tukey spectra forvarious is the problem of the numerical filter for statistical work, as sections of the Mobil recordsin the spring of 195 1. From contrasted with the filter as a physically active piece of engi- thesespectral results, a seismic analysis based on prediction neering apparatus.” error was formulated in the summer of 195 1, This analysis After Levinson wrote his two expository papers, which were made useof Wiener prediction theory in digital form. Prior completed in 1942, neither he nor Wiener took up researchin to this work, Wiener’s procedures had only been realized in thecomputational (software) aspect ofWiener’s theory. analogform. In hand plotting the fit numericalresults of Wienerwas moreinterested in its realization by machines what today is called linear predictive coding (LPC), the writer (hardware), and his research interest was already shifting to was so amazed that he could not believe his eyes, and he was biological and medical problems. In fact, it was theunion sure that he would never see such good results again. But the of these two research interests that led to his discovery and second trace, and the third trace, and so on, were computed formulation of the science of cybernetics, which he describes andconfumed what he saw. Thedigital processing method as the problem of control and communication in machines and called deconvolutionworked! As soon as possible, hemade animals. Meanwhile Levinson had decided as early as 1940 to an appointmentto see Wadsworth,which thesecretary set shift his field from the Fourier methods of Wiener to the field three weeks from then, in September 195 1. Theresult was of nonlineardifferential equations. He talkedabout this that the digitally-processed seismic traces [351 were sent out decision with his friends in 1940. Levinson worked hard over to the oil industry, and the oil companies gave money to sup a period of two or three years (which included the period dur- port a project. The MIT Geophysical Analysis Group was thus ing which he wrote the two expository papers) before he felt born,and the MIT Whirlwind digitalcomputer wasused to that he had enough mastery in his new field. Such mastery he analyze seismic records throughout the lifetime of the GAG did achieve, and his outstanding contributions to differential (1952-1957). During thisperiod, Tukey freely gave his re- equations were recognized by his receiving the prestigious search advice [ 361, [ 371. For example, Tukey’s methods for Bocher Prize in Mathematics in 1954. estimating coherency (today called by various names, such as Despite their other research interests, both Wiener and “semblance” by the oil industry) are vital in the estimation of Levinson were always ready to give their support and time to seismic velocity as well as in othermultichannel methods. the MIT Meteorological Project directed by G. P. Wadsworth. Tukey’svision of afast Fourier transform was always in- Wiener was especially interested in seeing physical examples of fluential.In fact, s. M. Simpson [38], wholater directed autocorrelationfunctions. This interestled to the computa- the Geophysical Analysis Group,eventually devisedan ef- tion ofseveral autocorrelation functions of ocean wave data ficient 24-point Fourier transform, which was a precursor to by Wadsworth and by his friend and associate H. R. Seiwell the Cooley-Tukey fast Fourier transform in 1965. The FFT [33], who was with the Woods Hole Oceanographic Institu- made all of Simpson’s efficient autocorrelation and spectrum tion. The interest in these computations led to the ‘‘SympG programs instantlyobsolete, on which he had worked the ROBINSON:PERSPECTIVE HISTORICAL ESTIMATIONOF SPECTRUM 901 equivalent of half a lifetime. Wiener was very generous of his tion was required for the proper design of experiments for the time [39]. Wiener’s work [40], [41] on multichannelmeth- collection of time series data. In a very interesting paper [44], ods was helpful laterin extending the Levinson recursion, Tukey describes the situation which led to his spectral work, which Levinson had devised for single channel time series, to includinga discussion of Hamming’s suggestion aboutthe the multichannel case [421. Theexcellent seismic dataand smoothing of the discrete Fourier transform of an empirical corresponding well logs supplied by the oil industry to the autocorrelation, which led to the joint work of Hamming and Geophysical Analysis Group made possible the development Tukey. of the statistical minimum-delay model of the earth’s strati- During the last four decades, Tukey [45]-[ 501 introduced graphic layers, together with the theoretical justification of a multitude of terms and techniques that are standard to the seismic [ 191, [43]. practice of thedata analysis of time series. Suchcommon- In the late 1950’s a digital revolution occurred because of place termsand concepts as “prewhitening,” “aliasing,” the introduction of transistors in the building of digital com- “smoothing anddecimation,” “tapering,” “bispectrum,” puters, which made possible reliable computers at amuch “complex demodulation,” and “cepstrum” are due to Tukey. lower cost than previously. As a result, the seismic industry Very few papers in the literatureof applied time series analysis completely converted to digital technology in the early 1960’s, do not give some acknowledgment of Tukey’s ideas and meth- a long ten years after the first digital results were obtained. ods, and most papers credit his ideas in some vital way. More- Since then, nearly every seismic record taken in the explora- over, Tukey [51]-[551 has made substantial contributions in tion of oil and natural gas has been digitally deconvolved and the placing of the data analysis of time series into perspective otherwise digitally processed by thesemethods. The final with current research in the physical sciences, in statistics, and result of the digital processing of seismic data was the dis- in computing and numerical analysis. covery of great oil fields which could not be found by analog We have already mentioned the key influence Tukey had on methods. These oil fields includemost of the offshore dis- the MIT group, which included Wadsworth, Simpson, the coveries, as in the North Sea, the Gulf of Mexico, the Persian writer, and others. W. J. Pierson and L. J. Tick [56] at New Gulf, as well as great onshore discoveries in Alaska, Asia, York University used Tukey’s methodsin the analysis of Africa, Latin America, and the Middle East, made in the last oceanographic time series records. The outstanding thesis of twenty years, Today an oil company will deconvolve and Goodman [ 571, which extended the results of Tukey to the process as many as one million seismic traces per day; it took bivariate case, was written under Tukey’s supervision. The a whole summer in 195 1 to do 32traces. groupat La Jolla, CA, which included Munk, Rudnick,and Whereas the digital revolution came first to the geophysical Snodgrass, applied Tukey’s spectral methods to estimate wave industry largely because of the tremendousaccuracy and flexi- motion due to storms manythousands of miles away; a bility afforded by large digital computers, today we are in the testimony to the power of his methods. Munk and McDonald midst of a universal digital revolution of epic proportions. One wrotea remarkable book, The Roration of theEarth [58], now realizes that the work of Wiener and Levinson is being which used these spectral methods in several novel ways. appreciated and used by an ever-increasing number of people. In this period, packages of computer programs fortime Digital signal processing is a growing and dynamic field which series analysis were appearing. The collection of programs by involves the exploration of new technology and the applica- Healy [59] of Bell Laboratories were circulated from1960 tion of the techniques to new fields. Thetechnology has on. The BOMM collection of programs [ 601 was developed at advanced from discrete semiconductorcomponents to very La Jolla. Some of the programs used by Parzen were included largescaleintegration (VLSI)with densities above 100 000 in his book [61]. The programs written at MIT are described components per silicon chip. The availability of fast, low-cost in [381 and 1621. microprocessors and custom high-density integratedcircuits In econometrics, Granger’s book [ 631 in 1964 described means that increasingly difficultand complex mathematical many of the techniques suggested by Tukey for the analysis methods can be reduced to hardware devices as originally of univariate and bivariate time series. Themost successful envisaged by Wiener, except the devices are digital instead of application of spectrum techniques to economic series is its analog. For example,customa VLSI implementation of use forthe description of themultitude of procedures of linear predictive coding is now possible, requiringa small seasonal adjustment. In astronomy, Neyman and Scott [64] number of custom chips, Whereas originally digital methods in 1958 carried outthe analysis of two-dimensional data were used at great expense only because the application de- consisting of the positions of the images of galaxies on phot@ manded high flexibility and accuracy, we have now reached graphic plates. thepoint that anticipated long-term cost advantages have Norbert Wiener remained active until his death in 1964. His become a significant factor for the use of digital rather than later work included both empirical results, such as modeling analog methods. and analyzing brain waves [661, [671, and theoretical results, such as his work with Masani on multivariate prediction theory XIII. TUKEY EMPIRICALSPECTRAL ANALYSIS [40], [41]. Wiener’s death marks the end of an era in time Aswe have mentioned,a turningpoint in the empirical series analysis and spectral theory. analysis of time series data began in 1949 at the Woods Hole Symposium on Applications of Autocorrelation Analysis, W.THE COOLEY-TUKEYFAST FOURIERTRANSFORM There Tukey presented the fit of three papers [34], [36], The present epoch of time series analysis began in 1965 with [37], which he had written in the early years on spectrum the publication of the fast Fourier transform by Cooley and analysis. These papers introducedthe classic Tukeymethod Tukey [ 681. The effect that this paper has had on scientific of numerical spectral estimation, a method that has been used and engineering practice cannot be overstated, The paper by most workers since that time. In addition, Tukey described described an algorithm for the discrete Fourier transform of an approximate distribution forthe estimate. This distribu T = TI- - Tp values by means of T(T1 + * - + Tp) multi- 902 PROCEEDINGS OF THE IEEE, VOL. 70, NO. 9, SEPTEMBER 1982

plicationsinstead of the naive number T2. Althoughsuch regularity of a noise-like process does not lie in the shapes of algorithmsexisted previously [69],they seem notto have its individual realizations, but in itsunderlying statistical struc- been put to much use. Sande developed a distinct, symmetri- ture. As a result, different realizations usually do not appear cally related algorithm simultaneously and independently. to resemble each other. A segment of one realization generally The existence of such an algorithm meant, for example, that will neither look the same nor have the same empirical aut@ the following things could be computed an order of magnitude covariances as the corresponding segment of another realiza- morerapidly: spectrum estimates, correlograms, filtered ver- tion. Anyone using the autocovariance from a given realiza- sions of series, complex demodulates, and Laplace transforms tion as a“known” autocovariance,and thenfitting this (see, for example, [70]). General discussions of the uses and “known” autocovariance exactly, is likely to be in worse error importance of fast Fourier transform algorithms maybe found than if hedoes not. As Brillinger andTukey point out, we in [7 1] and [ 721.The Fourier transform of an observed often have only one distinct realization, and we need to make stretch of series can now be taken as a basic statistic and clas- as good inferences as we can about the underlying population. sical statistical analyses-such as multiple regression, analysis We have to think statistically, and treat our numerical results of variance, principal components, canonical analysis, errorsin with due consideration as to the tentative nature of the struc- variables, and discrimination-can be meaningfully applied to tural and stochastic assumptions built into the model. It is for its values, [73] andreferences cited therein. Higher-order data near the noiseprocess end of the continuum thatmany of spectra may be computed practically [74]. inexpensivepor- the Fourier methods of spectrum estimation are intended and tablecomputers for carrying out spectralanalysis have ap are most effective. peared onthe marketand may be foundin many small At the other extreme are the signal-like processes. Some of laboratories. the best examples of such processes are found in exploration The years since 1965 have been characterized by the knowl- seismology. A concentrated energy source produces a seismic edge thatthere arefast Fourier transform algorithms. They record. Because the random background noise is usually weak, have also been characterized by the rapid spread of type of the record will appearessentially the same as anotherone data analyzed. Previously, the data analyzed consisted almost taken at the same place at a different time. As Brillinger and totally of discrete or continuous real-valued time series. Now Tukey point out, the study of signal-like processes may well be the joint analysis of many series, such as the 625 recorded by done by quite different methods. the Large Aperture Seismic Array in Montana [ 75 1, has be- Also there is anotheraspect of spectrumanalysis, one to come common. Spatial series are analyzed [75]. The statisti- which the limitations of the data have forced many applied cal analysis of point processes hasgrown into an entirely research workers. Here the models are not narrowly restricted separate field [761.The SASE iV computer program de- by reliable subject-matter knowledge. The time-series records veloped by Peter Lewis [771 has furthered such analysis con- are not long, and the appearance of the data is not distinctive. siderably. We notethat transforms otherthan the Fourier Corresponding records do not look alike. With almost nothing are finding interest as well [ 781, [ 791. to work with, the research workers can do nothing much more than fitting a few constants. Thus they fit low-order AR, MA, XV. BURG MAXIMUM ENTROPYSPECTRAL ANALYSIS and ARMA models usings the Box-Jenkins [ 931 methodology. Thediscrete Fourier transform of the autocovariance is As BWger and Tukey observe, this approach in a large num- called the powerspectrum. Thus the powerspectrum of a ber of applications seems to work much better than might be stationary time series with autocovariance @(n)is anticipated. The basic issue in spectrum estimation is the proper choice of model and then the resulting choice of the method of spec- @(a)= 2 $(n)e-’W” (for -n p. The question is how should we esti- trum. They then interpret their result accordingly. One must mate the power spectrum (1) from only this partial knowledge.not charge these workers with making any assumption about In order to answer this question, we should first consider the the uncalculated autocovariances. What they usually do is to phenomenon under study. As there are many different types recognize thatthe empiricalautocovariances thatthey do of phenomena, there is a corresponding diversity in spectrum calculate will vary from realization to realization, and so they analysis. Brillinger and Tukey state that no distinction is more do not and should not take the empirical values as absolute vital thanthat among a) noise-like processes, b) signal-like truth. processes, and c) signal-plus-noise processes. While b) is or- One of the purposes of this historical essay is to try to give dinarily unrealistic, there are enough cases of c) with only a the flavor of important developments in spectrum estimation. slight amount of noise so that b) is a helpful idealization. Unfortunately, the writer did not attend the 37th Meeting of A noiselike process produces time series quite different in the Society of Exploration Geophysicists in 1967, although he characterthan those produced by a signal-like process. The attended the meetings in the years both before and after that ROBINSON: HISTORICAL PERSPECTIVEHISTORICAL ROBINSON: OF SPECTRUM ESTIMATION 903

one. It was at that meeting in Oklahoma City that John Burg that is consistentwith the given measurements, Specifically, presented a paper that was to shake the foundations of spec- the given measurementsare the known autocovariance coef- trum estimation. This fundamentalwork [80] is entitled ficients, namely Maximum Entropy Spectral Analysisand its abstract reads: "The usual digital method of obtaining a power spectrum estimate from a measured autocovariance function makes the assumption that the correlation function is zero at alllags In terms of information theory, we require that the entropy for which no estimate is available and uses some treatment per sample of time series is a maximum. From the workof of the estimated lags to reduce the effect of truncation of Shannon in 1948, it follows that the entropy is proportional the autocovariancefunction. The method discussed in this to the integral of the logarithm of the power spectrum, thatis, paperinstead retains allof theestimated lags without the entropy is modification and uses a nonzero estimate for the lags not directly estimated. The particular estimation principle used is that the spectral estimate must be the most random or 1: log @(a)do. (4) have the maximum entropy of any power spectrum which is consistentwith the measureddata. This newanalysis Therefore the required maximum entropy power spectrum is technique gives a much higher resolution spectral estimate that function @(w)which maximizes (4) under the constraint than is obtained by conventionaltechniques with a very equations (3). little increase in computing time. Comparisons will illustrate One way to solve the problem of finding the maximum en- the relative importance." tropy powerspectrum subject to fixed values of @(n) for Theestimation of the powerspectrum of stationarytime 1 n I < p is by use of Lagrange multipliers. However, we may series frompartial knowledge of its autocovariance function instead use the following approach. From (1) we see that the is a classical problem to which much attention has been given partial derivative of @(a)with respect to @(n) is over the years. Almost all of this work is based on the use of windowfunctions, whose properties can beanalyzed by Fouriermethods. Burg [go], [811, [ 821 in his pioneering work introduced a new philosophy in spectral analysis based It follows that on general variational principles, and, in particular, the maxi- mum entropy method(MEM) which we will now discuss. Theconventional approach to estimating the power spec- trum from @(n), In1 Qp, is to assume that @(n) = 0 for n >p and to takethe Fourier transform of w(n)@(n), In1 Qp, Now let us maximize (4) with respect to the unknown values where w(n)is a weighting function. We now want to describe @(n) where 1 n 1 > p. Thus we set the partial derivatives of (4) the maximum entropy method of Burg. with respect to @(n) for In I > p equal to zero; that is Given a limited set of autocovariance coefficients together with the fact that a power spectrum @(a)must be non-nega- tive, we know that there are generally an infinite number of power spectra in agreement with this information. Thus ad- ditional information is required, and a reasonable goal is to for In1 >p. (6) finda single function @(a), which is representative of the class of all possible spectra. In order to resolve this problem, Making use of (5), we see that (6)reduces to some choice has to be made, and Burg made use of the con- cept of entropy. Maximum entropy spectral analysis is based [@(o)l-' e-iwn dw = 0, for I p. (7) onchoosing thatspectrum whichcorresponds tothe most In > random or the mostunpredictable time-serieswhose aut@ covariance coincideswith the given set of values. This con- This equation specifies the form of the inverse power spectrum cept of maximum entropy is the same as that used in both [@(a)]-' of a maximum entropy process. Let us explain. statistical mechanics and information theory, and as we will Given any stationary process with positive power spectrum see represents the most noncommittal assumptionpossible with @(a),then its inverse power spectrum regard to the unknownvalues of the autocovariance function. Equation (1) gives the power spectrum @(a)as the discrete Fourier transform of the autocovariance function @(n). From Fourier theory, we know that the autocovariance function can is also positive. We assume that the inverse power spectrum is beobtained as the inverse Fouriertransform of the power integrableand bounded, so thatit is a well-behaved power spectrum; thatis spectrum in its own right. Thus the inverse power spectrum (8) can be associated with an autocovariance function, which 1" @(n) = @(a)eiwn dw (for all integers n). (2) we designate by Jl(n), such that the counterparts of (1) and (2) hold, namely The fundamental assumption involved in maximum entropy spectral analysis is that the stationaryprocess under considera- tion is the most random or the least predictable time series 9 04 PROCEEDINGS OF THE IEEE, VOL. 70, NO. 9, SEPTEMBER 1982 and ical inverse problem. Itakura and Saito 1841 were responsible for introducing two important ideas into spectrum estimation that are now gainingwide acceptancein the engineering world. The first idea is that of using maximum likelihood in spectrumestimation. Although the ideaitself was not new, (10) their introduction of a particular gpectrum distance measure is becoming more and more important for different applica- Because we do notnormalize, the zero-lag value $ (0) does not tions, such as speech. Parzen [85] gives the name information have to be one. divergence to this measure, which is the same as the Kullback- Let us now return to the maximum entropy process. As we Leibler information number. He also shows its relation to the have seen, its inverse power spectrum satisfies (7) for >p. In1 notion of cross-entropy. The second idea is that of using the In (7) replace by -n, and also multiply each side of the equa- n lattice as a filter structure for the purpose of analysis (as an tion by 1/27~.Thus we obtain all-zero filter) and synthesis (as an ail-pole filter). The idea of an adaptive lattice was fmt proposed by Itakura and Saito as a 1= - [@(o)~-'e'~~dw=~,for ~nl>~. (11) way of estimating thepartial correlation (PARCOR) coef- 2n I, ficients(a termthey coined) adaptively.Makhoul [86] has shown how Burg's technique is really a special case of lattice Comparing (1 1)with (10), we see that for a maximum entropy analysis. Also, the lattice has become important because of its process we have fast convergence and its relative insensitivity to roundoff $(n)=O, for Inl>p. (12) errors.

Hence, using (12) in (9), we see that the inverse power speo XW. STATISTICAL THEORY OF SPECTRUM ESTIMATION trum of a maximum entropy process is Since the pioneeringwork of Tukey[341 in 1949, many importantcontributions have beenmade tothe statistical [+(a)] -1 = 5 IC, (n)e-iwn. (13) theory of spectrum estimation. An adequate treatment would n=-p require a long paper in itself, andso all we can hope to dohere is to raise the reader's consciousness concerning the statistical The right-hand side of (13) is afinite trigonometric series. theory required to understandand implement spectrum Thus we have shown that themaximum entropy process is one estimation. whose inverse power spectrum is a finite trigonometric series, Thewriter has great admiration for the work of Parzen, or equivalently one whose power spectrum is the reciprocal of whofrom the 1950's to the presenttime has consistently a finite trigonometricseries, that is made bedrock contributions both in theory and applications 1 [61], [85], [87],[88]. His long series of paperson time @.(a)= (14) series analysis includethe famous Parzen window for spec- f $(n) e-iwn trum analysis. Anotherone ofParzen's importantcontribu- n=-p tions is his formulation of the time series analysis problem in If we let z = elw, then the finite trigonometric series (13) terms of reproducingkernel Hilbert spaces. A remarkable becomes number ofPh.D. theses ontime series analysishave been written under the direction of Parzen, more than any other person. The writer has had the good fortune to discussgeo- physical time-series problems with Professor Parzen over the years, and in every case Parzen has been ableto provide impor- This can befactored by the Fej6r method(Robinson [43, tant physical insight in the application of the statistical meth- p. 1941) as ods. The Harvard lectures by Professor Parzen in 1976 repre- sent one of the high points in timeseries analysis and spectrum P 1 estimation ever to be heard in those venerable halls. =-[l +alz-l +* "+apz-q $(n)z-" The book by Grenander and Rosenblatt [651 in 1957 for- n=-p U2 malized many of the data analysis procedures and approxima- - [l +alz+***+apzP] (15) tions that have come into use. They have an extensive treat- ment of theproblem of choice of window andbandwidth. where is a positive constant and where u2 The further contributions to this problem by Parzen and by ~(z)= 1 + alz-' + * * * + apz-p (16) Jenkins are discussed in the 1961 paper of Tukey [ 5 1 1. An accurate and informative account of the developments of speo is minimumdelay (i.e., where A(z) has no zeros on or out- trum estimation in the 1950'sis given by Tukey [44]. side the unit circle). Thus the maximum-entropy process is Anotherimportant statistical development that deserves specified by mention is the alignment issue in the estimation of coherence, -2 (I worked on in the 1960's by Akaike andYamanouch, by @(z)= Priestley,and by Parzen.Discussions of thiswork and the A(z)A(z-') * references can be found in the book by Priestley [89]. This This result shows that the maximum entropy process is an AR excellent book, which appeared in 1981, has already set a new process of order p. standard. It can be recommended as an authoritative account Silvia and Robinson [ 831, through theuse of lattice methods, of the statistical theoryof spectrum estimation, whichwe only have related the concept of maximum entropy to the geophys- touch upon inthis section. ROBINSON:PERSPECTIVE HISTORICAL OF SPECTRUM ESTIMATION 905

H.Wold coined the names“moving averageprocess” and ACKNOWLEDGMENT “autoregressive process” in his 1938 thesis [90] under Profes- Iwant to expressmy sincere appreciation to Prof. D. R. sor HaraldCramCr atStockholm University. In his thesis, Brillinger who let me freely use his paper, “Some history of Wold computed a model of the yearly level of Lake Vaner in data analysis of time series in the United States,” in History Sweden as a movingaverage of the current rainfalland the of Statistics in the United States, edited by D. B. Owen and previousyear’s rainfall. He also computedan autoregressive published by Marcel Dekker in 1976. I want to thank Dr. J. model of the businesscycle in Sweden for the years1843- Makhoul for sending me notes on maximumlikelihood and 1913. In turn, Whittle wrote his 1951 thesis [91] under Pro- lattice networks. I want to especially thank the authors cited fessorWold at UppsalaUniversity. Whittleopened up and inthe ReferenceSection whose constructivecomments ma- made important contributions to the field of hypothesis test- teriallyimproved this paper. In writingan historical paper, ing in timaseries analysis. Whittle’s careful work is exemplified we should include a thousand references instead of one hun- by his autoregressive analysis of a seiche record [92] in which dred, so important statistical contributions have unfortunately he fits a low level autoregressive model to the data and gives been left out. Of course, we have purposely not included engi- statistical tests to determine the appropriateness of the model. neering contributions (as they are covered in the rest of this Professor Whittle used to return to Sweden for visits, and the special issue) but there is never a clear cut line, and so in this writerremembers taking long walks withhim through the sense other important work has also been left out. However, Uppsala countryside exploring for old runestones and ancient all such omissions are not intentional, and we will gladly try Viking mounds.Although the writerhad left the University to rectifyany situation in someappropriate future publica- of Wisconsin to work with ProfessorWold in Sweden, it turned tion. Finally, most of all, I want to thank Prof. J. Tukey for out that Wisconsin under the leadership of Professor G. Box the support and help he gave me on spectrum estimation thirty became the real center of time series analysis. It was the joint years ago at MIT for which I am forever grateful. work of Box with Professor G. M. Jenkins [93] that actually brought the autoregressive (AR) process and the moving aver- age (MA) process to theattention of the general scientific REFERENCES community. The brilliance of this work has made the names [ 1 ] I. Newton, Optics London,England, 1704. Box-Jenkins synonymous withtime-series analysis. No achieve- [ 21B. Taylor, MethodusZncrementomm Directa et Inverse. Lon- ment is better deserved. No personunderstands data better don, England, 171 5. [ 31 D. Bernoulli, Hydrodynamics Basel,Switzerland, 1738. than Box in the application of statistical methods to obtain (41 L. Euler, Znsritutiones Calculi Differentialis St. Petersburg, meaningful results. Russia, 1755. the 1960’s Parzen [87] and Akaike [941 discussed auto- [ 51 J. L. Lagrange, Thkorie desFonctions Analytiques Paris, In France, 1759. regressive spectrumestimation, and this workled to their [a] J. Fourier, Thkorie Analytiquede la Chaleur. Paris, France: crucial work on autoregressive order-determining criteria [88] Didot, 1822. [ 71 C. Sturm, “Memoire sur les kquations diffkrentielles linkaires du and[95]. Suchcriteria have made possible the widespread second ordre,” Journal de Mathkmatiques Pures et Appliquies, application ofautoregressive spectrumestimation by re- Paris, France, Series 1, vol. 1, pp. 106-186, 1836. searchersin diverse scientific fields. Akaikehas provided a [ 81 J. Liouville, “Premier mkmoire sur la thkorie des kquations dif- firentielles linkaries et sur ledeveloppement des fonctions en link between statistics and control theory with deep andsignif- skries,” Journal de Mathimatiques Pures et Appliquies, Paris, icantresults, and his work is of the highest traditionthat France, Series 1, vol. 3, pp. 561-614, 1838. science can provide. Young research workers can learn much [ 91 G. Green, Essay on the Application of Mathematical Analysis to the Theories of Electricity and Magnew Nottingham, En- by studying his writings well. land, 1828. We wish we had more space and knowledge to expand upon [ 101 E. Schrodinger, Collected Papers on Wave Mechanics London, this section, and those many statisticians whom we have not England: Blackie, 1928. [ 111 W. Heisenberg, The Physical Principles of QuantumTheory. mentioned should remember that this history is by no means Chicago, IL: University of Chicago Press, 1930. the final word. Someday we hope to write more fully on this [ 121 J. von Neumann, “Eigenwerttheorie Hermitescher Funcktional- operatoren,”Math Ann., voL 102, p. 49, 1929. subject, and we welcome all comments and suggestions. [ 131 -, Mathematische Grundhgen der Quantenmechanik. Berlin, Germany: Springer, 1932. I141 M. vonSmoluchowski, TheKinetic Theory of Matter and XVII. ENGINEERINGUSE OF SPECTRAL ESTIMATION Electricity. Leipzigand Berlin, Germany, 1914. [ 151 k Einstein,“On the theory of theBrownian movement,”. The purpose of this section is only to refer to therest of the Annalen derPhysik, vol. 19, pp. 371-381, 1906. 161 N. Wiener, “Differential space,” J. Math, Phys., vol. 2, p. 131. papers in this specialissue of the Proceedings of theIEEE. 171 k Schuster,” On the investigation of hidden periodicities with These other papers cover the engineering use of spectral esti- application to a supposed 26-day period of meterological phe- mationmuch better than we could do here.There papers nomena,” Tew. Magnet., vol. 3, pp. 13-41, 1898. 181 G. U.Yule, “On a method of investigatingperiodicities in dis- representa living history of the present status of spectral turbedseries, withspecial reference to Wolfer’ssunspot num- estimation,and, in them and in the references which they bers,” Phil Trans Roy. Soc London, A, vol. 226, pp. 267-298. 191 E. k Robinson, Predictive Decomposition of Time Series with give, the reader can find the works of the people who have Applications to Seismic Exploration, MIT Geophysical Analysis made spectralanalysis and estimation avital scientific discipline Group, 1954; Reprintedin Geophysics, voL 32, pp. 418-484, today. As general references, we would especially like to men- 1967. [20] -, AnIntroduction to Infinitely Many Variates London, tionthe 1978 IEEE bookedited by Childers [96], Haykin England: Griffin, 1959, p. 109. [971,the RADC SpectrumEstimation Workshop [98],the [21] N. Wiener, “Generalizedharmonic analysis,” ActaMath., vol. 55, pp. 117-258, 1930. First IEEE ASSP Workshop on Spectral Estimation [99], and [22] A. Y. Khintchine,“Korrelations theorie der Stationiiren St* Ulrych and Bishop [ 1001. Although much progress has been chastischen Prozesse,”Math Ann., vol. 109, p. 604. made, much work yet remains to be done, and there is adven- [23] N. Wiener, Cybemerics Cambridge, MA: MIT Press, 1948. [24] -, Extrapolation, Interpolation, and Smoothing of Station- ture fora research worker who setshis course in this rewarding ary TimeSeries with Engineering Applications, MIT NDRC and exciting field. Report, 1942, Reprinted, MIT Press, 1949. 906 PROCEEDINGS OF THE IEEE, VOL. 70, NO. 9, SEPTEMBER 1982

[25] -, The FourierZntegraL London, England: Cambridge, 1933. [ 571 N. R Goodman, “On the joint estimation of the spectra, cospec- [26] P. Dirac, Principles of Quantum Mechanics New York: Oxford trum and quadrature spectrum of a two-dimensional stationary University Press, 1930. Gaussian process,” Science Paper no. 10, Engineering Statistics [ 271 0. Heaviside, Electrical Papers, vol. I and 11. New York: Mac- Laboratory, New York University, New York, 1957. millan, 1892. [58] W. H. Munk and G. J. F. MacDonald, TheRotation of the [28] J. von Neumann, “Uber Funktionen von Funktional opera- Earth. New York: Cambridge University Press, 1960. toren,”Ann Math., vol. 32, p. 191. [ 591 M. J. Healy and B.P. Bogert, “FORTRAN subroutines for time [ 291 N. Wiener, Z Am a Mathematician Cambridge, MA: MIT Press, series analysis,” Commun Soc. Computing Machines, voL 6, pp. 1956. 32-34, 1963. [30] N. Levinson, “A heuristic exposition of Wiener’s mathematical (601 E. C Bullard, F.E. Ogelbay, W. H. Munk, and G.R. Miller, A theory ofprediction andNtering,” Journal of Math and User’s Guide to BOMM, Institute of Geophysics and Planetary Physics, voL 26, pp. 110-1 19, 1947. Physics, University of California Press, San Diego, 1966. [ 31 ] N. Levinson, The Wiener RMS (root mean square) error crite [6l] E. Parzen, TimeSeries Analysis Papers. San Francisco, CA: rion in filter design and prediction, J. Math Phys., vol. 25, pp. Holden-Day, 1967. 261-278, 1947. [62] E. A. Robinson, Multichannel Time Series Analysis with Digital [ 321 G.P. Wadsworth, Short-Range and Extended Forecasting by Computer Programs San Francisco, CA: Holden-Day, 1967. Statistical Methods, Air Weather Service, Washington, DC, 1948. I631 C. W. J. Granger, Spectral Analysis of Economic Time Series [ 33) H. R Seiwell, “The principles of time series analyses applied to Princeton, NJ: Princeton University Press, 1964. ocean wave data,” in Proc Nut Acad Sei., U.S.., voL 35, pp. [64] J. Neyman and E.L. Scott, “Statisticalapproach to problems 518-528, 1949. of cosmology,”J. Roy. Statist Soc., Series B, vol. 20, pp. 1-43, [34] J. W. Tukey,“The sampling theory of powerspectrum esti- 1958. mates,” in Proc Symp. AppL Autocorr. AnaL Phys Prob. U.S. [65] U. Grenander and M. Rosenblatt, StatisticalAnalysis OfStation- On Naval Res (NAVEXOSP-725), 1949. Reprinted in J. ary Time Series New York: Wiley, 1957. Cycle Res., voL 6, pp. 31-52, 1957. [ 661 N. Wiener, Nonlinear Problems in Random Theory. Cambridge, [35] G.P. Wadsworth, E. A. Robinson, J. G. Bryan, and P. M. Hur- MA: MIT Ress, 1958. ley,“Detection of reflections on seismic records by linear [67] -, “Rhythm in physiology with particularreference to en- operators,” Geophys, vol. 18, pp. 539-586, 1953. cephalography,” Proc Roy.Virchow Med Soc., NY,vol. 16, I361 J. W. Tukey and R. W. Hamming, “Measuring noise color 1,” pp. 109-124, 1957. Bell Lab. Memo, 1949. [68] J. W. Cooley and J. W. Tukey,” An algorithm for the machine 1371 J. W. Tukey, Measuring NoiseColor, Unpublished manuscript calculation of Fourier series,” MathComput., vol. 19, pp, prepared for distribution atthe Institute of Radio Engineers 297-301, 1965. Meeting, Nov. 1951. (691 J. W. Cooley, P. A. W. Lewis, and P.D. Welch, “Historical notes [38] S. M. Simpson, TimeSeries Computations in FORTRAN and on the fast Fouriertransform,” ZEEE TransAudio Electro- FAR Reading, MA: Addison-Wesley, 1966. acoust., v01. AU-15, pp. 76-79, 1967. [ 391 N. Wiener, Bull Amer. Math Soc., vol. 72, pp. 1-145. [70] C. Bingham, M. D. Godfrey, and J. W. Tukey, “Modern tech- [40] N. Wiener and P. Masani, “The prediction theory of multivariate niques of power spectrum estimation,” ZEEE Trans Audio stochastic processes,” Acta Math., vol. 98, pp. 111-150, 1957. Electroacoust., vol. AU-15, pp. 56-66, 1967. (41] -, “On bivariate stationary processes,” Theory Prob. AppL, [ 71 ] E. 0. Brigham and R E. Morrow, “The fast Fourier transform,” voL 4, pp. 300-308. ZEEE Spectrum, pp. 63-70, 1967. [42] R A. Wiggins and E. A. Robinson, “Recursive solution to the [ 721 IEEE TransAudio Electroacoust (Special issue on Fourier multichannel filtering problem,” J. of Geophys Res., vol. 70, transform), B. P. Bogert and F. Van Veen, E&., vol. AU-15, pp. 1885-1891, 1965. June 1967 and voL AU-17, June 1969. [43] E. A. Robinson, Statistical Communication and Detection with [ 731 D. R Brillinger, Time Series: Data Analysis and Theory. New Special Reference to Digital Data Processing of Radar and York: Holt, Rinehart, and Winston, Inc., 1974; revised edition Seismic Signals London, England: Griffin, 1967. Reprinted by Holden-Day, San Francisco, CA, 1980. with new title, Physical Applications of Stationary Time Series. [74] D. R Brillinger and M. Rosenblatt, “Computation and interpre- New York: Macmillan, 1980. tation of k-th order spectra,” in Spectral Analysis of Time Series, [44] J. W. Tukey, “An introduction to the calculations of numerical B. Harris, Ed. New York: Wiley, 1967, pp. 189-232. spectrum analysis,” in SpectralAnalysis of TimeSeries, 8. [75] I. Capon, “Applications of detection and estimation theory to Harris, Ed. New York: Wiley, 1967, pp. 25-46. large array seismology,”Proc ZEEE, vol. 58,.pp. 760-770,197& [45] H. Press and J. W. Tukey, “Power spectral methods of analysis [76] StochasftcPoint Processes, P. A. W.Le-, Ed. New York: and their application to problems in airplane dynamics,” Bell Wiley, 1972. Syst Monogr., voL 2606, 1956. [77] P. A. W. Lewis, A. M. Katcher, and A. H. We& “SASE IV-An I461 R B. Blackman and J. W. Tukey, “The measurement of power improved program for the statistical analysis of series of events,” spectra from the point of view of communications engineering,” ZBMRes Resp., RC2365, 1969. Bell Sysf Tech J., voL 33, pp. 185-282, 485-569, 1958; also [78] Roc Symp. Walsh Functions, 1970-1973. New York: Dover, 1959. [79] A. Cohen and R H. Jones, “Regreasion on a random field,” [47] J. W. Tukey,“The estimation of power spectra and related J. Amer. Statist. Ass., vol. 64, pp. 1172-1182, 1969. quantities,” On NumericalApproximation, R. E. Langer, Ed. [80] J.P. Burg, MaximumEntropy Spectral Analysis. Oklahoma Madison, WI: University of Wisconsin Press, 1959, pp. 389-411. City, OK, 1967. [48] -, “An introduction to the measurementof spectra,” in [ 81 ] -, Maximum Entropy Spectral A~lysis.Stanford, CA: Stan- Probability and Statistics, U. Grenander,Ed. New York: ford University, 1975. Wiley, 1959, pp. 300-330. [ 821 R T. Lacoss, “Data adaptive spectral analysis methods,” Gee [49] -, “Equalization and pulse shaping techniques applied to the PhySiCS, VOL 36, Pp. 661-675,1971. determmation of the initial sense of Rayleigh waves,” in The I831 M. T. Silvia and E. A. Robinson, Deconvolution of Geophysical Need of Fundamental Research in Seisnology, Appendix 9, Time Series in the Exploration of Oil and Natuml Gas Amster- Department of State, Washington, DC, 1959, pp. 60-129. dam, The Netherlands: Ekevier, 1979. [SO] B. P. Bogert, M. J. Healy, and J. W. Tukey, “The frequency anal- [841 F. Itakura and S. Saito, “Analysis synthesis telephony based on ysis of time series for echoes; cepstrum pseud-autocovariance, on the maximumlikelihood method,” in Proc Sfxth Znf Cow. cross-cepstrum and shape-cracking,’’ in TimeSeries Analysis, on AcouJtlcs, (Tokyo, Japan), 1968. M. Rosenblatt, Ed. New York: Wiley, 1963, pp. 201-243. [ 85 I E. Parzen, “Modern empirical spectral analysis,” Tech. Report [ 511 J. W. Tukey, “Discussion emphasizing the connection between N-12, Texas A&M Research Foundation, College Station, TX, analysis of variance and spectrum analysis,” Technometrics, 1980. VOL 3, pp. 1-29, 1961. (861 J. Makhoul, “Stable and efficient latticemethods for linear [52] -, “The future of data analysis,” Ann Math Statipt., vol. prediction,” ZEEE TransAcoust., Speech, Signal Processing, 33, pp. 1-67, 1963. voL ASSP-25, pp. 423-428, 1977. [ 531 -, ‘What can data analysis and statisticsoffer today?” in [871 E. Parzen, “An approach to empirical time series,” J. Res Not. Ocean Wave Spectra, Net. Aced Sci., Washington, DC, and Bur. Stand., VOL 68B, pp. 937-951, 1964. RenticeHall, Englewood Cliffs, NJ, 1963. I881 -, “Some recent advances in time series modeling,” ZEEE [ 541 -, “Uses of numerical spectrum analysis in geophysics,” Bull Tram Automat Contr., voL AC-19, pp. 723-730,1974. Znt Statist. Znst., voL 39, pp. 267-307, 1965. [891 M. Priestley, SpectralAnalyak and Time Series, 2 volumes, [ 551 -, “Data analysis and the frontiers of geophysics,” Science, London, England: Academic Press, 1981. voL 148, pp. 1283-1289, 1965. I901 H. Wold, A Study in the Analysis of StationaryTime Series. [56] W. J. Pierson and L. J. Tick, “Stationary random processes in Stockholm, Sweden: Stockholm University, 1938. meteorology and oceanography,” BuU Znt Statist. Znst., vol. [911 P. Whittle, Hypothesis Tesring in Time-Series Analysis. Upg 35, pp. 271-281, 1957. sala, Sweden: Almqvist and Wiksells, 1951. PROCEEDINGSIEEE, OF THE VOL. 70, NO. 9, SEPTEMBER907 1982

(921 -, The statistical analysis of a seiche record, J. Marine Res., Germany: Springer, 1979. voL 13, pp. 76-100, 1954. [98] hoc of the RADC Spechum Estimation Workshop, Rome Air [93] G. Box and G. M. Jenkins, TimeSeries Analysis, Forecasting, Development Center, Griffw Air Force Base, NY, 1979. and Control Oakland, CA: Holden-Day, 1970. [99] hoc of the FirstASSP Workshop on SpectralEstimation, [94] H. Akaike, “Power spectrum estimation through autoregressive McMaster University, Hamilton, Ontario, Canada, 1981. model fitting,” Ann Znst Statist Math., vol. 21, pp. 407-419, [loo] T.J. Ulrych and T.N. Bishop, “Maximum entropyspectral 1969. analysis andautoregressive decomposition,” Rev.Geophys., (951 -, “Anew look at statistical modelidentification,” ZEEE VOI. 13, pp. 183-200, 1975. Trans Autom Contr., vol. AC-19, pp. 716723, 1974. [ 1011 P. R Gutowski, E. A. Robinson, and S. Treitel, “Spectral esti- [96] D.G. Childers, ModernSpechum Analysis New York:IEEE mation:Fact or fiction,” ZEEE TransGeosci Electronics, Press, 1978. vol. GE16, pp. 80-84,1978. Reprinted in D.G. Childers, 1971 S. Haykin, NonlinearMethods of SpectralAnalysis Berlin, Modern Spechum Analysis New York: IEEE Press, 1978.

Spectral Estimation: An Overdetermined Rational Model Equation Approach

JAMES A. CADzoW, SENIORMEMBER, IEEE

Al#frrret-In seeking rational models of time series, the concept of statisticalcharacteristics of a wide-sense stationarytime approximatingsecond-order statistical relationships @e., theYule- series. More often than not, this required characterization is Walkerequations) is often explicitly or implicitlyinvoked. The pa- embodiedin the time series’ autocorrelation Zag sequence as rameters of the hypothesized rational model are typically selected so that these relationships “best represent” a set of autocodation lag specified by estimates computed from time series obmvations. One of the objec- r,(n> =~{x(n+ rn) ?(rn)} (1.1) tives of this paper win be that of establishing this fundamental ap- proach to the generation of rational models. in which E and - denote the operations of expectation and An examination of many popular contempomy spectral estimation complexconjugation, respectively. From this definition,the methods reveals that the parameters of a hypothesized rational model are estimated upon using a ‘‘minhd’’ set of YuleWalker equation well-known property that the autocorrelation lags are complex evaluations. This results in an undesired parameterhypersensitivity conjugate symmetric (i.e., r,(-n) = F,(n>) is readily established. and a subsequent decrease in estimation performance. To counteract We will automatically assume this property whenever negative this parameter hypersensitivity, the concept of using more than the lag autocorrelation elements (or their estimates) are required. minirml numberof YuleWalkerequation evaluations is herein ad- The second-order statistical characterization represented vocated. It is shown that by taking thii overdetermined parametric as evaluationapproach, a reduction in data-inducedmodel parameter by the autocorrelation sequence may be given an “equivalent” hypersensitivity is obtained,and a corresponding improvement in frequency-domain interpretation. Namely, upontaking the modelingperformance results. Moreover,upon adapting a singular Fourier transform of the autocorrelation sequence, thatis, value decomposition representation of an extended-order autocorrela- tion matrix estimate to this procedure, a desired model order determi- 00 nation method is obtained and a further significant improvement in rx(n)e-inwS,(ejw)= (1.2) modeling performance is achieved. This approach makes possible the n=-w generation of low-order highquality rational spectral estimates from we obtainthe associated powerspectral density function short data lengths. &(eiw) in which the normalized frequency variable w takes I. INTRODUCTION on values in [-n,n]. This function possesses a number of salient properties among which are that it is a positive semi- N A VARIETY of applicationssuch as found inradar definite,symmetric (if thetime series is real valued),and Doppler processing, .adaptivefiltering, speech processing, periodic function of a. This function is seen to have a Fourier Hunderwater acoustics, seismology, econometrics,spectral series interpretation in which the autocorrelation lags play the estimation, and array processing, it is desired to estimate the role of theFourier coefficients. It,therefore, follows that these coefficients may be determined from the power spectral Manuscript received May 24, 1982. This work was supported in part density function through the Fourier series coefficient integral by the Signal Processing Section of RADC under Grant AFOSR-81-0250 andby the Statistics and Probability Program of the Office of Naval expression Research under Contract N00014-82-K-0257. The author is with the College of Engineering and Applied Sciences, In Department of Electricaland Computer Engineering, Arizona State rx(n)= -2n S,(eiw) eiwn dw. University, Tempe, A2 85287. I,

0018-9219/82/0900-0907$00.750 1982 IEEE