146 IEEE TRANSACTIONS ON , VOL. IT-20, NO. 2, MARCH 1974 A View of Three Decades of Linear Filtering Theory

Invited Paper

THOMAS KAILATH, , IEEE

Abstrucf-Developments in the theory of linear least-squares estima- fields; problems of scalar and matrix polynomial factoriza- tion in the last thirty years or so are outlined. Particular attention is tion with applications in network theory and stability paid to early mathematical. wurk in the field and to more modem develop- theory [135], [177] ; the solution of linear equations, ments showing some of the many connections between least-squares filtering and other fields. especially as they arise in constructing state-variable realizations from impulse-response or transfer-function I. IN~~DUCTION AND OUTLINE data, which in turn is related to the Berlekamp-Massey HE SERIES of survey papers of which this is a part algorithm for decoding BCH codes [289], [191]; and the was begun largely to commemorate the twenty-fifth inversion of multivariable linear systems [357], [361], anniversaryT of the publication of Shannon’s classic paper [362]. There are also more purely mathematical ramifica- on information theory. However, 1974 is also twenty-five tions in Hilbert-space theory, operator theory, and more years after the publication in the open literature of Wiener’s generally in functional analysis [245], [250], [261]. famous monograph, “Extrapolation, Interpolation and The section headings give a quick idea of the scope of Smoothing of Stationary Time Series, with Engineering the paper. Applications” [I], so that it is appropriate this year to I. Introduction. commemorate this event as well. [As noted elsewherein this II. A Key Linear Estimation Problem. issue, this month is also the tenth anniversary of Wiener’s III. Wiener Filters and Early Generalizations. death (March 18, 1964).] Not only was this work the direct IV. Kalman Filters. causefor the great activity of the last three decadesin signal V. Recursive Wiener Filters. estimation, but it was perhapsthe greatestfactor in bringing VI. New Algorithms for Time-Invariant Systems. the statistical point of view clearly into communication VII. Some Early Mathematical Work. theory and also . It may suffice to quote VIII. Canonical Representations of Continuous-Time Shannon’sown major acknowledgment: “Credit should also Processes. be given to Professor N. Wiener, whose elegant solution IX. Recent Results on Innovations Processes and of the problems of filtering and prediction of stationary Some Applications. ensembleshas considerably influenced the writer’s thinking X. Karhunen-Lo&e Expansions, Canonical Correla- in this field.” tions and State Models. The subject of estimation is a vast one, and most of our XI. Concluding Remarks. attention will be devoted to the particular problems of XII. Bibliography. linear least-squares estimation, or linear Jiltering as it has generally come to be called in the engineering literature. Needless to say, the choice of material and emphasis Even though least-squares estimation is clearly only a in this paper are mine; the field is a vast one and can be small part of the possible forms of estimation theory, in surveyed in various ways. My main aims are to provide the author’s opinion it is perhaps the most interesting and some perspective on presently used methods, to bring out most important part. 
Least-squarestheory not only pro- the significance and relevance of some relatively early, vides useful solutions to certain specific estimation prob- but often neglected, work in this field, and to illustrate lems, but it also has connections to and implications for a some of the connections between least-squarestheory and surprisingly large number of other problems, both statistical other fields. and deterministic. As some examples we mention signal In Section II we formulate the problem of determining detection [93], [291]; the calculation of mutual information the causal linear least-squaresestimate of a signal process in certain channels [303] ; the solution of integral equations corrupted by additive white noise. Although this is only [288], [292], two-point boundary value problems in many one of a large number of possible estimation problems, it is a key one in the sensethat its solution underlies that of many others. Manuscript received June 15, 1973; revised October 3, 1973. This work was supported in part by the Air Force Office of Scientific Sections III-VI describe some of the current approaches Research, Air Force Systems Command, under Contract AF 44-620- to solution of this key problem. Information theorists and 69-C-0101, and in part by the Joint Services Electronics Program communication engineers have been more familiar with under Contract N-00014-67-A-01 12-0044. The author is with , Stanford, Calif. 94305. problems in which covariance information is given about KAILATH: LINEAR FILTERING THEORY 147 signal and noise, which are usually called Wiener filtering The bibliography is of necessity rather vast, although problems. Control engineersdeal more often with prob- it could easily have beeneven larger. On severaloccasions, lems wherethe signal and noise are describedby state-space the availability of convenientbibliographies has led me to models, usually called Kalman filtering problems. Because omit various references.This is undoubtedly an injustice of differing backgrounds,mutual knowledge of these two to the authors of many fine papers, but it seemsto be generalapproaches is often small, and one of our aims is unavoidable.I haveattempted to do more justice to papers to bring out the close and useful relation that must exist published in this journal, and in fact all suchlinear filtering betweenthese two approaches.O f courseno proofs can be papers in the period 1968-1972have been included in the given here, but the main results are stated and their signif- bibliography, even though no explicit referencemay have icanceand role explained.Appropriate referencesare given beenmade to them in the text. This has also beendone for for the proofs. certain other papers appearing in other journals that I The discussionin SectionsIII-VI is fairly self-contained, feel contain some ideas or approachesthat may appeal to but at various points allusions are made to earlier results our readers. The choice is of necessity rather subjective in the mathematical literature, especially of the forties. and any omissions should be regardedas a measureof my This work is explored in SectionsVII and VIII, partly for ignorancerather than a consciousslight. the record but really becauseit ’contains ideas that in my The bibliography is organized under five subheadings, opinion have still not yet beenadequately appreciated and though the division of papers between the five categories exploited. 
For example, the work of Krein (1944)and of is on occasion somewhat arbitrary, partly becausethe Levinson (1947)is only now beginning to be rediscovered fields are of coursenot completely exclusive.In retrospect, and extended. Limitations of space again prevent any some reassignmentswould really have been desirable, but detaileddiscussion, but I havetried to provide someguidance I have not had the courageor the time to attempt them. for a reader interested in further study. Moreover, even a I must repeat that the inevitable limitations of time, casual reader might find some fascinating nuggetsexposed space, and personal knowledge are undoubtedly reflected here, although I should stressthat the lode is really much in this survey.The only palliative I can offer is that perusal richer. of the various referenceswill enablethe readerto learn many In Section IX, I have described in somewhat more additional facts and results that could not be covered in detail one vein that I have personally found to be very the paper and to make his own judgment of any controv- illuminating and powerful: the role of canonical or in- ersial matters. novations representationsof random processes.Again I 1I.A KEY LINEAR ESTIMATION PROBLEM havegiven only referencesto many results and applications, but I could not resist being a little more specific about Some Early History oneaspect; name ly, the connectionsto spectralfactorization From the earliest times, peoplehave been concerned with and to the so-called positive real functions of network interpreting observations and making estimates and pre- and complex variable theory. The aim is to show at least dictions. Neugebauer[370] has noted that the Babylonians one explicit connection between stochastic and deter- useda rudimentary form of Fourier seriesfor suchpurposes. ministic problems. A fact I attempt to stress in Sections As with so much else, the beginnings of a “theory” of VIII and IX is the importance of deterministic system estimation in which attempts are made to minimize various structure in the theory of random processes.This is at the functions of the errors can apparently be attributed to moment an active field of research,but one that is being Galileo Galilei in 1632[174]. Then came a whole seriesof largely carried out in control theory. There is scope for illustrious investigators, including the young Roger Cotes many more communication theorists to enter this field. (of whom Newton said “had he lived, we might haveknown Conversely, in the last section I have briefly referred to something”), Euler, Lagrange, Laplace, Bernoulli, and some recent results in information theory that have useful others. implications for estimation. The time seemsto be ripe for a As is well known, the method of least squares was fruitful symbiosis. apparently first usedby Gaussin 1795[197], though it was Section X is a brief look at the role and significance of first published by Legendrein 1805 [198]. (It is less well seriesexpansions. These should be relatively more familiar known that Adrain in America, unaware of thesedevelop- to readersof this journal, and I havetherefore concentrated ments, independentlydeveloped the method in 1808[196]). on some special, but often overlooked, aspects of such Since then, there has been a vast literature on various expansions.The sectionconcludes in fact with an indication aspects of the least-squaresmethod. 
A comprehensive of how certain random process ideas can illuminate the annotated bibliography of least-squaresestimation for currently active system-theoreticproblem of abstract state- random variables has been given in a report by Harter space determination. Again there is scope for fruitful [375]. (Seealso a brief surveyby Sorenson[376].) Therefore interaction with control and system theorists. The paper we shall not go into this here, but shall proceedto least- closes with the thought that it is the range and scope of squaresestimation in stochastic processes,the first studies such possibilities that has kept estimation theory vital and of which were made by Kolmogorov [207], [209], Krein active, without, it seemsto me, the above-averagedoubts [214], [215], and Wiener [l]. and misgivings that the more self-containedfield of strict- The works of Kolmogorov and Krein were independent senseShannon theory has beenexposed to recently. of Wiener’s, and while there was some overlap in the 148 IEEE TRANSACTIONS ON INFORMATION THFDRY, MARCH 1974 results, their aims were rather different. Kolmogorov, However, this breaks down in feedback communication inspired by somework of Wold [209], gavea comprehensive and feedback control problems where the signal z( *) may treatment of the prediction problem for discrete-time be influenced by past signal and noise. Therefore, a more stationary processes.Krein noted the relationship of this general assumption is that work to some early work of Szegij [200], [201], on ortho- arbitrary, t2s gonal polynomials, and extended the results to continuous Ez(t)v’(s) = o 7 t < s. time by clever use of a bilinear transformation. (We shall l (4) describe these results in more detail later (Section VII).) It should be stressedthat (4) can be introduced only with However, no special attention was paid to explicit difficulty in many of the analyses found in the literature; formulas for the optimum predictor itself. Such formulas for example, it cannot be directly handled by methods that are obviously necessaryfor applications, and in fact, certain rely on representing the signal and noise processesz(e) anti-aircraft fire-control problems led Wiener to formulate and v(e) by Karhunen-Lo&e expansions. (See Section X independentlythe continuous-time linear prediction problem for further discussion of this point.) and derive an explicit formula for the optimum predictor. To proceed with our description, let us define He also considered the “filtering” problem of estimating a K(t,s) = E[z(t)z’(s) + z(t)v’(s) + v(t)z’(s)]. process corrupted by a “noise” process. An interesting (5) nontechnical account of this work and its background and Note that K( -, a) is generally not a covariance, unless z(e) development was given by Wiener in his autobiography and v(a) are uncorrelated. However K( *, *) does determine [371, pp. 240-2621. the covariance function of the processy(e) as Wiener constantly attempted to examine and stress the R&s) = Ey(t)y’(s) = Z&t - s) + K(Q). engineering significance of his ideas and results and his (6) book [l] contains several explicit examples, which are still We shall require that R,(t,s) be strictly positive definite on generally the only ones to be found in many textbooks on the square [t,,t,-] x [to,tf]. Other assumptions will be the subject. 
Wiener was also conscious of the problems of that the signal process has finite expected energy and that actually building circuits to implement the theoretical K(t,s) is continuous, though both of these assumptions solutions. For example, he notes that “The detailed design can in fact be relaxed; the essential thing is really that of a filter involves certain choices of constants which must K(t,s) be “smoother” than 1,&t - s). (Seealso [327c].) be justified economically. In general, it does not pay to The problem is to determine a random variable 2(t 1tf) eliminate a small error from a quantity when there is a of the form large irremovable error in it.” Another paragraph of his tf book is entitled “The Determination of Lag and Number 2(t 1 tf) = H(t,M4 dr, to I t I tf (7) of Meshes in a Filter.” Partly because of such concerns, s to much of Wiener’s work, despite its hard mathematics, so that has had a wide influence in engineeringcircles. tr E[$t) - z^(t)][z(t) - i?(t)]’ = minimum. A Key Problem It is by now well known (see,e.g., [lo]) that such an optimum As we begin to be more specific, we are immediately linear least-squaresestimate is characterizedby the “ortho- confronted with the fact that there is a large variety of gonality” property estimation problems, even just linear ones. For example, as the reader knows, we can have prediction or filtering or E[z(t) - %t I tJ]y’(s) = 0, tq I s I tf 63) smoothing problems, in state-variable form or transfer- so that a simple calculation shows that the optimum filter function form, with additive white noise or colored noise, H( -, *) is determined by the solution of the integral equation etc. However, in my opinion there is a key linear estimation ff problem in the sense that its solution can be shown to H(t,z)K(z,s) dz = Ez(t)z’(s) + Ez(t)v’(s), underlie that of many other problems (see [298], [181], H(tYs) + s to for some examples): we have observations y(a) of a signal t, I t, s I tp (9) processz( *) in additive white noise v( *) Depending on whether tfc t,tf = t,or tf> t,we have Y(S) = 4s) + VW, toI s )I tf (1) what are called predicted, filtered, or smoothed estimates, where respectively. For convenience,we shall write 2(t1t) as Ev(t)v’(s) = Z$(t - s). (2)’ t The usual assumption on z(e) is that it is uncorrelated with If(t)= h(t,sM) ds. sto 4.) Ez(t)= 0. v’(s)(3) Using (8) and (4), the relevant integral equation forfiltering can be found as 1 Somenotational conventions: all randomvariables will beassumed to have zero mean. No special notation will be used to distinguish t scalars and matrices; primes denote transposes; ZPis a p x p identity h(t,z)K(z,s) dz = K(t,s), 5 s I I tp matrix. We could have assumed any strictly positive-definite matrix h(t,s) + to t s instead of ZP, but by a normalization we can return to the case des- to cribed by (2). (10) KATLATH: LINEAR FILTERING THEORY 149

This is a key equation in linear system theory. densities,for which K(t,s) had the form It is important to note that (10) is substantially more difficult to solve than (9). Equation (9) is a Fredholm K(Q) = Ic(lt - sl) integral equation of the secondkind and a lot is known about its properties. Several general solution methods are = i ai exp (-PiIt - 51) available, including reduction in different ways to a set of 1 linear algebraic equations and the use of various gradient ” C % exp (-Pit> exp (Pis>9 t 2 s (12a)” methods,see, e.g., [51], [37], [18], [27]. On the other hand, = I 1 (10) is a much more difficult equation becauseof the con- i % ew (-PP) exp (Bit>2 t (15) which is quite a different thing. Simple Fourier transforma- tion does not work, and we must in generaluse the cele- and {Zi} and {pi} are the left-half-plane zeros and poles brated Wiener-Hopf spectral factorization technique [2]- of S,,(s), Re zi < 0, Re pi < 0. The Wiener-Hopf tech- [4], which in fact has given (11) its name. nique (see,e.g., [4], [20], [51]) shows that this factoriza- The Wiener-Hopf equation (11) first arose in astro- tion completely determines the Laplace transform of the physics in 1894and has beenwidely studied; seeespecially optimum filter as two comprehensivepapers [234], [235], which deserve H&s) = 1 - [S,+(s)]-‘. (16) to be more widely read by engineers.On the other hand, much less is known about the nonstationary version (lo), This expression,though implicit in Wiener’s own examples, which we shall describe as being of Wiener-Hopf type.2 was first explicitly given by Yovits and Jackson [l I] and Several referencesare collected in [257], [51], [298], and by Krein [235]. some of these will be brought up as we proceedwith this Yovits and Jackson also gave a closed-form expression survey. However, it is appropriate to begin with (ll), for the mean-squareerror in the special case of uncor- which was where Wiener started. related signal and noise

III. WIENER FILTERS AND EARLY GENERALIZATIONS E[z(t) - L(t)]” = /a In [l + S,(io)] do. (17) -ccI The first explicit solutions for least-squaresestimates of stochastic processeswere given by Wiener in 1942 [l] Since (17) does not require explicit knowledge of the under the assumptions of a scalar observation process optimum filter, it can be usedto help decideif an optimum (p = I), a semi-infinite observation interval (to = -co), filter is worthwhile; severalsuch applications in modulation and jointly stationary signal and noise processes.W iener theory have been discussed by Van Trees [40], [47], used a variational argument to determine the optimum Stiffler [46], Lindsey [52], and the referencestherein. estimate and was delighted to find that what was required Recently Yao [48], [49], and Snyders [54], [54a], have was the solution of the Wiener-Hopf equation (ll), a extendedthis formula to cover certain problems with non- problem to which Hopf and he had ten years earlier con- white noise, and signals or noise with nonrational spectra tributed an elegant solution [2]. The method applies to (seealso Prouza [53]). quite general kernels, but, as Wiener himself noted, it Wiener’s theory has mainly beenapplied to the optimum took its simplest form for processeswith rational spectral choice of various components in modulation systems, as we have noted previously. We should explicitly mention

2 One might argue that (10) is just a family of Fredholm equations, indexed by f. This is true, but the major problem lies in showing that 3 This is not the most general form except when the poles {pi} in the solutions {I@,.)} can be satisfactorily fitted together, for example (15) are distinct; the generalization is not difficult but is notationally to makeh(.,.) square-integrablein both variables. cumbersome and so has been avoided. 150 IEEE TRANSACTIONS ON INFORMAnON THEORY, MARCH 1974 a notable early application to the design of loop filters in of the first to tackle this problem, and he presented some phase-locked loops [9]. Related applications were also useful recursive algorithms [61], [80], [I.%], that were made in optimal control (see[13], [24], and the references soon recognized and applied, especially by a group at the therein). Bell Laboratories, who added various contributions of their own [84]. For different reasons, growing out of his Some Generalizations successfulapplication of state-spaceideas in deterministic The Wiener-Hopf equation was soon extended to cover problems, Kalman developed [64], [68], [69], a somewhat estimation of stationary processesgiven only over a finite more restricted algorithm than Swerling’s, but it was one observation interval (to > - oo), and more generally to that seemedparticularly matched to the dynamical state- cover the estimation of nonstationary processes.However, estimation problems that were brought forward by the while it was not hard to discover that the general equation advent of the space age. Groups at the NASA Ames was of the form (IO), there was no general method for Laboratory [71], [72], and at the M.I.T. Draper Labora- solving such equations, and therefore a host of special tories [70], [83], took up Kalman’s ideas and developed results and techniques were developed. We especially them into programs that were successfully used in many mention an early paper by Zadeh and Ragazzini [6], spaceapplications [139], [145], [138]. which we shall encounter again in Section VIII. For We shall examine the in some detail in processes with rational power-spectral densities, fairly Section IV. The reasons for our attention go beyond the explicit results were obtained by Yaglom [lo], Hajek [243], specific algorithm and are more broadly connected with the Rozanov and Pisarenko [31], Whittle [32], Helstrom [33], importance of dynamical structure in data-processing and Slepian and Kadota [43]. algorithms. Unfortunately communication engineers and Some solutions were also obtained for nonstationary information theorists have lagged behind control engineers processes [6], [7], [17], [21]. The most useful of these in appreciating this fact, though, as pointed out in Wong’s were by Shinbrot [14], [25], who however found it neces- recent survey [327], the gap is closing. sary to restrict attention to K( +,a) of the form IV. KALMAN FILTERS

$ ai(t>bi(s>, . t 2 S (184 Kalman [64], [68], [69], changed the conventional I K(Q) = formulation of the problem by giving, not the covariance

$ ai(sPi(t>, t I s. (18b) of the signal process, but a “model” for it as the output \ of a dynamical linear systemdriven by white noise. Specific- This is clearly a generalization of (12), but its true signif- ally, he assumed that the signal process z(a) could be icance was not appreciated until later (see Section V). described by Despite such important contributions, however, too large t 2 to a part of the literature dealt with minor variations and 4) = fww W> special cases,so much so that Elias felt compelled to edito- i(t) = F(t)x(t) + G(t)@), x(t,) = x0 (19b) rialize in the IEEE TRANSACTIONS ON INFORMATION THEORY where x(.) is an 12x 1 “state” vector and u(s) is an m x 1 in 1958that it was time to stop writing “two famous papers.” random input such that One was “The Optimum Linear Mean-Square Filter for SeparatingSinusoidally-Modulated Triangular Signalsfrom Eu(t)u’(s) = Q(t)@t - s) (19c) Randomly-Sampled Stationary Gaussian Noise, with Exoxo’ = lx,, Eu(t)xo’ = 0, t 2 t,. (19d) Applications to a Problem in Radar.” (The other was “Information Theory, Photosynthesis, and Religion,” a Also the matrices F(a), G(e), H(e), Q(m), and II, are title suggestedby D. Huffman.) assumed known and continuous. In trajectory estimation Furthermore, there were other reasons for being dis- (19b) could be the “linearized” equations of motion satisfied even with the most significant of the results of this describing the evolution of the position and velocity vector period. x(e) subject to the wideband perturbations u(v) caused by i) They were rather complicated, often requiring the random drag, gravitational uncertainties, etc., and the solution of auxiliary differential and algebraic equations initial uncertainties x0. and the calculation of roots of polynomials. The assumption that x0 and u(.) are uncorrelated not ii) They were not easily upd.ated with increasesin the only is physically reasonable but also has the important observation interval. consequencethat the process x(e) is now a wide-sense iii) They could not be conveniently adapted to the vector Markov process [225]. Kalman also assumes that the case (p > 1). “plant” noise u(a) and the “observation” noise v(e) in the These last two difficulties came immediately to the fore observedprocess in the late fifties in the problem of determining satellite y(t) = z(t) + v(t) = H(t)x(t) + u(t) orbits. Here there were generally vector observations of (20) some combinations of position and velocity, and also can be correlated, but he restricts the dependenceto being there were large amounts of data sequentially accumulated of the form with each pass over a tracking station. Swerling was one Eu(t)u’(s) = C(t)cs(t - s) (21) KAILATH: LINEAR FILTFNNG THEORY 151 which is consistentwith our more generalearlier assumption where {@,T,H,Q,R,C,II,} are known matrices. The Kalman (4) on the one-sided dependencebetween z(a) and a(a). filter solution is The equations (19)-(21) describe the Kalman model for jzi+lli = @i2ili-1 + Ki(Ri”)-l&i, R,,-1 = 0 the estimation problem. 
The Kalman filter is not given (294 by an explicit formula for the impulse responseof the Ei = yi - 2ili-1, 2ili-1 = Hijlili-1 (29b) optimal filter, but as an algorithm suitable for direct Ri” = EEiEi’ = HiPili_lHi’ + Ri, evaluation by analog or digital computers Pili-1 = E~ilI-1~i’I-1 (29~) .2(t) = H(t)A(t) (22) where Ki = ~Pili-,Hi’ + riCi (294 R(t) = F(t)A(t) + IL(t)@), a&)> = 0 (23) where the IZ x IZmatrix E(t) = y(t) - .2(t) = y(t) - H(t)lZ(t) (24) f’ili-1 A E[Xi - ~i~i-l][Xi - Rili-l]’ K(t) = P(t)H’(t) + G(t)C(t) (25) can be computed via the so-called Riccati difference and the n x y1matrix P(a) is the covariancematrix of the equation errors in the state estimates P.1+1/i = ~iPi,i-l~i’ - Ki(Ri”)-lKi’ + r,Qiri’, P(t) = Eqt)n’(t), z?(t) = x(t) - R(t). 06) P,,-1 = rr,. (30) P( *) can be computed as the unique solution of the non- The similarity of this set of equations to (22)-(27) is clear; linear differential equation in fact, the latter can be obtained from (28), (29) by a limiting procedure[68]. Note that becauseof the presence P(t) = F(t)P(t) + P(t)F’(t) - K(t)K’(t) + G(t)Q(t)G’(t), of the PiI i- ,-dependentterm RF, the discrete-timeformulas P(to) = II,. (27) are somewhat more complicated than the continuous- time ones, or even than discretized versions of the con- This equation is a matrix version of the familiar Riccati tinuous-time formulas. Moreover, in discrete time, there equation, first introduced by Francesco,Count Riccati, in is no particular need to assumethat the covarianceof the 1724[I951 and sincethen often encounteredin the calculus additive-noiseis nonsingular,and we havetherefore written of variations. It seemssurprising that a nonlinear equation it as Ri rather than as I. Note that we could even take Ri should arise in a linear problem and be regardedas advan- to be zero without affecting the formulas (29), (30). This tageous.However, the point is that it is a difSerentia1 iqua- is not possible in continuous time, where problems with tion with known initial conditions, and such equations are no nonsingular white noise component require more care comparatively easyto solve on a digital or analog computer (cf. [90], [122], [I811 and the referencestherein). The becausethey involve only the iteration of relatively simple study of state-estimationproblems where there is no additive updating operations. This circumstance is indeed a very noise has recently uncovered some interesting differences happy one, becauseRiccati equationscan be introduced to between discrete- and continuous-time estimation and solve general linear two-point boundary value problems, control problems (cf. [ 1701,[ 1791). which arise often in various fields, see,e.g., [146], [247]. By now the Kalman filter is widely known and widely used, notably in aerospaceengineering; see, for example, Discrete-Time Results the papers and referencesin the survey volumes [ 1391 The discrete-time Kalman filter results were actually, and [145]. Furthermore, (19)-(30) have turned out to have the first to be obtained, partly becausethe major system- a fundamental role in understanding the structure and theory activity in the mid-fifties was in the field of sampled- properties of dynamical systems, in many stochastic and data systems,which arose when modern digital computers deterministic problems. We may refer to work in quadratic were put into control and communication links. 
Sampled- optimization, stability theory, network theory, covariance data filters for least-squaresestimation were given by and spectral factorization, stochastic control, sensitivity Franklin [S], Friedland [60], and others. While Friedland analyses,signal detection, etc. (The factorization problems used infinite triangular matrices, Blum [59], [67], studied will be briefly discussedin Section IX.) We forebear from recursive filters for limiting the storage requirements of giving specific references,but shall merely note somerecent such algorithms. We have already noted the work of books in which such topics are covered[106], [128], [129], Swerling [61]. Kalman’s contribution was to introduce [132], [135], [143], [153], [157], [177]; it should be noted state-spacemode ls. He assumedthat that Kalman himself launchedthe study of severalof these

yi = Zi + V’ Zi = HiXi, i20 (28a) questions. I9 However, despite all the good that has come out of this Xi+1 = ~iXi + I-pi (28b) t heory, there have been many excessesand oversights in EuixO’ = 0 z EvixO’, ExOxO’ = l-I,, (284 its pursuit, partly reflected in an incredible volume of papers.Although some of this literature was necessaryand Euiuj’ = Qisij, EUiVj’ = Risij, EUiVj’ = CiSij worthwhile, a good fraction of it must be attributed to the (28d) general expansion of technological and especially space 152 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974 activity that Sputnik and the Apollo project brought to In my opinion, it was the peculiar atmosphere of the America scene,in terms both of researchand development sixties, with its catchwords of “building research com- contracts for industry and of the rapid expansionof graduate petence,” “ training more scientists,” etc., that supported education in universities. Another factor was that this the uncritical growth of a literature in which quantity and period coincided with the emergenceof what is now called formal novelty were often prized over significance and modern control theory, which was being built upon the attention to scholarship. There was little concern for fitting rediscovery of the importance, emphasized in the mid- new results into the body of old ones; it was important to fifties by Bellman, of the notion of “state.” The fact that have “new” results! Wiener had someinteresting comments the Kalman filter dealt largely with state-estimation made it on the sceneas early as 1956 [371, p. 2711. comparatively easy to include it in books and courses in Despite this unfortunate historical context, one should state-variable theory, without having to go very deeply not underestimate the significance of. the Kalman filter, into estimation theory or even into Wiener filtering. which, to repeat, is more than just a solution to a specific Although several excellent examples of the clever and estimation problem. As a tribute to this work, I now attempt successfulapplication of control theory ideas to estimation to add my view concerning the slight controversy that prob1emscanbefound,e.g.,[74],[111],[128],[133],[148], exists as to its origin. [84a], [193a], the majority of contributions have suffered from having too narrow a base. I feel it is unfortunate that Historical Notes on the Kabnan Filter a whole generationof control engineershas grown up whose Recursive solutions to least-squaresproblems are not of only knowledge of estimation theory is through Kalman recent origin. Gauss was forced to invent them to handle filtering. The work of Kolmogorov, Krein, Wiener, the vast calculations he undertook in order to help astron- Karhunen, Levinson, Levy, Hida, and others (see Section omers locate the asteroid Ceres. His work dealt with the VII) on many still important aspects of estimation has discrete-time model (28), where, however, the state Xi was generally been neglected, not without loss. On the other constant (i.e., CDand I were zero). Given hindsight one can hand, I should also state that in my opinion, the potential generalizethis work to handle dynamics and, for example, of the results and insights in the just-cited control-theoretic Rosenbrock has done this in his interesting note [92], ideas has also not yet been fully exploited. (see also a note by Genin in [139]). Incidentally, Whittle It may be useful to reinforce the previous comments by in 1963 [32, p. 
353 pointed out that the classical Wiener giving an illustration from control theory itself. As Kalman filter could be rewritten in a recursive form as a differential has often stressed[68] the major contribution of his work equation, and he also studied some nonstationary ex- is not perhaps the actual filter algorithm, elegant and tensions [95].4 useful as it no doubt is, but the proof that under certain However, the general case was first studied by Kalman technical conditions called “controllability” and “observ- [64], who combined state-spacedescriptions and the notion ability,” the optimum filter is “stable” or “robust” in the of discrete-time innovations, as described for example in sensethat the effects of initial errors and round-off and Doob [225, especially sects. XII.1 and X11.31, to give a other computational errors will die out asymptotically. complete and elegant solution. Kalman’s solution also However, the known proofs of this result are somewhat introduced a nonlinear recurrence equation (30) which difficult, and it is significant that only a small fraction of the was the discrete-time counterpart of a Riccati differential vast literature on the Kalman filter deals with this problem. equation he had already encounteredin studies on quadratic (Significant recent contributions have been made in [126], minimization problems in optimal control [63]. From this [ISS], [167], [151], [188].) Th e concepts of controllability it was an easy step to obtain the continuous analog of the and observability actually first arose as technical conditions discrete-time equation for the least-squares estimate, in certain optimal control problems [ 1191.They also enter especially since Kalman also recognized a “duality” be- in a fundamental way [76], [119], in characterizing ir- tween the filtering and control problems. An immediate reducible transfer functions and minimal state-space bonus of his analysis of the steady-state behavior of the realizations of linear systems.Kalman isolated thesenotions Riccati equation in optimal control was the important and, for conceptual and other reasons, also de’fined them result that, under the previously mentioned technical in terms of certain idealized but simple control problems; conditions of “observability” and “controllability,” the e.g., he observedthat controllability is equivalent to being finite-time solution converges to a unique steady-state able to take an arbitrary initial state to the origin [62]. solution, independent of the initial condition and of errors However, such definitions are only somewhat incidental, introduced during the computation. [This stability question and their main justification lies in the theorems that can be did not arise in the classical Wiener problem, which roughly proved with them. Nevertheless, many textbooks deal speaking, corresponds to a Kalman filter problem with the largely with examinations and elaborations of the definitions F(-) matrix in the state-spacesignal model (19) constant and of controllability and observability, with hardly a mention stable (i.e., having eigenvalues with negative real parts). of the associated theorems as being the reasons for this great attention. Information theorists may recognize 4 Whittle [32], [95], used difference equation (autoregressive- similarities to the fate of the words information and moving average) models, and interestingly enough it is only recently that several advantages of such models have been fully appreciated entropy! 
(see the discussion at the end of Section IX). KAILATH : LINEAR FILTERING THEORY 153

Therefore, the signal variance goesto a steady-statevalue functions by one in which state-spacemode lswere specified and it can be shown that SO does the (always smaller) for the signal and noise, and it seemedto many that this variance of the error. It is a striking fact, at least without difference in specification was the chief reason for the further thought, that the error variance goes to a finite successof the Kalman filter. Therefore,it was thought that steady-statevalue under the structural conditions of con- to obtain similar computationally efficient recursive trollability and observability even if F is unstable, so that solutions for problems with covariancz information one the signal variance becomesunbounded.] should first deducestate-space mode ls consistent with the In view of thesefacts it seemsfair to usethe nameKalman given covariance specifications, to which the Kalman filter for the continuous-time algorithm as well as for its solution could then be applied. Unfortunately, the problem discrete-time analog. The continuous-time filter is often of determining such state-spacemode ls for nonstationary also called the Kalman-Bucy filter, or sometimesthe Bucy- processes,which include stationary processes observed Kalman filter (seePart I of [113] and also [136]). Bucy’s over finite time intervals, is quite difficult. Most known coauthorship in Kalman and Bucy [69] grew out of some solutions require an amount of work roughly equivalent early work by Carlton and Follin [56] and Hanson [57], to that involved in solving the Riccati equation for a at the Applied Physics Laboratory of Johns Hopkins process with an already known signal model. Thus it University, in which algorithms of the Kalman type were appears in effect that the price paid for starting with obtained for some specialcases. Kalman ’s discovery of the covariance information rather than model information is general continuous-time formulas was apparently in- essentiallya doubling of work. dependentof this, beingbased as we havenoted on analogies However, this is not true. With a proper formulation, with optimal control [63]. Later Kalman obtained a direct the same amount of computation suffices to solve either derivation by applying a limiting argument to the discrete- problem. More specifically, suppose that we return to time formulas [68]. Bucy’s important contribution to the Shinbrot’s covariance specification (18), which we shall joint paper [69] was a derivation using the finite-time rewrite more compactly in matrix notation as Wiener-Hopf equation (10). It should also be noted that K&s) = A(t)B(s)l(t - s) + B’(t)A’(s)l(s - t) (31) Siegertin 1953-1955[58] had already shown in a different context that finite-time Wiener-Hopf equations could be where A(*) and B’( *) are p x n matrices and I(*) is the solvedby reduction to a Riccati differential equation. Heaviside unit step function. The meaning of this assump- It is also not so widely known that, independentlyof all tion, which as stated earlier Shinbrot was forced to make this, Stratonovich in the USSR had begunto study recursive for purely mathematical reasons,is that this is the form solutions for nonlinear least-squaresestimates of the states that K(t,s) = E[z(t)z’(s) + z(t)v’(s) + v(t)z’(s)] must take of a nonlinear dynamical system driven by white noise. for the processesz(a) and v(e) in a state-spacemode l In this connection, it is natural to consider the linearized (19), (20). 
problem and its solution, and in so doing Stratonovich Before proceeding,it will be convenientto rewrite K(Q) in 1960 also obtained the Kalman filter equations ([65], in the form [141, p. 6751). However, no stability analysis was undertaken. K(t,s) = M(t)cD(t,s)N(s)l(t - s) We should also mention that with hindsight one can + N’(QD’(s,t)M’(s)l(s - t) (32) specialize certain recursive formulas obtained in 1958 by Swerling [61] for nondynamical systems to again obtain where @(a;) is a so-called state transition matrix defined the Kalman filter. Swerling did not actually explicitly [106] as the unique solution of the linear differential consider this special case, nor did he anywhere have a equation Riccati equation. However, as noted earlier, Swerling’s d@(t,s) - = F(t)Q(t,s), @(s,s) = z (33) papers [611, [1551, contain several useful and interesting dt ideas, for linear and nonlinear filtering, many of which have been widely overlooked. [One such idea will be and F(a) is an arbitrary matrix that can be chosen con- encounteredin Section X.1 veniently for the problem at hand. There is no loss of As a final comment on this topic, we may note that in a generality in doing this becauseQ(. , .) is nonsingular and little known 1944paper [213] (unfortunately not cited in obeys cD(t,s)= @(t,t,,)@(&,,s)for an arbitrary t,. When his 1953book, but added by Yaglom to the 1956Russian F(e) is constant translation) Doob made explicit and effective use of linear Q(Q) = exp F(t - s) state-variable models to study processeswith rational spectraldensity. This paper (seealso [219]) contains several 4 I + F(t - s) + F2 (t - sj2 + . . .* (34) 2! formulas and results that were rediscoveredmuch later in the state-spaceliterature. For a given F(a), the correspondencebetween (29) and (30) is establishedby the relations (with to arbitrary) V. RECURSIVE WIENER FILTERS Kalman replaced the conventional specification of the A(t) = WYW,), B(t) = w,,tw(t) filtering problem in terms of signal and noise covariance M(t) = A(t)@(t,,t), iv(t) = cD(t,t,)B(t). 154 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974

Now it is shown in [I491 (see[292], [105], [144], for earlier case, where A( a) and B(a) are exponential functions, we efforts) that the estimate 2(e) can be calculated by the have basically the classical network-theory problem of following recursive algorithm : approximating a time function by a sum of exponential functions (or, in frequency-domain terms, approximating w = ~WW (354 a function by a ratio of polynomials). A modern version where of this broblem is that of obtaining minimal state-space realizations from measurementsof transfer functions. There d(t) = FWV) + W)b(t) - M(tM(t)l, 4(to) = 0 are now several methods available for doing this, and in Wb) fact research into more efficient methods is still going on (see, e.g., [129], [360], and the referencestherein). How- K(t) = Iv(t) - C(t)M’(t) (364 ever, this is only for the stationary case. Although certain Ii(t) = F(t)C(t) + E(t)F’(t) + K(t)K’(t), qt,> = 0. analogous procedures can be devised for the nonstationary case [112], in generalit is a difficult matter even in network Wb) theory to obtain time-variant system realizations. The equation for E(e) is again an IZ x n nonlinear matrix For this reason, and also because attention is shifting differential equation of Riccati type. It is different from the away from aerospace problems, there is renewed interest Riccati equation (26) of the Kalman filter, though it is in time-invariant models. Such models have generally been closely related [149]. As expected, (32)-(35) reduce to the only ones studied by statisticians and communication the Wiener filter formulas (1l)-(16) in the special case engineers,and recent researchin control and system theory p = 1, t, = - co, and K(t,s) a function only of It - sl. has taken a big swing in this direction. Thus we have now found the recursive generalization of the Wiener filter. To put it another way, we can now see VI. NEW ALGORITHMS FOR TIME-INVARIANT SYSTEMS how to make Shinbrot’s integral equation solution recursive. When the parametersin the Kalman state-spacemodel are A proof of these results is outlined in (128)-(133)of Section constant (time invariant), it has recently been discovered IX. (The discrete-time version of these equations can be that one can obtain recursive solutions without going found in [180].) through a Riccati equation, and in several problems it is The important point is that the equations for C( *) and possible to obtain significant computational advantages z^(*) can be directly written down from the covariance thereby (cf. [ 1661,[ 1831,[ 1841).One reason for presenting specifications without first having to determine a state- this result is that a special case of it is closely related to space model. Thus both specifications, in terms of covar- algorithms invented in 1943-1947 by the astrophysicists iances or in terms of state-space models, are seen to be Ambartsumian (USSR) [21 l] and Chandrasekhar (USA) equivalent, not just in that they give the same final answer [217], [220], to solve a class of Wiener-Hopf equations by (because,of course, they must), but in that their solutions reduction to nonlinear differential equations. This reduction involve comparable amounts of work. The choice between was actually sought to obtain better numerical procedures, them lies purely in whether state-spacemodels or covariance the same fundamental motivation underlying the work a specifications are more readily at hand. This fact is still not decadelater of Swerling, Stratonovich, and Kalman. 
widely appreciated and the literature contains many dis- We shall start with the general state-spacemodel (19), cussions of attempts to “identify” state-spacemodels from (20), wherenow F,G,H,Q,Care assumedto be time invariant. covariance data so as to be able to use a Kalman filter. Then it has been shown [183] that the linear least-squares Nevertheless, “modeling” is, as in all subjects, a thorny estimate of z(s) can be computed via the equations problem and we should say a few more words about it 2(t) = HA(t) here. State-space models are often at hand in aerospace (374 problems, where we may have enough information to write i(t) = F$(t) + K(t)[y(t) - Hi(t)], R(t,) = 0 (37b) down the equations of motion, whether they be time invariant or time variant. However, the choice of the which are (cf. (22), (23)) as in the Kalman filter, except proper number of states to model a given problem ad- that now K(a) need not be computed via a Riccati-type equately is not always an easy one. In many problems of equation (27), but through the equations industrial process control and communications, it is R(t) = L(t)SL’(t)H’, K(t,) = l&H’ + GC generally impossible to write down state equations (as is clear if we try to do so for a large power grid, or chemical (384 plant or a telephone-line channel) and recourse has to be i.(t) = [F - K(t)H]L(t), Wd = J5l WI had to terminal measurements,for example of the covariance where S and the initial condition matrix L, are found as function or power spectrum of the channel output. Now follows. Let covariance estimation is itself a vast subject, but even if we assume that good estimates are available, K(t,s) will be D A PI-I, + II,F’ + GQG’ available only as a numerical function of t and s and not - (I&H’ + GC)(l&H + CC)’ (39) in the factored form (31) or (32); getting the functions {A(*),B(*)} or {M(*),@(*;),N(*)} involves a further step and supposethat of approximation. How can this be done? In the stationary rank of D = tl, CI I n. (40) KAILATH : LINEAR FILTERING THEORY 155

SinceD is symmetric, we can write it via a standardnumerical which should be comparedwith the Riccati-type algorithm procedure(called the LDU decomposition, seee.g., [387]) (35), (36) of the previous section. as We can now point out a closerelationship to somefamous D = LJL,’ (41) equations obtained in astrophysics in connection with a Wiener-Hopf equation (10) with K(t,s) of the form whereLO is an n x c1matrix and S is the a x a “signature” 1 matrix of D K(Q) = zc(Jt - sl) = exp (-It - sla)w(cl)da (46) s S = diag {l,l;..,l,-I,-l;..,-I} 0 for a certain weighting function w(a). Ir. 1947,Chandrasek- with as many ones as D has positive eigenvalues.For har [217] showed that the solution ;ould be obtained in reasonsthat will be given later, we shall say that the non- terms of two functions, now generally known as Chand- linear differential equations(38) are of Chandrasekhar-type. rasekhar’s X and Y functions, obeying the simultaneous This new algorithm determinesK(.) directly via the solution nonlinear differential equations of n(p + LX)nonlinear differential equations for the n x p am4 matrix K(.) and the n x CI matrix L(e). In the Kalman ~ = - Y(t,u) ’ Y(t,LW(~) &’ (47) filter solution we have n(n + I)/2 equations for the com- at s 0 ponents of the matrix P(e), from which K( *) must then be 1 aw4 found as P(*)H’ + GC. ~ = -aY(t,a) - X(t,or) Y(t>P>NB>dP (48) In this generality, there may not be much to gain by at s 0 choosing one algorithm or the other. However, there are X(O,u) = 1 = Y(O,a), O

D = -(nH’ + GC)(TjiH’ + .GC)’ and rank D 5 p. With some small effort, the reader should be able to see that theseequations are essentiallythe sameas (37), (45a), (43) (45b) if we make the assumptions Assuming for definitenessthat the rank is p, the number of outputs we can take S = -Z and L, = iiH’ + GC, so W(~) = ~ cli 6(a - cli), Ui 2 0, -F = diag (~1~;* .,a,}. that no special factorization is neededto specify the equa- 1 (52) tions, which now comprise 2np nonlinear equations comparedto n(n + 1)/2 for the Kalman filter. Therecan be This is why the nonlinear differential equations (38) are a considerablecomputational saving when n >> p. said to be of Chandrasekhartype. We should note that the X If we assumenot a state-spacemode l for the stationary and Y functions were already introduced by Ambartzumian signal processz(a), but only knowledge of the covariance in [211]. Chandrasekhar [217] first gave the differential function of z(a), we shall be closer to the classical assump- forms, which Bellman, Kalaba, and their colleaguesbegan tions of the pre-Kalman theory, and will be able to bring to numerically exploit in the early sixties. out someinteresting connections.Thus if (cf. (32)) Discrete-Time Models R,(t,s) = Z&t - s) + MeF”-“‘N’l(t - s) Analogous results can be obtained for discrete-time + N’eF’(S-f)M’l(s - t) (44) problems but the formulas are somewhatmore complicated. This happenedalso with the Kalman formulas but the then it can be shown [183] that the algorithm is just (37) difference is even more pronounced here. Once again, with (instead of (38)) we use the sameestimator equation as in the Kalman filter

k(t) = -L(t)I!(t)H’, K(t,) = N (45a) Ri+l[i = @2ili-l + Ki[Ri”]-‘i?iy A,,-, = 0 (53a) i(t) = [F - K(t)H]Y(t), L(t,) = N (45b) Ei = yi - HRili-1 Wb) 156 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974 but Ki is found not via the Riccati difference equation (30), close connection between Chandrasekhar’s and Levinson’s but via the equations [184] results. Actually, both can be derived by using certain “invariance” principles, which consist basically of “in- Ki+l = Ki + ~Li[Ri']-lLi'H' (544 variantly imbedding” the given problem in a family of

Li+l = [~ - Ki+,[R~+,]-lH]Li Pb) similar problems. This will be explained in some detail in Section VII. Rf, I = Ri" + HL,[R;]-'L,'H' (554 We should also note that the techniques used to obtain R;,, = Ri' - L,IH'[R,"]-'HLi (55b) (38)-(41) and (54)-(58) can also be applied [183], [184], where to other problems where Riccati equations arise, even to general two-point boundary value problems whose solution K, = OoH' + l-C, ROE = R + Hl-I,H' (56) is well known to be obtainable via a nonsymmetric Riccati and Lo, R,' are found by factoring the matrix equation. Furthermore, by using the ideas discussed at the end of Section IX, the results can also be extendedto certain D A N-I,@' + rQI-' - II, - K,[R,]-'K,' (57) classesof time-variant and even nonlinear models. Applica- as tions to infinite-dimensional (distributed parameter) prob- lems seem to hold special computational promise since D = Lo [MO+ i-1 Lo', M, > 0, Me < 0. (58) operator Riccati equations on Hilbert x Hilbert spacesare replaced by equations on Hilbert x R" spaces. The matrix Lo has dimension n x a, where a = rank of D. As stressedin [ 1841,[ 193d], a significant aspect of all the Then Lo is the initial value for (54b), while results of this section is that a reevaluation is timely of M+-' 0 the almost total concentration on the Riccati equation in the R,' = 0 Mm-' I ' sixties. The rest of this paper hopes to describe some of the concepts that will underlie such a reexamination. These The form of (59) suggeststhat we define concepts have actually been available for quite a while, but, M, = CR;]-' as stated before, they seemto have beengenerally neglected, perhaps as historical curiosities. However, many of the in which case (55b) gives the equation results of Sections V and VI would probably not have been Mi+l = Mi+ MiLi'H'[Ri"]-lHLiMi. (60) developed without awarenessof the important role of the We could obtain other variations by updating [R,"]-1 Wiener-Hopf equation and of spectral factorization in this directly. The best choices seem to be those we have given; field. Our survey of these various ideas can only be partial, a count of the number of operations, which is more signif- but it is deliberately also somewhat tutorial so as to aid the icant in discrete time than the number of equations, shows interested reader in making a closer study of the many that all forms of the new algorithm involve less computation referencesthat will be noted later. than the Kalman filter. Recently, other variants have been found in which specific equations as in (54) or (60) are VII. SOME EARLY MATHEMATICAL WORK ON LINEAR replaced by the specification of successiveorthogonalizing LEAST-SQUARES ESTIMATION transformations (e.g., of the Householder type) to be ap- The adjectives in the title might seem strange to some- plied to certain data arrays [ 193~1,[193d]. These forms are one who has gone through the previous sections; it might intimately related to square-root estimation algorithms (see also seem a bit presumptuous considering how often the Wal, C128a1,U481, and the referencestherein), and to the work of Wiener has been mentioned so far. Nevertheless, ideas of canonical spectral factorization (shadesof Wiener, as stated by Masani [255] in the special Wiener issue of again). This last topic is further discussed at the end of the Bulletin of the American Mathematical Society, the Section IX. 
portion of Wiener’s work [l] that we have described does As in continuous time, in various special cases the not have “the theoretical strength and completenessof that algorithm can be simplified further, e.g., when II0 = 0 of Kolmogorov.” Wiener became aware of this himself or when II, = fi = an@’ + IQI’. In the latter case, when he tackled the multivariate and nonlinear least- the processesz and y are stationary, and it can be shown squaresproblems. Masani writes that “Wiener adopted the that the relevant equations are closely related to the work [Kolmogorov] Hilbertian approach in his later papers of Levinson in 1947[218]. Curiously, the algorithm for the under the stimulus of his younger collaborators.” continuous-time stationary process case is also related to Before embarking upon our examination of the more work done in 1947, namely that of Chandrasekhar [217]. mathematical work of Kolmogorov and his successors,we While Chandrasekhar’s paper was in astrophysics and should perhaps reassure the reader that the mathematical therefore somewhat inaccessible, it is unfortunate that level of our presentation is not going to take a big jump. Levinson’s paper, which was reprinted as an appendix in “Deeper” mathematics, or even “more abstract” math- Wiener’s monograph [1], has been somewhat overlooked ematics, does not necessarily entail more formidable in the literature of communication and control theory, mathematical “language.” It is quite possible to present though it has been widely used by geophysicists [79], [96], the basic ideas of the deeper mathematics in a physical [1 lo], and very recently in speech analysis [ 1641, [173]. way, and in fact it is no longer a novelty that often the Since they solve similar problems, there must clearly be a deeper mathematics is closer to physical constructs. The KAILATH: LINEAR FILTERING THEORY 157

The Schwartz theory of generalized functions is a good example, where the abstract topological notions of generalized equality and convergence are more closely tied to the mechanisms of physical measurements than are the classical pointwise or L_p definitions.

The Work of Wold (1938)

In 1938, just a few years after Kolmogorov had put the theory of probability and stochastic processes on a sound general footing [203], Wold presented a Ph.D. dissertation on discrete-time stationary processes. This dissertation, now available in book form [206], contains several interesting results, of which we mention only a very few, related to our theme. For example, Wold already used the idea proposed by Frechet in 1937 [205] of regarding random variables as elements of a metric space with the distance between two elements being the variance of their difference. This geometric interpretation made it natural to interpret least-squares estimation as projection onto a subspace. It took many years for this natural idea to penetrate the engineering literature, where even in the sixties, strenuous efforts were made in many papers to avoid using the so-called "orthogonality" condition for least-squares estimates. We may mention in passing that Wold was influenced by the work of Frisch [202], where, as Wold states, "matrix calculus was for the first time systematically employed in statistics." In 1969 Frisch received the first Nobel Prize in Economics.

One of Wold's major observations was that it simplified calculations to replace a sequence of correlated random variables by an "equivalent" sequence of uncorrelated variables. He also noted that certain processes could be "singular" in that their future values could be predicted exactly from knowledge of their past values. Such processes are nowadays, following Doob [213], called deterministic processes. These various ideas were combined into the following fundamental result [206, p. 89].

Let y(t) be a finite-variance stationary discrete-time process. Then there exist three jointly stationary processes {ε(·), x(·), ψ(·)} such that
i) y(t) = x(t) + ψ(t);
ii) ψ(·) and x(·) are uncorrelated;
iii) ψ(·) is deterministic and unique;
iv) ε(·) has uncorrelated components, Eε(i)ε(j) = δ_ij;
v) x(t) = b_0ε(t) + b_1ε(t-1) + b_2ε(t-2) + ···, where Σ b_i² < ∞.

This decomposition is now called the Wold decomposition, and it has been widely used and generalized, especially in functional analysis [245], [250], [261].

The Work of Kolmogorov (1939-1941)

While Wold went on in his thesis to apply his ideas to economic time series, it was left to Kolmogorov to pick up and complete Wold's results on prediction, which he did in a brilliant and comprehensive fashion in the papers [207], [209], [210]. Later Wiener made similar, but less complete, efforts, and Masani [255] comments that "so thorough had been Kolmogorov's treatment of univariate prediction in the discrete case that there was little left to do."

Kolmogorov first noted that, though Wold's decomposition was stated as an existence theorem, it becomes a prediction formula as soon as the deterministic process {ψ(·)} and the coefficients {b_i} of the moving-average process (so named by Kolmogorov) {x(·)} are fixed. Thus suppose we know that ψ(·) is identically zero. Wold was aware, but did not explicitly state in his theorem, that ε(t) could be uniquely determined by {y(t), y(t-1), ···}. In fact, Wold essentially constructed the ε(·) sequence by successive Gram-Schmidt orthogonalization, for which this property is obvious. (We say "essentially" because Wold was dealing with an infinite sequence {y(t), y(t-1), ···} and a possibly nonzero deterministic ψ(·), so that a careful double limiting procedure had to be used.) This property was explicitly introduced and exploited by Kolmogorov as follows. By Wold's theorem

y(t) = b_0ε(t) + b_1ε(t-1) + b_2ε(t-2) + ···   (60a)

where the {ε(t), ε(t-1), ···} are uncorrelated random variables that can be computed from {y(t), y(t-1), ···} by linear operations. Also

y(t+1) = b_0ε(t+1) + [b_1ε(t) + b_2ε(t-1) + ···].

However, as just noted, the terms in the square bracket are completely determined by knowledge of past y(·); moreover, these terms are not correlated with ε(t+1). Therefore, we have

ŷ(t+1|t) ≜ linear least-squares estimate of y(t+1) given {y(t), y(t-1), ···}
= b_1ε(t) + b_2ε(t-1) + ···.

This solves the prediction problem. Here

b_0ε(t+1) = y(t+1) - ŷ(t+1|t)

so that one may call ε(t+1) the "new information" or the "innovation" in the process y(·) at time t+1, and the process ε(·) may be called the innovations process of y(·), a name that was apparently first used by Wiener and Masani in the mid-fifties (personal communication in 1968 from P. Masani). (See also a 1960 paper by Cramer [268].) We shall see that such processes play a fundamental role in our understanding of the process y(·) in both discrete and continuous time (Sections VIII and IX).

So far we have chiefly an easy application of Wold's theorem. Kolmogorov went on to deepen Wold's theorem by relating it to properties of the so-called integrated power spectrum of the process y(·). With the covariance

R(i - j) ≜ Ey(i)y(j)

Wold had shown that there exists a nondecreasing function F(λ), called the integrated power spectrum of y(·), such that

R(k) = ∫_{-1/2}^{1/2} exp (i2πkλ) dF(λ).
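Before turning to the spectral side, it may help to see the Gram-Schmidt construction of the innovations numerically. For a finite data record it is nothing but a Cholesky (triangular) factorization of the covariance matrix; the sketch below, with an arbitrarily chosen invertible MA(2) model (an illustration added here, not an example from the paper), shows the rows of the Cholesky factor converging to the coefficients b_0, b_1, b_2, so the one-step predictor is read off exactly as in Kolmogorov's formula above.

```python
import numpy as np

b = np.array([1.0, 0.6, 0.3])     # b0, b1, b2 of an invertible MA(2) model (arbitrary)
r = np.array([b @ b, b[0]*b[1] + b[1]*b[2], b[0]*b[2]])   # autocovariances r(0), r(1), r(2)

N = 200
R = np.zeros((N, N))
for k, rk in enumerate(r):        # Toeplitz covariance matrix of y(0), ..., y(N-1)
    R += rk * np.eye(N, k=k)
    if k:
        R += rk * np.eye(N, k=-k)

L = np.linalg.cholesky(R)         # finite-data Gram-Schmidt (Wold's construction)
print(L[-1, -3:])                 # trailing row entries -> approx [b2, b1, b0]
print(L[-1, -1] ** 2)             # one-step prediction error variance -> approx b0**2 = 1
```

The last row of L reads off b_2, b_1, b_0, so the predictor is just b_1ε(t) + b_2ε(t-1), and the squared diagonal entry is the one-step prediction error variance.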

In general F(a) will consist of an absolutely continuous TheseA-functions can be extendedfrom the interval (- +,+), part, a jump part, and a singular part (continuous and or equivalently from the unit circle, into the complex nondecreasingbut with zero derivative almost everywhere). plane, an idea of Hardy’s [199] that had been extensively Kolmogorov showed that the deterministic part of a studied by Paley and Wiener [204]. Nevertheless, it was process y(e) is identically zero if and only if F(A) is ab- Kolmogorov who exploited theseideas in generalprediction. solutely continuous and its derivative 8’(A) satisfies A Formula of Szegii’s (191.5) 112 In P(A) dl > -co. (61) Kolmogorov’s explicit formula (62) for the mean-square s -l/2 error had already been discovered in a different, but He also gave explicit formulas for the coefficients {bi} in isomorphic problem by G. Szegii in 1915 [20@], [201]. terms of the Fourier series coefficients of In fi(A)5 and The isomorphism was precisely the one introduced by finally showed that the one-step prediction error has a Kolmogorov, according to which the problem of choosing simple form {aj} to minimize II-1 2 lim E[y(t) - E(t 1t - l)]” = lim E&‘(t) an2 P E Y(n) - 5 ajY(j> (67) t-+03 t-02 [ 11-2 1 = exp 3 In p(A) dA. is the sameas that of choosing them to minimize s -l/2 112 n-1 2 (62) Is2” - exp (i2rcln) - C aj exp (i271;lj) #(A) This last formula, which is valid for all processes(even s -l/2 I 0 those with a nonzero deterministic part), is closely related ((33) or to the Yovits--Jackson formula (17) mentioned in Section III (cf. [48], [54]). (69) These remarkable results were obtained by connecting the study of stationary random processes with that of Thus the problem of minimizing a,’ by suitable choice of certain deterministic functions in the frequency domain. the {ai} is just a problem of polynomial approximation on Kolmogorov did this via the relationship the unit circle. This problem was solved for absolutely 112 continuous F(z) by SzegB,and rederived by Kolmogorov Ey(k)y(l) = R(k - I) = exp i2&(k - 1) dF(A). for generalF(z). The connection to Szego’swork was noted s -l/2 (63) by Krein [214] and later by Grenander [223]. It will be useful later to set The Work of Krein (1944-1945) z = exp i2nA In responseto questions raised by Kolmogorov, Krein so that (63) can also be written in 1944-1945 [214], [215], showed how Kolmogorov’s results could be extended to continuous time by use of a Ey(k)y(Z) = R(k - 1) = zkz-’ dF(z) (64) z simple bilinear transformation. To each discrete-time P stationary process with integrated spectrum F(;i) we can where the special symbol denotes integration around the associatea stationary continuous-time processwith integrated unit circle. The left side of (63) can be regarded as an inner spectrum S(f), where product between the random variables y(k) and y(Z) in a Hilbert space of random variables formed from finite WI = S(f), 1= Atan-‘f (70) linear combinations of the {y(k)} and their limits in the TL covariance norm. Similarly, becauseF(A) is nondecreasing, 1 + if tan nA = exp (271% - 1 the right side can be regarded as an inner product between exp i2nI. = - , ___-- = if. 1 - if exp (2&) + 1 exp i2nk3, and exp i2nlA in the space L,(dF) of functions square-integrable with respect to the measure dF(3,). 
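The error formula (62) can be checked numerically for any concrete spectrum. The sketch below is illustrative only (the MA(1) coefficient is an arbitrary choice): it evaluates exp ∫ ln F'(λ) dλ by simple quadrature and compares it with the one-step error variance computed from a finite Toeplitz covariance matrix, which decreases to the same limit.

```python
import numpy as np

b = 0.5                                             # MA(1) coefficient, |b| < 1 (arbitrary)
lam = np.linspace(-0.5, 0.5, 20001)
S = np.abs(1 + b * np.exp(-2j * np.pi * lam)) ** 2  # spectral density F'(lambda)
szego = np.exp(np.log(S).mean())                    # approximates exp( integral of ln F' )

N = 60                                              # finite-data error from the Toeplitz covariance
r = np.zeros(N); r[0], r[1] = 1 + b**2, b
R = sum(r[k] * (np.eye(N, k=k) + (np.eye(N, k=-k) if k else 0)) for k in range(2))
err_var = np.linalg.cholesky(R)[-1, -1] ** 2        # E[y(t) - yhat(t|t-1)]^2 given N-1 past values

print(szego, err_var)                               # both close to 1, the limiting error variance
```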
(71) Similarly the right side of (64) defines an inner product This transformation has the useful property of preserving between the powers zk in the space of polynomials on the causality, and therefore it is often used in digital signal unit circle. Therefore, we have an obvious norm-preserving processing(see, e.g., [367]). mapping (or isometry) betweenthese spacesin which Use of the bilinear transformation shows easily [225, ch. XII] that the necessaryand sufficient condition for no y(k) ++ exp i2nk,I c1 zk (65) n deterministic part is (compare (64)) F a,y(k) c-) $ ak exp i2nkA c* i akzk. (66) O” In ‘(f) df , _ Go 1 (72) f --m 1 -?-f’ The bilinear transformation can also be extended to multi- 5 Incidentally, this defines the so-called “cepstrum” of y( .), which has had a vogue in recently [364]-[368]. variable systems [233] and to systemsin state-spaceform KAILATH : LINEAR FILTERING THEORY 159

[163],‘j where it has beenexploited to give a new technique will be briefly discussedafter we presentthe basic problem for solving the steady-stateRiccati equation. The bilinear and its solution. transformation was also usedin somegenerality by Masani Given a segment of a stationary time-series {y(O), and Robertson [246], [35], [45], who paid particular y(l),* * *, y(N - l)}, where the {y(i)} are p-vectors, we attention to the question of how the discrete-time innova- wish to find the optimum one-stepprediction tions processgoes over to continuous time. We may note N-l that the discrete-time one-step prediction error formula $N[N-I & - c AN,,-iv(i). (73) has no continuous-time analog, but there is a continuous- i=O Let time version of the Wold decomposition and of the in- novations processs(e). Theseare important results, which Ri-j B -@@y’(j) we shall discussfurther in the next section. then by using the orthogonality property of least-squares Krein has made severalother important contributions to estimates,we will have the equations filtering theory. In 1954, he discovered that the spectral N-l analysisof a weightedstring, in which he had beeninterested RN-j = - C AN,N-iRi-j, j = N - l;.*,O. (74) since 1940,enabled him to obtain [227] some deepresults i=O on estimation given a finite data segment.Krein ’s analysis The mean-squareerror is given by also led him to several other results on the solution of N-l RN~ & = E[YN - PNIN-JYN’ = Ro -I- c AN,,-8-N. integral equations, the so-called “inverse-scattering” prob- 0 lem (see, e.g., [224], [226]). Recently, Dym and McKean (75) [259], [260] have pursuedKrein ’s ideas evenfurther. Since RN” is a nonincreasingfunction of N, its value can be The Work of Levinson (1947) usedto decidewhether it is necessaryto collect more data (i.e., increaseN) in order to achievea desiredmean-square In the USA, work on least-squaresestimation proceeded error. As stressedby Levinson, this makes it important to along different lines. Wiener’s basic ideas, rather than his find a way of successivelycalculating RN’, N = O,l, * * . . The ingenious solution of the problem, influenced work on first step is rearrangethe filter equations(74) and the error anti-aircraft devices,where the needto find computationally equation (75) in a single block-Toeplitz matrix equation simple solutions led to some alternative approaches.Thus Phillips [212] beganwith the assumptionthat the optimum filter has a rational transfer function and used the mean- square-errorcriterion to solve for the coefficients. Others, apparently including Blackman, Bode, and Shannon [3], tried to incorporate the dynamical constraints on the targets into the prediction schemes. &N (76) Levinson [218] formulated the problem in discrete time. In his words, “A few months after Wiener’s work appeared, where gN is a block-Toeplitz matrix. The unknowns are the author, in order to facilitate computation procedure, the {AN,i} and RN”. The aim is to determine Rh,, and worked out an approximate, and one might say, math- {A N+l,i} in a way that takes maximum advantageof the ematically trivial procedure.” Levinson’s deprecatorycom- previous computations made to find RNE and {AN,i}. It ments notwithstanding, this work has had an important takes almost as long to describethe result as it doesto give impact on the field, both directly and indirectly. His a derivation, following Robinson [110, ch. 61. 
We shall equations were rediscovered in 1960 by Durbin [239] therefore do so here, partly also with the hope that readers in a schemefor recursive fitting of autoregressivemode ls may recognizeanalogies with similar proceduresin other to scalar time-series data. Whittle [81], [32] extended problems. these recursions to multivariate time series, and his work The method is first to try an “obvious” solution, pushing has beenwidely usedby statisticians. Levinson’swork was our luck the most by assuming that adding a zero to the directly usedand extendedto multivariate seriesby workers previous solution may work. It will if in the resulting in geophysics,especially Robinson, with various contribu- equation tions by groups in the Geology Department at M.I.T. and [ZJN,,,* * *,AN,N,~]~N+I = [RNE,O;. .A~N] (77) in the oil industry (cf. [I lo] and the referencestherein). These algorithms are now being used in speechanalysis the term c(~ = RN+1 + Cy-’ AN,iRN-i is zero. If this does (see, e.g., [164], [173]) and in spectral estimation [107], not happen, we have to find a (simple) way of forcing MN [156], [182]. There are also close relations to the theory to zero. For this we introduce the “auxiliary” (adjoint or of orthogonal polynomials [208], [236], [241] and to the reversed)equations algorithms of Section VI. These and other connections [WN,N, * . . ,BN,~J]~N+ 1 = [PN,O,* * . ,O~N'] (78) where at stage N we assumethat we know {AN,i,RN’,UN) 6See also Popov [87], [87a], papersthat contain several important and {BN,i,RNr,QN}. [For N = 1, we take B,,, = Z, R,’ = ideas on spectral factorization and innovations representations (see especially [87, sects. 5, 7, and appendix E, F). R. = ROE.] Next we form a weighted combination of (77), 160

(781 where z is an indeterminate and similarly define BN(z), \ I AN+~(z), BN+ t(Z). Th en the recursions (81)-(83) can be [&AN,~ + KN'BN,N,' "9KNa]~N+1 written compactly as = [RN’ + KNaPN,O,** * ,O,CIN+ KN’RN’] from which it is easy to seethat choosing KN’ = -NN[RN’]-~ (79) which turn out to be exactly the recurrence formulas for gives us a solution of the extendedequation, i.e., orthogonal polynomials on the unit circle [236], [241]. These are polynomials {AN(z), IzI = l} such that [Z,AN+l,l> ” ‘,A N+l,N+l 1 = [Z,A,,,; . *,AN,N,O]+ Kzv~[O,BN,N,* . *,BN,tJ]. (80)
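In the scalar case the recursion just described collapses to what is now usually called the Levinson-Durbin algorithm; there the adjoint array B_N is simply the reversed A_N. The following sketch is that scalar special case only (the block-matrix bookkeeping of (76)-(80) is suppressed, and the test autocovariances are an arbitrary AR(1) choice, not data from the paper).

```python
import numpy as np

def levinson(r):
    """Scalar Levinson-Durbin: r[0..N] are autocovariances; returns the predictor
    polynomial a = [1, a1, ..., aN] and the final one-step prediction error variance."""
    a, err = np.array([1.0]), r[0]
    for n in range(1, len(r)):
        alpha = a @ r[n:0:-1]          # the quantity that must be forced to zero
        k = -alpha / err               # scalar "reflection" coefficient
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]    # scalar version of the update (80)
        err = err * (1 - k * k)        # updated error variance
    return a, err

rho = 0.8
r = rho ** np.arange(6) / (1 - rho**2)   # exact AR(1) autocovariances
a, err = levinson(r)
print(np.round(a, 6))                    # ~ [1, -0.8, 0, 0, 0, 0]
print(err)                               # ~ 1.0, the driving-noise variance
```

Run on exact AR(1) autocovariances, the recursion returns the predictor [1, -ρ, 0, ..., 0] and the driving-noise variance; the successive values of err are the order-by-order error variances R_N^ε whose monotone decrease Levinson emphasized.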

&ego’s classic result that the polynomials AN(z) have all uncorrelated increments E(e) and the notion of Wiener their roots inside the unit circle, so that AN-l(z) can be stochastic integrals. [The situation is analogousto that in regarded as the transfer function of a bounded filter, is deterministic system theory, where we can use Heaviside the analog of the fact that the transformation between a functions and Stieltjes integrals to handle singularities that discrete-time processy(s) and its innovations is boundedly are at worst delta functions.] Even when the second ap- reversible. proach is taken, the concept of white noise is still very So far, we have only’noted connections with known convenient and useful, as engineers have long known. results. On the other hand, innovations can be defined A thought-provoking example is provided by the develop- for large classesof processesbesides stationary discrete- ment of general series expansionsfor random processes, time sequences.Therefore, we have the possibility of which were known to engineersmuch before they were obtaining various generalizations of the previous results formally discoveredby mathematicians(cf. Section X). and in particular of the classical theory of orthogonal However, as the mathematical sophistication of the field polynomials. (Seealso [248a].) increases,there is a tendencyto uncritically reject the useof These possibilities provide additional motivation for white noise. This I believe is a serious mistake, which going on to a deeperstudy of innovations and the Wold delays for many potential usersthe appreciation of several representation. general and useful results that may have been originally derived in a more abstract context. Moreover, at the very VIII. CANONICAL REPRESENTATIONS OF CONTINUOUS-TIME least it can be a powerful guide to our intuition and a hedge PROCESSES against many unfruitful investigations. It may be apt to The concept of innovations processeswas introduced for note that, accordingto Doob [253a], it was Wiener who first stationary discrete-time processesvia the Wold representa- “showed, and applied repeatedly, that the [process E(s)] tion theorem. The continuous-time decomposition, which acts as though [its] derivative processexists, is stationary was obtained by different methods by Krein [215], Kar- and has a constant spectral density.” Doob’s book [225, hunen [222], and Hanner [221], is a natural generalization pp. 435-436,p. 533, pp. 546-5471has some nice examples of the discrete-time representation. It contains a regular of how white noise can be usefully employed even in a part plus a deterministic part, which will be absent if and rigorous exposition of stochastic processesaddressed to only if condition (72) is met, as we shall henceforthassume mathematicians. A recent book of Hida [295] stressesthe for convenienceof writing. Then the Wold representationis importance of white noise and this has been further rein- forced by recent work of Hida and of McKean [323]. t Rao has developedperhaps the most generalresults to date Y(t) = s(t - u> dE(u) (91) s -Co [3!5], studying more white noise processesthan those defined as the derivatives of processeswith independent where E(e) is a processwith uncorrelatedincrements, and increments [225]. the integral is the so-called Wiener stochastic integral, cf. Doob [225, ch. IX]. 
There are many kernels g(a) that can Properties of Canonical Kernels be usedin the previous representation,as we shall seelater, Karhunen and others have studied the kernels g( *) and but there is always a particular one go(*) such that y(a) go(.) in some detail. Karhunen [222] showedthat all g( *) and E(a) are causally equivalent in the sense that any that could serve to define the Wold representation (91) finite-variance random variable linearly dependent on are of the form {E(u), u I t} can also be calculated by linear operations on {y(s), s I t} and vice versa. In this case, we shall say g(t) = exp (iwt)G(w) 2 that the Wold representationis canonical, or that s t G(o) = lim G(s), s=o+io Y(t) = go(t - u> dE(u) (92) 0-O s -02 G(s) = A(s (93) is an innovations representation of y(a), and we shall call where E( *) the innovations process of y( *). This can be rewritten in a form closer to Weld’s (cf. [60a]) rt 1 y(t) = go0 - M4 du, e(u) = dE(u)/du. J -cc P(J) = the power spectraldensity of the processy(.) (94) However, e(.) is now a continuous time “white noise” and process,so that the differentiation of E(s) is not valid in a classicalsense, but only in the senseof generalizedfunctions. A(s) = transfer function of an “all-pass” system. However, as long as one is at most concernedwith a white Any such all-pass transfer function can be further de- noise process(and not any of its generalizedderivatives), composedinto there is no needto introduce the machinery of generalized processes;it sufficesto work with the integrated processof 4) = ~o~,(sM,(s)~,(s) (95) 162 where 1) Letf(*) be a square-integrablefunction on (- co,oo). Then A, = constant of magnitude 1 (called a trivial all-pass t function) go(t - ulf(u> dzJ = 0, t I 0, s --m (Blaschke product) if and only if f(t) = 0, t I 0. (104) This property was discovered by Karhunen [222]. Re Sk > 0, 2 Re Sk/l -!- IskI < 00 (96) 2) The canonical kernel has maximum partial energy A,(s) = eeas (pure delay) in the sensethat (97) t g&d) du 2 t s2(4 & all t > 0 (105) As(s) = exp - J O” &$ @O] (singular part) s s [ n s -03 (98) and all ciusal or noncauial g(*) with IG(iw)l = IGo(io)l. This property was given in 1962 by Robinson [271], where p(a) is a nondecreasing function whose derivative though closely related results were also noted by Levy vanishesalmost everywhere,and 0 < p(co) - p( - co) < co. in the mid-fifties (see,e.g., [266, p. 1401).See also [317]. The {Ai} all have unit modulus along the io axis, so that 3) The set {gO(t - r), r 2 0} spans L,(O,co). This last IG(ico)l = /Go(i (99) result is due to Beurling .[264] and provides a version for L, of a famous theorem of Wiener’s that the Banach space Also, for all but A, we have L, can be spanned by the translates of a function with positive Fourier transform. IAi( < 1, Re s > 0, i = 1,2,3. (100) 4) Let G(s) be rational and let If we restrict G(s) to being rational, then A,(s) and A3(.s) G(b) = IG(iw)l’exp $(io). will not appear and the so-called Blaschke product in (106) A,(s) will be finite (k < co). Note that all the poles of Then C#J(a) is called the phase Zag of the filter G(a) and A,(s) must be in the left half-plane (corresponding to -d~#~(io)/dw is called the group delay. Of all filters with the causality, since the region of definition of all our functions same “gain” jG(io>)l, Go(.) has the smallest phase lag and here includes the iw-axis), though there can be zeros in the group delay. This is often labeled the minimum-phase right half-plane. 
These results were obtained by using property of go(*) (see [263]). classical results on bilateral Laplace transforms (cf. Paley Applications to Prediction and Wiener [204]) and the so-called Hardy functions (see Hoffman [244] and Duren [258] for recent accounts). It should be clear (as noted by Krein and Karhunen) It may be of interest that the decomposition for G(s) was that the innovations representation (92) can be used to obtained before Karhunen and Krein by Krylov [262] solve the prediction problem just as in the discrete-time in a purely mathematical study of the transforms of “one- case. In fact we have sided” (causal) time functions. Such results have again t A(t + CY1 t) = go(t - u> dE(u), CI > 0. (107) become of interest in recent studies on infinite-dimensional s -CO realization theory [307]. Of course, the difficult thing is to explicitly calculate Karhunen proved that (E(u), u I t} from {x(u), u 5’ t}. For rational G,(e) the G(s) = AoGoW (101) solution is simple: just pass x(e) through the filter with transfer function [G,(s)]-‘. (In principle, this is what we is a necessaryand sufficient condition for do in general,but care has to be taken in defining the inverse MY; t> = ME; t), --co E ME; 0, -co
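A discrete-time counterpart of the canonical kernel g_0 is easy to compute for a polynomial spectrum: reflecting any zeros of a spectral factor that lie outside the unit circle to their conjugate-reciprocal positions leaves the gain |G| unchanged on the circle but makes the factor causally and boundedly invertible, i.e., minimum phase. The sketch below uses an arbitrary non-minimum-phase MA kernel; it illustrates the idea and is not a procedure taken from the references.

```python
import numpy as np

g = np.array([1.0, -2.5, 1.0])            # non-minimum-phase MA kernel (zeros at 2 and 0.5)
roots = np.roots(g)
outside = np.abs(roots) > 1
flipped = np.where(outside, 1 / np.conj(roots), roots)      # reflect zeros inside the circle
g0 = np.real(g[0] * np.prod(np.abs(roots[outside])) * np.poly(flipped))

z = np.exp(2j * np.pi * np.arange(512) / 512)
print(np.max(np.abs(np.abs(np.polyval(g, z)) - np.abs(np.polyval(g0, z)))))  # ~ 0: same gain
print(np.abs(np.roots(g0)))               # all < 1: the canonical, causally invertible factor
```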

Nonstationary Processes y(t) = j=; [I + jUt k(w) dz] dE(u). (114) The cited work up to 1950was all for stationary processes and usedfrequency-domain methods. In 1950Hanner [221] This is a useful result, though it is a bit difficult to seethe gave a purely time-domain derivation of the continuous- rationale for the steps (see some following comments by time Wold decomposition, and this raised the possibility Hida). We shall give an explanation in Section IX. of extensionsto nonstationary processes.The first results Although Levy [266] noted that his method applied to in this direction were perhapsthose of Yaglom and Pinsker certain more generalkernels, e.g., those that were an a-fold [265], who studiednonstationary processes that had station- double integral of [d(t - s) + K(t,s)] (cf. (111)) with a ary increments of order II. The simplest example is the not necessarilyintegral, he was unable to prove that (110) Wiener process, which has stationary increments of first always has a solution, or, equivalently, that an innovations order, and can be representedas the integral of white representation always exists for a (nondeterministic) noise. The general case was first studied by Levy [260], process.The reason for this failure appearedonly a few [274], who sought representationsof the form years later, when in 1960 Cramer [268] and Hida [269] independentlydiscovered a new dimension to this problem, finding that a single kernel g( *, a)does not sufficein general. y(t) = f g(w) dE(u), 0 5 t I T < co. (108) s 0 The proper Wold decomposition for a finite-variance nonstationary processis When L,(y; t) = L,(E; t) (109) Y(t) = 5 1’ gi(t,u) dEi(u) + $(t> (115) 1 0 he called the representationsproper canonical, using the word canonical for the case L,(y; t) c L,(E; t). This where $( *) is a deterministic process,and the {Ei(.)} are terminology has now beenabandoned. The covarianceof a orthogonal-increment processes,uncorrelated with each processas in (108) is other. The number N is uniquely determinedby the covari- ante of y(e), though the {gi} and {E,(e)} are not. N is tlis R(t,s) = s(t,u>s(w) du, tAs = min (t,s). (110) called the multiplicity of the processy(a), and if N > 1 s 0 the representation(115) is called a generalizedcanonical ‘representation. The multiplicity N can be infinite [273], Now Levy asked whether, given a covariance R(t,s), one even though all presently known examples of processes could find a suitable function g(t,u). He did not obtain a with N > 1 have rather pathological kernel functions general solution to this problem, but among other results, {gi(t,u)}. Hitsuda has very recently discovered [320] that he did obtain the following useful one. a processof the form Let t s w1(t) + f(t)W2(0, t20 (116) R(t,s) = tAs + K(u,v) du dv, 0 I t,s I T ss0 0 where wi(*) and w2(*) are independentW iener processes, (llla) will havemu ltiplicity 1 iff( *) is absolutely continuous with a square-integrablederivative. However, if the derivative is not square-integrableon every open interval (I,m) c = ’ ’ [6(u - v) + K(u,v)] du dv (lllb) ss0 0 [O,oo),then the multiplicity is 2. The multiplicity is also 2 if f( *) has unboundedvariation everywhere. where K(*, *) is a continuous symmetric function of two Hida (2691developed some of Levy’s ideas more clearly variables. Assume that the eigenvaluesof K(* , a) on the and obtained several new results. In his words “[Levy’s] square[O,T] x [O,T] are greaterthan - 1, or equivalently pioneering works contain some points difficult for us to that R(. 
, *) is strictly positive definite on this square.Next follow. The main aim of this paper is to establishhis theory determine a function h(*, *) as the unique solution of the systematicallyand to prove somenew facts.” Among other Wiener-Hopf type of equation things, Hida proved in great generality that if a process has a representationof the form (log), then it always has a t canonical (innovations) representation obeying (109). He h(t,s) + h(t,u)K(u,s) du = K(t,s), Ols this means forming These results are for processesof multiplicity one and are difficult to extend to the general case (115), becauseof the nonuniquenessof the {gi} and {Ei}. The analogous procedure for a continuous-time process is Processes of Multiplicity One to form It is useful to identify processeswhose multiplicity is 1, 40 = v(t) - $0 I t-1 since statistical applications will be easier in this case. but this will be identically zero for any process with con- As noted earlier, the results of Krein, Karhunen, and tinuous paths. However, if Hanner show that nondeterministic stationary processes O

j(t) = H(t)x(t) + v(t) (I - k)-’ = Z + k (126a) with the usual assumptions (19)-(21) on u(.), v(e), x0. The or equivalently via the Volterra equation least-squaresfilter (22)-(24)for this state-spacemodel can be k + kk = k. (126b) rewritten as Then i(t) = F(t)R(t) + K(t)E(t), a(0) = 0 (130) Z + K = Eyy’ = (I - k)- lEee’(Z - k’)- 1 y(t) = H(t)a(t) + E(t) = (I - k)-‘(I - k’)-’ where the gain function K( *), which is defined by = (Z + k)(Z + k’). (127) K(t) = P(t)H’(t) + G(t)C(t), P(t) = ET(t)?(t) (131) Equivalently, we have a canonical representation for v(s), viz., can be calculated via the Riccati equation (27) for P(e), or directly via the Chandrasekhar-type equations of Section y = (I + k)E. (128) VI if F, G, H are constant. This factorization, so natural now, took some time to be Now, sinceE( .) is known to be white, (130)can be regarded recognized. Shepp had given a noncausal factorization, as another causal model for y(.) driven by a single white but could not solve the problem of causal factorization noise. Also, it is easy to calculate E(*) from y( .): replace [276, pp. 3321. This is now accomplished by (127). This E by y - H? in the differential equation, calculate 8, and result having been obtained, it became clear that for con- then form E as y - HA. Therefore, for a process with a tinuous K we had just rediscoveredthe LCvy factorization known state model (129), (130) defines the innovations (c.f. (ill)-(114) of Section VIII, where y there is actually representation (IR). (This simple fact, widely known by the integral of the previous JJ). However, our result was now, was to our knowledge first pointed out in [116, somewhat more general; assumption (125) does not annendixI I DlJ and exnlicitlvI . restated in r2841.JL 2 , KAILATH : LINEAR FILTERING THEORY 167
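On a discrete grid the causal factorization (127)-(128) is literally a Cholesky factorization: if Eyy' = I + K, the lower-triangular Cholesky factor plays the role of I + k, and ε is recovered from y by forward substitution, i.e., causally. A small sketch follows; the grid, the signal covariance, and the sample size are arbitrary assumptions made here for illustration.

```python
import numpy as np

n = 200
t = np.arange(1, n + 1) / n
K = np.minimum.outer(t, t)                 # signal covariance, e.g., a sampled Wiener process
R = np.eye(n) + K                          # Eyy' = I + K for y = z + v, Evv' = I

Lfac = np.linalg.cholesky(R)               # the causal factor I + k of (127); lower triangular
rng = np.random.default_rng(1)
Y = Lfac @ rng.standard_normal((n, 5000))  # sample paths of y with the required covariance
E = np.linalg.solve(Lfac, Y)               # innovations, obtained causally (forward substitution)
print(np.abs(E @ E.T / 5000 - np.eye(n)).max())   # close to 0: empirically white, unit variance
```

The empirical covariance of the recovered ε is the identity up to the 1/sqrt(5000) sampling error, which is the whiteness property used repeatedly in this section.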

Now the generalstudies of Hida and Cramer on IRS (cf. The power spectral density matrix is defined by the values Section VIII) have shown that the IR of a processv(a) can for s = io of the function depend only upon the covariance of v(a). Therefore, m F, K, H in (129) must be determinable (up to state-space S,(s)= R(t)ees’ dt transformations) from the covariance of y(e), no matter s-co what state model we initially assumed.To do this we can = Z + M(sZ - F)-lN + A”(-sZ - F’)-lA4’. calculate the covariance of v(e) as given by the particular model (129) and compare with the given expression(128) (135) to make the nonuniqueidentifications This is not generallythe form in which the power spectral density will be given, but let us postponethis aspectfor a M(t) = H(t), F(t) = i!y @-‘(t,s) while. The problem is to factor S,(s) as [cf. (14)] S,(s) = s,+(s)s,+(-s) N(t) = II(t)H’(t) + G(t)C(t). We now use these relations to express the parameters where S,,+(S)is the transfer-function matrix of a causaland (F,H,K) of the IR (130)in terms of (M,@,N). Let causally invertible systemwith p inputs and p outputs. Now, for finite time, we have such a system in the IR X(t) A ES?(t)?(t). (130), (132),(133). However, when F is stable, it is not hard to prove (see, e.g., [126], [135]) that as t -+ co, X(t) in Then by the orthogonality of 2 and 2 we have (133) will tend to a constant matrix Z and hence that K(t) Ex(t)x’(t) A I-I(t) = X(t) + Z’(t) will tend to a constant matrix R ~=N--~M’ so that we can write K( *), as given by (131), as (136)

K(t) = I-I(t)H’(t) + G(t)C(t) - C(t)H’(t) where z is the unique nonnegative-definitesolution of the algebraicRiccati equation = N(t) - Iqt)W(t). (132) 0 = FE + CF’ + [N - %kZ’][N - CM’]‘. (137) Moreover, using (130)and the fact that E( *) is white readily yields Then the innovations representation(130) has the transfer function g(t) = F(t)X(t) + C(t)F’(t) + [N(t) - C(t)M’(t)] S,,+(s) = Z + M(sZ - F)-‘K (138) . [N(t) - C(t)M’(t)]‘, E(O) = 0. (133) which we have designatedS,, ‘(s) becausein the limit it The stochasticinterpretation of C( *) guaranteesits existence, continues to provide the canonical (causal and causally and we seethat (132) and (133)enable us to calculate K( .) convertible) factor of the power spectral density. From from knowledgeonly of M(e), N( *), and @(*, a). Therefore, (135) and (138), we obtain the useful spectral factorization (130), (132), and (133) determine the canonical or innova- formula tions factorization of the covariance(128). The state model Z + M(sZ - F)-‘N + N’(-sZ - F’)-%W (129) was only used here to motivate the developmentof the IR, as defined by (130), (132), (133). The result can be = [I + M(sZ - F)-%][I + K’(-sZ - F’)-%‘]. deducedjust from the assumption that the covariance in (139) (128) is strictly positive definite (cf. [285], [181, appendix III]). However, the present derivation does have the We clearly have several procedures for computing R advantagethat it clearly displays the intimate relationship and thereby factoring S,(s). We can find the unique non- betweenthe filtering problem and the factorization problem. negative-definite solution C of the nonlinear algebraic Thus we seethat the state vector of the IR is a( *), so that equation (137), or we can find c as the limiting solution H(t)a(t) = M(t)a(t) can also be computed directly from of the Riccati differential equation (133); another more knowledgeof the covariancefunction; this gives us a proof direct method is to find R as the limiting solution of the of the result presentedin Section V. Other applications are Chandrasekhar-typeequations of Section VI. The dif- describedin [lSl], [285], [292], [310]. ferential equation procedureswould be preferred because As oneexample, we shall show how to obtain somematrix of their simplicity and automatic production of the right spectral factorization algorithms. c or K, but it is sometimesdifficult to control the accumula- tion of computational errors until steady state is reached. Multivariate Spectral Factorizations However, the Chandrasekhar-type algorithms seem to Consider a covariancefunction R,(e) of the form (128a) behavequite well in this regard. The solution of the quad- where M and N are constant p x n and n x p matrices, ratic algebraicequation (137)involves choosingamong the respectively,and several possible solutions for E and is generally more laborious, though efficient eigenvalue-eigenvectormethods @(t,s) = exp F(t - s), F = a stability matrix. (134) have recently beenproposed [78], [loo], [160]. 168 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974

We now examine the problem of how to handle S,(s) When Z(s) has rational elements, an important tool in that are not given in the form (135).The properties of power such studies has been the so-called positive-real lemma, spectra show that we can easily write S,,(s)in the form first given by Yakubovic [73} and Kalman [75] for scalar S,(s), and extended to the matrix case by Popov [87] and a polynomial matrix S,(s) = (140) Anderson (seereferences in [177]). Vs)V - s> The positive-real lemma starts with a (nonunique) minimal realization of Z(s) in the form where $(s)$( - s) is a scalar polynomial that is the greatest common divisor of the denominators of all the entries in Z(s) = J + M(sZ - F)-‘N (144) qss). Now by a partial fraction expansion (or other means) where minimality meansthat the squarematrix F has lowest we can decomposeS,(s) as dimension among all possible F that could be used. Then it states that Z(s) will be positive real if and only if there S,(s) = Z(s) + Z’( -s) (141) exists a real symmetric matrix n 2 0 such that - - where Z(s) contains all the terms corresponding to the left ’ N-IfA4 >. Jl = &y+j=g,,, (145) half-plane zeros of $(s)$( - s). We can identify Z(s) as the [ J+J’ 1-. Laplace transform of the positive-time part of R(t) [cf. Since the .&? matrix is nonnegative definite, it can be (1341,U35)l factored as

Z(s) = m [$Zd(t) + MeFtN]eFst dt (142) A = (146)7 s ; [LW’ ’] 0 [ 1 = +Z + M(sZ - F)-IN. (143) where the column size of L and W is arbitrary. Then an alternative statement is clearly that Z(s) will be positive Now Z(s) can be regarded as the transfer function of a real if and only if there exist matrices L and W such that system with impulse responsefunction [$Z + MeF’N], and therefore given Z(s) we can find M, F, N by using one of Fn + nF’ = LL’ (147a) severalalgorithms [129], [306], [360], going from rational N-i%kf’=LW (147b) transfer functions to state-variable realizations. Thus we have some new methods for the classical multi- J + J’ = WW’. (147c) variate spectral factorization problem. This problem has a The significanceof L and W is that they immediately give a long history and several different algorithms have been factorization of S,(s). In fact, we can check that proposed by Wiener and Masani [12], Youla [26], Davis [28], Yaglom [23], Rozanov [22], Masani [35], Csaki S,(s) = Z(s) + Z’( -s) and Fischer [36], Tuel [125], Strintzis [55], and others. = [W + M(sZ - F)+L][W’ + L’(-sZ - F’)-‘M’. The method based on solution of the nonlinear algebraic equation (137) was first derived in a different way by (148) Anderson [104], who used certain connections, initially There are many matrices n that will satisfy (145), and noted by Youla [26] and Kalman [75], [77], between consequently there will be many factorizations. The family spectral factorization and certain functions long familiar of all such solutions has been studied by Anderson [177], in network theory. Willems [158], Kucera [167], and Canabal [171]. The The point is that the Z(s) in (142),(143) is not an arbitrary maximum and minimum n, when they exist, play a signif- transfer function, but a “driving-point” impedancefunction, icant role in the analysis, with the minimum being the one a property that Brune showed, in a 1931 dissertation that that gives the innovations factorization; the maximum essentially founded network theory, is equivalent to Z(s) relates similarly to a certain dual system. being positive real, viz., that it obeys the conditions 1) all We cannot pursue such discussions any further here, elements of Z(s) are analytic in Re s > 0; 2) Z(s) is real but it may be interesting to note that related and in fact when s is real and positive; 3) Z(s) + Z’(-s) 2 0, if somewhat more general minimality properties were dis- Re s > 0. It can be shown (see,e.g. [177]) that equivalent covered by Krein in 1945 (cited in [22]) and by Masani conditions are that Z(s) + Z/(-s) is a power spectral [237]. The multivariate estimation problem has many density matrix or that fascinating aspects that are not generally known in the engineering literature, but we must content ourselves here Z(s) = m R(t)e-“’ dt to calling attention to the book [30] and two fine surveys s 0 by Masani [35], [45]. where R(e) is a nonnegative-definite function (i.e., a We conclude this section in a more engineeringvein. covariance function). These equivalences enable a con- siderable interplay between results in network theory, 7 Such factorizations of the discrete-time time-invariant generaliza- stability theory, control theory, and estimation (see, e.g., tion of the matrix in (145) are at the heart of the square-root algorithms [771, [871, [1581, [1771). mentioned briefly at the end of Section VI. KAILATH : LINEAR FILTERING THEORY 169
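Condition 3) of the positive-real property can also be checked directly on the iω-axis for the Z(s) of (143). A minimal sketch, reusing the illustrative model of the previous code block (again an assumption made here, not data from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

F = np.array([[-1.0, 0.5], [0.0, -2.0]])
G = np.array([[1.0], [1.0]])
M = np.array([[1.0, 0.0]])
N = solve_continuous_lyapunov(F, -G @ G.T) @ M.T

def Z(s):                                  # eq. (143): Z(s) = (1/2) I + M (sI - F)^{-1} N
    return 0.5 * np.eye(1) + M @ np.linalg.inv(s * np.eye(2) - F) @ N

for w in (0.0, 0.5, 2.0, 10.0):
    Zw = Z(1j * w)
    Sw = Zw + Zw.conj().T                  # Z(s) + Z'(-s) on the axis = the power spectrum
    print(w, np.linalg.eigvalsh(Sw).min() >= 0)   # nonnegative: Z passes the axis test
```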

Transfer-Function Models Versus State Models The assumptionof Gaussiannessis madefor terminological The many successesof the state model in recent years convenience;all statementscan be translated in a standard have led to an unwise neglect of more traditional methods way to apply to just “second-order” processes. and problems. Some exampleswere discussedin Sections The K-L expansionof z( .) is VI and VII. Another example, briefly noted in Section IX, is the use of multivariate transfer-function models and (149) frequency-domain analysis. Such models are essentially the only ones used in the statistical literature, where they where the {Y i( .)} are eigenfunctionsof I?,(. , .), viz., are known as ARMA (autoregressive-movingaverage T models). There are several interesting features associated R,(t,ds = s)Y&Yui(t), Oyi(t> dt, 1,2,. . . . Briefly, supposewe have a processy( *) such that s It is known from the theory of integral equations that the y(n) + A, y(n - 1) + * * f + &y(O) = w(n) {Y i( .)} are orthonormal where T Yi(t)Yj(t) dt = 6ij s w(n) = B,u(n) + B,u(n - 1) + * * * + B,u(n - m) 0 and that (Mercer’s formula) and u( *) is a white-noise sequence.Then w(e) is a moving- average process. If Bi = 0, i 2 1, then J( .) is an auto- R,(t,s) = 2 iiYi(t)Yi(s). (150) regressive process; otherwise we have a mixed or auto- 1 regressive-moving average process.The interesting point is A simple calculation shows that the coefficients {zi} are that the innovations for u(a) are just the innovations for uncorrelated 4.1 Ezizj = ,liSLj. p(n 1n - 1) = -A, y(n - 1) - * * * - A,y(O) - ti(n 1n - 1) If we temporarily write the random process as z(t,o), w so that beingthe probability-spacevariable, then the K-L expansion is a decomposition of a function of two variables into a E(n) = y(n) - $(n 1n - 1) = w(n) - G(n 1n - 1). sum of products of functions of one variable

The processw(a) is often much simpler than y(a), e.g., m z(t,o) = -g Zi(w)Yi(t). may be much less than ~2,or the {Bi} may be constant 1 while the {Ai} are time variant or even nonlinear. These Suchdecompositions are familiar from the partial differential possibilities have already been exploited in [ 1311, [ 1751, equations of physics and their significance in random- [191], but more can be done. A useful stimulus to such processtheory is basically the same. Since the (Yi(*)} are researchesis also provided by the relatively recent work of deterministic, we can replace study of the uncountable Popov [355], [359], Rosenbrock [356], Wang [358], family of random variables z(t,w) by that of the countable Morf [191], Forney [361], Wolovich [363], and others, family {Zi(O)}. The K-L expansionshave the further useful which has uncoveredthe close relationships betweenlinear property that the {zi(w)} are independentbecause z(.) is ARMA models and state-spacemode ls, thus enabling a Gaussian,which simplifies many probabilistic calculations, fruitful combination of time- and frequency-domain e.g., determination of moments and convergenceof sums. methods. The K-L expansion (Karhunen [331], Loeve [329]) was independentlyintroduced by Kac and Siegert [330], [331], X. KARHUNEN-LO~VE EXPANSIONS : CANONICAL to simplify the calculation of the distribution of the output CORRELATIONS AND STATE MODELS power from a nonlinear circuit (limiter-squarer-filter) Those familiar with the textbooks on statistical com- driven by noise. On more abstract grounds, it had already munication theory in the last decadeor so may be surprised been introduced in 1943 by Kosambi [328], an Indian that we have not referred to seriesexpansions of random statistician and Marxist philosopher, and also .by Obukhov processes,and more particularly to the Karhunen-Loeve (in a 1946 dissertation, cited in [346]), Pugachev [339], (K-L) expansions(see, e.g., [341], [342], [40]). Consider and perhaps many others. The popularity of the K-L a scalar Gaussianprocess z( *) with a continuouscovariance expansion(this terminology is now well entrenched)grew R,(t,s) such that from its use in the 1950Ph.D. dissertation [333] of Gren- ander to extendto stochasticprocesses the classicaltheories T T R,‘(t,s) dt ds < co. of statistical estimation and hypothesistesting, which had ss0 0 been developed for finite families of random variables. 170 IEEE TRANSACTTONS ON INFORMATION THEORY, MARCH 1974
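The ARMA/state-space correspondence mentioned above is easy to exhibit: an ARMA recursion can be rewritten in observer companion form, and the two descriptions generate identical outputs. The sketch below uses arbitrary second-order coefficients and the standard companion construction; it is an illustration added here, not a particular algorithm from the papers cited.

```python
import numpy as np

A = np.array([0.5, -0.2])          # y(n) + A1 y(n-1) + A2 y(n-2) = w(n)
B = np.array([1.0, 0.4])           # w(n) = B0 u(n) + B1 u(n-1)
rng = np.random.default_rng(2)
u = rng.standard_normal(300)

# direct ARMA recursion
y = np.zeros(300)
for n in range(300):
    ar = sum(A[i] * y[n-1-i] for i in range(2) if n-1-i >= 0)
    ma = sum(B[i] * u[n-i] for i in range(2) if n-i >= 0)
    y[n] = -ar + ma

# observer companion-form state-space model with the same transfer function
F = np.array([[-A[0], 1.0], [-A[1], 0.0]])
G = np.array([B[1] - A[0]*B[0], -A[1]*B[0]])
H, D = np.array([1.0, 0.0]), B[0]
x, y_ss = np.zeros(2), np.zeros(300)
for n in range(300):
    y_ss[n] = H @ x + D * u[n]
    x = F @ x + G * u[n]

print(np.max(np.abs(y - y_ss)))    # ~ 1e-15: the two descriptions produce identical outputs
```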

The K-L expansion was soon used in estimation problems The easeof solving the smoothing equation (151) is heavily by Davis [335], Slepian [336], and Youla [337]. Its dependent upon the assumption that z(e) and v(a) are presentation in 1958in the pioneering textbook of Daven- uncorrelated, which makes the right side equal to R,(t,s). port and Root [341] and the many applications in th$ Otherwise we would also have [cf. (9)] the term Ez(t)v(s), widely used 1960textbook of Helstrom [342] gave the K-L and now there is no obvious solution. Some reflection will expansion a major place in the literature of the sixties. show that what really yielded the solution (152) of (151) Despite these many successes,however, the use of such was the simultaneous expansions expansions is diminishing for several reasons. One is ,of course that K-L expansions really apply only to Gaussian Ez(t)z(s) = R,(t,s) = $ iiYi(t)Yi(s) (or second-order) processes, while recently martingale theory has enabled significant headway to be made with Ev(t)v(s) = s(t - s) = 2 1 . Yi(t)Yi(s) non-Gaussian processes (see, e.g., the survey [327] by 1 Wong). However, this deficiency does not apply to linear filtering, the main concern of our survey. Here a common ‘Z(t) = C ZiYi(t) complaint is that the K-L expansion does not lend itself Ez~z~ = ;li6ij to recursive calculation because the {zi} and {Y i(.)} are Evivj = 6ij not easily updated as T increases. Also, since the (zi} depend upon the values z(t), for all t E [O,T], K-L ex- and the fact that pansions would seemto be more appropriate for smoothing EZ(t)V(S) ~ 0 j EZiUj = 0, for all i,j. problems (data over [O,T]) rather than causal filtering problems. But how can dependencebetween z(e) and v(.) be reflected For example, consider the smoothing integral equation into the {zi} and the {Vi}, especially a one-sideddependence (9) for uncorrelated signal z( .) and noise v( .), say (in an as in (4)? This question does not seem to have been raised obvious notation) in the engineeringliterature, rigorous or formal, eventhough T dependenceis neededto model many problems, e.g., when H(G) + H(t,4R,(w) d7 = R,(t,s), 0 r. t,s I T. feedbaclcis present. s 0 The answer is that one should not treat z(.) and v(s) (151) separately but should work with the observed process The Mercer formula (150) for R,(t,s) now shows readily y(e) = z(a) + v(a), which has covariance that the solution can be written Ey(t)y(s) = s(t - s) + K(t,s).

H(Q) = g1 A, yi(t>yi(s>, 0 5 t,s I T. (152) K(t,s) will not generally be a covariance when z(e) and I v(e) are correlated, but since Ey(t)y(s) is a covariance one The computational value of such a solution is debatable can show that K(t,s} (assumedto be continuous in t and s) becausethe {ail and {Yi(.)} are difficult to compute, but has only a finite number of negative eigenvalues, and the at least its explicitness is often convenient. No similar Mercer formula extends to such functions as well (Riesz- solution appearspossible for the filtering (or Wiener-Hopf) Nagy [385, p. 2421).The emphasis on the observedprocess equation (10) becauseof the causality constraint 0 I s < y(o), as against the signal and noise processesseparately, t I T. Nevertheless, recently a number of authors [352], is also the key to the development of the recursive Wiener [353], have shown that, with proper interpretations, series- filters; cf. the discussion of (128)-(133) in Section IX. expansion techniques can also be exploited in causal We have passed quickly over the above point that the filtering problems. The key to these results can be found in white noise v(a) or its covariance d(t) do not meet the an old device of Swerling’s (cited in [155]). Swerling conditions for the validity of the K-L expansion (149) or calculates the filtered estimate .2(t ( t) as the Mercer formula (150). However, the validity of the

m expansion ~(t 1 t) = C z^iltYi(t) 1 where the {2il,} are smoothed estimates of the coefficients (zi} given data on [O,t], and therefore can be determined {+i(.)} = any complete orthonormal family on L,[O,T] by solving the smoothing equation(7). The filtered estimate can then be put together as an infinite combination of is usually argued on the grounds that any L,-function f( .) smoothed estimates. (This reflects in a different way our can be correctly calculated as comment in Section II that (10) is a family (but not the At> = T .fW(t - s) ds = 1 .f$i(t> obvious one) of equations of the form (9).) The recursive s Kalman filter can be derived along these lines. 0 While one common criticism of the K-L approach can .fi = ST f(tMi(t> dt, i = 1,2;... thus be partly met, there is another more serious difficulty. 0 KAILATH : LINEAR FILTERING THEORY 171
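The closed form (152) for the smoothing kernel can be verified numerically: discretize (151) on a grid, expand the signal covariance in its eigenfunctions, and compare with a direct matrix solution. In the sketch below the signal covariance is taken, purely for illustration, to be that of a Wiener process on [0, T].

```python
import numpy as np

T, n = 1.0, 300
dt = T / n
t = (np.arange(n) + 0.5) * dt
Rz = np.minimum.outer(t, t)                    # signal covariance R_z(t,s) = min(t,s)

# eigen-expansion of R_z as an integral operator on L2[0,T]
lam, Psi = np.linalg.eigh(Rz * dt)
Psi = Psi / np.sqrt(dt)                        # normalized so that sum_t Psi_i(t)^2 dt = 1

# eq. (152): H(t,s) = sum_i lam_i/(1 + lam_i) Psi_i(t) Psi_i(s)
H_kl = (Psi * (lam / (1 + lam))) @ Psi.T

# direct discretization of (151): H (I + Rz dt) = Rz
H_direct = np.linalg.solve((np.eye(n) + Rz * dt).T, Rz.T).T

print(np.max(np.abs(H_kl - H_direct)))         # ~ 0: agreement to rounding error
```

The two kernels agree to rounding error, which is just the statement that (152) solves (151) whenever Mercer's formula holds for R_z.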

Therefore the {pi} can be chosenas, say, the eigenfunc- results can also be heuristically explained by using white tions {c$~(*)}of someother covariance,say R,(*;), and it is noise, but we shall give a slightly different explanation. usually arguedthat we can write Reproducing Kernel Hilbert Spaces

V(t) = i “iyiCt) + vrem(t) The time functions {Y,(e)} in the K-L expansion are 1 orthogonal over [O,T] in the sensethat where the quantity v,,,( *) needed to make both sides

“equal” is orthogonal to the signal processz( .) and can (yi,y,j> = T Yi(t)Yj(t) dt = 6zj. thereforebe ignored. We do not wish to push this argument f 0 too far, though it has been quite successfullyused in the This property does.not hold for the {Qi} literature. However, lest one get too scornful of what some people call “engineering nonsense,”we shall show how it can be usedto obtain (or let us say conjecture)a result that but if we define a new inner product as appearedonly much later in the mathematical literature (not that this is a new phenomenon). (4.M(.)>Hcw, = T a(t)b(t)* dt + a(O)b(O) s 0 Expansions of a Wiener Process then it is easyto seethat White Gaussian noise can be thought of as the formal derivative of a Wiener processw(e) with continuous cov- ariancefunction for all choices of the {Qi(*)}. This inner product is ap- Ew(t)w(s) = min (t,s). propriate for the Wiener processw(.). For other processes, we can determine suitable inner products that will make Now the eigenfunctions of this covariance are readily the correspondingexpansions have simultaneously ortho- calculated gonal random variables and time functions. The only Y,,(t) = (2/T)‘/” sin [(2n - l)nt/2T], n = I,.** special feature of the K-L expansion is that the time functions are orthogonal with respect to the L,(dt) inner so that the K-L expansionis product. However, there is nothing sacred about L, or about Lebesguemeasure dt; we could, for example, use a w(t) = 2 w,Y,(t) measurep(t)dt, wherep(.) is a weight function such that 1 where the {w,} are uncorrelated random variables with R2(t,s)p(t)p(s) dt ds < UZI. variances{4T2/(2n - 1)2rc2).However, since formally ssI I t This leads to expansionsthat are orthogonal in L,(p(t)dt) u(.) = white noise w(t)= norm. In fact, to avoid dependenceon sucharbitrary weight s0 44 dz , functions p(.) or on the arbitrary family {qi(*)}, it is we can also write desirable to try to seek an “intrinsic” norm associated m w(t)= c Via+(t) just with the covariance of the process w(.). The norm 1 < 2 hz(W)is just such a norm, and it is called a reproducing kernel norm becauseof the property where the {vi} are uncorrelated unit-variance random variables t ,m(~>>Hc,j< ~0. ~,[Wl. Such reproducing kernel Hilbert spaces(RKHS) were Thus we can get many expansionsfor w(a), each with introduced into stochastic process theory by Lotve in uncorrelated random variables {wi}, but with dzfirent 1948(cf. [274, appendixI]), and their usefulnessin statistical families of deterministic functions Qi(*). The {wi} can be applications has beenmade clear, notably by Parzen [347]. calculatedas Many other referencesas well as sometutorial explanations T and applications are discussed in [347], [349], [311]. wi = ~i(t)U(t) dt = T Vi(t) dw(t). However, it should be noted here that engineers have s s 0 0 generally tended to think only of the space L, whenever For most purposesthese expansionsare almost as useful Hilbert spaceis mentioned, perhaps on the grounds that as the K-L expansion(149). The previous result was first L, is isomorphic to any Hilbert space.But the norm is the obtained by Shepp [276] and has since been extendedto most important feature of a Hilbert space,and isometries fairly general(non-Wiener) processes[350]. Thesegeneral (norm-preservingisomorphisms) are more significant than 172 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974 additive isomorphisms. The theory of RKHS shows that used it for the calculation of mutual information [338]. 
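The eigenfunction expansion of the Wiener covariance quoted above is easy to check: with λ_n = 4T²/((2n-1)²π²) and Ψ_n(t) = (2/T)^{1/2} sin[(2n-1)πt/2T], the partial sums of Σ λ_nΨ_n(t)Ψ_n(s) converge to min(t, s). A short numerical check (the grid and truncation level are arbitrary choices):

```python
import numpy as np

T, nterms = 1.0, 200
t = np.linspace(0, T, 101)
n = np.arange(1, nterms + 1)
lam = 4 * T**2 / ((2*n - 1)**2 * np.pi**2)
Psi = np.sqrt(2/T) * np.sin(np.outer(t, (2*n - 1)) * np.pi / (2*T))   # Psi[i, k] = Psi_k(t_i)

approx = (Psi * lam) @ Psi.T                 # truncated Mercer sum for Ew(t)w(s)
exact = np.minimum.outer(t, t)
print(np.max(np.abs(approx - exact)))        # small, and shrinking as nterms grows
```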
there are many other quite different and quite useful Some especially interesting results are obtained when the Hilbert spacesbesides L,. two families of random variables are the past and future There are many other aspects of series expansions that of the same random process. Yaglom has shown that the could be discussed (some are noted in Wong’s survey past and future (and in fact any two disjoint segments)of a [327]), but we shall conclude with a discussion of a some- nondeterministic continuous-time process have a finite what different type of process representation. number of canonical variables if and only if the process has a rational spectral density. Canonical Correlations and State Models Recent studies in automata theory and algebraic system Since series expansions generally contain an infinite theory [119], [129], have shown that the analysis of a number of terms, they cannot be used directly without system in terms of past and future leads naturally to a truncation to a finite number, and this brings up the “state-space” description. Related ideas can be recognized question of which terms to keep. It is usually suggested in an interesting but somewhat obscure paper by Levinson that we keep the coefficients that have maximum variance. and McKean [345]. For a stationary process y(s), they However, calculations of least-squares estimates on this introduce, among others, the subspaces basis have not beenvery satisfactory, especially for processes with rational spectral density. The reason is roughly that B Ca,b)= the linear (Hilbert) space spanned by the random the whole data interval entersequally into the determination of all the series coefficients, whereas, for example in pre- variables (y(r), a I z 5 b) diction, the most recent observations should make a larger B, = B(,,,, = the future contribution. The state-space description of random processes reflects this circumstance better, but there is B- = Bc-m,oj = the past another quite general statistical technique that is roughly equivalent. This is the so-called theory of canonical cor- B+,- = the prqjection of B, on B- relations, which was independently developed in the mid- B thirties by Hotelling and Obukhov (cf. [346]). [Incidentally, o+ = ,?, %.a)~ Hotelling was apparently the first to use eigenvalue-eigen- vector decompositions (the finite analogs of the K-L B, + is called the germ field [343] and B, , _ is the minimal expansion) in statistics (in a 1933 paper in the Journal of splitting field of past and future, viz., it is the smallest field Educational Psychology).] such that, given B,, --) B, is independent of B-. [There Theseauthors proposedthe following method for studying are clearly many splitting fields, e.g., B-; the proof that the interrelations betweentwo (full-rank) families of random B+,- is the minimal field was given by McKean in a variables {X1,X,; * *,X,} and {Y1,Y,;. . Y,}. First find the fascinating paper [344] on multidimensional Brownian linear combinations motion.] These concepts are clearly related to the notion of state, and with this in mind we should not be surprised that

u, = i cc,,xi, vl = $ PliY, B, l - is finite-dimensional if and only if y( a) has a rational 1 1 spectral density or that B,, _ = Bo+ if and only if the that have the largest cross-correlation coefficient spectral density has no zeros [345], [261a]. Recall that the state of a system can be determined from the output and p1 = EU,VI/JEU12EV12. its derivatives, without knowledge of the input, if and only Next find linear combinations U, and V, that are uncor- if the transfer function has no zeros. related with (U,,V,) and have maximum correlation It can be proved that the canonical variables for the sets coefficient, and so on until a new set of random variables {y(r), r r 0} and {y(z), z I 0} are a useful basis for the

(U,,U,,* * *,u,, v1; * * ,V,) has been found that span the state-spaceB, , --) and this fact has recently been cleverly same space as the {X1,X2; . -,X,,, Y,; . *,Y,} and are pair- exploited by Akaike [316] to study state-spacemodeling. wise uncorrelated, except for the pairs (Ui,Vi), i = 1,. * a, In particular he obtains a stochastic interpretation of the min (n,m). The actual calculation can be shown to be Ho-Youla-Silverman and other algorithms for determining equivalent to the solution of a certain eigenvalue problem a minimal state-spacerealization of an impulse response. (see,e.g., [340]). We cannot pursue these matters any further here, though It is reasonable that in making inferences about the as a final comment we may note that the canonical cor- { Yi} from the {Xi}, the first few canonical variables { Ui} relations would seem to be useful in more problems than should give us more information than the first few coef- those to which they have been explicitly applied so far. ficients of the discrete-time analog of the K-L expansion. This observation was made by Yaglom [346], who with XI. CONCLUDING REMARKS Gelfand generalized the theory to the case of continuous- In this survey, we have described several well established time random processes(x(s), s E S} and {y(t), t E T}, and and widely used results and on occasion we have also KAILATH: LINEAR FILTERING THEORY 173 indicated some areas for further work and some new It should be noted that the previously mentioned un- directions.* expected difficulties with models and integrals lent the One important trend is the growing interplay between nonlinear recursionsmore than academicinterest. Wonham linear system theory and linear filtering theory. This is gave the first rigorous proof, but only for signals taking reinforced by the growing realization in both fields of the finitely many values (e.g., finite Markov chains) [97]. importance of understanding the underlying structural Other proofs have since beenconstructed for more general features and invariants of dynamical linear systems and signals, but under rather complicated and physically of the stochasticprocesses that can be generatedfrom them. obscureconditions. In particular, dependenceof the signal This structural knowledge is bound to be useful in all on feedbackobservations was excluded.As noted in Section applications where linear systemsand stochastic processes IX, Fujisaki et al. have recently used the innovations arise, ranging from long-distancecommunication problems approachto overcomemost of thesedifficulties. to the analysis of single circuits. However, although such rigorous proofs are of interest, Another obvious direction is into nonlinear filtering. their main contribution is to highlight the fundamental Actually the last decadehas seena considerableeffort to difficulties in the way of practical optimum filtering. Despite extend the Kalman filter to nonlinear problems. This vast major efforts (see, e.g., the proceedingsof many recent areawill haveto be surveyedseparately, but someimportant symposia) the field is in some disarray. A good account aspectsshould be mentioned here. of someof the more successfulefforts is given by Jazwinski Recursiveformulas for updating the least-squaresestimate [138]. I believe that the situation is somewhat analogous (the conditional mean) were first obtained by Stratonovich to that of linear filtering in the mid-fifties, when the field [65], [66], and Kushner [86]. 
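Returning for a moment to the past-future canonical correlations discussed above, before the concluding remarks: Akaike's observation can be illustrated in a few lines. For a simulated AR(1) process, the sample canonical correlations between a block of past samples and a block of future samples show a single significant value, reflecting the one-dimensional state (minimal splitting subspace). The block lengths, sample size, and AR coefficient below are arbitrary choices added here for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 20000, 5
y = np.zeros(N)
for n in range(1, N):
    y[n] = 0.8 * y[n-1] + rng.standard_normal()     # AR(1): one-dimensional state

idx = np.arange(p, N - p)
past = np.column_stack([y[idx - k] for k in range(1, p + 1)])   # y(t-1), ..., y(t-p)
futr = np.column_stack([y[idx + k] for k in range(p)])          # y(t), ..., y(t+p-1)

# sample canonical correlations via the standard QR/SVD computation
Qp, _ = np.linalg.qr(past - past.mean(0))
Qf, _ = np.linalg.qr(futr - futr.mean(0))
print(np.round(np.linalg.svd(Qp.T @ Qf, compute_uv=False), 3))
# roughly [0.8, ~0, ~0, ~0, ~0]: one significant canonical variable, i.e., state dimension 1
```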
At the moment, one of the chief benefits of having attacked the nonlinear problem was that it brought to the fore certain difficulties associated with the proper definition of the differential and integral equations used to describe nonlinear operations on white noise. For nearly the first time, as far as engineers were concerned, the use of different definitions of integrals made a difference to the answer, rather than just to the "rigor" of the proofs. It also affected the proper modeling of physical nonlinear problems. Wong and Zakai [277], McShane [277a], [277b], and others have made important contributions to this subject, but there are still many unresolved questions. Nevertheless, this work pointed the way to the introduction of martingale theory into communication and control problems, a fact whose significance will probably far outreach any specific nonlinear filtering problems. This point has been discussed at greater length in Wong's survey [327]; see also [298].

It should be noted that the previously mentioned unexpected difficulties with models and integrals lent the nonlinear recursions more than academic interest. Wonham gave the first rigorous proof, but only for signals taking finitely many values (e.g., finite Markov chains) [97]. Other proofs have since been constructed for more general signals, but under rather complicated and physically obscure conditions. In particular, dependence of the signal on feedback observations was excluded. As noted in Section IX, Fujisaki et al. have recently used the innovations approach to overcome most of these difficulties.

However, although such rigorous proofs are of interest, their main contribution is to highlight the fundamental difficulties in the way of practical optimum filtering. Despite major efforts (see, e.g., the proceedings of many recent symposia) the field is in some disarray. A good account of some of the more successful efforts is given by Jazwinski [138]. I believe that the situation is somewhat analogous to that of linear filtering in the mid-fifties, when the field was rapidly grinding to a halt amid a welter of numerous attempts at direct extensions of the Wiener filter. The Kalman filter provided a new impulse that moved things out of the doldrums into a new and fruitful direction. Similarly in nonlinear filtering it may be that attempts to solve the nonlinear filtering problem along the lines of the successful Kalman linear filter are misdirected. Some new approach needs to be uncovered.

Perhaps the way to begin is by lowering our sights by restricting ourselves to parameter estimation rather than to estimation of rather general stochastic processes. Even this is a difficult subject, which will not even be outlined here. However, I bring it up because recently information-theoretic ideas (in the Shannon sense) have been found to be useful in getting error bounds for such finite-parameter problems (see, e.g., [377] and related references), and to a small extent for certain infinite-parameter (or stochastic-process) problems as well, see [383]. Furthermore, recently Blahut [382] and others have begun to show how the basic results of information theory can also be illuminated by the use of some simple hypothesis-testing and parameter-estimation problems. This interchange will be valuable and is in fact somewhat overdue, which brings me to my final point.

At many times in the twenty-five years of information theory there has been a not inconsiderable dissatisfaction with the scope, development, and application of the theory. This may or may not (probably not) have been justified, but it is interesting, at least in my opinion, that the fields of signal detection, estimation, and stochastic processes have not experienced such traumas. It seems to me that the reason lies in the actively pursued connections between these subjects and many other topics.
"Strict-sense" information theory, although a beautiful and important subject, has suffered by its partially deliberate insularity and isolation. The numerous interconnections of statistical signal processing with other fields, as richly displayed even in the small subdomain of linear filtering that we have surveyed, is perhaps the surest guarantee of its continued vitality. It has never been other than a pleasure for me to have worked in such a field.

* It perhaps goes without saying that we have confined ourselves for many reasons to the theory of linear filtering and in particular to the probabilistic theory, where knowledge of all statistical parameters (e.g., means and covariances) is assumed. The development of a statistical theory of filtering will introduce several new dimensions, although it might be noted that the lack of a complete statistical theory does not seem to have significantly limited the successful use of the ideas of the probabilistic theory.

ACKNOWLEDGMENT

The last sentence really says it all: I am grateful to all who have contributed to this field. Several friends have generously provided comments on drafts of this paper.

BIBLIOGRAPHY

A. Wiener Filtering and Related Topics

[1] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, with Engineering Applications. New York: Technology Press and Wiley, 1949. (Originally issued in February 1942, as a classified Nat. Defense Res. Council Rep.)
[2] N. Wiener and E. Hopf, "On a class of singular integral equations," Proc. Prussian Acad., Math.-Phys. Ser., p. 696, 1931.
[3] R. B. Blackman, H. W. Bode, and C. E. Shannon, "Data smoothing and prediction in fire-control systems," Research & Development Board, Washington, D.C., Aug. 1944.
[4] N. Levinson, "A heuristic exposition of Wiener's mathematical theory of prediction and filtering," J. Math. Phys., vol. 25, pp. 110-119, July 1947; reprinted as an Appendix in [1].
[5] H. W. Bode and C. E. Shannon, "A simplified derivation of linear least square smoothing and prediction theory," Proc. IRE, vol. 38, pp. 417-425, Apr. 1950.
[6] L. A. Zadeh and J. R. Ragazzini, "An extension of Wiener's theory of prediction," J. Appl. Phys., vol. 21, pp. 645-655, July 1950.
[7] C. L. Dolph and H. A. Woodbury, "On the relation between Green's functions and covariances of certain stochastic processes and its applications to unbiased linear prediction," Trans. Amer. Math. Soc., vol. 72, pp. 519-550, 1952.
[8] G. F. Franklin, "The optimum synthesis of sampled-data systems," Electron. Res. Lab., Columbia Univ., New York, Tech. Rep. T-6/B, May 1955.
[9] R. Jaffe and E. Rechtin, "Design and performance of phase-lock circuits capable of near-optimum performance over a wide range of input signal and noise levels," IRE Trans. Inform. Theory, vol. IT-1, pp. 66-76, Mar. 1955.
[9a] P. Elias, "Predictive coding," IRE Trans. Inform. Theory, vol. IT-1, pp. 16-33, Mar. 1955.
[9b] R. Price, "On entropy equivalence in the time- and frequency-domains," Proc. IRE, vol. 43, p. 484, Apr. 1955.
[10] A. M. Yaglom, Theory of Stationary Random Functions, translated from the Russian by R. A. Silverman. Englewood Cliffs, N.J.: Prentice-Hall, 1962. (Originally published as a survey paper in 1955.)
[11] M. C. Yovits and J. L. Jackson, "Linear filter optimization with game theory considerations," in IRE Nat. Conv. Rec., pt. 4, pp. 193-199, 1955.
[12] N. Wiener and P. Masani, "The prediction theory of multivariate stochastic processes, Pt. I," Acta Math., vol. 98, pp. 111-150, 1957; Pt. II, ibid., vol. 99, pp. 93-137, 1958.
[13] G. C. Newton, L. A. Gould, and J. F. Kaiser, Analytical Design of Linear Feedback Controls. New York: Wiley, 1957.
[14] M. Shinbrot, "A generalization of a method for the solution of the integral equation arising in optimization of time-varying linear systems with nonstationary inputs," IRE Trans. Inform. Theory, vol. IT-3, pp. 220-225, Dec. 1957.
[15] F. J. Beutler, "Prediction and filtering for random parameter systems," IRE Trans. Inform. Theory, vol. IT-4, pp. 166-171, Dec. 1958.
[16] H. Laning and R. Battin, Random Processes in Automatic Control. New York: McGraw-Hill, 1958.
[17] S. Darlington, "Nonstationary smoothing and prediction using network theory concepts," IRE Trans. Inform. Theory (Special Suppl.), vol. IT-5, pp. 1-14, May 1959.
[18] P. Leonov, "On an approximate method for synthesizing optimal linear systems for separating signals from noise," Automat. Remote Contr., vol. 20, pp. 1039-1048, 1959.
[19] A. V. Balakrishnan, "On a characterization of processes for which the optimal mean-square systems are of specified form," IRE Trans. Inform. Theory, vol. IT-6, pp. 490-500, Sept. 1960.
[20] Y. W. Lee, Statistical Theory of Communication. New York: Wiley, 1960.
[21] V. S. Pugachev, Theory of Random Functions and Its Applications in Automatic Control. Moscow: Goztekhizdat, 1960; 2nd ed., 1964 (Transl.: Reading, Mass.: Addison-Wesley, 1966).
[22] Yu. A. Rozanov, "Spectral properties of multivariate stationary processes and boundary properties of analytic matrices," Theory Prob. Appl. (USSR), vol. 5, pp. 362-376, 1960.
[23] A. M. Yaglom, "Effective solutions of linear approximation problems for multivariate stationary processes with a rational spectrum," Theory Prob. Appl. (USSR), vol. 5, pp. 239-264, 1960.
[24] S. S. L. Chang, Synthesis of Optimum Control Systems. New York: McGraw-Hill, 1961.
[25] E. L. Peterson, Statistical Analysis and Optimization of Systems. New York: Wiley, 1961.
[26] D. C. Youla, "On the factorization of rational matrices," IRE Trans. Inform. Theory, vol. 7, pp. 172-189, July 1961.
[27] A. V. Balakrishnan, "An operator-theoretic formulation of a class of control problems and a steepest descent method of solution," SIAM J. Contr., vol. 1, pp. 109-127, 1963.
[27a] H. C. Hsieh and R. A. Nesbit, "Functional analysis and its applications to mean-square estimation problems," in Modern Control Systems Theory, C. Leondes, Ed. New York: McGraw-Hill, 1965, pp. 97-120.
[28] M. C. Davis, "Factoring the spectral matrix," IEEE Trans. Automat. Contr., vol. AC-8, pp. 296-305, Oct. 1963.
[29] E. Parzen, "A new approach to the synthesis of optimal smoothing and prediction systems," in Mathematical Optimization Techniques, R. Bellman, Ed. Berkeley, Calif.: Univ. California Press, 1963, pp. 75-108.
[30] Yu. A. Rozanov, Stationary Random Processes. Moscow: Fizmatgiz, 1963 (Transl.: A. Feinstein, San Francisco: Holden-Day, 1967).
[31] V. F. Pisarenko and Yu. A. Rozanov, "On some problems for stationary processes reducing to equations of the Wiener-Hopf type," Probl. Pered. Inform. (in Russian), vol. 14, pp. 113-135, 1963.
[32] P. Whittle, Prediction and Regulation. New York: Van Nostrand Reinhold, 1963.
[33] C. W. Helstrom, "Solution of the detection integral equation for stationary filtered white noise," IEEE Trans. Inform. Theory, vol. IT-11, pp. 335-339, July 1965.
[34] K. Steiglitz, "The equivalence of digital and analog signal processing," Inform. Contr., vol. 8, pp. 455-467, 1965.
[35] P. Masani, "Recent trends in multivariate prediction theory," in Multivariate Analysis, P. R. Krishnaiah, Ed. New York: Academic Press, 1966.
[36] F. Csaki and P. Fischer, "On the spectrum factorization," Acta Tech. Acad. Sci. Hung., vol. 58, pp. 145-168, 1967.
[37] S. G. Mikhlin and K. L. Smolitsky, Approximate Methods for Solution of Differential and Integral Equations. New York: American Elsevier, 1967.
[38] B. Widrow, P. E. Mantey, L. J. Griffiths, and B. Goode, "Adaptive antenna systems," Proc. IEEE, vol. 55, pp. 2143-2159, Dec. 1967.
[39] J. F. Claerbout, "A summary, by illustrations, of least squares filters with constraints," IEEE Trans. Inform. Theory, vol. IT-14, pp. 269-272, Mar. 1968.
[40] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968. Reviewed by R. A. Scholtz, IEEE Trans. Inform. Theory (Book Rev.), vol. IT-14, pp. 612-613, July 1968.
[41] I. F. Blake, "Linear filtering and piecewise linear correlation functions," IEEE Trans. Inform. Theory, vol. IT-15, pp. 345-349, May 1969.
[42] W. M. Brown and R. B. Crane, "Conjugate linear filtering," IEEE Trans. Inform. Theory, vol. IT-15, pp. 462-465, July 1969.
[43] D. Slepian and T. T. Kadota, "Four integral equations of detection theory," SIAM J. Appl. Math., vol. 17, pp. 1102-1117, 1969.
[44] D. L. Snyder, The State-Variable Approach to Continuous Estimation, with Applications to Analog Communication Theory. Cambridge, Mass.: M.I.T. Press, 1969. Reviewed by E. C. Power, IEEE Trans. Inform. Theory (Book Rev.), vol. IT-18, p. 314, Mar. 1972.
[45] P. Masani, "Review of Stationary Random Processes, by Yu. A. Rozanov," Ann. Math. Statist., vol. 42, pp. 1463-1467, 1971.
[46] J. J. Stiffler, Theory of Synchronous Communications. Englewood Cliffs, N.J.: Prentice-Hall, 1971. Reviewed by R. A. Scholtz, IEEE Trans. Inform. Theory (Book Rev.), vol. IT-18, pp. 218-219, Jan. 1972.
[47] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part II-Nonlinear Modulation Theory. New York: Wiley, 1971. Reviewed by J. B. Thomas, IEEE Trans. Inform. Theory (Book Rev.), vol. IT-18, pp. 450-451, May 1972.
[48] K. Yao, "On the direct calculations of MMSE of linear realizable estimator by Toeplitz form method," IEEE Trans. Inform. Theory (Corresp.), vol. IT-17, pp. 95-97, Jan. 1971.
[49] —, "An alternative approach to the linear causal least-square filtering theory," IEEE Trans. Inform. Theory, vol. IT-17, pp.

232-240, May 1971. linear dynamic systems,” Aeronaut. Syst. Div., Wright-Patterson [50] Yu. A. Rozanov, “Some approximation problems in the theory AFB, Ohio, Tech. Rep. ASD-TDR-63-119, Feb. 1963. of stationary processes,” J. Multivariable Anal., vol. 2, pp. [751 R. E. Kalman. “Lvauunov functions for the nroblem of Lur’e 135-144, June 1972. in automatic control,” Proc. Nat. Acad. Sci., &vol. 49, pp. 201- [51] J. A. Cochran, Analysis of Linear Integral Equations. New 205, Feb. 1963. York: McGraw-Hill, 1972. [761 -, “Mathematical description of linear dynamical systems,” [52] W. C. Lindsey, Synchronization Systems in Communication and SIAM J. Contr., vol. 1, pp. 152-192, 1963. Control. Englewood Cliffs, N.J. : Prentice-Hall, 1972. [771 -, “On a new characterization of linear passive systems,” [53] L. Prouza, “On generalized linear discrete inversion filters,” in Proc. 1st Annu. Allerton Conf Circuit and System Theory, Nov. f$ernrn$ka, vol. 8, pp. 264-267, 1972; also, ibid., vol. 6, pp. 225- 1963, pp. 456-470. . . I781 A. G. J. MacF$ane, “An eigenvector solution of the optimal [54] J. Snyders, “Error expressions for optimal linear filtering of y;;y regulator, J. Electron. Contr., vol. 14, pp. 643-654, June stationary processes,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 574582, Sept. 1972. [791 E. A. Robinson, “Mathematical development of discrete filters [54a] J. Snyders, “Error formulae for optimal linear filtering, pre- for detection of nuclear explosions,” J. Geophys. Res., vol. 68, diction and interpolation of stationary time series,” Ann. Math. DD. 5559-5567. 1963. Statist., vol. 43? pp. 1935-1943, 1972. WI E Swerling, “Comment on ‘A statistical optimizing navigation [54b] M. G. Strintzis, “A solution to the matrix factorization prob- procedure for space flight’,” AZAA J. vol. 1, p. 1968, Aug. 1963. lem,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 225-232, WI P. Whittle, “On the fitting of multivariate autoregressions and Mar. 1972. the approximate canonical factorization of a spectral density [55a] D. G. Messerschmitt, “A geometric theory of intersymbol matrix,” Biometrika, vol. 50, pp.129-134, 1963. interference, Part I: Zero-forcing and decision-feedback equaliza- WI K. Astrom and S. Wensmark, +‘Numerical identification of tion,” BeN Syst. Tech. J., vol. 52, pp. 1483-1519, 1973. stationary time-series,” in Proc. 6th Znt. Instrumentation and [55b] R. W. Lucky, “A survey of the communication theory literature: Measurements Congr., Sept. 1964. 1968-1973,” ZEEE Trans. Inform. Theory, vol. IT-19, pp. 725- [831 R. H. Battin. Astronautical Guidance. New York: McGraw- 739, Nov. 1973. Hill, 1964. [55c] J. Salz, “Optimum mean-square decision feedback equaliza- P341 R. B. Blackman, “Methods of orbit refinement,” Bell Syst. tion,” Bell Syst. Tech. J., 1974. Tech. J., vol. 43, pp. 885-909, May 1964. [84a] H. Cox, “On the estimation of state variables and parameters B. Recursive Wiener and Kalman Filtering for noisy dynamic systems,” IEEE Trans. Automat. Contr., vol. WI A. G. Carlton and J. W. Follin, Jr., “Recent developments in AC-9, pp. 5-12, Jan. 1964. fixed and adaptive filtering,” NATO Advanced Group for [85] R. E. Kalman, “When is a linear control system optimal?,” Aerospace R&D, AGARDograph 21, 1956. Trans. ASME, Ser. D, J. Basic Eng., vol. 86, pp. 51-60, June [571 J. E. Hanson, “Some notes on the application of the calculus 1964. of variations to smoothing for finite time, etc.,” Appl. Phys. [86] H. J. 
Kushner, “On differential equations satisfied by condi- Lab., Johns Hopkins Univ., Baltimore, Md., Internal Memo tional probability densities of Markov processes,” SIAM J. BBD-346, 1957. Contr., vol. 2, pp. 106-119, 1964. 1581A. J. F. Siegert, “A systematic approach to a class of problems [871 V. M. Popov, “Hyperstability and optimality of automatic in the theory of noise and other random phenomena, Pt. 11,” systems with several control functions,” Rev. Roum. Sci. Tech., IRE Trans. Inform. Theory, vol. IT-13, pp. 38843, Mar. 1957; vol. 9, pp. 629-690, 1964. g Hfu&!., vol. IT-4, pp. 4-14, Mar. 1958. [87a] -! “Incompletely controllable positive systems and ap- [591 “Recursion formulas for growing memory digital plications to optimization and stability of automatic control filters,” ZRE Trans. Inform. Theory, vol. IT-4, pp. 24-30, Mar. systems.” Rev. Roum. Sci. Tech.. Electrotech. Enem..1 vol. 12, 1958. pp. 337L357, 1967. [W B. Friedland, “Least-squares filtering and prediction of non- Rw G. L. Smith, “Multivariable linear filter theory applied to ;t&onary sampled data, Inform. Contr., vol. 1, pp. 297-313, ;g4e vehicle guidance,” SIAM J. Contr., vol. 2, pp. 19-32, Lb11P. Swerling, “First-order error propagation in a stagewise [891 R. Ll Stratonovich and Yu. G. Sosulin, “Optimal detection of a smoothing procedure for satellite observations,” J. Astronaut. Markov process in noise,” Eng. Cybern., vol. 6, pp. 7-19, Oct. Sci., vol. 6? pp. 46-52, Autumn 1959; see also “A proposed 1964. stagewise differential correction procedure for satellite tracking [901 A. E. Bryson and D. E. Johansen, “Linear filtering for time- and prediction,” RAND Corp. Rep. P-1292, Jan. 1958. varying systems using measurements containing colored noise,” [Ql R. E. Kalman, “On the general theory of control,” in Proc. 1st IEEE Trans. Automat, Contr,, vol. AC-IO, pp. 4-10, Jan. 1965. ZFA C Cong. London : Butterworth, 1960. [911 R. S. Bucy, “Nonlinear filtermg theory,” IEEE Trans. Automat. K31 -, “Contributions to the theory of optimal control,” Bo/. Contr. (Corresp.), vol. AC-lo, p. 198, Apr. 1965. Sot. Mat. Mex., vol. 5, pp. 102-119, 1960. 1921H. H. Rosenbrock, “On the connection between discrete linear 1641 -, “A new approach to linear filtering and prediction prob- filters and some formulae of Gauss,” in Act. Congr. Automatique lems,” J. Basic Eng., vol. 82, pp. 3445, Mar. 1960. Theorique. Paris: Dunod, 1965. [651R. L. Stratonovich, “Apphcation of the theory of Markov 1931F. C. Schweppe, “Evaluation of likelihood functions for Gaus- processesfor optimum filtration of signals,” Radio Eng. EIectron. sian signals,” IEEE Trans. Inform. Theory, vol. IT-11, pp. 61-70, Phys. (USSR), vol. 1, pp. l-19, Nov. 1960. 1965. [661- “Conditional Markov process theory,” Theory Prob. [941 E. B. Stear, “Shaping filters for stochastic processes,” in Modern Appl: (USSR), vol. 5, pp. 156178, 1960 Control Systems Theory, C. T. Leondes, Ed. New York: [671 M. Blum, :‘A stagewise parameter estimation procedure for McGraw-Hill, 1965, pp. 121-155. correlated data,” Numer. Math., vol. 3, pp. 202-208, 1961. r951 P. Whittle, “Recursive relations for predictors of non-stationary [68] R. E. Kalman, “New methods of Wiener filtering theory,” processes,” J. Roy. Statist, Sot., Ser. B, vol. 27, pp. 525-532, in Proc. 1st Symp. Engineering Applications of Random Function 1965. Theory and Probability, J. L. Bogdanoff and F. Kozin, Eds. [961 R. A. Wiggins and E. A. Robinson, “Recursive solution to the New York: Wiley, 1963, pp. 
270-388; also, RIAS, Baltimore, multichannel filtering problem,” J. Geophys. Res., vol. 70, Md., Tech. Rep. 61-1, 1961. pp. 1885-1891, Apr. 1965. [69] R. E. Kalman and R. S. Bucy, “New results in linear filtering [97l W. M. Wonham, “Some applications of stochastic differential and prediction theory,” Trans. ASME, Ser. D, J. Basic Eng., eauations to optimal nonlinear filtering,-.” SIAM J. Contr., vol. 83, pp. 95-107, Dec. 1961. vd. 2, pp. 347-369, 1965. [70] R. H. Battin. “A statistical ontimizine navieation nrocedure for [98] B. D. 0. Anderson, “Time-varying spectral factorization,” space flight,” J. Amer. Rocked Sot., v& 32, ip. 168’1-1692,1962. Stanford Electron. Lab., Stanford, Calif., Tech. Rep. SEL-66-107, I711 J. D. McLean, S. F. Schmidt, and L. A. McGee, “Optimal Oct. 1966. filtering and linear prediction applied to a midcourse navigation [98a] P. Businger and G. H. Golub, “Linear least-squares solutions system for the circumlunar mission,” NASA Rep. TND-1208, by Householder transformation,” Math. Comput., vol. 20, pp. 1962. 325-328, 1966. [721 S. F. Schmidt., “State-space techniques applied to the design of a [99] R. E. Mortensen, “Optimal control of continuous-time stochastic space navigation system,” m Proc. 1962 Joint Automatic Control systems,” Ph.D. dissertation, Univ. California, Berkeley, 1966. Conf, Paper 11-3. [loo] J. E. Potter, “Matrix quadratic solutions,” SIAM J. Appl. r731 V. A. Yakubovic, “The solution of certain matrix inequalities Math., vol. 14, pp. 496501, May 1966. in automatic control theory,” Dokl. Akad. Nauk SSSR, vol. 143, [loll A. N. Shiryaev, “Stochastic equations of nonlinear filtration pp. 1304-1307, 1962. for purely discontinuous Markov processes,” Probl. Peredach. [741 A. E. Bryson and M. Frazier, “Smoothing for linear and non- Inform., vol. 2, pp. 3-22, 1966. 176 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974

[IO21 R. L. Stratonovich and Yu. G. Sosulin, “Optimum reception of A. T. Bharucha-Reid, Ed. New York: Academic Press, 1969, signals in nonGaussian noise,” Radio Eng. Electron., vol. 11, ch. 2. pp. 497-507, Apr. 1966. H331 L. E. Zachrisson, “On optimal smoothing of continuous-time [103] D. C. Youla, “The synthesis of linear dynamical systems from Kalman processes,” Inform. Sci., vol. 1, pp. 143-172, 1969. prescribed weighting patterns,” SIAM J. Appl. Math., vol. 14, [I341 L. H. Brandenburg and M. E. Meadows, “Shaping filter rep- pp. 527-549, May 1966. resentation of nonstationary colored noise,” IEEE Trans. [IO41 B. D. 0. Anderson, “An algebraic solution to the spectral Inform. Theory, vol. IT-17, pp. 26-31, Jan. 1971; see also L. factorization problem,” IEEE Trans. Automat. Contr., vol. Brandenburg, Ph.D. dissertation, Columbia Univ., New York, AC-12, pp. 410-414, Aug. 1967. .June 1970. [105] B. D. 0. Anderson and J. B. Moore, “Solution of a time- v351 R. W. Brockett, Finite-Dimensional Linear Systems. New varying Wiener filtering problem,” Electron. Lett., vol. 3, pp. York: Wiley, 1970. 562-563, Dec. 1967. U361 R. S. Bucy, “Linear and nonlinear filtering,” Proc. IEEE, vol. 58, [106] K. J. Astrom, Introduction to Stochastic Control Theory. New pp. 854-864, June 1970. York: Academic Press, 1967. [1371 A. Gersho and D. J. Goodman, “Projecting filters for recursive [106a] A. Bjorck and G. H. Golub, “Iterative refinement of linear prediction of discrete-time processes,” Bell Syst. Tech. J., vol. 49, least-squares solutions by Householder transformation,” BIT, pp. 2377-2403, Nov. 1970. vol. 7, pp. 322-337, 1967. Cl381 A. H. Jazwinski, Stochastic Processes and Filtering Theory. PO71 J. Burg, “Maximum entropy spectral analysis,” in Proc. 37th New York: Academic Press, 1970. Annu. Meet. Sot. Explor. Geophys., 1967. [I391 C. T. Leondes, Ed., “Theory and applications of Kalman UW T. E. Duncan, “Probability densities for diffusion processes filtering,” NATO Advanced Group for Aerospace R&D, with applications to nonlinear filtering theory and detection AGARDOgraph 139, Feb. 1970. theory,” Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., [I401 J. B. Moore and B. D. 0. Anderson, “Spectral factorization of Stanford, Calif., June 1967. Time-varying covariance functions,” Math. Syst. Theory, vol. 4, uo91 H. J. Kushner. Stochastic Stabilitv and Control. New York: pp. 10-23, 1970. Academic, 1967. [1411 R. L. Stratonovich, “Detection and estimation of signals in t1101E. A. Robinson, Multichannel Time-Series Analysis with Digital noise when one or both are non-Gaussian,” Proc. IEEE, vol. 58, Computer Programs. San Francisco, Calif., Holden-Day, 1967. E. ;7C-;;9,Gya~ 1970. [llll W. M. Wonham, “Lecture notes on stochastic optimal control,” [I421 Autoregressive model fitting for control,” Ann. Div. Appl. Math., Brown Univ., Providence, R.I., Rep. 67-1, Inst. Statis;. Math., vol. 23, pp. 163-180 1971. 1967. D431 B. D. 0. Anderson and J. B. Moore, Linear Optimal Control. H121C. Bruni, A. Isidori, and A. Ruberti, “A method of factorization Englewood Cliffs, N.J., Prentice-Hall, 1971. of the impulse-response matrix,” IEEE Trans. Automat. Contr. [I441 __ “The Kalman-Bucy filter as a true time-varying Wiener (Corresp.), vol. AC-13, pp. 739-741, Dec. 1968. filter:” IEEE Trans. Syst., Man, Cybern., vol. SMC-1, pp. 119- u131 R. S. Bucy and P. D. Joseph, Filtering for Stochastic Processes 128, Apr. 1971. with Applications to Guidance. New York: Wiley, 1968. P451 M. 
Athans, Ed., Special Issue on Linear-Quadratic-Gaussian [I141 L. D. Collins, “Realizable whitening filters and state-variable Problem, IEEE Trans. Automat. Contr., vol. AC-16, Dec. realizations,” Proc. IEEE (Lett.), vol. 56, 100-101, Jan. 1968. 1971. PI51 T. E. Duncan, “Evaluation of likelihood functions,” Inform. [I461 R. Bellman and E. D. Denman, Eds., “Invariant imbedding,” Contr.. vol. 13, pp. 62-74, July 1968. in,. Lecture.--. Notes in Operations Research, vol. 52. New York: PI61 T. Kailath, “An-innovations -approach to least-squares estima- sponger, 19 11. tion-Part I: Linear filtering in additive white noise.” IEEE [I471 G. Epstein, “On finite-memory, recursive filters,” ZEEE Trans. Trans. Automat. Contr., vol.-AC-13, pp. 646655, De& 1968. Inform. Theory (Corresp.), vol. IT-17, pp. 486487, July 1970; I1171 T. Kailath and P. Frost, “An innovations approach to least- see also ibid,, p. 614, Sept. 1971, p. 753, Nov. 1971. squares estimation, Part II: Linear smoothing in additive white I1481 P. G. Kaminski “Square-root filtering and smoothing for noise,” IEEE Trans. Automat. Cont., vol. AC-13, pp. 655-660, discrete processes:” Ph.D. dissertation, Stanford Univ., Stanford, Dec. 1968. Calif., Sept. 1971. [liS] G. Kallianpur and C. Striebel, “Estimation of stochastic systems: [148a] P. G. Kaminski, A. E. Bryson, Jr., and S. F. Schmidt, “Discrete arbitrary system process with additive white observation square root filtering: A survey of current techniques,” IEEE errors,” Ann. Math. Statist., vol. 39, pp. 785-801, 1969. Trans. Automat. Contr., vol. AC-16, pp. 727-735, Dec. 1971. H191 R. E. Kalman. “Lectures on controllability and observability,” [149] T. Kailath and R. A. Geesey, “An innovations approach to Lecture Notes; CIME, Bologna, 1968. - least-squares estimation-Part IV: Recursive estimation given WI R. S. Lipt’ser and A. N. Shiryaev, “Nonlinear filtering of Markov lumped covariance functions,” IEEE Trans. Automat. Contr., diffusion processes,” Proc. Steklov Inst. Math. (English Transl.), vol. AC-1.6,. pp. 720-727, Dec. 1971. vol. 104, pp. 163-218, 1968. [150] D. G. Lamiotis, “Optimal nonlinear estimation,” Znt. J. Contr., WI ~ “Nonlinear interpolation of Markov diffusion processes,” vol. 14, pp. 1137-1148, 1971. Theory Prob. Appl. (USSR). vol. 13, pp. 564-583, 1968. [150a] --, “Optimal linear smoothing: Continuous-data case,” Znt. w-1 J. B. Moore and B. D. 0. Anderson, “Extensions of quadratic J. Contr., vol. 17, pp. 921-930? May 1973. minimization theory, I,” Znt. J. Contr., vol. 7, pp. 465472, 1968. [151] K. Martensson, “On the matrix Riccati equation,” Znform. Sci., r1231 F. C. Schweppe and H. K. Knudsen, “The theory of amor- vol. 3, pp. 1749, 1971. phous cloud trajectory prediction,” IEEE Trans. Znform Theory, [151a] J. K. Omura, “Optimal receiver design for convolutional codes vol. IT-14, pp. 415-427, May 1968. and channels with memory via control theoretical concepts,” ~1241 E. Stear and A. Stubberud. “Ootimal filtering for Gauss- Inform. Sci., vol. 3, pp. 243-266, 1971. Markov noise,” Znt. J. Contr.; ~01.~8,pp. 123-130, 1968. u521 I. B. Rhodes, “A tutorial introduction to estimation and filter- n251 W. G. Tuel, “Computer algorithm for spectral factorization ing,” IEEE Trans. Automat. Contr., vol. AC-16, pp. 688-707, of rational matrices,” IBM J. Res. Develop., vol. 12, pp. 163-170, Dec. 1971. Mar. 1968. [I531 A. P. Sage and J. L. Melsa, Estimation Theory with Applications W61 W. M. Wonham, “On a matrix Riccati equation of stochastic to Communication and Control. 
New York: McGraw-Hill, control,” SIAM J. Contr., vol. 6, pp. 681-697, Nov. 1968. 1971. Reviewed by K. Yao, IEEE Trans. Znform. Theory (Book [127] B. D. 0. Anderson, J. B. Moore, and S. G. Loo, “Spectral Rev.), vol. IT-19, pp. 374-376, May 1973. factorization of time-varying covariance functions,” IEEE [154] M. D. Srinath and P. K. Rajasekaran, “Estimation of randomly Trans. Inform. Theory, vol. IT-15, pp. 550-557, Sept. 1969. occurring stochastic signals in Gaussian noise,” IEEE Trans. VW A. E. Bryson and Y. C. Ho, Applied Optimal Control. Waltham, Znform. Theory, vol. IT-17, p. 206, Mar. 1971. Mass. ; Blaisdell, 1969. u551 P. Swerling, “Modern state estimation methods from the [128a] P. Dyer and S. McReynolds, “Extension of square-root filtering viewpoint of the method of least squares,” IEEE Trans. Automat. to include process noise,” J. Optimiz. Theory Appl., vol. 3, Contr. vol. AC-16, pp. 707-720, Dec. 1971. pp. 444459, 1969. [I561 A. van den Bos, “Alternative interpretation of maximum [129] R. E. Kalman, P. Falb, and M. A. Arbib, Topics in Mathematical entropy spectral analysis,” IEEE Trans. Inform. Theory (Corresp.), System Theory. New York: McGraw-Hill, 1969. vol. IT-17. DD. 493-494. Julv 1971. [129a] N. Morrison, Introduction to Sequential Smoothing and Predic- [I571 H. L. Van’ Trees, Detection, Estimation, Modulation Theory, tion. New York: McGraw-Hill,, 1969. Pt. III-Radar-Sonar Signal Processing and Gaussian Signals in t1301 N. E. Nahi, “Optimum recursive estimation with uncertain Noise. New York: Wiley, 1971. observation.” IEEE Trans. Inform. Theory._ , vol. IT-15. __DU. 457- _11581 - J. C. Willems, “Least squares stationary optimal control and 462, July 1969. the algebraic Riccati equation,” IEEE Trans. Automat. Contr., u311 J. Rissanen and L. Barbosa, “Properties of infinite covariance vol. AC-16, pp. 621-634, Dec. 1971. matrices and stability of optimum predictors,” Inform. Sci., [158a] --, “Dissipative dynamical systems, Part II: Linear systems vol. 1, pp. 221-236, 1969. with quadratic supply rates,” Arch. Ration. Mech. Anal., vol. 45, U321 W. M. Wonham, “Random differential equations in control pp. 352-393, 1972. theory,” in Probabilistic Methods in Applied Mathematics, [159] M. G. Wood, J. B. Moore, and B. D. 0. Anderson, “Study of an KAUATH: LINEAR FILTERING THEORY 177

integral equation arising in detection theory,” IEEE Trans. 1973. See also IEEE Trans. Automat. Contr., vol. AC-19, Aug. Znform. Theory, vol. IT-17, pp. 677-686, Nov. 1971. 1974. [160] A. V. Balakrishnan, “System theory and stochastic optimization,” [f85] G. S. Sidhu, T. Kailath, and M. Morf, “Development of fast in Proc. NATO 1972 Advanced Study Institute on Network and algorithms via innovations decompositions,” in Proc. 7th Hawaii St@aI Theory, Bournemouth, England, 1972. Znt. Conf Syst. Sci., Honolulu, Hawaii, Jan, 1974. [16Oa] A. F. Fath, “Computational aspects of the linear optimal [186] R. Sh. Lipt’ser and A. N. Shiryaev, “Statistics of conditionally regulator problem,” IEEE Trans. Automat. Contr., vol. AC-14, Gaussian random sequences,” in Proc. 6th Berkeley Symp. pp. 547-550, Oct. 1969; also, A. E. Bryson and W. E. Hall, Mathematics, Statistics, and Probability, vol. III. Berkeley, “Optimal control and filter synthesis by Eigenvector decomposi- Calif.: Univ. California Press, 1973, pp. 389422. tion,” Dep. Aeron. and Astron., Stanford Univ., Stanford, [187] B. P. Molinari, “The stabilizing solution of the algebraic Riccati Cahf., Rep. 436, Dec. 1971. equations,” SIAM J. Contr., vol. 11, pp. 262-272, May 1973. [161] J. L. Casti, R. E. Kalaba, and V. K. Murthy, “A new initial- [188] -, “Equivalence relations for the algebraic Riccati equation,” value method for on-line filtering and estimation,” IEEE Trans. SIAM J. Contr., vol. 11, pp. 272-286, May 1973. Inform. Theory, vol. IT-18, pp. 515-518, July 1972. [189] J. B. Moore and P. Hetrakul, “Optimal demodulation of PAM [162] P. E. Gill, G. H. Golub, W. Murray,. and M. A. Saunders, signals,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 188-197, “Methods for modifying matrix factonzations,” Cornput. Sci. Mar. 1973. Dep., Stanford Univ., Stanford, Calif., Rep. CS-72-322, Nov. [190] H. J. Payne and L. M. Silverman, “On the discrete time alge- 1972. braic Riccati equation,” IEEE Trans. Automat. Contr., vol. [I631 K. L. Hitz and B. D. 0. Anderson, “Iterative method of com- AC-18, pp. 226234, June 1973. puting the limiting solution of the matrix Riccati differential [19Oa] M. Aoki, “On the subspaces associated with partial recon- equation,” Proc. Inst. Elec. Eng. vol. 119, pp. 140221406,Sept. struction of state vectors, the structure algorithm and the pre- 1972. dictable directions of Riccati equations,” IEEE Trans. Automat. [164] F. Itakura, “Extraction of feature parameters of speech by Contr., vol. AC-18, pp. 399400, Aug. 1973. statistical methods,” in Proc. 8th Symp. Speech Information [191] M. Morf, “Discrete-time multivariable systems,” Ph.D. disserta- Processing, Feb. 1972. tion,, Stanford Univ., Stanford, Calif., 1974. [165] T. Kailath, “A note on least-squares estimation by the innova- [I921 J. Rtssanen, “A fast algorithm for optimum predictors,” IEEE tions method,” J. SIAM Contr:, vol. 10, pp. 477486, Aug. 1972. Trans. Automat. Contr., vol. AC-18, p. 555, Oct. 1973. [166] -, “Some Chandrasekhar-type algorithms for quadratic [193] --, “Algorithms for triangular decomposition of block Hankel regulator problems,” m. Proc. IEEE Conf Decision and Control and Toeplitz matrices with application to factorizing positive and 11th Symp. Adaptive Processes,Dec. 1972, pp. 219-223. matrix polynomials,” Math. Comput., vol. 27, pp. 147-154, 1973. [I671 V. KuEera, “A contribution to matrix quadratic equations,” [193a] F. C. Schweppe, Uncertain Dynamic Systems. Englewood IEEE Trans. Automat. Contr., vol. AC-17, pp. 344346, June Cliffs, N.J. 
: Prentice-Hall, 1973. -_1973 .-. [193b] A. V. Balakrishnan, “Stochastic differential systems, Pt. I,” [167a] --, “A review of the matrix Riccati equation,” Kybernetika, in Lecture Notes in Economics and Mathematical Systems, vol. 9, pp. 42-61, 1973. vol. 84. New York: Springer, 1973. Hf31 H. Kwakernaak and R. Sivan, Linear Optimal Control Svstems. 1193cl H. J. Pavne and L. Silverman. “Matrix Riccati eauations and New York: Wiley, 1972. -system structure,” in Proc. IEEE Conf on Decision and Control, [I691 J. T. H. Lo, “Finite-dimensional sensor orbits and optimal pp. 558-563, Dec. 1973. nonlinear filtering,” IEEE Trans. Inform. Theory, vol. IT-18, [193d] M. Morf and T. Kailath, “Square-root algorithms for linear pp. 583-588, Sept. 1972. least-squares estimation and control,” in Proc. 8th Princeton [1701 D. Rappaport, “Constant directions of the Riccati equation,” Symp. Information and System Science, 1974. Automatica, vol. 8, pp. 175-186, Mar. 1972. See also IEEE Trans. Automat. Contr., vol. AC-15, pp. 5355540, Oct. 1970. 11711 J. M. Rodriguez-Canabal, “The geometry of the Riccati equa- C. Some Early Mathematical Work tion,” Ph.D. dissertation, Univ. Southern California, Los 11941 G. Galileo, “Dialog0 Sopra i due Massimi Sestemi de1 Mondo:

Angeles.Urn --, June--~~- --1972. --- Tolemaico e Conernicano. Florence: Landini. 1632 (Transl. : r1721 D. L. Snyder, “Filtering and detection for doubly stochastic Berkeley, Calif. : ‘Univ California Press, 1953.) ’ . Poisson processes,” IEEE Trans. Inform. Theory, vol. IT-18, [I951 Jacopo Francesco, Count Riccati, “Animadversationes in pp. 91-101, Jan. 1972. aequationes differentiales secundi gradus,” in Actorum Erudi- r1731 H. Wakita, “Estimation of the vocal tract shape by optimal torum quae Lipsiae publicantur, Suppl. 8, pp. 6673, 1724. inverse filtering,” Speech Commun. Res. Lab., Inc., Santa [I961 R. Adrain. “Research concernma the mobabilities of the errors Barbara, Calif., Mono. 9, July 1972. See also IEEE Truns. Audio which happen in making observations,” Analyst, vol. 1, pp. Electroacoust., vol. AU-21, pp. 417-427, Oct. 1973. 193-209, 1808. 11741 G. T. Wilson, “The factorization of matrical spectral densities,” [I971 C. F. Gauss, Theoria Motus Corporum Coelestium in Sectionibus SIAM J. Appl. Math., vol. 23, pp. 420-426, Dec. 1972. Conicis So/em Ambientum. Hamburg.-, 1809 (Transl. : Dover, New u751 H. Aasnaes and T. Kailath, “An-innovations approach to least- York: 1963). squares estimatton-Pt VII: Some applications of vector H981 A. M. Legendre, “Methode des moindres qua&s, pour trouver autoregressive-moving average models,” IEEE Trans. Automat. le milieu le plus probable entre les resultats de differentes Contr., vol. AC-18, pp. 601-607, Dec. 1973. observations,” Mem. Inst. France, pp. 1499154,1810. V761 H. Aasnaes and T. Kailath, “Robustness of linear-least squares [I991 G. H. Hardy, “The mean value of the modulus of an analytic filtering algorithms,” in Proc. 1973 Joint Automatic Control Conf function ” Proc. London Math. Sot., vol. 14, pp. 269-277, 1915. See also IEEE Trans. Automat. Contr., vol. AC-19, June 1974. t2w G. Sze&. “Ein Grenszwertsatz uber die Toeohtzschen Deter- v77l B. D. 0. Anderson and S. Vongpanitlerd, Network Analysis and minanten einer reellen positiven Funktion,” Math. Ann., vol. 76, Synthesis-A Modern Systems Theory Approach. Englewood pp. 490-503, 1915. Cliffs, N.J. : Prentice-Hall, 1973. 12011 ---. “Beitraze zur Theorie der Toeplitzschen Formen,” Math. 11781 J. A. Edward and M. M. Fitelson, “Notes on maximum- Z., vol. 6, pp: 167-202, 1920. ^ entropy processing, IEEE Trans. Inform. Theory. (Corresp.), [2021 R. Frisch, “Correlation and scatter in statistical variables,” vol. IT-19, pp. 232-234, Mar. 1973. Nord. Stat. Tidsk., vol. 8, pp. 36102, 1928. 11791 M. Gevers and T. Kailath, “Constant, predictable and degenerate ~2031 A. N. Kolmogorov, Foundations of the Theory of Probability, directions of the discrete-time Riccati equation,” Automatica, New York: Springer, 1933 (Transl.: New York: Chelsea, 1950). -_ -^_ vol. 9, ,,.pp. 699-712,.Nov. 1973. m41 R. E. A. C. Paley and N. Wiener, “Fourier transforms in-t-he [lrtuj -, “An ,mnovattons approach to least-squares estimation- complex domain,” Amer. Math. Sot. Colloq. Publ., vol. 19, lY34. Part VI: Dtscrete-time innovations representations and recursive [205] M. Frechet, Recherches theoriques modernes sur la thtorie des estimation,” IEEE Trans. Automat. Contr., vol. AC-18, pp. 588- Probabilities, vol. 1. Paris: Gauthier-Villars, 1937; 2nd ed., 600, Dec. 1973. 1950. [181] T. Kailath and R. Geesey, “An innovations approach to least- [206] H. Wold, A Study in the Analysis of Stationary Time Series. squares estimation-Part V: Innovations representations and Uppsala, Sweden: Almqvist and Wiksell, 1938; 2nd. 
ed,, 1954. recursive estimation in colored noise,” IEEE Trans. Automat. [207] A. N. Kolmogorov, “Sur l’interpolation et extrapolatton des Contr., vol. AC-18, pp. 435453, Oct. 1973. suites stationnaires,” C. R. Acud. Sci., vol. 208, p. 2043, 1939. [182] R. H. Jones, “Autoregressive spectrum estimation,” in Proc. [208] G. Szegii, “Orthogonal polynomials,” Amer. Math. Sot. Colloq. Amer. Meteorol. Sot. 3rd Conf Probability and Statistics in Publ., vol. 23, 1939; 2nd ed., 1958; 3rd ed. 1967. Atmosuheric Science. 1973. [209] A. N. Kolmogorov, “Stationary sequences in Hilbert space” [183] T. Kailath,, “Some new algorithms for recursive estimation in (in Russian), Bull. M&h. Univ. Moscow, vol. 2, no. 6, 1941 constant hnear systems,” IEEE Trans. Inform. Theory, vol. (A transl. by N. Artin is available in many libraries). IT-19, pp. 750-760, Nov. 1973. [210] -, “Interpolation and extrapolation of stationary random [184] T. Kailath, M. Morf, and G. S. Sidhu. “Some new algorithms sequences,” Bull. Acad. Sci. USSR, Ser Math., vol. 5, 1941; for recursive estimation in constant discrete-time linear @stems,” Transl.: RAND Corp., Santa Monica, Calif., Memo. in Proc. 7th Princeton Symp. Information and System Science, RM-3090-PR, Apr. 1962). 178 IEEE TRANSACTIONS ON INFORMATION THEORY, MARCH 1974

[211] V. A. Ambartsumian, “Diffuse reflection of light by a foggy [243] J. Hajek, “On linear statistical problems in stochastic processes,” medium,” Dokl. Akad. Sci. SSSR, vol. 38, pp. 229-322, 1943. Czech. Math. J., vol. 12, pp. 404-444, 1962. .12121 _ R. S. Phihios. “Servomechanisms.” M.I.T. Radiation Lab.. [244] K. Hoffman, Banach Spacesof Analytic Functions. Englewood Cambridge, ‘Mass.: Rep. 372, May 1943; also in Theory of Cliffs, N.J. : Prentice-Hal!, 1962. Servomechanisms, H. M. James, N. B. Nichols, and R. S. [245] P. Masani, “Shift-invartant spaces and prediction theory,” Phillips, Eds. New York: McGraw-Hill, 1957, ch. 7. Acta Math., vol. 107, pp. 275-290, 1962. [213] J. L. Doob, “The elementary Gaussian processes,” Ann. Math. [246] P. Masani and J. B. Robertson, “The time-domain analysis of Statist., vol. 15, pp. 229-282, 1944. continuous-parameter, weakly stationary stochastic processes,” [214] M. G. Krein, “On a generalization of some investigations of Pac. J. Math., vol. 12, pp. 1361-1378, 1962. G. Szego, W. M. Smirnov, and A. N. Kolmogorov,” Dokl. [247] R. M. Redheffer, “On the relation of transmission-line theory to Akad. Nauk SSSR, vol. 46, pp. 91-94,1945. scattering and transfer,” J. Math. Phys., vol. 41, pp. 141, 1962. [215] -, “On a problem of extrapolation of A. N. Kolmogorov,” [247a] G. Baxter, “A norm inequality for a ‘finite-section’ Wiener- Dokl. Akad. Nauk SSSR, vol. 46, pp. 306-309, 1945. Hoof eauation.” Ill. J. Math.. vol. 7. DD. 97-103. 1963. [216] W. T. Reid, “A matrix differential equation of the Riccati [248] M.-Lo&e, Probability Theory; 3rd ed. - ‘New York: Van Nost- type,” Amer. J. Math., vol. 68, pp. 237-246, 1946. rand Reinho!d, 1963. [216a] -, Riccati Differential Equations. New York: Academic [248a] F. B. Atkinson, Discrete and Continuous Boundary Problems. Press, 1972. New York : Academic Press, 1964. .12171S. Chandrasekhar, “On the radiative equilibrium of a stellar [249] A. Devinatz, “Asymptotic estimates for the finite predictor,” atmosphere, Pt XXI,” Astrophys. J., vol. 106, pp. 152-216, Math. Stand., vol. 15, pp. 111-120, 1964. 1947; Pt XXII, ibid, vol. 107, pp. 48-72, 1948. [250] H. Helson, Lectures on Knvariant Subspaces. New York: [2181 N. Levinson, “The Wiener rms (root-mean-square) error Academic Press, 1964. criterion in filter design and prediction,” J. Math. Phys., vol. 25, [251] I. A. Ibragimov, “On the asymptotic behavior of the prediction pp. 261-278, Jan. 1947; reprinted as appendix in [l]. error.” Theorv Prob. Avol. USSR. vol. 9. vv. 627-633. 1964. w91 J. L. Doob, “Time series and harmonic analysis,” in Proc. [252] B. Noble, “The numerical solution of nonlinear integral equations Berkeley Symp. Mathematics, Statistics, and Probability. and related topics,” in Nonlinear Integral Equations, P. M. Berkeley, Calif.: Univ. California Press, 1949, pp. 303-343. Anselone, Ed. Madison, Wis. : Wisconsin Univ. Press, 1964. t2201 S. Chandrasekhar, Radiative Transfer. Oxford, England: [253] C. M. Deo, “Prediction theory of nonstationary random Oxford Univ. Press, 1950; also New York: Dover,. 1960. processes,” Sankhya, Ser. A, vol. 27, pp. 113-132, 1965. 12211 0. Hanner, “Deterministic and nondeterministlc stationary [253a] J. L. Doob, “Wiener’s work in probability theory,” Bull. random processes,” Ark. Mat., vol. 1, pp. 161-177, 1950. Amer. Math. Sot., vol. 72, no. 1, pt. II, pp. 69-72, Jan. 1966. P-221K. Karhunen, “Uber die Struktur Stationarer Zufalliger Funk- [254] H. Kagiwada and R. E. Kalaba, “An initial-value method for tionen,” Ark. 
Mat., vol. 1, pp. 141-160, 1950 (Transl.: RAND Fredholm integral equations of convolution type,” RAND Corp.. Memo. RM-3091-PR. Am. 1962). Corn Memo RM-5186-PR, 1966. w31 U. Grenander, “On Toeplitz forms and stationary processes,” L-1 P. Masani, “Wiener’s contribution to generalized harmonic Ark. Mat., vol. 1, pp. 555-571, 1952. analysis, prediction theory and filter theory,” Bull. Amer. Math. 12241I. M. Gel’fand and B. H. Levitan, “On the determination of a Sot., vol. 72, no. 1, pt. II, pp. 73-125, Jan. 1966. differential equation from its spectral function,” Zzv. Akad. [2561 G. M. Wing, “On certain integral equations reducible to initial Nauk SSSR, vol. 15, pp. 309-360, 1951 (Transl.: Amer. Math. value problems,” SIAM Rev., vol. 9, pp. 655-670, 1967. Sot. Transl., SeE. 2, vol. 1, pp. 253-304, 1955). W'l A. Devinatz and M. Shinbrot. “General Wiener-Hoof ooerators.” L-1 J. L. Doob, Stochastic Processes. New York: Wiley, 1953. Trans. Amer. Math. Sot., vol. 145, pp. 467494, Nov. -1967. ’ P261 M. G. Krem, “Some problems of the effective determmation of L-1 P. L. Duren, Theory of HP Spaces. New York: Academic a nonhomogeneous string by means of its spectral function,” Press, 1970. Dokl. Akad. Nauk SSSR. vol. 93. DD. 617-620. 1953. 12591 H. Dym and H. P. McKean, “Application of de Branges spaces 12271---, “On a fundamental’approximation problem in the theory - _ of integral functions to the prediction of stationary Gaussian of extrapolation and filtration of stationary processes,” Dokl. processes,”Ill. J. Math., vol. 14, pp. 299-343, 1970. Akad. Nauk SSSR, vol. 94, pp. 13-16, 1954 (Transl. : Select. [2601 H. Dym and H. P. McKean, “Extrapolation and interpolation Trunsl. Probl. Math. Statist., vol. 4, pp. 127-131, 1964). of stationary Gaussian processes,” Ann. Math. Statist., vol. 41, P28j --, “On integral equations governing differential equations pp. 1817-1844,Dec. 1970. of second order,” Dokl. Akad. Nauk SSSR, vol. 97, pp. 21-24, t2611 B. Sz. Nagy and C. Foias, Harmonic Analysis of Operators on 1954. Hilbert Space. New York: Academic Press, 1970. 12291-, “On a new method of solving linear integral equations of [261a] L. D. Pitt, “On problems of trigonometrical approximation the first and second kinds,” Dokl. Akad. Nauk SSSR, vol. 100, from the theory of stationary Gaussian processes,” J. Multi- pp. 413416, 1955. variable Anal., vol. 2, pp. 145-161, June 1972. ~2301-. “The continuous analoaues of theorems on oolvnomials orthogonal on the unit circle;” Dokl. Akad. Nauk*SSSR, vol. D. Canonical Representations,Innovations, Martingales, and All That 104, pp. 637-640, 1955. [262] V. I. Krylov, “On functions regular in a half-plane,” Mat. 12311R. E. Bellman, “Functional equations in the theory of dynamic Sb.. vol. 6, pp. 95-138, 1939 (Transl.: Amer. Math. Sot. Transl. programming, VII: A partial differential equation for the (2); vol. 32; pp. 37-81, 1963. Fredholm resolvent,” Proc. Amer. Math. Sot., vol. 8, pp 435 [263] H. W. Bode, Network Analysis and Feedback Amplifier Design, 440, 1957. Princeton, N.J. : Van Nostrand Reinhold, 1945. t2321 K. M. Case, “On Wiener-Hopf equations,” Ann. Phys., vol. 2, [264] A. Beurling, “On two problems concerning linear transforma- pp. 384405, 1957. tions in Hilbert space,” Acta Math., vol. 81, pp. 239-255, 1949. WI E. G. Gladyshev, “On multidimensional stationary random [265] A. M. Yaglom and M. S. Pinsker, “Random processes with processes,” Theory Prob. Appl. (in Russian), vol. 3, pp. 425428, stationarv increments of order n.” Dokl. Akad. Nauk SSSR. 1958. vol. 
90, pp. 731-733, 1953. PW I. C. Gohberg and M. G. Krein, “Systems of integral equations [2661 P. Levy, “Sur une classe de courbes de l’espace de Hilbert et on a half-axis with kernels depending on the difference of the sur une equation integrale non lineaire,” Ann. Sci. EC. Norm. arguments,” Usp. Mat. Nauk, vol. 13, pp. 3-72, 1958. Super., vol. 73, pp. 121-156, 1956. WI M. G. Krein. “Intearal eauations on a half-axis with kernel 12671 N. Wiener and G. Kahianour. “Nonlinear orediction.” Office depending on’the diff&ence’of the arguments,” Usp. Mat. Nauk, - 1 of Naval Research, Tech. ‘Rep. 1, Cu-2-56:NONR-266, (39)- vol. 13, pp. 3-120, 1958. CIRMIP, Project NR-047-015, 1956. [2361 U. Grenander and G. Szego, Toeplitz Forms and Their Applica- [2681 H. Cramer, “On some classes of nonstationary stochastic lions. Berkeley, Calif. : Univ. California Press. 1958. vrocesses.” in Proc. 4th Berkelev Svmn. Mathematics. Statistics. W'l P. Masani, “Cramer’s theorem on monotone matrix-valued and Probability. Berkeley, Calif : -Univ. California Press; functions and the Wold decomposition,” in Probability and 1960, pp. 57-78. f&tistics, U. Grenander, Ed. New York: Wiley, 1959, pp. 175- [269] T. Hida, “Canonical representations of Gaussian processesand their avnlications.” Mem. Colleqe Sci., Univ. Kyoto, Ser. A, [2381 S. Sandor, “Sur I’equation differentielle matricielle de type vol. 33:;~. 109-i55. 1960. _ Riccati,” Bull. Math. Sot. Sci. Math. Phys., Revue Physique [2701 L. A. Zadeh, “Time-varying networks, I,” Proc. IRE, vol. 49, Roumaine (New Series), vol. 3, pp. 229-249, 1959. pp. 1488-1502, Oct. 1961. [239] J. Durbin, “The fitting of time-series models,” Rev. Intern. [2711 E. A. Robinson, “Extremal representation of stationary stochas- Statist. Inst., vol. 28, pp. 233-244, 1960. tic processes,” Ark. Mat., vol. 4, pp. 379-384, 1962. 12401G. Baxter, “Polynomials defined by a difference system,” J. [2721 - Random Wavelets and Cybernetic Wavelets. New York: Math. Anal. Appl., vol. 2, pp. 223-263, 1961. Hafner, 1962. [241] L. Ya. Geronimus, Orthogonal Polynomials (Transl. from the [273] H. Cramer, “Stochastic processes as curves in Hilbert space,” Russian). New York: Consultant’s Bureau, 1961. Teor. Veroyat. Primen., vol. 9, pp. 169-179, 1964. [242] G. Baxter, “An asymptotic result for the finite predictor,” Math. P’41 P. Levy, Processus Stochastiques et Mouvement Brownien, Stand., vol. 10, pp. 137-144, 1962. 2nd ed. Paris: Gauthier-Villars, 1964. KAILATH: LINEAR FILTERING THEORY 179

[275] H. Cramer, "A contribution to the multiplicity theory of stochastic processes," in Proc. 5th Berkeley Symp. Mathematics, Statistics, and Probability, vol. 2. Berkeley, Calif.: Univ. California Press, 1965, pp. 215-224.
[276] L. A. Shepp, "Radon-Nikodym derivatives of Gaussian measures," Ann. Math. Statist., vol. 37, pp. 321-354, Apr. 1966.
[277] E. Wong and M. Zakai, "On the relation between ordinary and stochastic differential equations and applications to stochastic problems in control theory," in Proc. 3rd IFAC Congr. London: Butterworth, 1966.
[277a] E. J. McShane, "Stochastic functional equations: Continuity properties and relation to ordinary equations," in Control Theory and the Calculus of Variations, A. V. Balakrishnan, Ed. New York: Academic Press, 1969.
[277b] E. J. McShane, "Stochastic differential equations and models of random processes," in Proc. 6th Berkeley Symp. Mathematics, Statistics, and Probability. Berkeley: Univ. California Press, 1973.
[278] H. Kunita and S. Watanabe, "On square-integrable martingales," Nagoya Math. J., vol. 30, pp. 209-245, Aug. 1967.
[279] P. A. Frost, "Nonlinear estimation in continuous-time systems," Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., June 1968.
[280] I. C. Gohberg and M. G. Krein, Theory and Applications of Volterra Operators in Hilbert Space (in Russian). Moscow: Nauka, 1967. (Transl.: Amer. Math. Soc., Providence, R.I., 1970.)
[281] R. Geesey, "Canonical representations of second-order processes with applications," Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., 1968.
[282] M. Hitsuda, "Representation of Gaussian processes equivalent to Wiener processes," Osaka J. Math., vol. 5, pp. 299-312, 1968.
[283] J. M. C. Clark, "Conditions for one-to-one correspondence between an observation process and its innovations," Cent. Comput. Automat., Imperial College, London, Tech. Rep. 1.
[284] R. Geesey and T. Kailath, "Comments on 'The relationship of alternate state-space representations in linear problems'," IEEE Trans. Automat. Contr. (Corresp.), vol. AC-14, pp. 113-114, Feb. 1969.
[285] R. Geesey and T. Kailath, "Applications of the canonical representation to estimation and detection of colored noise," in Proc. Symp. Computer Processing in Communications. Brooklyn, N.Y.: Polytechnic Inst. Brooklyn Press, Apr. 1969.
[286] I. M. Gel'fand and N. Ya. Vilenkin, Generalized Functions. New York: Academic Press, 1968.
[287] T. Kailath and R. Geesey, "Covariance factorization-an explication via examples," in Proc. 2nd Asilomar Conf. Systems Science, 1968.
[288] A. Schumitzky, "On the equivalence between matrix Riccati equations and Fredholm resolvents," J. Comp. Syst. Sci., vol. 2, pp. 76-87, June 1968.
[288a] A. McNabb and A. Schumitzky, "Factorization of operators, Part III: Initial value methods for linear two-point boundary value problems," J. Math. Anal. Appl., vol. 31, pp. 391-405, Aug. 1970.
[289] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1969.
[290] T. Kailath, "Application of a resolvent identity to a linear smoothing problem," SIAM J. Contr., vol. 7, pp. 68-74, Feb. 1969.
[291] T. Kailath, "A general likelihood-ratio formula for random signals in Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-15, pp. 350-361, May 1969.
[292] T. Kailath, "Fredholm resolvents, Wiener-Hopf equations, and Riccati differential equations," IEEE Trans. Inform. Theory, vol. IT-15, pp. 665-672, Nov. 1969.
[293] T. E. Duncan, "On the absolute continuity of measures," Ann. Math. Statist., vol. 41, pp. 36-38, 1970.
[294] P. A. Frost, "The innovations process and its application to nonlinear estimation and detection of signals in additive white noise," in Proc. Univ. Missouri-Rolla M. J. Kelly Communications Conf., Oct. 1970, pp. 7.3.1-7.3.6. See also Proc. 4th Princeton Symp. on Information and System Science, 1970.
[295] T. Hida, Stationary Stochastic Processes. Princeton, N.J.: Princeton Univ. Press, 1970.
[296] T. Kailath, "Likelihood ratios for Gaussian processes," IEEE Trans. Inform. Theory, vol. IT-16, pp. 276-288, May 1970.
[297] T. Kailath, "A further note on a general likelihood formula for random signals in Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-16, pp. 393-396, July 1970.
[298] T. Kailath, "The innovations approach to detection and estimation theory," Proc. IEEE, vol. 58, pp. 680-695, May 1970.
[299] B. D. O. Anderson and T. Kailath, "The choice of signal process models in Kalman-Bucy filtering," J. Math. Anal. Appl., vol. 35, pp. 659-668, Sept. 1971.
[300] H. Cramer, Structural and Statistical Problems for a Class of Stochastic Processes. Princeton, N.J.: Princeton Univ. Press, 1971.
[301] P. A. Frost and T. Kailath, "An innovations approach to least-squares estimation-Part III: Nonlinear estimation in white Gaussian noise," IEEE Trans. Automat. Contr., vol. AC-16, pp. 217-226, June 1971.
[302] P. A. Frost, "Estimation and detection for a simple class of conditionally independent-increment processes," in Proc. IEEE Decision and Control Conf., 1971. Also published as lecture notes for Washington University summer course on current trends in automatic control, St. Louis, Mo., 1970.
[303] T. T. Kadota, M. Zakai, and J. Ziv, "Mutual information of the white Gaussian channel with and without feedback," IEEE Trans. Inform. Theory, vol. IT-17, pp. 368-371, July 1971.
[304] T. Kailath, "Some extensions of the innovations theorem," Bell Syst. Tech. J., vol. 50, pp. 1487-1494, Apr. 1971.
[305] M. M. Rao, "Local functionals and generalized random fields with independent values," Theory Prob. Appl. (in Russian), vol. 16, pp. 457-473, 1971.
[306] J. Rissanen, "Recursive identification of linear systems," SIAM J. Contr., vol. 9, pp. 420-430, Aug. 1971.
[307] J. Baras and R. Brockett, "H²-functions and infinite-dimensional realization theory," in Proc. IEEE Decision and Control Conf., 1972, pp. 355-360; also Ph.D. dissertation, Harvard Univ., Cambridge, Mass., Sept. 1973.
[308] P. Bremaud, "A martingale approach to point processes," Ph.D. dissertation, Univ. Calif., Berkeley, Aug. 1972; also Electron. Res. Lab., Rep. M345.
[309] M. Fujisaki, G. Kallianpur, and H. Kunita, "Stochastic differential equations for the nonlinear filtering problem," Osaka J. Math., vol. 9, pp. 19-40, 1972.
[310] T. Kailath, R. Geesey, and H. Weinert, "Some relations between RKHS norms, Fredholm equations, and innovations representations," IEEE Trans. Inform. Theory, vol. IT-18, pp. 341-348, May 1972.
[311] T. Kailath and D. Duttweiler, "An RKHS approach to detection and estimation problems-Part III: Generalized innovations representations and a likelihood-ratio formula," IEEE Trans. Inform. Theory, vol. IT-18, pp. 730-745, Nov. 1972.
[312] H. Dym and H. McKean, Fourier Series and Integrals. New York: Academic Press, 1972.
[313] J. Rissanen and T. Kailath, "Partial realization of stochastic processes," Automatica, vol. 8, pp. 380-386, July 1972.
[314] Yu. A. Rozanov, "On nonanticipative linear transformations of Gaussian processes with equivalent distributions," Nagoya Math. J., vol. 47, pp. 227-235, 1972.
[315] E. Wong, Stochastic Processes in Information and Communication Systems. New York: McGraw-Hill, 1971. Reviewed by F. J. Beutler, IEEE Trans. Inform. Theory (Book Rev.), vol. IT-18, pp. 827-828, Nov. 1972.
[316] H. Akaike, "Markovian representation of stochastic processes by canonical variables," SIAM J. Contr., to be published, 1973.
[316a] H. Akaike, "Stochastic theory of minimal realizations," IEEE Trans. Automat. Contr., vol. AC-19, to be published.
[317] B. D. O. Anderson, "Algebraic properties of minimal degree spectral factors," Automatica, vol. 9, pp. 491-500, 1973.
[318] A. Ephremides and L. Brandenburg, "On the reconstruction error of sampled data estimates," IEEE Trans. Inform. Theory (Corresp.), vol. IT-19, pp. 365-367, May 1973.
[319] A. Ephremides and J. B. Thomas, Random Processes-Multiplicity Theory and Canonical Decompositions. Stroudsburg, Pa.: Dowden, Hutchinson and Ross, 1973.
[320] M. Hitsuda, "Multiplicity of some classes of Gaussian processes," Nagoya Math. J., to be published, 1974.
[321] G. Kallianpur and H. Oodaira, "Nonanticipative representations of equivalent Gaussian processes," Ann. Prob., vol. 1, pp. 104-122, 1973.
[322] T. Kailath and A. Segall, "A further note on innovations, martingales and nonlinear estimation," in Proc. IEEE Decision and Control Conf., 1973.
[323] H. P. McKean, "Geometry of differential space," Ann. Prob., vol. 1, pp. 197-206, Apr. 1973.
[324] P. A. Meyer, "Sur un probleme de filtration," in Séminaire de Probabilités VII, Lecture Notes in Mathematics, vol. 321. New York: Springer, 1973, pp. 223-247.
[325] Yu. A. Rozanov, "Innovations and nonanticipative processes," in Multivariate Analysis, vol. 3, P. R. Krishnaiah, Ed. New York: Academic Press, 1973.
[326] A. Segall, "A martingale approach to modeling, estimation and detection of jump processes," Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., Aug. 1973.
[327] E. Wong, "Recent progress in stochastic processes-a survey," IEEE Trans. Inform. Theory, vol. IT-19, pp. 262-275, May 1973.
[327a] J. H. Van Schuppen, "Estimation theory for continuous-time processes, a martingale approach," Ph.D. dissertation, Dep. Elec. Eng., Univ. California, Berkeley, Sept. 1973.
[327b] R. Boel, P. Varaiya, and E. Wong, "Martingales on jump processes, I: Representation results, II: Applications," Univ. of California, Berkeley, ERL Memos 407 and 409, Oct. 1973; also submitted to SIAM J. Contr.

[327c] L. H. Brandenburg, "Covariance factorization: Some unified results encompassing both stationary and nonstationary processes," to appear in IEEE Trans. Inform. Theory.

E. Miscellaneous

1) Series Expansion (further references can be found in [327])
[328] D. D. Kosambi, "Statistics in function space," J. Indian Math. Soc., vol. 7, pp. 76-88, 1943.
[329] M. Loeve, "Sur les fonctions aleatoires stationnaires de second ordre," Rev. Sci., vol. 83, pp. 297-310, 1945; see also Comptes Rend., vol. 220, p. 380, 1945, and vol. 222, p. 489, 1946.
[330] M. Kac and A. J. F. Siegert, "On the theory of noise in radio receivers with square-law detectors," J. Appl. Phys., vol. 18, pp. 383-397, Apr. 1947; see also Ann. Math. Statist., vol. 18, pp. 438-442, 1947.
[331] K. Karhunen, "Über lineare Methoden in der Wahrscheinlichkeitsrechnung," Ann. Acad. Sci. Fennicae, Ser. A, I, vol. 37, pp. 3-79, 1947 (Transl.: RAND Corp., Santa Monica, Calif., Rep. T-131, Aug. 1960).
[332] N. Aronszajn, "Theory of reproducing kernels," Trans. Amer. Math. Soc., vol. 63, pp. 337-404, May 1950.
[333] U. Grenander, "Stochastic processes and statistical inference," Ark. Mat., vol. 1, pp. 195-277, Oct. 1950.
[334] A. Ya. Povzner, "A class of Hilbert function spaces," Dokl. Akad. Nauk SSSR, vol. 68, pp. 817-820, 1949; see also ibid., vol. 74, pp. 13-17, 1950.
[335] R. C. Davis, "On the theory of prediction of nonstationary stochastic processes," J. Appl. Phys., vol. 23, pp. 1047-1053, Sept. 1952.
[336] D. Slepian, "Estimation of signal parameters in the presence of noise," IRE Trans. Inform. Theory, vol. PGIT-3, pp. 68-89, Mar. 1954.
[337] D. C. Youla, "The use of the method of maximum likelihood in estimating continuous-modulated intelligence which has been corrupted by noise," IRE Trans. Inform. Theory, vol. PGIT-3, pp. 98-105, Mar. 1954.
[338] I. M. Gel'fand and A. M. Yaglom, "Calculation of the amount of information about a random function contained in another such function," Usp. Mat. Nauk, vol. 12, pp. 3-52, 1956.
[339] V. S. Pugachev, "Application of canonic expansions of random functions in determining an optimum linear system," Automat. Remote Contr., vol. 17, pp. 489-499, 1956.
[340] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958.
[341] W. Davenport and W. L. Root, An Introduction to the Theory of Random Signals and Noise. New York: McGraw-Hill, 1958.
[342] C. W. Helstrom, Statistical Theory of Signal Detection. London: Pergamon Press, 1960; 2nd ed., 1968.
[343] V. N. Tutubalin and M. I. Freidlin, "On the structure of the infinitesimal σ-algebra of a Gaussian process," Teor. Veroyat. Primen., vol. 7, pp. 196-199, 1962.
[344] H. P. McKean, "Brownian motion with a several dimensional time," Teor. Veroyat. Primen., vol. 8, pp. 357-378, 1963.
[345] N. Levinson and H. P. McKean, "Weighted trigonometrical approximation on R¹ with application to the germ field of a stationary Gaussian noise," Acta Math., vol. 112, pp. 99-143, 1964.
[346] A. M. Yaglom, "Outline of some topics in linear extrapolation of stationary random processes," in Proc. 5th Berkeley Symp. Mathematics, Statistics, and Probability, vol. II. Berkeley, Calif.: Univ. California Press, 1970, pp. 259-278.
[346a] A. M. Yaglom, "Strong limit theorems for stochastic processes and orthogonality conditions for probability," in Proc. Bernoulli, Bayes, Laplace Symp. Berlin: Springer, 1965, pp. 253-262.
[347] E. Parzen, Time Series Analysis Papers. San Francisco: Holden-Day, 1967.
[348] T. T. Kadota, "Optimum estimation of nonstationary Gaussian signals in noise," IEEE Trans. Inform. Theory, vol. IT-15, pp. 253-257, Mar. 1969.
[349] T. Kailath, "RKHS approach to detection and estimation problems-Part I: Deterministic signals in Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-17, pp. 530-549, Sept. 1971.
[350] R. D. LePage, "Note relating Bochner integrals and reproducing kernels to series expansions on a Gaussian Banach space," Proc. Amer. Math. Soc., vol. 32, pp. 285-289, Mar. 1972. See also N. Jain and G. Kallianpur, Proc. Amer. Math. Soc., vol. 25, pp. 890-895, 1970.
[350a] E. Lyttkens, "Regression aspects of canonical correlation," J. Multivariate Anal., vol. 2, pp. 418-439, 1972.
[351] S. Cambanis, "A general approach to linear mean-square estimation problems," IEEE Trans. Inform. Theory (Corresp.), vol. IT-19, pp. 110-114, Jan. 1973.
[352] W. A. Gardner, Comments on "A general approach to linear mean-square estimation problems," IEEE Trans. Inform. Theory (Corresp.), vol. IT-19, pp. 114-115, Jan. 1973. See also this issue, pp. 271-274.
[353] T. E. Fortmann and B. D. O. Anderson, "On the approximation of optimal realizable linear filters using a Karhunen-Loeve expansion," IEEE Trans. Inform. Theory (Corresp.), vol. IT-19, pp. 561-564, July 1973.
[353a] V. Belevitch, "On network analysis by polynomial matrices," in Recent Developments in Network Theory, S. R. Deards, Ed. Oxford: Pergamon, 1963, pp. 19-30.
[353b] V. Belevitch, Classical Network Theory. San Francisco: Holden-Day, 1968.
[353c] C. Gueguen, A. Fossard, and M. Gauvrit, "Une représentation intermédiaire des systèmes multidimensionnels," in Proc. 1st IFAC Symp. on Multivariable Control, Dusseldorf, Germany, Oct. 1968.

2) Linear System Structure
[354] J. L. Massey, "Shift-register synthesis and BCH decoding," IEEE Trans. Inform. Theory, vol. IT-15, pp. 122-127, Jan. 1969.
[355] V. M. Popov, "Some properties of control systems with matrix transfer functions," in Lecture Notes in Mathematics, vol. 144. Berlin: Springer, 1970, pp. 250-261.
[356] H. H. Rosenbrock, State Space and Multivariable Theory. New York: Wiley, 1970.
[357] L. Silverman, "Inversion of multivariable linear systems," IEEE Trans. Automat. Contr., vol. AC-14, pp. 270-276, June 1969.
[357a] L. Silverman and H. J. Payne, "Input-output structure of linear systems," SIAM J. Contr., vol. 9, pp. 199-233, May 1971.
[358] S. H. Wang, "Design of linear multivariable systems," Ph.D. dissertation, Univ. Calif., Berkeley, Dec. 1971; also Electron. Res. Lab. Memo. ERL-M309, Oct. 1971.
[358a] A. S. Morse and W. M. Wonham, "Status of noninteracting control," IEEE Trans. Automat. Contr., vol. AC-16, pp. 568-581, Dec. 1971.
[359] V. M. Popov, "Invariant description of linear, time-invariant controllable systems," SIAM J. Contr., vol. 10, pp. 252-264, 1972.
[360] B. Dickinson, M. Morf, and T. Kailath, "A minimal realization algorithm for matrix sequences," IEEE Trans. Automat. Contr., vol. AC-19, pp. 31-38, Feb. 1974.
[361] G. D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Trans. Inform. Theory, vol. IT-16, pp. 720-738, Nov. 1970.
[361a] G. D. Forney, Jr., "Structural analysis of convolutional codes via dual codes," IEEE Trans. Inform. Theory, vol. IT-19, pp. 512-518, July 1973.
[361b] G. D. Forney, Jr., "Minimal bases of rational vector spaces with applications to multivariable linear systems," SIAM J. Contr., 1974.
[362] S. H. Wang and E. J. Davison, "A minimization algorithm for the design of linear multivariable systems," IEEE Trans. Automat. Contr., vol. AC-18, pp. 220-223, June 1973.
[363] W. A. Wolovich, "The determination of state-space representations for linear multivariable systems," Automatica, vol. 9, pp. 97-106, 1973.

3) Cepstral Analysis
[364] B. Bogert, M. Healy, and J. Tukey, "The quefrency alanysis of time series for echoes," in Proc. Symp. Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, 1963, ch. 15, pp. 209-243.
[365] B. P. Bogert and J. F. Ossanna, "The heuristics of cepstrum analysis of a stationary complex echoed Gaussian signal in stationary Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-12, pp. 373-380, July 1966.
[366] R. C. Kemerait and D. G. Childers, "Signal detection and extraction by cepstrum techniques," IEEE Trans. Inform. Theory, vol. IT-18, pp. 745-759, Nov. 1972.
[367] L. R. Rabiner and C. M. Rader, Eds., Digital Signal Processing. New York: IEEE Press, 1972.
[368] T. J. Cohen, "Source-depth determinations using spectral, pseudoautocorrelation and cepstral analysis," Geophys. J. Roy. Astron. Soc., vol. 20, pp. 223-231, 1970.

4) Historical Surveys
[369] R. L. Plackett, "A historical note on the method of least squares," Biometrika, vol. 36, pp. 458-460, 1949.
[370] O. Neugebauer, The Exact Sciences in Antiquity. Princeton, N.J.: Princeton Univ. Press, 1952.
[371] N. Wiener, I Am a Mathematician. Cambridge, Mass.: M.I.T. Press, 1956.
[372] C. Eisenhart, "The meaning of 'least' in least squares," J. Wash. Acad. Sci., vol. 54, pp. 24-33, 1964.
[373] H. L. Seal, "The historical development of the Gauss linear model," Biometrika, vol. 54, pp. 1-23, 1967.
[374] H. W. Sorenson, "Least-squares estimation: from Gauss to Kalman," IEEE Spectrum, vol. 7, pp. 63-68, July 1970.
[375] H. L. Harter, "The method of least squares and some alternatives," Aerospace Res. Lab., Air Force Systems Command, Wright-Patterson AFB, Ohio, Rep. ARL 72-0129, Sept. 1972.
[376] H. W. Sorenson, "Estimation theory: a historical perspective," in Proc. SW IEEE Conf., 1972.

5) Information-Theoretic Analyses
[377] I. Vajda, "A contribution to the informational analysis of patterns," in Methodologies of Pattern Recognition, M. S. Watanabe, Ed. New York: Academic Press, 1969, pp. 509-519.
[378] S. Arimoto, "Information-theoretical considerations on estimation problems," Inform. Contr., vol. 19, pp. 181-194, 1971.
[379] J. Ziv and M. Zakai, "Some lower bounds on signal parameter estimation," IEEE Trans. Inform. Theory, vol. IT-15, pp. 386-391, May 1969.
[380] L. P. Seidman, "Performance limitations and error calculations for parameter estimation," Proc. IEEE, vol. 58, pp. 644-652, May 1970.
[381] J. Seidler, "Bounds on the mean-square error and the quality of domain decisions based on mutual information," IEEE Trans. Inform. Theory, vol. IT-17, pp. 655-665, Nov. 1971.
[382] R. E. Blahut, "An hypothesis-testing approach to information theory," Ph.D. dissertation, Cornell Univ., Ithaca, N.Y., Aug. 1972; abstract in IEEE Trans. Inform. Theory (Dissertation Abstr.), vol. IT-19, p. 253, Mar. 1973.
[383] M. Zakai and J. Ziv, "Lower and upper bounds on the optimal filtering of certain diffusion processes," IEEE Trans. Inform. Theory, vol. IT-18, pp. 325-331, May 1972.
[384] J. Ziv and M. Zakai, "On functionals satisfying a data-processing theorem," IEEE Trans. Inform. Theory, vol. IT-19, pp. 275-283, May 1973.

6) Others
[385] F. Riesz and B. Sz.-Nagy, Functional Analysis. New York: Ungar, 1955.
[386] F. Smithies, Integral Equations. New York: Cambridge Univ. Press, 1962.
[387] J. H. Wilkinson and C. Reinsch, Linear Algebra (Handbook of Automatic Computation, vol. 2). Berlin: Springer, 1971.
[388] J. R. Klauder and E. C. G. Sudarshan, Quantum Optics. New York: Benjamin, 1969.
[389] E. Hopf, "Statistical hydro-dynamics and functional calculus," J. Ration. Mech. Anal., vol. 2, pp. 587-591, 1953.
[390] R. H. Kraichnan, "The closure problem of turbulence theory," in Proc. Symp. Applied Mathematics, vol. 13, 1962, pp. 199-225.

A New Estimator for an Unknown Signal Imbedded in Additive Gaussian Noise

MANOUCHEHR MOHAJERI, MEMBER, IEEE

Abstract-Estimation of an unknown signal observed in the presence of an additive Gaussian noise process is reduced to the problem of estimating an unknown complex parameter. A new class of estimators for an unknown complex parameter is introduced, and their biases and mean-square errors are studied. The performance of a particular member of this class (c-a estimator) is compared with that of the maximum-likelihood (ML) estimator, and it is shown that the c-a estimator reduces considerably the mean-square error for small values of SNR, at the expense of introducing a small bias. The c-a and ML estimators of a complex parameter are applied to the problem of signal estimation, and some interesting numerical results are presented.

Manuscript received April 4, 1972; revised September 6, 1973. The author was with the Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, Mass. He is now with the Faculty of the Department of , Pahlavi University, Shiraz, Iran.

I. INTRODUCTION

In many communication problems, such as the discrimination of small-magnitude seismic events, the background noise causes serious difficulties. Seismic discriminants such as complexity (ratio of the signal energy in two different time intervals) and spectral ratio (ratio of the signal energy in two different frequency bands) are powerful tools for discrimination of large-magnitude seismic events [1], [2]. When the event magnitude diminishes, the noise becomes so critical that these discriminants lose their identification capabilities, and therefore one has to search for different means of noise reduction.

Although various array processing methods have been employed, and significant improvements in the signal-to-noise ratio have been obtained, there still remains a residual noise that needs further reduction [3]. Since by their very nature seismic signals are unknown, this noise reduction should be treated as an unknown signal estimation problem.

One widely used technique for estimating an unknown signal is the maximum-likelihood (ML) estimation procedure [4]. When the additive noise process is Gaussian, this estimator has a simple structure, and it chooses the observed waveform as the signal estimate. The ML estimate of an unknown signal, observed in the presence of an additive Gaussian noise process, is an unbiased efficient estimate of the signal, and makes no use of the knowledge of the noise spectrum.

In this paper we introduce a new class of signal estimators which take advantage of the noise spectrum and which are generally biased. The reason for introducing such biased estimators is to reduce the mean-square error at small signal-to-noise ratios. Analysis of the performance of a particular member of this class of estimators shows that, at the expense of introducing a small bias, a considerable reduction in the interval mean-square error, relative to the mean-square error of the ML estimator, is attainable.

In seismic signal estimation, the noise process is short-term stationary and this method of estimation proves to be extremely useful. For such a noise process an updated estimate of the noise spectrum is used in the structure of the signal estimator.
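A minimal scalar sketch may help make this bias-variance tradeoff concrete; it is only an illustration under assumed notation (a single observation y of an unknown value s in zero-mean Gaussian noise n with E|n|² = σ², and a shrinkage factor α) and is not the c-a estimator constructed in the paper. The ML estimate is the observation itself, while a biased estimate that shrinks the observation toward zero and uses the known noise level can have smaller mean-square error at low SNR:

\[
\hat{s}_{\mathrm{ML}} = y, \qquad E\,|\hat{s}_{\mathrm{ML}} - s|^{2} = \sigma^{2},
\]
\[
\hat{s}_{\alpha} = \alpha y, \quad 0 < \alpha < 1, \qquad
E\,|\hat{s}_{\alpha} - s|^{2} = (1-\alpha)^{2}\,|s|^{2} + \alpha^{2}\sigma^{2}.
\]

The shrinkage estimate has bias (α - 1)s, but its mean-square error falls below σ² exactly when |s|²/σ² < (1 + α)/(1 - α), i.e., at sufficiently small signal-to-noise ratio the biased estimator outperforms the ML estimator, which is the qualitative effect the estimators introduced below are designed to exploit.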