<<

IEEETRANSACTIONS ONACOUSTICS, SPEECH, AND PROCESSING, VOL. ASSP-33,NO. 2, APRIL 1985 387 Detection of by Information Theoretic Criteria

MAT1 WAX AND THOMAS KAILATH, FELLOW, IEEE

Abstract-A new approach is presented to the problem of detecting are discussed in Sections IV and V, respectively. Simulation the number of signals in a multichannel time-series, basedon the appli- results that illustrate the performance of the new method for cation of the information theoretic criteria for intro- duced by Akaike (AIC) and by Schwartz and Rissanen (MDL). Unlike sensor array processing aredescribed in Section VI. Fre- the conventional hypothesis testing based approach, the new approach quency domain extensions and some concluding remarks are does not require any subjective threshold settings; thenumber of signals presented in SectionsVI1 and VIII, respectively. is obtained merely by minimizing the AIC or the MDL criteria. Siula- tion results that illustrate the performance of the new method for the detection of the number of signalsreceived by a sensor array are 11. FORMULATIONOF THE PROBLEM presented. The observation vector in certain important problems in sig- nal processing such as sensor array processing [15] , [20] , [5] , I. INTRODUCTION [6] , [lo] , [ 121 , [22] , [25] , retrieval [15] , [12] , pole retrieval from the natural response [ll], [24], and re- N many problems in , the vector of observa- trieval of overlapping echos from radar backscatter [7] , de- tions can be modeled as a superposition of a finite number I noted by the p X l vector x(t), is successfully described by the of signals embedded in an additive noise. This is the case, for following model example, in sensor array processing, in harmonic retrieval, in retrieving the poles of a system from the natural response, and 4 in retrieving overlapping echoesfrom radar backscatter. A key x(t) = A(q) Sj(t) i-n(t) issue in theseproblems is thedetection of thenumber of i= 1 signals. where One approach to this problem is based on the observation si( e) = scalar complex referredto as the ithsignal that the number of signals can be determined from the eigen- A(@i)=a p X 1 complex vector, parameterized by an un- values of the covariancematrix of the observationvector. known parameter vector aiassociated with the ithsignal Bartlett [4] and Lawley [14]developed a procedure, based n( -)=a p X 1 complex vector referred to as the additive on a nFsted sequence of hypothesis tests, to implement this ap- noise. proach.For each hypothesis, the likelihoodratio is We assume that the q (q

Manuscript received May 19, 1983; revised June 1, 1984. This work A = [A(%) * * ‘N@q>I (2b) was supported in part by the Air Force Office of Scientific Research, Air Force Systems Command, under Contract AF 49-620-79-C-0058, and s(t) is the 4 X 1 vector the U.S. ArmyResearch Office, underContract DAAG29-79-C-0215, andby the Joint Services Program at Stanford University under Con- ST@) = [s, (t) * ’ * sq(t)] . tract DAAG29-81-K-0057. The authors are with the Information Systems Laboratory, Stanford Because the noise is zero mean and independent of the sig- University, Stanford, CA 94305. nals, it followsthat the covariance matrix of x( *) is given by

0096-3518/85/0400-0387$01.OO 0 1985 IEEE 388 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-33, NO. 2, APRIL 1985

R = \k t aZI (34 used to encode the observed , Rissanen proposed to select the model that yields the minimum code length. It turns out where that in the large-sample limit, both Schwartz’s and Rissanen’s \k = ASAi (3b) approaches yield the same criterion,given by with t denoting the conjugate transpose, and S denoting the MDL= -lOgf(XIS) t iklog N. (7) covariance matrix ofthe signals, i.e., S = E[s(.) s)~] s( . Note that apart froma factor of 2, thefirst term is identical to Assuming that the matrix A is of full column rank, i.e., the the corresponding one in the AIC, while the second term has vectors A(Qi) (i = 1, * , q) are linearly independent, and that an extra factor of3 log N. the covariance matrix of the signals S is nonsingular, it follows that the rank of Ik is q, or equivalently, the p - 4 smallest IV. ESTIMATINGTHE NUMBEROF SIGNALS eigenvalues of \E are equal to zero. Denoting the eigenvalues To applythe information theoretic criteria to detect the of R by AI ha . A, it follows, therefore, that the small- > .> number of signals, or equivalently, to determine the rank of est p - q eigenvalues of R are all equal to , i.e., d the matrix 9,we must first describe the family of competing - = Xq+1 =Aq+2 -... h, = 02. (4) models, or density functions that we are considering. Regard- ing the observations x(tl), * * * ,x(tN) as identical and statisti- The number of signals q can hence be determined from the caily independent complex Gaussian random vectors of zero muZtiplicity of the smallest eigenvalue of R. The problem is mean, the family of models is necessarily describedby the co- that the covariance matrix R is unknown in practice. When matrix of x( e). Since our model’s covariance matrix estimated from a finite sample size, the resulting eigenvalues is given by (3), it seems natural to consider the following fam- are all different with probability one, thus making it difficult ily of covariance matrices to determine the number of signals merely by “observing” the eigenvalues. Amore sophisticated approach to the problem, R(k) = \kc4 + 021 (8) developed by Bartlett [4] and Lawley [14] , is based on a se- wheredenotes a semipositive matrix of rank k, and u de- quence of hypothesis tests. The problems associated with this notes an unknown scalar. Note that k E (0, l, - * . , p - l} approach is the subjective judgment needed in the selection of ranges over the set of all possiblenumber of signals. the threshold levels for the different tests. Using the well-known spectral representation theorem from In this paper we take a different approach. We pose the de- linear algebra, we can express R(k)as tection problem as a model selection problem and then apply theinformation theoretic criteria formodel selection intro- duced by Akaike (AIC) and by Schwartz and Rissanen(MDL).

111. INFORMATION THEORETICCRITERIA where XI,. * , Xk and v,,* , Vk are the eigenvalues and The information theoretic criteria for model selection, intro- eigenvectors, respectively, of R@). Denoting by dk) the duced by Akaike [l] , [2], Schwartz [21], and Rissanen [17] parameter vector ofthe model,it follows that address the following general problem. Given a set of N obser- vations X = {x(l), * * ,x(N)} and a family of models, that is, a parameterized family of probability densities f(XlO),select With this parameterizationwe now proceed to the derivation the model thatbest fits the data. of the information theoretic criteria for the detection prob- Akaike’s proposal was to select the model which gives the lem. Since the observations are regarded as statistically inde- minimum AIC, defined by pendent complex Gaussian random vectors with zero mean, AIC = -2 log f(Xl6) t 2k (6) their joint probability densityis given by where 6 is the maximum likelihood estimate of the parameter vector 0, and k. is tlie number of free adjusted parameters in 0. The first term is the well-known log-likelihood oftlie max- imum likelihood estimator of the parameters of the model. The second term is a bias correction term, inserted so as to make the AIC an unbiasedestimate of the meanKulback- Taking the logarithm and omitting terms that do not depend Liebler distance betwey the modeled density and the f(Xl0) on the parameter vector dk),we find that the log-likelihood estimated densityf(Xl0). L(@) is given by Inspired by Akaike’s pioneering work, Schwartz and Rissanen approached the. problem from quite different points of view. I,(@@)) = -N log det R@) - tr [R@)]-lR (1 2a) Schwartz’s approach is based on Bayesian arguments. He as- sumed that eachcompeting model can beassigned aprior where R is the sample-covariance matrix defined by probability, and proposed to select the model that yields the maximum . Rissanen’s approach is based on information theoretic arguments. Since each model can be WAX AND KAILATH: DETECTION OF SIGNALS BY INFORMATION THEORETIC CRITERIA 389

Themaximum-likelihood estimate is the valueof dk) that Thenumber of signals d^ is determined as the value of kg maximizes (12). Following Anderson [3] ,these estimates are (0, 1, * . - , p - I} for which either the AIC or the MDLis given by minimized.

A hi=li i=l,.-.,k (1 34 V. CONSISTENCYOF THE CRITERIA We have described two different criteria for estimating num- ber of signals. The natural question is which one should be preferred. What can be said about the goodness of the esti- mates obtained by these criteria. One possible benchmark test where 1, > la * * * > lp and CI, * ,Cp are the eigenvalues a,nd is the behavior as the sample size increases. One would prefer eigenvectors, respectively, of the sample covariance matrixR. an estimator that yields the true number of signals with prob- Substituting the maximum likelihood estimates (13) in the ability one as the sample size increases to infinity. An estima- log-likelihood (1 2), with some straightforward manipulations, tor with this property is said to be consistent. By generalizing we obtain a method of proof given inRissanen [18] and Hannanand Quinn [9], we shall show that the MDL yields a consistent estimate, while the AIC yields an inconsistent estimate that tends, asymptotically,to overestimate the number of signals. The consistency of the MDL is proved by showing that in the large-sample limit, MDL(k) is minimized for k = q. Taking first k < 4,it follows from (17) that

1 Note that the term in the bracket is the ratio of the geometric - [MDL(q) - MDL(k)] mean to the arithmeticmean of the smallest p -.k eigenvalues. N The number of free parameters in dk)is obtained by count- ing the number of degrees of freedom ofthe space spanned by dk). Recalling that the eigenvalues of a complex covariance i=k+l = log matrix are real but that the eigenvectors are complex, it fol- q-k lows that dk)has kt1 t'2pk parameters. However, not all of the parameters are independently adjusted;the eigenvectors i=k+l are constrained to have unit norm and to be mutually orthog- onal. This amounts to reduction of 2 k degrees of freedom due to the normalization and 2&k(k- 1) degrees of freedom due to the mutua1.orthogonaIization. Thus, we obtain t log (number 'of free adjusted parameters)

= k t 1 t 2pk - [ik(k- l)] = k(2p - k) t 1. (15) The form of AIC for thisproblem is therefore given by

k)N

Since the eigenvalues of the sample-covariance matrixlj (i = 1, + * * ,4)are consistent estimates of the eigenvalues of the true t 2k(2p - k) covariance matrix hi, it follows that in the large-sample limit the eigenvalues li (i = k t 1, - . ,q) are not ali equal with probability one. Therefore, by the arithmetic-meangeometric- mean inequalityit follows that in thelarge-sample limit while the MDL criterion is given by

This implies that the first term in (18) is negative with prob- ability one in' thelarge-sample limit'. Similarly, by the general- ized arithmetic-mean geometric-mean inequality

w,A1 + w4,>AY'A,W2 w1t wa = 1 (20) 1 + - k(2p - k)log N. 2 it follows that in thelarge sample limit 390 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING,VOL. ASSP-33, NO. 2, APRIL 1985

zero. Hence, the AIC tends to overestimate the number of signals q in the large-sample limit. We should note that from the analysis above it follows that any criteria of the form

-log f(Xl4) t Q(N)k (23)

where or(N) + 00 and or(N)/N+ 0 yields a consistent estimate This implies that the second term in (1 8) is also negative with of the number of signals. probability one in the large-sample limit. Now, since the last term in (18) goes to zero as the sample size increases, it fol- VI. SIMULATIONRESULTS lows that the difference [MDL(q)- MDYk)] is negative with In this section we present simulation results that illustrate probability one in the large-sample limit for k < q. the performance of our method when applied to sensor array Taking now k > 4, it follows from (17) that processing.By the well-known dualitybetween spatial fre- 2 [MDL(k)- MDL(q)] quency and temporal , these examples can also be interpreted in the context of harmonic retrieval. The exam- = (k - 4)(2p - k - 4) log N ples refer to a uniform linear array of p sensors with 4 incoher- ent sinusoidal plane waves impinging from directions {G1, . . . , Gq}. Assuming that the spacing between the sensors is equal to half the wavelength of the impinging wavefronts, the vector of the received signal at the arrayis then given by

9 x(t) = 2 A(G~) t n(t) (24a) k=1 where A(&) is the p X 1 “directionvector” of the kth wavefront. A($~)T= [le-jTk . . . e-j(q-l)Tk 1 (24b) with

Note that the terms in the curly bracket are twice the log- 7k = 77 sin q5k (24c) likelihoods of the maximum likelihood estimator under the and hypotheses that the rank of \k is q and k, respectively. Thus,

their difference is the likelihood-ratio for deciding between Q( e) = random uniformly distributedon (0,277)

these two hypotheses. From the general theory of likelihood n( 0) = vector of white noise with mean zero and covariance ratios (see, e.g., Cox and Hinkley [8]) it follows thatthe u21. Notethat this model is a specialcase of the general asymptotic distribution of this statistic is x’ with number of model presented in (1). degrees of freedom equal to the difference of the dimensions In the first example, we considered an array with seven sen- of the parameter spaces under the two hypothesis,i.e., sors (p= 7) and two sources (4 = 2) with directions-of-arrivals 20” and 25”. The signal-to-noise ratio, defined as 10 log 1/2u2, [WP- k) + 11 - M2P - 4) + 11 was 10 dB. Using N= 100 samples, the resulted eigenvalues of = (k - q)(2p - k - 4). the sample-covariance matrix were 21.2359, 2.1717, 1.4279, 1.0979, 1,0544, 0.9432, and 0.7324. Observing thegradual Thus, as the sample size increase, the probability that the term decrease of the eigenvalues it is clear that the separation of the in the curly bracket in (20) exceeds the first term in (20) is 3 “smallest” eigenvalues from the 2 “large” ones is a difficult given by the area in the tail from (k - 4)(2p - k - q)log N of task in which a naive approach is likely to fail. However, ap- the mentioned x2 distribution with (k - q)(2p - k - 4) de- plying the new approach we have presented above yielded the grees of freedom. Since the area in this tail approaches zero as following values for the AIC and MDL. the sample size increases, it follows that in the large-sample 0 1 2346 5 limit the difference [MDL(k)- MDL(q)] is positive with prob- ability one for k > q. Combining this with theprevious result AIC 1180.8100.5 71.4 75.5 96.093.2 86.8 MDL 66.9590.467.2 80.7 105.295.5 110.5. for k < q, it follows that MDL(k) has a minimum at k = q. Repeating the above arguments forthe AIC, it follows The minimum of both the AIC and the MDL is obtained, as that in the large-sample limit and for k 4, the difference [AIC(k) - AIC(q)] has nonzero prob- the scenario described in the first example. The eigenvalues of ability to be negative even in the large-sample limit, since the the sample-covariance in this cpe were 15.5891, 8.1892, tail from (k - 4)(2p - k - q t 1) of the x2 distribution with 1.4715,1.1602, 1.0172, 0.9210, and 0.6528. The resulting (k - q)(2p - k - 4 t 1) degrees’ of freedom is definitely not values of the AIC and theMDL were as follows. ‘&‘AX AND KAILATH: DETECTION OF SIGNALS BY INFORMATION THEORETIC CRITERIA 391

0 1 2346 5 where Zl (on)2 - - * > lP(wn)are the eigenvalu? of the peri- odogram estimateof the spectral-density matrixK(wn). AIC 1011.4 562.1 82.9 83.2 90.4 95.9 96.0 MDL 505.7 298.0 72.7 84.6 97.3 106.5 110.5. When the signals are wideband, namely,occupy several frequency bins, say, 01,. . - , OZ+M, the information on the In this case, the minimum value of both the AIC and the MDL number of signals is contained in all the M frequency bins. As- is obtained, incorrectly, for 4 = 2. The failure of the AIC and suming that the observation time is much larger than the cor- MDL in this case is because of the inherent limitations of the relation times of the signals, it follows that the Fourier coeffi- problem.Indeed, repeating the examplewith signal-to-noise cients corresponding to different are statistically ratio of 10 dByielded the following eigenvalues 14.5595, independent (see, e.g., Whalen [26, p. 811). The AIC for de- 6.7786,0.3786,0.1372,0.1109,0.0946, and 0.0687, and con- tecting the number of wide-band signals that occupy the fre- sequentially the following values of the AIC and the MDL. quency band wz,e * . , WI+M, is given by the sum of (27) over 0 3 1 2 4 5 6 the frequency of interest

AIC 2731.6 1960.6 241.4 91.690.7 96.095.1 MDL 1365.8 997.2 152.0 88.4 97.9 106.2 110.5 Now the minimum of both the AIC and the MDL is obtained, correctly, for q = 3.

VII. EXTENSIONTO THE FREQUENCYDOMAIN i- 2M[k(2p - k)]. (2 8) The starting point of our approachwas the time domainrela- tion (1). However, in certain cases, especially in array process- Thecorresponding expression for the MDL criterion can be ing, the frequency domain is more natural. As we shall show, similarly derived. our approach is easily extended to handle frequency-domain VIII. CONCLUDING REMARKS observations. A new approach to the detectionof the number of signals in Consider the frequency domain analog of the a multichannel has been presented. The approach model (1); the observed vector is a Fourier coefficients vector is based on the application of the AIC and MDL information expressed by theoretic criteria for model selection. Unlike the conventional 4 hypothesis test based approach, thenew approach does not re- quire any subjective threshold settings; the number of signals is dwn)= A(mn 3 ai) si(wn) + n(wn> (25) i= 1 determined merely by minimizing either the AIC or the MDL where criterion. It has been shown that the MDL criterion yields a consistent si(wn)= theFourier coefficient of the ith signal atfre- estimate of the number of signals, while the AIC yields an in- quency on, consistent estimate that tends, asymptotically, to overestimate A(w,, ai) = a p X 1complex vector, determined by the the number of signals. parameter vector Qii associated with the ith signal The detection problem addressed in this paper is usually a n(wn)= a p X 1 complex vector of the Fourier coefficients part of a more complex combined detection-estimation prob- of the additive noise. lem, where one wants to estimate the number as well as the The problem, as in the time domain, is to estimate the num- parameters Q1, * * . , Qi,of the signals in (1). Because of the ber of signals 4 from L samples of the Fourier-coefficients vec- computationalcomplexity involved, the problem is usually torxl(on), ,xL(an)- solved in two steps: first the number of signals is detected, and By the well-knownanalogy between multivariate analysis then, with an estimateof the number of signals 6 at hand, the and time-series analysis (see,e.g., Wahba [23]), the time- parameters of the signals are estimated. It should be pointed domain approach carries over to the frequency domain with out, however, that in principle the AIC and MDL criteria can the role of the sample-covariance matrix played by the peri- be applied tothe combineddetection-estimation problem odogram estimate of thespectral density matrix, given by [26]. Theevaluation of the AIC and MDL in this case in- ;elves th% computationof the maximum likelihood estimates 4 1, . * * , @k, for every possible number of signals k E (0, 1, * * . ,p - 1). Solving this highly nonlinear problem for all Val- Thus, the frequency-domainversion of the AIC is givenby uesof k is computationally very expensive. Nevertheless, if one is ready to pay the price, the gain in term of performance, fV especially in difficult situationssuch as low signal-to-noise ratio, small sample size, or closely “spaced” signals, may be significant. REFERENCES [ 11 H. Akaike, “Information theory and an extension of the maxi- mum likelihoodprinciple,” in hoc. 2nd Int. Symp. Inform. 392 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-33, NO. 2, APRIL 1985

Theory, suppl. Problems of Control and Inform. Theory, 1973, [25] M. Wax, T. J. Shan, and T. Kailath,“Spatio-temporal spectral pp. 267-281. analysis by eigenstructure methods,” IEEE Trans. Acoust., [ 21 -, “A new look at the identification,” IEEE Speech, Signal Processing, vol. ASSP-32, pp. 817-827,1984. Trans. Automat. Contr., vol. AC-19, pp. 716-723, 1974. [26] M. Wax, “Detectionand estimationof superimposed signals,’’ [3] T. W. Anderson,“Asymptotic theoryfor principal component Ph.D. dissertation, Stanford Univ., Stanford, CA, Mar. 1985. analysis,”Ann. J. Math. Stat., vol. 34,, pp. 122-148, 1963. [27] A.D. Whalen, Detection of Signals in Noise. New York: Aca- [4] M:S. Bartlett, “A note on the multiplying factors for various x2 demic, 1971. approximations,” J. Roy. Stat. SOC.,ser. E, vol. 16, pp. 296-298, 1954. [5] G. Bienvenu and L. Kopp, “Adaptivity to background noise spa- tialcoherence for high resolution passive methods,” in Proc. ICASSP’80, Denver, CO, 1980, pp. 307-310. [6] T.P. Bronez and J. A. Cadzow, “An algebraic approach to super resolution array processing,” IEEE Trans. Aerosp.Electron. Syst., VO~.AES-19, pp. 123-133, 1983. [7] A. Bruckstein, T. J. Shan, and T. Kailath,“The resolution of overlapping echoes,” IEEE Trans. Acoust., Speech, Signal Pro- cessing, submitted for publication. [8] D. R. Cox and D. V. Hinkley, Theoretical . London, England: Chapman and Hall, 1974. [9] E. J. Hannan and B. G. Quinn, “The determination of the order of an autoregression,” J. Roy. Stat. Soc., ser. E, vol. 41, pp. 190- 195,1979. [lo] D. H. Johnsonand S. R. Degraff, “Improving the resolution of bearing in passive sonararrays by eigenvalue analysis,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 638- 647,1982. [ 111 R. Kumaresan and D. W. Tufts,“Estimating the parameters of exponentially damped sinusoids and pole-zero modeling in noise,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 833-840,1983. [ 121 -, “Estimation of arrival of multiple plane waves,” IEEE Trans. Aerosp. Electron. Syst., vol. AES-19, pp. 123-133, 1983. [13] S. Y. Kung and Y. H. Hu, “Improved Pisarenko’s sinusoidal spec- trum estimate via the SVD subspace approximationmethod,” inProc. 21 CDC, Orlando, FL, 1982,pp. 1312-1314. [14] D. N. Lawley, “Tests of significance of the latent roots of the covariance andcorrelation matrices,” Biometrica, vol. 43,pp. 128-136,1956. [ 151 N. L. Owsley, “Spectral signal set extraction,” Aspects of Signal Processing: Part II, G. Tacconi, Ed., Dordecht, Holland: Reidel, pp. 469-473,1977. [16] V. F. F’isarenko, “The retrieval of from a covariance function,” Geophys. J. Roy. Astron. Soc., vol. 33, pp. 247-266, 1973. [ 171 J. Rissanen, “Modeling by shortest datadescription,” Automatica, VO~.14, pp. 465-471,1978. [18] -, “Consistent order estimation of autoregressive processes by shortest description of data,” Analysis and Optimization of Sto- chastic Systems, Jacobs et al. Eds. New York: Academic, 1980. [19] -, “A universal prior for the integers and estimation by mini- mum descriptionlength,”Ann. Stat.,vol. ll,pp.417-431, 1983. I201 R. 0. Schmidt, “Multiple emitter location and signal parameter estimation,” in Pvoc. RADC Spectral Estimation Workshop, Rome, NY, 1979, pp. 243-258. [21] G. Schwartz, “Estimating the dimension of a model,” Ann. Stat., VO~.6, pp. 461-464,1978. [22] G. Su and M. Morf, “The signal subspace approach for multiple emitterlocation,” in Proc. 16th Asilomar Con$ Circuits Syst. Comput., Pacific Grove, CA, 1982, pp. 336-340. [23] G. Wahba, “On the distribution of some statistics useful in the analysis of jointly stationary time-series,” Ann. Math. Stat., vol. 39, pp. 1849-1862,1968. [24] M. Wax, R. 0. Schmidt, and T. Kailath, “Eigenstructure method for retrieving the poles from the naturalresponse,” in Proc. 22nd CDC, San Antonio, TX, 1983, pp. 1343-1344.