Signal Processing 74 (1999) 253—277

On the efficient use of Givens rotations in SVD-based subspace tracking algorithms

Philippe Pango*, Benoît Champagne

INRS-Télécommunications, Université du Québec, 16 place du Commerce, Verdun, Québec, Canada H3E 1H6

Received 28 July 1997; received in revised form 23 October 1998

Abstract

In this paper, the issue of the efficient use of Givens rotations in SVD-based QR Jacobi-type subspace tracking algorithms is addressed. By relaxing the constraint of upper triangularity on the singular value matrix, we show how even fewer Givens rotations can achieve a better diagonalization and provide more accurate singular values. Then, we investigate the efficient use of Givens rotations as a vector rotation tool. The cancellation of cross-terms is presented as an efficient signal/noise separation technique which guarantees a better updating of the subspace bases. Regarding the choice between inner and outer rotations, we properly use the permutation properties of Givens rotations to maintain the decreasing ordering of the singular values throughout the updating process and analyze the consequences on the tracking performance of QR Jacobi-type algorithms. Finally, based on the developed theory, we propose two efficient subspace tracking algorithms which outperform existing QR Jacobi-type algorithms. Comparative simulation experiments validate the concepts. © 1999 Elsevier Science B.V. All rights reserved.



* Corresponding author. Tel.: 514-761-8635; fax: 514-761-8501; e-mail: [email protected]

0165-1684/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0165-1684(98)00215-1

Keywords: Subspace tracking; Invariant subspace updating; Singular-value decomposition; Givens rotation; Cross-terms; Direction of arrival

Notation

α          angle of arrival
A          data matrix
ΔV_S(k)    time variation of the signal subspace
Δ_l        convergence step induced by the lth rotation
γ          efficiency of Givens rotations (EGR)
G^{i,j}_θ  Givens rotation in the i-j plane with angle θ
k          time index
λ          forgetting factor
M          number of pairs of rotations at the refinement step
N          number of sensors
off[·]     off-norm of its argument
r          number of sources
σ_i        ith singular value
σ_N        average noise singular value
R          singular value matrix
U          left singular vectors
V          right singular vectors
ω          electrical angle
x          measurement vector
y          update vector, i.e. projection of x on V
y_i        update vector after the ith QR rotation

1. Introduction

In the application of subspace methods to array signal processing, projection techniques like MUSIC [25], root-MUSIC [1], ESPRIT [24] or minimum-norm [15] are often used to estimate the signal parameters. These methods require subspace information, which can be provided by various decompositions, including the eigenvalue decomposition (EVD) of the sample correlation matrix, the QR factorization [2] or the singular-value decomposition (SVD) of the data matrix. In the case of non-stationary signals, these decompositions need to be updated each time new information is provided by the most recent measurements. The quality of the updating process affects the accuracy of the parameter estimates, and its low complexity may allow a real-time implementation. These considerations have led to intensive research on the development of so-called subspace tracking algorithms. Beginning with Owsley [18], many adaptive subspace tracking algorithms have been developed; they can be grouped into families depending on the specific technique they use: stochastic gradient [35], recursive least squares [34], perturbation approach [3], to quote a few. The introduction in [5] is a good source of references concerning the development of subspace tracking algorithms.

Generally, SVD-based subspace tracking consists in computing the SVD of a data matrix of growing dimension, defined recursively as

    A(k) = \begin{bmatrix} \sqrt{\lambda}\, A(k-1) \\ x^H(k) \end{bmatrix},   (1)

where k is the discrete time index, 0 < λ < 1 is the forgetting factor, and x(k) ∈ C^N is the measurement vector at time index k. The SVD of A(k) can be expressed as

    A(k) = U(k) R(k) V(k)^H,   (2)

where V(k) is an N×N unitary matrix, U(k) is a k×N matrix with orthonormal columns and R(k) is an N×N diagonal matrix. Usually, the left singular vectors U(k) need not be tracked to provide the subspace information. So, one is interested in tracking only the singular values {σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_N} along the diagonal of R(k) and the right singular vectors, i.e. the columns of V(k). Computing this SVD from scratch at each iteration is computationally expensive and time-consuming. The main issue consists then in using the information contained in the measurement x(k) to update the previous decomposition obtained at time k−1.

In this work, we focus our attention on a specific family of SVD-based subspace tracking algorithms which we define as QR Jacobi-type algorithms [12,17,23]. QR Jacobi-type algorithms make intensive use of Givens rotations in the subspace updating process, the main matrix operations consisting in products with Givens rotations; thus, they track the subspaces with a low complexity and a rather simple formulation. Also, Givens rotations have the advantage of maintaining the orthonormality of matrices, assuming infinite precision. The direct consequence is that QR Jacobi-type algorithms are good candidates for implementation in a rather regular structure, e.g. using CORDIC processors [27,32].

While the exact SVD-based tracking of V(k) and R(k) requires O(N³) operations at each iteration, Moonen et al. have proposed a QR Jacobi-type algorithm with a complexity of O(N²) operations per update [17]. In this paper, we shall refer to Moonen's algorithm as MA. Kavcic and Yang [12] introduced a noise-sphericalized version of MA, the noise average SVD (NASVD), which updates a sphericalized URV decomposition [30]; it reduces the complexity to O(Nr) by tracking only an r-dimensional signal subspace. Based on the same structure, Rabideau [23] proposed an algorithm of similar complexity, the refinement-only fast subspace tracking (RO-FST), which considerably improves the tracking performance. Let us note that Stewart's algorithm for the updating of the rank-revealing URV [30] is also a QR Jacobi-type algorithm, which has the advantage of tracking the number of sources r.

The effectiveness of the above algorithms in subspace tracking and/or parameter estimation has already been validated through simulations. Yet many questions remain unanswered regarding these algorithms. Since they use Givens rotations exclusively throughout the updating process, one must analyze the efficiency of these rotations on the singular value and singular vector tracking performance, and also on the accuracy of the signal parameter estimates they provide.

Regarding the singular values, QR Jacobi-type algorithms usually track an approximate SVD decomposition (e.g. URV) in which the singular value matrix R(k) is given a specific structure; then, Givens rotations are applied throughout the updating algorithm in order not to destroy that specific structure. It might be argued that such a structural constraint introduces limitations in the way Givens rotations could be used to achieve a better updating of the singular values. For example, the above QR Jacobi-type algorithms impose R(k) in Eq. (2) to be upper-triangular. The efficiency of Givens rotations under this specific structure (and eventually others) has not been investigated yet. Therefore, one notes that the above algorithms use Givens rotations in a way which is not proved to be optimal.

Regarding the singular vectors, QR Jacobi-type algorithms use Givens rotations to rotate the approximate singular vectors in the direction of the exact ones. This re-orientation of the singular vectors is performed to update a subspace basis which inputs MUSIC-like estimators and provides the desired signal parameter estimates. The accuracy of the parameter estimates is linked to the ability of the Givens rotations to rotate the proper singular vectors in the proper direction. Here, the issue consists then in the choice of the appropriate vectors to be rotated in order to provide the best update of the subspace basis. Note that in most existing QR Jacobi-type algorithms, this issue is not specifically addressed.

Finally, a general issue is the ordering of the singular values throughout the updating process. It is generally accepted that the ordering of the singular values can be controlled by the choice of the type of rotations: inner or outer. It is also known that the choice of the type of rotations has a major effect on the tracking performance of QR Jacobi-type algorithms; the analytical demonstration of the link between the type of rotations and both the ordering of the singular values and the tracking error of QR Jacobi-type algorithms needs to be clearly stated.

The purpose of this paper is to investigate the efficient use of Givens rotations in order to propose new efficient SVD-based subspace tracking algorithms. To diagonalize R(k) efficiently, we remove the constraint of upper triangularity of R(k) and propose an updating structure in which the Givens rotations are dynamically applied in various sub-regions of R(k), allowing a robust response to highly non-stationary cases. We analyze the direct effect of Givens rotations on the subspace basis tracking error and introduce the concept of cross-terms cancellation as an efficient signal/noise subspace separation technique. We provide a solution to the question of the ordering of the singular values by choosing appropriately the type of each Givens rotation (inner or outer). Our conclusions regarding the structure of the decomposition, the type of Givens rotations and their efficient use enable us to validate the relative performance of various existing QR Jacobi-type algorithms and to propose two new computationally efficient QR Jacobi-type subspace tracking algorithms. They are formulated for both cases of partial and complete tracking, i.e. signal+noise subspace tracking and signal subspace-only tracking, with respective complexity O(N²) and O(Nr), where r is the number of signal sources. The new algorithms outperform existing QR Jacobi-type algorithms, even with fewer Givens rotations.

The paper is organized as follows. In Section 2, the efficient use of Givens rotations to diagonalize R(k) and track the subspace basis vectors is addressed; also, the choice between inner and outer rotations and its effect on the tracking error and the ordering of the singular values is analyzed. Section 3 presents the proposed algorithms, and Section 4 shows the results of computer experiments which validate the new concepts and algorithms in various tracking scenarios. Finally, concluding remarks are given in Section 5. Throughout the paper, ‖·‖ is the 2-norm of its argument.

2. Optimizing the use of Givens rotations

Referring to Eq. (2), the convergence of an SVD-based subspace tracker to the exact SVD of A(k) is closely related to two matters: the ability of the algorithm to diagonalize R(k), and then, to rotate the approximate singular vectors in the direction of the exact ones. Indeed, a good diagonalization of R(k) ensures the correctness of the singular values, while an appropriate rotation of the estimated singular vectors moves them close to the direction of the exact ones. The Givens rotation is a tool that can achieve both tasks simultaneously if it is judiciously used. In this section, we first present some preliminaries, after which we discuss the issue of the efficient use of Givens rotations for the tracking of the singular values; then, we do the same for the singular vectors and, finally, we investigate the type of rotations to be used, i.e. inner or outer, and its consequence on the tracking performance.

2.1. Preliminaries

Let us first introduce the appropriate notation for the Givens rotation, which is the main tool QR Jacobi-type algorithms use. We recall that a Givens rotation differs from an identity matrix only at four entries, as in

    G^{i,j}_θ = \begin{bmatrix} I_{i-1} & & & & \\ & c & \cdots & s^* & \\ & \vdots & I_{j-i-1} & \vdots & \\ & -s & \cdots & c & \\ & & & & I_{N-j} \end{bmatrix},   (3)

where I_n is the n×n identity matrix and |c|² + |s|² = 1. In the real case, we have c = cos θ and s = sin θ, where θ is the angle of the rotation. Note that a complex Givens rotation G_C can be expressed as the product of a real Givens rotation G with a phasor P, as in G_C = GP; accordingly, in this paper we may use the real notation without any loss of generality.

Referring to Eqs. (1) and (2), one would like to approximate the decomposition of the data matrix A(k). To update the singular-value matrix R and the right singular vectors V from time k−1 to k, QR Jacobi-type subspace tracking algorithms essentially proceed in two steps, namely the QR step and the refinement step.

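As a quick illustration of Eq. (3) and of the constraint |c|² + |s|² = 1, here is a small numpy sketch for the real case (the helper name `givens` is ours, not the paper's): it builds G^{i,j}_θ, which differs from the identity only at the four entries (i,i), (i,j), (j,i), (j,j), and uses it to zero one component of a vector while preserving its 2-norm.

```python
import numpy as np

def givens(N, i, j, theta):
    """Real Givens rotation G^{i,j}_theta of Eq. (3): equal to I_N except
    at the four entries (i,i), (i,j), (j,i), (j,j)."""
    G = np.eye(N)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[i, j] = s
    G[j, i] = -s
    G[j, j] = c
    return G

# Zeroing the third component of a vector against the first one:
v = np.array([3.0, 0.0, 4.0])
theta = np.arctan2(v[2], v[0])   # angle such that the rotated v[2] vanishes
G = givens(3, 0, 2, theta)
w = G @ v                        # row rotation; w = [5, 0, 0]
```

Since G is orthonormal, the rotation redistributes energy between the two selected components without changing the vector's 2-norm, which is exactly the property the QR and refinement steps below rely on.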
The QR step applies N Givens rotations to zero each entry of the measurement vector's projection on the observation space, i.e. y(k) = V(k−1)^H x(k); the updated matrix R(k) is obtained by dropping the last (zeroed) row:

    \begin{bmatrix} R(k) \\ 0^H \end{bmatrix} = [G^{N,N+1}_{θ_N}]^H \cdots [G^{2,N+1}_{θ_2}]^H [G^{1,N+1}_{θ_1}]^H \begin{bmatrix} \sqrt{\lambda}\, R(k-1) \\ y(k)^H \end{bmatrix}.   (4)

Regarding the update vector y(k), note furthermore that one may either pre-multiply each right singular vector by an appropriate phase term to make y(k) real [26] and maintain a real singular value matrix R(k), or work directly with complex Givens rotations. In the latter case, phase terms have to be interleaved with Givens rotations to maintain the singular values real along the diagonal of R(k). In the sequel, one may assume that the components of y(k) and R(k) are real without any loss of generality.

After the QR step, the refinement step intends to concentrate the energy of R(k) along its diagonal. This 'diagonalization' step, sometimes called SVD step or refinement, exists in different flavours in the literature. The refinement usually consists in a series of M pairs of Givens rotations applied on both sides of R(k) [12,17], as in

    R(k) ← [G^{i_l,j_l}_{φ_l}]^H R(k) G^{i_l,j_l}_{θ_l}   (5)

for l = 1:M, and the corresponding accumulation of the right rotations on the singular vectors

    V(k) ← V(k) \prod_{l=1}^{M} G^{i_l,j_l}_{θ_l}.   (6)

In Eq. (5), the rotation angles θ_l and φ_l are computed so as to zero the (i_l, j_l) and (j_l, i_l) entries of R(k), which we denote as σ_{i_l j_l} and σ_{j_l i_l}; we call σ_{i_l j_l} the pivot of the rotations G^{i_l,j_l}_{φ_l} and G^{i_l,j_l}_{θ_l}. For each (i_l, j_l) pair of entries to be zeroed at the refinement step, the angles θ_l and φ_l are the solutions of the SVD (diagonalization) of a 2-by-2 matrix, as in

    \begin{bmatrix} \cos φ_l & \sin φ_l \\ -\sin φ_l & \cos φ_l \end{bmatrix}^H \begin{bmatrix} σ_{i_l i_l} & σ_{i_l j_l} \\ σ_{j_l i_l} & σ_{j_l j_l} \end{bmatrix} \begin{bmatrix} \cos θ_l & \sin θ_l \\ -\sin θ_l & \cos θ_l \end{bmatrix} = \begin{bmatrix} σ_{i_l i_l} & 0 \\ 0 & σ_{j_l j_l} \end{bmatrix}.   (7)

Details for the computation of θ_l and φ_l can be found in [16]. Note that the rotations can be either inner or outer. Later in this paper, we shall address the issue of the choice of the type of rotations. In MA, we have M = N−1, so that the overall complexity of the updating algorithm is O(N²).

Kavcic's NASVD [12] is the sphericalized version of MA. It lowers the updating complexity by tracking the following decomposition:

    A(k-1) = U(k-1) \begin{bmatrix} \tilde{R}(k-1) & 0 \\ 0 & σ_N(k-1)\, I_{N-r-1} \end{bmatrix} V(k-1)^H.   (8)

Here \tilde{R}(k−1) is an upper-triangular square matrix of dimension r+1; its diagonal contains the r signal singular values and one average noise singular value σ_N(k−1), which is kept at the (r+1, r+1) position. The above decomposition is updated in four steps: a Householder transformation, which rotates the noise subspace basis so that the projection of the incoming data x(k) on it has a single non-zero component [11]; a QR step; a refinement step not different from Eqs. (4)-(6), but requiring fewer Givens rotations, i.e. M = r; finally, a re-estimation of the average noise singular value. The complexity of NASVD is O(Nr).

2.2. A matter of diagonalization

In this section, we would like to develop updating algorithms which diagonalize the singular value matrix R(k) efficiently, i.e. with M not exceeding N−1 pairs of Givens rotations at the refinement step, as in MA. In this case, the overall complexity of the algorithm is expected not to be higher than O(N²). Below, we show that the solution to the efficient diagonalization of R(k) consists in choosing appropriately the (i_l, j_l) pivots in Eq. (5).

2.2.1. An appropriate structure for R(k)

In QR Jacobi-type algorithms, the main goal of the refinement step is the reduction of the off-norm of R(k), which is defined by

    off[R(k)] = \sum_{1 \le i \ne j \le N} |σ_{ij}|².   (9)
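To make the refinement pair of Eqs. (5)-(7) and the off-norm of Eq. (9) concrete, here is a small numpy sketch; the function names, and the use of numpy's 2×2 SVD to obtain the two rotation angles, are our assumptions rather than the paper's implementation.

```python
import numpy as np

def off_norm(R):
    """off[R] of Eq. (9): sum of squared off-diagonal magnitudes."""
    return np.sum(np.abs(R) ** 2) - np.sum(np.abs(np.diag(R)) ** 2)

def refinement_pair(R, V, i, j):
    """One refinement pair (Eqs. (5)-(7)): zero R[i,j] and R[j,i] by
    diagonalizing the 2x2 pivot block with a two-sided rotation, and
    accumulate the right rotation on V (Eq. (6))."""
    N = R.shape[0]
    B = R[np.ix_([i, j], [i, j])]
    U2, _, V2t = np.linalg.svd(B)          # 2x2 SVD yields both angles
    Gl = np.eye(N); Gl[np.ix_([i, j], [i, j])] = U2
    Gr = np.eye(N); Gr[np.ix_([i, j], [i, j])] = V2t.T
    return Gl.T @ R @ Gr, V @ Gr

# Toy example: zeroing the (0,1)/(1,0) pivot pair of an almost-diagonal R
R0 = np.array([[3.0, 0.5, 0.1],
               [0.4, 2.0, 0.2],
               [0.0, 0.3, 1.0]])
R1, V1 = refinement_pair(R0, np.eye(3), 0, 1)
```

The off-norm drops by exactly |σ_{01}|² + |σ_{10}|² = 0.5² + 0.4², the two-sided-rotation counterpart of Eq. (10) below; the left rotation need not be accumulated, since U(k) is not tracked.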

Let us reconsider the refinement step as described in Eq. (5) and omit temporarily the rotation index l. After the pair of rotations [G^{i,j}_φ]^H and G^{i,j}_θ are applied to the left and the right of R(k), respectively, its off-norm is reduced as [9,10]

    off[R(k)] ← off[R(k)] − (|σ_{ij}|² + |σ_{ji}|²).   (10)

Thus, the larger σ_{ij} and σ_{ji}, the better the diagonalization. This observation led us to investigate the effect of selecting the off-diagonal entry of R(k) with the largest magnitude before each pair of rotations is applied, so that in Eq. (5) the rotations always zero the largest pivots. We call this technique a 'Maximum Search' (MS) procedure. In agreement with Eq. (10), and as observed experimentally in Section 4, such a technique achieves a better off-norm reduction, and thus a better tracking of the singular values.

Note that unlike other QR Jacobi-type algorithms, the MS procedure does not maintain the upper-triangular structure. Indeed, since off-diagonal maxima can be located anywhere in R(k), the MS procedure applies Givens rotations anywhere, and not necessarily on the second diagonal of R(k) as in conventional QR Jacobi-type algorithms. This causes non-zero elements to be reintroduced in the lower part of R(k), breaking the upper-triangular structure. To reduce the off-norm efficiently by allowing Givens rotations to be applied anywhere in R(k), we need to consider a different type of approximate decomposition, still expressed as in Eq. (2), but where R(k) is now almost diagonal and not specifically upper-triangular. We refer to this decomposition as a UXV, where X stands for the singular-value matrix R(k), which has no particular structure but is almost diagonal. Like the URV [30], the UXV is simply an approximate SVD, but one in which the constraint of upper triangularity of R has been relaxed to provide more flexibility to Givens rotations.

As far as the SVD is concerned, the most important attribute for R(k) is not to be upper-triangular as required by conventional QR Jacobi-type algorithms, but to have the smallest possible off-norm. On the one hand, applying rotations as conventional QR Jacobi-type algorithms do, i.e. preserving the upper-triangular structure, is not proved to be the optimal way to reduce the off-norm of R(k) under the constraint of a fixed number of Givens rotations. On the other hand, the breaking of the upper-triangular structure does not interfere with the main goal of the refinement step: the reduction of the off-norm of R(k). Note also that unlike the full matrix, an upper-triangular R(k) does not allow one to restore the decreasing ordering of the singular values at any time, even by using permutations.

As a consequence of the UXV decomposition, the MS technique does not guarantee that the standard QR step in Eq. (4) zeroes the incoming vector y(k) totally in a fixed number of Givens rotations. More particularly, the ith QR rotation G^{i,N+1}_{θ_i} in Eq. (4) zeroes the ith entry of y(k) and modifies only the ith and the (N+1)th rows, as in

    [ith row] ← cos θ_i [ith row] − sin θ_i [(N+1)th row],   (11)
    [(N+1)th row] ← cos θ_i [(N+1)th row] + sin θ_i [ith row].   (12)

As a consequence of Eq. (12) being repeated N times for i = 1, …, N in Eq. (4), it can be verified that after the QR step, the entries of the last row are of the same order of magnitude as the off-diagonal entries of R(k). This implies that after the QR rotations have been applied, the diagonalization error due to the entries of the (N+1)th row is much smaller than the total error due to all the off-diagonal entries of R(k), assuming N is reasonably large. Also, one should notice that in practical applications, the data matrix A(k) itself is noise-corrupted; therefore, depending on the SNR and the forgetting factor λ, the effect of setting to 0 ε-sized terms (where ε is the order of magnitude of the off-diagonal entries of R(k)) might be masked by the statistical fluctuations resulting from the noise.

Based on the above considerations, as an updating procedure for the UXV decomposition, we propose to drop the entries of the last row once the N QR Givens rotations have been applied. Even though this UXV-based QR step does not zero y(k)^H completely and does not lead to an information-preserving updating procedure, we still refer to such an operation as a QR step to maintain a compatibility of terminology with existing algorithms.

In summary, assuming a UXV decomposition, an MS refinement appears as an efficient off-norm reduction procedure. Our viewpoint is that what is lost at the QR step is meaningless compared to the improvement of performance due to the 'maximum-search' diagonalization procedure. The removal of the upper-triangular constraint on R(k) leads effectively to improved subspace tracking algorithms, as observed experimentally in Section 4.

2.2.2. Locating efficiently the largest pivots

At the refinement step, the location of the largest off-diagonal entry of R(k) cannot be determined a priori from theoretical considerations alone. Indeed, after the QR step, the largest off-diagonal entry of R(k) can be located anywhere, and its position changes within R(k) after each pair of rotations is applied. The only way to obtain the maximum location is by an exhaustive search. Locating the largest entry in an N×N array involves N² comparison tests. Since the MS procedure requires the maximum search to be performed prior to each of the M pairs of rotations at the refinement step, this leads to an unacceptable O(N³)-complexity updating algorithm. (Recall that M is O(N).) In fact, even if all the entries of R(k) are maintained in a separate sorted table, each refinement rotation in Eq. (5) modifies 8N−4 entries (2 rows and 2 columns) which have to be re-inserted in the table; this still leads to an O(N² log N)-complexity bottleneck. Therefore, one needs to find a way to perform the maximum search at a lower complexity.

To efficiently reduce the off-norm of R(k), the key idea is to predict the sub-region of R(k) where the off-norm is expected to increase most significantly after the QR step; if the search region contains O(N) entries, e.g. a row, an MS-like procedure can be performed locally with a reduced complexity. The main issue then is the accurate detection of the sub-region of R(k) where the QR step causes the largest norm increase. In this paper, we propose an approach which consists in estimating the off-norm increase in each row of R(k) after the QR step and distributing dynamically the refinement rotations between all the rows.

If σ_1 ≥ σ_2 ≥ ⋯ are the singular values of an M×N matrix B listed in decreasing order, one defines the effective rank of B as r if σ_r > ε > σ_{r+1}, where 1 ≤ r ≤ min(M,N) and ε is a specified bound [14]. Let r denote the effective rank of A(k). We define the following submatrices:

    U(k) = [U_S(k)  U_N(k)],   V(k) = [V_S(k)  V_N(k)],   (13)

where U_S(k) = [u_1, …, u_r] ∈ C^{k×r} and V_S(k) = [v_1, …, v_r] ∈ C^{N×r}. Let us also rewrite the almost diagonal singular-value matrix as

    R(k) = \begin{bmatrix} R_S(k) & R_{SN}(k) \\ R_{NS}(k) & R_N(k) \end{bmatrix},   (14)

where R_S(k) ∈ R^{r×r}, and the other blocks are defined accordingly.

The main diagonal of R(k) contains the approximate singular values, which for now are supposed to be always maintained in decreasing order: σ_1 > σ_2 > ⋯ > σ_N; later in the paper, we shall address the issue of the ordering of the singular values. In Eqs. (13) and (14), the singular values and singular vectors have thus been separated into two sets corresponding to the r largest and N−r smallest singular values; the signal and noise subspaces are the column-span of V_S(k) and V_N(k), respectively. Let y(k) = V(k−1)^H x(k) = [y_S(k)  y_N(k)]; y_S(k) and y_N(k) represent the components of the projection of the measurement vector x(k) on the signal and noise subspaces, respectively.

One easily notes that the size of the entries of y_S and y_N provides a draft estimation of the localization of the off-norm increase in R. Indeed, at the QR step the norm ‖y_S(k)‖ is transferred exclusively to R_S, while ‖y_N(k)‖ is transferred to R_{SN} and R_N. While this information is useful, it is still not complete. Searching for the largest entry in R_{SN} remains a complex operation, particularly when N ≫ r. To reduce the search region to the size of a row, let us investigate more thoroughly the off-norm increase of R(k) at the QR step.

We assume that the off-diagonal entries of R(k) are originally ε-sized before the QR step, where ε is small. Let us denote by y_i the update vector y(k) altered by the first i consecutive rotations of the QR step, with y_0 = y(k). In Eq. (4), the ith QR rotation [G^{i,N+1}_{θ_i}]^H alters only the ith and the (N+1)th rows, i.e. y_{i−1}^H, as in Eqs. (11) and (12), which we now rewrite as

    [ith row] ← [ith row] cos θ_i − y_{i−1}^H sin θ_i,   (15)
    y_i^H ← y_{i−1}^H cos θ_i + [ith row] sin θ_i.   (16)

On the one hand, the angle θ_i is computed so as to zero the ith entry of the last row y_{i−1}^H; one sees from Eq. (16) that [G^{i,N+1}_{θ_i}]^H also alters the entire (N+1)th row by adding to it a vector whose entries are the off-diagonal ε-sized entries of the ith row, multiplied by sin θ_i. On the other hand, one sees from Eq. (15) that [G^{i,N+1}_{θ_i}]^H alters the ith row by multiplying it by cos θ_i and adding y_{i−1}^H sin θ_i to it. Therefore, in R(k), the off-norm growth (ng) generated by the ith QR rotation is located in the ith row only, and can be approximated by the norm of the term which is transferred from the (N+1)th row to the off-diagonal entries of the ith row, i.e.

    ng(i) = ‖y_{i−1}(1, …, i−1, 0, i+1, …, N) sin θ_i‖.   (17)

The computation of the off-norm increase ng(i) of each row of R(k) prior to the refinement step can be used advantageously. Indeed, the refinement step has to be performed with only M pairs of Givens rotations; instead of searching for the maximum of an N-by-N array prior to each pair of rotations, one can dynamically distribute the M refinement pairs of rotations between the N−1 rows of R(k) according to the predicted off-norm growth ng(i) in each of them, i.e. more pivots are annihilated in the rows whose off-norm increases more significantly. Once the number of Givens rotations allocated to a specific row is known, e.g. l(i), an O(N)-complexity maximum search can be performed l(i) times on the ith row only, prior to each of the l(i) refinement pairs of Givens rotations. Thus, if M is of the order of N as in MA, the maximum search has a low overall complexity of O(N²).

In this paper, we propose a solution to the efficient reduction of the off-norm of R(k) which combines the above row and block-division approaches to R(k). The idea is to distribute the available M pairs of Givens rotations to each row section in each block. The whole procedure is presented in Table 1; it consists in interleaving the computation of the off-norm increase in each row section with the QR step. The dynamic distribution of the rotations is performed in two steps:
1. Estimate the off-norm growth in each row section of each block: this is performed at the QR step by computing ng_S(i), ng_{SN}(i) and ng_N(i) as in Table 1, prior to each of the N−1 first QR rotations; here, the ng's correspond to the norm of the term which is transferred from the (N+1)th row to each row section of each block.
2. In each block, distribute the Givens rotations to each row section: we compute the total off-norm increase

    NG = \sum_{i=1}^{r} ng_S(i)² + \sum_{i=1}^{r} ng_{SN}(i)² + \sum_{i=r+1}^{N} ng_N(i)²;   (18)

then, we allocate to each row section a number of pairs of Givens rotations corresponding to its contribution to NG.

In R_S, we compute the number of pairs of rotations to be applied in each row section, l_S(i) for i = 1:r. Similarly, in R_{SN}, we compute l_{SN}(i) for i = 1:r and, finally, the number of rotations in the row sections of R_N, l_N(i) for i = r+1:N, so that we have M = Σ_{i=1}^{r} l_S(i) + Σ_{i=1}^{r} l_{SN}(i) + Σ_{i=r+1}^{N} l_N(i). Once each row section has been allocated a specific number of Givens rotations, the maximum search can be performed prior to each rotation, but in the corresponding row section only. This procedure leads to a low-complexity maximum-search refinement and, as will be shown in simulation experiments, to an efficient diagonalization of R(k) even with fewer Givens rotations.

2.3. A matter of vector rotation

The above discussion has been drawn from a 'singular-value' point of view. From a diagonalization point of view, it is advantageous to choose the largest off-norm entries as pivots, as this ensures an efficient reduction of the off-norm of R(k). But, as shown below, it does not guarantee that the singular vectors are updated properly. In this section, we now turn our attention to the effect of Givens rotations on the estimated singular vectors only.

The QR Givens rotations consist in row rotations on R(k) and have no effect at all on the right singular vectors in V(k).
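The UXV-based QR step described above (Eq. (4) with the row updates (11)-(12), the last row being dropped afterwards) can be sketched as follows; the function name and the sign convention chosen for θ_i are our assumptions.

```python
import numpy as np

def qr_step(R_prev, y, lam):
    """UXV-based QR step: zero the appended row y^H of
    [sqrt(lam) * R(k-1); y^H] with N Givens row rotations (Eq. (4),
    rows updated as in Eqs. (11)-(12)), then drop the last row."""
    N = R_prev.shape[0]
    M = np.vstack([np.sqrt(lam) * R_prev, y[None, :]])   # (N+1) x N
    for i in range(N):
        a, b = M[i, i], M[N, i]
        h = np.hypot(a, b)
        if h == 0.0:
            continue
        c, s = a / h, b / h        # one valid sign convention for theta_i
        Mi, MN = M[i, :].copy(), M[N, :].copy()
        M[i, :] = c * Mi + s * MN       # ith row update
        M[N, :] = -s * Mi + c * MN      # (N+1)th row update; entry i -> 0
    # the last row is dropped; it is exactly zero only when R(k-1) is
    # exactly diagonal, and epsilon-sized otherwise, as argued in the text
    return M[:N, :]

# Toy update with an exactly diagonal R(k-1):
R_prev = np.diag([3.0, 2.0, 1.0])
y = np.array([0.5, -0.2, 0.1])
Rk = qr_step(R_prev, y, lam=0.99)
```

Because the rotations are orthogonal, the Frobenius norm of the stacked matrix is preserved; with a diagonal R(k−1) the dropped row is exactly zero, so no information is lost in this toy case.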

Table 1
QR step and off-norm increase computation

Step | Operation

QR step and off-norm growth estimation |
for i = 1:N
    OFF-NORM GROWTH ESTIMATION
    compute the angle θ_i to zero y(i), as in
        [G_{θ_i}]^H [σ_ii; y(i)] = [σ'_ii; 0]
    if i ≤ r
        ng_S(i) = ‖y(1:i−1, 0, i+1:r)‖² sin² θ_i
        ng_SN(i) = ‖y(r+1:N)‖² sin² θ_i
    else
        ng_N(i) = ‖y(r+1:i−1, 0, i+1:N)‖² sin² θ_i
    end
    QR ROTATION
    [R; y^H] ← [G_{i,N+1}]^H [R; y^H]
end

Compute the number of pairs of rotations in each row section of each block |
N_G = Σ_{i=1}^{r} ng_S(i) + Σ_{i=1}^{r} ng_SN(i) + Σ_{i=r+1}^{N} ng_N(i)
for i = 1:N−1
    if i ≤ r
        l_S(i) = round(M · ng_S(i)/N_G)
        l_SN(i) = round(M · ng_SN(i)/N_G)
    else
        l_N(i) = round(M · ng_N(i)/N_G)
    end
end

Maintain the number of rotations in Σ_SN greater than or equal to 1 |
if Σ_{i=1}^{r} l_SN(i) = 0
    find the row section of Σ_SN with the largest norm increase,
    i.e. find h such that ng_SN(h) = max([ng_SN(1) … ng_SN(r)])
    set l_SN(h) = 1
end

2.3. A matter of vector rotation

The above discussion has been drawn from a 'singular-value' point of view. From a diagonalization point of view, it is advantageous to choose the largest off-norm entries as pivots, as this ensures an efficient reduction of the off-norm of R(k). But, as shown below, it does not guarantee that the singular vectors are updated properly. In this section, we therefore turn our attention to the effect of Givens rotations on the estimated singular vectors only.

The QR Givens rotations consist in row rotations on R(k) and have no effect at all on the right singular vectors in V(k). After the QR step, the intermediate decomposition has the form of Eq. (2) and needs to be further refined. At this point, we have

A(k) = U_S(k)Σ_S(k)V_S(k)^H + U_N(k)Σ_N(k)V_N(k)^H + A_SN(k),   (19)

where

A_SN(k) = U_S(k)Σ_SN(k)V_N(k)^H + U_N(k)Σ_NS(k)V_S(k)^H.   (20)

A_SN(k) expresses the cross-interaction between subspaces of different nature (e.g. between the signal right singular vectors and the noise left singular vectors). Let us express the exact decomposition at time k as

A(k) = U_S°(k)Σ_S°(k)V_S°(k)^H + U_N°(k)Σ_N°(k)V_N°(k)^H,   (21)

where '°' stands for 'exact decomposition'. Σ_S° and Σ_N° are r×r and (N−r)×(N−r) diagonal submatrices, containing the exact signal and noise singular values. By comparing Eqs. (19) and (21), we see that the approximate expression is more likely to be close to that of an SVD when A_SN(k) = 0, which is achieved when the off-diagonal blocks in Eq. (14) are null, i.e. ‖Σ_SN‖ = ‖Σ_NS‖ = 0. We note that even when A_SN(k) = 0, the approximate subspaces are not yet identical to the exact ones, since the submatrices Σ_S and Σ_N may not be diagonal.
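To see why driving the cross blocks to zero is what matters, one can check numerically that the size of the cross-interaction term of Eq. (20) is exactly the combined size of Σ_SN and Σ_NS: since U_S^H U_N = 0, the two terms of Eq. (20) are orthogonal in the Frobenius inner product. A small NumPy check, using our own random test data and the real-valued case for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 8, 3

# Random orthonormal bases U = [U_S U_N], V = [V_S V_N].
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
U_S, U_N = U[:, :r], U[:, r:]
V_S, V_N = V[:, :r], V[:, r:]

# Arbitrary cross-term blocks Sigma_SN (r x (N-r)) and Sigma_NS ((N-r) x r).
S_SN = rng.standard_normal((r, N - r))
S_NS = rng.standard_normal((N - r, r))

# Eq. (20): the cross-interaction term.
A_SN = U_S @ S_SN @ V_N.T + U_N @ S_NS @ V_S.T

# Orthogonality of the two terms gives
# ||A_SN||_F^2 = ||Sigma_SN||_F^2 + ||Sigma_NS||_F^2.
lhs = np.linalg.norm(A_SN, "fro") ** 2
rhs = np.linalg.norm(S_SN, "fro") ** 2 + np.linalg.norm(S_NS, "fro") ** 2
assert np.isclose(lhs, rhs)
```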

Nevertheless, the block diagonalization of R(k) helps to improve the separation between the signal and noise subspaces. To substantiate this affirmation, we will now analyze the direct effect of the above-mentioned block diagonalization on the tracking error. More specifically, we shall investigate the effect of Givens rotations on the signal subspace projector V_S(k)V_S(k)^H. Recall that this projector is used by MUSIC-like algorithms to estimate various signal parameters. Let the tracking error be defined as the distance between the approximate and exact signal subspace projectors, i.e.

TE(k) = ‖V_S°(k)V_S°(k)^H − V_S(k)V_S(k)^H‖.   (22)

Similarly, define the time variation of the updating process [17], ΔV_S(k), as the distance between the approximate signal subspaces at two consecutive time iterations, k−1 and k, i.e.

ΔV_S(k) = ‖V_S(k)V_S(k)^H − V_S(k−1)V_S(k−1)^H‖.   (23)

In QR Jacobi-type subspace tracking algorithms, the refinement step transforms the right singular vectors according to Eq. (6), where (i_l, j_l) represents the pivot location and θ_l the angle of the lth right Givens rotation in Eq. (5). Each right rotation in Eq. (6) affects only the i_l th and the j_l th right singular vectors, as in

v_{i_l} ← v_{i_l} cos θ_l − v_{j_l} sin θ_l,
v_{j_l} ← v_{i_l} sin θ_l + v_{j_l} cos θ_l.   (24)

Temporarily denote by U(l−1)Σ(l−1)V(l−1)^H the approximate decomposition of A(k) before the lth rotation is applied. In the same way, define V_S(l−1), V_N(l−1), Σ_S(l−1), Σ_N(l−1), Σ_SN(l−1) and Σ_NS(l−1) as in Eqs. (14) and (13), with time index k replaced by rotation index l−1. Then, we may define Δ_l, a measure of the partial alteration of the signal subspace projector resulting from the lth rotation, as the distance between the approximate signal subspaces before and after that rotation in Eq. (6), i.e.

Δ_l = ‖V_S(l)V_S(l)^H − V_S(l−1)V_S(l−1)^H‖.   (25)

Whereas ΔV_S(k) = 0 means either perfect convergence or a useless algorithm at the kth time iteration, Δ_l = 0 is the result of an unnecessary Givens rotation. Since each rotation in Eq. (5) helps to improve the diagonalization of R(k) and refine the decomposition, we intuitively consider the time variation ΔV_S(k) as a convergence step, while Δ_l can be viewed as a measure of the contribution of the lth rotation to ΔV_S(k). Our viewpoint here is that, like an MS algorithm which performs a better diagonalization at each rotation by zeroing the off-diagonal entry with the largest magnitude, Δ_l should be as large as possible.

After the lth pair of rotations is applied, the modification of the signal subspace is expressed as

V_S(l) = V_S(l−1)G_α + V_N(l−1)G_β,   (26)

where G_α and G_β denote, respectively, the r×r upper-left and the (N−r)×r lower-left parts of G^{θ_l}_{i_l j_l}. There are three possible choices for the location of the lth pivot in Σ(l−1), namely: Σ_S(l−1), Σ_N(l−1) and Σ_SN(l−1) (Σ_SN(l−1) and Σ_NS(l−1) being equivalent). For each of these locations, let us analyze the effect of the lth Givens rotation:
1. r+1 ≤ i_l < j_l ≤ N: the pivot is located in Σ_N(l−1). We have G_α = I_r and G_β = 0; the rotation affects only two noise singular vectors, as in Eq. (24). Since the signal subspace V_S(l−1) is unchanged, we have Δ_l = 0.
2. 1 ≤ i_l < j_l ≤ r: the pivot is located in Σ_S(l−1) and the rotation affects two vectors of the signal subspace. Here, G_β = 0 and G_α is unitary so that, using Eqs. (25) and (26), one still obtains Δ_l = 0, even though two signal vectors are rotated.
3. 1 ≤ i_l ≤ r and r+1 ≤ j_l ≤ N: the pivot is located in Σ_SN(l−1). In this case, one vector is rotated in each subspace. The signal subspace transformation is still expressed as in Eq. (26), but here G_β ≠ 0 and G_α is not unitary. As a result, Δ_l > 0 and a positive contribution is made to the time variation ΔV_S(k).

In summary, during the refinement step, a rotation G^{θ_l}_{i_l j_l} modifies the signal subspace projector V_S(k)V_S(k)^H, and thus contributes to the time variation ΔV_S(k), if and only if the pivot is chosen in Σ_SN, or, similarly, if the pair of rotations rotates two singular vectors belonging respectively to the orthogonal subspaces spanned by V_S and V_N.
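The three cases above are easy to verify numerically: rotating two columns of V inside the same block leaves the projector V_S V_S^H unchanged, while a cross-term rotation does not. A small NumPy sketch (dimensions, angle and function name are our own choices):

```python
import numpy as np

def delta_after_rotation(V, r, i, j, theta):
    """Frobenius distance between the signal-subspace projectors before and
    after a right Givens rotation of columns i and j of V (Eqs. (24)-(25))."""
    W = V.copy()
    c, s = np.cos(theta), np.sin(theta)
    vi, vj = V[:, i].copy(), V[:, j].copy()
    W[:, i] = c * vi - s * vj      # Eq. (24), applied to columns i and j
    W[:, j] = s * vi + c * vj
    P_before = V[:, :r] @ V[:, :r].T
    P_after = W[:, :r] @ W[:, :r].T
    return np.linalg.norm(P_after - P_before, "fro")

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # orthonormal V, N=6
r, theta = 2, 0.3

assert delta_after_rotation(V, r, 3, 5, theta) < 1e-12  # case 1: pivot in Sigma_N
assert delta_after_rotation(V, r, 0, 1, theta) < 1e-12  # case 2: pivot in Sigma_S
assert delta_after_rotation(V, r, 1, 4, theta) > 0.1    # case 3: cross-term pivot
```

For case 3, the distance works out analytically to √2 |sin θ_l|, which is why only cross-term pivots move the projector that MUSIC-like detectors actually use.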

Then, it follows that locating the pivots in Σ_SN guarantees that each Givens rotation produces a noticeable effect on typical detectors such as MUSIC, and therefore on the estimated parameters; otherwise, the parameter estimates remain the same. Several papers by Stewart [28] and Fierro et al. [7,8] justify the need to reduce ‖Σ_SN‖ so as to minimize an upper bound on the tracking error, assuming either an URV decomposition [30] or an ULV decomposition [31]. Indeed, each rotation in Σ_SN reduces ‖Σ_SN‖ by annihilating the cross-terms σ_{i_l j_l} and σ_{j_l i_l}.

We propose to measure the 'Efficiency of Givens rotations' (EGR) of a Jacobi-type algorithm as the proportion of cross-terms rotations involved in its refinement step, as in

γ(N,r) = Avg[ (number of cross-terms 2×2 SVDs) / (number of 2×2 SVDs) ],   (27)

where Avg[·] stands for time average. In our algorithm design philosophy, we consider that the EGR should be high enough to guarantee a satisfactory updating of the signal subspace projector and, accordingly, a better subspace tracking performance, without increasing the total number of rotations.

2.4. 'Inner' or 'Outer'

The conclusions of the previous section were obtained under the assumption that the cross-terms are confined in the upper-right r×(N−r) corner of R(k). To easily identify the cross-terms region of R(k), one must be able to monitor and maintain the decreasing ordering of the singular values throughout the updating process. Note that once the ordering of the singular values is correct, no permutations are needed to restore the appropriate ordering of the singular values and vectors. The ordering of the singular values mainly depends on two factors: the initialization of the process, and the decision to choose inner or outer rotations. In this section, we discuss the proper choice of the type of rotations in Jacobi-type algorithms so as to maintain the ordering of the singular values and provide a higher EGR.

Let us reconsider the refinement of QR Jacobi-type algorithms shown in Eq. (5), in which we temporarily abandon the rotation index l. A series of pairs of rotations [G^φ_{ij}]^H and G^θ_{ij} is used to diagonalize R(k). Each of these pairs performs the SVD of a 2-by-2 matrix and refines the singular values σ_ii and σ_jj into σ'_ii and σ'_jj, as in Eq. (7). Regarding the angles θ and φ, there exist two solutions which zero the off-diagonal terms σ_ij and σ_ji, namely inner or outer rotation angles [16]. To force the updated singular values σ'_ii and σ'_jj to be in decreasing ordering, i.e. σ'_ii > σ'_jj, it can be proved that the type of the rotations must be chosen as follows [19]:

σ²_ii ≷ σ²_jj + σ²_ij − σ²_ji   ('>' : inner, '<' : outer).   (28)

Eq. (28) illustrates the mixing properties of outer and inner rotations. This simple test guides one in the choice between inner and outer rotations to obtain two updated singular values σ'_ii and σ'_jj in decreasing ordering. When the off-diagonal entries are much smaller than the diagonal entries, we generally have σ²_ii > σ²_jj + σ²_ij − σ²_ji; in this case, outer rotations generally permute the singular values, unlike inner rotations. Note that those mixing properties of outer rotations had already been noticed by Stewart [29]; in [19], we provide the analytical proof of the sorting effect of inner and outer rotations.

The main conclusion is that the ordering of the singular values can be imposed by choosing the type of rotations according to the magnitudes of the involved off-diagonal and diagonal entries. Before each pair of rotations in Eq. (5), one can easily choose the type of the rotations so as to maintain the ordering of the corresponding pair of singular values. This 2-by-2 ordering might not lead to a perfect ordering of all the singular values; but it generally ensures that the signal singular values and vectors are maintained at the top r positions. In this case, the cross-term entries, i.e. Σ_SN(k), are successfully confined in the r×(N−r) upper-right region of R(k), and eventually a full cross-terms annihilation refinement can be performed.
To elaborate on the above conclusion, let us analyze how various QR Jacobi-type algorithms select the type of rotations, and the corresponding effect on their convergence properties. In the algorithm from Moonen et al. (MA) [17], all N−1 rotations are outer and, in Eq. (5), j_l = i_l + 1 for i_l = 1,…,N−1; the pivots are chosen along the second diagonal of R(k), from (1,2) down to (N−1,N). In agreement with Eq. (28), if the off-diagonal entries are sufficiently small, this type of sweep provides a built-in permutation scheme, due to the permutation properties of outer rotations, as demonstrated in [19]. The result is that, in Eq. (5), the first singular value (not necessarily the largest) moves from one position to the next after each pair of rotations is applied, so that at the end of the sweep, it reaches the last position. The EGR of MA averaged over N samples is

γ_MA(N,r) = 2r(N−r) / (N(N−1)).   (29)

In fact, during the first r time iterations, we have γ_MA = (N−r)/(N−1); then γ_MA falls to r/(N−1) during the following N−r time iterations. Recall that the EGR indicates the proportion of pairs of rotations which update the signal subspace projector. Accordingly, in each window of N samples, the tracking performance of MA can be expected to be higher during the first r time iterations, and lower during the following N−r samples, assuming N is reasonably large; this pattern is then repeated every N samples, when the decreasing ordering of the singular values is recovered once again (see simulations in Section 4). Fig. 1 shows how γ_MA varies as a function of N for different values of r. One sees that, in general, about 50% of the rotations do not alter the subspace projection.

In NASVD, as in any other algorithm which sphericalizes the noise subspace, it is crucial to lock the average noise singular value σ_N at the last position, otherwise a signal singular value might be wrongly averaged with the noise singular values.
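Eq. (29) is just the per-window average of the two regimes quoted above, which is easy to verify exactly (helper names are ours):

```python
from fractions import Fraction

def egr_ma(N, r):
    """Average EGR of MA over a window of N samples, Eq. (29)."""
    return Fraction(2 * r * (N - r), N * (N - 1))

def egr_ma_two_regimes(N, r):
    """Same average built from the two regimes: (N-r)/(N-1) during the
    first r iterations, r/(N-1) during the next N-r iterations."""
    high, low = Fraction(N - r, N - 1), Fraction(r, N - 1)
    return (r * high + (N - r) * low) / N

assert egr_ma(20, 2) == egr_ma_two_regimes(20, 2)
# The average EGR peaks at r = N/2, where it equals N/(2(N-1)), i.e. about 1/2:
assert max(egr_ma(20, r) for r in range(1, 20)) == Fraction(10, 19)
```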

Fig. 1. Efficiency of Givens rotations in Moonen's algorithm (MA) as a function of N and r.

Therefore, Kavcic proposes a refinement step in which all rotations are outer except the last one. Let us first note that, by choosing the last rotation to be an inner one, NASVD becomes highly dependent on the initialization: indeed, the singular values must be selected so that the initial average noise singular value is small enough, since it has few chances to be permuted to an upper position. Also, since σ_N is locked at the (r+1)th position, the only rotation which annihilates a cross-term in each sweep, i.e. corresponding to Σ_SN, is the last, rth one, so that the EGR of NASVD over r samples is only

γ_NASVD = 1/r.   (30)

In the case of source tracking, e.g. direction-of-arrival (DOA) tracking, NASVD cannot be expected to perform well when two sources get close. Indeed, it is known that when two sources get much closer, one signal singular value falls to the noise level. Even if this singular value becomes the smallest of the set, the exclusive use of outer rotations in the first r−1 rotations hardly brings it to the (r+1)th position. For all these reasons, the tracking performance of NASVD cannot be expected to be high, as will be shown in simulations.

The last QR Jacobi-type algorithm we investigate is a modified version of NASVD, Refinement-Only Fast Subspace Tracking (ROFST) from Rabideau [23]. Essentially, ROFST differs from NASVD by its refinement structure, which zeros the last column and the last row of R̃(k−1) alternately with two separate series of row and column rotations. These two separate series of Givens rotations implement inner rotations, which are more likely to maintain the ordering of the singular values if the off-diagonal entries are sufficiently small. By choosing all the pivots in the last column, ROFST has the highest possible EGR, γ_ROFST = 1. This explains the good performance of ROFST as compared to NASVD and other subspace tracking algorithms, as observed in [22].

3. Proposed algorithms

The discussion in Section 2 leads to the following principles regarding the use of Givens rotations in QR Jacobi-type algorithms:
• an UXV decomposition enables more flexibility in the choice of pivot locations;
• the computation of the off-norm increase of R(k) at the QR step enables one to distribute the Givens rotations dynamically and perform a better diagonalization;
• by appropriately choosing the type of rotations, one can maintain the larger singular values at the top r diagonal positions;
• at each time iteration, at least one rotation in Eq. (5) must be applied to annihilate cross-terms in R(k), to ensure the updating of the subspace projection matrix.

In this section, we take these principles into account in the design of two new efficient subspace tracking algorithms for both cases of complete and partial tracking, i.e. signal+noise subspace tracking and signal subspace-only tracking.

3.1. Signal and noise subspace tracking: cross-terms singular-value decomposition 2 (CSVD2)

Based on the theoretical considerations of the previous section, we propose an improved second version of the CSVD algorithm [20], namely CSVD2. It retains the cross-terms annihilation concept of CSVD but, eventually, allocates a portion of the refinement Givens rotations to Σ_N(k) and Σ_S(k) to provide a better and more robust tracking of the singular values and the singular vectors. The CSVD2 algorithm is presented in Table 2 and its steps are explained below.

Initialization: V(0) = I_N and R(0) = 0_N, or an initial approximate SVD.

QR step and off-norm increase computation (see Table 1): the QR step is achieved by N row rotations, as in Eq. (4). For each of these rotations, e.g. the ith rotation [G_{i,N+1}]^H, we first compute the angle θ_i; then we compute the off-norm increase in the row sections of the only row of R(k) that is altered by [G_{i,N+1}]^H, i.e. the ith row. Finally, the rotation is applied so as to zero the ith entry of the last row. At the end of the QR step, the entries of the last row are assumed small enough to be dropped.

Table 2
CSVD2 algorithm

Step | Operation

Initialization |
Initialize with V = I_N and R = 0_N, or with an approximate decomposition.
Let M be the number of pairs of Givens rotations at the refinement step.

Main loop |
for k = 1, 2, …, ∞
    y = V^H x,  R ← √λ R

QR step and off-norm control |
    Use the equations in Table 1 to:
    • perform the QR operation
    • estimate the off-norm increase of all row sections
    • compute the number of pivots to be selected in each row section of R(k) at the next (refinement) step

Refinement of Σ_S |
    for m = 1:r
        for n = 1:l_S(m)
            search the location (i,j) of the off-diagonal maximum of R(m, 1:r)
            choose the type of rotation as in Eq. (28)
            R ← [G^φ_{ij}]^H R G^θ_{ij}
            V ← V G^θ_{ij}
        end
    end

Refinement of Σ_SN |
    for m = 1:r
        for n = 1:l_SN(m)
            search the location (i,j) of the maximum of R(m, r+1:N)
            choose the type of rotation as in Eq. (28)
            R ← [G^φ_{ij}]^H R G^θ_{ij}
            V ← V G^θ_{ij}
        end
    end

Refinement of Σ_N |
    for m = r+1:N
        for n = 1:l_N(m)
            search the location (i,j) of the off-diagonal maximum of R(m, r+1:N)
            choose the type of rotation as in Eq. (28)
            R ← [G^φ_{ij}]^H R G^θ_{ij}
            V ← V G^θ_{ij}
        end
    end
end

Distribution of the refinement Givens rotations (see Table 1): the M pairs of refinement rotations are distributed to the row sections of Σ_S, Σ_SN and Σ_N according to their specific contributions to the total norm increase N_G in R(k), so that, at the refinement step, more pivots are selected where a greater diagonalization effort is required. Note that, depending on the off-norm distribution, it might happen that no pivot is selected in a specific row. But, in order to ensure the updating of the signal subspace projector, the number of rotations in the cross-terms region Σ_SN is always maintained greater than or equal to 1.

Refinement: three refinement loops are performed to reduce the off-norm of Σ_S and Σ_N, and the norm of Σ_SN, respectively. In each block, a maximum search is performed in each row section only, as many times as prescribed at the previous step. The type of each rotation (inner or outer) is chosen as in Eq. (28), to maintain the altered singular values in decreasing ordering and keep the signal singular values at the top positions. CSVD2 is designed to react efficiently to various tracking scenarios, as will be shown in the experiments of Section 4. If M is of the order of N as in MA, the complexity of CSVD2 is O(N²).

3.2. Signal subspace tracking: noise-averaged CSVD (NA-CSVD)

NA-CSVD is the noise-sphericalized version of CSVD; it tracks the signal subspace only, plus an additional average noise singular value σ_N, so that the singular-value matrix is reduced to an (r+1)×(r+1) matrix R̃(k). Σ_N(k) is thus reduced to the single component σ_N and its diagonalization is of no concern. Also, Σ_SN is reduced to an r×1 column. The resulting algorithm, the noise-averaged cross-terms singular-value decomposition (NA-CSVD), is presented in Table 3 and its steps are explained below.

Initialization: NA-CSVD can be initialized with an approximate sphericalized SVD, or with V_S(0) = I_{N×r} and R̃(0) = 0_{r+1}, where I_{N×r} stands for the first r columns of I_N.

Table 3
NA-CSVD algorithm

Step | Operation

Initialization |
Initialize with V_S = I_{N×r} and R̃ = 0_{r+1}, or with an approximate sphericalized SVD

Loop |
for k = 1, 2, …, ∞
    σ_N = R̃(r+1, r+1)

Householder transformation |
    x_S = V_S^H x
    z = x − V_S x_S
    v_N = z/‖z‖

QR step |
    y^H = [x_S^H, ‖z‖],  R̃ ← √λ R̃
    for i = 1:r+1
        compute the angle θ_i to zero y(i), as in
            [G_{θ_i}]^H [σ_ii; y(i)] = [σ'_ii; 0]
        [R̃; y^H] ← [G_{i,r+2}]^H [R̃; y^H]
    end

Refinement step |
    for l = r down to 1
        choose the type of rotation as in Eq. (28)
        R̃ ← [G^φ_{l,r+1}]^H R̃ G^θ_{l,r+1}
        [V_S, v_N] ← [V_S, v_N] G^θ_{l,r+1}
    end

Sphericalization |
    R̃(r+1, r+1) ← √( ((N−r−1) λ σ_N² + R̃(r+1, r+1)²) / (N−r) )
end

Householder transformation: a Householder-type transformation rotates the noise singular vectors so that the projection of the measurement vector x(k) onto the noise subspace is parallel to the first noise vector, v_N (see [4,11] for details).

QR step: the QR step is achieved as in Eq. (4), but with only r+1 Givens rotations.

Refinement: the r pivots are chosen from the rth entry up to the first entry of the (r+1)th column of R̃(k), to provide the highest possible EGR of 1. For each pair of rotations, the type of rotation is chosen as in Eq. (28), to maintain the decreasing ordering of the altered singular values.

Sphericalization: finally, the average noise singular value is re-estimated.

Here, the full cross-terms cancellation concept leads to a structure of the RO-FST type. However, our approach is based on a different type of decomposition, namely UXV, and both algorithms have different refinement structures. The complexity of NA-CSVD is O(Nr).

3.3. Further remarks on the proposed algorithms

Regarding the numerical properties of the algorithms, let us note that CSVD2 and NA-CSVD retain the classical QR+refinement structure from [17] and [12], but simply apply the rotations in a different way. Therefore, an error propagation analysis similar to the one presented in [17] can be formulated; the algorithms can be proven to be numerically stable, provided the forgetting factor λ verifies λ < 1 and a reorthogonalization procedure is included.

Fig. 2. Initial convergence of complete tracking algorithms: exact SVD, CSVD2, MA and maximum search (MS): (a) tracking error, (b) off-norm of R(k). N = 20, r = 2, θ_1 = 10°, θ_2 = 20°, SNR = 10 dB, λ = 0.99, 10-experiment average.

Our innovative way of monitoring the off-norm increase throughout R(k) allows us to distribute the Givens rotations dynamically between the rows of Σ_S, Σ_N and Σ_SN, so that the off-norm is reduced where it is predicted to grow more significantly. The main result is that CSVD2 can achieve a tracking performance similar to that of Moonen's algorithm (MA) [17], for instance, but with far fewer Givens rotations at the refinement step, as shown in the simulation experiments.

Unlike the rank-revealing URV updating algorithm [30], CSVD2 does not require the initial conditions regarding the size of the entries of each block to be explicitly stated. The block partitioning of R(k), or similarly the sorting of V into [V_S V_N], is obtained after a few iterations and maintained by choosing the appropriate type of each rotation. Therefore, CSVD2 maintains the signal singular values at the top positions and confines Σ_SN in the upper-right corner of R(k), as in Stewart's URV.

Regarding the issue of sudden rank changes, one may use the MDL criteria [33] jointly with CSVD2 to estimate the rank variations. NA-CSVD has been successfully coupled with a recent rank tracking technique for spherical subspace trackers (RSST) proposed by Kavcic and Yang [13]. NA-CSVD/RSST efficiently tracks the subspaces and the rank variations in various tracking scenarios, including moving and crossing sources [21].

4. Experimental results

In this section, computer simulations are conducted to test the new algorithms and the underlying theory regarding the efficient use of Givens rotations in QR Jacobi-type subspace tracking algorithms. In all the experiments, we deal with the problem of estimating the directions of arrival (DOAs) of r incident plane waves on a receiving linear array of N sensors.
problem of estimating the direction of arrival Regarding the issue of sudden rank changes, one (DOAs) of r incident plane waves on a receiving may use the MDL criteria [33] jointly with CSVD2 linear array of N sensors. The intersensor spacing

Fig. 3. Comparison of MA (M = N−1 = 19) and CSVD2 using only 50% (M = 9) of the refinement rotations used in MA: (a) tracking error, (b) off-norm of R(k). N = 20, r = 2, a_1 = 10°, a_2 = 20°, SNR = 10 dB, λ = 0.99, 10-experiment average.

The intersensor spacing d is set to half the wavelength. At each snapshot, the measurement vector x(k) is obtained and its ith component represents the combined action of all the sources at the ith sensor, as in

x_i(k) = Σ_{l=1}^{r} s_l(k) e^{j(i−1)ω_l(k)} + n_i(k),  i = 1,…,N,   (31)

where ω_l(k) is the electrical angle (intersensor phase shift) of the lth source at time index k; s_l(k) is the complex amplitude of the lth source; and n_i(k) is an additive background noise component. n_i(k) and s_l(k) are modelled as independent zero-mean complex circular Gaussian variables. The variance of n_i(k) is set to 1, and the variances of the complex amplitudes s_l(k) correspond to the sources' signal-to-noise ratios (SNR). Our goal is to show how the performance of existing QR Jacobi-type subspace tracking algorithms can be reached by efficiently using fewer pairs of Givens rotations at the refinement step. We ran two series of experiments, one for each type of QR Jacobi-type algorithm: complete tracking algorithms, which track both the signal and noise subspaces, and partial tracking algorithms, which track the signal subspace only by sphericalizing the noise subspace. All algorithms are initialized with an initial exact decomposition obtained from a few initial measurement samples. We would like, however, to mention that CSVD2 and NA-CSVD provide similar experimental results if initialized with V(0) = I and R(0) = 0.

Experiment 1. Initial convergence of complete tracking algorithms. N = 20 sensors are used to track r = 2 stationary sources at a_1 = 10° and a_2 = 20° with SNR = 10 dB. The forgetting factor λ is set to 0.99.
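The snapshot model of Eq. (31) can be sketched as follows; with half-wavelength spacing, the electrical angle is ω_l = π sin a_l, and the helper name and parameterization are our own:

```python
import numpy as np

def snapshot(doas_deg, snr_db, N, rng):
    """One array snapshot x(k) following Eq. (31) for a uniform linear
    array with half-wavelength intersensor spacing."""
    doas = np.deg2rad(np.asarray(doas_deg))
    omega = np.pi * np.sin(doas)                 # electrical angles
    i = np.arange(N)[:, None]
    A = np.exp(1j * i * omega[None, :])          # N x r steering matrix
    # unit-variance complex circular noise; source power = linear SNR
    p = 10.0 ** (np.asarray(snr_db) / 10.0)
    s = np.sqrt(p / 2) * (rng.standard_normal(len(doas))
                          + 1j * rng.standard_normal(len(doas)))
    n = np.sqrt(0.5) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return A @ s + n

rng = np.random.default_rng(3)
x = snapshot([10.0, 20.0], [10.0, 10.0], N=20, rng=rng)
assert x.shape == (20,) and np.iscomplexobj(x)
```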

Fig. 4. Initial convergence of partial tracking algorithms: NA-CSVD, RO-FST (one refinement), exact SVD and NASVD. N = 10, r = 5, SNR = 10 dB, a_1 = 10°, a_2 = 20°, a_3 = 30°, a_4 = 40°, a_5 = 50°, λ = 0.99, 10-experiment average.

The experiment involves the following algorithms: the exact SVD, which computes the exact SVD of Eq. (1) at each time iteration; Moonen's algorithm (MA), where the refinement consists in M = N−1 = 19 pairs of rotations; maximum search (MS), which searches the maximum of R(k) prior to each of the M = N−1 = 19 refinement rotations (as detailed in Section 2.2.1); and CSVD2 (M = N−1 = 19). All algorithms are initialized with the exact SVD obtained from only one initial sample x(0), i.e. A(0) = x(0)^H = U(0)R(0)V(0)^H, with R(0) = [‖x(0)‖, 0, …, 0]; to start with a square singular-value matrix, we use R(0) ← diag(R(0)) instead. Fig. 2(a) and (b) show the initial convergence of the tracking error and the off-norm of R(k), as defined in Eqs. (22) and (9), respectively; these results are the average of a 10-trial experiment.

Regarding MA, an abrupt decrease of the tracking error is observed every 20 samples only, i.e. at k = 2 and k = 22. This is due to the permutation properties of outer rotations; every N time iterations, the signal singular vectors (and values) are properly positioned and, for the r following time iterations, 18 of the 19 pairs of rotations in the refinement step annihilate a cross-term. The same effect is more visible in the off-norm of the singular-value matrix from MA. Since MA requires N measurement samples to zero all upper-diagonal entries, the off-norm of R(k) is really lowered every N samples only, i.e. at the specific time indexes k = 2, k = 22, k = 42, and so on. The increase of the off-norm between these time indexes is due to the fact that the QR step constantly contributes to the increase of upper-diagonal entries, which have to wait N time iterations before being zeroed.

The MS algorithm maintains the off-norm of R(k) at the lowest level, but at a high searching cost. CSVD2 achieves almost the same diagonalization performance as MS by targeting the annihilation of large off-diagonal entries, but with a smaller searching complexity cost. This confirms that our estimation of the off-norm distribution in R(k) is precise.

Fig. 5. Tracking error of complete tracking algorithms under a sudden subspace rotation: exact SVD, CSVD2, MS and MA. N = 20, r = 5, SNR = 10 dB, λ = 0.96, average of 10 experiments: (a) CSVD2 uses only 20% of the N−1 Givens rotations (M = 4), (b) CSVD2 uses 100% of the N−1 Givens rotations (M = 19).

Note that, even with a loss of information at the QR step, CSVD2 reaches the subspace tracking performance of an exact SVD. In Fig. 3(a) and (b), we compare MA (M = 19) with CSVD2 using only M = 9 pairs of rotations at the refinement step (approximately 50% of the 19 refinement pairs used in MA). It shows that, even with fewer Givens rotations, CSVD2 tracks the subspaces and diagonalizes R(k) more efficiently than MA.

Experiment 2. Initial convergence of partial tracking algorithms. Along with the exact SVD, we simulate various partial tracking algorithms: NASVD (the noise-sphericalized version of MA), RO-FST (one refinement only) and the proposed NA-CSVD. We limited this experiment to these three algorithms since [22] offers numerous comparison tests between RO-FST and other well-known algorithms, including ROSA [4], Karasalo's algorithm [11] and TQR-SVD [6]. The simulation parameters are: N = 10, r = 5, SNR = 10 dB, a_1 = 10°, a_2 = 20°, a_3 = 30°, a_4 = 40°, a_5 = 50°, λ = 0.99. Fig. 4 shows the tracking error during initial convergence, averaged over 10 experiments. There is almost no difference between NA-CSVD, RO-FST and the exact SVD. This confirms that the upper-triangular structure is not required, since the main difference between RO-FST and NA-CSVD is the structure of R(k) and, consequently, the refinement step. As expected, NASVD suffers from its initialization starting point, which has been obtained with only six samples. Also, since its tracking error is modified by only one pair of rotations at the refinement step, its convergence is slow compared to the other cross-terms-based algorithms, i.e. NA-CSVD and RO-FST.

Experiment 3. Complete tracking algorithms and sudden subspace rotation.

Fig. 6. Number of Givens rotations applied by CSVD2 (100%) in Σ_S(k) (Signal), Σ_N(k) (Noise) and Σ_SN (Cross-terms), averaged over 40 trials. The simulation parameters are as in Fig. 5.

N = 20, r = 5, SNR = 10 dB, λ = 0.96. The algorithms are initialized with an exact SVD obtained from the following DOAs: a_1 = 10°, a_2 = 20°, a_3 = 30°, a_4 = 40° and a_5 = 50°. At k = 30, we remove these sources and introduce new ones with opposite DOAs: a_1 = −10°, a_2 = −20°, a_3 = −30°, a_4 = −40° and a_5 = −50°. This causes the signal subspace associated with the incoming data to rotate by 90°. We then observe the length of time each algorithm takes to stabilize itself, i.e. to rotate the approximate subspaces into the orthogonal direction. The tracking errors of the algorithms are shown in Fig. 5(a), where the refinement step of CSVD2 consists in only M = 4 pairs of rotations (approximately 20% of the usual 19 pairs). CSVD2 appears to be a robust algorithm, even when it uses fewer pairs of rotations at the refinement step. Fig. 5(b) shows the tracking error of CSVD2 compared to MA, with the same number of pairs of Givens rotations at the refinement step, i.e. M = 19.

Fig. 6 shows the number of pairs of rotations applied by CSVD2 (M = 19) to the signal block Σ_S(k), the noise block Σ_N(k) and the cross-terms block Σ_SN, averaged over 40 trials. At k = 30, the proportion of rotations applied to Σ_N increases to respond to the off-norm increase in Σ_N(k). This experiment shows that CSVD2 effectively applies the rotations where a diagonalization effort is required, without any prior information on the DOA changes. The direct consequence is the robustness of CSVD2, which stabilizes itself faster.

Experiment 4. Partial tracking algorithms and sudden subspace rotation. The simulation parameters are the same as in Experiment 3. Fig. 7 shows the tracking error. We note once again the superiority of the cross-terms-based algorithms (NA-CSVD and RO-FST) over NASVD.
Note that under the same rotations at the refinement step, i.e. M"19. experiment conditions, spherical subspace trackers
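The elementary operation that CSVD2 distributes over the blocks of R(k) is a two-sided rotation pair annihilating a selected pivot. A textbook Kogbetliantz (two-sided Jacobi) step in real arithmetic is sketched below; this is the standard construction, not necessarily the authors' exact parameterization:

```python
import numpy as np

def plane_rotation(theta, i, j, n):
    """Embed the 2x2 rotation [[c, s], [-s, c]] in the (i, j) plane of I_n."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = s; G[j, i] = -s
    return G

def annihilate_pivot(R, i, j):
    """Return orthogonal (Ul, Ur) such that Ul.T @ R @ Ur has zeros at
    (i, j) and (j, i): a left rotation first symmetrizes the 2x2 pivot
    block, then a Jacobi rotation diagonalizes it (Kogbetliantz step)."""
    n = R.shape[0]
    a, b = R[i, i], R[i, j]
    c, d = R[j, i], R[j, j]
    theta = np.arctan2(b - c, a + d)                  # symmetrizing angle
    G = plane_rotation(theta, i, j, n)
    M = G.T @ R                                       # pivot block now symmetric
    psi = 0.5 * np.arctan2(2.0 * M[i, j], M[j, j] - M[i, i])  # Jacobi angle
    J = plane_rotation(psi, i, j, n)
    return G @ J, J
```

Applying Ul.T @ R @ Ur zeroes the (i, j) and (j, i) entries while leaving R orthogonally equivalent, so its singular values are unchanged; only the choice of pivot location differs between the algorithms discussed here.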

Fig. 7. Tracking error of partial tracking algorithms: sudden subspace rotation. N = 20, r = 5, SNR = 10 dB, λ = 0.96, averaged over 10 experiments.

Experiment 5. Complete tracking algorithms and non-stationary data. N = 40 sensors are used to track r = 2 sources, a1 = (10 + 0.04k)° and a2 = (30 - 0.04k)°, which cross at k = 250, with SNR = 10 dB and λ = 0.92. All the algorithms are initialized with the exact SVD computed from 20 previous samples. For each algorithm, root-MUSIC [1] is used to estimate the DOAs at time index k. Fig. 8 shows the tracked angles averaged over 10 trials. One can clearly see how the choice of rotation type, coupled with the choice of pivot locations, can greatly influence the parameter tracking performance. MA is influenced by the permutations induced by outer rotations. Indeed, in each block of 40 consecutive samples, the tracking is effective during the first two samples and almost stalls during the 38 following ones; the result is a stairs-like angle estimate. What may be considered an advantage in Experiment 1 (initial convergence) is a drawback when the experiment involves non-stationary data. On the other hand, CSVD2 tracks the source DOAs continuously, even with fewer Givens rotations at the refinement step, i.e. only 20% (M = 8) of the 39 pairs of Givens rotations used in MA.

Experiment 6. Partial tracking algorithms and non-stationary data. Here, we have N = 8, r = 2, λ = 0.95, SNR = 5 dB, a1 = (20 + 0.01k)° and a2 = (30 - 0.01k)°, which cross at k = 500. Fig. 9 shows the estimated DOAs from one realization only. As expected, the cross-terms-based algorithms, i.e. NA-CSVD and RO-FST, perform better and provide more accurate DOA estimates. Fig. 10 shows the tracking error averaged over 20 experiments; it indicates that the algorithms face a stronger perturbation when the sources get closer. Here, the superior performance of the cross-terms-based algorithms confirms the need to locate pivots almost exclusively in R_SN. More particularly, NA-CSVD outperforms RO-FST and NASVD.
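In Experiment 5, root-MUSIC [1] converts the tracked subspace into DOA estimates. A compact sketch for a uniform linear array is given below; the polynomial construction is the standard one, and the half-wavelength spacing and phase-sign convention are our assumptions:

```python
import numpy as np

def root_music(U_noise, r, d=0.5):
    """Root-MUSIC DOAs (degrees) from an N x (N - r) orthonormal
    noise-subspace basis, for a ULA with spacing d in wavelengths."""
    N = U_noise.shape[0]
    C = U_noise @ U_noise.conj().T            # noise-subspace projector
    # Null polynomial p(z) = sum_l c_l z^l, where c_l is the sum of
    # the l-th diagonal of C; np.roots wants highest degree first.
    coeffs = np.array([np.trace(C, offset=l) for l in range(N - 1, -N, -1)])
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1.0]        # keep one of each reciprocal pair
    roots = roots[np.argsort(np.abs(1.0 - np.abs(roots)))][:r]  # nearest the circle
    phi = np.angle(roots)                     # electrical angles 2*pi*d*sin(theta)
    return np.degrees(np.arcsin(phi / (2.0 * np.pi * d)))
```

The r roots closest to the unit circle carry the source angles, so the quality of the DOA estimates directly reflects the accuracy of the tracked noise-subspace basis.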

Fig. 8. Angle estimates by complete tracking algorithms: crossing sources. Exact SVD, Maximum Search (MS), CSVD2 (20%) and MA. N = 20, r = 2, a1 = (10 + 0.04k)°, a2 = (30 - 0.04k)°, SNR = 10 dB, λ = 0.92, averaged over 10 experiments.

Fig. 9. Angle estimates by sphericalized-noise-subspace algorithms: crossing sources. N = 8, r = 2 crossing sources, λ = 0.95, SNR = 10 dB, a1 = (20 + 0.01k)° and a2 = (30 - 0.01k)° cross at k = 500, 1 experiment.
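The tracking error plotted in the figures measures the distance between the estimated and exact signal subspaces. The paper's precise definition appears in its earlier sections; a common projector-based choice can be sketched as:

```python
import numpy as np

def subspace_tracking_error(U_est, U_ref):
    """Frobenius-norm distance between the orthogonal projectors onto
    the estimated and reference (exact-SVD) signal subspaces.  Both
    inputs are N x r matrices with orthonormal columns.  This is one
    common error measure, not necessarily the paper's exact one."""
    P_est = U_est @ U_est.conj().T
    P_ref = U_ref @ U_ref.conj().T
    return np.linalg.norm(P_est - P_ref)
```

Because it compares projectors rather than bases, this measure is invariant to the choice of orthonormal basis within each subspace, which is what subspace trackers actually estimate.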

Fig. 10. Tracking error of partial tracking algorithms, averaged over 20 experiments: crossing sources. Exact SVD, NASVD, RO-FST and NA-CSVD; N = 8, r = 2 crossing sources, a1 = (20 + 0.01k)° and a2 = (30 - 0.01k)° cross at k = 500, λ = 0.93, SNR = 10 dB.

5. Conclusion

In this paper, we investigated the optimal use of Givens rotations as both a diagonalization tool and a vector rotation tool. On the one hand, we showed how a better diagonalization of the singular-value matrix is obtained when the Givens rotations are applied so as to annihilate its largest entries. To reduce the complexity of this approach, we proposed an efficient method which controls the off-norm in R(k) by dynamically distributing the Givens rotations. On the other hand, we demonstrated the need to zero the cross-terms entries in the case of subspace tracking. We showed how the choice between inner and outer rotations can be made so as to maintain the ordering of the singular values. Finally, based on this study, we proposed two efficient and robust subspace tracking algorithms, CSVD2 and NA-CSVD. As shown in simulation experiments, the proposed algorithms outperform existing QR Jacobi-type algorithms, even with fewer Givens rotations.

Acknowledgements

This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Philippe Pango acknowledges a fellowship from the Government of Côte d'Ivoire. The authors would like to thank the anonymous reviewers for their valuable comments.

References

[1] A. Barabell, Improving the resolution performance of eigenstructure-based direction-finding algorithms, in: Proc. Internat. Conf. on Acoustic Speech and Signal Processing, 1983, pp. 336–339.
[2] C.H. Bischof, G.M. Shroff, On updating signal subspaces, IEEE Trans. Signal Process. 40 (1) (1992) 96–105.
[3] B. Champagne, Adaptive eigendecomposition of data covariance matrices based on first-order perturbations, IEEE Trans. Signal Process. 42 (10) (October 1994) 2758–2770.
[4] R.D. DeGroat, Noniterative subspace tracking, IEEE Trans. Signal Process. 40 (3) (March 1992) 571–577.
[5] R.D. DeGroat, E.M. Dowling, H. Ye, D.A. Linebarger, Spherical subspace tracking for efficient, high performance adaptive signal processing applications, Signal Processing 50 (1996) 101–121.
[6] E.M. Dowling, L.P. Ammann, R.D. DeGroat, An adaptive TQR-SVD for angle and frequency tracking, IEEE Trans. Signal Process. 42 (4) (April 1994) 914–926.
[7] R.D. Fierro, Perturbation analysis for two-sided (or complete) orthogonal decompositions, SIAM J. Matrix Anal. Appl. 17 (2) (April 1996) 383–400.
[8] R.D. Fierro, J.R. Bunch, Bounding the subspaces from rank-revealing two-sided orthogonal decompositions, SIAM J. Matrix Anal. Appl. 16 (3) (July 1995) 743–759.
[9] G.H. Golub, C.F. Van Loan, Matrix Computations, 2nd ed., Johns Hopkins Univ. Press, Baltimore, MD, 1989, pp. 444–445.
[10] S. Haykin, Adaptive Filter Theory, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1991, p. 426.
[11] I. Karasalo, Estimating the covariance matrix by signal subspace averaging, IEEE Trans. Acoust. Speech Signal Process. 34 (February 1986) 8–12.
[12] A. Kavcic, B. Yang, A new efficient subspace tracking based on singular value decomposition, in: Proc. Internat. Conf. on Acoustic Speech and Signal Processing, Vol. IV, 1994, pp. 485–488.
[13] A. Kavcic, B. Yang, Adaptive rank estimation for spherical subspace trackers, IEEE Trans. Signal Process. 44 (6) (June 1996) 206–217.
[14] K. Konstantinides, K. Yao, Statistical analysis of effective singular values in matrix rank determination, IEEE Trans. Acoust. Speech Signal Process. 36 (5) (May 1988) 757–763.
[15] R. Kumaresan, D.W. Tufts, Estimating the angles of arrival of multiple plane waves, IEEE Trans. Aerospace Electron. Systems AES-19 (January 1982) 134–139.
[16] F.T. Luk, A triangular processor array for computing the singular values, Linear Algebra Appl. 77 (1986) 259–273.
[17] M. Moonen, P. Van Dooren, J. Vandewalle, A singular value decomposition updating algorithm for subspace tracking, SIAM J. Matrix Anal. Appl. 13 (4) (October 1992) 1015–1038.
[18] N.L. Owsley, Adaptive data orthogonalization, in: Proc. IEEE Internat. Conf. on Acoustic Speech and Signal Processing, Tulsa, 1978, pp. 109–112.
[19] P.A. Pango, Algorithmes rapides de mise à jour de décompositions matricielles et application au traitement de signaux multi-dimensionnels, Ph.D. Dissertation (to be published), INRS-Télécommunications, 1998.
[20] P.A. Pango, B. Champagne, Accurate subspace tracking algorithms based on cross-space properties, in: Proc. Internat. Conf. on Acoustic Speech and Signal Processing, Munich, Vol. 5, 1997, pp. 3833–3836.
[21] P.A. Pango, B. Champagne, On the issue of rank estimation in subspace tracking: the NA-CSVD solution, in: Proc. European Signal Processing Conf. EUSIPCO-98, Rhodes, Vol. 3, 1998, pp. 1773–1776.
[22] D.J. Rabideau, Fast, rank adaptive subspace tracking, IEEE Trans. Signal Process. 44 (9) (September 1996) 2229–2244.
[23] D.J. Rabideau, A.O. Steinhardt, Fast subspace tracking, in: Proc. 7th Signal Processing Workshop on SSAP, June 1994.
[24] R. Roy, T. Kailath, Estimation of signal parameters via rotational invariance techniques, IEEE Trans. Acoust. Speech Signal Process. 37 (7) (July 1989) 984–995.
[25] R.O. Schmidt, Multiple emitter location and signal parameter estimation, in: Proc. RADC Spectrum Estimation Workshop, Rome, New York, 1979, pp. 243–258; reprinted in IEEE Trans. Antennas and Propagation 34 (3) (March 1986) 276–280.
[26] R. Schreiber, Implementation of adaptive array algorithms, IEEE Trans. Acoust. Speech Signal Process. 34 (October 1986) 1038–1045.
[27] Shen-Fu Hsiao, Adaptive Jacobi method for parallel singular value decompositions, in: Proc. Internat. Conf. on Acoustic Speech and Signal Processing, Vol. V, 1995, pp. 3203–3206.
[28] G.W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev. 15 (1973) 727–764.
[29] G.W. Stewart, A Jacobi-like algorithm for computing the Schur decomposition of a non-Hermitian matrix, SIAM J. Sci. Stat. Comput. 6 (4) (October 1985) 853–864.
[30] G.W. Stewart, An updating algorithm for subspace tracking, IEEE Trans. Signal Process. 40 (6) (June 1992) 1535–1541.
[31] G.W. Stewart, Updating a rank-revealing ULV decomposition, SIAM J. Matrix Anal. Appl. 14 (1993) 494–499.
[32] J.E. Volder, The CORDIC trigonometric computing technique, IRE Trans. Electron. Comput. 8 (3) (September 1959) 330–334.
[33] M. Wax, T. Kailath, Detection of signals by information theoretic criteria, IEEE Trans. Acoust. Speech Signal Process. 33 (1985) 387–392.
[34] B. Yang, Projection approximation subspace tracking, IEEE Trans. Signal Process. 43 (1) (January 1995) 95–107.
[35] J. Yang, M. Kaveh, Adaptive eigensubspace algorithms for direction or frequency estimation and tracking, IEEE Trans. Acoust. Speech Signal Process. 36 (2) (February 1988) 241–251.