ONLINE DOMINANT GENERALIZED EIGENVECTORS EXTRACTION VIA A RANDOMIZED METHOD

Haoyuan Cai ∗, Maboud F. Kaloorazi ∗, Jie Chen ∗, Wei Chen † and Cédric Richard ‡

∗ Center of Intelligent Acoustics and Immersive Communications (CIAIC), School of Marine Science and Technology, Northwestern Polytechnical University, China
† State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, China
‡ Université Côte d'Azur, France
Emails: [email protected], [email protected], [email protected], [email protected], [email protected]

ABSTRACT

The generalized Hermitian eigendecomposition problem is ubiquitous in signal processing and machine learning applications. Considering the need to process streaming data in practice and the restrictions of existing methods, this paper is concerned with fast and efficient tracking of generalized eigenvectors. We first present a computationally efficient algorithm based on randomization, termed alternate-projections randomized eigenvalue decomposition (APR-EVD), to solve a standard eigenvalue problem. By exploiting the rank-1 update strategy, two online algorithms based on APR-EVD are developed for the extraction of dominant generalized eigenvectors. Numerical examples show the practical applicability and efficacy of the proposed online algorithms.

Index Terms— Randomized algorithms, dominant generalized eigenvectors, online algorithms, fast subspace tracking.

1. INTRODUCTION

The generalized Hermitian eigenvalue problem (GHEP) [1] is of great interest in signal processing, machine learning and data analysis applications. GHEP algorithms provide powerful tools to treat problems in blind source separation [2, 3], feature extraction [4, 5], noise filtering [6], fault detection [7], antenna array processing [8], classification [9], and speech enhancement [10]. Traditional methods for solving the GHEP include power- and inverse-iteration-based methods, the Lanczos method and the Jacobi–Davidson method [1, 11]. These batch methods, however, are inefficient and, in some cases, infeasible to apply due to their computational workload. The online methods presented in [8, 9, 12] are gradient-based, and extract the first dominant (or principal) generalized eigenvector. However, they are unsuitable for applications where multiple dominant generalized eigenvectors are desired [10]. In addition, these methods suffer from the so-called speed-stability problem [13], i.e., it is hard to select a learning rate that guarantees both tracking speed and numerical stability. To address this issue, coupled learning methods were proposed in [14, 15]. These methods, which are considered sequential, are difficult to parallelize on modern computational platforms and may even cause error propagation during the orthogonal projection procedure. Yang et al. [16] proposed recursive least-squares (RLS)-based online algorithms built on the projection approximation subspace tracking (PAST) technique [17]. The work in [18] presented a computationally efficient algorithm, but it suffers from slow convergence because of a narrow search space [19]. Tanaka [19] developed an online algorithm based on the power method. To track the r dominant generalized eigenvectors, however, this method needs O(rN²) operations in each iteration (with N being the dimension of an input vector), which is still computationally expensive.

The generalized eigenvalues and eigenvectors are extracted from a matrix pencil (A, B). In online applications [2, 4, 6–10], however, this pair is unknown, and the rank-1 update strategy [14–16, 18, 19] uses the observed streaming stochastic signals to estimate it. Also, in many cases, the signal subspace spanned by the dominant generalized eigenvectors lies in a low-dimensional space [10]. This implies that low-rank approximation techniques can be applied to treat GHEPs. Recent low-rank approximation methods based on randomized sampling [20–22] are computationally efficient and, in addition, can harness advanced computer architectures.

Our Contributions. By compounding a randomized low-rank matrix factorization method with the rank-1 update strategy, we propose two online algorithms for the extraction of the r dominant generalized eigenvectors, where r ≥ 1. We first present the APR-EVD (alternate-projections randomized eigenvalue decomposition) algorithm, which efficiently solves a standard eigenvalue problem. Then, by harnessing the rank-1 update scheme, we devise two online algorithms to extract the generalized eigenvectors from streaming data. Our proposed algorithms are computationally efficient, as the necessary steps in each iteration need O(N²) operations. Further, they can be parallelized on modern computers.

(This work was supported in part by NSFC grants 61671382 and 61811530283, and the 111 Project (B18041). Corresponding author: J. Chen.)

Notation. Normal fonts x and X denote scalars. Boldface small letters x and capital letters X denote column vectors and matrices, respectively. ℂ denotes the complex domain. The superscript (·)* denotes the conjugate of a complex number, (·)^H denotes the Hermitian transpose operator, and (·)† denotes the pseudo-inverse of a matrix. I_N denotes the identity matrix of order N. orth_r(·) constructs an orthonormal basis with r columns for the range of a matrix.

2. PROBLEM FORMULATION

Given a matrix pencil (R_y, R_x), where R_y, R_x ∈ ℂ^{N×N} are Hermitian and positive definite, the GHEP [1, 11] is defined as:

    R_y w_i = λ_i R_x w_i,   i = 1, ..., N,   (1)

where w_1, ..., w_N are nonzero vectors corresponding to the N generalized eigenvalues λ_1 > λ_2 > ... > λ_N > 0. Provided that R_x is invertible, to obtain a generalized eigen-pair (w_i, λ_i), (1) is in general reduced to a Hermitian or a non-Hermitian eigenvalue problem:

Case HEP: Provided that R_x^{1/2}(R_x^{1/2})^H = R_x, the set of eigenvectors w is obtained as (R_x^{−1/2})^H v, where v is the set of eigenvectors of R_x^{−1/2} R_y (R_x^{−1/2})^H, determined so that v^H R_x v = I.

Case Non-HEP: The generalized eigenvectors w_i and generalized eigenvalues λ_i are obtained by solving R_x^{−1} R_y w_i = λ_i w_i.
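Both reductions are easy to check numerically. The sketch below is our own illustration, not part of the proposed algorithms: assuming SciPy's generalized Hermitian solver as a reference, it builds a random positive-definite pencil and verifies that Case HEP and Case Non-HEP recover the same generalized eigenvalues.

```python
import numpy as np
from scipy.linalg import eigh, sqrtm, inv

rng = np.random.default_rng(0)
N = 8

def random_hpd(n):
    """Random Hermitian positive-definite test matrix (illustrative)."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return M @ M.conj().T + n * np.eye(n)

Rx, Ry = random_hpd(N), random_hpd(N)

# Reference: generalized eigenvalues of the pencil (Ry, Rx), eq. (1)
lam_ref, _ = eigh(Ry, Rx)                       # ascending order

# Case HEP: whiten with K = Rx^{-1/2}, solve a Hermitian EVP, map back
K = inv(sqrtm(Rx))                              # Hermitian inverse square root
lam_hep, V = np.linalg.eigh(K @ Ry @ K.conj().T)
W_hep = K.conj().T @ V                          # w_i = (Rx^{-1/2})^H v_i

# Case Non-HEP: eigenvalues of the (non-Hermitian) matrix Rx^{-1} Ry
lam_non = np.linalg.eigvals(np.linalg.solve(Rx, Ry))

print(np.allclose(lam_hep, lam_ref))                          # True
print(np.allclose(np.sort(lam_non.real), lam_ref))            # True
err = Ry @ W_hep - Rx @ W_hep @ np.diag(lam_hep)              # check eq. (1)
print(np.allclose(err, 0, atol=1e-8))                         # True
```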

In many signal and information processing applications, R_x and R_y are associated with data covariance matrices. Let the covariance matrices of x(k) and y(k) be given by R_x = E{x(k)x^H(k)} and R_y = E{y(k)y^H(k)}. When processing streaming data, at each instant k these matrices are typically estimated by time averaging over the most recent data [14–16, 18, 19, 23] as follows:

    R_x(k) = α R_x(k−1) + x(k) x^H(k),   (2)
    R_y(k) = β R_y(k−1) + y(k) y^H(k),   (3)

where the parameters α ∈ (0, 1) and β ∈ (0, 1) are smoothing constants.

Under the above setting, in this paper we devise two online algorithms by first transforming (1) into a standard eigenvalue problem and then applying a randomized EVD algorithm, which enables processing streaming data for tracking generalized eigenvectors.

3. PROPOSED ALGORITHMS

In this section, we first propose a randomized eigenvalue decomposition algorithm. We then adapt this algorithm to the streaming-data setting for Case HEP and Case Non-HEP, respectively.

3.1. The APR-EVD Algorithm

Given a rank-r matrix A ∈ ℂ^{N×N}, the proposed APR-EVD algorithm is computed as follows. We generate a random Gaussian matrix Φ ∈ ℂ^{N×d}, where r ≤ d ≪ N, and form the matrix:

    G = A^H Φ.   (4)

Matrix G ∈ ℂ^{N×d} is a projection onto the row space of A by Φ. Next, we form the N × d matrix:

    H = A G.   (5)

Matrix H is a projection onto the column space of A by G. We then orthonormalize the columns of H to obtain an N × r basis Q:

    Q = orth_r(H).   (6)

Note that the rank of H is at most r [24], and Q approximates the range of A. Exploiting Q, we use the Rayleigh–Ritz method [25, 26] to compute the eigenvalues λ_i and corresponding eigenvectors v_i of the following matrix, which is of order r:

    T = Q^H A Q.   (7)

Defining u_i ≜ Q v_i, the pairs {(u_i, λ_i)}_{i=1}^{r} consequently constitute the approximate eigen-pairs of A.

APR-EVD makes two passes over A, given that the matrix is stored in row-major format. By approximating T in (7), we devise a single-pass algorithm which can be directly employed for streaming data processing. In doing so, we pre-multiply (7) by Φ^H Q, obtaining Φ^H Q T = Φ^H Q Q^H A Q. Since A ≈ Q Q^H A and by the definition of G in (4), an estimate T̃ is given by:

    T̃ = (Φ^H Q)† G^H Q.   (8)

Computational Cost of APR-EVD. To factor A, APR-EVD incurs these costs: generating a random matrix costs O(Nd); forming G and H each costs O(N²d); with the estimate (8) in place of (7), forming T̃ and computing an eigenpair cost O(Nr²) + O(r³). Thus, the operation count (dominated by multiplications with A) satisfies

    C_APR-EVD = O(N²d).   (9)

Here d (the sampling size parameter) is very close to r.
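As a hedged illustration of steps (4)–(8), the following NumPy sketch implements both the two-pass Rayleigh–Ritz matrix (7) and the single-pass estimate (8); the function name apr_evd and the synthetic rank-4 test matrix are our own choices, not from the paper.

```python
import numpy as np

def apr_evd(A, r, d, single_pass=True, seed=1):
    """Sketch of APR-EVD for an (approximately) rank-r A, with r <= d << N."""
    N = A.shape[0]
    Phi = np.random.default_rng(seed).standard_normal((N, d))  # random test matrix
    G = A.conj().T @ Phi                       # G = A^H Phi,   eq. (4)
    H = A @ G                                  # H = A G,       eq. (5)
    Q = np.linalg.qr(H)[0][:, :r]              # Q = orth_r(H), eq. (6)
    if single_pass:                            # T ~ (Phi^H Q)^† G^H Q, eq. (8)
        T = np.linalg.pinv(Phi.conj().T @ Q) @ (G.conj().T @ Q)
    else:                                      # T = Q^H A Q,   eq. (7)
        T = Q.conj().T @ A @ Q
    lam, V = np.linalg.eig(T)                  # Rayleigh-Ritz on the r x r matrix
    order = np.argsort(-np.abs(lam))
    return lam[order], Q @ V[:, order]         # eigenpairs (lam_i, u_i = Q v_i)

# Sanity check on a synthetic symmetric rank-4 matrix
rng = np.random.default_rng(1)
N, r, d = 200, 4, 6
B = rng.standard_normal((N, r))
A = B @ np.diag([10.0, 5.0, 2.0, 1.0]) @ B.T
lam, U = apr_evd(A, r, d)
print(np.allclose(A @ U[:, 0], lam[0] * U[:, 0], atol=1e-6))   # True
```

For an exactly rank-r input, as here, the single-pass estimate (8) coincides with (7) up to round-off; for only approximately low-rank inputs it is an approximation.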

3.2. Online Algorithm for Extracting r-dominant Generalized Eigenvectors with Case HEP (Algorithm 1)

Directly recalculating R_x^{−1/2}(k) R_y(k) (R_x^{−1/2}(k))^H has a computational complexity of O(N³). We therefore consider the covariance matrices estimated by rank-1 updates at each instant, i.e., equations (2) and (3), together with the proposed APR-EVD algorithm, to recursively compute the generalized eigenvectors. For ease of notation, let K(k) = R_x^{−1/2}(k) and R(k) = K(k) R_y(k) K^H(k).

In order to use APR-EVD upon the arrival of x(k) and y(k), we recursively update R(k), then G(k) = R^H(k) Φ and H(k) = R(k) G(k), so that the orthonormal basis at instant k, Q(k), can be extracted. Exploiting the results in [19], we update R(k) and K(k):

    R(k) = (1/α) [ β R(k−1) + ỹ(k) ỹ^H(k) + x̃(k) c^H(k) + β₁(k) h(k) x̃^H(k) ],
    K(k) = (1/√α) [ K(k−1) + β₁(k) x̃(k) x̄^H(k) ],   (10)

where

    ỹ(k) = K(k−1) y(k),   (11)
    x̃(k) = K(k−1) x(k),   (12)
    c(k) = β₁(k) h(k) + β₂(k) x̃(k),   (13)
    h(k) = r_x(k) + a₁(k) ỹ(k),   (14)
    x̄(k) = (1/√α) K^H(k−1) x̃(k).   (15)

In the above relations, a₁(k), β₁(k), β₂(k) and r_x(k) are defined by:

    a₁(k) = ỹ^H(k) x̃(k),   (16)
    β₁(k) = (1/‖x̃(k)‖²) ( 1/√(1 + ‖x̃(k)‖²) − 1 ),   (17)
    β₂(k) = β₁²(k) ( x̃^H(k) r_x(k) + |a₁(k)|² ),   (18)
    r_x(k) = R(k−1) x̃(k).   (19)
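To make explicit which quantities the recursions (10)–(19) maintain, here is a direct per-step reference that recomputes K(k), R(k), G(k), H(k) and Q(k) from scratch; it costs O(N³) per step and is meant only for exposition and testing, and the function name and initializations are our own assumptions.

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def track_case_hep_reference(xs, ys, Phi, r, alpha=0.998, beta=0.998):
    """Direct O(N^3)-per-step reference for the quantities that Algorithm 1
    maintains with O(N^2) recursions (a sketch, not the fast method)."""
    N, d = Phi.shape
    Rx, Ry = np.eye(N), np.eye(N)
    for x, y in zip(xs, ys):                     # streaming snapshots
        Rx = alpha * Rx + np.outer(x, x.conj())  # eq. (2)
        Ry = beta * Ry + np.outer(y, y.conj())   # eq. (3)
        K = inv(sqrtm(Rx))                       # K(k) = Rx^{-1/2}(k)
        R = K @ Ry @ K.conj().T                  # R(k) = K(k) Ry(k) K^H(k)
        G = R.conj().T @ Phi                     # G(k) = R^H(k) Phi
        H = R @ G                                # H(k) = R(k) G(k)
        Q = np.linalg.qr(H)[0][:, :r]            # Q(k) = orth_r(H(k)), eq. (36)
        T = np.linalg.pinv(Phi.conj().T @ Q) @ (G.conj().T @ Q)   # eq. (37)
        lam, V = np.linalg.eig(T)                # Rayleigh-Ritz eigenpair
        W = K @ Q @ V                            # eq. (38)
    return lam, W
```

Fed with the snapshot blocks of Section 4, this loop should reproduce, step for step, the subspace that the O(N²) recursions are designed to track.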

After updating R(k) and K(k), G(k) is obtained by the recursion:

    G(k) = R^H(k) Φ
         = (1/α) [ β G(k−1) + ỹ(k) y_o^H(k) + c(k) x_o^H(k) + β₁(k) x̃(k) h_o^H(k) ],   (20)

where y_o(k) ≜ Φ^H ỹ(k), x_o(k) ≜ Φ^H x̃(k), h_o(k) ≜ Φ^H h(k), and c_o(k) ≜ Φ^H c(k). After that, H(k) is obtained through the recursion:

    H(k) = R(k) G(k)
         = (1/α²) [ β² H(k−1) + S₁(k) + S₂(k) + S₃(k) + S₄(k) ].   (21)

The terms {S_i(k)}_{i=1}^{4} are given by:

    S₁(k) = r_y(k) y_o^H(k) + r_c(k) x_o^H(k) + β₁(k) r_x(k) h_o^H(k),   (22)
    S₂(k) = ỹ(k) [ y_h^H(k) + a₂(k) y_o^H(k) + a₃*(k) x_o^H(k) + β₁(k) a₁(k) h_o^H(k) ],   (23)
    S₃(k) = x̃(k) [ c_h^H(k) + a₃(k) y_o^H(k) + a₄(k) x_o^H(k) + β₁(k) a₅(k) h_o^H(k) ],   (24)
    S₄(k) = h(k) [ β₁(k) x_h^H(k) + β₁(k) a₁*(k) y_o^H(k) + β₁(k) a₅*(k) x_o^H(k) + |β₁(k)|² a₆(k) h_o^H(k) ],   (25)

where

    r_y(k) = R(k−1) ỹ(k),   (26)
    r_c(k) = R(k−1) c(k),   (27)
    y_h(k) = G^H(k−1) ỹ(k),   (28)
    a₂(k) = ỹ^H(k) ỹ(k),   (29)
    a₃(k) = c^H(k) ỹ(k),   (30)
    c_h(k) = G^H(k−1) c(k),   (31)
    a₄(k) = c^H(k) c(k),   (32)
    a₅(k) = c^H(k) x̃(k),   (33)
    x_h(k) = G^H(k−1) x̃(k),   (34)
    a₆(k) = x̃^H(k) x̃(k).   (35)

Next, we orthonormalize the columns of H(k) in (21), obtaining Q(k):

    Q(k) = orth_r(H(k)).   (36)

Then, T̃(k) is computed through formula (8):

    T̃(k) = (Φ^H Q(k))† (G^H(k) Q(k)).   (37)

By performing the Rayleigh–Ritz method on T̃(k), we obtain the eigenpair (Λ̃(k), Ṽ(k)). Accordingly, an approximation to the r leading generalized eigenvectors of (R_y, R_x) is obtained by:

    W_r(k) = K(k) Q(k) Ṽ(k).   (38)

Computational Cost of Algorithm 1. The main steps involve the computations of (10)–(35) in each iteration. Calculating the parameters {a_i(k)}_{i=1}^{6} and {β_i(k)}_{i=1}^{2} requires O(N) operations. Computing h(k), c(k) ∈ ℂ^N costs O(N). Computing y_h(k), x_h(k), c_h(k), y_o(k), x_o(k), c_o(k) ∈ ℂ^d for {S_i(k)}_{i=1}^{4} ∈ ℂ^{N×d}, and updating G(k), H(k) ∈ ℂ^{N×d}, cost O(Nd). Computing the column vectors x̃(k), ỹ(k), x̄(k), r_y(k), r_x(k), r_c(k) ∈ ℂ^N requires 6N² multiplications. Updating K(k), R(k) ∈ ℂ^{N×N} requires 4N² multiplications. Thus, the dominant cost of Algorithm 1 is 10N² + O(Nd).

3.3. Online Algorithm for Extracting r-dominant Generalized Eigenvectors with Case Non-HEP (Algorithm 2)

Let P(k) = Q_x(k) R_y(k), where Q_x(k) = R_x^{−1}(k). We first recursively update P(k). Applying the Sherman–Morrison (SM) formula [11] immediately leads to a recursion for Q_x(k):

    Q_x(k) = [ α R_x(k−1) + x(k) x^H(k) ]^{−1}
           = (1/α) [ Q_x(k−1) − q_x(k) q_x^H(k) / (α + x^H(k) q_x(k)) ],   (39)

where q_x(k) ≜ Q_x(k−1) x(k). Using the above recursion and (3), P(k) is consequently obtained by:

    P(k) = (1/α) [ β P(k−1) + q_y(k) y^H(k) − q_x(k) z^H(k) ],   (40)

where

    q_y(k) = Q_x(k−1) y(k),   (41)
    z^H(k) = x^H(k) P(k−1) / (α + x^H(k) q_x(k)) + q_x^H(k) y(k) y^H(k) / (α + x^H(k) q_x(k)).   (42)

Combining these results leads to the update of G(k) = P^H(k) Φ as:

    G(k) = (1/α) [ β G(k−1) + y(k) m_y^H(k) − z(k) m_x^H(k) ],   (43)

where m_x(k) ≜ Φ^H q_x(k) and m_y(k) ≜ Φ^H q_y(k). Afterwards, H(k) is obtained through the following recursion:

    H(k) = P(k) G(k)
         = (1/α²) [ β² H(k−1) + J₁(k) + J₂(k) + J₃(k) ].   (44)

The terms {J_i(k)}_{i=1}^{3} in (44) are given by:

    J₁(k) = d_y(k) m_y^H(k) − d_z(k) m_x^H(k),   (45)
    J₂(k) = q_y(k) [ n_y^H(k) + b₁(k) m_y^H(k) − b₂(k) m_x^H(k) ],   (46)
    J₃(k) = q_x(k) [ b₃(k) m_x^H(k) − n_z^H(k) − b₂*(k) m_y^H(k) ],   (47)

where

    d_y(k) = P(k−1) y(k),   (48)
    d_z(k) = P(k−1) z(k),   (49)
    n_y(k) = G^H(k−1) y(k),   (50)
    b₁(k) = y^H(k) y(k),   (51)
    b₂(k) = y^H(k) z(k),   (52)
    n_z(k) = G^H(k−1) z(k),   (53)
    b₃(k) = z^H(k) z(k).   (54)

Following the procedure described in Algorithm 1, we form T̃(k) and compute the eigenpair (Λ̃(k), Ṽ(k)). The r leading generalized eigenvectors of the matrix pencil (R_y, R_x) are then estimated via:

    W_r(k) = Q(k) Ṽ(k).   (55)

Computational Cost of Algorithm 2. Computing the parameters {b_i(k)}_{i=1}^{3} requires O(N) operations. The computations of m_x(k), m_y(k), n_z(k), n_y(k) ∈ ℂ^d cost O(Nd). Computing {J_i(k)}_{i=1}^{3} ∈ ℂ^{N×d} and updating G(k), H(k) ∈ ℂ^{N×d} cost O(Nd). Updating q_x(k), q_y(k), d_z(k), d_y(k), z(k) ∈ ℂ^N needs 5N² multiplications, and the calculations of P(k), Q_x(k) require 3N² multiplications. The dominant cost of Algorithm 2 is thus 8N² + O(Nd). The extraction of the generalized eigenvectors needs O(Nr²) + O(r³) operations; therefore, the flop count of Algorithm 2 satisfies 8N² + O(Nr²).
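The backbone of Algorithm 2 is the Sherman–Morrison recursion (39). A minimal numerical check (our own sketch, with illustrative sizes) that it tracks R_x^{−1}(k) exactly up to round-off:

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 8, 0.998
Rx = np.eye(N)            # Rx(k), updated via eq. (2)
Qx = np.eye(N)            # Qx(k) = Rx^{-1}(k), updated via eq. (39)
for _ in range(100):
    x = rng.standard_normal(N)
    qx = Qx @ x                                       # q_x(k) = Qx(k-1) x(k)
    Qx = (Qx - np.outer(qx, qx.conj()) / (alpha + x.conj() @ qx)) / alpha
    Rx = alpha * Rx + np.outer(x, x.conj())
print(np.allclose(Qx @ Rx, np.eye(N)))                # True up to round-off
```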

[Figure 1 appears here. Panels: (a) Direction cosine of the 1st generalized eigenvector. (b) Sample standard deviation of the 1st generalized eigenvector. (c) Direction cosine of the 2nd generalized eigenvector. (d) Sample standard deviation of the 2nd generalized eigenvector. Each panel plots its metric versus the iteration number k for Algorithm 1, Algorithm 2, and the PI-based, PAST-based, gradient-based, R-GEVE and GSVD methods.]

Fig. 1: DC and SSD results of tracking the first and second eigenvectors.

4. EXPERIMENTAL RESULTS

We validate the proposed Algorithms 1 and 2 through a subspace tracking problem. The performance of our algorithms is compared with the following algorithms: (1) the power-iteration-based (PI-based) method [19]; (2) the PAST-based method, where we use the sequential version [16, Algorithm 2]; (3) reduced-rank generalized eigenvector extraction (R-GEVE) [18]; (4) the gradient-based method with negative step-size [15, Algorithm 3]; and (5) batch-mode GSVD [11].

Our signals are generated by two sinusoids in additive noise, defined in the time domain [14–16, 18, 19, 23] as follows:

    x(k) = √2 sin(0.46πk + θ₂) + √2 sin(0.74πk + θ₃) + n₁(k),   (56)
    y(k) = √2 sin(0.62πk + θ₁) + n₂(k),   (57)

where {θ_i}_{i=1}^{3} follow the uniform distribution U(0, 2π), and n₁(k) and n₂(k) are zero-mean white Gaussian noises with variance σ₁² = σ₂² = 0.1. The vectors {y(k)} and {x(k)} are arranged in blocks of size N = 8, that is, y(k) = [y(k), ..., y(k−N+1)]^T, x(k) = [x(k), ..., x(k−N+1)]^T, and k ≥ N. The generalized eigenvalues of the matrix pencil (R_x, R_y) are given by λ₁ = 16.0680, λ₂ = 6.8302, λ₃ = 1.0, λ₄ = 1.0, λ₅ = 0.1592, λ₆ = 0.0708, λ₇ = 0.0254, and λ₈ = 0.0198. We track the first four dominant generalized eigenvectors. For the random matrix Φ of Algorithms 1 and 2, we set d = 5. The parameters of the considered algorithms are set as follows: for the PAST-based algorithm, we set µ = 0.998 as suggested in [16]; for R-GEVE, we set β₁ = β₂ = 0.998 as proposed in [18]; for the other algorithms, we set α = β = 0.998. In the gradient-based algorithm, the step-size is set to η = −0.0005 ∈ (2/(λ_N − λ₁), 0). All algorithms are initialized with R_x = R_y = I_N and w_i(0) = e_i for i = 1, ..., 4, where e_i is the ith column of I_N.
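For reproducibility, the test signals (56)–(57) and their length-N snapshot vectors can be generated as below; the helper name blocks and the horizon of 10,000 steps are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, num_steps = 8, 10_000
theta = rng.uniform(0.0, 2.0 * np.pi, size=3)          # theta_1..theta_3 ~ U(0, 2*pi)
k = np.arange(1, num_steps + N)                        # scalar time index
x = (np.sqrt(2) * np.sin(0.46 * np.pi * k + theta[1])  # eq. (56)
     + np.sqrt(2) * np.sin(0.74 * np.pi * k + theta[2])
     + np.sqrt(0.1) * rng.standard_normal(k.size))     # n1(k), variance 0.1
y = (np.sqrt(2) * np.sin(0.62 * np.pi * k + theta[0])  # eq. (57)
     + np.sqrt(0.1) * rng.standard_normal(k.size))     # n2(k), variance 0.1

def blocks(s, N):
    """Snapshot vectors s(k) = [s(k), ..., s(k - N + 1)]^T for k >= N."""
    return np.stack([s[i : i + N][::-1] for i in range(s.size - N + 1)])

X, Y = blocks(x, N), blocks(y, N)                      # one row per instant k
```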

To compare the convergence speed and estimation accuracy, we use the direction cosine, which measures the similarity between the ith estimated and exact generalized eigenvectors of (R_y, R_x):

    DC_i(k) = |w̃_i^H(k) w_i| / ( ‖w̃_i(k)‖ ‖w_i‖ ),   (58)

where w̃_i and w_i are the ith estimated and exact generalized eigenvectors, respectively. The result of (58) is averaged over 100 independent trials. Moreover, to measure the numerical stability of the considered algorithms, we make use of the sample standard deviation (SSD) of the direction cosine, defined as:

    SSD_i(k) = √( (1/(L−1)) Σ_{j=1}^{L} [ DC_{i,j}(k) − DC̄_i(k) ]² ),   (59)

where DC_{i,j}(k) is the direction cosine of the jth independent trial, j = 1, ..., L, of the ith estimated generalized eigenvector, and DC̄_i(k) is the direction cosine of the ith estimated generalized eigenvector averaged over the L trials. Here L = 100.
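Both metrics translate directly into code; the following sketch (our naming) assumes the estimated and exact eigenvectors are available as NumPy arrays:

```python
import numpy as np

def direction_cosine(w_est, w_true):
    """DC_i(k) of eq. (58): |w_est^H w_true| / (||w_est|| ||w_true||)."""
    return (np.abs(np.vdot(w_est, w_true))
            / (np.linalg.norm(w_est) * np.linalg.norm(w_true)))

def sample_std_dev(dc_trials):
    """SSD_i(k) of eq. (59); dc_trials holds DC_{i,j}(k) for j = 1, ..., L."""
    L = dc_trials.size
    return np.sqrt(np.sum((dc_trials - dc_trials.mean()) ** 2) / (L - 1))
```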

The results for the first two generalized eigenvectors are displayed in Fig. 1. We make several observations: i) for the first generalized eigenvector, Algorithm 2 outperforms the other algorithms in terms of convergence speed, estimation accuracy and numerical stability; ii) for the second generalized eigenvector, Algorithm 2 again shows the best performance among the considered algorithms in convergence speed and numerical stability, while its estimation accuracy is similar to that of the gradient-based method; iii) Algorithm 1 shows performance similar to the PI-based method, yet it is computationally more efficient (see Section 3.2); iv) the gradient-based method has the slowest convergence speed among all methods.

5. CONCLUSION

In this paper, we proposed the APR-EVD algorithm for standard eigenvalue decomposition through randomization. By exploiting the rank-1 update strategy, we developed two online algorithms based on APR-EVD for generalized eigenvector extraction. Our numerical results show that Algorithm 2 outperforms the compared algorithms in tracking the first two dominant generalized eigenvectors in terms of convergence speed, estimation accuracy and numerical stability. Further, although Algorithm 1 has performance similar to the PI-based method, it has a lower computational cost.

6. REFERENCES

[1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, 2000.
[2] L. Parra and P. Sajda, "Blind source separation via generalized eigenvalue decomposition," Journal of Machine Learning Research, vol. 4, no. 4, pp. 1261–1269, Dec. 2003.
[3] G. Yingbin, K. Xiangyu, Z. Zhengxin, and H. Li'an, "An adaptive self-stabilizing algorithm for minor generalized eigenvector extraction and its convergence analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4869–4881, Oct. 2018.
[4] X. Han and L. Clemmensen, "Regularized generalized eigen-decomposition with applications to sparse supervised feature extraction and sparse discriminant analysis," Pattern Recognition, vol. 49, pp. 43–54, Dec. 2016.
[5] G. Yuan, L. Shen, and W. Zheng, "A decomposition algorithm for the sparse generalized eigenvalue problem," in CVPR, 2019, pp. 6113–6122.
[6] A. Valizadeh and M. Najibi, "A constrained optimization approach for an adaptive generalized subspace tracking algorithm," Computers & Electrical Engineering, vol. 36, no. 4, pp. 596–602, 2010.
[7] H. Chen, G. Jiang, and K. Yoshihira, "Failure detection in large-scale internet services by principal subspace mapping," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1308–1320, 2007.
[8] D. R. Morgan, "Adaptive algorithms for solving generalized eigenvalue signal enhancement problems," Signal Processing, vol. 84, no. 6, pp. 957–968, 2004.
[9] S. Ding, X. Hua, and J. Yu, "An overview on nonparallel hyperplane support vector machine algorithms," Neural Computing and Applications, vol. 25, no. 5, pp. 975–982, 2014.
[10] J. Benesty, M. G. Christensen, and J. R. Jensen, Signal Enhancement with Variable Span Linear Filters, vol. 7, Springer, 2016.
[11] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed., Johns Hopkins Univ. Press, Baltimore, MD, 1996.
[12] S. Choi, J. Choi, H.-J. Im, and B. Choi, "A novel adaptive beamforming algorithm for antenna array CDMA systems with strong interferers," IEEE Transactions on Vehicular Technology, vol. 51, no. 5, pp. 808–816, 2002.
[13] R. Möller and A. Könies, "Coupled principal component analysis," IEEE Transactions on Neural Networks, vol. 15, no. 1, pp. 214–222, 2004.
[14] X. Feng, X. Kong, Z. Duan, and H. Ma, "Adaptive generalized eigen-pairs extraction algorithms and their convergence analysis," IEEE Transactions on Signal Processing, vol. 64, no. 11, pp. 2976–2989, 2016.
[15] T. D. Nguyen, N. Takahashi, and I. Yamada, "An adaptive extraction of generalized eigensubspace by using exact nested orthogonal complement structure," Multidimensional Systems and Signal Processing, vol. 24, no. 3, pp. 457–483, 2013.
[16] J. Yang, H. Xi, F. Yang, and Y. Zhao, "RLS-based adaptive algorithms for generalized eigen-decomposition," IEEE Transactions on Signal Processing, vol. 54, no. 4, pp. 1177–1188, 2006.
[17] B. Yang, "Projection approximation subspace tracking," IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 95–107, 1995.
[18] S. Attallah and K. Abed-Meraim, "A fast adaptive algorithm for the generalized symmetric eigenvalue problem," IEEE Signal Processing Letters, vol. 15, pp. 797–800, 2008.
[19] T. Tanaka, "Fast generalized eigenvector tracking based on the power method," IEEE Signal Processing Letters, vol. 16, no. 11, pp. 969–972, 2009.
[20] N. Halko, P.-G. Martinsson, and J. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," SIAM Review, vol. 53, no. 2, pp. 217–288, Jun. 2011.
[21] M. Gu, "Subspace iteration randomization and singular value problems," SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. A1139–A1173, 2015.
[22] M. F. Kaloorazi and J. Chen, "Randomized truncated pivoted QLP factorization for low-rank matrix recovery," IEEE Signal Processing Letters, vol. 26, no. 7, pp. 1075–1079, Jul. 2019.
[23] T. D. Nguyen and I. Yamada, "Adaptive normalized quasi-Newton algorithms for extraction of generalized eigen-pairs and their convergence analysis," IEEE Transactions on Signal Processing, vol. 61, no. 6, pp. 1404–1418, 2013.
[24] G. W. Stewart, Matrix Algorithms: Volume I: Basic Decompositions, SIAM, Philadelphia, PA, 1998.
[25] Y. Saad, Numerical Methods for Large Eigenvalue Problems, 2nd ed., SIAM, 2011.
[26] G. W. Stewart, "A generalization of Saad's theorem on Rayleigh–Ritz approximations," Linear Algebra and its Applications, vol. 327, no. 1–3, pp. 115–119, 2001.
