NUMERICAL ALGEBRA, CONTROL AND OPTIMIZATION
Volume 2, Number 4, December 2012, pp. 785–796
doi:10.3934/naco.2012.2.785

A MULTIGRID METHOD FOR THE MAXIMAL CORRELATION PROBLEM

Xin-Guo Liu and Kun Wang
School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China

(Communicated by Zhongzhi Bai)

Abstract. The maximal correlation problem (MCP), which aims at optimizing correlations between sets of variables, plays an important role in several areas of statistical application. Several algorithms have been proposed for MCP, but none of them is guaranteed to find a global optimal solution. Employing an idea from the multigrid method for PDEs, a new method for MCP is proposed in the present paper. Numerical tests are carried out and suggest that the new method outperforms the existing ones in finding a global optimal solution of MCP.

2000 Mathematics Subject Classification. Primary: 62H20; Secondary: 65F10.
Key words and phrases. Multivariate statistics, correlation, maximal correlation problem, multivariate eigenvalue problem, Horst method, Gauss-Seidel method, alternating projection method, global solution, starting point.
This research was supported in part by NSF of China (10971204, 11071228) and NSF of Shandong province (Y2008A07).

1. Introduction. The maximal correlation problem (MCP), a generalization of canonical correlation analysis [9], aims at assessing the relationship between $m$ sets of random variables. Suppose $A \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix, and $\mathcal{P} = \{n_1, n_2, \dots, n_m\}$ is a set of positive integers with $\sum_{i=1}^{m} n_i = n$. Let
$$\Sigma_m = \{x = (x_1^\top, \dots, x_m^\top)^\top \in \mathbb{R}^n : x_i \in \mathbb{R}^{n_i},\ \|x_i\|_2 = 1 \ \text{for } i = 1, 2, \dots, m\}$$
and $\rho(x) = x^\top A x$. The MCP is formulated as the following optimization problem:
$$\max \rho(x) \quad \text{s.t.}\ x \in \Sigma_m. \tag{1.1}$$
Upon employing the Lagrange multiplier theory, it is easy to see that the first-order necessary optimality condition for (1.1) is the existence of real scalars $\lambda_1, \dots, \lambda_m$ and a vector $x \in \Sigma_m$ that solve the system of equations
$$Ax = \Lambda x, \quad x \in \Sigma_m, \tag{1.2}$$
where $\Lambda = \operatorname{diag}\{\lambda_1 I_{n_1}, \dots, \lambda_m I_{n_m}\}$. This is the multivariate eigenvalue problem (MEP), and a pair $(\Lambda, x)$ satisfying (1.2) is called a solution to the MEP.

A brief but comprehensive discussion of the statistical background of MCP can be found in Sun [11] and in Chu and Watterson [1]. More applications of MEP can be found in [10, 13], and generalizations of MCP in [12]. An important result proved in [1] states that whenever the matrix $A$ has $n$ distinct eigenvalues, there are precisely $\prod_{i=1}^{m} (2n_i)$ solutions to the MEP (1.2).

Obviously, the special case of MEP with $m = 1$ is the classical symmetric eigenvalue problem. The Horst method [7] is a generalization of the classical power iteration; its convergence was established later by Hu [8], and independently by Chu and Watterson [1]. A Gauss-Seidel-type iteration was proposed in [1] and analyzed later in [17]. As another generalization of the Horst method, the P-SOR iteration was proposed and analyzed by Sun [11]. For the general MCP, these algorithms stop at solutions of the associated MEP for a related matrix $A$.
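To make the iteration concrete, the following is a minimal Python sketch of a Horst-type iteration in the spirit of the generalized power method described above: multiply by $A$, then renormalize each block to unit length. The function name, stopping rule, and test matrix are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def horst_iteration(A, blocks, x0, tol=1e-10, max_iter=10000):
    """Horst-type (blockwise power) iteration for the MEP Ax = Lambda*x.

    A      : symmetric positive definite (n, n) array
    blocks : block sizes [n1, ..., nm] with sum equal to n
    x0     : starting point whose blocks each have unit 2-norm
    """
    offsets = np.cumsum([0] + list(blocks))
    x = x0.copy()
    for _ in range(max_iter):
        y = A @ x
        x_new = np.empty_like(x)
        for i in range(len(blocks)):
            yi = y[offsets[i]:offsets[i + 1]]
            x_new[offsets[i]:offsets[i + 1]] = yi / np.linalg.norm(yi)  # renormalize block i
        if np.linalg.norm(x_new - x) < tol:  # stationary, hence a solution of the MEP
            return x_new
        x = x_new
    return x

# Usage: a random SPD matrix with partition P = {2, 3}
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)
blocks = [2, 3]
x0 = np.concatenate([np.ones(k) / np.sqrt(k) for k in blocks])
x = horst_iteration(A, blocks, x0)
print("rho(x) =", x @ A @ x)  # objective value at the computed MEP solution
```

The computed $x$ is a stationary point of (1.1); as discussed below, it need not be a global optimum.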
Recently, Grzegórski [4] and, independently, Zhang and Liao [18] proposed an alternating projection method (APM) (in [18], called the alternating variable method (AVM)). They prove that (i) the algorithm converges globally and monotonically to a solution of MEP under some assumptions, and (ii) any accumulation point satisfies a global optimality condition of MCP. However, simple examples show that the APM also cannot be guaranteed to converge to a global optimal solution of MCP (see Section 3 below).

In the search for a global optimal solution of MCP, several achievements have been made by Hanafi and Ten Berge [6], Zhang and Chu [17], and Zhang, Liao and Sun [19]. For the general case, however, solving (1.1) globally remains very challenging. One idea, suggested by Xu [14] and independently by Zhang and Chu [17], is to design a starting-point strategy for the existing algorithms. Even though numerical experiments have demonstrated substantial improvement in the probability of finding a global optimal solution of (1.1), we are still not satisfied with the current algorithms. Employing an idea of the multigrid method for PDEs, in this paper we propose a multigrid-type method for MCP.

The rest of the paper is organized as follows. In the next section, we prove several properties of MCP. In Section 3, simple examples are presented to demonstrate that no algorithm so far can guarantee convergence to a global optimal solution of MCP. We propose a multigrid algorithm in Section 4. Finally, in Section 5, numerical tests are carried out and suggest superior performance of the new algorithm over the others in finding a global optimal solution of MCP.

2. Properties of MCP. Let $A \in \mathbb{R}^{n \times n}$ be symmetric and positive definite. If $\mathcal{P} = \{n_1, n_2, \dots, n_m\}$ is a set of positive integers with $\sum_{i=1}^{m} n_i = n$, then we call $\mathcal{P}$ a partition of $n$. Let $\mathcal{P} = \{n_1, n_2, \dots, n_m\}$ be a partition of $n$, and let $\mathcal{P}' = \{n_1, \dots, n_{k-1}, \hat{n}_k, n_{k+2}, \dots, n_m\}$ with $\hat{n}_k = n_k + n_{k+1}$. Then we call $\mathcal{P}$ a finer partition of $\mathcal{P}'$.

Obviously, given a partition $\mathcal{P} = \{n_1, n_2, \dots, n_m\}$ of $n$, there is a sequence of partitions $\mathcal{P}_1, \dots, \mathcal{P}_m$ such that
(i) $\mathcal{P}_1 = \{n\}$ and $\mathcal{P}_m = \mathcal{P}$;
(ii) $\mathcal{P}_i$ is finer than $\mathcal{P}_{i-1}$ for $i = 2, \dots, m$.
Let $\mathcal{P}_i = \{t_1, \dots, t_i\}$. Accordingly, let
$$\Sigma_i = \{x = (x_1^\top, \dots, x_i^\top)^\top \in \mathbb{R}^n : x_j \in \mathbb{R}^{t_j},\ \|x_j\|_2 = 1 \ \text{for } j = 1, 2, \dots, i\},$$
$$\Omega_i = \{x = (x_1^\top, \dots, x_i^\top)^\top \in \mathbb{R}^n : x_j \in \mathbb{R}^{t_j},\ \|x_j\|_2 \le 1 \ \text{for } j = 1, 2, \dots, i\}.$$

Theorem 2.1.
$$\frac{1}{2}\max_{x \in \Sigma_i} \rho(x) \le \max_{x \in \Sigma_{i-1}} \rho(x) \le \max_{x \in \Sigma_i} \rho(x). \tag{2.1}$$

Proof. It is proved by Sun [11] that
$$\max_{x \in \Sigma_i} \rho(x) = \max_{x \in \Omega_i} \rho(x).$$
Since every $x \in \Sigma_{i-1}$ lies in $\Omega_i$, this implies that the second inequality in (2.1) holds. On the other hand, we note that $\frac{1}{\sqrt{2}} x \in \Omega_{i-1}$ for any $x \in \Omega_i$. Then
$$\frac{1}{2}\max_{x \in \Sigma_i} \rho(x) = \frac{1}{2}\max_{x \in \Omega_i} \rho(x) = \max_{x \in \Omega_i} \rho\Big(\frac{x}{\sqrt{2}}\Big) \le \max_{y \in \Omega_{i-1}} \rho(y) = \max_{x \in \Sigma_{i-1}} \rho(x).$$
This completes the proof.

The following corollary presents bounds on the optimal value of MCP.

Corollary 2.2. Let $A = (a_{ij})_{n \times n}$. Then
$$\lambda_{\max}(A) \le \max_{x \in \Sigma_m} \rho(x) \le \min\{m \lambda_{\max}(A), \mu\}, \tag{2.2}$$
where $\lambda_{\max}(A)$ stands for the largest eigenvalue of $A$, and
$$\mu = \max_{|d_1| = \cdots = |d_n| = 1} \sum_{i,j=1}^{n} d_i d_j a_{ij}.$$

Proof. The first inequality can be reformulated as
$$\max_{x \in \Sigma_1} \rho(x) \le \max_{x \in \Sigma_m} \rho(x),$$
and follows directly from Theorem 2.1, since $\Sigma_1$ is the unit sphere and hence $\max_{x \in \Sigma_1} \rho(x) = \lambda_{\max}(A)$. The inequality
$$\max_{x \in \Sigma_m} \rho(x) \le \mu$$
is equivalent to
$$\max_{x \in \Sigma_m} \rho(x) \le \max_{x \in \Sigma_n} \rho(x),$$
and also follows directly from Theorem 2.1; here $\Sigma_n$ corresponds to the finest partition, whose blocks all have size one, so that $\max_{x \in \Sigma_n} \rho(x) = \mu$. Finally, since $\|x\|_2^2 = m$ for every $x \in \Sigma_m$, the Courant-Fischer minimax theorem (see, e.g., [3, Theorem 8.1.2]) gives
$$\rho(x) \le m \lambda_{\max}(A).$$
This completes the proof.
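As a quick numerical illustration of (2.2), the following sketch (ours, not part of the paper) computes both bounds for a small random example; since $\Sigma_n$ consists of $\pm 1$ vectors, $\mu$ can be found by enumeration when $n$ is small, and a crude estimate of $\max_{x \in \Sigma_m} \rho(x)$ is obtained from random feasible points.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
blocks = [2, 2]                      # partition P = {2, 2}, so m = 2
n = sum(blocks)
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)              # symmetric positive definite test matrix

m = len(blocks)
lam_max = np.linalg.eigvalsh(A)[-1]  # largest eigenvalue of A

# mu: maximize d^T A d over sign vectors d in {-1, +1}^n (enumeration; small n only)
mu = max(np.array(d) @ A @ np.array(d)
         for d in itertools.product([-1.0, 1.0], repeat=n))

# crude lower estimate of max rho over Sigma_m from random feasible points
def random_feasible():
    parts = [rng.standard_normal(k) for k in blocks]
    return np.concatenate([p / np.linalg.norm(p) for p in parts])

rho_est = max(x @ A @ x for x in (random_feasible() for _ in range(20000)))

print("lambda_max(A)      :", lam_max)            # lower bound in (2.2)
print("estimated max rho  :", rho_est)            # should lie between the bounds
print("min{m*lam_max, mu} :", min(m * lam_max, mu))
```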
The next theorem presents other bounds for the MCP value.

Theorem 2.3. Let $\mathcal{P} = \{n_1, n_2, \dots, n_m\}$ be a partition of $n$. Partition $A$ into block form according to $\mathcal{P}$ as follows:
$$A = \begin{pmatrix} A_{11} & A_{12} & \dots & A_{1m} \\ A_{21} & A_{22} & \dots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \dots & A_{mm} \end{pmatrix} \in \mathbb{R}^{n \times n},$$
with $A_{ii} \in \mathbb{R}^{n_i \times n_i}$. Then
$$\sum_{i=1}^{m} \lambda_{\max}(A_{ii}) \le \max_{x \in \Sigma_m} \rho(x) \le \Big(\sum_{i=1}^{m} \lambda_{\max}^{1/2}(A_{ii})\Big)^2. \tag{2.3}$$

Proof. The first inequality is a consequence of [17, Theorem 3.2]. In order to prove the second inequality in (2.3), consider the Cholesky factorization of $A$, $A = L^\top L$, where $L = [L_1, \dots, L_m]$ with $L_i \in \mathbb{R}^{n \times n_i}$. Then, for any $x = (x_1^\top, \dots, x_m^\top)^\top \in \Sigma_m$, one gets
$$\rho(x) = \|Lx\|_2^2 = \|L_1 x_1 + \cdots + L_m x_m\|_2^2 \le \Big(\sum_{i=1}^{m} \|L_i\|_2\Big)^2.$$
This inequality and the fact that $A_{ii} = L_i^\top L_i$, whence $\|L_i\|_2 = \lambda_{\max}^{1/2}(A_{ii})$, imply that the second inequality in (2.3) holds. This completes the proof.

The following theorem presents a comparison of the lower bounds in (2.2) and (2.3).

Theorem 2.4.
$$\lambda_{\max}(A) \le \sum_{i=1}^{m} \lambda_{\max}(A_{ii}). \tag{2.4}$$

Proof. Let
$$Ax = \lambda_{\max}(A) x, \quad x \in \mathbb{R}^n, \quad \|x\|_2 = 1.$$
Partition $x$ according to $\mathcal{P}$ into block form
$$x = (x_1^\top, \dots, x_m^\top)^\top, \quad x_i \in \mathbb{R}^{n_i},$$
and write each $x_i$ as
$$x_i = t_i v_i, \quad v_i \in \mathbb{R}^{n_i}, \quad \|v_i\|_2 = 1, \quad t_i \in \mathbb{R}.$$
Let
$$V = \operatorname{diag}(v_1, \dots, v_m), \quad t = (t_1, \dots, t_m)^\top.$$
Obviously, one has
$$V^\top V = I_m, \quad \|t\|_2 = 1, \quad x = Vt.$$
As a consequence, we see that $(V^\top A V) t = \lambda_{\max}(A) t$. Since $\lambda_{\max}(A)$ is thus an eigenvalue of the symmetric positive definite matrix $V^\top A V$, it is bounded by the trace, and therefore
$$\lambda_{\max}(A) \le \operatorname{tr}(V^\top A V) = \sum_{i=1}^{m} v_i^\top A_{ii} v_i \le \sum_{i=1}^{m} \lambda_{\max}(A_{ii}).$$
This completes the proof.

The next corollary presents a comparison of the upper bounds in (2.2) and (2.3) under some assumptions.

Corollary 2.5. Suppose $A_{ii} = I_{n_i}$ for $i = 1, 2, \dots, m$. Then
$$\lambda_{\max}(A) \le m. \tag{2.5}$$

Corollary 2.5 demonstrates that under the assumption $A_{ii} = I_{n_i}$ for $i = 1, \dots, m$, the upper bound in (2.2) is sharper than the one in (2.3): in this case the upper bound in (2.3) equals $m^2$, while $m \lambda_{\max}(A) \le m^2$ by (2.5). On the other hand, it is worth pointing out that for a general $A$, $m \lambda_{\max}(A)$ may be larger than the upper bound in (2.3), as the following example demonstrates.
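Before turning to that example, here is a small numerical check (our own sketch, under the same random test setup as before) that computes the two sides of (2.3) from the diagonal blocks and compares the lower bounds of (2.2) and (2.4).

```python
import numpy as np

rng = np.random.default_rng(2)
blocks = [2, 3, 2]                           # partition P = {2, 3, 2}
n = sum(blocks)
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)                      # symmetric positive definite

offsets = np.cumsum([0] + blocks)
diag_blocks = [A[offsets[i]:offsets[i + 1], offsets[i]:offsets[i + 1]]
               for i in range(len(blocks))]
lam_blocks = [np.linalg.eigvalsh(Aii)[-1] for Aii in diag_blocks]

lam_max = np.linalg.eigvalsh(A)[-1]          # lower bound in (2.2)
lower_23 = sum(lam_blocks)                   # lower bound in (2.3)
upper_23 = sum(lam ** 0.5 for lam in lam_blocks) ** 2  # upper bound in (2.3)

print("lambda_max(A)                 :", lam_max)
print("sum_i lambda_max(A_ii)        :", lower_23)   # >= lambda_max(A) by (2.4)
print("(sum_i lambda_max(A_ii)^1/2)^2:", upper_23)
```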