Regularized Consensus PCA

Michel Tenenhaus (1), Arthur Tenenhaus (2), Patrick J. F. Groenen (3)

(1) HEC Paris, Jouy-en-Josas, France, [email protected]
(2) CentraleSupélec-L2S, UMR CNRS 8506, Gif-sur-Yvette, France and Bioinformatics-Biostatistics Platform IHU-A-ICM, Paris, France, [email protected]
(3) Econometric Institute, Erasmus School of Economics, Erasmus University, Rotterdam, [email protected]

Abstract

A new framework for many multiblock component methods (including consensus and hierarchical PCA) is proposed. It is based on the consensus PCA model: a scheme connecting each block of variables to a superblock obtained by concatenation of all blocks. Regularized consensus PCA is obtained by applying regularized generalized canonical correlation analysis to this scheme for the function $g(x) = x^m$ where $m \geq 1$. A gradient algorithm is proposed. At convergence, a solution of the stationary equation related to the optimization problem is obtained. For $m = 1$, 2 or 4 and shrinkage constants equal to 0 or 1, many multiblock component methods are recovered.

Keywords: Consensus PCA, hierarchical PCA, multiblock component methods, regularized generalized canonical correlation analysis.

1. Introduction

In this paper, we consider several data matrices $\mathbf{X}_1, \dots, \mathbf{X}_b, \dots, \mathbf{X}_B$. Each $n \times J_b$ data matrix $\mathbf{X}_b$ is called a block and represents a set of $J_b$ variables observed on $n$ individuals. The number and the nature of the variables usually differ from one block to another but the individuals must be the same across blocks. We assume that all variables are centered. Generalized canonical correlation analysis (Carroll, 1968a), consensus PCA (Westerhuis, Kourti, and MacGregor, 1998) and hierarchical PCA (Smilde, Westerhuis, and de Jong, 2003) can be considered as methods extending PCA of variables to PCA of blocks of variables. These methods belong to the family of multiblock component methods. These three methods are special cases of a very general optimization problem. The objective is to find block components $\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b$, $b = 1, \dots, B$, and an auxiliary variable $\mathbf{y}_{B+1}$ solution of the following optimization problem:

B m Maximize cov(Xwbb , y B1 ) w ,...,w, y  (1) 11BB b1 t s.t. wMwbbb 1, bB 1, ..., , and var( y B1 )  1

where $m$ is any value $\geq 1$ and $\mathbf{M}_b = \tau_b\mathbf{I} + (1-\tau_b)\frac{1}{n}\mathbf{X}_b^t\mathbf{X}_b$ with a shrinkage constant $\tau_b$ between 0 and 1. The resulting auxiliary variable $\mathbf{y}_{B+1}$ plays the role of a principal component of the superblock taking into account the block structure. The quantity

$$c_b = \operatorname{cov}(\mathbf{X}_b\mathbf{w}_b, \mathbf{y}_{B+1})^m \Big/ \sum_{b=1}^{B}\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b, \mathbf{y}_{B+1})^m$$

measures the contribution of block $b$ to the construction of the auxiliary variable $\mathbf{y}_{B+1}$. For $m = 2$ and all $\tau_b = 0$, (1) corresponds to

Carroll's generalized canonical correlation analysis. For $m = 2$ and all $\tau_b = 1$, consensus PCA (version of Westerhuis, Kourti, and MacGregor, 1998) is recovered; in this case, the auxiliary variable is in fact the first principal component of the superblock. For $m = 4$ and all $\tau_b = 1$, hierarchical PCA (version of Smilde, Westerhuis, and de Jong, 2003) is obtained.

In this paper, we show that the three multiblock component methods mentioned above and many others can also be included in a general framework that we call "regularized consensus PCA". It is based on the "consensus PCA model" defined as a scheme connecting each block $\mathbf{X}_1,\dots,\mathbf{X}_b,\dots,\mathbf{X}_B$ to the superblock $\mathbf{X}_{B+1} = [\mathbf{X}_1,\dots,\mathbf{X}_b,\dots,\mathbf{X}_B]$. In Wold (1982) and in Lohmöller (1989) this model is discussed under the name "hierarchical model with repeated manifest variables", but in this paper we prefer to use the first name. In (1), the auxiliary variable is replaced by a superblock component $\mathbf{y}_{B+1} = \mathbf{X}_{B+1}\mathbf{w}_{B+1}$ and the following optimization problem is considered:

B m Maximize cov(Xwbb , X B11 w B ) w ,...,w  (2) 11B b1 t s.t. wMwbbb1, bB1,...,  1

1 where m 1, MI(1 ) XXt and 01  . Optimization problem (2) is a special bb bn bb b case of RGCCA-M, a new regularized generalized canonical correlation analysis method proposed by Groenen and Tenenhaus (2015). In this paper, (2) will be studied directly without reference to RGCCA-M because many simplifications occur and new mathematical properties are obtained when considering this specific problem. In Section 2, we present a very simple iterative algorithm (called R-CPCA) related to (2). It is proved that this algorithm is monotonically convergent and yields a solution of the stationary equation related to (2). Furthermore, under a mild condition, the sequences of block and superblock weight- vectors are converging to this solution. However, it is not guaranteed that the globally optimal solution of (2) is reached, although this seems to be the case in practical applications.

Many well-known multiblock component methods are obtained as special cases of (2) by fixing $m$ and the shrinkage constants $\tau_1,\dots,\tau_B,\tau_{B+1}$. In Section 3, we study various special cases of the R-CPCA algorithm for $m = 1$, 2 or 4. When all shrinkage constants $\tau_b$ are equal to 0, several correlation-based methods are recovered, and when all these $\tau_b$ are equal to 1, covariance-based methods are recovered. For $\tau_1 = \dots = \tau_B = 1$ and $\tau_{B+1} = 0$, or the opposite, generalizations of redundancy analysis are obtained. Using shrinkage constants between 0 and 1 ensures a continuum between covariance-based and correlation-based methods.

Higher order block and superblock components can be obtained by adding orthogonality constraints to (2). Various deflation strategies are discussed in Section 4. Regularized consensus PCA allows the construction of orthogonal components in each block and in the superblock. In the usual consensus PCA and related methods, this orthogonality property of both the block and superblock components is not achievable because these methods use an auxiliary variable and not a superblock.

In Section 5, we discuss the connections between many multiblock component methods, regularized generalized canonical correlation analysis (Tenenhaus and Tenenhaus, 2011) and PLS path modeling (Wold (1982, 1985), Lohmöller (1989), Tenenhaus, Esposito Vinzi, Chatelin and Lauro (2005)).

2. Regularized consensus PCA

Consensus PCA was first introduced by Wold, Hellberg, Lundstedt, Sjöström and Wold (1987) as a method extending PCA of variables to PCA of blocks of variables. Westerhuis, Kourti, and MacGregor (1998) noticed a convergence problem in the consensus PCA algorithm and proposed a slightly modified version which solved this problem. Then, Hanafi, Kohler and Qannari (2011) showed that this modified algorithm was actually maximizing a covariance-based criterion. This modified consensus PCA algorithm and many other multiblock component methods are recovered with regularized consensus PCA.

2.1 Definition of regularized consensus PCA

Regularized consensus PCA of a set of data matrices $\mathbf{X}_1,\dots,\mathbf{X}_b,\dots,\mathbf{X}_B$ observed on the same individuals is defined by optimization problem (2). The choice $\mathbf{M}_b = \mathbf{I}$ corresponding to $\tau_b = 1$ is called Mode A, while the choice $\mathbf{M}_b = \frac{1}{n}\mathbf{X}_b^t\mathbf{X}_b$ corresponding to $\tau_b = 0$ is called Mode B. Choosing a shrinkage constant $\tau_b$ between 0 and 1 yields a shrinkage estimate $\mathbf{M}_b = \tau_b\mathbf{I} + (1-\tau_b)\frac{1}{n}\mathbf{X}_b^t\mathbf{X}_b$ of the true covariance matrix $\boldsymbol{\Sigma}_b$ related to block $b$ (Ledoit and Wolf, 2004) and ensures a continuum between Modes A and B. Note that the inverse of $\mathbf{M}_b$ is used in Algorithm 1 presented below to solve (2). Therefore, when Mode B is selected in (2) for a block or superblock data matrix $\mathbf{X}_b$, we assume that this matrix is full rank. Note that when the superblock is full rank and Mode B is selected for this superblock, problems (1) and (2) give the same results. This point will be discussed below.
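The metric $\mathbf{M}_b$ is used repeatedly in the sequel. As a simple illustration, a minimal NumPy sketch of the shrinkage metric is given below; the function name and its interface are ours, not part of the paper.

```python
import numpy as np

def shrinkage_metric(X, tau):
    """Metric M_b = tau * I + (1 - tau) * (1/n) * X'X for a centered block X.

    tau = 1 gives Mode A (identity metric), tau = 0 gives Mode B (covariance metric),
    intermediate values give the Ledoit-Wolf-type shrinkage estimate.
    """
    n, p = X.shape
    return tau * np.eye(p) + (1.0 - tau) * (X.T @ X) / n
```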

How can one manage a situation where the user wants to select Mode B for a block or superblock that is not of full rank? We first consider a superblock that is not of full rank. Mode B cannot be selected for this superblock in (2), but (1) can still be used instead. Regularized consensus PCA can also be extended to a situation where Mode B is selected for a block $\mathbf{X}_b$ that is not of full rank (but with rank$(\mathbf{X}_b) < n$, otherwise regularization is mandatory) by replacing $\mathbf{X}_b\mathbf{w}_b$ by $\mathbf{y}_b$ in the criterion used in (2) and by imposing the constraints $\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b$ and $\operatorname{var}(\mathbf{y}_b) = 1$. This situation will be discussed at the end of Section 2.


2.2 Regularized consensus PCA of transformed data

Optimization problem (2) can be simplified. Let $\mathbf{P}_b = \mathbf{X}_b\mathbf{M}_b^{-1/2}$, $\mathbf{Q}_b = \mathbf{P}_b^t\mathbf{P}_{B+1}$, $\mathbf{v}_b = \mathbf{M}_b^{1/2}\mathbf{w}_b$ and $\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b = \mathbf{P}_b\mathbf{v}_b$. Expressing (2) in terms of $\mathbf{P}_b$ and $\mathbf{v}_b$ yields

B m Maximize cov(Pvbb , P B11 v B ) v ,...,v  (3) 11B b1 t s.t. vvbb1, bB1,...,  1.

Maximizing each term of the criterion in (3) for a fixed $\mathbf{v}_{B+1}$ gives

$$\mathbf{v}_b^* = \mathbf{P}_b^t\mathbf{P}_{B+1}\mathbf{v}_{B+1}\big/\|\mathbf{P}_b^t\mathbf{P}_{B+1}\mathbf{v}_{B+1}\| = \mathbf{Q}_b\mathbf{v}_{B+1}\big/\|\mathbf{Q}_b\mathbf{v}_{B+1}\| \tag{4}$$

This computation assumes $\|\mathbf{Q}_b\mathbf{v}_{B+1}\| \neq 0$. In fact, $\mathbf{Q}_b\mathbf{v}_{B+1} = \mathbf{P}_b^t\mathbf{P}_{B+1}\mathbf{v}_{B+1} = \mathbf{0}$ implies that $\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b, \mathbf{y}_{B+1}) = 0$ for any $\mathbf{w}_b$. This means that the block $\mathbf{X}_b$ does not contribute to the superblock component $\mathbf{y}_{B+1}$ and could be removed from the analysis. Therefore, we suppose in the following that all $\|\mathbf{Q}_b\mathbf{v}_{B+1}\|$ appearing in this paper are strictly positive. For the optimal $\mathbf{v}_b^*$ in (4), we obtain

$$\operatorname{cov}(\mathbf{P}_b\mathbf{v}_b^*, \mathbf{P}_{B+1}\mathbf{v}_{B+1}) = \frac{1}{n}\|\mathbf{Q}_b\mathbf{v}_{B+1}\|. \tag{5}$$

Using (5), optimization problem (3) becomes

$$\text{Maximize } \frac{1}{n^m}\sum_{b=1}^{B}\|\mathbf{Q}_b\mathbf{v}_{B+1}\|^m \quad \text{s.t.} \quad \|\mathbf{v}_{B+1}\| = 1. \tag{6}$$

Note that the objective function $\varphi(\mathbf{v}) = \sum_{b=1}^{B}\|\mathbf{Q}_b\mathbf{v}\|^m$ is convex and continuously differentiable for $m \geq 1$. Convexity comes from the fact that the Euclidean norm is convex and that a monotonically increasing convex function of a convex function is convex, too. Therefore, we may use all the results on the maximization of a convex function presented in Appendix 1.

We first compute the gradient $\nabla\varphi(\mathbf{v})$ of $\varphi(\mathbf{v})$:

m 2 BBm B Qvm 221 Qv m (7) ()vQvQvQQvbb  2 =m t . bbbb  bb11vv2 b  1


It appears from the following equation

B t m (8) vv()mm Qvb  () v b1 that the gradient ()v is different from 0 for any point v such that ()v  0. We consider  the Lagrangian function F (,vvvv ) () t  1 related to optimization problem (6). 2 The optimal solution satisfies the stationary equation

F (9) vvv0)()   . v

Using (8) and the constraint $\|\mathbf{v}\| = 1$, (9) can also be written as:

F (10) vv)()() m  vv0  . v

We denote a unit norm solution of the stationary equation (9) by

$$\mathbf{v}^* = \frac{\nabla\varphi(\mathbf{v}^*)}{\|\nabla\varphi(\mathbf{v}^*)\|}. \tag{11}$$

The globally optimal solution $\mathbf{v}^*$ of (6) is the solution of (10) maximizing $\varphi(\mathbf{v}^*)$. Using Algorithm 2 given in Appendix 1, we obtain Algorithm 1 (R-CPCA) for optimization problem (6). This algorithm guarantees a solution of the stationary equation (10), but not necessarily the globally optimal one. However, from our practical experience, the globally optimal solution is almost always reached whatever the random initialization point.

Input: $\mathbf{Q}_1,\dots,\mathbf{Q}_B$, $m$, $\varepsilon$, $\mathbf{v}_{B+1}^0$ such that $\varphi(\mathbf{v}_{B+1}^0) \neq 0$.
Output: $\mathbf{v}_{B+1}^s$ (approximate solution of (6)).
Repeat ($s = 0, 1, 2, \dots$):

$$\mathbf{v}_{B+1}^{s+1} = f(\mathbf{v}_{B+1}^{s}) = \frac{\sum_{b=1}^{B}\|\mathbf{Q}_b\mathbf{v}_{B+1}^{s}\|^{m-2}\,\mathbf{Q}_b^t\mathbf{Q}_b\mathbf{v}_{B+1}^{s}}{\left\|\sum_{b=1}^{B}\|\mathbf{Q}_b\mathbf{v}_{B+1}^{s}\|^{m-2}\,\mathbf{Q}_b^t\mathbf{Q}_b\mathbf{v}_{B+1}^{s}\right\|} \tag{12}$$

ss1 Until ()()vvBB11 .

ALGORITHM 1: R-CPCA algorithm for optimization problem (6) ($m$ is any value $\geq 1$; $\varepsilon$ is a small positive constant that determines the desired accuracy of the algorithm).
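Iteration (12) is straightforward to implement. The following NumPy sketch is ours (the paper provides no code): it assumes the matrices $\mathbf{Q}_b$ have already been formed as in Section 2.2, and the function name `r_cpca` and its defaults are illustrative assumptions.

```python
import numpy as np

def r_cpca(Q_list, m=2, eps=1e-10, v0=None, max_iter=1000):
    """R-CPCA iteration (12) for optimization problem (6).

    Q_list : list of arrays Q_b = P_b' P_{B+1} (all with the same number of columns).
    Returns the unit-norm superblock weight vector v_{B+1} (in the transformed
    metric of Section 2.2) and the criterion value phi(v_{B+1}) = sum_b ||Q_b v||^m.
    """
    phi = lambda v: sum(np.linalg.norm(Q @ v) ** m for Q in Q_list)
    if v0 is None:
        # Comment 1 below: eigenvector of sum_b Q_b' Q_b for the largest eigenvalue
        S = sum(Q.T @ Q for Q in Q_list)
        v = np.linalg.eigh(S)[1][:, -1]
    else:
        v = v0 / np.linalg.norm(v0)
    f_old = phi(v)
    for _ in range(max_iter):
        # recurrence (12): normalized gradient direction of phi
        grad = sum(np.linalg.norm(Q @ v) ** (m - 2) * (Q.T @ (Q @ v)) for Q in Q_list)
        v = grad / np.linalg.norm(grad)
        f_new = phi(v)
        if f_new - f_old <= eps:   # stopping rule of Algorithm 1
            break
        f_old = f_new
    return v, phi(v)
```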


In the field of multiblock component methods, the convergence we consider is "monotone convergence". A recurrence equation $\mathbf{v}^{s+1} = f(\mathbf{v}^{s})$ is used to generate the sequence $\{\mathbf{v}^{s}\}$. Monotone convergence of the algorithm with respect to a criterion $\varphi$ means that the bounded sequence $\{\varphi(\mathbf{v}^{s})\}$ is monotonically increasing and therefore convergent toward some limit $\varphi(\mathbf{v}^{*})$. In practice, it will very often be noted that the limit point $\mathbf{v}^{*}$ is a fixed point of the recurrence equation: $\mathbf{v}^{*} = f(\mathbf{v}^{*})$. But this property is usually not proven. In this paper we have been able to go further for a very large class of methods. Proposition 1 below establishes three important points:

a) The algorithm converges at the level of the $\varphi$ function: $\varphi(\mathbf{v}^{s}) \to \varphi(\mathbf{v}^{*})$.

b) At convergence, the superblock weight-vector $\mathbf{v}^{*}$ is a fixed point of the recurrence equation.
c) If the number of fixed points of the recurrence equation is finite (mild condition), then $\mathbf{v}^{s}$ converges to a fixed point $\mathbf{v}^{*}$ of $f$.

We now consider the sequence $\{\mathbf{v}_{B+1}^{s}\}$ generated by the recurrence equation (12). We suppose that this sequence involves infinitely many distinct terms; otherwise the last given point satisfies $\mathbf{v}_{B+1}^{s} = f(\mathbf{v}_{B+1}^{s})$, that is, $\mathbf{v}_{B+1}^{s}$ is a fixed point of $f$. Applying Proposition 2 of Appendix 1 yields the following proposition:

Proposition 1

s Let v B1 be a sequence of unit norm superblock weight-vectors generated by the recurrence equation (12). The following properties hold:

a) The bounded sequence $\varphi(\mathbf{v}_{B+1}^{s}) = \sum_{b=1}^{B}\|\mathbf{Q}_b\mathbf{v}_{B+1}^{s}\|^{m}$ is monotonically increasing and therefore converging for any value $m \geq 1$.

b) All accumulation points of the sequence $\{\mathbf{v}_{B+1}^{s}\}$ are fixed points of $f$.

s * * c) ()vvBB11 (), where v B1 is a fixed point of f . 2(vvss1 ) ( ) ss1 2  BB11 d) vvBB11 0 . m()vB1

s ss1 e) The sequence v B1 is asymptotically regular: limvvBB11  0 .   s

f) Either $\{\mathbf{v}_{B+1}^{s}\}$ converges or the accumulation points of $\{\mathbf{v}_{B+1}^{s}\}$ form a continuum.


Comments

1) A good starting point $\mathbf{v}_{B+1}^{0}$ is the eigenvector of $\sum_{b=1}^{B}\mathbf{Q}_b^t\mathbf{Q}_b$ corresponding to the largest eigenvalue.

2) From the Cauchy–Schwarz inequality, $m\,\varphi(\mathbf{v}_{B+1}^{s}) = \mathbf{v}_{B+1}^{s\,t}\nabla\varphi(\mathbf{v}_{B+1}^{s}) \leq \|\mathbf{v}_{B+1}^{s}\|\,\|\nabla\varphi(\mathbf{v}_{B+1}^{s})\| = \|\nabla\varphi(\mathbf{v}_{B+1}^{s})\|$. As $\varphi(\mathbf{v}_{B+1}^{0})$ is strictly positive by construction, this implies $0 < m\,\varphi(\mathbf{v}_{B+1}^{0}) \leq \inf_s\|\nabla\varphi(\mathbf{v}_{B+1}^{s})\|$. Then, Point (d) of Proposition 1 is deduced from Proposition 2 of Appendix 1 by noting that

$$\|\mathbf{v}_{B+1}^{s+1} - \mathbf{v}_{B+1}^{s}\|^2 \leq \frac{2\,[\varphi(\mathbf{v}_{B+1}^{s+1}) - \varphi(\mathbf{v}_{B+1}^{s})]}{\delta} \leq \frac{2\,[\varphi(\mathbf{v}_{B+1}^{s+1}) - \varphi(\mathbf{v}_{B+1}^{s})]}{m\,\varphi(\mathbf{v}_{B+1}^{0})} \to 0. \tag{13}$$

At convergence of Algorithm 1, $\|\mathbf{v}_{B+1}^{s+1} - \mathbf{v}_{B+1}^{s}\| \leq \sqrt{2\varepsilon\big/[m\,\varphi(\mathbf{v}_{B+1}^{0})]}$. When this inequality is satisfied, the solution $\mathbf{v}_{B+1}^{s}$ is considered as a fixed point of the recurrence equation (12).

3) When $m = 2$, Algorithm 1 is similar to the power iteration method and converges to the globally optimal solution of (6), i.e. the eigenvector of $\sum_{b=1}^{B}\mathbf{Q}_b^t\mathbf{Q}_b$ corresponding to the largest eigenvalue. In that case, many known multiblock component methods are recovered (see Section 3).

4) At convergence, equation (12) becomes

BBmm22 (14) v***** Qv QQvtt/ Qv QQv . B11111bb11 bB b bB bB b bB

Therefore this fixed point $\mathbf{v}_{B+1}^{*}$ of $f$ is a solution of the equation

B m2 (15) Qv**** QQvt () v v b1 bB1111 b bB B B and consequently a solution of the stationary equation (10). 5) If we suppose that the number of fixed points of f is finite, then we can deduce from s Point (f) of the proposition that the sequence v B1 is converging. For m  2 , this condition B corresponds to the assumption that all eigenvalues of QQt are different.  b1 bb

2.3 A monotonically convergent algorithm for optimization problem (2)

The R-CPCA algorithm for optimization problem (2) is immediately obtained by expressing Algorithm 1 in terms of the original data: $\mathbf{v}_{B+1}^{s}$ is replaced by $\mathbf{M}_{B+1}^{1/2}\mathbf{w}_{B+1}^{s}$, $\mathbf{Q}_b = \mathbf{P}_b^t\mathbf{P}_{B+1}$ by $\mathbf{M}_b^{-1/2}\mathbf{X}_b^t\mathbf{X}_{B+1}\mathbf{M}_{B+1}^{-1/2}$ and $\mathbf{Q}_b^t\mathbf{Q}_b$ by $\mathbf{M}_{B+1}^{-1/2}\mathbf{X}_{B+1}^t\mathbf{X}_b\mathbf{M}_b^{-1}\mathbf{X}_b^t\mathbf{X}_{B+1}\mathbf{M}_{B+1}^{-1/2}$. By selecting a specific value of $m$ and metrics $\mathbf{M}_1,\dots,\mathbf{M}_B,\mathbf{M}_{B+1}$ related to the block and superblock weight-vectors, this single algorithm gives a new and unified framework for many multiblock component methods. We deduce from (12) recurrence equations on the superblock weight-vector $\mathbf{w}_{B+1}^{s}$ and on the superblock component $\mathbf{y}_{B+1}^{s} = \mathbf{X}_{B+1}\mathbf{w}_{B+1}^{s}$. Asymptotic regularity of the sequence $\{\mathbf{v}_{B+1}^{s}\}$ also implies asymptotic regularity of the sequences $\{\mathbf{w}_{B+1}^{s}\}$ and $\{\mathbf{y}_{B+1}^{s}\}$.

At convergence of the algorithm, the superblock weight-vector $\mathbf{w}_{B+1}$ and the superblock component $\mathbf{y}_{B+1} = \mathbf{X}_{B+1}\mathbf{w}_{B+1}$ are fixed points of their respective recurrence equations and therefore solutions of the following stationary equations:

B 11/21ttm2  t MXBB11 M b XXw bBB  11 XMXXw bbbBB  11 (16) w  b1 , B1 B 1/2tt 1/2m2  1 t MXBB1  1 MXXw bbBB  11 XMXXw bbbBB  11 b1

B 11/21ttm2  t XB111 M B X B M b Xy bB  1 XM b b Xy bB  1 (17) y  b1 . B1 B 1/2tt 1/2m2  1 t MXBB11 MXy bbB  1 XMXy bbbB  1 b1

Using (4), the block weight-vector $\mathbf{w}_b$ is related to the superblock weight-vector and component by the equation

$$\mathbf{w}_b = \mathbf{M}_b^{-1}\mathbf{X}_b^t\mathbf{X}_{B+1}\mathbf{w}_{B+1}\big/\|\mathbf{M}_b^{-1/2}\mathbf{X}_b^t\mathbf{X}_{B+1}\mathbf{w}_{B+1}\| = \mathbf{M}_b^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{M}_b^{-1/2}\mathbf{X}_b^t\mathbf{y}_{B+1}\|. \tag{18}$$

Then, the block component $\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b$ is related to the superblock component by the equation

$$\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b = \mathbf{X}_b\mathbf{M}_b^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{M}_b^{-1/2}\mathbf{X}_b^t\mathbf{y}_{B+1}\|. \tag{19}$$

The following relation is also useful:

$$\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1}) = \frac{1}{n}\,\mathbf{y}_{B+1}^t\mathbf{y}_b = \frac{1}{n}\,\|\mathbf{M}_b^{-1/2}\mathbf{X}_b^t\mathbf{y}_{B+1}\|. \tag{20}$$

Conversely, using (19) and (20) in equation (17), the superblock component is related to the block components:

$$\mathbf{y}_{B+1} = \frac{\mathbf{X}_{B+1}\mathbf{M}_{B+1}^{-1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^{m-1}\,\mathbf{y}_b}{\left\|\mathbf{M}_{B+1}^{-1/2}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^{m-1}\,\mathbf{y}_b\right\|}. \tag{21}$$

Note that the only case where the superblock component is proportional to the sum of the block components occurs when $m = 1$ and Mode B is selected for the superblock.
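Once a superblock component is available (for instance from Algorithm 1), equations (18) and (19) give the block weights and block components directly. A minimal sketch of this step is given below, assuming centered NumPy arrays; the function name is ours.

```python
import numpy as np

def block_weight_and_component(Xb, Mb, y_super):
    """Equations (18)-(19): block weight w_b and component y_b = X_b w_b
    obtained from the superblock component y_{B+1}, for a given metric M_b."""
    Mb_inv = np.linalg.inv(Mb)
    z = Xb.T @ y_super                      # X_b' y_{B+1}
    w = Mb_inv @ z
    # norm of M_b^{-1/2} X_b' y_{B+1}, computed as sqrt(z' M_b^{-1} z)
    w = w / np.sqrt(z @ Mb_inv @ z)
    return w, Xb @ w
```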


At convergence, the contribution of the block component $\mathbf{y}_b$ to the superblock component $\mathbf{y}_{B+1}$ may be defined by using the following indicator:

$$c_b = \operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^m \Big/ \sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^m. \tag{22}$$

We can conclude that the contributions of the blocks most representative of the superblock increase with $m$: when $m \to \infty$, $c_b \to 1$ for the block contributing most to the solution and $c_b \to 0$ for the others.
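The indicator (22) is a one-liner to compute. The following sketch assumes centered block components stored as NumPy vectors; the function name is ours.

```python
import numpy as np

def block_contributions(block_components, y_super, m):
    """Contribution indicator (22): c_b = cov(y_b, y_{B+1})^m / sum_c cov(y_c, y_{B+1})^m."""
    n = len(y_super)
    covs = np.array([(yb @ y_super) / n for yb in block_components])  # centered variables
    c = covs ** m
    return c / c.sum()
```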

A regularized consensus PCA method is defined by the choices of $m$ and of the metrics $\mathbf{M}_1,\dots,\mathbf{M}_B,\mathbf{M}_{B+1}$. Stationary equation (17) is the signature of a regularized consensus PCA method, as we can deduce from this equation the related optimization problem. Equation (17) is a generalization of the equation (17) given in Smilde, Westerhuis and de Jong (2003). Let us give an illustration. We consider the hierarchical PCA method described in Smilde, Westerhuis, and de Jong (2003). The stationary equation corresponding to this method, written in our notation, is $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$, where $\propto$ means that the left term is proportional to the right term. A regularized consensus PCA method sharing this stationary equation is obtained for $m = 4$, $\mathbf{M}_1 = \dots = \mathbf{M}_B = \mathbf{I}$ and $\mathbf{M}_{B+1} = \frac{1}{n}\mathbf{X}_{B+1}^t\mathbf{X}_{B+1}$. Therefore, we may conclude that this hierarchical PCA method is similar to the regularized consensus PCA method corresponding to the following optimization problem:

B 4 Maximize cov(Xwbb , X B11 w B ) w ,...,w  (23) 11B b1 t s.t. wwbb1, bB1,..., and var( X B11 w B )  1.

This property of hierarchical PCA has been previously established in Hanafi, Kohler and Qannari (2010).

2.4 Simplifications occurring when Mode A or B are selected

When Mode A is selected for all blocks

In this special case, using (19) and (20) with $\mathbf{M}_b = \mathbf{I}$, we obtain $\mathbf{y}_b\mathbf{y}_b^t\mathbf{y}_{B+1} = \mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$ and therefore, noting $\mathbf{Y} = [\mathbf{y}_1,\dots,\mathbf{y}_B]$, we deduce:

$$\mathbf{Y}\mathbf{Y}^t\mathbf{y}_{B+1} = \sum_{b=1}^{B}\mathbf{y}_b\mathbf{y}_b^t\mathbf{y}_{B+1} = \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\mathbf{y}_{B+1}. \tag{24}$$

Consequently, if the superblock component is the first principal component of the superblock $\mathbf{X}_{B+1}$, then it is also the first principal component of the block components $\mathbf{y}_1,\dots,\mathbf{y}_B$. Such a situation is encountered in consensus PCA and similar methods (see Section 3 below and Methods 5 and 6 in Table 1).


When Mode B is selected for the superblock

When Mode B is selected for the superblock $\mathbf{X}_{B+1}$, equations (17) and (21) can be simplified. As the superblock $\mathbf{X}_{B+1}$ is the concatenation of all blocks, for $\mathbf{M}_{B+1} = \frac{1}{n}\mathbf{X}_{B+1}^t\mathbf{X}_{B+1}$, equations (17) and (21) become respectively

B 1/2ttm2 1  MXybbB11 XMXy bbbB b1 (25) y B1  B 1/2 1/2ttm2 1 var  MXybbB11 XMXy bbbB b1 and

B m1 cov(yybB ,1 ) y b b1 (26) y B1  . B 1/2 m1 var  cov(yybB ,1 ) y b b1

Note that when Mode B is selected for the superblock, this superblock no longer appears in the stationary equation (25). In fact, using a development similar to the one used for (2), it can be shown that (25) is also the stationary equation of (1). Optimization problem (1) is apparently less constrained than (2) as $\mathbf{y}_{B+1}$ is not constrained to belong to the superblock space. But, due to (25), this is the case at the optimum. Therefore, we may conclude that, when Mode B is selected for the superblock, optimization problems (1) and (2) are equivalent. Note that equation (26) also remains valid for problem (1).

Other properties are worth mentioning.

For m = 1

In this situation, the superblock component, solution of (25), is proportional to the sum of the block components. In fact, for $m = 1$ equation (26) simplifies to

$$\mathbf{y}_{B+1} = \sum_{b=1}^{B}\mathbf{y}_b \Big/ \operatorname{var}\!\Big(\sum_{b=1}^{B}\mathbf{y}_b\Big)^{1/2}. \tag{27}$$

For m  1 and Mode B selected for all blocks

When m  1 and Mode B is selected for all blocks, (1) becomes

$$\underset{\mathbf{w}_1,\dots,\mathbf{w}_B,\,\mathbf{y}_{B+1}}{\text{Maximize}}\ \sum_{b=1}^{B}\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b, \mathbf{y}_{B+1}) \quad \text{s.t.} \quad \operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1,\ b=1,\dots,B, \ \text{and} \ \operatorname{var}(\mathbf{y}_{B+1}) = 1. \tag{28}$$


As a superblock component that is a solution of (28) satisfies (27), the criterion in (28) can be expressed in terms of block components only. Using (27) and the normalization constraints, we obtain:

$$\sum_{b=1}^{B}\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b,\mathbf{y}_{B+1}) = \sum_{b=1}^{B}\operatorname{cov}\!\left(\mathbf{X}_b\mathbf{w}_b,\ \frac{\sum_{c=1}^{B}\mathbf{X}_c\mathbf{w}_c}{\operatorname{var}\!\big(\sum_{c=1}^{B}\mathbf{X}_c\mathbf{w}_c\big)^{1/2}}\right) = \operatorname{var}\!\Big(\sum_{b=1}^{B}\mathbf{X}_b\mathbf{w}_b\Big)^{1/2} = \Big(\sum_{b,c=1}^{B}\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_c\mathbf{w}_c)\Big)^{1/2}. \tag{29}$$

We conclude that (28) is equivalent to the following optimization problem:

$$\underset{\mathbf{w}_1,\dots,\mathbf{w}_B}{\text{Maximize}}\ \sum_{b,c=1}^{B}\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_c\mathbf{w}_c) \quad \text{s.t.} \quad \operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1,\ b=1,\dots,B. \tag{30}$$

This problem (30) corresponds exactly to the SUMCOR method proposed by Horst (1961a,b, 1965).

For m = 2

In this special case, the superblock component is the standardized first principal component of the block components $\mathbf{y}_1,\dots,\mathbf{y}_B$. Indeed, when Mode B is selected for the superblock, using (20) in equation (26), we obtain:

t (31) yYYyB11 B .

When the superblock X B1 is not full rank

Equation (25) has been established under the assumption that the superblock $\mathbf{X}_{B+1}$ is full rank. However, as it is also the stationary equation of optimization problem (1), it remains valid when $\mathbf{X}_{B+1}$ is not full rank. Let $\mathbf{y}_{B+1}$ be a solution of (25). We deduce from (25) that some superblock weight-vector $\mathbf{w}_{B+1}$ exists such that $\mathbf{y}_{B+1} = \mathbf{X}_{B+1}\mathbf{w}_{B+1}$. But this $\mathbf{w}_{B+1}$ is not unique because $\mathbf{X}_{B+1}$ is not full rank.

Conclusion

We conclude that, when Mode B is selected for the superblock, it is preferable to interpret the superblock component $\mathbf{y}_{B+1}$ by considering $\operatorname{cor}(\mathbf{y}_b,\mathbf{y}_{B+1})$ and $\operatorname{cor}(\mathbf{x}_{B+1,j},\mathbf{y}_{B+1})$, where $\mathbf{x}_{B+1,j}$ is a column of $\mathbf{X}_{B+1}$. In fact, the superblock weight-vector $\mathbf{w}_{B+1}$ is difficult to interpret because it is sensitive to possible multicollinearity when $\mathbf{X}_{B+1}$ is full rank and is not uniquely determined when $\mathbf{X}_{B+1}$ is not full rank.


When Mode B is selected for a block

When Mode B is selected for a block b, the corresponding term appearing in equation (17) becomes:

1/2m2 1 11tt  tt XXbb Xy bB11 X b  XX bb Xy bB (32) nn  m 11m2 2 tt tt  n XXXbbb X bBy 11XXX bbb X bBy  .

Using a generalized inverse, this term is still meaningful for a block that is not of full rank, as the projection operator $\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-}\mathbf{X}_b^t$ is invariant to the choice of the generalized inverse $(\mathbf{X}_b^t\mathbf{X}_b)^{-}$. This result shows how regularized consensus PCA can be extended to situations where Mode B is selected for a block that is not of full rank: first, replace $\mathbf{X}_b\mathbf{w}_b$ by $\mathbf{y}_b$ in the criterion used in problem (2); secondly, impose that the block component $\mathbf{y}_b$ be standardized and belong to the $\mathbf{X}_b$ space; and thirdly, compute the superblock component $\mathbf{y}_{B+1}$ by solving equation (17) with $(\mathbf{X}_b^t\mathbf{X}_b)^{-1}$ replaced by the Moore–Penrose inverse $(\mathbf{X}_b^t\mathbf{X}_b)^{+}$. The block component $\mathbf{y}_b$ is then deduced from (19):

tt XXXXybbbbB 1 (33) yb  .  1/2 var XXXXytt  bbbbB 1 

3. Regularized Consensus PCA: a new framework for multiblock component methods

In this section, we discuss several special cases of R-CPCA with Modes A or B selected for the blocks and the superblock. They are summarized in Table 1. The superblock stationary equation of an R-CPCA method allows obtaining the superblock component by using Algorithm 1. Then, each block component is obtained by regression of the superblock component on the block: OLS regression if Mode B is selected for the block, one-component PLS regression if Mode A is selected.


Table 1: Special cases of Regularized Consensus PCA methods. For each method, we give the modes selected for the blocks and the superblock, the optimization problem, the stationary equation for the superblock component, the block components, and the superblock component expressed in terms of the block components.

m = 1

Method 1 (Mode A for the blocks, Mode A for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ over $\mathbf{w}_1,\dots,\mathbf{w}_{B+1}$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B+1$.
Stationary equation for the superblock component: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|$.
Block component: $\mathbf{y}_b = \mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|$.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{y}_b$.

Method 2 (Mode A for the blocks, Mode B for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_b\mathbf{w}_b)^{1/2}$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B$, and $\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1}) = 1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|$.
Block component: same as Method 1.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{y}_b$.

Method 3 (Mode B for the blocks, Mode A for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1})^{1/2}$ s.t. $\operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1$, $b = 1,\dots,B$, and $\|\mathbf{w}_{B+1}\| = 1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\|$.
Block component: $\mathbf{y}_b = \mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\operatorname{var}\!\big(\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big)^{1/2}$.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{y}_b$.

Method 4 (Mode B for the blocks, Mode B for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ s.t. $\operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1$, $b = 1,\dots,B+1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}\|$.
Block component: same as Method 3.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{y}_b$.

m = 2

Method 5 (Mode A for the blocks, Mode A for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cov}^2(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B+1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\mathbf{y}_{B+1}$.
Block component: same as Method 1.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})\,\mathbf{y}_b$.

Method 6 (Mode A for the blocks, Mode B for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}^2(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_b\mathbf{w}_b)$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B$, and $\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1}) = 1$.
Stationary equation, block component and superblock component: same as Method 5.

Method 7 (Mode B for the blocks, Mode A for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}^2(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ s.t. $\operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1$, $b = 1,\dots,B$, and $\|\mathbf{w}_{B+1}\| = 1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}$.
Block component: same as Method 3.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})\,\mathbf{y}_b$.

Method 8 (Mode B for the blocks, Mode B for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}^2(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ s.t. $\operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1$, $b = 1,\dots,B+1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}$.
Block component: same as Method 3.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\operatorname{cor}(\mathbf{y}_b,\mathbf{y}_{B+1})\,\mathbf{y}_b$.

m = 4

Method 9 (Mode A for the blocks, Mode A for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cov}^4(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B+1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$.
Block component: same as Method 1.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^3\,\mathbf{y}_b$.

Method 10 (Mode A for the blocks, Mode B for the superblock).
Optimization problem: Maximize $\sum_{b=1}^{B}\operatorname{cor}^4(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}^2(\mathbf{X}_b\mathbf{w}_b)$ s.t. $\|\mathbf{w}_b\| = 1$, $b = 1,\dots,B$, and $\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1}) = 1$.
Stationary equation: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$.
Block component: same as Method 1.
Superblock component in terms of block components: $\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\operatorname{cov}(\mathbf{y}_b,\mathbf{y}_{B+1})^3\,\mathbf{y}_b$.

Discussion

When Mode A is used for all blocks and for the superblock, optimization problem (2) becomes

$$\underset{\mathbf{w}_1,\dots,\mathbf{w}_{B+1}}{\text{Maximize}}\ \sum_{b=1}^{B}\big(\operatorname{cov}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\big)^m \quad \text{s.t.} \quad \|\mathbf{w}_b\| = 1,\ b=1,\dots,B+1. \tag{34}$$

When optimization problem (34) is considered with m = 2, the following methods are recovered (only the normalization used for the block and superblock components varies from one method to another): the covariance criterion (Carroll, 1968b), Split principal component (Lohmöller, 1989; more precisely, see p. 132, Table 3.12, row 6, which corresponds to applying the PLS path modeling algorithm to the consensus PCA model with the path weighting scheme and Mode A for all blocks and the superblock), Multiple Factor Analysis (Escofier, Pagès, 1994), Multiple Co-Inertia Analysis (Chessel, Hanafi, 1996), Consensus PCA (Westerhuis, Kourti, MacGregor, 1998), CPCA-W (Smilde, Westerhuis, de Jong, 2003), and one-dimensional MAXBET A (Hanafi and Kiers, 2006). In these methods, the superblock component $\mathbf{y}_{B+1} = \mathbf{X}_{B+1}\mathbf{w}_{B+1}$ is the first principal component of the superblock and at the same time that of $\mathbf{Y} = [\mathbf{y}_1,\dots,\mathbf{y}_B]$.

When Mode B is used for all blocks and for the superblock, optimization problem (2) becomes

B m Maximize cor(Xwbb , X B11 w B ) w ,...,w  (35) 11B b1

s.t. var(Xwbb ) 1, bB 1,...,  1.

When optimization problem (35) is considered with m = 2, the MAXVAR method (Horst, 1961b, 1965) and the generalized canonical correlation analysis (Carroll, 1968a) are found again. For m = 1, as discussed above, Horst's SUMCOR method is also obtained.

When Mode A is used for all blocks and Mode B for the superblock, optimization problem (2) becomes

$$\underset{\mathbf{w}_1,\dots,\mathbf{w}_{B+1}}{\text{Maximize}}\ \sum_{b=1}^{B}\big[\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_b\mathbf{w}_b)^{1/2}\big]^m \quad \text{s.t.} \quad \|\mathbf{w}_b\| = 1,\ b=1,\dots,B, \ \text{and} \ \operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1}) = 1. \tag{36}$$

This optimization problem is considered when the user wants to favor the blocks compared to the superblock. The objective of (36) is to find block components simultaneously well explaining their own block and well correlated to the superblock component. We recall that redundancy analysis of $\mathbf{X}_b$ with respect to $\mathbf{X}_{B+1}$ corresponds to the following optimization problem


$$\underset{\mathbf{w}_b,\,\mathbf{w}_{B+1}}{\text{Maximize}}\ \operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_b\mathbf{w}_b)^{1/2} \quad \text{s.t.} \quad \|\mathbf{w}_b\| = 1,\ \operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1}) = 1. \tag{37}$$

Therefore, optimization problem (36) is a generalized redundancy analysis of $\mathbf{X}_1,\dots,\mathbf{X}_B$ with respect to $\mathbf{X}_{B+1}$. For m = 2, optimization problems (34) and (36) have the same solution up to the normalization of the superblock component. For m = 4, (36) yields hierarchical PCA (Smilde, Westerhuis, de Jong, 2003).

When mode B is used for all blocks and Mode A for the superblock, optimization problem (2) becomes

$$\underset{\mathbf{w}_1,\dots,\mathbf{w}_{B+1}}{\text{Maximize}}\ \sum_{b=1}^{B}\big[\operatorname{cor}(\mathbf{X}_b\mathbf{w}_b,\mathbf{X}_{B+1}\mathbf{w}_{B+1})\operatorname{var}(\mathbf{X}_{B+1}\mathbf{w}_{B+1})^{1/2}\big]^m \quad \text{s.t.} \quad \operatorname{var}(\mathbf{X}_b\mathbf{w}_b) = 1,\ b=1,\dots,B, \ \text{and} \ \mathbf{w}_{B+1}^t\mathbf{w}_{B+1} = 1. \tag{38}$$

This optimization problem is considered when the user wants to favor the superblock compared to the blocks. The objective of (38) is to find a superblock component simultaneously well explaining the whole set of blocks and well correlated to the various block components. Optimization problem (38) is a generalized redundancy analysis of $\mathbf{X}_{B+1}$ with respect to $\mathbf{X}_1,\dots,\mathbf{X}_B$.

We give in Table 2 a guideline to select the modes for blocks and superblock according to the objective of the user.

Table 2: Guideline for selecting Modes A or B for blocks and superblock

Mode A for the blocks, Mode A for the superblock.
Objective: compromise between block and superblock components well explaining their own blocks and as correlated as possible.
Generalization of: Tucker's inter-battery factor analysis.

Mode A for the blocks, Mode B for the superblock.
Objective: compromise between block components well explaining their own blocks and as correlated as possible to the superblock component.
Generalization of: redundancy analysis of a block with respect to the superblock.

Mode B for the blocks, Mode A for the superblock.
Objective: compromise between a superblock component well explaining the superblock and as correlated as possible to the block components.
Generalization of: redundancy analysis of the superblock with respect to a block.

Mode B for the blocks, Mode B for the superblock.
Objective: block and superblock components as correlated as possible.
Generalization of: canonical correlation analysis.


4. Using deflation for searching higher order components

The deflation procedure consists in replacing a data matrix $\mathbf{X}$ by a matrix $\mathbf{E}$, where $\mathbf{E}$ is the residual matrix in the regression of $\mathbf{X}$ on some column vector $\mathbf{q}$: $\mathbf{E} = \mathbf{X} - \mathbf{q}\mathbf{q}^t\mathbf{X}/\mathbf{q}^t\mathbf{q}$. The resulting property is of course that any linear combination of the columns of $\mathbf{E}$ is orthogonal to $\mathbf{q}$. Furthermore, if $\mathbf{q}$ belongs to the $\mathbf{X}$ space ($\mathbf{q} = \mathbf{X}\mathbf{w}$), then the set of linear combinations of the columns of $\mathbf{X}$ orthogonal to $\mathbf{q}$ is equal to the set of linear combinations of the columns of $\mathbf{E}$. Deflation is the easiest way to add orthogonality constraints in many optimization problems encountered in multivariate analysis. In the usual consensus and hierarchical PCA methods, only a global component is introduced; the superblock is not explicitly introduced. Therefore deflation can only be considered on the blocks. Various deflation strategies (what to deflate and how) have been proposed:

a) Deflation on previous global components (Wold, Hellberg, Lundstedt, Sjöström, Wold, 1987). In this case, global components are uncorrelated, but block components do not belong to their block space and are correlated. b) Deflation on previous block components (Wangen and Kowalski, 1989). In this case, block components belong to their block space and are uncorrelated, but global components are correlated. c) Deflation on previous block loadings (Hassani, Hanafi, Qannari and Kohler, 2013). In this case, global components are uncorrelated; block components belong to their block space, but are correlated.

As a superblock is explicitly introduced in regularized consensus PCA, the deflation strategy is more flexible than in consensus PCA and related methods. A fourth strategy can be considered: blocks and superblock are deflated on their own previous block and superblock components, as sketched below. Optimization problem (2) can then be applied to the deflated blocks and the deflated superblock. However, the deflated superblock is not equal to the concatenation of the deflated blocks and some properties of R-CPCA are lost when Mode B is used for the superblock. This strategy offers an attractive solution to the deflation problem encountered in consensus PCA: block components now belong to their block space and are uncorrelated and, at the same time, superblock components are uncorrelated.
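The fourth strategy amounts to applying the residual operator $\mathbf{E} = \mathbf{X} - \mathbf{q}\mathbf{q}^t\mathbf{X}/\mathbf{q}^t\mathbf{q}$ separately to each block and to the superblock. A minimal NumPy sketch (variable and function names are ours):

```python
import numpy as np

def deflate(X, q):
    """Residual of the regression of the columns of X on the component q:
    E = X - q q'X / q'q, so every linear combination of E is orthogonal to q."""
    return X - np.outer(q, q @ X) / (q @ q)

# Fourth strategy of Section 4: each block is deflated on its own block component,
# the superblock on the superblock component (X_blocks, block_components, X_super
# and y_super are assumed to come from a previous R-CPCA step).
# X_blocks = [deflate(Xb, yb) for Xb, yb in zip(X_blocks, block_components)]
# X_super  = deflate(X_super, y_super)
```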

5. Connection with regularized generalized canonical correlation analysis and PLS path modeling

Many multiblock component methods are special cases of RGCCA and PLS path modeling. In these methods, the superblock component $\mathbf{y}_{B+1}$ has various names, such as auxiliary variable, global component or superscore. These methods boil down to solving a stationary equation on $\mathbf{y}_{B+1}$ (see Table 3).


Table 3: Stationary equations for the global component for various multiblock component methods. For each method, we give the equivalent formulations and references, the framework it is a special case of, and the stationary equation for the global component.

Method 1 (special case of RGCCA and R-CPCA). Covariance criterion, unweighted case (Carroll, 1968b); PLS-PM on the CPCA model with Mode A for all blocks and the superblock and path weighting scheme (Lohmöller, 1989); CPCA (Westerhuis, Kourti, and MacGregor, 1998).
$\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\mathbf{y}_{B+1}$

Method 2 (special case of RGCCA and R-CPCA). Generalized canonical correlation analysis (Carroll, 1968a); PLS-PM on the CPCA model with Mode B for all blocks and the superblock and factorial scheme (Lohmöller, 1989).
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1}$

Method 3 (special case of RGCCA and R-CPCA). "Mixed" correlation and covariance criterion, unweighted case (Carroll, 1968b), with the first $B_1$ blocks entering through correlation and the remaining blocks through covariance.
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B_1}\mathbf{X}_b(\mathbf{X}_b^t\mathbf{X}_b)^{-1}\mathbf{X}_b^t\mathbf{y}_{B+1} + \sum_{b=B_1+1}^{B}\tfrac{1}{n}\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$

Method 4 (special case of R-CPCA). Hierarchical PCA (Smilde, Westerhuis, de Jong, 2003).
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$

Method 5 (special case of PLS-PM). PLS-PM on the CPCA model with Mode A for all blocks, Mode B for the superblock and centroid scheme (Wold, 1982).
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\|$

Method 6 (special case of PLS-PM). PLS-PM on the CPCA model with Mode A for all blocks, Mode B for the superblock and factorial scheme (Lohmöller, 1989); Hierarchical PCA-W (Smilde, Westerhuis, de Jong, 2003).
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\dfrac{\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2}{\|\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2}\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$

Method 7 (special case of PLS-PM). PLS-PM on the CPCA model with Mode A for the blocks and the superblock and centroid scheme (Wold, 1982).
$\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\big/\|\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\|$

Method 8 (special case of PLS-PM). PLS-PM on the CPCA model with Mode A for the blocks and the superblock and factorial scheme (Lohmöller, 1989).
$\mathbf{y}_{B+1} \propto \mathbf{X}_{B+1}\mathbf{X}_{B+1}^t\sum_{b=1}^{B}\dfrac{\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2}{\|\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2}\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$

Method 9. Consensus PCA (Wold, Hellberg, Lundstedt, Sjöström, Wold, 1987).
$\mathbf{y}_{B+1} \propto \sum_{b=1}^{B}\|\mathbf{X}_b^t\mathbf{y}_{B+1}\|^2\,\mathbf{X}_b\mathbf{X}_b^t\mathbf{y}_{B+1}$


Note that the stationary equations for the global component in Methods 1 to 4 are special cases of equation (25). For Method 1, $m = 2$ and all $\mathbf{M}_b = \mathbf{I}$; for Method 2, $m = 2$ and all $\mathbf{M}_b = \frac{1}{n}\mathbf{X}_b^t\mathbf{X}_b$; for Method 3, $m = 2$, $\mathbf{M}_b = \frac{1}{n}\mathbf{X}_b^t\mathbf{X}_b$ for $b = 1,\dots,B_1$ and $\mathbf{M}_b = \mathbf{I}$ for $b = B_1+1,\dots,B$; for Method 4, $m = 4$ and $\mathbf{M}_b = \mathbf{I}$ for all blocks. Therefore, Methods 1 to 4 are special cases of R-CPCA and consequently correspond to an optimization problem (Methods 1 to 3 are also special cases of RGCCA, which contains R-CPCA for m = 1 or 2). To our knowledge, the other methods in Table 3 do not correspond to any known optimization problems. Methods 5 to 8 are special cases of PLS path modeling. It is worth noting that Hierarchical PCA-W is a special case of PLS-PM applied to the CPCA model: when the configuration "Mode A for all blocks, Mode B for the superblock and factorial scheme" is selected, both methods yield the same block and superblock components. Note that RGCCA and PLS path modeling are available in the PLS-PM module of the XLSTAT software (Addinsoft, 2015). Method 9 corresponds to the first proposed consensus PCA method, but has convergence problems. At that time, Methods 5 and 7 were available and were probably a better option as they have no convergence problem. In fact, it is rather surprising that the PLS approach of Herman Wold has been so ignored in the papers related to consensus and hierarchical PCA. One objective of this paper was to give PLS-PM its deserved place among multiblock component methods.

6. Conclusion

Many methods exist in the field of multiblock component methods. They are based on the construction of block components $\mathbf{y}_b = \mathbf{X}_b\mathbf{w}_b$. In this paper, we have mainly considered methods optimizing some criterion. Some methods are based on a correlation criterion and generalize canonical correlation analysis (Hotelling, 1936). Other methods are based on a covariance criterion and extend inter-battery factor analysis (Tucker, 1958). Methods generalizing redundancy analysis (Van den Wollenberg, 1977) to more than two blocks have also been proposed. This distinction between these three sources leads to the following classification of multiblock component methods.

1) Correlation-based methods

Main contributions for generalized canonical correlation analysis are found in Horst (1961a,b, 1965), Carroll (1968a), and Kettenring (1971).

2) Covariance-based methods

Main contributions for extending Tucker’s method to more than two blocks come from Carroll (1968b), Van de Geer (1984), Ten Berge (1988), Escofier and Pagès (1994), Chessel and Hanafi (1996), and Hanafi and Kiers (2006).

3) Mixed correlation- and covariance-based methods

Carroll (1968b) proposed the “mixed” correlation and covariance criterion. Generalizations of redundancy analysis to the multiblock situation can be found in Van de Geer (1984) and in Tenenhaus and Tenenhaus (2011).


4) Methods based on PLS path modeling

Wold (1982, 1985) and Lohmöller (1989) applied the PLS path modeling (PLS-PM) approach to the “consensus PCA model”. Hanafi (2007) showed the connections between this approach and generalized canonical correlation analysis.

5) Methods based on PLS regression (consensus and hierarchical PCA)

The consensus PCA algorithm proposed by Westerhuis, Kourti, and MacGregor (1998) is based on an iterative use of one-component PLS regressions. The other consensus and hierarchical PCA algorithms are quite similar; they differ in using normalizations other than the original one used in PLS regression. They are discussed in Wold, Hellberg, Lundstedt, Sjöström, and Wold (1987), Westerhuis, Kourti, and MacGregor (1998) and Smilde, Westerhuis, and de Jong (2003). Several consensus and hierarchical PCA methods are special cases of PLS-PM applied to the "consensus PCA model" (see Table 3).

6) Methods based on regularized generalized canonical correlation analysis (RGCCA)

Using RGCCA (Tenenhaus and Tenenhaus, 2011; Tenenhaus and Guillemot, 2013) in the context of multiblock data has been discussed in detail in Tenenhaus and Tenenhaus (2014). Many methods mentioned above are special cases of RGCCA.

7) Methods for transforming several data matrices to maximal agreement

The idea behind this approach is to find several components $\mathbf{y}_b^1 = \mathbf{X}_b\mathbf{w}_b^1,\dots,\mathbf{y}_b^r = \mathbf{X}_b\mathbf{w}_b^r$ for each block in one step. Setting $\mathbf{W}_b = [\mathbf{w}_b^1,\dots,\mathbf{w}_b^r]$, the transformed matrices are $\mathbf{X}_b\mathbf{W}_b$.

Furthermore, the weight matrices $\mathbf{W}_b$ are constrained to be columnwise orthonormal. Van de Geer (1984), Ten Berge (1988) and Hanafi and Kiers (2006) proposed several correlation-based and covariance-based criteria for measuring the agreement between the transformed matrices $\mathbf{X}_b\mathbf{W}_b$. A very general monotonically convergent algorithm for finding the solutions for these methods has been proposed by Hanafi and Kiers (2006).

Regularized consensus PCA is a framework for many multiblock component methods. Many methods in Clusters 1, 2, 3 and 6 are recovered and those in Clusters 4 and 5 are much improved as regards their mathematical properties. Regularized consensus PCA is based on a well-defined optimization problem yielding a recurrence equation on the superblock component. The proposed iterative algorithm provides a fixed point of the recurrence equation. Furthermore, regularized consensus PCA allows more deflation strategies than in usual consensus PCA methods because a superblock is introduced in place of an auxiliary variable.

Several recent advances on RGCCA can be transposed to regularized consensus PCA. This concerns RGCCA for multigroup data (Tenenhaus and Tenenhaus, 2014), a sparse version of RGCCA (Tenenhaus, Philippe, Guillemot, Lê Cao, Grill, and Frouin, 2014), and a “kernel” extension of RGCCA (Tenenhaus, Philippe and Frouin, 2015).


References

Addinsoft (2015): XLSTAT software, Paris.
Carroll, J.D. (1968a): A generalization of canonical correlation analysis to three or more sets of variables. Proc. 76th Conv. Am. Psych. Assoc., 227-228.
Carroll, J.D. (1968b): Equations and tables for a generalization of canonical correlation analysis to three or more sets of variables. Unpublished companion paper to Carroll, J.D. (1968a).
Chessel D., Hanafi M. (1996): Analyses de la co-inertie de K nuages de points. Revue de Statistique Appliquée, 44, 35-60.
Escofier B., Pagès J. (1994): Multiple factor analysis (AFMULT package). Computational Statistics and Data Analysis, 18, 121-140.
Fessler J. (2004): Monotone convergence, lecture note. web.eecs.umich.edu/~fessler/course/600/l/lmono.pdf
Groenen P.J.F., Tenenhaus A. (2015): Regularized generalized canonical correlation analysis through majorization (in preparation).
Hanafi M. (2007): PLS path modelling: computation of latent variables with the estimation mode B. Computational Statistics, 22, 275-292.
Hanafi M., Kiers H.A.L. (2006): Analysis of K sets of data, with differential emphasis on agreement between and within sets. Computational Statistics and Data Analysis, 51, 1491-1508.
Hanafi M., Kohler A., Qannari E.M. (2010): Shedding new light on hierarchical principal component analysis. Journal of Chemometrics, 24, 703-709.
Hanafi M., Kohler A., Qannari E.M. (2011): Connections between multiple co-inertia analysis and consensus principal component analysis. Chemometrics and Intelligent Laboratory Systems, 106, 37-40.
Hassani S., Hanafi M., Qannari E.M., Kohler A. (2013): Deflation strategies for multi-block principal component analysis revisited. Chemometrics and Intelligent Laboratory Systems, 120, 154-168.
Horst P. (1961a): Relations among m sets of variables. Psychometrika, 26, 126-149.
Horst P. (1961b): Generalized canonical correlations and their applications to experimental data. J. Clin. Psychol. (Monograph supplement), 14, 331-347.
Horst P. (1965): Factor Analysis of Data Matrices. Holt, Rinehart and Winston, New York.
Hotelling H. (1936): Relations between two sets of variates. Biometrika, 28, 321-377.
Journée M., Nesterov Y., Richtárik P., Sepulchre R. (2010): Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11, 517-553.
Kettenring J.R. (1971): Canonical analysis of several sets of variables. Biometrika, 58, 433-451.
Lange K. (2013): Optimization. Springer Texts in Statistics 95, Springer, New York.
Ledoit O., Wolf M. (2004): A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365-411.
Lohmöller J.-B. (1989): Latent Variables Path Modeling with Partial Least Squares. Springer-Verlag, Heidelberg (new edition 2013).
Meyer R.R. (1976): Sufficient conditions for the convergence of monotonic mathematical programming algorithms. Journal of Computer and System Sciences, 12, 108-121.
Smilde A.K., Westerhuis J.A., de Jong S. (2003): A framework for sequential multiblock component methods. Journal of Chemometrics, 17, 323-337.
Ten Berge J.M.F. (1988): Generalized approaches to the MAXBET problem and the MAXDIFF problem, with applications to canonical correlations. Psychometrika, 53, 487-494.
Tenenhaus A., Guillemot V. (2013): RGCCA package. http://cran.project.org/web/packages/RGCCA/index.html
Tenenhaus A., Philippe C., Frouin V. (2015): Kernel generalized canonical correlation analysis. Computational Statistics and Data Analysis (in press).
Tenenhaus A., Philippe C., Guillemot V., Lê Cao K.-A., Grill J., Frouin V. (2014): Variable selection for generalized canonical correlation analysis. Biostatistics, 15(3), 569-583.
Tenenhaus A., Tenenhaus M. (2011): Regularized generalized canonical correlation analysis. Psychometrika, 76, 257-284.
Tenenhaus A., Tenenhaus M. (2014): Regularized generalized canonical correlation analysis for multiblock or multigroup data analysis. European Journal of Operational Research, 238, 391-403.
Tenenhaus M., Esposito Vinzi V., Chatelin Y.-M., Lauro C. (2005): PLS path modeling. Computational Statistics & Data Analysis, 48, 159-205.
Tucker L.R. (1958): An inter-battery method of factor analysis. Psychometrika, 23(2), 111-136.
Van de Geer J.P. (1984): Linear relations among k sets of variables. Psychometrika, 49, 70-94.
Van den Wollenberg A.L. (1977): Redundancy analysis – An alternative to canonical correlation analysis. Psychometrika, 42, 207-219.
Wangen L.E., Kowalski B.R. (1989): A multiblock partial least squares algorithm for investigating complex chemical systems. Journal of Chemometrics, 3, 3-20.
Westerhuis J.A., Kourti T., MacGregor J.F. (1998): Analysis of multiblock and hierarchical PCA and PLS models. Journal of Chemometrics, 12, 301-321.
Wold H. (1982): Soft modeling: the basic design and some extensions. In: Systems under Indirect Observation, Part 2, K.G. Jöreskog & H. Wold (Eds), North-Holland, Amsterdam, pp. 1-54.
Wold H. (1985): Partial least squares. In: Encyclopedia of Statistical Sciences, vol. 6, Kotz S. & Johnson N.L. (Eds), John Wiley & Sons, New York, pp. 581-591.
Wold S., Hellberg S., Lundstedt T., Sjöström M., Wold H. (1987): PLS modeling with latent variables in two or more dimensions. In: Proceedings of the Symposium on PLS Model Building: Theory and Application, Frankfurt am Main, Germany, pp. 1-21.


APPENDIX 1

An algorithm for maximizing convex functions

We consider an objective function $\varphi(\mathbf{v}): \mathbb{R}^{J} \to \mathbb{R}$, convex and continuously differentiable, and the following optimization problem:

$$\text{Maximize } \varphi(\mathbf{v}) \quad \text{s.t.} \quad \|\mathbf{v}\| = 1. \tag{A1}$$

We denote by ()v the gradient of the objective function ()v . We suppose that ()v 0 in the rest of this appendix. This assumption is not too binding as ()v0 characterizes the global minimum of ()v . We consider the Lagrangian function  F (,vvvv ) () t  1 related to optimization problem (A1). The optimal solution 2 satisfies the stationary equation

$$\frac{\partial F}{\partial\mathbf{v}}(\mathbf{v},\lambda) = \nabla\varphi(\mathbf{v}) - \lambda\mathbf{v} = \mathbf{0}. \tag{A2}$$

We denote a unit norm solution of the stationary equation (A2) by

$$\mathbf{v}^* = \frac{\nabla\varphi(\mathbf{v}^*)}{\|\nabla\varphi(\mathbf{v}^*)\|}. \tag{A3}$$

Journée, Nesterov, Richtárik and Sepulchre (2010) proposed the monotonically convergent algorithm for optimization problem (A1) given hereafter.

Input: $\mathbf{v}^0$ such that $\nabla\varphi(\mathbf{v}^0) \neq \mathbf{0}$, $\varepsilon$.
Output: $\mathbf{v}^s$ (approximate solution of (A1)).
Repeat ($s = 0, 1, 2, \dots$):

$$\mathbf{v}^{s+1} = f(\mathbf{v}^{s}) = \frac{\nabla\varphi(\mathbf{v}^{s})}{\|\nabla\varphi(\mathbf{v}^{s})\|} \tag{A4}$$

ss1 Until vv   .

ALGORITHM 2: Gradient algorithm for problem (A1)
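Algorithm 2 is generic: any convex, continuously differentiable objective can be plugged in through its gradient. A minimal NumPy sketch of (A4) is given below (function name and defaults are ours); the `r_cpca` sketch of Section 2 is the special case where the gradient is (7).

```python
import numpy as np

def gradient_algorithm(grad, v0, eps=1e-10, max_iter=1000):
    """Algorithm 2: v^{s+1} = grad(v^s)/||grad(v^s)|| for maximizing a convex,
    continuously differentiable function over the unit sphere."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        g = grad(v)
        v_new = g / np.linalg.norm(g)
        if np.linalg.norm(v_new - v) <= eps:   # stopping rule of Algorithm 2
            return v_new
        v = v_new
    return v
```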

Note that any fixed point of the recurrence function $f$ is a solution of the stationary equation (A2). We summarize the properties of Algorithm 2 in Proposition 2 below.


Proposition 2

Let vs  be a sequence of unit norm vectors generated by the recurrence equation (A4). We assume that the norms of the gradients ()vs are bounded from below:  =inf (vs ) 0. s The following properties hold: a) The bounded sequence ()vs is monotonically increasing and therefore converging. b) All accumulation points of the sequence vs  are fixed points of f . c) ()vvs ()* , where v* is a fixed point of f .

ss1 2 2(vv ) () d) vvss1  .  e) The sequence vs is asymptotically regular: limvvss1   0 .   s0 f) Either vs  converges or the accumulation points of vs  form a continuum.

Before proving Proposition 2, we need to discuss various points and show some results.

Discussion

Property (d) given in the proposition comes from Journée, Nesterov, Richtárik and Sepulchre (2010). The other properties can very simply be obtained by using Fessler's version of Meyer's monotone convergence theorem (Meyer, 1976; Fessler, 2004). An easy way to show that Algorithm 2 fulfills the conditions of Meyer's theorem is to make explicit the MM approach (see Lange, 2013) used by Journée et al. Here MM stands for maximization by minorization. We use the property that a convex differentiable function lies above its linear approximation for any $\mathbf{v}$ and $\tilde{\mathbf{v}}$:

(A5) ()vvvvv () )(t ).

A minorizing function $G(\mathbf{v},\tilde{\mathbf{v}})$ is defined by considering the right-hand term of (A5):

$$G(\mathbf{v},\tilde{\mathbf{v}}) = \varphi(\tilde{\mathbf{v}}) + \nabla\varphi(\tilde{\mathbf{v}})^t(\mathbf{v} - \tilde{\mathbf{v}}). \tag{A6}$$

On the other hand, the maximum of $G(\mathbf{v},\tilde{\mathbf{v}})$ over $\mathbf{v}$ subject to $\|\mathbf{v}\| = 1$ is attained by choosing $\mathbf{v} = f(\tilde{\mathbf{v}})$ with

$$f(\tilde{\mathbf{v}}) = \frac{\nabla\varphi(\tilde{\mathbf{v}})}{\|\nabla\varphi(\tilde{\mathbf{v}})\|}. \tag{A7}$$


Properties of $G(\mathbf{v},\tilde{\mathbf{v}})$

a) $\varphi(\mathbf{v}) \geq G(\mathbf{v},\tilde{\mathbf{v}})$.
b) $G(\tilde{\mathbf{v}},\tilde{\mathbf{v}}) = \varphi(\tilde{\mathbf{v}})$.
c) Sandwich inequality:
$$\varphi(\tilde{\mathbf{v}}) = G(\tilde{\mathbf{v}},\tilde{\mathbf{v}}) \leq G(f(\tilde{\mathbf{v}}),\tilde{\mathbf{v}}) \leq \varphi(f(\tilde{\mathbf{v}})). \tag{A8}$$

We now consider the sequence $\{\mathbf{v}^{s}\}$ generated by the recurrence equation (A4). We suppose that this sequence involves infinitely many distinct terms; otherwise the last given point satisfies $\mathbf{v}^{s} = f(\mathbf{v}^{s})$ and therefore is a fixed point of $f$.

Result 1

The bounded sequence ()vs is monotonically increasing and therefore converging.

Proof

As the continuous function $\varphi$ is bounded on the unit sphere, this result is a direct consequence of the Sandwich inequality (A8).

Result 2

Define the level set $C = \{\mathbf{v} : \|\mathbf{v}\| = 1 \text{ and } \varphi(\mathbf{v}) \geq \varphi(\mathbf{v}^0)\}$. The function $f: C \to C$ is uniformly compact on $C$.

Proof

We recall that a function $f$ is uniformly compact on $C$ if and only if there exists a compact set $K$ such that $f(\mathbf{v}) \in K$ for all $\mathbf{v} \in C$. As the level set $C = \{\mathbf{v} : \|\mathbf{v}\| = 1 \text{ and } \varphi(\mathbf{v}) \geq \varphi(\mathbf{v}^0)\}$ is compact, we can choose $K = C$. The Sandwich inequality (A8) implies $\varphi(\mathbf{v}^0) \leq \varphi(\mathbf{v}) \leq \varphi(f(\mathbf{v}))$ for all $\mathbf{v} \in C$ and therefore $f(\mathbf{v}) \in C$ for all $\mathbf{v} \in C$.

Result 3

The function $f: C \to C$ is strictly monotone (increasing) with respect to $\varphi$. This means that the function $f$ satisfies two conditions: $\varphi(\mathbf{v}) \leq \varphi(f(\mathbf{v}))$ for any vector $\mathbf{v} \in C$; and $\varphi(\mathbf{v}) < \varphi(f(\mathbf{v}))$ whenever $\mathbf{v}$ is not a fixed point of $f$, i.e. $f(\mathbf{v}) \neq \mathbf{v}$.


Proof

The first condition is deduced from the Sandwich inequality (A8). The second condition is shown by a contrapositive proof. If $\varphi(\mathbf{v}) = \varphi(f(\mathbf{v}))$, using (A8), we have $G(f(\mathbf{v}),\mathbf{v}) = G(\mathbf{v},\mathbf{v})$. As $f(\mathbf{v})$ is the unique maximizer of $G(\mathbf{v}',\mathbf{v})$ over the set of unit norm vectors $\mathbf{v}'$, we conclude that $\mathbf{v} = f(\mathbf{v})$ and therefore $\mathbf{v}$ is a fixed point of $f$.

Result 4

We assume  =inf (vs ) 0. Then the following inequality holds: s

ss1 2 2(vv ) () (A9) vvss1  . 

Proof

We deduce from (A8) and from Proposition 3 of Journée et al. the following inequalities

$$\varphi(\mathbf{v}^{s+1}) - \varphi(\mathbf{v}^{s}) \geq G(\mathbf{v}^{s+1},\mathbf{v}^{s}) - G(\mathbf{v}^{s},\mathbf{v}^{s}) = \nabla\varphi(\mathbf{v}^{s})^t(\mathbf{v}^{s+1} - \mathbf{v}^{s}) \geq \frac{\delta}{2}\,\|\mathbf{v}^{s+1} - \mathbf{v}^{s}\|^2 \tag{A10}$$

and therefore (A9).

Proof of Proposition 2

Point (a) corresponds to Result 1. Points (b), (c), (e) and (f) come from a direct application of Fessler's version of Meyer's monotone convergence theorem. This theorem supposes three conditions which are satisfied here: (i) the function $f$ is continuous on $C$, (ii) the function $f$ is uniformly compact on $C$ and (iii) the function $f$ is strictly monotone (increasing) on $C$ with respect to the continuous function $\varphi$. Point (d) corresponds to Result 4.

Discussion

If Algorithm 2 is stopped when $\varphi(\mathbf{v}^{s+1}) - \varphi(\mathbf{v}^{s}) \leq \varepsilon$, inequality (A9) implies that $\|\mathbf{v}^{s+1} - \mathbf{v}^{s}\|^2 \leq 2\varepsilon/\delta$. When this condition is fulfilled, we consider that the obtained solution $\mathbf{v}^{s}$ is a fixed point of the recurrence equation (A4).
