Canonical Correlation Analysis (CCA)
Hervé Abdi (1), Vincent Guillemot (2), Aida Eslami (3), and Derek Beaton (4)

(1) School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
(2) Bioinformatics and Biostatistics Hub, Institut Pasteur (IP), C3BI, USR 3756 CNRS, Paris, France
(3) Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada
(4) Rotman Research Institute, Baycrest Health Sciences, Toronto, ON, Canada

Synonyms

Canonical analysis; canonical variate analysis; external factor analysis.

Glossary

Canonical correlation: The correlation between two canonical variates of the same pair. This is the criterion optimized by CCA.

Canonical loadings: The correlations between the original variables and the canonical variates. Sometimes used as a synonym for canonical vectors (because these quantities differ only by their normalization).

Canonical variates: The latent variables (one per data table) computed in CCA, also called canonical variables, canonical variable scores, or canonical factor scores. The canonical variates have maximal correlation.

Canonical vectors: The sets of coefficients of the linear combinations used to compute the canonical variates, also called canonical weights. Canonical vectors are also sometimes called canonical loadings.

Latent variable: A linear combination of the variables of one data table. In general, a latent variable is computed to satisfy some predefined criterion.

Definition

Canonical correlation analysis (CCA) is a statistical method whose goal is to extract the information common to two data tables that measure quantitative variables on the same set of observations. To do so, CCA creates pairs of linear combinations of the variables (one per table) that have maximal correlation.

Introduction

Originally defined by Hotelling in 1935 (Hotelling 1935, 1936; see also Bartlett 1948), canonical correlation analysis (CCA) is a statistical method whose goal is to extract the information common to two data tables that measure quantitative variables on the same set of observations. To do so, CCA computes two sets of linear combinations, called latent variables (one for each data table), that have maximum correlation. A convenient way to visualize the common information extracted by the analysis is (1) to plot the latent variables of one set against those of the other set (this creates plots akin to the plots of factor scores in principal component analysis); (2) to plot the coefficients of the linear combinations (this creates plots akin to the plots of loadings in principal component analysis); and (3) to plot the correlations between the original variables and the latent variables (this creates "correlation circle" plots like those of principal component analysis). CCA generalizes many standard statistical techniques (e.g., multiple regression, analysis of variance, discriminant analysis) and comes in several related variants that address slightly different types of problems (e.g., different normalization conditions, different types of data).
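To make this definition concrete, here is a short, self-contained Python illustration (an addition to this rewrite, not part of the original entry): it simulates two hypothetical data tables that share a common signal and extracts their first pair of canonical variates with scikit-learn. All data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Two hypothetical tables measured on the same N = 100 observations:
# each is a set of noisy copies of a common underlying signal.
N = 100
signal = rng.normal(size=(N, 1))
X = signal + 0.5 * rng.normal(size=(N, 4))   # N x I table (I = 4)
Y = signal + 0.5 * rng.normal(size=(N, 3))   # N x J table (J = 3)

# The first pair of latent variables (canonical variates): the linear
# combination of the columns of X and the linear combination of the
# columns of Y that have maximal correlation.
f, g = CCA(n_components=1).fit_transform(X, Y)

print(np.corrcoef(f[:, 0], g[:, 0])[0, 1])   # first canonical correlation
```

Because both tables contain noisy copies of the same signal, the printed canonical correlation should be high (roughly 0.9 with these settings).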
Key Points

CCA extracts the information common to two data tables measuring quantitative variables on the same set of observations. For each data table, CCA computes a set of linear combinations of the variables of this table, called latent variables or canonical variates, with the constraints that a latent variable from one table has maximal correlation with one latent variable of the other table and no correlation with the remaining latent variables of the other table. The results of the analysis are interpreted using different types of graphical displays that plot the latent variables and the coefficients of the linear combinations used to create the latent variables.

Historical Background

After principal component analysis (PCA), CCA is one of the oldest multivariate techniques; it was first defined by Hotelling in 1935. In addition to being the first method created for the statistical analysis of two data tables, CCA is also of theoretical interest because a very large number of multivariate analytic tools are particular cases of CCA. Like most multivariate statistical techniques, CCA became practically feasible only with the advent of modern computers. Recent developments involve the generalization of CCA to more than two tables, as well as cross-validation approaches to select important variables and to assess the stability and reliability of the solution obtained on a given sample.

Canonical Correlation Analysis

Notations

Matrices are denoted by upper case bold letters, vectors by lower case bold letters, and their elements by lower case italic letters. Matrices, vectors, and elements from the same matrix all use the same letter (e.g., A, a, a). The transpose operation is denoted by the superscript ⊺; the inverse operation is denoted by the superscript −1. The identity matrix is denoted I, vectors or matrices of ones are denoted 1, and matrices or vectors of zeros are denoted 0. When applied to a square matrix, the diag operator gives the vector of the diagonal elements of this matrix; when applied to a vector, the diag operator gives a diagonal matrix whose diagonal elements are the elements of this vector. When applied to a square matrix, the trace operator gives the sum of the diagonal elements of this matrix.

The data tables to be analyzed by CCA, of respective sizes N × I and N × J, are denoted X and Y; they collect two sets of, respectively, I and J quantitative measurements obtained on the same N observations. Except if mentioned otherwise, matrices X and Y are column centered and normalized, and so:

$1^\top X = 0^\top, \quad 1^\top Y = 0^\top$  (1)

(with 1 being a conformable vector of ones and 0 a conformable vector of zeros), and

$\mathrm{diag}\{X^\top X\} = 1, \quad \mathrm{diag}\{Y^\top Y\} = 1.$  (2)

Note that because X and Y are centered and normalized matrices, their inner products are correlation matrices, which are denoted:

$R_X = X^\top X, \quad R_Y = Y^\top Y, \quad \text{and} \quad R = X^\top Y.$  (3)
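The preprocessing of Eqs. 1, 2, and 3 can be made concrete with a minimal numpy sketch (again an illustration added in this rewrite, with simulated data): center and normalize the columns of each table, then form the three correlation matrices used in the rest of the entry.

```python
import numpy as np

def center_and_normalize(M):
    """Column-center M (Eq. 1) and scale each column to unit norm (Eq. 2)."""
    M = M - M.mean(axis=0)                 # columns now sum to zero
    return M / np.linalg.norm(M, axis=0)   # diag{M'M} = 1

rng = np.random.default_rng(1)
X = center_and_normalize(rng.normal(size=(100, 4)))   # N x I
Y = center_and_normalize(rng.normal(size=(100, 3)))   # N x J

# Because X and Y are centered and normalized, these inner products
# are correlation matrices (Eq. 3).
R_X = X.T @ X    # I x I: correlations among the columns of X
R_Y = Y.T @ Y    # J x J: correlations among the columns of Y
R   = X.T @ Y    # I x J: correlations between the two tables
```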
Optimization Problem

In CCA, the problem is to find two latent variables, denoted f and g, obtained as linear combinations of the columns of, respectively, X and Y. The coefficients of these linear combinations are stored, respectively, in the I × 1 vector p and the J × 1 vector q; and so we are looking for

$f = Xp \quad \text{and} \quad g = Yq.$  (4)

In CCA, we want the latent variables to have maximum correlation, and so we are looking for p and q satisfying:

$\delta = \arg\max_{p,q} \{\mathrm{cor}(f, g)\} = \arg\max_{p,q} \left\{ \frac{f^\top g}{\sqrt{(f^\top f)(g^\top g)}} \right\}.$  (5)

The correlation between latent variables maximized by CCA is called the canonical correlation (Gittins 2012; Hotelling 1936; Horst 1961; Mardia et al. 1980).

An easy way to solve the maximization problem of CCA is to constrain the latent variables to have unit norm (because, in this case, the term f⊺g gives the correlation between f and g). Specifically, we require that:

$f^\top f = p^\top X^\top X p = p^\top R_X p = 1 = g^\top g = q^\top Y^\top Y q = q^\top R_Y q.$  (6)

With these notations, the maximization problem from Eq. 5 becomes:

$\arg\max_{p,q} \{f^\top g = p^\top R q\}$ under the constraints that $p^\top R_X p = 1 = q^\top R_Y q.$  (7)

Equivalent Optimum Criteria

The maximization problem expressed in Eq. 5 can also be expressed as the following equivalent minimization problem:

$\arg\min_{p,q} \{\|Xp - Yq\|^2\} = \arg\min_{p,q} \{\mathrm{trace}((Xp - Yq)^\top (Xp - Yq))\}$ under the constraints that $p^\top R_X p = 1 = q^\top R_Y q.$  (8)

Lagrangian Approach

A standard way to solve the type of maximization problem described in Eq. 7 is to use a Lagrangian approach. To do so, we need first to define a Lagrangian that incorporates the constraints; next, to take the derivatives of the Lagrangian with respect to the unknown quantities (here: p and q); and, finally, to set these derivatives to zero in order to obtain the normal equations. Solving the normal equations then gives the values of the parameters that solve the optimization problem (i.e., for CCA: maximum correlation).

Here the Lagrangian (denoted ℒ) that includes the constraints with the two Lagrange multipliers α and β is

$\mathcal{L} = p^\top R q - \alpha (p^\top R_X p - 1) - \beta (q^\top R_Y q - 1).$  (9)

The derivatives of the Lagrangian with respect to p and q are

$\frac{\partial \mathcal{L}}{\partial p} = Rq - 2\alpha R_X p$  (10)

$\frac{\partial \mathcal{L}}{\partial q} = R^\top p - 2\beta R_Y q.$  (11)

The Normal Equations

Setting Eqs. 10 and 11 to zero gives the normal equations:

$Rq = 2\alpha R_X p$  (12)

$R^\top p = 2\beta R_Y q.$  (13)

Solution of the Normal Equations

The first step to solve the normal equations is to show that α = β. This is done by premultiplying Eq. 12 by p⊺ and Eq. 13 by q⊺ to obtain (using the constraints from Eq. 7) $p^\top R q = 2\alpha$ and $q^\top R^\top p = 2\beta$; because $p^\top R q = q^\top R^\top p$, it follows that α = β. Setting δ = 2α = 2β and substituting one normal equation into the other then shows that p and q are the first eigenvectors of, respectively, $R_X^{-1} R R_Y^{-1} R^\top$ and $R_Y^{-1} R^\top R_X^{-1} R$, associated with the first eigenvalue λ₁ = δ², and that the maximum correlation (i.e., the canonical correlation) is equal to δ. Note that, in order to make explicit the constraints expressed in Eq. 7, the vectors p and q are normalized (respectively) in the metrics R_X and R_Y (i.e., $p^\top R_X p = 1$ and $q^\top R_Y q = 1$).

Additional Pairs of Latent Variables

After the first pair of latent variables has been found, additional pairs of latent variables can be extracted. The criterion from Eqs. 5 and 7 is still used for the subsequent pairs of latent variables, along with the requirement that the new latent variables be orthogonal to the previous ones. Specifically, if $f_\ell$ and $g_\ell$ denote the ℓ-th pair of latent variables, the orthogonality condition requires that $f_\ell^\top f_{\ell'} = 0 = g_\ell^\top g_{\ell'}$ (and also $f_\ell^\top g_{\ell'} = 0$) for ℓ ≠ ℓ′.
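The solution just derived can be checked numerically. The sketch below (illustrative, with simulated data and assumed variable names) computes p as the first eigenvector of $R_X^{-1} R R_Y^{-1} R^\top$, obtains q from the normal equations, normalizes both vectors in their respective metrics as required by Eq. 7, and verifies that the correlation between the latent variables equals δ = √λ₁.

```python
import numpy as np
from numpy.linalg import inv, eig

def center_and_normalize(M):
    M = M - M.mean(axis=0)
    return M / np.linalg.norm(M, axis=0)

rng = np.random.default_rng(1)
X = center_and_normalize(rng.normal(size=(100, 4)))
Y = center_and_normalize(rng.normal(size=(100, 3)))
R_X, R_Y, R = X.T @ X, Y.T @ Y, X.T @ Y

# p is the first eigenvector of R_X^{-1} R R_Y^{-1} R'; the associated
# eigenvalue is lambda_1 = delta^2, the squared canonical correlation.
evals, evecs = eig(inv(R_X) @ R @ inv(R_Y) @ R.T)
first = np.argmax(evals.real)
delta = np.sqrt(evals.real[first])
p = evecs[:, first].real
p = p / np.sqrt(p @ R_X @ p)       # normalize so that p' R_X p = 1 (Eq. 7)

# The normal equations give q, up to normalization, as R_Y^{-1} R' p.
q = inv(R_Y) @ R.T @ p
q = q / np.sqrt(q @ R_Y @ q)       # normalize so that q' R_Y q = 1 (Eq. 7)

# The latent variables of Eq. 4 have unit norm, so f'g is their correlation.
f, g = X @ p, Y @ q
print(f @ g, delta)                # both equal the canonical correlation
```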
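The additional pairs come from the subsequent eigenvectors of the same matrix. The helper below (cca_pairs is a hypothetical name, continuing the sketch above) extracts all min(I, J) pairs and checks the orthogonality conditions numerically; note that each q_ℓ is derived from p_ℓ through the normal equations, which keeps the two sets of vectors consistently paired and scaled.

```python
import numpy as np
from numpy.linalg import inv, eig

def center_and_normalize(M):
    M = M - M.mean(axis=0)
    return M / np.linalg.norm(M, axis=0)

def cca_pairs(X, Y):
    """All pairs of canonical vectors for centered, normalized X and Y.

    Returns P (I x L), Q (J x L), and the L = min(I, J) canonical
    correlations, sorted in decreasing order.
    """
    R_X, R_Y, R = X.T @ X, Y.T @ Y, X.T @ Y
    evals, P = eig(inv(R_X) @ R @ inv(R_Y) @ R.T)
    order = np.argsort(-evals.real)[: min(X.shape[1], Y.shape[1])]
    deltas = np.sqrt(evals.real[order])
    P = P[:, order].real
    P = P / np.sqrt(np.sum(P * (R_X @ P), axis=0))   # p_l' R_X p_l = 1
    Q = inv(R_Y) @ R.T @ P / deltas                  # from the normal equations
    return P, Q, deltas

rng = np.random.default_rng(1)
X = center_and_normalize(rng.normal(size=(100, 4)))
Y = center_and_normalize(rng.normal(size=(100, 3)))
P, Q, deltas = cca_pairs(X, Y)
F, G = X @ P, Y @ Q    # columns are the successive pairs of latent variables

# Latent variables from different pairs are uncorrelated, within and
# across tables; F'G is diagonal with the canonical correlations on
# its diagonal.
print(np.allclose(F.T @ F, np.eye(3), atol=1e-8))
print(np.allclose(G.T @ G, np.eye(3), atol=1e-8))
print(np.allclose(F.T @ G, np.diag(deltas), atol=1e-8))
```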