Canonical Correlations Canonical Correlations
I Like Principal Components Analysis, Canonical Correlation Analysis looks for interesting linear combinations of multivariate observations.
I In Canonical Correlation Analysis, a multivariate response X is partitioned into two groups: X(1) p×1 X = (p+q)×1 X(2) q×1
0 (1) 0 (2) I We look for linear combinations U = a X and V = b X that are highly correlated.
Statistics 784 NC STATE UNIVERSITY 1 / 1 Multivariate Analysis Canonical Correlations
I The covariance matrix of X is correspondingly partitioned:
Σ Σ Cov(X) = Σ = 1,1 1,2 Σ2,1 Σ2,2
I Then a0Σ b Corr(U, V ) = 1,2 p 0 p 0 a Σ1,1a b Σ2,2b 0 I Without loss of generality, Var(U) = a Σ1,1a and 0 Var(V ) = b Σ2,2b can both be constrained to equal 1.
I The first pair of canonical variables are the U1 and V1 that maximize the correlation, subject to this constraint.
Statistics 784 NC STATE UNIVERSITY 2 / 1 Multivariate Analysis Canonical Correlations
I Differentiation shows that the maximizing a and b satisfy
Σ1,2b − αΣ1,1a = 0
Σ2,1a − βΣ2,2b = 0
for some α and β.
I Then
0 0 Corr(U, V ) = a Σ1,2b = αa Σ1,1a = α 0 = βb Σ2,2b = β √ so α = β = λ, say.
Statistics 784 NC STATE UNIVERSITY 3 / 1 Multivariate Analysis Canonical Correlations
I We assume that Σ1,1 and Σ2,2 are invertible, and then √ −1 Σ1,1Σ1,2b = λa √ −1 Σ2,2Σ2,1a = λb
I Eliminate b: −1 −1 Σ1,1Σ1,2Σ2,2Σ2,1a = λa. −1 −1 I That is, a is an eigenvector of Σ1,1Σ1,2Σ2,2Σ2,1 and b is a multiple −1 of Σ2,2Σ2,1a. −1 −1 I Consequently, b is an eigenvector of Σ2,2Σ2,1Σ1,1Σ1,2 with the same eigenvalue as a.
Statistics 784 NC STATE UNIVERSITY 4 / 1 Multivariate Analysis Canonical Correlations
√ I Since Corr(U, V ) = λ, the first canonical pair, U1 and V1, is 2 ∗2 determined by the largest eigenvalue λ1 = Corr(U1, V1) = ρ1 . I We then look for the pair U2 and V2 with the greatest correlation, subject to Corr(U1, U2) = Corr(V1, V2) = 0.
I As a bonus, Corr(U1, V2) = Corr(U2, V1) = 0.
I Not surprisingly, this second canonical pair, U2 and V2, is determined 2 ∗2 by the next largest eigenvalue λ2 = Corr(U2, V2) = ρ2 . I Continuing the process, we find r = min(p, q) canonical pairs.
Statistics 784 NC STATE UNIVERSITY 5 / 1 Multivariate Analysis Canonical Correlations Example: head and leg measurements of chicken bones
I Head group:
I skull length; I skull breadth.
I Leg group:
I femur length;
I tibia length.
I SAS proc cancorr program and output.
Statistics 784 NC STATE UNIVERSITY 6 / 1 Multivariate Analysis Canonical Correlations The number of canonical pairs
I In sample data, all r = min(p, q) canonical correlations will be positive, even if the data are drawn from a population with only m < r positively correlated pairs.
I proc cancorr tests the hypothesis
∗ ∗ ∗ H0 : ρm = ρm+1 = ··· = ρr = 0
for each value of m, m = 1, 2,..., r.
I The first null hypothesis is that the two groups are entirely uncorrelated. If that is rejected, the second is that the correlation between them is carried by a single linear combination of each group, etc.
Statistics 784 NC STATE UNIVERSITY 7 / 1 Multivariate Analysis Canonical Correlations Standardization
I The coefficient vectors ai and bi may be difficult to interpret, especially when the variables are in different physical units or have very different variances.
I Consequently, the data are often standardized, or equivalently, the correlation matrix is used instead of the covariance matrix.
I Unlike in Principal Components Analysis, the canonical variables are unchanged by any such scaling.
I proc cancorr reports both raw and standardized forms.
Statistics 784 NC STATE UNIVERSITY 8 / 1 Multivariate Analysis Canonical Correlations Correlations
I Sometimes the canonical variables can be interpreted as representing characteristics of the items being measured.
I The interpretation may be made based on the (raw or standardized) coefficients.
I The correlations between the original variables and the canonical variables can also be informative, but “must be interpreted with caution.”
I proc cancorr reports the correlations of each original variable with both sets of canonical variables.
Statistics 784 NC STATE UNIVERSITY 9 / 1 Multivariate Analysis Canonical Correlations Special Cases
p = q = 1: the “canonical” correlation between single variables X1 and X2 is just the absolute value of their simple Pearson correlation ρ.
p = 1, q > 1: the canonical correlation of a single variable X1 and a group of variables X(2) is the multiple correlation R in the (2) ∗2 2 regression of X1 on X , so ρ1 = R .
∗ 2 2 (2) I Also, in general, ρk is the R in the regression of Uk on X , and (1) also in the regression of Vk on X : canonical correlations are closely related to R2.
Statistics 784 NC STATE UNIVERSITY 10 / 1 Multivariate Analysis