Appendix A: Theory of Correspondence Analysis

CA is based on fairly straightforward, classical results in matrix theory. The central result is the singular value decomposition (SVD), which is the basis of many multivariate methods such as principal component analysis, canonical correlation analysis, all forms of linear biplots, discriminant analysis and metric multidimensional scaling. In this appendix the theory of CA is summarized, as well as the theory of related methods discussed in the book. Matrix–vector notation is preferred because it is more compact, but also because it is closer to the implementation of the method in the R computing language.

Contents

Correspondence matrix and preliminary notation
Basic computational algorithm
A note on the singular value decomposition (SVD)
The bilinear CA model
Transition equations between rows and columns
Supplementary points
Total inertia and χ2-distances
Contributions of points to principal inertias
Contributions of principal axes to point inertias (squared correlations)
Ward clustering of row or column profiles
Stacked tables
Multiple CA
Joint CA
Percentage of inertia explained in JCA
Contributions in JCA
Adjusted inertias in MCA
Subset CA and subset MCA
Analysis of square asymmetric tables
Canonical CA
Tables for testing for significant clustering or significant dimensions

Correspondence matrix and preliminary notation

Let N denote the I × J data matrix, with positive row and column sums (almost always N consists of nonnegative numbers, but there are some exceptions such as the one described at the end of Chapter 23). For notational simplicity the matrix is first converted to the correspondence matrix P by dividing N by its grand total $n = \sum_i \sum_j n_{ij} = \mathbf{1}^T N \mathbf{1}$ (the notation 1 is used for a vector of ones of length that is appropriate to its use; hence the first 1 is I × 1 and the second is J × 1 to match the row and column lengths of N).


Correspondence matrix:
$$P = \frac{1}{n} N \qquad (A.1)$$

The following notation is used:

Row and column masses:
$$r_i = \sum_{j=1}^{J} p_{ij} \qquad c_j = \sum_{i=1}^{I} p_{ij} \qquad (A.2)$$
i.e., $r = P\mathbf{1}$ and $c = P^T\mathbf{1}$

Diagonal matrices of row and column masses:

$$D_r = \mathrm{diag}(r) \quad \text{and} \quad D_c = \mathrm{diag}(c) \qquad (A.3)$$

Note that all subsequent definitions and results are given in terms of these relative quantities $P = \{p_{ij}\}$, $r = \{r_i\}$ and $c = \{c_j\}$, whose elements add up to 1 in each case. Multiply these by n to recover the elements of the original matrix N: $np_{ij} = n_{ij}$, $nr_i$ = i-th row sum of N, $nc_j$ = j-th column sum of N.

Basic computational algorithm

The computational algorithm to obtain coordinates of the row and column profiles with respect to principal axes, using the singular value decomposition (SVD), is as follows:

CA Step 1 — Calculate the matrix S of standardized residuals:
$$S = D_r^{-1/2}(P - rc^T)D_c^{-1/2} \qquad (A.4)$$

CA Step 2 — Calculate the SVD of S:
$$S = U D_\alpha V^T \quad \text{where} \quad U^T U = V^T V = I \qquad (A.5)$$

where $D_\alpha$ is the diagonal matrix of (positive) singular values in descending order: $\alpha_1 \ge \alpha_2 \ge \cdots$

CA Step 3 — Standard coordinates Φ of rows:
$$\Phi = D_r^{-1/2} U \qquad (A.6)$$

CA Step 4 — Standard coordinates Γ of columns:
$$\Gamma = D_c^{-1/2} V \qquad (A.7)$$

CA Step 5 — Principal coordinates F of rows:
$$F = D_r^{-1/2} U D_\alpha = \Phi D_\alpha \qquad (A.8)$$

CA Step 6 — Principal coordinates G of columns:
$$G = D_c^{-1/2} V D_\alpha = \Gamma D_\alpha \qquad (A.9)$$

CA Step 7 — Principal inertias $\lambda_k$:
$$\lambda_k = \alpha_k^2, \quad k = 1, 2, \ldots, K, \quad \text{where } K = \min\{I - 1, J - 1\} \qquad (A.10)$$

The rows of the coordinate matrices in (A.6)–(A.9) refer to the rows or columns, as the case may be, of the original table, while the columns of these matrices refer to the principal axes, or dimensions, of which there are $\min\{I-1, J-1\}$, i.e., one less than the number of rows or columns, whichever is smaller. Notice how the principal and standard coordinates are scaled:
$$F^T D_r F = G^T D_c G = D_\lambda \qquad (A.11)$$
$$\Phi^T D_r \Phi = \Gamma^T D_c \Gamma = I \qquad (A.12)$$
That is, the weighted sum-of-squares of the principal coordinates on the k-th dimension (i.e., their inertia in the direction of this dimension) is equal to the principal inertia (or eigenvalue) $\lambda_k = \alpha_k^2$, the square of the k-th singular value, whereas the standard coordinates have weighted sum-of-squares equal to 1. All coordinate matrices have orthogonal columns, where the masses are always used in the calculation of the (weighted) products.
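For readers who want to follow the seven steps numerically, here is a minimal sketch in base R; the small 5 × 4 matrix N is arbitrary illustrative data, not an example from the book.

```r
# Minimal sketch of CA Steps 1-7 in base R.  The 5 x 4 matrix N is an
# arbitrary illustrative example.
N   <- matrix(c(70, 30, 20, 10,
                45, 55, 30, 20,
                20, 50, 55, 25,
                10, 25, 45, 50,
                 5, 15, 30, 70), nrow = 5, byrow = TRUE)
n   <- sum(N)
P   <- N / n                           # correspondence matrix (A.1)
r   <- rowSums(P)                      # row masses (A.2)
c.  <- colSums(P)                      # column masses (A.2); "c." avoids masking c()
S   <- diag(1/sqrt(r)) %*% (P - r %o% c.) %*% diag(1/sqrt(c.))   # Step 1 (A.4)
dec <- svd(S)                          # Step 2 (A.5)
Phi <- diag(1/sqrt(r))  %*% dec$u      # Step 3: standard row coordinates (A.6)
Gam <- diag(1/sqrt(c.)) %*% dec$v      # Step 4: standard column coordinates (A.7)
F   <- Phi %*% diag(dec$d)             # Step 5: principal row coordinates (A.8)
G   <- Gam %*% diag(dec$d)             # Step 6: principal column coordinates (A.9)
lambda <- dec$d^2                      # Step 7: principal inertias (A.10)
# Scaling checks (A.11) and (A.12):
round(t(F) %*% diag(r) %*% F, 6)       # diagonal matrix of principal inertias
round(t(Phi) %*% diag(r) %*% Phi, 6)   # identity matrix
```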

A note on the singular value decomposition (SVD)

The SVD is the fundamental mathematical result for CA, as it is for other dimension reduction techniques such as principal component analysis, canonical correlation analysis and linear discriminant analysis. This matrix decomposition expresses any rectangular matrix as a product of three matrices of simple structure, as in (A.5) above: $S = U D_\alpha V^T$. The columns of the matrices U and V are the left and right singular vectors respectively, and the positive values $\alpha_k$ down the diagonal of $D_\alpha$, in descending order, are the singular values. The SVD is related to the more well-known eigenvalue–eigenvector decomposition (or eigendecomposition) of a square symmetric matrix as follows: $SS^T$ and $S^TS$ are square symmetric matrices which have eigendecompositions $SS^T = U D_\alpha^2 U^T$ and $S^TS = V D_\alpha^2 V^T$, so the singular vectors are also eigenvectors of these respective matrices, and the singular values are the square roots of their eigenvalues. The practical utility of the SVD is that if one constructs another I × J matrix $S_{(m)}$ from the first m columns of U and V and the first m singular values in $D_\alpha$, namely $S_{(m)} = U_{(m)} D_{\alpha(m)} V_{(m)}^T$, then $S_{(m)}$ is the least-squares rank m approximation of S (this result is known as the Eckart–Young theorem). Since the objective of finding low-dimensional best-fitting subspaces coincides with the objective of finding low-rank matrix approximations by least squares, the SVD solves our problem completely and in a very compact way. The only adaptation needed is to incorporate the weighting of the rows and columns by the masses into the SVD so that the approximations are by weighted least squares. If a generalized form of the SVD is defined, where the singular vectors are normalized with weighting by the masses, then the CA solution can be obtained in one step. For example, the generalized SVD of the contingency ratios $p_{ij}/(r_i c_j)$, elements of the matrix $D_r^{-1} P D_c^{-1}$, centred at the constant value 1, leads to the standard row and column coordinates directly:
$$D_r^{-1} P D_c^{-1} - \mathbf{1}\mathbf{1}^T = \Phi D_\alpha \Gamma^T \quad \text{where} \quad \Phi^T D_r \Phi = \Gamma^T D_c \Gamma = I \qquad (A.13)$$
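A small numerical illustration of the Eckart–Young property, as a sketch only: X is an arbitrary random matrix, and the residual of its rank-2 SVD reconstruction is compared with the discarded singular values.

```r
# Sketch of the Eckart-Young property: the rank-m reconstruction from the
# SVD is the least-squares rank-m approximation.
set.seed(1)
X   <- matrix(rnorm(5 * 4), nrow = 5)   # arbitrary matrix
dec <- svd(X)
m   <- 2
Xm  <- dec$u[, 1:m] %*% diag(dec$d[1:m]) %*% t(dec$v[, 1:m])
sum((X - Xm)^2)        # residual sum of squares ...
sum(dec$d[-(1:m)]^2)   # ... equals the sum of the discarded squared singular values
```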

The bilinear CA model

From steps 1 to 4 of the basic algorithm, the data in P can be written as follows (see also (13.4) on page 101 and (14.9) on page 109):
$$p_{ij} = r_i c_j \left(1 + \sum_{k=1}^{K} \sqrt{\lambda_k}\,\phi_{ik}\gamma_{jk}\right) \qquad (A.14)$$
(also called the reconstitution formula). In matrix notation:
$$P = D_r (\mathbf{1}\mathbf{1}^T + \Phi D_\lambda^{1/2} \Gamma^T) D_c \qquad (A.15)$$
Because of the simple relations (A.8) and (A.9) between the principal and standard coordinates, this bilinear model can be written in several alternative ways — see also (14.10) and (14.11) on pages 109–110.

Transition equations between rows and columns

The left and right singular vectors are related linearly, for example by multiplying the SVD on the right by V: $SV = UD_\alpha$. Expressing such relations in terms of the principal and standard coordinates gives the following variations of the same theme, called transition equations (see (14.1) & (14.2) and (14.5) & (14.6) for the equivalent scalar versions):

Principal as a function of standard (barycentric relationships):
$$F = D_r^{-1} P \Gamma \qquad G = D_c^{-1} P^T \Phi \qquad (A.16)$$

Principal as a function of principal:
$$F = D_r^{-1} P G D_\lambda^{-1/2} \qquad G = D_c^{-1} P^T F D_\lambda^{-1/2} \qquad (A.17)$$

The equations (A.16) are those that were mentioned as early as Chapter 3, which express the profile points as weighted averages of the vertex points, where the weights are the profile elements. These are the equations that govern the asymmetric maps. The equations (A.17) show that the two sets of principal coordinates, which govern the symmetric map, are also related by a barycentric (weighted average) relationship, but with scale factors (the inverse square roots of the principal inertias) that are different on each axis.
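The barycentric relationships (A.16) can be checked numerically; this sketch assumes the objects P, r, c., Phi, Gam, F and G from the sketch of Steps 1–7 above are still in the workspace, and restricts the comparison to the K nontrivial dimensions.

```r
# Numerical check of the barycentric transition equations (A.16),
# assuming P, r, c., Phi, Gam, F, G from the sketch of Steps 1-7.
# The comparison is restricted to the K nontrivial dimensions.
K <- min(dim(P)) - 1
max(abs(F[, 1:K] - (diag(1/r)  %*% P    %*% Gam)[, 1:K]))   # ~ 0
max(abs(G[, 1:K] - (diag(1/c.) %*% t(P) %*% Phi)[, 1:K]))   # ~ 0
```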

Supplementary points

The transition equations are used to situate supplementary points on the map. For example, given a supplementary column point with values in h (I × 1), divide by its total $\mathbf{1}^T h$ to obtain the column profile $\tilde{h} = (1/\mathbf{1}^T h)\,h$ and then use the profile transposed as a row vector in the second equation of (A.16), for example, to calculate the coordinates g of the supplementary column:
$$g = \tilde{h}^T \Phi \qquad (A.18)$$
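A sketch of the calculation in (A.18), again assuming Phi from the sketch of Steps 1–7; the supplementary column h is arbitrary illustrative data.

```r
# Coordinates of a supplementary column (A.18), assuming Phi from the
# sketch of Steps 1-7; h holds arbitrary illustrative counts.
h       <- c(12, 18, 25, 30, 15)
h_tilde <- h / sum(h)                      # supplementary column profile
g_sup   <- as.vector(t(h_tilde) %*% Phi)   # (A.18); only the first
g_sup                                      # min(I-1, J-1) entries are meaningful
```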

Total inertia and χ2-distances

The total inertia of the data matrix is the sum of squares of the matrix S in (A.4):
$$\text{inertia} = \text{trace}(SS^T) = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(p_{ij} - r_i c_j)^2}{r_i c_j} \qquad (A.19)$$

The inertia is also the sum of squares of the singular values, i.e., the sum of the eigenvalues:
$$\text{inertia} = \sum_{k=1}^{K} \alpha_k^2 = \sum_{k=1}^{K} \lambda_k \qquad (A.20)$$
The χ2-distances between row profiles and between column profiles are:
$$\chi^2\text{-distance between rows } i \text{ and } i':\quad \sqrt{\sum_{j=1}^{J}\left(\frac{p_{ij}}{r_i} - \frac{p_{i'j}}{r_{i'}}\right)^2 / c_j} \qquad (A.21)$$
$$\chi^2\text{-distance between columns } j \text{ and } j':\quad \sqrt{\sum_{i=1}^{I}\left(\frac{p_{ij}}{c_j} - \frac{p_{ij'}}{c_{j'}}\right)^2 / r_i} \qquad (A.22)$$
To write the full set of χ2-distances in the form of a square symmetric matrix requires a bit more work. First, calculate the matrix A of “χ2 scalar products” between row profiles, for example, as:
$$\chi^2 \text{ scalar products between rows}: \quad A = D_r^{-1} P D_c^{-1} P^T D_r^{-1} \qquad (A.23)$$
Then define the vector a as the elements on the diagonal of this matrix (i.e., the scalar products of the row profiles with themselves):
$$a = \text{diag}(A) \qquad (A.24)$$
Then the I × I matrix of squared χ2-distances is:
$$\text{squared } \chi^2\text{-distance matrix between rows}: \quad a\mathbf{1}^T + \mathbf{1}a^T - 2A \qquad (A.25)$$
To calculate the J × J matrix of squared χ2-distances between column profiles, interchange rows with columns in (A.23), defining A as $D_c^{-1} P^T D_r^{-1} P D_c^{-1}$, and then follow with (A.24) and (A.25).
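The following sketch computes the matrix of squared χ2-distances between row profiles via (A.23)–(A.25), assuming P, r, c. and F from the sketch of Steps 1–7; as a check, these values coincide with the squared Euclidean distances between the rows of the full matrix F of principal coordinates.

```r
# Squared chi-square distances between row profiles via (A.23)-(A.25),
# assuming P, r, c. and F from the sketch of Steps 1-7.
A     <- diag(1/r) %*% P %*% diag(1/c.) %*% t(P) %*% diag(1/r)   # (A.23)
a     <- diag(A)                                                 # (A.24)
one   <- rep(1, length(a))
D2row <- a %o% one + one %o% a - 2 * A                           # (A.25)
# Check: equal to the squared Euclidean distances between the rows of F
max(abs(D2row - as.matrix(dist(F))^2))                           # ~ 0
```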

Contributions of points to principal inertias

The contributions of the row and column points to the inertia on the k-th dimension are the inertia components:
$$\text{for row } i: \ \frac{r_i f_{ik}^2}{\lambda_k} = r_i \phi_{ik}^2 \qquad \text{for column } j: \ \frac{c_j g_{jk}^2}{\lambda_k} = c_j \gamma_{jk}^2 \qquad (A.26)$$
recalling the relationship between principal and standard coordinates given in (A.8) and (A.9): $f_{ik} = \sqrt{\lambda_k}\,\phi_{ik}$, $g_{jk} = \sqrt{\lambda_k}\,\gamma_{jk}$ (notice that the square roots of the values in (A.26) are exactly the coordinates proposed for the standard CA of Chapter 13, which shows that the squared lengths of these coordinates are the contributions to the principal axes).

Contributions of principal axes to point inertias (squared correlations)

The contributions of the dimensions to the inertia of the i-th row and j-th column points (i.e., the squared cosines or squared correlations) are:
$$\text{for row } i: \ \frac{f_{ik}^2}{\sum_k f_{ik}^2} \qquad \text{for column } j: \ \frac{g_{jk}^2}{\sum_k g_{jk}^2} \qquad (A.27)$$
As shown in Chapter 11, the denominators in (A.27) are the squared χ2-distances between the corresponding profile point and the average profile.
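A sketch of (A.26) and (A.27) for the rows, again assuming P, r, F and lambda from the sketch of Steps 1–7 and keeping only the K nontrivial dimensions.

```r
# Contributions of rows to the principal axes (A.26) and squared
# correlations (A.27), assuming P, r, F and lambda from the sketch of
# Steps 1-7; only the K nontrivial dimensions are kept.
K       <- min(dim(P)) - 1
ctr_row <- sweep(r * F[, 1:K]^2, 2, lambda[1:K], "/")   # (A.26)
cor_row <- F[, 1:K]^2 / rowSums(F[, 1:K]^2)             # (A.27)
round(colSums(ctr_row), 6)   # contributions to each axis sum to 1
round(rowSums(cor_row), 6)   # squared correlations of each row sum to 1
```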

Ward clustering of row or column profiles

The clustering of Chapter 15 is described here in terms of the rows; exactly the same applies to the clustering of the columns. The rows are clustered at each step of the algorithm to minimize the decrease in the χ2 statistic (equivalently, the decrease in the inertia, since inertia = χ2/n, where n is the total of the table). This clustering criterion is equivalent to Ward clustering, where each cluster is weighted by the total mass of its members. The measure of difference between rows can be shown to be the weighted form of the squared chi-squared distance between profiles. Suppose $a_i$ and $r_i$, $i = 1, \ldots, I$, denote the I row profiles of the data matrix and their masses, respectively. Then identifying the pair that gives the least decrease in inertia is equivalent to looking for the pair of rows $(i, i')$ which minimize the following measure:

$$\frac{r_i r_{i'}}{r_i + r_{i'}} \, \| a_i - a_{i'} \|_c^2 \qquad (A.28)$$
The two rows are then merged by summing their frequencies, and the profile and mass are recalculated. The same measure of difference as (A.28) is calculated at each stage of the clustering for the row profiles at that stage (see (15.2) on page 120 for the equivalent formula based on profiles of clusters), and the two profiles with the least difference are merged. So (A.28) is the level of clustering in terms of the inertia decrease, or if multiplied by n the decrease in χ2. In the case of a contingency table the level of clustering can be tested for significance using the tables at the end of this appendix.
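A sketch of the merging criterion (A.28), computed for every pair of rows, assuming P, r and c. from the sketch of Steps 1–7; the function ward_pair() is an illustrative helper, not part of any package.

```r
# The Ward-type merging criterion (A.28) for every pair of row profiles,
# assuming P, r, c. from the sketch of Steps 1-7.
prof <- P / r                                  # row profiles a_i
ward_pair <- function(i, ii) {
  d2 <- sum((prof[i, ] - prof[ii, ])^2 / c.)   # squared chi-square distance
  r[i] * r[ii] / (r[i] + r[ii]) * d2           # (A.28)
}
pairs <- combn(nrow(P), 2)
crit  <- apply(pairs, 2, function(p) ward_pair(p[1], p[2]))
pairs[, which.min(crit)]   # the pair of rows that would be merged first
```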

Stacked tables

Suppose tables $N_{qs}$, $q = 1, \ldots, Q$, $s = 1, \ldots, S$, are concatenated row- and/or columnwise to make a block matrix N. If the marginal frequencies are the same in each row and in each column (as is the case when the same individuals are cross-tabulated separately in several tables), then the inertia of N is the average of the separate inertias of the tables $N_{qs}$:

$$\text{inertia}(N) = \frac{1}{QS} \sum_{q=1}^{Q} \sum_{s=1}^{S} \text{inertia}(N_{qs}) \qquad (A.29)$$

Multiple CA

Suppose the original matrix of categorical data is N × Q, i.e., N cases and Q variables. Classical multiple CA (MCA) has two forms. The first form converts the cases-by-variables data to an indicator matrix Z where the categorical data have been recoded as dummy variables. If the q-th variable has $J_q$ categories, this indicator matrix will have $J = \sum_q J_q$ columns (see Chapter 18, Exhibit 18.1 for an example). Then the indicator version of MCA is the application of the basic CA algorithm to the matrix Z, resulting in coordinates for the N cases and the J categories. The second form of MCA calculates the Burt matrix $B = Z^T Z$ of all two-way cross-tabulations of the Q variables (see Chapter 18, Exhibit 18.4 for an example). Then the Burt version of MCA is the application of the basic CA algorithm to the matrix B, resulting in coordinates for the J categories (B is a symmetric matrix). The standard coordinates of the categories are identical in the two versions of MCA, and the principal inertias in the Burt version are the squares of those in the indicator version.
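A minimal sketch of the two coding steps, using an arbitrary toy data set with Q = 3 categorical variables and N = 6 cases; the helper ca_inertias() simply repeats Steps 1, 2 and 7 of the basic algorithm. The final two lines illustrate the squaring relationship between the two versions of MCA.

```r
# Toy example: indicator matrix Z and Burt matrix B for Q = 3 categorical
# variables observed on N = 6 cases, and a check that the Burt principal
# inertias are the squares of the indicator ones.
dat <- data.frame(v1 = factor(c("a", "b", "a", "c", "b", "c")),
                  v2 = factor(c("x", "x", "y", "y", "x", "y")),
                  v3 = factor(c("p", "q", "q", "p", "q", "p")))
Z <- cbind(model.matrix(~ v1 - 1, dat),    # one dummy column per category,
           model.matrix(~ v2 - 1, dat),    # so Z has J = 3 + 2 + 2 = 7 columns
           model.matrix(~ v3 - 1, dat))
B <- t(Z) %*% Z                            # Burt matrix of all two-way tables
ca_inertias <- function(N) {               # Steps 1, 2 and 7 of the basic algorithm
  P <- N / sum(N); r <- rowSums(P); c. <- colSums(P)
  S <- diag(1/sqrt(r)) %*% (P - r %o% c.) %*% diag(1/sqrt(c.))
  svd(S)$d^2
}
nd <- ncol(Z) - 3                          # J - Q nontrivial dimensions
round(ca_inertias(B)[1:nd], 5)             # Burt version
round(ca_inertias(Z)[1:nd]^2, 5)           # squares of the indicator version
```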

Joint CA

Joint CA (JCA) is the fitting of the off-diagonal cross-tabulations of the Burt matrix, ignoring the cross-tabulations on the block diagonal. The algorithm we use is an alternating least-squares procedure which successively applies CA to the Burt matrix which has been modified by replacing the values on the block diagonal with estimated values from the CA of the previous iteration. The algorithm itself is explained in more detail in the Computational Appendix. On convergence of the JCA algorithm, the CA is performed on the last modified Burt matrix, $\tilde{B}$, which has its diagonal blocks perfectly fitted by construction. In other words, supposing that the solution requested is two-dimensional, then the modified diagonal blocks satisfy (A.14) exactly using just two terms in the bilinear CA model (or reconstitution formula).

Percentage of inertia explained in JCA

Hence the total inertia of $\tilde{B}$ includes a part ∆ for these diagonal blocks, and so do the first two principal inertias, $\tilde{\lambda}_1$ and $\tilde{\lambda}_2$, which perfectly explain the part ∆. To obtain the percentage of inertia explained by the two-dimensional solution, the amount ∆ has to be discounted both from the total and from the sum of the two principal inertias. The value of ∆ can be obtained via the difference between the inertia of the original Burt matrix B (whose diagonal inertias are known) and the modified one $\tilde{B}$, as follows (here we use the result (A.29), which applies to the subtables of B, denoted by $B_{qs}$, and to those of $\tilde{B}$, whose off-diagonal tables are the same):

$$\text{inertia}(B) = \frac{1}{Q^2}\left(\sum_{q \neq s} \text{inertia}(B_{qs}) + \sum_{q} \text{inertia}(B_{qq})\right)$$
$$\phantom{\text{inertia}(B)} = \frac{1}{Q^2}\left(\sum_{q \neq s} \text{inertia}(B_{qs}) + (J - Q)\right)$$
$$\text{inertia}(\tilde{B}) = \frac{1}{Q^2}\sum_{q \neq s} \text{inertia}(B_{qs}) + \Delta$$

Subtracting the above leads to:
$$\text{inertia}(B) - \text{inertia}(\tilde{B}) = \frac{J - Q}{Q^2} - \Delta \qquad (A.30)$$
which gives the value of ∆:

$$\Delta = \frac{J - Q}{Q^2} - \left(\text{inertia}(B) - \text{inertia}(\tilde{B})\right) \qquad (A.31)$$
Discounting this amount from the total and from the sum of the principal inertias (assuming a two-dimensional solution) gives the percentage of inertia explained by the JCA solution:

$$100 \times \frac{\tilde{\lambda}_1 + \tilde{\lambda}_2 - \Delta}{\text{inertia}(\tilde{B}) - \Delta} \qquad (A.32)$$

Contributions in JCA

The previous section showed how to discount the extra inertia that results from the modified diagonal blocks of the Burt matrix in JCA. There is an identical situation at the level of each point. Each category point j has an additional amount of inertia, $\delta_j$, due to the modified diagonal blocks. In the case of the original Burt matrix B we know exactly what this extra amount is due to the diagonal matrices in the diagonal blocks: for the j-th point it is $(1 - Qc_j)/Q^2$, where $c_j$ is the j-th mass (summing these values for $j = 1, \ldots, J$, we obtain $(J - Q)/Q^2$, which was the total additional amount due to the diagonal blocks of B). Therefore, just as above, we can derive the contributions of the two-dimensional solution to the point inertias as follows:

$$\text{inertia}(j\text{-th category of } B) = \text{off-diagonal components} + \frac{1 - Qc_j}{Q^2}$$
$$\text{inertia}(j\text{-th category of } \tilde{B}) = \text{off-diagonal components} + \delta_j$$

Subtracting the above (the “off-diagonal components” are the same) leads to:
$$\text{inertia}(j\text{-th category of } B) - \text{inertia}(j\text{-th category of } \tilde{B}) = \frac{1 - Qc_j}{Q^2} - \delta_j$$

which gives the value of δj:

$$\delta_j = \frac{1 - Qc_j}{Q^2} - \left(\text{inertia}(j\text{-th category of } B) - \text{inertia}(j\text{-th category of } \tilde{B})\right) \qquad (A.33)$$
Discounting this amount from the j-th category's inertia and similarly from the sum of the components of inertia in two dimensions gives the relative contributions (qualities) with respect to the two-dimensional JCA solution:
$$\frac{c_j \tilde{g}_{j1}^2 + c_j \tilde{g}_{j2}^2 - \delta_j}{\left(\sum_k c_j \tilde{g}_{jk}^2\right) - \delta_j} \qquad (A.34)$$
where $\tilde{g}_{jk}$ is the principal coordinate of category j on axis k in the CA of $\tilde{B}$ (the JCA solution), and the summation in the denominator is over all the dimensions. Notice that $\sum_j \delta_j = \Delta$ (i.e., summing (A.33) gives (A.31)).

Adjusted inertias in MCA

The MCA solution can be adjusted to optimize the fit to the off-diagonal tables (this could be called a JCA conditional on the MCA solution). The optimal adjustments can be determined by weighted least-squares, as described in Chapter 19, but the problem is that the solution is not nested. So we prefer slightly sub-optimal adjustments which retain the nesting property and are very easy to compute from the MCA solution of the Burt matrix. The adjustments are made as follows (see Chapter 19, pages 140–141, for an illustration):

Adjusted total inertia of the Burt matrix:
$$\text{adjusted total inertia} = \frac{Q}{Q - 1} \times \left(\text{inertia of } B - \frac{J - Q}{Q^2}\right) \qquad (A.35)$$

Adjusted principal inertias (eigenvalues) of the Burt matrix:
$$\lambda_k^{\text{adj}} = \left(\frac{Q}{Q - 1}\right)^2 \times \left(\sqrt{\lambda_k} - \frac{1}{Q}\right)^2, \qquad k = 1, 2, \ldots \qquad (A.36)$$
Here $\lambda_k$ refers to the k-th principal inertia of the Burt matrix; hence $\sqrt{\lambda_k}$ is the k-th principal inertia of the indicator matrix. The adjustments are made only to those dimensions for which $\sqrt{\lambda_k} > 1/Q$, and no further dimensions are used — hence percentages of inertia do not add up to 100%. It can be proved that these percentages are lower bound estimates of those that are obtained in a JCA, and in practice they are close to the JCA percentages.
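Continuing the MCA sketch above (it assumes the objects Z, B and the helper ca_inertias() from that sketch), the adjustments (A.35) and (A.36) can be computed as follows.

```r
# Adjusted inertias (A.35)-(A.36), assuming Z, B and ca_inertias() from
# the MCA sketch above.
Q <- 3                                   # number of variables
J <- ncol(B)                             # total number of categories
lamB      <- ca_inertias(B)              # principal inertias of the Burt matrix
adj_total <- Q / (Q - 1) * (sum(lamB) - (J - Q) / Q^2)        # (A.35)
keep      <- sqrt(lamB) > 1 / Q          # only these dimensions are adjusted
adj_lam   <- (Q / (Q - 1))^2 * (sqrt(lamB[keep]) - 1 / Q)^2   # (A.36)
round(100 * adj_lam / adj_total, 1)      # adjusted percentages (need not sum to 100)
```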

Subset CA and subset MCA

Subset CA is simply the application of the same CA algorithm to a selected part of the standardized residual matrix S in (A.4) (not to the subset of the original matrix). The masses of the full matrix are thus retained and all subsequent calculations are the same, except that they are applied to the subset. Suppose that the columns are subsetted, but not the rows. Then the rows still maintain the centring property of CA, i.e., their weighted averages are at the origin of the map, whereas the columns are no longer centred. Subset MCA is performed by applying subset CA to a submatrix of the indicator matrix or the Burt matrix. In the case of the Burt matrix, a selection of categories implies that this subset has to be specified for both the rows and the columns.
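A sketch of subset CA on a selection of columns, assuming the matrix S and the masses r and c. from the sketch of Steps 1–7; note that the full-matrix masses are retained.

```r
# Subset CA on a selection of columns of S, assuming S, r and c. from the
# sketch of Steps 1-7; the masses of the full matrix are retained.
cols   <- c(1, 3)                                    # arbitrary column subset
dec_s  <- svd(S[, cols])
F_sub  <- diag(1/sqrt(r)) %*% dec_s$u %*% diag(dec_s$d)          # row principal coords
G_sub  <- diag(1/sqrt(c.[cols])) %*% dec_s$v %*% diag(dec_s$d)   # column principal coords
lambda_sub <- dec_s$d^2                              # principal inertias of the subset
```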

Analysis of square asymmetric tables

If the data matrix N is square asymmetric, where both rows and columns refer to the same objects, then N can be written as the sum of symmetric and skew-symmetric parts:
$$N = \tfrac{1}{2}(N + N^T) + \tfrac{1}{2}(N - N^T) \qquad (A.37)$$
$$\phantom{N} = \text{symmetric} + \text{skew-symmetric}$$

CA is applied to each part separately, but with a slight variation for the skew-symmetric part. The analysis of the symmetric part $\tfrac{1}{2}(N + N^T)$ is the usual CA — this provides one set of coordinates, and the masses are the averages of the row and column masses corresponding to the same object: $w_i = \tfrac{1}{2}(r_i + c_i)$. The analysis of the skew-symmetric part $\tfrac{1}{2}(N - N^T)$ is the application of the CA algorithm without centring and using the same masses as in the symmetric analysis; i.e., the “standardized residuals” matrix of (A.4) is rather the “standardized differences” matrix
$$S = D_w^{-1/2}\left[\tfrac{1}{2}(P - P^T)\right] D_w^{-1/2} \qquad (A.38)$$
where P is the correspondence matrix and $D_w$ is the diagonal matrix of the masses $w_i$. As described in Chapter 22, both these analyses are subsumed in the ordinary CA of the block matrix
$$\begin{bmatrix} N & N^T \\ N^T & N \end{bmatrix} \qquad (A.39)$$

If N is an I × I matrix, then the 2I − 1 dimensions which emanate from this CA can be easily allocated to the symmetric and skew-symmetric solutions, since the symmetric dimensions have unique principal inertias while the skew-symmetric dimensions occur in pairs with equal principal inertias. Similarly, the coordinate vectors for each dimension have two parts: for dimensions corresponding to the symmetric analysis these are simply repeats of each other, while for dimensions corresponding to the skew-symmetric analysis these are repeats with a change of sign (see Chapter 22 for an example).
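A sketch of the block-matrix route (A.39) on an arbitrary 4 × 4 asymmetric table; the pattern of unique versus paired principal inertias described above can be read off the printed values.

```r
# CA of a square asymmetric table via the block matrix (A.39).  N is an
# arbitrary illustrative 4 x 4 table.
N    <- matrix(c( 0, 20,  5, 10,
                 35,  0, 15,  5,
                 10, 25,  0, 20,
                  5, 10, 30,  0), nrow = 4, byrow = TRUE)
blok <- rbind(cbind(N, t(N)),
              cbind(t(N), N))              # block matrix (A.39)
P    <- blok / sum(blok)
r    <- rowSums(P)                         # equal to the masses w, repeated twice
c.   <- colSums(P)
S    <- diag(1/sqrt(r)) %*% (P - r %o% c.) %*% diag(1/sqrt(c.))
round(svd(S)$d^2, 5)
# Unique principal inertias belong to the symmetric part; inertias that
# occur in equal pairs belong to the skew-symmetric part.
```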

Canonical CA

In canonical CA (CCA) an additional matrix X of explanatory (independent) variables is available, and the requirement is that the dimensions be linearly related to X. The total inertia is split into two parts: a part that is linearly related to the independent variables, called the inertia in the constrained space, and a part that is not, the inertia in the unconstrained space. CCA is necessarily an “asymmetric” method since X is an additional set of either rows or columns. The usual data structure is that the rows are sampling units and X is an additional set of M columns, i.e., I × M. There is a regression step in CCA which calculates the I × J constrained matrix, whose columns are linearly related to X. The difference between P and the constrained matrix is the unconstrained matrix, whose columns are not linearly related to X. CCA thus consists of applying CA to the constrained matrix and (optionally) to the unconstrained matrix. In each application the original row and column masses are maintained for all computations, and the various results such as coordinates, principal inertias, contributions, reconstruction formula, etc., are the same as in a regular application of CA. We assume that the columns of X are standardized, using the row masses as weights in the calculation of means and variances. If there are some categorical independent variables, these are coded as dummy variables, dropping one category of each variable as in a conventional regression analysis. The retained dummy variables are then also standardized in the usual way. The steps in CCA are as follows:

CCA Step 1 — Calculate the standardized residuals matrix S as in CA:
$$S = D_r^{-1/2}(P - rc^T)D_c^{-1/2} \qquad (A.40)$$
CCA Step 2 — Calculate the I × I projection matrix, of rank M, which projects onto the constrained space:
$$Q = D_r^{1/2} X (X^T D_r X)^{-1} X^T D_r^{1/2} \qquad (A.41)$$
CCA Step 3 — Project the standardized residuals to obtain the constrained matrix:
$$S^\star = QS \qquad (A.42)$$
CCA Step 4 — Apply CA Steps 1–6 above to $S^\star$.
CCA Step 5 — Principal inertias $\lambda_k^\star$ in the constrained space:
$$\lambda_k^\star = \alpha_k^2, \quad k = 1, 2, \ldots, K, \quad \text{where } K = \min\{I - 1, J - 1, M\} \qquad (A.43)$$

CCA Step 6 (optional) — Project the standardized residuals onto the unconstrained space:
$$S^\perp = (I - Q)S = S - S^\star \qquad (A.44)$$
CCA Step 7 (optional) — Apply CA Steps 1–6 to $S^\perp$.

As described in Chapter 24, the principal inertias in (A.43) can be expressed as percentages of the total inertia, or as percentages of the constrained inertia, which is the sum of squares of the elements in $S^\star$, equal to $\sum_k \lambda_k^\star$.
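A minimal sketch of CCA Steps 1–5 on arbitrary illustrative data; for simplicity X is standardized without the row-mass weighting described above, and the projection matrix is named Qp to avoid a clash with the Q used elsewhere in this appendix for the number of variables.

```r
# Sketch of CCA Steps 1-5 on arbitrary illustrative data: 6 sampling
# units, 4 response categories and M = 2 explanatory variables.
N  <- matrix(c(10,  5,  8,  2,
                6, 12,  4,  7,
                3,  9, 11,  6,
                8,  4,  7, 10,
                5,  7,  9,  4,
               12,  3,  6,  8), nrow = 6, byrow = TRUE)
X  <- scale(cbind(x1 = c(1.2, 0.5, -0.3, -1.1, 0.2, -0.5),   # unweighted
                  x2 = c(0.3, -0.8, 1.1, 0.4, -1.2, 0.2)))   # standardization
P  <- N / sum(N)
r  <- rowSums(P); c. <- colSums(P)
S  <- diag(1/sqrt(r)) %*% (P - r %o% c.) %*% diag(1/sqrt(c.))                           # (A.40)
Qp <- diag(sqrt(r)) %*% X %*% solve(t(X) %*% diag(r) %*% X) %*% t(X) %*% diag(sqrt(r))  # (A.41)
Sc <- Qp %*% S                          # (A.42): constrained matrix
round(svd(Sc)$d^2, 5)                   # (A.43): constrained principal inertias
Su <- S - Sc                            # (A.44): unconstrained matrix
```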

Tables for testing for significant clustering or significant dimensions

In the case of a contingency table based on a random sample, the first principal inertia can be tested for statistical significance. This is the same test as was used in the case of the Ward clustering of Chapter 15. In that case a critical level for clustering, on the χ2 scale, can be determined from the tables in Exhibit A.1 below, according to the size of the table (see page 119 for the food store example, a 5 × 4 table for which the critical point in Exhibit A.1 is 15.24). These critical points are the same for testing the first principal inertia for significance. For example, in the same example of the food stores, given in Exhibit 15.3, the first principal inertia was 0.02635, which, expressed as a χ2 component, is 0.02635 × 700 = 18.45. Since 18.45 is greater than the critical point 15.24, the first principal inertia is statistically significant (at the 5% level).

Exhibit A.1: Critical values for the multiple comparisons test on an I × J (or J × I) contingency table. The same critical points apply to testing the significance of a principal inertia. Significance level is 5%.

 I \ J     3      4      5      6      7      8      9     10     11
  3     8.59
  4    10.74  13.11
  5    12.68  15.24  17.52
  6    14.49  17.21  19.63  21.85
  7    16.21  19.09  21.62  23.95  26.14
  8    17.88  20.88  23.53  25.96  28.23  30.40
  9    19.49  22.62  25.37  27.88  30.24  32.48  34.63
 10    21.06  24.31  27.15  29.75  32.18  34.50  36.70  38.84
 11    22.61  25.96  28.90  31.57  34.08  36.45  38.72  40.91  43.04
 12    24.12  27.58  30.60  33.35  35.93  38.36  40.69  42.93  45.10
 13    25.61  29.17  32.27  35.09  37.73  40.22  42.60  44.90  47.12

Source: Pearson, E.S. & Hartley, H.O. (1972). Biometrika Tables for Statisticians, Volume 2: Table 51. Cambridge University Press, UK.