Overview
Multiple factor analysis: principal component analysis for multitable and multiblock data sets
Hervé Abdi¹, Lynne J. Williams², and Dominique Valentin³

∗Correspondence to: [email protected]
¹Department of Brain and Behavioral Sciences, The University of Texas at Dallas, Richardson, TX, USA
²Rotman Research Institute, Baycrest, Toronto, ON, Canada
³Université de Bourgogne, Dijon, France
Multiple factor analysis (MFA, also called multiple factorial analysis) is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations, or, alternatively, (in dual-MFA) multiple data tables where the same variables are measured on different sets of observations. MFA proceeds in two steps: First it computes a PCA of each data table and ‘normalizes’ each data table by dividing all its elements by the first singular value obtained from its PCA. Second, all the normalized data tables are aggregated into a grand data table that is analyzed via a (non-normalized) PCA that gives a set of factor scores for the observations and loadings for the variables. In addition, MFA provides for each data table a set of partial factor scores for the observations that reflects the specific ‘view-point’ of this data table. Interestingly, the common factor scores could be obtained by replacing the original normalized data tables by the normalized factor scores obtained from the PCA of each of these tables. In this article, we present MFA, review recent extensions, and illustrate it with a detailed example. © 2013 Wiley Periodicals, Inc.
How to cite this article: WIREs Comput Stat 2013. doi: 10.1002/wics.1246
Keywords: multiple factor analysis (MFA); multiple factorial analysis; multiblock correspondence analysis; STATIS; INDSCAL; multiblock barycentric discriminant analysis (MUDICA); multiple factor analysis barycentric discriminant analysis (MUFABADA); barycentric discriminant analysis (BADA); generalized Procrustes analysis (GPA); generalized singular value decomposition; principal component analysis; consensus PCA; multitable PCA; multiblock PCA
INTRODUCTION

Multiple factor analysis (MFA, also sometimes named ‘multiple factorial analysis’ to avoid confusion with Thurstone’s multiple factor analysis described in Ref 1) is a generalization of principal component analysis (PCA). Its goal is to analyze several data sets of variables collected on the same set of observations, or—as in its dual version—several sets of observations measured on the same set of variables (see Ref 2). As such, MFA is part of the multitable (also called multiblock or consensus analysis3–20) PCA family, which comprises related techniques such as STATIS, multiblock correspondence analysis (MUDICA), and SUM-PCA.

MFA is a recent technique (ca 1980) that originated from the work of the French statisticians Brigitte Escofier and Jérôme Pagès (see Refs 14,21,22 for an introduction, see Ref 23 for an example, and see Ref 24 for an extensive and comprehensive review). The goals of MFA are (1) to analyze several data sets measured on the same observations; (2) to provide a set of common factor scores (often called ‘compromise factor scores’); and (3) to project each of the original data sets onto the compromise to analyze communalities and discrepancies.

The main idea behind MFA is remarkably simple and akin to the idea behind the Z-score normalization, which makes variables comparable by dividing each element of a variable by that variable’s standard deviation (i.e., the square root of its variance). For a PCA, a notion similar to the standard deviation is the singular value, which is the square root of an eigenvalue (and an eigenvalue can be seen as a variance). So, in MFA, each data table is normalized by dividing all of its elements by the first singular value of this data table. This transformation ensures that the length (i.e., the singular value) of the first principal component of each data table is equal to 1 and, therefore, that no data table can dominate the common solution merely because it has a larger inertia on its first dimension.

MFA is a popular method for analyzing multiple sets of variables measured on the same observations, and it has recently been used in various domains such as sensory and consumer science research (a domain where MFA applications and developments have been particularly rich and varied, see Refs 9,21,25–40), chemometry and process monitoring,9,41–43 ecology,44–53 agriculture,54,55 broadcasting,56 geology,57 neuroimaging,4,5,58–60 medicine and health,61–64 genetics,54,65,66 statistical quality control,27,67–72 economy,73 and molecular biology,46,50 to name but a few. In addition to being used in several domains of application, MFA is also a vigorous domain of theoretical developments that are explored later in this article.

When to Use It

MFA is used when several sets of variables have been measured on the same set of observations. The number and/or nature of the variables used to describe the observations can vary from one set of variables to the other, but the observations should be the same in all the data sets.

For example, the data sets can be measurements taken on the same observations (individuals or objects, e.g., students) at different occasions (e.g., semesters). In this case, the first data set corresponds to the data collected at time 1 (e.g., the first semester), the second one to the data collected at time 2 (e.g., the second semester), and so on. The goal of the analysis, then, is to evaluate how the positions of the observations change over time (note, however, that MFA does not explicitly model the time variable, as it makes no assumptions about the relationships between the measurements).

In another example, the different data sets can be the same observations (e.g., wines) evaluated by different subjects (e.g., wine experts) or groups of subjects with different variables (e.g., each wine expert evaluates the wines with his/her own set of scales). In this case, the first data set corresponds to the first subject, the second one to the second subject, and so on. The goal of the analysis, then, is to evaluate if there is an agreement between the subjects or groups of subjects.

Alternatively, dual-MFA can be used when the same variables are measured on different populations or on different sets of participants. When both observations and variables are the same for all data tables, the technique could be called (by analogy with STATIS, see Ref 3) ‘partial triadic MFA’.

The Main Idea

The general idea behind MFA is to normalize each of the individual data sets so that their first principal components have the same length (as measured by the first singular value of each data table) and then to combine these data tables into a common representation of the observations, sometimes called the compromise, or the consensus. This compromise is obtained from a (non-normalized) PCA of the grand table obtained from the concatenation of the normalized data tables. This PCA decomposes the variance of the compromise into a set of new orthogonal variables (i.e., the principal components, also often called dimensions, axes, factors, or even latent variables) ordered by the amount of variance that each component explains. The coordinates of the observations on the components are called factor scores, and these can be used to plot maps of the observations in which the observations are represented as points such that the distances in the map best reflect the similarities between the observations. The positions of the observations ‘as seen by’ each data set are called partial factor scores and can also be represented as points in the compromise map. The average of the partial factor scores of all the tables gives back the factor scores of the compromise. A pictorial sketch of the technique is provided in Figure 1.

As the components are obtained by combining the original variables, each variable ‘contributes’ a certain amount to each component. This quantity, called the loading of a variable on a component, reflects the importance of that variable for this component and can also be used to plot maps of the variables that reflect their associations. Squared loadings can also be used to evaluate the importance of variables. A variation over squared loadings, called contributions, evaluates the importance of each variable as the proportion of the variance of the component explained by that variable. The contribution of a data table to a component can be obtained by adding the contributions of its variables. These contributions can then be used to draw plots expressing the importance of the data tables in the common solution.
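To make this two-step logic concrete, here is a minimal numerical sketch in Python with NumPy (the function and variable names are ours, not part of any published package; for simplicity it assumes unit masses, so a plain SVD applies—with equal masses 1/I the configuration only changes by a constant scale). Each preprocessed table is divided by its first singular value, the normalized tables are concatenated, and a plain (non-normalized) PCA via the SVD of the grand table yields the compromise factor scores:

```python
import numpy as np

def mfa(tables):
    """Minimal MFA sketch: 'tables' is a list of preprocessed I x J_k arrays."""
    # Step 1: normalize each table by its first singular value
    normalized = [X / np.linalg.svd(X, compute_uv=False)[0] for X in tables]
    # Step 2: concatenate and run a plain (non-normalized) PCA via the SVD
    X = np.hstack(normalized)
    P, delta, Qt = np.linalg.svd(X, full_matrices=False)
    F = P * delta              # compromise factor scores
    return F, delta, Qt.T      # scores, singular values, loadings

rng = np.random.default_rng(0)
tables = [rng.standard_normal((12, j)) for j in (6, 6, 5)]
# column preprocessing: center, then scale each column to unit sum of squares
tables = [(Y - Y.mean(0)) / np.linalg.norm(Y - Y.mean(0), axis=0) for Y in tables]
F, delta, Q = mfa(tables)
print(F[:, :2])                # factor scores on the first two dimensions
```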
[FIGURE 1 | The different steps of MFA. Step 1: K tables of J_[k] variables collected on the same I observations. Step 2: compute a generalized PCA of each of the K tables (γ₁ is the first singular value of each table). Step 3: normalize each table by dividing it by its first singular value. Step 4: concatenate the K normalized tables. Step 5: compute a generalized PCA of the concatenated table.]

NOTATIONS AND PRELIMINARIES

Matrices are denoted by boldface uppercase letters (e.g., X), vectors by boldface lowercase letters (e.g., q), and elements of vectors and matrices by italic lowercase letters with appropriate indices if needed (e.g., x_ij is an element of X). Blocks of variables (i.e., tables) are considered as sub-matrices of larger matrices and are represented in brackets separated by vertical bars (e.g., a matrix X made of two sub-matrices X_[1] and X_[2] is written X = [X_[1] | X_[2]]). The identity matrix is denoted by I, and a vector of ones is denoted by 1 (indices may be used to specify the dimensions if the context is ambiguous). The transpose of a matrix is denoted by ^T; the inverse of a matrix is denoted by ^−1. When applied to a square matrix, the diag operator takes the diagonal elements of this matrix and stores them into a column vector; when applied to a vector, the diag operator stores the elements of this vector on the diagonal elements of a diagonal matrix. The trace operator computes the sum of the diagonal elements of a square matrix. The vec operator transforms a matrix into a vector by stacking the elements of the matrix into a column vector. The standard product between matrices is implicitly denoted by simple juxtaposition, or by × when it needs to be explicitly stated (e.g., XY = X × Y is the product of matrices X and Y). The Hadamard or element-wise product is denoted by ∘ (e.g., X ∘ Y).

The raw data consist of K data sets collected on the same observations. Each data set is also called a table, a sub-table, or sometimes a block or a study (in this article we prefer the term table or, occasionally, block). The data for each table are stored in an I × J_[k] rectangular data matrix denoted by Y_[k], where I is the number of observations and J_[k] the number of variables collected on the observations for the kth table. The total number of variables is denoted by J (i.e., J = Σ_k J_[k]). Each data matrix is, in general, preprocessed (e.g., centered, normalized), and the preprocessed data matrices actually used in the analysis are denoted by X_[k] (the preprocessing steps are detailed below in the section More on Preprocessing). The K data matrices X_[k], each of dimensions I rows by J_[k] columns, are concatenated into the complete I by J data matrix denoted by X:

X = [X_[1] | ... | X_[k] | ... | X_[K]].  (1)

A mass, denoted by m_i, is assigned to each observation. These masses are collected in the mass vector, denoted by m, and in the diagonal elements of the mass matrix, denoted by M, which is obtained as

M = diag{m}.  (2)

Masses are non-negative elements whose sum equals one. Often, equal masses are chosen, with m_i = 1/I.

To each matrix X_[k], we associate its cross-product matrix, defined as

S_[k] = X_[k] X_[k]^T.  (3)

A cross-product matrix of a table expresses the pattern of relationships between the observations as seen by this table. Note that, because of the block structure of X, the cross-product of X can be expressed as

X X^T = [X_[1] | ... | X_[k] | ... | X_[K]] [X_[1] | ... | X_[k] | ... | X_[K]]^T = Σ_k X_[k] X_[k]^T = Σ_k S_[k].  (4)

Singular Value Decomposition and Generalized Singular Value Decomposition

MFA is part of the PCA family, and therefore its main analytical tools are the singular value decomposition (SVD) and the generalized singular value decomposition (GSVD) of a matrix (see Refs 74–80 for tutorials). We briefly describe these two methods below.

SVD

Recall that the SVD of a given I × J matrix X decomposes it into three matrices as

X = U Γ V^T  with  U^T U = V^T V = I,  (5)

where U is the I by L matrix of the normalized left singular vectors (with L being the rank of X), V the J by L matrix of the normalized right singular vectors, and Γ the L by L diagonal matrix of the L singular values; also, γ_ℓ, u_ℓ, and v_ℓ are, respectively, the ℓth singular value, left singular vector, and right singular vector. Matrices U and V are orthonormal matrices (i.e., U^T U = V^T V = I). The SVD is closely related to, and generalizes, the well-known eigendecomposition, because U is also the matrix of the normalized eigenvectors of X X^T, V is the matrix of the normalized eigenvectors of X^T X, and the singular values are the square roots of the eigenvalues of X X^T and X^T X (these two matrices have the same eigenvalues).

Key property: the SVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix of lower rank.

GSVD

The GSVD generalizes the SVD of a matrix by incorporating two additional positive definite matrices (recall that a positive definite matrix is a square symmetric matrix whose eigenvalues are all positive) that represent ‘constraints’ to be incorporated in the decomposition (formally, these matrices are constraints on the orthogonality of the singular vectors; see Refs 75,78 for more details). Specifically, let M denote an I by I positive definite matrix representing the ‘constraints’ imposed on the rows of an I by J matrix X, and A a J by J positive definite matrix representing the ‘constraints’ imposed on the columns of X. Matrix M is almost always a diagonal matrix of the ‘masses’ of the observations (i.e., the rows), whereas matrix A implements a metric on the variables and is often, but not always, diagonal. Obviously, when M = A = I, the GSVD reduces to the plain SVD. The GSVD of X, taking into account M and A, is expressed as (compare with Eq. (5)):

X = P Δ Q^T  with  P^T M P = Q^T A Q = I,  (6)

where P is the I by L matrix of the normalized left generalized singular vectors (with L being the rank of X), Q the J by L matrix of the normalized right generalized singular vectors, and Δ the L by L diagonal matrix of the L generalized singular values. The GSVD implements the whole class of generalized PCA, which includes (with a proper choice of the matrices M and A and of the preprocessing of X) techniques such as discriminant analysis, correspondence analysis, canonical variate analysis, etc. With the so-called ‘triplet notation’, which is used as a general framework to formalize multivariate techniques, the GSVD of X under the constraints imposed by M and A is equivalent to the statistical analysis of the triplet (X, A, M) (see Refs 81–85).

Key property: the GSVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix of lower rank under the constraints imposed by two positive definite matrices. The generalized singular vectors are orthonormal with respect to their respective matrices of constraints.
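Because everything that follows rests on the GSVD, it may help to see how it reduces to a plain SVD when M and A are diagonal, as they are in MFA. The sketch below (NumPy; the helper name gsvd is ours, not a standard library routine) implements this reduction and checks the two defining identities of Eq. (6):

```python
import numpy as np

def gsvd(X, M_diag, A_diag):
    """GSVD of X under diagonal constraint matrices M = diag(M_diag)
    and A = diag(A_diag): X = P Delta Q', with P' M P = Q' A Q = I."""
    # Transform to ordinary SVD space: X_tilde = M^(1/2) X A^(1/2)
    Xt = np.sqrt(M_diag)[:, None] * X * np.sqrt(A_diag)[None, :]
    U, delta, Vt = np.linalg.svd(Xt, full_matrices=False)
    # Back-transform the singular vectors
    P = U / np.sqrt(M_diag)[:, None]
    Q = Vt.T / np.sqrt(A_diag)[:, None]
    return P, delta, Q

# Quick check of the defining identities on random data
rng = np.random.default_rng(1)
X = rng.standard_normal((12, 7))
m = np.full(12, 1 / 12)                 # equal masses
a = np.repeat([0.5, 0.25], [3, 4])      # block-wise alpha-style weights
P, delta, Q = gsvd(X, m, a)
assert np.allclose(P.T @ np.diag(m) @ P, np.eye(len(delta)), atol=1e-10)
assert np.allclose(P @ np.diag(delta) @ Q.T, X, atol=1e-10)
```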
THE DIFFERENT STEPS OF MFA

MFA comprises three main steps. In the first step, a PCA of each data table is performed and the first singular value of each table is recorded. In the second step, a grand matrix is created by concatenating all the data tables, and a non-normalized generalized PCA is performed by decomposing the grand matrix with a GSVD in which the column weights are obtained from the first (squared) singular value of each data table. An equivalent way of performing this second step is to divide all the elements of a data table by the table’s first singular value, concatenate the data tables into a grand data table, and then perform a non-normalized PCA of this grand data table. In the third step, the observation partial factor scores for each table are computed by projecting each data table onto the common space.

As MFA boils down to the PCA of the grand data table, the usual PCA indices can be computed to identify the important components, observations, and variables (see Ref 75). In addition, some indices—specific to MFA—can also be derived to quantify the importance of each table in the common solution.

Step 1: PCA of Each Data Table

In the first step of MFA, each data table is analyzed via a standard PCA. Specifically, each table is expressed via its SVD as

X_[k] = U_[k] Γ_[k] V_[k]^T  with  U_[k]^T U_[k] = V_[k]^T V_[k] = I.  (7)

From the SVD of each table we also obtain its factor scores (as in any standard PCA), which are computed as

G_[k] = U_[k] Γ_[k].  (8)

The matrices U_[k] and V_[k] store (respectively) the left and right singular vectors of the table X_[k], whose singular values are stored on the diagonal of the (diagonal) matrix Γ_[k]:

Γ_[k] = diag{γ_{1,k}, ..., γ_{ℓ,k}, ..., γ_{L,k}}.  (9)

In MFA, the weight of a table is obtained from the first singular value of its PCA. This weight, denoted by α_k, is equal to the inverse of the first squared singular value:

α_k = 1/γ_{1,k}² = γ_{1,k}^{−2}.  (10)

For convenience, the α weights can be gathered in a J by 1 vector denoted a, in which each variable is assigned the α weight of the matrix to which it belongs. Specifically, a is constructed as

a = [α₁ 1_[1]^T, ..., α_k 1_[k]^T, ..., α_K 1_[K]^T]^T,  (11)

where 1_[k] stands for a J_[k] vector of ones. Alternatively, the weights can be stored as the diagonal elements of a diagonal matrix denoted by A, obtained as

A = diag{a} = diag{α₁ 1_[1]^T, ..., α_k 1_[k]^T, ..., α_K 1_[K]^T}.  (12)
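As a quick illustration of Eqs (10)–(12), the fragment below (NumPy; names are ours) computes the α weight of each table from its first singular value and expands the weights into the J-long vector a and the diagonal matrix A:

```python
import numpy as np

rng = np.random.default_rng(2)
tables = [rng.standard_normal((12, j)) for j in (6, 6, 5)]

# alpha_k = 1 / gamma_{1,k}^2  (Eq. 10)
alphas = [1.0 / np.linalg.svd(Xk, compute_uv=False)[0] ** 2 for Xk in tables]

# a: each variable inherits the alpha of its table (Eq. 11)
a = np.concatenate([np.full(Xk.shape[1], ak) for Xk, ak in zip(tables, alphas)])
A = np.diag(a)  # Eq. 12
```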
Step 2: Generalized PCA of X

GSVD of X

After the weights have been collected, they are used to compute the GSVD of X under the constraints provided by M (masses for the observations) and A (squared-singular-value-derived weights for the K tables). This GSVD is expressed as

X = P Δ Q^T  with  P^T M P = Q^T A Q = I.  (13)

This GSVD corresponds to a generalized PCA of matrix X and, consequently, provides factor scores to describe the observations and factor loadings to describe the variables. Each column of P and Q refers to a principal component, also called a dimension (because the numbers in these columns are often used as coordinates to plot maps; see Ref 75 for more details). In PCA, Eq. (13) is often rewritten as

X = F Q^T  with  F = P Δ,  (14)

where F stores the factor scores (describing the observations) and Q stores the loadings (describing the variables). Note, incidentally, that in the triplet notation, MFA is equivalent to the statistical analysis of the triplet (X, A, M).

Because the matrix X concatenates K tables, each of them, in turn, comprising J_[k] variables, the matrix Q of the right singular vectors can be partitioned in the same way as X. Specifically, Q can be expressed as a column block matrix:

Q^T = [Q_[1]^T | ... | Q_[k]^T | ... | Q_[K]^T],  (15)

where Q_[k] is a J_[k] by L (with L being the rank of X) matrix storing the right singular vectors corresponding to the variables of matrix X_[k]. With this in mind, Eq. (13) can be re-expressed as

X = [X_[1] | ... | X_[k] | ... | X_[K]] = P Δ Q^T = P Δ [Q_[1]^T | ... | Q_[k]^T | ... | Q_[K]^T] = [P Δ Q_[1]^T | ... | P Δ Q_[k]^T | ... | P Δ Q_[K]^T].  (16)

Note that the pattern in Eq. (13) does not completely generalize to Eq. (16) because, if we define A_[k] as

A_[k] = α_k I,  (17)

we have, in general, Q_[k]^T A_[k] Q_[k] ≠ I.

Factor Scores

The factor scores for X represent a compromise (i.e., a common representation) for the set of the K matrices. Recall that these compromise factor scores are computed (cf. Eqs (13) and (14)) as

F = P Δ.  (18)

Factor scores can be used to plot the observations, as is done in standard PCA, with each column of F representing a dimension. Note that the variance of the factor scores of the observations is computed using their masses (stored in matrix M) and can be found as the diagonal of the matrix F^T M F. This variance is equal, for each dimension, to the square of the singular value of this dimension, as shown by

F^T M F = Δ P^T M P Δ = Δ².  (19)

As in standard PCA, F can be obtained from X by combining Eqs (13) and (18) to get

F = P Δ = X A Q.  (20)

Taking into account the block structure of X, A, and Q, Eq. (20) can also be rewritten as (cf. Eq. (17)):

F = X A Q = [X_[1] | ... | X_[k] | ... | X_[K]] × A × [Q_[1]^T | ... | Q_[k]^T | ... | Q_[K]^T]^T = Σ_k X_[k] A_[k] Q_[k] = Σ_k α_k X_[k] Q_[k].  (21)

This equation suggests that the partial factor scores for a table can be defined from the projection of this table onto its right singular vectors (i.e., Q_[k]). Specifically, the partial factor scores for the kth table are stored in a matrix denoted by F_[k], computed as

F_[k] = K × α_k × X_[k] Q_[k].  (22)

Note that the compromise factor scores matrix is the barycenter (also called centroid or center of gravity, see Ref 86) of the partial factor scores, because it is the average of all K partial factor scores (cf. Eq. (21)):

(1/K) Σ_k F_[k] = (1/K) Σ_k K α_k X_[k] Q_[k] = Σ_k α_k X_[k] Q_[k] = F.  (23)

Also, as in standard PCA, the elements of Q are loadings and can be plotted either on their own or along with the factor scores as a biplot (see Refs 87,88). As the loadings come in blocks (i.e., the loadings correspond to the variables of a table), it makes sense to create a biplot with the partial factor scores (i.e., F_[k]) for a block and the loadings (i.e., Q_[k]) for this block. In doing so, it is often practical to normalize the loadings such that their variance is commensurable with the variance of the factor scores. This can be achieved, for example, by normalizing, for each dimension, the loadings of a block such that their variance is equal to the square of the singular value of the dimension, or even to the singular value itself (as illustrated in the example that we present in a following section). These biplots are helpful for understanding the statistical structure of each block, even though the relative positions of the factor scores and the loadings are not directly interpretable, because only the projections of the observations on the loading vectors can be meaningfully interpreted in a biplot (cf. Refs 87,88).
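Equations (20)–(23) are easy to check numerically. The sketch below (NumPy; names are ours, random data for illustration) computes the GSVD-based compromise factor scores, the partial factor scores of Eq. (22), and verifies the barycentric property of Eq. (23):

```python
import numpy as np

rng = np.random.default_rng(3)
I, sizes = 12, (6, 6, 5)
tables = [rng.standard_normal((I, j)) for j in sizes]
tables = [X - X.mean(0) for X in tables]          # center columns

alphas = [1 / np.linalg.svd(X, compute_uv=False)[0] ** 2 for X in tables]
a = np.concatenate([np.full(X.shape[1], al) for X, al in zip(tables, alphas)])
m = np.full(I, 1 / I)

# GSVD of the concatenated table via the weighted-SVD trick
X = np.hstack(tables)
Xt = np.sqrt(m)[:, None] * X * np.sqrt(a)[None, :]
U, delta, Vt = np.linalg.svd(Xt, full_matrices=False)
P = U / np.sqrt(m)[:, None]
Q = (Vt / np.sqrt(a)[None, :]).T
F = P * delta                                     # compromise factor scores (Eq. 18)

# Partial factor scores (Eq. 22) and their barycenter (Eq. 23)
K = len(tables)
starts = np.cumsum([0, *sizes])
F_k = [K * al * Xk @ Q[s:s + j]
       for Xk, al, s, j in zip(tables, alphas, starts, sizes)]
assert np.allclose(sum(F_k) / K, F, atol=1e-8)
```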
An alternative pictorial representation of the variables and the components plots the correlations between the original variables of X and the factor scores. These correlations are plotted as two-dimensional maps in which a circle of radius one (called the circle of correlation75,89) is also plotted. The closer a variable is to the circle, the better this variable is ‘explained’ by the components used to create the plot (see Refs 23,24 for examples). Loadings and correlations are often used interchangeably because these two concepts are very similar, and sometimes the name loading is used for both concepts (see Ref 75). In fact, loadings and correlations differ only by a normalization factor: the sum of the squared loadings of all the variables for a given dimension is equal to one, whereas the sum of the squared correlations of all the dimensions for a given variable is equal to one (and therefore it is always possible to transform one set into the other).

HOW TO FIND THE IMPORTANT ELEMENTS: CONTRIBUTIONS, ETC.

Contributions of Observations, Variables, and Tables to a Dimension

In MFA, just as in standard PCA, the importance of a dimension (i.e., principal component) is reflected by its eigenvalue, which indicates how much of the total inertia (i.e., variance) of the data is explained by this component.

To better understand the relationships between components, observations, variables, and tables, and also to help interpret a component, we can evaluate how much an observation, a variable, or a whole table contributes to the inertia extracted by a component. In order to do so, we compute descriptive statistics called contributions (see Refs 78,89–91 and Ref 75, p. 437ff.). The stability of these descriptive statistics can be assessed by cross-validation techniques such as the bootstrap, whose results can be used to select the relevant elements for a dimension.

Contribution of an Observation to a Dimension

As stated in Eq. (19), the variance of the factor scores for a given dimension is equal to the eigenvalue (i.e., the square of the singular value) associated with this dimension. If we denote by λ_ℓ the eigenvalue of a given dimension, we can rewrite Eq. (19) as

λ_ℓ = Σ_i m_i × f_{i,ℓ}²,  (24)

where m_i and f_{i,ℓ} are, respectively, the mass of the ith observation and the factor score of the ith observation for the ℓth dimension. As all the terms m_i × f_{i,ℓ}² are positive or null, we can evaluate the contribution of an observation to a dimension as the ratio of its squared weighted factor score to the dimension eigenvalue. Formally, the contribution of observation i to component ℓ, denoted ctr_{i,ℓ}, is computed as

ctr_{i,ℓ} = (m_i × f_{i,ℓ}²) / λ_ℓ.  (25)

Contributions take values between 0 and 1, and, for a given component, the sum of the contributions of all observations is equal to 1. The larger a contribution, the more the observation contributes to the component. A useful heuristic is to base the interpretation of a component on the observations whose contributions are larger than the average contribution. Observations with high contributions and whose factor scores have different signs can then be contrasted to help interpret the component. Alternatively (as described in a later section), we can derive pseudo-t statistics (called bootstrap ratios) in order to find the observations important for a given dimension.

Contribution of a Variable to a Dimension

As we did for the observations, we can find the important variables for a given dimension by computing variable contributions. The variance of the loadings for the variables is equal to one when the α weights are taken into account (cf. Eq. (13)). So, if we denote by a_j the α weight for the jth variable (recall that all variables from the same table share the same α weight, cf. Eq. (11)), we have

1 = Σ_j a_j × q_{j,ℓ}²,  (26)

where q_{j,ℓ} is the loading of the jth variable for the ℓth dimension. As all terms a_j × q_{j,ℓ}² are positive or null, we can evaluate the contribution of a variable to a dimension as its squared weighted loading for this dimension. Formally, the contribution of variable j to component ℓ, denoted ctr_{j,ℓ}, is computed as

ctr_{j,ℓ} = a_j × q_{j,ℓ}².  (27)

Variable contributions take values between 0 and 1, and, for a given component, the contributions of all variables sum to 1. The larger the contribution of a variable to a component, the more this variable contributes to this component. Variables with high contributions and whose loadings have different signs can then be contrasted to help interpret the component.

Contribution of a Table to a Dimension

Specific to multiblock analysis is the notion of a table contribution. As a table comprises several variables, the contribution of a table can simply be defined as the sum of the contributions of its variables (a simple consequence of the Pythagorean theorem, which states that squared lengths are additive). So the contribution of table k to component ℓ, denoted ctr_{k,ℓ}, is defined as

ctr_{k,ℓ} = Σ_j^{J_[k]} ctr_{j,ℓ}.  (28)
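All three kinds of contributions are one-line computations once F, Q, a, and the masses are available, as in this sketch (NumPy; the function name and argument names are ours):

```python
import numpy as np

def contributions(F, Q, a, m, table_sizes):
    """Observation, variable, and table contributions (Eqs. 25, 27, 28)."""
    eig = (m[:, None] * F ** 2).sum(axis=0)      # lambda_l (Eq. 24)
    ctr_obs = m[:, None] * F ** 2 / eig          # I x L   (Eq. 25)
    ctr_var = a[:, None] * Q ** 2                # J x L   (Eq. 27)
    # Sum variable contributions within each table (Eq. 28)
    edges = np.cumsum([0, *table_sizes])
    ctr_tab = np.array([ctr_var[s:e].sum(axis=0)
                        for s, e in zip(edges[:-1], edges[1:])])
    return ctr_obs, ctr_var, ctr_tab
```

Each column of the three returned arrays sums to 1, which is a convenient sanity check.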
Table contributions take values between 0 and 1, and, for a given component, the contributions of all tables sum to 1. The larger the contribution of a table to a component, the more this table contributes to this component. Because the contributions of the tables for a given dimension sum to one, an alternative approach re-scales the contributions so that their sum for a dimension is equal to the eigenvalue of this dimension. These re-scaled contributions are called partial inertias and are denoted I_partial. The partial inertias are obtained from the contributions by multiplying the contributions for a dimension by the eigenvalue of this dimension.

Table contributions and partial inertias can be used to create plots that show the importance of the tables for the components. These plots can be drawn one component at a time, or two (or, rarely, three) components at a time, in a manner analogous to factor maps.

How to Analyze the Between-Table Structure

To evaluate the similarity between two tables, one can compute coefficients of similarity between data tables. A traditional coefficient is Escoufier’s R_V coefficient (see Refs 92,93; see also Refs 94 and 95 for alternatives), which can be interpreted as a non-centered squared coefficient of correlation between two matrices. The R_V coefficient varies between 0 and 1 and reflects the amount of variance shared by two matrices. Specifically, the R_V coefficient between data tables k and k′ is computed as

R_{V k,k′} = trace{(X_[k] X_[k]^T) × (X_[k′] X_[k′]^T)} / sqrt( trace{(X_[k] X_[k]^T) × (X_[k] X_[k]^T)} × trace{(X_[k′] X_[k′]^T) × (X_[k′] X_[k′]^T)} ).  (29)

A slightly different coefficient, called the L_g coefficient, is also often used in the context of MFA. This coefficient reflects the MFA normalization and takes positive values. Specifically, the L_g coefficient between data tables k and k′ is computed as

L_{g(k,k′)} = trace{(X_[k] X_[k]^T) × (X_[k′] X_[k′]^T)} / (γ_{1,k}² × γ_{1,k′}²) = trace{(X_[k] X_[k]^T) × (X_[k′] X_[k′]^T)} × (α_k × α_{k′}).  (30)

An eigen-decomposition of the K by K between-table matrix of R_V or L_g coefficients can provide factor scores for the tables that can be used to plot maps of the tables, in a way analogous to the STATIS method (see Ref 3).

ALTERNATIVE PRESENTATION OF MFA

MFA with Cross-Product Matrices

An alternative computational approach to MFA—particularly useful for generalizing to other data types and for comparing MFA to other methods—uses the cross-product matrices S_[k]. In this context, the first step is to compute an average cross-product matrix (called a ‘compromise’ by analogy with the STATIS method, see Ref 3), which is computed as the weighted average of the S_[k] matrices with weights provided by the elements of α. Specifically, the compromise cross-product matrix is denoted S_[+] and is computed as

S_[+] = Σ_k^K α_k S_[k].  (31)

Note that S_[+] can also be directly computed from X as

S_[+] = X A X^T.  (32)

The compromise matrix, being a weighted sum of cross-product matrices, is also a cross-product matrix (i.e., it is a positive semi-definite matrix), and therefore its eigendecomposition amounts to a PCA. The generalized eigendecomposition of the compromise under the constraints provided by matrix M gives

S_[+] = P Λ P^T  with  P^T M P = I.  (33)

Eqs (32) and (13) together indicate that the generalized eigenvectors of the compromise are the left generalized singular vectors of X (cf. Eq. (13)) and that the eigenvalues of S_[+] are the squares of the singular values of X (i.e., Λ = Δ²). The loadings can be computed by rewriting Eq. (13) as

Q = X^T M P Δ^{−1}.  (34)

Similarly, the compromise factor scores can be computed from S_[+] (cf. Eq. (20)) as

F = S_[+] M P Δ^{−1}.  (35)
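The cross-product route of Eqs (31)–(35) can be verified against the GSVD route, as in the following sketch (NumPy; names are ours; random tables for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
I = 12
tables = [rng.standard_normal((I, j)) for j in (6, 5)]
tables = [X - X.mean(0) for X in tables]
alphas = [1 / np.linalg.svd(X, compute_uv=False)[0] ** 2 for X in tables]
m = np.full(I, 1 / I)

# Compromise cross-product matrix (Eq. 31)
S_plus = sum(al * X @ X.T for al, X in zip(alphas, tables))

# Generalized eigendecomposition under M (Eq. 33), via the symmetric
# eigendecomposition of M^(1/2) S_plus M^(1/2)
Ms = np.sqrt(m)
lam, W = np.linalg.eigh(Ms[:, None] * S_plus * Ms[None, :])
lam, W = lam[::-1], W[:, ::-1]                 # descending order
P = W / Ms[:, None]
F = S_plus @ (m[:, None] * P) / np.sqrt(np.clip(lam, 1e-12, None))  # Eq. 35

# The same factor scores come from the GSVD of the concatenated X
a = np.concatenate([np.full(X.shape[1], al) for X, al in zip(alphas, tables)])
Xc = np.hstack(tables)
U, d, Vt = np.linalg.svd(np.sqrt(m)[:, None] * Xc * np.sqrt(a)[None, :],
                         full_matrices=False)
F_gsvd = (U / np.sqrt(m)[:, None]) * d
r = F_gsvd.shape[1]
assert np.allclose(np.abs(F[:, :r]), np.abs(F_gsvd), atol=1e-6)
```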
In this context, the loadings for the variables from table k are obtained from Eq. (34) as

Q_[k] = X_[k]^T M P Δ^{−1}.  (36)

The factor scores for table k are obtained from Eqs (36) and (22) as

F_[k] = K α_k X_[k] Q_[k] = K α_k X_[k] X_[k]^T M P Δ^{−1} = K α_k S_[k] M P Δ^{−1}.  (37)

MFA as Simple PCA

MFA can also be computed as the simple PCA of the set of X_[k] matrices, each weighted by the square root of its respective α weight (this assumes, as is the case in general for MFA, that matrix M is equal to (1/I) I). Specifically, if we define the matrix

X̃ = [√α₁ X_[1] | ... | √α_k X_[k] | ... | √α_K X_[K]],  (38)

whose (simple) SVD is given by

X̃ = P̃ Δ̃ Q̃^T  with  P̃^T P̃ = Q̃^T Q̃ = I,  (39)

then the factor scores for the observations can be obtained as

F = P̃ Δ̃.  (40)

The loadings for the kth table are obtained as

Q_[k] = (1/√α_k) Q̃_[k].  (41)

To prove this last identity, we use the relations between the simple SVD and the GSVD (see Refs 74,76,78,79 for more details). To do so, we first need to re-express X̃ as a function of X and A:

X̃ = [√α₁ X_[1] | ... | √α_k X_[k] | ... | √α_K X_[K]] = X A^{1/2}.  (42)

To obtain this result, we use the facts (1) that A is defined as ‘blocks’ of the K values α_k and (2) that A, being a diagonal matrix with positive elements, is positive definite (and therefore its square root is uniquely determined). Rewriting the GSVD of X in terms of X̃ shows that

X = X̃ A^{−1/2} = P̃ Δ̃ Q̃^T A^{−1/2},  (43)

and therefore that

Q = Q̃ A^{−1/2},  (44)

which is equivalent to Eq. (41) (and completes the proof).

MFA from the Table Factor Scores

The whole set of factor scores derived from the PCA of one of the tables (i.e., from Eq. (8)) spans the same space as the column space of this table. Therefore, the MFA can be computed directly from these factor scores, and so the MFA factor scores (and loadings) can be obtained from the SVD of the table factor scores (as defined in Eq. (8)) instead of from the SVD of X. This approach can be particularly useful when the number of variables in some tables is significantly larger than the number of observations (a configuration referred to as the ‘N << P problem’), as can occur in problems such as, for example, brain imaging, genomics, or data mining.

To do so, it suffices to replace in Eq. (1) each matrix X_[k] by the corresponding table of factor scores G_[k] and to call the new grand matrix G. The GSVD of matrix G with masses M and weights A (note that A needs to be made conformable with the new dimensions of G) will provide observation factor scores identical to those obtained from the analysis of X. This GSVD will also provide loadings for the table factor scores. The variable loadings can be obtained by projecting these variables as supplementary columns (and they will be identical to those obtained from the analysis of X). Conversely, loadings for the table factor scores can be obtained by projecting the table factor scores as supplementary variables onto the analysis obtained from X. All this shows that the analyses obtained from X and from G are identical. The MFA results can also be obtained from the simple SVD of the matrix obtained by treating the G_[k] matrices as in Eq. (38) (i.e., by dividing all the elements of each matrix G_[k] by its first singular value).

More on Preprocessing

The preprocessing step is a crucial part of the analysis and can be performed on the columns or on the rows. Most of the time, each variable is centered (i.e., the mean of each variable is 0) and normalized (i.e., the sum of the squared elements of each column is equal to one, I, or even I − 1). In some cases, the normalization affects the rows of the matrix; the sum of each row can then be equal to 1 (e.g., as in correspondence analysis,78 see also Refs 96 and 154 for an explicit integration of correspondence analysis and MFA), or the sum of squares of the elements of a given row can be equal to one (e.g., as in Hellinger/Bhattacharyya analysis97–101).
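The equivalence stated by Eqs (38)–(41) is also easy to confirm numerically. The sketch below (NumPy; names are ours) computes the factor scores once through the simple SVD of X̃ and once through the GSVD with M = (1/I) I, and checks that they agree up to the sign of each column:

```python
import numpy as np

rng = np.random.default_rng(5)
I = 12
tables = [rng.standard_normal((I, j)) for j in (6, 5)]
tables = [X - X.mean(0) for X in tables]

alphas = [1 / np.linalg.svd(X, compute_uv=False)[0] ** 2 for X in tables]

# Simple SVD route (Eqs. 38-40): weight each table by sqrt(alpha_k)
X_tilde = np.hstack([np.sqrt(al) * X for al, X in zip(alphas, tables)])
P_t, d_t, _ = np.linalg.svd(X_tilde, full_matrices=False)
F_simple = P_t * d_t

# GSVD route with M = (1/I) I (Eqs. 13 and 18)
a = np.concatenate([np.full(X.shape[1], al) for X, al in zip(tables, alphas)])
m = np.full(I, 1 / I)
Xt = np.sqrt(m)[:, None] * np.hstack(tables) * np.sqrt(a)[None, :]
U, d, Vt = np.linalg.svd(Xt, full_matrices=False)
F_gsvd = (U / np.sqrt(m)[:, None]) * d

# Same factor scores (up to the sign of each column)
assert np.allclose(np.abs(F_simple), np.abs(F_gsvd), atol=1e-8)
```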
SUPPLEMENTARY ELEMENTS (A.K.A. OUT OF SAMPLE)

As in standard PCA, we can use the results of the analysis to compute approximate statistics (e.g., factor scores, loadings, or optimum weights) for new elements, that is, elements that have not been used in the analysis. These new elements are called supplementary,75,78,89–91,102–105 illustrative, or out of sample106 elements. In contrast with the supplementary elements, the elements actually used in the analysis are called active elements. The statistics for the supplementary elements are obtained by projecting these elements onto the active space. In the MFA framework, we can have supplementary rows and columns (as in PCA) but also supplementary tables. Supplementary rows for which we have values for all J variables, and supplementary variables for which we have measurements for all I observations, are projected in the same way as in PCA (see Ref 75, pp. 436ff.). Computing statistics for supplementary tables, however, is specific to MFA.

Supplementary Rows and Columns

As MFA is a generalized PCA, we can add supplementary rows and columns as in standard PCA (see Ref 75 for details). Note, incidentally, that this procedure assumes that the supplementary rows and columns are scaled in a manner comparable to the rows and columns of the original matrix X. Specifically, from Eqs (22) and (13), we can compute the factor scores, denoted f_sup, for a supplementary observation (i.e., a supplementary row of dimensions 1 by J recording measurements on the same variables as the whole matrix X). This supplementary row is represented by a 1 by J vector denoted r_sup^T (which has been preprocessed in the same way as X); the supplementary factor scores are computed as

f_sup^T = r_sup^T A Q.  (45)

Loadings are denoted q_sup for a new column, which is itself denoted by an I by 1 vector o_sup (note that o_sup needs to have been preprocessed in a way comparable with the tables). These loadings are obtained, in a way similar to Eq. (45), as

q_sup^T = o_sup^T M P Δ^{−1}.  (46)

Supplementary Partial Observations

In some cases, we have supplementary observations for only one (or some) table(s). In this case, called a supplementary partial observation, we can obtain the supplementary partial factor scores for this observation from Eq. (22). Specifically, let x_sup[k]^T denote a 1 by J_[k] vector of measurements collected on the J_[k] variables of table k (note that x_sup[k]^T should have been preprocessed in the same way as the whole matrix X_[k]). The partial factor scores for this supplementary observation from table k are obtained as

f_sup[k]^T = K × x_sup[k]^T × Q_[k].  (47)

Incidentally, the factor scores of a supplementary observation collected on all tables can also be obtained as the average of the supplementary partial factor scores (see Eqs (23) and (45)).

To compute the loadings for a supplementary variable for a specific table, it suffices to preprocess this variable like the variables of this table (and this includes dividing all the data of the table by the first singular value of this table) and then to use Eq. (46).

Supplementary Tables

As MFA involves tables, it is of interest to be able to project a whole table as a supplementary element. Such a table will include new variables measured on the same observations described by the active tables. This table is represented by the I by J_sup matrix Y_sup. The matrix Y_sup is preprocessed in the same manner (e.g., centered, normalized) as the Y_[k] matrices to give the supplementary matrix X_sup, which is also normalized by dividing all its elements by its first singular value (i.e., this normalization ensures that the first singular value of X_sup is equal to 1). This matrix will provide supplementary factor scores and loadings for the compromise solution, as described below.

Factor Scores

To obtain the factor scores for a new table, the first step is to obtain (from Eq. (46)) the supplementary loadings, which are computed as

Q_sup = X_sup^T M P Δ^{−1}.  (48)

Then, using Eq. (22) (and taking into account that the first singular value of X_sup is equal to one), we obtain the supplementary factor scores for the new table X_sup as

F_sup = K X_sup Q_sup = K X_sup X_sup^T M P Δ^{−1} = K S_sup M P Δ^{−1}.  (49)
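Projecting supplementary elements only reuses quantities already computed by the active analysis. A minimal sketch of Eqs (45), (48), and (49) (NumPy; the function names are ours; inputs are assumed preprocessed as described above):

```python
import numpy as np

def project_sup_row(r_sup, a, Q):
    """Factor scores of a supplementary observation (Eq. 45):
    r_sup is a length-J preprocessed row vector."""
    return (r_sup * a) @ Q

def project_sup_table(X_sup, m, P, delta, K):
    """Loadings and factor scores of a supplementary table (Eqs. 48-49).
    X_sup must be preprocessed and divided by its first singular value."""
    Q_sup = X_sup.T @ (m[:, None] * P) / delta   # Eq. 48
    F_sup = K * X_sup @ Q_sup                    # Eq. 49
    return Q_sup, F_sup
```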
INFERENTIAL ASPECTS

MFA is a descriptive multivariate technique, but it is often important to complement the descriptive conclusions of an analysis by assessing if its results are reliable and replicable. For example, a standard question in the PCA framework is to find the number of reliable components, and most of the approaches used in PCA will also work for MFA. For example, we can use the informal ‘scree’ (a.k.a. ‘elbow’) test and the more formal RESS, PRESS, and Q² statistics (see Refs 75, p. 440ff., and 107–109 for details). We can also use techniques such as the jackknife110 or the bootstrap111,155 to identify important observations, variables, or tables. These approaches are implemented differently depending on whether we consider the observations, variables, or even tables as being a random or a fixed factor (see Ref 112).

Bootstrap for the Factor Scores

If we assume that the tables constitute a random factor (i.e., the tables are independent and identically distributed—i.i.d.—and sampled from a potentially infinite set of tables), and if we consider the observations as a fixed factor, we may want to estimate the stability of the compromise factor scores. Such an evaluation could be done, for example, by using a jackknife or a bootstrap approach.

We briefly sketch here a possible bootstrap approach for the factor scores (see Ref 113 for the problem of bootstrapping in the context of the SVD, Refs 114,115 for a review of the bootstrap in the PCA context, and Refs 116–119 for recent applications and developments for MFA). The main idea is to use the property of Eq. (23), which indicates that the compromise factor scores are the average of the partial factor scores. Therefore, we can obtain bootstrap confidence intervals (CIs) by repeatedly sampling with replacement from the set of tables and computing new compromise factor scores (this approach corresponds to the partial bootstrap of Ref 114; see Ref 59 for an alternative approach using split-half resampling). From these estimates, we can also compute bootstrap ratios for each dimension by dividing the mean of the bootstrap estimates by their standard deviation. These bootstrap ratios are akin to t statistics and can be used to detect observations that reliably contribute to a given dimension. So, for example, for a given dimension and a given observation, a value of the bootstrap ratio larger than 2 will be considered reliable (by analogy with a t larger than 2, which would be ‘significant’ at p < .05). When evaluating bootstrap ratios, the multiple comparisons problem can be taken into account by using, for example, a Bonferroni-type correction (see Ref 120): instead of using critical values corresponding to, say, p < .05, we would use values corresponding to p < .05/I.

More formally, in order to compute a bootstrap estimate, we need to generate a bootstrap sample. To do so, we first take a sample of integers with replacement from the set of integers from 1 to K. Recall that, when sampling with replacement, any element from the set can be sampled zero, one, or more than one times. We call this set B (for bootstrap). For example, with five elements, a possible bootstrap set could be B = {1, 5, 1, 3, 3}. We then generate a new data set (i.e., a new X matrix comprising K tables) using the matrices X_[k] with these indices. So, with K = 5, this would give the following bootstrap set:

[X_[1], X_[5], X_[1], X_[3], X_[3]].  (50)

From this set we would build a data matrix denoted X*₁ that would then be analyzed by MFA. This analysis would provide a set of bootstrapped factor scores (denoted F*₁) obtained by projecting the bootstrapped data table as a supplementary element (see Eqs (37) and (49)). Interestingly, as a consequence of the barycentric properties of the factor scores (see Eq. (23)), this last step can also be obtained directly by computing F*₁ as the weighted average of the corresponding partial factor scores. We then repeat the procedure a large number of times (e.g., L = 1000) and generate L bootstrapped matrices of factor scores F*_ℓ. From these bootstrapped matrices of factor scores, we can derive CIs and estimate the mean factor scores as the mean of the bootstrapped factor scores. Formally, F̄* denotes the bootstrap estimated factor scores and is computed as

F̄* = (1/L) Σ_{ℓ=1}^{L} F*_ℓ.  (51)

In a similar way, the bootstrapped estimate of the variance is obtained as the variance of the F*_ℓ matrices. Formally, σ̂²_{F*} denotes the bootstrapped estimate of the variance and is computed as

σ̂²_{F*} = (1/L) Σ_{ℓ=1}^{L} (F*_ℓ − F̄*) ∘ (F*_ℓ − F̄*).  (52)

Bootstrap ratios (denoted T*) are computed by dividing the bootstrapped means by the corresponding bootstrapped standard deviations (denoted σ̂_{F*}, the square root of the bootstrapped estimate of the variance). These bootstrap ratios are often interpreted as Student’s t statistics.
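A compact implementation of this partial bootstrap (NumPy; the function name is ours) resamples tables with replacement, averages the corresponding partial factor scores, and returns the bootstrap ratios:

```python
import numpy as np

def bootstrap_ratios(partial_scores, n_boot=1000, seed=0):
    """Partial bootstrap of the compromise factor scores.
    'partial_scores' is a list of K partial factor score arrays F_[k]
    (each I x L); returns bootstrap ratios (I x L)."""
    rng = np.random.default_rng(seed)
    K = len(partial_scores)
    stack = np.stack(partial_scores)              # K x I x L
    boots = np.empty((n_boot,) + stack.shape[1:])
    for b in range(n_boot):
        idx = rng.integers(0, K, size=K)          # sample tables with replacement
        boots[b] = stack[idx].mean(axis=0)        # barycenter of resampled tables
    mean, std = boots.mean(axis=0), boots.std(axis=0)
    return mean / std                             # akin to t statistics
```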
Bootstrapped Confidence Intervals

The bootstrap factor scores (i.e., the F*_ℓ’s) can also be used to compute CIs for the observations. For a given dimension, the bootstrapped factor scores of an observation can be ordered from the smallest to the largest, and a CI for a given p value can be obtained by trimming the upper and lower ½p proportion of the distribution. In general, for a given dimension, the bootstrap ratios and the CIs will agree in detecting the relevant observations. In addition, CIs can also be plotted directly on the factor scores map as confidence ellipsoids or confidence convex hulls, which comprise a 1 − p proportion of the bootstrapped factor scores (see Ref 59 and our example illustration in Figure 8). When the ellipsoids or convex hulls of two observations do not overlap, these two observations can be considered reliably different. As for the bootstrap ratios, in order to correct for the potential problem of multiple comparisons, a Bonferroni type of correction can also be implemented when plotting hulls or ellipsoids (see Ref 59 for details). Recent work118 suggests that these CIs could be too liberal (i.e., too small) when the number of tables is large, and that a better procedure would be to use instead a ‘total bootstrap’ (i.e., recomputing the factor scores from the whole bootstrap tables rather than from the partial factor scores; cf. Ref 114).

AN EXAMPLE

To illustrate MFA, we selected a (fictitious) example previously used (see Ref 3) to illustrate the STATIS method, which is a possible alternative to MFA. In this example, a set of wines was described by a group of expert tasters called assessors. This type of data could be analyzed using a standard PCA, but this approach obviously neglects the inter-assessor differences. MFA has the advantage of providing a common space for the products (i.e., the factor scores), as well as information about how each assessor relates to this common space (i.e., the partial factor scores).

Specifically, this example concerns 12 wines made from Sauvignon Blanc grapes coming from three wine regions (four wines from each region): New Zealand, France, and Canada. Ten expert assessors were asked to evaluate these wines. The assessors were asked (1) to evaluate the wines on 9-point rating scales, using four variables considered as standard for the evaluation of these wines (cat-pee, passion-fruit, green pepper, and mineral) and (2), if they felt the need, to add some variables of their own (some assessors chose none, some chose one or two more variables). The raw data are presented in Table 1. The goals of the analysis were twofold:

1. to obtain a typology of the wines, and
2. to discover the agreement (if any) between the assessors.

For the example, the data consist of K = 10 tables (one for each assessor), shown in Table 1. For example, the first table, denoted Y_[1], is equal to

Y_[1] =
⎡ 8 6 7 4 1 6 ⎤
⎢ 7 5 8 1 2 8 ⎥
⎢ 6 5 6 5 3 4 ⎥
⎢ 9 6 8 4 3 5 ⎥
⎢ 2 2 2 8 7 3 ⎥
⎢ 3 4 4 9 6 1 ⎥
⎢ 5 3 5 4 8 3 ⎥
⎢ 5 2 4 8 7 4 ⎥
⎢ 8 6 8 4 4 7 ⎥
⎢ 4 6 2 5 3 4 ⎥
⎢ 8 4 8 1 3 3 ⎥
⎣ 5 3 6 4 4 2 ⎦ .  (53)

Each table was then preprocessed by first centering and normalizing each column such that its mean is equal to 0 and the sum of the squared values of all its elements is equal to 1. For example, X_[1], the preprocessed matrix for Assessor 1, is equal to

X_[1] =
⎡  0.30  0.32  0.18 −0.09 −0.44  0.27 ⎤
⎢  0.16  0.13  0.31 −0.45 −0.31  0.57 ⎥
⎢  0.02  0.13  0.04  0.03 −0.17 −0.02 ⎥
⎢  0.43  0.32  0.31 −0.09 −0.17  0.12 ⎥
⎢ −0.52 −0.45 −0.49  0.39  0.37 −0.17 ⎥
⎢ −0.39 −0.06 −0.22  0.51  0.24 −0.47 ⎥
⎢ −0.11 −0.26 −0.09 −0.09  0.51 −0.17 ⎥
⎢ −0.11 −0.45 −0.22  0.39  0.37 −0.02 ⎥
⎢  0.30  0.32  0.31 −0.09 −0.03  0.42 ⎥
⎢ −0.25  0.32 −0.49  0.03 −0.17 −0.02 ⎥
⎢  0.30 −0.06  0.31 −0.45 −0.17 −0.17 ⎥
⎣ −0.11 −0.26  0.04 −0.09 −0.03 −0.32 ⎦ .  (54)

PCA of the Data Tables

A PCA of each of the data tables is then performed. For example, the SVD of the first data table gives

X_[1] = U_[1] Γ_[1] V_[1]^T  with  U_[1]^T U_[1] = V_[1]^T V_[1] = I,  (55)

with

U_[1] =
⎡  0.32  0.26 −0.03  0.09  0.32 −0.22 ⎤
⎢  0.38 −0.06  0.35 −0.49  0.17  0.41 ⎥
⎢  0.06  0.14 −0.15  0.07  0.14  0.10 ⎥
⎢  0.30  0.00 −0.06  0.45 −0.13 −0.23 ⎥
⎢ −0.49  0.06  0.24 −0.23  0.19  0.01 ⎥
⎢ −0.38  0.20 −0.28  0.41  0.00  0.52 ⎥
⎢ −0.21 −0.40  0.12 −0.08 −0.65 −0.01 ⎥
⎢ −0.31 −0.16  0.42  0.19  0.32 −0.46 ⎥
⎢  0.29  0.05  0.39  0.29 −0.26  0.25 ⎥
⎢ −0.07  0.63 −0.18 −0.39 −0.37 −0.30 ⎥
⎢  0.21 −0.45 −0.44 −0.14 −0.01 −0.23 ⎥
⎣ −0.10 −0.27 −0.39 −0.17  0.27  0.16 ⎦ ,  (56)
TABLE 1 | Raw Data (Tables Y_[1] Through Y_[10])

      Assessor 1           Assessor 2           Assessor 3            Assessor 4        Assessor 5
      V1 V2 V3 V4 V5 V6    V1 V2 V3 V4 V7 V8    V1 V2 V3 V4 V9 V10    V1 V2 V3 V4 V8    V1 V2 V3 V4 V11 V12
NZ1    8  6  7  4  1  6     8  6  8  3  7  5     8  6  8  3  7  2      9  5  8  2  6     9  6  9  3  8  2
NZ2    7  5  8  1  2  8     6  5  6  3  7  7     8  7  7  2  8  2      8  7  7  3  5     7  7  7  1  9  2
NZ3    6  5  6  5  3  4     6  6  6  5  8  7     8  7  7  6  9  1      8  8  9  2  7     7  7  7  1  7  2
NZ4    9  6  8  4  3  5     8  6  8  4  6  6     8  2  8  3  9  3      8  8  9  4  7     8  9  7  5  6  1
FR1    2  2  2  8  7  3     2  3  1  7  4  3     3  4  3  6  4  6      4  2  2  4  3     4  4  4  2  4  4
FR2    3  4  4  9  6  1     4  3  4  9  3  5     4  3  4  8  3  9      3  2  2  6  2     4  5  5  6  1  5
FR3    5  3  5  4  8  3     3  3  2  7  4  4     5  4  5  2  3  6      4  4  4  6  4     6  5  7  2  3  1
FR4    5  2  4  8  7  4     4  3  5  5  3  3     6  3  7  7  1  7      5  2  2  9  4     6  6  5  8  4  5
CA1    8  6  8  4  4  7     8  6  9  5  5  6     8  5  9  1  5  2      7  5  6  3  2     8  6  8  2  5  4
CA2    4  6  2  5  3  4     5  5  5  6  5  8     5  5  4  6  5  1      5  6  6  4  4     6  6  6  4  6  3
CA3    8  4  8  1  3  3     8  4  8  3  7  7     8  3  7  3  5  4      7  3  6  1  6     7  4  8  4  5  1
CA4    5  3  6  4  4  2     5  3  7  4  8  5     5  4  4  5  4  3      5  2  2  6  6     5  5  5  5  6  1

      Assessor 6         Assessor 7      Assessor 8            Assessor 9         Assessor 10
      V1 V2 V3 V4 V13    V1 V2 V3 V4     V1 V2 V3 V4 V14 V5    V1 V2 V3 V4 V15    V1 V2 V3 V4
NZ1    8  5  6  2  9      8  5  8  4      7  6  7  4  9  2      8  6  9  1  7      8  6  7  5
NZ2    6  6  6  2  4      7  6  8  4      6  5  6  2  7  2      8  7  9  1  6      7  5  7  3
NZ3    7  7  7  2  7      6  7  6  3      6  6  6  4  9  2      7  7  8  4  7      7  6  6  2
NZ4    8  7  8  2  8      7  8  6  1      8  7  8  2  8  2      8  9  9  3  9      8  7  7  4
FR1    3  2  2  7  2      4  2  3  6      3  3  4  4  4  4      3  4  4  5  4      2  3  1  7
FR2    3  3  3  3  4      4  4  4  4      4  4  4  7  3  6      5  5  5  7  2      3  3  3  9
FR3    4  2  3  3  3      4  3  4  4      5  3  5  3  3  5      5  5  5  6  3      4  2  5  8
FR4    5  3  5  9  3      5  3  5  7      6  4  6  3  2  4      5  5  6  5  3      3  4  2  8
CA1    7  7  7  1  4      8  4  9  4      8  6  5  4  5  4      8  7  8  4  7      8  6  7  4
CA2    4  6  2  4  6      4  7  5  2      5  7  5  4  6  1      5  6  4  5  6      5  6  4  4
CA3    7  4  8  2  3      8  5  7  3      7  4  8  2  6  2      8  4  7  4  5      7  4  8  5
CA4    4  5  3  3  7      4  3  5  2      5  4  6  2  4  3      5  4  5  3  4      5  4  6  6

V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Grass; V15, Peach.
Γ_[1] = diag{2.037, 0.886, 0.701, 0.589, 0.402, 0.255},  (57)

and

V_[1] =
⎡  0.45 −0.25  0.02  0.44 −0.03 −0.73 ⎤
⎢  0.38  0.62 −0.17  0.26 −0.58  0.20 ⎥
⎢  0.42 −0.47 −0.04  0.39  0.19  0.65 ⎥
⎢ −0.40  0.39  0.20  0.69  0.41  0.02 ⎥
⎢ −0.41 −0.38  0.41  0.25 −0.67  0.06 ⎥
⎣  0.37  0.19  0.87 −0.22  0.12  0.05 ⎦ .  (58)

From this PCA we obtain the following factor scores for the first table:

G_[1] = U_[1] Γ_[1] =
⎡  0.65  0.23 −0.02  0.05  0.13 −0.06 ⎤
⎢  0.77 −0.05  0.25 −0.29  0.07  0.11 ⎥
⎢  0.13  0.13 −0.11  0.04  0.06  0.03 ⎥
⎢  0.60  0.00 −0.04  0.26 −0.05 −0.06 ⎥
⎢ −0.99  0.05  0.17 −0.14  0.08  0.00 ⎥
⎢ −0.77  0.18 −0.19  0.24  0.00  0.13 ⎥
⎢ −0.43 −0.35  0.09 −0.05 −0.26 −0.00 ⎥
⎢ −0.64 −0.15  0.29  0.11  0.13 −0.12 ⎥
⎢  0.59  0.04  0.27  0.17 −0.10  0.06 ⎥
⎢ −0.15  0.56 −0.12 −0.23 −0.15 −0.08 ⎥
⎢  0.43 −0.40 −0.31 −0.08 −0.00 −0.06 ⎥
⎣ −0.20 −0.24 −0.27 −0.10  0.11  0.04 ⎦ .  (59)
From Eq. (57), we have found that the first singular value of X_[1] has a value of 2.037, and therefore the α weight for the first table is obtained as (cf. Eq. (10)):

α₁ = 1/γ_{1,1}² = 1/2.037² = γ_{1,1}^{−2} = 0.241.  (60)

Creating the α Weight Vector

We collect the values of the α weights of the K tables into a K by one weight vector denoted by α:

α = [0.241, 0.239, 0.275, 0.273, 0.307, 0.302, 0.417, 0.272, 0.264, 0.309]^T.  (61)

There are K = 10 values in α (see Eq. (61)). These values are stored in the J × 1 = 53 × 1 vector a (each variable receiving the α weight of its table), which can itself be used to fill in the diagonal elements of the 53 × 53 diagonal matrix A:

a = [0.241 · 1_[1]^T, 0.239 · 1_[2]^T, 0.275 · 1_[3]^T, 0.273 · 1_[4]^T, 0.307 · 1_[5]^T, 0.302 · 1_[6]^T, 0.417 · 1_[7]^T, 0.272 · 1_[8]^T, 0.264 · 1_[9]^T, 0.309 · 1_[10]^T]^T  and  A = diag{a},  (62)

where 1_[k] is a J_[k] by 1 vector of ones.

Generalized PCA of X

The 12 × 12 diagonal mass matrix M for the observations (with equal masses set to m_i = 1/I ≈ .08) is given as

M = diag{m} = diag{(1/12) × 1} = diag{0.08, 0.08, ..., 0.08}.  (63)

The GSVD of X with matrices M (Eq. (63)) and A (Eq. (62)) gives X = P Δ Q^T. The eigenvalues (denoted λ) are equal to the squares of the singular values and are often used to decide informally upon the number of components to keep for further inspection. The eigenvalues and the percentages of inertia that they explain are given in Table 2.

TABLE 2 | Eigenvalues and Percentage of Explained Inertia of the MFA of X

Component             1      2      3      4      5      6      7      8      9      10     11
Singular value (δ)    0.878  0.351  0.301  0.276  0.244  0.198  0.176  0.158  0.137  0.116  0.106
Eigenvalue (λ = δ²)   0.770  0.123  0.091  0.076  0.060  0.039  0.031  0.025  0.019  0.013  0.011
  cumulative          0.770  0.893  0.984  1.060  1.120  1.159  1.190  1.215  1.233  1.247  1.258
% Inertia (τ)         61     10     7      6      5      3      2      2      1      1      1
  cumulative          61     71     78     84     89     92     94     96     97     98     100

The pattern of the eigenvalue distribution suggests keeping two or three dimensions for further examination, and we decided (somewhat arbitrarily) to keep only the first two dimensions for this example. The matrix Q of the right singular vectors (loadings for the variables) is given in Table 3; the matrix P of the left singular vectors and the matrix Δ of the singular values are given below:

P =
⎡ −1.117  0.466 ⎤
⎢ −0.922  0.093 ⎥
⎢ −0.867 −1.295 ⎥
⎢ −1.270 −0.473 ⎥
⎢  1.564 −0.366 ⎥
⎢  1.440 −0.308 ⎥
⎢  0.921  0.584 ⎥
⎢  1.054  1.163 ⎥
⎢ −0.762  1.051 ⎥
⎢  0.083 −2.158 ⎥
⎢ −0.542  1.463 ⎥
⎣  0.418 −0.217 ⎦  and  Δ = diag{0.878, 0.351}.  (64)
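The numbers reported so far for the example are easy to reproduce. The sketch below (NumPy; our own code, not from the original article) recomputes γ_{1,1} and α₁ from the raw table Y_[1] of Eq. (53), and the percentages of inertia of Table 2 from the singular values:

```python
import numpy as np

# Assessor 1's raw table (Eq. 53)
Y1 = np.array([
    [8, 6, 7, 4, 1, 6], [7, 5, 8, 1, 2, 8], [6, 5, 6, 5, 3, 4],
    [9, 6, 8, 4, 3, 5], [2, 2, 2, 8, 7, 3], [3, 4, 4, 9, 6, 1],
    [5, 3, 5, 4, 8, 3], [5, 2, 4, 8, 7, 4], [8, 6, 8, 4, 4, 7],
    [4, 6, 2, 5, 3, 4], [8, 4, 8, 1, 3, 3], [5, 3, 6, 4, 4, 2]], float)

# Center each column and scale it to unit sum of squares (Eq. 54)
X1 = Y1 - Y1.mean(axis=0)
X1 /= np.linalg.norm(X1, axis=0)

gamma_11 = np.linalg.svd(X1, compute_uv=False)[0]
print(round(gamma_11, 3), round(1 / gamma_11**2, 3))   # ~2.037 and ~0.241 (Eqs. 57, 60)

# Percentages of inertia from the singular values of Table 2
delta = np.array([0.878, 0.351, 0.301, 0.276, 0.244, 0.198,
                  0.176, 0.158, 0.137, 0.116, 0.106])
lam = delta**2
print(np.round(100 * lam / lam.sum()).astype(int))     # [61 10 7 6 5 3 2 2 1 1 1]
```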
Factor Scores

The factor scores for X give the best two-dimensional representation of the compromise of the K tables. Using Eqs (18), (20), and (64), we obtain:

F = P Δ = X A Q =
⎡ −0.980  0.163 ⎤
⎢ −0.809  0.033 ⎥
⎢ −0.761 −0.454 ⎥
⎢ −1.115 −0.166 ⎥
⎢  1.373 −0.128 ⎥
⎢  1.264 −0.108 ⎥
⎢  0.808  0.205 ⎥
⎢  0.925  0.408 ⎥
⎢ −0.669  0.369 ⎥
⎢  0.073 −0.757 ⎥
⎢ −0.476  0.513 ⎥
⎣  0.367 −0.076 ⎦ .  (65)

In the F matrix, each row represents a wine and each column is a component. Figure 2a shows the wines in the space created by the first two components. The first component (with an eigenvalue equal to λ₁ = 0.878² = 0.770) explains 61% of the inertia and contrasts the French and New Zealand wines. The second component (with an eigenvalue of λ₂ = 0.351² = 0.123) explains 10% of the inertia and is more delicate to interpret from the wines alone (its interpretation becomes clearer after looking at the loadings).

Partial Factor Scores

The partial factor scores (which are the projections of the tables onto the compromise) are computed from Eq. (22). For example, for the first assessor, the partial factor scores of the 12 wines are obtained as

F_[1] = K α₁ X_[1] Q_[1] = 10 × 0.241 × X_[1] Q_[1] =
⎡ −1.037  0.155 ⎤
⎢ −1.179  0.596 ⎥
⎢ −0.213 −0.104 ⎥
⎢ −0.946  0.446 ⎥
⎢  1.546 −0.676 ⎥
⎢  1.176 −0.747 ⎥
⎢  0.698  0.166 ⎥
⎢  1.006 −0.063 ⎥
⎢ −0.922  0.486 ⎥
⎢  0.189 −0.936 ⎥
⎢ −0.643  0.640 ⎥
⎣  0.323  0.036 ⎦ ,  (66)

and are displayed in Figure 3 (i.e., the first top left panel; see also Figure 2). From Figure 3 we can see that, for all the assessors, Component 1 separates the New Zealand from the French Sauvignon Blancs, a configuration replicating the pattern seen in the compromise (Figure 2a). However, the assessors show large inter-individual differences in how they rated the Canadian wines.

The original variables are analyzed, as in standard PCA, by computing loadings, which are given in Table 3 for the first two dimensions. The loadings are also plotted in a biplot fashion in Figure 3. Here we show the partial factor scores (of the wines) along with the loadings for each assessor (which we have re-scaled so that their variance, for each dimension, is equal to the singular value of the compromise). From these plots, we can see that, for all the assessors, the New Zealand wines are rated as having a more cat-pee aroma, with some green pepper and passion fruit, while the French wines are rated as being more mineral, smoky, or hay-like.

Determining the Importance of the Tables in the Compromise

There are two ways to determine which tables play the largest role in the compromise: contributions and partial inertias. The contribution of a table reflects the proportion of the variance of a dimension that can be attributed to this table (see Eq. (28)). The larger the contribution of a table to a component, the more important this table is for this component. The contributions for the tables are as follows:

ctr_{k,ℓ} =
⎡ 0.101 0.095 ⎤
⎢ 0.100 0.068 ⎥
⎢ 0.101 0.152 ⎥
⎢ 0.096 0.049 ⎥
⎢ 0.098 0.063 ⎥
⎢ 0.101 0.104 ⎥
⎢ 0.102 0.224 ⎥
⎢ 0.096 0.134 ⎥
⎢ 0.100 0.053 ⎥
⎣ 0.105 0.057 ⎦ .  (67)
TABLE 3 α-Weights, Loadings (Q), Squared Loadings, and Contributions for the MFA of X, given for each assessor (Assessors 1 to 10) and each of the assessor's variables, for Dimensions 1 and 2.
V1, Cat Pee; V2, Passion Fruit; V3, Green Pepper; V4, Mineral; V5, Smoky; V6, Citrus; V7, Tropical; V8, Leafy; V9, Grassy; V10, Flinty; V11, Vegetal; V12, Hay; V13, Melon; V14, Grass; V15, Peach.
The contributions can also be plotted to obtain a visual representation of the importance of the studies. Figure 4 shows the relative contributions of each of the tables to Components 1 and 2. From this figure, we can see that Assessor 10 contributes the most to the first component of the compromise, while Assessor 7 contributes most to the second component. Assessor 5, by contrast, contributes the least to both Components 1 and 2.

In addition to using contributions, we can determine a table's importance with the partial inertia, which gives the proportion of the compromise variance (i.e., inertia) explained by the table. This is obtained by multiplying, for each dimension, the table contribution by the eigenvalue of the dimension. For our example, the partial inertias, denoted I_partial, are as follows:

$$\mathbf{I}_{\mathrm{partial}} = \begin{bmatrix} 0.0779 & 0.0117 \\ 0.0771 & 0.0084 \\ 0.0778 & 0.0186 \\ 0.0743 & 0.0060 \\ 0.0751 & 0.0078 \\ 0.0776 & 0.0128 \\ 0.0787 & 0.0275 \\ 0.0736 & 0.0165 \\ 0.0771 & 0.0065 \\ 0.0810 & 0.0070 \end{bmatrix}. \qquad (68)$$

Like the contributions, the partial inertia can be plotted to get a visual representation. From Figure 5, we see that Assessor 10 accounts for the most inertia on the first dimension, while Assessor 5 accounts for the lowest proportion of the total variance.
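Continuing the sketch above, the partial inertias are simply the contributions rescaled by the eigenvalues (again with our names, not the paper's):

```python
# ctr and delta come from the previous sketches
lam = delta ** 2
I_partial = ctr * lam                  # Eq. (68): table contribution x eigenvalue
print(I_partial[:, :2].round(4))       # ~ the two columns shown in Eq. (68)
print(I_partial.sum(axis=0).round(3))  # column sums recover the eigenvalues
```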
Supplementary Table
As we would like to know what qualities of the wines are associated with the assessors' ratings, we wanted to include some chemical components of the wines as variables, namely, titratable acidity, pH, alcohol, and residual sugar. The values for these variables are shown in Table 4. However, because these properties are qualitatively different from the assessors' ratings (i.e., they did not come from the same population), we did not want to include them as active elements in the analysis and therefore projected them as a supplementary table.
FIGURE 2| Compromise of the 10 tables. (a) Factor scores (wines). (b) Assessors' partial factor scores projected into the compromise as supplementary elements. Each assessor is represented by a dot, and for each wine a line connects the wine factor scores to the partial factor scores of a given assessor for this wine. (λ1 = 0.770, τ1 = 61%; λ2 = 0.123, τ2 = 10%).
To obtain the factor scores, the first step is to center and normalize each variable of the supplementary data table. Then, after computing its first singular value (equal to 1.3867), each element of the centered and normalized table is divided by this singular value. This gives the matrix Xsup (whose first singular value is now equal to 1):

$$\mathbf{X}_{\sup} = \begin{bmatrix} -0.094 & 0.081 & 0.315 & 0.139 \\ -0.152 & 0.171 & 0.143 & 0.288 \\ 0.023 & 0.015 & 0.315 & 0.139 \\ 0.470 & -0.032 & 0.143 & 0.362 \\ -0.210 & 0.213 & -0.200 & -0.234 \\ -0.039 & -0.146 & -0.200 & -0.110 \\ -0.307 & 0.051 & -0.029 & -0.408 \\ -0.094 & 0.093 & -0.372 & -0.085 \\ 0.295 & 0.033 & -0.029 & 0.089 \\ -0.074 & 0.111 & 0.143 & -0.085 \\ 0.023 & 0.033 & -0.200 & 0.014 \\ 0.159 & -0.625 & -0.029 & -0.110 \end{bmatrix}. \qquad (69)$$

We then compute the supplementary loadings, which are obtained as (cf. Eq. (48)):

$$\mathbf{Q}_{\sup} = \mathbf{X}_{\sup}^{\mathsf T}\mathbf{M}\mathbf{P}\boldsymbol{\Delta}^{-1} = \begin{bmatrix} -0.125 & -0.009 \\ -0.024 & 0.032 \\ -0.173 & -0.298 \\ -0.201 & -0.037 \end{bmatrix}. \qquad (70)$$

From these loadings, we can see that, on Component 1, the New Zealand Sauvignon Blancs are more acidic, have a greater alcohol content, and have more residual sugar than the French wines.

Next, we compute the supplementary factor scores for the first two components as (cf. Eq. (49)):

$$\mathbf{F}_{\sup} = K\mathbf{X}_{\sup}\mathbf{Q}_{\sup} = K\mathbf{X}_{\sup}\mathbf{X}_{\sup}^{\mathsf T}\mathbf{M}\mathbf{P}\boldsymbol{\Delta}^{-1} = \begin{bmatrix} -0.727 & -0.954 \\ -0.677 & -0.463 \\ -0.857 & -0.986 \\ -1.556 & -0.615 \\ 1.030 & 0.771 \\ 0.651 & 0.594 \\ 1.241 & 0.281 \\ 0.910 & 1.178 \\ -0.506 & 0.035 \\ -0.011 & -0.353 \\ 0.281 & 0.600 \\ 0.219 & -0.089 \end{bmatrix}. \qquad (71)$$

In a manner similar to the biplot approach we used for the assessors (see Figure 3), we plotted together, in a biplot way, the supplementary partial factor scores and loadings for the chemical properties of the wines (see Figure 6). This figure confirms the interpretation that we reached from the numerical values.
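The whole supplementary projection might be sketched as follows, reusing m, P, and delta (truncated to the retained components) from the GSVD sketch above; the function name and the assumption that normalization scales each column to unit norm are ours:

```python
import numpy as np

def project_supplementary(X_raw, m, P, delta, K):
    """Sketch of Eqs. (48)-(49): project a supplementary table into the
    compromise space. X_raw is the raw table (here 12 x 4); m, P, delta
    come from the GSVD sketch above."""
    Xc = X_raw - X_raw.mean(axis=0)              # center each variable
    Xc = Xc / np.linalg.norm(Xc, axis=0)         # scale each column to norm 1
    # MFA normalization: divide by the table's first singular value so that
    # the normalized table's first singular value equals 1
    Xsup = Xc / np.linalg.svd(Xc, compute_uv=False)[0]
    Q_sup = Xsup.T @ (m[:, None] * P) / delta    # Q_sup = Xsup' M P Delta^-1
    F_sup = K * Xsup @ Q_sup                     # F_sup = K Xsup Q_sup
    return Q_sup, F_sup
```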
FIGURE 3| Partial factor scores and variable loadings for the first two dimensions of the compromise space. The loadings have been re-scaled to have a variance equal to the singular values of the compromise analysis.
FIGURE 4| Contributions of the tables to the compromise. The sizes of the assessors' icons are proportional to their contribution to Components 1 and 2.

FIGURE 5| Partial Inertias of the tables. The sizes of the assessors' icons are proportional to their explained inertia for Components 1 and 2.

Bootstrap
To estimate the stability of the compromise factor scores, we used a bootstrap approach. We generated 1000 bootstrap samples that gave 1000 estimated bootstrapped factor scores. For example, the first bootstrap sample was obtained by drawing, with replacement, a sample of K = 10 tables from the original set of tables. With this sample, we then computed the bootstrapped estimate of the factor scores as the projection, as a supplementary element, of the whole bootstrapped matrix. Interestingly (and conveniently for computational efficiency), these factor scores can be obtained directly as the average of the partial factor scores of the tables in the bootstrap sample. Specifically, the bootstrapped estimate of the factor scores from the first bootstrap sample is denoted F*1 and computed as: