Factor Analysis
Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science Center
Short course, October 27, 2016

Well-used latent variable models
Models are distinguished by the scale of the latent variable and the scale of the observed variables:
§ Continuous latent, continuous observed: factor analysis, LISREL
§ Continuous latent, discrete observed: discrete FA, IRT (item response)
§ Discrete latent, continuous observed: latent profile analysis, growth mixture models
§ Discrete latent, discrete observed: latent class analysis, regression
General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS)

Objectives
§ What is factor analysis?
§ What do we need factor analysis for?
§ What are the modeling assumptions?
§ How to specify, fit, and interpret factor models?
§ What is the difference between exploratory and confirmatory factor analysis?
§ What is model identifiability, and how is it assessed?

What is factor analysis?
§ Factor analysis is a theory-driven statistical data reduction technique used to explain covariance among observed random variables in terms of fewer unobserved random variables, named factors

An Example: General Intelligence (Charles Spearman, 1904)
[Path diagram: a single common factor F, labeled "General Intelligence," with arrows to five observed variables Y1–Y5, each also receiving a unique error term δ1–δ5.]

Why Factor Analysis?
1. Testing of theory
§ Explain covariation among multiple observed variables by mapping variables to latent constructs (called "factors")
2. Understanding the structure underlying a set of measures
§ Gain insight into dimensions
§ Construct validation (e.g., convergent validity)
3. Scale development
§ Exploit redundancy to improve a scale's validity and reliability

Part I. Exploratory Factor Analysis (EFA)

One Common Factor Model: Model Specification
Y1 = λ1F + δ1
Y2 = λ2F + δ2
Y3 = λ3F + δ3
§ The factor F is not observed; only Y1, Y2, Y3 are observed
§ δi represents variability in Yi NOT explained by F
§ Yi is a linear function of F and δi

One Common Factor Model: Model Assumptions
§ Factorial causation
§ F is independent of δj, i.e. cov(F, δj) = 0
§ δi and δj are independent for i ≠ j, i.e. cov(δi, δj) = 0
§ Conditional independence: given the factor, the observed variables are independent of one another, i.e. cov(Yi, Yj | F) = 0

One Common Factor Model: Model Interpretation
Given all variables in standardized form, i.e. var(Yi) = var(F) = 1:
§ Factor loadings: λi = corr(Yi, F)
§ Communality of Yi: hi² = λi² = [corr(Yi, F)]² = % of the variance of Yi explained by F
§ Uniqueness of Yi: 1 − hi² = residual variance of Yi
§ Degree of factorial determination: Σ λi²/n, where n = # observed variables
(A short simulation sketch of this model appears below, after the two-factor specification.)

Two-Common Factor Model (Orthogonal): Model Specification
Y1 = λ11F1 + λ12F2 + δ1
Y2 = λ21F1 + λ22F2 + δ2
Y3 = λ31F1 + λ32F2 + δ3
Y4 = λ41F1 + λ42F2 + δ4
Y5 = λ51F1 + λ52F2 + δ5
Y6 = λ61F1 + λ62F2 + δ6
F1 and F2 are common factors because each is shared by ≥ 2 variables!
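The following short Python simulation (not from the slides; the loadings 0.8, 0.6, 0.7 and the sample size are arbitrary choices for illustration) generates data from the one-common-factor model and checks the interpretation given above: with standardized Yi and F, each loading equals corr(Yi, F), λi² is the communality, and 1 − λi² is the uniqueness.

```python
# A minimal simulation sketch of the one-common-factor model
# Y_i = lambda_i * F + delta_i (illustrative values, not from the course).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                          # sample size (large, so estimates are stable)
lam = np.array([0.8, 0.6, 0.7])      # hypothetical loadings lambda_1..lambda_3

F = rng.standard_normal(n)           # latent factor with var(F) = 1
# unique variances 1 - lambda_i^2 make each Y_i standardized: var(Y_i) = 1
delta = rng.standard_normal((n, 3)) * np.sqrt(1.0 - lam**2)
Y = F[:, None] * lam + delta         # observed variables, one column per Y_i

# with standardized variables, the loading equals corr(Y_i, F)
est = np.array([np.corrcoef(Y[:, i], F)[0, 1] for i in range(3)])
print("true loadings     :", lam)
print("corr(Y_i, F)      :", np.round(est, 2))
print("communalities h^2 :", np.round(lam**2, 2))      # % variance explained by F
print("uniquenesses      :", np.round(1 - lam**2, 2))  # residual variance
```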
Matrix Notation
With n variables and m factors, the model is
Y(n×1) = Λ(n×m) F(m×1) + δ(n×1), i.e.

⎡Y1⎤   ⎡λ11 … λ1m⎤ ⎡F1⎤   ⎡δ1⎤
⎢ ⋮⎥ = ⎢ ⋮   ⋱  ⋮⎥ ⎢ ⋮⎥ + ⎢ ⋮⎥
⎣Yn⎦   ⎣λn1 … λnm⎦ ⎣Fm⎦   ⎣δn⎦

Factor Pattern Matrix
§ Columns represent derived factors
§ Rows represent input variables
§ Loadings represent the degree to which each of the variables "correlates" with each of the factors
§ Loadings range from −1 to 1
§ Inspection of the factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors
§ High loadings provide the meaning and interpretation of the factors (~ regression coefficients)

Two-Common Factor Model (Orthogonal): Model Assumptions
§ Factorial causation
§ F1 and F2 are independent of δj, i.e. cov(F1, δj) = cov(F2, δj) = 0
§ δi and δj are independent for i ≠ j, i.e. cov(δi, δj) = 0
§ Conditional independence: given factors F1 and F2, the observed variables are independent of one another, i.e. cov(Yi, Yj | F1, F2) = 0 for i ≠ j
§ Orthogonal (= independent): cov(F1, F2) = 0

Two-Common Factor Model (Orthogonal): Model Interpretation
Given all variables in standardized form, i.e. var(Yi) = var(Fi) = 1, AND orthogonal factors, i.e. cov(F1, F2) = 0:
§ Factor loadings: λij = corr(Yi, Fj)
§ Communality of Yi: hi² = λi1² + λi2² = % of the variance of Yi explained by F1 AND F2
§ Uniqueness of Yi: 1 − hi²
§ Degree of factorial determination: Σ λij²/n, where n = # observed variables

Two-Common Factor Model: The Oblique Case
Given all variables in standardized form, i.e. var(Yi) = var(Fi) = 1, AND oblique factors (i.e. cov(F1, F2) ≠ 0):
§ The interpretation of the factor loading λij changes: it is no longer the correlation between Yi and Fj; it is the direct effect of Fj on Yi
§ The calculation of the communality of Yi (hi²) is more complex

Extracting initial factors
§ Least-squares method (e.g. principal axis factoring with iterated communalities)
§ Maximum likelihood method

Model Fitting: Extracting initial factors
Least-squares method (LS) (e.g. principal axis factoring with iterated communalities)
v Goal: minimize the sum of squared differences between the observed and estimated correlation matrices
v Fitting steps (a NumPy sketch of these steps appears below, after the MLE method):
a) Obtain initial estimates of the communalities (h²), e.g. the squared multiple correlation between a variable and the remaining variables
b) Solve the objective function det(R_LS − ηI) = 0, where R_LS is the correlation matrix with the h² in the main diagonal (also termed the adjusted correlation matrix) and η is an eigenvalue
c) Re-estimate the h²
d) Repeat b) and c) until no improvement can be made

Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)
v Goal: maximize the likelihood of producing the observed correlation matrix
v Assumption: the distribution of the variables (Y and F) is multivariate normal
v Objective function: det(R_MLE − ηI) = 0, where R_MLE = U⁻¹(R − U²)U⁻¹ = U⁻¹ R_LS U⁻¹ and U² = diag(1 − h²)
v Iterative fitting algorithm similar to the LS approach
v Exception: adjusts R by giving greater weight to correlations with smaller unique variance, i.e. 1 − h²
v Advantage: availability of a large-sample χ² significance test for goodness of fit (but it tends to select more factors for large n!)
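To make the LS fitting steps concrete, here is a rough NumPy sketch of principal axis factoring with iterated communalities (my own illustration, not code from the course; the initial communalities use squared multiple correlations, and the 5×5 correlation matrix is hypothetical).

```python
# A rough sketch of principal axis factoring with iterated communalities,
# following steps a)-d) above.  R: correlation matrix; m: number of factors.
import numpy as np

def principal_axis(R, m, n_iter=100, tol=1e-6):
    # step a) initial communalities: squared multiple correlation of each
    # variable with the rest, computed as 1 - 1/diag(R^-1)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        # step b) adjusted correlation matrix with h^2 on the diagonal;
        # det(R_LS - eta*I) = 0 is an ordinary eigenvalue problem
        R_ls = R.copy()
        np.fill_diagonal(R_ls, h2)
        eta, V = np.linalg.eigh(R_ls)               # eigenvalues, ascending
        eta, V = eta[::-1][:m], V[:, ::-1][:, :m]   # keep the top m
        Lam = V * np.sqrt(np.clip(eta, 0.0, None))  # loadings Lambda = V * sqrt(eta)
        # step c) re-estimate the communalities from the current loadings
        h2_new = (Lam**2).sum(axis=1)
        if np.max(np.abs(h2_new - h2)) < tol:       # step d) stop when stable
            break
        h2 = h2_new
    return Lam, h2

# hypothetical correlation matrix with two clusters of variables
R = np.array([[1.0, 0.6, 0.5, 0.1, 0.1],
              [0.6, 1.0, 0.5, 0.1, 0.1],
              [0.5, 0.5, 1.0, 0.1, 0.1],
              [0.1, 0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.1, 0.6, 1.0]])
Lam, h2 = principal_axis(R, m=2)
print("loadings:\n", np.round(Lam, 2))
print("communalities:", np.round(h2, 2))
```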
Choosing among Different Methods
§ Between MLE and LS, LS is preferred with:
v few indicators per factor
v equal loadings within factors
v no large cross-loadings
v no factor correlations
v recovering factors with low loadings (overextraction)
§ MLE is preferred with:
v multivariate normality
v unequal loadings within factors
§ Both MLE and LS may have convergence problems

Factor Rotation
§ The goal is simple structure
§ Make factors more easily interpretable
§ While keeping the number of factors and the communalities of the Ys fixed!!!
§ Rotation does NOT improve fit!

Factor Rotation
To do this we "rotate" the factors:
§ redefine the factors such that the "loadings" (or pattern matrix coefficients) on the various factors tend to be very high (−1 or 1) or very low (0)
§ intuitively, this makes sharper distinctions in the meanings of the factors

Factor Rotation (Intuitively)
[Figure: the five variables plotted as points in the (F1, F2) loading plane, before and after rotation; variables 1 and 2 coincide, and the rotation aligns the axes with the cluster {1, 2} and the cluster {4, 5}, with variable 3 in between.]

Unrotated loadings:               Rotated loadings:
      Factor 1   Factor 2               Factor 1   Factor 2
x1      0.40       0.69          x1      -0.8        0
x2      0.40       0.69          x2      -0.8        0
x3      0.65       0.32          x3      -0.6        0.4
x4      0.69      -0.40          x4       0          0.8
x5      0.61      -0.35          x5       0          0.7

Factor Rotation
§ Uses the "ambiguity," or non-uniqueness, of the solution to make interpretation simpler
§ Where does the ambiguity come in?
§ The unrotated solution is based on the idea that each factor tries to maximize the variance explained, conditional on the previous factors
§ What if we take that away?
§ Then there is not one "best" solution

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Orthogonal: the factors are independent
§ varimax: maximize the variance of the squared loadings across variables (sum over factors)
v Goal: simplicity of interpretation of the factors
§ quartimax: maximize the variance of the squared loadings across factors (sum over variables)
v Goal: simplicity of interpretation of the variables
§ Intuition: in the previous picture, there is a right angle between the axes
§ Note: the "uniquenesses" remain the same! (A NumPy sketch of varimax appears at the end of this section.)

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Oblique: the factors are NOT independent; the "angle" between them changes
§ oblimin: minimize the covariance of the squared loadings between factors
§ promax: simplify an orthogonal rotation by making small loadings even closer to zero
§ target matrix: choose a "simple structure" a priori
§ Intuition: in the previous picture, the angle between the axes is not necessarily a right angle
§ Note: the "uniquenesses" remain the same!

Pattern versus Structure Matrix
§ In oblique rotation, one typically presents both a pattern matrix and a structure matrix
§ One also needs to report the correlation between the factors
§ The pattern matrix presents the usual factor loadings
§ The structure matrix presents the correlations between the variables and the factors
§ For orthogonal factors, pattern matrix = structure matrix
§ The pattern matrix is used to interpret the factors

Factor Rotation: Which to Use?
§ The choice is generally not critical
§ Interpretation with orthogonal rotation (varimax) is "simple" because the factors are independent: the "loadings" are correlations
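As an illustration of orthogonal rotation, here is a plain-NumPy varimax sketch (my own implementation of the standard Kaiser criterion, not code from the course), applied to the unrotated loadings from the table above. Up to column signs and ordering, it should approximately reproduce the rotated loadings shown there, while leaving the communalities, and hence the uniquenesses, unchanged.

```python
# A varimax rotation sketch in plain NumPy (standard Kaiser criterion).
import numpy as np

def varimax(Phi, gamma=1.0, n_iter=50, tol=1e-8):
    """Orthogonally rotate the loading matrix Phi toward simple structure."""
    p, k = Phi.shape
    T = np.eye(k)                      # accumulated rotation matrix
    d_old = 0.0
    for _ in range(n_iter):
        L = Phi @ T
        # gradient of the varimax criterion, projected back onto the
        # set of orthogonal rotations via an SVD
        B = Phi.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d = s.sum()
        if d_old != 0.0 and d < d_old * (1 + tol):  # stop when criterion stabilizes
            break
        d_old = d
    return Phi @ T

# unrotated loadings from the illustration (x1..x5 on two factors)
Phi = np.array([[0.40,  0.69],
                [0.40,  0.69],
                [0.65,  0.32],
                [0.69, -0.40],
                [0.61, -0.35]])
L = varimax(Phi)
print("rotated loadings:\n", np.round(L, 2))
# rotation leaves the communalities (row sums of squared loadings) unchanged:
print("before:", np.round((Phi**2).sum(axis=1), 2))
print("after :", np.round((L**2).sum(axis=1), 2))
```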