Chapter 14 Factor Analysis

Chapter 14 Factor analysis 14.1 INTRODUCTION Factor analysis is a method for investigating whether a number of variables of interest Y1, Y2, : : :, Yl, are linearly related to a smaller number of unobservable factors F1, F2, : : :, Fk . The fact that the factors are not observable disquali¯es regression and other methods previously examined. We shall see, however, that under certain conditions the hypothesized factor model has certain implications, and these implications in turn can be tested against the observations. Ex- actly what these conditions and implications are, and how the model can be tested, must be explained with some care. 14.2 AN EXAMPLE Factor analysis is best explained in the context of a simple example. Stu- dents entering a certain MBA program must take three required courses in ¯nance, marketing and business policy. Let Y1, Y2, and Y3, respectively, represent a student's grades in these courses. The available data consist of the grades of ¯ve students (in a 10-point numerical scale above the passing mark), as shown in Table 14.1. Table 14.1 Student grades Student Grade in: no. Finance, Y1 Marketing, Y2 Policy, Y3 1 3 6 5 2 7 3 3 3 10 9 8 4 3 9 7 5 10 6 5 c Peter Tryfos, 1997. This version printed: 14-3-2001. ° 2 Chapter 14: Factor analysis It has been suggested that these grades are functions of two underlying factors, F1 and F2, tentatively and rather loosely described as quantitative ability and verbal ability, respectively. It is assumed that each Y variable is linearly related to the two factors, as follows: Y1 = ¯10 + ¯11F1 + ¯12F2 + e1 Y2 = ¯20 + ¯21F1 + ¯22F2 + e2 (14:1) Y3 = ¯30 + ¯31F1 + ¯32F2 + e3 The error terms e1, e2, and e3, serve to indicate that the hypothesized relationships are not exact. In the special vocabulary of factor analysis, the parameters ¯ij are referred to as loadings. For example, ¯12 is called the loading of variable Y1 on factor F2. In this MBA program, ¯nance is highly quantitative, while marketing and policy have a strong qualitative orientation. Quantitative skills should help a student in ¯nance, but not in marketing or policy. Verbal skills should be helpful in marketing or policy but not in ¯nance. In other words, it is expected that the loadings have roughly the following structure: Loading on: Variable, Yi F1, ¯i1 F2, ¯i2 Y1 + 0 Y2 0 + Y3 0 + The grade in the ¯nance course is expected to be positively related to quantitative ability but unrelated to verbal ability; the grades in marketing and policy, on the other hand, are expected to be positively related to verbal ability but unrelated to quantitative ability. Of course, the zeros in the preceding table are not expected to be exactly equal to zero. By `0' we mean approximately equal to zero and by `+' a positive number substantially di®erent from zero. It may appear that the loadings can be estimated and the expectations tested by regressing each Y against the two factors. Such an approach, however, is not feasible because the factors cannot be observed. An entirely new strategy is required. Let us turn to the process that generates the observations on Y1, Y2 and Y3 according to (14.1). The simplest model of factor analysis is based on two assumptions concerning the relationships (14.1). We shall ¯rst describe these assumptions and then examine their implications. A1: The error terms ei are independent of one another, and such 2 that E(ei) = 0 and V ar(ei) = ¾i . 14.2 An example 3 One can think of each ei as the outcome of a random draw with replace- ment from a population of ei-values having mean 0 and a certain variance 2 ¾i . A similar assumption was made in regression analysis (Section 3.2). A2: The unobservable factors Fj are independent of one another and of the error terms, and are such that E(Fj) = 0 and V ar(Fj) = 1. In the context of the present example, this means in part that there is no relationship between quantitative and verbal ability. In more advanced models of factor analysis, the condition that the factors are independent of one another can be relaxed. As for the factor means and variances, the assumption is that the factors are standardized. It is an assumption made for mathematical convenience; since the factors are not observable, we might as well think of them as measured in standardized form. Let us now examine some implications of these assumptions. Each observable variable is a linear function of independent factors and error terms, and can be written as Yi = ¯i0 + ¯i1F1 + ¯i2F2 + (1)ei: The variance of Yi can be calculated by applying the result in Appendix A.11: 2 2 2 V ar(Yi) = ¯i1V ar(F1) + ¯i2V ar(F2) + (1) V ar(ei) 2 2 2 = ¯i1 + ¯i2 + ¾i : We see that the variance of Yi consists of two parts: 2 2 2 V ar(Yi) = ¯i1 + ¯i2 + ¾i : communality speci¯c variance | {z } |{z} The ¯rst, the communality of the variable, is the part that is explained by the common factors F1 and F2. The second, the speci¯c variance, is the part of the variance of Yi that is not accounted by the common factors. If the two factors were perfect predictors of grades, then e1 = e2 = e3 = 0 2 2 2 always, and ¾1 = ¾2 = ¾3 = 0: To calculate the covariance of any two observable variables, Yi and Yj, we can write Yi = ¯i0 + ¯i1F1 + ¯i2F2 + (1)ei + (0)ej; and Yj = ¯j0 + ¯j1F1 + ¯j2F2 + (0)ei + (1)ej; and apply the result in Appendix A.12 to ¯nd 4 Chapter 14: Factor analysis Cov(Yi; Yj) = ¯i1¯j1V ar(F1) + ¯i2¯j2V ar(F2) + (1)(0)V ar(ei) + (0)(1)V ar(ej) = ¯i1¯j1 + ¯i2¯j2: We can arrange all the variances and covariances in the form of the following table: Variable: Variable: Y1 Y2 Y3 2 2 2 Y1 ¯11 + ¯12 + ¾1 ¯21¯11 + ¯22¯12 ¯31¯11 + ¯32¯12 2 2 2 Y2 ¯11¯21 + ¯12¯22 ¯21 + ¯22 + ¾2 ¯21¯31 + ¯22¯32 2 2 2 Y3 ¯11¯31 + ¯12¯32 ¯21¯31 + ¯22¯32 ¯31 + ¯32 + ¾3 We have placed the variances of the Y variables in the diagonal cells of the table and the covariances o® the diagonal. These are the variances and covariances implied by the model's assumptions. We shall call this table the theoretical variance covariance matrix (see Appendix A.11). The matrix is symmetric, in the sense that the entry in row 1 and column 2 is the same as that in row 2 and column 1, and so on. Let us now turn to the available data. Given observations on the variables Y1, Y2, and Y3, we can calculate the observed variances and covariances of the variables, and arrange them in the observed variance covariance matrix: Variable: Variable: Y1 Y2 Y3 2 Y1 S1 S12 S13 2 Y2 S21 S2 S23 2 Y3 S31 S32 S3 2 Thus, S1 is the observed variance of Y1, S12 the observed covariance of Y1 and Y2, and so on. It is understood, of course, the S12 = S21, S13 = S31, and so on; the matrix, in other words, is symmetric. It can be easily con¯rmed that the observed variance covariance matrix for the data of Table 14.1 is as follows: 9:84 0:36 0:44 0:36 ¡5:04 3:84 ¡ 0 0:44 3:84 3:04 1 @ A 14.3 Factor loadings are not unique 5 On the one hand, therefore, we have the observed variances and covariances of the variables; on the other, the variances and covariances implied by the factor model. If the model's assumptions are true, we should be able to estimate the loadings ¯ij so that the resulting estimates of the theoretical variances and covariances are close to the observed ones. We shall soon see how these estimates can be obtained, but ¯rst let us examine an important feature of the factor model. 14.3 FACTOR LOADINGS ARE NOT UNIQUE We would like to demonstrate that the loadings are not unique, that is, that there exist an in¯nite number of sets of values of the ¯ij yielding the same theoretical variances and covariances. Consider ¯rst an arbitrary set of loadings de¯ning what we shall call Model A: Y1 = 0:5 F1 + 0:5 F2 + e1 Y2 = 0:3 F1 + 0:3 F2 + e2 Y = 0:5 F 0:5 F + e 3 1 ¡ 2 3 It is not di±cult to verify that the theoretical variance covariance matrix is 2 0:5 + ¾1 0:3 0 2 0:3 0:18 + ¾2 0 0 2 1 0 0 0:5 + ¾3 @ 2 2 2 A 2 For example, V ar(Y1) = (0:5) + (0:5) + ¾1 = 0:5 + ¾1; Cov(Y1; Y2) = (0:5)(0:3) + (0:5)(0:3) = 0:3; and so on. Next, consider Model B, having a di®erent set of ¯ij: Y1 = (p2=2) F1 + 0 F2 + e1 Y2 = (0:3p2) F1 + 0 F2 + e2 Y = 0 F (p2=2) F + e 3 1 ¡ 2 3 It can again be easily con¯rmed that the theoretical variances and covari- 2 ances are identical to those of Model A. For example, V ar(Y1) = (p2=2) + 2 2 2 (0) + ¾1 = 0:5 + ¾1; Cov(Y1; Y2) = (p2=2)(0:3p2) + (0)(0) = 0:3; and so on. Examine now panel (a) of Figure 14.1.

Chapter 14 Factor Analysis

Arxiv:1908.07390V1 [Stat.AP] 19 Aug 2019

Discriminant Function Analysis

Factor Analysis

Covid-19 Epidemiological Factor Analysis: Identifying Principal Factors with Machine Learning

T-Test & Factor Analysis

Statistical Analysis in JASP

Autocorrelation-Based Factor Analysis and Nonlinear Shrinkage Estimation of Large Integrated Covariance Matrix

Linear Discriminant Analysis 1

An Exploratory Factor Analysis and Reliability Analysis of the Student Online Learning Readiness (SOLR) Instrument

CHAPTER 4 Exploratory Factor Analysis and Principal Components

REFEREE's REPORT Title of the Paper

Factor Analysis and Logistic Regression for Forest Categorical and Quantitative Data