Factor Analysis

Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science Center
Short course, October 27, 2016

Well-used latent variable models
Classified by the scale of the latent variable and the scale of the observed variables:
§ Continuous latent, continuous observed: factor analysis, LISREL
§ Continuous latent, discrete observed: discrete FA, IRT (item response)
§ Discrete latent, continuous observed: latent profile analysis, growth mixture models
§ Discrete latent, discrete observed: latent class analysis, latent class regression
General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS)

Objectives
§ What is factor analysis?
§ What do we need factor analysis for?
§ What are the modeling assumptions?
§ How to specify, fit, and interpret factor models?
§ What is the difference between exploratory and confirmatory factor analysis?
§ What is model identifiability, and how is it assessed?

What is factor analysis
§ Factor analysis is a theory-driven statistical data-reduction technique used to explain covariance among observed random variables in terms of fewer unobserved random variables, called factors.

An Example: General Intelligence (Charles Spearman, 1904)
[Path diagram: a single latent factor, General Intelligence (F), loading on five observed variables Y1-Y5, each with its own unique error δ1-δ5.]

Why Factor Analysis?
1. Testing of theory
§ Explain covariation among multiple observed variables by mapping variables to latent constructs (called "factors")
2. Understanding the structure underlying a set of measures
§ Gain insight into dimensions
§ Construct validation (e.g., convergent validity)
3. Scale development
§ Exploit redundancy to improve a scale's validity and reliability

Part I. Exploratory Factor Analysis (EFA)

One Common Factor Model: Model Specification
Y1 = λ1 F + δ1
Y2 = λ2 F + δ2
Y3 = λ3 F + δ3
[Path diagram: F loading on Y1-Y3, each with its unique error δi.]
§ The factor F is not observed; only Y1, Y2, Y3 are observed
§ δi represents the variability in Yi NOT explained by F
§ Yi is a linear function of F and δi

One Common Factor Model: Model Assumptions
§ Factorial causation
§ F is independent of δj, i.e., cov(F, δj) = 0
§ δi and δj are independent for i ≠ j, i.e., cov(δi, δj) = 0
§ Conditional independence: given the factor, the observed variables are independent of one another, i.e., cov(Yi, Yj | F) = 0

One Common Factor Model: Model Interpretation
Given all variables in standardized form, i.e., var(Yi) = var(F) = 1:
§ Factor loadings: λi = corr(Yi, F)
§ Communality of Yi: hi² = λi² = [corr(Yi, F)]² = % of the variance of Yi explained by F
§ Uniqueness of Yi: 1 - hi² = residual variance of Yi
§ Degree of factorial determination: Σ λi²/n, where n = number of observed variables Y

Two-Common Factor Model (Orthogonal): Model Specification
Y1 = λ11 F1 + λ12 F2 + δ1
Y2 = λ21 F1 + λ22 F2 + δ2
Y3 = λ31 F1 + λ32 F2 + δ3
Y4 = λ41 F1 + λ42 F2 + δ4
Y5 = λ51 F1 + λ52 F2 + δ5
Y6 = λ61 F1 + λ62 F2 + δ6
[Path diagram: factors F1 and F2 each loading on Y1-Y6, with unique errors δ1-δ6.]
F1 and F2 are common factors because they are shared by ≥ 2 variables.
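The loading and communality identities stated above for the one-common-factor model (loading = correlation with the factor, communality = squared loading) can be checked with a small simulation. The sketch below uses Python with NumPy and purely illustrative loading values; it is not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 100_000

# Illustrative loadings for a one-common-factor model Yi = λi*F + δi
lam = np.array([0.8, 0.6, 0.7])

# Standardized factor; unique variances chosen so that var(Yi) = 1
F = rng.standard_normal(n_obs)
delta = rng.standard_normal((n_obs, 3)) * np.sqrt(1.0 - lam**2)
Y = np.outer(F, lam) + delta

for i, l in enumerate(lam, start=1):
    r = np.corrcoef(Y[:, i - 1], F)[0, 1]
    print(f"Y{i}: corr(Y,F) ≈ {r:.3f} (loading λ = {l}), "
          f"communality h² = {l**2:.2f}, uniqueness = {1 - l**2:.2f}")
```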
Matrix Notation
With n variables and m factors:
Y(n×1) = Λ(n×m) F(m×1) + δ(n×1)
where Y = (Y1, …, Yn)′ is the vector of observed variables, Λ = {λij} is the n×m matrix of factor loadings, F = (F1, …, Fm)′ is the vector of common factors, and δ = (δ1, …, δn)′ is the vector of unique errors.

Factor Pattern Matrix
§ Columns represent derived factors
§ Rows represent input variables
§ Loadings represent the degree to which each of the variables "correlates" with each of the factors
§ Loadings range from -1 to 1
§ Inspection of the factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors
§ High loadings provide meaning and interpretation of the factors (~ regression coefficients)

Two-Common Factor Model (Orthogonal): Model Assumptions
§ Factorial causation
§ F1 and F2 are independent of δj, i.e., cov(F1, δj) = cov(F2, δj) = 0
§ δi and δj are independent for i ≠ j, i.e., cov(δi, δj) = 0
§ Conditional independence: given the factors F1 and F2, the observed variables are independent of one another, i.e., cov(Yi, Yj | F1, F2) = 0 for i ≠ j
§ Orthogonal (= independent): cov(F1, F2) = 0

Two-Common Factor Model (Orthogonal): Model Interpretation
Given all variables in standardized form, i.e., var(Yi) = var(Fi) = 1, AND orthogonal factors, i.e., cov(F1, F2) = 0:
§ Factor loadings: λij = corr(Yi, Fj)
§ Communality of Yi: hi² = λi1² + λi2² = % of the variance of Yi explained by F1 AND F2
§ Uniqueness of Yi: 1 - hi²
§ Degree of factorial determination: Σ λij²/n, where n = number of observed variables Y

Two-Common Factor Model: The Oblique Case
Given all variables in standardized form, i.e., var(Yi) = var(Fi) = 1, AND oblique factors (i.e., cov(F1, F2) ≠ 0):
§ The factor loading λij is no longer the correlation between Yi and Fj; it is the direct effect of Fj on Yi
§ The calculation of the communality of Yi (hi²) is more complex

Extracting initial factors
§ Least-squares method (e.g., principal axis factoring with iterated communalities)
§ Maximum likelihood method

Model Fitting: Extracting initial factors
Least-squares method (LS) (e.g., principal axis factoring with iterated communalities)
§ Goal: minimize the sum of squared differences between the observed and estimated correlation matrices
§ Fitting steps (a code sketch of these steps follows the MLE slide below):
  a) Obtain initial estimates of the communalities (h²), e.g., the squared correlation between a variable and the remaining variables
  b) Solve the objective function det(R_LS - ηI) = 0, where R_LS is the correlation matrix with h² in the main diagonal (also termed the adjusted correlation matrix) and η is an eigenvalue
  c) Re-estimate h²
  d) Repeat b) and c) until no improvement can be made

Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)
§ Goal: maximize the likelihood of producing the observed correlation matrix
§ Assumption: the distribution of the variables (Y and F) is multivariate normal
§ Objective function: det(R_MLE - ηI) = 0, where R_MLE = U⁻¹(R - U²)U⁻¹ = U⁻¹ R_LS U⁻¹ and U² = diag(1 - h²)
§ Iterative fitting algorithm similar to the LS approach
§ Exception: R is adjusted by giving greater weight to correlations with smaller unique variance, i.e., 1 - h²
§ Advantage: availability of a large-sample χ² significance test for goodness of fit (but it tends to select more factors for large n!)
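The iterated principal-axis procedure described in the least-squares slide can be sketched in a few lines of NumPy. This is a minimal illustration of steps a) to d) above (initial communalities from squared multiple correlations, eigendecomposition of the reduced correlation matrix, re-estimation of h² until convergence); it is not the exact routine used by any particular package, and the function name and example correlation matrix are hypothetical.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal-axis factoring of a correlation matrix R (illustrative sketch)."""
    # a) Initial communalities: squared multiple correlation of each variable
    #    with the remaining variables, 1 - 1/diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # b) Reduced (adjusted) correlation matrix: h² on the main diagonal
        R_adj = R.copy()
        np.fill_diagonal(R_adj, h2)
        eigval, eigvec = np.linalg.eigh(R_adj)
        idx = np.argsort(eigval)[::-1][:n_factors]            # leading eigenpairs
        loadings = eigvec[:, idx] * np.sqrt(np.clip(eigval[idx], 0.0, None))
        # c) Re-estimate communalities from the current loadings
        h2_new = np.sum(loadings**2, axis=1)
        # d) Stop when the communalities no longer change
        if np.max(np.abs(h2_new - h2)) < tol:
            return loadings, h2_new
        h2 = h2_new
    return loadings, h2

# Hypothetical correlation matrix for six variables measuring two constructs
R = np.array([
    [1.00, 0.60, 0.55, 0.15, 0.10, 0.12],
    [0.60, 1.00, 0.50, 0.12, 0.14, 0.10],
    [0.55, 0.50, 1.00, 0.10, 0.12, 0.15],
    [0.15, 0.12, 0.10, 1.00, 0.58, 0.52],
    [0.10, 0.14, 0.12, 0.58, 1.00, 0.55],
    [0.12, 0.10, 0.15, 0.52, 0.55, 1.00],
])
L, h2 = principal_axis_factoring(R, n_factors=2)
print(np.round(L, 2))   # unrotated loadings
print(np.round(h2, 2))  # communalities
```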
Choosing among Different Methods
Between MLE and LS:
§ LS is preferred with
  - few indicators per factor
  - equal loadings within factors
  - no large cross-loadings
  - no factor correlations
  - recovering factors with low loadings (overextraction)
§ MLE is preferred with
  - multivariate normality
  - unequal loadings within factors
§ Both MLE and LS may have convergence problems

Factor Rotation
§ The goal is simple structure
§ Make factors more easily interpretable
§ While keeping the number of factors and the communalities of the Ys fixed!!!
§ Rotation does NOT improve fit!

Factor Rotation
To do this we "rotate" the factors:
§ redefine the factors such that the "loadings" (or pattern matrix coefficients) on the various factors tend to be very high (-1 or 1) or very low (0)
§ intuitively, it makes sharper distinctions in the meanings of the factors

Factor Rotation (Intuitively)
[Figure: five variables plotted in the plane of factors F1 and F2; rotating the axes aligns them with the two clusters of variables.]

           Unrotated                 Rotated
        Factor 1  Factor 2      Factor 1  Factor 2
  x1      0.40      0.69         -0.8       0
  x2      0.40      0.69         -0.8       0
  x3      0.65      0.32         -0.6       0.4
  x4      0.69     -0.40          0         0.8
  x5      0.61     -0.35          0         0.7

Factor Rotation
§ Uses the "ambiguity" or non-uniqueness of the solution to make interpretation simpler
§ Where does the ambiguity come in?
§ The unrotated solution is based on the idea that each factor tries to maximize the variance explained, conditional on the previous factors
§ What if we take that away?
§ Then there is not one "best" solution

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Orthogonal: factors are independent
  - varimax: maximize the variance of the squared loadings across variables (sum over factors); goal: simplicity of interpretation of the factors (a code sketch appears at the end of this section)
  - quartimax: maximize the variance of the squared loadings across factors (sum over variables); goal: simplicity of interpretation of the variables
§ Intuition: in the previous picture, there is a right angle between the axes
§ Note: the "uniquenesses" remain the same!

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Oblique: factors are NOT independent; there is a change in the "angle"
  - oblimin: minimize the covariance of squared loadings between factors
  - promax: simplify an orthogonal rotation by making small loadings even closer to zero
  - target matrix: choose a "simple structure" a priori
§ Intuition: in the previous picture, the angle between the axes is not necessarily a right angle
§ Note: the "uniquenesses" remain the same!

Pattern versus Structure Matrix
§ In oblique rotation, one typically presents both a pattern matrix and a structure matrix
§ One also needs to report the correlation between the factors
§ The pattern matrix presents the usual factor loadings
§ The structure matrix presents the correlations between the variables and the factors
§ For orthogonal factors, pattern matrix = structure matrix
§ The pattern matrix is used to interpret the factors

Factor Rotation: Which to use?
§ The choice is generally not critical
§ Interpretation with an orthogonal rotation (varimax) is "simple" because the factors are independent: the "loadings" are correlations.
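For concreteness, the varimax criterion described above (maximize the variance of the squared loadings across variables) can be implemented directly with the standard SVD-based iteration. The sketch below is illustrative rather than the routine used by any specific package; it is applied to the hypothetical unrotated loadings from the earlier table, and the rotated result should resemble the right-hand panel of that table up to column order and sign.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (illustrative sketch)."""
    n, m = loadings.shape
    R = np.eye(m)                     # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion (variance of squared loadings)
        G = loadings.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / n)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        d_new = np.sum(s)
        if d_new < d * (1.0 + tol):   # no further improvement
            break
        d = d_new
    return loadings @ R

# Hypothetical unrotated loadings from the table above
A = np.array([[0.40,  0.69],
              [0.40,  0.69],
              [0.65,  0.32],
              [0.69, -0.40],
              [0.61, -0.35]])
A_rot = varimax(A)
print(np.round(A_rot, 2))                      # near-simple structure
# Communalities (row sums of squared loadings) are unchanged by the rotation
print(np.round(np.sum(A**2, axis=1), 2))
print(np.round(np.sum(A_rot**2, axis=1), 2))
```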