Patrick Mair Modern Psychometrics with R Use R!
Total Page:16
File Type:pdf, Size:1020Kb
UseR! Patrick Mair Modern Psychometrics with R Use R! Series Editors Robert Gentleman Kurt Hornik Giovanni Parmigiani More information about this series at http://www.springer.com/series/6991 Patrick Mair Modern Psychometrics with R 123 Patrick Mair Department of Psychology Harvard University Cambridge, MA, USA ISSN 2197-5736 ISSN 2197-5744 (electronic) Use R! ISBN 978-3-319-93175-3 ISBN 978-3-319-93177-7 (eBook) https://doi.org/10.1007/978-3-319-93177-7 Library of Congress Control Number: 2018946544 © Springer International Publishing AG, part of Springer Nature 2018 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Preface We are living in exciting times when it comes to statistical applications in psychology. The way statistics is used (and maybe even perceived) in psychology has drastically changed over the last few years – computationally as well as methodologically. From a computational perspective, up to some years ago many psychologists considered R as an analysis tool used by the nerdiest quants only. In recent years, R has taken the field of psychology by storm, to the point that it can now safely be considered the lingua franca for statistical data analysis in psychology. R is open source, makes statistical analyses reproducible, and comes with a vast amount of statistical methodology implemented in a myriad of packages. Speaking of packages, in the past decade a flourishing psychometric developer community has evolved; their work has made R a powerful engine for performing all sorts of psychometric analyses. From a methodological perspective, the latent continuum of statistical methods in modern psychology ranges from old school classical test theory approaches to funky statistical and machine learning techniques. This book is an attempt to pay tribute to this expansive methodology spectrum by broadening the standard “psychological measurement” definition of psychometrics toward the statistical needs of a modern psychologist. An important aspect of this book is that in several places uncommon, maybe even exotic techniques are presented. From my own experience consulting and teaching at a psychology department, these techniques turned out to be very useful to the students and faculty when analyzing their data. The broad methodological range covered in this book comes at a price: It is not possible to provide in-depth narrative details and methodological elaborations on various methods. Indeed, an entire book could probably be written about each single chapter. At the end of the day, the goal of this book is to give the reader a starting point when analyzing data using a particular method (including fairly advanced versions for some of them), and to hopefully motivate him/her to delve deeper into a method based on the additional literature provided in the respective sections, if needed. In the spirit of Springer’s useR! series, each method is presented according to the following triplet: (1) when to apply it and what is it doing, (2) how to compute it in R, and (3) how to present, visualize, and interpret the results. v vi Preface Throughout the book an effort has been made to use modern, sparkling, real-life psychological datasets instead of standard textbook data. This implies that results do not always turn out perfectly (e.g., assumptions are not always perfectly fulfilled, models may not always fit perfectly, etc.), but it should make it easier for the reader to adapt these techniques to his/her own datasets. All datasets are included in the MPsychoR package available on CRAN. The R code for reproducing the entire book is available from the book website. The target audience of this manuscript is primarily graduate students and postdocs in psychology, thus the language is kept fairly casual. It is assumed that the reader is familiar with the R syntax and has a solid understanding of basic statistical concepts and linear models and preferably also generalized linear models and mixed-effects models for some sections. In order to be able to follow some of the chapters, especially those on multivariate exploratory techniques, basic matrix algebra skills are needed. In addition, some sections cover Bayesian modeling approaches. For these parts, the reader needs to be familiar with basic Bayesian principles. Note that no previous exposure to psychometric techniques is required. However, if needed, the reader may consult more narrative texts on various topics to gain a deeper understanding, since explanations in this book are kept fairly crisp. The book is organized as follows. It starts with the simplest and probably oldest psychometric model formulation: the true score model within the context of classical test theory, with special emphasis placed on reliability. The simple reliability concept is then extended to a more general framework called general- izability theory. Chapter 2 deals with factor analytic techniques, exploratory as well as confirmatory. The sections that cover confirmatory factor analysis (CFA) focus on some more advanced techniques such as CFA with covariates, longitudinal, multilevel, and multigroup CFA. As a special exploratory flavor, Bayesian factor analysis is introduced as well. The CFA sections lay the groundwork for Chap. 4 on structural equation models (SEM). SEM is introduced from a general path modeling perspective including combined moderator-mediator models. Modern applications such as latent growth models are described as well. Chapter 4 covers what is often called modern test theory. The first four sections cover classical item response theory (IRT) methods for dichotomous and polytomous items. The remaining sections deal with some more advanced flavors such as modern DIF (differential item functioning) techniques, multidimensional IRT, longitudinal IRT, and Bayesian IRT. Chapter 5, the last parametric chapter for a while, tackles a special type of input data: preference data such as paired comparisons, rankings, and ratings. Then, a sequence of multivariate exploratory methods chapters begins. Chapter 6 is all about principal component analysis (PCA), including extensions such as three-way PCA and independent component analysis (ICA) with application on EEG (electroencephalography) data. What follows is a chapter on correspondence analysis which, in the subsequent section, is extended to a more general framework called Gifi. An important incarnation of Gifi is the possibility to fit a PCA on ordinal data or, more generally, mixed input scale levels. Chapter 9 covers in detail multidimensional scaling (MDS). Apart from standard exploratory MDS, various Preface vii sections illustrate extensions such as confirmatory MDS, unfolding, individual differences scaling, and Procrustes. Chapter 10 explores techniques for advanced visualization techniques by means of biplots, which can be applied to each of the preceding exploratory methods. The remaining chapters of the book cover various special techniques useful for modern psychological applications. Chapter 11 introduces correlation networks, a hot topic in clinical psychology at the moment. These simple network approaches are extended to various advanced techniques involving latent networks and Bayesian networks, the latter based on the concept of directed acyclic graphs. Chapter 12 is all about sophisticated parametric clustering techniques. By way of the concept of mixture distributions, various extensions of the well-known psychometric technique called latent class analysis are presented. This mixture distribution idea is then integrated into a regression framework, which leads to mixture regression and Dirichlet process regression. Since text data are of increasing importance in psychology, a special clustering approach called topic models is included as well. Many modern psychological experiments involve longitudinal or time series measurements on a single participant. Chapter 13 offers some possibilities for how to model such data. It starts with Hidden Markov models, yet another approach from the parametric clustering family. The subsequent section presents time series analysis (ARIMA models), where the main goal is to forecast future observations. Finally, functional data analysis is introduced, a modeling framework for