UCLA Department of Statistics Papers

Title R in Psychometrics and Psychometrics in R

Permalink https://escholarship.org/uc/item/90n874pr

Author Leeuw, Jan de

Publication Date 2006-08-11

Peer reviewed

R in Psychometrics and Psychometrics in R

Jan de Leeuw, UCLA Statistics

In psychometrics, and in the closely related fields of quantitative methods for the social and educational sciences, R is not yet used very often. Traditional mainframe packages such as SAS and SPSS are still dominant at the user level, Stata has made inroads at the teaching level, and Matlab is quite prominent at the research level.

In this paper we define the most visible techniques in the psychometrics area, we give an overview of what is available in R, and we discuss what is missing. We then outline a strategy and a project to fill in the gaps. The outcome will hopefully be a more prominent position of R in the social and behavioral sciences, and as a result less of a gap between these disciplines and mainstream statistics.


1. What is Psychometrics? How is it related to other Foometrics?

2. How much R is there in Psychometrics? Can there be more? Should there be more?

3. How much Psychometrics is there in R? Will there be more? What is missing?

A recent overview of what Psychometricians themselves think about Psychometrics is in Statistica Neerlandica, 60, 2006, 135-144.

If Foo is a science then Foo often has both an area Foometrics and an area Mathematical Foo.

Mathematical Foo applies mathematical modeling to the Foo subject area, while Foometrics develops and studies data analysis techniques for empirical data collected in Foo.

What we call statistics is the union of the various Foometrics over all Foo. Not the intersection, but the union.

Each of the social and behavioural sciences has a form of Foometrics, although they may not all use a name in this family.

Clearly Economics, Psychology, Biology, Archeology, Anthropology, and Environmental Science have their own Foometrics.

And then there are various recent upstarts such as Cliometrics, Informetrics, Bibliometrics, Behaviormetrics, Ecolometrics, Cybermetrics, and Scientometrics.

Sociology would like to have Sociometrics, but the name was already in use for something quite different. Historiometrics and Archeometrics are there, but struggling.

Education does not really have Educometrics, but we'll use it anyway.

Social sciences in which data are less prominent usually have books and conferences with titles such as Statistics in Foo -- they will have their very own Foometrics in the future.

In this presentation we'll look at Psychometrics and Educometrics, with a dash of Sociometrics and Econometrics.

Psychometrics and Educometrics have been around for a long time, at least since Galton, and their development has been very closely linked and often the two have been indistinguishable.

So we do not distort reality too much if we just simply call the body of techniques we discuss Psychometrics.

[Figure: diagram relating the techniques MDS, 3Mode, IRT, CA, FA, SEM, HLM, and LogLin to the areas Psychometrics, Educometrics, and Sociometrics.]

R in Psychometrics

Traditionally psychologists doing data analysis use SPSS, some use SAS.

Psychometricians developing data analysis techniques use Matlab, sociometricians and econometricians (at least in the US) tend to use Stata.

The situation in France or England may be quite different.

This has mainly historical reasons -- it has to do with where these packages originated.

But it also has to do with the rather large distance between areas such as psychometrics and (academic) statistics, which again has historical reasons, most of them silly. Typically, there is not much interaction, despite institutions like ETS and Bell Labs.

And thus the R revolution has largely passed psychometrics by.

Psychometric software is often distributed by incorporating it as modules in the standard packages (SPSS, SAS, Stata), using either native matrix routines if available or linking in compiled code. This guarantees good distribution, some money, but certainly not efficient computation.

Examples are CATEGORIES for CA in SPSS, PROC CALIS for SEM and PROC GLM for MLA in SAS, and gllamm for SEM and MLA in Stata.

In addition, psychometricians tend to write stand-alone packages for specific families of techniques. This is often compiled code combined with a suitable GUI.

The prototypical examples are SEM packages like LISREL, EQS, M-PLUS, AMOS, or MLA packages such as HLM or ML-WIN -- but there are many similar stand-alone packages for IRT and CA and LLA as well. In fact the number of CA packages in marketing, for example, is staggering.

Writing stand-alone compiled packages often means that the psychometrician is a small company, trying to make money. It also means a certain form of competition, which does not really belong in academia. And it means proprietary software, which costs money.

More serious, perhaps, is that this approach means black-box software, in which the machinery is almost completely hidden. This means the user often will not even try to understand what is going on.

The techniques implemented in the black-box packages are often complicated (many parameters, complicated optimizations, doubtful standard errors).

This is necessarily true: simpler techniques are already implemented in SAS or SPSS and usually the institution has a site license for those.

Thus we have Deus Ex Machina software: it transforms large datasets into rather mysterious pictures or tables that are nevertheless acceptable, and often even encouraged, by peers and journals.

Promoting the teaching and the use of R in psychometrics has some major advantages.

1. The distance to academic statistics becomes smaller.

2. Software is more transparent -- driven by interpreted code. Reproducible results are more likely.

3. One can teach with R. One can teach SAS, but one cannot teach with SAS (or LISREL).

4. Software should be free.

Psychometrics in R

We give a quick inventory of the psychometric software now available or soon to be available in R.

I shall concentrate on CRAN, of course, while mentioning some additional easily available packages on other servers.

We shall see there is quite an abundance, although in most cases any form of organization is lacking and duplications abound.

The psychoR project.

I have been writing and planning a substantial number of psychometric techniques in R. Eventually they will grow up to be packages.

They are not intended to replace existing packages: let a thousand flowers bloom. They are written following the familiar programming philosophy that you can write FORTRAN in any language. You can find them at http://www.cuddyvalley.org/psychoR

1. Simple and Multiple Correspondence Analysis.

There is CA and MCA both in MASS, in ade4, in FactoMineR, and in homals. Many variations (Canonical CA, Fuzzy CA, Detrended CA, Multiway CA, Discriminant CA, Co-CA) in ade4, PTAk, cocorresp, vegan, made4. At least three more CA packages (Greenacre, Beh, De Leeuw) with various options are currently being prepared.

An Embarrassment of riches.

JSS (www.jstatsoft.org) is planning a number of special issues, with appropriate guest editors, and names such as

-- R in Psychometrics
-- R in Econometrics
-- R in Sociometrics

and whatever else anyone suggests along these lines. Of course there is an inherent risk in actually making constructive suggestions -- you may wind up to be a guest editor.

The homals (soon gifi) package does what SPSS Categories does, and more. It has many forms of multivariate analysis with optimal scaling, organized as extensions of MCA. But it is rather poorly documented.

$$\min_{X'X=I}\ \min_{Y_j\in\mathcal{Y}_j}\ \sum_{j=1}^{m}\operatorname{tr}\,(X-G_jY_j)'(X-G_jY_j)$$

CA and MCA are extended in the psychoR project with distance association models (distassoc, scalassoc, singlepeaked, logithom), which also generalize many common IRT models.

2. Item Response Theory

ltm fits the simple Rasch model, the graded logistic model for polytomous data, and the linear multidimensional logistic model.

mprobit fits the multivariate binary probit model.

Logistic IRT is related to Gaussian ordination, implemented in various forms in VGAM.

More Rasch model fitting packages are on their way.

In psychoR we have

$$\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,\log\frac{\beta_{j\ell}\exp(\eta(x_i,y_{j\ell}))}{\sum_{\nu=1}^{k_j}\beta_{j\nu}\exp(\eta(x_i,y_{j\nu}))}$$

$$\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,\log\left[\Phi(\tau_{j\ell}-\eta(x_i,y_{j\ell}))-\Phi(\tau_{j,\ell-1}-\eta(x_i,y_{j,\ell-1}))\right]$$

$$\eta(x_i,y_j)=\begin{cases}x_i'y_j,\\ -\|x_i-y_j\|,\\ -\|x_i-y_j\|^2.\end{cases}$$

This covers most IRT models, and then some. There are also versions for marginal maximum likelihood estimation, and for cross tables with frequencies in the form

$$\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\log\lambda_{ij}-\lambda_{ij},\qquad \lambda_{ij}=\alpha_i\beta_j\exp(\eta(x_i,y_j))$$

This generalizes CA, the RC model, Quasi-Symmetry, and so on.

3. Factor Analysis (see also under SEM)

factanal in stats can do exploratory maximum likelihood factor analysis.

MCMCpack has some options for sampling from the posterior for ordinal and mixed factor models. These are related to IRT.

homals can do various forms of mixed data principal component analysis, which the French sometimes call FA. See also FactoMineR.

4. Three-mode Analysis

PTAk has various forms of k-mode component analysis or singular value decomposition, popular in psychometrics, chemometrics, and fMRI analysis.

Although there is a three-mode slot in the psychoR project, currently PTAk seems to cover most of the useful analysis.
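As an illustration of the exploratory maximum likelihood factor analysis listed under item 3, here is a minimal sketch using factanal from stats. The simulated data, the loading matrix `lambda`, and the variable names are assumptions made up for this example; nothing here comes from the psychoR project.

```r
# Simulate 200 observations on 6 variables driven by 2 latent factors.
# The data and the "true" loadings are hypothetical, for illustration only.
set.seed(1)
n <- 200
f <- matrix(rnorm(n * 2), n, 2)                    # latent factor scores
lambda <- rbind(c(0.8, 0), c(0.7, 0), c(0.6, 0),
                c(0, 0.8), c(0, 0.7), c(0, 0.6))   # assumed loading pattern
x <- f %*% t(lambda) + matrix(rnorm(n * 6, sd = 0.5), n, 6)
colnames(x) <- paste0("v", 1:6)

# Maximum likelihood factor analysis; varimax rotation is the default.
fit <- factanal(x, factors = 2)

print(fit$loadings, cutoff = 0.3)   # hide small loadings in the printout
fit$uniquenesses                    # estimated unique variances
```

With six variables and two factors the model has positive degrees of freedom, so factanal also reports a likelihood ratio test of the two-factor hypothesis when `fit` is printed.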

5. Structural Equations Models

sem fits SEMs using the RAM specification. This is quite general, and allows one to specify arbitrary path models with observed and latent variables.

In order to compete with the stand-alone programs sem may need various constraints, confirmatory analysis, asymptotically distribution free methods, ordinal variables, and hierarchical structures.

psychoR has a slot for least squares SEM. Find a patterned matrix A of coefficients and a matrix of transformed (quantified and standardized) variables B such that

$$\min_{A\in\mathcal{A}}\ \min_{B\in\mathcal{K}\cap\mathcal{S}}\ \sum_{i=1}^{n}\operatorname{tr}\,A'B'BA$$

Some of the blocks in B can also be "latent variables", which basically means they are completely missing and are only defined by the orthogonality constraints.

6. Multidimensional Scaling

There is non-metric MDS in MASS, labdsv, ecodist, vegan and xgobi/ggobi. These all use Kruskal-type least squares loss functions with step-size gradient optimization methods.

There is classic (Torgerson) metric MDS in stats, and Principal Coordinate Analysis (Gower) in ecodist, ade4, labdsv, and vegan.

psychoR has metric and non-metric least squares multidimensional scaling, including unfolding individual difference models, using the SMACOF majorization algorithm.

$$\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ijk}\,(\delta_{ijk}-d_{ij}(X_k))^2$$

It also has least squares squared-distance multidimensional scaling, using either the ALSCAL or the ELEGANT algorithm.

$$\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ijk}\,(\delta_{ijk}^2-d_{ij}^2(X_k))^2$$
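To make the difference between the existing MDS routines and a majorization approach concrete, here is a minimal sketch. It is not the psychoR code: it assumes a single dissimilarity matrix (m = 1), unit weights, and the metric case, and the helper names `stress` and `guttman` are made up for this example. It starts from the classical solution (cmdscale in stats), applies the standard unit-weight Guttman transform used in SMACOF, and also runs the Kruskal-type isoMDS from MASS for comparison. The built-in eurodist data are used only for convenience.

```r
library(MASS)                       # provides isoMDS

delta <- as.matrix(eurodist)        # dissimilarities between 21 cities
d0 <- as.vector(eurodist)           # the same values, lower triangle only
n <- nrow(delta)

# Raw stress with unit weights: sum over i < j of (delta_ij - d_ij(X))^2
stress <- function(x) sum((d0 - as.vector(dist(x)))^2)

# One majorization step (Guttman transform) for unit weights:
# X+ = B(X) X / n, with b_ij = -delta_ij / d_ij(X) for i != j
guttman <- function(x) {
  d <- as.matrix(dist(x))
  b <- -delta / ifelse(d > 0, d, Inf)   # zero where d_ij(X) = 0
  diag(b) <- 0
  diag(b) <- -rowSums(b)                # rows of B(X) sum to zero
  (b %*% x) / n
}

x <- cmdscale(delta, k = 2)         # classical (Torgerson) starting point
s <- stress(x)
for (iter in 1:100) {
  x <- guttman(x)
  s <- c(s, stress(x))              # record stress after each step
}

# Kruskal-type non-metric MDS from the same starting configuration
nonmetric <- isoMDS(delta, y = cmdscale(delta, k = 2), k = 2, trace = FALSE)

range(diff(s))                      # successive changes in stress
nonmetric$stress                    # Kruskal stress, in percent
```

The vector `s` records the stress after every step; majorization guarantees that it never increases, which is the main practical contrast with step-size gradient methods.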

We do not discuss HLM and LogLin because they are mostly outside Psychometrics.

In any case, it seems that quite a few procedures (in many cases packages) are available, and more are coming on line regularly.

It seems that providing more options and better plots will pay off in the long run, but GUI's and spreadsheet data editors (for instance, a diagram editor for SEM) also seem to be a necessary condition for acceptance.
