Ordinal Regression Models in Psychology: a Tutorial

Running head: ORDINAL MODELS IN PSYCHOLOGY 1 1 Ordinal Regression Models in Psychology: A Tutorial 1 2 2 Paul-Christian Bürkner & Matti Vuorre 1 3 Department of Psychology, University of Münster, Germany 2 4 Department of Psychology, Columbia University, USA 5 Author Note 6 Correspondence concerning this article should be addressed to Paul-Christian Bürkner, 7 Department of Psychology, University of Münster, Fliednerstrasse 21, 48149 Münster, 8 Germany. E-mail: [email protected] ORDINAL MODELS IN PSYCHOLOGY 2 9 Abstract 10 Ordinal variables, while extremely common in Psychology, are almost exclusively analysed 11 with statistical models that falsely assume them to be metric. This practice can lead to 12 distorted effect size estimates, inflated error rates, and other problems. We argue for the 13 application of ordinal models that make appropriate assumptions about the variables under 14 study. In this tutorial article, we first explain the three major ordinal model classes; the 15 cumulative, sequential and adjacent category models. We then show how to fit ordinal 16 models in a fully Bayesian framework with the R package brms, using data sets on stem cell 17 opinions and marriage time courses. Two appendices provide detailed mathematical 18 derivations of the models, and a third practical example that connects ordinal models to 19 Signal Detection Theory with confidence rating data. Ordinal models provide better 20 theoretical interpretation and numerical inference from ordinal data, and we recommend 21 their widespread adoption in Psychology. 22 Keywords: ordinal models, Likert items, signal detection theory, brms, R ORDINAL MODELS IN PSYCHOLOGY 3 23 Ordinal Regression Models in Psychology: A Tutorial 24 1 Introduction 25 Whenever a variable’s categories have a natural order, we speak of an ordinal variable 26 (Stevens, 1946). Ordinal data is ubiquitous in Psychology: Almost all data gathered with 27 questionnaires using Likert-type scales are ordinal. Assuming these variables to be metric is 28 problematic: As demonstrated by Liddell and Kruschke (2017), analysing ordinal data with 29 statistical models that assume metric variables, such as t-tests and ANOVA, can lead to low 30 correct detection rates, distorted effect size estimates, and inflated false alarm (type-I-error) 31 rates – a problem that cannot be solved by simply averaging over multiple ordinal items. 32 Historically, methods for analysing ordinal data were limited, although simple analyses – 33 such as the comparison between two groups – could be performed with non-parametric 34 approaches (Gibbons & Chakraborti, 2011). For more general analyses – regression-like 35 methods, in particular – there were few alternatives to incorrectly treating ordinal data as 36 either continuous or nominal. However, using a continuous or nominal model with ordinal 37 data leads to over- or under-estimating (respectively) the information provided by the data. 38 Therefore, whenever possible, researchers should use appropriate ordinal models instead. 39 Fortunately, recent advances in statistics and statistical software have provided many 40 options for approriate models of ordinal response variables. These methods are often 41 summarized under the term ordinal regression models. Nevertheless, application of these 42 methods remains limited, while the use of less appropriate linear regression models is 43 widespread (Liddell & Kruschke, 2017). Several reasons may underlie the persistence with 44 linear models for ordinal data: Researchers might not be aware of more appropriate methods, 45 or they may hesitate to use them because of the perceived complexity in applying or 46 interpreting them. Moreover, since closely related (or even the same) ordinal models are 47 referred to with very different names in different contexts, it may be difficult for researchers 48 to decide which model is most relevant for them. Finally, researchers may also feel compelled 49 to use “standard” analyses because journal editors and reviewers may be sceptical of any ORDINAL MODELS IN PSYCHOLOGY 4 50 “non-standard” approaches. Therefore, there is need for a review and practical tutorial of 51 ordinal models to facilitate their use in psychological research. This tutorial article provides 52 just that. 53 The structure of this paper is as follows. In Section 2, we briefly introduce three 54 common ordinal model classes. Section 3 is a practical tutorial on fitting ordinal models with 55 two sample data sets using the R statistical computing environment (R Core Team, 2017). 56 In Section 4, we further motivate the use of ordinal models, and provide practical guidelines 57 on selecting the appropriate model for different research questions and data sets. In two 58 appendices, we provide a detailed mathematical derivation and theoretical interpretation of 59 the ordinal models, and a third somewhat longer practical example that connects ordinal 60 models to Signal Detection Theory using confidence rating data. While the model classes we 61 describe are not new, the derivations, unifying notation, and software implementation 62 introduced in the present tutorial are new and hopefully guide researchers with the 63 application of ordinal models. 64 2 Ordinal model classes 65 A large number of parametric ordinal models can be found in the literature. 66 Confusingly, they all have their own names, and their interrelations are often unclear. 67 Fortunately, the vast majority of these models can be expressed within a framework of three 68 distinct model classes (Mellenbergh, 1995; Molenaar, 1983; Van Der Ark, 2001). These are 69 the Cumulative Model, the Sequential Model, and the Adjacent Category Model. Here, we 70 explain the rationale behind these models in sufficient detail to allow researchers to use them 71 and decide which model best fits their research question and data. A detailed mathematical 72 derivation and discussion is provided in Appendix A. In the following, we assume to have 73 observed an ordinal response variable Y with K + 1 categories from 0 to K. For example, Y 74 might be a set of responses to a Likert item with 5 response options (K = 4). 75 The cumulativel model (CM), sometimes also called graded response model (Samejima, ORDINAL MODELS IN PSYCHOLOGY 5 76 1997), assumes that the observed ordinal variable Y originates from the categorization of a 77 latent (not observable) continuous variable Y˜ . That is, there are K latent thresholds which 78 partition Y˜ into the K + 1 observable, ordered categories of Y . In the CM, we formulate a 79 linear model for the latent variable Y˜ and then transform it for use with the observed 80 ordinal variable Y . The categorization interpretation is natural for many Likert-item data 81 sets, where ordered verbal (or numerical) labels are used to get discrete responses about a 82 possibly continuous psychological variable. Due to the widespread use of Likert-items in 83 Psychology, the CM is possibly the most important ordinal model class for psychological 84 research. In Section 3.1, we illustrate its use in the context of a survey on stem cell research 85 opinions. In Appendix B, we use the CM to implement two Signal Detection Theoretic 86 models of recognition memory confidence ratings. 87 For many ordinal variables, the assumption of a single underlying, continuous variable 88 may not be fully appropriate. If the response can be understood as being the result of a 89 sequential process, such that a higher response category is possible only after all lower 90 categories are achieved, the sequential model (SM) as proposed by Tutz (1990) is usually 91 appropriate. For instance, a family can only decide to buy a second car if they had already 92 bought a first car in the past. Or, a couple can only decide to divorce in the 7th year if they 93 haven’t been divorced in their first six years of marriage. More formally, for every category k ˜ 94 we assume a latent continuous variable Yk mediating the transition between the kth and the 95 k + 1th category. The SM models those latent variables, and is especially useful for discrete 96 time data, as shown in Section 3.2. 97 The adjacent category model (ACM) directly predicts the decision between two 98 adjacent categories, instead of a continuous latent variable or a sequential process. This 99 model is also different to the CM and SM because there is no natural process leading to the 100 ACM, and it is instead chosen for its mathematical convenience. For instance, the ACM may 101 be a convenient choice if category-specific effects are of interest (i.e. a predictor having 102 different effects on different response categories). The ACM is the most widely used ordinal ORDINAL MODELS IN PSYCHOLOGY 6 103 model in item-response theory and is applied in many large scale assessment studies such as 104 PISA (OECD, 2017). Its use is illustrated in Section 3.1. 105 In all of these models, predictors are by default assumed to have the same effect on all 106 response categories, which may not always be an appropriate assumption. It is plausible that 107 a predictor may have, for instance, a higher impact on the lower categories of an item than 108 on its higher categories. In such a case, one can model predictors as having category specific 109 effects so that not one but K coefficients are estimated for this predictor. Doing so is 110 unproblematic in the SM and ACM, but may lead to negative probabilities in the CM and 111 thus problems in the model fitting. We will come back to this issue in the next section. An 112 overview of the three model classes, and how to apply them with the software package 113 described below, is shown in Box 1. 114 3 Fitting ordinal models in R 115 Although there are a number of software packages in the R statistical programming 116 environment (R Core Team, 2017) that allow modelling ordinal responses, here we will use 117 the brms package (Bürkner, 2017b, 2017a) for several reasons.

Ordinal Regression Models in Psychology: a Tutorial

Regression Models for Ordinal Responses: a Review of Methods and Applications

Ordinal Probit Functional Outcome Regression with Application to Computer-Use Behavior in Rhesus Monkeys

Regression Models for Ordinal Dependent Variables

Logistic, Multinomial, and Ordinal Regression

Gaussian Processes for Ordinal Regression

Support Vector Ordinal Regression

Ordinal Regression Models in Psychology: a Tutorial

Reduction from Cost-Sensitive Ordinal Ranking to Weighted Binary Classiﬁcation

Ordinal Logistic Regression Model: an Application to Pregnancy Outcomes

Ordinal Logistic and Probit Examples: SPSS and R

Ordinal Regression Earlier (“Analysis of Ordinal Contingency Tables”)

Multinomial and Ordinal Logistic Regression Analyses with Multi- Categorical Variables Using R