Using Generalized Ordinal Logistic Regression Models to Estimate Educational Data Xing Liu Eastern Connecticut State University, [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Modern Applied Statistical Methods Volume 11 | Issue 1 Article 21 5-1-2012 Ordinal Regression Analysis: Using Generalized Ordinal Logistic Regression Models to Estimate Educational Data Xing Liu Eastern Connecticut State University, [email protected] Hari Koirala Eastern Connecticut State University Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Liu, Xing and Koirala, Hari (2012) "Ordinal Regression Analysis: Using Generalized Ordinal Logistic Regression Models to Estimate Educational Data," Journal of Modern Applied Statistical Methods: Vol. 11 : Iss. 1 , Article 21. DOI: 10.22237/jmasm/1335846000 Available at: http://digitalcommons.wayne.edu/jmasm/vol11/iss1/21 This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState. Journal of Modern Applied Statistical Methods Copyright © 2012 JMASM, Inc. May 2012, Vol. 11, No. 1, 242-254 1538 – 9472/12/$95.00 Ordinal Regression Analysis: Using Generalized Ordinal Logistic Regression Models to Estimate Educational Data Xing Liu Hari Koirala Eastern Connecticut State University, Willimantic, CT The proportional odds (PO) assumption for ordinal regression analysis is often violated because it is strongly affected by sample size and the number of covariate patterns. To address this issue, the partial proportional odds (PPO) model and the generalized ordinal logit model were developed. However, these models are not typically used in research. One likely reason for this is the restriction of current statistical software packages: SPSS cannot perform the generalized ordinal logit model analysis and SAS requires data restructuring. This article illustrates the use of generalized ordinal logistic regression models to predict mathematics proficiency levels using Stata and compares the results from fitting PO models and generalized ordinal logistic regression models. Key words: Generalized ordinal logistic regression models, proportional odds models, partial proportional odds model, ordinal regression analysis, mathematics proficiency, stata, comparison. Introduction same across the categories of the ordinal Ordinal data in education are substantive. dependent variable. This means that for each Perhaps the most well-known model for predictor, the effect on the odds of being at or estimating an ordinal outcome variable is the below any category remains the same within the proportional odds (PO) model (Agresti, 1996, model. This restriction is referred to as the 2002, 2007; Anath & Kleinbaum, 1997; proportional odds, or the parallel lines, Armstrong & Sloan, 1989; Hardin & Hilbe, assumption. 2007; Long, 1997; Long & Freese, 2006; The assumption of proportional odds is McCullagh, 1980; McCullagh & Nelder, 1989; often violated, however, because it is strongly O’Connell, 2000, 2006; Powers & Xie, 2000). affected by sample size and the number of Current general-purpose statistical software covariate patterns – for example, including packages, such as SAS, SPSS and Stata, use this continuous covariates or interactions as the model as the default for ordinal regression predictors (Allison, 1999; Brant, 1990; analysis. The PO model is used to estimate the O’Connell, 2006). It is misleading and invalid to cumulative probability of being at or below a interpret results if this assumption is not tenable. particular level of a response variable, or being It has been suggested that the separate beyond a particular level, which is the underlying binary logistic regression models are complementary direction. In this model, the fitted and then are compared with the original effect of each predictor is assumed to be the PO model (Allison, 1999; Bender & Grouven, 1998; Brant, 1990; Clogg & Shihadeh, 1994; Long, 1997; O’Connell, 2000, 2006). Although this strategy would help researchers identify the Xing Liu is an Associate Professor of Research reason why the overall PO assumption is and Assessment in the Education Department. violated, it is not clear how a well-fitting Email him at: [email protected]. Hari Koirala parsimonious model with a violated PO is a Professor in the Education Department. assumption is developed and interpreted. Email him at: [email protected]. To address this issue, the partial proportional odds (PPO) model (Peterson & 242 XING LIU & HARI KOIRALA Harrell, 1990) and the generalized ordinal logit of predictors, such as, using computers for fun, model (Fu, 1998; Williams, 2006) were school work and to learn on their own. developed. The PPO model allows for Theoretical Framework interactions between a predictor variable that General Logistic Regression Model and the violates the PO assumption and different Proportional Odds Model categories of the ordinal outcome variable. The The binary logistic regression model analysis of a PPO model using SAS GENMOD estimates the odds of success or experiencing an procedure requires a restructured data set, which event for the dichotomous response variable includes a new binary variable indicating given a set of predictors. The logistic regression whether an individual is at or beyond a model can be defined as (Allison, 1999; Menard, particular level (O’Connell, 2006; Stokes, Davis 1995): & Koch, 2000). The generalized ordinal logit model ln(Y′ ) = logit [π(x)] developed by Fu (1998) and William (2006) π()x relaxes the PO assumption by allowing the effect = ln (1) − of each explanatory variable to vary across 1 π()x different cut points of the ordinal outcome =+ + +…+ variable without data restructuring. In addition, αβ11X β 2X 2β pX p this model estimates parameters differently from the PPO model using SAS. Williams’ gologit2 In an ordinal logistic regression model, program (2006) for Stata is a more powerful the outcome variable has more than two levels. extension of Fu’s gologit (1998); it can estimate It estimates the probability being at or below a the generalized ordered logit model, the PPO specific outcome level given a collection of model, the PO model and the logistic regression explanatory variables. The ordinal logistic model within one program. regression model can be expressed in the logit In educational research, the PO model is form (Liu, 2009; Long, 1997; Long & Freese, widely used. However, the use of the 2006) as follows: generalized ordinal logit model appears to be overlooked even in cases where the PO ln(Y′ ) = logit [π(x)] assumption is violated. One likely reason for this j is the restriction of current statistical software π()x packages: SPSS cannot perform the generalized = ln − ordinal logit model analysis and SAS requires 1 π()x data restructuring prior to data analysis, =+−αβ()X −β X −…−β X therefore, it is important to help educational j1122pp researchers better understand this model and (2) utilize it in practice. The purpose of this study is to illustrate where πj(x) = π(Y ≤ j | x1,x2, …, xp) is the the use of generalized ordinal logistic regression probability of being at or below category j, given models to predict mathematics proficiency levels a set of predictors, j =1, 2, …, J−1, αj are the cut using Stata and to compare the results of fitting points and β1, β2, …, βp are logit coefficients. PO and the generalized ordinal logistic When there are j categories, the PO model regression models. This article is an extension of estimates J-1 cut points. This PO model assumes previous research focusing on the PO model that the logit coefficient of any predictor is (Liu, 2009), and the Continuation Ratio model independent of categories, i.e., the coefficients with Stata (Liu, O’Connell, & Koirala, 2011). for the underlying binary models are the same Ordinal regression analyses are based on data across all cutpoints. The equal logit slope or the from the 2002 Educational Longitudinal Study proportional odds assumption can be assessed by (ELS) in which the ordinal outcome of students’ the Brant test (Brant, 1990), which estimates mathematics proficiency was forecast from a set logit coefficients for underlying binary logistic regressions, and provides the chi-square test 243 GENERALIZED ORDINAL LOGISTIC REGRESSION MODELS statistics for each predictor and the overall logit [π()Y > j| x1,x2,…,x ] model in Stata. p To estimate the ln (odds) of being at or π()Y > j| x12 , x ,...,x p below the jth category, the PO model can be = ln π()Y≤ j| x , x ,...,x rewritten as the following form: 12 p =+α (β X +β X +…+β X), logit [π()Yj|x1,x2,,x]≤… j1j12j2 pjp p (5) ≤ π()Y j | x12 , x ,..., x p = ln π()Y> j | x , x ,..., x α 12 p where, in both equations, j are the intercepts or cutpoints, and β , β , …, β are logit =+−−α ( β X β X −…−β X). 1j 2j pj j1122pp coefficients. This model estimates the odds of (3) being beyond a certain category relative to being at or below that category. A positive logit Thus, this model predicts cumulative logits coefficient generally indicates that an individual across J−1 response categories. The cumulative is more likely to be in a higher category as logits can then be used to calculate the estimated opposed to a lower category of the outcome cumulative odds and the cumulative variable. To estimate the odds of being at or probabilities being at or below the jth category. below a particular category, however, the signs Different