Multinomial Logistic Regression
Total Page:16
File Type:pdf, Size:1020Kb
26-Osborne (Best)-45409.qxd 10/9/2007 5:52 PM Page 390 26 MULTINOMIAL LOGISTIC REGRESSION CAROLYN J. ANDERSON LESLIE RUTKOWSKI hapter 24 presented logistic regression Many of the concepts used in binary logistic models for dichotomous response vari- regression, such as the interpretation of parame- C ables; however, many discrete response ters in terms of odds ratios and modeling prob- variables have three or more categories (e.g., abilities, carry over to multicategory logistic political view, candidate voted for in an elec- regression models; however, two major modifica- tion, preferred mode of transportation, or tions are needed to deal with multiple categories response options on survey items). Multicate- of the response variable. One difference is that gory response variables are found in a wide with three or more levels of the response variable, range of experiments and studies in a variety of there are multiple ways to dichotomize the different fields. A detailed example presented in response variable. If J equals the number of cate- this chapter uses data from 600 students from gories of the response variable, then J(J – 1)/2 dif- the High School and Beyond study (Tatsuoka & ferent ways exist to dichotomize the categories. Lohnes, 1988) to look at the differences among In the High School and Beyond study, the three high school students who attended academic, program types can be dichotomized into pairs general, or vocational programs. The students’ of programs (i.e., academic and general, voca- socioeconomic status (ordinal), achievement tional and general, and academic and vocational). test scores (numerical), and type of school How the response variable is dichotomized (nominal) are all examined as possible explana- depends, in part, on the nature of the variable. If tory variables. An example left to the interested there is a baseline or control category, then the reader using the same data set is to model the analysis could focus on comparing each of the students’ intended career where the possibilities other categories to the baseline. With three or consist of 15 general job types (e.g., school, more categories, whether the response variable manager, clerical, sales, military, service, etc.). is nominal or ordinal is an important consider- Possible explanatory variables include gender, ation. Since models for nominal responses achievement test scores, and other variables in can be applied to both nominal and ordinal the data set. response variables, the emphasis in this chapter 390 26-Osborne (Best)-45409.qxd 10/9/2007 5:52 PM Page 391 Multinomial Logistic Regression 391 is on extensions of binary logistic regression to ODDS models designed for nominal response vari- ables. Furthermore, a solid understanding of the The baseline model can be viewed as the set models for nominal responses facilitates master- of binary logistic regression models fit simulta- ing models for ordinal data. A brief overview of neously to all pairs of response categories. With models for ordinal variables is given toward the three or more categories, a binary logistic end of the chapter. regression model is needed for each (nonredun- A second modification to extend binary dant) dichotomy of the categories of the response logistic regression to the polytomous case is the variable. As an example, consider high school need for a more complex distribution for the program types from the High School and response variable. In the binary case, the distri- Beyond data set (Tatsuoka & Lohnes, 1988). bution of the response is assumed to be bino- There are three possible program types: aca- mial; however, with multicategory responses, = demic, general, and vocational. Let P(Yi aca- the natural choice is the multinomial distribu- = = demic), P(Yi general), and P(Yi vocational) tion, a special case of which is the binomial dis- be the probabilities of each of the program types tribution. The parameters of the multinomial for individual i. Recall from Chapters 24 and 25 distribution are the probabilities of the cate- that odds equal ratios of probabilities. In our gories of the response variable. example, only two of the three possible pairs of The baseline logit model, which is sometimes program types are needed because the third can also called the generalized logit model, is the be found by taking the product of the other two. starting point for this chapter because it is a Choosing the general program as the reference, well-known model, it is a direct extension of the odds of academic versus general and the binary logistic regression, it can be used with odds of vocational versus general equal ordinal response variables, and it includes explanatory variables that are attributes of indi- P(Y = academic) viduals, which is common in the social sciences. i (1a) ( = ) The baseline model is a special case of the condi- P Yi general tional multinomial logit model, which can P(Yi = vocational) include explanatory variables that are character- . (1b) P(Y = general) istics of the response categories, as well as attri- i butes of individuals. The third odds, academic versus vocational, A word of caution is warranted here. In the equals the product of the two odds in (1a) and literature, the term multinomial logit model some- (1b)—namely, times refers to the baseline model, and sometimes it refers to the conditional multinomial logit P(Y = academic) model. An additional potential source of confu- i P(Y = vocational) sion lies in the fact that the baseline model is i (2) ( = )/ ( = ) a special case of the conditional model, which in = P Yi academic P Yi general . turn is a special case of Poisson (log-linear) P(Yi = vocational)/P(Yi = general) regression.1 These connections enable researchers to tailor models in useful ways and test interesting More generally, let J equal the number of cate- hypotheses that could not otherwise be tested. gories or levels of the response variable. Of the J(J – 1)/2 possible pairs of categories, only (J – 1) of them are needed. If the same category is used in the denominator of the (J – 1) odds, then the set MULTINOMIAL REGRESSION MODELS of odds will be nonredundant, and all other pos- sible odds can be formed from this set. In the One Explanatory Variable Model baseline model, one response category is chosen The most natural interpretation of logistic as the baseline against which all other response regression models is in terms of odds and odds categories are compared. When a natural baseline ratios; therefore, the baseline model is first pre- exists, that category is the best choice in terms of sented as a model for odds and then presented as convenience of interpretation. If there is not a a model for probabilities. natural baseline, then the choice is arbitrary. 26-Osborne (Best)-45409.qxd 10/9/2007 5:52 PM Page 392 392 BEST PRACTICES IN QUANTITATIVE METHODS As a Model for Odds When fitting the baseline model to data, the binary logistic regressions for the (J – 1) Continuing our example from the High odds must be estimated simultaneously to School and Beyond data, where the general pro- ensure that intercepts and coefficients for all gram is chosen as the baseline category, the first other odds equal the differences of the corre- model contains a single explanatory variable, the sponding intercepts and coefficients (e.g., α mean of five achievement test scores for each 3 = (α − α ) and β = (β − β ) in Equation 5). student (i.e., math, science, reading, writing, and 1 2 3 1 2 To demonstrate this, three separate binary civics). The baseline model is simply two binary logistic regression models were fit to the High logistic regression models applied to each pair of School and Beyond data, as well as the base- program types; that is, line regression model, which simultaneously estimates the models for all the odds. The ( = | ) estimated parameters and their standard P Yi academic xi = α + β (3) exp[ 1 1xi ] errors are reported in Table 26.1. Although P(Y = general|x ) i i the parameters for the separate and simulta- and neous cases are quite similar, the logical rela- tionships between the parameters when the ˆ ( = | ) models are fit separately are not met (e.g., β P Yi vocational xi = α + β , ˆ 1 exp[ 2 2xi ] (4) −β= 0.1133 + 0.0163 = 0.1746 ≠ 0.1618); P(Y = general|x ) 2 i i however, the relationships hold for simulta- βˆ −βˆ + neous estimation (e.g., 1 2 = 0.1099 = = = where P(Yi academic|xi), P(Yi general|xi), 0.0599 0.1698). = and P(Yi vocational|xi) are the probabili- Besides ensuring that the logical relation- ties for each program type given mean ships between parameters are met, a second achievement test score xi for student i, the advantage of simultaneous estimation is that it α β js are intercepts, and the js are regression is a more efficient use of the data, which in turn coefficients. The odds of academic versus leads to more powerful statistical hypothesis vocational are found by taking the ratio of tests and more precise estimates of parameters. (3) and (4), Notice that the parameter estimates in Table 26.1 from the baseline model have smaller stan- dard errors than those in the estimation of P(Y = academic|x ) α + β j i = exp[ 1 1xi ] separate regressions. When the model is fit ( = | ) α + β simultaneously, all 600 observations go into the P Yj vocational xi exp[ 2 2xi ] estimation of the parameters; however, in the = exp[(α1 − α2) + (5) separately fit models, only a subset of the obser- (β − β ) 1 2 xi ] vations is used to estimate the parameters (e.g., 453 for academic and general, 455 for academic = exp[α + β x ], 3 3 i and vocational, and only 292 for vocational and general).