Learn About Multinomial Logit Regression in R With Data From the General Social Survey (2016)

© 2019 SAGE Publications Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This dataset example introduces multinomial logit. This technique allows researchers to evaluate whether a dependent variable with three or more unordered categories is a function of one or more independent variables. Examples of unordered categorical variables include gender, race, and birthplace. Race, for example, can take values such as African American, Asian, and White (and maybe others depending on the context), but those values do not follow any numerical order, and hence they are unordered categories. The multinomial logit model is most commonly estimated via Maximum Likelihood Estimation (MLE).

This example describes multinomial logit, discusses the assumptions underlying it, and shows how to estimate and interpret multinomial logit models. We illustrate multinomial logit using a subset of data from the 2016 General Social Survey (http://gss.norc.org/). Specifically, we test whether a 5-category measure of employment status is predicted by gender, age, and education. An analysis like this allows researchers to evaluate factors that influence labor force status, which may be useful in policy designs.

What Is Multinomial Logit?

Multinomial logit models explain variation in a categorical variable that consists of three or more unordered categories as a function of one or more independent variables. Categories for the dependent variable need not follow any order (if they do, you can still estimate a multinomial logit model without biasing your results, but you might consider an ordered logit model as a more statistically efficient alternative). Multinomial logit models estimate every possible two-way comparison of categories on the dependent variable, which means the number of parameters to estimate increases rapidly as the number of categories for the dependent variable increases. Multinomial logit models are therefore typically used when the dependent variable has 3 to 5 unordered categories. With more categories than that, researchers often consider combining some categories or imposing some other restrictions. If the dependent variable has only two categories, the multinomial logit model reduces to simple logit.

Multinomial logit is one example from the family of Generalized Linear Models (GLMs). GLMs connect a linear combination of independent variables and estimated parameters – often called the linear predictor – to a dependent variable using a link function. The link function typically involves some sort of non-linear transformation, which in the case of multinomial logit means that the probabilities that a given observation in the dataset falls into each of the categories of the dependent variable are non-linear functions of the independent variables. The parameters of GLMs are typically estimated using MLE. Because multinomial logit models are estimated via MLE, it is best if the dataset has a sufficiently large number of observations. Just how many is open to debate, but in his book Regression Models for Categorical and Limited Dependent Variables (SAGE, 1997), J. Scott Long suggests trying to meet two criteria: (1) have at least 100 observations total, and (2) have at least 10 observations for each coefficient estimated in the model.

In simple terms, MLE is an iterative process that searches for the coefficient estimates that maximize the fit of the model to the sample of data. By maximizing fit, MLE also minimizes the unexplained variation in the dependent variable. In that sense, MLE accomplishes the same objective as ordinary least squares (OLS) does for standard regression.

When computing statistical tests, it is customary to define the null hypothesis (H0) to be tested. In multinomial logit, the standard null hypothesis is that each coefficient is equal to zero. The actual coefficient estimates will not be exactly equal to zero in any particular sample of data, simply due to random chance in sampling. The tests conducted on each individual coefficient (reported as z-tests or t-tests, depending on the software) are designed to help determine whether the coefficients are different enough from zero to be declared statistically significant. “Different enough” is typically defined as producing a test statistic with a p-value (level of statistical significance) that is less than .05. This would lead us to reject the null hypothesis (H0) that the coefficient in question equals zero.
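As a rough illustration of the test described above, the sketch below computes a test statistic and a two-sided p-value for a single coefficient. It is written in Python purely for illustration, and it assumes the statistic is compared against a standard normal distribution, as is common for maximum likelihood estimates. The function name `wald_test` and the numbers passed to it are hypothetical and are not taken from the model in this guide.

```python
import math

def wald_test(estimate, std_error):
    """Return the test statistic and two-sided p-value for H0: coefficient = 0."""
    z = estimate / std_error
    # Two-sided tail area under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimate and standard error (not from Figure 2).
z, p = wald_test(estimate=0.50, std_error=0.20)
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0 at .05" if p < 0.05 else "fail to reject H0 at .05")
```

An estimate that is 2.5 standard errors away from zero, as in this made-up case, falls below the .05 threshold, so the null hypothesis would be rejected.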

Assumptions Behind the Method

Nearly every statistical method or test relies on some underlying assumptions, and they are all affected by the mix of data you happen to have. Different textbooks present the assumptions for a multinomial logit model in different ways. Here are the key factors to consider when estimating a multinomial logit:

• The dependent variable must consist of categories which are assumed to be unordered; the categories could be ordered but the ordering information will be ignored.
• The model is correctly specified (e.g., we have the right independent variables in the model properly measured).
• The values of the independent variables are fixed in repeated samples.
• The individual residuals are independent of each other and follow a logistic distribution.
• Because it is generally estimated via MLE, multinomial logit regression requires moderate to large sample sizes.

While not a formal assumption, researchers should consider how many parameters a multinomial logit model estimates. With J categories on the dependent variable, the model estimates J − 1 intercepts, and each additional independent variable adds J − 1 slope coefficients. Similarly, for each additional category on the dependent variable, a multinomial logit model must estimate another full set of slopes for all of the independent variables plus another intercept. Thus, the size and complexity of a multinomial logit model grow quickly as the number of categories on the dependent variable or the number of independent variables increases.
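The parameter count described above can be sketched in a few lines. This is a minimal Python illustration, not part of the original guide; the helper name `n_parameters` is hypothetical, and it assumes each independent variable enters the model as a single term.

```python
def n_parameters(n_categories, n_predictors):
    """Number of coefficients in a multinomial logit model.

    There is one equation per non-baseline category, and each equation
    has an intercept plus one slope per independent variable.
    """
    n_equations = n_categories - 1
    return n_equations * (1 + n_predictors)

# Five outcome categories and three independent variables,
# the setup used in this guide's example.
print(n_parameters(5, 3))  # 16
print(n_parameters(6, 3))  # 20: one more category adds a full equation
print(n_parameters(5, 4))  # 20: one more predictor adds J - 1 slopes
```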

Estimating a Multinomial Logit Model

One way to understand the multinomial logit model is as an extension of the simple logit model. The simple logit model is designed to evaluate whether values of the independent variables in a model help in sorting observations on the dependent variable into one of two categories. The multinomial logit extends this logic to sorting observations into one of three or more categories on the dependent variable. One of the categories of the dependent variable is selected as the baseline, and then parameters are estimated that predict the probability of being in each of the remaining categories compared to the baseline.

Suppose we have a dependent variable Y with categories labeled A, B, and C that we believe is affected by values of an independent variable named X. We might arbitrarily select Category A as the baseline category. If we then estimate a multinomial logit model, we will estimate an intercept and slope that describe how X is related to the probability of an observation being in Category B versus A and another intercept and slope that describe how X is related to the probability of an observation being in Category C versus A. In other words, a multinomial logit model where Y takes on three values is similar to simultaneously estimating two simple logit models. We say “similar” because the parameter estimates of a multinomial logit model are constrained (appropriately so) by the requirement that the probabilities of an observation being in Category A, B, or C must sum to 1. Because of this restriction, if we know how Category B compares to A and how Category C compares to A, we know by definition how Category B compares to Category C.
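The last point can be checked numerically. The Python sketch below uses hypothetical intercepts and slopes for the B-versus-A and C-versus-A equations (they are not estimates from any dataset) and shows that the B-versus-C comparison is simply the difference between the two sets of coefficients.

```python
import math

# Hypothetical coefficients, with Category A as the baseline.
b_vs_a = {"intercept": 0.2, "slope": 0.5}
c_vs_a = {"intercept": -0.4, "slope": 0.1}

def probabilities(x):
    """Return P(A), P(B), P(C) at a given value of X."""
    eta_b = b_vs_a["intercept"] + b_vs_a["slope"] * x
    eta_c = c_vs_a["intercept"] + c_vs_a["slope"] * x
    denom = 1.0 + math.exp(eta_b) + math.exp(eta_c)
    return 1.0 / denom, math.exp(eta_b) / denom, math.exp(eta_c) / denom

# Implied B-versus-C equation: the difference between the two equations.
b_vs_c = {key: b_vs_a[key] - c_vs_a[key] for key in b_vs_a}

x = 2.0
p_a, p_b, p_c = probabilities(x)
log_odds_direct = math.log(p_b / p_c)
log_odds_implied = b_vs_c["intercept"] + b_vs_c["slope"] * x

print(round(p_a + p_b + p_c, 10))                             # probabilities sum to 1
print(round(log_odds_direct, 6), round(log_odds_implied, 6))  # the two agree
```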

It is important to select an appropriate baseline category in your analysis, as it will allow for more straightforward interpretations of the results. For example, if you are modeling the side effects of a drug as a function of age, gender, race, etc., then it may be helpful to choose “no side effect” as the baseline category. Assuming the other two possible side effects are headache and cough, then a positive effect from age on headache versus the baseline (i.e., no effect) can be interpreted as older people being more likely to have the side effect of headache. Now consider another case when you choose “cough” as the baseline; then a positive effect from age on headache versus cough is not as easy to understand nor informative as the earlier result with no effect as the baseline.

Multinomial logit models still express the dependent variable as a function of one or more independent variables. We can start with the linear predictor that includes a single independent variable as shown in Equation (1):

$$\eta_{ij} = \beta_{0j} + \beta_{1j} X_{1i} \qquad (1)$$

The subscript i refers to individual observations and the subscript j refers to one of the categories of the dependent variable.

Next, we need a link function for the multinomial logit model that shows how the linear predictor relates to the probability that the dependent variable falls into category j relative to a baseline category. For simplicity, we will assume categories are numbered 1 through J, and we will set category 1 as the baseline:

$$\eta_{ij} = \ln\left(\frac{p_{ij}}{p_{i1}}\right), \quad j = 2, \dots, J \qquad (2)$$

The link function in Equation (2) expresses the linear predictor from Equation (1) as the log of the odds of an observation falling into category j relative to falling in Category 1. Finally, we can use the inverse of the link function to express the probability that the dependent variable falls into any particular category as follows in Equation (3):

$$p_{ij} = \frac{\exp(\eta_{ij})}{1 + \sum_{k=2}^{J} \exp(\eta_{ik})} \qquad (3)$$

Here is the summary of notations used above:

• ηi1 = 0; the linear predictor for the baseline category is fixed at zero.
• ηij = the linear combination of the independent variables, or the linear predictor.
• X1i = individual values of the first independent variable.
• β0j = the intercept, or constant, associated with the multinomial logit model comparing category j to the baseline category.
• β1j = the coefficient operating on the first independent variable comparing category j to the baseline category.
• pij = the probability that the dependent variable falls in category j for observation i.
• ln = the natural logarithm.
• exp = the exponential function, which is the inverse of the natural logarithm.
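To make Equations (1) through (3) concrete, here is a minimal Python sketch for a three-category dependent variable with one independent variable. The coefficient values and the function name `category_probabilities` are hypothetical and chosen only for illustration.

```python
import math

def category_probabilities(etas):
    """Turn linear predictors for categories 2..J into probabilities for 1..J.

    The baseline category's linear predictor is fixed at 0, and exp(0) = 1
    supplies the leading 1 in the denominator of Equation (3).
    """
    all_etas = [0.0] + list(etas)
    denom = sum(math.exp(e) for e in all_etas)
    return [math.exp(e) / denom for e in all_etas]

# Hypothetical intercepts and slopes for categories 2 and 3 (J = 3).
beta0 = [0.3, -1.0]
beta1 = [0.8, 0.5]
x = 1.5

etas = [b0 + b1 * x for b0, b1 in zip(beta0, beta1)]  # Equation (1)
probs = category_probabilities(etas)                   # Equation (3)

print([round(p, 4) for p in probs])
print(round(sum(probs), 10))
# Equation (2): log odds against the baseline recover each linear predictor.
print([round(math.log(p / probs[0]), 4) for p in probs[1:]])
print([round(e, 4) for e in etas])
```

The last two printed lists should match, which confirms that Equation (3) is the inverse of the link function in Equation (2).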

Researchers have values for the dependent and independent variables in their datasets; they use MLE to estimate the β coefficients. Unlike standard multiple regression, the β coefficients from a multinomial logit model cannot be directly interpreted as slope coefficients that describe the marginal effect of each independent variable on the probability that Y falls into any particular category. Interpreting the coefficient estimates of a multinomial logit model is more complicated and is something described below in the context of a specific example.
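A small numerical sketch may help show why the coefficients are not marginal effects. The Python example below uses hypothetical coefficients for a three-category outcome (category 1 is the baseline) and computes how the probability of category 2 changes when X increases by one unit, starting from different values of X.

```python
import math

# Hypothetical coefficients for categories 2 and 3; category 1 is the baseline.
BETA0 = {2: 0.0, 3: -3.0}
BETA1 = {2: 0.5, 3: 1.5}

def prob(category, x):
    """Probability of `category` (1, 2, or 3) at a given value of X."""
    exp_eta = {1: 1.0}  # baseline: exp(0)
    for j in (2, 3):
        exp_eta[j] = math.exp(BETA0[j] + BETA1[j] * x)
    return exp_eta[category] / sum(exp_eta.values())

def change_in_prob(category, x):
    """Change in probability when X rises from x to x + 1."""
    return prob(category, x + 1) - prob(category, x)

for x in (0, 1, 2, 3):
    print(x, round(change_in_prob(2, x), 3))
```

In this made-up case the slope for category 2 is positive (0.5), yet the probability of category 2 rises when X moves from 0 to 1 and falls when X moves from 3 to 4, because category 3 has an even larger slope and absorbs probability at high values of X. The size and even the sign of the change depend on where X starts.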

Illustrative Example: Employment Status

This analysis examines whether gender, age, and education predict employment status. The specific research questions are

• Controlling for age and education, are women more or less likely than men to be employed?
• Controlling for gender and education, are people of different ages more or less likely to be employed?
• Controlling for gender and age, are people of different education levels more or less likely to be employed?

Each of these research questions could be stated in the form of a null hypothesis:

H0a = Controlling for age and education, gender has no effect on employment status.
H0b = Controlling for gender and education, age has no effect on employment status.
H0c = Controlling for gender and age, education has no effect on employment status.

The Data

This example uses a subset of data from the 2016 General Social Survey (http://gss.norc.org/). We use several variables:

• Employment status (WRKSTAT): possible values are Keeping house, Retired, Temp not working, Working full-time, and Working part-time.
• Age (AGE): a continuous variable.
• Education (DEGREE): highest degree earned; it is an ordinal variable with possible values 1 = Little high school, 2 = High school, 3 = Junior college, 4 = Bachelor, 5 = Graduate.
• Gender (SEX): male or female.

There are 2,681 subjects in this sample. Responses for the dependent variable (WRKSTAT) are categorical but not really ordered, making this example appropriate for multinomial logit.

Analyzing the Data

Before proceeding to the multinomial logit regression, it is a good idea to examine the distribution of the dependent variable. Its frequency distribution is shown in Figure 1.

Figure 1: Frequency Distribution of the Employment Status.

In this figure, we can see that 1,321 people were working full-time, 345 part-time, 175 not working, and so on. Multinomial logit models do not perform as well if there are small numbers of observations in one or more of the categories of the dependent variable or if there is a substantial skew in the distribution of observations across the categories. We have neither of those problems here.

It would also be valuable to produce summary statistics and explore the distributions of each of the independent variables. However, in the interest of space, we will forgo doing so now.

The results of the multinomial logit model itself are presented in Figure 2.

Figure 2: Summary of the Multinomial Logit Regression Model.

In this example, we focus our attention on the individual coefficient estimates linking the independent variables to the dependent variable and their corresponding level of statistical significance.

The table in Figure 2 reports the individual parameter estimates, their estimated standard errors, and indicators of statistical significance. Each variable name is followed by :1, :2, etc., which indicates that the coefficient in question is relevant to predicting the probability of an observation falling in the first category of the dependent variable (Keeping house) compared to the last category (Working part-time), or to predicting the probability of an observation falling into the second category (Retired) compared to the last category, and so on. In other words, “Working part-time” is serving as the base category for this analysis. The numbers after the colon (e.g., :1 or :2) should not be confused with the actual values of the dependent variable.

We can see that not all coefficient estimates are statistically significantly different from zero. Here are several typical observations to be made from the table:

• For age, only AGE:1 and AGE:2 have coefficients that are significant, meaning that age is significantly associated with employment status 1 (Keeping house) and 2 (Retired) relative to the baseline (Working part-time). Those coefficients are also positive, suggesting that older people are more likely than younger people to be “Keeping house” or “Retired” when the reference category is “Working part-time.”
• All coefficients for gender (SEXmale:1 through SEXmale:4) are significant, indicating that gender is associated with all categories of employment status. Only the coefficient for SEXmale:1 is negative, suggesting that, compared to women, men are less likely to be “Keeping house” and more likely to be in each of the other employment statuses when the reference category is “Working part-time.”
• Two degree terms are significant, with the coefficient of DEGREE:1 negative and that of DEGREE:4 positive, suggesting that people with higher degrees are less likely to be “Keeping house” and more likely to be “Working full-time” when the reference category is “Working part-time.”

However, because the selection of a base category is arbitrary, whether individual coefficient estimates are statistically significant can be arbitrary as well: significance can change depending on which base category is selected. That, combined with the complexity of the results presented in Figure 2 and the non-linearity inherent in the multinomial logit model, makes direct interpretation of multinomial logit coefficient estimates difficult and of limited value. We explore some of the findings in greater detail through computing predicted probabilities.

Predicted Probabilities

We can compute the predicted probability of a respondent falling into the various employment categories based on the results in Figure 2 using the inverse link function as shown previously in Equation (3). Because the relationship between all of the independent variables and the probability that Y falls into a particular category is nonlinear, you can only compute a predicted probability by setting every independent variable in the model to some specific value.

For example, to compare the predicted probabilities of falling into each of the five categories on the dependent variable for men and women, we need to set the gender variable to the appropriate value, and we need to set all of the other independent variables to some fixed value as well. The most common strategy is to set the remaining variables to central measures such as their means, medians, or modes. An alternative is to compute the predicted probability for each observation based on its own values for its independent variables, but this makes it harder to isolate the potential effect of any one independent variable. To keep it simple, we consider men and women separately and keep the other independent variables at their means.
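The strategy just described can be sketched as follows. This Python example is only an illustration: every coefficient and both "mean" values are hypothetical placeholders, not the estimates in Figure 2 or the sample means of the GSS data, and the function name `predicted_probabilities` is made up for this sketch.

```python
import math

CATEGORIES = ["Keeping house", "Retired", "Temp not working",
              "Working full-time", "Working part-time"]  # last one is the baseline

# HYPOTHETICAL coefficients (intercept, age, male, degree) for the four
# non-baseline equations. These are NOT the estimates reported in Figure 2.
COEFS = {
    "Keeping house":     (-0.5, 0.02, -1.5, -0.2),
    "Retired":           (-6.0, 0.12,  0.4,  0.0),
    "Temp not working":  (-0.8, 0.00,  0.5, -0.1),
    "Working full-time": ( 0.6, 0.00,  0.9,  0.1),
}

def predicted_probabilities(age, male, degree):
    """Apply Equation (3) to one profile of independent variables."""
    exp_eta = {"Working part-time": 1.0}  # baseline: exp(0)
    for category, (b0, b_age, b_male, b_degree) in COEFS.items():
        exp_eta[category] = math.exp(b0 + b_age * age + b_male * male + b_degree * degree)
    total = sum(exp_eta.values())
    return {c: exp_eta[c] / total for c in CATEGORIES}

# HYPOTHETICAL values at which age and degree are held constant.
MEAN_AGE, MEAN_DEGREE = 48.0, 2.5

results = {}
for label, male in (("Female", 0), ("Male", 1)):
    results[label] = predicted_probabilities(MEAN_AGE, male, MEAN_DEGREE)
    print(label, {c: round(p, 3) for c, p in results[label].items()})
```

Only the gender indicator changes between the two calls, so any difference between the two sets of probabilities reflects gender alone under these assumed coefficients.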

Figure 3 reports the results of estimating these predicted probabilities for men and women using post-estimation simulation. A full discussion of this process is beyond the scope of this example, but briefly, the process computes 1,000 sets of predicted probabilities by simulating values for the model coefficients based on their estimated values, variances, and covariances. For more information, see “Making the Most of Statistical Analyses: Improving Interpretation and Presentation” by King, Tomz, and Wittenberg (American Journal of Political Science, 44 (2): 347–361).
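The simulation idea can be sketched in a few dozen lines. The Python example below is a simplified stand-in for the procedure, not the software routine used to produce Figure 3: it assumes a three-category outcome with one independent variable, and the coefficient estimates and covariance matrix are hypothetical. Coefficients are drawn from a multivariate normal distribution via a hand-written Cholesky factorization so that only the standard library is needed.

```python
import math
import random
import statistics

def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def draw_coefficients(estimates, lower, rng):
    """One draw from a multivariate normal centred on the estimates."""
    z = [rng.gauss(0.0, 1.0) for _ in estimates]
    return [est + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, est in enumerate(estimates)]

def probabilities(coefs, x):
    """Equation (3) for J = 3; coefs = [b0_2, b1_2, b0_3, b1_3], category 1 is baseline."""
    exp_eta = [1.0, math.exp(coefs[0] + coefs[1] * x), math.exp(coefs[2] + coefs[3] * x)]
    total = sum(exp_eta)
    return [e / total for e in exp_eta]

# HYPOTHETICAL estimates and covariance matrix, for illustration only.
estimates = [0.2, 0.5, -0.4, 0.1]
covariance = [[0.040, 0.005, 0.010, 0.000],
              [0.005, 0.020, 0.000, 0.005],
              [0.010, 0.000, 0.050, 0.010],
              [0.000, 0.005, 0.010, 0.030]]

rng = random.Random(2016)  # fixed seed so the run is reproducible
lower = cholesky(covariance)
draws = [probabilities(draw_coefficients(estimates, lower, rng), x=1.0)
         for _ in range(1000)]

for j in range(3):
    sims = sorted(d[j] for d in draws)
    # sims[24] and sims[974] approximate the 2.5th and 97.5th percentiles.
    print(f"category {j + 1}: mean={statistics.mean(sims):.3f} "
          f"sd={statistics.stdev(sims):.3f} 2.5%={sims[24]:.3f} "
          f"50%={statistics.median(sims):.3f} 97.5%={sims[974]:.3f}")
```

Each simulated coefficient vector yields one set of predicted probabilities, and summarizing the 1,000 sets gives means, standard deviations, medians, and percentile-based intervals of the kind reported in the guide's simulation output.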

Figure 3: Estimated Predicted Probability of Falling Into Each of Five Employment Categories for Males (Top) and Females (Bottom), Holding the Remaining Independent Variables Constant.

The results include estimated means, standard deviations, medians (50%), and the 2.5 and 97.5 percentiles for the 1,000 simulated expected values of the dependent variable. This is generally where researchers focus their attention. At the bottom, the results also include a mean for the predicted proportion of observations that would fall into each category of the dependent variable.

Tables like those in Figure 3 are helpful when you only need to compute a small number of predicted probabilities to interpret the findings of the model. The best way to explore the impact of a continuous independent variable or an independent variable that takes on many values is to compute the predicted probability of falling into one of the employment categories based on values of the independent variable in question and present the results graphically. Figure 4 presents the predicted probability of “Keeping house,” along with confidence intervals, as a function of age, holding all other independent variables constant. The solid blue curve at the center of the plot shows the predicted probabilities, and the lighter grey curves around it are upper and lower confidence limits at various confidence levels (0.8, 0.95, and 0.99). We can see that the probability of “Keeping house” increases slightly with age from 20 to 60 and starts to decrease as age continues to increase.

Figure 4: Predicted Probability of “Keeping house,” Along With Confidence Intervals, as a Function of Age, Holding All Other Independent Variables Constant.

Complete interpretation of the results of a multinomial logit model would present similar tables or figures for every independent variable in the model.

Presenting Results

The results of a multinomial logit model can be presented in a variety of ways. Here, we offer one example.

“We used a subset of data from the 2016 General Social Survey to test the following null hypotheses:

H0a = Controlling for age and education, gender has no effect on employment status.
H0b = Controlling for gender and education, age has no effect on employment status.
H0c = Controlling for gender and age, education has no effect on employment status.

There are 2,681 subjects in this sample. Results from the multinomial logit model are presented in Figure 2. With “Working part-time” as the reference category, those results show that (1) older people are significantly more likely to be “Keeping house” or “Retired,” (2) compared to women, men are less likely to be “Keeping house” and more likely to be in each of the other employment statuses, and (3) people with higher degrees are less likely to be “Keeping house” and more likely to be “Working full-time.” Further interpretation and diagnostic testing should be explored to evaluate the robustness of these findings.”

Review

Multinomial logit expresses a categorical dependent variable as a function of one or more independent variables. Multinomial logit models are estimated via MLE. Direct interpretation of the coefficient estimates is limited to specific pairs of comparisons and to whether the coefficients are positive, negative, or not statistically significant. Really understanding the results of a multinomial logit model requires calculating predicted probabilities.

The multinomial logit model is very similar to the ordered logit model. Ordered logit simply assumes that the categories for the dependent variable follow some order and that a single set of parameters can be estimated across all categories. If either of those assumptions is violated, multinomial logit is more appropriate. There is also a multinomial probit model, which shares some similarities with multinomial logit, but describing their differences and similarities extends beyond the scope of this example.

You should know:

• What types of variables are suitable for a multinomial logit model.
• The basic assumptions behind the multinomial logit model.
• How to estimate and interpret the results of a multinomial logit model.
• How to report the results from a multinomial logit model.

Your Turn

You can download the sample dataset along with a guide showing how to estimate a multinomial logit model using statistical software. See whether you can reproduce the results presented here, then repeat the analysis adding another independent variable (CHILDS), which is the number of children one has.
