SUPPLEMENTAL MATERIAL

Appendix 1: Annotated output for logistic regression
Appendix 2: Annotated output for binomial regression
Appendix 3: Annotated output for proportional odds logistic regression
Appendix 4: Annotated output for multinomial regression
Appendix 5: Annotated output for Poisson regression
Appendix 6: Special cases of Poisson regression

Appendix 1: Annotated output for logistic regression

Understanding how to interpret the output R presents can be challenging. The following figures are designed to help demystify the R output for logistic regression. All data and code can be found here: https://github.com/ejtheobald/BeyondLinearRegression. In addition, a table, akin to what a researcher might publish, is displayed based on the R output.

Figure A1.1: Logistic regression annotated output

Presenting logistic regression results in manuscripts

Table A1.1: Regression coefficients (β), standard errors (SE), p-values, and odds ratios for the logistic regression predicting whether students report being likely to take a mathematical modeling in biology course. The odds of a student reporting being likely to take a mathematical modeling in biology course increased with increasing interest in using math to understand biology and decreased with increasing cost of incorporating math into biology courses. Additionally, the odds of a fourth-year student reporting being likely to take a mathematical modeling course are lower than those of a first-year student.

Predictor                                        β       SE     p-value    Odds-ratio
Interest                                         0.71    0.07   < 0.0001   2.04
Utility value                                    0.11    0.08   0.18       1.12
Cost                                            -0.27    0.06   < 0.0001   0.77
Gender (ref: Male)
  Female                                        -0.22    0.19   0.26       0.81
Year in School (ref: 1st-year)
  2nd year                                      -0.16    0.23   0.49       0.85
  3rd year                                      -0.19    0.22   0.39       0.82
  4th year                                      -0.52    0.24   0.03       0.60
Highest High School Math Course (ref: Calculus)
  Pre-calculus                                  -0.03    0.21   0.87       0.97
  Algebra/Geometry                              -0.06    0.38   0.87       0.94
  Stats                                          0.11    0.34   0.75       1.12
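The odds ratios in Table A1.1 are simply the exponentiated regression coefficients. A minimal sketch of that arithmetic in Python (using the βs from the table; the second decimal can differ slightly from the published odds ratios because the βs themselves are rounded):

```python
import math

# Coefficients (beta) taken from Table A1.1, rounded to two decimals.
coefficients = {"Interest": 0.71, "Utility value": 0.11, "Cost": -0.27}

# Odds ratio = e^beta: the multiplicative change in the odds of the
# outcome for a one-unit increase in the predictor.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: OR = {ratio:.2f}")
```

An odds ratio above 1 (Interest) increases the odds of reporting being likely to take the course; below 1 (Cost) decreases them.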

Appendix 2: Annotated output for binomial regression

Understanding how to interpret the output R presents can be challenging. The following figures are designed to help demystify the R output for binomial regression. All data and code can be found here: https://github.com/ejtheobald/BeyondLinearRegression. In addition, a table, akin to what a researcher might publish, is displayed based on the R output.

Figure A2.1: Binomial regression annotated output

Presenting binomial regression results in manuscripts

Table A2.1: Regression coefficients (β), standard errors (SE), p-values, and odds ratios for the binomial regression predicting the odds of a student completing an optional practice exam. The odds of a student completing a practice exam decreased with increasing GPA. Additionally, the odds of a female student completing a practice exam are higher than those of a male student.

Predictor                                        β       SE     p-value    Odds-ratio
GPA                                             -1.40    0.09   < 0.0001   0.25
Gender (ref: Male)
  Female                                         0.17    0.06   < 0.0001   1.19
First-generation (ref: Continuing-generation)
  First-generation                               -0.04   0.09   0.64       0.96

Appendix 3: Annotated output for proportional odds logistic regression

Understanding how to interpret the output R presents can be challenging. The following figures are designed to help demystify the R output for proportional odds logistic regression. All data and code can be found here: https://github.com/ejtheobald/BeyondLinearRegression. In addition, a table, akin to what a researcher might publish, is displayed based on the R output.

Figure A3.1: Proportional Odds Logistic Regression annotated output

Presenting proportional odds results in manuscripts

When model selection is conducted to test hypotheses, it is important to indicate the starting model, the best model, and the comparison of the best model to the null model (to show how much better the best model fits; done here with ΔAIC). Here is one way to do that for the hypothesis we tested.
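The ΔAIC reported here is the difference between the AIC of the null model and the AIC of the best model, where AIC = 2k − 2·log-likelihood (k is the number of estimated parameters). A minimal sketch of that arithmetic (the log-likelihoods and parameter counts below are hypothetical, for illustration only):

```python
# AIC = 2k - 2 * log-likelihood; smaller is better.
def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits (values invented for illustration):
null_model = aic(log_likelihood=-420.5, n_params=4)  # intercepts only
best_model = aic(log_likelihood=-405.0, n_params=9)  # with predictors

# Positive delta AIC means the best model fits better than the null.
delta_aic = null_model - best_model
print(f"delta AIC = {delta_aic:.1f}")
```

A ΔAIC of at least 2 is the conventional threshold for saying one model fits meaningfully better than another.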

Table A3.1: Students were less likely to report a dominator after the interactive activity, compared to the constructive activity. Additionally, students with higher course grades were less likely to report a dominator than students with lower course grades. The table shows odds ratios.

Outcome        Ethnicity¹               Activity Type²   Course Grade   ΔAIC³
Dominator⁴     Asian American⁵   1.67   0.56             0.76           20.20
               International     3.35
               URM               1.39

¹ Reference group is White students.
² Reference group is Constructive activity; effect shown is of the Interactive activity.
³ In comparison to the null model.
⁴ Outcome was measured on a 5-point Likert scale. Higher numbers indicate more agreement that one person dominated the group.
⁵ Bold face coefficients show statistically significant relationships, but note that interpreting t-values for significance testing is unreliable with a small sample.
⁶ Units are shown as odds ratios.

Appendix 4: Annotated output for multinomial regression

Understanding how to interpret the output R presents can be challenging. The following figures are designed to help demystify the R output for multinomial regression. All data and code can be found here: https://github.com/ejtheobald/BeyondLinearRegression. In addition, a table, akin to what a researcher might publish, is displayed based on the R output.

Figure A4.1: Multinomial regression annotated R output

Presenting multinomial regression results in manuscripts

Table A4.1: Backwards model selection using the likelihood ratio test. In each comparison the bolded term is the one being tested.

Models                                                                Degrees of   Likelihood   p-value
                                                                      Freedom      Ratio

Gender + Class Standing + University GPA
  + Gender × Class Standing + Gender × University GPA
  vs. Gender + Class Standing + University GPA
  + Gender × University GPA                                           8            11.1         0.195

Gender + Class Standing + University GPA + Gender × University GPA
  vs. Gender + Class Standing + University GPA                        4            9.6          0.048

Gender + Class Standing + University GPA + Gender × University GPA
  vs. Gender + University GPA + Gender × University GPA               8            5.2          0.736

Gender + University GPA + (Gender × University GPA)
  vs. Null (no predictors)                                            20           50.0         < 0.0001

Best model: Gender + University GPA + (Gender × University GPA)
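Each p-value in Table A4.1 comes from comparing the likelihood-ratio statistic to a chi-square distribution whose degrees of freedom equal the difference in parameters between the two models. For even degrees of freedom the chi-square survival function has a closed form, so the table's p-values can be checked by hand; a sketch:

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x), valid for even df,
    using the closed-form series for the upper incomplete gamma function."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

# First comparison in Table A4.1: likelihood ratio 11.1 on 8 degrees
# of freedom, testing the Gender x Class Standing interaction.
p = chi2_sf_even_df(11.1, 8)
print(f"p = {p:.3f}")  # close to the 0.195 reported in the table
```

The same function reproduces the other rows of the table (e.g., 9.6 on 4 df gives p ≈ 0.048).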

Table A4.2: Regression coefficients from multinomial regression exploring the impact of gender, class standing, and university GPA on the role students assume in groups. Collaborator is the reference level for the outcome variable, so all the other roles are compared to it and the estimates are the log odds of being in that role versus being a collaborator. Table shows estimates, standard errors, and p-values from the Wald statistic (in parentheses).

Outcome                     Intercept        Gender: Male     University GPA      Gender × University GPA
                                             (ref: Female)    at start of class   (ref: Female)
Leader vs. Collaborator     -1.17 ± 0.272    1.4 ± 0.366      -1.9 ± 0.723        2.7 ± 0.903
                            (< 0.001)        (< 0.001)        (0.009)             (0.002)

Listener vs. Collaborator   -1.54 ± 0.308    0.12 ± 0.522     -1.2 ± 0.839        0.76 ± 1.13
                            (< 0.001)        (0.812)          (0.156)             (0.502)

Recorder vs. Collaborator   -1.75 ± 0.354    -1.0 ± 0.848     0.53 ± 0.955        0.63 ± 2.06
                            (< 0.001)        (0.227)          (0.578)             (0.760)

Other vs. Collaborator      -1.29 ± 0.286    -0.12 ± 0.514    -2.0 ± 0.753        1.2 ± 1.03
                            (< 0.001)        (0.820)          (0.007)             (0.255)

Summarizing the output from the effects package verbally: an example. Figure 2D visually summarizes the output from the effects package, but sometimes researchers may want to explain a particular variable in more detail. Here we present how one might write up the effects package output for the gender × GPA interaction found in our example.

Women with a college GPA that is 0.25 points below the mean have a 42% chance of reporting being a collaborator; at the mean they have a 50% chance, and 0.5 points above the mean they have a 64% chance (table below). Men with the same range of GPAs do not see this upward shift in the percent chance of being a collaborator (0.25 points below the mean: 37%; at the mean: 36%; 0.5 points above the mean: 29%). Instead, as GPA increases men become increasingly likely to report preferring to be leaders: 0.25 points below the mean: 38%; at the mean GPA: 45%; 0.5 points above the mean: 57%. The table below shows these probabilities as well as their 95% confidence intervals (shown in parentheses). No other outcome categories see shifts based on gender (as indicated by the 95% confidence intervals on all their estimates overlapping).

Table A4.3: The probability of preferring to be a collaborator varied with student-reported gender and with GPA.

                              Mean GPA − 0.25 pts   Mean GPA       Mean GPA + 0.5 pts
Collaborator   Women          42%                   50%            64%
                              (31 – 53%)            (42 – 60%)     (50 – 76%)
               Men            37%                   36%            29%
                              (27 – 50%)            (26 – 46%)     (18 – 45%)
Leader         Men            38%                   45%            57%
                              (28 – 51%)            (34 – 56%)     (41 – 71%)
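The point estimates in Table A4.3 follow directly from the coefficients in Table A4.2: exponentiate each outcome's linear predictor and divide by the sum, with the reference level (Collaborator) contributing exp(0) = 1. A minimal Python sketch of that arithmetic (not the effects package itself), assuming GPA is centered so that "mean GPA" corresponds to 0 on the model scale:

```python
import math

# Coefficients from Table A4.2: (intercept, male, gpa, male_x_gpa),
# giving the log odds of each role versus the reference, Collaborator.
coef = {
    "Leader":   (-1.17,  1.40, -1.90, 2.70),
    "Listener": (-1.54,  0.12, -1.20, 0.76),
    "Recorder": (-1.75, -1.00,  0.53, 0.63),
    "Other":    (-1.29, -0.12, -2.00, 1.20),
}

def role_probabilities(male, gpa):
    """Multinomial (softmax) probabilities; Collaborator's predictor is 0."""
    scores = {"Collaborator": 1.0}  # exp(0) for the reference level
    for role, (b0, b_male, b_gpa, b_int) in coef.items():
        eta = b0 + b_male * male + b_gpa * gpa + b_int * male * gpa
        scores[role] = math.exp(eta)
    total = sum(scores.values())
    return {role: s / total for role, s in scores.items()}

# Women (male = 0) and men (male = 1) at the mean GPA (gpa = 0):
women = role_probabilities(male=0, gpa=0)
men = role_probabilities(male=1, gpa=0)
print(f"P(Collaborator | woman, mean GPA) = {women['Collaborator']:.2f}")
print(f"P(Collaborator | man, mean GPA)   = {men['Collaborator']:.2f}")
print(f"P(Leader | man, mean GPA)         = {men['Leader']:.2f}")
```

These reproduce (to rounding) the 50%, 36%, and 45% entries in Table A4.3; the other cells follow by plugging in the corresponding GPA offsets.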

Appendix 5: Annotated output for Poisson regression

Understanding how to interpret the output R presents can be challenging. The following figures are designed to help demystify the R output for Poisson regression. All data and code can be found here: https://github.com/ejtheobald/BeyondLinearRegression. In addition, a table, akin to what a researcher might publish, is displayed based on the R output.

Figure A5.1: Poisson Regression annotated output

Presenting Poisson regression results in manuscripts

Table A5.1: Regression coefficients (β), standard errors (SE), p-values, and the effect of each predictor on the outcome variable (eβ) for the Poisson regression predicting the number of times students raise their hands in class. The number of times a student raises their hand in class increases if they have higher total exam points and if they are a physics major.

Predictor                      β        SE      p-value    eβ
Total Exam Points              0.004    0.002   < 0.01     1.004
Major (ref: Physics major)
  Non-physics major           -0.87     0.06    < 0.0001   0.42

Appendix 6: Overdispersion and Poisson regression

Special Cases of the Poisson Model

Overdispersion: Negative Binomial Model

As discussed in the main text, the Poisson model assumes that the variance of the response variable equals its mean. When the variance is greater than the mean, the data are overdispersed. Using overdispersed data in a Poisson model will lead to an underestimation of standard errors [85], which can lead to incorrectly reporting a predictor as significantly affecting the outcome. Therefore, when overdispersed count data are encountered, a special case of the Poisson can be used: the negative binomial. The negative binomial model has been used when measuring the number of articles published as a measure of research productivity [86], the number of credits a student has enrolled in during a semester [84], the number of teaching resources provided to a faculty member [78], and the number of procedures that students have used when solving biology problems [87].

To determine whether count data are overdispersed, the researcher can compare the mean and variance of their response variable [88]. Overdispersion can be formally tested for by first conducting a negative binomial regression using the glm.nb function in the MASS package [57], and then conducting an overdispersion test on that model using the function odTest (see main text and example in Appendix 5). Furthermore, the AIC value from the negative binomial model can be compared to the AIC value of the Poisson model. If the AIC value of the negative binomial is lower by at least 2, then the negative binomial model fits the data better than the Poisson [44].

The negative binomial model is similar to the Poisson in that it models the natural log of count data as a linear function of a set of predictors. However, in a negative binomial the variance of the response variable is not fixed at the mean (Var(Y) = μ in Poisson models), but rather is a function of the mean and a dispersion parameter, α (Var(Y) = μ + αμ²) [13].
When there is no dispersion (α = 0), the model reverts to a Poisson. The output from glm.nb is similar to the output from a Poisson model. The regression coefficients, their standard errors, z-values, and p-values are all shown. Furthermore, similar to interpreting results from a Poisson, raising e to the power of the regression coefficients makes it possible to understand how a one-unit change in a continuous predictor variable (or a specific group compared to the reference group in a categorical predictor) affects the count response data.
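The informal mean-versus-variance comparison described above is simple arithmetic, and the variance function Var(Y) = μ + αμ² gives a rough moment-based estimate of the dispersion parameter, α ≈ (s² − x̄)/x̄². A sketch on hypothetical count data (the counts below are made up for illustration; in practice, follow up with odTest and the AIC comparison described above):

```python
import statistics

# Hypothetical hand-raising counts (invented for illustration): a long
# right tail makes the variance much larger than the mean.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 12, 15]

mean = statistics.mean(counts)
var = statistics.variance(counts)  # sample variance (n - 1 denominator)
print(f"mean = {mean:.2f}, variance = {var:.2f}")

if var > mean:
    # Moment-based estimate of alpha in Var(Y) = mu + alpha * mu^2;
    # alpha = 0 recovers the Poisson model.
    alpha = (var - mean) / mean**2
    print(f"overdispersed: rough alpha estimate = {alpha:.2f}")
else:
    print("no evidence of overdispersion; Poisson may be adequate")
```

Here the variance is several times the mean, so a negative binomial model would be worth fitting.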

Excess Zeros: Hurdle and Zero-Inflated Models

Data sets may have an excess number of zeros compared to what would be expected under Poisson or negative binomial models, leading to a special case of overdispersion. Overdispersion from excess zeros can be handled through either hurdle models or zero-inflated models. Cruce and Hillman [89] used hurdle models to model the number of hours an adult has participated in coursework because many of the adults they sampled had not participated in any coursework. Hurdle models have also been used to model the number of publications by science researchers, in which many researchers had zero publications [83]. A zero-inflated Poisson model was used to examine the number of graphics in science trade books [90].

Both the hurdle model and the zero-inflated model utilize two-part models: one model for the excess zeros and one model for the count data [85]. The two-part model assumes that there are separate processes underlying the excess zeros and the count data. However, hurdle and zero-inflated models conceptually differ in how the zeros are modeled. Hurdle models assume that all zeros are “structural” zeros, meaning that there is one structure, or process, that results in subjects having either a zero count or a positive count [91]. Therefore, the first model is a logistic regression of the probability of a positive count (versus a zero count), or in other words, the probability that the “hurdle is crossed” [85,88]. The second model then uses a zero-truncated Poisson or negative binomial regression to model the positive count data [85,88]. Zero-inflated models, on the other hand, assume that some zeros are structural zeros, whereas other zeros are “sampling zeros,” or true zeros due to chance as part of the underlying Poisson model [91].
Therefore, the first model is a logistic regression of the probability of a structural zero, and the second model is a standard Poisson or negative binomial regression that includes sampling zeros as part of the count data [85,88].

Determining whether to use a hurdle or zero-inflated model depends on the experimental design and the source of the zeros. One way to conceptualize this is to think about whether all subjects have the potential to have positive count response data. If they do, there is only one underlying process explaining why a subject had zero count data; in these cases a hurdle model is appropriate. However, if there is a barrier that prevents some subjects from even the possibility of attaining positive count data, there are two underlying processes competing to explain why a subject had zero count data: either it was not possible for them to attain a positive count, or it was possible but they did not. In these cases, where there is more than one underlying explanation for the zeros, the zero-inflated model is appropriate.

Let’s consider a hypothetical example for illustration. A researcher has measured the number of clicker questions that each student got correct during one class. Imagine that all of the students were present in class, but that the clicker questions were extremely difficult that particular day. Many students did not get any clicker questions correct (i.e., many students got 0 correct). However, since every student had the potential to get at least one clicker question correct, there is only one underlying process explaining zero count vs. positive count data: whether the student was able to answer at least one question correctly. Contrast this with the situation in which half of the class is on a lab field trip the day of the clicker questions in lecture. Here, many, though not all, of the zeros are due to class absence, not inability to answer the questions correctly.
In this case, there are two underlying processes explaining zeros: whether the student was in class or not (i.e., their potential to get positive count data), and whether the student was able to answer at least one question correctly. The zero-inflated model accounts for both of these processes.

Baccini and colleagues’ study [83] examining the effects of time spent teaching on the number of papers published by Italian researchers is one of the few papers in the education literature to justify the use of a particular model for excess zeros. They used a hurdle model, explaining that all the researchers in their study had the potential to publish; there was not an extra barrier to publication that needed to be modeled via a zero-inflated model. Desjardins [46] also clearly explains his use of a hurdle model to examine school suspensions among students because there are no students for whom suspension is not a risk.

Both hurdle and zero-inflated models can be implemented in R through the pscl package [81]. Both types of models can be fit with the same predictors in the binary and count models, or with a different set of predictors in each. Among other methods, the fit of these models can be compared to the fit of a standard Poisson or negative binomial model using AIC values. Because the binary model and the count model are fit separately, there are two output tables in hurdle and zero-inflated models. In R, the first table contains the count model (either Poisson or negative binomial, as specified), and the second table contains the results of the logistic regression. Both tables contain regression coefficients and their standard errors, z-values, and p-values. As in Poisson regression, exponentiating coefficients (eβ) from the count model will describe the effect on the response variable of a one-unit increase in a continuous predictor or of a categorical predictor compared to its reference level.
Similarly, as discussed in the Logistic Regression section, exponentiating coefficients from the logistic regression will give the odds of “crossing the hurdle” (hurdle model) or the odds of being unable to attain a positive count (zero-inflated model). For example, in Cruce and Hillman’s study [89] of the factors influencing adults to participate in coursework, the results of their hurdle model demonstrated that the odds of a female participating in coursework were 1.72 times the odds of a male. However, among those participating in coursework, gender did not influence the number of hours an adult participated.

We have detailed some methods for the various steps needed when addressing count data that are overdispersed or contain excess zeros. Figure A6.1 is a decision tree to aid in selecting the most appropriate model type. However, there are many ways of dealing with both of these issues that we have not included. For further reading, we suggest the following books that cover negative binomial, hurdle, and zero-inflated models: Cameron and Trivedi’s Regression Analysis of Count Data [88] and Hilbe’s Negative Binomial Regression [85]. Additionally, Agresti [13] and Fox & Weisberg [23] cover negative binomials in their books (Categorical Data Analysis and An R Companion to Applied Regression, respectively). Finally, although in the context of ecology, Zuur and colleagues [92] have a nice discussion of hurdle and zero-inflated models in Mixed Effects Models and Extensions in Ecology with R.
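The clicker-question scenario above can also be checked numerically: under a Poisson model, P(Y = 0) = e^(−μ), so comparing the observed proportion of zeros to e^(−x̄) is a quick first diagnostic for excess zeros. A sketch on hypothetical data (counts invented for illustration, mimicking a class where roughly half the students were absent):

```python
import math
import statistics

# Hypothetical clicker-question counts (invented for illustration):
# 40 absent students contribute structural zeros on top of the
# zeros the count process itself would produce.
counts = [0] * 40 + [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 2, 3, 1, 4]

mu = statistics.mean(counts)
observed_zero_frac = counts.count(0) / len(counts)
poisson_zero_frac = math.exp(-mu)  # P(Y = 0) under Poisson(mu)

print(f"observed fraction of zeros:  {observed_zero_frac:.2f}")
print(f"Poisson-predicted fraction:  {poisson_zero_frac:.2f}")
if observed_zero_frac > poisson_zero_frac:
    print("excess zeros: consider a hurdle or zero-inflated model")
```

Here far more zeros are observed than a Poisson with this mean predicts, which is the signature that motivates the two-part models above.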

Figure A6.1: Decision tree for determining which GLM method is appropriate for overdispersed count data. As with the dichotomous key in the main text, this decision tree is used in an analogous manner.

Figure A6.2: Zero-inflated Poisson Regression annotated output