7 Dummy-Variable Regression
An obvious limitation of multiple-regression analysis, as presented in Chapters 5 and 6, is that it accommodates only quantitative response and explanatory variables. In this chapter and the next, I will show how qualitative (i.e., categorical) explanatory variables, called factors, can be incorporated into a linear model.1

The current chapter begins by introducing a dummy-variable regressor, coded to represent a dichotomous (i.e., two-category) factor. I proceed to show how a set of dummy regressors can be employed to represent a polytomous (many-category) factor. I next describe how interactions between quantitative and qualitative explanatory variables can be included in dummy-regression models and how to summarize models that incorporate interactions. Finally, I explain why it does not make sense to standardize dummy-variable and interaction regressors.

7.1 A Dichotomous Factor

Let us consider the simplest case: one dichotomous factor and one quantitative explanatory variable. As in the two previous chapters, assume that relationships are additive: that is, the partial effect of each explanatory variable is the same regardless of the specific value at which the other explanatory variable is held constant. As well, suppose that the other assumptions of the regression model hold: The errors are independent and normally distributed, with zero means and constant variance.

The general motivation for including a factor in a regression is essentially the same as for including an additional quantitative explanatory variable: (1) to account more fully for the response variable, by making the errors smaller, and (2) even more important, to avoid a biased assessment of the impact of an explanatory variable, as a consequence of omitting a causally prior explanatory variable that is related to it.
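To make the dichotomous case concrete, here is a minimal sketch in Python of the additive dummy-regression model just described, fit by ordinary least squares with numpy. The simulated data and variable names are illustrative assumptions, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustration (hypothetical data): one quantitative explanatory
# variable x and one dichotomous factor coded as a 0/1 dummy regressor d.
n = 200
x = rng.uniform(0, 20, size=n)        # quantitative explanatory variable
d = rng.integers(0, 2, size=n)        # dummy regressor: 1 = category B, 0 = category A
y = 3.0 + 1.5 * x + 4.0 * d + rng.normal(0, 2, size=n)

# Model matrix for the additive model  y_i = alpha + beta*x_i + gamma*d_i + e_i
X = np.column_stack([np.ones(n), x, d])

# Ordinary least squares via numpy
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta, gamma = coef
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, gamma = {gamma:.2f}")
```

Because the model is additive, the two categories are fit by parallel regression lines with common slope beta; the dummy coefficient gamma is the constant vertical distance between the lines, that is, the difference in intercepts between the two categories.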
15 Generalized Linear Models
Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described in Part II of this text and the logit and probit models described in the preceding chapter. The current chapter begins with a consideration of the general structure and range of application of generalized linear models; proceeds to examine in greater detail generalized linear models for count data, including contingency tables; briefly sketches the statistical theory underlying generalized linear models; and concludes with the extension of regression diagnostics to generalized linear models.

The unstarred sections of this chapter are perhaps more difficult than the unstarred material in preceding chapters. Generalized linear models have become so central to effective statistical data analysis, however, that it is worth the additional effort required to acquire a basic understanding of the subject.

15.1 The Structure of Generalized Linear Models

A generalized linear model (or GLM1) consists of three components:

1. A random component, specifying the conditional distribution of the response variable, Y_i (for the ith of n independently sampled observations), given the values of the explanatory variables in the model. In Nelder and Wedderburn's original formulation, the distribution of Y_i is a member of an exponential family, such as the Gaussian (normal), binomial, Poisson, gamma, or inverse-Gaussian families of distributions. Subsequent work, however, has extended GLMs to multivariate exponential families (such as the multinomial distribution), to certain non-exponential families (such as the two-parameter negative-binomial distribution), and to some situations in which the distribution of Y_i is not specified completely.
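As a concrete instance of this structure, the sketch below (in Python, assuming the statsmodels library is available) fits a GLM for count data with a Poisson random component, one of the exponential families named above, together with its canonical log link. The simulated data and variable names are illustrative assumptions, not an example from the text.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated count data (hypothetical): the conditional distribution of Y_i
# given x_i is Poisson with mean mu_i = exp(alpha + beta*x_i).
n = 500
x = rng.uniform(0, 2, size=n)
mu = np.exp(0.5 + 0.8 * x)          # inverse log link applied to the linear predictor
y = rng.poisson(mu)                 # random component: Y_i ~ Poisson(mu_i)

# Fit the corresponding GLM: Poisson family with log link.
X = sm.add_constant(x)              # model matrix with an intercept column
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                # estimates of alpha and beta
```

The same call with family=sm.families.Binomial() or sm.families.Gamma() swaps in a different member of the exponential family while the rest of the model specification stays unchanged, which is the practical payoff of the common structure described above.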