SW 983: MR WITH CURVILINEAR AND INTERACTION TESTS

Curvilinear MRA (aka “trend analysis”) can be applied in three contexts:

1. Experimental design, continuous independent variable, equal intervals and balanced cells.
2. Experimental design, continuous independent variable, unequal intervals and/or unbalanced cells.
3. Nonexperimental design, continuous independent variable, unequal intervals and unequal cells.

When the independent variable is continuous, the choice between analysis of variance (ANOVA) and multiple regression (MR) is not arbitrary (as it is when dealing with a categorical independent variable).

ANOVA treats the continuous independent variable as though it were categorical. It ignores the fact that the groups defined by the independent variable differ in degree, not just in kind (see the definitions of categorical and continuous variables).

ANOVA tests whether the groups differ in any way. It tells us nothing about the nature or direction of the relationship.

MR tests whether the groups differ in a specific way (linear or curvilinear relationship). Use of regression implies a more sophisticated underlying theory or understanding of the nature of the relationship between the variables. MR can lead to a more concise statement of the relationship.

Curvilinear Regression

Remember that regression analysis tests for linear relationships. This can be a limitation in that it represents an oversimplification of many relationships.

Curvilinear regression permits greater sophistication and parsimony (relative to ANOVA) when it is found to adequately describe relationships.

Examine plots of the residuals. The residuals contain all the information about how your data fail to fit the specified equation. The computer will do the plots for you; a sketch is given below.
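A minimal sketch of such a residual plot, assuming Python with numpy and matplotlib and simulated data (none of the numbers below come from the handout); a systematic bow in the residuals is the tell-tale sign that the straight line has missed a curve:

```python
# Minimal sketch: residual plot after fitting a straight line (simulated data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 6)                 # 5 equally spaced levels, 6 cases each
y = 2 + 1.5 * x - 0.4 * x**2 + rng.normal(0, 0.5, x.size)   # the true relationship is quadratic

b1, b0 = np.polyfit(x, y, 1)          # fit a straight line only
residuals = y - (b0 + b1 * x)         # what the linear equation fails to explain

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("X")
plt.ylabel("Residual (Y - Y')")
plt.title("Residuals from the linear fit")
plt.show()
```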

You can test the significance of deviations from linearity by including higher-order terms in the equation and testing the additional SS (R²).

SSdev = SStreat - SSreg

Note that the highest-degree polynomial possible is g - 1, where g is the number of distinct values of the independent variable (i.e., the highest degree is one less than the number of groups).
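A minimal sketch of the deviations-from-linearity test in Python (numpy and scipy only; the function name and data layout are illustrative, not from the handout): SSdev = SStreat - SSreg is compared against the within-groups mean square.

```python
# Minimal sketch of the deviations-from-linearity F test: SSdev = SStreat - SSreg.
import numpy as np
from scipy import stats

def deviations_from_linearity(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    levels = np.unique(x)                 # the g distinct values of the IV
    g, n = levels.size, y.size
    grand_mean = y.mean()

    # Between-groups (treatment) SS: everything the group means can explain.
    n_j = np.array([np.sum(x == lv) for lv in levels])
    means = np.array([y[x == lv].mean() for lv in levels])
    ss_treat = np.sum(n_j * (means - grand_mean) ** 2)

    # SS explained by the straight line.
    b1, b0 = np.polyfit(x, y, 1)
    ss_reg = np.sum((b0 + b1 * x - grand_mean) ** 2)

    # Within-groups SS supplies the error term.
    fitted_anova = means[np.searchsorted(levels, x)]
    ss_within = np.sum((y - fitted_anova) ** 2)

    ss_dev = ss_treat - ss_reg            # deviations from linearity
    df_dev, df_within = g - 2, n - g
    F = (ss_dev / df_dev) / (ss_within / df_within)
    return F, stats.f.sf(F, df_dev, df_within)
```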

Points to remember:

• Nonlinear in the variables, not in the parameters.
• Analysis proceeds hierarchically.
• Lower-order terms are not omitted.
• As in stepwise regression, we seek parsimony (i.e., the lowest order that adequately explains the relationship).
• Seldom have data beyond the quadratic; seldom have theory beyond the cubic.
• Polynomial terms tend to be highly correlated. Ignore the b's except for the highest-order term.
• Interpolation is OK, but extrapolation is not.

Orthogonal Polynomials

Get the coefficients from a table or from the computer. They are easy to generate with equally spaced levels and equal cells; it is possible under other conditions but more complicated. A sketch is given below.
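A minimal sketch, assuming Python with numpy, of one way a computer can generate them: QR-decompose a centered power matrix of the levels. The function name is illustrative, and the coefficients come out normalized, so they are proportional to (not identical to) the integer values printed in tables.

```python
# Minimal sketch: orthogonal polynomial contrasts for equally spaced levels
# with equal cell sizes, via QR decomposition of a centered power matrix.
import numpy as np

def orthogonal_poly(levels, degree):
    x = np.asarray(levels, dtype=float) - np.mean(levels)    # center the levels
    powers = np.vander(x, degree + 1, increasing=True)       # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(powers)
    return Q[:, 1:]                                          # drop the constant column

# Linear, quadratic, and cubic trend coefficients for five equally spaced groups:
print(np.round(orthogonal_poly([1, 2, 3, 4, 5], degree=3), 3))
```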

Easily extended to factorial designs (i.e., treat the factors one at a time and include all cross-products for the interactions).

Main reason for using them: they simplify calculations and interpretation.

Nonexperimental Research

Orthogonal polynomials generally not possible or used

Messier, but the analysis proceeds the same way, i.e., hierarchically, testing the increment in R² for each successively higher-order polynomial.

Hierarchical analysis may result in declaring a component to be nonsignificant because the error term used at the given stage includes variance that may be due to higher-order polynomials. For this reason, it is suggested that an additional degree of polynomial be tested beyond the first nonsignificant one.
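A minimal sketch of that hierarchical procedure in Python (numpy and scipy; names and data are illustrative): each successively higher-order term is added and its increment in R² is F-tested; in practice one would stop after testing one degree beyond the first nonsignificant increment.

```python
# Minimal sketch: hierarchical polynomial regression, F-testing each increment in R^2.
import numpy as np
from scipy import stats

def r_squared(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_poly_tests(x, y, max_degree=3):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = y.size
    xc = x - x.mean()                       # centering tames the collinearity among powers
    X = np.ones((n, 1))                     # start from the intercept-only model
    r2_prev = 0.0
    for d in range(1, max_degree + 1):
        X = np.column_stack([X, xc ** d])   # add the next-higher-order term
        r2 = r_squared(X, y)
        df_resid = n - X.shape[1]           # n - (number of terms, intercept included)
        F = (r2 - r2_prev) / ((1.0 - r2) / df_resid)
        p = stats.f.sf(F, 1, df_resid)
        print(f"degree {d}: increment in R^2 = {r2 - r2_prev:.4f}, F = {F:.2f}, p = {p:.4f}")
        r2_prev = r2
```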

Multiple Curvilinear Regression

It is possible, with either experimental or nonexperimental data, to extend polynomial regression analysis to multiple independent variables. The mechanics of the analysis are fairly simple, but the interpretation of the results is far from simple. Consider the case of two independent variables:

Y' = a + b1X + b2Z + b3XZ + b4X² + b5Z²

The analysis proceeds hierarchically to determine first if the regression of Y on X and Z is linear or curvilinear. In effect, one tests whether b3=b4=b5=0. This can be easily accomplished by testing whether the last three terms in the above equation add significantly to the proportion of explained variance.
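A minimal sketch of that joint test in Python (numpy and scipy; the function name is illustrative): fit the reduced and full models and F-test the increment in R² on 3 degrees of freedom.

```python
# Minimal sketch: test whether b3 = b4 = b5 = 0 by comparing the linear model
# (X, Z) with the full model (X, Z, XZ, X^2, Z^2) via the increment in R^2.
import numpy as np
from scipy import stats

def r_squared(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def test_higher_order_terms(x, z, y):
    x, z, y = (np.asarray(v, dtype=float) for v in (x, z, y))
    n = y.size
    ones = np.ones(n)
    X_linear = np.column_stack([ones, x, z])                     # a + b1*X + b2*Z
    X_full = np.column_stack([ones, x, z, x * z, x**2, z**2])    # adds b3*XZ + b4*X^2 + b5*Z^2
    r2_lin, r2_full = r_squared(X_linear, y), r_squared(X_full, y)
    q = 3                                    # number of terms being tested
    df_resid = n - X_full.shape[1]
    F = ((r2_full - r2_lin) / q) / ((1.0 - r2_full) / df_resid)
    return F, stats.f.sf(F, q, df_resid)
```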

Centering Data i

Lower-order coefficients in higher-order regression equations have a meaningful interpretation only if the variable we are working with has a meaningful zero. Centering the data gives zero a meaning (zero = the mean). We center X, not Y, so predicted scores remain in the metric of the observed dependent variable. Centering also reduces the extreme multicollinearity associated with using powers of predictors or multiplicative terms in a single equation.

If all the predictors in a regression equation containing interactions are centered, then each first-order coefficient has an interpretation that is meaningful in terms of the variables under investigation: the regression of the criterion on the predictor at the sample means of all other variables in the equation. Each first-order regression coefficient can also be interpreted as the average regression of the criterion on the predictor across the range of the other predictors.

Only if there is an interaction does rescaling a variable by a linear transformation (e.g., centering) change the first-order regression coefficients. McDonald notes that if you have a significant interaction, the cautions we discussed with factorial ANOVA still apply. In other words, don't talk about main effects (first-order terms) until you have examined and explained the interaction. Centering of continuous independent variables is recommended unless zero is already a meaningful value of the variable.
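A minimal sketch, using simulated data in Python with numpy, of what centering does when an interaction term is in the equation: the first-order b's change (they now describe the regression at the mean of the other predictor) and the correlation between a predictor and its product term drops sharply. All names and numbers are illustrative.

```python
# Minimal sketch: effect of centering in an equation containing an X*Z interaction.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(50, 10, n)
z = rng.normal(100, 15, n)
y = 3 + 0.4 * x + 0.2 * z + 0.05 * x * z + rng.normal(0, 5, n)

def fit_interaction(xv, zv):
    X = np.column_stack([np.ones(n), xv, zv, xv * zv])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b                                   # a, b1, b2, b3

xc, zc = x - x.mean(), z - z.mean()
print("raw X, Z:      ", np.round(fit_interaction(x, z), 3))    # b1 = slope of X when Z = 0
print("centered X, Z: ", np.round(fit_interaction(xc, zc), 3))  # b1 = slope of X at the mean of Z
print("corr(X, XZ), raw:       ", round(float(np.corrcoef(x, x * z)[0, 1]), 3))
print("corr(Xc, XcZc), centered:", round(float(np.corrcoef(xc, xc * zc)[0, 1]), 3))
```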

i Most of this is from Cohen et al. (2003).
