
Analysis of Variance: General Concepts



This chapter is designed to present the most basic ideas in analysis of variance in a non-statistical manner. Its intent is to communicate the general idea of the analysis and to provide enough information to begin reading results sections that report ANOVA analyses.

Analysis of Variance is a general-purpose statistical procedure that is used to analyze a wide range of research designs and to investigate many complex problems. In this chapter we will only discuss the original, basic use of ANOVA: analysis of experiments that include more than two groups. When ANOVA is used in this simple sense, it follows directly from a still simpler procedure, the t-test. The t-test compares two groups, either in a between-subjects design (different subjects in the groups) or a repeated-measures design (same subjects assessed twice). ANOVA can be thought of as an extension of the t-test to situations in which there are more than two groups (one-way design) or where there is more than one independent variable (factorial design). These situations are the most common in research, so ANOVA is used far more frequently than t-tests.

A Deeper Truth (sidebar): Actually, the t-test is a special case of ANOVA. ANOVA is the real thing.
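A quick way to see the sidebar's point that the t-test is a special case of ANOVA: with exactly two groups, the ANOVA F statistic is simply the t statistic squared, and the two p-values are identical. The short Python check below is an addition to the chapter (it assumes the SciPy library is available) and uses the Group 1 and Group 4 IQ scores from the example later in this chapter.

from scipy import stats

group1 = [80, 85, 90, 95, 100]      # Group 1 IQ scores from the example below
group4 = [105, 110, 115, 120, 125]  # Group 4 IQ scores

t, p_t = stats.ttest_ind(group1, group4)   # independent-samples t-test
F, p_F = stats.f_oneway(group1, group4)    # one-way ANOVA on the same two groups

print(round(t ** 2, 3), round(F, 3))       # both are 25.0: F equals t squared
print(round(p_t, 5), round(p_F, 5))        # identical p-values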

Variance is Analyzed

The name “analysis of variance” is more representative of what the analysis is about than “t-test” because we are in fact focusing on analyzing variances. The conceptual model for ANOVA follows the familiar pattern first introduced in the Inferential Statistics chapter: a ratio is formed between the differences in the means of the groups and the error variance. In the same way that a variance (or standard deviation) can be calculated from a set of data, a variance can be calculated from a set of means. So the differences among the means are thought of as their variance: higher variance among the means indicates that there are more differences (which is good, right?). The variance among the means is called the between-groups variance.

A Still Deeper Truth (sidebar): Actually, ANOVA is a simplification of very complex correlations. Correlation is the real thing.

The ratio, then, is between-groups variance divided by error variance. A larger ratio indicates that the differences between the groups are greater than the error or “noise” going on inside the groups. If this ratio, the F statistic, is large enough given the size of the sample, we can reject the null hypothesis. The whole story in ANOVA is figuring out how to calculate (and understand) these two types of variance.


A Visual Example

Here is an example of a one-way, between-groups design that would be analyzed using ANOVA. Four groups of participants are randomly sampled from four majors on campus. We will not identify the majors for the sake of interdepartmental harmony, but the identity of Group 4 is clear. Each sample includes five students. They are each administered the Wechsler Adult Intelligence Scale (WAIS-III) to obtain a measure of IQ. IQs have a mean of 100 in the population as a whole. Our question: which major is smarter?

The following table presents the raw data (IQ scores), the means within each group, the standard deviation within each group, and the variance. The variance is simply the SD squared, a more useful number for certain aspects of the calculations. It is normal that some of the SDs are larger than others. The gray bars below the scale represent the range of the IQs in each major, which is one indication of the within-group variability. (A wider range often produces a higher SD.) In the last column, the mean of the means (grand mean), the standard deviation of the means, and the variance of the means are presented.

            Data                      Mean    Std. Dev.  Variance
Group 1     80, 85, 90, 95, 100       90.0    7.9        62.5
Group 2     90, 93, 96, 99, 102       96.0    4.7        22.5
Group 3     97, 100, 103, 106, 109    103.0   4.7        22.5
Group 4     105, 110, 115, 120, 125   115.0   7.9        62.5
Grand Mean  (the group means)         101.0   10.7       115.3

[Figure: a number line from 80 to 125; gray bars show the range of IQ scores within each major (Group 1 through Group 4), and the individual Group 1 scores are plotted as circles inside its bar.]
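As a cross-check on the table (an addition, not part of the chapter), the descriptive statistics can be reproduced in a few lines of Python with the standard library; like the table, the statistics module uses the sample (n − 1) formulas for the SD and variance.

import statistics as st

groups = {
    "Group 1": [80, 85, 90, 95, 100],
    "Group 2": [90, 93, 96, 99, 102],
    "Group 3": [97, 100, 103, 106, 109],
    "Group 4": [105, 110, 115, 120, 125],
}

for name, scores in groups.items():
    # mean, SD, and variance within each group (matches the table above)
    print(name, st.mean(scores), round(st.stdev(scores), 1), st.variance(scores))

group_means = [st.mean(scores) for scores in groups.values()]
print("Grand mean:", st.mean(group_means))                  # 101.0
print("SD of the means:", round(st.stdev(group_means), 1))  # 10.7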

What’s the null hypothesis? The null condition is that there is no difference between the population means:

H0: µ1 = µ2 = µ3 = µ4, where µ is “mu,” the population mean.

Our task is to determine whether the sample means, presented in the table above, are sufficiently different from each other, compared to the error variance within the groups, to reject the null hypothesis. Of course the sample size will also affect the outcome because larger samples allow for better tests of the null hypothesis. In the case of ANOVA, we will look at the ratio of the between-groups variance to the within-group (error) variance:

F = between-groups variance / error variance within groups

In the example, we have included the individual data for group 1 as circles inside the group 1 gray bar. The SD of group 1, 7.9, is computed from these 5 values. Recall that the SD is the variability of the individual data based on how distant each one is from the group mean (90.0). In other words, it is a measure of the extent to which the five students sampled for that major are not exactly of the same intelligence. The students in group 2 are more similar to each other and produce an SD of 4.7. The overall error variance for the sample is computed by combining these four SDs (see sidebar).

The between-groups variability is computed in the same way, but we look at how much the group means vary from the grand mean (the mean of the means). The higher this variability, the more the means differ from each other and the more the null hypothesis looks “rejectable.” (See sidebar.)

Calculating the Variances (sidebar):

Within-Groups (Error) Variance: The overall amount of error variance is the combined variances of the four groups. Combining the variances from several groups together is called pooling, so the resulting combined variance is termed the pooled variance. Averaging the variances in this study produces a pooled error variance of 42.5 (a pooled SD of 6.52).

Between-Groups Variance: Calculation of the between-groups variance is not as intuitive as the within-groups variance. Conceptually, it seems that the SD of the four group means would be a good measure. (The SD of the means is 10.7.) However, the actual between-groups SD is 24.0, so the between-groups variance is 24² ≈ 577.
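To make the sidebar's numbers concrete, here is a short Python check (an addition to the chapter). With equal group sizes, pooling the error variance amounts to averaging the four within-group variances, and the between-groups variance (mean square) is the variance of the four group means scaled up by the group size n = 5, which is why it is about 577 rather than the roughly 115 you would get from the means alone.

import statistics as st

within_variances = [62.5, 22.5, 22.5, 62.5]   # from the table above
pooled = st.mean(within_variances)            # equal group sizes, so pooling = averaging
print(pooled, round(pooled ** 0.5, 2))        # 42.5 and its square root, 6.52

group_means = [90.0, 96.0, 103.0, 115.0]
n = 5                                         # students per group
ms_between = n * st.variance(group_means)     # between-groups variance (mean square)
print(round(ms_between, 1), round(ms_between ** 0.5, 1))  # 576.7 and 24.0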

Finally, the ANOVA

The ANOVA focuses on the ratio of the between-groups variance to the within-groups variance. SPSS produces an ANOVA source table to report the result of the analysis. This table is called a source table because it identifies the sources of variability in the data. As explained above, there are two kinds of variability: variability between group means, and variability within groups (error variance). The source table provides information about these two sources. The column numbers (1)-(6) have been added for our use.

ANOVA Source Table

(1)               (2)              (3)   (4)            (5)       (6)
Source            Sum of Squares   df    Mean Square    F         Sig.
Between Groups    1730.000         3     576.667        13.569    .0001
Within Groups     680.000          16    42.500
Total             2410.000         19

Column 3, reflecting the number of groups and the sample size, is discussed in the Degrees of Freedom sidebar. Column 4 presents the variance associated with the mean differences (between groups) and with within-group error; these numbers are discussed in the Calculating the Variances sidebar. Column 5 is the ratio of these two values, the F statistic. Column 6 presents the p-value (see the Inferential Statistics chapter) of the F statistic based on the sample size. Because our normal criterion for rejecting the null hypothesis is p < .05, this p value is very good (good = low), and we can reject the null hypothesis.

Degrees of Freedom in ANOVA (sidebar): All statistics, such as F, t, and chi-square, are evaluated in the context of the sample size: larger samples allow lower statistical values to reach the magic .05 level of confidence. The sample size is expressed in terms of degrees of freedom (df). Your statistics class has more to say about df. In a t-test, the df is the sample size minus 2 (N − 2). In ANOVA, we use two df values. The df-error is based on the sample size:

df_e = Σ(n_g − 1), where n_g is the size of each of the group samples
16 = (5−1) + (5−1) + (5−1) + (5−1)

ANOVA also requires a df for the number of groups:

df_bg = g − 1, where g is the number of groups

The F statistic is always presented along with these df values, e.g., F(3,16) = 13.6, p < .0001.
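For readers who want to verify the source table outside SPSS, the sketch below (an addition to the chapter, assuming the SciPy library) reproduces the F and p values and the two sums of squares from the raw IQ scores.

from scipy import stats

g1 = [80, 85, 90, 95, 100]
g2 = [90, 93, 96, 99, 102]
g3 = [97, 100, 103, 106, 109]
g4 = [105, 110, 115, 120, 125]

F, p = stats.f_oneway(g1, g2, g3, g4)
print(round(F, 3), round(p, 4))   # F = 13.569, p = .0001, as in the source table

# Sums of squares by hand, matching the Sum of Squares column
scores = g1 + g2 + g3 + g4
grand_mean = sum(scores) / len(scores)                       # 101.0
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (g1, g2, g3, g4))
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in (g1, g2, g3, g4) for x in g)
print(ss_between, ss_within)                                 # 1730.0 and 680.0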

What has been rejected? By rejecting the null hypothesis, we conclude that the four means are not equal in the population, that is, all majors are not created equal. However, the test does not tell us exactly which major is smarter than which other major. Is Group 4 smarter than Group 2, or just smarter than the hapless Group 1? Just eyeballing the means is not good enough: we need to know whether particular pairs of means are significantly different from each other. How is this done? One way is to perform t-tests between pairs of means (there are several other ways as well).
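As an illustration of the pairwise t-test approach, the sketch below (an addition to the chapter, assuming SciPy) compares every pair of majors. Note that it applies no correction for running multiple tests, a complication your statistics class will address.

from itertools import combinations
from scipy import stats

groups = {
    1: [80, 85, 90, 95, 100],
    2: [90, 93, 96, 99, 102],
    3: [97, 100, 103, 106, 109],
    4: [105, 110, 115, 120, 125],
}

# Independent-samples t-test for every pair of majors
for a, b in combinations(groups, 2):
    t, p = stats.ttest_ind(groups[a], groups[b])
    print(f"Group {a} vs. Group {b}: t = {t:.2f}, p = {p:.4f}")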

Using SPSS to Calculate One-Way ANOVA

A one-way ANOVA is an analysis in which there is only one independent variable, as in the preceding example. This is the simplest kind of ANOVA, and SPSS dedicates a procedure purely to it. (See menu screen illustration.)

The dialog window in which the details of the analysis are entered is quite simple.

In the dialog illustration, the dependent variable (IQ) and the independent variable (group) have been entered. In the Options... dialog you can ask for descriptive statistics and a rather sorry-looking graph of the means.

Syntax:

ONEWAY iq BY group
  /STATISTICS DESCRIPTIVES
  /PLOT MEANS
  /MISSING ANALYSIS .

The principal output of the procedure is the source table shown above.

In a paper, the appropriate way to report the results of an ANOVA is a variation of:

A one-way between-groups ANOVA revealed a significant effect of major, F(3,16) = 13.6, p < .05.

Note that the ANOVA used two types of degrees of freedom: the between-groups df and the error df.

Factorial ANOVA

If indeed “the truth lies in the interactions,” then we need to perform more complicated studies that include more than one IV. Factorial designs of this kind were introduced in the research designs chapter. For example, in the study presented above, we might want to know if gender is related to IQ. The obvious design would be a 4x2 between-subjects factorial: four majors crossed with gender. In the table below, the 40 students are indicated by S1...S40 in the 8 cells of the factorial design.

Group 1 Group 2 Group 3 Group 4

Male S1, S2, S3, S4, S5 S11, S12, S13, S14, S15 S21, S22, S23, S24, S25 S31, S32, S33, S34, S35

Female S6, S7, S8, S9, S10 S16, S17, S18, S19, S20 S26, S27, S28, S29, S30 S36, S37, S38, S39, S40

The calculations of a factorial ANOVA are more complicated than those of the one-way ANOVA, but the principles are the same. The ANOVA compares the variability due to between-groups differences to the amount of error variance in the sample. However, in this two-way factorial, we need to look at three types of between-groups variability: the variability between the majors, the variability between the genders, and the interaction effect variability. A ratio (F statistic) of between-groups variability to error variance is calculated for each of these three types of between-groups variability.
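The chapter runs this factorial analysis in SPSS (next section). As a rough Python parallel, the hedged sketch below uses the pandas and statsmodels libraries (neither appears in the chapter) and assumes a data file, here called iq_study.csv, with one row per student and columns named iq, group, and gender; the file name and column names are illustrative assumptions.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed layout: one row per student, with columns iq, group, and gender
df = pd.read_csv("iq_study.csv")   # hypothetical file name

# Fit the 4 x 2 factorial model: two main effects plus their interaction
model = smf.ols("iq ~ C(group) + C(gender) + C(group):C(gender)", data=df).fit()

# One F test per source of between-groups variability (major, gender, interaction).
# Type II sums of squares; in a balanced design these match the Type III values
# that SPSS reports by default.
print(sm.stats.anova_lm(model, typ=2))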

How many null hypotheses are there?

SPSS and Factorial ANOVA

The simple one-way ANOVA procedure cannot be used. Instead, factorial ANOVAs are produced by the SPSS GLM procedure. GLM means “general linear model.” You will study the GLM in your second year of graduate-level statistics. GLM is a very powerful and flexible procedure that was only introduced to SPSS in the 1980s. Because it is powerful and flexible, it can be configured in many ways and has a large number of options.

Univariate refers to the fact that you will be analyzing one dependent variable at a time. The IQ-across-majors study presented previously was enhanced by adding gender as a second independent variable to serve as an example of a factorial ANOVA. The analysis dialog box shown here has been configured to run this 4x2 ANOVA. Use the Fixed Factors box for the IVs. Ignore the boxes below that until you get to graduate school. You can specify in detail which means tables you would like to see displayed in the output by clicking on Options.

What Other Goodies are in this Menu? (sidebar): Multivariate ANOVA (MANOVA) allows you to analyze several DVs simultaneously, in a single set. Repeated Measures ANOVA analyzes the repeated-measures designs introduced in the research designs chapter.

Double-clicking on the items in the left-side box moves them to the right-side ‘Display Means for’ box. In this case, moving ‘group’ to the Display Means box produces a means table that includes just the main effect of group. The ‘group*gender’ item displays a 4x2 table of means from which you can see if there is an interaction effect.

Syntax:

UNIANOVA iq BY group gender
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(group*gender)
  /CRITERIA = ALPHA(.05)
  /DESIGN = group gender group*gender .

The Output

The source table in a factorial ANOVA expands on that of the one-way ANOVA. Two additional sources are reported: the second IV, and the interaction effect. (See Tests of Between-Subjects Effects table.)

The only rows of importance in this source table are those indicating the effects in the factorial model: GROUP, GENDER, and GROUP*GENDER. The F statistics in this type of source table are calculated by dividing a factor’s Mean Square by the Mean Square of the Error row. “Mean Square” is another way of saying “variance.” Hence, F for the Group factor is:

MS_Group / MS_Error = 477.2 / 14.167 = 33.685.

These results show that the Group and Gender main effects are significant at a very low p value. SPSS will not print all of the significant digits of a very small p value. For Group, the actual p value is .000039, but no one cares because it is so far below .05. The Group X Gender interaction is not significant because the p value is so large (p = .567).

Tests of Between-Subjects Effects
Dependent Variable: IQ

Source            Type III Sum of Squares   df    Mean Square    F            Sig.
Corrected Model        2240.000              7        320.000       22.588    .000
Intercept            195859.200              1     195859.200    13825.355    .000
GROUP                  1431.600              3        477.200       33.685    .000
GENDER                  480.000              1        480.000       33.882    .000
GROUP * GENDER           30.000              3         10.000         .706    .567
Error                   170.000             12         14.167
Total                206430.000             20
Corrected Total        2410.000             19

R Squared = .929 (Adjusted R Squared = .888)

In a paper, there are several forms for reporting the results of a factorial ANOVA:

A 4 (major) x 2 (gender) between-groups ANOVA revealed significant main effects of major, F(3,12) = 33.7, p < .05, and gender, F(1,12) = 33.9, p < .05. The interaction effect did not approach significance, F < 1.

or, if the interaction had been stronger:

A 4 (major) x 2 (gender) between-groups ANOVA revealed significant main effects of major, F(3,12) = 33.7, p < .05, and gender, F(1,12) = 33.9, p < .05. However, these main effects must be interpreted within the significant Major X Gender interaction, F(3,12) = 8.5, p < .05.

Note that the ANOVA used two types of degrees of freedom: the between-groups df and the error df.

Digging Deeper

Overall, do major and gender help us know what students’ IQs are? Said another way, do major and gender predict IQ? The ‘Corrected Model’ row in the source table answers this general question: yes. The idea of a model was introduced in an early chapter. Here, the model is expressed mathematically:

IQ = ƒ (major, gender)

The Corrected Model essentially combines all the predictors of IQ (Group, Gender, and their interaction) to see if, as a whole, they predict the dependent variable. (Hint: add the df.) Of course, we usually don’t care about the whole model, but rather only about its component parts, the individual IVs.

The ‘Intercept’ row in the table is not usually important. It compares the grand mean (101.0) to zero. Because 101 is so far from zero, the F is enormous. (But see the sidebar for its deeper meaning.)

Reprise of “Still Deeper Truth” (sidebar): The intercept reveals a clue to the ridiculous conspiracy that ANOVA is just a lot of correlations. Do you remember the equation for a line from algebra? In statistics we call this a regression line, and write the equation as

y = a + bx + e

where y is the dependent variable (IQ); a is the y-intercept of the line; b is the slope of the line; x is, sort of, the independent variables (major, gender, and the interaction, all rolled into one, sort of); and e is the error variance.

In the ANOVA table, the Intercept F-test is testing whether the y-intercept (a) is different from zero. In a certain sense, the Corrected Model F-test is testing whether the slope (b) is different from zero. When the slope is different from zero, the independent variables (x) affect the dependent variable (y). In the manner of a correlation, a slope (b) near 1.0 and a low error (e) give us a correlation scattergram with a long, skinny oval (i.e., a good correlation). Error variance (e) is analogous to the fatness of the oval.

[Figure: a scattergram with a fitted regression line, labeled with the slope (b) and the y-intercept (a); a skinny oval and a slope near 1.0 indicate a high correlation coefficient. Axes: x (IV) and y (DV).]
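The sidebar’s claim that ANOVA is really regression (and thus correlation) under the hood can be demonstrated directly. The sketch below is an addition to the chapter and assumes the pandas and statsmodels libraries: refitting the one-way major example as a regression reproduces the same F(3,16) = 13.57, and its R-squared is the same kind of ‘R Squared’ value SPSS prints beneath the factorial source table.

import pandas as pd
import statsmodels.formula.api as smf

# One row per student: IQ score and major (group), from the one-way example
df = pd.DataFrame({
    "iq": [80, 85, 90, 95, 100, 90, 93, 96, 99, 102,
           97, 100, 103, 106, 109, 105, 110, 115, 120, 125],
    "group": [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5,
})

# Treat the ANOVA as a regression of iq on (dummy-coded) group membership
model = smf.ols("iq ~ C(group)", data=df).fit()

print(round(model.fvalue, 3), model.df_model, model.df_resid)  # 13.569 with df (3, 16)
print(round(model.rsquared, 3))   # .718: share of IQ variance explained by major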

What’s Next?

A lot more...