
Analysis of Variance and Contrasts

Ken Kelley’s Class Notes

Lesson Breakdown by Topic

1. Goal of Analysis of Variance
   - A Conceptual Example Appropriate for ANOVA
   - Example F-Test for Independent Variances
   - Conceptual Underpinnings of ANOVA
   - Mean Squares
2. The Formal ANOVA Model
   - A Worked Example
3. Explanation by Example
   - Example: Weight Loss Drink
   - ANOVA Using SPSS
4. Multiple Comparisons
   - Why Multiplicity Matters
   - Error Rates
   - Linear Combinations of Means
   - Controlling the Type I Error
5. Assumptions
   - Assumptions of the ANOVA
   - What You Learned
   - Notations

What You Will Learn from this Lesson

You will learn:

- How to compare more than two independent means to assess whether any differences exist via an analysis of variance (ANOVA).
- How the total sum of squares for the data can be decomposed into a part that is due to mean differences between groups and a part that is due to within-group differences.
- Why doing multiple t-tests is not the same thing as ANOVA.
- Why doing multiple t-tests leads to a multiplicity issue, in that as the number of tests increases, so too does the probability of one or more errors.
- How to correct for the multiplicity issue so that a set of contrasts/comparisons has a Type I error rate for the collection of tests at the desired (e.g., .05) level.
- How to use SPSS and R to implement an ANOVA and follow-up tests.

Motivation

When looking at different allergy medicines, there are numerous options. So how can it be determined which brand works best when they all claim to do so? Data could be collected on the outcomes from each product among numerous individuals randomly assigned to different brands. An ANOVA could be run to infer whether there is a performance difference among these different brands. If there are no significant results, evidence would not exist to suggest differences in performance among the brands. If there are significant results, we would infer that the brands do not perform the same, but further tests would have to be conducted to infer where the differences are.

Goal of Analysis of Variance

The goal of ANOVA is to detect whether mean differences exist among m groups. Recall that the independent groups t-test is designed to detect differences between two independent groups. The t-test is a special case of ANOVA when m = 2 ($t^2_{df}$ equals the $F_{(1,\, df)}$ from ANOVA for two groups).
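A quick check of this equivalence in R; the data below are simulated and purely illustrative:

```r
# Two-group case: the squared t equals the ANOVA F
set.seed(1)
y <- c(rnorm(10, mean = 5), rnorm(10, mean = 6))
group <- factor(rep(c("A", "B"), each = 10))

t_out <- t.test(y ~ group, var.equal = TRUE)  # pooled-variance t-test
f_out <- anova(lm(y ~ group))                 # one-way ANOVA

t_out$statistic^2    # t^2 with 18 df ...
f_out[1, "F value"]  # ... equals F(1, 18)
```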


Obtaining a statistically significant result for ANOVA conveys that not all groups have the same population mean.

However, a statistically significant ANOVA with more than two groups does not convey where those differences exist. Follow-up tests (contrasts/comparisons) can be conducted to help discern specifically where group means differ.

Consumer Preference

Consider the overall perception of how consumers regard different companies.

An experiment was done in which 30 individuals were randomly assigned to one of three groups.

All participants saw (almost) the same commercial advertising a new Android smart phone. The difference between the groups was that the commercial attributed the phone to either (a) Nokia, (b) Samsung, or (c) Motorola. Of interest is whether consumers tend to rate the brands differently, even for the “same” cell phone.


What are other examples in which ANOVA would be useful?

Example: F-Test for Independent Variances

Consider the null hypothesis of equal variances:

$$H_0\!: \sigma_1^2 = \sigma_2^2.$$

The F -statistic is used to evaluate the above null hypothesis, and is defined as the ratio of two independent variances:

$$F_{(df_1,\, df_2)} = \frac{s_1^2}{s_2^2},$$

where $df_1$ and $df_2$ are the degrees of freedom for $s_1^2$ and $s_2^2$, respectively. Notice that F cannot be negative and is unbounded on the high side; the F-distribution is positively skewed.

Examples

We have previously asked questions about mean differences, but the F-distribution allows us to ask questions about variability. Is the variability of user satisfaction among Gmail users different from the variability of user satisfaction among Outlook.com users?

Does Mars's M&M's production have “better control” (i.e., smaller variance) than Wrigley's Skittles production?

For a given item, are Wal-Mart prices across the country more stable than Kroger’s (for like items)?

Does a particular machine (or location/worker/shift) produce more variable products than a counterpart?


The standard deviation of Gmail user satisfaction was 6.35, based on a sample size of 55. The standard deviation of Outlook.com user satisfaction was 8.90, based on a sample size of 42. For an F-test of this sort addressing any difference in the variances (e.g., is there more variability in user satisfaction in one group), there are two critical values, one at the α/2 quantile and one at the 1 − α/2 quantile. The critical values are ____ and ____ for the .025 and .975 quantiles (i.e., when α = .05). The F-statistic for the test of the null hypothesis is

$$F = \frac{6.35^2}{8.90^2} = \frac{40.3225}{79.21} = .509.$$

The conclusion is: ____.
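A sketch of this two-variance F-test in R (the critical values left blank above come from qf()):

```r
# F-test comparing the Gmail and Outlook.com variances
s1 <- 6.35; n1 <- 55   # Gmail
s2 <- 8.90; n2 <- 42   # Outlook.com
F_obs <- s1^2 / s2^2   # 40.3225 / 79.21 = .509

df1 <- n1 - 1; df2 <- n2 - 1
qf(c(.025, .975), df1, df2)       # the two critical values
2 * min(pf(F_obs, df1, df2),      # two-sided p-value
        1 - pf(F_obs, df1, df2))
```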


Thus far, we have talked only about the idea of comparing two variances.

But, what does this have to do with comparing means, which is the question we are interested in addressing?



Analysis of variance (ANOVA) considers two variances:

- one variance is the variance of the group means;
- another variance is the (weighted) mean of the within-group variances (recall $s_p^2$ from the two-group t-test).

We thus consider the variability of the group means to assess whether the population group means differ.

Conceptual Underpinnings of ANOVA

The null hypothesis in an ANOVA context is that all of the group means are the same: µ1 = µ2 = ... = µm = µ, where m is the total number of groups.

When the null hypothesis is true, we can estimate the variance of the scores with two methods, both of which are independent of one another.


If the ratio of variances (i.e., F -test) is so much larger than 1 that it seems unreasonable to have happened by chance alone, then the null hypothesis can be rejected. Of course, “so much larger than 1 that it seems unreasonable” is defined in terms of the p-value (compared to α).

If the p-value is smaller than α, the null hypothesis of equal population means is rejected. The variance of the scores can be calculated from within each group and then pooled across the groups (in exactly the same manner as was done for the independent groups t-test).

Mean Square Within

Recall that the best way to arrive at a pooled within group variance is to calculate a weighted mean of the variances:

$$s_{\text{Pooled}}^2 = \frac{\sum_{j=1}^{m} (n_j - 1) s_j^2}{\sum_{j=1}^{m} n_j - m} = \frac{\sum_{j=1}^{m} SS_j}{N - m} = s_{\text{Within}}^2 = MS_{\text{Within}},$$

where $SS$ is sum of squares, $MS$ is mean square (i.e., a variance), $m$ is the number of groups, $n_j$ is the sample size in the jth group ($j = 1, \ldots, m$), and $N$ is the total sample size ($N = \sum_{j=1}^{m} n_j$).


In the special case where $n_1 = n_2 = \cdots = n_m = n$, the equation for the pooled variance reduces to:

$$\frac{\sum_{j=1}^{m} s_j^2}{m} = s_{\text{Within}}^2 = MS_{\text{Within}}.$$

Notice that the degrees of freedom here are N − m. The degrees of freedom are N − m because there are N independent observations yet m sample means estimated.
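A small R sketch of this identity, using the group variances from the worked example that appears later (n = 10 per group):

```r
# MS_Within as the weighted mean of the group variances
s2_j <- c(3.17, 9.11, 11.56)  # group variances from the worked example
n_j  <- c(10, 10, 10)

sum((n_j - 1) * s2_j) / (sum(n_j) - length(n_j))  # general formula
mean(s2_j)  # identical here because all n_j are equal
```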

Mean Square Between

Recall from the single-group situation that the variance of the mean is equal to the variance of the scores divided by the sample size (i.e., $s_{\bar{Y}_j}^2 = s_{Y_j}^2 / n_j$).

That is, the variance of the sample means is the variance of the scores divided by the sample size.


However, under the null hypothesis, we can calculate the variance of the sample means directly by using the m means as if they were individual scores.

Then, an estimate of the variance of the scores could be obtained by multiplying the variance of the means by the sample size ($s_{\text{Between}}^2 = n s_{\bar{Y}}^2$).

If the F-statistic is statistically significant, the conclusion is that the variance of the means is larger than it should have been if, in fact, the null hypothesis were true.

Notice that the degrees of freedom here are m − 1.


There are thus two variances that estimate the same value under the null hypothesis.

One ($\sigma_{\text{Within}}^2$) is estimated by pooling the within-group variances.

The other ($\sigma_{\text{Between}}^2$) is estimated by calculating the variance of the means and multiplying by the within-group sample size.

If the null hypothesis is exactly true, $\sigma_{\text{Between}}^2 / \sigma_{\text{Within}}^2 = 1$.

If the null hypothesis is false and mean differences do exist, $s_{\text{Between}}^2$ will be larger than would be expected under the null hypothesis, and then $s_{\text{Between}}^2 / s_{\text{Within}}^2 > 1$.


If $F = s_{\text{Between}}^2 / s_{\text{Within}}^2$ (i.e., $F = MS_{\text{Between}} / MS_{\text{Within}}$) is statistically significant, we will reject the null hypothesis and conclude that $H_0\!: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$ is false.

Thus, we are comparing means based on variances!

The ANOVA Model

The ANOVA model assumes that the score for the ith individual in the jth group is a function of some overall mean, $\mu$, some effect of being in the jth group, $\tau_j$, and some uniqueness, $\varepsilon_{ij}$. Such a scenario implies that

$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij},$$

where

$$\tau_j = \mu_j - \mu,$$

with τj being the treatment effect of the jth group.


When the null hypothesis is true, the sum of the squared τs equals zero: $\sum_{j=1}^{m} \tau_j^2 = 0$.

When the null hypothesis is false, the sum of the squared τs is some number larger than zero: $\sum_{j=1}^{m} \tau_j^2 > 0$.


Thus, we can formally write the null and alternative hypotheses for ANOVA as

$$H_0\!: \sum_{j=1}^{m} \tau_j^2 = 0$$

and

$$H_a\!: \sum_{j=1}^{m} \tau_j^2 > 0,$$

respectively. Note that $H_0\!: \sum_{j=1}^{m} \tau_j^2 = 0$ is equivalent to $H_0\!: \mu_1 = \mu_2 = \cdots = \mu_m = \mu$.


The null hypothesis can be evaluated by determining, probabilistically, whether the sum of the estimated squared τs is greater than zero by more than what would be expected by chance alone. Whether the result is too “hard to believe” under the null hypothesis is judged against the specified α value.


The sums of squares are defined as follows:

$$SS_{\text{Between}} = SS_{\text{Treatment}} = SS_{\text{Among}} = \sum_{j=1}^{m} n_j (\bar{Y}_j - \bar{Y}_{..})^2;$$

$$SS_{\text{Within}} = SS_{\text{Error}} = SS_{\text{Residual}} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2;$$

$$SS_{\text{Total}} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2;$$

$$SS_{\text{Total}} = SS_{\text{Between}} + SS_{\text{Within}}.$$


Like usual, we divide the sums of squares by the appropriate degrees of freedom in order to obtain a variance.

In the ANOVA context, a sum of squares divided by its degrees of freedom is called a mean square: $SS/df = MS$. “Mean squares” are so named because when a sum of squares is divided by its degrees of freedom, the resultant value is the mean of the squared deviations (i.e., the mean square).

Mean square simply means variance.


In general, the ANOVA source table is defined as:

Source    SS                                                          df      MS                              F                                        p-value
Between   $\sum_{j=1}^{m} n_j (\bar{Y}_j - \bar{Y}_{..})^2$           m − 1   $SS_{\text{Between}}/(m - 1)$   $MS_{\text{Between}}/MS_{\text{Within}}$   p
Within    $\sum_{j=1}^{m} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$    N − m   $SS_{\text{Within}}/(N - m)$
Total     $\sum_{j=1}^{m} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2$ N − 1

The ANOVA source table is (very) similar to that used in the context of multiple regression, a widely applicable future topic.
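In R, summary(aov()) prints exactly this kind of source table; a minimal sketch with hypothetical scores y and grouping factor group:

```r
# A tiny hypothetical data set, just to show the source table layout
y     <- c(6, 7, 5, 9, 10, 8, 4, 6, 5)
group <- factor(rep(c("g1", "g2", "g3"), each = 3))

summary(aov(y ~ group))  # Between ("group") and Within ("Residuals") rows
```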


It can also be shown that the expected values of the mean squares are given as

$$E[MS_{\text{Between}}] = \sigma_{\text{Within}}^2 + \frac{\sum_{j=1}^{m} n_j \tau_j^2}{m - 1},$$

$$E[MS_{\text{Within}}] = \sigma_{\text{Within}}^2.$$

When all of the population means are equal, the second component of $E[MS_{\text{Between}}]$ is zero and the expectations of the two mean squares are the same.

When any population mean difference exists, E [MSBetween] > E [MSWithin].

Worked Example – Raw Data

Nokia                        Samsung                      Motorola
Rating   e_i1    e_i1^2      Rating   e_i2    e_i2^2      Rating   e_i3    e_i3^2
6         1.5     2.25       10        2       4          10        3       9
6         1.5     2.25       10        2       4           6       -1       1
2        -2.5     6.25        9        1       1          10        3       9
3        -1.5     2.25        4       -4      16           5       -2       4
4        -0.5     0.25        4       -4      16          10        3       9
4        -0.5     0.25       10        2       4           5       -2       4
6         1.5     2.25       10        2       4           2       -5      25
2        -2.5     6.25       10        2       4          10        3       9
5         0.5     0.25        3       -5      25           2       -5      25
7         2.5     6.25       10        2       4          10        3       9

Sum      45.00    0.00 28.50 80.00     0.00   82.00       70.00    0.00  104.00
Mean      4.50                8.00                         7.00
SD        1.78                3.02                         3.40
Variance  3.17                9.11                        11.56

where $e_{ij} = Y_{ij} - \bar{Y}_j$ is the deviation of each rating from its group mean.

Grand mean: $\bar{y}_{..} = (4.50 \times 10 + 8.00 \times 10 + 7.00 \times 10)/30 = 6.50$. The grand mean is the (weighted) mean of the sample means (here it is simply equal to the mean of the means due to the equal group sample sizes).

Sums of Squares

Between Sum of Squares = $10(4.50 - 6.50)^2 + 10(8.00 - 6.50)^2 + 10(7.00 - 6.50)^2 = 65.00 = SS_{\text{Between}}$. This is the weighted (because each score in a group has the same sample mean, of course) sum of squared deviations between the group means and the grand mean.

Within Sum of Squares = $9 \times 3.17 + 9 \times 9.11 + 9 \times 11.56 = 28.5 + 82 + 104 = 214.50 = SS_{\text{Within}}$. This is the sum of each of the within-group sums of squares.

Mean Squares

Now, to obtain the mean squares, divide the sums of squares by their appropriate degrees of freedom:

Mean Square Between = $65.00/(3 - 1) = 32.50 = MS_{\text{Between}}$

Mean Square Within = $214.50/27 = 7.94 = MS_{\text{Within}}$

Inference

Now, to obtain the F-statistic, divide the Mean Square Between by the Mean Square Within: $F = 32.50/7.94 = 4.091$.

To obtain the p-value, find the area in the right tail of the $F_{(2,\,27)}$ distribution that exceeds the F-value of 4.091 (e.g., with the “F.DIST.RT” formula in Excel): $p = 0.028$.

Now, because the p-value is less than α (.05 being typical), we reject the null hypothesis and infer that the population group means are not all equal. Thus, the same phone commercial, as attributed to different brands, had an effect on the ratings of the phone. The conclusion is that there is an effect of brand on consumer sentiment: consumers rate the same thing differently depending on the brand attribution.
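The entire calculation can be reproduced in R from the summary statistics alone; a minimal sketch with the values taken from the table above:

```r
# ANOVA from summary statistics: means, variances, and sample sizes
means <- c(4.50, 8.00, 7.00)   # Nokia, Samsung, Motorola
vars  <- c(3.17, 9.11, 11.56)
n     <- c(10, 10, 10)

N <- sum(n); m <- length(n)
grand      <- sum(n * means) / N            # 6.50
SS_between <- sum(n * (means - grand)^2)    # 65.00
SS_within  <- sum((n - 1) * vars)           # 214.56 (214.50 before rounding the variances)
MS_between <- SS_between / (m - 1)          # 32.50
MS_within  <- SS_within / (N - m)           # 7.94
F_obs      <- MS_between / MS_within        # 4.09
1 - pf(F_obs, m - 1, N - m)                 # p = .028
```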

The data are available here: nd.edu/~kkelley/Teaching/Data/Phone_Commercial_Preference.sav.

Worked Example – Summary

Summary Statistics from the Phone Evaluation

                             Nokia    Samsung    Motorola
Mean ($\bar{y}_j$)            4.50     8.00       7.00
Standard deviation ($s_j$)    1.78     3.02       3.40
Sample size ($n_j$)          10       10         10

Grand mean: $\bar{y}_{..} = (4.50 \times 10 + 8.00 \times 10 + 7.00 \times 10)/30 = 6.50$

Rather than using the full data set, only the summary statistics are actually needed, because we can determine the sums of squares from the summary data. The within sum of squares is literally the sum of each group's variance multiplied by its degrees of freedom.

Sums of Squares

Between Sum of Squares = $10(4.50 - 6.50)^2 + 10(8.00 - 6.50)^2 + 10(7.00 - 6.50)^2 = 65.00 = SS_{\text{Between}}$. This is the weighted (because each score in a group has the same sample mean, of course) sum of squared deviations between the group means and the grand mean.

Within Sum of Squares = $1.78^2(10 - 1) + 3.02^2(10 - 1) + 3.40^2(10 - 1) = 214.50$, which in terms of variances (instead of standard deviations) can be written as:

$= 3.17(10 - 1) + 9.11(10 - 1) + 11.56(10 - 1) = 214.50 = SS_{\text{Within}}$. Recall that a sum of squares divided by its degrees of freedom is a variance. Correspondingly, a variance multiplied by its degrees of freedom is a sum of squares. Thus, we are able to find the sums of squares by multiplying the variances by their degrees of freedom.

The mean squares, F-statistic ($F = 32.50/7.94 = 4.091$), p-value ($p = .028$), and conclusion then follow exactly as in the raw-data version of the worked example above.

Product Effectiveness: Weight Loss Drinks

Over a two-month period in the early spring, 99 participants from the Midwest were randomly assigned to one of three groups (33 each) to assess the effectiveness of a meal replacement weight loss drink. The study was conducted and analyzed by an independent firm.

The three groups were a (a) control group, (b) SF, and (c) TL.

All participants were encouraged to exercise and were given running shoes, a workout outfit, and a pedometer.


The summary statistics for weight change in pounds (before breakfast) are given as:

           Control    SF       TL       Total
$\bar{Y}$   -1.61     -3.06    -7.29    -3.78
s            1.83      2.12     1.79     3.00
n           26        29       22       77

As can be seen, 22 participants did not complete the study. Implications?


The following table is the ANOVA source table:

Source     SS       df    MS       F        p
Between    408.28    2    204.14   54.567   < .001
Within     276.84   74      3.74
Total      685.12   76


The critical F-value at the .05 level for 2 and 74 degrees of freedom is $F_{(2,\,74;\,.95)} = 3.12$.

So, given the information, the decision is to ____.

The one-sentence interpretation of the results is:
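Both the critical value and the p-value are easy to check in R:

```r
# Critical F and p-value for the weight-loss ANOVA (df = 2 and 74)
qf(.95, df1 = 2, df2 = 74)         # 3.12, the .05-level critical value
1 - pf(54.567, df1 = 2, df2 = 74)  # p-value for the observed F
```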

Performing an ANOVA in SPSS

[Screenshots in the original slides show the SPSS One-Way ANOVA dialogs and the resulting ANOVA output.]

Suggestions when Performing ANOVA in SPSS

Start with Analyze → Descriptives → Explore.

Analyze → Compare Means → One-Way ANOVA for the ANOVA procedure.

In the One-Way ANOVA specification, request a Means Plot (via Options). Consider using Analyze → General Linear Model → Univariate for a more general approach.

Omnibus Versus Targeted Tests

Procedures such as the t-test are targeted, and thus test specific hypotheses. For example, the independent groups t-test evaluates the hypothesis that µ1 = µ2. Thus, after an ANOVA is performed, oftentimes we want to know where the mean differences exist.

However, part of the rationale of ANOVA was to avoid performing many significance tests.


An Analogy

Consider an airline scheduling system at the gate of departure. This system requires all five processes to simultaneously function:

1. live feed from the corporate server;
2. live feed to the corporate server;
3. live feed to the departing airport;
4. live feed to the arrival airport;
5. the computer terminal functioning properly (e.g., no software glitches, no power loss).

Suppose that the “uptime” or “reliability” of each of these independent systems is .95, meaning at any given time there is a 95% chance each process is working.


What is the probability that the system can be used when needed (i.e., that all five systems are working properly)?

Recalling the rule for independent events, the probability that the system can be used is $.95 \times .95 \times .95 \times .95 \times .95 = .95^5 = .7738$. Thus, even though each piece of the system has a 95% chance of working properly, there is only a 77.38% chance that the system itself can be used.

The implication here is that the probability of an error occurring somewhere in the set of processes (1 − .7738 = .2262) is much higher than for any given process (1 − .95 = .05). Note that the rate of errors in the system is (.2262/.05 =) 4.524 times higher than in a given process!

This is the multiplicity issue: an error somewhere among a set of “tests” is more probable than for any given test.

Why Multiplicity Matters – Multiple Testing

The probability of making a Type I error out of C independent comparisons is given as

$$p(\text{At least one Type I error}) = 1 - p(\text{No Type I errors}) = 1 - (1 - \alpha)^C,$$

where C is the number of independent comparisons to be performed (based on rules of probability). If C = 5, then p(At least one Type I error) = .2262!

Note that this is also the probability that one or more confidence intervals, when 5 are computed each at the 95% level, do not bracket the population quantity. The scenario here is analogous to the airline scheduling system.
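A one-line computation in R shows how quickly this probability grows with the number of independent tests:

```r
# P(at least one Type I error) across C independent tests at alpha = .05
C <- 1:10
data.frame(C, p_any = 1 - (1 - .05)^C)  # C = 5 gives .2262
```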

Types of Error Rates

There are three types of error rates that can be considered:

1. Per comparison error rate ($\alpha_{PC}$). Analogous to the per-process failure rate (5%) in the airline system example.

2. Familywise error rate ($\alpha_{FW}$). Analogous to the system failure rate (22.62%) of the airline system example above.

3. Experimentwise error rate ($\alpha_{EW}$). Analogous to the multiple systems required to fly the airplane (e.g., not only the scheduling system, but also that the plane functions properly, the flight team arrives on time, etc.), which can be much higher than $\alpha_{FW}$ (if there are multiple families).

Per Comparison Error Rate

αPC: the probability that a particular test (i.e., a comparison) will reject a true null hypothesis.

This is the Type I error rate with which we have always worked (as we have only worked with a single test at a time).


Familywise Error Rate (αFW)

αFW: the probability that one or more tests will reject a true null hypothesis somewhere in the “family.”

Defining exactly what a family is can be difficult and open to interpretation. As an aside, there are many statistical issues “open to interpretation.”

Reasonable people can disagree on how to handle various issues.

Openness about the methods, their assumptions, and their limitations is key.


Experimentwise Error Rate (αEW)

αEW: the probability that one or more tests will reject a true null hypothesis somewhere in the “experiment” (or study more generally).

Modifying the significance criterion so that $\alpha_{FW}$ is the probability of a Type I error out of the set of C significance tests is the same as forming C simultaneous confidence intervals.

We do not focus on the experimentwise error rate, as we will assume a single family for our set of tests.

[A sequence of slides shows panels from the xkcd comic “Significant” (http://xkcd.com/882/): a hypothesis, a result, tests, tests, tests, and more tests, an apparent Type I error, and, after many tests, a “finding.”]

Error Rate

The probability of a Type I error for 20 independent tests, which the jelly bean comparisons were, is

$$1 - (1 - .05)^{20} = 1 - .3584859 = .6415141.$$

Thus, there is a 64% chance of a Type I error in such a case!

A Summary. . .

Multiplicity Matters!

Linear Comparisons: Specifying Contrasts of Interest

Suppose a question of interest is the contrast of group 1 and group 3 in a three group design.

That is, we are interested in the following effect: $\bar{Y}_1 - \bar{Y}_3$.

The above is equivalent to: $(1) \times \bar{Y}_1 + (0) \times \bar{Y}_2 + (-1) \times \bar{Y}_3$.


Suppose a question of interest is comparing the mean of group 1 and group 2 (i.e., the mean of those two group means) with group 3 in a three-group design.

That is, we are interested in the following effect: $\frac{\bar{Y}_1 + \bar{Y}_2}{2} - \bar{Y}_3$.

The above is equivalent to: $\frac{\bar{Y}_1 + \bar{Y}_2}{2} + (-1) \times \bar{Y}_3$.

The above is equivalent to: $\left(\tfrac{1}{2}\right) \times \bar{Y}_1 + \left(\tfrac{1}{2}\right) \times \bar{Y}_2 + (-1) \times \bar{Y}_3$.

We could also write the above as: $(.5) \times \bar{Y}_1 + (.5) \times \bar{Y}_2 + (-1) \times \bar{Y}_3$.


Consider a situation in which group 1 receives one type of allergy medication, group 2 receives another type of allergy medication, and group 3 receives a placebo (i.e., no medication). The question here is “does taking medication have an effect over not taking medication on self reported allergy symptoms.”

Forming Linear Comparisons

In the population, the value of any contrast of interest is given as

$$\Psi = c_1\mu_1 + c_2\mu_2 + c_3\mu_3 + \cdots + c_m\mu_m = \sum_{j=1}^{m} c_j \mu_j,$$

where $c_j$ is the comparison weight for the jth group and $\Psi$ is the population value of a particular linear combination of means. An estimated linear comparison is of the form

$$\hat{\Psi} = c_1\bar{Y}_1 + c_2\bar{Y}_2 + c_3\bar{Y}_3 + \cdots + c_m\bar{Y}_m = \sum_{j=1}^{m} c_j \bar{Y}_j,$$

where $c_j$ is the comparison weight for the jth group and $\hat{\Psi}$ is the particular linear combination of means.

The first example from above was comparing the mean of group 1 versus group 3. In c-weight form, the c-weights are [1, 0, −1]: $(1) \times \bar{Y}_1 + (0) \times \bar{Y}_2 + (-1) \times \bar{Y}_3$.

Comparing one mean to another (i.e., using a 1 and a −1 c-weight with the rest 0s) is called a pairwise comparison (as the comparison only involves a pair).


The second example was comparing the mean of groups 1 and 2 versus group 3. In c-weight form, the c-weights are [.5, .5, −1]: $(.5) \times \bar{Y}_1 + (.5) \times \bar{Y}_2 + (-1) \times \bar{Y}_3$.

Comparing weightings of two or more groups to one or more other groups is called a complex comparison. That is, if the c-weights are something other than a 1 and a −1 with the rest 0s, it is a complex comparison.
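Both kinds of contrasts are easy to estimate from the worked-example group means in R:

```r
# Contrast estimates from the phone-commercial means
means <- c(Nokia = 4.50, Samsung = 8.00, Motorola = 7.00)

c_pairwise <- c(1, 0, -1)    # group 1 vs. group 3
c_complex  <- c(.5, .5, -1)  # mean of groups 1 and 2 vs. group 3

sum(c_pairwise * means)  # psi-hat = 4.50 - 7.00 = -2.50
sum(c_complex * means)   # psi-hat = 6.25 - 7.00 = -0.75
sum(c_complex)           # 0, as required of c-weights
```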


It is required that $\sum_{j=1}^{m} c_j = 0$.

For example, setting c1 to 1 and c2 to −1 yields the pair-wise comparison of Group 1 and Group 2:

$$\hat{\Psi} = (1 \times \bar{Y}_1) + (-1 \times \bar{Y}_2) = \bar{Y}_1 - \bar{Y}_2.$$

Rules for c-Weights

- The c-weights for a comparison that are positive should sum to 1.
- The c-weights for a comparison that are negative should sum to −1.
- By implication of the two rules above, all of the c-weights for a comparison should sum to 0 (i.e., $\sum c_j = 0$).

Otherwise, the corresponding confidence interval is not as intuitive. However, any rescaling of such c-weights produces the same t-test; the confidence interval will simply have a different interpretation than usual, as the effect will be for a specific linear combination (e.g., $\hat{\Psi} = 2\bar{Y}_1 - \bar{Y}_2 - \bar{Y}_3$).


Thus, for the mean of Groups 1 and 2 compared to the mean of Group 3, the contrast is

$$\hat{\Psi} = (.5 \times \bar{Y}_1) + (.5 \times \bar{Y}_2) + (-1 \times \bar{Y}_3) = \frac{\bar{Y}_1 + \bar{Y}_2}{2} - \bar{Y}_3.$$


Consider a situation in which one wants to weight the groups based on the relative size of an outside factor, such as marketshare, profit-per-segment, number of users, et cetera.

Suppose that interest is in comparing teens versus a weighted average of 20-year-olds and 30-year-olds in an online community, where among the 20- and 30-year-olds the proportion of users is 70 percent and 30 percent, respectively.

$$\hat{\Psi} = 1 \times \bar{Y}_{\text{Teens}} + (-.70 \times \bar{Y}_{\text{20s}}) + (-.30 \times \bar{Y}_{\text{30s}}).$$


There are technically an infinite number of comparisons that can be formed, but only a few will likely be of interest.

The comparisons are formed so that targeted research questions about population mean differences can be addressed.

But recall that, in general, the c-weights that are positive should sum to 1 and the c-weights that are negative should sum to −1, so as to have a more interpretable confidence interval.

A More Powerful t-Test

The t-test corresponding to a particular contrast is given as

$$t = \frac{\sum c_j \bar{Y}_j}{\sqrt{MS_{\text{Within}} \sum_{j=1}^{m} \frac{c_j^2}{n_j}}} = \frac{\hat{\Psi}}{SE(\hat{\Psi})},$$

where the MSWithin is from the ANOVA and is the best estimate of the population variance. Importantly, this t-test has N − m degrees of freedom (i.e., the MSWithin degrees of freedom). Note that the denominator is simply the standard error of the contrast, which is used for the corresponding confidence interval.
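A sketch of this contrast t-test in R, using the worked-example summary statistics (MS_Within = 7.94 with 27 df):

```r
# t-test and CI for the complex contrast (.5, .5, -1)
means <- c(4.50, 8.00, 7.00); n <- c(10, 10, 10)
c_w <- c(.5, .5, -1)
MS_within <- 7.94; df <- sum(n) - length(n)  # N - m = 27

psi_hat <- sum(c_w * means)                  # -0.75
se_psi  <- sqrt(MS_within * sum(c_w^2 / n))  # SE of the contrast
t_obs   <- psi_hat / se_psi
2 * pt(-abs(t_obs), df)                      # two-sided p-value
psi_hat + c(-1, 1) * qt(.975, df) * se_psi   # 95% CI
```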


Recall that when the homogeneity of variance assumption holds, there are m different estimates of $\sigma^2$.

For homogeneous variances, the best estimate of the population variance for any group is the mean square error (MSWithin), which uses information from all groups.

Thus, the independent groups t-test can be given as

$$t = \frac{\bar{Y}_j - \bar{Y}_k}{\sqrt{MS_{\text{Within}} \left( \frac{1}{n_j} + \frac{1}{n_k} \right)}},$$

with degrees of freedom based on the mean square within (N − m), which provides more power.


The above two-group t-test still addresses the question “does the population mean of Group 1 differ from the population mean of Group 2?”

However, more information is used, because the error term is based on N − m degrees of freedom instead of $n_1 + n_2 - 2$.


The MSWithin – Even for a Single Group

Even if we are interested in testing or forming a confidence interval for a single group, the mean square within can (and usually should) be used, again due to having a better estimate of $\sigma^2$:

$$t = \frac{\bar{Y}_j - \mu_0}{\sqrt{MS_{\text{Within}} \left( \frac{1}{n_j} \right)}}.$$

The two-sided confidence interval is thus:

$$\bar{Y}_j \pm \sqrt{MS_{\text{Within}} \times \frac{1}{n_j}} \; t_{(1-\alpha/2,\, N-m)}.$$

Because $MS_{\text{Within}}$ is used as the estimate of $\sigma^2$, the degrees of freedom for the above test and confidence interval are N − m.

Thus, using MSWithin is one way to have more power to test the null hypothesis concerning a single group or two groups, even when more than two groups are available.

Additionally, precision is increased because the confidence interval will be narrower (due to the smaller standard error and smaller critical value).
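A small R sketch for a single group mean, again with the worked-example numbers; the gain over the usual one-sample interval is the N − m degrees of freedom:

```r
# 95% CI for one group's mean using MS_Within and N - m df
MS_within <- 7.94; N <- 30; m <- 3
ybar_j <- 4.50; n_j <- 10   # the Nokia group

se <- sqrt(MS_within / n_j)
ybar_j + c(-1, 1) * qt(.975, N - m) * se  # narrower than with n_j - 1 df
```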

The Bonferroni Procedure

The Bonferroni Procedure is also called Dunn’s procedure.

Good for a few pre-planned targeted tests, but doing too many leads to conservative critical values. Conservative critical values are those that are bigger (i.e., harder to achieve significance) than would be the case ideally.

Liberal critical values are those that are smaller (i.e., easier to achieve significance) than would be the case ideally.


It can be shown that αPC ≤ αFW ≤ CαPC, where C is the number of comparisons.

The per comparison error rate can be manipulated by dividing the desired familywise (or experimentwise) Type I error rate by C, the number of comparisons: $\alpha_{PC} = \alpha_{FW} / C$.

The standard t-test formula is used, but the obtained t-value is compared to a critical value based on α/C: $t_{(1 - (\alpha/C)/2,\, df)}$. The observed p-values (e.g., from SPSS) can be corrected for multiplicity by multiplying each of the C p-values by C.

If the corrected p-value is less than $\alpha_{FW}$, then the test is statistically significant in the context of a correct familywise Type I error rate.
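In R, the Bonferroni correction can be applied either way; p.adjust() multiplies the p-values by C (capping at 1). The p-values below are hypothetical:

```r
alpha <- .05; C <- 3
alpha / C                     # per-comparison alpha (.0167)

p_raw <- c(.012, .030, .004)  # hypothetical p-values from C tests
p.adjust(p_raw, method = "bonferroni")  # compare these to alpha_FW = .05
```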


The critical value is what changes in the context of a Bonferroni test, not the way in which the t-test and/or confidence intervals are calculated. Incorporating $\sqrt{MS_{\text{Within}}}$ into the denominator of the t-test is not really a change, as $MS_{\text{Within}}$ is just a pooled variance based on m (rather than 2) groups.

Recall this is just an extension of $s_{\text{Pooled}}^2$ when information on more than two groups is available.

Tukey's Test

Tukey’s test is used when all (or several) pairwise comparisons are to be tested.

For comparing all possible pairwise comparisons, Tukey's test provides the most powerful multiple comparison procedure. There is a Tukey-b in SPSS; I recommend “Tukey.” For the Tukey procedure, the p-values and confidence intervals given by SPSS are already “corrected.”
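R's TukeyHSD() gives the same kind of corrected output; a minimal sketch with hypothetical data:

```r
# Tukey's test in R: adjusted p-values and simultaneous CIs
y     <- c(6, 7, 5, 9, 10, 8, 4, 6, 5)  # hypothetical scores
group <- factor(rep(c("g1", "g2", "g3"), each = 3))

fit <- aov(y ~ group)
TukeyHSD(fit, conf.level = .95)  # all pairwise comparisons, corrected
```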


Pairwise comparisons compare the means of two groups (i.e., a pair; e.g., $\mu_1 - \mu_3$) without allowing any complex comparisons (e.g., $(\bar{Y}_1 + \bar{Y}_2)/2 - \bar{Y}_3$).

The observed test statistic is compared to the tabled values of the Studentized range distribution. This is the distribution that the Tukey procedure uses to obtain confidence intervals and p-values.

The Scheffé Test

For any number of post hoc tests with any linear combination of means, the Scheffé test is generally optimal.

Although the Scheffé test is conservative for a small number of comparisons, any number of comparisons can be conducted while controlling the Type I error rate.


We compute the F -value (just a t-value squared) in accord with some linear combination of means, and a critical value is determined for the specific context.

The Scheffé critical F-value (take the square root for the critical t-value) is given as

$$(m - 1) F_{(m-1,\, N-m;\, \alpha)},$$

which is m − 1 times larger than the critical ANOVA value.
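The Scheffé critical values are easy to compute in R (shown here assuming m = 3 and N = 30, as in the worked example):

```r
# Scheffe critical F and critical |t| at alpha = .05
m <- 3; N <- 30; alpha <- .05
F_crit <- (m - 1) * qf(1 - alpha, m - 1, N - m)  # (m - 1) * F(m-1, N-m)
F_crit
sqrt(F_crit)  # the corresponding critical t-value
```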


The Scheffé procedure should not be used for all pairwise comparisons (it is not as powerful as Tukey's test for pairwise comparisons).

If many complex and other (e.g., pairwise) comparisons are to be done, the Scheffé procedure is usually optimal.

Flowchart for Multiple Comparisons

Begin:

1. Testing all pairwise and no complex comparisons (either planned or post hoc), or choosing to test only some pairwise comparisons post hoc?
   - Yes → use Tukey's method.
   - No → go to 2.
2. Are all comparisons planned?
   - No → use Scheffé's method.
   - Yes → go to 3.
3. Is the Bonferroni critical value less than the Scheffé critical value?
   - Yes → use Bonferroni's method.
   - No → use Scheffé's method (or, prior to collecting the data, reduce the number of contrasts to be tested).

SPSS does not make it easy to get the appropriate p-values and confidence intervals for complex comparisons.

The Bonferroni and Scheffé procedures in SPSS are for pairwise comparisons, which are not of interest because Tukey is almost always preferred for pairwise comparisons.

For the specified contrasts, SPSS reports only the standard output (i.e., not controlling the Type I error rate).

Thus, users need to be careful that they are appropriately controlling the Type I error rate.

Assumptions of the ANOVA

The assumptions of the ANOVA are the same as for the two-group t-test.

1. The population from which the scores were sampled is normally distributed.

2. The variances for each of the m groups are the same.

3. The observations are independent.

Recall that multiple regression assumes homoscedasticity, which is just an extension of homogeneity of variance.


Also like the independent groups t-test, the first two assumptions become less important as sample size increases. This is especially so when the per-group sample sizes are equal or nearly so.

Thus, the larger the sample size, the more robust the model to these two assumption violations.


Again, like the t-test, the ANOVA is very sensitive (i.e., it is not robust) to violations of the assumption of independence. Observations that are not independent can make the empirical α rate much different than the nominal α rate.


Analysis of variance procedures test an omnibus (i.e., an overall) hypothesis. More specifically, ANOVA models test the hypothesis that µ1 = µ2 = ... = µm.

In many situations, primary interest concerns targeted null hypotheses (not just the omnibus hypothesis).

Thus, additional analyses may be necessary.

A Summary from Designing and Analyzing Data

This discussion “focuses on special methods that are needed when the goal is to control $\alpha_{FW}$ instead of to control $\alpha_{PC}$. Once a decision has been made to control $\alpha_{FW}$, further consideration is required to choose an appropriate method of achieving this control for the specific circumstance. One consideration is whether all comparisons of interest have been planned in advance of collecting the data. If so, the Bonferroni adjustment is usually most appropriate, unless the number of planned comparisons is quite large. Statisticians have devoted a great deal of attention to methods of controlling $\alpha_{FW}$ for conducting all pairwise comparisons, because researchers often want to know which groups differ from other groups. We generally recommend Tukey's method for conducting all pairwise comparisons. Neither Bonferroni nor Tukey is appropriate when interest includes complex comparisons chosen after having collected the data, in which case Scheffé's method is generally most appropriate” (notation changed to reflect current use).

What You Learned from this Lesson

You learned:

- How to compare more than two independent means to assess whether any differences exist via analysis of variance (ANOVA).
- How the total sum of squares for the data can be decomposed into a part that is due to mean differences between groups and a part that is due to within-group differences.
- Why doing multiple t-tests is not the same thing as ANOVA.
- Why doing multiple t-tests leads to a multiplicity issue, in that as the number of tests increases, so too does the probability of one or more errors.
- How to correct for the multiplicity issue so that a set of contrasts/comparisons has a Type I error rate for the collection of tests at the desired (e.g., .05) level.
- How to use SPSS to implement an ANOVA and follow-up tests.

Notations

$H_0\!: \sigma_1^2 = \sigma_2^2$ - The null hypothesis of equal variances

$F_{(df_1,\, df_2)}$ - The F-statistic with $df_1$ and $df_2$ as the degrees of freedom

$s_1^2$ and $s_2^2$ - The variances for group 1 and group 2, respectively

$s_{\text{Pooled}}^2$ - Pooled within-group variance

m - Number of groups

$n_j$ - Sample size in the jth group (j = 1, ..., m)

N - Total sample size ($N = \sum_{j=1}^{m} n_j$)

Notations Continued

SS - Sum of squares (this can be the Between, Treatment, Among, Within, Error, or Total sum of squares)

MS - Mean square (i.e., a variance)

$MS_{\text{Within}}$ - The mean square within groups (the pooled within-group variance)

$Y_{ij}$ - The score for the ith individual in the jth group

$\tau_j$ - The treatment effect of the jth group

$\varepsilon_{ij}$ - The uniqueness for the ith individual in the jth group

$E[MS_{\text{Within}}]$ - The expected value of the mean square within groups

Notations Continued

C - The number of independent comparisons to be performed

$\alpha_{PC}$ - Per comparison error rate

$\alpha_{FW}$ - Familywise error rate

$\alpha_{EW}$ - Experimentwise error rate

$\hat{\Psi}$ - A particular linear combination of means

$c_j$ - Comparison weight for the jth group
