Quick viewing(Text Mode)

Analysis of Variance

Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands

Analysis of

Janette Walde

[email protected] Department of University of Innsbruck

Janette Walde of Variance Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands Outline I

1 Introduction Problems What is Some Terminology 2 ANOVA Object of Investigation Exploratory Analysis Notation Assumptions 3 One-Way ANOVA of Application Hypothesis Testing Example

Janette Walde Analysis of Variance Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands Outline II Post-Hoc Analysis Power Analysis

4 Two-Way ANOVA Terminology Assumptions Results Exploratory Analysis Example

5 Further Extensions

6 Useful R-commands

Janette Walde Analysis of Variance Introduction ANOVA Problems One-Way ANOVA What is Analysis of Variance Two-Way ANOVA Some Terminology Further Extensions Useful R-commands Problems/Questions

Do fertilizer have different effects on different kind of wheat? React females differently on anti-cancer drugs as males? Does water evaporation of soil depend on the kind of vegetation growing, controlling for climate conditions? A new treatment meant to help those with chronic arthritis pain was developed and tested for its long-tern effectiveness. Participants in the rated their level of pain on a 0 (no pain) to 9 (extreme pain) scale at three-month intervals. Was the treatment effective? ( Does the exposure of plants to various amounts of CO2 affect characteristics of the plant?

Janette Walde Analysis of Variance Introduction ANOVA Problems One-Way ANOVA What is Analysis of Variance Two-Way ANOVA Some Terminology Further Extensions Useful R-commands What is ANOVA?

ANalysis Of VAriance. Partitions the observed variance based on explanatory (independent) variables. Compares partitions to test significance on explanatory variables.

Janette Walde Analysis of Variance Introduction ANOVA Problems One-Way ANOVA What is Analysis of Variance Two-Way ANOVA Some Terminology Further Extensions Useful R-commands Some Terminology

Between subject design - each subject participates in one and only one . Within subjects design - the same group of subjects serves in more than one treatment - Subject is now a factor. Mixed design - a study which has both between and within subject factors. Repeated measures - general term for any study in which multiple measurements are measured on the same subject. Can be either multiple treatments or several measurements over time.

Janette Walde Analysis of Variance Introduction ANOVA Object of Investigation One-Way ANOVA Exploratory Analysis Two-Way ANOVA Notation Further Extensions Assumptions Useful R-commands Object of investigation

Use and variance like quantities to study the equality or non-equality of population . So, although it is analysis of variance we are actually analyzing means, not variances. There are other methods which analyze the variances between groups.

Janette Walde Analysis of Variance Introduction ANOVA Object of Investigation One-Way ANOVA Exploratory Analysis Two-Way ANOVA Notation Further Extensions Assumptions Useful R-commands Typical exploratory analysis include

Tabulation of the number of subjects in experimental group. Side-by-side box plots. Statistics about each group.

Janette Walde Analysis of Variance Introduction ANOVA Object of Investigation One-Way ANOVA Exploratory Analysis Two-Way ANOVA Notation Further Extensions Assumptions Useful R-commands Notation

If we have K groups denote the means of the groups as µ1,µ2, ..., µK . Subject i in group j has observation yij : yij = µj + ij 2 where ij are independent distributed N(0, σ ). Can combine this and say that subjects from group j have distribution N(µ, σ2). With the sample for any treatment group is representative of the population mean for that group.

Janette Walde Analysis of Variance Introduction ANOVA Object of Investigation One-Way ANOVA Exploratory Analysis Two-Way ANOVA Notation Further Extensions Assumptions Useful R-commands Assumptions

1 The errors ij are normally distributed. 2 Across the conditions the errors have equal spread. Often referred to as equal vaiances

Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance. If unequal variances need to make a correction. This is usually α/2. 3 The errors are independent from each other.

Janette Walde Analysis of Variance Introduction ANOVA Object of Investigation One-Way ANOVA Exploratory Analysis Two-Way ANOVA Notation Further Extensions Assumptions Useful R-commands Checking the assumptions

Use the residuals which are the estimates of ij . 1 Look at normal probability plot. 2 Look at residual versus fitted plot. 3 Hard to check often assumed from study design. For mild violations of the assumptions there are options for correction. When the assumptions are NOT met the p-values are simply wrong.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Basics

One-way ANOVA is used when Only testing the effect of one explanatory variable. Each subject has only one treatment or condition. Thus, a between-subject design. Used to test for differences among two or more independent groups. Gives the same results as two sample t-tests if explanatory variable has to levels.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Hypothesis

H0 : µ1 = µ2 = ... = µK

H1: The µ’s are not all equal. The null hypothesis is called the overall null and is the hypothesis tested by ANOVA. If the overall null is rejected you must do more specific hypothesis testing to determine which means are different, often referred to as contrasts.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Terminology

The sample variance is the sum of the squared deviations from the mean divided by the number of observations minus 1 (x − x¯)2 s2 = i n − 1 P A mean square (MS) is a variance like quantity calculated as the sum of the squared deviations (SS) divided by the degrees of freedom (df )

SS MS = df

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Within versus Between

In one-way ANOVA we work with two mean square quantities

* MSwithin ... the mean square within-groups * MSbetween ... the mean square between-groups For each individual group we have n P i (x −x¯)2 SSi = j=1 ij i dfi ni −1 So the estimate of MSwithin is K SS MS = SSwithin = Pi=1 i within dfwithin N−K And the estimate of MSbetween is K − 2 SSbetween Pi=1 ni (x¯i x¯) MSbetween = = dfbetween K −1

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Mean Squares

What do these values mean? 2 MSwithin is considered a true estimate of σ that is unaffected by whether the null or is true. 2 MSbetween is considered a good estimate of σ only when the null hypothesis is true. If the alternative is true, values of MSbetween tend to be inflated. Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Testing the Hypothesis

The F-test looks at the variation among the group means relative to the variation within the sample − F = MSbetween = SSbetween/dfbetween = SSbetween/(K 1) MSwithin SSwithin/dfwithin SSwithin/(N−K ) The F- tends to be larger if the alternative hypothesis is true than if the null hypothesis is true. The test statistic F has an F(K − 1, N − K ) distribution.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands What does the F ratio tell us?

F = MSbetween/MSwithin The denominator is always an estimate of σ2 (under both the null and alternative hypotheses). The numerator is either another estimate of σ2 (under the null) or is inflated (under the alternative). If the null is true, values of F are close to 1. If the alternative is true, values of F are larger. Large values of F depend on the degrees of freedom.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands The ANOVA table

When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df , F-statistic, and p-value: SS df MS F Sig Group (Treatment, SS df MS MSbet. p-value bet. bet. bet. MSwithin between) Residual (Error, SSwithin dfwithin MSwithin within) Total SSbet.+ dfbet.+ SSwithin dfwithin

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Example

The are gathered from a plant physiology experiment which investigated the effect of various sugar on the growth of peas. Growth or length is measured in ’ocular units’. Five groups are analyzed, a control group and four groups varying in the kind of sugar and its amount. In each groups there are 10 measurements.

sub- control 2% 2% 1% glucose + 1% jects group glucose fructose 2% saccharose fructose 1 71 57 58 58 62 2 68 58 61 59 66 3 70 60 56 58 65 ......

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Example

We want to know whether the means of the variables ’length’ differ significantly across the groups, i.e. does the supply with sugar influence the growth of the peas? Use 5 Control group, group 1, group 2, group 3, and group 4.

H0 : Growth is independent of the sugar support. H0 : µcontrolgroup = µgroup1 = µgroup2 = µgroup3 = µgroup4

H1 : Growth varies across groups. H1 : At least one of the means is different.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Box plots 60 65 70

"1% fructose" "2% fructose" "control group"

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Summary

The largest variance is less than twice the smallest variance (2.2 < 2 · 1.4 = 2.8). Use α = 0.05.

Groups ni Mean Variance Control group 10 70.1 2.2 Group 1 10 64.1 1.8 Group 2 10 58.0 1.4 Group 3 10 58.2 1.9 Group 4 10 59.3 1.6

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Degrees of Freedom

How many groups do we have? There are K = 5 groups What is the sample size? There are N = 50 peas. Using these values:

What is dfwithin? K − 1 = 5 − 1 = 4 What is dfbetween? N − K = 50 − 5 = 45

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Degrees of Freedom

How many groups do we have? There are K = 5 groups What is the sample size? There are N = 50 peas. Using these values:

What is dfwithin? K − 1 = 5 − 1 = 4 What is dfbetween? N − K = 50 − 5 = 45

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Degrees of Freedom

How many groups do we have? There are K = 5 groups What is the sample size? There are N = 50 peas. Using these values:

What is dfwithin? K − 1 = 5 − 1 = 4 What is dfbetween? N − K = 50 − 5 = 45

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Degrees of Freedom

How many groups do we have? There are K = 5 groups What is the sample size? There are N = 50 peas. Using these values:

What is dfwithin? K − 1 = 5 − 1 = 4 What is dfbetween? N − K = 50 − 5 = 45

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Sample Output

SS df MS F Sig Group (Treatment, 1077.3 4 269.330 82.168 0.000 between) Residual (Error, 147.5 45 3.278 within) Total 1224.8 9

Our estimate of σ2 is approximately 3.3. The numerator MS = 269.330 and appears to be highly inflated.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Results

F-statistic = 22.1. p-value: < 0.05. Conclusion - the growth differs for at least one of the groups. To make stronger statements need to do further testing.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Checking the assumptions

Normal Q−Q Residuals vs Fitted

4 4

34 34 Residuals Standardized residuals Standardized −2 0 2 4 −1 0 1 2

8 8

−2 −1 0 1 2 58 60 62 64 66 68 70

Theoretical Quantiles Fitted values Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Further Analysis

If H0 is rejected, we conclude that not all the µ’s are equal. We would like to make statements about where there are differences. Can use planned or unplanned comparisons (or contrasts).

* Planned comparisons are interesting comparisons decided on before analysis. * Unplanned comparisons occur after seeing the results. Be careful not to go fishing for results!

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Contrasts

A simple hypothesis compares two population means:

* H0 : µ1 = µ5 A complex contrast hypothesis has multiple population means on either side:

* H0 : (µ1 + µ2)/2 = µ3 * H0 : (µ1 + µ2)/2 = (µ3 + µ4 + µ5)/3

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Post-Hoc Analysis

Coefficients: Estimate Std. Error t value Pr(> |t|) (Intercept) 64.1 0.5725 111.961 <2e − 16 "1% glu, 2% sac" −6.1 0.8097 −7.534 1.66e − 09 "2% fructose" −5.9 0.8097 −7.287 3.83e − 09 "2% glucose" −4.8 0.8097 −5.928 3.99e − 07 control group" 6.0 0.8097 7.410 2.52e − 09 Residual : 1.81 on 45 degrees of freedom Multiple R-squared: 0.8796, Adjusted R-squared: 0.8689 F-statistic: 82.17 on 4 and 45 DF, p-value: < 2.2e − 16

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Post-Hoc Analysis, cont.

What if we notice a possible interesting difference when looking at the results? Can do comparisons but need to adjust the α-level to control for Type-1 error. Bonferroni correction for the number of comparisons done: α∗ = α (Bonferroni-Holm number of comparisons correction).

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Other Options

One common method is to use Tukey’s simultaneous confidence intervals to calculate any and all pairs of group population means. This procedure takes multiple comparisons into consideration to preserve the α-level. Dunnett’s tests. Scheffe procedure.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Bonferroni-Holm correction for previous example

Pairwise comparisons using t tests with pooled SD

data: length and group_name "1% fru" "1% glu, "2% fru" "2% glu" 2% sac" "1% glu, 2% sac" 1.2e − 08 − − − "2% fructose" 1.9e − 08 0.81 − − "2% glucose" 1.6e − 06 0.35 0.36 − control group" 1.5e − 08 < 2e − 16 < 2e − 16 2.4e − 16 P value adjustment method: holm.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Comparison to

The conclusions about the overall null hypothesis will be the same. In regression can make statements comparing groups to baseline. To make more conclusive statements will need to do more analysis. ANOVA and either planned or post-hoc comparisons will do the same and is often easier.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands One-way ANOVA Power

Two different TOEFL prep. courses charge $1200 for a two month course. An (unethical) experiment would be to randomize students into one of the two courses or take no course. What information is needed to calculate power for this one-way ANOVA? * Sample size * Within group variance (σ2) * Estimated or minimally interesting outcome means for each group.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Estimate of σ2

Based on previous years, we know that 95% of the student scores on TOEFL fall between 900 and 1500: σ2 = (1500 − 900)/4 = 150 σ2 = 1502

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Minimally interesting outcome

What is the minimally average benefit, in points gained, that would justify the program? The minimally interesting outcome is based on previous knowledge. For this example we’ll try several different values.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Computing the Power

Different applets will define things slightly different (http : //www.epibiostat.ucsf .edu/biostat/sampsize.html). For the applet I used (’nQuery’), they require ’sd[treatment]’. From their definition this is calculated as:

K (µ − µ)2 sd[treatment] = i=1 i s K P

µi ... mean of group i K ... number of groups

Ready to go to power applet.

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Computing the Power, Cont. I

Let σ = 150, n = 50, effect = 50 points Power = 38% Let σ = 150, n = 100, effect = 50 points Power = 68% Let σ = 150, n = 50, effect = 100 points Power = 94% Let σ = 150, n = 50, effect = 25 points Power = 12% Let σ = 100, n = 50, effect = 50 points Power = 73%

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Computing the Power, Cont. II

Let σ = 100, n = 100, effect = 50 points Power = 96% Let σ = 100, n = 50, effect = 100 points Power = 99% Let σ = 100, n = 50, effect = 25 points Power = 23%

Janette Walde Analysis of Variance Introduction Area of Application ANOVA Hypothesis Testing One-Way ANOVA Example Two-Way ANOVA Post-Hoc Analysis Further Extensions Power Analysis Useful R-commands Moving past One-way ANOVA

What if we have two categorical explanatory variables? What if we have categorical and quantitative explanatory variables? What if subjects have more than one treatment? What if there is more than one response variable? And many other combinations...

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Two-way ANOVA

Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables.

Suppose we now have two categorical explanatory variables:

Is there a significant X1 effect?

Is there a significant X2 effect? Are there significant effects?

If X1 has k levels and X2 has m levels, then the analysis is often referred to as a ’k by m ANOVA’ or ’k × m ANOVA’.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Terminology

If the interaction is significant, the model is called an interaction model. If the interaction is not significant, the model is called an additive model. Explanatory variables are often referred to as factors.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Assumptions

The assumptions are the same as in One-way ANOVA:

1 The errors ij are normally distributed. 2 Across the conditions, the errors have equal spread. Often referred to as equal variances. 3 The errors are independent from each other.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Results

Results are again displayed in an ANOVA table Will have one line for each term in the model. For a model with two factors, we will have one line for each factor and one line for the interaction. We will also have a line for the error and the total. See next page.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands The ANOVA table

SS df MS F Sig. Factor 1 k − 1 Factor 2 m − 1 Interaction (k − 1)(m − 1) Error N − k · m ? Total N − 1

The MS(error), denoted by ? in the above table, is the true estimate of σ2. The MS in each row is that row’s SS/df . The F-statistic is the MS/MS(error).

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Exploratory Analysis

Table of means Interaction or profile plots * An interaction plot is a way to look at outcome means for two factors simultaneously. * A plot with parallel lines suggests an additive model. * A plot with non-parallel lines suggests an interaction model. * Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Example

Do anti-cancer drugs have different effects on males and females? Three types of different drugs are given patients having cancer. The diameter of the tumor is measured.

X1: Kind of drug - 3 levels

X2: Gender - 2 levels Response: Tumor diameter We will fit a 3 by 2 ANOVA.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Table of means and counts

Male Female Overall Cisplatin 66.875 60.0 63.4375 Vinblastine 66.875 62.5 64.6875 5-fluorouracil 40.625 57.5 49.0625 Overall 58.125 60.0 59.0625

Note, this table should also include the standard error of each of the means.

Male Female Cisplatin 8 8 Vinblastine 8 8 5-fluorouracil 8 8

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Interaction plots

as.factor(gender_name) as.factor(drug_name)

"male" "cisplatin" "female" "vinblastine" "5−fluorouracil" mean of diameter mean of diameter 40 45 50 55 60 65 40 45 50 55 60 65

"5−fluorouracil" "cisplatin" "vinblastine" "female" "male"

as.factor(drug_name) as.factor(gender_name) Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Interaction plots

There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criteria of which to do. If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis. If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis. Example: age, 20 − 29, 30 − 39, 40 − 49, etc.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Results

Output: Df Sum Sq Mean Sq F value Pr(> F) as.factor(drug) 2 2412.50 1206.25 16.5260 5.077e − 06 as.factor(gender) 1 42.19 42.19 0.5780 0.4513514 as.factor(drug): 2 1362.50 681.25 9.3333 0.0004429 as.factor(gender) Residuals 42 3065.62 72.99 The last column contains the p-values * Always check interaction first! * If the interaction is not significant, rerun without it.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Checking the assumptions

Normal Q−Q Residuals vs Fitted

27 27 Residuals Standardized residuals Standardized

9

−29 −1 0 1 2 −20 −10 0 10 20 25

25

−2 −1 0 1 2 40 45 50 55 60 65

Theoretical Quantiles Fitted values

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Notes

The main effects should always be kept if the interaction is significant. Note that due to the groups of students, you will see vertical lines in the residual versus predicted plot. This is due to the fact that all students with a particular combination of the factors will have the same predicted value.

Janette Walde Analysis of Variance Introduction Terminology ANOVA Assumptions One-Way ANOVA Results Two-Way ANOVA Exploratory Analysis Further Extensions Example Useful R-commands Post-hoc Comparisons

You can get Tukey HSD (Tukey Honestly Significant Differences) tests in order to calculate post hoc comparisons on each factor in the model. You can specify specific factors as an option.

diff lwr upr p adj cisplatin":female 2.500 -10.252 15.252 0.991 "5-fluorouracil":female" "vinblastine":female 5.000 -7.752 17.752 0.848 "5-fluorouracil":female" "5-fluorouracil":male -16.875 -29.627 -4.123 0.004 "5-fluorouracil":female" . . . .

Janette Walde Analysis of Variance Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands Extensions

Analysis of * At least one quantitative and one categorical explanatory variable are included in the model. * In general, the main interest is the effects of the and the quantitative variable is considered to be a control variable. * It is a blending of regression and ANOVA. Multivariate designs: MANOVA/MANCOVA

Janette Walde Analysis of Variance Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands R-commands I

boxplot(length ∼ group_name) peas.aov < − aov(length ∼ group_name, data = peas.data) pairwise.t.test(length, group_name, p.adj = "holm") tapply(diameter, interaction(drug_name, gender_name), mean) interaction.plot(as.factor(drug_name), as.factor(gender_name), diameter)

Janette Walde Analysis of Variance Introduction ANOVA One-Way ANOVA Two-Way ANOVA Further Extensions Useful R-commands R-commands II

cancer.aovfit1 < − aov(diameter ∼ as.factor(drug_name) ∗ as.factor(gender_name)) summary(cancer.aovfit1) plot(cancer.aovfit1, which= 2) TukeyHSD(cancer.aovfit1) cancer.aovfit2 < − aov(diameter ∼ as.factor(drug_name) + as.factor(gender_name)+as.factor(drug_name) : as.factor(gender_name)) summary(cancer.aovfit2)

Janette Walde Analysis of Variance