
Analysis of Variance (ANOVA)
Lecture #7, BIOE 597, Spring 2017, Penn State University
By Xiao Liu

Agenda

• Review

• One-way ANOVA

• Multiple Comparison

Hypothesis Testing II (Review)

• Type I/II Errors and Power

• Type I error (α) = P(reject H0 | H0 is true)

• Type II error (β) = P(fail to reject H0 | H1 is true)

• Power (1 − β) = P(reject H0 | H1 is true)

Hypothesis Testing II (Review)

• Formulas for Power Calculation (one-sample z test of H0: µ = µ0 against true mean µ1)

o Left-tailed: $\text{Power} = \Phi\left(z_{\alpha} + \frac{\mu_0-\mu_1}{\sigma/\sqrt{n}}\right)$

o Right-tailed: $\text{Power} = 1 - \Phi\left(z_{1-\alpha} + \frac{\mu_0-\mu_1}{\sigma/\sqrt{n}}\right)$

o Two-tailed: $\text{Power} \approx \Phi\left(\frac{|\mu_0-\mu_1|}{\sigma/\sqrt{n}} - z_{1-\alpha/2}\right)$

Hypothesis Testing II (Review)

• Increase Power

o Increase α
o Large effect size
o One-tailed test
o Large sample size

Hypothesis Testing II (Review)

• Sample size calculation:

o Given the effect size (the difference between the null and alternative hypotheses), calculate the sample size required to achieve a certain power (typically 80%) for hypothesis testing at the significance level α:

$n = \left(\frac{(z_{1-\alpha} + z_{1-\beta})\,\sigma}{\mu_0 - \mu_1}\right)^2$
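As a numeric illustration, here is a minimal MATLAB sketch of this sample-size formula; the means, σ, and targets below are made-up values, not from the lecture:

alpha = 0.05; power = 0.80;            % significance level and target power
mu0 = 10; mu1 = 12; sigma = 4;         % hypothetical means and standard deviation
n = ((norminv(1-alpha) + norminv(power)) * sigma / (mu0 - mu1))^2;
n = ceil(n)                            % round up to a whole subject count -> 25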

Hypothesis Testing II (Review)

• Sensitivity and Specificity

o Sensitivity measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition). True Positive Rate = TP / (TP + FN) = Power

o Specificity measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition). True Negative Rate = TN / (TN + FP) = 1 − False Positive Rate = 1 − α

• Receiver Operating Characteristic (ROC) Curve

Hypothesis Testing II (Review)

• Two-way table

• Pearson’s Chi-squared Test

$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \sum \frac{(O-E)^2}{E} \;\sim\; \chi^2\big((r-1)(c-1)\big)$
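A quick MATLAB sketch of this test on a made-up 2×2 table (the counts are hypothetical, chosen only for illustration):

observed = [20 30; 10 40];                                % hypothetical 2x2 counts
expected = sum(observed,2) * sum(observed,1) / sum(observed(:));  % row*col/total
X2 = sum((observed(:) - expected(:)).^2 ./ expected(:));  % chi-squared statistic
df = (size(observed,1)-1) * (size(observed,2)-1);         % (r-1)(c-1) = 1
p  = 1 - chi2cdf(X2, df)                                  % ~0.029 for these counts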

Hypothesis Testing II (Review)

• Simpson’s Paradox

o Without considering the lurking variable
o With considering the lurking variable

[Comparison tables for the two cases were shown on the slides.]

ANOVA

• ANOVA stands for Analysis of Variance

• Purpose of ANOVA: Comparing means of different populations

• Difference from t-test
o t-test: for comparing at most two population means
o ANOVA: for comparing means of two or more populations

Advantage of ANOVA

• Why not do multiple two-sample t tests?

• The type I error (α) can accumulate over a series of tests, so the final experimentwise α-level can be quite high

• ANOVA allows researchers to evaluate all of the differences in a single hypothesis test using a single α-level and thereby keeps the risk of a Type I error under control no matter how many different means are being compared

ANOVA

• The null hypothesis of ANOVA

H0: µ1 = µ2 = … = µI

• The alternative hypothesis of ANOVA is that at least one of the means is different, but we don’t necessarily know which one from the ANOVA result

H1: µi ≠ µj for at least one pair (i, j)

ANOVA

• Example: 25 patients with blisters were divided into 3 groups that received treatment A, treatment B, and placebo, respectively.

Data [and means]: A: 5,6,6,7,7,8,9,10 [7.25] B: 7,7,8,9,9,10,10,11 [8.875] P: 7,9,9,10,10,10,11,12,13 [10.11]

• Question: Are the differences between the three treatment groups significant?

Notation of ANOVA


• N = the total number of individuals (25)
• I = the number of groups (3)
• x̄ = the mean of the entire dataset, i.e., the overall mean (8.8)
• ni = the number of individuals in group i (8, 8, 9)
• xij = the value for individual j in group i
• x̄i = the mean for group i (7.25, 8.875, and 10.11)
• si = the standard deviation for group i

Assumptions of ANOVA

• Independence:
o Each sample is drawn independently

• Normality: each group is approximately normal
o Use a histogram or normal Q-Q plot to check this
o Robust to some nonnormality, but not severe outliers

• Equal Variance: standard deviations of different groups are approximately equal
o Rule of thumb: the ratio of the largest to the smallest standard deviation is less than 2

Assumptions of ANOVA

Variable   Treatment   N   Mean     Median   StDev
days       A           8    7.250    7.000   1.669
days       B           8    8.875    9.000   1.458
days       P           9   10.111   10.000   1.764

Compare the largest and smallest standard deviations:
• largest: 1.764
• smallest: 1.458
• 1.458 x 2 = 2.916 > 1.764, so the rule of thumb is satisfied
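These summary statistics, and the rule-of-thumb check, can be reproduced in MATLAB from the raw blister data:

A = [5 6 6 7 7 8 9 10];                 % treatment A
B = [7 7 8 9 9 10 10 11];               % treatment B
P = [7 9 9 10 10 10 11 12 13];          % placebo
groupMeans = [mean(A) mean(B) mean(P)]  % 7.250  8.875  10.111
groupSDs   = [std(A) std(B) std(P)]     % 1.669  1.458  1.764
max(groupSDs) / min(groupSDs)           % 1.21 < 2, rule of thumb satisfied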

How ANOVA Works

• Recall the two-sample pooled t test statistic

$t^* = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$

o “Signal”: the difference between the groups (the numerator)
o “Noise”: randomness within the groups (the denominator); averaging across a large sample beats the noise down

• We actually calculate something like a signal-to-noise ratio

• We can partition measurements from individuals into two parts

xij = µi + eij

where µi is the mean of group i and the errors eij ~ N(0, σ²)

• Like the t test, ANOVA also compares the relative amplitude of these two parts (“signal” versus “noise”), but via variances

• ANOVA partitions the total variation (sum of squares, SS) of the dataset, $\sum_{ij}(x_{ij}-\bar{x})^2$, into two separate components:

o variation between groups: $\sum_{ij}(\bar{x}_i-\bar{x})^2$
o variation within groups: $\sum_{ij}(x_{ij}-\bar{x}_i)^2$

Sum of Squares

• SST: Sum of Squares Total

$SST = \sum_{ij}(x_{ij}-\bar{x})^2, \qquad DFT = N - 1$

• SSG: Sum of Squares Groups

$SSG = \sum_{ij}(\bar{x}_i-\bar{x})^2 = \sum_i n_i(\bar{x}_i-\bar{x})^2, \qquad DFG = I - 1$

• SSE: Sum of Squares of Errors

$SSE = \sum_{ij}(x_{ij}-\bar{x}_i)^2 = \sum_i (n_i-1)\,s_i^2, \qquad DFE = N - I$
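A short MATLAB sketch computing these sums of squares for the blister data:

A = [5 6 6 7 7 8 9 10]; B = [7 7 8 9 9 10 10 11]; P = [7 9 9 10 10 10 11 12 13];
x = [A B P];  n = [numel(A) numel(B) numel(P)];  m = [mean(A) mean(B) mean(P)];
xbar = mean(x);                                % overall mean, 8.8
SST  = sum((x - xbar).^2)                      % 94,    DFT = 24
SSG  = sum(n .* (m - xbar).^2)                 % 34.74, DFG = 2
SSE  = sum((n - 1) .* [var(A) var(B) var(P)])  % 59.26, DFE = 22; note SST = SSG + SSE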

Mean Sum of Squares

• Normalize each SS by its degrees of freedom:

$MST = \frac{SST}{DFT}, \qquad MSG = \frac{SSG}{DFG}, \qquad MSE = \frac{SSE}{DFE}$

Variance Decomposition

• SST = SSG + SSE?

$SST = \sum_{ij}(x_{ij}-\bar{x})^2 = \sum_{ij}\left[(x_{ij}-\bar{x}_i) + (\bar{x}_i-\bar{x})\right]^2$

$= \sum_{ij}(x_{ij}-\bar{x}_i)^2 + \sum_{ij}(\bar{x}_i-\bar{x})^2 + 2\sum_{ij}(x_{ij}-\bar{x}_i)(\bar{x}_i-\bar{x}) = SSE + SSG + 0$

The cross term is zero because the residuals sum to zero within each group, so SST = SSG + SSE.

Graphical Illustration

[Figures illustrating the between-group and within-group variation were shown here.]

F-statistic

• The ANOVA F-statistic is a ratio of the Between Group Variation divided by the Within Group Variation:

$F = \frac{\text{Between Group Variation}}{\text{Within Group Variation}} = \frac{MSG}{MSE} \;\sim\; F(I-1,\, N-I)$

• A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

ANOVA Table

Data [and means]: A: 5,6,6,7,7,8,9,10 [7.25] B: 7,7,8,9,9,10,10,11 [8.875] P: 7,9,9,10,10,10,11,12,13 [10.11]

Source    SS      df   MS      F      p
Between   34.74    2   17.37   6.45   0.0062
Within    59.26   22    2.69
Total     94      24

• SSG = 8*(7.25-8.8)² + 8*(8.875-8.8)² + 9*(10.11-8.8)² = 34.74
• SSE = (8-1)*2.7857 + (8-1)*2.1250 + (9-1)*3.1111 = 59.26
• MSG = SSG/DFG = 34.74/2 = 17.37
• MSE = SSE/DFE = 59.26/22 = 2.69
• F = MSG/MSE = 17.37/2.69 = 6.45
• P-value = 1-fcdf(6.45,2,22) = 0.0062

There is a significant difference among the three groups
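The hand-computed table can be cross-checked against MATLAB’s built-in one-way ANOVA; a minimal sketch:

A = [5 6 6 7 7 8 9 10]; B = [7 7 8 9 9 10 10 11]; P = [7 9 9 10 10 10 11 12 13];
y = [A B P];
g = [repmat({'A'},1,8) repmat({'B'},1,8) repmat({'P'},1,9)];  % group labels
[p, tbl, stats] = anova1(y, g, 'off');    % 'off' suppresses the plots
% p ~ 0.0062; tbl holds the same SS/df/MS/F; stats is reused by multcompare later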

ANOVA and T-Test

• If we use ANOVA to compare two groups, will it give the same result as the two-sample t-test?

o Use ANOVA to compare treatment groups A and B
o Use t-test to compare treatment groups A and B

F = 4.3 = (-2.0741)² = t²

ANOVA and T-Test

• F-statistic versus t-statistic

$F = \frac{MSG}{MSE} = \frac{SSG/DFG}{SSE/DFE} = \frac{SSG/(I-1)}{SSE/(N-I)}$

$SSE = \sum_{ij}(x_{ij}-\bar{x}_i)^2 = \sum_i (n_i-1)\,s_i^2$

$SSG = \sum_{ij}(\bar{x}_i-\bar{x})^2 = \sum_i n_i(\bar{x}_i-\bar{x})^2$

$MSE = \frac{SSE}{N-I} = \frac{\sum_i (n_i-1)\,s_i^2}{\sum_i (n_i-1)} = s_p^2$

So MSE is the pooled estimate of variance.

For I = 2 (so DFG = 1):

$SSG = n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{x}_1-\bar{x}_2)^2 = \frac{(\bar{x}_1-\bar{x}_2)^2}{\frac{1}{n_1}+\frac{1}{n_2}}$

$F = \frac{SSG}{MSE} = \frac{(\bar{x}_1-\bar{x}_2)^2}{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} = (t^*)^2$
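A sketch verifying F = t² numerically on groups A and B (ttest2 pools the variances by default):

A = [5 6 6 7 7 8 9 10]; B = [7 7 8 9 9 10 10 11];
[~, pT, ~, st] = ttest2(A, B);                 % pooled two-sample t test
g = [repmat({'A'},1,8) repmat({'B'},1,8)];
[pF, tbl] = anova1([A B], g, 'off');           % one-way ANOVA on the two groups
% st.tstat = -2.0741, the ANOVA F = 4.3 = st.tstat^2, and pT equals pF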

After ANOVA

• ANOVA only gives us a general answer to a general question: are the differences among the observed group means significant?

• Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Multiple Comparison

• Once ANOVA indicates that the groups do not all appear to have the same means, we can compare them two by two using the two-sample t test. Then we have to deal with the issue of multiple comparisons

• For example, if we have 5 groups for ANOVA, there will be 10 pairwise t-tests. Suppose we test each of them at the significance level α = 0.05, which is the type I error for each individual test. However, the experimentwise type I error will be

P(at least one type I error in the 10 tests) = 1 − P(no type I error in any test) = 1 − (0.95)¹⁰ = 0.40126!!!
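The same arithmetic in MATLAB:

k = nchoosek(5, 2);          % 10 pairwise tests among 5 groups
1 - (1 - 0.05)^k             % experimentwise type I error, 0.4013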

Fisher’s Least Significant Difference

• The easiest method, without any correction for multiple comparisons

o Select a significance level α
o Calculate the least significant difference (LSD)

$LSD_{ij} = t_{1-\alpha/2,\,DFE}\,\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$

o Compare the group difference $|\bar{x}_i-\bar{x}_j|$ to the LSD

• Equivalent to a pooled t test

• Use it only after ANOVA rejects H0!! ANOVA provides some α-level protection

• Not good when the number of means being compared is large
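A sketch of the LSD for groups A and P, plugging in MSE and DFE from the ANOVA table above:

MSE = 2.69; DFE = 22; alpha = 0.05;                    % from the ANOVA table
nA = 8; nP = 9;                                        % sizes of groups A and P
LSD = tinv(1-alpha/2, DFE) * sqrt(MSE*(1/nA + 1/nP))   % ~1.65
% |7.25 - 10.11| = 2.86 > 1.65, so A and P differ significantly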

Bonferroni Method

• Use α/p instead of α for p individual tests

• The experimentwise error will be close to α:

$1 - \left(1 - \frac{\alpha}{p}\right)^p \approx \alpha$

• Very conservative: less type I error, but less power as well
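For the 3-group blister example, the Bonferroni adjustment in MATLAB:

alpha = 0.05; p = nchoosek(3, 2);       % 3 pairwise tests among 3 groups
alphaAdj = alpha / p                    % per-test level, 0.0167
1 - (1 - alphaAdj)^p                    % experimentwise error ~0.049, just under alpha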

Bonferroni Method

• The formula $1 - (1 - \alpha/p)^p$ would be valid only if the tests are independent; often they’re not.

[ e.g., tests (1) µ1 = µ2, (2) µ2 = µ3, (3) µ1 = µ3: if (1) is accepted and (2) is rejected, isn’t it more likely that (3) is rejected? ]

Tukey’s Honest Significant Difference (HSD)

• Similar to Fisher’s LSD, but replace the t quantile:

$LSD_{ij} = t_{1-\alpha/2,\,DFE}\,\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$

$HSD_{ij} = \frac{q_{1-\alpha,\,I,\,N-I}}{\sqrt{2}}\,\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$

where q follows a studentized range distribution

• Tukey’s HSD takes into account all the inter-dependencies of the different comparisons, so it is less conservative than the Bonferroni method.
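In MATLAB, Tukey-Kramer HSD comparisons come from multcompare applied to the anova1 output; a sketch:

A = [5 6 6 7 7 8 9 10]; B = [7 7 8 9 9 10 10 11]; P = [7 9 9 10 10 10 11 12 13];
g = [repmat({'A'},1,8) repmat({'B'},1,8) repmat({'P'},1,9)];
[~, ~, stats] = anova1([A B P], g, 'off');
c = multcompare(stats, 'Display', 'off');   % Tukey-Kramer is the default CType
% each row of c: the two group indices, CI lower bound, mean difference,
% CI upper bound, and (in recent MATLAB versions) the adjusted p-value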

Contrast

• Back to our example: 25 patients with blisters were divided into 3 groups that received treatment A, treatment B, and placebo, respectively.

Data [and means]: A: 5,6,6,7,7,8,9,10 [7.25] B: 7,7,8,9,9,10,10,11 [8.875] P: 7,9,9,10,10,10,11,12,13 [10.11]

• Old question: Are the differences between the three treatment groups significant?

• New question: Is there a significant difference between the placebo group and the non-placebo groups (A and B)?

Contrast

• We can construct different contrasts to test more complicated hypotheses.

• A contrast is a linear combination of group means in the form of

$\gamma = \sum_i c_i\,\mu_i$

with the restriction that

$\sum_i c_i = 0$

Contrast

• The corresponding sample contrast is

$g = \sum_i c_i\,\bar{x}_i$

• The standard error for the contrast is

$SE_g = \sqrt{MSE \sum_i \frac{c_i^2}{n_i}}$

• Test statistic:

$t = \frac{g}{SE_g} \;\sim\; t(DFE)$

Contrast

• For the new question (placebo versus non-placebo):

c1 = 1, c2 = 1, c3 = -2

Source    SS      df   MS      F
Between   34.74    2   17.37   6.45
Within    59.26   22    2.69
Total     94      24

$g = 7.25 + 8.875 + (-2)(10.11) = -4.095$

$SE_g = \sqrt{2.69\left(\frac{1}{8}+\frac{1}{8}+\frac{4}{9}\right)} = 1.3668$

$t = \frac{g}{SE_g} = \frac{-4.095}{1.3668} = -2.996, \qquad p = 0.0063$
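The same contrast test in MATLAB, with the two-sided p-value from the t cdf:

MSE = 2.69; DFE = 22;                  % from the ANOVA table
c = [1 1 -2]; m = [7.25 8.875 10.11]; n = [8 8 9];
g  = sum(c .* m)                       % sample contrast, -4.095
SE = sqrt(MSE * sum(c.^2 ./ n))        % standard error, 1.3668
t  = g / SE                            % test statistic, -2.996
p  = 2 * tcdf(-abs(t), DFE)            % ~0.006, matching the slide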