Analysis of Variance: the Fundamental Concepts
STEVEN F. SAWYER, PT, PhD
Department of Rehabilitation Sciences, School of Allied Health Sciences, Texas Tech University Health Sciences Center, Lubbock, TX
Address all correspondence and requests for reprints to: Steven F. Sawyer, PT, PhD, [email protected]
THE JOURNAL OF MANUAL & MANIPULATIVE THERAPY, VOLUME 17, NUMBER 2

ABSTRACT: Analysis of variance (ANOVA) is a statistical test for detecting differences in group means when there is one parametric dependent variable and one or more independent variables. This article summarizes the fundamentals of ANOVA for an intended benefit of the clinician reader of scientific literature who does not possess expertise in statistics. The emphasis is on conceptually-based perspectives regarding the use and interpretation of ANOVA, with minimal coverage of the mathematical foundations. Computational examples are provided. Assumptions underlying ANOVA include parametric data measures, normally distributed data, similar group variances, and independence of subjects. However, normality and variance assumptions can often be violated with impunity if sample sizes are sufficiently large and there are equal numbers of subjects in each group. A statistically significant ANOVA is typically followed up with a multiple comparison procedure to identify which group means differ from each other. The article concludes with a discussion of effect size and the important distinction between statistical significance and clinical significance.

KEYWORDS: Analysis of Variance, Interaction, Main Effects, Multiple Comparison Procedures

Analysis of variance (ANOVA) is a statistical tool used to detect differences between experimental group means. ANOVA is warranted in experimental designs with one dependent variable that is a continuous parametric numerical outcome measure, and multiple experimental groups within one or more independent (categorical) variables. In ANOVA terminology, independent variables are called factors, and groups within each factor are referred to as levels.

The array of terms that are part and parcel of ANOVA can be intimidating to the uninitiated, such as: partitioning of variance, main effects, interactions, factors, sum of squares, mean squares, F scores, familywise alpha, multiple comparison procedures (or post hoc tests), effect size, statistical power, etc. How do these terms pertain to p values and statistical significance? What precisely is meant by a “statistically significant ANOVA”? How does analyzing variance result in an inferential decision about differences in group means? Can ANOVA be performed on non-parametric data? What are the virtues and potential pitfalls of ANOVA? These are the issues to be addressed in this primer on the use and interpretation of ANOVA. The intent is to provide the clinician reader, whose misspent youth did not include an enthusiastic reading of statistics textbooks, an understanding of the fundamentals of this widely used form of inferential statistical analysis.

ANOVA General Linear Models

ANOVA is based mathematically on linear regression and general linear models that quantify the relationship between the dependent variable and the independent variable(s) [1]. There are three different general linear models for ANOVA:

(i) Fixed effects model (Model 1) makes inferences that are specific and valid only to the populations and treatments of the study. For example, if three treatments involve three different doses of a drug, inferential conclusions can only be drawn for those specific drug doses. The levels within each factor are fixed as defined by the experimental design.

(ii) Random effects model (Model 2) makes inferences about levels of the factor that are not used in the study, such as a continuum of drug doses when the study only used three doses. This model pertains to random effects within levels, and makes inferences about a population’s random variation.

(iii) Mixed effects model (Model 3) contains both Fixed and Random effects.

In most types of orthopedic rehabilitation clinical research, the Fixed effects model is relevant since the statistical inferences being sought are fixed to the levels of the experimental design. For this reason, the Fixed effects model will be the focus of this article. Computer statistics programs typically default to the Fixed effects model for ANOVA analysis, but higher end programs can perform ANOVA with all three models.

Assumptions of ANOVA

Assumptions for ANOVA pertain to the underlying mathematics of general linear models. Specifically, a data set should meet the following criteria before being subjected to ANOVA:

Parametric data: A parametric ANOVA, the topic of this article, requires parametric data (ratio or interval measures). There are non-parametric, one-factor versions of ANOVA for non-parametric ordinal (ranked) data, specifically the Kruskal-Wallis test for independent groups and the Friedman test for repeated measures analysis.

Normally distributed data within each group: ANOVA can be thought of as a way to infer whether the normal distribution curves of different data sets are best thought of as being from the same population or different populations (Figure 1). It follows that a fundamental assumption of parametric ANOVA is that each group of data (each level) be normally distributed. The Shapiro-Wilk test [2] is commonly used to test for normality for group sample sizes (N) less than 50; D’Agostino’s modification [3] is useful for larger samplings (N>50).

FIGURE 1. Graphical representation of statistical Null and Alternative hypotheses for ANOVA in the case of one dependent variable (change in ankle ROM pre/post manual therapy treatment, in units of degrees), and one independent variable with three levels (three different types of manual therapy treatments). For this fictitious data, the group (sample) means are 13, 14 and 18 degrees of increased ankle ROM for treatment type groups 1, 2 and 3, respectively (raw data are presented in Figure 2). The Null hypothesis is represented in the left graph, in which the population means for all three groups are assumed to be identical to each other (in spite of differences in sample means calculated from the experimental data). Since in the Null hypothesis the subjects in the three groups are considered to compose a single population, by definition the population means of each group are equal to each other, and are equal to the Grand mean (mean for all data scores in the three groups). The corresponding normal distribution curves are identical and precisely overlap along the X-axis. The Alternative hypothesis is shown in the right graph, in which differences in group sample means are inferred to represent true differences in group population means. These normal distribution curves do not overlap along the X-axis because each group of subjects is considered to be a distinct population with respect to ankle ROM, created from the original single population that experienced different efficacies of the three treatments. Graph is patterned after Wilkinson et al. [11].
[Figure 1 panels: left, “ANOVA Null Hypothesis: Identical Normal distribution curve”; right, “ANOVA Null Hypothesis: Different Normal distribution curve”. Y-axis: Probability Density Function. X-axis: Increased elbow ROM (degree).]

A normal distribution curve can be described by whether it has symmetry about the mean and the appropriate width and height (peakedness). These attributes are defined statistically by “skewness” and “kurtosis”, respectively. A normal distribution curve will have skewness = 0 and kurtosis = 3. (Note that an alternative definition of kurtosis subtracts 3 from the final value so that a normal distribution will have kurtosis = 0. This “minus 3” kurtosis value is sometimes referred to as “excess kurtosis” to distinguish it from the value obtained with the standard kurtosis function. The kurtosis value calculated by many statistical programs is the “minus 3” variant but is referred to, somewhat misleadingly, as “kurtosis.”)

Normality of a data set can be assessed with a z-test in reference to the standard error of skewness (estimated as √[6 / N]) and the standard error of kurtosis (estimated as √[24 / N]) [4]. A conservative alpha of 0.01 (z ≥ 2.56) is appropriate, due to the overly sensitive nature of these tests, especially for large sample sizes (>100) [4]. As a computational example, for N = 20, the estimation of standard error of skewness = √[6 / 20] = 0.55, and any skewness value beyond ±2.56 x 0.55 = ±1.41 would indicate non-normality. Perhaps the best “test” is what always should be done: examine a histogram of the distribution of the data. In practice, any distribution that resembles a bell-shaped curve will be “normal enough” to pass normality tests, especially if the sample [...]

[...] the F score calculation are warranted. The two most commonly used correction methods are the Greenhouse-Geisser and Huynh-Feldt, which calculate a descriptive statistic called epsilon, which is a measure of the extent to which sphericity has been violated. The range of values for epsilon is 1 (no sphericity violation) to a lower boundary of 1 / (m − 1), where m = number of levels. For example, with three groups, the range would be 1 to 0.50. The closer epsilon is to the lower boundary, the greater the degree of violation.

[...] so [9,10]. If normality and homogeneity of variance violations are problematic, there are three options:

(i) Mathematically transform (log, arcsin, etc.) the data to best mitigate the violation, at the cost of cognitive fog in understanding the meaning of the ANOVA results (e.g., “A statistically significant main effect was obtained for the arcsin transformation of degrees of ankle range of motion”).

(ii) Use one of the non-parametric ANOVAs mentioned above, but at the cost of reduced power and being limited [...]