ANOVA

Note: If I were doing a quick ANOVA analysis (without the diagnostic checks, etc.), I’d do the following: 1) load the packages (#1); 2) do the prep work (#2); and 3) run the ggstatsplot::ggbetweenstats analysis (in the #6 section).

1. Packages needed. Here are the recommended packages to load prior to working.

library(ggplot2)      # for graphing
library(ggstatsplot)  # for graphing and statistical analyses (one-stop shop)
library(GGally)       # offers the ggpairs() function
library(moments)      # offers the skewness() and kurtosis() functions
library(Rmisc)        # for calculating summary stats and bar graphs of group means
library(ggpubr)       # normality-related commands

2. Prep Work

Declare factor variables as such.

class(mydata$catvar)                    # tells you how R currently sees the variable (e.g., numeric, factor)
mydata$catvar <- factor(mydata$catvar)  # declares the specified variable as a factor variable
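Why this step matters: if the grouping variable is stored as numeric codes, aov() treats it as a continuous predictor rather than as a set of groups. Here is a minimal sketch using made-up data (the values and the low/medium/high labels are hypothetical, purely for illustration):

```r
# Illustrative data only: a grouping variable stored as numeric codes
mydata <- data.frame(
  intvar = c(4.2, 5.1, 3.9, 6.0, 5.5, 6.3),
  catvar = c(1, 1, 2, 2, 3, 3)
)
class(mydata$catvar)   # "numeric": R would treat the codes as a continuous variable

# Declare the variable as a factor; the labels are optional (and hypothetical here)
mydata$catvar <- factor(mydata$catvar,
                        levels = c(1, 2, 3),
                        labels = c("low", "medium", "high"))
class(mydata$catvar)   # "factor"
levels(mydata$catvar)  # "low" "medium" "high"
```

Adding labels at this point is optional, but it saves relabeling the axis categories in every graph later on.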

3. Checking for violations of assumptions: a) relatively equal group sizes; b) equal variances; and c) normal distribution.

a. Group Sizes

Group counts (to check group frequencies):

table(mydata$catvar)
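A small sketch of the group-size check on real data. It assumes R's built-in PlantGrowth dataset as a stand-in for mydata (my choice for illustration) and adds useNA so that missing group codes get counted too:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

counts <- table(mydata$catvar, useNA = "ifany")  # counts per group, plus NAs if any
counts
round(prop.table(counts), 2)                     # share of observations per group

# Ratio of the largest to the smallest group; 1 means perfectly balanced groups
size_ratio <- max(counts) / min(counts)
size_ratio
```

The further the ratio drifts from 1, the more attention the equal-variance check below deserves.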

b. Checking Equal Variances

Group means and standard deviations (note: the aggregate and by commands give you the same results):

aggregate(mydata$intvar, by = list(mydata$catvar), FUN = mean, na.rm = TRUE)
aggregate(mydata$intvar, by = list(mydata$catvar), FUN = sd, na.rm = TRUE)
by(mydata$intvar, mydata$catvar, mean, na.rm = TRUE)
by(mydata$intvar, mydata$catvar, sd, na.rm = TRUE)

A simple bar graph of group means and CIs (using Rmisc package). This command is repeated further below in the graphing section. The ggplot command will vary depending on the number of categories in the grouping variable.

object <- Rmisc::summarySE(mydata, measurevar = "intvar", groupvars = c("catvar"), na.rm = TRUE)
ggplot2::ggplot(object, aes(x = factor(catvar), y = intvar)) +
  geom_bar(stat = "identity", fill = "gray", width = 0.8) +
  geom_errorbar(aes(ymin = intvar - ci, ymax = intvar + ci), width = .2, color = "black") +  # ci = CI half-width; use se instead for standard-error bars
  xlab("Overall label for x-axis groups") +
  ylab("Y-axis label") +
  scale_x_discrete(breaks = c("1", "2", "3"),
                   labels = c("category 1 label", "category 2 label", "category 3 label"))
# Your catvar values may not be 1, 2, 3, etc. They could be any number of values; adjust accordingly.
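To see where the error-bar values come from, the same kind of summary can be sketched in base R. This assumes the built-in PlantGrowth dataset as stand-in data and the usual t-based 95% confidence interval (the standard error multiplied by the t critical value with N - 1 degrees of freedom), which is, as far as I know, how Rmisc::summarySE computes its ci column:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

n  <- tapply(mydata$intvar, mydata$catvar, function(x) sum(!is.na(x)))  # group sizes
m  <- tapply(mydata$intvar, mydata$catvar, mean, na.rm = TRUE)          # group means
s  <- tapply(mydata$intvar, mydata$catvar, sd, na.rm = TRUE)            # group SDs
se <- s / sqrt(n)                                                       # standard errors
ci <- se * qt(0.975, df = n - 1)                                        # 95% CI half-widths

group_summary <- data.frame(
  catvar = names(m),
  N      = as.vector(n),
  mean   = as.vector(m),
  sd     = as.vector(s),
  se     = as.vector(se),
  ci     = as.vector(ci)
)
group_summary
```

The ci column is always wider than se, so it matters which one the error bars are drawn from and how the graph is captioned.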

If you want to run Bartlett’s test for equal variances (a p < .05 suggests that the group variances differ):

bartlett.test(intvar ~ catvar, data = mydata)
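To get a feel for the output, here is a sketch that looks at the group variances directly and then pulls the p-value out of the test object. PlantGrowth is assumed stand-in data, not part of the original notes:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

# Variance of the outcome within each group
group_vars <- tapply(mydata$intvar, mydata$catvar, var, na.rm = TRUE)
group_vars

# Bartlett's test: the null hypothesis is that all group variances are equal
bt <- bartlett.test(intvar ~ catvar, data = mydata)
bt$statistic   # Bartlett's K-squared
bt$parameter   # degrees of freedom (number of groups - 1)
bt$p.value     # a small p-value would suggest unequal variances
```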

c. Checking Normality

Check skewness and kurtosis (a normal distribution has a skewness of 0 and a kurtosis of 3). You can also visually inspect the data with a density graph and a quantile-quantile plot (Q-Q plot). The Q-Q plot compares the sample quantiles against those of a normal distribution; the dots should fall along a relatively straight 45-degree line if the data are normally distributed.

moments::skewness(mydata$intvar, na.rm = TRUE)
moments::kurtosis(mydata$intvar, na.rm = TRUE)
ggpubr::ggdensity(mydata$intvar, fill = "lightgray")
ggpubr::ggqqplot(mydata$intvar)
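For a sense of what these two numbers measure, both can be computed from the central moments in base R. The sketch below uses the population (divide-by-n) moment formulas, which I believe match what the moments package uses; the helper names and the toy vector are mine:

```r
# k-th central moment, dropping missing values first
central_moment <- function(x, k) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^k)
}

# Skewness: third central moment scaled by the variance to the power 3/2
skew_by_hand <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)

# Kurtosis (not "excess" kurtosis): fourth central moment over squared variance
kurt_by_hand <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

x <- c(1, 2, 3, 4, 5)   # small symmetric toy sample
skew_by_hand(x)         # 0: perfectly symmetric
kurt_by_hand(x)         # 1.7: below 3, i.e., lighter tails than a normal distribution
```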

Beyond a visual inspection, you can conduct a Shapiro-Wilk test of normality, where p < .05 indicates a non-normal distribution and p > .05 gives no evidence against normality. The test is sensitive, particularly with a large N, so use it in conjunction with other information. It is also limited to samples of 3 to 5000 observations.

shapiro.test(mydata$intvar)
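Strictly speaking, the ANOVA normality assumption concerns the values within each group (equivalently, the model residuals) rather than the pooled variable. As a sketch of that variant (my addition, with PlantGrowth as assumed stand-in data), you can test the residuals of the fitted model:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

# Fit the one-way ANOVA, then test its residuals
# (each observation minus its own group mean)
fit <- aov(intvar ~ catvar, data = mydata)
res <- residuals(fit)

sw <- shapiro.test(res)
sw$statistic   # the W statistic; values near 1 are consistent with normality
sw$p.value     # above .05 for this dataset, so no evidence against normality
```

Testing the pooled variable can flag non-normality that is really just a difference in group means, which is why the residuals are often the safer target.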

4. The analysis of variance. The first commands below are the basic commands for running an ANOVA; they assign the results to an object and print the summary table. You can follow them with pairwise t-tests (Bonferroni). Better still, jump straight to the ggstatsplot option.

object <- aov(intvar ~ catvar, data = mydata)
summary(object)
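To make the F statistic in that summary table less of a black box, here is a sketch that rebuilds it from the between-group and within-group sums of squares and checks it against aov(). It assumes PlantGrowth as stand-in data and no missing values:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data; no missing values
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

k <- nlevels(mydata$catvar)                      # number of groups
n <- nrow(mydata)                                # total sample size
grand_mean <- mean(mydata$intvar)
group_mean <- ave(mydata$intvar, mydata$catvar)  # each row's own group mean

ss_between <- sum((group_mean - grand_mean)^2)     # group means around the grand mean
ss_within  <- sum((mydata$intvar - group_mean)^2)  # observations around their group mean

# F = mean square between / mean square within
f_by_hand <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_by_hand <- pf(f_by_hand, k - 1, n - k, lower.tail = FALSE)

# Compare with the table produced by aov()
fit <- aov(intvar ~ catvar, data = mydata)
aov_table <- summary(fit)[[1]]
all.equal(f_by_hand, aov_table[["F value"]][1])  # TRUE
all.equal(p_by_hand, aov_table[["Pr(>F)"]][1])   # TRUE
```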

Conducting follow-up pairwise t-tests (Bonferroni) to see which groups differ significantly from one another (the results give the p-values):

pairwise.t.test(mydata$intvar, mydata$catvar, p.adjust.method = "bonferroni")
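What the Bonferroni adjustment actually does: each raw p-value is multiplied by the number of comparisons and capped at 1. A sketch that verifies this, with PlantGrowth as assumed stand-in data:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

# Matrices of pairwise p-values, unadjusted and Bonferroni-adjusted
raw  <- pairwise.t.test(mydata$intvar, mydata$catvar,
                        p.adjust.method = "none")$p.value
bonf <- pairwise.t.test(mydata$intvar, mydata$catvar,
                        p.adjust.method = "bonferroni")$p.value

n_comparisons <- sum(!is.na(raw))          # 3 pairs for 3 groups
by_hand <- pmin(raw * n_comparisons, 1)    # multiply, then cap at 1

all.equal(as.vector(bonf), as.vector(by_hand))  # TRUE
bonf
```

With many groups the number of pairs grows quickly, so Bonferroni becomes conservative; that is a trade-off to keep in mind rather than a reason to skip the adjustment.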

Note: I actually recommend jumping straight to the ggstatsplot::ggbetweenstats command to produce a graph and calculate your ANOVA.

ggstatsplot::ggbetweenstats(data = mydata, x = catvar, y = intvar,
                            pairwise.comparisons = TRUE,
                            p.adjust.method = "bonferroni",
                            bf.message = FALSE,
                            title = "Title of Graph",
                            xlab = "X-axis label",
                            ylab = "Y-axis label")

5. Kruskal-Wallis non-parametric test (if needed). If you have violated one or more of the assumptions, you can run a Kruskal-Wallis rank sum test as a check on your ANOVA.

kruskal.test(intvar ~ catvar, data = mydata)
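A sketch of the comparison this step suggests: run both tests on the same data and look at the p-values side by side. PlantGrowth is assumed stand-in data:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

# Parametric one-way ANOVA p-value
anova_p <- summary(aov(intvar ~ catvar, data = mydata))[[1]][["Pr(>F)"]][1]

# Rank-based Kruskal-Wallis test on the same model
kw <- kruskal.test(intvar ~ catvar, data = mydata)
kw$statistic   # Kruskal-Wallis chi-squared
kw$parameter   # degrees of freedom (number of groups - 1)

# Side-by-side p-values; agreement between the two is reassuring
c(anova = anova_p, kruskal = kw$p.value)
```

If the two tests disagree, that is a cue to look back at the assumption checks before trusting the ANOVA result.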

6. Graphing Options. Boxplots and violin plots are probably the most helpful graphs, though you can also just do bar graphs of the group means with CIs. The ggplot2 package is generally the go-to package for graphs. The ggstatsplot package, however, provides a one-stop shop for graphing and conducting ANOVA and pairwise comparisons. It produces your boxplot/violin graph and calculates your ANOVA, while indicating whether the difference in means between any two groups is statistically significant.

The ggplot2 commands (the second shows how to add a marker for the means; the third shows a box/violin combination):

ggplot2::ggplot(data = mydata, aes(x = catvar, y = intvar)) +
  geom_boxplot() +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")

ggplot2::ggplot(data = mydata, aes(x = catvar, y = intvar)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 8, size = 4, color = "blue", fill = "blue") +  # fun replaces the deprecated fun.y (ggplot2 3.3.0 and later)
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")

ggplot2::ggplot(data = mydata, aes(x = catvar, y = intvar)) +
  geom_violin() +
  geom_boxplot(width = 0.2) +  # narrow boxplot so the violin stays visible behind it
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")

The ggstatsplot package command. If variances are equal, you can include var.equal = TRUE as an argument.

ggstatsplot::ggbetweenstats(data = mydata, x = catvar, y = intvar,
                            pairwise.comparisons = TRUE,
                            p.adjust.method = "bonferroni",
                            bf.message = FALSE,
                            title = "Title of Graph",
                            xlab = "X-axis label",
                            ylab = "Y-axis label")
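To see what the equal-variance choice changes, base R's oneway.test() offers the same kind of switch. This is only a base-R analogue for illustration (I am not claiming it is what ggstatsplot runs internally), with PlantGrowth as assumed stand-in data:

```r
# Assumption: PlantGrowth (built into R) stands in for your own data
mydata <- data.frame(intvar = PlantGrowth$weight, catvar = PlantGrowth$group)

# Welch's ANOVA: does not assume equal group variances
welch <- oneway.test(intvar ~ catvar, data = mydata, var.equal = FALSE)

# Classic one-way ANOVA: assumes equal group variances
classic <- oneway.test(intvar ~ catvar, data = mydata, var.equal = TRUE)

# The classic version reproduces the F statistic from aov()
aov_f <- summary(aov(intvar ~ catvar, data = mydata))[[1]][["F value"]][1]
all.equal(unname(classic$statistic), aov_f)   # TRUE

welch$parameter     # Welch adjusts the denominator df downward (fractional)
classic$parameter   # numerator df = groups - 1, denominator df = N - groups
c(welch = welch$p.value, classic = classic$p.value)
```

When the group variances look similar the two p-values are usually close; when they are not, the Welch version is the more cautious choice.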

GGally’s ggpairs() function offers another way to look at the relationship.

GGally::ggpairs(data = subset(mydata, select = c(intvar, catvar)),
                mapping = ggplot2::aes(color = catvar, alpha = .5),
                title = "Title of Graph")

A simple bar graph of group means and CIs (using the Rmisc package). You’ll obviously adjust the ggplot command based on the number of categories and their number assignment.

object <- Rmisc::summarySE(mydata, measurevar = "intvar", groupvars = c("catvar"), na.rm = TRUE)
object  # prints the summary table (N, mean, sd, se, ci per group)
ggplot2::ggplot(object, aes(x = factor(catvar), y = intvar)) +
  geom_bar(stat = "identity", fill = "gray", width = 0.8) +
  geom_errorbar(aes(ymin = intvar - ci, ymax = intvar + ci), width = .2, color = "black") +  # ci = CI half-width; use se instead for standard-error bars
  xlab("Overall x-axis groups label") +
  ylab("Y-axis label") +
  scale_x_discrete(breaks = c("1", "2", "3"),
                   labels = c("category 1 label", "category 2 label", "category 3 label"))
# Your catvar values may not be 1, 2, 3, etc. They could be any number of values; adjust accordingly.