
14. Non-Parametric Tests

You know the story. You’ve tried for hours to normalise your data. But you can’t. And even if you can, your variances turn out to be unequal anyway. But do not dismay. There are statistical tests that you can run with non-normal, heteroscedastic data. All of the tests that we have seen so far (e.g. t-tests, ANOVA, etc.) rely on comparisons with some theoretical distribution, such as a t-distribution or F-distribution. These are known as “parametric tests,” as they make inferences about population parameters. However, there are other statistical tests that do not have to meet the same assumptions, known as “non-parametric tests.” Many of these tests are based on ranks and/or are comparisons of sample medians rather than means (as means of non-normal data are not meaningful).

Non-parametric tests are often not as powerful as parametric tests, so you may find that your p-values are slightly larger. However, if your data is highly non-normal, a non-parametric test may detect an effect that is missed in the equivalent parametric test due to falsely large variances caused by the skew of the data. There is, of course, no problem in analyzing normally distributed data with non-parametric tests also. Non-parametric tests are equally valid (i.e. equally defendable), and most have been around as long as their parametric counterparts. So, if your data does not meet the assumption of normality or homoscedasticity, and if it is impossible (or inappropriate) to transform it, you can use non-parametric tests. The assumptions of independence and randomization, of course, still apply.
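As a quick illustration of the difference (using simulated data, not the lab data set — the exponential samples and seed below are invented for this sketch):

```r
# Two right-skewed samples that differ only by a shift in location
set.seed(42)                    # make the simulation reproducible
a <- rexp(30, rate = 1)         # skewed sample (median of Exp(1) is ~0.69)
b <- rexp(30, rate = 1) + 0.5   # same shape, shifted upward

t.test(a, b)$p.value        # parametric: assumes normality
wilcox.test(a, b)$p.value   # rank-based: no normality assumption
```

With strongly skewed samples like these, the rank-based test can give the smaller p-value, for the reason described above.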

14.1 Non-Normal Data

For this lab, we will work with tree improvement data from a tree breeding program in British Columbia. The data we will work with has two categorical independent variables: three species of spruce tree (SPECIES) and two different site moisture conditions (SITE). Tree heights were measured after 1 year (HEIGHT).

Download the “non-parametric.csv” data from the website. Import it into R as “data” and attach it. You can use boxplots to see that the data is not normal for many treatment levels. We will assume for this lab that we cannot or do not want to transform the data (perhaps we want to maintain the original units). You can also use a Shapiro test to verify quantitatively that the data is not normal in many cases:

boxplot(HEIGHT~SPECIES)
boxplot(HEIGHT~SITE)
shapiro.test(data[data$SPECIES=="engelman","HEIGHT"])
shapiro.test(data[data$SPECIES=="black","HEIGHT"])
shapiro.test(data[data$SPECIES=="white","HEIGHT"])
# Repeat for SITE treatment levels

14.2 Wilcoxon Rank Sum Test (alternative to t-test when distributions are similarly shaped)

The Wilcoxon test (a.k.a. the Mann-Whitney test) is very common and is the non-parametric equivalent of a t-test. However, you should only use this test when your distributions are similarly shaped (e.g. when your samples are skewed in the same direction). To start, we will compare just the two levels of the SITE treatment: xeric (dry) and mesic (moist). We will do this first with a t-test for comparison, then with a Wilcoxon test.

Create an R object of the height after the first year (HEIGHT) for each of the SITE treatments:

xeric <- data[data$SITE=="xeric","HEIGHT"]
mesic <- data[data$SITE=="mesic","HEIGHT"]

First, we will run a one-sample test using both the t-test and the Wilcoxon test in R. We will compare a height measurement of 0.46 to the rest of the xeric heights. How do the results of the two tests compare?

t.test(xeric,mu=0.46)
wilcox.test(xeric,mu=0.46)

Now, run the non-parametric equivalent of a two-sample t-test using the Wilcoxon test. (This is often also referred to as a Mann-Whitney test.) Compare the tree heights between the xeric and mesic sites with a t-test (which assumes normality for both samples) and with a Wilcoxon test. What do you notice?

t.test(xeric,mesic)
wilcox.test(xeric,mesic)

Wilcoxon tests can also be run as one-tailed tests. We simply add the alternative hypothesis to the statement, just as we did with the t-test. In this case, our alternative hypothesis is that the xeric site produces larger trees:

wilcox.test(xeric,mesic,alternative="greater")
# And compare to the same t-test:
t.test(xeric,mesic,alternative="greater")

We can also run a paired test with a Wilcoxon test (though not with these data), just as we did with a t-test. To do this, we would simply add the option paired=TRUE to the command (just like a t-test). Note that there is also an exact option. By default (exact=NULL), R computes an exact p-value when the samples are small and there are no ties; for larger samples it falls back on a normal approximation. If an exact test would be too slow for your data, you can force the approximation by setting exact=FALSE. Otherwise the test may run for a very long time!
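As a sketch of the paired form (the before/after vectors below are invented for illustration and are not part of the lab data):

```r
# Hypothetical repeated height measurements on the same five trees
before <- c(0.41, 0.52, 0.38, 0.60, 0.47)
after  <- c(0.55, 0.61, 0.40, 0.72, 0.50)

# Paired Wilcoxon (signed-rank) test, one-tailed:
# alternative hypothesis is that heights increased
wilcox.test(before, after, paired = TRUE, alternative = "less")

# For large samples, force the normal approximation instead of an exact test
wilcox.test(before, after, paired = TRUE, exact = FALSE)
```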

14.3 Kolmogorov-Smirnov Test (alternative to t-test when distributions are of different shape)

The Wilcoxon rank sum test is not suitable if you have distributions/samples that do not look roughly similar in shape (i.e. they don’t have similar skew). In that case, you can use the Kolmogorov-Smirnov test, which is again somewhat less powerful.

Coding the Kolmogorov-Smirnov test in R, using the ks.test() command, uses basically the same syntax as the t.test() and wilcox.test() commands. We’ll try using it to compare the same samples (xeric and mesic) as above. Note that the one-sample form of ks.test() compares a sample to a theoretical distribution rather than to a single reference value, so for a one-sample location test you can always use a Wilcoxon test instead.

# Two-sample, two-tailed test
ks.test(xeric,mesic)
# Two-sample, one-tailed test
# Note that the order of the variables must be reversed (or use "less")
ks.test(xeric,mesic,alternative="greater")

14.4 Kruskal-Wallis Test (alternative to one-way ANOVA for non-normal distributions)

The Kruskal-Wallis test is an extension of the Wilcoxon rank sum test to more than two treatment levels, just as a parametric ANOVA is a kind of extension of the parametric t-test. However, the K-W test may only be used as an equivalent of a one-way ANOVA; it is not capable of comparing more than one treatment (more than one factor).

Use a K-W test to compare the heights of the three spruce species:

kruskal.test(HEIGHT~SPECIES)
# And compare the output to a parametric ANOVA
summary(aov(HEIGHT~SPECIES))

As in ANOVA, a significant treatment effect indicates that at least one population differs from another (we compare medians, as means are not very meaningful in skewed data). So, it must be followed up by pairwise Wilcoxon tests comparing each pair of treatment levels; because you are running multiple comparisons, you will have to adjust your p-values accordingly.
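One convenient way to do this is with pairwise.wilcox.test() from base R’s stats package, which runs all pairwise Wilcoxon tests and adjusts the p-values for multiple comparisons (Holm’s method by default). This sketch assumes the lab data are still attached, so HEIGHT and SPECIES are visible:

```r
# All pairwise Wilcoxon tests between species, with adjusted p-values
pairwise.wilcox.test(HEIGHT, SPECIES)
# Or specify the adjustment method explicitly, e.g. a Bonferroni correction
pairwise.wilcox.test(HEIGHT, SPECIES, p.adjust.method = "bonferroni")
```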

14.5 Friedman’s Rank Test (alternative to two-way ANOVA with blocks)

There are no non-parametric statistical tests for multi-factor experimental designs (that I know of, anyway). However, there is one non-parametric test for analyzing a randomized complete block design: Friedman’s Rank Test.

In R, you can only run the Friedman Rank Test if there are no replications within blocks. So, we will add some blocks to the lab data just for the sake of this example:

# Use the built-in LETTERS vector to repeat capital letters in a vector
data$BLOCK = rep(LETTERS[1:8],3)
# You will have to detach & reattach the data to refer to BLOCK
detach(data)
attach(data)

Now, you can run the blocked non-parametric ANOVA with Friedman’s test. Note that SPECIES and BLOCK (or whatever you name the variables) cannot be stated in reverse order:

friedman.test(HEIGHT~SPECIES|BLOCK)
# Compare this to a parametric blocked ANOVA
summary(aov(HEIGHT~SPECIES+BLOCK))

CHALLENGE:
1. Why are parametric tests considered stronger statistical tests?
2. What should your steps be before using a non-parametric test?
3. How do the p-values of a t-test and a Wilcoxon test compare?