
Nonparametric Statistics for Social Scientists


Brad Luen

goo.gl/7ORvwq

April 10, 2015

Parametric statistical models

Parametric statistical models: Models for data or populations with a few parameters to be estimated (e.g. the normal distribution, with parameters μ and σ².) Nonparametric statistical models include:

I Models that don't assume any particular kind of distribution (e.g. rank tests)

I Models that don’t make strong assumptions about the form of the relationship between variables (e.g. smoothing splines)

Significance and hypothesis tests

Start with a null hypothesis about the population.

Significance test: Is the data consistent with the null hypothesis? Calculate a P-value: What's the probability of getting data at least this extreme if the null hypothesis is true? If the P-value is small, this suggests the data is not consistent with the null.

Hypothesis test: Do we have enough evidence to reject the null hypothesis in favor of an alternative? Choose a small probability threshold before performing the test. If the P-value is less than this threshold, we have enough evidence to reject the null. Otherwise, we fail to reject it.

Significance level

Because data is random, sometimes when the null hypothesis is true, we incorrectly reject it (Type I error.) We specify a significance level α before performing the test. If the assumptions of the test are met, α is the (maximum) probability that we reject when the null hypothesis is true. However, as assumptions are rarely perfectly satisfied, the true significance level might be different from α.

Power

When the null hypothesis is false, we want to reject it with high probability. Power is the probability of rejecting the null hypothesis when it's false (under some assumption about the alternative.) It's heavily dependent on sample size: larger samples → more power → good. But it also depends on the type of test you run.

I If you’re planning a large or expensive study that will result in a hypothesis test, always estimate power before collecting the data.

Goals of statistical tests

I Getting a true significance level close to the nominal significance level. If you can't get it exactly, it's better to be a bit under than a bit over (conservatism.)

I High power for a given sample size.

I Interpretability.

The t-tests

Problem: Randomized experiment comparing a treatment group and a control group. Is the treatment group any different?¹ Classical methods:

I Student's two-sample t-test: Assume normal distributions and that the two populations have equal variances

I Welch's two-sample t-test: Assume normal distributions; variances may be equal or unequal

In both cases, the null hypothesis is that the two populations have equal means.

¹The same math holds for observational studies, but make sure you really want to do a test rather than, say, confidence intervals.

Example: Stereograms

In R:

Student's t-test: t.test(treatment, control, var.equal=TRUE)
Welch's t-test: t.test(treatment, control)

Testing at level α = 0.05, we get different results...
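As a hedged, self-contained sketch (the fusion times below are made up, not the actual stereogram data):

# Hypothetical fusion-time data standing in for the stereogram example
treatment <- c(1.8, 2.1, 2.5, 3.0, 3.4, 4.2, 5.1, 6.0)
control   <- c(2.0, 3.5, 4.8, 6.2, 7.9, 9.5, 12.1, 19.7)

t.test(treatment, control, var.equal = TRUE)  # Student's two-sample t-test
t.test(treatment, control)                    # Welch's two-sample t-test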

I In terms of level and power, Student’s test only does well when variances are similar. Welch’s test does well when variances are similar and when they’re different. If in doubt, use Welch’s.

Example: Stereograms

Welch's is reasonably robust to violations of its assumptions — even when data is non-normal, the true significance level doesn't get too high. However, if data is strongly non-normal, there may be more powerful choices. We can check normality visually by drawing boxplots and/or normal quantile plots.

boxplot(treatment, control)
qqnorm(treatment)
qqnorm(control)

If we see either of the following:

I Systematic bend in a QQ plot

I Really bad outliers

we should consider a test that doesn't rely on the normal assumption.


Rank tests

If we don’t know what (if any) kind of distribution is the correct model for the population, how do we test? One of many solutions: Use ranks

I Assign a rank of 1 to the smallest observation over both groups, 2 to the second smallest, 3 to the third smallest, . . .

I Do the two sets of ranks look similar or different?

Example: Rank tests

I Treatment: 22, 23, 26, 28

I Control: 31, 35, 39, 40

Change to ranks:

I Treatment: 1, 2, 3, 4

I Control: 5, 6, 7, 8

The treatment has all the low ranks and the control has all the high ranks. Looks like there's a difference between treatment and control.
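A minimal sketch of the ranking step in R, using the numbers above:

treatment <- c(22, 23, 26, 28)
control   <- c(31, 35, 39, 40)

# rank() over the pooled data gives 1 to the smallest value, 2 to the next, and so on
pooled_ranks <- rank(c(treatment, control))
pooled_ranks[1:4]   # treatment ranks: 1 2 3 4
pooled_ranks[5:8]   # control ranks:   5 6 7 8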

Example: Rank tests

I Treatment: 26, 27, 30, 32

I Control: 22, 25, 36, 40

Change to ranks:

I Treatment: 3, 4, 5, 6

I Control: 1, 2, 7, 8

On average, the treatment ranks and the control ranks are the same. No evidence of a systematic difference between treatment and control.

The Wilcoxon rank-sum test

The two independent sample rank test is called the Wilcoxon rank-sum test or the Mann-Whitney test. The null hypothesis is that the observations are exchangeable: that is, all permutations of the data across treatment and control are equally likely. This is automatically satisfied (for example) by a randomized experiment in which the treatment has no effect on anyone.

Example: Inherent rank data

I wish to find out if I prefer a typical Beatles song or a typical Bob Dylan song. I randomly select five songs from each artist, and rank them.

1. Beatles, “Little Child”
2. Beatles, “Do You Want to Know a Secret”
3. Beatles, “Run for Your Life”
4. Bob Dylan, “Gotta Serve Somebody”
5. Beatles, “What Goes On”
6. Beatles, “Love You To”
7. Bob Dylan, “Boots of Spanish Leather”
8. Bob Dylan, “Diamond Joe”
9. Bob Dylan, “Never Gonna Be the Same Again”
10. Bob Dylan, “Ninety Miles an Hour (Down a Dead End Street)”

Example: Inherent rank data

I Compare the summed ranks for the two artists. The Beatles’ ranks sum to 17; Dylan’s ranks sum to 38. Rank-sum: 17 (or 38)

I Compare each Beatles song to each Dylan song (5 × 5 comparisons). The Beatles win 23 times; Dylan wins 2 times. Mann-Whitney statistic: 2 (or 23)

The rank-sum and Mann-Whitney statistics lead to the same P-value.
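A small sketch computing both statistics directly from these ranks:

beatles <- c(1, 2, 3, 5, 6)    # ranks of the five Beatles songs
dylan   <- c(4, 7, 8, 9, 10)   # ranks of the five Bob Dylan songs

sum(beatles)                      # rank-sum: 17
sum(outer(beatles, dylan, "<"))   # pairs the Beatles win: 23
sum(outer(beatles, dylan, ">"))   # pairs Dylan wins (Mann-Whitney statistic): 2
# Note the relationship U = W - n(n+1)/2: here 2 = 17 - 5*6/2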

Finding the P-value

We consider all the ways of splitting the ten ranks into two groups of 5.

(1, 2, 3, 4, 5) (6, 7, 8, 9, 10)
(1, 2, 3, 4, 6) (5, 7, 8, 9, 10)

etc. Then the P-value is the proportion of these combinations that have a test statistic at least as extreme as the one we observed. If you are good at probability you can do this by hand. Otherwise, use R.
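A hedged sketch of this enumeration in R, using the Beatles/Dylan ranks; it should agree with the exact P-value wilcox.test reports below:

ranks  <- 1:10
splits <- combn(ranks, 5)    # all choose(10, 5) = 252 ways to pick the Beatles' five ranks
sums   <- colSums(splits)    # rank-sum for each possible split

observed  <- 17              # the Beatles' observed rank-sum
null_mean <- 5 * mean(ranks) # 27.5, the rank-sum expected under the null
# Two-sided P-value: proportion of splits at least as far from the null mean as observed
mean(abs(sums - null_mean) >= abs(observed - null_mean))   # about 0.032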

Finding the P-value

beatles = c(1, 2, 3, 5, 6)
dylan = c(4, 7, 8, 9, 10)
wilcox.test(beatles, dylan)

Example: Stereograms

wilcox.test(treatment, control)

We get a smaller P-value than the t-tests give. If the null hypothesis is false, this is an (after-the-fact) sign of high power.

(If you don't like the warning message and want to deal with ties exactly:

install.packages("coin")
library(coin)
wilcox_test(time ~ factor(group), data = stereograms, distribution = "exact")

This is almost always overkill, however.)

Confidence intervals

wilcox.test(treatment, control, conf.int=TRUE)

produces a 95% confidence interval for a shift parameter. That is, we assume the treatment distribution has the same spread and shape as the control distribution, but is shifted to the right or left. If this is not true, the confidence interval isn't very useful — you might as well use the Welch interval, because at least that's easy to interpret.

I The test itself does NOT require the shift alternative assumption.

Simulation: Level and power

One good reason to use R is to do simulations before gathering data to see what power you get for a particular test and a particular sample size. Then you can decide what test to use before gathering data, which is more intellectually hygienic.

e.g. You want to do a randomized experiment. You think the control group will have something like an exponential distribution with parameter 1, and the treatment group will have a distribution of the same shape and scale except shifted to the right by 0.25. Do 100,000 simulations of level 0.05 Welch t- and Wilcoxon rank-sum tests when:

1. There isn't a shift (null hypothesis)
2. There is a shift (alternative hypothesis)
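A sketch of this simulation. The per-group sample size isn't given on the slides, so n = 20 is an assumption here, and the rejection rates won't exactly match the numbers on the next two slides unless the settings do:

set.seed(1)
n_sim <- 10000    # the slides use 100,000; fewer keeps this quick
n     <- 20       # per-group sample size (an assumption, not from the slides)
alpha <- 0.05

simulate <- function(shift) {
  t_rej <- w_rej <- logical(n_sim)
  for (i in 1:n_sim) {
    control   <- rexp(n, rate = 1)
    treatment <- rexp(n, rate = 1) + shift
    t_rej[i] <- t.test(treatment, control)$p.value < alpha        # Welch t-test
    w_rej[i] <- wilcox.test(treatment, control)$p.value < alpha   # Wilcoxon rank-sum
  }
  c(welch = mean(t_rej), wilcoxon = mean(w_rej))
}

simulate(0)      # null true: estimated significance levels
simulate(0.25)   # alternative true: estimated power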

Null true: Significance level

At α = 0.05:

I Welch t-test: Reject 4.83% of the time

I Wilcoxon rank-sum test: Reject 4.95% of the time

Pretty good in both cases.

Alternative true: Power

I Welch t-test: Reject 43% of the time

I Wilcoxon rank-sum test: Reject 78% of the time

The Wilcoxon rank-sum test is much more powerful, so you should use that one.

More tests

There's a rank test for almost any circumstance. e.g. To compare K samples, the standard parametric method is the (ANOVA) F-test. The method that instead compares ranks of the K samples is the Kruskal-Wallis test:

kruskal.test(list(x1, x2, x3))

Again, this does better than the parametric test when the data is skewed or when there are heavy tails or bad outliers.
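A minimal sketch with simulated skewed data for three groups (the data and group sizes are made up):

set.seed(2)
x1 <- rexp(15, rate = 1)
x2 <- rexp(15, rate = 1)
x3 <- rexp(15, rate = 1) + 0.5   # one group shifted to the right

kruskal.test(list(x1, x2, x3))   # rank-based comparison of the K = 3 groups
oneway.test(values ~ ind, data = stack(data.frame(x1, x2, x3)))   # parametric counterpart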

Even more generally

Rank tests are part of a broader class of tests called permutation tests, where the null distribution and P-value are determined by considering all permutations of the data. Permutation tests are conservative under weak assumptions (exchangeability.) Instead of using the sum of ranks, you can design a test statistic that gives power against likely alternatives, and then use R to find the P-value. e.g. Instead of doing a t-test, do a permutation t-test. Then you no longer need the normality assumption.
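A hedged sketch of a permutation t-test, using random permutations as a Monte Carlo approximation to the full enumeration:

permutation_t_test <- function(x, y, n_perm = 10000) {
  observed <- t.test(x, y)$statistic
  pooled   <- c(x, y)
  n_x      <- length(x)
  perm_stats <- replicate(n_perm, {
    shuffled <- sample(pooled)   # reassign observations to groups at random
    t.test(shuffled[1:n_x], shuffled[-(1:n_x)])$statistic
  })
  # Two-sided P-value: proportion of permuted statistics at least as extreme as observed
  mean(abs(perm_stats) >= abs(observed))
}

# permutation_t_test(treatment, control)   # usage with two-sample data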

What is regression for?

I Describing data

I Predicting data

I Estimating causal effects (but only if you have a controlled experiment or a very, very good scientific model)


Linear regression is great at describing and predicting many kinds of data. But:

1. It only fits straight lines (or planes)
2. It's sensitive to outliers
3. The inference is difficult to interpret if the model isn't close to literally true

The first problem is the most serious, so it's the one we'll deal with.

Example: Smart TVs

Parametric fixes

I Maybe a quadratic?

I Maybe fit a linear model to the log of y?
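As a sketch, assuming made-up size and price vectors for the smart TV example:

# Made-up data standing in for TV screen size and price
size  <- c(32, 40, 43, 50, 55, 60, 65, 70, 75, 85)
price <- c(180, 250, 300, 420, 520, 700, 900, 1200, 1700, 2800)

fit_quad <- lm(price ~ poly(size, 2))   # quadratic in size
fit_log  <- lm(log(price) ~ size)       # straight line for log(price)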


Limitations of low-dimensional models

I Neither the quadratic nor the log-linear model is very flexible.

I The log-linear is still fairly interpretable; the quadratic model less so.


Overfitting

An overfitted model identifies structure that isn't really there. It's bad for prediction and worse for inference.

Nonparametric fixes

I Local regression (loess): loess(price ~ size)

I Smoothing spline: smooth.spline(size, price)


Smoothing splines

I Draw a curve near the data that has a small sum of squared errors.

I Add a penalty based on how unsmooth the curve is (based on the second derivative.)

We find the curve that does the best according to these two criteria.
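In symbols (a standard way to write this criterion; the notation is not from the slides), the smoothing spline chooses the curve f that minimizes

$$\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int f''(x)^2 \, dx$$

where the first term is the sum of squared errors, the integral measures roughness, and λ sets the trade-off between the two.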

Smoothing splines: Tuning parameter

There’s a hidden tuning parameter (in R: spar) that determines how we weigh the two criteria.

I Low spar: low smoothing — spline nearly goes through all the points

I High spar: high smoothing — nearly a straight line
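A sketch of the effect of spar on simulated data (the curved relationship below is made up):

set.seed(3)
size  <- seq(30, 85, length.out = 40)
price <- 5 * size + 0.5 * (size - 55)^2 + rnorm(40, sd = 60)

fit_wiggly <- smooth.spline(size, price, spar = 0.2)   # low spar: nearly interpolates the points
fit_stiff  <- smooth.spline(size, price, spar = 1.2)   # high spar: nearly a straight line

plot(size, price)
lines(fit_wiggly, col = "red")
lines(fit_stiff, col = "blue")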


Choosing the tuning parameter

If your aim is prediction, use cross-validation:

I Leave one data point out

I Determine what value of the tuning parameter gives you a curve that comes close to the point you left out

I Repeat this, leaving out a different data point

I Continue until you work out which value of the tuning parameter does the best overall

You can get R to do all this for you...

If your aim is description, fiddling with the smoothing parameter manually may do just as well, if not better.
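In R, smooth.spline can do the leave-one-out search itself; a sketch reusing the simulated size/price data from the spar example above:

# cv = TRUE asks for ordinary leave-one-out cross-validation;
# the default uses generalized cross-validation instead
fit_cv <- smooth.spline(size, price, cv = TRUE)
fit_cv$spar   # the tuning parameter value chosen by cross-validation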


Example: Time series

GAMs

The same principles apply when we have multiple predictors and when we have other kinds of regression (e.g. logistic.) The generalized additive model (GAM) fits a sum of splines as a regression prediction, given a probability family for the response (given the predictors.)

In R: gam() in library(mgcv)
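A hedged sketch of the kind of call involved; the data is simulated and the variable names are made up, not the actual California housing variables:

library(mgcv)

set.seed(4)
n      <- 500
income <- runif(n, 1, 10)
age    <- runif(n, 1, 50)
price  <- 100 + 30 * sqrt(income) - 0.02 * (age - 25)^2 + rnorm(n, sd = 10)
houses <- data.frame(price, income, age)

fit <- gam(price ~ s(income) + s(age), data = houses)   # one spline per predictor
summary(fit)   # approximate significance of each smooth term
plot(fit)      # one fitted curve per predictor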

Example: California housing

GAMs

Advantages of GAMs:

I More flexible, realistic models than linear regression or its relatives

I Not too bad to interpret after some practice

Disadvantages:

I Predictive accuracy isn’t as high as hardcore data mining

I Complex interactions are computationally messy

Messages to take home

I When performing statistical tests, think about power early and often.

I You don’t have to fit straight lines!
