
Nonparametric Statistics for Social Scientists


Brad Luen

goo.gl/7ORvwq

April 10, 2015

Parametric statistical models

Parametric statistical models: Models for data or populations with a few parameters to be estimated (e.g. the normal distribution, with parameters μ and σ².) Nonparametric statistical models include:

I Models that don't assume any particular kind of distribution (e.g. rank tests)

I Models that don’t make strong assumptions about the form of the relationship between variables (e.g. smoothing splines)

Significance and hypothesis tests

Start with a null hypothesis about the population.

Significance test: Is the data consistent with the null hypothesis? Calculate a P-value: What's the probability of getting data at least this extreme if the null hypothesis is true? If the P-value is small, this suggests the data is not consistent with the null.

Hypothesis test: Do we have enough evidence to reject the null hypothesis in favor of an alternative? Choose a small probability threshold before performing the test. If the P-value is less than this threshold, we have enough evidence to reject the null. Otherwise, we fail to reject it.

Significance level

Because data is random, sometimes when the null hypothesis is true, we incorrectly reject it (Type I error.) We specify a significance level α before performing the test. If the assumptions of the test are met, α is the (maximum) probability that we reject when the null hypothesis is true. However, as assumptions are rarely perfectly satisfied, the true significance level might be different from α.

Power

When the null hypothesis is false, we want to reject it with high probability. Power is the probability of rejecting the null hypothesis when it's false (under some assumption about the alternative.) It's heavily dependent on sample size: larger samples → more power → good. But it also depends on the type of test you run.

I If you’re planning a large or expensive study that will result in a hypothesis test, always estimate power before collecting the data.

Goals of statistical tests

I Getting a true significance level close to the nominal significance level. If you can't get it exactly, it's better to be a bit under than a bit over (conservatism.)

I High power for a given sample size.

I Interpretability.

The t-tests

Problem: Randomized experiment comparing a treatment group and a control group. Is the treatment group any different?¹ Classical methods:

I Student's two-sample t-test: Assume normal distributions and that the two populations have equal variances

I Welch's two-sample t-test: Assume normal distributions; variances may be equal or unequal

In both cases, the null hypothesis is that the two populations have equal means.

¹The same math holds for observational studies, but make sure you really want to do a test rather than, say, confidence intervals.

Example: Stereograms

In R:

Student's t-test: t.test(treatment, control, var.equal=TRUE)
Welch's t-test: t.test(treatment, control)

Testing at level α = 0.05, we get different results...
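As a hedged, self-contained sketch (the fusion times below are made up, not the actual stereogram data):

# Hypothetical fusion-time data standing in for the stereogram example
treatment <- c(1.8, 2.1, 2.5, 3.0, 3.4, 4.2, 5.1, 6.0)
control   <- c(2.0, 3.5, 4.8, 6.2, 7.9, 9.5, 12.1, 19.7)

t.test(treatment, control, var.equal = TRUE)  # Student's two-sample t-test
t.test(treatment, control)                    # Welch's two-sample t-test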

I In terms of level and power, Student’s test only does well when variances are similar. Welch’s test does well when variances are similar and when they’re different. If in doubt, use Welch’s.

Example: Stereograms

Welch's is reasonably robust to violations of its assumptions — even when data is non-normal, the true significance level doesn't get too high. However, if data is strongly non-normal, there may be more powerful choices. We can check normality visually by drawing boxplots and/or normal quantile plots.

boxplot(treatment, control)
qqnorm(treatment)
qqnorm(control)

If we see either of the following:

I Systematic bend in a QQ plot

I Really bad outliers

we should consider a test that doesn't rely on the normal assumption.


Rank tests

If we don’t know what (if any) kind of distribution is the correct model for the population, how do we test? One of many solutions: Use ranks

I Assign a rank of 1 to the smallest observation over both groups, 2 to the second smallest, 3 to the third smallest, . . .

I Do the two sets of ranks look similar or different?

Example: Rank tests

I Treatment: 22, 23, 26, 28

I Control: 31, 35, 39, 40

Change to ranks:

I Treatment: 1, 2, 3, 4

I Control: 5, 6, 7, 8

The treatment has all the low ranks and the control has all the high ranks. Looks like there's a difference between treatment and control.
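A minimal sketch of the ranking step in R, using the numbers above:

treatment <- c(22, 23, 26, 28)
control   <- c(31, 35, 39, 40)

# rank() over the pooled data gives 1 to the smallest value, 2 to the next, and so on
pooled_ranks <- rank(c(treatment, control))
pooled_ranks[1:4]   # treatment ranks: 1 2 3 4
pooled_ranks[5:8]   # control ranks:   5 6 7 8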

Example: Rank tests

I Treatment: 26, 27, 30, 32

I Control: 22, 25, 36, 40

Change to ranks:

I Treatment: 3, 4, 5, 6

I Control: 1, 2, 7, 8

On average, the treatment ranks and the control ranks are the same. No evidence of a systematic difference between treatment and control.

The Wilcoxon rank-sum test

The two independent sample rank test is called the Wilcoxon rank-sum test or the Mann-Whitney test. The null hypothesis is that the observations are exchangeable: that is, all permutations of the data across treatment and control are equally likely. This is automatically satisfied (for example) by a randomized experiment in which the treatment has no effect on anyone.

Example: Inherent rank data

I wish to find out if I prefer a typical Beatles song or a typical Bob Dylan song. I randomly select five songs from each artist, and rank them.

1. Beatles, “Little Child”
2. Beatles, “Do You Want to Know a Secret”
3. Beatles, “Run for Your Life”
4. Bob Dylan, “Gotta Serve Somebody”
5. Beatles, “What Goes On”
6. Beatles, “Love You To”
7. Bob Dylan, “Boots of Spanish Leather”
8. Bob Dylan, “Diamond Joe”
9. Bob Dylan, “Never Gonna Be the Same Again”
10. Bob Dylan, “Ninety Miles an Hour (Down a Dead End Street)”

Example: Inherent rank data

I Compare the summed ranks for the two artists. The Beatles’ ranks sum to 17; Dylan’s ranks sum to 38. Rank-sum: 17 (or 38)

I Compare each Beatles song to each Dylan song (5 × 5 comparisons). The Beatles win 23 times; Dylan wins 2 times. Mann-Whitney statistic: 2 (or 23)

The rank-sum and Mann-Whitney statistics lead to the same P-value.
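A small sketch computing both statistics directly from these ranks:

beatles <- c(1, 2, 3, 5, 6)    # ranks of the five Beatles songs
dylan   <- c(4, 7, 8, 9, 10)   # ranks of the five Bob Dylan songs

sum(beatles)                      # rank-sum: 17
sum(outer(beatles, dylan, "<"))   # pairs the Beatles win: 23
sum(outer(beatles, dylan, ">"))   # pairs Dylan wins (Mann-Whitney statistic): 2
# Note the relationship U = W - n(n+1)/2: here 2 = 17 - 5*6/2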

Finding the P-value

We consider all the ways of splitting the ten ranks into two groups of 5.

(1, 2, 3, 4, 5) (6, 7, 8, 9, 10)
(1, 2, 3, 4, 6) (5, 7, 8, 9, 10)

etc. Then the P-value is the proportion of these combinations that have a test statistic at least as extreme as the one we observed. If you are good at probability you can do this by hand. Otherwise, use R.
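A hedged sketch of this enumeration in R, using the Beatles/Dylan ranks; it should agree with the exact P-value wilcox.test reports below:

ranks  <- 1:10
splits <- combn(ranks, 5)    # all choose(10, 5) = 252 ways to pick the Beatles' five ranks
sums   <- colSums(splits)    # rank-sum for each possible split

observed  <- 17              # the Beatles' observed rank-sum
null_mean <- 5 * mean(ranks) # 27.5, the rank-sum expected under the null
# Two-sided P-value: proportion of splits at least as far from the null mean as observed
mean(abs(sums - null_mean) >= abs(observed - null_mean))   # about 0.032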

Finding the P-value

beatles = c(1, 2, 3, 5, 6)
dylan = c(4, 7, 8, 9, 10)
wilcox.test(beatles, dylan)

Example: Stereograms

wilcox.test(treatment, control)

We get a smaller P-value than the t-tests give. If the null hypothesis is false, this is an (after-the-fact) sign of high power.

(If you don't like the warning message and want to deal with ties exactly:

install.packages("coin")
library(coin)
wilcox_test(time ~ factor(group), data = stereograms, distribution = "exact")

This is almost always overkill, however.)

Confidence intervals

wilcox.test(treatment, control, conf.int=TRUE)

produces a 95% confidence interval for a shift parameter. That is, we assume the treatment distribution has the same spread and shape as the control distribution, but is shifted to the right or left. If this is not true, the confidence interval isn't very useful — you might as well use the Welch interval, because at least that's easy to interpret.

I The test itself does NOT require the shift alternative assumption.

Simulation: Level and power

One good reason to use R is to do simulations before gathering data to see what power you get for a particular test and a particular sample size. Then you can decide what test to use before gathering data, which is more intellectually hygienic.

e.g. You want to do a randomized experiment. You think the control group will have something like an exponential distribution with parameter 1, and the treatment group will have a distribution of the same shape and scale except shifted to the right by 0.25. Do 100,000 simulations of level 0.05 Welch t- and Wilcoxon rank-sum tests when:

1. There isn't a shift (null hypothesis)
2. There is a shift (alternative hypothesis)
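A sketch of this simulation. The per-group sample size isn't given on the slides, so n = 20 is an assumption here, and the rejection rates won't exactly match the numbers on the next two slides unless the settings do:

set.seed(1)
n_sim <- 10000    # the slides use 100,000; fewer keeps this quick
n     <- 20       # per-group sample size (an assumption, not from the slides)
alpha <- 0.05

simulate <- function(shift) {
  t_rej <- w_rej <- logical(n_sim)
  for (i in 1:n_sim) {
    control   <- rexp(n, rate = 1)
    treatment <- rexp(n, rate = 1) + shift
    t_rej[i] <- t.test(treatment, control)$p.value < alpha        # Welch t-test
    w_rej[i] <- wilcox.test(treatment, control)$p.value < alpha   # Wilcoxon rank-sum
  }
  c(welch = mean(t_rej), wilcoxon = mean(w_rej))
}

simulate(0)      # null true: estimated significance levels
simulate(0.25)   # alternative true: estimated power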

Null true: Significance level

At α = 0.05:

I Welch t-test: Reject 4.83% of the time

I Wilcoxon rank-sum test: Reject 4.95% of the time

Pretty good in both cases.

Alternative true: Power

I Welch t-test: Reject 43% of the time

I Wilcoxon rank-sum test: Reject 78% of the time

The Wilcoxon rank-sum test is much more powerful, so you should use that one.

More tests

There's a rank test for almost any circumstance. e.g. To compare K samples, the standard parametric method is the (ANOVA) F-test. The method that instead compares ranks of the K samples is the Kruskal-Wallis test:

kruskal.test(list(x1, x2, x3))

Again, this does better than the parametric test when the data is skewed or when there are heavy tails or bad outliers.
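A minimal sketch with simulated skewed data for three groups (the data and group sizes are made up):

set.seed(2)
x1 <- rexp(15, rate = 1)
x2 <- rexp(15, rate = 1)
x3 <- rexp(15, rate = 1) + 0.5   # one group shifted to the right

kruskal.test(list(x1, x2, x3))   # rank-based comparison of the K = 3 groups
oneway.test(values ~ ind, data = stack(data.frame(x1, x2, x3)))   # parametric counterpart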

Even more generally

Rank tests are part of a broader class of tests called permutation tests, where the null distribution and P-value are determined by considering all permutations of the data. Permutation tests are conservative under weak assumptions (exchangeability.) Instead of using the sum of ranks, you can design a test statistic that gives power against likely alternatives, and then use R to find the P-value. e.g. Instead of doing a t-test, do a permutation t-test. Then you no longer need the normality assumption.
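A hedged sketch of a permutation t-test, using random permutations as a Monte Carlo approximation to the full enumeration:

permutation_t_test <- function(x, y, n_perm = 10000) {
  observed <- t.test(x, y)$statistic
  pooled   <- c(x, y)
  n_x      <- length(x)
  perm_stats <- replicate(n_perm, {
    shuffled <- sample(pooled)   # reassign observations to groups at random
    t.test(shuffled[1:n_x], shuffled[-(1:n_x)])$statistic
  })
  # Two-sided P-value: proportion of permuted statistics at least as extreme as observed
  mean(abs(perm_stats) >= abs(observed))
}

# permutation_t_test(treatment, control)   # usage with two-sample data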

What is regression for?

I Describing data

I Predicting data

I Estimating causal effects (but only if you have a controlled experiment or a very, very good scientific model)


Linear regression is great at describing and predicting many kinds of data. But:

1. It only fits straight lines (or planes)
2. It's sensitive to outliers
3. The inference is difficult to interpret if the model isn't close to literally true

The first problem is the most serious, so it's the one we'll deal with.

Example: Smart TVs

Parametric fixes

I Maybe a quadratic?

I Maybe fit a linear model to the log of y?
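As a sketch, assuming made-up size and price vectors for the smart TV example:

# Made-up data standing in for TV screen size and price
size  <- c(32, 40, 43, 50, 55, 60, 65, 70, 75, 85)
price <- c(180, 250, 300, 420, 520, 700, 900, 1200, 1700, 2800)

fit_quad <- lm(price ~ poly(size, 2))   # quadratic in size
fit_log  <- lm(log(price) ~ size)       # straight line for log(price)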


Limitations of low-dimensional models

I Neither the quadratic nor the log-linear model is very flexible.

I The log-linear is still fairly interpretable; the quadratic model less so.


Overfitting

An overfitted model identifies structure that isn't really there. It's bad for prediction and worse for inference.

Nonparametric fixes

I Local regression (loess): loess(price ~ size)

I Smoothing spline: smooth.spline(size, price)


Smoothing splines

I Draw a curve near the data that has a small sum of squared errors.

I Add a penalty based on how unsmooth the curve is (based on the second derivative.)

We find the curve that does the best according to these two criteria.
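In symbols (a standard way to write this criterion; the notation is not from the slides), the smoothing spline chooses the curve f that minimizes

$$\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int f''(x)^2 \, dx$$

where the first term is the sum of squared errors, the integral measures roughness, and λ sets the trade-off between the two.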

Smoothing splines: Tuning parameter

There’s a hidden tuning parameter (in R: spar) that determines how we weigh the two criteria.

I Low spar: low smoothing — spline nearly goes through all the points

I High spar: high smoothing — nearly a straight line
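A sketch of the effect of spar on simulated data (the curved relationship below is made up):

set.seed(3)
size  <- seq(30, 85, length.out = 40)
price <- 5 * size + 0.5 * (size - 55)^2 + rnorm(40, sd = 60)

fit_wiggly <- smooth.spline(size, price, spar = 0.2)   # low spar: nearly interpolates the points
fit_stiff  <- smooth.spline(size, price, spar = 1.2)   # high spar: nearly a straight line

plot(size, price)
lines(fit_wiggly, col = "red")
lines(fit_stiff, col = "blue")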


Choosing the tuning parameter

If your aim is prediction, use cross-validation:

I Leave one data point out

I Determine what value of the tuning parameter gives you a curve that comes close to the point you left out

I Repeat this, leaving out a different data point

I Continue until you work out which value of the tuning parameter does the best overall

You can get R to do all this for you...

If your aim is description, fiddling with the smoothing parameter manually may do just as well, if not better.
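In R, smooth.spline can do the leave-one-out search itself; a sketch reusing the simulated size/price data from the spar example above:

# cv = TRUE asks for ordinary leave-one-out cross-validation;
# the default uses generalized cross-validation instead
fit_cv <- smooth.spline(size, price, cv = TRUE)
fit_cv$spar   # the tuning parameter value chosen by cross-validation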


Example: Time series

GAMs

The same principles apply when we have multiple predictors and when we have other kinds of regression (e.g. logistic.) The generalized additive model (GAM) fits a sum of splines as a regression prediction, given a probability family for the response (given the predictors.)

In R: gam() in library(mgcv)
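A hedged sketch of the kind of call involved; the data is simulated and the variable names are made up, not the actual California housing variables:

library(mgcv)

set.seed(4)
n      <- 500
income <- runif(n, 1, 10)
age    <- runif(n, 1, 50)
price  <- 100 + 30 * sqrt(income) - 0.02 * (age - 25)^2 + rnorm(n, sd = 10)
houses <- data.frame(price, income, age)

fit <- gam(price ~ s(income) + s(age), data = houses)   # one spline per predictor
summary(fit)   # approximate significance of each smooth term
plot(fit)      # one fitted curve per predictor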

Example: California housing

GAMs

Advantages of GAMs:

I More flexible, realistic models than linear regression or its relatives

I Not too bad to interpret after some practice

Disadvantages:

I Predictive accuracy isn’t as high as hardcore data mining

I Complex interactions are computationally messy

Messages to take home

I When performing statistical tests, think about power early and often.

I You don’t have to fit straight lines!
