Hypothesis Testing Using Randomization Distributions

T. Scofield
10/03/2016
Randomization Distributions in Two-Proportion Settings

By calling our setting a "two proportion" one, I mean that the data frame has two binary categorical variables, where the one that delineates which of two groups a subject comes from serves as the explanatory variable, and the other, the response variable, also has just two outcomes.

In the cocaine addiction data, we have an explanatory variable, "treatment", which has three levels: "Desipramine", "Lithium", and "Placebo." We cut that back to two by ignoring one set of patients, perhaps those receiving Desipramine, thereby giving us just two groups to consider. The response variable is "relapsed or not?", which has just two values, "yes" or "no." We focus on the relapsers. Natural hypotheses for a study to see if Lithium helps to decrease the chance of relapse are

$H_0\colon p_L - p_P = 0, \qquad H_a\colon p_L - p_P < 0.$

The sample proportions of relapsers among the lithium and placebo groups are $\hat{p}_L = 18/24$ and $\hat{p}_P = 20/24$, giving us test statistic

$\hat{p}_L - \hat{p}_P = \frac{18}{24} - \frac{20}{24} = -\frac{2}{24} \approx -0.083.$

Like the study about tapping fingers under the influence of caffeine, this study is an experiment, where the treatment (Lithium or Placebo) was randomly assigned to patients. When we generate a randomization distribution, we want to be faithful to this process, even as we take the null hypothesis into account. The mental image is of slips of paper in two bags, one bag containing the 48 relapse results (38 "yes" and 10 "no") and the other containing the 48 treatments (24 "Lithiums" and 24 "Placebos"); randomly assigning the latter to the former as we select a randomization sample achieves both goals.
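To make the two-bag image concrete, here is a minimal base-R sketch of one such random assignment. (This sketch is mine, not part of the handout; the object names relapses, treatments, and shuffled are invented for illustration.)

# one bag: the 48 relapse results (38 "yes", 10 "no")
relapses <- c(rep("yes", 38), rep("no", 10))
# the other bag: the 48 treatments (24 of each)
treatments <- c(rep("Lithium", 24), rep("Placebo", 24))
# randomly pair treatments with relapse results
shuffled <- sample(treatments)
# one randomization statistic: the difference in relapse proportions
mean(relapses[shuffled == "Lithium"] == "yes") -
  mean(relapses[shuffled == "Placebo"] == "yes")

Repeating the last two steps many times would build up a randomization distribution by hand; the mosaic commands below automate exactly this process.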
Generating a randomization distribution, however, is trickier in RStudio for this situation than in earlier scenarios, primarily because of the work we must do to prepare data for randomization samples. You may well prefer to use StatKey, the software meant to accompany the textbook, over RStudio for cases involving two proportions. I will, however, provide details in RStudio for your perusal. The main difficulty, as indicated above, is preparing data. Here are two approaches.

Approach 1: Recreate the data from scratch

We have done this sort of thing once before, back in Section 2.1. Perhaps you recall the commands.

part1 <- do(6) * data.frame(Drug="Lithium", Relapse="no")
part2 <- do(18) * data.frame(Drug="Lithium", Relapse="yes")
part3 <- do(4) * data.frame(Drug="Placebo", Relapse="no")
part4 <- do(20) * data.frame(Drug="Placebo", Relapse="yes")
addictTreatments <- rbind(part1, part2, part3, part4)

Approach 2: Filtering the supplied data frame

It turns out we don't actually need to recreate the data, as it has been supplied to us as part of the Lock5withR package in a data frame called CocaineTreatment. But working with it is not so straightforward as it would at first seem, because this data frame contains all the patients, including those who received the drug called Desipramine. We can select the desired subset by leaving out these subjects:

myFilteredData <- subset(CocaineTreatment, Drug != "Desipramine")

However, there seems to be a lingering "memory" that there were three levels for the Drug variable. You see this, for instance, when you produce a frequency table on Drug:

tally(~Drug, myFilteredData)

## Drug
## Desipramine     Lithium     Placebo
##           0          24          24

While the count of Desipramine patients is 0, we would prefer that our filtered data frame not know Desipramine is part of this study. One way to make it "forget" is to combine the removal of Desipramine patients with the droplevels() command.

myFilteredData <- droplevels(subset(CocaineTreatment, Drug != "Desipramine"))
tally(~Drug, myFilteredData)

## Drug
## Lithium Placebo
##      24      24

Now our Drug variable truly has just two levels in the myFilteredData data frame.

Once data has been prepared . . .

If you carried out the commands above, you now have two data frames, addictTreatments and myFilteredData, which can be used for our analysis. Either will work, but I will use myFilteredData.

head(myFilteredData)

##       Drug Relapse
## 25 Lithium      no
## 26 Lithium     yes
## 27 Lithium     yes
## 28 Lithium     yes
## 29 Lithium     yes
## 30 Lithium      no

We obtain our test statistic from the sample itself:

diff(prop(Relapse~Drug, data=myFilteredData))

## no.Placebo
## -0.08333333

As when dealing with the difference of two means (see the example using data from CaffeineTaps in a prior handout), our null hypothesis dictates that the drug received (Lithium vs. Placebo) is not actually a factor, and we should generate many randomization statistics by shuffling values of the explanatory variable. One randomization statistic is obtained with the command

diff(prop(Relapse~shuffle(Drug), data=myFilteredData))

## no.Placebo
## 0.1666667

and this may be repeated many times to obtain a randomization distribution:

manyDiffs <- do(5000) * diff(prop(Relapse~shuffle(Drug), data=myFilteredData))
head(manyDiffs)

##    no.Placebo
## 1  0.25000000
## 2 -0.16666667
## 3  0.25000000
## 4  0.08333333
## 5 -0.16666667
## 6  0.00000000

The column, containing 5000 randomization statistics, has been given the curious name no.Placebo. We may view a histogram and mark the region corresponding to our P-value:

histogram(~no.Placebo, data=manyDiffs, groups = no.Placebo <= -0.083333, width=.1)

[Histogram of the 5000 randomization statistics no.Placebo, with the tail at or below −0.0833 highlighted.]

nrow(subset(manyDiffs, no.Placebo <= -0.083333)) / 5000

## [1] 0.3488

This P-value, here approximately 0.35, represents the probability, in a world where Lithium does not help deter relapse into cocaine addiction, of obtaining a sample with a test statistic (difference in sample proportions) of −0.0833 or below. This P-value is not statistically significant under any of the usual significance levels α = 0.1, 0.05 or 0.01. In fact, such sample statistics would arise about 35% of the time, which makes our sample statistic appear consistent with the null hypothesis. We fail to reject the null hypothesis.
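An aside of mine, not in the original handout: if I recall the mosaic package correctly, its prop() function also accepts a logical condition, which computes the same tail proportion without the nrow(subset(...)) arithmetic:

prop(~ (no.Placebo <= -0.083333), data=manyDiffs)

This should agree with the proportion computed above (about 0.35), up to the randomness in the 5000 shuffles.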
Example: Hypothesis Test for Positive Correlation (NFL Malevolence)

The hypotheses (explained in the text, Section 4.4):

$H_0\colon \rho = 0, \qquad H_a\colon \rho > 0.$

The test statistic:

cor(ZPenYds ~ NFL_Malevolence, data=MalevolentUniformsNFL)

## [1] 0.429796

Generation of many randomization statistics:

manyCors <- do(5000) * cor(ZPenYds ~ shuffle(NFL_Malevolence), data=MalevolentUniformsNFL)
head(manyCors)

##           cor
## 1 -0.22396686
## 2 -0.39130305
## 3  0.06329420
## 4  0.11707616
## 5  0.19503326
## 6  0.09328136

histogram(~cor, data=manyCors, groups = cor >= 0.42979)

[Histogram of the 5000 randomization correlations, with the tail at or above 0.42979 highlighted.]

The P-value:

nrow(subset(manyCors, cor >= 0.42979)) / 5000

## [1] 0.0108

At the significance level α = 0.05, this result is statistically significant, and we would reject the null hypothesis in favor of the alternative, concluding that there is a positive correlation.

Example: Is the mean body temperature really 98.6°?

The hypotheses:

$H_0\colon \mu = 98.6, \qquad H_a\colon \mu \ne 98.6.$

The test statistic:

mean(~BodyTemp, data=BodyTemp50)

## [1] 98.26

The natural thing would be to simulate the bootstrap distribution for $\bar{x}$, as when we constructed a confidence interval for the population mean $\mu$:

manyMeans = do(5000) * mean(~BodyTemp, data=resample(BodyTemp50))
head(manyMeans)

##     mean
## 1 98.332
## 2 98.190
## 3 98.250
## 4 98.280
## 5 98.206
## 6 98.280

histogram(~mean, data=manyMeans)

[Histogram of the 5000 bootstrap means, centered near 98.26.]

But this cannot be a proper simulation of the null distribution, as it is not centered at the right place. It appears the center is about 98.26, the value of our point estimate $\bar{x}$, not at the hypothesized (population) mean of 98.6; a bootstrap distribution of a mean is always centered (approximately) at the sample mean. Our randomization statistics should not be the same as bootstrap statistics here, but need to be modified so that they are centered on the proposed mean 98.6. The modification can simply be that we add to each of our sample means the difference between the intended center (98.6) and where they were centered above (at the sample mean $\bar{x} = 98.26$): that is, we should add 98.6 − 98.26 = 0.34:

manyMeans = do(5000) * ( mean(~BodyTemp, data=resample(BodyTemp50)) + 0.34 )
names(manyMeans)

## [1] "result"

histogram(~result, data=manyMeans, groups = abs(result-98.6) >= 0.34)

[Histogram of the 5000 shifted means, now centered near 98.6, with both tails at least 0.34 away from 98.6 highlighted.]

We see this modified statistic has a randomization distribution centered where it ought to be if serving as the null distribution. We have attempted to shade those regions in both tails corresponding to randomization statistics at least as extreme as ours, though there are very few. We obtain the approximate P-value by calculating the area in one tail and doubling it:

nrow(subset(manyMeans, result <= 98.26)) * 2 / 5000

## [1] 0.002

Given this small P-value, we reject the null hypothesis and conclude that the actual (population) mean body temperature is something other than 98.6.

Example 4.34: A New Wrinkle on Finger Tapping and Caffeine

This example has already been done adequately. Since it was a controlled, randomized experiment in which one treatment, either caffeine or placebo, was assigned randomly to each subject, we obtained our randomization distribution in a manner that also randomly assigned treatment values while adhering to the null hypothesis that "treatment doesn't matter." We obtained one randomization statistic with the command

diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))

and an entire distribution of such statistics by repeating this command often.
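As a sketch of my own, completing that pattern in the style of the two-proportion example above: assuming the second level of the Caffeine factor is Placebo, so that diff() reports the placebo-minus-caffeine difference and "caffeine increases tap rates" corresponds to the left tail, the distribution and one-tail P-value might be obtained like so.

# repeat the shuffle many times to build the randomization distribution
manyTapDiffs <- do(5000) * diff(mean(Taps ~ shuffle(Caffeine), data=CaffeineTaps))
head(manyTapDiffs)   # check the column name do() assigned (assumed Placebo below)

# the observed statistic, from the unshuffled data
obsDiff <- diff(mean(Taps ~ Caffeine, data=CaffeineTaps))

# fraction of shuffles at least as extreme as the observed difference
nrow(subset(manyTapDiffs, Placebo <= obsDiff)) / 5000

If the factor levels are ordered the other way, the column name and the direction of the inequality change accordingly; checking head(manyTapDiffs) and the sign of obsDiff settles both.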
Example 4.34 challenges us to imagine different ways of studying the question: “Does caffeine increase tapping rates?” Surely there are other approaches besides a controlled randomized experiment.