Randomization Distributions
Section 4.4 Creating Randomization Distributions
Statistics: Unlocking the Power of Data Lock5

Randomization Distributions
p-values can be calculated by randomization distributions:
• simulate samples, assuming H0 is true
• calculate the statistic of interest for each sample
• find the p-value as the proportion of simulated statistics as extreme as the observed statistic
Today we’ll see ways to simulate randomization samples for a variety of situations

Cocaine Addiction
• In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug), or Lithium (an existing drug), and then followed to see who relapsed
• Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

Cocaine Addiction
• What are the null and alternative hypotheses?
  pD, pL: proportion of cocaine addicts who relapse after taking Desipramine or Lithium, respectively
  H0: pD = pL
  Ha: pD < pL
• What are the possible conclusions?
  Reject H0: Desipramine is better than Lithium
  Do not reject H0: We cannot determine from these data whether Desipramine is better than Lithium

[Figure: the experiment in three steps; R = Relapse, N = No Relapse]
1. Randomly assign units to treatment groups (Desipramine, Lithium)
2. Conduct experiment
3. Observe relapse counts in each group
Desipramine: 10 relapse, 14 no relapse
Lithium: 18 relapse, 6 no relapse
p̂D − p̂L = 10/24 − 18/24 = −0.333

Measuring Evidence against H0
To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Cocaine Addiction
• “by random chance” means by the random assignment to the two treatment groups
• “if H0 were true” means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken)
• Simulate what would happen just by random chance, if H0 were true…

[Figure: the observed responses; Desipramine: 10 relapse, 14 no relapse; Lithium: 18 relapse, 6 no relapse]

Simulate another randomization
Desipramine: 16 relapse, 8 no relapse
Lithium: 12 relapse, 12 no relapse
p̂D − p̂L = 16/24 − 12/24 = 0.167

Simulate another randomization
Desipramine: 17 relapse, 7 no relapse
Lithium: 11 relapse, 13 no relapse
p̂D − p̂L = 17/24 − 11/24 = 0.250

Simulate Your Own Sample
In the experiment, 28 people relapsed and 20 people did not relapse.
Create cards or slips of paper with 28 “R” values and 20 “N” values.
Pool these response values together, and randomly divide them into two groups (representing Desipramine and Lithium)
Calculate your difference in proportions
Plot your statistic on the class dotplot
To create an entire randomization distribution, we simulate this process many more times with technology: StatKey

www.lock5stat.com/statkey
[Figure: StatKey randomization distribution with the p-value marked]

Randomization Distribution Center
A randomization distribution simulates samples assuming the null hypothesis is true, so a randomization distribution is centered at the value of the parameter given in the null hypothesis.

Randomization Distribution
In a hypothesis test for H0: µ = 12 vs Ha: µ < 12, we have a sample with n = 45 and x̄ = 10.2.
• What do we require about the method to produce randomization samples? µ = 12
• Where will the randomization distribution be centered? 12

Randomization Distribution
For a randomization distribution, each simulated sample should…
• be consistent with the null hypothesis
• use the data in the observed sample
• reflect the way the data were collected

Randomized Experiments
• In randomized experiments the “randomness” is the random allocation to treatment groups
• If the null hypothesis is true, the response values would be the same, regardless of treatment group assignment
• To simulate what would happen just by random chance, if H0 were true:
  o reallocate cases to treatment groups, keeping the response values the same

Observational Studies
In observational studies, the “randomness” is random sampling from the population
To simulate what would happen, just by random chance, if H0 were true:
Simulate resampling from a population in which H0 is true
How do we simulate resampling from a population when we only have sample data? Bootstrap!
How can we generate randomization samples for observational studies? Make H0 true, then bootstrap!

Body Temperatures
• µ = average human body temperature
  H0: µ = 98.6°
  Ha: µ ≠ 98.6°
• x̄ = 98.26°
• We can make the null true just by adding 98.6 − 98.26 = 0.34° to each value, to make the mean be 98.6
• Bootstrapping from this revised sample lets us simulate samples, assuming H0 is true!

Body Temperatures
• In StatKey, when we enter the null hypothesis, this shifting is automatically done for us
[Figure: StatKey randomization distribution] p-value = 0.002

Creating Randomization Samples
1. Do males exercise more hours per week than females? x̄m − x̄f = 3
2. Is blood pressure negatively correlated with heart rate? r = −0.057
State null and alternative hypotheses
Devise a way to generate a randomization sample that
• Uses the observed sample data
• Makes the null hypothesis true
• Reflects the way the data were collected

Exercise and Gender
• H0: µm = µf, Ha: µm > µf
• To make H0 true, we must make the means equal. One way to do this is to add 3 to every female value (there are other ways)
• Bootstrap from this modified sample
• In StatKey, the default randomization method is “Reallocate Groups”, but “Shift Groups” is also an option, and will do this

Exercise and Gender
[Figure: StatKey randomization distribution] p-value = 0.095

Blood Pressure and Heart Rate
• H0: ρ = 0, Ha: ρ < 0
• Two variables have correlation 0 if they are not associated.
  We can “break the association” by randomly permuting/scrambling/shuffling one of the variables
• Each time we do this, we get a sample we might observe just by random chance, if there really is no correlation

Blood Pressure and Heart Rate
p-value = 0.219
Even if blood pressure and heart rate are not correlated, we would see correlations this extreme about 22% of the time, just by random chance.

Randomization Distribution
• Paul the Octopus (single proportion): Flip a coin 8 times
• Cocaine Addiction (randomized experiment): Rerandomize cases to treatment groups, keeping response values fixed
• Body Temperature (single mean): Shift to make H0 true, then bootstrap
• Exercise and Gender (observational study): Shift to make H0 true, then bootstrap
• Blood Pressure and Heart Rate (correlation): Randomly permute/scramble/shuffle one variable

Generating Randomization Samples
• As long as the original data are used and the null hypothesis is true for the randomization samples, most methods usually give similar answers in terms of a p-value
• StatKey generates the randomizations for you, so what matters most is not understanding how to generate randomization samples, but understanding why

Summary
Randomization samples should be generated
• Consistent with the null hypothesis
• Using the observed data
• Reflecting the way the data were collected
The specific method varies with the situation, but the general idea is always the same
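
Example: reallocation for the cocaine addiction experiment
The card-shuffling procedure for the cocaine addiction data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how StatKey works internally: the function names, the 10,000 simulations, and the seed are choices made for this sketch. Only the counts (10 of 24 relapsing on Desipramine, 18 of 24 on Lithium) and the one-sided alternative Ha: pD < pL come from the slides.

```python
import random


def diff_in_proportions(responses, n_first):
    """Relapse proportion in the first n_first responses minus that in the rest."""
    first, second = responses[:n_first], responses[n_first:]
    return first.count("R") / len(first) - second.count("R") / len(second)


def randomization_p_value(observed, responses, n_first, n_sims=10_000, seed=0):
    """Lower-tail p-value from reallocating responses to the two groups.

    n_sims and seed are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)
    pool = list(responses)  # keep the response values fixed, as H0 implies
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pool)  # random reallocation to treatment groups
        simulated = diff_in_proportions(pool, n_first)
        # Ha: pD < pL, so "as extreme" means at or below the observed difference
        # (small tolerance guards against floating-point ties)
        if simulated <= observed + 1e-12:
            extreme += 1
    return extreme / n_sims


# Observed data: R = relapse, N = no relapse
desipramine = ["R"] * 10 + ["N"] * 14
lithium = ["R"] * 18 + ["N"] * 6
responses = desipramine + lithium  # 28 R and 20 N in total

observed = diff_in_proportions(responses, len(desipramine))  # 10/24 - 18/24
p_value = randomization_p_value(observed, responses, len(desipramine))
print(f"observed difference = {observed:.3f}, p-value = {p_value:.3f}")
```

Each shuffle plays the role of one set of re-dealt cards, and the returned proportion is the share of simulated differences at or below the observed −0.333. The exact p-value will vary slightly with the seed and the number of simulations.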