
Section 4.4

Creating Randomization Distributions

Statistics: Unlocking the Power of Data Lock5

Randomization Distributions

• p-values can be calculated by randomization distributions:
  o simulate samples, assuming H0 is true
  o calculate the statistic of interest for each simulated sample
  o find the p-value as the proportion of simulated statistics as extreme as the observed statistic
• Today we’ll see ways to simulate randomization samples for a variety of situations

Cocaine Addiction

• In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug) or Lithium (an existing drug), and then followed to see who relapsed

• Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?


Cocaine Addiction

• What are the null and alternative hypotheses?
  pD, pL: proportion of cocaine addicts who relapse after taking Desipramine or Lithium, respectively
  H0: pD = pL
  Ha: pD < pL
• What are the possible conclusions?
  o Reject H0: Desipramine is better than Lithium
  o Do not reject H0: we cannot determine from these data whether Desipramine is better than Lithium

1. Randomly assign units to treatment groups (Desipramine, Lithium)

[Figure: the 48 participants being randomly divided into a Desipramine group and a Lithium group]

1. Randomly assign units to treatment groups (Desipramine, Lithium)
2. Conduct experiment
3. Observe relapse counts in each group

R = Relapse, N = No Relapse

[Figure: outcomes by group. Desipramine: 10 relapse, 14 no relapse. Lithium: 18 relapse, 6 no relapse.]

$\hat{p}_D - \hat{p}_L = \frac{10}{24} - \frac{18}{24} = -0.333$

Measuring Evidence against H0

To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Cocaine Addiction

• “by random chance” means by the random assignment to the two treatment groups
• “if H0 were true” means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken)
• Simulate what would happen just by random chance, if H0 were true…

[Figure: observed outcomes. Desipramine: 10 relapse, 14 no relapse. Lithium: 18 relapse, 6 no relapse.]

Simulate another randomization

[Figure: reallocated outcomes. Desipramine: 16 relapse, 8 no relapse. Lithium: 12 relapse, 12 no relapse.]

$\hat{p}_D - \hat{p}_L = \frac{16}{24} - \frac{12}{24} = 0.167$

Simulate another randomization

[Figure: reallocated outcomes. Desipramine: 17 relapse, 7 no relapse. Lithium: 11 relapse, 13 no relapse.]

$\hat{p}_D - \hat{p}_L = \frac{17}{24} - \frac{11}{24} = 0.250$

Simulate Your Own Sample

• In the experiment, 28 people relapsed and 20 people did not relapse. Create cards or slips of paper with 28 “R” values and 20 “N” values.
• Pool these response values together, and randomly divide them into two groups of 24 (representing Desipramine and Lithium)
• Calculate your difference in proportions
• Plot your statistic on the class dotplot
• To create an entire randomization distribution, we simulate this process many more times with technology: StatKey
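The card activity can also be sketched in a few lines of Python. This is only an illustration of the reallocation idea, not how StatKey is implemented; the function names, the number of simulations, and the seed are assumptions chosen for the example. The counts (28 R, 20 N, two groups of 24, observed difference 10/24 − 18/24) come from the experiment.

```python
import random

def diff_in_proportions(outcomes, n_first):
    """Relapse proportion in the first group minus that in the second group."""
    first, second = outcomes[:n_first], outcomes[n_first:]
    return first.count("R") / len(first) - second.count("R") / len(second)

def randomization_distribution(n_sims=10_000, seed=0):
    """Re-deal the 48 pooled outcomes into two groups of 24, many times."""
    rng = random.Random(seed)          # seed is arbitrary, for repeatability
    cards = ["R"] * 28 + ["N"] * 20    # pooled responses from the experiment
    stats = []
    for _ in range(n_sims):
        rng.shuffle(cards)             # random reallocation to groups
        stats.append(diff_in_proportions(cards, 24))  # first 24 = "Desipramine"
    return stats

observed = 10 / 24 - 18 / 24           # -0.333, from the actual experiment
stats = randomization_distribution()

# Ha: pD < pL, so "as extreme" means as small as the observed difference.
# The tiny tolerance guards against floating-point ties.
p_value = sum(s <= observed + 1e-12 for s in stats) / len(stats)
print(f"observed = {observed:.3f}, estimated p-value = {p_value:.4f}")
```

Each run with a different seed gives a slightly different p-value, which is expected: the randomization distribution is itself a simulation.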

www.lock5stat.com/statkey

[Figure: StatKey randomization distribution with the p-value marked]

Randomization Distribution Center

• A randomization distribution simulates samples assuming the null hypothesis is true, so a randomization distribution is centered at the value of the parameter given in the null hypothesis.

Randomization Distribution

In a hypothesis test for H0: µ = 12 vs Ha: µ < 12, we have a sample with n = 45 and $\bar{x} = 10.2$.

• What do we require about the method to produce randomization samples? It must produce samples consistent with µ = 12
• Where will the randomization distribution be centered? At 12

Randomization Distribution

For a randomization distribution, each simulated sample should…

• be consistent with the null hypothesis
• use the data in the observed sample
• reflect the way the data were collected

Randomized Experiments

• In randomized experiments the “randomness” is the random allocation to treatment groups
• If the null hypothesis is true, the response values would be the same, regardless of treatment group assignment
• To simulate what would happen just by random chance, if H0 were true:
  o reallocate cases to treatment groups, keeping the response values the same

Observational Studies

• In observational studies, the “randomness” is random sampling from the population
• To simulate what would happen, just by random chance, if H0 were true:
  o Simulate resampling from a population in which H0 is true
• How do we simulate resampling from a population when we only have sample data?
  o Bootstrap!
• How can we generate randomization samples for observational studies?
  o Make H0 true, then bootstrap!

Body Temperatures

• µ = average human body temperature
  H0: µ = 98.6°
  Ha: µ ≠ 98.6°
• $\bar{x} = 98.26$
• We can make the null true just by adding 98.6 – 98.26 = 0.34° to each value, to make the mean be 98.6
• Bootstrapping from this revised sample lets us simulate samples, assuming H0 is true!
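A minimal sketch of "shift, then bootstrap" for a single mean follows. The ten temperatures are hypothetical values made up for illustration (the real body temperature sample is not reproduced here), and the function name, simulation count, and seed are assumptions. For the two-sided alternative, this sketch counts simulated means at least as far from 98.6 as the observed mean, in either direction; doubling the smaller tail is another common convention.

```python
import random
from statistics import mean

def shifted_bootstrap_means(sample, null_mean, n_sims=5_000, seed=0):
    """Shift the sample so its mean equals null_mean, then bootstrap the mean."""
    rng = random.Random(seed)                   # arbitrary seed for repeatability
    shift = null_mean - mean(sample)            # e.g. 98.6 - 98.26 = 0.34 on the slide
    shifted = [x + shift for x in sample]       # H0 is now true for this sample
    return [
        mean(rng.choices(shifted, k=len(shifted)))  # resample with replacement
        for _ in range(n_sims)
    ]

# Hypothetical data for illustration only.
sample = [97.2, 97.8, 98.0, 98.1, 98.3, 98.4, 98.6, 98.7, 98.9, 99.0]
null_mean = 98.6

observed = mean(sample)
boot_means = shifted_bootstrap_means(sample, null_mean)

# Two-sided test: how often is a simulated mean as far from 98.6 as the observed one?
distance = abs(observed - null_mean)
p_value = sum(abs(m - null_mean) >= distance for m in boot_means) / len(boot_means)
print(f"observed mean = {observed:.2f}, estimated p-value = {p_value:.3f}")
```

Shifting changes only the center of the sample; the spread and shape that drive the variability of the bootstrap means are left as observed.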

Body Temperatures

• In StatKey, when we enter the null hypothesis, this shifting is automatically done for us

[Figure: StatKey randomization distribution, p-value = 0.002]

Creating Randomization Samples

1. Do males exercise more hours per week than females? $\bar{x}_m - \bar{x}_f = 3$
2. Is blood pressure negatively correlated with heart rate? $r = -0.057$

• State null and alternative hypotheses
• Devise a way to generate a randomization sample that
  o Uses the observed sample data
  o Makes the null hypothesis true
  o Reflects the way the data were collected

Exercise and Gender

• H0: µm = µf , Ha: µm > µf

• To make H0 true, we must make the means equal. One way to do this is to add 3 to every female value (there are other ways)
• Bootstrap from this modified sample
• In StatKey, the default randomization method is “Reallocate Groups”, but “Shift Groups” is also an option, and will do this
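The shift approach for two groups might look like the sketch below. The exercise hours are hypothetical numbers invented for the example, and resampling each group separately at its own size is an assumption about how to "bootstrap from this modified sample"; it is not a description of what StatKey does internally.

```python
import random
from statistics import mean

def shift_groups_distribution(males, females, n_sims=5_000, seed=0):
    """Shift the female values so both group means match, then bootstrap each group."""
    rng = random.Random(seed)                        # arbitrary seed
    diff = mean(males) - mean(females)               # 3 hours on the slide
    shifted_females = [x + diff for x in females]    # group means are now equal
    sims = []
    for _ in range(n_sims):
        m = rng.choices(males, k=len(males))                   # resample males
        f = rng.choices(shifted_females, k=len(females))       # resample shifted females
        sims.append(mean(m) - mean(f))
    return sims

# Hypothetical weekly exercise hours, for illustration only.
males = [12, 3, 15, 8, 10, 20, 6, 14]
females = [9, 4, 12, 7, 5, 16, 8, 10]

observed = mean(males) - mean(females)
sims = shift_groups_distribution(males, females)

# Ha: mu_m > mu_f, so "as extreme" means at least as large as the observed difference.
p_value = sum(s >= observed for s in sims) / len(sims)
print(f"observed difference = {observed:.3f}, estimated p-value = {p_value:.3f}")
```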


Exercise and Gender

p-value = 0.095

Blood Pressure and Heart Rate

• H0: ρ = 0, Ha: ρ < 0
• Two variables have correlation 0 if they are not associated. We can “break the association” by randomly permuting/scrambling/shuffling one of the variables
• Each time we do this, we get a sample we might observe just by random chance, if there really is no correlation
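Shuffling one variable to break the association can be sketched as follows. The blood pressure and heart rate values are hypothetical, and the helper names, simulation count, and seed are assumptions made for the example.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_correlations(xs, ys, n_sims=5_000, seed=0):
    """Shuffle ys repeatedly, recomputing the correlation with xs each time."""
    rng = random.Random(seed)      # arbitrary seed
    shuffled = list(ys)            # copy, so the original pairing is untouched
    sims = []
    for _ in range(n_sims):
        rng.shuffle(shuffled)      # breaks any association between xs and ys
        sims.append(pearson_r(xs, shuffled))
    return sims

# Hypothetical measurements, for illustration only.
systolic = [118, 125, 132, 110, 140, 128, 121, 135, 115, 130]
heart_rate = [72, 80, 68, 75, 70, 85, 66, 78, 74, 69]

observed = pearson_r(systolic, heart_rate)
sims = permutation_correlations(systolic, heart_rate)

# Ha: rho < 0, so "as extreme" means at least as negative as the observed r.
p_value = sum(r <= observed for r in sims) / len(sims)
print(f"observed r = {observed:.3f}, estimated p-value = {p_value:.3f}")
```

Only one variable needs to be shuffled: permuting both would produce the same set of possible pairings.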

Blood Pressure and Heart Rate

p-value = 0.219

Even if blood pressure and heart rate are not correlated, we would see correlations this extreme about 22% of the time, just by random chance.

Randomization Distribution

• Paul the Octopus (single proportion):
  o Flip a coin 8 times
• Cocaine Addiction (difference in proportions):
  o Rerandomize cases to treatment groups, keeping response values fixed
• Body Temperature (single mean):
  o Shift to make H0 true, then bootstrap
• Exercise and Gender (difference in means):
  o Shift to make H0 true, then bootstrap
• Blood Pressure and Heart Rate (correlation):
  o Randomly permute/scramble/shuffle one variable
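The simplest case in this list, flipping a coin 8 times, can be simulated as below. This section does not restate the Paul the Octopus data or hypotheses, so the sketch assumes H0: p = 0.5 vs Ha: p > 0.5 and an observed result of 8 correct predictions out of 8; treat both as assumptions, along with the function name, simulation count, and seed.

```python
import random

def coin_flip_proportions(n_flips=8, n_sims=10_000, seed=0):
    """Proportion of heads in n_flips fair coin flips, repeated n_sims times."""
    rng = random.Random(seed)   # arbitrary seed
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips  # H0: p = 0.5
        for _ in range(n_sims)
    ]

observed = 8 / 8                # assumed observed proportion (see lead-in)
props = coin_flip_proportions()

# Assumed Ha: p > 0.5, so "as extreme" means at least as large as observed.
p_value = sum(p >= observed for p in props) / len(props)
print(f"estimated p-value = {p_value:.4f}")
```

Under these assumptions the exact probability of 8 heads in 8 fair flips is $(1/2)^8 = 1/256$, so the simulated p-value should land near 0.004.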

Generating Randomization Samples

• As long as the original data are used and the null hypothesis is true for the randomization samples, most methods usually give similar answers in terms of a p-value
• StatKey generates the randomizations for you, so what matters most is not understanding how to generate randomization samples, but understanding why

Summary

• Randomization samples should be generated
  o Consistent with the null hypothesis
  o Using the observed data
  o Reflecting the way the data were collected
• The specific method varies with the situation, but the general idea is always the same
