How Surprised is the Skeptic? Simulating the Skeptic’s World

STAT 113 Hypothesis Testing II

The World According to the Null Hypothesis

Colin Reimer Dawson

October 19, 2020

Outline

• How Surprised is the Skeptic?
• Simulating the Skeptic’s World
  • Randomization Distribution

The Lady Tasting Tea

At a 1920s party in Cambridge, UK, a lady (Dr. Muriel Bristol, a phycologist specializing in algae) claimed she could tell whether a cup of tea had been prepared by adding milk before or after the tea was poured. A statistician, R.A. Fisher, also in attendance, proposed a blind taste test with 10 cups of tea, each prepared in random order.

• How much success is enough to believe her?

The Null Hypothesis

• R.A. Fisher: Formulate the “most boring” hypothesis about the world/process/population (“nothing to see here; moving along”)
• Try to measure how surprising the data would have been if the “boring” thing were true
• Fisher called this boring “antihypothesis” the null hypothesis (abbreviated as H0)

The Alternative Hypothesis

• Jerzy Neyman and Egon Pearson added the idea of a specific alternative hypothesis to this formulation
• The “alternative” is usually the hypothesis that you started with

H0: the new drug works no better than the old one

H1: the new drug works better than the old one

H0: there is no relationship between bill and tip percent

H1: there is some relationship between bill and tip percent

Hypotheses Are About Parameters

• We know what’s true about our dataset (the sample)
• Our hypotheses propose possibilities involving the wider context (population/process)
• Important: Statistical hypotheses are always about the population/process/phenomenon, not about the sample

• Correct H1: A majority of all U.S. registered voters plan to vote for Biden in November
• Incorrect H1: A majority of the registered voters in the poll plan to vote for Biden in November

Self-Check: Null and Alternative Hypotheses

For the following research claims and datasets, identify (a) the relevant parameter(s) and the context where they apply, (b) the statistic(s) that we can use to estimate the parameter(s), (c) the null hypothesis (H0), and (d) the alternative hypothesis (H1).

1. Claim: There is a positive linear association between pH and mercury in Florida lakes.
• Data: We measure pH and mercury levels in 50 random lakes in Florida.

2. Claim: Dr. Bristol can tell the difference between milk-first and tea-first preparations better than a coin flip.
• Data: She tastes 10 cups in a blind taste test.

3. Claim: Lab mice eat more on average if the room is light at meal time than if it is dark at meal time.
• Data: 20 mice are randomly split into two groups. One group is fed in the light, another in the dark. Their food intake in grams is measured.

Logic of Testing H0

• Imagine two observers:
  • a skeptic who thinks there is nothing interesting going on (they believe H0)
  • a proponent who thinks there is something interesting there (they believe H1)
• Ask which values of the statistic would surprise the proponent less than they would surprise the skeptic
• Of those, sort them in descending order according to how much they favor the proponent
• If the data yields a statistic which is sufficiently far up that list, the skeptic will change their mind

• Dr. Bristol tastes 10 cups of tea and guesses the preparation of each. What is the statistic of interest?
• Which values should surprise a proponent less than they surprise a skeptic?
• Which of those favor the proponent the most?

The P-value

• The skeptic assigns a weight to every possible value of the statistic, according to how often that value should occur in the skeptic’s model of the world
• These weights together must add up to 1
• Then, sort the statistics according to how much they favor the proponent’s explanation
• Once we have data, start adding up weights from the top of the list until we hit the observed statistic from our dataset
• The number we get to when we stop (will be between 0 and 1) is called the P-value
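The list-and-add-up procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is the number of correct guesses out of 10 cups, the skeptic's weights come from treating each guess as a fair coin flip, and the observed count of 8 is a hypothetical value chosen for the example. The function name `p_value_from_weights` is made up here.

```python
from math import comb


def p_value_from_weights(weights, ordered_values, observed):
    """Add up the skeptic's weights from the top of the list
    down to (and including) the observed statistic."""
    total = 0.0
    for value in ordered_values:
        total += weights[value]
        if value == observed:
            return total
    raise ValueError("observed value is not on the list")


n_cups = 10  # number of cups in the tea example

# Skeptic's weights (assumption for illustration): the chance of each
# possible number of correct guesses if every guess is a fair coin flip.
weights = {k: comb(n_cups, k) / 2 ** n_cups for k in range(n_cups + 1)}

# The weights must add up to 1.
assert abs(sum(weights.values()) - 1.0) < 1e-12

# Order the possible statistics from most to least favorable to the
# proponent: more correct guesses come first.
ordered_values = sorted(weights, reverse=True)

observed = 8  # hypothetical observed number of correct guesses
p_value = p_value_from_weights(weights, ordered_values, observed)
print(f"P-value for {observed} correct out of {n_cups}: {p_value:.4f}")
```

With these assumptions the running total covers the weights for 10, 9, and 8 correct guesses, so the result is necessarily between 0 and 1.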

Formal Definition: P-value

The probability of obtaining a result that favors H1 over H0 at least as much as what was actually observed, based on H0’s model of potential datasets

Slightly Less Formal Definition: P-value

The combined weight assigned by the skeptic to potential statistics on the list up to and including the statistic we actually got (where the potential statistics have been ordered from most to least favorable to the proponent)
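For the tea-tasting test, the formal definition can be written as an equation. This is a sketch with notation introduced here rather than taken from the slides: X is the number of correct guesses out of 10, x_obs is the observed count, and H0's model treats each guess as an independent fair coin flip, so larger values of X favor H1.

```latex
\[
  P\text{-value}
  \;=\; \Pr\left(X \ge x_{\text{obs}} \mid H_0\right)
  \;=\; \sum_{k = x_{\text{obs}}}^{10} \binom{10}{k} \left(\tfrac{1}{2}\right)^{10}
\]
```

Each term in the sum is the weight the skeptic assigns to one possible count, which ties the formal definition back to the "adding up weights" description.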

Simulating the Skeptic’s World

• Often we can simulate the world according to the skeptic (H0) in order to figure out the weights assigned to each potential value of the statistic
• Physical simulations: Coin flips, dice, cards, etc.
• Computer simulations: R, StatKey

Randomization Distribution

A randomization distribution is a hypothetical sampling distribution generated from a population/process which is constructed so as to be consistent with the skeptic’s (H0’s) worldview

• The randomization distribution tells us how likely any particular value of the statistic of interest would be if the skeptic (H0) were correct

Simulating a Randomization Distribution

• Suppose Dr. Bristol guessed correctly for 9 out of 10 cups of tea
• Because there are two choices, one of which is correct, each cup can be modeled by a coin flip if she were guessing randomly
• We can therefore construct a virtual world in which the taster is guessing randomly by flipping a coin 10 times
• By repeating this simulation several thousand times, we can see how often each value of the statistic “number of correct responses” occurs
• Then, we can get a P-value by finding the proportion of those simulations that yield between 9 and 10 correct guesses
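The simulation described above could be run in Python roughly as follows. This is a minimal sketch, not the StatKey procedure itself: the function names, the 10,000 repetitions, and the random seed are choices made for this illustration.

```python
import random
from collections import Counter


def simulate_correct_guesses(n_cups, rng):
    """One simulated taste test in the skeptic's world:
    each guess is correct with probability 1/2, like a coin flip."""
    return sum(rng.random() < 0.5 for _ in range(n_cups))


def randomization_p_value(observed, n_cups=10, n_sims=10_000, seed=113):
    """Estimate the P-value as the proportion of simulated tests with
    at least as many correct guesses as were observed."""
    rng = random.Random(seed)  # seed is arbitrary; fixed for repeatability
    sims = [simulate_correct_guesses(n_cups, rng) for _ in range(n_sims)]
    at_least_as_extreme = sum(s >= observed for s in sims)
    return at_least_as_extreme / n_sims, sims


p_value, sims = randomization_p_value(observed=9)

# Randomization distribution: how often each number of correct
# responses occurred across the simulated taste tests.
counts = Counter(sims)
for k in range(11):
    print(f"{k:2d} correct: {counts[k] / len(sims):.4f}")

print(f"Estimated P-value for 9 of 10 correct: {p_value:.4f}")
```

Because the result comes from random simulation, the estimate varies a little with the seed and the number of repetitions; under the coin-flip model the exact value it approximates is 11/1024, or about 0.011.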