Lecture 14: Hypothesis Testing for Proportions
API-201Z

Maya Sen
Harvard Kennedy School
http://scholar.harvard.edu/msen

Announcements

- We will be posting examples of successful final exercise memos from years past

Roadmap

- Continuing with different types of hypothesis tests
- Last time: Hypothesis tests involving means and differences in means
- Today: Hypothesis tests involving proportions
- Interpreting hypothesis tests
- Practical versus statistical significance
- Type I and Type II errors (and power, if time)

Hypothesis Testing for Proportions

- Often we are not interested in making inferences about a population mean, but rather a population proportion
- Proportions are commonly used to summarize yes-or-no outcomes:
  - True U.S. presidential approval rating
  - True support in the U.S. population for same-sex marriage
  - Proportion of people in an indigenous population who have a certain genetic marker
  - Proportion of households in a city that are below national poverty levels
- Can extend hypothesis testing to analyze all sorts of proportions
- Tweak the existing hypothesis framework only slightly

Hypothesis Testing for Proportions

- Can use hypothesis tests to explore these questions
- Objective: Use a sample of data to make inferences about the true population proportion, π (sometimes denoted p)
- Same steps as before
- Test statistics slightly different

Hypothesis Testing for Proportions

Conducted using several steps:

1. Generate your null and alternative hypotheses
2. Collect sample/s of data
3. Calculate appropriate test statistic
4. Use that to calculate a probability called a p-value
5. Use the p-value to decide whether to reject the null hypothesis and interpret results

Proportion Example: Physician Training Program

- Historically it has been difficult to attract general physicians in the U.S. to rural areas, with 52% leaving rural areas within 1 year of arriving
- You work on a new program designed to incentivize doctors to stay in rural areas via loan forgiveness and housing allowances
- A random sample of physicians in the program shows that 62 out of 130 leave within 1 year of arriving
- Based on this sample, what can you conclude about enrollees in the program compared to physicians arriving in rural areas overall?

Proportion Example: Physician Training Program

- Step 1: Null and Alternative Hypotheses
  - H0: π = 0.52
  - Ha: π ≠ 0.52 for a two-sided test (or π > 0.52 for a one-sided test)

Proportion Example: Physician Training Program

- Step 2: Collect sample data
  - Assume this is done for us already!
  - n = 130, with 62 doctors leaving within 1 year

Proportion Example: Physician Training Program

- Step 3: Conduct statistical test
  - As before, need to find the appropriate test statistic to compare our sample to the distribution of the data under the null hypothesis
  - As before, another application of the CLT

Proportion Example: Physician Training Program

- We'll work with the sample proportion (denoted π̂ or p̂) → the sample mean when all observations are 1s and 0s
- x̄ = π̂ = (Σ xᵢ) / n
- As a review, under the CLT:
  1. The sums and means of random variables have an approximately normal distribution
  2. This distribution becomes more and more normal the more observations are included in the sum or the mean
  3. This is the case even though the underlying distribution may not be normal
- Thus, under repeated sampling, π̂'s distribution will follow a normal distribution (see Lecture 11 slides for the simulation; a minimal version is sketched below)

Proportion Example: Physician Training Program

- Specifically, with a large enough sample, π̂ will follow

    π̂ ∼ N(π, π(1 − π)/n)

- (See proofs for both the mean and the variance in the Appendix)
- (Σ xᵢ follows a binomial distribution, so the parameters of the sampling distribution are derived from the parameters of a binomial distribution divided by n)
- Which means we can standardize:

    Z = (π̂ − π) / √(π(1 − π)/n)

- where Z follows a standard normal distribution

Proportion Example: Physician Training Program

- Under the assumption that the null is true (here, π = 0.52), we can calculate the test statistic using our data:

    z = (π̂ − π0) / √(π0(1 − π0)/n)
      = (62/130 − 0.52) / √(0.52(1 − 0.52)/130)
      = −0.983

Proportion Example: Physician Training Program

- Note: The test statistic does not rely on any sample variance:

    z = (π̂ − π0) / √(π0(1 − π0)/n)

- In fact, it is fully determined by n and π0 (assuming the null)
- Thus, it is more clearly OK to use the standard normal (b/c we are not using a sample standard deviation)
- But we still want to check for sufficient sample size → above n = 30 and π not close to 0 or 1:
  - n × π0 > 10 and n × (1 − π0) > 10
  - If yes, the distribution of π̂ is approximately normal, so it is OK to use a z-test (Normal)
  - If not → calculate using the binomial distribution directly (see the sketch below)
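As a rough illustration of this check, here is a minimal sketch using the example's numbers. It is my addition, not from the slides, and it assumes SciPy is available (scipy.stats.binomtest).

    # Minimal sketch (assumed, not from the slides): rule-of-thumb check for the
    # normal approximation, with an exact binomial test as the fallback.
    from scipy.stats import binomtest

    n, leavers, pi0 = 130, 62, 0.52

    if n * pi0 > 10 and n * (1 - pi0) > 10:
        print("n*pi0 and n*(1 - pi0) both exceed 10: the z-test (Normal) is reasonable")
    else:
        # Small sample: compute the p-value from the binomial distribution directly
        result = binomtest(leavers, n, p=pi0, alternative="two-sided")
        print("Exact binomial p-value:", result.pvalue)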

Proportion Example: Physician Training Program

- Step 4: Once we have calculated the test statistic, use it to calculate a p-value
- Under a two-tailed test:
  - 2 × P(Z ≤ −0.983) = 0.3256
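These numbers can be reproduced in a few lines. The following is a minimal sketch of my own (not course code), assuming Python with SciPy for the normal CDF.

    # Minimal sketch (assumed, not from the slides): one-sample z-test for a proportion.
    from math import sqrt
    from scipy.stats import norm

    n, leavers, pi0 = 130, 62, 0.52
    pi_hat = leavers / n
    z = (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)   # about -0.983
    p_two_sided = 2 * norm.cdf(-abs(z))              # about 0.3256
    print(round(z, 3), round(p_two_sided, 4))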

Proportion Example: Physician Training Program

- Step 5: Decide whether or not to reject the null hypothesis and interpret results
- Given a two-tailed test with a p-value of 0.3256, what would you do?
- Would you reject if it were a one-tailed test?

Extending to Comparing Differences in Proportions

- This example used a single proportion
- Just like in the case of a single mean, there is "some value" to compare the sample proportion to
  - Ex) Whether the program changes the share of doctors leaving rural areas (H0: π = 0.52)
  - Ex) Whether half of jobs are occupied by women (H0: π = 0.50)
- However, we can extend this to explore comparisons of proportions between groups (the analogy to a difference in means)
  - Ex) Compare voter turnout on Brexit in Scotland (not competitive) versus England (competitive)
  - Ex) Are Canadians more likely to favor tighter gun control restrictions compared to Americans?
  - Ex) Do police in the U.S. pull over black motorists more frequently than white motorists?

Randomized Controlled Trial Example

- 2006 study to determine the healing power of spirituality with coronary surgery patients & their health outcomes
- Looking at six US hospitals, patients were randomly assigned into two groups
  - 604 assigned to have volunteers pray for them → 315 developed complications
  - 597 assigned to control (no prayer) → 304 developed complications
  - Patients in both groups were "informed that they may or may not receive prayer"
- Based on these data, should we agree with skeptics who say prayer is unrelated to surgical outcomes?

Hypothesis Testing for Proportions

Conducted using several steps:

1. Generate your null and alternative hypotheses
2. Collect sample/s of data
3. Calculate appropriate test statistic
4. Use that to calculate a probability called a p-value
5. Use the p-value to decide whether to reject the null hypothesis and interpret results

Let's review these steps using the prayer and surgery randomized controlled trial (RCT) example

Randomized Controlled Trial Example

- Step 1: Null and Alternative Hypotheses
- Null Hypothesis
  - No difference (prayer not effective)
  - H0: π1 − π2 = 0 (or π1 = π2)
- Alternative Hypothesis
  - Ha: π1 − π2 ≠ 0 (or π1 ≠ π2)
  - (Or, one-sided: Ha: π1 − π2 > 0)

Randomized Controlled Trial Example

- Step 2: Collect sample data
- Done for us:

    Prayer | No Complications | Complications | Total
    Yes    |       289        |      315      |  604
    No     |       293        |      304      |  597
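For reference, a minimal sketch (my addition, not from the slides) that pulls the two complication proportions out of this table; the dictionary names are arbitrary.

    # Minimal sketch (assumed, not from the slides): sample complication proportions
    # for the prayer and no-prayer groups, taken from the table above.
    complications = {"prayer": 315, "no_prayer": 304}
    totals        = {"prayer": 604, "no_prayer": 597}

    for group in complications:
        print(group, round(complications[group] / totals[group], 4))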

Randomized Controlled Trial Example

- Step 3: Calculate the appropriate test statistic
- Again, under the CLT, if n1 and n2 are relatively large, then the estimator π̂1 − π̂2 will have an approximately normal distribution
- Also, for independent samples, the variance of the difference is just the sum of the two variances (so the standard errors combine accordingly)
- So π̂1 − π̂2 will follow:

    π̂1 − π̂2 ∼ N(π1 − π2, π1(1 − π1)/n1 + π2(1 − π2)/n2)

- and we can standardize to

    Z = (π̂1 − π̂2 − (π1 − π2)) / √(π1(1 − π1)/n1 + π2(1 − π2)/n2)
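A minimal sketch of this two-proportion z statistic follows. It is my own illustration of the formula above, not code from the course; it assumes Python with SciPy, and the function name and arguments are ones I chose for the example.

    # Minimal sketch (assumed, not from the slides): two-sample z-test for a
    # difference in proportions, using the standard error shown above.
    from math import sqrt
    from scipy.stats import norm

    def two_prop_ztest(x1, n1, x2, n2):
        p1, p2 = x1 / n1, x2 / n2
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        z = (p1 - p2) / se                       # null: pi1 - pi2 = 0
        p_value = 2 * norm.cdf(-abs(z))          # two-tailed
        return z, p_value

    # Example call with the trial's counts (complications out of each group's total):
    # z, p_value = two_prop_ztest(315, 604, 304, 597)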

Randomized Controlled Trial Example

- Given that the null is true, π1 − π2 = 0, we can calculate the test statistic:

    z = (π̂1 − π̂2) / √(π̂1(1 − π̂1)/n1 + π̂2(1 − π̂2)/n2)

- Using our example:

    z = (315/604 − 304/597) / √((315/604)(1 − 315/604)/604 + (304/597)(1 − 304/597)/597)
      = 0.4637

Randomized Controlled Trial Example

- Step 4: Calculate the p-value
  - Note: Again, with proportions, the standard normal works well (no need to use t)
  - Let's use a two-tailed test
  - Here, p-value = 0.642813
- Step 5: Decide whether or not to reject the null hypothesis and interpret results
  - What would you think?
  - Evidence to reject the null of no effect of prayer?
  - Using a one-sided test?

Potential Issues with Hypothesis Testing

Hypothesis tests are not perfect!

- 1) Hypothesis tests test the null hypothesis, not the alternative hypothesis
- 2) They will not guarantee practical significance
- 3) They cannot guarantee full certainty regarding whether the null hypothesis is 100% false

1) Hypothesis Tests Test the Null Hypothesis

- Interpret as whether you do or do not reject the null
- NOT whether you accept the alternative
- (A bit like proof by contradiction)
- Is the p-value the probability that the null is true?

2) Practical versus Statistical Significance

I A very small p-value provides strong evidence to reject the null hypothesis

I But: statistical significance ≠ practical significance

I We can often get small p-values simply because sample sizes are large

2) Practical versus Statistical Significance

I Example: Suppose you have an intervention to increase the test scores of poor children (Group 1) versus affluent children (Group 2)

I H0 : µ1 − µ2 = 0, Ha : µ1 − µ2 ≠ 0

I An n = 1 million study yields a difference of 0.02 points on a 100-point test and a p-value < 0.00001

I → What would you do?

I Would it affect your decision if the intervention cost $1 billion? $500,000? Or were costless?
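A small sketch of why large samples can make tiny differences "significant": it computes two-sided p-values for a fixed 0.02-point difference as the per-group sample size grows. The score standard deviation of 3 points is an assumption chosen for illustration; only the 0.02 difference and the roughly 1-million scale come from the example above.

```python
# Sketch: a fixed 0.02-point gap becomes "statistically significant" once n is
# large enough. The SD of 3 points is an assumed value for illustration.
from scipy.stats import norm

diff = 0.02          # difference in mean scores (Group 1 - Group 2)
sd = 3.0             # assumed SD of individual scores in each group
for n_per_group in (10_000, 100_000, 1_000_000):
    se = sd * (2 / n_per_group) ** 0.5      # SE of the difference in means
    z = diff / se
    p = 2 * norm.sf(z)                      # two-tailed p-value
    print(f"n per group = {n_per_group:>9,}: z = {z:4.2f}, p = {p:.2g}")
```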

3) No full certainty about correctly rejecting the null

I Hypothesis tests give us p-values, i.e., how extreme our test statistic is given that the null is true

I However: they do not allow us to rule out the null hypothesis w/ 100% certainty

I Specifically, we're concerned about two scenarios:

I We reject the null hypothesis, even though the null is true

I We fail to reject the null hypothesis, even though the null is false

I Sound familiar? (Conditional statements)

Type I versus Type II Errors

                     H0 is true            H0 is not true
Reject H0            Type I error (Bad!)   Good
Do not reject H0     Good                  Type II error (Bad!)

Type I versus Type II Errors

I Both types of errors can occur w/ a hypothesis test

I Good practice: Before the study is carried out, set limits on how much error we would tolerate

Type I Error

I Type I: Null hypothesis rejected when in fact it is true

I Akin to a false positive

I Mammogram analogy: Positive test, despite not having the disease

I P(Type I error) = P(Rejecting H0 | H0 true) = α

I α: Also referred to as the level of significance; it determines the critical value of the test

I Rejection region: Values of X̄ beyond the critical value (to the left or right, depending on the test) for which the hypothesis test would reject the null

I Very common to set α = 0.05, α = 0.10, or α = 0.01

I Question: Why would we want to avoid Type I error?

I Question: How do you interpret a level of significance of 0.05?

Type II Error

I Type II: Null hypothesis is not rejected when in fact it is false

I Akin to a false negative

I Mammogram analogy: Negative test, despite having the disease

I P(Type II error) = P(Not rejecting H0 | H0 false)

I Often set at P(Type II error) = 0.20 = β
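A simulation sketch of the two error types for a one-sided z-test at α = 0.05, assuming normal data with known σ. The specific numbers (null mean 4, σ = 2, n = 16, alternative mean 6) anticipate the commuting-time power example later in the deck.

```python
# Simulation sketch: how often each error occurs for a one-sided z-test with
# alpha = 0.05. The data-generating values (means, sigma, n) are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, sigma, n, alpha = 4.0, 2.0, 16, 0.05
crit = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)   # reject if X-bar > crit

def rejection_rate(true_mu, reps=100_000):
    xbars = rng.normal(true_mu, sigma / np.sqrt(n), size=reps)
    return np.mean(xbars > crit)

print("P(reject | H0 true, mu = 4):", rejection_rate(4.0))   # ~ alpha (Type I rate)
print("P(reject | mu = 6):         ", rejection_rate(6.0))   # power = 1 - beta
print("P(Type II error | mu = 6):  ", 1 - rejection_rate(6.0))
```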

Type II Error and Power

I 1 − β = P(Rejecting H0 | H0 false)

I Probability of correctly rejecting the null

I Known as the power of the test

I Sometimes considered less important from a substantive perspective

I However: consider your specific problem → in some instances, either error may be the more costly one

I Note: The exact power calculation will depend on your alternative Ha

Power Analyses

I Power Analysis: If a given alternative hypothesis were true, how good would our test be at (correctly) rejecting the null?

I Somewhat similar to the hypothesis test set-up, but for purposes of calculating power, assume the alternative is true: P(Rejecting H0 | Ha true)

I Usually approached in one of two ways (a sketch of the first follows below):

I 1) You are asked to calculate the sample size you will need to detect an effect (difference between the null and the alternative) of a certain size

I Need to state a precise alternative hypothesis

I 2) You are asked to calculate the smallest effect (difference) you could detect given a sample size

I → Both mean that power analyses are usually done before data are collected (and $ spent!)
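A sketch of approach 1): for a one-sided z-test with known σ, the usual approximation is n = ((z_{1−α} + z_{power}) σ / effect)². The function below applies it to the commuting example on the next slides (effect = 2 hrs, σ = 2 hrs); the 80% default power is the common rule of thumb.

```python
# Sketch of approach 1): how large a sample do we need to detect a given
# effect with a target power, for a one-sided z-test with known sigma?
# Approximation: n = ((z_{1-alpha} + z_{power}) * sigma / effect)^2
import math
from scipy.stats import norm

def required_n(effect, sigma, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha)      # one-sided critical value
    z_power = norm.ppf(power)
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)

# Values from the commuting example that follows: effect = 2 hrs, sigma = 2 hrs
print(required_n(effect=2, sigma=2))               # -> 7 for 80% power
print(required_n(effect=2, sigma=2, power=0.99))   # -> 16, close to the worked example below
```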

Power Analysis Example

I You are an urban policy expert studying commuting times

I Suppose you are considering testing whether average time spent commuting is 4 hrs/week versus the alternative that it is greater due to recent construction

I Suppose further that you know (assume) the true standard deviation to be 2 hrs

I Find the power when a) the sample size equals 16 and b) the alternative hypothesis is that commuting time equals 6 hrs/week (so the effect of construction is 2 hrs)

Power Analysis Example

Steps:

I Because you'll eventually do a hypothesis test assuming the null, first calculate the rejection region under the standard set-up (with a given α value)

I Assume the alternative hypothesis is true, µ = 6

I Calculate the probability of not being in the rejection region when this alternative is true (this is β)

I And then finally calculate 1 − β to get the power

I Note: Can repeat for different values of µ (e.g., µ = 6, 7, 8, 9) to plot a power curve or power function

Power Analysis Intuition

[Figure: the sampling distribution of the test statistic under the null (labeled "Null", centered at z = 0), with "Reject" regions in the tails and a "Do not reject" region in the middle; a second distribution under the alternative (labeled "Alternative") is overlaid, with the part falling in the rejection region marked "Power" and the remainder marked "Type II Error". The z axis runs from −4 to 4.]

Power Analysis Example

I Step 1: Calculate the rejection region for the eventual (null) hypothesis test

I Remember that the test statistic for a single mean is

z = (X̄ − µ) / (σ/√n)

I Assuming a one-tailed test, we reject H0 if z > 1.645 (α = 0.05)

I Step 2: Calculate the rejection region under the null:

1.645 < (X̄ − 4) / (1/2)  →  X̄ > 4.8225

I That is, if the data have a sample mean of X̄ > 4.82, you would reject the null

Power Analysis Example

I Now, assume the alternative is true, µ = 6

I If µ were 6, how often would we get a sample mean in the rejection region of the null of µ = 4?

I That is, what is P(X̄ > 4.82) if µ = 6?

z = (X̄ − µ) / (σ/√n) = (4.8225 − 6) / (1/2) = −2.36

I Finally, P(Z < −2.36) ≈ 0.009

I So Power = 1 − P(Z < −2.36) ≈ 0.991

Power

I Note: Power depends on a) the sample size, b) your Ha (the size of the effect), and c) the type of test and the α level

I Larger sample size → more power

I Bigger difference between the null and the alternative you are testing → easier to detect, so more power for a given sample size

I A two-sided hypothesis test has less power than the corresponding one-sided test, since it is more conservative (see the sketch below)

I Rule of thumb: 80% power

I Higher power may be better, but perhaps not if it comes from changing the type of test (b/c that affects the Type I error rate)
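A brief sketch of the one-sided versus two-sided comparison, using the same example settings (null mean 4, alternative 6, σ = 2, n = 16, α = 0.05): the two-sided cutoff is farther from the null, so its power against this alternative is somewhat lower.

```python
# Sketch: compare one-sided vs. two-sided power in the commuting example,
# illustrating that the two-sided test is (slightly) less powerful here.
import numpy as np
from scipy.stats import norm

mu0, mu_alt, sigma, n, alpha = 4.0, 6.0, 2.0, 16, 0.05
se = sigma / np.sqrt(n)

# One-sided (upper-tail) test
crit_one = mu0 + norm.ppf(1 - alpha) * se
power_one = norm.sf((crit_one - mu_alt) / se)

# Two-sided test: reject if X-bar falls outside mu0 +/- z_{alpha/2} * se
z_half = norm.ppf(1 - alpha / 2)
lo, hi = mu0 - z_half * se, mu0 + z_half * se
power_two = norm.sf((hi - mu_alt) / se) + norm.cdf((lo - mu_alt) / se)

print(f"One-sided power: {power_one:.4f}")   # ~ 0.991
print(f"Two-sided power: {power_two:.4f}")   # a bit smaller
```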

Next time

I and confidence intervals

Proof of Expected Value of π̂

I If X can only take on 0, 1 values with constant probability π, then ∑ xᵢ follows a binomial distribution

I Furthermore, the expected value of a binomial is nπ (or np, in our previous notation)

I Therefore:

E[π̂] = E[(∑ xᵢ)/n] = (1/n) E[∑ xᵢ] = nπ/n = π

Proof of Variance of π̂

I The variance of a binomial is nπ(1 − π) (or np(1 − p), using our previous notation)

I Therefore:

Var[π̂] = Var[(∑ xᵢ)/n] = (1/n²) Var[∑ xᵢ] = (1/n²) × nπ(1 − π) = π(1 − π)/n
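A quick numerical check of the two results by simulation; the particular values of π and n are illustrative.

```python
# Quick numerical check of E[pi-hat] = pi and Var[pi-hat] = pi(1 - pi)/n.
# The values of pi and n are illustrative.
import numpy as np

rng = np.random.default_rng(1)
pi, n, reps = 0.3, 50, 200_000

pi_hats = rng.binomial(n, pi, size=reps) / n   # each draw is a sample proportion
print("simulated mean:    ", pi_hats.mean(), " vs pi =", pi)
print("simulated variance:", pi_hats.var(),  " vs pi(1-pi)/n =", pi * (1 - pi) / n)
```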