BIO5312 Biostatistics Lecture 6: Statistical Hypothesis Testings

Yujin Chung
October 4th, 2016
Fall 2016

Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30

Previous

Two types of statistical inference:
- Estimation: concerned with estimating the values of specific population parameters. These specific values are referred to as point estimates. Sometimes interval estimation is carried out to specify an interval that likely includes the parameter value.
- Hypothesis testing: concerned with testing whether the value of a population parameter is equal to some specific value.

Hypothesis testing

Philosophy: prove a claim by contradiction.
Analogy: a "dependent love" story
- Claim: You don't love me.
- Reasoning: If you loved me, you would take the trash out every week and put your socks away.
- Data: Some weeks you don't take the trash out, or you leave your socks where they fall.
- Conclusion: You don't love me.

Statistical hypothesis testing

The hypothesis-testing framework specifies two hypotheses, the null and the alternative:
- The null hypothesis ($H_0$) is often an initial claim that researchers specify using previous research or knowledge. Typically it is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value.
- The alternative hypothesis ($H_1$) is what you might believe to be true or hope to prove true.

$H_0$: you love me vs. $H_1$: you don't love me

Hypothesis testing provides an objective framework for making decisions using probabilistic methods, rather than relying on subjective impressions.

Examples

The average cholesterol level in children is 175 mg/dL. A group of men who have died from heart disease within the past year are identified, and the cholesterol levels of their offspring are measured.
(1) Is the average cholesterol level of these children larger than 175 mg/dL?
(2) Is the average cholesterol level different from that of children whose fathers do not have a history of heart disease?
- $\mu_1$: the population mean cholesterol level in the case group; $\mu_2$: the population mean cholesterol level in the control group
- (1) $H_0: \mu_1 = 175$ vs. $H_1: \mu_1 > 175$
- (2) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$

Are the IQ and the number of finger-wrist taps (fwt) of children in the lead-exposed group different from those of children in the control group?
- $H_0$: the population means of fwt in the two groups are the same vs. $H_1$: the means are different

Four possible outcomes in hypothesis testing

            Do not reject $H_0$                      Reject $H_0$
$H_0$ true  true negative ($1-\alpha$)               false positive, Type I error ($\alpha$)
$H_1$ true  false negative, Type II error ($\beta$)  true positive, power ($1-\beta$)

Two possible errors:
- Type I error ($\alpha$): $\Pr(\text{reject } H_0 \mid H_0 \text{ true})$, commonly referred to as the significance level of a test.
- Type II error ($\beta$): $\Pr(\text{do not reject } H_0 \mid H_1 \text{ true})$

The power of a test: $1-\beta = \Pr(\text{reject } H_0 \mid H_1 \text{ true})$
We prefer a test with small $\alpha$ and large power ($1-\beta$).
Statistical hypothesis test: the test with the greatest power ($1-\beta$) among all possible tests with a given type I error $\alpha$.

t-test for the mean

We assume the cholesterol levels in children follow $N(\mu, \sigma^2)$. We wish to test whether the mean cholesterol level of children with a family history of heart disease is equal to 175 mg/dL, the average level in children without such a history, or larger than 175.
Hypotheses: $H_0: \mu = 175$ vs. $H_1: \mu > 175$.
Logic:
1. Assume $H_0$ is true.
2. If $\bar{x}$ is too large, it contradicts the assumption that $H_0$ is true.

t-test for the mean: critical-value method

$H_0: \mu = 175$ vs. $H_1: \mu > 175$.
The distribution of $\bar{X}$ under $H_0$: since $X_1, \ldots, X_n \sim N(175, \sigma^2)$,
$t = \frac{\bar{X} - 175}{S/\sqrt{n}} \sim t_{n-1}$.

[Figure: critical-value method for $H_0: \mu = 175$ vs. $H_1: \mu > 175$. Density function of $t_{n-1}$, with the acceptance region to the left of the critical value $t_{n-1,1-\alpha}$ and the rejection region to its right.]

Critical value: $t_{n-1,1-\alpha}$
If $t > t_{n-1,1-\alpha}$, reject $H_0$; if $t \le t_{n-1,1-\alpha}$, do not reject $H_0$.
Type I error: $\Pr(t > t_{n-1,1-\alpha} \mid H_0) = \alpha$.

t-test for the mean: p-value method

$H_0: \mu = 175$ vs. $H_1: \mu > 175$.
Test statistic: $t = \frac{\bar{X} - 175}{S/\sqrt{n}}$.
Under $H_0$, the test statistic satisfies $t \sim t_{n-1}$.
p-value: $\Pr(t > t_{\text{obs}} \mid H_0)$, the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic value, given that $H_0$ is true.

[Figure: p-value for the test of $H_0: \mu = 175$ vs. $H_1: \mu > 175$. Density function of $t_{n-1}$, marking the rejection region beyond $t_{n-1,1-\alpha}$ and the p-value $\Pr(t > t_{\text{obs}})$ as the area to the right of $t_{\text{obs}}$.]

If p-value $< \alpha$, reject $H_0$; if p-value $\ge \alpha$, do not reject $H_0$.

Significance level: $\alpha$

$H_0$ is rejected if $t > t_{n-1,1-\alpha}$ or, equivalently, if p-value $< \alpha$.
$\alpha$: the significance level (type I error), typically set to 0.05.
Guidelines for judging the significance of a p-value:
- If $p \ge 0.05$, the results are considered not statistically significant.
- If $0.01 \le p < 0.05$, the results are statistically significant.
- If $0.001 \le p < 0.01$, the results are highly significant.
- If $p < 0.001$, the results are very highly significant.
Report an exact p-value!
- The p-value indicates exactly how significant the results are without performing repeated significance tests at different $\alpha$ levels.
- The p-value indicates how close to statistical significance the results have come, even when they are not statistically significant.

Example: the cholesterol of children

Suppose the mean cholesterol level of 10 children whose fathers died from heart disease is 200 mg/dL and the sample standard deviation is 50 mg/dL. The average cholesterol level in children is known to be 175 mg/dL.
Is the average cholesterol level of these children larger than 175 mg/dL?
Let $\mu$ be the population mean cholesterol level of children whose fathers died from heart disease. The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu > 175$.
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and follows $t_{n-1} = t_9$ under $H_0$. The observed test statistic is $t_{\text{obs}} = \frac{200 - 175}{50/\sqrt{10}} = 1.58$.

Critical-value method: At the 5% significance level, the critical value is $t_{n-1,1-\alpha} = t_{9,0.95} = 1.833$ and the rejection region is $t > 1.833$. Since $t_{\text{obs}} = 1.58 < 1.833$, we cannot reject $H_0$ at the 5% significance level.

p-value method: The p-value is $p = \Pr(t > t_{\text{obs}} \mid H_0) = \Pr(t > 1.58 \mid H_0) = 0.074$. Since $p > 0.05$, we cannot reject $H_0$ at the 5% significance level.

One-tailed t-test for the mean

A one-tailed test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the value of the parameter under the null hypothesis ($\mu_0$), but not both.
$H_0: \mu = \mu_0$, test statistic: $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

$H_1$            Rejection region           p-value
$\mu > \mu_0$    $t > t_{n-1,1-\alpha}$     $\Pr(t > t_{\text{obs}} \mid H_0)$
$\mu < \mu_0$    $t < t_{n-1,\alpha}$       $\Pr(t < t_{\text{obs}} \mid H_0)$

Two-sided alternatives

The test for $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ is based on $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Critical-value method:
- Rejection region: if $|t| > t_{n-1,1-\alpha/2}$, then $H_0$ is rejected.
- Acceptance region: if $|t| \le t_{n-1,1-\alpha/2}$, then $H_0$ is NOT rejected.
- Type I error: $\Pr(|t| > t_{n-1,1-\alpha/2} \mid H_0) = \alpha$.
p-value method: the p-value is $\Pr(|t| > |t_{\text{obs}}| \mid H_0)$. If $p < 0.05$, then $H_0$ is rejected at the 5% significance level.

Example: two-sided alternatives (continued from the cholesterol data)

Is the average cholesterol level of children whose fathers had heart disease different from the US average cholesterol level (175) of children?
The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu \neq 175$.
The observed test statistic is $t_{\text{obs}} = 1.58$.
Critical-value method: At the 5% significance level, the rejection region is $t > 2.262$ or $t < -2.262$.
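
The numbers in the cholesterol example can be checked with a short Python sketch. It is only an illustration under stated assumptions: it uses the standard library alone, the helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical (they do not come from the lecture), and the t distribution is evaluated by numerical integration and bisection rather than by a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Pr(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value q with Pr(T <= q) = p, found by bisection on the CDF."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the cholesterol example.
n, xbar, s, mu0, alpha = 10, 200.0, 50.0, 175.0, 0.05
df = n - 1
t_obs = (xbar - mu0) / (s / math.sqrt(n))

# One-tailed test, H1: mu > mu0.
crit_one = t_quantile(1 - alpha, df)
p_one = 1 - t_cdf(t_obs, df)
reject_one = t_obs > crit_one

# Two-sided test, H1: mu != mu0.
crit_two = t_quantile(1 - alpha / 2, df)
p_two = 2 * (1 - t_cdf(abs(t_obs), df))
reject_two = abs(t_obs) > crit_two

print(f"t_obs = {t_obs:.3f}")
print(f"one-tailed: critical value {crit_one:.3f}, p-value {p_one:.3f}, reject H0: {reject_one}")
print(f"two-sided:  critical value {crit_two:.3f}, p-value {p_two:.3f}, reject H0: {reject_two}")
```

Up to rounding, the printed statistic, critical values, and one-tailed p-value should agree with the values quoted above (1.58, 1.833, 2.262, and 0.074). For a symmetric distribution such as t, the two-sided p-value is twice the one-tailed one, and in this sketch neither test rejects $H_0$ at the 5% level.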
