BIO5312 Biostatistics Lecture 6: Statistical Testing

Yujin Chung

October 4th, 2016

Fall 2016

Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30

Previous lecture

Two types of statistical inference:
- Estimation: concerned with estimating the values of specific parameters. These specific values are referred to as point estimates. Sometimes, interval estimation is carried out to specify an interval that is likely to include the parameter value.
- Hypothesis testing: concerned with testing whether the value of a population parameter is equal to some specific value.

Hypothesis testing

Philosophy: prove a claim by contradiction.

Analogy: a “dependent love” story
- Claim: You don’t love me.
- Reasoning: If you loved me, you would take the trash out every week and put your socks away.
- Observation: Some weeks you don’t take the trash out, or you leave your socks where they fall.
- Conclusion: You don’t love me.

Statistical hypothesis testing

A hypothesis-testing framework specifies two hypotheses: the null hypothesis and the alternative hypothesis.

- The null hypothesis ($H_0$) is often an initial claim that researchers specify using previous research or knowledge. Typically, it is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value.
- The alternative hypothesis ($H_1$) is what you might believe to be true or hope to prove true.

In the analogy: $H_0$: you love me vs. $H_1$: you don’t love me.

Hypothesis testing provides an objective framework for making decisions using probabilistic methods, rather than relying on subjective impressions.

Examples

Example 1: The average cholesterol level in children is 175 mg/dL. A group of men who have died from heart disease within the past year are identified, and the cholesterol levels of their offspring are measured.
(1) Is the average cholesterol level of these children larger than 175 mg/dL?
(2) Is the average cholesterol level of these children different from that of children whose fathers do not have a history of heart disease?

- $\mu_1$: the population mean cholesterol level in the case group (children whose fathers died from heart disease)
- $\mu_2$: the population mean cholesterol level in the control group (children whose fathers have no history of heart disease)
- (1) $H_0: \mu_1 = 175$ vs. $H_1: \mu_1 > 175$
- (2) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$

Example 2: Are the IQ and the number of finger-wrist taps (fwt) of children in the lead-exposed group different from those of children in the control group?

- $H_0$: the population mean fwt is the same in the two groups vs. $H_1$: the population means are different.

Four possible outcomes in hypothesis testing

              Do not reject H0                     Reject H0
H0 true       True negative (1 − α)                False positive: type I error (α)
H1 true       False negative: type II error (β)    True positive: power (1 − β)

Two possible errors:
- Type I error ($\alpha$): $\Pr(\text{reject } H_0 \mid H_0 \text{ true})$, commonly referred to as the significance level of a test.
- Type II error ($\beta$): $\Pr(\text{do not reject } H_0 \mid H_1 \text{ true})$.
- Power: $1 - \beta = \Pr(\text{reject } H_0 \mid H_1 \text{ true})$.

We prefer a test with small $\alpha$ and large power $1 - \beta$. Among all possible tests with a given type I error $\alpha$, the best (most powerful) test is the one with the greatest power $1 - \beta$.
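The definitions of $\alpha$ and power can be made concrete by simulation: generate many data sets under $H_0$ (or under one specific alternative) and count how often the test rejects. The sketch below is a minimal illustration, assuming a one-sided z-test with known $\sigma = 50$, $n = 10$, $\mu_0 = 175$ and an alternative mean of 200 (values chosen to resemble the cholesterol example in this lecture, with $\sigma$ treated as known); the function name `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, mu0=175.0, sigma=50.0, n=10, alpha=0.05,
                   reps=20000, seed=1):
    """Monte Carlo estimate of Pr(reject H0) for the one-sided z-test of
    H0: mu = mu0 vs. H1: mu > mu0 when the data really have mean true_mu."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value z_{1-alpha}
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if z > z_crit:
            rejections += 1
    return rejections / reps


alpha_hat = rejection_rate(true_mu=175.0)  # H0 true: estimates the type I error
power_hat = rejection_rate(true_mu=200.0)  # H1 true (mu = 200): estimates the power
print(f"estimated type I error: {alpha_hat:.3f}")
print(f"estimated power:        {power_hat:.3f}")
```

With these settings the estimated type I error should land near the nominal 0.05, while the power at $\mu = 200$ is roughly one half, so a type II error would be common.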

t-Test for the Mean

We assume the cholesterol levels in children follow $N(\mu, \sigma^2)$. We wish to test whether the mean cholesterol level of children with a family history of heart disease is the same as 175 mg/dL, the average for children without such a family history, or larger than 175 mg/dL.

Hypotheses: $H_0: \mu = 175$ vs. $H_1: \mu > 175$.

Logic:
1. Assume $H_0$ is true.
2. If $\bar{x}$ is too large, this contradicts the assumption that $H_0$ is true.

t-test for the mean: critical-value method

$H_0: \mu = 175$ vs. $H_1: \mu > 175$.

The distribution of $\bar{X}$ under $H_0$: since $X_1, \ldots, X_n \sim N(175, \sigma^2)$,
$$t = \frac{\bar{X} - 175}{S/\sqrt{n}} \sim t_{n-1}.$$

[Figure: critical-value method for $H_0: \mu = 175$ vs. $H_1: \mu > 175$. Density of $t_{n-1}$, with the acceptance region to the left of the critical value $t_{n-1,1-\alpha}$ and the rejection region to its right.]

Critical value: $t_{n-1,1-\alpha}$. If $t > t_{n-1,1-\alpha}$, reject $H_0$; if $t \le t_{n-1,1-\alpha}$, do not reject $H_0$.

Type I error: $\Pr(t > t_{n-1,1-\alpha} \mid H_0) = \alpha$.
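The claim $\Pr(t > t_{n-1,1-\alpha} \mid H_0) = \alpha$ can also be checked by simulation. The sketch below assumes $n = 10$ normal observations with $\mu_0 = 175$ and $\sigma = 50$ (illustrative values) and uses the tabulated critical value $t_{9,0.95} = 1.833$ that appears in the cholesterol example; `type_one_error_rate` is a hypothetical helper name. Repeating the run with the normal cutoff 1.645 shows why the t critical value is needed once $\sigma$ is estimated by $S$.

```python
import math
import random


def type_one_error_rate(t_crit, n=10, mu0=175.0, sigma=50.0, reps=20000, seed=2):
    """Monte Carlo estimate of Pr(t > t_crit | H0) for the one-sample t statistic."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        x = [rng.gauss(mu0, sigma) for _ in range(n)]  # data generated under H0
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample SD
        t = (xbar - mu0) / (s / math.sqrt(n))
        if t > t_crit:
            exceed += 1
    return exceed / reps


rate_t = type_one_error_rate(t_crit=1.833)  # t_{9,0.95}: should be near 0.05
rate_z = type_one_error_rate(t_crit=1.645)  # z_{0.95}: too small a cutoff for t_9
print(f"using t_9,0.95 = 1.833: {rate_t:.3f}")
print(f"using z_0.95   = 1.645: {rate_z:.3f}")
```

Both runs use the same seed, so they see identical samples; only the cutoff differs.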

t-Test for the Mean: p-value method

$H_0: \mu = 175$ vs. $H_1: \mu > 175$.

Test statistic: $t = \dfrac{\bar{X} - 175}{S/\sqrt{n}}$. Under $H_0$, $t \sim t_{n-1}$.

p-value: $\Pr(t > t(\text{obs}) \mid H_0)$, the probability of obtaining a test statistic as extreme as or more extreme than the observed value, given that $H_0$ is true.

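[Figure: p-value for the test of $H_0: \mu = 175$ vs. $H_1: \mu > 175$. Density function of $t_{n-1}$; the p-value $\Pr(t > t(\text{obs}))$ is the area to the right of $t(\text{obs})$, shown alongside the rejection region to the right of $t_{n-1,1-\alpha}$.]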

If p-value $< \alpha$, reject $H_0$; if p-value $\ge \alpha$, do not reject $H_0$.

Significance level: $\alpha$

$H_0$ is rejected if $t > t_{n-1,1-\alpha}$ or, equivalently, if p-value $< \alpha$. $\alpha$ is the significance level (the type I error rate), typically set to 0.05.

Guidelines for judging the significance of a p-value:
- If $p \ge 0.05$, the results are considered not statistically significant.
- If $0.01 \le p < 0.05$, the results are statistically significant.
- If $0.001 \le p < 0.01$, the results are highly significant.
- If $p < 0.001$, the results are very highly significant.

Report an exact p-value! The p-value indicates exactly how significant the results are without performing repeated significance tests at different $\alpha$ levels. It also indicates how close the results have come to statistical significance even when they are not statistically significant.
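The guidelines above amount to a simple lookup. The sketch below encodes them; `describe_p_value` is a hypothetical name, the verbal label is meant to accompany (not replace) the exact p-value, and the values 0.03 and 0.004 are arbitrary examples.

```python
def describe_p_value(p):
    """Verbal label for a p-value using the 0.05 / 0.01 / 0.001 guidelines."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p >= 0.05:
        return "not statistically significant"
    if p >= 0.01:
        return "statistically significant"
    if p >= 0.001:
        return "highly significant"
    return "very highly significant"


# Report the exact p-value together with its label.
for p in (0.074, 0.03, 0.004, 3.699e-17):
    print(f"p = {p:.3g}: {describe_p_value(p)}")
```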

Example: the cholesterol of children

Suppose the mean cholesterol level of 10 children whose fathers died from heart disease is 200 mg/dL and the sample standard deviation is 50 mg/dL. The average cholesterol level in children is known to be 175 mg/dL. Is the average cholesterol level of these children larger than 175 mg/dL?

Let $\mu$ be the population mean cholesterol level of children whose fathers died from heart disease. The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu > 175$. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$
which follows $t_{n-1} = t_9$ under $H_0$. The observed test statistic is
$$t(\text{obs}) = \frac{200 - 175}{50/\sqrt{10}} = 1.58.$$

Critical-value method: at the significance level 5%, the critical value is $t_{n-1,1-\alpha} = t_{9,0.95} = 1.833$ and the rejection region is $t > 1.833$. Since $t(\text{obs}) = 1.58 < 1.833$, we cannot reject $H_0$ at significance level 5%.


p-value method: the p-value is $p = \Pr(t > t(\text{obs}) \mid H_0) = \Pr(t > 1.58 \mid H_0) = 0.074$. Since $p > 0.05$, we cannot reject $H_0$ at significance level 5%.
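Both methods for this example can be reproduced from the summary statistics. Python's standard library has no t-distribution, so the sketch below assumes a small hypothetical helper `t_sf` that computes $\Pr(T > t)$ for $T \sim t_{df}$ by numerical integration: substituting $x = \sqrt{df}\tan\theta$ turns the t density into a constant times $\cos^{df-1}\theta$ on a finite interval, which Simpson's rule handles well. In practice a statistics library (for example SciPy's `scipy.stats.t.sf`) would be used instead. The critical value 1.833 is taken from the slide.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df.

    With x = sqrt(df) * tan(theta), the t density times dx becomes
    const * cos(theta)**(df - 1) d(theta), so the upper tail is a finite
    integral from atan(t / sqrt(df)) to pi/2, evaluated with Simpson's rule.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


n, xbar, s, mu0 = 10, 200.0, 50.0, 175.0
t_obs = (xbar - mu0) / (s / math.sqrt(n))  # 25 / (50 / sqrt(10)), about 1.58
t_crit = 1.833                             # t_{9,0.95} from a t table

# Critical-value method
print("reject H0" if t_obs > t_crit else "do not reject H0")
# p-value method: Pr(t > t(obs) | H0)
p_value = t_sf(t_obs, n - 1)
print(f"t(obs) = {t_obs:.2f}, one-sided p-value = {p_value:.3f}")
```

A quick sanity check on the helper: `t_sf(0.0, df)` should return 0.5 for any df, and `t_sf(1.833, 9)` should be close to 0.05.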

One-tailed t-test for the mean

A one-tailed test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis (µ0) but not both.

$H_0: \mu = \mu_0$, test statistic: $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

$H_1$             Rejection region              p-value
$\mu > \mu_0$     $t > t_{n-1,1-\alpha}$        $\Pr(t > t(\text{obs}) \mid H_0)$
$\mu < \mu_0$     $t < t_{n-1,\alpha}$          $\Pr(t < t(\text{obs}) \mid H_0)$
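The table translates directly into a small function. This sketch assumes a hypothetical numerical helper `t_sf` for $\Pr(T > t)$ (Simpson's rule after the substitution $x = \sqrt{df}\tan\theta$, standing in for a library t-distribution) and uses $\Pr(t < t(\text{obs})) = \Pr(t > -t(\text{obs}))$, which follows from the symmetry of the t density. The function name `one_tailed_p` and its `alternative` values are illustrative.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df, by Simpson's rule after x = sqrt(df) * tan(theta)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


def one_tailed_p(t_obs, df, alternative):
    """p-value of a one-tailed t-test of H0: mu = mu0."""
    if alternative == "greater":  # H1: mu > mu0, p = Pr(t > t(obs))
        return t_sf(t_obs, df)
    if alternative == "less":     # H1: mu < mu0, p = Pr(t < t(obs)) = Pr(t > -t(obs))
        return t_sf(-t_obs, df)
    raise ValueError("alternative must be 'greater' or 'less'")


print(one_tailed_p(1.58, 9, "greater"))  # H1: mu > 175 in the cholesterol example
print(one_tailed_p(1.58, 9, "less"))     # same data, opposite direction
```

The two one-tailed p-values for the same $t(\text{obs})$ add up to 1, which is why the direction of $H_1$ must be fixed before looking at the data.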

Two-sided alternatives

The test for $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ is based on $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

Critical-value method:
- Rejection region: if $|t| > t_{n-1,1-\alpha/2}$, then $H_0$ is rejected.
- Acceptance region: if $|t| \le t_{n-1,1-\alpha/2}$, then $H_0$ is NOT rejected.
- Type I error: $\Pr(|t| > t_{n-1,1-\alpha/2} \mid H_0) = \alpha$.

p-value method: the p-value is $\Pr(|t| > |t(\text{obs})| \mid H_0)$. If $p < 0.05$, then $H_0$ is rejected at significance level 5%.

Example: two-sided alternatives

(Continued from the cholesterol data.) Is the average cholesterol level of children whose fathers had heart disease different from the US average cholesterol level of children (175 mg/dL)?

The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu \neq 175$. The observed test statistic is $t(\text{obs}) = 1.58$.

Critical-value method: at significance level 5%, the rejection region is $t > t_{9,0.975} = 2.262$ or $t < -2.262$. Since $-2.262 < 1.58 < 2.262$, we cannot reject $H_0$ at significance level 5%.

p-value method: $p = \Pr(|t| > |t(\text{obs})| \mid H_0) = \Pr(t > 1.58 \mid H_0) + \Pr(t < -1.58 \mid H_0) = 0.149$. Since $p > 0.05$, we cannot reject $H_0$ at significance level 5%.
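A sketch of both two-sided calculations. As the Python standard library has no t-distribution, it assumes a hypothetical numerical helper `t_sf` for $\Pr(T > t)$ (Simpson's rule after the substitution $x = \sqrt{df}\tan\theta$). The two-sided p-value doubles the upper tail beyond $|t(\text{obs})|$, which is valid because the t density is symmetric. Working from the unrounded $t(\text{obs}) \approx 1.581$ rather than 1.58, the p-value can differ slightly from the slide's 0.149 in the third decimal.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df, by Simpson's rule after x = sqrt(df) * tan(theta)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


n, xbar, s, mu0 = 10, 200.0, 50.0, 175.0
t_obs = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.262  # t_{9,0.975} from a t table

reject = abs(t_obs) > t_crit               # critical-value method
p_two_sided = 2 * t_sf(abs(t_obs), n - 1)  # p-value method: Pr(|t| > |t(obs)|)
print(f"|t(obs)| = {abs(t_obs):.2f}, reject H0: {reject}")
print(f"two-sided p-value = {p_two_sided:.3f}")
```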

The Relationship Between Hypothesis Testing and Confidence Intervals

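Suppose we are testing $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$.

$H_0$ is rejected at significance level $\alpha$ if and only if the $100\%(1 - \alpha)$ CI for $\mu$ does not contain $\mu_0$.

Recall that the $100\%(1 - \alpha)$ CI for $\mu$ is $\bar{x} \pm t_{n-1,1-\alpha/2}\, s/\sqrt{n}$. If $H_0$ is rejected,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,1-\alpha/2} \quad\text{or}\quad t > t_{n-1,1-\alpha/2}$$
$$\Rightarrow\ \bar{x} - \mu_0 < -t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad\text{or}\quad \bar{x} - \mu_0 > t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$
$$\Rightarrow\ \mu_0 > \bar{x} + t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad\text{or}\quad \mu_0 < \bar{x} - t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$

Therefore, the $100\%(1 - \alpha)$ CI does not include $\mu_0$. Each step above is reversible, so the converse follows the same way: if the CI does not contain $\mu_0$, then $H_0$ is rejected.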
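The equivalence can be checked numerically for the cholesterol example: build the 95% CI $\bar{x} \pm t_{9,0.975}\, s/\sqrt{n}$ with the tabulated $t_{9,0.975} = 2.262$, then confirm that the two-sided test rejects $H_0: \mu = \mu_0$ exactly for the values of $\mu_0$ outside the interval. The helper names `rejects` and `outside_ci` and the grid of $\mu_0$ values are illustrative.

```python
import math

n, xbar, s = 10, 200.0, 50.0
t_crit = 2.262  # t_{9,0.975} from a t table
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)  # 95% CI for mu


def rejects(mu0):
    """Two-sided t-test at the 5% level: reject H0: mu = mu0 if |t| > t_crit."""
    return abs((xbar - mu0) / (s / math.sqrt(n))) > t_crit


def outside_ci(mu0):
    return mu0 < ci[0] or mu0 > ci[1]


print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")  # contains 175, so H0: mu = 175 is not rejected
# The test decision and the CI agree for every mu0 on the grid.
print(all(rejects(m) == outside_ci(m) for m in range(100, 301)))
```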

z-test for the Mean with Known Variance

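Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known. The test for $H_0: \mu = \mu_0$ is based on the test statistic
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},$$
which follows $N(0, 1)$ under $H_0$.

$H_1$                Rejection region            p-value
$\mu > \mu_0$        $z > z_{1-\alpha}$          $\Pr(Z > z(\text{obs}) \mid H_0)$
$\mu < \mu_0$        $z < z_{\alpha}$            $\Pr(Z < z(\text{obs}) \mid H_0)$
$\mu \neq \mu_0$     $|z| > z_{1-\alpha/2}$      $\Pr(|Z| > |z(\text{obs})| \mid H_0)$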
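When $\sigma$ is known, the standard library's `statistics.NormalDist` is enough to carry out the z-test in the table. The sketch assumes, purely for illustration, that $\sigma = 50$ mg/dL is known in the cholesterol example ($n = 10$, $\bar{x} = 200$, $\mu_0 = 175$); `z_test` and its `alternative` argument are hypothetical names.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """z statistic and p-value for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "greater":
        p = 1 - std_normal.cdf(z)             # Pr(Z > z(obs))
    elif alternative == "less":
        p = std_normal.cdf(z)                 # Pr(Z < z(obs))
    elif alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))  # Pr(|Z| > |z(obs)|)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


z_obs, p_one = z_test(200, 175, 50, 10, alternative="greater")
_, p_two = z_test(200, 175, 50, 10, alternative="two-sided")
print(f"z(obs) = {z_obs:.2f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

The one-sided p-value comes out a little smaller than the t-test's 0.074 for the same numbers, because $N(0,1)$ has lighter tails than $t_9$.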

Tests for Binomial probability p

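Let $X \sim \text{Binomial}(n, p)$. We want to test $H_0: p = p_0$ vs. $H_1: p \neq p_0$. The test statistic is
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}, \quad \text{where } \hat{p} = X/n.$$
If $np_0(1 - p_0) \ge 5$, then $Z$ approximately follows $N(0, 1)$ under $H_0$ (normal approximation to the binomial).

Rejection region: $z(\text{obs}) < z_{\alpha/2}$ or $z(\text{obs}) > z_{1-\alpha/2}$.

p-value:
- If $\hat{p} \le p_0$, then p-value $= 2 \times \Pr(Z < z(\text{obs}) \mid H_0)$.
- If $\hat{p} > p_0$, then p-value $= 2 \times \Pr(Z > z(\text{obs}) \mid H_0)$.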
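A sketch of this two-sided test using the normal approximation, with `statistics.NormalDist` for the standard normal CDF. The counts (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical, chosen only so that $np_0(1 - p_0) = 25 \ge 5$; `binomial_z_test` is an illustrative name.

```python
import math
from statistics import NormalDist


def binomial_z_test(x, n, p0):
    """Two-sided normal-approximation test of H0: p = p0 for X ~ Binomial(n, p)."""
    if n * p0 * (1 - p0) < 5:
        raise ValueError("normal approximation requires n * p0 * (1 - p0) >= 5")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if p_hat <= p0:
        p_value = 2 * std_normal.cdf(z)        # 2 * Pr(Z < z(obs))
    else:
        p_value = 2 * (1 - std_normal.cdf(z))  # 2 * Pr(Z > z(obs))
    return z, p_value


z_obs, p_value = binomial_z_test(60, 100, 0.5)  # hypothetical data
print(f"z(obs) = {z_obs:.2f}, p-value = {p_value:.4f}")
```

By symmetry, 40 successes out of 100 gives $z(\text{obs}) = -2$ and the same p-value.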

Two-sample case

In a two-sample hypothesis-testing problem, the underlying parameters of two different populations are compared. Examples from the LEAD data: left- vs. right-hand fwt, and maxfwt in the control group vs. the exposed group.

Independent samples: when the data points in one sample are unrelated to the data points in the second sample.

The paired sample: Paired t-test

Paired samples: when each data point in the first sample is matched and is related to a unique data point in the second sample. Paired samples may represent two sets of measurements on the same people, or on different people who are chosen on an individual basis using matching criteria, such as age and sex, to be very similar to each other.

LEAD data: the numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l, respectively) were observed for each of 124 children. We want to test whether the number of finger-wrist taps differs between the right hand and the left hand.

The paired sample: paired t-test

Consider $n$ pairs $(X_{1,1}, X_{2,1}), \ldots, (X_{1,n}, X_{2,n})$, where $E(X_{1,i}) = \mu_1$ and $E(X_{2,i}) = \mu_2$ for all $i = 1, \ldots, n$. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

Let $\Delta = \mu_1 - \mu_2$. Then $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$.

To remove the correlation between $X_{1,i}$ and $X_{2,i}$, we consider the differences $d_i = X_{1,i} - X_{2,i}$ for $i = 1, \ldots, n$, and assume $d_1, \ldots, d_n \sim N(\Delta, \sigma_d^2)$. This reduces the problem to a one-sample t-test.

Test statistic: $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, respectively.

Under $H_0$: $t \sim t_{n-1}$.

p-value $= 2 \times \Pr(t > |t(\text{obs})| \mid H_0)$

CI for $\Delta$: $\bar{d} \pm t_{n-1,1-\alpha/2}\, s_d/\sqrt{n}$

Example: Lead data

In the LEAD data, the numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l) were observed for each of 124 children, and we want to test whether they differ. Since fwt_r and fwt_l are measured on the same child, they form a paired (not independent) sample, so we analyze their difference.

- $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$
- Mean of the differences: $\bar{d} = 5.919$; standard deviation of the differences: $s_d = 6.711$
- Test statistic: $t = \dfrac{5.919}{6.711/\sqrt{124}} = 9.8206$
- Distribution of the test statistic under $H_0$: $t_{124-1} = t_{123}$
- p-value: $2 \times \Pr(t > 9.8206) = 3.699 \times 10^{-17}$

At significance level 5%, we reject the null hypothesis: right- and left-hand fwt are significantly different.
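The calculation can be reproduced from the reported summary statistics of the differences. The sketch assumes a hypothetical numerical helper `t_sf` for $\Pr(T > t)$ (Simpson's rule after $x = \sqrt{df}\tan\theta$); because it integrates the tail directly rather than computing $1 - \text{CDF}$, it stays accurate for tail probabilities as small as $10^{-17}$. Since $\bar{d}$ and $s_d$ are rounded to three decimals, the recomputed $t$ agrees with 9.8206 only to within about 0.001.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df, by Simpson's rule after x = sqrt(df) * tan(theta)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


n, d_bar, s_d = 124, 5.919, 6.711      # summary statistics of the paired differences
t_obs = d_bar / (s_d / math.sqrt(n))   # one-sample t statistic on the differences
p_value = 2 * t_sf(abs(t_obs), n - 1)  # two-sided p-value with n - 1 = 123 d.f.
print(f"t(obs) = {t_obs:.3f}, p-value = {p_value:.3e}")
```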

Two independent samples: equal variances

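Consider two independent samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

We assume $\sigma^2 = \sigma_1^2 = \sigma_2^2$. Then:
- $\bar{X}_1 \sim N(\mu_1, \sigma^2/n_1)$ and $\bar{X}_2 \sim N(\mu_2, \sigma^2/n_2)$
- $\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma^2/n_1 + \sigma^2/n_2)$
- The pooled variance estimate of $\sigma^2$, a weighted average of $s_1^2$ and $s_2^2$:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
- Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$ under $H_0$.

Rejection region: $t(\text{obs}) > t_{n_1+n_2-2,1-\alpha/2}$ or $t(\text{obs}) < -t_{n_1+n_2-2,1-\alpha/2}$

p-value $= 2 \times \Pr(t > |t(\text{obs})| \mid H_0)$

CI for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\, s\sqrt{1/n_1 + 1/n_2}$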

Example: Lead data

We now assume the variances of maxfwt in the control group ($n_1 = 78$) and the exposed group ($n_2 = 46$) are the same. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

- Sample means: $\bar{x}_1 = 62.44$, $\bar{x}_2 = 59.76$; sample variances: $s_1^2 = 415.18$, $s_2^2 = 625.43$
- Pooled variance: $s^2 = \dfrac{(78 - 1)\,415.18 + (46 - 1)\,625.43}{78 + 46 - 2} = 492.734$
- Test statistic: $t = \dfrac{62.44 - 59.76}{\sqrt{492.734\,(1/78 + 1/46)}} = 0.6482$
- Under $H_0$, $t \sim t_{122}$ (df $= 78 + 46 - 2 = 122$)
- p-value: $2 \times \Pr(t > 0.6482) = 0.518$

At significance level 5%, we cannot reject the null hypothesis. There is no evidence of different means.
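A sketch of the pooled two-sample t-test from the summary statistics on this slide, with a hypothetical numerical helper `t_sf` for the t upper tail (Simpson's rule after $x = \sqrt{df}\tan\theta$). Plugging in the rounded means gives $t \approx 0.649$ rather than the slide's 0.6482, presumably because the slide used unrounded sample means; the p-value changes only in the third decimal.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df, by Simpson's rule after x = sqrt(df) * tan(theta)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


n1, xbar1, var1 = 78, 62.44, 415.18  # control group (maxfwt)
n2, xbar2, var2 = 46, 59.76, 625.43  # exposed group (maxfwt)

df = n1 + n2 - 2
s2_pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # weighted average of s1^2 and s2^2
se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_obs = (xbar1 - xbar2) / se
p_value = 2 * t_sf(abs(t_obs), df)
print(f"pooled variance = {s2_pooled:.2f}, t(obs) = {t_obs:.4f}, df = {df}, p = {p_value:.3f}")
```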

Two independent samples: different variances

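Two independent samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$), where $\sigma_1^2$ and $\sigma_2^2$ may differ. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

- $\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$
- Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
- Under $H_0$, the test statistic approximately follows a t-distribution with d.f.
$$d' = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$$

Rejection region: $t(\text{obs}) > t_{d',1-\alpha/2}$ or $t(\text{obs}) < -t_{d',1-\alpha/2}$

p-value $= 2 \times \Pr(t > |t(\text{obs})| \mid H_0)$

CI for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm t_{d',1-\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$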
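The lecture gives no numerical example for this case, so the sketch below simply applies the formulas to the maxfwt summary statistics from the pooled-variance example to show how $d'$ is computed; the output is an illustration, not a result reported in the lecture. It assumes a hypothetical numerical helper `t_sf` for $\Pr(T > t)$ (Simpson's rule after $x = \sqrt{df}\tan\theta$), which accepts the non-integer $d'$.

```python
import math


def t_sf(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df (df may be non-integer), by Simpson's rule."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(t / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return const * total * h / 3


n1, xbar1, var1 = 78, 62.44, 415.18  # control group (maxfwt)
n2, xbar2, var2 = 46, 59.76, 625.43  # exposed group (maxfwt)

v1, v2 = var1 / n1, var2 / n2  # s1^2/n1 and s2^2/n2
t_obs = (xbar1 - xbar2) / math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # d'
p_value = 2 * t_sf(abs(t_obs), df_welch)
print(f"t(obs) = {t_obs:.4f}, d' = {df_welch:.2f}, p = {p_value:.3f}")
```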

F-test for Equal Variances

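Two independent samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$; in other words, $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

- $(n_1 - 1)S_1^2/\sigma_1^2 \sim \chi^2_{n_1-1}$ and $(n_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2_{n_2-1}$
- Test statistic: $F = S_1^2/S_2^2 \sim F_{n_1-1,n_2-1}$ under $H_0$

Rejection region: $F(\text{obs}) > F_{n_1-1,n_2-1,1-\alpha/2}$ or $F(\text{obs}) < F_{n_1-1,n_2-1,\alpha/2} = 1/F_{n_2-1,n_1-1,1-\alpha/2}$

p-value:
- If $F(\text{obs}) \ge 1$, then $p = 2 \times \Pr(F > F(\text{obs}) \mid H_0)$.
- If $F(\text{obs}) < 1$, then $p = 2 \times \Pr(F < F(\text{obs}) \mid H_0)$.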

Example: Lead data

Test whether the variances of maxfwt in the control group ($n_1 = 78$) and the exposed group ($n_2 = 46$) are the same.

- $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$
- Sample variances: $s_1^2 = 415.18$ and $s_2^2 = 625.43$
- Test statistic: $F = s_1^2/s_2^2 = 415.18/625.43 = 0.6638$
- Distribution of the test statistic under $H_0$: $F_{77,45}$
- p-value: $2 \times \Pr(F < 0.6638) = 0.1133$ (since $F(\text{obs}) < 1$)

At significance level 5%, we cannot reject the null hypothesis. There is no evidence that the variances are different.
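A sketch reproducing this F-test. The standard library has no F-distribution, so it assumes a hypothetical helper `f_cdf` built on the fact that if $F \sim F_{d_1,d_2}$, then $d_1F/(d_1F + d_2) \sim \text{Beta}(d_1/2, d_2/2)$; the Beta CDF is integrated with Simpson's rule, which is adequate here because both shape parameters exceed 1 (a statistics library, for example SciPy's `scipy.stats.f.cdf`, would normally be used instead).

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (assumes a > 1, b > 1)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(u):
        # Beta(a, b) density; it is 0 at the endpoints when a > 1 and b > 1
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1 - u) - log_beta)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


def f_cdf(f, d1, d2):
    """Pr(F <= f) for F ~ F_{d1,d2}, using d1*F / (d1*F + d2) ~ Beta(d1/2, d2/2)."""
    return beta_cdf(d1 * f / (d1 * f + d2), d1 / 2, d2 / 2)


n1, var1 = 78, 415.18  # control group, maxfwt
n2, var2 = 46, 625.43  # exposed group, maxfwt
d1, d2 = n1 - 1, n2 - 1
f_obs = var1 / var2
if f_obs >= 1:
    p_value = 2 * (1 - f_cdf(f_obs, d1, d2))  # 2 * Pr(F > F(obs))
else:
    p_value = 2 * f_cdf(f_obs, d1, d2)        # 2 * Pr(F < F(obs))
print(f"F(obs) = {f_obs:.4f} on ({d1}, {d2}) d.f., p-value = {p_value:.4f}")
```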

Overlapping Confidence Intervals and Statistical Significance

Can we judge whether two statistics are significantly different based on whether or not their confidence intervals overlap? The answer is: not always. If two statistics have non-overlapping confidence intervals, they are significantly different. If they have overlapping confidence intervals, it is not necessarily true that they are not significantly different.


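Assume $\bar{x}_1 - \bar{x}_2 \ge 0$ without loss of generality, and consider 95% intervals of the form estimate $\pm\, 1.96\,SE$.
- The means are significantly different at the 5% level if $(\bar{x}_1 - \bar{x}_2) > 1.96\sqrt{SE_1^2 + SE_2^2}$.
- The CIs do not overlap if $\bar{x}_1 - 1.96\,SE_1 > \bar{x}_2 + 1.96\,SE_2$, which is equivalent to $(\bar{x}_1 - \bar{x}_2) > 1.96\,(SE_1 + SE_2)$.

Since $\sqrt{SE_1^2 + SE_2^2} \le SE_1 + SE_2$, non-overlapping CIs imply a significant difference, but a significant difference does not require non-overlapping CIs: when $1.96\sqrt{SE_1^2 + SE_2^2} < \bar{x}_1 - \bar{x}_2 \le 1.96\,(SE_1 + SE_2)$, the CIs overlap even though the means are significantly different.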
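A small numerical check of the argument, assuming 95% intervals of the form estimate $\pm\, 1.96\,SE$ and hypothetical estimates that all have $SE = 1$; `compare` is an illustrative name. The middle case has overlapping CIs yet a significant difference, and no case can be non-overlapping yet non-significant.

```python
import math


def compare(xbar1, se1, xbar2, se2, z=1.96):
    """Return (significant, cis_overlap) for two estimates with standard errors."""
    diff = abs(xbar1 - xbar2)
    significant = diff > z * math.sqrt(se1 ** 2 + se2 ** 2)
    cis_overlap = diff <= z * (se1 + se2)  # the two intervals share at least one point
    return significant, cis_overlap


# Hypothetical estimates, all with SE = 1
print(compare(10.0, 1.0, 5.0, 1.0))  # (True, False): CIs separate, significant
print(compare(10.0, 1.0, 7.0, 1.0))  # (True, True): CIs overlap, still significant
print(compare(10.0, 1.0, 8.0, 1.0))  # (False, True): CIs overlap, not significant
```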

Summary

1. What are your hypotheses?
2. Identify the data type and the test statistic:
   - t-test, z-test, $\chi^2$-test, F-test
   - one-sample or two-sample (paired or independent)
3. Perform the test.
4. Go back to the numerical and/or graphical summaries and confirm that your test result is consistent with your data.
