18.650 – Fundamentals of Statistics. 4. Hypothesis Testing
Goals

We have seen the basic notions of hypothesis testing:
- Hypotheses $H_0$ / $H_1$,
- Type 1 / Type 2 error, level and power,
- Test statistics and rejection region,
- p-value.

Our tests were based on the CLT (and sometimes Slutsky's lemma).
- What if the data is Gaussian, $\sigma^2$ is unknown, and Slutsky does not apply?
- Can we use the asymptotic normality of the MLE?
- Tests about multivariate parameters $\theta = (\theta_1, \ldots, \theta_d)$ (e.g.: $\theta_1 = \theta_2$)?
- More complex tests: "Does my data follow a Gaussian distribution?"

Parametric hypothesis testing

Clinical trials

Let us go through an example to review the main notions of hypothesis testing.
- Pharmaceutical companies use hypothesis testing to test whether a new drug is effective.
- To do so, they administer the drug to a group of patients (test group) and a placebo to another group (control group).
- We consider testing a drug that is supposed to lower LDL (low-density lipoprotein), a.k.a. "bad cholesterol", among patients with a high level of LDL (above 200 mg/dL).

Notation and modelling
- Let $\Delta_d > 0$ denote the expected decrease of the LDL level (in mg/dL) for a patient who has used the drug.
- Let $\Delta_c > 0$ denote the expected decrease of the LDL level (in mg/dL) for a patient who has used the placebo.
- We want to know whether $\Delta_d > \Delta_c$.
- We observe two independent samples: $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\Delta_d, \sigma_d^2)$ from the test group and $Y_1, \ldots, Y_m \overset{iid}{\sim} \mathcal{N}(\Delta_c, \sigma_c^2)$ from the control group.

Hypothesis testing
- Hypotheses: $H_0: \Delta_d = \Delta_c$ vs. $H_1: \Delta_d > \Delta_c$.
- Since the data is Gaussian by assumption, we do not need the CLT.
- We have $\bar X_n \sim \mathcal{N}(\Delta_d, \sigma_d^2/n)$ and $\bar Y_m \sim \mathcal{N}(\Delta_c, \sigma_c^2/m)$.
- Therefore,
$$\frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\dfrac{\sigma_d^2}{n} + \dfrac{\sigma_c^2}{m}}} \sim \mathcal{N}(0, 1).$$

Asymptotic test
- Assume that $m = cn$ and $n \to \infty$.
- Using Slutsky's lemma, we also have
$$\frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\dfrac{\hat\sigma_d^2}{n} + \dfrac{\hat\sigma_c^2}{m}}} \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, 1),$$
where
$$\hat\sigma_d^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 \quad \text{and} \quad \hat\sigma_c^2 = \frac{1}{m}\sum_{i=1}^m (Y_i - \bar Y_m)^2.$$
- We get the following test at asymptotic level $\alpha$:
$$R = \left\{ \frac{\bar X_n - \bar Y_m}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} > q_\alpha \right\},$$
where $q_\alpha$ is the $(1-\alpha)$-quantile of $\mathcal{N}(0,1)$.
- This is a one-sided, two-sample test.

Asymptotic test: example
- $n = 70$, $m = 50$, $\bar X_n = 156.4$, $\bar Y_m = 132.7$, $\hat\sigma_d^2 = 5198.4$, $\hat\sigma_c^2 = 3867.0$:
$$\frac{156.4 - 132.7}{\sqrt{\dfrac{5198.4}{70} + \dfrac{3867.0}{50}}} = 1.57.$$
- Since $q_{5\%} = 1.645$, we fail to reject $H_0$ at asymptotic level 5%.
- We can also compute the p-value: p-value $= \mathbb{P}(Z > 1.57) = 0.0582$.
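The arithmetic in this example is easy to reproduce. Below is a minimal Python sketch of the asymptotic test above, assuming only scipy; the function name `two_sample_z_test` and its summary-statistic interface are illustrative choices, not part of the course material.

```python
from math import sqrt

from scipy.stats import norm

def two_sample_z_test(xbar, ybar, var_x, var_y, n, m, alpha=0.05):
    """One-sided, two-sample asymptotic test of H0: Delta_d = Delta_c
    versus H1: Delta_d > Delta_c, from summary statistics."""
    se = sqrt(var_x / n + var_y / m)   # plug-in standard error
    stat = (xbar - ybar) / se          # approximately N(0,1) under H0
    q = norm.ppf(1 - alpha)            # (1 - alpha)-quantile of N(0,1)
    p_value = norm.sf(stat)            # P(Z > stat)
    return stat, q, p_value

# Summary statistics as reported in the example above
stat, q, p = two_sample_z_test(156.4, 132.7, 5198.4, 3867.0, n=70, m=50)
print(f"statistic = {stat:.2f}, reject at 5%: {stat > q}, p-value = {p:.4f}")
```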
Small sample size
- What if $n = 20$, $m = 12$?
- We cannot realistically apply Slutsky's lemma.
- We needed it to find the (asymptotic) distribution of quantities of the form
$$\frac{\bar X_n - \mu}{\sqrt{\hat\sigma^2/n}}$$
when $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$.
- It turns out that this distribution does not depend on $\mu$ or $\sigma$, so we can compute its quantiles.

The χ² distribution

Definition: For a positive integer $d$, the $\chi^2$ (pronounced "kai-squared") distribution with $d$ degrees of freedom is the law of the random variable $Z_1^2 + Z_2^2 + \cdots + Z_d^2$, where $Z_1, \ldots, Z_d \overset{iid}{\sim} \mathcal{N}(0, 1)$.

Examples:
- If $Z \sim \mathcal{N}_d(0, I_d)$, then $\|Z\|_2^2 \sim \chi^2_d$.
- $\chi^2_2 = \mathrm{Exp}(1/2)$.

[Figure: densities of the χ² distribution for df = 1, 2, 3, 4, 5, 10, 20.]

Properties: If $V \sim \chi^2_k$, then
- $\mathbb{E}[V] = k$,
- $\mathrm{var}[V] = 2k$.

Important example: the sample variance
- Recall that the sample variance is given by
$$S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar X_n)^2.$$
- Cochran's theorem states that for $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$, if $S_n$ is the sample variance, then
  - $\bar X_n \perp\!\!\!\perp S_n$;
  - $\dfrac{n S_n}{\sigma^2} \sim \chi^2_{n-1}$.
- We often prefer the unbiased estimator of $\sigma^2$:
$$\tilde S_n = \frac{n}{n-1} S_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$

Student's T distribution

Definition: For a positive integer $d$, the Student's T distribution with $d$ degrees of freedom (denoted by $t_d$) is the law of the random variable $\dfrac{Z}{\sqrt{V/d}}$, where $Z \sim \mathcal{N}(0, 1)$, $V \sim \chi^2_d$, and $Z \perp\!\!\!\perp V$ ($Z$ is independent of $V$).

Who was Student?

This distribution was introduced by William Sealy Gosset (1876–1937) in 1908 while he worked for the Guinness brewery in Dublin, Ireland.

Student's T test (one sample, two-sided)
- Let $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown.
- We want to test: $H_0: \mu = 0$ vs. $H_1: \mu \neq 0$.
- Test statistic:
$$T_n = \frac{\sqrt{n}\,\bar X_n}{\sqrt{\tilde S_n}}.$$
- Since $\sqrt{n}\,\bar X_n/\sigma \sim \mathcal{N}(0,1)$ (under $H_0$) and $(n-1)\tilde S_n/\sigma^2 \sim \chi^2_{n-1}$ are independent by Cochran's theorem, we have $T_n \sim t_{n-1}$.
- Student's test with (non-asymptotic) level $\alpha \in (0, 1)$:
$$\psi_\alpha = \mathbf{1}\{|T_n| > q_{\alpha/2}\},$$
where $q_{\alpha/2}$ is the $(1 - \alpha/2)$-quantile of $t_{n-1}$.

Student's T test (one sample, one-sided)
- We want to test: $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$.
- Test statistic:
$$T_n = \frac{\sqrt{n}\,(\bar X_n - \mu_0)}{\sqrt{\tilde S_n}} \sim t_{n-1}$$
under $H_0$ (at the boundary $\mu = \mu_0$).
- Student's test with (non-asymptotic) level $\alpha \in (0, 1)$:
$$\psi_\alpha = \mathbf{1}\{T_n > q_\alpha\},$$
where $q_\alpha$ is the $(1 - \alpha)$-quantile of $t_{n-1}$.

Two-sample T-test
- Back to our cholesterol example. What happens for small sample sizes?
- We want to know the distribution of
$$\frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\dfrac{\hat\sigma_d^2}{n} + \dfrac{\hat\sigma_c^2}{m}}}.$$
- We have approximately
$$\frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\dfrac{\hat\sigma_d^2}{n} + \dfrac{\hat\sigma_c^2}{m}}} \sim t_N,$$
where
$$N = \frac{\left(\hat\sigma_d^2/n + \hat\sigma_c^2/m\right)^2}{\dfrac{\hat\sigma_d^4}{n^2(n-1)} + \dfrac{\hat\sigma_c^4}{m^2(m-1)}} \ge \min(n, m)$$
(Welch-Satterthwaite formula).

Non-asymptotic test
- Example: $n = 70$, $m = 50$, $\bar X_n = 156.4$, $\bar Y_m = 132.7$, $\hat\sigma_d^2 = 5198.4$, $\hat\sigma_c^2 = 3867.0$:
$$\frac{156.4 - 132.7}{\sqrt{\dfrac{5198.4}{70} + \dfrac{3867.0}{50}}} = 1.57.$$
- Using the shorthand formula $N = \min(n, m) = 50$, we get $q_{5\%} = 1.68$ and p-value $= \mathbb{P}(t_N > 1.57) \approx 0.0614$.
- Using the Welch-Satterthwaite formula,
$$N = \frac{\left(\dfrac{5198.4}{70} + \dfrac{3867.0}{50}\right)^2}{\dfrac{5198.4^2}{70^2(70-1)} + \dfrac{3867.0^2}{50^2(50-1)}} = 113.78,$$
which we round down to 113.
- We get p-value $= \mathbb{P}(t_{113} > 1.57) \approx 0.0596$.

Discussion

Advantage of Student's test: it is non-asymptotic and can be run on small samples.

Drawback of Student's test: it relies on the assumption that the sample is Gaussian (soon we will see how to test this assumption).
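Before moving on, here is a minimal Python sketch of this two-sample Welch test, assuming scipy; the helper name `welch_t_test` is an illustrative choice, not from the slides. With the summary statistics of the example, it reproduces the Welch-Satterthwaite degrees of freedom, N = 113 after rounding down.

```python
from math import floor, sqrt

from scipy.stats import t

def welch_t_test(xbar, ybar, var_x, var_y, n, m, alpha=0.05):
    """One-sided, two-sample t-test with the Welch-Satterthwaite
    approximation of the degrees of freedom."""
    a, b = var_x / n, var_y / m
    stat = (xbar - ybar) / sqrt(a + b)   # test statistic
    # Welch-Satterthwaite degrees of freedom
    N = (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))
    N = floor(N)                         # round down, as in the example
    q = t.ppf(1 - alpha, df=N)           # (1 - alpha)-quantile of t_N
    p_value = t.sf(stat, df=N)           # P(t_N > stat)
    return stat, N, q, p_value

stat, N, q, p = welch_t_test(156.4, 132.7, 5198.4, 3867.0, n=70, m=50)
print(f"N = {N}")  # 113 with these inputs
```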
A test based on the MLE
- Consider an i.i.d. sample $X_1, \ldots, X_n$ with statistical model $(E, (\mathbb{P}_\theta)_{\theta \in \Theta})$, where $\Theta \subseteq \mathbb{R}^d$ ($d \ge 1$), and let $\theta_0 \in \Theta$ be fixed and given.
- Consider the following hypotheses:
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0.$$
- Let $\hat\theta_n^{\mathrm{MLE}}$ be the MLE. Assume the MLE technical conditions are satisfied.
- If $H_0$ is true, then
$$\sqrt{n}\, I\!\left(\hat\theta_n^{\mathrm{MLE}}\right)^{1/2} \left(\hat\theta_n^{\mathrm{MLE}} - \theta_0\right) \xrightarrow[n \to \infty]{(d)} \mathcal{N}_d(0, I_d).$$

Wald's test
- Hence,
$$T_n := n \left(\hat\theta_n^{\mathrm{MLE}} - \theta_0\right)^\top I\!\left(\hat\theta_n^{\mathrm{MLE}}\right) \left(\hat\theta_n^{\mathrm{MLE}} - \theta_0\right) \xrightarrow[n \to \infty]{(d)} \chi^2_d.$$
- Wald's test with asymptotic level $\alpha \in (0, 1)$:
$$\psi_\alpha = \mathbf{1}\{T_n > q_\alpha\},$$
where $q_\alpha$ is the $(1 - \alpha)$-quantile of $\chi^2_d$ (see tables).
- Remark: Wald's test is also valid if $H_1$ has the form "$\theta > \theta_0$" or "$\theta < \theta_0$" or "$\theta = \theta_1$"...

A test based on the log-likelihood
- Consider an i.i.d. sample $X_1, \ldots, X_n$ with statistical model $(E, (\mathbb{P}_\theta)_{\theta \in \Theta})$, where $\Theta \subseteq \mathbb{R}^d$ ($d \ge 1$).
- Suppose the null hypothesis has the form
$$H_0: (\theta_{r+1}, \ldots, \theta_d) = \left(\theta_{r+1}^{(0)}, \ldots, \theta_d^{(0)}\right),$$
for some fixed and given numbers $\theta_{r+1}^{(0)}, \ldots, \theta_d^{(0)}$.
- Let
$$\hat\theta_n = \underset{\theta \in \Theta}{\operatorname{argmax}}\ \ell_n(\theta) \quad \text{(MLE)}$$
and
$$\hat\theta_n^c = \underset{\theta \in \Theta_0}{\operatorname{argmax}}\ \ell_n(\theta) \quad \text{("constrained MLE")},$$
where $\Theta_0 = \left\{\theta \in \Theta : (\theta_{r+1}, \ldots, \theta_d) = \left(\theta_{r+1}^{(0)}, \ldots, \theta_d^{(0)}\right)\right\}$.

Likelihood ratio test
- Test statistic:
$$T_n = 2\left(\ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^c)\right).$$
- Wilks' Theorem: Assume $H_0$ is true and the MLE technical conditions are satisfied. Then
$$T_n \xrightarrow[n \to \infty]{(d)} \chi^2_{d-r}.$$
- Likelihood ratio test with asymptotic level $\alpha \in (0, 1)$:
$$\psi_\alpha = \mathbf{1}\{T_n > q_\alpha\},$$
where $q_\alpha$ is the $(1 - \alpha)$-quantile of $\chi^2_{d-r}$ (see tables).
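As a concrete instance of both tests, here is a minimal Python sketch for a Bernoulli($p$) model with $H_0: p = p_0$, so that $d = 1$, $r = 0$, and both statistics are compared to a $\chi^2_1$ quantile. The Bernoulli model, the simulated data, and the function names are illustrative assumptions, not from the slides; the Fisher information $I(p) = 1/(p(1-p))$ is the standard one for this model.

```python
import numpy as np
from scipy.stats import chi2

def bernoulli_loglik(p, x):
    """Log-likelihood of an i.i.d. Bernoulli(p) sample x."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def wald_and_lr_tests(x, p0, alpha=0.05):
    """Wald and likelihood ratio statistics for H0: p = p0."""
    n = len(x)
    p_hat = x.mean()  # MLE of p
    # Wald: T = n (p_hat - p0)^2 I(p_hat), with I(p) = 1/(p(1-p))
    T_wald = n * (p_hat - p0) ** 2 / (p_hat * (1 - p_hat))
    # Likelihood ratio: T = 2 (l(p_hat) - l(p0)); the constrained
    # MLE is p0 itself since H0 pins down the whole parameter
    T_lr = 2 * (bernoulli_loglik(p_hat, x) - bernoulli_loglik(p0, x))
    q = chi2.ppf(1 - alpha, df=1)  # (1 - alpha)-quantile of chi^2_1
    return T_wald, T_lr, q

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.55, size=200)  # simulated data, true p = 0.55
T_wald, T_lr, q = wald_and_lr_tests(x, p0=0.5)
print(T_wald > q, T_lr > q)  # reject H0: p = 0.5 with each test?
```

In this fully specified null, the two statistics are typically close; Wilks' theorem guarantees that both are asymptotically $\chi^2_1$ under $H_0$.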