18.650 – Fundamentals of Statistics

4. Hypothesis testing

1/47 Goals

We have seen the basic notions of hypothesis testing:

I Hypotheses H0/H1

I Type 1/Type 2 error, level and power

I Test statistics and rejection region

I p-value

Our tests were based on the CLT (and sometimes Slutsky)…

I What if the data is Gaussian, σ² is unknown and Slutsky does not apply?

I Can we use asymptotic normality of MLE?

I Tests about multivariate parameters θ = (θ1, . . . , θd) (e.g.: θ1 = θ2)?

I More complex tests: “Does my data follow a Gaussian distribution?”

2/47 Parametric hypothesis testing

3/47 Clinical trials

Let us go through an example to review the main notions of hypothesis testing.

I Pharmaceutical companies use hypothesis testing to decide whether a new drug is effective.

I To do so, they administer the drug to one group of patients (the test group) and a placebo to another (the control group).

I We consider testing a drug that is supposed to lower LDL (low-density lipoprotein), a.k.a. “bad cholesterol”, among patients with a high level of LDL (above 200 mg/dL).

4/47 Notation and modelling

I Let ∆d > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the drug.

I Let ∆c > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the placebo.

I We want to know if ∆d > ∆c, i.e., whether the drug performs better than the placebo.

I We observe two independent samples:

I X1, . . . , Xn ∼ iid N(∆d, σd²) from the test group and

I Y1, . . . , Ym ∼ iid N(∆c, σc²) from the control group.

5/47 Hypothesis testing

I Hypotheses:

H0 : ∆d = ∆c vs. H1 : ∆d > ∆c

I Since the data is Gaussian by assumption, we don’t need the CLT.

I We have

X̄n ∼ N(∆d, σd²/n) and Ȳm ∼ N(∆c, σc²/m)

I Therefore

(X̄n − Ȳm − (∆d − ∆c)) / √(σd²/n + σc²/m) ∼ N(0, 1)

6/47 Asymptotic test

I Assume that m = cn and n → ∞.

I Using Slutsky’s lemma, we also have

(X̄n − Ȳm − (∆d − ∆c)) / √(σ̂d²/n + σ̂c²/m) −(d)→ N(0, 1) as n → ∞,

where σ̂d² = (1/n) Σ_{i=1}^n (Xi − X̄n)² and σ̂c² = (1/m) Σ_{i=1}^m (Yi − Ȳm)².

I We get the following test at asymptotic level α:

Rψ = { (X̄n − Ȳm) / √(σ̂d²/n + σ̂c²/m) > qα },

where qα is the (1 − α)-quantile of N(0, 1).

I This is a one-sided, two-sample test.

7/47 Asymptotic test

I Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0:

(156.4 − 132.7) / √(5198.4/70 + 3867.0/50) = 1.57

Since q5% = 1.645 and 1.57 < 1.645, we fail to reject H0 at asymptotic level 5%.

I We can also compute the p-value:

p-value = IP(Z > 1.57) = 0.0582, where Z ∼ N(0, 1).
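In code, the p-value computation is a one-liner; a minimal Python sketch using only the standard library (the helper names `normal_sf` and `two_sample_z` are illustrative, not from the slides):

```python
import math

def normal_sf(z):
    # Upper-tail probability P(Z > z) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def two_sample_z(xbar, ybar, s2_x, s2_y, n, m):
    # One-sided, two-sample test statistic and its asymptotic p-value
    z = (xbar - ybar) / math.sqrt(s2_x / n + s2_y / m)
    return z, normal_sf(z)

# p-value for the statistic value 1.57 reported on the slide:
p = normal_sf(1.57)  # ≈ 0.0582
```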

8/47 Small sample size

I What if n = 20, m = 12?

I We cannot realistically apply Slutsky’s lemma

I We needed it to find the (asymptotic) distribution of quantities of the form

√n (X̄n − µ) / σ̂

when X1, . . . , Xn ∼ iid N(µ, σ²).

I It turns out that this distribution does not depend on µ or σ, so we can compute its quantiles.

9/47 The χ2 distribution

Definition. For a positive integer d, the χ² (pronounced “kai-squared”) distribution with d degrees of freedom is the law of the random variable Z1² + Z2² + · · · + Zd², where Z1, . . . , Zd ∼ iid N(0, 1).

Examples:

I If Z ∼ N_d(0, I_d), then ‖Z‖₂² ∼ χ²_d.

I χ²₂ = Exp(1/2).

[Figure: densities of the χ² distribution for df = 1, 2, 3, 4, 5, 10 and 20; x-axis from 0 to 25, y-axis from 0.0 to 1.2.]

10/47 Properties of the χ² distribution (2)


Properties: If V ∼ χ²_k, then

I IE[V] = k

I var[V] = 2k
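These two moments are easy to check by simulation, using the definition of χ²_d as a sum of squared standard normals; a small pure-Python sketch (sample size and seed are arbitrary choices, not from the slides):

```python
import random

random.seed(0)

def chi2_draw(d):
    # One draw from chi-squared with d degrees of freedom: sum of d squared N(0,1) variables
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d))

d, reps = 5, 20000
draws = [chi2_draw(d) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((x - mean) ** 2 for x in draws) / reps
# mean should be close to d = 5, var close to 2d = 10
```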

11/47 Important example: the sample variance

I Recall that the sample variance is given by

S_n = (1/n) Σ_{i=1}^n (Xi − X̄n)² = (1/n) Σ_{i=1}^n Xi² − (X̄n)²

I Cochran’s theorem states that for X1, . . . , Xn ∼ iid N(µ, σ²), if S_n is the sample variance, then

I X̄n ⊥⊥ S_n;

I nS_n/σ² ∼ χ²_{n−1}.

I We often prefer the unbiased estimator of σ²:

S̃_n = nS_n/(n − 1) = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²

12/47 Student’s T distribution

Definition. For a positive integer d, the Student’s T distribution with d degrees of freedom (denoted by t_d) is the law of the random variable Z/√(V/d), where Z ∼ N(0, 1), V ∼ χ²_d and Z ⊥⊥ V (Z is independent of V).

13/47 14/47 Who was Student?

This distribution was introduced by William Sealy Gosset (1876–1937) in 1908 while he worked for the Guinness brewery in Dublin, Ireland.

15/47 Student’s T test (one sample, two-sided)

I Let X1, . . . , Xn ∼ iid N(µ, σ²), where both µ and σ² are unknown.

I We want to test:

H0 : µ = 0 vs. H1 : µ ≠ 0

I Test statistic:

Tn = √n X̄n / √(S̃_n)

I Since √n X̄n/σ ∼ N(0, 1) (under H0) and (n − 1)S̃_n/σ² ∼ χ²_{n−1} are independent by Cochran’s theorem, we have:

Tn ∼ t_{n−1}

I Student’s test with (non asymptotic) level α ∈ (0, 1):

ψα = 1I{|Tn| > qα/2},

where qα/2 is the (1 − α/2)-quantile of tn−1.
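A minimal sketch of this statistic in Python (`t_statistic` and `t_test_two_sided` are illustrative names, not from the slides; the quantile q_{α/2} still comes from a t_{n−1} table):

```python
import math

def t_statistic(sample, mu0=0.0):
    # Tn = sqrt(n) (Xbar - mu0) / sqrt(S~n), with S~n the unbiased variance estimator
    n = len(sample)
    xbar = sum(sample) / n
    s2_tilde = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (xbar - mu0) / math.sqrt(s2_tilde)

def t_test_two_sided(sample, mu0, q_half_alpha):
    # Reject H0: mu = mu0 when |Tn| exceeds the (1 - alpha/2)-quantile of t_{n-1}
    return abs(t_statistic(sample, mu0)) > q_half_alpha
```

For n = 3 the 97.5% quantile of t₂ is 4.303, so a sample like [1, 2, 3] tested against µ0 = 0 is not rejected at level 5% despite Tn ≈ 3.46.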

16/47 Student’s T test (one sample, one-sided)

I We want to test:

H0 : µ ≤ µ0, vs H1 : µ > µ0

I Test statistic:

Tn = √n (X̄n − µ0) / √(S̃_n) ∼ t_{n−1}

under H0.

I Student’s test with (non-asymptotic) level α ∈ (0, 1):

ψα = 1I{Tn > qα},

where qα is the (1 − α)-quantile of t_{n−1}.

17/47 Two-sample T-test

I Back to our cholesterol example. What happens for small sample sizes?

I We want to know the distribution of

X¯n − Y¯m − (∆d − ∆c) q 2 2 σˆd σˆc n + m

I We have approximately

(X̄n − Ȳm − (∆d − ∆c)) / √(σ̂d²/n + σ̂c²/m) ∼ t_N,

where

N = (σ̂d²/n + σ̂c²/m)² / ( σ̂d⁴/(n²(n−1)) + σ̂c⁴/(m²(m−1)) ) ≥ min(n, m)

(Welch–Satterthwaite formula)

18/47 Non-asymptotic test

I Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0:

(156.4 − 132.7) / √(5198.4/70 + 3867.0/50) = 1.57

I Using the shorthand formula N = min(n, m) = 50, we get q5% = 1.68 and

p-value = IP(T > 1.57) = 0.0614, where T ∼ t₅₀.
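The Welch–Satterthwaite formula from the previous slide is easy to evaluate in code; a minimal sketch (the function name `welch_df` is illustrative) applied to this example's numbers:

```python
def welch_df(s2_x, n, s2_y, m):
    # Welch-Satterthwaite approximate degrees of freedom
    num = (s2_x / n + s2_y / m) ** 2
    den = s2_x ** 2 / (n ** 2 * (n - 1)) + s2_y ** 2 / (m ** 2 * (m - 1))
    return num / den

N = welch_df(5198.4, 70, 3867.0, 50)  # ≈ 113.78; note N >= min(n, m) = 50 always holds
```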

I Using the W-S formula

I Using the W-S formula:

N = (5198.4/70 + 3867.0/50)² / ( 5198.4²/(70²·69) + 3867.0²/(50²·49) ) = 113.78,

which we round to 113.

I We get p-value = IP(T > 1.57) = 0.0596, where T ∼ t₁₁₃.

19/47 Discussion

Advantage of Student’s test: it is non-asymptotic and can be run on small samples.

Drawback of Student’s test: it relies on the assumption that the sample is Gaussian (soon we will see how to test this assumption).

20/47 A test based on the MLE

I Consider an i.i.d. sample X1, . . . , Xn with statistical model (E, (IP_θ)_{θ∈Θ}), where Θ ⊆ IR^d (d ≥ 1), and let θ0 ∈ Θ be fixed and given.

I Consider the following hypotheses:

H0 : θ = θ0 vs. H1 : θ ≠ θ0.

I Let θ̂_n^MLE be the MLE. Assume the MLE technical conditions are satisfied.

I If H0 is true, then

√n I(θ̂_n^MLE)^{1/2} (θ̂_n^MLE − θ0) −(d)→ N_d(0, I_d) as n → ∞

21/47 Wald’s test

I Hence,

Tn := n (θ̂_n^MLE − θ0)ᵀ I(θ̂_n^MLE) (θ̂_n^MLE − θ0) −(d)→ χ²_d as n → ∞

I Wald’s test with asymptotic level α ∈ (0, 1):

ψ = 1I{Tn > qα},

where qα is the (1 − α)-quantile of χ²_d (see tables).

I Remark: Wald’s test is also valid if H1 has the form “θ > θ0” or “θ < θ0” or “θ = θ1”...
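As a concrete one-dimensional instance (a Bernoulli example, not from the slides): the Fisher information is I(p) = 1/(p(1 − p)), and 3.841 is the 95% quantile of χ²₁; a sketch with illustrative names:

```python
def wald_bernoulli(x_sum, n, p0, q_alpha=3.841):
    # Tn = n (phat - p0)^2 I(phat), with I(p) = 1 / (p (1 - p)); assumes 0 < x_sum < n
    phat = x_sum / n
    t_n = n * (phat - p0) ** 2 / (phat * (1 - phat))
    return t_n, t_n > q_alpha

# 70 successes out of 100 vs p0 = 0.5: Tn = 100 * 0.04 / 0.21 ≈ 19.05 -> reject
```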

22/47 A test based on the log-likelihood

I Consider an i.i.d. sample X1,...,Xn with statistical model d (E, (IPθ)θ∈Θ), where Θ ⊆ IR (d ≥ 1).

I Suppose the null hypothesis has the form

H0 : (θ_{r+1}, . . . , θ_d) = (θ_{r+1}^{(0)}, . . . , θ_d^{(0)}),

for some fixed and given numbers θ_{r+1}^{(0)}, . . . , θ_d^{(0)}.

I Let

θ̂_n = argmax_{θ∈Θ} ℓ_n(θ) (MLE)

and

θ̂_n^c = argmax_{θ∈Θ0} ℓ_n(θ) (“constrained MLE”),

where Θ0 = {θ ∈ Θ : (θ_{r+1}, . . . , θ_d) = (θ_{r+1}^{(0)}, . . . , θ_d^{(0)})}.

23/47 Likelihood ratio test

Test statistic:

Tn = 2 (ℓ_n(θ̂_n) − ℓ_n(θ̂_n^c)).

Wilks’ Theorem. Assume H0 is true and the MLE technical conditions are satisfied. Then,

Tn −(d)→ χ²_{d−r} as n → ∞.

Likelihood ratio test with asymptotic level α ∈ (0, 1):

ψ = 1I{Tn > qα},

where qα is the (1 − α)-quantile of χ²_{d−r} (see tables).
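A concrete instance (again Bernoulli, not from the slides), with d = 1 and r = 0, so that Tn → χ²₁ under H0; the sketch assumes 0 < ΣXi < n so both log-likelihoods are finite:

```python
import math

def loglik(p, x_sum, n):
    # Bernoulli log-likelihood for x_sum successes in n trials
    return x_sum * math.log(p) + (n - x_sum) * math.log(1.0 - p)

def lrt(x_sum, n, p0):
    # Tn = 2 (l(phat) - l(p0)); asymptotically chi-squared with 1 df under H0: p = p0
    phat = x_sum / n
    return 2.0 * (loglik(phat, x_sum, n) - loglik(p0, x_sum, n))
```

Tn is always nonnegative, and equals 0 exactly when the MLE matches p0.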

24/47 Implicit hypotheses

I Let X1, . . . , Xn be i.i.d. random variables and let θ ∈ IR^d be a parameter associated with the distribution of X1 (e.g. a moment, the parameter of a statistical model, etc.).

d k I Let g : IR → IR be continuously differentiable (with k < d).

I Consider the following hypotheses:

H0 : g(θ) = 0 vs. H1 : g(θ) ≠ 0.

I E.g. g(θ) = (θ1, θ2) (k = 2), or g(θ) = θ1 − θ2 (k = 1), or…

25/47 Delta method

I Suppose an asymptotically normal estimator θˆn is available:

√n (θ̂_n − θ) −(d)→ N_d(0, Σ(θ)) as n → ∞.

I Delta method:

√n (g(θ̂_n) − g(θ)) −(d)→ N_k(0, Γ(θ)) as n → ∞,

where Γ(θ) = ∇g(θ)ᵀ Σ(θ) ∇g(θ) ∈ IR^{k×k}.

I Assume Σ(θ) is invertible and ∇g(θ) has rank k. So, Γ(θ) is invertible and

√n Γ(θ)^{−1/2} (g(θ̂_n) − g(θ)) −(d)→ N_k(0, I_k) as n → ∞.

26/47 Wald’s test for implicit hypotheses

I Then, by Slutsky’s theorem, if Γ(θ) is continuous in θ,

√n Γ(θ̂_n)^{−1/2} (g(θ̂_n) − g(θ)) −(d)→ N_k(0, I_k) as n → ∞.

I Hence, if H0 is true, i.e., g(θ) = 0,

Tn := n g(θ̂_n)ᵀ Γ⁻¹(θ̂_n) g(θ̂_n) −(d)→ χ²_k as n → ∞.

I Test with asymptotic level α:

ψ = 1I{Tn > qα},

where qα is the (1 − α)-quantile of χ²_k (see tables).
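A concrete instance (not from the slides): two independent Bernoulli samples of equal size n, θ = (p1, p2) and g(θ) = p1 − p2 (k = 1). Then ∇g = (1, −1)ᵀ and Σ(θ) is diagonal, so Γ(θ) = p1(1 − p1) + p2(1 − p2); a sketch with illustrative names:

```python
def wald_implicit(x_sum, y_sum, n, q_alpha=3.841):
    # Tests H0: p1 = p2 via g(p1, p2) = p1 - p2; 3.841 is the 95% quantile of chi2_1
    p1, p2 = x_sum / n, y_sum / n
    gamma = p1 * (1.0 - p1) + p2 * (1.0 - p2)  # plug-in Gamma(theta)
    t_n = n * (p1 - p2) ** 2 / gamma
    return t_n, t_n > q_alpha
```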

27/47 Goodness of fit

28/47 Goodness of fit tests

Let X be a r.v. Given i.i.d. copies of X, we want to answer the following types of questions:

I Does X have distribution N (0, 1)? (Cf. Student’s T distribution)

I Does X have distribution U([0, 1])?

I Does X have PMF p1 = 0.3, p2 = 0.5, p3 = 0.2?

These are all goodness of fit (GoF) tests: we want to know if the hypothesized distribution is a good fit for the data.

Key characteristic of GoF tests: no parametric modeling.

29/47 The zodiac sign of the most powerful people is....

Can your zodiac sign predict how successful you will be later in life? Fortune magazine collected the signs of 256 heads of the Fortune 500. Fyi: 256/12 = 21.33.

Sign         Count
Aries          23
Taurus         20
Gemini         18
Cancer         23
Leo            20
Virgo          19
Libra          18
Scorpio        21
Sagittarius    19
Capricorn      22
Aquarius       24
Pisces         29

29/47 The zodiac sign of the most successful people is....

In view of this data (same counts as in the table above), is there statistical evidence that successful people are more likely to be born under some sign than others?

29/47 275 jurors with identified racial group. We want to know if the jury is representative of the population of this county.

Race                   White  Black  Hispanic  Other  Total
# jurors                 205     26        25     19    275
proportion in county    0.72   0.07      0.12   0.09      1

29/47 Discrete distribution

Let E = {a1, . . . , aK} be a finite space and (IP_p)_{p∈∆_K} be the family of all probability distributions on E:

I ∆_K = { p = (p1, . . . , pK) ∈ (0, 1)^K : Σ_{j=1}^K p_j = 1 }.

I For p ∈ ∆K and X ∼ IPp,

IPp[X = aj] = pj, j = 1,...,K.

30/47 Goodness of fit test

I Let X1, . . . , Xn ∼ iid IP_p, for some unknown p ∈ ∆_K, and let p⁰ ∈ ∆_K be fixed.

I We want to test:

H0 : p = p⁰ vs. H1 : p ≠ p⁰

with asymptotic level α ∈ (0, 1).

I Example: If p⁰ = (1/K, 1/K, . . . , 1/K), we are testing whether IP_p is the uniform distribution on E.

31/47 Multinomial likelihood

I Likelihood of the model:

L_n(X1, . . . , Xn, p) = p1^{N1} p2^{N2} · · · pK^{NK},

where Nj = #{i = 1, . . . , n : Xi = aj}.

I Let p̂ be the MLE:

p̂_j = N_j / n, j = 1, . . . , K.

I p̂ maximizes log L_n(X1, . . . , Xn, p) under the constraint Σ_j p_j = 1.

32/47 χ2 test

I If H0 is true, then √n(p̂ − p⁰) is asymptotically normal, and the following holds.

Theorem

Tn := n Σ_{j=1}^K (p̂_j − p⁰_j)² / p⁰_j −(d)→ χ²_{K−1} as n → ∞.

I χ² test with asymptotic level α: ψα = 1I{Tn > qα}, where qα is the (1 − α)-quantile of χ²_{K−1}.

I Asymptotic p-value of this test: p-value = IP[Z > Tn | Tn], where Z ∼ χ²_{K−1} and Z ⊥⊥ Tn.
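Applied to the zodiac data above (K = 12, n = 256, uniform null), the statistic falls far below the 95% quantile of χ²₁₁ (≈ 19.68); a sketch:

```python
counts = [23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29]  # zodiac counts from the slide
n = sum(counts)                    # 256
expected = n / len(counts)         # 256/12 ≈ 21.33 under the uniform null
t_n = sum((c - expected) ** 2 / expected for c in counts)
reject = t_n > 19.68               # 95% quantile of chi2 with K - 1 = 11 df
# t_n ≈ 5.09: no evidence against the uniform null
```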

33/47 CDF and empirical CDF

Let X1,...,Xn be i.i.d. real random variables. Recall the cdf of X1 is defined as:

F (t) = IP[X1 ≤ t], ∀t ∈ IR.

It completely characterizes the distribution of X1.

Definition The empirical cdf of the sample X1,...,Xn is defined as:

F_n(t) = (1/n) Σ_{i=1}^n 1I{Xi ≤ t} = #{i = 1, . . . , n : Xi ≤ t} / n, ∀t ∈ IR.
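The definition translates directly into code; a sketch (a linear scan per evaluation, fine for illustration; `make_ecdf` is an illustrative name):

```python
def make_ecdf(sample):
    # Returns the empirical cdf Fn of the sample as a function of t
    xs = sorted(sample)
    n = len(xs)
    def f_n(t):
        # Fraction of observations <= t
        return sum(1 for x in xs if x <= t) / n
    return f_n

f_n = make_ecdf([3.0, 1.0, 2.0, 2.0])
# f_n is a step function: 0 before 1, jumps of size 1/4 at each observation
```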

34/47 Consistency

By the LLN, for all t ∈ IR,

F_n(t) −(a.s.)→ F(t) as n → ∞.

Glivenko-Cantelli Theorem (Fundamental theorem of statistics)

sup_{t∈IR} |F_n(t) − F(t)| −(a.s.)→ 0 as n → ∞.

35/47 Asymptotic normality

By the CLT, for all t ∈ IR,

√n (F_n(t) − F(t)) −(d)→ N(0, F(t)(1 − F(t))) as n → ∞.

Donsker’s Theorem If F is continuous, then

√n sup_{t∈IR} |F_n(t) − F(t)| −(d)→ sup_{0≤t≤1} |B(t)| as n → ∞,

where B is a Brownian bridge on [0, 1].

36/47 Goodness of fit for continuous distributions

I Let X1,...,Xn be i.i.d. real random variables with unknown cdf F and let F 0 be a continuous cdf.

I Consider the two hypotheses:

H0 : F = F⁰ vs. H1 : F ≠ F⁰.

I Let Fn be the empirical cdf of the sample X1,...,Xn.

I If F = F⁰, then F_n(t) ≈ F⁰(t) for all t ∈ IR.

37/47 Kolmogorov-Smirnov test

I Let Tn = √n sup_{t∈IR} |F_n(t) − F⁰(t)|.

I By Donsker’s theorem, if H0 is true, then Tn −(d)→ Z as n → ∞, where Z has a known distribution (the supremum of a Brownian bridge).

I KS test with asymptotic level α:

δα^KS = 1I{Tn > qα},

where qα is the (1 − α)-quantile of Z (obtained in tables).

I p-value of KS test: IP[Z > Tn|Tn].

38/47 Computational issues

I In practice, how do we compute Tn?

I F⁰ is non-decreasing, F_n is piecewise constant, with jumps at t_i = X_i, i = 1, . . . , n.

I Let X(1) ≤ X(2) ≤ ... ≤ X(n) be the reordered sample.

I The expression for Tn reduces to the following practical formula:

Tn = √n max_{i=1,...,n} max{ |(i − 1)/n − F⁰(X_(i))| , |i/n − F⁰(X_(i))| }.
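The practical formula in code; a sketch (`ks_statistic` is an illustrative name), checked below against the uniform cdf on [0, 1]:

```python
import math

def ks_statistic(sample, f0):
    # Tn = sqrt(n) * max_i max(|(i-1)/n - F0(X_(i))|, |i/n - F0(X_(i))|)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, abs((i - 1) / n - f0(x)), abs(i / n - f0(x)))
    return math.sqrt(n) * d
```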

39/47 Pivotal distribution

I Tn is called a pivotal statistic: if H0 is true, the distribution of Tn does not depend on the distribution of the Xi’s, and it is easy to reproduce it in simulations.

I Indeed, let U_i = F⁰(X_i), i = 1, . . . , n, and let G_n be the empirical cdf of U1, . . . , Un.

I If H0 is true, then U1, . . . , Un ∼ iid U([0, 1]) and

Tn = √n sup_{0≤x≤1} |G_n(x) − x|.

40/47 Quantiles and p-values

I For some large integer M:

I Simulate M i.i.d. copies Tn^1, . . . , Tn^M of Tn;

I Estimate the (1 − α)-quantile qα^(n) of Tn by taking the sample (1 − α)-quantile q̂α^(n,M) of Tn^1, . . . , Tn^M.

I Test with approximate level α:

δα = 1I{Tn > q̂α^(n,M)}.

I Approximate p-value of this test:

p-value ≈ #{j = 1, . . . , M : Tn^j > Tn} / M.
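The Monte Carlo recipe above, drawing each copy of Tn from uniforms as on the previous slide; M, n and the seed are arbitrary choices (illustrative helper name):

```python
import math
import random

random.seed(1)

def ks_draw(n):
    # One draw of Tn under H0, using U(0,1) samples and the order-statistic formula
    us = sorted(random.random() for _ in range(n))
    d = max(max(abs((i - 1) / n - u), abs(i / n - u)) for i, u in enumerate(us, 1))
    return math.sqrt(n) * d

M, n = 2000, 20
sims = sorted(ks_draw(n) for _ in range(M))
q95_hat = sims[int(0.95 * M)]   # approximate 95% quantile of the pivotal law
# for n = 20 this should be near sqrt(20) * 0.294 ≈ 1.32 (cf. the K-S table below)
```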


K-S table (Appendix 3: Kolmogorov–Smirnov Tables)

Critical values, d_α(n), of the maximum absolute difference between the sample cumulative distribution F_n(x) and the population cumulative distribution F(x).

Level of significance α, by number of trials n:

n      0.10     0.05     0.02     0.01
1    0.95000  0.97500  0.99000  0.99500
2    0.77639  0.84189  0.90000  0.92929
3    0.63604  0.70760  0.78456  0.82900
4    0.56522  0.62394  0.68887  0.73424
5    0.50945  0.56328  0.62718  0.66853
6    0.46799  0.51926  0.57741  0.61661
7    0.43607  0.48342  0.53844  0.57581
8    0.40962  0.45427  0.50654  0.54179
9    0.38746  0.43001  0.47960  0.51332
10   0.36866  0.40925  0.45662  0.48893
11   0.35242  0.39122  0.43670  0.46770
12   0.33815  0.37543  0.41918  0.44905
13   0.32549  0.36143  0.40362  0.43247
14   0.31417  0.34890  0.38970  0.41762
15   0.30397  0.33760  0.37713  0.40420
16   0.29472  0.32733  0.36571  0.39201
17   0.28627  0.31796  0.35528  0.38086
18   0.27851  0.30936  0.34569  0.37062
19   0.27136  0.30143  0.33685  0.36117
20   0.26473  0.29408  0.32866  0.35241
21   0.25858  0.28724  0.32104  0.34427
22   0.25283  0.28087  0.31394  0.33666
23   0.24746  0.27490  0.30728  0.32954
24   0.24242  0.26931  0.30104  0.32286

42/47

Practical Reliability Engineering, Fifth Edition. Patrick D. T. O’Connor and Andre Kleyner. © 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Other goodness of fit tests

We want to measure the distance between two functions: Fn(t) and F (t). There are other ways, leading to other tests:

I Kolmogorov-Smirnov:

d(F_n, F) = sup_{t∈IR} |F_n(t) − F(t)|

I Cramér–von Mises:

d²(F_n, F) = ∫_IR [F_n(t) − F(t)]² dF(t)

I Anderson-Darling:

d²(F_n, F) = ∫_IR [F_n(t) − F(t)]² / (F(t)(1 − F(t))) dF(t)

43/47 Composite goodness of fit tests

What if I want to test “Does X have a Gaussian distribution?” but I don’t know the parameters? Simple idea: plug-in:

sup_{t∈IR} |F_n(t) − Φ_{µ̂,σ̂²}(t)|,

where µ̂ = X̄n, σ̂² = S_n, and Φ_{µ̂,σ̂²}(t) is the cdf of N(µ̂, σ̂²).

In this case Donsker’s theorem is no longer valid. This is a common and serious mistake!

44/47 Kolmogorov-Lilliefors test (1)

Instead, we compute the quantiles for the test statistic:

sup_{t∈IR} |F_n(t) − Φ_{µ̂,σ̂²}(t)|.

They do not depend on the unknown parameters!

This is the Kolmogorov-Lilliefors test.
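A sketch of the Kolmogorov–Lilliefors statistic (the Gaussian cdf via `math.erf`; helper names are illustrative). Its null quantiles would come from the same Monte Carlo recipe as before, simulating from any fixed Gaussian, since the statistic is pivotal under H0:

```python
import math

def normal_cdf(t, mu, s2):
    # cdf of N(mu, s2) via the error function
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * s2)))

def kl_statistic(sample):
    # sup_t |Fn(t) - Phi_{muhat, sigmahat^2}(t)| with plug-in mean and variance
    n = len(sample)
    mu = sum(sample) / n
    s2 = sum((x - mu) ** 2 for x in sample) / n
    xs = sorted(sample)
    return max(max(abs((i - 1) / n - normal_cdf(x, mu, s2)),
                   abs(i / n - normal_cdf(x, mu, s2)))
               for i, x in enumerate(xs, 1))
```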

[Residue of a scanned page: Lilliefors (1967), American Statistical Association Journal, p. 400. Key points: the Monte Carlo critical values for this statistic are in most cases approximately two-thirds of the standard Kolmogorov–Smirnov values; the critical values in its Table 1 at the .01 significance level are, for each N, slightly smaller than the standard-table values at the .20 level; so using the standard K-S table when the mean and variance are estimated from the sample is badly miscalibrated.]

K-L table — Table 1: Table of critical values of D. Any value of D greater than or equal to the tabulated value is significant at the indicated level of significance. These values were obtained as a result of Monte Carlo calculations, using 1,000 or more samples for each value of N. [The table entries themselves did not survive extraction.]

46/47 Quantile-Quantile (QQ) plots (1)

I Provide a visual way to perform GoF tests

I Not a formal test, but a quick and easy check to see if a distribution is plausible.

I Main idea: we want to check visually if the plot of F_n is close to that of F or, equivalently, if the plot of F_n^{−1} is close to that of F^{−1}.

I More convenient to check if the points

(F^{−1}(1/n), F_n^{−1}(1/n)), (F^{−1}(2/n), F_n^{−1}(2/n)), . . . , (F^{−1}((n−1)/n), F_n^{−1}((n−1)/n))

are near the line y = x.

I Fn is not technically invertible but we define

F_n^{−1}(i/n) = X_(i),

the ith smallest observation.
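With this convention, the QQ points are easy to compute; a sketch (`f0_inv` stands for the target quantile function F^{−1}; the point for i = n is dropped, as on the slide):

```python
def qq_points(sample, f0_inv):
    # Points (F^{-1}(i/n), Fn^{-1}(i/n)) = (F^{-1}(i/n), X_(i)), i = 1..n-1
    xs = sorted(sample)
    n = len(xs)
    return [(f0_inv(i / n), xs[i - 1]) for i in range(1, n)]

pts = qq_points([0.75, 0.25, 1.0, 0.5], lambda u: u)  # uniform target: points on y = x
```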

47/47 Quantile-Quantile (QQ) plots (2)

47/47 Quantile-Quantile (QQ) plots (3)

Figure 1: QQ-plots for samples of sizes 10, 50, 100, 1000, 5000, 10000 from a standard Gaussian distribution. The upper-left figure is for sample size 10, the lower-right is for sample size 10000.

Figure 2: QQ-plots for samples of sizes 10, 50, 100, 1000, 5000, 10000 from a t15 distribution. The upper-left figure is for sample size 10, the lower-right is for sample size 10000.