Kolmogorov-Smirnov test Mann-Whitney test Brief summary Quiz

Hypothesis testing III

Botond Szabo

Leiden University

Leiden, 16 April 2018

Outline

1 Kolmogorov-Smirnov test

2 Mann-Whitney test

3 Normality test

4 Brief summary

5 Quiz

One-sample Kolmogorov-Smirnov test

IID observations X1, ..., Xn ∼ F. Want to test

H0 : F = F0 versus H1 : F ≠ F0,

where F0 is some fixed CDF. Test statistic

$$T_n = \sqrt{n}\, D_n = \sqrt{n}\, \sup_x |\hat{F}_n(x) - F_0(x)|.$$

If X1,..., Xn ∼ F0 and F0 is continuous, then the asymptotic distribution of Tn is independent of F0 and is given by the Kolmogorov distribution.

Reject the null hypothesis if Tn > Kα, where Kα is the 1 − α quantile of the Kolmogorov distribution.
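The one-sample procedure above can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are available; the helper name `ks_statistic`, the simulated sample and the choice F0 = standard normal are assumptions made for the example, not part of the lecture. SciPy's `kstwobign` is used here as the limiting Kolmogorov distribution of √n Dn.

```python
import numpy as np
from scipy import stats

def ks_statistic(x, cdf):
    """Return T_n = sqrt(n) * sup_x |F_hat_n(x) - F0(x)| for a continuous F0."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    f0 = cdf(x)
    # The supremum is attained at a jump of the empirical CDF, so compare F0
    # with the ECDF value just after (i/n) and just before ((i-1)/n) each point.
    d_plus = np.max(np.arange(1, n + 1) / n - f0)
    d_minus = np.max(f0 - np.arange(0, n) / n)
    return np.sqrt(n) * max(d_plus, d_minus)

# Hypothetical data: a standard normal sample, tested against F0 = N(0, 1).
rng = np.random.default_rng(0)
sample = rng.normal(size=200)

t_n = ks_statistic(sample, stats.norm.cdf)
k_alpha = stats.kstwobign.ppf(0.95)   # 1 - alpha quantile, alpha = 0.05
reject = t_n > k_alpha
print(f"T_n = {t_n:.3f}, K_alpha = {k_alpha:.3f}, reject H0: {reject}")
```

In practice `scipy.stats.kstest(sample, "norm")` returns Dn and a p-value directly; the manual version is only meant to mirror the formula for Tn.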

Kolmogorov-Smirnov test (estimated parameters)

IID observations X1, ..., Xn ∼ F. Want to test

H0 : F ∈ ℱ versus H1 : F ∉ ℱ,

where ℱ = {Fθ : θ ∈ Θ}. Test statistic

$$T_n = \sqrt{n}\, D_n = \sqrt{n}\, \sup_x |\hat{F}_n(x) - F_{\hat{\theta}}(x)|,$$

where θˆ is an estimate of θ based on the Xi's. Astronomers often proceed by treating θˆ as a constant and use the critical values from the usual Kolmogorov-Smirnov test.

Kolmogorov-Smirnov test (continued)

This is a faulty practice: asymptotically Tn will typically not have the Kolmogorov distribution. Extra work required to obtain the correct critical values.

The case when Fθ is the normal CDF has been worked out explicitly (Lilliefors test).
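One way to see why the usual critical values fail is to simulate the null distribution of Tn when the normal parameters are estimated from the data. The sketch below is a Monte Carlo illustration under stated assumptions (sample size 50, 2000 replications, mean and standard deviation estimated by their sample versions); it is not the tabulated Lilliefors procedure itself, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def ks_stat_estimated_normal(x):
    """T_n with F_theta-hat the normal CDF fitted by sample mean and sample sd."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    f = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    d = max(np.max(np.arange(1, n + 1) / n - f), np.max(f - np.arange(0, n) / n))
    return np.sqrt(n) * d

rng = np.random.default_rng(1)
n, n_sim = 50, 2000   # illustrative choices

# Under H0 the statistic is invariant to location and scale, so simulating
# from the standard normal covers every normal distribution.
sims = np.array([ks_stat_estimated_normal(rng.normal(size=n)) for _ in range(n_sim)])

mc_critical = np.quantile(sims, 0.95)         # simulated 5% critical value
naive_critical = stats.kstwobign.ppf(0.95)    # Kolmogorov quantile, theta treated as known
print(f"simulated: {mc_critical:.3f}, naive Kolmogorov: {naive_critical:.3f}")
```

The simulated critical value comes out well below the Kolmogorov quantile, so using the usual table with estimated parameters makes the test too conservative.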

Two-sample Kolmogorov-Smirnov test

IID observations X1, ..., Xn ∼ FX and Y1, ..., Ym ∼ FY. Want to test

H0 : FX = FY versus H1 : FX ≠ FY.

Test statistic

$$T_{n,m} = \sqrt{\frac{nm}{n+m}}\, D_{n,m} = \sqrt{\frac{nm}{n+m}}\, \sup_x |\hat{F}_X(x) - \hat{F}_Y(x)|.$$

The limit distribution of Tn,m under the null hypothesis is the Kolmogorov distribution.

Reject the null hypothesis if Tn,m > Kα, where Kα is the 1 − α quantile of the Kolmogorov distribution.
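The two-sample statistic can be computed directly from the two empirical CDFs. The following is a sketch with simulated data (a standard normal sample and a normal sample shifted by 1, both assumptions of the example); `scipy.stats.ks_2samp` is shown only as a cross-check of Dn,m.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=300)              # hypothetical sample from F_X
y = rng.normal(loc=1.0, size=200)     # hypothetical sample from F_Y (shifted, so H0 is false)
n, m = x.size, y.size

# Both ECDFs are step functions that only change at data points, so the
# supremum of their difference is attained at one of the pooled observations.
pooled = np.concatenate([x, y])
ecdf_x = np.searchsorted(np.sort(x), pooled, side="right") / n
ecdf_y = np.searchsorted(np.sort(y), pooled, side="right") / m
d_nm = np.max(np.abs(ecdf_x - ecdf_y))

t_nm = np.sqrt(n * m / (n + m)) * d_nm
k_alpha = stats.kstwobign.ppf(0.95)   # 1 - alpha quantile of the Kolmogorov distribution
reject = t_nm > k_alpha
print(f"D = {d_nm:.3f}, T = {t_nm:.3f}, reject H0: {reject}")

# Cross-check of D against SciPy's implementation.
d_scipy = stats.ks_2samp(x, y).statistic
```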

Mann-Whitney test

IID observations X1, ..., Xn ∼ FX and Y1, ..., Ym ∼ FY. Want to test

H0 : FX = FY versus H1 : FX ≠ FY.

Mann-Whitney test (or Wilcoxon rank sum test). Basic idea: group all the n + m observations together, rank them in order of increasing size and look at the rank sum of the Y's, say R (the Wilcoxon statistic). If R takes a value that is too unlikely compared to what we could have obtained under the null hypothesis, we reject the null hypothesis.

Example

Consider the data in the following table (ranks are shown in parentheses):

X's      Y's
1 (1)    6 (4)
3 (2)    4 (3)

The rank sum of the X's is 3 and that of the Y's is R = 7. Under the null hypothesis each of the $\binom{4}{2} = 6$ assignments of ranks to the Y's is equally likely.

Example (continued)

Ranks of Y's   R
{1, 2}         3
{1, 3}         4
{1, 4}         5
{2, 3}         5
{2, 4}         6
{3, 4}         7

Example (continued)

Under the null hypothesis the distribution of R is:

r          3     4     5     6     7
P(R = r)   1/6   1/6   1/3   1/6   1/6

In particular, P(R = 7) = 1/6. So if the null hypothesis were true, the rank sum we saw would occur one time out of six purely on the basis of chance.
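The enumeration in this example can be reproduced mechanically. The sketch below lists every equally likely assignment of ranks to the Y's and tabulates the rank sum; the function name is hypothetical and the approach is only practical for small n and m.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rank_sum_null_distribution(n, m):
    """Exact null distribution of the rank sum R of the m Y's among n + m ranks."""
    total = n + m
    # Under H0 every size-m subset of the ranks {1, ..., n + m} is equally likely.
    counts = Counter(sum(ranks) for ranks in combinations(range(1, total + 1), m))
    n_assignments = sum(counts.values())
    return {r: Fraction(c, n_assignments) for r, c in sorted(counts.items())}

dist = rank_sum_null_distribution(2, 2)
for r, p in dist.items():
    print(r, p)

# Probability of a rank sum at least as large as the observed R = 7.
p_value_upper = sum(p for r, p in dist.items() if r >= 7)
```

With only six possible assignments, the smallest attainable one-sided p-value is 1/6, so no outcome of this tiny experiment could be significant at the 5% level.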

Mann-Whitney test: formalisation

It can be shown that the rank sum (Wilcoxon statistic) can be expressed in terms of the Mann-Whitney statistic

$$U = \frac{1}{mn} \sum_{i=1}^{n} \sum_{j=1}^{m} 1_{[X_i < Y_j]}.$$

The latter is an estimator of P(X < Y). If FX = FY, then P(X < Y) = 1/2. The statistic U is trying to detect deviation from this.
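As a small check of the link between the two statistics, the sketch below computes U and the rank sum R of the Y's for the four observations of the earlier example. It assumes there are no ties, in which case the identity R = mnU + m(m + 1)/2 holds; that identity is stated here as a standard fact and is not spelled out on the slide.

```python
import numpy as np
from scipy import stats

# Data from the worked example: two X's and two Y's.
x = np.array([1.0, 3.0])
y = np.array([6.0, 4.0])
n, m = x.size, y.size

# U: fraction of the n * m pairs (X_i, Y_j) with X_i < Y_j.
u = np.mean(x[:, None] < y[None, :])

# R: rank sum of the Y's in the pooled sample.
ranks = stats.rankdata(np.concatenate([x, y]))
r_y = ranks[n:].sum()

# Without ties, the rank of Y_j is (number of X's below it) + (its rank among the Y's).
r_from_u = m * n * u + m * (m + 1) / 2
print(u, r_y, r_from_u)
```

Here every X lies below every Y, so U takes its maximal value 1 and R takes its maximal value 7.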

Shapiro-Wilk test

IID observations X1, ..., Xn ∼ F. Want to test

H0 : F is normal versus H1 : F is not normal.

Test statistic

$$W_n = \frac{\left(\sum_{i=1}^{n} a_i X_{(i)}\right)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2},$$

where X(i) is the ith order statistic,

$$(a_1, \ldots, a_n) = \frac{m^T V^{-1}}{(m^T V^{-1} V^{-1} m)^{1/2}},$$

with mi's the expectations of the order statistics of IID standard normal random variables Z1, ..., Zn, and V the corresponding covariance matrix.

Shapiro-Wilk test (continued)

The test rejects for small values of Wn.

The distribution of Wn is tabulated (or approximated), which allows determination of the critical values.
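In software the coefficients ai and the null distribution of Wn are handled internally. A minimal sketch using `scipy.stats.shapiro`, with two simulated samples of size 100 (one normal, one exponential) chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_sample = rng.normal(size=100)        # hypothetical data satisfying H0
skewed_sample = rng.exponential(size=100)   # hypothetical data violating H0

# shapiro returns the statistic W_n and an approximate p-value.
w_normal, p_normal = stats.shapiro(normal_sample)
w_skewed, p_skewed = stats.shapiro(skewed_sample)

# Small values of W_n speak against normality.
print(f"normal sample:      W = {w_normal:.3f}, p = {p_normal:.3f}")
print(f"exponential sample: W = {w_skewed:.3f}, p = {p_skewed:.3g}")
```

The exponential sample should give a visibly smaller Wn, in line with the rule that the test rejects for small values.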

Brief summary

The Wald test relies on an asymptotic argument and is useful in large-sample settings.

The t-test is exact (no asymptotics) and useful, but makes the normality assumption.

The likelihood ratio test is excellent, and even the best in various senses in various setups, but makes parametric assumptions.

There are many normality tests; Shapiro-Wilk is just one example. They nicely complement graphical tools for checking normality (e.g. the QQ-plot). They are not so useful with small samples, but small samples always pose difficulties.

Nonparametric tests are nice: sometimes exact, other times based on asymptotic arguments.

No silver bullet.

Question 1

What is not true for the one-sample Kolmogorov-Smirnov test?

Answers:
1 It is a nonparametric test.
2 The test statistic is $T_n = \sqrt{n}\, \sup_x |\hat{F}_n(x) - F_0(x)|$.
3 The asymptotic distribution of Tn depends on F0.
4 The asymptotic distribution of Tn is the Kolmogorov distribution.

Question 2

What is the basic idea in the Mann-Whitney test for checking if two data sets are coming from the same distribution?

Answers:
1 Rank the observations and check if the rank sum of the first sample is not too unlikely.
2 Check if the average of the observations is different.
3 The empirical distributions are close to each other.
4 The difference of the observations should concentrate around zero.

Question 3

What is not true for the Shapiro-Wilk test?

Answers:
1 It is a nonparametric test.
2 It is a normality test.
3 The distribution of the test statistic Wn is given in a table.
4 It is the single best test to check normality and especially useful for small sample size.
