STAT 416 – Nonparametric Tests

1. Test for Randomness (Trend)

(1). Test based on total number of runs

(2). Runs Up and Down

(3). Rank von Neumann Test

2. Goodness-of-fit test.

$H_0 : F_X(x) = F_0(x)$ for all x, where $F_0(x)$ is specified.

(1). Chi-square Test.

\[ Q = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \sim \chi^2(k - 1) \quad \text{under the null hypothesis}, \]

where $f_i$ is the observed frequency in the i-th group and $e_i$ is the expected frequency in the i-th group, $i = 1, \ldots, k$.

(2). Kolmogorov-Smirnov Test.

\[ D_n = \sup_x |F_n(x) - F_0(x)| \]

where $F_n(x)$ is the empirical cdf of the observed sample.
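The two goodness-of-fit statistics above can be computed directly from their definitions. The following is a minimal sketch, not a reference implementation: the function names and the data are hypothetical, and the Kolmogorov-Smirnov part assumes a continuous $F_0$.

```python
def chi_square_gof(observed, expected):
    """Q = sum_i (f_i - e_i)^2 / e_i over the k groups."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def ks_statistic(sample, cdf0):
    """D_n = sup_x |F_n(x) - F_0(x)|, assuming F_0 is continuous."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf0(x)
        # The empirical cdf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs((i + 1) / n - f0), abs(i / n - f0))
    return d


def uniform_cdf(x):
    """cdf of Uniform(0, 1), used here as a hypothetical F_0."""
    return min(max(x, 0.0), 1.0)


# Hypothetical data: 60 die rolls against a fair die, 4 points against Uniform(0, 1).
q = chi_square_gof([8, 12, 9, 11, 10, 10], [10] * 6)
d = ks_statistic([0.1, 0.4, 0.7, 0.9], uniform_cdf)
```

Q would then be compared with a $\chi^2(k - 1)$ critical value and $D_n$ with a Kolmogorov-Smirnov table; neither lookup is included in the sketch.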

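3. Location Test for One-Sample and Paired-Sample.

Data $\{X_i\}_{i=1}^{n}$, or $\{D_i = Y_i - X_i\}_{i=1}^{n}$ for a paired sample $\{(X_i, Y_i)\}_{i=1}^{n}$.

Hypothesis $H_0 : M = M_0$ (equivalently $\theta = P(X > M_0) = 0.5$), where $M_0$ is specified.

(1). Sign Test

\[ K = \sum_{i=1}^{n} I\{X_i > M_0\} \sim \text{Binomial}(n, p = 0.5) \quad \text{under } H_0 \]

(2). Wilcoxon Signed-Rank Test

\[ T^{+} = \sum_{i=1}^{n} Z_i \cdot r(|D_i|) \]

where $D_i = X_i - M_0$, $Z_i = I\{D_i > 0\}$, and $r(|D_i|)$ is the rank of $|D_i|$ among $|D_1|, \ldots, |D_n|$.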
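Both one-sample location statistics are short to compute by hand or in code. The sketch below is illustrative only: the function names and data are hypothetical, observations equal to $M_0$ are dropped (an assumption, one common convention), and tied $|D_i|$ receive midranks (also an assumption).

```python
from math import comb


def sign_test(xs, m0):
    """Return K = #{X_i > M0} and a two-sided exact p-value under Binomial(n, 0.5)."""
    xs = [x for x in xs if x != m0]  # assumption: ties with M0 are dropped
    n = len(xs)
    k = sum(x > m0 for x in xs)
    # Probability of the smaller tail, doubled for a two-sided test.
    tail = sum(comb(n, j) for j in range(min(k, n - k) + 1)) / 2 ** n
    return k, min(1.0, 2 * tail)


def signed_rank(xs, m0):
    """T+ = sum of the ranks of |D_i| over the positive D_i (midranks for ties)."""
    ds = [x - m0 for x in xs if x != m0]
    order = sorted(abs(d) for d in ds)

    def midrank(a):
        lo = order.index(a)        # 0-based position of the first copy of a
        hi = lo + order.count(a)   # one past the last copy of a
        return (lo + 1 + hi) / 2   # average of the 1-based ranks lo+1, ..., hi

    return sum(midrank(abs(d)) for d in ds if d > 0)


# Hypothetical sample tested against M0 = 2.0.
sample = [1.5, 2.5, 3.0, 4.2, 5.1, 0.8]
k, p_value = sign_test(sample, 2.0)
t_plus = signed_rank(sample, 2.0)
```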

4. Test for General Two-Sample

Data: $\{X_1, \ldots, X_m\}$ and $\{Y_1, \ldots, Y_n\}$, two independent samples.

(1). Hypothesis $H_0 : F_X(x) = F_Y(x)$ for all x. Kolmogorov-Smirnov Test.

\[ D_{m,n} = \sup_x |F_{m,X}(x) - F_{n,Y}(x)| \]

where $F_{m,X}$ and $F_{n,Y}$ are the empirical cdfs of samples X and Y.

(2). Location Test for General Two-Sample from a Location Family. Hypothesis $H_0 : M_X = M_Y$.

a). Median Test: $U = \sum_{i=1}^{m} I\{X_i < M\}$, where M is the median of the pooled sample $\{X_1, \ldots, X_m, Y_1, \ldots, Y_n\}$.

b). Control Median (Y is a control sample): $V = \sum_{i=1}^{m} I\{X_i < M_Y\}$

c). Mann-Whitney Test (for tendency):

\[ U = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \qquad D_{ij} = I\{Y_j < X_i\} \]
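The two-sample statistics in this section reduce to counting. A minimal sketch follows, with hypothetical function names and data; it evaluates the two empirical cdfs only at the pooled sample points, which is enough because both are right-continuous step functions.

```python
from statistics import median


def ks_two_sample(xs, ys):
    """D_{m,n} = sup_x |F_{m,X}(x) - F_{n,Y}(x)|, evaluated at the pooled points."""
    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in list(xs) + list(ys))


def median_test_u(xs, ys):
    """U = #{X_i < M}, with M the median of the pooled sample."""
    m_pooled = median(list(xs) + list(ys))
    return sum(x < m_pooled for x in xs)


def mann_whitney_u(xs, ys):
    """U = sum_i sum_j I{Y_j < X_i}."""
    return sum(y < x for x in xs for y in ys)


# Hypothetical samples with m = 4 and n = 5.
x_sample = [1, 3, 5, 7]
y_sample = [2, 4, 6, 8, 10]
d_mn = ks_two_sample(x_sample, y_sample)
u_median = median_test_u(x_sample, y_sample)
u_mw = mann_whitney_u(x_sample, y_sample)
```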

5. Linear Rank Test for Location Problems

Hypothesis $H_0 : F_Y(x) = F_X(x)$ for all x, vs. $H_1 : F_Y(x) = F_X(x - \theta)$ for all x and some $\theta \neq 0$. Pooled sample size $N = n + m$.

Wilcoxon Rank Sum Test. $H_0 : \theta = 0$ vs. $H_1 : \theta \neq 0$,

\[ W_N = \sum_{i=1}^{N} i Z_i, \]

where $Z_i = 1$ if the i-th ordered random variable is an X; otherwise $Z_i = 0$.
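As an illustration, $W_N$ is the sum of the pooled ranks occupied by the X observations. The sketch below uses a hypothetical function name and data and assumes there are no ties in the pooled sample.

```python
def wilcoxon_rank_sum(xs, ys):
    """W_N = sum_i i * Z_i over the pooled order statistics (no ties assumed)."""
    # Tag each value with Z = 1 for an X and Z = 0 for a Y, then sort the pooled sample.
    pooled = sorted([(v, 1) for v in xs] + [(v, 0) for v in ys])
    return sum(i * z for i, (_, z) in enumerate(pooled, start=1))


# Hypothetical samples: the X values occupy pooled ranks 1, 3, 5, 7.
w = wilcoxon_rank_sum([1, 3, 5, 7], [2, 4, 6, 8, 10])
```

Without ties, $W_N$ equals the count of pairs with $Y_j < X_i$ plus $m(m+1)/2$; in this example that is 6 + 10 = 16.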

6. Linear Rank Test for Scale Problems

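Hypothesis $H_0 : F_Y(x) = F_X(x)$ for all x, vs. $H_1 : F_Y(x) = F_X(x \cdot \theta)$ for all x and some $\theta > 0$, $\theta \neq 1$.

(1). Mood Test:

\[ M_N = \sum_{i=1}^{N} \left( i - \frac{N+1}{2} \right)^2 Z_i \]

(2). Siegel-Tukey Test

\[ S_N = \sum_{i=1}^{N} a_i Z_i, \quad \text{where } a_i = \begin{cases} 2i & \text{for } i \text{ even},\ 1 < i \le N/2 \\ 2i - 1 & \text{for } i \text{ odd},\ 1 \le i \le N/2 \\ 2(N - i) + 2 & \text{for } i \text{ even},\ N/2 < i \le N \\ 2(N - i) + 1 & \text{for } i \text{ odd},\ N/2 < i \le N \end{cases} \]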
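The Mood and Siegel-Tukey statistics differ only in the weight attached to each pooled position. The sketch below is a hedged illustration with hypothetical function names and data; it assumes no ties, an even pooled size N for the Siegel-Tukey weights, and weight 1 for position i = 1.

```python
def indicator_vector(xs, ys):
    """Z_i = 1 if the i-th smallest pooled value is an X, else 0 (no ties assumed)."""
    pooled = sorted([(v, 1) for v in xs] + [(v, 0) for v in ys])
    return [z for _, z in pooled]


def mood_statistic(z):
    """M_N = sum_i (i - (N + 1)/2)^2 Z_i."""
    n_total = len(z)
    return sum((i - (n_total + 1) / 2) ** 2 * zi for i, zi in enumerate(z, start=1))


def siegel_tukey_weights(n_total):
    """Weights a_i from the four-case definition; assumes N is even."""
    weights = []
    for i in range(1, n_total + 1):
        if i <= n_total / 2:
            weights.append(2 * i if i % 2 == 0 else 2 * i - 1)
        else:
            weights.append(2 * (n_total - i) + (2 if i % 2 == 0 else 1))
    return weights


def siegel_tukey_statistic(z):
    """S_N = sum_i a_i Z_i."""
    return sum(a * zi for a, zi in zip(siegel_tukey_weights(len(z)), z))


# Hypothetical samples: the X values sit in the middle of the pooled ordering.
z = indicator_vector([4, 5, 6], [1, 2, 9])
m_n = mood_statistic(z)
s_n = siegel_tukey_statistic(z)
```

For N = 6 the weights are 1, 4, 5, 6, 3, 2, so the extremes receive the small weights and the center the large ones.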

(3). Sukhatme Test

\[ T = \sum_{i=1}^{m} \sum_{j=1}^{n} D_{ij}, \quad \text{where } D_{ij} = I\{Y_j < X_i < 0 \text{ or } 0 < X_i < Y_j\} \]
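The Sukhatme statistic counts the pairs in which $X_i$ lies between zero and $Y_j$. A minimal sketch, with a hypothetical function name and data, and assuming both samples are already centered so that their common median is 0:

```python
def sukhatme_statistic(xs, ys):
    """T = sum_i sum_j I{Y_j < X_i < 0 or 0 < X_i < Y_j}."""
    return sum((y < x < 0) or (0 < x < y) for x in xs for y in ys)


# Hypothetical centered samples.
t = sukhatme_statistic([-1, 0.5, 2], [-3, -0.5, 1, 4])
```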

7. Test for Equality of k Independent Samples

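All samples are from a location model $F(x - \theta_i)$, $i = 1, \ldots, k$.

Hypothesis: $H_0 : \theta_1 = \theta_2 = \cdots = \theta_k$ vs. $H_1 : \theta_i \neq \theta_j$ for at least one pair.

(1). Extension of Median Test and Control Median Test

(3). Kruskal-Wallis Test:

\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i} \left[ R_i - \frac{n_i (N+1)}{2} \right]^2 \]

where $R_i$ is the rank sum of the i-th sample and $n_i$ is the size of the i-th sample.

(4). Multiple Comparison

\[ Z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}} \]

where $\bar{R}_i = R_i / n_i$ is the average rank of the i-th sample.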

Compare $|Z_{ij}|$ with the upper-tail normal critical value $z^* = \Phi^{-1}(1 - \alpha^*/(k(k-1)))$, with overall level $\alpha^* = 0.20$ for the multiple comparison.
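The Kruskal-Wallis statistic and the pairwise $Z_{ij}$ both start from the rank sums of the pooled sample. The sketch below is illustrative: the function names and data are hypothetical, ties are assumed absent, and $\bar{R}_i$ is taken to be $R_i / n_i$.

```python
from math import sqrt


def rank_sums(samples):
    """Rank the pooled data (no ties assumed) and return the rank sum R_i of each sample."""
    pooled = sorted((v, g) for g, sample in enumerate(samples) for v in sample)
    sums = [0] * len(samples)
    for rank, (_, g) in enumerate(pooled, start=1):
        sums[g] += rank
    return sums


def kruskal_wallis(samples):
    """H = 12 / (N (N + 1)) * sum_i (R_i - n_i (N + 1) / 2)^2 / n_i."""
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    r = rank_sums(samples)
    spread = sum((ri - ni * (n_total + 1) / 2) ** 2 / ni for ri, ni in zip(r, sizes))
    return 12 * spread / (n_total * (n_total + 1))


def pairwise_z(samples, i, j):
    """Z_ij = (Rbar_i - Rbar_j) / sqrt(N (N + 1) / 12 * (1/n_i + 1/n_j))."""
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    r = rank_sums(samples)
    diff = r[i] / sizes[i] - r[j] / sizes[j]
    return diff / sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[i] + 1 / sizes[j]))


# Hypothetical data: three fully separated samples of size 3.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis(groups)
z_02 = pairwise_z(groups, 0, 2)
```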

8. Measure of Association for Bivariate Sample

Data $\{(X_i, Y_i),\ i = 1, \ldots, n\}$. Hypothesis $H_0$ : the two samples are independent.

(1). Kendall’s Tau Statistic for $\tau = p_c - p_d$ (difference of the probabilities of concordance and discordance)

\[ T = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}}{n(n-1)}, \quad \text{where } A_{ij} = \operatorname{sign}(X_j - X_i) \operatorname{sign}(Y_j - Y_i) \]

(2). Spearman’s Rho (coefficient of rank correlation)

\[ R = \frac{12 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)} = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)} \]

where $R_i$ and $S_i$ are the ranks of $X_i$ and $Y_i$ respectively, and $D_i = R_i - S_i$.
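Both association measures can be computed straight from their definitions. A minimal sketch under the assumption of no ties in either coordinate; the function names and data are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(xs, ys):
    """T = sum_i sum_j A_ij / (n (n - 1)), A_ij = sign(X_j - X_i) sign(Y_j - Y_i)."""
    n = len(xs)
    total = sum(sign(xs[j] - xs[i]) * sign(ys[j] - ys[i])
                for i in range(n) for j in range(n))
    return total / (n * (n - 1))


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """R = 1 - 6 sum_i D_i^2 / (n (n^2 - 1)), D_i = R_i - S_i."""
    n = len(xs)
    d2 = sum((r - s) ** 2 for r, s in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical bivariate sample with two discordant pairs out of ten.
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 1, 4, 3, 5]
tau = kendall_tau(x_vals, y_vals)
rho = spearman_rho(x_vals, y_vals)
```

The double sum visits every pair twice, which is why the divisor is $n(n-1)$ rather than the number of unordered pairs.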

9. Friedman’s ANOVA Test by Ranks

A set of observations is collected over k blocks and n treatments (randomized complete block design). $R_{ij}$ is the rank of the observation for the j-th treatment within the i-th block, and $R_j = \sum_{i=1}^{k} R_{ij}$ is the rank sum of the j-th treatment.

Hypothesis on treatment effect: $H_0 : \theta_1 = \theta_2 = \cdots = \theta_n$.

Friedman’s Test Statistic

\[ S = \sum_{j=1}^{n} \left( R_j - \frac{k(n+1)}{2} \right)^2, \qquad Q = \frac{12 \cdot S}{k n (n+1)} \sim \chi^2(n - 1) \quad \text{under } H_0. \]
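Friedman's S and Q follow from ranking within each block and summing ranks by treatment. The sketch below is illustrative only, with a hypothetical function name and data, and assumes no ties within a block.

```python
def friedman_q(data):
    """data[i][j] is the observation for treatment j in block i (k blocks, n treatments).

    Returns (S, Q) with S = sum_j (R_j - k (n + 1) / 2)^2 and Q = 12 S / (k n (n + 1)).
    """
    k, n = len(data), len(data[0])
    rank_sum = [0] * n
    for block in data:
        order = sorted(block)
        for j, value in enumerate(block):
            rank_sum[j] += order.index(value) + 1  # R_ij, no ties assumed
    s = sum((rj - k * (n + 1) / 2) ** 2 for rj in rank_sum)
    return s, 12 * s / (k * n * (n + 1))


# Hypothetical design with k = 4 blocks (rows) and n = 3 treatments (columns).
blocks = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [2.0, 1.0, 3.0],
    [1.2, 2.2, 3.2],
]
s_stat, q_stat = friedman_q(blocks)
```

Here the treatment rank sums are 5, 7 and 12 around the expected value $k(n+1)/2 = 8$.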

10. Kendall’s Coefficient of Concordance of k sets of n objects

There are k sets of observations collected, and each set includes n objects. $R_{ij}$ is the rank of the j-th object within the i-th set, and $R_j = \sum_{i=1}^{k} R_{ij}$.

Hypothesis $H_0$ : the k sets are independent (or there is no association).

The deviation is

\[ S = \sum_{j=1}^{n} \left( R_j - \frac{k(n+1)}{2} \right)^2, \qquad Q = \frac{12 \cdot S}{k n (n+1)} \sim \chi^2(n - 1) \quad \text{under } H_0. \]

Kendall’s Coefficient of Concordance (ratio statistic):

\[ W = \frac{12 \cdot S}{k^2 n (n^2 - 1)}, \]

where $0 \le W \le 1$.
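The coefficient of concordance rescales S by its largest possible value. The sketch below takes the denominator to be $k^2 n (n^2 - 1)$, the form under which W reaches 1 when all k sets give identical rankings; the function name and data are hypothetical, and the input is assumed to be ranks already (no ties).

```python
def kendall_w(rankings):
    """rankings[i][j] is the rank R_ij of object j in set i (k sets, n objects)."""
    k, n = len(rankings), len(rankings[0])
    col = [sum(row[j] for row in rankings) for j in range(n)]  # R_j
    s = sum((rj - k * (n + 1) / 2) ** 2 for rj in col)
    return 12 * s / (k ** 2 * n * (n ** 2 - 1))


# Hypothetical rankings: perfect agreement, then two exactly opposite rankings.
w_agree = kendall_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
w_opposed = kendall_w([[1, 2, 3], [3, 2, 1]])
```

With this scaling, $Q = k(n - 1)W$.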

11. Chi-square Test for Independence

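A two-way table lists the count $X_{ij}$ at the i-th level of factor A ($A_i$) and the j-th level of factor B ($B_j$), $i = 1, \ldots, r$, $j = 1, \ldots, k$. Let $X_{i\cdot}$ and $X_{\cdot j}$ denote the row total and column total, and N the total count. Let $\theta_{ij} = P(A_i \cap B_j)$, $\theta_{i\cdot} = \sum_j \theta_{ij} = P(A_i)$, $\theta_{\cdot j} = \sum_i \theta_{ij} = P(B_j)$, subject to the restriction $\sum_i \theta_{i\cdot} = \sum_j \theta_{\cdot j} = 1$. The hypothesis of independence:

\[ H_0 : \theta_{ij} = \theta_{i\cdot} \theta_{\cdot j} \quad \text{for all } i \text{ and all } j \]

Under the null hypothesis, the test statistic

\[ Q = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(N X_{ij} - X_{i\cdot} X_{\cdot j})^2}{N X_{i\cdot} X_{\cdot j}} \sim \chi^2((r - 1)(k - 1)). \]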
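The independence statistic needs only the cell counts and the marginal totals. A minimal sketch with a hypothetical function name and table; it assumes every row and column total is positive and takes N to be the grand total.

```python
def chi_square_independence(table):
    """Q = sum_ij (N X_ij - X_i. X_.j)^2 / (N X_i. X_.j), returned with its (r-1)(k-1) df."""
    row = [sum(r) for r in table]          # X_i.
    col = [sum(c) for c in zip(*table)]    # X_.j
    n_total = sum(row)                     # N
    q = sum((n_total * x - row[i] * col[j]) ** 2 / (n_total * row[i] * col[j])
            for i, r in enumerate(table) for j, x in enumerate(r))
    return q, (len(row) - 1) * (len(col) - 1)


# Hypothetical 2 x 2 table of counts.
q_ind, df = chi_square_independence([[10, 20], [20, 10]])
```

Each term equals the familiar (observed - expected)^2 / expected with expected count $X_{i\cdot} X_{\cdot j} / N$; in the example every expected count is 15.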

12. Fisher’s Exact Test

Two independent binomial random samples, $Y_i \sim \text{Bin}(n_i, \theta_i)$, $i = 1, 2$. Under the null hypothesis $H_0 : \theta_1 = \theta_2 = \theta$, the exact distribution of $Y_1$ given $Y = Y_1 + Y_2$ is

\[ P(Y_1 = y_1 \mid Y = y) = \frac{\binom{n_1}{y_1} \binom{n_2}{y - y_1}}{\binom{N}{y}}, \]

where $N = n_1 + n_2$.
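The conditional distribution used by Fisher's exact test is hypergeometric and can be evaluated with binomial coefficients. The sketch below uses a hypothetical function name and example values; it returns only the point probability, not a p-value.

```python
from math import comb


def fisher_conditional_pmf(y1, y, n1, n2):
    """P(Y1 = y1 | Y = y) = C(n1, y1) C(n2, y - y1) / C(n1 + n2, y)."""
    if not (0 <= y1 <= n1 and 0 <= y - y1 <= n2):
        return 0.0  # outside the support of the conditional distribution
    return comb(n1, y1) * comb(n2, y - y1) / comb(n1 + n2, y)


# Hypothetical case: n1 = n2 = 4 trials with y = 4 successes in total.
p_three = fisher_conditional_pmf(3, 4, 4, 4)
total = sum(fisher_conditional_pmf(v, 4, 4, 4) for v in range(5))
```

A one-sided p-value would sum these probabilities over the values of $y_1$ at least as extreme as the observed one.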
