Nonparametrics

Example: A random sample of ten 400-gram soil specimens was taken at location A and analyzed for a certain contaminant. The sample data are as follows:

65, 54, 66, 70, 72, 68, 64, 51, 81, 49

The contaminant levels are normally distributed. At the 0.05 level of significance, test the hypothesis that the true mean contaminant level at this location is different from 50 mg/kg. (Parametric approach: one-sample t-test.)
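The one-sample t-test named above can be sketched as follows. This is a minimal illustration using only the Python standard library. The function name `one_sample_t` is hypothetical, and the critical value 2.262 is an assumption taken from a standard t-table for 9 degrees of freedom at the two-sided 0.05 level.

```python
from math import sqrt
from statistics import mean, stdev

# Soil contaminant data from the example (mg/kg).
data = [65, 54, 66, 70, 72, 68, 64, 51, 81, 49]
mu0 = 50  # hypothesized mean under H0


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)  # estimated standard error of the mean
    return (mean(sample) - mu0) / se, n - 1


t_stat, df = one_sample_t(data, mu0)

# Assumption: t-table critical value t_{0.025, 9} for a two-sided 0.05 test.
T_CRIT = 2.262
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRIT}")
```

For these data the sample mean is 64 and t is about 4.42, so the two-sided t-test would reject the hypothesis that the mean is 50 mg/kg.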


Advantages of Nonparametric Procedures

• Used With All Scales
• Make Fewer Assumptions
• Need Not Involve Population Parameters
• Results May Be as Good as Parametric Procedures

Disadvantages of Nonparametric Procedures

• May Waste Information If Data Permit Using Parametric Procedures
  – Example: Converting Data From Interval to Ordinal Scale
• Difficult to Compute by Hand for Large Samples
• Tables Not Widely Available


The Use of Sign Test

• Tests One Population Median, $\eta$ (eta)
• An Alternative for t-Test for One Mean
• Assumes Population Is Continuous
• Can Use Normal Approximation If $n \ge 10$

Sign Test (Binomial Test)

$H_0: \eta = 50$
$H_A: \eta > 50$
($\eta$ is the median.)

Data: 65, 54, 66, 70, 72, 68, 64, 49, 81, 48

Ordered list: 48, 49, 54, 64, 65, 66, 68, 70, 72, 81
Signs relative to 50: - - + + + + + + + + (eight plus signs)


Sign Test (Binomial Test)

$H_0: \eta = \eta_0$
$H_A: \eta < \eta_0$ (or $\eta > \eta_0$, or $\eta \ne \eta_0$)

Test Statistic: $D$ = total number of plus signs

Observed significance level (use the binomial p.m.f. with $n$ and $p = 0.50$; $X_b$ denotes the binomial random variable):

• if $H_A: \eta > \eta_0$, p-value = $P(X_b \ge D)$
• if $H_A: \eta < \eta_0$, p-value = $P(X_b \le D)$
• if $H_A: \eta \ne \eta_0$, p-value = $2P(X_b \le D^*)$, where $D^*$ is the smaller of $D$ and $n - D$

Reject $H_0$ if p-value < $\alpha$.

Binomial probability mass function:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$
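The exact sign test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure: the function names are hypothetical, and observations equal to the hypothesized median are dropped before counting (the slides do not say how ties are handled).

```python
from math import comb


def binom_pmf(x, n, p=0.5):
    """Binomial probability mass function P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def sign_test(sample, eta0, alternative="greater"):
    """Return (D, n, p-value) for the exact sign test of H0: median = eta0."""
    plus = sum(1 for v in sample if v > eta0)
    minus = sum(1 for v in sample if v < eta0)
    n = plus + minus  # assumption: values equal to eta0 are dropped
    d = plus
    if alternative == "greater":      # HA: eta > eta0
        p_value = sum(binom_pmf(x, n) for x in range(d, n + 1))
    elif alternative == "less":       # HA: eta < eta0
        p_value = sum(binom_pmf(x, n) for x in range(0, d + 1))
    else:                             # HA: eta != eta0
        d_star = min(d, n - d)
        p_value = min(1.0, 2 * sum(binom_pmf(x, n) for x in range(0, d_star + 1)))
    return d, n, p_value


# Soil data used in the sign test slides.
soil = [65, 54, 66, 70, 72, 68, 64, 49, 81, 48]
d, n, p_one = sign_test(soil, 50, "greater")
_, _, p_two = sign_test(soil, 50, "two-sided")
print(d, n, round(p_one, 4), round(p_two, 3))
```

With these data the sketch gives D = 8 out of n = 10, a one-sided p-value of 56/1024 (about .0547) and a two-sided p-value of about .109.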

Sign Test SPSS Output

Frequencies

conlevel - h0               N
Negative Differences (a)    2
Positive Differences (b)    8
Ties (c)                    0
Total                      10

a. conlevel < h0
b. conlevel > h0
c. conlevel = h0

Test Statistics (b)

                         conlevel - h0
Exact Sig. (2-tailed)    .109 (a)

a. Binomial distribution used.
b. Sign Test

Median of the distribution is not significantly different from 50 mg/kg.


Sign Test (Binomial Test)

$H_0: \eta = 50$
$H_A: \eta > 50$
($\eta$ is the median.)

Data: 65, 54, 66, 70, 72, 68, 64, 49, 81, 48

Ordered list: 48, 49, 54, 64, 65, 66, 68, 70, 72, 81
Signs relative to 50: - - + + + + + + + +
$D = 8$

With $n = 10$ and $p = .5$: $P(X_b \ge 8) = .0547$ $[= P(X_b \le 2)]$, which is the p-value if $H_A: \eta > 50$.
For a two-sided test: p-value = $2 \times P(X_b \ge 8) = .109$

Large Sample Sign Test

$H_0: \eta = \eta_0$
$H_A: \eta < \eta_0$ (or $\eta > \eta_0$, or $\eta \ne \eta_0$)

Test Statistic:

$$z = \frac{D - n/2}{\sqrt{n/4}}$$

Reject the null hypothesis if:

• $z > z_{\alpha}$, if $H_A: \eta > \eta_0$ [or p-value = $P(Z \ge z)$]
• $z < -z_{\alpha}$, if $H_A: \eta < \eta_0$ [or p-value = $P(Z \le z)$]
• $|z| > z_{\alpha/2}$, if $H_A: \eta \ne \eta_0$ [or p-value = $2P(Z \ge |z|)$]

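The large-sample sign test statistic $z = (D - n/2)/\sqrt{n/4}$ and its normal p-values can be sketched as below. The function names are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for the standard normal table. No continuity correction is applied, matching the formula on the slide.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(d, n):
    """Normal approximation to the sign test: z = (D - n/2) / sqrt(n/4)."""
    return (d - n / 2) / sqrt(n / 4)


def normal_p_value(z, alternative="greater"):
    """p-value from the standard normal distribution for the given alternative."""
    cdf = NormalDist().cdf
    if alternative == "greater":   # HA: eta > eta0
        return 1 - cdf(z)
    if alternative == "less":      # HA: eta < eta0
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))   # HA: eta != eta0


# Soil example: D = 8 plus signs out of n = 10.
z = sign_test_z(8, 10)
p_approx = normal_p_value(z, "greater")
print(f"z = {z:.3f}, approximate one-sided p-value = {p_approx:.4f}")
```

For the soil example z is about 1.90 and the approximate one-sided p-value is roughly 0.03, noticeably smaller than the exact binomial value of .0547. With n sitting right at the n ≥ 10 threshold, the uncorrected approximation is still fairly rough.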

Example: To determine whether the median life span of a certain species of animal is greater than 5 years, a random sample of 25 observations was taken. The life spans, in years, are:

11.3 5.8 3.1 4.1 7.3 4.4 1.4 2.5 6.6 7.6 24.9 30.1 2.9
5.5 7.2 3.2 3.9 7.2 20.1 3.1 6.1 4.9 19.4 4.2 6.3

At the 0.05 level of significance, use the sign test to test whether the median life span is greater than 5 years.

$H_0: \eta = \eta_0$
$H_A: \eta > \eta_0$ (here $\eta_0 = 5$)

Test Statistic: $D = 14$ (# of "+" signs),

$$z = \frac{14 - 25/2}{\sqrt{25/4}} = 0.6$$

p-value = $P(Z \ge 0.6) = .274 > .05$

Conclusion: Fail to reject $H_0$. There is not sufficient evidence to support the claim that the median life span is greater than 5 years.

Signed Rank Test (for Paired-Sample)

Is there a significant difference? Difference = Before - After

Subject       1     2     3     4     5     6     7     8     9     10    11    12
After (x)     55.2  63.6  58.8  77.2  58.5  69.2  59.5  70.0  68.9  74.0  83.9  74.8
Before (y)    55.4  63.9  60.1  78.8  59.2  68.7  59.9  70.0  69.2  73.7  84.9  75.3
Difference    0.2   0.3   1.3   1.6   0.7   -0.5  0.4   0.0   0.3   -0.3  1.0   0.5
Rank          1     3     10    11    8     6.5   5     -     3     3     9     6.5

(Ranks are of the absolute differences; the zero difference for subject 8 is not ranked.)

Wilcoxon Signed Rank Test (Paired-sample test)

Diet Drug Data (SPSS Output)

Ranks

BEFORE - AFTER      N        Mean Rank   Sum of Ranks
Negative Ranks      2 (a)    4.75         9.50
Positive Ranks      9 (b)    6.28        56.50
Ties                1 (c)
Total              12

a. BEFORE < AFTER
b. BEFORE > AFTER
c. BEFORE = AFTER

Test Statistics (b)

                          BEFORE - AFTER
Z                         -2.095 (a)
Asymp. Sig. (2-tailed)    .036  (p-value)

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

The Wilcoxon Signed Rank Test

(Nonparametric alternative for one sample t-test.)

One sample: ($D_i = x_i - \eta_0$)

$H_0: \eta = \eta_0$
$H_A: \eta \ne \eta_0$ (or $\eta < \eta_0$, or $\eta > \eta_0$)

For Paired-Sample: ($D_i = y_i - x_i$)

Assumptions:
• The $D_i$'s are independent and from a continuous distribution.

The Wilcoxon Signed Rank Test (Large sample)

Test Statistic: (Use overall rank and don't consider 0)

• $T$ = Sum of ranks that have negative sign, for $H_A: \eta > \eta_0$
• $T$ = Sum of ranks that have positive sign, for $H_A: \eta < \eta_0$
• $T$ = Smaller of the - and + sign rank sums, for $H_A: \eta \ne \eta_0$

The smaller the negative sign rank sum is, the more likely it is that $H_A: \eta > \eta_0$ is true.

Approximated z-test statistic:

$$z = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
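The signed rank sums and the approximate z statistic can be sketched as follows, using the Before - After differences from the diet drug table. The helper names are hypothetical. Tied absolute differences receive their average rank and zero differences are dropped, as the slides describe. No tie correction is applied to the variance.

```python
from math import sqrt


def midrank(value, values):
    """Average rank of `value` within `values` (tied values share the mean rank)."""
    less = sum(1 for v in values if v < value)
    equal = sum(1 for v in values if v == value)
    return less + (equal + 1) / 2


def signed_rank_sums(diffs):
    """Return (T_minus, T_plus, n) after dropping zero differences."""
    nonzero = [d for d in diffs if d != 0]
    magnitudes = [abs(d) for d in nonzero]
    ranks = [midrank(m, magnitudes) for m in magnitudes]
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    return t_minus, t_plus, len(nonzero)


def signed_rank_z(t, n):
    """z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24), without tie correction."""
    return (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Before - After differences for the 12 diet drug subjects.
diffs = [0.2, 0.3, 1.3, 1.6, 0.7, -0.5, 0.4, 0.0, 0.3, -0.3, 1.0, 0.5]
t_minus, t_plus, n = signed_rank_sums(diffs)
z = signed_rank_z(min(t_minus, t_plus), n)
print(t_minus, t_plus, n, round(z, 4))
```

For the diet drug data this gives a negative rank sum of 9.5, a positive rank sum of 56.5, n = 11 nonzero differences, and z of about -2.089.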


The Wilcoxon Signed Rank Test (Large sample)

Decision Rule: Reject $H_0$ if p-value < $\alpha$. Use the standard normal table.

p-values:
• $H_A: \eta > \eta_0$ or $\eta < \eta_0$: p-value = $P(Z \le z)$
• $H_A: \eta \ne \eta_0$: p-value = $2P(Z \le z)$

The Wilcoxon Signed Rank Test (Small sample case)

Test Statistic: (Use overall rank and don't consider 0)

• $T$ = Sum of ranks that have negative sign, for $H_A: \eta > \eta_0$
• $T$ = Sum of ranks that have positive sign, for $H_A: \eta < \eta_0$
• $T$ = Smaller of the - and + sign rank sums, for $H_A: \eta \ne \eta_0$

Decision Rule: Reject $H_0$ if p-value < $\alpha$. Table A-6 provides $P(T \le T_0)$.

p-values:
• $H_A: \eta > \eta_0$ or $\eta < \eta_0$: p-value = $P(T \le T_0)$
• $H_A: \eta \ne \eta_0$: p-value = $2P(T \le T_0)$
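The small-sample probabilities $P(T \le T_0)$ that a signed rank table lists can also be computed directly. The sketch below is one way to do it, not necessarily how Table A-6 was produced: it assumes no ties and no zero differences, so that under $H_0$ each rank 1..n carries a given sign with probability 1/2. The function names are hypothetical.

```python
from math import floor


def signed_rank_null_counts(n):
    """counts[s] = number of subsets of the ranks 1..n whose sum equals s."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downward so each rank is used at most once.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def signed_rank_cdf(t0, n):
    """P(T <= t0) under H0 for n untied, nonzero differences."""
    counts = signed_rank_null_counts(n)
    return sum(counts[: floor(t0) + 1]) / 2**n


p_at_9 = signed_rank_cdf(9, 11)
p_at_10 = signed_rank_cdf(10, 11)
print(round(p_at_9, 4), round(p_at_10, 4))
```

For n = 11 this gives $P(T \le 9)$ of about 0.0161 and $P(T \le 10)$ of about 0.0210. A non-integer $T_0$ such as 9.5 arises only when ties are present, in which case this untied distribution is an approximation.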


Diet Drug Example (Small sample case)

Is there a significant difference before and after?

$H_A: \eta \ne 0$

The p-value for different sample sizes can be found in the Wilcoxon Signed Rank Test Table A-6 (page A-15).

Two-tailed test (sample size 11): $T = 3 + 6.5 = 9.5$

The p-value is between $2 \times 0.0162 = 0.0324$ and $2 \times 0.0210 = 0.0420$, which is less than 0.05; therefore the null hypothesis is rejected.

Wilcoxon Signed Rank Test (Paired-sample test)

Diet Drug Data (SPSS Output)

Ranks

BEFORE - AFTER      N        Mean Rank   Sum of Ranks
Negative Ranks      2 (a)    4.75         9.50
Positive Ranks      9 (b)    6.28        56.50
Ties                1 (c)
Total              12

a. BEFORE < AFTER
b. BEFORE > AFTER
c. BEFORE = AFTER

Wilcoxon Signed Rank Test (Paired-sample test)

Diet Drug Data (SPSS Output)

Test Statistics (b)

                          BEFORE - AFTER
Z                         -2.095 (a)
Asymp. Sig. (2-tailed)    .036  (p-value)

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

$$z = \frac{9.5 - \dfrac{11(11+1)}{4}}{\sqrt{\dfrac{11(11+1)(2 \cdot 11+1)}{24}}} = -2.0894$$

(SPSS corrected for ties.)

Nonparametric Test for Two Independent Samples

1. Tests Two Independent Distributions
2. Corresponds to t-Test for 2 Independent Means
3. Assumptions
   • Independent Random Samples
   • Distributions (Populations) Are Continuous
4. Can Use Normal Approximation If $n_i \ge 10$


Wilcoxon Rank Sum Test (Two independent samples test)

Amount of tar, with overall ranks:

Filtered        0.9   1.1   1.2   0.8   1.6   0.9   0.7   1.0   0.9
Ranks           5.5   10    11.5  3     17    5.5   2     8.5   5.5

Non-filtered    1.5   0.9   1.6   0.5   1.4   1.9   1.0   1.3   1.2   1.6
Ranks           15    5.5   17    1     14    19    8.5   13    11.5  17

Wilcoxon Rank Sum Test (Two independent samples test)

Cigarette Data (SPSS Output)

Ranks

Amount of Tar
Cigarette Type     N    Mean Rank   Sum of Ranks
Filtered           9     7.61        68.50
Non-filtered      10    12.15       121.50
Total             19
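The overall ranking with average ranks for ties, and the resulting rank sums, can be reproduced with a short sketch. The helper names `midrank` and `rank_sums` are hypothetical; the data are the tar amounts listed for the two cigarette types.

```python
filtered = [0.9, 1.1, 1.2, 0.8, 1.6, 0.9, 0.7, 1.0, 0.9]
non_filtered = [1.5, 0.9, 1.6, 0.5, 1.4, 1.9, 1.0, 1.3, 1.2, 1.6]


def midrank(value, pooled):
    """Average rank of `value` in the pooled sample (ties share the mean rank)."""
    less = sum(1 for v in pooled if v < value)
    equal = sum(1 for v in pooled if v == value)
    return less + (equal + 1) / 2


def rank_sums(sample1, sample2):
    """Rank the pooled data once, then sum the ranks within each sample."""
    pooled = sample1 + sample2
    w1 = sum(midrank(v, pooled) for v in sample1)
    w2 = sum(midrank(v, pooled) for v in sample2)
    return w1, w2


w_filtered, w_non_filtered = rank_sums(filtered, non_filtered)
print(w_filtered, w_non_filtered)
```

The two rank sums come out as 68.5 and 121.5, and together they add up to 19(19 + 1)/2 = 190, which is a quick check that the ranking is complete.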


Wilcoxon Rank Sum Test (Two independent samples test)

Cigarette Data (SPSS Output)

Test Statistics (b)

                                   Amount of Tar
Mann-Whitney U                     23.500
Wilcoxon W                         68.500
Z                                  -1.768
Asymp. Sig. (2-tailed)             .077  (p-value)
Exact Sig. [2*(1-tailed Sig.)]     .079 (a)

a. Not corrected for ties.
b. Grouping Variable: Cigarette Type

Wilcoxon Rank Sum Test (Two independent samples test)

$H_0$: $D_1$ and $D_2$ have identical distributions
$H_A$: $D_1$ is to the right of $D_2$ ($\eta_1 > \eta_2$), or
$D_1$ is to the left of $D_2$ ($\eta_1 < \eta_2$), or
$D_1$ is to the right or to the left of $D_2$ ($\eta_1 \ne \eta_2$)

Decision Rule (Small Samples)

Test Statistic: First, find the rank sum from each of the two samples.

$W$ = the smaller of the rank sums from the two samples.

Reject the null hypothesis if p-value < $\alpha$, where the p-value can be found from Table A.7.

• For a two-sided test: p-value = 2 × (p-value from the table).
• For a one-sided test: p-value = the p-value from Table A.7 if the evidence supports $H_A$; otherwise, p-value = 1 - (p-value from Table A.7).

Decision Rule (Large Samples)

Test Statistic:

$$z = \frac{W - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

where $W$ is the rank sum of sample 1.

Reject the null hypothesis if:

• $z > z_{\alpha}$, if $H_A$: $D_1$ is to the right of $D_2$ ($\eta_1 > \eta_2$)
• $z < -z_{\alpha}$, if $H_A$: $D_1$ is to the left of $D_2$ ($\eta_1 < \eta_2$)
• $|z| > z_{\alpha/2}$, if $H_A$: $D_1$ is to the right or the left of $D_2$ ($\eta_1 \ne \eta_2$)
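The large-sample statistic can be evaluated for the cigarette data with the sketch below, taking sample 1 to be the filtered group (rank sum 68.5, $n_1 = 9$, $n_2 = 10$). The function name is hypothetical, `statistics.NormalDist` replaces the normal table, and the variance carries no tie correction.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(w1, n1, n2):
    """z = (W - n1(n1+n2+1)/2) / sqrt(n1*n2*(n1+n2+1)/12), W = rank sum of sample 1."""
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w1 - mean_w) / sd_w


z = rank_sum_z(68.5, 9, 10)
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.4f}, two-sided p-value = {p_two_sided:.3f}")
```

This gives z of about -1.756. The SPSS value of -1.768 is slightly larger in magnitude, which is consistent with SPSS applying a tie correction to the variance. Either way the two-sided p-value stays above 0.05.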


Kruskal-Wallis Test (One-way ANOVA)

Purpose: for comparing k distributions, k > 2.

Assumptions: k independent random samples from k continuous distributions.

Hypothesis:

$H_0$: The distributions of all k populations are the same.
$H_A$: At least two of the population distributions differ in location.

Birth Weight Example

Group 1 (mother is a nonsmoker)
Weight   7.5   6.9   7.4   9.2   8.3   7.6
Rank     19    14    18    24    23    20          (rank sum 118)

Group 2 (mother is an ex-smoker, but did not smoke during the pregnancy)
Weight   5.8   7.1   8.2   7.1   7.8
Rank     6     15.5  22    15.5  21                (rank sum 80)

Group 3 (mother is a current smoker and smokes less than 1 pack per day)
Weight   5.9   6.2   5.8   4.7   7.2   6.2
Rank     8     10.5  6     1     17    10.5        (rank sum 53)

Group 4 (mother is a current smoker and smokes more than 1 pack per day)
Weight   6.2   6.8   5.7   4.9   6.2   5.8   5.4
Rank     10.5  13    4     2     10.5  6     3     (rank sum 49)

Kruskal-Wallis Test (One-way ANOVA)

Infant Birth Weight Data (SPSS Output)

Ranks

Infant's birth weight
Mother's smoking status     N    Mean Rank
Non-smoker                  6    19.67
Ex-smoker                   5    16.00
Smoke < 1                   6     8.83
Smoke > 1                   7     7.00
Total                      24

Test Statistics (a, b)

               Infant's birth weight
Chi-Square     13.324
df             3
Asymp. Sig.    .004  (p-value)

a. Kruskal Wallis Test
b. Grouping Variable: Mother's smoking status


Kruskal-Wallis Test (One-way ANOVA)

Test statistic:

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1) = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left(\bar{R}_i - \bar{R}\right)^2$$

where
$n_i$ = number of measurements in sample $i$
$R_i$ = rank sum for sample $i$ (overall ranking)
$\bar{R}_i = R_i / n_i$ = mean rank for sample $i$, and $\bar{R} = (n+1)/2$ = overall mean rank
$n$ = total sample size

• For large samples (each $n_i > 5$), $H \sim \chi^2$ with $(k - 1)$ d.f.
• For small samples, use the Kruskal-Wallis table.

Decision Rule: Reject $H_0$ if $H > \chi^2_{\alpha}$ or p-value < $\alpha$.

Kruskal-Wallis Test (One-way ANOVA)

$H_0$: The weight distributions of all four populations are the same.
$H_A$: At least two of the population distributions differ in location.

$$H = \frac{12}{24(24+1)} \left\{ \frac{118^2}{6} + \frac{80^2}{5} + \frac{53^2}{6} + \frac{49^2}{7} \right\} - 3(24+1) = 13.237$$

SPSS reports 13.324, which is this value after a correction for tied ranks. In either case the p-value is < 0.01 (chi-square table with d.f. = 4 - 1 = 3).

Reject $H_0$.

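The Kruskal-Wallis statistic for the birth weight example can be checked with a short sketch. The function names are hypothetical. The tie correction shown, $1 - \sum (t^3 - t)/(n^3 - n)$ over groups of $t$ tied values, is the standard adjustment and is an assumption here, since the slides do not state which correction SPSS applies.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), without tie correction."""
    n = sum(sizes)
    total = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def tie_correction(tie_counts, n):
    """Standard correction factor: 1 - sum(t^3 - t) / (n^3 - n)."""
    return 1 - sum(t**3 - t for t in tie_counts) / (n**3 - n)


# Rank sums and group sizes from the birth weight example.
rank_sums = [118, 80, 53, 49]
sizes = [6, 5, 6, 7]
h = kruskal_wallis_h(rank_sums, sizes)

# Tied weights in the data: 5.8 (3 times), 6.2 (4 times), 7.1 (2 times).
h_corrected = h / tie_correction([3, 4, 2], sum(sizes))
print(round(h, 3), round(h_corrected, 3))
```

The uncorrected statistic is about 13.237 and the tie-corrected value is about 13.324, which agrees with the Chi-Square figure in the SPSS output.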

Asymptotic Relative Efficiency

R.E.(1 vs. 2) = $n_2 / n_1$, where $n_1$ and $n_2$ are the sample sizes the two tests need to achieve the same power.

The asymptotic relative efficiency of the Wilcoxon Rank-Sum Test to the t-test:

Distribution    Efficiency
Uniform         1.0
Normal          0.955
Laplace         1.5
Exponential     3.0
Cauchy          $\infty$
