Nonparametric Statistics: A Brief Introduction
Curtis Greene
December 4, 2003
Statistics Across the Curriculum – p.1

Nonparametric tests: examples
• Sign Test (t-test/location)
• Mann-Whitney (= Wilcoxon Rank Sum) Test
• Wilcoxon Signed Rank Test (t-test/location)
• Tukey’s Quick Test (t-test/location)
• Wald-Wolfowitz Runs Test (t-test/location)
• Kruskal-Wallis Test (ANOVA)
• Friedman Test (ANOVA)
• Spearman’s Rho / Kendall’s Tau (Correlation)
• Chi-Squared Test (Goodness of fit)
• Kolmogorov-Smirnov Test (Goodness of fit)
• Log Rank Test (Survival data)

And possibly even more
Some statisticians would add to this list:
• Binomial Test (comparing sample proportions)
This despite the obvious presence of “parameters” being compared.

Definition of "nonparametric"
• There is not complete agreement about the precise meaning of "nonparametric test". Some statisticians are more generous than others in what they would include.
• Fortunately, it’s not a distinction worth arguing about. Nobody seems to care much about the semantics here.
• The next slide contains a definition that seems to work pretty well. It is paraphrased from Practical Nonparametric Statistics, by W. J. Conover, which is an excellent book on this subject.

Definition of "nonparametric"
A test is nonparametric if it satisfies at least one of the following conditions:
• It may be applied to categorical (nominal) data.
• It may be applied to ordinal data (measurements may be put in rank order).
• It may be applied to continuous data where the distribution function of the random variable producing the data is either unspecified, or specified except for an infinite number of unknown parameters.
– W. J. Conover, Practical Nonparametric Statistics

Some possible confusions
• Nonparametric tests can also be applied to “parametric” data.
• Nonparametric tests often compute “statistics” that are approximated by parametric distributions in the limit, e.g. approximately normal for large sample sizes.
• For example, the chi-square test may be applied to any kind of data (even continuous normal). The accuracy of the chi-square statistic rests on “parametric” ideas, and requires the sample sizes to be fairly large. For very small samples, there is an “even more nonparametric” version called Fisher’s exact test, which should be used instead.

Data for this talk
Interval from primary AIDS diagnosis to death for a sample of 21 hemophiliac patients, stratified by age at diagnosis (1990 study).

            Age ≤ 40                               Age > 40
Patient Number   Survival (months)     Patient Number   Survival (months)
       1                 2                    1                 1
       2                 3                    2                 1
       3                 6                    3                 1
       4                 6                    4                 1
       5                 7                    5                 2
       6                10                    6                 3
       7                15                    7                 3
       8                15                    8                 9
       9                16                    9                22
      10                27
      11                30
      12                32

From Ch. 21, Principles of Biostatistics, by M. Pagano and K. Gauvreau
Approaches to Analyzing this Data
• Compute means for the two groups and use a t-test to compare them.
• NO WAY!! This is a really bad idea.
• Mann-Whitney (= Wilcoxon rank sum) test.
• Wald-Wolfowitz runs test.
• Tukey’s quick test.
• Kaplan-Meier survival analysis, with a log-rank test*.
*The data came from a chapter in Pagano-Gauvreau illustrating this method.

Wald-Wolfowitz Runs Test
Objective: to test
• H0: samples come from populations with the same distribution, vs.
• Ha: samples come from populations with different distributions.
Method:
• Arrange all of the data in increasing order, labelling entries with “A” or “B” according to which sample they came from.
• Compute the number T of “runs” of A’s or B’s.
• Reject H0 if T is too large or too small.

Wald-Wolfowitz Runs Test
 1  1  1  1  2  2  3  3  3  6  6  7  9 10 15 15 16 22 27 30 32
 B  B  B  B  A  B  A  B  B  A  A  A  B  A  A  A  A  B  A  A  A
T = # Runs = 10

Or maybe (because of ties)
 1  1  1  1  2  2  3  3  3  6  6  7  9 10 15 15 16 22 27 30 32
 B  B  B  B  A  B  B  B  A  A  A  A  B  A  A  A  A  B  A  A  A
T = # Runs = 8

This is neither too many nor too few to reject H0 at the 5% level (but it’s close).
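The run counts in the two orderings above can be checked with a short script. This is only an illustrative sketch: the helper names `count_runs` and `runs_normal_approx` are made up for this example and are not from the talk. The second helper assumes the usual large-sample mean and standard deviation of T under H0. With a = 12 and b = 9 the groups are small, so the resulting z-scores are a rough guide only and the tabulated cutoffs remain the reference.

```python
from itertools import groupby
from math import sqrt


def count_runs(labels):
    """Count maximal blocks of identical consecutive labels (the statistic T)."""
    return sum(1 for _ in groupby(labels))


def runs_normal_approx(a, b):
    """Large-sample mean and standard deviation of T under H0 (assumed formula)."""
    n = a + b
    mu = 2 * a * b / n + 1
    sigma = sqrt(2 * a * b * (2 * a * b - a - b) / (n ** 2 * (n - 1)))
    return mu, sigma


# Labels in increasing order of survival time.
# A: age <= 40 (12 patients), B: age > 40 (9 patients).
order_1 = "BBBBABABBAAA" + "BAAAABAAA"  # first way of breaking the ties
order_2 = "BBBBABBBAAAA" + "BAAAABAAA"  # alternative way of breaking the ties

mu, sigma = runs_normal_approx(a=12, b=9)
for labels in (order_1, order_2):
    t = count_runs(labels)
    z = (t - mu) / sigma  # rough only: a and b are small here
    print(f"T = {t}, z = {z:.2f}")
```

Both orderings give a T below the null mean of about 11.3 but within two standard deviations of it, which agrees with the conclusion that H0 is not rejected at the 5% level.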
Wald-Wolfowitz Runs Test
• When a and b (the sizes of the two groups) are small, the cutoff values at various significance levels can be looked up in tables.
• For this example (with a = 12 and b = 9) the acceptance region for a 2-tailed test (p = .05) is 7 ≤ T ≤ 15, and for a 1-tailed test it is T ≥ 8.
• When a and b are large, T is approximately normal, with
  $\mu = \frac{2ab}{a+b} + 1, \qquad \sigma = \sqrt{\frac{2ab\,(2ab - a - b)}{(a+b)^2\,(a+b-1)}}$

Jacob Wolfowitz
Jacob Wolfowitz (1910-1981) is described in one obituary as “a giant among the founders of American Statistics”. He is best known for his work on sequential analysis and decision theory. According to one account, he was the first to use the phrase “nonparametric statistics”, in a 1942 paper. His son Paul Wolfowitz (1943-) is currently serving as Deputy Secretary of Defense.

Paul Wolfowitz
From Wikipedia, the free encyclopedia.
Paul Wolfowitz (born December 22, 1943) is an American political advisor and United States Deputy Secretary of Defense.
He is a neoconservative and Straussian known for his "hawkish" views, passionate pro-Israel advocacy and staunch support for war on Iraq.

Wolfowitz is currently United States Deputy Secretary of Defense, second in charge of the defense department, under the US Secretary of Defense, Donald Rumsfeld.

A military analyst under Ronald Reagan, Wolfowitz was later a leading participant in the Project for the New American Century. That think tank formed in 1997 during the Clinton presidency, and expressed a new foreign policy with regard to Iraq and other "potential aggressor states", dismissing "containment" in favor of "preemption": strike first to eliminate threats.

Clinton, along with George H. W. Bush, Colin Powell, and other former Bush administration officials, dismissed calls for "preemption" in favor of continued "containment." This was the policy of George W. Bush as well for his first several months in office. Many saw Wolfowitz's plan as a "blueprint for US hegemony" and his "preemption" policy remained contained until the terrorist attacks of September 11 revived hawkish advocacy for defending by attacking.

Following the terrorist attacks of 9-11, debate began within the White House as to the degree of action to take against Al Qaeda. The neoconservative members of President Bush's cabinet led by Wolfowitz advocated preemptive strikes on terror cells in Afghanistan. This signaled the start of a new direction in the foreign policy plan of the Bush Administration, and led to the creation of what would later be dubbed the Bush Doctrine. Secretary of State Colin Powell supports the philosophy behind containment, as a moderated degree of action.