Nonparametric Methods
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Section 13 Kolmogorov-Smirnov Test
Section 13 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X1; : : : ; Xn with some unknown distribution P and we would like to test the hypothesis that P is equal to a particular distribution P0; i.e. decide between the following hypotheses: H : P = P ; H : P = P : 0 0 1 6 0 We already know how to test this hypothesis using chi-squared goodness-of-fit test. If dis tribution P0 is continuous we had to group the data and consider a weaker discretized null hypothesis. We will now consider a different test for H0 based on a very different idea that avoids this discretization. 1 1 men data 0.9 0.9 'men' fit all data women data 0.8 normal fit 0.8 'women' fit 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 Cumulative probability 0.3 Cumulative probability 0.3 0.2 0.2 0.1 0.1 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Data Data Figure 13.1: (a) Normal fit to the entire sample. (b) Normal fit to men and women separately. Example.(KS test) Let us again look at the normal body temperature dataset. Let ’all’ be a vector of all 130 observations and ’men’ and ’women’ be vectors of length 65 each corresponding to men and women. First, we fit normal distribution to the entire set ’all’. -
Nonparametric Statistics for Social and Behavioral Sciences
Statistics Incorporating a hands-on approach, Nonparametric Statistics for Social SOCIAL AND BEHAVIORAL SCIENCES and Behavioral Sciences presents the concepts, principles, and methods NONPARAMETRIC STATISTICS FOR used in performing many nonparametric procedures. It also demonstrates practical applications of the most common nonparametric procedures using IBM’s SPSS software. Emphasizing sound research designs, appropriate statistical analyses, and NONPARAMETRIC accurate interpretations of results, the text: • Explains a conceptual framework for each statistical procedure STATISTICS FOR • Presents examples of relevant research problems, associated research questions, and hypotheses that precede each procedure • Details SPSS paths for conducting various analyses SOCIAL AND • Discusses the interpretations of statistical results and conclusions of the research BEHAVIORAL With minimal coverage of formulas, the book takes a nonmathematical ap- proach to nonparametric data analysis procedures and shows you how they are used in research contexts. Each chapter includes examples, exercises, SCIENCES and SPSS screen shots illustrating steps of the statistical procedures and re- sulting output. About the Author Dr. M. Kraska-Miller is a Mildred Cheshire Fraley Distinguished Professor of Research and Statistics in the Department of Educational Foundations, Leadership, and Technology at Auburn University, where she is also the Interim Director of Research for the Center for Disability Research and Service. Dr. Kraska-Miller is the author of four books on teaching and communications. She has published numerous articles in national and international refereed journals. Her research interests include statistical modeling and applications KRASKA-MILLER of statistics to theoretical concepts, such as motivation; satisfaction in jobs, services, income, and other areas; and needs assessments particularly applicable to special populations. -
Nonparametric Estimation by Convex Programming
The Annals of Statistics 2009, Vol. 37, No. 5A, 2278–2300 DOI: 10.1214/08-AOS654 c Institute of Mathematical Statistics, 2009 NONPARAMETRIC ESTIMATION BY CONVEX PROGRAMMING By Anatoli B. Juditsky and Arkadi S. Nemirovski1 Universit´eGrenoble I and Georgia Institute of Technology The problem we concentrate on is as follows: given (1) a convex compact set X in Rn, an affine mapping x 7→ A(x), a parametric family {pµ(·)} of probability densities and (2) N i.i.d. observations of the random variable ω, distributed with the density pA(x)(·) for some (unknown) x ∈ X, estimate the value gT x of a given linear form at x. For several families {pµ(·)} with no additional assumptions on X and A, we develop computationally efficient estimation routines which are minimax optimal, within an absolute constant factor. We then apply these routines to recovering x itself in the Euclidean norm. 1. Introduction. The problem we are interested in is essentially as fol- lows: suppose that we are given a convex compact set X in Rn, an affine mapping x A(x) and a parametric family pµ( ) of probability densities. Suppose that7→N i.i.d. observations of the random{ variable· } ω, distributed with the density p ( ) for some (unknown) x X, are available. Our objective A(x) · ∈ is to estimate the value gT x of a given linear form at x. In nonparametric statistics, there exists an immense literature on various versions of this problem (see, e.g., [10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28] and the references therein). -
Downloaded 09/24/21 09:09 AM UTC
MARCH 2007 NOTES AND CORRESPONDENCE 1151 NOTES AND CORRESPONDENCE A Cautionary Note on the Use of the Kolmogorov–Smirnov Test for Normality DAG J. STEINSKOG Nansen Environmental and Remote Sensing Center, and Bjerknes Centre for Climate Research, and Geophysical Institute, University of Bergen, Bergen, Norway DAG B. TJØSTHEIM Department of Mathematics, University of Bergen, Bergen, Norway NILS G. KVAMSTØ Geophysical Institute, University of Bergen, and Bjerknes Centre for Climate Research, Bergen, Norway (Manuscript received and in final form 28 April 2006) ABSTRACT The Kolmogorov–Smirnov goodness-of-fit test is used in many applications for testing normality in climate research. This note shows that the test usually leads to systematic and drastic errors. When the mean and the standard deviation are estimated, it is much too conservative in the sense that its p values are strongly biased upward. One may think that this is a small sample problem, but it is not. There is a correction of the Kolmogorov–Smirnov test by Lilliefors, which is in fact sometimes confused with the original Kolmogorov–Smirnov test. Both the Jarque–Bera and the Shapiro–Wilk tests for normality are good alternatives to the Kolmogorov–Smirnov test. A power comparison of eight different tests has been undertaken, favoring the Jarque–Bera and the Shapiro–Wilk tests. The Jarque–Bera and the Kolmogorov– Smirnov tests are also applied to a monthly mean dataset of geopotential height at 500 hPa. The two tests give very different results and illustrate the danger of using the Kolmogorov–Smirnov test. 1. Introduction estimates of skewness and kurtosis. Such moments can be used to diagnose nonlinear processes and to provide The Kolmogorov–Smirnov test (hereafter the KS powerful tools for validating models (see also Ger- test) is a much used goodness-of-fit test. -
Repeated-Measures ANOVA & Friedman Test Using STATCAL
Repeated-Measures ANOVA & Friedman Test Using STATCAL (R) & SPSS Prana Ugiana Gio Download STATCAL in www.statcal.com i CONTENT 1.1 Example of Case 1.2 Explanation of Some Book About Repeated-Measures ANOVA 1.3 Repeated-Measures ANOVA & Friedman Test 1.4 Normality Assumption and Assumption of Equality of Variances (Sphericity) 1.5 Descriptive Statistics Based On SPSS dan STATCAL (R) 1.6 Normality Assumption Test Using Kolmogorov-Smirnov Test Based on SPSS & STATCAL (R) 1.7 Assumption Test of Equality of Variances Using Mauchly Test Based on SPSS & STATCAL (R) 1.8 Repeated-Measures ANOVA Based on SPSS & STATCAL (R) 1.9 Multiple Comparison Test Using Boferroni Test Based on SPSS & STATCAL (R) 1.10 Friedman Test Based on SPSS & STATCAL (R) 1.11 Multiple Comparison Test Using Wilcoxon Test Based on SPSS & STATCAL (R) ii 1.1 Example of Case For example given data of weight of 11 persons before and after consuming medicine of diet for one week, two weeks, three weeks and four week (Table 1.1.1). Tabel 1.1.1 Data of Weight of 11 Persons Weight Name Before One Week Two Weeks Three Weeks Four Weeks A 89.43 85.54 80.45 78.65 75.45 B 85.33 82.34 79.43 76.55 71.35 C 90.86 87.54 85.45 80.54 76.53 D 91.53 87.43 83.43 80.44 77.64 E 90.43 84.45 81.34 78.64 75.43 F 90.52 86.54 85.47 81.44 78.64 G 87.44 83.34 80.54 78.43 77.43 H 89.53 86.45 84.54 81.35 78.43 I 91.34 88.78 85.47 82.43 78.76 J 88.64 84.36 80.66 78.65 77.43 K 89.51 85.68 82.68 79.71 76.5 Average 89.51 85.68 82.68 79.71 76.69 Based on Table 1.1.1: The person whose name is A has initial weight 89,43, after consuming medicine of diet for one week 85,54, two weeks 80,45, three weeks 78,65 and four weeks 75,45. -
7 Nonparametric Methods
7 NONPARAMETRIC METHODS 7 Nonparametric Methods SW Section 7.11 and 9.4-9.5 Nonparametric methods do not require the normality assumption of classical techniques. I will describe and illustrate selected non-parametric methods, and compare them with classical methods. Some motivation and discussion of the strengths and weaknesses of non-parametric methods is given. The Sign Test and CI for a Population Median The sign test assumes that you have a random sample from a population, but makes no assumption about the population shape. The standard t−test provides inferences on a population mean. The sign test, in contrast, provides inferences about a population median. If the population frequency curve is symmetric (see below), then the population median, iden- tified by η, and the population mean µ are identical. In this case the sign procedures provide inferences for the population mean. The idea behind the sign test is straightforward. Suppose you have a sample of size m from the population, and you wish to test H0 : η = η0 (a given value). Let S be the number of sampled observations above η0. If H0 is true, you expect S to be approximately one-half the sample size, .5m. If S is much greater than .5m, the data suggests that η > η0. If S is much less than .5m, the data suggests that η < η0. Mean and Median differ with skewed distributions Mean and Median are the same with symmetric distributions 50% Median = η Mean = µ Mean = Median S has a Binomial distribution when H0 is true. The Binomial distribution is used to construct a test with size α (approximately). -
9 Blocked Designs
9 Blocked Designs 9.1 Friedman's Test 9.1.1 Application 1: Randomized Complete Block Designs • Assume there are k treatments of interest in an experiment. In Section 8, we considered the k-sample Extension of the Median Test and the Kruskal-Wallis Test to test for any differences in the k treatment medians. • Suppose the experimenter is still concerned with studying the effects of a single factor on a response of interest, but variability from another factor that is not of interest is expected. { Suppose a researcher wants to study the effect of 4 fertilizers on the yield of cotton. The researcher also knows that the soil conditions at the 8 areas for performing an experiment are highly variable. Thus, the researcher wants to design an experiment to detect any differences among the 4 fertilizers on the cotton yield in the presence a \nuisance variable" not of interest (the 8 areas). • Because experimental units can vary greatly with respect to physical characteristics that can also influence the response, the responses from experimental units that receive the same treatment can also vary greatly. • If it is not controlled or accounted for in the data analysis, it can can greatly inflate the experimental variability making it difficult to detect real differences among the k treatments of interest (large Type II error). • If this source of variability can be separated from the treatment effects and the random experimental error, then the sensitivity of the experiment to detect real differences between treatments in increased (i.e., lower the Type II error). • Therefore, the goal is to choose an experimental design in which it is possible to control the effects of a variable not of interest by bringing experimental units that are similar into a group called a \block". -
Exponentiality Test Using a Modified Lilliefors Test Achut Prasad Adhikari
University of Northern Colorado Scholarship & Creative Works @ Digital UNC Dissertations Student Research 5-1-2014 Exponentiality Test Using a Modified Lilliefors Test Achut Prasad Adhikari Follow this and additional works at: http://digscholarship.unco.edu/dissertations Recommended Citation Adhikari, Achut Prasad, "Exponentiality Test Using a Modified Lilliefors Test" (2014). Dissertations. Paper 63. This Text is brought to you for free and open access by the Student Research at Scholarship & Creative Works @ Digital UNC. It has been accepted for inclusion in Dissertations by an authorized administrator of Scholarship & Creative Works @ Digital UNC. For more information, please contact [email protected]. UNIVERSITY OF NORTHERN COLORADO Greeley, Colorado The Graduate School AN EXPONENTIALITY TEST USING A MODIFIED LILLIEFORS TEST A Dissertation Submitted in Partial Fulfillment Of the Requirements for the Degree of Doctor of Philosophy Achut Prasad Adhikari College of Education and Behavioral Sciences Department of Applied Statistics and Research Methods May 2014 ABSTRACT Adhikari, Achut Prasad An Exponentiality Test Using a Modified Lilliefors Test. Published Doctor of Philosophy dissertation, University of Northern Colorado, 2014. A new exponentiality test was developed by modifying the Lilliefors test of exponentiality for the purpose of improving the power of the test it directly modified. Lilliefors has considered the maximum absolute differences between the sample empirical distribution function (EDF) and the exponential cumulative distribution function (CDF). The proposed test considered the sum of all the absolute differences between the CDF and EDF. By considering the sum of all the absolute differences rather than only a point difference of each observation, the proposed test would expect to be less affected by individual extreme (too low or too high) observations and capable of detecting smaller, but consistent, differences between the distributions. -
Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests
Journal ofStatistical Modeling and Analytics Vol.2 No.I, 21-33, 2011 Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests Nornadiah Mohd Razali1 Yap Bee Wah 1 1Faculty ofComputer and Mathematica/ Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia E-mail: nornadiah@tmsk. uitm. edu.my, yapbeewah@salam. uitm. edu.my ABSTRACT The importance of normal distribution is undeniable since it is an underlying assumption of many statistical procedures such as I-tests, linear regression analysis, discriminant analysis and Analysis of Variance (ANOVA). When the normality assumption is violated, interpretation and inferences may not be reliable or valid. The three common procedures in assessing whether a random sample of independent observations of size n come from a population with a normal distribution are: graphical methods (histograms, boxplots, Q-Q-plots), numerical methods (skewness and kurtosis indices) and formal normality tests. This paper* compares the power offour formal tests of normality: Shapiro-Wilk (SW) test, Kolmogorov-Smirnov (KS) test, Lillie/ors (LF) test and Anderson-Darling (AD) test. Power comparisons of these four tests were obtained via Monte Carlo simulation of sample data generated from alternative distributions that follow symmetric and asymmetric distributions. Ten thousand samples ofvarious sample size were generated from each of the given alternative symmetric and asymmetric distributions. The power of each test was then obtained by comparing the test of normality statistics with the respective critical values. Results show that Shapiro-Wilk test is the most powerful normality test, followed by Anderson-Darling test, Lillie/ors test and Kolmogorov-Smirnov test. However, the power ofall four tests is still low for small sample size. -
1 Exercise 8: an Introduction to Descriptive and Nonparametric
1 Exercise 8: An Introduction to Descriptive and Nonparametric Statistics Elizabeth M. Jakob and Marta J. Hersek Goals of this lab 1. To understand why statistics are used 2. To become familiar with descriptive statistics 1. To become familiar with nonparametric statistical tests, how to conduct them, and how to choose among them 2. To apply this knowledge to sample research questions Background If we are very fortunate, our experiments yield perfect data: all the animals in one treatment group behave one way, and all the animals in another treatment group behave another way. Usually our results are not so clear. In addition, even if we do get results that seem definitive, there is always a possibility that they are a result of chance. Statistics enable us to objectively evaluate our results. Descriptive statistics are useful for exploring, summarizing, and presenting data. Inferential statistics are used for interpreting data and drawing conclusions about our hypotheses. Descriptive statistics include the mean (average of all of the observations; see Table 8.1), mode (most frequent data class), and median (middle value in an ordered set of data). The variance, standard deviation, and standard error are measures of deviation from the mean (see Table 8.1). These statistics can be used to explore your data before going on to inferential statistics, when appropriate. In hypothesis testing, a variety of statistical tests can be used to determine if the data best fit our null hypothesis (a statement of no difference) or an alternative hypothesis. More specifically, we attempt to reject one of these hypotheses. -
Nonparametric Analogues of Analysis of Variance
*\ NONPARAMETRIC ANALOGUES OF ANALYSIS OF VARIANCE by °\\4 RONALD K. LOHRDING B. A., Southwestern College, 1963 A MASTER'S REPORT submitted in partial fulfillment of the requirements for the degree MASTER OF SCIENCE Department of Statistics and Statistical Laboratory KANSAS STATE UNIVERSITY Manhattan, Kansas 1966 Approved by: ' UKC )x 0-i Major PrProfessor L IVoJL CONTENTS 1. INTRODUCTION 1 2. COMPLETELY RANDOMIZED DESIGN 4 2. 1. The Kruskal-Wallis One-way Analysis of Variance by Ranks 4 2. 2. Mosteller's k-Sample Slippage Test 7 2.3. Conover's k-Sample Slippage Test 8 2.4. A k-Sample Kolmogorov-Smirnov Test 11 2 2. 5. The Contingency x Test 13 3. RANDOMIZED COMPLETE BLOCK 14 3. 1. Cochran Q Test 15 3.2. Friedman's Two-way Analysis of Variance by Ranks . 16 4. PROCEDURES WHICH GENERALIZE TO SEVERAL DESIGNS . 17 4.1. Mood's Median Test 18 4.2. Wilson's Median Test 21 4.3. The Randomized Rank-Sum Test 27 4.4. Rank Test for Paired-Comparison Experiments ... 30 5. BALANCED INCOMPLETE BLOCK 33 5. 1. Bradley's Rank Analysis of Incomplete Block Designs 33 5.2. Durbin's Rank Analysis of Incomplete Block Designs 40 6. A 2x2 FACTORIAL EXPERIMENT 44 7. PARTIALLY BALANCED INCOMPLETE BLOCK 47 OTHER RELATED TESTS 50 ACKNOWLEDGEMENTS 53 REFERENCES 54 NONPARAMETRIC ANALOGUES OF ANALYSIS OF VARIANCE 1. Introduction J. V. Bradley (I960) proposes that the history of statistics can be divided into four major stages. The first or one parameter stage was when statistics were merely thought of as averages or in some instances ratios. -
An Averaging Estimator for Two Step M Estimation in Semiparametric Models
An Averaging Estimator for Two Step M Estimation in Semiparametric Models Ruoyao Shi∗ February, 2021 Abstract In a two step extremum estimation (M estimation) framework with a finite dimen- sional parameter of interest and a potentially infinite dimensional first step nuisance parameter, I propose an averaging estimator that combines a semiparametric esti- mator based on nonparametric first step and a parametric estimator which imposes parametric restrictions on the first step. The averaging weight is the sample analog of an infeasible optimal weight that minimizes the asymptotic quadratic risk. I show that under mild conditions, the asymptotic lower bound of the truncated quadratic risk dif- ference between the averaging estimator and the semiparametric estimator is strictly less than zero for a class of data generating processes (DGPs) that includes both cor- rect specification and varied degrees of misspecification of the parametric restrictions, and the asymptotic upper bound is weakly less than zero. Keywords: two step M estimation, semiparametric model, averaging estimator, uniform dominance, asymptotic quadratic risk JEL Codes: C13, C14, C51, C52 ∗Department of Economics, University of California Riverside, [email protected]. The author thanks Colin Cameron, Xu Cheng, Denis Chetverikov, Yanqin Fan, Jinyong Hahn, Bo Honoré, Toru Kitagawa, Zhipeng Liao, Hyungsik Roger Moon, Whitney Newey, Geert Ridder, Aman Ullah, Haiqing Xu and the participants at various seminars and conferences for helpful comments. This project is generously supported by UC Riverside Regents’ Faculty Fellowship 2019-2020. Zhuozhen Zhao provides great research assistance. All remaining errors are the author’s. 1 1 Introduction Semiparametric models, consisting of a parametric component and a nonparametric com- ponent, have gained popularity in economics.