INFERENCE PROCEDURES BASED ON ORDER STATISTICS
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Jesse C. Frey, M.S.
* * * * *
The Ohio State University
2005
Dissertation Committee:
Prof. H. N. Nagaraja, Adviser
Prof. Steven N. MacEachern
Prof. Ömer Öztürk
Prof. Douglas A. Wolfe

Approved by

Adviser
Graduate Program in Statistics

ABSTRACT
In this dissertation, we develop several new inference procedures that are based on order statistics. Each procedure is motivated by a particular statistical problem.
The first problem we consider is that of computing the probability that a fully-
specified collection of independent random variables has a particular ordering. We
derive an equal conditionals condition under which such probabilities can be computed
exactly, and we also derive extrapolation algorithms that allow approximation and
computation of such probabilities in more general settings. Romberg integration is
one idea that is used. The second problem we address is that of producing optimal
distribution-free confidence bands for a cumulative distribution function. We treat
this problem both in the case of simple random sampling and in the more general
case in which the sample consists of independent order statistics from the distribution
of interest. The latter case includes ranked-set sampling. We propose a family of
optimality criteria motivated by the idea that good confidence bands are narrow, and
we develop theory that makes the identification and computation of optimal bands
possible. The Brunn-Minkowski Inequality from the theory of convex bodies plays
a key role in this work. The third problem we investigate is that of how best to
take advantage of auxiliary information when estimating a population mean. We
develop a general procedure, intentionally representative sampling, that is unbiased
in the nonparametric sense, yet offers great flexibility for incorporating auxiliary
information. The final problem we consider is that of modeling imperfect judgment rankings in ranked-set sampling. We develop a new class of models so large that
essentially any judgment rankings model is a limit of models in this class, and we
propose an algorithm for selecting one-parameter families from the class.
ACKNOWLEDGMENTS
I thank my adviser, Dr. H. N. Nagaraja, for his advice, his encouragement, and
his consistently positive outlook. I also thank the other members of my committee,
Dr. Steven N. MacEachern, Dr. Ömer Öztürk, and Dr. Douglas A. Wolfe, for their
advice both on this dissertation and on other work during the past four years.
VITA
July 11, 1975 ...... Born - Greenville, SC
1996 ...... B.S. Mathematics and Physics, B.A. History, Presbyterian College, Clinton, SC
2000 ...... M.S. Mathematics, University of North Carolina, Chapel Hill, NC
PUBLICATIONS
Deshpande, J.V., Frey, J., and Öztürk, Ö. "Inference regarding the constant of proportionality in the Cox hazards model", 147–163. In: de Silva, B.M. and Mukhopadhyay, N., eds., Proceedings of the International Sri Lankan Statistical Conference: Visions of Futuristic Methodologies, Melbourne: RMIT University, 2004.
Frey, J. "A ranking method based on minimizing the number of in-sample errors", The American Statistician, 59: 207–216, 2005.

Frey, J. and Cressie, N. "Some results on constrained Bayes estimators", Statistics and Probability Letters, 65: 389–399, 2003.
FIELDS OF STUDY
Major Field: Statistics
TABLE OF CONTENTS
Abstract
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction

2. Computational Algorithms
   2.1 Introduction
   2.2 The Exact Algorithm
   2.3 Extending the Algorithm to More General Cases
   2.4 A Product Trapezoid Rule Approach
   2.5 Some Applications of the General Algorithm

3. Optimal Distribution-free Confidence Bands for a CDF
   3.1 An Overview of the Literature
   3.2 A Class of Distribution-free Confidence Bands
   3.3 A Class of Optimality Criteria
   3.4 Shape and Power Comparisons for the Confidence Band Procedures
   3.5 Optimal Confidence Bands in the Ranked-set Sampling Case
   3.6 A Comparison of RSS-based Confidence Bands

4. Intentionally Representative Sampling
   4.1 Introduction
   4.2 Motivation
   4.3 The Sampling Procedure
   4.4 A Rank-Based Implementation Method
   4.5 Comparisons of Estimator Performance
   4.6 Discussion and Conclusions

5. Imperfect Rankings Models for Ranked-set Sampling
   5.1 Introduction
   5.2 A New Class of Models
   5.3 Connection with More General Models
   5.4 Uniform Approximation Results
   5.5 A Method for Selecting One-Parameter Families
   5.6 Conclusions

6. Summary and Future Work

Appendices:

A. R Code

Bibliography
LIST OF TABLES
2.1 Simultaneous 92.4% nonparametric confidence intervals for the deciles of an unknown continuous distribution when the sample size is n = 40.

2.2 The triangular array of approximations obtained via Romberg integration.

2.3 Relative errors in computing the probability P(X_1 ≤ X_2 ≤ X_3), where X_i ∼ Beta(i, 4 − i), i = 1, …, 3. The true probability is 64/105.

2.4 Relative errors in computing the probability P(X_1 ≤ ⋯ ≤ X_10), where X_i ∼ Beta(i, 11 − i), i = 1, …, 10. The true probability is approximately 0.00836.

2.5 Distribution of the ranks R_[1]1, R_[2]1, and R_[3]1 when m = 3, n_1 = n_2 = n_3 = 1, and the rankings are perfect.

2.6 Exact critical values (in scientific notation) and levels for the exact test. In cases where the levels 0.05 and 0.10 are not achieved (to four decimal places), the two bracketing values are given.

3.1 The values {p(t, i)} when the algorithm of Section 2.2 is applied to the example in Section 3.2.

3.2 The values {s_t(i, j)} when the algorithm of Section 2.2 is applied to the example in Section 3.2.

3.3 Values of the optimality criterion Σ_i (b_i − a_i) for 95% confidence bands when the set size is 3 and 1 cycle of observations is measured.

3.4 Values of the optimality criterion Σ_i (b_i − a_i) for 95% confidence bands when the set size is 2 and 2 cycles of observations are measured.

4.1 Distribution of the sample to be measured.

4.2 The groups identified by the rank algorithm when n = 10 and k = 3.

4.3 Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators.

4.4 Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators when diameter is used as a covariate for height. The value r is the set size used for RSS.
LIST OF FIGURES
2.1 The weights θ_{i,j} for the case in which n = 2 and M = 8.

3.1 Six different 95% confidence bands when n = 10. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.2 Six different 95% confidence bands when n = 20. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.3 Six different 95% confidence bands when n = 40. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.4 Power curves for the 95% confidence bands when n = 10. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.5 Power curves for the 95% confidence bands when n = 20. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.6 Power curves for the 95% confidence bands when n = 40. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.

3.7 Optimal conditional 95% confidence bands based on RSS when the set size is 3 and 1 cycle of observations is measured. Each band is plotted against data taking on the expected values of the respective order statistics given the observed sequence of order statistics. Individual plots are labeled by the observed sequence of order statistics.

3.8 Optimal conditional 95% confidence bands based on RSS when the set size is 2 and 2 cycles of observations are measured. Each band is plotted against data taking on the expected values of the respective order statistics given the observed sequence of order statistics. Individual plots are labeled by the observed sequence of order statistics.

4.1 Plot of the variable of interest height (in feet) against the covariate diameter at chest height (in centimeters) for the longleaf pine data.

5.1 Densities for the judgment order statistics for various values of the power ρ when k = 3, r = 4, and h_1(u) = u. The underlying distribution is uniform on [0,1], and the plotted lines are f_[1] and f_[1] + f_[2].

5.2 Densities for the judgment order statistics for various values of the power ρ when k = 3, r = 4, and h_2(u) = (1 − u)^(−1). The underlying distribution is uniform on [0,1], and the plotted lines are f_[1] and f_[1] + f_[2].

5.3 Densities for the judgment order statistics for various values of the power ρ when k = 3, r = 4, and h_3(u) = −u^(−1). The underlying distribution is uniform on [0,1], and the plotted lines are f_[1] and f_[1] + f_[2].
CHAPTER 1
INTRODUCTION
The use of order statistics in statistical inference is almost as old as statistical inference itself. The sample median, for example, has long been used to estimate the center of a distribution, and the interquartile range has a similar history of use for estimating the spread of a distribution. Inference procedures based on order statistics
have played an important role in the development of nonparametric statistics, and
they are also prominent in areas such as robust estimation and outlier detection. In
this dissertation, we develop several new inference procedures that are based on order
statistics and that use ideas associated with the theory of order statistics.
In Chapter 2, we develop two computational algorithms that involve order statis-
tics and that have applications in distribution-free statistical inference. The first of these algorithms, although applicable more generally, is most useful because it allows the computation of rectangle probabilities for standard uniform order statistics. That
is, it allows the computation of probabilities of the form
P(u_1 ≤ U_(1) ≤ v_1, …, u_n ≤ U_(n) ≤ v_n),

where (u_1, …, u_n) ∈ R^n, (v_1, …, v_n) ∈ R^n, and (U_(1), …, U_(n)) is a vector of standard uniform order statistics from a simple random sample. The second of the algorithms allows the computation of certain probabilities involving independent order statistics from a continuous parent distribution. In particular, it allows the computation of probabilities of the form
P(u_1 ≤ U_{k_1:m_1} ≤ v_1, …, u_n ≤ U_{k_n:m_n} ≤ v_n, U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n}),

where U_{k_1:m_1}, …, U_{k_n:m_n} are independent and U_{k_i:m_i} has a Beta(k_i, m_i + 1 − k_i) distribution. Note that the density for the Beta(k, m + 1 − k) distribution is given by

f(x) = [m! / ((k − 1)!(m − k)!)] x^(k−1) (1 − x)^(m−k) for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Both of the algorithms use the idea of creating a finite, nonhomogeneous, discrete
Markov chain such that the probability of interest can be computed in terms of state space probabilities at the final time step. The second algorithm also uses the extrapolation technique known as Romberg integration. A collection of R functions that can be used to implement the computational algorithms developed in this chapter is given in the appendix.
In Chapter 3, we consider the problem of creating optimal distribution-free confi- dence bands for a continuous cumulative distribution function (CDF), and we use the
computational algorithms developed in Chapter 2 in doing so. We treat this problem
both for the case in which the sample is a simple random sample and for the case
in which the sample consists of independent order statistics from the distribution of
interest. We propose a family of optimality criteria that is motivated by the idea
that good confidence bands are narrow, and we develop theory that makes the iden-
tification and computation of optimal bands possible. We also address uniqueness
issues. The Brunn-Minkowski Inequality from the theory of convex bodies plays an
important role in our theoretical work in the simple random sampling case, and the more general Prékopa-Leindler Inequality plays a similar role in the independent order statistics case. We show that both in terms of narrowness and in terms of power,
our optimal bands compare favorably to other distribution-free confidence bands that
have been proposed in the literature.
In Chapter 4, we describe a novel sampling scheme, intentionally representative
sampling (IRS), that is designed to improve on schemes like simple random sampling
and ranked-set sampling by offering greater flexibility and by allowing for full use of
auxiliary information. This scheme avoids the problem of drawing a bad, unrepre-
sentative sample by creating and then using a space of potential samples consisting
entirely of good samples. We describe a method of implementing IRS that is based on
ranking units according to a concomitant, and we demonstrate the good performance
of IRS both theoretically and through simulations based on a real biological data set.
In Chapter 5, we consider the problem of modeling imperfect rankings in ranked-
set sampling (RSS). We develop a broad new class of models for imperfect rankings
by thinking of the process of drawing a sample of size k as a two-stage process in
which one first draws a sample of size r > k, then subsamples k units from the r. We
show that this class of models is so large that essentially any reasonable imperfect
rankings model is a limit of models in the class. We also present specific examples of
useful one-parameter families that are contained in the class. These one-parameter
families can be constructed to have properties such as left-skewness or right-skewness
that are believed appropriate for a particular setting.
In Chapter 6, we summarize what was accomplished in Chapters 2 to 5. We also describe some future work that could be carried out in areas related to those discussed in this dissertation.
Only a handful of abbreviations are used in this dissertation. These are CDF
(cumulative distribution function), EDF (empirical distribution function), EC (equal
conditionals), SRS (simple random sampling), RSS (ranked-set sampling), and IRS
(intentionally representative sampling).
CHAPTER 2
COMPUTATIONAL ALGORITHMS
2.1 Introduction
In this chapter, we develop algorithms for solving certain computational prob-
lems that arise in statistical inference. Each of these problems involves computing
a probability of the form P(X_1 ≤ ⋯ ≤ X_n), where X_1, …, X_n are fully-specified independent random variables. We first derive an algorithm that allows exact, direct computation of P(X_1 ≤ ⋯ ≤ X_n) whenever the random variables X_1, …, X_n satisfy a certain equal conditionals (EC) condition. One important situation in which this
EC condition is met is that in which the random variables X1, . . . , Xn are interval-
truncated versions of the same parent distribution. This algorithm can thus be
applied to compute probabilities P(u_1 ≤ U_(1) ≤ v_1, …, u_n ≤ U_(n) ≤ v_n), where (u_1, …, u_n) ∈ R^n, (v_1, …, v_n) ∈ R^n, and (U_(1), …, U_(n)) is a vector of standard uniform order statistics. We next derive an extrapolation algorithm that builds upon the exact algorithm to allow computation of P(X_1 ≤ ⋯ ≤ X_n) when X_1, …, X_n are independently distributed as order statistics from a common parent distribution.
Since Beta distributions with integer parameters arise as the distributions for order
statistics from the standard uniform distribution, this algorithm allows the computa-
tion of probabilities of the form P(U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n}), where U_{k_1:m_1}, …, U_{k_n:m_n} are independently distributed and U_{k_i:m_i} has a Beta(k_i, m_i + 1 − k_i) distribution. This algorithm can also be extended to allow computation of probabilities of the form P(u_1 ≤ U_{k_1:m_1} ≤ v_1, …, u_n ≤ U_{k_n:m_n} ≤ v_n, U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n}). As part of a search for ways to improve the speed of this extrapolation algorithm, we examine a second extrapolation algorithm that is based upon a method for integrating over a simplex that was developed by Lyness and Puri (1973). We then combine the best aspects of these two extrapolation algorithms into a hybrid algorithm that
offers both the speed of the Lyness and Puri (1973) algorithm and the flexibility of
the first algorithm. Each of the algorithms that we develop proceeds by creating a
finite, nonhomogeneous, discrete Markov chain such that the probability of interest
can be expressed in terms of state space probabilities at the final time step. A similar
strategy was used by Lou (1996) to compute distributions for certain run statistics
that can be used to test for independence within a sequence of binary random vari-
ables. A collection of R functions for implementing the best of the algorithms from
this chapter is given in the appendix.
The algorithms that we develop here have numerous potential applications in or-
der statistics and nonparametrics. For example, probabilities of the form P(X_1 ≤ ⋯ ≤ X_n), where X_1, …, X_n are interval-truncated standard uniform distributions, arise in computing coverage probabilities for distribution-free confidence bands for a CDF. Probabilities of the form P(X_1 ≤ ⋯ ≤ X_n), where X_1, …, X_n are distributed like independent order statistics from the same parent distribution, arise as the prob-
abilities for particular collections of ranks when doing ranked-set sampling (RSS),
nomination sampling, or any other type of sampling that leads to independent, non-
identically distributed observations. Being able to compute these probabilities is thus
useful for making nonparametric inference based on such samples.
In Section 2.2, we define the EC condition, and we show that when the condition
is satisfied, the probability P(X_1 ≤ ⋯ ≤ X_n) can be computed exactly. In Section 2.3, we explore the extent to which the algorithm from Section 2.2 fails when the
EC condition does not hold, and we describe an extrapolation scheme that allows
approximation of P(X_1 ≤ ⋯ ≤ X_n) to a high level of accuracy when X_1, …, X_n are distributed like order statistics from the same parent distribution. In Section 2.4, we
build on the work of Lyness and Puri (1973) to create an alternate, faster method for
computing these probabilities. We then combine the best aspects of this algorithm
with those of the algorithm from Section 2.3 to obtain a hybrid method that is both
fast and usable for computing probabilities of the form P(u_1 ≤ U_{k_1:m_1} ≤ v_1, …, u_n ≤ U_{k_n:m_n} ≤ v_n, U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n}), where U_{k_1:m_1}, …, U_{k_n:m_n} are independent and each U_{k_i:m_i} has a Beta(k_i, m_i + 1 − k_i) distribution. In Section 2.5, we discuss some applications of the extrapolation algorithm.
2.2 The Exact Algorithm
In this section, we develop an algorithm that allows exact computation of proba-
bilities P(X_1 ≤ ⋯ ≤ X_n) for appropriate choices of the random variables X_1, …, X_n. We begin by presenting this algorithm in the greatest possible generality. We then
show how the algorithm can be applied to the computation of rectangle probabil-
ities for standard uniform order statistics. Finally, we mention some uses of these
probabilities in statistical inference.
Suppose that X_1, …, X_n are continuous, independently distributed random variables with CDFs F_1, …, F_n, respectively. We can compute the probability P(X_1 ≤ ⋯ ≤ X_n) using an exact algorithm provided that there exists a sequence of real numbers a_1 < ⋯ < a_K, called break points, such that the random variables X_1, …, X_n and the sequence a_1 < ⋯ < a_K satisfy the following condition.
Definition 2.1. Given the sequence of break points a_1 < ⋯ < a_K, define disjoint intervals I_0 = (−∞, a_1], I_1 = (a_1, a_2], …, I_{K−1} = (a_{K−1}, a_K], and I_K = (a_K, ∞). We say that the random variables X_1, …, X_n and the sequence a_1 < ⋯ < a_K satisfy the equal conditionals (EC) condition if, for each j ∈ {0, 1, …, K}, the conditional distributions [X_i | X_i ∈ I_j] are the same for all i for which P(X_i ∈ I_j) > 0.
In the case where X_1, …, X_n are identically distributed, the EC condition is satisfied by X_1, …, X_n and every possible choice of the sequence a_1 < ⋯ < a_K. If X_1, …, X_n are truncated versions of some parent distribution, then the EC condition is satisfied for every sequence of break points a_1 < ⋯ < a_K such that all of the truncation points are among the a_i. A condition equivalent to the EC condition is that each of the random variables X_1, …, X_n is distributed as a mixture of a fixed collection of distributions G_0, …, G_K, where each G_i has support in I_i.
Suppose that X_1, …, X_n and a_1 < ⋯ < a_K satisfy the EC condition. To show that we can then compute P(X_1 ≤ ⋯ ≤ X_n) exactly, we develop an alternative representation for the probability P(X_1 ≤ ⋯ ≤ X_n). First, using the fact that a probability is the expectation of an indicator function, we can write

P(X_1 ≤ ⋯ ≤ X_n) = ∫_{x_1=−∞}^{∞} ⋯ ∫_{x_n=−∞}^{∞} I(x_1 ≤ ⋯ ≤ x_n) dF_1(x_1) ⋯ dF_n(x_n).   (2.1)

Since the disjoint intervals I_0, …, I_K have the real line as their union, we can also decompose the full integral (2.1) into a sum

Σ_{i_1=0}^{K} ⋯ Σ_{i_n=0}^{K} ∫_{x_1 ∈ I_{i_1}} ⋯ ∫_{x_n ∈ I_{i_n}} I(x_1 ≤ ⋯ ≤ x_n) dF_1(x_1) ⋯ dF_n(x_n)   (2.2)

of simpler integrals. It is immediately apparent, moreover, that because of the indicator function, the summands on the right side of (2.2) can be nonzero only if i_1 ≤ i_2 ≤ ⋯ ≤ i_n. If we define S = {(i_1, …, i_n) ∈ {0, …, K}^n | i_1 ≤ ⋯ ≤ i_n}, we can write P(X_1 ≤ ⋯ ≤ X_n) as

Σ_{(i_1,…,i_n) ∈ S} ∫_{x_1 ∈ I_{i_1}} ⋯ ∫_{x_n ∈ I_{i_n}} I(x_1 ≤ ⋯ ≤ x_n) dF_1(x_1) ⋯ dF_n(x_n).   (2.3)

We now consider the computation of individual terms in the sum (2.3).
Suppose that the vector (i_1, …, i_n) ∈ S satisfies the strict inequality i_1 < i_2 < ⋯ < i_n. Then the indicator function in the expression

∫_{x_1 ∈ I_{i_1}} ⋯ ∫_{x_n ∈ I_{i_n}} I(x_1 ≤ ⋯ ≤ x_n) dF_1(x_1) ⋯ dF_n(x_n)   (2.4)

is always 1, meaning that we can write (2.4) as

∫_{x_1 ∈ I_{i_1}} ⋯ ∫_{x_n ∈ I_{i_n}} dF_1(x_1) ⋯ dF_n(x_n) = Π_{r=1}^{n} ∫_{x_r ∈ I_{i_r}} dF_r(x_r) = Π_{r=1}^{n} (F_r(a_{i_r + 1}) − F_r(a_{i_r})) = Π_{r=1}^{n} p(r, i_r),   (2.5)

where p(i, j) ≡ P(X_i ∈ I_j) = F_i(a_{j+1}) − F_i(a_j).

Now suppose that the vector (i_1, …, i_n) ∈ S includes repeated values. To be specific, suppose that the vector (i_1, …, i_n) includes s_i ≥ 0 copies of each value i, i = 0, …, K. Then, noting that (2.4) is just

P(X_1 ∈ I_{i_1}, X_2 ∈ I_{i_2}, …, X_n ∈ I_{i_n}, and X_1 ≤ ⋯ ≤ X_n),

we can use conditioning to rewrite (2.4) as a product

P(X_1 ∈ I_{i_1}, X_2 ∈ I_{i_2}, …, X_n ∈ I_{i_n}) P(X_1 ≤ ⋯ ≤ X_n | X_1 ∈ I_{i_1}, …, X_n ∈ I_{i_n}).   (2.6)

Using the independence of X_1, …, X_n, we can write the first factor in (2.6) as

Π_{r=1}^{n} p(r, i_r).

Then, using the assumption that X_1, …, X_n and a_1 < ⋯ < a_K satisfy the EC condition, we may write the second factor as (Π_{i=0}^{K} s_i!)^(−1). This follows from noting that for any j, i, and l,

P(X_i ≤ ⋯ ≤ X_{i+l−1} | X_i, …, X_{i+l−1} ∈ I_j) = 1/l!.

Thus, (2.4) can be written as

(Π_{r=1}^{n} p(r, i_r)) (Π_{i=0}^{K} s_i!)^(−1),   (2.7)

and we can write P(X_1 ≤ ⋯ ≤ X_n) as

Σ_{(i_1,…,i_n) ∈ S} (Π_{r=1}^{n} p(r, i_r)) (Π_{i=0}^{K} s_i!)^(−1),   (2.8)

where s_i is the number of the indices i_1, …, i_n that are equal to i.
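For small n and K, formula (2.8) can be evaluated by direct enumeration over S, which gives an independent check on the faster recursive algorithm developed below. The following is a minimal Python sketch (illustrative only; the dissertation's own implementations are the R functions in the appendix — the function name here is hypothetical):

```python
import itertools
import math

def ordered_prob_enum(p):
    """Direct evaluation of (2.8): sum over weakly increasing index vectors
    (i_1, ..., i_n) of prod_r p(r, i_r) divided by prod_i s_i!, where s_i
    counts how many indices equal i.  p[r][i] = P(X_{r+1} in I_i)."""
    n, num_int = len(p), len(p[0])
    total = 0.0
    # combinations_with_replacement yields exactly the sorted tuples in S
    for idx in itertools.combinations_with_replacement(range(num_int), n):
        term = math.prod(p[r][idx[r]] for r in range(n))
        for i in set(idx):
            term /= math.factorial(idx.count(i))  # the (prod s_i!)^(-1) factor
        total += term
    return total

# Sanity check: for i.i.d. X's the EC condition holds for any break points,
# and the ordering probability is 1/n!  (here n = 3, so 1/6).
p_iid = [[0.25, 0.25, 0.5]] * 3
print(ordered_prob_enum(p_iid))
```

The i.i.d. case makes a convenient test because (2.8) then collapses, by the multinomial theorem, to (p_0 + ⋯ + p_K)^n / n! = 1/n!.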
The formula (2.7) suggests a Markov property that makes it possible to sequen-
tially compute the probability (2.4) corresponding to any vector (i , . . . , i ) S. 1 n ∈ Starting with a value of 1, we think of moving from left to right through the indices
i1, . . . , in. If the tth index is it, then we multiply our current value by p(t, it), in this
way accounting for the first factor in (2.7). If the tth index is appearing for the jth
time, then we divide by j, in this way accounting for the second factor in (2.7). At
the tth stage, the only information needed to carry out the computation is the index
i_t and the number of times that i_t appeared among the previous indices i_1, …, i_{t−1}.
This sequential computation method suggests an algorithm for computing the full
probability (2.8).
To compute (2.8) efficiently, we define an appropriate set of probabilities and compute them using a recursive scheme. For t = 1, …, n, i = 0, …, K, and j = 1, …, t, we define s_t(i, j) to be the probability of the event {X_1 ≤ ⋯ ≤ X_t, X_{t−j} ∉ I_i, and X_{t−j+1}, …, X_t ∈ I_i}. Then, stated in words, s_t(i, j) is the probability that the first t random variables are in increasing order and that exactly the last j of the first t random variables fall in the interval I_i. The values {s_t(i, j)} for t = 1 are easily computed as

s_1(i, j) = p(1, i) if j = 1, and s_1(i, j) = 0 otherwise,   (2.9)

and a recursive scheme for computing the values {s_{t+1}(i, j)} from the values {s_t(i, j)} is given by

s_{t+1}(i, 1) = p(t + 1, i) Σ_{r=0}^{i−1} Σ_{l=1}^{t} s_t(r, l),  s_{t+1}(i, j) = [p(t + 1, i)/j] s_t(i, j − 1) for j > 1.   (2.10)

The probability P(X_1 ≤ ⋯ ≤ X_n) is then obtained as

P(X_1 ≤ ⋯ ≤ X_n) = Σ_{i=0}^{K} Σ_{j=1}^{n} s_n(i, j).   (2.11)

One important special case of this algorithm comes in computing rectangle probabilities for uniform order statistics. Suppose that (u_1, …, u_n) and (v_1, …, v_n) are vectors in [0, 1]^n that satisfy u_i < v_i for all i. Suppose further that we wish to compute
P(u_1 ≤ U_(1) ≤ v_1, …, u_n ≤ U_(n) ≤ v_n),   (2.12)
where (U(1), . . . , U(n)) is a vector of standard uniform order statistics. The relationship
between the density of the order statistics and the density of a simple random sample
U_1, …, U_n allows us to rewrite (2.12) as

n! P(u_1 ≤ U_1 ≤ v_1, …, u_n ≤ U_n ≤ v_n, U_1 ≤ ⋯ ≤ U_n).

If we then define random variables V_1, …, V_n by setting V_i = [U_i | U_i ∈ (u_i, v_i)] for i = 1, …, n, we can write (2.12) as

n! (v_1 − u_1) ⋯ (v_n − u_n) P(V_1 ≤ ⋯ ≤ V_n).

Since the {V_i} are just interval-truncated versions of the standard uniform distribution, the EC condition is met, and the algorithm given by equations (2.9) to (2.11) can be applied. An R function for computing these rectangle probabilities for uniform order statistics is given in the appendix.
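To make the steps concrete, here is an illustrative Python sketch (not the R function from the appendix; the function names are hypothetical) that combines the recursion (2.9)–(2.11) with the truncation argument above. Taking the break points to be the set of all u's and v's guarantees the EC condition for the truncated uniforms:

```python
import math

def ordered_prob_exact(p):
    """Exact P(X_1 <= ... <= X_n) under the EC condition, via the
    recursion (2.9)-(2.11).  p[t][i] = P(X_{t+1} in I_i)."""
    n, num_int = len(p), len(p[0])
    # s[i][j-1] holds s_t(i, j): first t ordered, exactly last j in I_i
    s = [[p[0][i]] + [0.0] * (n - 1) for i in range(num_int)]
    for t in range(1, n):
        new_s = [[0.0] * n for _ in range(num_int)]
        below = 0.0  # running sum of s_t(r, l) over r < i and all l
        for i in range(num_int):
            new_s[i][0] = p[t][i] * below          # j = 1 case of (2.10)
            for j in range(2, t + 2):              # j copies of I_i, j >= 2
                new_s[i][j - 1] = p[t][i] / j * s[i][j - 2]
            below += sum(s[i])
        s = new_s
    return sum(map(sum, s))                        # formula (2.11)

def rectangle_prob(u, v):
    """P(u_i <= U_(i) <= v_i, i = 1, ..., n) for standard uniform order
    statistics: truncate U_i to (u_i, v_i), use all u's and v's as break
    points, and rescale by n! * prod(v_i - u_i)."""
    n = len(u)
    a = sorted(set(u) | set(v))                    # break points
    # p[t][j] = P(V_{t+1} in I_j) for V_t uniform on (u_t, v_t)
    p = [[max(0.0, min(v[t], a[j + 1]) - max(u[t], a[j])) / (v[t] - u[t])
          for j in range(len(a) - 1)] for t in range(n)]
    scale = math.factorial(n) * math.prod(v[t] - u[t] for t in range(n))
    return scale * ordered_prob_exact(p)

# P(U_(1) <= 0.5) for a sample of size n = 2 is 1 - (1 - 0.5)^2 = 0.75
print(rectangle_prob([0.0, 0.0], [0.5, 1.0]))  # -> 0.75
```

Because each break interval is either wholly inside or wholly outside each truncation interval (u_t, v_t), the conditional distribution of every V_t on a given break interval is uniform on that interval, which is exactly the EC condition.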
Rectangle probabilities for uniform order statistics show up as coverage probabilities for distribution-free confidence bands for a continuous CDF. That topic will be discussed in Section 3.2. They also show up when one wishes to compute simultaneous nonparametric confidence intervals for a finite set of quantiles of a continuous distribution. The idea behind constructing such sets of simultaneous confidence intervals
bution. The idea behind constructing such sets of simultaneous confidence intervals
is discussed by Breth (1980), but no concrete examples are presented. Suppose that
q1, . . . , qk are the α1, . . . , αk quantiles of some continuous distribution. If we draw
a simple random sample X1, . . . , Xn, then any interval (X(r), X(s)) whose endpoints
are order statistics gives a nonparametric confidence interval for a specified quantile.
Suppose that for each i, (X(ri), X(si)) is a nonparametric confidence interval for qi.
Then the simultaneous coverage probability for the k intervals is the probability
P(X_(r_1) ≤ q_1 ≤ X_(s_1), …, X_(r_k) ≤ q_k ≤ X_(s_k)).   (2.13)

By the probability integral transformation, this probability can be rewritten as
P(U_(r_1) ≤ α_1 ≤ U_(s_1), …, U_(r_k) ≤ α_k ≤ U_(s_k)),   (2.14)

Quantile   Lower bound   Upper bound   Individual conf. level
0.1        X_{1:40}      X_{15:40}     0.9852
0.2        X_{2:40}      X_{15:40}     0.9906
0.3        X_{5:40}      X_{20:40}     0.9912
0.4        X_{9:40}      X_{24:40}     0.9856
0.5        X_{12:40}     X_{28:40}     0.9885
0.6        X_{17:40}     X_{32:40}     0.9856
0.7        X_{21:40}     X_{36:40}     0.9912
0.8        X_{26:40}     X_{39:40}     0.9906
0.9        X_{26:40}     X_{40:40}     0.9852
Table 2.1: Simultaneous 92.4% nonparametric confidence intervals for the deciles of an unknown continuous distribution when the sample size is n = 40.
where (U(1), . . . , U(n)) are standard uniform order statistics. By manipulating the
inequalities in (2.14), the probability (2.13) can then be written in the form (2.12),
where each of the bounds u1, . . . , un, v1, . . . , vn is either 0, 1, or one of the αi. As a
concrete example, Table 2.1 gives a set of simultaneous 92.4% nonparametric confi-
dence intervals for the deciles of a continuous distribution when the sample size is
n = 40. The individual intervals were chosen both to have roughly the same marginal
coverage probabilities and to be “best possible” nonparametric confidence intervals in
the sense of Zieliński and Zieliński (2005). The advantages of balancing the marginal
coverage probabilities in a simultaneous confidence statement have been stressed by
Beran (1990) and others.
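The individual confidence levels in Table 2.1 can be checked against binomial tail probabilities: for a continuous parent distribution, X_(r) ≤ q_α exactly when at least r observations fall at or below the α-quantile, so the coverage of (X_(r), X_(s)) is P(B ≥ r) − P(B ≥ s) with B ∼ Binomial(n, α). The following Python sketch (illustrative; the function name is hypothetical and not from the dissertation's R code) should reproduce the last column of the table:

```python
from math import comb

def coverage(n, r, s, alpha):
    """P(X_(r) <= q_alpha <= X_(s)) for a continuous parent distribution:
    P(B >= r) - P(B >= s), where B ~ Binomial(n, alpha) counts the
    observations at or below the alpha-quantile."""
    def upper_tail(k):  # P(B >= k)
        return sum(comb(n, j) * alpha**j * (1 - alpha)**(n - j)
                   for j in range(k, n + 1))
    return upper_tail(r) - upper_tail(s)

# (alpha, r, s) triples from the rows of Table 2.1, with n = 40
rows = [(0.1, 1, 15), (0.2, 2, 15), (0.3, 5, 20), (0.4, 9, 24),
        (0.5, 12, 28), (0.6, 17, 32), (0.7, 21, 36), (0.8, 26, 39),
        (0.9, 26, 40)]
for alpha, r, s in rows:
    print(alpha, round(coverage(40, r, s, alpha), 4))
```

Note that the simultaneous 92.4% level cannot be obtained from these marginal levels alone; it requires the joint rectangle probability (2.12), since the events for different quantiles are dependent.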
2.3 Extending the Algorithm to More General Cases
In this section, we first describe how the algorithm of Section 2.2 can be crudely
extended to situations in which the random variables X1, . . . , Xn do not satisfy the
EC condition. We then discuss a particular case, namely that in which each X_i has a Beta distribution with integer parameters, where the probability P(X_1 ≤ ⋯ ≤ X_n) can be computed using Romberg integration. Finally, we extend the Romberg integration algorithm to the case in which each X_i is an interval-truncated version of a Beta distribution with integer parameters. Probabilities of this latter form arise when constructing distribution-free confidence bands for a continuous CDF based on a ranked-set sample or some other sample that consists of independent order statistics from the distribution of interest.
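The crude scheme developed later in this section — discretizing each CDF on a grid of break points and summing only over strictly increasing interval assignments — can be sketched in a few lines of Python for the Beta example of Table 2.3. This is an illustrative fragment, not the dissertation's R code; the grid size K and the closed-form CDFs are assumptions made here for the demonstration:

```python
# CDFs of X_i ~ Beta(i, 4 - i), i = 1, 2, 3 (closed forms for integer parameters)
cdfs = [lambda x: 1 - (1 - x) ** 3,      # Beta(1, 3)
        lambda x: 3 * x**2 - 2 * x**3,   # Beta(2, 2)
        lambda x: x**3]                  # Beta(3, 1)

def ordered_prob_crude(cdfs, K):
    """Lower bound on P(X_1 <= ... <= X_n): sum only over strictly
    increasing interval assignments, with K equal-width break intervals
    on [0, 1].  The deficit is at most (n - 1) * max p(t, i)."""
    n = len(cdfs)
    a = [j / K for j in range(K + 1)]
    p = [[F(a[i + 1]) - F(a[i]) for i in range(K)] for F in cdfs]
    s = p[0][:]                          # s_1(i) = p(1, i)
    for t in range(1, n):
        below, new_s = 0.0, [0.0] * K
        for i in range(K):
            new_s[i] = p[t][i] * below   # s_{t+1}(i) = p(t+1, i) * sum_{r<i} s_t(r)
            below += s[i]
        s = new_s
    return sum(s)

approx = ordered_prob_crude(cdfs, 2000)
print(approx, 64 / 105)   # the approximation slightly understates 64/105
```

With K = 2000 intervals, max p(t, i) is on the order of 10^-3, so the approximation falls short of the true value 64/105 by only a few parts in a thousand; refining the grid shrinks the deficit further, which is what motivates the extrapolation to follow.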
Suppose as in Section 2.2 that X_1, …, X_n are continuous, independently distributed random variables with cumulative distribution functions F_1, …, F_n, respectively. Let a_1 < ⋯ < a_K be an arbitrary sequence of real numbers, and define disjoint intervals I_0 = (−∞, a_1], I_1 = (a_1, a_2], …, I_{K−1} = (a_{K−1}, a_K], and I_K = (a_K, ∞) as in Section 2.2. Because the intervals I_0, …, I_K have the real line as their union, the argument from Section 2.2 holds exactly all the way to (2.3). Moreover, if the sequence of indices (i_1, …, i_n) ∈ S satisfies the strict inequality i_1 < ⋯ < i_n, then the argument leading up to (2.5) also holds exactly. It is only when we consider a sequence of indices with repeated values that the argument from Section 2.2 fails.
This observation suggests one crude way of approximating the desired probability
$P(X_1 \le \cdots \le X_n)$.

Let $S_t$ be defined by $S_t = \{(i_1, \dots, i_n) \in \{0, \dots, K\}^n \mid i_1 < \cdots < i_n\}$. Then $S_t$ is the proper subset of $S$ consisting of all those sets of indices for which we can compute the probability (2.4) exactly even without the EC condition. Let $S_c = S \cap S_t'$, the complement of $S_t$ in $S$. Then we have that
$$\begin{aligned}
P((i_1, \dots, i_n) \in S_c) &= P(\text{at least two consecutive indices are the same}) \\
&\le \sum_{r=1}^{n-1} P(i_r \text{ and } i_{r+1} \text{ are the same}) \\
&= \sum_{r=1}^{n-1} \sum_{i=0}^{K} p(r, i)\, p(r+1, i) \\
&\le \max_{t,i}\{p(t, i)\} \sum_{r=1}^{n-1} \sum_{i=0}^{K} p(r+1, i) \\
&= (n-1) \max_{t,i}\{p(t, i)\},
\end{aligned}$$

where $p(i, j) \equiv P(X_i \in I_j) = F_i(a_{j+1}) - F_i(a_j)$ as before. Thus, if we choose a list of sequences $\{A_r\}$, where $A_r = (a_1^{(r)}, \dots, a_{K(r)}^{(r)})$, such that

$$\lim_{r \to \infty} \max_{t,i}\{p(t, i); A_r\} = 0,$$

and we compute
$$\sum_{(i_1, \dots, i_n) \in S_t} \left( \prod_{r=1}^{n} p(r, i_r) \right) \left( \prod_{i=0}^{K} s_i! \right)^{-1} \tag{2.15}$$

for each of the sequences, the limit will be $P(X_1 \le \cdots \le X_n)$. Modifying the recursive computation procedure described at the end of Section 2.2 in an obvious way, we can define $s_t(i)$ to be the probability that $X_1 \le \cdots \le X_t$, that $X_t \in I_i$, and that each of $X_1, \dots, X_t$ lies in a distinct interval $I_j$. The values $\{s_t(i)\}$ for $t = 1$ are then given by $s_1(i) = p(1, i)$, and we can compute the values $\{s_{t+1}(i)\}$ from the values $\{s_t(i)\}$ by setting

$$s_{t+1}(i) = p(t+1, i) \sum_{r=0}^{i-1} s_t(r). \tag{2.16}$$
We then have that

$$\sum_{(i_1, \dots, i_n) \in S_t} \left( \prod_{r=1}^{n} p(r, i_r) \right) \left( \prod_{i=0}^{K} s_i! \right)^{-1} = \sum_{i=n-1}^{K} s_n(i). \tag{2.17}$$

Approximating the probability $P(X_1 \le \cdots \le X_n)$ by choosing a list of sequences $\{A_r\}$ and computing the approximations (2.17) as just described is not the best
available procedure, however, because we know that for any particular sequence Ar,
the approximation (2.17) understates the true probability. One way to improve this
approximation is simply to act as though the EC condition holds even when it does
not. The algorithm is then exactly as given in equations (2.9) to (2.11), the only
difference being that the final value is approximate rather than exact. While the
assumption that the EC condition holds may be quite unrealistic when the intervals
between the values $a_1^{(r)} < \cdots < a_{K(r)}^{(r)}$ are wide, it becomes more and more reasonable as the intervals narrow. Thus, it is clear that applying the algorithm from Section 2.2 to a list of sequences $\{A_r\}$ such that $\lim_{r \to \infty} \max_{t,i}\{p(t, i); A_r\} = 0$ yields a sequence of approximations that tends to converge to the correct answer faster than does the corresponding sequence obtained from (2.17). However, even this faster algorithm is unnecessarily wasteful for many cases of interest.
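As an illustration of the truncated recursion (2.16)–(2.17), the following sketch (a Python illustration of ours, not the dissertation's R code; all function names are our own) computes the strictly-increasing-index sum for two independent Uniform(0, 1) random variables, for which the true ordering probability is $1/2$:

```python
# Illustrative sketch of the recursion (2.16)-(2.17): s_t(i) is built only
# over strictly increasing index sequences, so the result understates the
# ordering probability P(X_1 <= ... <= X_n).

def ordering_prob_lower_bound(cdfs, breaks):
    """cdfs: list of n CDF callables; breaks: a_1 < ... < a_K.
    Returns the sum (2.17), a lower bound for P(X_1 <= ... <= X_n)."""
    K = len(breaks)

    # p(t, i) = P(X_{t+1} in I_i), with I_0 = (-inf, a_1], ..., I_K = (a_K, inf)
    def p(t, i):
        lo = cdfs[t](breaks[i - 1]) if i > 0 else 0.0
        hi = cdfs[t](breaks[i]) if i < K else 1.0
        return hi - lo

    s = [p(0, i) for i in range(K + 1)]          # s_1(i) = p(1, i)
    for t in range(1, len(cdfs)):                # recursion (2.16)
        s = [p(t, i) * sum(s[:i]) for i in range(K + 1)]
    return sum(s)                                # the sum in (2.17)

# Example: X_1, X_2 i.i.d. Uniform(0,1); the true probability is 1/2, and
# refining the equally spaced break points shrinks the deficit.
for M in (4, 8, 16):
    est = ordering_prob_lower_bound([lambda x: x, lambda x: x],
                                    [i / M for i in range(1, M)])
    print(M, est)
```

As the output shows, the approximation is always an underestimate and converges only at rate $1/M$, which is why the text goes on to develop faster alternatives.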
In what follows, we develop an alternate method for making this computation in the case in which each $X_i$ has a Beta distribution with integer parameters. This method is based on the exact algorithm given by equations (2.9) to (2.11), and it uses the idea of Romberg integration, which is described in Atkinson (1989). The central notion behind Romberg integration is that if we know the structure of the error terms involved in some numerical approximation scheme, then we can take linear combinations of specially chosen approximations to obtain a higher-order approximation that converges at a much faster rate.
For example, suppose that when we use $M$ intervals, the error $E_M$ in the approximation has the form

$$E_M = \frac{c_2}{M^2} + \frac{c_4}{M^4} + \frac{c_6}{M^6} + \cdots, \tag{2.18}$$
where $c_2, c_4, \dots$ are unknown real constants. If $I_M^{(0)} = I_{\text{true}} + E_M$ and $I_{2M}^{(0)} = I_{\text{true}} + E_{2M}$ are approximations of $I_{\text{true}}$ based on $M$ and $2M$ intervals, respectively, then the linear combination $I_{2M}^{(1)} = \frac{4}{3} I_{2M}^{(0)} - \frac{1}{3} I_M^{(0)}$ has an error in which the leading term goes like $\frac{1}{M^4}$ rather than $\frac{1}{M^2}$. If the value $I_{4M}^{(0)}$ is also available, then we may compute $I_{4M}^{(1)} = \frac{4}{3} I_{4M}^{(0)} - \frac{1}{3} I_{2M}^{(0)}$ and move to a higher order of accuracy by setting $I_{4M}^{(2)} = \frac{16}{15} I_{4M}^{(1)} - \frac{1}{15} I_{2M}^{(1)}$. This last approximation $I_{4M}^{(2)}$ has an error in which the leading term goes as $\frac{1}{M^6}$. In general, we can compute an entire triangular array of approximations by applying the recursion

$$I_{2k}^{(j+1)} = \frac{4^{j+1}}{4^{j+1} - 1} I_{2k}^{(j)} - \frac{1}{4^{j+1} - 1} I_k^{(j)} \tag{2.19}$$

whenever two approximations $I_{2k}^{(j)}$ and $I_k^{(j)}$ are available. This triangular array of approximations would be as shown in Table 2.2. The best available approximation
at a given stage is the highest-order approximation on the row of the array that
corresponds to the largest value of M. Moreover, by looking at the difference be-
tween the ratio of the two most recent best approximations and 1, we can obtain a
typically conservative estimate of the size of the relative error in the best available
approximation.
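The recursion (2.19) is easy to state in code. The sketch below (a Python illustration of ours, not part of the dissertation) builds the triangular array of Table 2.2 for the composite trapezoid rule applied to $\int_0^1 e^x\,dx$, whose error expansion has exactly the even-power form (2.18):

```python
# Build the triangular array of Romberg approximations: each row starts with
# a basic approximation I^{(0)}_M for M = 1, 2, 4, ..., and the recursion
# (2.19) fills in the higher-order entries from the previous row.
import math

def trapezoid(f, M):
    # composite trapezoid rule on [0, 1] with M intervals
    h = 1.0 / M
    return h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, M)) + 0.5 * f(1.0))

def romberg_array(f, levels):
    rows = []
    for k in range(levels):
        row = [trapezoid(f, 2 ** k)]          # basic approximation
        for j in range(k):                    # recursion (2.19)
            factor = 4.0 ** (j + 1)
            row.append((factor * row[j] - rows[-1][j]) / (factor - 1.0))
        rows.append(row)
    return rows

rows = romberg_array(math.exp, 5)             # M up to 16
best = rows[-1][-1]                           # highest-order entry, last row
print(best, best - (math.e - 1.0))            # error is near machine precision
```

With only 16 intervals the extrapolated value is accurate to roughly twelve digits, while the basic trapezoid approximation at that level is accurate to only about three.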
It turns out that if $X_1, \dots, X_n$ all have Beta distributions with integer parameters, and if we use a sequence of break points consisting of the equally-spaced points $\frac{1}{M} < \frac{2}{M} < \cdots < \frac{M-1}{M}$ from the interval $(0, 1)$, then the error of the approximation obtained using the algorithm given by equations (2.9) to (2.11) has exactly the structure given
Number of      Basic                Higher-order
intervals      approximations       approximations
1              $I_1^{(0)}$
2              $I_2^{(0)}$          $I_2^{(1)}$
4              $I_4^{(0)}$          $I_4^{(1)}$   $I_4^{(2)}$
8              $I_8^{(0)}$          $I_8^{(1)}$   $I_8^{(2)}$   $I_8^{(3)}$
...

Table 2.2: The triangular array of approximations obtained via Romberg integration
in equation (2.18). Though we do not yet have a rigorous proof of this fact for general n, we can show that the result holds for n = 2. The proof relies on the following result, which was first noticed by the pioneering probabilist Jacob Bernoulli. This and
other results involving the Bernoulli numbers are summarized in Sebah and Gourdon
(2002).
Theorem 2.1 (Bernoulli). Given positive integers $p$ and $n$, define

$$s_p(n) = \sum_{k=1}^{n-1} k^p.$$

Then there exists a fixed sequence of real numbers $\{B_k\}_{k=0}^{\infty}$ such that

$$s_p(n) = \sum_{k=0}^{p} \frac{B_k}{k!} \frac{p!}{(p+1-k)!}\, n^{p+1-k}$$

for all $p$ and $n$. Moreover, the numbers $\{B_k\}$, which are called the Bernoulli numbers, have the property that among the $B_k$ with $k$ odd, only $B_1 = -\frac{1}{2}$ is nonzero.
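Theorem 2.1 can be checked directly with exact rational arithmetic. In the sketch below (our Python illustration; the generating recursion for the Bernoulli numbers is the standard one and is not taken from the text):

```python
# Verify Bernoulli's closed form for s_p(n) = sum_{k=1}^{n-1} k^p using exact
# fractions.  The Bernoulli numbers are generated by the standard recursion
#   B_0 = 1,  B_m = -1/(m+1) * sum_{j<m} C(m+1, j) B_j,
# which yields B_1 = -1/2, the only nonzero B_k with k odd.
from fractions import Fraction
from math import comb, factorial

def bernoulli(n):
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / Fraction(m + 1))
    return B

def s_p(p, n, B):
    # Bernoulli's closed form from Theorem 2.1
    return sum(B[k] / factorial(k) * Fraction(factorial(p), factorial(p + 1 - k))
               * n ** (p + 1 - k) for k in range(p + 1))

B = bernoulli(10)
assert B[1] == Fraction(-1, 2) and B[3] == B[5] == B[7] == B[9] == 0
for p in range(1, 9):
    for n in range(2, 12):
        assert s_p(p, n, B) == sum(k ** p for k in range(1, n))
print("Theorem 2.1 verified exactly for p <= 8, n <= 11")
```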
Using Theorem 2.1, we can prove the following result.
Theorem 2.2. Suppose that X1 and X2 are continuous random variables with support
on [0, 1]. Let F1 and F2 be their respective cumulative distribution functions, and
assume that $F_1$ and $F_2$ are each polynomials on $[0, 1]$. Let the sequence of break points be the evenly spaced points $\frac{1}{M} < \cdots < \frac{M-1}{M}$. Then the algorithm given by equations (2.9) to (2.11) leads to an error of the form (2.18).
Proof of Theorem 2.2. We first note that since the algorithm given by (2.9) to (2.11) is linear in $F_1$ and $F_2$, it suffices to show the result for the case when both $F_1$ and $F_2$ are monomials. Since $F_1(0) = F_2(0) = 0$, neither $F_1$ nor $F_2$ can have a constant term. Thus, we may assume without loss of generality that there exist positive integers $r$ and $s$ such that $F_1(x) = x^r$ and $F_2(x) = x^s$ for $x \in [0, 1]$. The values $\{s_1(i, 1)\}$ are given by

$$s_1(i, 1) = \left(\frac{i+1}{M}\right)^r - \left(\frac{i}{M}\right)^r, \quad i = 0, \dots, M-1.$$

Then, since $p(2, i) = \left(\frac{i+1}{M}\right)^s - \left(\frac{i}{M}\right)^s$ for $i = 0, \dots, M-1$, we have that

$$s_2(i, 1) = p(2, i) \sum_{k=0}^{i-1} s_1(k, 1) = \left[\left(\frac{i+1}{M}\right)^s - \left(\frac{i}{M}\right)^s\right] \left(\frac{i}{M}\right)^r$$

and

$$s_2(i, 2) = \frac{1}{2} \left[\left(\frac{i+1}{M}\right)^s - \left(\frac{i}{M}\right)^s\right] \left[\left(\frac{i+1}{M}\right)^r - \left(\frac{i}{M}\right)^r\right].$$

Summing over all possible states at time $t = 2$, we have that the approximation is given by

$$\sum_{i=0}^{M-1} \frac{1}{2} \left[\left(\frac{i+1}{M}\right)^s - \left(\frac{i}{M}\right)^s\right] \left[\left(\frac{i+1}{M}\right)^r + \left(\frac{i}{M}\right)^r\right]. \tag{2.20}$$

We will have proved the result if we can show that (2.20) is a polynomial in $\frac{1}{M^2}$.
We can rewrite (2.20) as

$$\begin{aligned}
&\frac{1}{2M^{s+r}} \sum_{i=0}^{M-1} \left((i+1)^s - i^s\right)\left((i+1)^r + i^r\right) \\
&= \frac{1}{2M^{s+r}} \sum_{i=0}^{M-1} \left[i^r(i+1)^s + (i+1)^{r+s} - (i+1)^r i^s - i^{r+s}\right] \\
&= \frac{1}{2M^{s+r}} \left[\sum_{i=0}^{M-1} (i+1)^{r+s} - \sum_{i=0}^{M-1} i^{r+s} + \sum_{i=0}^{M-1} i^r(i+1)^s - \sum_{i=0}^{M-1} (i+1)^r i^s\right].
\end{aligned}$$

Changing the summation variable in two of the sums, we obtain the expression

$$\begin{aligned}
&\frac{1}{2M^{s+r}} \left[\sum_{j=1}^{M} j^{r+s} - \sum_{i=1}^{M-1} i^{r+s} + \sum_{i=0}^{M-1} i^r(i+1)^s - \sum_{j=1}^{M} j^r(j-1)^s\right] \\
&= \frac{1}{2M^{s+r}} \left[M^{r+s} + \sum_{i=0}^{M-1} i^r\left[(i+1)^s - (i-1)^s\right] - M^r(M-1)^s\right] \\
&= \frac{1}{2} + \frac{1}{2M^{s+r}} \left\{-M^r \sum_{k=0}^{s} \binom{s}{k} M^{s-k}(-1)^k + \sum_{i=0}^{M-1} i^r \left[\sum_{k=0}^{s} \binom{s}{k} i^{s-k}\left(1 - (-1)^k\right)\right]\right\}.
\end{aligned} \tag{2.21}$$

Then, using the sums $s_p(n)$ defined in Theorem 2.1, we can rewrite (2.21) as

$$\begin{aligned}
&\frac{1}{2} + \frac{1}{2M^{s+r}} \left\{-\sum_{k=0}^{s} \binom{s}{k} M^{r+s-k}(-1)^k + \sum_{k=0}^{s} \binom{s}{k} \left(1 - (-1)^k\right) \sum_{i=0}^{M-1} i^{r+s-k}\right\} \\
&= \frac{1}{2} + \frac{1}{2M^{s+r}} \left\{-\sum_{k=0}^{s} \binom{s}{k} M^{r+s-k}(-1)^k + \sum_{k=0}^{s} \binom{s}{k} \left(1 - (-1)^k\right) s_{r+s-k}(M)\right\}.
\end{aligned} \tag{2.22}$$
We now consider the terms in this expression that involve odd powers of $\frac{1}{M}$. Because of the factor $\frac{1}{2M^{s+r}}$ outside the braces in (2.22), we get a term involving an odd power of $\frac{1}{M}$ whenever the sums in the braces lead to a term of the form $CM^{s+r-w}$ for $w$ odd. The first sum yields such a term for every odd $w$ less than or equal to $s$, and that term is $\binom{s}{w} M^{s+r-w}$. The second sum has a summand that takes the value 0 whenever the summation index $k$ is even. Thus, we need consider only the odd values of $k$ in that sum. But if $k$ is odd, then the only term of the form $CM^{r+s-w}$ for $w$ odd that occurs in the polynomial $s_{r+s-k}(M)$ is the term $-\frac{1}{2} M^{r+s-k}$. This follows from the fact that the only nonzero Bernoulli number with an odd index is $B_1 = -\frac{1}{2}$. This leads, when $k = w$, to a summand of the form $\binom{s}{w}\left(-\frac{1}{2} M^{r+s-w}\right)(2) = -\binom{s}{w} M^{r+s-w}$. Since this term is exactly the negative of the term arising from the first sum, the two terms cancel, and there are no odd powers of $\frac{1}{M}$ in the final expression.
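The proof can be corroborated numerically with exact arithmetic. In the following sketch (our Python illustration), we take $F_1(x) = x^2$ and $F_2(x) = x^3$; since the CDFs are polynomials, an even error expansion means $E(M) = c_2/M^2 + c_4/M^4$ exactly, so fitting $c_2$ and $c_4$ from $M = 1, 2$ must reproduce the $M = 4$ error with no residual:

```python
# Exact-arithmetic check of Theorem 2.2 for F_1(x) = x^2, F_2(x) = x^3.
from fractions import Fraction

def approx_2_20(r, s, M):
    # the n = 2 approximation (2.20) with break points i/M
    total = Fraction(0)
    for i in range(M):
        ds = Fraction(i + 1, M) ** s - Fraction(i, M) ** s
        sr = Fraction(i + 1, M) ** r + Fraction(i, M) ** r
        total += ds * sr / 2
    return total

r, s = 2, 3
true = Fraction(3, 5)                     # P(X1 <= X2) = int_0^1 x^2 d(x^3) = 3/5
E = {M: approx_2_20(r, s, M) - true for M in (1, 2, 4)}

# solve E(1) = c2 + c4 and E(2) = c2/4 + c4/16 for c2, c4
c2 = (Fraction(16) * E[2] - E[1]) / 3
c4 = E[1] - c2
assert E[4] == c2 / 16 + c4 / 256         # the even expansion holds exactly
print("c2 =", c2, " c4 =", c4)            # c2 = -1/6, c4 = 1/15
```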
We may thus approximate the probability $P(X_1 \le \cdots \le X_n)$ as follows. For a sequence of choices of $M$ that starts with some value $M_0$ ($M_0 = 1$ is a reasonable choice) and doubles at each step, we use the sequence of break points $\frac{1}{M} < \cdots < \frac{M-1}{M}$ to obtain the basic approximation $I_M^{(0)}$. After each new basic approximation is obtained, we apply the recursion (2.19) to compute all of the available higher-order approximations. When the estimate of the relative error falls below some prespecified tolerance level, we stop the process and use the most recent best approximation as our answer.
As an example, suppose that we wish to compute the probability $P(X_1 \le X_2 \le X_3)$, where $X_1$, $X_2$, and $X_3$ are independently distributed as Beta(1, 3), Beta(2, 2), and Beta(3, 1), respectively. Direct computation of the integral (2.1) shows that this probability is equal to $\frac{64}{105}$. The relative errors in the approximations based on selected numbers of intervals and levels of extrapolation are given in Table 2.3. Note that by
relative error we mean the ratio of the error to the true probability. The results in
Table 2.3 show that once the number of intervals is sufficiently large, the error in the basic approximations is reduced by a factor of roughly 4 with each doubling of the number of intervals. For the leftmost column of higher-order approximations, the error is reduced by a factor of roughly 16 with each doubling of the number of
Number of      Basic                    Higher-order
intervals      approximations           approximations
 3             $-1.63 \cdot 10^{-1}$
 6             $-4.27 \cdot 10^{-2}$    $2.32 \cdot 10^{-3}$
12             $-1.08 \cdot 10^{-2}$    $-1.43 \cdot 10^{-4}$   $2.01 \cdot 10^{-6}$
24             $-2.70 \cdot 10^{-3}$    $-8.91 \cdot 10^{-6}$   $3.14 \cdot 10^{-8}$   $-7.29 \cdot 10^{-16}$

Table 2.3: Relative errors in computing the probability $P(X_1 \le X_2 \le X_3)$, where $X_i \sim \text{Beta}(i, 4-i)$, $i = 1, \dots, 3$. The true probability is $\frac{64}{105}$.
intervals, while for the next column, the corresponding factor is 64. Using a maximum of just 24 intervals, an approximation accurate to 15 significant digits was obtained.
As a more difficult example along the same lines, suppose that we wish to compute the probability $P(X_1 \le \cdots \le X_{10})$, where $X_i \sim \text{Beta}(i, 11-i)$, $i = 1, \dots, 10$. The relative errors obtained for this computation are given in Table 2.4. Here the errors displayed in the table were obtained by comparing the respective basic and higher-
order approximations to the best available approximation based on 5,120 intervals.
This approximate value was roughly 0.00836. Table 2.4 shows that an approximation
accurate to 9 significant digits was obtained using a maximum of just 160 intervals.
To see that these results for Beta distributions extend to any situation in which
X1, . . . , Xn are distributed like order statistics from the same parent distribution, it
suffices to apply the probability-integral transformation. That result tells us that if F
is the parent distribution, then the random variables F (X1), F (X2), . . . , F (Xn) each
have Beta distributions with integer parameters. Specifically, if Xi is distributed like
the $k$th order statistic from a sample of size $m$, then $F(X_i)$ has a Beta$(k, m+1-k)$ distribution.
Number of      Basic                    Higher-order
intervals      approximations           approximations
 10            $-3.41 \cdot 10^{-1}$
 20            $-1.01 \cdot 10^{-1}$    $-2.12 \cdot 10^{-2}$
 40            $-2.64 \cdot 10^{-2}$    $-1.51 \cdot 10^{-3}$   $-2.00 \cdot 10^{-4}$
 80            $-6.67 \cdot 10^{-3}$    $-9.76 \cdot 10^{-5}$   $-3.23 \cdot 10^{-6}$   $-1.12 \cdot 10^{-7}$
160            $-1.67 \cdot 10^{-3}$    $-6.15 \cdot 10^{-6}$   $-5.07 \cdot 10^{-8}$   $-1.79 \cdot 10^{-10}$   $2.58 \cdot 10^{-10}$

Table 2.4: Relative errors in computing the probability $P(X_1 \le \cdots \le X_{10})$, where $X_i \sim \text{Beta}(i, 11-i)$, $i = 1, \dots, 10$. The true probability is approximately 0.00836.
To compute probabilities of the form $P(u_1 \le U_{k_1:m_1} \le v_1, \dots, u_n \le U_{k_n:m_n} \le v_n,\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n})$, we proceed in the following fashion. We first apply the algorithm from Section 2.2 using a sequence of break points $a_1 < \cdots < a_K$ that includes all of the points $\{u_i, v_i\}$. To move from the current sequence of break points to the next sequence, we double the number of intervals by adding to the original sequence the midpoints of each of the intervals $(0, a_1), (a_1, a_2), \dots, (a_K, 1)$. It is possible to show rigorously that if the Romberg integration algorithm works for the case in which $u_1 = \cdots = u_n = 0$ and $v_1 = \cdots = v_n = 1$, then it will also work for this more general situation.
2.4 A Product Trapezoid Rule Approach
In this section, we describe an algorithm for computing integrals of the form
$$I = \int_{S_n} \prod_{i=1}^{n} f_{k_i:m_i}(x_i)\, dx_n \cdots dx_1,$$

where $S_n = \{(x_1, \dots, x_n) \mid 0 \le x_1 \le \cdots \le x_n \le 1\}$ is an $n$-dimensional simplex and $f_{k:m}$ is the density for a Beta$(k, m+1-k)$ distribution. The initial part of this development is due to Lyness and Puri (1973), and it applies to computing any integral of the form

$$I = \int_{S_n} f(x_1, \dots, x_n)\, dx_n \cdots dx_1,$$

where $f(x_1, \dots, x_n)$ is a polynomial. The latter part, which is new, applies specifically to integration of separable polynomials like the product $f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{k_i:m_i}(x_i)$. Since Beta densities are polynomials, this algorithm can be used to compute the same sorts of probabilities that we computed in Section 2.3.
Suppose that $f$ is a continuous function defined on the interval $[a, b]$, and let $M$ be a positive integer. This integer $M$ determines the width $h = 1/M$ for the intervals used in the trapezoid rule. Following Lyness and Puri (1973), we may approximate $I = \int_{x=a}^{b} f(x)\,dx$ using a symmetric trapezoid rule

$$R_x^M[a, b]f = \frac{1}{M} \sum_{i=-\infty}^{\infty} \theta_i f(i/M),$$

where the weights $\{\theta_i\}$ are given by

$$\theta_i = \begin{cases} 1, & a < i/M < b, \\ 1/2, & a = i/M < b \text{ or } a < i/M = b, \text{ and} \\ 0, & \text{otherwise}. \end{cases}$$

For the case in which the interval $[a, b]$ is the interval $[0, 1]$, this trapezoid rule simplifies to

$$R_x^M[0, 1]f = \frac{1}{M} \sum_{i=0}^{M} \theta_i f(i/M),$$

where

$$\theta_i = \begin{cases} 1, & 0 < i < M, \text{ and} \\ 1/2, & i = 0 \text{ or } i = M. \end{cases}$$

Suppose now that $f$ is a continuous function defined on the 2-dimensional simplex $S_2 = \{(x, y) \mid 0 \le x \le y \le 1\}$. One way to approximate the integral

$$I = \int_{x=0}^{1} \int_{y=x}^{1} f(x, y)\,dy\,dx = \int_{S_2} f(x, y)\,dy\,dx$$

is to use a product trapezoid rule. We first apply the trapezoid rule $R_y^M[x, 1]$ in the $y$ direction, and we then apply the trapezoid rule $R_x^M[0, 1]$ in the $x$ direction. This gives an overall rule of the form

$$R_x^M[0, 1]\, R_y^M[x, 1]f = \frac{1}{M^2} \sum_{i=0}^{M} \sum_{j=i}^{M} \theta_{i,j} f(i/M, j/M),$$

where

$$\theta_{i,j} = \begin{cases} 1, & 0 < i < j < M, \\ 1/2, & 0 = i < j < M \text{ or } 0 < i = j < M \text{ or } 0 < i < j = M, \\ 1/4, & 0 = i = j < M \text{ or } 0 = i < j = M, \text{ and} \\ 0, & 0 < i = j = M. \end{cases}$$

Figure 2.1 shows the grid points and weights used when $M = 8$.
Generalizing this product trapezoid rule to the $n$-dimensional simplex $S_n$ gives a rule of the form

$$R_{x_1}^M[0, 1] \cdots R_{x_n}^M[x_{n-1}, 1]f = \frac{1}{M^n} \sum_{i_1=0}^{M} \sum_{i_2=i_1}^{M} \cdots \sum_{i_n=i_{n-1}}^{M} \theta_{i_1,\dots,i_n} f(i_1/M, \dots, i_n/M), \tag{2.23}$$

where

$$\theta_{i_1,\dots,i_n} = \begin{cases} (1/2)^{c(i_1,\dots,i_n)}, & i_{n-1} \ne M, \\ 0, & i_{n-1} = M, \end{cases}$$

$c(i_1, \dots, i_n) = I(0 = i_1) + I(i_1 = i_2) + \cdots + I(i_{n-1} = i_n) + I(i_n = M)$, and $I(A)$ is the indicator of the event $A$. Note that there is a lack of symmetry in this rule in that $\theta_{i_1,\dots,i_n}$ is zero whenever $i_{n-1} = M$, but $\theta_{i_1,\dots,i_n}$ need not be zero even if $i_2 = 0$.
Lyness and Puri (1973) show that if the function f(x1, . . . , xn) is a polynomial,
then the approximations obtained using the rule (2.23) satisfy an Euler-Maclaurin
expansion
$$R_{x_1}^M[0, 1] \cdots R_{x_n}^M[x_{n-1}, 1]f = I + \sum_{q=1}^{K} A_q/M^q, \tag{2.24}$$

[Figure 2.1: The weights $\theta_{i,j}$ for the case in which $n = 2$ and $M = 8$.]
where $I$ is the integral of $f$ over $S_n$, $K$ is a finite integer, and $A_q = 0$ for $q$ odd. That is, the error in the approximation (2.23) is an even polynomial in $1/M$. This Euler-
Maclaurin expansion allows us to use Romberg integration to combine a sequence of approximations into a new sequence with a higher order of accuracy. To see this, it suffices to note that the Euler-Maclaurin expansion tells us that the error in ap-
proximation has exactly the form (2.18). Thus, we may apply Romberg integration
exactly as we did in Section 2.3.
These results of Lyness and Puri (1973) are interesting, but they are not computationally feasible except when $n$ is small if one must actually compute the value $f(i_1/M, \dots, i_n/M)$ at each point in the sum (2.23). However, the joint densities that we wish to integrate have the form $f(x_1, \dots, x_n) = f_{k_1:m_1}(x_1) \cdots f_{k_n:m_n}(x_n)$. This
(2.23) in a recursive fashion.
Note first that the weight $\theta_{i_1,\dots,i_n}$ can be written as a product involving simple indicator functions. Specifically, we can write $\theta_{i_1,\dots,i_n}$ as

$$\left(1 - \tfrac{1}{2} I(i_1 = 0)\right)\left(1 - \tfrac{1}{2} I(i_2 = i_1)\right) \cdots \left(1 - \tfrac{1}{2} I(i_n = M)\right)\left(1 - I(i_{n-1} = M)\right).$$

Define values $\{V(t, i) \mid t = 1, \dots, n;\ i = 0, \dots, M\}$ by setting

$$V(t, i) = \sum_{0 \le i_1 \le \cdots \le i_{t-1} \le i} \theta_{i_1,\dots,i_{t-1},i} \left(\prod_{j=1}^{t-1} f_{k_j:m_j}(i_j/M)\right) f_{k_t:m_t}(i/M),$$

where

$$\theta_{i_1,\dots,i_t} = \left(1 - \tfrac{1}{2} I(i_1 = 0)\right)\left(1 - \tfrac{1}{2} I(i_2 = i_1)\right) \cdots \left(1 - \tfrac{1}{2} I(i_t = M)\right)\left(1 - I(i_{t-1} = M)\right).$$

Then the values $\{V(1, i)\}$ are given by

$$V(1, i) = \begin{cases} \tfrac{1}{2} f_{k_1:m_1}(0), & i = 0, \\ f_{k_1:m_1}(i/M), & 0 < i < M, \text{ and} \\ \tfrac{1}{2} f_{k_1:m_1}(1), & i = M, \end{cases}$$

and the values $\{V(t+1, i)\}$ may be computed from the values $\{V(t, i)\}$ via the recursion

$$V(t+1, i) = \begin{cases} \left[\sum_{j=0}^{i-1} V(t, j) + \tfrac{1}{2} V(t, i)\right] f_{k_{t+1}:m_{t+1}}(i/M), & 0 \le i \le M-1, \text{ and} \\[4pt] \tfrac{1}{2}\left[\sum_{j=0}^{M-1} V(t, j)\right] f_{k_{t+1}:m_{t+1}}(1), & i = M. \end{cases}$$

The product trapezoid rule approximation (2.23) is then given by

$$I_M^{(0)} = \frac{1}{M^n} \sum_{i=0}^{M} V(n, i).$$

An R function for implementing this algorithm is given in the appendix.
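For readers without the appendix at hand, the $V(t, i)$ recursion can be sketched as follows (a Python illustration of ours rather than the dissertation's R function; the $\frac{1}{2}$ factors implement the boundary weights described above). We integrate $f_{1:3} f_{2:3} f_{3:3}$ over the simplex $S_3$, i.e. the probability $P(U_{1:3} \le U_{2:3} \le U_{3:3}) = 64/105$ from Table 2.5:

```python
# Product trapezoid rule via the V(t, i) recursion for a separable integrand
# on the simplex 0 <= x_1 <= ... <= x_n <= 1.
from math import comb

def beta_density(k, m):
    # density of the kth order statistic of m uniforms: Beta(k, m + 1 - k)
    c = k * comb(m, k)
    return lambda x: c * x ** (k - 1) * (1.0 - x) ** (m - k)

def product_trapezoid(densities, M):
    f = densities[0]
    V = [0.5 * f(0.0)] + [f(i / M) for i in range(1, M)] + [0.5 * f(1.0)]
    for f in densities[1:]:
        V = [(sum(V[:i]) + 0.5 * V[i]) * f(i / M) for i in range(M)] + \
            [0.5 * sum(V[:M]) * f(1.0)]       # the i = M boundary case
    return sum(V) / M ** len(densities)

fs = [beta_density(k, 3) for k in (1, 2, 3)]
for M in (8, 16, 32):
    print(M, product_trapezoid(fs, M) - 64.0 / 105.0)
```

Doubling $M$ shrinks the error by roughly a factor of 4, as the even expansion (2.24) predicts, and the recursion costs far less than visiting every point of the $n$-dimensional grid.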
Because this product trapezoid rule algorithm uses a smaller state space than the
one used in the algorithm of Section 2.3, it is significantly faster than that algorithm.
However, if we attempt to use the product trapezoid rule algorithm to compute
probabilities of the form
$$P(u_1 \le U_{k_1:m_1} \le v_1, \dots, u_n \le U_{k_n:m_n} \le v_n,\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n}),$$
then the expansion (2.24) is no longer a polynomial in $\frac{1}{M^2}$. Instead, terms involving odd powers of $\frac{1}{M}$ also appear in the expansion. As a result, the product trapezoid rule is not as effective in this more general situation as is the rule from Section
2.3. However, the product trapezoid rule suggests a way to improve the earlier rule.
Specifically, instead of dividing by $s_i!$ as in (2.7), we could divide by $2^{s_i - 1}$ when $s_i > 1$.
This factor would certainly be the wrong factor to use in an exact calculation, but it
offers the advantage of reducing the dimension of the state space from two to one. In
addition, we find that an expansion (2.24) holds for the approximation error under
this new scheme.
This hybrid algorithm can be summarized as follows. We use states $s_t(i)$, where $t = 1, \dots, n$ and $i = 0, \dots, K$. For $t = 1$, the values $\{s_t(i)\}$ are given by $s_1(i) = p(1, i)$, and a recursive scheme for computing the values $\{s_{t+1}(i)\}$ from the values $\{s_t(i)\}$ is given by

$$s_{t+1}(i) = p(t+1, i) \left[\frac{1}{2} s_t(i) + \sum_{r < i} s_t(r)\right].$$

The approximation to the probability is then given by

$$\sum_{i=0}^{K} s_n(i).$$

An R function for implementing this hybrid algorithm is given in the appendix.
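The hybrid recursion combines naturally with the Romberg recursion (2.19). The sketch below (a Python illustration of ours, not the appendix's R function) computes $P(X_1 \le X_2 \le X_3)$ for $X_i \sim \text{Beta}(i, 4-i)$, the example of Table 2.3; the Beta$(k, m+1-k)$ CDF with integer parameters is just a binomial tail sum:

```python
# Hybrid state recursion plus Romberg extrapolation for an ordering
# probability of independent Beta random variables with integer parameters.
from math import comb

def beta_cdf(k, m):
    # CDF of the kth order statistic of m uniforms (binomial tail sum)
    return lambda x: sum(comb(m, j) * x ** j * (1.0 - x) ** (m - j)
                         for j in range(k, m + 1))

def hybrid(cdfs, M):
    # break points i/M; p[t][i] = P(X_{t+1} in (i/M, (i+1)/M])
    p = [[F((i + 1) / M) - F(i / M) for i in range(M)] for F in cdfs]
    s = p[0][:]
    for t in range(1, len(cdfs)):
        s = [p[t][i] * (0.5 * s[i] + sum(s[:i])) for i in range(M)]
    return sum(s)

cdfs = [beta_cdf(i, 3) for i in (1, 2, 3)]
rows = []
for k in range(7):                     # M = 1, 2, 4, ..., 64
    row = [hybrid(cdfs, 2 ** k)]
    for j in range(k):                 # Romberg recursion (2.19)
        c = 4.0 ** (j + 1)
        row.append((c * row[j] - rows[-1][j]) / (c - 1.0))
    rows.append(row)
print(rows[-1][-1], 64.0 / 105.0)      # the extrapolant matches 64/105
```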
2.5 Some Applications of the General Algorithm
As will be seen in Section 3.5, probabilities of the form
$$P(u_1 \le U_{k_1:m_1} \le v_1, \dots, u_n \le U_{k_n:m_n} \le v_n,\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n}) \tag{2.25}$$
arise when one wants to construct optimal distribution-free confidence bands for a
continuous CDF on the basis of a ranked-set sample or some other sample consisting
of independent order statistics. Another setting in which these probabilities arise is
in the construction of nonparametric tests. For example, probabilities of the form
(2.25) can be used for testing whether the judgment rankings used in drawing a given
ranked-set sample are perfect or not. This topic is the subject of a paper by Öztürk, Frey, and Deshpande (2005).
$(R_{[1]1}, R_{[2]1}, R_{[3]1})$    Probability
(1, 2, 3)    $64/105 \approx 0.6095$
(1, 3, 2)    $143/840 \approx 0.1702$
(2, 1, 3)    $143/840 \approx 0.1702$
(2, 3, 1)    $17/840 \approx 0.0202$
(3, 1, 2)    $17/840 \approx 0.0202$
(3, 2, 1)    $1/105 \approx 0.0095$

Table 2.5: Distribution of the ranks $R_{[1]1}$, $R_{[2]1}$, and $R_{[3]1}$ when $m = 3$, $n_1 = n_2 = n_3 = 1$, and the rankings are perfect.

Suppose that $\{X_{[i]j};\ i = 1, \dots, m,\ j = 1, \dots, n_i\}$ is a ranked-set sample from some distribution with continuous CDF $F$. That is, suppose that the values $\{X_{[i]j}\}$ are
mutually independent and that each $X_{[i]j}$ is an $i$th judgment order statistic from a set of $m$ observations drawn from $F$. Let $R_{[i]j}$ be the rank of the measured value $X_{[i]j}$ among the full set of $N = n_1 + \cdots + n_m$ observations. Since $F$ is a nondecreasing function, the ranking of the values $\{X_{[i]j}\}$ is the same as the ranking of the values $\{F(X_{[i]j})\}$. Under the hypothesis of perfect rankings, these latter values $\{F(X_{[i]j})\}$ are jointly distributed like independent order statistics from a standard uniform parent
distribution, meaning that the distribution of the ranks $\{R_{[i]j}\}$ is distribution-free over the class of continuous $F$. Thus, we can construct a nonparametric test for
perfect rankings simply by selecting a subset of the set of all rankings and taking
that subset to be our rejection region.
In nonparametric tests based on simple random samples, all rankings are typically equally likely under the null hypothesis. In RSS, however, the observations in each judgment order statistic class have a different distribution, meaning that the proba-
bilities of different rankings may be quite different. As an example, Table 2.5 gives
the probability of each possible ranking when m = 3, n1 = n2 = n3 = 1, and the
rankings are perfect.
Since each possible ranking $\{R_{[i]j}\}$ has a (potentially) different probability under perfect rankings, one natural way to create a rejection region is to take the set of all rankings whose probabilities fall below a certain threshold value. Since these low-probability rankings are in some sense least consistent with perfect rankings, such a test has appeal even without reference to the potential alternatives. In particular, using a rejection region that consists of the data points that are least likely under the null hypothesis is an approach analogous to that used by Fisher in developing his tests of significance. As an example, we note that in the case presented in Table 2.5, a test with level exactly 0.05 can be obtained by rejecting when the observed ranking has probability 17/840 or less.
An alternate motivation for using the probability of the observed collection of ranks as a test statistic can be obtained by noting that under random rankings, each possible ranking is equally likely. Thus, the test that rejects when the probability of the observed ranking is too low is the same as the test that rejects when the ratio of the probability of the observed ranking under perfect rankings to the probability
under random rankings is small. Since tests based on such likelihood ratios are known
by the Neyman-Pearson Lemma (see, for example, Lehmann (1997), p. 74) to be most
powerful tests of one simple hypothesis against another, such tests can be motivated
as most powerful rank tests of perfect rankings against random rankings.
Because the distribution theory for the ranks $\{R_{[i]j}\}$ is not tractable, critical values
mated via simulation or determined (for sufficiently small sample sizes) via exhaustive
enumeration. In either case, one of the keys to determining critical values is being
able to compute the value of the test statistic, which is simply the probability of the
observed ranking under the null hypothesis of perfect rankings. This computation can be done using combinatorial arguments if the sample is very small, but computations for larger samples are more complicated. They can, however, be completed by writing those probabilities in the form (2.25) and applying the algorithms developed in this chapter. To set notation, suppose that for $i = 1, \dots, N$, the measured value with rank $i$ came from the $k_i$th judgment order statistic class. That is, assume that the value with rank $i$ was ranked as the $k_i$th smallest value in its set. Then the probability under the null hypothesis that the ranks $\{R_{[i]j}\}$ take their observed values is the probability $P(U_{k_1:m} \le \cdots \le U_{k_N:m})$, where the random variables $U_{k_1:m}, \dots, U_{k_N:m}$ are distributed like independent order statistics from a standard uniform parent distribution. This probability can be computed using either the product trapezoid algorithm or the hybrid algorithm developed in Section 2.4.
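As a sanity check on Table 2.5, the null distribution of the ranks can also be estimated by direct simulation (a cross-check we include for illustration; it is not one of the chapter's algorithms). Under perfect rankings with $m = 3$ and one observation per class, the three measured values are independent order statistics, so we draw each from its own sample of three uniforms:

```python
# Monte Carlo estimate of the rank distribution of Table 2.5 under perfect
# rankings (m = 3, n_1 = n_2 = n_3 = 1).
import random
from collections import Counter

random.seed(1)
trials = 200_000
counts = Counter()
for _ in range(trials):
    # independent judgment order statistics: the min, median, and max of
    # three separate samples of three uniforms
    x = [sorted(random.random() for _ in range(3))[k] for k in range(3)]
    ranks = tuple(1 + sum(y < xi for y in x) for xi in x)
    counts[ranks] += 1

for ranking, prob in [((1, 2, 3), 64 / 105), ((3, 2, 1), 1 / 105)]:
    print(ranking, counts[ranking] / trials, prob)
```

The estimated frequencies agree with the exact values 64/105 and 1/105 to well within simulation error.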
As an example, we computed critical values for these tests for choices of the set
size m and the number of cycles n for which the total number of possible rankings
was manageable. These critical values are given in Table 2.6, and they are stated
in scientific notation so that 5.467E-5, for example, means 0.00005467. One may note that since the number of distinct values of the test statistic is large even for small values of m and n, tests with levels very close to their nominal level are readily available even for small sample sizes N. This is not always the case when a test statistic can take on only a discrete set of values.
m  n   cumulative probability   critical value
2  2   0.0413    6.746E-3
       0.1968    1.944E-2
2  3   0.0332    2.048E-4
       0.0553    3.058E-4
       0.0923    5.143E-4
       0.1202    7.740E-4
2  4   0.0443    5.324E-6
       0.0519    6.569E-6
       0.0891    1.039E-5
       0.1032    1.222E-5
2  5   0.0482    7.933E-8
       0.0505    8.024E-8
       0.0997    1.649E-7
       0.1049    1.812E-7
2  6   0.0494    7.052E-10
       0.0501    7.242E-10
       0.0999    1.692E-9
       0.1017    1.693E-9
2  7   0.0499    4.468E-12
       0.0501    4.483E-12
       0.0998    1.046E-11
       0.1004    1.050E-11
2  8   0.0500    2.186E-14
       0.0999    5.656E-14
       0.1001    5.675E-14
2  9   0.0500    8.598E-17
       0.1000    2.291E-16
3  1   0.0500    2.024E-2
       0.3905    1.702E-1
3  2   0.0438    4.050E-4
       0.0504    4.112E-4
       0.0945    9.552E-4
       0.1138    1.206E-3
3  3   0.0498    1.303E-6
       0.0504    1.312E-6
       0.0992    3.372E-6
       0.1007    3.449E-6
3  4   0.0500    1.731E-9
       0.1000    4.824E-9
4  1   0.0429    1.275E-2
       0.0909    2.399E-2
       0.1419    2.549E-2
4  2   0.0496    1.791E-5
       0.0502    1.825E-5
       0.0985    4.468E-5
       0.1001    4.872E-5
5  1   0.0447    3.333E-3
       0.0534    4.319E-3
       0.0891    6.490E-3
       0.1103    1.059E-2
5  2   0.0500    6.104E-7
       0.1000    1.714E-6

Table 2.6: Exact critical values (in scientific notation) and levels for the exact test. In cases where the levels 0.05 and 0.10 are not achieved (to four decimal places), the two bracketing values are given.
CHAPTER 3
OPTIMAL DISTRIBUTION-FREE CONFIDENCE BANDS FOR A CDF
3.1 An Overview of the Literature
A good starting point for a discussion of the literature on distribution-free confi-
dence bands for a continuous cumulative distribution function (CDF) is the landmark paper of Kolmogorov (1933). This paper originally appeared in Italian, but an En-
glish translation credited to Meneghini is included in the collection edited by Kotz
and Johnson (1992). Kolmogorov’s paper deals with certain issues related to the
problem of estimating a CDF using the empirical distribution function (EDF).
To set notation, suppose that $X_1, \dots, X_n$ is a collection of independent observations from a distribution with unknown continuous CDF $F$. Then the EDF is the real-valued function $F_n$ defined by

$$F_n(t) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le t),$$

where $I(A)$ is the indicator that the event $A$ is true. It is easily shown that for fixed
$t$, $F_n(t)$ is an unbiased estimator of $F(t)$, and it can also be shown that $F_n(t)$ is a consistent estimator of $F(t)$ as $n \to \infty$. The issues that interested Kolmogorov, however, concerned the functional convergence of $F_n$ to $F$. He defined the statistic
$D_n$ to be
$$D_n = \sup_t |F_n(t) - F(t)|,$$

the supremum of the pointwise absolute deviations between the EDF and the CDF, and he asked whether $D_n$ goes to 0 in probability as $n \to \infty$. To prove that in fact it does, Kolmogorov derived the asymptotic distribution of $D_n$, showing that as $n \to \infty$, $P(D_n \le \lambda/\sqrt{n})$ tends uniformly in $\lambda$ to $G(\lambda)$, where

$$G(\lambda) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2}.$$
In addition to deriving the asymptotic distribution of Dn, Kolmogorov noted that the
statistic Dn is distribution-free provided F is continuous, and he derived recursions for
computing finite-sample probabilities of the form $P(D_n \le u/n)$ for positive integers
u. Kolmogorov did not discuss in his paper the possibility of using Dn in a goodness-
of-fit test, and he also did not discuss the possibility of inverting such a test to
produce a confidence band for F , but other researchers wasted little time in doing so.
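Kolmogorov's limiting distribution $G(\lambda)$ is easy to evaluate numerically, since the series converges extremely fast. A small sketch (our illustration, not part of the dissertation):

```python
# Evaluate Kolmogorov's limiting distribution G(lambda).  Truncating the
# alternating series at |k| <= 10 is far more than enough for lambda >= 0.5.
import math

def kolmogorov_G(lam, terms=10):
    return sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
               for k in range(-terms, terms + 1))

# The 0.95 quantile of the limiting distribution is approximately 1.358, the
# constant behind the familiar large-sample band half-width 1.358 / sqrt(n).
print(kolmogorov_G(1.358))
```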
One of these researchers was Smirnov, who made a variety of theoretical and
practical contributions towards turning Kolmogorov’s results into usable statistical
procedures. These contributions are summarized by Stephens (1992). Smirnov pro-
posed one-sided versions
+ Dn = sup Fn(t) F (t) t { − } and
$$D_n^- = \sup_t \{F(t) - F_n(t)\}$$
of Kolmogorov’s original statistic Dn, and he also proposed two-sample versions of
these statistics that can be used to test for equality of two unknown continuous
distributions. He published tables of the asymptotic distributions for the statistics
$D_n$, $D_n^+$, and $D_n^-$, all of which are now referred to as Kolmogorov-Smirnov statistics.
The testing and inversion process for the Kolmogorov-Smirnov statistic $D_n$ is as follows. Let $F$ be an unknown continuous CDF from which we draw independent observations $X_1, \dots, X_n$, and suppose that $F_0$ is a fully-specified continuous CDF whose goodness-of-fit we wish to test. Our null and alternative hypotheses are then
$$H_0: F = F_0 \quad \text{and} \quad H_1: F \ne F_0.$$
If we use $D_n = \sup_t |F_n(t) - F_0(t)|$ as our test statistic and seek to carry out a level $\alpha$ test, then we can determine an appropriate critical value $c_\alpha$ by solving the equation
$$P_{H_0}(D_n \le c_\alpha) = 1 - \alpha. \tag{3.1}$$
The test that rejects $H_0$ exactly when $D_n$ exceeds $c_\alpha$ is then a level $\alpha$ test. Inverting this test gives a confidence set consisting of all those continuous CDFs $G$ that would not have been rejected had they been used as $F_0$, and this confidence set has a confidence coefficient of $100(1-\alpha)\%$ by the usual argument. Because $D_n$ is just the supremum of the pointwise absolute deviations between $F_0$ and $F_n$, the confidence set obtained by inverting the test based on $D_n$ with critical value $c_\alpha$ is the set of all continuous CDFs $G$ that never differ from $F_n(t)$ by more than $c_\alpha$. It is thus a confidence band with upper bound $\min\{1, F_n(t) + c_\alpha\}$ and lower bound $\max\{0, F_n(t) - c_\alpha\}$.

One area of research that attracted many leading statisticians and probabilists in the decades following Kolmogorov's paper was that of asymptotics. Connections were established between Kolmogorov's statistic $D_n$ and the Brownian bridge on the interval $[0, 1]$, and a variety of theoretical results, including more elegant proofs of
Kolmogorov's result on the asymptotic distribution of $D_n$, were produced. More relevant to our work, however, are two other areas of research that developed out of Kolmogorov's paper, namely research into better methods for computing the finite-sample distribution of $D_n$ and other distribution-free statistics and research on distribution-free alternatives to the basic statistic $D_n$. We discuss first the literature on computational methods.
In his 1933 paper, Kolmogorov derived recursions for computing finite-sample probabilities of the form $P(D_n \le u/n)$, where $u$ is a positive integer. A number of other early papers also presented methods for computing general probabilities of the form $P(D_n^+ \le \lambda)$ or $P(D_n^- \le \lambda)$ for the one-sided versions $D_n^+$ and $D_n^-$ of $D_n$. Massey (1949) derived a method for computing $P(D_n \le \lambda)$ for any rational $\lambda$, but his method slows down considerably when the denominator of the fraction $\lambda$ is different from $n$. It does not seem to have been until almost 1970, in fact, that computational algorithms were developed that were fast enough and general enough to allow computation of values $P(D_n \le \lambda)$ for arbitrary $\lambda$, thus making it possible to find quantiles of the distribution of $D_n$. These quantiles, of course, are what the applied statistician needs to carry out tests and to find confidence bands in small-sample situations.
The determination of critical points for $D_n$ and the related statistics discussed here is essentially a root-finding problem. Given a particular critical value, we compute the corresponding coverage probability. Since the coverage probability is a nondecreasing continuous function of the critical value, we are guaranteed that there is a critical value that gives the desired coverage probability. A simple bisection scheme, starting with an interval known to contain the solution, gives an answer to any desired level of accuracy in a fixed number of iterations, but more sophisticated root-finding schemes may be faster.
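The bisection scheme just described is easy to sketch in code. In the snippet below, `coverage` stands for any nondecreasing, continuous function returning the coverage probability at a candidate critical value; the demonstration uses the elementary $n = 1$ case, where $D_1 = \max(U_1, 1 - U_1)$ and $P(D_1 \le c) = 2c - 1$ for $c \in [1/2, 1]$. The function name and interface are ours, not the dissertation's.

```python
def critical_value(coverage, target, lo, hi, tol=1e-10):
    """Bisection for a critical value c with coverage(c) = target.

    Assumes coverage is nondecreasing and continuous on [lo, hi],
    with coverage(lo) <= target <= coverage(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# n = 1 demonstration: P(D_1 <= c) = 2c - 1 on [1/2, 1],
# so the 95% critical value is c = 0.975.
c = critical_value(lambda c: 2 * c - 1, 0.95, 0.5, 1.0)
```

Each iteration halves the bracketing interval, so the number of iterations needed for a given accuracy is fixed in advance, as noted above.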
Steck (1971) derived a method for computing the probability that a collection of standard uniform order statistics lies in a specified multidimensional rectangle. To understand his approach, suppose that $0 \le u_1 \le \ldots \le u_n \le 1$ and $0 \le v_1 \le \ldots \le v_n \le 1$ are sequences of lower and upper bounds satisfying $u_i < v_i$, $i = 1, \ldots, n$. Set
$$P_n(u, v) = P(u_i \le U_{(i)} \le v_i,\ i = 1, \ldots, n), \qquad (3.2)$$
where $U_{(1)}, \ldots, U_{(n)}$ are order statistics based on a simple random sample of size $n$ from the standard uniform distribution. Steck proved the following result, which expresses the probability (3.2) as the determinant of a matrix.
Theorem 3.1 (Steck). Let $M$ be the $n \times n$ matrix whose $(i, j)$ entry is
$$M_{ij} = \begin{cases} (v_i - u_j)_+^{\,j-i+1}/(j-i+1)!, & j - i + 1 \ge 0, \\ 0, & j - i + 1 < 0, \end{cases} \qquad (3.3)$$
where $(x)_+ = \max\{x, 0\}$. Then
$$P_n(u, v) = n!\det[M].$$
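Theorem 3.1 translates directly into code. The sketch below (function names ours) uses exact rational arithmetic via `fractions.Fraction`, which sidesteps the loss of accuracy that subtraction causes in floating-point evaluations of the determinant, at the cost of speed.

```python
from fractions import Fraction
from math import factorial

def det(M):
    """Determinant by Gaussian elimination; exact on Fractions."""
    M = [row[:] for row in M]
    n = len(M)
    sign, result = 1, Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return sign * result

def steck_probability(u, v):
    """P(u_i <= U_(i) <= v_i, i = 1..n) for standard uniform order
    statistics via Steck's determinant: n! det[M], where
    M_ij = (v_i - u_j)_+^(j-i+1) / (j-i+1)! when j - i + 1 >= 0."""
    n = len(u)
    u = [Fraction(x) for x in u]
    v = [Fraction(x) for x in v]
    M = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            p = j - i + 1
            if p >= 0:
                M[i][j] = max(v[i] - u[j], Fraction(0)) ** p / factorial(p)
    return factorial(n) * det(M)
```

Because every matrix entry is rational when the bounds are, the result is exact for any $n$; the instability reported at moderate sample sizes is purely a floating-point phenomenon.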
Steck's theorem allows the computation of critical values for $D_n$, the one-sided statistics $D_n^+$ and $D_n^-$, and many other distribution-free statistics. Because it gives an
analytic expression for the coverage probability, it is also useful in deriving theoretical
results. Unfortunately, because computation of determinants involves subtraction as
well as addition and multiplication, Steck’s method becomes numerically unstable
when the sample size n becomes even moderately large. Steck reports, for example,
that even double precision arithmetic gives inaccurate results at sample sizes as small
as n = 50, and similar computational difficulties are reported by Calitz (1987).
A method for computing the probability (3.2) that avoids the numerical difficulties encountered by Steck's determinant method was developed by Noé (1972). Noé's method, which requires a set-up process similar to that of the algorithm discussed in Section 2.2, avoids the use of subtraction. It does, however, require computation of potentially large binomial coefficients. A cleverly chosen set of events involving
standard uniform order statistics is defined, and the probabilities of these events are computed in a recursive scheme that leads in the end to the probability of interest.
Like Steck's determinant, Noé's recursion can also be used for direct computations of power for specified alternative hypotheses. The computational time involved in implementing Noé's recursion is comparable to the time required for the new algorithm developed in Section 2.2. One reason for preferring that new algorithm, however, is the fact that it can be adapted for use in computing probabilities similar to (3.2), but involving independent order statistics.
Looking back over the history leading up to the algorithms developed by Steck
and Noé, it seems that the act of writing the coverage probability in the form (3.2)
was actually a key step in developing a usable algorithm. It was well-known, of
course, that because the statistic $D_n$ is distribution-free, it was sufficient to
do the calculations for the case of the standard uniform distribution, which has CDF
$F(t) = t$ on the interval $[0, 1]$. Because of this observation, authors such as Massey
(1949) formulated the problem of computing $P(D_n \le \lambda)$ as that of computing
$$P(\max(t - \lambda, 0) \le F_n(t) \le \min(t + \lambda, 1),\ t \in [0, 1]).$$
This formulation expresses the probability $P(D_n \le \lambda)$ in terms of $F_n(t)$, whose distribution is not known. Since the only values $F_n(t)$ can take are integer multiples of $1/n$, this formulation also led these authors to think in terms of discrete grids of points. Only later, when the problem was framed in terms of standard uniform order
statistics, were good general algorithms found.
We now discuss the literature on alternatives to the basic Kolmogorov-Smirnov statistic $D_n$. Even very soon after Kolmogorov's paper, statisticians recognized that the test statistic $D_n$, while sensitive to certain types of alternatives, was perhaps
not the ideal test statistic for other situations. In particular, it did not seem likely
to perform well against heavy-tailed alternatives. Wald and Wolfowitz (1939) were
among the first to point out that very general distribution-free confidence bands
were possible, and they even proposed that principles similar to those applied in the
Neyman-Pearson theory of hypothesis testing be used to find appropriate confidence band procedures to use in testing against specific alternatives. The first paper to propose a specific test statistic that stood the test of time, however, was Anderson and Darling (1952).
Anderson and Darling (1952) consider weighted versions of the basic statistic $D_n$. Specifically, they let $\psi(x)$ be a positive weight function defined on the interval $[0, 1]$. They then define the weighted Kolmogorov-Smirnov statistic $D_\psi$ as
$$D_\psi = \sup_t |F_n(t) - F(t)|\sqrt{\psi(F(t))}. \qquad (3.4)$$
Given almost any choice of the positive function $\psi$, a valid distribution-free test can
be based on $D_\psi$. However, in order for that test to yield a confidence band for $F$ when inverted, the solution to the inequality
$$|F_n(t) - F(t)| \le K\{\psi(F(t))\}^{-1/2}$$
must be an interval for each positive critical value $K$ and each possible value for $F_n(t)$.
The constant weight function $\psi(x) = 1$ gives the usual Kolmogorov-Smirnov statistic $D_n$. However, the more interesting weight function now associated with the names Anderson and Darling is the function $\psi(x) = (x(1-x))^{-1}$. The motivation for this choice of weights is the observation that since $F_n(t) \sim \frac{1}{n}\,\text{Bin}(n, F(t))$, the variance of $F_n(t)$ is proportional to $F(t)(1 - F(t))$. Letting $\psi(x)$ be $(x(1-x))^{-1}$ thus has the effect of standardizing the pointwise deviations $F_n(t) - F(t)$. Confidence bands based on this Anderson-Darling test statistic tend to have better power against heavy-tailed alternatives than do the confidence bands based on $D_n$. They also, however, tend to
be very wide.
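For a concrete sample, the weighted statistic (3.4) can be approximated by evaluating the weighted deviation at both one-sided EDF values at each jump point of $F_n$; for the unweighted statistic this gives the supremum exactly, while for a general weight $\psi$ the supremum may fall strictly between jumps, so this is a sketch rather than an exact algorithm. The function names are ours, and the code assumes $0 < F(x) < 1$ at every sample point.

```python
import math

def weighted_ks(sample, F, psi):
    """Approximate D_psi = sup_t |F_n(t) - F(t)| * sqrt(psi(F(t)))
    by checking both one-sided EDF values at each jump of F_n.
    Assumes 0 < F(x) < 1 at each observation x."""
    xs = sorted(sample)
    n = len(xs)
    best = 0.0
    for i, x in enumerate(xs, start=1):
        u = F(x)               # value of F at the jump point
        w = math.sqrt(psi(u))  # weight attached to deviations there
        best = max(best, w * abs(i / n - u), w * abs((i - 1) / n - u))
    return best
```

With `psi = lambda x: 1.0` this reduces to the usual $D_n$; with the variance-standardizing weight `psi = lambda x: 1 / (x * (1 - x))` it approximates the Anderson-Darling-type sup statistic.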
A variety of other weighted versions of $D_n$ have been explored, and some authors
have proposed test statistics based on taking the maximum of two or more weighted
test statistics. Mason and Schuenemeyer (1983), for example, show that certain
test statistics arising from taking the maximum of test statistics that are separately
sensitive to heavy-tailed or light-tailed distributions yield tests with nice asymptotic properties.
One approach to confidence bands that does not rely on the absolute deviation $|F_n(t) - F(t)|$ was proposed by Owen (1995). Owen's approach, which is motivated by the work of Berk and Jones (1979), consists of using a test statistic that is formed
by taking the infimum of a collection of pointwise test statistics, one for each real
number $t$. For fixed $t$, $nF_n(t) \sim \text{Bin}(n, F(t))$. A likelihood ratio type test of whether $F(t) = F_0(t)$ can thus be based on the quantity
$$L(F_0(t), F_n(t)) = \frac{F_0(t)^{nF_n(t)}(1 - F_0(t))^{n - nF_n(t)}}{F_n(t)^{nF_n(t)}(1 - F_n(t))^{n - nF_n(t)}} \le 1,$$
where a high value of $L(F_0(t), F_n(t))$ favors the null hypothesis. Letting the overall test statistic be $R_n = \inf_t L(F_0(t), F_n(t))$ and inverting the resulting hypothesis test
gives what Owen has called the nonparametric likelihood confidence band for the
CDF. Owen appeals to asymptotic theory developed by Berk and Jones (1979) to argue that nonparametric likelihood confidence bands should have better asymptotic power properties than bands based on the Kolmogorov-Smirnov statistic $D_n$ or any of the weighted test statistics $D_\psi$.
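A minimal sketch of these quantities (our function names, with the usual convention $0 \log 0 = 0$ and the assumption $0 < F_0(t) < 1$ at each observation): the pointwise likelihood ratio is evaluated on the log scale for numerical stability, and the infimum defining $R_n$ is taken over both one-sided EDF values at each jump point, which suffices because for a fixed $F_n$ value the binomial likelihood is unimodal in $F_0(t)$.

```python
import math

def pointwise_lr(p0, phat, n):
    """L(F0(t), Fn(t)): binomial likelihood at p0 divided by its
    maximum, with observed count k = n * Fn(t) and the convention
    0 * log 0 = 0.  Assumes 0 < p0 < 1."""
    k = round(n * phat)
    def loglik(p):
        ll = 0.0
        if k > 0:
            ll += k * math.log(p)
        if k < n:
            ll += (n - k) * math.log(1 - p)
        return ll
    return math.exp(loglik(p0) - loglik(k / n))

def owen_Rn(sample, F0):
    """R_n = inf_t L(F0(t), Fn(t)), scanning both one-sided EDF
    values at each jump point of F_n."""
    xs = sorted(sample)
    n = len(xs)
    best = 1.0
    for i, x in enumerate(xs, start=1):
        u = F0(x)
        best = min(best, pointwise_lr(u, i / n, n),
                   pointwise_lr(u, (i - 1) / n, n))
    return best
```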
Our focus thus far has been on confidence bands for an unknown continuous
CDF $F$. However, associated with each CDF $F$ is a quantile function $Q$ defined by $Q(u) = F^{-1}(u) = \inf\{t : F(t) \ge u\}$, and interest often lies more in the properties of $Q$ than in the properties of $F$. What is very convenient about the theory of distribution-free confidence bands for a continuous CDF is that in producing a distribution-free confidence band for $F$, one automatically obtains a distribution-free confidence band for the quantile function $Q$. This is true for all continuous CDFs $F$, but the argument can be presented most clearly if we assume that $F$ is strictly increasing on the real line. Suppose that $C(X_1, \ldots, X_n) \subset \mathbb{R}^2$ is a confidence band for $F$ with coverage probability $100(1 - \alpha)\%$. Define the set $C_{inv}(X_1, \ldots, X_n) \subset \mathbb{R}^2$ by
$$C_{inv}(X_1, \ldots, X_n) = \{(x, y) : (y, x) \in C(X_1, \ldots, X_n)\}. \qquad (3.5)$$
The continuous CDF $F$ is contained in $C(X_1, \ldots, X_n)$ exactly when all points $(x, y)$ of the graph of $F$ are contained in $C(X_1, \ldots, X_n)$. But a point $(x, y)$ is part of the graph of $F$ if and only if the point $(y, x)$ is part of the graph of $Q$. It thus follows by
(3.5) that $F$ is in $C(X_1, \ldots, X_n)$ if and only if $Q$ is in $C_{inv}(X_1, \ldots, X_n)$. As a result, the set $C_{inv}(X_1, \ldots, X_n)$ is a $100(1 - \alpha)\%$ confidence band for $Q$.

Although most of the literature on distribution-free confidence bands for a continuous CDF has treated the case in which the observations $X_1, \ldots, X_n$ are independent draws from the distribution with CDF $F$, there do exist more general situations in which distribution-free confidence bands can be constructed. One such situation is that of ranked-set sampling (RSS), a sampling procedure introduced by McIntyre
(1952). Stokes and Sager (1988) propose a method for finding a confidence band for
F based on a balanced ranked-set sample. To set notation, let k be the set size, and
let n be the number of cycles. Then the corresponding balanced ranked-set sample
consists of the nk observations
$$X_{(1)1}, \ldots, X_{(1)n}$$
$$X_{(2)1}, \ldots, X_{(2)n}$$
$$\vdots$$
$$X_{(k)1}, \ldots, X_{(k)n},$$
where the $\{X_{(i)j}\}$ are mutually independent, and $X_{(i)j}$ is distributed like the $i$th order statistic from a random sample of size $k$ drawn from the distribution with CDF $F$.
Stokes and Sager (1988) showed that in this balanced RSS situation, the RSS EDF
$$F^*(t) = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} I(X_{(i)j} \le t)$$
continues to be pointwise an unbiased estimator of $F$. They then proposed that one use the Kolmogorov-Smirnov type statistic
$$D^* = \sup_t |F^*(t) - F_0(t)|$$
to conduct distribution-free tests of $H_0$ versus $H_1$. This statistic leads to a test that for fixed level $\alpha$ has a smaller critical value than one would obtain when testing with a simple random sample of size $nk$. Consequently, inverting the test based on $D^*$ gives a narrower confidence band for $F$ than does inverting the test based on the
usual Kolmogorov-Smirnov statistic $D_{nk}$. One objection to the use of this method in
practice is the fact that when the judgment rankings used in drawing the ranked-set
sample are not perfect, the procedure will not have the prescribed level. Another
objection, which applies even when the judgment ranking is done perfectly, is that the test based on $D^*$, while exact in an unconditional sense, may have wildly different
conditional levels depending on the observed sequence of order statistics in the sample.
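The RSS EDF $F^*$ and the statistic $D^*$ are straightforward to compute once the balanced ranked-set sample is stored as $k$ rows of $n$ values (rank $i$ in row $i$). A sketch with our own function names; $D^*$ is evaluated at both one-sided values of $F^*$ at each pooled observation, which is where the supremum occurs for a continuous $F_0$.

```python
def rss_edf(rows):
    """RSS empirical CDF F*(t) from a balanced ranked-set sample
    given as k lists of n values; all nk observations are pooled
    with equal weight 1/(nk)."""
    flat = [x for row in rows for x in row]
    m = len(flat)
    return lambda t: sum(x <= t for x in flat) / m

def d_star(rows, F0):
    """D* = sup_t |F*(t) - F0(t)|, checking both one-sided values
    of F* at each pooled observation."""
    flat = sorted(x for row in rows for x in row)
    m = len(flat)
    return max(max(abs(i / m - F0(x)), abs((i - 1) / m - F0(x)))
               for i, x in enumerate(flat, start=1))
```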
In Section 3.2, we describe the class of distribution-free confidence bands that we will consider, and we explain the connection between coverage probabilities for these
bands and rectangle probabilities for standard uniform order statistics. In Section
3.3, we introduce a broad class of optimality criteria motivated by the idea that
the difference between the upper and lower bounds of the confidence band should
be small, and we derive the theoretical results needed for applying these optimality
criteria in practice. The key result that is used in much of this work is the Brunn-
Minkowski Inequality from the theory of convex bodies. In Section 3.4, we compare
the performance of the optimal confidence bands to that of other distribution-free
confidence bands both in terms of width and in terms of power against a variety
of alternative distributions. In Sections 3.5 and 3.6, we generalize the results of
Sections 3.2 to 3.4 to the situation in which the sample is a ranked-set sample or
some other sample consisting of independent order statistics from a continuous parent
distribution F .
3.2 A Class of Distribution-free Confidence Bands
Let $X_1, \ldots, X_n$ be a simple random sample from the distribution with continuous CDF $F$. Let $X_{(1)}, \ldots, X_{(n)}$ be the order statistics corresponding to $X_1, \ldots, X_n$, and set $X_{(0)} = -\infty$ and $X_{(n+1)} = \infty$. Let the vectors $a = (a_0, a_1, \ldots, a_n)$ and $b = (b_0, b_1, \ldots, b_n)$ be vectors of values from the interval $[0, 1]$. We can then define a
confidence band $CB(a, b)$ as follows. We say that a continuous CDF $G$ is contained in the confidence band $CB(a, b)$ if for each $i = 0, \ldots, n$, $a_i \le G(t) \le b_i$ for all $t$ in the interval $(X_{(i)}, X_{(i+1)})$. It is immediately clear that the confidence band must have coverage probability 0 if $a_0 > 0$ or $b_n < 1$. Moreover, if $a_{i+1} \le a_i$, then the coverage probability of the band will be unchanged if we increase $a_{i+1}$ to $a_i$, and if $b_{i+1} \le b_i$, then the coverage probability of the band will be unchanged if we decrease $b_i$ to $b_{i+1}$. Finally, given that the lower bounds $a_0, \ldots, a_n$ and the upper bounds $b_0, \ldots, b_n$ are ordered as $a_0 \le \ldots \le a_n$ and $b_0 \le \ldots \le b_n$, the coverage probability will necessarily be 0 unless $a_i \le b_{i-1}$, $i = 1, \ldots, n$, so that the intervals $(a_{i-1}, b_{i-1})$ and $(a_i, b_i)$ always overlap. It thus suffices to consider only vectors of bounds $a$ and $b$ satisfying the conditions
$$0 = a_0 \le a_1 \le \ldots \le a_n \le 1,$$
$$0 \le b_0 \le \ldots \le b_n = 1, \ \text{and}$$
$$a_i \le b_{i-1}, \quad i = 1, \ldots, n. \qquad (3.6)$$
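The conditions (3.6) are mechanical to verify; a small sketch (function name ours), with the requirements $a_0 = 0$ and $b_n = 1$ made explicit:

```python
def is_valid_band(a, b):
    """Check bound vectors a = (a_0,...,a_n), b = (b_0,...,b_n)
    against conditions (3.6)."""
    n = len(a) - 1
    return (a[0] == 0 and b[n] == 1
            and all(0 <= x <= 1 for x in a + b)
            and all(a[i] <= a[i + 1] for i in range(n))          # a nondecreasing
            and all(b[i] <= b[i + 1] for i in range(n))          # b nondecreasing
            and all(a[i] <= b[i - 1] for i in range(1, n + 1)))  # intervals overlap
```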
Note that the randomness of the band $CB(a, b)$ arises from the randomness in the values $X_{(1)}, \ldots, X_{(n)}$ and not from any randomness in the vectors of bounds $a$ and $b$. All Kolmogorov-Smirnov bands are confidence bands $CB(a, b)$ for properly chosen $a$ and $b$, but many other possibilities also exist. In particular, all weighted Kolmogorov-Smirnov confidence bands and the nonparametric likelihood confidence bands discussed in Section 3.1 have this form.
One may wonder whether the class of confidence bands that we are considering includes all distribution-free confidence bands. It does not, as it excludes, for example, randomized confidence bands. There is evidence, however, that the class includes all distribution-free confidence bands that one would want to use in practice. For example, the class contains all confidence bands whose upper and lower bounds are functions of the EDF $F_n(t)$, which is often called the nonparametric sufficient statistic. In addition, Lehmann (1997, pp. 326-328) showed that the class contains all confidence bands that are equivariant under a continuous, strictly increasing transformation of the data.
Given vectors of bounds $a$ and $b$ that satisfy (3.6), we may compute the coverage probability of $CB(a, b)$ as follows. First, using the fact that $F$ is a nondecreasing, continuous function, we may rewrite the probability $P_F(F \in CB(a, b))$ in a more convenient form. Consider the interval $(X_{(0)}, X_{(1)})$. A necessary and sufficient condition for $F$ not to pass above the upper bound $b_0$ on this interval is that $F(X_{(1)})$, the value of $F$ at the right-most point on the interval, be no larger than $b_0$. Similarly, a necessary and sufficient condition for $F$ not to pass below the lower bound $a_0 = 0$ on this interval is that $F(X_{(0)}) = 0$ be no smaller than $a_0$. The lower bound condition is thus trivially met on this particular interval, but not in general. Similar pairs of conditions are obtained for each of the intervals $(X_{(i)}, X_{(i+1)})$, $i = 1, \ldots, n$, and we may write that
$$P_F(F \in CB(a, b)) = P_F(a_1 \le F(X_{(1)}) \le b_0, \ldots, a_n \le F(X_{(n)}) \le b_{n-1}).$$
However, since $F$ is a continuous CDF, the vector $(F(X_{(1)}), \ldots, F(X_{(n)}))$ is distributed exactly like a vector of order statistics from the standard uniform distribution. As a result, we may write the coverage probability in the form
$$P_F(F \in CB(a, b)) = P(a_1 \le U_{(1)} \le b_0, \ldots, a_n \le U_{(n)} \le b_{n-1}), \qquad (3.7)$$
which is exactly the sort of rectangle probability that can be computed using Steck's determinant or Noé's recursion. As a final simplification, we note that because of the relationship between the distribution of a simple random sample $U_1, \ldots, U_n$ and the distribution of the order statistics $U_{(1)}, \ldots, U_{(n)}$, we may write $P_F(F \in CB(a, b))$ as
$$n!\,P(a_1 \le U_1 \le b_0, \ldots, a_n \le U_n \le b_{n-1},\ U_1 \le U_2 \le \ldots \le U_n). \qquad (3.8)$$
To compute this probability (3.8), we apply the algorithm for computing rectangle probabilities for standard uniform order statistics that we developed in Section 2.2. As a concrete example of the computations involved, we compute the coverage probability for a particular Kolmogorov-Smirnov confidence band.
Example 3.1: Suppose that the sample size is n = 4, and suppose that the half-width
is 1/2. Then the vectors of lower and upper bounds for the confidence band are
$$\left(0, 0, 0, \tfrac{1}{4}, \tfrac{1}{2}\right) \quad \text{and} \quad \left(\tfrac{1}{2}, \tfrac{3}{4}, 1, 1, 1\right),$$
respectively, meaning that the coverage probability can be written as
$$P\left(0 \le U_{(1)} \le \tfrac{1}{2},\ 0 \le U_{(2)} \le \tfrac{3}{4},\ \tfrac{1}{4} \le U_{(3)} \le 1,\ \tfrac{1}{2} \le U_{(4)} \le 1\right)$$
$$= 4!\,P\left(0 \le U_1 \le \tfrac{1}{2},\ 0 \le U_2 \le \tfrac{3}{4},\ \tfrac{1}{4} \le U_3 \le 1,\ \tfrac{1}{2} \le U_4 \le 1,\ U_1 \le U_2 \le U_3 \le U_4\right).$$
Letting $V_1, \ldots, V_4$ be independent standard uniform random variables truncated to the intervals $\left[0, \tfrac{1}{2}\right]$, $\left[0, \tfrac{3}{4}\right]$, $\left[\tfrac{1}{4}, 1\right]$, and $\left[\tfrac{1}{2}, 1\right]$, respectively, we then have that the coverage probability is
$$4!\left(\tfrac{1}{2}\right)^2\left(\tfrac{3}{4}\right)^2 P(V_1 \le V_2 \le V_3 \le V_4).$$
The sequence $\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{3}{4}$ and the random variables $V_1, V_2, V_3, V_4$ satisfy the EC condition of Section 2.2, meaning that we can apply the algorithm given by (2.9) to (2.11). The values $\{p(t, i)\}$ are as given in Table 3.1. The values $\{s_t(i, j)\}$ for times $t = 1, \ldots, 4$ are then as given in Table 3.2. Summing the last column of Table
        i
 t      0      1      2      3
 1     1/2    1/2     0      0
 2     1/3    1/3    1/3     0
 3      0     1/3    1/3    1/3
 4      0      0     1/2    1/2
Table 3.1: The values $\{p(t, i)\}$ when the algorithm of Section 2.2 is applied to the example in Section 3.2.
3.2 gives us that $P(V_1 \le \ldots \le V_4) = 13/54$, and the coverage probability is then $4!\,(1/4)(9/16)(13/54) = 13/16$, or exactly 81.25%. In practice, of course, the zeros that appear in Table 3.2 are known in advance (since they are determined by the values $\{p(t, i)\}$) and are used to reduce the computational burden.
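The exact value $13/16$ can be cross-checked by a Monte Carlo approximation of the rectangle probability (3.7), drawing uniform order statistics directly; the function name and simulation settings below are ours.

```python
import random

def coverage_mc(a, b, reps=200000, seed=1):
    """Monte Carlo estimate of (3.7): the fraction of samples whose
    uniform order statistics satisfy a_i <= U_(i) <= b_(i-1)."""
    rng = random.Random(seed)
    n = len(a) - 1
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        if all(a[i + 1] <= u[i] <= b[i] for i in range(n)):
            hits += 1
    return hits / reps

# Band from Example 3.1; the estimate should land near 13/16 = 0.8125.
est = coverage_mc([0, 0, 0, 0.25, 0.5], [0.5, 0.75, 1, 1, 1])
```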
The coverage probabilities we obtain from this procedure are correct provided
that the underlying CDF F is continuous. When it fails to be continuous, however,
the coverage probability depends on the specific distribution F . If one specifies the
distribution F , then arguments described by Gleser (1985) can be used to write the
coverage probability for any confidence band as a rectangle probability involving
uniform order statistics. Since F is in practice unknown, however, it is more useful to
know that the coverage probabilities are conservative when F fails to be continuous.
This important fact can be shown using the following theorem, which was proved by
Guilbaud (1986).
Theorem 3.2 (Guilbaud). Let $\mathcal{F}$ be the family of all CDFs. For each $F \in \mathcal{F}$, let $V_F = \{\nu : F(x) = \nu \text{ for some } x \in \mathbb{R}\}$ be the range of $F$, and let $D_F$ be the random
 i   j   s_1(i,j)   s_2(i,j)   s_3(i,j)   s_4(i,j)
 0   1     1/2         0          0          0
 0   2      0         1/12        0          0
 0   3      0          0          0          0
 0   4      0          0          0          0
 1   1     1/2        1/6        1/36        0
 1   2      0         1/12       1/36        0
 1   3      0          0         1/108       0
 1   4      0          0          0          0
 2   1      0         1/3        1/9        7/216
 2   2      0          0         1/18       1/36
 2   3      0          0          0         1/108
 2   4      0          0          0          0
 3   1      0          0         2/9        25/216
 3   2      0          0          0         1/18
 3   3      0          0          0          0
 3   4      0          0          0          0
Table 3.2: The values $\{s_t(i, j)\}$ when the algorithm of Section 2.2 is applied to the example in Section 3.2.
variable defined by $D_F = \sup_t d(F_n(t), F(t))$, where $d$ is a real-valued function defined on $[0, 1]^2$. Then the distribution of $D_F$ depends only on $V_F$, and $D_{F_1}$ is stochastically smaller than $D_{F_2}$ if $V_{F_1} \subset V_{F_2}$.
The fact that these confidence bands are conservative when $F$ fails to be continuous then follows as a corollary.

Corollary 3.1. If $CB(a, b)$ has coverage probability $\alpha$ when $F$ is continuous, then $CB(a, b)$ has coverage probability at least $\alpha$ when $F$ fails to be continuous.
Proof of Corollary 3.1. Define a function $d : [0, 1]^2 \to \mathbb{R}$ by setting
$$d(x, y) = \begin{cases} \dfrac{\left|y - \frac{1}{2}(a_i + b_i)\right|}{\frac{1}{2}(b_i - a_i)}, & x = \frac{i}{n}, \\ 0, & \text{otherwise}, \end{cases}$$
and let $D_F = \sup_t d(F_n(t), F(t))$. Then the coverage probability of the band $CB(a, b)$ is exactly the probability $P(D_F \le 1)$. Let $F$ be a continuous CDF, and let $F_d$ be some discontinuous CDF. Then $V_F = [0, 1]$, while $V_{F_d} \subset [0, 1]$. Thus, we have that $D_{F_d}$ is stochastically smaller than $D_F$, meaning that $P(D_{F_d} \le 1) \ge P(D_F \le 1) = \alpha$.

3.3 A Class of Optimality Criteria
In most of the literature on distribution-free confidence bands for a continuous
CDF, the emphasis is on finding confidence bands that have good power properties when they are used in the context of goodness-of-fit testing. However, if testing is really the goal, then much better tests are available than the ones that can be inverted to give confidence bands for $F$. For example, the familiar Neyman-Pearson Lemma allows one to find the most powerful test of the point null hypothesis $H_0 : F = F_0$ against any simple alternative. One might argue, instead, that statisticians producing confidence bands for $F$ intend to use the band as an exploratory analysis tool or as a tool for simultaneous inference rather than merely as a tool for goodness-of-fit testing.
When one thinks about a confidence band for $F$ as an exploratory analysis tool or a tool for simultaneous inference on quantiles or values $F(t)$, it becomes clear that the quality of a given band is essentially a function of how narrow that band is. Since the confidence bands introduced in Section 3.2 are of fixed width between pairs of consecutive order statistics, the $n + 1$ differences $b_0 - a_0, \ldots, b_n - a_n$ summarize what there is to say about the narrowness of the band. One simple optimality criterion would thus be the sum $\sum_{i=0}^{n}(b_i - a_i)$, which just totals the differences between the upper and lower bounds of the confidence band on each interval $(X_{(i)}, X_{(i+1)})$. Alternatively, one might choose a vector of positive weights $(w_0, \ldots, w_n)$ and seek to minimize the sum $\sum_i w_i(b_i - a_i)$ over all bands $CB(a, b)$ with a specified coverage probability. If one chooses the weights on the extremes to be large, that indicates special interest in having the band be narrow at the extremes. Similarly, choosing the weights to be large in the middle would indicate special interest in having the band be narrow in the middle.
Consider the problem of minimizing the criterion $\sum_i w_i(b_i - a_i)$ over all bands $CB(a, b)$ with coverage probability $\alpha \in (0, 1)$. We will prove that a minimizer must exist, that any minimizer must satisfy a small set of necessary and sufficient conditions, and that the minimizer will be unique under certain conditions that are often met in practice. First, however, we need to prove some results about coverage probabilities. Note that since $a_0 = 0$ and $b_n = 1$ are required for a band to have a nonzero coverage probability, there are actually only $2n$ bounds that need to be chosen to specify a band. Thus, we will assume that $a_0 = 0$ and $b_n = 1$ in the remainder of this chapter, shortening the vectors $a$ and $b$ accordingly. Let $D$ be the set of all vectors $(a, b) = (a_1, \ldots, a_n, b_0, \ldots, b_{n-1}) \in [0, 1]^{2n}$ that satisfy the conditions (3.6). Then, since $D$ is the intersection of a finite collection of closed half-spaces, it is convex and closed. Since $D$ is contained in the bounded set $[0, 1]^{2n}$, it is also bounded and hence compact. We may define the coverage probability function $H : D \to [0, 1]$ by
$$H(a_1, \ldots, a_n, b_0, \ldots, b_{n-1}) = P_F(F \in CB(a, b)).$$
An extremely useful alternate representation of $H$ is the integral representation
$$H(a_1, \ldots, a_n, b_0, \ldots, b_{n-1}) = n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1, \qquad (3.9)$$
which follows immediately from (3.8). We now derive some results dealing with differentiability properties of $H$. In what follows, we use the upper-case letters $A_1, \ldots, A_n, B_0, \ldots, B_{n-1}$ to indicate the bounds of a confidence band as variables and the corresponding lower-case letters to indicate particular values for the bounds.
We first consider differentiation of $H$ with respect to a lower bound. Let $(a, b) \in D$, where $a = (a_1, \ldots, a_n)$ and $b = (b_0, \ldots, b_{n-1})$, and suppose that $a_i$ is a lower bound such that $a_i < a_{i+1}$. Then if $a_i$ is increased by a sufficiently small amount $\epsilon > 0$, the modified vector is still in $D$. We have that
$$\Delta(\epsilon) \equiv H(a_1, \ldots, a_i + \epsilon, \ldots, b_{n-1}) - H(a_1, \ldots, a_i, \ldots, b_{n-1})$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i+\epsilon}^{b_i} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1 - n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i}^{b_i} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1$$
$$= -n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i}^{a_i+\epsilon} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1.$$
If we then take $\lim_{\epsilon \to 0} \frac{\Delta(\epsilon)}{\epsilon}$, we find that the right-hand partial derivative of $H$ with respect to $A_i$ is just
$$\left.\frac{\partial H}{\partial A_i^+}\right|_{(A,B)=(a,b)} = -n!\int_{a_1}^{b_0 \wedge a_i} \cdots \int_{a_{i-1}}^{b_{i-2} \wedge a_i} \int_{a_{i+1}}^{b_i} \cdots \int_{a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_{i-1} \le u_{i+1} \le \ldots \le u_n)\,du_n \ldots du_{i+1}\,du_{i-1} \ldots du_1, \qquad (3.10)$$
where $x \wedge y = \min\{x, y\}$. Note that this partial derivative is the sort of quantity that can be computed using the algorithm developed in Section 2.2.
Suppose now that $a_i$ satisfies $a_i > a_{i-1}$. Then if $a_i$ is decreased by a sufficiently small amount $\epsilon > 0$, the modified vector is still in $D$. We then have that
$$\Delta(\epsilon) \equiv H(a_1, \ldots, a_i - \epsilon, \ldots, b_{n-1}) - H(a_1, \ldots, a_i, \ldots, b_{n-1})$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i-\epsilon}^{b_i} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1 - n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i}^{b_i} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_i=a_i-\epsilon}^{a_i} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1.$$
If we then take $\lim_{\epsilon \to 0} \frac{\Delta(\epsilon)}{-\epsilon}$, we find that the left-hand partial derivative of $H$ with respect to $A_i$ is just
$$\left.\frac{\partial H}{\partial A_i^-}\right|_{(A,B)=(a,b)} = -n!\int_{a_1}^{b_0 \wedge a_i} \cdots \int_{a_{i-1}}^{b_{i-2} \wedge a_i} \int_{a_{i+1}}^{b_i} \cdots \int_{a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_{i-1} \le u_{i+1} \le \ldots \le u_n)\,du_n \ldots du_{i+1}\,du_{i-1} \ldots du_1, \qquad (3.11)$$
which is the same as the right-hand partial derivative. We have thus shown that $H$ has a regular partial derivative with respect to $A_i$ whenever $a_{i-1} < a_i < a_{i+1}$. Note, furthermore, that because the argument $a_i$ appears continuously in the expressions (3.10) and (3.11), the left-hand and right-hand partial derivatives are continuous in $A_i$ whenever they exist.
We now consider differentiation of $H$ with respect to an upper bound. Let $(a, b) \in D$ as before, and suppose that $b_j$ is an upper bound such that $b_j < b_{j+1}$. Then if $b_j$ is increased by a sufficiently small amount $\epsilon > 0$, the modified vector is still in $D$. We have that
$$\Delta(\epsilon) \equiv H(a_1, \ldots, b_j + \epsilon, \ldots, b_{n-1}) - H(a_1, \ldots, b_j, \ldots, b_{n-1})$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=a_{j+1}}^{b_j+\epsilon} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1 - n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=a_{j+1}}^{b_j} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=b_j}^{b_j+\epsilon} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1.$$
If we then take $\lim_{\epsilon \to 0} \frac{\Delta(\epsilon)}{\epsilon}$, we find that the right-hand partial derivative of $H$ with respect to $B_j$ is just
$$\left.\frac{\partial H}{\partial B_j^+}\right|_{(A,B)=(a,b)} = n!\int_{a_1}^{b_0} \cdots \int_{a_j}^{b_{j-1}} \int_{a_{j+2} \vee b_j}^{b_{j+1}} \cdots \int_{a_n \vee b_j}^{b_{n-1}} I(u_1 \le \ldots \le u_j \le u_{j+2} \le \ldots \le u_n)\,du_n \ldots du_{j+2}\,du_j \ldots du_1, \qquad (3.12)$$
where $x \vee y = \max\{x, y\}$.
Suppose, finally, that $b_j$ satisfies $b_j > b_{j-1}$. Then if $b_j$ is decreased by a sufficiently small amount $\epsilon > 0$, the modified vector is still in $D$. We then have that
$$\Delta(\epsilon) \equiv H(a_1, \ldots, b_j - \epsilon, \ldots, b_{n-1}) - H(a_1, \ldots, b_j, \ldots, b_{n-1})$$
$$= n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=a_{j+1}}^{b_j-\epsilon} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1 - n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=a_{j+1}}^{b_j} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1$$
$$= -n!\int_{u_1=a_1}^{b_0} \cdots \int_{u_{j+1}=b_j-\epsilon}^{b_j} \cdots \int_{u_n=a_n}^{b_{n-1}} I(u_1 \le \ldots \le u_n)\,du_n \ldots du_1.$$
If we then take $\lim_{\epsilon \to 0} \frac{\Delta(\epsilon)}{-\epsilon}$, we find that the left-hand partial derivative of $H$ with respect to $B_j$ is just
$$\left.\frac{\partial H}{\partial B_j^-}\right|_{(A,B)=(a,b)} = n!\int_{a_1}^{b_0} \cdots \int_{a_j}^{b_{j-1}} \int_{a_{j+2} \vee b_j}^{b_{j+1}} \cdots \int_{a_n \vee b_j}^{b_{n-1}} I(u_1 \le \ldots \le u_j \le u_{j+2} \le \ldots \le u_n)\,du_n \ldots du_{j+2}\,du_j \ldots du_1, \qquad (3.13)$$
which is the same as the right-hand partial derivative. We have thus shown that $H$ has a regular partial derivative with respect to $B_j$ whenever $b_{j-1} < b_j < b_{j+1}$. Note, furthermore, that just as in the case of taking derivatives with respect to a lower bound $A_i$, the left-hand and right-hand partial derivatives are continuous in $B_j$ whenever they exist. Putting these results on partial derivatives with respect to upper bounds together with the results on partial derivatives with respect to lower bounds, we have shown the following theorem.
Theorem 3.3. The coverage probability function $H$ has one-sided partial derivatives with respect to each lower bound $A_i$ and each upper bound $B_j$ at every point where modifying the bound in question by an arbitrarily small amount in the direction of interest does not take $(a, b)$ out of $D$. The left-hand and right-hand partial derivatives coincide when they both exist. Moreover, whenever a left-hand or right-hand partial derivative exists, it is continuous in the coordinate corresponding to that bound.
Theorem 3.3 implies that $H$ is continuous in $D$. More importantly, because the one-sided partial derivatives are continuous when they exist, Theorem 3.3 guarantees that we can write the change in coverage probability resulting from multiple infinitesimal changes in the bounds in terms of the partial derivatives of $H$ at the starting point.
Theorem 3.4. Let $(a, b) \in D$, and let $\delta = (\delta_{a_1}, \ldots, \delta_{b_{n-1}})$ be a fixed vector of real numbers. Suppose that there exists $r > 0$ small enough that adding $\epsilon\delta$ to $(a, b)$ results in another point in $D$ whenever $0 < \epsilon < r$. Then, if $\left.\frac{\partial H}{\partial A_i}\right|_{(A,B)=(a,b)}$ and $\left.\frac{\partial H}{\partial B_j}\right|_{(A,B)=(a,b)}$ are the partial derivatives of $H$ with respect to the bounds $A_i$ and $B_j$ in the directions corresponding to $\delta_{a_i}$ and $\delta_{b_j}$, we have that
$$H(a_1 + \epsilon\delta_{a_1}, \ldots, b_{n-1} + \epsilon\delta_{b_{n-1}}) = H(a_1, \ldots, b_{n-1}) + \epsilon\left(\delta_{a_1}\left.\frac{\partial H}{\partial A_1}\right|_{(A,B)=(a,b)} + \ldots + \delta_{b_{n-1}}\left.\frac{\partial H}{\partial B_{n-1}}\right|_{(A,B)=(a,b)}\right) + o(\epsilon). \qquad (3.14)$$
The following theorem uses the continuity of $H$ to show that an optimal confidence band must exist.

Theorem 3.5. Let $(w_0, \ldots, w_n)$ be a vector of positive weights, and let $\alpha \in (0, 1)$ be a specified coverage probability. Then there exists a confidence band that minimizes $\sum_i w_i(b_i - a_i)$ over the set of all confidence bands $(a, b) \in D$ such that $P_F(F \in CB(a, b)) = \alpha$.

Proof of Theorem 3.5. Since $H$ is continuous, and since coverage probability 0 and coverage probability 1 are each attainable in $D$, the set $\{(a, b) \in D : P_F(F \in CB(a, b)) = \alpha\}$ is nonempty. Also, since $H$ is continuous and $D$ is bounded, the set $\{(a, b) \in D : P_F(F \in CB(a, b)) = \alpha\}$ is compact. Since the function $(a, b) \to \sum_i w_i(b_i - a_i)$ is obviously continuous, it must achieve a minimum on the compact set $\{(a, b) \in D : P_F(F \in CB(a, b)) = \alpha\}$.
We now consider the problem of finding necessary and sufficient conditions for a
given confidence band to be optimal. Suppose that (a, b) is a vector that minimizes the criterion \sum_i w_i(b_i − a_i) over all vectors (a, b) ∈ D with coverage probability α. The next theorem shows that for any optimal band, the lower bounds a_1, ..., a_n and the upper bounds b_0, ..., b_{n-1} are strictly increasing as opposed to merely nondecreasing.
The following lemma provides a key observation that is used in the proof of the
theorem.
Lemma 3.1. If a confidence band (a, b) ∈ D is optimal with respect to the weights (w_0, ..., w_n), then no confidence band can have the same value as (a, b) for the optimality criterion and yet have a higher coverage probability.
Proof of Lemma 3.1. We proceed by proving the contrapositive. Let (a, b) ∈ D be a confidence band with coverage probability α ∈ (0, 1), and suppose that (c, d) ∈ D is another band such that \sum_i w_i(b_i − a_i) = \sum_i w_i(d_i − c_i), but H(c, d) > H(a, b) = α. Since H(c, d) > 0, the bound d_0 satisfies d_0 > c_1. If we lower d_0 towards c_1, then the modified confidence band remains in D. As we decrease d_0, we decrease the optimality criterion \sum_i w_i(d_i − c_i), and we also decrease the coverage probability from H(c, d) > α at the start to 0 once d_0 = c_1. Since H is a continuous function, there must exist a real number d_0^* ∈ (c_1, d_0) such that H(c_1, ..., c_n, d_0^*, d_1, ..., d_{n-1}) = α. Since this new band has the same coverage probability α as (a, b), but a smaller value for the optimality criterion, the confidence band (a, b) is not optimal with respect to the weights (w_0, ..., w_n).
Theorem 3.6. Let (a, b) ∈ D be a confidence band that is optimal at level α ∈ (0, 1) with respect to the weights (w_0, ..., w_n). Then a_1 < ... < a_n and b_0 < ... < b_{n-1}.

Proof of Theorem 3.6. Suppose that there exists i ∈ {1, 2, ..., n − 1} such that a_i = a_{i+1}. Then there either exists an integer l such that a_i = ... = a_{i+l} < a_{i+l+1} < 1, or an integer l such that a_i = ... = a_{i+l} < 1 and i + l = n, where the last inequality (involving 1) follows in each case from the fact that α = H(a, b) > 0. By (3.10), the partial derivative ∂H/∂A_{i+l}^+ |_{(A,B)=(a,b)} is equal to 0, while by (3.11), the partial derivative ∂H/∂B_0 |_{(A,B)=(a,b)} is strictly positive. If we increase a_{i+l} by ε/w_{i+l} and increase b_0 by ε/w_0, for ε > 0 small enough that the band remains in D, then the optimality criterion is unchanged. However, by the expansion (3.14), the coverage probability changes by (∂H/∂B_0 |_{(A,B)=(a,b)}) ε/w_0 + o(ε), which is positive once ε is sufficiently small. By Lemma 3.1, then, the confidence band (a, b) cannot be optimal. The proof that the upper bounds b_0, ..., b_{n-1} are strictly ordered is completely analogous.
The next theorem shows that, up to the vector of weights (w0, . . . , wn), the partial derivatives of H at an optimal band must all be the same in magnitude unless the corresponding bounds are equal to 0 or 1.
Theorem 3.7. Suppose that (a, b) is an optimal confidence band with respect to the optimality criterion with weights (w_0, ..., w_n). Then there exists a constant C > 0 such that for each lower bound a_i > 0,
\[
\frac{\partial H}{\partial A_i}\bigg|_{(A,B)=(a,b)} = -Cw_i,
\]
and for each upper bound b_j < 1,
\[
\frac{\partial H}{\partial B_j}\bigg|_{(A,B)=(a,b)} = Cw_j.
\]
If a_1 = 0, then
\[
\frac{\partial H}{\partial A_1^+}\bigg|_{(A,B)=(a,b)} \le -Cw_1,
\]
and if b_{n-1} = 1, then
\[
\frac{\partial H}{\partial B_{n-1}^-}\bigg|_{(A,B)=(a,b)} \ge Cw_{n-1}.
\]
Proof of Theorem 3.7. This proof uses the same type of argument used in the proof of Theorem 3.6. Let a_i be a lower bound, i = 2, ..., n. Then, since the lower bounds are strictly ordered, we can change the value of a_i slightly in either direction without leaving D. We can also change the value of b_0 slightly in either direction without leaving D. If we decrease a_i by ε/w_i and decrease b_0 by ε/w_0 for ε small (positive or negative), then the optimality criterion is unchanged, but by the expansion (3.14), the coverage probability changes by
\[
\left(-\frac{\partial H}{\partial A_i}\bigg|_{(A,B)=(a,b)}\Big/w_i - \frac{\partial H}{\partial B_0}\bigg|_{(A,B)=(a,b)}\Big/w_0\right)\epsilon + o(\epsilon). \tag{3.15}
\]
Since (a, b) is an optimal confidence band, the quantity (3.15) must always be nonpositive. But since we can choose ε to be positive or negative, (3.15) can only always be nonpositive if the quantity in parentheses, which doesn't depend on ε, is equal to 0. This proves that
\[
\frac{\partial H}{\partial A_i}\bigg|_{(A,B)=(a,b)} = -\left(\frac{\partial H}{\partial B_0}\bigg|_{(A,B)=(a,b)}\Big/w_0\right)w_i \equiv -Cw_i.
\]
The case of b_j < 1 follows by a similar argument.

Suppose that a_1 = 0. Then, since the lower bounds are strictly ordered, we can increase the value of a_1 slightly without leaving D. We can also increase the value of b_0 slightly without leaving D. If we increase a_1 by ε/w_1 and increase b_0 by ε/w_0, then the optimality criterion is unchanged, but the coverage probability changes by
\[
\left(\frac{\partial H}{\partial A_1^+}\bigg|_{(A,B)=(a,b)}\Big/w_1 + \frac{\partial H}{\partial B_0}\bigg|_{(A,B)=(a,b)}\Big/w_0\right)\epsilon + o(\epsilon). \tag{3.16}
\]
Since (a, b) is an optimal confidence band, the quantity (3.16) must always be nonpositive. But since we can choose ε to be arbitrarily small but positive, (3.16) can only always be nonpositive if the quantity in parentheses, which doesn't depend on ε, is nonpositive. This proves that
\[
\frac{\partial H}{\partial A_1^+}\bigg|_{(A,B)=(a,b)} \le -\left(\frac{\partial H}{\partial B_0}\bigg|_{(A,B)=(a,b)}\Big/w_0\right)w_1 = -Cw_1.
\]
The case of b_{n-1} = 1 follows similarly.
Given the necessary conditions that we found in Theorem 3.7, it is natural to ask whether these conditions are not only necessary, but also sufficient for a band (a, b) to be optimal. It turns out that the conditions given in Theorem 3.7 are indeed sufficient for a band in the class D to be optimal. Before we can show this result, however, we need to state an important result, the Brunn-Minkowski Inequality, from the theory of convex bodies. Proofs of this result are given in Leichtweiss (1998) and many other readily accessible references. Gardner (2002) gives a particularly fascinating and readable study of the Brunn-Minkowski Inequality, its history, and its relationship to other well-known inequalities in analysis and geometry. We note here that the convex combination (1 − λ)K_1 + λK_2 of the sets K_1 and K_2 is defined by
\[
(1-\lambda)K_1 + \lambda K_2 = \{(1-\lambda)x + \lambda y : x \in K_1,\; y \in K_2\}.
\]
Theorem 3.8 (Brunn-Minkowski). Let K_1 and K_2 be convex bodies in R^n, and let λ be a real number with 0 < λ < 1. Then if V_n(·) is the volume function in R^n, we have that
\[
V_n((1-\lambda)K_1 + \lambda K_2) \ge \left((1-\lambda)V_n(K_1)^{1/n} + \lambda V_n(K_2)^{1/n}\right)^n. \tag{3.17}
\]
If both V_n(K_1) and V_n(K_2) are positive, then equality holds in (3.17) if and only if
\[
K_2 = \rho K_1 + t \tag{3.18}
\]
for some ρ > 0 and t ∈ R^n.
Theorem 3.8 holds even under the weaker hypothesis that K_1 and K_2 are merely measurable rather than convex, provided that (1 − λ)K_1 + λK_2 is also measurable (see, for example, Gardner (2002)). For our results in this chapter, however, we need only the classical version of the inequality. The Brunn-Minkowski Inequality is relevant to our situation because any confidence band CB(a, b) may be thought of as a convex set in R^n. Specifically, the confidence band CB(a, b) may be thought of as the set of all points (x_1, ..., x_n) ∈ R^n such that a_1 ≤ x_1 ≤ b_0, ..., a_n ≤ x_n ≤ b_{n-1}, and x_1 ≤ x_2 ≤ ... ≤ x_n. This set CB(a, b) is convex because it is the intersection of a finite collection of half-spaces, which are themselves convex. The coverage probability of the confidence band CB(a, b), meanwhile, is just n! times the volume of CB(a, b) ⊂ R^n. The next theorem shows that the necessary conditions for optimality given in Theorem 3.7 are also sufficient.
Theorem 3.9. Suppose that (a, b) ∈ D is a confidence band that satisfies the four conditions given in Theorem 3.7. Then (a, b) minimizes \sum_i w_i(b_i − a_i) within the set of all confidence bands with coverage probability α = H(a, b).

Proof of Theorem 3.9. We prove the result by contradiction. Suppose that there exists a confidence band that has the same coverage probability as (a, b), but a strictly smaller value for the optimality criterion. Then, there must exist a confidence band (c, d) such that \sum_i w_i(b_i − a_i) = \sum_i w_i(d_i − c_i), but H(c, d) > H(a, b). Let λ ∈ (0, 1). Then, since D is a convex set, the convex combination (1 − λ)CB(a, b) + λCB(c, d) is another confidence band in D. Specifically, it is the band CB((1 − λ)a + λc, (1 − λ)b + λd). For this band, the value of the optimality criterion is
\[
\sum_i w_i\big((1-\lambda)b_i + \lambda d_i - ((1-\lambda)a_i + \lambda c_i)\big) = (1-\lambda)\sum_i w_i(b_i - a_i) + \lambda\sum_i w_i(d_i - c_i) = \sum_i w_i(b_i - a_i),
\]
which is the same, for any λ, as the value of the criterion for the confidence bands (a, b) and (c, d). Consider the function
\[
G(\lambda) \equiv H((1-\lambda)a + \lambda c, (1-\lambda)b + \lambda d) = n!\,V_n((1-\lambda)\,CB(a,b) + \lambda\,CB(c,d)).
\]
This function is differentiable via the chain rule since H is differentiable, and we have by the Brunn-Minkowski Inequality that
\[
\begin{aligned}
\frac{dG}{d\lambda}\bigg|_{\lambda=0} &= n!\,\frac{d}{d\lambda}V_n((1-\lambda)\,CB(a,b) + \lambda\,CB(c,d))\bigg|_{\lambda=0} \\
&\ge \frac{d}{d\lambda}\left((1-\lambda)H(a,b)^{1/n} + \lambda H(c,d)^{1/n}\right)^n\bigg|_{\lambda=0} \\
&= n\left(H(a,b)^{1/n}\right)^{n-1}\left(H(c,d)^{1/n} - H(a,b)^{1/n}\right) \\
&> 0,
\end{aligned}
\]
where the inequality between the derivatives holds because the two sides agree at λ = 0 and the Brunn-Minkowski Inequality orders them for all λ ∈ (0, 1).
But this derivative dG/dλ |_{λ=0} could also be computed by noting that for sufficiently small λ, we can use (3.14) to write G(λ) in terms of the partial derivatives of H at (a, b). Specifically, if the vector δ of small changes in the bounds is
\[
\delta = (c_1 - a_1, \ldots, c_n - a_n, d_0 - b_0, \ldots, d_{n-1} - b_{n-1}),
\]
then (3.14) tells us that if neither a bound a_1 = 0 nor a bound b_{n-1} = 1 is changed, then changing (a, b) by λδ changes the coverage probability by
\[
\lambda\big(-(c_1 - a_1)Cw_1 - \ldots - (c_n - a_n)Cw_n + (d_0 - b_0)Cw_0 + \ldots + (d_{n-1} - b_{n-1})Cw_{n-1}\big) + o(\lambda),
\]
which is just
\[
\lambda C\left(\sum_{i=0}^n w_i(d_i - c_i) - \sum_{i=0}^n w_i(b_i - a_i)\right) + o(\lambda) = o(\lambda).
\]
But then dG/dλ |_{λ=0} = 0, whereas the Brunn-Minkowski Inequality told us that dG/dλ |_{λ=0} must be positive. If a bound a_1 = 0 or b_{n-1} = 1 is changed, then the derivative dG/dλ |_{λ=0} might even be negative. In either case we have a contradiction, showing that (a, b) is an optimal band with coverage probability α with respect to the weights (w_0, ..., w_n).
Now that necessary and sufficient conditions for a band to be optimal with respect to a vector of weights (w_0, ..., w_n) have been determined, one may wonder whether optimal bands are unique. The Brunn-Minkowski Inequality is again the key to obtaining useful results.
Theorem 3.10. Suppose that (a, b) ∈ D and (c, d) ∈ D are confidence bands that are each optimal at level α with respect to the weights (w_0, ..., w_n). Then (a, b) and (c, d) are translated versions of each other.
Proof of Theorem 3.10. Note first that if (a, b) and (c, d) are different in any way, then taking any convex combination of them yields a new confidence band with the same value for the optimality criterion. By the Brunn-Minkowski Inequality, this new band must have a coverage probability strictly larger than α unless the equality condition (3.18) holds. Since both (a, b) and (c, d) are optimal, the new band cannot have a higher coverage probability. Thus, by the equality condition in the Brunn-Minkowski Inequality, the convex sets CB(a, b) and CB(c, d) are related as CB(c, d) = ρCB(a, b) + t for some ρ > 0 and t ∈ R^n. Since CB(c, d) and CB(a, b) have the same coverage probability, they have the same volume. Hence ρ = 1. Thus, the confidence bands CB(c, d) and CB(a, b) are translated versions of each other.
The next few results use Theorem 3.10 to show that uniqueness holds in many
cases.
Theorem 3.11. Suppose that (a, b) is a confidence band that is optimal at level α with respect to the weights (w_0, ..., w_n). If (a_i, b_{i-1}) overlaps with (a_{i+1}, b_i) for i = 1, ..., n − 1, or in other words if
\[
a_{i+1} < b_{i-1}, \qquad i = 1, \ldots, n-1, \tag{3.19}
\]
then (a, b) is the unique optimal band up to translation of every bound by the same real constant.
Proof of Theorem 3.11. Recall that CB(a, b) is the set of all values (x_1, ..., x_n) ∈ R^n satisfying a_1 ≤ x_1 ≤ b_0, ..., a_n ≤ x_n ≤ b_{n-1}, and x_1 ≤ x_2 ≤ ... ≤ x_n. Suppose that we translate the vector (x_1, ..., x_n) by t = (t_1, ..., t_n), obtaining a new vector of variables y_i = x_i + t_i, i = 1, ..., n. Then, if the vector (x_1, ..., x_n) satisfies a_i ≤ x_i ≤ b_{i-1}, i = 1, ..., n, the vector (y_1, ..., y_n) satisfies a_i + t_i ≤ y_i ≤ b_{i-1} + t_i, i = 1, ..., n. However, even if (x_1, ..., x_n) satisfies x_i ≤ x_{i+1}, i = 1, ..., n − 1, the vector (y_1, ..., y_n) only satisfies y_i − t_i ≤ y_{i+1} − t_{i+1}, i = 1, ..., n − 1, or y_i ≤ y_{i+1} + (t_i − t_{i+1}), i = 1, ..., n − 1. The constraints that (y_1, ..., y_n) satisfies do not define a confidence band in D unless for each i = 1, ..., n − 1, either t_i = t_{i+1} or the condition x_i ≤ x_{i+1} is redundant in the sense that it is implied by the rectangular constraints on x_i and x_{i+1}. If the interval (a_i, b_{i-1}) overlaps with the interval (a_{i+1}, b_i) for each i = 1, ..., n − 1, then none of the conditions x_i ≤ x_{i+1}, i = 1, ..., n − 1, is redundant, meaning that t_1 = ... = t_n must hold.
Corollary 3.2. Suppose that (a, b) is a confidence band that is optimal at level α with respect to the weights (w_0, ..., w_n). If w_0 and w_n are different and condition (3.19) is satisfied, then (a, b) is the unique minimizer of \sum_i w_i(b_i − a_i) among confidence bands in D with coverage probability α.

Proof of Corollary 3.2. Since condition (3.19) is satisfied, every other optimal band must arise from adding a constant value s to each of the bounds for (a, b). But if a constant s is added to each of the bounds, then the value of the optimality criterion becomes
\[
w_0(b_0 + s) + \sum_{i=1}^{n-1} w_i\big(b_i + s - (a_i + s)\big) + w_n\big(1 - (a_n + s)\big) = \sum_i w_i(b_i - a_i) + s(w_0 - w_n) \tag{3.20}
\]
since the values a_0 = 0 and b_n = 1 are fixed. This value (3.20) can only be the same as \sum_i w_i(b_i − a_i) if w_0 = w_n. The result thus follows.

Corollary 3.3. Suppose that (a, b) is a confidence band that is optimal at level α with respect to the weights (w_0, ..., w_n), and suppose that condition (3.19) is satisfied. If a_1 = 0 and b_{n-1} = 1, then (a, b) is the unique minimizer of \sum_i w_i(b_i − a_i) among confidence bands in D with coverage probability α.
Proof of Corollary 3.3. This result follows immediately by noticing that if any positive constant s is added to b_{n-1} = 1 or subtracted from a_1 = 0, the resulting set of bounds is not located in the interval [0, 1].
Corollary 3.4. Suppose that (a, b) is a confidence band that is optimal at level α with respect to the weights (w_0, ..., w_n), and suppose that condition (3.19) is satisfied. If (a, b) is a symmetric confidence band in the sense that a_i + b_{n-i} = 1, i = 1, ..., n, then there is no other optimal confidence band that is symmetric.

Proof of Corollary 3.4. To prove this result, it suffices to note that translating all bounds by the same nonzero constant s would destroy the symmetry of the band (a, b). Hence (a, b) is the unique symmetric optimal band.
The final theoretical result in this section, which is given next, tells us that if we use a vector of weights (w_0, ..., w_n) that is symmetric in the sense that w_i = w_{n-i} for i = 0, ..., n, then we are guaranteed that there exists a symmetric optimal confidence band at any desired level.
Theorem 3.12. Suppose that (w_0, ..., w_n) is a vector of positive weights satisfying the symmetry condition w_i = w_{n-i}, i = 0, ..., n, and suppose that α ∈ (0, 1) is the desired coverage probability. Then there must exist a confidence band (a, b) ∈ D that is both symmetric in the sense that a_i + b_{n-i} = 1, i = 1, ..., n, and optimal at coverage probability α with respect to the weights (w_0, ..., w_n).
Proof of Theorem 3.12. By Theorem 3.5, there must exist an optimal band (c, d) ∈ D. Consider the band (e, f) with lower bounds (1 − d_{n-1}, ..., 1 − d_0) and upper bounds (1 − c_n, ..., 1 − c_1). Since order statistics U_{(1)}, ..., U_{(n)} from the standard uniform distribution satisfy
\[
(U_{(1)}, \ldots, U_{(n)}) \stackrel{d}{=} (1 - U_{(n)}, \ldots, 1 - U_{(1)}),
\]
the coverage probability of the band (e, f) is exactly the same as the coverage probability of the band (c, d). Moreover, because of the symmetry in the weights (w_0, ..., w_n), the optimality criterion for the new band is
\[
\sum_i w_i\big(1 - c_{n-i} - (1 - d_{n-i})\big) = \sum_i w_{n-i}(d_{n-i} - c_{n-i}) = \sum_i w_i(d_i - c_i),
\]
the same as for the band (c, d). Thus, the band (e, f) is also optimal. Now construct a new band by taking the convex combination \tfrac{1}{2} CB(c, d) + \tfrac{1}{2} CB(e, f). This new band is symmetric by construction, and the Brunn-Minkowski Inequality shows that it is optimal.
Through the theoretical results obtained in this section, we know that optimal confidence bands exist, that they are unique under certain conditions, and that they can be recognized as optimal based on the values of the partial derivatives of the coverage probability function H with respect to the bounds A_1, ..., A_n and B_0, ..., B_{n-1}. What is still lacking, however, is a computational algorithm for finding optimal bands given α and the weights (w_0, ..., w_n). Given below is a computational procedure which, though crude, works in practice. The motivation for this algorithm is the observation that one can realize the partial derivative conditions required by Theorem 3.7 by relaxing bounds that lead to high values of the quotients (∂H/∂A_i)/w_i or (∂H/∂B_j)/w_j and tightening bounds that lead to low values of those quotients. Note that in this particular optimization problem, it is impossible to have a locally optimal band that is not also a globally optimal band, which removes one of the usual concerns in an optimization problem.
Computational Algorithm:
Step 1: Use bisection to find the Kolmogorov-Smirnov confidence band with the de-
sired coverage probability α. This Kolmogorov-Smirnov confidence band will serve
as a starting point from which we will work towards a better band.
Step 2: Specify a small starting step size, say h = 1/(4n), which gives the size of the moves that will be made initially.
Step 3: Compute the values at the current confidence band of the partial derivatives of H with respect to the lower and upper bounds. Update these partial derivatives
whenever the current confidence band is changed in any way. Computational time can
be saved by noticing that changing one bound will not noticeably change derivatives
other than those corresponding to bounds in the immediate vicinity of the bound
that was changed.
Step 4: When the current coverage probability is less than or equal to α, increase the coverage probability of the band by increasing (if it is an upper bound) or decreasing (if it is a lower bound) by h the moveable bound with the highest value of the quotient (∂H/∂A_i)/w_i or (∂H/∂B_j)/w_j. By a moveable bound we mean any upper or lower bound for which making a change of size h in the specified direction would not take the confidence band out of the space D.
Step 5: When the current coverage probability is greater than α, decrease the coverage probability by decreasing (if it is an upper bound) or increasing (if it is a lower bound) by h the moveable bound with the lowest value of the quotient (∂H/∂A_i)/w_i or (∂H/∂B_j)/w_j.
Step 6: If making the most recent move has returned the band to a band already visited at the current step size h, then decrease h by a factor of 2 to avoid entering an infinite loop.
Step 7: Repeat Steps 4 to 6, keeping Step 3 in mind, until a prespecified stopping criterion is met. The stopping criterion that we use is simply the size of h. However, another reasonable stopping criterion could be based on the difference between the maximum and minimum values of (∂H/∂A_i)/w_i or (∂H/∂B_j)/w_j over the set of all moveable bounds.
For the case of symmetric weights (w_0, ..., w_n), Theorem 3.12 guarantees that a symmetric optimal confidence band will exist. Thus, if one has symmetric weights and is seeking a symmetric band, it suffices to work with only the lower bounds in the algorithm, taking the upper bounds to be given by b_i = 1 − a_{n-i}, i = 0, ..., n.
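As one illustration of how Step 1 might be coded (a sketch only: the exact evaluation of H from the earlier sections is replaced by a Monte Carlo stand-in, and the function names are ours, not the dissertation's), the bisection for the Kolmogorov-Smirnov starting band can be written as:

```python
import numpy as np

def coverage_prob(a, b, u_sorted):
    """Monte Carlo stand-in for H(a, b) = P(a_i <= U_(i) <= b_{i-1} for all i);
    u_sorted holds rows of sorted Uniform(0, 1) samples (common random numbers)."""
    return np.all((u_sorted >= a) & (u_sorted <= b), axis=1).mean()

def ks_starting_band(n, alpha, u_sorted, iters=40):
    """Step 1: bisection on the Kolmogorov-Smirnov half-width d.  The KS band
    covers F exactly when i/n - d <= U_(i) <= (i-1)/n + d for all i, so its
    bounds are a_i = max(0, i/n - d) and b_{i-1} = min(1, (i-1)/n + d)."""
    i = np.arange(1, n + 1)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        d = (lo + hi) / 2
        a = np.maximum(0.0, i / n - d)
        b = np.minimum(1.0, (i - 1) / n + d)
        if coverage_prob(a, b, u_sorted) < alpha:
            lo = d    # band too narrow: widen
        else:
            hi = d    # band wide enough: try narrower
    return a, b

rng = np.random.default_rng(0)
u = np.sort(rng.random((200_000, 5)), axis=1)
a, b = ks_starting_band(5, 0.95, u)
```

The greedy refinement of Steps 2 through 7 would then repeatedly move individual bounds of the returned band by h according to the derivative quotients; it is omitted here.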
3.4 Shape and Power Comparisons for the Confidence Band Procedures
Though narrowness of the confidence band was the motivation for proposing the
optimality criteria discussed in Section 3.3, it is of interest to compare the performance
of the optimal bands to that of other confidence bands both in terms of narrowness and
in terms of the power of the associated hypothesis test. The confidence bands chosen
for comparison with the optimal bands were (a) the basic Kolmogorov-Smirnov band,
(b) the Anderson-Darling band based on the weighted Kolmogorov-Smirnov statistic
\[
D_{AD} = \sup_t \frac{|F_n(t) - F_0(t)|}{\sqrt{F_0(t)\,(1 - F_0(t))}},
\]
and (c) the nonparametric likelihood confidence bands proposed by Owen (1995).
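Assuming a continuous F_0 with 0 < F_0(x_i) < 1 at every observation, the supremum in D_AD is attained at a jump of the EDF, so the statistic can be computed by checking both one-sided EDF values at each data point; a sketch:

```python
import numpy as np

def weighted_ks_stat(x, F0):
    """Weighted Kolmogorov-Smirnov statistic D_AD for a sample x and a
    continuous null CDF F0 (assumed to satisfy 0 < F0(x_i) < 1).  Between
    jumps of the EDF, |F_n - F0| / sqrt(F0 (1 - F0)) as a function of
    s = F0(t) decreases and then increases, so the supremum occurs at a
    jump; it suffices to check both one-sided EDF values at each point."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    F0x = F0(x)
    w = np.sqrt(F0x * (1 - F0x))
    upper = np.abs(np.arange(1, n + 1) / n - F0x) / w   # EDF value at x_(i)
    lower = np.abs(np.arange(0, n) / n - F0x) / w       # EDF value just below x_(i)
    return float(max(upper.max(), lower.max()))
```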
Optimal bands based on three different choices of weights were used in the comparison.
The first choice of weights was uniform weights (1, . . . , 1). The second choice of weights, made to emphasize narrowness of the band in the extremes, set wi = 100 for the outermost n/5 intervals on either extreme and wi = 1 for the intervals in the middle. The final choice of weights, made to de-emphasize narrowness of the band in the extremes, set wi = 1 for the outermost n/5 intervals on either extreme and wi = 100 for the intervals in the middle.
Symmetric confidence bands with coverage probability 95% were produced for sample sizes n = 10, n = 20, and n = 40. These confidence bands are plotted for the case of equispaced data in Figures 3.1 through 3.3. A number of important features of both the optimal bands and their competitors are evident from these plots. In looking at the plots for the three competitors, one sees that the Kolmogorov-Smirnov band is centered at the EDF and has a uniform half-width, that the Anderson-Darling band is
so wide throughout its range that one almost suspects that the coverage probability has been computed incorrectly, and that the nonparametric likelihood band has a roughly elliptical shape. In looking at the three optimal bands in each figure, one sees that different choices of weights can lead to substantially different optimal bands.
To compare the power properties of the optimal bands to those of the three com-
petitor bands, a class of alternative distributions that includes both heavy-tailed and
light-tailed distributions was used. The null hypothesis distribution was taken to be
the standard uniform distribution, and the alternatives considered were the Beta(η, η)
distributions for η ∈ (0, 2). When η is small, the distribution Beta(η, η) has heavy tails, while when η is large, the tails are light. The case η = 1, of course, gives the
standard uniform distribution. Plots of power versus the parameter η for the 95%
confidence bands plotted in Figures 3.1 through 3.3 are given in Figures 3.4 through
3.6. One fact immediately evident from these plots is that all of the confidence bands
are associated with biased tests when the sample size is small. As the sample size
increases, the nonparametric likelihood band seems to have the best overall power per-
formance, with the performance of the optimal band with weights emphasizing the
extremes being comparable. The Anderson-Darling band has the best power against
heavy-tailed alternatives, but it accomplishes this at the cost of having almost no
power against light-tailed alternatives. Overall, the optimal confidence bands seem
to be comparable to the best of their competitors in terms of power, and the fact
that the weights can be changed allows flexibility to produce a band tailored to a
particular kind of alternative.
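The power of any fixed band against a Beta(η, η) alternative is straightforward to approximate by simulation. The sketch below (with illustrative bounds and settings, not those used for the figures) rejects H_0 whenever some order statistic of the sample falls outside its interval:

```python
import numpy as np

def band_power(a, b, eta, n_mc=100_000, seed=1):
    """Monte Carlo power of the test that rejects H0: F = F0 when the band
    fails to cover, i.e. when some order statistic of the F0-transformed
    sample falls outside [a_i, b_{i-1}].  Here F0 is the standard uniform
    CDF, so the Beta(eta, eta) sample is already on the F0 scale."""
    rng = np.random.default_rng(seed)
    n = len(a)
    v = np.sort(rng.beta(eta, eta, size=(n_mc, n)), axis=1)
    covered = np.all((v >= np.asarray(a)) & (v <= np.asarray(b)), axis=1)
    return 1.0 - covered.mean()
```

Sweeping eta over (0, 2) for a fixed band traces out a power curve of the kind shown in Figures 3.4 through 3.6.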
One strategy for choosing the weights (w0, . . . , wn) to give good power against a
particular alternative might proceed as follows. One could identify the region of the
null hypothesis distribution F_0 where the absolute difference |F_0(t) − F_1(t)| between F_0 and the alternative distribution F_1 is greatest. One could then try to make the
band narrow in that region by assigning high weights to intervals involving order
statistics likely to fall in that region. There are limits, however, to how successful
such a strategy can be. These limits arise from the fact that no confidence band can
be optimal for any choice of the weights (w0, . . . , wn) unless it satisfies the condition
∂H/∂A_i = −∂H/∂B_i for all i with 0 < a_i < b_i < 1. Thus, most powerful bands for testing H_0: F = F_0 versus H_1: F = F_1 will typically not fall in the class of optimal bands.
3.5 Optimal Confidence Bands in the Ranked-set Sampling Case
In this section, we first present an example that highlights the need to use con-
ditional confidence bands when producing a confidence band based on a ranked-set
sample. We then discuss how to generalize the optimal bands of Section 3.3 to ranked-
set samples and other samples consisting of independent order statistics.
Example 3.2: Suppose that we draw a ranked-set sample of size 2 consisting of a
value X1:2 distributed as the minimum of a simple random sample of size 2 and an
independent value X2:2 distributed as the maximum of a simple random sample of size
2. The unconditional approach to producing a confidence band, which was proposed
by Stokes and Sager (1988), consists of computing the RSS empirical distribution
function (EDF) F* and using a uniform-width confidence band centered at F*.
If the half-width for this confidence band is 1/2, then the lower and upper bounds
of the band are (a0, a1, a2) = (0, 0, 1/2) and (b0, b1, b2) = (1/2, 1, 1). The coverage
Figure 3.1: Six different 95% confidence bands when n = 10. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonpara- metric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de- emphasizing the extremes.
Figure 3.2: Six different 95% confidence bands when n = 20. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonpara- metric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de- emphasizing the extremes.
Figure 3.3: Six different 95% confidence bands when n = 40. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonpara- metric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de- emphasizing the extremes.
Figure 3.4: Power curves for the 95% confidence bands when n = 10. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
Figure 3.5: Power curves for the 95% confidence bands when n = 20. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
Figure 3.6: Power curves for the 95% confidence bands when n = 40. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen’s nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
probability is
\[
P(0 \le \min\{X_{1:2}, X_{2:2}\} \le 1/2,\; 1/2 \le \max\{X_{1:2}, X_{2:2}\} \le 1) = 0.625.
\]
Suppose, however, that we condition on the ordering of the independent order statistics X_{1:2} and X_{2:2}. If X_{1:2} < X_{2:2}, which occurs 5/6 of the time, then the conditional coverage probability is
\[
\frac{P(0 \le X_{1:2} \le 1/2,\; 1/2 \le X_{2:2} \le 1)}{P(X_{1:2} \le X_{2:2})} = \frac{(3/4)^2}{5/6} = 0.675.
\]
If the opposite ordering X_{2:2} < X_{1:2} occurs, however, then the conditional coverage probability is
\[
\frac{P(0 \le X_{2:2} \le 1/2,\; 1/2 \le X_{1:2} \le 1)}{P(X_{2:2} \le X_{1:2})} = \frac{(1/4)^2}{1/6} = 0.375.
\]
There is thus a difference of 0.3 between the conditional coverage probability given one observed ordering of the order statistics and the conditional coverage probability given the other ordering. This suggests that it would be inappropriate not to condition on the information available in the observed ordering of the order statistics when presenting a confidence band.
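The three coverage probabilities in this example are easy to confirm by simulation; a minimal check (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 400_000
x12 = np.minimum(rng.random(m), rng.random(m))  # X_{1:2}, minimum of two uniforms
x22 = np.maximum(rng.random(m), rng.random(m))  # X_{2:2}, maximum of two uniforms

# Unconditional coverage of the band with half-width 1/2
covered = (np.minimum(x12, x22) <= 0.5) & (np.maximum(x12, x22) >= 0.5)
uncond = covered.mean()                              # about 0.625

# Conditional coverage given each observed ordering
order = x12 < x22
cond1 = (covered & order).sum() / order.sum()        # about 0.675
cond2 = (covered & ~order).sum() / (~order).sum()    # about 0.375
```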
In order to compute coverage probabilities for confidence bands based on ranked- set samples or other collections of independent order statistics, we need a more general
computational algorithm than the one that we applied in the case of a simple ran-
dom sample. The needed algorithm is provided in Section 2.3, and we describe its
relationship to coverage probabilities here. Suppose that independent order statis-
tics Xk1:m1 , . . . , Xkn:mn are observed from a parent distribution with continuous CDF
F . Here the notation Xki:mi indicates an observation distributed like the kith order
79 statistic from a sample of size mi from F . Without loss of generality, we may assume that these independent observations have been labeled so that their ordering is given
by X_{k_1:m_1} ≤ ⋯ ≤ X_{k_n:m_n}. Then, setting X_{k_0:m_0} = −∞ and X_{k_{n+1}:m_{n+1}} = ∞ as in Section 3.2, we may define for each (a, b) ∈ D a confidence band CB(a, b). Specifically, we say that a continuous CDF G is contained in the confidence band CB(a, b) if, for each i = 0, ..., n, a_i ≤ G(t) ≤ b_i for all t in the interval (X_{k_i:m_i}, X_{k_{i+1}:m_{i+1}}). The conditional coverage probability for the band CB(a, b), given the observed sequence of order statistics, is given by
\[
\begin{aligned}
P_F&(F \in CB(a, b) \mid X_{k_1:m_1} \le \cdots \le X_{k_n:m_n}) \\
&= P_F(a_1 \le F(X_{k_1:m_1}) \le b_0, \ldots, a_n \le F(X_{k_n:m_n}) \le b_{n-1} \mid X_{k_1:m_1} \le \cdots \le X_{k_n:m_n}) \\
&= \frac{P_F(a_1 \le F(X_{k_1:m_1}) \le b_0, \ldots, a_n \le F(X_{k_n:m_n}) \le b_{n-1},\ X_{k_1:m_1} \le \cdots \le X_{k_n:m_n})}{P_F(X_{k_1:m_1} \le \cdots \le X_{k_n:m_n})} \\
&= \frac{P_F(a_1 \le F(X_{k_1:m_1}) \le b_0, \ldots, a_n \le F(X_{k_n:m_n}) \le b_{n-1},\ X_{k_1:m_1} \le \cdots \le X_{k_n:m_n})}{P_F(0 \le F(X_{k_1:m_1}) \le 1, \ldots, 0 \le F(X_{k_n:m_n}) \le 1,\ X_{k_1:m_1} \le \cdots \le X_{k_n:m_n})}.
\end{aligned}
\tag{3.21}
\]
Using the fact that F(X_{k_i:m_i}) =_d U_{k_i:m_i}, where U_{k_i:m_i} is an order statistic from the standard uniform parent distribution, we can rewrite (3.21) as
\[
\frac{P(a_1 \le U_{k_1:m_1} \le b_0, \ldots, a_n \le U_{k_n:m_n} \le b_{n-1},\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n})}{P(0 \le U_{k_1:m_1} \le 1, \ldots, 0 \le U_{k_n:m_n} \le 1,\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n})}. \tag{3.22}
\]
This last expression shows that the conditional coverage probability of the band
CB(a, b) is distribution-free, and it suggests that all that is needed for computing
coverage probabilities is an algorithm for computing probabilities of the form
\[
P(a_1 \le U_{k_1:m_1} \le b_0, \ldots, a_n \le U_{k_n:m_n} \le b_{n-1},\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n}) \tag{3.23}
\]
given the bounds (a, b) ∈ D. Exactly such an algorithm was developed in Section 2.3. Given that we can compute the probabilities (3.23) necessary for computing conditional coverage probabilities, we can again define optimality criteria Σ_i w_i(b_i − a_i) and seek to find the band that minimizes Σ_i w_i(b_i − a_i) among all confidence bands with coverage probability α conditional on the observed sequence of order statistics. As in Section 3.3, the partial derivatives of the conditional coverage probability H with respect to the various bounds are what allow us to recognize and compute optimal bands. Setting
\[
P_d = P(0 \le U_{k_1:m_1} \le 1, \ldots, 0 \le U_{k_n:m_n} \le 1,\ U_{k_1:m_1} \le \cdots \le U_{k_n:m_n}),
\]
we obtain the following result.
Theorem 3.13. Whenever a small change in the appropriate bound does not take (a, b) out of the space D, the one-sided partial derivatives of the conditional coverage probability H with respect to each of the bounds {a_i, b_j} are given by
\[
\frac{\partial H}{\partial A_i^{+}}\bigg|_{(A,B)=(a,b)} = -\frac{f_{k_i:m_i}(a_i)}{P_d}\int_{a_1}^{b_0\wedge a_i}\cdots\int_{a_{i-1}}^{b_{i-2}\wedge a_i}\int_{a_{i+1}}^{b_i}\cdots\int_{a_n}^{b_{n-1}}\Bigl(\prod_{j\ne i} f_{k_j:m_j}(u_j)\Bigr) I(u_1\le\cdots\le u_{i-1}\le u_{i+1}\le\cdots\le u_n)\,du_n\cdots du_{i+1}\,du_{i-1}\cdots du_1
\]
and
\[
\frac{\partial H}{\partial A_i^{-}}\bigg|_{(A,B)=(a,b)} = -\frac{f_{k_i:m_i}(a_i)}{P_d}\int_{a_1}^{b_0\wedge a_i}\cdots\int_{a_{i-1}}^{b_{i-2}\wedge a_i}\int_{a_{i+1}}^{b_i}\cdots\int_{a_n}^{b_{n-1}}\Bigl(\prod_{j\ne i} f_{k_j:m_j}(u_j)\Bigr) I(u_1\le\cdots\le u_{i-1}\le u_{i+1}\le\cdots\le u_n)\,du_n\cdots du_{i+1}\,du_{i-1}\cdots du_1,
\]
\[
\frac{\partial H}{\partial B_j^{+}}\bigg|_{(A,B)=(a,b)} = \frac{f_{k_{j+1}:m_{j+1}}(b_j)}{P_d}\int_{a_1}^{b_0}\cdots\int_{a_j}^{b_{j-1}}\int_{a_{j+2}\vee b_j}^{b_{j+1}}\cdots\int_{a_n\vee b_j}^{b_{n-1}}\Bigl(\prod_{i\ne j+1} f_{k_i:m_i}(u_i)\Bigr) I(u_1\le\cdots\le u_j\le u_{j+2}\le\cdots\le u_n)\,du_n\cdots du_{j+2}\,du_j\cdots du_1,
\]
and
\[
\frac{\partial H}{\partial B_j^{-}}\bigg|_{(A,B)=(a,b)} = \frac{f_{k_{j+1}:m_{j+1}}(b_j)}{P_d}\int_{a_1}^{b_0}\cdots\int_{a_j}^{b_{j-1}}\int_{a_{j+2}\vee b_j}^{b_{j+1}}\cdots\int_{a_n\vee b_j}^{b_{n-1}}\Bigl(\prod_{i\ne j+1} f_{k_i:m_i}(u_i)\Bigr) I(u_1\le\cdots\le u_j\le u_{j+2}\le\cdots\le u_n)\,du_n\cdots du_{j+2}\,du_j\cdots du_1,
\]
where f_{k:m}(x) is the probability density function for a Beta(k, m + 1 − k) distribution.
Thus, whenever a bound a_i or b_j can be moved at least a small amount in either direction without taking (a, b) out of D, a regular partial derivative exists.
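The probabilities of the form (3.23) that appear in H and P_d are computed in the dissertation by the Section 2.3 algorithm, which is not reproduced here. As a rough stand-in for experimentation, ordered-interval probabilities of this type can also be approximated by iterated one-dimensional quadrature; the sketch below (function names are ours) recovers the two-observation numbers from the X_{1:2}, X_{2:2} example that opened this discussion:

```python
import math

def beta_pdf(k, m):
    # density of the kth order statistic of m standard uniforms: Beta(k, m + 1 - k)
    c = math.factorial(m) / (math.factorial(k - 1) * math.factorial(m - k))
    return lambda u: c * u ** (k - 1) * (1.0 - u) ** (m - k)

def ordered_interval_prob(ks, ms, a, b, grid=2000):
    """P(a_i <= U_{k_i:m_i} <= b_i for all i, and U_{k_1:m_1} <= ... <= U_{k_n:m_n}).

    Iterated trapezoidal quadrature: carry G_i(u), the integral of the constrained
    joint density over u_1 <= ... <= u_i <= u, forward one variable at a time.
    """
    h = 1.0 / grid
    us = [j * h for j in range(grid + 1)]
    G = [1.0] * (grid + 1)  # G_0 = 1
    for k, m, lo, hi in zip(ks, ms, a, b):
        f = beta_pdf(k, m)
        g = [f(u) * G[j] if lo <= u <= hi else 0.0 for j, u in enumerate(us)]
        newG = [0.0] * (grid + 1)
        for j in range(1, grid + 1):
            newG[j] = newG[j - 1] + 0.5 * h * (g[j - 1] + g[j])
        G = newG
    return G[-1]

# The X_{1:2}, X_{2:2} example: P(ordering) = 5/6 and the band probability 9/16
p_order = ordered_interval_prob([1, 2], [2, 2], [0.0, 0.0], [1.0, 1.0])
p_band = ordered_interval_prob([1, 2], [2, 2], [0.0, 0.5], [0.5, 1.0])
print(p_order, p_band, p_band / p_order)  # about 5/6, 9/16, and 0.675
```

This is only a numerical sketch; its cost grows with the grid size rather than exploiting the structure that the Section 2.3 algorithm uses.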
With the help of this result, which generalizes our calculations in Section 3.3, most of the theoretical results we obtained in Section 3.3, including Theorems 3.3, 3.7, and
3.9, may be found to hold in this more general case in which the sample consists of independent order statistics. However, because the probability (3.23) cannot be
interpreted in terms of volume in R^n, the Brunn-Minkowski Inequality is no longer
sufficient to give the results we want. A result that is sufficient for our purposes is the
Prékopa-Leindler Inequality (see, for example, Gardner (2002)), which generalizes the
Brunn-Minkowski Inequality. A similar result was discussed and proved by Rinott
(1976).
Theorem 3.14 (Prékopa-Leindler). Let λ ∈ (0, 1), and let f, g, and h be nonnegative integrable functions on R^n satisfying
\[
h((1 - \lambda)x + \lambda y) \ge f(x)^{1-\lambda} g(y)^{\lambda} \tag{3.24}
\]
for all x, y ∈ R^n. Then
\[
\int_{\mathbb{R}^n} h(x)\,dx \ge \left(\int_{\mathbb{R}^n} f(x)\,dx\right)^{1-\lambda} \left(\int_{\mathbb{R}^n} g(x)\,dx\right)^{\lambda}.
\]
The following corollary then follows. Here Ω is the support set {(x_1, ..., x_n) : 0 ≤ x_1 ≤ ⋯ ≤ x_n ≤ 1} for the standard uniform order statistics.
Corollary 3.5. Let λ ∈ (0, 1), and let μ be a probability measure on Ω with a log concave density d. Let K_1 and K_2 be convex subsets of Ω. Then
\[
\mu((1 - \lambda)K_1 + \lambda K_2) \ge \mu(K_1)^{1-\lambda} \mu(K_2)^{\lambda}.
\]
Proof of Corollary 3.5. Set h(x) = d(x)I(x ∈ (1 − λ)K_1 + λK_2), f(x) = d(x)I(x ∈ K_1), and g(x) = d(x)I(x ∈ K_2). Consider the condition (3.24). If either x ∉ K_1 or y ∉ K_2, then the condition holds immediately since the right-hand side is 0. If x ∈ K_1 and y ∈ K_2, then (1 − λ)x + λy ∈ (1 − λ)K_1 + λK_2 by the definition of the set (1 − λ)K_1 + λK_2. Thus, we have that
\[
h((1 - \lambda)x + \lambda y) = d((1 - \lambda)x + \lambda y) \ge d(x)^{1-\lambda} d(y)^{\lambda} = f(x)^{1-\lambda} g(y)^{\lambda},
\]
where the inequality follows from the fact that d is log concave. Noting that
\[
\int_{\mathbb{R}^n} h(x)\,dx = \mu((1 - \lambda)K_1 + \lambda K_2), \quad \int_{\mathbb{R}^n} f(x)\,dx = \mu(K_1), \quad \text{and} \quad \int_{\mathbb{R}^n} g(x)\,dx = \mu(K_2)
\]
and applying the theorem then proves the corollary.
To see the relevance of this corollary, note that the joint conditional density of U_{k_1:m_1}, ..., U_{k_n:m_n}, given the ordering U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n}, is given by
\[
f(u_{k_1:m_1}, \ldots, u_{k_n:m_n}) \propto \Bigl(\prod_{i=1}^{n} u_{k_i:m_i}^{k_i - 1} (1 - u_{k_i:m_i})^{m_i - k_i}\Bigr) I(0 \le u_{k_1:m_1} \le \cdots \le u_{k_n:m_n} \le 1).
\]
Taking logarithms, we have that
\[
\log f(u_{k_1:m_1}, \ldots, u_{k_n:m_n}) = C + \sum_{i=1}^{n} \bigl((k_i - 1)\log(u_{k_i:m_i}) + (m_i - k_i)\log(1 - u_{k_i:m_i})\bigr)
\]
in the support set Ω. Since each of the functions log(u_{k_i:m_i}), i = 1, ..., n, and log(1 − u_{k_i:m_i}), i = 1, ..., n, is concave, it follows that log f is concave.
To show how the corollary may be applied, we prove a generalization of Theorem
3.9. The proof of the analog to Theorem 3.7 holds exactly as before.
Theorem 3.15. Suppose that (a, b) ∈ D is a conditional confidence band whose derivatives satisfy the four conditions given in Theorem 3.7. Then (a, b) minimizes Σ_i w_i(b_i − a_i) within the set of all conditional confidence bands with coverage probability α = H(a, b). The event that we condition on, of course, is the observed sequence of order statistics.
Proof of Theorem 3.15. We proceed by contradiction as in the proof of Theorem 3.9.
Suppose that there exists a conditional confidence band that has the same coverage probability as (a, b), but a strictly smaller value for the optimality criterion. Then there must exist a conditional confidence band (c, d) such that Σ_i w_i(b_i − a_i) = Σ_i w_i(d_i − c_i), but H(c, d) > H(a, b). Let λ ∈ (0, 1). Then, since D is a convex set, the convex combination (1 − λ)CB(a, b) + λCB(c, d) is another conditional confidence band in D. Specifically, it is the band CB((1 − λ)a + λc, (1 − λ)b + λd). For this band, the value of the optimality criterion is exactly the same as for the bands CB(a, b) and CB(c, d).
Consider the function
\[
G(\lambda) \equiv H((1 - \lambda)a + \lambda c, (1 - \lambda)b + \lambda d) = \frac{1}{P_d}\,\mu((1 - \lambda)CB(a, b) + \lambda CB(c, d)),
\]
where μ is the measure determined by the joint distribution of the observed order statistics, conditional on their sequence. As in the proof of Theorem 3.9, writing dG/dλ|_{λ=0} in terms of the partial derivatives of H allows us to show that dG/dλ|_{λ=0} ≤ 0. If we instead compute dG/dλ|_{λ=0} using the corollary, however, we get a different result.
Specifically, we have by the corollary that
\[
\begin{aligned}
\frac{dG}{d\lambda}\bigg|_{\lambda=0} &\propto \frac{d}{d\lambda}\,\mu((1 - \lambda)CB(a, b) + \lambda CB(c, d))\bigg|_{\lambda=0} \\
&\ge \frac{d}{d\lambda}\Bigl\{(\mu(CB(a, b)))^{1-\lambda}\,(\mu(CB(c, d)))^{\lambda}\Bigr\}\bigg|_{\lambda=0} \\
&= \bigl(\log \mu(CB(c, d)) - \log \mu(CB(a, b))\bigr)\cdot\Bigl\{(\mu(CB(a, b)))^{1-\lambda}\,(\mu(CB(c, d)))^{\lambda}\Bigr\}\Big|_{\lambda=0} \\
&> 0.
\end{aligned}
\]
This gives a contradiction, proving the result.
Since the conditions under which equality holds in Theorem 3.14 are not yet
known, we are not in as strong a position to derive uniqueness results as we were in Section 3.3. It seems clear, however, that using a measure μ that does not correspond to volume in R^n (i.e., a nonuniform measure) can only make it more difficult for equality to hold in the Prékopa-Leindler Inequality than in the Brunn-Minkowski
Inequality. Thus, we suspect that the analogues of Theorem 3.11, Corollary 3.2, and
Corollary 3.3 hold for the conditional confidence bands developed in this section. We conclude this section with a simple example illustrating the point that uniqueness may be more easily obtained with a nonuniform measure.
Example 3.3: Suppose that we observe a sample consisting of a single observation that is the maximum from a sample of size two. Identifying a conditional confidence band then reduces to choosing the values b_0, a_1 ∈ [0, 1], and the coverage probability can be written as P(a_1 ≤ U_{2:2} ≤ b_0), where U_{2:2} has a Beta(2, 1) distribution. Simple integration shows that the coverage probability is given by b_0² − a_1², and the value for the optimality criterion is w_0 b_0 + w_1(1 − a_1).
Suppose that we seek a band with coverage probability α ∈ (0, 1) and that we choose weights w_0, w_1 such that w_1/w_0 = r. The fact that α = b_0² − a_1² tells us that a_1 = √(b_0² − α), and minimizing the optimality criterion becomes equivalent to minimizing the function G(b_0) = b_0 − r√(b_0² − α) over the range [√α, 1] of possible values for b_0. Differentiation gives us that
\[
G'(b_0) = \frac{dG}{db_0} = 1 - r b_0 (b_0^2 - \alpha)^{-1/2}
\]
on the interval (√α, 1]. Suppose that r ≥ √(1 − α). Then, since b_0/√(b_0² − α) achieves its minimum value of 1/√(1 − α) when b_0 = 1, we have that
\[
G'(b_0) = 1 - \frac{r b_0}{\sqrt{b_0^2 - \alpha}} \le 1 - \sqrt{1 - \alpha}\,\frac{b_0}{\sqrt{b_0^2 - \alpha}} \le 1 - \frac{\sqrt{1 - \alpha}}{\sqrt{1 - \alpha}} = 0,
\]
which implies that G is minimized when b_0 = 1. Suppose on the other hand that
r < √(1 − α). The equation G'(b_0) = 0 has a single positive solution b_0 = √(α/(1 − r²)), with G' being negative for smaller values and positive for larger values. Thus, the minimum value of G occurs when b_0 = √(α/(1 − r²)).
We thus see that in this example, an optimal band is always unique. This may be
contrasted with the simple random sampling situation, where when n = 1 and r = 1,
every single band with level α is optimal.
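The case analysis in Example 3.3 can be verified numerically. The sketch below (our code; the grid search is only for checking) minimizes G(b_0) = b_0 − r√(b_0² − α) over [√α, 1] and compares the minimizer with the closed forms derived above:

```python
import math

def G(b0, r, alpha):
    # objective from Example 3.3, up to the additive constant w_1 and factor w_0
    return b0 - r * math.sqrt(max(b0 * b0 - alpha, 0.0))

def grid_argmin(r, alpha, steps=100_000):
    lo = math.sqrt(alpha)
    vals = [lo + (1.0 - lo) * j / steps for j in range(steps + 1)]
    return min(vals, key=lambda b0: G(b0, r, alpha))

alpha = 0.95
r_crit = math.sqrt(1.0 - alpha)  # boundary between the two cases
for r in (0.10, 0.20):           # r < sqrt(1 - alpha): interior minimum
    print(r, grid_argmin(r, alpha), math.sqrt(alpha / (1.0 - r * r)))
for r in (r_crit, 0.50):         # r >= sqrt(1 - alpha): minimum at b_0 = 1
    print(r, grid_argmin(r, alpha))
```

In every case the grid minimizer agrees with the closed-form solution to within the grid resolution.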
3.6 A Comparison of RSS-based Confidence Bands
To compare the performance of the optimal conditional confidence bands based
on RSS to that of the confidence bands described by Stokes and Sager (1988), we
considered two situations in which the number of possible observed sequences of
order statistics is manageably small. These situations are (i) when a single cycle of
size three is drawn and (ii) when two cycles of size two are drawn. In each of these
two situations, the total number of possible observed sequences of order statistics
is 6. Figure 3.7 shows optimal conditional 95% confidence bands based on uniform
weights for each of the 6 possible orderings of the order statistics U1:3, U2:3, and U3:3.
Figure 3.8, meanwhile, shows the analogous optimal bands for the second situation.
Since the entire justification for using conditional confidence bands is that, given the
observed sequence of order statistics, the data will depart from evenly-spaced data
in predictable ways, the data values used on the horizontal axis of each plot are the expected values of the ordered values U_{k_1:m_1} ≤ ⋯ ≤ U_{k_n:m_n} given the observed sequence of order statistics and assuming that the parent distribution is the standard
uniform distribution.
Figures 3.7 and 3.8 show that when the observed order statistics are in their natural sequence, the optimal uniform-weight band tends to be narrow on the extremes.
When the observed order statistics do not appear in the natural ordering, in contrast,
the optimal uniform-weight band tends to be relatively narrow in the middle.
             Optimality Criterion for     Conditional Coverage    Optimality Criterion for
Sequence     Unconditional K-S band       Probability             Conditional Optimal Band
(1,2,3)      3.03                         0.966                   2.94
(1,3,2)      3.03                         0.944                   2.96
(2,1,3)      3.03                         0.944                   2.96
(2,3,1)      3.03                         0.867                   2.99
(3,1,2)      3.03                         0.867                   2.99
(3,2,1)      3.03                         0.819                   2.95

Table 3.3: Values of the optimality criterion Σ_i(b_i − a_i) for 95% confidence bands when the set size is 3 and 1 cycle of observations is measured.
[Figure 3.7 appears here: six panels labeled (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), and (3,2,1), each showing a confidence band plotted against x on [0, 1].]
Figure 3.7: Optimal conditional 95% confidence bands based on RSS when the set size is 3 and 1 cycle of observations is measured. Each band is plotted against data taking on the expected values of the respective order statistics given the observed sequence of order statistics. Individual plots are labeled by the observed sequence of order statistics.
[Figure 3.8 appears here: six panels labeled (1,1,2,2), (1,2,1,2), (1,2,2,1), (2,1,1,2), (2,1,2,1), and (2,2,1,1), each showing a confidence band plotted against x on [0, 1].]
Figure 3.8: Optimal conditional 95% confidence bands based on RSS when the set size is 2 and 2 cycles of observations are measured. Each band is plotted against data taking on the expected values of the respective order statistics given the observed sequence of order statistics. Individual plots are labeled by the observed sequence of order statistics.
             Optimality Criterion for     Conditional Coverage    Optimality Criterion for
Sequence     Unconditional K-S band       Probability             Conditional Optimal Band
(1,1,2,2)    3.74                         0.964                   3.64
(1,2,1,2)    3.74                         0.943                   3.68
(1,2,2,1)    3.74                         0.925                   3.69
(2,1,1,2)    3.74                         0.925                   3.69
(2,1,2,1)    3.74                         0.903                   3.72
(2,2,1,1)    3.74                         0.876                   3.69

Table 3.4: Values of the optimality criterion Σ_i(b_i − a_i) for 95% confidence bands when the set size is 2 and 2 cycles of observations are measured.
In Tables 3.3 and 3.4, the conditional optimal 95% confidence bands of Figures 3.7 and 3.8 are compared to the unconditional Stokes and Sager (1988) bands in terms of the optimality criterion Σ_i(b_i − a_i). For both situations and for every possible observed sequence of order statistics, the optimal conditional band has a lower value for the optimality criterion. This finding holds despite the fact that the conditional coverage probabilities for the unconditional bands, which are also given in Tables 3.3 and 3.4, exceed 95% in some cases. Thus, there seems little doubt that the optimal conditional confidence bands are to be preferred to the unconditional bands.
CHAPTER 4
INTENTIONALLY REPRESENTATIVE SAMPLING
4.1 Introduction
Suppose that there is some population whose mean we wish to estimate. If this population consists of only a few units, each of which can be accessed and measured at negligible cost, then it may be feasible to obtain a measurement for every unit in the population and determine the population mean through complete enumeration. This is typically not feasible, however, either because the population of interest is large or
infinite or because the cost of accessing and measuring a unit is not negligible. As
a result, measurements can be obtained for only a limited number of units, and the
quality of the resulting estimate of the population mean depends on the particular
sampling scheme chosen.
The basic sampling scheme that serves as the foundation for more complex sam-
pling schemes is simple random sampling (SRS), in which every possible sample is
equally likely to be chosen. If auxiliary information is available in the form of relevant
covariates, then stratified random sampling, in which one draws separate simple ran-
dom samples from each of several disjoint subsets of the population, tends to produce
better estimates of the population mean than does SRS. In the event that it is inexpen-
sive to rank or approximately rank units without actually measuring them, ranked-set
91 sampling (RSS) is another approach that can produce better estimates than does SRS.
However, none of these three sampling schemes offers protection against obtaining a
sample that is obviously unrepresentative of the population. We introduce a novel
sampling scheme, intentionally representative sampling (IRS), that yields an unbiased
estimator of the population mean while allowing the researcher to ensure that each
potential sample is, as far as possible, representative of the population. It thus allows
a researcher to take full advantage of whatever auxiliary information is available.
One reason for preferring IRS to other sampling schemes is that it allows inter-
ested parties to take a role in selecting the sample. Consider, for example, the scenario
discussed by Willemain (1980) in which nursing home operators are to be reimbursed
by the Federal government according to the average level of debility of patients un-
der their care. Since determining each patient’s level of debility requires expensive
individual assessments, a complete enumeration is not feasible. On the other hand,
nursing home operators would be unhappy with a plan based on SRS because of the
possibility that a simple random sample might fail to include any of the most expen-
sive patients, thus seriously underestimating the true cost per patient. Using IRS, the potential samples could be hand-selected by the nursing home operators, thus
allaying their fears, without introducing any bias into the estimation of the mean cost per patient.
In this chapter, we describe IRS, prove that it always yields an unbiased estimator
of the population mean, and suggest a simple rank-based method for implementing
IRS when a single covariate is available. We then show via a simulation study that when the covariate and the variable of interest have a bivariate normal distribution,
the IRS estimator of the population mean has a much smaller variance than either the
SRS or RSS estimators. We also consider a real data set consisting of measurements of diameters and heights for longleaf pines in a Georgia forest. We show that if diameter is taken as a covariate, and the goal is estimating the less accessible variable height, then IRS again allows much more efficient estimation than either SRS or RSS.
In Section 4.2, we motivate IRS by thinking of RSS as a two-phase sampling scheme and imagining how we might improve it. In Section 4.3, we provide a full description of IRS, prove that it leads to an unbiased estimate of a population mean, and suggest a method for computing standard errors. In Section 4.4, we propose a
simple rank-based method for implementing IRS. In Section 4.5, we report on two
simulation studies that show the improved efficiency of IRS relative to SRS and RSS
when ranking is done according to a covariate. In Section 4.6, we summarize our
findings and discuss the advantages of using IRS.
4.2 Motivation
Ranked-set sampling (RSS), proposed by McIntyre (1952), is a sampling scheme
appropriate for use when it is inexpensive to rank or approximately rank units without
actually measuring them. To draw a ranked-set sample, one first identifies a set size
r (typically between two and five) small enough that sets of r units can easily be
ranked accurately. One then draws r independent simple random samples of size r.
The units within each of these r samples are ranked from smallest to largest using
some method that does not require any actual measurement, and the ith smallest
unit from the ith sample is selected for measurement. The procedure just described
constitutes a single cycle of RSS, and it gives a sample of r measured values. Larger
samples are obtained by doing multiple independent cycles.
It is well-known that the RSS sample mean is an unbiased estimator of the population mean, and one may appeal to the theory of order statistics to show that the
RSS sample mean is more efficient than the corresponding sample mean based on
SRS. However, an alternate explanation for the improved efficiency of the RSS mean estimator can be obtained by thinking of both SRS and RSS as two-phase sampling schemes and looking at the sorts of samples they produce. What we will argue, essentially, is that the samples that arise from RSS are more representative of the population in the sense that samples consisting either entirely of large values or entirely of small values are excluded or given lower probabilities of occurring. Suppose, for example, that we wish to estimate a population mean, but that we can afford to make only two measurements. A ranked-set sampling approach to the problem might begin by setting r = 2 and selecting a random sample of size four from the population.
This sample would then be randomly split into two smaller groups of size two. The smaller of the two units from the first group would be selected for measurement, and the larger of the two units from the second group would be selected for measurement.
A simple random sample of size two, meanwhile, could be drawn simply by randomly selecting two units from the sample of size four. If we write y1 < y2 < y3 < y4 for the ordered values in the sample of size four, then the distribution of the sample to be measured would be as given in Table 4.1.
One can see from Table 4.1 that the difference between the distribution of the sample obtained using RSS and the sample obtained using SRS is that the two samples {y_1, y_2} and {y_3, y_4}, which are unrepresentative in the sense that they each consist entirely of values from one extreme, are impossible under RSS. The probability assigned by SRS to those unrepresentative samples is redistributed by RSS to the
Sample        Probability (SRS)   Probability (RSS)
{y_1, y_2}    1/6                 0
{y_1, y_3}    1/6                 1/6
{y_1, y_4}    1/6                 1/3
{y_2, y_3}    1/6                 1/3
{y_2, y_4}    1/6                 1/6
{y_3, y_4}    1/6                 0
Table 4.1: Distribution of the sample to be measured.
more representative samples {y_1, y_4} and {y_2, y_3}. Even so, under RSS the samples {y_2, y_4} and {y_1, y_3}, which are arguably less representative than the samples {y_1, y_4} or {y_2, y_3}, can still occur. One might suppose that a still better scheme for drawing a sample of size two from the set {y_1, y_2, y_3, y_4} could consist of taking the sample {y_1, y_4} with probability 1/2 and the sample {y_2, y_3} with probability 1/2. Under such a scheme, the mean of the two measurements would continue to be an unbiased estimator of the population mean, and the variance of the resulting estimator might be expected to be smaller than that of either the RSS or the SRS estimator. This proposed sampling scheme represents a particular case of IRS.
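Conditional on the four drawn values, all three schemes in Table 4.1 give an unbiased two-observation sample mean, and the proposed scheme has the smallest conditional variance. A quick check (the y values below are hypothetical, chosen only for illustration):

```python
from itertools import combinations

y = [1.0, 2.0, 6.0, 7.0]  # hypothetical ordered values y1 < y2 < y3 < y4

# sample distributions over index pairs: SRS and RSS from Table 4.1, plus IRS
srs = {pair: 1 / 6 for pair in combinations(range(4), 2)}
rss = {(0, 2): 1 / 6, (0, 3): 1 / 3, (1, 2): 1 / 3, (1, 3): 1 / 6}
irs = {(0, 3): 1 / 2, (1, 2): 1 / 2}

def mean_and_var(dist):
    # mean and variance of the two-observation sample mean under the scheme
    mu = sum(p * (y[i] + y[j]) / 2 for (i, j), p in dist.items())
    var = sum(p * ((y[i] + y[j]) / 2 - mu) ** 2 for (i, j), p in dist.items())
    return mu, var

for name, dist in (("SRS", srs), ("RSS", rss), ("IRS", irs)):
    print(name, mean_and_var(dist))  # all means equal (y1+y2+y3+y4)/4
```

For these values the IRS variance is exactly zero because the two potential samples happen to have equal means; in general it is merely smaller than the RSS and SRS variances.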
4.3 The Sampling Procedure
Assume that a total of k measurements are to be made, and suppose that n
units are available for measurement. These n units may be the entire population, or
they may simply be a simple random sample drawn from a larger, possibly infinite,
population. Let m be a positive integer such that k divides nm, and imagine that we have m identical markers for each of the n units available for measurement. We then have a total of nm markers, and we may split these markers into l = nm/k groups of
size k. If we randomly select one of these l groups of markers, obtain a measurement
for each of the units corresponding to the markers in the group selected, and compute
the sample mean, we obtain the IRS estimate μ̂_IRS of the population mean μ. The
following theorem shows that this estimate is always unbiased for µ.
Theorem 4.1. Let X1, . . . , Xn be a simple random sample from the population of
interest. Suppose that we select k units from this random sample by IRS. Then the
mean μ̂_IRS of the k sampled values is unbiased for the population mean μ.
Proof of Theorem 4.1. Since the SRS sample mean X̄ is unbiased for the population
mean,
\[
E[\bar{X}] = E\Bigl[\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr] = \mu.
\]
Let m be the number of markers per unit, and let l = nm/k be the number of groups.
Define {Z_{ij} : i = 1, ..., n, j = 1, ..., m} to be occupation variables such that Z_{ij} is 1 when the jth marker for X_i is in the group chosen and 0 otherwise. We can then write the IRS sample mean as
\[
\hat{\mu}_{IRS} = \frac{1}{k}\sum_{i=1}^{n}\sum_{j=1}^{m} Z_{ij} X_i.
\]
as a Bernoulli random variable with success probability 1/l. Moreover, each individual
Zij is independent of the associated Xi. The theorem then follows by noting that
\[
\begin{aligned}
E[\hat{\mu}_{IRS}] &= E\Bigl[\frac{1}{k}\sum_{i=1}^{n}\sum_{j=1}^{m} Z_{ij} X_i\Bigr] = \frac{1}{k}\sum_{i=1}^{n}\sum_{j=1}^{m} E[Z_{ij} X_i] \\
&= \frac{1}{k}\sum_{i=1}^{n}\sum_{j=1}^{m} E[Z_{ij}]E[X_i] = \frac{1}{k}\sum_{i=1}^{n}\sum_{j=1}^{m} \frac{1}{l}\,E[X_i] \\
&= \frac{m}{kl}\sum_{i=1}^{n} E[X_i] = \frac{nm}{kl}\,E[\bar{X}] = \mu.
\end{aligned}
\]
Since the unbiasedness of the IRS mean estimator does not depend on the way we split the nm markers into groups, we should seek to select these groups in such a way that each group has about the same sample mean as any other group. Ideally, the groups would be such that we could not tell, before actual measurement, that any one group had a higher sample mean than any other. Selecting the groups in this way will clearly tend to minimize the variance of the estimator, and we avoid the situation that sometimes arises in SRS of drawing a sample that we do not believe is representative of the population, but being unable to do anything about it. A systematic method for choosing the groups will be discussed in Section 4.4. More generally, however, a researcher with expert knowledge of the matter at hand may simply select the groups to be representative in all the ways believed important. Units that are expected to have high values of the variable of interest may be balanced with units expected to lead to low values, and all varieties of auxiliary information may be considered in forming the groups. As a result, the expert knowledge of the researcher can be put to full use.
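The mechanism behind the proof is that, when every unit carries the same number of markers, the group means always average exactly to the sample mean of the n units, whatever split into groups is used. A few lines of Python make this concrete (the x values and the particular split below are arbitrary):

```python
x = [3.1, 0.7, 5.4, 2.2]  # hypothetical values for n = 4 units
k, m = 2, 2               # measurements per sample, markers per unit

markers = [i for i in range(len(x)) for _ in range(m)]  # nm unit labels
l = len(markers) // k
groups = [markers[j * k:(j + 1) * k] for j in range(l)]  # any split into l groups works

group_means = [sum(x[i] for i in g) / k for g in groups]
print(sum(group_means) / l, sum(x) / len(x))  # the two averages coincide
```

Choosing one group uniformly at random therefore gives an estimator whose expectation is the sample mean, regardless of how cleverly the groups were chosen.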
When we estimate the population mean μ, we are also interested in knowing something about the accuracy of our estimate. Because the samples from which measurements are taken in the IRS procedure do not arise from SRS, standard error estimates based on looking at the in-sample variation are not appropriate. Instead, if a standard error is desired, one may draw multiple independent IRS samples. Provided that the groups used in drawing the multiple IRS samples are constructed in a consistent manner, the average of the sample means from these multiple samples can be used as an overall estimate of the population mean, and a standard error may be computed by looking at the amount of variability in the individual IRS sample means.
4.4 A Rank-Based Implementation Method
IRS is ideally implemented by taking advantage of the researcher's expert knowledge. However, in cases where this expert knowledge does not exist or where the number of judgments that would need to be made is prohibitively large, alternatives must be sought. If units can be ranked either according to a covariate or by judgment without making any actual measurements, systematic procedures can be used to construct representative groups to be the potential samples for measurement. As in Section 4.3, assume that k measurements are to be made, and suppose that there are n units available for measurement. We take the number of markers m for each unit to be k, and we use the ordered vector of integers
\[
C = (1, 1, \ldots, 1,\ 2, 2, \ldots, 2,\ 3, \ldots, n-1,\ n, n, \ldots, n),
\]
where each of 1, 2, ..., n appears exactly k times, to represent the nk markers that we will divide into n groups. If we imagine using the nk markers as the elements of an n by k matrix A whose n rows represent the groups, then one set of groups may be constructed using the following algorithm. The idea behind this algorithm is to construct the groups in such a way that the sum of the ranks of the units in each group is roughly constant across all groups.
The Rank Algorithm: Let r be the integer such that either k = 2r or k = 2r − 1. We fill in the first r columns of the matrix A by using the elements of the row vector
C sequentially from left to right. We let the first n elements of C run down the first
column, and we then let the next n elements of C run up the second column. We
continue this alternating procedure (i.e., down, up, down, up, etc.) until the first r
columns are filled. We then fill in the last k − r columns of the matrix by using the remaining elements of the vector C sequentially from right to left, starting with the
kth column. We let the last n elements of C run down the kth column, and we let the next-to-last n elements of C run up the (k − 1)st column. We then alternate as before until we have filled in the entire matrix. Having found the matrix A, we obtain the groups by taking the ith group to consist of the units whose ranks are given on the ith row of the matrix.
As an example, consider the case in which k = 2 and n = 4. The vector C will be
C = (1, 1, 2, 2, 3, 3, 4, 4), and the matrix whose rows represent the groups will be given by
\[
A = \begin{pmatrix} 1 & 4 \\ 1 & 4 \\ 2 & 3 \\ 2 & 3 \end{pmatrix}.
\]
Thus, the two potential samples are the group consisting of the units with ranks 2 and 3 and the group consisting of the units with ranks 1 and 4. Groups for the larger
case in which k = 3 and n = 10 are given in Table 4.2.
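Under our reading of the algorithm above, a minimal implementation (the function name rank_groups is ours) reproduces the k = 2, n = 4 matrix and the Table 4.2 groups:

```python
def rank_groups(n, k):
    # vector C: each rank 1..n repeated k times, in increasing order
    C = [v for v in range(1, n + 1) for _ in range(k)]
    r = (k + 1) // 2                   # k = 2r or k = 2r - 1
    A = [[0] * k for _ in range(n)]
    idx = 0                            # first r columns: from the left end of C
    for c in range(r):
        rows = range(n) if c % 2 == 0 else range(n - 1, -1, -1)  # down, up, ...
        for row in rows:
            A[row][c] = C[idx]
            idx += 1
    idx = n * k - 1                    # last k - r columns: from the right end of C
    for j, c in enumerate(range(k - 1, r - 1, -1)):              # start at column k
        rows = range(n) if j % 2 == 0 else range(n - 1, -1, -1)
        for row in rows:
            A[row][c] = C[idx]
            idx -= 1
    return A

print(rank_groups(4, 2))  # [[1, 4], [1, 4], [2, 3], [2, 3]]
```

Running rank_groups(10, 3) yields exactly the ten groups listed in Table 4.2.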
When k is even, the rank algorithm leads to groups such that the ranks of the
units in each group sum to the same value. For the case of k odd, the ranks of the
units in each group do not sum to exactly the same value, but the group-to-group
differences are not large. As can be seen in Table 4.2, the rank algorithm may lead to
groups that contain repeated values. Since this seems undesirable, there may be an
Number   Ranks for the group elements      Number   Ranks for the group elements
1        {1, 7, 10}                        6        {2, 5, 9}
2        {1, 7, 10}                        7        {3, 5, 8}
3        {1, 6, 10}                        8        {3, 5, 8}
4        {2, 6, 9}                         9        {3, 4, 8}
5        {2, 6, 9}                         10       {4, 4, 7}
Table 4.2: The groups identified by the rank algorithm when n = 10 and k = 3.
advantage to slightly modifying the groups to eliminate repeated values. However, the comparisons we make in the remainder of this chapter are based on the most basic version of the rank algorithm.
Beyond making sure that no group contains repeated values, there are other changes one might make to the rank algorithm. Most obviously, one could make a different choice of which columns of A to fill in from top to bottom and which columns to fill in from bottom to top. To a certain extent, this choice could even be made to counteract anticipated skewness in the underlying distribution. Another generalization of the rank algorithm could be obtained by considering expectations associated with each rank (e.g., normal scores) or even covariate values instead of ranks. The rank algorithm tries to make the groups representative in the sense that the ranks of the units in each group have roughly the same sum. We could instead attempt to make the groups representative in the sense that the sum of the expectations associated with the ranks of the units in each group or the sum of the covariate values corresponding to the units in each group is roughly the same across all groups.
4.5 Comparisons of Estimator Performance
To compare the performance of the IRS mean estimator to that of the SRS and
RSS mean estimators, two simulation studies were conducted. The first simulation study assumed that the variable of interest and a more easily measured covariate were jointly distributed as bivariate normal random variables. The second simulation study used a real data set consisting of measurements of diameters and heights for longleaf pines in a Georgia forest.
In the first simulation study, the variable of interest was taken to be the value X, and it was assumed that there existed a covariate Y such that X and Y were jointly distributed bivariate normal with some correlation ρ. There are no restrictions on the choices of n and k for IRS beyond the obvious constraint that k ≤ n, but the choice n = k² was made to facilitate the comparison with RSS. The number of measurements k was allowed to run from two to six, and the correlation ρ ranged from 0.0 to 1.0. For each choice of k and ρ, each sampling scheme was run 100,000 times. For each run, the
SRS mean estimator X¯ was obtained by making measurements on a randomly selected k of the n available units and averaging those measurements. The efficiencies of the
IRS and RSS mean estimators relative to the SRS mean estimator were estimated using ratios of variances so that, for example, the efficiency of the IRS estimator
relative to the SRS estimator was estimated as
eff(IRS, SRS) = Var(X̄) / Var(μ̂_IRS),

where Var(X̄) and Var(μ̂_IRS) denote the variance estimates from the simulation. The rank algorithm described in Section 4.4 was used to determine the groups for IRS, and all ranking was done according to the covariate Y. The RSS estimator was obtained
                        Correlation ρ
   k   Type    1.0    0.9    0.8    0.6    0.3
   2   IRS    1.67   1.49   1.35   1.17   1.04
       RSS    1.46   1.35   1.26   1.13   1.03
   3   IRS    2.45   1.93   1.61   1.27   1.06
       RSS    1.92   1.63   1.44   1.21   1.04
   4   IRS    3.48   2.37   1.84   1.35   1.07
       RSS    2.35   1.87   1.58   1.26   1.05
   5   IRS    4.34   2.67   1.97   1.39   1.07
       RSS    2.76   2.06   1.69   1.30   1.06
   6   IRS    5.42   2.94   2.09   1.42   1.08
       RSS    3.19   2.24   1.78   1.33   1.07
Table 4.3: Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators.
by drawing a single cycle of size k from the n = k² units available for measurement.
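The simulation design above can be sketched in Python. This is a simplified illustration, not the code used for Table 4.3: it compares only the SRS and RSS mean estimators (the IRS estimator requires the rank algorithm of Section 4.4, which is not reproduced here), and all function names are our own.

```python
import random
import statistics

def simulate_efficiency(k, rho, reps, seed=0):
    """Estimate eff(RSS, SRS) = Var(SRS mean) / Var(RSS mean) by simulation,
    with n = k^2 available units per run and ranking done on the covariate Y."""
    rng = random.Random(seed)
    srs_vals, rss_vals = [], []
    for _ in range(reps):
        # (X, Y) bivariate standard normal with correlation rho.
        units = []
        for _ in range(k * k):
            x = rng.gauss(0.0, 1.0)
            y = rho * x + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
            units.append((x, y))
        # SRS: measure k randomly chosen units and average.
        srs_vals.append(statistics.mean(u[0] for u in rng.sample(units, k)))
        # RSS, one cycle: partition the n units into k sets of size k, rank
        # each set on Y, and measure the unit of rank r in the r-th set.
        rng.shuffle(units)
        measured = []
        for r in range(k):
            ranked = sorted(units[r * k:(r + 1) * k], key=lambda u: u[1])
            measured.append(ranked[r][0])
        rss_vals.append(statistics.mean(measured))
    return statistics.variance(srs_vals) / statistics.variance(rss_vals)
```

With enough replications, `simulate_efficiency(k, rho, reps)` should give values close to the corresponding RSS entries of Table 4.3.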
The results of the simulation study, which are given in Table 4.3, show that for all sample sizes and levels of correlation considered, the IRS estimator was more efficient than either the RSS estimator or the SRS estimator.
To demonstrate that the superior performance of IRS holds not just for simulated data, but also for the sorts of data encountered in practice, we considered a data set consisting of measurements of diameters at chest height and heights for 396 longleaf pines from a Georgia forest. This data set, developed by Platt, Evans, and Rathbun
(1988), is given in Chen, Bai, and Sinha (2004). We assumed that height was the variable of interest, and we took diameter at chest height to be the covariate used for ranking. Figure 4.1, which gives a plot of height versus diameter at chest height, shows that diameter is in fact highly relevant for the estimation of height.
   r     n     k   RSS efficiency   IRS efficiency
   2     4     2        1.29             1.36
         8     4        1.30             1.59
        12     6        1.30             1.69
        16     8        1.29             1.74
   3     9     3        1.55             1.64
        18     6        1.55             2.24
        27     9        1.56             2.42
        36    12        1.57             2.55
   4    16     4        1.78             2.39
        32     8        1.79             2.87
        48    12        1.81             3.23
        64    16        1.82             3.46
   5    25     5        1.99             3.02
        50    10        2.02             3.58
        75    15        2.05             4.13
       100    20        2.08             4.56
Table 4.4: Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators when diameter is used as a covariate for height. The value r is the set size used for RSS.
Figure 4.1: Plot of the variable of interest height (in feet) against the covariate diameter at chest height (in centimeters) for the longleaf pine data.
We let the RSS set size r run from two to five, and for each choice of r, we let the ratio k/r run from one to four. The number of available units was taken to be
n = rk since this is exactly the number of values needed for RSS. For each choice of
the number of measured values k and the number of available units n, each sampling
procedure was run 100,000 times. The rank algorithm was used to choose the groups
for IRS, and the efficiencies of the IRS and RSS estimators relative to the SRS mean
estimator were estimated as before. The results, given in Table 4.4, show that the IRS
estimator was more efficient than either the RSS or the SRS estimator for all choices
of k and n. Moreover, when r was kept fixed and the sample size k increased, the
advantage of IRS over RSS and SRS increased. These results show clearly that when
a relevant covariate like diameter at chest height is available, IRS is to be preferred
to either RSS or SRS.
4.6 Discussion and Conclusions
IRS is a completely new sampling approach. It does, however, have connections
to more established ideas in the sampling literature. For example, IRS is a two-phase
sampling scheme in the sense that one begins by drawing a simple random sample
of size n, then uses auxiliary information about the units in this sample of size n
to select the final sample of k units to be measured. It is also a cluster sampling
scheme in the sense that it is groups of units rather than individual units that are
sampled. However, unlike the clusters used in standard cluster-sampling schemes
such as those described in Lohr (1999) and Murthy (1967), the clusters used in IRS
may overlap. Another important difference is that while the clusters used in typical
cluster-sampling schemes tend to consist of units which are similar to each other, the
clusters chosen in IRS are purposely selected to be representative in the sense that deviations from the population mean in one direction or another tend to cancel each other out. An intentionally representative sample is not a stratified sample in the usual sense. However, if stratification information is the only auxiliary information available, then the researcher would certainly ensure that all potential IRS samples included numbers of observations from each stratum proportional to the size of that
stratum within the first-phase sample of size n. IRS may not seem to be as well-positioned as stratified random sampling to take advantage of the gains in efficiency
possible by sampling different strata at different rates when the strata have different
levels of variability. However, if auxiliary information beyond the stratification information is available, one can imagine using IRS separately within each stratum as an
alternative to stratified random sampling.
Our simulations in Section 4.5 showed that IRS outperforms both SRS and RSS
when we rank the units according to a relevant covariate and choose the groups using
the rank algorithm of Section 4.4. IRS is also more flexible than other sampling
schemes because all kinds of auxiliary information can be used in creating the groups
that are the potential samples. Another advantage for IRS is that, as described in
Section 4.1, interested parties may play a role in the selection of the groups that
are the potential samples. All of these facts suggest that IRS is a superior sampling
method, but a certain amount of caution must still be used in applying it. One
danger is that while unbiasedness is guaranteed no matter how the groups described
in Section 4.3 are chosen, it is possible to choose the groups so poorly that IRS is
less efficient than even SRS. This observation suggests that IRS is best suited for
application in those settings where auxiliary information is both abundant and well understood by the researcher.
CHAPTER 5
IMPERFECT RANKINGS MODELS FOR RANKED-SET SAMPLING
5.1 Introduction
Ranked-set sampling (RSS) is a sampling approach that leads to improved statis- tical inference in situations where the units to be sampled can be ranked relative to each other prior to formal measurement. A ranked-set sample is typically obtained in n independent cycles of size k, giving a total sample size of nk. A single cycle of the basic RSS procedure is carried out by first drawing k independent samples, or sets, of size k and ranking the units within each set. This ranking may be done either by subjective judgment or according to a concomitant variable, and it need not be completely accurate. The k units for measurement are then obtained by selecting, for r = 1, . . . , k, the unit in the rth set that has been given rank r. If the rankings are done according to the variable of interest, meaning that the rankings are perfect, then the sampled values are independent order statistics from the underlying distribution.
In the more general situation in which the rankings are not perfect, the sampled values are merely independent judgment order statistics.
When an inference procedure based on RSS is proposed, it is typically initially evaluated in terms of its performance when the rankings are perfect. The next step in
the evaluation process often consists of showing that regardless of the quality of the
rankings, the procedure is at least as good as the corresponding simple random sampling procedure. However, only rarely is it possible to make more precise assessments of the performance of procedures based on RSS when the rankings are imperfect. One difficulty is that only a few plausible models for imperfect rankings are available in the literature. A second difficulty is that while there is only one way the rankings can be perfect, there seem to be almost unimaginably many ways in which the rankings could be imperfect. In this chapter, we develop a broad new class of imperfect rankings models, and we show that models from this class can be used to uniformly approximate essentially any reasonable imperfect rankings model. These models also have a simple structure that may sometimes allow efficiency computations to be done
analytically rather than via simulation.
In Section 5.2, we describe this new class of models for imperfect rankings. In
Section 5.3, we relate our class to a larger class that, we argue, contains essentially
any reasonable imperfect rankings model. In Section 5.4, we show that any of the
imperfect rankings models discussed in Section 5.3 can be approximated arbitrarily
well by models in the class described in Section 5.2. In Section 5.5, we describe one specific method for selecting a one-parameter family of models from the class
introduced in Section 5.2. We state our conclusions in Section 5.6.
5.2 A New Class of Models
In developing a class of models for imperfect rankings, there are several goals that should be kept in mind. Perhaps the most critical of these goals is that the model must be a valid model. Just as one cannot write down just any function f(x, y) on R² and be assured that it is a valid covariance function, one cannot write down an arbitrary collection of cumulative distribution functions (CDFs) for judgment order statistics and be assured that they represent a valid imperfect rankings model. Instead, one must provide a mechanism by which the imperfect rankings specified in each model could actually occur. Another goal to keep in mind is that the class of models should be flexible enough to model a wide range of different types of judgment errors. Finally, the class should be such that it is easy to identify one-parameter families within the class that smoothly span the range from random rankings to perfect rankings. In this
section, we describe a large class of models that achieves these three goals.
One model for imperfect rankings, explored by Bohn and Wolfe (1994), says that
when the CDF for the underlying distribution is F, the cumulative distribution functions F[1], . . . , F[k] for the judgment order statistics can be written as convex combinations of the CDFs F(1), . . . , F(k) for the true order statistics. That is, the CDFs for
the judgment order statistics take the form
F[i] = Σ_{j=1}^{k} p_{ij} F(j),    (5.1)

where the values p_{ij} ∈ [0, 1] are the elements of a doubly stochastic k × k matrix P. The fact that any such model is a valid model is a consequence of Birkhoff's
Theorem, which states that any doubly stochastic matrix is a convex combination of
permutation matrices. A proof of Birkhoff’s Theorem, which is nontrivial, is given
in Bhatia (1997). To show that (5.1) yields a valid model for any choice of P , we let
P = (p_{ij}) be an arbitrary k × k doubly stochastic matrix. Then P can be written as a convex combination

P = Σ_{(i1, . . . , ik) ∈ Sk} α(i1, . . . , ik) Per(i1, . . . , ik),

where Sk is the set of all permutations of the integers 1 to k, Per(i1, . . . , ik) is the permutation matrix that has a 1 in the (ij, j)th position for j = 1, . . . , k and zeros elsewhere, and {α(i1, . . . , ik)} is a collection of nonnegative constants summing to 1.
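This convex-combination mechanism is easy to illustrate numerically. The sketch below (our own code, with hypothetical weights α chosen purely for illustration) builds P = Σ α(π) Per(π) for k = 3 and confirms that the result is doubly stochastic.

```python
def perm_matrix(pi):
    """Per(i1, ..., ik): a 1 in the (i_j, j)th position (1-based), zeros elsewhere."""
    k = len(pi)
    M = [[0.0] * k for _ in range(k)]
    for j, ij in enumerate(pi):
        M[ij - 1][j] = 1.0
    return M

def mixture_matrix(alpha):
    """P = sum of alpha(pi) * Per(pi) over the permutations pi in alpha."""
    k = len(next(iter(alpha)))
    P = [[0.0] * k for _ in range(k)]
    for pi, a in alpha.items():
        M = perm_matrix(pi)
        for i in range(k):
            for j in range(k):
                P[i][j] += a * M[i][j]
    return P

# Hypothetical weights for k = 3: rank correctly with probability 0.7,
# swap one adjacent pair of ranks with probability 0.15 each.
alpha = {(1, 2, 3): 0.7, (2, 1, 3): 0.15, (1, 3, 2): 0.15}
P = mixture_matrix(alpha)

# Any such P is doubly stochastic: all row and column sums equal 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
assert all(abs(sum(P[i][j] for i in range(3)) - 1.0) < 1e-12 for j in range(3))
```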
Consider the ranking scheme in which the order statistics x1:k, . . . , xk:k in any sample
are given the ranks i1, . . . , ik, respectively, with probability α(i1, . . . , ik) independent
of x1:k, . . . , xk:k. One can easily see that under this ranking scheme, the CDFs for the
judgment order statistics are exactly those given by (5.1).
In our description of a mechanism through which imperfect ranking models of
the form (5.1) can arise, we have taken the probability that the order statistics
x1:k, . . . , xk:k are given a particular list of ranks i1, . . . , ik to be independent of the
actual values of these order statistics. Since one would expect values that are close
together to be harder to order correctly than values that are far apart, this independence condition seems extremely unlikely to hold in practice, and it has been roundly
criticized by, among others, Presnell and Bohn (1999). What must be pointed out,
however, is that using a model of the form (5.1) need not signify that one is assuming
that the independence condition holds. The independence condition simply provides a way to show that the model is valid, and models of the form (5.1) can arise even when one takes the actual values of x1:k, . . . , xk:k into account. To illustrate this phenomenon, we consider an example discussed by Presnell and Bohn (1999).
Example 5.1: Suppose that k = 2 and that the underlying distribution is uniform
on [0,1]. Presnell and Bohn (1999) model the imperfect rankings by assuming that
given X1 and X2, the true smallest order statistic X(1) is correctly declared to be
the smallest with probability (1/2)(1 + X(2) − X(1)). Thus, the probability of ordering the observations correctly is an increasing function of the difference between X(1) and
X(2). If we compute the CDFs F[1] and F[2] for the judgment order statistics X[1]
and X[2], we find that F[1](t) = (3/2)t − (1/2)t² and F[2](t) = (1/2)t + (1/2)t² for t ∈ [0, 1]. Since F(1)(t) = 2t − t² and F(2)(t) = t², we can write the CDFs for the judgment order statistics as

F[1](t) = (3/4) F(1)(t) + (1/4) F(2)(t)

and

F[2](t) = (1/4) F(1)(t) + (3/4) F(2)(t).
This shows that the judgment order statistics follow a model of the form (5.1), with
the matrix P being given by
P = | 3/4  1/4 |
    | 1/4  3/4 |.
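Example 5.1 can be checked by direct simulation. The sketch below (our own code, with an assumed number of replications) draws pairs from U(0, 1), applies the Presnell-Bohn misranking rule, and compares the empirical CDF of the judgment minimum to the closed form given above.

```python
import random

def simulate_F1(t, reps=200_000, seed=1):
    """Monte Carlo estimate of F_[1](t) under the Presnell-Bohn mechanism:
    X1, X2 ~ U(0, 1), and the true minimum is declared smallest with
    probability (1 + X_(2) - X_(1)) / 2."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        x1, x2 = rng.random(), rng.random()
        lo, hi = min(x1, x2), max(x1, x2)
        correct = rng.random() < 0.5 * (1.0 + hi - lo)
        judged_min = lo if correct else hi
        count += judged_min <= t
    return count / reps

# Compare to the closed form F_[1](t) = (3/2)t - (1/2)t^2 from Example 5.1.
for t in (0.25, 0.5, 0.75):
    exact = 1.5 * t - 0.5 * t * t
    assert abs(simulate_F1(t) - exact) < 0.01
```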
Example 5.1 shows that using a model of the form (5.1) does not require one to assume that the difficulty of ordering the values x1:k, . . . , xk:k does not depend on the values of x1:k, . . . , xk:k. However, the space of models (5.1) may not be large enough to adequately model the imperfect rankings in a particular problem of interest. For example, in the case k = 2, the model (5.1) is determined by the choice of the single parameter p11. An obvious way to expand the space of valid models is to look at convex combinations of a larger class of functions than just the CDFs F(1), . . . , F(k) of the true order statistics. This larger class of functions could potentially be chosen
in any number of ways. We need to be sure, however, that the models in the expanded space continue to be valid models. This suggests the constructive approach we describe next.
Suppose that we wish to carry out RSS with a set size of k. We may certainly
draw our sets of size k directly. However, another completely general way to obtain
sets of size k is to draw sets of size r > k and then subsample k units from the r.
When we draw the r units, the ordered values corresponding to those units are jointly
distributed like order statistics X1:r, . . . , Xr:r from the parent distribution. When we
randomly draw a subsample of size k, we obtain each of the (r choose k) possible selections of k out of the r order statistics with equal probability. Suppose that having subsampled
order statistics xi1:r, . . . , xik:r (i1 < · · · < ik) from the larger set of size r, we select
a doubly stochastic matrix A(i1, . . . , ik). We may then take the distribution of X[i],
conditional on the values i1, . . . , ik, to be given by
F[i](t | i1, . . . , ik) = Σ_{j=1}^{k} A(i1, . . . , ik)ij Fij:r(t),

where Fij:r is the distribution of the true ijth order statistic from a set of size r and
A(i1, . . . , ik)ij is the (i, j)th entry of the matrix A(i1, . . . , ik). By our earlier discussion
of the case in which a set of size k is drawn directly, any choice of the doubly stochastic
matrix A(i1, . . . , ik) yields a valid model, the argument being that the chance that
xij:r is given the rank i could be taken to be independent of the actual values of the
order statistics xi1:r, . . . , xik:r.
Let Ω(i1, . . . , ik) be the k × r matrix which has a 1 as the (j, ij)th entry for j = 1, . . . , k and is otherwise zero. Then the matrix product

N(i1, . . . , ik) ≡ A(i1, . . . , ik) Ω(i1, . . . , ik)

is a k × r matrix such that the distribution of X[i], conditional on the values of i1, . . . , ik, is given by
F[i](t | i1, . . . , ik) = Σ_{j=1}^{r} N(i1, . . . , ik)ij Fj:r(t),

where N(i1, . . . , ik)ij is the (i, j)th entry of N(i1, . . . , ik). It thus follows that once the contributions of all (r choose k) equally likely choices of the values i1, . . . , ik are taken into account, the CDF for the ith judgment order statistic X[i] is given by

F[i](t) = Σ_{j=1}^{r} p_{ij} Fj:r(t),    (5.2)

where P = (p_{ij}) is the k × r matrix average

P ≡ (r choose k)^{−1} Σ_{1 ≤ i1 < · · · < ik ≤ r} N(i1, . . . , ik).    (5.3)
Example 5.2: Suppose that k = 2 and r = 3. When we sample two values from the collection {x1:3, x2:3, x3:3}, we obtain each of the pairs {x1:3, x2:3}, {x1:3, x3:3}, and {x2:3, x3:3} with probability 1/3. Suppose that the matrices A(1, 2), A(1, 3), and A(2, 3) corresponding to the three samples are

A(1, 2) = | a    1−a |,   A(1, 3) = | b    1−b |,   and   A(2, 3) = | c    1−c |,
          | 1−a  a   |              | 1−b  b   |                   | 1−c  c   |

where a, b, c ∈ [0, 1]. The matrices N(1, 2), N(1, 3), and N(2, 3) are then given by

N(1, 2) = | a    1−a  0 |,   N(1, 3) = | b    0  1−b |,   and   N(2, 3) = | 0  c    1−c |,
          | 1−a  a    0 |              | 1−b  0  b   |                   | 0  1−c  c   |

meaning that the 2 × 3 matrix P is given by

P = | p11  p12  p13 | = (1/3) | a+b    1−a+c  2−b−c |
    | p21  p22  p23 |         | 2−a−b  1+a−c  b+c   |.

By construction, using P in (5.2) gives a valid imperfect rankings model for any choice of the values a, b, and c, and this model is clearly more flexible than the model (5.1). One may check that in this setting where k = 2 and r = 3, the constraints on the matrix P reduce to the constraints that each entry be nonnegative, that each row sum to 1, and that each column sum to 2/3.
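The construction of Example 5.2 is mechanical enough to verify in code. The sketch below (function names ours) forms each product N(i1, . . . , ik) = A(i1, . . . , ik) Ω(i1, . . . , ik) implicitly, averages over subsets as in (5.3), and checks both the closed form and the row- and column-sum constraints.

```python
from itertools import combinations

def build_P(r, k, A):
    """Average the matrices N(i1,...,ik) = A(i1,...,ik) Omega(i1,...,ik)
    over all (r choose k) subsets, as in (5.3).  A maps each subset tuple
    to a k x k doubly stochastic matrix given as a list of rows."""
    subsets = list(combinations(range(1, r + 1), k))
    P = [[0.0] * r for _ in range(k)]
    for sub in subsets:
        Asub = A[sub]
        for i in range(k):
            for j in range(k):
                # Multiplying by Omega places A's (i, j) entry in column i_j.
                P[i][sub[j] - 1] += Asub[i][j]
    m = len(subsets)
    return [[v / m for v in row] for row in P]

def two_by_two(p):
    # A 2 x 2 doubly stochastic matrix: correct ordering with probability p.
    return [[p, 1.0 - p], [1.0 - p, p]]

a, b, c = 0.9, 0.8, 0.7
A = {(1, 2): two_by_two(a), (1, 3): two_by_two(b), (2, 3): two_by_two(c)}
P = build_P(3, 2, A)

# Matches the closed form P = (1/3)[[a+b, 1-a+c, 2-b-c], [2-a-b, 1+a-c, b+c]];
# rows sum to 1 and columns sum to k/r = 2/3.
assert abs(P[0][0] - (a + b) / 3) < 1e-12
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
assert all(abs(P[0][j] + P[1][j] - 2.0 / 3.0) < 1e-12 for j in range(3))
```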
Example 5.3: Suppose now that k = 2 and r = 4. When we sample two values from the collection {x1:4, x2:4, x3:4, x4:4}, we obtain each of the six possible pairs with probability 1/6. Suppose that we seek to have the judgment order statistic X[1] take on the smallest values possible. This would require that the smallest value in each pair be correctly identified, meaning that x1:4 is identified as the smallest whenever it appears, that x2:4 is identified as the smallest when it is sampled together with x3:4 or x4:4, and that x3:4 is identified as the smallest when it is sampled together with x4:4. This would tell us that the distribution of F[1] is given by

F[1](t) = (1/2) F1:4(t) + (1/3) F2:4(t) + (1/6) F3:4(t),

which is stochastically larger than the distribution (1/2) F1:4(t) + (1/2) F2:4(t). Thus, even though the latter distribution would correspond to a matrix

P = | 1/2  1/2  0    0   |
    | 0    0    1/2  1/2 |

with row sums equal to 1 and column sums equal to 1/2 = k/r, it is not a valid model
for imperfect rankings. To find all the models (5.2) that are valid in this setting with
k = 2 and r = 4, let a, b, c, d, e, f ∈ [0, 1] be values that determine our choices of the doubly stochastic matrices A(1, 2), A(1, 3), A(1, 4), A(2, 3), A(2, 4), and A(3, 4), respectively. We can then write the resulting 2 × 4 matrix P as

P = | p11  p12  p13  p14 | = (1/6) | a+b+c    1−a+d+e  2−b−d+f  3−c−e−f |
    | p21  p22  p23  p24 |         | 3−a−b−c  2+a−d−e  1+b+d−f  c+e+f   |.
In implementing the model (5.2), we must choose a doubly stochastic matrix
A(i1, . . . , ik) for each choice of the values i1 < · · · < ik. Under the independence condition that we have used to show the validity of the model, choosing A(i1, . . . , ik) is equivalent to partitioning probability among the k! possible orderings of the values xi1:r, . . . , xik:r. One way to assign probabilities to all of these orderings at once
is to first assign probabilities to all orderings of x1:r, . . . , xr:r. Probabilities for orderings of xi1:r, . . . , xik:r are then obtained by summing over all orderings of the full set x1:r, . . . , xr:r that give the desired ordering for xi1:r, . . . , xik:r. To implement this projection in matrix form, suppose that R is the doubly stochastic r × r matrix determined by assigning probabilities to all orderings of x1:r, . . . , xr:r. That is, the (i, j)th element of R is the probability that the true jth value is given the rank i. We then
have, under the independence condition, that
P(X[i:k] = Xj:r) = Σ_{l=1}^{r} P(X[i:k] = X[l:r] and X[l:r] = Xj:r)

               = Σ_{l=1}^{r} [(l−1 choose i−1)(r−l choose k−i) / (r choose k)] · Rlj,    (5.4)

where Rlj is the (l, j)th entry of R. But (5.4) is precisely the expression one would obtain from a matrix multiplication of Ckr and R, where Ckr is the k × r matrix with (i, l)th entry (l−1 choose i−1)(r−l choose k−i) / (r choose k). Thus, one may easily move from a model (5.1) for set size r to a model (5.2) for any smaller set size. That these projection models do not, however, exhaust the possibilities for models of the form (5.2) is demonstrated in the next example.
Example 5.4: Suppose that k = 3 and r = 4. Suppose that the matrices A(1, 2, 3),
A(1, 2, 4), A(1, 3, 4), and A(2, 3, 4) are given by
A(1, 2, 3) = A(1, 2, 4) = A(1, 3, 4) = | 1  0  0 |   and   A(2, 3, 4) = | 0  0  1 |.
                                       | 0  1  0 |                     | 0  1  0 |
                                       | 0  0  1 |                     | 1  0  0 |

The matrix P is then given by
P = | 3/4  0    0    1/4 |,
    | 0    1/2  1/2  0   |
    | 0    1/4  1/4  1/2 |

while the matrix C34 may be computed to be

C34 = | 3/4  1/4  0    0   |.
      | 0    1/2  1/2  0   |
      | 0    0    1/4  3/4 |

Suppose that R = (Rij) is the 4 × 4 matrix that, when projected onto dimension
3, leads to the matrix P . Then P = C34R implies that if we consider the entries
P12, P13, P21, and P24, which are all zero, we must have that

(3/4) R12 + (1/4) R22 = 0,
(3/4) R13 + (1/4) R23 = 0,
(1/2) R21 + (1/2) R31 = 0, and
(1/2) R24 + (1/2) R34 = 0.

But then, since R has only nonnegative entries, each of R21, R22, R23, and R24 is zero, which is impossible since R is doubly stochastic.
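The projection matrix Ckr is simple enough to verify numerically. The brief sketch below (our own code) computes Ckr from its binomial-coefficient formula and checks it against the matrix C34 of Example 5.4.

```python
from math import comb

def C(k, r):
    """The k x r projection matrix C_kr of (5.4): (i, l)th entry
    comb(l-1, i-1) * comb(r-l, k-i) / comb(r, k), with 1-based i and l."""
    return [[comb(l - 1, i - 1) * comb(r - l, k - i) / comb(r, k)
             for l in range(1, r + 1)]
            for i in range(1, k + 1)]

C34 = C(3, 4)
expected = [[0.75, 0.25, 0.0, 0.0],
            [0.0, 0.5, 0.5, 0.0],
            [0.0, 0.0, 0.25, 0.75]]
assert all(abs(C34[i][l] - expected[i][l]) < 1e-12
           for i in range(3) for l in range(4))
# Each row of C_kr sums to 1 (a Vandermonde identity).
assert all(abs(sum(row) - 1.0) < 1e-12 for row in C34)
```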
5.3 Connection with More General Models
It is natural to wonder how broad the class of models introduced in Section 5.2 is. To answer this question, we need a way to specify a general imperfect rankings model. One completely general way to do this is by specifying the CDFs F[1], . . . , F[k] of the judgment order statistics. We saw in Example 5.3, however, that if F[1], . . . , F[k]
are specified arbitrarily, there is no guarantee that a valid model will result. Consequently, it makes sense to consider a different way of specifying the model. One
specification method that does guarantee a valid model consists of specifying a collection of measurable functions {pij(x1, . . . , xk)} that determine a doubly stochastic k × k matrix P(x1, . . . , xk) at each point (x1, . . . , xk) in their domains. To see this,
suppose that conditionally on observed values x1, . . . , xk, the doubly stochastic matrix P(x1, . . . , xk) = (pij) is such that pij gives the probability that the true jth order statistic x(j) will be given the judgment rank i. Any model where the rankings are done according to a concomitant variable (see, for example, Dell and Clutter (1972)) can be specified by such a collection of functions {pij}, as can any model based on subjective judgment rankings that are consistent over time. Because of the mechanism arising out of Birkhoff's Theorem, any specification of the functions {pij} is guaranteed to lead to a valid imperfect rankings model.
One simplification that we can make is to notice that if the matrix function
P(x1, . . . , xk) depends on the order of presentation of the values x1, . . . , xk, which it easily could (i.e., the order of presentation of the units could affect the ranking), we can obtain another matrix function P′, depending only on the order statistics x(1), . . . , x(k), that leads to exactly the same judgment order statistic CDFs
F[1], . . . , F[k]. To see this, first note that, following Presnell and Bohn (1999), we can write that
F[i](t) = P(X[i] ≤ t) = Σ_{j=1}^{k} P(X(j) ≤ t, X[i] = X(j))

       = Σ_{j=1}^{k} ∫_{R^k} I(x(j) ≤ t) pij(x1, . . . , xk) dF(xk) . . . dF(x1).

For every permutation π ∈ Sk, there is a subset Rπ of R^k on which xπ(1) ≤ xπ(2) ≤ · · · ≤ xπ(k). Since these subsets Rπ exhaust R^k, but overlap only on a set of measure zero, we may write that
F[i](t) = Σ_{j=1}^{k} Σ_{π ∈ Sk} ∫_{Rπ} I(x(j) ≤ t) pij(x1, . . . , xk) dF(xk) . . . dF(x1).    (5.5)

For fixed π ∈ Sk, consider the integral over Rπ that appears in (5.5). Relabel the coordinates by defining new variables y1 = xπ(1), . . . , yk = xπ(k). Then, if π^{−1} ∈ Sk is the inverse of π, we have that
∫_{Rπ} I(x(j) ≤ t) pij(x1, . . . , xk) dF(xk) . . . dF(x1)

    = ∫_{R} I(y(j) ≤ t) pij(y_{π−1(1)}, . . . , y_{π−1(k)}) dF(yk) . . . dF(y1),

where R = {(y1, . . . , yk) | y1 ≤ · · · ≤ yk}. This gives us that
F[i](t) = Σ_{j=1}^{k} Σ_{π ∈ Sk} ∫_{R} I(y(j) ≤ t) pij(y_{π−1(1)}, . . . , y_{π−1(k)}) dF(yk) . . . dF(y1)

       = Σ_{j=1}^{k} ∫_{R} I(y(j) ≤ t) Σ_{π ∈ Sk} pij(y_{π−1(1)}, . . . , y_{π−1(k)}) dF(yk) . . . dF(y1)

       = Σ_{j=1}^{k} ∫_{R} I(y(j) ≤ t) · (1/k!) Σ_{π ∈ Sk} pij(yπ(1), . . . , yπ(k)) (k! I(y1 ≤ · · · ≤ yk)) dF(yk) . . . dF(y1),

where the last equality follows in part from the observation that the set of all π ∈ Sk is exactly the same as the set of all π^{−1} ∈ Sk. If we then note that k! I(y1 ≤ · · · ≤
yk) dF(yk) . . . dF(y1) is precisely the joint density of the order statistics y(1), . . . , y(k),
while R is exactly the support of those order statistics, it is clear that setting
P′(x(1), . . . , x(k)) = (1/k!) Σ_{π ∈ Sk} P(xπ(1), . . . , xπ(k))
the same judgment order statistic CDFs F[1], . . . , F[k] that we obtained using the
original functions {pij}. Thus, in the remainder of this chapter, we take each pij to be a function solely of the order statistics in the sample.
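This symmetrization step can be made concrete. In the sketch below (our own hypothetical construction), a k = 2 model whose accuracy depends on the order of presentation is averaged over permutations, yielding a P′ that depends only on the order statistics.

```python
from itertools import permutations

def presentation_dependent_p(i, j, x):
    # Hypothetical k = 2 model: the ranker is more accurate when the two
    # units happen to be presented in decreasing order.
    q = 0.6 if x[0] <= x[1] else 0.9
    P = [[q, 1.0 - q], [1.0 - q, q]]
    return P[i][j]

def symmetrized(p, i, j, x):
    """P'(x_(1), ..., x_(k))_{ij}: average P(x_pi(1), ..., x_pi(k))_{ij}
    over all permutations pi, so the result depends only on the order
    statistics of x."""
    perms = list(permutations(sorted(x)))
    return sum(p(i, j, arrangement) for arrangement in perms) / len(perms)

# Averaging the two presentation orders gives entry (1, 1) equal to
# (0.6 + 0.9) / 2 = 0.75, whatever order the input is supplied in.
assert abs(symmetrized(presentation_dependent_p, 0, 0, (0.7, 0.3)) - 0.75) < 1e-12
assert abs(symmetrized(presentation_dependent_p, 0, 0, (0.3, 0.7)) - 0.75) < 1e-12
```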
We now examine the relationship between the class of models introduced in Section
5.2 and more general imperfect rankings models by looking at the functions {pij}.
Let k and r be fixed, and suppose that we wish to find the function pij(x1, . . . , xk)
corresponding to some choice of the matrices {A(i1, . . . , ik), 1 ≤ i1 < · · · < ik ≤ r} discussed in Section 5.2. When the sample of k measured values consists of the
ordered values x1 < · · · < xk, those values could have arisen as the order statistics xi1:r, xi2:r, . . . , xik:r for any of the (r choose k) choices of i1, . . . , ik. When they arise from a particular choice of i1, . . . , ik, the chance that x[i] = x(j) is just A(i1, . . . , ik)ij. Thus,
we have that
(r choose k)^{−1} Σ_{1 ≤ i1 < · · ·} A(i1, . . . , ik) f(x1, . . . , xk) . . .