Inference Procedures Based on Order Statistics

INFERENCE PROCEDURES BASED ON ORDER STATISTICS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Jesse C. Frey, M.S.

The Ohio State University, 2005

Dissertation Committee:
Prof. H. N. Nagaraja, Adviser
Prof. Steven N. MacEachern
Prof. Ömer Öztürk
Prof. Douglas A. Wolfe

Approved by: Adviser, Graduate Program in Statistics

ABSTRACT

In this dissertation, we develop several new inference procedures that are based on order statistics. Each procedure is motivated by a particular statistical problem.

The first problem we consider is that of computing the probability that a fully specified collection of independent random variables has a particular ordering. We derive an equal-conditionals condition under which such probabilities can be computed exactly, and we also derive extrapolation algorithms that allow approximation and computation of such probabilities in more general settings. Romberg integration is one idea that is used.

The second problem we address is that of producing optimal distribution-free confidence bands for a cumulative distribution function. We treat this problem both in the case of simple random sampling and in the more general case in which the sample consists of independent order statistics from the distribution of interest. The latter case includes ranked-set sampling. We propose a family of optimality criteria motivated by the idea that good confidence bands are narrow, and we develop theory that makes the identification and computation of optimal bands possible. The Brunn-Minkowski Inequality from the theory of convex bodies plays a key role in this work.

The third problem we investigate is that of how best to take advantage of auxiliary information when estimating a population mean. We develop a general procedure, intentionally representative sampling, that is unbiased in the nonparametric sense, yet offers great flexibility for incorporating auxiliary information.

The final problem we consider is that of modeling imperfect judgment rankings in ranked-set sampling. We develop a new class of models so large that essentially any judgment rankings model is a limit of models in this class, and we propose an algorithm for selecting one-parameter families from the class.

ACKNOWLEDGMENTS

I thank my adviser, Dr. H. N. Nagaraja, for his advice, his encouragement, and his consistently positive outlook. I also thank the other members of my committee, Dr. Steven N. MacEachern, Dr. Ömer Öztürk, and Dr. Douglas A. Wolfe, for their advice both on this dissertation and on other work during the past four years.

VITA

July 11, 1975: Born, Greenville, SC
1996: B.S. Mathematics and Physics, B.A. History, Presbyterian College, Clinton, SC
2000: M.S. Mathematics, University of North Carolina, Chapel Hill, NC

PUBLICATIONS

Deshpande, J.V., Frey, J., and Öztürk, Ö. "Inference regarding the constant of proportionality in the Cox hazards model", 147-163. In: de Silva, B.M. and Mukhopadhyay, N., eds., Proceedings of the International Sri Lankan Statistical Conference: Visions of Futuristic Methodologies, Melbourne: RMIT University, 2004.

Frey, J. "A ranking method based on minimizing the number of in-sample errors", The American Statistician, 59: 207-216, 2005.

Frey, J. and Cressie, N. "Some results on constrained Bayes estimators", Statistics and Probability Letters, 65: 389-399, 2003.
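The first problem described in the abstract, computing the probability that fully specified independent random variables fall in a particular order, can be sanity-checked by simulation. The sketch below is an editor's illustration only, not the exact or extrapolation algorithms developed in the dissertation; it estimates P(X_1 ≤ X_2 ≤ X_3) for the Beta example that appears in the list of tables below, where the exact value is 64/105.

```python
# Monte Carlo check of an ordering probability: P(X1 <= X2 <= X3) for
# independent Xi ~ Beta(i, 4 - i), i = 1, 2, 3. Illustrative only; the
# dissertation computes such probabilities exactly or by extrapolation.
import numpy as np

rng = np.random.default_rng(0)
n_rep = 1_000_000

x1 = rng.beta(1, 3, n_rep)
x2 = rng.beta(2, 2, n_rep)
x3 = rng.beta(3, 1, n_rep)

p_hat = np.mean((x1 <= x2) & (x2 <= x3))
print(f"Monte Carlo estimate: {p_hat:.4f}  (exact value 64/105 = {64/105:.4f})")
```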
FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
2. Computational Algorithms
   2.1 Introduction
   2.2 The Exact Algorithm
   2.3 Extending the Algorithm to More General Cases
   2.4 A Product Trapezoid Rule Approach
   2.5 Some Applications of the General Algorithm
3. Optimal Distribution-free Confidence Bands for a CDF
   3.1 An Overview of the Literature
   3.2 A Class of Distribution-free Confidence Bands
   3.3 A Class of Optimality Criteria
   3.4 Shape and Power Comparisons for the Confidence Band Procedures
   3.5 Optimal Confidence Bands in the Ranked-set Sampling Case
   3.6 A Comparison of RSS-based Confidence Bands
4. Intentionally Representative Sampling
   4.1 Introduction
   4.2 Motivation
   4.3 The Sampling Procedure
   4.4 A Rank-Based Implementation Method
   4.5 Comparisons of Estimator Performance
   4.6 Discussion and Conclusions
5. Imperfect Rankings Models for Ranked-set Sampling
   5.1 Introduction
   5.2 A New Class of Models
   5.3 Connection with More General Models
   5.4 Uniform Approximation Results
   5.5 A Method for Selecting One-Parameter Families
   5.6 Conclusions
6. Summary and Future Work

Appendices:

A. R Code

Bibliography

LIST OF TABLES

2.1 Simultaneous 92.4% nonparametric confidence intervals for the deciles of an unknown continuous distribution when the sample size is n = 40.
2.2 The triangular array of approximations obtained via Romberg integration.
2.3 Relative errors in computing the probability P(X_1 ≤ X_2 ≤ X_3), where X_i ~ Beta(i, 4 − i), i = 1, ..., 3. The true probability is 64/105.
2.4 Relative errors in computing the probability P(X_1 ≤ ··· ≤ X_10), where X_i ~ Beta(i, 11 − i), i = 1, ..., 10. The true probability is approximately 0.00836.
2.5 Distribution of the ranks R_[1]1, R_[2]1, and R_[3]1 when m = 3, n_1 = n_2 = n_3 = 1, and the rankings are perfect.
2.6 Exact critical values (in scientific notation) and levels for the exact test. In cases where the levels 0.05 and 0.10 are not achieved (to four decimal places), the two bracketing values are given.
3.1 The values {p(t, i)} when the algorithm of Section 2.2 is applied to the example in Section 3.2.
3.2 The values {s_t(i, j)} when the algorithm of Section 2.2 is applied to the example in Section 3.2.
3.3 Values of the optimality criterion Σ_i (b_i − a_i) for 95% confidence bands when the set size is 3 and 1 cycle of observations is measured.
3.4 Values of the optimality criterion Σ_i (b_i − a_i) for 95% confidence bands when the set size is 2 and 2 cycles of observations are measured.
4.1 Distribution of the sample to be measured.
4.2 The groups identified by the rank algorithm when n = 10 and k = 3.
4.3 Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators.
4.4 Simulated efficiencies (relative to SRS) for the IRS and RSS mean estimators when diameter is used as a covariate for height. The value r is the set size used for RSS.

LIST OF FIGURES

2.1 The weights θ_{i,j} for the case in which n = 2 and M = 8.
3.1 Six different 95% confidence bands when n = 10. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.2 Six different 95% confidence bands when n = 20. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.3 Six different 95% confidence bands when n = 40. The bands are (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.4 Power curves for the 95% confidence bands when n = 10. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.5 Power curves for the 95% confidence bands when n = 20. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.6 Power curves for the 95% confidence bands when n = 40. The alternative is the Beta(η, η) distribution. The plots are for (a) the Kolmogorov-Smirnov band, (b) the Anderson-Darling band, (c) Owen's nonparametric likelihood band, (d) the uniform-weight optimal band, (e) an optimal band with weights emphasizing the extremes, and (f) an optimal band with weights de-emphasizing the extremes.
3.7 Optimal conditional 95% confidence bands based on RSS when the set size is 3 and 1 cycle of observations is measured. Each band is plotted against data taking on the expected values of the respective order statistics given the observed sequence of order statistics. Individual plots are labeled by the observed sequence of order statistics.
3.8 Optimal conditional 95% confidence bands based on RSS when the set size is 2 and 2 cycles of observations are measured.
Recommended publications
  • The Probability Lifesaver: Order Statistics and the Median Theorem
    The Probability Lifesaver: Order Statistics and the Median Theorem. Steven J. Miller, December 30, 2015.

    Contents
    1 Order Statistics and the Median Theorem
      1.1 Definition of the Median
      1.2 Order Statistics
      1.3 Examples of Order Statistics
      1.4 The Sample Distribution of the Median
      1.5 Technical Bounds for Proof of Median Theorem
      1.6 The Median of Normal Random Variables

    Greetings again! In this supplemental chapter we develop the theory of order statistics in order to prove the Median Theorem. This is a beautiful result in its own right, but also extremely important as a substitute for the Central Limit Theorem, and it allows us to say non-trivial things when the CLT is unavailable.

    Chapter 1: Order Statistics and the Median Theorem

    The Central Limit Theorem is one of the gems of probability. It's easy to use and its hypotheses are satisfied in a wealth of problems. Many courses build towards a proof of this beautiful and powerful result, as it truly is 'central' to the entire subject. Not to detract from the majesty of this wonderful result, however, what happens in those instances where it's unavailable? For example, one of the key assumptions that must be met is that our random variables need to have finite higher moments, or at the very least a finite variance. What if we were to consider sums of Cauchy random variables? Is there anything we can say? This is not just a question of theoretical interest, of mathematicians generalizing for the sake of generalization. The following example from economics highlights why this chapter is more than just of theoretical interest.
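This is exactly the setting where the sample median earns its keep. The following small simulation is an editor's sketch, not part of the excerpt: for Cauchy samples the sample mean never settles down, while the sample median concentrates around the location parameter, which is the behavior a median theorem makes precise.

```python
# Sketch: for Cauchy data the sample mean does not stabilize as n grows
# (no finite variance, so the CLT does not apply), but the sample median
# concentrates around the location parameter (0 here).
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    x = rng.standard_cauchy(n)       # location 0, scale 1
    print(f"n = {n:>9}: mean = {np.mean(x):9.3f}   median = {np.median(x):7.3f}")
```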
  • Lecture 14: Order Statistics; Conditional Expectation
    Department of Mathematics, Ma 3/103, KC Border. Introduction to Probability and Statistics, Winter 2017. Lecture 14: Order Statistics; Conditional Expectation. Relevant textbook passages: Pitman [5]: Section 4.6; Larsen–Marx [2]: Section 3.11.

    14.1 Order statistics

    Given a random vector (X_1, ..., X_n) on the probability space (S, E, P), for each s ∈ S, sort the components into a vector (X_(1)(s), ..., X_(n)(s)) satisfying X_(1)(s) ≤ X_(2)(s) ≤ ··· ≤ X_(n)(s). The vector (X_(1), ..., X_(n)) is called the vector of order statistics of (X_1, ..., X_n). Equivalently,

        X_(k) = min{ max{X_j : j ∈ J} : J ⊂ {1, ..., n} and |J| = k }.

    Order statistics play an important role in the study of auctions, among other things.

    14.2 Marginal Distribution of Order Statistics

    From here on, we shall assume that the original random variables X_1, ..., X_n are independent and identically distributed. Let X_1, ..., X_n be independent and identically distributed random variables with common cumulative distribution function F, and let (X_(1), ..., X_(n)) be the vector of order statistics of X_1, ..., X_n. By breaking the event {X_(k) ≤ x} into simple disjoint subevents, we get

        {X_(k) ≤ x} = {X_(n) ≤ x}
                      ∪ {X_(n) > x, X_(n−1) ≤ x}
                      ∪ ...
                      ∪ {X_(n) > x, ..., X_(j+1) > x, X_(j) ≤ x}
                      ∪ ...
                      ∪ {X_(n) > x, ..., X_(k+1) > x, X_(k) ≤ x}.

    Each of these subevents is disjoint from the ones above it, and each has a binomial probability: {X_(n) > x, ...
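Carrying that decomposition through gives the standard marginal CDF P(X_(k) ≤ x) = Σ_{j=k}^{n} C(n, j) F(x)^j (1 − F(x))^(n−j). The small sketch below is an editor's addition, not part of the lecture notes; it checks the formula against simulation for Uniform(0, 1) samples, where F(x) = x.

```python
# Sketch: marginal CDF of the k-th order statistic,
#   P(X_(k) <= x) = sum_{j=k}^{n} C(n, j) F(x)^j (1 - F(x))^(n - j),
# checked against simulation for Uniform(0, 1) samples (so F(x) = x).
from math import comb
import numpy as np

def order_stat_cdf(k, n, Fx):
    return sum(comb(n, j) * Fx**j * (1 - Fx)**(n - j) for j in range(k, n + 1))

n, k, x = 10, 3, 0.4
exact = order_stat_cdf(k, n, x)

rng = np.random.default_rng(2)
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
simulated = np.mean(samples[:, k - 1] <= x)

print(f"exact {exact:.4f}  vs  simulated {simulated:.4f}")
```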
  • Medians and Order Statistics
    Medians and Order Statistics. CLRS Chapter 9.

    What Are Order Statistics?

    The k-th order statistic is the k-th smallest element of an array.

        3 4 13 14 23 27 41 54 65 75      (the 8th order statistic here is 54)

    The lower median is the ⌊(n + 1)/2⌋-th order statistic, and the upper median is the ⌈(n + 1)/2⌉-th order statistic. If n is odd, the lower and upper median are the same.

        3 4 13 14 23 27 41 54 65 75      (lower median 23, upper median 27)

    What are Order Statistics? Selecting the ith-ranked item from a collection.
    - First: i = 1
    - Last: i = n
    - Median(s): i = ⌊(n + 1)/2⌋, ⌈(n + 1)/2⌉

    Order Statistics Overview
    - Assume the collection is unordered, otherwise trivial: the ith order statistic is just A[i].
    - Can sort first: Θ(n lg n), but we can do better: Θ(n).
    - We can find the max and min in Θ(n) time (obvious).
    - Can we find any order statistic in linear time? (not obvious!)

    How can we modify Quicksort to obtain expected-case Θ(n)? Pivot, partition, but recur only on one set of data. No join.

    Using the Pivot Idea

    Randomized-Select(A[p..r], i)          // looking for the ith order statistic
        if p = r
            return A[p]
        q <- Randomized-Partition(A, p, r)
        k <- q - p + 1                     // the size of the left partition
        if i = k                           // the pivot value is the answer
            return A[q]
        else if i < k                      // the answer is in the front half
            return Randomized-Select(A, p, q-1, i)
        else                               // the answer is in the back half
            return Randomized-Select(A, q+1, r, i-k)

    Randomized Selection: analyzing Randomized-Select()
    - Worst case: the partition is always 0 : n−1, so T(n) = T(n−1) + O(n) = O(n^2). No better than sorting!
    - "Best" case: suppose a 9:1 partition, so T(n) = T(9n/10)
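For reference, here is a compact runnable version of the randomized selection idea above. This is an editor's sketch: the partition step is a simple Lomuto-style variant and is not claimed to match CLRS's Randomized-Partition line for line.

```python
# Randomized selection (quickselect) sketch: expected O(n) time to find the
# i-th smallest element (1-indexed), following the partition-and-recurse idea
# in the slides above.
import random

def randomized_select(a, i):
    """Return the i-th smallest element of a (1-indexed)."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)          # random pivot index
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for j in range(lo, hi):             # Lomuto partition around the pivot
            if a[j] < pivot:
                a[j], a[store] = a[store], a[j]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # put the pivot in its final place
        k = store - lo + 1                  # rank of the pivot within a[lo..hi]
        if i == k:
            return a[store]
        elif i < k:
            hi = store - 1                  # answer is in the front part
        else:
            lo, i = store + 1, i - k        # answer is in the back part

data = [3, 4, 13, 14, 23, 27, 41, 54, 65, 75]
print(randomized_select(data, 8))           # prints 54, the 8th order statistic
```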
  • Statistical Significance and Statistical Error in Antitrust Analysis
    STATISTICAL SIGNIFICANCE AND STATISTICAL ERROR IN ANTITRUST ANALYSIS
    PHILLIP JOHNSON, EDWARD LEAMER, JEFFREY LEITZINGER*

    Proof of antitrust impact and estimation of damages are central elements in antitrust cases. Generally, more is needed for these purposes than simple observational evidence regarding changes in price levels over time. This is because changes in economic conditions unrelated to the behavior at issue also may play a role in observed outcomes. For example, prices of consumer electronics have been falling for several decades because of technological progress. Against that backdrop, a successful price-fixing conspiracy may not lead to observable price increases but only slow their rate of decline. Therefore, proof of impact and estimation of damages often amounts to sorting out the effects on market outcomes of illegal behavior from the effects of other market supply and demand factors.

    Regression analysis is a statistical technique widely employed by economists to identify the role played by one factor among those that simultaneously determine market outcomes. In this way, regression analysis is well suited to proof of impact and estimation of damages in antitrust cases. For that reason, regression models have become commonplace in antitrust litigation.1

    In our experience, one aspect of regression results that often attracts specific attention in that environment is the statistical significance of the estimates. As is discussed below, some courts, participants in antitrust litigation, and commentators maintain that stringent levels of statistical significance should be a threshold requirement for the results of regression analysis to be used as evidence regarding impact and damages.
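To make the "sorting out" concrete, here is an illustrative regression on simulated data; this is an editor's sketch, not from the article. Price is regressed on a time trend plus an indicator for a hypothetical conspiracy period, so the indicator's coefficient isolates the alleged overcharge from the technology-driven decline.

```python
# Illustrative sketch: separating a downward technology-driven price trend
# from a hypothetical conspiracy-period effect with a simple regression.
# All data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(5)
months = np.arange(120)
conspiracy = (months >= 60).astype(float)            # hypothetical cartel period
price = 100 - 0.4 * months + 8.0 * conspiracy + rng.normal(0, 2, 120)

X = np.column_stack([np.ones_like(months, dtype=float), months, conspiracy])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"trend per month: {coef[1]:.2f}, estimated conspiracy effect: {coef[2]:.2f}")
```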
  • Exponential Families and Theoretical Inference
    EXPONENTIAL FAMILIES AND THEORETICAL INFERENCE
    Bent Jørgensen and Rodrigo Labouriau, August 2012

    Contents

    Preface
    Preface to the Portuguese edition
    1 Exponential families
      1.1 Definitions
      1.2 Analytical properties of the Laplace transform
      1.3 Estimation in regular exponential families
      1.4 Marginal and conditional distributions
      1.5 Parametrizations
      1.6 The multivariate normal distribution
      1.7 Asymptotic theory
        1.7.1 Estimation
        1.7.2 Hypothesis testing
      1.8 Problems
    2 Sufficiency and ancillarity
      2.1 Sufficiency
        2.1.1 Three lemmas
        2.1.2 Definitions
        2.1.3 The case of equivalent measures
        2.1.4 The general case
        2.1.5 Completeness
        2.1.6 A result on separable σ-algebras
        2.1.7 Sufficiency of the likelihood function
        2.1.8 Sufficiency and exponential families
      2.2 Ancillarity
        2.2.1 Definitions
        2.2.2 Basu's Theorem
      2.3 First-order ancillarity
        2.3.1 Examples
        2.3.2 Main results
      2.4 Problems
    3 Inferential separation
      3.1 Introduction
        3.1.1 S-ancillarity
        3.1.2 The nonformation principle
        3.1.3 Discussion
      3.2 S-nonformation
        3.2.1 Definition
        3.2.2 S-nonformation in exponential families
      3.3 G-nonformation
        3.3.1 Transformation models
        3.3.2 Definition of G-nonformation
        3.3.3 Cox's proportional risks model
  • Statistical Proof of Discrimination: Beyond "Damned Lies"
    Washington Law Review, Volume 68, Number 3, 7-1-1993. Statistical Proof of Discrimination: Beyond "Damned Lies". Kingsley R. Browne.

    Follow this and additional works at: https://digitalcommons.law.uw.edu/wlr (Part of the Labor and Employment Law Commons).

    Recommended Citation: Kingsley R. Browne, Statistical Proof of Discrimination: Beyond "Damned Lies", 68 Wash. L. Rev. 477 (1993). Available at: https://digitalcommons.law.uw.edu/wlr/vol68/iss3/2

    This Article is brought to you for free and open access by the Law Reviews and Journals at UW Law Digital Commons. It has been accepted for inclusion in Washington Law Review by an authorized editor of UW Law Digital Commons. For more information, please contact [email protected]. Copyright © 1993 by Washington Law Review Association.

    STATISTICAL PROOF OF DISCRIMINATION: BEYOND "DAMNED LIES"
    Kingsley R. Browne*

    Abstract: Evidence that an employer's work force contains fewer minorities or women than would be expected if selection were random with respect to race and sex has been taken as powerful, and often sufficient, evidence of systematic intentional discrimination. In relying on this kind of statistical evidence, courts have made two fundamental errors. The first error is assuming that statistical analysis can reveal the probability that observed work-force disparities were produced by chance. This error leads courts to exclude chance as a cause when such a conclusion is unwarranted. The second error is assuming that, except for random deviations, the work force of a nondiscriminating employer would mirror the racial and sexual composition of the relevant labor force. This assumption has led courts inappropriately to shift the burden of proof to employers in pattern-or-practice cases once a statistical disparity is shown.
  • Exploratory Versus Confirmative Testing in Clinical Trials
    Gaus et al., Clin Exp Pharmacol 2015, 5:4. DOI: 10.4172/2161-1459.1000182. Journal of Clinical and Experimental Pharmacology, ISSN: 2161-1459. Research Article, Open Access.

    Interpretation of Statistical Significance - Exploratory Versus Confirmative Testing in Clinical Trials, Epidemiological Studies, Meta-Analyses and Toxicological Screening (Using Ginkgo biloba as an Example)
    Wilhelm Gaus, Benjamin Mayer and Rainer Muche, Institute for Epidemiology and Medical Biometry, University of Ulm, Germany

    *Corresponding author: Wilhelm Gaus, University of Ulm, Institute for Epidemiology and Medical Biometry, Schwabstrasse 13, 89075 Ulm, Germany. Tel: +49 731 500-26891; Fax: +40 731 500-26902; E-mail: [email protected]. Received date: June 18, 2015; Accepted date: July 23, 2015; Published date: July 27, 2015.

    Copyright: © 2015 Gaus W, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

    Abstract: The terms "significant" and "p-value" are important for biomedical researchers and readers of biomedical papers, including pharmacologists. No other statistical result is misinterpreted as often as p-values. In this paper the issue of exploratory versus confirmative testing is discussed in general. A significant p-value sometimes leads to a precise hypothesis (exploratory testing); sometimes it is interpreted as "statistical proof" (confirmative testing). A p-value may only be interpreted as confirmative if (1) the hypothesis and the level of significance were established a priori and (2) an adjustment for multiple testing was carried out if more than one test was performed.
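Point (2) above is easy to make concrete. The sketch below is an editor's illustration with made-up p-values, not data from the paper; it applies the simplest multiple-testing adjustment, Bonferroni, before declaring any test confirmative.

```python
# Sketch: Bonferroni adjustment, the simplest correction for multiple testing,
# applied to hypothetical raw p-values before any confirmative interpretation.
p_values = [0.012, 0.034, 0.20, 0.003]   # hypothetical raw p-values
alpha = 0.05
m = len(p_values)

for i, p in enumerate(p_values, start=1):
    adjusted = min(1.0, p * m)           # Bonferroni-adjusted p-value
    verdict = "significant" if adjusted < alpha else "not significant"
    print(f"test {i}: raw p = {p:.3f}, adjusted p = {adjusted:.3f} -> {verdict}")
```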
  • Decomposing High-Order Statistics for Sensitivity Analysis
    Center for Turbulence Research, Annual Research Briefs 2014. Decomposing high-order statistics for sensitivity analysis. By G. Geraci, P.M. Congedo† and G. Iaccarino.

    1. Motivation and objectives

    Sensitivity analysis in the presence of uncertainties in operating conditions, material properties, and manufacturing tolerances poses a tremendous challenge to the scientific computing community. In particular, in realistic situations, the presence of a large number of uncertain inputs complicates the task of propagation and assessment of output uncertainties; many of the popular techniques, such as stochastic collocation or polynomial chaos, lead to exponentially increasing costs, thus making these methodologies unfeasible (Foo & Karniadakis 2010). Handling uncertain parameters becomes even more challenging when robust design optimization is of interest (Kim et al. 2006; Eldred 2009). One of the alternative solutions for reducing the cost of Uncertainty Quantification (UQ) methods is based on approaches attempting to identify the relative importance of the input uncertainties.

    In the literature, global sensitivity analysis (GSA) aims at quantifying how uncertainties in the input parameters of a model contribute to the uncertainties in its output (Borgonovo et al. 2003). Traditionally, GSA is performed using methods based on the decomposition of the output variance (Sobol 2001), i.e., ANalysis Of VAriance, ANOVA. The ANOVA approach involves splitting a multi-dimensional function into its contributions from different groups of dependent variables. The ANOVA-based analysis creates a hierarchy of dominant input parameters, for a given output, when variations are computed in terms of variance. A limitation of this approach is that it is based on the variance, which might not be a sufficient indicator of the overall output variations.
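For readers unfamiliar with variance-based GSA, the following toy sketch is an editor's addition; the model, inputs, and estimator choice are illustrative assumptions. It estimates first-order Sobol indices with a simple pick-freeze estimator for Y = X1 + 2·X2, where analytically S1 = 0.2 and S2 = 0.8.

```python
# Sketch: first-order Sobol (variance-based) sensitivity indices for a toy
# model Y = X1 + 2*X2 with independent Uniform(0, 1) inputs, estimated by the
# pick-freeze idea: S_i = Cov(Y, Y_i) / Var(Y), where Y_i reuses input i but
# redraws the other inputs. Analytically S1 = 0.2 and S2 = 0.8 here.
import numpy as np

def model(x1, x2):
    return x1 + 2.0 * x2

rng = np.random.default_rng(3)
n = 200_000
a = rng.uniform(size=(n, 2))
b = rng.uniform(size=(n, 2))

y = model(a[:, 0], a[:, 1])
var_y = np.var(y)

for i in range(2):
    mixed = b.copy()
    mixed[:, i] = a[:, i]                  # freeze input i, redraw the rest
    y_i = model(mixed[:, 0], mixed[:, 1])
    s_i = np.cov(y, y_i)[0, 1] / var_y
    print(f"S{i + 1} ~= {s_i:.3f}")
```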
  • Speech Detection Using High Order Statistic (Skewness)
    Speech Detection Using High Order Statistic (Skewness)
    Juraj Kačur (1), Matúš Jurečka (2)
    1) Slovak University of Technology, Faculty of Electrical Engineering and Information Technology, Department of Telecommunications, Ilkovičová 3, Bratislava, Slovakia, e-mail: [email protected], phone: +421-2-68279416
    2) Slovak University of Technology, Faculty of Electrical Engineering and Information Technology, Department of Telecommunications, Ilkovičová 3, Bratislava, Slovakia, e-mail: [email protected]

    Abstract: In this article we discuss one of many possible applications of high order statistics (HOS) to the speech detection problem. As a speech feature we used skewness, or the third-order cumulant with two zero lags, in a simple voiced speech detection algorithm. We present and compare its properties with the energy feature in symmetric noises, speech signals, and various signal to noise ratios (SNR). These results proved the skewness feature to be much more suitable than energy for detection purposes, given those environment settings. Furthermore, we executed detection accuracy tests, which showed skewness outperforming the classical approach, especially at low SNRs.

    ... process. Another detection problem is posed by the speech itself, containing high energy vowels of various lengths as well as unvoiced consonants of low energy and higher frequency components; even intervals of silence are regular parts of speech. The situation dramatically deteriorates if there is some background noise mixed up with the useful signal. It is important to stress that noises can have almost any spectral and time characteristic, which makes their separation from speech sometimes almost impossible. The process of speech detection includes: finding proper speech features distinguishing speech from noise, evaluation of their behaviour in time, and decision taking
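The skewness feature itself is simple to compute per frame. The sketch below is an editor's illustration; the frame length and the synthetic "voiced-like" signal are assumptions, not the paper's data. It contrasts the near-zero skewness of a symmetric noise frame with the clearly nonzero skewness of an asymmetric frame.

```python
# Sketch: frame-level skewness (normalized third-order cumulant with zero lags)
# versus energy. Symmetric noise has skewness near 0, while an asymmetric
# "voiced-like" frame does not. Frame length and signals are illustrative.
import numpy as np

def skewness(frame):
    centered = frame - frame.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)          # third-order cumulant with zero lags
    return m3 / (m2 ** 1.5 + 1e-12)

rng = np.random.default_rng(4)
noise_frame = rng.normal(0.0, 1.0, 400)          # symmetric background noise
voiced_like = rng.gamma(2.0, 1.0, 400) - 2.0     # asymmetric, speech-like stand-in

for name, frame in [("noise", noise_frame), ("voiced-like", voiced_like)]:
    print(f"{name:12s} energy = {np.mean(frame ** 2):5.2f}   "
          f"skewness = {skewness(frame):5.2f}")
```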
  • The Multivariate Normal Distribution
    The Multivariate Normal Distribution

    Why should we consider the multivariate normal distribution? It would seem that applied problems are so complex that it would only be interesting from a mathematical perspective.

    1. It is mathematically tractable for a large number of problems, and, therefore, progress towards answers to statistical questions can be provided, even if only approximately so.
    2. Because it is tractable for so many problems, it provides insight into techniques based upon other distributions or even non-parametric techniques. For this, it is often a benchmark against which other methods are judged.
    3. For some problems it serves as a reasonable model of the data. In other instances, transformations can be applied to the set of responses to have the set conform well to multivariate normality.
    4. The sampling distributions of many (multivariate) statistics are normal, regardless of the parent distribution (Multivariate Central Limit Theorems). Thus, for large sample sizes, we may be able to make use of results from the multivariate normal distribution to answer our statistical questions, even when the parent distribution is not multivariate normal.

    Consider first the univariate normal distribution with parameters µ (the mean) and σ² (the variance) for the random variable x,

        f(x) = (2πσ²)^(−1/2) exp( −(x − µ)² / (2σ²) ),    (1)

    for −∞ < x < ∞, −∞ < µ < ∞, and σ² > 0. Now rewrite the exponent (x − µ)²/σ² using the linear algebra formulation (x − µ)(σ²)⁻¹(x − µ). This formulation matches that for the generalized or Mahalanobis squared distance (x − µ)′Σ⁻¹(x − µ), where both x and µ are vectors.
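In the multivariate case the exponent becomes exactly this Mahalanobis squared distance with a covariance matrix Σ. The short sketch below is an editor's addition with illustrative numbers; it evaluates the multivariate normal density that way.

```python
# Sketch: evaluating a multivariate normal density via the Mahalanobis squared
# distance (x - mu)' Sigma^{-1} (x - mu), the quadratic form built up above.
import numpy as np

def mvn_density(x, mu, sigma):
    d = len(mu)
    diff = x - mu
    maha_sq = diff @ np.linalg.solve(sigma, diff)      # Mahalanobis squared distance
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])
print(mvn_density(x, mu, sigma))
```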
  • Algorithms Chapter 9 Medians and Order Statistics
    Algorithms, Chapter 9: Medians and Order Statistics
    Assistant Professor: Ching-Chi Lin (林清池), [email protected], Department of Computer Science and Engineering, National Taiwan Ocean University

    Outline
    • Minimum and maximum
    • Selection in expected linear time
    • Selection in worst-case linear time

    Order statistics
    • The ith order statistic of a set of n elements is the ith smallest element.
    • The minimum is the first order statistic (i = 1).
    • The maximum is the nth order statistic (i = n).
    • A median is the "halfway point" of the set.
    • When n is odd, the median is unique, at i = (n + 1)/2.
    • When n is even, there are two medians: the lower median, i = ⌊(n + 1)/2⌋, and the upper median, i = ⌈(n + 1)/2⌉.
    • We mean the lower median when we use the phrase "the median".

    The selection problem
    • How can we find the ith order statistic of a set, and what is the running time?
    • Input: A set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n.
    • Output: The element x ∈ A that is larger than exactly i−1 other elements of A.
    • The selection problem can be solved in O(n lg n) time: sort the numbers using an O(n lg n)-time algorithm, such as heapsort or merge sort, then return the ith element in the sorted array.
    • Are there faster algorithms? An O(n)-time algorithm will be presented in this chapter.

    Finding the minimum
    • We can easily obtain an upper bound of n−1 comparisons for finding the minimum of a set of n elements.
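The n−1 comparison bound in the last bullet is worth seeing explicitly. The tiny sketch below is an editor's addition that finds the minimum while counting the comparisons it performs.

```python
# Sketch: finding the minimum of n elements with exactly n - 1 comparisons,
# counting the comparisons explicitly.
def find_min_counting(a):
    comparisons = 0
    best = a[0]
    for x in a[1:]:
        comparisons += 1
        if x < best:
            best = x
    return best, comparisons

values = [23, 41, 3, 65, 14, 27, 13, 4, 54, 75]
minimum, used = find_min_counting(values)
print(f"minimum = {minimum}, comparisons = {used}  (n - 1 = {len(values) - 1})")
```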
  • Concomitants of Order Statistics Shie-Shien Yang Iowa State University
    Iowa State University Capstones, Theses and Dissertations. Retrospective Theses and Dissertations. 1976.

    Concomitants of order statistics. Shie-Shien Yang, Iowa State University.

    Follow this and additional works at: https://lib.dr.iastate.edu/rtd (Part of the Statistics and Probability Commons).

    Recommended Citation: Yang, Shie-Shien, "Concomitants of order statistics" (1976). Retrospective Theses and Dissertations. 5720. https://lib.dr.iastate.edu/rtd/5720

    This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

    INFORMATION TO USERS

    This material was produced from a microfilm copy of the original document. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the original submitted. The following explanation of techniques is provided to help you understand markings or patterns which may appear on this reproduction.

    1. The sign or "target" for pages apparently lacking from the document photographed is "Missing Page(s)". If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting thru an image and duplicating adjacent pages to insure you complete continuity.

    2. When an image on the film is obliterated with a large round black mark, it is an indication that the photographer suspected that the copy may have moved during exposure and thus cause a blurred image.