Adaptive Mixtures of Student-t Distributions by David Ardia, Lennart F. Hoogerheide and Herman K. van Dijk

Introduction

This note presents the package AdMit (Ardia et al., 2008, 2009), an R implementation of the adaptive mixture of Student-t distributions (AdMit) procedure developed by Hoogerheide (2006); see also Hoogerheide et al. (2007) and Hoogerheide and van Dijk (2008). The AdMit strategy consists of the construction of a mixture of Student-t distributions which approximates a target distribution of interest. The fitting procedure relies only on a kernel of the target density, so that the normalizing constant is not required. In a second step, this approximation is used as an importance function in importance sampling or as a candidate density in the independence chain Metropolis-Hastings (M-H) algorithm to estimate characteristics of the target density. The estimation procedure is fully automatic and thus avoids the difficult task, especially for non-experts, of tuning a sampling algorithm. Typically, the target is a posterior distribution in a Bayesian analysis, where we indeed often only know a kernel of the posterior density.

In a standard case of importance sampling or the independence chain M-H algorithm, the candidate density is unimodal. If the target distribution is multimodal then some draws may have huge weights in the importance sampling approach and a second mode may be completely missed in the M-H strategy. As a consequence, the convergence behavior of these Monte Carlo integration methods is rather uncertain. Thus, an important problem is the choice of the importance or candidate density, especially when little is known a priori about the shape of the target density. For both importance sampling and the independence chain M-H, it holds that the candidate density should be close to the target density, and it is especially important that the tails of the candidate should not be thinner than those of the target.

Hoogerheide (2006) and Hoogerheide et al. (2007) mention several reasons why mixtures of Student-t distributions are natural candidate densities. First, they can provide an accurate approximation to a wide variety of target densities, with substantial skewness and high kurtosis. Furthermore, they can deal with multi-modality and with non-elliptical shapes due to asymptotes. Second, this approximation can be constructed in a quick, iterative procedure and a mixture of Student-t distributions is easy to sample from. Third, the Student-t distribution has fatter tails than the Normal distribution; especially if one specifies Student-t distributions with few degrees of freedom, the risk is small that the tails of the candidate are thinner than those of the target distribution. Finally, Zeevi and Meir (1997) showed that under certain conditions any density function may be approximated to arbitrary accuracy by a convex combination of basis densities; the mixture of Student-t distributions falls within their framework.

The package AdMit consists of three main functions: AdMit, AdMitIS and AdMitMH. The first one allows the user to fit a mixture of Student-t distributions to a given density through its kernel function. The next two functions perform importance sampling and independence chain M-H sampling using the fitted mixture estimated by AdMit as the importance or candidate density, respectively.
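In outline, a typical session therefore chains the three functions: AdMit fits the mixture, and the result is then handed, together with the same kernel, to AdMitIS or AdMitMH. The sketch below is only meant to convey this division of labour: the toy Normal kernel stands in for the kernel coded in the Illustration section below, and the way the fitted mixture is passed on (assumed here to be through a component and argument called mit) is an assumption rather than something documented in this excerpt; the exact call signatures are given in the package documentation.

> library("AdMit")
> ## toy target for this sketch only: log-kernel of a bivariate standard Normal
> toyKernel <- function(x, log = TRUE)
+ {
+   if (is.vector(x))
+     x <- matrix(x, nrow = 1)
+   r <- -0.5 * rowSums(x^2)
+   if (!log)
+     r <- exp(r)
+   as.vector(r)
+ }
> ## step 1: fit the mixture of Student-t densities to the kernel
> fit <- AdMit(KERNEL = toyKernel, mu0 = c(0, 0))
> ## step 2: reuse the fitted mixture for importance sampling or independence
> ## chain M-H sampling ('mit' component/argument assumed; see the package help)
> outIS <- AdMitIS(KERNEL = toyKernel, mit = fit$mit)
> outMH <- AdMitMH(KERNEL = toyKernel, mit = fit$mit)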
To illustrate the use of the package, we apply the AdMit methodology to a bivariate bimodal distribution. We describe the use of the functions provided by the package and document the ability and relevance of the methodology to reproduce the shape of non-elliptical distributions.

Illustration

This section presents the functions provided by the package AdMit with an illustration of a bivariate bimodal distribution. This distribution belongs to the class of conditionally Normal distributions proposed by Gelman and Meng (1991) with the property that the joint density is not Normal. It is not a posterior distribution, but it is chosen because it is a simple distribution with non-elliptical shapes so that it allows for an easy illustration of the AdMit approach.

Let X1 and X2 be two random variables, for which X1 is Normally distributed given X2 and vice versa. Then, the joint distribution, after location and scale transformations in each variable, can be written as (see Gelman and Meng, 1991):

    p(x) ∝ k(x) = exp( -1/2 [ A x1^2 x2^2 + x1^2 + x2^2 - 2 B x1 x2 - 2 C1 x1 - 2 C2 x2 ] )    (1)

where p(x) denotes a density, k(x) a kernel and x = (x1, x2)' for notational purposes. A, B, C1 and C2 are constants; we consider an asymmetric case in which A = 5, B = 5, C1 = 3, C2 = 3.5 in what follows.

The adaptive mixture approach determines the number of mixture components H, the mixing probabilities, the modes and scale matrices of the components in such a way that the mixture density q(x) approximates the target density p(x) of which we only know a kernel function k(x) with x ∈ R^d. Typically, k(x) will be a posterior density kernel for a vector of model parameters x.

The AdMit strategy consists of the following steps:

(0) Initialization: computation of the mode and scale matrix of the first component, and drawing a sample from this Student-t distribution;

(1) Iterate on the number of components: add a new component that covers a part of the space of x where the previous mixture density was relatively small, as compared to k(x);

(2) Optimization of the mixing probabilities;

(3) Drawing a sample {x[i]} from the new mixture q(x);

(4) Evaluation of importance sampling weights {k(x[i])/q(x[i])}. If the coefficient of variation, i.e. the standard deviation divided by the mean, of the weights has converged, then stop. Otherwise, go to step (1). By default, the adaptive procedure stops if the relative change in the coefficient of variation of the importance sampling weights caused by adding one new Student-t component to the candidate mixture is less than 10%.
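As a side note, the coefficient of variation that drives the stopping rule in step (4) is simple to compute. The stand-alone sketch below, with a Normal kernel and a Cauchy candidate chosen purely for illustration, shows the quantities involved; it is not the internal code of the package, and the relative-change criterion in the last comment is just one direct reading of the 10% rule.

> ## importance sampling weights and their coefficient of variation
> k <- function(x) exp(-0.5 * x^2)    # unnormalized kernel of a standard Normal
> q <- function(x) dt(x, df = 1)      # Cauchy candidate density
> set.seed(1)
> x <- rt(1e5, df = 1)                # draws from the candidate
> w <- k(x) / q(x)                    # weights k(x)/q(x); the missing integration
>                                     # constant rescales w but leaves CV unchanged
> CV <- sd(w) / mean(w)               # standard deviation divided by the mean
> ## the adaptive procedure stops adding components once, roughly,
> ## abs(CVnew - CVold) / CVold < 0.1  (the CVtol default mentioned below)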
There are two main reasons for using the coefficient of variation of the importance sampling weights as a measure of the fitting quality. First, it is a natural, intuitive measure of quality of the candidate as an approximation to the target. If the candidate and the target distributions coincide, all importance sampling weights are equal, so that the coefficient of variation is zero. For a poor candidate that does not even roughly approximate the target, some importance sampling weights are huge while most are (almost) zero, so that the coefficient of variation is high. The better the candidate approximates the target, the more evenly the weight is divided among the candidate draws, and the smaller the coefficient of variation of the importance sampling weights. Second, Geweke (1989) argues that a reasonable objective in the choice of an importance density is the minimization of the variance, or equivalently the coefficient of variation, of the importance weights. We prefer to quote the coefficient of variation because it does not depend on the number of draws or the integration constant.

By default, the algorithm uses the value one for the degrees of freedom of the Student-t components (i.e. the approximation density is a mixture of Cauchy distributions) since (i) it enables the method to deal with fat-tailed target distributions and (ii) it makes it easier for the iterative procedure to detect modes that are far apart. There may be a trade-off between efficiency and robustness at this point: setting the degrees of freedom to one reflects that we find a higher robustness more valuable than a slight possible gain in efficiency.

The core function provided by the package is the function AdMit. The main arguments of the function are: KERNEL, a kernel function of the target density on which the approximation is constructed. This function must contain the logical argument log. When log = TRUE, the function KERNEL returns the logarithm of the kernel function; this is used for numerical stability. mu0 is the starting value of the first stage optimization; it is a vector whose length corresponds to the length of the first argument in KERNEL. Sigma0 is the (symmetric positive definite) scale matrix of the first component. If a matrix is provided by the user, it is used as the scale matrix of the first component and then mu0 is used as the mode of the first component (instead of a starting value for optimization). control is a list of tuning parameters, containing in particular: df (default: 1), the degrees of freedom of the mixture components, and CVtol (default: 0.1), the tolerance of the relative change of the coefficient of variation. For further details, the reader is referred to the documentation manual (by typing ?AdMit) or to Ardia et al. (2009). Finally, the last argument of AdMit is ..., which allows the user to pass additional arguments to the function KERNEL.

Let us come back to our bivariate conditionally Normal distribution. First, we need to code the kernel:

> GelmanMeng <- function(x, log = TRUE)
+ {
+   if (is.vector(x))
+     x <- matrix(x, nrow = 1)
+   r <- -0.5 * ( 5 * x[,1]^2 * x[,2]^2
+                 + x[,1]^2 + x[,2]^2
+                 - 10 * x[,1] * x[,2]
+                 - 6 * x[,1] - 7 * x[,2] )
+   if (!log)
+     r <- exp(r)
+   as.vector(r)
+ }

Note that the argument log is set to TRUE by default so that the function outputs the logarithm of the kernel.
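With the kernel coded, fitting the adaptive mixture reduces to a single call to AdMit with the arguments described above. The call below is only indicative: the seed and the starting value mu0 are arbitrary illustrative choices, and Sigma0 or the control list could be supplied in the same way.

> set.seed(1234)
> ## fit the adaptive mixture of Student-t densities to the Gelman-Meng kernel
> outAdMit <- AdMit(KERNEL = GelmanMeng, mu0 = c(0.0, 0.1))
> ## the fitted mixture can then serve as the importance density in AdMitIS
> ## or as the candidate density in AdMitMH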