The Epic Story of Maximum Likelihood


Statistical Science 2007, Vol. 22, No. 4, 598–620
DOI: 10.1214/07-STS249
© Institute of Mathematical Statistics, 2007
arXiv:0804.2996v1 [stat.ME] 18 Apr 2008

The Epic Story of Maximum Likelihood

Stephen M. Stigler

Abstract. At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words "method of maximum likelihood" to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this "simple idea" is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonhard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from back well before Fisher to the time of Lucien Le Cam's dissertation. In the process Fisher's unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs discussed. In particular, Fisher's derivation of the information inequality is seen to be derived from his work on the analysis of variance, and his later approach via estimating functions was derived from Euler's Relation for homogeneous functions. The reaction to Fisher's work is reviewed, and some lessons drawn.

Key words and phrases: R. A. Fisher, Karl Pearson, Jerzy Neyman, Harold Hotelling, Abraham Wald, maximum likelihood, sufficiency, efficiency, superefficiency, history of statistics.

Stephen M. Stigler is the Ernest DeWitt Burton Distinguished Service Professor, Department of Statistics, University of Chicago, Chicago, Illinois 60637, USA e-mail: [email protected]. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2007, Vol. 22, No. 4, 598–620. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

In the 1860s a small group of young English intellectuals formed what they called the X Club. The name was taken as the mathematical symbol for the unknown, and the plan was to meet for dinner once a month and let the conversation take them where chance would have it. The group included the Darwinian biologist Thomas Henry Huxley and the social philosopher-scientist Herbert Spencer. One evening about 1870 they met for dinner at the Athenaeum Club in London, and that evening included one exchange that so struck those present that it was repeated on several occasions. Francis Galton was not present at the dinner, but he heard separate accounts from three men who were, and he recorded it in his own memoirs. As Galton reported it, during a pause in the conversation Herbert Spencer said, "You would little think it, but I once wrote a tragedy." Huxley answered promptly, "I know the catastrophe." Spencer declared it was impossible, for he had never spoken about it before then. Huxley insisted. Spencer asked what it was. Huxley replied, "A beautiful theory, killed by a nasty, ugly little fact" (Galton, 1908, page 258).

Huxley's description of a scientific tragedy is singularly appropriate for one telling of the history of maximum likelihood. The theory of maximum likelihood is very beautiful indeed: a conceptually simple approach to an amazingly broad collection of problems. This theory provides a simple recipe that purports to lead to the optimum solution for all parametric problems and beyond, and not only promises an optimum estimate, but also a simple all-purpose assessment of its accuracy. And all this comes with no need for the specification of a priori probabilities, and no complicated derivation of distributions. Furthermore, it is capable of being automated in modern computers and extended to any number of dimensions. But as in Huxley's quip about Spencer's unpublished tragedy, some would have it that this theory has been "killed by a nasty, ugly little fact," most famously by Joseph Hodges's elegant simple example in 1951, pointing to the existence of "superefficient" estimates (estimates with smaller asymptotic variances than the maximum likelihood estimate). See Figure 1. And then, just as with fatally wounded slaves in the Roman Colosseum, or fatally wounded bulls in a Spanish bullring, the theory was killed yet again, several times over by others, by ingenious examples of inconsistent maximum likelihood estimates.

Joe Hodges's Nasty, Ugly Little Fact (1951):

    T_n = X̄_n    if |X̄_n| ≥ n^(−1/4),
        = αX̄_n   if |X̄_n| < n^(−1/4).

Then √n(T_n − θ) is asymptotically N(0, 1) if θ ≠ 0, and asymptotically N(0, α²) if θ = 0. T_n is then "super-efficient" for θ = 0 if α² < 1.

Fig. 1. The example of a superefficient estimate due to Joseph L. Hodges, Jr. The example was presented in lectures in 1951, but was first published in Le Cam (1953). Here X̄_n is the sample mean of a random sample of size n from a N(θ, 1) population, with n Var(X̄_n) = 1 for all n, all θ (Bahadur, 1983; van der Vaart, 1997).

The full story of maximum likelihood is more complicated and less tragic than this simple account would have it. The history of maximum likelihood is more in the spirit of a Homeric epic, with long periods of peace punctuated by some small attacks building to major battles; a mixture of triumph and tragedy, all of this dominated by a few characters of heroic stature if not heroic temperament. For all its turbulent past, maximum likelihood has survived numerous assaults and remains a beautiful, if increasingly complicated theory. I propose to review that history, with a sketch of the conceptual problems of the early years and then a closer look at the bold claims of the 1920s and 1930s, and at the early arguments, some unpublished, that were devised to support them.

2. THE EARLY HISTORY OF MAXIMUM LIKELIHOOD

By the mid-1700s it seems to have become a commonplace among natural philosophers that problems of observational error were susceptible to mathematical description. There was essential agreement upon some elements of that description: errors, for want of a better assumption, were supposed equally able to be positive and negative, and large errors were expected to be less frequently encountered than small. Indeed, it was generally accepted that their frequency distribution followed a smooth symmetric curve. Even the goal of the observer was agreed upon: while the words employed varied, the observer sought the most probable position for the object of observation, be it a star declination or a geodetic location. But in the few serious attempts to treat this problem, the details varied in important ways. It was to prove quite difficult to arrive at a precise formulation that incorporated these elements, covered useful applications, and also permitted analysis.

There were early intelligent comments related to this problem already in the 1750s by Thomases Simpson and Bayes and by Johann Heinrich Lambert in 1760, but the first serious assault related to our topic was by Joseph Louis Lagrange in 1769 (Stigler, 1986, Chapter 2; 1999, Chapter 16; Sheynin, 1971; Hald, 1998, 2007). Lagrange postulated that observations varied about the desired mean according to a multinomial distribution, and in an analytical tour de force he showed that the probability of a set of observations was largest if the relative frequencies of the different possible values were used as the values of the probabilities. In modern terminology, he found that the maximum likelihood estimates of the multinomial probabilities are the sample relative frequencies. He concluded that the most probable value for the desired mean was then the mean value found from these probabilities, which is the arithmetic mean of the observations. It was only then, and contrary to modern practice, that Lagrange introduced the hypothesis that the multinomial probabilities followed a symmetric curve, and so he was left with only the problem of finding the probability distribution of the arithmetic mean when the error probabilities follow a curve. This he solved for several examples by introducing and using "Laplace Transforms." By introducing restrictions in the form of the curve only after deriving the estimates of probabilities, Lagrange's analysis had the curious consequence of always arriving at method of moment estimates, even though starting with maximum likelihood! (Lagrange, 1776; Stigler, 1999, Chapter 14; Hald, 1998, page 48.)

At about the same time, Daniel Bernoulli considered the problem in two successively very different ways. First, in 1769 he tried using the hypothesized curve as a weight function, in order to weight, then iteratively reweight and average the observations.

[…] the nineteenth century. By the end of that century this was sometimes known as the Gaussian method, and the approach became the staple of many textbooks, often without the explicit invocation of a uniform prior that Gauss had seen as needed to justify the procedure.

3. KARL PEARSON AND L. N. G. FILON

Over the 19th century, the theory of estimation generally remained around the level Laplace and […]
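As a modern aside (an illustration added here, not part of Stigler's article), Hodges's construction in Figure 1 is easy to check by simulation. The following minimal sketch, assuming Python with NumPy and taking α = 0.5, estimates n·Var(T_n) at θ = 0 and at θ = 1; near the origin the scaled variance drops to roughly α² = 0.25, while elsewhere it matches the value 1 attained by the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def hodges(xbar, n, alpha=0.5):
    # T_n = Xbar_n if |Xbar_n| >= n^(-1/4), else alpha * Xbar_n
    return np.where(np.abs(xbar) >= n ** (-0.25), xbar, alpha * xbar)

def scaled_variance(theta, n=10_000, reps=20_000, alpha=0.5):
    # Draw `reps` sample means of N(theta, 1) samples of size n directly
    # (Xbar_n ~ N(theta, 1/n)), apply Hodges's estimator, and estimate
    # the variance of sqrt(n) * (T_n - theta).
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)
    t = hodges(xbar, n, alpha)
    return n * np.mean((t - theta) ** 2)

v0 = scaled_variance(0.0)  # close to alpha^2 = 0.25: "superefficient" at theta = 0
v1 = scaled_variance(1.0)  # close to 1, the variance achieved by the MLE Xbar_n
print(v0, v1)
```

The same simulation, run at θ shrinking with n (for example θ = n^(−1/2)), would show the price of the trick: the risk of T_n is worse than that of X̄_n near, but not at, the origin.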
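Lagrange's multinomial result described above is, in modern notation, the statement that counts n_1, …, n_k with total n give likelihood proportional to p_1^{n_1}···p_k^{n_k}, maximized at p̂_i = n_i/n. A quick numerical check (again my illustration, assuming NumPy, with made-up counts) compares the closed form against random competitors on the probability simplex:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(p, counts):
    # Multinomial log-likelihood, dropping the constant combinatorial factor
    return np.sum(counts * np.log(p))

counts = np.array([5, 9, 3, 7])      # observed frequencies of 4 outcomes
p_hat = counts / counts.sum()        # Lagrange's answer: the relative frequencies

# No other probability vector attains a higher likelihood.
best = log_likelihood(p_hat, counts)
for _ in range(10_000):
    q = rng.dirichlet(np.ones(len(counts)))  # random candidate on the simplex
    assert log_likelihood(q, counts) <= best

print(p_hat)  # the relative frequencies 5/24, 9/24, 3/24, 7/24
```

The closed form follows from maximizing Σ n_i log p_i subject to Σ p_i = 1, a one-line Lagrange-multiplier computation that is itself a pleasant historical coincidence of names.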