Chapter 2: Maximum Likelihood Estimation Advanced Econometrics - HEC Lausanne


Christophe Hurlin (University of Orléans), December 9, 2013

Section 1. Introduction

Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a model, and it is one of the most widely used estimation methods. The method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data. Maximum likelihood estimation thus gives a unified approach to estimation.

What are the main properties of the maximum likelihood estimator?

- Is it asymptotically unbiased?
- Is it asymptotically efficient? Under which condition(s)?
- Is it consistent?
- What is its asymptotic distribution?

How does one apply the maximum likelihood principle to the multiple linear regression model, to the Probit/Logit models, etc.? All of these questions are answered in this lecture.

The outline of this chapter is the following:

- Section 2: The principle of maximum likelihood estimation
- Section 3: The likelihood function
- Section 4: The maximum likelihood estimator
- Section 5: Score, Hessian and Fisher information
- Section 6: Properties of maximum likelihood estimators

References

- Amemiya, T. (1985), Advanced Econometrics, Harvard University Press.
- Greene, W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall.
- Pelgrin, F. (2010), Lecture Notes, Advanced Econometrics, HEC Lausanne (with special thanks).
- Ruud, P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.
- Zivot, E. (2001), Maximum Likelihood Estimation, Lecture notes.

Section 2. The Principle of Maximum Likelihood

Objectives. In this section, we present a simple example in order:

1. To introduce the notations
2. To introduce the notions of likelihood and log-likelihood
3. To introduce the concept of maximum likelihood estimator
4. To introduce the concept of maximum likelihood estimate

Example. Suppose that $X_1, X_2, \ldots, X_N$ are i.i.d. discrete random variables, such that $X_i \sim \operatorname{Pois}(\theta)$, with a pmf (probability mass function) defined as:
$$\Pr(X_i = x_i) = \frac{\exp(-\theta)\,\theta^{x_i}}{x_i!}$$
where $\theta$ is an unknown parameter to estimate.
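As a quick numerical illustration (a sketch, not part of the original slides; it assumes Python with numpy and scipy), this pmf can be evaluated directly. The value $\theta = 2$ below is an arbitrary choice for display purposes only, since $\theta$ is unknown in the example:

```python
import numpy as np
from scipy.stats import poisson

theta = 2.0  # arbitrary illustrative value; theta is unknown in the example
x = np.arange(6)

# poisson.pmf(x, theta) evaluates exp(-theta) * theta**x / x!
for xi, p in zip(x, poisson.pmf(x, theta)):
    print(f"Pr(X = {xi}) = {p:.4f}")
```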
Question: What is the probability of observing the particular sample $\{x_1, x_2, \ldots, x_N\}$, assuming that a Poisson distribution with as yet unknown parameter $\theta$ generated the data? This probability is equal to:
$$\Pr\left((X_1 = x_1) \cap \ldots \cap (X_N = x_N)\right)$$

Since the variables $X_i$ are i.i.d., this joint probability is equal to the product of the marginal probabilities:
$$\Pr\left((X_1 = x_1) \cap \ldots \cap (X_N = x_N)\right) = \prod_{i=1}^{N} \Pr(X_i = x_i)$$

Given the pmf of the Poisson distribution, we have:
$$\Pr\left((X_1 = x_1) \cap \ldots \cap (X_N = x_N)\right) = \prod_{i=1}^{N} \frac{\exp(-\theta)\,\theta^{x_i}}{x_i!} = \exp(-\theta N)\,\frac{\theta^{\sum_{i=1}^{N} x_i}}{\prod_{i=1}^{N} x_i!}$$

Definition. This joint probability is a function of $\theta$ (the unknown parameter) and corresponds to the likelihood of the sample $\{x_1, \ldots, x_N\}$, denoted by
$$L_N(\theta; x_1, \ldots, x_N) = \Pr\left((X_1 = x_1) \cap \ldots \cap (X_N = x_N)\right)$$
with
$$L_N(\theta; x_1, \ldots, x_N) = \exp(-\theta N)\,\theta^{\sum_{i=1}^{N} x_i}\,\frac{1}{\prod_{i=1}^{N} x_i!}$$

Example. Let us assume that for $N = 10$ we have a realization of the sample equal to $\{5, 0, 1, 1, 0, 3, 2, 3, 4, 1\}$, so that $\sum_{i=1}^{N} x_i = 20$ and $\prod_{i=1}^{N} x_i! = 207{,}360$; then:
$$L_N(\theta; x_1, \ldots, x_N) = \frac{e^{-10\theta}\,\theta^{20}}{207{,}360}$$

Question: What value of $\theta$ would make this sample most probable?

[Figure: plot of $L_N(\theta; x)$ for values of $\theta$ between 0 and 4. The function has a single mode at $\theta = 2$, which would be the maximum likelihood estimate, or MLE, of $\theta$.]

Consider maximizing the likelihood function $L_N(\theta; x_1, \ldots, x_N)$ with respect to $\theta$. Since the log function is monotonically increasing, we usually maximize $\ln L_N(\theta; x_1, \ldots, x_N)$ instead. In this case:
$$\ln L_N(\theta; x_1, \ldots, x_N) = -\theta N + \ln(\theta) \sum_{i=1}^{N} x_i - \ln \prod_{i=1}^{N} x_i!$$
$$\frac{\partial \ln L_N(\theta; x_1, \ldots, x_N)}{\partial \theta} = -N + \frac{1}{\theta} \sum_{i=1}^{N} x_i$$
$$\frac{\partial^2 \ln L_N(\theta; x_1, \ldots, x_N)}{\partial \theta^2} = -\frac{1}{\theta^2} \sum_{i=1}^{N} x_i < 0$$

Under suitable regularity conditions, the maximum likelihood estimator (or estimate, for a given realization) is defined as:
$$\hat{\theta} = \arg\max_{\theta \in \mathbb{R}^+} \ln L_N(\theta; x_1, \ldots, x_N)$$
$$\text{FOC:}\quad \left.\frac{\partial \ln L_N(\theta; x_1, \ldots, x_N)}{\partial \theta}\right|_{\theta = \hat{\theta}} = -N + \frac{1}{\hat{\theta}} \sum_{i=1}^{N} x_i = 0 \iff \hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
$$\text{SOC:}\quad \left.\frac{\partial^2 \ln L_N(\theta; x_1, \ldots, x_N)}{\partial \theta^2}\right|_{\theta = \hat{\theta}} = -\frac{1}{\hat{\theta}^2} \sum_{i=1}^{N} x_i < 0$$
so $\hat{\theta}$ is a maximum.

The maximum likelihood estimate (realization) is:
$$\hat{\theta} \equiv \hat{\theta}(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Given the sample $\{5, 0, 1, 1, 0, 3, 2, 3, 4, 1\}$, we have $\hat{\theta}(x) = 2$.
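A short numerical check of this result (again a sketch assuming numpy and scipy, not material from the slides): the closed-form estimate from the first-order condition is the sample mean, and a direct numerical maximization of the log-likelihood lands on the same value.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

# Closed-form MLE from the first-order condition: the sample mean.
theta_hat = x.mean()
print("closed-form MLE:", theta_hat)  # 2.0

# Poisson log-likelihood; gammaln(x + 1) = ln(x!) keeps the factorial term stable.
def log_lik(theta):
    return -theta * x.size + np.log(theta) * x.sum() - gammaln(x + 1).sum()

# Numerical maximization (minimizing the negative log-likelihood) agrees.
res = minimize_scalar(lambda t: -log_lik(t), bounds=(1e-6, 10), method="bounded")
print("numerical MLE:", res.x)  # approximately 2.0
```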
The maximum likelihood estimator (random variable) is:
$$\hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

Continuous variables. The reference to the probability of observing the given sample is not exact for a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same. The likelihood function then corresponds to the pdf associated with the joint distribution of $(X_1, X_2, \ldots, X_N)$ evaluated at the point $(x_1, x_2, \ldots, x_N)$:
$$L_N(\theta; x_1, \ldots, x_N) = f_{X_1, \ldots, X_N}(x_1, x_2, \ldots, x_N; \theta)$$

If the random variables $\{X_1, X_2, \ldots, X_N\}$ are i.i.d., then we have:
$$L_N(\theta; x_1, \ldots, x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$
where $f_X(x_i; \theta)$ denotes the pdf of the marginal distribution of $X$ (or of $X_i$, since all the variables have the same distribution). The values of the parameters that maximize $L_N(\theta; x_1, \ldots, x_N)$ or its log are the maximum likelihood estimates, denoted $\hat{\theta}(x)$.

Section 3. The Likelihood Function: Definitions and Notations

Objectives.

1. Introduce the notations for an estimation problem that deals with a marginal distribution or a conditional distribution (model).
2. Define the likelihood and the log-likelihood functions.
3. Introduce the concept of conditional log-likelihood.
4. Propose various applications.

Notations. Let us consider a continuous random variable $X$ with a pdf denoted $f_X(x; \theta)$, for $x \in \mathbb{R}$, where $\theta = (\theta_1, \ldots, \theta_K)'$ is a $K \times 1$ vector of unknown parameters. We assume that $\theta \in \Theta \subseteq \mathbb{R}^K$. Let us consider a sample $\{X_1, \ldots, X_N\}$ of i.i.d. random variables with the same arbitrary distribution as $X$. The realization of $\{X_1, \ldots, X_N\}$ (the data set) is denoted $\{x_1, \ldots, x_N\}$, or $x$ for simplicity.

Example (Normal distribution). If $X \sim \mathcal{N}(m, \sigma^2)$, then:
$$f_X(z; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - m)^2}{2\sigma^2}\right) \quad \forall z \in \mathbb{R}$$
with $K = 2$ and $\theta = (m, \sigma^2)'$.

Definition (Likelihood Function). The likelihood function is defined to be:
$$L_N: \Theta \times \mathbb{R}^N \to \mathbb{R}^+$$
$$(\theta; x_1, \ldots, x_N) \mapsto L_N(\theta; x_1, \ldots, x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$

Definition (Log-Likelihood Function). The log-likelihood function is defined to be:
$$\ell_N: \Theta \times \mathbb{R}^N \to \mathbb{R}$$
$$(\theta; x_1, \ldots, x_N) \mapsto \ell_N(\theta; x_1, \ldots, x_N) = \sum_{i=1}^{N} \ln f_X(x_i; \theta)$$
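To connect these definitions to practice, here is a minimal sketch (assuming numpy/scipy and a simulated data set; none of this is in the original slides) that maximizes the normal log-likelihood $\ell_N$ numerically and compares the result with the analytic ML solution, which is the sample mean and the (biased) sample variance:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)  # simulated sample; true (m, sigma^2) = (1, 4)

# ell(theta) = sum_i ln f_X(x_i; theta) for X ~ N(m, sigma^2), with theta = (m, sigma^2).
def ell(theta):
    m, s2 = theta
    return -0.5 * np.sum(np.log(2 * np.pi * s2) + (x - m) ** 2 / s2)

# Maximize the log-likelihood by minimizing its negative over (m, sigma^2).
res = minimize(lambda t: -ell(t), x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print("numerical MLE:", res.x)
print("analytic MLE:", x.mean(), x.var())  # sample mean and biased sample variance
```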