Topic 15: Maximum Likelihood Estimation, Multidimensional Estimation


Outline

• Fisher Information
• Example: Distribution of Fitness Effects, Gamma Distribution

Fisher Information

For a multidimensional parameter space $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$, the Fisher information $I(\theta)$ is a matrix. As in the one-dimensional case, the $ij$-th entry has two alternative expressions, namely

$$I(\theta)_{ij} = E_\theta\left[\frac{\partial}{\partial\theta_i}\ln L(\theta|X)\,\frac{\partial}{\partial\theta_j}\ln L(\theta|X)\right] = -E_\theta\left[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(\theta|X)\right].$$

Rather than taking reciprocals to obtain an estimate of the variance, we find the matrix inverse $I(\theta)^{-1}$.

• The diagonal entries of $I(\theta)^{-1}$ give estimates of variances.
• The off-diagonal entries of $I(\theta)^{-1}$ give estimates of covariances.

To be precise, for $n$ observations, let $\hat\theta_{i,n}(X)$ be the maximum likelihood estimator of the $i$-th parameter. Then

$$\mathrm{Var}_\theta(\hat\theta_{i,n}(X)) \approx \frac{1}{n} I(\theta)^{-1}_{ii}, \qquad \mathrm{Cov}_\theta(\hat\theta_{i,n}(X), \hat\theta_{j,n}(X)) \approx \frac{1}{n} I(\theta)^{-1}_{ij}.$$

When the $i$-th parameter is $\theta_i$, the asymptotic normality and efficiency can be expressed by noting that the z-score

$$Z_{i,n} = \frac{\hat\theta_i(X) - \theta_i}{\sqrt{I(\theta)^{-1}_{ii}/n}}$$

is approximately a standard normal. As we saw in one dimension, we can replace the information matrix with the observed information matrix,

$$J(\hat\theta)_{ij} = -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(\hat\theta(X)|X).$$

Distribution of Fitness Effects

We return to the model of the gamma distribution for the distribution of fitness effects of deleterious mutations. To obtain the maximum likelihood estimate for the gamma family of random variables, write the likelihood

$$L(\alpha,\beta|\mathbf{x}) = \frac{\beta^\alpha}{\Gamma(\alpha)} x_1^{\alpha-1}e^{-\beta x_1} \cdots \frac{\beta^\alpha}{\Gamma(\alpha)} x_n^{\alpha-1}e^{-\beta x_n} = \left(\frac{\beta^\alpha}{\Gamma(\alpha)}\right)^n (x_1 x_2 \cdots x_n)^{\alpha-1} e^{-\beta(x_1+x_2+\cdots+x_n)}$$

and its logarithm

$$\ln L(\alpha,\beta|\mathbf{x}) = n(\alpha\ln\beta - \ln\Gamma(\alpha)) + (\alpha-1)\sum_{i=1}^n \ln x_i - \beta\sum_{i=1}^n x_i.$$

The score function is the vector $\left(\frac{\partial}{\partial\alpha}\ln L(\alpha,\beta|\mathbf{x}),\ \frac{\partial}{\partial\beta}\ln L(\alpha,\beta|\mathbf{x})\right)$.

Gamma Distribution

The zeros of the components of the score function determine the maximum likelihood estimators. Thus, to determine these parameters, we solve the equations

$$\frac{\partial}{\partial\alpha}\ln L(\hat\alpha,\hat\beta|\mathbf{x}) = n\left(\ln\hat\beta - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha)\right) + \sum_{i=1}^n \ln x_i = 0$$

and

$$\frac{\partial}{\partial\beta}\ln L(\hat\alpha,\hat\beta|\mathbf{x}) = n\frac{\hat\alpha}{\hat\beta} - \sum_{i=1}^n x_i = 0, \quad\text{or}\quad \bar{x} = \frac{\hat\alpha}{\hat\beta}.$$

Substituting $\hat\beta = \hat\alpha/\bar{x}$ into the first equation results in the following relationship for $\hat\alpha$:

$$n\left(\ln\hat\alpha - \ln\bar{x} - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) + \overline{\ln x}\right) = 0,$$

where $\overline{\ln x} = \frac{1}{n}\sum_{i=1}^n \ln x_i$.

This equation can be solved numerically. The derivative of the logarithm of the gamma function,

$$\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha),$$

is known as the digamma function and is called in R with digamma. For the example of the distribution of fitness effects in humans, a simulated data set (rgamma(500, 0.19, 5.18)) yields $\hat\alpha = 0.2006$ and $\hat\beta = 5.806$ as the maximum likelihood estimates.

Figure: $\ln\hat\alpha - \ln\bar{x} - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) + \overline{\ln x}$ crosses the horizontal axis at $\hat\alpha = 0.2006$.
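To make the numerical step concrete, here is a minimal R sketch, not from the original slides, that solves the score equation for $\hat\alpha$ with uniroot and the digamma function. The random seed is an assumption (the slides do not report one), so the resulting estimates will be near, but not exactly, the slides' values.

```r
# Simulated fitness-effect data as on the slides: 500 draws from gamma(0.19, 5.18)
set.seed(1)  # assumption: the slides do not give a seed
x <- rgamma(500, shape = 0.19, rate = 5.18)

# Score equation in alpha after substituting beta.hat = alpha.hat / mean(x):
# ln(alpha) - ln(xbar) - psi(alpha) + mean(ln x) = 0
alpha.score <- function(a) log(a) - log(mean(x)) - digamma(a) + mean(log(x))

alpha.hat <- uniroot(alpha.score, interval = c(0.01, 1))$root
beta.hat  <- alpha.hat / mean(x)
c(alpha.hat, beta.hat)  # close to the slides' estimates 0.2006 and 5.806
```

The bracketing interval (0.01, 1) works because the score is large and positive for small alpha and negative by alpha = 1 for data like these.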
Exercise. To determine the variance of these estimators, compute the appropriate second derivatives:

$$I_{11}(\alpha,\beta) = -\frac{\partial^2}{\partial\alpha^2}\ln L(\alpha,\beta|\mathbf{x}) = n\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha), \qquad I_{22}(\alpha,\beta) = -\frac{\partial^2}{\partial\beta^2}\ln L(\alpha,\beta|\mathbf{x}) = n\frac{\alpha}{\beta^2},$$

$$I_{12}(\alpha,\beta) = -\frac{\partial^2}{\partial\alpha\,\partial\beta}\ln L(\alpha,\beta|\mathbf{x}) = -n\frac{1}{\beta}.$$

This gives the Fisher information matrix

$$I(\alpha,\beta) = n\begin{pmatrix} \frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) & -\frac{1}{\beta} \\ -\frac{1}{\beta} & \frac{\alpha}{\beta^2} \end{pmatrix}, \qquad I(0.19, 5.18) = 500\begin{pmatrix} 28.983 & -0.193 \\ -0.193 & 0.007 \end{pmatrix}.$$

NB. $\psi_1(\alpha) = d^2\ln\Gamma(\alpha)/d\alpha^2$ is known as the trigamma function and is called in R with trigamma.

The inverse matrix is

$$I(\alpha,\beta)^{-1} = \frac{1}{500}\begin{pmatrix} 0.0422 & 1.1494 \\ 1.1494 & 172.5587 \end{pmatrix}.$$

Thus,

$$\mathrm{Var}(\hat\alpha) \approx 8.432\times 10^{-5}, \quad \sigma_{\hat\alpha} \approx 0.00918, \qquad \mathrm{Var}(\hat\beta) \approx 0.3451, \quad \sigma_{\hat\beta} \approx 0.5875.$$

Compare these with the standard deviations of the method of moments estimators, $\sigma_{\hat\alpha} \approx 0.02838$ and $\sigma_{\hat\beta} \approx 0.9769$: the maximum likelihood estimators are considerably more precise.

Exercise. Estimate the correlation $\rho(\hat\alpha, \hat\beta)$.

Figure: Graphs of vertical slices through the log-likelihood surface through the MLE: (top) $\hat\beta = 5.806$, (bottom) $\hat\alpha = 0.2006$.

Figure: The log-likelihood surface. The domain is $0.14 \le \alpha \le 0.24$ and $5 \le \beta \le 7$.
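As a check on the matrix arithmetic above, here is a short R sketch, an illustration rather than code from the slides, that builds the information matrix with trigamma, inverts it with solve, and numerically sketches the correlation exercise.

```r
# Fisher information matrix for the gamma(alpha, beta) model with n observations
fisher.info <- function(alpha, beta, n) {
  n * matrix(c(trigamma(alpha), -1 / beta,
               -1 / beta,        alpha / beta^2),
             nrow = 2, byrow = TRUE)
}

I    <- fisher.info(0.19, 5.18, n = 500)  # evaluated at the simulation parameters
Iinv <- solve(I)                          # inverse: approximate covariance matrix of the MLEs

sqrt(diag(Iinv))                            # standard errors; approximately 0.00918 and 0.5875
Iinv[1, 2] / sqrt(Iinv[1, 1] * Iinv[2, 2])  # rho(alpha.hat, beta.hat); approximately 0.43
```

Note that the factor of $n$ cancels in the correlation, so $\rho(\hat\alpha, \hat\beta)$ depends only on the entries of the per-observation information matrix.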