On the Consistency of Some Constrained Maximum Likelihood Estimator Used in Crash Data Modelling


On the consistency of some constrained maximum likelihood estimator used in crash data modelling

N° d'ordre: 41979
Université Lille 1 – Sciences et Technologies
Université Catholique de l'Afrique de l'Ouest – Unité Universitaire du Togo (UCAO-UUT)

Thesis under joint supervision (cotutelle) submitted for the degree of Doctor of the Université Lille 1
Discipline: Applied Mathematics
Doctoral school: Sciences Pour l'Ingénieur

Presented by Issa Cherif GERALDO and defended on 15 December 2015 before a jury composed of:

M. Célestin Kokonendji, Pr., Université de Franche-Comté – President
M. Sergio Alvarez-Andrade, MdC, HDR, Université de Compiègne – Examiner
M. Joseph Ngatchou-Wandji, Pr., Université Nancy 1 – Referee (rapporteur)
M. Abdallah Mkhadri, Pr., Université Cadi Ayyad (Morocco) – Referee (rapporteur)
M. Assi N'Guessan, MdC, HDR, Université Lille 1 – Supervisor
M. Kossi Essona Gneyou, MdC, HDR, UCAO-UUT – Co-supervisor

Remerciements

My first thanks go to my thesis supervisors, Assi N'Guessan and Kossi Gneyou. Despite their many commitments, they always found time to read my work. Their scientific rigour and the critical eye they brought to my research helped me considerably throughout these three years. A special mention goes to Mr N'Guessan, who went to great lengths to secure the funding for my thesis. I thank Célestin Kokonendji and Sergio Alvarez-Andrade for the honour they do me in accepting to serve as examiners of this thesis. My thanks also go to Abdallah Mkhadri and Joseph Ngatchou-Wandji for agreeing to act as referees. I wish to thank Komi Midzodzi Pekpe, Toyo Koffi Edarh-Bossou and François Messanvi Gbeassor for all the help they gave me throughout this work. I do not forget my parents and my younger brothers for their support and the love with which they surrounded me during all these years. I also thank Marius Aduayi and his wife, and Clément Hlomaschi and his wife, for hosting me during my stays in France. I am grateful to François Recher for all his advice. I do not forget my friends Djidula, Julien, Karim, Quentin 1, Quentin 2, Yinouss and Yasser for all the good times shared during my stays in Lille. I thank UCAO-UUT, the Laboratoire Paul Painlevé, the Université Lille 1, the Laboratoire d'Accidentologie et de Biomécanique (Renault/PSA) and everyone who contributed to this work. To each and every one of you, a sincere thank you.

Résumé

In this thesis, we study the convergence of certain estimators which partially belong to the multidimensional simplex and which have been proposed in the literature for combining road crash data. These estimators derive from discrete probabilistic models designed to describe both accident risks and the effect of changes in road conditions. Because the components of the estimators are functionally dependent, classical constrained iterative methods are usually used to obtain their numerical values.
However, these methods prove difficult to apply because of the choice of starting points, the inversion of the information matrices and the convergence of the solutions. Some authors have proposed an iterative cyclic algorithm (CA) that sidesteps these difficulties, but its numerical properties and almost sure convergence had not been examined in depth. This thesis studies the numerical and theoretical convergence properties of this algorithm in depth. On the numerical side, we analyse the most important aspects and compare the algorithm with a wide range of algorithms available in R and Matlab. On the theoretical side, we prove the almost sure convergence of the estimators as the number of crash observations tends to infinity. Crash data simulations based on the uniform, Gaussian and Dirichlet distributions illustrate the results. Finally, we study an extension of the CA and obtain theoretical convergence results when the same modification of road conditions is applied to several sites, each involving several types of accidents.

Mots clés. Constrained optimization – Discrete statistical models – Maximum likelihood – Schur complement – Convergence – Road accident – Road safety.

Abstract

In this thesis, we study the convergence of some estimators which partially belong to a multidimensional simplex and which have been proposed in the literature to combine crash data. These estimators are derived from discrete probabilistic models aiming to model both road accident risks and the effect of changes in road conditions. Because the estimators' components are functionally dependent, classical constrained iterative methods are usually chosen to obtain their numerical values. However, these methods turn out to be difficult to use because of the choice of starting points, the convergence of the solutions and the inversion of the information matrix. Some authors have suggested a cyclic iterative algorithm (CA) that bypasses these difficulties, but the numerical properties and the almost sure convergence of this algorithm have not been examined in depth. The theoretical and numerical convergence properties of this algorithm are thoroughly studied in this thesis. On the numerical level, we analyse the most important aspects and compare them with a wide range of algorithms available in R or Matlab. On the theoretical level, we prove the estimators' almost sure convergence when the number of crash observations approaches infinity. Road accident data simulations based on the uniform, Gaussian and Dirichlet distributions illustrate the results. Finally, we study an extension of the CA algorithm and obtain theoretical convergence results when the same road condition modification is applied to several sites, each with several types of accidents.

Keywords. Constrained optimization – Discrete statistical models – Maximum likelihood – Schur complement – Convergence – Road accident – Road safety.
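To make the idea in the abstract concrete, the following minimal Python sketch runs a cyclic (block-relaxation) constrained maximum likelihood estimation on a simplified before/after crash model at a single site. The model, the parameter names (theta for the effect of the road modification, pi for the accident-type probabilities on the simplex, z for known exposure coefficients) and the use of SLSQP for the simplex step are illustrative assumptions; they are not the thesis's exact model, notation or cyclic algorithm, which derives its updates analytically via a Schur complement.

```python
# Hedged sketch: cyclic constrained MLE on a simplified before/after crash model.
# Assumed model (NOT the thesis's exact one): n crashes fall into 2r cells with
# probabilities proportional to pi_j ("before") and theta * z_j * pi_j ("after"),
# where pi lies on the simplex, theta > 0 and z is a known exposure vector.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate(n, theta, pi, z):
    """Draw n crashes and split them into 'before'/'after' counts per accident type."""
    s = theta * (z @ pi)
    probs = np.concatenate([pi, theta * z * pi]) / (1.0 + s)
    counts = rng.multinomial(n, probs)
    r = len(pi)
    return counts[:r], counts[r:]          # x = before counts, y = after counts

def loglik(theta, pi, x, y, z):
    """Multinomial log-likelihood of the simplified model."""
    s = theta * (z @ pi)
    return ((x + y) @ np.log(pi) + y.sum() * np.log(theta)
            + y @ np.log(z) - (x + y).sum() * np.log(1.0 + s))

def cyclic_mle(x, y, z, n_iter=50):
    """Alternate a closed-form theta step with a simplex-constrained pi step."""
    r = len(x)
    pi = np.full(r, 1.0 / r)               # start at the barycentre of the simplex
    theta = 1.0
    n, m = (x + y).sum(), y.sum()
    simplex = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
    for _ in range(n_iter):
        # theta block: exact maximiser of the log-likelihood for fixed pi
        theta = m / ((z @ pi) * (n - m))
        # pi block: numerical maximisation over the simplex (stand-in for the
        # analytic update the thesis obtains via a Schur complement)
        res = minimize(lambda p: -loglik(theta, p, x, y, z), pi,
                       method="SLSQP", bounds=[(1e-8, 1.0)] * r,
                       constraints=simplex)
        pi = res.x
    return theta, pi

if __name__ == "__main__":
    true_pi = np.array([0.5, 0.3, 0.2])     # accident-type probabilities (simplex)
    true_theta, z = 0.6, np.array([1.0, 1.2, 0.9])
    x, y = simulate(5000, true_theta, true_pi, z)
    theta_hat, pi_hat = cyclic_mle(x, y, z)
    print("theta_hat =", round(theta_hat, 3), " pi_hat =", np.round(pi_hat, 3))
```

When each block is maximised exactly, a full sweep can never decrease the log-likelihood, which is the basic appeal of cyclic schemes of this kind; the thesis's contribution is to study the numerical behaviour and the almost sure convergence of such constrained updates rigorously.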
Contents

Remerciements
Résumé
Abstract
Main abbreviations and notations
General introduction

1 On optimization algorithms and crash data models
  1.1 Introduction
  1.2 Numerical algorithms for unconstrained optimization
    1.2.1 Fundamentals of unconstrained optimization
    1.2.2 Newton's algorithm and Fisher scoring
      1.2.2.1 Newton's method for solving F(β) = 0
      1.2.2.2 Newton's method for optimization
      1.2.2.3 Fisher's scoring
    1.2.3 Quasi-Newton algorithms
    1.2.4 Line search methods
    1.2.5 MM and EM algorithms
      1.2.5.1 MM algorithms
      1.2.5.2 EM algorithms
    1.2.6 Block-relaxation algorithms
    1.2.7 Trust region algorithms
    1.2.8 Special algorithms for least-squares optimization
      1.2.8.1 Gauss-Newton algorithm
      1.2.8.2 Levenberg-Marquardt algorithm
    1.2.9 Derivative-free optimization (DFO): Nelder-Mead's algorithm
  1.3 Numerical algorithms for constrained optimization
    1.3.1 Fundamentals of constrained optimization
    1.3.2 Penalty methods
    1.3.3 Barrier methods
    1.3.4 Augmented Lagrangian method
    1.3.5 Interior point methods
  1.4 Statistical models for crash data: state of the art
    1.4.1 Poisson regression model
    1.4.2 Negative binomial regression model
    1.4.3 Poisson-lognormal and Poisson-Weibull models
    1.4.4 Bivariate and multivariate models
      1.4.4.1 Tanner's model
      1.4.4.2 Generalizations of Tanner's model
      1.4.4.3 Other bivariate/multivariate models
  1.5 Conclusion

2 A cyclic algorithm for maximum likelihood estimation
  2.1 Introduction
  2.2 Schur complement and block matrix inversion
  2.3 The cyclic algorithm: state of the art
    2.3.1 Problem setting and models
    2.3.2 Constrained maximum likelihood estimation of the parameters
    2.3.3 The cyclic algorithm (CA)
  2.4 Theoretical study of some numerical properties of the cyclic algorithm
  2.5 Numerical experiments
    2.5.1 Data generation principle
    2.5.2 Results
      2.5.2.1 Numerical study of the cyclic algorithm
      2.5.2.2 Comparison with other algorithms
    2.5.3 Illustration of Theorems 2.4.1 and 2.4.2
  2.6 Conclusion

3 Strong consistency of the cyclic MLE
  3.1 Introduction
  3.2 Consistency of the MLE: an overview
  3.3