Sliced Score Matching: a Scalable Approach to Density and Score Estimation

Total Page:16

File Type:pdf, Size:1020Kb

Sliced Score Matching: a Scalable Approach to Density and Score Estimation Sliced Score Matching: A Scalable Approach to Density and Score Estimation Yang Song∗ Sahaj Garg∗ Jiaxin Shi Stefano Ermon Stanford University Stanford University Tsinghua University Stanford University Abstract matching only depends on the scores, which are oblivious to the (usually) intractable partition functions. However, Score matching is a popular method for esti- score matching requires the computation of the diago- mating unnormalized statistical models. How- nal elements of the Hessian of the model’s log-density ever, it has been so far limited to simple, shal- function. This Hessian trace computation is generally low models or low-dimensional data, due to expensive (Martens et al., 2012), requiring a number of the difficulty of computing the Hessian of log- forward and backward propagations proportional to the density functions. We show this difficulty can data dimension. This severely limits its applicability to be mitigated by projecting the scores onto ran- complex models parameterized by deep neural networks, dom vectors before comparing them. This ob- such as deep energy-based models (LeCun et al., 2006; jective, called sliced score matching, only in- Wenliang et al., 2019). volves Hessian-vector products, which can be Several approaches have been proposed to alleviate this easily implemented using reverse-mode auto- difficulty: Kingma & LeCun (2010) propose approximate matic differentiation. Therefore, sliced score backpropagation for computing the trace of the Hessian; matching is amenable to more complex models Martens et al. (2012) develop curvature propagation, a and higher dimensional data compared to score fast stochastic estimator for the trace in score matching; matching. Theoretically, we prove the consis- and Vincent (2011) transforms score matching to a de- tency and asymptotic normality of sliced score noising problem which avoids second-order derivatives. matching estimators. Moreover, we demon- These methods have achieved some success, but may strate that sliced score matching can be used suffer from one or more of the following problems: incon- to learn deep score estimators for implicit dis- sistent parameter estimation, large estimation variance, tributions. In our experiments, we show sliced and cumbersome implementation. score matching can learn deep energy-based models effectively, and can produce accurate To alleviate these problems, we propose sliced score score estimates for applications such as varia- matching, a variant of score matching that can scale to tional inference with implicit distributions and deep unnormalized models and high dimensional data. arXiv:1905.07088v2 [cs.LG] 27 Jun 2019 training Wasserstein Auto-Encoders. The key intuition is that instead of directly matching the high-dimensional scores, we match their projections along random directions. Theoretically, we show that 1 INTRODUCTION under some regularity conditions, sliced score matching is a well-defined statistical estimation criterion that yields Score matching (Hyvärinen, 2005) is particularly suitable consistent and asymptotically normal parameter estimates. for learning unnormalized statistical models, such as en- Moreover, compared to the methods of Kingma & LeCun ergy based ones. It is based on minimizing the distance be- (2010) and Martens et al. (2012), whose implementations tween the derivatives of the log-density functions (a.k.a., require customized backpropagation for deep networks, scores) of the data and model distributions. Unlike maxi- sliced score matching only involves Hessian-vector prod- mum likelihood estimation (MLE), the objective of score ucts, thus can be easily and efficiently implemented in ∗ frameworks such as TensorFlow (Abadi et al., 2016) and Joint first authors. Correspondence to Yang Song <[email protected]> and Stefano Ermon <er- PyTorch (Adam et al., 2017). [email protected]>. Beyond training unnormalized models, sliced score between pd and pm(·; θ), which is defined as matching can also be naturally adapted as an objective for 1 estimating the score function of a data generating distri- L(θ) [ks (x; θ) − s (x)k2]: (1) , 2Epd m d 2 bution (Sasaki et al., 2014; Strathmann et al., 2015) by training a score function model parameterized by deep Since sm(x; θ) does not involve Zθ, the Fisher diver- neural networks. This observation enables many new gence does not depend on the intractable partition func- applications of sliced score matching. For example, we tion. However, Eq. (1) is still not readily usable for learn- show that it can be used to provide accurate score es- ing unnormalized models, as we only have samples and timates needed for variational inference with implicit do not have access to the score function of the data sd(x). distributions (Huszár, 2017) and learning Wasserstein By applying integration by parts, Hyvärinen (2005) shows Auto-Encoders (WAE, Tolstikhin et al. (2018)). that L(θ) can be written as L(θ) = J(θ) + C (cf ., Theo- Finally, we evaluate the performance of sliced score rem 1 in Hyvärinen (2005)), where matching on learning unnormalized statistical models 1 2 (density estimation) and estimating score functions of J(θ) p tr(rxsm(x; θ)) + ksm(x; θ)k ; (2) , E d 2 2 a data generating process (score estimation). For density estimation, experiments on deep kernel exponential fami- C is a constant that does not depend on θ, tr(·) denotes lies (Wenliang et al., 2019) and NICE flow models (Dinh the trace of a matrix, and et al., 2015) show that our method is either more scalable 2 or more accurate than existing score matching variants. rxsm(x; θ) = rx logp ~m(x; θ) (3) For score estimation, our method improves the perfor- is the Hessian of the log-density function. The constant mance of variational auto-encoders (VAE) with implicit can be ignored and the following unbiased estimator of encoders, and can train WAEs without a discriminator the remaining terms is used to train p~m(x; θ): or MMD loss by directly optimizing the KL divergence N 1 X 1 2 between aggregated posteriors and the prior. In both situ- J^(θ; xN ) tr(r s (x ; θ)) + ks (x ; θ)k ; 1 , N x m i 2 m i 2 ations we outperformed kernel-based score estimators (Li i=1 & Turner, 2018; Shi et al., 2018) by achieving better test where xN is a shorthand used throughout the paper for likelihoods and better sample quality in image generation. 1 a collection of N data points fx1; x2; ··· ; xN g sampled i.i.d. from pd, and rxsm(xi; θ) denotes the Hessian of 2 BACKGROUND logp ~m(x; θ) evaluted at xi. D Computational Difficulty. While the score matching Given i.i.d. samples fx1; x2; ··· ; xN g ⊂ R from a data ^ N distribution pd(x), our task is to learn an unnormalized objective J(θ; x1 ) avoids the computation of Zθ for un- density, p~m(x; θ), where θ is from some parameter space normalized models, it introduces a new computational Θ. The model’s partition function is denoted as Zθ, which difficulty: computing the trace of the Hessian of a log- 2 is assumed to be existent but intractable. Let pm(x; θ) be density function, rx logp ~m. A naïve approach of com- the normalized density determined by our model, we have puting the trace of the Hessian requires D times more backward passes than computing the gradient sm = Z p~m(x; θ) rx logp ~m (see Alg. 2 in the appendix). For example, the pm(x; θ) = ;Zθ = p~m(x; θ)dx: Zθ trace could be computed by applying backpropogation D 2 times to sm to get each diagonal term of rx logp ~m se- For convenience, we denote the score functions of quentially. In practice, D can be many thousands, which pm and pd as sm(x; θ) , rx log pm(x; θ) and can render score matching too slow for practical purposes. sd(x) , rx log pd(x) respectively. Note that since Moreover, Martens et al. (2012) argues from a theoretical log pm(x; θ) = logp ~m(x; θ) − log Zθ, we immediately perspective that it is unlikely that there exists an algorithm conclude that sm(x; θ) does not depend on the intractable for computing the diagonal of the Hessian defined by an partition function Zθ. arbitrary computation graph within a constant number of forward and backward passes. 2.1 SCORE MATCHING 2.2 SCORE ESTIMATION FOR IMPLICIT Learning unnormalized models with maximum likeli- DISTRIBUTIONS hood estimation (MLE) can be difficult due to the in- tractable partition function Zθ. To avoid this, score match- Besides parameter estimation in unnormalized models, ing (Hyvärinen, 2005) minimizes the Fisher divergence score matching can also be used to estimate scores of implicit distributions, which are distributions that have a Here v ∼ pv and x ∼ pd are independent, and we require | 2 tractable sampling process but without a tractable density. Epv [vv ] 0 and Epv [kvk2] < 1. Many examples of For example, the distribution of random samples from pv satisfy these requirements. For instance, pv can be a the generator of a GAN (Goodfellow et al., 2014) is an multivariate standard normal (N (0;ID)), a multivariate implicit distribution. Implicit distributions can arise in Rademacher distribution (the uniform distribution over many more situations such as the marginal distribution {±1gD), or a uniform distribution on the hypersphere of a non-conjugate model (Sun et al., 2019), and models SD−1 (recall that D refers to the dimension of x). defined by complex simulation processes (Tran et al., To eliminate the dependence of L(θ; p ) on s (x), we 2017). In many cases learning and inference become v d use integration by parts as in score matching (cf ., the intractable due to the need of optimizing an objective that equivalence between Eq. (1) and (2)). Defining involves the intractable density. | In these cases, directly estimating the score function J(θ; pv) , Epv Epd v rxsm(x; θ)v sq(x) = rx log qθ(x) based on i.i.d.
Recommended publications
  • Diagonal Estimation with Probing Methods
    Diagonal Estimation with Probing Methods Bryan J. Kaperick Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Science in Mathematics Matthias C. Chung, Chair Julianne M. Chung Serkan Gugercin Jayanth Jagalur-Mohan May 3, 2019 Blacksburg, Virginia Keywords: Probing Methods, Numerical Linear Algebra, Computational Inverse Problems Copyright 2019, Bryan J. Kaperick Diagonal Estimation with Probing Methods Bryan J. Kaperick (ABSTRACT) Probing methods for trace estimation of large, sparse matrices has been studied for several decades. In recent years, there has been some work to extend these techniques to instead estimate the diagonal entries of these systems directly. We extend some analysis of trace estimators to their corresponding diagonal estimators, propose a new class of deterministic diagonal estimators which are well-suited to parallel architectures along with heuristic ar- guments for the design choices in their construction, and conclude with numerical results on diagonal estimation and ordering problems, demonstrating the strengths of our newly- developed methods alongside existing methods. Diagonal Estimation with Probing Methods Bryan J. Kaperick (GENERAL AUDIENCE ABSTRACT) In the past several decades, as computational resources increase, a recurring problem is that of estimating certain properties very large linear systems (matrices containing real or complex entries). One particularly important quantity is the trace of a matrix, defined as the sum of the entries along its diagonal. In this thesis, we explore a problem that has only recently been studied, in estimating the diagonal entries of a particular matrix explicitly. For these methods to be computationally more efficient than existing methods, and with favorable convergence properties, we require the matrix in question to have a majority of its entries be zero (the matrix is sparse), with the largest-magnitude entries clustered near and on its diagonal, and very large in size.
    [Show full text]
  • A Guide on Probability Distributions
    powered project A guide on probability distributions R-forge distributions Core Team University Year 2008-2009 LATEXpowered Mac OS' TeXShop edited Contents Introduction 4 I Discrete distributions 6 1 Classic discrete distribution 7 2 Not so-common discrete distribution 27 II Continuous distributions 34 3 Finite support distribution 35 4 The Gaussian family 47 5 Exponential distribution and its extensions 56 6 Chi-squared's ditribution and related extensions 75 7 Student and related distributions 84 8 Pareto family 88 9 Logistic ditribution and related extensions 108 10 Extrem Value Theory distributions 111 3 4 CONTENTS III Multivariate and generalized distributions 116 11 Generalization of common distributions 117 12 Multivariate distributions 132 13 Misc 134 Conclusion 135 Bibliography 135 A Mathematical tools 138 Introduction This guide is intended to provide a quite exhaustive (at least as I can) view on probability distri- butions. It is constructed in chapters of distribution family with a section for each distribution. Each section focuses on the tryptic: definition - estimation - application. Ultimate bibles for probability distributions are Wimmer & Altmann (1999) which lists 750 univariate discrete distributions and Johnson et al. (1994) which details continuous distributions. In the appendix, we recall the basics of probability distributions as well as \common" mathe- matical functions, cf. section A.2. And for all distribution, we use the following notations • X a random variable following a given distribution, • x a realization of this random variable, • f the density function (if it exists), • F the (cumulative) distribution function, • P (X = k) the mass probability function in k, • M the moment generating function (if it exists), • G the probability generating function (if it exists), • φ the characteristic function (if it exists), Finally all graphics are done the open source statistical software R and its numerous packages available on the Comprehensive R Archive Network (CRAN∗).
    [Show full text]
  • Probability Statistics Economists
    PROBABILITY AND STATISTICS FOR ECONOMISTS BRUCE E. HANSEN Contents Preface x Acknowledgements xi Mathematical Preparation xii Notation xiii 1 Basic Probability Theory 1 1.1 Introduction . 1 1.2 Outcomes and Events . 1 1.3 Probability Function . 3 1.4 Properties of the Probability Function . 4 1.5 Equally-Likely Outcomes . 5 1.6 Joint Events . 5 1.7 Conditional Probability . 6 1.8 Independence . 7 1.9 Law of Total Probability . 9 1.10 Bayes Rule . 10 1.11 Permutations and Combinations . 11 1.12 Sampling With and Without Replacement . 13 1.13 Poker Hands . 14 1.14 Sigma Fields* . 16 1.15 Technical Proofs* . 17 1.16 Exercises . 18 2 Random Variables 22 2.1 Introduction . 22 2.2 Random Variables . 22 2.3 Discrete Random Variables . 22 2.4 Transformations . 24 2.5 Expectation . 25 2.6 Finiteness of Expectations . 26 2.7 Distribution Function . 28 2.8 Continuous Random Variables . 29 2.9 Quantiles . 31 2.10 Density Functions . 31 2.11 Transformations of Continuous Random Variables . 34 2.12 Non-Monotonic Transformations . 36 ii CONTENTS iii 2.13 Expectation of Continuous Random Variables . 37 2.14 Finiteness of Expectations . 39 2.15 Unifying Notation . 39 2.16 Mean and Variance . 40 2.17 Moments . 42 2.18 Jensen’s Inequality . 42 2.19 Applications of Jensen’s Inequality* . 43 2.20 Symmetric Distributions . 45 2.21 Truncated Distributions . 46 2.22 Censored Distributions . 47 2.23 Moment Generating Function . 48 2.24 Cumulants . 50 2.25 Characteristic Function . 51 2.26 Expectation: Mathematical Details* . 52 2.27 Exercises .
    [Show full text]
  • Robust Inversion, Dimensionality Reduction, and Randomized Sampling
    Robust inversion, dimensionality reduction, and randomized sampling Aleksandr Aravkin Michael P. Friedlander Felix J. Herrmann Tristan van Leeuwen November 16, 2011 Abstract We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several di- mensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student's t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data. Keywords inverse problems · seismic inversion · stochastic optimization · robust estimation 1 Introduction Consider the generic parameter-estimation scheme in which we conduct m exper- iments, recording the corresponding experimental input vectors fq1; q2; : : : ; qmg and observation vectors fd1; d2; : : : ; dmg. We model the data for given parameters n x 2 R by di = Fi(x)qi + i for i = 1; : : : ; m; (1.1) This work was in part financially supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (22R81254) and the Collaborative Research and Develop- ment Grant DNOISE II (375142-08). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BPG, BP, Chevron, Conoco Phillips, Petrobras, PGS, Total SA, and WesternGeco.
    [Show full text]
  • Handbook on Probability Distributions
    R powered R-forge project Handbook on probability distributions R-forge distributions Core Team University Year 2009-2010 LATEXpowered Mac OS' TeXShop edited Contents Introduction 4 I Discrete distributions 6 1 Classic discrete distribution 7 2 Not so-common discrete distribution 27 II Continuous distributions 34 3 Finite support distribution 35 4 The Gaussian family 47 5 Exponential distribution and its extensions 56 6 Chi-squared's ditribution and related extensions 75 7 Student and related distributions 84 8 Pareto family 88 9 Logistic distribution and related extensions 108 10 Extrem Value Theory distributions 111 3 4 CONTENTS III Multivariate and generalized distributions 116 11 Generalization of common distributions 117 12 Multivariate distributions 133 13 Misc 135 Conclusion 137 Bibliography 137 A Mathematical tools 141 Introduction This guide is intended to provide a quite exhaustive (at least as I can) view on probability distri- butions. It is constructed in chapters of distribution family with a section for each distribution. Each section focuses on the tryptic: definition - estimation - application. Ultimate bibles for probability distributions are Wimmer & Altmann (1999) which lists 750 univariate discrete distributions and Johnson et al. (1994) which details continuous distributions. In the appendix, we recall the basics of probability distributions as well as \common" mathe- matical functions, cf. section A.2. And for all distribution, we use the following notations • X a random variable following a given distribution, • x a realization of this random variable, • f the density function (if it exists), • F the (cumulative) distribution function, • P (X = k) the mass probability function in k, • M the moment generating function (if it exists), • G the probability generating function (if it exists), • φ the characteristic function (if it exists), Finally all graphics are done the open source statistical software R and its numerous packages available on the Comprehensive R Archive Network (CRAN∗).
    [Show full text]
  • Spectral Barriers in Certification Problems
    Spectral Barriers in Certification Problems by Dmitriy Kunisky A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Mathematics New York University May, 2021 Professor Afonso S. Bandeira Professor Gérard Ben Arous © Dmitriy Kunisky all rights reserved, 2021 Dedication To my grandparents. iii Acknowledgements First and foremost, I am immensely grateful to Afonso Bandeira for his generosity with his time, his broad mathematical insight, his resourceful creativity, and of course his famous reservoir of open problems. Afonso’s wonderful knack for keeping up the momentum— for always finding some new angle or somebody new to talk to when we were stuck, for mentioning “you know, just yesterday I heard a very nice problem...” right when I could use something else to work on—made five years fly by. I have Afonso to thank for my sense of taste in research, for learning to collaborate, for remembering to enjoy myself, and above all for coming to be ever on the lookout for something new to think about. I am also indebted to Gérard Ben Arous, who, though I wound up working on topics far from his main interests, was always willing to hear out what I’d been doing, to encourage and give suggestions, and to share his truly encyclopedic knowledge of mathematics and its culture and history. On many occasions I’ve learned more tracking down what Gérard meant with the smallest aside in the classroom or in our meetings than I could manage to get out of a week on my own in the library.
    [Show full text]
  • Field Guide to Continuous Probability Distributions
    Field Guide to Continuous Probability Distributions Gavin E. Crooks v 1.0.0 2019 G. E. Crooks – Field Guide to Probability Distributions v 1.0.0 Copyright © 2010-2019 Gavin E. Crooks ISBN: 978-1-7339381-0-5 http://threeplusone.com/fieldguide Berkeley Institute for Theoretical Sciences (BITS) typeset on 2019-04-10 with XeTeX version 0.99999 fonts: Trump Mediaeval (text), Euler (math) 271828182845904 2 G. E. Crooks – Field Guide to Probability Distributions Preface: The search for GUD A common problem is that of describing the probability distribution of a single, continuous variable. A few distributions, such as the normal and exponential, were discovered in the 1800’s or earlier. But about a century ago the great statistician, Karl Pearson, realized that the known probabil- ity distributions were not sufficient to handle all of the phenomena then under investigation, and set out to create new distributions with useful properties. During the 20th century this process continued with abandon and a vast menagerie of distinct mathematical forms were discovered and invented, investigated, analyzed, rediscovered and renamed, all for the purpose of de- scribing the probability of some interesting variable. There are hundreds of named distributions and synonyms in current usage. The apparent diver- sity is unending and disorienting. Fortunately, the situation is less confused than it might at first appear. Most common, continuous, univariate, unimodal distributions can be orga- nized into a small number of distinct families, which are all special cases of a single Grand Unified Distribution. This compendium details these hun- dred or so simple distributions, their properties and their interrelations.
    [Show full text]
  • Package 'Extradistr'
    Package ‘extraDistr’ September 7, 2020 Type Package Title Additional Univariate and Multivariate Distributions Version 1.9.1 Date 2020-08-20 Author Tymoteusz Wolodzko Maintainer Tymoteusz Wolodzko <[email protected]> Description Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson. License GPL-2 URL https://github.com/twolodzko/extraDistr BugReports https://github.com/twolodzko/extraDistr/issues Encoding UTF-8 LazyData TRUE Depends R (>= 3.1.0) LinkingTo Rcpp 1 2 R topics documented: Imports Rcpp Suggests testthat, LaplacesDemon, VGAM, evd, hoa,
    [Show full text]
  • Inference Based on the Wild Bootstrap
    Inference Based on the Wild Bootstrap James G. MacKinnon Department of Economics Queen's University Kingston, Ontario, Canada K7L 3N6 email: [email protected] Ottawa September 14, 2012 The Wild Bootstrap Consider the linear regression model 2 2 yi = Xi β + ui; E(ui ) = σi ; i = 1; : : : ; n: (1) One natural way to bootstrap this model is to use the residual bootstrap. We condition on X, β^, and the empirical distribution of the residuals (perhaps transformed). Thus the bootstrap DGP is ∗ ^ ∗ ∗ ∼ yi = Xi β + ui ; ui EDF(^ui): (2) Strong assumptions! This assumes that E(yi j Xi) = Xi β and that the error terms are IID, which implies homoskedasticity. At the opposite extreme is the pairs bootstrap, which draws bootstrap samples from the joint EDF of [yi; Xi]. This assumes that there exists such a joint EDF, but it makes no assumptions about the properties of the ui or about the functional form of E(yi j Xi). The wild bootstrap is in some ways intermediate between the residual and pairs bootstraps. It assumes that E(yi j Xi) = Xi β, but it allows for heteroskedasticity by conditioning on the (possibly transformed) residuals. If no restrictions are imposed, wild bootstrap DGP is ∗ ^ ∗ yi = Xi β + f(^ui)vi ; (3) th ∗ where f(^ui) is a transformation of the i residualu ^i, and vi has mean 0. ∗ 6 Thus E f(^ui)vi = 0 even if E f(^ui) = 0. Common choices: p w1: f(^ui) = n=(n − k)u ^i; u^ w2: f(^u ) = i ; i 1=2 (1 − hi) u^i w3: f(^ui) = : 1 − hi th Here hi is the i diagonal element of the \hat matrix" > −1 > PX ≡ X(X X) X : The w1, w2, and w3 transformations are analogous to the ones used in the HC1, HC2, and HC3 covariance matrix estimators.
    [Show full text]
  • The Generalized Poisson-Binomial Distribution and the Computation of Its Distribution Function
    The Generalized Poisson-Binomial Distribution and the Computation of Its Distribution Function Man Zhang and Yili Hong Narayanaswamy Balakrishnan Department of Statistics Department of Mathematics and Statistics Virginia Tech McMaster University Blacksburg, VA 24061, USA Hamilton, ON, L8S 4K1, Canada February 10, 2018 Abstract The Poisson-binomial distribution is useful in many applied problems in engineering, actuarial science, and data mining. The Poisson-binomial distribution models the dis- tribution of the sum of independent but non-identically distributed random indicators whose success probabilities vary. In this paper, we extend the Poisson-binomial distri- bution to a generalized Poisson-binomial (GPB) distribution. The GPB distribution corresponds to the case where the random indicators are replaced by two-point random variables, which can take two arbitrary values instead of 0 and 1 as in the case of ran- dom indicators. The GPB distribution has found applications in many areas such as voting theory, actuarial science, warranty prediction, and probability theory. As the GPB distribution has not been studied in detail so far, we introduce this distribution first and then derive its theoretical properties. We develop an efficient algorithm for the computation of its distribution function, using the fast Fourier transform. We test the accuracy of the developed algorithm by comparing it with enumeration-based exact method and the results from the binomial distribution. We also study the computational time of the algorithm under various parameter settings. Finally, we discuss the factors affecting the computational efficiency of the proposed algorithm, and illustrate the use of the software package. Key Words: Actuarial Science; Discrete Distribution; Fast Fourier Transform; Rademacher Distribution; Voting Theory; Warranty Cost Prediction.
    [Show full text]
  • Probability Distribution Relationships
    STATISTICA, anno LXX, n. 1, 2010 PROBABILITY DISTRIBUTION RELATIONSHIPS Y.H. Abdelkader, Z.A. Al-Marzouq 1. INTRODUCTION In spite of the variety of the probability distributions, many of them are related to each other by different kinds of relationship. Deriving the probability distribu- tion from other probability distributions are useful in different situations, for ex- ample, parameter estimations, simulation, and finding the probability of a certain distribution depends on a table of another distribution. The relationships among the probability distributions could be one of the two classifications: the transfor- mations and limiting distributions. In the transformations, there are three most popular techniques for finding a probability distribution from another one. These three techniques are: 1 - The cumulative distribution function technique 2 - The transformation technique 3 - The moment generating function technique. The main idea of these techniques works as follows: For given functions gin(,XX12 ,...,) X, for ik 1, 2, ..., where the joint dis- tribution of random variables (r.v.’s) XX12, , ..., Xn is given, we define the func- tions YgXXii (12 , , ..., X n ), i 1, 2, ..., k (1) The joint distribution of YY12, , ..., Yn can be determined by one of the suit- able method sated above. In particular, for k 1, we seek the distribution of YgX () (2) For some function g(X ) and a given r.v. X . The equation (1) may be linear or non-linear equation. In the case of linearity, it could be taken the form n YaX ¦ ii (3) i 1 42 Y.H. Abdelkader, Z.A. Al-Marzouq Many distributions, for this linear transformation, give the same distributions for different values for ai such as: normal, gamma, chi-square and Cauchy for continuous distributions and Poisson, binomial, negative binomial for discrete distributions as indicated in the Figures by double rectangles.
    [Show full text]
  • On Strict Sub-Gaussianity, Optimal Proxy Variance and Symmetry for Bounded Random Variables Julyan Arbel, Olivier Marchal, Hien Nguyen
    On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables Julyan Arbel, Olivier Marchal, Hien Nguyen To cite this version: Julyan Arbel, Olivier Marchal, Hien Nguyen. On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables. ESAIM: Probability and Statistics, EDP Sciences, 2020, 24, pp.39-55. 10.1051/ps/2019018. hal-01998252 HAL Id: hal-01998252 https://hal.archives-ouvertes.fr/hal-01998252 Submitted on 29 Jan 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables Julyan Arbel1∗, Olivier Marchal2, Hien D. Nguyen3 ∗Corresponding author, email: [email protected]. 1Universit´eGrenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. 2Universit´ede Lyon, CNRS UMR 5208, Universit´eJean Monnet, Institut Camille Jordan, 69000 Lyon, France. 3Department of Mathematics and Statistics, La Trobe University, Bundoora Melbourne 3086, Victoria Australia. Abstract We investigate the sub-Gaussian property for almost surely bounded random variables. If sub-Gaussianity per se is de facto ensured by the bounded support of said random variables, then exciting research avenues remain open.
    [Show full text]