Exponential Family

STK3100/4100, 21 August 2014

Plan for the second lecture:
1. Definition of the exponential family
2. Examples
3. Expectation and variance
4. Likelihood and estimation

Definition of a GLM

Independent responses $Y_1, Y_2, \ldots, Y_n$, conditioned on explanatory variables. The vectors of explanatory variables $x_1, x_2, \ldots, x_n$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, are $p$-dimensional. A GLM (generalized linear model) is defined by:
• $Y_1, Y_2, \ldots, Y_n$ come from the same class of distributions within the exponential family.
• Linear predictors $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$.
• A link function $g(\cdot)$: the mean $\mu_i = E[Y_i]$ is coupled to the linear predictor by $g(\mu_i) = \eta_i$, i.e. $\mu_i = g^{-1}(\eta_i)$.

The exponential family (de Jong & Heller, Ch. 3)

A random variable $Y$ has a distribution belonging to the exponential family if its probability density function (pdf), or probability mass function (pmf) if $Y$ is discrete, can be written as
$$f(y; \theta, \phi) = c(y, \phi)\, \exp\!\left(\frac{y\theta - a(\theta)}{\phi}\right)$$
where
• $\theta$ is the canonical parameter,
• $\phi$ is the dispersion parameter,
• the functions $a(\theta)$ and $c(y, \phi)$ are specific to each distribution.
The Gaussian, binomial, Poisson, gamma and other distributions can all be written in this form.

Exponential distributions with $\phi = 1$

Some distributions do not include the dispersion parameter, i.e. $\phi = 1$. The pdf or pmf can then be written
$$f(y; \theta) = c(y)\, \exp(y\theta - a(\theta)).$$
This includes:
• the Poisson distribution,
• the distribution of a binary response, $Y = 1$ or $0$, with $\mu = P(Y = 1)$,
• the binomial distribution,
• the normal distribution with variance $\sigma^2 = 1$.

Example: Poisson distribution, $Y \sim \mathrm{Po}(\lambda)$

pmf:
$$f(y; \lambda) = \frac{\lambda^y}{y!} \exp(-\lambda) = \frac{1}{y!} \exp(y \log \lambda - \lambda),$$
i.e. belonging to the exponential family with
• $\theta = \log \lambda$,
• $a(\theta) = \lambda = \exp(\theta)$,
• $c(y) = 1/y!$.

Example: Binary variable

$Y = 1$ with probability $\pi$ and $Y = 0$ with probability $1 - \pi$. pmf:
$$f(y; \pi) = \pi^y (1-\pi)^{1-y} = \exp\big(y \log \pi + (1-y)\log(1-\pi)\big) = \exp\!\left(y \log\frac{\pi}{1-\pi} + \log(1-\pi)\right),$$
which is of the form $c(y)\exp(y\theta - a(\theta))$ with
• $\theta = \log\frac{\pi}{1-\pi}$, which gives $\pi = \pi(\theta) = \frac{\exp(\theta)}{1+\exp(\theta)}$,
• $a(\theta) = -\log(1 - \pi(\theta)) = \log(1 + \exp(\theta))$,
• $c(y) = 1$.

Example: $Y \sim \mathrm{Bin}(n, \pi)$

pmf: $f(y; \pi) = \binom{n}{y} \pi^y (1-\pi)^{n-y}$, which can be rewritten in the form $c(y)\exp(y\theta - a(\theta))$ with
• $\theta = \log\frac{\pi}{1-\pi} \iff \pi = \frac{\exp(\theta)}{1+\exp(\theta)}$,
• $a(\theta) = n \log(1 + \exp(\theta)) = -n \log(1-\pi)$,
• $c(y) = \binom{n}{y}$.
Note that
• $a'(\theta) = n\,\frac{\exp(\theta)}{1+\exp(\theta)} = n\pi = E[Y]$,
• $a''(\theta) = n\,\frac{\exp(\theta)}{(1+\exp(\theta))^2} = n\pi(1-\pi) = \mathrm{Var}[Y]$,
where $a'(\theta)$ and $a''(\theta)$ are the first and second derivatives of $a(\theta)$ with respect to $\theta$. These are general expressions for the exponential family.

Example: Normal distribution with unit variance, $Y \sim N(\mu, 1)$

pdf:
$$f(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}(y-\mu)^2\right) = \frac{1}{\sqrt{2\pi}} \exp\!\left(y\mu - \frac{\mu^2}{2} - \frac{y^2}{2}\right) = \frac{\exp(-y^2/2)}{\sqrt{2\pi}} \exp\!\left(y\mu - \frac{\mu^2}{2}\right),$$
which is of the form $c(y)\exp(y\theta - a(\theta))$ with
• $\theta = \mu$,
• $a(\theta) = \theta^2/2$,
• $c(y) = \frac{\exp(-y^2/2)}{\sqrt{2\pi}}$.
Again expectation and variance are given by $a(\theta)$:
• $a'(\theta) = \theta = \mu = E[Y]$,
• $a''(\theta) = 1 = \mathrm{Var}[Y]$.

Exponential family with dispersion parameter

Now take a general $\phi$, not necessarily equal to 1. This includes the normal distribution with general $\sigma^2$.

Example: $Y \sim N(\mu, \sigma^2)$

pdf:
$$f(y; \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(\frac{y\mu - \mu^2/2 - y^2/2}{\sigma^2}\right) = \frac{\exp(-y^2/(2\sigma^2))}{\sqrt{2\pi\sigma^2}} \exp\!\left(\frac{y\mu - \mu^2/2}{\sigma^2}\right),$$
which is of the form $c(y,\phi)\exp((y\theta - a(\theta))/\phi)$ with
• $\theta = \mu$ and $a(\theta) = \mu^2/2 = \theta^2/2$,
• dispersion parameter $\phi = \sigma^2$,
• $c(y,\phi) = \frac{\exp(-y^2/(2\phi))}{\sqrt{2\pi\phi}}$.
Note that
• $E[Y] = a'(\theta) = \theta = \mu$,
• $\mathrm{Var}[Y] = \phi\, a''(\theta) = \phi = \sigma^2$.
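As a quick numerical sanity check (my addition, not part of the original slides), the sketch below writes the Poisson pmf and the $N(\mu,\sigma^2)$ pdf in the exponential-family form $c(y,\phi)\exp((y\theta - a(\theta))/\phi)$ derived above and compares them against the standard scipy.stats implementations; the parameter values are arbitrary.

```python
import numpy as np
from math import factorial
from scipy import stats

# Poisson in exponential-family form: theta = log(lam), a(theta) = exp(theta),
# c(y) = 1/y!, phi = 1.
lam = 3.7
theta = np.log(lam)
y = np.arange(10)
pmf_ef = np.array([np.exp(yi * theta - np.exp(theta)) / factorial(yi) for yi in y])
assert np.allclose(pmf_ef, stats.poisson.pmf(y, lam))

# N(mu, sigma^2): theta = mu, a(theta) = theta^2/2, phi = sigma^2,
# c(y, phi) = exp(-y^2/(2*phi)) / sqrt(2*pi*phi).
mu, sigma2 = 1.5, 2.0
yy = np.linspace(-4.0, 6.0, 101)
c = np.exp(-yy**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
pdf_ef = c * np.exp((yy * mu - mu**2 / 2) / sigma2)
assert np.allclose(pdf_ef, stats.norm.pdf(yy, loc=mu, scale=np.sqrt(sigma2)))

print("Poisson and normal exponential-family forms match scipy.stats")
```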
Expectation and variance in the exponential family

• $E[Y] = a'(\theta)$
• $\mathrm{Var}[Y] = \phi\, a''(\theta)$

Proof that $E[Y] = a'(\theta)$

Differentiating $f$ with respect to $\theta$ gives
$$f'(y; \theta, \phi) = \frac{y - a'(\theta)}{\phi}\, f(y; \theta, \phi).$$
Integrating the left-hand side:
$$\int f'(y; \theta, \phi)\,dy = \frac{\partial}{\partial\theta} \int f(y; \theta, \phi)\,dy = \frac{\partial}{\partial\theta}(1) = 0.$$
Integrating the right-hand side:
$$\frac{1}{\phi}\left(\int y\, f(y; \theta, \phi)\,dy - a'(\theta)\int f(y; \theta, \phi)\,dy\right) = \frac{E[Y] - a'(\theta)}{\phi},$$
which gives $E[Y] = a'(\theta)$. This assumes that differentiation and integration can be interchanged.

Proof that $\mathrm{Var}(Y) = \phi\, a''(\theta)$

The second derivative is
$$f''(y; \theta, \phi) = \left[\left(\frac{y - a'(\theta)}{\phi}\right)^{\!2} - \frac{a''(\theta)}{\phi}\right] f(y; \theta, \phi).$$
Integrating the left-hand side:
$$\int f''(y; \theta, \phi)\,dy = \frac{\partial^2}{\partial\theta^2} \int f(y; \theta, \phi)\,dy = \frac{\partial^2}{\partial\theta^2}(1) = 0.$$
Integrating the right-hand side:
$$\int \left[\left(\frac{y - a'(\theta)}{\phi}\right)^{\!2} - \frac{a''(\theta)}{\phi}\right] f(y; \theta, \phi)\,dy = \frac{\mathrm{Var}(Y)}{\phi^2} - \frac{a''(\theta)}{\phi},$$
which gives $\mathrm{Var}(Y) = \phi\, a''(\theta)$.

Example: Poisson distribution, $Y \sim \mathrm{Po}(\lambda)$

• $\theta = \log\lambda$: canonical parameter
• $a(\theta) = \exp(\theta)$
which gives
• $E[Y] = a'(\theta) = \exp(\theta) = \lambda$
• $\mathrm{Var}[Y] = a''(\theta) = \exp(\theta) = \lambda$ (since $\phi = 1$)

Example: Normal distribution, $N(\mu, \sigma^2)$

• $\theta = \mu$ and $a(\theta) = \theta^2/2$
• $\phi = \sigma^2$
which gives
• $E[Y] = a'(\theta) = \theta = \mu$
• $\mathrm{Var}[Y] = \phi\, a''(\theta) = \phi = \sigma^2$

Example: Binomial distribution, $Y \sim \mathrm{Bin}(n, \pi)$

• $\theta = \log(\pi/(1-\pi))$
• $a(\theta) = n \log(1 + \exp(\theta))$
• $\phi = 1$
which gives
• $E[Y] = a'(\theta) = \frac{n\exp(\theta)}{1+\exp(\theta)} = n\pi$
• $\mathrm{Var}[Y] = \phi\, a''(\theta) = \frac{n\exp(\theta)}{(1+\exp(\theta))^2} = n\pi(1-\pi) = \mu(1-\mu/n)$

The variance function $V(\mu)$

$\mathrm{Var}(Y) = \phi\, a''(\theta)$. There is a one-to-one relationship between $\mu = E[Y] = a'(\theta)$ and $\theta$. Therefore we can also express $\theta = \theta(\mu)$ as a function of $\mu$. The variance function is $V(\mu) = a''(\theta(\mu))$, so that $\mathrm{Var}(Y) = \phi\, V(\mu)$. For the most common distributions the expression for $V(\mu)$ is found directly.

Variance functions for some distributions

• Normal distribution: $a''(\theta) = 1$, which gives the variance function $V(\mu) = 1$ (the constant function).
• Poisson distribution: $a''(\theta) = \exp(\theta) = \mu$, i.e. $V(\mu) = \mu$ (the identity function).
• Binomial distribution: $a''(\theta) = \frac{n e^{\theta}}{(1+e^{\theta})^2} = n\pi(1-\pi) = \mu(1-\mu/n)$, i.e. $V(\mu) = \mu(1-\mu/n)$.

Other members of the exponential family

• the gamma distribution, with the exponential distribution as a special case,
• the inverse Gaussian distribution,
• the negative binomial distribution, with the geometric distribution as a special case.

Gamma distribution

pdf:
$$f(y; \mu, \nu) = \frac{y^{-1}}{\Gamma(\nu)} \left(\frac{y\nu}{\mu}\right)^{\!\nu} \exp(-y\nu/\mu) \quad \text{for } y > 0.$$
This belongs to the exponential family with
• $\theta = -1/\mu$ and $a(\theta) = -\log(-\theta)$,
• $\phi = 1/\nu$,
which gives
• $E[Y] = a'(\theta) = -1/\theta = \mu$,
• $\mathrm{Var}[Y] = \phi\, a''(\theta) = \frac{1}{\nu\theta^2} = \frac{\mu^2}{\nu}$,
• $V(\mu) = \mu^2$.
$\nu = 1$ gives the exponential distribution, with pdf $f(y; \mu) = \frac{1}{\mu}\exp(-y/\mu) = \lambda\exp(-\lambda y)$, where $\lambda = 1/\mu$.

Inverse Gaussian distribution

$Y$ has density, for $y > 0$,
$$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi y^3 \sigma^2}} \exp\!\left(-\frac{(y-\mu)^2}{2y\,\mu^2\sigma^2}\right),$$
where $\mu = E[Y]$ and $\mathrm{Var}(Y) = \sigma^2\mu^3$. This belongs to the exponential family with $\theta = -1/(2\mu^2)$, $\phi = \sigma^2$ and $V(\mu) = \mu^3$.

[Figure: densities $f(y)$ of the inverse Gaussian distribution for $\mu = 5$ and $\mu = 20$, each with $\sigma^2 = 0.01$, $0.05$ and $0.1$.]
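The identities $E[Y] = a'(\theta)$ and $\mathrm{Var}[Y] = \phi\, a''(\theta)$ are also easy to check numerically. Below is a minimal sketch (my addition, not from the slides; the parameter values are arbitrary) that differentiates $a(\theta)$ by central finite differences for the binomial and inverse Gaussian parametrizations given above.

```python
import numpy as np

# Central finite differences for a'(theta) and a''(theta).
def d1(a, t, h=1e-5):
    return (a(t + h) - a(t - h)) / (2 * h)

def d2(a, t, h=1e-4):
    return (a(t + h) - 2 * a(t) + a(t - h)) / h**2

# Binomial(n, pi): theta = log(pi/(1-pi)), a(theta) = n*log(1+e^theta), phi = 1.
n, pi = 20, 0.3
t_bin = np.log(pi / (1 - pi))
a_bin = lambda t: n * np.log1p(np.exp(t))
print(d1(a_bin, t_bin), n * pi)              # E[Y] = n*pi = 6
print(d2(a_bin, t_bin), n * pi * (1 - pi))   # Var[Y] = n*pi*(1-pi) = 4.2

# Inverse Gaussian: theta = -1/(2*mu^2), a(theta) = -sqrt(-2*theta), phi = sigma^2.
mu, sigma2 = 5.0, 0.1
t_ig = -1 / (2 * mu**2)
a_ig = lambda t: -np.sqrt(-2 * t)
print(d1(a_ig, t_ig), mu)                        # E[Y] = mu = 5
print(sigma2 * d2(a_ig, t_ig), sigma2 * mu**3)   # Var[Y] = sigma^2 * mu^3 = 12.5
```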
Negative binomial distribution

A useful distribution for over-dispersed counts. pmf:
$$P(Y = y) = \frac{\Gamma(y + r)}{y!\,\Gamma(r)}\,(1-p)^r p^y \quad \text{for } y = 0, 1, 2, \ldots$$
With $\kappa = 1/r$ assumed known, this belongs to the exponential family with
• $\mu = E[Y] = rp/(1-p)$,
• no dispersion parameter ($\phi = 1$),
• $V(\mu) = \mu(1 + \kappa\mu)$.
This distribution may arise, for instance, if $Y \mid \lambda \sim \mathrm{Po}(\lambda)$, where $\lambda$ is a stochastic gamma-distributed variable with expectation $\mu$ and shape parameter $r = 1/\kappa$; see the simulation sketch after the overview table.

Overview of some distributions in the exponential family

| Distribution | $\theta$ | $a(\theta)$ | $\phi$ | $E[Y]$ | $V(\mu)$ |
|---|---|---|---|---|---|
| $\mathrm{Bin}(n, \pi)$ | $\log\frac{\pi}{1-\pi}$ | $n\log(1 + e^{\theta})$ | $1$ | $\mu = n\pi$ | $n\pi(1-\pi) = \mu(1-\mu/n)$ |
| $\mathrm{Po}(\mu)$ | $\log\mu$ | $\exp(\theta)$ | $1$ | $\mu$ | $\mu$ |
| $N(\mu, \sigma^2)$ | $\mu$ | $\frac{\theta^2}{2}$ | $\sigma^2$ | $\mu$ | $1$ |
| $\mathrm{Gamma}(\mu, \nu)$ | $-\frac{1}{\mu}$ | $-\log(-\theta)$ | $\frac{1}{\nu}$ | $\mu$ | $\mu^2$ |
| $\mathrm{IG}(\mu, \sigma^2)$ | $-\frac{1}{2\mu^2}$ | $-\sqrt{-2\theta}$ | $\sigma^2$ | $\mu$ | $\mu^3$ |
| $\mathrm{NB}(\mu, \kappa)$ | $\log\frac{\kappa\mu}{1+\kappa\mu}$ | $-\frac{1}{\kappa}\log(1 - e^{\theta})$ | $1$ | $\mu$ | $\mu(1+\kappa\mu)$ |
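The Poisson–gamma mixture representation of the negative binomial mentioned above is straightforward to verify by simulation. A short sketch (my addition; the sample size and parameter values are arbitrary): drawing $\lambda \sim \mathrm{Gamma}(r, \text{scale} = \mu/r)$, so that $E[\lambda] = \mu$, and then $Y \mid \lambda \sim \mathrm{Po}(\lambda)$ should reproduce mean $\mu$ and variance $\mu(1+\kappa\mu)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, kappa = 4.0, 0.5          # target mean and over-dispersion parameter
r = 1 / kappa                 # gamma shape; scale mu/r makes E[lambda] = mu

# Y | lambda ~ Po(lambda), with lambda ~ Gamma(shape=r, scale=mu/r)
lam = rng.gamma(shape=r, scale=mu / r, size=1_000_000)
y = rng.poisson(lam)

print(y.mean(), mu)                      # ~ mu = 4
print(y.var(), mu * (1 + kappa * mu))    # ~ mu*(1 + kappa*mu) = 12, i.e. phi*V(mu)
```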