Exponential Families

Robert L. Wolpert
Department of Statistical Science
Duke University, Durham, NC, USA

Surprisingly many of the distributions we use in statistics for random variables $X$ taking values in some space $\mathcal{X}$ (often $\mathbb{R}$ or $\mathbb{N}_0$, but sometimes $\mathbb{R}^n$, $\mathbb{Z}$, or some other space), indexed by a parameter $\theta$ from some parameter set $\Theta$, can be written in exponential family form, with pdf or pmf

\[ f(x \mid \theta) = \exp\big[\eta(\theta)\,t(x) - B(\theta)\big]\, h(x) \]

for some statistic $t : \mathcal{X} \to \mathbb{R}$, natural parameter $\eta : \Theta \to \mathbb{R}$, and functions $B : \Theta \to \mathbb{R}$ and $h : \mathcal{X} \to \mathbb{R}_+$. The likelihood function for a random sample of size $n$ from the exponential family is

\[ f_n(x \mid \theta) = \exp\Big[\eta(\theta) \sum_{j=1}^{n} t(x_j) - n B(\theta)\Big] \prod_j h(x_j), \]

which is actually of the same form with the same natural parameter $\eta(\cdot)$, but now with statistic $T_n(x) = \sum t(x_j)$ and functions $B_n(\theta) = nB(\theta)$ and $h_n(x) = \prod h(x_j)$.

Examples

For example, the pmf for the binomial distribution $\mathsf{Bi}(m, p)$ can be written as

\[ \binom{m}{x} p^x (1-p)^{m-x} = \exp\Big[\Big(\log\frac{p}{1-p}\Big) x + m \log(1-p)\Big] \binom{m}{x}, \]

of exponential family form with $\eta(p) = \log\frac{p}{1-p}$ and natural sufficient statistic $t(x) = x$, and the Poisson

\[ \frac{\theta^x}{x!}\, e^{-\theta} = \exp\big[(\log\theta)\, x - \theta\big]\, \frac{1}{x!} \]

with $\eta = \log\theta$ and again $t(x) = x$. The Beta distribution $\mathsf{Be}(\alpha, \beta)$ with either one of its two parameters unknown can be written in EF form too:

\begin{align*} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1} &= \exp\Big[\alpha \log x - \log\frac{\Gamma(\alpha)}{\Gamma(\alpha+\beta)}\Big]\, \frac{(1-x)^{\beta}}{x(1-x)\Gamma(\beta)} \\ &= \exp\Big[\beta \log(1-x) - \log\frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\Big]\, \frac{x^{\alpha}}{x(1-x)\Gamma(\alpha)} \end{align*}

with $t(x) = \log x$ or $\log(1-x)$ when $\eta = \alpha$ or $\eta = \beta$ is unknown, respectively. With both parameters unknown the beta distribution can be written as a bivariate exponential family with parameter $\theta = (\alpha, \beta) \in \mathbb{R}^2_+$:

\[ f(x \mid \theta) = \exp\big[\eta(\theta) \cdot t(x) - B(\theta)\big]\, h(x) \tag{1} \]

with vector parameter $\eta = (\alpha, \beta)$, statistic $t(x) = (\log x, \log(1-x))$, and scalar (one-dimensional) functions $B(\theta) = \log\Gamma(\alpha) + \log\Gamma(\beta) - \log\Gamma(\alpha+\beta)$ and $h(x) = 1/x(1-x)$. Since this comes up often, we'll let $\eta$ and $T$ be $q$-dimensional below; usually in this course $q = 1$ or $2$.

Natural Exponential Families

It is often convenient to reparametrize exponential families to the natural parameter $\eta = \eta(\theta) \in \mathbb{R}^q$, leading (with $A(\eta(\theta)) \equiv B(\theta)$) to

\[ f(x \mid \eta) = e^{\eta \cdot t(x) - A(\eta)}\, h(x). \tag{2} \]

Since any pdf integrates to unity we have

\[ e^{A(\eta)} = \int_{\mathcal{X}} e^{\eta \cdot t(x)}\, h(x)\, dx \]

and hence can calculate the moment generating function (MGF) for the natural sufficient statistic $t(x) = \{t_1(x), \dots, t_q(x)\}$ as

\begin{align*} M_t(s) = \mathsf{E}\big[e^{s \cdot t(X)}\big] &= \int_{\mathcal{X}} e^{s \cdot t(x)}\, e^{\eta \cdot t(x) - A(\eta)}\, h(x)\, dx \\ &= e^{-A(\eta)} \int_{\mathcal{X}} e^{(\eta+s) \cdot t(x)}\, h(x)\, dx \\ &= e^{A(\eta+s) - A(\eta)}, \end{align*}

so $\log M_t(s) = A(\eta+s) - A(\eta)$ and we can find moments for the natural sufficient statistic by

\[ \mathsf{E}[t] = \nabla \log M_t(0) = \nabla A(\eta), \qquad \mathsf{V}[t] = \nabla^2 \log M_t(0) = \nabla^2 A(\eta), \]

provided that $\eta$ is an interior point of the natural parameter space

\[ E \equiv \Big\{\eta \in \mathbb{R}^q : 0 < \int_{\mathcal{X}} e^{\eta \cdot t(x)}\, h(x)\, dx < \infty\Big\} \]

and that $A(\cdot)$ is twice-differentiable near $\eta$. For samples of size $n \in \mathbb{N}$ the sufficient statistic $T_n(x) = \sum t(x_j)$ is a sum of independent random variables, so by the Central Limit Theorem we have, approximately,

\[ T_n \sim \mathsf{No}\big(n \nabla A(\eta),\, n \nabla^2 A(\eta)\big). \]

Note that $\nabla^2 A(\eta) = -\nabla^2 \log f(x \mid \theta)$ is both the observed and Fisher (expected) information (matrix) $I_n(\theta)$ for natural exponential families, and that the score statistic is

\[ Z_n := \nabla \log f(x \mid \theta) = T_n(x) - n \nabla A(\eta). \]
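For a concrete check of $\mathsf{E}[t] = \nabla A(\eta)$ and $\mathsf{V}[t] = \nabla^2 A(\eta)$, here is a minimal Python sketch for the bivariate Beta family of (1), where $A(\eta) = \log\Gamma(\eta_1) + \log\Gamma(\eta_2) - \log\Gamma(\eta_1+\eta_2)$; the parameter values and the numpy/scipy dependency are illustrative choices, not prescribed by the notes:

```python
# Minimal sketch (illustrative values): check E[t] = grad A(eta) and
# V[t] = Hessian A(eta) for the Beta family, where t(x) = (log x, log(1-x))
# and A(eta) = logGamma(eta1) + logGamma(eta2) - logGamma(eta1 + eta2).
import numpy as np
from scipy.special import digamma, polygamma   # psi and psi'
from scipy.stats import beta

a, b = 2.0, 5.0                                # eta = (alpha, beta)

# E[t] = grad A(eta): componentwise psi(eta_i) - psi(eta1 + eta2)
mean_t = np.array([digamma(a) - digamma(a + b),
                   digamma(b) - digamma(a + b)])

# V[t] = Hessian of A(eta); c = psi'(eta1 + eta2) fills the off-diagonals
c = polygamma(1, a + b)
cov_t = np.array([[polygamma(1, a) - c, -c],
                  [-c, polygamma(1, b) - c]])

# Monte Carlo comparison against sample moments of t(X)
x = beta.rvs(a, b, size=200_000, random_state=0)
t = np.column_stack([np.log(x), np.log1p(-x)])
print(mean_t, t.mean(axis=0))                  # should nearly agree
print(cov_t, np.cov(t.T), sep="\n")            # likewise
```

The digamma and trigamma expressions evaluated here are exactly the $\mathsf{Be}(\alpha,\beta)$ entries tabulated at the end of these notes.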
Conjugate Priors

For hyper-parameters $\alpha \in \mathbb{R}^q$ and $\beta \in \mathbb{R}$ such that

\[ c_{\alpha,\beta} := \int_{\Theta} e^{\eta(\theta) \cdot \alpha - \beta B(\theta)}\, d\theta < \infty, \]

we can define a prior density for $\theta$ by

\[ \pi(\theta \mid \alpha, \beta) = c_{\alpha,\beta}^{-1}\, e^{\eta(\theta) \cdot \alpha - \beta B(\theta)}. \]

With this prior and with data $\{X_i\} \stackrel{\text{iid}}{\sim} f(x \mid \theta)$ from the exponential family, the posterior is

\begin{align*} \pi(\theta \mid x) &\propto e^{\eta(\theta) \cdot \alpha - \beta B(\theta)}\, e^{\eta(\theta) \cdot T_n(x) - n B(\theta)} \\ &\propto \pi\big(\theta \mid \alpha^* = \alpha + T_n(x),\, \beta^* = \beta + n\big), \end{align*}

again within the same family but now with parameters $\alpha^* = \alpha + T_n$ and $\beta^* = \beta + n$. For example, in the binomial example above this conjugate prior family is

\[ \pi(\theta \mid \alpha, \beta) \propto \exp\Big[\alpha \log\frac{p}{1-p} + \beta \log(1-p)\Big] = p^{\alpha} (1-p)^{\beta - \alpha}, \]

the Beta family, while for the Poisson example it is

\[ \pi(\theta \mid \alpha, \beta) \propto \exp\{\alpha \log\theta - \beta\theta\} = \theta^{\alpha} e^{-\beta\theta}, \]

the Gamma family. Conjugate families for every exponential family are available in the same way.
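For instance, a minimal Python sketch of the Poisson–Gamma update $\alpha^* = \alpha + T_n$, $\beta^* = \beta + n$ (the hyper-parameter values, sample size, and rate parametrization $\mathsf{Ga}(\alpha, \beta)$ with mean $\alpha/\beta$ are illustrative assumptions):

```python
# Sketch of conjugate updating for the Poisson example: a Gamma(alpha, beta)
# prior (rate parametrization) on theta, iid Poisson(theta) data, and
# posterior Gamma(alpha + T_n, beta + n). Values below are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.5, size=50)        # iid sample; true theta = 3.5

alpha, beta = 2.0, 1.0                   # prior hyper-parameters
T_n = x.sum()                            # natural sufficient statistic
alpha_star = alpha + T_n                 # alpha* = alpha + T_n
beta_star = beta + len(x)                # beta*  = beta + n

print(alpha_star / beta_star)            # posterior mean of theta, near 3.5
```

The posterior mean $(\alpha + T_n)/(\beta + n)$ shrinks the sample mean toward the prior mean $\alpha/\beta$, as is typical of conjugate updates.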
Note that not every distribution we consider is from an exponential family. From (2), for example, it is clear that the set of points where the pdf or pmf is nonzero, the possible values a random variable $X$ can take, is just

\[ \{x \in \mathcal{X} : f(x \mid \theta) > 0\} = \{x \in \mathcal{X} : h(x) > 0\}, \]

which does not depend on the parameter $\theta$; thus any family of distributions whose "support" depends on the parameter (uniform distributions are important examples) can't be from an exponential family.

The next pages show several familiar (and some less familiar ones, like the Inverse Gaussian $\mathsf{IG}(\mu, \lambda)$ and Pareto $\mathsf{Pa}(\alpha, \beta)$) distributions in exponential family form. Some of the formulas involve the log gamma function $\gamma(z) = \log\Gamma(z)$ and its first and second derivatives, the "digamma" $\psi(z) = (d/dz)\gamma(z)$ and "trigamma" $\psi'(z) = (d^2/dz^2)\gamma(z)$, which are built into R, Mathematica, Maple, the gsl library in C, and such, but aren't on pocket calculators or most spreadsheets. In each case $\nabla^2 A(\eta)$ is the information matrix in the natural parametrization, $I(\theta)$ in the usual parametrization.

Exponential Family Examples

$\mathsf{Be}(\alpha, \beta)$:
\[ f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad x \in (0,1) \qquad T = (\log x,\, \log(1-x)) \]
\[ B(\alpha,\beta) = \gamma(\alpha) + \gamma(\beta) - \gamma(\alpha+\beta) \qquad \eta = (\alpha, \beta) \]
\[ A(\eta) = \gamma(\eta_1) + \gamma(\eta_2) - \gamma(\eta_1+\eta_2) \]
\[ \nabla A(\eta) = \begin{pmatrix} \psi(\eta_1) - \psi(\eta_1+\eta_2) \\ \psi(\eta_2) - \psi(\eta_1+\eta_2) \end{pmatrix} \qquad \mathsf{E}T = \begin{pmatrix} \psi(\alpha) - \psi(\alpha+\beta) \\ \psi(\beta) - \psi(\alpha+\beta) \end{pmatrix} \]
\[ \nabla^2 A(\eta) = \begin{pmatrix} \psi'(\eta_1) - c & -c \\ -c & \psi'(\eta_2) - c \end{pmatrix}, \quad c = \psi'(\eta_1+\eta_2) \]

$\mathsf{Bi}(m, p)$:
\[ f(x) = \binom{m}{x} p^x q^{m-x}, \quad x = 0, 1, \dots, m \qquad T = x \]
\[ B(p) = -m \log q \qquad \eta = \log(p/q) \]
\[ A(\eta) = m \log(1 + e^{\eta}) \qquad p = e^{\eta}/(1 + e^{\eta}) \]
\[ \nabla A(\eta) = \frac{m e^{\eta}}{1 + e^{\eta}} \qquad \mathsf{E}T = mp \]
\[ \nabla^2 A(\eta) = \frac{m e^{\eta}}{(1 + e^{\eta})^2} \qquad I(p) = m/pq \]

$\mathsf{Ex}(\lambda)$:
\[ f(x) = \lambda e^{-\lambda x}, \quad x > 0 \qquad T = x \]
\[ B(\lambda) = -\log\lambda \qquad \eta = -\lambda \]
\[ A(\eta) = -\log(-\eta) \]
\[ \nabla A(\eta) = -1/\eta \qquad \mathsf{E}T = 1/\lambda \]
\[ \nabla^2 A(\eta) = \eta^{-2} \qquad I(\lambda) = 1/\lambda^2 \]

$\mathsf{Ga}(\alpha, \lambda)$:
\[ f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \quad x > 0 \qquad T = (\log x,\, x) \]
\[ B(\alpha, \lambda) = \gamma(\alpha) - \alpha \log\lambda \qquad \eta = (\alpha, -\lambda) \]
\[ A(\eta) = \gamma(\eta_1) - \eta_1 \log(-\eta_2) \]
\[ \nabla A(\eta) = \begin{pmatrix} \psi(\eta_1) - \log(-\eta_2) \\ -\eta_1/\eta_2 \end{pmatrix} \qquad \mathsf{E}T = \begin{pmatrix} \psi(\alpha) - \log\lambda \\ \alpha/\lambda \end{pmatrix} \]
\[ \nabla^2 A(\eta) = \begin{pmatrix} \psi'(\eta_1) & -1/\eta_2 \\ -1/\eta_2 & \eta_1/\eta_2^2 \end{pmatrix} \qquad I(\alpha, \lambda) = \begin{pmatrix} \psi'(\alpha) & -1/\lambda \\ -1/\lambda & \alpha/\lambda^2 \end{pmatrix} \]

$\mathsf{Ge}(p)$:
\[ f(x) = p q^x, \quad x = 0, 1, 2, \dots \qquad T = x \]
\[ B(p) = -\log p \qquad \eta = \log q \]
\[ A(\eta) = -\log(1 - e^{\eta}) \qquad p = 1 - e^{\eta} \]
\[ \nabla A(\eta) = \frac{e^{\eta}}{1 - e^{\eta}} \qquad \mathsf{E}T = q/p \]
\[ \nabla^2 A(\eta) = \frac{e^{\eta}}{(1 - e^{\eta})^2} \qquad I(p) = 1/p^2 q \]

$\mathsf{IG}(a, b)$:
\[ f(x) = \frac{a\, e^{-(a - bx)^2/2x}}{\sqrt{2\pi x^3}}, \quad x > 0 \qquad T = (1/x,\, x) \]
\[ B(a, b) = -ab - \log a \qquad \eta = (-a^2/2,\, -b^2/2) \]
\[ A(\eta) = -2\sqrt{\eta_1 \eta_2} - \tfrac{1}{2}\log(-2\eta_1) \qquad a = \sqrt{-2\eta_1}, \; b = \sqrt{-2\eta_2} \]
\[ \nabla A(\eta) = \begin{pmatrix} \sqrt{\eta_2/\eta_1} - 1/2\eta_1 \\ \sqrt{\eta_1/\eta_2} \end{pmatrix} \qquad \mathsf{E}T = \begin{pmatrix} b/a + 1/a^2 \\ a/b \end{pmatrix} \]
\[ \nabla^2 A(\eta) = \begin{pmatrix} \tfrac{1}{2}\sqrt{\eta_2/\eta_1^3} + \tfrac{1}{2\eta_1^2} & -\tfrac{1}{2\sqrt{\eta_1\eta_2}} \\ -\tfrac{1}{2\sqrt{\eta_1\eta_2}} & \tfrac{1}{2}\sqrt{\eta_1/\eta_2^3} \end{pmatrix} \qquad I(a, b) = \begin{pmatrix} b/a + 2/a^2 & -1 \\ -1 & a/b \end{pmatrix} \]

$\mathsf{NB}(\alpha, p)$:
\[ f(x) = \binom{-\alpha}{x} p^{\alpha} (-q)^x, \quad x = 0, 1, 2, \dots \qquad T = x \]
\[ B(p) = -\alpha \log p \qquad \eta = \log q \]
\[ A(\eta) = -\alpha \log(1 - e^{\eta}) \qquad p = 1 - e^{\eta} \]
\[ \nabla A(\eta) = \frac{\alpha e^{\eta}}{1 - e^{\eta}} \qquad \mathsf{E}T = \alpha q/p \]
\[ \nabla^2 A(\eta) = \frac{\alpha e^{\eta}}{(1 - e^{\eta})^2} \qquad I(p) = \alpha/p^2 q \]

$\mathsf{No}(\mu, \sigma^2)$:
\[ f(x) = \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}} \qquad T = (x,\, x^2) \]
\[ B(\mu, \sigma^2) = \mu^2/2\sigma^2 + \tfrac{1}{2}\log\sigma^2 \qquad \eta = (\mu\sigma^{-2},\, -\sigma^{-2}/2) \]
\[ A(\eta) = -\eta_1^2/4\eta_2 - \tfrac{1}{2}\log(-2\eta_2) \]
\[ \nabla A(\eta) = \begin{pmatrix} -\eta_1/2\eta_2 \\ \eta_1^2/4\eta_2^2 - 1/2\eta_2 \end{pmatrix} \qquad \mathsf{E}T = \begin{pmatrix} \mu \\ \mu^2 + \sigma^2 \end{pmatrix} \]
\[ \nabla^2 A(\eta) = \begin{pmatrix} -1/2\eta_2 & \eta_1/2\eta_2^2 \\ \eta_1/2\eta_2^2 & -\eta_1^2/2\eta_2^3 + 1/2\eta_2^2 \end{pmatrix} \qquad I(\mu, \sigma^2) = \begin{pmatrix} \sigma^{-2} & 0 \\ 0 & \sigma^{-4}/2 \end{pmatrix} \]

$\mathsf{Po}(\lambda)$:
\[ f(x) = \lambda^x e^{-\lambda}/x!, \quad x = 0, 1, 2, \dots \qquad T = x \]
\[ B(\lambda) = \lambda \qquad \eta = \log\lambda \]
\[ A(\eta) = e^{\eta} \qquad \lambda = e^{\eta} \]
\[ \nabla A(\eta) = e^{\eta} \qquad \mathsf{E}T = \lambda \]
\[ \nabla^2 A(\eta) = e^{\eta} \qquad I(\lambda) = 1/\lambda \]
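These entries are easy to spot-check numerically. For example, the $\mathsf{Ga}(\alpha, \lambda)$ row predicts $\mathsf{E}T = (\psi(\alpha) - \log\lambda,\ \alpha/\lambda)$; here is a rough Monte Carlo sketch in Python (the parameter values are arbitrary):

```python
# Rough Monte Carlo spot-check of the Ga(alpha, lambda) table row:
# T = (log x, x) with E[T] = (psi(alpha) - log(lambda), alpha / lambda).
# Parameter values are arbitrary illustrative choices.
import numpy as np
from scipy.special import digamma
from scipy.stats import gamma

a, lam = 3.0, 2.0
x = gamma.rvs(a, scale=1.0 / lam, size=200_000, random_state=0)

print(digamma(a) - np.log(lam), np.log(x).mean())   # E[log x]: formula vs MC
print(a / lam, x.mean())                            # E[x]:     formula vs MC
```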