
Exponential Families

Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Surprisingly many of the distributions we use in statistics for random variables $X$ taking values in some space $\mathcal{X}$ (often $\mathbb{R}$ or $\mathbb{N}_0$ but sometimes $\mathbb{R}^n$, $\mathbb{Z}$, or some other space), indexed by a parameter $\theta$ from some parameter space $\Theta$, can be written in exponential family form, with pdf or pmf

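$$f(x \mid \theta) = \exp\left[\eta(\theta)\,t(x) - B(\theta)\right] h(x)$$

for some natural sufficient statistic $t : \mathcal{X} \to \mathbb{R}$, natural parameter $\eta : \Theta \to \mathbb{R}$, and functions $B : \Theta \to \mathbb{R}$ and $h : \mathcal{X} \to \mathbb{R}_+$. The likelihood function for a random sample of size $n$ from the exponential family is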

$$f_n(x \mid \theta) = \exp\Big[\eta(\theta) \sum_{j=1}^{n} t(x_j) - n B(\theta)\Big] \prod_{j=1}^{n} h(x_j),$$

which is actually of the same form with the same natural parameter $\eta(\cdot)$, but now with statistic $T_n(x) = \sum_j t(x_j)$ and functions $B_n(\theta) = n B(\theta)$ and $h_n(x) = \prod_j h(x_j)$.
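The claim that the sample density depends on the data only through $T_n(x)$ (apart from the factor $h_n(x)$, which is free of $\theta$) can be illustrated numerically. The sketch below is an added illustration, not part of the original notes: it uses the Poisson family, and the function names and the two data sets are hypothetical choices.

```python
import math

def poisson_log_joint(xs, theta):
    """Log joint pmf of an iid Poisson(theta) sample in exponential family form."""
    n = len(xs)
    t_n = sum(xs)                                    # T_n(x) = sum_j t(x_j), with t(x) = x
    log_h_n = -sum(math.lgamma(x + 1) for x in xs)   # log h_n(x) = -sum_j log(x_j!)
    # eta(theta) * T_n(x) - n * B(theta) + log h_n(x), with eta = log(theta), B = theta
    return math.log(theta) * t_n - n * theta + log_h_n

def log_likelihood_ratio(xs, theta1, theta0):
    """log f_n(x | theta1) - log f_n(x | theta0); the h_n(x) term cancels."""
    return poisson_log_joint(xs, theta1) - poisson_log_joint(xs, theta0)

sample_a = [0, 3, 2, 5]   # n = 4, T_n = 10
sample_b = [4, 4, 1, 1]   # different data, same n and same T_n
ratio_a = log_likelihood_ratio(sample_a, 2.0, 1.0)
ratio_b = log_likelihood_ratio(sample_b, 2.0, 1.0)
print(ratio_a, ratio_b)   # the two values agree up to rounding
```

Both samples have $n = 4$ and $T_n = 10$, so any comparison between two values of $\theta$ comes out the same for either sample.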

Examples

For example, the pmf for the binomial distribution $\mathrm{Bi}(m,p)$ can be written as

$$\binom{m}{x} p^{x} (1-p)^{m-x} = \exp\left[\Big(\log\frac{p}{1-p}\Big)\, x + m \log(1-p)\right] \binom{m}{x},$$

of Exponential Family form with $\eta(p) = \log\frac{p}{1-p}$ and natural sufficient statistic $t(x) = x$, and the Poisson pmf

$$\frac{\theta^{x}}{x!}\, e^{-\theta} = \exp\left[(\log\theta)\,x - \theta\right] \frac{1}{x!}$$

with $\eta = \log\theta$ and again $t(x) = x$. The beta distribution $\mathrm{Be}(\alpha,\beta)$ with either one of its two parameters unknown can be written in EF form too:

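$$\begin{aligned}
\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}
&= \exp\left[\alpha \log x - \log\frac{\Gamma(\alpha)}{\Gamma(\alpha+\beta)}\right] \frac{(1-x)^{\beta}}{x(1-x)\,\Gamma(\beta)} \\
&= \exp\left[\beta \log(1-x) - \log\frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\right] \frac{x^{\alpha}}{x(1-x)\,\Gamma(\alpha)}
\end{aligned}$$

with $t(x) = \log x$ or $\log(1-x)$ when $\eta = \alpha$ or $\eta = \beta$ is unknown, respectively. With both unknown the beta distribution can be written as a bivariate Exponential Family with parameter $\theta = (\alpha,\beta) \in \mathbb{R}_+^2$:

$$f(x \mid \theta) = \exp\left[\eta(\theta)\cdot t(x) - B(\theta)\right] h(x) \tag{1}$$

with vector parameter $\eta = (\alpha,\beta)$ and statistic $t(x) = (\log x,\ \log(1-x))$ and scalar (one-dimensional) functions $B(\theta) = \log\Gamma(\alpha) + \log\Gamma(\beta) - \log\Gamma(\alpha+\beta)$ and $h(x) = 1/[x(1-x)]$. Since this comes up often, we'll let $\eta$ and $t$ be $q$-dimensional below; usually in this course $q = 1$ or $2$.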
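The three factorizations above can be checked by evaluating each density both directly and in the form $\exp[\eta\cdot t(x) - B]\,h(x)$. This is a minimal added sketch using only the Python standard library; the function names and the evaluation points are hypothetical, and the binomial uses $B(p) = -m\log(1-p)$.

```python
import math

def binomial_pmf_direct(x, m, p):
    return math.comb(m, x) * p**x * (1 - p)**(m - x)

def binomial_pmf_ef(x, m, p):
    eta = math.log(p / (1 - p))          # natural parameter
    B = -m * math.log(1 - p)             # B(p) = -m log(1 - p)
    return math.exp(eta * x - B) * math.comb(m, x)   # h(x) = C(m, x)

def poisson_pmf_direct(x, theta):
    return theta**x * math.exp(-theta) / math.factorial(x)

def poisson_pmf_ef(x, theta):
    # eta = log(theta), t(x) = x, B(theta) = theta, h(x) = 1/x!
    return math.exp(math.log(theta) * x - theta) / math.factorial(x)

def beta_pdf_direct(x, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x**(a - 1) * (1 - x)**(b - 1)

def beta_pdf_ef(x, a, b):
    # eta = (a, b), t(x) = (log x, log(1 - x)), h(x) = 1/[x(1 - x)]
    eta_dot_t = a * math.log(x) + b * math.log(1 - x)
    B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(eta_dot_t - B) / (x * (1 - x))

gaps = []
for x in range(6):
    gaps.append(abs(binomial_pmf_direct(x, 5, 0.3) - binomial_pmf_ef(x, 5, 0.3)))
    gaps.append(abs(poisson_pmf_direct(x, 2.5) - poisson_pmf_ef(x, 2.5)))
for x in (0.1, 0.5, 0.9):
    gaps.append(abs(beta_pdf_direct(x, 2.0, 3.5) - beta_pdf_ef(x, 2.0, 3.5)))
max_gap = max(gaps)
print(max_gap)   # differences are at the level of floating-point rounding
```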

Natural Exponential Families

It is often convenient to reparametrize exponential families to the natural parameter $\eta = \eta(\theta) \in \mathbb{R}^q$, leading (with $A(\eta(\theta)) \equiv B(\theta)$) to

$$f(x \mid \eta) = e^{\eta\cdot t(x) - A(\eta)}\, h(x) \tag{2}$$

Since any pdf integrates to unity we have

$$e^{A(\eta)} = \int_{\mathcal{X}} e^{\eta\cdot t(x)}\, h(x)\, dx$$

and hence can calculate the moment generating function (MGF) for the natural sufficient statistic $t(x) = \{t_1(x), \dots, t_q(x)\}$ as

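$$\begin{aligned}
M_t(s) &= \mathrm{E}\left[e^{s\cdot t(X)}\right] \\
&= \int_{\mathcal{X}} e^{s\cdot t(x)}\, e^{\eta\cdot t(x) - A(\eta)}\, h(x)\, dx \\
&= e^{-A(\eta)} \int_{\mathcal{X}} e^{(\eta+s)\cdot t(x)}\, h(x)\, dx \\
&= e^{A(\eta+s) - A(\eta)},
\end{aligned}$$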

so $\log M_t(s) = A(\eta+s) - A(\eta)$ and we can find moments for the natural sufficient statistic by

$$\begin{aligned}
\mathrm{E}[t] &= \nabla \log M_t(0) = \nabla A(\eta) \\
\mathrm{V}[t] &= \nabla^2 \log M_t(0) = \nabla^2 A(\eta)
\end{aligned}$$

provided that $\eta$ is an interior point of the natural parameter space

$$\mathcal{E} \equiv \Big\{ \eta \in \mathbb{R}^q : 0 < \int_{\mathcal{X}} e^{\eta\cdot t(x)}\, h(x)\, dx < \infty \Big\}$$

and that $A(\cdot)$ is twice-differentiable near $\eta$. For samples of size $n \in \mathbb{N}$ the sufficient statistic

$$T_n(x) = \sum_{j=1}^{n} t(x_j)$$

is a sum of independent random variables, so by the Central Limit Theorem we have approximately

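$$T_n(x) \sim \mathrm{No}\big( n \nabla A(\eta),\ n \nabla^2 A(\eta) \big).$$

Note that $n\nabla^2 A(\eta) = -\nabla^2 \log f_n(x \mid \eta)$ (derivatives taken with respect to $\eta$) is both the observed and Fisher (expected) information (matrix) $I_n(\eta)$ for natural exponential families, and that the score statistic is

$$Z := \nabla \log f_n(x \mid \eta) = T_n(x) - n \nabla A(\eta).$$

Conjugate Priors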

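For hyper-parameters $\alpha \in \mathbb{R}^q$ and $\beta \in \mathbb{R}$ such that

$$c_{\alpha,\beta} := \int_{\Theta} e^{\eta(\theta)\cdot\alpha - \beta B(\theta)}\, d\theta < \infty,$$

we can define a prior density for $\theta$ by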

$$\pi(\theta \mid \alpha, \beta) = c_{\alpha,\beta}^{-1}\, e^{\eta(\theta)\cdot\alpha - \beta B(\theta)}.$$
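As a worked instance of the integrability condition (an added illustration, not part of the original notes), take the Poisson family from the Examples section, where $\eta(\theta) = \log\theta$ and $B(\theta) = \theta$ on $\Theta = (0,\infty)$. The normalizing constant is then a gamma integral:

```latex
c_{\alpha,\beta}
  = \int_0^\infty e^{\alpha\log\theta - \beta\theta}\, d\theta
  = \int_0^\infty \theta^{\alpha} e^{-\beta\theta}\, d\theta
  = \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}},
\qquad \text{finite exactly when } \alpha > -1 \text{ and } \beta > 0 .
```

In this case the prior is the gamma density with shape $\alpha + 1$ and rate $\beta$.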

With this prior and with $\{X_i\} \overset{\text{iid}}{\sim} f(x \mid \theta)$ from the exponential family, the posterior is

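$$\begin{aligned}
\pi(\theta \mid x) &\propto e^{\eta(\theta)\cdot\alpha - \beta B(\theta)}\; e^{\eta(\theta)\cdot T_n(x) - n B(\theta)} \\
&\propto \pi(\theta \mid \alpha^* = \alpha + T_n(x),\ \beta^* = \beta + n),
\end{aligned}$$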

again within the same family but now with parameters $\alpha^* = \alpha + T_n$ and $\beta^* = \beta + n$. For example, in the binomial example above this family is

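$$\pi(\theta \mid \alpha, \beta) \propto \exp\left[\alpha \log\frac{p}{1-p} + \beta \log(1-p)\right] = p^{\alpha} (1-p)^{\beta-\alpha},$$

the Beta family (here $\theta = p$, taking $m = 1$), while for the Poisson example it is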

$$\pi(\theta \mid \alpha, \beta) \propto \exp\{\alpha \log\theta - \beta\theta\} = \theta^{\alpha} e^{-\beta\theta},$$

the Gamma family. Conjugate families for every exponential family are available in the same way.

Note that not every distribution we consider is from an exponential family. From (2), for example, it is clear that the set of points where the pdf or pmf is nonzero, the possible values a random variable $X$ can take, is just

$$\{x \in \mathcal{X} : f(x \mid \theta) > 0\} = \{x \in \mathcal{X} : h(x) > 0\},$$

which does not depend on the parameter $\theta$; thus any family of distributions where the "support" depends on the parameter (uniform distributions are important examples) can't be from an exponential family.

The examples that follow show several familiar distributions (and some less familiar ones, like the Inverse Gaussian IG(µ, λ) and Pareto Pa(α, β)) in exponential family form. Some of the formulas involve the log gamma function $\gamma(z) = \log\Gamma(z)$ and its first and second derivatives, the "digamma" $\psi(z) = (d/dz)\gamma(z)$ and "trigamma" $\psi'(z) = (d^2/dz^2)\gamma(z)$, which are built into R, Mathematica, Maple, the gsl library in C, and such, but aren't on pocket calculators or most spreadsheets. In each case $\nabla^2 A(\eta)$ is the Information matrix in the natural parametrization, $I(\theta)$ in the usual parameterization.
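Looking back at the Conjugate Priors section, the update rule $\alpha^* = \alpha + T_n$, $\beta^* = \beta + n$ can be illustrated for the Poisson family, where $\eta(\theta) = \log\theta$ and $B(\theta) = \theta$. The sketch below is an added illustration under those assumptions; the function names, hyper-parameter values, and data are hypothetical. It checks that prior times likelihood matches the updated prior kernel at several values of $\theta$.

```python
import math

def conjugate_update(alpha, beta, t_values):
    """Posterior hyper-parameters: alpha* = alpha + T_n, beta* = beta + n."""
    return alpha + sum(t_values), beta + len(t_values)

def log_prior_kernel(theta, alpha, beta):
    """log of exp[eta(theta) * alpha - beta * B(theta)] for the Poisson family."""
    return alpha * math.log(theta) - beta * theta

def log_likelihood_kernel(theta, xs):
    """log of exp[eta(theta) * T_n(x) - n * B(theta)]; h_n(x) is dropped (free of theta)."""
    return math.log(theta) * sum(xs) - len(xs) * theta

xs = [2, 0, 3, 1, 4]            # hypothetical Poisson counts, T_n = 10, n = 5
alpha0, beta0 = 1.5, 0.5        # hypothetical prior hyper-parameters
alpha_star, beta_star = conjugate_update(alpha0, beta0, xs)

# log(prior * likelihood) minus the updated log kernel should vanish for every theta
gaps = [
    log_prior_kernel(th, alpha0, beta0) + log_likelihood_kernel(th, xs)
    - log_prior_kernel(th, alpha_star, beta_star)
    for th in (0.3, 1.0, 2.7, 6.0)
]

# theta^alpha* e^{-beta* theta} is the kernel of a gamma density with
# shape alpha* + 1 and rate beta*, whose mean is (alpha* + 1) / beta*
posterior_mean = (alpha_star + 1) / beta_star
print(alpha_star, beta_star, posterior_mean)
```

Only $T_n$ and $n$ enter the update, so the data can be discarded once those two numbers are recorded.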

1 Exponential Family Examples

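$\mathrm{Be}(\alpha,\beta)$: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$, $x \in (0,1)$
  $T = (\log x,\ \log(1-x))$
  $B(\alpha,\beta) = \gamma(\alpha) + \gamma(\beta) - \gamma(\alpha+\beta)$, $\eta = (\alpha,\beta)$
  $A(\eta) = \gamma(\eta_1) + \gamma(\eta_2) - \gamma(\eta_1+\eta_2)$
  $\nabla A(\eta) = \begin{pmatrix} \psi(\eta_1) - \psi(\eta_1+\eta_2) \\ \psi(\eta_2) - \psi(\eta_1+\eta_2) \end{pmatrix}$, $\mathrm{E}T = \begin{pmatrix} \psi(\alpha) - \psi(\alpha+\beta) \\ \psi(\beta) - \psi(\alpha+\beta) \end{pmatrix}$
  $\nabla^2 A(\eta) = \begin{pmatrix} \psi'(\eta_1) - c & -c \\ -c & \psi'(\eta_2) - c \end{pmatrix}$, $c = \psi'(\eta_1+\eta_2)$

$\mathrm{Bi}(m,p)$: $f(x) = \binom{m}{x} p^{x} q^{m-x}$, $x = 0, \dots, m$
  $T = x$
  $B(p) = -m \log q$, $\eta = \log(p/q)$
  $A(\eta) = m \log(1 + e^{\eta})$, $p = e^{\eta}/(1 + e^{\eta})$
  $\nabla A(\eta) = \frac{m e^{\eta}}{1 + e^{\eta}}$, $\mathrm{E}T = mp$
  $\nabla^2 A(\eta) = \frac{m e^{\eta}}{(1 + e^{\eta})^2}$, $I(p) = m/pq$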

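$\mathrm{Ex}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$, $x > 0$
  $T = x$
  $B(\lambda) = -\log\lambda$, $\eta = -\lambda$
  $A(\eta) = -\log(-\eta)$
  $\nabla A(\eta) = -1/\eta$, $\mathrm{E}T = 1/\lambda$
  $\nabla^2 A(\eta) = \eta^{-2}$, $I(\lambda) = 1/\lambda^2$

$\mathrm{Ga}(\alpha,\lambda)$: $f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}$, $x > 0$
  $T = (\log x,\ x)$
  $B(\alpha,\lambda) = \gamma(\alpha) - \alpha \log\lambda$, $\eta = (\alpha, -\lambda)$
  $A(\eta) = \gamma(\eta_1) - \eta_1 \log(-\eta_2)$
  $\nabla A(\eta) = \begin{pmatrix} \psi(\eta_1) - \log(-\eta_2) \\ -\eta_1/\eta_2 \end{pmatrix}$, $\mathrm{E}T = \begin{pmatrix} \psi(\alpha) - \log\lambda \\ \alpha/\lambda \end{pmatrix}$
  $\nabla^2 A(\eta) = \begin{pmatrix} \psi'(\eta_1) & -1/\eta_2 \\ -1/\eta_2 & \eta_1/\eta_2^2 \end{pmatrix}$, $I(\alpha,\lambda) = \begin{pmatrix} \psi'(\alpha) & -1/\lambda \\ -1/\lambda & \alpha/\lambda^2 \end{pmatrix}$

$\mathrm{Ge}(p)$: $f(x) = p q^{x}$, $x = 0, 1, 2, \dots$ (with $q = 1 - p$)
  $T = x$
  $B(p) = -\log p$, $\eta = \log q$
  $A(\eta) = -\log(1 - e^{\eta})$, $p = 1 - e^{\eta}$
  $\nabla A(\eta) = \frac{e^{\eta}}{1 - e^{\eta}}$, $\mathrm{E}T = q/p$
  $\nabla^2 A(\eta) = \frac{e^{\eta}}{(1 - e^{\eta})^2}$, $I(p) = 1/p^2 q$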


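$\mathrm{IG}(a,b)$: $f(x) = a\, e^{-(a-bx)^2/2x} / \sqrt{2\pi x^3}$, $x > 0$
  $T = (1/x,\ x)$
  $B(a,b) = -ab - \log a$, $\eta = (-a^2/2,\ -b^2/2)$
  $A(\eta) = -2\sqrt{\eta_1\eta_2} - \tfrac{1}{2}\log(-2\eta_1)$, $a = \sqrt{-2\eta_1}$, $b = \sqrt{-2\eta_2}$
  $\nabla A(\eta) = \begin{pmatrix} \sqrt{\eta_2/\eta_1} - 1/2\eta_1 \\ \sqrt{\eta_1/\eta_2} \end{pmatrix}$, $\mathrm{E}T = \begin{pmatrix} b/a + 1/a^2 \\ a/b \end{pmatrix}$
  $\nabla^2 A(\eta) = \frac{1}{2} \begin{pmatrix} \sqrt{\eta_2/\eta_1^3} + 1/\eta_1^2 & -1/\sqrt{\eta_1\eta_2} \\ -1/\sqrt{\eta_1\eta_2} & \sqrt{\eta_1/\eta_2^3} \end{pmatrix}$, $I(a,b) = \begin{pmatrix} b/a + 2/a^2 & -1 \\ -1 & a/b \end{pmatrix}$

$\mathrm{NB}(\alpha,p)$: $f(x) = \binom{-\alpha}{x} p^{\alpha} (-q)^{x}$, $x = 0, 1, 2, \dots$
  $T = x$
  $B(p) = -\alpha \log p$, $\eta = \log q$
  $A(\eta) = -\alpha \log(1 - e^{\eta})$, $p = 1 - e^{\eta}$
  $\nabla A(\eta) = \frac{\alpha e^{\eta}}{1 - e^{\eta}}$, $\mathrm{E}T = \alpha q/p$
  $\nabla^2 A(\eta) = \frac{\alpha e^{\eta}}{(1 - e^{\eta})^2}$, $I(p) = \alpha/p^2 q$

$\mathrm{No}(\mu,\sigma^2)$: $f(x) = e^{-(x-\mu)^2/2\sigma^2} / \sqrt{2\pi\sigma^2}$
  $T = (x,\ x^2)$
  $B(\mu,\sigma^2) = \mu^2/2\sigma^2 + \tfrac{1}{2}\log\sigma^2$, $\eta = (\mu\sigma^{-2},\ -\sigma^{-2}/2)$
  $A(\eta) = -\eta_1^2/4\eta_2 - \tfrac{1}{2}\log(-2\eta_2)$
  $\nabla A(\eta) = \begin{pmatrix} -\eta_1/2\eta_2 \\ \eta_1^2/4\eta_2^2 - 1/2\eta_2 \end{pmatrix}$, $\mathrm{E}T = \begin{pmatrix} \mu \\ \mu^2 + \sigma^2 \end{pmatrix}$
  $\nabla^2 A(\eta) = \begin{pmatrix} -1/2\eta_2 & \eta_1/2\eta_2^2 \\ \eta_1/2\eta_2^2 & -\eta_1^2/2\eta_2^3 + 1/2\eta_2^2 \end{pmatrix}$, $I(\mu,\sigma^2) = \begin{pmatrix} \sigma^{-2} & 0 \\ 0 & \sigma^{-4}/2 \end{pmatrix}$

$\mathrm{Po}(\lambda)$: $f(x) = \lambda^{x} e^{-\lambda}/x!$, $x = 0, 1, 2, \dots$
  $T = x$
  $B(\lambda) = \lambda$, $\eta = \log\lambda$
  $A(\eta) = e^{\eta}$, $\lambda = e^{\eta}$
  $\nabla A(\eta) = e^{\eta}$, $\mathrm{E}T = \lambda$
  $\nabla^2 A(\eta) = e^{\eta}$, $I(\lambda) = 1/\lambda$

$\mathrm{Pa}(\alpha,\beta)$: $f(x) = \beta\,\alpha^{\beta}/x^{\beta+1}$, $x > \alpha$
  $T = \log x$
  $B(\beta) = -\log\beta - \beta\log\alpha$, $\eta = -\beta$
  $A(\eta) = -\log(-\eta) + \eta\log\alpha$
  $\nabla A(\eta) = \log\alpha - 1/\eta$, $\mathrm{E}T = \log\alpha + 1/\beta$
  $\nabla^2 A(\eta) = \eta^{-2}$, $I(\beta) = \beta^{-2}$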
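Entries in these tables can be spot-checked with the identities $\mathrm{E}[t] = \nabla A(\eta)$ and $\mathrm{V}[t] = \nabla^2 A(\eta)$ from the Natural Exponential Families section. The following added sketch differentiates $A$ numerically for the binomial row and compares with moments computed directly from the pmf. It also approximates the digamma and trigamma functions by finite differences of `math.lgamma`, which is a rough stand-in for a library implementation, not a replacement for one. All names, step sizes, and parameter values are hypothetical choices.

```python
import math

def d1(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-3):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Binomial row: A(eta) = m log(1 + e^eta), with eta = log(p/q).
m, p = 7, 0.3
eta = math.log(p / (1.0 - p))

def A_binomial(e):
    return m * math.log(1.0 + math.exp(e))

# Moments of T = x computed directly from the pmf, for comparison.
pmf = [math.comb(m, x) * p**x * (1.0 - p)**(m - x) for x in range(m + 1)]
mean_direct = sum(x * w for x, w in enumerate(pmf))
var_direct = sum((x - mean_direct) ** 2 * w for x, w in enumerate(pmf))

mean_from_A = d1(A_binomial, eta)   # approximates E T = m p
var_from_A = d2(A_binomial, eta)    # approximates V T = m p q

def digamma(z):
    """Finite-difference approximation to psi(z) = (d/dz) log Gamma(z)."""
    return d1(math.lgamma, z)

def trigamma(z):
    """Finite-difference approximation to psi'(z) = (d^2/dz^2) log Gamma(z)."""
    return d2(math.lgamma, z)

# Gamma row, first coordinate of E T: E log X = psi(alpha) - log(lambda).
alpha, lam = 3.0, 2.0
mean_log_x = digamma(alpha) - math.log(lam)

print(mean_from_A, mean_direct)
print(var_from_A, var_direct)
print(digamma(alpha), trigamma(alpha), mean_log_x)
```

The same two helper functions apply unchanged to any one-parameter row; for two-parameter rows each partial derivative would be taken with the other coordinate held fixed.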
