
Exponential Families

The random vector $Y = (Y_1,\dots,Y_n)^T$ has a distribution from an exponential family if the density of $Y$ is of the form

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big)$$

where

$$\eta(\theta) = \big(\eta_1(\theta),\dots,\eta_d(\theta)\big)^T, \qquad T(y) = \big(T_1(y),\dots,T_d(y)\big)^T.$$

◦ $\eta = (\eta_1,\dots,\eta_d)^T$ is the natural parameter of the family.
◦ $T = T(Y)$ is a sufficient statistic for $\theta$ (or for $\eta$).

Definition: A statistic $T = T(Y_1,\dots,Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $Y_1,\dots,Y_n$ given $T = t$ does not depend on $\theta$ for any value of $t$.

A sufficient statistic for $\theta$ contains all the information in the sample about $\theta$. Thus, given the value of $T$, we cannot improve our knowledge about $\theta$ by a more detailed analysis of the data $Y_1,\dots,Y_n$. In other words, an estimate based on $T = t$ cannot be improved by using the data $Y_1,\dots,Y_n$.

Sufficiency and exponential families: Rice (1995), pp. 280-284.

Example: Consider a sequence of independent Bernoulli trials, $Y_i \overset{iid}{\sim} \mathrm{Bin}(1,\theta)$. The number of successes, $S_n = \sum_{i=1}^n Y_i$, is sufficient for the parameter $\theta$. Additional information about the observed values $Y_1,\dots,Y_n$, such as the order in which the successes occurred, does not convey any information about $\theta$.
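As a minimal numerical sketch (not part of the original notes; the function name is my own), the Bernoulli likelihood depends on the sample only through $S_n$: two samples with the same number of successes give identical likelihoods for every $\theta$.

```python
import math

def bernoulli_likelihood(y, theta):
    """Likelihood of an i.i.d. Bernoulli sample: theta^Sn * (1-theta)^(n-Sn)."""
    s = sum(y)
    n = len(y)
    return theta ** s * (1 - theta) ** (n - s)

# Two samples with the same number of successes Sn = 3 but different orderings
y1 = [1, 1, 1, 0, 0]
y2 = [0, 1, 0, 1, 1]

# The likelihood depends on the data only through Sn, so the two agree
for theta in (0.2, 0.5, 0.8):
    assert math.isclose(bernoulli_likelihood(y1, theta),
                        bernoulli_likelihood(y2, theta))
```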

Exponential Families, Apr 13, 2004 - 1 -

Examples

Binomial distribution: $Y \sim \mathrm{Bin}(n,\theta)$

$$f_Y(y|\theta) = \binom{n}{y}\,\theta^y (1-\theta)^{n-y} = \binom{n}{y}\,\exp\Big(y \log\tfrac{\theta}{1-\theta} + n\log(1-\theta)\Big)$$

with

$$\eta(\theta) = \log\frac{\theta}{1-\theta}, \qquad T(y) = y.$$

Normal distribution: $Y \sim N(\mu,\sigma^2)$

$$f_Y(y|\mu,\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\Big(-\frac{1}{2\sigma^2}(y-\mu)^2\Big) = \exp\Big(-\frac{1}{2\sigma^2}\,y^2 + \frac{\mu}{\sigma^2}\,y - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\Big)$$

with

$$\eta(\mu,\sigma^2) = \Big(-\frac{1}{2\sigma^2},\; \frac{\mu}{\sigma^2}\Big)^T, \qquad T(y) = (y^2, y)^T.$$

If $Y = (Y_1,\dots,Y_n)^T$ with $Y_i \overset{iid}{\sim} N(\mu,\sigma^2)$, then

$$T(y) = \Big(\sum_{i=1}^n y_i^2,\; \sum_{i=1}^n y_i\Big)^T.$$
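The algebra above can be checked numerically. The sketch below (assuming nothing beyond the standard library; function names are my own) evaluates the $N(\mu,\sigma^2)$ density both in its usual form and in the exponential-family form $\exp(\eta^T T(y) - a)$ and confirms they agree.

```python
import math

def normal_density(y, mu, sigma2):
    """Standard form of the N(mu, sigma2) density."""
    return (2 * math.pi * sigma2) ** -0.5 * math.exp(-(y - mu) ** 2 / (2 * sigma2))

def normal_density_expfam(y, mu, sigma2):
    """Exponential-family form: exp(eta^T T(y) - a), with
    eta = (-1/(2 sigma2), mu/sigma2) and T(y) = (y^2, y)."""
    eta = (-1 / (2 * sigma2), mu / sigma2)
    T = (y ** 2, y)
    a = mu ** 2 / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp(eta[0] * T[0] + eta[1] * T[1] - a)

# The two forms agree at arbitrary points
for y in (-1.5, 0.0, 2.3):
    assert math.isclose(normal_density(y, 1.0, 4.0),
                        normal_density_expfam(y, 1.0, 4.0))
```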


Gamma distribution: $Y \sim \Gamma(\alpha,\lambda)$

$$f_Y(y|\alpha,\lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\,y^{\alpha-1}\exp(-\lambda y) = y^{-1}\exp\big(-\lambda y + \alpha\log(y) + \alpha\log(\lambda) - \log\Gamma(\alpha)\big)$$

with

$$\eta(\alpha,\lambda) = (-\lambda, \alpha)^T, \qquad T(y) = \big(y, \log(y)\big)^T.$$

If $Y = (Y_1,\dots,Y_n)^T$ with $Y_i \overset{iid}{\sim} \Gamma(\alpha,\lambda)$, then

$$T(y) = \Big(\sum_{i=1}^n y_i,\; \sum_{i=1}^n \log(y_i)\Big)^T.$$

This includes as special cases
◦ the exponential distribution $(= \Gamma(1,\lambda))$
◦ the $\chi^2$ distribution with $n$ degrees of freedom $(= \Gamma(\tfrac{n}{2}, \tfrac{1}{2}))$

Beta distribution: $Y \sim B(\alpha,\beta)$

$$f_Y(y|\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,y^{\alpha-1}(1-y)^{\beta-1} = [y(1-y)]^{-1}\exp\big(\alpha\log(y) + \beta\log(1-y) + \log\Gamma(\alpha+\beta) - \log\Gamma(\alpha) - \log\Gamma(\beta)\big)$$

with

$$\eta(\alpha,\beta) = (\alpha,\beta)^T, \qquad T(y) = \big(\log(y), \log(1-y)\big)^T.$$

Maximum Likelihood for Exponential Families

Suppose that $Y = (Y_1,\dots,Y_n)^T$ has density

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big).$$

Then the log-likelihood is given by

$$\ell_n(\theta|Y) = \eta(\theta)^T T(Y) - a(\theta) + \log b(Y).$$

Differentiating with respect to θ, we obtain the likelihood equations

$$\frac{\partial \eta(\theta)^T}{\partial\theta}\,T(Y) = \frac{\partial a(\theta)}{\partial\theta}.$$

The likelihood equations can be rewritten in the following form

$$\frac{\partial \eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) = \frac{\partial \eta(\theta)^T}{\partial\theta}\,T(Y).$$

If the matrix $\frac{\partial\eta(\theta)}{\partial\theta}$ is invertible, then this simplifies to

$$E\big(T(Y)\,\big|\,\theta\big) = T(Y).$$

Proof. To see this, note that

$$\frac{\partial\eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) - \frac{\partial a(\theta)}{\partial\theta} = \int \Big(\frac{\partial\eta(\theta)^T}{\partial\theta}\,T(y) - \frac{\partial a(\theta)}{\partial\theta}\Big)\exp\big(\eta(\theta)^T T(y) - a(\theta)\big)\,b(y)\,dy$$

$$= \frac{\partial}{\partial\theta}\int \exp\big(\eta(\theta)^T T(y) - a(\theta)\big)\,b(y)\,dy = 0,$$

since the integral equals $1$ for all $\theta$.
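As a concrete check of the likelihood equation $E(T(Y)|\theta) = T(Y)$, take the binomial example: $T(y) = y$ and $E(T|\theta) = n\theta$, so the equation gives $\hat\theta = y/n$. The sketch below (my own illustration, standard library only) verifies on a grid that no other $\theta$ attains a higher log-likelihood.

```python
import math

def binom_loglik(theta, y, n):
    """Binomial log-likelihood, up to the constant log C(n, y)."""
    return y * math.log(theta) + (n - y) * math.log(1 - theta)

y, n = 7, 10
theta_hat = y / n  # solves E(T|theta) = n*theta = y = T(y)

# Grid search: theta_hat should maximize the log-likelihood
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: binom_loglik(t, y, n))
assert abs(best - theta_hat) < 1e-9
```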


Examples

◦ $Y_1,\dots,Y_n \overset{iid}{\sim} N(\mu,\sigma^2)$. Note that

$$T(Y) = \Big(\sum_{i=1}^n Y_i,\; \sum_{i=1}^n Y_i^2\Big)^T.$$

Thus the ML estimator is given by the solution of

$$\sum_{i=1}^n Y_i = n\,\mu$$

$$\sum_{i=1}^n Y_i^2 = n\,\sigma^2 + n\,\mu^2$$
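Solving these two equations gives $\hat\mu = \bar Y$ and $\hat\sigma^2 = \frac{1}{n}\sum Y_i^2 - \bar Y^2$. A minimal sketch (function name is my own):

```python
def normal_mle(ys):
    """Solve sum(Y) = n*mu and sum(Y^2) = n*sigma2 + n*mu^2 for (mu, sigma2)."""
    n = len(ys)
    mu = sum(ys) / n
    sigma2 = sum(y ** 2 for y in ys) / n - mu ** 2
    return mu, sigma2

mu, s2 = normal_mle([1.0, 2.0, 3.0, 4.0])
# mu = 2.5 and sigma2 = 30/4 - 2.5^2 = 1.25
assert mu == 2.5 and abs(s2 - 1.25) < 1e-12
```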

◦ $Y_1,\dots,Y_n \overset{iid}{\sim} \Gamma(\alpha,\lambda)$. Note that

$$T(Y) = \Big(\sum_{i=1}^n Y_i,\; \sum_{i=1}^n \log(Y_i)\Big)^T.$$

Thus the ML estimator is given by the solution of

$$\sum_{i=1}^n Y_i = n\cdot\frac{\alpha}{\lambda}$$

$$\sum_{i=1}^n \log(Y_i) = n\,E\big(\log(Y_1)\big)$$

It can be shown that

$$E\big(\log(Y_1)\big) = \frac{\partial\Gamma(\alpha)}{\partial\alpha}\cdot\frac{1}{\Gamma(\alpha)} - \log(\lambda).$$
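The ratio $\Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function, so the second likelihood equation must be solved numerically. A sketch of one way to do this (my own function names; digamma is approximated by numerically differentiating `math.lgamma`, and the first equation is used to eliminate $\lambda = \alpha/\bar Y$, leaving one monotone equation in $\alpha$ for bisection):

```python
import math

def digamma(a, h=1e-6):
    """Gamma'(a)/Gamma(a), via a central difference of log Gamma."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle(ys, iters=200):
    """Solve sum(Y) = n*alpha/lambda and sum(log Y) = n*(digamma(alpha) - log lambda).
    Substituting lambda = alpha/mean(Y) reduces the system to
    digamma(alpha) - log(alpha) = mean(log Y) - log(mean Y),
    whose left-hand side is increasing in alpha, so bisection applies."""
    n = len(ys)
    mean_y = sum(ys) / n
    target = sum(math.log(y) for y in ys) / n - math.log(mean_y)
    lo, hi = 1e-3, 1e3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, alpha / mean_y

# Usage: both likelihood equations hold at the returned estimates
alpha_hat, lam_hat = gamma_mle([1.0, 2.0, 3.0])
```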

The EM Algorithm for Exponential Families

Suppose the complete data Y have a distribution from an exponential family

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big).$$

Then the EM algorithm has a particularly simple form.

EM algorithm for exponential families

◦ E-step: Estimate the sufficient statistic T = T (Y ) by

$$T^{(k)} = E\big(T(Y)\,\big|\,Y_{obs}, \theta^{(k)}\big).$$

◦ M-step: Find θ(k+1) by solving the likelihood equations for θ,

$$\frac{\partial\eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) = \frac{\partial\eta(\theta)^T}{\partial\theta}\,T^{(k)},$$

or, if the matrix $\frac{\partial\eta(\theta)}{\partial\theta}$ is invertible,

$$E\big(T(Y)\,\big|\,\theta\big) = T^{(k)}.$$

Example: normal observations. Suppose that only the first $m$ of $n$ values are observed.

◦ E-step:

$$T_1^{(k)} = E\big(T_1(Y)\,\big|\,Y_{obs}, \hat\mu^{(k)}, \hat\sigma^{(k)\,2}\big) = \sum_{i=1}^m Y_i + (n-m)\,\hat\mu^{(k)}$$

$$T_2^{(k)} = E\big(T_2(Y)\,\big|\,Y_{obs}, \hat\mu^{(k)}, \hat\sigma^{(k)\,2}\big) = \sum_{i=1}^m Y_i^2 + (n-m)\big(\hat\sigma^{(k)\,2} + \hat\mu^{(k)\,2}\big)$$

◦ M-step:

$$T_1^{(k)} = E\big(T_1(Y)\,\big|\,\mu,\sigma^2\big) = n\,\mu \;\Rightarrow\; \hat\mu^{(k+1)} = \frac{1}{n}\,T_1^{(k)}$$

$$T_2^{(k)} = E\big(T_2(Y)\,\big|\,\mu,\sigma^2\big) = n\big(\sigma^2 + \mu^2\big) \;\Rightarrow\; \hat\sigma^{(k+1)\,2} = \frac{1}{n}\,T_2^{(k)} - \frac{1}{n^2}\,T_1^{(k)\,2}$$
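The E- and M-steps above fit in a few lines. The sketch below (my own function names; standard library only) iterates the two steps from a deliberately poor starting point; for this model the fixed point is simply the MLE based on the observed values, which provides an easy check.

```python
def em_normal_missing(y_obs, n, iters=100):
    """EM for N(mu, sigma2) when only the first m of n values are observed.
    Sufficient statistics: T1 = sum(Y_i), T2 = sum(Y_i^2)."""
    m = len(y_obs)
    s1 = sum(y_obs)
    s2 = sum(y * y for y in y_obs)
    mu, sigma2 = 0.0, 1.0  # deliberately poor starting values
    for _ in range(iters):
        # E-step: fill in the unobserved part of the sufficient statistics
        T1 = s1 + (n - m) * mu
        T2 = s2 + (n - m) * (sigma2 + mu * mu)
        # M-step: solve E(T | mu, sigma2) = T^(k)
        mu = T1 / n
        sigma2 = T2 / n - mu * mu
    return mu, sigma2

# Fixed point: mu = mean(y_obs), sigma2 = mean(y_obs^2) - mean(y_obs)^2
mu, s2 = em_normal_missing([1.0, 2.0, 3.0, 4.0], n=10)
assert abs(mu - 2.5) < 1e-9 and abs(s2 - 1.25) < 1e-9
```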


Example: t distribution

Suppose that $Y_1,\dots,Y_n$ are independently sampled from the density

$$f_{Y_i}(y|\mu) = \frac{1}{\sqrt{\pi}\,\Gamma(\tfrac{1}{2})}\,\big(1 + (y-\mu)^2\big)^{-1}.$$

Define the complete data as $(Y, X)$, where $X_i \overset{iid}{\sim} \chi^2_1$ such that

$$Y_i\,|\,X_i \sim N\big(\mu, X_i^{-1}\big).$$

Then the complete-data likelihood is

$$L_n(\mu|Y,X) = \exp\Big(-\frac{1}{2}\sum_{i=1}^n X_i (Y_i-\mu)^2\Big)\,\prod_{i=1}^n \sqrt{X_i}\; f_{X_i}(X_i).$$

Thus

$$\eta(\mu) = \Big(\mu,\; -\frac{1}{2}\,\mu^2\Big)^T \qquad\text{and}\qquad T(Y) = \Big(\sum_{i=1}^n X_i Y_i,\; \sum_{i=1}^n X_i\Big)^T.$$

◦ E-step:

$$T_1^{(k)} = \sum_{i=1}^n E\big(X_i\,\big|\,Y_i, \hat\mu^{(k)}\big)\,Y_i = \sum_{i=1}^n \frac{2\,Y_i}{1 + (Y_i - \hat\mu^{(k)})^2}$$

$$T_2^{(k)} = \sum_{i=1}^n E\big(X_i\,\big|\,Y_i, \hat\mu^{(k)}\big) = \sum_{i=1}^n \frac{2}{1 + (Y_i - \hat\mu^{(k)})^2}$$

◦ M-step: Note that $\frac{\partial\eta(\mu)}{\partial\mu} = (1, -\mu)^T$. Thus the ML estimator solves the equation

$$T_1^{(k)} - \mu\,T_2^{(k)} = n\,E(X_1 Y_1\,|\,\mu) - n\,\mu\,E(X_1) = 0,$$

since $E(X_1 Y_1\,|\,\mu) = \mu\,E(X_1)$. This yields

$$\hat\mu^{(k+1)} = \frac{T_1^{(k)}}{T_2^{(k)}}.$$
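The resulting iteration is just a weighted-mean update. A minimal sketch (my own function name, standard library only), using the E-step weights $E(X_i|Y_i,\mu) = 2/(1+(Y_i-\mu)^2)$:

```python
def em_cauchy_location(ys, iters=100):
    """EM for the location of a t_1 (Cauchy) sample, via the
    scale-mixture representation Y_i | X_i ~ N(mu, 1/X_i), X_i ~ chi2_1."""
    mu = sorted(ys)[len(ys) // 2]  # start at (roughly) the median
    for _ in range(iters):
        # E-step: w_i = E(X_i | Y_i, mu) = 2 / (1 + (Y_i - mu)^2)
        w = [2.0 / (1.0 + (y - mu) ** 2) for y in ys]
        # M-step: mu = T1 / T2 = sum(w_i Y_i) / sum(w_i)
        mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return mu
```

Each iteration downweights observations far from the current estimate, which is why this t-based estimate is robust to outliers in a way the sample mean is not.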
