
Exponential Families

The random vector $Y = (Y_1,\dots,Y_n)^T$ has a distribution from an exponential family if the density of $Y$ is of the form

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big)$$

where

$$\eta(\theta) = \big(\eta_1(\theta),\dots,\eta_d(\theta)\big)^T, \qquad T(y) = \big(T_1(y),\dots,T_d(y)\big)^T.$$

◦ $\eta = (\eta_1,\dots,\eta_d)^T$ is the natural parameter of the family.
◦ $T = T(Y)$ is a sufficient statistic for $\theta$ (or for $\eta$).

Definition: A statistic $T = T(Y_1,\dots,Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $Y_1,\dots,Y_n$ given $T = t$ does not depend on $\theta$ for any value of $t$.

A sufficient statistic for $\theta$ contains all the information in the sample about $\theta$. Thus, given the value of $T$, we cannot improve our knowledge about $\theta$ by a more detailed analysis of the data $Y_1,\dots,Y_n$. In other words, an estimate based on $T = t$ cannot be improved by using the data $Y_1,\dots,Y_n$.

Sufficiency and exponential families: Rice (1995), pp. 280-284.

Example: Consider a sequence of independent Bernoulli trials, $Y_i \overset{iid}{\sim} \mathrm{Bin}(1,\theta)$. The number of successes, $S_n = \sum_{i=1}^n Y_i$, is sufficient for the parameter $\theta$. Additional information about the observed values $Y_1,\dots,Y_n$, such as the order in which the successes occurred, does not convey any information about $\theta$.
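As a minimal numerical sketch (not part of the original notes; the function name is my own), the Bernoulli likelihood depends on the sample only through $S_n$: two samples with the same number of successes give identical likelihoods for every $\theta$.

```python
import math

def bernoulli_likelihood(y, theta):
    """Likelihood of an i.i.d. Bernoulli sample: theta^Sn * (1-theta)^(n-Sn)."""
    s = sum(y)
    n = len(y)
    return theta ** s * (1 - theta) ** (n - s)

# Two samples with the same number of successes Sn = 3 but different orderings
y1 = [1, 1, 1, 0, 0]
y2 = [0, 1, 0, 1, 1]

# The likelihood depends on the data only through Sn, so the two agree
for theta in (0.2, 0.5, 0.8):
    assert math.isclose(bernoulli_likelihood(y1, theta),
                        bernoulli_likelihood(y2, theta))
```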

Exponential Families, Apr 13, 2004 - 1 -

Examples

Binomial distribution: $Y \sim \mathrm{Bin}(n,\theta)$

$$f_Y(y|\theta) = \binom{n}{y}\,\theta^y (1-\theta)^{n-y} = \binom{n}{y}\,\exp\Big(y \log\tfrac{\theta}{1-\theta} + n\log(1-\theta)\Big)$$

with

$$\eta(\theta) = \log\frac{\theta}{1-\theta}, \qquad T(y) = y.$$

Normal distribution: $Y \sim N(\mu,\sigma^2)$

$$f_Y(y|\mu,\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\Big(-\frac{1}{2\sigma^2}(y-\mu)^2\Big) = \exp\Big(-\frac{1}{2\sigma^2}\,y^2 + \frac{\mu}{\sigma^2}\,y - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\Big)$$

with

$$\eta(\mu,\sigma^2) = \Big(-\frac{1}{2\sigma^2},\; \frac{\mu}{\sigma^2}\Big)^T, \qquad T(y) = (y^2, y)^T.$$

If $Y = (Y_1,\dots,Y_n)^T$ with $Y_i \overset{iid}{\sim} N(\mu,\sigma^2)$, then

$$T(y) = \Big(\sum_{i=1}^n y_i^2,\; \sum_{i=1}^n y_i\Big)^T.$$
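The algebra above can be checked numerically. The sketch below (assuming nothing beyond the standard library; function names are my own) evaluates the $N(\mu,\sigma^2)$ density both in its usual form and in the exponential-family form $\exp(\eta^T T(y) - a)$ and confirms they agree.

```python
import math

def normal_density(y, mu, sigma2):
    """Standard form of the N(mu, sigma2) density."""
    return (2 * math.pi * sigma2) ** -0.5 * math.exp(-(y - mu) ** 2 / (2 * sigma2))

def normal_density_expfam(y, mu, sigma2):
    """Exponential-family form: exp(eta^T T(y) - a), with
    eta = (-1/(2 sigma2), mu/sigma2) and T(y) = (y^2, y)."""
    eta = (-1 / (2 * sigma2), mu / sigma2)
    T = (y ** 2, y)
    a = mu ** 2 / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp(eta[0] * T[0] + eta[1] * T[1] - a)

# The two forms agree at arbitrary points
for y in (-1.5, 0.0, 2.3):
    assert math.isclose(normal_density(y, 1.0, 4.0),
                        normal_density_expfam(y, 1.0, 4.0))
```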


Gamma distribution: $Y \sim \Gamma(\alpha,\lambda)$

$$f_Y(y|\alpha,\lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\,y^{\alpha-1}\exp(-\lambda y) = y^{-1}\exp\big(-\lambda y + \alpha\log(y) + \alpha\log(\lambda) - \log\Gamma(\alpha)\big)$$

with

$$\eta(\alpha,\lambda) = (-\lambda, \alpha)^T, \qquad T(y) = \big(y, \log(y)\big)^T.$$

If $Y = (Y_1,\dots,Y_n)^T$ with $Y_i \overset{iid}{\sim} \Gamma(\alpha,\lambda)$, then

$$T(y) = \Big(\sum_{i=1}^n y_i,\; \sum_{i=1}^n \log(y_i)\Big)^T.$$

This includes as special cases
◦ the exponential distribution $(= \Gamma(1,\lambda))$
◦ the $\chi^2$ distribution with $n$ degrees of freedom $(= \Gamma(\tfrac{n}{2}, \tfrac{1}{2}))$

Beta distribution: $Y \sim B(\alpha,\beta)$

$$f_Y(y|\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,y^{\alpha-1}(1-y)^{\beta-1} = [y(1-y)]^{-1}\exp\big(\alpha\log(y) + \beta\log(1-y) + \log\Gamma(\alpha+\beta) - \log\Gamma(\alpha) - \log\Gamma(\beta)\big)$$

with

$$\eta(\alpha,\beta) = (\alpha,\beta)^T, \qquad T(y) = \big(\log(y), \log(1-y)\big)^T.$$

Maximum Likelihood for Exponential Families

Suppose that $Y = (Y_1,\dots,Y_n)^T$ has density

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big).$$

Then the log-likelihood is given by

$$\ell_n(\theta|Y) = \eta(\theta)^T T(Y) - a(\theta) + \log b(Y).$$

Differentiating with respect to θ, we obtain the likelihood equations

$$\frac{\partial \eta(\theta)^T}{\partial\theta}\,T(Y) = \frac{\partial a(\theta)}{\partial\theta}.$$

The likelihood equations can be rewritten in the following form

$$\frac{\partial \eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) = \frac{\partial \eta(\theta)^T}{\partial\theta}\,T(Y).$$

If the matrix $\frac{\partial\eta(\theta)}{\partial\theta}$ is invertible, then this simplifies to

$$E\big(T(Y)\,\big|\,\theta\big) = T(Y).$$

Proof. To see this, note that

$$\frac{\partial\eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) - \frac{\partial a(\theta)}{\partial\theta} = \int \Big(\frac{\partial\eta(\theta)^T}{\partial\theta}\,T(y) - \frac{\partial a(\theta)}{\partial\theta}\Big)\exp\big(\eta(\theta)^T T(y) - a(\theta)\big)\,b(y)\,dy$$

$$= \frac{\partial}{\partial\theta}\int \exp\big(\eta(\theta)^T T(y) - a(\theta)\big)\,b(y)\,dy = 0,$$

since the integral equals $1$ for all $\theta$.
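As a concrete check of the likelihood equation $E(T(Y)|\theta) = T(Y)$, take the binomial example: $T(y) = y$ and $E(T|\theta) = n\theta$, so the equation gives $\hat\theta = y/n$. The sketch below (my own illustration, standard library only) verifies on a grid that no other $\theta$ attains a higher log-likelihood.

```python
import math

def binom_loglik(theta, y, n):
    """Binomial log-likelihood, up to the constant log C(n, y)."""
    return y * math.log(theta) + (n - y) * math.log(1 - theta)

y, n = 7, 10
theta_hat = y / n  # solves E(T|theta) = n*theta = y = T(y)

# Grid search: theta_hat should maximize the log-likelihood
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: binom_loglik(t, y, n))
assert abs(best - theta_hat) < 1e-9
```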


Examples

◦ $Y_1,\dots,Y_n \overset{iid}{\sim} N(\mu,\sigma^2)$. Note that

$$T(Y) = \Big(\sum_{i=1}^n Y_i,\; \sum_{i=1}^n Y_i^2\Big)^T.$$

Thus the ML estimator is given by the solution of

$$\sum_{i=1}^n Y_i = n\,\mu$$

$$\sum_{i=1}^n Y_i^2 = n\,\sigma^2 + n\,\mu^2$$
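Solving these two equations gives $\hat\mu = \bar Y$ and $\hat\sigma^2 = \frac{1}{n}\sum Y_i^2 - \bar Y^2$. A minimal sketch (function name is my own):

```python
def normal_mle(ys):
    """Solve sum(Y) = n*mu and sum(Y^2) = n*sigma2 + n*mu^2 for (mu, sigma2)."""
    n = len(ys)
    mu = sum(ys) / n
    sigma2 = sum(y ** 2 for y in ys) / n - mu ** 2
    return mu, sigma2

mu, s2 = normal_mle([1.0, 2.0, 3.0, 4.0])
# mu = 2.5 and sigma2 = 30/4 - 2.5^2 = 1.25
assert mu == 2.5 and abs(s2 - 1.25) < 1e-12
```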

◦ $Y_1,\dots,Y_n \overset{iid}{\sim} \Gamma(\alpha,\lambda)$. Note that

$$T(Y) = \Big(\sum_{i=1}^n Y_i,\; \sum_{i=1}^n \log(Y_i)\Big)^T.$$

Thus the ML estimator is given by the solution of

$$\sum_{i=1}^n Y_i = n\cdot\frac{\alpha}{\lambda}$$

$$\sum_{i=1}^n \log(Y_i) = n\,E\big(\log(Y_1)\big)$$

It can be shown that

$$E\big(\log(Y_1)\big) = \frac{\partial\Gamma(\alpha)}{\partial\alpha}\cdot\frac{1}{\Gamma(\alpha)} - \log(\lambda).$$
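The ratio $\Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function, so the second likelihood equation must be solved numerically. A sketch of one way to do this (my own function names; digamma is approximated by numerically differentiating `math.lgamma`, and the first equation is used to eliminate $\lambda = \alpha/\bar Y$, leaving one monotone equation in $\alpha$ for bisection):

```python
import math

def digamma(a, h=1e-6):
    """Gamma'(a)/Gamma(a), via a central difference of log Gamma."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle(ys, iters=200):
    """Solve sum(Y) = n*alpha/lambda and sum(log Y) = n*(digamma(alpha) - log lambda).
    Substituting lambda = alpha/mean(Y) reduces the system to
    digamma(alpha) - log(alpha) = mean(log Y) - log(mean Y),
    whose left-hand side is increasing in alpha, so bisection applies."""
    n = len(ys)
    mean_y = sum(ys) / n
    target = sum(math.log(y) for y in ys) / n - math.log(mean_y)
    lo, hi = 1e-3, 1e3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, alpha / mean_y

# Usage: both likelihood equations hold at the returned estimates
alpha_hat, lam_hat = gamma_mle([1.0, 2.0, 3.0])
```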

The EM Algorithm for Exponential Families

Suppose the complete data Y have a distribution from an exponential family

$$f_Y(y|\theta) = b(y)\,\exp\big(\eta(\theta)^T T(y) - a(\theta)\big).$$

Then the EM algorithm has a particularly simple form.

EM algorithm for exponential families

◦ E-step: Estimate the sufficient statistic T = T (Y ) by

$$T^{(k)} = E\big(T(Y)\,\big|\,Y_{obs}, \theta^{(k)}\big).$$

◦ M-step: Find θ(k+1) by solving the likelihood equations for θ,

$$\frac{\partial\eta(\theta)^T}{\partial\theta}\,E\big(T(Y)\,\big|\,\theta\big) = \frac{\partial\eta(\theta)^T}{\partial\theta}\,T^{(k)},$$

or, if the matrix $\frac{\partial\eta(\theta)}{\partial\theta}$ is invertible,

$$E\big(T(Y)\,\big|\,\theta\big) = T^{(k)}.$$

Example: normal observations. Suppose that only the first $m$ of $n$ values are observed.

◦ E-step:

$$T_1^{(k)} = E\big(T_1(Y)\,\big|\,Y_{obs}, \hat\mu^{(k)}, \hat\sigma^{(k)\,2}\big) = \sum_{i=1}^m Y_i + (n-m)\,\hat\mu^{(k)}$$

$$T_2^{(k)} = E\big(T_2(Y)\,\big|\,Y_{obs}, \hat\mu^{(k)}, \hat\sigma^{(k)\,2}\big) = \sum_{i=1}^m Y_i^2 + (n-m)\big(\hat\sigma^{(k)\,2} + \hat\mu^{(k)\,2}\big)$$

◦ M-step:

$$T_1^{(k)} = E\big(T_1(Y)\,\big|\,\mu,\sigma^2\big) = n\,\mu \;\Rightarrow\; \hat\mu^{(k+1)} = \frac{1}{n}\,T_1^{(k)}$$

$$T_2^{(k)} = E\big(T_2(Y)\,\big|\,\mu,\sigma^2\big) = n\big(\sigma^2 + \mu^2\big) \;\Rightarrow\; \hat\sigma^{(k+1)\,2} = \frac{1}{n}\,T_2^{(k)} - \frac{1}{n^2}\,T_1^{(k)\,2}$$
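The E- and M-steps above fit in a few lines. The sketch below (my own function names; standard library only) iterates the two steps from a deliberately poor starting point; for this model the fixed point is simply the MLE based on the observed values, which provides an easy check.

```python
def em_normal_missing(y_obs, n, iters=100):
    """EM for N(mu, sigma2) when only the first m of n values are observed.
    Sufficient statistics: T1 = sum(Y_i), T2 = sum(Y_i^2)."""
    m = len(y_obs)
    s1 = sum(y_obs)
    s2 = sum(y * y for y in y_obs)
    mu, sigma2 = 0.0, 1.0  # deliberately poor starting values
    for _ in range(iters):
        # E-step: fill in the unobserved part of the sufficient statistics
        T1 = s1 + (n - m) * mu
        T2 = s2 + (n - m) * (sigma2 + mu * mu)
        # M-step: solve E(T | mu, sigma2) = T^(k)
        mu = T1 / n
        sigma2 = T2 / n - mu * mu
    return mu, sigma2

# Fixed point: mu = mean(y_obs), sigma2 = mean(y_obs^2) - mean(y_obs)^2
mu, s2 = em_normal_missing([1.0, 2.0, 3.0, 4.0], n=10)
assert abs(mu - 2.5) < 1e-9 and abs(s2 - 1.25) < 1e-9
```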


Example: t distribution

Suppose that $Y_1,\dots,Y_n$ are independently sampled from the density

$$f_{Y_i}(y|\mu) = \frac{1}{\sqrt{\pi}\,\Gamma(\tfrac{1}{2})}\,\big(1 + (y-\mu)^2\big)^{-1}.$$

Define the complete data as $(Y, X)$, where $X_i \overset{iid}{\sim} \chi^2_1$ such that

$$Y_i\,|\,X_i \sim N\big(\mu, X_i^{-1}\big).$$

Then the complete-data likelihood is

$$L_n(\mu|Y,X) = \exp\Big(-\frac{1}{2}\sum_{i=1}^n X_i (Y_i-\mu)^2\Big)\,\prod_{i=1}^n \sqrt{X_i}\; f_{X_i}(X_i).$$

Thus

$$\eta(\mu) = \Big(\mu,\; -\frac{1}{2}\,\mu^2\Big)^T \qquad\text{and}\qquad T(Y) = \Big(\sum_{i=1}^n X_i Y_i,\; \sum_{i=1}^n X_i\Big)^T.$$

◦ E-step:

$$T_1^{(k)} = \sum_{i=1}^n E\big(X_i\,\big|\,Y_i, \hat\mu^{(k)}\big)\,Y_i = \sum_{i=1}^n \frac{2\,Y_i}{1 + (Y_i - \hat\mu^{(k)})^2}$$

$$T_2^{(k)} = \sum_{i=1}^n E\big(X_i\,\big|\,Y_i, \hat\mu^{(k)}\big) = \sum_{i=1}^n \frac{2}{1 + (Y_i - \hat\mu^{(k)})^2}$$

◦ M-step: Note that $\frac{\partial\eta(\mu)}{\partial\mu} = (1, -\mu)^T$. Thus the ML estimator solves the equation

$$T_1^{(k)} - \mu\,T_2^{(k)} = n\,E(X_1 Y_1\,|\,\mu) - n\,\mu\,E(X_1) = 0,$$

since $E(X_1 Y_1\,|\,\mu) = \mu\,E(X_1)$. This yields

$$\hat\mu^{(k+1)} = \frac{T_1^{(k)}}{T_2^{(k)}}.$$
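The resulting iteration is just a weighted-mean update. A minimal sketch (my own function name, standard library only), using the E-step weights $E(X_i|Y_i,\mu) = 2/(1+(Y_i-\mu)^2)$:

```python
def em_cauchy_location(ys, iters=100):
    """EM for the location of a t_1 (Cauchy) sample, via the
    scale-mixture representation Y_i | X_i ~ N(mu, 1/X_i), X_i ~ chi2_1."""
    mu = sorted(ys)[len(ys) // 2]  # start at (roughly) the median
    for _ in range(iters):
        # E-step: w_i = E(X_i | Y_i, mu) = 2 / (1 + (Y_i - mu)^2)
        w = [2.0 / (1.0 + (y - mu) ** 2) for y in ys]
        # M-step: mu = T1 / T2 = sum(w_i Y_i) / sum(w_i)
        mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return mu
```

Each iteration downweights observations far from the current estimate, which is why this t-based estimate is robust to outliers in a way the sample mean is not.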
