The Exponential Family of Distributions

The Exponential Family of Distributions

The Exponential Family of Distributions θ>T (x) A(θ) p(x) = h(x) e ¡ θ vector of parameters T (x) vector of “suf£cient statistics” A(θ) cumulant generating function h(x) T Key point: x and θ only “mix” in e θ T (x) 1 The Exponential Family of Distributions θ>T (x) A(θ) p(x) = h(x) e ¡ To get a normalized distribution, for any θ A(θ) θ>T (x) p(x) dx = e¡ h(x) e dx = 1 Z Z so > eA(θ) = h(x) e θ T (x) dx; Z i.e., when T (x) = x, A(θ) is the log of Laplace transform of h(x). 2 Examples 1 x ¹ 2=(2σ2) Gaussian p(x) = 2 e¡k ¡ k x R p2¼σ 2 x 1 x Bernoulli p(x) = ® (1 ®) ¡ x 0; 1 ¡ 2 f g n x n x Binomial p(x) = ® (1 ®) ¡ x 0; 1; 2; : : : ; n x ¡ 2 f g n! n xi Multinomial p(x) = ¡ ¢ ® xi 0; 1; 2; : : : ; n ; xi = n x1!x2!:::xn! i=1 i 2 f g i ¸x + Exponential p(x) = ¸ e¡ Q x R P 2 ¡¸ Poisson p(x) = e ¸x x 0; 1; 2; : : : x! 2 f g ¡( i ®i) ®i 1 Dirichlet p(x) = ¡(® ) i xi ¡ xi [0; 1] ; i xi = 1 iP i 2 Q (don’t need to memorize these eQxcept for Gaussian) P 3 Natural Parameter form for Bernoulli θ>T (x) A(θ) p(x) = h(x) e ¡ x 1 x p(x) = ® (1 ®) ¡ ¡ x 1 x = exp log ® (1 ®) ¡ ¡ = exp [hx log¡ ® + (1 x) log¢ i(1 ®) ] ¡ ¡ ® = exp x log + log (1 ®) 1 ® ¡ · ¡ ¸ = exp x θ log 1 + eθ ¡ so £ ¡ ¢ ¤ ® T (x) = x θ = log A(θ) = log 1 + eθ 1 ® ¡ ¡ ¢ 4 Natural Parameter Form for Gaussian 1 (x ¹)2=(2σ2) p(x) = e¡ ¡ p2¼σ2 1 x2 ¹x ¹2 = exp log σ + p2¼ ¡ ¡ 2σ2 σ2 ¡ 2σ2 µ ¶ 1 2 2 = exp θ>T (x) log σ ¹ =(2σ ) p2¼ ¡ ¡ ¡ A(θ) ¢ h(x) | {z } where | {z } 2 2 ¹ x µ/σ A(θ) = 2σ2 + log σ T (x) = θ = 2 2 2 [θ]1 1 0 x 1 0 1=(2σ ) 1 = log ( 2[θ]2) ¡ ¡ 4[θ]2 ¡ 2 ¡ @ A @ A 5 Natural Parameter Form for Multivariate Gaussian θ>T (x) A(θ) p(x) = h(x) e ¡ 1 (x ¹)§¡1(x ¹)=2 p(x) = e¡ ¡ ¡ (2¼)D=2 § 1=2 j j 1 D=2 x §¡ ¹ h(x) = (2¼)¡ T (x) = θ = 1 1 0 x x> 1 0 §¡ 1 ¡ 2 @ A @ A 6 The £rst derivative of A(θ) > A(θ) = log h(x) e θ T (x) dx · Z ¸ Q(θ) | {z } dA(θ) 1 dQ(θ) Q (θ) = = 0 dθ Q(θ) dθ Q(θ) > h(x) e θ T (x) T (x) dx = θ>T (x) R h(x) e dx θ>T (x) A(θ) hR(x) e ¡ T (x) dx = θ>T (x) A(θ) R h(x) e ¡ dx = E [T (x)] : pθR 7 The second derivative of A(θ) > A(θ) = log h(x) e θ T (x) dx · Z ¸ Q(θ) | {z } 2 dA(θ) d Q0(θ) d 1 Q00(θ) (Q0(θ)) = = Q0(θ) = dθ dθ Q(θ) dθ Q(θ) Q(θ) ¡ 2 · ¸ · ¸ (Q(θ)) > h(x) e θ T (x) T 2(x) dx = (E [T (x)])2 θ>T (x) pθ R h(x) e dx ¡ θ>T (x) A(θ) 2 hR(x) e ¡ T (x) dx 2 = (E [T (x)]) θ>T (x) A(θ) pθ R h(x) e ¡ dx ¡ = E T 2(x) (E [T (x)])2 = Cov [T (x)] 0: pθR ¡ pθ pθ º = A(θ) is convex.£ ( means¤ positive de£nite) ) º 8 Maximum Likelihood N N `(θ) = log p ( x θ ) = log h(x ) + T (x ) A(θ) i j i i ¡ i=1 i=1 X Xh i To £nd maxmimum likelihood solution N T `0(θ) = θ T (x ) NA0(θ) i ¡ " i=1 # X So ML solution satis£es 1 N A0(θ^ ) = T (x ) = 0 ML N i i=1 X (is θ^ML a consistent estimator then ?) 1 N Suf£cient statistics N i=1 T (xi) summarize data. When can’t do this analytically: convexity = unique global ML P ) solution for θ. 9 Products Products of E-family distributions are E-family distributions T T θ T (x) A(θ1) θ T (x) A(θ2) h(x) e 1 ¡ h(x) e 2 ¡ = £ ³ ´ ³ ´(θ1+θ2)T (x) A~(θ1,θ2) h~(x) e ¡ but might not have a nice parametric form any more. But the product of two Gaussians is always a Gaussian. 10 Conjugate Priors in Bayesian Statistics p ( x θ ) p(θ) p ( θ x ) = j j p ( x θ ) p(θ) dθ j Note: denominator not a functionR of θ just normalizing term ) p(θ) p ( x θ ) p(θ) p ( θ x ) p ( x θ ) p(θ) ¡! j ¡! j / j parametric parametric mess? Conjugacy:|{z} require |p(θ{z) and} p ( θ x ) to be of the same| for{zm. E.g.} j p(θ) p ( x θ ) p(θ) p ( θ x ) ¡! j ¡! j Dirichlet Multinomial Dirichlet p(θ) and p ( x|{z}θ ) are then |called{z conjugate} distrib| {zutions.} j 11 Example: Dirichlet and Multinomial ¡ ( i ®i) ®i 1 p(θ) = θ ¡ Dirichlet in θ ¡(x) = (x 1)! ¡ (® ) ¡ i i i P Y Q( x )! n p ( x θ ) = i i θxi Multinomial in x j x !x ! : : : x ! i 1 2 n i=1 P Y xi+®i 1 p ( θ x ) p ( θ x ) p(θ) = junk θ ¡ j / j £ i i Y which is again Dirichlet, so we must have ¡ ( i ®i + xi) xi+®i 1 p ( θ x ) = θ ¡ : j ¡ (® + x ) i i i i i P Y Remember pseudocount ofQ1? That was just a Dirichlet prior. 12 Conjugate Pairs Prior Conditional 2 2 2 2 ¹ ¹0 =(2σ ) x ¹ =(2σ ) Gaussian e¡k ¡ k Gaussian e¡k ¡ k ¡(r+s) r 1 s 1 x 1 x Beta ® ¡ (2 ®) ¡ Bernoulli ® (1 ®) ¡ ¡(r)¡(s) ¡ ¡ ¡( ®i) ®i 1 ( xi)! xi Dirichlet θ ¡ Multinomial θ ¡(®i) i xi! i P P Inv. Wishart Q Q Gaussian (cov) Q Q Note: Conjugacy is mutual, e.g. Dirichlet Multinomial Dirichlet ! ! Multinomial Dirichlet Multinomial ! ! 13.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    13 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us