The Exponential Family of Distributions

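\[ p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)} \]

$\theta$: vector of parameters
$T(x)$: vector of "sufficient statistics"
$A(\theta)$: cumulant generating function
$h(x)$: function of $x$ alone

Key point: $x$ and $\theta$ only "mix" in $e^{\theta^\top T(x)}$.

To get a normalized distribution, for any $\theta$

\[ \int p(x)\, dx = e^{-A(\theta)} \int h(x)\, e^{\theta^\top T(x)}\, dx = 1 \]

so

\[ e^{A(\theta)} = \int h(x)\, e^{\theta^\top T(x)}\, dx, \]

i.e., when $T(x) = x$, $A(\theta)$ is the log of the Laplace transform of $h(x)$.

Examples

\[
\begin{array}{lll}
\text{Gaussian} & p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\|x-\mu\|^2/(2\sigma^2)} & x \in \mathbb{R} \\
\text{Bernoulli} & p(x) = \alpha^x (1-\alpha)^{1-x} & x \in \{0, 1\} \\
\text{Binomial} & p(x) = \binom{n}{x} \alpha^x (1-\alpha)^{n-x} & x \in \{0, 1, 2, \ldots, n\} \\
\text{Multinomial} & p(x) = \frac{n!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \alpha_i^{x_i} & x_i \in \{0, 1, 2, \ldots, n\},\ \sum_i x_i = n \\
\text{Exponential} & p(x) = \lambda\, e^{-\lambda x} & x \in \mathbb{R}^+ \\
\text{Poisson} & p(x) = \frac{e^{-\lambda}}{x!} \lambda^x & x \in \{0, 1, 2, \ldots\} \\
\text{Dirichlet} & p(x) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i x_i^{\alpha_i - 1} & x_i \in [0, 1],\ \sum_i x_i = 1
\end{array}
\]

(don't need to memorize these except for Gaussian)

Natural Parameter Form for Bernoulli

\[ p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)} \]

\[
\begin{aligned}
p(x) &= \alpha^x (1-\alpha)^{1-x} \\
&= \exp\left[ \log\left( \alpha^x (1-\alpha)^{1-x} \right) \right] \\
&= \exp\left[ x \log \alpha + (1-x) \log(1-\alpha) \right] \\
&= \exp\left[ x \log\frac{\alpha}{1-\alpha} + \log(1-\alpha) \right] \\
&= \exp\left[ x\theta - \log\left( 1 + e^{\theta} \right) \right]
\end{aligned}
\]

so

\[ T(x) = x, \qquad \theta = \log\frac{\alpha}{1-\alpha}, \qquad A(\theta) = \log\left( 1 + e^{\theta} \right) \]

Natural Parameter Form for Gaussian

\[
\begin{aligned}
p(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)} \\
&= \frac{1}{\sqrt{2\pi}} \exp\left( -\log\sigma - \frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right) \\
&= \underbrace{\frac{1}{\sqrt{2\pi}}}_{h(x)} \exp\Big( \theta^\top T(x) - \underbrace{\left[ \log\sigma + \mu^2/(2\sigma^2) \right]}_{A(\theta)} \Big)
\end{aligned}
\]

where

\[
T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \qquad
\theta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \qquad
A(\theta) = \frac{\mu^2}{2\sigma^2} + \log\sigma = -\frac{[\theta]_1^2}{4[\theta]_2} - \frac{1}{2} \log\left( -2[\theta]_2 \right)
\]

Natural Parameter Form for Multivariate Gaussian

\[ p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)} \]

\[ p(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}}\, e^{-(x-\mu)^\top \Sigma^{-1} (x-\mu)/2} \]

\[
h(x) = (2\pi)^{-D/2}, \qquad
T(x) = \begin{pmatrix} x \\ x x^\top \end{pmatrix}, \qquad
\theta = \begin{pmatrix} \Sigma^{-1}\mu \\ -\frac{1}{2}\Sigma^{-1} \end{pmatrix}
\]

The first derivative of $A(\theta)$

\[ A(\theta) = \log \underbrace{\left[ \int h(x)\, e^{\theta^\top T(x)}\, dx \right]}_{Q(\theta)} \]

\[
\begin{aligned}
\frac{dA(\theta)}{d\theta} &= \frac{1}{Q(\theta)} \frac{dQ(\theta)}{d\theta} = \frac{Q'(\theta)}{Q(\theta)} \\
&= \frac{\int h(x)\, e^{\theta^\top T(x)}\, T(x)\, dx}{\int h(x)\, e^{\theta^\top T(x)}\, dx} \\
&= \frac{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T(x)\, dx}{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, dx} \\
&= E_{p_\theta}[T(x)].
\end{aligned}
\]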
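As a quick numerical check of the identity $dA/d\theta = E_{p_\theta}[T(x)]$, here is a minimal Python sketch based on the Bernoulli natural-parameter form, where $h(x) = 1$, $T(x) = x$ and $A(\theta) = \log(1 + e^{\theta})$. The function names and the value $\alpha = 0.3$ are illustrative assumptions, not part of the slides.

```python
import math


def log_partition(theta: float) -> float:
    """A(theta) = log(1 + e^theta) for the Bernoulli family."""
    return math.log1p(math.exp(theta))


def bernoulli_pmf(x: int, theta: float) -> float:
    """p(x) = h(x) * exp(theta * T(x) - A(theta)) with h(x) = 1, T(x) = x."""
    return math.exp(theta * x - log_partition(theta))


def mean_of_T(theta: float) -> float:
    """E[T(x)] computed directly by summing over x in {0, 1}."""
    return sum(x * bernoulli_pmf(x, theta) for x in (0, 1))


def numeric_derivative(f, theta: float, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(theta)."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


alpha = 0.3                                # illustrative success probability
theta = math.log(alpha / (1 - alpha))      # natural parameter (log-odds)

total = bernoulli_pmf(0, theta) + bernoulli_pmf(1, theta)  # should be 1
dA = numeric_derivative(log_partition, theta)              # approximates A'(theta)
expected_T = mean_of_T(theta)                              # E[T(x)] = alpha

print(total, dA, expected_T)
```

Up to finite-difference error, `dA` and `expected_T` should both agree with `alpha`, and `total` should equal 1, which is the normalization condition that defines $A(\theta)$.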
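The second derivative of $A(\theta)$

\[ A(\theta) = \log \underbrace{\left[ \int h(x)\, e^{\theta^\top T(x)}\, dx \right]}_{Q(\theta)} \]

\[
\begin{aligned}
\frac{d^2 A(\theta)}{d\theta^2} &= \frac{d}{d\theta}\left[ \frac{Q'(\theta)}{Q(\theta)} \right] = \frac{d}{d\theta}\left[ Q'(\theta)\, \frac{1}{Q(\theta)} \right] = \frac{Q''(\theta)}{Q(\theta)} - \frac{(Q'(\theta))^2}{(Q(\theta))^2} \\
&= \frac{\int h(x)\, e^{\theta^\top T(x)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x)}\, dx} - \left( E_{p_\theta}[T(x)] \right)^2 \\
&= \frac{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, dx} - \left( E_{p_\theta}[T(x)] \right)^2 \\
&= E_{p_\theta}\left[ T^2(x) \right] - \left( E_{p_\theta}[T(x)] \right)^2 = \mathrm{Cov}_{p_\theta}[T(x)] \succeq 0.
\end{aligned}
\]

$\Rightarrow$ $A(\theta)$ is convex. ($\succeq 0$ means positive semidefinite)

Maximum Likelihood

\[ \ell(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \left[ \log h(x_i) + \theta^\top T(x_i) - A(\theta) \right] \]

To find the maximum likelihood solution, set the derivative to zero:

\[ \ell'(\theta) = \left[ \sum_{i=1}^{N} T(x_i) \right] - N A'(\theta) = 0 \]

So the ML solution satisfies

\[ A'(\hat{\theta}_{ML}) = \frac{1}{N} \sum_{i=1}^{N} T(x_i) \]

(is $\hat{\theta}_{ML}$ a consistent estimator then?)

The sufficient statistics $\frac{1}{N} \sum_{i=1}^{N} T(x_i)$ summarize the data.
When this can't be solved analytically: convexity $\Rightarrow$ unique global ML solution for $\theta$.

Products

Products of E-family distributions are E-family distributions

\[ \left( h(x)\, e^{\theta_1^\top T(x) - A(\theta_1)} \right) \times \left( h(x)\, e^{\theta_2^\top T(x) - A(\theta_2)} \right) = \tilde{h}(x)\, e^{(\theta_1 + \theta_2)^\top T(x) - \tilde{A}(\theta_1, \theta_2)} \]

but might not have a nice parametric form any more.
But the product of two Gaussians is always a Gaussian.

Conjugate Priors in Bayesian Statistics

\[ p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta} \]

Note: the denominator is not a function of $\theta$ $\Rightarrow$ it is just a normalizing term.

\[ \underbrace{p(\theta)}_{\text{parametric}} \longrightarrow \underbrace{p(x \mid \theta)}_{\text{parametric}}\, p(\theta) \longrightarrow \underbrace{p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)}_{\text{mess?}} \]

Conjugacy: require $p(\theta)$ and $p(\theta \mid x)$ to be of the same form. E.g.

\[ \underbrace{p(\theta)}_{\text{Dirichlet}} \longrightarrow \underbrace{p(x \mid \theta)}_{\text{Multinomial}}\, p(\theta) \longrightarrow \underbrace{p(\theta \mid x)}_{\text{Dirichlet}} \]

$p(\theta)$ and $p(x \mid \theta)$ are then called conjugate distributions.

Example: Dirichlet and Multinomial

\[ p(\theta) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1} \qquad \text{Dirichlet in } \theta \qquad \left( \Gamma(x) = (x-1)! \right) \]

\[ p(x \mid \theta) = \frac{(\sum_i x_i)!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \theta_i^{x_i} \qquad \text{Multinomial in } x \]

\[ p(\theta \mid x) \propto p(x \mid \theta) \times p(\theta) = \text{junk} \times \prod_i \theta_i^{x_i + \alpha_i - 1} \]

which is again Dirichlet, so we must have

\[ p(\theta \mid x) = \frac{\Gamma(\sum_i (\alpha_i + x_i))}{\prod_i \Gamma(\alpha_i + x_i)} \prod_i \theta_i^{x_i + \alpha_i - 1}. \]

Remember the pseudocount of 1? That was just a Dirichlet prior.

Conjugate Pairs

\[
\begin{array}{ll|ll}
\multicolumn{2}{l|}{\text{Prior}} & \multicolumn{2}{l}{\text{Conditional}} \\
\hline
\text{Gaussian} & e^{-\|\mu - \mu_0\|^2/(2\sigma^2)} & \text{Gaussian} & e^{-\|x - \mu\|^2/(2\sigma^2)} \\
\text{Beta} & \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\, \alpha^{r-1} (1-\alpha)^{s-1} & \text{Bernoulli} & \alpha^x (1-\alpha)^{1-x} \\
\text{Dirichlet} & \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1} & \text{Multinomial} & \frac{(\sum_i x_i)!}{\prod_i x_i!} \prod_i \theta_i^{x_i} \\
\text{Inv. Wishart} & & \text{Gaussian (cov)} &
\end{array}
\]

Note: Conjugacy is mutual, e.g.

Dirichlet $\to$ Multinomial $\to$ Dirichlet
Multinomial $\to$ Dirichlet $\to$ Multinomial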
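The Dirichlet-multinomial update above amounts to adding the observed counts to the prior parameters. The following minimal Python sketch illustrates this and compares the ML estimate with the posterior mean. The helper names, the example counts, and reading the "pseudocount of 1" as the posterior mean under a prior with every $\alpha_i = 1$ are assumptions made for illustration; integer parameters are used so that exact fractions can be shown.

```python
from fractions import Fraction


def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters alpha_i + x_i after multinomial counts x."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + x for a, x in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum_j alpha_j."""
    total = sum(alpha)
    return [Fraction(a, total) for a in alpha]


def ml_estimate(counts):
    """ML estimate of multinomial probabilities: empirical frequencies x_i / n."""
    n = sum(counts)
    return [Fraction(x, n) for x in counts]


counts = [3, 0, 1]   # hypothetical observed counts for 3 categories
prior = [1, 1, 1]    # assumed prior: one pseudocount per category

posterior = dirichlet_posterior(prior, counts)
ml = ml_estimate(counts)
smoothed = dirichlet_mean(posterior)   # equals (x_i + 1) / (n + 3) here

print(posterior, ml, smoothed)
```

The ML estimate assigns probability zero to the unseen second category, while the posterior mean does not; under the stated assumption, this is the add-one smoothing effect that the pseudocount remark refers to.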
