
STAT 582 Exponential families

The family of distributions with range not depending on the parameter, and with sufficient statistics whose dimension does not depend on the sample size, turns out to be quite rich. It is called the exponential family of distributions. These have density

$$f(x;\theta) = \exp(c(\theta)^T T(x) - B(\theta))\, h(x), \qquad x \in E \subset \mathbb{R}^n,$$

where the set $E$ does not depend on $\theta$. The sufficient statistic $T(x)$, which is determined up to a multiplicative constant, is called the natural sufficient statistic. We say that an exponential family is minimal if the functions $c(\theta)$ and the statistics $T(x)$ each are linearly independent. We can always achieve this by reparametrization.

Example: (1) $\Gamma(\theta,1)$: For the gamma distribution with shape $\theta$ and scale 1 we have $T(x) = \sum \log x_i$, $E = \{x : x_i > 0,\ i=1,\dots,n\}$, $c(\theta) = \theta - 1$, $B(\theta) = n \log \Gamma(\theta)$ and $h(x) = \exp(-\sum x_i)$.
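As a quick numerical sanity check (a sketch added here, not part of the original notes; the parameter values are arbitrary), the factorization can be verified with SciPy:

# Check that the Gamma(theta, 1) joint log density equals
# c(theta) T(x) - B(theta) + log h(x) with T(x) = sum(log x_i),
# c(theta) = theta - 1, B(theta) = n log Gamma(theta), h(x) = exp(-sum x_i).
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

theta, n = 2.5, 5
rng = np.random.default_rng(0)
x = rng.gamma(theta, 1.0, size=n)

log_joint = gamma.logpdf(x, a=theta).sum()
log_expfam = (theta - 1) * np.log(x).sum() - n * gammaln(theta) - x.sum()
assert np.isclose(log_joint, log_expfam)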

(2) The inverse normal: This distribution is given by the density

$$f(x;\mu,\lambda) = \Big(\frac{\lambda}{2\pi}\Big)^{1/2} x^{-3/2} \exp(-\lambda(x-\mu)^2/2\mu^2 x), \qquad x > 0.$$

Here $\mu$ is the mean, and $\lambda$ is a precision parameter. It sometimes helps to reparametrize using $\alpha = \lambda/\mu^2$, yielding

$$f(x;\alpha,\lambda) = \exp\big((\alpha\lambda)^{1/2} + \tfrac{1}{2}\log\lambda - \alpha x/2 - \lambda/2x\big)\, h(x), \qquad h(x) = (2\pi)^{-1/2} x^{-3/2},$$

so that $(\sum x_i, \sum x_i^{-1})$ is sufficient for the natural parameter $(-\alpha/2, -\lambda/2)$.

For a minimal family, the sufficient statistic $T$ is also minimal sufficient. For a proof, see Lehmann: Theory of Point Estimation, Example 5.9, pp. 43-44. If we parametrize the family using $\eta = c(\theta)$, this is called the natural parametrization (or the canonical parametrization). We then write

$$f(x;\eta) = \exp(\eta^T T(x) - A(\eta))\, h(x)$$

where

$$A(\eta) = \log \int_E \exp(\eta^T T(x))\, h(x)\, dx.$$

The natural parameter space is $H = \{\eta : A(\eta) < \infty\}$.
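For the $\Gamma(\theta,1)$ family with $n = 1$, for instance, $A(\eta) = \log \int_0^\infty e^{\eta \log x}\, e^{-x-\log x}\, dx = \log\Gamma(\eta)$, which can be checked numerically (a sketch, not from the notes; the test values are arbitrary):

# Compute A(eta) = log of the integral of exp(eta T(x)) h(x) over E by
# quadrature for the Gamma family with n = 1, where T(x) = log x and
# h(x) = exp(-x - log x); the result should equal log Gamma(eta).
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def A(eta):
    integrand = lambda x: np.exp(eta * np.log(x) - x - np.log(x))
    value, _ = quad(integrand, 0, np.inf)
    return np.log(value)

for eta in [0.5, 1.0, 2.5, 7.0]:
    assert np.isclose(A(eta), gammaln(eta))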

Theorem 1: $H$ is a convex set.

Proof: Let $0 < a < 1$ and let $\eta_0, \eta_1 \in H$. By Hölder's inequality,

$$\int_E \exp\big((a\eta_0 + (1-a)\eta_1)^T T(x)\big)\, h(x)\, dx \le \Big(\int_E e^{\eta_0^T T} h\, dx\Big)^{a} \Big(\int_E e^{\eta_1^T T} h\, dx\Big)^{1-a},$$

so that

$$A(a\eta_0 + (1-a)\eta_1) \le a A(\eta_0) + (1-a) A(\eta_1) < \infty.$$

Example: In the case of $\Gamma(\theta,1)$ we define $h(x) = \exp(-\sum x_i - \sum \log x_i)$, and find that $\theta$ is itself the natural parameter (obviously, any linear function of $\theta$ is also natural). The natural parameter space is then $\mathbb{R}^+$.

Theorem 2: If $d = \dim(H) = 1$ we have, whenever $\eta \in \operatorname{int}(H)$, that

$$E_\eta T(X) = A'(\eta)$$

and

$$\operatorname{Var}_\eta T(X) = A''(\eta).$$

Proof: First compute

$$E_\eta T(X) = \int_E T(x) f(x;\eta)\, dx = \int_E T(x) \exp(\eta T(x) - A(\eta))\, h(x)\, dx$$

and

$$A'(\eta) = \frac{d}{d\eta} \log \int_E \exp(\eta T)\, h\, dx = \frac{\int_E T \exp(\eta T)\, h\, dx}{\int_E \exp(\eta T)\, h\, dx} = \int_E T \exp(\eta T - A(\eta))\, h\, dx = E_\eta T,$$

where the differentiation under the integral sign needs to be justified. To do that, define $\psi(\eta) = \exp(A(\eta))$. Note that

$$\frac{\psi(\eta+k) - \psi(\eta)}{k} = \frac{1}{k}\Big(\int_E \exp((\eta+k)T)\, h\, dx - \int_E \exp(\eta T)\, h\, dx\Big) = \int_E \frac{\exp(kT) - 1}{k}\, \exp(\eta T)\, h\, dx.$$

Now, if $|k| < \delta$, where $\delta$ is small enough that $\eta \pm 2\delta \in H$, the integrand is dominated by the integrable function $\delta^{-1}\big(e^{(\eta+2\delta)T} + e^{(\eta-2\delta)T}\big) h$, so by dominated convergence we may let $k \to 0$ under the integral sign, yielding $\psi'(\eta) = \int_E T \exp(\eta T)\, h\, dx$.

Similarly,

$$A''(\eta) = \frac{d^2}{d\eta^2} \log \psi(\eta) = E T^2 - (E T)^2 = \operatorname{Var} T.$$

Example: (1) $\Gamma(\theta,1)$: For $n = 1$ we have $A(\theta) = \log\Gamma(\theta)$, so

$$E_\theta T = E_\theta \log X = \frac{\Gamma'(\theta)}{\Gamma(\theta)} = \Psi(\theta),$$

where $\Psi(x)$ is the digamma function.
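This identity is easy to check by simulation (a sketch; sample size and seed are arbitrary):

# Monte Carlo check that E_theta[log X] = digamma(theta) for X ~ Gamma(theta, 1).
import numpy as np
from scipy.special import digamma

theta = 3.0
rng = np.random.default_rng(1)
x = rng.gamma(theta, 1.0, size=1_000_000)
print(np.log(x).mean(), digamma(theta))  # the two values should nearly agree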

(2) The Rayleigh density: let

$$f(x;\theta) = \frac{x}{\theta^2} \exp(-x^2/2\theta^2),$$

where $x$ and $\theta$ are positive real numbers. This is the density of the length of a bivariate normal vector whose two components are independent normals with mean 0 and standard deviation $\theta$. Writing this in exponential family form for a sample of size $n$ we get

$$f(x;\theta) = \exp\Big(-\frac{1}{2\theta^2}\sum x_i^2 - n\log\theta^2 + \sum \log x_i\Big).$$

Thus $\eta = c(\theta) = -1/2\theta^2$, or $\theta^2 = -1/2\eta$. Also $B(\theta) = n\log\theta^2$, so $A(\eta) = -n\log(-2\eta)$. By Theorem 2 we get

$$E \sum X_i^2 = -\frac{n}{\eta} = 2n\theta^2$$

and

$$\operatorname{Var} \sum X_i^2 = \frac{n}{\eta^2} = 4n\theta^4.$$

To compute these moments directly from the density one must compute, e.g.,

$$\int_0^\infty \frac{x^3}{\theta^2} \exp(-x^2/2\theta^2)\, dx,$$

an exercise in partial integration.
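The moment formulas can also be checked by simulation, using the bivariate normal representation of the Rayleigh distribution (a sketch; the parameter values are arbitrary):

# Monte Carlo check of E[sum X_i^2] = 2 n theta^2 and
# Var[sum X_i^2] = 4 n theta^4 for a Rayleigh(theta) sample of size n.
import numpy as np

theta, n, reps = 1.5, 10, 200_000
rng = np.random.default_rng(2)
# A Rayleigh(theta) variable is the length of a bivariate normal vector
# with independent N(0, theta^2) components, so X^2 is the squared norm.
z = rng.normal(0.0, theta, size=(reps, n, 2))
t = (z ** 2).sum(axis=(1, 2))  # sum over components and over the sample
print(t.mean(), 2 * n * theta ** 2)  # expect both near 45.0
print(t.var(), 4 * n * theta ** 4)   # expect both near 202.5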

Corollary 1: For $d > 1$ we have

$$E_\eta T_i(X) = \frac{\partial}{\partial \eta_i} A(\eta)$$

and

$$\operatorname{Cov}_\eta(T_i(X), T_j(X)) = \frac{\partial^2}{\partial \eta_i \partial \eta_j} A(\eta).$$

Corollary 2: The moment generating function of $T$ is

$$E_\eta \exp\Big(\sum t_i T_i\Big) = \psi(\eta + t)/\psi(\eta),$$

where $\psi(\eta) = \exp(A(\eta))$.

Corollary 3: The cumulant generating function of $T$ is

$$\kappa(t) = \log E_\eta \exp\Big(\sum t_i T_i\Big) = A(\eta + t) - A(\eta).$$

Thus $A^{(k)}(\eta)$ is the $k$'th cumulant of $T$.

Example: (1) $\Gamma(\alpha,\beta)$: Here

$$f(x;\eta) = \exp\Big(\alpha \sum \log x_i - \beta \sum x_i + n\alpha\log\beta - n\log\Gamma(\alpha) - \sum \log x_i\Big).$$

The natural parameter is $\eta = (\alpha, \beta)^T$, paired with the statistic $(\sum \log x_i, -\sum x_i)$, with $A(\eta) = n(\log\Gamma(\alpha) - \alpha\log\beta)$, and by Corollary 2 the joint mgf for $(\log X, -X)$ (taking $n = 1$) is

$$\exp\big(\log\Gamma(\alpha+t_1) - (\alpha+t_1)\log(\beta+t_2) - \log\Gamma(\alpha) + \alpha\log\beta\big) = \frac{\Gamma(\alpha+t_1)}{\Gamma(\alpha)}\,(1+t_2/\beta)^{-\alpha}\,(\beta+t_2)^{-t_1}.$$

Setting $t_1 = 0$ we get the mgf for $X$ (evaluated at $-t_2$), and setting $t_2 = 0$ that for $\log X$.
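Since this joint mgf is just $E[X^{t_1} e^{-t_2 X}]$, it can be checked by Monte Carlo (a sketch verifying the formula as reconstructed above; the parameter values are arbitrary):

# Monte Carlo check of E[X^{t1} exp(-t2 X)] for X ~ Gamma(alpha, rate beta)
# against Gamma(alpha+t1)/Gamma(alpha) (1+t2/beta)^{-alpha} (beta+t2)^{-t1}.
import numpy as np
from scipy.special import gammaln

alpha, beta, t1, t2 = 2.0, 1.5, 0.3, 0.4
rng = np.random.default_rng(3)
x = rng.gamma(alpha, 1.0 / beta, size=2_000_000)  # numpy takes the scale 1/beta

mc = np.mean(x ** t1 * np.exp(-t2 * x))
closed = np.exp(gammaln(alpha + t1) - gammaln(alpha)
                - alpha * np.log(1 + t2 / beta) - t1 * np.log(beta + t2))
print(mc, closed)  # should agree to Monte Carlo accuracy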

(2) Inverse normal: By Corollary 3, the cumulants of the natural sufficient statistic $(X, X^{-1})$ follow from $A$, which in the $(\alpha,\lambda)$-parametrization of the earlier example is $A(\alpha,\lambda) = -(\alpha\lambda)^{1/2} - \tfrac{1}{2}\log\lambda$.

The exponential family is closed under sampling, in the sense that if we take a random sample $X_1, \dots, X_n$ from the distribution (1), the joint density is

$$\prod_{i=1}^n f(x_i;\theta) = \exp\Big(\sum c(\theta)^T T(x_i) - nB(\theta)\Big) \prod h(x_i) = \exp\Big(c(\theta)^T \big(\sum T(x_i)\big) - nB(\theta)\Big)\, h_n(x_1,\dots,x_n),$$

so the natural sufficient statistic is $\sum T(x_i)$. Notice that this always has the same dimension, regardless of sample size. Furthermore, the natural sufficient statistic itself has an exponential family distribution. More precisely, we have the following result:

Theorem 3: Let $X$ be distributed according to the exponential family

$$f(x;(a,b)^T) = \exp\Big(\sum_{i=1}^r a_i U_i(x) + \sum_{i=1}^s b_i T_i(x) - A(a,b)\Big)\, h(x).$$

Then the distribution of $T_1, \dots, T_s$ is an exponential family of the form

$$f_T(t;(a,b)^T) = \exp\Big(\sum_{i=1}^s b_i t_i - A(a,b)\Big)\, g(t;a),$$

and the conditional distribution of $U_1, \dots, U_r$ given $T = t$ is an exponential family of the form

$$f_{U|T}(u \mid t;(a,b)^T) = \exp\Big(\sum_{i=1}^r a_i u_i - A_t(a)\Big)\, k(t,u).$$

Proof: We prove the result in the discrete case. A general proof is in Lehmann: Testing Statistical Hypotheses, Wiley, 1959, p. 52. Compute

$$P(T(X) = t) = \sum_{x: T(x)=t} f(x;(a,b)^T) = \sum_{x: T(x)=t} \exp\Big(\sum_{i=1}^r a_i U_i(x) + \sum_{i=1}^s b_i T_i(x) - A(a,b)\Big)\, h(x) = \exp\Big(\sum_{i=1}^s b_i t_i - A(a,b)\Big) \sum_{x: T(x)=t} \exp\Big(\sum_{i=1}^r a_i U_i(x)\Big)\, h(x).$$

This proves the first part, with $g(t;a)$ the final sum. For the second part, write

$$P(U = u \mid T = t) = \frac{P(U=u, T=t)}{P(T=t)} = \frac{\exp\big(\sum_{i=1}^r a_i u_i + \sum_{i=1}^s b_i t_i - A(a,b)\big)\, k(u,t)}{\exp\big(\sum_{i=1}^s b_i t_i - A(a,b)\big)\, g(t;a)} = \exp\Big(\sum a_i u_i - \log g(t;a)\Big)\, k(u,t),$$

where $k(u,t) = \sum_{x: U(x)=u,\, T(x)=t} h(x)$.
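A classical special case of Theorem 3 (our illustration, not an example from the notes): for independent $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$, taking $U = X_1$ and $T = X_1 + X_2$, the conditional distribution of $U$ given $T = t$ is $\mathrm{Binomial}(t, \lambda_1/(\lambda_1+\lambda_2))$, free of the overall rate. A simulation sketch:

# Empirically verify that X1 | X1 + X2 = t is Binomial(t, l1/(l1+l2))
# for independent Poisson variables X1 ~ Poi(l1), X2 ~ Poi(l2).
import numpy as np
from scipy.stats import binom

l1, l2, t = 2.0, 3.0, 7
rng = np.random.default_rng(4)
x1 = rng.poisson(l1, size=2_000_000)
x2 = rng.poisson(l2, size=2_000_000)
keep = x1[x1 + x2 == t]  # condition on T = t

empirical = np.bincount(keep, minlength=t + 1) / keep.size
theoretical = binom.pmf(np.arange(t + 1), t, l1 / (l1 + l2))
print(np.abs(empirical - theoretical).max())  # should be small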

The likelihood equation for an exponential family is simple.

Theorem 4: Let $C = \operatorname{int}\{c(\theta): \theta \in \Theta\}$, and let $X$ be distributed according to a minimal exponential family. If the equation

$$E_\theta T(X) = T(x)$$

has a solution $\hat\theta(x)$ with $c(\hat\theta(x)) \in C$, then $\hat\theta$ is the unique mle of $\theta$.

Proof: Look first at the canonical parametrization, so that the log likelihood is

$$\sum \eta_i T_i(x) - A(\eta) + \log h(x)$$

and the likelihood equations are

$$T_i(x) = \frac{\partial}{\partial \eta_i} A(\eta) = E_\eta T_i(X),$$

using Corollary 1 to Theorem 2. In addition, minus the second derivative of the log likelihood is just $[\partial^2 A(\eta)/\partial\eta_i\partial\eta_j]$, or the covariance matrix of $T$. Since the parametrization is minimal, the covariance matrix is positive definite.

Now note that the values of $l_n(\theta)$, $\theta \in \Theta$, are a subset of the values of $l_n(\eta)$, $\eta \in H$. Hence if $c(\hat\theta) \in C$ then this corresponds to the unique maximizing $\hat\eta$.

Example: $\Gamma(\alpha,\beta)$: The likelihood equations are

$$\frac{n\alpha}{\beta} = \sum x_i, \qquad n(\Psi(\alpha) - \log\beta) = \sum \log x_i.$$
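These can be solved numerically: the first equation gives $\beta = \alpha/\bar{x}$, and substituting into the second leaves a one-dimensional root-finding problem in $\alpha$ (a sketch; the simulated data and bracketing interval are our choices):

# Solve the gamma likelihood equations: eliminating beta = alpha / mean(x)
# leaves digamma(alpha) - log(alpha) = mean(log x) - log(mean(x)).
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.gamma(2.0, 1.0 / 1.5, size=10_000)  # true alpha = 2, rate beta = 1.5

rhs = np.log(x).mean() - np.log(x.mean())
alpha_hat = brentq(lambda a: digamma(a) - np.log(a) - rhs, 1e-6, 1e6)
beta_hat = alpha_hat / x.mean()
print(alpha_hat, beta_hat)  # should be close to (2.0, 1.5)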