
STAT 582 Exponential families

The family of distributions with range not depending on the parameter, and with sufficient statistics whose dimension does not depend on the sample size, turns out to be quite rich. It is called the exponential family of distributions. These have density

$$f(x;\theta) = \exp(c(\theta)^T T(x) - B(\theta))\, h(x), \qquad x \in E \subset \mathbb{R}^n,$$

where the set $E$ does not depend on $\theta$. The sufficient statistic $T(x)$, which is determined up to a multiplicative constant, is called the natural sufficient statistic. We say that an exponential family is minimal if the functions $c(\theta)$ and the statistics $T(x)$ each are linearly independent. We can always achieve this by reparametrization.

Example: (1) $\Gamma(\theta,1)$: For the gamma distribution with shape $\theta$ and scale 1 we have $T(x) = \sum \log x_i$, $E = \{x : x_i > 0,\ i=1,\dots,n\}$, $c(\theta) = \theta - 1$, $B(\theta) = n \log \Gamma(\theta)$ and $h(x) = \exp(-\sum x_i)$.
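As a quick numerical sanity check (a sketch added here, not part of the original notes; the parameter values are arbitrary), the factorization can be verified with SciPy:

# Check that the Gamma(theta, 1) joint log density equals
# c(theta) T(x) - B(theta) + log h(x) with T(x) = sum(log x_i),
# c(theta) = theta - 1, B(theta) = n log Gamma(theta), h(x) = exp(-sum x_i).
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

theta, n = 2.5, 5
rng = np.random.default_rng(0)
x = rng.gamma(theta, 1.0, size=n)

log_joint = gamma.logpdf(x, a=theta).sum()
log_expfam = (theta - 1) * np.log(x).sum() - n * gammaln(theta) - x.sum()
assert np.isclose(log_joint, log_expfam)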

(2) The inverse normal: This distribution is given by the density

$$f(x;\mu,\lambda) = \Big(\frac{\lambda}{2\pi}\Big)^{1/2} x^{-3/2} \exp(-\lambda(x-\mu)^2/2\mu^2 x), \qquad x > 0.$$

Here $\mu$ is the mean, and $\lambda$ is a precision parameter. It sometimes helps to reparametrize using $\alpha = \lambda/\mu^2$, yielding

$$f(x;\alpha,\lambda) = \exp\big((\alpha\lambda)^{1/2} + \tfrac{1}{2}\log\lambda - \alpha x/2 - \lambda/2x\big)\, h(x), \qquad h(x) = (2\pi)^{-1/2} x^{-3/2},$$

so that $(\sum x_i, \sum x_i^{-1})$ is sufficient for the natural parameter $(-\alpha/2, -\lambda/2)$.

For a minimal family, the sufficient statistic $T$ is also minimal sufficient. For a proof, see Lehmann: Theory of Point Estimation, Example 5.9, pp. 43-44. If we parametrize the family using $\eta = c(\theta)$, this is called the natural parametrization (or the canonical parametrization). We then write

$$f(x;\eta) = \exp(\eta^T T(x) - A(\eta))\, h(x)$$

where

$$A(\eta) = \log \int_E \exp(\eta^T T(x))\, h(x)\, dx.$$

The natural parameter space is $H = \{\eta : A(\eta) < \infty\}$.
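For the $\Gamma(\theta,1)$ family with $n = 1$, for instance, $A(\eta) = \log \int_0^\infty e^{\eta \log x}\, e^{-x-\log x}\, dx = \log\Gamma(\eta)$, which can be checked numerically (a sketch, not from the notes; the test values are arbitrary):

# Compute A(eta) = log of the integral of exp(eta T(x)) h(x) over E by
# quadrature for the Gamma family with n = 1, where T(x) = log x and
# h(x) = exp(-x - log x); the result should equal log Gamma(eta).
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def A(eta):
    integrand = lambda x: np.exp(eta * np.log(x) - x - np.log(x))
    value, _ = quad(integrand, 0, np.inf)
    return np.log(value)

for eta in [0.5, 1.0, 2.5, 7.0]:
    assert np.isclose(A(eta), gammaln(eta))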

Theorem 1: $H$ is a convex set.

Proof: Let $0 < a < 1$ and let $\eta_0, \eta_1 \in H$. By Hölder's inequality,

$$\int_E \exp\big((a\eta_0 + (1-a)\eta_1)^T T(x)\big)\, h(x)\, dx \le \Big(\int_E e^{\eta_0^T T} h\, dx\Big)^{a} \Big(\int_E e^{\eta_1^T T} h\, dx\Big)^{1-a},$$

so that

$$A(a\eta_0 + (1-a)\eta_1) \le a A(\eta_0) + (1-a) A(\eta_1) < \infty.$$

Example: In the case of $\Gamma(\theta,1)$ we define $h(x) = \exp(-\sum x_i - \sum \log x_i)$, and find that $\theta$ is itself the natural parameter (obviously, any linear function of $\theta$ is also natural). The natural parameter space is then $\mathbb{R}^+$.

Theorem 2: If $d = \dim(H) = 1$ we have, whenever $\eta \in \operatorname{int}(H)$, that

$$E_\eta T(X) = A'(\eta)$$

and

$$\operatorname{Var}_\eta T(X) = A''(\eta).$$

Proof: First compute

$$E_\eta T(X) = \int_E T(x) f(x;\eta)\, dx = \int_E T(x) \exp(\eta T(x) - A(\eta))\, h(x)\, dx$$

and

$$A'(\eta) = \frac{d}{d\eta} \log \int_E \exp(\eta T)\, h\, dx = \frac{\int_E T \exp(\eta T)\, h\, dx}{\int_E \exp(\eta T)\, h\, dx} = \int_E T \exp(\eta T - A(\eta))\, h\, dx = E_\eta T,$$

where the differentiation under the integral sign needs to be justified. To do that, define $\psi(\eta) = \exp(A(\eta))$. Note that

$$\frac{\psi(\eta+k) - \psi(\eta)}{k} = \frac{1}{k}\Big(\int_E \exp((\eta+k)T)\, h\, dx - \int_E \exp(\eta T)\, h\, dx\Big) = \int_E \frac{\exp(kT) - 1}{k}\, \exp(\eta T)\, h\, dx.$$

Now, if $|k| < \delta$, where $\delta$ is small enough that $\eta \pm 2\delta \in H$, the integrand is dominated by the integrable function $\delta^{-1}\big(e^{(\eta+2\delta)T} + e^{(\eta-2\delta)T}\big) h$, so by dominated convergence we may let $k \to 0$ under the integral sign, yielding $\psi'(\eta) = \int_E T \exp(\eta T)\, h\, dx$.

Similarly,

$$A''(\eta) = \frac{d^2}{d\eta^2} \log \psi(\eta) = E T^2 - (E T)^2 = \operatorname{Var} T.$$

Example: (1) $\Gamma(\theta,1)$: For $n = 1$ we have $A(\theta) = \log\Gamma(\theta)$, so

$$E_\theta T = E_\theta \log X = \frac{\Gamma'(\theta)}{\Gamma(\theta)} = \Psi(\theta),$$

where $\Psi(x)$ is the digamma function.
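This identity is easy to check by simulation (a sketch; sample size and seed are arbitrary):

# Monte Carlo check that E_theta[log X] = digamma(theta) for X ~ Gamma(theta, 1).
import numpy as np
from scipy.special import digamma

theta = 3.0
rng = np.random.default_rng(1)
x = rng.gamma(theta, 1.0, size=1_000_000)
print(np.log(x).mean(), digamma(theta))  # the two values should nearly agree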

(2) The Rayleigh density: let

$$f(x;\theta) = \frac{x}{\theta^2} \exp(-x^2/2\theta^2),$$

where $x$ and $\theta$ are positive real numbers. This is the density of the length of a bivariate normal vector whose two components are independent normals with mean 0 and standard deviation $\theta$. Writing this in exponential family form for a sample of size $n$ we get

$$f(x;\theta) = \exp\Big(-\frac{1}{2\theta^2}\sum x_i^2 - n\log\theta^2 + \sum \log x_i\Big).$$

Thus $\eta = c(\theta) = -1/2\theta^2$, or $\theta^2 = -1/2\eta$. Also $B(\theta) = n\log\theta^2$, so $A(\eta) = -n\log(-2\eta)$. By Theorem 2 we get

$$E \sum X_i^2 = -\frac{n}{\eta} = 2n\theta^2$$

and

$$\operatorname{Var} \sum X_i^2 = \frac{n}{\eta^2} = 4n\theta^4.$$

To compute these moments directly from the density one must compute, e.g.,

$$\int_0^\infty \frac{x^3}{\theta^2} \exp(-x^2/2\theta^2)\, dx,$$

an exercise in partial integration.
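The moment formulas can also be checked by simulation, using the bivariate normal representation of the Rayleigh distribution (a sketch; the parameter values are arbitrary):

# Monte Carlo check of E[sum X_i^2] = 2 n theta^2 and
# Var[sum X_i^2] = 4 n theta^4 for a Rayleigh(theta) sample of size n.
import numpy as np

theta, n, reps = 1.5, 10, 200_000
rng = np.random.default_rng(2)
# A Rayleigh(theta) variable is the length of a bivariate normal vector
# with independent N(0, theta^2) components, so X^2 is the squared norm.
z = rng.normal(0.0, theta, size=(reps, n, 2))
t = (z ** 2).sum(axis=(1, 2))  # sum over components and over the sample
print(t.mean(), 2 * n * theta ** 2)  # expect both near 45.0
print(t.var(), 4 * n * theta ** 4)   # expect both near 202.5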

Corollary 1: For $d > 1$ we have

$$E_\eta T_i(X) = \frac{\partial}{\partial \eta_i} A(\eta)$$

and

$$\operatorname{Cov}_\eta(T_i(X), T_j(X)) = \frac{\partial^2}{\partial \eta_i \partial \eta_j} A(\eta).$$

Corollary 2: The moment generating function of $T$ is

$$E_\eta \exp\Big(\sum t_i T_i\Big) = \psi(\eta + t)/\psi(\eta),$$

where $\psi(\eta) = \exp(A(\eta))$.

Corollary 3: The cumulant generating function of $T$ is

$$\kappa(t) = \log E_\eta \exp\Big(\sum t_i T_i\Big) = A(\eta + t) - A(\eta).$$

Thus $A^{(k)}(\eta)$ is the $k$'th cumulant of $T$.

Example: (1) $\Gamma(\alpha,\beta)$: Here

$$f(x;\eta) = \exp\Big(\alpha \sum \log x_i - \beta \sum x_i + n\alpha\log\beta - n\log\Gamma(\alpha) - \sum \log x_i\Big).$$

The natural parameter is $\eta = (\alpha, \beta)^T$, paired with the statistic $(\sum \log x_i, -\sum x_i)$, with $A(\eta) = n(\log\Gamma(\alpha) - \alpha\log\beta)$, and by Corollary 2 the joint mgf for $(\log X, -X)$ (taking $n = 1$) is

$$\exp\big(\log\Gamma(\alpha+t_1) - (\alpha+t_1)\log(\beta+t_2) - \log\Gamma(\alpha) + \alpha\log\beta\big) = \frac{\Gamma(\alpha+t_1)}{\Gamma(\alpha)}\,(1+t_2/\beta)^{-\alpha}\,(\beta+t_2)^{-t_1}.$$

Setting $t_1 = 0$ we get the mgf for $X$ (evaluated at $-t_2$), and setting $t_2 = 0$ that for $\log X$.
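Since this joint mgf is just $E[X^{t_1} e^{-t_2 X}]$, it can be checked by Monte Carlo (a sketch verifying the formula as reconstructed above; the parameter values are arbitrary):

# Monte Carlo check of E[X^{t1} exp(-t2 X)] for X ~ Gamma(alpha, rate beta)
# against Gamma(alpha+t1)/Gamma(alpha) (1+t2/beta)^{-alpha} (beta+t2)^{-t1}.
import numpy as np
from scipy.special import gammaln

alpha, beta, t1, t2 = 2.0, 1.5, 0.3, 0.4
rng = np.random.default_rng(3)
x = rng.gamma(alpha, 1.0 / beta, size=2_000_000)  # numpy takes the scale 1/beta

mc = np.mean(x ** t1 * np.exp(-t2 * x))
closed = np.exp(gammaln(alpha + t1) - gammaln(alpha)
                - alpha * np.log(1 + t2 / beta) - t1 * np.log(beta + t2))
print(mc, closed)  # should agree to Monte Carlo accuracy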

(2) Inverse normal: By Corollary 3, the cumulants of the natural sufficient statistic $(X, X^{-1})$ follow from $A$, which in the $(\alpha,\lambda)$-parametrization of the earlier example is $A(\alpha,\lambda) = -(\alpha\lambda)^{1/2} - \tfrac{1}{2}\log\lambda$.

The exponential family is closed under sampling, in the sense that if we take a random sample $X_1, \dots, X_n$ from the distribution (1), the joint density is

$$\prod_{i=1}^n f(x_i;\theta) = \exp\Big(\sum c(\theta)^T T(x_i) - nB(\theta)\Big) \prod h(x_i) = \exp\Big(c(\theta)^T \big(\sum T(x_i)\big) - nB(\theta)\Big)\, h_n(x_1,\dots,x_n),$$

so the natural sufficient statistic is $\sum T(x_i)$. Notice that this always has the same dimension, regardless of sample size. Furthermore, the natural sufficient statistic itself has an exponential family distribution. More precisely, we have the following result:

Theorem 3: Let $X$ be distributed according to the exponential family

$$f(x;(a,b)^T) = \exp\Big(\sum_{i=1}^r a_i U_i(x) + \sum_{i=1}^s b_i T_i(x) - A(a,b)\Big)\, h(x).$$

Then the distribution of $T_1, \dots, T_s$ is an exponential family of the form

$$f_T(t;(a,b)^T) = \exp\Big(\sum_{i=1}^s b_i t_i - A(a,b)\Big)\, g(t;a),$$

and the conditional distribution of $U_1, \dots, U_r$ given $T = t$ is an exponential family of the form

$$f_{U|T}(u \mid t;(a,b)^T) = \exp\Big(\sum_{i=1}^r a_i u_i - A_t(a)\Big)\, k(t,u).$$

Proof: We prove the result in the discrete case. A general proof is in Lehmann: Testing Statistical Hypotheses, Wiley, 1959, p. 52. Compute

$$P(T(X) = t) = \sum_{x: T(x)=t} f(x;(a,b)^T) = \sum_{x: T(x)=t} \exp\Big(\sum_{i=1}^r a_i U_i(x) + \sum_{i=1}^s b_i T_i(x) - A(a,b)\Big)\, h(x) = \exp\Big(\sum_{i=1}^s b_i t_i - A(a,b)\Big) \sum_{x: T(x)=t} \exp\Big(\sum_{i=1}^r a_i U_i(x)\Big)\, h(x).$$

This proves the first part, with $g(t;a)$ the final sum. For the second part, write

$$P(U = u \mid T = t) = \frac{P(U=u, T=t)}{P(T=t)} = \frac{\exp\big(\sum_{i=1}^r a_i u_i + \sum_{i=1}^s b_i t_i - A(a,b)\big)\, k(u,t)}{\exp\big(\sum_{i=1}^s b_i t_i - A(a,b)\big)\, g(t;a)} = \exp\Big(\sum a_i u_i - \log g(t;a)\Big)\, k(u,t),$$

where $k(u,t) = \sum_{x: U(x)=u,\, T(x)=t} h(x)$.
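A classical special case of Theorem 3 (our illustration, not an example from the notes): for independent $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$, taking $U = X_1$ and $T = X_1 + X_2$, the conditional distribution of $U$ given $T = t$ is $\mathrm{Binomial}(t, \lambda_1/(\lambda_1+\lambda_2))$, free of the overall rate. A simulation sketch:

# Empirically verify that X1 | X1 + X2 = t is Binomial(t, l1/(l1+l2))
# for independent Poisson variables X1 ~ Poi(l1), X2 ~ Poi(l2).
import numpy as np
from scipy.stats import binom

l1, l2, t = 2.0, 3.0, 7
rng = np.random.default_rng(4)
x1 = rng.poisson(l1, size=2_000_000)
x2 = rng.poisson(l2, size=2_000_000)
keep = x1[x1 + x2 == t]  # condition on T = t

empirical = np.bincount(keep, minlength=t + 1) / keep.size
theoretical = binom.pmf(np.arange(t + 1), t, l1 / (l1 + l2))
print(np.abs(empirical - theoretical).max())  # should be small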

The likelihood equation for an exponential family is simple.

Theorem 4: Let $C = \operatorname{int}\{c(\theta): \theta \in \Theta\}$, and let $X$ be distributed according to a minimal exponential family. If the equation

$$E_\theta T(X) = T(x)$$

has a solution $\hat\theta(x)$ with $c(\hat\theta(x)) \in C$, then $\hat\theta$ is the unique mle of $\theta$.

Proof: Look first at the canonical parametrization, so that the log likelihood is

$$\sum \eta_i T_i(x) - A(\eta) + \log h(x)$$

and the likelihood equations are

$$T_i(x) = \frac{\partial}{\partial \eta_i} A(\eta) = E_\eta T_i(X),$$

using Corollary 1 to Theorem 2. In addition, minus the second derivative of the log likelihood is just $[\partial^2 A(\eta)/\partial\eta_i\partial\eta_j]$, or the covariance matrix of $T$. Since the parametrization is minimal, the covariance matrix is positive definite.

Now note that the values of $l_n(\theta)$, $\theta \in \Theta$, are a subset of the values of $l_n(\eta)$, $\eta \in H$. Hence if $c(\hat\theta) \in C$ then this corresponds to the unique maximizing $\hat\eta$.

Example: $\Gamma(\alpha,\beta)$: The likelihood equations are

$$\frac{n\alpha}{\beta} = \sum x_i, \qquad n(\Psi(\alpha) - \log\beta) = \sum \log x_i.$$
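These can be solved numerically: the first equation gives $\beta = \alpha/\bar{x}$, and substituting into the second leaves a one-dimensional root-finding problem in $\alpha$ (a sketch; the simulated data and bracketing interval are our choices):

# Solve the gamma likelihood equations: eliminating beta = alpha / mean(x)
# leaves digamma(alpha) - log(alpha) = mean(log x) - log(mean(x)).
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.gamma(2.0, 1.0 / 1.5, size=10_000)  # true alpha = 2, rate beta = 1.5

rhs = np.log(x).mean() - np.log(x.mean())
alpha_hat = brentq(lambda a: digamma(a) - np.log(a) - rhs, 1e-6, 1e6)
beta_hat = alpha_hat / x.mean()
print(alpha_hat, beta_hat)  # should be close to (2.0, 1.5)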