Statistics 550 Notes 8

Reading: Section 1.6.1-1.6.4

I. Correction on Minimal Sufficiency

The statement of Theorem 2 in Notes 7 was wrong. The correct statement is

Theorem 2 (Lehmann and Scheffé, 1950): Suppose $S(X)$ is a sufficient statistic for $\theta$. Also suppose that if for two sample points $x$ and $y$ the ratio $f(x|\theta)/f(y|\theta)$ is constant as a function of $\theta$, then $S(x) = S(y)$. Then $S(X)$ is a minimal sufficient statistic for $\theta$.

Proof: Let $T(X)$ be any statistic that is sufficient for $\theta$. By the factorization theorem, there exist functions $g$ and $h$ such that $f(x|\theta) = g(T(x)|\theta)\,h(x)$. Let $x$ and $y$ be any two sample points with $T(x) = T(y)$. Then
$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{g(T(x)|\theta)\,h(x)}{g(T(y)|\theta)\,h(y)} = \frac{h(x)}{h(y)}.$$
Since this ratio does not depend on $\theta$, the assumptions of the theorem imply that $S(x) = S(y)$. Thus, $S(X)$ is at least as coarse a partition of the sample space as $T(X)$, and consequently $S(X)$ is minimal sufficient.

Example 1: Consider the ratio
$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{\theta^{\sum_{i=1}^n x_i}(1-\theta)^{n-\sum_{i=1}^n x_i}}{\theta^{\sum_{i=1}^n y_i}(1-\theta)^{n-\sum_{i=1}^n y_i}}.$$
If this ratio is constant as a function of $\theta$, then we must have $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. Since we have shown that $T(X) = \sum_{i=1}^n X_i$ is a sufficient statistic, it follows from the above sentence and Theorem 2 that $T(X) = \sum_{i=1}^n X_i$ is a minimal sufficient statistic.
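The condition in Example 1 can be checked numerically. The sketch below is an illustration only: it assumes the ratio is that of joint pmfs for n iid Bernoulli($\theta$) observations, and the helper names and the grid of $\theta$ values are hypothetical choices. A finite grid can suggest that the ratio is constant but cannot prove it.

```python
from math import log

def log_lik(x, theta):
    """Log joint pmf of iid Bernoulli(theta) observations x (assumed model)."""
    s, n = sum(x), len(x)
    return s * log(theta) + (n - s) * log(1 - theta)

def ratio_is_constant(x, y, thetas=(0.1, 0.3, 0.5, 0.7, 0.9), tol=1e-12):
    """True if log f(x|theta) - log f(y|theta) takes one value on the grid."""
    values = [log_lik(x, t) - log_lik(y, t) for t in thetas]
    return max(values) - min(values) < tol

x = [1, 0, 1, 0]  # sum is 2
y = [0, 1, 1, 0]  # sum is 2
z = [1, 1, 1, 0]  # sum is 3

print(ratio_is_constant(x, y))  # True: equal sums
print(ratio_is_constant(x, z))  # False: different sums
```

Sample points with equal sums give a ratio that is free of $\theta$, while unequal sums give a ratio that moves with $\theta$, which is the behaviour Theorem 2 needs.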

II. Exponential Families

The binomial and normal models exhibited the interesting feature that there is a natural minimal sufficient statistic whose dimension is independent of the sample size. The exponential family models are a general class of models that exhibit this feature.

The class of exponential family models includes many of the most widely used statistical models (e.g., binomial, normal, gamma, Poisson, multinomial). Exponential family models have an underlying structure with elegant properties that we will discuss.

One-parameter exponential families: The family of distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to be a one-parameter exponential family if there exist real-valued functions $\eta(\theta), B(\theta), T(x), h(x)$ such that the pdf or pmf may be written as
$$p(x|\theta) = h(x)\exp\{\eta(\theta)T(x) - B(\theta)\}.$$

Comments: (1) For an exponential family, the support of the distribution (i.e., $\{x : p(x|\theta) > 0\}$) cannot depend on $\theta$. Thus, $X_1, \dots, X_n$ iid Uniform$(0, \theta)$ is not an exponential family model.

(2) For an exponential family model, $T(x)$ is a sufficient statistic by the factorization theorem.

(3) $\eta, B, T$ are not unique. For example, $\eta$ can be multiplied by a constant $c$ and $T$ can be divided by the same constant $c$.

Examples of one-parameter exponential family models:

(1) Poisson family. Let $X \sim \text{Poisson}(\theta)$, $0 < \theta < \infty$. Then for $x \in \{0, 1, 2, \dots\}$,
$$p(x|\theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!}\exp\{x\log\theta - \theta\}.$$
This is a one-parameter exponential family with $\eta(\theta) = \log\theta$, $B(\theta) = \theta$, $T(x) = x$, $h(x) = 1/x!$.

(2) Binomial family. Let $X \sim \text{Binomial}(n, \theta)$, $0 < \theta < 1$. Then for $x \in \{0, 1, 2, \dots, n\}$,
$$p(x|\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x} = \binom{n}{x}\exp\left[x\log\left(\frac{\theta}{1-\theta}\right) + n\log(1-\theta)\right].$$
This is a one-parameter exponential family with
$$\eta(\theta) = \log\left(\frac{\theta}{1-\theta}\right), \quad B(\theta) = -n\log(1-\theta), \quad T(x) = x, \quad h(x) = \binom{n}{x}.$$

The families of distributions obtained by taking iid samples from one-parameter exponential families are themselves one-parameter exponential families.

Specifically, suppose $X \sim P_\theta$ and $\{P_\theta : \theta \in \Theta\}$ is an exponential family. Then for $X_1, \dots, X_n$ iid with common distribution $P_\theta$,
$$p(x_1, \dots, x_n|\theta) = \left[\prod_{i=1}^n h(x_i)\right]\exp\left[\eta(\theta)\sum_{i=1}^n T(x_i) - nB(\theta)\right].$$
A sufficient statistic is $\sum_{i=1}^n T(x_i)$, and it is one-dimensional whatever the sample size $n$ is.

For $X_1, \dots, X_n$ iid Poisson$(\theta)$, the sufficient statistic $\sum_{i=1}^n T(x_i)$ has a Poisson$(n\theta)$ distribution and hence has an exponential family model. It is generally true that the sufficient statistic of an exponential family model follows an exponential family.
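The exponential family representations above can be sanity-checked by evaluating both forms of each pmf. This is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the notes.

```python
from math import comb, exp, factorial, log

def poisson_pmf(x, theta):
    """Usual form of the Poisson pmf."""
    return theta**x * exp(-theta) / factorial(x)

def poisson_expfam(x, theta):
    """h(x) exp{eta*T(x) - B} with h = 1/x!, eta = log(theta), T = x, B = theta."""
    return (1 / factorial(x)) * exp(log(theta) * x - theta)

def binomial_pmf(x, n, theta):
    """Usual form of the binomial pmf."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def binomial_expfam(x, n, theta):
    """h(x) exp{eta*T(x) - B} with h = C(n, x), eta = logit(theta), B = -n log(1 - theta)."""
    eta = log(theta / (1 - theta))
    B = -n * log(1 - theta)
    return comb(n, x) * exp(eta * x - B)

def poisson_joint_expfam(xs, theta):
    """Joint pmf of an iid Poisson sample: [prod h(x_i)] exp{eta * sum T(x_i) - n B}."""
    h = 1.0
    for x in xs:
        h /= factorial(x)
    return h * exp(log(theta) * sum(xs) - len(xs) * theta)

print(poisson_pmf(3, 2.5), poisson_expfam(3, 2.5))
print(binomial_pmf(4, 10, 0.3), binomial_expfam(4, 10, 0.3))
```

In the joint form the data enter the exponent only through `sum(xs)`, which is the one-dimensional sufficient statistic.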

Theorem 1.6.1: Let $\{P_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of discrete distributions:
$$p(x|\theta) = h(x)\exp\{\eta(\theta)T(x) - B(\theta)\}.$$
Then the family of the distributions of the statistic $T(X)$ is a one-parameter exponential family of discrete distributions whose pmf may be written
$$h^*(t)\exp\{\eta(\theta)t - B(\theta)\}$$
for suitable $h^*$.

Proof: By definition,
$$P_\theta[T(X) = t] = \sum_{\{x : T(x) = t\}} p(x|\theta) = \sum_{\{x : T(x) = t\}} h(x)\exp[\eta(\theta)T(x) - B(\theta)] = \exp[\eta(\theta)t - B(\theta)]\left\{\sum_{\{x : T(x) = t\}} h(x)\right\}.$$
If we let $h^*(t) = \sum_{\{x : T(x) = t\}} h(x)$, the result follows.
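As an illustration of the construction of $h^*$ in the proof, the sketch below takes an assumed model of n iid Bernoulli($\theta$) trials, where $h(x) = 1$ and $T(x) = \sum x_i$, and computes $h^*(t)$ by brute-force enumeration of the sample space. The function names are hypothetical.

```python
from itertools import product
from math import comb, exp, log

def h_star(n):
    """h*(t) = sum of h(x) over {x in {0,1}^n : sum(x) = t}, with h(x) = 1."""
    counts = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        counts[sum(x)] += 1
    return counts

def pmf_of_T(t, n, theta):
    """h*(t) exp{eta * t - B} for the sum of n Bernoulli(theta) trials."""
    eta = log(theta / (1 - theta))
    B = -n * log(1 - theta)
    return h_star(n)[t] * exp(eta * t - B)

print(h_star(4))  # [1, 4, 6, 4, 1]
print(pmf_of_T(2, 4, 0.3), comb(4, 2) * 0.3**2 * 0.7**2)
```

The enumerated $h^*(t)$ coincides with the binomial coefficient, so the distribution of $T$ is the Binomial$(n, \theta)$ family in exponential family form.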

A similar theorem holds for continuous exponential families.

A useful reparameterization of the exponential family model is to index $\eta \equiv \eta(\theta)$ as the parameter to yield
$$p(x|\eta) = h(x)\exp[\eta T(x) - A(\eta)],$$
where $A(\eta) = \log\int\cdots\int h(x)\exp[\eta T(x)]\,dx$ in the continuous case and the integral is replaced by a sum in the discrete case. If $\theta \in \Theta$, then $A(\eta)$ must be finite for $\eta = \eta(\theta)$. Let $\mathcal{E} = \{\eta : |A(\eta)| < \infty\}$. The model given by the display above with $\eta$ ranging over $\mathcal{E}$ is called the canonical one-parameter exponential family generated by $T$ and $h$. $\mathcal{E}$ is called the natural parameter space and $T$ is called the natural sufficient statistic. The canonical one-parameter exponential family contains the one-parameter exponential family with parameter space $\theta \in \Theta$ and can be thought of as the "biggest" possible parameter space for the exponential family.

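Example 1: Let $X \sim \text{Poisson}(\theta)$, $0 < \theta < \infty$. Then for $x \in \{0, 1, 2, \dots\}$,
$$p(x|\theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!}\exp\{x\log\theta - \theta\}.$$

Letting $\eta = \log\theta$, we have
$$p(x|\eta) = \frac{1}{x!}\exp\{\eta x - \exp(\eta)\}, \quad x \in \{0, 1, 2, \dots\}.$$
We have
$$A(\eta) = \log\sum_{x=0}^\infty \frac{1}{x!}e^{\eta x} = \log\sum_{x=0}^\infty \frac{(e^\eta)^x}{x!} = \log\exp(e^\eta) = e^\eta.$$
Thus, $\mathcal{E} = \{\eta : |A(\eta)| < \infty\} = (-\infty, \infty)$, since $e^\eta$ is finite for every real $\eta$. Note that if 1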

A useful result about exponential families is the following computational shortcut for moments of the natural sufficient statistic:

Theorem 1.6.2: If $X$ is distributed according to the canonical family $p(x|\eta) = h(x)\exp[\eta T(x) - A(\eta)]$ and $\eta$ is an interior point of $\mathcal{E}$, then the moment-generating function of $T(X)$ exists and is given by
$$M(s) = E[\exp(sT(X))] = \exp[A(s + \eta) - A(\eta)]$$
for $s$ in some neighborhood of 0. Moreover,
$$E_\eta[T(X)] = A'(\eta), \quad \mathrm{Var}_\eta[T(X)] = A''(\eta).$$

Proof: This is the proof for the continuous case.
$$M(s) = E(\exp(sT(X))) = \int\cdots\int h(x)\exp[(s + \eta)T(x) - A(\eta)]\,dx$$
$$= \{\exp[A(s + \eta) - A(\eta)]\}\int\cdots\int h(x)\exp[(s + \eta)T(x) - A(s + \eta)]\,dx = \exp[A(s + \eta) - A(\eta)]$$
because the last factor, being the integral of a density, is one. The rest of the theorem follows from the moment generating property of $M(s)$ (see Section A.12 of Bickel and Doksum).

Comment on proof: In order for the moment generating function (MGF) properties to hold, the MGF must exist (be less than infinity) for $s$ in some neighborhood of 0. The proof that the MGF exists for $s$ in some neighborhood of 0 relies on the fact that $\mathcal{E}$ is an interval or the whole real line, which is established in Section 1.6.4.

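Example 1 continued: Let $X \sim \text{Poisson}(\theta)$, $0 < \theta < \infty$. The natural sufficient statistic is $T(X) = X$ and $\eta = \log\theta$, $A(\eta) = e^\eta$. Thus, using Theorem 1.6.2,
$$E_\theta[X] = \frac{d}{d\eta}e^\eta\Big|_{\eta = \log\theta} = e^\eta\Big|_{\eta = \log\theta} = \theta,$$
$$\mathrm{Var}_\theta[X] = \frac{d^2}{d\eta^2}e^\eta\Big|_{\eta = \log\theta} = e^\eta\Big|_{\eta = \log\theta} = \theta.$$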
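The Poisson case of Theorem 1.6.2 can be checked numerically. The sketch below is illustrative only: it approximates $A'(\eta)$ and $A''(\eta)$ by central finite differences and compares the MGF obtained by summing the pmf with $\exp[A(s + \eta) - A(\eta)]$. The step size and the number of series terms are arbitrary assumptions.

```python
from math import exp, log

def A(eta):
    """Log-partition function of the Poisson family in canonical form."""
    return exp(eta)

def mean_and_var_from_A(eta, step=1e-4):
    """Central finite-difference approximations to A'(eta) and A''(eta)."""
    first = (A(eta + step) - A(eta - step)) / (2 * step)
    second = (A(eta + step) - 2 * A(eta) + A(eta - step)) / step**2
    return first, second

def mgf_by_summation(s, theta, terms=200):
    """E[exp(sX)] for X ~ Poisson(theta), summing exp(s*x) * P(X = x) directly."""
    total, prob = 0.0, exp(-theta)  # prob starts at P(X = 0)
    for x in range(terms):
        total += exp(s * x) * prob
        prob *= theta / (x + 1)     # P(X = x + 1) from P(X = x)
    return total

theta, s = 2.5, 0.3
eta = log(theta)
print(mean_and_var_from_A(eta))  # both entries are close to theta
print(mgf_by_summation(s, theta), exp(A(s + eta) - A(eta)))
```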

Example 2: Suppose $X_1, \dots, X_n$ is a sample from a population with pdf
$$p(x|\theta) = \frac{x}{\theta^2}\exp\left(-\frac{x^2}{2\theta^2}\right), \quad x > 0, \ \theta > 0.$$
This is known as the Rayleigh distribution. It is used to model the density of time until failure for certain types of equipment. The data comes from an exponential family:
$$p(x_1, \dots, x_n|\theta) = \left(\prod_{i=1}^n \frac{x_i}{\theta^2}\right)\exp\left(-\sum_{i=1}^n \frac{x_i^2}{2\theta^2}\right) = \left(\prod_{i=1}^n x_i\right)\exp\left(-\frac{1}{2\theta^2}\sum_{i=1}^n x_i^2 - n\log\theta^2\right).$$
Here
$$\eta = -\frac{1}{2\theta^2}, \quad \theta^2 = -\frac{1}{2\eta}, \quad B(\theta) = n\log\theta^2, \quad A(\eta) = -n\log(-2\eta).$$
Therefore, the natural sufficient statistic $\sum_{i=1}^n X_i^2$ has mean $A'(\eta) = -n/\eta = 2n\theta^2$ and variance $A''(\eta) = n/\eta^2 = 4n\theta^4$.
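The Rayleigh moments can be cross-checked without the exponential family shortcut by integrating against the density directly. The sketch below is a rough numerical check, not a derivation: it uses a midpoint rule on $[0, 12\theta]$, and the truncation point and the number of steps are assumptions chosen for illustration.

```python
from math import exp

def rayleigh_pdf(x, theta):
    """Rayleigh density (x / theta^2) exp(-x^2 / (2 theta^2)) for x > 0."""
    return (x / theta**2) * exp(-(x**2) / (2 * theta**2))

def moment_of_x_squared(power, theta, upper_mult=12.0, steps=20_000):
    """Midpoint-rule approximation of E[(X^2)^power] for one Rayleigh(theta) draw."""
    width = upper_mult * theta / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += (x**2) ** power * rayleigh_pdf(x, theta)
    return total * width

def sufficient_stat_moments(n, theta):
    """Mean and variance of sum X_i^2; independence makes both additive."""
    m1 = moment_of_x_squared(1, theta)
    m2 = moment_of_x_squared(2, theta)
    return n * m1, n * (m2 - m1**2)

def moments_from_A(n, theta):
    """A'(eta) = -n/eta and A''(eta) = n/eta^2 at eta = -1/(2 theta^2)."""
    eta = -1 / (2 * theta**2)
    return -n / eta, n / eta**2

print(sufficient_stat_moments(5, 1.5))
print(moments_from_A(5, 1.5))  # 2*n*theta^2 and 4*n*theta^4
```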

Proving that a one parameter family is not an exponential family

A one parameter exponential family is a family $p(x|\theta) = h(x)\exp\{\eta(\theta)T(x) - B(\theta)\}$, $\theta \in \Theta$.

Consider a one parameter family $\{p(x|\theta), \theta \in \Theta\}$. If the support of $p(x|\theta)$ is different for different $\theta$, then the family is not an exponential family because $p(x|\theta) = 0$ if and only if $h(x) = 0$.

Suppose that the support of $p(x|\theta)$ is the same for all $\theta \in \Theta$. We can write the pdf or pmf of the family as $p(x|\theta) = h(x)\exp\{g(x, \theta)\}$. In order for this to be an exponential family, we need to be able to write $g(x, \theta) = \eta(\theta)T(x) - B(\theta)$ for some functions $\eta$, $B$ and $T$.

Suppose this representation holds. Then for any two sample points $x_1$ and $x_2$,
$$g(x_1, \theta) - g(x_2, \theta) = \eta(\theta)[T(x_1) - T(x_2)]$$
and for any four sample points $x_1, x_2, x_3, x_4$ with a nonzero denominator,
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)} = \frac{T(x_1) - T(x_2)}{T(x_3) - T(x_4)}$$
is constant as a function of $\theta$.

Thus, a necessary condition for a one-parameter exponential family is that for any four sample points $x_1, x_2, x_3, x_4$,
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)}$$
must be constant as a function of $\theta$.

Proof that the Cauchy family is not an exponential family: The Cauchy family is
$$p(x|\theta) = \frac{1}{\pi(1 + (x - \theta)^2)} = \exp\left\{\log\frac{1}{\pi(1 + (x - \theta)^2)}\right\} = \exp\{-\log\pi - \log[1 + (x - \theta)^2]\}, \quad -\infty < \theta < \infty, \ -\infty < x < \infty.$$
Thus, for the Cauchy family, $g(x, \theta) = -\log\pi - \log[1 + (x - \theta)^2]$.

For any four sample points $x_1, x_2, x_3, x_4$,
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)} = \frac{-\log[1 + (x_1 - \theta)^2] + \log[1 + (x_2 - \theta)^2]}{-\log[1 + (x_3 - \theta)^2] + \log[1 + (x_4 - \theta)^2]}.$$
This is not constant as a function of $\theta$, so the Cauchy family is not an exponential family.
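The four-point ratio can be evaluated numerically to see the contrast between the Cauchy family and a genuine exponential family. In this sketch the sample points and $\theta$ values are arbitrary choices, picked so that no denominator vanishes, and the Poisson family is included only as a comparison case.

```python
from math import log, pi

def g_cauchy(x, theta):
    """g(x, theta) for the Cauchy location family."""
    return -log(pi) - log(1 + (x - theta) ** 2)

def g_poisson(x, theta):
    """g(x, theta) for the Poisson family: p = (1/x!) exp{x log(theta) - theta}."""
    return x * log(theta) - theta

def four_point_ratio(g, points, theta):
    """[g(x1) - g(x2)] / [g(x3) - g(x4)] at a given theta."""
    x1, x2, x3, x4 = points
    return (g(x1, theta) - g(x2, theta)) / (g(x3, theta) - g(x4, theta))

def ratio_values(g, points, thetas):
    """The four-point ratio evaluated at each theta in thetas."""
    return [four_point_ratio(g, points, t) for t in thetas]

points = (0, 1, 2, 3)
thetas = (0.25, 0.75, 4.0)
print(ratio_values(g_cauchy, points, thetas))   # values differ across theta
print(ratio_values(g_poisson, points, thetas))  # values agree up to rounding
```

For the Poisson family the ratio equals $(x_1 - x_2)/(x_3 - x_4)$ at every $\theta$, while the Cauchy ratio changes with $\theta$, so the necessary condition fails.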

III. Multiparameter exponential families

One-parameter exponential families have a natural one-dimensional sufficient statistic regardless of the sample size. A k-parameter exponential family has a k-dimensional sufficient statistic regardless of the sample size.

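The family of distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to be a k-parameter exponential family if there exist real-valued functions $\eta_1, \dots, \eta_k, B$ of $\theta$ and real-valued functions $T_1, \dots, T_k, h$ of $x$ such that the pdf or pmf may be written as
$$p(x|\theta) = h(x)\exp\left\{\sum_{j=1}^k \eta_j(\theta)T_j(x) - B(\theta)\right\}.$$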

By the factorization theorem, $T(X) = (T_1(X), \dots, T_k(X))$ is a sufficient statistic.

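Example 1: Suppose $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$. Then
$$p(x|\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^n x_i - \frac{n\mu^2}{2\sigma^2}\right\},$$
which corresponds to a two-parameter exponential family with $T(X) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$.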
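The expansion above says the normal likelihood depends on the data only through $\left(\sum x_i, \sum x_i^2\right)$ and $n$. A minimal sketch of that claim, with hypothetical function names and arbitrary data values, is:

```python
from math import log, pi

def normal_log_lik(xs, mu, sigma2):
    """Log-likelihood of iid N(mu, sigma2) data, computed term by term."""
    return sum(-0.5 * log(2 * pi * sigma2) - (x - mu) ** 2 / (2 * sigma2) for x in xs)

def normal_log_lik_from_T(t1, t2, n, mu, sigma2):
    """Same log-likelihood using only T = (sum x_i, sum x_i^2) and n."""
    return (
        -0.5 * n * log(2 * pi * sigma2)
        - t2 / (2 * sigma2)
        + mu * t1 / sigma2
        - n * mu**2 / (2 * sigma2)
    )

xs = [0.5, -1.2, 2.0, 0.3]
t1 = sum(xs)
t2 = sum(x * x for x in xs)
print(normal_log_lik(xs, 0.4, 1.7))
print(normal_log_lik_from_T(t1, t2, len(xs), 0.4, 1.7))
```

Any two samples of the same size with the same `t1` and `t2` give the same likelihood for every $(\mu, \sigma^2)$, which is what sufficiency of $T$ means here.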

Example 2: Multinomial. Suppose we observe $n$ independent trials where each trial can end up in one of $k$ possible categories $\{1, \dots, k\}$ with probabilities $\theta = \{p_1, \dots, p_{k-1}, p_k = 1 - p_1 - \cdots - p_{k-1}\}$. Let $y_1(x), \dots, y_k(x)$ be the number of outcomes in categories $1, \dots, k$ in the $n$ trials. Then,
$$p(x|\theta) = \frac{n!}{y_1(x)! \cdots y_k(x)!}\,p_1^{y_1(x)} \cdots p_k^{y_k(x)} = \frac{n!}{y_1(x)! \cdots y_k(x)!}\left(\frac{p_1}{p_k}\right)^{y_1(x)} \cdots \left(\frac{p_{k-1}}{p_k}\right)^{y_{k-1}(x)} p_k^n$$
$$= \frac{n!}{y_1(x)! \cdots y_k(x)!}\exp[y_1(x)\log(p_1/p_k) + \cdots + y_{k-1}(x)\log(p_{k-1}/p_k) + n\log p_k]$$
$$= \frac{n!}{y_1(x)! \cdots y_k(x)!}\exp\left[y_1(x)\log(p_1/p_k) + \cdots + y_{k-1}(x)\log(p_{k-1}/p_k) - n\log\left(1 + \sum_{i=1}^{k-1}\exp\left(\log\frac{p_i}{p_k}\right)\right)\right].$$
The last step uses $1 + \sum_{i=1}^{k-1} p_i/p_k = 1/p_k$. The multinomial is a $(k-1)$-parameter exponential family with $\eta = (\log(p_1/p_k), \dots, \log(p_{k-1}/p_k))$, $T(x) = (y_1(x), \dots, y_{k-1}(x))$ and $A(\eta) = n\log\left(1 + \sum_{i=1}^{k-1}\exp(\eta_i)\right)$.

Moments of Sufficient Statistics: As with the one-parameter exponential family, it is convenient to index the family by $\eta = (\eta_1, \dots, \eta_k)$. The analogue of Theorem 1.6.2 that calculates the moments of the sufficient statistics is Corollary 1.6.1:

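$$E_{\eta_0}T(X) = \left(\frac{\partial A}{\partial\eta_1}(\eta_0), \dots, \frac{\partial A}{\partial\eta_k}(\eta_0)\right)^T,$$
$$\mathrm{Var}_{\eta_0}T(X) = \left(\frac{\partial^2 A}{\partial\eta_a\,\partial\eta_b}(\eta_0)\right)_{a,b = 1, \dots, k}.$$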

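Example 2 continued: For the multinomial distribution,
$$E_\eta[y_j(x)] = \frac{\partial}{\partial\eta_j}\,n\log\left(1 + \sum_{l=1}^{k-1}e^{\eta_l}\right) = \frac{n e^{\eta_j}}{1 + \sum_{l=1}^{k-1}e^{\eta_l}} = \frac{n\,p_j/p_k}{1 + \sum_{l=1}^{k-1}p_l/p_k} = np_j,$$
$$\mathrm{Cov}_\eta[y_i(x), y_j(x)] = \frac{\partial^2}{\partial\eta_i\,\partial\eta_j}\,n\log\left(1 + \sum_{l=1}^{k-1}e^{\eta_l}\right) = \frac{-n e^{\eta_i}e^{\eta_j}}{\left(1 + \sum_{l=1}^{k-1}e^{\eta_l}\right)^2} = -np_ip_j, \quad i \neq j,$$
$$\mathrm{Var}_\eta[y_i(x)] = \frac{\partial^2}{\partial\eta_i^2}\,n\log\left(1 + \sum_{l=1}^{k-1}e^{\eta_l}\right) = np_i(1 - p_i).$$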
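The multinomial moment formulas can be checked by differentiating $A(\eta)$ numerically. The sketch below is an illustration under assumed values ($n = 10$ and $p = (0.2, 0.3, 0.5)$); the finite-difference step sizes and all function names are hypothetical choices.

```python
from math import exp, log

def log_partition(eta, n):
    """A(eta) = n log(1 + sum_i exp(eta_i)) for n multinomial trials."""
    return n * log(1 + sum(exp(e) for e in eta))

def natural_params(p):
    """eta_i = log(p_i / p_k), i = 1..k-1, where p lists all k cell probabilities."""
    return [log(p_i / p[-1]) for p_i in p[:-1]]

def shifted(eta, a, da, b, db):
    """Copy of eta with da added to coordinate a and db added to coordinate b."""
    out = list(eta)
    out[a] += da
    out[b] += db
    return out

def gradient(eta, n, step=1e-5):
    """Central-difference approximation to the gradient of A."""
    return [
        (log_partition(shifted(eta, j, step, j, 0.0), n)
         - log_partition(shifted(eta, j, -step, j, 0.0), n)) / (2 * step)
        for j in range(len(eta))
    ]

def hessian(eta, n, step=1e-4):
    """Central-difference approximation to the matrix of second partials of A."""
    k = len(eta)
    H = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            H[a][b] = (
                log_partition(shifted(eta, a, step, b, step), n)
                - log_partition(shifted(eta, a, step, b, -step), n)
                - log_partition(shifted(eta, a, -step, b, step), n)
                + log_partition(shifted(eta, a, -step, b, -step), n)
            ) / (4 * step**2)
    return H

n, p = 10, [0.2, 0.3, 0.5]
eta = natural_params(p)
print(gradient(eta, n))  # close to [n*p_1, n*p_2]
print(hessian(eta, n))   # diagonal near n*p_i*(1 - p_i), off-diagonal near -n*p_1*p_2
```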

Curved Exponential Families: A curved exponential family is a family
$$p(x|\theta) = h(x)\exp\left\{\sum_{j=1}^k \eta_j(\theta)T_j(x) - B(\theta)\right\}$$
for which $\dim(\theta) < k$.

An exponential family for which $\dim(\theta) = k$ is a full exponential family.

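Example of a curved exponential family: $X_1, \dots, X_n \sim N(\theta, \theta^2)$.
$$p(x|\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\theta}\exp\left\{-\frac{(x_i - \theta)^2}{2\theta^2}\right\} = \left(\frac{1}{\sqrt{2\pi}\,\theta}\right)^n \exp\left\{-\frac{1}{2\theta^2}\sum_{i=1}^n x_i^2 + \frac{\theta}{\theta^2}\sum_{i=1}^n x_i - \frac{n\theta^2}{2\theta^2}\right\}$$
$$= \left(\frac{1}{\sqrt{2\pi}\,\theta}\right)^n \exp\left\{-\frac{1}{2\theta^2}\sum_{i=1}^n x_i^2 + \frac{1}{\theta}\sum_{i=1}^n x_i - \frac{n}{2}\right\}.$$
This is an exponential family with $\eta = (-1/(2\theta^2), 1/\theta)$. The parameter space is a curve: in the $(\eta_1, \eta_2)$ plane it is the set of points satisfying $\eta_1 = -\eta_2^2/2$.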
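The curve traced out by $\eta(\theta) = (-1/(2\theta^2), 1/\theta)$ can be tabulated directly. The sketch below is illustrative only: the $\theta$ values are arbitrary, and the check uses the relation $\eta_1 = -\eta_2^2/2$, which follows by substituting $\theta = 1/\eta_2$ into the first coordinate.

```python
def eta_of_theta(theta):
    """Natural parameter (eta_1, eta_2) of the N(theta, theta^2) family."""
    return (-1 / (2 * theta**2), 1 / theta)

def on_curve(eta, tol=1e-12):
    """True if eta lies on the parabola eta_1 = -eta_2^2 / 2."""
    eta1, eta2 = eta
    return abs(eta1 + eta2**2 / 2) < tol

for theta in (0.5, 1.0, 2.0, 3.0):
    eta = eta_of_theta(theta)
    print(theta, eta, on_curve(eta))

# A generic point of the two-dimensional natural parameter space is off the curve.
print(on_curve((-1.0, 1.0)))  # False
```

The one-dimensional parameter $\theta$ reaches only a parabola inside the two-dimensional natural parameter space, which is why the family is curved rather than full.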
