Mathematical Statistics (NYU, Spring 2003)
Summary (answers to his potential exam questions)
By Rebecca Sela

1 Sufficient statistic theorem (1)

Let $X_1, \dots, X_n$ be a sample from the distribution $f(x, \theta)$. Let $T(X_1, \dots, X_n)$ be a sufficient statistic for $\theta$ with continuous factor function $F(T(X_1, \dots, X_n), \theta)$, so that $\prod_{i=1}^n f(x_i, \theta) = F(T(x), \theta)\, h(x)$. Then,
\begin{align*}
P(X \in A \mid T(X) = t) &= \lim_{h \to 0} P(X \in A \mid |T(X) - t| \le h) \\
&= \lim_{h \to 0} \frac{P(X \in A, |T(X) - t| \le h)/h}{P(|T(X) - t| \le h)/h} \\
&= \frac{\frac{d}{dt} P(X \in A, T(X) \le t)}{\frac{d}{dt} P(T(X) \le t)}.
\end{align*}
Consider first the numerator:
\begin{align*}
\frac{d}{dt} P(X \in A, T(X) \le t) &= \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} f(x_1, \theta) \cdots f(x_n, \theta)\, dx_1 \cdots dx_n \\
&= \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} F(T(x), \theta)\, h(x)\, dx_1 \cdots dx_n \\
&= \lim_{h \to 0} \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} F(T(x), \theta)\, h(x)\, dx_1 \cdots dx_n.
\end{align*}
Since $\min_{s \in [t, t+h]} F(s, \theta) \le F(T(x), \theta) \le \max_{s \in [t, t+h]} F(s, \theta)$ on the interval $[t, t+h]$, we find:
$$\Big(\min_{s \in [t, t+h]} F(s, \theta)\Big) \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} h(x)\, dx \le \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} F(T(x), \theta)\, h(x)\, dx \le \Big(\max_{s \in [t, t+h]} F(s, \theta)\Big) \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} h(x)\, dx.$$
By the continuity of $F(\cdot, \theta)$, $\lim_{h \to 0} \min_{s \in [t, t+h]} F(s, \theta) = \lim_{h \to 0} \max_{s \in [t, t+h]} F(s, \theta) = F(t, \theta)$. Thus,
$$\lim_{h \to 0} \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} F(T(x), \theta)\, h(x)\, dx_1 \cdots dx_n = F(t, \theta) \lim_{h \to 0} \frac{1}{h} \int_{A \cap \{x : |T(x) - t| \le h\}} h(x)\, dx = F(t, \theta)\, \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx.$$
If we let $A$ be all of $\mathbb{R}^n$, then we have the case of the denominator. Thus, we find:
$$P(X \in A \mid T(X) = t) = \frac{F(t, \theta)\, \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx}{F(t, \theta)\, \frac{d}{dt} \int_{\{x : T(x) \le t\}} h(x)\, dx} = \frac{\frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx}{\frac{d}{dt} \int_{\{x : T(x) \le t\}} h(x)\, dx},$$
which is not a function of $\theta$. Thus, $P(X \in A \mid T(X) = t)$ does not depend on $\theta$ when $T(X)$ is a sufficient statistic.

2 Examples of sufficient statistics (2)

2.1 Uniform

Suppose $f(x, \theta) = \frac{1}{\theta} I_{(0,\theta)}(x)$. Then,
$$\prod_{i=1}^n f(x_i, \theta) = \frac{1}{\theta^n} \prod_{i=1}^n I_{(0,\theta)}(x_i) = \frac{1}{\theta^n} I_{(-\infty,\theta)}(\max x_i)\, I_{(0,\infty)}(\min x_i).$$
Let $F(\max x_i, \theta) = \frac{1}{\theta^n} I_{(-\infty,\theta)}(\max x_i)$ and $h(x_1, \dots, x_n) = I_{(0,\infty)}(\min x_i)$. This is a factorization of $\prod f(x_i, \theta)$, so $\max X_i$ is a sufficient statistic for the uniform distribution.

2.2 Binomial

Suppose $f(x, \theta) = \theta^x (1 - \theta)^{1-x}$, $x = 0, 1$. Then, $\prod f(x_i, \theta) = \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}$. Let $T(x_1, \dots, x_n) = \sum x_i$, $F(t, \theta) = \theta^t (1 - \theta)^{n-t}$, and $h(x_1, \dots, x_n) = 1$. This is a factorization of $\prod f(x_i, \theta)$, which shows that $T(x_1, \dots, x_n) = \sum X_i$ is a sufficient statistic.

2.3 Normal

Suppose $f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$. Then,
$$\prod f(x_i, \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2} \sum (x_i - \mu)^2} = (2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2} \sum (x_i - \bar{x})^2} e^{-\frac{n}{2\sigma^2} (\bar{x} - \mu)^2}$$
since
\begin{align*}
\sum (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 &= \sum (x_i^2 - 2 x_i \bar{x} + \bar{x}^2) + n(\bar{x}^2 - 2\mu\bar{x} + \mu^2) \\
&= \sum x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 + n\bar{x}^2 - 2n\mu\bar{x} + n\mu^2 \\
&= \sum x_i^2 - 2\mu \sum x_i + n\mu^2 \\
&= \sum (x_i^2 - 2\mu x_i + \mu^2) \\
&= \sum (x_i - \mu)^2.
\end{align*}
Case 1: $\sigma^2$ unknown, $\mu$ known. Let $T(x_1, \dots, x_n) = \sum (x_i - \mu)^2$, $F(t, \sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\frac{t}{2\sigma^2}}$, and $h(x_1, \dots, x_n) = 1$. This is a factorization of $\prod f(x_i, \sigma^2)$.

Case 2: $\sigma^2$ known, $\mu$ unknown. Let $T(x_1, \dots, x_n) = \bar{x}$, $F(t, \mu) = e^{-\frac{n}{2\sigma^2}(t - \mu)^2}$, and $h(x_1, \dots, x_n) = (2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2} \sum (x_i - \bar{x})^2}$. This is a factorization of $\prod f(x_i, \mu)$.

Case 3: $\mu$ unknown, $\sigma^2$ unknown. Let $T_1(x_1, \dots, x_n) = \bar{x}$, $T_2(x_1, \dots, x_n) = \sum (x_i - \bar{x})^2$, $F(t_1, t_2, \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\frac{t_2}{2\sigma^2}} e^{-\frac{n}{2\sigma^2}(t_1 - \mu)^2}$, and $h(x_1, \dots, x_n) = 1$. This is a factorization.
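The factorization arguments above are easy to sanity-check numerically. The following short Python sketch (an illustration added here, not part of the original notes) enumerates every Bernoulli sample of size $n$ with a fixed sum $t$ and confirms that the conditional probability of each arrangement given $\sum x_i = t$ equals $1/\binom{n}{t}$ for different values of $\theta$; the conditional distribution is free of $\theta$, exactly as the sufficient statistic theorem predicts.

```python
from itertools import product
from math import comb, isclose

# For X_1, ..., X_n iid Bernoulli(theta), the conditional distribution of the
# sample given sum(X_i) = t should be uniform over the C(n, t) arrangements,
# no matter what theta is -- that is what sufficiency of sum(X_i) means.
n, t = 5, 2
for theta in (0.3, 0.7):
    # joint probability theta^t (1 - theta)^(n - t) of each sample with sum t
    joint = {x: theta**t * (1 - theta)**(n - t)
             for x in product((0, 1), repeat=n) if sum(x) == t}
    marginal = sum(joint.values())          # P(sum(X_i) = t)
    conditional = [p / marginal for p in joint.values()]
    # every conditional probability equals 1 / C(n, t), independent of theta
    assert all(isclose(p, 1 / comb(n, t)) for p in conditional)
    print(f"theta = {theta}: P(X = x | T = {t}) = {conditional[0]:.4f}")
```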
3 Rao-Blackwell Theorem (3)

Let $X_1, \dots, X_n$ be a sample from the distribution $f(x, \theta)$. Let $Y = Y(X_1, \dots, X_n)$ be an unbiased estimator of $\theta$. Let $T = T(X_1, \dots, X_n)$ be a sufficient statistic for $\theta$. Let $\varphi(t) = E(Y \mid T = t)$.

Lemma 1. $E(E(g(Y) \mid T)) = E(g(Y))$, for all functions $g$.

Proof.
\begin{align*}
E(E(g(Y) \mid T)) &= \int_{-\infty}^{\infty} E(g(Y) \mid T = t)\, f(t)\, dt \\
&= \int_{-\infty}^{\infty} \Big( \int_{-\infty}^{\infty} g(y)\, f(y \mid t)\, dy \Big) f(t)\, dt \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y)\, f(y, t)\, dy\, dt \\
&= \int_{-\infty}^{\infty} g(y) \Big( \int_{-\infty}^{\infty} f(y, t)\, dt \Big) dy \\
&= \int_{-\infty}^{\infty} g(y)\, f(y)\, dy \\
&= E(g(Y)).
\end{align*}

Step 1: $\varphi(t)$ does not depend on $\theta$.
$$\varphi(t) = \int_{\mathbb{R}^n} y(x_1, \dots, x_n)\, f(x_1, \dots, x_n \mid T(x_1, \dots, x_n) = t)\, dx_1 \cdots dx_n.$$
Since $T(x_1, \dots, x_n)$ is a sufficient statistic, $f(x_1, \dots, x_n \mid T(x_1, \dots, x_n) = t)$ does not depend on $\theta$. Since $y(x_1, \dots, x_n)$ is an estimator, it is not a function of $\theta$. Thus, the integral of their product over $\mathbb{R}^n$ does not depend on $\theta$.

Step 2: $\varphi(T)$ is unbiased.
$$E(\varphi(T)) = E(E(Y \mid T)) = E(Y) = \theta$$
by the lemma above.

Step 3: $\mathrm{Var}(\varphi(T)) \le \mathrm{Var}(Y)$.
\begin{align*}
\mathrm{Var}(\varphi(T)) &= E(E(Y \mid T)^2) - E(E(Y \mid T))^2 \\
&\le E(E(Y^2 \mid T)) - E(Y)^2 \\
&= E(Y^2) - E(Y)^2 \\
&= \mathrm{Var}(Y),
\end{align*}
where the inequality holds because $E(Y \mid T)^2 \le E(Y^2 \mid T)$ (Jensen's inequality applied conditionally). Thus, conditioning an unbiased estimator on the sufficient statistic gives a new unbiased estimator with variance at most that of the old estimator.

4 Some properties of the derivative of the log (4)

Let $X$ have the distribution function $f(x, \theta_0)$. Let $Y = \frac{\partial}{\partial\theta} \log f(X, \theta)\big|_{\theta=\theta_0}$. Notice that, by the chain rule, $\frac{\partial}{\partial\theta} \log f(x, \theta) = \frac{1}{f(x, \theta)} \frac{\partial}{\partial\theta} f(x, \theta)$. Using this fact, we find:
\begin{align*}
E(Y) &= E\Big( \frac{\partial}{\partial\theta} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big) \\
&= \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} \log f(x, \theta)\Big|_{\theta=\theta_0} f(x, \theta_0)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{f(x, \theta_0)} \Big( \frac{\partial}{\partial\theta} f(x, \theta)\Big|_{\theta=\theta_0} \Big) f(x, \theta_0)\, dx \\
&= \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} f(x, \theta)\Big|_{\theta=\theta_0}\, dx \\
&= \frac{\partial}{\partial\theta} \Big( \int_{-\infty}^{\infty} f(x, \theta)\, dx \Big)\Big|_{\theta=\theta_0} \\
&= \frac{\partial}{\partial\theta} (1)\Big|_{\theta=\theta_0} \\
&= 0.
\end{align*}
For the second derivative,
\begin{align*}
\frac{\partial^2}{\partial\theta^2} \log f(x, \theta) &= \frac{\partial}{\partial\theta} \Big( \frac{1}{f(x, \theta)} \frac{\partial}{\partial\theta} f(x, \theta) \Big) \\
&= \frac{1}{f(x, \theta)^2} \Big( f(x, \theta) \frac{\partial^2}{\partial\theta^2} f(x, \theta) - \Big( \frac{\partial}{\partial\theta} f(x, \theta) \Big)^2 \Big) \\
&= \frac{1}{f(x, \theta)} \frac{\partial^2}{\partial\theta^2} f(x, \theta) - \frac{1}{f(x, \theta)^2} \Big( \frac{\partial}{\partial\theta} f(x, \theta) \Big)^2 \\
&= \frac{1}{f(x, \theta)} \frac{\partial^2}{\partial\theta^2} f(x, \theta) - \Big( \frac{\partial}{\partial\theta} \log f(x, \theta) \Big)^2.
\end{align*}
Taking expectations at $\theta = \theta_0$:
\begin{align*}
E\Big( \frac{\partial^2}{\partial\theta^2} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big) &= \int_{-\infty}^{\infty} \frac{\partial^2}{\partial\theta^2} f(x, \theta)\Big|_{\theta=\theta_0}\, dx - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= \frac{\partial^2}{\partial\theta^2} \Big( \int_{-\infty}^{\infty} f(x, \theta)\, dx \Big)\Big|_{\theta=\theta_0} - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= \frac{\partial^2}{\partial\theta^2} (1)\Big|_{\theta=\theta_0} - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= -E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta)\Big|_{\theta=\theta_0} \Big)^2 \Big).
\end{align*}
Thus, the expected value of $Y$ is zero, and the variance of $Y$ is $-E\big( \frac{\partial^2}{\partial\theta^2} \log f(X, \theta)\big|_{\theta=\theta_0} \big)$, which is defined as the information function, $I(\theta)$.

5 The Cramer-Rao lower bound (5)

Let $T$ be an unbiased estimator based on a sample $X$ from the distribution $f(x, \theta)$. Then, $E(T) = \theta$. We take the derivative of this equation to find:
\begin{align*}
1 &= \frac{\partial}{\partial\theta} E(T) \\
&= \frac{\partial}{\partial\theta} \int_{\mathbb{R}^n} T(x)\, f(x, \theta)\, dx \\
&= \int_{\mathbb{R}^n} T(x)\, \frac{\partial}{\partial\theta} f(x, \theta)\, dx \\
&= \int_{\mathbb{R}^n} T(x) \Big( \frac{\partial}{\partial\theta} \log f(x, \theta) \Big) f(x, \theta)\, dx \\
&= E\Big( T(X)\, \frac{\partial}{\partial\theta} \log f(X, \theta) \Big) \\
&= E\Big( T(X)\, \frac{\partial}{\partial\theta} \log f(X, \theta) \Big) - c\, E\Big( \frac{\partial}{\partial\theta} \log f(X, \theta) \Big) \\
&= E\Big( (T(X) - c)\, \frac{\partial}{\partial\theta} \log f(X, \theta) \Big) \\
&= E\Big( (T(X) - \theta)\, \frac{\partial}{\partial\theta} \log f(X, \theta) \Big),
\end{align*}
where subtracting $c\, E(\frac{\partial}{\partial\theta} \log f(X, \theta)) = 0$ changes nothing for any constant $c$, and we take $c = \theta$. By the Cauchy-Schwarz inequality, $E(AB)^2 \le E(A^2) E(B^2)$. Squaring both sides of the equation above and applying this, we find:
\begin{align*}
1 &= E\Big( (T(X) - \theta)\, \frac{\partial}{\partial\theta} \log f(X, \theta) \Big)^2 \\
&\le E\big( (T(X) - \theta)^2 \big)\, E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta) \Big)^2 \Big) \\
&= \mathrm{Var}(T)\, E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X, \theta) \Big)^2 \Big).
\end{align*}
Since the sample is independent and identically distributed,
$$\Big( \frac{\partial}{\partial\theta} \log f(X, \theta) \Big)^2 = \Big( \frac{\partial}{\partial\theta} \log \prod_{i=1}^n f(x_i, \theta) \Big)^2 = \Big( \frac{\partial}{\partial\theta} \sum_{i=1}^n \log f(x_i, \theta) \Big)^2 = \Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big)^2$$
and, since each term has mean zero (Section 4),
\begin{align*}
E\Big( \Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big)^2 \Big) &= \mathrm{Var}\Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big) \\
&= \sum_{i=1}^n \mathrm{Var}\Big( \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big) \\
&= n \cdot \mathrm{Var}\Big( \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big) \\
&= n \cdot E\Big( \Big( \frac{\partial}{\partial\theta} \log f(x_i, \theta) \Big)^2 \Big) \\
&= n I(\theta).
\end{align*}
Thus, $1 \le \mathrm{Var}(T)(n I(\theta))$, and the variance of an unbiased estimator is at least $\frac{1}{n I(\theta)}$.
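To see the Rao-Blackwell theorem and the Cramer-Rao bound at work in one concrete case, take again a Bernoulli($\theta$) sample: $Y = X_1$ is unbiased but crude, and conditioning on the sufficient statistic $T = \sum X_i$ gives $\varphi(T) = E(X_1 \mid T) = T/n = \bar{x}$ by symmetry. Since $I(\theta) = 1/(\theta(1-\theta))$ for the Bernoulli family, $\mathrm{Var}(\bar{x}) = \theta(1-\theta)/n$ attains the bound $1/(nI(\theta))$ exactly. A minimal Monte Carlo sketch (an illustration added here, not part of the original notes; assumes NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 20, 200_000

x = rng.binomial(1, theta, size=(reps, n))   # reps Bernoulli samples of size n
y = x[:, 0].astype(float)                    # Y = X_1: unbiased but crude
phi = x.mean(axis=1)                         # phi(T) = E(X_1 | sum X_i) = x-bar

crlb = theta * (1 - theta) / n               # 1/(n I(theta)), I = 1/(theta(1-theta))
print(f"mean(Y) ~ {y.mean():.4f}, mean(phi) ~ {phi.mean():.4f}  (both should be ~{theta})")
print(f"Var(Y)      ~ {y.var():.5f}  (theory {theta * (1 - theta):.5f})")
print(f"Var(phi(T)) ~ {phi.var():.5f}  (theory and CRLB {crlb:.5f})")
# Rao-Blackwellization cuts the variance from theta(1-theta) down to
# theta(1-theta)/n, which attains the Cramer-Rao lower bound exactly.
```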

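The information identity from Section 4, $E\big((\frac{\partial}{\partial\theta}\log f)^2\big) = -E\big(\frac{\partial^2}{\partial\theta^2}\log f\big)$, can be checked by simulation in the same spirit. A sketch for the $N(\theta, \sigma^2)$ family (again an added illustration, not from the notes): here $\log f = -\frac{(x-\theta)^2}{2\sigma^2} + \text{const}$, so the score is $(x-\theta)/\sigma^2$, the second derivative is the constant $-1/\sigma^2$, and $I(\theta) = 1/\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, sigma = 2.0, 1.5
x = rng.normal(theta0, sigma, size=500_000)

score = (x - theta0) / sigma**2     # (d/dtheta) log f(x, theta) at theta = theta0
print(f"E[score]   ~ {score.mean():+.4f}  (theory 0)")
print(f"E[score^2] ~ {(score**2).mean():.4f}  (theory 1/sigma^2 = {1 / sigma**2:.4f})")
# The second derivative of log f is identically -1/sigma^2 here, so
# -E[(d^2/dtheta^2) log f] = 1/sigma^2 agrees with E[score^2]: both equal I(theta).
```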