Mathematical Statistics (NYU, Spring 2003)
Summary (answers to his potential exam questions)
By Rebecca Sela

1 Sufficient statistic theorem (1)

Let $X_1, \dots, X_n$ be a sample from the distribution $f(x,\theta)$. Let $T(X_1,\dots,X_n)$ be a sufficient statistic for $\theta$ with continuous factor function $F(T(X_1,\dots,X_n),\theta)$, so that $\prod_{i=1}^n f(x_i,\theta) = F(T(x),\theta)\,h(x)$. Then, writing $\delta$ for the width of the conditioning event (to avoid a clash with the factor function $h$),
\[
\begin{aligned}
P(X \in A \mid T(X) = t)
&= \lim_{\delta \to 0} P(X \in A \mid |T(X) - t| \le \delta) \\
&= \lim_{\delta \to 0} \frac{P(X \in A,\ |T(X) - t| \le \delta)/\delta}{P(|T(X) - t| \le \delta)/\delta} \\
&= \frac{\frac{d}{dt} P(X \in A,\ T(X) \le t)}{\frac{d}{dt} P(T(X) \le t)}.
\end{aligned}
\]
Consider first the numerator:
\[
\begin{aligned}
\frac{d}{dt} P(X \in A,\ T(X) \le t)
&= \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} f(x_1,\theta)\cdots f(x_n,\theta)\, dx_1 \cdots dx_n \\
&= \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} F(T(x),\theta)\, h(x)\, dx_1 \cdots dx_n \\
&= \lim_{\delta \to 0} \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} F(T(x),\theta)\, h(x)\, dx_1 \cdots dx_n.
\end{aligned}
\]
Since $\min_{s \in [t,t+\delta]} F(s,\theta) \le F(T(x),\theta) \le \max_{s \in [t,t+\delta]} F(s,\theta)$ whenever $t \le T(x) \le t+\delta$, we find:
\[
\begin{aligned}
\lim_{\delta \to 0} \Big(\min_{s \in [t,t+\delta]} F(s,\theta)\Big) \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} h(x)\, dx
&\le \lim_{\delta \to 0} \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} F(T(x),\theta)\, h(x)\, dx \\
&\le \lim_{\delta \to 0} \Big(\max_{s \in [t,t+\delta]} F(s,\theta)\Big) \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} h(x)\, dx.
\end{aligned}
\]
By the continuity of $F(\cdot,\theta)$, $\lim_{\delta \to 0} \min_{s \in [t,t+\delta]} F(s,\theta) = \lim_{\delta \to 0} \max_{s \in [t,t+\delta]} F(s,\theta) = F(t,\theta)$. Thus,
\[
\begin{aligned}
\lim_{\delta \to 0} \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} F(T(x),\theta)\, h(x)\, dx_1 \cdots dx_n
&= F(t,\theta) \lim_{\delta \to 0} \frac{1}{\delta} \int_{A \cap \{x : t \le T(x) \le t+\delta\}} h(x)\, dx_1 \cdots dx_n \\
&= F(t,\theta)\, \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx.
\end{aligned}
\]
If we let $A$ be all of $\mathbb{R}^n$, we obtain the denominator. Thus, we find:
\[
P(X \in A \mid T(X) = t)
= \frac{F(t,\theta)\, \frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx}{F(t,\theta)\, \frac{d}{dt} \int_{\{x : T(x) \le t\}} h(x)\, dx}
= \frac{\frac{d}{dt} \int_{A \cap \{x : T(x) \le t\}} h(x)\, dx}{\frac{d}{dt} \int_{\{x : T(x) \le t\}} h(x)\, dx},
\]
which is not a function of $\theta$. Thus, $P(X \in A \mid T(X) = t)$ does not depend on $\theta$ when $T(X)$ is a sufficient statistic.

2 Examples of sufficient statistics (2)

2.1 Uniform

Suppose $f(x,\theta) = \frac{1}{\theta} I_{(0,\theta)}(x)$. Then,
\[
\prod_{i=1}^n f(x_i,\theta) = \frac{1}{\theta^n} \prod_{i=1}^n I_{(0,\theta)}(x_i)
= \frac{1}{\theta^n}\, I_{(-\infty,\theta)}(\max x_i)\, I_{(0,\infty)}(\min x_i).
\]
Let $F(\max x_i,\theta) = \frac{1}{\theta^n} I_{(-\infty,\theta)}(\max x_i)$ and $h(x_1,\dots,x_n) = I_{(0,\infty)}(\min x_i)$. This is a factorization of $\prod_i f(x_i,\theta)$, so $\max X_i$ is a sufficient statistic for the uniform distribution.

2.2 Binomial

Suppose $f(x,\theta) = \theta^x (1-\theta)^{1-x}$, $x = 0, 1$ (a Bernoulli trial). Then,
\[
\prod_{i=1}^n f(x_i,\theta) = \theta^{\sum x_i} (1-\theta)^{\,n - \sum x_i}.
\]
Let $T(x_1,\dots,x_n) = \sum x_i$, $F(t,\theta) = \theta^t (1-\theta)^{n-t}$, and $h(x_1,\dots,x_n) = 1$. This is a factorization of $\prod_i f(x_i,\theta)$, which shows that $T(X_1,\dots,X_n) = \sum X_i$ is a sufficient statistic.

2.3 Normal

Suppose $f(x,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$. Then,
\[
\prod_{i=1}^n f(x_i,\mu,\sigma^2)
= (2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2} \sum (x_i-\mu)^2}
= (2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2} \sum (x_i-\bar x)^2}\, e^{-\frac{n}{2\sigma^2} (\bar x-\mu)^2},
\]
since
\[
\begin{aligned}
\sum (x_i-\bar x)^2 + n(\bar x-\mu)^2
&= \sum (x_i^2 - 2 x_i \bar x + \bar x^2) + n(\bar x^2 - 2\mu\bar x + \mu^2) \\
&= \sum x_i^2 - 2\bar x (n\bar x) + n\bar x^2 + n\bar x^2 - 2n\mu\bar x + n\mu^2 \\
&= \sum x_i^2 - 2\mu \sum x_i + n\mu^2 \\
&= \sum (x_i^2 - 2\mu x_i + \mu^2) \\
&= \sum (x_i-\mu)^2.
\end{aligned}
\]
Case 1: $\sigma^2$ unknown, $\mu$ known. Let $T(x_1,\dots,x_n) = \sum (x_i-\mu)^2$, $F(t,\sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\frac{t}{2\sigma^2}}$, and $h(x_1,\dots,x_n) = 1$. This is a factorization of $\prod_i f(x_i,\sigma^2)$.

Case 2: $\sigma^2$ known, $\mu$ unknown. Let $T(x_1,\dots,x_n) = \bar x$, $F(t,\mu) = e^{-\frac{n}{2\sigma^2}(t-\mu)^2}$, and $h(x_1,\dots,x_n) = (2\pi\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2}\sum (x_i-\bar x)^2}$. This is a factorization of $\prod_i f(x_i,\mu)$.

Case 3: $\mu$ unknown, $\sigma^2$ unknown. Let $T_1(x_1,\dots,x_n) = \bar x$, $T_2(x_1,\dots,x_n) = \sum (x_i-\bar x)^2$, $F(t_1,t_2,\mu,\sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\frac{t_2}{2\sigma^2}} e^{-\frac{n}{2\sigma^2}(t_1-\mu)^2}$, and $h(x_1,\dots,x_n) = 1$. This is a factorization, so $(\bar X, \sum (X_i-\bar X)^2)$ is jointly sufficient.
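The factorization arguments above can be checked against the defining property from Section 1: conditional on the value of the sufficient statistic, the distribution of the sample no longer involves $\theta$. Below is a minimal Python simulation sketch (not part of the original notes; the sample size, the conditioning value $t$, and the two values of $\theta$ are arbitrary illustrative choices) for the Bernoulli example of Section 2.2, where sufficiency of $T = \sum X_i$ implies $P(X_1 = 1 \mid T = t) = t/n$ for every $\theta$.

```python
import numpy as np

# Illustrative check (assumed setup, not from the notes): for Bernoulli(theta) data,
# T = sum(X_i) is sufficient, so the conditional law of the sample given T = t should
# not depend on theta.  In particular, P(X_1 = 1 | T = t) = t/n for every theta.
rng = np.random.default_rng(0)
n, t, reps = 5, 2, 200_000

for theta in (0.3, 0.7):
    samples = rng.binomial(1, theta, size=(reps, n))   # each row is a Bernoulli(theta) sample
    conditioned = samples[samples.sum(axis=1) == t]    # keep only samples with T(X) = t
    print(f"theta={theta}: P(X1=1 | T={t}) ~ {conditioned[:, 0].mean():.3f} "
          f"(theory: {t}/{n} = {t/n:.3f})")
```

Both printed frequencies should be close to $t/n = 0.4$, regardless of whether the data were generated with $\theta = 0.3$ or $\theta = 0.7$.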
3 Rao-Blackwell theorem (3)

Let $X_1, \dots, X_n$ be a sample from the distribution $f(x,\theta)$. Let $Y = Y(X_1,\dots,X_n)$ be an unbiased estimator of $\theta$. Let $T = T(X_1,\dots,X_n)$ be a sufficient statistic for $\theta$. Let $\varphi(t) = E(Y \mid T = t)$.

Lemma 1. $E(E(g(Y) \mid T)) = E(g(Y))$, for all functions $g$.

Proof.
\[
\begin{aligned}
E(E(g(Y) \mid T))
&= \int_{-\infty}^{\infty} E(g(Y) \mid T = t)\, f(t)\, dt \\
&= \int_{-\infty}^{\infty} \Big( \int_{-\infty}^{\infty} g(y)\, f(y \mid t)\, dy \Big) f(t)\, dt \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y)\, f(y,t)\, dy\, dt \\
&= \int_{-\infty}^{\infty} g(y) \Big( \int_{-\infty}^{\infty} f(y,t)\, dt \Big) dy \\
&= \int_{-\infty}^{\infty} g(y)\, f(y)\, dy \\
&= E(g(Y)).
\end{aligned}
\]

Step 1: $\varphi(t)$ does not depend on $\theta$.
\[
\varphi(t) = \int_{\mathbb{R}^n} y(x_1,\dots,x_n)\, f(x_1,\dots,x_n \mid T(x_1,\dots,x_n) = t)\, dx_1 \cdots dx_n.
\]
Since $T(x_1,\dots,x_n)$ is a sufficient statistic, $f(x_1,\dots,x_n \mid T(x_1,\dots,x_n) = t)$ does not depend on $\theta$. Since $y(x_1,\dots,x_n)$ is an estimator, it is not a function of $\theta$. Thus, the integral of their product over $\mathbb{R}^n$ does not depend on $\theta$, so $\varphi(T)$ is a genuine statistic.

Step 2: $\varphi(T)$ is unbiased.
\[
E(\varphi(T)) = E(E(Y \mid T)) = E(Y) = \theta
\]
by the lemma above (with $g$ the identity).

Step 3: $\operatorname{Var}(\varphi(T)) \le \operatorname{Var}(Y)$.
\[
\begin{aligned}
\operatorname{Var}(\varphi(T))
&= E(E(Y \mid T)^2) - E(E(Y \mid T))^2 \\
&\le E(E(Y^2 \mid T)) - E(Y)^2 \\
&= E(Y^2) - E(Y)^2 \\
&= \operatorname{Var}(Y),
\end{aligned}
\]
where the inequality holds because $E(Y \mid T)^2 \le E(Y^2 \mid T)$ (the conditional variance $E(Y^2 \mid T) - E(Y \mid T)^2$ is nonnegative). Thus, conditioning an unbiased estimator on the sufficient statistic gives a new unbiased estimator with variance at most that of the old estimator.

4 Some properties of the derivative of the log (4)

Let $X$ have the distribution function $f(x,\theta_0)$. Let $Y = \frac{\partial}{\partial\theta} \log f(X,\theta) \big|_{\theta=\theta_0}$. Notice that, by the chain rule, $\frac{\partial}{\partial\theta} \log f(x,\theta) = \frac{1}{f(x,\theta)} \frac{\partial}{\partial\theta} f(x,\theta)$. Using this fact, we find:
\[
\begin{aligned}
E(Y) &= E\Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big) \\
&= \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} \log f(x,\theta) \Big|_{\theta=\theta_0} f(x,\theta_0)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{f(x,\theta_0)} \Big( \frac{\partial}{\partial\theta} f(x,\theta) \Big|_{\theta=\theta_0} \Big) f(x,\theta_0)\, dx \\
&= \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} f(x,\theta) \Big|_{\theta=\theta_0} dx \\
&= \frac{\partial}{\partial\theta} \Big( \int_{-\infty}^{\infty} f(x,\theta)\, dx \Big) \Big|_{\theta=\theta_0} \\
&= \frac{\partial}{\partial\theta} (1) \Big|_{\theta=\theta_0} \\
&= 0.
\end{aligned}
\]
For the second derivative,
\[
\begin{aligned}
\frac{\partial^2}{\partial\theta^2} \log f(x,\theta)
&= \frac{\partial}{\partial\theta} \Big( \frac{1}{f(x,\theta)} \frac{\partial}{\partial\theta} f(x,\theta) \Big) \\
&= \frac{1}{f(x,\theta)^2} \Big( f(x,\theta)\, \frac{\partial^2}{\partial\theta^2} f(x,\theta) - \Big( \frac{\partial}{\partial\theta} f(x,\theta) \Big)^2 \Big) \\
&= \frac{1}{f(x,\theta)} \frac{\partial^2}{\partial\theta^2} f(x,\theta) - \frac{1}{f(x,\theta)^2} \Big( \frac{\partial}{\partial\theta} f(x,\theta) \Big)^2 \\
&= \frac{1}{f(x,\theta)} \frac{\partial^2}{\partial\theta^2} f(x,\theta) - \Big( \frac{\partial}{\partial\theta} \log f(x,\theta) \Big)^2.
\end{aligned}
\]
Taking expectations at $\theta = \theta_0$,
\[
\begin{aligned}
E\Big( \frac{\partial^2}{\partial\theta^2} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big)
&= \int_{-\infty}^{\infty} \Big[ \frac{1}{f(x,\theta_0)} \frac{\partial^2}{\partial\theta^2} f(x,\theta) \Big|_{\theta=\theta_0} - \Big( \frac{\partial}{\partial\theta} \log f(x,\theta) \Big|_{\theta=\theta_0} \Big)^2 \Big] f(x,\theta_0)\, dx \\
&= \int_{-\infty}^{\infty} \frac{\partial^2}{\partial\theta^2} f(x,\theta) \Big|_{\theta=\theta_0} dx - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= \frac{\partial^2}{\partial\theta^2} \Big( \int_{-\infty}^{\infty} f(x,\theta)\, dx \Big) \Big|_{\theta=\theta_0} - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= \frac{\partial^2}{\partial\theta^2} (1) \Big|_{\theta=\theta_0} - E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big)^2 \Big) \\
&= -E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big|_{\theta=\theta_0} \Big)^2 \Big).
\end{aligned}
\]
Thus, the expected value of $Y$ is zero, and the variance of $Y$ is $E(Y^2) = -E\big( \frac{\partial^2}{\partial\theta^2} \log f(X,\theta) \big|_{\theta=\theta_0} \big)$, which is defined as the information function, $I(\theta_0)$.

5 The Cramér-Rao lower bound (5)

Let $T$ be an unbiased estimator based on a sample $X = (X_1,\dots,X_n)$ from the distribution $f(x,\theta)$, and write $f(X,\theta) = \prod_{i=1}^n f(X_i,\theta)$ for the joint density. Then, $E(T) = \theta$. We take the derivative of this equation (using, in the second-to-last step, that $E\big( \frac{\partial}{\partial\theta} \log f(X,\theta) \big) = 0$, so any constant $c$ may be subtracted) to find:
\[
\begin{aligned}
1 &= \frac{\partial}{\partial\theta} E(T) \\
&= \frac{\partial}{\partial\theta} \int_{\mathbb{R}^n} T(x)\, f(x,\theta)\, dx \\
&= \int_{\mathbb{R}^n} T(x)\, \frac{\partial}{\partial\theta} f(x,\theta)\, dx \\
&= \int_{\mathbb{R}^n} T(x) \Big( \frac{\partial}{\partial\theta} \log f(x,\theta) \Big) f(x,\theta)\, dx \\
&= E\Big( T(X)\, \frac{\partial}{\partial\theta} \log f(X,\theta) \Big) \\
&= E\Big( T(X)\, \frac{\partial}{\partial\theta} \log f(X,\theta) \Big) - c\, E\Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big) \\
&= E\Big( (T(X) - c)\, \frac{\partial}{\partial\theta} \log f(X,\theta) \Big) \\
&= E\Big( (T(X) - \theta)\, \frac{\partial}{\partial\theta} \log f(X,\theta) \Big) \quad \text{(taking } c = \theta\text{)}.
\end{aligned}
\]
By the Cauchy-Schwarz inequality, $E(AB)^2 \le E(A^2)\, E(B^2)$. Squaring both sides of the equation above and applying this, we find:
\[
\begin{aligned}
1 &= E\Big( (T(X) - \theta)\, \frac{\partial}{\partial\theta} \log f(X,\theta) \Big)^2 \\
&\le E\big( (T(X) - \theta)^2 \big)\, E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big)^2 \Big) \\
&= \operatorname{Var}(T)\, E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big)^2 \Big).
\end{aligned}
\]
Since the sample is independent and identically distributed,
\[
\Big( \frac{\partial}{\partial\theta} \log f(X,\theta) \Big)^2
= \Big( \frac{\partial}{\partial\theta} \log \prod_{i=1}^n f(X_i,\theta) \Big)^2
= \Big( \frac{\partial}{\partial\theta} \sum_{i=1}^n \log f(X_i,\theta) \Big)^2
= \Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big)^2,
\]
and, since each term has mean zero and the terms are independent,
\[
\begin{aligned}
E\Big( \Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big)^2 \Big)
&= \operatorname{Var}\Big( \sum_{i=1}^n \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big) \\
&= \sum_{i=1}^n \operatorname{Var}\Big( \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big) \\
&= n \cdot \operatorname{Var}\Big( \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big) \\
&= n \cdot E\Big( \Big( \frac{\partial}{\partial\theta} \log f(X_i,\theta) \Big)^2 \Big) \\
&= n I(\theta).
\end{aligned}
\]
Thus, $1 \le \operatorname{Var}(T) \cdot n I(\theta)$, and the variance of an unbiased estimator is at least $\frac{1}{n I(\theta)}$.
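To connect Sections 3 and 5, here is a minimal Python simulation sketch (not part of the original notes; the Bernoulli model and the particular values of $\theta$, $n$, and the number of replications are illustrative assumptions). It starts from the crude unbiased estimator $Y = X_1$ and Rao-Blackwellizes it on the sufficient statistic $T = \sum X_i$; by symmetry $\varphi(T) = E(X_1 \mid T) = T/n = \bar X$. For a Bernoulli trial $I(\theta) = 1/(\theta(1-\theta))$, so the Cramér-Rao bound is $\theta(1-\theta)/n$, which $\bar X$ attains.

```python
import numpy as np

# Illustrative check (assumed Bernoulli setup, not from the notes):
#   Y = X_1 is unbiased for theta, with Var(Y) = theta * (1 - theta).
#   Rao-Blackwellizing on T = sum(X_i) gives phi(T) = E(X_1 | T) = T/n = xbar,
#   whose variance theta * (1 - theta) / n equals the Cramer-Rao bound 1 / (n * I(theta)).
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 20, 100_000

samples = rng.binomial(1, theta, size=(reps, n))
Y = samples[:, 0]                     # crude unbiased estimator: the first observation
phi = samples.mean(axis=1)            # Rao-Blackwellized estimator: the sample mean

info = 1.0 / (theta * (1.0 - theta))  # Fisher information of one Bernoulli observation
print("Var(Y)      ~", Y.var(),   "  theory:", theta * (1 - theta))
print("Var(phi(T)) ~", phi.var(), "  theory:", theta * (1 - theta) / n)
print("Cramer-Rao bound 1/(n*I(theta)) =", 1.0 / (n * info))
```

In general, Rao-Blackwellization only guarantees $\operatorname{Var}(\varphi(T)) \le \operatorname{Var}(Y)$; attaining the Cramér-Rao bound exactly, as happens here, is a special feature of this model.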