
Stats 135: Efficiency and Sufficiency

Joan Bruna

Department of Statistics, UC Berkeley

February 8, 2015

Cramér-Rao lower bound

This result states that the price to pay for using an unbiased estimator is a certain minimum amount of variance.

Theorem

Suppose the $X_i$ are i.i.d. with distribution $P_\theta$. Under smoothness conditions on $f_\theta$ (the density or frequency function), we have: if $\hat\theta$ is an unbiased estimator of $\theta$, then
$$\operatorname{var}(\hat\theta) \geq \frac{1}{nI(\theta)}.$$

Note: the right-hand side is the asymptotic variance of the MLE.
Proof idea: take $Z$ to be the partial derivative (with respect to $\theta$) of the log-likelihood. Show that $\operatorname{cov}(Z, \hat\theta) = 1$ and $\operatorname{var}(Z) = nI(\theta)$. By Cauchy-Schwarz, $\operatorname{cov}(Z, \hat\theta)^2 \leq \operatorname{var}(Z)\operatorname{var}(\hat\theta)$, which gives the bound.
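For illustration (a minimal simulation sketch, not from the slides, assuming numpy is available): for $X_i \sim$ Bernoulli($p$), $I(p) = 1/(p(1-p))$, so the bound is $p(1-p)/n$, which the unbiased MLE $\hat p = \bar X$ attains exactly.

```python
import numpy as np

# Cramér-Rao illustration for X_i ~ Bernoulli(p):
# I(p) = 1 / (p(1 - p)), so the bound is p(1 - p) / n.
# The MLE p_hat = X_bar is unbiased and attains the bound here.
rng = np.random.default_rng(0)
p, n, reps = 0.3, 100, 200_000

samples = rng.binomial(1, p, size=(reps, n))
p_hat = samples.mean(axis=1)          # MLE for each replication

crlb = p * (1 - p) / n                # 1 / (n I(p))
print(f"empirical var(p_hat) = {p_hat.var():.6f}")
print(f"Cramér-Rao bound     = {crlb:.6f}")
```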

Efficiency

Question: how do we compare two estimators? Look at their variances.

Definition (Efficiency)
Given two estimators $\hat\theta$ and $\tilde\theta$ of a parameter $\theta$, the efficiency of $\hat\theta$ relative to $\tilde\theta$ is
$$\operatorname{eff}(\hat\theta, \tilde\theta) = \frac{\operatorname{var}(\tilde\theta)}{\operatorname{var}(\hat\theta)}.$$

Note: if $\operatorname{eff}(\hat\theta, \tilde\theta) \leq 1$, then $\operatorname{var}(\hat\theta) \geq \operatorname{var}(\tilde\theta)$, i.e. $\hat\theta$ is less efficient than $\tilde\theta$.
Efficiency measures the "fraction of samples needed for the two estimators to have the same variance".
Warning: it only makes sense when $\hat\theta$ and $\tilde\theta$ have the same bias (ideally 0).
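As an illustration (a simulation sketch, not from the slides): for i.i.d. $N(\mu, \sigma^2)$ data, both the sample mean and the sample median are unbiased for $\mu$, and the efficiency of the median relative to the mean tends to $2/\pi \approx 0.64$.

```python
import numpy as np

# Relative efficiency of the sample median vs. the sample mean
# for N(mu, sigma^2) data; asymptotically var(mean)/var(median) = 2/pi.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 101, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
means = x.mean(axis=1)
medians = np.median(x, axis=1)

eff = means.var() / medians.var()     # eff(median, mean)
print(f"estimated efficiency = {eff:.3f}  (asymptotic value 2/pi = {2/np.pi:.3f})")
```

So roughly $1/0.64 \approx 1.6$ times as many samples are needed for the median to match the variance of the mean.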

Efficient estimators

Definition (Efficient estimator)
An unbiased estimator that achieves the Cramér-Rao lower bound is called efficient.

Notes:
- Unbiased estimators cannot do better in terms of variance than the Cramér-Rao bound.
- The bound is attained asymptotically by the MLE, so the MLE is asymptotically efficient.
- However, the MLE is not necessarily efficient in finite samples.
- Asymptotically efficient estimators are not unique.

Sufficiency

$(X_1, \dots, X_n)$ is $n$-dimensional and might be complicated/expensive to store.
Question: is there a function of the sample that contains all the information there is in the sample about the parameter $\theta$? If so, $T(X_1, \dots, X_n)$ contains all relevant info, so it is the only thing we need to keep track of.

Definition (Sufficiency)
A statistic $T$ is said to be sufficient for $\theta$ if the conditional distribution of
$$X_1, \dots, X_n \mid T(X_1, \dots, X_n) = t$$
does not depend on $\theta$ for any $t$. $T$ is then called a sufficient statistic for $\theta$.

Sufficiency: Example

Consider $\{X_i\}_{i=1}^n$ i.i.d. Bernoulli($p$) ($1$ w.p. $p$, $0$ w.p. $1-p$).
Let $T = \sum X_i$. Then $T$ is Binomial($n, p$): $P(T = t) = \binom{n}{t} p^t (1-p)^{n-t}$.
For $x_1 + \dots + x_n = t$,
$$P(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{P(X_1 = x_1, \dots, X_n = x_n, T = t)}{P(T = t)} = \frac{p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{p^t (1-p)^{n-t}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}}.$$
So $T$ is sufficient for $p$.
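The computation can be checked numerically (an illustrative sketch, not from the slides): conditioned on $T = t$, each 0/1 arrangement with $t$ ones should appear with probability $1/\binom{n}{t}$, whatever $p$ is.

```python
import numpy as np
from math import comb

# Check that P(X_1, ..., X_n | T = t) = 1 / C(n, t) regardless of p.
rng = np.random.default_rng(2)
n, t, reps = 5, 2, 400_000
target = np.array([1, 1, 0, 0, 0])         # one particular arrangement with t ones

for p in (0.2, 0.7):                       # two very different values of p
    x = rng.binomial(1, p, size=(reps, n))
    cond = x[x.sum(axis=1) == t]           # condition on T = t
    freq = (cond == target).all(axis=1).mean()
    print(f"p={p}: P(X = 11000 | T = 2) ≈ {freq:.4f},  1/C(5,2) = {1/comb(5, 2):.4f}")
```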

Sufficiency: N&S condition

Theorem
A necessary and sufficient condition for $T$ to be sufficient for $\theta$ is that
$$f_\theta(x_1, \dots, x_n) = g_\theta(T)\, h(x_1, \dots, x_n).$$

Previous example:

$$P_p(X_1 = x_1, \dots, X_n = x_n) = p^{\sum x_i} (1-p)^{n - \sum x_i} = (p/(1-p))^{\sum x_i} (1-p)^n$$

So $h = 1$ and $g_p(T) = (1-p)^n (p/(1-p))^T$.
Proof: see class notes.

Sufficiency and MLE

Corollary

If $T$ is sufficient for $\theta$, then $\hat\theta_{ML}$ is a function of $T$.
Why? If $f_\theta(x_1, \dots, x_n) = g_\theta(T)\, h(x_1, \dots, x_n)$, then

$$\log(\operatorname{lik}(\theta)) = \log(g_\theta(T)) + \log(h(x_1, \dots, x_n)).$$

$\log(h(x_1, \dots, x_n))$ does not involve $\theta$, so it plays no role in the maximization.
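For the Bernoulli example this is concrete (a worked illustration, not on the original slide): with $h = 1$, maximizing $\log g_p(T)$ gives
$$\log g_p(T) = n \log(1-p) + T \log\frac{p}{1-p}, \qquad \frac{d}{dp} \log g_p(T) = \frac{T}{p} - \frac{n-T}{1-p} = 0 \;\Longrightarrow\; \hat p_{ML} = \frac{T}{n},$$
which depends on the data only through $T$.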

Rao-Blackwell theorem: Preparations

There are various ways of measuring the quality of an estimator. A possible one: mean-squared error (MSE):

$$\operatorname{MSE}(\hat\theta) = E_\theta\big[(\hat\theta - \theta)^2\big] = \operatorname{bias}^2 + \operatorname{var}(\hat\theta)$$

Nice property of MSE: captures bias-variance tradeoff.
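A quick numerical check of the decomposition (an illustrative sketch, not from the slides), using the biased variance estimator $\frac{1}{n}\sum_i (X_i - \bar X)^2$, whose bias is $-\sigma^2/n$:

```python
import numpy as np

# Verify MSE(est) = bias^2 + var(est) for the biased variance
# estimator (1/n) * sum (X_i - X_bar)^2, with true sigma^2 = 1.
rng = np.random.default_rng(3)
sigma2, n, reps = 1.0, 10, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = x.var(axis=1)                   # numpy default ddof=0: the 1/n estimator

mse = ((est - sigma2) ** 2).mean()
bias = est.mean() - sigma2            # expected bias: -sigma^2 / n
var = est.var()
print(f"MSE = {mse:.5f},  bias^2 + var = {bias ** 2 + var:.5f}")
```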

Rao-Blackwell theorem

Question: given an estimator $\hat\theta$, can I find a better one if I know a sufficient statistic $T$? "Better" here is in the sense of MSE.

Theorem (Rao-Blackwell)
Let $\hat\theta$ be an estimator for $\theta$ with $E[\hat\theta^2] < \infty$. Suppose $T$ is sufficient for $\theta$, and let
$$\tilde\theta = E\big[\hat\theta \mid T\big].$$
Then $\operatorname{MSE}(\tilde\theta) \leq \operatorname{MSE}(\hat\theta)$.

Question: why do we need $T$ to be sufficient? Can we improve the MLE in this fashion?
Proof: see class notes; it follows essentially from properties of conditional expectation.
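A classic concrete instance (a simulation sketch, not from the slides): for Bernoulli($p$) data, take the crude unbiased estimator $\hat\theta = X_1$ and the sufficient statistic $T = \sum_i X_i$. By symmetry, $\tilde\theta = E[X_1 \mid T] = T/n$, and the MSE drops from $p(1-p)$ to $p(1-p)/n$.

```python
import numpy as np

# Rao-Blackwell for Bernoulli(p): start from the crude unbiased
# estimator theta_hat = X_1 and condition on T = sum(X_i);
# E[X_1 | T] = T / n, i.e. the sample mean.
rng = np.random.default_rng(4)
p, n, reps = 0.3, 20, 200_000

x = rng.binomial(1, p, size=(reps, n))
theta_hat = x[:, 0].astype(float)     # crude estimator: first observation only
theta_tilde = x.mean(axis=1)          # Rao-Blackwellized: E[X_1 | T] = T / n

print(f"MSE(theta_hat)   = {((theta_hat - p) ** 2).mean():.5f}")   # ≈ p(1-p)
print(f"MSE(theta_tilde) = {((theta_tilde - p) ** 2).mean():.5f}") # ≈ p(1-p)/n
```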
