7 Minimax Estimation
Previously we have looked at Bayes estimation, where the overall measure of the estimation error is a weighted average over all values of the parameter space, with a positive weight function, the prior. Now we instead use the maximum risk as the relevant measure of the estimation risk,
\[
\sup_{\theta} R(\theta, \delta),
\]
and define the minimax estimator, if it exists, as the estimator that minimizes this risk, i.e.
\[
\hat{\delta} = \arg \inf_{\delta} \sup_{\theta} R(\theta, \delta),
\]
where the infimum is taken over all functions of the data. To relate the minimax estimator to the Bayes estimator, the Bayes risk
\[
r_{\Lambda} = \int R(\theta, \delta_{\Lambda}) \, d\Lambda(\theta)
\]
is of interest.

Definition 16 The prior distribution $\Lambda$ is called least favorable for estimating $g(\theta)$ if $r_{\Lambda} \geq r_{\Lambda'}$ for every other distribution $\Lambda'$ on $\Omega$.

When a Bayes estimator has a Bayes risk that attains the minimax risk, it is minimax:

Theorem 11 Assume $\Lambda$ is a prior distribution and assume the Bayes estimator $\delta_{\Lambda}$ satisfies
\[
\int R(\theta, \delta_{\Lambda}) \, d\Lambda(\theta) = \sup_{\theta} R(\theta, \delta_{\Lambda}).
\]
Then
(i) $\delta_{\Lambda}$ is minimax.
(ii) If $\delta_{\Lambda}$ is the unique Bayes estimator, it is the unique minimax estimator.
(iii) $\Lambda$ is least favorable.

Proof. Let $\delta \neq \delta_{\Lambda}$ be another estimator. Then
\[
\sup_{\theta} R(\theta, \delta) \geq \int R(\theta, \delta) \, d\Lambda(\theta) \geq \int R(\theta, \delta_{\Lambda}) \, d\Lambda(\theta) = \sup_{\theta} R(\theta, \delta_{\Lambda}),
\]
where the second inequality holds since $\delta_{\Lambda}$ is the Bayes estimator for $\Lambda$; this proves that $\delta_{\Lambda}$ is minimax.

To prove (ii), uniqueness of the Bayes estimator means that no other estimator attains the same Bayes risk, so $>$ replaces $\geq$ in the second inequality above, and thus no other estimator attains the same maximum risk, which proves the uniqueness of the minimax estimator.

To show (iii), let $\Lambda' \neq \Lambda$ be a distribution on $\Omega$, and let $\delta_{\Lambda}, \delta_{\Lambda'}$ be the corresponding Bayes estimators. Then the Bayes risks are related by
\[
r_{\Lambda'} = \int R(\theta, \delta_{\Lambda'}) \, d\Lambda'(\theta) \leq \int R(\theta, \delta_{\Lambda}) \, d\Lambda'(\theta) \leq \sup_{\theta} R(\theta, \delta_{\Lambda}) = r_{\Lambda},
\]
where the first inequality follows since $\delta_{\Lambda'}$ is the Bayes estimator corresponding to $\Lambda'$. We have proven that $\Lambda$ is least favorable. $\Box$

Note 1 The previous result says that minimax estimators are Bayes estimators with respect to the least favorable prior.

Two simple cases in which the Bayes estimator's Bayes risk attains the minimax risk are given in the following.

Corollary 3 Assume the Bayes estimator $\delta_{\Lambda}$ has constant risk, i.e. $R(\theta, \delta_{\Lambda})$ does not depend on $\theta$. Then $\delta_{\Lambda}$ is minimax.

Proof. If the risk is constant, the supremum over $\theta$ equals the average over $\theta$, so the Bayes and the minimax risks coincide, and the result follows from the previous theorem. $\Box$

Corollary 4 Let $\delta_{\Lambda}$ be the Bayes estimator for $\Lambda$, and define
\[
\Omega_{\Lambda} = \{\theta \in \Omega : R(\theta, \delta_{\Lambda}) = \sup_{\theta'} R(\theta', \delta_{\Lambda})\}.
\]
Then if $\Lambda(\Omega_{\Lambda}) = 1$, $\delta_{\Lambda}$ is minimax.

Proof. The condition $\Lambda(\Omega_{\Lambda}) = 1$ means that the Bayes estimator has constant (maximal) risk $\Lambda$-almost surely, and since the Bayes estimator is only determined modulo $\Lambda$-null sets, this is enough. $\Box$

Example 32 Let $X \sim \mathrm{Bin}(n, p)$. Let $\Lambda = B(a, b)$ be the Beta distribution for $p$. As previously established, via the conditional distribution of $p$ given $X$ (which is also a Beta distribution), the Bayes estimator of $p$ is
\[
\delta_{\Lambda}(x) = \frac{a + x}{a + b + n},
\]
with risk function, for quadratic loss (writing $q = 1 - p$),
\begin{align*}
R(p, \delta_{\Lambda}) &= \mathrm{Var}(\delta_{\Lambda}) + (E(\delta_{\Lambda}) - p)^2 \\
&= \frac{1}{(a+b+n)^2}\, npq + \left(\frac{a + np - p(a+b+n)}{a+b+n}\right)^2 \\
&= \frac{1}{(a+b+n)^2}\left( npq + (aq - bp)^2 \right) \\
&= \frac{a^2 + (n - 2a(a+b))p + ((a+b)^2 - n)p^2}{(a+b+n)^2}.
\end{align*}
Since this function is a quadratic in $p$, it is constant as a function of $p$ if and only if the coefficients of $p$ and $p^2$ are zero, i.e. if and only if
\[
(a+b)^2 = n, \qquad 2a(a+b) = n,
\]
which has the solution $a = b = \sqrt{n}/2$. Thus
\[
\delta_{\Lambda}(x) = \frac{x + \sqrt{n}/2}{n + \sqrt{n}}
\]
is a constant risk Bayes estimator and therefore minimax, for the (least favorable) distribution $\Lambda = B(\sqrt{n}/2, \sqrt{n}/2)$. The Bayes estimator is unique, and therefore this is the unique minimax estimator of $p$, with respect to $\Lambda = B(\sqrt{n}/2, \sqrt{n}/2)$.
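Before the example continues, the constant-risk property just derived can be checked numerically. The following is a minimal Python sketch (not part of the original notes; the sample size n = 25 and the function name risk are chosen only for illustration) that computes the exact quadratic risk of $\delta_{\Lambda}(x) = (x + \sqrt{n}/2)/(n + \sqrt{n})$ over a grid of values of $p$ and compares it with the risk of the unbiased estimator $X/n$.

    from math import comb
    import numpy as np

    n = 25                                # assumed sample size, for illustration only

    def risk(p, estimator):
        """Exact quadratic risk E[(delta(X) - p)^2] for X ~ Bin(n, p),
        obtained by summing over the n + 1 possible outcomes."""
        xs = np.arange(n + 1)
        pmf = np.array([comb(n, int(x)) * p**x * (1 - p)**(n - x) for x in xs])
        return float(np.sum(pmf * (estimator(xs) - p) ** 2))

    minimax = lambda x: (x + np.sqrt(n) / 2) / (n + np.sqrt(n))   # Bayes under B(sqrt(n)/2, sqrt(n)/2)
    mle = lambda x: x / n                                         # the unbiased estimator X/n

    for p in [0.05, 0.25, 0.50, 0.75, 0.95]:
        print(f"p = {p:.2f}:  minimax risk = {risk(p, minimax):.6f}   risk of X/n = {risk(p, mle):.6f}")
    # The first column is constant, equal to 1/(4 (1 + sqrt(n))^2) for every p,
    # while p(1 - p)/n varies with p and exceeds this constant near p = 1/2.

In particular the maximum risk of $X/n$, namely $1/(4n)$, is strictly larger than the constant risk of the minimax estimator, which is the point of the example.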
Continuing the example, let now $\Lambda$ be an arbitrary distribution on $[0, 1]$. Then the Bayes estimator of $p$ is $\delta_{\Lambda}(x) = E(p \mid x)$, i.e.
\[
\delta_{\Lambda}(x) = \int_0^1 p \, d\Lambda(p \mid x) = \frac{1}{f(x)} \int_0^1 p f(x \mid p) \, d\Lambda(p) = \frac{\int_0^1 p \, p^x (1-p)^{n-x} \, d\Lambda(p)}{\int_0^1 p^x (1-p)^{n-x} \, d\Lambda(p)}.
\]
Using the power expansion
\[
(1-p)^{n-x} = 1 + a_1 p + \ldots + a_{n-x} p^{n-x},
\]
this becomes
\[
\delta_{\Lambda}(x) = \frac{\int_0^1 \left( p^{x+1} + a_1 p^{x+2} + \ldots + a_{n-x} p^{n+1} \right) d\Lambda(p)}{\int_0^1 \left( p^{x} + a_1 p^{x+1} + \ldots + a_{n-x} p^{n} \right) d\Lambda(p)},
\]
which shows that the Bayes estimator depends on the distribution $\Lambda$ only via the first $n+1$ moments of $\Lambda$. Therefore the least favorable distribution is not unique for estimating $p$ in a $\mathrm{Bin}(n, p)$ distribution: two priors with the same first $n+1$ moments give the same Bayes estimator. $\Box$

Recall that when the loss function is convex, for every randomized estimator there is a nonrandomized estimator whose risk is at least as small, so there is no need to consider randomized estimators.

The relation established between the minimax estimator and the Bayes estimator, with the prior $\Lambda$ obtained as a least favorable distribution, is valid when $\Lambda$ is a proper prior. What happens when $\Lambda$ is not proper? Sometimes the estimation problem at hand makes it natural to consider such an improper prior. One such situation is estimating the mean of a Normal distribution with known variance when the mean is unrestricted, i.e. an arbitrary real number; then one could believe that the least favorable distribution is the Lebesgue measure on $\mathbb{R}$. To model this, assume $\Lambda$ is a fixed (improper) prior and let $\{\Lambda_n\}$ be a sequence of priors that in some sense approximates $\Lambda$:

Definition 17 Let $\{\Lambda_n\}$ be a sequence of priors and let $\delta_n$ be the Bayes estimator corresponding to $\Lambda_n$, with Bayes risk
\[
r_n = \int R(\theta, \delta_n) \, d\Lambda_n(\theta).
\]
Define $r = \lim_{n \to \infty} r_n$. If $r_{\Lambda'} \leq r$ for every (proper) prior distribution $\Lambda'$, then the sequence $\{\Lambda_n\}$ is called least favorable.

Now if $\delta$ is an estimator whose maximal risk attains this limiting Bayes risk, then $\delta$ is minimax and the sequence is least favorable:

Theorem 12 Let $\{\Lambda_n\}$ be a sequence of priors and let $r = \lim_{n \to \infty} r_n$. Assume the estimator $\delta$ satisfies
\[
r = \sup_{\theta} R(\theta, \delta).
\]
Then $\delta$ is minimax and the sequence $\{\Lambda_n\}$ is least favorable.

Proof. To prove the minimaxity, let $\delta'$ be any other estimator. Then
\[
\sup_{\theta} R(\theta, \delta') \geq \int R(\theta, \delta') \, d\Lambda_n(\theta) \geq r_n,
\]
for every $n$. Since the right hand side converges to $r = \sup_{\theta} R(\theta, \delta)$, this implies that
\[
\sup_{\theta} R(\theta, \delta') \geq \sup_{\theta} R(\theta, \delta),
\]
and thus $\delta$ is minimax.

To prove that $\{\Lambda_n\}$ is least favorable, let $\Lambda$ be any (proper) prior distribution and let $\delta_{\Lambda}$ be the corresponding Bayes estimator. Then
\[
r_{\Lambda} = \int R(\theta, \delta_{\Lambda}) \, d\Lambda(\theta) \leq \int R(\theta, \delta) \, d\Lambda(\theta) \leq \sup_{\theta} R(\theta, \delta) = r,
\]
and thus $\{\Lambda_n\}$ is least favorable. $\Box$

Note 2 Uniqueness of the Bayes estimators $\delta_n$ does not imply uniqueness of the minimax estimator, since the strict inequality $r_n = \int R(\theta, \delta_n) \, d\Lambda_n(\theta) < \int R(\theta, \delta') \, d\Lambda_n(\theta)$ is only transformed into the non-strict inequality $r \leq \lim_n \int R(\theta, \delta') \, d\Lambda_n(\theta)$ under the limit operation.

To calculate the Bayes risk $r_n$ for $\delta_n$, the following is useful.

Lemma 9 Let $\delta_{\Lambda}$ be the Bayes estimator of $g(\theta)$ corresponding to $\Lambda$, and assume quadratic loss. Then the Bayes risk is given by
\[
r_{\Lambda} = \int \mathrm{Var}(g(\Theta) \mid x) \, dP(x).
\]

Proof. For quadratic loss the Bayes estimator is given by $\delta_{\Lambda}(x) = E(g(\Theta) \mid x)$, and thus the Bayes risk is
\begin{align*}
r_{\Lambda} &= \int R(\theta, \delta_{\Lambda}) \, d\Lambda(\theta) = \ldots \text{(Fubini's theorem)} \ldots \\
&= \int E\!\left( [\delta_{\Lambda}(x) - g(\Theta)]^2 \mid x \right) dP(x) \\
&= \int E\!\left( [E(g(\Theta) \mid x) - g(\Theta)]^2 \mid x \right) dP(x) \\
&= \int \mathrm{Var}(g(\Theta) \mid x) \, dP(x). \qquad \Box
\end{align*}

Note 3 If the conditional variance $\mathrm{Var}(g(\Theta) \mid x)$ is independent of $x$, the Bayes risk equals $\mathrm{Var}(g(\Theta) \mid x)$.
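Lemma 9 can be checked numerically in the conjugate Beta-Binomial model of Example 32, where both sides are easy to compute. The sketch below is an illustration only, assuming SciPy is available; the values a = 2, b = 3, n = 10 and the function names are arbitrary choices, not from the notes. It evaluates the Bayes risk once as $\int R(p, \delta_{\Lambda}) \, d\Lambda(p)$ and once as the average posterior variance $\int \mathrm{Var}(p \mid x) \, dP(x)$; the two numbers agree.

    from math import comb
    import numpy as np
    from scipy import integrate, stats

    a, b, n = 2.0, 3.0, 10                # assumed prior parameters and sample size, illustration only

    def delta(x):                         # Bayes estimator (posterior mean) under Beta(a, b)
        return (a + x) / (a + b + n)

    def risk(p):                          # quadratic risk R(p, delta_Lambda)
        xs = np.arange(n + 1)
        pmf = np.array([comb(n, int(x)) * p**x * (1 - p)**(n - x) for x in xs])
        return np.sum(pmf * (delta(xs) - p) ** 2)

    # Bayes risk as the prior-weighted risk: integral of R(p, delta) dLambda(p).
    lhs, _ = integrate.quad(lambda p: risk(p) * stats.beta.pdf(p, a, b), 0, 1)

    # Bayes risk as the average posterior variance: integral of Var(p | x) dP(x),
    # where P is the marginal (beta-binomial) distribution of X and the posterior
    # given X = x is Beta(a + x, b + n - x).
    xs = np.arange(n + 1)
    post_var = (a + xs) * (b + n - xs) / ((a + b + n) ** 2 * (a + b + n + 1))
    rhs = np.sum(stats.betabinom.pmf(xs, n, a, b) * post_var)

    print(lhs, rhs)                       # the two expressions agree, as Lemma 9 asserts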
Example 33 Let $X = (X_1, \ldots, X_n)$ be an i.i.d. sequence with $X_1 \sim N(\theta, \sigma^2)$. We want to estimate $g(\theta) = \theta$ and assume quadratic loss. Assume $\sigma^2$ is known. Then $\delta = \bar{X}$ is minimax. To prove this we will find a sequence of Bayes estimators whose Bayes risks satisfy $r_{\Lambda_b} \to \sigma^2/n = \sup_{\theta} R(\theta, \delta)$; the last equality holds since the risk of $\bar{X}$ does not depend on $\theta$.

Let $\theta$ be distributed according to the prior $\Lambda_b = N(\mu, b^2)$. Then the Bayes estimator is
\[
\delta_{\Lambda_b}(x) = \frac{n\bar{x}/\sigma^2 + \mu/b^2}{n/\sigma^2 + 1/b^2},
\]
with posterior variance
\[
\mathrm{Var}(\Theta \mid x) = \frac{1}{n/\sigma^2 + 1/b^2},
\]
which, since it is independent of $x$, gives the Bayes risk
\[
r_{\Lambda_b} = \frac{1}{n/\sigma^2 + 1/b^2}.
\]
If $b \to \infty$, then $r_{\Lambda_b} \to \sigma^2/n$. Thus $\delta = \bar{X}$ is minimax. (A numerical sketch of this limiting argument is given at the end of the section.) $\Box$

Lemma 10 Let $\mathcal{F}_1 \subset \mathcal{F}$ be sets of distributions. Let $g(F)$ be an estimand (a functional) defined on $\mathcal{F}$. Assume $\delta_1$ is minimax over $\mathcal{F}_1$ and that
\[
\sup_{F \in \mathcal{F}_1} R(F, \delta_1) = \sup_{F \in \mathcal{F}} R(F, \delta_1).
\]
Then $\delta_1$ is minimax over $\mathcal{F}$.

Proof. Assume $\delta_1$ is minimax over $\mathcal{F}_1$. Since for every $\delta$
\[
\sup_{F \in \mathcal{F}_1} R(F, \delta) \leq \sup_{F \in \mathcal{F}} R(F, \delta),
\]
we get
\[
\sup_{F \in \mathcal{F}} R(F, \delta_1) = \sup_{F \in \mathcal{F}_1} R(F, \delta_1) = \inf_{\delta} \sup_{F \in \mathcal{F}_1} R(F, \delta) \leq \inf_{\delta} \sup_{F \in \mathcal{F}} R(F, \delta).
\]
Thus we have equality in the last inequality, and thus $\delta_1$ is minimax over $\mathcal{F}$. $\Box$

7.1 Minimax estimation in exponential families

Recall that the standard approach to estimation in exponential families is to restrict attention to unbiased estimators and, within this restricted family, find the estimator (if it exists) that has smallest risk (or variance, for quadratic loss), either uniformly over the parameter space (the UMVU estimator) or locally at a fixed parameter (the LMVU estimator).
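Before turning to exponential families, here is the numerical sketch of Example 33 referred to above. It is only an illustration: the values $\sigma = 1$, $n = 20$, $\mu = 0$ and the grid of $b$ values are assumptions, not from the notes. It shows the Bayes risks $r_{\Lambda_b} = 1/(n/\sigma^2 + 1/b^2)$ increasing to $\sigma^2/n = \sup_{\theta} R(\theta, \bar{X})$, which is exactly the condition of Theorem 12, and checks by Monte Carlo that the risk of $\bar{X}$ does not depend on $\theta$.

    import numpy as np

    sigma, n, mu = 1.0, 20, 0.0          # assumed values, for illustration only

    def bayes_estimator(xbar, b):
        """Posterior mean of theta under the prior N(mu, b^2)."""
        return (n * xbar / sigma**2 + mu / b**2) / (n / sigma**2 + 1 / b**2)

    def bayes_risk(b):
        """Bayes risk r_{Lambda_b} = posterior variance (constant in x)."""
        return 1.0 / (n / sigma**2 + 1 / b**2)

    minimax_risk = sigma**2 / n          # risk of the sample mean, independent of theta

    for b in [0.5, 1, 2, 5, 10, 100]:
        print(f"b = {b:6}:  r_Lambda_b = {bayes_risk(b):.6f}   sigma^2/n = {minimax_risk:.6f}")
    # r_{Lambda_b} increases to sigma^2/n, so by Theorem 12 the sample mean,
    # whose risk equals sigma^2/n for every theta, is minimax.

    # Monte Carlo check that the risk of the sample mean does not depend on theta:
    rng = np.random.default_rng(0)
    for theta in [-3.0, 0.0, 5.0]:
        xbar = rng.normal(theta, sigma, size=(100_000, n)).mean(axis=1)
        print(theta, np.mean((xbar - theta) ** 2))   # approximately sigma^2/n = 0.05 for each theta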