
STAT 210B HWK #5 SOLUTIONS

GARVESH RASKUTTI

(1) Let $X_1, \dots, X_n$ be i.i.d. according to the Pareto distribution with density $f_\theta(x) = \theta c^\theta x^{-(\theta+1)}$, where $\theta > 0$ and $0 < c < x$. Determine the Wald, Rao and likelihood ratio tests of $H_0 : \theta = \theta_0$ against $\theta \neq \theta_0$. The log-likelihood is

$$l(\theta; X_1, \dots, X_n) = \log \prod_i \theta c^\theta X_i^{-(\theta+1)} = n \log \theta + \theta\Big(n \log c - \sum_i \log X_i\Big) - \sum_i \log X_i.$$

The MLE for $\theta$ is
$$\hat{\theta} = \frac{n}{\sum_i \log X_i - n \log c}.$$

The likelihood ratio test:
$$
\begin{aligned}
\Lambda &= 2 l(\hat{\theta}; X_1, \dots, X_n) - 2 l(\theta_0; X_1, \dots, X_n) \\
&= 2n \log \hat{\theta} - 2\hat{\theta}\Big(\sum_i \log X_i - n \log c\Big) - 2n \log \theta_0 + 2\theta_0\Big(\sum_i \log X_i - n \log c\Big) \\
&= 2n \left( \log \frac{\hat{\theta}}{\theta_0} - \Big(1 - \frac{\theta_0}{\hat{\theta}}\Big) \right),
\end{aligned}
$$
where the last step uses $\sum_i \log X_i - n \log c = n/\hat{\theta}$.

In order to calculate the Wald and Rao statistics, we first need to calculate the score function and Fisher information:
$$
\begin{aligned}
l(\theta, X) &= \log \theta - \theta \log\Big(\frac{X}{c}\Big) - \log X \\
\Rightarrow \frac{\partial l}{\partial \theta}(\theta, X) &= \frac{1}{\theta} - \log\Big(\frac{X}{c}\Big) \\
\Rightarrow \frac{\partial^2 l}{\partial \theta^2}(\theta, X) &= -\frac{1}{\theta^2} \\
\Rightarrow I(\theta) &= \frac{1}{\theta^2}.
\end{aligned}
$$

Wald test statistic:
$$W = n(\hat{\theta} - \theta_0)^2 I(\theta_0) = n \left( 1 - \frac{\hat{\theta}}{\theta_0} \right)^2.$$
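As a quick sanity check, here is a minimal simulation sketch (not part of the original solution; the names are illustrative) that draws Pareto samples by inverse-CDF sampling and confirms the closed-form MLE above:

```python
# Minimal check of the closed-form Pareto MLE.
import numpy as np

rng = np.random.default_rng(0)
theta, c, n = 2.5, 1.0, 100_000

# Inverse-CDF sampling: F(x) = 1 - (c/x)^theta, so X = c * U^(-1/theta).
U = rng.uniform(size=n)
X = c * U ** (-1.0 / theta)

theta_hat = n / (np.log(X).sum() - n * np.log(c))
print(theta_hat)  # close to the true theta = 2.5 for large n
```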

Date: May 2, 2011.

Rao test statistic: let $\Delta_{n,\theta} = \frac{1}{\sqrt{n}} \sum_i \frac{\partial l}{\partial \theta}(\theta, X_i)$. Then

$$
\begin{aligned}
R &= \Delta_{n,\theta_0}^2 \, I^{-1}(\theta_0) \\
&= \theta_0^2 \cdot \frac{1}{n} \left( \sum_i \Big( \frac{1}{\theta_0} - \log\frac{X_i}{c} \Big) \right)^2 \\
&= \theta_0^2 \, n \left( \frac{1}{\theta_0} - \frac{1}{\hat{\theta}} \right)^2 \\
&= n \left( 1 - \frac{\theta_0}{\hat{\theta}} \right)^2.
\end{aligned}
$$
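The three statistics are asymptotically equivalent under $H_0$, each converging in distribution to $\chi^2_1$. A short sketch comparing them on simulated data (the helper name `pareto_tests` is illustrative, not from the original solution):

```python
# Compare the LRT, Wald and Rao statistics on one simulated Pareto sample.
import numpy as np

def pareto_tests(X, c, theta0):
    n = len(X)
    theta_hat = n / (np.log(X).sum() - n * np.log(c))
    lrt  = 2 * n * (np.log(theta_hat / theta0) - 1 + theta0 / theta_hat)
    wald = n * (1 - theta_hat / theta0) ** 2
    rao  = n * (1 - theta0 / theta_hat) ** 2
    return lrt, wald, rao

rng = np.random.default_rng(1)
theta0, c, n = 2.0, 1.0, 5000
X = c * rng.uniform(size=n) ** (-1.0 / theta0)  # sample drawn under H0
print(pareto_tests(X, c, theta0))  # three nearby values, each ~ chi^2_1
```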

(2) Calculate the asymptotic relative efficiency of the empirical $p$-quantile and the estimator $\Phi^{-1}(p) S_n + \bar{X}_n$ for estimating the $p$-th quantile of the distribution of a sample from the univariate normal $N(\mu, \sigma^2)$ distribution (where $S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X}_n)^2$).

(a) Empirical quantile: The asymptotic variance of the empirical $p$-quantile can be obtained using the functional delta method as in Example 20.5 of VDV:
$$\sigma_1^2 = \frac{p(1-p)}{f(F^{-1}(p))^2} = \frac{p(1-p)\sigma^2}{\phi^2(\Phi^{-1}(p))},$$
where $F$ is the CDF of $N(\mu, \sigma^2)$ and $f$ is the corresponding density function, while $\Phi$ and $\phi$ are the CDF and PDF of the standard normal, respectively. So we have
$$\sqrt{n}\big(F_n^{-1}(p) - F^{-1}(p)\big) \xrightarrow{d} N\left(0, \frac{p(1-p)\sigma^2}{\phi^2(\Phi^{-1}(p))}\right).$$

(b) $\Phi^{-1}(p) S_n + \bar{X}_n$: First use the delta method on $S_n^2$ as in Example 3.2 of VDV, which gives

$$\sqrt{n}(S_n^2 - \sigma^2) \xrightarrow{d} N(0, 2\sigma^4);$$
applying the delta method again with $g(x) = \sqrt{x}$ gives
$$\sqrt{n}(S_n - \sigma) \xrightarrow{d} N\Big(0, \frac{1}{2}\sigma^2\Big).$$
Then note that $S_n$ and $\bar{X}_n$ are independent for Gaussian data, and that $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, so we obtain
$$
\begin{aligned}
\sqrt{n}\big(\Phi^{-1}(p) S_n + \bar{X}_n - F^{-1}(p)\big) &= \sqrt{n}\big(\Phi^{-1}(p) S_n + \bar{X}_n - (\Phi^{-1}(p)\sigma + \mu)\big) \\
&= \Phi^{-1}(p)\sqrt{n}(S_n - \sigma) + \sqrt{n}(\bar{X}_n - \mu) \\
&\xrightarrow{d} N\left(0, \Big(1 + \frac{(\Phi^{-1}(p))^2}{2}\Big)\sigma^2\right).
\end{aligned}
$$

As a result, the ARE between the two is
$$\mathrm{ARE} = \frac{1 + \frac{(\Phi^{-1}(p))^2}{2}}{\frac{p(1-p)}{\phi^2(\Phi^{-1}(p))}} = \frac{\phi^2(\Phi^{-1}(p))\Big(1 + \frac{(\Phi^{-1}(p))^2}{2}\Big)}{p(1-p)}.$$
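A small numerical sketch of this formula (using scipy's standard-normal `ppf`/`pdf`; not part of the original solution). At $p = 1/2$ it reduces to the familiar median-vs-mean value $2/\pi$:

```python
# Evaluate the ARE of the empirical p-quantile relative to Phi^{-1}(p)*S_n + X_bar.
from scipy.stats import norm

def are_quantile(p):
    z = norm.ppf(p)  # Phi^{-1}(p)
    return norm.pdf(z) ** 2 * (1 + z ** 2 / 2) / (p * (1 - p))

for p in (0.5, 0.75, 0.9):
    print(p, are_quantile(p))
# p = 0.5 prints 2/pi ~ 0.6366: the empirical quantile needs roughly pi/2
# times as many observations to match the normal-theory estimator.
```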

(3) Consider the corrupted mixture of Gaussians $F = (1-\epsilon)N(\mu, 1) + \epsilon N(\mu, \sigma^2)$, where $\epsilon$ and $\sigma$ are fixed with $0 \le \epsilon \le 1$ and $\sigma > 0$.
(a) Find the asymptotic relative efficiency (ARE) of the median relative to the mean as estimators for $\mu$, in terms of $\epsilon$ and $\sigma$.
(b) Prove that the ARE is an increasing function of $\sigma$ in the range $\sigma \in (1, \infty)$.

(a) Note that the mean of the distribution $F = (1-\epsilon)N(\mu, 1) + \epsilon N(\mu, \sigma^2)$ is $\mu$ and the variance is $(1-\epsilon) + \epsilon\sigma^2$ by the law of total variance. Therefore
$$\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, 1 - \epsilon + \epsilon\sigma^2).$$

For any distribution, note that the asymptotic distribution of the median $X_{(\lfloor n/2 \rfloor)}$ satisfies:

$$\sqrt{n}\big(X_{(\lfloor n/2 \rfloor)} - \mu\big) \xrightarrow{d} N\left(0, \frac{1}{4 f(F^{-1}(1/2))^2}\right),$$
where $f$ is the pdf and $F$ is the cdf. For the mixture $F = (1-\epsilon)N(\mu, 1) + \epsilon N(\mu, \sigma^2)$, $F^{-1}(1/2) = \mu$ and $f(F^{-1}(1/2)) = (1-\epsilon)(2\pi)^{-1/2} + \epsilon(2\pi\sigma^2)^{-1/2}$. Therefore the ARE of the median compared to the mean is:
$$\mathrm{ARE} = \frac{2}{\pi}(1 - \epsilon + \epsilon\sigma^2)(1 - \epsilon + \epsilon/\sigma)^2.$$

(b) To show that the ARE is an increasing function when $\sigma > 1$, it suffices to show that the derivative of $\log \mathrm{ARE}(\sigma)$ with respect to $\sigma$ is positive on that range:
$$\frac{d \log \mathrm{ARE}}{d\sigma} = \frac{2\epsilon\sigma}{1 - \epsilon + \epsilon\sigma^2} - \frac{2\epsilon}{\sigma^2 - \epsilon\sigma^2 + \epsilon\sigma}.$$
Combining the two fractions over a common denominator,
$$\frac{d \log \mathrm{ARE}}{d\sigma} = \frac{2\epsilon(1-\epsilon)(\sigma^3 - 1)}{(1 - \epsilon + \epsilon\sigma^2)(\sigma^2 - \epsilon\sigma^2 + \epsilon\sigma)}.$$
Clearly this derivative is positive in the range $\sigma > 1$, which shows that the ARE is an increasing function when $\sigma > 1$.
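A brief numeric check of part (b) (a sketch; the contamination level `eps` and the grid of $\sigma$ values are illustrative choices):

```python
# Check numerically that ARE(sigma) from part (a) increases for sigma > 1.
import numpy as np

def are_median_vs_mean(eps, sigma):
    return (2 / np.pi) * (1 - eps + eps * sigma**2) * (1 - eps + eps / sigma) ** 2

eps = 0.1
sigmas = np.linspace(1.0, 10.0, 50)
vals = are_median_vs_mean(eps, sigmas)
print(np.all(np.diff(vals) > 0))  # True: monotone increasing on this grid
# For large sigma the ARE exceeds 1: heavy contamination inflates the mean's
# variance while the median is barely affected.
```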

(4) Consider the Gaussian sequence model $Y_i = \mu_i + w_i$ for $i = 1, 2, \dots, n$, where the $(w_i)_{i=1}^n$ are i.i.d. with $w_i \sim N(0, \frac{1}{n})$ for each $i$. Further assume that $(\mu_i)_{i=1}^\infty$ is an infinite sequence that satisfies the following ellipsoid constraint:
$$\sum_{i=1}^\infty \mu_i^2 i^{2\alpha} \le 1,$$
where $\alpha > \frac{1}{2}$. Consider two estimators for the sequence $\mu^n = (\mu_i)_{i=1}^n$: (i) the maximum likelihood estimator $\hat{\mu}^n$ where $\hat{\mu}_i = Y_i$; and (ii) the truncated estimator $\tilde{\mu}^n$ where $\tilde{\mu}_i = Y_i \mathbf{1}(i \le n^{1/(2\alpha+1)})$.
(a) Prove that $\mathrm{MSE}(\hat{\mu}^n) := \sum_{i=1}^n \mathbb{E}[(\hat{\mu}_i - \mu_i)^2] = 1$ and $\mathrm{MSE}(\tilde{\mu}^n) \le 2n^{-2\alpha/(2\alpha+1)}$ for all $n$. Note that $\mathrm{MSE}(\tilde{\mu}^n)/\mathrm{MSE}(\hat{\mu}^n) \to 0$ as $n \to \infty$.
(b) Explain in words why your result from (a) does not contradict Theorem 8.11 (page 118) in VDV for the case $\ell = \ell_2$ where $T_n = \tilde{\mu}^n$.

(a) First, we compute $\mathrm{MSE}(\hat{\mu}^n)$:

$$\mathrm{MSE}(\hat{\mu}^n) = \sum_{i=1}^n \mathbb{E}(w_i^2) = \sum_{i=1}^n \frac{1}{n} = 1.$$

To compute $\mathrm{MSE}(\tilde{\mu}^n)$, we first decompose the sum as follows:
$$\mathrm{MSE}(\tilde{\mu}^n) = \sum_{i \le \lfloor n^{1/(2\alpha+1)} \rfloor} \mathbb{E}(w_i^2) + \sum_{i > \lfloor n^{1/(2\alpha+1)} \rfloor} \mu_i^2.$$
First we bound the first sum:
$$\sum_{i \le \lfloor n^{1/(2\alpha+1)} \rfloor} \mathbb{E}(w_i^2) = \frac{\lfloor n^{1/(2\alpha+1)} \rfloor}{n} \le n^{-2\alpha/(2\alpha+1)}.$$
Next we bound the second sum, using $i^{-2\alpha} \le n^{-2\alpha/(2\alpha+1)}$ for $i > n^{1/(2\alpha+1)}$:
$$\sum_{i > \lfloor n^{1/(2\alpha+1)} \rfloor} \mu_i^2 = \sum_{i > \lfloor n^{1/(2\alpha+1)} \rfloor} \mu_i^2 \, i^{2\alpha} i^{-2\alpha} \le n^{-2\alpha/(2\alpha+1)} \sum_{i > \lfloor n^{1/(2\alpha+1)} \rfloor} \mu_i^2 \, i^{2\alpha} \le n^{-2\alpha/(2\alpha+1)}.$$
Hence the proof for part (a) is complete.
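A simulation sketch of part (a). The specific sequence $\mu_i = (\sqrt{6}/\pi)\, i^{-(\alpha+1)}$ is an illustrative choice (not from the problem) that satisfies the ellipsoid constraint, since $\sum_i \mu_i^2 i^{2\alpha} = (6/\pi^2) \sum_i i^{-2} \le 1$:

```python
# Compare realized squared errors of the MLE and the truncated estimator.
import numpy as np

rng = np.random.default_rng(2)
alpha, n = 1.0, 10_000
i = np.arange(1, n + 1)

# An ellipsoid-feasible sequence: mu_i = (sqrt(6)/pi) * i^{-(alpha+1)}.
mu = (np.sqrt(6.0) / np.pi) * i.astype(float) ** -(alpha + 1)

w = rng.normal(scale=1 / np.sqrt(n), size=n)
Y = mu + w

# One random draw of each realized error; its expectation is the MSE.
err_mle = np.sum(w**2)                          # concentrates around 1
keep = i <= n ** (1 / (2 * alpha + 1))
mu_tilde = np.where(keep, Y, 0.0)
err_trunc = np.sum((mu_tilde - mu) ** 2)        # far below 1
print(err_mle, err_trunc, 2 * n ** (-2 * alpha / (2 * alpha + 1)))
```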

(b) There are two reasons why the result from part (a) does not violate Theorem 8.11 in VDV. The first reason is that Theorem 8.11 assumes the parameter space is fixed, while for the Gaussian sequence model defined here the parameter space grows as $n$ grows; hence Theorem 8.11 does not apply directly. The second reason is that Theorem 8.11 in VDV involves taking a supremum over local perturbations $h$ belonging to some set $I$. Part (a) proves that in the specific case $h = 0$, the MSE is asymptotically smaller for the shrinkage estimator $\tilde{\mu}^n$. Note that for any fixed $h \neq 0$ the ellipsoid constraint would be violated, so the bounds on the MSE of $\tilde{\mu}^n$ do not hold over the shrinking neighborhood. This again highlights the importance of the shrinking neighborhoods in Le Cam's theory.