
Stat 411 – Review problems for Exam 3 Solutions

1. (a) Since the Gamma(a, b) PDF is x^{a−1} e^{−x/b} / {b^a Γ(a)}, and the Exp(b) PDF is e^{−x/b}/b, it is clear that Exp(b) = Gamma(1, b); here you also need the fact that, for integer a, Γ(a) = (a − 1)!.

(b) The gamma PDF can be rewritten as

f(x) = exp{(a − 1) log x − (1/b)x − a log b − log Γ(a)}.

This is clearly in the form of a two-parameter exponential family with p_1(a, b) = a − 1, p_2(a, b) = −1/b, K_1(x) = log x, and K_2(x) = x.

(c) Let X ∼ Beta(a, 1), with PDF f_X(x) = a x^{a−1}, and Y = − log X. Then X = e^{−Y} and the transformation rule says the PDF f_Y(y) is

f_Y(y) = f_X(e^{−y}) e^{−y} = a (e^{−y})^{a−1} e^{−y} = a e^{−ay}.

This is the PDF of an Exp(1/a) = Gamma(1, 1/a) distribution.

(d) Let X ∼ Gamma(a, b). The transformation rule can be used again to show that Y = tX ∼ Gamma(a, tb) for t > 0. Perhaps a simpler argument is based on b being a scale parameter. That is, X = bV where V ∼ Gamma(a, 1). Writing tb in place of b in the previous statement proves the claim.

(e) The moment-generating function for X_i ∼ Gamma(a_i, b) is M_i(t) = (1 − bt)^{−a_i}, for t < 1/b; see page 151 in the book. Then the moment-generating function of ∑_{i=1}^n X_i is the product of the individual moment-generating functions:

∏_{i=1}^n M_i(t) = ∏_{i=1}^n (1 − bt)^{−a_i} = (1 − bt)^{−∑_{i=1}^n a_i}.

The latter expression is the moment-generating function of a Gamma(∑_{i=1}^n a_i, b) random variable and, by uniqueness of moment-generating functions, ∑_{i=1}^n X_i must have this distribution.

(f) The ChiSq(r) PDF is given on page 152 of the book. It is clear that this is just the special case Gamma(r/2, 2).
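As a sanity check, here is a small R sketch (not from the course materials) that verifies parts (e) and (f) numerically; the shape/scale parameterization of rgamma/qgamma matches the Gamma(a, b) PDF used above, and the shape values a_i below are arbitrary illustrative choices.

```r
## Monte Carlo check of (e): a sum of independent Gamma(a_i, b) variables
## should be Gamma(sum a_i, b).  Also a check of (f): ChiSq(r) = Gamma(r/2, 2).
set.seed(411)
a <- c(1.5, 2, 0.5)                       # illustrative shapes a_1, a_2, a_3
b <- 2                                    # common scale
S <- replicate(1e4, sum(rgamma(length(a), shape = a, scale = b)))
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(cbind(empirical = quantile(S, p),
            theory    = qgamma(p, shape = sum(a), scale = b)), 2)
## (f): ChiSq(r) quantiles equal Gamma(r/2, 2) quantiles exactly
r <- 7
all.equal(qchisq(p, df = r), qgamma(p, shape = r/2, scale = 2))
```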

2. (a) The likelihood function is

L(θ) = ∏_{i=1}^n X_i^{2−1} e^{−X_i/θ} / {θ² Γ(2)} ∝ θ^{−2n} e^{−(1/θ) ∑_{i=1}^n X_i}.

Calculus gives the MLE as θ̂ = X̄/2, and so the likelihood ratio is

Λ = L(θ0)/L(θ̂) = (n X̄ / (2 n θ0))^{2n} e^{2n} e^{−(1/θ0) n X̄}.

Let T = 2nX̄/θ0. Then the likelihood ratio statistic is proportional to g(T) with g(t) = (t/2)^{2n} e^{−t/2}. This function is increasing for a while, and then

decreasing so, overall, g(T) is small iff T < k1 or T > k2, where k1 < k2 are constants to be determined. To make the size equal α, we choose k1, k2 such that Pθ0(T < k1) = Pθ0(T > k2) = α/2. Since ∑_{i=1}^n X_i ∼ Gamma(2n, θ0) when H0 is true, it follows that T ∼ Gamma(2n, 2) = ChiSq(4n). Therefore, we take k1, k2 to be percentiles of the ChiSq(4n) distribution, i.e., the exact size-α likelihood ratio test rejects H0 in favor of H1 iff T < χ²_{4n,1−α/2} or T > χ²_{4n,α/2}.

(b) Wilks' theorem says that, if n is large, then −2 log Λ is approximately distributed as ChiSq(1). In that case, an approximate size-α likelihood ratio test rejects H0 in favor of H1 iff −2 log Λ > χ²_{1,α}. But −2 log Λ can be simplified:

−2 log Λ = −4n log(T/(4n)) − 4n + T.

So the approximately size-α likelihood ratio test rejects H0 iff −4n log(T/(4n)) − 4n + T ≥ χ²_{1,α}.

(c) The two tests in (a) and (b), respectively, are different. However, both are based on T and have cutoffs determined by chi-square distribution percentiles. For the exact test in (a), the power function can be derived by “inserting” the true θ in the denominator of T. That is,

pow(θ) = Pθ(T < k1) + Pθ(T > k2)

= Pθ{(θ0/θ)T < (θ0/θ)k1} + Pθ{(θ0/θ)T > (θ0/θ)k2}

= P{ChiSq(4n) < (θ0/θ)k1} + P{ChiSq(4n) > (θ0/θ)k2}

= H((θ0/θ)k1) + 1 − H((θ0/θ)k2),

where H is the CDF of a ChiSq(4n) distributed random variable, and k1 = χ²_{4n,1−α/2} and k2 = χ²_{4n,α/2}. The H function can be evaluated in R with the function pchisq. The power function for the approximate test in (b) can be evaluated exactly, and the calculation looks similar to that given above for the exact test, but it requires a numerical solution to a non-linear equation, which is outside the scope of what we’re doing here. But it can be found via Monte Carlo without much trouble; see the corresponding R code. Plots of these two power functions (n = 10, α = 0.05, and θ0 = 2) are given below.
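For concreteness, here is a minimal R sketch (an illustrative reconstruction, not the course's original code) that implements both tests and evaluates the two power functions, using the same n = 10, α = 0.05, θ0 = 2 as in the plot.

```r
## Exact and approximate LRTs from (a)-(b), and their power functions from (c).
n <- 10; alpha <- 0.05; theta0 <- 2
k1 <- qchisq(alpha / 2, df = 4 * n)        # chi^2_{4n, 1 - alpha/2}
k2 <- qchisq(1 - alpha / 2, df = 4 * n)    # chi^2_{4n, alpha/2}

## Test decisions (TRUE = reject H0) as functions of a data vector x:
exact.lrt  <- function(x) { Tstat <- 2 * sum(x) / theta0
                            (Tstat < k1) | (Tstat > k2) }
approx.lrt <- function(x) { Tstat <- 2 * sum(x) / theta0
                            -4 * n * log(Tstat / (4 * n)) - 4 * n + Tstat >=
                              qchisq(1 - alpha, df = 1) }

## Exact test's power, straight from the formula above:
pow.exact <- function(theta)
  pchisq((theta0 / theta) * k1, df = 4 * n) +
  1 - pchisq((theta0 / theta) * k2, df = 4 * n)

## Approximate test's power, by Monte Carlo:
pow.approx <- function(theta, M = 5000)
  mean(replicate(M, approx.lrt(rgamma(n, shape = 2, scale = theta))))

theta <- seq(1, 3, by = 0.5)
cbind(theta, exact = pow.exact(theta), approx = sapply(theta, pow.approx))
```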

[Figure: power functions pow(θ) of the exact and approximate likelihood ratio tests, plotted against θ, for n = 10, α = 0.05, θ0 = 2.]

3. In this case, the likelihood function for (θ1, θ2) is

L(θ1, θ2) = ∏_{i=1}^n (θ2 − θ1)^{−1} I_{[θ1,θ2]}(X_i) = (θ2 − θ1)^{−n} I_{[θ1,∞)}(X_{(1)}) I_{(−∞,θ2]}(X_{(n)}).

If the null hypothesis is true, and the distribution is symmetric, i.e., Unif(−θ, θ) for some θ > 0, then the MLE for the upper endpoint θ is θ̂⁰ = max{−X_{(1)}, X_{(n)}}; the superscript “0” is to indicate that this is the MLE under H0. This can be derived by manipulating the indicator functions in the likelihood. The overall MLE is given by θ̂1 = X_{(1)} and θ̂2 = X_{(n)}. Now the likelihood ratio statistic for testing H0 : θ1 = −θ2 (symmetry) is

L(−θ̂⁰, θ̂⁰) / L(θ̂1, θ̂2) = { (X_{(n)} − X_{(1)}) / (2 max{−X_{(1)}, X_{(n)}}) }^n.

4. Both examples are exponential families which, in general, have the monotone likelihood ratio property. However, this can be verified directly.

(a) Let θ1 > θ0. Then the likelihood ratio is

L(θ0)/L(θ1) = (θ1/θ0)^{2n} e^{(1/θ1 − 1/θ0) ∑_{i=1}^n X_i}.    (1)

Since θ1 > θ0, 1/θ1 − 1/θ0 < 0, and so the likelihood ratio is a decreasing function of T = ∑_{i=1}^n X_i. Therefore, Gamma(2, θ) has the monotone likelihood ratio property in T.

(b) Again, take θ1 > θ0. The likelihood ratio is

L(θ0)/L(θ1) = {Γ(θ1)/Γ(θ0)}^n ( ∏_{i=1}^n X_i )^{θ0 − θ1}.

Since the exponent on T = ∏_{i=1}^n X_i is negative, the ratio is decreasing in T. Therefore, Gamma(θ, 1) has the monotone likelihood ratio property in T.
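A quick numerical illustration in R of the two claims above; the values θ0 = 1, θ1 = 2, and n = 5 are arbitrary choices for illustration only.

```r
## Check that the likelihood ratios in (a) and (b) are decreasing in T.
theta0 <- 1; theta1 <- 2; n <- 5            # illustrative values, theta1 > theta0
lr.a <- function(T)                         # (a): T = sum of the X_i
  (theta1 / theta0)^(2 * n) * exp((1 / theta1 - 1 / theta0) * T)
lr.b <- function(T)                         # (b): T = product of the X_i
  (gamma(theta1) / gamma(theta0))^n * T^(theta0 - theta1)
T.grid <- seq(1, 50, by = 1)
c(a.decreasing = all(diff(lr.a(T.grid)) < 0),
  b.decreasing = all(diff(lr.b(T.grid)) < 0))
```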

5. (a) The Neyman–Pearson theorem says that the most powerful size-α test of H0 : θ = θ0 versus H1 : θ = θ1 rejects H0 iff L(θ0)/L(θ1) ≤ k, where k is chosen so that the size of the test is α. The likelihood ratio in (1) is decreasing in ∑_{i=1}^n X_i, so it is less than k iff ∑_{i=1}^n X_i is greater than some k′. From Problem 2(a), (2/θ0) ∑_{i=1}^n X_i ∼ ChiSq(4n), so the most powerful size-α test rejects H0 in favor of H1 iff (2/θ0) ∑_{i=1}^n X_i ≥ χ²_{4n,α}.

(b) Since the most-powerful test of H0 : θ = θ0 versus H1 : θ = θ1 does not depend on θ1, we can conclude that the test is actually uniformly most-powerful for the one-sided alternative H1 : θ > θ0. This conclusion could have been reached automatically based on the monotone likelihood ratio property shown in Problem 4(a).

(c) The power function is pow(θ) = Pθ{(2/θ0) ∑_{i=1}^n X_i ≥ χ²_{4n,α}}. The key is that, if θ is the true parameter, then (2/θ) ∑_{i=1}^n X_i ∼ ChiSq(4n). So we only need to do a bit of algebra:

pow(θ) = Pθ{(2/θ0) ∑_{i=1}^n X_i ≥ χ²_{4n,α}} = Pθ{(θ/θ0)(2/θ) ∑_{i=1}^n X_i ≥ χ²_{4n,α}}
= Pθ{(2/θ) ∑_{i=1}^n X_i ≥ (θ0/θ) χ²_{4n,α}} = P{ChiSq(4n) ≥ θ0 χ²_{4n,α}/θ}
= 1 − H(θ0 χ²_{4n,α}/θ),

where H is the CDF of a ChiSq(4n) distributed random variable. A plot of this function versus the power function for the exact likelihood ratio test in Problem 2(c) above (with n = 10, α = 0.05, θ0 = 2) is shown below. As expected, the UMP test has larger power for all θ > 2.
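The comparison can be computed directly in R with pchisq; a minimal sketch, using the same n = 10, α = 0.05, θ0 = 2 as in the plot (the plotting commands below are just one way to draw it):

```r
## Power of the UMP test versus the exact LRT from Problem 2(c).
n <- 10; alpha <- 0.05; theta0 <- 2
cut.ump <- qchisq(1 - alpha, df = 4 * n)   # chi^2_{4n, alpha}, upper-tail cutoff
k1 <- qchisq(alpha / 2, df = 4 * n)
k2 <- qchisq(1 - alpha / 2, df = 4 * n)
pow.ump <- function(theta) 1 - pchisq(theta0 * cut.ump / theta, df = 4 * n)
pow.lrt <- function(theta)
  pchisq((theta0 / theta) * k1, df = 4 * n) +
  1 - pchisq((theta0 / theta) * k2, df = 4 * n)
theta <- seq(2, 3, length.out = 101)
plot(theta, pow.ump(theta), type = "l", xlab = "theta", ylab = "pow(theta)")
lines(theta, pow.lrt(theta), lty = 2)
legend("topleft", legend = c("UMP", "Exact LRT"), lty = 1:2)
```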

[Figure: power functions pow(θ) of the UMP test and the exact LRT, plotted against θ, for n = 10, α = 0.05, θ0 = 2.]

6. The goal is to test H0 : µ = 140 versus H1 : µ > 140. The observed data give the summary statistics n = 12, x̄ = 145, and s = 8. All units are milligrams.

(a) Since the distribution is assumed to be normal with both mean and variance unknown, a t-test seems most appropriate. The t-test will reject H0 if and only if

T := (X̄ − µ0) / (S/n^{1/2}) > t*_{n−1,1−α}.

Here T has a Student-t distribution with n − 1 degrees of freedom when H0 is true; then t*_{n−1,1−α} is the value t such that P(T > t) = α. With n = 12 and α = 0.05, we get t*_{11,0.95} = 1.796. This value can be found from the t-table or from software (in R, type qt(0.95, df=11)).

(b) For the given data, the t-statistic T defined above equals 2.17. Therefore, since 2.17 > 1.796, we should reject H0. Also, the p-value for the test is

pval = Pµ=140(T > 2.17) ≈ 0.027.
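As a quick check, here is a small R sketch of the calculations in (a) and (b), using only the summary statistics given in the problem.

```r
## t-test of H0: mu = 140 versus H1: mu > 140 from the summary statistics.
n <- 12; xbar <- 145; s <- 8; mu0 <- 140; alpha <- 0.05
Tstat <- (xbar - mu0) / (s / sqrt(n))     # observed t-statistic, about 2.17
crit  <- qt(1 - alpha, df = n - 1)        # t*_{11, 0.95}, about 1.796
pval  <- 1 - pt(Tstat, df = n - 1)        # one-sided p-value, about 0.027
c(t.stat = Tstat, cutoff = crit, p.value = pval)
Tstat > crit                              # TRUE: reject H0 at the 5% level
```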

The numerical value can be found from a t-table or with software (in R, type 1-pt(2.17, df=11)). In any case, the p-value is less than the specified α = 0.05, so we should still reject H0.

7. (a) In the hypothesis testing context, the size of the test is the probability of making a Type I error, i.e., the probability of rejecting the null hypothesis when it’s actually true. Since size is an error probability, it makes intuitive sense that this number should be small. However, if we force the size to be zero, then, in general, the test will never reject. For example, suppose X ∼ N(θ, 1) and the goal is to test H0 : θ = 0 versus H1 : θ > 0. The uniformly most powerful size-α test rejects H0 iff X > z*_α, where 1 − Φ(z*_α) = α. If α = 0, then the testing rule is to reject H0 iff X = ∞. Since this can never happen, the test will never reject H0. This is still a valid test, but it’s lousy because its power function is identically zero.

(b) It is not obvious why having n → ∞ is meaningful in statistics; in every real problem, n is surely finite. But letting n → ∞ reveals the common features that many seemingly different problems have. This, in turn, gives us some intuition about how problems should work. More practically, problems are often easier to understand and solve as n → ∞. For example, the exact distribution, with finite n, of the MLE can be difficult, if not impossible, to pin down because, e.g., there is no closed-form expression for the likelihood equation solution. But assuming n is large allows us to approximate the sampling distribution of the MLE by something simple, namely, a normal distribution. This approximate normality can be used to, say, construct a confidence interval. Similarly, in a hypothesis testing context, the exact, finite-n, sampling distribution of the likelihood ratio may not be available. However, Wilks’ theorem says that, if n is large, then we can approximate that exact sampling distribution by a familiar chi-square. So the point of asymptotic (n → ∞) theory is to help develop intuition and to give approximate solutions to important problems where the exact solution may not be available.

(c) When we say that parameter values with large likelihood values are “better” than parameter values with small likelihood, we are talking about how well the postulated model fits the observed data. When the model assumes X1, . . . , Xn iid ∼ fθ(x), then the likelihood function L(θ) = ∏_{i=1}^n fθ(Xi) is the joint PDF/PMF of the observed data. To avoid technical problems, assume the data are discrete-valued, so that the likelihood function is the joint PMF of the data, as a function of the parameter. The starting point is the following intuition: we expect to see data that the model gives high probability to. Now, in our case, the model/parameter is uncertain, so we flip this intuition around and seek a model/parameter that gives high probability to the data we actually saw. That is, we prefer θ values with large L(θ) values.
