B90.3302 / C22.0015 Exam of 2011.MAR.30 Solutions

1. The factorization theorem is the powerful tool that we use to identify sufficient statistics. For each of the following cases, find the sufficient statistic. In some cases, no simplification works, and you’ll have to say “the whole sample is needed for the sufficient statistic.”

(a) X1, X2, …, Xn is a sample from the Poisson distribution with mean λ. The probability law is $P[X = x] = \frac{\lambda^x}{x!} e^{-\lambda}\, I(x \in \{0, 1, 2, \ldots\})$.

(b) X1, X2, …, Xn is a sample from the Cauchy distribution with median ω. The probability density is $f(x) = \frac{1}{\pi} \times \frac{1}{1 + (x - \omega)^2}$.

(c) X1, X2, …, Xn is a sample from the logistic distribution with median µ. The cumulative distribution function is $F(x) = \frac{1}{1 + e^{-(x - \mu)}}$ and the density is $f(x) = \frac{e^{-(x - \mu)}}{\left(1 + e^{-(x - \mu)}\right)^2}$. (You do not need F to do this problem.)

(d) X1, X2, …, Xn is a sample from the lognormal distribution with parameters µ and σ. (Do not think of µ and σ as the mean and standard deviation.) The density is $f(x) = \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}$.

(e) The values c1, c2, …, cn are n known positive numbers. The random variables X1, X2, …, Xn are independent, and the distribution of Xi is Poisson with mean ci θ.

SOLUTION:

(a) The likelihood is
$$\prod_{i=1}^n \frac{\lambda^{x_i}}{x_i!}\, e^{-\lambda} \;=\; e^{-n\lambda} \times \frac{1}{\prod_{i=1}^n x_i!} \times \lambda^{\sum_{i=1}^n x_i} .$$
The only part of the likelihood in which the $x_i$’s and λ cannot be separated is the third factor. There appears $\sum_{i=1}^n x_i$, so this must be the sufficient statistic. In random variable form, this is $\sum_{i=1}^n X_i$. It would also be reasonable to use $\bar{x}$ as the sufficient statistic.

(b) The likelihood is
$$\frac{1}{\pi^n \prod_{i=1}^n \left( 1 + (x_i - \omega)^2 \right)} .$$
There is no way to separate the $x_i$’s from ω, so the whole sample is needed for the sufficient statistic.
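As an informal numerical check on the part (a) conclusion, here is a minimal sketch, assuming NumPy and SciPy are available; the sample values are made up for illustration. If $\sum x_i$ is sufficient, then two samples with the same sum have likelihoods differing only by a factor free of λ, so their ratio should be constant in λ:

```python
import numpy as np
from scipy.stats import poisson

# Two hypothetical samples with the same sum: 3 + 1 + 2 = 2 + 2 + 2 = 6.
data1 = np.array([3, 1, 2])
data2 = np.array([2, 2, 2])

for lam in [0.5, 1.0, 2.0, 5.0]:
    L1 = poisson.pmf(data1, lam).prod()
    L2 = poisson.pmf(data2, lam).prod()
    # The ratio equals (2!*2!*2!)/(3!*1!*2!) = 2/3, whatever lambda is.
    print(lam, L1 / L2)
```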


(c) The likelihood is
$$\prod_{i=1}^n \frac{e^{-(x_i - \mu)}}{\left( 1 + e^{-(x_i - \mu)} \right)^2} .$$
The numerator will factor neatly, but the denominator hopelessly entangles the $x_i$’s with µ. The whole sample is needed for the sufficient statistic.

(d) The likelihood is
$$\prod_{i=1}^n \frac{1}{\sigma x_i \sqrt{2\pi}}\, e^{-\frac{(\log x_i - \mu)^2}{2\sigma^2}} \;=\; \frac{1}{\sigma^n (2\pi)^{n/2}} \times \frac{1}{\prod_{i=1}^n x_i} \times e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (\log x_i - \mu)^2} .$$
The first factor has parameters but no $x_i$’s, and the second factor has $x_i$’s but no parameters. The sufficient statistic must therefore come from the exponent in the third factor. Write now
$$-\frac{1}{2\sigma^2} \sum_{i=1}^n (\log x_i - \mu)^2 \;=\; -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^n (\log x_i)^2 - 2\mu \sum_{i=1}^n \log x_i + n\mu^2 \right] .$$
This reveals that the sufficient statistic is $\left( \sum_{i=1}^n (\log x_i)^2 ,\; \sum_{i=1}^n \log x_i \right)$.

The recommended way to deal with lognormal random variables is by taking logarithms. Let $Y_i = \log X_i$. Then Y1, Y2, …, Yn will be a normal sample, for which the sufficient statistic is $\left( \sum_{i=1}^n y_i^2 ,\; \sum_{i=1}^n y_i \right)$.

(e) The likelihood is
$$\prod_{i=1}^n \frac{(c_i \theta)^{x_i}}{x_i!}\, e^{-c_i \theta} \;=\; e^{-\theta \sum_{i=1}^n c_i} \times \frac{\prod_{i=1}^n c_i^{x_i}}{\prod_{i=1}^n x_i!} \times \theta^{\sum_{i=1}^n x_i} .$$
The only part of the likelihood in which the $x_i$’s and θ cannot be separated is the third factor. There appears $\sum_{i=1}^n x_i$, so this must be the sufficient statistic. In random variable form, this is $\sum_{i=1}^n X_i$. It would also be reasonable to use $\bar{x}$ as the sufficient statistic.
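It may help to see the factorization written out explicitly for (e); note that the $h(x)$ part of the factorization theorem is allowed to involve the known constants $c_i$, since they are not parameters:

$$L(\theta) \;=\; \underbrace{e^{-\theta \sum_{i=1}^n c_i}\; \theta^{\sum_{i=1}^n x_i}}_{g\left( \sum_i x_i ;\, \theta \right)} \;\times\; \underbrace{\frac{\prod_{i=1}^n c_i^{x_i}}{\prod_{i=1}^n x_i!}}_{h(x)}$$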


2. Suppose that x1, x2, … , xn are known numbers. The random variables Y1, Y2, …, Yn are independent, and the distribution of Yi is normal $N\!\left( 0, \frac{\sigma^2}{x_i^2} \right)$. The standard deviation of Yi is $\frac{\sigma}{x_i}$. The only unknown is σ.

(a) Construct the likelihood for Y1, Y2, …, Yn.
(b) Identify the sufficient statistic. If the sufficient statistic is complicated, indicate why this is so.
(c) Find the maximum likelihood estimate for σ.
(d) Show that $E\, \hat{\sigma}^2_{ML} = \sigma^2$. That is, the maximum likelihood estimate of $\sigma^2$ is unbiased.

SOLUTION: For part (a), just write

$$L \;=\; \prod_{i=1}^n \frac{x_i}{\sigma \sqrt{2\pi}}\, e^{-\frac{x_i^2 y_i^2}{2\sigma^2}} \;=\; \frac{1}{\sigma^n (2\pi)^{n/2}} \left( \prod_{i=1}^n x_i \right) e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 y_i^2}$$

(b) The likelihood involves only one expression in the $y_i$’s, and that must be the sufficient statistic. It is $\sum_{i=1}^n x_i^2 y_i^2$. In random variable form, this would be $\sum_{i=1}^n x_i^2 Y_i^2$.

(c) Just solve $\frac{d}{d\sigma} \log L \stackrel{\text{let}}{=} 0$. Since $\log L = \sum_{i=1}^n \log x_i - n \log \sigma - \frac{n}{2} \log(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 y_i^2$, this is

$$-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n x_i^2 y_i^2 \;\stackrel{\text{let}}{=}\; 0$$

This reduces to $n = \frac{1}{\sigma^2} \sum_{i=1}^n x_i^2 y_i^2$, and thus $\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^n x_i^2 y_i^2$. In random variable form, this is $\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^n x_i^2 Y_i^2$. The maximum likelihood estimate of the standard deviation is of course the square root of this.

(d) Since $E\, Y_i^2 = \mathrm{Var}\, Y_i = \frac{\sigma^2}{x_i^2}$, it follows quickly that $E\, \hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^n x_i^2\, E\, Y_i^2 = \frac{1}{n} \sum_{i=1}^n \sigma^2 = \sigma^2$.
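A quick simulation makes part (d) plausible. This is a minimal sketch assuming NumPy; the $x_i$ values and the true σ below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical known x_i's and a hypothetical true sigma.
x = np.array([0.5, 1.0, 2.0, 4.0])
sigma = 3.0
reps = 200_000

# Each row is one sample: Y_i ~ N(0, sigma^2 / x_i^2).
Y = rng.normal(0.0, sigma / x, size=(reps, x.size))

# The MLE of sigma^2 for each sample: (1/n) * sum_i x_i^2 y_i^2.
sigma2_hat = (x**2 * Y**2).mean(axis=1)

# The average of the estimates should be close to sigma^2 = 9.
print(sigma2_hat.mean())
```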


3. The only data you’ve got is X, and it follows a binomial distribution with n = 6 and with unknown p. You are going to test H0: p = 0.40 versus H1: p ≠ 0.40. You have decided to use rejection set R = {x = 0} ∪ {x = 6}.

(a) Find the probability of type I error for this test.
(b) Find the probability of type II error if really p = 0.90.
(c) Find the probability of type II error if really p = 1.00.
(d) Find the probability of type II error if really p = 0.50.

SOLUTION:

(a) This is $P[\,X = 0 \mid p = 0.40\,] + P[\,X = 6 \mid p = 0.40\,] = 0.60^6 + 0.40^6 = 0.046656 + 0.004096 = 0.050752$. This is very close to 5%.

(b) This is $1 - \left\{ P[\,X = 0 \mid p = 0.90\,] + P[\,X = 6 \mid p = 0.90\,] \right\} = 1 - \left\{ 0.10^6 + 0.90^6 \right\} = 1 - \left\{ 0.000001 + 0.531441 \right\} = 1 - 0.531442 = 0.468558 \approx 47\%$.

(c) The type II error probability is zero. If p = 1.00, you are certain to see X = 6, and you cannot make an error.

(d) This is $1 - \left\{ P[\,X = 0 \mid p = 0.50\,] + P[\,X = 6 \mid p = 0.50\,] \right\} = 1 - \left\{ 0.50^6 + 0.50^6 \right\} = 1 - 2 \times 0.015625 = 1 - 0.03125 = 0.96875 \approx 97\%$. You are likely to make this error!
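The four answers are easy to reproduce; here is a small sketch in plain Python (no libraries needed). The test rejects exactly when X = 0 or X = 6, so the rejection probability at any p is $(1 - p)^6 + p^6$:

```python
# Probability that the test rejects H0 when the true success probability is p.
def reject_prob(p, n=6):
    return (1 - p) ** n + p ** n  # P[X = 0] + P[X = 6]

print(reject_prob(0.40))       # (a) type I error: 0.050752
print(1 - reject_prob(0.90))   # (b) type II error at p = 0.90: 0.468558
print(1 - reject_prob(1.00))   # (c) type II error at p = 1.00: 0.0
print(1 - reject_prob(0.50))   # (d) type II error at p = 0.50: 0.96875
```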

4. Greene County has undertaken a survey related to recycling of beverage bottles. A random sample was taken of n households, asking whether they recycle beverage bottles. The survey modeled X, the number responding “yes,” as binomial (n, p), meaning

$P[\,X = x\,] = \binom{n}{x} p^x (1 - p)^{n - x}$. Each household that responded “yes” was further asked for the number of bottles recycled. It was assumed that the number of bottles follows a Poisson distribution with mean λ. Let T be the total number of bottles resulting from this process. Find the mean and variance of T.

SOLUTION: The conditional law of T, given X = x, is Poisson with parameter x λ. It’s just the sum of x independent Poisson random variables. Thus

$$E(\,T \mid X = x\,) = x\lambda \qquad\qquad \mathrm{Var}(\,T \mid X = x\,) = x\lambda$$

In random variable form, these are

$$E(\,T \mid X\,) = X\lambda \qquad\qquad \mathrm{Var}(\,T \mid X\,) = X\lambda$$


Then E( T ) = E { E( T | X ) } = E(X λ ) = npλ.

Also Var( T ) = Var { E( T | X ) } + E { Var( T | X ) }

= Var { X λ } + E { X λ }

= λ² Var(X) + λ E X = λ² np(1 – p) + λ np

= λ np ( λ(1 – p) + 1 )
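A simulation confirms both moments. This is a minimal sketch assuming NumPy, with made-up values for n, p, and λ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical survey parameters.
n, p, lam = 50, 0.3, 4.0
reps = 200_000

X = rng.binomial(n, p, size=reps)  # number of "yes" households
T = rng.poisson(lam * X)           # total bottles: Poisson with parameter X*lambda

print(T.mean(), n * p * lam)                       # both near 60
print(T.var(), lam * n * p * (lam * (1 - p) + 1))  # both near 228
```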

5. X1, X2, …, Xn is a sample from the probability law which is uniform on the interval [ −θ, 5θ ]. The density is $f(x \mid \theta) = \frac{1}{6\theta}\, I(-\theta \le x \le 5\theta)$.

(a) Find the method of moments estimate for θ.
(b) Find the maximum likelihood estimate for θ.

SOLUTION: For part (a), just notice that for any individual Xi, $E(X_i) = 2\theta$, as 2θ is midway between −θ and 5θ. Replace $E(X_i)$ by $\bar{X}$, leading to $\bar{X} = 2\hat{\theta}_{MM}$. This leads to $\hat{\theta}_{MM} = \frac{\bar{X}}{2}$.

For (b), write the likelihood for the sample as

$$L \;=\; \prod_{i=1}^n \frac{1}{6\theta}\, I(-\theta \le x_i \le 5\theta) \;=\; \frac{1}{(6\theta)^n}\, I(-\theta \le x_{\min})\, I(x_{\max} \le 5\theta)$$

$$=\; \frac{1}{(6\theta)^n}\, I(-\theta \le x_{\min})\, I\!\left( \frac{x_{\max}}{5} \le \theta \right) \;=\; \frac{1}{(6\theta)^n}\, I(-x_{\min} \le \theta)\, I\!\left( \frac{x_{\max}}{5} \le \theta \right)$$

$$=\; \frac{1}{(6\theta)^n}\, I\!\left( \max\left\{ -x_{\min},\, \frac{x_{\max}}{5} \right\} \le \theta \right)$$

In maximizing L, you want the estimate for θ to be as small as possible. This results in $\hat{\theta}_{ML} = \max\left\{ -x_{\min},\, \frac{x_{\max}}{5} \right\}$. In random variable form, this is $\hat{\theta}_{ML} = \max\left\{ -X_{\min},\, \frac{X_{\max}}{5} \right\}$.
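Both estimates are one-liners to compute. A minimal sketch assuming NumPy, with a made-up true θ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true theta; draw a sample uniform on [-theta, 5*theta].
theta = 2.0
x = rng.uniform(-theta, 5 * theta, size=100)

theta_mm = x.mean() / 2                # method of moments: x-bar / 2
theta_ml = max(-x.min(), x.max() / 5)  # MLE: max{-x_min, x_max / 5}

print(theta_mm, theta_ml)  # both should land near theta = 2.0
```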
