GIDP Ph.D. Qualifying Exam Theory May 27, 2014, 9:00am-1:00pm

Instructions: Provide answers on the supplied pads of paper; write on only one side of each sheet. Complete exactly 2 of the first 3 problems, and 2 of the last 3 problems. Turn in only those sheets you wish to have graded. Stay calm and do your best; good luck.

1. An urn initially contains b black balls and r red balls. A ball is drawn at random and then replaced along with another ball of the same color. This procedure is then repeated one more time. Let Xn equal 0 if the ball at the nth draw is black and 1 if it is red. Let Y = X1 + X2. Calculate the mean and variance of Y.

2. (In this question, you may assume without proof that all moments required exist, are finite, and are non-zero.) For a random variable Y with mean µ_Y and variance σ_Y^2, heavy-tailedness may be measured by the kurtosis

τ_Y = E[(Y − µ_Y)^4] / σ_Y^4.

(a) If Y_1, Y_2, ···, Y_n are i.i.d. from a distribution with mean µ_Y = 0, kurtosis τ_Y and variance σ_Y^2, show that T = Σ_{i=1}^n Y_i has kurtosis

τ_T = τ_Y/n + 3(n − 1)/n.

(b) Use part (a) to find the kurtosis of the Bin(n, p) distribution.

3. Let (X_i, Y_i), i = 1, 2, ··· be a sequence of i.i.d. random vectors, where X_i ∼ exp(θ) for each i, with the pdf

f_θ(x) = (1/θ) exp(−x/θ), x > 0,

with θ > 0 and mean E(X_i) = θ. Independently, Y_i ∼ exp(1/θ), i = 1, ···, n, are independent and identically distributed. Write

X̄_n = (1/n) Σ_{i=1}^n X_i  and  Ȳ_n = (1/n) Σ_{i=1}^n Y_i.

(a) Consider a linear estimator θ̂_a = a X̄_n, where a is a constant. Find the value of a that minimizes E(θ̂_a − θ)^2.

(b) Show that X̄_n is a consistent and asymptotically normal estimator of θ and find its asymptotic variance.

(c) Show that 1/Ȳ_n is a consistent and asymptotically normal estimator of θ and find its asymptotic variance.

4. Let X_1, ···, X_n be a random sample from the distribution with the density

f(x|θ_1, θ_2) = (θ_1 + θ_2)^{−1} exp(−x/θ_1),  x > 0,
f(x|θ_1, θ_2) = (θ_1 + θ_2)^{−1} exp(x/θ_2),   x < 0,

where θ_1 > 0 and θ_2 > 0 are unknown parameters.

(a) Find Pr(X_1 > 0).

(b) Is this an exponential family? Justify your answer.

(c) Obtain a sufficient statistic for θ_1 and θ_2.

(d) Suppose that you observed the following data set of size n = 3:

x1 = 2.0, x2 = −0.6, x3 = −0.4.

Use these data to find the maximum likelihood estimates for θ1 and θ2.

5. Let (Y_i1, Y_i2)^T, i = 1, ···, n be n independent random vectors which are distributed as bivariate Normal N((µ_i, µ_i)^T, Σ) with µ_i = α + βx_i for i = 1, ···, n and covariance matrix

Σ = σ^2 [ 1  ρ ; ρ  1 ].

The parameters α, β, σ^2, ρ are unknown. Here x_1, ···, x_n are known constants. Define

Ȳ_·· = (1/n) Σ_{i=1}^n Ȳ_i·,  x̄ = (1/n) Σ_{i=1}^n x_i,  and  Ȳ_i· = (Y_i1 + Y_i2)/2, i = 1, ···, n.

(a) Obtain a minimal sufficient statistic for (α, β, σ^2, ρ).

(b) It can be shown that the MLE of β is

β̂_MLE = Σ_{i=1}^n (x_i − x̄)(Ȳ_i· − Ȳ_··) / Σ_{i=1}^n (x_i − x̄)^2.

Obtain E(β̂_MLE). Establish that β̂_MLE is the UMVUE of β.

(c) Establish the distribution of

S_1^2 = Σ_{i=1}^n (Y_i1 − Y_i2)^2 = 2 Σ_{i=1}^n Σ_{j=1}^2 (Y_ij − Ȳ_i·)^2.

(d) Use S_1^2 to obtain the UMVUE for θ = σ^2(1 − ρ).

6. Assume that a chemical experiment can produce one of five possible consequences, with the following probabilities:

Consequence (k):  1    2     3      4     5
Probability:      p^4  4p^3q 6p^2q^2 4pq^3 q^4

where 0 < p < 1 is unknown and q = 1 − p. Now we repeat the chemical experiment n times. Let X_k denote the total count of runs with Consequence k among the n runs, for k = 1, ..., 5. Assume the consequences of individual runs are independent.

(a) Show that the distribution of the counts (X_1, X_2, X_3, X_4, X_5) belongs to the exponential family.

(b) Show that T = 4X_1 + 3X_2 + 2X_3 + X_4 is a sufficient statistic for p.

(c) Is T a complete statistic? Justify your answer.

(d) Show that T ∼ Bin(4n, p).

(e) Derive the uniformly most powerful (UMP) level α test of the hypotheses

H_0: p ≤ 1/2  vs  H_1: p > 1/2.

For each part, you are allowed to use results or conclusions drawn in the earlier parts.

Solutions:

1. X_1 is obviously a Bernoulli r.v. with p = r/(b + r). X_2 is also a Bernoulli r.v. with

p = r/(r + b) · (r + 1)/(r + b + 1) + b/(r + b) · r/(r + b + 1) = r/(r + b).

Therefore, E(Y) = E(X_1) + E(X_2) = 2r/(r + b).

Var(Y ) = Var(X1) + Var(X2) + 2(E(X1X2) − E(X1)E(X2)) r b r r + 1 r r = 2 · + 2( · − · ) r + b r + b r + b r + b + 1 b + r b + r br b + r + 2 = 2 · . (b + r)2 b + r + 1

2. (a) Without loss of generality, assume µ_Y = 0. First, it is easy to find σ_T^2 = nσ_Y^2. To relate E(T^4) to E(Y^4), we only need to consider terms with all even powers, because any term with an odd power involves a factor E(Y_i) = 0. The number of terms of the form E(Y_i^2 Y_j^2) with i ≠ j is C(4,2)·C(n,2) = 6 · n(n − 1)/2; therefore,

τ_T = E[(Σ_{i=1}^n Y_i)^4] / σ_T^4 = [ n E(Y^4) + 6 · n(n − 1)/2 · (E(Y^2))^2 ] / (n^2 σ_Y^4) = τ_Y/n + 3(n − 1)/n.
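This identity can be verified exactly (no simulation needed) for any mean-zero discrete distribution by convolving its pmf n times; the three-point distribution below is an arbitrary choice with mean zero.

```python
def convolve(p1, p2):
    """pmf of the sum of two independent discrete random variables."""
    out = {}
    for v1, q1 in p1.items():
        for v2, q2 in p2.items():
            out[v1 + v2] = out.get(v1 + v2, 0.0) + q1 * q2
    return out

def moment(p, k):
    """k-th moment about 0 of a pmf given as {value: probability}."""
    return sum(q * v ** k for v, q in p.items())

# Arbitrary mean-zero distribution for Y: values -1, 0, 2.
pY = {-1: 0.5, 0: 0.25, 2: 0.25}
assert abs(moment(pY, 1)) < 1e-12      # mean is zero

tau_Y = moment(pY, 4) / moment(pY, 2) ** 2

n = 7
pT = pY
for _ in range(n - 1):                 # pmf of T = Y_1 + ... + Y_n
    pT = convolve(pT, pY)

tau_T = moment(pT, 4) / moment(pT, 2) ** 2
print(tau_T, tau_Y / n + 3 * (n - 1) / n)
```

The two printed values agree up to floating-point error, confirming τ_T = τ_Y/n + 3(n − 1)/n for this example.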

(b) Using the result from part (a), we first get the kurtosis of a Bernoulli r.v.

τ_Y = [ p(1 − p)^4 + (1 − p)(0 − p)^4 ] / [p(1 − p)]^2 = [1 − 3p(1 − p)] / [p(1 − p)].

Hence, by part (a), the kurtosis for Bin(n, p) is τ_T = τ_Y/n + 3(n − 1)/n.

3. (a) Let h(a) denote the MSE at a and decompose it as

ˆ 2 ˆ 2 ˆ h(a) = E(θa − θ) = [Bias(θa)] + Var(θa) θ2 = (a − 1)2θ2 + a2 n n Minimizing h(a), we get a = n+1 . (b) Consistency can be showed by LLN. By CLT, √ ¯ D 2 n(Xn − θ) → N(0, θ ).

(c) By the CLT,

√n (Ȳ_n − 1/θ) →_D N(0, 1/θ^2).

Applying the delta method with g(y) = 1/y, so that g′(1/θ) = −θ^2, we obtain

√n (1/Ȳ_n − θ) →_D N(0, θ^2).
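A Monte Carlo check of this limiting variance: simulate Y_i with mean 1/θ, and compare the empirical variance of √n(1/Ȳ_n − θ) with θ^2. The values of θ, n, and the number of replications below are arbitrary.

```python
import random

theta, n, reps = 2.0, 400, 2000
rng = random.Random(0)

stats = []
for _ in range(reps):
    # Y_i ~ exponential with mean 1/theta, i.e. rate theta.
    ybar = sum(rng.expovariate(theta) for _ in range(n)) / n
    stats.append(n ** 0.5 * (1 / ybar - theta))

m = sum(stats) / reps
v = sum((s - m) ** 2 for s in stats) / reps
print(v, theta ** 2)       # empirical vs asymptotic variance
```

For θ = 2 the asymptotic variance is 4, and the Monte Carlo estimate should be close, up to finite-n and sampling error.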

4. (a) P(X_1 > 0) = ∫_0^∞ (θ_1 + θ_2)^{−1} exp{−x/θ_1} dx = θ_1/(θ_1 + θ_2).

(b) The density of X can be expressed as

f(x|θ_1, θ_2) = 1/(θ_1 + θ_2) · exp[ −xI(x > 0)/θ_1 + xI(x < 0)/θ_2 ].

It is a two-dimensional exponential family with T_1(X) = −XI(X > 0) and T_2(X) = XI(X < 0), and natural parameters w_1(θ) = 1/θ_1 and w_2(θ) = 1/θ_2.

(c) The pair (−Σ_{i=1}^n x_i I(x_i > 0), Σ_{i=1}^n x_i I(x_i < 0)) is sufficient for (θ_1, θ_2).

(d) The log-likelihood is

log L(θ_1, θ_2 | x_1, ···, x_n) = −n log(θ_1 + θ_2) − x_+/θ_1 + x_−/θ_2,

where x_+ = Σ_{i=1}^n x_i I(x_i > 0) and x_− = Σ_{i=1}^n x_i I(x_i < 0). Setting the first-order partial derivatives with respect to θ_1 and θ_2 to zero gives the system

x_+/θ_1^2 = n/(θ_1 + θ_2),
x_−/θ_2^2 = −n/(θ_1 + θ_2).

Plugging in x_+ = Σ x_i I(x_i > 0) = 2 and x_− = Σ x_i I(x_i < 0) = −1 with n = 3 and solving yields

θ̂_1 = (2 + √2)/3,  θ̂_2 = (1 + √2)/3.

(One should also check the second-order conditions.)

5. (a) Note that the inverse Σ^{−1} = [ a b ; b a ] where a = 1/(σ^2(1 − ρ^2)) and b = −ρ/(σ^2(1 − ρ^2)). There is a one-to-one correspondence between the pair (ρ, σ^2) and (a, b). The determinant |Σ| = σ^4(1 − ρ^2). The joint pdf of (Y_i1, Y_i2)^T, i = 1, ···, n, is

Π_{i=1}^n f(y_i1, y_i2 | α, β, a, b) = (2π)^{−n} (a^2 − b^2)^{n/2} exp{−T/2},

where, writing Y_i = (Y_i1, Y_i2)^T and µ_i = (α + βx_i)(1, 1)^T,

T = Σ_{i=1}^n [ Y_i^T Σ^{−1} Y_i − 2 Y_i^T Σ^{−1} µ_i + µ_i^T Σ^{−1} µ_i ]
  = Σ_{i=1}^n [ a Y_i1^2 + 2b Y_i1 Y_i2 + a Y_i2^2 − 2α(a + b)(Y_i1 + Y_i2) − 2β(a + b)(Y_i1 + Y_i2)x_i ] + constant
  = a Σ_{i=1}^n Σ_{j=1}^2 Y_ij^2 + 2b Σ_{i=1}^n Y_i1 Y_i2 − 4nα(a + b) Ȳ_·· − 4β(a + b) Σ_{i=1}^n x_i Ȳ_i· + constant.

Therefore, (Σ_{i,j} Y_ij^2, Σ_i Y_i1 Y_i2, Ȳ_··, Σ_i x_i Ȳ_i·) is complete and sufficient for (a, b, α, β), and hence also for (σ^2, ρ, α, β).

(b) β̂_MLE = (Σ_{i=1}^n x_i Ȳ_i· − n x̄ Ȳ_··) / Σ_{i=1}^n (x_i − x̄)^2. Note E(Ȳ_i·) = α + βx_i and E(Ȳ_··) = α + βx̄, so

E(β̂_MLE) = (Σ_i x_i(α + βx_i) − n x̄(α + βx̄)) / Σ_i (x_i − x̄)^2 = β (Σ_i x_i^2 − n x̄^2) / Σ_i (x_i − x̄)^2 = β,

i.e., β̂_MLE is unbiased. As an unbiased function of the complete and sufficient statistic, β̂_MLE is the UMVUE of β.

(c) Note Y_i1 − Y_i2 ∼ N(0, 2σ^2(1 − ρ)), and the differences Y_i1 − Y_i2 are all independent. Therefore,

S_1^2/(2θ) = Σ_{i=1}^n (Y_i1 − Y_i2)^2 / (2σ^2(1 − ρ)) ∼ χ^2_n,

and E(S_1^2) = 2nθ. Note

S_1^2 = Σ_{i=1}^n (Y_i1 − Y_i2)^2 = Σ_{i=1}^n (Y_i1^2 + Y_i2^2 − 2Y_i1Y_i2) = Σ_{i,j} Y_ij^2 − 2 Σ_{i=1}^n Y_i1 Y_i2.

Thus S_1^2 is a function of the complete and sufficient statistic, and the UMVUE of θ is S_1^2/(2n).
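The unbiasedness of this estimator can be checked by simulation; the parameter values, regression constants x_i, and replication counts below are arbitrary choices.

```python
import random

alpha, beta, sigma, rho = 1.0, 0.5, 1.5, 0.4
theta = sigma ** 2 * (1 - rho)          # target parameter sigma^2 (1 - rho)
n, reps = 50, 2000
xs = [i / n for i in range(n)]          # arbitrary known constants x_i
rng = random.Random(1)

estimates = []
for _ in range(reps):
    s1sq = 0.0
    for x in xs:
        mu = alpha + beta * x
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Bivariate normal pair with common mean mu and correlation rho.
        y1 = mu + sigma * z1
        y2 = mu + sigma * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        s1sq += (y1 - y2) ** 2
    estimates.append(s1sq / (2 * n))    # the UMVUE S_1^2 / (2n)

print(sum(estimates) / reps, theta)
```

The average of the estimates should be very close to θ = σ^2(1 − ρ) = 1.35 for these settings.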

6. (a) (X_1, X_2, X_3, X_4, X_5) follows a multinomial distribution with joint pmf

f_p(x) = P(X_1 = x_1, ..., X_5 = x_5) = n!/(x_1! ··· x_5!) · p_1^{x_1} ··· p_5^{x_5}
       = n!/(x_1! ··· x_5!) · (p^4)^{x_1} (4p^3q)^{x_2} (6p^2q^2)^{x_3} (4pq^3)^{x_4} (q^4)^{x_5}
       ∝ p^{4x_1 + 3x_2 + 2x_3 + x_4} q^{x_2 + 2x_3 + 3x_4 + 4x_5}
       = p^{T(x)} q^{4n − T(x)}
       ∝ q^{4n} exp[ T(x) log(p/q) ],

where η = log(p/q) is the natural parameter. This belongs to an exponential family.

(b) T(X) = 4X_1 + 3X_2 + 2X_3 + X_4 is sufficient for p.

(c) The natural parameter space {log(p/(1 − p)) : 0 < p < 1} is an open set in R, so T is complete.

(d) Note that Consequence k corresponds to 5 − k heads in 4 tosses of a coin with success probability p, so T counts the total number of heads in 4n i.i.d. tosses; hence T ∼ Bin(4n, p). Alternatively, use the MGF argument.
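The claim in (d) can be confirmed exactly: the per-run contribution to T has the Bin(4, p) pmf, and its n-fold convolution matches the Bin(4n, p) pmf. The values p = 0.3 and n = 5 below are arbitrary.

```python
from math import comb

p = 0.3
q = 1 - p
n = 5

# Per-run contribution to T: Consequence k adds 5 - k, with the
# probabilities from the table in the problem statement.
per_run = {4: p**4, 3: 4 * p**3 * q, 2: 6 * p**2 * q**2,
           1: 4 * p * q**3, 0: q**4}

# Each contribution t occurs with the Bin(4, p) probability.
for t, prob in per_run.items():
    assert abs(prob - comb(4, t) * p**t * q**(4 - t)) < 1e-12

# n-fold convolution gives the exact distribution of T.
pT = {0: 1.0}
for _ in range(n):
    new = {}
    for s, ps in pT.items():
        for t, pt in per_run.items():
            new[s + t] = new.get(s + t, 0.0) + ps * pt
    pT = new

# Compare against the Bin(4n, p) pmf.
err = max(abs(pT[t] - comb(4 * n, t) * p**t * q**(4 * n - t))
          for t in range(4 * n + 1))
print(err)
```

The maximum discrepancy is at floating-point roundoff level, confirming T ∼ Bin(4n, p).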

(e) Assume 0 < p_0 < p_1 < 1; then the likelihood ratio

l(x) = f_{p_1}(x)/f_{p_0}(x) = (q_1/q_0)^{4n} exp[ T(x) log( (p_1 q_0)/(p_0 q_1) ) ]

is monotonically increasing in T(x), since p_1 q_0/(p_0 q_1) > 1. By the Karlin-Rubin theorem, the UMP level α test rejects H_0 iff T > c, with c satisfying P_{p=1/2}(T > c) = α.
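Since T ∼ Bin(4n, p), the critical value can be computed from the Bin(4n, 1/2) tail. A sketch, with n = 10 and α = 0.05 as arbitrary choices; note this non-randomized version attains size at most α rather than exactly α.

```python
from math import comb

n, alpha = 10, 0.05
N = 4 * n                       # T ~ Bin(N, p)

def tail(c, p):
    """P_p(T > c) for T ~ Bin(N, p)."""
    q = 1 - p
    return sum(comb(N, t) * p**t * q**(N - t) for t in range(c + 1, N + 1))

# Smallest c with P_{1/2}(T > c) <= alpha.
c = next(c for c in range(N + 1) if tail(c, 0.5) <= alpha)
size = tail(c, 0.5)
power = tail(c, 0.6)            # power at an arbitrary alternative p = 0.6
print(c, size, power)
```

By monotonicity of the binomial tail in p, the power at any p > 1/2 exceeds the size, as the theory requires.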
