2WS30 - Mathematical Statistics

Homework 3 (2017) - Solutions

Note to the students: these solutions are for your reference only. In some parts I might elaborate more than what I expected you to do, while in others I might omit some details. Also, take the solutions with a grain of salt, as there might be typos and small errors.

Exercise 1. (a) The density of $X$ is $f_\theta(x) = \frac{1}{\theta}\,\mathbf{1}\{0 \le x \le \theta\}$. Clearly
$$f_\theta(x) = \frac{1}{\theta}\,\mathbf{1}\{0 \le x/\theta \le 1\} = \frac{1}{\theta}\,g(x/\theta)\,,$$
where $g(x) = \mathbf{1}\{0 \le x \le 1\}$, so this is a scale family, with scale $\theta$.

(b) For scale families we know that $\hat\theta_n/\theta$ is a pivotal quantity, where $\hat\theta_n$ is the maximum likelihood estimator of $\theta$. For this class of distributions we have seen in class that $\hat\theta_n = \max_i X_i$ (denote this maximum by $M$), therefore $Q = \frac{\max_i X_i}{\theta}$ is a pivotal quantity for $\theta$.

(c) To use the pivotal quantity method we need to find an event involving $Q$ with probability $1-\alpha$. Let's find $c_\alpha$ so that $P(Q \ge c_\alpha) = 1-\alpha$. Clearly

$$P(Q \ge c_\alpha) = P(M/\theta \ge c_\alpha) = P(M \ge \theta c_\alpha) = \int_{c_\alpha\theta}^{\theta} \frac{n x^{n-1}}{\theta^n}\,dx = \left[\left(\frac{x}{\theta}\right)^n\right]_{x=c_\alpha\theta}^{\theta} = 1 - c_\alpha^n\,.$$
Therefore we should take $c_\alpha = \alpha^{1/n}$. Now

$$P(Q \ge c_\alpha) = P\left(\frac{\max_i X_i}{c_\alpha} \ge \theta\right)\,,$$
meaning $\left[0,\ \frac{\max_i X_i}{\alpha^{1/n}}\right]$ is a $100(1-\alpha)\%$ upper confidence interval for $\theta$. Plugging in the given information we conclude that $[0, 2.199]$ is a 95% confidence interval for $\theta$. Note that we can significantly shorten this interval without changing the confidence level, as we know that with probability one $\theta \ge \max_i X_i$. Therefore, a 95% confidence interval for $\theta$ is $[1.99, 2.199]$.
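A one-line R check of the upper endpoint (using the values from the exercise, n = 30 and observed maximum 1.99):

1.99/0.05^(1/30)    # upper endpoint of the 95% confidence interval, about 2.199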

(d) No, in this case we cannot use Wald's approach to construct an approximate confidence interval because the method is based on the asymptotic normality of the MLE. Wald's theorem giving sufficient conditions for the asymptotic normality of the MLE cannot be applied, as the first assumption is violated, namely the support of the density function of $X_i$ depends on $\theta$. Alternatively, the distribution of the MLE (which is $M$) is given in the question, and it is clearly neither normal nor asymptotically normal, therefore Wald's confidence intervals cannot be constructed.

Exercise 2. (a) The type I error of this test is the probability we reject the null hypothesis when it is true. Therefore

$$P_{\theta=2}(\{M \ge 2\} \cup \{M \le c\}) = P_{\theta=2}(M \ge 2) + P_{\theta=2}(M \le c) = 0 + P_{\theta=2}(M \le c)$$
$$= \int_0^c \frac{n x^{n-1}}{2^n}\,dx = \left[\frac{x^n}{2^n}\right]_{x=0}^{c} = (c/2)^n = (1.8/2)^{30} = 0.0424\,,$$

where we used the fact that c<2 to compute the limits of integration.

(b) The power of the test is computed in a fashion analogous to (a), but now $\theta = 1.9$. Therefore the power in this case is

$$P_{\theta=1.9}(\{M \ge 2\} \cup \{M \le c\}) = P_{\theta=1.9}(M \ge 2) + P_{\theta=1.9}(M \le c) = 0 + P_{\theta=1.9}(M \le c)$$
$$= \int_0^{\min(c,\theta)} \frac{n x^{n-1}}{1.9^n}\,dx = (1.8/1.9)^{30} = 0.1975\,.$$

(c) Although this is similar to (b) and (a), we need to realize now that the event $\{M \ge 2\}$ no longer has probability zero, since $\theta = 2.1$.

$$P_{\theta=2.1}(\{M \ge 2\} \cup \{M \le c\}) = P_{\theta=2.1}(M \ge 2) + P_{\theta=2.1}(M \le c) = \int_2^{2.1} \frac{n x^{n-1}}{2.1^n}\,dx + \int_0^c \frac{n x^{n-1}}{2.1^n}\,dx$$
$$= \left[\frac{x^n}{2.1^n}\right]_{x=2}^{2.1} + \left[\frac{x^n}{2.1^n}\right]_{x=0}^{c} = 1 - (2/2.1)^{30} + (1.8/2.1)^{30} = 0.7784\,.$$
Note that although 2.1 is as far from 2 as 1.9 is (in part (b)), the test is significantly more powerful against this alternative (so there is some asymmetry).

(d) We just refer back to (a), and simply have to solve the equation $(c/2)^n = 0.05$, so that $c = 2 \times 0.05^{1/30} = 1.8099$.

(e) Here $n = 30$ and the observed maximum is $m = 1.99$. The p-value is the significance level for which we are borderline in our decision, so this corresponds to taking $c = m$ and computing the type I error. So, in conclusion, the p-value is $(1.99/2)^{30} = 0.8604$.

Remark: note that this test has the peculiar feature that the p-value does not change smoothly. Suppose you actually observed a maximum $m = 2.00001$. Then, no matter what the significance level is, you would reject the null hypothesis, meaning the p-value is exactly 0 (this makes sense, as you know that with probability one you cannot observe a value larger than 2 under the null).
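All the numbers in (a)-(e) can be reproduced with a few lines of R (the only inputs are n = 30, c = 1.8, and the observed maximum 1.99):

n <- 30
(1.8/2)^n                      # type I error in (a): about 0.0424
(1.8/1.9)^n                    # power in (b): about 0.1975
1 - (2/2.1)^n + (1.8/2.1)^n    # power in (c): about 0.7784
2*0.05^(1/n)                   # critical value in (d): about 1.8099
(1.99/2)^n                     # p-value in (e): about 0.8604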

Exercise 3. (a) Given the numerical values of the pH of the selected batch, there is no strong evidence to reject the hypothesis that these are samples from a normal distribution. Looking at the normal QQ plot (qqnorm(x) in R), we see that the dots fall roughly on a straight line, which is what you expect for samples from a normal distribution. This is only a qualitative statement, and one should keep in mind that the sample size is very small. Quantitatively, we can look for example at the Shapiro-Wilk test (recall that the null hypothesis corresponds to normality). This test has a p-value of 0.9484, which is quite large (in comparison with the usual significance level $\alpha = 0.05$). So there is no strong evidence against the normality hypothesis.

(b) Let $\mu$ denote the mean pH. We want to check whether the mean pH is really 5.4 at significance level $\alpha = 0.1$, that is, we test $H_0: \mu = 5.4$ against $H_1: \mu \neq 5.4$. Because it is reasonable to assume the data is a realization of a normal random sample, this leads to the t-test we saw in class. The

corresponding test statistic is

$$t(x) = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\,, \qquad \mu_0 = 5.4\,.$$
We reject the null hypothesis if the absolute value of the test statistic is larger than $t_{\alpha/2;n-1} = t_{0.05;9} = 1.833$. For our dataset $\bar{x} = 5.344$ and $s = 0.1641$, and so $|t(x)| = 1.07941$. Since this is smaller than $t_{0.05;9}$ there is not enough evidence to reject the null hypothesis.

(c) There are various ways to interpret the definition of the p-value. A particularly useful one is to regard the p-value as the significance level at which you are borderline in your decision (meaning that if the significance level is smaller you won't reject the null hypothesis, and if it is larger you will reject it). In the present example, you will be borderline in your decision if $t_{p/2;n-1} = |t(x)| = 1.07941$, where $p$ is the p-value. So we simply need to compute $2 \times P(T \ge 1.07941)$, where $T$ has a student-t distribution with $n-1$ degrees of freedom. This can be done for instance using R: 2*(1-pt(1.07941,9)). The p-value is therefore 0.3085. As a sanity check we note that, as expected, this is larger than 0.1, meaning the null hypothesis cannot be rejected at that significance level.

If using the tables in the statistical compendium we cannot get a very fine-grained evaluation of the p-value. Looking at the table with the quantiles of the t distribution we conclude that $t_{0.2;9} < 1.07941 < t_{0.1;9}$, so the p-value lies between 0.2 and 0.4.

The corresponding $100(1-\alpha)\%$ confidence interval is
$$\left[\bar{x} - t_{\alpha/2;n-1}\frac{s}{\sqrt{n}}\,,\ \bar{x} + t_{\alpha/2;n-1}\frac{s}{\sqrt{n}}\right]\,,$$
with $\alpha = 0.1$. Plugging in the values specific to our dataset we get $[5.2489, 5.4391]$. Since 5.4 is contained in this interval we are quite confident this could be a reasonable value for the mean, so we reach the same conclusion as in (b). From class we know this approach is entirely equivalent to the test in (b).

Remarks: The function t.test in R would provide you with most of the numerical quantities you need for parts (a), (b), and (c). Simply type t.test(x,mu=5.4,conf.level=0.90).
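If only the summary statistics are available, the same numbers can be reproduced by hand in R (here the inputs are x̄ = 5.344, s = 0.1641, n = 10 from the text above):

n <- 10; xbar <- 5.344; s <- 0.1641; mu0 <- 5.4
t_stat <- (xbar - mu0) / (s / sqrt(n))                  # about -1.079
abs(t_stat) > qt(0.95, df = n - 1)                      # FALSE: do not reject at alpha = 0.1
2 * pt(-abs(t_stat), df = n - 1)                        # p-value, about 0.31
xbar + c(-1, 1) * qt(0.95, df = n - 1) * s / sqrt(n)    # 90% CI, about [5.249, 5.439]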

Exercise 4. (a) The Neyman-Pearson (NP) lemma tells us that when testing a simple null hypothesis against a simple alternative the most powerful test is based on the likelihood ratio statistic, which in this case is simply

$$LR(X) = \frac{f_{0.02}(X)}{f_{0.05}(X)} = \frac{e^{-100\times 0.02}\,\frac{(100\times 0.02)^X}{X!}}{e^{-100\times 0.05}\,\frac{(100\times 0.05)^X}{X!}} = e^{3}\left(\frac{2}{5}\right)^X\,.$$

The NP lemma tells us that we should reject $H_0$ if $LR(X)$ is large. Since $LR(X)$ is a monotone decreasing function of $X$, this means we should reject the null if $X$ is small.

That is, we should reject the null if $X \le c$ for some value of $c$ to be specified. Since we want our test to have type I error approximately 0.125 we want to find $c$ so that

$$P_{\lambda=0.05}(X \le c) = \sum_{i=0}^{c} e^{-5}\,\frac{5^i}{i!} \approx 0.125\,.$$
By trial and error we see that the choice $c = 2$ yields a type I error of $e^{-5}(1 + 5 + 5^2/2) \approx 0.1247$, which is very close to 0.125 (any other choice of $c$ results in a very different type I error). Keep in mind that the data is discrete, so only a discrete set of significance levels can be attained (by any deterministic test). The power of the test is the probability we actually reject the null under the alternative. So this is computed in the same way as the type I error, but for $\lambda = 0.02$. So the power is

$$P_{\lambda=0.02}(X \le c) = \sum_{i=0}^{c} e^{-2}\,\frac{2^i}{i!} = e^{-2}(1 + 2 + 2^2/2) \approx 0.6767\,.$$
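These numbers can be checked quickly in R, using the fact that under each hypothesis X is Poisson with mean 100λ:

ppois(0:4, lambda = 5)   # type I error of the test "reject if X <= c" for c = 0,...,4
ppois(2, lambda = 5)     # c = 2 gives about 0.1247
ppois(2, lambda = 2)     # power against lambda = 0.02: about 0.6767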

(b) Actually we almost answered this in part (a). Take $0 < \lambda_0 < \lambda_1$ arbitrary. The likelihood ratio is given by

$$LR(X) = \frac{f_{\lambda_1}(X)}{f_{\lambda_0}(X)} = \frac{e^{-100\lambda_1}\,\frac{(100\lambda_1)^X}{X!}}{e^{-100\lambda_0}\,\frac{(100\lambda_0)^X}{X!}} = e^{-100(\lambda_1-\lambda_0)}\left(\frac{\lambda_1}{\lambda_0}\right)^X\,.$$

Note that, since $\lambda_1 > \lambda_0$, the ratio $\lambda_1/\lambda_0$ is larger than one, meaning $LR(X)$ is a monotone increasing function of $T(X) \equiv X$ (which is a statistic, as it does not depend on the parameters $\lambda_1$ and $\lambda_0$). This means that the monotone likelihood ratio property is satisfied for the statistic $T(X) = X$.

(c) Based on (b) we can easily derive the Uniformly Most Powerful (UMP) test of level $\alpha$ of $H_0: \lambda = 0.02$ against $H_1: \lambda > 0.02$. This test rejects the null hypothesis if $T(X) \equiv X$ is greater than or equal to $k_\alpha$, where $k_\alpha$ must be chosen to satisfy

$$P_{\lambda=0.02}(X \ge k_\alpha) = \alpha\,.$$
(d) Recall that the p-value is the significance level for which we are "borderline" in our decision to reject the null hypothesis. If we observe $x = 1$ we will be borderline in our decision if $k_\alpha = x = 1$. So the p-value is simply given by

$$P_{\lambda=0.02}(X \ge x) = 1 - P_{\lambda=0.02}(X < x) = 1 - \sum_{i=0}^{x-1} e^{-2}\,\frac{2^i}{i!} = 1 - e^{-2} \approx 0.8647\,.$$
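In R the p-value for the observed x = 1 can be checked directly:

ppois(0, lambda = 2, lower.tail = FALSE)   # P(X >= 1) when the mean is 2, about 0.865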

Exercise 5. (a) The distributions of these random variables form a location-scale family (we've seen this in class). Simply note that
$$f_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\,,$$
where $\phi$ is the density of a standard normal random variable.

(b) From the results stated in class you know that $\hat\sigma_n/\sigma$ is a pivotal quantity for $\sigma$, where $\hat\sigma_n$ is the MLE for the standard deviation:

$$\hat\sigma_n = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2} = \sqrt{\frac{n-1}{n}\,S^2}\,.$$
This means that any function of $\hat\sigma_n/\sigma$ will be a pivotal quantity. Therefore $Q = (n-1)S^2/\sigma^2$ is a pivotal quantity, and furthermore we saw in class that this is a $\chi^2$ random variable with $n-1$ degrees of freedom. So we'll use this as a pivotal quantity. Alternatively, you could simply note that this is a situation we treated in class, and apply the corresponding formula in slide 56 of slide set IV.

(c) We simply need to find an event involving $Q$ that happens with probability $1-\alpha = 0.95$. Since we want a specific form of confidence interval in the end, it is not hard to see that we should consider the event $\{Q \le \chi^2_{\alpha;n-1}\}$, where $\chi^2_{\alpha;n-1}$ is the point that a $\chi^2$ distribution with $n-1$ degrees of freedom exceeds with probability $\alpha$. Then
$$1-\alpha = P\left(Q \le \chi^2_{\alpha;n-1}\right) = P\left(\frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha;n-1}\right) = P\left(\sigma^2 \ge \frac{(n-1)S^2}{\chi^2_{\alpha;n-1}}\right) = P\left(\sigma \ge \sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha;n-1}}}\right)\,.$$
This means $\left[\sqrt{\frac{(n-1)s^2}{\chi^2_{0.05;n-1}}},\ \infty\right)$ is a $100(1-\alpha)\%$ confidence interval for $\sigma$ (in our case $n = 12$ and $\alpha = 0.05$, and so $\chi^2_{0.05;11} = 19.68$). Note that in the table in the Appendix what is tabulated is $\chi^2_{1-\alpha;k}$ (always read the captions and details when looking at tabulated results)!!! Alternatively, if using the expressions in slide 56 of IV you would get exactly the same answer.

(d) Recall that $A = \sqrt{\frac{(n-1)S^2}{\chi^2_{0.05;n-1}}}$. The type I error of the test that rejects $H_0$ if $A \ge 75$ is simply
$$P_{\sigma=75}\left(\sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha;n-1}}} \ge 75\right) = P_{\sigma=75}\left(\frac{(n-1)S^2}{75^2} \ge \chi^2_{\alpha;n-1}\right) = P_{\sigma=75}\left(\frac{(n-1)S^2}{\sigma^2} \ge \chi^2_{\alpha;n-1}\right) = \alpha\,.$$

So the type I error is precisely $\alpha$. This is not unexpected, as this test simply rejects the null hypothesis if 75 is not contained in the lower confidence interval for $\sigma$ (so that we are very confident $\sigma$ is larger than 75).
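A quick Monte Carlo check of (d) in R; this is a sketch under the assumption of i.i.d. normal measurements (their mean is irrelevant here, so it is set to 0):

n <- 12; alpha <- 0.05
A <- replicate(1e5, {
  x <- rnorm(n, mean = 0, sd = 75)                         # simulate under sigma = 75
  sqrt((n - 1) * var(x) / qchisq(1 - alpha, df = n - 1))   # lower confidence bound A
})
mean(A >= 75)                                              # close to alpha = 0.05, as derived above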

(e) The computation of the power is almost analogous to that done in the previous question. The main difference is that $\sigma = 120$ now. So the power is

$$P_{\sigma=120}\left(\sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha;n-1}}} \ge 75\right) = P_{\sigma=120}\left(\frac{(n-1)S^2}{120^2} \ge \frac{75^2}{120^2}\,\chi^2_{\alpha;n-1}\right) = P\left(Q \ge \frac{75^2}{120^2}\,\chi^2_{\alpha;n-1}\right)\,,$$
where $Q$ is a $\chi^2$ random variable with $n-1$ degrees of freedom. Now $\frac{75^2}{120^2}\,\chi^2_{\alpha;n-1} = 7.6856$. Looking at the table in the Appendix we see that

$$\chi^2_{0.75;11} < 7.6856 < \chi^2_{0.5;11}\,,$$

meaning the power is between 0.5 and 0.75. So in this case the power is below 0.75. We could compute the power exactly using R:

1-pchisq(75^2/120^2*qchisq(0.95,df=11),df=11)
or, equivalently,
pchisq(75^2/120^2*qchisq(0.95,df=11),df=11,lower.tail=F)

so the power is 0.7416 (which is indeed less than 0.75).

Exercise 6. (a) Although it was not asked, we can re-derive the MLE. The log-likelihood and the score are, respectively,
$$\ell(\lambda; X) = n\log\lambda - \lambda\sum_{i=1}^n X_i\,,$$
and
$$\dot\ell(\lambda; X) = \frac{n}{\lambda} - \sum_{i=1}^n X_i\,.$$
Solving the equation $\dot\ell(\lambda; X) = 0$ we conclude that $\hat\lambda_n = \frac{n}{\sum_{i=1}^n X_i} = \frac{1}{\bar X}$ is the maximum likelihood estimator, since $\ell(\lambda; X)$ is a strictly concave function of $\lambda$. The corresponding point estimate of $\lambda$ is simply $\hat\lambda = 2.257\ (\mu\text{s})^{-1}$. The two-sided 95% confidence interval for $\lambda$ using Wald's approach is then

$$\left[\hat\lambda_n - \frac{z_{\alpha/2}}{\sqrt{n I^*_n}}\,,\ \hat\lambda_n + \frac{z_{\alpha/2}}{\sqrt{n I^*_n}}\right]\,.$$

Computing the observed Fisher Information

$$I^*_n = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\lambda^2}\log f_\lambda(x_i)\Big|_{\lambda=\hat\lambda_n} = \frac{1}{\hat\lambda_n^2} = \bar X^2\,,$$
we derive the Wald interval estimate $[2.033, 2.4814]$.
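The Wald interval can be reproduced in R from the data summaries used below in (c) (x̄ = 0.443 µs, n = 390):

n <- 390; xbar <- 0.443
lambda_hat <- 1/xbar                                   # about 2.257
I_star <- xbar^2                                       # observed Fisher information, 1/lambda_hat^2
lambda_hat + c(-1, 1)*qnorm(0.975)/sqrt(n*I_star)      # about [2.033, 2.481]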

(b) The GLR is given by
$$T(X) = \frac{\sup_{\lambda>0}\prod_{i=1}^n f_\lambda(X_i)}{\prod_{i=1}^n f_2(X_i)} = \prod_{i=1}^n \frac{f_{\hat\lambda}(X_i)}{f_2(X_i)}\,.$$
Therefore

$$T(X) = \prod_{i=1}^n \frac{\hat\lambda e^{-\hat\lambda X_i}}{2 e^{-2X_i}} = \frac{\hat\lambda^n\, e^{-\hat\lambda\sum_{i=1}^n X_i}}{2^n\, e^{-2\sum_{i=1}^n X_i}} = \frac{1}{(2\bar X)^n}\, e^{(2-\hat\lambda)\sum_{i=1}^n X_i}$$
$$= \frac{1}{(2\bar X)^n}\, e^{n(2-1/\bar X)\bar X} = \frac{1}{(2\bar X)^n}\, e^{n(2\bar X - 1)}\,,$$

as we wanted to show.
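A quick simulation check of this closed form in R; the data below are hypothetical (any exponential sample works), and are used only to verify the algebra:

set.seed(1)
x <- rexp(25, rate = 2.5)                   # hypothetical sample
xbar <- mean(x); n <- length(x)
lam_hat <- 1/xbar
prod(dexp(x, lam_hat) / dexp(x, 2))         # GLR computed from its definition
exp(n*(2*xbar - 1)) / (2*xbar)^n            # closed form derived above; same number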

(c) For the data collected we have $\bar x = 0.443$ and $n = 390$, therefore

$$2\log T(x) = 2n\big(2\bar x - \log(2\bar x) - 1\big) = 5.4899\,.$$
According to Wilks' theorem, the distribution of $2\log T$ under the null hypothesis is asymptotically chi-squared with $1-0 = 1$ degrees of freedom (the parameter space has dimension 1 and the null hypothesis has dimension 0). Therefore, at level $\alpha = 0.05$ we reject the null hypothesis if $2\log T(x) \ge \chi^2_{\alpha;1} = 3.841$. So in this case there is enough evidence to reject the null hypothesis. From the table in the appendix we can also see that the p-value of this test is between 0.01 and 0.025. The p-value of this test can also be computed in R,

p-value=1-pchisq(5.4899,df=1)= 0.01912664.
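For completeness, the whole test can be reproduced in R from x̄ and n alone:

n <- 390; xbar <- 0.443
W <- 2*n*(2*xbar - log(2*xbar) - 1)     # 2 log T(x), about 5.49
W > qchisq(0.95, df = 1)                # TRUE: reject H0 at level 0.05
pchisq(W, df = 1, lower.tail = FALSE)   # p-value, about 0.019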

Exercise 7. Please ask me if you want to know more about the solution of this problem.

Exercise 8. (a) Let's use the notation in the hint. First note that

$$P_0(\psi(X) \neq 0) = P_0(\psi(X) = 1) = \int_A f_0(x)\,dx\,,$$
and
$$P_1(\psi(X) \neq 1) = P_1(\psi(X) = 0) = \int_{\bar A} f_1(x)\,dx\,,$$
where $\bar A$ denotes the complement of $A$. Now

$$R(\psi) - R(\psi^*) = \int_A f_0(x)\,dx + \int_{\bar A} f_1(x)\,dx - \int_{A^*} f_0(x)\,dx - \int_{\bar{A^*}} f_1(x)\,dx$$
$$= \int_A f_0(x)\,dx + \left(1 - \int_A f_1(x)\,dx\right) - \int_{A^*} f_0(x)\,dx - \left(1 - \int_{A^*} f_1(x)\,dx\right)$$
$$= \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx - \int_A \big(f_1(x) - f_0(x)\big)\,dx = \int_{A^*\setminus A} \big(f_1(x) - f_0(x)\big)\,dx - \int_{A\setminus A^*} \big(f_1(x) - f_0(x)\big)\,dx$$

$$= \int_{A^*\setminus A} \big|f_1(x) - f_0(x)\big|\,dx + \int_{A\setminus A^*} \big|f_1(x) - f_0(x)\big|\,dx \ \ge\ 0\,,$$
where the second to last step follows from the fact that for any point $x \in A^*$ we have $f_1(x) - f_0(x) \ge 0$, and for any point $x \notin A^*$ we have $f_1(x) - f_0(x) < 0$.

(b) Let's try to use the above result, by considering a test for which we know the risk. Let $A = \emptyset$. Clearly $R(\psi) = P_0(\psi(X) \neq 0) + P_1(\psi(X) \neq 1) = 0 + 1 = 1$. Now, using the derivation in (a),

$$1 - R(\psi^*) = R(\psi) - R(\psi^*)$$

$$= \int_{A^*\setminus A} \big|f_1(x) - f_0(x)\big|\,dx + \int_{A\setminus A^*} \big|f_1(x) - f_0(x)\big|\,dx = \int_{A^*} \big|f_1(x) - f_0(x)\big|\,dx = \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx\,.$$
We just showed that $R(\psi^*) = 1 - \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx$, which is almost what we wanted to show. Now simply note that

$$\int \big|f_1(x) - f_0(x)\big|\,dx = \int_{A^*} \big|f_1(x) - f_0(x)\big|\,dx + \int_{\bar{A^*}} \big|f_1(x) - f_0(x)\big|\,dx = \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx - \int_{\bar{A^*}} \big(f_1(x) - f_0(x)\big)\,dx$$
$$= \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx - \int \big(f_1(x) - f_0(x)\big)\,dx + \int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx = 2\int_{A^*} \big(f_1(x) - f_0(x)\big)\,dx\,,$$

since $\int f_0(x)\,dx = \int f_1(x)\,dx = 1$ (these are densities). Therefore we conclude the final result, $R(\psi^*) = 1 - \frac{1}{2}\int \big|f_1(x) - f_0(x)\big|\,dx$.

(c) The optimal risk $R(\psi^*)$ is zero in this case (that is, you can perfectly detect which of the two distributions the data came from). This is not unexpected at all. To formally see this let $S_0$ and $S_1$ be the supports of the densities $f_0$ and $f_1$, respectively. Then

$$\int \big|f_1(x) - f_0(x)\big|\,dx = \int_{S_0\cup S_1} \big|f_1(x) - f_0(x)\big|\,dx = \int_{S_0} \big|f_1(x) - f_0(x)\big|\,dx + \int_{S_1} \big|f_1(x) - f_0(x)\big|\,dx = \int_{S_0} f_0(x)\,dx + \int_{S_1} f_1(x)\,dx = 2\,,$$
where the last two steps use that the supports are disjoint (so $f_1 = 0$ on $S_0$ and $f_0 = 0$ on $S_1$). This means the risk of $\psi^*$ is $R(\psi^*) = 1 - \frac{1}{2}\int \big|f_1(x) - f_0(x)\big|\,dx = 1 - 2/2 = 0$.
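The identity R(ψ*) = 1 − ½∫|f₁ − f₀| can also be checked numerically. The sketch below does this in R for two normal densities; this particular choice of densities is only an illustration, not part of the exercise:

f0 <- function(x) dnorm(x, mean = 0, sd = 1)
f1 <- function(x) dnorm(x, mean = 2, sd = 1)
# risk of the test psi* that decides "1" exactly when f1(x) >= f0(x)
type_I  <- integrate(function(x) f0(x) * (f1(x) >= f0(x)), -Inf, Inf)$value
type_II <- integrate(function(x) f1(x) * (f1(x) <  f0(x)), -Inf, Inf)$value
tv      <- integrate(function(x) abs(f1(x) - f0(x)), -Inf, Inf)$value
type_I + type_II   # risk of psi*
1 - tv/2           # same number, in agreement with the derivation above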

Exercise 9. (a) Since the support of the density is the entire real line we can for sure compute the log-likelihood, which is given by
$$\ell(\mu; X) = -\frac{n}{2}\log(2\pi) - n\log(\mu) - \frac{\sum_{i=1}^n (X_i-\mu)^2}{2\mu^2}\,.$$
The score is therefore given by
$$\dot\ell(\mu; X) = -\frac{n}{\mu} + \frac{1}{\mu^3}\sum_{i=1}^n (X_i-\mu)^2 + \frac{1}{2\mu^2}\sum_{i=1}^n 2(X_i-\mu) = -\frac{n}{\mu} + \frac{n}{\mu^3}\big(\tilde X - 2\mu\bar X + \mu^2\big) + \frac{n}{\mu^2}\big(\bar X - \mu\big)\,,$$
where $\tilde X = \frac{1}{n}\sum_{i=1}^n X_i^2$ and the last step follows after doing some basic algebra. To find the stationary points of the log-likelihood we need to solve $\dot\ell(\mu; X) = 0$ for $\mu$, which simplifies (after some small manipulation) to
$$\mu^2 + \bar X\mu - \tilde X = 0\,.$$
This equation has in general two solutions, namely
$$\frac{-\bar X \pm \sqrt{\bar X^2 + 4\tilde X}}{2}\,.$$
However, since we assume $\mu > 0$ the only interesting solution is $\hat\mu = \frac{-\bar X + \sqrt{\bar X^2 + 4\tilde X}}{2}$. This point is necessarily the only stationary point of the log-likelihood (for $\mu > 0$). Now noticing that $\ell(\hat\mu; X) > -\infty$ and that $\ell(\mu; X)$ converges to $-\infty$ when either $\mu \to \infty$ or $\mu \to 0$ allows us to conclude this is actually the maximizer of the function.

(b) We simply have to apply the formula (with $n = 1$)
$$I^* = -E\left[\frac{\partial^2}{\partial\mu^2}\ell(\mu; X_1)\right] = E\left[\frac{3}{\mu^4}(X_1-\mu)^2 + \frac{4}{\mu^3}(X_1-\mu)\right] = E\left[-\frac{1}{\mu^2} - \frac{2}{\mu^3}X_1 + \frac{3}{\mu^4}X_1^2\right] = \frac{3}{\mu^2}\,,$$

where the last step follows simply because $E[X_1] = \mu$ and $E[X_1^2] = V(X_1) + E[X_1]^2 = 2\mu^2$.

(c) For this and the next question we need to compute the various estimators of the Fisher information. Although it was not asked, we can easily derive the expected Fisher information and the two other forms of observed Fisher information. Clearly $\hat I^*_A = 3/\hat\mu^2$. Also

$$\hat I^*_B = \frac{1}{n}\sum_{i=1}^n \left(-\frac{1}{\hat\mu} + \frac{1}{\hat\mu^3}\big(X_i^2 - 2\hat\mu X_i + \hat\mu^2\big) + \frac{1}{\hat\mu^2}\big(X_i - \hat\mu\big)\right)^2 = \frac{1}{n}\sum_{i=1}^n \left(\frac{1}{\hat\mu^3}X_i^2 - \frac{1}{\hat\mu^2}X_i - \frac{1}{\hat\mu}\right)^2\,.$$
Finally

$$\hat I^*_C = \frac{1}{n}\sum_{i=1}^n \left(-\frac{1}{\hat\mu^2} - \frac{2}{\hat\mu^3}X_i + \frac{3}{\hat\mu^4}X_i^2\right)\,.$$
For the rest of the answer see the attached code (and the corresponding comments).

(d) See attached code (and the corresponding comments).

Exercise 10. (a) Clearly

$$f_\mu(x) = \frac{1}{\mu}\,\phi\!\left(\frac{x-\mu}{\mu}\right) = \frac{1}{\mu}\,\phi\!\left(\frac{x}{\mu} - 1\right) = \frac{1}{\mu}\,g(x/\mu)\,,$$
where $\phi$ is the density of a standard normal and $g(x) = \phi(x-1)$. Therefore this family of distributions is a scale family with scale $\mu$.

(b) We will try to use the result that states the maximum likelihood estimator is asymptotically normal. Refer to slide V.13. We need to argue that the conditions of the theorem are met. Condition (1) holds, since $f_\mu(x) > 0$ for all $x \in \mathbb{R}$. Condition (2) simply states that, for $\mu_1 \neq \mu_2$, the two densities are different. This is clear, since $\mu_1 = E_{\mu_1}[X] \neq E_{\mu_2}[X] = \mu_2$. Condition (4) is rather technical, but as explained in class, if condition (1) holds and the density is twice differentiable it is almost impossible to come up with a situation where (4) doesn't hold (you need "hardcore" measure theoretical constructions to come up with an example). Finally let's sketch why condition (3) holds. Note that, by the law of large numbers, $\bar X \to E[X] = \mu$ and $\tilde X \to E[X^2] = 2\mu^2$ as $n \to \infty$. This means that, when $n$ is large, the MLE should converge to

$$\frac{-\mu + \sqrt{\mu^2 + 4(2\mu^2)}}{2} = \mu\,,$$
which means the MLE is consistent (formalizing the above reasoning requires some basic results from probability theory).

Since the conditions of the theorem are met we know that the asymptotic distribution of $\sqrt{n}(\hat\mu - \mu)$ is normal with mean 0 and variance $1/I^* = \mu^2/3$. Therefore
$$\sqrt{n}\,(T - 1) = \sqrt{n}\left(\frac{\hat\mu}{\mu} - 1\right) \approx \mathcal{N}(0, 1/3)\,,$$
when $n$ is large. Note that $T$ is a real pivotal quantity (not an approximate pivotal quantity), which has the above asymptotic distribution.

(c) For $Q$ note that $\bar X \sim \mathcal{N}(\mu, \mu^2/n)$. Therefore $Q = (\bar X - \mu)/\mu \sim \mathcal{N}(0, 1/n)$, so this is a pivotal quantity. Similarly, for $R$ note that $Y_i = (X_i - \mu)/\mu$ are i.i.d. standard normal random variables, and $R = \sum_{i=1}^n Y_i^2$, so $R$ has a $\chi^2$ distribution with $n$ degrees of freedom.

(d) This was already done in (c).

(e) Note that $\sqrt{n}\,Q$ has a standard normal distribution, and $\sqrt{3n}\,(T-1)$ has approximately a standard normal distribution. To use the pivotal quantity method based on $Q$ (or $T$) we simply need to find an event involving $Q$ (or $T$) that happens with probability $1-\alpha$. Let's simply use the event $\{\sqrt{n}\,Q \in [-z_{\alpha/2}, z_{\alpha/2}]\}$:
$$1-\alpha = P\left(-z_{\alpha/2} \le \sqrt{n}\,Q \le z_{\alpha/2}\right) = P\left(-z_{\alpha/2} \le \frac{\bar X - \mu}{\mu/\sqrt{n}} \le z_{\alpha/2}\right) = P\left(\sqrt{n} - z_{\alpha/2} \le \sqrt{n}\,\frac{\bar X}{\mu} \le \sqrt{n} + z_{\alpha/2}\right) = P\left(\frac{\bar X}{1 + \frac{z_{\alpha/2}}{\sqrt{n}}} \le \mu \le \frac{\bar X}{1 - \frac{z_{\alpha/2}}{\sqrt{n}}}\right)\,,$$
where in the last step it was assumed that $z_{\alpha/2} < \sqrt{n}$ (not a very stringent assumption, since for $\alpha = 0.05$ this holds for $n \ge 4$). So, provided $z_{\alpha/2} < \sqrt{n}$, if $\bar x > 0$ a $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\left[\frac{\bar x}{1 + \frac{z_{\alpha/2}}{\sqrt{n}}}\,,\ \frac{\bar x}{1 - \frac{z_{\alpha/2}}{\sqrt{n}}}\right]\,.$$
Note that, if $\bar x < 0$ the interval is actually empty! (think about what that means). To apply the pivotal quantity method for $T$ a similar reasoning applies, namely

$$1-\alpha \approx P\left(-z_{\alpha/2} \le \sqrt{3n}\,(T-1) \le z_{\alpha/2}\right) = P\left(1 - \frac{z_{\alpha/2}}{\sqrt{3n}} \le \frac{\hat\mu}{\mu} \le 1 + \frac{z_{\alpha/2}}{\sqrt{3n}}\right) = P\left(\frac{\hat\mu}{1 + \frac{z_{\alpha/2}}{\sqrt{3n}}} \le \mu \le \frac{\hat\mu}{1 - \frac{z_{\alpha/2}}{\sqrt{3n}}}\right)\,,$$

where in the last step it was assumed that $z_{\alpha/2} < \sqrt{3n}$ (even less stringent than before). Therefore the associated $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\left[\frac{\hat\mu}{1 + \frac{z_{\alpha/2}}{\sqrt{3n}}}\,,\ \frac{\hat\mu}{1 - \frac{z_{\alpha/2}}{\sqrt{3n}}}\right]\,.$$

(f) See the attached code and corresponding comments.

(g) The interval we derived using the pivotal method with $T$ is different from the one we obtained with Wald's approach (which is $\hat\mu \pm z_{\alpha/2}\hat\mu/\sqrt{3n}$, since $\hat I^*_A = 3/\hat\mu^2$). However, note that if $n$ is large the two intervals are approximately identical. To see this, do a Taylor approximation around zero of the function $1/(1+x) = 1 - x + o(x)$ as $x \to 0$. Therefore
$$\frac{1}{1 + \frac{z_{\alpha/2}}{\sqrt{3n}}} \approx 1 - \frac{z_{\alpha/2}}{\sqrt{3n}} \qquad\text{and}\qquad \frac{1}{1 - \frac{z_{\alpha/2}}{\sqrt{3n}}} \approx 1 + \frac{z_{\alpha/2}}{\sqrt{3n}}$$
if $n$ is large.

13 ############################################################## # The implementation below is rather crude and inefficient, but it is easy to read it, and in most computers this should run in a few seconds either way. ############################################################## mu<-1 n<-10 N<-100000 coverage_A<-rep(0,N) coverage_B<-rep(0,N) coverage_C<-rep(0,N) coverage_Q<-rep(0,N) coverage_T<-rep(0,N) length_T<-rep(0,N) length_Q<-rep(0,N) hI_A<-rep(0,N) hI_B<-rep(0,N) hI_C<-rep(0,N)

Q<-rep(0,N)
T<-rep(0,N)
hmu_ML<-rep(0,N)
for (k in 1:N) {
y<-rnorm(n,mu,mu)
my<-mean(y)
my2<-mean(y^2)
hmu<-(-my+sqrt(my^2+4*my2))/2

hmu_ML[k]<-hmu

hI_A[k]<-3/hmu^2
hI_B[k]<-mean((-1/hmu-y/hmu^2+y^2/hmu^3)^2)
hI_C[k]<--1/hmu^2-2*my/hmu^3+3/hmu^4*my2

coverage_A[k]=(abs(hmu-mu)<=qnorm(0.975)/sqrt(n*hI_A[k]))
coverage_B[k]=(abs(hmu-mu)<=qnorm(0.975)/sqrt(n*hI_B[k]))
coverage_C[k]=(abs(hmu-mu)<=qnorm(0.975)/sqrt(n*hI_C[k]))

Q[k]<-(my-mu)/mu
T[k]<-hmu/mu

coverage_T[k]<-(mu <= hmu/(1-qnorm(0.975)/sqrt(3*n)))*(mu >= hmu/(1+qnorm(0.975)/sqrt(3*n)))

length_T[k]<-hmu/(1-qnorm(0.975)/sqrt(3*n))-hmu/(1+qnorm(0.975)/sqrt(3*n))

tmp1 <- my/(1-qnorm(0.975)/sqrt(n))
tmp2 <- my/(1+qnorm(0.975)/sqrt(n))

coverage_Q[k]<-(mu <= tmp1)*(mu >= tmp2)
length_Q[k]<-(tmp1-tmp2)*(tmp1>tmp2)

}
mean(hI_A)
mean(hI_B)
mean(hI_C)

##############################################################
# Ex 9
# For n=10 hI_B seems to underestimate the Fisher information number
# (essentially saying our estimator is worse than it really is), while the
# other two approaches overestimate the Fisher information number. As the
# sample size increases (e.g. n=20) the differences become small though.
##############################################################
mean(coverage_A)
mean(coverage_B)
mean(coverage_C)

##############################################################
# Ex 9
# All the intervals have a coverage that is smaller than 0.95, which means
# the intervals are not conservative (they are too short). However, the
# intervals based on hI_B seem to have coverage that is closer to the
# target 0.95.
##############################################################

############################################################## ############################################################## ##############################################################

##############################################################
# Ex 10
##############################################################
mean(coverage_T)
mean(coverage_Q)

### Both intervals are quite reasonable, and with a coverage that is significantly better than that of the previous intervals.

### In general, just looking at the coverage does not provide a fair
### comparison, as you can get better coverage by making the intervals
### wider. It is therefore important to look at the length of the various
### intervals, and see which method provides smaller ones.
# average lengths of the three Wald intervals and of the T- and Q-based intervals
mean(2*qnorm(0.975)/sqrt(n*hI_A))
mean(2*qnorm(0.975)/sqrt(n*hI_B))
mean(2*qnorm(0.975)/sqrt(n*hI_C))
mean(length_T)
mean(length_Q)

### The Wald intervals tend to be short, but have coverage less than 0.95
### (so these are too short). The coverage of the intervals based on T and Q
### is spot on, but the length of the interval based on Q is too big!!! The
### interval based on T seems to be good both in terms of coverage and in
### terms of length. As the sample size increases these differences become
### less significant, except for the interval based on Q which is still
### quite large (by comparison).

### Question: this raises an interesting question: what characterizes a good
### pivotal quantity? (In this case T is "better" than Q.)