CHAPTER 6 Point Estimation

6.1 Introduction

Statistics may be defined as the science of using data to better understand specific traits of chosen variables. In Chapters 1 through 5 we have encountered many such traits. For example, in relation to a 2-D random variable $(X,Y)$, classic traits include: $\mu_X$, $\sigma_X$, $f_X(x)$ (marginal traits for $X$); $\mu_Y$, $\sigma_Y$, $f_Y(y)$ (marginal traits for $Y$); and $\rho_{XY}$, $\sigma_{XY}$, $f_{XY}(x,y)$ (joint traits for $(X,Y)$). Notice that all of these traits are simply parameters. This includes even a trait such as $f_{XY}(x,y)$: for each ordered pair $(x,y)$, the quantity $f_{XY}(x,y)$ is a scalar-valued parameter.

Definition 1.1 Let $\theta$ denote a parameter associated with a random variable, $X$, and let $\{X_k\}_{k=1}^n$ denote a collection of associated data collection variables. Then the estimator $\hat{\theta} = g[\{X_k\}_{k=1}^n]$ is said to be a point estimator of $\theta$.

Remark 1.1 The term ‘point’ refers to the fact that we obtain a single number as an estimate; as opposed to, say, an interval estimate.

Much of the material in this chapter will refer to material in the course textbook, so that the reader may go to that material for further clarification and examples.

6.2 The Central Limit Theorem (CLT) and Related Topics (mainly Ch. 8 of the textbook)

Even though we have discussed the CLT on numerous occasions, because it plays such a central role in point estimation we present it again here.

Theorem 2.1 (Theorem 8.3 of the textbook) (Central Limit Theorem) Suppose that $X \sim f_X(x)$ with $\sigma_X^2 < \infty$, and suppose that $\{X_k\}_{k=1}^n \sim iid(X)$. Define $\{Z_n\}_{n=1}^\infty$, where $Z_n = (\overline{X}_n - \mu_X)/(\sigma_X/\sqrt{n})$ with $\overline{X}_n = \frac{1}{n}\sum_{k=1}^n X_k$. Then $\lim_{n \to \infty} Z_n = Z \sim N(0,1)$.

Remark 2.1 In relation to point estimation of $\mu_X$ for large $n$, this theorem states that $(\overline{X}_n - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$; equivalently, $\overline{X}_n \sim N(\mu_{\overline{X}_n} = \mu_X,\ \sigma_{\overline{X}_n} = \sigma_X/\sqrt{n})$.

Remark 2.2 The authors state (p.269) that "In practice, this approximation [i.e. that $\overline{X}_n \sim N(\mu_X, \sigma_X^2/n)$] is used when $n \ge 30$, regardless of the actual shape of the population sampled." Now, while this is, indeed, a 'rule of thumb', that does not mean that it is always justified. There are two key issues that must be reckoned with, as will be illustrated in the following example.

Example 2.1 Suppose that $\{X_k\}_{k=1}^n \sim iid(X)$ with $X \sim$ Bernoulli($p$). Then $\mu_{\overline{X}} = p$ and $\sigma_{\overline{X}}^2 = p(1-p)/n$. In fact, $\overline{X} \sim (1/n) \cdot$Binomial($n,p$). Specifically, $S_{\overline{X}} = \{x_k = k/n\}_{k=0}^n$.

Case 1: n = 30 and p = 0.5. Clearly, the shape is that of a bell curve.


Figure 2.1 The pdf of $\overline{X} \sim (1/30) \cdot$Binomial($n = 30,\ p = 0.5$).

Clearly, $\overline{X}$ is a discrete random variable; whereas to claim that $\overline{X}_n \sim N(0.5, 0.0911^2)$ is to claim that it is a continuous random variable. To obtain this continuous pdf, it is only necessary to scale Figure 2.1 to have total area equal to one (i.e. divide each number by the bin width 1/30, and use a bar plot instead of a stem plot). The result is given in Figure 2.2.


Figure 2.2 Comparison of the staircase pdf and the normal pdf. The lower plot shows the error (green) associated with using the normal pdf to estimate the probability of the double arrow region.
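The staircase construction in Figures 2.1 and 2.2 is easy to reproduce. The following is a minimal Matlab sketch (assuming the Statistics Toolbox functions binopdf and normpdf; the variable names are illustrative only):

% Exact pmf of Xbar = (1/n)*Binomial(n,p) vs. its CLT normal approximation
n = 30; p = 0.5;
k = 0:n;
pk = binopdf(k, n, p);            % Pr[Xbar = k/n]
fstair = pk/(1/n);                % divide by the bin width 1/n to get a density
x = 0:0.001:1;
fnorm = normpdf(x, p, sqrt(p*(1-p)/n));
bar(k/n, fstair, 1), hold on      % staircase density, as in Figure 2.2
plot(x, fnorm, 'r', 'LineWidth', 2), hold off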

Case 2: n = 30 and p = 0.1.


Figure 2.3 The pdf of $\overline{X} \sim (1/30) \cdot$Binomial($n = 30,\ p = 0.1$) (TOP), and the continuous approximation (BOTTOM). The problem here is that, under the normal approximation, $\Pr[0 \le \overline{X} \le 1]$ is significantly less than one! □
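The deficiency just noted can be quantified directly. A sketch (assuming normcdf from the Statistics Toolbox; the numbers are regenerated here, not read from the figure):

% Mass the normal approximation assigns to [0,1] when p = 0.1, n = 30
n = 30; p = 0.1;
s = sqrt(p*(1-p)/n);                         % sigma of Xbar under the CLT
massInside = normcdf(1,p,s) - normcdf(0,p,s)
% The shortfall (roughly 3-4%) is the left-tail mass that N(0.1, s^2)
% places below zero, where Xbar can never fall.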

We now address some sampling distribution results given in Chapter 8 of the text.

Theorem 2.2 (Theorem 8.8 in the textbook) For $\{Z_k\}_{k=1}^n \sim iid\ N(0,1)$, $Y = \sum_{k=1}^n Z_k^2 \sim \chi_n^2$.

Proof: The proof of this theorem relies on two 'facts':

(i) For $Z \sim N(0,1)$, the random variable $W = Z^2 \sim \chi_1^2$.

(ii) For $\{W_k\}_{k=1}^n \sim iid\ \chi_1^2$, the random variable $Y = \sum_{k=1}^n W_k \sim \chi_n^2$.

Claim 2.1 For $Y \sim \chi_n^2$: $\mu_Y = n$ and $\sigma_Y^2 = 2n$.

The proof of the claim $\mu_Y = n$ follows directly from the linearity of $E(\cdot)$:

$E(Y) = E\left[\sum_{k=1}^n Z_k^2\right] = \sum_{k=1}^n E(Z_k^2) = \sum_{k=1}^n 1 = n.$

The proof of the claim $\sigma_Y^2 = 2n$ requires a little more work.
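Before the algebra, the claim can be checked by simulation. A minimal sketch (base Matlab only; n and nsim are arbitrary choices):

% Monte Carlo check of Claim 2.1: E(Y) = n and Var(Y) = 2n for Y ~ chi-squared, n d.o.f.
n = 10; nsim = 1e5;
Z = randn(nsim, n);              % each row holds n iid N(0,1) draws
Y = sum(Z.^2, 2);                % nsim draws of Y
[mean(Y) var(Y)]                 % should be close to [n 2*n] = [10 20]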

Example 2.2 Suppose that $\{X_k\}_{k=1}^n \sim iid\ N(\mu, \sigma^2)$. Obtain the pdf for the point estimator of $\theta = \sigma^2$ given by $\hat{\theta} = S_n^2 = \frac{1}{n}\sum_{k=1}^n (X_k - \mu)^2$.

Solution: Write $n\hat{\theta}/\theta = nS_n^2/\sigma^2 = \sum_{k=1}^n \left(\frac{X_k - \mu}{\sigma}\right)^2 = \sum_{k=1}^n Z_k^2 \sim \chi_n^2$.

Even though $n\hat{\theta}/\theta \sim \chi_n^2$, it is sometimes easier (and more insightful) to say that $\hat{\theta} \sim \frac{\theta}{n}\chi_n^2$. In either case, we have

(i) $E(\hat{\theta}) = E\left[\frac{\theta}{n}\chi_n^2\right] = \frac{\theta}{n}E(\chi_n^2) = \frac{\theta}{n}\, n = \theta$, and

(ii) $Var(\hat{\theta}) = Var\left[\frac{\theta}{n}\chi_n^2\right] = \left(\frac{\theta}{n}\right)^2 Var(\chi_n^2) = \frac{\theta^2}{n^2}\, 2n = \frac{2\theta^2}{n}$.

Finally, if $n$ is sufficiently large that the CLT holds, then $\hat{\theta} \sim N(\mu_{\hat{\theta}} = \theta,\ \sigma_{\hat{\theta}}^2 = 2\theta^2/n)$. □

Proof: $\frac{nS_n^2}{\sigma^2} = \sum_{k=1}^n \frac{(X_k - \mu)^2}{\sigma^2} = \sum_{k=1}^n Z_k^2$, where $\{Z_k\}_{k=1}^n \sim iid\ N(0,1)$. □

Theorem 2.3 (Theorem 8.11 of the textbook) Given $\{X_k\}_{k=1}^n \sim iid\ N(\mu_X, \sigma_X^2)$, suppose that $\mu_X$ is not known. Let $\hat{\mu}_X = \overline{X}$ and $\hat{\sigma}_X^2 = S_{n-1}^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \hat{\mu}_X)^2$. Then (i) $\overline{X}$ and $\hat{\sigma}_X^2$ are independent, and (ii) $\frac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$.

Proof: [This is one reason that a proof can be insightful; not simply a mathematical torture.]

$\sum_{k=1}^n (X_k - \overline{X})^2 = \sum_{k=1}^n [(X_k - \mu_X) - (\overline{X} - \mu_X)]^2 = \sum_{k=1}^n (X_k - \mu_X)^2 - 2(\overline{X} - \mu_X)\sum_{k=1}^n (X_k - \mu_X) + n(\overline{X} - \mu_X)^2.$

But $\sum_{k=1}^n (X_k - \mu_X) = \sum_{k=1}^n X_k - n\mu_X = n\overline{X} - n\mu_X = n(\overline{X} - \mu_X)$. And so, we have:

$\sum_{k=1}^n (X_k - \overline{X})^2 = \sum_{k=1}^n (X_k - \mu_X)^2 - n(\overline{X} - \mu_X)^2$. Dividing both sides by $\sigma_X^2$ gives:

$\frac{1}{\sigma_X^2}\sum_{k=1}^n (X_k - \overline{X})^2 = \sum_{k=1}^n \left(\frac{X_k - \mu_X}{\sigma_X}\right)^2 - \left(\frac{\overline{X} - \mu_X}{\sigma_X/\sqrt{n}}\right)^2 = \chi_n^2 - \chi_1^2 = \chi_{n-1}^2,$

where the rightmost equality follows from the independence result (not proved).

Hence, we have shown that $\sum_{k=1}^n (X_k - \overline{X})^2 = \sigma_X^2\, \chi_{n-1}^2$. From this result, we have

$S_{n-1}^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \overline{X})^2 = \frac{\sigma_X^2}{n-1}\chi_{n-1}^2 \quad\text{and}\quad S_n^2 = \frac{1}{n}\sum_{k=1}^n (X_k - \overline{X})^2 = \frac{\sigma_X^2}{n}\chi_{n-1}^2.$

Remark 2.3 A comparison of Theorems 2.2 and 2.3 reveals the effect of replacing the mean $\mu_X$ by its estimator $\hat{\mu}_X = \overline{X}$; namely, one loses one degree of freedom in relation to the (scaled) chi-squared distribution for the estimator of the variance $\sigma_X^2$. It is not uncommon that persons not well-versed in probability and statistics use $\overline{X}$ in estimating the variance, even when the true mean is known! The price paid for this ignorance is that the variance estimator will not be as accurate.
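The price mentioned in Remark 2.3 can be seen numerically. A sketch (assumptions: N(0,1) data so that mu = 0 and sigma^2 = 1; n and nsim are arbitrary):

% Spread of the variance estimator: known mean vs. estimated mean
n = 10; nsim = 1e5;
X = randn(nsim, n);
v_known = mean(X.^2, 2);         % (1/n)*sum (Xk - mu)^2 with mu = 0 known
v_xbar  = var(X, 0, 2);          % (1/(n-1))*sum (Xk - Xbar)^2
[var(v_known) var(v_xbar)]       % ~ [2/n  2/(n-1)]: using Xbar is less accurate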

Theorem 2.4 (Theorem 8.12 in the textbook) Suppose that $Z \sim N(0,1)$ and $Y \sim \chi_v^2$ are independent. Define $T = Z/\sqrt{Y/v}$. Then the probability density function for $T$ is:

$f_T(x) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{\pi v}}\left(1 + \frac{x^2}{v}\right)^{-\frac{v+1}{2}};\quad x \in (-\infty, \infty).$

This pdf is called the (Student's) t distribution with $v$ degrees of freedom and is denoted as $t_v$.

From Theorems 2.3 and 2.4 we immediately have

Theorem 2.5 (Theorem 8.13 in the textbook) If $\hat{\mu}_X = \overline{X}$ and $\hat{\sigma}_X^2 = S_{n-1}^2$ are estimates of $\mu_X$ and $\sigma_X^2$, respectively, based on $\{X_k\}_{k=1}^n \sim iid\ X \sim N(\mu_X, \sigma_X^2)$, then

$T = \frac{\overline{X} - \mu_X}{S_{n-1}/\sqrt{n}} \sim t_{n-1}.$

(Exercise: explain how this follows from Theorems 2.3 and 2.4.)

Theorem 2.6 (Theorem 8.14 in the textbook) Suppose that $U \sim \chi_{v_1}^2$ and $V \sim \chi_{v_2}^2$ are independent. Then $F = \frac{U/v_1}{V/v_2}$ has an F distribution, and is denoted as $F_{v_1, v_2}$.

As an immediate consequence, we have

Theorem 2.7 (Theorem 8.15 in the textbook) Suppose that $\{S_{n_j-1}^2(j)\}_{j=1}^2$ are sample variances of $\{X(j) \sim N(\mu_j, \sigma_j^2)\}_{j=1}^2$, obtained using $\{X_k(1)\}_{k=1}^{n_1}$ and $\{X_k(2)\}_{k=1}^{n_2}$, respectively, and that the elements in each sample are independent, and that the samples are also independent. Then

$F = \frac{S_{n_1-1}^2/\sigma_1^2}{S_{n_2-1}^2/\sigma_2^2}$

has an F distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.

Remark 2.4 It should be clear that this statistic plays a major role in determining whether two variances are equal or not.
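As a sketch of how the statistic is used in practice (the data here are synthetic and the sample sizes hypothetical; finv is a Statistics Toolbox function):

% Two-sided 5% F test for H0: sigma1^2 = sigma2^2
n1 = 16; n2 = 21;
x1 = randn(n1,1);                   % sample 1: variance 1
x2 = 1.5*randn(n2,1);               % sample 2: variance 2.25 (H0 is false here)
F = var(x1)/var(x2);                % under H0, F ~ F(n1-1, n2-1)
flo = finv(0.025, n1-1, n2-1);
fhi = finv(0.975, n1-1, n2-1);
announceH1 = (F < flo) | (F > fhi)  % logical 1 => variances judged unequal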

Q: Suppose that $n_1 = n_2 = n$ is large enough to invoke the CLT. How could you use this to arrive at another test statistic for deciding whether two variances are equal? A: Use their DIFFERENCE.

6.3 A Summary of Useful Point Estimation Results

The majority of the results of the last section pertained to the parameters $\mu$ and $\sigma^2$. For this reason, the following summary will address each one of these parameters, individually, as the parameter of primary concern.

Results when µ is of Primary Concern - Suppose that $X \sim f_X(x; \mu_X, \sigma_X^2)$, and that $\{X_k\}_{k=1}^n$ are iid data collection variables that are to be used to investigate the unknown parameter $\mu_X$. For such an investigation, we will use $\hat{\mu}_X = \overline{X}$.

Case 1: $X \sim N(x; \mu_X, \sigma_X^2)$

(Remark 2.1): $Z = \frac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$ for any $n$, when $\sigma_X^2$ is known.

(Theorem 2.5): $T = \frac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{n}} \sim t_{n-1}$ for any $n$, when $\sigma_X^2$ is estimated by $\hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \hat{\mu}_X)^2$.

Case 2: $X \sim f_X(x; \mu_X, \sigma_X^2)$. For $n$ sufficiently large (i.e. the CLT is a good approximation):

(Remark 2.1): $Z = \frac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$ when $\sigma_X^2$ is known.

(Theorem 2.5): $Z = \frac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{n}} \sim N(0,1)$ when $\sigma_X^2$ is estimated by $\hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \hat{\mu}_X)^2$.

Results when σ² is of Primary Concern - Suppose that $X \sim f_X(x; \mu_X, \sigma_X^2)$, and that $\{X_k\}_{k=1}^n$ are iid data collection variables that are to be used to investigate the unknown parameter $\sigma_X^2$. For such an investigation, we will use $\hat{\mu}_X = \overline{X}$, when appropriate.

Case 1: $X \sim N(x; \mu_X, \sigma_X^2)$

(Example 2.2): $n\hat{\sigma}_X^2/\sigma_X^2 \sim \chi_n^2$, when $\mu_X$ is known and $\hat{\sigma}_X^2 = \frac{1}{n}\sum_{k=1}^n (X_k - \mu_X)^2$.

(Theorem 2.3): $\frac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ for any $n$, when $\mu_X$ is estimated by $\hat{\mu}_X = \overline{X}$ and $\hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \hat{\mu}_X)^2$.

(Theorem 2.7): $F = \frac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim F_{n_1, n_2}$ when $\hat{\sigma}_{X_j}^2 = \frac{1}{n_j}\sum_{k=1}^{n_j} (X_k(j) - \mu_{X_j})^2$; $j = 1, 2$.

(Theorem 2.7): $F = \frac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim F_{n_1-1, n_2-1}$ when $\hat{\sigma}_{X_j}^2 = \frac{1}{n_j - 1}\sum_{k=1}^{n_j} (X_k(j) - \overline{X}_j)^2$; $j = 1, 2$.

Case 2: $X \sim f_X(x; \mu_X, \sigma_X^2)$. For $n$ sufficiently large (i.e. the CLT is a good approximation):

(Example 2.2): $Z = \frac{\hat{\sigma}^2 - \sigma^2}{\sigma^2\sqrt{2/n}} \sim N(0,1)$.

Application of the above results to (X,Y) - For a 2-D random variable, $(X,Y)$, suppose that they are independent. Then we can define the random variable $W = X - Y$. Regardless of the independence assumption, we have $\mu_W = \mu_X - \mu_Y$. In view of that assumption, we have $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2$. Hence, we are now in the setting of a 1-D random variable, $W$, and so many of the above results apply. Even if we drop the independence assumption, we can still apply them. The only difference is that in this situation $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$. Hence, we need to realize that a larger value of $n$ may be needed to achieve an acceptable level of uncertainty in relation to, for example, $\hat{\mu}_W = \overline{W}$. Specifically, if we assume the data collection variables $\{W_k\}_{k=1}^n$ are iid, then

$\sigma_{\overline{W}}^2 = \sigma_W^2/n = (\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY})/n.$

The perhaps more interesting situation is where we are concerned with either $\sigma_{XY}$ or $\rho_{XY} = \sigma_{XY}/(\sigma_X \sigma_Y)$.

Of these two parameters, $\sigma_{XY}$ is the more tractable one to address using the above results. To see this, consider the estimator:

$\hat{\sigma}_{XY} = \frac{1}{n}\sum_{k=1}^n (X_k - \mu_X)(Y_k - \mu_Y).$

Define the random variable $W = (X - \mu_X)(Y - \mu_Y)$. Then we have

$\hat{\sigma}_{XY} = \frac{1}{n}\sum_{k=1}^n (X_k - \mu_X)(Y_k - \mu_Y) = \frac{1}{n}\sum_{k=1}^n W_k = \overline{W} = \hat{\mu}_W.$

And so, we are now in the setting where we are concerned with $\hat{\mu}_W = \hat{\sigma}_{XY}$. In relation to $\overline{W} = \hat{\sigma}_{XY}$, we have:

$E(\overline{W}) = E(\hat{\sigma}_{XY}) = E\left[\frac{1}{n}\sum_{k=1}^n (X_k - \mu_X)(Y_k - \mu_Y)\right] = \frac{1}{n}\sum_{k=1}^n E[(X_k - \mu_X)(Y_k - \mu_Y)] = \frac{1}{n}\, n\,\sigma_{XY} = \sigma_{XY} = \mu_W.$

We also have $Var(\overline{W}) = Var(W)/n$. The trick now is to obtain an expression for $Var(W)$. To this end, we begin with:

$Var(W) = E[(W - \mu_W)^2] = E(W^2) - \mu_W^2 = E[\{(X - \mu_X)(Y - \mu_Y)\}^2] - \sigma_{XY}^2$. This can be written as:

$Var(W) = \sigma_X^2\sigma_Y^2\, E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^2\left(\frac{Y - \mu_Y}{\sigma_Y}\right)^2\right] - \rho^2\sigma_X^2\sigma_Y^2 = \sigma_X^2\sigma_Y^2\{E[(Z_1 Z_2)^2] - \mu_{Z_1 Z_2}^2\} = \sigma_X^2\sigma_Y^2\, Var(Z_1 Z_2).$

And so, we are led to ask the

Question: Suppose that $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ have $E(Z_1 Z_2) = \rho$. Then what is the variance of $W = Z_1 Z_2$?

Answer: As noted above, the mean of $W$ is $E(Z_1 Z_2) = \rho$. Before getting too involved in the mathematics, let's run some simulations. The following plot is for $\rho = 0.5$:


Figure 3.1 Plot of the simulation-based estimate of $f_W(w)$ for $\rho = 0.5$. The simulations also resulted in $\hat{\mu}_W = 0.4767$ and $\hat{\sigma}_W^2 = 1.1716$. The Matlab code is given below.

% PROGRAM name: z1z2.m
% This code uses simulations to investigate the pdf of W = Z1*Z2
% where Z1 & Z2 are unit normal r.v.s with cov = r
nsim = 10000;
r = 0.5;
mu = [0 0];
Sigma = [1 r; r 1];
Z = mvnrnd(mu, Sigma, nsim);
plot(Z(:,1),Z(:,2),'.');
pause
W = Z(:,1).*Z(:,2);
Wmax = max(W); Wmin = min(W);
db = (Wmax - Wmin)/50;
bctr = Wmin + db/2 : db : Wmax - db/2;
fw = hist(W,bctr);
fw = (nsim*db)^-1 * fw;
bar(bctr,fw)
title('Simulation-based pdf for W with r = 0.5')

Conclusion: The simulations revealed a number of interesting things. For one, the mean was $\hat{\mu}_W = 0.4767$. One would have thought that for 10,000 simulations it would have been closer to the true mean 0.5. Also, and not unrelated to this, is the fact that $f_W(w)$ has very long tails. Based on this simple simulation-based analysis, one might be thankful that the mathematical pursuit was not readily undertaken, as it would appear that it would be a formidable undertaking!

QUESTION: How can knowledge of $f_W(w)$ be used in relation to estimating $\rho_{XY}$ from $\{(X_k, Y_k)\}_{k=1}^n$?

Computation of $f_W(w)$: Method 1: Use THEOREM 7.2 on p.248 of the book. [A nice example of its application.]

Let $f(x_1, x_2)$ be the joint pdf of the 2-D continuous random variable $(X_1, X_2)$. Let $Y_1 = X_1$ and $Y_2 = X_1 X_2$. Then $x_1 = y_1$ and $x_2 = y_2/y_1$. And so, the Jacobian is:

$J = \det\begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 \end{bmatrix} = \det\begin{bmatrix} 1 & 0 \\ -y_2/y_1^2 & 1/y_1 \end{bmatrix} = \frac{1}{y_1}.$

Hence, the pdf for $(Y_1, Y_2)$ is:

$g(y_1, y_2) = f(x_1, x_2)\,|J| = \frac{1}{|y_1|}\, f(y_1,\, y_2/y_1).$

1 2 2 (x1 x2 2 x1x2 ) 1 2(1 2 ) We now apply this to: f (x1, x2 )  e [see DEFINITION 6.8 on p.220.] 2 1  2

$g(y_1, y_2) = \frac{1}{|y_1|}\, f(y_1,\, y_2/y_1) = \frac{1}{2\pi |y_1| \sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left[y_1^2 + (y_2/y_1)^2 - 2\rho y_2\right]}.$ Hence,

1  [ y2 ( y / y )2 2 y ] 1 2 1 2 1 2 f (y ) e 2(1 ) dy (***) Y2 2  2 1 | y1 | 2 1 

I will proceed with a possibly useful integral, but this will not, in itself, give a closed form for (***). [Table of Integrals, Series, and Products by Gradshteyn & Ryzhik p.307 #3.325]

$\int_0^{\infty} e^{-\left(a x^2 + \frac{b}{x^2}\right)}\, dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}\, e^{-2\sqrt{ab}}.$
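As a sanity check, the identity is easy to verify numerically (base Matlab integral; a and b are arbitrary positive test values):

% Numerical check of G&R 3.325
a = 2; b = 3;
lhs = integral(@(x) exp(-(a*x.^2 + b./x.^2)), 0, Inf);
rhs = 0.5*sqrt(pi/a)*exp(-2*sqrt(a*b));
[lhs rhs]                        % the two entries should agree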

Method 2 (use characteristic functions): Assume that X and Y are two independent standard normal random variables and let us compute the characteristic function of W = XY.

One knows that $E(e^{i\lambda X}) = e^{-\lambda^2/2}$. Hence, $E(e^{i\lambda XY} \mid Y) = e^{-\lambda^2 Y^2/2}$. And so:

$\varphi_W(\lambda) = E(e^{i\lambda XY}) = E[E(e^{i\lambda XY} \mid Y)] = E[e^{-\lambda^2 Y^2/2}] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\lambda^2 y^2/2}\, e^{-y^2/2}\, dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(1+\lambda^2)y^2/2}\, dy = \frac{1}{\sqrt{1+\lambda^2}},$

where the last equality follows because $\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-y^2/(2\sigma^2)}\, dy = 1$ with $\sigma^2 = 1/(1+\lambda^2)$.

Hence,

$f_W(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi_W(\lambda)\, e^{-i\lambda w}\, d\lambda = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-i\lambda w}}{\sqrt{1+\lambda^2}}\, d\lambda = \frac{1}{\pi}\int_0^{\infty} \frac{\cos(\lambda w)}{\sqrt{1+\lambda^2}}\, d\lambda = \frac{1}{\pi}\, K_0(w),$

where (G&R p.419, entry 3.754-2) $\int_0^{\infty} \frac{\cos(\lambda w)}{\sqrt{1+\lambda^2}}\, d\lambda = K_0(w) = \frac{i\pi}{2} H_0^{(1)}(iw)$ (G&R p.952, entry 8.407-1), and $H_0^{(1)}(iw)$ is a Bessel function of the third kind, also called a Hankel function. Yuck!!!
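For the independent case treated by this method, the result is concrete: $f_W(w) = \frac{1}{\pi}K_0(|w|)$, where $K_0$ is the modified Bessel function available in base Matlab as besselk(0,·). A sketch comparing this density against a simulation:

% Density of W = Z1*Z2 for independent N(0,1) factors: (1/pi)*K0(|w|)
nsim = 1e5;
W = randn(nsim,1).*randn(nsim,1);
db = 0.2;
bctr = -4+db/2 : db : 4-db/2;
fsim = hist(W, bctr)/(nsim*db);
wpos = linspace(0.02, 4, 200);   % avoid w = 0, where K0 has a log singularity
fpos = besselk(0, wpos)/pi;
bar(bctr, fsim), hold on
plot([-fliplr(wpos) wpos], [fliplr(fpos) fpos], 'r', 'LineWidth', 2), hold off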

For my own possible benefit: The OP mentions the characteristic function of the product of two independent Brownian motions, say the processes $(X_t)_t$ and $(Y_t)_t$. The above yields the distribution of $Z_1 = X_1 Y_1$ and, by homogeneity, of $Z_t = X_t Y_t$, which is distributed like $t Z_1$ for every $t \ge 0$. However, this does not determine the distribution of the process $(Z_t)_t$. For example, to compute the distribution of $(Z_t, Z_s)$ for $0 \le t \le s$, one could write the increment $Z_s - Z_t$ as $Z_s - Z_t = X_t(Y_s - Y_t) + Y_t(X_s - X_t) + (X_s - X_t)(Y_s - Y_t)$, but the terms $X_t(Y_s - Y_t)$ and $Y_t(X_s - X_t)$ show that $Z_s - Z_t$ is not independent of $Z_t$ and that $(Z_t)_t$ is probably not Markov. [NOTE: Take $X$ and $Y$ two Gaussian random variables with mean 0 and variance 1. Since they have the same variance, $X - Y$ and $X + Y$ are independent.] □

6.4 Simple Hypothesis Testing

Example 4.1 (Problem 8.78 on p.293 of the textbook) A random sample of size 25 from a normal population had mean value 47 and standard deviation 7. If we base our decision on Theorem 2.5 (t-distribution), can we say that this information supports the claim that the population mean is 42?

Solution: While not formally stated as such, this is a problem in hypothesis testing. Specifically,

$H_0: \mu_X = 42$ versus $H_1: \mu_X \ne 42$.

The natural rationale for deciding which hypothesis to announce is simple:

If $\hat{\mu}_X = \overline{X}$ is 'sufficiently close' to 42, then we will announce $H_0$. Regardless of whether or not $H_0$ is true, we know that

$T = \frac{\overline{X} - \mu_X}{S_{24}/\sqrt{25}} \sim t_{24}.$

X  42 T  ~ t Assuming that, indeed, H0 is true, (i.e.  X  42 ), then 2 24 . A plot of fT (t) is shown S25 / n below.


Figure 4.1 A plot of $f_T(t)$ for $T \sim t_{24}$.

Based on our above rationale for announcing $H_0$ versus $H_1$, if a measurement of this $T$ is sufficiently large in magnitude, we should announce $H_1$, even if $H_0$ is, in fact, true. Suppose that we were to use a threshold value $t_{th}$. Then our decision rule is:

$[\,|T| \le t_{th}\,]$ = The event that we will announce $H_0$. Or, equivalently:

$[\,|T| > t_{th}\,]$ = The event that we will announce $H_1$.

If, in fact, H0 is true, then our false alarm probability is:

$\Pr[\,|T| > t_{th}\,]$, where $T = \frac{\overline{X} - 42}{S_{24}/\sqrt{25}} \sim t_{24}$. There are a number of ways to proceed from here.

Case 1: Specify the false alarm probability, $\alpha$, that we are willing to risk. Suppose, for example, that we choose $\alpha = .05$ (i.e. we are willing to risk wrongly announcing $H_1$ with a 5% probability). Then requiring

$\Pr[\,|T| > t_{th}\,] = .05$, where $T = \frac{\overline{X} - 42}{S_{24}/\sqrt{25}} \sim t_{24}$,

results in the threshold $t_{th}$ = tinv(.975,24) = 2.064. From the above data, we have

$t = \frac{\bar{x} - \mu_X}{s/\sqrt{n}} = \frac{47 - 42}{7/\sqrt{25}} \cong 3.57.$

Since this value is greater than 2.064, we should announce $H_1$.

Case 2: Use the data to first compute $t = \frac{47 - 42}{7/\sqrt{25}} \cong 3.57$, and then use this as our threshold, $t_{th}$. Then, because our value is $t = 3.57$, this threshold has been met, and we should announce $H_1$. In this case, the probability that we are wrong becomes: $\Pr[\,|T| > 3.57\,] = 2\Pr[T < -3.57]$ = 2*tcdf(-3.57,24) = 0.0015.
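Both cases reduce to a few toolbox calls. A sketch:

% Example 4.1 in code: threshold, statistic, and p-value
n = 25; xbar = 47; s = 7; mu0 = 42;
t    = (xbar - mu0)/(s/sqrt(n))          % ~3.57
tth  = tinv(0.975, n-1)                  % 2.064 (Case 1: 5% two-sided threshold)
pval = 2*tcdf(-abs(t), n-1)              % ~0.0015 (Case 2: the p-value)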

QUESTION: Are we willing to risk announcing $H_1$ with the probability of being wrong equal to .0015? ANSWER: If we were willing to risk a 5% false alarm probability, then surely we would risk an even lower probability of being wrong in announcing $H_1$. Hence, we should announce $H_1$.

Comment: In both cases we ended up announcing H1 . So, what does it matter which case we abide by?

The answer, with a little thought, should be obvious. In case 1 we are announcing $H_1$ with a 5% probability of being wrong; whereas in case 2 we are announcing $H_1$ with a 0.15% probability of being wrong. In other words, we were willing to take much less risk, and even then we announced $H_1$. □

The false alarm probability obtained in case 2 of the above example has a name. It is called the p-value of the test. It is the smallest false alarm probability that we can achieve in using the given data to announce $H_1$.

As far as the problem goes, as stated in the textbook, we are finished. However, we have ignored the second type of error that we could make; namely, announcing $H_0$ when, in fact, $H_1$ is true. And there is a reason that this error (called the type-2 error) is often ignored. It is because in the situation where $H_0$ is true, we have a numerical value for $\mu_X$ (in this case, $\mu_X = 42$). However, if $H_1$ is true, then all we know is that $\mu_X \ne 42$. And so, to compute a type-2 error probability, we need to specify a value for $\mu_X$.

Suppose, for example, that we assume $\mu_X = 45$. Suppose further that we maintain the threshold $t_{th} = 3.57$. Now we have $T_{45} = \frac{\overline{X} - 45}{S_{24}/\sqrt{25}} \sim t_{24}$, while our chosen test statistic remains $T_{42} = \frac{\overline{X} - 42}{S_{24}/\sqrt{25}}$.

And so, the question becomes: How does the random variable $T_{42}$ relate to the random variable $T_{45}$? To answer this question, write:

X  42 X  45  3 3  T    T  T W 42 2 2 45 2 45 (4.1) S25 / n S25 / n S25 / n 3  W where 2 is a random variable; and not a pleasant one, at that! To see why this point is S25 / n 2 important, let’s presume for the moment that we actually know  X . In this case, we would have X  45 X  42 3  Z  ~ N(0,1) T   Z   Z  c 2 . Hence, 42 2 2 45 . In words, T42 is a normal random  X / n  X / n  X / n 3  c variable with mean 2 and with variance equal to one. Using the above data, assume  X / n 2 2 2  X / n  7 / 5  1.4 . Then c  2.143. This pdf is shown below.


Figure 4.2 The pdf for $T_{42} \sim N(2.143, 1)$. Now let's return to the above two cases.

Case 1 (continued): Our decision rule for announcing $H_1$ with a 5% false alarm probability relied on the event $[\,|T_{42}| \le 2.064\,]$. This event is shown as the green double arrow in Figure 4.2. The probability of this event is computed as

$\Pr[\,|T_{42}| \le 2.064\,]$ = normcdf(2.064,2.143,1) - normcdf(-2.064,2.143,1) = 0.47.

Hence, if, indeed, the true mean is $\mu_X = 45$, then under the assumption that the CLT is valid, we have a 47% chance of announcing that $\mu_X = 42$ using this decision rule.

Case 2 (continued): Our decision rule for announcing $H_1$ with false alarm probability 0.15% relied on the event $[\,|T_{42}| \le 3.57\,]$. This event is shown as the red double arrow in Figure 4.2. The probability of this event is computed as

$\Pr[\,|T_{42}| \le 3.57\,]$ = normcdf(3.57,2.143,1) - normcdf(-3.57,2.143,1) = 0.92.

Hence, if, indeed, the true mean is $\mu_X = 45$, then under the assumption that the CLT is valid, we have a 92% chance of announcing that $\mu_X = 42$ using this decision rule.
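Both type-2 error computations fit in a few lines, using the normal approximation $T_{42} \sim N(c, 1)$ developed above. A sketch:

% Probability of announcing H0 when mu_X = 45 (type-2 error), both thresholds
c = 3/(7/sqrt(25));                                         % mean shift c = 2.143
beta_case1 = normcdf(2.064, c, 1) - normcdf(-2.064, c, 1)   % ~0.47
beta_case2 = normcdf(3.57,  c, 1) - normcdf(-3.57,  c, 1)   % ~0.92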

In conclusion, we see that the smaller we make the false alarm probability, the greater the type-2 error probability will be. Taken to the extreme, suppose that, no matter what the data say, we will simply not announce $H_1$. Then, of course, our false alarm probability is zero, since we will never sound that alarm. However, by never sounding the alarm, we are guaranteed that, with probability one, we will announce $H_0$ when $H_1$ is true. A reasonable question to ask is: How can one select acceptable false alarm and type-2 error probabilities? One answer is given by the Neyman-Pearson Lemma. We will not go into this lemma in these notes. The interested reader might refer to: http://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma

Finally, we are in a position to return to the problem entailed in (4.1), where we see that we are not subtracting a number (which would merely shift the mean), but a random variable. Hence, one cannot simply say that (4.1) is a mean-shifted $t_{24}$ random variable. We will leave it to the interested reader to pursue this further. One simple approach to begin with would use simulations to arrive at the pdf for (4.1). □

Example 4.2 (Problem 8.79 on p.293 of the textbook) A random sample of size n = 12 from a normal population has (sample) mean $\bar{x} = 27.8$ and (sample) variance $s^2 = 3.24$. If we base our decision on the statistic of Theorem 8.13 (i.e. Theorem 2.5 of these notes), can we say that the given information supports the claim that the mean of the population is $\mu_X = 28.5$?

Solution: The statistic to be used is $T = \frac{\overline{X} - \mu_X}{S_{11}/\sqrt{n}} \sim t_{11}$. And so, the corresponding measurement of $T$ obtained from the data is $t = \frac{27.8 - 28.5}{\sqrt{3.24/12}} = -1.347$. Since the sample mean $\bar{x} = 27.8$ is less than the speculated true mean $\mu_X = 28.5$ under $H_0$, a reasonable hypothesis setting would ask the question: Is 27.8 really that much smaller than 28.5? This leads to the decision rule: $[T \le -1.347] \Rightarrow$ Announce $H_1$. In the event that, indeed, $T = \frac{\overline{X} - 28.5}{S_{11}/\sqrt{n}} \sim t_{11}$, $\Pr[T \le -1.347] = 0.1025$. In words, there is a 10% chance that a measurement of $T$ would be contained in the interval $(-\infty, -1.347]$ if, indeed, $\mu_X = 28.5$.
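A quick numerical check of the statistic and its one-sided p-value (Statistics Toolbox tcdf):

% Example 4.2: one-sided t test
n = 12; xbar = 27.8; s2 = 3.24; mu0 = 28.5;
t    = (xbar - mu0)/sqrt(s2/n)           % ~ -1.347
pval = tcdf(t, n-1)                      % ~ 0.1025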

And so now, one needs to decide whether or not one should announce $H_1$. On the one hand, one could announce $H_1$ with a reported p-value of 0.10 and let management make the final announcement. On the other hand, if management has, for example, already 'laid down the rule' that it will not accept a false alarm probability greater than, say, 0.05, then one should announce $H_0$. □ We now proceed to some examples concerning tests for an unknown variance.

Example 4.3 (EXAMPLE 13.6 on p.410 of the textbook) Let $X$ denote the thickness of a part used in a semiconductor. Using the iid data collection variables $\{X_k\}_{k=1}^{18}$ it was found that $s^2 = 0.68$. The manufacturing process is considered to be in control if $\sigma_X^2 \le 0.36$. Assuming $X$ is normally distributed, test the null hypothesis $H_0: \sigma_X^2 = 0.36$ against the alternative hypothesis $H_1: \sigma_X^2 > 0.36$ at a 5% significance level (i.e. a 5% false alarm probability).

Solution: From Theorem 2.3 we have $Q = \frac{17\,\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{17}^2$. From the data, assuming $H_0$ is true, we have $q = \frac{(18-1)(0.68)}{0.36} = 32.11$. The threshold value, $q_{th}$, is obtained from $\Pr[Q > q_{th}] = .05$.

Hence, $q_{th}$ = chi2inv(.95,17) = 27.587. Since $q = 32.11 > 27.587$, we will announce $H_1$.

Now let's obtain the p-value for this test. Using $q = 32.11$ in place of $q_{th}$, $\Pr[Q > 32.11] = .0146$ is the p-value.

Next, suppose that instead of using the sample mean (which is, by the way, not reported) one had used the design mean thickness. Then we have $Q = \frac{18\,\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{18}^2$. In this case, $\Pr[Q > 32.11] = .0213$ is the p-value. And so, at a 5% significance level, we would still announce $H_1$. □
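The computations for both degree-of-freedom choices, as a sketch (toolbox chi2inv, chi2cdf):

% Example 4.3: one-sided chi-squared test for the variance
n = 18; s2 = 0.68; sig2_0 = 0.36;
q      = (n-1)*s2/sig2_0                 % 32.11
qth    = chi2inv(0.95, n-1)              % 27.587: q > qth, so announce H1
pval17 = 1 - chi2cdf(q, n-1)             % ~0.0146 (sample mean used: 17 d.o.f.)
pval18 = 1 - chi2cdf(q, n)               % ~0.0213 (design mean used: 18 d.o.f.)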

Example 4.4 Suppose that X = The act of recording the number of false statements of a certain news network in any given day. We will assume that $X$ has a Poisson distribution, with mean $\mu_X = 2$. Let $\{X_k\}_{k=1}^5 \sim iid\ X$ denote the daily numbers of false statements throughout any 5-day work week. In order to determine whether or not the network has decided to increase the number of false statements, we will test $H_0: \mu_X = 2$ versus $H_1: \mu_X > 2$, where $\hat{\mu}_X = \overline{X} = \frac{1}{5}\sum_{k=1}^5 X_k$ is the average daily number of false statements in any given week. We desire a false alarm probability of ~10%.

QUESTION: How should we proceed?

ANSWER: Simulations!

%PROGRAM NAME: avgdefects.m
nsim = 40000;
n = 5;
mu = 2;
x = poissrnd(mu,n,nsim);
muhat = mean(x);
db = 1/n;
bctr = 0:db:5;
fmuhat = nsim^-1 * hist(muhat,bctr);
figure(1)
stem(bctr,fmuhat)
hold on
pause
for m = 1:10
    x = poissrnd(mu,n,nsim);
    muhat = mean(x);
    fmuhat = nsim^-1 * hist(muhat,bctr);
    stem(bctr,fmuhat)
end
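Simulation is instructive here, but this particular test also has an exact answer: under $H_0$ the weekly total $\sum X_k$ is Poisson with mean $n\mu = 10$, so the threshold comes straight off the Poisson cdf. A sketch (toolbox poissinv, poisscdf):

% Exact threshold for the test of H0: mu = 2 vs H1: mu > 2 (no simulation)
n = 5; mu0 = 2;
lambda = n*mu0;                 % sum(Xk) ~ Poisson(10) under H0
c = poissinv(0.90, lambda)      % smallest c with Pr[sum <= c] >= 0.90
alpha = 1 - poisscdf(c, lambda) % achieved false alarm prob (~0.08, near the desired ~10%)
% Decision rule: announce H1 when muhat = sum(Xk)/n > c/n.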

6.5 Confidence Intervals

The concept of a confidence interval for an unknown parameter, θ, is intimately connected to the concept of simple hypothesis testing. To see this, consider the following example.

Example 5.1 Let X = The act of recording the body temperature of a person who is in deep meditation. We will assume that $X \sim N(\mu_X, \sigma_X^2)$. The data collection random variables $\{X_k\}_{k=1}^n$ are to be used to investigate the unknown parameter $\mu_X$. We will assume that $\sigma_X = 1.2$ °F is known and that a sample size of n = 25 will be used. The estimator $\hat{\mu}_X = \overline{X}$ will be used in the investigation.

Hypothesis Testing:

In the hypothesis testing portion of the investigation we begin by assuming that, under $H_0$, the parameter $\mu_X$ is a known quantity. Let's assume that it is 98.6 °F. In other words, we are assuming that deep meditation has no influence on $\mu_X$. Since we are investigating whether or not meditation has an effect on $\mu_X$ in any way, our test, formally, is:

$H_0: \mu_X = 98.6$ °F versus $H_1: \mu_X \ne 98.6$ °F.

Our decision rule for this test is, in words: If $\hat{\mu}_X = \overline{X}$ is sufficiently close to $\mu_X^{(0)} = 98.6$ °F, then we will announce $H_0$. The event, call it $A$, that is described by these words then has the form:

$A = [\,|\hat{\mu}_X - \mu_X^{(0)}| < \epsilon_{th}\,].$

Here, the number $\epsilon_{th}$ is the threshold temperature difference. Suppose that we require that, in the event that $H_0$ is true, we have

$\Pr(A) = \Pr[\,|\hat{\mu}_X - \mu_X^{(0)}| < \epsilon_{th}\,] = .90. \quad (5.1)$

In other words, we require a false alarm probability (or significance level) of size 0.10.

In the last section, we used the standard test statistic $Z = \frac{\hat{\mu}_X - \mu_X}{\sigma_{\hat{\mu}_X}}$. Then, assuming that $H_0$ is, indeed, true, it follows that $Z = \frac{\hat{\mu}_X - \mu_X^{(0)}}{\sigma_{\hat{\mu}_X}} \sim N(0,1)$. To convert (5.1) into an event involving this $Z$, we use the concept of equivalent events:

$\Pr(A) = \Pr[\,|\hat{\mu}_X - \mu_X^{(0)}| < \epsilon_{th}\,] = \Pr\left[\frac{|\hat{\mu}_X - \mu_X^{(0)}|}{\sigma_{\hat{\mu}_X}} < \frac{\epsilon_{th}}{\sigma_{\hat{\mu}_X}}\right] = \Pr[\,|Z| < z_{th}\,] = 0.90. \quad (5.2)$

From (5.2) we obtain

$z_{th}$ = norminv(0.95, 0, 1) = 1.645. (5.3)

Hence, if the magnitude of $Z = \frac{\hat{\mu}_X - \mu_X^{(0)}}{\sigma_{\hat{\mu}_X}}$ exceeds 1.645, then we will announce $H_1$.

Now, instead of going through the procedure of using the standard test statistic, Z, let's use (5.1) more directly. Specifically, by symmetry,

$\Pr[\hat{\mu}_X - \mu_X^{(0)} \ge \epsilon_{th}] = \Pr[\hat{\mu}_X \ge \mu_X^{(0)} + \epsilon_{th}] = \Pr[\hat{\mu}_X \ge x_{th}] = 0.05. \quad (5.4)$

Because $\sigma_{\hat{\mu}_X} = \sigma_X/\sqrt{n} = 1.2/\sqrt{25} = 0.24$, from (5.4) we obtain

$x_{th}$ = norminv(0.95, 98.6, 0.24) = 98.995 °F. (5.5)

Hence, there was no need to use the standard statistic! Furthermore, the threshold (5.5) is in units of °F, making it much more attractive than the dimensionless units of (5.3).
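The two routes, side by side, as a sketch (toolbox norminv):

% Example 5.1: standardized route vs. direct (degrees F) route
mu0 = 98.6; sig = 1.2; n = 25;
s   = sig/sqrt(n);               % sigma of muhat: 0.24
zth = norminv(0.95, 0, 1)        % 1.645 (route via Z)
xth = norminv(0.95, mu0, s)      % 98.995 degrees F (direct route)
% The decisions agree: (xbar - mu0)/s > zth  <=>  xbar > xth.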

QUESTION: Given that using (5.1) to arrive at (5.5) is much more direct than using (5.2) and (5.3), why in the world did we not use the more direct method in the last section? ANSWER: The method used in the last section is a method that has been used for well over 100 years. It was the only method one could use when only a unit-normal z-table was available. Only in very recent years has statistical software been designed that obviates the need for such tables. However, there are random variables other than the normal type for which a standardized table must still be used. One example is $T \sim t_n$: a linear function of $T$ is NOT a $t$ random variable, whereas a linear function of a normal random variable IS a normal random variable.

Confidence Interval Estimation: We begin here with the same event that we began hypothesis testing with; namely,

$A = [\,|\hat{\mu}_X - \mu_X| < \epsilon_{th}\,].$

We then form the same probability that we formed in hypothesis testing; namely,

$\Pr(A) = \Pr[\,|\hat{\mu}_X - \mu_X| < \epsilon_{th}\,] = .90. \quad (5.6)$

A close inspection of equations (5.1) and (5.6) will reveal a slight difference. Specifically, in (5.1) we have the parameter $\mu_X^{(0)}$, which is the known value of $\mu_X$ (= 98.6 °F) under $H_0$. In contrast, the parameter $\mu_X$ in (5.6) is not assumed to be known. The goal of this investigation is to arrive at a 90% 2-sided confidence interval for this unknown $\mu_X$. From (5.2) we have:

$\Pr(A) = \Pr[\,|\hat{\mu}_X - \mu_X| < \epsilon_{th}\,] = \Pr\left[-1.645 < \frac{\hat{\mu}_X - \mu_X}{0.24} < 1.645\right] = .90.$

Finally, using the concept of equivalent events, we have

$\Pr[\hat{\mu}_X - 1.645(0.24) < \mu_X < \hat{\mu}_X + 1.645(0.24)] = \Pr[\hat{\mu}_X - .395 < \mu_X < \hat{\mu}_X + .395] = .90.$

Notice that the event $A$ has been manipulated into an equivalent event that is expressed via the interval

$[\hat{\mu}_X - .395\ ,\ \hat{\mu}_X + .395]. \quad (5.7)$

It is the interval (5.7) that is called the 90% 2-sided Confidence Interval (CI) for the unknown mean $\mu_X$.

Remark 5.1 The interval (5.7) is an interval whose endpoints are random variables. In other words, the CI described by (5.7) is a random endpoint interval. It is NOT a numerical interval. Even though (5.7) is a random endpoint interval, the interval width is 0.79; that is, it is not random.

To complete this example, suppose that after collecting the data, we arrive at $\hat{\mu}_X = 97.2$ °F. Inserting this number into (5.7) gives the numerical interval [96.805 , 97.595]. This interval is not a CI. Rather, it is an estimate of the CI. Furthermore, one cannot say that $\Pr[96.805 < \mu_X < 97.595] = 0.9$. Why?

Because even though $\mu_X$ is not known, it is a number; not a random variable. Hence, this equation makes no sense. Either the number $\mu_X$ is in the interval [96.805 , 97.595] or it isn't. Period! For example, what would you say to someone who asked you the probability that the number 2 is in the interval [1 , 3]? Hopefully, you would be kind in your response to such a person, when saying "Um… the number 2 is in the interval [1 , 3] with probability one." In summary, a 90% two-sided CI estimate should not be taken to mean that the probability that $\mu_X$ is in that interval is 90%. Rather, it should be viewed as one measurement of a random endpoint interval, such that, were repeated measurements to be conducted again and again, then overall, one should expect that 90% of those numerical intervals will include the unknown parameter $\mu_X$. □

A General Procedure for Constructing a CI: Let $\theta$ be an unknown parameter for which a $(1-\alpha) \times 100\%$ 2-sided CI is desired, and let $\hat{\theta}$ be the associated point estimator of $\theta$.

Step 1: Identify a standard random variable $W$ that is a function of both $\theta$ and $\hat{\theta}$; say, $W = g(\theta, \hat{\theta})$.

Step 2: Write $\Pr[w_1 < W < w_2] = \Pr[w_1 < g(\theta, \hat{\theta}) < w_2] = 1 - \alpha$. Then, from this, find the numerical values of $w_1$ and $w_2$.

Step 3: Use the concept of equivalent events to arrive at an expression of the form:

$\Pr[w_1 < W < w_2] = \Pr[w_1 < g(\theta, \hat{\theta}) < w_2] = \Pr[h_1(w_1, \hat{\theta}) < \theta < h_2(w_2, \hat{\theta})] = 1 - \alpha.$

The desired CI is then the random endpoint interval $[h_1(w_1, \hat{\theta})\ ,\ h_2(w_2, \hat{\theta})]$. □

The above procedure is now demonstrated in a number of simple examples.

Example 5.2 Arrive at a 98% two-sided CI estimate for $\mu_X$, based on $\hat{\mu}_X = 4.4$, $\hat{\sigma}_X = .25$ and n = 10. Assume that the data collection variables $\{X_k\}_{k=1}^{10}$ are iid $N(\mu_X, \sigma_X^2)$.

Solution: Even though the sample mean $\hat{\mu}_X \sim N(\mu_X, \sigma_X^2/10)$, we do not have knowledge of $\sigma_X$, and so will need to use $\hat{\sigma}_X$. Since n = 10 is too small to invoke the CLT, we will use the test statistic

Step 1: $T = \frac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{10}} \sim t_9$.

Step 2: $\Pr[t_1 < T < t_2] = .98 \Rightarrow t_2$ = tinv(.99,9) = 2.82 and $t_1 = -2.82$.

Step 3: $[t_1 < T < t_2] = \left[t_1 < \frac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{10}} < t_2\right] = [\hat{\mu}_X - t_2\,\hat{\sigma}_X/\sqrt{10} < \mu_X < \hat{\mu}_X + t_2\,\hat{\sigma}_X/\sqrt{10}]$.

Hence, the desired CI is the random endpoint interval $[\hat{\mu}_X - 2.82\,\hat{\sigma}_X/\sqrt{10}\ ,\ \hat{\mu}_X + 2.82\,\hat{\sigma}_X/\sqrt{10}]$. Using the above data, we arrive at the CI estimate $[4.4 - 2.82(.25)/\sqrt{10}\ ,\ 4.4 + 2.82(.25)/\sqrt{10}] = 4.4 \pm .223$. □
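The same CI estimate in code, as a sketch:

% Example 5.2: 98% two-sided t-based CI estimate for mu_X
n = 10; muhat = 4.4; sighat = 0.25;
t2 = tinv(0.99, n-1);                    % 2.82
ci = muhat + [-1 1]*t2*sighat/sqrt(n)    % 4.4 +/- 0.223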

Example 5.3 In an effort to estimate the variability of the temperature, $T$, of a pulsar, physicists used n = 100 random measurements. The data gave the following statistics:

$\hat{\mu}_T = 2.1 \times 10^5$ °K and $\hat{\sigma}_T = .0008 \times 10^5$ °K.

Obtain a 95% 2-sided CI estimate for $\sigma_T$.

Solution: We will begin by using the point estimator of $\sigma_T^2$: $\hat{\sigma}_T^2 = \frac{1}{99}\sum_{k=1}^{100} (T_k - \overline{T})^2$, where $\overline{T} = \hat{\mu}_T$.

Step 1: The most common test statistic in relation to a variance is $\chi = \frac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ (c.f. Theorem 2.3).

Step 2: $\Pr[x_1 < \chi < x_2] = 0.95$ gives $\Pr[\chi < x_1] = 0.025 \Rightarrow x_1$ = chi2inv(.025,99) = 73.36, and $\Pr[\chi < x_2] = 0.975 \Rightarrow x_2$ = chi2inv(.975,99) = 128.42.

Step 3:

$[x_1 < \chi < x_2] = \left[x_1 < \frac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} < x_2\right] = \left[\frac{(n-1)\hat{\sigma}_X^2}{x_2} < \sigma_X^2 < \frac{(n-1)\hat{\sigma}_X^2}{x_1}\right] = \left[\sqrt{\frac{n-1}{x_2}}\,\hat{\sigma}_X < \sigma_X < \sqrt{\frac{n-1}{x_1}}\,\hat{\sigma}_X\right].$

Hence, the 95% 2-sided CI is the random endpoint interval $\left[\sqrt{\frac{99}{128.42}}\,\hat{\sigma}_X\ ,\ \sqrt{\frac{99}{73.36}}\,\hat{\sigma}_X\right]$. From this and the above data, we obtain the CI estimate:

$\left[\sqrt{\frac{99(.0008^2)}{128.42}}\ ,\ \sqrt{\frac{99(.0008^2)}{73.36}}\right] = [0.00070\ ,\ 0.00093].$
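The chi-squared CI in code, as a sketch:

% Example 5.3: 95% two-sided CI estimate for sigma_T
n = 100; sighat = 0.0008;                % units of 1e5 degrees K
x1 = chi2inv(0.025, n-1);                % 73.36
x2 = chi2inv(0.975, n-1);                % 128.42
ci = sighat*sqrt((n-1)./[x2 x1])         % [0.00070 0.00093]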

Now, because $\hat{\sigma}_T^2 = \frac{1}{99}\sum_{k=1}^{100}(T_k - \overline{T})^2$ is (essentially) an average of iid random variables, and because n = 100 would seem to be large enough to invoke the CLT, we repeat the above solution with this knowledge.

Solution: $\frac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ has mean $n-1$ and variance $2(n-1)$. Writing $\hat{\sigma}_X^2 = \frac{\sigma_X^2}{n-1}\chi_{n-1}^2$, it follows that

$\mu_{\hat{\sigma}_X^2} = \frac{\sigma_X^2}{n-1}(n-1) = \sigma_X^2$

and

$\sigma_{\hat{\sigma}_X^2}^2 = \left(\frac{\sigma_X^2}{n-1}\right)^2 2(n-1) = \frac{2\sigma_X^4}{n-1}.$

And so, from the CLT, we have

Step 1: $Z = \frac{\hat{\sigma}_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/(n-1)}} \sim N(0,1)$.

Step 2: $\Pr[z_1 < Z < z_2] = .95 \Rightarrow z_2 = 1.96$ and $z_1 = -1.96$.

Step 3:

$[z_1 < Z < z_2] = \left[z_1 < \frac{\hat{\sigma}_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/(n-1)}} < z_2\right] = \left[1 + z_1\sqrt{\tfrac{2}{n-1}} < \frac{\hat{\sigma}_X^2}{\sigma_X^2} < 1 + z_2\sqrt{\tfrac{2}{n-1}}\right]$

$= \left[\frac{\hat{\sigma}_X^2}{1 + z_2\sqrt{2/(n-1)}} < \sigma_X^2 < \frac{\hat{\sigma}_X^2}{1 + z_1\sqrt{2/(n-1)}}\right] = \left[\frac{\hat{\sigma}_X}{\sqrt{1 + z_2\sqrt{2/(n-1)}}} < \sigma_X < \frac{\hat{\sigma}_X}{\sqrt{1 + z_1\sqrt{2/(n-1)}}}\right].$

Hence, the desired CI is the random endpoint interval

$\left[\frac{\hat{\sigma}_X}{\sqrt{1 + 1.96\sqrt{2/99}}}\ ,\ \frac{\hat{\sigma}_X}{\sqrt{1 - 1.96\sqrt{2/99}}}\right] = \left[\frac{\hat{\sigma}_X}{1.13}\ ,\ \frac{\hat{\sigma}_X}{0.85}\right].$

And so the CI estimate is $\left[\frac{.0008}{1.13}\ ,\ \frac{.0008}{0.85}\right] = [.00071\ ,\ .00094]$. This interval is almost exactly equal to the interval obtained above. □

Example 5.4 Suppose that it is desired to investigate the influence of the addition of background music in drawing customers into your new café during the evening hours. To this end, you will observe pedestrians who walk past your café while no music is playing, and while music is playing. Let X = The act of noting whether a pedestrian enters (1) or doesn’t enter (0) your café when no music is playing. Let Y = The act of noting whether a pedestrian enters (1) or doesn’t enter (0) your café when music is playing.

Suppose that you obtained the following data-based results: $\hat{p}_X = 0.13$ for sample size $n_X = 100$, and $\hat{p}_Y = 0.18$ for sample size $n_Y = 50$. Obtain a 95% 1-sided (upper) CI for $\theta = p_Y - p_X$.

Solution: We will assume that the actions of the pedestrians are mutually independent. Then we have

$n_X \hat{p}_X \sim$ Binomial($n_X, p_X$) is independent of $n_Y \hat{p}_Y \sim$ Binomial($n_Y, p_Y$).

Assumption: For a first analysis, we will assume that the CLT holds in relation to both $\hat{p}_X$ and $\hat{p}_Y$. This would seem to be reasonable, based on the above estimated $p$ values and the chart included in Homework #6, Problem 1.

Then $\hat{p}_X \sim N\left(p_X, \frac{p_X(1-p_X)}{n_X}\right)$ and $\hat{p}_Y \sim N\left(p_Y, \frac{p_Y(1-p_Y)}{n_Y}\right)$. Hence

$\hat{\theta} = \hat{p}_Y - \hat{p}_X \sim N\left(p_Y - p_X,\ \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}\right) = N(\theta, \sigma_{\hat{\theta}}^2),$

where we defined $\sigma_{\hat{\theta}}^2 = \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}$. The standard approach [see Theorem 11.8 on p.365 of the text] is to use:

$\hat{\sigma}_{\hat{\theta}}^2 = \frac{\hat{p}_X(1-\hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1-\hat{p}_Y)}{n_Y} = \frac{0.13(0.87)}{100} + \frac{0.18(0.82)}{50} = 0.0041.$

Then $\hat{\sigma}_{\hat{\theta}} = 0.064$, and so $Z = \frac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}} \sim N(0,1)$, with $\Pr[Z > -z_{th}] = 0.95 \Rightarrow z_{th} = 1.645$. Hence, we have the following equivalent events:

$[Z > -z_{th}] = \left[\frac{\hat{\theta} - \theta}{.064} > -1.645\right] = [\theta < \hat{\theta} + 1.645(.064)] = [\theta < \hat{\theta} + .105].$

Recall that the smallest possible value for $\theta = p_Y - p_X$ is -1. Hence, the CI estimate is: [-1 , .05 + .105] = [-1 , .155]. Conclusion: Were you to repeat this experiment many times, you would expect that 95% of the time, the increase in clientele due to music would be no more than ~15%.
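The CI computation in code, as a sketch:

% Example 5.4: 95% one-sided (upper) CI estimate for theta = pY - pX
pX = 0.13; nX = 100; pY = 0.18; nY = 50;
thetahat = pY - pX;                          % 0.05
sighat = sqrt(pX*(1-pX)/nX + pY*(1-pY)/nY)   % ~0.064
ciUpper = thetahat + 1.645*sighat            % ~0.155 => CI estimate [-1, 0.155]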

Investigation of $\hat{\sigma}_{\hat{\theta}} = \sqrt{\frac{\hat{p}_X(1-\hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1-\hat{p}_Y)}{n_Y}}$ and its influence on the CI $[-1\ ,\ \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}]$ - Note: The primary intent of this section is to illustrate that: (i) the CI is a random variable (i.e. the obtained CI estimate is simply one measurement of this variable), and (ii) the CLT assumption depends upon the size of the events being considered in relation to the size of the increments of the actual sample space for the variable. This section can be skipped with no loss of basic understanding of the above. This material could serve as the basis of a project, as it demonstrates a number of major concepts that could be elaborated upon (e.g. sample spaces, events, estimators, simulation-based pdfs of more complicated estimators).

In the above analysis we used estimates of the unknown parameters $p_X$ and $p_Y$ to arrive at an estimate of the standard deviation $\sigma_{\hat{\theta}}$. The value used, .064, is a single measurement of the random variable $\hat{\sigma}_{\hat{\theta}}$. In this section we will investigate the properties of this random variable using simulations. Specifically, we will use $p_X = 0.13$ and $p_Y = 0.18$, and 10,000 simulations of $\hat{\sigma}_{\hat{\theta}}$. The resulting simulation-based estimate of the pdf $f_{\hat{\sigma}_{\hat{\theta}}}(x)$ is shown in Figure 5.1.


Figure 5.1 Simulation-based (nsim = 10,000) estimate of $f_{\hat{\sigma}_{\hat{\theta}}}(x)$.

In relation to this figure, we have $\hat{\mu}_{\hat{\sigma}_{\hat{\theta}}} \cong .064$, which is the true value of $\sigma_{\hat{\theta}}$. We also have $\hat{\sigma}_{\hat{\sigma}_{\hat{\theta}}} \cong .006$.

From Figure 5.1, however, we see that $f_{\hat{\sigma}_{\hat{\theta}}}(x)$ is notably skewed. Hence, this should be noted if one chooses to use $\hat{\sigma}_{\hat{\sigma}_{\hat{\theta}}} \cong .006$ as a measure of the level of uncertainty of $\hat{\sigma}_{\hat{\theta}}$. We now proceed to investigate the random variable that is the upper random endpoint of the CI $[-1\ ,\ \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}]$. To this end, we begin by addressing the probability structure of $\hat{\theta}$. Because $\hat{\theta} = \hat{p}_Y - \hat{p}_X = \overline{Y} - \overline{X}$ is the difference between two averages, and because it would appear that $n_X = 100$ and $n_Y = 50$ are sufficiently large that the CLT holds, $\hat{\theta}$ should be expected to have an $f_{\hat{\theta}}(x)$ that is bell-shaped (i.e. normal). In any case, we have:

p (1 p ) p (1 p )    p  p  0.05  X X Y Y  Y X and     0.064 nX nY The simulations that resulted in Figure 5.1 also resulted in Figure 5.2 below.


Figure 5.2 Simulation-based (nsim = 10,000) estimate of $f_{\hat{\theta}}(x)$ using a 10-bin scaled histogram.

Now, based on Figure 5.2, one could reasonably argue that the pdf $f_{\hat{\theta}}(x)$ has a normal distribution, right? Well, let's use a 50-bin histogram instead of a 10-bin histogram, and see what we get. We are, after all, using 10,000 simulations, and so we can afford finer resolution. The result is shown in Figure 5.3.


Figure 5.3 Simulation-based (nsim = 10,000) estimate of $f_{\hat{\theta}}(x)$ using a 50-bin scaled histogram.

There are two peaks that clearly stand out in relation to an assumed bell curve shape. Furthermore, they are real, in the sense that they appear for repeated simulations! Conclusion 1: If one is interested in events that are much larger than the bin width resolution, which in Figure 5.3 is ~0.011, then the normal approximation in Figure 5.2 may be acceptable. However, if one is interested in events on the order of 0.011, then Figure 5.3 suggests otherwise.

The fact of the matter is that $\hat{\theta}$ is a discrete random variable. To obtain its sample space, write

$\hat{\theta} = \hat{p}_Y - \hat{p}_X = \overline{Y} - \overline{X} = \frac{1}{50}\sum_{k=1}^{50} Y_k - \frac{1}{100}\sum_{k=1}^{100} X_k.$

The smallest element of $S_{\hat{\theta}}$ is -1. The event $\{-1\}$ is equivalent to $[Y_1 = Y_2 = \cdots = Y_{50} = 0 \,\cap\, X_1 = X_2 = \cdots = X_{100} = 1]$. The next smallest event is $\{-0.99\}$. This event is equivalent to $[Y_1 = Y_2 = \cdots = Y_{50} = 0 \,\cap\, \sum_{k=1}^{100} X_k = 99]$.

f  (x) much smaller than 0.01 in computing the estimate of  . Using not 50, but 500 bins resulted in Figure 5.4 below. 27


Figure 5.4 Simulation-based (nsim = 10,000) estimate of $f_{\hat{\theta}}(x)$ using a 500-bin scaled histogram. The lower plot includes the zoomed region near 0.08.

What we see from Figure 5.4 is that the elements of $S_{\hat{\theta}}$ are not spaced at uniform 0.01 increments throughout. At the value $x = 0.08$ we see that the spacing is much smaller than 0.01. Hence, it is very likely that the largest peak in Figure 5.3 is due to this unequal spacing in $S_{\hat{\theta}}$.

Now that we have an idea as to the structures of the marginal pdfs $f_{\hat{\theta}}(x)$ and $f_{\hat{\sigma}_{\hat{\theta}}}(x)$, we need to investigate their joint probability structure. To this end, the place to begin is with a scatter plot of $\hat{\sigma}_{\hat{\theta}}$ versus $\hat{\theta}$. This is shown in Figure 5.6.

Figure 5.6 A scatter plot of $\hat{\sigma}_{\hat{\theta}}$ versus $\hat{\theta}$ for nsim = 10,000 simulations.

Clearly, Figure 5.6 suggests that $\hat{\theta}$ and $\hat{\sigma}_{\hat{\theta}}$ are correlated. We obtained $\hat{\rho} \cong 0.6$ and $Cov(\hat{\theta}, \hat{\sigma}_{\hat{\theta}}) \cong .00023$. Finally, since CI = $[-1\ ,\ \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}]$, we have:

(i) $E(\hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}) = 0.05 + 1.645(.064) \cong 0.15$, and

(ii) $Var(\hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}) = Var(\hat{\theta}) + 1.645^2\, Var(\hat{\sigma}_{\hat{\theta}}) + 2(1.645)\, Cov(\hat{\theta}, \hat{\sigma}_{\hat{\theta}}) = .063^2 + 1.645^2(.006^2) + 2(1.645)(.00023) \cong .005,$

or, $Std(\hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}) \cong .07$.

The pdf for the assumed $CI_{upper} = \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}} \sim N(.15, .07^2)$ is overlaid against the simulation-based pdf in the figure below.


Figure 5.7 Overlaid plots of the pdf of $CI_{upper} = \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}} \sim N(.15, .07^2)$ and the corresponding simulation-based pdf, $f_{CI_{upper}}(x)$, for $\hat{\theta} = \hat{p}_Y - \hat{p}_X = \overline{Y} - \overline{X}$ with $n_Y = 50$ and $n_X = 100$.

Comment: Now, the truth of the matter in the chosen setting (the value of simulations: you know the truth!) is that $\theta = p_Y - p_X = .18 - .13 = .05$. While this increase due to playing music may seem minor, it is, in fact, an increase of .05/.13, or 38%, in likely customer business! This is significant by any business standards. But is the (upper) random endpoint interval CI = $[-1\ ,\ \hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}]$, in fact, a 95% CI estimator? Using the normal approximation in Figure 5.7, the probability to the left of the dashed line is: $\Pr[CI_{upper} \le .05]$ = normcdf(.05,.15,.07) = 0.077. And so, we actually have roughly a 92% CI estimator: somewhat less confidence than the nominal 95% level.

Summary & Conclusion: This example illustrates how one typically accommodates the unknown standard deviation associated with the estimated difference between two proportions. Because this standard deviation is a function of the unknown proportions, one typically uses the numerical estimates of the same to arrive at a numerical value for it. Hence, the estimator of the standard deviation is a random variable. Rather than using a direct mathematical approach to investigating its influence on the CI, it was easier to resort to simulations. In the course of this simulation-based study, it was discovered that the actual pdf of $\hat{\theta}$ can definitely NOT be assumed to be normal if one is interested in event sizes on the order of 0.01. It was also found that $\hat{\theta}$ and $\hat{\sigma}_{\hat{\theta}}$ are correlated; as opposed to the case of normal random variables, where the sample mean and sample standard deviation are independent. For larger event sizes, it was found that the random variable $CI_{upper}$ is well-approximated by a normal pdf. It was also found that it corresponds not to a 95% CI but to roughly a 92% CI. A one-sided CI was chosen simply to illustrate contrast to prior examples that involved 2-sided CIs. It would be interesting to re-visit this example not only with other types of CIs but also in relation to hypothesis testing. Finally, to tie this example to hypothesis testing, consider the hypothesis test:

$H_0: \theta = 0$ versus $H_1: \theta > 0$.

Furthermore, suppose that we choose a test threshold $\theta_{th} = 0.05$. Then if, in fact, $\theta = 0.05$, the type-2 error probability is 0.5 (see Figure 5.4). [Note: The Matlab code used in relation to this example is included in the Appendix.] □

6.6 Summary

This chapter is concerned with estimating parameters associated with a random variable. Section 6.2 included a number of random variables associated with this estimation problem. Many of these random variables (in this context they are also commonly referred to as statistics) relied on the CLT. Section 6.3 addressed some commonly estimated parameters and the probabilistic structure of the corresponding estimators. The results in Sections 6.2 and 6.3 were brought to bear on the topics of hypothesis testing and confidence interval estimation in Sections 6.4 and 6.5, respectively. In both topics the distribution of the estimator plays the central role. The only real difference between the two topics is that in confidence interval estimation the parameter of interest is unknown, whereas in hypothesis testing it is known under $H_0$.

Appendix

Matlab code used in relation to Example 5.4:

%PROGRAM NAME: example54.m
px = 0.13; nx = 100;
py = 0.18; ny = 50;
nsim = 10000;
x = ceil(rand(nx,nsim) - (1-px));
y = ceil(rand(ny,nsim) - (1-py));
mx = mean(x); my = mean(y);
thetahat = my - mx;
%vx = nx^-1 *mx.*(1-mx);
%vy = ny^-1 *my.*(1-my);
vx = var(x)/nx;
vy = var(y)/ny;
vth = vx + vy;
sth = vth.^0.5;
msth = mean(sth)
ssth = std(sth)
vsth = ssth^2;
figure(1)
db = (.09 - .03)/50;
bvec = .03+db/2 : db : .09-db/2;
fsth = (nsim*db)^-1 *hist(sth,bvec);
bar(bvec,fsth)
title('Simulation-Based Estimate of the pdf of std(thetahat)')
grid
pause
figure(2)
db = (.3 + .25)/500;
bvec = -.25+db/2 : db : .3-db/2;
fth = (nsim*db)^-1 *hist(thetahat,bvec);
bar(bvec,fth)
title('Simulation-Based Estimate of the pdf of thetahat')
grid
mthetahat = mean(thetahat)
sthetahat = std(thetahat)
vthetahat = sthetahat^2
varthetahat = px*(1-px)/nx + py*(1-py)/ny
pause
figure(3)
plot(thetahat,sth,'*')
title('Scatter Plot of thetahat vs std(thetahat)')
grid
pause
%Compute mean and std dev of CI
cmat = cov(thetahat,sth);
cov_th_sth = cmat(1,2)
mCI = mthetahat + 1.645*msth
vCI = vthetahat + 1.645^2*vsth + 2*1.645*cov_th_sth;
sCI = vCI^.5
pause
xx = -.1:.0001:.4;
fn = normpdf(xx,mCI,sCI);
CIsim = thetahat + 1.645*sth;
db = (.4 + .1)/50;
bvec = -.1+db/2 : db : .4-db/2;
fCIsim = (db*nsim)^-1 *hist(CIsim,bvec);
figure(4)
bar(bvec,fCIsim)
hold on
plot(xx,fn,'r','LineWidth',2)
grid
