Exact Confidence Intervals

Math 541: Statistical Theory II Exact Con¯dence Intervals Instructor: Songfeng Zheng Con¯dence intervals provide an alternative to using an estimator θ^ when we wish to estimate an unknown parameter θ. We can ¯nd an interval (A; B) that we think has high probability of containing θ. The length of such an interval gives us an idea of how closely we can estimate θ. In some situations, we can ¯nd the mathematical formula for the sampling distribution of the estimator, and we can further make use of this sampling distribution to get a con¯dence interval for the parameter. The con¯dence interval obtained in this case are called exact con¯dence intervals. To ¯nd an exact con¯dence interval, one need to know the distribution of the population to ¯nd out the sampling distribution of the statistic used to estimate the parameter. Once you know the sampling distribution of the statistic, you can construct the interval. Suppose we have a random sample X1; ¢ ¢ ¢ ;Xn from a population distribution, and the parameter of interest is θ. Given a value ® 2 (0; 1), we want to construct a 1 ¡ ® con¯dence interval. Usually ® takes values 0.01, 0.02, 0.05. The general tricks to construct an exact con¯dence interval for θ is: 1. Find a variable that is a function of the data and of the parameter. Call this function h, and denote it as h(X1; ¢ ¢ ¢ ;Xn; θ). 2. The distribution of this newly created variable should not depend on the parameter. 3. Using the distribution of h(X1; ¢ ¢ ¢ ;Xn; θ), let a and b be the values of h such that the probability that h is between these values is 1 ¡ ®, i.e. P (a · h · b) = 1 ¡ ®. Then we manipulate the relation so that we can ¯nd a lower and upper bound within which the parameter could be contained, i.e. P (a · h(X1; ¢ ¢ ¢ ;Xn; θ) · b) = 1¡® =) P (L(X1; ¢ ¢ ¢ ;Xn) · θ · U(X1; ¢ ¢ ¢ ;Xn)) = 1¡® Several Notes: ² A function that satis¯es conditions (1) and (2) is called a pivot. For example, if we have 2 a random sample X1; ¢ ¢ ¢ ;Xn from a normal distribution N(¹; σ ) where ¹ is unknown 1 2 but σ is known. We de¯ne function h to be X ¡ ¹ h(X ; ¢ ¢ ¢ ;X ; ¹) = p ; 1 n σ= n then the function h depends on both the data and the unknown parameter ¹, and the distribution of h is N(0; 1) which does not depend on ¹, therefore it is a pivot. ² Please note that pivot is not a statistic, because pivot contains the unknown parameter, but statistic cannot contain any unknown parameter. We use statistic to estimate a parameter, therefore a statistic must be computable from the sample data, and this is why statistic cannot contain unknown parameter. On the other hand, in constructing con¯dence interval, we want to manipulate the pivot to get an interval about the unknown parameter, so a pivot must contain the unknown parameter. ² In the third step above, when choosing a and b, such that P (a · h · b) = 1 ¡ ®, we want the interval length b ¡ a as small as possible. The shorter the interval, the more precise it is. Usually when the distribution of h is symmetric, obviously the interval [a; b] should be symmetric as well. 2 Example 1: Suppose X1; ¢ ¢ ¢ ;Xn from a normal distribution N(¹; σ ) where ¹ is unknown but σ is known. Find a 1 ¡ ® con¯dence interval for ¹. X1+¢¢¢+Xn Solution: We usually use X = n to estimate the parameter ¹, and, as the above shows, we de¯ne the pivot as X ¡ ¹ h(X ; ¢ ¢ ¢ ;X ; ¹) = p 1 n σ= n and h(X1; ¢ ¢ ¢ ;Xn; ¹) » N(0; 1) therefore Ã ! µ®¶ X ¡ ¹ µ ®¶ P z · p · z 1 ¡ = 1 ¡ ® 2 σ= n 2 where z(®) is de¯ned as Z z(®) Á(x)dx = ® ¡1 and Á(x) is the density function of standard normal distribution. Because normal distribution is symmetric about 0, we have µ®¶ µ ®¶ z = ¡z 1 ¡ 2 2 and µ ®¶ X ¡ ¹ µ ®¶ µ ®¶ σ µ ®¶ σ ¡z 1 ¡ · p · z 1 ¡ , X ¡ z 1 ¡ p · ¹ · X + z 1 ¡ p 2 σ= n 2 2 n 2 n 3 so, ( ) µ ®¶ σ µ ®¶ σ P X ¡ z 1 ¡ p · ¹ · X + z 1 ¡ p = 1 ¡ ® 2 n 2 n That is, the 1 ¡ ® con¯dence interval for ¹ is " # µ ®¶ σ µ ®¶ σ X ¡ z 1 ¡ p ; X + z 1 ¡ p 2 n 2 n 2 Example 2: Suppose X1; ¢ ¢ ¢ ;Xn from a normal distribution N(¹; σ ) where ¹ is known and σ is unknown. Find a 1 ¡ ® con¯dence intervals for σ2. P c2 1 n 2 2 Solution: We use σ = n i=1(Xi ¡ ¹) to estimate σ , and we de¯ne the pivot to be nσc2 h = σ2 2 It was shown that h follows chi-square distribution with n degrees of freedom. Let Ân(®=2) 2 and Ân(1¡®=2) be the (®=2)£100-th and (1¡®=2)£100-th percentiles, respectively. Then 0 1 nσc2 P @Â2 (®=2) · · Â2 (1 ¡ ®=2)A = 1 ¡ ® n σ2 n therefore 0 1 nσc2 nσc2 @ 2 A P 2 · σ · 2 = 1 ¡ ® Ân(1 ¡ ®=2) Ân(®=2) So the 1 ¡ ® con¯dence interval for σ2 is 2 3 nσc2 nσc2 4 5 2 ; 2 Ân(1 ¡ ®=2) Ân(®=2) Note this interval is not symmetric, because the chi-square distribution is not symmetric. From the above two examples, we can see that the general procedure for getting a pivot in exact con¯dence interval is as following: ² ¯nd an estimator for θ; ² build a connection between the estimator and the parameter, usually this will give us some functions involving both the parameter and the estimator; ² among the many candidates obtained in last step, we choose the one which will give us standard distribution as the pivot. 4 2 Example 3: Suppose X1; ¢ ¢ ¢ ;Xn from a normal distribution N(¹; σ ) where both ¹ and σ are unknown. Find a 1 ¡ ® con¯dence intervals for ¹ and σ. Solution: The MLE or Method of moment estimate for ¹ and σ2 are Xn c2 1 2 ¹^ = X; σ = (Xi ¡ X) n i=1 De¯ne function p n(X ¡ ¹) h = 1 S where Xn 2 1 2 S = (Xi ¡ X) n ¡ 1 i=1 then h1 is a pivot and h1 » tn¡1, and tn¡1 stands for t distribution with n ¡ 1 degrees of freedom. Let tn¡1(®=2) and tn¡1(1 ¡ ®=2) be the (®=2) £ 100-th and (1 ¡ ®=2) £ 100-th percentiles, respectively. Then P (tn¡1(®=2) · h1 · tn¡1(1 ¡ ®=2)) = 1 ¡ ® Since t distribution is symmetric about 0, tn¡1(®=2) = ¡tn¡1(1 ¡ ®=2), therefore the above inequality reads Ã p ! n(X ¡ ¹) P ¡t (1 ¡ ®=2) · · t (1 ¡ ®=2) = 1 ¡ ® n¡1 S n¡1 The inequality can be manipulated to yield Ã ! S S P X ¡ t (1 ¡ ®=2)p · ¹ · X + t (1 ¡ ®=2)p = 1 ¡ ® n¡1 n n¡1 n Therefore the 1 ¡ ® con¯dence interval for ¹ is " # S S X ¡ t (1 ¡ ®=2)p ; X + t (1 ¡ ®=2)p n¡1 n n¡1 n and the probability of ¹ lies in the interval is 1 ¡ ®. This interval is symmetric about X. Now let us turn to a con¯dence interval for σ2. We de¯ne nσc2 h = 2 σ2 2 2 then h2 is a pivot and h2 » Ân¡1, where Ân¡1 is the chi-square distribution with n¡1 degrees 2 2 of freedom. Let Ân¡1(®=2) and Ân¡1(1 ¡ ®=2) be the (®=2) £ 100-th and (1 ¡ ®=2) £ 100-th percentiles, respectively. Then 0 1 nσc2 P @Â2 (®=2) · · Â2 (1 ¡ ®=2)A = 1 ¡ ® n¡1 σ2 n¡1 5 therefore, after manipulation, we have 0 1 nσc2 nσc2 @ 2 A P 2 · σ · 2 = 1 ¡ ® Ân¡1(1 ¡ ®=2) Ân¡1(®=2) So the 1 ¡ ® con¯dence interval for σ2 is 2 3 nσc2 nσc2 4 5 2 ; 2 Ân¡1(1 ¡ ®=2) Ân¡1(®=2) Of course, this can also be written as " # (n ¡ 1)S2 (n ¡ 1)S2 2 ; 2 Ân¡1(1 ¡ ®=2) Ân¡1(®=2) Example 4: con¯dence interval for the parameter ¸ of an exponential. A theoretical model suggests that the time to breakdown of an insulating ﬂuid between electrodes at a particular voltage has an exponential distribution with parameter ¸. A random sample of n = 10 breakdown times yields the following sample data (in minutes): 41.53, 18.73, 2.99, 30.34, 12.33, 117.52, 73.02, 223.63, 4.00, 26.78. We want to obtain a 95% con¯dence interval for ¸ and the average breakdown time ¹ = 1=¸. Solution: First let us prove that if X follows an exponential distribution with parameter 2 ¸, then Y = 2¸X follows an exponential distribution with parameter 1/2, i.e. Â2. The density function for X is f(xj¸) = ¸e¡¸x if x > 0 and 0 otherwise. It is easy to see the 1 ¡y=2 density function for Y is g(y) = 2 e for y > 0, and g(y) = 0 otherwise. Therefore Y has an exponential distribution with parameter 1=2, i.e. chi-square distribution with degree of freedom 2. Now, let us ¯rst ¯nd a pivot, de¯ne Xn Xn h(X1; ¢ ¢ ¢ ;Xn; ¸) = 2¸ Xi = Yi i=1 i=1 2 and each Yi = 2¸Xi follows Â2 distribution, and they are independent. Therefore h follows 2 Â2n distribution. 2 2 Let Â2n(®=2) and Â2n(1 ¡ ®=2) be the (®=2) £ 100-th and (1 ¡ ®=2) £ 100-th percentiles, respectively.

Load more