12 Interval estimation

12.1 Introduction

In Chapter 11, we looked into point estimation in the sense of giving single values or points as estimates for well-defined parameters in a pre-selected population density/probability function. If p is the probability that someone contesting an election will win and if we give an estimate as p = 0.7, then we are saying that there is exactly a 70% chance of winning. From a layman's point of view, such an exact number may not be that reasonable. If we say that the chance is between 60% and 75%, it may be more acceptable to a layman. If the waiting time in a queue at a check-out counter in a grocery store is exponentially distributed with expected waiting time θ minutes, time being measured in minutes, and if we give an estimate of θ as between 5 and 10 minutes, it may be more reasonable than giving a single number, such as saying that the expected waiting time is exactly 6 minutes. If we give an estimate of the expected lifetime of individuals in a certain community of people as between 80 and 90 years, it may be more acceptable than saying that the expected lifetime is exactly 83 years. Thus, when the unknown parameter θ has a continuous parameter space Ω, it may be more reasonable to come up with an interval so that we can say that the unknown parameter θ is somewhere in this interval. We will examine such interval estimation problems here.

12.2 Interval estimation problems

In order to explain the various technical terms in this area, it is better to examine a simple problem and then define the various terms appearing there, in the light of the illustrations.

Example 12.1. Let $x_1,\ldots,x_n$ be iid variables from an exponential population with density
$$f(x, \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x \ge 0,\ \theta > 0$$
and zero elsewhere. Compute the densities of (1) $u = x_1 + \cdots + x_n$; (2) $v = \frac{u}{\theta}$, and then evaluate $a$ and $b$ such that $\Pr\{a \le v \le b\} = 0.95$.

Solution 12.1. The moment generating function (mgf) of $x$ is known and it is $M_x(t) = (1 - \theta t)^{-1}$, $1 - \theta t > 0$. Since $x_1,\ldots,x_n$ are iid, the mgf of $u = x_1 + \cdots + x_n$ is $M_u(t) = (1 - \theta t)^{-n}$, $1 - \theta t > 0$, so $u$ has a gamma distribution with parameters $(\alpha = n, \beta = \theta)$. The mgf of $v$ is available from $M_u(t)$ as $M_v(t) = (1 - t)^{-n}$, $1 - t > 0$. In other words, $v$ has a gamma density with the parameters $(\alpha = n, \beta = 1)$, that is, it is free of all parameters since $n$ is known. Let the density of $v$ be denoted by $g(v)$. Then all sorts of probability statements can be made


on the variable $v$. Suppose that we wish to find an $a$ such that $\Pr\{v \le a\} = 0.025$; then we have
$$\int_0^a \frac{v^{n-1}}{\Gamma(n)} e^{-v}\, dv = 0.025.$$
We can either integrate by parts or use incomplete gamma function tables to obtain the exact value of $a$ since $n$ is known. Similarly, we can find a $b$ such that
$$\Pr\{v \ge b\} = 0.025 \Rightarrow \int_b^\infty \frac{v^{n-1}}{\Gamma(n)} e^{-v}\, dv = 0.025.$$
This $b$ is also available either by integrating by parts or from the incomplete gamma function tables. Then the probability coverage over the interval $[a, b]$ is 0.95 or

Pr{a ≤ v ≤ b} = 0.95.
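In practice, the points $a$ and $b$ can also be computed numerically instead of from incomplete gamma tables. The following is a minimal sketch, assuming Python with scipy is available; the sample size n = 5 is hypothetical, for illustration only.

```python
# Numerical check of Example 12.1: v ~ gamma(shape = n, scale = 1),
# so a and b are the 2.5% and 97.5% quantiles of that gamma density.
from scipy.stats import gamma

n = 5  # hypothetical sample size
a = gamma.ppf(0.025, n)  # Pr{v <= a} = 0.025
b = gamma.ppf(0.975, n)  # Pr{v >= b} = 0.025
print(a, b)
print(gamma.cdf(b, n) - gamma.cdf(a, n))  # probability coverage: 0.95
```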

We are successful in finding a and b because the distribution of v is free of all pa- rameters. If the density of v contained some parameters, then we could not have found a and b because those points would have been functions of the parameters involved. Hence the success of our procedure depends upon finding a quantity such as v here, which is a function of the sample values x1,…, xn and the parameter (or parameters) under consideration, but whose distribution is free of all parameters. Such quantities are called pivotal quantities.

Definition 12.1 (Pivotal quantities). A function of the sample values $x_1,\ldots,x_n$ and the parameters under consideration, but whose distribution is free of all parameters, is called a pivotal quantity.

Let us examine Example 12.1 once again. We have a probability statement

Pr{a ≤ v ≤ b} = 0.95.

Let us examine the mathematical inequalities here.
$$a \le v \le b \Rightarrow a \le \frac{x_1 + \cdots + x_n}{\theta} \le b \Rightarrow \frac{1}{b} \le \frac{\theta}{x_1 + \cdots + x_n} \le \frac{1}{a} \Rightarrow \frac{x_1 + \cdots + x_n}{b} \le \theta \le \frac{x_1 + \cdots + x_n}{a}.$$
Since these inequalities are mathematically identical, we must have the probability statements over these intervals identical. That is,
$$\Pr\Big\{a \le \frac{x_1 + \cdots + x_n}{\theta} \le b\Big\} = \Pr\Big\{\frac{x_1 + \cdots + x_n}{b} \le \theta \le \frac{x_1 + \cdots + x_n}{a}\Big\}. \tag{12.1}$$
Thus, we have converted a probability statement over $v$ into a probability statement over $\theta$. What is the difference between these two probability statements? The first one

says that the probability that the random variable $v$ falls in the fixed interval $[a, b]$ is 0.95. In the second statement, $\theta$ is not a random variable but a fixed, unknown parameter, and the random variables are at the end points of the interval; here, the interval is random, not $\theta$. Hence the probability statement over $\theta$ is to be interpreted as follows: the probability that the random interval $[\frac{u}{b}, \frac{u}{a}]$ covers the unknown $\theta$ is 0.95. In this example, we have cut off 0.025 area at the right tail and 0.025 area at the left tail so that the total area cut off is 0.025 + 0.025 = 0.05. If we had cut off an area $\frac{\alpha}{2}$ at each of the two tails, then the total area cut off would be $\alpha$ and the area in the middle $1 - \alpha$. In our Example 12.1, $\alpha = 0.05$ and $1 - \alpha = 0.95$. We will introduce some standard notations which will come in handy later on.

Notation 12.1. Let $y$ be a random variable whose density $f(y)$ is free of all parameters. Then we can compute a point $b$ such that the area cut off from that point onward to the right is a specified number, say $\alpha$. This $b$ is usually denoted by $y_\alpha$: the value of $y$ such that the area under the density curve or probability function from there onward to the right is $\alpha$, or

Pr{y ≥ yα} = α. (12.2)

Then, following Notation 12.1, if $a$ is a point below which the left tail area is $\alpha$, then the point $a$ should be denoted by $y_{1-\alpha}$: the point from where onward to the right the area under the curve is $1 - \alpha$, or equivalently the left tail area is $\alpha$. In Example 12.1, if we wanted to compute $a$ and $b$ so that equal areas $\frac{\alpha}{2}$ are cut off at the right and left tails, then the first part of equation (12.1) could have been written as

$$\Pr\{v_{1-\frac{\alpha}{2}} \le v \le v_{\frac{\alpha}{2}}\} = 1 - \alpha.$$

Definition 12.2 (Confidence intervals). Let $x_1,\ldots,x_n$ be a sample from the population $f(x|\theta)$ where $\theta$ is the parameter. Suppose that it is possible to construct two functions of the sample values, $\phi_1(x_1,\ldots,x_n)$ and $\phi_2(x_1,\ldots,x_n)$, so that the probability that the random interval $[\phi_1, \phi_2]$ covers the unknown parameter $\theta$ is $1 - \alpha$ for a given $\alpha$. That is,

Pr{ϕ1(x1,…, xn) ≤ θ ≤ ϕ2(x1,…, xn)} = 1 − α

for all θ in the parameter space Ω. Then 1 − α is called the confidence coefficient,

the interval $[\phi_1, \phi_2]$ is called a 100(1 − α)% confidence interval for θ, $\phi_1$ is called the lower confidence limit, $\phi_2$ is called the upper confidence limit, and $\phi_2 - \phi_1$ is the length of the confidence interval.

When a random interval $[\phi_1, \phi_2]$ is given, we are placing 100(1 − α)% confidence on our interval, saying that this interval will cover the true parameter value θ with probability 1 − α. The meaning is that if we construct the same interval by using samples of the same size n, then in the long run 100(1 − α)% of the intervals will contain the true parameter θ. If one interval is constructed, then that interval need not contain the true parameter θ; the chance that this interval contains the true parameter θ is 1 − α. In our Example 12.1, we were placing 95% confidence in the interval $[\frac{x_1+\cdots+x_n}{v_{0.025}}, \frac{x_1+\cdots+x_n}{v_{0.975}}]$ to contain the unknown parameter θ. From Example 12.1 and the discussions above, it is clear that we will be successful in coming up with a 100(1 − α)% confidence interval for a given parameter θ if we have the following:
(i) A pivotal quantity Q, that is, a quantity containing the sample values and the parameter θ but whose distribution is free of all parameters. [Note that there may be many pivotal quantities in a given situation.]
(ii) Q enables us to convert a probability statement on Q into a mathematically equivalent statement on θ.

How many such 100(1 − α)% confidence intervals can be constructed for a given θ, if one such interval can be constructed? The answer is: infinitely many. From our Example 12.1, it is seen that instead of cutting off 0.025, or in general $\frac{\alpha}{2}$, at both ends, we could have cut off α at the right tail, or α at the left tail, or any $\alpha_1$ at the left tail and $\alpha_2$ at the right tail so that $\alpha_1 + \alpha_2 = \alpha$. In our example, $v_\alpha \le v < \infty$ would have produced an interval of infinite length. Such an interval may not be of much use because it is of infinite length: our aim is to give an interval which covers the unknown θ with a given confidence coefficient 1 − α, and if we say that an interval of infinite length will cover the unknown parameter, then such a statement may not have much significance. Hence a very desirable property is that the expected length of the interval be as short as possible.

Definition 12.3 (Central intervals). Confidence intervals obtained by cutting off equal areas $\frac{\alpha}{2}$ at both tails of the distribution of the pivotal quantity, so that we obtain a 100(1 − α)% confidence interval, are called central intervals.

It can be shown that if the pivotal quantity has a symmetric distribution then the central interval is usually the shortest in expected value. Observe also that when the length, which is the upper confidence limit minus the lower confidence limit, is taken, it may be free of all variables. In this case, the length and the expected length are one and the same.

12.3 Confidence interval for parameters in an exponential population

We have already given one example of setting up a confidence interval for the parameter θ in the exponential population

$$f(x|\theta) = \frac{1}{\theta} e^{-\frac{x}{\theta}}, \quad x \ge 0,\ \theta > 0$$
and zero elsewhere. Our pivotal quantity was $u = \frac{x_1+\cdots+x_n}{\theta}$, where $u$ has a gamma distribution with the parameters $(\alpha = n, \beta = 1)$, where $n$ is the sample size, which is known. Hence there is no parameter and, therefore, probabilities can be read from incomplete gamma tables or can be obtained by integration by parts. Then a 100(1 − α)% confidence interval for θ in an exponential population is given by

$$\Big[\frac{x_1 + \cdots + x_n}{u_{\frac{\alpha}{2}}},\ \frac{x_1 + \cdots + x_n}{u_{1-\frac{\alpha}{2}}}\Big] = \text{a } 100(1-\alpha)\% \text{ confidence interval,}$$
where
$$\int_0^{u_{1-\frac{\alpha}{2}}} g(u)\,du = \frac{\alpha}{2}, \qquad \int_{u_{\frac{\alpha}{2}}}^{\infty} g(u)\,du = \frac{\alpha}{2} \tag{12.3}$$
and
$$g(u) = \frac{u^{n-1}}{\Gamma(n)} e^{-u}, \quad u \ge 0.$$
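A short numerical sketch of (12.3), again assuming scipy; the function name exp_theta_ci and the illustrative data are ours, not from the text.

```python
import numpy as np
from scipy.stats import gamma

def exp_theta_ci(x, alpha=0.05):
    """Central interval (12.3) for theta: u = sum(x)/theta ~ gamma(n, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    u_upper = gamma.ppf(1 - alpha / 2, n)  # the book's u_{alpha/2}
    u_lower = gamma.ppf(alpha / 2, n)      # the book's u_{1-alpha/2}
    s = x.sum()
    return s / u_upper, s / u_lower

print(exp_theta_ci([2, 8, 5]))  # illustrative data
```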

Example 12.2. Construct a 100(1 − α)% confidence interval for the location parameter γ in an exponential population, where the scale parameter θ is known, say θ = 1. Assume that a simple random sample of size n is available.

Solution 12.2. The density function is given by

$$f(x|\gamma) = e^{-(x-\gamma)}, \quad x \ge \gamma$$
and zero elsewhere. Let us consider the MLE of $\gamma$, which is the smallest order statistic $x_{n:1} = y_1$. Then the density of $y_1$ is available as
$$g(y_1|\gamma) = -\frac{d}{dz}\big[\Pr\{x_j \ge z\}\big]^n\Big|_{z=y_1} = n e^{-n(y_1-\gamma)}, \quad y_1 \ge \gamma$$
and zero elsewhere. Let $u = y_1 - \gamma$. Then $u$ has the density, denoted by $g_1(u)$, as follows:

$$g_1(u) = n e^{-nu}, \quad u \ge 0$$
and zero elsewhere. Then we can read off $u_{\frac{\alpha}{2}}$ and $u_{1-\frac{\alpha}{2}}$ for any given $\alpha$ from this density. That is,

$$\int_0^{u_{1-\frac{\alpha}{2}}} n e^{-nu}\,du = \frac{\alpha}{2} \Rightarrow 1 - e^{-n u_{1-\frac{\alpha}{2}}} = \frac{\alpha}{2} \Rightarrow u_{1-\frac{\alpha}{2}} = -\frac{1}{n}\ln\Big(1 - \frac{\alpha}{2}\Big) \tag{a}$$
$$\int_{u_{\frac{\alpha}{2}}}^{\infty} n e^{-nu}\,du = \frac{\alpha}{2} \Rightarrow e^{-n u_{\frac{\alpha}{2}}} = \frac{\alpha}{2} \Rightarrow u_{\frac{\alpha}{2}} = -\frac{1}{n}\ln\Big(\frac{\alpha}{2}\Big). \tag{b}$$
Now, we have the probability statement

$$\Pr\{u_{1-\frac{\alpha}{2}} \le y_1 - \gamma \le u_{\frac{\alpha}{2}}\} = 1 - \alpha.$$
That is,

$$\Pr\{y_1 - u_{\frac{\alpha}{2}} \le \gamma \le y_1 - u_{1-\frac{\alpha}{2}}\} = 1 - \alpha.$$
Hence a 100(1 − α)% confidence interval for γ is given by

$$[y_1 - u_{\frac{\alpha}{2}},\ y_1 - u_{1-\frac{\alpha}{2}}]. \tag{12.4}$$
For example, for an observed sample 2, 8, 5 of size 3, a 95% confidence interval for $\gamma$ is given by the following: $\alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025$,
$$u_{\frac{\alpha}{2}} = -\frac{1}{n}\ln\Big(\frac{\alpha}{2}\Big) = -\frac{1}{3}\ln(0.025), \qquad u_{1-\frac{\alpha}{2}} = -\frac{1}{3}\ln(0.975).$$
The observed value of $y_1$ is 2. Hence a 95% confidence interval for $\gamma$ is $[2 + \frac{1}{3}\ln(0.025),\ 2 + \frac{1}{3}\ln(0.975)]$.
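Since the limits (a) and (b) are in closed form, the interval (12.4) is easy to compute. A minimal sketch assuming numpy; the helper name exp_location_ci is ours, and the data 2, 8, 5 are the observed sample used above.

```python
import numpy as np

def exp_location_ci(x, alpha=0.05):
    """Interval (12.4) for the location gamma when theta = 1.
    Pivot: u = y1 - gamma with density n*exp(-n*u), u >= 0."""
    x = np.asarray(x, dtype=float)
    n, y1 = len(x), x.min()
    u_hi = -np.log(alpha / 2) / n      # u_{alpha/2}, from (b)
    u_lo = -np.log(1 - alpha / 2) / n  # u_{1-alpha/2}, from (a)
    return y1 - u_hi, y1 - u_lo

print(exp_location_ci([2, 8, 5]))  # ~[0.77, 1.99], as computed above
```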

Note 12.1. If both the scale parameter θ and the location parameter γ are present, then we need simultaneous confidence intervals or a confidence region for the point (θ, γ). Confidence regions will be considered later.

Note 12.2. In Example 12.2, we have taken the pivotal quantity as the smallest order statistic $y_1 = x_{n:1}$. We could have constructed a confidence interval by using a single observation, the sum of the observations, or the sample mean.

12.4 Confidence interval for the parameters in a uniform density

Consider $x_1,\ldots,x_n$, iid from a one-parameter uniform density
$$f(x|\theta) = \frac{1}{\theta}, \quad 0 \le x \le \theta$$
and zero elsewhere. Let us construct a 100(1 − α)% confidence interval for θ. Assume that a simple random sample of size n is available. The largest order statistic seems to

be a convenient starting point since it is the MLE of θ. Let $y_n = x_{n:n}$ be the largest order statistic. Then $y_n$ has the density
$$g(y_n|\theta) = \frac{d}{dz}\big[\Pr\{x_j \le z\}\big]^n\Big|_{z=y_n} = \frac{n}{\theta^n} y_n^{n-1}, \quad 0 \le y_n \le \theta.$$

Let us take the pivotal quantity as $u = \frac{y_n}{\theta}$. The density of $u$, denoted by $g_1(u)$, is given by
$$g_1(u) = n u^{n-1}, \quad 0 \le u \le 1$$
and zero elsewhere. Hence

$$\int_0^{u_{1-\frac{\alpha}{2}}} n u^{n-1}\,du = \frac{\alpha}{2} \Rightarrow u_{1-\frac{\alpha}{2}} = \Big[\frac{\alpha}{2}\Big]^{\frac{1}{n}}$$
and
$$\int_{u_{\frac{\alpha}{2}}}^{1} n u^{n-1}\,du = \frac{\alpha}{2} \Rightarrow u_{\frac{\alpha}{2}} = \Big[1 - \frac{\alpha}{2}\Big]^{\frac{1}{n}}.$$
Therefore,

$$\Pr\{u_{1-\frac{\alpha}{2}} \le u \le u_{\frac{\alpha}{2}}\} = 1 - \alpha \Rightarrow \Pr\Big\{\Big[\frac{\alpha}{2}\Big]^{\frac{1}{n}} \le \frac{y_n}{\theta} \le \Big[1-\frac{\alpha}{2}\Big]^{\frac{1}{n}}\Big\} = 1 - \alpha \Rightarrow \Pr\Big\{\frac{y_n}{(1-\frac{\alpha}{2})^{\frac{1}{n}}} \le \theta \le \frac{y_n}{(\frac{\alpha}{2})^{\frac{1}{n}}}\Big\} = 1 - \alpha.$$
Hence a 100(1 − α)% confidence interval for θ in this case is
$$\Big[\frac{y_n}{(1-\frac{\alpha}{2})^{\frac{1}{n}}},\ \frac{y_n}{(\frac{\alpha}{2})^{\frac{1}{n}}}\Big]. \tag{12.5}$$
For example, for an observed sample 8, 2, 5 from this one-parameter uniform population, a 90% confidence interval for θ is given by $[\frac{8}{(0.95)^{1/3}}, \frac{8}{(0.05)^{1/3}}]$.
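The percentage points of $g_1(u)$ are available in closed form, so (12.5) can be computed directly. A minimal sketch assuming numpy; the helper name uniform_theta_ci is ours.

```python
import numpy as np

def uniform_theta_ci(x, alpha=0.05):
    """Interval (12.5) for theta in Uniform(0, theta); pivot u = y_n/theta."""
    x = np.asarray(x, dtype=float)
    n, yn = len(x), x.max()
    return yn / (1 - alpha / 2) ** (1 / n), yn / (alpha / 2) ** (1 / n)

print(uniform_theta_ci([8, 2, 5], alpha=0.10))  # [8/0.95^(1/3), 8/0.05^(1/3)]
```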

Note 12.3. If the uniform population is over [a, b], b > a, then by using the largest and smallest order statistics one can construct confidence intervals for b when a is known, and for a when b is known. Simultaneous intervals for a and b will be discussed later.

12.5 Confidence intervals in discrete distributions

Here, we will consider a general procedure for setting up confidence intervals for the Bernoulli parameter p and the Poisson parameter λ. In discrete cases, such as a binomial, cutting off a tail probability equal to exactly $\frac{\alpha}{2}$ at each tail may not be possible because the probability masses are at individually distinct points. When we add up the tail probabilities, we may not get the exact value $\frac{\alpha}{2}$, for example, 0.025. When we add up a few points, the sum of the probabilities may be less than 0.025, and when we add the next probability, the total may exceed 0.025. Hence, in discrete situations, we take the tail probabilities as $\le \frac{\alpha}{2}$ so that the middle probability will be $\ge 1 - \alpha$. Take the nearest point so that the tail probability is closest to $\frac{\alpha}{2}$ but less than or equal to $\frac{\alpha}{2}$.

12.5.1 Confidence interval for the Bernoulli parameter p

We can set up confidence intervals for the Bernoulli parameter p by taking n observations from a Bernoulli population or one observation from a binomial population. The binomial population has the probability function

$$f(x, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad 0 < p < 1,\ x = 0, 1,\ldots, n$$
and zero elsewhere. We can assume n to be known. We will see that we cannot find a pivotal quantity Q so that the probability function of Q is free of p. For a binomial random variable x, we can make a statement
$$\Pr\{x \le x_{1-\frac{\alpha}{2}}\} \le \frac{\alpha}{2}, \tag{12.6}$$

that is, the left tail probability is less than or equal to $\frac{\alpha}{2}$ for any given α if p is known. But since x is not a pivotal quantity, $x_{1-\frac{\alpha}{2}}$ will be a function of p, that is, $x_{1-\frac{\alpha}{2}}(p)$. For a given p, we can compute $x_{1-\frac{\alpha}{2}}$ for any given α. For a given p, we can compute two points $x_1(p)$ and $x_2(p)$ such that

$$\Pr\{x_1(p) \le x \le x_2(p)\} \ge 1 - \alpha \tag{a}$$
or we can select $x_1(p)$ and $x_2(p)$, for a given p, such that
$$\Pr\{x \le x_1(p)\} \le \frac{\alpha}{2} \tag{b}$$
and
$$\Pr\{x \ge x_2(p)\} \le \frac{\alpha}{2}. \tag{c}$$

For every given p, the points $x_1(p)$ and $x_2(p)$ are available. If we plot $x = x_1(p)$ and $x = x_2(p)$ against p, then we may get the graphs as shown in Figure 12.1. Let the observed value of x be $x_0$. If the line $x = x_0$ cuts the bands $x_1(p)$ and $x_2(p)$, then the inverse images will be $p_1$ and $p_2$ as shown in Figure 12.1. The cut on $x_1(p)$ will give $p_2$ and that on $x_2(p)$ will give $p_1$, or a 100(1 − α)% confidence interval for p is $[p_1, p_2]$. Note that the

region below the line $x = x_0$ is characterized by the probability $\frac{\alpha}{2}$, and similarly the region above the line $x = x_0$ is characterized by $\frac{\alpha}{2}$. Hence the practical procedure is the following: consider equation (b) with $x_1(p) = x_0$ and search through the binomial table for a p; the solution in (b) will give $p_2$. Take equation (c) with $x_2(p) = x_0$ and search through the binomial tables for a p; the solution in (c) gives $p_1$.

Figure 12.1: Lower and upper confidence bands.

Note that in some situations the line $x = x_0$ may not cut one or both of the curves $x_1(p)$ and $x_2(p)$. We may have situations where $p_1$ and $p_2$ cannot be found, or $p_1$ may be 0, or $p_2$ may be 1. Let the observed value of x be $x_0$; for example, suppose that we observed 3 successes in n = 10 trials. Then our $x_0 = 3$. We can take $x_1(p) = x_{1-\frac{\alpha}{2}}(p) = x_0$ and search for that p, say $p_2$, which will satisfy the inequality

$$\sum_{x=0}^{x_0} \binom{n}{x} p_2^x (1-p_2)^{n-x} \le \frac{\alpha}{2}. \tag{12.7}$$

This will give one value of p, namely, p2 for which (12.6) holds. Now consider the upper tail probability. Consider the inequality

$$\Pr\{x \ge x_{\frac{\alpha}{2}}(p)\} \le \frac{\alpha}{2}. \tag{12.8}$$

Again, let us take $x_2(p) = x_{\frac{\alpha}{2}} = x_0$ and search for the p for which (12.8) holds. Call it $p_1$. That is,

$$\sum_{x=x_0}^{n} \binom{n}{x} p_1^x (1-p_1)^{n-x} \le \frac{\alpha}{2}. \tag{12.9}$$

Then

$$\Pr\{p_1 \le p \le p_2\} \ge 1 - \alpha \tag{12.10}$$
and $[p_1, p_2]$ is the required 100(1 − α)% confidence interval for p.

Example 12.3. If 10 Bernoulli trials gave 3 successes, compute a 95% confidence interval for the probability of success p. Note that for the same p both (12.7) and (12.9) cannot hold simultaneously.

Solution 12.3. Consider the inequality

$$\sum_{x=0}^{3} \binom{10}{x} p_2^x (1-p_2)^{10-x} \le 0.025.$$
Look through a binomial table for n = 10 and all values of p. From the tables, we see that for p = 0.5 the sum is 0.1710, which indicates that the value of $p_2$ is bigger than 0.5. Most of the tables are given only for p up to 0.5, the reason being that for p > 0.5 we can still use the same table. By putting y = n − x, we write

$$\sum_{x=0}^{3} \binom{10}{x} p^x (1-p)^{10-x} = \sum_{y=7}^{10} \binom{10}{y} q^y (1-q)^{10-y} \le 0.025$$
where q = 1 − p. Now, looking through the binomial tables, we see that q = 0.4. Hence $p_2 = 1 - q = 0.6$. Now we consider the inequality
$$\sum_{x=3}^{10} \binom{10}{x} p_1^x (1-p_1)^{10-x} \le 0.025,$$
which is the same as saying

$$\sum_{x=0}^{2} \binom{10}{x} p_1^x (1-p_1)^{10-x} \ge 0.975.$$

Now, looking through the binomial table for n = 10 and all p we see that p1 = 0.05. Hence the required 95% confidence interval for p is [p1, p2] = [0.05, 0.60]. We have 95% confidence on this interval.
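The table search can be replaced by solving the equalities behind (12.7) and (12.9) numerically. A sketch assuming scipy; note that the exact equality solutions differ somewhat from the table-based values [0.05, 0.60] above, since the tables are coarse in p.

```python
from scipy.optimize import brentq
from scipy.stats import binom

def exact_binomial_ci(x0, n, alpha=0.05):
    """Solve the equalities behind (12.7) and (12.9) for p2 and p1."""
    # p2: Pr{x <= x0 | p} = alpha/2; the cdf decreases in p, so brentq applies.
    p2 = brentq(lambda p: binom.cdf(x0, n, p) - alpha / 2, 1e-9, 1 - 1e-9)
    # p1: Pr{x >= x0 | p} = alpha/2, i.e. 1 - Pr{x <= x0 - 1 | p} = alpha/2.
    p1 = brentq(lambda p: 1 - binom.cdf(x0 - 1, n, p) - alpha / 2, 1e-9, 1 - 1e-9)
    return p1, p2

print(exact_binomial_ci(3, 10))  # ~(0.067, 0.652)
```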

Note 12.4. We can use the exact procedure of this section to construct a confidence interval for the parameter θ of a one-parameter distribution whether we have a pivotal quantity or not. Take any convenient statistic T for which the distribution can be derived. This distribution will contain θ. Let $T_0$ be the observed value of T. Consider the inequalities
$$\Pr\{T \le T_0\} \le \frac{\alpha}{2} \tag{a}$$
and
$$\Pr\{T \ge T_0\} \le \frac{\alpha}{2}. \tag{b}$$

If the inequalities have solutions (note that both cannot be satisfied by the same value of θ), then the solution of (a) gives $\theta_2$, the solution of (b) gives $\theta_1$, and then $[\theta_1, \theta_2]$ is a 100(1 − α)% confidence interval for θ. As an exercise, the student is advised to use this exact procedure to construct a confidence interval for θ in an exponential population. Use the sample sum as T.

This exact procedure can be adopted for getting confidence intervals for the Poisson parameter λ. In this case, make use of the property that the sample sum is again Poisson distributed, with parameter nλ. This is left as an exercise to the student.

Exercises 12.2–12.5

12.5.1. Construct a 95% confidence interval for the location parameter γ in an exponential population as in Example 12.2 by using (1) x̄, the sample mean of a sample of size n; (2) the sample sum for a sample of size 2; (3) one observation from the population.

12.5.2. By using the observed sample 3, 8, 4, 5 from an exponential population,

$$f(x|\theta, \gamma) = \frac{1}{\theta} e^{-(x-\gamma)/\theta}, \quad x \ge \gamma,\ \theta > 0$$
and zero elsewhere, construct a 95% confidence interval for (1) θ if γ = 2; (2) γ if θ = 4.

12.5.3. Consider a uniform population over [a, b], b > a. Assume that the observed sample 2, 8, 3 is available from this population. Construct a 95% confidence interval for (1) a when b = 8; (2) b when a = 1, by using order statistics.

12.5.4. Consider the same uniform population in Exercise 12.5.3 with a = 0. Assume that a sample of size 2 is available. (1) Compute the density of the sample sum y = x1 + x2; (2) by using y construct a 95% confidence interval for b if the observed sample is 2, 6.

12.5.5. Construct a 90% confidence interval for the Bernoulli parameter p if 2 successes are obtained in (1) 10 trials; (2) 8 trials.

12.5.6. Consider a Poisson population with parameter λ. Construct a 90% confidence interval for λ if 3, 7, 4 is an observed sample.

12.6 Confidence intervals for parameters in N(μ, σ2)

First, we will consider a simple problem of constructing a confidence interval for the mean value μ in a normal population when the population variance is known. Then we will consider intervals for μ when σ2 is not known. Then we will look at intervals for σ2. In the following situations, we will be constructing the central intervals

for convenience. These central intervals will be the shortest when the pivotal quantities have symmetric distributions. In the case of confidence intervals for the population variance, the pivotal quantity taken is a chi-square variable, which does not have a symmetric distribution, and hence the central interval cannot be expected to be the shortest; but for convenience we will consider the central intervals in all situations.

12.6.1 Confidence intervals for μ

Case 1 (Population variance σ2 is known). Here, we can take a pivotal quantity as the standardized sample mean

$$z = \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \sim N(0, 1)$$
which is free of all parameters when σ is known. Hence we can read off $z_{\frac{\alpha}{2}}$ and $z_{1-\frac{\alpha}{2}}$ so that

$$\Pr\{z_{1-\frac{\alpha}{2}} \le z \le z_{\frac{\alpha}{2}}\} = 1 - \alpha.$$

Since the standard normal density is symmetric about z = 0, we have $z_{1-\frac{\alpha}{2}} = -z_{\frac{\alpha}{2}}$. Let us examine the mathematical inequalities.

$$-z_{\frac{\alpha}{2}} \le z \le z_{\frac{\alpha}{2}} \Rightarrow -z_{\frac{\alpha}{2}} \le \frac{\sqrt{n}(\bar{x}-\mu)}{\sigma} \le z_{\frac{\alpha}{2}} \Rightarrow -z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \Rightarrow \bar{x} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
and hence
$$\Pr\{-z_{\frac{\alpha}{2}} \le z \le z_{\frac{\alpha}{2}}\} = \Pr\Big\{\bar{x} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\Big\} = 1 - \alpha.$$

Hence a 100(1 − α)% confidence interval for μ, when σ2 is known, is given by

$$\Big[\bar{x} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\Big]. \tag{12.11}$$

Figure 12.2 gives an illustration of the construction of the central confidence interval for μ in a normal population with σ2 known.

Example 12.4. Construct a 95% confidence interval for μ in a N(μ, σ2 = 4) from the following observed sample: −5, 0, 2, 15.

Figure 12.2: Confidence interval for μ in a N(μ, σ2), σ2 known.

Solution 12.4. Here, the sample mean is $\bar{x} = (-5 + 0 + 2 + 15)/4 = 3$, $1 - \alpha = 0.95$, $\frac{\alpha}{2} = 0.025$. From a standard normal table, we have $z_{0.025} = 1.96$ approximately. $\sigma^2$ is given to be 4, and hence σ = 2. Therefore, from (12.11), one 95% confidence interval for μ is given by
$$\Big[\bar{x} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\Big] = \Big[3 - 1.96\Big(\frac{2}{2}\Big),\ 3 + 1.96\Big(\frac{2}{2}\Big)\Big] = [1.04, 4.96].$$

We have 95% confidence that the unknown μ is on this interval.

Note that the length of the interval in this case is
$$\Big[\bar{x} + z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\Big] - \Big[\bar{x} - z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\Big] = 2 z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
which is free of all variables, and hence it is equal to its expected value; the expected length of the interval in Example 12.4 is $2 z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} = 2(1.96) = 3.92$.
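A quick numerical check of (12.11) and Example 12.4, assuming scipy; the helper name mean_ci_known_sigma is ours.

```python
import numpy as np
from scipy.stats import norm

def mean_ci_known_sigma(x, sigma, alpha=0.05):
    """Central interval (12.11) for mu in N(mu, sigma^2), sigma known."""
    x = np.asarray(x, dtype=float)
    half = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(len(x))
    return x.mean() - half, x.mean() + half

print(mean_ci_known_sigma([-5, 0, 2, 15], sigma=2))  # ~[1.04, 4.96]
```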

Example 12.5. For a binomial random variable x, it is known that for large n (n ≥ 20, np ≥ 5, n(1 − p) ≥ 5) the standardized binomial variable is approximately a standard normal. By using this approximation set up an approximate 100(1 − α)% confidence interval for p the probability of success.

Solution 12.5. We will construct a central interval. We have
$$\frac{x - np}{\sqrt{np(1-p)}} \approx z, \quad z \sim N(0, 1).$$

From a standard normal table, we can obtain $z_{\frac{\alpha}{2}}$ so that an approximate probability statement is the following:
$$\Pr\Big\{-z_{\frac{\alpha}{2}} \le \frac{x - np}{\sqrt{np(1-p)}} \le z_{\frac{\alpha}{2}}\Big\} \approx 1 - \alpha.$$

The inequality can be written as

$$\frac{(x - np)^2}{np(1-p)} \le z_{\frac{\alpha}{2}}^2.$$
Opening this up as a quadratic equation in p, when equality holds, and then solving for p, one has

$$p = \frac{\big(x + \frac{1}{2} z_{\frac{\alpha}{2}}^2\big) \mp \sqrt{\big(x + \frac{1}{2} z_{\frac{\alpha}{2}}^2\big)^2 - x^2\big(1 + \frac{1}{n} z_{\frac{\alpha}{2}}^2\big)}}{n\big(1 + \frac{1}{n} z_{\frac{\alpha}{2}}^2\big)}. \tag{12.12}$$
These two roots are the lower and upper 100(1 − α)% central confidence limits for p, approximately. For example, for n = 20, x = 8, α = 0.05, we have $z_{0.025} = 1.96$. Substituting these values in (12.12), we obtain the approximate roots 0.22 and 0.61. Hence an approximate 95% central confidence interval for the binomial parameter p in this case is [0.22, 0.61]. [Simplification of the computations is left to the student.]
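The roots in (12.12) are easy to evaluate numerically; this construction is sometimes called the Wilson score interval. A sketch assuming scipy; the helper name approx_binomial_ci is ours.

```python
import numpy as np
from scipy.stats import norm

def approx_binomial_ci(x, n, alpha=0.05):
    """The two roots of the quadratic in (12.12)."""
    z2 = norm.ppf(1 - alpha / 2) ** 2
    mid = x + z2 / 2
    disc = np.sqrt(mid ** 2 - x ** 2 * (1 + z2 / n))
    denom = n * (1 + z2 / n)
    return (mid - disc) / denom, (mid + disc) / denom

print(approx_binomial_ci(8, 20))  # ~(0.22, 0.61), as in the text
```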

Case 2 (Confidence intervals for μ when σ2 is unknown). In this case, we cannot take the standardized normal variable as our pivotal quantity because, even though the distribution of the standardized normal is free of all parameters, we have a σ present in the standardized variable, which acts as a nuisance parameter here.

Definition 12.4 (Nuisance parameters). These are parameters which are not relevant for the problem under consideration but which are going to be present in the computations.

Hence our aim is to come up with a pivotal quantity involving the sample values and μ only and whose distribution is free of all parameters. We have such a quantity here: the Student-t variable. Consider the following pivotal quantity, which has a Student-t distribution:
$$\frac{\sqrt{n}(\bar{x} - \mu)}{s_1} \sim t_{n-1}, \qquad s_1^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n - 1} \tag{12.13}$$

where $s_1^2$ is an unbiased estimator for the population variance $\sigma^2$. Note that a Student-t distribution is symmetric around t = 0. Hence we can expect the central interval to be the shortest interval in expected value. For constructing a central 100(1 − α)% confidence interval for μ, read off the upper tail point $t_{n-1,\frac{\alpha}{2}}$ such that
$$\Pr\{t_{n-1} \ge t_{n-1,\frac{\alpha}{2}}\} = \frac{\alpha}{2}.$$
Then we can make the probability statement

$$\Pr\{-t_{n-1,\frac{\alpha}{2}} \le t_{n-1} \le t_{n-1,\frac{\alpha}{2}}\} = 1 - \alpha. \tag{12.14}$$

Substituting for tn−1 and converting the inequalities into inequalities over μ, we have the following:

$$\Pr\Big\{\bar{x} - t_{n-1,\frac{\alpha}{2}} \frac{s_1}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\frac{\alpha}{2}} \frac{s_1}{\sqrt{n}}\Big\} = 1 - \alpha \tag{12.15}$$
which gives a central 100(1 − α)% confidence interval for μ. Figure 12.3 gives an illustration of the percentage points.

Figure 12.3: Percentage points from a Student-t density.

The interval is of length $2 t_{n-1,\frac{\alpha}{2}} \frac{s_1}{\sqrt{n}}$, which contains the variable $s_1$, and hence it is a random quantity. We can compute the expected value of this length by using the fact that

$$\frac{(n-1) s_1^2}{\sigma^2} \sim \chi^2_{n-1}$$
where $\chi^2_{n-1}$ is a chi-square variable with (n − 1) degrees of freedom.

Example 12.6. Construct a 99% confidence interval for μ in a normal population with unknown variance, by using the observed sample 1, 0, 5 from this normal population.

Solution 12.6. The sample mean is $\bar{x} = (1 + 0 + 5)/3 = 2$. An observed value of $s_1^2$ is given by

$$s_1^2 = \frac{1}{2}\big[(1-2)^2 + (0-2)^2 + (5-2)^2\big] = 7 \Rightarrow s_1 = \sqrt{7} = 2.6457513.$$

Now, our α = 0.01 ⇒ $\frac{\alpha}{2}$ = 0.005. From a Student-t table, for n − 1 = 2 degrees of freedom, $t_{2,0.005} = 9.925$. Hence a 99% central confidence interval for μ here is given by

$$\Big[2 - 9.925 \frac{\sqrt{7}}{\sqrt{3}},\ 2 + 9.925 \frac{\sqrt{7}}{\sqrt{3}}\Big] \approx [-13.16, 17.16].$$
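A numerical check of (12.15) and Example 12.6, assuming scipy; the helper name mean_ci_unknown_sigma is ours. Note that x.std(ddof=1) computes $s_1$ with the n − 1 divisor, matching (12.13).

```python
import numpy as np
from scipy.stats import t

def mean_ci_unknown_sigma(x, alpha=0.05):
    """Central interval (12.15) for mu, sigma^2 unknown, via Student-t."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = x.std(ddof=1)  # square root of the unbiased variance estimator
    half = t.ppf(1 - alpha / 2, n - 1) * s1 / np.sqrt(n)
    return x.mean() - half, x.mean() + half

print(mean_ci_unknown_sigma([1, 0, 5], alpha=0.01))  # ~[-13.16, 17.16]
```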

Note 12.5. In some books, the student may find the statement that when the sample size is n ≥ 30 one can get a good normal approximation to Student-t, and hence take $z_\alpha$ from a standard normal table instead of $t_{\nu,\alpha}$ from the Student-t table with ν degrees of freedom, for ν ≥ 30. The student may look into the exact percentage points from the Student-t table to see that even for ν = 120 degrees of freedom the upper tail areas of the standard normal and Student-t do not agree with each other. Hence taking $z_\alpha$ instead of $t_{\nu,\alpha}$ for ν ≥ 30 is not a proper procedure.

12.6.2 Confidence intervals for σ2 in N(μ, σ2)

Here, we can consider two situations: (1) μ is known, (2) μ is not known, and we wish to construct confidence intervals for σ2 in N(μ, σ2). Convenient pivotal quantities are the following. When μ is known we can use

$$\sum_{j=1}^n \frac{(x_j - \mu)^2}{\sigma^2} \sim \chi^2_n \quad\text{and}\quad \sum_{j=1}^n \frac{(x_j - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Then from a chi-square density we have
$$\Pr\Big\{\chi^2_{n,1-\frac{\alpha}{2}} \le \sum_{j=1}^n \frac{(x_j - \mu)^2}{\sigma^2} \le \chi^2_{n,\frac{\alpha}{2}}\Big\} = 1 - \alpha \tag{12.16}$$
and
$$\Pr\Big\{\chi^2_{n-1,1-\frac{\alpha}{2}} \le \sum_{j=1}^n \frac{(x_j - \bar{x})^2}{\sigma^2} \le \chi^2_{n-1,\frac{\alpha}{2}}\Big\} = 1 - \alpha. \tag{12.17}$$
The percentage points are marked in Figure 12.4.

Figure 12.4: Percentage points from a chi-square density.

Note that (12.16) can be rewritten as
$$\Pr\Big\{\frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,\frac{\alpha}{2}}} \le \sigma^2 \le \frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,1-\frac{\alpha}{2}}}\Big\} = 1 - \alpha.$$
A similar probability statement can be obtained by rewriting (12.17). Therefore, 100(1 − α)% central confidence intervals for σ2 are the following:
$$\Big[\frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,\frac{\alpha}{2}}},\ \frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,1-\frac{\alpha}{2}}}\Big]; \qquad \Big[\frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\chi^2_{n-1,\frac{\alpha}{2}}},\ \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}\Big]. \tag{12.18}$$
Note that a $\chi^2$ distribution is not symmetric, and hence we cannot expect to get the shortest interval by taking the central interval; the central intervals are taken only for convenience. When μ is unknown, we cannot use $\sum_{j=1}^n (x_j - \mu)^2/\sigma^2 \sim \chi^2_n$ because the nuisance parameter μ is present. We can use the pivotal quantity
$$\sum_{j=1}^n \frac{(x_j - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}$$
and construct a 100(1 − α)% central confidence interval; it is the second one given in (12.18). When μ is known, we can also use the standardized normal
$$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \sim N(0, 1)$$
as a pivotal quantity to construct a confidence interval for σ, and thereby a confidence interval for $\sigma^2$. Note that if $[T_1, T_2]$ is a 100(1 − α)% confidence interval for θ, then $[g(T_1), g(T_2)]$ is a 100(1 − α)% confidence interval for g(θ) when θ to g(θ) is a one-to-one function.

Example 12.7. If −2, 1, 7 is an observed sample from a $N(\mu, \sigma^2)$, construct a 95% confidence interval for $\sigma^2$ when (1) μ = 1, (2) μ is unknown.

Solution 12.7. $\bar{x} = \frac{(-2+1+7)}{3} = 2$, $\sum_{j=1}^3 (x_j - \bar{x})^2 = (-2-2)^2 + (1-2)^2 + (7-2)^2 = 42$, $\sum_{j=1}^3 (x_j - \mu)^2 = (-2-1)^2 + (1-1)^2 + (7-1)^2 = 45$, $1 - \alpha = 0.95 \Rightarrow \frac{\alpha}{2} = 0.025$. From a chi-square table, $\chi^2_{n,\frac{\alpha}{2}} = \chi^2_{3,0.025} = 9.35$, $\chi^2_{n-1,\frac{\alpha}{2}} = \chi^2_{2,0.025} = 7.38$, $\chi^2_{n,1-\frac{\alpha}{2}} = \chi^2_{3,0.975} = 0.216$, $\chi^2_{n-1,1-\frac{\alpha}{2}} = \chi^2_{2,0.975} = 0.0506$. (2) When μ is unknown, a 95% central confidence interval for $\sigma^2$ is given by

$$\Big[\frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\chi^2_{n-1,\frac{\alpha}{2}}},\ \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}\Big] = \Big[\frac{42}{7.38},\ \frac{42}{0.0506}\Big] = [5.69, 830.04].$$

(1) When μ = 1, we can use the above interval as well as the following interval:

$$\Big[\frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,\frac{\alpha}{2}}},\ \frac{\sum_{j=1}^n (x_j - \mu)^2}{\chi^2_{n,1-\frac{\alpha}{2}}}\Big] = \Big[\frac{45}{9.35},\ \frac{45}{0.216}\Big] = [4.81, 208.33].$$

Note that when the information about μ is used the interval is shorter.
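A sketch of the two intervals in (12.18), reproducing Solution 12.7 and assuming scipy; the helper name variance_ci is ours. Note the convention: the book's $\chi^2_{\nu,\alpha/2}$ is the upper percentage point, i.e., scipy's chi2.ppf(1 − α/2, ν).

```python
import numpy as np
from scipy.stats import chi2

def variance_ci(x, alpha=0.05, mu=None):
    """Central intervals (12.18) for sigma^2; uses mu when it is known."""
    x = np.asarray(x, dtype=float)
    if mu is None:
        ss, df = ((x - x.mean()) ** 2).sum(), len(x) - 1
    else:
        ss, df = ((x - mu) ** 2).sum(), len(x)
    return ss / chi2.ppf(1 - alpha / 2, df), ss / chi2.ppf(alpha / 2, df)

print(variance_ci([-2, 1, 7]))        # ~[5.69, 830.04]
print(variance_ci([-2, 1, 7], mu=1))  # ~[4.81, 208.33]
```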

Note 12.6. The student may be wondering whether it is possible to construct confidence intervals for σ once a confidence interval for σ2 is established. Simply take the corresponding square roots: if $[\phi_1(x_1,\ldots,x_n), \phi_2(x_1,\ldots,x_n)]$ is a 100(1 − α)% confidence interval for θ, then $[h(\phi_1), h(\phi_2)]$ is a 100(1 − α)% confidence interval for h(θ) as long as θ to h(θ) is a one-to-one function.

Exercises 12.6

12.6.1. Consider a 100(1 − α)% confidence interval for μ in a N(μ, σ2), where σ2 is known, by using the standardized sample mean. Construct the interval so that the left tail area left out is $\alpha_1$ and the right tail area left out is $\alpha_2$, with $\alpha_1 + \alpha_2 = \alpha$. Show that the length of the interval is shortest when $\alpha_1 = \alpha_2 = \frac{\alpha}{2}$.

12.6.2. Let $x_1,\ldots,x_n$ be iid as N(μ, σ2) where σ2 is known. Construct a 100(1 − α)% central confidence interval for μ by using the statistic $c_1 x_1 + \cdots + c_n x_n$ where $c_1,\ldots,c_n$ are known constants. Illustrate the result for $c_1 = 2, c_2 = -3, c_3 = 5$, based on the observed sample 2, 1, −5.

12.6.3. Construct (1) a 90%, (2) a 95%, (3) a 99% central confidence interval for μ in Exercise 12.6.1 with σ2 = 2 and based on the observed sample −1, 2, 5, 7.

12.6.4. Do the same as in Exercise 12.6.3 if σ2 is unknown.

12.6.5. Compute the expected length in the central interval for the parameter μ in a N(μ, σ2), where σ2 is unknown, and based on a Student-t statistic.

12.6.6. Compute the expected length as in Exercise 12.6.5 if the interval is obtained by cutting off the areas α1 at the left tail and α2 at the right tail. Show that the expected length is least when α1 = α2.

12.6.7. Construct a 95% central confidence interval for μ in a N(μ, σ2), when σ2 is unknown, by using the statistic u = 2x1 + x2 − 5x3, based on the observed sample 5, −2, 6.

12.6.8. By using the standard normal approximation to a standardized binomial variable, construct a 90% central confidence interval for p, the probability of success, if (1) 7 successes are obtained in 20 trials; (2) 12 successes are obtained in 22 trials.

12.6.9. The grades obtained by students in a statistics course are assumed to be normally distributed with mean value μ and variance σ2. Construct a 95% confidence interval for σ2 when (1) μ = 80, (2) μ is unknown, based on the following observed sample: 75, 85, 90, 90. (a) Consider central intervals; (b) consider cutting off 0.05 at the right tail.

12.6.10. Show that for the problem of constructing a confidence interval for σ2 in a N(μ, σ2), based on a pivotal quantity having a chi-square distribution, the central interval is not the shortest in expected length when the number of degrees of freedom is small.

12.7 Confidence intervals for linear functions of mean values

Here, we are mainly interested in situations of the following types. (1) A new drug is administered to lower blood pressure in human beings. A random sample of n individuals is taken. Let $x_j$ be the blood pressure before administering the drug and $y_j$ the blood pressure after administering the drug on the j-th individual, for $j = 1,\ldots,n$. Then we have paired values $(x_j, y_j)$, $j = 1,\ldots,n$. Our aim may be to estimate the expected difference, namely $\mu_2 - \mu_1$, $\mu_2 = E(y_j)$, $\mu_1 = E(x_j)$, and test a hypothesis that $(x_j, y_j)$, $j = 1,\ldots,n$ are identically distributed. But obviously, y = the blood pressure after administering the drug depends on x = the blood pressure before administering the drug. Here, x and y are dependent variables and may have a joint distribution. (2) A sample of $n_1$ test plots is planted with corn variety 1 and a sample of $n_2$ test plots is planted with corn variety 2. Let $x_1,\ldots,x_{n_1}$ be the observations on the yield x of corn variety 1 and let $y_1,\ldots,y_{n_2}$ be the observations on the yield y of corn variety 2. Let the test plots be

homogeneous in all respects. Let E(x) = μ1 and E(y) = μ2. Someone may have a claim that the expected yield of variety 2 is 3 times that of variety 1. Then our aim may be to estimate μ2 − 3μ1. If someone has the claim that variety 2 is better than variety 1, then our aim may be to estimate μ2 − μ1. In this example, without loss of generality, we may assume that x and y are independently distributed. (3) A random sample of $n_1$ students of the same background is subjected to method 1 of teaching (consisting of lectures followed by one final examination), and a random sample of $n_2$ students of the same background as the first set of students is subjected to method 2 of teaching (consisting, say, of each lecture followed by problem sessions and three cumulative tests). Our aim may be to claim that method 2 is superior to method 1. Let

μ2 = E(y), μ1 = E(x), where x and y represent the grades under method 1 and method 2, respectively. Then we may want to estimate μ2 − μ1. Here also, it can be assumed that x and y are independently distributed. (4) Suppose that a farmer has planted 5 varieties of paddy (rice). Let the yield per test plot of the 5 varieties be denoted by $x_1,\ldots,x_5$ with $\mu_i = E(x_i)$, $i = 1,\ldots,5$. The market prices of these varieties are, respectively, Rs 20, Rs 25, Rs 30, Rs 32, Rs 38 per kilogram. Then the farmer's interest may be to estimate the money value, that is, $20\mu_1 + 25\mu_2 + 30\mu_3 + 32\mu_4 + 38\mu_5$. Variety i may be planted in $n_i$ test plots so that the yields are $x_{ij}$, $j = 1,\ldots,n_i$, $i = 1,\ldots,5$, where $x_{ij}$ is the yield of the j-th test plot under variety i. Problems of the above types are of interest in this section. We will consider only situations involving two variables; the procedure is exactly parallel when more variables are involved. In the two-variable case also, we will look at situations where the variables are dependent in the sense of having a joint distribution, and situations where the variables are assumed to be statistically independently distributed in the sense of holding the product probability property.

12.7.1 Confidence intervals for mean values when the variables are dependent

When we have paired variables (x, y), where x and y are dependent, one way of handling the situation is to consider u = y − x, in situations such as blood pressure before administering the drug (x) and blood pressure after administering the drug (y), if we wish to estimate $\mu_2 - \mu_1 = E(y) - E(x)$. If we wish to estimate a linear function $a\mu_1 + b\mu_2$, then consider the function u = ax + by. For example, a = −1 and b = 1 gives $\mu_2 - \mu_1$. When (x, y) has a bivariate normal distribution, it can be proved that every linear function is univariate normal. That means $u \sim N(\tilde{\mu}, \tilde{\sigma}^2)$ where $\tilde{\mu} = a\mu_1 + b\mu_2$ and $\tilde{\sigma}^2 = a^2\sigma_1^2 + b^2\sigma_2^2 + 2ab\,\mathrm{Cov}(x, y)$, $\sigma_1^2 = \mathrm{Var}(x)$, $\sigma_2^2 = \mathrm{Var}(y)$. Now, construct confidence intervals for the mean value of u in situations where (1) Var(u) is known, (2) Var(u) is unknown, and confidence intervals for Var(u) for the cases where (1) E(u) is known, (2) E(u) is unknown, by using the procedures in Section 12.6. Note that we need not know the individual parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and Cov(x, y) in this procedure.

Note 12.7. Many books may proceed with the assumption that x and y are independently distributed in situations like the blood pressure example, claiming that the effect of the drug is washed out after two hours or that the dependency is gone after two hours. Assuming statistical independence in such situations is not a proper procedure. When paired values are available, we can handle them by using u as described above, which is a correct procedure when the joint distribution is normal. If the joint distribution is not normal, then we may evaluate the distribution of a linear function first and then use that linear function to construct confidence intervals for linear functions of mean values.

Example 12.8. The following are the paired observations on (x, y): (1, 4), (4, 8), (3, 6), (2, 7), where x is the amount of a special animal feed and y is the gain in weight. It is conjectured that y is approximately 2x + 1. Construct a 95% confidence interval for (1) $E(u) = E[y - (2x + 1)] = \mu_2 - 2\mu_1 - 1$, $E(y) = \mu_2$, $E(x) = \mu_1$; (2) the variance of u, assuming that (x, y) has a bivariate normal distribution.

Solution 12.8. Let u = y − 2x − 1. Then the observations on u are the following:

$$u_1 = 4 - 2(1) - 1 = 1, \quad u_2 = 8 - 2(4) - 1 = -1, \quad u_3 = 6 - 2(3) - 1 = -1, \quad u_4 = 7 - 2(2) - 1 = 2, \quad \bar{u} = \frac{1}{4}(1 - 1 - 1 + 2) = \frac{1}{4},$$
$$s_1^2 = \sum_{j=1}^n \frac{(u_j - \bar{u})^2}{n - 1}; \quad \text{observed value} = \frac{1}{3}\Big[\Big(1 - \frac{1}{4}\Big)^2 + \Big(-1 - \frac{1}{4}\Big)^2 + \Big(-1 - \frac{1}{4}\Big)^2 + \Big(2 - \frac{1}{4}\Big)^2\Big] = \frac{108}{16 \times 3}.$$
Further,
$$\frac{\sqrt{n}[\bar{u} - E(\bar{u})]}{s_1} \sim t_{n-1} = t_3 \tag{12.19}$$
is Student-t with 3 degrees of freedom. [Since all linear functions of normal variables (correlated or not) are normally distributed, u is $N(\mu, \sigma^2)$ where μ = E(u), $\sigma^2$ = Var(u).] From the Student-t tables, $t_{n-1,\frac{\alpha}{2}} = t_{3,0.025} = 3.182$ (see the illustration in Figure 12.3). Hence a 95% central confidence interval for $E(u) = \mu_2 - 2\mu_1 - 1$ is the following:
$$\Big[\bar{u} - t_{n-1,\frac{\alpha}{2}} \frac{s_1}{\sqrt{n}},\ \bar{u} + t_{n-1,\frac{\alpha}{2}} \frac{s_1}{\sqrt{n}}\Big] = \Big[\frac{1}{4} - 3.182 \frac{\sqrt{108}}{4\sqrt{12}},\ \frac{1}{4} + 3.182 \frac{\sqrt{108}}{4\sqrt{12}}\Big] = [-2.14, 2.64].$$

For constructing a 95% confidence interval for Var(u), one can take the pivotal quantity as
$$\sum_{j=1}^n \frac{(u_j - \bar{u})^2}{\sigma^2} \sim \chi^2_{n-1} = \chi^2_3; \quad \chi^2_{3,0.025} = 9.35, \quad \chi^2_{3,0.975} = 0.216.$$
See the illustration of the percentage points in Figure 12.4. Then a 95% central confidence interval is given by the following:

$$\Big[\frac{\sum_{j=1}^n (u_j - \bar{u})^2}{\chi^2_{n-1,\frac{\alpha}{2}}},\ \frac{\sum_{j=1}^n (u_j - \bar{u})^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}\Big] = \Big[\frac{108}{16(9.35)},\ \frac{108}{16(0.216)}\Big] = [0.72, 31.25].$$
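A numerical check of Solution 12.8, assuming Python with scipy; the variable names are ours. Forming u reduces both questions to one-sample intervals on u.

```python
import numpy as np
from scipy.stats import t, chi2

# Paired data from Example 12.8; u = y - 2x - 1.
x = np.array([1, 4, 3, 2], dtype=float)
y = np.array([4, 8, 6, 7], dtype=float)
u = y - 2 * x - 1
n = len(u)

half = t.ppf(0.975, n - 1) * u.std(ddof=1) / np.sqrt(n)
print(u.mean() - half, u.mean() + half)  # ~[-2.14, 2.64] for E(u)

ss = ((u - u.mean()) ** 2).sum()
print(ss / chi2.ppf(0.975, n - 1), ss / chi2.ppf(0.025, n - 1))  # ~[0.72, 31.25]
```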

Note 12.8. Note that in the paired variable (x, y) case, if our interest is to construct a confidence interval for μ2 − μ1, then take u = y − x and proceed as above. Whatever the linear function of μ1 and μ2 for which a confidence interval is needed, take the corresponding linear function of x and y as u and then proceed. Do not assume statistical independence of x and y unless there is theoretical justification to do so.

12.7.2 Confidence intervals for linear functions of mean values when there is statistical independence

If x and y are statistically independently distributed with $E(x) = \mu_1$, $E(y) = \mu_2$, $\mathrm{Var}(x) = \sigma_1^2$, $\mathrm{Var}(y) = \sigma_2^2$, and if simple random samples of sizes $n_1$ and $n_2$ are available from x and y, then how can we set up confidence intervals for $a\mu_1 + b\mu_2 + c$, where a, b, c are known constants? Let $x_1,\ldots,x_{n_1}$ and $y_1,\ldots,y_{n_2}$ be the samples from x and y, respectively. If x and y are normally distributed, then the problem is easy; otherwise, one has to work out the distribution of the linear function first and then proceed. Let us assume that $x \sim N(\mu_1, \sigma_1^2)$ and $y \sim N(\mu_2, \sigma_2^2)$, independently distributed. Let
$$\bar{x} = \frac{\sum_{j=1}^{n_1} x_j}{n_1}, \quad \bar{y} = \frac{\sum_{j=1}^{n_2} y_j}{n_2}, \quad v_1^2 = \sum_{j=1}^{n_1} (x_j - \bar{x})^2, \quad v_2^2 = \sum_{j=1}^{n_2} (y_j - \bar{y})^2 \tag{12.20}$$
and $u = a\bar{x} + b\bar{y} + c$. Then $u \sim N(\mu, \sigma^2)$, where

$$\mu = E(u) = aE[\bar{x}] + bE[\bar{y}] + c = a\mu_1 + b\mu_2 + c,$$
$$\sigma^2 = \mathrm{Var}(u) = \mathrm{Var}(a\bar{x} + b\bar{y} + c) = \mathrm{Var}(a\bar{x} + b\bar{y}) = a^2\,\mathrm{Var}(\bar{x}) + b^2\,\mathrm{Var}(\bar{y}) = a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2},$$
since $\bar{x}$ and $\bar{y}$ are independently distributed.

Our interest here is to set up confidence intervals for $a\mu_1 + b\mu_2 + c$. A usual situation may be to set up confidence intervals for $\mu_2 - \mu_1$; in that case, c = 0, b = 1, a = −1. Various situations are possible.

Case 1 ($\sigma_1^2$ and $\sigma_2^2$ are known). In this case, we can take the pivotal quantity as the standardized u. That is,
$$\frac{u - E(u)}{\sqrt{\mathrm{Var}(u)}} = \frac{u - (a\mu_1 + b\mu_2 + c)}{\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}}} \sim N(0, 1). \tag{12.21}$$

Hence a 100(1 − α)% central confidence interval for aμ1 + bμ2 + c is the following:

$$\Big[u - z_{\frac{\alpha}{2}}\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}},\ u + z_{\frac{\alpha}{2}}\sqrt{a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}}\Big] \tag{12.22}$$
where $z_{\frac{\alpha}{2}}$ is illustrated in Figure 12.2.

Case 2 ($\sigma_1^2 = \sigma_2^2 = \sigma^2$ unknown). In this case, the population variances are given to be equal, but the common value is unknown. In that case, we can use a Student-t statistic. Note from (12.20) that $E[v_1^2] = (n_1 - 1)\sigma_1^2$ and $E[v_2^2] = (n_2 - 1)\sigma_2^2$, and hence when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then $E[v_1^2 + v_2^2] = (n_1 + n_2 - 2)\sigma^2$ or

$$E[v^2] = E\Big[\frac{\sum_{j=1}^{n_1}(x_j - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{n_1 + n_2 - 2}\Big] = \sigma^2. \tag{12.23}$$

Hence $\hat{\sigma}^2 = v^2$ can be taken as an unbiased estimator of $\sigma^2$. In the standardized normal variable, if we replace $\sigma^2$ by $\hat{\sigma}^2$, then we should get a Student-t with $n_1 + n_2 - 2$ degrees of freedom because the corresponding chi-square has $n_1 + n_2 - 2$ degrees of freedom. Hence the pivotal quantity that we will use is the following:

$$\frac{(a\bar{x} + b\bar{y} + c) - (a\mu_1 + b\mu_2 + c)}{\hat{\sigma}\sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}} = \frac{(a\bar{x} + b\bar{y} + c) - (a\mu_1 + b\mu_2 + c)}{v\sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}} \sim t_{n_1+n_2-2} \tag{12.24}$$
where v is defined in (12.23). Now, a 100(1 − α)% central confidence interval for $a\mu_1 + b\mu_2 + c$ is given by

$$\Big[(a\bar{x} + b\bar{y} + c) \mp t_{n_1+n_2-2,\frac{\alpha}{2}}\, v \sqrt{\frac{a^2}{n_1} + \frac{b^2}{n_2}}\Big]. \tag{12.25}$$

The percentage point $t_{n_1+n_2-2,\frac{\alpha}{2}}$ is available from Figure 12.3 and v is available from (12.23). If a confidence interval for $\mu_2 - \mu_1$ is needed, then put c = 0, b = 1, a = −1 in (12.25).
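A sketch of the pooled-variance interval (12.25), assuming scipy; the helper name linear_mean_ci and the illustrative data are ours. The defaults a = −1, b = 1, c = 0 give the common case μ2 − μ1.

```python
import numpy as np
from scipy.stats import t

def linear_mean_ci(x, y, a=-1.0, b=1.0, c=0.0, alpha=0.05):
    """Central interval (12.25) for a*mu1 + b*mu2 + c, equal unknown variances."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # v^2 from (12.23): pooled unbiased estimator of the common variance
    v2 = (((x - x.mean()) ** 2).sum() + ((y - y.mean()) ** 2).sum()) / (n1 + n2 - 2)
    point = a * x.mean() + b * y.mean() + c
    half = t.ppf(1 - alpha / 2, n1 + n2 - 2) * np.sqrt(v2 * (a ** 2 / n1 + b ** 2 / n2))
    return point - half, point + half

# Hypothetical samples, for illustration; defaults give mu2 - mu1.
print(linear_mean_ci([2.1, 1.8, 2.5], [3.0, 2.7, 3.2, 2.9]))
```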

Case 3 ($\sigma_1^2$ and $\sigma_2^2$ are unknown but $n_1 \ge 30$, $n_2 \ge 30$). In this case, one may use the following approximation to the standard normal for setting up confidence intervals:

$$\frac{(a\bar{x} + b\bar{y} + c) - (a\mu_1 + b\mu_2 + c)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0, 1) \tag{12.26}$$

approximately, where $s_1^2 = \sum_{j=1}^{n_1}\frac{(x_j - \bar{x})^2}{n_1}$ and $s_2^2 = \sum_{j=1}^{n_2}\frac{(y_j - \bar{y})^2}{n_2}$ are the sample variances. When $n_1$ and $n_2$ are large, dividing by $n_i$ or $n_i - 1$ for i = 1, 2 will not make a difference. Then the approximate 100(1 − α)% central confidence interval for $a\mu_1 + b\mu_2 + c$ is given

by

$$(a\bar{x} + b\bar{y} + c) \mp z_{\frac{\alpha}{2}}\sqrt{\frac{a^2 s_1^2}{n_1} + \frac{b^2 s_2^2}{n_2}} \tag{12.27}$$
where the percentage point $z_{\frac{\alpha}{2}}$ is available from the standard normal density in Figure 12.2.

12.7.3 Confidence intervals for the ratio of variances

Here again, we consider two independently distributed normal variables $x \sim N(\mu_1, \sigma_1^2)$ and $y \sim N(\mu_2, \sigma_2^2)$ and simple random samples of sizes $n_1$ and $n_2$ from x and y, respectively. We would like to construct a 100(1 − α)% confidence interval for $\theta = \frac{\sigma_1^2}{\sigma_2^2}$. We will make use of the properties that
$$\frac{\sum_{j=1}^{n_1}(x_j - \bar{x})^2}{\sigma_1^2} \sim \chi^2_{n_1-1}, \qquad \frac{\sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sigma_2^2} \sim \chi^2_{n_2-1},$$
$$u\Big(\frac{1}{\theta}\Big) = \frac{[\sum_{j=1}^{n_1}(x_j - \bar{x})^2/(n_1 - 1)]}{[\sum_{j=1}^{n_2}(y_j - \bar{y})^2/(n_2 - 1)]}\Big(\frac{1}{\theta}\Big) \sim F_{n_1-1,n_2-1}. \tag{12.28}$$
From this, one can make the following probability statement:
$$\Pr\Big\{F_{n_1-1,n_2-1,1-\frac{\alpha}{2}} \le u\Big(\frac{1}{\theta}\Big) \le F_{n_1-1,n_2-1,\frac{\alpha}{2}}\Big\} = 1 - \alpha.$$
Rewriting this as a statement on θ, we have
$$\Pr\Big\{\frac{u}{F_{n_1-1,n_2-1,\frac{\alpha}{2}}} \le \theta \le \frac{u}{F_{n_1-1,n_2-1,1-\frac{\alpha}{2}}}\Big\} = 1 - \alpha \tag{12.29}$$
where the percentage points $F_{n_1-1,n_2-1,\frac{\alpha}{2}}$ and $F_{n_1-1,n_2-1,1-\frac{\alpha}{2}}$ are given in Figure 12.5, and

$$u = \frac{[\sum_{j=1}^{n_1}(x_j - \bar{x})^2/(n_1 - 1)]}{[\sum_{j=1}^{n_2}(y_j - \bar{y})^2/(n_2 - 1)]} \sim \theta F_{n_1-1,n_2-1}. \tag{12.30}$$

Figure 12.5: Percentage points from an F-density.

Note 12.9. If a confidence interval for $a\frac{\sigma_1^2}{\sigma_2^2} = a\theta$, where a is a constant, is needed, then multiply and divide u in (12.28) by a, absorb the denominator a with θ, and proceed to get the confidence intervals from (12.29). Also note that only the central interval is considered in (12.29).

Note 12.10. Since an F random variable has the property $F_{m,n} = \frac{1}{F_{n,m}}$, we can convert the lower percentage point $F_{m,n,1-\frac{\alpha}{2}}$ into an upper percentage point on $F_{n,m}$. That is,
$$F_{m,n,1-\frac{\alpha}{2}} = \frac{1}{F_{n,m,\frac{\alpha}{2}}}. \tag{12.31}$$

Hence usually the lower percentage points are not given in F-tables.

Example 12.9. Nine test plots of variety 1 and 5 test plots of variety 2 of tapioca gave the following summary data: $s_1^2 = 10\ \text{kg}^2$ and $s_2^2 = 5\ \text{kg}^2$, where $s_1^2$ and $s_2^2$ are the sample variances. The yield x under variety 1 is assumed to be distributed as $N(\mu_1, \sigma_1^2)$, and the yield y of variety 2 is assumed to be distributed as $N(\mu_2, \sigma_2^2)$ and independently of x. Construct a 90% confidence interval for $3\frac{\sigma_1^2}{\sigma_2^2}$.

Solution 12.9. We want to construct a 90% confidence interval, and hence, in our notation, α = 0.10, $\frac{\alpha}{2}$ = 0.05. The parameter of interest is $3\theta = 3\frac{\sigma_1^2}{\sigma_2^2}$. Construct an interval for θ and then multiply by 3. The required statistic is
$$u = \frac{[\sum_{j=1}^{n_1}(x_j - \bar{x})^2/(n_1 - 1)]}{[\sum_{j=1}^{n_2}(y_j - \bar{y})^2/(n_2 - 1)]} = \frac{[9 s_1^2/8]}{[5 s_2^2/4]} \sim F_{8,4},$$
with observed value
$$\Big[\frac{(9)(10)}{8}\Big]\Big/\Big[\frac{(5)(5)}{4}\Big] = \frac{9}{5}.$$

From the F-tables, we have $F_{8,4,0.05} = 6.04$ and $F_{4,8,0.05} = 3.84$. Hence a 90% central confidence interval for 3θ is given by

$$\Big[\frac{27}{5(F_{8,4,0.05})},\ \frac{27}{5(F_{8,4,0.95})}\Big] = \Big[\frac{27}{5(F_{8,4,0.05})},\ \frac{27(F_{4,8,0.05})}{5}\Big] = \Big[\frac{27}{5(6.04)},\ \frac{27(3.84)}{5}\Big] = [0.89, 20.74].$$
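A sketch of (12.29), reproducing Example 12.9 and assuming scipy; the helper name variance_ratio_ci is ours. Here scipy's f.ppf(1 − α/2, m, n) is the book's upper point $F_{m,n,\alpha/2}$, and the lower point comes from (12.31).

```python
from scipy.stats import f

def variance_ratio_ci(ssx, ssy, n1, n2, mult=1.0, alpha=0.10):
    """Central interval (12.29) for mult * sigma1^2/sigma2^2, where
    ssx = sum((x - xbar)^2) and ssy = sum((y - ybar)^2)."""
    u = (ssx / (n1 - 1)) / (ssy / (n2 - 1))
    f_hi = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # the book's F_{n1-1,n2-1,alpha/2}
    f_lo = f.ppf(alpha / 2, n1 - 1, n2 - 1)      # = 1/F_{n2-1,n1-1,alpha/2} by (12.31)
    return mult * u / f_hi, mult * u / f_lo

# Example 12.9: ssx = 9 * 10 = 90, ssy = 5 * 5 = 25
print(variance_ratio_ci(90, 25, 9, 5, mult=3.0))  # ~[0.89, 20.7]
```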

Note 12.11 (Confidence regions). In a population such as the gamma (real scalar random variable), there are usually two parameters, the scale parameter β, β > 0, and the shape parameter α, α > 0. If relocation of the variable is involved, then there is an additional location parameter γ, −∞ < γ < ∞. In a real scalar normal population N(μ, σ2), there are two parameters: μ, −∞ < μ < ∞, and σ2, 0 < σ2 < ∞. The parameter space of the 3-parameter gamma density is

Ω = {(α, β, γ) ∣ 0 < α < ∞, 0 < β < ∞, −∞ < γ < ∞}.

In the normal case, the parameter space is Ω = {(μ, σ2) ∣ −∞ < μ < ∞, 0 < σ2 < ∞}.

Let θ = (θ1,…, θs) represent the set of all parameters in a real scalar population. In the above gamma case, θ = (α, β, γ), s = 3 and in the above normal case θ = (μ, σ2), s = 2. We may be able to come up with a collection of one or more functions of the

sample values x1,…, xn and some of the parameters from θ, say, P = (P1,…, Pr) such that the joint distribution of P is free of all parameters in θ. Then we will be able to make a statement of the type

$$\Pr\{P \in R_1\} = 1 - \alpha \tag{12.32}$$

for a given α, where $R_1$ is a subspace of $R^r = R \times R \times \cdots \times R$, where R is the real line. If we can convert this statement into a statement of the form

Pr{S1 covers θ} = 1 − α (12.33)

where S1 is a subspace of the sample space S, then S1 is the confidence region for θ. Since computations of confidence regions will be more involved, we will not be discussing this topic further.

Exercises 12.7

12.7.1. In a weight reduction experiment, a random sample of 5 individuals underwent a certain dieting program. The weight of a randomly selected person, before the program started, is x, and when the program is finished, it is y. (x, y) is assumed to have a bivariate normal distribution. The following are the observations on (x, y): (80, 80), (90, 85), (100, 80), (60, 55), (65, 70). Construct a 95% central confidence interval for (a) μ1 − μ2, when (1) the variance of x − y is 4, (2) the variance of x − y is unknown; (b) 0.2μ1 − μ2, when (1) the variance of u = 0.2x − y is known to be 5, (2) the variance of u is unknown.

12.7.2. Two methods of teaching are experimented on sets of $n_1 = 10$ and $n_2 = 15$ students. These students are assumed to have the same backgrounds and are independently selected. If x and y are the grades of randomly selected students under the two methods, respectively, and if $x \sim N(\mu_1, \sigma_1^2)$ and $y \sim N(\mu_2, \sigma_2^2)$, construct 90% confidence intervals for (a) $\mu_1 - 2\mu_2$ when (1) $\sigma_1^2 = 2$, $\sigma_2^2 = 5$, (2) $\sigma_1^2 = \sigma_2^2$ but unknown; (b) $2\sigma_1^2/\sigma_2^2$ when (1) $\mu_1 = -10$, $\mu_2 = 5$, (2) $\mu_1, \mu_2$ are unknown. The following summary statistics are given, with the usual notations: $\bar{x} = 90$, $\bar{y} = 80$, $s_1^2 = 25$, $s_2^2 = 10$.

12.7.3. Consider the same problem as in Exercise 12.7.2 with $n_1 = 40$, $n_2 = 50$, but $\sigma_1^2$ and $\sigma_2^2$ are unknown. Construct a 95% confidence interval for $\mu_1 - \mu_2$, by using the same summary data as in Exercise 12.7.2.

12.7.4. Prove that $F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}$.

12.7.5. Let x1,…, xn be iid variables from some population (discrete or continuous) with mean value μ and variance σ2 < ∞. Use the result that

$$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \sim N(0, 1)$$
approximately for large n, and set up a 100(1 − α)% confidence interval for μ when σ2 is known.

12.7.6. The temperature readings x at location 1 and y at location 2 gave the following data: a simple random sample of size $n_1 = 5$ on x gave $\bar{x} = 20°C$ and $s_1^2 = 5$, and a random sample of $n_2 = 8$ on y gave $\bar{y} = 30°C$ and $s_2^2 = 8$. If $x \sim N(\mu_1, \sigma_1^2)$ and $y \sim N(\mu_2, \sigma_2^2)$, independently distributed, then construct a 90% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$.