Module 5: Interval Estimation Statistics (OA3102)
Professor Ron Fricker, Naval Postgraduate School, Monterey, California
Reading assignment: WM&S chapter 8.5-8.9
Revision: 1-12
Goals for this Module
• Interval estimation, i.e., confidence intervals
  – Terminology
  – Pivotal method for creating confidence intervals
• Types of intervals
  – Large-sample confidence intervals
  – One-sided vs. two-sided intervals
  – Small-sample confidence intervals for the mean, differences in two means
  – Confidence interval for the variance
• Sample size calculations
Interval Estimation
• Instead of estimating a parameter with a single number, estimate it with an interval
• Ideally, the interval will have two properties:
  – It will contain the target parameter $\theta$
  – It will be relatively narrow
• But, as we will see, since interval endpoints are a function of the data,
  – They will be variable
  – So we cannot be sure $\theta$ will fall in the interval
Objective for Interval Estimation
• So, we can’t be sure that the interval contains $\theta$, but we will be able to calculate the probability the interval contains $\theta$
• Interval estimation objective: Find an interval estimator capable of generating narrow intervals with a high probability of enclosing $\theta$
Why Interval Estimation?
• As before, we want to use a sample to infer something about a larger population
• However, samples are variable
  – We’d get different values with each new sample
  – So our point estimates are variable
• Point estimates do not give any information about how far off we might be (precision)
• Interval estimation helps us do inference in such a way that:
  – We can know how precise our estimates are, and
  – We can define the probability we are right
Terminology
• Interval estimators are commonly called confidence intervals
• Interval endpoints are called the upper and lower confidence limits
• The probability the interval will enclose $\theta$ is called the confidence coefficient or confidence level
  – Notation: $1-\alpha$ or $100(1-\alpha)\%$
  – Usually referred to as “$100(1-\alpha)$ percent” CIs
Confidence Intervals: The Main Idea
• Via the CLT, we know that $\bar{Y}$ is within 2 standard errors ($\sigma_Y/\sqrt{n}$) of $\mu_Y$ 95% of the time
• So, $\mu_Y$ must be within 2 SEs of $\bar{Y}$ 95% of the time
[Figure: the (unobserved) population distribution (pdf of $Y$), the (unobserved) sampling distribution of the mean, and the observed $\bar{y}$ with the 95% confidence interval for $\mu_Y$, $\bar{y} \pm 2\sigma_Y/\sqrt{n}$]
In General
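• A two-sided confidence interval: $\Pr(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$
  – $\hat{\theta}_L$ is the lower confidence limit, $\hat{\theta}_U$ is the upper confidence limit, $\theta$ is the target parameter, and $1-\alpha$ is the confidence coefficient
• A lower one-sided confidence interval: $\Pr(\hat{\theta}_L \le \theta) = 1 - \alpha$
• An upper one-sided confidence interval: $\Pr(\theta \le \hat{\theta}_U) = 1 - \alpha$
Pivotal Method: A Strategy for Constructing CIs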
• Pivotal method approach
  – Find a “pivotal quantity” that has the following two characteristics:
    • It is a function of the sample data and $\theta$, where $\theta$ is the only unknown quantity
    • The probability distribution of the pivotal quantity does not depend on $\theta$ (and you know what it is)
• Now, write down an appropriate probability statement for the pivotal quantity and then rearrange terms…
Example: Constructing a 95% CI for $\mu$, $\sigma$ known (1)
• Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal population with unknown mean $\mu_Y$ and known standard deviation $\sigma_Y$
• Create a CI for $\mu_Y$ based on the sampling distribution of the mean: $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/n)$
• To start, we know that (via standardizing): $Z = \dfrac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \sim N(0,1)$
Example: Constructing a 95% CI for $\mu$, $\sigma$ known (2)
• Now for $Z \sim N(0,1)$ we know $\Pr(-1.96 \le Z \le 1.96) = 0.95$
  – That is, there is a 95% probability that the random variable $Z$ lies in this fixed interval
• Thus $\Pr\left(-1.96 \le \dfrac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) = 0.95$
• So, let’s derive a 95% confidence interval…
Example: Constructing a 95% CI for $\mu$, $\sigma$ known (3)
$\Pr\left(-1.96 \le \dfrac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) = 0.95$
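The rearrangement this slide calls for can be sketched as follows, using only the quantities already defined ($\bar{Y}$, $\mu_Y$, $\sigma_Y$, $n$). This is one way to lay out the algebra, not necessarily the steps worked in lecture.

```latex
\begin{align*}
0.95 &= \Pr\left(-1.96 \le \frac{\bar{Y}-\mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) \\
     &= \Pr\left(-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le \bar{Y}-\mu_Y \le 1.96\,\frac{\sigma_Y}{\sqrt{n}}\right) \\
     &= \Pr\left(-\bar{Y}-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le -\mu_Y \le -\bar{Y}+1.96\,\frac{\sigma_Y}{\sqrt{n}}\right) \\
     &= \Pr\left(\bar{Y}-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le \mu_Y \le \bar{Y}+1.96\,\frac{\sigma_Y}{\sqrt{n}}\right)
\end{align*}
```

The last step multiplies through by $-1$, which reverses the inequalities and leaves $\mu_Y$ alone in the middle.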
Example: Constructing a 95% CI for $\mu$, $\sigma$ known (4)
• So, if $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$ are observed values of a random sample from a $N(\mu_Y, \sigma_Y^2)$ with $\sigma_Y$ known, then $\bar{y} \pm 1.96\,\dfrac{\sigma_Y}{\sqrt{n}}$ is a 95% confidence interval for $\mu_Y$
• We can be 95% confident that the interval covers the population mean
  – Interpretation: In the long run, 19 times out of 20 the interval will cover the true mean and 1 time out of 20 it will not
Calculating a Specific CI
• Consider an experiment with sample size $n = 40$, $\bar{y} = 5.426$ and $\sigma_Y = 0.1$
• Calculate a 95% confidence interval for $\mu_Y$
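A minimal sketch of this calculation in Python, assuming the known-$\sigma$ interval $\bar{y} \pm z_{\alpha/2}\,\sigma_Y/\sqrt{n}$ from the preceding slides. The function name `z_interval` is illustrative, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(ybar, sigma, n, conf=0.95):
    """Two-sided CI for a mean when the population sd is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for conf=0.95
    half_width = z * sigma / sqrt(n)
    return ybar - half_width, ybar + half_width

# Values from the slide: n = 40, ybar = 5.426, sigma_Y = 0.1
lower, upper = z_interval(ybar=5.426, sigma=0.1, n=40)
print(f"95% CI for mu_Y: ({lower:.3f}, {upper:.3f})")
```

The interval works out to roughly 5.395 to 5.457.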
Example 8.4
• Suppose we obtain a single observation $Y$ from an exponential distribution with mean $\theta$. Use $Y$ to form a confidence interval for $\theta$ with confidence level 0.9.
• Solution:
Example 8.4 (continued)
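One way to work this example with the pivotal method is sketched below. It assumes an equal-tail split of 0.05 in each tail (other splits of the 0.10 are possible); $U$, $a$, and $b$ are names introduced here for illustration.

```latex
\begin{align*}
U &= Y/\theta \text{ has density } f_U(u) = e^{-u},\ u > 0, \text{ which does not depend on } \theta \\
\Pr(U < a) &= 1 - e^{-a} = 0.05 \;\Rightarrow\; a = -\ln(0.95) \approx 0.051 \\
\Pr(U > b) &= e^{-b} = 0.05 \;\Rightarrow\; b = -\ln(0.05) \approx 2.996 \\
0.90 &= \Pr\left(a \le \frac{Y}{\theta} \le b\right)
      = \Pr\left(\frac{Y}{2.996} \le \theta \le \frac{Y}{0.051}\right)
\end{align*}
```

Under these assumptions the 90% interval is $(Y/2.996,\ Y/0.051)$.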
Example 8.5
• Suppose we take a sample of size $n = 1$ from a uniform distribution on $[0, \theta]$, where $\theta$ is unknown. Find a 95% lower confidence bound for $\theta$.
• Solution:
Example 8.5 (continued)
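A sketch of one solution by the pivotal method; $U$ is a name introduced here for the pivotal quantity.

```latex
\begin{align*}
U &= Y/\theta \sim \mathrm{Uniform}(0,1), \text{ so its distribution does not depend on } \theta \\
0.95 &= \Pr(U \le 0.95) = \Pr\left(\frac{Y}{\theta} \le 0.95\right) = \Pr\left(\frac{Y}{0.95} \le \theta\right)
\end{align*}
```

So $Y/0.95$ serves as a 95% lower confidence bound for $\theta$.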
Large-Sample Confidence Intervals
• If $\hat{\theta}$ is an unbiased statistic, then via the CLT $Z = \dfrac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$ has an approximate standard normal distribution for large samples
• So, use it as an (approximate) pivotal quantity to develop (approximate) confidence intervals for $\theta$
Example 8.6
• Let $\hat{\theta}$ be normally distributed with mean $\theta$ and standard error $\sigma_{\hat{\theta}}$. Find a confidence interval for $\theta$ with confidence level $1-\alpha$.
• Solution:
Example 8.6 (continued)
One-Sided Limits
• Similarly, we can determine the $100(1-\alpha)\%$ one-sided confidence limits (aka confidence bounds):
  – $100(1-\alpha)\%$ lower bound for $\theta$: $\hat{\theta} - z_{\alpha}\,\sigma_{\hat{\theta}}$
  – $100(1-\alpha)\%$ upper bound for $\theta$: $\hat{\theta} + z_{\alpha}\,\sigma_{\hat{\theta}}$
• What if you use both bounds to construct a two-sided confidence interval?
  – Each bound has confidence level $1-\alpha$, so the resulting interval has a $1-2\alpha$ confidence level
Example 8.7
• The shopping times of $n = 64$ randomly selected customers were recorded with $\bar{y} = 33$ minutes and $s_y^2 = 256$. Estimate $\mu$, the true average shopping time per customer, with confidence level 0.9.
• Solution:
Example 8.7 (continued)
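The arithmetic for this example can be sketched in Python. It assumes the large-sample interval $\bar{y} \pm z_{\alpha/2}\,s/\sqrt{n}$, with $s$ standing in for the unknown $\sigma$; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, ybar, s2 = 64, 33.0, 256.0        # values given in Example 8.7
z = NormalDist().inv_cdf(0.95)       # z_{0.05}, about 1.645, for a 90% two-sided CI
half_width = z * sqrt(s2) / sqrt(n)  # s replaces the unknown sigma (large n)
ci = (ybar - half_width, ybar + half_width)
print(f"90% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
```

This gives an interval of roughly 29.71 to 36.29 minutes.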
Example 8.8
• Two brands of refrigerators, A and B, are each guaranteed for a year. Out of a random sample of $n_A = 50$ refrigerators, 12 failed before one year. And out of an independent random sample of $n_B = 60$ refrigerators, 12 failed before one year. Give a 98% CI for $p_A - p_B$.
• Solution:
Example 8.8 (continued)
Example 8.8 (continued)
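A minimal sketch of this example in Python, assuming the usual large-sample interval for a difference of two independent proportions, $(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\hat{p}_A(1-\hat{p}_A)/n_A + \hat{p}_B(1-\hat{p}_B)/n_B}$. The function name `two_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, conf):
    """Large-sample CI for p1 - p2 from independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)        # about 2.33 for conf=0.98
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Brand A: 12 failures out of 50; brand B: 12 failures out of 60
lo, hi = two_proportion_ci(12, 50, 12, 60, conf=0.98)
print(f"98% CI for pA - pB: ({lo:.3f}, {hi:.3f})")
```

The interval is roughly -0.145 to 0.225. Because it contains zero, these data do not show a clear difference between the two failure rates.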
What is a Confidence Interval?
• Before collecting data and calculating it, a confidence interval is a random interval
  – Random because it is a function of a random variable (e.g., $\bar{Y}$)
• The confidence level is the long-run percentage of intervals that will “cover” the population parameter
  – It is not the probability a particular interval contains the parameter!
    • Such a probability statement would imply that the parameter is random
• After collecting the data and calculating the CI, the interval is fixed
  – It then contains the parameter with probability 0 or 1
A CI Simulation
• Simulated 20 confidence intervals (95% level) with samples of size $n = 10$ drawn from a $N(40, 1)$ distribution
• One failed to cover the true (unknown) parameter, which is what is expected on average ($20 \times 0.05 = 1$)
Another CI Simulation
• Simulated 100 confidence intervals (95% level) with samples of size $n = 10$ drawn from a $N(40, 1)$ distribution
• 6 failed to cover the true (unknown) parameter
  – Close to the expected number: 5
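A simulation along these lines can be sketched in Python. The slides do not say how the simulated intervals were built, so this sketch assumes known-$\sigma$ z-intervals with $\sigma = 1$; the function name `coverage` and the seed are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(reps, n=10, mu=40.0, sigma=1.0, conf=0.95, seed=1):
    """Fraction of simulated z-intervals that cover the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)          # same width for every interval
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if ybar - half_width <= mu <= ybar + half_width:
            hits += 1
    return hits / reps

print(coverage(100))     # one run of 100 intervals, as on the slide
print(coverage(20000))   # long-run coverage, which should be near 0.95
```

Any single run of 100 intervals will miss a varying number of times; only the long-run fraction settles near the confidence level.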
Illustrating Confidence Intervals
This is a demonstration showing confidence intervals for a proportion.
Applets created by Prof Gary McClelland, University of Colorado, Boulder. You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html
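Summary: Constructing a Two-sided Large-Sample Confidence Interval
• For an unbiased statistic $\hat{\theta}$, determine $\sigma_{\hat{\theta}}$
• Choose the confidence level: $1-\alpha$
• Find $z_{\alpha/2}$
  – E.g., for $\alpha = 0.05$, $z_{0.025} = 1.96$
• Given data, calculate $\hat{\theta}$ and $\sigma_{\hat{\theta}}$
• Then the $100(1-\alpha)\%$ confidence interval for $\theta$ is $\left[\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}},\ \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right]$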
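E.g., Constructing a Two-sided Large-Sample 95% CI for $\mu$
• $\bar{Y}$ is an unbiased estimator for $\mu$, and we know $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$
• The confidence level is $1-\alpha = 0.95$
• So $z_{\alpha/2} = z_{0.025} = 1.96$
• Given data, calculate $\bar{y}$ and the 95% CI for $\mu$ is $\left[\bar{y} - 1.96\,\sigma_Y/\sqrt{n},\ \bar{y} + 1.96\,\sigma_Y/\sqrt{n}\right]$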
E.g., Constructing a Two-sided Large-Sample 95% CI for $p$
• For $Y$, the number of successes out of $n$ trials, an unbiased estimator for $p$ is $\hat{p} = Y/n$
• Then note that $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
  – Follows from: $\mathrm{Var}(Y/n) = \mathrm{Var}(Y)/n^2 = np(1-p)/n^2 = p(1-p)/n$
  – And, since we don’t know $p$, $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
• As before, for a confidence level of $1-\alpha = 0.95$, $z_{\alpha/2} = z_{0.025} = 1.96$
• So, the 95% CI for $p$ is $\left[\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}\right]$
How Confidence Intervals Behave
• Width of CIs: $w = 2\,z_{\alpha/2}\,\sigma_Y/\sqrt{n}$
• Margin of error: $E = z_{\alpha/2}\,\sigma_Y/\sqrt{n}$
  – Bigger s.d. → bigger s.e. → wider intervals
  – Bigger sample size → smaller s.e. → narrower intervals
  – Higher confidence → bigger z-values → wider intervals
Sample Size Calculations
• We often want to determine the sample size needed to achieve a particular error of estimation
  – Must specify the estimation error $B$ and know or well estimate the population standard deviation $\sigma$
• Then for a $100(1-\alpha)\%$ two-sided CI solve $B = z_{\alpha/2}\,\sigma/\sqrt{n}$ for $n$: $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{B}\right)^2$
Example
• We want to estimate the average daily yield $\mu$ of a chemical, where we know $\sigma = 21$ tons
• Find the sample size ($n$) so that a 95% CI for $\mu$ has an error of estimation less than $B = 5$ tons
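A short sketch of this calculation, assuming the sample-size formula $n = (z_{\alpha/2}\,\sigma/B)^2$ rounded up to the next whole observation. The function name `sample_size_for_mean` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, B, conf=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= B."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / B) ** 2)   # round up: n must be a whole number

n_required = sample_size_for_mean(sigma=21, B=5)
print(n_required)
```

With $\sigma = 21$ and $B = 5$ the unrounded value is about 67.8, so 68 observations are needed.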
Example 8.9
• A stimulus reaction may take two forms: A or B. If we want to estimate the probability the reaction will be A, what sample size do we need if
  – We want the error of estimation less than 0.04
  – The probability $p$ is likely to be near 0.6
  – And we plan to use a confidence level of 90%
• Solution:
Example 8.9 (continued)
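A sketch of the sample-size calculation for this example. It assumes the bound $B = z_{\alpha/2}\sqrt{p(1-p)/n}$ with the planning value $p = 0.6$ plugged in; the function name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_guess, B, conf):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= B at the planning value of p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for conf=0.90
    return ceil(z ** 2 * p_guess * (1 - p_guess) / B ** 2)

n_needed = sample_size_for_proportion(p_guess=0.6, B=0.04, conf=0.90)
print(n_needed)
```

Under these assumptions about 406 trials are needed. Using $p = 0.5$ as the planning value would give a larger, more conservative answer.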
Example 8.10
• We’re going to compare the effectiveness of two types of training (for an assembly op)
  – Subjects to be divided into 2 equally sized groups
  – Measurement range expected to be about 8 mins
  – Estimate mean difference in assembly time to within 1 minute with 95% confidence
• Solution:
Example 8.10 (continued)
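One possible setup for this example is sketched below. It rests on two assumptions that the slide does not state: the rule of thumb $\sigma \approx \text{range}/4$ (so $\sigma \approx 2$ minutes), and a common $\sigma$ in both groups, which gives the bound $B = z_{\alpha/2}\sqrt{\sigma^2/n + \sigma^2/n}$. All names are illustrative.

```python
from math import ceil
from statistics import NormalDist

measurement_range = 8.0           # minutes, from the example
sigma = measurement_range / 4     # ASSUMPTION: range is roughly 4 standard deviations
B = 1.0                           # desired bound on the error of estimation (minutes)
z = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence

# Solve B = z * sqrt(2 * sigma^2 / n) for n, the size of EACH group
n_per_group = ceil(z ** 2 * 2 * sigma ** 2 / B ** 2)
print(n_per_group)
```

With these inputs the unrounded value is about 30.7, so 31 subjects per group.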
Small-Sample Confidence Interval for $\mu$ ($\sigma$ Unknown)
• For small $n$ and $\sigma$ unknown, the standardized statistic is no longer normally distributed
• But, if $\bar{Y}$ is the mean of a random sample of size $n$ from a distribution with mean $\mu$, $T_{n-1} = \dfrac{\bar{Y} - \mu}{s/\sqrt{n}}$ has a t distribution with $n-1$ degrees of freedom
  – Precisely if the population has a normal distribution
    • See Theorems 7.1 & 7.3 and Definition 7.2
  – Approximately for the sample mean via the CLT
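Very Similar to Confidence Interval for $\mu$ with $\sigma$ Known
• So, we can use the t distribution to build a CI!
• Deriving using $T$ as the pivotal quantity:
$\Pr\left(-t_{\alpha/2,n-1} \le T_{n-1} \le t_{\alpha/2,n-1}\right) = \Pr\left(-t_{\alpha/2,n-1} \le \dfrac{\bar{Y} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,n-1}\right)$
$= \Pr\left(-t_{\alpha/2,n-1}\, s/\sqrt{n} \le \bar{Y} - \mu \le t_{\alpha/2,n-1}\, s/\sqrt{n}\right)$
$= \Pr\left(\bar{Y} - t_{\alpha/2,n-1}\, s/\sqrt{n} \le \mu \le \bar{Y} + t_{\alpha/2,n-1}\, s/\sqrt{n}\right) = 1 - \alpha$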
So, Constructing a 95% Confidence Interval for $\mu$ (with $\sigma$ Unknown)
• Choose the confidence level: $1-\alpha$
• Remember the degrees of freedom ($\nu$) $= n-1$
• Find $t_{\alpha/2,n-1}$
  – Example: if $\alpha = 0.05$, df = 7 then $t_{0.025,7} = 2.365$
• Calculate $\bar{y}$ and $s/\sqrt{n}$
• Then the 95% confidence interval for $\mu$ is $\left[\bar{y} - 2.365\,\dfrac{s}{\sqrt{n}},\ \bar{y} + 2.365\,\dfrac{s}{\sqrt{n}}\right]$
  – Remember, the value 2.365 also depends on the dfs
Example 8.11
• A manufacturer of gunpowder has developed a new powder. Eight tests gave the following muzzle velocities in feet per second:
  3,005  2,925  2,935  2,965  2,995  3,005  2,937  2,905
  Find a 95% CI for the true average velocity $\mu$
• Solution:
Example 8.11 (continued)
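The computation for this example can be sketched in Python using the small-sample interval $\bar{y} \pm t_{\alpha/2,n-1}\,s/\sqrt{n}$. The critical value 2.365 is the $t_{0.025,7}$ table value quoted earlier in these slides; it is hard-coded because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]  # ft/s, Example 8.11
t_crit = 2.365                    # t_{0.025, 7} from a t table

n = len(velocities)
ybar = mean(velocities)
s = stdev(velocities)             # sample standard deviation (divisor n - 1)
half_width = t_crit * s / sqrt(n)
print(f"ybar = {ybar}, s = {s:.2f}")
print(f"95% CI for mu: ({ybar - half_width:.1f}, {ybar + half_width:.1f})")
```

Here $\bar{y} = 2959$ and $s \approx 39.1$, so the interval is about $2959 \pm 32.7$ feet per second.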
Small-Sample Confidence Interval for $\mu_1 - \mu_2$
• Suppose we want to compare the means of two normally distributed populations
  – Population 1: mean $\mu_1$, variance $\sigma_1^2$
  – Population 2: mean $\mu_2$, variance $\sigma_2^2$
• Then $Z = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$
• Can use this as a pivotal quantity
Small-Sample Confidence Interval for $\mu_1 - \mu_2$, continued
• If we can further assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then $Z = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$
• But if $\sigma$ is unknown, then we need to appropriately estimate it
• To do so, first calculate the two sample means: $\bar{Y}_1 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i}$ and $\bar{Y}_2 = \dfrac{1}{n_2}\sum_{i=1}^{n_2} Y_{2i}$
Pooled Estimate of the Variance
• Then, the pooled estimate of variance: $s_p^2 = \dfrac{\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2}{n_1 + n_2 - 2}$
  – $\bar{y}_1$ is the sample mean for population 1 and $\bar{y}_2$ is the sample mean for population 2
  – Average squared deviation from different means
• Can also express as a weighted average of $s_1^2$ and $s_2^2$: $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Revision: 2-10
Small-Sample Confidence Interval for $\mu_1 - \mu_2$, continued
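• So, assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and writing $W = (n_1 + n_2 - 2)S_p^2/\sigma^2$, which has a chi-squared distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom, we have
$\dfrac{Z}{\sqrt{W/\nu}} = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \Bigg/ \sqrt{\dfrac{(n_1 + n_2 - 2)S_p^2}{\sigma^2\,(n_1 + n_2 - 2)}}$
$= \dfrac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim T_{n_1 + n_2 - 2}$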
Example 8.12
• Lengths of time for two groups of employees to assemble a device:
  Training Type   Time to Assemble (Measurements)
  Standard        32  37  35  28  41  44  35  31  34
  New             35  31  29  25  34  40  27  32  31
  – Standard: Employees received standard training
  – New: Employees received a new type of training
• Estimate the true mean difference between the two types of training ($\mu_1 - \mu_2$) with 95% confidence
Example 8.12 Solution
Example 8.12 (continued)
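A sketch of the pooled-variance calculation for this example, assuming equal population variances as on the preceding slides. The critical value $t_{0.025,16} = 2.120$ is taken from a standard t table and is an assumption here, since it does not appear in the slides; it is hard-coded because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]   # standard training
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]        # new training
t_crit = 2.120   # ASSUMED table value t_{0.025, 16}, df = n1 + n2 - 2

n1, n2 = len(standard), len(new)
# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * variance(standard) + (n2 - 1) * variance(new)) / (n1 + n2 - 2)
diff = mean(standard) - mean(new)
half_width = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
print(f"difference = {diff:.2f}, pooled variance = {sp2:.2f}")
print(f"95% CI for mu1 - mu2: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```

The estimated difference is about 3.67 with a half-width of about 4.71, so the interval includes zero.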
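CI for the Variance
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and standard deviation $\sigma$
• Consider the pivotal quantity $(n-1)S^2/\sigma^2$, which has a chi-squared distribution with $n-1$ degrees of freedom: $\Pr\left(\chi^2_{1-\alpha/2,\,n-1} \le \dfrac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$
• Then a confidence interval for the variance is: $\Pr\left(\dfrac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$
Example: 95% CI for Variance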
• After observing $s^2 = 25.4$ for $n = 20$ obs, calculate a 95% CI for $\sigma^2$
  – For $\nu = 19$, chi-squared critical values are 8.906 and 32.852
  – So: $\Pr\left(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$, or $\Pr\left(\dfrac{19 \times 25.4}{32.852} \le \sigma^2 \le \dfrac{19 \times 25.4}{8.906}\right) = 0.95$
  – Thus, the 95% CI is [14.69, 54.19]
• Remember, the distribution is not symmetric, so be careful with $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$
  – Lower limit divides by the bigger critical value
Example 8.13
• We want to assess the variability of a measuring methodology. Three independent measurements are taken: 4.1, 5.2, and 10.2. Estimate $\sigma^2$ with confidence level 90%.
• Solution:
Example 8.13 (continued)
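A sketch of this example in Python, assuming the measurements come from a normal population so that $(n-1)S^2/\sigma^2$ is chi-squared with $n-1 = 2$ degrees of freedom. A chi-squared distribution with 2 df is an exponential distribution with mean 2, so its quantiles have a closed form; the helper `chi2_quantile_df2` is a name introduced here and is valid only for 2 df.

```python
from math import log
from statistics import variance

measurements = [4.1, 5.2, 10.2]
n = len(measurements)
s2 = variance(measurements)        # sample variance (divisor n - 1)
alpha = 0.10

def chi2_quantile_df2(p):
    """Lower-tail quantile of chi-squared with 2 df (an exponential with mean 2)."""
    return -2.0 * log(1.0 - p)

assert n - 1 == 2                  # the closed form above only applies for 2 df
chi2_big = chi2_quantile_df2(1 - alpha / 2)    # about 5.991
chi2_small = chi2_quantile_df2(alpha / 2)      # about 0.103
ci = ((n - 1) * s2 / chi2_big, (n - 1) * s2 / chi2_small)
print(f"s^2 = {s2:.2f}, 90% CI for sigma^2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval runs from about 3.5 to a little over 200; the exact upper limit depends on how the small critical value is rounded. The extreme width reflects how little three observations say about a variance.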
Why Calculate CIs for $\sigma$?
• Just like with $\mu$, $\sigma$ is a population parameter
  – Sometimes need to know how well it is estimated by $s$
• E.g., the precision of a weapon is inversely proportional to its standard deviation
  – If the standard deviation is large, the weapon is not precise
  – Confidence intervals for $\sigma$ provide information about the likely range of the impact error
  – Big difference between a $\sigma$ of 3 meters and a $\sigma$ of 300 meters, with implications for both collateral damage and friendly troops
Bootstrap Confidence Intervals
• Can use the bootstrap method to estimate confidence intervals
• Basic idea:
  – Use bootstrap methodology to create an empirical sampling distribution for the statistic of interest
  – Then take the appropriate quantiles of the empirical distribution for the upper and lower endpoints of the confidence interval
• As with point estimation, useful when it’s hard to analytically specify the sampling distribution
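The basic idea above can be sketched as a percentile bootstrap interval. This is one common variant, not necessarily the one used in the course; the function name `bootstrap_percentile_ci`, the number of resamples, and the seed are illustrative choices, and the data are reused from Example 8.11 purely as an illustration.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, stat=mean, conf=0.95, reps=5000, seed=0):
    """Percentile bootstrap CI: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    # Empirical sampling distribution of the statistic, sorted for quantile lookup
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    alpha = 1 - conf
    lo_idx = int((alpha / 2) * reps)
    hi_idx = int((1 - alpha / 2) * reps) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]

sample = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]  # illustrative data
lo, hi = bootstrap_percentile_ci(sample)
print(f"95% bootstrap CI for the mean: ({lo:.1f}, {hi:.1f})")
```

Passing a different `stat` (for example `statistics.median`) gives an interval for a statistic whose sampling distribution is harder to specify analytically.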
Caution! Confidence Intervals are Not for Prediction
• A CI is an interval estimate for the population parameter
• CIs do not predict the likely range of the next observation (a common pitfall!)
• An interval for the next observation is called a prediction interval
• A prediction interval has the variability of the original random variable plus the uncertainty about the population parameter
What We Covered in this Module
• Interval estimation, i.e., confidence intervals
  – Terminology
  – Pivotal method for creating confidence intervals
• Types of intervals
  – Large-sample confidence intervals
  – One-sided vs. two-sided intervals
  – Small-sample confidence intervals for the mean, differences in two means
  – Confidence interval for the variance
• Sample size calculations
Homework
• WM&S chapter 8.5-8.9
  – Required exercises: 40, 41, 42, 60, 63, 64, 71, 82, 91, 96
  – Extra credit: 94
• Useful hints: Problems 8.91 and 8.96: Here you’re given the raw data and must calculate the necessary statistics first