Module 5: (OA3102)

Professor Ron Fricker
Naval Postgraduate School
Monterey, California

Reading assignment: WM&S chapter 8.5-8.9

Revision: 1-12

Goals for this Module

• Interval estimation – i.e., confidence intervals
  – Terminology
  – Pivotal method for creating confidence intervals
• Types of intervals
  – Large-sample confidence intervals
  – One-sided vs. two-sided intervals
  – Small-sample confidence intervals for the mean, differences in two means
  – Confidence interval for the variance
• Sample size calculations

Interval Estimation

• Instead of estimating a parameter with a single number, estimate it with an interval
• Ideally, the interval will have two properties:
  – It will contain the target parameter $\theta$
  – It will be relatively narrow
• But, as we will see, since the interval endpoints are a function of the data,
  – They will be variable
  – So we cannot be sure $\theta$ will fall in the interval

Objective for Interval Estimation

• So, we can’t be sure that the interval contains $\theta$, but we will be able to calculate the probability that the interval contains $\theta$
• Interval estimation objective: Find an interval estimator capable of generating narrow intervals with a high probability of enclosing $\theta$

Why Interval Estimation?

• As before, we want to use a sample to infer something about a larger population
• However, samples are variable
  – We’d get different values with each new sample
  – So our point estimates are variable
• Point estimates do not give any information about how far off we might be (precision)
• Interval estimation helps us do inference in such a way that:
  – We can know how precise our estimates are, and
  – We can define the probability we are right

Terminology

• Interval estimators are commonly called confidence intervals
• Interval endpoints are called the upper and lower confidence limits
• The probability the interval will enclose $\theta$ is called the confidence coefficient or confidence level
  – Notation: $1-\alpha$ or $100(1-\alpha)\%$
  – Usually referred to as “$100(1-\alpha)$ percent” CIs

Confidence Intervals: The Main Idea

• Via the CLT, we know that $\bar{Y}$ is within 2 standard errors ($\sigma_Y/\sqrt{n}$) of $\mu$ about 95% of the time
• So, $\mu$ must be within 2 SEs of $\bar{Y}$ about 95% of the time

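[Figure: the (unobserved) population distribution (pdf of $Y$) and the (unobserved) sampling distribution of the mean, both centered at the (unobserved) $\mu_Y$, with the range $\mu_Y \pm 2\sigma_Y/\sqrt{n}$ marked, the observed $\bar{y}$, and the 95% confidence interval for $\mu_Y$]

In General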

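• A two-sided confidence interval:

$$\Pr\left(\hat{\theta}_L \le \theta \le \hat{\theta}_U\right) = 1 - \alpha$$

  where $\hat{\theta}_L$ is the lower confidence limit, $\hat{\theta}_U$ is the upper confidence limit, $\theta$ is the target parameter, and $1-\alpha$ is the confidence coefficient
• A lower one-sided confidence interval: $\Pr\left(\hat{\theta}_L \le \theta\right) = 1 - \alpha$
• An upper one-sided confidence interval: $\Pr\left(\theta \le \hat{\theta}_U\right) = 1 - \alpha$

Pivotal Method: A Strategy for Constructing CIs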

• Pivotal method approach
  – Find a “pivotal quantity” that has the following two characteristics:
    • It is a function of the sample data and $\theta$, where $\theta$ is the only unknown quantity
    • The probability distribution of the pivotal quantity does not depend on $\theta$ (and you know what it is)
• Now, write down an appropriate probability statement for the pivotal quantity and then rearrange terms…

Example: Constructing a 95% CI for $\mu$, $\sigma$ known (1)

• Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a normal population with unknown mean $\mu_Y$ and known standard deviation $\sigma_Y$
• Create a CI for $\mu_Y$ based on the sampling distribution of the mean: $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/n)$
• To start, we know that (via standardizing):

$$\frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \sim N(0,1)$$

Example: Constructing a 95% CI for $\mu$, $\sigma$ known (2)

• Now for $Z \sim N(0,1)$ we know $\Pr(-1.96 \le Z \le 1.96) = 0.95$
  – That is, there is a 95% probability that the random variable $Z$ lies in this fixed interval
• Thus

$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) = 0.95$$

• So, let’s derive a 95% confidence interval…
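The rearrangement can be sketched in a few steps. The intermediate lines below are filled in as a worked illustration using only the standardized statistic defined above; they are not copied from the slides.

```latex
\begin{align*}
0.95 &= \Pr\left(-1.96 \le \frac{\bar{Y}-\mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) \\
     &= \Pr\left(-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le \bar{Y}-\mu_Y \le 1.96\,\frac{\sigma_Y}{\sqrt{n}}\right) \\
     &= \Pr\left(-\bar{Y}-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le -\mu_Y \le -\bar{Y}+1.96\,\frac{\sigma_Y}{\sqrt{n}}\right) \\
     &= \Pr\left(\bar{Y}-1.96\,\frac{\sigma_Y}{\sqrt{n}} \le \mu_Y \le \bar{Y}+1.96\,\frac{\sigma_Y}{\sqrt{n}}\right)
\end{align*}
```

The last step multiplies through by $-1$, which reverses the inequalities, so the random endpoints $\bar{Y} \pm 1.96\,\sigma_Y/\sqrt{n}$ end up bracketing the fixed, unknown $\mu_Y$.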

Example: Constructing a 95% CI for $\mu$, $\sigma$ known (3)

$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \le 1.96\right) = 0.95$$

Example: Constructing a 95% CI for $\mu$, $\sigma$ known (4)

• So, if $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$ are observed values of a random sample from a $N(\mu_Y, \sigma_Y^2)$ population with $\sigma_Y$ known, then

$$\bar{y} \pm 1.96\,\frac{\sigma_Y}{\sqrt{n}}$$

  is a 95% confidence interval for $\mu_Y$
• We can be 95% confident that the interval covers the population mean
  – Interpretation: In the long run, 19 times out of 20 the interval will cover the true mean and 1 time out of 20 it will not

Calculating a Specific CI

• Consider a sample of size $n = 40$ with $\bar{y} = 5.426$ and $\sigma_Y = 0.1$
• Calculate a 95% confidence interval for $\mu_Y$
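A minimal sketch of this calculation in Python, assuming the known-sigma interval $\bar{y} \pm 1.96\,\sigma_Y/\sqrt{n}$ from the preceding slides. The function name `z_interval` is a hypothetical helper introduced only for illustration.

```python
import math

def z_interval(ybar, sigma, n, z=1.96):
    """Two-sided CI for the mean when sigma is known: ybar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return ybar - half_width, ybar + half_width

# Values from the slide: n = 40, ybar = 5.426, sigma_Y = 0.1
lower, upper = z_interval(ybar=5.426, sigma=0.1, n=40)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The half-width is $1.96 \times 0.1/\sqrt{40} \approx 0.031$, so the interval is roughly (5.395, 5.457).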

Example 8.4

• Suppose we obtain a single observation $Y$ from an exponential distribution with mean $\theta$. Use $Y$ to form a confidence interval for $\theta$ with confidence level 0.9.
• Solution:
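One way to work this example with the pivotal method is sketched below. Putting 0.05 in each tail is an assumption made here for illustration; other ways of splitting the 0.10 also give valid 90% intervals.

```latex
\begin{align*}
&U = Y/\theta \text{ has density } f_U(u) = e^{-u},\ u > 0, \text{ which does not depend on } \theta \\
&\Pr(U < a) = 1 - e^{-a} = 0.05 \;\Rightarrow\; a = -\ln(0.95) \approx 0.051 \\
&\Pr(U > b) = e^{-b} = 0.05 \;\Rightarrow\; b = -\ln(0.05) \approx 2.996 \\
&0.90 = \Pr\left(0.051 \le \frac{Y}{\theta} \le 2.996\right)
      = \Pr\left(\frac{Y}{2.996} \le \theta \le \frac{Y}{0.051}\right)
\end{align*}
```

So, under these assumptions, a 90% confidence interval for $\theta$ is $(Y/2.996,\; Y/0.051)$.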

Example 8.4 (continued)

Example 8.5

• Suppose we take a sample of size $n=1$ from a uniform distribution on $[0, \theta]$, where $\theta$ is unknown. Find a 95% lower confidence bound for $\theta$.
• Solution:
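A sketch of one solution via the pivotal method, using the fact that $Y/\theta$ is uniform on $[0,1]$ when $Y$ is uniform on $[0,\theta]$:

```latex
\begin{align*}
&U = Y/\theta \sim \text{Uniform}(0,1), \text{ so } \Pr(U \le 0.95) = 0.95 \\
&0.95 = \Pr\left(\frac{Y}{\theta} \le 0.95\right) = \Pr\left(\frac{Y}{0.95} \le \theta\right)
\end{align*}
```

This gives $\hat{\theta}_L = Y/0.95$ as a 95% lower confidence bound for $\theta$.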

Example 8.5 (continued)

Large-Sample Confidence Intervals

• If $\hat{\theta}$ is an unbiased estimator of $\theta$, then via the CLT

$$Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$$

  has an approximate standard normal distribution for large samples
• So, use it as an (approximate) pivotal quantity to develop (approximate) confidence intervals for $\theta$

Example 8.6

• Let $\hat{\theta}$ be normally distributed with mean $\theta$ and standard error $\sigma_{\hat{\theta}}$. Find a confidence interval for $\theta$ with confidence level $1-\alpha$.
• Solution:
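A sketch of the solution, using $Z = (\hat{\theta}-\theta)/\sigma_{\hat{\theta}}$ as the pivotal quantity and writing $z_{\alpha/2}$ for the value that cuts off an upper-tail area of $\alpha/2$ under the standard normal density:

```latex
\begin{align*}
1-\alpha &= \Pr\left(-z_{\alpha/2} \le \frac{\hat{\theta}-\theta}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}\right) \\
         &= \Pr\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right)
\end{align*}
```

So the endpoints of a $100(1-\alpha)\%$ confidence interval are $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$.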

Example 8.6 (continued)

One-Sided Limits

• Similarly, we can determine the $100(1-\alpha)\%$ one-sided confidence limits (aka confidence bounds):
  – $100(1-\alpha)\%$ lower bound for $\theta$: $\hat{\theta} - z_{\alpha}\,\sigma_{\hat{\theta}}$
  – $100(1-\alpha)\%$ upper bound for $\theta$: $\hat{\theta} + z_{\alpha}\,\sigma_{\hat{\theta}}$
• What if you use both bounds to construct a two-sided confidence interval?
  – Each bound has confidence level $1-\alpha$, so the resulting interval has a $1-2\alpha$ confidence level

Example 8.7

• The shopping times of $n=64$ randomly selected customers were recorded with $\bar{y} = 33$ minutes and $s_y^2 = 256$. Estimate $\mu$, the true average shopping time per customer, with confidence level 0.9.
• Solution:
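A minimal sketch of the large-sample calculation, assuming $s_y/\sqrt{n}$ is used as the estimated standard error of $\bar{y}$ (reasonable for $n = 64$). The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, ybar, s2 = 64, 33.0, 256.0        # values given in the example
alpha = 0.10                          # confidence level 0.9

z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.05}, approximately 1.645
se = math.sqrt(s2 / n)                    # s / sqrt(n) = 16 / 8 = 2
ci_mean = (ybar - z * se, ybar + z * se)
print(f"90% CI for mu: ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
```

With these inputs the interval is approximately 33 ± 3.29 minutes.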

Example 8.7 (continued)

Example 8.8

• Two brands of refrigerators, A and B, are each guaranteed for a year. Out of a random sample of $n_A=50$ brand A refrigerators, 12 failed before one year. And out of an independent random sample of $n_B=60$ brand B refrigerators, 12 failed before one year. Give a 98% CI for $p_A-p_B$.
• Solution:
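A sketch of the large-sample interval for a difference in proportions, assuming the usual plug-in standard error $\sqrt{\hat{p}_A(1-\hat{p}_A)/n_A + \hat{p}_B(1-\hat{p}_B)/n_B}$. The function name `two_proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x_a, n_a, x_b, n_b, conf):
    """Large-sample CI for p_A - p_B using estimated standard errors."""
    p_a, p_b = x_a / n_a, x_b / n_b
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se

# 12 of 50 brand A failures, 12 of 60 brand B failures, 98% confidence
diff_lower, diff_upper = two_proportion_ci(12, 50, 12, 60, conf=0.98)
print(f"98% CI for pA - pB: ({diff_lower:.3f}, {diff_upper:.3f})")
```

The point estimate is 0.24 − 0.20 = 0.04 and the interval is roughly (−0.145, 0.225). Because it contains zero, these data do not rule out equal failure rates.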

Example 8.8 (continued)

What is a Confidence Interval?

• Before collecting data and calculating it, a confidence interval is a random interval
  – Random because it is a function of a random variable (e.g., $\bar{Y}$)
• The confidence level is the long-run percentage of intervals that will “cover” the population parameter
  – It is not the probability that a particular interval contains the parameter!
    • Such a probability statement would imply that the parameter is random
• After collecting the data and calculating the CI, the interval is fixed
  – It then contains the parameter with probability 0 or 1

A CI Simulation

• Simulated 20 confidence intervals, each at the 95% level, with samples of size $n=10$ drawn from a $N(40,1)$ distribution
• One failed to cover the true (unknown) parameter, which is what is expected on average

Another CI Simulation

• Simulated 100 confidence intervals, each at the 95% level, with samples of size $n=10$ drawn from a $N(40,1)$ distribution
• 6 failed to cover the true (unknown) parameter
  – Close to the expected number: 5
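A simulation along these lines can be sketched as follows. The slides do not say how the intervals were built, so this version assumes $\sigma = 1$ is treated as known and uses $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$. The function name, seed, and number of intervals are illustrative choices.

```python
import math
import random

def simulate_coverage(n_intervals, n=10, mu=40.0, sigma=1.0, z=1.96, seed=1):
    """Fraction of simulated known-sigma 95% CIs that cover the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        if ybar - half_width <= mu <= ybar + half_width:
            covered += 1
    return covered / n_intervals

coverage = simulate_coverage(n_intervals=5000)
print(f"Observed coverage over 5000 intervals: {coverage:.3f}")
```

With only 20 or 100 intervals the number of misses varies noticeably from run to run. With several thousand, the observed coverage settles close to 0.95.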

Illustrating Confidence Intervals

This is a demonstration showing confidence intervals for a proportion.

Applets created by Prof Gary McClelland, University of Colorado, Boulder. You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html

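Summary: Constructing a Two-sided Large-Sample Confidence Interval

• For an unbiased statistic $\hat{\theta}$, determine $\sigma_{\hat{\theta}}$
• Choose the confidence level: $1-\alpha$
• Find $z_{\alpha/2}$
  – E.g., for $\alpha = 0.05$, $z_{0.025} = 1.96$
• Given data, calculate $\hat{\theta}$ and $\sigma_{\hat{\theta}}$
• Then the $100(1-\alpha)\%$ confidence interval for $\theta$ is

$$\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}},\; \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right)$$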
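The recipe above can be sketched as a small generic function. The name `large_sample_ci` and the numbers in the usage line are hypothetical, chosen only to show the mechanics.

```python
from statistics import NormalDist

def large_sample_ci(theta_hat, se, conf=0.95):
    """Two-sided large-sample CI: theta_hat +/- z_{alpha/2} * se."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical inputs: an estimate of 10 with standard error 0.5
ci_lo, ci_hi = large_sample_ci(theta_hat=10.0, se=0.5, conf=0.95)
print(f"95% CI: ({ci_lo:.2f}, {ci_hi:.2f})")
```

Only the estimate and its standard error change from one application to the next, so the same function serves for a mean, a proportion, or a difference of either.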

E.g., Constructing a Two-sided Large-Sample 95% CI for $\mu$

• $\bar{Y}$ is an unbiased estimator for $\mu$, and we know $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$
• The confidence level is $1-\alpha = 0.95$
• So $z_{\alpha/2} = z_{0.025} = 1.96$
• Given data, calculate $\bar{y}$, and the 95% CI for $\mu$ is

$$\left(\bar{y} - 1.96\,\sigma_Y/\sqrt{n},\; \bar{y} + 1.96\,\sigma_Y/\sqrt{n}\right)$$

E.g., Constructing a Two-sided Large-Sample 95% CI for $p$

• For $Y$, the number of successes out of $n$ trials, an unbiased estimator for $p$ is $\hat{p} = Y/n$
• Then note that $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
  – Follows from: $\mathrm{Var}(Y/n) = \mathrm{Var}(Y)/n^2 = np(1-p)/n^2 = p(1-p)/n$
  – And, since we don’t know $p$, we use the estimate $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
• As before, for a confidence level of $1-\alpha = 0.95$, $z_{\alpha/2} = z_{0.025} = 1.96$
• So, the 95% CI for $p$ is

$$\left(\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n},\; \hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}\right)$$

How Confidence Intervals Behave

• Width of CIs: $w = 2\,z_{\alpha/2}\,\sigma_Y/\sqrt{n}$
• Margin of error: $E = z_{\alpha/2}\,\sigma_Y/\sqrt{n}$
  – Bigger s.d. → bigger s.e. → wider intervals
  – Bigger sample size → smaller s.e. → narrower intervals
  – Higher confidence → bigger z-values → wider intervals

Sample Size Calculations

• Often we want to determine the sample size necessary to achieve a particular error of estimation
  – Must specify the estimation error $B$ and know (or have a good estimate of) the population standard deviation $\sigma$
• Then for a $100(1-\alpha)\%$ two-sided CI, solve

$$B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

  for $n$:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$

Example

• We want to estimate the average daily yield $\mu$ of a chemical, where we know $\sigma=21$ tons
• Find the sample size ($n$) so that a 95% CI for $\mu$ has an error of estimation less than $B=5$ tons
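A short sketch of this calculation, assuming the formula $n = (z_{\alpha/2}\,\sigma/B)^2$ with the result rounded up to the next whole observation. The function name is hypothetical.

```python
import math

def sample_size_for_mean(sigma, B, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= B, i.e. n >= (z * sigma / B)^2."""
    return math.ceil((z * sigma / B) ** 2)

n_required = sample_size_for_mean(sigma=21, B=5)
print(f"Required sample size: {n_required}")
```

Here $(1.96 \times 21/5)^2 \approx 67.8$, so 68 observations are needed. Rounding down to 67 would leave the error of estimation slightly above 5 tons.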

Example 8.9

• A stimulus reaction may take two forms: A or B. If we want to estimate the probability the reaction will be A, what sample size do we need if
  – We want the error of estimation less than 0.04
  – The probability $p$ is likely to be near 0.6
  – And we plan to use a confidence level of 90%
• Solution:
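A sketch of the sample-size calculation for a proportion, assuming the error bound $B = z_{\alpha/2}\sqrt{p(1-p)/n}$ is solved for $n$ with the guessed value $p \approx 0.6$ plugged in. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_guess, B, conf):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= B for a guessed p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / B ** 2)

n_prop = sample_size_for_proportion(p_guess=0.6, B=0.04, conf=0.90)
print(f"Required sample size: {n_prop}")
```

With $z_{0.05} \approx 1.645$ this gives $n \approx 405.8$, rounded up to 406. If no guess for $p$ were available, using $p = 0.5$ would give the most conservative (largest) sample size.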

Example 8.9 (continued)

Example 8.10

• We’re going to compare the effectiveness of two types of training (for an assembly operation)
  – Subjects to be divided into 2 equally sized groups
  – Measurement range expected to be about 8 mins
  – Estimate mean difference in assembly time to within 1 minute with 95% confidence
• Solution:
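One way to set this up is sketched below. It rests on two assumptions not stated on the slide: the common standard deviation is approximated by range/4, so $\sigma \approx 8/4 = 2$ minutes, and both groups have the same size $n$.

```latex
\begin{align*}
&\sigma_{\bar{Y}_1-\bar{Y}_2} = \sqrt{\frac{\sigma^2}{n}+\frac{\sigma^2}{n}} = \sqrt{\frac{2\sigma^2}{n}} \\
&1.96\sqrt{\frac{2(2)^2}{n}} = 1
   \;\Rightarrow\; n = (1.96)^2 \times 8 \approx 30.73
\end{align*}
```

Rounding up gives $n = 31$ subjects per group under these assumptions.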

Example 8.10 (continued)

Small-Sample Confidence Interval for $\mu$ ($\sigma$ Unknown)

• For small $n$ and $\sigma$ unknown, the standardized statistic is no longer normally distributed
• But, if $\bar{Y}$ is the mean of a random sample of size $n$ from a distribution with mean $\mu$,

$$T = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$

  has a t distribution with $\nu = n-1$ degrees of freedom
  – Precisely, if the population has a normal distribution
    • See Theorems 7.1 & 7.3 and Definition 7.2
  – Approximately, for the sample mean via the CLT

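Very Similar to Confidence Interval for $\mu$ with $\sigma$ Known

• So, we can use the t distribution to build a CI!
• Deriving using $T$ as the pivotal quantity:

$$
\begin{aligned}
\Pr\left(-t_{\alpha/2,n-1} \le T \le t_{\alpha/2,n-1}\right)
 &= \Pr\left(-t_{\alpha/2,n-1} \le \frac{\bar{Y}-\mu}{S/\sqrt{n}} \le t_{\alpha/2,n-1}\right) \\
 &= \Pr\left(-t_{\alpha/2,n-1}\,S/\sqrt{n} \le \bar{Y}-\mu \le t_{\alpha/2,n-1}\,S/\sqrt{n}\right) \\
 &= \Pr\left(\bar{Y} - t_{\alpha/2,n-1}\,S/\sqrt{n} \le \mu \le \bar{Y} + t_{\alpha/2,n-1}\,S/\sqrt{n}\right) = 1-\alpha
\end{aligned}
$$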

So, Constructing a 95% Confidence Interval for $\mu$ (with $\sigma$ Unknown)

• Choose the confidence level: $1-\alpha$
• Remember the degrees of freedom ($\nu$) $= n-1$
• Find $t_{\alpha/2,n-1}$
  – Example: if $\alpha = 0.05$, df = 7 then $t_{0.025,7} = 2.365$
• Calculate $\bar{y}$ and $s/\sqrt{n}$
• Then the 95% confidence interval for $\mu$ is

$$\left(\bar{y} - 2.365\,\frac{s}{\sqrt{n}},\; \bar{y} + 2.365\,\frac{s}{\sqrt{n}}\right)$$

  – Remember, the value 2.365 also depends on the dfs

Example 8.11

• A manufacturer of gunpowder has developed a new powder. Eight tests gave the following muzzle velocities in feet per second:

  3,005  2,925  2,935  2,965  2,995  3,005  2,937  2,905

  Find a 95% CI for the true average velocity $\mu$
• Solution:
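A sketch of the small-sample t interval for these data. The critical value $t_{0.025,7} = 2.365$ is the table value quoted in the slides and is hard-coded here, since the Python standard library has no t quantile function.

```python
import math
from statistics import mean, stdev

velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]  # ft/sec
n_obs = len(velocities)
t_crit = 2.365                       # t_{0.025, 7} from the t table

ybar = mean(velocities)              # sample mean
s = stdev(velocities)                # sample s.d. (divides by n - 1)
half_width = t_crit * s / math.sqrt(n_obs)
ci_velocity = (ybar - half_width, ybar + half_width)
print(f"ybar = {ybar}, s = {s:.2f}, 95% CI = "
      f"({ci_velocity[0]:.1f}, {ci_velocity[1]:.1f})")
```

Here $\bar{y} = 2959$ and $s \approx 39.1$, giving an interval of about 2959 ± 32.7 feet per second. The interval is only as trustworthy as the assumption that the velocities are roughly normally distributed.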

Example 8.11 (continued)

Small-Sample Confidence Interval for $\mu_1-\mu_2$

• Suppose we want to compare the means of two normally distributed populations
  – Population 1: mean $\mu_1$, variance $\sigma_1^2$
  – Population 2: mean $\mu_2$, variance $\sigma_2^2$
• Then

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

• Can use this as a pivotal quantity

Small-Sample Confidence Interval for $\mu_1-\mu_2$, continued

• If we can further assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then

$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$$

• But if $\sigma$ is unknown, then we need to appropriately estimate it
• To do so, first calculate the two sample means

$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i} \qquad \bar{Y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_{2i}$$

Pooled Estimate of the Variance

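• Then, the pooled estimate of variance:

$$s_p^2 = \frac{\sum_{i=1}^{n_1}(y_{1i}-\bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i}-\bar{y}_2)^2}{n_1+n_2-2}$$

  – $\bar{y}_1$ and $\bar{y}_2$ are the sample means for populations $Y_1$ and $Y_2$
  – This is the average squared deviation from the two different means
• Can also express as a weighted average of $s_1^2$ and $s_2^2$:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Small-Sample Confidence Interval for $\mu_1-\mu_2$, continued

• So, assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we have

$$T = \frac{Z}{\sqrt{W/\nu}}
   = \frac{\dfrac{(\bar{Y}_1-\bar{Y}_2)-(\mu_1-\mu_2)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}}{\sqrt{\dfrac{(n_1+n_2-2)S_p^2}{\sigma^2}\Big/(n_1+n_2-2)}}
   = \frac{(\bar{Y}_1-\bar{Y}_2)-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$

  where $W = (n_1+n_2-2)S_p^2/\sigma^2$ has a chi-squared distribution with $\nu = n_1+n_2-2$ degrees of freedom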

Example 8.12

• Lengths of time for two groups of employees to assemble a device:

  Training Type    Time to Assemble (Measurements)
  Standard         32 37 35 28 41 44 35 31 34
  New              35 31 29 25 34 40 27 32 31

  – Standard: Employees received standard training
  – New: Employees received a new type of training
• Estimate the true mean difference between the two training types ($\mu_1-\mu_2$) with 95% confidence

Example 8.12 Solution
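A sketch of the pooled-variance t interval for these data, assuming equal population variances as in the preceding slides. The critical value $t_{0.025,16} = 2.120$ is a standard t-table value and is hard-coded; the variable names are illustrative.

```python
import math
from statistics import mean, variance

standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new = [35, 31, 29, 25, 34, 40, 27, 32, 31]
n1, n2 = len(standard), len(new)

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * variance(standard) + (n2 - 1) * variance(new)) / (n1 + n2 - 2)
t_crit_16 = 2.120                    # t_{0.025, 16} from the t table

mean_diff = mean(standard) - mean(new)
half = t_crit_16 * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
ci_diff = (mean_diff - half, mean_diff + half)
print(f"difference = {mean_diff:.2f}, s_p^2 = {sp2:.2f}, "
      f"95% CI = ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```

The estimated difference is about 3.67 with a half-width of about 4.71, so the interval runs from roughly −1.05 to 8.38. Since it includes zero, these data alone do not establish a difference between the training types.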

Example 8.12 (continued)

CI for the Variance

• Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and standard deviation $\sigma$
• Consider the pivotal quantity $(n-1)S^2/\sigma^2$, which has a chi-squared distribution with $n-1$ degrees of freedom:

$$\Pr\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha$$

• Then a confidence interval for the variance is:

$$\Pr\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1-\alpha$$

Example: 95% CI for Variance

• After observing $s^2 = 25.4$ for $n=20$ observations, calculate a 95% CI for $\sigma^2$
  – For $\nu=19$, chi-squared critical values are 8.906 and 32.852
  – So:

$$\Pr\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1-\alpha$$

    or,

$$\Pr\left(\frac{19 \times 25.4}{32.852} \le \sigma^2 \le \frac{19 \times 25.4}{8.906}\right) = 0.95$$

    Thus, the 95% CI is [14.69, 54.19]
• Remember, the chi-squared distribution is not symmetric, so be careful with $\alpha/2$ and $1-\alpha/2$
  – Lower limit divides by the bigger critical value

Example 8.13

• We want to assess the variability of a measuring methodology. Three independent measurements are taken: 4.1, 5.2, and 10.2. Estimate $\sigma^2$ with confidence level 90%.
• Solution:
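A sketch of the chi-squared interval for these three measurements, assuming they come from a normal population. The two critical values for 2 degrees of freedom are hard-coded as standard chi-squared table values (tail areas of 0.05 on each side); the variable names are illustrative.

```python
from statistics import variance

measurements = [4.1, 5.2, 10.2]
n_meas = len(measurements)
s2_meas = variance(measurements)     # sample variance (divides by n - 1)

chi2_small = 0.102587                # chi-squared value with area 0.95 to its right, 2 df
chi2_large = 5.99147                 # chi-squared value with area 0.05 to its right, 2 df

# Lower limit divides by the bigger critical value, upper by the smaller
var_ci = ((n_meas - 1) * s2_meas / chi2_large,
          (n_meas - 1) * s2_meas / chi2_small)
print(f"s^2 = {s2_meas:.2f}, 90% CI for sigma^2 = "
      f"({var_ci[0]:.2f}, {var_ci[1]:.2f})")
```

Here $s^2 = 10.57$ and the interval is roughly (3.5, 206). It is extremely wide because only three measurements are available.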

Example 8.13 (continued)

Why Calculate CIs for $\sigma$?

• Just like $\mu$, $\sigma$ is a population parameter
  – Sometimes need to know how well it is estimated by $s$
• E.g., the precision of a weapon is inversely proportional to its standard deviation – if the standard deviation is large, the weapon is not precise
  – Confidence intervals for $\sigma$ provide information about the likely range of the impact error
  – Big difference between a $\sigma$ of 3 meters and a $\sigma$ of 300 meters, with implications for both collateral damage and friendly troops

Bootstrap Confidence Intervals

• Can use the bootstrap method to estimate confidence intervals
• Basic idea:
  – Use bootstrap methodology to create an empirical sampling distribution for the statistic of interest
  – Then take the appropriate quantiles of the empirical distribution for the upper and lower endpoints of the confidence interval
• Useful when it’s hard to analytically specify the sampling distribution
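The basic idea can be sketched as a percentile bootstrap. The slides do not specify a particular bootstrap variant, so the percentile method, the function name, the number of resamples, and the seed are all assumptions made for illustration. The data are the muzzle velocities from Example 8.11.

```python
import random
from statistics import mean

def bootstrap_percentile_ci(data, stat=mean, conf=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap CI: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    # Empirical sampling distribution of the statistic
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1 - conf
    lo_idx = int((alpha / 2) * n_boot)            # index near the alpha/2 quantile
    hi_idx = int((1 - alpha / 2) * n_boot) - 1    # index near the 1 - alpha/2 quantile
    return boot_stats[lo_idx], boot_stats[hi_idx]

velocity_data = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]
boot_lo, boot_hi = bootstrap_percentile_ci(velocity_data)
print(f"Bootstrap 95% CI for the mean: ({boot_lo:.1f}, {boot_hi:.1f})")
```

Passing a different `stat` (for example `statistics.median`) gives an interval for a statistic whose sampling distribution is hard to write down analytically. With a sample as small as eight, percentile intervals tend to be somewhat too narrow.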

Caution! Confidence Intervals are Not for Prediction

• A CI is an interval estimate for the population parameter
• CIs do not predict the likely range of the next observation (a common pitfall!)
• The interval for the next observation is called a prediction interval
• A prediction interval has the variability of the original random variable plus the uncertainty about the population parameter

What We Covered in this Module

• Interval estimation – i.e., confidence intervals
  – Terminology
  – Pivotal method for creating confidence intervals
• Types of intervals
  – Large-sample confidence intervals
  – One-sided vs. two-sided intervals
  – Small-sample confidence intervals for the mean, differences in two means
  – Confidence interval for the variance
• Sample size calculations

Homework

• WM&S chapter 8.5-8.9
  – Required exercises: 40, 41, 42, 60, 63, 64, 71, 82, 91, 96
  – Extra credit: 94
• Useful hints:
  – Problems 8.91 and 8.96: Here you’re given the raw data and must calculate the necessary statistics first
