Cardinal Rule

• NEVER-NEVER-NEVER-NEVER-NEVER report a point estimate without a corresponding report of the uncertainty (variability) of the estimate.
• One method of reporting uncertainty is by stating the ______.
• Another method of reporting uncertainty is by constructing a confidence interval.

Inference Types

• Statistical inferences are made when the target and sampled populations are the same.
• Judgement inferences are the result of studies done where the target and sampled populations are not the same.

Confidence Interval

• A confidence interval for a population parameter is a numeric interval that is computed from the sample data.
• The associated confidence is really a probability: it is the chance that the procedure used to compute the interval produces one that actually contains the true population parameter of interest.

General CI - Formula

• Any confidence interval has the general formula:

Parameter = Point Estimate +/- Margin of Error

Margin of Error (ME)

• The margin of error is the measure of the variability associated with the point estimate at the desired level of confidence.
• Small margins of error imply a higher precision than large margins of error.
• The higher the desired confidence level, the larger the margin of error.

Endpoints

• A confidence interval is made up of two endpoints which enclose a range of values.
• The lowest value in the computed confidence interval is called the Lower Endpoint.
• The largest value in the computed confidence interval is called the Upper Endpoint.

Interpretation of Confidence Intervals

• One popular interpretation is that, "After the ______ is done the investigator is 95% (or 90 or 99%) certain (confident) that the computed interval contains the TRUE population parameter of interest."
• We will use this definition for class purposes.

Confidence Level

• This is a proportion associated with ANY confidence interval.
• Most often confidence levels of 0.90, 0.95, 0.98, and 0.99 are used.
• You might also hear the intervals associated with these values referred to as 90, 95, 98, and 99 percent confidence intervals.
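The long-run meaning of a confidence level can be checked by simulation: draw many samples, compute an interval from each, and count how often the intervals contain the true value. The sketch below assumes a hypothetical population with true proportion p = 0.5 and samples of size n = 200 (neither value comes from these slides), uses the large-sample interval for a proportion covered later in these notes, and gets the z-value from Python's standard-library `statistics.NormalDist`. The helper names `wald_interval` and `simulate_coverage` are made up for this illustration.

```python
import random
from statistics import NormalDist


def wald_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample CI for a proportion: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def simulate_coverage(p: float, n: int, level: float, reps: int, seed: int = 1) -> float:
    """Share of simulated samples whose computed interval contains the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 when level = 0.95
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n from a population whose true proportion is p.
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_interval(successes, n, z)
        hits += lower <= p <= upper  # True counts as 1, False as 0
    return hits / reps


# Hypothetical population: true p = 0.5, samples of size 200, 95% intervals.
coverage = simulate_coverage(p=0.5, n=200, level=0.95, reps=5000)
print(f"Share of 95% intervals that contain p: {coverage:.3f}")
```

The printed share should land close to 0.95 but not exactly on it, both because the samples are random and because this simple large-sample interval tends to fall slightly short of its nominal level.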

The Point Estimate

• Just as we used the sample mean as the point estimate for the true population mean, so we use the sample proportion as the point estimate for the true population proportion.
• The sample proportion is denoted as: $\hat{p}$

Computing the point estimate

• To compute p-hat, use the following formula:

$$\hat{p} = \frac{\text{number of outcomes of interest}}{n}$$

Margin of Error (ME)

To compute the ME for the sample proportion you use:

$$ME = z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

CIs for Proportions

Parameter = Point Estimate +/- Margin of Error

$$p = \hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Example

A political pollster would like to estimate the true proportion of the population of county residents that favor a controlled growth ballot issue. A SRS of 350 county residents found that 230 of them favored the issue. Use this information to construct a 95% CI for the true proportion of residents who favor the issue.

Compute the Point Estimate

• Another way to think of successes is to view a "success" as an outcome of interest. In this example the outcome of interest is "favoring" the ballot issue.
• So the point estimate is:

$$\hat{p} = \frac{230}{350} = 0.657$$

Find the z-value

• There are 4 popular levels of confidence. Each one has its own corresponding z-value:

   CL     z-value
   90%    1.645
   95%    1.96
   98%    2.33
   99%    2.58

• In this example CL = 95% so the z-value is 1.96.

Compute the ME - 1

$$ME = z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \cdot \sqrt{\frac{(0.657)(1-0.657)}{350}}$$
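The z-values in the table are standard normal quantiles that leave (1 - CL)/2 in each tail. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(level: float) -> float:
    """Two-sided z-value: leaves (1 - level) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 98%: z = 2.326
# 99%: z = 2.576
```

Rounded to two decimals, the last two values give the 2.33 and 2.58 shown in the table.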

Compute the ME - 2

$$ME = 1.96 \cdot \sqrt{\frac{(0.657)(0.343)}{350}} = 1.96 \cdot \sqrt{0.000644} = 1.96\,(0.0254) = 0.0497$$

Confidence Interval

• Now put everything together:

Parameter = Point Estimate +/- Margin of Error

$$p = 0.657 \pm 0.0497$$
$$p_l = 0.657 - 0.0497 = 0.607$$
$$p_u = 0.657 + 0.0497 = 0.707$$

The 95% CI is (0.607, 0.707).
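The ballot-issue calculation can be reproduced in a few lines. This is a sketch of the large-sample formula from the CIs for Proportions slide; the function name `proportion_ci` is illustrative, not something defined in these notes. Because it keeps full precision instead of rounding intermediate values, it matches the hand calculation to three decimals.

```python
from math import sqrt


def proportion_ci(successes: int, n: int, z: float) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Ballot-issue example: 230 of 350 sampled residents favor the issue, 95% CI.
p_hat, lower, upper = proportion_ci(230, 350, 1.96)
print(f"p-hat = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# p-hat = 0.657, 95% CI = (0.607, 0.707)
```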

Interpretation of Confidence Intervals

• The English interpretation follows the format for interpreting confidence intervals described in a previous slide.
• In this case, we are 95% confident that the true proportion of county residents in favor of the ballot issue is between 0.607 and 0.707.
• Note that you might also interpret in terms of percentages (between 60.7% and 70.7%).

Sample Size Determination: Population Proportions

• Researchers often want to plan a study to ensure that the uncertainty in the point estimate will not exceed some specified value.
• Another way to state this is that the investigators would like to be sure that, at the end of the study, a CI for the parameter of interest does not exceed some prescribed width or that the ME doesn't exceed some specified value.

Sample Size Determination – The conservative approach

• In order to obtain a CI at a specified level of confidence and ME you need to take "n" observations where:

$$n = \frac{z^2}{4E^2}$$

• Here E is the desired margin of error. The approach is conservative because it uses the largest possible value of $\hat{p}(1-\hat{p})$, which is 1/4 at $\hat{p} = 0.5$.

Sample Size Determination

• The superintendent of a large school district wants to estimate p, the proportion of first-graders who have not had their immunization shots. She plans to use a SRS of first-graders to obtain the estimate and she wants to be 95% confident that the point estimate $\hat{p}$ will be no more than 0.05 units from the true value of p.

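Sample Size Determination if you have an estimate of p

• What do we know?
  1) E = 0.05
  2) $\hat{p} = 0.10$
  3) $1 - \hat{p} = 1 - 0.10 = 0.90$
  4) z comes from the z-table. For a 95% CI the value of z = 1.96

Sample Size Determination if you have an estimate of p

• Now it's just plug-n-chug. With the conservative formula:

$$n = \frac{z^2}{4E^2} = \frac{1.96^2}{4(0.05)^2} = 384.2 \rightarrow 385$$

• Using the estimate $\hat{p} = 0.10$ in place of the worst case 1/4 gives a smaller sample:

$$n = \frac{z^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{1.96^2(0.10)(0.90)}{(0.05)^2} = 138.3 \rightarrow 139$$

• Note that we ALWAYS round up!!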
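Both sample-size formulas are easy to script. A minimal sketch; the helper names `n_conservative` and `n_with_estimate` are hypothetical, and `math.ceil` implements the "ALWAYS round up" rule.

```python
from math import ceil


def n_conservative(z: float, e: float) -> int:
    """Conservative sample size n = z^2 / (4 E^2), i.e. p-hat (1 - p-hat) taken as 1/4."""
    return ceil(z ** 2 / (4 * e ** 2))


def n_with_estimate(z: float, e: float, p_guess: float) -> int:
    """Sample size n = z^2 * p (1 - p) / E^2 using a prior estimate p_guess of p."""
    return ceil(z ** 2 * p_guess * (1 - p_guess) / e ** 2)


# Immunization example: 95% confidence (z = 1.96), E = 0.05, prior estimate 0.10.
print(n_conservative(1.96, 0.05))         # 385
print(n_with_estimate(1.96, 0.05, 0.10))  # 139
```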

The Estimation of Population Size

• In order to wisely manage the annual elk hunting season the division of wildlife would like to estimate the total number of elk in the state.
• In order to determine whether or not to remove a species from the endangered species list the EPA wants to estimate the population size of that species.

Capture/Mark/Recapture

• Estimation of the sizes of "hard to ______" populations is often carried out via the method of "capture/recapture".
• This is a two-stage method.
• During the first stage a group from the population of interest is captured and marked in some way.
• After a prescribed time has passed a second sample is "re"captured and the proportion of marked units in this group is used to compute a sample proportion.

Capture/Mark/Recapture

• The value of the sample proportion is used further to estimate the true population size.
• The underlying premise is, as usual, that the sample is a good representation of the population.
• In other words, if my resample contains 10% marked units then we say that the entire population must contain about 10% marked units.
• This implies that the number of units caught and marked at the first stage must have been about 10% of the entire population of interest.

Setting

• A wildlife biologist would like to estimate the number of black bear living in the western part of the state. She spends the summer capturing and marking 82 black bear. The following year she collects another sample of 95 bear. Six of these had been marked. Develop a point estimate and corresponding 95% CI for the population size of black bear in the western part of the state.

Variables and Values

• Let m be the number of bear in the first sample that were caught and marked: m = 82
• Let n be the number of bear caught in the recapture phase: n = 95
• Let k be the number of marked bear present in the recaptured sample: k = 6

Step 1A: Compute $\hat{p}$

$$\hat{p} = \frac{k}{n} = \frac{6}{95} = 0.0632$$

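Step 1B: Interpret $\hat{p}$

• We can interpret this as: $\hat{p}$ = the estimated proportion of marked bears in the entire population of bears.
• Since we know the total number of marked bears (because we put the marks on 'em) we can estimate the total number of bears in the population.

Step 2: Solve For $\hat{N}$

$$\hat{p}\,\hat{N} = m \quad\Rightarrow\quad \hat{N} = \frac{m}{\hat{p}} = \frac{m\,n}{k} = \frac{82 \cdot 95}{6} = 1298.3 \rightarrow 1298$$

• Use the unrounded $\hat{p} = 6/95$ here; dividing by the rounded value 0.0632 gives 1297.5 instead.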
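The capture/recapture point estimate $\hat{N} = m/\hat{p} = mn/k$ fits in a short function. This is a sketch only: the name `estimate_population_size` is hypothetical, and the guard for k = 0 is an added assumption (with no marked recaptures, $\hat{p} = 0$ and the formula breaks down).

```python
def estimate_population_size(m: int, n: int, k: int) -> float:
    """Capture/recapture estimate N-hat = m / p-hat, where p-hat = k / n."""
    if k == 0:
        # No marked units recaptured: p-hat = 0 and m / p-hat is undefined.
        raise ValueError("k must be positive")
    p_hat = k / n  # estimated proportion of marked units in the population
    return m / p_hat


# Black bear example: 82 marked, 95 recaptured, 6 of them marked.
n_hat = estimate_population_size(m=82, n=95, k=6)
print(f"N-hat = {n_hat:.1f}")  # N-hat = 1298.3
```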

CI for N

• We'll get a CI for N by using the CI for p.
• p is the true proportion of marked bear in the population.
• We don't have this value but we have an estimate of it. Recall that p-hat is 0.0632.

CI for p

Recall that we can construct the CI of a population proportion by using the relation:

Parameter = Point Estimate +/- Margin of Error

$$p = \hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

CI for p

• Here $\hat{p}$ is the estimated proportion of marked bear.
• n is the total number of critters (marked and unmarked) in the re-captured sample, n = 95.
• Since we want to develop a 95% CI, the appropriate z-value is 1.96.

CI for p

$$p = 0.0632 \pm 1.96 \cdot \sqrt{\frac{(0.0632)(1-0.0632)}{95}} = 0.0632 \pm 0.0489 = (0.0143,\ 0.112)$$

CI for N

• Just as we used $\hat{p}$ to obtain an estimate for N, so we'll use the interval endpoints to obtain a CI for N.
• Call the upper endpoint: $\hat{p}_u$
• And call the lower endpoint: $\hat{p}_l$

CI for N

• Now use the endpoints of the CI for p to get the endpoints for the CI for N.
• The upper endpoint is: $m / \hat{p}_l$
• And the lower endpoint will be: $m / \hat{p}_u$
• The endpoints swap because dividing m by a smaller proportion gives a larger N.

CI for N

• Now use the endpoints of the CI for p to get the endpoints for the CI for N.
• The upper endpoint is: $82 / 0.0143 = 5734.3 \rightarrow 5734$
• And the lower endpoint will be: $82 / 0.112 = 732.1 \rightarrow 732$

Interpretation of the CI

• This CI is interpreted like all of the other confidence intervals we've looked at thus far.
• That is: We are 95% confident that the true number of bears in the population is between 732 and 5734.
• Note that this is a very wide interval and would probably be of little use!
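The two-step recipe (a CI for p, then invert its endpoints) can be sketched as follows. The function name `population_size_ci` is hypothetical, and the check that the lower limit for p is positive is an added safeguard, since dividing m by a non-positive proportion is meaningless.

```python
from math import sqrt


def population_size_ci(m: int, n: int, k: int, z: float) -> tuple[float, float]:
    """CI for N from the CI for p: lower = m / p_upper, upper = m / p_lower."""
    p_hat = k / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    p_lower, p_upper = p_hat - margin, p_hat + margin
    if p_lower <= 0:
        # m / p_lower would be infinite or negative, so no finite upper limit exists.
        raise ValueError("lower limit for p is not positive")
    # Dividing by the larger proportion gives the smaller N, so the endpoints swap.
    return m / p_upper, m / p_lower


lower, upper = population_size_ci(m=82, n=95, k=6, z=1.96)
print(f"95% CI for N: ({lower:.1f}, {upper:.1f})")
```

Because this code keeps full precision instead of rounding the limits for p to 0.0143 and 0.112, it prints roughly (731.7, 5757.2) rather than (732, 5734). The upper limit shifts the most because it divides by the small lower limit for p, one more sign of how unstable this interval is.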

The Estimation of Population Totals

• In 2002, the estimated total number of highway deaths where alcohol was a contributing factor is: ______
• The estimated number of practicing Catholics in Larimer County is: ______
• The estimated number of college undergraduates who use recreational drugs is: ______

The Estimation of Population Totals

• These statistics are computed in three steps:
  1) Estimate the proportion of the population with the characteristic of interest
  2) Construct a CI for this estimate
  3) Use the values computed in steps 1 and 2 to obtain estimates and limits for the total number of units in a population that exhibit the characteristic of interest

The Estimation of Population Totals

In 2004 a ______ was commissioned by the Wyoming state legislature to estimate the number of its citizens that were living at or below the poverty line. In a random sample of 504 of its residents, 63 reported an annual income that placed them at or below the poverty line.

Step 1A: Compute $\hat{p}$

$$\hat{p} = \frac{63}{504} = 0.125$$

Use census estimates for the population

To estimate the total number of persons living at or below the poverty line we need to know the total number of people living in the state during 2004. This information is usually obtained through the national census database. In 2004 the estimated population was 506,529.

The Estimation of Population Totals

So the estimated total number of persons living at or below poverty is:

$$0.125\,(506{,}529) = 63{,}316$$

Recall that all reported estimates need to include a report of the uncertainty.

Uncertainty in estimates of Population Totals

To report uncertainty we'll use the limits of the CI for the proportion of Wyoming residents that are living at or below the poverty line.

CI for p

For purposes of illustration we'll use a 90% level of confidence.

$$p = \hat{p} \pm z \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.125 \pm 1.645 \cdot \sqrt{\frac{(0.125)(1-0.125)}{504}} = (0.101,\ 0.149)$$

CI for N

• The lower limit for N:
  – $N_l = 0.101\,(506{,}529) = 51{,}159$
• The upper limit for N:
  – $N_u = 0.149\,(506{,}529) = 75{,}473$

Interpret the CI for N

• The estimated number of Wyoming residents living at or below the poverty line is 63,316.
• We are 90% confident that the true number is between 51,159 and 75,473.
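The three steps listed for population totals fit in one function. A sketch assuming the same large-sample CI for p used throughout; `population_total` is a hypothetical name, and the inputs are the Wyoming figures from these slides.

```python
from math import sqrt


def population_total(successes: int, n: int, population: int,
                     z: float) -> tuple[float, float, float]:
    """Estimate a population total and its CI limits from a sample proportion."""
    # Step 1: point estimate of the proportion with the characteristic.
    p_hat = successes / n
    # Step 2: large-sample CI for that proportion.
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # Step 3: scale the estimate and both limits by the population size.
    return (p_hat * population,
            (p_hat - margin) * population,
            (p_hat + margin) * population)


total, lower, upper = population_total(63, 504, 506_529, 1.645)
print(f"N-hat = {total:,.0f}, 90% CI = ({lower:,.0f}, {upper:,.0f})")
```

Without rounding the limits for p to 0.101 and 0.149 first, this prints an estimate of 63,316 with a 90% CI of about (51,041, 75,591); the limits of 51,159 and 75,473 above come from the rounded proportions.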
