Introduction to Estimation

POINT & INTERVAL ESTIMATION AND (INTRODUCTION TO TESTING)

Basic definitions and concepts

The assignment of value(s) to a population parameter based on a value of the corresponding sample statistic is called estimation.

The value(s) assigned to a population parameter based on the value of a sample statistic is called an estimate.

The sample statistic used to estimate a population parameter is called an estimator.

Estimationsteps Point

Theestimationprocedureinvolvesthefollowingsteps:  APointEstimation 1. Select a sample. The value of a samplestatistic that isusedto estimate a populationparameteriscalledapointestimate.Usually, 2. Collecttherequiredinformationfromthemembersof wheneverweusepointestimation,wecalculatethemarginof the sample. errorassociated with that point estimation,which is     s 3. Calculatethevalueofthesamplestatistic. calculatedasfollows: Margin of error 1.96 x or 1.96 x 4. Assigg()pgppnvalue(s)tothecorrespondingpopulation Pointestimateisbasedonjustonesample,wecannot parameter. expectittobeequaltothecorrespondingpopulation parameter.Indeed,eachsamplewillhaveadifferent, non of them isequal to.But they are all unbiased estimatesof.(Recallthatunbiased their expectedvalueisequalto.)

3 4

Parameters
• In statistical inference, the term parameter is used to denote a quantity, say $\theta$, that is a property of an unknown probability distribution.
• For example, the mean, the variance, or a particular quantile of the probability distribution.
• Parameters are unknown, and one of the goals of statistical inference is to estimate them.

Statistics
• A statistic is a property of a sample from the population.
• A statistic is defined to be any function of random variables. So, it is also a random variable. For example, the sample mean, the sample variance, or a particular sample quantile.
• The observed value of the statistic can be calculated from the observed values of the random variables.

Examples of statistics:
sample mean: $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Estimation
• A procedure of "guessing" properties of the population from which data are collected.

Point Estimates
• A point estimate of an unknown parameter $\theta$ is a statistic $\hat{\theta}$ that represents a "guess" of the parameter of interest.
• There may be more than one sensible point estimate of a parameter.

[Figure: the relationship between an unknown parameter $\theta$ and its point estimator $\hat{\theta}$.]

Properties of Estimators that We Desire
• Unbiasedness: $E(\hat{\theta}) = \theta$. In other words, we would wish that the expected value of the estimator is the same as its true value. We define the bias of an estimator as the difference between the expected value of the estimator and the true value in the population: $\text{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
• Efficiency: we wish to minimize the mean square error around the true value. The efficiency tells us how well the estimator performs in predicting $\theta$. Among unbiased estimators, therefore, we want the one with the smallest variance.
• Consistency: as the sample size increases, the variation of the estimator from the true population value decreases.
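The unbiasedness property above can be checked exactly on a toy case. The sketch below is only an illustration: the six-value population and the helper names `sample_mean` and `sample_variance` are assumptions, not part of the slides. It enumerates every ordered sample of size 2 drawn with replacement and averages the statistics over all of them, which gives their expected values without simulation.

```python
from fractions import Fraction
from itertools import product


def sample_mean(xs):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(xs), len(xs))


def sample_variance(xs, ddof=1):
    """Sum of squared deviations divided by (n - ddof)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


# Hypothetical population (an assumption made for illustration only).
population = [1, 2, 3, 4, 5, 6]
n = 2

mu = sample_mean(population)                   # population mean
sigma2 = sample_variance(population, ddof=0)   # population variance

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))

expected_mean = sum(sample_mean(s) for s in samples) / len(samples)
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)
expected_s2_biased = sum(sample_variance(s, ddof=0) for s in samples) / len(samples)

print(expected_mean == mu)          # the sample mean is unbiased for mu
print(expected_s2 == sigma2)        # dividing by n - 1 is unbiased for sigma^2
print(expected_s2_biased < sigma2)  # dividing by n underestimates on average
```

Dividing by n - 1 rather than n is what makes the sample variance defined above an unbiased estimator; in this toy case the divide-by-n version has expected value exactly half the population variance.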

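[Figure: Unbiasedness. P(X) curves for an unbiased and a biased estimator.]

[Figure: Efficiency. P(X) against the sampling distribution of the mean.]

[Figure: Consistency. P(X) curves for a larger sample size (B) and a smaller sample size (A).]

Interval estimation: General approach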



Interval Estimation

Definition
In interval estimation, an interval is constructed around the point estimate, and it is stated that this interval is likely to contain the corresponding population parameter.

[Figure: an interval from $1130 to $1610 constructed around a point estimate of $1370.]

Confidence Interval Estimation

Outline of the procedure:
1. Sample → point estimator ($\bar{X}$ or $\bar{p}$)
2. Confidence level and table → $Z$ or $t_{n-1}$
3. Formulas → compute UCL and LCL: point estimator ± margin of error

Interval estimation of a population mean: The case of known $\sigma$

Interval Estimation of the Population Mean

Each interval is constructed with regard to a given confidence level and is called a confidence interval. The confidence level associated with a confidence interval states how much confidence we have that this interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$.

The $(1 - \alpha)100\%$ confidence interval for $\mu$ (population mean) is:

$\bar{x} \pm z\,\sigma_{\bar{x}}$ if $\sigma$ is known, and $\bar{x} \pm z\,s_{\bar{x}}$ if $\sigma$ is not known (large samples),

where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$.

The value of $z$ used here can be found from the standard normal distribution table, for the given confidence level.

The maximum error of estimate for $\mu$, denoted by $E$, is the quantity that is subtracted from and added to the value of $\bar{x}$ to obtain a confidence interval for $\mu$. Thus, $E = z\,\sigma_{\bar{x}}$ or $E = z\,s_{\bar{x}}$.

Interval Estimation of the Population Mean when $\sigma$ is known: Example

A publishing company has just published a new college textbook. Before the company decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 comparable textbooks and collected information on their prices. This information produces a mean price of $70.50 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50.
(a) What is the point estimate of the mean price of all such textbooks? What is the margin of error for the estimate?
(b) Construct a 90% confidence interval for the mean price of all such college textbooks.

Interval Estimation of the Population Mean when $\sigma$ is known: Answers to the Example

Here we take advantage of our knowledge of the distribution of $\bar{x}$ to develop a confidence interval for $\mu$.
a) $n = 36$, $\bar{x} = \$70.50$, and $\sigma = \$4.50$, thus:
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4.50/\sqrt{36} = \$.75$
Point estimate of $\mu$ = $\bar{x} = \$70.50$
Margin of error = $\pm 1.96\,\sigma_{\bar{x}} = \pm 1.96(.75) = \pm\$1.47$
b) Confidence level is 90% or .90; and $z = 1.65$.
$\bar{x} \pm z\,\sigma_{\bar{x}} = 70.50 \pm 1.65(.75) = 70.50 \pm 1.24$
$= (70.50 - 1.24)$ to $(70.50 + 1.24)$
$= \$69.26$ to $\$71.74$
Based on our results, we can say that we are 90% confident that the mean price of all such college textbooks is between $69.26 and $71.74.

1. Confidence Interval for Population Mean

Example 1: ($\sigma$ known case)
In an effort to estimate the mean amount spent per customer for dinner at a major Atlanta restaurant, data were collected for a sample of 49 customers over a three-week period. Assume a population standard deviation of $5.
a. At the 95% confidence level, what is the margin of error?
b. If the sample mean is $24.80, what is the 95% confidence interval for the population mean?

Answer:
• $n = 49$, $\bar{X} = \$24.8$, $\sigma = \$5$
• Z: $(1 - \alpha)/2 = 0.95/2 = 0.475$; Table 1: $Z = 1.96$
• 1. Margin of error: $Z_{\alpha/2}\,\sigma/\sqrt{n} = (1.96)(5/\sqrt{49}) = 1.4$
• 2. $UCL = \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n} = 24.8 + 1.4 = 26.2$
  $LCL = \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} = 24.8 - 1.4 = 23.4$
  $\mu$: [23.4, 26.2]
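The two worked examples above follow the same recipe, which can be sketched in a few lines. This is a minimal sketch, assuming the function name `z_interval` (not from the slides) and using the standard library's `NormalDist` in place of the printed normal table, so the critical value is exact rather than rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    # Exact critical value; a printed table would give a rounded z.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Textbook example: n = 36, mean $70.50, sigma $4.50, 90% confidence.
low, high = z_interval(70.50, 4.50, 36, 0.90)
print(round(low, 2), round(high, 2))

# Restaurant example: n = 49, mean $24.80, sigma $5, 95% confidence.
low2, high2 = z_interval(24.80, 5.0, 49, 0.95)
print(round(low2, 2), round(high2, 2))
```

For the textbook example the exact 90% critical value is about 1.645 rather than the table value 1.65 used above, so the computed limits differ from $69.26 and $71.74 by about a cent.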

Interval estimation of a population mean: The case of unknown $\sigma$

Interval Estimation of the Population Mean when $\sigma$ is unknown

• Instead of the population standard deviation $\sigma$, we have the sample standard deviation $s$.
• Instead of the normal distribution, we have the t distribution.

The t distribution is used to construct a confidence interval about $\mu$ if:
1. The population from which the sample is drawn is (approximately) normally distributed;
2. The sample size is small (that is, $n < 30$);
3. The population standard deviation, $\sigma$, is not known.

The t Distribution

The t distribution is a specific type of bell-shaped distribution with a lower height and a wider spread than the standard normal distribution. As the sample size becomes larger, the t distribution approaches the standard normal distribution. A specific t distribution depends on only one parameter, called the degrees of freedom (df). The mean of the t distribution is equal to 0 and its standard deviation is found by $\sqrt{df/(df - 2)}$ (for $df > 2$). The graph below depicts the case of df = 9.

[Figure: the standard normal curve and the t curve with df = 9, both centered at $\mu = 0$. The standard deviation of the standard normal distribution is 1.0; the standard deviation of the t distribution is $\sqrt{9/(9 - 2)} = 1.134$.]

The t Distribution: Example

Find the value of t for 16 degrees of freedom and .05 area in the right tail of a t distribution curve.

Area in the Right Tail Under the t Distribution Curve

df     .10     .05     .025     …     .001
1      3.078   6.314   12.706   …     318.309
2      1.886   2.920   4.303    …     22.327
3      1.638   2.353   3.182    …     10.215
.      …       …       …        …     …
16     1.337   1.746   2.120    …     3.686
.      …       …       …        …     …
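The standard deviation formula $\sqrt{df/(df - 2)}$ quoted above is easy to check numerically. The helper below is a small sketch (the name `t_std_dev` is an assumption); it also shows the value shrinking toward 1, the standard deviation of the standard normal distribution, as the degrees of freedom grow.

```python
from math import sqrt


def t_std_dev(df):
    """Standard deviation of the t distribution; only defined for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation requires df > 2")
    return sqrt(df / (df - 2))


# df = 9 reproduces the value quoted for the graph; larger df approach 1.
for df in (9, 30, 1000):
    print(df, round(t_std_dev(df), 3))
```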

The required value of t for 16 df and .05 area in the right tail is 1.746.

The t Distribution (continued)

[Figure: the t distribution with 16 degrees of freedom, showing the areas under the right and the left tails: area .05 to the right of t = 1.746 and, by symmetry, to the left of t = -1.746.]

Confidence Interval for Population Mean Using the t Distribution

The $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$\bar{x} \pm t\,s_{\bar{x}}$, where $s_{\bar{x}} = s/\sqrt{n}$.

The value of t is obtained from the t distribution table for $n - 1$ degrees of freedom and the given confidence level.

Confidence Interval for Population Mean Using the t Distribution: Example

Dr. Moore wanted to estimate the mean cholesterol level for all adult men living in Hartford. He took a sample of 25 adult men from Hartford and found that the mean cholesterol level for this sample is 186 with a standard deviation of 12. Assume that the cholesterol levels for all adult men in Hartford are (approximately) normally distributed. Construct a 95% confidence interval for the population mean.

Confidence Interval for Population Mean Using the t Distribution: Example Answered

• Confidence level is 95% or .95, with $df = n - 1 = 25 - 1 = 24$
• Area in each tail = $.5 - (.95/2) = .5 - .4750 = .025$
• The value of t in the right tail is 2.064, and $s_{\bar{x}} = s/\sqrt{n} = 12/\sqrt{25} = 2.40$

[Figure: t distribution with df = 24, area .4750 on each side of the center and .025 in each tail.]

$\bar{x} \pm t\,s_{\bar{x}} = 186 \pm 2.064(2.40) = 186 \pm 4.95 = 181.05$ to $190.95$

• Thus, we can state with 95% confidence that the mean cholesterol level for all adult men living in Hartford lies between 181.05 and 190.95.
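The cholesterol calculation above can be reproduced with a short helper. This is a sketch under one stated assumption: the critical value is passed in as read from the t table (2.064 for df = 24 and .025 in each tail), because the Python standard library has no t distribution. The name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean using a t value read from the table."""
    std_error = s / sqrt(n)       # s_xbar = s / sqrt(n)
    margin = t_crit * std_error   # t * s_xbar
    return xbar - margin, xbar + margin


# Cholesterol example: n = 25, mean 186, s = 12, table value t = 2.064.
low, high = t_interval(186, 12, 25, 2.064)
print(round(low, 2), round(high, 2))
```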

Example 2: ($\sigma$ unknown case)

The mean flying time for pilots at Continental Airlines is 49 hours per month. This mean was based on a sample of 100 pilots and the sample standard deviation was 8.5 hours.
a. At 95% confidence, what is the margin of error?
b. What is the 95% confidence interval estimate of the population mean flying time?
c. The mean flying time for pilots at United Airlines is 36 hours per month. Discuss the difference between the flying times at the two airlines.

Given: $n = 100$, $\bar{X} = 49$, $S = 8.5$, $1 - \alpha = .95$
Think: What to estimate? Use Z or t?

Answer:
• Sample info (given): $n = 100$, $\bar{X} = 49$, $S = 8.5$
• t: $1 - \alpha = 0.95$, so $\alpha/2 = 0.025$, d.f. $= n - 1 = 99$
  Table 2: d.f. = 100, $\alpha/2 = 0.025$ → $t = 1.984$
           d.f. = 80, $\alpha/2 = 0.025$ → $t = 1.990$
  Interpolation: $t = 1.984 + \dfrac{100 - 99}{100 - 80}(1.990 - 1.984) = 1.9843$
a. m.o.e. $= t_{\alpha/2}\,\dfrac{S}{\sqrt{n}} = 1.9843 \times \dfrac{8.5}{\sqrt{100}} = 1.69$
b. UCL = 49 + 1.69 = 50.69
   LCL = 49 - 1.69 = 47.31
   $\mu$: [47.31, 50.69]
c. 36 < LCL. The mean flying time is lower at United.

Example

Twenty-five randomly selected adults who buy books for general reading were asked how much they usually spend on books per year. The sample produced a mean of $1450 and a standard deviation of $300 for such annual expenses. Assume that such expenses for all adults who buy books for general reading have an approximate normal distribution. Determine a 99% confidence interval for the corresponding population mean.

Solution
• Confidence level is 99% or .99
• $s_{\bar{x}} = s/\sqrt{n} = 300/\sqrt{25} = \$60$
• $df = n - 1 = 25 - 1 = 24$
• Area in each tail = $.5 - (.99/2) = .5 - .4950 = .005$
• The values of t are 2.797 and -2.797
• The 99% confidence interval for $\mu$ is
  $\bar{x} \pm t\,s_{\bar{x}} = \$1450 \pm 2.797(60) = \$1450 \pm \$167.82 = \$1282.18$ to $\$1617.82$
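The interpolation step in Example 2 is ordinary linear interpolation between two table rows. A minimal sketch follows; the function name `interpolate_t` is an assumption, and the two table entries are the ones quoted in the example.

```python
from math import sqrt


def interpolate_t(df, df_lo, t_lo, df_hi, t_hi):
    """Linearly interpolate a t value between two neighbouring table rows."""
    fraction = (df - df_lo) / (df_hi - df_lo)
    return t_lo + fraction * (t_hi - t_lo)


# Table rows from Example 2: d.f. = 80 gives 1.990, d.f. = 100 gives 1.984.
t99 = interpolate_t(99, 80, 1.990, 100, 1.984)

# Margin of error for the flying-time sample: t * S / sqrt(n).
margin = t99 * 8.5 / sqrt(100)
print(round(t99, 4), round(margin, 2))
```

Starting from the d.f. = 80 row and moving up is equivalent to the slide's version, which starts from the d.f. = 100 row and moves back; both give the same interpolated value.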

Interval estimation of a population proportion: The case of large samples

Estimator of the Standard Deviation of $\hat{p}$

The value of $s_{\hat{p}}$, which gives a point estimate of $\sigma_{\hat{p}}$, is calculated as

$s_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

The $(1 - \alpha)100\%$ confidence interval for the population proportion, $p$, is

$\hat{p} \pm z\,s_{\hat{p}}$

The value of $z$ used here is obtained from the standard normal distribution table for the given confidence level, and $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$.

Interval estimation of a population proportion in case of large samples: Example

According to a 2002 survey by FindLaw.com, 20% of Americans needed legal advice during the past year to resolve such thorny issues as family trusts and landlord disputes (CBS.MarketWatch.com, August 6, 2002). Suppose a recent sample of 1000 adult Americans showed that 20% of them needed legal advice during the past year to resolve such family-related issues.
a) What is the point estimate of the population proportion? What is the margin of error of this estimate?
b) Find, with a 99% confidence level, the percentage of all adult Americans who needed legal advice during the past year to resolve such family-related issues.

Interval estimation of a population proportion in case of large samples: Example solved

• $n = 1000$, $\hat{p} = .20$, and $\hat{q} = .80$
• Note that $n\hat{p}$ and $n\hat{q}$ are both greater than 5.

$s_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \sqrt{\dfrac{(.20)(.80)}{1000}} = .01264911$

a) Point estimate of $p$ = $\hat{p} = .20$
   Margin of error = $\pm 1.96\,s_{\hat{p}} = \pm 1.96(.01264911) = \pm .025$ or $\pm 2.5\%$
b) The confidence level is 99%, or .99. The z value for .4950 is approximately 2.58.
   $\hat{p} \pm z\,s_{\hat{p}} = .20 \pm 2.58(.01264911) = .20 \pm .033 = .167$ to $.233$, or 16.7% to 23.3%

Determining the sample size for estimation of the mean

• Given the confidence level and the standard deviation of the population, the sample size that will produce a predetermined maximum error $E$ of the confidence interval estimate of $\mu$ is:

$n = \dfrac{z^2 \sigma^2}{E^2}$, with $E = z\,\sigma_{\bar{x}} = z\,\dfrac{\sigma}{\sqrt{n}}$.

• Example: An alumni association wants to estimate the mean debt of this year's college graduates. It is known that the population standard deviation of the debts of this year's college graduates is $11,800. How large a sample should be selected so that the estimate with a 99% confidence level is within $800 of the population mean?

$n = \dfrac{z^2 \sigma^2}{E^2} = \dfrac{(2.58)^2 (11{,}800)^2}{(800)^2} = 1448.18 \approx 1449$

Determining the sample size for estimating the proportion

• Given the confidence level and the values of $p$ and $q$, the sample size that will produce a predetermined maximum error $E$ of the confidence interval estimate of $p$ is:

$n = \dfrac{z^2 p q}{E^2}$, with $E = z\,\sigma_{\hat{p}} = z\sqrt{\dfrac{pq}{n}}$.

In case the values of $p$ and $q$ are not known:
• Take the most conservative estimate of the sample size $n$ by using $p = .5$ and $q = .5$. For a given $E$, these values of $p$ and $q$ will give the largest sample size in comparison to any other pair of values of $p$ and $q$, since their product is greater than the product of any other pair.
• Take a preliminary sample of arbitrarily determined size and calculate $\hat{p}$ and $\hat{q}$ from this sample. Then use them to find $n$.
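The sample-size formula for the mean given above, $n = z^2\sigma^2/E^2$ rounded up, can be sketched as follows. The function name `sample_size_for_mean` is an assumption; the inputs are converted to exact fractions so that a result that happens to be a whole number is not pushed up by floating-point noise.

```python
from fractions import Fraction
from math import ceil


def sample_size_for_mean(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error."""
    # str() recovers the decimal as typed, so the arithmetic below is exact.
    z, sigma, max_error = (Fraction(str(v)) for v in (z, sigma, max_error))
    return ceil((z * sigma / max_error) ** 2)


# Alumni example: z = 2.58 (99% confidence), sigma = $11,800, E = $800.
n_alumni = sample_size_for_mean(2.58, 11_800, 800)
print(n_alumni)
```

Rounding up rather than to the nearest integer is deliberate: rounding down would leave the maximum error slightly above the target.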

Determining the sample size for estimating the proportion: Example

• Lombard Electronics Company has just installed a new machine that makes a part that is used in clocks. The company wants to estimate the proportion of these parts produced by this machine that are defective. The company manager wants this estimate to be within .02 of the population proportion for a 95% confidence level. What is the most conservative estimate of the sample size that will limit the maximum error to within .02 of the population proportion?
• Answer:
• The value of z for a 95% confidence level is 1.96; $p = .5$ and $q = .5$, thus:

$n = \dfrac{z^2 p q}{E^2} = \dfrac{(1.96)^2 (.50)(.50)}{(.02)^2} = 2401$

• Thus, if the company takes a sample of 2401 parts, there is a 95% chance that the estimate of $p$ will be within .02 of the population proportion.

Determining the sample size for estimating the proportion: Example (continued)

• Consider the previous Example again. Suppose a preliminary sample of 200 parts produced by this machine showed that 7% of them are defective. How large a sample should the company select so that the 95% confidence interval for $p$ is within .02 of the population proportion?
• Answer:
• $\hat{p} = .07$ and $\hat{q} = .93$, thus:

$n = \dfrac{z^2 \hat{p}\hat{q}}{E^2} = \dfrac{(1.96)^2 (.07)(.93)}{(.02)^2} = \dfrac{(3.8416)(.07)(.93)}{.0004} = 625.22 \approx 626$
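Both Lombard calculations above use the same formula with different values of $p$. A minimal sketch, assuming the function name `sample_size_for_proportion` and a default of p = 0.5 for the conservative case:

```python
from fractions import Fraction
from math import ceil


def sample_size_for_proportion(z, max_error, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= max_error."""
    # Exact fractions avoid binary rounding: with floats, a result that is
    # exactly 2401 could come out a hair above it and round up to 2402.
    z, max_error, p = (Fraction(str(v)) for v in (z, max_error, p))
    return ceil(z ** 2 * p * (1 - p) / max_error ** 2)


# Most conservative estimate: p = q = .5, z = 1.96, E = .02.
n_conservative = sample_size_for_proportion(1.96, 0.02)

# Using the preliminary sample: p-hat = .07.
n_preliminary = sample_size_for_proportion(1.96, 0.02, p=0.07)

print(n_conservative, n_preliminary)
```

The conservative choice costs almost four times as many observations here, which is the price of not knowing $p$ in advance.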

INTERVAL ESTIMATION OF A POPULATION MEAN: LARGE SAMPLES

An Interval Estimation cont.

• Definition
• Each interval is constructed with regard to a given confidence level and is called a confidence interval. The confidence level associated with a confidence interval states how much confidence we have that this interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$.

Confidence Interval for $\mu$ for Large Samples

The $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$\bar{x} \pm z\,\sigma_{\bar{x}}$ if $\sigma$ is known
$\bar{x} \pm z\,s_{\bar{x}}$ if $\sigma$ is not known

where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$.

The value of $z$ used here is read from the standard normal distribution table for the given confidence level.

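Repeat, again: Finding z for a 95% confidence level. What does the area in the tails mean?

[Figure: standard normal curve. In general, the central area between $-z$ and $z$ is $(1 - \alpha)$, with $\alpha/2$ in each tail. For a 95% confidence level, the total shaded area between $z = -1.96$ and $z = 1.96$ is .9500 or 95%, that is, .4750 on each side of 0.]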
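The table lookup described above can be mirrored with the standard library's `NormalDist`. This is a sketch only, and the function name `z_for_confidence` is an assumption: the confidence level fixes the tail area $\alpha/2$, and $z$ is the point that leaves that much area in the right tail.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """z that leaves alpha/2 in each tail for the given confidence level."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)

# Area between 0 and 1.96, the quantity a one-sided normal table lists.
area_0_to_z = NormalDist().cdf(1.96) - 0.5

print(round(z95, 2), round(z99, 2), round(area_0_to_z, 4))
```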

Interval Estimation for Population Proportion

Example 3:
A survey asked 346 job seekers their reason for changing jobs. The answer selected most (152 times) was "higher compensation."
a. What is the point estimate of the proportion of job seekers who would select "higher compensation" as the reason for changing jobs?
b. What is the 95% confidence interval estimate of the population proportion?

Example 3: Answer:
a. Point estimate of $p$: $\bar{p} = \dfrac{x}{n} = \dfrac{152}{346} = .4393$
b. Confidence interval:
• Z: $(1 - \alpha)/2 = 0.475$; Table 1: $Z = 1.96$
• Margin of error = $Z_{\alpha/2}\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} = (1.96)\sqrt{\dfrac{(0.4393)(1 - 0.4393)}{346}} = 0.0523$
• Confidence interval:
  $UCL = \bar{p} + \text{m.o.e.} = .4393 + .0523 = .4916$
  $LCL = \bar{p} - \text{m.o.e.} = .4393 - .0523 = .3870$

$p$: [.3870, .4916]
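The steps of Example 3 can be sketched as a single function. The name `proportion_interval` is an assumption, and the critical value 1.96 is passed in as read from the table.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Point estimate, margin of error and interval for a proportion."""
    p_hat = successes / n
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_error
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Job-seeker survey: 152 of 346 chose "higher compensation"; z = 1.96.
p_hat, margin, (low, high) = proportion_interval(152, 346, 1.96)
print(round(p_hat, 4), round(margin, 4), round(low, 4), round(high, 4))
```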

Interval estimation: Sample size and precision

Sample Size and Precision

Quality of estimation:
• Confidence level: $1 - \alpha$
• Precision: margin of error

Confidence level: $1 - \alpha$ is guaranteed by the procedure.

[Figure: sampling distribution for the sample mean.]

The probability that $\bar{X}$ falls between $\mu - Z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\mu + Z_{\alpha/2}\,\sigma/\sqrt{n}$ is $1 - \alpha$. In general, any sample mean that is within this range will provide an interval that contains the population mean $\mu$.

Margin of error:
Given $n$: as $(1 - \alpha)$ increases, the margin of error increases.
Given $1 - \alpha$: as $n$ increases, the margin of error decreases.

Determine sample size to meet requirements for both confidence level and margin of error:

1. Determine sample size for estimation of $\mu$

$n = \left\lceil \dfrac{Z_{\alpha/2}^2\,\sigma^2}{E^2} \right\rceil$

E: desired margin of error
$\lceil\ \rceil$: round up ("Be conservative")

2. Determine sample size for estimation of $p$

$n = \left\lceil \dfrac{Z_{\alpha/2}^2\,p(1 - p)}{E^2} \right\rceil$

$p = .5$ ("Be conservative")
E: desired margin of error
$\lceil\ \rceil$: round up ("Be conservative")

Example 4:
Bride's magazine reported that the mean cost of a wedding is $19,000. Assume that the population standard deviation is $9,400. Use 95% confidence.
a. What is the recommended sample size if the desired margin of error is $1,000?
b. What is the recommended sample size if the desired margin of error is $500?

Answer:
Z: $(1 - \alpha)/2 = .475$, so $Z_{\alpha/2} = 1.96$

a. $n = \left\lceil \dfrac{(1.96)^2 (9400)^2}{1000^2} \right\rceil = \lceil 339.44 \rceil = 340$

b. $n = \left\lceil \dfrac{(1.96)^2 (9400)^2}{500^2} \right\rceil = \lceil 1357.78 \rceil = 1358$
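Example 4 also illustrates a general point about precision: because $E$ enters the formula squared, halving the desired margin of error roughly quadruples the required sample size. A short derivation, using only the formula and numbers above (before rounding up):

```latex
\[
n(E) = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}
\quad\Longrightarrow\quad
\frac{n(E/2)}{n(E)} = \frac{E^{2}}{(E/2)^{2}} = 4,
\qquad
\text{and indeed}\quad \frac{1357.78}{339.44} \approx 4 .
\]
```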

Example 6:
The League of American Theatres and Producers uses an ongoing audience tracking survey that provides up-to-date information about Broadway theater audiences. Every week, the League distributes a one-page survey on random theater seats at a rotation roster of Broadway shows.
a. How large a sample should be taken if the desired margin of error on any proportion is 0.04? Use 95% confidence.

Answer:
Z: $(1 - \alpha)/2 = .475$; Table 1: $Z_{\alpha/2} = 1.96$

$n = \left\lceil \dfrac{(1.96)^2\,(.5)(1 - .5)}{(.04)^2} \right\rceil = \lceil 600.25 \rceil = 601$

Example
According to the analysis of a CNN–USA TODAY–Gallup poll conducted in October 2002, "Stress has become a common part of everyday life in the United States. The demands of work, family, and home place an increasing burden on the average American." According to this poll, 40% of Americans included in the survey indicated that they had a limited amount of time to relax (Gallup.com, November 8, 2002). The poll was based on a randomly selected national sample of 1502 adults aged 18 and older. Construct a 95% confidence interval for the corresponding population proportion.

Solution
• Confidence level = 95% or .95
• The value of z for .95 / 2 = .4750 is 1.96.

$s_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \sqrt{\dfrac{(.40)(.60)}{1502}} = .01264069$

$\hat{p} \pm z\,s_{\hat{p}} = .40 \pm 1.96(.01264069) = .40 \pm .025 = .375$ to $.425$, or 37.5% to 42.5%

Example
According to a report by the Consumer Federation of America, National Credit Union Foundation, and the Credit Union National Association, households with negative assets carried an average of $15,528 in debt in 2002 (CBS.MarketWatch.com, May 14, 2002). Assume that this mean was based on a random sample of 400 households and that the standard deviation of debts for households in this sample was $4200. Make a 99% confidence interval for the 2002 mean debt for all such households.

Solution
• Confidence level is 99% or .99
• $s_{\bar{x}} = s/\sqrt{n} = 4200/\sqrt{400} = \$210$
• The sample is large ($n > 30$). Therefore, we use the normal distribution; $z = 2.58$.

$\bar{x} \pm z\,s_{\bar{x}} = 15{,}528 \pm 2.58(210) = 15{,}528 \pm 541.80 = \$14{,}986.20$ to $\$16{,}069.80$

Thus, we can state with 99% confidence that the 2002 mean debt for all households with negative assets was between $14,986.20 and $16,069.80.