Stratified Sampling
Total Page:16
File Type:pdf, Size:1020Kb
Stratified Sampling! Professor Ron Fricker! Naval Postgraduate School! Monterey, California! Reading Assignment:! Scheaffer, Mendenhall, Ott, & Gerow! 2/1/13 Chapter 5! 1 Goals for this Lecture! •" Define stratified sampling! •" Discuss reasons for stratified sampling! •" Derive estimators for the population! –" Mean! –" Total! –" Percentage! •" Sample size calculation and allocation! –" Sampling proportional to strata size! •" Discuss design effects! 2/1/13 2 Stratified Sampling! •" Stratified sampling divides the sampling frame up into strata from which separate probability samples are drawn! •" Examples! –" GWI survey, needed to obtain information from members of each military service! –" For external validity, WMD survey had to sample large urban areas! –" In a particular country, survey in support of IW requires information on both Muslim and Christian communities! 2/1/13 3 Reasons for Stratified Sampling! •" More precision! –" Assuming strata are relatively homogeneous, can reduce the variance in the sample statistic(s)! –" So, get better information for same sample size! •" Estimates (of a particular precision) needed for subgroups! –" May be necessary to meet survey’s objectives! –" May have to oversample to get sufficient observations from small strata ! •" Cost! –" May be able to reduce cost per observation if population stratified into relevant groupings! 2/1/13 4 An Example! 2/1/13 5 Source: Survey Methodology, 1st ed., Groves, et al, 2004. Strata! •" When selecting a stratified random sample, must clearly specify the strata! –" Non-overlapping categories into which each sampling unit must be classified! –" Sampling units can only be in one strata! –" Strata based on information about whole population! •" Can have more than one type of classification! –" E.g., officer/enlisted and male/female! –" Then each sampling unit (person in this case) must be classified into one of four stratum! 2/1/13 6 Some Notation! •" Additional notation for stratified sampling! •" L is the number of strata! •" Ni is the number of sampling units in stratum i •" ni is the sample size in stratum i •" N is the total number of sampling units in the … population: N = N1 + N2 + + NL •" In this lecture, we use SRS within each strata! •" That does not necessarily mean each strata has the same selection probabilities! •" And often more complex sampling done within the strata! 2/1/13 7 Mean Estimation! •" Within a strata, probability a unit is sampled is ni/Ni •" Via the SRS lecture, we know that an unbiased estimate of the mean of strata i is! n n n n 1 i 1 1 i 1 1 i N 1 i y y i y y y ∑ ij = ∑ ij = ∑ ij = ∑ ij = i N j 1 π N j 1 n / N N j 1 n n j 1 i = i i = i i i = i i = •" So the estimated total for strata i is!τˆiii= Ny •" And the population mean estimate is just the sum of all the estimated totals divided by the τˆ 11LL population size! yNyˆ st==∑∑τ i = i i NNii==11 N 2/1/13 8 Mean Estimation Summary! 1 L •" Estimator for the mean:! yNy st= ∑ i i N i=1 •" Variance of y st :! ⎛ 1 L ⎞ Var y Var N y ( st ) = ⎜ ∑ i i ⎟ ⎝ N i=1 ⎠ 1 L = N 2Var y 2 ∑ i ( i ) N i=1 1 L ⎛ n ⎞ ⎛ s2 ⎞ ! = N 2 1 i i 2 ∑ i ⎜ − ⎟ ⎜ ⎟ N i=1 ⎝ N ⎠ ⎝ n ⎠ i i 2 Var y •" Bound on the error of estimation:! ( st ) 2/1/13 9 Example: TV Viewing Time (1)! •" Survey to estimate average TV viewing time! •" Population consists of two urban areas and rural residents! •" Reasons to stratify:! –" Similarity of viewing habits by towns and rural regions! –" Cost: perhaps cheaper to survey each region separately (?)! •" Sample allocated proportional to strata size! 2/1/13 10 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (2)! 2/1/13 11 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (3)! 2/1/13 12 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (4)! •" Correct analysis:! y 27.7, ˆ 1.4, 95% CI = (24.9, 30.5) st==σ yst •" What if you just treated this as a SRS?! –" Incorrect analysis with fpc:! ySRS==27.7, σˆ y 10.6, 95% CI = (6.5, 48.9) –" Incorrect analysis without fpc:! ySRS′ ==27.7, σˆ y 11.3, 95% CI = (5.1, 50.3) 2/1/13 13 Design Effect! •" The design effect (aka deff) compares how a complex sampling design, in this case stratified sampling, compares to SRS! 1 L N 2 1− n / N s2 / n Var Y N 2 ∑ i ( i i ) i i d 2 = ( st ) = i=1 Var Y 1− n / N s2 / n ( SRS ) ( ) •" Design effect can be greater or less than 1! –" But with reasonably homogeneous strata, almost always get decrease in variance! –" For this example, ! Var Y 1.97 d 2 = ( st ) = = 0.02 Var Y 112.36 ( SRS ) 2/1/13 14 Total Estimation Summary! L ˆ Ny N y •" Estimator for the total:!τ st== st∑ i i i=1 •" Variance of τ ˆ st :! ⎛ L ⎞ Var ˆ Var N y (τ st ) = ⎜ ∑ i i ⎟ ⎝ i=1 ⎠ L = N 2Var y ∑ i ( i ) i=1 L ⎛ n ⎞ ⎛ s2 ⎞ ! = N 2 1 i i ∑ i ⎜ − ⎟ ⎜ ⎟ i=1 ⎝ N ⎠ ⎝ n ⎠ i i •" Bound on the error of estimation:! 2 Var ˆ (τ st ) 2/1/13 15 Proportion Estimation Summary! 1 L •" Proportion estimator:! pNpˆˆ st= ∑ i i N i=1 •" Variance of y st :! ⎛ 1 L ⎞ Var pˆ Var N pˆ ( st ) = ⎜ ∑ i i ⎟ ⎝ N i=1 ⎠ 1 L = N 2Var pˆ 2 ∑ i ( i ) N i=1 1 L ⎛ n ⎞ ⎛ pˆ (1− pˆ )⎞ = N 2 1 i i i 2 ∑ i ⎜ − ⎟ ⎜ ⎟ N i=1 ⎝ N ⎠ ⎝ n ⎠ ! i i 2 Var pˆ •" Bound on the error of estimation:! ( st ) 2/1/13 16 Sample Size Determination & Allocation! •" Calculations more complicated as need to ! –" Choose overall sample size n, and ! … –" Allocate sample to strata, n1 + n2 + + nL = n ! •" “Best” allocation depends on! –" Purpose of the stratification! –" Number of sampling units in each stratum! –" Variability of sampling units within each stratum! –" Cost of surveying each sampling unit from each stratum! 2/1/13 17 Sample Size Calculation for Estimating the Mean! B 2 Var y •" Begin as with SRS, setting! = ( st ) •" Now, define ai as the proportion of the sample size to allocate to strata i, ni= n x ai ! •" Then,! L N 2 2 a ∑ i σ i i i=1 n = L N 2 B2 4 N 2 2 + ∑ i σ i i 1 = 2/1/13 18 Proportionate Allocation to Strata! •" Sample size within each strata is proportional to strata size in population! •" If N is population size and n is total sample size, then nn ii // = NN where! –" Ni is the population size of stratum i –" ni is the sample size for stratum i ! •" Then! L L L N 2 2 a N 2 2 N N N 2 ∑ i σ i i ∑( i σ i ) ( i ) ∑ iσ i n = i=1 = i=1 = i=1 L L 1 L N 2 B2 4 + N 2σ 2 N 2 B2 4 + N 2σ 2 NB2 4 + N σ 2 ∑ i i ∑ i i N ∑ i i i=1 i=1 i=1 2/1/13 19 Example! y s2 Ni Ni/N ni ni/Ni i i •" Note y =++ (6 5 8) / 3 = 6.3 (simple average)! •" However! yst =(0.4× 6)+ (0.5× 5)+ (0.1× 8)= 5.7 2/1/13 20 Source: Survey Methodology, 1st ed., Groves, et al, 2004. Other Allocation Schemes! •" Rather than allocating sample proportional to strata size, can! –" Allocate according to variability of the strata! •" Idea is to allocate more of the sample to strata that are more variable! •" Done right, can provide most precise population estimates! –" Allocate according to the cost of collection per strata! •" Idea is to allocate more of the sample to strata that cost less! 2/1/13 21 Poststratification! •" Poststratification used when strata variable(s) not observed until after survey completed! –" That is, stratify on data collected in the survey! •" Useful when sample demographics turn out not to match the population demographics! –" As with stratification, still need to know the distribution of the strata variable in the population! –" But don’t need to know the values for each element in the sampling frame! •" See text for more detail ! 2/1/13 22 What We Have Covered! •" Defined stratified sampling! •" Discussed reasons for stratified sampling! •" Derived estimators for the population! –" Mean! –" Total! –" Percentage! •" Sample size calculation and allocation! –" Sampling proportional to strata size! •" Discussed design effects! 2/1/13 23 .