<<

Stratified

Professor Ron Fricker Naval Postgraduate School Monterey, California

Reading Assignment: Scheaffer, Mendenhall, Ott, & Gerow 2/1/13 Chapter 5 1 Goals for this Lecture

• Define stratified sampling • Discuss reasons for stratified sampling • Derive estimators for the population – – Total – Percentage • size calculation and allocation – Sampling proportional to strata size • Discuss design effects

2/1/13 2 Stratified Sampling

• Stratified sampling divides the sampling frame up into strata from which separate probability samples are drawn • Examples – GWI , needed to obtain information from members of each military service – For external validity, WMD survey had to sample large urban areas – In a particular country, survey in support of IW requires information on both Muslim and Christian communities

2/1/13 3 Reasons for Stratified Sampling

• More precision – Assuming strata are relatively homogeneous, can reduce the in the sample (s) – So, get better information for same sample size • Estimates (of a particular precision) needed for subgroups – May be necessary to meet survey’s objectives – May have to oversample to get sufficient observations from small strata • Cost – May be able to reduce cost per observation if population stratified into relevant groupings 2/1/13 4 An Example

2/1/13 5 Source: , 1st ed., Groves, et al, 2004. Strata

• When selecting a stratified random sample, must clearly specify the strata – Non-overlapping categories into which each sampling unit must be classified – Sampling units can only be in one strata – Strata based on information about whole population • Can have more than one type of classification – E.g., officer/enlisted and male/female – Then each sampling unit (person in this case) must be classified into one of four stratum

2/1/13 6 Some Notation

• Additional notation for stratified sampling • L is the number of strata

• Ni is the number of sampling units in stratum i

• ni is the sample size in stratum i • N is the total number of sampling units in the … population: N = N1 + N2 + + NL • In this lecture, we use SRS within each strata • That does not necessarily mean each strata has the same selection probabilities • And often more complex sampling done within the strata

2/1/13 7 Mean Estimation

• Within a strata, probability a unit is sampled is

ni/Ni • Via the SRS lecture, we know that an unbiased estimate of the mean of strata i is n n n n 1 i 1 1 i 1 1 i N 1 i y = y = i y = y = y N ∑ ij N ∑ n / N ij N ∑ n ij n ∑ ij i i j=1 π i i j=1 i i i j=1 i i j=1

• So the estimated total for strata i isτˆiii= Ny • And the population mean estimate is just the sum of all the estimated totals divided by the τˆ 11LL population size yNyˆ st==∑∑τ i = i i NNii==11 N 2/1/13 8 Mean Estimation Summary

1 L • Estimator for the mean: yNy st= ∑ i i N i=1 • Variance of y st : ⎛ 1 L ⎞ Var y Var N y ( st ) = ⎜ ∑ i i ⎟ ⎝ N i=1 ⎠ 1 L = N 2Var y 2 ∑ i ( i ) N i=1 1 L ⎛ n ⎞ ⎛ s2 ⎞ = N 2 1 i i 2 ∑ i ⎜ − ⎟ ⎜ ⎟ N i=1 ⎝ N ⎠ ⎝ n ⎠ i i • Bound on the error of estimation: 2 Var y ( st ) 2/1/13 9 Example: TV Viewing Time (1)

• Survey to estimate average TV viewing time • Population consists of two urban areas and rural residents • Reasons to stratify: – Similarity of viewing habits by towns and rural regions – Cost: perhaps cheaper to survey each region separately (?) • Sample allocated proportional to strata size

2/1/13 10 Source: Elementary , 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (2)

2/1/13 11 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (3)

2/1/13 12 Source: Elementary Survey Sampling, 7th ed., Scheaffer, Mendenhall, Ott, and Gerow, 2012. Example: TV Viewing Time (4)

• Correct analysis: y 27.7, ˆ 1.4, 95% CI = (24.9, 30.5) st==σ yst

• What if you just treated this as a SRS? – Incorrect analysis with fpc:

ySRS==27.7, σˆ y 10.6, 95% CI = (6.5, 48.9)

– Incorrect analysis without fpc:

ySRS′ ==27.7, σˆ y 11.3, 95% CI = (5.1, 50.3)

2/1/13 13

• The design effect (aka deff) compares how a complex sampling design, in this case stratified sampling, compares to SRS 1 L N 2 1− n / N s2 / n Var Y N 2 ∑ i ( i i ) i i d 2 = ( st ) = i=1 Var Y 1− n / N s2 / n ( SRS ) ( ) • Design effect can be greater or less than 1 – But with reasonably homogeneous strata, almost always get decrease in variance – For this example, Var Y 1.97 d 2 = ( st ) = = 0.02 Var Y 112.36 ( SRS ) 2/1/13 14 Total Estimation Summary

L ˆ Ny N y • Estimator for the total:τ st== st∑ i i i=1 • Variance of τ ˆ st : ⎛ L ⎞ Var ˆ Var N y (τ st ) = ⎜ ∑ i i ⎟ ⎝ i=1 ⎠ L = N 2Var y ∑ i ( i ) i=1 L ⎛ n ⎞ ⎛ s2 ⎞ = N 2 1 i i ∑ i ⎜ − ⎟ ⎜ ⎟ i=1 ⎝ N ⎠ ⎝ n ⎠ i i • Bound on the error of estimation: 2 Var τˆ ( st ) 2/1/13 15 Proportion Estimation Summary

1 L • Proportion estimator: pNpˆˆ st= ∑ i i N i=1 • Variance of y st : ⎛ 1 L ⎞ Var pˆ Var N pˆ ( st ) = ⎜ ∑ i i ⎟ ⎝ N i=1 ⎠ 1 L = N 2Var pˆ 2 ∑ i ( i ) N i=1 1 L ⎛ n ⎞ ⎛ pˆ (1− pˆ )⎞ = N 2 1 i i i 2 ∑ i ⎜ − ⎟ ⎜ ⎟ N i=1 ⎝ N ⎠ ⎝ n ⎠ i i • Bound on the error of estimation:2 Var pˆ ( st ) 2/1/13 16 Sample Size Determination & Allocation

• Calculations more complicated as need to – Choose overall sample size n, and … – Allocate sample to strata, n1 + n2 + + nL = n • “Best” allocation depends on – Purpose of the stratification – Number of sampling units in each stratum – Variability of sampling units within each stratum – Cost of surveying each sampling unit from each stratum

2/1/13 17 Sample Size Calculation for Estimating the Mean

• Begin as with SRS, setting B = 2 Var y ( st ) • Now, define ai as the proportion of the sample size to allocate to strata i, ni= n x ai

• Then, L N 2 2 a ∑ i σ i i i=1 n = L N 2 B2 4 N 2 2 + ∑ i σ i i=1

2/1/13 18 Proportionate Allocation to Strata

• Sample size within each strata is proportional to strata size in population • If N is population size and n is total sample

size, then nn ii // = NN where – Ni is the population size of stratum i

– ni is the sample size for stratum i • Then L L L N 2 2 a N 2 2 N N N 2 ∑ i σ i i ∑( i σ i ) ( i ) ∑ iσ i n = i=1 = i=1 = i=1 L L 1 L N 2 B2 4 + N 2σ 2 N 2 B2 4 + N 2σ 2 NB2 4 + N σ 2 ∑ i i ∑ i i N ∑ i i i=1 i=1 i=1

2/1/13 19 Example

y s2 Ni Ni/N ni ni/Ni i i

• Note y =++ (6 5 8) / 3 = 6.3 (simple average)

• However yst =(0.4× 6)+ (0.5× 5)+ (0.1× 8)= 5.7

2/1/13 20 Source: Survey Methodology, 1st ed., Groves, et al, 2004. Other Allocation Schemes

• Rather than allocating sample proportional to strata size, can – Allocate according to variability of the strata • Idea is to allocate more of the sample to strata that are more variable • Done right, can provide most precise population estimates – Allocate according to the cost of collection per strata • Idea is to allocate more of the sample to strata that cost less

2/1/13 21 Poststratification

• Poststratification used when strata variable(s) not observed until after survey completed – That is, stratify on collected in the survey • Useful when sample demographics turn out not to match the population demographics – As with stratification, still need to know the distribution of the strata variable in the population – But don’t need to know the values for each element in the sampling frame • See text for more detail

2/1/13 22 What We Have Covered

• Defined stratified sampling • Discussed reasons for stratified sampling • Derived estimators for the population – Mean – Total – Percentage • Sample size calculation and allocation – Sampling proportional to strata size • Discussed design effects

2/1/13 23