Chapter 10: Two Stage Sampling (Subsampling)

In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency in cluster sampling depends on the size of the cluster: as the size increases, the efficiency decreases. This suggests that higher precision can be attained by distributing a given number of elements over a large number of clusters, surveying only some elements within each, rather than by taking a small number of clusters and enumerating all elements within them. This is achieved in subsampling.

In subsampling,
- divide the population into clusters,
- select a sample of clusters [first stage],
- from each of the selected clusters, select a sample of the specified number of elements [second stage].

The clusters which form the units of sampling at the first stage are called the first stage units, and the units or groups of units within clusters which form the units of sampling at the second stage are called the second stage units or subunits.

The procedure can be generalized to three or more stages and is then termed multistage sampling.

For example, in a crop survey,
- villages are the first stage units,
- fields within the villages are the second stage units, and
- plots within the fields are the third stage units.

In another example, to obtain a sample of fishes from a commercial fishery,
- first, take a sample of boats, and
- then take a sample of fishes from each selected boat.

Two stage sampling with equal first stage units: Assume that
- the population consists of $NM$ elements,
- the $NM$ elements are grouped into $N$ first stage units of $M$ second stage units each (i.e., $N$ clusters, each cluster of size $M$),

Sampling Theory| Chapter 10 | Two Stage Sampling (Subsampling) | Shalabh, IIT Kanpur Page 1

- a sample of $n$ first stage units is selected (i.e., choose $n$ clusters),
- a sample of $m$ second stage units is selected from each selected first stage unit (i.e., choose $m$ units from each cluster),
- units at each stage are selected by SRSWOR.
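The two stage selection above can be sketched in a few lines of Python. This is a minimal illustration, not part of the notes: the function name and the toy population are mine, and equal cluster sizes with SRSWOR at both stages are assumed.

```python
import random

def two_stage_sample(clusters, n, m, seed=None):
    """SRSWOR at both stages: n of the N clusters, then m of the M
    units within each selected cluster (equal cluster sizes assumed)."""
    rng = random.Random(seed)
    first_stage = rng.sample(clusters, n)           # first stage units
    return [rng.sample(c, m) for c in first_stage]  # second stage units

# Illustrative population: N = 4 clusters, each of size M = 6.
population = [[10, 12, 11, 13, 9, 14],
              [20, 22, 21, 19, 23, 24],
              [30, 29, 31, 32, 28, 33],
              [40, 41, 39, 42, 38, 43]]
sample = two_stage_sample(population, n=2, m=3, seed=1)
```

`random.sample` draws without replacement, which is exactly the SRSWOR requirement at each stage.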

Cluster sampling is a special case of two stage sampling in the sense that, from a population of $N$ clusters of equal size $M$, a sample of $n$ clusters is chosen with $m = M$. If further $M = m = 1$, we get SRSWOR. If $n = N$, we have the case of stratified sampling.

$y_{ij}$: value of the characteristic under study for the $j$th second stage unit of the $i$th first stage unit; $i = 1,2,\ldots,N$; $j = 1,2,\ldots,M$.

$\bar{Y}_i = \frac{1}{M}\sum_{j=1}^{M} y_{ij}$: mean per 2nd stage unit of the $i$th 1st stage unit in the population.

$\bar{Y} = \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}_{MN}$: mean per second stage unit in the population.

$\bar{y}_i = \frac{1}{m}\sum_{j=1}^{m} y_{ij}$: mean per second stage unit in the $i$th first stage unit in the sample.

$\bar{y} = \frac{1}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i = \bar{y}_{mn}$: mean per second stage unit in the sample.

Advantages:

The principal advantage of two stage sampling is that it is more flexible than one stage sampling. It reduces to one stage sampling when $m = M$, but unless this is the best choice of $m$, we have the opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When units of the first stage agree very closely, considerations of precision suggest a small value of $m$. On the other hand, it is sometimes almost as cheap to measure the whole of a unit as to measure a sample from it, for example, when the unit is a household and a single respondent can give data as accurate as those from all the members of the household.


A pictorial scheme of two stage sampling scheme is as follows:

[Diagram: the population of $NM$ units consists of $N$ clusters (large in number) of $M$ units each; the first stage sample selects $n$ clusters (small in number); the second stage selects $m$ units from each of the $n$ selected clusters, giving $mn$ units in all.]

Note: The expectations under a two stage sampling scheme depend on the stages. For example, the expectation at the second stage depends on the first stage in the sense that a second stage unit can enter the sample only if its first stage unit was selected at the first stage.


To calculate the average,
- first average the estimator over all the second stage selections that can be drawn from a fixed set of $n$ units that the plan selects,
- then average over all the possible selections of $n$ units by the plan.

In the case of two stage sampling,
$E(\hat{\theta}) = E_1\left[E_2(\hat{\theta})\right]$
where $E_2$ denotes the average over all possible second stage samples from a fixed set of first stage units, and $E_1$ denotes the average over all possible first stage selections.

In the case of three stage sampling,
$E(\hat{\theta}) = E_1\left[E_2\left\{E_3(\hat{\theta})\right\}\right] .$

To calculate the variance, we proceed as follows. In the case of two stage sampling,
$Var(\hat{\theta}) = E(\hat{\theta}-\theta)^2 = E_1E_2(\hat{\theta}-\theta)^2 .$
Consider
$E_2(\hat{\theta}-\theta)^2 = E_2(\hat{\theta}^2) - 2\theta E_2(\hat{\theta}) + \theta^2 = V_2(\hat{\theta}) + \left[E_2(\hat{\theta})\right]^2 - 2\theta E_2(\hat{\theta}) + \theta^2 .$
Now average over the first stage selection:
$E_1E_2(\hat{\theta}-\theta)^2 = E_1\left[V_2(\hat{\theta})\right] + E_1\left[E_2(\hat{\theta})\right]^2 - 2\theta E_1E_2(\hat{\theta}) + \theta^2$
$= E_1\left[V_2(\hat{\theta})\right] + V_1\left[E_2(\hat{\theta})\right] + \left[E_1E_2(\hat{\theta})\right]^2 - 2\theta E(\hat{\theta}) + \theta^2 .$
Since $\hat{\theta}$ is unbiased for $\theta$, the last three terms cancel and
$Var(\hat{\theta}) = V_1\left[E_2(\hat{\theta})\right] + E_1\left[V_2(\hat{\theta})\right] .$

In the case of three stage sampling,
$Var(\hat{\theta}) = V_1\left[E_2E_3(\hat{\theta})\right] + E_1\left[V_2E_3(\hat{\theta})\right] + E_1\left[E_2V_3(\hat{\theta})\right] .$


Estimation of population mean:

Consider y  ymn as an estimator of the population mean Y . Bias: Consider

Ey() E12 E ( ymn ) nd st EEyi12(im )  (as 2 stage is dependent on 1 stage)

EEyi12(im )  (as y i is unbiased for Y i due to SRSWOR)

1 n = EY1  i n i1 1 N  Yi N i1 Y .

Thus ymn is an unbiased estimator of the population mean.
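This unbiasedness can be checked by simulation. The sketch below is illustrative only (a hypothetical toy population; Python's `random` module supplies the SRSWOR draws at both stages): it averages $\bar{y}_{mn}$ over many repeated two stage samples.

```python
import random
import statistics

def ybar_mn(clusters, n, m, rng):
    # One two stage draw: n clusters by SRSWOR, then m units per cluster.
    chosen = rng.sample(clusters, n)
    units = [v for c in chosen for v in rng.sample(c, m)]
    return statistics.mean(units)

# Hypothetical population: N = 5 clusters of M = 4 units each.
pop = [[10 * i + j for j in range(4)] for i in range(5)]
Ybar = statistics.mean(v for c in pop for v in c)   # population mean = 21.5

rng = random.Random(42)
reps = 20000
avg = statistics.mean(ybar_mn(pop, n=3, m=2, rng=rng) for _ in range(reps))
# avg is a Monte Carlo estimate of E(ybar_mn), close to Ybar for this setup.
```

For this population the Monte Carlo average comes out close to $\bar{Y} = 21.5$, consistent with $E(\bar{y}_{mn}) = \bar{Y}$.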

Variance

Var() y E12 V ( y i )  V 12 E ( y /) i  11nn   EV12 yiii VE 1 2  y/ i  nnii11   11nn EVyiVEyi() (/) 1122 ii nnii11 111nn  1 ESVY2 112 ii nmMii11  n 111n  ES()2  Vy () 2 11ic nmMi1 

(where yc is based on cluster as in cluster sampling)

11122Nn 2 nSSwb nmM Nn

11 122 1 1 SSwb  nm M n N

NNM2 2211 where SSwi YY iji  NNM(1) iij111 N 221 SYYbi() N 1 i1
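When the whole population is known, the variance $Var(\bar{y}) = \frac{1}{n}(\frac{1}{m}-\frac{1}{M})S_w^2 + (\frac{1}{n}-\frac{1}{N})S_b^2$ can be evaluated directly. A sketch (the helper name and toy population are illustrative, assuming equal cluster sizes):

```python
import statistics

def var_two_stage(pop, n, m):
    """Var(ybar) = (1/n)(1/m - 1/M) Sw^2 + (1/n - 1/N) Sb^2 for a fully
    known population of N equal-sized clusters (pop: N lists of length M)."""
    N, M = len(pop), len(pop[0])
    Sb2 = statistics.variance([statistics.mean(c) for c in pop])  # divisor N - 1
    Sw2 = statistics.mean(statistics.variance(c) for c in pop)    # (1/N) sum Si^2
    return (1 / n) * (1 / m - 1 / M) * Sw2 + (1 / n - 1 / N) * Sb2

pop = [[10 * i + j for j in range(4)] for i in range(5)]  # N = 5, M = 4
v = var_two_stage(pop, n=3, m=2)
```

`statistics.variance` uses the divisor $N-1$, matching the definitions of $S_b^2$ and $S_i^2$ here.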


Estimate of variance

An unbiased estimator of the variance of $\bar{y}$ can be obtained by replacing $S_b^2$ and $S_w^2$ by their unbiased estimators in the expression of the variance of $\bar{y}$.

Consider
$s_w^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2 , \quad s_i^2 = \frac{1}{m-1}\sum_{j=1}^{m}\left(y_{ij}-\bar{y}_i\right)^2$
as an estimator of
$S_w^2 = \frac{1}{N}\sum_{i=1}^{N} S_i^2 , \quad S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}_i\right)^2 .$
So
$E(s_w^2) = E_1E_2\left[\frac{1}{n}\sum_{i=1}^{n} s_i^2\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n} E_2(s_i^2 \mid i)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n} S_i^2\right]$ (as SRSWOR is used)
$= \frac{1}{N}\sum_{i=1}^{N} S_i^2 = S_w^2 ,$
so $s_w^2$ is an unbiased estimator of $S_w^2$.

Consider
$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i-\bar{y}\right)^2$
as an estimator of
$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2 .$


So
$(n-1)E(s_b^2) = E\left[\sum_{i=1}^{n}\left(\bar{y}_i-\bar{y}\right)^2\right] = E\left[\sum_{i=1}^{n}\bar{y}_i^2\right] - nE(\bar{y}^2)$
$= E_1\left[\sum_{i=1}^{n} E_2(\bar{y}_i^2)\right] - n\left[Var(\bar{y}) + \left\{E(\bar{y})\right\}^2\right]$
$= E_1\left[\sum_{i=1}^{n}\left\{\left(\frac{1}{m}-\frac{1}{M}\right)S_i^2 + \bar{Y}_i^2\right\}\right] - n\left[\frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \bar{Y}^2\right]$
$= n\left(\frac{1}{m}-\frac{1}{M}\right)\frac{1}{N}\sum_{i=1}^{N}S_i^2 + \frac{n}{N}\sum_{i=1}^{N}\bar{Y}_i^2 - \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 - n\left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 - n\bar{Y}^2$
$= (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{n}{N}\left[\sum_{i=1}^{N}\bar{Y}_i^2 - N\bar{Y}^2\right] - \left(1-\frac{n}{N}\right)S_b^2$
$= (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \frac{n(N-1)}{N}S_b^2 - \frac{N-n}{N}S_b^2$
$= (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + (n-1)S_b^2 .$
Thus
$E(s_b^2) = S_b^2 + \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2$
or
$E\left[s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2\right] = S_b^2 .$
Thus

$\widehat{Var}(\bar{y}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)\hat{S}_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)\hat{S}_b^2$
$= \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)\left[s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2\right]$
$= \frac{1}{N}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2 .$
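The final expression translates into a short estimator routine. A sketch, assuming equal cluster sizes; the function name and the numerical sample are illustrative:

```python
import statistics

def estimate_var(sample, N, M):
    """Estimated Var(ybar) = (1/N)(1/m - 1/M) sw^2 + (1/n - 1/N) sb^2
    from a two stage sample (sample: n lists of m values each)."""
    n, m = len(sample), len(sample[0])
    sb2 = statistics.variance([statistics.mean(c) for c in sample])  # divisor n - 1
    sw2 = statistics.mean(statistics.variance(c) for c in sample)    # (1/n) sum si^2
    return (1 / N) * (1 / m - 1 / M) * sw2 + (1 / n - 1 / N) * sb2

# Illustrative sample: n = 3 clusters, m = 2 units, from N = 10, M = 5.
est = estimate_var([[1, 3], [5, 7], [9, 11]], N=10, M=5)
```

Note the leading factor $\frac{1}{N}$ (not $\frac{1}{n}$) on the within-cluster term: it absorbs the bias correction applied to $s_b^2$.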


Allocation of sample to the two stages: Equal first stage units: The variance of the sample mean in the case of two stage sampling is

$Var(\bar{y}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 .$

It depends on $S_b^2$, $S_w^2$, $n$ and $m$, and the cost of the survey depends on $n$ and $m$.

Case 1. When cost is fixed We find the values of n and m so that the variance is minimum for given cost.

(I) When the cost function is $C = kmn$
Let the cost of the survey be proportional to the sample size, $C = knm$, where $C$ is the total cost and $k$ is a constant. When the cost is fixed as $C = C_0$, substituting $m = \frac{C_0}{kn}$ in $Var(\bar{y})$ gives
$Var(\bar{y}) = \frac{1}{n}\left(S_b^2 - \frac{S_w^2}{M}\right) + \frac{kS_w^2}{C_0} - \frac{S_b^2}{N} .$

This variance is a monotone decreasing function of $n$ if $S_b^2 - \frac{S_w^2}{M} > 0$. The variance is then minimum when $n$ assumes its maximum value, i.e., $\hat{n} = \frac{C_0}{k}$, corresponding to $\hat{m} = 1$.

If $S_b^2 - \frac{S_w^2}{M} < 0$ (i.e., the intraclass correlation is negative for large $N$), then the variance is a monotone increasing function of $n$. It reaches its minimum when $n$ assumes its minimum value, i.e., $\hat{n} = \frac{C_0}{kM}$ (no subsampling, $\hat{m} = M$).


(II) When cost function is Cknkmn12

Let cost CCknkmnbe fixed as 01  2 where kk12and are positive constants. The terms kk12and denote the costs of per unit observations in first and second stages respectively. Minimize the variance of sample mean under the two stage with respect to m subject to the restriction Cknkmn01 2.

We have

22 22 SSbw222 SkS ww1 CVary0122 ( ) kSbwb  kS  mkS   . NM Mm 2 2 Sw WhenSb  0, then M 2 2 22  22  SSbw222  SkS ww1 CVary0122 ( )  kSbwb   kS  mkS     NM Mm     which is minimum when the second term of right-hand side is zero. So we obtain

k S 2 mˆ  1 w . 2 k2 2 Sw Sb  M

The optimum n follows from Cknkmn01 2 as C nˆ  0 . kkm12 ˆ
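For the case $S_b^2 - S_w^2/M > 0$, the optimum $\hat{m}$ and $\hat{n}$ can be computed directly. A sketch with illustrative parameter values (the function name is mine):

```python
import math

def optimal_allocation(Sw2, Sb2, M, k1, k2, C0):
    """m_hat = sqrt(k1 Sw^2 / (k2 (Sb^2 - Sw^2/M))),
    n_hat = C0 / (k1 + k2 m_hat); valid when Sb^2 - Sw^2/M > 0."""
    A = Sb2 - Sw2 / M
    m_hat = math.sqrt(k1 * Sw2 / (k2 * A))
    n_hat = C0 / (k1 + k2 * m_hat)
    return m_hat, n_hat

# Illustrative values: Sw2 = 8, Sb2 = 3, M = 8 (so A = 2), k1 = 4, k2 = 2.
m_hat, n_hat = optimal_allocation(8.0, 3.0, 8, 4.0, 2.0, C0=100.0)
```

In practice $\hat{m}$ and $\hat{n}$ would be rounded to integers while respecting the cost constraint.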

2 2 Sw When Sb 0then M

22 22 SSbw 222  SkS ww1 CVary0122() kSbwb  kS  mkS   NM   Mm is minimum if m is the greatest attainable integer. Hence in this case, when

C0 CkkMmM012;and.ˆˆ  n  kkM12

Ck01 If CkkM012 ; then mnˆˆand 1. k2

If $N$ is large, then $S_w^2 \approx S^2(1-\rho)$ and
$S_b^2 - \frac{S_w^2}{M} \approx \rho S^2 ,$
so
$\hat{m} = \sqrt{\frac{k_1}{k_2}\cdot\frac{1-\rho}{\rho}} .$


Case 2: When variance is fixed

Now we find the sample sizes when variance is fixed, say as V0.

$V_0 = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2$
$\Rightarrow n = \frac{S_b^2 + \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2}{V_0 + \frac{S_b^2}{N}} .$
So the cost $C = kmn$ becomes
$C = k\,\frac{m\left(S_b^2 - \frac{S_w^2}{M}\right) + S_w^2}{V_0 + \frac{S_b^2}{N}} .$

If $S_b^2 - \frac{S_w^2}{M} > 0$, $C$ attains its minimum when $m$ assumes the smallest integral value, i.e., $\hat{m} = 1$. If $S_b^2 - \frac{S_w^2}{M} < 0$, $C$ attains its minimum when $\hat{m} = M$.

Comparison of two stage sampling with one stage sampling
One stage sampling procedures are comparable with two stage sampling procedures when either
(i) $mn$ elements are sampled in one single stage, or
(ii) $\frac{mn}{M}$ first stage units are sampled as clusters without subsampling.
We consider both cases.

Case 1: Sampling $mn$ elements in one single stage
The variance of the sample mean based on
- $mn$ elements selected by SRSWOR (one stage) is given by
$V(\bar{y}_{SRS}) = \left(\frac{1}{mn}-\frac{1}{MN}\right)S^2 ;$
- two stage sampling is given by
$V(\bar{y}_{TS}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 .$


The intraclass correlation coefficient is
$\rho = \frac{\displaystyle\sum_{i=1}^{N}\sum_{j \ne k}^{M}\left(y_{ij}-\bar{Y}\right)\left(y_{ik}-\bar{Y}\right)}{(M-1)(MN-1)S^2} \;, \quad -\frac{1}{M-1} \le \rho \le 1 \qquad (1)$
and using the identity
$\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}\right)^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}_i\right)^2 + M\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2 ,$
$(NM-1)S^2 = N(M-1)S_w^2 + (N-1)M S_b^2 \qquad (2)$
where $\bar{Y} = \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}$ and $\bar{Y}_i = \frac{1}{M}\sum_{j=1}^{M} y_{ij}$.

Now we need to find $S_b^2$ and $S_w^2$ from (1) and (2) in terms of $S^2$. Since
$\sum_{i=1}^{N}\sum_{j \ne k}^{M}\left(y_{ij}-\bar{Y}\right)\left(y_{ik}-\bar{Y}\right) = M^2\sum_{i=1}^{N}\left(\bar{Y}_i-\bar{Y}\right)^2 - \sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_{ij}-\bar{Y}\right)^2 = M^2(N-1)S_b^2 - (NM-1)S^2 ,$
equation (1) gives
$\rho(M-1)(MN-1)S^2 = M^2(N-1)S_b^2 - (NM-1)S^2 \qquad (3)$
so that
$S_b^2 = \frac{(MN-1)S^2}{M^2(N-1)}\left[1+(M-1)\rho\right] .$
Substituting this in (2) gives
$N(M-1)S_w^2 = (NM-1)S^2 - \frac{(MN-1)S^2}{M}\left[1+(M-1)\rho\right] = \frac{(MN-1)S^2(M-1)(1-\rho)}{M} ,$
so
$S_w^2 = \frac{MN-1}{MN}\,S^2(1-\rho) .$

Substituting $S_b^2$ and $S_w^2$ in $Var(\bar{y}_{TS})$:
$V(\bar{y}_{TS}) = \frac{(MN-1)S^2}{MN\,mn}\left[\frac{m(N-n)}{M(N-1)}\left\{1+(M-1)\rho\right\} + \left(1-\frac{m}{M}\right)(1-\rho)\right] .$


When the subsampling rate $\frac{m}{M}$ is small, $MN-1 \approx MN$ and $M-1 \approx M$, so
$V(\bar{y}_{SRS}) \approx \frac{S^2}{mn}$
$V(\bar{y}_{TS}) \approx \frac{S^2}{mn}\left[1 + \left(\frac{N-n}{N-1}\,m - 1\right)\rho\right] .$
The relative efficiency of two stage sampling in relation to one stage sampling by SRSWOR is
$RE = \frac{Var(\bar{y}_{TS})}{Var(\bar{y}_{SRS})} = 1 + \left(\frac{N-n}{N-1}\,m - 1\right)\rho .$
If $N$ is large and the finite population correction is ignorable, then $\frac{N-n}{N-1} \approx 1$, and
$RE \approx 1 + (m-1)\rho .$

Case 2: Comparison with cluster sampling
Suppose a random sample of $\frac{mn}{M}$ clusters, without further subsampling, is selected. The variance of the sample mean of the equivalent $\frac{mn}{M}$ clusters is
$Var(\bar{y}_{cl}) = \left(\frac{M}{mn}-\frac{1}{N}\right)S_b^2 .$
The variance of the sample mean under two stage sampling is
$Var(\bar{y}_{TS}) = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 .$
So $Var(\bar{y}_{cl})$ exceeds $Var(\bar{y}_{TS})$ by
$\frac{1}{n}\left(\frac{M}{m}-1\right)\left(S_b^2 - \frac{S_w^2}{M}\right) ,$
which, for large $N$ and $S_b^2 - \frac{S_w^2}{M} > 0$, is approximately
$\frac{1}{n}\left(\frac{M}{m}-1\right)\rho S^2 ,$
where $S_b^2 = \frac{(MN-1)S^2}{M^2(N-1)}\left[1+(M-1)\rho\right]$ and $S_w^2 = \frac{MN-1}{MN}\,S^2(1-\rho)$.
So the smaller the $\frac{m}{M}$, the larger the reduction in the variance of a two stage sample over a cluster sample.

2 2 Sw When Sb 0 then the subsampling will lead to loss in precision. M


Two stage sampling with unequal first stage units: Consider two stage sampling when the first stage units are of unequal size and SRSWOR is employed at each stage. Let

$y_{ij}$: value of the $j$th second stage unit of the $i$th first stage unit.

$M_i$: number of second stage units in the $i$th first stage unit $(i = 1,2,\ldots,N)$.

$M_0 = \sum_{i=1}^{N} M_i$: total number of second stage units in the population.

$m_i$: number of second stage units to be selected from the $i$th first stage unit, if it is in the sample.

$m_0 = \sum_{i=1}^{n} m_i$: total number of second stage units in the sample.

$\bar{y}_{i(m_i)} = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij} , \qquad \bar{Y}_i = \frac{1}{M_i}\sum_{j=1}^{M_i} y_{ij} , \qquad \bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i$

$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} M_i\bar{Y}_i}{N\bar{M}} = \frac{1}{N}\sum_{i=1}^{N} u_i\bar{Y}_i$

$u_i = \frac{M_i}{\bar{M}} , \qquad \bar{M} = \frac{1}{N}\sum_{i=1}^{N} M_i .$


The pictorial scheme of two stage sampling with unequal first stage units is as follows:

[Diagram: the population of $M_0$ units consists of $N$ clusters of sizes $M_1, M_2, \ldots, M_N$; the first stage sample selects $n$ of the $N$ clusters (small in number); the second stage selects $m_i$ units from the $i$th selected cluster, $i = 1,2,\ldots,n$.]


Now we consider different estimators for the estimation of the population mean. 1. Estimator based on the first stage unit means in the sample:

ˆ 1 n YySimi2() y n i1 Bias:

1 n Ey()Simi2() E y n i1 1 n  EEy12() ()imi n i1 1 n  EY1  iii[Since a sample of size m is selected out of M units by SRSWOR] n i1 1 N  Yi N i1

 Y N  Y .

So $\bar{y}_{S2}$ is a biased estimator of $\bar{Y}$, and its bias is given by

Bias() ySS22 E () y Y 11NN YMYiii NNMii11 11NNN MYii   Y i M i NMiii111 N  1 N ()().MMYYiiN NM i1 This bias can be estimated by

n  N 1 Bias() ySiimiS2()2 ( M  m )( y  y ) NM(1) n  i1 which can be seen as follows: N 11n EBiasy() E E ( M  my )(  y )/ n SiimiS212()2  NM n 1 i1 N 11n EMmYy (iin  )(  ) NM n 1 i1 1 N  (MMYYiiN  )(  ) NM i1

YYN 1 n where yni Y . n i1


An unbiased estimator of the population mean $\bar{Y}$ is thus obtained as
$\bar{y}_{S2} + \frac{N-1}{N\bar{M}(n-1)}\sum_{i=1}^{n}\left(M_i-\bar{M}\right)\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right) .$
Note that the bias arises due to the inequality of the sizes of the first stage units: the probability of selection of second stage units varies from one first stage unit to another.
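The bias-corrected estimator can be coded directly. A sketch; the function name and the numbers are illustrative, and the sample is assumed to come from SRSWOR at both stages:

```python
import statistics

def ybar_s2_corrected(sample, sizes, N, Mbar):
    """ybar_S2 = (1/n) sum ybar_i, plus the estimated-bias correction
    for unequal cluster sizes.  sample: n lists of sampled values;
    sizes: the matching cluster sizes Mi; N, Mbar: population quantities."""
    n = len(sample)
    means = [statistics.mean(c) for c in sample]
    ybar_s2 = statistics.mean(means)
    corr = (N - 1) / (N * Mbar * (n - 1)) * sum(
        (Mi - Mbar) * (yi - ybar_s2) for Mi, yi in zip(sizes, means))
    return ybar_s2 + corr

# Illustrative sample of n = 2 clusters from N = 6, with mean size Mbar = 5.
est = ybar_s2_corrected([[2, 4], [10, 14]], sizes=[4, 8], N=6, Mbar=5)
```

Here the larger cluster also has the larger mean, so the correction pushes the estimate upward.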

Variance:

Var() ySS22 Var E () y n  E Var () y S 2 n  11nn  Var y E Var() y i iimi2 ()  nnii11  11 1n  1 1 SE22   S bi2  nN ni1  mii M N 1122 1 1 1 SSbi   nN Nnmi1 ii M

1 N 2 where SYY2  biN N 1 i1 2 Mi 2 1 SyYiiji. M i 1 j1 The MSE can be obtained as

2 MSE() ySS22 Var () y Bias (). y S 2

Estimation of variance: Consider mean square between cluster means in the sample

$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_{i(m_i)}-\bar{y}_{S2}\right)^2 .$
It can be shown that
$E(s_b^2) = S_b^2 + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2 .$
Also, with
$s_i^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(y_{ij}-\bar{y}_{i(m_i)}\right)^2 ,$
$E(s_i^2 \mid i) = S_i^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2 ,$
so
$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2 .$


Thus

n 22111 2 Es()bb S E  s i nmMi1 ii 2 and an unbiased estimator of Sb is

n ˆ 22111 2 Ssbb  s i. nmMi1 ii 22 So an estimator of the variance can be obtained by replacing SSbiand by their unbiased estimators as

N  11ˆˆ22 1 1 1 Var() ySb2  S   S i. nN Nnmi1 ii M

2. Estimator based on first stage unit totals:
$\hat{\bar{Y}}_{S2}^{*} = \bar{y}_{S2}^{*} = \frac{1}{n\bar{M}}\sum_{i=1}^{n} M_i\bar{y}_{i(m_i)} = \frac{1}{n}\sum_{i=1}^{n} u_i\bar{y}_{i(m_i)} , \quad u_i = \frac{M_i}{\bar{M}} .$
Bias:
$E(\bar{y}_{S2}^{*}) = E_1\left[\frac{1}{n}\sum_{i=1}^{n} u_i E_2\left(\bar{y}_{i(m_i)} \mid i\right)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n} u_i\bar{Y}_i\right] = \frac{1}{N}\sum_{i=1}^{N} u_i\bar{Y}_i = \bar{Y} .$
Thus $\bar{y}_{S2}^{*}$ is an unbiased estimator of $\bar{Y}$.
Variance:
$Var(\bar{y}_{S2}^{*}) = V_1\left[E_2(\bar{y}_{S2}^{*} \mid n)\right] + E_1\left[V_2(\bar{y}_{S2}^{*} \mid n)\right]$
$= V_1\left[\frac{1}{n}\sum_{i=1}^{n} u_i\bar{Y}_i\right] + E_1\left[\frac{1}{n^2}\sum_{i=1}^{n} u_i^2 V_2\left(\bar{y}_{i(m_i)}\right)\right]$
$= \left(\frac{1}{n}-\frac{1}{N}\right)S_b^{*2} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2$
where
$S_i^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2$
$S_b^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{Y}_i-\bar{Y}\right)^2 .$
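A sketch of the unbiased estimator $\bar{y}_{S2}^{*}$; the function name and data are illustrative:

```python
import statistics

def ybar_s2_star(sample, sizes, Mbar):
    """Unbiased estimator (1/n) sum u_i * ybar_i with u_i = Mi / Mbar."""
    n = len(sample)
    return sum(Mi / Mbar * statistics.mean(c)
               for Mi, c in zip(sizes, sample)) / n

est = ybar_s2_star([[2, 4], [10, 14]], sizes=[4, 8], Mbar=5)
```

Unlike $\bar{y}_{S2}$, this estimator weights each cluster mean by its relative size $u_i$, which is what removes the bias.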


3. Estimator based on the ratio method:
$\hat{\bar{Y}}_{S2}^{**} = \bar{y}_{S2}^{**} = \frac{\sum_{i=1}^{n} M_i\bar{y}_{i(m_i)}}{\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} u_i\bar{y}_{i(m_i)}}{\sum_{i=1}^{n} u_i} = \frac{\bar{y}_{S2}^{*}}{\bar{u}_n}$
where $\bar{u}_n = \frac{1}{n}\sum_{i=1}^{n} u_i = \frac{1}{n\bar{M}}\sum_{i=1}^{n} M_i$.
This estimator can be seen as arising by the ratio method of estimation as follows. Let
$y_i^{*} = u_i\bar{y}_{i(m_i)} , \quad x_i^{*} = u_i = \frac{M_i}{\bar{M}} , \quad i = 1,2,\ldots,N$
be the values of the study variable and auxiliary variable in reference to the ratio method of estimation. Then
$\bar{y}^{*} = \frac{1}{n}\sum_{i=1}^{n} y_i^{*} = \bar{y}_{S2}^{*}$
$\bar{x}^{*} = \frac{1}{n}\sum_{i=1}^{n} x_i^{*} = \bar{u}_n$
$\bar{X}^{*} = \frac{1}{N}\sum_{i=1}^{N} x_i^{*} = 1 .$
The corresponding ratio estimator of $\bar{Y}$ is
$\hat{\bar{Y}}_R^{*} = \frac{\bar{y}^{*}}{\bar{x}^{*}}\,\bar{X}^{*} = \frac{\bar{y}_{S2}^{*}}{\bar{u}_n} = \bar{y}_{S2}^{**} .$

So the bias and mean squared error of $\bar{y}_{S2}^{**}$ can be obtained directly from the results for the ratio estimator. Recall that in the ratio method of estimation, the bias and MSE of the ratio estimator $\hat{\bar{Y}}_R$ up to second order of approximation are
$Bias(\hat{\bar{Y}}_R) = \frac{N-n}{Nn}\,\bar{Y}\left(C_x^2 - \rho C_x C_y\right) = \bar{Y}\left[\frac{Var(\bar{x})}{\bar{X}^2} - \frac{Cov(\bar{x},\bar{y})}{\bar{X}\bar{Y}}\right]$
$MSE(\hat{\bar{Y}}_R) = Var(\bar{y}) + R^2\,Var(\bar{x}) - 2R\,Cov(\bar{x},\bar{y}) , \quad R = \frac{\bar{Y}}{\bar{X}} .$

Bias:
The bias of $\bar{y}_{S2}^{**}$ up to second order of approximation is
$Bias(\bar{y}_{S2}^{**}) = \bar{Y}\left[\frac{Var(\bar{x}_{S2}^{*})}{\bar{X}^{*2}} - \frac{Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})}{\bar{X}^{*}\bar{Y}}\right]$
where $\bar{x}_{S2}^{*} = \frac{1}{n}\sum_{i=1}^{n} u_i\bar{x}_{i(m_i)}$ is the mean of the auxiliary variable defined analogously to $\bar{y}_{S2}^{*}$.

Now we find $Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})$:
$Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*}) = Cov\left[E_2\left(\frac{1}{n}\sum_{i=1}^{n} u_i\bar{x}_{i(m_i)}\right), E_2\left(\frac{1}{n}\sum_{i=1}^{n} u_i\bar{y}_{i(m_i)}\right)\right] + E\left[Cov_2\left(\frac{1}{n}\sum_{i=1}^{n} u_i\bar{x}_{i(m_i)}, \frac{1}{n}\sum_{i=1}^{n} u_i\bar{y}_{i(m_i)}\right)\right]$
$= Cov\left[\frac{1}{n}\sum_{i=1}^{n} u_i\bar{X}_i, \frac{1}{n}\sum_{i=1}^{n} u_i\bar{Y}_i\right] + E\left[\frac{1}{n^2}\sum_{i=1}^{n} u_i^2\,Cov_2\left(\bar{x}_{i(m_i)},\bar{y}_{i(m_i)}\right)\right]$
$= \left(\frac{1}{n}-\frac{1}{N}\right)S_{bxy}^{*} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}$
where
$S_{bxy}^{*} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{X}_i-\bar{X}\right)\left(u_i\bar{Y}_i-\bar{Y}\right)$
$S_{ixy} = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(x_{ij}-\bar{X}_i\right)\left(y_{ij}-\bar{Y}_i\right) .$

Similarly, $Var(\bar{x}_{S2}^{*})$ can be obtained by replacing $x$ in place of $y$ in $Var(\bar{y}_{S2}^{*})$:
$Var(\bar{x}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{bx}^{*2} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2$
where
$S_{bx}^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{X}_i-\bar{X}\right)^2$
$S_{ix}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(x_{ij}-\bar{X}_i\right)^2 .$

Substituting $Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*})$ and $Var(\bar{x}_{S2}^{*})$ in $Bias(\bar{y}_{S2}^{**})$, we obtain the approximate bias as
$Bias(\bar{y}_{S2}^{**}) = \bar{Y}\left[\left(\frac{1}{n}-\frac{1}{N}\right)\left(\frac{S_{bx}^{*2}}{\bar{X}^2} - \frac{S_{bxy}^{*}}{\bar{X}\bar{Y}}\right) + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left(\frac{S_{ix}^2}{\bar{X}^2} - \frac{S_{ixy}}{\bar{X}\bar{Y}}\right)\right] .$


Mean squared error

$MSE(\bar{y}_{S2}^{**}) = Var(\bar{y}_{S2}^{*}) - 2R^{*}\,Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*}) + R^{*2}\,Var(\bar{x}_{S2}^{*})$
where
$Var(\bar{y}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{by}^{*2} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{iy}^2$
$Var(\bar{x}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{bx}^{*2} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2$
$Cov(\bar{x}_{S2}^{*},\bar{y}_{S2}^{*}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_{bxy}^{*} + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}$
with
$S_{by}^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(u_i\bar{Y}_i-\bar{Y}\right)^2$
$S_{iy}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(y_{ij}-\bar{Y}_i\right)^2$
$R^{*} = \frac{\bar{Y}}{\bar{X}} .$
Thus
$MSE(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left[S_{by}^{*2} - 2R^{*}S_{bxy}^{*} + R^{*2}S_{bx}^{*2}\right] + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left[S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right] .$
Also
$MSE(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N} u_i^2\left(\bar{Y}_i - R^{*}\bar{X}_i\right)^2 + \frac{1}{nN}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left[S_{iy}^2 - 2R^{*}S_{ixy} + R^{*2}S_{ix}^2\right] .$

Estimate of variance
Consider
$s_{bxy}^{*} = \frac{1}{n-1}\sum_{i=1}^{n}\left(u_i\bar{y}_{i(m_i)}-\bar{y}_{S2}^{*}\right)\left(u_i\bar{x}_{i(m_i)}-\bar{x}_{S2}^{*}\right)$
$s_{ixy} = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(x_{ij}-\bar{x}_{i(m_i)}\right)\left(y_{ij}-\bar{y}_{i(m_i)}\right) .$
It can be shown that
$E(s_{bxy}^{*}) = S_{bxy}^{*} + \frac{1}{N}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy}$
$E(s_{ixy} \mid i) = S_{ixy} .$
So
$E\left[\frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy}\right] = \frac{1}{N}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ixy} .$


Thus
$\hat{S}_{bxy}^{*} = s_{bxy}^{*} - \frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ixy}$
$\hat{S}_{bx}^{*2} = s_{bx}^{*2} - \frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2$
$\hat{S}_{by}^{*2} = s_{by}^{*2} - \frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2 .$
Also
$E\left[\frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{ix}^2\right] = \frac{1}{N}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{ix}^2$
$E\left[\frac{1}{n}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_{iy}^2\right] = \frac{1}{N}\sum_{i=1}^{N} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_{iy}^2 .$
A consistent estimator of the MSE of $\bar{y}_{S2}^{**}$ can be obtained by substituting the unbiased estimators of the respective statistics in $MSE(\bar{y}_{S2}^{**})$ as
$\widehat{MSE}(\bar{y}_{S2}^{**}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left[s_{by}^{*2} - 2r^{*}s_{bxy}^{*} + r^{*2}s_{bx}^{*2}\right] + \frac{1}{nN}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left[s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right]$
$\approx \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{n-1}\sum_{i=1}^{n}\left(u_i\bar{y}_{i(m_i)} - r^{*}u_i\bar{x}_{i(m_i)}\right)^2 + \frac{1}{nN}\sum_{i=1}^{n} u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)\left[s_{iy}^2 - 2r^{*}s_{ixy} + r^{*2}s_{ix}^2\right]$
where $r^{*} = \dfrac{\bar{y}_{S2}^{*}}{\bar{x}_{S2}^{*}}$.
