<<

Theory MODULE X

LECTURE - 33 TWO STAGE SAMPLING (SUB SAMPLING)

DR. SHALABH DEPARTMENT OF MATHEMATICS AND INDIAN INSTITUTE OF TECHNOLOGY KANPUR 1 In , all the elements in the selected clusters are surveyed. Moreover, the in cluster sampling depends on size of the cluster. As the size increases, the efficiency decreases. It suggests that hig her preciiision can beattai ned by dis tr ibu tingagiven number of eltlementsover a large number of cltlusters and then by taking a small number of clusters and enumerating all elements within them. This is achieved in subsampling.

IbIn subsamp ling

ƒ Divide the population into clusters.

ƒ Select a of clusters [first stage]

ƒ From each o f the se lec te d c lus ter, se lec t a samp le o f spec ifie d num ber o f e lemen ts [secondtd stage]

The clusters which form the units of sampling at the first stage are called the first stage units and the units or group of units within clusters which form the unit of clusters are called the second stage units or subunits.

The procedidure is genera lidtthlized to three or more s tages an dithtd is then terme d as multist age sampli ng.

For example, in a crop survey

ƒ villages are the first stage units,

ƒ fields within the villages are the second stage units and

ƒ plots within the fields are the third stage units.

In another example, to obtain a sample of fishes from a commercial fishery,

ƒ first take a sample of boats and

ƒ then take a sample of fishes from each selected boat. 2 Two stage sampling with equal first stage units AthtAssume that ƒ population consists of NM elements. ƒ NM elements are grouped into N first stage units of M second stage units each, (i.e., N clusters, each cluster is of size M). ƒ Sample of n first stage units is selected (i.e., choose n clusters) ƒ Sample of m second stage units is selected from each selected first stage unit (i.e., choose m units from each cluster). ƒ Units at each stage are selected with SRSWOR.

Cluster sampling is a special case of two stage sampling in the sense that from a population of N clusters of equal size m = M, a sample of n clusters chosen.

If f urth er M = m = 1, we get SRSWOR.

If n = N, we have the case of . 3 th th yij : Value of the characteristic under study for the j second stage unit of the i first stage unit;

iNjm==1,2,..., ; 1,2,.., .

m 1 nd th st Yyiij= ∑ : per 2 stage unit of i 1 stage units in the population. M j=1 11NM N YyyY===∑∑ij ∑ i MN : mean per second stage unit in the popultilation MNij==11 N i = 1 1 m mean per second stage unit in the ith first stage unit in the sample. yyiij= ∑ : n j=1 11nm n mean per second stage in the sample. yyyy===∑∑ij ∑ i mn : mnij==11 n i = 1

Advantages

The principle advantage of two stage sampling is that it is more flexible than the one stage sampling. It reduces to one stage sampling when m=M but unless this is the best choice of m, we have the opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When units of the first stage agree very closely, then consideration of precision suggests a small value of m. On the other hand, it is sometimes as cheap to measure the whole of a unit as to a sample. For example, when the unit is a household and a single respondent can give as accurate data as all the members of the household. 4 A pictorial scheme of two stage sampling scheme is as follows:

Population (MN units)

Cluster Cluster Cluster Population … … … N clusters M units M units M units

N clusters…

Cluster Cluster Cluster First stage sample M units M units M units n clusters (small)

n clusters (small)

Second stage sample Cluster Cluster Cluster m units m m … … … m units n clusters (small) units units mn units 5 Note The expectations under two stage sampling scheme depend on the stages. For example, the expectation at second stage unit will be dependent on first stage unit in the sense that second stage unit will be in the sample provided it was selected in the first stage.

To calculate the average

ƒ First average the estimator over all the second stage selections that can be drawn from a fixed set of n units that the plan selects.

ƒ Then average over all the possible selections of n units by the plan.

In case o f t wo stage sam plin g, ˆˆ EEE()θθ= 12 [ ()] ↓↓ 2 average average average over all over over all possible 2nd stage

all 1st stage selections from a samples samples fixed set of units

In case of three staggpge sampling,

EEEE()θθˆˆ= ⎡⎤ () . 123⎣⎦{ } To calculate the , we proceed as follows: In case of two stage sampling, Var()θθθˆˆ=− E ( )2 ˆ 2 =−EE12().θ θ 6 Consider ˆˆˆ22 2 EEE222()θθ−= θ θ θ ()2() θ − + 2 ⎡⎤ˆˆ ˆ2 =+−+{}EV22()θ ()θθθθ 2 E 2 () . ⎣⎦⎢⎥

Now average over first stage selection as

2 EE()θˆˆˆˆ−=θθ2 E⎡⎤⎡⎤ E () + E V ()2()()θθθθ − EE + E 2 12 1⎣⎦⎣⎦ 2 1 2 12 1

2 ⎡⎤ˆˆ2 ⎡⎤ =−+EEE112{}() EV 12 () ⎣⎦⎢⎥⎣⎦ Var (θθˆˆ )=+ V⎡⎤⎡⎤ E ( ) Eθθ V θ ( ˆ ) . θ 12⎣⎦⎣⎦ 12

In case of three stage sampling,

ˆˆ⎡⎤⎡ ˆ⎤⎡ ˆ ⎤ Var()θθ=++ V13 E{ E ()} E 123123 V{ E θ ()} E E{ θV ()} . ⎣⎦2 ⎣ ⎦⎣ ⎦ 7 Estimation of population mean

Consider yy= mn as an estimator of the population mean Y .

Bias

Consider

Ey()= E12[] E ( ymn )

nd st = EEy12[](|)(im i as21 stage is dependent on stage )

= EEy12[](|)(im i asyYii is unbiased for due to SRSWOR) ⎡⎤1 n = EY1 ⎢⎥∑ i ⎣⎦n i=1 1 N = ∑Yi N i=1 =Y .

Thus ymn is an unbiased estimator of the population mean. 8 Variance

Vary()=+ EV12[ ( yi |)] V 1[ E 2 ( yi |)] ⎡⎤⎡⎤⎧⎫11nn ⎧⎫ =+EV12⎢⎥⎢⎥⎨⎬∑∑ yiii|| VE 1 2 ⎨⎬ yi ⎣⎦⎣⎦⎩⎭nnii==11 ⎩⎭ ⎡⎤⎡⎤11nn =+EV(|)yyyiV E (|)y i 112⎢⎥⎢⎥2 ∑∑ii ⎣⎦⎣⎦nnii==11 ⎡⎤111nn⎛⎞ ⎡⎤ 1 =−+ESVY2 11⎢⎥2 ∑∑⎜⎟ii⎢⎥ ⎣⎦nmMii==11⎝⎠ ⎣⎦ n 111n ⎛⎞ E()|SiVy2 () =−2 ∑⎜⎟ic+ 1 nmMi=1 ⎝⎠

(whereyc is based on cluster as in cluster sampling )

111⎛⎞22Nn− =−+2 nSS⎜⎟wb nm⎝⎠MNn

11⎛⎞⎛⎞ 122 1 1 =−⎜⎟⎜⎟SSwb +− nm⎝⎠⎝⎠ M n N

N NM 2 2211 where SSwi==∑∑∑( YY iji −) NNMiij===111(1)−

N 221 SYYbi=−∑(). N −1 i=1 9 Estimate of variance

22 An unbiased estimator of variance of y can be obtained byyp replacin g SS bwand by their unbiased estimators in the expression of variance of y

Consider an estimator of

N 221 SSwi= ∑ N i=1 where

M 2 2 1 SyYiiji= ∑( − ) M −1 j=1 and

n 221 sswi= ∑ n i=1

m 221 syyiiji=−∑(). m −1 j=1 10 22 So Es()ww= EE12() s | i

n ⎡⎤1 2 = EE12⎢⎥∑ si | i ⎣⎦n i=1 1 n ⎡⎤2 = EEsi12∑ ⎣⎦(|)i n i=1

n 1 2 = ES1 ∑ i ()as SRSWOR is used n i=1

n 1 2 = ∑ ES1()i n i=1

NN 11⎡⎤2 = ∑∑⎢⎥Si NNii==11⎣⎦

N 1 2 = ∑ Si N i=1

2 = Sw

2 2 so s w is an unbiased estimator of S w .

Consider n 221 sbi=−∑ ()yy n −1 i=1 as an estimator of N 221 SYYbi=−∑(). N −1 i=1 11 So n 221 ⎡⎤ Es()bi=− E⎢⎥∑ (yy ) n −1 ⎣⎦i=1

n 222⎡⎤ (1)()nEsE−=bi⎢⎥∑ yny − ⎣⎦i=1

n ⎡⎤22 =−EynEy⎢⎥∑ i () ⎣⎦i=1

n ⎡⎤⎛⎞ 2 =−+EE y2 nVary⎡⎤() Ey () 12⎢⎥⎜⎟∑ i ⎣⎦{} ⎣⎦⎝⎠i=1

n ⎡⎤2222⎡⎤⎛⎞⎛⎞11 1 11 =−−+−+EEyin12⎢⎥∑ ()|)ibw⎢⎥⎜⎟⎜⎟ S SY ⎣⎦i=1 ⎣⎦⎝⎠⎝⎠nN mMn

n ⎡⎤2 ⎡⎤⎛⎞⎛⎞11222 1 11 = EVar1 ⎢⎥∑{ (yEynii)()+−−+−+()} ⎢⎥⎜⎟⎜⎟ S b SY w ⎣⎦i=1 ⎣⎦⎝⎠⎝⎠nN mMn

n ⎡⎤⎧⎫⎡⎛⎞1122 ⎛⎞⎛⎞ 11 2 111 22 ⎤ =−+−−+−+ESYnSSY1 ⎢⎥∑ ⎨⎬⎜⎟ii⎢⎥ ⎜⎟⎜⎟ b w ⎣⎦i=1 ⎩⎭⎣⎝⎠mM ⎝⎠⎝⎠ nN mMn ⎦

n ⎡⎤111⎧⎫⎛⎞22⎡⎤ ⎛⎞ 11 2⎛⎞11122 =−+−−nE1 ⎢⎥⎨⎬∑⎜⎟ Sii Y n⎢ ⎜⎟ S b+ ⎜⎟−+SYw ⎥ ⎣⎦nmM⎩⎭i=1 ⎝⎠⎣ ⎝⎠ nN ⎝⎠mMn ⎦

NN ⎡⎤⎡⎤⎛⎞11122 1 ⎛⎞⎛⎞ 11 2 111 22 =−nSYnSSY⎢⎥⎢⎥⎜⎟∑∑ii + −−+−+ ⎜⎟⎜⎟ b w ⎣⎦⎣⎦⎝⎠mMNii==11 N ⎝⎠⎝⎠ nN mMn

N ⎡⎤⎡⎤⎛⎞1122 1 ⎛⎞⎛⎞ 11 2 111 22 =−nSYnS⎢⎥⎢⎥⎜⎟wi +∑ −−+−+ ⎜⎟⎜⎟ b SY w ⎣⎦⎣⎦⎝⎠mM Ni=1 ⎝⎠⎝⎠ nN mMn 12 N ⎛⎞11222n ⎛⎞ 11 2 =−(1)n⎜⎟ − Swi +∑ YnYn − − ⎜⎟ − S b ⎝⎠mM Ni=1 ⎝⎠ nN

N ⎛⎞112222n ⎡⎤ ⎛⎞ 11 =−(1)nSYNYnS⎜⎟ −wi +⎢⎥∑ − − ⎜⎟ − b ⎝⎠mM N⎣⎦i=1 ⎝⎠ nN

⎛⎞1122n ⎛⎞ 11 2 =−(1)nSNSnS⎜⎟ −wb + ( − 1) − ⎜⎟ − b ⎝⎠mM N ⎝⎠ nN

⎛⎞11 22 =−(1)nSnS⎜⎟ −wb +− (1). ⎝⎠mM

222⎛⎞11 ⇒⇒+Es()bbb =−⎜⎟ Sw + Sb ⎝⎠mM or

⎡⎤222⎛⎞11 Es⎢⎥bwb−−⎜⎟ s = S. ⎣⎦⎝⎠mM

Thus

m 11⎛⎞⎛⎞ 1ˆ 22 1 1 ˆ Var() y=−⎜⎟⎜⎟ Sω +− Sb nm⎝⎠⎝⎠ M n N

11⎛⎞⎛⎞⎛⎞ 1222 1 1⎡⎤ 1 1 =−⎜⎟⎜⎟⎜⎟swbw +−−−⎢ss⎥ nm⎝⎠⎝⎠⎝⎠ M n N⎣ m M ⎦

11⎛⎞⎛⎞ 122 1 1 =−⎜⎟⎜⎟sswb +−. Nm⎝⎠⎝⎠ M n N 13