Sampling, Three-Stage, Cluster Design, Estimator, Bias and Variance
Total Page:16
File Type:pdf, Size:1020Kb
American Journal of Mathematics and Statistics 2012, 2(6): 199-205 DOI: 10.5923/j.ajms.20120206.06 Alternative Estimation Method for a Three-Stage Cluster Sampling in Finite Population L. A. Nafiu1,*, I. O. Oshungade 2, A. A. Adewara 1Department of Mathematics and Statistics, Federal University of Technology, Minna, +234, Nigeria 2Department of Statistics, University of Ilorin, Ilorin, +234, Nigeria Abstract This research investigates the use of a three-stage cluster sampling design in estimating population total. We focus on a special design where certain number of visits is being considered for estimating the population size and a weighted factor of / is introduced. In particular, attempt was made at deriving new method for a three-stage sampling design. In this study, we compared the newly proposed estimator with some of the existing estimators in a three-stage sampling design. Eight (8) data sets were used to justify this paper. The first four (4) data sets were obtained from[1],[2],[3] and[4] respectively while the second four (4) data sets represent the number of diabetic patients in Niger state, Nigeria for the years 2005, 2006, 2007 and 2008 respectively. The computation was done with software developed in Microsoft Visual C++ programming language. All the estimates obtained show that our newly proposed three-stage cluster sampling design estimator performs better. Ke ywo rds Sampling, Three-stage, Cluster Design, Estimator, Bias and Variance Sub sampling has a great variety of applications[3] and the 1. Introduction reason for mu ltistage samp ling is ad ministrative convenienc e[10]. The process of sub sampling can be carried to a third In a census, each unit (such as person, household or local stage by sampling the subunits instead of enumerating them government area) is enumerated, whereas in a sample survey, complete ly[11]. Comparing multistage cluster sampling with only a sample of units is enumerated and information simp le rando m samp ling, it was observes that multistage provided by the sample is used to make estimates relating to cluster sampling is better in terms of efficiency[12]. all units[5] and[6]. In designing a study, it can be Multistage sampling makes fieldwork and supervision advantageous to sample units in more than one-stage. The relatively easy[4]. Multistage sampling is more efficient than criteria for selecting a unit at a given stage typically depend single stage cluster sampling[13] and references had been on attributes observed in the previous stages[7]. Mu ltistage made to the use of three or more stages sampling[9]. sampling is where the researcher divides the population into clusters, samples the clusters, and then resample, repeating Let N denote the number of primary units in the the process until the ultimate sampling units are selected at population and n the number of primary units in the sample. the last of the hierarchical levels[8]. If, after selecting a Let M i be the number of secondary units in the primary unit. sample of primary units, a sample of secondary units is The total number of secondary units in the population is selected from each of the selected primary units, the design is N referred to as two-stage sampling. If in turn a sample of MM= ∑ i (1) tertiary units is selected from each selected secondary unit, i=1 the design is three-stage sampling[9]. Let yij denote the value of the variable of interest of the The aim of this paper is to model a new estimator for jth secondary unit in the ith primary unit. The total of the three-stage cluster sampling scheme which is to be compared y-values in the ith primary unit is with the other existing seven conventional estimators. M i Yyi= ∑ ij (2) j=1 2. Methodology Accordingly, the population total for over-all sa mple in a two-stage is given as N M * Corresponding author: i = (3) [email protected] (L. A. Nafiu) Yy∑∑ ij = = Published online at http://journal.sapub.org/ajms ij11 Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved For a three-stage sampling, the population contains N 200 L. A. Nafiu et al.: Alternative Estimation Method for a Three-Stage Cluster Sampling in Finite Population first-stage units, each with M second-stage units, each of estimator of the population total at secondary unit in the which has K third-stage units. The corresponding numbers primary unit in the sample is: ℎ for the sample are n, m and k respectively. Let y be the 1 iju ℎ = th th 2 value obtained for the u third-stage units in the j =1 second-stage units drawn from the ith primary units. The � � = =1 2 (8) relevant population total for over-a ll samp le in a three-s tage is given as follows: where = is the known ∑ sampling fraction for tertiary NMK = (4) Yy∑∑∑ iju units in the secondary unit of the primary unit. i=111 ju = = Also, let denote the number of individuals (tertiary ^ units) in the sampleℎ from the secondaryℎ unit of the th For any estimation Θ h in the h cell based on primary unit who engage in the treatment of diabetes. An completely arbitrary probabilities of selection, the total unbiased estimator of the populationℎ total in the primaryℎ variance is then the sum of the variances for all strata. The unit in the sample is: symbol E is used for the operator of expectation, V for the ℎ = =1 (9) ^ variance, and V for the unbiased estimate of V. We may Finally, an unbiased� estimator ∑ �of the population total of then write the diabetic patients undergoing treatment in all the hospitals ^ ^ ^ at the secondary unit (city) in the primary unit V (Θh ) = V (E(Θh )) + E(V (Θh )) (5) (local government area) is: 1 >1 1 >1 ℎ ℎ The expression (5) may be written into three components 3 = as: =1 ^ ^ ^ ^ � � � = =1{ =1( =1 2 )} (10) V (Θ h ) = V (E(E(Θ h ))) + E(V (E(Θ h ))) + E(E(V (Θ h ))) (6) 1 2 >2 1 2 >2 1 2 >2 ∑ ∑ ∑ where “>1” is the symbol to represent all stages of sampling after the first[3]. 4. Theorems 3. Proposed Three-Stage Cluster 4.1. Theorem 1: is Unbi ased for the Popul ati on total Y Sampling Design � To estimate the population size at different hospitals using Proof: three-stage sampling, the unbiased estimator of population We know that expectation of given by equation (8) total can be derived as follows. In a three-stage sampling conditional on samples 1 and 2 of primary units and without replacement design supported by[3],[4] and[14]; a secondary units respectively equals� of engaging in the sample of primary units is selected, then a sample of variable of interest in each primary unit and each secondary secondary units is chosen from each of the selected primary unit [15]. That is; units and finally, a sample of tertiary units is chosen from 1, 2 = (11) each selected secondary unit. For instance, the state consists Also, the expectation of given by equation (9) of number of local government areas out of which a � � � conditional on sample � of primary units equals of simple random sampling of n number of local government 1 engaging in the variable of interest� in each primary unit. areas is selected. Each local government area consists of That is; number of cities out of which a simple random sampling ( | ) = (12) without replacement of number of cities is selected. 1 To obtain the expected value of given by equation Finally, from the selected sample of city containing 3 (10) over all possible samples � of primary units. number of hospitals, number of hospitals is selected at Then, the expectation of is� : random without replacement and the number of diabetic 3 3 = 1 [ 2 3 3 1, 2 1 ] (13) patients in this hospital is collected. � where 1 and 2 denote the samples of primary units and Then; �� � � �� � �� � = secondary units respectively. =1 =1 (7) 3 = [ =1 1, 2 1 ] Again, let be the numbe r of primary units (local ∑ ∑ government areas) sampled without replacement, be the �� � � � ∑ � � �� � number of secondary units (cities) selected without = [ { | 1}] replacement from the sampled primary unit (local =1 government area) and be the number of tertiary units � � ℎ = { } (hospitals) selected from the secondary unit (city) in the primary unit (local government area). An unbiased =1 � ℎ ℎ American Journal of Mathematics and Statistics 2012, 2(6): 199-205 201 known for = 1,2, , and = 1,2, , . The third term = contains variance due to estimating the ’s fro m a =1 ⋯ ⋯ subsample of tertiary units within the selected secondary = � (14) units. An unbiased estimator of the variance of 3 given This implies that the proposed estimator 3 is in equation (18) is obtained by replacing the population unbiased. � variances with the sample variances as follows: � 2 2 Hence, the variance of the newly proposed esti mator 1 3 = ( ) + =1 ( ) + 3 of the population total is derived as follows: =1 =1 2 (22) In line with[3] and[14], we use ��� � − ∑ − � 3 = 3 1, 2 | 1 where − + 3 1, 2 1 = � � � � � � � � � 3 2 =1( ) 3 + 3 + { 3 } (15) 2 � �� � � �� �� 1 = � (23) ∑ − 1 Because of the simple random sampling of primary units 2 =1( ) �� � �� ��� � ��� �� � �� 2 − and secondary units without replacement at the first stage = (24) ∑ −1 and second stage respectively, the first term to the right of the 2 2 2 − 2 equality in equation (15) is: = 2 =1( 4 2 )( ) (25) ( ) 2 3 1, 2 | 1 = { =1 } = 1 (16) ∑ − − − � The second term to the right of the equality in equation (15) 4.2.