
Efficient Class of Estimators for Population Median Using Auxiliary Information

Prayas Sharma and Rajesh Singh Department of , Banaras Hindu University, Varanasi, India. [email protected], [email protected]

Abstract

This article suggests an efficient class of estimators of the population median of the study variable using an auxiliary variable. Asymptotic expressions for the bias and mean square error of the proposed class of estimators have been obtained. The asymptotically optimum estimator has been investigated along with its approximate mean square error. We show that the proposed class of estimators is more efficient than the estimators considered by Srivastava (1967), Gross (1980), Kuk and Mak (1989), Singh et al. (2003b), Al and Cingi (2009) and Singh and Solanki (2013). In addition, the theoretical findings are supported by an empirical study based on two populations, which shows the superiority of the constructed estimators over the others.

Key words: Auxiliary Variable, Simple Random Sampling, Bias, Mean Square Error.

1. Introduction

In the sampling literature, statisticians are often interested in variables that have highly skewed distributions, such as consumption and income. In such situations the median is considered a more appropriate measure of location than the mean. It is well recognised that the use of auxiliary information results in efficient estimators of population parameters. Estimation of the median was initially studied without an auxiliary variable; subsequently, several authors, including Kuk and Mak (1989), Meeden (1995) and Singh et al. (2001), used auxiliary information in median estimation. Kuk and Mak (1989) considered the problem of estimating the population median $M_y$ of the study variable $Y$ using the values of the auxiliary variable $X$ for the units in the sample and its median $M_x$ for the whole population. Other important references in this context are Chambers and Dunstan (1986), Rao et al. (1990), Mak and Kuk (1993), Rueda et al. (1998), Arcos et al. (2005), Garcia and Cebrian (2001), Singh et al. (2003a, 2006), Singh et al. (2007) and Singh and Solanki (2013).

Let $Y_i$ and $X_i$ $(i = 1, 2, \ldots, N)$ be the values of the population units for the study variable $Y$ and the auxiliary variable $X$, respectively. Further, suppose that $y_i$ and $x_i$ $(i = 1, 2, \ldots, n)$ are the values of the units included in a sample $s_n$ of size $n$ drawn by simple random sampling without replacement (SRSWOR). Kuk and Mak (1989) suggested a ratio estimator of the population median $M_y$ of the study variable $Y$, assuming that the population median $M_x$ of the auxiliary variable $X$ is known, given as

$$\hat M_r = \hat M_y\,\frac{M_x}{\hat M_x}, \tag{1.1}$$

where $\hat M_y$ (due to Gross, 1980) and $\hat M_x$ are the sample medians, used as estimators of $M_y$ and $M_x$ respectively.

Suppose that $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$ are the $y$ values of the sample units in ascending order. Further, let $t$ be an integer satisfying $y_{(t)} \le M_y \le y_{(t+1)}$, and let $p = t/n$ be the proportion of $y$ values in the sample that are less than or equal to the median value $M_y$, an unknown population parameter. If $\hat Q_y(t)$ denotes the sample $t$-quantile of $Y$, then $\hat M_y = \hat Q_y(0.5)$. Kuk and Mak (1989) defined the matrix of proportions $(p_{ij})$ as

$$\begin{array}{l|cc|c}
 & Y \le M_y & Y > M_y & \text{Total} \\ \hline
X \le M_x & p_{11} & p_{21} & p_{\cdot 1} \\
X > M_x & p_{12} & p_{22} & p_{\cdot 2} \\ \hline
\text{Total} & p_{1\cdot} & p_{2\cdot} & 1
\end{array}$$
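The entries of this table have a natural sample analogue. The following Python sketch is illustrative only: it assumes the category thresholds are the sample medians of x and y (as computed by `statistics.median`), keeps the index convention of the table ($p_{ij}$ with $i$ the Y category and $j$ the X category), and also returns $4\hat p_{11} - 1$, the sample counterpart of the coefficient $\rho_c = 4p_{11} - 1$ that appears later in this section. The helper name `kuk_mak_proportions` is hypothetical.

```python
from statistics import median


def kuk_mak_proportions(x, y):
    """Sample analogue of the Kuk-Mak table of proportions (illustrative sketch).

    Index convention follows the table: p_ij with i = Y category and j = X category,
    where category 1 means "<= sample median" and category 2 means "> sample median".
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mx, my = median(x), median(y)
    n = len(x)
    counts = {"p11": 0, "p21": 0, "p12": 0, "p22": 0}
    for xi, yi in zip(x, y):
        i = 1 if yi <= my else 2   # Y category (column of the table)
        j = 1 if xi <= mx else 2   # X category (row of the table)
        counts[f"p{i}{j}"] += 1
    props = {key: c / n for key, c in counts.items()}
    props["rho_c"] = 4 * props["p11"] - 1   # sample counterpart of rho_c = 4 p11 - 1
    return props


x = list(range(1, 11))
print(kuk_mak_proportions(x, x))         # perfectly concordant pairs
print(kuk_mak_proportions(x, x[::-1]))   # perfectly discordant pairs
```

With perfectly concordant data the sketch gives $\hat p_{11} = 0.5$ and $4\hat p_{11} - 1 = 1$; reversing one variable gives $\hat p_{11} = 0$ and $-1$, the two ends of the range of $\rho_c$.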

Following Robson (1957) and Murthy (1964), the product estimator of the population median $M_y$ is defined as

$$\hat M_p = \hat M_y\,\frac{\hat M_x}{M_x}. \tag{1.2}$$

The usual difference estimator of the population median $M_y$ is given by

$$\hat M_d = \hat M_y + d\,(M_x - \hat M_x), \tag{1.3}$$

where $d$ is a constant to be determined such that the mean square error of $\hat M_d$ is minimum.
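For concreteness, the three estimators (1.1) to (1.3) can be computed from a sample as in the sketch below. It assumes the sample median is `statistics.median` (which averages the two middle values when n is even), and the value of d is an arbitrary illustrative choice rather than the MSE-optimal constant; the data and the function name are hypothetical.

```python
from statistics import median


def median_estimators(y_sample, x_sample, M_x, d):
    """Ratio (1.1), product (1.2) and difference (1.3) estimators of M_y (sketch)."""
    my_hat = median(y_sample)   # sample median of y (Gross, 1980)
    mx_hat = median(x_sample)   # sample median of x
    return {
        "ratio": my_hat * M_x / mx_hat,             # (1.1)
        "product": my_hat * mx_hat / M_x,           # (1.2)
        "difference": my_hat + d * (M_x - mx_hat),  # (1.3)
    }


est = median_estimators([2, 4, 6, 8, 10], [1, 2, 3, 4, 5], M_x=4, d=0.5)
print(est)  # {'ratio': 8.0, 'product': 4.5, 'difference': 6.5}
```

Here the sample median of x (3) falls below the known $M_x = 4$, so the ratio and difference estimators adjust $\hat M_y = 6$ upwards while the product estimator adjusts it downwards.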

Singh et al. (2003) proposed the following modified product and ratio estimators of the population median $M_y$, respectively:

$$\hat M_1 = \hat M_y\left(\frac{a + \hat M_x}{a + M_x}\right) \tag{1.4}$$

and

$$\hat M_2 = \hat M_y\left(\frac{a + M_x}{a + \hat M_x}\right), \tag{1.5}$$

where $a$ is a suitably chosen scalar.

The Srivastava (1967)-type estimator for median estimation is given by

$$\hat M_3 = \hat M_y\left(\frac{M_x}{\hat M_x}\right)^{\alpha}. \tag{1.6}$$

The Reddy (1973, 1974) and Walsh (1970)-type estimator is given by

$$\hat M_4 = \hat M_y\left[\frac{M_x}{M_x + \alpha(\hat M_x - M_x)}\right]. \tag{1.7}$$

The Sahai and Ray (1980)-type estimator is given by

$$\hat M_5 = \hat M_y\left[2 - \left(\frac{M_x}{\hat M_x}\right)^{\alpha}\right]. \tag{1.8}$$

In (1.6) to (1.8), $\alpha$ is a suitably chosen constant.

Vos (1980)-type estimators are given by

$$\hat M_6 = w\hat M_y + (1 - w)\hat M_y\left(\frac{\hat M_x}{M_x}\right), \qquad \hat M_7 = w\hat M_y + (1 - w)\hat M_y\left(\frac{M_x}{\hat M_x}\right), \tag{1.9}$$

where $w$ is a suitably chosen scalar.

All the estimators considered in (1.1) to (1.9), as well as the conventional estimator $\hat M_y$, are members of the Srivastava (1971) and Srivastava and Jhajj (1981)-type class of estimators

$$\mathcal{G} = \left\{\hat M_y^{(G)} : \hat M_y^{(G)} = G\!\left(\hat M_y, \frac{\hat M_x}{M_x}\right)\right\}, \tag{1.10}$$

where the pair $(\hat M_y, \hat M_x/M_x)$ takes values in a bounded, closed, convex subset $Q \subset \mathbb{R}^2$ containing the point $(M_y, 1)$, and the function $G$ is such that

$$G(M_y, 1) = M_y \quad\text{and}\quad G_{10}(M_y, 1) = 1.$$
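The two conditions can be checked numerically for the estimators (1.1) to (1.9) once each is written as a function $G(m, u)$ of $m = \hat M_y$ and $u = \hat M_x/M_x$. The sketch below does this with central finite differences. The constants a, α, w and d are hypothetical values chosen only for illustration, the Population I medians of Section 4 serve merely as an evaluation point, and the forms used for (1.6) to (1.8) assume that α is the exponent or scale constant of those estimators.

```python
M_Y, M_X = 2068.0, 2011.0              # evaluation point (Population I medians)
A, ALPHA, W, D = 5.0, 0.7, 0.4, 0.15   # hypothetical values of a, alpha, w and d

# Each estimator written as G(m, u), with m = sample median of y and u = M_x_hat / M_x.
ESTIMATORS = {
    "(1.1) ratio": lambda m, u: m / u,
    "(1.2) product": lambda m, u: m * u,
    "(1.3) difference": lambda m, u: m + D * (M_X - M_X * u),
    "(1.4) M1": lambda m, u: m * (A + M_X * u) / (A + M_X),
    "(1.5) M2": lambda m, u: m * (A + M_X) / (A + M_X * u),
    "(1.6) M3": lambda m, u: m * u ** (-ALPHA),
    "(1.7) M4": lambda m, u: m / (1 + ALPHA * (u - 1)),
    "(1.8) M5": lambda m, u: m * (2 - u ** (-ALPHA)),
    "(1.9) M6": lambda m, u: W * m + (1 - W) * m * u,
    "(1.9) M7": lambda m, u: W * m + (1 - W) * m / u,
}


def class_conditions(g, h=1e-4):
    """Return G(M_y, 1), G_10(M_y, 1) and G_01(M_y, 1) using central differences."""
    value = g(M_Y, 1.0)
    g10 = (g(M_Y + h, 1.0) - g(M_Y - h, 1.0)) / (2 * h)
    g01 = (g(M_Y, 1.0 + h) - g(M_Y, 1.0 - h)) / (2 * h)
    return value, g10, g01


for name, g in ESTIMATORS.items():
    value, g10, g01 = class_conditions(g)
    print(f"{name:18s} G={value:.6f} G10={g10:.6f} G01={g01:.3f}")
```

All ten functions return $G = M_y$ and $G_{10} = 1$; they differ only in $G_{01}(M_y, 1)$ (for example about $-M_y$ for the ratio estimator and $+M_y$ for the product estimator), which is the only estimator-specific quantity entering the first-order MSE.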

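Using a first-order Taylor series expansion about the point $(M_y, 1)$, we have

$$\hat M_y^{(G)} = G(M_y, 1) + (\hat M_y - M_y)\,G_{10}(M_y, 1) + (U - 1)\,G_{01}(M_y, 1) + O(n^{-1}), \tag{1.11}$$

where $U = \hat M_x/M_x$, $G_{10}(M_y, 1) = \left.\dfrac{\partial G}{\partial \hat M_y}\right|_{(M_y, 1)}$ and $G_{01}(M_y, 1) = \left.\dfrac{\partial G}{\partial U}\right|_{(M_y, 1)}$.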

Using these conditions, we have

$$\hat M_y^{(G)} = M_y + (\hat M_y - M_y) + (U - 1)\,G_{01}(M_y, 1) + O(n^{-1}),$$

or, to the first order of approximation,

$$\hat M_y^{(G)} - M_y = (\hat M_y - M_y) + (U - 1)\,G_{01}(M_y, 1). \tag{1.12}$$

Squaring and taking expectations of both sides of (1.12), we get the MSE of $\hat M_y^{(G)}$ to the first order of approximation as

$$MSE(\hat M_y^{(G)}) = V(\hat M_y) + \frac{V(\hat M_x)}{M_x^2}\,G_{01}^2(M_y, 1) + 2\,\frac{Cov(\hat M_y, \hat M_x)}{M_x}\,G_{01}(M_y, 1).$$

Here, as $N \to \infty$ and $n \to \infty$, $n/N \to f$, and we assume that as $N \to \infty$ the distribution of $(X, Y)$ approaches a continuous distribution with marginal densities $f_x(x)$ and $f_y(y)$ of $X$ and $Y$, respectively. A super-population model framework is needed to treat the values of $X$ and $Y$ as a realization of $N$ independent observations from a continuous distribution. It is also assumed that $f_x(M_x)$ and $f_y(M_y)$ are positive. Under these conditions, the sample median $\hat M_y$ is consistent and asymptotically normal (Gross, 1980) with mean $M_y$, and

$$V(\hat M_y) = \gamma M_y^2 C_y^2, \qquad V(\hat M_x) = \gamma M_x^2 C_x^2, \qquad Cov(\hat M_y, \hat M_x) = \gamma\rho_c M_y M_x C_y C_x,$$

where $\gamma = (1 - f)/(4n)$, $f = n/N$, $C_y = [M_y f_y(M_y)]^{-1}$, $C_x = [M_x f_x(M_x)]^{-1}$ and $\rho_c = 4p_{11} - 1$, with $p_{11} = P(X \le M_x, Y \le M_y)$; $\rho_c$ goes from $-1$ to $+1$ as $p_{11}$ increases from 0 to 0.5.
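These moment formulas can be evaluated directly from population quantities. The function below is a hypothetical helper (`median_moments` is not from the paper) that treats the densities at the medians as known, as in the empirical study; with the Population I inputs of Section 4 it reproduces the value $V(\hat M_y) = 565443.57$ reported in Table 3.1.

```python
def median_moments(N, n, M_y, M_x, f_y, f_x, rho_c):
    """First-order variances and covariance of the sample medians under SRSWOR (sketch).

    f_y and f_x are the densities of Y and X evaluated at their medians; treating
    them as known is an assumption of this illustration.
    """
    f = n / N                     # sampling fraction
    gamma = (1 - f) / (4 * n)     # gamma = (1 - f) / (4n)
    C_y = 1 / (M_y * f_y)         # C_y = [M_y f_y(M_y)]^(-1)
    C_x = 1 / (M_x * f_x)         # C_x = [M_x f_x(M_x)]^(-1)
    return {
        "gamma": gamma,
        "C_y": C_y,
        "C_x": C_x,
        "V_y": gamma * M_y**2 * C_y**2,
        "V_x": gamma * M_x**2 * C_x**2,
        "cov": gamma * rho_c * M_y * M_x * C_y * C_x,
    }


# Population I inputs from Section 4
mom = median_moments(N=69, n=17, M_y=2068, M_x=2011, f_y=0.00014, f_x=0.00014, rho_c=0.1505)
print(round(mom["V_y"], 2))  # 565443.57
```

Since $\gamma M_y^2C_y^2 = \gamma/f_y(M_y)^2$, the variance of the sample median depends on $M_y$ only through the density at the median; in Population I, where $f_x(M_x) = f_y(M_y)$, the two variances coincide.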

Substituting these values, we get the MSE of $\hat M_y^{(G)}$ to the first degree of approximation as

$$MSE(\hat M_y^{(G)}) = \gamma\left[M_y^2C_y^2 + C_x^2\,G_{01}^2(M_y, 1) + 2\rho_c C_x C_y M_y\,G_{01}(M_y, 1)\right].$$

The MSE is minimum when

$$G_{01}(M_y, 1) = -k_c M_y, \tag{1.18}$$

where $k_c = \rho_c C_y/C_x$. Thus, the minimum MSE of $\hat M_y^{(G)}$ is given by

$$MSE_{min}(\hat M_y^{(G)}) = \gamma C_y^2M_y^2(1 - \rho_c^2) = MSE_{min}(\hat M_d), \tag{1.19}$$

which is equal to the minimum MSE of the estimator $\hat M_d$ defined in (1.3).
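As a numerical check of (1.18) and (1.19), the sketch below evaluates the first-order MSE of $\hat M_y^{(G)}$ as a quadratic in $g = G_{01}(M_y, 1)$, using the Population I inputs of Section 4, and compares its value at the minimizer $g = -k_cM_y$ with the closed form $\gamma C_y^2M_y^2(1 - \rho_c^2)$; it should agree with the entry 552636.13 in Table 3.1. The function name `mse_class` is illustrative.

```python
# Population I inputs (Section 4)
N, n = 69, 17
M_y, M_x, f_y, f_x, rho_c = 2068.0, 2011.0, 0.00014, 0.00014, 0.1505

gamma = (1 - n / N) / (4 * n)
C_y, C_x = 1 / (M_y * f_y), 1 / (M_x * f_x)
V_y = gamma * M_y**2 * C_y**2
V_x = gamma * M_x**2 * C_x**2
cov = gamma * rho_c * M_y * M_x * C_y * C_x


def mse_class(g):
    """First-order MSE of M_y^(G) as a function of g = G_01(M_y, 1)."""
    return V_y + (V_x / M_x**2) * g**2 + 2 * (cov / M_x) * g


k_c = rho_c * C_y / C_x
g_star = -k_c * M_y                                       # minimizer, (1.18)
mse_star = mse_class(g_star)
closed_form = gamma * C_y**2 * M_y**2 * (1 - rho_c**2)    # (1.19)
print(round(g_star, 3), round(mse_star, 2), round(closed_form, 2))
```

Because the MSE is a convex quadratic in $g$, any estimator in the class whose $G_{01}(M_y, 1)$ differs from $-k_cM_y$ has a strictly larger first-order MSE.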

It is to be mentioned that the minimum MSEs of the estimators $\hat M_r$, $\hat M_p$ and $\hat M_i$ $(i = 1, 2, \ldots, 7)$ are equal to the MSE expression given in (1.19). It is obvious from (1.19) that estimators of the form $\hat M_y^{(G)}$ are asymptotically no more efficient than the difference estimator at its optimum value, or than the regression-type estimator

$$\hat M_{lr} = \hat M_y + \hat d\,(M_x - \hat M_x), \tag{1.20}$$

where $\hat d = \dfrac{\hat f_x(\hat M_x)}{\hat f_y(\hat M_y)}\,(4\hat p_{11} - 1)$, and $\hat f_x$, $\hat f_y$ and $\hat p_{11}$ are sample estimates of $f_x$, $f_y$ and $p_{11}$.

Singh and Solanki (2013) suggested the following classes of estimators:

$$\hat M_d^{(1)} = d_1\hat M_y + (1 - d_1)(M_x - \hat M_x), \tag{1.21}$$

$$\hat M_d^{(2)} = d_1\hat M_y + d_2(M_x - \hat M_x), \tag{1.22}$$

$$\hat M_d^{(3)} = d_1\hat M_y + d_2\hat M_x + (1 - d_1 - d_2)M_x, \tag{1.23}$$

$$\hat M_d^{(4)} = \left[d_1\hat M_y + d_2(M_x - \hat M_x)\right]\left(\frac{\psi M_x + \phi}{\psi\hat M_x + \phi}\right), \tag{1.24}$$

where $d_1$ and $d_2$ are suitable constants to be determined such that the MSEs of the estimators in (1.21) to (1.24) are minimum, and $\psi$ and $\phi$ are either real numbers or functions of the known parameters of the auxiliary variable $X$.

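Writing $R = M_x/M_y$ and $\theta = \psi M_x/(\psi M_x + \phi)$, the biases and minimum MSEs of the estimators in (1.21) to (1.24) are given as

$$B(\hat M_d^{(1)}) = (d_1 - 1)M_y, \tag{1.25}$$

$$B(\hat M_d^{(2)}) = (d_1 - 1)M_y, \tag{1.26}$$

$$B(\hat M_d^{(3)}) = (d_1 - 1)(1 - R)M_y, \tag{1.27}$$

$$B(\hat M_d^{(4)}) = M_y\left[d_1\left\{1 + \gamma\theta C_x^2(\theta - k_c)\right\} + d_2R\gamma\theta C_x^2 - 1\right], \tag{1.28}$$

$$MSE_{min}(\hat M_d^{(1)}) = M_y^2\left[1 + \gamma R^2C_x^2 - \frac{\left\{1 + \gamma RC_x^2(R + k_c)\right\}^2}{1 + \gamma C_y^2 + \gamma RC_x^2(R + 2k_c)}\right], \tag{1.29}$$

$$MSE_{min}(\hat M_d^{(2)}) = \frac{\gamma M_y^2C_y^2(1 - \rho_c^2)}{1 + \gamma C_y^2(1 - \rho_c^2)}, \tag{1.30}$$

$$MSE_{min}(\hat M_d^{(3)}) = \frac{\gamma M_y^2C_y^2(1 - \rho_c^2)(1 - R)^2}{(1 - R)^2 + \gamma C_y^2(1 - \rho_c^2)}, \tag{1.31}$$

$$MSE_{min}(\hat M_d^{(4)}) = \frac{(1 - \gamma\theta^2C_x^2)\,\gamma M_y^2C_y^2(1 - \rho_c^2)}{1 + \gamma\theta^2C_x^2 + \gamma C_y^2(1 - \rho_c^2)}. \tag{1.32}$$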

2. The Suggested Class of Estimators

We propose a family of estimators of the population median of the study variable $Y$ as

$$t_m = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)^{\alpha}\exp\left[\frac{\eta(M_x - \hat M_x)}{\eta(M_x + \hat M_x) + 2\lambda}\right] + w_2\hat M_x + (1 - w_1 - w_2)M_x, \tag{2.1}$$

where $w_1$ and $w_2$ are suitable constants to be determined such that the MSE of $t_m$ is minimum, $\alpha$ is a suitably chosen constant, and $\eta$ and $\lambda$ are either real numbers or functions of known parameters of the auxiliary variable, such as $C_x$, the coefficients of skewness and kurtosis $\beta_{1(x)}$ and $\beta_{2(x)}$, and the correlation coefficient $\rho_c$ (see Singh and Kumar, 2011).
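A direct transcription of (2.1) is useful for checking the special cases listed in Table 2.1. The sketch below assumes the reading of (2.1) in which $\alpha$ is the exponent of $M_x/\hat M_x$ and $\eta$, $\lambda$ enter the exponential as $\eta(M_x - \hat M_x)/\{\eta(M_x + \hat M_x) + 2\lambda\}$; the function name `t_m` and the toy medians are hypothetical.

```python
import math


def t_m(my_hat, mx_hat, M_x, w1, w2, alpha, eta, lam):
    """Evaluate the proposed class (2.1) at given sample medians (illustrative sketch).

    Requires eta * (M_x + mx_hat) + 2 * lam != 0, which holds for every parameter
    choice in Tables 2.1 and 2.2 when the medians are positive.
    """
    ratio_part = (M_x / mx_hat) ** alpha
    exp_part = math.exp(eta * (M_x - mx_hat) / (eta * (M_x + mx_hat) + 2 * lam))
    return w1 * my_hat * ratio_part * exp_part + w2 * mx_hat + (1 - w1 - w2) * M_x


# Hypothetical sample medians and known population median of X
my_hat, mx_hat, M_x = 100.0, 90.0, 95.0
print(t_m(my_hat, mx_hat, M_x, 1, 0, 0, 0, 1))     # Gross (1980): the sample median itself
print(t_m(my_hat, mx_hat, M_x, 1, 0, 1, 0, 1))     # Kuk and Mak (1989) ratio estimator
print(t_m(my_hat, mx_hat, M_x, 1, 0, -1, 0, 1))    # Murthy (1964) product estimator
print(t_m(my_hat, mx_hat, M_x, 0.6, 0.3, 0, 0, 1))  # t_m8, the form of (1.23)
```

Setting $\eta = 0$ switches the exponential factor off (it equals 1), which is how the classical ratio, product and difference-type estimators appear as members of the class.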

It is to be mentioned that:

(i) for $(w_1, w_2) = (1, 0)$, the class of estimators $t_m$ reduces to the class

$$t_{mp} = \hat M_y\left(\frac{M_x}{\hat M_x}\right)^{\alpha}\exp\left[\frac{\eta(M_x - \hat M_x)}{\eta(M_x + \hat M_x) + 2\lambda}\right]; \tag{2.2}$$

(ii) for $(w_1, w_2) = (w_1, 0)$, the class of estimators $t_m$ reduces to the class

$$t_{mq} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)^{\alpha}\exp\left[\frac{\eta(M_x - \hat M_x)}{\eta(M_x + \hat M_x) + 2\lambda}\right]. \tag{2.3}$$

A set of estimators generated from (2.1) using suitable values of $w_1$, $w_2$, $\alpha$, $\eta$ and $\lambda$ is listed in Table 2.1.

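Table 2.1: Set of estimators generated from the class of estimators $t_m$

$$\begin{array}{lccccc}
\text{Subset of proposed estimator} & w_1 & w_2 & \alpha & \eta & \lambda \\ \hline
t_{m1} = \hat M_y \ \text{(Gross, 1980)} & 1 & 0 & 0 & 0 & 1 \\
t_{m2} = \hat M_y\left(\frac{M_x}{\hat M_x}\right) = \hat M_r \ \text{(Kuk and Mak, 1989)} & 1 & 0 & 1 & 0 & 1 \\
t_{m3} = \hat M_y\left(\frac{M_x}{\hat M_x}\right)^{\alpha} = \hat M_3 \ \text{(Srivastava, 1967)} & 1 & 0 & \alpha & 0 & 1 \\
t_{m4} = \hat M_y\left(\frac{\hat M_x}{M_x}\right) = \hat M_p \ \text{(Murthy, 1964)} & 1 & 0 & -1 & 0 & 1 \\
t_{m5} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right) \ \text{(Al and Cingi, 2009)} & w_1 & 0 & 1 & 0 & 1 \\
t_{m6} = w_1\hat M_y\left(\frac{\hat M_x}{M_x}\right) & w_1 & 0 & -1 & 0 & 1 \\
t_{m7} = w_1\hat M_y \ \text{(Al and Cingi, 2009)} & w_1 & 0 & 0 & 0 & 1 \\
t_{m8} = w_1\hat M_y + w_2\hat M_x + (1 - w_1 - w_2)M_x = \hat M_d^{(3)*} & w_1 & w_2 & 0 & 0 & 1
\end{array}$$

*Estimator proposed by Singh and Solanki (2013), given in equation (1.23).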

Another set of estimators generated from the class of estimators $t_{mq}$ in (2.3), using suitable values of $\alpha$, $\eta$ and $\lambda$, is summarized in Table 2.2.

Table 2.2: Set of estimators generated from the estimator $t_{mq}$

$$\begin{array}{lccc}
\text{Subset of proposed estimator} & \alpha & \eta & \lambda \\ \hline
t_{mq}^{(1)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{M_x - \hat M_x}{(M_x + \hat M_x) + 2}\right] & 1 & 1 & 1 \\
t_{mq}^{(2)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{M_x - \hat M_x}{(M_x + \hat M_x) + 2\rho_c}\right] & 1 & 1 & \rho_c \\
t_{mq}^{(3)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{M_x - \hat M_x}{(M_x + \hat M_x) + 2M_x}\right] & 1 & 1 & M_x \\
t_{mq}^{(4)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{M_x - \hat M_x}{M_x + \hat M_x}\right] & 1 & 1 & 0 \\
t_{mq}^{(5)} = w_1\hat M_y\left(\frac{\hat M_x}{M_x}\right)\exp\left[\frac{M_x - \hat M_x}{(M_x + \hat M_x) + 2}\right] & -1 & 1 & 1 \\
t_{mq}^{(6)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{M_x(M_x - \hat M_x)}{M_x(M_x + \hat M_x) + 2\rho_c}\right] & 1 & M_x & \rho_c \\
t_{mq}^{(7)} = w_1\hat M_y\exp\left[\frac{M_x(M_x - \hat M_x)}{M_x(M_x + \hat M_x) + 2\rho_c}\right] & 0 & M_x & \rho_c \\
t_{mq}^{(8)} = w_1\hat M_y\left(\frac{M_x}{\hat M_x}\right)\exp\left[\frac{\rho_c(M_x - \hat M_x)}{\rho_c(M_x + \hat M_x) + 2M_x}\right] & 1 & \rho_c & M_x \\
t_{mq}^{(9)} = w_1\hat M_y\left(\frac{\hat M_x}{M_x}\right)\exp\left[\frac{\rho_c(M_x - \hat M_x)}{\rho_c(M_x + \hat M_x) + 2M_x}\right] & -1 & \rho_c & M_x
\end{array}$$
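To the first order of approximation, the members of Table 2.2 differ only through the constant $a = \alpha + k$, where $k = \eta M_x/\{2(\eta M_x + \lambda)\}$ and $\alpha$, $\eta$, $\lambda$ are the exponent and the two constants inside the exponential of (2.1). The sketch below tabulates $k$ and $a$ for the nine members at the Population I values $M_x = 2011$ and $\rho_c = 0.1505$ from Section 4; the parameter triples are read from the table above, and the helper name `k_and_a` is hypothetical.

```python
M_x, rho_c = 2011.0, 0.1505   # Population I values (Section 4)

# (alpha, eta, lambda) for t_mq^(1), ..., t_mq^(9)
MEMBERS = {
    1: (1, 1, 1),
    2: (1, 1, rho_c),
    3: (1, 1, M_x),
    4: (1, 1, 0),
    5: (-1, 1, 1),
    6: (1, M_x, rho_c),
    7: (0, M_x, rho_c),
    8: (1, rho_c, M_x),
    9: (-1, rho_c, M_x),
}


def k_and_a(alpha, eta, lam):
    """Return k = eta*M_x / (2(eta*M_x + lam)) and a = alpha + k."""
    k = eta * M_x / (2 * (eta * M_x + lam))
    return k, alpha + k


for j, params in MEMBERS.items():
    k, a = k_and_a(*params)
    print(f"t_mq^({j}): k = {k:.6f}, a = {a:.6f}")
```

Members whose values of $a$ nearly coincide, for example (1), (2), (4) and (6), all with $a$ close to 1.5, should have almost identical first-order MSEs, which is consistent with the near-equal entries for these estimators in Table 3.1.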

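Let $e_0 = (\hat M_y - M_y)/M_y$ and $e_1 = (\hat M_x - M_x)/M_x$, so that, to the first order of approximation, $E(e_0) = E(e_1) = 0$, $E(e_0^2) = \gamma C_y^2$, $E(e_1^2) = \gamma C_x^2$ and $E(e_0e_1) = \gamma\rho_cC_yC_x$. Expressing (2.1) in terms of the $e$'s, we have

$$t_m = w_1M_y(1 + e_0)(1 + e_1)^{-\alpha}\exp\left[-ke_1(1 + ke_1)^{-1}\right] + w_2M_x(1 + e_1) + (1 - w_1 - w_2)M_x, \tag{2.4}$$

where $k = \dfrac{\eta M_x}{2(\eta M_x + \lambda)}$.

Up to the first order of approximation, we have

$$t_m - M_y \cong (w_1 - 1)b + w_1M_y\left(e_0 - ae_1 + de_1^2 - ae_0e_1\right) + w_2M_xe_1, \tag{2.5}$$

where $a = \alpha + k$, $b = M_y - M_x$ and $d = \dfrac{3}{2}k^2 + \alpha\left(k + \dfrac{\alpha + 1}{2}\right)$.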

Taking expectations of both sides of (2.5), we get the bias of the estimator $t_m$ as

$$B(t_m) = (w_1 - 1)b + w_1M_y\gamma\left(dC_x^2 - a\rho_cC_yC_x\right). \tag{2.6}$$

Squaring both sides of (2.5) and neglecting terms in the $e$'s of degree greater than two, we have

$$(t_m - M_y)^2 \cong (1 - 2w_1)b^2 + w_1^2\left[b^2 + M_y^2\left(e_0^2 + a^2e_1^2 - 2ae_0e_1\right)\right] + w_2^2M_x^2e_1^2 + 2w_1w_2M_yM_x\left(e_0e_1 - ae_1^2\right).$$

Taking expectations of both sides of the above expression, we get the MSE of the estimator $t_m$ to the first order of approximation as

$$MSE(t_m) = (1 - 2w_1)b^2 + w_1^2A + w_2^2B + 2w_1w_2C, \tag{2.7}$$

where

$$A = b^2 + \gamma M_y^2\left(C_y^2 + a^2C_x^2 - 2a\rho_cC_yC_x\right), \qquad B = \gamma M_x^2C_x^2, \qquad C = \gamma M_yM_x\left(\rho_cC_y - aC_x\right)C_x.$$

The optimum values of $w_1$ and $w_2$, obtained by minimizing (2.7), are

$$w_1^* = \frac{b^2B}{AB - C^2} \quad\text{and}\quad w_2^* = -\frac{b^2C}{AB - C^2}. \tag{2.8}$$

Substituting the optimum values of $w_1$ and $w_2$ in (2.7), we obtain the minimum MSE of the estimator $t_m$ as

$$MSE_{min}(t_m) = b^2\left(1 - \frac{b^2B}{AB - C^2}\right) \tag{2.9}$$

or, with $R = M_x/M_y$,

$$MSE_{min}(t_m) = \frac{\gamma M_y^2(1 - R)^2C_y^2(1 - \rho_c^2)}{(1 - R)^2 + \gamma C_y^2(1 - \rho_c^2)}, \tag{2.10}$$

which does not depend on $\alpha$, $\eta$ or $\lambda$. The MSE expression in (2.10) is the same as the minimum MSE of the estimator $\hat M_d^{(3)}$ given in (1.31).
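Formulas (2.7) to (2.10) can be cross-checked numerically. The sketch below uses the Population I inputs of Section 4, builds $A$, $B$ and $C$ for several arbitrary values of $a$, computes the weights in (2.8), and compares the resulting minimum with the closed form (2.10). It assumes $R = M_x/M_y$ and $b = M_y - M_x$; the helper names are hypothetical.

```python
# Population I inputs (Section 4)
N, n = 69, 17
M_y, M_x, f_y, f_x, rho = 2068.0, 2011.0, 0.00014, 0.00014, 0.1505

gamma = (1 - n / N) / (4 * n)
C_y, C_x = 1 / (M_y * f_y), 1 / (M_x * f_x)
b = M_y - M_x
R = M_x / M_y


def abc(a):
    """A, B and C of (2.7) for a given a = alpha + k."""
    A = b**2 + gamma * M_y**2 * (C_y**2 + a**2 * C_x**2 - 2 * a * rho * C_y * C_x)
    B = gamma * M_x**2 * C_x**2
    C = gamma * M_y * M_x * (rho * C_y - a * C_x) * C_x
    return A, B, C


def mse_tm(w1, w2, A, B, C):
    """First-order MSE (2.7)."""
    return b**2 * (1 - 2 * w1) + w1**2 * A + w2**2 * B + 2 * w1 * w2 * C


def optimum(a):
    """Optimum weights (2.8) and the MSE they attain."""
    A, B, C = abc(a)
    den = A * B - C**2
    w1 = b**2 * B / den
    w2 = -b**2 * C / den
    return w1, w2, mse_tm(w1, w2, A, B, C)


s = gamma * C_y**2 * (1 - rho**2)
closed_form = M_y**2 * (1 - R)**2 * s / ((1 - R)**2 + s)   # (2.10)

for a in (-0.5, 0.5, 1.5):
    w1, w2, m = optimum(a)
    print(f"a = {a:4.1f}: w1* = {w1:.6f}, w2* = {w2:.6f}, MSE = {m:.2f}")
print(f"(2.10): {closed_form:.2f}")
```

The minimum is the same for every $a$ (about 3230.01 here). Table 3.1 lists 3229.34; the small gap appears to come from the rounded value $R = 0.97244$ used there.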

Similarly, the minimum MSE of the class of estimators $t_{mq}$ is given by

$$MSE_{min}(t_{mq}) = \frac{\gamma M_y^2(1 - R)^2\left(C_y^2 + a^2C_x^2 - 2a\rho_cC_yC_x\right)}{(1 - R)^2 + \gamma\left(C_y^2 + a^2C_x^2 - 2a\rho_cC_yC_x\right)}. \tag{2.11}$$

3. Comparisons

From (1.19) and (2.10), we have

$$MSE_{min}(\hat M_y^{(G)}) - MSE_{min}(t_m) = MSE_{min}(\hat M_d) - MSE_{min}(t_m) = \frac{\{MSE_{min}(\hat M_d)\}^2}{MSE_{min}(\hat M_d) + (1 - R)^2M_y^2} \ge 0. \tag{3.1}$$

From (1.19) and (2.11), we have

$$MSE_{min}(\hat M_y^{(G)}) - MSE_{min}(t_{mq}) = MSE_{min}(\hat M_d) - MSE_{min}(t_{mq}) \ge 0$$

if

$$\gamma C_y^2M_y^2(1 - \rho_c^2) - \frac{\gamma M_y^2(1 - R)^2S_a}{(1 - R)^2 + \gamma S_a} \ge 0,$$

that is, if

$$C_y^2(1 - \rho_c^2)\left\{(1 - R)^2 + \gamma S_a\right\} \ge (1 - R)^2S_a, \tag{3.2}$$

where $S_a = C_y^2 + a^2C_x^2 - 2a\rho_cC_yC_x$.

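From (1.30) and (2.10), we have

$$MSE_{min}(t_m) - MSE_{min}(\hat M_d^{(2)}) = \frac{M_y^2R(R - 2)\{MSE_{min}(\hat M_d)\}^2}{\{(1 - R)^2M_y^2 + MSE_{min}(\hat M_d)\}\{M_y^2 + MSE_{min}(\hat M_d)\}} \le 0 \quad\text{when } 0 \le R \le 2. \tag{3.3}$$

From (1.30) and (1.32), we have

$$MSE_{min}(\hat M_d^{(2)}) - MSE_{min}(\hat M_d^{(4)}) = \frac{\gamma\theta^2C_x^2M_y^2\,MSE_{min}(\hat M_d)\{2M_y^2 + MSE_{min}(\hat M_d)\}}{\{M_y^2 + MSE_{min}(\hat M_d)\}\{M_y^2(1 + \gamma\theta^2C_x^2) + MSE_{min}(\hat M_d)\}} \ge 0, \tag{3.4}$$

and from (3.3) we have $MSE_{min}(t_m) - MSE_{min}(\hat M_d^{(2)}) \le 0$. Therefore,

$$MSE_{min}(t_m) - MSE_{min}(\hat M_d^{(4)}) \le 0. \tag{3.5}$$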

It follows from (3.1), (3.2), (3.3), (3.4) and (3.5) that the proposed class of estimators $t_m$ is better than the conventional difference estimator $\hat M_d$, the class of estimators $\hat M_y^{(G)}$ and the estimators belonging to that class, i.e. the usual unbiased estimator $\hat M_y$ due to Gross (1980), the usual ratio-type estimator $\hat M_r$ due to Kuk and Mak (1989), the product estimator $\hat M_p$ and $\hat M_i$ $(i = 3, 4, \ldots, 7)$, at their optimum conditions. Further, it is shown that the proposed class of estimators $t_m$ is better than the estimators $\hat M_d^{(2)}$, $\hat M_d^{(4)}$ and $\hat M_d^{(1)}$ considered by Singh and Solanki (2013).

4. Empirical study

Data statistics: to illustrate the efficiency of the proposed class of estimators in application, we consider the following two population data sets.

Population I (Source: Singh, 2003). $y$: the number of fish caught by marine recreational fishermen in 1995; $x$: the number of fish caught by marine recreational fishermen in 1994.

The values of the required parameters are:

N = 69, n = 17, $M_y = 2068$, $M_x = 2011$, $f_y(M_y) = 0.00014$, $f_x(M_x) = 0.00014$, $\rho_c = 0.1505$, $R = 0.97244$.
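The Population I entries of Table 3.1 that have closed forms in this article can be reproduced from these parameters. The sketch below assumes the standard first-order MSE of the ratio estimator, $\gamma M_y^2(C_y^2 + C_x^2 - 2\rho_cC_yC_x)$, which is not printed in the article, and uses $R = 0.97244$ as listed; it covers Population I only, since that is the population whose inputs are listed in full.

```python
# Population I inputs (Section 4); R = M_x / M_y is taken as printed.
N, n = 69, 17
M_y, M_x, f_y, f_x, rho, R = 2068.0, 2011.0, 0.00014, 0.00014, 0.1505, 0.97244

gamma = (1 - n / N) / (4 * n)
C_y, C_x = 1 / (M_y * f_y), 1 / (M_x * f_x)
s = gamma * C_y**2 * (1 - rho**2)   # gamma * C_y^2 * (1 - rho_c^2)

results = {
    "V(M_y_hat)": gamma * M_y**2 * C_y**2,
    "MSE(M_r)": gamma * M_y**2 * (C_y**2 + C_x**2 - 2 * rho * C_y * C_x),
    "MSE_min(M_d)": M_y**2 * s,                                                  # (1.19)
    "MSE_min(M_d^(2))": M_y**2 * s / (1 + s),                                    # (1.30)
    "MSE_min(M_d^(3)) = MSE_min(t_m)": M_y**2 * s * (1 - R)**2 / ((1 - R)**2 + s),  # (1.31), (2.10)
}
for name, value in results.items():
    print(f"{name:34s} {value:12.2f}")
```

The printed values agree with the first column of Table 3.1 to within rounding of the published inputs.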

Population II (Source: Singh, 2003). $y$: the number of fish caught by marine recreational fishermen in 1995; $x$: the number of fish caught by marine recreational fishermen in 1993.

The values of the required parameters are:

N = 69, n = 17, $M_x = 2307$, $f_x(M_x) = 0.0013$, $\rho_c = 0.3166$, $R = 1.11557$.

Table 3.1: MSEs and minimum MSEs of different estimators

$$\begin{array}{lrr}
\text{Estimator} & \text{Population I} & \text{Population II} \\ \hline
V(\hat M_y) & 565443.57 & 565443.57 \\
MSE(\hat M_r) & 988372.76 & 536149.50 \\
MSE_{min}(\hat M_d) = MSE_{min}(\hat M_y^{(G)}) = MSE_{min}(\hat M_i)^{\dagger} & 552636.13 & 508766.02 \\
MSE_{min}(\hat M_d^{(1)}) & 485969.06 & 495484.97 \\
MSE_{min}(\hat M_d^{(2)}) & 489395.24 & 454675.78 \\
MSE_{min}(\hat M_d^{(3)}) & 3229.34 & 51355.17 \\
MSE_{min}(\hat M_d^{(4)}) & 480458.97 & 454616.15 \\
MSE_{min}(t_m) & 3229.34 & 51355.17 \\
MSE_{min}(t_{mq}^{(1)}) & 3267.42 & 58727.72 \\
MSE_{min}(t_{mq}^{(2)}) & 3267.43 & 58729.63 \\
MSE_{min}(t_{mq}^{(3)}) & 3254.89 & 55919.25 \\
MSE_{min}(t_{mq}^{(4)}) & 3267.43 & 58730.48 \\
MSE_{min}(t_{mq}^{(5)}) & 3238.55 & 55037.68 \\
MSE_{min}(t_{mq}^{(6)}) & 3267.43 & 58730.48 \\
MSE_{min}(t_{mq}^{(7)}) & 3232.56 & 51514.08 \\
MSE_{min}(t_{mq}^{(8)}) & 3247.25 & 54709.03 \\
MSE_{min}(t_{mq}^{(9)}) & 3253.88 & 59211.32
\end{array}$$

$^{\dagger}$For $i = 1, 2, \ldots, 7$.

Analysing Table 3.1, we conclude that the estimators based on auxiliary information are more efficient than $\hat M_y$, which does not use auxiliary information. The members of the class $t_{mq}$, obtained from the class $t_m$, are almost equally efficient, and all are more efficient than the usual unbiased estimator $\hat M_y$ (due to Gross, 1980), the usual ratio estimator $\hat M_r$ (due to Kuk and Mak, 1989), the difference-type estimator $\hat M_d$, the class of estimators $\hat M_y^{(G)}$, the estimators $\hat M_i$ $(i = 1, 2, \ldots, 7)$ and the estimators $\hat M_d^{(1)}$, $\hat M_d^{(2)}$ and $\hat M_d^{(4)}$ (due to Singh and Solanki, 2013). Among the proposed estimators $t_m$ and $t_{mq}^{(j)}$ $(j = 1, 2, \ldots, 9)$, the estimator $t_m$, which is as efficient as $\hat M_d^{(3)}$ (due to Singh and Solanki, 2013), performs best in the sense of having the least MSE, followed by $t_{mq}^{(7)}$, which utilizes information on the population median $M_x$ along with $\rho_c$.

Conclusion

In the present study we have suggested a class of estimators of the population median of the study variable $y$ when information on an auxiliary variable is available. Some known estimators of the population median, such as the usual unbiased estimator $\hat M_y$ due to Gross (1980) and the estimators due to Kuk and Mak (1989), Srivastava (1967), Murthy (1964), Al and Cingi (2009) and Singh and Solanki (2013), are found to be members of the proposed class, and new estimators are also generated from it. We have obtained the biases and MSEs of the proposed class of estimators up to the first order of approximation. The proposed class is advantageous in the sense that the properties of its member estimators can be studied through the properties of the class itself. In theoretical and empirical comparisons we have shown that the proposed class of estimators is more efficient than the other estimators considered here, and as efficient as the estimator $\hat M_d^{(3)}$.

References

1. Al, S., Cingi, H. (2009). New estimators for the population median in simple random sampling. Tenth Islamic Countries Conference on Statistical Sciences, New Cairo, Egypt.
2. Arcos, A., Rueda, M., Martinez-Miranda, M. D. (2005). Using multiparametric auxiliary information at the estimation stage. Statist. Pap. 46:339–358.
3. Chambers, R. L., Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika 73:597–604.
4. Garcia, M. R., Cebrian, A. A. (2001). On estimating the median from survey data using multiple auxiliary information. Metrika 54:59–76.
5. Gross, T. S. (1980). Median estimation in sample surveys. Proc. Surv. Res. Meth. Sect. Amer. Statist. Assoc. 181–184.
6. Kuk, A. Y. C., Mak, T. K. (1989). Median estimation in the presence of auxiliary information. J. Roy. Statist. Soc. Ser. B 51:261–269.
7. Mak, T. K., Kuk, A. Y. C. (1993). A new method for estimating finite population quantiles using auxiliary information. Canad. J. Statist. 21:29–38.
8. Meeden, G. (1995). Median estimation using auxiliary information. Surv. Methodol. 21:71–77.
9. Murthy, M. N. (1964). Product method of estimation. Sankhya 26:294–307.
10. Rao, J. N. K., Kovar, J. G., Mantel, H. J. (1990). On estimating distribution functions and quantiles from survey data using auxiliary information. Biometrika 77:365–375.
11. Reddy, V. N. (1973). On ratio and product method of estimation. Sankhya B 35:307–317.
12. Reddy, V. N. (1974). On a transformed ratio method of estimation. Sankhya C 36:59–70.
13. Robson, D. S. (1957). Application of multivariate polykays to the theory of unbiased ratio-type estimators. J. Amer. Statist. Assoc. 52:511–522.
14. Rueda, M., Arcos, A., Artés, E. (1998). Quantile in finite population using a multivariate ratio estimator. Metrika 47:203–213.
15. Sahai, A., Ray, S. K. (1980). An efficient estimator using auxiliary information. Metrika 27:27–275.
16. Singh, H. P., Solanki, R. S. (2013). Some classes of estimators for the population median using auxiliary information. Commun. Statist. 42:4222–4238.
17. Singh, H. P., Sidhu, S. S., Singh, S. (2006). Median estimation with known interquartile of auxiliary variable. Int. J. Appl. Math. Statist. 4:68–80.
18. Singh, H. P., Singh, S., Joarder, A. H. (2003a). Estimation of population median when of an auxiliary variable is known. J. Statist. Res. 37(1):57–63.
19. Singh, H. P., Singh, S., Puertas, S. M. (2003b). Ratio type estimators for the median of finite populations. Allgemeines Statistisches Archiv 87:369–382.
20. Singh, R., Kumar, M. (2011). A note on transformations on auxiliary variable in survey sampling. Model Assist. Statist. Appl. 6(1):17–19. doi:10.3233/MAS-2011-0154.
21. Singh, S. (2003). Advanced Sampling Theory with Applications: How Michael "Selected" Amy. The Netherlands: Kluwer Academic Publishers.
22. Singh, S., Joarder, A. H., Tracy, D. S. (2001). Median estimation using double sampling. Austral. NZ J. Statist. 43:33–46.
23. Singh, S., Singh, H. P., Upadhyaya, L. N. (2007). Chain ratio and regression type estimators for median estimation in survey sampling. Statist. Pap. 48(1):23–46.
24. Srivastava, S. K. (1967). An estimator using auxiliary information in sample surveys. Calcutta Statist. Assoc. Bull. 6:121–132.
25. Srivastava, S. K. (1971). A generalized estimator for the mean of a finite population using multi-auxiliary information. J. Amer. Statist. Assoc. 66:404–407.
26. Srivastava, S. K., Jhajj, H. S. (1981). A class of estimators of the population mean in survey sampling using auxiliary information. Biometrika 68:341–343.
27. Vos, J. W. E. (1980). Mixing of direct ratio and product method estimators. Statistica Neerlandica 34:209–218.
28. Walsh, J. E. (1970). Generalization of ratio estimator for population total. Sankhya A 42:99–106.