A Regression Composite Estimator with Application to the Canadian Labour Force Survey
Total Page:16
File Type:pdf, Size:1020Kb
Survey Methodology, June 2001 45 Vol. 27, No. 1, pp. 4551 Statistics Canada, Catalogue No. 12001 A Regression Composite Estimator with Application to the Canadian Labour Force Survey Wayne A. Fuller and J.N.K. Rao 1 Abstract The Canadian Labour Force Survey is a monthly survey of households selected according to a stratified multistage design. The sample of households is divided into six panels (rotation groups). A panel remains in the sample for six consecutive months and is then dropped from the sample. In the past, a generalized regression estimator, based only on the current month’s data, has been implemented with a regression weights program. In this paper, we study regression composite estimation procedures that make use of sample information from previous periods and that can be implemented with a regression weights program. Singh (1996) proposed a composite estimator, called MR2, which can be computed by adding xvariables to the current regression weights program. Singh’s estimator is considerably more efficient than the generalized regression estimator for oneperiod change, but not for current level. Also, the estimator of level can deviate from that of the generalized regression estimator by a substantial amount and this deviation can persist over a long period. We propose a “compromise” estimator, using a regression weights program and the same number of xvariables as MR2, that is more efficient for both level and change than the generalized regression estimator based only on the current month data. The proposed estimator also addresses the drift problem and is applicable to other surveys that employ rotation sampling. Key Words: Survey sampling; Rotating samples; Combining estimators. 1. Introduction estimation for repeated surveys, using generalized least squares procedures. For the CPS, Hansen et al. (1955) Composite estimation is a term used in survey sampling proposed a simpler estimator, called the Kcomposite esti to describe estimators for a current period that use infor mator. Gurney and Daly (1965) presented an improvement mation from previous periods of a periodic survey with a to the Kcomposite estimator, called the AKcomposite esti rotating design. When some units are observed in some of mator with two weighting factors A and K. Breau and Ernst the periods, but not in all periods, it is possible to use this (1983) compared alternative estimators to the Kcomposite fact to improve estimates for all time periods. estimator for the CPS. Rao and Graham (1964) studied opti Statistics Canada, U.S. Bureau of the Census and some mal replacement schemes for the Kcomposite estimator. other statistical agencies use a rotating design for labour Eckler (1955) and Wolter (1979) studied twolevel rotation force surveys. The current Canadian Labour Force Survey schemes such as the one used in the U.S. Retail Trade (LFS) is a monthly survey of about 59,000 households, Survey. Yansaneh and Fuller (1998) studied optimal recur which are selected according to a stratified multistage sive estimation for repeated surveys. Fuller (1990) and Lent, sampling design. The ultimate sampling unit is the house Miller, Cantwell and Duff (1999) developed the method of hold and a sample of households is divided into six panels composite weights for the CPS. The composite weights are (rotation groups). A rotation group remains in the sample obtained by raking the design weights to specified control for six consecutive months and is then dropped from the totals that included population totals of auxiliary variables sample completely. Thus fivesixths of the sample of and Kcomposite estimates for characteristics of interest, y. households is common between two consecutive months. Using the composite weights, users can generate estimates Singh, Drew, Gambino and Mayda (1990) and Gambino, from microdata files for the current month without recourse Singh, Dufour, Kennedy and Lindeyer (1998) contain to data from previous months. detailed descriptions of the LFS design. In the U.S. Current The above authors used the traditional designbased Population Survey (CPS), the sample is composed of eight approach, assuming the unknown totals on each occasion to rotation groups. A rotation group stays in the sample for be fixed parameters. Other authors (Scott, Smith and Jones four consecutive months, leaves the sample for the 1977; Jones 1980; Binder and Dick 1989; Bell and Hillmer succeeding eight months, and then returns for another four 1990; Tiller 1989 and Pfeffermann 1991) developed esti consecutive months. It is then dropped from the sample mates for repeated surveys under the assumption that the completely. Thus there is a 75 percent monthtomonth underlying true values constitute a realization of a time sample overlap and a 50 percent yeartoyear sample series. overlap (Hansen, Hurwitz, Nisselson and Steinberg 1955). Statistics Canada considered K and AK composite Patterson (1950), following the initial work by Jessen estimation for the Labour Force Survey at several times (1942), provided the theoretical foundations for design and during the past 25 years (Kumar and Lee 1983), but did not 1. Wayne A. Fuller, Iowa State University, 221 Snedecor Hall, Ames, IA, U.S.A.; J.N.K. Rao, Carleton University, Ottawa, Ontario K1S 5B6. 46 Fuller and Rao: A Regression Composite Estimator with Application to the Canadian Labour Force Survey adopt composite estimation. Instead, a generalized regres ˆ åwi + å wi = N t = estimated population total. sion estimator, based only on the current months data, has iÎ At i Î B t been computed with a regression weights program. When Let q be the fraction of the sample in the overlap at time t: composite estimation was considered in the 1990’s, there t ˆ -1 was strong pressure to developed a composite estimation qt = N tå w i . (2.1) iÎ A procedure that used the existing estimation program. Singh t (1996) proposed an ingenious method, called Modified In the Labour Force Survey q t is about 5/6 and is nearly Regression (MR), to address this issue. This method leads to constant over time. We will frequently omit the subscript t a composite estimator, called MR2 estimator, which uses on AB, and q , for simplicity. the existing regression weights program. Singh suggested t t t creating xvariables to be used as control variables in the 2.1 Estimator regression program. With the created variables and the Singh’s (1996) MR2 estimator uses the control variable previous period estimator, the existing regression weights - 1 program is used to construct regression weights that define x1ti= q t( y t-1, i - y ti ) + y ti if i Î A t the estimator for the current period. Control variables with known population totals are also included. = yti ifiÎ Bt , (2.2) An empirical study of the MR2 estimator identified several characteristics of the procedure. First, the estimated in the regression program, where yti is the value of a variance of a oneperiod change is much reduced. Second, characteristics of interest, y, for element i at time t. Because the estimated variance of level is often similar to that for the of nonresponse in the LFS, Singh’s original proposal used direct estimator. Third, the estimator of level could deviate imputation for missing data and set q = 5/ 6, after from the direct estimator by a substantial amount and this imputation for missing data. In our initial discussion we use deviation could extend over a long period. the q t as defined in (2.1), assuming no nonresponse so that In this paper, we study the efficiency of MR estimators imputation is not required. Note that “micromatching” of theoretically, under a simplified setup. We propose also a individual data files at t - 1 and t is needed to calculate x1 ti “compromise” estimator that leads to significant gains in and the resulting MR2 estimator. Additional control efficiency, for both level and change, over the estimator variables of the form (2.2) associated with other yvariables using only the current month’s data. The composite as well as auxiliary variables with known population totals estimator also addresses the “drift” problem mentioned are also included in the regression estimation. The auxiliary above and can be implemented using the existing regression variables in the LFS include demographic variables such as weights program. Gambino, Kennedy and Singh (2000) age, sex and location. evaluated the efficiency of the composite estimates for the The particular xvariables in (2.2) is designed such that LFS data, using a jackknife method of variance estimation. the estimated total of x1 is an estimator of the previous Bell (2000) compared several composite estimators using period total of y. Thus, the control total for x1 in the data from the Australian Labour Force Survey. regression procedure is the previous period estimator of the total of y. Let mˆ t-1 be the estimator of the mean of y for period 2. Composite Regression Estimation t - 1, let ym , t - 1 and ymt be the means of the matched panels at time t - 1 and t respectively, let yt be the grand There are two types of observations used in composite mean of all sample panels at time t, and let y B, t be the estimation; those observed only at the current time, t, and mean of the birth panel at time t. Assume the sample of size those observed both at the current time and at the previous n is divided into g panels of equal size and denote the time, t - 1. Sometimes information in previous observations matched sampling fraction by q. To simplify the discussion is condensed in the estimate(s) for the previous period(s). we consider a single yvariable.