Survey Methodology, June 2001 45 Vol. 27, No. 1, pp. 45­51 Canada, Catalogue No. 12­001

A Regression Composite Estimator with Application to the Canadian Labour Force Survey

Wayne A. Fuller and J.N.K. Rao 1

Abstract The Canadian Labour Force Survey is a monthly survey of households selected according to a stratified multistage design. The sample of households is divided into six panels (rotation groups). A panel remains in the sample for six consecutive months and is then dropped from the sample. In the past, a generalized regression estimator, based only on the current month’s data, has been implemented with a regression weights program. In this paper, we study regression composite estimation procedures that make use of sample information from previous periods and that can be implemented with a regression weights program. Singh (1996) proposed a composite estimator, called MR2, which can be computed by adding x­variables to the current regression weights program. Singh’s estimator is considerably more efficient than the generalized regression estimator for one­period change, but not for current level. Also, the estimator of level can deviate from that of the generalized regression estimator by a substantial amount and this deviation can persist over a long period. We propose a “compromise” estimator, using a regression weights program and the same number of x­variables as MR2, that is more efficient for both level and change than the generalized regression estimator based only on the current month data. The proposed estimator also addresses the drift problem and is applicable to other surveys that employ rotation sampling.

Key Words: Survey sampling; Rotating samples; Combining estimators.

1. Introduction estimation for repeated surveys, using generalized least squares procedures. For the CPS, Hansen et al. (1955) Composite estimation is a term used in survey sampling proposed a simpler estimator, called the K­composite esti­ to describe estimators for a current period that use infor­ mator. Gurney and Daly (1965) presented an improvement mation from previous periods of a periodic survey with a to the K­composite estimator, called the AK­composite esti­ rotating design. When some units are observed in some of mator with two weighting factors A and K. Breau and Ernst the periods, but not in all periods, it is possible to use this (1983) compared alternative estimators to the K­composite fact to improve estimates for all time periods. estimator for the CPS. Rao and Graham (1964) studied opti­ Statistics Canada, U.S. Bureau of the Census and some mal replacement schemes for the K­composite estimator. other statistical agencies use a rotating design for labour Eckler (1955) and Wolter (1979) studied two­level rotation force surveys. The current Canadian Labour Force Survey schemes such as the one used in the U.S. Retail Trade (LFS) is a monthly survey of about 59,000 households, Survey. Yansaneh and Fuller (1998) studied optimal recur­ which are selected according to a stratified multistage sive estimation for repeated surveys. Fuller (1990) and Lent, sampling design. The ultimate sampling unit is the house­ Miller, Cantwell and Duff (1999) developed the method of hold and a sample of households is divided into six panels composite weights for the CPS. The composite weights are (rotation groups). A rotation group remains in the sample obtained by raking the design weights to specified control for six consecutive months and is then dropped from the totals that included population totals of auxiliary variables sample completely. Thus five­sixths of the sample of and K­composite estimates for characteristics of interest, y. households is common between two consecutive months. Using the composite weights, users can generate estimates Singh, Drew, Gambino and Mayda (1990) and Gambino, from microdata files for the current month without recourse Singh, Dufour, Kennedy and Lindeyer (1998) contain to data from previous months. detailed descriptions of the LFS design. In the U.S. Current The above authors used the traditional design­based Population Survey (CPS), the sample is composed of eight approach, assuming the unknown totals on each occasion to rotation groups. A rotation group stays in the sample for be fixed parameters. Other authors (Scott, Smith and Jones four consecutive months, leaves the sample for the 1977; Jones 1980; Binder and Dick 1989; Bell and Hillmer succeeding eight months, and then returns for another four 1990; Tiller 1989 and Pfeffermann 1991) developed esti­ consecutive months. It is then dropped from the sample mates for repeated surveys under the assumption that the completely. Thus there is a 75 percent month­to­month underlying true values constitute a realization of a time sample overlap and a 50 percent year­to­year sample series. overlap (Hansen, Hurwitz, Nisselson and Steinberg 1955). Statistics Canada considered K and AK composite Patterson (1950), following the initial work by Jessen estimation for the Labour Force Survey at several times (1942), provided the theoretical foundations for design and during the past 25 years (Kumar and Lee 1983), but did not

1. Wayne A. Fuller, Iowa State University, 221 Snedecor Hall, Ames, IA, U.S.A.; J.N.K. Rao, Carleton University, Ottawa, Ontario K1S 5B6. 46 Fuller and Rao: A Regression Composite Estimator with Application to the Canadian Labour Force Survey adopt composite estimation. Instead, a generalized regres­ ˆ ∑wi + ∑ wi = N t = estimated population total. sion estimator, based only on the current months data, has i∈ At i ∈ B t been computed with a regression weights program. When Let θ be the fraction of the sample in the overlap at time t: composite estimation was considered in the 1990’s, there t ˆ −1 was strong pressure to developed a composite estimation θt = N t∑ w i . (2.1) i∈ A procedure that used the existing estimation program. Singh t (1996) proposed an ingenious method, called Modified In the Labour Force Survey θ t is about 5/6 and is nearly Regression (MR), to address this issue. This method leads to constant over time. We will frequently omit the subscript t a composite estimator, called MR2 estimator, which uses on AB, and θ , for simplicity. the existing regression weights program. Singh suggested t t t creating x­variables to be used as control variables in the 2.1 Estimator regression program. With the created variables and the Singh’s (1996) MR2 estimator uses the control variable previous period estimator, the existing regression weights − 1 program is used to construct regression weights that define x1ti= θ t( y t−1, i − y ti ) + y ti if i ∈ A t the estimator for the current period. Control variables with known population totals are also included. = yti ifi∈ Bt , (2.2) An empirical study of the MR2 estimator identified several characteristics of the procedure. First, the estimated in the regression program, where yti is the value of a variance of a one­period change is much reduced. Second, characteristics of interest, y, for element i at time t. Because the estimated variance of level is often similar to that for the of nonresponse in the LFS, Singh’s original proposal used direct estimator. Third, the estimator of level could deviate imputation for missing data and set θ = 5/ 6, after from the direct estimator by a substantial amount and this imputation for missing data. In our initial discussion we use deviation could extend over a long period. the θ t as defined in (2.1), assuming no nonresponse so that In this paper, we study the efficiency of MR estimators imputation is not required. Note that “micromatching” of theoretically, under a simplified set­up. We propose also a individual data files at t − 1 and t is needed to calculate x1 ti “compromise” estimator that leads to significant gains in and the resulting MR2 estimator. Additional control efficiency, for both level and change, over the estimator variables of the form (2.2) associated with other y­variables using only the current month’s data. The composite as well as auxiliary variables with known population totals estimator also addresses the “drift” problem mentioned are also included in the regression estimation. The auxiliary above and can be implemented using the existing regression variables in the LFS include demographic variables such as weights program. Gambino, Kennedy and Singh (2000) age, sex and location. evaluated the efficiency of the composite estimates for the The particular x­variables in (2.2) is designed such that LFS data, using a jackknife method of variance estimation. the estimated total of x1 is an estimator of the previous Bell (2000) compared several composite estimators using period total of y. Thus, the control total for x1 in the data from the Australian Labour Force Survey. regression procedure is the previous period estimator of the total of y.

Let µˆ t−1 be the estimator of the mean of y for period 2. Composite Regression Estimation t − 1, let ym , t − 1 and ymt be the means of the matched panels at time t − 1 and t respectively, let yt be the grand There are two types of observations used in composite mean of all sample panels at time t, and let y B, t be the estimation; those observed only at the current time, t, and mean of the birth panel at time t. Assume the sample of size those observed both at the current time and at the previous n is divided into g panels of equal size and denote the time, t − 1. Sometimes information in previous observations matched sampling fraction by θ. To simplify the discussion is condensed in the estimate(s) for the previous period(s). we consider a single y­variable. Then Singh’s (1996) MR2 estimator at time t, constructed with x , can be written in a Let wi be the sampling weight for observation i at time t, let 1 ti regression estimator form as At be the set of elements with observations at both the time periods t and t − 1, and let Bt be the set of elements µˆ =y +()[( x − x βˆ + µˆ −y − y + y)] b , (2.3) observed only at the current time period t. In this initial t t CNt Ct Ct t−1 m , t− 1 m , t t t context, i is the index for an individual respondent. If there where xCNt is the population mean of the vector of auxiliary is no nonresponse, the set At for the LFS is composed of variables, such as age and sex, at time t, xCt is the weighted ˆ individuals in the five panels that were in the sample during sample mean of the auxiliary variables, and (,)β′Ct bt ′ is the the previous period, called the overlap panels. With no vector of regression coefficients for the regression of yt on nonresponse, the set Bt for the LFS contains individuals (xCt , x1 t ). first observed in the current period, called the birth panel. One can write Assume yt, i= yˆ t , i , ( r ) + d t , i , ( r ) ,

Statistics Canada, Catalogue No. 12­001 Survey Methodology, June 2001 47

ˆ where yt , i , ( r ) is the predicted value in the regression of yt , i 1− λA ≈ (1 − θ ) (1 − b0 ) on x Cr and d t,,() i r is the deviation from the regression predicted value. Then and −1 b∗ ≈[ θ + (1 − θ )b ]− 1 b . (2.6) x1t = θ( yˆtit−1,,(1) − + d tit − 1,,(1) − − yˆ tit ,,() − d tit ,,() ) 0 0

+yˆ t, i , ( t ) + d t , i , ( t ) if i∈ A t The first expression on the right of the equality of (2.5) gives the MR2 estimator as a linear combination of the =yˆ t, i , ( t ) + d t , i , ( t ) ifi∈ Bt . direct estimator yt and the difference estimator µˆ t−1 + For demographic variables X , it is reasonable to believe ()ymt− y m, t − 1 i.e., in the form of a composite estimator. Cti The final expression of (2.5) gives the estimator as a linear that yˆ t −1, i , ( t − 1) is close to yˆ t −1, i , ( t ) and close to yˆ t , i , ( t ) . Therefore the part of x that is orthogonal to x is close to combination of a “regression­type” estimator based on the 1t Ct overlap panels and the mean of the birth panels. x= θ−1 ( d − d ) d,1, t t−1, i , ( t − 1) t , i , ( t ) 2.2 An Alternative Estimator + d if i∈ A t, i , ( t ) t It is possible to define alternative regression variables to

= dt, i , ( t ) ifi∈ Bt . use in regression composite estimation. We present a particular regression variable in this subsection. The Thus the partial regression coefficient bt is close to the associated regression estimator is not suggested as the regression coefficient for the regression of d t,,() i t on xd ,1, t , ultimate estimator, but the estimator is a member of a class and the value depends on the correlation between d t,,() i t and for which efficiency calculations are given. An alternative to d t−1, i , ( t − 1) . A simple model for d t,,() i t that has been used in Singh’s (1996) MR2 estimator is outlined in section 5. the past, and the one we adopt in our analysis, is the Define a variable to be equal to the previous period value assumption that the d t,,() i t is the sum of a fixed µ t and an if the individual is in the overlap sample and to be equal to error that is a first order autoregression with parameter ρ. the estimated mean for the previous period if the individual To simplify the presentation, we discuss the simple is in the birth sample. The regression variable is random sampling model without xCt . The results extend to the general case by considering the parameter ρ to be the x2,ti= y t− 1, i if i∈ A t partial correlation between yt and yt −1 after adjusting for = µˆ t−1 ifi∈ Bt . (2.7) xCt . Under the autoregressive model with fixed ρ, an intercept and no other x Ct in the model, it can be shown If this variable is used in a regression estimator, the control ˆ that bt converges in probability to mean is µ t−1 , the previous period estimator, because the mean for the created variable is estimating the mean for −2 2 − 1 b0 = ρlim bt = θρ [2 − θ − 2(1 − θ ) ρ − (1 − θ ) σ y∆ t ] , x % n →∞ period t − 1. Singh (1996) used a variable 2ti similar to x2 ti . In Singh’s variable, the µˆ t−1 in (2.7) is ym , t − 1 if i∈ Bt . where ∆ 2 =(). µ − µ 2 Assuming (1− θ ) σ − 2∆ 2 is small t t t−1 y t Consider a regression estimator constructed with x2 ti and relative to the other terms we get recall that the control mean of x2 t is µˆ t−1 . The regression − 1 estimator using x2 t can be written b0 =& θρ[2 − θ − 2( 1 − θ) ρ ] . (2.4) ˆ − 1 µˆreg, t =y t +(), µ ˆ t−1 − x 2, t β (2.8) For the LFS, b0 =(7 − 2 ρ ) 5 ρ . Alternative representations for the estimator µˆ t , omitting where βˆ is the regression coefficient for the regression of xCt , are obtained using the formula yt= θ y mt +(1 − θ ) y Bt . y on x (subscript t is dropped on βˆ for simplicity), y Thus t 2t t t is the sample mean of y at time t, and x2, t is the sample mean of x for all sample panels at time t − 1. The µˆ t =(1 −b ) yt + [ µˆ t−1 + ( y mt − y m, t − 1 )] b 2, ti regression coefficient βˆ is, approximately, the regression of ˆ = θ[ymt + ( µ t−1 − y m , t − 1 ) b ] yt on x2 t in the set A. The coefficient is not exactly the regression coefficient for the set A because ym , t − 1 is not +−θµ−(1 )( ˆ t−1y m , t − 1 y m , t − 1 + y mt ) b +−θ− (1 )(1 b ) y B, t equal to µˆ t−1 , but the difference between the two estimators ∗ will usually be small. Singh (1996) called the regression =[ θ + (1 − θ )b ][ ym, t + ( µˆ t− 1 − y m , t − 1 ) b ] estimator constructed with x % 2 ti , the MR1 estimator. +(1 − θ )(1 − b ) y B, t Using yt= θ y mt +(1 − θ ) y Bt , the regression estimator of µ t using x2 t as a control variable is given by ∗ = λA[y m, t + ( µˆ t− 1 −y m , t − 1 ) b ] + (1 − λ A ) y B, t , (2.5) µˆ =(1 − θ )y + θ { y + ( µˆ −y ) βˆ }. (2.9) where t B, t m, t t− 1 m , t − 1

Statistics Canada, Catalogue No. 12­001 48 Fuller and Rao: A Regression Composite Estimator with Application to the Canadian Labour Force Survey

The expression within curly brackets in (2.9) is the For the LFS, g = 6 is the number of panels. Now consider regression estimator of µ t using the estimator µˆ t−1 and an estimator that is a linear combination of µˆ m, t and y B, t , only the data from the matched sample A. Note that the regression estimator µˆt = λµ ˆ mt +(1 − λ ) y B, t µˆ =y +(), µˆ −y βˆ (2.10) m, t m , t t− 1 m , t − 1 = λ( ym, t − y m , t− 1 β + µˆ t − 1 β) + (1 − λ )y B, t , (3.7) where βˆ is the regression of y on y in the set A, is the t t −1 where 0≤ λ ≤ 1 is to be determined. To minimize the optimal linear estimator for µ t based on µˆ t−1 and the data variance of current level, given µˆ t−1 with variance of set A. Note that β = ρ if the variances are the same at the − 1 two time periods. Hereafter, we often set β = ρ. qt V{}, y B, t one would minimize

Using the variable x2 t gives the optimal estimator, µˆ mt , based on data in set A, but it does not combine that estimator VV{µˆt } = { λµ ˆ mt + (1 − λ )y B, t } with the mean of set B in an optimal way. As can be seen in 2 2 (3.8) (2.10), the weight given to the mean of set B is 1− θ . In = λV{ µˆ mt } + (1 − λ )V { y B, t }, general, this weight is too large because the variance of the regression estimator is less than the variance of the simple with respect to λ. Under the assumptions (3.3), (3.4) and mean. (3.5), the optimum λ for current level is

−1 − 1 2− 1 2 − 1 λopt =[g θ (1 − ρ ) +qt ρ + 1] . 3. Optimal Estimation However, if one is planning on using the estimator for a The way in which one chooses to combine the regression long period of time, one must realize that only certain values estimator for set A with the mean of set B depends on one’s of qt are possible in the long run. The value of λ chosen to estimate µ determines the variance of µˆ and hence, objective function and on the variance of µˆ t−1 . We give t t some illustrative calculations based on some simplifying determines the variance that will go into the estimator of µ . Assuming β = ρ, we have assumptions. For convenience let V {}µˆ t −1 be expressed as a t+ 1 multiple of the variance of the birth panel, V{µ=ˆ } { g −1 θλ − 1 2 (1 −ρ+ 2 ) q− 1 λρ+−λ 2 2 (1 )2 }V { y } − 1 t t B, t V{}{}.µˆ t−1 = q t V y B, t (3.1) Assume or

V{}{}, y= g− 1 V y (3.2) −1 − 1 − 1 2 2 2 2 2− 1 t B, t qt+1 = g θλ(1 −ρ ) + (1 −λ ) +λρ qt . (3.9)

−1 Cov{µˆ t−1 , ( y m , t − y m , t − 1 β)} = 0, (3.3) Thus, for a given λ , the limiting value for qt is

g −1θ − 1 λ 2(1 − ρ 2 )  Cov{µˆ t−1 ,y B , t } = 0, (3.4) −1 2 2− 1 limqt = (1 − λ ρ )   . (3.10) t →∞ +(1 − λ ) 2  and  

Cov{yB, t ,( y m , t− y m , t − 1 β)} = 0, (3.5) This result is equivalent to that given by Cochran (1977), page 352 equation (12.86). where g is the number of rotation groups (panels). Table 1 contains values of the limit variances as the Assumption (3.1) is reasonable if the original panels have a number of periods becomes large, for selected values of ρ covariance function well approximated by that of a first and λ , where θ = 5/ 6 and g θ = 5 for the LFS. The order autoregressive process. For the LFS, the zero variances are standardized so that the variance of the direct covariances in (3.4) and (3.5), and assumption (3.2) are only estimator based on the mean of six panels is 1.00. Thus, the approximations because y B, t is not based on an entirely entries are six times the limiting value in (3.10). If the independent sample. correlation is 0.95 and λ is set equal to 0.96, the long run We write the regression estimator based on the overlap as variance of current level is 70% of that of the direct estimator. If λ is set equal to 0.90, the long run variance is µˆ =y ˆ − y β + µˆ β m, t m , t m , t− 1 t − 1 58% of that of the direct estimator when ρ = 0.95. and, with the assumptions, obtain The first line in Table 1 is for λ =5/ 6 = θ . This is the λ corresponding to the use of x2 t in a regression estimator. −1 − 1 2− 1 2 V(µˆ mt ) =& [ g θ (1 − ρ ) +qt ρ ] V { y B, t }. (3.6) The variance with λ = 5/6 is always smaller than that of the direct estimator because of the improvement associated

Statistics Canada, Catalogue No. 12­001 Survey Methodology, June 2001 49

with the use of the regression estimator µˆ m, t . Thus, if Under no revision in µˆ t−1 and conditions (3.2) through ρ ≠ 0, the regression estimator with x2 t leads to significant (3.5), the variance of µˆt − µ ˆ t−1 , where µˆ t is defined in reduction in variance over the direct estimator, yt , that uses (3.7), is current data only.

V{µˆt − µ ˆ t−1 } =V { λ [ yt + ( µˆ t−1 − x 2, t ) ρ ] Table 1 Standardized Limit Variances of Level: +(1 − λ )y − µˆ } LFS Rotation Pattern B, t t − 1 ρ λ 0.70 0.80 0.90 0.95 0.98 =[g −1 θ − 1 λ 2 (1 − ρ 2 ) + (1 − λ ) 2 0.833 0.897 0.840 0.743 0.665 0.600 0.840 0.895 0.836 0.734 0.650 0.581 2− 1 0.860 0.894 0.830 0.714 0.614 0.527 +( ρλ − 1)qt ] V { y B, t }. (4.1) 0.880 0.903 0.835 0.705 0.588 0.481 0.900 0.921 0.851 0.711 0.575 0.444 0.920 0.951 0.882 0.736 0.582 0.420 Table 2 contains standardized limit variances of the 0.940 0.992 0.928 0.785 0.617 0.420 estimated change, µˆt − µ ˆ t−1 , for selected values of g and λ , 0.960 1.046 0.994 0.867 0.698 0.465 with gθ = 5. The entries in the table are the limiting 0.980 1.115 1.083 0.997 0.861 0.619 variances of estimated change divided by the variance of 0.990 1.155 1.138 1.087 0.998 0.803 change based on the common elements, V{}. y− y 0.995 1.177 0.168 1.140 1.089 0.960 m, t m , t − 1 The variance of change based on the common elements is − 1 2θ (1 − θ )(1 − ρ )V { y B, t }. The optimal λ is a function of ρ and increases slowly as ρ increases. For ρ = 0.0, the optimal λ is 0.833, for ρ = 0.7 the optimal λ is about 0.85, for ρ = 0.95 the Table 2 Standardized Limit Variances of No­Revision optimal λ is about 0.91 and for ρ = 0.98 the optimal λ is One Period Change: LFS Rotation Pattern about 0.93. We now turn to the MR2 estimator (2.3) which can be ρ λ 0.70 0.80 written as 0.90 0.95 0.98 0.833 1.039 1.168 1.550 2.312 4.595 µˆ = λ[y + ( µˆ −y ) b∗ ] + (1 − λ ) y , 0.840 1.024 1.142 1.492 2.189 4.277 t A m, t t− 1 m , t − 1 A B, t 0.860 0.989 1.079 1.345 1.872 3.454 * 0.880 0.963 1.029 1.223 1.607 2.756 where λ A and b are defined in (2.6). While the MR2 estimator is not a member of the class (3.7), to the degree 0.900 0.947 0.993 1.127 1.391 2.181 0.920 0.940 0.970 1.055 1.222 1.723 that b * is “close to” ρ , it is “close to” a member of the 0.940 0.942 0.959 1.007 1.100 1.379 class. For example if ρ = 0.95, then b 0 B 0.9314 and * 0.960 0.953 0.961 0.982 1.024 1.146 b B 0.9422. If ρ = 0.90, then b 0 B 0.8659 and 0.980 0.972 0.975 0.980 0.991 1.021 * b B 0.8853. 0.990 0.985 0.986 0.987 0.990 0.998 Using the limiting value b0 of b, we have (1− λA ) = 0.995 0.992 0.993 0.993 0.994 0.996 (1− θ )(1 − b0 ), where b0 is given by (2.4). Then λA = 0.9375, 0.9568, 0.9776, 0.9886, and 0.9954 for ρ = 0.70, 0.80, 0.90, 0.95 and 0.98, respectively. From Table 1, the Tables 1 and 2 make clear the cost of not revising the standardized variances of µˆ t for these values of λ A are estimate of µˆ t−1 . For example, if ρ = 0.95, the variance of 0.986, 0.982, 0.978, 0.976, and 0.975, for ρ = 0.70, 0.80, no­revision one period change is minimized with λ B 0.99, 0.90, 0.95, and 0.98, respectively. Thus, the MR2 estimator but the variance of level is minimized with λ B 0.91. A for current level has an efficiency for level that is essentially compromise value of λ = 0.95 gives a variance of level that the same as that of the direct estimator, yt . The efficiency is about 14% larger than optimal and a variance of change of the MR1 estimator is that for λ = 0.833 in Table 1 and is that is about 7% larger than the smallest variance of Table 2. always superior to that of yt . Using the values of λ A associated with the MR2 estimator, the entries in Table 2 are 0.940, 0.960, 0.979, 0.989, and 0.996 for ρ = 0.70, 0.80, 0.90, 0.95 and 0.98, 4. Variance of One­Period Change respectively. Thus, given no revision, and ignoring the

difference between b0 and ρ, the MR2 estimator is nearly Given µˆ t−1,,y m , t − 1 y m , t and y B, t the optimal estimator of optimal as an estimator of change, unlike the MR1 µ t−1 is no longer µˆ t−1 because ym , t contains information estimator, where the MR1 estimator corresponds to λ = about µ t−1 . However, it is not customary practice to revise 0.833 in Table 2. the estimator of µ t−1 . Given no revision, the estimator of change is µˆt − µ ˆ t−1 .

Statistics Canada, Catalogue No. 12­001 50 Fuller and Rao: A Regression Composite Estimator with Application to the Canadian Labour Force Survey

5. A Compromise Estimator potential biases associated with the composite estimator should be smaller with the smaller α. On the basis of Table 2, the efficiency of the MR2 estimator of change for the LFS based on x1 t , for the no­ Table 3 revision case, is quite good. The MR1 no­revision estimator Approximate Efficiencies of Compromise Estimators Relative to α = 1 of change based on x2 t has relatively poor efficiency because it is a member of the class (3.7) with λ = θ = α = 0.75 α = 0.65 0.8333. On the other hand, the MR1 estimator of level based ρ b0 1 − λ A µˆ t µˆt − µ ˆ t−1 µˆ t µˆt − µ ˆ t−1 on x2 t is superior to the MR2 estimator based on x1 t , and 0.70 0.625 0.0625 1.052 0.999 1.069 0.995 there are members of the class (3.7) that are much superior 0.80 0.741 0.0432 1.099 0.994 1.129 0.934 to the MR2 estimator of level. 0.90 0.865 0.0224 1.238 0.975 1.303 0.946 Because the λ in the MR2 estimator is relatively large 0.95 0.931 0.0114 1.502 0.936 1.616 0.875 and the λ for the MR1 estimator is relatively small, we can 0.98 0.972 0.0046 2.177 0.833 2.321 0.712 create approximations to most interesting members of the class (3.7) as linear combinations of (2.10) and (2.5). Let 6. Drift Problem

x3,ti= α x 1, ti +(1 − α ) x2, ti , (5.1) As noted in the Introduction, the MR2 estimator could where 0≤ α ≤ 1 is a fixed number. The regression estimator deviate from the direct estimator by a substantial amount based on x3 ti gives an approximation to a member of the and this deviation could extend over a long period. We now class (3.7) with illustrate the basis for this phenomenon. We can express the

deviation of the compromise regression estimator µˆ t , based λ = αλA +(1 − α ) θ , (5.2) on x3 ti , from the true mean µ t as where λ A is defined in (2.6). Thus, if ρ = 0.95, t −1 t j λ = α(0.9886) + (1 − α )(5/ 6), µ−µ=λρˆ t t ( ) ( µ−µˆ 0 0 ) +∑ ( λρ ) j = 0 for the LFS rotation pattern with θ = 5/ 6 and [λrm, t− j + (1 − λ )(y B, t− j − µ t − j )], (6.1) −1 b0 =(7 − 2 ρ ) 5 ρ ; λ = 0.95 if α = 0.75. We choose α to give the desired combination of y B, t where µ 0 is the mean at the initiation of the process and and the “regression estimator” based on observations in set

A. If one does not revise the estimator of µ t−1 , the preferred rmt= y mt − µ t − ρ( y m, t− 1 − µ t − 1 ). combination depends on the relative importance assigned to the variance of level and to the variance of change. If ρ is close to one and we use λ = 1, then the error Table 3 gives the variance of the MR2 estimator ( α = 1) µˆ t − µ t behaves roughly as a random walk which can lead relative to the variance of the estimator constructed using to long periods in which µˆ t − µ t has the same sign. On the α = 0.75 and the variance of the estimator constructed other hand, if α < 1 and ρ = 1, then λ < 1 and the error using α = 0.65. An entry in Table 3 for µˆ is expression t µˆ t − µ t exhibits less drift. For example, if α = 0.70, the (3.10) evaluated at λ of (2.6) and ρ , divided by (3.10) A correlation between adjacent errors µˆ t − µ t , will be no evaluated at λ of (5.2) and ρ. An entry for µˆt − µ ˆ t−1 is greater than 0.95 under assumption (3.2)­(3.5). For the MR2 expression (4.1) evaluated at λ A of (2.6) and ρ , divided by estimator, λ → 1 as ρ → 1 and hence the MR2 estimator (4.1) evaluated at λ of (5.2) and ρ. These are can exhibit drift for ρ close to one. approximations to actual efficiencies because ρ is used for the coefficient of x3 . It is clear from Table 3 that the compromise estimator is slightly inferior to the MR2 7. Concluding Remarks estimator for one­period change, but is much superior to the MR2 estimator for level. For example, with ρ = 0.95 and For simplicity, we often assumed simple random α = 0.65, the relative efficiency of the compromise sampling to obtain theoretical results. Similar results hold estimator is 1.62 for level and 0.87 for one­period change. for complex designs and additional auxiliary variables, by For larger values of ρ , the variance of change is much considering ρ to be a partial autocorrelation. Also, we used smaller than the variance of level. Thus, for ρ = 0.95, the x3 ­variables corresponding to only one variable y, but variance of level and of change for α = 1.00 are about 1.00 several y­variables can be used in constructing the corres­ and 0.12, respectively, while the variance of level and of ponding x­variables for use in regression estimation. change for α = 0.75 are about 0.67 and 0.13, respectively, Gambino, Kennedy and Singh (2001) conducted an when expressed in common units. empirical study with LFS data using several x3 ­variables The smaller α has the advantage that the composite with a common α , and arrived at a compromise α for use estimator will be closer to the direct estimator. Thus, in the LFS.

Statistics Canada, Catalogue No. 12­001 Survey Methodology, June 2001 51

In section 2.1, we assumed no nonresponse so that Hansen, M.H., Hurwitz, W.N., Nisselson H. and Steinberg, J. (1955). imputation is not required. But in the LFS, nonresponse on The redesign of the census current population survey. Journal of the American Statistical Association, 50, 701­719. an item y may occur either at time t − 1 or a time t or at both time points. Gambino, Kennedy and Singh (2001) provide Jessen, R.J. (1942). Statistical investigation of a sample survey for details of the imputation methods actually used in the LFS. obtaining farm facts. Iowa Agricultural Experiment Station Research Bulletin, 304, 54­59. Jones, R.G. (1980). Best linear unbiased estimators for repeated Acknowledgements surveys. Journal of the Royal Statistical Society, B, 42, 221­226. Kumar, S., and Lee, H. (1983). Evaluation of composite estimation The research of Wayne Fuller was partly supported by for the Canadian Labour Force Survey. Survey Methodology, 9, Cooperative Agreement 43­3AEU­3­80088 between Iowa 178­201. State University, the National Agricultural Statistics Lent, J., Miller, S.M., Cantwell, P.J. and Duff, M. (1999). Effect of Service and the U.S. Bureau of the Census. We thank composite weights on some estimates from the Current Population Survey. Journal of Official Statistics, 14, 431­448. Harold Mantel for a careful reading of the manuscript that led to improvements. Patterson, H.D. (1950). Sampling on successive occasions with partial replacement of units. Journal of the Royal Statistical Society, B, 12, 241­255. References Pfeffermann, D. (1991). Estimation and seasonal adjustment of population means using data from repeated surveys. Journal of Bell, P. (2001). Comparisons of alternative Labour Force Survey Business and Economic Statistics, 9, 163­175. estimators. Survey Methodology, 27, 53­63. Rao, J.N.K., and Graham, J.E. (1964). Rotation designs for sampling Bell, W.R, and Hillmer, S.C. (1990). The time series approach to on repeated occasions. Journal of the American Statistical estimation for periodic surveys. Survey Methodology, 16, 195­215. Association, 59, 492­509. Binder, D.A., and Dick, J.P. (1989). Modeling and estimation for Scott, A.J., Smith, T.M.F. and Jones, R.G. (1977). The application of repeated surveys. Survey Methodology, 15, 29­45. time series methods to the analysis of repeated surveys. International Statistical Review, 45, 13­28. Breau, P., and Ernst, L. (1983). Alternative estimators to the current composite estimator. Proceedings of the Section on Survey Singh, A.C. (1996). Combining information in survey sampling by Research Methods, American Statistical Association, 397­402. modified regression. Proceedings of the Section on Survey Research Methods, American Statistical Association, 120­129. Cochran, W.G. (1977). Sampling Techniques. 3 rd Ed. New York: John Wiley & Sons, Inc. Singh, M.P., Drew, J.D., Gambino, J. and Mayda, F. (1990). Methodology of the Canadian Labour Force Survey. Catalogue Eckler, A.R. (1955). Rotation sampling. Annals of Mathematical No. 71­526, Statistics Canada. Statistics, 26, 664­180. Tiller, R. (1989). A Kalman filter approach to labor force estimation Fuller, W.A. (1990). Analysis of repeated surveys. Survey using survey data. Proceedings of the Section on Survey Research Methodology, 16, 167­180. Methods, American Statistical Association, 16­25. Gambino, J.G., Kennedy, B. and Singh, M.P. (2001). Regression Wolter, K.M. (1979). Composite estimation in finite populations. composite estimatiuon for the Canadian labour force survey: Journal of the American Statistical Association, 74, 604­613. Evaluation and implementation. Survey Methodology, 27, 65­74. Yansaneh, I.S., and Fuller, W.A. (1998). Optimal recursive estimation Gambino, J.G., Singh, M.P., Dufour, J., Kennedy, B. and Lindeyer, J. for repeated surveys. Survey Methodology, 24, 31­40. (1998). Methodology of the Canadian Labour Force Survey. Catalogue no. 71­526, Statistics Canada. Gurney, M., and Daly, J.F. (1965). A multivariate approach to estimation in periodic sample surveys. Proceedings of the American Statistical Association, Section on Social Statistics, 242­ 257.

Statistics Canada, Catalogue No. 12­001