
Catalogue no. 12-001-XIE Survey Methodology

December 2007

How to obtain more information

Specific inquiries about this product and related statistics or services should be directed to: Business Survey Methods Division, Statistics Canada, Ottawa, Ontario, K1A 0T6 (telephone: 1-800-263-1136).

For information on the wide range of data available from Statistics Canada, you can contact us by calling one of our toll-free numbers. You can also contact us by e-mail or by visiting our website at www.statcan.ca.

National inquiries line: 1-800-263-1136
National telecommunications device for the hearing impaired: 1-800-363-7629
Depository Services Program inquiries: 1-800-700-1033
Fax line for Depository Services Program: 1-800-889-9734
E-mail inquiries: [email protected]
Website: www.statcan.ca

Accessing and ordering information

This product, catalogue no. 12-001-XIE, is available for free in electronic format. To obtain a single issue, visit our website at www.statcan.ca and select Publications.

This product, catalogue no. 12-001-XPB, is also available as a standard printed publication at a price of CAN$30.00 per issue and CAN$58.00 for a one-year subscription.

The following additional shipping charges apply for delivery outside Canada:

                  Single issue   Annual subscription
United States     CAN$6.00       CAN$12.00
Other countries   CAN$10.00      CAN$20.00

All prices exclude sales taxes.

The printed version of this publication can be ordered by

• Phone (Canada and United States): 1-800-267-6677
• Fax (Canada and United States): 1-877-287-4369
• E-mail: [email protected]
• Mail: Statistics Canada, Finance Division, R.H. Coats Bldg., 6th Floor, 100 Tunney's Pasture Driveway, Ottawa (Ontario) K1A 0T6
• In person from authorised agents and bookstores.

When notifying us of a change in your address, please provide both old and new addresses.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients. To obtain a copy of these service standards, please contact Statistics Canada toll free at 1-800-263-1136. The service standards are also published on www.statcan.ca under About us > Providing services to Canadians.

Statistics Canada
Business Survey Methods Division

Survey Methodology

December 2007

Published by authority of the Minister responsible for Statistics Canada

© Minister of Industry, 2008

All rights reserved. The content of this electronic publication may be reproduced, in whole or in part, and by any means, without further permission from Statistics Canada, subject to the following conditions: that it be done solely for the purposes of private study, research, criticism, review or newspaper summary, and/or for non-commercial purposes; and that Statistics Canada be fully acknowledged as follows: Source (or “Adapted from”, if appropriate): Statistics Canada, year of publication, name of product, catalogue number, volume and issue numbers, reference period and page(s). Otherwise, no part of this publication may be reproduced, stored in a retrieval system or transmitted in any form, by any means—electronic, mechanical or photocopy—or for any purposes without prior written permission of Licensing Services, Client Services Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6.

January 2008

Catalogue no. 12-001-XIE ISSN 1492-0921

Catalogue no. 12-001-XPB ISSN 0714-0045

Frequency: semi-annual

Ottawa

This publication is available in French upon request (catalogue no. 12-001-XIF).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued cooperation and goodwill.

Survey Methodology
A Journal Published by Statistics Canada

Volume 33, Number 2, December 2007

Contents

In This Issue...... 95

Waksberg Invited Paper Series

Carl-Erik Särndal The calibration approach in survey theory and practice ...... 99

Regular Papers

Seppo Laaksonen Weighting for two-phase surveyed data ...... 121

Pascal Ardilly and Pierre Lavallée Weighting in rotating samples: The SILC survey in France ...... 131

Jay J. Kim, Jianzhu Li and Richard Valliant Cell collapsing in poststratification...... 139

Fulvia Mecatti A single frame multiplicity estimator for multiple frame surveys ...... 151

David Haziza Variance estimation for a ratio in the presence of imputed data ...... 159

James Chipperfield and John Preston Efficient bootstrap for business surveys ...... 167

Jacob J. Oleson, Chong Z. He, Dongchu Sun and Steven L. Sheriff Bayesian estimation in small areas when the sampling design strata differ from the study domains ...... 173

Enrico Fabrizi, Maria Rosaria Ferrante and Silvia Pacei Small area estimation of average household income based on unit level models for panel data...... 187

Anne Renaud Estimation of the coverage of the 2000 census of population in Switzerland: Methods and results...... 199

Marc N. Elliott and Amelia Haviland Use of a web-based convenience sample to supplement a probability sample...... 211

Acknowledgements...... 217

Statistics Canada, Catalogue No. 12-001-X

The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 95-96

In This Issue

This issue of Survey Methodology opens with the seventh paper in the annual invited paper series in honour of Joseph Waksberg. The editorial board would like to thank the members of the selection committee – Gordon Brackstone, chair, Bob Groves, Sharon Lohr and Wayne Fuller – for having selected Carl-Erik Särndal as the author of this year's Waksberg Award paper. For this occasion, a special Workshop on Calibration and Estimation in Surveys (WCES) was organised on October 31 and November 1 at Statistics Canada. Professor Carl-Erik Särndal was the keynote speaker and presented his Waksberg paper. During the two days, 12 other speakers presented a paper and paid tribute to Carl-Erik Särndal.

In his paper, entitled "The Calibration Approach in Survey Theory and Practice", Särndal discusses the development and application of calibration in survey sampling. He describes the concept of calibration in some detail and contrasts it with generalized regression. He then describes different approaches to calibration, including the minimum distance method, instrumental variables, and model calibration. Several examples of calibration and alternatives are considered.

Laaksonen discusses weighting in two-phase sampling in which respondents to the first phase are asked if they are willing to participate in the second phase. The weighting thus has to deal with non-response at both phases of the survey, and also account for first-phase respondents who were unwilling to participate in the second phase. Using data from a Finnish survey on leisure-time activities, he empirically evaluates variations on a weighting method that uses response propensity modeling and calibration.

The article by Ardilly and Lavallée discusses the weighting problem for the SILC (Statistics on Income and Living Conditions) survey in France. This survey uses a rotating sample plan with nine panels. To obtain approximately unbiased estimators, the authors relied on the weight-share method. Longitudinal weighting is discussed first, followed by cross-sectional weighting.

The paper by Kim, Li and Valliant deals with the problem of small cells or large weight adjustments when poststratification is used. The authors first describe several standard estimators and then introduce two alternative estimators based on cell collapsing. They study the performance of these estimators in terms of their effectiveness in controlling the coverage bias and the design variance. These properties are evaluated theoretically and also through a simulation study using a population based on the 2003 National Health Interview Survey.

Mecatti proposes a simple multiplicity estimator in the context of multi-frame surveys. She first shows that the proposed estimator is design-unbiased. Then, she proposes an unbiased estimator of the variance of the multiplicity estimator. Using 29 simulated populations, she compares the multiplicity estimator with alternative estimators proposed in the literature.

Haziza studies the problem of variance estimation for a ratio of two totals when marginal random hot deck imputation has been used to fill in missing data. Two approaches to inference are considered, one using an imputation model and a second one using a nonresponse model. Variance estimators are derived under two frameworks: the reverse approach of Shao and Steel (1999) and the traditional two-phase approach.

In their paper, Chipperfield and Preston describe the without-replacement scaled bootstrap variance estimator that was implemented in the Australian Bureau of Statistics' generalized estimation system ABSEST. The without-replacement scaled bootstrap estimator is shown to be more efficient than the with-replacement scaled bootstrap estimator for stratified samples when the stratum sizes are small. In addition, the without-replacement scaled bootstrap estimator is shown to require fewer replicates to achieve the same replication error as the with-replacement estimator. For the ABSEST system, bootstrap variance estimators were chosen over other variance estimation methods for their computational efficiency, and the without-replacement bootstrap was selected for the reasons above.

Oleson, He and Sun describe a Bayesian modelling approach for situations where the sampling design is stratified and the estimation procedure requires post-stratification. The method is illustrated with data from the 1998 Missouri Turkey Hunting Survey, for which the strata were defined by the hunter's place of residence but estimates were required at the county level.

Fabrizi, Ferrante and Pacei discuss a methodology which is increasingly important in modern sample survey applications. They investigate the effect of borrowing strength from additional panel information for cross-sectional household income estimates for small areas in Italy. The proposed methods seem to tackle a problem which may have further relevance for European official statistics, and possibly also in the area of small area statistics for indicators which may be used for policy research.

Renaud presents an interesting application of a post-enumeration survey to estimate net undercoverage in the 2000 census in Switzerland. The objective of this survey was slightly different from that of other countries, in that it was not designed to adjust the census counts for net undercoverage, but rather to gather information to improve the quality of subsequent censuses.

In the final paper, Elliott and Haviland consider combining a convenience sample with a probability-based sample to obtain an estimate with a smaller MSE. The resulting estimator is a linear combination of the convenience and probability sample estimates, with weights that are a function of the bias. By looking at the maximum incremental contribution of the convenience sample, they show that improvement to the MSE may be attainable only in certain circumstances.

Harold Mantel, Deputy Editor

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 97-98

Waksberg Invited Paper Series

The journal Survey Methodology has established an annual invited paper series in honour of Joseph Waksberg, who has made many important contributions to survey methodology. Each year a prominent survey researcher is chosen to author an article as part of the Waksberg Invited Paper Series. The paper reviews the development and current state of a significant topic within the field of survey methodology, and reflects the mixture of theory and practice that characterized Waksberg’s work. The author receives a cash award made possible by a grant from Westat, in recognition of Joe Waksberg’s contributions during his many years of association with Westat. The grant is administered financially by the American Statistical Association. Previous winners are listed below. Their papers in the series have already appeared in Survey Methodology.

Previous Waksberg Award Winners:

Gad Nathan (2001)
Wayne A. Fuller (2002)
Tim Holt (2003)
Norman Bradburn (2004)
J.N.K. Rao (2005)
Alastair Scott (2006)
Carl-Erik Särndal (2007)

Nominations:

The author of the 2009 Waksberg paper will be selected by a four-person committee appointed by Survey Methodology and the American Statistical Association. Nominations of individuals to be considered as authors or suggestions for topics should be sent to the chair of the committee, Robert Groves, by email to [email protected]. Nominations and suggestions for topics must be received by February 29, 2008.

2007 Waksberg Invited Paper

Author: Carl-Erik Särndal

Carl-Erik Särndal, retired professor at the Université de Montréal, is a consultant and expert who has been associated with several national statistical institutes, in particular Statistics Canada and Statistics Sweden, as well as Statistics Finland, INSEE and Eurostat. His list of publications comprises three books, including the renowned Model Assisted Survey Sampling, which has had a major impact. He is also the author of numerous scientific articles, in sole authorship or in collaboration with researchers from many countries. His research interests in survey sampling have been very diversified, but most often revolve around ways to best use auxiliary information in sampling and estimation.

Members of the Waksberg Paper Selection Committee (2007-2008)

Robert Groves (Chair)
Wayne A. Fuller, Iowa State University
Daniel Kasprzyk, Mathematica Policy Research
Leyla Mohadjer, Westat

Past Chairs:

Graham Kalton (1999-2001)
Chris Skinner (2001-2002)
David A. Binder (2002-2003)
J. Michael Brick (2003-2004)
David R. Bellhouse (2004-2005)
Gordon Brackstone (2005-2006)
Sharon Lohr (2006-2007)

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 99-119

The calibration approach in survey theory and practice

Carl-Erik Särndal 1

Abstract

Calibration is the principal theme in many recent articles on estimation in survey sampling. Terms such as "calibration approach" and "calibration estimators" are frequently used. As article authors like to point out, calibration provides a systematic way to incorporate auxiliary information in the procedure. Calibration has established itself as an important methodological instrument in large-scale production of statistics. Several national statistical agencies have developed software designed to compute weights, usually calibrated to auxiliary information available in administrative registers and other accurate sources. This paper presents a review of the calibration approach, with an emphasis on progress achieved in the past decade or so. The literature on calibration is growing rapidly; selected issues are discussed in this paper. The paper starts with a definition of the calibration approach. Its important features are reviewed. The calibration approach is contrasted with (generalized) regression estimation, which is an alternative but conceptually different way to take auxiliary information into account. The computational aspects of calibration are discussed, including methods for avoiding extreme weights. In the early sections of the paper, simple applications of calibration are examined: the estimation of a population total in direct, single-phase sampling. Generalizations to more complex parameters and more complex sampling designs are then considered. A common feature of more complex designs (sampling in two or more phases or stages) is that the available auxiliary information may consist of several components or layers. The uses of calibration in such cases of composite information are reviewed. Later in the paper, examples are given to illustrate how the results of the calibration thinking may contrast with answers given by earlier established approaches. Finally, applications of calibration in the presence of nonsampling error are discussed, in particular methods for nonresponse bias adjustment.

Key Words: Auxiliary information; Weighting; Consistency; Design-based inference; Regression estimator; Models; Nonresponse; Complex sampling design.

1. Introduction

1.1 Calibration defined

It is useful in this paper to refer to a definition of the calibration approach. I propose the following formulation.

Definition. The calibration approach to estimation for finite populations consists of

(a) a computation of weights that incorporate specified auxiliary information and are restrained by calibration equation(s),

(b) the use of these weights to compute linearly weighted estimates of totals and other finite population parameters: weight times variable value, summed over a set of observed units,

(c) an objective to obtain nearly design unbiased estimates as long as nonresponse and other non-sampling errors are absent.

In the literature, "calibration" frequently refers to (a) alone; I shall often use the term for (a) to (c) together. Earlier definitions, although less extensive, agree essentially with mine. Ardilly (2006) defines calibration (or, more precisely, "calage généralisé") as a method of re-weighting used when one has access to several variables, qualitative or quantitative, on which one wishes to carry out, jointly, an adjustment. Kott (2006) defines calibration weights as a set of weights, for units in the sample, that satisfy a calibration to known population totals, and such that the resulting estimator is randomization consistent (design consistent), or, more rigorously, that the design bias is, under mild conditions, an asymptotically insignificant contribution to the estimator's mean squared error. This is the property I call "nearly design unbiased". The Quality Guidelines (fourth edition) of Statistics Canada (2003) say: "Calibration is a procedure that can be used to incorporate auxiliary data. This procedure adjusts the sampling weights by multipliers known as calibration factors that make the estimates agree with known totals. The resulting weights are called calibration weights or final estimation weights. These calibration weights will generally result in estimates that are design consistent, and that have a smaller variance than the Horvitz-Thompson estimator."

Part (c) of the definition merits a comment. Nothing prevents producing weights calibrated to given auxiliary information without requiring (c). But most published work on calibration is in the spirit of (c), so it makes good sense to include it. When non-sampling errors are present, bias in the estimates is unavoidable, whether they are made by calibration or by any other method. In line with (c), I

1. Carl-Erik Särndal, 2115 Erinbrook Crescent, #44, Ottawa, Ontario, K1B 4J5, Canada. E-mail: [email protected].

consider design-based inference to be the standard in this paper. The randomization-based variance of an estimator is thus important. However, the paper focuses on "motivations behind (point) estimation"; for reasons of space, the important question of variance estimation is not addressed.

1.2 Comments arising

The definition in Section 1.1 prompts some comments and references to earlier literature:

(1) Calibration as a linear weighting method. Calibration has an intimate link to practice. The fixation on weighting methods on the part of the leading national statistical agencies is a powerful driving force behind calibration. To assign an appropriate weight to an observed variable value, and to sum the weighted variable values to form appropriate aggregates, is a firmly rooted procedure. It is used in statistical agencies for estimating various descriptive finite population parameters: totals, means, and functions of totals. Weighting is easy to explain to users and other stakeholders of the statistical agencies.

Weighting of units by the inverse of their inclusion probability found firm scientific backing long ago in papers such as Hansen and Hurwitz (1943) and Horvitz and Thompson (1952). Weighting became widely accepted. Later, post-stratification weighting achieved the same status. Calibration weighting extends both of these ideas. Calibration weighting is outcome dependent; the weights depend on the observed sample.

Inverse inclusion probability weights are, by definition, greater than or equal to unity. A commonly heard interpretation is that "an observed unit represents itself and a number of others, not observed". Calibrated weights, on the other hand, are not necessarily greater than or equal to unity, unless special care is taken in the computation to obtain this property.

Calibration is new as a term in survey sampling – about 15 years old – but not as a technique for producing weights. Those who maintain "I practiced calibration long before it was called calibration" have a point. The last 15 years widened the scope and the appeal of the technique. Weighting akin to calibration has long been used by private survey institutes, for example in connection with quota sampling, a form of non-probability sampling outside the scope of this paper.

Weighting of observed variable values was an important topic before calibration became a popular term. Some authors derived the weights via the argument that they should differ as little as possible from the unbiased sampling design weights (the inverses of the inclusion probabilities). Others found the weights by recognizing that a linear regression estimator can be written as a linearly weighted sum of the observed study variable values. Terms such as "survey sample weighting", "regression weighting" and "case weighting" are used. Among such "early papers" are Alexander (1987), Bankier, Rathwell and Majkowski (1992), Bethlehem and Keller (1987), Chambers (1996), Fuller, Loughin and Baker (1994), Kalton and Flores-Cervantes (1998), Lemaître and Dufour (1987), Särndal (1982) and Zieschang (1990). I comment later on the technique "repeated weighting", promoted by the Dutch national statistical agency, CBS. The newer term "calibration" conveys a more specific message and a more definite direction than the older "weighting".

(2) Calibration as a systematic way to use auxiliary information. Calibration provides a systematic way to take auxiliary information into account. As Rueda, Martínez, Martínez and Arcos (2007) point out, "in many standard settings, the calibration provides a simple and practical approach to incorporating auxiliary information into the estimation".

Auxiliary information was used to improve the accuracy of survey estimates long before calibration became popular. Numerous papers were written with this goal in mind, for more or less specialized situations. Today, calibration does offer a systematic outlook on the uses of auxiliary information. For example, calibration can deal effectively with surveys where auxiliary information exists at different levels. In two-stage sampling, information may exist for the first-stage sampling units (the clusters), and other information for the second-stage sampling units. In surveys with nonresponse (that is, essentially all surveys), information may exist "at the population level" (known population totals), and other information "at the sample level" (auxiliary variable values for all those sampled, responding and non-responding). Calibration with "composite information" is reviewed in Sections 8 and 9.

Regression estimation, or generalized regression (GREG) estimation, competes with calibration as a systematic way of incorporating auxiliary information. It is therefore important to contrast GREG estimation (described in Section 3) with calibration estimation (described in Section 4). The two approaches are different.

(3) Calibration to achieve consistency. Calibration is often described as "a way to get consistent estimates". (Here "consistent" refers not to "randomization consistent" but to "consistent with known aggregates".) The calibration equations impose consistency on the weight system, so that, when applied to the auxiliary variables, it will confirm (be consistent with) known aggregates for those same auxiliary variables. A desire to promote credibility in published statistics is an often cited reason for demanding consistency. Some users of statistics dislike finding the same population quantity estimated by two or more numbers that do not all agree.

The totals with which consistency is sought are sometimes called control totals. "Controlled weights" or "calibrated weights" suggest improved, more accurate estimation. The French term for calibration, "calage", has a similar connotation of "stability".

Consistency through calibration has a broader implication than just agreement with known population auxiliary totals. Consistency can, for example, be sought with appropriately estimated totals, arising in the current survey or in other surveys.

Consistency among tables estimated from different surveys is the motive behind repeated weighting, the technique developed at the Dutch national statistical agency CBS in several articles: Renssen and Nieuwenbroek (1997); Nieuwenbroek, Renssen and Hofman (2000); Renssen, Kroese and Willeboordse (2001); Knottnerus and van Duin (2006). The stated objective is to accommodate user demands to produce numerically consistent outputs. As the last mentioned paper points out, repeated weighting can be seen as an additional calibration step for a new adjustment of already calibrated weights. The final weights realize consistency with given margins.

Consistency with known or estimated totals may bring the extra benefit of improved accuracy (lower variance and/or reduced nonresponse bias). However, in some articles, especially those authored in statistical agencies, consistency for user satisfaction seems a more imperative motivation than the prospect of increased accuracy.

When the primary motivation for calibration is not so much agreement with other statistics as reduction of variance and/or nonresponse bias, then "balanced weight system" is a more appropriate description than "consistent weight system", because the objective is then to balance the weights to reflect the outcome of the sampling, the response to the survey, and the information available.

(4) Calibration for convenience and transparency. As Harms and Duchesne (2006) point out, "The calibration approach has gained popularity in real applications because the resulting estimates are easy to interpret and to motivate, relying, as they do, on design weights and natural calibration constraints." Calibration on known totals strikes the typical user as transparent and natural. Users who understand sample weighting appreciate that calibration leaves the design weights "slightly modified only", while respecting the controls. The unbiasedness is only negligibly disturbed. The simpler forms of calibration invoke no assumptions, only "natural constraints". Yet another advantage is appreciated by users: in many applications, calibration gives a unique weighting system, applicable to all study variables, of which there are usually many in large government surveys.

(5) Calibration in combination with other terms. Some authors use the word "calibration" in combination with other terms, to describe various directions of thought. Examples of this proliferation of terms are: model calibration (Wu and Sitter 2001); g-calibration (Vanderhoeft, Waeytens and Museux 2000); harmonized calibration (Webber, Latouche and Rancourt 2000); higher level calibration (Singh, Horn and Yu 1998); regression calibration (Demnati and Rao 2004); non-linear calibration (Plikusas 2006); super generalized calibration ("calage super généralisé"; Ardilly 2006); the neural network model-calibration estimator and the local polynomial model-calibration estimator (Montanari and Ranalli 2003, 2005); the model-calibrated pseudo empirical maximum likelihood estimator (Wu 2003); and yet others. Also, calibration plays a significant role in the indirect sampling methods proposed in Lavallée (2006). In a somewhat different spirit, not reviewed here, are concepts such as calibrated imputation (Beaumont 2005a) and bias calibration (Chambers, Dorfman and Wehrly 1993; Zheng and Little 2003). The following review pages do not do justice to all the innovations within the sphere of calibration, but the names alone do suggest directions that have been explored.

(6) Calibration as a new direction for thought. If calibration represents "a new approach" with clear differences compared with predecessors, we must examine such questions as: Does calibration generalize earlier theories or approaches? Does calibration give better, more satisfactory answers on questions of importance, as compared with earlier recognized approaches? Sections 4.5 and 7.1 in this paper illustrate how the answers provided by calibration compare with, or contrast with, those obtained in earlier modes of reasoning.

The practice of survey sampling encounters "nuisances" such as nonresponse, frame deficiencies and measurement errors. It is true that imputation and reweighting for nonresponse are widely practiced, through a host of techniques. But they are somehow "separate issues", still waiting to be more fully embedded into a comprehensive, more satisfactory theory of inference in sample surveys. Many theory papers deal with estimation for an imagined ideal survey, nonexistent in practice, where nonresponse and other non-sampling errors are absent. This is not a criticism of the many excellent but idealized theory papers. The foundations need to be explored, too.

Sections 9 and 10 indicate that calibration can provide a more systematic outlook on inference in surveys even in the presence of the various non-sampling errors. Future fruitful developments are expected in that regard.
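Parts (a) and (b) of the definition in Section 1.1 can be sketched numerically. The following illustration (not from the paper; all numbers are invented) uses the common linear, chi-square-distance calibration, in which the calibrated weights take the form w_k = d_k(1 + x_k'λ), with λ solving the calibration equation Σ_s w_k x_k = X for a known total vector X:

```python
import numpy as np

# Invented data: design weights d_k = 1/pi_k for n = 5 sampled units,
# auxiliary vectors x_k = (1, x_k)', and known population totals
# X = (N, sum over U of x_k)'.
d = np.array([10.0, 10.0, 20.0, 20.0, 40.0])           # design weights
x = np.column_stack([np.ones(5), [2.0, 4.0, 5.0, 7.0, 9.0]])
X = np.array([120.0, 640.0])                           # known control totals

# Linear (chi-square distance) calibration: w_k = d_k (1 + x_k' lam),
# where lam solves the calibration equation  sum_s w_k x_k = X.
T = (d[:, None] * x).T @ x                             # sum_s d_k x_k x_k'
lam = np.linalg.solve(T, X - x.T @ d)                  # x.T @ d is the HT estimate of X
w = d * (1.0 + x @ lam)                                # calibrated weights

# (b): a linearly weighted estimate of the total of a study variable y.
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])
Y_cal = w @ y

print(w @ x)   # approximately (120.0, 640.0): the calibration equation holds
```

The weighted auxiliary totals reproduce X, the consistency property discussed under (3), while each w_k stays close to its design weight d_k, in keeping with the "slightly modified only" view under (4).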

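The "consistency with given margins" that repeated weighting aims at, as discussed under (3) above, can be illustrated with classical raking ratio estimation (iterative proportional fitting), a nonlinear calibration of a design-weighted two-way table to known margins. The table and margins below are invented:

```python
import numpy as np

# Invented example: design-weighted counts for two classifications
# (rows: two sex classes, columns: three age groups), plus known margins.
counts = np.array([[30.0, 50.0, 20.0],
                   [40.0, 40.0, 40.0]])      # sum_s d_k per cell
row_margins = np.array([110.0, 130.0])       # known totals per row class
col_margins = np.array([80.0, 100.0, 60.0])  # known totals per column class

# Raking: alternately scale rows and columns until the weighted table
# is consistent with both sets of known margins.
g = np.ones_like(counts)                     # cell-level weight adjustments
for _ in range(100):
    table = counts * g
    g *= (row_margins / table.sum(axis=1))[:, None]
    table = counts * g
    g *= col_margins / table.sum(axis=0)

table = counts * g
print(table.sum(axis=1), table.sum(axis=0))  # both agree with the known margins
```

This corresponds to auxiliary-vector case (4) of Section 2: two classifications calibrated "to the margins" rather than to every cell count.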

2. Basic conditions for design-based estimation in sample surveys

This section sets the background for Sections 3 to 7. By "basic conditions" I will mean single phase probability sampling of elements and full response. In practice, survey conditions are not that simple and perfect, but many theory papers nevertheless address this situation.

A probability sample $s$ is drawn from the finite population $U = \{1, 2, \ldots, k, \ldots, N\}$. The probability sampling design generates for element $k$ a known inclusion probability $\pi_k > 0$, and a corresponding sampling design weight $d_k = 1/\pi_k$. The value $y_k$ of the study variable $y$ is recorded for all $k \in s$ (complete response). The objective is to estimate a population total $Y = \sum_U y_k$ with the use of auxiliary information. The study variable $y$ may be continuous or, as in many government surveys, categorical. For example, if $y$ is dichotomous with value $y_k = 0$ or $y_k = 1$ according as person $k$ is employed or unemployed, then the parameter $Y = \sum_U y_k$ to be estimated is the population count of unemployed people. (If $A \subseteq U$ is a set of elements, I write $\sum_A$ for $\sum_{k \in A}$.) The basic design unbiased estimator of $Y$ is $\hat Y_{\mathrm{HT}} = \sum_s d_k y_k$, the Horvitz-Thompson estimator. It is, however, inefficient when powerful auxiliary information is available for use at the estimation phase.

The general notation for the auxiliary vector will be $\mathbf{x}_k$. In some countries, for some surveys, the sources of auxiliary data permit extensive vectors $\mathbf{x}_k$ to be built. But some examples of simple vectors are: (1) $\mathbf{x}_k = (1, x_k)'$, where $x_k$ is the value for element $k$ of a continuous auxiliary variable $x$; (2) the classification vector used to code membership in one of $P$ mutually exclusive and exhaustive groups, $\mathbf{x}_k = \boldsymbol{\gamma}_k = (\gamma_{1k}, \ldots, \gamma_{pk}, \ldots, \gamma_{Pk})'$, so that, for $p = 1, 2, \ldots, P$, $\gamma_{pk} = 1$ if $k$ belongs to group $p$, and $\gamma_{pk} = 0$ if not; (3) the combination of (1) and (2), $\mathbf{x}_k = (\boldsymbol{\gamma}'_k, x_k \boldsymbol{\gamma}'_k)'$; (4) the vector $\mathbf{x}_k$ that codifies two classifications stringed out 'side-by-side', the dimension of $\mathbf{x}_k$ being $P + Q - 1$, where $P$ and $Q$ are the respective number of categories, and the 'minus-one' is to avoid a singular matrix in the computation of weights calibrated "to the margins"; (5) the extension of (4) to more than two 'side-by-side' categorical classifications. Cases 4 and 5 are particularly important for production in national statistical agencies.

In calibration reasoning it is crucially important to specify exactly the auxiliary information. Under the basic conditions we need to distinguish two different cases relative to $\mathbf{x}_k$:

(i) $\mathbf{x}_k$ is a known vector value for every $k \in U$ (complete auxiliary information)

(ii) $\sum_U \mathbf{x}_k$ is a known (imported) total, and $\mathbf{x}_k$ is known (observed) for every $k \in s$

It is often the survey environment that dictates whether (i) or (ii) prevails. Case (i), complete auxiliary information, occurs when $\mathbf{x}_k$ is specified in the sampling frame for every $k \in U$ (and thus known for every $k \in s$). This environment is typical of surveys on individuals and households in Scandinavia and other North European countries equipped with high quality administrative registers that can be matched with the frame to provide a large number of potential auxiliary variables. The population total $\sum_U \mathbf{x}_k$ is obtained simply by adding the $\mathbf{x}_k$.

Case (i) gives considerable freedom in structuring the auxiliary vector $\mathbf{x}_k$. For example, if $x_k$ is a continuous variable value specified for every $k \in U$, then we are invited to consider $x_k^2$ and other functions of $x_k$ for inclusion in $\mathbf{x}_k$, because totals such as $\sum_U x_k^2$ and $\sum_U \log x_k$ are readily computed. If the relationship to the study variable $y$ is curved, it may be a serious omission not to take into account known totals such as the quadratic one or the logarithmic one.

Case (ii) prevails in surveys where (i) is not met, but where $\sum_U \mathbf{x}_k$ is imported from an outside source considered accurate enough, and the individual value $\mathbf{x}_k$ is available (observed in data collection) for every $k \in s$. Then $\sum_U \mathbf{x}_k$ is sometimes called an "independent control total", to mark its origin from outside the survey itself. Case (ii) is less flexible: If $x_k$ is a variable with a total $\sum_U x_k$ imported from a reliable source, then $\sum_U x_k^2$ may be unavailable, barring $x_k^2$ from inclusion into $\mathbf{x}_k$.

3. Generalized regression estimation under the basic conditions

3.1 The GREG concept

Before examining calibration, let us consider generalized regression (GREG) estimation (or just regression estimation), for two good reasons: (1) GREG estimation can also be claimed to be a systematic way to take auxiliary information into account; (2) some (but not all) GREG estimators are calibration estimators, in that they can be expressed in terms of a calibrated linear weighting.

GREG estimators and calibration estimators have been extensively studied in the last two decades. The terms alone, "GREG estimation" and "calibration estimation", reflect a clear difference in thinking. Statisticians who work in the area are of two types: Those dedicated to "GREG thinking" and those dedicated to "calibration thinking". The distinction may not be completely clear-cut, but it helps structuring this review paper, so I will use it. I am not venturing to say that the latter thinking is more prevalent in national statistical agencies and the former more prevalent in the academic circles, but perhaps there is such a tendency.
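The basic conditions above lend themselves to a short numerical sketch. The following is an illustration under assumed settings (a synthetic population and a Poisson design, chosen so the $\pi_k$ are explicit); none of the data or names come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite population U = {1, ..., N} with study values y_k.
N = 1000
y = rng.gamma(2.0, 5.0, size=N)

# Poisson design: inclusion probabilities pi_k proportional to a size
# measure, scaled so that the expected sample size is 100.
size = rng.uniform(1.0, 3.0, size=N)
pi = 100 * size / size.sum()

sample = rng.random(N) < pi           # independent inclusion decisions
d = 1.0 / pi                          # design weights d_k = 1 / pi_k

Y_ht = np.sum(d[sample] * y[sample])  # Horvitz-Thompson estimator
print(Y_ht, y.sum())
```

Averaging `Y_ht` over repeated samples illustrates its design unbiasedness; a single realization can be quite variable, which is the inefficiency the text points to when auxiliary information is ignored.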


The GREG estimator concept evolved gradually since the mid-1970's. The simple (linear) GREG is explained in Särndal, Swensson and Wretman (1992); a thorough review of regression estimation is given in Fuller (2002). The central idea is that predicted $y$-values $\hat y_k$ can be produced for all $N$ population elements, via the fit of an assisting model and the use of the auxiliary vector values $\mathbf{x}_k$ known for all $k \in U$. The predicted values serve to build a nearly design unbiased estimator of the population total $Y = \sum_U y_k$ as

$\hat Y_{\mathrm{GREG}} = \sum_U \hat y_k + \sum_s d_k (y_k - \hat y_k) = \sum_s d_k y_k + \big( \sum_U \hat y_k - \sum_s d_k \hat y_k \big). \quad (3.1)$

The obvious motivation behind this construction is the prospect of a highly accurate estimate $\hat Y_{\mathrm{GREG}}$ through a close fitting assisting model that leaves small residuals $y_k - \hat y_k$. That modeling is the corner stone of GREG thinking. Some authors use the (also justifiable) name general difference estimator for the construction (3.1).

The great variety of possible assisting models generates a wide family of GREG estimators of the form (3.1). The assisting model, an imagined relationship between $x$ and $y$, can have many forms: linear, non-linear, generalized linear, mixed (model with some fixed, some random effects), and so on. Whatever the choice, the model is "assisting only"; even though it may be short of "true", (3.1) is nearly design unbiased under mild conditions on the assisting model and on the sampling design, so that $(\hat Y_{\mathrm{GREG}} - Y)/N = O_p(n^{-1/2})$ and $(\hat Y_{\mathrm{GREG}} - Y)/N = (\hat Y_{\mathrm{GREG,lin}} - Y)/N + O_p(n^{-1})$, where the statistic $\hat Y_{\mathrm{GREG,lin}}$, the result of linearizing $\hat Y_{\mathrm{GREG}}$, is unbiased for $Y$.

3.2 Linear GREG

By linear GREG I mean one that is generated by a linear fixed effects assisting model. The predictions are $\hat y_k = \mathbf{x}'_k B_{s;dq}$ with

$B_{s;dq} = \big( \sum_s d_k q_k \mathbf{x}_k \mathbf{x}'_k \big)^{-1} \big( \sum_s d_k q_k \mathbf{x}_k y_k \big);$

so (3.1) becomes

$\hat Y_{\mathrm{GREG}} = \big( \sum_U \mathbf{x}_k \big)' B_{s;dq} + \sum_s d_k ( y_k - \mathbf{x}'_k B_{s;dq} ). \quad (3.2)$

The $q_k$ are scale factors, chosen by the statistician. The standard choice is $q_k = 1$ for all $k$. The choice of the $q_k$ has some (but often limited) impact on the accuracy of $\hat Y_{\mathrm{GREG}}$; near-unbiasedness holds for any specification (barring outrageous choices) for the $q_k$. Although the model is simple, the linear GREG (3.2) contains many estimators, considering the many possible choices of the auxiliary vector $\mathbf{x}_k$ and the scale factors $q_k$. Under general conditions,

$(\hat Y_{\mathrm{GREG}} - Y)/N = \big( \sum_s d_k E_k - \sum_U E_k \big)/N + O_p(n^{-1})$

where $\sum_s d_k E_k$ is the Horvitz-Thompson estimator of the total of the residuals $E_k = y_k - \mathbf{x}'_k B_{U;q}$, with $B_{U;q} = (\sum_U q_k \mathbf{x}_k \mathbf{x}'_k)^{-1} (\sum_U q_k \mathbf{x}_k y_k)$. Hence, the design-based properties $E(\hat Y_{\mathrm{GREG}}) \approx Y$ and $\mathrm{Var}(\hat Y_{\mathrm{GREG}}) \approx \mathrm{Var}(\sum_s d_k E_k)$. A close fitting linear regression of $y$ on $\mathbf{x}$ holds the key to a small variance for $\hat Y_{\mathrm{GREG}}$ (and this is very different from claiming that "a linear regression is the true regression").

The linear GREG in Särndal, Swensson and Wretman (1992) was motivated via the linear assisting model $\xi$ stating that $E_\xi(y_k) = \boldsymbol{\beta}' \mathbf{x}_k$ and $V_\xi(y_k) = \sigma_k^2$. Generalized least squares fit gives the estimator (3.2) with $q_k = 1/\sigma_k^2$. In that context, an educated guess about the variation of the residuals $y_k - \boldsymbol{\beta}' \mathbf{x}_k$ determines the $q_k$. When the vector $\mathbf{x}_k$ is fixed, the modeling effort boils down to an opinion about the residual pattern. The choice $\sigma_k^2 = \sigma^2 x_k$ gives the classical ratio estimator. If $q_k = \boldsymbol{\mu}' \mathbf{x}_k$ for all $k \in U$ and a constant vector $\boldsymbol{\mu}$, then (3.2) reduces to "the cosmetic form" $(\sum_U \mathbf{x}_k)' B_{s;dq}$.

As Beaumont and Alavi (2004) and others have pointed out, the linear GREG estimator is bias-robust (nearly unbiased although the assisting model falls short of "correct"), but it can be considerably less efficient (have larger mean squared error) than model dependent alternatives which, although biased, may have a considerably smaller variance. Thus one may claim that linear GREG is not variance robust; nevertheless, it is a basic concept in design-based survey theory.

The specification of $\mathbf{x}_k$ should include variables (with known population totals) that served already in defining the sampling design. Design stage information should not be relinquished at the estimation stage; instead, a "repeated usage" is recommended. For example, in stratified simple random sampling (STSI), the vector $\mathbf{x}_k$ in estimator (3.2) should include, along with other available variables, the dummy coded stratum identifier, $\boldsymbol{\gamma}_k = (\gamma_{k1}, \gamma_{k2}, \ldots, \gamma_{kh}, \ldots, \gamma_{kH})'$, where $\gamma_{kh} = 1$ if element $k$ belongs to stratum $h$, and $\gamma_{kh} = 0$ if not; $h = 1, \ldots, H$.

We can write the linear GREG (3.2) as a weighted sample sum, $\hat Y_{\mathrm{GREG}} = \sum_s w_k y_k$, with

$w_k = d_k g_k; \quad g_k = 1 + q_k \boldsymbol{\lambda}'_s \mathbf{x}_k; \quad \boldsymbol{\lambda}'_s = \big( \sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k \big)' \big( \sum_s d_k q_k \mathbf{x}_k \mathbf{x}'_k \big)^{-1}. \quad (3.3)$

The weights $w_k$ happen to be calibrated to (consistent with) the known population $x$-total: $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$. That $\hat Y_{\mathrm{GREG}}$ is expressible as a linearly weighted sum with calibrated weights is a fortuitous by-product. It is not part of GREG thinking, whose central idea formulated in (3.1) is the fit of an assisting model. A few other GREG's than the
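The algebra of (3.2) and (3.3) is easy to check numerically. A minimal sketch under assumed settings (synthetic data, simple random sampling, $q_k = 1$; all names are mine): it verifies that the predictor form (3.2) and the calibrated-weight form (3.3) give the same estimator, and that the weights reproduce $\sum_U \mathbf{x}_k$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population with auxiliary vector x_k = (1, x_k)'.
N, n = 2000, 200
x = rng.gamma(2.0, 2.0, size=N)
y = 3.0 + 5.0 * x + rng.normal(0, 2.0, size=N)
X = np.column_stack([np.ones(N), x])

s = rng.choice(N, size=n, replace=False)   # SRS without replacement
d = np.full(n, N / n)                      # d_k = 1/pi_k = N/n
q = np.ones(n)                             # standard choice q_k = 1

Xs, ys = X[s], y[s]
tx_U = X.sum(axis=0)                       # known totals sum_U x_k

# B_{s;dq} = (sum_s d q x x')^{-1} (sum_s d q x y), then formula (3.2).
A = (Xs * (d * q)[:, None]).T @ Xs
B = np.linalg.solve(A, (Xs * (d * q)[:, None]).T @ ys)
Y_greg = tx_U @ B + np.sum(d * (ys - Xs @ B))

# Weighted-sum form (3.3): w_k = d_k (1 + q_k lambda' x_k).
lam = np.linalg.solve(A, tx_U - (d[:, None] * Xs).sum(axis=0))
w = d * (1 + q * (Xs @ lam))

assert np.allclose(w @ ys, Y_greg)                        # same estimator
assert np.allclose((w[:, None] * Xs).sum(axis=0), tx_U)   # calibrated
```

The two assertions are exactly the "two sides of the same coin" discussed later: the regression form and the calibrated linear weighting coincide in the linear case.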

simple linear one also have the calibration property, as will be noted later.

3.3 Non-linear GREG

Two features of the linear GREG (3.2) make it a favourite choice for routine production in statistical agencies: (i) the auxiliary population total $\sum_U \mathbf{x}_k$ becomes factored out, so the estimation can proceed as long as an accurate value for that total can be computed or imported, and (ii) when written as the linearly weighted sum $\hat Y_{\mathrm{GREG}} = \sum_s w_k y_k$, the weight system (3.3) is independent of the $y$-variable and can thereby be applied to all $y$-variables in the survey. We need not know $\mathbf{x}_k$ individually for all $k \in U$; knowing $\sum_U \mathbf{x}_k$ suffices. Needless to say, if we do know all $\mathbf{x}_k$, more efficient (still nearly design unbiased) members of the GREG family (3.1) can be sought. This will also counter another criticism of the linear GREG, namely that a linear model is unrealistic for some types of data. For example, for a dichotomous $y$-variable, a logistic assisting model may be both more realistic and yield a more precise GREG estimator.

By a non-linear GREG estimator I mean one generated as in (3.1) by an assisting model of other type than "linear in $\mathbf{x}_k$ with fixed effects". Among the first to extend the GREG concept in this direction are Firth and Bennett (1998) and Lehtonen and Veijanen (1998); see also Chambers et al. (1993). In the last few years, several authors have studied model-assisted non-linear GREG's.

Non-linear GREG is a versatile idea; a variety of estimators become possible via assisting models $\xi$ of the following type:

$E_\xi(y_k | \mathbf{x}_k) = \mu_k \text{ for } k \in U \quad (3.4)$

where the model mean $\mu_k$ and the model variance $V_\xi(y_k | \mathbf{x}_k)$ are given appropriate formulations.

One application of (3.4) is when $\mu_k = \mu(\mathbf{x}_k, \boldsymbol{\theta})$ is a specified non-linear function in $\mathbf{x}_k$. Having estimated $\boldsymbol{\theta}$ by $\hat{\boldsymbol{\theta}}$, the fitted values needed for $\hat Y_{\mathrm{GREG}}$ in (3.1) are $\hat y_k = \mu(\mathbf{x}_k, \hat{\boldsymbol{\theta}})$ for $k \in U$. For example, if the modeler specifies $\log \mu_k = \alpha + \beta x_k$, the predictions for use in (3.1) are, following parameter estimation, $\hat y_k = \exp(\hat\alpha + \hat\beta x_k)$.

Other applications of (3.4) include generalized linear models such that $g(\mu_k) = \mathbf{x}'_k \boldsymbol{\theta}$ for a specified link function $g(\cdot)$, and $V_\xi(y_k | \mathbf{x}_k) = v(\mu_k)$ is given an appropriate structure. We estimate $\boldsymbol{\theta}$ by $\hat{\boldsymbol{\theta}}$; the fitted values needed for the non-linear GREG estimator (3.1) are $\hat y_k = \hat\mu_k = g^{-1}(\mathbf{x}'_k \hat{\boldsymbol{\theta}})$. For example, using a logistic assisting model, $\mathbf{x}'_k \boldsymbol{\theta} = \mathrm{logit}(\mu_k) = \log(\mu_k/(1 - \mu_k))$, and $\hat y_k = \hat\mu_k = \exp(\mathbf{x}'_k \hat{\boldsymbol{\theta}})/(1 + \exp(\mathbf{x}'_k \hat{\boldsymbol{\theta}}))$.

Lehtonen and Veijanen (1998) examine the case of a categorical study variable with $I$ classes, $i = 1, 2, \ldots, I$; $y_{ik} = 1$ if element $k$ belongs to category $i$, and $y_{ik} = 0$ if not. For example, in a Labour Force Survey with $I = 3$ categories, "employed", "not employed" and "not in the labour force", an objective is to estimate the respective population counts $Y_i = \sum_U y_{ik}$, $i = 1, 2, 3$. These authors use the logistic assisting model

$E_\xi(y_{ik} | \mathbf{x}_k) = \mu_{ik}; \quad \mu_{ik} = \exp(\mathbf{x}'_k \boldsymbol{\theta}_i) \Big/ \Big( 1 + \sum_{i=2}^{I} \exp(\mathbf{x}'_k \boldsymbol{\theta}_i) \Big). \quad (3.5)$

Estimates $\hat{\boldsymbol{\theta}}_i$ of the $\boldsymbol{\theta}_i$ are obtained by maximizing the design-weighted log-likelihood. The resulting predictions $\hat y_{ik} = \hat\mu_{ik}$ are used to form $\hat Y_{i,\mathrm{GREG}} = \sum_U \hat y_{ik} + \sum_s d_k (y_{ik} - \hat y_{ik})$, for $i = 1, 2, \ldots, I$.

Another development is the application of GREG reasoning to estimation for domains, as in Lehtonen, Särndal and Veijanen (2003, 2005) and Myrskylä (2007). Mixed models are used in the first two of these papers to assist the non-linear GREG. Let $U_a$ be a domain, $U_a \subset U$, whose total $Y_{ia} = \sum_{U_a} y_{ik}$ we wish to estimate, $i = 1, 2, \ldots, I$. The 2005 paper derives the predictions for the non-linear GREG from the logistic mixed model stating that for $k \in U_a$

$E_\xi(y_{ik} | \mathbf{x}_k; \mathbf{u}_{ia}) = \exp(\mathbf{x}'_k \boldsymbol{\theta}_{ia}) \Big/ \Big( 1 + \sum_{i=2}^{I} \exp(\mathbf{x}'_k \boldsymbol{\theta}_{ia}) \Big) \quad (3.6)$

with $\boldsymbol{\theta}_{ia} = \boldsymbol{\beta}_i + \mathbf{u}_{ia}$, where $\mathbf{u}_{ia}$ is a vector of domain specific random deviations from the fixed effects vector $\boldsymbol{\beta}_i$.

Non-linear GREG's assisted by models such as (3.5) and (3.6) require model fitting for every $y$-variable separately; there is no uniformly applicable weight system. However, the question arises: Are there examples of non-linear GREG's such that the practical advantages of linear GREG are preserved, that is, a linearly weighted form with calibrated weights independent of the $y$-variable? The answer is in the affirmative. Two directions in recent literature are of interest in this regard:

Breidt and Opsomer (2000), Montanari and Ranalli (2005) consider model-assisted local polynomial GREG estimators, for the case of a single continuous auxiliary variable with values $x_k$ known for all $k \in U$. Several choices have to be made in the process: (1) the order $q$ of the local polynomial expression, (2) the specification of the kernel function, and (3) the value of the band width. The resulting estimator can be expressed in terms of weights calibrated with respect to population totals of the powers of $x_k$, so that $\sum_s w_k x_k^j = \sum_U x_k^j$ for $j = 0, 1, \ldots, q$.

Breidt, Claeskens and Opsomer (2005) develop a penalized spline GREG estimator for a single $x$-variable; the assisting model is $m(x; \boldsymbol{\beta}) = \beta_0 + \beta_1 x + \ldots + \beta_q x^q + \sum_{j=1}^{K} \beta_{q+j} (x - \kappa_j)_+^q$, where $(t)_+ = t$ if $t > 0$ and $0$ otherwise, $q$ is the degree of the spline, and the $\kappa_j$ are suitably spaced knots, for example, uniformly spaced
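The non-linear GREG of this section can be sketched for a dichotomous $y$ with a logistic assisting model: fit $\boldsymbol{\theta}$ on the sample by maximizing the design-weighted log-likelihood, predict $\hat y_k = \hat\mu_k$ for all $k \in U$, and plug into (3.1). The sketch below uses invented data and a few Newton-Raphson steps for the fit (the fitting algorithm is my choice, not prescribed by the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Dichotomous study variable with a logistic relationship in x.
N, n = 2000, 300
x = rng.normal(0, 1, size=N)
X = np.column_stack([np.ones(N), x])
p_true = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.random(N) < p_true).astype(float)

s = rng.choice(N, size=n, replace=False)   # SRS
d = np.full(n, N / n)
Xs, ys = X[s], y[s]

# Maximize the design-weighted logistic log-likelihood (Newton-Raphson).
theta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(Xs @ theta)))
    grad = Xs.T @ (d * (ys - mu))
    H = (Xs * (d * mu * (1 - mu))[:, None]).T @ Xs
    theta += np.linalg.solve(H, grad)

# Predictions for all k in U (complete auxiliary information), then (3.1).
mu_U = 1 / (1 + np.exp(-(X @ theta)))
mu_s = 1 / (1 + np.exp(-(Xs @ theta)))
Y_greg = mu_U.sum() + np.sum(d * (ys - mu_s))
print(Y_greg, y.sum())
```

Note that this fit, and hence the weights, would have to be redone for every study variable: exactly the loss of the multi-purpose property the text describes.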


sample quantiles of the $x_k$-values. After estimation of the $\beta$-parameters, they obtain the predictions $\hat y_k = m(x_k; \hat{\boldsymbol{\beta}})$ needed for the general GREG formula (3.1). The authors point out that the resulting GREG estimator is calibrated for the parametric portion of the model, that is, $\sum_s w_k x_k^j = \sum_U x_k^j$ for $j = 0, 1, \ldots, q$, and also for the truncated polynomial terms in the model as long as they are left unpenalized.

We can summarize GREG estimation as follows. The linear GREG has practical advantages for large scale statistics production: It can be expressed as a linearly weighted sum of $y_k$-values with weights calibrated to $\sum_U \mathbf{x}_k$, the weights are independent of the $y_k$-values and may be applied to all $y$-variables in the survey. It is sufficient to know a population auxiliary total $\sum_U \mathbf{x}_k$, imported from a reliable source. Non-linear GREG may give a considerably reduced variance, as a result of the more refined models that can be considered when there is complete auxiliary information (known $\mathbf{x}_k$ for all $k \in U$); near design unbiasedness is preserved. Certain non-linear GREG's can be written as linearly weighted sums.

In academic exercises with artificially created populations and relationships, one can provoke situations where a nonlinear GREG has a large variance advantage over a linear GREG. Such experiments are important for illustration. However, to meet the daily production needs in national statistical agencies, "farfetched" nonlinear GREG's seem to be of fairly remote interest at this point in time; the assisting models for GREG must meet requirements of robustness and practicality. The attraction of a minor reduction of the sampling variance is swept away by worries about other (non-sampling) errors and troubles in the daily production process.

The progression from linear to non-linear GREG creates opportunities and generates questions. What is the most appropriate formulation of the model expectation $\mu_k$? How sensitive are the results to the specification of the variance part of the assisting model? To what extent is computational efficiency an issue? Further research will respond more fully to these questions.

4. The calibration approach to estimation

4.1 Calibration under basic conditions

A crucial step in the GREG approach reviewed in the previous section is to produce predicted values $\hat y_k$ through the fit of an assisting model. By contrast, the calibration approach, as defined in Section 1.1, does not refer explicitly to any model. It emphasizes instead the information on which one can calibrate. A key element of "calibration thinking" is the linear weighting of the observed $y$-values, with weights made to conform to computable aggregates. This conceptual difference will sometimes lead to different estimators in the two approaches.

The calibration approach has considerable generality; it can deal with a variety of conditions: complex sampling designs, adjustments for nonresponse and frame errors. This section, however, focuses on the basic conditions in Section 2: single phase sampling and full response. The notation remains as in Section 2. The material available for estimating the population total $Y = \sum_U y_k$ is: (i) the study variable values $y_k$ observed for $k \in s$, (ii) the known design weights $d_k = 1/\pi_k$ for $k \in U$, and (iii) the known vector values $\mathbf{x}_k$ for $k \in U$ (or an imported total $\sum_U \mathbf{x}_k$). These simple conditions prevail in Deville and Särndal (1992) and Deville, Särndal and Sautory (1993), papers which gave the approach a name and inspired further work. Even though the background is simple, calibration raises several issues, some of them computational, as reviewed in Section 5.

The objective in Sections 4.2 and 4.3 is to determine weights $w_k$ to satisfy the calibration equation $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$, then use them to form the calibration estimator of $Y$ as $\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k$, which we can confront with the unbiased Horvitz-Thompson estimator by writing $\hat Y_{\mathrm{CAL}} = \hat Y_{\mathrm{HT}} + \sum_s (w_k - d_k) y_k$. It follows that the bias of $\hat Y_{\mathrm{CAL}}$ is $E(\hat Y_{\mathrm{CAL}}) - Y = E(\sum_s (w_k - d_k) y_k)$. Meeting the objective of near design unbiasedness requires $E(\sum_s (w_k - d_k) y_k) \approx 0$, whatever the $y$-variable. Evidently, the calibration should strive for small deviations $w_k - d_k$.

The objective "calibration for consistency with known population auxiliary totals" can be realized in many ways. We can construct many sets of weights calibrated to the known $\sum_U \mathbf{x}_k$. This section examines this proliferation from two perspectives noted in the literature: the minimum distance method and the instrumental vector method. Yet another construction of a variety of calibrated weights is proposed in Demnati and Rao (2004).

4.2 The minimum distance method

In this method, the calibration sets out to modify the initial weights $d_k = 1/\pi_k$ into new weights $w_k$, determined to "be close to" the $d_k$. To this end, consider the distance function $G_k(w, d)$, defined for every $w > 0$, such that $G_k(w, d) \geq 0$, $G_k(d, d) = 0$, differentiable with respect to $w$, strictly convex, with continuous derivative $g_k(w, d) = \partial G_k(w, d)/\partial w$ such that $g_k(d, d) = 0$. Usually the distance function is chosen such that $g_k(w, d) = g(w/d)/q_k$, where the $q_k$ are suitably chosen positive scale factors, $g(\cdot)$ is a function of a single argument, continuous, strictly increasing, with $g(1) = 0$, $g'(1) = 1$. Let $F(u) = g^{-1}(u)$ be the inverse function of $g(\cdot)$. Minimizing the total distance
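The confrontation identity $\hat Y_{\mathrm{CAL}} = \hat Y_{\mathrm{HT}} + \sum_s (w_k - d_k) y_k$ holds for any weight system. A small sketch on invented data uses post-stratification (a group ratio adjustment, which is the simplest calibration on known group counts) to illustrate both the calibration equation and the identity; the population and grouping are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Population split into P = 3 groups; x_k is the group-indicator vector,
# so calibrating on sum_U x_k means reproducing the known group counts.
N, n = 3000, 300
group = rng.integers(0, 3, size=N)
y = 10 + 4 * group + rng.normal(0, 3, size=N)
N_p = np.bincount(group, minlength=3)      # known controls sum_U x_k

s = rng.choice(N, size=n, replace=False)   # SRS
d = np.full(n, N / n)
gs, ys = group[s], y[s]

# Post-stratification: scale design weights within each group so that
# the weighted group counts match the known N_p.
dhat_p = np.array([d[gs == p].sum() for p in range(3)])
w = d * (N_p / dhat_p)[gs]

Y_ht = np.sum(d * ys)
Y_cal = np.sum(w * ys)

# Calibration equation and the confrontation identity from the text.
assert np.allclose(np.array([w[gs == p].sum() for p in range(3)]), N_p)
assert np.allclose(Y_cal, Y_ht + np.sum((w - d) * ys))
```

The weight deviations $w_k - d_k$ are small here, which is exactly what the near-unbiasedness argument in the text asks for.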


$\sum_s G_k(w_k, d_k)$ subject to the calibration equation $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$ leads to $w_k = d_k F(q_k \mathbf{x}'_k \boldsymbol{\lambda})$, where $\boldsymbol{\lambda}$ is obtained as the solution (assuming one exists) of

$\sum_s d_k \mathbf{x}_k F(q_k \mathbf{x}'_k \boldsymbol{\lambda}) = \sum_U \mathbf{x}_k. \quad (4.1)$

The weights have an optimality property, because a duly specified objective function is minimized, but it is a "weak optimality" in the sense that there are many possible specifications of the distance function and the scale factors $q_k$.

Much attention has focused on the distance function $G_k(w_k, d_k) = (w_k - d_k)^2 / 2 d_k q_k$. It gives $g_k(w_k, d_k) = (w_k/d_k - 1)/q_k$; $g(w/d) = w/d - 1$; $F(u) = g^{-1}(u) = 1 + u$. The term "the linear case" is thus appropriate. The task is then to minimize the "chi-square distance" $\sum_s (w_k - d_k)^2 / 2 d_k q_k$, subject to $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$. Equation (4.1) reads $\sum_s d_k \mathbf{x}_k (1 + q_k \mathbf{x}'_k \boldsymbol{\lambda}) = \sum_U \mathbf{x}_k$, which is easily solved for $\boldsymbol{\lambda}$. The resulting estimator of $Y = \sum_U y_k$ is $\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k$ with weights $w_k = d_k g_k$ given by (3.3). That is, $\hat Y_{\mathrm{CAL}} = \hat Y_{\mathrm{GREG}}$ as given by (3.2), and the residuals that determine the asymptotic variance are $E_k = y_k - \mathbf{x}'_k B_{U;q}$ as given in Section 3.2. Some negative weights $w_k$ may occur.

The linear GREG estimator implies weights that happen to be calibrated (to $\sum_U \mathbf{x}_k$), and the opposite side of the same coin says that the linear case for calibration (with chi-square distance) brings the linear GREG estimator. The tendency in some articles and applications to intertwine GREG thinking and calibration thinking stems from this fact. Many successful applications of the use of auxiliary information stem, in any case, from this linearity on both sides of the coin. The Canadian Labour Force Survey is an example, and an interesting recent development for that survey is the use of composite estimators, with part of the information coming from the survey results in previous months, as described in Fuller and Rao (2001).

The calibration equation is satisfied for any choice of the positive scale factors $q_k$ in (4.1). A simple choice is $q_k = 1$ for all $k$. But it is not always the preferred choice. For example, if there is a single, always positive auxiliary variable, and $\mathbf{x}_k = x_k$, then many will intuitively expect $\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k$ to deliver the usual ratio estimator $\sum_U x_k (\sum_s d_k y_k)/(\sum_s d_k x_k)$, and it does, but by taking $q_k = x_k^{-1}$, not $q_k = 1$.

Another distance function of considerable interest is $G_k(w_k, d_k) = \{w_k \log(w_k/d_k) - w_k + d_k\}/q_k$. It leads to $F(u) = g^{-1}(u) = \exp(u)$, "the exponential case". Then (4.1) reads $\sum_s d_k \mathbf{x}_k \exp(q_k \mathbf{x}'_k \boldsymbol{\lambda}) = \sum_U \mathbf{x}_k$. Numeric methods are required to solve for $\boldsymbol{\lambda}$, to obtain the weights $w_k = d_k \exp(q_k \mathbf{x}'_k \boldsymbol{\lambda})$. No negative weights $w_k$ will occur.

Deville and Särndal (1992) show that a variety of distance functions satisfying mild conditions will generate asymptotically equivalent calibration estimators. Alternative distance functions are compared in Deville, Särndal and Sautory (1993), Singh and Mohl (1996), Stukel, Hidiroglou and Särndal (1996). Some distance functions will guarantee weights falling within specified bounds, so as to rule out too large or too small (negative) weights. Changes in the distance function will often have minor effect only on the variance of the calibration estimator $\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k$, even if the sample size is rather small. Questions about the existence of a solution to the calibration equation are discussed in Théberge (2000).

4.3 The instrument vector method

An alternative to distance minimization is the instrumental vector method, considered in Deville (1998), Estevao and Särndal (2000, 2006) and Kott (2006). It can also generate many alternative sets of weights calibrated to the same information.

We can consider weights of the form $w_k = d_k F(\boldsymbol{\lambda}' \mathbf{z}_k)$, where $\mathbf{z}_k$ is a vector with values defined for $k \in s$ and sharing the dimension of the specified auxiliary vector $\mathbf{x}_k$, and the vector $\boldsymbol{\lambda}$ is determined from the calibration equation $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$. The function $F(\cdot)$ plays the same role as in the distance minimization method; several choices $F(\cdot)$ are of interest, for example, $F(u) = 1 + u$ and $F(u) = \exp(u)$.

Opting for the linear function $F(u) = 1 + u$, we have $w_k = d_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k)$. It is an easy exercise to determine $\boldsymbol{\lambda}$ to satisfy the calibration equation $\sum_s w_k \mathbf{x}_k = \sum_U \mathbf{x}_k$. The resulting calibration estimator is

$\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k; \quad w_k = d_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k), \quad \boldsymbol{\lambda}' = \big( \sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k \big)' \big( \sum_s d_k \mathbf{z}_k \mathbf{x}'_k \big)^{-1}. \quad (4.2)$

Whatever the choice of $\mathbf{z}_k$, the weights $w_k = d_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k)$ satisfy the calibration equation. The standard choice is $\mathbf{z}_k = \mathbf{x}_k$. In particular, setting $\mathbf{z}_k = q_k \mathbf{x}_k$, for specified $q_k$, gives the weights (3.3).

Even "deliberately awkward choices" for $\mathbf{z}_k$ give surprisingly good results. For example, let $x_k$ be a single continuous auxiliary variable, and $z_k = c_k x_k^{p-1}$. Suppose $p = 3$, and $c_k = 1$ for 4 elements only, chosen at random from the $n = 100$ elements in a realized sample $s$, and $c_k = 0$ for the remaining 96. The near-unbiasedness of $\hat Y_{\mathrm{CAL}} = \sum_s d_k (1 + \boldsymbol{\lambda}' \mathbf{z}_k) y_k$ is still present. Even with such a sparse $z$-vector, the increase in variance, relative to better choices of $\mathbf{z}_k$, may not be excessive.

When both sampling design and $x$-vector are fixed, Estevao and Särndal (2004) and Kott (2004) note that there is an asymptotically optimal $z$-vector given by

$\mathbf{z}_k = \mathbf{z}_{0k} = d_k^{-1} \sum_{l \in s} (d_k d_l - d_{kl}) \mathbf{x}_l$
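Both the linear ("chi-square") and the exponential ("raking") cases of (4.1) can be computed in a few lines. The sketch below, on invented data, solves the linear case in closed form and the exponential case by Newton's method (the solver choice is mine; production systems such as CALMAR implement their own algorithms):

```python
import numpy as np

rng = np.random.default_rng(4)

# One continuous auxiliary variable; x_k = (1, x_k)' so the calibration
# reproduces both the population size and the x-total.
N, n = 2000, 150
x = rng.gamma(2.0, 3.0, size=N)
X = np.column_stack([np.ones(N), x])
tx_U = X.sum(axis=0)

s = rng.choice(N, size=n, replace=False)   # SRS
d = np.full(n, N / n)
Xs = X[s]

# Linear case, F(u) = 1 + u (chi-square distance, q_k = 1): closed form.
A = (Xs * d[:, None]).T @ Xs
lam = np.linalg.solve(A, tx_U - d @ Xs)
w_lin = d * (1 + Xs @ lam)

# Exponential case, F(u) = exp(u) (raking): solve (4.1) iteratively.
lam = np.zeros(2)
for _ in range(50):
    w = d * np.exp(Xs @ lam)
    f = w @ Xs - tx_U                      # calibration-equation residual
    J = (Xs * w[:, None]).T @ Xs           # Jacobian of f in lambda
    lam -= np.linalg.solve(J, f)
w_exp = d * np.exp(Xs @ lam)

assert np.allclose(w_lin @ Xs, tx_U)       # both weight sets calibrated
assert np.allclose(w_exp @ Xs, tx_U)
assert (w_exp > 0).all()                   # raking weights never negative
```

The two weight sets satisfy the same calibration equation yet differ element by element, illustrating the "weak optimality" point: many distance functions, many calibrated weight systems.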


where $d_{kl}$ is the inverse of the second order inclusion probability $\pi_{kl} = P(k \ \&\ l \in s)$, assumed strictly positive. The resulting calibration estimator, $\hat Y_{\mathrm{CAL}} = \sum_s d_k (1 + \boldsymbol{\lambda}' \mathbf{z}_{0k}) y_k$, is essentially the "randomization-optimal estimator" due originally to Montanari (1987) and discussed by many since then.

Andersson and Thorburn (2005) view the question from the opposite direction and ask: In the minimum distance method, can a distance function be specified such that its minimization will deliver the randomization-optimal estimator? They do find this distance; not entirely surprisingly, it is related to (but not identical to) the chi-square distance.

4.4 Does calibration need an explicitly stated model?

The calibration approach as presented in Sections 4.2 and 4.3 proceeds by simply computing the weights that reproduce the specified auxiliary totals. There is no explicit assisting model, unless one were to insist that picking certain variables for inclusion in the vector $\mathbf{x}_k$ amounts to a serious modeling effort. Instead, the weights are justified primarily by their consistency with the stated controls. Early contributions reflect this attitude, from Deming (1943), and continuing with Alexander (1987), Zieschang (1990) and others.

This begs the question: Is it nevertheless important to motivate such "model-free calibration" with an explicit model statement? It is true that statisticians are trained to think in terms of models, and they feel more or less compelled to always have a statistical procedure accompanied by a model statement. It may indeed have some pedagogical merit, also in explaining calibration, to state the associated relationship of $y$ to $x$, even if it is as simple as a standard linear model.

But will a stated model help the users and practitioners better understand the calibration approach? To most of them the approach is perfectly clear and transparent anyway. They need no other justification than the consistency with stated controls. Will a search for "the true model with the true variance structure" bring significantly better accuracy for the bulk of the many estimates produced in a large government survey? It is unlikely.

The next section deals with model-calibration. For that variety, proposed by Wu and Sitter (2001), modeling has indeed an explicit and prominent role. These authors call the linear calibration estimator, $\hat Y_{\mathrm{CAL}} = \sum_s w_k y_k$ with weights $w_k$ given by (3.3), "a routine application without modeling". The description is appropriate in that all that is necessary is to identify the $x$-variables with their known population totals.

4.5 Model-calibration

The idea of model-calibration is proposed in Wu and Sitter (2001) and pursued further in Wu (2003) and Montanari and Ranalli (2003, 2005). The motivating factor is that complete auxiliary information allows a more effective use of the $\mathbf{x}_k$ known for every $k \in U$ than what is possible in model-free calibration, where a known total $\sum_U \mathbf{x}_k$ is sufficient. The weights are required to be consistent with the computable population total of the predictions $\hat y_k$, derived via an appropriate model formulation. Thus the weight system may not be consistent with the known population total of each auxiliary variable, unless there is special provision to retain this property. Model-calibration still satisfies all three parts, (a) to (c), of the definition of calibration in Section 1.1; in particular, the estimators are nearly design unbiased.

Consider a non-linear assisting model of the type (3.4). We estimate the unknown parameter $\boldsymbol{\theta}$ by $\hat{\boldsymbol{\theta}}$, leading to fitted values $\hat y_k = \hat\mu_k = \mu(\mathbf{x}_k, \hat{\boldsymbol{\theta}})$ computed with the aid of the $\mathbf{x}_k$ known for all $k \in U$. It follows that the population size $N$ is known and should be brought to play a significant role in the calibration. If minimum chi-square distance is used, we find the weights of the model-calibration estimator $\hat Y_{\mathrm{MCAL}} = \sum_s w_k y_k$ by minimizing $\sum_s (w_k - d_k)^2/(2 d_k q_k)$, for specified $q_k$ and $d_k = 1/\pi_k$, subject to the calibration equations

$\sum_s w_k = N; \quad \sum_s w_k \hat y_k = \sum_U \hat y_k. \quad (4.3)$

For simplicity, let us take $q_k = 1$ for all $k$; we derive the calibrated weights, rearrange terms and find that the model-calibration estimator can be written as

$\hat Y_{\mathrm{MCAL}} = N \{ \bar y_{s;d} + ( \bar{\hat y}_U - \bar{\hat y}_{s;d} ) \tilde B_{s;d} \} \quad (4.4)$

where $\bar y_{s;d} = \sum_s d_k y_k / \sum_s d_k$; $\bar{\hat y}_{s;d} = \sum_s d_k \hat y_k / \sum_s d_k$; $\bar{\hat y}_U = \sum_U \hat y_k / N$, and

$\tilde B_{s;d} = \big( \sum_s d_k (\hat y_k - \bar{\hat y}_{s;d}) y_k \big) \big/ \sum_s d_k (\hat y_k - \bar{\hat y}_{s;d})^2.$

The regression implied by $\tilde B_{s;d}$ is one of observed $y$-values on predicted $y$-values. The idea of this regression would hardly occur to the modeler in his/her attempts to structure the relation between $y_k$ and $\mathbf{x}_k$, but it proves effective in building the calibration estimator. Wu and Sitter (2001) present evidence that

$(\hat Y_{\mathrm{MCAL}} - Y)/N = \big( \sum_s d_k \tilde E_k - \sum_U \tilde E_k \big)/N + O_p(n^{-1})$

with $\tilde E_k = y_k - \bar y_U - (\mu_k - \bar\mu_U) \tilde B_U$, where $\tilde B_U = \big( \sum_U (\mu_k - \bar\mu_U) y_k \big) \big/ \sum_U (\mu_k - \bar\mu_U)^2$ and $\bar\mu_U = \sum_U \mu_k / N$. The coefficient $\tilde B_U$ may not be near one even in large samples. It expresses a regression of $y_k$ on its assisting model mean $\mu_k = \mu(\mathbf{x}_k, \boldsymbol{\beta})$. That is, $\hat Y_{\mathrm{MCAL}}$ can be viewed as a regression estimator that uses the model expectation $\mu_k$ as the auxiliary variable, leaving $\tilde E_k$ as the residuals that determine the asymptotic variance of $\hat Y_{\mathrm{MCAL}}$.


How does this asymptotic variance compare with that of the non-linear GREG construction (3.1) for the same non-linear assisting model and the same ŷ_k = μ̂_k? Formula (3.1) implies a slope equal to unity in the regression between y_k and ŷ_k = μ̂_k; viewed in that light, Ŷ_GREG is a difference rather than a regression estimator and hence less sensitive to the pattern in the data. The non-linear GREG Ŷ_GREG is in general less efficient than Ŷ_MCAL. (It is of course possible to modify Ŷ_GREG to account also for the information contained in the known population size N.) On the other hand, compared with the linear (model-free) calibration estimator Ŷ_CAL = Σ_s w_k y_k with weights as in (3.3), the model-calibration estimator Ŷ_MCAL given by (4.4) may have a considerable variance advantage, but it implies a loss of the practical advantages of consistency with the known population total Σ_U x_k and of a multi-purpose weight system applicable to all y-variables. The y-values in (4.4) are linearly weighted, but the weights now also depend on the y-values. It is thus debatable whether Ŷ_MCAL is a bona fide calibration estimator.

In an empirical study, Wu and Sitter (2001) compare Ŷ_MCAL = Σ_s w_k y_k, calibrated according to (4.3), with the non-linear GREG, Ŷ_GREG = Σ_U ŷ_k + Σ_s d_k (y_k − ŷ_k) given by (3.1), for the same non-linear assisting model and the same ŷ_k = μ̂_k. The study confirms that Ŷ_MCAL has a variance advantage over the non-linear GREG Ŷ_GREG. They created a finite population U of size N = 2,000 with values (y_k, x_k), k = 1, ..., 2,000, such that log(y_k) = 1 + x_k + ε_k; the 2,000 values x_k are realizations of the Gamma(1, 1) random variable, and ε_k is a normally distributed error. The auxiliary information consists of the population size N and the known values x_k for k = 1, ..., 2,000. Repeated simple random samples of size n = 100 were taken; the assisting model for both estimators was the log-linear function E_ξ(y_k | x_k) = μ_k with log(μ_k) = α + β x_k. This model was fit for each sample, using pseudo-maximum quasi-likelihood estimation. The fitted values ŷ_k = exp(α̂ + β̂ x_k) were used to form both Ŷ_MCAL and Ŷ_GREG. The simulation variance was markedly lower for Ŷ_MCAL. (The linear GREG (3.2), identical to the model-free calibration estimator, was also included in the Wu and Sitter study; not surprisingly, it is even less efficient than the non-linear GREG under the strongly non-linear relationship imposed in their experiment.)

Montanari and Ranalli (2005) provide further evidence, for several artificially created populations, on the comparison between Ŷ_MCAL and the non-linear GREG Ŷ_GREG. Their assisting model, y_k = μ_k + ε_k, is fitted via nonparametric regression (local polynomial smoothing), yielding predictions ŷ_k = μ̂_k for k ∈ U. With this type of model fit, the predictions ŷ_k = μ̂_k are highly accurate. Not surprisingly, the model-calibration estimator Ŷ_MCAL then achieves only marginal improvement over the non-linear GREG Ŷ_GREG.

We can summarize the calibration approach as follows: The estimator of Y = Σ_U y_k has the linearly weighted form Ŷ = Σ_s w_k y_k. In linear (model-free) calibration, the calibration equation reads Σ_s w_k x_k = Σ_U x_k; a known population auxiliary total Σ_U x_k is required, but complete auxiliary information (known x_k for all k ∈ U) is not; the same weights can be applied to all y-variables (multi-purpose weighting); and the estimator is identical to the linear GREG estimator (but derived by different reasoning). In model-calibration, the assisting model mean μ_k is non-linear in x_k; complete auxiliary information is usually required; the calibration constraints include the equation Σ_s w_k ŷ_k = Σ_U ŷ_k; and the weights w_k depend on the y_k-values, implying a loss of the multi-purpose property.

5. Computational aspects, extreme weights and outliers

The computation of calibrated weights raises important practical issues, discussed in a number of papers. All computation must proceed smoothly and routinely in the large-scale statistics production of a national statistical agency. Undesirable (or unduly variable) weights should be avoided. Many practitioners support the reasonable requirement that all weights be positive (even greater than unity) and that very large weights be avoided.

A few of the weights computed according to (3.2) can turn out to be quite large or negative. Huang and Fuller (1978) and Park and Fuller (2005) proposed methods to avoid undesirable weights. In the distance minimization method, the distance function can be formulated so that negative weights are excluded, while still satisfying the given calibration equations. The software CALMAR (Deville, Särndal and Sautory 1993) allows several distance functions of this kind. An extended version, CALMAR2, is described in LeGuennec and Sautory (2002). Other statistical agencies have developed their own software for weight computation. Among those are GES (Statistics Canada), CLAN97 (Statistics Sweden), Bascula 4.0 (Central Bureau of Statistics, The Netherlands) and g-CALIB-S (Statistics Belgium). These strive, in different ways, to resolve the computational issues arising. The user needs to consult the users' guide in each particular case to see exactly how the computational issues, including the avoidance of undesirable weights, are handled.

GES uses mathematical programming to minimize the chi-square distance, subject to the calibration constraints as well as to individual bounds on the weights, so that they satisfy A_k ≤ w_k ≤ B_k for specified A_k, B_k.
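To make the weight computation concrete: with the chi-square distance Σ_s (w_k − d_k)²/(2 d_k) and a single scalar auxiliary variable, the calibrated weights have the closed form w_k = d_k (1 + λ x_k). The sketch below is illustrative only; the data and function names are invented, and production systems such as GES or CALMAR additionally handle multivariate controls and weight bounds.

```python
# Minimal sketch of linear (model-free) calibration with one auxiliary
# variable x and the chi-square distance sum_s (w_k - d_k)^2 / (2 d_k),
# i.e., q_k = 1. All numbers are invented for illustration.

def calibrate(d, x, x_total):
    """Weights w_k = d_k (1 + lam * x_k) satisfying sum_s w_k x_k = x_total."""
    lam = (x_total - sum(dk * xk for dk, xk in zip(d, x))) / \
          sum(dk * xk * xk for dk, xk in zip(d, x))
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]

d = [10.0, 10.0, 10.0, 10.0]   # design weights d_k = 1/pi_k on the sample
x = [2.0, 3.0, 5.0, 7.0]       # auxiliary values x_k observed on the sample
X_U = 180.0                    # known population total of x

w = calibrate(d, x, X_U)
print(round(sum(wk * xk for wk, xk in zip(w, x)), 6))  # 180.0: equation holds
print(all(wk > 0 for wk in w))                         # True: no negative weights here
```

With several auxiliary variables, λ becomes a vector obtained from a linear system, and, as the text notes, some of the resulting weights can then be negative or extreme, which motivates the bounded-distance methods discussed above.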

Bascula 4.0 is described in Nieuwenbroek and Boonstra (2002). The software g-CALIB-S, described in Vanderhoeft, Waeytens and Museux (2001) and Vanderhoeft (2001), uses a generalized inverse (the Moore–Penrose inverse) for the weight computation; consequently, one need not be concerned about a possible redundancy in the auxiliary information.

In Bankier, Houle and Luc (1997) the objective is two-fold: to keep the computed weights within desirable bounds, and to drop some x-variables to remove near-linear dependencies. Isaki, Tsay and Fuller (2004) consider quadratic programming to obtain both household weights and person weights that lie within specified bounds.

An intervention with the weights (so as to get rid of undesirable weight values) raises the question how far one can deviate from the design weights d_k without compromising the desirable feature of nearly design unbiased estimation. An idea that has been tried is to modify the set of constraints so that tolerances are respected for the difference between the estimator for the auxiliary variables and the corresponding known population totals. Hence, Chambers (1996) minimizes a "cost-ridged loss function".

Outlying values in the auxiliary variables may be a cause of extreme weights. Calibration in the presence of outliers is discussed in Duchesne (1999). His technique of "robust calibration" may introduce a certain bias in the estimates; it may, however, be more than offset by a reduction in variance.

When the set of constraints is extended to restrict the weights to specified intervals, a solution to the optimization problem is not guaranteed. The existence of a solution is considered in Théberge (2000), who also proposes methods for dealing with outliers.

6. Calibration estimation for more complex parameters

The calibration approach adapts itself to the estimation of more complex parameters than a population total. Examples are reviewed in this section. Single-phase sampling and full response continue to be assumed; the notation remains as in Section 2. One example is the estimation of population quantiles (Section 6.1), another the estimation of functions of totals (Section 6.2). Other examples in this category, not reviewed here, are Théberge (1999), for the estimation of bilinear parameters, and Tracy, Singh and Arnab (2003), for calibration with respect to second-order moments.

6.1 Calibration for estimation of quantiles

The median and other quantiles of the finite population are important descriptive measures, especially in economic surveys. To estimate quantiles, the finite population distribution function must first be estimated. Before calibration became popular, several papers considered the estimation of quantiles, with or without the use of auxiliary information. More recent articles have turned to the calibration approach for the same purpose, including Kovačević (1997), Wu and Sitter (2001), Ren (2002), Tillé (2002), Harms (2003), Harms and Duchesne (2006) and Rueda et al. (2007). As these papers illustrate, there is more than one way to implement the calibration approach. The non-smooth character of the finite population distribution function causes certain complexities; these are resolved by different authors in different ways.

Let Δ(·) denote the Heaviside function, defined for all real z so that Δ(z) = 1 if z ≥ 0 and Δ(z) = 0 if z < 0. The unknown distribution function of the study variable y is

F_y(t) = (1/N) Σ_U Δ(t − y_k).   (6.1)

The α-quantile of the finite population is defined as Q_{y,α} = inf{t | F_y(t) ≥ α}. The auxiliary variable x_j, taking values x_{jk}, has the distribution function F_{x_j}(t) = (1/N) Σ_U Δ(t − x_{jk}), with α-quantile denoted Q_{x_j,α}, j = 1, 2, ..., J. A natural estimator of F_y(t) based on the design weights d_k = 1/π_k is

F̂_{y;d}(t) = (1/Σ_s d_k) Σ_s d_k Δ(t − y_k).

A calibration estimator of F_y(t) takes the form

F̂_{y;CAL}(t) = (1/Σ_s w_k) Σ_s w_k Δ(t − y_k)   (6.2)

where the weights w_k are suitably calibrated to specified auxiliary information; from F̂_{y;CAL}(t) we then obtain the α-quantile estimator Q̂_{y,α} = inf{t | F̂_{y;CAL}(t) ≥ α}. A formula analogous to (6.2) holds for F̂_{x_j;CAL}(t).

Without explicit reference to any model, Harms and Duchesne (2006) specify the information available for calibration as a known population size, N, and known population quantiles Q_{x_j,α} for j = 1, 2, ..., J. The complete auxiliary information, with values x_k = (x_{k1}, ..., x_{kJ})′ known for k ∈ U, is not required. (But in practice, the complete information would usually be necessary, because accurate quantiles of several x-variables are not likely to be importable from outside sources.) They determine the w_k to minimize the chi-square distance Σ_s (w_k − d_k)²/(2 d_k q_k), for specified q_k, subject to the calibration equations

Σ_s w_k = N;  Q̂_{x_j,CAL,α} = Q_{x_j,α},  j = 1, 2, ..., J

for suitably defined estimates Q̂_{x_j,CAL,α}. Now, if we were to specify Q̂_{x_j,CAL,α} = inf{t | F̂_{x_j,CAL}(t) ≥ α}, then it is in general not possible to find an exact solution of the calibration problem as stated.
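In code, the step-function estimator (6.2) and the rule Q̂ = inf{t | F̂(t) ≥ α} amount to accumulating normalized weights over the sorted y-values. A minimal sketch follows (invented data; the weights stand for any set of design or calibrated weights):

```python
# Sketch of the weighted distribution-function estimator of the form (6.2)
# and the alpha-quantile Q = inf{t | F(t) >= alpha}. Toy data, my notation.

def F_hat(t, y, w):
    """Weighted step function: sum of w_k with y_k <= t, divided by sum w_k.
    (Delta(t - y_k) = 1 exactly when y_k <= t.)"""
    return sum(wk for yk, wk in zip(y, w) if yk <= t) / sum(w)

def quantile(alpha, y, w):
    """Smallest observed y-value at which F_hat reaches alpha (the inf is
    attained at a jump point, so scanning the sorted y-values suffices)."""
    for t in sorted(y):
        if F_hat(t, y, w) >= alpha:
            return t

y = [12.0, 7.0, 30.0, 18.0, 9.0]   # study variable on the sample
w = [4.0, 6.0, 2.0, 3.0, 5.0]      # calibrated (or design) weights

print(quantile(0.5, y, w))          # weighted median: 9.0
```

Replacing the step function Δ(·) by an interpolated version, as Harms and Duchesne do, smooths F_hat and makes the calibration constraints on the x-quantiles solvable.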

Instead, Harms and Duchesne substitute smoothed estimators, called "interpolated distribution estimators", of the distribution functions F_{x_j}(t), j = 1, 2, ..., J. They replace Δ(·) by a slightly modified function. Weights w_k can now be obtained, as well as a corresponding estimated distribution function F̂_{y,CAL}(t); finally, Q_{y,α} is estimated as Q̂_{y,α} = F̂_{y,CAL}^(−1)(α).

The resulting calibrated weights w_k allow us to retrieve the known population quantiles of the auxiliary variables. This is reassuring; one would expect such weights to produce reasonable estimators for the quantiles of the study variable y. Moreover, in the case of a single scalar auxiliary variable x, the resulting calibration estimator delivers exact population quantiles for y when the relationship between y and x is exactly linear, that is, when y_k = β x_k for all k ∈ U. An idea involving smoothed distribution functions is also used in Tillé (2002).

The computationally simpler method of Rueda et al. (2007) is an application of model-calibration, in that they calibrate with respect to a population total of predicted y-values. Complete auxiliary information is required. Using the known x_k, compute first the linear predictions ŷ_k = β̂′x_k for k ∈ U, with β̂ = (Σ_s d_k q_k x_k x_k′)^(−1) (Σ_s d_k q_k x_k y_k), where d_k = 1/π_k and the q_k are specified scale factors. The weights w_k are obtained by minimizing the chi-square distance subject to calibration equations stated in terms of the predictions, so as to have consistency at J arbitrarily chosen points t_j, j = 1, ..., J:

(1/N) Σ_s w_k Δ(t_j − ŷ_k) = F_ŷ(t_j),  j = 1, ..., J

where F_ŷ(t_j) is the finite population distribution function of the predictions ŷ_k, evaluated at t_j. It is suggested that a fairly small number of arbitrarily selected points t_j may suffice, say fewer than 10. Once the w_k are determined, the α-quantile estimate is obtained from F̂_{y,CAL}(t) = (1/N) Σ_s w_k Δ(t − y_k).

Quantile estimation provides a good illustration that the calibration approach can be carried out in more than one way when somewhat more complex parameters are being estimated. Both methods mentioned give nearly design unbiased estimation. The Harms and Duchesne (2006) weights are multi-purpose, independent of the y-variable; by contrast, the method of Rueda et al. (2007) requires a new set of weights for every new y-variable. Empirical evidence, by simulation, suggests that both methods compare favourably with the earlier quantile estimation methods, not based explicitly on calibration thinking (but relying on the same auxiliary information).

An extension of the calibration approach to the estimation of other complex parameters, such as the Gini coefficient, is sketched in Harms and Duchesne (2006).

6.2 Calibration for other complex parameters

Plikusas (2006) and Krapavickaitė and Plikusas (2005) examine calibration estimation of certain functions of population totals. (Their term "non-linear calibration" signifies "non-linear function of totals"; I do not use it here.) A simple example is the estimation of a ratio of two totals, R = Σ_U y_{1k} / Σ_U y_{2k}, where y_{1k} and y_{2k} are the values for element k of the variables y_1 and y_2, respectively. (The distribution function (6.1) is in effect also of ratio type, with y_{2k} = 1 and N = Σ_U 1 as the denominator total.) These authors examine the calibration estimator R̂_CAL = Σ_s w_k y_{1k} / Σ_s w_k y_{2k}. Its weights w_k, common to the numerator and the denominator, are determined by calibration to auxiliary information stated as follows: there is one auxiliary variable, x_{1k}, for y_{1k}, and another, x_{2k}, for y_{2k}; the ratio of totals R_0 = Σ_U x_{1k} / Σ_U x_{2k} is a known value, obtained by a complete enumeration at a previous occasion or from some other accurate source. The proposed calibration equation is Σ_s w_k e_k = 0, where e_k = x_{1k} − R_0 x_{2k}. Because Σ_U e_k = 0, the weights, by minimum chi-square distance, are

w_k = d_k {1 − (Σ_s d_k e_k)(Σ_s d_k e_k²)^(−1) e_k}.

These weights correctly retrieve the known ratio value R_0; setting y_{1k} = x_{1k} and y_{2k} = x_{2k} in R̂_CAL, we have

(Σ_s w_k x_{1k})/(Σ_s w_k x_{2k}) − R_0 = (Σ_s w_k e_k)/(Σ_s w_k x_{2k}) = 0.

The empirical evidence in Plikusas (2006) and Krapavickaitė and Plikusas (2005) suggests that their calibration estimator compares favourably (lower variance, while maintaining near design unbiasedness) with other estimators, derived through arguments other than calibration but relying on the same auxiliary information.

7. Calibration contrasted with other approaches

As many have noted, users view calibration as a simple and convincing way to incorporate auxiliary information, for simple parameters (Section 4) as for more complex parameters such as quantiles, ratios and others (Section 6). Simplicity and practicality are undeniable advantages, but aside from that, is calibration also "theoretically superior"? Are there instances where calibration can be shown to give more accurate and/or more satisfactory answers to questions of importance, when contrasted with other design-based approaches?

Section 4.5 gave one indication that calibration thinking may have an advantage over GREG thinking, in that model-calibration may give more precise estimates than the
non-linear GREG, for the same assisting model. The following Section 7.1 gives another example where calibration reasoning and GREG reasoning give diverging answers, with an advantage for the calibration method.

7.1 An example in domain estimation

The example in this section, from Estevao and Särndal (2004), shows, for a simple practical situation, a conflict between the results of GREG thinking and calibration thinking. The context is the estimation of the y-total for a sub-population (a domain).

A probability sample s is drawn from U = {1, 2, ..., k, ..., N}; the known design weights are d_k = 1/π_k. Let U_a be a domain, U_a ⊂ U. The domain indicator is δ_{ak}, with value δ_{ak} = 1 if k ∈ U_a and δ_{ak} = 0 if not. The target of estimation is the domain total Y_a = Σ_U y_{ak}, where y_{ak} = δ_{ak} y_k, and y_k is observed for k ∈ s. The Horvitz–Thompson estimator Ŷ_HT = Σ_s d_k y_{ak}, although design unbiased, has low precision, especially if the domain is small; the use of auxiliary information will bring improvement. An auxiliary vector value x_k is specified for every k ∈ U.

As is frequently the case in practice, the elements belonging to a domain of interest are not identified in the sampling frame. (If they are, some very powerful information is available from the start, but frequently real-world conditions are not that favourable.) But suppose some elements in a larger group U_C are identifiable, U_a ⊂ U_C ⊂ U. For example, suppose y is "income" and U_C a professional group specified for the persons listed in the frame, while U_a is a professional sub-group not identified in the frame. We can identify the sample subsets s_C = s ∩ U_C and s_a = s ∩ U_a, and we can benefit from knowing the total Σ_U x_{Ck}, estimable without bias by Σ_s d_k x_{Ck}, where x_{Ck} = δ_{Ck} x_k and δ_{Ck} is the group indicator: δ_{Ck} = 1 if k ∈ U_C and δ_{Ck} = 0 if not. The domain auxiliary total Σ_U x_{ak} is unavailable, because U_a is not identified. Calibration to satisfy Σ_s w_k x_{Ck} = Σ_U x_{Ck} gives the nearly design unbiased estimator Ŷ_{a,CAL} = Σ_s w_k y_{ak}, where w_k = d_k (1 + λ′z_k), with λ′ = (Σ_U x_{Ck} − Σ_s d_k x_{Ck})′ (Σ_s d_k z_k x′_{Ck})^(−1). The asymptotically optimal instrument for the given vector x_k is (see Section 4.3) z_k = z_{0k} = d_k^(−1) Σ_{l∈s} (d_k d_l − d_{kl}) x_{Cl}.

By contrast, regression thinking for the same auxiliary information leads to Ŷ_{a,GREG} = Σ_s d_k y_{ak} + (Σ_U x_{Ck} − Σ_s d_k x_{Ck})′ B_{s̃;d}, also nearly design unbiased, where the regression coefficient B_{s̃;d} = (Σ_{s̃} d_k x_k x′_k)^(−1) Σ_{s̃} d_k x_k y_k is the result of a weighted least squares fit at a suitable level, using all (when s̃ = s) or part (when s̃ ⊂ s) of the data points (y_k, x_k) available for k ∈ s. For example, the modeller may opt for a regression fit "extending beyond the domain" (so that s̃ ⊃ s_a = s ∩ U_a), in an attempt to borrow strength for Ŷ_{a,GREG} by letting it depend also on y-data from outside the domain. By contrast, Ŷ_{a,CAL} relies exclusively on y-data in the domain, and this is in effect better. Estevao and Särndal (2004) show that Ŷ_{a,CAL} with z_k = z_{0k} has smaller (asymptotic) variance than Ŷ_{a,GREG}, no matter how s̃ is chosen. Bringing in y-data from the outside does not help; calibration thinking and regression thinking do not agree.

8. Calibration estimation in the presence of composite information

As the preceding sections have shown, many papers choose to study estimation for direct, single-phase sampling of elements, without any nonresponse. The information available for calibration is then simple; the k:th element of the finite population U = {1, 2, ..., k, ..., N} has an associated auxiliary vector value x_k.

However, in an important category of situations, the auxiliary information has composite structure. The complexity of the information increases with that of the sampling design. In designs with two or more phases, or with two or more stages, the information is typically composed of more than one component, reflecting the features of the design. The information is stated in terms of more than one auxiliary vector. For example, in two-stage sampling, information may be available about the first-stage sampling units (the clusters), and other information about the second-stage units (the elements).

Consequently, estimation by calibration (or by any alternative method) must take the composite structure of the information systematically into account. The total information has several pieces; the calibration can be done in more than one way. All relevant pieces should be taken into account, for the best possible accuracy in the estimates. To accomplish this in a general or "optimal" way is not a trivial task. Calibration reasoning offers one way. Regression reasoning, with a duly formulated assisting model, is an alternative way, but it will strike some users as more roundabout. Hence, surveys that allow composite auxiliary information bring further perspectives on the contrast between calibration thinking and GREG thinking. Two-phase sampling and two-stage sampling are discussed in this section. Another example of composite information occurs for nonresponse bias adjustment, as discussed in Section 9.

Another aspect of composite information occurs when the objective is to combine information from several surveys. This, too, can be a way to add strength and improve the accuracy of the estimates. It is a motivating factor (in addition to the user-oriented motive to achieve consistency
among surveys) in the previously mentioned repeated weighting methodology of the Dutch statistical agency. Combined auxiliary information for GREG estimation is considered in Merkouris (2004).

8.1 Composite information for two-phase sampling designs

Double sampling refers to designs involving two probability samples, s_1 and s_2, from the same population U = {1, ..., k, ..., N}. Auxiliary data may be recorded for both U and s_1; the study variable values y_k are recorded only for k ∈ s_2, with the objective to estimate Y = Σ_U y_k. Hidiroglou (2001) distinguishes several kinds of double sampling. In the nested case (traditional two-phase sampling), the first-phase sample s_1 is drawn from U, and the second-phase sample s_2 is a sub-sample from s_1, so that U ⊃ s_1 ⊃ s_2. Two non-nested cases can be distinguished: in the first of these, s_1 is drawn from the frame U_1 and s_2 from the frame U_2, where U_1 and U_2 cover the same population U; the sampling units may be defined differently for the two frames. In the second non-nested case, s_1 and s_2 are drawn independently from U.

To illustrate how composite information intervenes in the estimation, consider the nested case. The design weights are d_{1k} = 1/π_{1k} (s_1 sampled from U) and d_{2k} = 1/π_{2k} (π_{2k} = π_{k|s_1} in sub-sampling s_2 from s_1). The combined design weight is d_k = d_{1k} d_{2k}. The basic unbiased estimator Ŷ = Σ_{s_2} d_k y_k can be improved by a use of auxiliary information, specified here at two levels:

Population level: the vector value x_{1k} is known (given in the frame) for every k ∈ U, thus known for every k ∈ s_1 and for every k ∈ s_2; Σ_U x_{1k} is a known population vector total;

First sample level: the vector value x_{2k} is known (observed) for every k ∈ s_1, and thereby known for every k ∈ s_2; the unknown total Σ_U x_{2k} is estimated without bias by Σ_{s_1} d_{1k} x_{2k}.

How do we best take this composite information into account? In an adaptation of GREG thinking, Särndal and Swensson (1987) formulated two linear assisting models, the first one stated in terms of the x_{1k}-vector, while the other one also brings in the x_{2k}-vector. The two models are fitted; the resulting predictions, of two kinds, are used to create an appropriate GREG estimator Ŷ_GREG of Y = Σ_U y_k.

Dupont (1995) makes the important point that the given composite information invites "two different natural approaches": besides the GREG approach, there is a calibration approach that will deliver final weights w_k for a calibration estimator Ŷ_CAL = Σ_{s_2} w_k y_k. It is of interest to compare the results of the two approaches. Both of them allow more than one option: in the GREG approach, there are alternative ways of formulating the linear assisting models with their respective variance structures; in the calibration approach, alternative formulations of the calibration equations are possible.

For example, a two-step calibration option is as follows: first find intermediate weights w_{1k} to satisfy Σ_{s_1} w_{1k} x_{1k} = Σ_U x_{1k}; then use the weights w_{1k} in the second step to compute the final weights w_k to satisfy

Σ_{s_2} w_k x_k = ( Σ_U x_{1k} ; Σ_{s_1} w_{1k} x_{2k} )

where x_k is the combined (stacked) auxiliary vector x_k = (x_{1k} ; x_{2k}).

Alternatively, in a single-step option, we determine the w_k directly to satisfy

Σ_{s_2} w_k x_k = ( Σ_U x_{1k} ; Σ_{s_1} d_{1k} x_{2k} ).

The final weights w_k are in general not identical in the two options. Suppose that Σ_U x_{1k} is an imported x_1-total. At closer look, the two-step option requires more extensive information, because individually known values x_{1k} are required for k ∈ s_1, whereas it is sufficient in the single-step option that they be available for k ∈ s_2. Some variance advantage may thus be expected from the two-step option, since Σ_{s_1} w_{1k} x_{2k} is often more accurate (as an estimator of Σ_U x_{2k}) than Σ_{s_1} d_{1k} x_{2k} in the single-step procedure. Nevertheless, this anticipation is not always confirmed; the single-step method can be better, as when x_1 and x_2 are weakly correlated.

Dupont (1995) and Hidiroglou and Särndal (1998) examine links that exist, not surprisingly, between the two approaches. A GREG estimator, derived from assisting models with specific variance structures, may be identical to a calibration estimator, if the weights of the latter are calibrated in a certain way. In other cases, the differences may be small.

The efficiency of the different options depends in rather subtle ways on the pattern of correlation among y_k, x_{1k} and x_{2k}. For example, to what extent do x_1 and x_2 complement each other, and to what extent are they substitutes for one another? In the GREG approach, it is difficult or even futile to pinpoint a variance structure that truly captures a "reality" behind the data. The calibration approach is more direct. Some of its possibilities are explored in Estevao and Särndal (2002, 2006).
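A small numerical sketch may help fix ideas about the two options. For simplicity, each calibration step below uses a single scalar control with the closed-form chi-square solution, rather than the combined vector (x_1 ; x_2) of the displayed equations; all values are invented.

```python
# Simplified sketch of the two-step calibration option of Section 8.1,
# nested two-phase sampling, chi-square distance, scalar controls only.
# (The full method calibrates the second step on the stacked vector
# (x1, x2); here each step uses one scalar control, purely to illustrate
# how the first-step weights feed the second-step control total.)

def calibrate(d, x, target):
    """Closed-form chi-square calibration: w_k = d_k (1 + lam * x_k)."""
    lam = (target - sum(a * b for a, b in zip(d, x))) / \
          sum(a * b * b for a, b in zip(d, x))
    return [a * (1.0 + lam * b) for a, b in zip(d, x)]

# First-phase sample s1: x1 known on all of U, x2 observed on s1 only.
d1 = [5.0] * 6                            # first-phase design weights
x1_s1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_s1 = [2.0, 2.0, 4.0, 5.0, 7.0, 8.0]
X1_U = 100.0                              # known population total of x1

# Step 1: intermediate weights on s1, calibrated to the known x1 total.
w1 = calibrate(d1, x1_s1, X1_U)

# Step 2: s2 = first three elements of s1; combined weight d_k = d1k * d2k.
d = [wt * 2.0 for wt in d1[:3]]           # subsampling weight d2k = 2
x2_s2 = x2_s1[:3]
target_two_step = sum(a * b for a, b in zip(w1, x2_s1))  # calibrated x2 control
target_single = sum(a * b for a, b in zip(d1, x2_s1))    # design-weighted control

w = calibrate(d, x2_s2, target_two_step)
print(abs(sum(a * b for a, b in zip(w, x2_s2)) - target_two_step) < 1e-9)  # True
print(target_single != target_two_step)   # True: the two options differ
```

The two control totals for x_2 generally differ, so the final weights differ between the options, in line with the discussion above.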


8.2 Composite information in two-stage sampling designs

The traditional two-stage sampling set-up (clusters sampled at stage one, elements sub-sampled within selected clusters at stage two) has in common with two-phase sampling that the total information may have more than one component. There may exist (a) information at the cluster level (about the clusters); (b) information at the element level for all clusters; (c) information at the element level for the selected clusters only. Here again, authors are of two different orientations: some exploit the information via calibration thinking, others follow the GREG thinking route.

Estevao and Särndal (2006) develop calibration estimation for the traditional two-stage set-up, with composite information specified as follows: (i) for the cluster population U_I, there is a known total Σ_{U_I} x_{(c)i}, where x_{(c)i} is a vector value associated with the cluster U_i, for i ∈ U_I; (ii) for the population of elements U = ∪_{i∈U_I} U_i, there is a known total Σ_U x_k, where the vector value x_k is associated with the element k ∈ U. Suppose both cluster statistics and element statistics are to be produced in the survey: both the cluster population total Y_I = Σ_{U_I} y_{(c)i} and the element population total Y = Σ_U y_k are to be estimated.

If no relation is imposed between the cluster weights w_{Ii} and the element weights w_k, the former are calibrated to satisfy Σ_{s_I} w_{Ii} x_{(c)i} = Σ_{U_I} x_{(c)i}, the latter to satisfy Σ_s w_k x_k = Σ_U x_k. (Here, s_I is the sample of clusters from U_I; s_i is the sample of elements from the cluster U_i; s = ∪_{i∈s_I} s_i is the entire sample of elements.) Then Ŷ_{I,CAL} = Σ_{s_I} w_{Ii} y_{(c)i} estimates the cluster population total Y_I, and Ŷ_CAL = Σ_s w_k y_k estimates the element population total Y.

Integrated weighting is often used in practice: a convenient relationship is imposed between the cluster weight w_{Ii} and the weights w_k for the elements within the selected cluster. Two forms of integrated weighting are discussed in Estevao and Särndal (2006).

One of these is to impose w_k = d_{k|i} w_{Ii}, where d_{k|i} is the inverse of the probability of selecting element k within cluster i. (For example, in single-stage cluster sampling, when all elements k in a sampled cluster are selected, d_{k|i} = 1. Consequently w_k = w_{Ii} is imposed, and all elements in the cluster receive the same weight for computing element statistics; that same weight is also used for computing cluster statistics.) The calibration equation Σ_s w_k x_k = Σ_U x_k then reads Σ_{s_I} w_{Ii} Σ_{s_i} d_{k|i} x_k = Σ_U x_k. The cluster weights w_{Ii} are now derived by minimizing Σ_{s_I} (w_{Ii} − d_{Ii})²/d_{Ii} subject to the calibration equation that takes both kinds of information into account:

( Σ_{s_I} w_{Ii} x_{(c)i} ; Σ_{s_I} w_{Ii} Σ_{s_i} d_{k|i} x_k ) = ( Σ_{U_I} x_{(c)i} ; Σ_U x_k ).   (8.1)

Once the w_{Ii} are determined, the element weights w_k = d_{k|i} w_{Ii} follow.

Another reasonable integrated weighting is to impose Σ_{s_i} w_k = N_i w_{Ii}. For single-stage cluster sampling, for example, it implies that the cluster weight w_{Ii} is the average of the element weights w_k in that cluster.

Two-stage sampling is also the topic of Kim, Breidt and Opsomer (2005). They assume auxiliary information for the clusters, via a single quantitative cluster variable x_{(c)i}, but none for the elements. They develop and examine a GREG-type estimator of the element total Y = Σ_U y_k, namely Ŷ = Σ_{i∈U_I} μ̂_i + Σ_{i∈s_I} d_{Ii} (t̂_i − μ̂_i), where t̂_i is design unbiased for the cluster total t_i = Σ_{U_i} y_k, and μ̂_i is obtained by a local polynomial regression fit. The estimator can be expressed in the linearly weighted form, with weights that turn out to be calibrated to the population totals of powers of the cluster variable x_{(c)i}.

8.3 Household weighting and person weighting

Some important social surveys set the objective to produce both household estimates and person estimates; some study variables are household (cluster) variables, others are person (element) variables. Consequently, a number of papers have addressed the situation with single-stage cluster sampling (d_{k|i} = 1) and the integrated weighting that gives all members of a selected household equal weight, a weight also used for producing household statistics. A general solution for this weighting problem, when both household information and person information are specified, is to obtain the household weights w_{Ii} calibrated as in equation (8.1) with d_{k|i} = 1, and then take w_k = w_{Ii}.

Several articles focus on auxiliary vector values x_k attributed to persons. Alexander (1987) derives weights by minimizing a chi-square distance, whereas Lemaître and Dufour (1987) and Nieuwenbroek (1993) derive the integrated weights via a GREG estimator. The Lemaître and Dufour technique proceeds by an indirect construction of an "equal shares auxiliary vector value" for all persons in a selected household; their result is derivable from the direct procedure in Section 8.2.

The household-weighting/person-weighting question is revisited in more recent papers. Some authors display calibration thinking, others GREG thinking. Isaki, Tsay and Fuller (2004) formulate the problem as one of calibrated weighting; their weights respect both household controls and person controls; no explicit assisting models are formulated. By contrast, Steel and Clark (2007) proceed by the GREG approach, with linear assisting model statements and accompanying variance structures.
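As a toy illustration of integrated weighting under single-stage cluster sampling (d_{k|i} = 1): the household weights are calibrated to a person-level control by treating the household sum of the person x-values as the household's auxiliary value, and each person then inherits the household weight. The data and names below are invented, and only one scalar person control is used; real applications calibrate to several household and person controls jointly, as in (8.1).

```python
# Sketch of integrated weighting, single-stage cluster sampling (d_{k|i}=1):
# calibrate household weights w_Ii so that the person-level calibration
# equation sum_s w_k x_k = X_U holds with w_k = w_Ii for every member.

def calibrate(d, z, target):
    """Closed-form chi-square calibration on one scalar control z."""
    lam = (target - sum(a * b for a, b in zip(d, z))) / \
          sum(a * b * b for a, b in zip(d, z))
    return [a * (1.0 + lam * b) for a, b in zip(d, z)]

households = [[1.0, 2.0], [1.0, 1.0, 3.0], [2.0]]  # person x-values by household
d_I = [8.0, 8.0, 8.0]          # household design weights
X_U = 70.0                     # known person-level population total of x

z = [sum(h) for h in households]   # household sums: [3.0, 5.0, 2.0]
w_I = calibrate(d_I, z, X_U)       # calibrated household weights

# Every person k in household i gets w_k = w_Ii; the person-level
# calibration equation is then satisfied automatically:
check = sum(wi * xk for wi, h in zip(w_I, households) for xk in h)
print(round(check, 6))              # 70.0
```

The point of the construction is visible in the last line: because w_k = w_{Ii} and the household control is the sum of the person values, calibrating at the household level enforces the person-level equation at the same time.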


9. Calibration for nonresponse adjustment

9.1 Traditional adjustment for nonresponse

The context of many good theory articles is the simple one of Section 2, which includes total absence of nonresponse. It is good theory for conditions that seldom or never occur. (As an author of papers in that stream, I am not without guilt.) Practically all surveys encounter nonresponse; although undesirable, it is a natural feature, and theory should incorporate it, from the outset, via a perspective of selection in two phases.

In many surveys, nonresponse rates are extremely high today, compared with what they were 40 years ago, when they were so low that one could essentially ignore the problem. Today, survey sampling theory needs more and more to address the damaging consequences of nonresponse. In particular, one pressing objective is to examine the bias and to try to reduce it as far as possible.

A probability sample s is drawn from U = {1, 2, ..., k, ..., N}; the known design weight of element k is d_k = 1/π_k. Nonresponse occurs, leaving a response set r, a subset of s; the study variable value y_k is observed for k ∈ r only. The unknown response probability of element k is Pr(k ∈ r | s) = θ_k. The unbiased estimator Ŷ = Σ_r d_k φ_k y_k is ruled out because φ_k = 1/θ_k is unknown. To keep the idea of a linearly weighted sum, how do we then construct the weights? Unit nonresponse adjustment by weighting, based on "nonresponse modeling", has a long history. Calibration offers a newer perspective.

In what we may call "the traditional procedure", the probability design weights d_k = 1/π_k are first adjusted for nonresponse and possibly for other imperfections such as outliers. The information used for this step is often a grouping of the sampled elements. Finally, if reliable population totals are accessible, the adjusted design weights are subjected to a calibration with respect to those totals.

The methodology of the Labour Force Survey of Canada, described in Statistics Canada (1998), exemplifies this widespread practice. A (modified) design weight is first computed for a given household, as the product of three factors. The product of the design weight and a nonresponse adjustment factor is called the sub-weight. The sub-weights are subjected in the final step to a calibration with respect to postcensal, highly accurate estimates of population by age group, sex and sub-provincial regions. The final weights meet the desirable objective of consistency, in regions within a province, with the postcensal estimates. The nonresponse bias remaining in the resulting estimates is unknown but believed to be modest.

The traditional procedure is embodied in the estimator type Ŷ = Σ_r d_k (1/θ̂_k) y_k, where θ_k has been estimated by θ̂_k in a preliminary step, using response (propensity) modeling. What theory demands of the statistician is not an easy task, namely, to formulate "the true response model", capable of providing accurate, non-biasing values θ̂_k. But the factors 1/θ̂_k are applied in many surveys in an uncritical and mechanical fashion, for example, by straight expansion within the strata already used for sample selection. The traditional procedure is apparent for example in Ekholm and Laaksonen (1991) and in Rizzo, Kalton and Brick (1996).

Practitioners often act as if the resulting Ŷ = Σ_r d_k (1/θ̂_k) y_k (following a more or less probing response modeling trying to get the θ̂_k) is essentially unbiased, something which it is not (unless the ideal model happens to be specified); one acts (for purposes of variance estimation, for example) as if π_k θ̂_k is the true selection probability of element k in a single step of selection, something which it is definitely not. This practice, with roots in the idyllic past, becomes more and more vulnerable as nonresponse rates continue their surreptitious climb.

An unavoidable bias results from the replacement of θ_k by θ̂_k. Decades ago, when the typical nonresponse was but a few per cent, it was defensible to ignore this bias, but with today's galloping nonresponse rates, the practice becomes untenable. By first principles, unbiased estimation is the goal, not an estimation where the squared bias is a dominating (and unknown) contributor to the Mean Squared Error. We must resolve to limit the bias as much as possible. Calibration reasoning can help in constructing an auxiliary vector that meets this objective.

9.2 Calibration for nonresponse bias adjustment

More or less contrasting with the traditional procedure are a number of recent papers that emphasize calibration reasoning to achieve the nonresponse adjustment. Recent references are Deville (1998, 2002), Ardilly (2006), chapter 3, Skinner (1998), Folsom and Singh (2000), Fuller (2002), Lundström and Särndal (1999), Särndal and Lundström (2005) and Kott (2006).

Calibration reasoning starts by assessing the total available auxiliary information: information at the sample level (auxiliary variable values observed for respondents and for nonrespondents), and information at the population level (known population auxiliary totals). The objective is to make the best of the two sources combined, so as to reduce both bias and variance. The design weights are modified, in one or two calibration steps, to make them reflect (i) the outcome of the response phase, (ii) the individual characteristics of the respondents, and (iii) the specified auxiliary information. The information can be summarized as follows:


Population level: The vector value x*_k is known (specified in the frame) for every k ∈ U, thus known for every k ∈ s and for every k ∈ r; Σ_U x*_k is a known population total.

Sample level: The vector value x°_k is known (observed) for every k ∈ s, and thereby known for every k ∈ r; the unknown total Σ_U x°_k is estimated without bias by Σ_s d_k x°_k.

Calibration on this composite information can be done in two steps (intermediate weights computed first, then used in the second step to produce final weights) or directly in a single step. Modest differences only are expected in bias and variance of the estimates. In the single step option, the combined auxiliary vector and the corresponding information are the stacked vectors

    x_k = (x*_k, x°_k) ;  X = (Σ_U x*_k, Σ_s d_k x°_k).

Using an extension of the instrument vector method in Section 4.3, we seek calibrated weights w_k = d_k v_k, where v_k = F(λ′ z_k) is the nonresponse adjustment factor, with a vector λ determined through the calibration equation Σ_r w_k x_k = X; the resulting calibration estimator is Ŷ_CAL = Σ_r w_k y_k. It is enough to specify the instrument vector value z_k for respondents only; z_k is allowed to differ from x_k. The function F(·) has the same role as in Sections 4.2 and 4.3. Here, F(λ′ z_k) implicitly estimates the inverse response probability φ_k = 1/θ_k, as Deville (2002), Dupont (1995) and Kott (2006) have noted. In the linear case, F(u) = 1 + u and v_k = 1 + λ′ z_k, with

    λ′ = (Σ_U x_k − Σ_r d_k x_k)′ (Σ_r d_k z_k x′_k)^(−1).

The variables that make up the vector x°_k, although observed for sampled elements only, can be crucially important for the reduction of nonresponse bias (although less important than the x*_k for the reduction of variance). For example, Beaumont (2005b) discusses data collection process variables that can be used in building the x°_k vector component.

9.3 Building the auxiliary vector

In some surveys, there are many potential auxiliary variables, as pointed out for example by Rizzo, Kalton and Brick (1996), and Särndal and Lundström (2005). For example, for surveys on households and individuals in Scandinavia, a supply of potential auxiliary variables can be derived from a matching of existing high-quality administrative registers. A decision then has to be made as to which of these variables should be selected for inclusion in the auxiliary vector x_k to make it as effective as possible, for bias reduction in particular. As Rizzo, Kalton and Brick (1996) point out, "the choice of auxiliary variables is … probably more important than the choice of the weighting methodology."

Let us examine the bias, when z_k = x_k. We need to compare alternative x-vectors in order to finally settle on one likely to yield the smallest bias. (I assume x_k to be such that μ′ x_k = 1 for all k and some constant vector μ, as is the case for many x_k-vectors, including the examples 1 to 5 at the beginning of Section 2.) A close approximation to the bias of Ŷ_CAL is obtained by Taylor linearization as

    nearbias(Ŷ_CAL) = (Σ_U x_k)′ (B_{U;θ} − B_U),

which involves a difference between the weighted regression coefficient B_{U;θ} = (Σ_U θ_k x_k x′_k)^(−1) Σ_U θ_k x_k y_k and the unweighted one, B_U = (Σ_U x_k x′_k)^(−1) Σ_U x_k y_k. Unless all θ_k are equal, the bias caused by the difference in the two regression vectors may be substantial, even though x_k is a seemingly "good auxiliary vector". This expression for nearbias is given in Särndal and Lundström (2005); related bias expressions, under different conditions, are found in Bethlehem (1988) and Fuller et al. (1994).

We can write alternatively nearbias(Ŷ_CAL) = Σ_U (θ_k M_k − 1) y_k, where M_k = (Σ_U x_k)′ (Σ_U θ_k x_k x′_k)^(−1) x_k. In comparing possible alternative x_k, a convenient benchmark is the "primitive auxiliary vector", x_k = 1 for all k ∈ U, which gives Ŷ_CAL = N ȳ_r = N Σ_r y_k / n_r, where n_r is the number of respondents, with nearbias(N ȳ_r) = N (ȳ_{U;θ} − ȳ_U), where ȳ_{U;θ} = Σ_U θ_k y_k / Σ_U θ_k and ȳ_U = Σ_U y_k / N. The ratio

    relbias(Ŷ_CAL) = nearbias(Ŷ_CAL) / nearbias(N ȳ_r) = Σ_U (θ_k M_k − 1) y_k / [N (ȳ_{U;θ} − ȳ_U)]

measures how well a candidate vector x_k succeeds in controlling the bias, when compared with the primitive vector. We seek an x_k that will give a small bias. But relbias(Ŷ_CAL) is not a computable bias indicator; it depends on the unobserved y_k and on the unobservable θ_k. We need a computable indicator that approximates relbias(Ŷ_CAL) and depends on the x-vector but not on the y-variables, of which the survey may have many.

It is easy to see that relbias(Ŷ_CAL) = 0 if an ideal (probably non-existent) x-vector could be constructed such that φ_k = 1/θ_k = λ′ x_k for all k ∈ U and some constant vector λ.

For an x-vector that can actually be formed in the survey, we can at least obtain predictions of the φ_k: Determine λ to minimize Σ_U θ_k (φ_k − λ′ x_k)²; we find λ = λ_U, where λ′_U = (Σ_U x_k)′ (Σ_U θ_k x_k x′_k)^(−1); the predicted value of φ_k is φ̂_kU = λ′_U x_k = M_k. The (theta-weighted) first and second moments of the predictions φ̂_kU = M_k are, respectively,

    M̄_{U;θ} = Σ_U θ_k M_k / Σ_U θ_k = N / Σ_U θ_k = 1/θ̄_U

and

    Q = (1/Σ_U θ_k) Σ_U θ_k (M_k − M̄_{U;θ})² = (1/θ̄_U)(M̄_U − 1/θ̄_U)


where M̄_U = Σ_U M_k / N. Särndal and Lundström (2007) show that relbias(Ŷ_CAL) and Q have under certain conditions an approximately linear relationship,

    relbias(Ŷ_CAL) ≈ 1 − Q/Q_0,

where φ̄_U = Σ_U φ_k / N and Q_0 = (1/θ̄_U)(φ̄_U − 1/θ̄_U) is the maximum value of Q. Thus if Q were computable, it could serve as an indicator for comparing the different candidate x_k-vectors. A computable analogue Q̂ of Q is instead obtained as the variance of the corresponding sample-based predictions φ̂_ks = λ̂′_s x_k = (Σ_s d_k x_k)′ (Σ_r d_k x_k x′_k)^(−1) x_k = m_k, so that

    Q̂ = (1/Σ_r d_k) Σ_r d_k (m_k − m̄_{r;d})² = m̄_{r;d} (m̄_{s;d} − m̄_{r;d}),

where m̄_{r;d} = Σ_r d_k m_k / Σ_r d_k = Σ_s d_k / Σ_r d_k and m̄_{s;d} = Σ_s d_k m_k / Σ_s d_k.

We expect relbias to decrease in a roughly linear fashion as Q̂ increases; thus, independently of the y-variables, Q̂ may be used as a tool for ranking different x-vectors in regard to their capacity to reduce the bias.

We can use Q̂ as a tool to select x-variables for inclusion in the x_k-vector, for example, by stepwise forward selection, so that variables are added to x_k one at a time, the variable to enter in a given step being the one that gives the largest increment in Q̂. The method is described in Särndal and Lundström (2007).

10. Calibration to account for other non-sampling error

Nonresponse errors are critical determinants of the quality of published statistics. When we examine how the calibration approach may intervene in the treatment of other sources of non-sampling error than the nonresponse, the literature to date is, not surprisingly, much less extensive. However, several authors sketch a calibration reasoning to also incorporate frame errors, measurement errors, and outliers. Calibration has a potential to provide a more general theory for estimation in surveys, encompassing the various non-sampling errors.

As Deville (2004) points out (my translation from the French): "The concept of calibration lends itself to be applied with ease and efficiency to a great variety of problems in survey sampling. Its scope goes beyond that of regression estimation, an idea to which some seem to wish to reduce the calibration approach". He provides a brief sketch of how a treatment of several of the nonresponse errors may be accomplished under the caption of calibration thinking.

Folsom and Singh (2000) present a weight calibration method using what they call the generalized exponential model (GEM). It deals with three aspects: extreme value treatment, nonresponse adjustment and calibration through post-stratification. The method provides built-in control for extreme values. Calibration to treat both coverage errors (under- or over-coverage of the frame) and nonresponse is discussed in Särndal and Lundström (2005) and Kott (2006). Skinner (1998) discusses uses of calibration in the presence of nonresponse and measurement error. He notes something which remains a challenge almost ten years later: "More research is needed to investigate the properties of calibration estimates in the presence of non-sampling errors".

11. Conclusion

If I am to select one issue for a concluding reflection on the contents of this paper, let me focus on the concept of auxiliary information. It is the pivotal concept in the paper. If there is no auxiliary information, there is no calibration approach; there is nothing to calibrate on. I noted on the other hand that regression (GREG) estimation is an alternative but different thought process for putting auxiliary information to work in the estimation.

An objective in this paper has been to give a portrait of the two types of reasoning, and I made a point of noting how the thinking differs. I gave examples where essentially the same estimation objective is tackled by some authors through calibration reasoning, by others through GREG reasoning (or at least primarily by one or the other type). The respective estimators that they end up recommending may or may not agree. Whether or not the difference has significant consequence (for variance, for bias, for other practical matters such as consistency and transparency) depends on the situation. This paper may help contribute an awareness of the separation existing between two thought processes that have guided researchers in survey sampling.

References

Alexander, C.H. (1987). A class of methods for using person controls in household weighting. Survey Methodology, 13, 183-198.

Andersson, P.G., and Thorburn, D. (2005). An optimal calibration distance leading to the optimal regression estimator. Survey Methodology, 31, 95-99.

Ardilly, P. (2006). Les techniques de sondage. Paris: Editions Technip.


Bankier, M.D., Rathwell, S. and Majkowski, M. (1992). Two step generalized least squares estimation in the 1991 Canadian Census. Working paper, Census Operations Section, Social Surveys Methods Division, Statistics Canada.

Bankier, M., Houle, A.M. and Luc, M. (1997). Calibration estimation in the 1991 and 1996 Canadian censuses. Proceedings, Section on Survey Research Methods, American Statistical Association, 66-75.

Beaumont, J.-F. (2005a). Calibrated imputation in surveys under a quasi-model-assisted approach. Journal of the Royal Statistical Society B, 67, 445-458.

Beaumont, J.-F. (2005b). On the use of data collection process information for the treatment of unit nonresponse through weight adjustment. Survey Methodology, 31, 227-231.

Beaumont, J.-F., and Alavi, A. (2004). Robust generalized regression estimation. Survey Methodology, 30, 195-208.

Bethlehem, J.G. (1988). Reduction of nonresponse bias through regression estimation. Journal of Official Statistics, 4, 251-260.

Bethlehem, J.G., and Keller, W.J. (1987). Linear weighting of sample survey data. Journal of Official Statistics, 3, 141-153.

Breidt, F.J., and Opsomer, J.D. (2000). Local polynomial regression estimators in survey sampling. The Annals of Statistics, 28, 1026-1053.

Breidt, F.J., Claeskens, G. and Opsomer, J.D. (2005). Model-assisted estimation for complex surveys using penalized splines. Biometrika, 92, 831-846.

Chambers, R.L. (1996). Robust case-weighting for multipurpose establishment surveys. Journal of Official Statistics, 12, 3-32.

Chambers, R.L., Dorfman, A.H. and Wehrly, T.E. (1993). Bias robust estimation in finite populations using nonparametric calibration. Journal of the American Statistical Association, 88, 268-277.

Deming, W.E. (1943). Statistical Adjustment of Data. New York: John Wiley & Sons, Inc.

Demnati, A., and Rao, J.N.K. (2004). Linearization variance estimators for survey data. Survey Methodology, 30, 17-26.

Deville, J.C. (1998). La correction de la nonréponse par calage ou par échantillonnage équilibré. Paper presented at the Congrès de l'ACFAS, Sherbrooke, Québec.

Deville, J.C. (2002). La correction de la nonréponse par calage généralisé. Actes des Journées de Méthodologie, I.N.S.E.E., Paris.

Deville, J.C. (2004). Calage, calage généralisé et hypercalage. Internal document, INSEE, Paris.

Deville, J.C., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Deville, J.C., Särndal, C.-E. and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, 1013-1020.

Duchesne, P. (1999). Robust calibration estimators. Survey Methodology, 25, 43-56.

Dupont, F. (1995). Alternative adjustments where there are several levels of auxiliary information. Survey Methodology, 21, 125-135.

Ekholm, A., and Laaksonen, S. (1991). Weighting via response modeling in the Finnish Household Budget Survey. Journal of Official Statistics, 3, 325-337.

Estevao, V.M., and Särndal, C.-E. (2000). A functional form approach to calibration. Journal of Official Statistics, 16, 379-399.

Estevao, V.M., and Särndal, C.-E. (2002). The ten cases of auxiliary information for calibration in two-phase sampling. Journal of Official Statistics, 18, 233-255.

Estevao, V.M., and Särndal, C.-E. (2004). Borrowing strength is not the best technique within a wide class of design-consistent domain estimators. Journal of Official Statistics, 20, 645-660.

Estevao, V.M., and Särndal, C.-E. (2006). Survey estimates by calibration on complex auxiliary information. International Statistical Review, 74, 127-147.

Firth, D., and Bennett, K.E. (1998). Robust models in probability sampling. Journal of the Royal Statistical Society, 60, 3-21.

Folsom, R.E., and Singh, A.C. (2000). The generalized exponential model for design weight calibration for extreme values, nonresponse and poststratification. Proceedings, Section on Survey Research Methods, American Statistical Association, 598-603.

Fuller, W.A. (2002). Regression estimation for survey samples. Survey Methodology, 28, 5-23.

Fuller, W.A., Loughin, M.M. and Baker, H.D. (1994). Regression weighting in the presence of nonresponse with application to the 1987-1988 Nationwide Food Consumption Survey. Survey Methodology, 20, 75-85.

Fuller, W.A., and Rao, J.N.K. (2001). A regression composite estimator with application to the Canadian Labour Force Survey. Survey Methodology, 27, 45-51.

Hansen, M.H., and Hurwitz, W.N. (1943). On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14, 333-362.

Harms, T. (2003). Extensions of the calibration approach: Calibration of distribution functions and its link to small area estimators. Chintex working paper no. 13, Federal Statistical Office, Germany.

Harms, T., and Duchesne, P. (2006). On calibration estimation for quantiles. Survey Methodology, 32, 37-52.

Hidiroglou, M.A. (2001). Double sampling. Survey Methodology, 27, 143-154.

Hidiroglou, M.A., and Särndal, C.-E. (1998). Use of auxiliary information for two-phase sampling. Survey Methodology, 24, 11-20.

Horvitz, D.G., and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.

Huang, E.T., and Fuller, W.A. (1978). Nonnegative regression estimation for sample survey data. Proceedings, Social Statistics Section, American Statistical Association, 300-305.

Isaki, C.T., Tsay, J.H. and Fuller, W.A. (2004). Weighting sample data subject to independent controls. Survey Methodology, 30, 35-44.

Kalton, G., and Flores-Cervantes, I. (1998). Weighting methods. In New Methods for Survey Research (Eds. A. Westlake, J. Martin, M. Rigg and C. Skinner). Berkeley, U.K.: Association for Survey Computing.

Kim, J., Breidt, F.J. and Opsomer, J.D. (2005). Nonparametric regression estimation of finite population totals under two-stage sampling. Unpublished manuscript.

Knottnerus, P., and van Duin, C. (2006). Variances in repeated weighting with an application to the Dutch Labour Force Survey. Journal of Official Statistics, 22, 565-584.


Kott, P.S. (2004). Comment on Demnati and Rao: Linearization variance estimators for survey data. Survey Methodology, 30, 27-28.

Kott, P.S. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 32, 133-142.

Kovaĉević, M.S. (1997). Calibration estimation of cumulative distribution and quantile functions from survey data. Proceedings of the Survey Methods Section, Statistical Society of Canada, 139-144.

Krapavickaitė, D., and Plikusas, A. (2005). Estimation of a ratio in the finite population. Informatica, 16, 347-364.

Lavallée, P. (2007). Indirect Sampling. New York: Springer-Verlag.

LeGuennec, J., and Sautory, O. (2002). CALMAR2: une nouvelle version de la macro CALMAR de redressement d'échantillon par calage. Actes des Journées de Méthodologie, INSEE, Paris.

Lehtonen, R., and Veijanen, A. (1998). Logistic generalized regression estimators. Survey Methodology, 24, 51-55.

Lehtonen, R., Särndal, C.-E. and Veijanen, A. (2003). The effect of model choice in estimation for domains, including small domains. Survey Methodology, 29, 33-44.

Lehtonen, R., Särndal, C.-E. and Veijanen, A. (2005). Does the model matter? Comparing model-assisted and model-dependent estimators of class frequencies for domains. Statistics in Transition, 7, 649-674.

Lemaître, G., and Dufour, J. (1987). An integrated method for weighting persons and families. Survey Methodology, 13, 199-207.

Lundström, S., and Särndal, C.-E. (1999). Calibration as a standard method for treatment of nonresponse. Journal of Official Statistics, 15, 305-327.

Merkouris, T. (2004). Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99, 1131-1139.

Montanari, G.E. (1987). Post-sampling efficient prediction in large-scale surveys. International Statistical Review, 55, 191-202.

Montanari, G.E., and Ranalli, M.G. (2003). On calibration methods for design-based finite population inferences. Bulletin of the International Statistical Institute, 54th session, volume LX, contributed papers, book 2, 81-82.

Montanari, G.E., and Ranalli, M.G. (2005). Nonparametric model-calibration estimation in survey sampling. Journal of the American Statistical Association, 100, 1429-1442.

Myrskylä, M. (2007). Generalised regression estimation for domain class frequencies. Statistics Finland Research Reports 247.

Nieuwenbroek, N.J. (1993). An integrated method for weighting characteristics of persons and households using the linear regression estimator. Report, Central Bureau of Statistics, The Netherlands.

Nieuwenbroek, N.J., and Boonstra, H.J. (2002). Bascula 4.0 for weighting sample survey data with estimation of variances. The Survey Statistician, Software Reviews, July 2002.

Nieuwenbroek, N.J., Renssen, R.H. and Hofman, L. (2000). Towards a generalized weighting system. In Proceedings, Second International Conference on Establishment Surveys, American Statistical Association, Alexandria VA.

Park, M., and Fuller, W.A. (2005). Towards nonnegative regression weights for survey samples. Survey Methodology, 31, 85-93.

Plikusas, A. (2006). Non-linear calibration. Proceedings, Workshop on Survey Sampling, Venspils, Latvia. Riga: Central Statistical Bureau of Latvia.

Ren, R. (2002). Estimation de la fonction de répartition et des fractiles d'une population finie. Actes des journées de méthodologie statistique, INSEE Méthodes, tome 1, 100, 263-289.

Renssen, R.H., and Nieuwenbroek, N.J. (1997). Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92, 368-374.

Renssen, R.H., Kroese, A.H. and Willeboordse, A.J. (2001). Aligning estimates by repeated weighting. Report, Central Bureau of Statistics, The Netherlands.

Rizzo, L., Kalton, G. and Brick, J.M. (1996). A comparison of some weighting adjustment methods for panel nonresponse. Survey Methodology, 22, 43-53.

Rueda, M., Martínez, S., Martínez, H. and Arcos, A. (2007). Estimation of the distribution function with calibration methods. Journal of Statistical Planning and Inference, 137, 435-448.

Särndal, C.-E. (1982). Implications of survey design for generalized regression estimation of linear functions. Journal of Statistical Planning and Inference, 7, 155-170.

Särndal, C.-E., and Swensson, B. (1987). A general view of estimation for two phases of selection with application to two-phase sampling and nonresponse. International Statistical Review, 55, 279-294.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model-assisted Survey Sampling. New York: Springer-Verlag.

Särndal, C.-E., and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: John Wiley & Sons, Inc.

Särndal, C.-E., and Lundström, S. (2007). Assessing auxiliary vectors for control of nonresponse bias in the calibration estimator. Statistics Sweden: Research and development - methodology report 2007:2; to appear, Journal of Official Statistics.

Singh, A.C., and Mohl, C.A. (1996). Understanding calibration estimators in survey sampling. Survey Methodology, 22, 107-115.

Singh, S., Horn, S. and Yu, F. (1998). Estimation of variance of general regression estimator: Higher level calibration approach. Survey Methodology, 24, 41-50.

Skinner, C. (1998). Calibration weighting and non-sampling errors. Proceedings, International Seminar on New Techniques for Statistics, Sorrento, November 4-6, 1998, 55-62.

Statistics Canada (1998). Methodology of the Canadian Labour Force Survey. Statistics Canada, Household Survey Methods Division. Ottawa: Minister of Industry, catalogue no. 71-526-XPB.

Statistics Canada (2003). Quality Guidelines (fourth edition). Ottawa: Minister of Industry, Catalogue no. 12-539-XIE.

Steel, D.G., and Clark, R.G. (2007). Person-level and household-level regression estimation in household surveys. Survey Methodology, 33, 51-60.

Stukel, D.M., Hidiroglou, M.A. and Särndal, C.-E. (1996). Variance estimation for calibration estimators: A comparison of jackknifing versus Taylor linearization. Survey Methodology, 22, 117-125.

Théberge, A. (1999). Extension of calibration estimators in survey sampling. Journal of the American Statistical Association, 94, 635-644.

Théberge, A. (2000). Calibration and restricted weights. Survey Methodology, 26, 99-107.


Tillé, Y. (2002). Unbiased estimation by calibration on distribution in simple sampling designs without replacement. Survey Methodology, 28, 77-85.

Tracy, D.S., Singh, S. and Arnab, R. (2003). Note on calibration in stratified and double sampling. Survey Methodology, 29, 99-104.

Vanderhoeft, C. (2001). Generalised calibration at Statistics Belgium. SPSS Module g-CALIB-S and current practices. Statistics Belgium Working Paper no. 3. Available at: www.statbel.fgov.be/studies/paper03_en.asp.

Vanderhoeft, C., Waeytens, E. and Museux, J.M. (2001). Generalised calibration with SPSS 9.0 for Windows baser. In Enquêtes, Modèles et Applications (Eds. J.J. Droesbeke and L. Lebart), Paris: Dunod.

Webber, M., Latouche, M. and Rancourt, E. (2000). Harmonised calibration of income statistics. Statistics Canada, internal document, April 2000.

Wu, C. (2003). Optimal calibration estimators in survey sampling. Biometrika, 90, 937-951.

Wu, C., and Sitter, R.R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96, 185-193.

Zheng, H., and Little, R.J.A. (2003). Penalized spline model-based estimation of the finite population total from probability-proportional-to-size samples. Journal of Official Statistics, 19, 99-117.

Zieschang, K.D. (1990). Sample weighting methods and estimation of totals in the Consumer Expenditure Survey. Journal of the American Statistical Association, 85, 986-1001.


Survey Methodology, December 2007, Vol. 33, No. 2, pp. 121-130. Statistics Canada, Catalogue No. 12-001-X

Weighting for two-phase surveyed data

Seppo Laaksonen 1

Abstract

Missingness may occur in various forms. In this paper, we consider unit non-response, and hence attempt adjustments by appropriate weighting. Our empirical case concerns two-phase sampling: first, a large sample survey was conducted using a fairly general questionnaire. At the end of this contact the interviewer asked whether the respondent was willing to participate in the second-phase survey, with a more detailed questionnaire concentrating on some themes of the first survey. This procedure leads to three missingness mechanisms. Our problem is how to weight the second-survey respondents as correctly as possible, so that the results from this survey are consistent with those obtained from the first-phase survey. The paper first analyses missingness differences in these three steps using a human survey dataset, and then compares different weighting approaches. Our recommendation is that all available auxiliary data should be used in the best possible way. This works well with a mixture of the two classic methods that first exploits response propensity weighting and then calibrates these weights to the known population distributions.

Key Words: Calibration; Internal vs. external auxiliary variables; Response propensity modelling method; Selective sub-sample.

1. Introduction

A standard survey is composed of one step or phase. This means that the potential survey units have first been chosen using a certain sampling design, and attempts have then been made to contact and interview these units as well as possible. However, varying amounts of non-response or other forms of missingness or data deficiencies will have occurred. Usually, addressing missingness leads to the application of post-survey adjustment methods of varying degrees of sophistication, which take advantage of available auxiliary variables. The auxiliary variables may be derived from various sources (see e.g., Laaksonen 1999 or an extended version in Laaksonen 2006b, and Lundström and Särndal 2001), but for weighting purposes these are usually taken from registers or other administrative sources or surveys. These kinds of auxiliary variables could be called external, if we want to distinguish them from internal auxiliary variables, that is, internal in the sense that the information is derived from the same survey or from its predecessor.

Internal auxiliary variables are especially used for imputations when the values for some items are missing. Such variables are also extensively used in panel surveys if a certain respondent has responded in one wave but not in another. In panel surveys, internal auxiliary information may be used both for weighting adjustments and for imputations.

This paper does not concern a standard survey as described above. It discusses two special characteristics:

(i) A survey consisting of two (or, in some sense, three) steps or phases. The first phase is like a standard survey, in which a certain number of units respond. For the second phase, we only keep in the frame the respondents who are willing to contribute to a more detailed survey. This leads first to having to distinguish those first-phase respondents who say they are willing to participate voluntarily, on the one hand, from those of these respondents who actually answer the second questionnaire, on the other. This being the case, the latter sub-group will thus respond to both questionnaires.

(ii) When making post-survey adjustments, we will have the option to exploit both external and internal auxiliary variables in the second phase. The internal variables will thus be available from the first survey.

We are only considering weighting adjustments, although some of our ideas could be used in imputations, too. The approach of the paper has not been much used in cross-sectional surveys, although the same problem has often been met. For instance, it is typical that a face-to-face survey is conducted first and that at the end of it the interviewer will request the interviewee to respond to a self-administered additional questionnaire; if the respondent is willing, the interviewer will hand out the questionnaire immediately for filling in, or submit it later to the volunteer. In both cases, the answers will be received by post or email. A recent example of this type is the European Social Survey (ESS), in which the supplementary questionnaire concerns especially the values of life (see www.Europeansocialsurvey.com).
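The nested selection process in (i), from the gross sample down to the second-phase respondents, can be sketched with a tiny simulation. All sizes and response rates below are invented purely for illustration; they stand in for the three missingness mechanisms, not for the actual survey analysed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(2007)

# Hypothetical sizes and rates, invented for this sketch only.
n = 5000                                             # initial gross sample
respond1 = rng.uniform(size=n) < 0.70                # first-phase response
volunteer = respond1 & (rng.uniform(size=n) < 0.60)  # willing to continue
respond2 = volunteer & (rng.uniform(size=n) < 0.80)  # second-phase response

# The three mechanisms produce strictly nested sets:
# second-phase respondents, within volunteers, within first-phase respondents.
print(respond2.sum() <= volunteer.sum() <= respond1.sum() <= n)
```

Each stage can only lose units, which is why the adjustment of the second-phase weights has to account for all three mechanisms at once.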

1. Seppo Laaksonen, Department of Mathematics and Statistics, Box 68, FIN-00014 University of Helsinki, Finland. E-mail: [email protected].

Laaksonen: Weighting for two-phase surveyed data

Naturally, not all face-to-face respondents fill in this questionnaire. The second-phase questions do not necessarily concern the same topic as those of the core questionnaire. Another usual strategy is to start with a broad questionnaire on a specific subject and then continue in the second phase with more detailed questions about the same subject. There can be some feedback from the first phase to the second questionnaire, and even to a sample that depends on the distribution of the key variables of the first survey (this is an example of adaptive sampling). This is often the case where there is not much experience in this type of a survey. Thus, the first survey also plays the role of a pilot survey. The so-called master samples are also close to this idea, whereby the first-phase survey (including a sample from administrative sources, like a micro census in some countries) may be conducted for constructing an appropriate sampling frame. In this case, the variables of the master sample are fairly limited, usually including only factual or background information.

In the case of master sampling, the objective is that the constructed sampling frame is a good representation of the target population. Hence, when going to a sample, this frame information can be well used as auxiliary data for editing, imputing and weighting of the real survey (second-phase survey). Each real survey is thus a sub-sample of the master sample. We consider here a more complex case, as illustrated by Table 1.

Table 1
Illustration of initial sample and three follow-up datasets

| Sample with Auxiliary Variables X (Gender, Age, Region, Season) | First-Phase Respondents with Variables Y1 (e.g., Health, Outdoor) | Volunteers for the Second Phase with Variables Y1 | Second-Phase Respondents with Variables Y2 (e.g., Skating, Boating) |
|---|---|---|---|
| Design Weights for 12,554 Units without Overcoverage | Basic Weights for 10,666 Units | Weights for 8,481 Units | Weights for 5,480 Units |

First, there is a standard sampling procedure including some auxiliary variables X. A fairly high response rate was obtained for the first survey (10,666 out of 12,554 units, about 85%). Some attrition occurred due to the fact that not all respondents were willing to participate in the second survey (we now have 68% of the initial sample left). Due to a rather high non-response rate in this second phase (in spite of voluntariness), our remaining sub-sample covers only 44 per cent of the initial sample. We now have the following three datasets available for the analysis:

A. First-phase respondents with survey variables Y1
B. Second-phase respondents with survey variables Y2
C. Both first- and second-phase respondents with survey variables Y1 and Y2.

Most users will receive both files, A and B, and they can merge these together and obtain file C if a common identifier is available. What does a user expect having received both data files? Naturally, that the estimate for the same parameter from both files is as identical as possible, that is, that the results are consistent with each other. The user obviously understands that a certain parameter estimated from the smaller file C is less accurate than one estimated from a larger file. In principle, it is possible to impute the missing values for variables Y2, but we do not believe that it is possible to do this well; hence we approach this question by weighting. Our aim is to attempt to construct adjusted sampling weights for file B so that the analysis over variables Y1 and Y2 is as adequate as possible.

Several strategies can be used for this kind of weighting. Useful general aspects have been presented by, among others, Kalton and Kasprzyk (1986), Little (1986), Särndal, Swensson and Wretman (1992), Fuller, Loughin and Baker (1994), Wu and Sitter (2001), and Lundström and Särndal (2001). If we assume that the missingness only depends on the sampling design, we can construct the weights for files A and B in the respective way. For example, if stratified random sampling has been applied, the same stratification would naturally be applied to both phases. In the case of post-stratification, an analogous strategy may be applied.

In our particular example survey, the sampling frame contained the respondents of the first rotation group of the 12 months of the Finnish LFS. Each monthly sample was drawn randomly. The LFS is based on simple random sampling, but due to nonresponse these weights were adjusted by a standard calibration technique (Deville, Särndal and Sautory 1993) using gender, age group (six categories) and region (five categories) as auxiliary variables. Later, we refer to these as design weights. The basic weights for the first-phase respondents were constructed correspondingly, adding the variable season (4 × 2 = 8 categories over two years) to the pattern of auxiliary variables. The 'season' variable is rarely used in Finnish surveys but was here considered to be necessary due to the 'seasonal' nature of the survey (see Section 2). The present paper does not consider this aspect in detail. The first three variables are usual in Finnish human surveys because such information can be validated well from updates in the population register. This being the case, we now presume that we will have the best possible estimates for the first-phase respondents when using these adjusted calibrated weights.
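Obtaining file C from files A and B, as described above, is a plain inner join on the common identifier. A minimal sketch (the column names are hypothetical, not from the survey itself):

```python
import pandas as pd

# Hypothetical extracts of file A (first-phase respondents, variables Y1)
# and file B (second-phase respondents, variables Y2), sharing an identifier.
A = pd.DataFrame({"id": [1, 2, 3, 4], "health": [1, 0, 1, 1]})
B = pd.DataFrame({"id": [2, 4], "skating": [0, 1]})

# File C = units present in both phases: an inner join on the identifier.
C = A.merge(B, on="id", how="inner")
```

An inner join keeps exactly the units observed in both phases, which is why C is smaller (and its estimates less precise) than A or B alone.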

In any case, we have no further access to other possible useful information from external sources.

It is possible to use estimates obtained from the first-phase respondents as benchmarking information to calibrate the second-phase weights. This strategy is not difficult as such, but all variables X and as many of the variables Y1 as possible should be included in the process. Moreover, as precise aggregate or domain levels as possible should be found for this strategy, which is not an easy job and hence not attempted in this study. Nevertheless, it is not guaranteed that the estimates for other aggregates will be unbiased enough (results in Laaksonen 1999 give some evidence for this conclusion).

My proposed strategy is more straightforward and works without technical problems for all domains, although it is of course not guaranteed that a possible bias will be substantially reduced for all domains. Thus, I have not tried any advanced calibration strategy, although this could be workable. I hope that other authors will show its possible benefits. A useful reference for them is the paper by Dupont (1995), which considers calibration of two-phase survey data, however without empirical evidence. It should be noted that I use calibration, but not a very advanced one (see Section 3).

The proposed methodology of this paper is largely based on response propensity modelling, which has been used successfully in other types of situations; see e.g., Ekholm and Laaksonen (1991), Laaksonen (1999), Duncan and Stasny (2001), and Laaksonen and Chambers (2006). The situation of Rizzo, Kalton and Brick (1996) is fairly close to the two-phase case of this paper, although it is concerned with a panel, and their methodology also has some similar features. A major difference concerns the response mechanism, which here occurs in two steps, that is, both due to voluntariness and due to response in the second phase. We analyse these steps separately, too. Naturally, we compare the results obtained with alternative techniques. In Section 2, we briefly further describe our surveys and datasets, and Section 3 details the principles of our methods. Section 4 presents comparison results, and Section 5 draws a conclusion.

2. Principles of the datasets

The data are from a special survey conducted among Finnish citizens aged 15-74 years (for more information, see Virtanen, Pouta, Sievänen and Laaksonen 2001). The topic concerns their leisure time activities, especially relating to outdoor hobbies. First, a CAPI (computer-assisted personal interview) survey was conducted, covering various leisure time and hobby questions such as cycling, motorcycling, walking, jogging, sailing, swimming, hunting, fishing, nature photography, skiing, skating and riding. In all cases, the reference period was the previous year. Second, at the end of this survey the respondents were asked whether they would be willing to receive a special postal survey questionnaire in which more detailed questions would be asked about some of these activities. This survey would be conducted in a few weeks' time.

The survey was conducted over two years (1998-2000) in order to reduce response and interviewing burden. Another reason for this was that since these activities are seasonal to some extent, the responses were expected to be seasonally influenced (e.g., responses to questions about skiing might be different in summer and winter). The initial sample size after the removal of overcoverage (104 units) was 12,554 individuals.

We chose the following binary variables for our analysis presented in Section 4: Outdoor (person has regularly performed some outdoor activities in nature), Health (is health good enough for outdoor activities?), Skiing, Fishing, Skating, Boating, Cycling and Jogging. In all cases, value = 1 means that a person has engaged in the activity during the preceding year, and value = 0, respectively, the opposite. All these variables were included in the first-phase questionnaire, and we hence knew what to expect after the two consecutive phases. Note that there are more complex variables in the data set, but this simpler choice was made in order to interpret results more easily. The main conclusions are the same in the case of another choice.

In Section 4, we present two types of comparison: (i) those based on known information from the first phase, and (ii) those not based on known information. In the first case, we can fortunately check how well we have succeeded in the reduction of bias, since we actually know the 'true' (or best possible) estimates. In addition, we analyse some variables only included in the second questionnaire, but we cannot say definitely how well each method works in these cases. We do not present the latter results in detail, but these were observed to behave similarly to those of the second approach.

3. Response propensity modelling method and calibration

This study comprises three steps with the following weighting specifications. First, well-designed calibrated sampling weights for the first-phase respondents were created using the variables Region, Gender, Age group and Season (see also Section 1). Let us use the symbol w_k for these sampling weights for respondent k.


These weights are thus based on calibration and are also called 'Basic'. Note that before this we have, naturally, constructed design weights for the dataset, based on the stratified random sampling design. These are thus available for the non-respondents of the first phase, too.

Next, we model voluntariness/response probabilities using the most common link function (the logit is not necessarily the best link function, as learned from Laaksonen (2006a), but this is what we use here), that is,

logit(π) = log(π / (1 − π)),

in which π is the binary response probability (either 1 = volunteer vs. 0 = non-volunteer, or 1 = respondent vs. 0 = non-respondent) and the explanatory variables consist of variables X and some variables Y1 that have been considered to be 'good.' The model gives the predicted response probabilities p_k, which are then used in the following way when constructing each particular adjusted sampling weight:

w_k(res) = (w_k / p_k) g_c.

Here g_c is a scaling factor which benchmarks the weights to certain known aggregates at level c. There are several alternatives for this benchmarking, but some type of calibration could be considered the standard way. In this study we use post-stratified aggregates h (h being the cross-classified cell of all three X variables Age group, Gender and Region, the whole number of cells being 6 × 2 × 5 = 60) using the following straightforward technique:

g_h = (Σ_{k∈h} w_k) / (Σ_{k∈h} w_k / p_k),

where the numerator sum runs over all eligible units in cell h and the denominator sum over the units remaining after the step in question. As already pointed out, the quality of these kinds of post-stratified aggregates is high in Finland, but not necessarily that of any other aggregates.

Because we have two steps for the second phase, we have the following three model options, all of which are also used in Section 4:

(a) Model for voluntariness.
(b) Model for the response given that the person volunteered (also called 'TwoStep').
(c) Model for the response as one step (also called 'Direct' and 'OneStep').

Note that steps (a) and (b) together give the weights for file B. This leads to the following formulation (vol = volunteer; p_1 = estimated response probability at step 1, p_2 = estimated response probability at step 2; g_1 and g_2 the respective scaling factors):

Step 1: w_k(vol) = (w_k / p_{1k}) g_{1h},

Step 2: w_k(res) = (w_k(vol) / p_{2k}) g_{2h}.

The correct sampling weights had to be used in each modelling task. For models (a) and (c) this meant weights w_k, but for model (b) weights w_k(vol). In our comparison tests we also modelled the first-phase response, and here we used design weights. The use of weights in the modelling gives more correct estimates, since we are trying to make our analysis representative of the target population. In some cases the influence of weights is substantial, as in business surveys, where weights often vary more than in standard household surveys. In this case, the results between weighted and unweighted models were not highly different, although the weighted ones should be used (this is well justified in Laaksonen and Chambers 2006, in which the influence of weights is substantial; Rizzo et al. 1996 also use weights). The empirical results (estimates, their standard errors and response probabilities) for the weighted solutions are presented later in Section 4.

In addition to the key techniques above, in the next section we also use weights w_k when providing our 'best possible' estimates for such parameters as are known, thus based on variables Y1. Moreover, we compare our specific results using post-stratified calibration only, without modelling (we also use the symbol 'cal' in the remaining sections). The latter could be interpreted as a very standard way of approaching the weighting problem (this was a house style prior to the methodology proposed here). Note, however, that if a response model only includes the variables (and the same categories) used in post-stratification, the response propensity-based weights are exactly the same as those obtained by post-stratification.

4. Empirical results

This section presents results from different methods. First, we give results from different response models, then go on to compare different weights with each other, and at the end of the section compare some parameter estimates based on different techniques.

4.1 Models for voluntariness and response

In order to fully understand the behaviour of missingness (due to non-response and voluntariness) in all three phases of the survey, we present in Table 2 results that are based on such auxiliary variables X as are available in each step (in practice, we also used the variable 'season', but do not include its effects in this analysis since it is not a key issue in this paper).
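The two-step weighting scheme of Section 3 can be sketched in code. This is a minimal illustration on synthetic data, not the author's actual implementation: for simplicity the propensities p_1k are taken as weighted rates within each post-stratum cell h, standing in for predictions from a logit model on X and Y1, and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic first-phase respondent file: post-stratum cell h and
# basic calibrated weight w_k (hypothetical values).
n = 1000
df = pd.DataFrame({
    "cell": rng.integers(0, 4, n),    # stands in for Age group x Gender x Region
    "w": rng.uniform(200, 600, n),    # basic first-phase weight w_k
})
df["vol"] = rng.random(n) < 0.5 + 0.1 * df["cell"]  # cell-dependent volunteering

# Estimated volunteering propensity p_1k: here a weighted cell rate.
tot = df.assign(wv=df["w"] * df["vol"]).groupby("cell")[["w", "wv"]].sum()
df["p1"] = df["cell"].map(tot["wv"] / tot["w"])

# Step 1: w_k(vol) = (w_k / p_1k) * g_1h, with g_1h chosen so that the
# volunteers' adjusted weights reproduce the first-phase weight total
# within each cell h (the benchmarking of the text).
vol = df[df["vol"]].copy()
g1 = tot["w"] / vol.assign(wp=vol["w"] / vol["p1"]).groupby("cell")["wp"].sum()
vol["w_vol"] = vol["w"] / vol["p1"] * vol["cell"].map(g1)

# Step 2 would repeat the same operation on the volunteer file, with
# second-phase response propensities p_2k and scaling factors g_2h.
```

With cell-level propensities, as here, the result coincides with post-stratification; with model-based p_1k from richer covariates, the two differ, which is the point made at the end of Section 3.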


Table 2
Logistic regressions using the three common explanatory variables in the three phases, that is, for the first-phase respondents, for the voluntary respondents in the second phase and for the real respondents in the second phase. The estimates are odds ratios; their 95% confidence intervals are presented in parentheses

| Explanatory variables and other statistics | Model 1: First phase response | Model 2a: Voluntariness | Model 3a: Response for volunteers | Model 4a: Second phase response |
|---|---|---|---|---|
| Gender (ref. Female): Male | 0.71 (0.65, 0.78) | 0.84 (0.76, 0.93) | 0.75 (0.68, 0.83) | 0.77 (0.71, 0.83) |
| Age group (ref. 65+): 24 and under | 1.00 (0.83, 1.21) | 5.57 (4.65, 6.68) | 0.51 (0.41, 0.64) | 1.49 (1.28, 1.73) |
| 25-34 | 0.96 (0.79, 1.15) | 4.76 (4.00, 4.81) | 0.65 (0.52, 0.81) | 1.73 (1.49, 2.00) |
| 35-44 | 0.85 (0.71, 1.02) | 4.08 (3.46, 4.81) | 0.64 (0.52, 0.80) | 1.62 (1.40, 1.88) |
| 45-54 | 0.89 (0.74, 1.07) | 3.16 (2.71, 3.69) | 0.86 (0.69, 1.06) | 1.82 (1.58, 2.10) |
| 55-64 | 1.18 (0.96, 1.45) | 2.05 (1.74, 2.41) | 1.15 (0.90, 1.47) | 1.75 (1.49, 2.04) |
| Region (ref. North): South-East | 0.55 (0.46, 0.66) | 2.12 (1.79, 2.50) | 0.96 (0.79, 1.16) | 1.35 (1.17, 1.55) |
| South-West | 0.76 (0.64, 0.91) | 1.83 (1.57, 2.14) | 1.04 (0.86, 1.25) | 1.35 (1.18, 1.55) |
| Mid-West | 1.14 (0.91, 1.42) | 2.14 (1.77, 2.59) | 1.16 (0.93, 1.43) | 1.56 (1.33, 1.83) |
| Mid-East | 0.96 (0.78, 1.18) | 1.20 (1.01, 1.44) | 1.15 (0.92, 1.43) | 1.19 (1.02, 1.40) |
| Number of observations | 12,554 | 10,666 | 8,481 | 10,666 |
| -2 Log L | 10,904 | 10,296 | 8,569 | 14,618 |

There are many interesting outcomes in these consecutive missingness behaviour models. The results of the first survey are fairly ordinary; for example, men respond more poorly than women in both phases. The response propensities are also lower in the South than in the rest of the country. The differences between age groups are somewhat surprising, since the middle-age groups respond most poorly.

The voluntariness estimates are different. People in the Mid-East and North are the least willing to participate in the second survey, but the response propensities, given that a person is voluntary, do not differ much. By age, it seems that younger people are more willing to participate but do not, nevertheless, respond very well. Older people thus seem, in this sense, to be more prepared to make a commitment than young people. However, we see clearly that the oldest ones will be under-represented without adjustments.

When considering the first two internal auxiliary variables (Table 3), it is observed that people who are not relatively healthy (variable Health) and who do not actively pursue recreation in nature (Outdoor) are also not willing to receive any new questionnaire. This is seen from the very high odds ratios. Interestingly, the respective odds ratio for the variable Health is close to the one for the volunteers. This thus means that a non-healthy person is not very likely to volunteer, but if he/she does, he/she responds as well as a healthy one. The tendency is similar with the variable Outdoor. It should be noted that the non-healthy and non-outdoor domains are not very large, and although their roles in the response propensity modelling are important, their impacts on the final estimates are not very dramatic (Section 4.3).

When adding the other two internal auxiliary variables, that is, Skiing and Fishing, the same selectiveness continues, although not as substantially. In conclusion, we see clearly that the response mechanism of the second survey does not seem to be non-informative. Consequently, it is expected that this leads to some effects on the reweights and on the survey estimates. These are considered in the next two sub-sections.


Table 3
Logistic regressions using some auxiliary variables from the first-phase respondents in addition to those used in Table 2. The model numbers in this table and in Table 2 correspond to each other so that the response variable and the datasets are the same

| Explanatory variables and other statistics | Model 2b: Voluntariness | Model 3b: Response for volunteers | Model 4b: Second phase response | Model 4c: Second phase response |
|---|---|---|---|---|
| Gender (ref. Female): Male | 0.94 (0.85, 1.04) | 0.77 (0.69, 0.85) | 0.82 (0.75, 0.88) | 0.75 (0.68, 0.83) |
| Age group (ref. 65+): 24 and under | 4.92 (4.07, 5.97) | 0.52 (0.41, 0.65) | 1.30 (1.12, 1.52) | 0.51 (0.41, 0.64) |
| 25-34 | 3.83 (3.18, 4.60) | 0.65 (0.52, 0.81) | 1.46 (1.25, 1.70) | 0.65 (0.52, 0.81) |
| 35-44 | 3.26 (2.74, 3.88) | 0.64 (0.51, 0.80) | 1.37 (1.18, 1.58) | 0.64 (0.52, 0.80) |
| 45-54 | 2.59 (2.20, 3.05) | 0.85 (0.68, 1.06) | 1.56 (1.34, 1.81) | 0.86 (0.69, 1.06) |
| 55-64 | 1.73 (1.45, 2.05) | 1.18 (0.89, 1.46) | 1.55 (1.32, 1.81) | 1.15 (0.90, 1.47) |
| Region (ref. North): South-East | 2.15 (1.81, 2.55) | 0.96 (0.79, 1.16) | 1.34 (1.16, 1.54) | 0.96 (0.79, 1.16) |
| South-West | 1.92 (1.64, 2.26) | 1.04 (0.86, 1.25) | 1.36 (1.19, 1.56) | 1.04 (0.86, 1.25) |
| Mid-West | 2.09 (1.71, 2.54) | 1.15 (0.93, 1.43) | 1.52 (1.29, 1.78) | 1.16 (0.93, 1.43) |
| Mid-East | 1.17 (0.98, 1.41) | 1.15 (0.91, 1.42) | 1.18 (1.00, 1.38) | 1.15 (0.92, 1.43) |
| Outdoor | 3.04 (2.71, 3.43) | 1.24 (1.07, 1.43) | 1.93 (1.74, 2.15) | 1.77 (1.59, 1.97) |
| Health | 3.61 (2.82, 4.61) | 1.02 (0.66, 1.58) | 2.71 (2.09, 3.51) | 2.49 (1.92, 3.24) |
| Skiing | - | - | - | 1.36 (1.25, 1.47) |
| Fishing | - | - | - | 1.27 (1.17, 1.38) |
| Number of observations | 10,666 | 8,481 | 10,666 | 10,666 |
| -2 Log L | 9,721 | 8,560 | 14,342 | 14,244 |

4.2 Comparison of different weights

As already explained, we provided several weights. Table 4 gives a summary of these with descriptive statistics, in order to explain the changes that occur after each adjustment operation. The design weights cannot be used in our comparisons, since no data on variables Y are available for the initial sample. It is, however, illustrative to see that they have the lowest relative variation, measured here with 1 + cv², in which cv is the coefficient of variation. This formula is also used as an approximation of the design effect (DEFF). Rizzo et al. (1996) also use this indicator when comparing their weights.

The changes are not dramatic in the first step, that is, from design weights to first-phase basic weights (except for the average, which is related to the decreasing counts), but in the following two steps the DEFFs are higher. We also see that the variation for both calibrated weights is lower than that for the respective response propensity-based weights. The distribution of each weight is skewed to the right, least so for the design weights, naturally. It is somewhat surprising that the skewness is highest for the volunteer weights. More details about the weight distributions and the differences between the weights are presented in Figures 1 to 3.

Figure 1 illustrates well how some weights have increased substantially due to the response propensity modelling (Model 2b). It is possible to look in detail to see which types of units lie behind the points with a high weight increase. For example, behind the separate left-side points with RP weights higher than 700 are persons who are not healthy and do not engage much in outdoor activities but are, nevertheless, still in the volunteer data file. Similarly, we can find other interesting groups by using the results from the model estimations. However, the majority of the points are in the same area and, consequently, fewer changes can be expected in the estimates than in the area with more substantial weight changes.
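The 1 + cv² measure reported in Table 4 is simple to compute. A minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def relvar(w):
    """1 + cv^2 for a weight vector, where cv is the coefficient of
    variation; used in Table 4 as a rough design-effect (DEFF) proxy."""
    w = np.asarray(w, dtype=float)
    cv = w.std() / w.mean()
    return 1.0 + cv ** 2

# Equal weights carry no weighting design effect (value exactly 1); the
# more an adjustment step disperses the weights, the larger the value,
# which is why the RP weights in Table 4 show higher 1 + cv^2 than the
# corresponding calibrated weights.
```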


Table 4 Descriptive statistics for different sampling weights. RP = Response Propensity

| Weight | Phase | Unit size | Average | Skewness | 1 + cv² |
|---|---|---|---|---|---|
| Design Weight (Calibrated) | Zero | 12,658 | 308 | 0.94 | 1.30 |
| Basic Weight | First | 10,666 | 365 | 1.30 | 1.39 |
| Calibrated Weight | Volunteers | 8,481 | 460 | 2.52 | 1.63 |
| RP Weight, Model 2b | Volunteers | 8,481 | 460 | 4.60 | 1.82 |
| Calibrated Weight | Second | 5,480 | 712 | 1.64 | 1.62 |
| TwoStep RP Weight, Models 2b and 3b | Second | 5,480 | 712 | 3.60 | 1.84 |
| OneStep RP Weight, Model 4b | Second | 5,480 | 712 | 2.56 | 1.80 |

Figure 1 Scatter plot between the two volunteer weights (calibrated weight vs. RP weight)

Figure 2 Scatter plot for second-phase respondents between the calibrated weight and the two-step response propensity-based weight

Figure 3 Scatter plot for second-phase respondents between the two alternative response propensity modelling weights

The dispersion in the Figure 2 scatter is somewhat stronger than that in Figure 1, but the profile is similar. Consequently, interesting sub-groups can be found behind distinct points.

Finally, Figure 3 compares the two alternative second-phase weights with each other. This scatter differs considerably from the previous two, since the relationship is rather linear. The maximum values of the two-step weights are higher than those of the one-step weights, but many one-step weights are, however, clearly higher. For example, non-healthy people who nevertheless engage in outdoor activities receive relatively high one-step weights, but there is no clear age effect. On the other hand, people with little outdoor activity in the older age groups receive relatively high two-step weights, but health does not relate to them. Nevertheless, big differences in the respective estimates are not expected, although one of these two alternatives should have been introduced into use. If this choice were the simpler one, that is, one-step weighting, it would still be useful to analyse both steps and their response propensities separately in order to understand better the reasons for both types of missingness.

4.3 Comparison of parameter estimates

We have not been able to make complete simulation studies with different assumptions in order to analyse which type of method would be best in each particular case. Fortunately, we can get quite close to this by comparing the effects on the estimates from three different perspectives. First, we have prepared the response/voluntariness models by using both X and some Y1 variables. Consequently, we know the 'best' parameter estimates based on these Y1 values from the first survey. Second, we add auxiliary variables Y1 in the model but exclude some Y1 values from them. However, we know the 'best' values in these cases and thus can make exact comparisons. Third, we can compare some estimates that are not known in any way. In this last case, we can only deduce which values might be the best.


We present our explicit results based on the variables described in Section 2. Note that we do not consider it important to present standard errors for each estimate, because we are concentrating on the biases in these estimates. However, it is good to notice that the standard errors are around 0.2-0.4 percentage points for the first-phase data set and around 0.3-0.5 percentage points for the second-phase data set (lowest always in Health, second in Jogging, and highest in Outdoor and Skiing).

Figure 4 gives the results based on the weights without any adjustment (which is often the case in practice, unfortunately). We see that the bias is substantial in most estimates, lowest in Jogging, which was not very actively practised when compared to Outdoor, for example. In general, most users are unhappy with such big biases, which are statistically significant or highly significant except for Jogging in the second phase (e.g., the 95% confidence interval of the bias for Health is from 1.7 points to 2.3 points). Here, as in the later results, the bias means over-estimation, so that as missingness increases the estimate becomes too high. The results without good adjustments will be too optimistic, that is, people seem to do 'too much' of all the exercises. Note that the same tendency is obviously present in the first-phase estimates as well, but we cannot verify this.

Figure 4 Bias in estimates in percentage points based on unadjusted sampling weights for second-phase respondents and for volunteers

There are surprising differences between those two sets of estimates: sometimes the 'volunteer' data give the more biased result, sometimes the second-phase respondents' data do. We do not interpret these in detail, but naturally they reflect differences in missingness, and they can be considered warnings for a user.

For comparison, we show again in Figure 5 the same unadjusted results for volunteers as in Figure 4, but we have added the corresponding estimates based on post-stratified calibration and response propensity modelling. This graph clearly shows that post-stratification gives some benefit compared to the unadjusted solution. However, the response propensity method is the best in each case, and extremely good for Health and Outdoor, which have been used as auxiliary variables in the supporting models.

Figure 5 Bias in estimates in percentage points for volunteers based on unadjusted sampling weights (symbol = 'Initial'), post-stratified calibration and the response propensity method, in which variables Outdoor and Health have been used as auxiliary variables (Model 2b in Table 3)

Figures 6 and 7 concern the final-step estimates and are thus the most important. Figure 6 shows the same conclusion as Figure 5 in the sense that the response propensity technique is superior to post-stratified calibration, although not all differences are statistically highly significant (especially Jogging). The difference between the one-step method and the two-step method is fairly small, and the bias varies from one variable to the next. Hence, based on this study, we cannot say which of these two specifications is better.

Figure 6 Bias in estimates in percentage points for respondents after both steps based on post-stratification and the two response propensity methods, i.e., 'Two-step' and 'One-step', so that variables Outdoor and Health have been used as auxiliary variables. The two-step method is based on two consecutive models (Models 2b and 3b in Table 3), whereas the one-step method has been constructed directly for the second-phase respondents (Model 4b in Table 3)
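The bias figures are differences of weighted prevalence estimates, in percentage points, against the 'best' first-phase benchmark. A small illustrative sketch with synthetic data and hypothetical names; the positive sign mirrors the over-estimation reported above, since 'active' people respond more often:

```python
import numpy as np

def wmean_pct(y, w):
    """Weighted prevalence estimate in per cent."""
    return 100.0 * np.average(y, weights=w)

rng = np.random.default_rng(2)

# Synthetic first-phase data: a binary activity variable and weights.
y1 = (rng.random(5000) < 0.4).astype(int)   # e.g., a 'Skiing' indicator
w1 = rng.uniform(200, 600, 5000)            # first-phase weights

# Informative second-phase missingness: active people respond more often,
# so the unadjusted second-phase estimate drifts upward.
respond = rng.random(5000) < np.where(y1 == 1, 0.8, 0.5)
bias_pp = wmean_pct(y1[respond], w1[respond]) - wmean_pct(y1, w1)
```

Adjusting the respondents' weights for the response propensity (as in Sections 3 and 4.2) is exactly what pulls this difference back toward zero.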


Figure 7 presents some comparisons when the two new variables have been added to the response propensity model. The results are quite predictable, since this reduces the bias in these estimates, and to some extent in all other estimates as well. The bias is still too large in Boating and Biking in the opinion of many users, I suppose. We can reduce this bias, naturally, by adding new auxiliary variables to the model. How far could we go in this? This has not been examined further in this study. On the other hand, we have worse tools for reducing bias in variables that have been based on the second survey only. We tested several such estimates and observed some changes in the corresponding estimates, of the same order as in the cases of Boating and Biking in Figure 7. In this case, however, we cannot check the bias. We can only believe, based on our previous exercises, that these results are less biased than those based on more poorly adjusted weights.

[Figure 7: bar chart, vertical axis from 0 to 6 percentage points; bars for Post-Stratified, Model 4b and Model 4c over the variables Skiing, Health, Biking, Fishing, Skating, Boating, Jogging and Outdoor.]

Figure 7  Bias in estimates, in percentage points, for respondents after both steps, based on post-stratification and on the two "One-step" response propensity methods in which the variables Skiing and Fishing have also been used as auxiliary variables (Model 4c in Table 3). These are compared to those based on Model 4b

5. Discussion

The problem discussed in this paper is common in surveys. Many surveys are conducted in more than one step, and inconsistencies arise between these surveys due to missingness and other discrepancies. An internationally well-known example is the European Social Survey (ESS), which includes two questionnaires, a core one and a supplementary one. The number of respondents is naturally smaller for the latter than for the former. This leads to some selectiveness; for example, responding to the second questionnaire is positively associated with political activity. This is awkward from the user's point of view because an estimate based on a larger dataset differs from that based on a smaller one, although both concern the same variable and time period.

Similarly to the ESS, this study concerns two-phase surveyed data. The response rate in the second survey was substantially lower than in the ESS, and the effect of selectiveness is also higher. Using the response propensity models, we predicted this selectiveness and exploited the results in weighting adjustments, and as the final step we calibrated the sums of the weights to correspond to certain known population aggregates. This strategy aims at making the most of all available auxiliary information, derived both from registers and other external sources and from the previous phase of the survey at the micro level.

In our example, the second phase of the survey comprised two different steps but only one data collection. The first step concerned willingness to participate voluntarily in the second phase of the survey, and the second step the actual survey participation of these volunteers. We examined both steps separately and found interesting information on their response mechanisms. Moreover, we used the results from this analysis for reweighting adjustments. For the sake of comparison, we also looked at both steps on one occasion, built a respective model, and continued the reweighting analogously. Finally, we compared the estimates. It was somewhat surprising that the two results differed rather little in our examples. This is, on the other hand, a good point, since it is easier to work with one step, and hence this approach could be introduced into use.

We thus propose a certain methodology for two-phase sampling weighting, but cannot say definitely which specification would be best in each particular case. Our methodology is quite easy to apply, but the advantages from it naturally depend largely on the availability of good external and internal auxiliary data. If no direct auxiliary variables are available, it will not be clear how good the adjusted estimates will be. Our examples show that they will easily be less biased than the initial ones. However, our recommended technique seems to be somewhat conservative, in that all the best adjusted estimates in our analysis are slightly overestimated, although not statistically significantly. This is an interesting question for the future research that is still needed, especially because this problem is becoming more common in the survey world. Another interesting topic for future research is how to make an optimal choice of auxiliary variables in the two-phase survey setting.

Acknowledgements

The author would like to thank the editor and the anonymous referees for their precise and helpful comments.
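The weighting strategy summarized above - a response propensity model, an inverse-propensity adjustment of the design weights, and a final calibration of the weight sums to known population aggregates - can be sketched in a few lines. This is an illustrative sketch only, not the author's code: the one-covariate logistic model and the single ratio-calibration margin are simplifying assumptions.

```python
import math

def fit_logistic(x, r, steps=2000, lr=0.1):
    """Tiny one-covariate logistic regression of the response indicator r
    on x, fitted by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += ri - p
            g1 += (ri - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def propensity_weights(x, r, base_w):
    """Divide each respondent's design weight by its estimated response
    propensity; non-respondents drop out of the adjusted sample."""
    b0, b1 = fit_logistic(x, r)
    return [wi / (1.0 / (1.0 + math.exp(-(b0 + b1 * xi))))
            for xi, ri, wi in zip(x, r, base_w) if ri]

def calibrate(weights, known_total):
    """Final step: ratio-calibrate the sum of the weights to a known
    population aggregate (a single-margin stand-in for raking)."""
    factor = known_total / sum(weights)
    return [wi * factor for wi in weights]
```

Because each estimated propensity is strictly below 1, every adjusted weight exceeds the corresponding design weight, and the calibration step then restores the weight sum to the known aggregate.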

Laaksonen: Weighting for two-phase surveyed data

References

Deville, J.-C., Särndal, C.-E. and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, 1013-1020.

Duncan, K.B., and Stasny, E.A. (2001). Using propensity scores to control coverage bias in telephone surveys. Survey Methodology, 27, 2, 121-130.

Dupont, F. (1995). Alternative adjustments where there are several levels of auxiliary information. Survey Methodology, 21, 125-135.

Ekholm, A., and Laaksonen, S. (1991). Weighting via response modelling in the Finnish household budget survey. Journal of Official Statistics, 7, 2, 325-337.

Fuller, W.A., Loughin, M.M. and Baker, H. (1994). Regression weighting in the presence of nonresponse with application to the 1987-1988 Nationwide Food Consumption Survey. Survey Methodology, 20, 75-85.

Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12, 1-16.

Laaksonen, S. (1999). Weighting and auxiliary variables in sample surveys. In "Enquêtes et Sondages. Méthodes, modèles, applications, nouvelles approches" (Eds., G. Brossier and A.-M. Dussaix). Dunod, Paris. 168-180.

Laaksonen, S. (2006a). Does the choice of link function matter in response propensity modelling? Model Assisted Statistics and Applications, An International Journal. IOS Press. 1, 95-100.

Laaksonen, S. (2006b). Need for high quality auxiliary data service for improving the quality of editing and imputation. In United Nations Statistical Commission, "Statistical Data Editing", 3, 334-344.

Laaksonen, S., and Chambers, R. (2006). Survey estimation under informative non-response with follow-up. Journal of Official Statistics, 81-95.

Little, R.J.A. (1986). Survey nonresponse adjustments for estimates of means. International Statistical Review, 54, 139-157.

Lundström, S., and Särndal, C.-E. (2001). Estimation in the presence of nonresponse and frame imperfections. Statistics Sweden.

Rizzo, L., Kalton, G. and Brick, J.M. (1996). A comparison of some weighting adjustment methods for panel nonresponse. Survey Methodology, 22, 43-53.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.

Virtanen, V., Pouta, E., Sievänen, T. and Laaksonen, S. (2001). Luonnon virkistyskäytön kysyntätutkimuksen aineistot ja menetelmät. (Data and methods of survey on recreational use of nature). In Luonnon Virkistyskäyttö (Recreational use of nature). Finnish Forest Research Institute, 802.

Wu, C., and Sitter, R.R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96, 185-193.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 131-137

Weighting in rotating samples: The SILC survey in France

Pascal Ardilly and Pierre Lavallée 1

Abstract

The European Union's Statistics on Income and Living Conditions (SILC) survey was introduced in 2004 as a replacement for the European Panel. It produces annual statistics on income distribution, poverty and social exclusion. First conducted in France in May 2004, it is a longitudinal survey of all individuals over the age of 15 in 16,000 dwellings selected from the master sample and the new-housing sample frame. All respondents are tracked over time, even when they move to a different dwelling. The survey also has to produce cross-sectional estimates of good quality. To limit the response burden, the sample design recommended by Eurostat is a rotation scheme consisting of four panels that remain in the sample for four years, with one panel replaced each year. France, however, decided to increase the panel duration to nine years. The rotating sample design meets the survey's longitudinal and cross-sectional requirements, but it presents some weighting challenges. Following a review of the inference context of a longitudinal survey, the paper discusses the longitudinal and cross-sectional weighting schemes, which are designed to produce approximately unbiased estimators.

Key Words: Longitudinal survey; Panel; Weight share method; Longitudinal weighting; Cross-sectional weighting.

1. Introduction

Statistics on Income and Living Conditions (SILC) is a European survey that produces data on the income and living conditions of persons living in regular households (persons living in communal households are excluded). It was introduced in 2004 as a replacement for the European Panel. While it is a European Union (EU) survey and therefore under Eurostat responsibility, it is conducted independently in each EU member state. Hence, the member states - France in this case - are free to adjust the sample design suggested by Eurostat to meet their national requirements. The data are also processed by the individual member states, as is usually the case for Eurostat surveys in the EU. This article deals only with the SILC survey conducted in France, but it may also be of interest to other EU member states.

SILC is a longitudinal survey conducted once a year in May. It focuses on individuals rather than households, and data are collected through personal interviews with every person in the sampled dwellings. SILC can be thought of as the European version of Statistics Canada's Survey of Labour and Income Dynamics (SLID) (see Lavallée 1995, and Lévesque and Franklin 2000).

The SILC sample is rotating: each year, it is formed by combining nine panel subsamples selected under identical steady-state conditions, partly from the master sample and partly from the new-housing sample frame. The master sample and the new-housing sample frame are two dwelling frames constructed from the French census of population and the information and automated data processing system for dwelling and office space (SITADEL), respectively (see Ardilly 2006).

Each incoming panel includes all individuals living in the selected dwellings. Surveying all members of the households living in the selected dwellings makes it possible to produce both individual-level and household-level estimates and helps keep collection costs down by maximizing the number of individuals reached in each contact. On the other hand, some of the estimates are narrower in scope, applying only to the population aged 16 and over on December 31 of the survey year.

Each year, one subsample is rotated out and replaced with another subsample. In the survey's starting year, 2004, each subsample consisted of 1,780 dwellings (give or take a few units because of rounding). In the second and subsequent years (i.e., from 2005 on), the size of the year's incoming subsample was 3,000 dwellings. Note that at the outset in 2004, the sample was 16,000 dwellings, divided into nine equal parts. One of those parts was surveyed only once (in 2004), another twice (2004 and 2005), a third three times (2004, 2005 and 2006), and so on. After the start-up phase, a given panel will be surveyed for nine consecutive years. During the start-up phase, which will end in 2012 with the departure of the ninth and last subsample from the 2004 selection, the subsamples will have been surveyed fewer than nine times.

The sampling procedure itself is the standard method of selecting units from the master sample and the new-dwelling sample frame (see Ardilly 2006). In this case, no

1. Pascal Ardilly, Sampling and statistical data division, INSEE, Paris, France. E-mail: [email protected]; Pierre Lavallée, Social Survey Methods Division, Statistics Canada, Ottawa, Ontario, K1A 0T6 Canada. E-mail: [email protected].

category of individuals is overrepresented. The survey has a uniform sampling fraction - ignoring rounding - except for vacant rural dwellings and dwellings that were secondary residences in the 1999 census and became principal residences by the survey date, which are traditionally underrepresented.

Under the collection process, each subsample is considered a true panel of individuals. Panel members who move are tracked, and their files are sent to the appropriate regional branch of INSEE. More details on SILC's sample design are available in the November 17, 2003, issue of the Official Journal of the European Union and in internal INSEE documents describing sampling practices in France.

Since SILC is a longitudinal survey with panels that overlap in time, weighting the sample presents a special problem. This paper provides a detailed picture of the two types of weighting used for SILC. We will begin by discussing some general principles related to SILC's sample design. Then we will examine longitudinal weighting, followed by cross-sectional weighting.

Note that we will not consider the topics of non-response correction and estimate adjustment. Those issues are dealt with in the same way as they are generally for any other longitudinal survey, such as the SLID (see Lavallée 1995, and Lévesque and Franklin 2000).

2. General principles

2.1 Two approaches: The longitudinal view and the cross-sectional view

Each year, we have a sample of fully panelized individuals, eight ninths of whom were interviewed at least once in previous years (barring non-response). Two types of parameters may be of interest: annual totals $Y_t$ (or their satellites), and changes in totals $\Delta_{t,t+1}$ between two years, consecutive or otherwise. For simplicity, we will confine ourselves to differences in totals between two consecutive years. When discussing changes, we have to be clear about the inference populations involved. We can look at the data in two different ways: either as populations that change over time - the cross-sectional approach - or as a fixed population - the longitudinal approach. If we let $\Omega_t$ be the entire in-scope population in year $t$, the annual total for year $t$ is given by $Y_t = \sum_{i \in \Omega_t} Y_i^t$, where $Y_i^t$ is a variable of interest measured for individual $i$. When we look at change, we may want to estimate the difference $\Delta^*_{t,t+1}$ between the total $Y_{t+1}$ at $t+1$ over $\Omega_{t+1}$ and the total $Y_t$ at $t$ over $\Omega_t$, that is, $\Delta^*_{t,t+1} = Y_{t+1} - Y_t$. This is a cross-sectional view. Alternatively, we may want to estimate the difference $\Delta_{t,t+1}$ between the totals for the units that are common to populations $\Omega_{t+1}$ and $\Omega_t$, where the size difference between the two populations is due to their incoming units (births) and outgoing units (deaths). This is a longitudinal view. Let $\Omega_{t,t+1} = \Omega_t \cap \Omega_{t+1}$ be the population that is common to $t$ and $t+1$. Then $\Delta_{t,t+1}$ is defined as $\Delta_{t,t+1} = \sum_{i \in \Omega_{t,t+1}} (Y_i^{t+1} - Y_i^t)$.

The two approaches are illustrated in the diagrams below. The upper rectangle represents the entire population at time $t$, and the lower one represents the entire population at time $t+1$. The "minus" side represents deaths in the broad sense (persons who have died, emigrated, moved to a communal household, and so on), and the "plus" side represents births in the broad sense (newborns, new immigrants, persons who have become part of the survey population by passing an age threshold, and so on). The grey portion represents the inference population on each date.

[Diagrams: for each of the LONGITUDINAL and CROSS-SECTIONAL views, two rectangles for the populations at times $t$ and $t+1$, with a "minus" side (deaths) and a "plus" side (births); the grey portion marks the inference population on each date.]

2.2 Surveys repeated over time and potential strategies

The goal, of course, is to produce both longitudinal estimates and cross-sectional estimates. There are essentially three possible strategies:

1. An "independent" sampling each year. In fact, because we have a master sample and a new-housing sample frame, the panels are selected from the same localities each year, and as a result, the subsamples are not truly independent. This solution can be improved for estimating changes.

2. A fully panelized sampling, i.e., initial selection of a sample that is surveyed each year. This scenario presents a response burden problem, since the SILC survey is to continue indefinitely. It is therefore unrealistic.

3. A rotational sample. This is the scenario that was chosen, because of its advantages in satisfying both longitudinal and cross-sectional goals.
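The two change measures defined in 2.1 can be illustrated with a toy sketch. This is hypothetical code, with each population represented as a dict from unit identifiers to their $y$-values:

```python
def cross_sectional_change(pop_t, pop_t1):
    """Delta*_{t,t+1} = Y_{t+1} - Y_t: the difference of totals over the two
    (possibly different) populations Omega_t and Omega_{t+1}."""
    return sum(pop_t1.values()) - sum(pop_t.values())

def longitudinal_change(pop_t, pop_t1):
    """Delta_{t,t+1}: the sum of individual changes over the common
    population Omega_{t,t+1} = Omega_t intersected with Omega_{t+1}."""
    common = pop_t.keys() & pop_t1.keys()
    return sum(pop_t1[i] - pop_t[i] for i in common)
```

With pop_t = {'a': 1.0, 'b': 2.0} (unit 'b' "dies") and pop_t1 = {'a': 3.0, 'c': 5.0} (unit 'c' is a "birth"), the cross-sectional change is 5.0 while the longitudinal change, taken over the common unit 'a' only, is 2.0; the two views generally differ whenever the population changes.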


The table below characterizes the three potential sample designs in terms of the two desired approaches.

Sample type              | Cross-sectional approach           | Longitudinal approach
"Independent" each year  | CUSTOMARY                          | POSSIBLE but less efficient
Panel                    | IMPOSSIBLE without a top-up sample | CUSTOMARY
Rotational               | POSSIBLE                           | POSSIBLE

The rotation strategy has four major advantages:

i. It reduces the sampling error associated with measuring change (in principle, as do panels, though it is theoretically less efficient than a "pure" panel).

ii. It has a smaller response burden than a "pure" panel. Under the circumstances, since France has a nine-year panel, this argument must be used with restraint. It is more persuasive in the scenario recommended by Eurostat, which consists of an annual survey for four consecutive years.

iii. It takes into account very "naturally" how the population changes over time. This point will become clearer when we look at the coverage of new populations.

iv. It reduces observation errors (as do panels).

On the other hand, the strategy also has at least three weaknesses:

i. Participants have to be tracked over time, which results in tracing costs and non-response due to moves.

ii. The length of the individual series is limited to nine years, which is substantial, though not as informative as a pure panel.

iii. The longitudinal/cross-sectional weighting method is not straightforward.

3. Longitudinal weighting

This type of weighting is inherently somewhat easier to understand than cross-sectional weighting because there is no need to take account of how the population changes over time (except for "deaths", units that leave the survey population over time, but they are not much of a technical problem). The idea behind longitudinal estimation, of course, is to make an inference based on a single population at an initial date.

Clearly, the rotational nature of the sample design is what makes weighting difficult, since between two consecutive years $t$ and $t+1$, we have to deal with eight different panels, each selected from a different population (a population made up of individuals who are, naturally, different from year to year). If we were dealing with just one panel, we would only need to use the sampling weights associated with the panel members who were still in-scope on date $t$, since those weights are calculated once and for all at the time of selection and can be used to make inferences about the initial population each year throughout the panel's life.

The essential difficulty is to represent population $\Omega_t$ on date $t$ using eight panel subsamples selected on different dates and therefore from different populations. Intuitively, it makes sense that a given individual would have a probability of selection on date $t$ that depends on the number of panel subsamples for which he or she could be chosen. For this discussion, it is assumed that there is no non-response. This situation can be expressed formally by letting $a_{t,k}$ denote the panel subsample surveyed in year $t$ for the $k$th time, and $s_{t,t+1} = \bigcup_{k=1}^{8} a_{t,k}$. Note that we can write $a_{t+1,k+1} = a_{t,k}$ $(\forall t, \forall k \neq 9)$ since we are obliged to use each (non-outgoing) panel subsample in its entirety year after year. This is pictured below.

[Diagram: rotation of the panel subsamples over TIME.
Year $t$:    $a_{t,9}$ $a_{t,8}$ $a_{t,7}$ $a_{t,6}$ $a_{t,5}$ $a_{t,4}$ $a_{t,3}$ $a_{t,2}$ $a_{t,1}$
Year $t+1$:          $a_{t,8}$ $a_{t,7}$ $a_{t,6}$ $a_{t,5}$ $a_{t,4}$ $a_{t,3}$ $a_{t,2}$ $a_{t,1}$ $a_{t+1,1}$]

The grey part represents $s_{t,t+1}$, which is the sample used in this longitudinal approach. It is from the individuals in $s_{t,t+1}$ that we obtain both $Y_i^t$ and $Y_i^{t+1}$, i.e., information about individual $i$ on dates $t$ and $t+1$ respectively.

Suppose we have an individual $i$ in $\Omega_t$ who is in-scope on date $t$. We denote as $L_i$ the number of years in $\{t-7, t-6, \ldots, t-1, t\}$ during which individual $i$ was in-scope and therefore had a chance of being selected as a member of an incoming panel. It is assumed here that each year, the sample frame covers the survey population exactly. We have $L_i \in \{1, 2, 3, \ldots, 8\}$. In addition, we denote as $K_i$ the set of $k$ indexes out of 1, 2, 3, ..., 8 for which $i \in a_{t,k}$. These are the numbers of the panels of which individual $i$ is a member on date $t$. For all $i$ in $s_{t,t+1}$, $K_i$ will be construed as a set containing at least one element. Most of the time, $K_i$ will in fact have only one index, but in some cases, it will have two or even more indexes. That will be the case if $i$ is selected for a panel, he/she moves, and his/her new dwelling is chosen for another panel in a subsequent year. Note that our scenario excludes the possibility of selecting a

given dwelling twice, since dwellings from the master sample and the new-housing sample frame are not supposed to be surveyed more than once. This is just a practical convention, however, as the theory can easily accommodate a system in which dwellings can be selected multiple times.

If $i \in a_{t,k}$, let $W_i(t,k)$ be his/her "raw" sampling weight. In fact, it is the sampling weight of the dwelling in which $i$ was living at the time he/she was chosen as a panel member, i.e., at the time of the annual selection from $\Omega_{t-k+1}$. This weighting system allows direct inference from subsample $a_{t,k}$ to the entire population $\Omega_{t-k+1}$. In particular, $\sum_{i \in a_{t,k}} W_i(t,k)$ provides an unbiased estimate of the total number of in-scope individuals who are members of population $\Omega_{t-k+1}$. For SILC in France, that total is roughly 60 million. The longitudinal weight assigned to each individual $i$ in $s_{t,t+1}$ will therefore be as follows:

$$W_i^{t,t+1} = \frac{1}{L_i} \sum_{k \in K_i} W_i(t,k). \quad (1)$$

This equation is derived from the application of the weight share method (see Lavallée 1995, and Lavallée 2002) in which the initial population (the population of sampling units) is defined as the union of the populations $\Omega_{t-7}, \ldots, \Omega_{t-1}, \Omega_t$, and the final population (the population of observation units) as $\Omega_t$. This is illustrated in the diagram below; for greater clarity, only three of the initial subpopulations are shown. Clearly, the number of links is equal to $L_i$ (in this case, $i$ has exactly eight links, while $j$ must have fewer than eight because it does not appear in the oldest sample frames). In practice, it is realistic to proceed as if $\Omega_{t-7} \subset \Omega_{t-6} \subset \ldots \subset \Omega_{t-1} \subset \Omega_t$. We can work with nested populations, since all individuals who leave the survey population in the time before $t$ will not be part of $s_{t,t+1}$.

[Diagram: the populations $\Omega_{t-7}$, $\Omega_{t-3}$ and $\Omega_t$, with links from individuals $i$ and $j$ in $\Omega_t$ back to the earlier populations in which each appears.]

Equation (1) is the most general formula for the "raw" longitudinal weight. It can then be simplified for specific situations. For example, if we ignore the cases in which a panel member can be selected more than once, we have

$$W_i^{t,t+1} = \frac{W_i}{L_i} \quad (2)$$

where $W_i$ is the weight of $i$ relative to the one panel subsample of which he/she was a member on date $t$. In France's case, because of the sample sizes involved, it seems quite appropriate to use that equation. If we assume that we are in the ideal position - though that seems simplistic in our circumstances - of having a population that does not change over time, we will have $L_i = 8$ for all $i$. The population changes a great deal in nine years, but with shorter panel lives, the ideal case might be an acceptable approximation. Moreover, if the panels are selected with equal probabilities, $W_i$ will be equal to a constant $W$ and we will have

$$W_i^{t,t+1} = \frac{W}{8}. \quad (3)$$

Such a scenario is highly improbable in France's case. First, up to 2012, the sample will contain subsamples with very different raw weights. Second, the sampling process is likely to focus on generating a predetermined number of dwellings (as the total number of dwellings increases), and not a constant sampling fraction.

Note that equation (3) is intuitive. Ultimately, everything proceeds "as if" any individual in the longitudinal sample $s_{t,t+1}$ had a selection probability eight times the selection probability of each panel subsample of which $s_{t,t+1}$ is composed.

The foregoing applies to the survey in its steady state and must be adapted slightly during the start-up phase, i.e., until 2012. The first longitudinal operation is performed on the combined 2004-2005 data, to estimate the changes between 2004 and 2005 with the 2004 reference population (from which the "deaths" are removed in 2005). In this case, we need only divide all the weights $W_i$ of the eight subsamples $a_{2004,1}$ to $a_{2004,8}$ by 8; in other words, $L_i = 8$ for all $i$. In 2006, when we look at the 2005-2006 changes, the denominator $L_i$ may take only two values. In the first scenario, panel member $i$ was in the sample frame used in 2004 (and hence could have been selected in 2004) and so $L_i = 8$. This is due to the fact that everything proceeds as if, in 2004, the seven selection processes for panels $a_{2005,2}$ to $a_{2005,8}$ had been carried out under exactly the same conditions. In the second scenario, individual $i$ was not in the 2004 sample frame - but is in the 2005 frame and is necessarily in $a_{2005,1}$ - and $L_i = 1$. For the 2006-2007 changes, $L_i$ can be equal to 1, 2 or 8, and so on. We will not

have the set of all possible values of $L_i$ in $\{1, 2, 3, \ldots, 8\}$ until we measure the 2011-2012 changes.

Once we reach this point in the longitudinal weighting process, we can calculate the longitudinal weights $W_i^{t,t+1}$ and then derive the estimator of the difference $\Delta_{t,t+1}$ using

$$\hat{\Delta}_{t,t+1} = \sum_{i \in s_{t,t+1}} W_i^{t,t+1} \, (Y_i^{t+1} - Y_i^t). \quad (4)$$

Logically, the weights $W_i^{t,t+1}$ are used only to estimate change. They are of no value for point estimates because the inference population has little meaning on a particular date. Note that up to this point, the $W_i^{t,t+1}$ have not been corrected for non-response or adjusted in any other way. In practice, equation (4) will be subject to adjustments in the case of the SILC survey.

Estimation of the difference $\Delta^*_{t,t+1} = Y_{t+1} - Y_t$ is a cross-sectional matter and therefore involves the weighting process described in the next section.

4. Cross-sectional weighting

The aim is to make an inference about the total in-scope population $\Omega_t$ on the current date $t$. The essential difficulty lies in the fact that in theory, a given (panelized) subsample provides adequate coverage of the population only in the year in which it was selected. After that year, the panel subsample no longer represents the new population of "births", the units that become in-scope. That is the case for newborns, immigrants, individuals who reach specific age thresholds, homeless people who start living in a regular dwelling, people who leave communal dwellings, and so on. While in practice we might consider this coverage defect acceptable for a period of time, it very quickly becomes a serious problem (that is true each year for most panel subsamples), and a top-up sample must be obtained in some fashion. It is worth noting that the problem of population change over time is highly dissymmetrical, since the subpopulation that disappears from year to year (the "deaths") presents no particular difficulties for weighting.

In the SILC survey, the top-up sample is obtained as follows. We survey all individuals in the household of each panel member interviewed in the longitudinal tracking process. Thus, every household surveyed in the cross-sectional process is made up of two types of people: panel members and cohabitants (people who are surveyed but are not panel members). This method covers a large portion of the "births" (in the broad sense) in the population. However, it does not cover households consisting entirely of "births", such as households of new immigrants. "Birth" status is usually determined by asking the birth date of newborns and the landing date of immigrants. Moreover, in practice, the weakness in births coverage is generally regarded as very minor because it is partially corrected with adjustments.

The main technique used to produce cross-sectional weights is the weight share method (Lavallée 2002). As noted previously, in year $t$ we have nine panel subsamples $a_{t,k}$ $(1 \leq k \leq 9)$. We will describe two different ways of using the weight share method. The information that must be collected in the questionnaire is the same for both methods.

4.1 Method 1

The more rigorous approach involves linking all nine subsamples $a_{t,k}$ to the cross-sectional sample for year $t$, which we will denote $\tilde{u}_t$ (Merkouris 2001). In other words, the sample $\tilde{u}_t$ is the same as $s_{t,t} = \bigcup_{k=1}^{9} a_{t,k}$. First, we must determine the links associated with this approach. When a panel member in one of the nine subsamples $a_{t,k}$ is selected, he/she points to himself/herself as a member of the cross-sectional sample at $t$ (similar to what is shown in the diagram in 3.1). Under these conditions, when the survey is in steady-state mode, the cross-sectional weight $W_i^{t(1)}$ of an individual $i$ in $\tilde{u}_t$ is calculated as shown below. The household of which $i$ is a member is denoted $m$. We have

$$W_i^{t(1)} = \frac{\sum_{k=1}^{9} \sum_{j \in m,\; j \in a_{t,k}} W_j(t,k)}{\sum_{k=1}^{9} \sum_{j \in m,\; j \in \Omega_{t-k+1}} 1} \quad (5)$$

where $W_j(t,k)$ is the sampling weight from sample $a_{t,k}$. This expression shows that all members of the same household ultimately have the same weight. In the numerator, we have the sum of all the "raw" weights (the sampling weights) of the household's panel members. It is understood that a panel member generally appears in only one subsample, but that there may be cases in which a panel member is selected two or more times over a period of nine consecutive years (usually because he/she has moved). Note that dwellings selected from the master sample and the new-housing sample frame are not supposed to be selected again and therefore, in the case of SILC, the probability that an individual who has not moved will appear in two different panels is zero.

As in the longitudinal case (see 3.1), weighting can be carried out only if the data management system is capable of linking each panel member in $\tilde{u}_t$ to all panel samples $a_{t,k}$ in which he/she is included. In the denominator, for each of the nine years $t-8$ to $t$ considered, we count the household members (both panel members and cohabitants) who are in the sample frame from which the incoming panel subsample

for the year in question is selected. This calculation clearly requires the information provided by the questionnaire.

There are two advantages to this approach: it is completely general, and it produces unbiased cross-sectional weights directly because every cross-sectional household is necessarily linked to one of the nine subsamples involved. The fact that there is an incoming subsample each year ensures the completeness of the cross-sectional population $\Omega_t$; that is, in more technical terms, it ensures that there is at least one link for each household considered at $t$. This is a useful property of rotational sampling, as discussed in section 2.2. On the other hand, the weighting formula has a disadvantage, which is its (relative) complexity both in theoretical terms and for computer programming purposes.

In the start-up phase (up to and including 2011), the formula must be adjusted. The numerator remains the same, but the denominator covers all individuals who could be sampled in 2004 (the survey's first year) and subsequent years. In 2004, weighting is trivial since there is no weight share, but in 2005, we have

$$W_i^{t(1)} = \frac{\sum_{k=1}^{9} \sum_{j \in m,\; j \in a_{t,k}} W_j(t,k)}{\sum_{j \in m,\; j \in \Omega_{2005}} 1 \; + \; 8 \cdot \left( \sum_{j \in m,\; j \in \Omega_{2004}} 1 \right)}. \quad (6)$$

In 2006, the formula will be

$$W_i^{t(1)} = \frac{\sum_{k=1}^{9} \sum_{j \in m,\; j \in a_{t,k}} W_j(t,k)}{\sum_{j \in m,\; j \in \Omega_{2006}} 1 \; + \; \left( \sum_{j \in m,\; j \in \Omega_{2005}} 1 \right) \; + \; 7 \cdot \left( \sum_{j \in m,\; j \in \Omega_{2004}} 1 \right)}. \quad (7)$$

4.2 Method 2

It is possible to take an alternative approach to cross-sectional weighting, one that leads to a "slightly" simpler equation and is easier to program, but one that presents a difficulty that was not present in the previous method and may make the final weights somewhat less precise. The idea is to use one subsample at a time rather than all of them at once. We take one of the nine subsamples $a_{t,k}$ and the sample of households to which it leads. We then apply the weight share, which when the survey is in steady-state mode yields an individual weight equal to

$$\tilde{W}_i(t,k) = \frac{\sum_{j \in m,\; j \in a_{t,k}} W_j(t,k)}{\sum_{j \in m,\; j \in \Omega_{t-k+1}} 1} \quad (8)$$

for any individual $i$ in household $m$. It is very easy to verify that if $k = 1$ (which is the case for the incoming subsample), $\tilde{W}_i(t,1)$ is the sampling weight of household $m$.

The problem with this approach lies in the existence (a priori) on date $t$ of individuals who cannot be surveyed because they belong to households that cannot be "reached" through the sampling $a_{t,k}$ (as long as $k \geq 2$), i.e., individuals whose probability of being surveyed at $t$ is zero. This impediment did not exist in the previous method because taking all the subsamples into account at once ensured that on date $t$, every household had a non-zero probability of being selected, at least through $a_{t,1}$. This illustrates once again one of the key advantages of rotational sampling, which is that it covers the entire population each year. In our approach, it is clear that if we consider $a_{t,k}$ for $k \geq 2$, the population of households consisting exclusively of "immigrants" (in the broad sense) between $t-k+1$ and $t$ is not covered. To formalize the situation and produce the final cross-sectional weight, we will use $\Omega^{\text{immig}}_{\alpha,t}$ to denote the population of "immigrants" (in the broad sense) on date $t$ in households that consist only of immigrants who can be sampled after year $\alpha$, with $t-8 \leq \alpha \leq t-1$. To be more precise, we should say "who can be sampled on or after a date that is strictly subsequent to the collection date in year $\alpha$."

On date $t$, the entire population $\Omega_t$ is partitioned into nine components: the eight subpopulations $\Omega^{\text{immig}}_{\alpha,t}$, with $\alpha$ ranging from $t-8$ to $t-1$, and the subpopulation consisting of individuals who either were already surveyable at $t-8$ or became surveyable on a date subsequent to $t-8$ (i.e., who immigrated after $t-8$) but at $t$ are members of a household containing at least one person who is surveyable at $t-8$. We consider that if the household at $t$ contains at least one person who is surveyable at $t-8$, that will be the case on any date between $t-8$ and $t-1$. This ignores situations in which an individual who is in-scope on a given date becomes out-of-scope for a time (as a result of emigration, for example) and then becomes in-scope again.

Next, we use $\tilde{u}_{t,k}$ to denote the cross-sectional sample at $t$ from panel $a_{t,k}$, which leads to $\bigcup_{k=1}^{9} \tilde{u}_{t,k} = \tilde{u}_t$. Let $Y^{\text{immig}}_{\alpha,t}$ be the total of the $Y_i^t$ defined on $\Omega^{\text{immig}}_{\alpha,t}$. Following the weight share performed for all $k = 2, \ldots, 9$, we have

$$E\left[ \sum_{j \in \tilde{u}_{t,k}} \tilde{W}_j(t,k) \cdot Y_j^t \right] = \sum_{\Omega_t} Y_j^t - \sum_{\alpha = t-k+1}^{t-1} Y^{\text{immig}}_{\alpha,t} = Y_t - \sum_{\alpha = t-k+1}^{t-1} Y^{\text{immig}}_{\alpha,t} \quad (9)$$

and

E[ Σ_{j∈ũ_{t,1}} W̃_j(t,1) · Y^t_j ] = Σ_{Ω_t} Y^t_j = Y^t  (10)

since ũ_{t,1} = a_{t,1}.

If we were using shorter-duration panels, we might be able to ignore the Y^immig_{α,t} and take the actual total over Ω_t. In that case, the "raw" final cross-sectional weight of any individual i would be W̃_i(t,k)/9 if i is from a_{t,k}, which would yield the final estimator

Ŷ_t = (1/9) Σ_{k=1}^{9} Σ_{i∈ũ_{t,k}} W̃_i(t,k) · Y^t_i = (1/9) Σ_{i∈ũ_t} W̃_i(t,k) · Y^t_i.  (11)

However, since the panels used in France have long lives, we will probably not be able to ignore the Y^immig_{α,t} (an analysis of the collection files will provide the answer), which will mean having to compute specific weights for the individuals in Ω^immig_{α,t}. In those circumstances, we check that any individual i in Ω^immig_{α,t} who ends up in the cross-sectional sample ũ_t will have a raw cross-sectional weight W_i^{(2)} equal to the weight share value W̃_i(t,k) divided by t − α (and therefore 1 ≤ t − α ≤ 8). Any individual in Ω_t who does not belong to any of the Ω^immig_{α,t} (i.e., the vast majority of individuals) will have a final weight of W̃_i(t,k)/9. Note that if i is in Ω^immig_{α,t}, he/she can be surveyed only through a_{t,1}, a_{t,2}, …, a_{t,t−α}. Thus we have

W_i^{t(2)} = W̃_i(t,k)/(t − α)  if i ∈ Ω^immig_{α,t};
W_i^{t(2)} = W̃_i(t,k)/9  otherwise.  (12)

In the start-up phase, the weighting process has to be adjusted. In 2005, the final cross-sectional weight of individuals in Ω^immig_{2004,2005} will come directly from the selection of the dwelling from a_{2005,1} (they can only be reached through this incoming panel). In contrast, all other individuals can be surveyed "normally" in the nine panels a_{2005,k} (1 ≤ k ≤ 9), so that their weights as calculated by the weight share method will all be divided by 9. In 2006, the weights of the individuals in Ω^immig_{2005,2006} will be equal to the weight of the dwelling in which they live, a weight that directly reflects the sampling from a_{2006,1}; the weights of the individuals in Ω^immig_{2004,2006} will be the weights from the weight share divided by 2; and the weights of all other individuals will be the weights from the weight share divided by 9.

This procedure can be carried out for one subsample after another and does not have to take account of what happens in other subsamples. If an individual is surveyed at t through two (or more) different subsamples a_{t,k}, we carry out the full procedure for each of the two (or more) subsamples. This could occur, for example, in the case of a household composed of two panel members from two different subsamples a_{t,k} who married each other and before their marriage were each tracked separately as one-person households. In that scenario, each individual would be "formally" surveyed twice, once as a panel member and once as a cohabitant.

Finally, to estimate the difference Δ*_{t,t+1} = Y^{t+1} − Y^t, we can use the weights W_i^{t(1)} from method 1 and calculate

Δ̂*_{t,t+1} = Σ_{i∈ũ_{t+1}} W_i^{t+1(1)} Y_i^{t+1} − Σ_{i∈ũ_t} W_i^{t(1)} Y_i^t.  (13)

Alternatively, we can use the weights W_i^{t(2)} from method 2. In that case, the estimator of the difference Δ*_{t,t+1} will be given by

Δ̂*_{t,t+1} = Σ_{i∈ũ_{t+1}} W_i^{t+1(2)} Y_i^{t+1} − Σ_{i∈ũ_t} W_i^{t(2)} Y_i^t.  (14)

References

Ardilly, P. (2006). Les techniques de sondage, 2nd edition. Paris: Editions Technip.

Lavallée, P. (1995). Cross-sectional weighting of longitudinal surveys of individuals and households using the weight share method. Survey Methodology, 21, 25-32.

Lavallée, P. (2002). Le sondage indirect, ou la méthode généralisée du partage des poids. Éditions de l'Université de Bruxelles et Editions Ellipses.

Lévesque, I., and Franklin, S. (2000). Longitudinal and Cross-Sectional Weighting of the Survey of Labour and Income Dynamics, 1997 Reference Year. Research document on income, Statistics Canada, Catalogue No. 75F0002MIE-00004, June 2000.

Merkouris, T. (2001). Cross-sectional estimation in multiple-panel household surveys. Survey Methodology, 27, 171-181.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Singh, M.P., Drew, J.D., Gambino, J.G. and Mayda, F. (1990). Méthodologie de l'Enquête sur la population active. Statistics Canada, Catalogue 71-526.
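As a concrete illustration of the steady-state weight share in equation (8) of section 4.2 above, the following is a minimal Python sketch, not the survey's actual processing code. The record fields `sampled_weight` and `in_linking_pop` are hypothetical names: the first holds W_j(t,k) for members reached through subsample a_{t,k} (0 otherwise), and the second flags membership in the linking population Ω_{t−k+1}.

```python
def weight_share(members):
    """Shared cross-sectional weight for one household reached at t
    through subsample a_{t,k}, per equation (8): the sum of the
    sampling weights of the members who were selected, divided by
    the number of members who belong to Omega_{t-k+1} (the links)."""
    numerator = sum(m["sampled_weight"] for m in members)
    links = sum(1 for m in members if m["in_linking_pop"])
    if links == 0:
        # A household with no member in Omega_{t-k+1} (e.g., made up
        # entirely of recent "immigrants") cannot be reached at all
        # through this subsample -- the impediment discussed above.
        raise ValueError("household has no link to Omega_{t-k+1}")
    return numerator / links
```

For example, a household with one sampled member of weight 1,000 and two members in the linking population would give every member the shared weight 500.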

Statistics Canada, Catalogue No. 12­001­X

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 139-150

Cell collapsing in poststratification

Jay J. Kim, Jianzhu Li and Richard Valliant 1

Abstract Poststratification is a common method of estimation in household surveys. Cells are formed based on characteristics that are known for all sample respondents and for which external control counts are available from a census or another source. The inverses of the poststratification adjustments are usually referred to as coverage ratios. Coverage of some demographic groups may be substantially below 100 percent, and poststratifying serves to correct for biases due to poor coverage. A standard procedure in poststratification is to collapse or combine cells when the sample sizes fall below some minimum or the weight adjustments are above some maximum. Collapsing can either increase or decrease the variance of an estimate but may simultaneously increase its bias. We study the effects on bias and variance of this type of dynamic cell collapsing theoretically and through simulation using a population based on the 2003 National Health Interview Survey. Two alternative estimators are also proposed that restrict the size of weight adjustments when cells are collapsed.

Key Words: Bias; Combining cells; Coverage error; Poststratification; Under­coverage; Weight trimming.

1. Introduction

Poststratification is a common technique used in survey weighting that can serve to (1) reduce variances or (2) adjust for deficient coverage by the sample of some groups in the target population. In household surveys in the U.S., the second purpose is especially important because some demographic groups, like young Black males, are covered less well than others (e.g., see Kostanich and Dippo 2000, chapter 16). Adjusting for undercoverage can lead to differential weights, which may correct for bias but will also increase standard errors. Practitioners often avoid making extreme weight adjustments, in effect trading off some bias reduction in order to keep variances under control.

One method of controlling the size of weight adjustments is to collapse the initial poststratification cells together if the adjustment in a cell exceeds some limit. Little (1993) and Lazzeroni and Little (1998) cover methods of collapsing categories of ordinal poststratifiers. Other strategies for how to collapse strata or construct estimators have been suggested by Fuller (1966), Kalton and Maligalig (1991), and Tremblay (1986). Kim, Thompson, Woltman, and Vajs (1982) give some practical applications. In this paper, we study the effects on bias and variance of combining cells, assuming that more finely defined cells would be preferable if the sample sizes and sizes of weight adjustments were within some tolerances set by the survey designers.

Two criteria are often used to decide whether a cell should be collapsed with another. The first is the inverse coverage ratio or initial adjustment factor (IAF), defined as the ratio of the control count to the initially weighted sample count for the cell. A ratio which is significantly different from 1 indicates that coverage is either low or high for the group represented by the cell. When the IAF for a cell falls outside some bounds set in advance, the cell is combined with another. For example, the collapsing threshold for a "high" ratio might be 2 and the threshold for a "low" ratio 0.6, which are the bounds used in the Current Population Survey (CPS) conducted by the U.S. Bureau of the Census (see Kostanich and Dippo 2000, page 10-7). The second criterion is the sample size. A cell whose raw sample count is too small may be collapsed on the grounds that the IAF is unstable. We will refer to a cell as sparse if it violates one or the other of the criteria and is collapsed with another cell.

The categories of the variables that define poststrata are usually sorted based on a natural ordering (e.g., age or income categories) or a convenient ordering (e.g., race-ethnicity). Common practice is to collapse a cell with an adjacent one which is similar in characteristics, disregarding different coverage ratios of the individual cells.

Kalton and Flores-Cervantes (2003, page 95) observed that "methods that automatically restrict the range of the adjustments are redistributing the excess adjustments that would otherwise be given to some respondents to other respondents. The appropriateness of this redistribution should be examined." This paper indeed examines its appropriateness and identifies circumstances where the weight redistribution due to collapsing may be quite harmful.

An obvious weakness of popular collapsing strategies is that coverage bias for some groups will be incompletely corrected. For example, suppose that the survey estimate for the number of units in a group is only 1/3 of the census count, so that initial weights would have to be multiplied by 3 to correct for undercoverage. If cell collapsing restricts the

1. Jay J. Kim, National Center for Health Statistics, Centers for Disease Control and Prevention; Jianzhu Li, Joint Program in Survey Methodology, University of Maryland; Richard Valliant, Survey Research Center, University of Michigan.

weight adjustment for units in that group to a factor of 2, then the survey estimate for the number of units in the group will be only 2/3 of the census count. In addition, if cells with much different means are combined, bias can be introduced rather than corrected. The incomplete correction for undercoverage and collapsing of cells with disparate characteristics may lead to bias in totals, means, and other types of estimates.

Table 1 gives some illustrative coverage ratios, i.e., survey estimates prior to poststratification divided by census counts, for the March 2002 U.S. Current Population Survey and the 2003 Behavioral Risk Factors Surveillance Survey (BRFSS) in a set of 44 counties in the southwestern U.S. The survey estimates include a nonresponse adjustment for both CPS and BRFSS. Coverage ratios shown for the subset of demographic groups in the CPS range from 0.70 to 0.93, with Black-only typically being less than for other groups. BRFSS is a telephone survey with low response rates in this set of counties, and the ratios for BRFSS are much smaller than for CPS. There are also substantial differences in coverage ratios for different groups in BRFSS. For example, the ratio for 35-44 year old Hispanic males is 0.18 but is 0.37 for Black/Multiracial/Other males in the same age range. If these two groups were collapsed, incomplete coverage would be under-corrected for the Hispanics but over-corrected for the Black/Multiracial/Other group.

Another example is the 2003 National Health Interview Survey (NHIS) where American Indians and Asians were collapsed with Whites within age groups. In the cell for ages 25-29, for example, the coverage rates for Whites, American Indians, and Asians were 0.60, 0.44, and 0.31, respectively (Tompkins and Kim 2006).

This paper demonstrates the weaknesses of current cell collapsing procedures and proposes some alternatives. Section 2 discusses the bias of some standard estimators when there is undercoverage. Section 3 introduces two new estimators that retain more of the undercoverage adjustment than the standard method when cells are collapsed. Empirical properties of the standard and alternative methods are investigated through simulation in section 4. We conclude in section 5 with a summary and some possibilities for future research.

2. Some standard estimators

Three standard estimators of the population mean are the Hájek estimator, the poststratified estimator, and the poststratified estimator with initial poststrata collapsed together where necessary. Each of these is defined in detail below. When the sampling frame covers units in the target population at different rates, each estimator can be biased. Kim et al. (2005) give some numerical illustrations of the effects of collapsing pairs of cells with different coverage rates.

To derive theory for alternative estimators of means, we model a unit's being covered by the sampling frame and a cell's being sparse or not as random events. Define three indicator variables: δ_k = 1 if unit k is selected for the sample and 0 if not; c_k = 1 if unit k is covered by the frame and 0 if not; d_i = 1 if poststratum i is classified as sparse in a particular sample and 0 if not (i = 1, …, I). These indicators are assumed to be mutually independent and to have expectations π_k, φ_k, and p_i, respectively. Consider a stratified, two-stage probability sample design. A design stratum is denoted by h; s_h is the set of primary sampling units (PSU's) selected from design stratum h; s_{hj(i)} is the set of sample units from sample PSU j in stratum h that are also in poststratum i; U_h is the population of PSU's in stratum h; U_{hj} is the population of units in PSU j within stratum h; and U_{hj(i)} is the population of units in PSU j within stratum h that are in poststratum i. For the analysis in this section, the sample design does not need to be specified in more detail. Note that some summations in sections 2 and 3 of the form Σ_h Σ_{j∈s_h} Σ_{k∈s_{hj(i)}} for units within poststratum i could be simplified to Σ_{k∈s(i)} without loss of generality. We have used the more elaborate notation to make clear how the stages of sampling should be treated.

2.1 Hájek estimator

First, consider the Hájek estimator of a mean, which is

ŷ_π = ( Σ_{i=1}^{I} Σ_h Σ_{j∈s_h} Σ_{k∈s_{hj(i)}} y_k/π_k ) / ( Σ_{i=1}^{I} Σ_h Σ_{j∈s_h} Σ_{k∈s_{hj(i)}} 1/π_k ) ≡ T̂_π/N̂_π.  (1)

The expectation of T̂_π with respect to sampling and the coverage mechanism is E_c E_π(T̂_π) = Σ_{i=1}^{I} Σ_h Σ_{j∈U_h} Σ_{k∈U_{hj(i)}} φ_k y_k ≡ T^c, where the c superscript denotes "covered". Similarly, the expectation of N̂_π is E_c E_π(N̂_π) = Σ_{i=1}^{I} Σ_h Σ_{j∈U_h} Σ_{k∈U_{hj(i)}} φ_k ≡ N^c. Expanding ŷ_π around (T^c, N^c), its linear approximation is ŷ_π ≈ T^c/N^c + (1/N^c)(T̂_π − (T^c/N^c) N̂_π). Next, consider the bias of ŷ_π as an estimator of Ȳ = Σ_{i=1}^{I} T_i/N, with T_i being the total for the full population of units in poststratum i (not just the covered portion). After some calculation, the bias is

bias(ŷ_π) ≈ T^c/N^c − (1/N) Σ_{i=1}^{I} T_i = C_{φy}/φ̄  (2)

where φ̄ = Σ φ_k/N, C_{φy} = Σ (φ_k − φ̄)(y_k − Ȳ)/N, Ȳ = T/N, and Σ denotes the sum over i, h, j ∈ U_h, and k ∈ U_{hj(i)}. Consequently, ŷ_π is biased if there is any correlation between the variable measured, y, and the coverage probability φ_k. The bias in (2) is O(1), meaning that it remains important even in large samples.
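The Hájek estimator in (1) is simply the ratio of the inverse-probability-weighted total to the inverse-probability-weighted count; the stratum/PSU structure matters for variance estimation but not for the point estimate itself. A minimal Python sketch (not the paper's R simulation code), with the sampled units flattened into plain lists:

```python
def hajek_mean(y, pi):
    """Hajek estimator of a mean, equation (1).

    y  : observed values y_k for the sampled units
    pi : the corresponding selection probabilities pi_k
    """
    t_hat = sum(yk / pik for yk, pik in zip(y, pi))  # T-hat_pi
    n_hat = sum(1.0 / pik for pik in pi)             # N-hat_pi
    return t_hat / n_hat
```

For example, with y = [1, 2, 3] and selection probabilities [0.5, 0.5, 0.25], T̂_π = 18 and N̂_π = 8, so the estimate is 2.25.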


Table 1
Coverage ratios from the Current Population Survey (CPS) and Behavioral Risk Factors Surveillance Survey (BRFSS). White only and black only mean that only those races were reported by a respondent in the CPS. Residual-only race group includes cases indicating a single race other than white or black, and cases indicating two or more races. Hispanics may be of any race in the BRFSS tabulation.

March 2002 Current Population Survey
            White-only       Black-only       Residual-only
Age         Male    Female   Male    Female   Male    Female
0-15        0.93    0.93     0.78    0.79     0.91    0.90
16-19       0.90    0.88     0.76    0.81     0.93    0.75
20-24       0.79    0.85     0.72    0.77     0.75    0.72
25-34       0.83    0.89     0.70    0.76     0.76    0.80
All ages    0.90    0.92     0.78    0.83     0.85    0.84

2003 BRFSS: 44 border counties in Arizona, California, New Mexico, and Texas
            White Non-Hispanic   Black, Multiracial, Other   Hispanic
Age         Male    Female       Male    Female              Male    Female
18-24       0.19    0.26         0.12    0.24                0.15    0.22
25-34       0.20    0.31         0.10    0.16                0.19    0.39
35-44       0.28    0.31         0.37    0.25                0.18    0.30
All ages    0.25    0.31         0.25    0.20                0.18    0.31

Sources: Bureau of the Census (2002), Gonzalez, Town, and Kim (2005).
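The two collapsing criteria described in the introduction — an IAF outside preset bounds, or a raw cell count too small for the IAF to be stable — can be sketched in a few lines of Python. The bounds 2 and 0.6 are the CPS-style thresholds quoted above; the minimum cell size of 20 is an arbitrary illustrative value, not one taken from the paper:

```python
def initial_adjustment_factor(control_count, weighted_sample_count):
    # IAF = control count / initially weighted sample count for the cell;
    # its inverse is the cell's estimated coverage ratio.
    return control_count / weighted_sample_count

def is_sparse(control_count, weighted_sample_count, n_cell,
              high=2.0, low=0.6, n_min=20):
    # A cell is "sparse" if its IAF falls outside [low, high] or its
    # raw sample count is below the minimum; sparse cells are
    # candidates for collapsing with a neighbor.
    iaf = initial_adjustment_factor(control_count, weighted_sample_count)
    return iaf > high or iaf < low or n_cell < n_min
```

A cell whose weighted survey count is only a third of its census control (IAF = 3) is flagged, as is a well-covered cell with only a handful of respondents.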

If the coverage probability is the same for every unit in poststratum i, i.e., φ_k = φ(i) for any k ∈ U_{hj(i)}, then the approximate bias reduces to bias(ŷ_π) ≈ φ̄^{−1} Σ_i W_i (φ(i) − φ̄)(Ȳ_i − Ȳ), where W_i = N_i/N and Ȳ_i = Σ_{h, j∈U_h, k∈U_{hj(i)}} y_k / N_i. If there is a correlation between the poststratum coverage probabilities and the poststratum means, the Hájek estimator will again be biased, and the bias could be either positive or negative. If the coverage rates or the means are constant across poststrata, i.e., φ(i) = φ_o or Ȳ_i = Ȳ, then the Hájek estimator will be unbiased, but poststrata are usually not formed this way. Also, the bias exists even when the appropriate set of poststrata, that subdivide the population into groups with different means, is unknown to the sampler.

2.2 Poststratified mean with no cell collapsing

The poststratified mean is defined as ŷ_PS1 = (1/N) Σ_{i=1}^{I} N_i (T̂_{πi}/N̂_{πi}), where T̂_{πi} and N̂_{πi} are defined as in (1) but excluding the summation over i. Define T_i^c = Σ_{h, j∈U_h, k∈U_{hj(i)}} φ_k y_k and N_i^c = Σ_{h, j∈U_h, k∈U_{hj(i)}} φ_k. These are the expected (with respect to the coverage mechanism) total and count of covered units in poststratum i. Expanding ŷ_PS1 around (T_i^c, N_i^c), i = 1, …, I, its linear approximation is

ŷ_PS1 ≈ (1/N) [ Σ_i (N_i/N_i^c) T_i^c + Σ_i (N_i/N_i^c)( T̂_{πi} − (T_i^c/N_i^c) N̂_{πi} ) ].

The bias of the poststratified estimator is then the first term of this expression minus Σ_{i=1}^{I} T_i/N, and after some manipulation, can be written as

bias(ŷ_PS1) ≈ Σ_{i=1}^{I} W_i C_{φyi}/φ̄_i  (3)

where φ̄_i = Σ φ_k/N_i, C_{φyi} = Σ (φ_k − φ̄_i)(y_k − Ȳ_i)/N_i, and Σ denotes the sum over h, j ∈ U_h, and k ∈ U_{hj(i)}. Thus, ŷ_PS1 is biased if there is any correlation between the y variable measured and the coverage probability φ_k in any of the poststrata. If the coverage rate is constant at φ(i) within poststratum i, then the poststratified estimator is approximately unbiased. From (3) it is apparent that poststrata should be formed so that either the coverage rates or the y's are homogeneous within each poststratum. This is similar to the recommendations of Eltinge and Yansaneh (1997), Kalton and Maligalig (1991), and Little and Vartivarian (2005) for the formation of nonresponse adjustment cells. In large surveys, the initial set of candidate poststrata is often more extensive than the sample can support. With few exceptions, some of the initial poststrata are collapsed to control weight adjustments. If no collapsing occurs, this is usually because small categories are pre-collapsed based on prior experience in the same or a similar survey. In that sense, PS1 does not really exist in practice. More common is the collapsing approach, PS2, described below.

2.3 Poststratified mean with collapsing

Turning to the poststratified estimator with collapsing, the sparse cells are identified and combined with other cells considered to be their nearest neighbors. This could result in more than one sparse cell being collapsed with a given nonsparse cell. Neighbors can be defined in various ways, e.g., cells with similar estimated coverage rates N̂_{πi}/N_i, cells that are adjacent in some substantive sense like nearby income classes, or cells that have similar means on some important survey variables. The general algorithm for collapsing, given an initial set of cells, is:


(1) Compute the collapsing criteria for each cell, e.g., the IAF's, N_i/N̂_{πi}, and the cell sample sizes;

(2) Identify the sparse cells, i.e., those whose criteria fall outside the bounds for collapsing;

(3) Determine the nearest, nonsparse neighbor of each sparse cell and combine the sparse cell with its neighbor.

The poststratified mean with collapsing is then ŷ_PS2 = (1/N) Σ_g N_g (T̂_{πg}/N̂_{πg}), where T̂_{πg} = Σ_{i∈A_g} Σ_{h, j∈s_h, k∈s_{hj(i)}} y_k/π_k and N̂_{πg} is defined similarly. Define T_g^c = Σ_{A_g} T_i^c and N_g^c = Σ_{A_g} N_i^c, with A_g being the set of poststrata in collapsed group g. Expanding ŷ_PS2 around (T_g^c, N_g^c), for each collapsed group g, gives

ŷ_PS2 ≈ (1/N) [ Σ_g (N_g/N_g^c) T_g^c + Σ_g (N_g/N_g^c)( T̂_{πg} − (T_g^c/N_g^c) N̂_{πg} ) ].

It follows that

bias(ŷ_PS2) ≈ Σ_g W_g C_{φyg}/φ̄_g  (4)

with W_g = N_g/N, φ̄_g = Σ φ_k/N_g, C_{φyg} = Σ (φ_k − φ̄_g)(y_k − Ȳ_g)/N_g, and the summations in φ̄_g and C_{φyg} over i ∈ A_g, h, j ∈ U_h, and k ∈ U_{hj(i)}. If φ_k is constant within collapsed group g, this estimator is unbiased; but if φ_k = φ(i), i.e., the coverage rate is constant within poststratum i but can differ across the poststrata, then the bias becomes

bias(ŷ_PS2) ≈ Σ_g W_g C*_{φyg}/φ̄_g  (5)

with φ̄_g = Σ W_gi φ(i), C*_{φyg} = Σ W_gi (φ(i) − φ̄_g)(Ȳ_i − Ȳ_g), W_gi = N_i/N_g, and the summations over i ∈ A_g. Thus, in the case where ŷ_PS1 will be unbiased, ŷ_PS2 will be biased if poststrata are collapsed together that have different coverage rates and different population means. Since φ̄_g and C_{φyg} are both O(1), the bias does not decrease as the sample increases; thus, the bias-squared will eventually be the dominant part of the mean square error. If cells are collapsed, the cells in each group should have the same coverage rates, the same means, or both to avoid bias.

3. Weight restricted estimators

We examine two alternative methods of weight computation when collapsing of poststrata is used, extending work of Kim (2004). The alternatives are designed to be compromises between (a) use of all poststrata and the potential for large weight adjustments and (b) collapsing of poststrata yielding less variable weights but potentially biased estimates. We refer to these as weight restriction (WR) methods. The two alternatives presented in this section use cell collapsing but retain a larger share of the weight adjustment for individual cells than does the standard collapsing method.

The first alternative is denoted PS.WR1 and consists of the following algorithm. Denote the maximum allowable weight adjustment by f_max, with f_max > 1.

(1) Execute steps (1)-(3) of the algorithm in section 2.3 for PS2.

(2) Censor any IAF greater than f_max to f_max and adjust each weight in the corresponding initial cell to w̃_k = w_k f_max, with w_k = 1/π_k. For units in cells with IAF ≤ f_max, set w̃_k = w_k.

(3) Compute a collapsing adjustment factor (CAF) for a collapsed group g as f̃_g = N_g / Σ_{i∈A_g} Σ_{h, j∈s_h, k∈s_{hj(i)}} w̃_k.

(4) The final adjusted weight is then w̃_k f̃_g for unit k in group g.

This method will reduce the largest values of the final weight adjustment below the without-collapsing adjustments, N_i/N̂_{πi}, though there may be one or more groups that have CAF's greater than the f_max cutoff. The control total for group g, N_g, is met in the sense that Σ_{i∈A_g} Σ_{h, j∈s_h, k∈s_{hj(i)}} w̃_k f̃_g = N_g, but the control totals for the individual cells in A_g are not.

To analyze the properties of PS.WR1, define A_{g,sp} and A_{g,ns} to be the sets of sparse and nonsparse poststrata in collapsed group g. PS.WR1 can be expressed as

ŷ_PS.WR1 = (1/N) Σ_g (N_g/N̂_{gWR1}) T̂_{gWR1}

where

T̂_{gWR1} = Σ_{i∈A_{g,sp}} Σ_h Σ_{j∈s_h} Σ_{k∈s_{hj(i)}} f_max w_k y_k + Σ_{i∈A_{g,ns}} Σ_h Σ_{j∈s_h} Σ_{k∈s_{hj(i)}} w_k y_k

and N̂_{gWR1} has a similar definition with y_k set to 1. The expectation of T̂_{gWR1} over the coverage, sparseness, and sampling mechanisms is E_c E_sp E_π(T̂_{gWR1}) = T_g^c + (f_max − 1) T̃_g^c, where T̃_g^c = Σ_{i∈A_g} p_i T_i^c. Likewise, E_c E_sp E_π(N̂_{gWR1}) = N_g^c + (f_max − 1) Ñ_g^c, with Ñ_g^c = Σ_{i∈A_g} p_i N_i^c. ŷ_PS.WR1 can be expanded around the expectations of (T̂_{gWR1}, N̂_{gWR1}). After some manipulation, the approximate bias of ŷ_PS.WR1 becomes

bias(ŷ_PS.WR1) ≈ Σ_g W_g C_{αφ,y,g}/(αφ)_g  (6)

where
Likewise, EEEN()ˆ = g g i∈ A g i i c spπ g WR1 c % c % c c ˆ 3. Weight restricted estimators Ng +( fmax − 1) N g with Ng = ∑ i∈ A pi N i . yPS.WR1 can be g ˆ ˆ expanded around the expectations of (TNgWR1 , g WR1 ). After We examine two alternative methods of weight compu­ ˆ some manipulation, the approximate bias of y PS.WR1 tation when collapsing of poststrata is used, extending work becomes of Kim (2004). The alternatives are designed to be C ˆ αφ, y , g compromises between (a) use of all poststrata and the bias(yPS.WR1 ) B ∑ Wg (6) g ()αφ potential for large weight adjustments and (b) collapsing of g poststrata yielding less variable weights but potentially where


(αφ)_g = Σ α_i φ_k / N_g, α_i = 1 + (f_max − 1) p_i, C_{αφ,y,g} = Σ (α_i φ_k − (αφ)_g)(y_k − Ȳ_g)/N_g, and the summations are over i ∈ A_g, h, j ∈ U_h, and k ∈ U_{hj(i)}. In the case of a common coverage probability in poststratum i, i.e., φ_k = φ(i), we have

(αφ)_g = φ̄_g + (f_max − 1)(pφ)_g

and

α_i φ(i) − (αφ)_g = (φ(i) − φ̄_g) + (f_max − 1)(p_i φ(i) − (pφ)_g)

where (pφ)_g = Σ_{A_g} W_gi p_i φ(i). From this it follows that

C_{αφ,y,g} = C*_{φyg} + (f_max − 1) C_{pφ,y,g}

with

C_{pφ,y,g} = Σ_{A_g} W_gi (p_i φ(i) − (pφ)_g)(Ȳ_i − Ȳ_g).

If the cell means Ȳ_i are all equal within a collapsed group, then C*_{φyg} = C_{pφ,y,g} = 0 and ŷ_PS.WR1 will be approximately unbiased. In the special case in which coverage is constant in a group, i.e., φ(i) = φ_g, then C_{αφ,y,g} = (f_max − 1) φ_g Σ_{i∈A_g} W_gi (p_i − p̄_g)(Ȳ_i − Ȳ_g), with p̄_g = Σ_{A_g} W_gi p_i. Thus, if p_i and φ(i) are constant within A_g, then ŷ_PS.WR1 will be nearly unbiased even if the cell means Ȳ_i, i ∈ A_g, differ. This condition is almost sure to be false as long as one poststratum in a group has a probability of being sparse that is substantially different from the others.

In the case of a common coverage probability in poststratum i, φ(i), we can also compare the bias of the collapsed cell estimator, ŷ_PS2, with that of ŷ_PS.WR1. Using results in the previous paragraph, the bias in (6) can be expressed as

bias(ŷ_PS.WR1) ≈ Σ_g W_g [ C*_{φyg}/φ̄_g + (f_max − 1) C_{pφ,y,g}/φ̄_g ] / [ 1 + (f_max − 1)(pφ)_g/φ̄_g ].

Since 1 + (f_max − 1)(pφ)_g/φ̄_g ≥ 1, we can use (5) to obtain

|bias(ŷ_PS.WR1)| ≤ | Σ_g W_g [ C*_{φyg}/φ̄_g + (f_max − 1) C_{pφ,y,g}/φ̄_g ] | = | bias(ŷ_PS2) + (f_max − 1) Σ_g W_g C_{pφ,y,g}/φ̄_g |.  (7)

If p_i φ(i) and Ȳ_i are uncorrelated, the absolute bias of ŷ_PS.WR1 is less than or equal to that of ŷ_PS2 because 1 + (f_max − 1)(pφ)_g/φ̄_g ≥ 1. When p_i φ(i) and Ȳ_i are correlated, there are two cases to consider: (i) bias(ŷ_PS2) ≥ 0 and (ii) bias(ŷ_PS2) < 0. In the former, the last line of (7) will be less than or equal to the absolute bias of ŷ_PS2 if

−2 bias(ŷ_PS2)/(f_max − 1) ≤ Σ_g W_g C_{pφ,y,g}/φ̄_g ≤ 0.

In case (ii), the requirement is

0 ≤ Σ_g W_g C_{pφ,y,g}/φ̄_g ≤ −2 bias(ŷ_PS2)/(f_max − 1).

If the covariance between the probability of being sparse and covered, p_i φ(i), and the cell means, Ȳ_i, is small in all groups and of the opposite sign of bias(ŷ_PS2), then ŷ_PS.WR1 will be less biased than ŷ_PS2.

The second alternative is denoted PS.WR2 and is intended to exercise more control over the size of the final weight adjustment than does PS.WR1. In PS.WR1 the final adjustment can be larger than f_max. PS.WR2 seeks to limit the final adjustment to f_max = 2 or some other maximum set in advance. The general idea is to first determine which cells should be collapsed together, as was done for PS.WR1. Then weights in the sparse cells are multiplied by f_max. The weights in the nonsparse cells in a collapsed group are then adjusted by a constant factor to bring the estimated population count in the group to the control count. The detailed algorithm for computing weights for PS.WR2 is the following:

(1) Execute steps (1)-(3) of the algorithm in section 2.3 for PS2.

(2) In a group containing at least one nonsparse cell, compute the control total in group g as N_g = Σ_{i∈A_g} N_i and the adjusted weight for all units k in A_{g,sp} (the sparse cells) as w̃_k = w_k f_max.

(3) Compute the adjusted weight for all units k in A_{g,ns} (the nonsparse cells) as w̃_k = w_k (N_g − Ñ_{g,sp})/N̂_{g,ns}, where N̂_{g,ns} = Σ_{i∈A_{g,ns}} Σ_{h, j∈s_h, k∈s_{hj(i)}} w_k and Ñ_{g,sp} = Σ_{i∈A_{g,sp}} Σ_{h, j∈s_h, k∈s_{hj(i)}} w̃_k.

(4) The final adjusted weight is then w̃_k for unit k in group g.

This second weight restricted estimator can be written as ŷ_PS.WR2 = (1/N) Σ_g T̂_{gWR2}, where

T̂_{gWR2} = Σ_{i∈A_{g,sp}} Σ_{h, j∈s_h, k∈s_{hj(i)}} f_max w_k y_k + [ (N_g − f_max N̂_{g,sp})/N̂_{g,ns} ] Σ_{i∈A_{g,ns}} Σ_{h, j∈s_h, k∈s_{hj(i)}} w_k y_k

with N̂_{g,sp} = Σ_{i∈A_{g,sp}} Σ_{h, j∈s_h, k∈s_{hj(i)}} w_k. The expectation of T̂_{gWR2} over the coverage, sparseness, and sampling mechanisms is

E_c E_sp E_π(T̂_{gWR2}) = f_max T̃_g^c + [ (N_g − f_max Ñ_g^c)/(N_g^c − Ñ_g^c) ] (T_g^c − T̃_g^c)

where Ñ_g^c = Σ_{i∈A_g} p_i N_i^c. After some calculation, the approximate bias of ŷ_PS.WR2 can be written as


bias(ŷ_PS.WR2) ≈ (1/N) f_max Σ_g Ñ_g^c (μ^c_{g,sp} − μ^c_{g,ns}) + Σ_g W_g [ 1/Σ_{A_g} q_i N_i^c ] [ Σ_{A_g} q_i T_i^c − N_g^{−1} ( Σ_{A_g} q_i N_i^c )( Σ_{A_g} T_i ) ]

where μ^c_{g,sp} = T̃_g^c/Ñ_g^c and μ^c_{g,ns} = Σ_{A_g} q_i T_i^c / Σ_{A_g} q_i N_i^c. Next, note that in the case of a common coverage probability in cell i, φ_k = φ(i),

Σ_{A_g} q_i T_i^c − N_g^{−1} ( Σ_{A_g} q_i N_i^c )( Σ_{A_g} T_i )
 = Σ_{A_g} Σ_{h, j∈U_h, k∈U_{hj(i)}} (q_i φ_k − (qφ)_g)(y_k − Ȳ_g)
 = Σ_{A_g} N_i (q_i φ(i) − (qφ)_g)(Ȳ_i − Ȳ_g)
 ≡ N_g (C*_{φyg} − C_{pφ,y,g})

with q_i = 1 − p_i, (qφ)_g = Σ q_i φ_k / N_g, and C*_{φyg}, C_{pφ,y,g} as defined previously. Next, use the fact that Σ_{A_g} q_i N_i^c = N_g^c − Ñ_g^c to define P_g^c = N_g^c/N_g, the proportion of units covered in group g, and P̃_g^c = Ñ_g^c/N_g, the expected proportion covered in sparse cells in group g. Then, the bias can also be written as

bias(ŷ_PS.WR2) ≈ (1/N) f_max Σ_g Ñ_g^c (μ^c_{g,sp} − μ^c_{g,ns}) + Σ_g W_g (C*_{φyg} − C_{pφ,y,g})/(P_g^c − P̃_g^c).  (8)

Judging from (8), ŷ_PS.WR2 will be approximately unbiased if the mean per unit for the units covered by the frame in each collapsed cell is the same in the sparse cells, μ^c_{g,sp}, as in the nonsparse cells, i.e., μ^c_{g,sp} = μ^c_{g,ns}, and the covariances, C*_{φyg} and C_{pφ,y,g}, are both 0. The latter is accomplished by combining cells with the same means, Ȳ_i. Combining cells with equal coverage rates does not result in ŷ_PS.WR2 being unbiased. This is more restrictive than for ŷ_PS.WR1, which is unbiased if either the coverage rates or the means are the same in all cells in a collapsed group.

4. An empirical investigation

To test some of the ideas presented earlier, we conducted a simulation study of the bias properties of alternative methods of poststratification. We also examined the performance of one variance estimator that is often used in practice.

4.1 Study population

The population used in the simulation was extracted from the 2003 National Health Interview Survey (NHIS) public-use file. A subset of the NHIS was created with 21,664 persons. These were divided into 25 strata, each having six PSUs. The strata and PSU's are based on those in the NHIS public-use file, but sets of three strata were collapsed together to create new design strata for the study population. We used four binary variables (0-1 characteristics) for the simulation, each of which is based on a person's self-report:

Health insurance coverage - whether a person was covered by any type of health insurance;

Physical, mental, or emotional limitation - whether a person was limited in any of these ways;

Medical care delayed - whether a person delayed medical care or not because of cost in the last 12 months;

Overnight hospital stay - whether a person stayed overnight in a hospital in the last 12 months.

Table 2 shows the percentages of persons with these four characteristics in cells formed by age and sex. These 16 (age × sex) cells are the initial set of poststrata used in estimation. The percentages can vary substantially among the cells, depending on the characteristic. For example, 18-24 year olds are much more likely to have no health insurance; children under age 5 and the elderly age 65 and over are much more likely to have had a hospital stay. Collapsing cells together that have different means, or proportions in this case, has the potential to introduce bias, as noted earlier.

We also created one artificial binary variable that had a common mean of 0.20 regardless of the unit's poststratum membership. In that case all estimators, including the Hájek estimator, will be unbiased regardless of coverage rates. Also, the conventional thinking that collapsing of cells may reduce variances by smoothing out extreme weight adjustments may hold for this variable.

4.2 Sample design

Two sample PSU's were selected in each stratum with probability proportional to size (PPS), the size being the count of persons in each PSU. Sampling of PSU's was done with replacement to simplify variance estimation. If without-replacement sampling had been used, then a more elaborate method of selection and variance estimation would have been needed (see, e.g., Särndal, Swensson, and Wretman 1992, chapter 3). In each sample PSU, 20 persons were selected by simple random sampling without replacement, for a total of 1,000 persons in each sample. For each combination of parameters discussed below, 2,000 samples were selected.

Sixteen initial poststrata were used, which were the cross of the eight age groups, shown in Table 2, with gender. In

each sample, we computed the estimators of population proportions described earlier in sections 2-3: the Hájek estimator $\hat{y}_{\pi}$, the poststratified estimator $\hat{y}_{\mathrm{PS1}}$ that uses all 16 poststrata, the poststratified estimator with collapsing of cells, $\hat{y}_{\mathrm{PS2}}$, and the two weight-restricted estimators, $\hat{y}_{\mathrm{PS.WR1}}$ and $\hat{y}_{\mathrm{PS.WR2}}$. The simulation code was written in the R language (R Development Core Team 2005) with extensive use of the R survey package (Lumley 2004, 2005).

4.3 Coverage mechanisms

Five sets of coverage mechanisms, shown in Table 3, were employed to filter the population before the PSUs were sampled. The coverage ratios varied by poststratum and were different for each of the five characteristics for which proportions were estimated. The coverage ratios specific to each of the five characteristics are named C1 through C5 in Table 3. These coverage ratios were artificially created based on the population means for each age and sex group. Poorer coverage was assigned to groups with larger percentages having a characteristic for health insurance coverage and limitations; the opposite was true for delayed medical care and hospital stays. In C5 the coverage ratios are quite variable and are intended to lead to coverage adjustments that vary substantially among the initial set of 16 poststrata. Although the rates in Table 3 are low, they are comparable to or higher than those for BRFSS in Table 1. In applying these rates, we randomly selected a subset of the population to be in the sample frame for each sample that was selected. For example, if the coverage ratio in the poststratum of males younger than 5 years old is 0.9, then 90% of the population in that poststratum was randomly selected to stay in the sampling frame while the rest had a zero probability of being sampled.

4.4 Collapsing rules

We set up situations where the conditions for unbiasedness in sections 2 and 3 can be violated when cells are collapsed in the simulations. Each of the estimators $\hat{y}_{\mathrm{PS2}}$, $\hat{y}_{\mathrm{PS.WR1}}$ and $\hat{y}_{\mathrm{PS.WR2}}$ involves cell collapses. If the IAF (poststratification factor) in an initial poststratum, $N_i/\hat{N}_{i\pi}$, exceeds the maximum allowable adjustment, $f_{\max}$, or if the cell sample size is less than a minimum, $n_{i,\min}$, we call this poststratum a "sparse" cell and collapse it with a neighboring cell. We used two methods of determining neighbors, designated here as "adjacency" and "close-mean".

In adjacency collapsing, the neighbors of a specific cell are defined as the cells either horizontally or vertically adjacent to it in the age × sex table. For example, in the following abbreviated table, the neighbors of cell 3 are the shaded cells 2, 4, and 7.

1   5
2   6
3   7
4   8

Table 2
Percentages of persons with four health-related characteristics in groups formed by age and sex. (Characteristic columns give the percentage of persons with the characteristic: not covered by health insurance; physical, mental, or emotional limitations; delayed medical care in last 12 months; hospital stay in last 12 months. M = male, F = female, T = total.)

Age      Population counts        No health insurance   Limitations      Delayed care     Hospital stay
            M       F       T        M    F    T         M    F    T      M    F    T      M    F    T
< 5        843     795   1,638      10    9    9         4    3    3      3    4    3     17   15   16
5-17     2,271   2,082   4,353      13   14   13        10    6    8      4    4    4      2    1    2
18-24      998   1,031   2,029      37   31   34         4    4    4      8   11    9      3   14    8
25-44    2,971   3,207   6,178      28   23   25         7    7    7      9   10    9      3   10    6
45-64    2,421   2,597   5,018      14   14   14        16   19   18      7   11    9      8   10    9
65-69      305     384     689       2    1    2        24   29   27      3    8    6     15   14   14
70-74      275     344     619       1    1    1        34   32   33      2    5    4     18   15   17
75+        423     717   1,140       1    1    1        41   48   45      2    2    2     22   22   22
Total   10,507  11,157  21,664      18   16   17        12   13   13      6    8    7      7   10    8

Table 3
Coverage ratios used in the simulations. (C1: not covered by health insurance; C2: physical, mental, emotional limitations; C3: delayed medical care in last 12 months; C4: hospital stay in the last 12 months; C5: common mean Y.)

Age        C1: M    F     C2: M    F     C3: M    F     C4: M    F     C5: M    F
< 5          0.9   0.9      0.9   0.9      0.5   0.5      0.8   0.8      0.9   0.8
5-17         0.8   0.8      0.9   0.6      0.5   0.5      0.5   0.5      0.7   0.2
18-24        0.5   0.5      0.8   0.8      0.8   0.8      0.5   0.8      0.4   0.4
25-44        0.5   0.5      0.8   0.8      0.8   0.8      0.5   0.8      0.6   0.5
45-64        0.8   0.8      0.5   0.6      0.8   0.8      0.5   0.8      0.3   0.8
65-69        0.9   0.9      0.5   0.6      0.5   0.5      0.8   0.8      0.4   0.4
70-74        0.9   0.9      0.5   0.5      0.5   0.5      0.8   0.8      0.2   0.7
75+          0.9   0.9      0.5   0.5      0.5   0.5      0.8   0.8      0.8   0.9
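The frame-filtering step just tabulated can be sketched in a few lines. The authors' simulation was written in R with the survey package; the sketch below is our own Python illustration (the function name, data layout, and seed are hypothetical), showing how a fixed share of each poststratum is retained on the frame while the remaining units get zero selection probability.

```python
import random

def filter_frame(population, coverage_ratio, seed=1):
    """Randomly retain a coverage_ratio share of each poststratum on the frame.

    population: dict mapping poststratum label -> list of person ids.
    coverage_ratio: dict mapping poststratum label -> ratio in (0, 1].
    Units not retained have zero probability of ever being sampled.
    """
    rng = random.Random(seed)
    frame = {}
    for cell, persons in population.items():
        n_keep = round(coverage_ratio[cell] * len(persons))
        frame[cell] = rng.sample(persons, n_keep)
    return frame

# Toy example: males under 5 with the C1 coverage ratio 0.9 from Table 3.
pop = {("<5", "M"): list(range(843))}
frame = filter_frame(pop, {("<5", "M"): 0.9})
len(frame[("<5", "M")])  # 759 of the 843 persons stay on the frame
```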

Kim, Li and Valliant: Cell collapsing in poststratification
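A minimal sketch of the sparse-cell rule and the adjacency collapsing choice of section 4.4, written in Python (our illustration, not the authors' R code). Cells are numbered 1-4 down the first (male) column and 5-8 down the second (female) column, as in the abbreviated table; a cell is sparse when its poststratification factor exceeds the adjustment bound or its sample size falls below the minimum.

```python
FMAX, NMIN = 2.0, 25  # adjustment bound and minimum cell size used in the paper

def neighbors(cell):
    """Horizontally or vertically adjacent cells in the 2-column, 4-row layout."""
    col, row = divmod(cell - 1, 4)
    adj = []
    if row > 0:
        adj.append(cell - 1)                        # cell above
    if row < 3:
        adj.append(cell + 1)                        # cell below
    adj.append(cell + 4 if col == 0 else cell - 4)  # horizontal neighbor
    return adj

def is_sparse(N_i, N_hat_i, n_i, fmax=FMAX, nmin=NMIN):
    """Sparse if the factor N_i / N_hat_i exceeds fmax or the sample is too small."""
    return N_i / N_hat_i > fmax or n_i < nmin

def collapse_partner(cell, factor):
    """Adjacency rule: merge with the neighbor whose poststratification factor is smallest."""
    return min(neighbors(cell), key=lambda c: factor[c])

neighbors(3)  # [2, 4, 7], the shaded cells in the example table
```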

In the adjacency method, a sparse cell was collapsed with the neighbor with the smallest poststratification factor. In close-mean collapsing, a sparse cell was collapsed with the nonsparse cell whose unweighted sample mean was closest to that of the sparse cell.

Two different values of $f_{\max}$ were used in the simulations, $f_{\max} = 2$ and 1.8. Use of $f_{\max} = 1.8$ leads to more collapsing of cells than $f_{\max} = 2$ and exhibits more of the biases (for the characteristics other than the artificial one with a common mean) noted in sections 2 and 3 that are caused by combining cells with different means or different coverage rates. The minimum cell size was set to $n_{i,\min} = 25$. Of course, in practice many variations are used to decide which combinations of cells are allowable. We have used just two of the possibilities for illustration in the simulation.

Once all of the sparse cells and their neighbors are identified, the collapsing process proceeds sequentially from cell 1. In a survey with many potential poststrata defined in advance, these procedures might have to be performed iteratively to eliminate all sparse cells. In this simulation, we performed only one round of collapsing.

4.5 Variance estimation

For each of the estimators of a proportion, a linearization variance estimate was calculated. Each of the variance estimators is based on the linear substitute method (e.g., see Särndal et al. 1992, chapter 5). The variance estimates for all estimators of proportions were computed using the svydesign, poststratify, and svymean functions in the R survey package. The general, theoretical approach is to make a linear approximation for a particular estimator. The linear approximation is rearranged so that the estimator is written as a sum of weighted PSU totals, and the variance estimator for with-replacement PSU sampling is used. The estimators $\hat{y}_{\mathrm{PS1}}$, $\hat{y}_{\mathrm{PS2}}$, $\hat{y}_{\mathrm{PS.WR1}}$ and $\hat{y}_{\mathrm{PS.WR2}}$ are treated as standard poststratified estimators for the purposes of variance estimation. For $\hat{y}_{\mathrm{PS1}}$, define the following:

$$u_k = (N_i/\hat{N}_{i\pi})(y_k - \hat{y}_i), \quad k \in s_i,$$

with $\hat{y}_i = \hat{T}_i/\hat{N}_{i\pi}$ and $s_i$ being the set of all sample units in poststratum $i$; $u_k$ is known as a linear substitute; $\breve{u}_{hj+} = \sum_{i,\, k \in s_{hj(i)}} w_k u_k$; and

$$\breve{u}_{h++} = \frac{1}{n_h} \sum_{j \in s_h} \breve{u}_{hj+}.$$

The variance estimator for $\hat{y}_{\mathrm{PS1}}$ is then

$$v(\hat{y}_{\mathrm{PS1}}) = \frac{1}{N^2} \sum_h \frac{n_h}{n_h - 1} \sum_{j \in s_h} (\breve{u}_{hj+} - \breve{u}_{h++})^2. \qquad (9)$$

For the collapsed stratum estimator, $\hat{y}_{\mathrm{PS2}}$, (9) applies with the linear substitute defined as

$$u_k = (N_g/\hat{N}_{g\pi})(y_k - \hat{y}_g), \quad k \in s_g,$$

with $\hat{y}_g = \hat{T}_g/\hat{N}_{g\pi}$ and $s_g$ the set of all sample units in group $g$.

In the cases of PS.WR1 and PS.WR2, we calculate the final weights as described in section 3 and call the R poststratify function. This results in the linear substitute being computed as

$$u_k = (N_g/\hat{N}_g)(y_k - \hat{y}_g) = y_k - \hat{y}_g$$

because $\hat{N}_g = \sum_{i \in A_g} \sum_{k \in s_i} \tilde{w}_k = N_g$. The mean $\hat{y}_g$ is computed as

$$\hat{y}_g = \sum_{i \in A_g} \sum_{k \in s_i} \tilde{w}_k y_k \Big/ \sum_{i \in A_g} \sum_{k \in s_i} \tilde{w}_k = \sum_{i \in A_g} \sum_{k \in s_i} \tilde{w}_k y_k \Big/ N_g.$$

The weighted linear substitute is then $\breve{u}_k = \tilde{w}_k (y_k - \hat{y}_g)$ and $\breve{u}_{hj+} = \sum_{i,\, k \in s_{hj(i)}} \breve{u}_k$.

In the cases of PS2, PS.WR1, and PS.WR2, these variance estimators do not account for the dynamic nature of cell collapsing, which can vary from sample to sample. Consequently, there is a source of variation that is not accounted for, and we can anticipate that the variance estimates will be somewhat too small compared to the empirical, simulation variances.

4.6 Simulation results

Tables 4-7 summarize results for coverage correction errors, relative biases of estimated proportions, variances of alternative estimators, and confidence interval coverage using linearization variance estimators. Table 4 shows the average absolute coverage correction error, defined as

$$e = (DI)^{-1} \sum_{d=1}^{D} \sum_{i=1}^{I} \bigl| \hat{N}_{di} N_i^{-1} - 1 \bigr|, \qquad (10)$$

where $d$ is one of the $D = 2{,}000$ samples and $\hat{N}_{di}$ is the estimated number of units in poststratum $i$ based on the final weights for a particular estimator (Hájek, PS2, PS.WR1, or PS.WR2). The value of $e$ is 0 for the poststratified estimator with no cell collapsing, PS1, since it corrects coverage error completely in each of the 16 poststrata. To illustrate how the average coverage correction errors can vary, we estimated the proportions for the health insurance and common mean Y variables using the C1 and C5 frame coverage ratios. For most combinations of coverage ratios, collapsing method, and adjustment bound, PS.WR1 more effectively corrects for coverage error than the standard

collapsing estimator, PS2. For example, $e = 0.086$ with PS.WR1 for (health insurance, adjacency collapsing, $f_{\max} = 2$) while PS2 has $e = 0.120$. In contrast, PS.WR2 is somewhat worse than PS2 in coverage correction.

Table 5 presents the relative biases (relbias), defined as $100\, D^{-1} \sum_{d=1}^{D} (\hat{y}_d - \bar{Y})/\bar{Y}$, where $\hat{y}_d$ is one of the estimates of a proportion for sample $d$. The Hájek estimates are badly biased for the first four characteristics since they include no correction for the differential undercoverage among the cells. The relbiases range from -12.1% for limitations to 13.4% for hospital stay. As noted in section 2, the bias can be either positive or negative, depending on the correlation of coverage rates and cell means.

Poststratification with no collapsing of cells (PS1) gives nearly unbiased estimates while the alternatives, PS2, PS.WR1, and PS.WR2, all introduce a bias when using adjacency collapsing for the first four characteristics. The number of poststrata after collapsing, shown in Table 5, ranges from 6 to 16 when $f_{\max} = 2$ and from 5 to 13 when $f_{\max} = 1.8$. The relative biases of PS2, using adjacency collapsing, range from -4.4% to 6.2% when $f_{\max} = 2$ and from -6.5% to 9.4% when $f_{\max} = 1.8$. With adjacency collapsing, the alternatives, PS.WR1 and PS.WR2, have biases that are intermediate between PS1 (no collapsing) and PS2. PS.WR1, in particular, is reasonably competitive with PS1 in terms of bias with adjacency collapsing. In contrast, close-mean collapsing yields PS2, PS.WR1, and PS.WR2 estimates that are essentially unbiased when $f_{\max} = 2$. With close-mean collapsing and $f_{\max} = 1.8$, PS2 and PS.WR2 are still somewhat biased, but PS.WR1 compares well with PS1. For the fifth characteristic (Common mean Y), all estimators are nearly unbiased, regardless of collapsing method, as expected.

One justification that is conventionally given for collapsing cells is that extreme weights will be reduced and variances of estimates will, in turn, be reduced. Table 6 shows the ratios of the empirical variances of estimated proportions to the variance of PS1. The Hájek estimates have variances that are about 12% and 18% smaller than those of PS1 for health insurance and limitations, but are more variable than PS1 for delayed care and hospital stay. These results also make it clear that the variance of a poststratified estimator can be either increased or decreased by collapsing. There are some minor variance gains from using PS2 for some combinations for the first four variables, but with (adjacency, $f_{\max} = 2$) the PS2 variance of hospital stay is 17% larger than that of PS1. With (adjacency, $f_{\max} = 1.8$), PS2 is 23% more variable for hospital stay. PS.WR1 does not have the extreme variances of PS2 in adjacency collapsing; like PS2, PS.WR2 has larger variance for hospital stay in adjacency collapsing. When close-mean, rather than adjacency, collapsing is used, variances of PS2, PS.WR1, and PS.WR2 are much closer to those of PS1. However, for the Common mean Y variable, collapsing always reduces variance. The reductions are almost 20% for adjacency collapsing.

The right-hand section of Table 6 lists the ratios of the empirical mean square errors (MSEs) of estimated proportions to the MSE of PS1. With a few exceptions, PS2 is the worst choice of the poststratified estimators for the first four characteristics regardless of the combination of variable, $f_{\max}$, and collapsing method. When $f_{\max} = 1.8$, the choice that leads to more collapsing, the MSEs of PS2 range from 1.8% to 44.2% larger than those of PS1. The MSEs of both PS.WR1 and PS.WR2 are near those of PS1, with the exception of (hospital stay, $f_{\max} = 1.8$, adjacency), where the 6.3% bias of PS.WR2 leads to an MSE 25.6% larger than that of PS1. Close-mean collapsing is preferable to adjacency collapsing, although for the first four characteristics none of the estimators has a smaller MSE than PS1, which does not use collapsing.

The estimators again perform differently for the Common mean Y variable. The MSEs of Hájek, PS2, PS.WR1, and PS.WR2 are all less than that of PS1. The Hájek estimator has the smallest MSE, owing to the fact that poststratification is unnecessary to correct bias in estimating the mean.

Table 4
Average absolute coverage correction error as defined in expression (10). (PS2: standard collapsing; PS.WR1: truncate weights then collapse; PS.WR2: fixed maximum weight adjustment.)

Collapsing method   f_max    Hájek    PS2     PS.WR1   PS.WR2

C1: Coverage ratios for health insurance variable
Adjacency           2        0.257    0.120   0.086    0.221
Close mean          2        0.257    0.080   0.127    0.281
Adjacency           1.8      0.256    0.150   0.085    0.202
Close mean          1.8      0.256    0.101   0.109    0.258

C5: Coverage ratios for common mean Y variable
Adjacency           2        0.442    0.326   0.196    0.331
Close mean          2        0.441    0.321   0.203    0.370
Adjacency           1.8      0.442    0.330   0.206    0.376
Close mean          1.8      0.442    0.337   0.214    0.446
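Reading expression (10) as an average of |N̂_di/N_i − 1| over the D samples and I poststrata, entries like those in Table 4 can be reproduced directly from simulation output. A Python sketch (ours; the nested-list layout and function name are hypothetical):

```python
def coverage_correction_error(N_hat, N):
    """Average absolute coverage correction error, expression (10).

    N_hat: D x I matrix; N_hat[d][i] is the estimated count for
           poststratum i in sample d under a given estimator's final weights.
    N:     list of the I population poststratum counts.
    """
    D, I = len(N_hat), len(N)
    return sum(abs(N_hat[d][i] / N[i] - 1.0)
               for d in range(D) for i in range(I)) / (D * I)

# Two samples, two poststrata: cell errors 0.1, 0.1, 0.0, 0.2 average to 0.1.
coverage_correction_error([[90, 110], [100, 120]], [100, 100])  # 0.1
```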


Table 5
Relative biases (in percent) of estimated proportions. (Figures for Hájek and PS1 are not affected by collapsing and are repeated in the four sections of the table to facilitate comparisons. PS1: no collapsing; PS2: standard collapsing; PS.WR1: truncate weights then collapse; PS.WR2: fixed maximum weight adjustment.)

Characteristic     Range of no. of poststrata   Hájek    PS1     PS2    PS.WR1   PS.WR2
                   after collapsing

Adjacency collapsing, adjustment bound = 2
Health insurance   (10, 16)                     -11.5     0.1    -4.4     1.0     -1.4
Limitations        (8, 15)                      -12.1    -0.3    -2.0     0.1     -1.0
Delayed care       (6, 14)                        8.2    -0.2     2.2    -0.6      0.9
Hospital stay      (9, 16)                       13.4     0.2     6.2    -0.7      2.8
Common mean Y      (5, 11)                        0.3     0       0.4     0.4      0.6

Close-mean collapsing, adjustment bound = 2
Health insurance   (10, 16)                     -11.5     0.1    -0.5     0.5     -0.3
Limitations        (8, 15)                      -12.1    -0.3    -1.2     0.2     -1.1
Delayed care       (6, 14)                        8.2    -0.2    -0.3    -0.3     -0.2
Hospital stay      (9, 16)                       13.4     0.2     0.4     0.1      0.4
Common mean Y      (6, 11)                        0.3     0       0.2     0.1      0.2

Adjacency collapsing, adjustment bound = 1.8
Health insurance   (7, 13)                      -11.5     0.1    -6.5     0.7     -3.5
Limitations        (7, 12)                      -12.1    -0.3    -3.4     0.3     -2.0
Delayed care       (5, 11)                        8.2    -0.2     3.5    -0.4      2.5
Hospital stay      (5, 12)                       13.4     0.2     9.4     0.0      6.3
Common mean Y      (5, 9)                         0.3     0.1     0.5     0.6      0.6

Close-mean collapsing, adjustment bound = 1.8
Health insurance   (6, 13)                      -11.5     0.1    -1.6     0.3     -1.7
Limitations        (7, 12)                      -12.1    -0.3    -2.7     0.9     -2.4
Delayed care       (5, 10)                        8.2    -0.2     0.2    -0.3      0.5
Hospital stay      (5, 12)                       13.4     0.2     1.5     0.3      2.0
Common mean Y      (5, 10)                        0.3     0.2     0.3     0.3      0.3
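By the relbias definition given in the text, each Table 5 entry is the percent deviation of the simulation average from the true proportion. A one-function Python helper (our illustration, not the authors' code):

```python
def relbias_percent(estimates, truth):
    """Percent relative bias: 100 * (average estimate - truth) / truth."""
    mean = sum(estimates) / len(estimates)
    return 100.0 * (mean - truth) / truth

relbias_percent([0.19, 0.21, 0.17], 0.20)  # about -5.0: average 0.19 vs. truth 0.20
```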

Table 6
Ratio of variances (or MSEs) to the variance (or MSE) of the poststratified estimator (PS1) with no collapsing. (Figures for Hájek are repeated in the four sections of the table to facilitate comparisons. PS2: standard collapsing; PS.WR1: truncate weights then collapse; PS.WR2: fixed maximum weight adjustment.)

                    Ratio of variances to variance of PS1     Ratio of MSEs to MSE of PS1
Characteristic      Hájek    PS2     PS.WR1   PS.WR2          Hájek    PS2     PS.WR1   PS.WR2

Adjacency collapsing, adjustment bound = 2
Health insurance    0.877    1.014   1.025    0.991           1.500    1.101   1.018    1.006
Limitations         0.821    0.966   1.035    0.977           1.555    1.008   1.017    0.992
Delayed care        1.099    1.023   1.003    1.000           1.239    1.023   1.000    1.000
Hospital stay       1.290    1.169   1.000    1.073           1.733    1.244   1.000    1.070
Common mean Y       0.755    0.805   0.908    0.818           0.752    0.801   0.904    0.826

Close-mean collapsing, adjustment bound = 2
Health insurance    0.877    1.013   1.014    1.008           1.500    1.006   1.006    1.006
Limitations         0.821    0.999   1.025    0.994           1.555    1.008   1.017    1.008
Delayed care        1.099    0.997   0.998    1.001           1.239    1.000   1.000    1.000
Hospital stay       1.290    1.011   1.000    1.008           1.733    1.012   1.000    1.012
Common mean Y       0.776    0.935   0.974    0.902           0.781    0.933   0.973    0.906

Adjacency collapsing, adjustment bound = 1.8
Health insurance    0.877    0.960   1.044    0.976           1.500    1.179   1.024    1.048
Limitations         0.821    0.939   1.032    0.961           1.555    1.034   1.017    1.000
Delayed care        1.099    1.051   0.991    1.032           1.239    1.057   0.989    1.034
Hospital stay       1.290    1.225   1.043    1.201           1.733    1.442   1.023    1.256
Common mean Y       0.780    0.815   0.882    0.828           0.779    0.816   0.893    0.829

Close-mean collapsing, adjustment bound = 1.8
Health insurance    0.877    1.010   1.006    1.019           1.500    1.018   1.000    1.024
Limitations         0.821    0.983   1.051    0.975           1.555    1.034   1.034    1.017
Delayed care        1.099    1.003   0.995    1.001           1.239    1.000   1.000    1.000
Hospital stay       1.290    1.052   1.001    1.059           1.733    1.035   1.000    1.047
Common mean Y       0.771    0.924   0.958    0.876           0.778    0.932   0.959    0.879


Table 7
Coverage rates in percent of 95% confidence intervals computed using the t-distribution with 25 DF. (Figures for Hájek and PS1 are not affected by collapsing and are repeated in the four sections of the table to facilitate comparisons. PS1: no collapsing; PS2: standard collapsing; PS.WR1: truncate weights then collapse; PS.WR2: fixed maximum weight adjustment.)

Characteristic      Hájek    PS1     PS2     PS.WR1   PS.WR2

Adjacency collapsing, adjustment bound = 2
Health insurance    75.9     93.8    90.1    94.4     93.6
Limitations         70.9     94.5    93.1    94.5     93.9
Delayed care        92.0     94.0    94.5    94.0     94.6
Hospital stay       82.2     94.5    91.8    94.3     93.9
Common mean Y       94.8     93.8    94.6    94.4     94.5

Close-mean collapsing, adjustment bound = 2
Health insurance    75.9     93.8    93.7    94.2     94.0
Limitations         70.9     94.5    93.0    94.3     93.5
Delayed care        92.0     94.0    93.6    93.9     93.8
Hospital stay       82.2     94.5    94.3    94.7     94.6
Common mean Y       93.7     92.9    92.2    92.5     93.2

Adjacency collapsing, adjustment bound = 1.8
Health insurance    75.9     93.8    87.5    94.1     92.4
Limitations         70.9     94.5    92.0    94.2     93.3
Delayed care        92.0     94.0    94.8    94.5     94.5
Hospital stay       82.2     94.5    88.4    94.0     91.1
Common mean Y       94.8     94.1    94.3    94.8     94.7

Close-mean collapsing, adjustment bound = 1.8
Health insurance    75.9     93.8    92.8    94.3     93.3
Limitations         70.9     94.5    92.3    94.5     93.0
Delayed care        92.0     94.0    93.8    94.0     94.4
Hospital stay       82.2     94.5    93.8    94.6     93.8
Common mean Y       94.9     94.5    93.6    93.8     94.8
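Each Table 7 entry is the share of simulated intervals ŷ_d ± t·se_d that contain the true proportion, with t the 97.5th percentile of the t-distribution on 25 degrees of freedom. A Python sketch (ours, not the authors' code; the critical value is hard-coded as roughly 2.0595 to avoid a statistics-library dependency):

```python
T975_DF25 = 2.0595  # ~97.5th percentile of the t-distribution with 25 df

def ci_coverage_percent(estimates, std_errors, truth, t=T975_DF25):
    """Percent of 95% confidence intervals y_d +/- t*se_d that cover the truth."""
    hits = sum(1 for y, se in zip(estimates, std_errors)
               if y - t * se <= truth <= y + t * se)
    return 100.0 * hits / len(estimates)

# Two of the three intervals below cover the true value 0.20.
ci_coverage_percent([0.20, 0.26, 0.18], [0.02, 0.02, 0.02], 0.20)  # 66.7 percent
```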

Table 7 reports the empirical coverages of 95% CIs computed using the estimated proportions and the linearization variance estimator that naturally accompanies each. A t-distribution with 25 degrees of freedom is used in all cases. The Hájek coverage rates are extremely poor, as expected, ranging from 70.9% to 92% for the first four characteristics. The poststratified estimators PS1 and PS.WR1 provide 93.8% to 94.7% coverage, i.e., near the nominal 95%. In contrast, PS2 coverage is somewhat poor for health insurance and hospitalization, especially for (adjacency, $f_{\max} = 1.8$), where the coverages are 87.5% and 88.4%. Coverage rates for PS.WR2 are slightly less than for PS.WR1 but are reasonably close to nominal. Use of close-mean collapsing generally improves the cases of poor coverage found with adjacency. For Common mean Y, coverages are good, ranging from 92.2% to 94.9%.

In summary, the weight-restricted estimators, PS.WR1 and PS.WR2, have some advantage over the standard collapsing estimator, PS2. They are generally less biased and retain more of the undercoverage adjustment than does PS2. However, the most critical element in bias control is how the cells are collapsed in the first place. Collapsing using nearness of cell means or coverage rates is far preferable to collapsing using some adjacency criterion based on neither of these. Only when cell means were equal did we observe any gain in MSE from collapsing cells. However, equality of cell means is the exception in practice.

5. Concluding remarks

Designers of surveys of households or establishments often have a lengthy list of poststrata or cells in mind when they develop weighting systems. If the sample size in a poststratum is small or the sample estimate of the population count in a poststratum is much different from an external control count, the poststratum may be collapsed with an adjacent one. The conventional justification for collapsing is that the possibility of creating extreme weights is reduced, as are the variances of estimates.

However, a poor choice of the method for collapsing has at least two undesirable consequences: (i) deficient frame or sample coverage in some cells is not completely corrected and (ii) estimates from the standard approach to collapsing may be quite biased. The latter problem can result in confidence intervals that cover at much less than the nominal rate. Collapsing leads to bias when coverage rates, cell means, or both are correlated within a collapsed poststratum. The bias can be either positive or negative, depending on the correlation.

Cells should be collapsed based on similarity of coverage rates, population cell means, or both in order to avoid bias. This method of collapsing can be much different from standard procedures that only collapse "adjacent" cells, e.g., by combining contiguous age groups. If the adjacency coincides with cells that have similar coverage rates or means, no bias results. But this should be checked rather than assumed.

There are at least two practical issues with collapsing based on cell means. One is that, while the theory directs us to collapse based on population means, in a particular sample we will only have estimates for the population covered by the frame. Coverage may be so deficient that the means of the covered and non-covered parts of the population are substantially different, even within the initial poststrata. This would be a case of "nonignorable noncoverage." If so, poststratification based only on the initial set of cells or combinations of them cannot correct coverage bias. A second practical issue is that data on many items are collected in most surveys. Collapsing based on the cell means for one variable may not work well for other variables. In that case, the compromise, suggested by Little and Vartivarian (2005) for nonresponse adjustment, of collapsing based on some weighted average of the means of an important set of variables should be a good solution.

Extensions of this research would be to examine the performance of the class of calibration estimators in correcting coverage errors. Poststratification is a special case. When categories of qualitative auxiliaries are combined due to small sample sizes or other reasons, the same bias problems we have illustrated here may be introduced in more general calibration estimators. One method of allowing some flexibility to depart from controls while retaining important auxiliaries is already available in Rao and Singh (1997). The effect of their proposals on coverage bias needs to be investigated.

Acknowledgements

The authors are indebted to the Associate Editor and the referees for their careful reviews and constructive comments. This paper reports the general results of research undertaken in part by National Center for Health Statistics (NCHS) staff. The views expressed are attributable to the authors and do not necessarily reflect those of the NCHS. The work of R. Valliant was partially supported by Professional Services Contracts 200-2004-M-09302 and 200-2006-M-17916 between the University of Michigan and the National Center for Health Statistics.

References

Bureau of the Census (2002). Sources and Accuracy of Estimates for Poverty in the United States: 2002. Washington DC.

Eltinge, J., and Yansaneh, I. (1997). Diagnostics for formation of nonresponse adjustment cells with an application to income non-response in the U.S. Consumer Expenditure Survey. Survey Methodology, 23, 37-45.

Fuller, W. (1966). Estimation employing post strata. Journal of the American Statistical Association, 61, 1172-1183.

Gonzalez, J.F., Town, M. and Kim, J. (2005). Mean square error analysis of health estimates from the Behavioral Risk Factor Surveillance System for counties along the United States/Mexico border region. Proceedings of the Section on Survey Methods Research. Alexandria VA: American Statistical Association.

Kalton, G., and Flores-Cervantes, I. (2003). Weighting methods. Journal of Official Statistics, 19, 81-97.

Kalton, G., and Maligalig, D.S. (1991). A comparison of methods of weighting adjustment for nonresponse. Proceedings of the U.S. Bureau of the Census Annual Research Conference, 409-428.

Kim, J.J. (2004). Effect of collapsing rows/columns of weighting matrix on weights. Proceedings of the Section on Survey Methods Research, American Statistical Association.

Kim, J.J., Thompson, J.H., Woltman, H.F. and Vajs, S.M. (1982). Empirical results from the 1980 Census Sample Estimation Study. Proceedings of the Section on Survey Research Methods, American Statistical Association, 170-175.

Kim, J.J., and Tompkins, L. (2007). Comparisons of current and alternative collapsing approaches for improved health estimates. Paper presented at the 11th Biennial CDC/ATSDR Symposium on Statistical Methods, Atlanta, Georgia, April 17-18, 2007.

Kim, J.J., Tompkins, L., Li, J. and Valliant, R. (2005). A simulation study of cell collapsing in poststratification. Proceedings of the Section on Survey Methods Research, American Statistical Association.

Kostanich, D., and Dippo, C. (2000). Current Population Survey: Design and Methodology. Technical Paper 63. Washington DC: Department of Commerce.

Lazzeroni, L., and Little, R.J.A. (1998). Random-effects models for smoothing poststratification weights. Journal of Official Statistics, 14, 61-78.

Little, R.J.A. (1993). Post-stratification: A modeler's perspective. Journal of the American Statistical Association, 88, 1001-1012.

Little, R.J.A., and Vartivarian, S. (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology, 31, 161-168.

Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9, 1-19.

Lumley, T. (2005). survey: Analysis of complex survey samples. R package version 3.0-1. Seattle: University of Washington.

R Development Core Team (2005). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org.

Rao, J.N.K., and Singh, A.C. (1997). A ridge-shrinkage method for range-restricted weight calibration in survey sampling. Proceedings of the Survey Research Methods Section, Alexandria VA: American Statistical Association.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Tompkins, L., and Kim, J.J. (2006). Evaluation of collapsing criteria in sample weighting. Internal NCHS memorandum.

Tremblay, V. (1986). Practical criteria for definition of weighting classes. Survey Methodology, 12, 85-97.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 151-157

A single frame multiplicity estimator for multiple frame surveys

Fulvia Mecatti 1

Abstract

Multiple frame surveys were originally proposed to foster cost savings on the basis of an optimality approach. As surveys of special, rare and difficult-to-sample populations become more prominent, a single list of population units to be used as a sampling frame is often unavailable in sampling practice. In recent literature, multiple frame designs have been put forward in order to increase population coverage, to improve response rates and to capture differences and subgroups. Alternative approaches to multiple frame estimation have appeared, all of them relying upon the virtual partition of the set of available overlapping frames into disjointed domains. Hence the correct classification of sampled units into the domains is required for practical applications. In this paper a multiple frame estimator is proposed using a multiplicity approach. Multiplicity estimators require less information about unit domain membership; hence they are insensitive to misclassification. Moreover, the proposed estimator is analytically simple, so it is easy to implement, and its exact variance is given. Empirical results from an extensive simulation study comparing the multiplicity estimator with major competitors are also provided.

Key Words: Difficult-to-sample populations; Dual frame survey; Misclassification; Raking ratio; Variance estimation.

1. Introduction

In classic finite population sampling a basic hypothesis is the availability of a unique and complete list of units forming the target population to be used as a sampling frame. In some cases a set of two or more lists is available for survey purposes. The general case of $Q \geq 2$ lists, singularly partial and possibly overlapping, is known as a Multiple Frame Survey. Multiple frame surveys were originally introduced (Hartley 1974) as a device for reducing survey costs by achieving the same precision as a customary unique-frame survey. In modern sampling practice, as surveys of special, rare and difficult-to-sample populations are becoming more common (Kalton and Anderson 1986; Sudman and Kalton 1986; Sudman, Sirken and Cowan 1988), it is often the case that a unique list of units does not exist and the population size N is an unknown parameter to be estimated. Recent literature considers multiple frame surveys with the main aim of increasing population coverage, of improving response rates and of capturing differences and subgroups more accurately (Iachan and Dennis 1993; Carlson and Hall 1994; Haines and Pollock 1998; Eurostat 2000). In a recent paper Lohr and Rao (2006) stated: "As the U.S., Canada, and other nations grow in diversity, different sampling frames may better capture subgroups of the population. […] We anticipate that modular sampling designs using multiple frames will be widely used in the future". A contemporary application could be found in web surveys: the population coverage can be improved and the bias due to the features of the site used for data collection can be reduced by using two or more independent web sites simultaneously. Since the same unit can visit more than one site involved in the survey, the sites overlap, configuring a multiple frame framework.

Estimation in multiple frame surveys, as first developed by Hartley (1962, 1974), is based on the virtual partition of the population (i.e., the unknown union of the $Q$ overlapping frames) into $2^Q - 1$ disjointed domains (i.e., the mutually exclusive intersections of frames). Hence the total $Y$ of a study variable $y$, taken as the parameter to be estimated, is expressed as a sum of domain totals. Sample data from the $Q$ frames are used to produce estimates for the domain totals. Estimated domain totals are finally combined to provide estimation for the population total $Y$. A number of estimators have been developed according to alternative approaches to multiple frame estimation (see Section 2). Since all estimators appearing in the literature rely on the partition into the domains as mentioned above, the correct identification of the domain membership of each sampled unit is required for their practical application. This is a strong assumption that may not always be true in practice, as argued for instance in Lohr and Rao (2006). Indeed this implies that every sampled unit should be questioned on both the survey value and on its membership in each frame involved in the survey, in order to correctly classify it into the domains. In addition to the natural risk of misclassification, there might also be a risk connected with confidentiality and with the sensitivity of units to the frame membership, which could both increase the rate of non-response and affect the estimator precision. This situation could apply, for instance, when surveying sensitive characteristics (private behaviours, addictions …) or when sampling elusive populations (illegal immigrants,

1. Fulvia Mecatti, Department of Statistics, University of Milan-Bicocca, Via Bicocca degli Arcimboldi 8, Ed. U7, 20126 Milan, Italy. E-mail: [email protected].

Mecatti: A single frame multiplicity estimator for multiple frame surveys

ex-prisoners, patients …). In the present paper a different approach to estimation in multiple frame surveys is adopted. The concept of unit multiplicity, namely the number of frames to which a unit belongs, is proposed as an alternative to the existing approaches based on domain membership, i.e., on which frames units belong to. An unbiased estimator, naturally insensitive to domain misclassification and applying to any number of frames, is presented. The proposed multiplicity estimator has a simple analytical structure so that it can be easily implemented, while its exact variance is given in a closed form and hence readily estimated for any sample size.

In Section 2 an overall discussion of the main contributions to multiple frame estimation is presented in a unified view and the necessary notation is introduced. In Section 3 a multiplicity estimator is proposed and variance estimation is analysed. An extensive simulation study comparing the proposed estimator with major competitors is presented in Section 4.

2. Optimum, pseudo-optimum and single frame estimation

Although the literature has mostly dealt with the dual frame case (Q = 2), a general theoretical framework for multiple frame surveys (Q ≥ 2) has been recently provided in Lohr and Rao (2006). By using their multiple frame notation, different estimation approaches are briefly reviewed below and the available estimators are presented in a unified way which highlights their dependency on the domain membership of the sampled units.

Let A_1, …, A_q, …, A_Q be a collection of Q ≥ 2 overlapping frames, the union of which offers a population coverage adequate for survey objectives. Let the index sets K be the subsets of the range of the frame index q = 1, …, Q. For every index set K ⊆ {1, …, q, …, Q} a domain is defined as the set D_K = (∩_{q∈K} A_q) ∩ (∩_{q∉K} A_q^C), where ^C denotes complementation.

Let the domain membership indicator δ_i(K) be the indicator taking value 1 if unit i is included in domain D_K and 0 otherwise. The total Y over the (unknown) union of the Q overlapping frames is then expressed as a sum over the set of 2^Q − 1 disjoint domains:

Y = Σ_{i ∈ ∪_q A_q} y_i = Σ_K Σ_{i ∈ ∪_q A_q} δ_i(K) y_i.   (1)

Let s_q be a sample selected from frame A_q under a given design, independently for q = 1, …, Q. A general expression of a multiple frame estimator based on the domain classification is then

Ŷ = Σ_K Σ_{q∈K} Σ_{i∈s_q} w_i^(q) δ_i(K) y_i.   (2)

Note that when an unbiased estimator for the total Y is given, an estimator for the population size N is also given, by simply substituting the sample values y_i with 1's.

The estimators available in the literature result from setting the weights w_i^(q) in (2) according to three main approaches. Since multiple frame surveys were originally put forward with the aim of fostering cost savings by achieving equal or greater precision than a customary unique-frame survey, an optimum approach was first suggested, using optimum weights w_{i,opt}^(q) in (2), i.e., weights minimizing the estimator variance (Hartley 1962, 1974; Lund 1968; Fuller and Burmeister 1972). Optimum estimators have optimal theoretical properties (Skinner 1991; Lohr and Rao 2000) but present practical problems due mainly to their complexity (explicit though complex formulae for the optimum weights w_{i,opt}^(q) with any number of frames are given in Lohr and Rao 2006, Section 3). Moreover, optimum weights depend on unknown population covariances, so they must be estimated from sample data. This is both computationally complex and harmful to optimality, since the extra variability in estimating the covariances leads to larger mean square errors (Lohr and Rao 2006, Section 7).

In order to improve applicability, a single frame (SF) approach has been proposed, using fixed weights which ensure design-unbiasedness. For simple random sampling in every frame, the SF estimator is given by substituting the weights w_i^(q) in (2) with w_{i,SF}^(q) = w^(K) = (Σ_{q∈K} f_q)^{-1}, where f_q = n_q/N_q denotes the frame sampling fraction (Bankier 1986; Kalton and Anderson 1986; Skinner 1991; Skinner, Holmes and Holt 1994). Since fixed weights usually differ from optimum weights, the SF estimator is generally less efficient than an optimum estimator (Lohr and Rao 2000).

Finally, a pseudo-optimum approach was proposed (Skinner and Rao 1996; Lohr and Rao 2000) in order to achieve both a wider applicability than optimum estimators and improved efficiency compared with the SF approach. A pseudo-maximum likelihood (PML) estimator for multiple frame surveys is given by substituting in (2) the weights w_{i,PML}^(q) = w^(K) = N̂_K / (Σ_{q∈K} Σ_{i∈s_q} δ_i(K)) = N̂_K / n_K, where the estimated domain sizes N̂_K are the solution of a system of non-linear equations. Although complex to implement in practical applications (an iterative linear approximation of N̂_K under simple random sampling is given in Lohr and Rao 2006, Section 4.1), the PML estimator retains good theoretical properties from the optimum approach.

Note that formula (2) involves the domain membership indicator δ_i(K); hence the optimum, pseudo-optimum and SF estimators apply only if the correct classification of the sample data into the 2^Q − 1 domains is accomplished.
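As a toy illustration of this classification requirement, the sketch below (function and variable names are hypothetical; SRS within each frame is assumed) computes the SF estimator (2) with fixed weights w^(K) = (Σ_{q∈K} f_q)^{-1}, which presumes that each sampled unit's full frame membership K is known:

```python
def sf_estimator(samples, f, frames_of):
    """Single frame (SF) estimator (2) under SRS within each frame.

    samples   : dict q -> list of (unit_id, y_i) pairs sampled from frame A_q
    f         : dict q -> sampling fraction f_q = n_q / N_q
    frames_of : dict unit_id -> set of frames listing the unit, i.e. its
                domain K (the classification that must be collected)
    """
    total = 0.0
    for q, s_q in samples.items():
        for i, y in s_q:
            K = frames_of[i]                    # domain membership of unit i
            w = 1.0 / sum(f[r] for r in K)      # fixed weight (sum_{q in K} f_q)^-1
            total += w * y
    return total
```

With f_q = 1 for every frame (a census of every frame) each unit of domain K is listed |K| times with weight 1/|K|, so the estimator returns the exact total Y.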

Statistics Canada, Catalogue No. 12-001-X. Survey Methodology, December 2007.

In the next Section a multiple frame estimator is presented on the basis of a single frame multiplicity approach which does not require domain classification.

3. The single frame multiplicity estimator

The notion of multiplicity was first introduced in connection with network sampling (Casady and Sirken 1980; Sirken 2004). It is also a tool of the Generalized Weight Share Method (Lavallée 2002, 2007) as well as of the center sampling estimation theory (Mecatti 2004), since center sampling and multiple frame surveys are equivalent under certain conditions. In Lohr and Rao (2006), the multiplicity of domain D_K is defined as the cardinality of the index set K. Since the domains are mutually exclusive, multiplicity is also a characteristic of every population unit, being the number of frames, among the Q involved in the survey, in which each unit is included.

Let m_i be the multiplicity of unit i. Note that unit multiplicity may be collected simply by asking sampled units how many frames they belong to.

Since clearly Σ_q Σ_{i∈A_q} y_i = Σ_{i∈∪_q A_q} m_i y_i, it follows that

Y = Σ_{q=1}^Q Σ_{i∈A_q} y_i m_i^{-1}.   (3)

Notice that expression (3), which involves exclusively sums over the frames, represents a practical advantage with respect to equation (1). In fact the domains provide a virtual (unknown) partition of the population, while the sample selection is actually performed in the Q overlapping frames. This leads to the SF multiplicity estimator

Ŷ_M = Σ_{q=1}^Q Σ_{i∈s_q} w_i^(q) y_i m_i^{-1}   (4)

with fixed weights w_i^(q) ensuring, for instance, design-unbiasedness. For simple random sampling of every frame we have w_i^(q) = f_q^{-1}, for all i ∈ s_q.

Unlike the optimum, PML and SF estimators discussed in Section 2, estimator (4) does not involve the domain membership indicator, and it is very simple to implement in practical applications. Furthermore, it is to be noted that for simple random sampling of every frame, the sampled values in the multiplicity estimator (4) are weighted by (f_q m_i)^{-1}, i.e., by a frame-specific coefficient; vice versa, in the SF estimator sampled values are weighted by w^(K) = (Σ_{q∈K} f_q)^{-1}, i.e., by an average coefficient over the frames involved in each domain. As a consequence Ŷ_M is expected to be more accurate than the SF estimator, as confirmed by the simulation results. Moreover, owing to its Horvitz-Thompson structure, the exact variance of Ŷ_M can be derived in closed form. For simple random sampling of every frame the estimator variance is given by

V(Ŷ_M) = Σ_{q=1}^Q (N_q − n_q)/(n_q (N_q − 1)) [N_q Σ_{i∈A_q} y_i² m_i^{-2} − (Σ_{i∈A_q} y_i m_i^{-1})²].   (5)

An unbiased variance estimator for simple random sampling of every frame is then

v̂(Ŷ_M) = Σ_{q=1}^Q (N_q − n_q)/(n_q (n_q − 1)) [N_q Σ_{i∈s_q} y_i² m_i^{-2} − f_q^{-1} (Σ_{i∈s_q} y_i m_i^{-1})²].   (6)

The performance of the multiplicity estimator for finite sample sizes has been empirically studied under simple random sampling and compared to major competitors in a simulation study.

4. Simulation study

Several simulation results concerning dual frame estimators have appeared in the literature (Bankier 1986; Skinner and Rao 1996; Lohr and Rao 2000). In the general case of Q ≥ 2 frames, Lohr and Rao (2006) extensively investigated the empirical mean squared errors of a set of eight estimators under the optimum, pseudo-optimum and single frame approaches, in a three-frame framework under a two-stage design. Their results suggest that optimum estimators are theoretically optimal but in practice the extra variability in estimating the optimum weights leads to larger mean squared errors. Hence the PML estimator appears as the best performer in terms of empirical relative efficiency. Furthermore, their study regarded a case of about 10% of sampled units misclassified into the domains, and more research on the effects of misclassification on estimator performance is recommended.

In the present study the pseudo-optimum and single frame estimators are compared with the multiplicity estimator (4), with three main objectives:

i) to investigate empirical conditions in which the multiplicity estimator is more efficient than the SF estimator (Section 4.2);

ii) to consider the raking ratio correction to known frame sizes N_q, as already proposed in order to improve the efficiency of the SF estimator (Section 4.3);

iii) to explore the effects of increasing rates of misclassification upon the empirical properties of the PML and SF estimators (simple and raked)


versus the natural insensitivity of the multiplicity estimator (Section 4.4).

4.1 Implementation

The simulation study was performed in an artificial three-frame setup and implemented as follows. N population pseudo-values y_i are generated from a Gamma distribution. Some preliminary simulations indicated that both increasing values of the population size N and different values of the Gamma parameters (leading to an asymmetrical or an almost symmetrical shape) do not produce significant differences in the pattern of the relative performance of the estimators considered. The study was then conducted by setting N = 1,200 and by generating from a Gamma distribution with parameters 1.5 and 2. Every pseudo-value y_i is randomly assigned to the Q = 3 frames according to 3 independent Bernoulli trials with probabilities α_q = N_q/N, q = 1, 2, 3. Different scenarios regarding both frame coverage and frame overlap result from different choices of α_q, under the two constraints that a) Σ_q α_q ≥ 1, in order to ensure that the 3 frames cover the entire population, and b) the 3 frames are non-empty. In some cases, the desired frame overlap was produced by fixing the ratios N_K/N of the population units included in each domain. Having chosen a set of sampling fractions f_q = n_q/N_q, q = 1, 2, 3, a simple random sample is selected independently from every frame, iteratively for 10,000 simulation runs.

For a given estimator, say Ŷ, the collection of values {Ŷ_p, p = 1, …, 10,000} is taken as its Monte Carlo distribution, and the empirical mean E_mc(Ŷ) = Σ_p Ŷ_p / 10,000 and the empirical mean squared error MSE_mc(Ŷ) = Σ_p [Ŷ_p − Y]² / 10,000 are calculated. The Monte Carlo error is controlled by only accepting simulations giving an empirical relative bias RB_mc(Ŷ) = 100 · |E_mc(Ŷ) − Y| / Y less than 1.5% for those estimators known to be unbiased. Furthermore, by using the exact variance of the multiplicity estimator as given by (5), the simulations ensure |MSE_mc(Ŷ_M) − V(Ŷ_M)| ≤ 0.03. Several different scenarios have been investigated by combining different levels of frame coverage, frame overlap and sampling disproportion, leading to 29 simulated populations.

In Figure 1 the simulated populations are represented as points in the plane formed by the two main simulation parameters, namely the (total) frame coverage on the horizontal axis (as given by Σ_q α_q) and the sampling disproportion on the vertical axis, i.e., the dispersion among the sampling fractions f_q as measured by Σ_q Σ_{q′} |f_q − f_{q′}| / 3. The shape of the population points in Figure 1 indicates the level of overlap, namely the total rate of population units classified into the four overlapping domains.

Figure 1  Simulated populations (frame coverage on the horizontal axis, 0 to 3; sampling disproportion on the vertical axis, 0 to 0.45; point symbols indicate overlapping rates of 10%, 25%, 43%, 55%, 65%, 73%, 85% and 99%)

4.2 Multiplicity versus simple single frame estimation

As noted in Section 3, the multiplicity estimator involves frame-specific weights whereas the SF estimator is based on average coefficients. As a consequence, the two estimators coincide for a constant sampling fraction f_q = f in every frame, i.e., for proportionate sampling, and they give different estimates under disproportionate sampling. The simulation results provide empirical evidence that the multiplicity estimator is more accurate than the simple SF estimator. Estimator Ŷ_M is shown to be more efficient in all the cases explored except one extreme case in which the three frames are almost complete and hence the total overlap is close to 100%. Neglecting this single case, the efficiency gains of Ŷ_M over the SF estimator, as measured by a customary empirical efficiency ratio (see Table 1), range from 5% to 48%, and are never less than 26% in half of the simulations. The efficiency of the multiplicity estimator over the SF estimator increases as the sampling disproportion increases (see Table 2), whereas it has proved essentially independent of increasing levels of frame coverage and overlap.

Table 1
Empirical efficiency ratio of Ŷ_M versus the SF estimator: Elementary statistics over 28 simulated populations

Average  Max   Min   Median  75th quantile
0.7425   0.95  0.52  0.74    0.89

Table 2
Empirical efficiency ratio of Ŷ_M versus the SF estimator for increasing levels of sampling disproportion (averaged over the different levels of frame coverage/overlap)

Sampling disproportion       0.11  0.22  0.31  0.40
Empirical efficiency ratio   0.92  0.81  0.68  0.57
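Under SRS within each frame, the comparison above only needs, for every sampled unit, its y-value and its multiplicity. A minimal sketch of the multiplicity estimator (4) and of a closed-form variance estimate (hypothetical function names; `samples` maps each frame label q to its sampled (y_i, m_i) pairs, and `f`/`N` give the sampling fraction f_q and frame size N_q):

```python
def y_m_hat(samples, f):
    """Multiplicity estimator (4) under SRS within each frame: each sampled
    value y_i is weighted by (f_q * m_i)**-1, so only the unit's multiplicity
    m_i is needed, never its domain."""
    return sum(y / (f[q] * m) for q, s_q in samples.items() for y, m in s_q)

def v_hat(samples, N):
    """Closed-form variance estimate under SRS of every frame: per frame, the
    usual SRS variance estimator of the total of z_i = y_i / m_i."""
    v = 0.0
    for q, s_q in samples.items():
        z = [y / m for y, m in s_q]
        n_q = len(z)
        f_q = n_q / N[q]
        v += (N[q] - n_q) / (n_q * (n_q - 1)) * (
            N[q] * sum(zi * zi for zi in z) - sum(z) ** 2 / f_q)
    return v
```

On a toy frame with N_q = 9, the three sampled pairs (y_i, m_i) = (2, 1), (4, 2), (6, 1) give Ŷ_M = 30 and an estimated variance of 96, matching the textbook SRS formula N_q²(1 − f_q) s_z²/n_q for the total of z_i = y_i/m_i.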

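A single run of the Section 4.1 design can be sketched as below. This is illustrative only: the helper name, the particular α_q and f_q values, and the re-assignment of units that fall in no frame are assumptions of the sketch, since the text only states the coverage constraint Σ_q α_q ≥ 1.

```python
import random

def simulate_once(N=1200, alphas=(0.6, 0.6, 0.6), f=(0.1, 0.5, 0.3), seed=0):
    """One run: Gamma(1.5, 2) pseudo-values, Bernoulli frame assignment with
    probabilities alpha_q, then an independent SRS from each of the 3 frames;
    returns the multiplicity estimate (4) and the true population total."""
    rng = random.Random(seed)
    y = [rng.gammavariate(1.5, 2.0) for _ in range(N)]
    frames = {q: [] for q in range(3)}
    mult = [0] * N
    for i in range(N):
        member = [q for q in range(3) if rng.random() < alphas[q]]
        if not member:                # force full coverage (an assumption here)
            member = [rng.randrange(3)]
        mult[i] = len(member)
        for q in member:
            frames[q].append(i)
    est = 0.0
    for q in range(3):
        N_q = len(frames[q])
        n_q = max(2, round(f[q] * N_q))
        for i in rng.sample(frames[q], n_q):
            est += (N_q / n_q) * y[i] / mult[i]   # weight f_q^{-1} m_i^{-1}
    return est, sum(y)
```

Averaging `est` over many seeds should reproduce the design-unbiasedness of (4) up to Monte Carlo error.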

4.3 Raking ratio adjustment

It has been suggested that a raking ratio adjustment using the known frame sizes N_q (Bankier 1986) be used in order to improve the efficiency of the simple SF estimator. Theoretical and empirical results that have already appeared in the literature confirm that the raking ratio SF estimator (SFrak) can be considerably more efficient than the simple SF estimator (Skinner 1991; Lohr and Rao 2000, 2006; Mecatti 2005).

In order to adjust the multiplicity estimator via raking ratio, knowledge of the domain membership of the sampled units has to be assumed. By using this additional, though redundant, information Ŷ_M may be rewritten as

Ŷ_M = Σ_K Σ_{q∈K} (|K| f_q)^{-1} Σ_{i∈s_q} δ_i(K) y_i   (7)

where |K| indicates the number of frames involved in domain D_K, and it equals the unit multiplicity m_i for all i ∈ D_K. Setting the initial weights at h_{Kq}^(0) = (|K| f_q)^{-1}, the t-th iteration of the raking ratio multiplicity estimator (Mrak) is obtained by substituting in (7) the raked weights

h_{Kq}^(t) = N_q h_{Kq}^(t−1) / (Σ_{q∈K} h_{Kq}^(t−1) n_q)   if q ∈ K,
h_{Kq}^(t) = h_{Kq}^(t−1)   if q ∉ K,

where q = Q if t is a multiple of Q and q = mod(t, Q) otherwise, for t = 1, 2, … until convergence.

Simulations regarded different levels of frame coverage combined with different sets of sampling fractions, leading to increasing sampling disproportion. The empirical results show that Mrak is more efficient than SFrak in 38% of the cases explored, and equally or less efficient in the remaining cases. The efficiency gains range from 3% to 74% and occur for low levels of frame coverage. For increasing frame coverage (and hence increasing overlap) the Mrak estimator is superior to the SFrak estimator for high sampling disproportion only. In the other cases, namely for increasing frame coverage/overlap combined with low to medium sampling disproportion, Mrak can be considerably less efficient than SFrak (see Table 3 for ten indicative cases) and also severely biased. Thus the empirical results suggest that the raking ratio adjustment has better effects under a single frame approach than under a multiplicity approach, although there are conditions in which the latter is still superior. In this respect more research is needed. In particular, since the raking ratio procedure is in fact a special case of calibration (Deville and Särndal 1992; Deville, Särndal and Sautory 1993), potential improvements might follow by applying the more general calibration to the estimator Ŷ_M. Calibration of the multiplicity estimator, viewed as a particular case of the Generalized Weight Share Method, is outlined in Lavallée (2002, 2007).

Table 3
Efficiency of Mrak versus SFrak: Ten indicative simulation runs

Frame coverage α_q = N_q/N    Sampling fractions f_q = n_q/N_q    Empirical efficiency ratio Mrak versus SFrak
0.60 0.60 0.60                0.01 0.95 0.15                      0.26
0.35 0.35 0.35                0.80 0.20 0.50                      0.54
0.85 0.85 0.85                0.01 0.95 0.15                      0.71
0.35 0.40 0.50                0.70 0.80 0.60                      0.96
0.85 0.85 0.85                0.80 0.20 0.50                      1.01
0.60 0.60 0.60                0.70 0.80 0.60                      1.09
0.80 0.50 0.35                0.01 0.95 0.15                      1.22
0.35 0.40 0.50                0.01 0.95 0.15                      1.63
0.70 0.05 0.95                0.70 0.80 0.60                      2.09
0.70 0.05 0.95                0.80 0.20 0.50                      5.79

4.4 Misclassification

The aim of the final part of the simulation study is to investigate the sensitivity of the pseudo-optimum (PML) and single frame estimators (simple and raked) to increasing levels of misclassification of sampled units into the domains, as against the structural insensitivity of the proposed multiplicity estimator. For a chosen rate of misclassification, the desired number of sampled units to be inexactly classified is taken from the domain with the largest size and randomly assigned to the remaining domains, independently for each frame.

Tables 4 and 5 show elementary statistics summarizing the simulation results in the case of exact classification and in the case of a slight misclassification equal to 1% of the sampled units. Note that for exact classification all the estimators appear unbiased (or nearly unbiased). As regards efficiency, in line with other simulation results (Lohr and Rao 2006), the SFrak and PML estimators show similar performance. As expected, for exact classification they are more efficient than Ŷ_M in all the cases explored (except for two isolated cases), as a consequence of the different amount of information used in the estimation process. However, the SF (simple and raked) and PML estimators tend to become biased and less efficient than Ŷ_M in the presence of just a small amount of misclassification.

Table 4
Relative bias in case of 1% misclassification: Elementary statistics over the 29 simulated populations

(absolute) RB_mc, 1% of sampled units misclassified    Average  Min   Max   Median  75th quantile
Ŷ_M                                                    0        0     0     0       0
SF                                                     2.5880   0.83  7.02  2.65    2.13
SFrak                                                  1.7632   0.73  4     1.97    1.65
PML                                                    2.7352   0.23  4.67  3.46    2.87
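The weight recursion above is a frame-by-frame raking (iterative proportional fitting). A generic sketch of the underlying idea, cyclically rescaling estimated domain sizes until every known frame size N_q is reproduced, is given below; this is classical raking ratio, not a line-by-line transcription of the recursion above, and the names are illustrative:

```python
def rake_to_frame_sizes(Nhat, N_frame, sweeps=50, tol=1e-8):
    """Cyclically scale estimated domain sizes so that, for every frame q,
    the domains containing q add up to the known frame size N_q.

    Nhat    : dict frozenset K -> estimated domain size
    N_frame : dict q -> known frame size N_q
    """
    Nhat = dict(Nhat)
    for _ in range(sweeps):
        max_shift = 0.0
        for q, N_q in N_frame.items():
            tot = sum(v for K, v in Nhat.items() if q in K)
            ratio = N_q / tot
            for K in Nhat:
                if q in K:
                    Nhat[K] *= ratio           # rake the q-margin to N_q
            max_shift = max(max_shift, abs(ratio - 1.0))
        if max_shift < tol:                    # all frame margins reproduced
            break
    return Nhat
```

For two frames and three domains, for instance, raking drives the domain sizes to match both frame margins simultaneously.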


Table 5
Empirical efficiency ratio of Ŷ_M versus the SF and PML estimators: Elementary statistics over the 29 simulated populations, for exact classification and for slight misclassification

Empirical efficiency ratio    Average  Min   Max   Median  75th quantile
Exact classification
SFrak                         1.43     0.69  3.21  1.51    1.28
PML                           1.41     0.72  3.30  1.47    1.25
1% misclassification
SF                            0.39     0.13  0.71  0.54    0.34
SFrak                         0.78     0.13  1.98  0.95    0.74
PML                           0.77     0.14  1.94  0.98    0.70

Finally, we focused on the case of maximum efficiency of SF, SFrak and PML over Ŷ_M under exact classification, namely the case of high frame overlap/coverage and low sampling disproportion. In this setup, increasing rates of misclassification of sampled units into the domains (from 0 to 50%) were investigated. Tables 6 and 7 show, respectively, the relative bias and the efficiency ratio of Ŷ_M versus the SF, SFrak and PML estimators for increasing levels of misclassification. It is to be noticed that, although the negative effects of misclassification are rapid and severe for all the competitors, the PML estimator emerges as the least affected.

In conclusion, the proposed multiplicity estimator, besides being simple, is recommended when the risk of (even slight) misclassification of sampled units into the domains is a concrete possibility.

Table 6
(absolute) Relative bias for increasing rates of misclassification

% misclassification    Ŷ_M  SF     SFrak  PML
0                      0    0      0      ≈ 0
1%                     0    2.57   1.38   4.3
5%                     0    13.57  7.15   2.75
10%                    0    17.80  14.14  4.56
20%                    0    25     25     6
50%                    0    144    68     39

Table 7
Empirical efficiency ratio of Ŷ_M versus the SF, SFrak and PML estimators for increasing rates of misclassification

% misclassification    Ŷ_M versus SF    Ŷ_M versus SFrak    Ŷ_M versus PML
0                      0.640            3.210               3.300
1%                     0.260            1.040               1.100
5%                     0.020            0.060               0.370
10%                    0.010            0.020               0.150
20%                    0.004            0.004               0.080
50%                    ≈ 0              0.001               0.006

Acknowledgements

This work was partially supported by a grant from the Italian Ministry of University and Research. The author wishes to thank Jon N.K. Rao for useful discussion and resourceful advice. Thanks are also due to the Editor, two Associate Editors and two anonymous referees for their constructive comments and suggestions.

References

Bankier, M.D. (1986). Estimators based on several stratified samples with applications to multiple frame surveys. Journal of the American Statistical Association, 81, 1074-1079.

Carlson, B.L., and Hall, J.W. (1994). Weighting sample data when multiple sample frames are used. Proceedings of the Survey Research Methods Section, American Statistical Association, 882-887.

Casady, R.J., and Sirken, M.G. (1980). A multiplicity estimator for multiple frame sampling. Proceedings of the Survey Research Methods Section, American Statistical Association, 601-605.

Deville, J.-C., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Deville, J.-C., Särndal, C.-E. and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, 1013-1020.

Eurostat (2000). Push and Pull Factors of International Migration. Country Report - Italy, 3/2000/E/n.5, Bruxelles: European Communities Printing Office.

Fuller, W.A., and Burmeister, L.F. (1972). Estimators of samples selected from two overlapping frames. Proceedings of the Social Statistics Section, American Statistical Association, 245-249.

Haines, D.E., and Pollock, K.H. (1998). Combining multiple frames to estimate population size and totals. Survey Methodology, 24, 79-88.

Hartley, H.O. (1962). Multiple frame surveys. Proceedings of the Social Statistics Section, American Statistical Association, 203-206.

Hartley, H.O. (1974). Multiple frame methodology and selected applications. Sankhyā, Series C, 36, 99-118.

Iachan, R., and Dennis, M.L. (1993). A multiple frame approach to sampling the homeless and transient population. Journal of Official Statistics, 9, 747-764.

Kalton, G., and Anderson, D.W. (1986). Sampling rare populations. Journal of the Royal Statistical Society, Series A, 149, 65-82.

Lavallée, P. (2002). Le sondage indirect ou la méthode généralisée du partage des poids. Bruxelles: Editions de l'Université de Bruxelles; Paris: Editions Ellipses.

Lavallée, P. (2007). Indirect Sampling. New York: Springer.

Lohr, S.L., and Rao, J.N.K. (2000). Inference from dual frame surveys. Journal of the American Statistical Association, 95, 271-280.


Lohr, S.L., and Rao, J.N.K. (2006). Multiple frame surveys: Point estimation and inference. Journal of the American Statistical Association, 101, 1019-1030.

Lund, R.E. (1968). Estimators in multiple frame surveys. Proceedings of the Social Statistics Section, American Statistical Association, 282-288.

Mecatti, F. (2004). Center sampling: A strategy for surveying difficult-to-sample populations. Proceedings of the Statistics Canada Symposium.

Mecatti, F. (2005). Single frame estimation in multiple frame survey. Proceedings of the Statistics Canada Symposium.

Sirken, M.G. (2004). Network sample surveys of rare and elusive populations: A historical review. Proceedings of the Statistics Canada Symposium, Keynote Address.

Skinner, C.J. (1991). On the efficiency of raking ratio estimation for multiple frame surveys. Journal of the American Statistical Association, 86, 779-784.

Skinner, C.J., Holmes, D.J. and Holt, D. (1994). Multiple frame sampling for multivariate stratification. International Statistical Review, 62, 333-347.

Skinner, C.J., and Rao, J.N.K. (1996). Estimation in dual frame surveys with complex designs. Journal of the American Statistical Association, 91, 349-356.

Sudman, S., and Kalton, G. (1986). New developments in the sampling of special populations. Annual Review of Sociology, 12, 401-429.

Sudman, S., Sirken, M.G. and Cowan, C.D. (1988). Sampling rare and elusive populations. Science, 2, 991-996.


Survey Methodology, December 2007, Vol. 33, No. 2, pp. 159-166
Statistics Canada, Catalogue No. 12-001-X

Variance estimation for a ratio in the presence of imputed data

David Haziza 1

Abstract

In this paper, we study the problem of variance estimation for a ratio of two totals when marginal random hot deck imputation has been used to fill in missing data. We consider two approaches to inference. In the first approach, the validity of an imputation model is required. In the second approach, the validity of an imputation model is not required but response probabilities need to be estimated, in which case the validity of a nonresponse model is required. We derive variance estimators under two distinct frameworks: the customary two-phase framework and the reverse framework.

Key Words: Imputation model; Nonresponse model; Marginal random hot deck imputation; Reverse framework; Two-phase framework; Variance estimation.

1. Introduction

Variance estimation in the presence of imputed data for simple univariate parameters such as population totals and population means has been widely treated in recent years; see, for example, Särndal (1992), Deville and Särndal (1994), Rao and Shao (1992), Rao (1996) and Shao and Steel (1999). In practice, it is often of interest to estimate the ratio of two population totals, R = Y/X, where (Y, X) = Σ_{i∈U} (y_i, x_i), y and x denote two variables of interest that are potentially missing, and U denotes the finite population (of size N) under study. Although variance estimation for a ratio in the presence of imputed data is a problem frequently encountered in practice (especially in business surveys), it has not been, to our knowledge, fully studied in the literature. In this paper, we consider the case of Marginal Random Hot Deck (MRHD) imputation performed within the same set of imputation classes for both variables y and x. In other words, to compensate for nonresponse, random hot deck imputation is performed separately for both variables within the same set of imputation classes. This situation occurs frequently in practice. For simplicity, we consider the case of a single imputation class. Extensions to multiple imputation classes are relatively straightforward for most derivations presented in this paper.

In this paper, we derive variance estimators that take sampling, nonresponse and imputation into account. Two distinct frameworks for variance estimation have been studied in the literature: (i) the customary two-phase framework (e.g., Särndal 1992) and (ii) the reverse framework (e.g., Shao and Steel 1999). In the two-phase framework, nonresponse is viewed as a second phase of selection. That is, a random sample is selected from the population according to a given sampling design; then, given the selected sample, the set of respondents is generated according to the nonresponse mechanism. In the reverse framework, the order of sampling and response is reversed. That is, the population is first randomly divided into a population of respondents and a population of nonrespondents according to the nonresponse mechanism; then, a random sample is selected from the population (containing respondents and nonrespondents) according to the sampling design. As we will see in section 4, the reverse framework facilitates the derivation of variance estimators but, unlike the two-phase framework, it requires the additional assumption that the nonresponse mechanism does not depend on which sample is selected. This assumption is satisfied in many situations encountered in practice. For each framework, inference can be based either on an Imputation Model (IM) or on a Nonresponse Model (NM). The IM approach requires the validity of an imputation model, whereas the NM approach requires the validity of a nonresponse model.

In section 2, we introduce the notation, the assumptions and the imputed estimator of a ratio under weighted MRHD imputation. The IM and NM approaches are then presented in sections 2.1 and 2.2. In section 2.3, the bias of the imputed estimator is discussed. In section 3, variance estimators are derived under the two-phase framework and the IM approach, using the method proposed by Särndal (1992). We show that, under MRHD imputation, the naïve variance estimator (that treats the imputed values as observed values) generally overestimates the sampling variance when y and x are positively correlated. In section 4, we derive variance estimators under the reverse framework and both the IM and NM approaches, using the method proposed by Shao and Steel (1999). Finally, we conclude in section 5.

1. David Haziza, Département de mathématiques et de statistique, Université de Montréal, Montréal (Québec), H3C 3J7, Canada. E-mail: [email protected].

2. Notation and assumptions

Our goal is to estimate $R$. We select a random sample $s$, of size $n$, according to a given sampling design $p(s)$. A complete-data estimator is given by

    \hat{R} = \hat{Y}_{HT} / \hat{X}_{HT},   (2.1)

where $(\hat{Y}_{HT}, \hat{X}_{HT}) = \sum_{i \in s} w_i (y_i, x_i)$ denote the Horvitz-Thompson estimators of $Y$ and $X$, respectively, and $w_i = 1/\pi_i$ denotes the sampling weight of unit $i$, where $\pi_i$ is its probability of inclusion in the sample. The estimator $\hat{R}$ in (2.1) is asymptotically $p$-unbiased for $R$; i.e., $E_p(\hat{R}) \approx R$, where the subscript $p$ denotes the expectation and variance with respect to the sampling design $p(s)$. Since $\hat{R}$ is a nonlinear function of estimated totals, its exact design variance, $V_p(\hat{R})$, cannot be easily obtained. To overcome this problem, Taylor linearization is often applied in order to approximate the exact variance. An asymptotically $p$-unbiased estimator of the approximate variance of $\hat{R}$ is given by

    \hat{V}_{SAM} = \sum_{i \in s} \sum_{j \in s} \hat{\Delta}_{ij} e_i e_j,   (2.2)

where $e_i = (y_i - \hat{R} x_i)/\hat{X}_{HT}$, $\hat{\Delta}_{ij} = (\pi_{ij} - \pi_i \pi_j)/(\pi_{ij} \pi_i \pi_j)$ and $\pi_{ij}$ is the joint selection probability of units $i$ and $j$. Note that $\pi_{ii} = \pi_i$. In the case of simple random sampling without replacement, the variance estimator (2.2) reduces to

    \hat{V}_{SAM} = \Big(1 - \frac{n}{N}\Big) \frac{1}{n \bar{x}^2} \big[ s_y^2 + \hat{R}^2 s_x^2 - 2 \hat{R} s_{xy} \big],   (2.3)

where $s_y^2 = \frac{1}{n-1}\sum_{i\in s}(y_i - \bar{y})^2$, $s_x^2 = \frac{1}{n-1}\sum_{i\in s}(x_i - \bar{x})^2$ and $s_{xy} = \frac{1}{n-1}\sum_{i\in s}(x_i - \bar{x})(y_i - \bar{y})$, with $(\bar{y}, \bar{x}) = \frac{1}{n}\sum_{i\in s}(y_i, x_i)$.

We now turn to the case for which both variables $x$ and $y$ may be missing. Let $a_i$ be the response indicator of unit $i$ such that $a_i = 1$ if unit $i$ responds to variable $y$ and $a_i = 0$ otherwise. Similarly, let $b_i$ be the response indicator of unit $i$ such that $b_i = 1$ if unit $i$ responds to variable $x$ and $b_i = 0$ otherwise. Let $s_r^{(y)}$ be the set of respondents to variable $y$, of size $r_y$, and $s_r^{(x)}$ be the set of respondents to variable $x$, of size $r_x$. Also, let $r_{xy}$ be the number of respondents to both variables $y$ and $x$. Finally, let $y_i^*$ and $x_i^*$ denote the imputed values used to replace the missing values $y_i$ and $x_i$, respectively. An imputed estimator of $R$ is given by

    \hat{R}_I = \frac{\sum_{i \in s} w_i \tilde{y}_i}{\sum_{i \in s} w_i \tilde{x}_i},   (2.4)

where $\tilde{y}_i = a_i y_i + (1 - a_i) y_i^*$ and $\tilde{x}_i = b_i x_i + (1 - b_i) x_i^*$. Under weighted MRHD imputation, to compensate for the missing value $y_i$, a donor $j$ is selected at random with replacement from $s_r^{(y)}$ so that

    P(y_i^* = y_j) = \frac{w_j}{\sum_{l \in s} w_l a_l}.

Similarly, to compensate for the missing value $x_i$, a donor $k$ is selected at random with replacement from $s_r^{(x)}$ so that

    P(x_i^* = x_k) = \frac{w_k}{\sum_{l \in s} w_l b_l}.

Note that, when both $y_i$ and $x_i$ are missing, $j$ is generally not equal to $k$ under weighted MRHD imputation.

Random hot-deck imputation within classes is widely used in practice because (i) it preserves the variability of the original data and (ii) it leads to plausible values. The latter property is particularly important in the case of categorical variables of interest. However, random hot-deck imputation within classes suffers from an additional component of variance due to the use of a random imputation mechanism. The main reason weighted MRHD imputation is used is that, unlike unweighted MRHD imputation, it leads to an asymptotically unbiased estimator under the nonresponse model approach (see section 2.1).

Let $E_I(\cdot \mid s, s_r^{(y)}, s_r^{(x)})$, $V_I(\cdot \mid s, s_r^{(y)}, s_r^{(x)})$ and $\mathrm{Cov}_I(\cdot, \cdot \mid s, s_r^{(y)}, s_r^{(x)})$ denote the conditional expectation, variance and covariance operators with respect to the random imputation mechanism (here, weighted MRHD imputation). Using a first-order Taylor expansion, it can be shown that

    E_I(\hat{R}_I \mid s, s_r^{(y)}, s_r^{(x)}) \approx \bar{y}_r / \bar{x}_r \equiv \hat{R}_r,   (2.5)

where

    \bar{y}_r = \frac{\sum_{i \in s} w_i a_i y_i}{\sum_{i \in s} w_i a_i}  and  \bar{x}_r = \frac{\sum_{i \in s} w_i b_i x_i}{\sum_{i \in s} w_i b_i}

denote the weighted means of the respondents to variables $y$ and $x$, respectively.
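The imputation mechanism and the imputed estimator (2.4) can be sketched in a few lines of code. The sketch below is illustrative only (a single imputation class, hypothetical toy data); donors for $y$ and $x$ are drawn independently and with replacement, with probability proportional to their weights, as under weighted MRHD imputation.

```python
import random

def mrhd_impute(y, x, a, b, w, rng):
    """Weighted marginal random hot-deck imputation (single class).

    y, x : value lists; entries with a[i] = 0 (resp. b[i] = 0) are missing
    a, b : response indicators for y and x
    w    : sampling weights
    A donor is drawn with replacement, with probability proportional to
    its weight, among the respondents to each variable separately, so the
    y-donor and the x-donor of a doubly missing unit generally differ.
    """
    y_resp = [i for i in range(len(y)) if a[i] == 1]
    x_resp = [i for i in range(len(x)) if b[i] == 1]
    y_tilde, x_tilde = list(y), list(x)
    for i in range(len(y)):
        if a[i] == 0:
            j = rng.choices(y_resp, weights=[w[t] for t in y_resp])[0]
            y_tilde[i] = y[j]
        if b[i] == 0:
            k = rng.choices(x_resp, weights=[w[t] for t in x_resp])[0]
            x_tilde[i] = x[k]
    return y_tilde, x_tilde

def imputed_ratio(y_tilde, x_tilde, w):
    """Imputed estimator of the ratio R = Y/X, as in (2.4)."""
    return (sum(wi * yi for wi, yi in zip(w, y_tilde))
            / sum(wi * xi for wi, xi in zip(w, x_tilde)))
```

Repeating the imputation step many times and averaging the resulting $\hat{R}_I$ values would approximate the conditional expectation (2.5).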

Statistics Canada, Catalogue No. 12­001­X Survey Methodology, December 2007 161

The approximation in (2.5) will be valid if the sample size within classes is sufficiently large, which we assume to be the case. Now, let

    s_{yr}^2 = \frac{1}{\sum_{i\in s} w_i a_i} \sum_{i\in s} w_i a_i (y_i - \bar{y}_r)^2  and  s_{xr}^2 = \frac{1}{\sum_{i\in s} w_i b_i} \sum_{i\in s} w_i b_i (x_i - \bar{x}_r)^2

denote the variability of the $y$-values and the $x$-values in the sets of respondents $s_r^{(y)}$ and $s_r^{(x)}$, respectively. Noting that, under weighted MRHD imputation,

    V_I(y_i^*) = s_{yr}^2,  V_I(x_i^*) = s_{xr}^2  and  Cov_I(y_i^*, x_i^*) = 0,

we can approximate $V_I(\hat{R}_I \mid s, s_r^{(y)}, s_r^{(x)})$ by

    V_I(\hat{R}_I \mid s, s_r^{(y)}, s_r^{(x)}) \approx \frac{1}{(\hat{N}\bar{x}_r)^2} \Big[ \sum_{i\in s} w_i^2 (1-a_i)\, s_{yr}^2 + \hat{R}_r^2 \sum_{i\in s} w_i^2 (1-b_i)\, s_{xr}^2 \Big],   (2.6)

where $\hat{N} = \sum_{i\in s} w_i$. Expressions (2.5) and (2.6) will be useful in subsequent sections when discussing the bias and the variance of the imputed estimator $\hat{R}_I$. As we will see in sections 3 and 4, the conditional variance (2.6) is a measure of the variability due to the imputation mechanism.

Next, we describe two approaches to inference that will be used to obtain variance estimators in sections 3 and 4: the Nonresponse Model (NM) approach and the Imputation Model (IM) approach.

2.1 The nonresponse model approach

In the NM approach, inference is made with respect to the joint distribution induced by the sampling design and the nonresponse model. The nonresponse model is a set of assumptions about the unknown distribution of the response indicators $R_s = \{(a_i, b_i); i \in s\}$. This unknown distribution is often called the nonresponse mechanism. Let $p_{yi} = P(a_i = 1 \mid s, Z_s)$ be the response probability of unit $i$ to variable $y$, where $Z_s = \{z_i; i \in s\}$ and $z_i$ is a vector of auxiliary variables, available for all sample units, used to form the imputation classes. Similarly, let $p_{xi} = P(b_i = 1 \mid s, Z_s)$ be the response probability of unit $i$ to variable $x$. We assume that units respond independently; i.e., $p_{yij} = P(a_i = 1, a_j = 1 \mid s, Z_s) = p_{yi} p_{yj}$ for $i \neq j$ and $p_{xij} = P(b_i = 1, b_j = 1 \mid s, Z_s) = p_{xi} p_{xj}$ for $i \neq j$. However, we do not assume that, for a given unit $i$, response to variable $y$ is independent of response to variable $x$. In other words, if we let $p_{xyi} = P(a_i = 1, b_i = 1 \mid s, Z_s)$, then we have $p_{xyi} \neq p_{xi} p_{yi}$, in general. Within an imputation class, we assume a uniform response mechanism such that $p_{yi} = p_y$, $p_{xi} = p_x$ and $p_{xyi} = p_{xy}$.

We also assume that, after conditioning on $s$ and $Z_s$, the nonresponse mechanism is independent of all other variables involved in the imputed estimator (2.4) as well as the joint selection probabilities. In other words, the distribution of $R_s$ does not depend on $Y_s = \{y_i; i \in s\}$, $W_s = \{w_i; i \in s\}$ and $\Pi_s = \{\pi_{ij}; i \in s, j \in s\}$, after conditioning on $s$ and $Z_s$. As a result, except for the response indicators $a_i$ and $b_i$, all the variables involved in the imputed estimator (2.4) as well as the joint selection probabilities are treated as fixed when taking expectations and variances with respect to the nonresponse model. From this point on, we use the subscript $q$ to denote the expectation and variance with respect to the nonresponse mechanism.

2.2 The imputation model approach

In the IM approach, inference is made with respect to the joint distribution induced by the imputation model, the sampling design and the nonresponse model. The imputation model is a set of assumptions about the unknown distribution of $(Y_U, X_U) = \{(y_i, x_i); i \in U\}$. Within an imputation class, the imputation model, $m$, in the case of MRHD imputation, is given by

    m:  y_i = \mu_y + \varepsilon_i,  x_i = \mu_x + \eta_i,   (2.7)

where $\varepsilon_i$ is a random error term such that $E_m(\varepsilon_i) = 0$, $E_m(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$ and $V_m(\varepsilon_i) = \sigma_\varepsilon^2$, and $\eta_i$ is a random error term such that $E_m(\eta_i) = 0$, $E_m(\eta_i \eta_j) = 0$ for $i \neq j$ and $V_m(\eta_i) = \sigma_\eta^2$. Furthermore, we assume that $E_m(\varepsilon_i \eta_i) = \sigma_{\varepsilon\eta}$. Here, $E_m(\cdot)$, $V_m(\cdot)$ and $\mathrm{Cov}_m(\cdot)$ denote respectively the expectation, variance and covariance operators with respect to model $m$. It is implicit in the notation that expectations or variances with respect to model $m$ are conditional on $Z_U = \{z_i; i \in U\}$. In this approach, we assume that the distribution of the model errors $(\varepsilon_U, \eta_U) = \{(\varepsilon_i, \eta_i); i \in U\}$ does not depend on $s$, $s_r^{(y)}$, $s_r^{(x)}$, $W_U = \{w_i; i \in U\}$ and $\Pi_U = \{\pi_{ij}; i \in U, j \in U\}$, after conditioning on $Z_U$. As a result, except for the variables of interest $y$ and $x$, all variables involved in the imputed estimator (2.4) are treated as fixed when taking expectations and variances with respect to the imputation model.
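As a concrete illustration of the nonresponse mechanism of section 2.1, the short sketch below draws pairs $(a_i, b_i)$ whose marginal response probabilities are $p_y$ and $p_x$ but whose joint probability $p_{xy}$ differs from $p_x p_y$. The probabilities used are hypothetical and not taken from the paper.

```python
import random

def draw_indicators(n, p_y, p_x, p_xy, rng):
    """Draw response-indicator pairs (a_i, b_i) with marginal response
    probabilities p_y = P(a_i = 1) and p_x = P(b_i = 1), and joint
    probability p_xy = P(a_i = 1, b_i = 1), which need not equal p_x p_y.
    The four cell probabilities must be non-negative for the inputs chosen.
    """
    cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
    probs = [p_xy, p_y - p_xy, p_x - p_xy, 1.0 - p_y - p_x + p_xy]
    draws = rng.choices(cells, weights=probs, k=n)
    return [d[0] for d in draws], [d[1] for d in draws]
```

With, say, $p_y = 0.8$, $p_x = 0.7$ and $p_{xy} = 0.65 > p_x p_y = 0.56$, response to the two items is positively dependent, as allowed by the NM approach.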


2.3 Bias of the imputed estimator

To study the bias of the imputed estimator (2.4), we use the standard decomposition of the total error of $\hat R_I$:

    \hat R_I - R = (\hat R - R) + \{E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R\} + \{\hat R_I - E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)})\}.   (2.8)

The first term $\hat R - R$ on the right-hand side of (2.8) is called the sampling error, the second term $E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R$ is called the nonresponse error, whereas the third term $\hat R_I - E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)})$ is called the imputation error.

Using a first-order Taylor expansion, it can easily be shown that, under the NM approach, the imputed estimator (2.4) is asymptotically $pqI$-unbiased; that is, $E_{pqI}(\hat R_I - R) \approx 0$. Also, under the IM approach and model (2.7), it can be shown that the imputed estimator (2.4) is asymptotically $mpqI$-unbiased; that is, $E_{mpqI}(\hat R_I - R) \approx 0$. Thus, the imputed estimator is robust in the sense that it is valid under either the NM approach or the IM approach. Note that for the asymptotic bias to be equal to 0 under both approaches, we require that the sample size within each imputation class is sufficiently large. From this point on, we thus assume that the bias of $\hat R_I$ is negligible.

3. Variance estimation: The two-phase framework

In this section, we derive variance estimators under the two-phase framework and the IM approach according to the method proposed by Särndal (1992) and Deville and Särndal (1994). Using the decomposition (2.8), the total variance of $\hat R_I$ can be approximated by

    V_{mpqI}(\hat R_I - R) \approx E_{mpqI}(\hat R_I - R)^2 = V_{SAM} + V_{NR} + V_I + 2 V_{MIX},   (3.1)

where $V_{SAM} = E_m V_p(\hat R)$ is the sampling variance of the complete-data estimator $\hat R$, $V_{NR} = E_{pq} V_m(E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R \mid s, s_r^{(y)}, s_r^{(x)})$ is the nonresponse variance of the imputed estimator $\hat R_I$, $V_I = E_{mpq} V_I(\hat R_I - E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) \mid s, s_r^{(y)}, s_r^{(x)})$ is the imputation variance of the imputed estimator $\hat R_I$, and $V_{MIX} = E_{mpq}[\{E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R\}(\hat R - R)]$ is a mixed component. Note that the expression (3.1) contains only one cross-product term, $2 V_{MIX}$, because the other cross-product terms are all asymptotically equal to 0.

3.1 Estimation of the sampling variance $V_{SAM}$

Let $\hat V_{ORD}$ be the naive variance estimator of $\hat R_I$; i.e., the variance estimator obtained by treating the imputed values as observed values. The variance estimator $\hat V_{ORD}$ is thus obtained by replacing $e_i$ by $\tilde e_i = (\tilde y_i - \hat R_I \tilde x_i)/\hat X_I$ in (2.2), where $\hat X_I = \sum_{i\in s} w_i \tilde x_i$, which leads to

    \hat V_{ORD} = \sum_{i\in s}\sum_{j\in s} \hat\Delta_{ij} \tilde e_i \tilde e_j.   (3.2)

As we now show in the case of simple random sampling without replacement, $\hat V_{ORD}$ overestimates $V_{SAM}$ under MRHD imputation whenever $\sigma_{\varepsilon\eta} > 0$ (as is usually the case in practice). After some algebra, we obtain

    E_{mI}(\hat V_{ORD} - \hat V_{SAM} \mid s, s_r^{(y)}, s_r^{(x)}) \approx 2 \frac{\mu_y}{\mu_x^2} \Big(1 - \frac{n}{N}\Big) \Big(1 - \frac{r_{xy}}{n}\Big) \frac{\sigma_{\varepsilon\eta}}{n}.   (3.3)

Expression (3.3) shows that $\hat V_{ORD}$ is $mpqI$-biased for $\hat V_{SAM}$ unless $\sigma_{\varepsilon\eta} = 0$, $r_{xy} = n$ (which is the case of complete data) or $n = N$ (which is the census case). The fact that $\hat V_{ORD}$ is not a valid estimator of $V_{SAM}$ can be easily explained by noting that although MRHD imputation preserves the variability, $s_x^2$ and $s_y^2$, corresponding to variables $x$ and $y$, it does not preserve the covariance, $s_{xy}$, in (2.3). Indeed, marginal imputation tends to attenuate the relationship between positively correlated variables. As a result, $\hat V_{ORD}$ overestimates $V_{SAM}$ because of the presence of the minus sign in front of $s_{xy}$ in (2.3). To overcome this difficulty, Särndal (1992) proposed to estimate $V_{DIF} = E_{mI}(\hat V_{SAM} - \hat V_{ORD} \mid s, s_r^{(y)}, s_r^{(x)})$ by an $mI$-unbiased estimator $\hat V_{DIF}$; i.e., $E_{mI}(\hat V_{DIF} \mid s, s_r^{(y)}, s_r^{(x)}) = V_{DIF}$. However, the derivation of this component for an arbitrary design involves very tedious algebra in the case of a ratio. Therefore, we propose an alternative that does not require any derivation but involves the construction of a new set of imputed values. It can be described as follows: whenever $a_i = 0$ and/or $b_i = 0$, select a donor $j$ at random with replacement from the set of respondents to both variables $y$ and $x$ (i.e., the set of sampled units for which $a_i = 1$ and $b_i = 1$) with probability $w_j / \sum_{l\in s} w_l a_l b_l$ and impute the vector $(x_j, y_j)$. In other words, whenever one variable is missing, the observed value is discarded and set to missing; the missing values are then replaced by the values of a donor selected at random among the set of respondents to both variables $x$ and $y$ (often called the set of common donors). Similarly, when both variables are missing, the vector $(x_j, y_j)$ of a donor $j$ is imputed. Then, use the standard variance estimator (2.2), valid in the complete-response case, with these imputed values.
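Under simple random sampling without replacement with equal weights, the construction just described can be sketched as follows: a common-donor pair replaces both values of any unit with at least one missing item, and the complete-data formula (2.3) is then applied to the resulting file. The sketch is illustrative only (single imputation class, equal weights).

```python
import random

def common_donor_file(y, x, a, b, rng):
    """Build the variance-estimation file of section 3.1: whenever
    a_i = 0 and/or b_i = 0, discard both values of unit i and impute the
    pair (y_j, x_j) of a common donor (a_j = b_j = 1), drawn at random
    with replacement. Equal weights are assumed, so donor selection
    probabilities are uniform over the common donors."""
    donors = [i for i in range(len(y)) if a[i] == 1 and b[i] == 1]
    y2, x2 = list(y), list(x)
    for i in range(len(y)):
        if a[i] == 0 or b[i] == 0:
            j = rng.choice(donors)
            y2[i], x2[i] = y[j], x[j]
    return y2, x2

def v_sam_srswor(y, x, N):
    """Complete-data variance estimator (2.3) under SRSWOR."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    R = ybar / xbar
    s2y = sum((v - ybar) ** 2 for v in y) / (n - 1)
    s2x = sum((v - xbar) ** 2 for v in x) / (n - 1)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / (n - 1)
    return (1 - n / N) * (s2y + R * R * s2x - 2 * R * sxy) / (n * xbar ** 2)
```

Applying `v_sam_srswor` to the common-donor file gives the sampling-variance estimate; the file is used only for variance estimation, not for estimating $R$.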

Statistics Canada, Catalogue No. 12­001­X Survey Methodology, December 2007 163

Let $\hat V_{ORD}^{*}$ denote the resulting variance estimator. Note that this new set of imputed values is used only to obtain a valid estimator of the sampling variance and is not used to estimate the parameter of interest $R$. It can be shown that $\hat V_{ORD}^{*}$ is an asymptotically $mpqI$-unbiased estimator of $V_{SAM}$. In practice, one could, for example, create a variance estimation file containing the new set of imputed values and use standard variance estimation systems (used in the complete-data case) to obtain an estimate of the sampling variance.

3.2 Estimation of the nonresponse variance $V_{NR}$

An estimator $\hat V_{NR}$ of $V_{NR} = E_{pq} V_m(E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R \mid s, s_r^{(y)}, s_r^{(x)})$ can simply be obtained by estimating $V_m(E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R \mid s, s_r^{(y)}, s_r^{(x)})$. Using a first-order Taylor expansion, we obtain

    V_{NR} \approx \frac{1}{\mu_x^2} \Bigg[ \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N_a^2} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - 2 \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} \Bigg) \sigma_\varepsilon^2
    + \Bigg( \frac{\sum_{i\in s} w_i^2 b_i}{\hat N_b^2} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - 2 \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} \Bigg) \frac{\mu_y^2}{\mu_x^2} \sigma_\eta^2
    - 2 \Bigg( \frac{\sum_{i\in s} w_i^2 a_i b_i}{\hat N_a \hat N_b} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} - \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} \Bigg) \frac{\mu_y}{\mu_x} \sigma_{\varepsilon\eta} \Bigg],   (3.4)

where $(\hat N, \hat N_a, \hat N_b) = \sum_{i\in s} w_i (1, a_i, b_i)$. Now, let $s_{Ix}^2 = \hat N^{-1} \sum_{i\in s} w_i (\tilde x_i - \bar x_I)^2$ and $s_{Iy}^2 = \hat N^{-1} \sum_{i\in s} w_i (\tilde y_i - \bar y_I)^2$, with $(\bar x_I, \bar y_I) = \hat N^{-1} \sum_{i\in s} w_i (\tilde x_i, \tilde y_i)$. Note that $s_{Ix}^2$ and $s_{Iy}^2$ denote respectively the sample variability of the $x$-values and the $y$-values after imputation. It can be shown that $s_{Ix}^2$ and $s_{Iy}^2$ are asymptotically $mI$-unbiased for the model variances $\sigma_\eta^2$ and $\sigma_\varepsilon^2$, respectively. Also, let $s_{xyr} = \hat N_{ab}^{-1} \sum_{i\in s} w_i a_i b_i (x_i - \bar x_{rr})(y_i - \bar y_{rr})$, where $\hat N_{ab} = \sum_{i\in s} w_i a_i b_i$ and $(\bar x_{rr}, \bar y_{rr}) = \hat N_{ab}^{-1} \sum_{i\in s} w_i a_i b_i (x_i, y_i)$. Note that $s_{xyr}$ is $m$-unbiased for the model covariance $\sigma_{\varepsilon\eta}$. It follows that $\hat V_{NR}$ is obtained by estimating the unknown quantities in (3.4), which leads to

    \hat V_{NR} = \frac{1}{\bar x_I^2} \Bigg[ \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N_a^2} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - 2 \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} \Bigg) s_{Iy}^2
    + \Bigg( \frac{\sum_{i\in s} w_i^2 b_i}{\hat N_b^2} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - 2 \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} \Bigg) \hat R_I^2 s_{Ix}^2
    - 2 \Bigg( \frac{\sum_{i\in s} w_i^2 a_i b_i}{\hat N_a \hat N_b} + \frac{\sum_{i\in s} w_i^2}{\hat N^2} - \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} - \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} \Bigg) \hat R_I s_{xyr} \Bigg].   (3.5)

The estimator (3.5) is asymptotically $mpqI$-unbiased for $V_{NR}$. In the special case of simple random sampling without replacement, expression (3.5) reduces to

    \hat V_{NR} = \frac{1}{\bar x_I^2} \Big[ \Big( \frac{1}{r_y} - \frac{1}{n} \Big) s_{Iy}^2 + \hat R_I^2 \Big( \frac{1}{r_x} - \frac{1}{n} \Big) s_{Ix}^2 - 2 \hat R_I \Big( \frac{r_{xy}}{r_x r_y} - \frac{1}{n} \Big) s_{xyr} \Big].

3.3 Estimation of the imputation variance $V_I$

An estimator $\hat V_I$ of $V_I = E_{mpq} V_I(\hat R_I - E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) \mid s, s_r^{(y)}, s_r^{(x)})$ can simply be obtained by estimating $V_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)})$ given by (2.6). An asymptotically $I$-unbiased estimator of $V_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)})$ is then given by

    \hat V_I = \frac{1}{(\hat N \bar x_I)^2} \Big[ \sum_{i\in s} w_i^2 (1 - a_i)\, s_{Iy}^2 + \hat R_I^2 \sum_{i\in s} w_i^2 (1 - b_i)\, s_{Ix}^2 \Big].   (3.6)

It follows that $\hat V_I$ in (3.6) is asymptotically $mpqI$-unbiased for $V_I$. In the special case of simple random sampling without replacement, expression (3.6) reduces to

    \hat V_I = \frac{1}{n \bar x_I^2} \Big[ \Big( 1 - \frac{r_y}{n} \Big) s_{Iy}^2 + \hat R_I^2 \Big( 1 - \frac{r_x}{n} \Big) s_{Ix}^2 \Big].
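The two SRSWOR reductions just given can be transcribed directly. The sketch below is illustrative only: it takes the summary statistics (respondent counts, post-imputation variances $s_{Iy}^2$ and $s_{Ix}^2$, respondent covariance $s_{xyr}$, $\hat R_I$ and $\bar x_I$) as hypothetical inputs, with a single imputation class.

```python
def v_nr_srswor(n, r_y, r_x, r_xy, s2_Iy, s2_Ix, s_xyr, R_I, xbar_I):
    """Nonresponse-variance estimator: SRSWOR reduction of (3.5)."""
    term_y = (1 / r_y - 1 / n) * s2_Iy
    term_x = R_I ** 2 * (1 / r_x - 1 / n) * s2_Ix
    term_xy = (r_xy / (r_x * r_y) - 1 / n) * R_I * s_xyr
    return (term_y + term_x - 2 * term_xy) / xbar_I ** 2

def v_i_srswor(n, r_y, r_x, s2_Iy, s2_Ix, R_I, xbar_I):
    """Imputation-variance estimator: SRSWOR reduction of (3.6)."""
    return ((1 - r_y / n) * s2_Iy
            + R_I ** 2 * (1 - r_x / n) * s2_Ix) / (n * xbar_I ** 2)
```

Both estimators vanish under complete response ($r_y = r_x = r_{xy} = n$), as expected: with no nonresponse there is neither a nonresponse nor an imputation component.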
3.4 Estimation of the mixed component $V_{MIX}$

Finally, we obtain an estimator $\hat V_{MIX}$ of $V_{MIX}$ by estimating


    E_m[\{E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R\}(\hat R - R) \mid s, s_r^{(y)}, s_r^{(x)}].

Using a first-order Taylor expansion, we obtain

    E_m[\{E_I(\hat R_I \mid s, s_r^{(y)}, s_r^{(x)}) - \hat R\}(\hat R - R) \mid s, s_r^{(y)}, s_r^{(x)}]
    \approx \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} - \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) \frac{\sigma_\varepsilon^2}{\mu_x^2}
    + \Bigg( \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} - \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) \frac{\mu_y^2}{\mu_x^4} \sigma_\eta^2
    - \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} + \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} - 2 \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) \frac{\mu_y}{\mu_x^3} \sigma_{\varepsilon\eta}.   (3.7)

An estimator of (3.7) is thus given by

    \hat V_{MIX} = \frac{1}{\bar x_I^2} \Bigg[ \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} - \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) s_{Iy}^2
    + \Bigg( \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} - \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) \hat R_I^2 s_{Ix}^2
    - \Bigg( \frac{\sum_{i\in s} w_i^2 a_i}{\hat N \hat N_a} + \frac{\sum_{i\in s} w_i^2 b_i}{\hat N \hat N_b} - 2 \frac{\sum_{i\in s} w_i^2}{\hat N^2} \Bigg) \hat R_I s_{xyr} \Bigg].   (3.8)

The estimator (3.8) is asymptotically $mpqI$-unbiased for $V_{MIX}$. In the case of simple random sampling without replacement, the component $\hat V_{MIX}$ is equal to zero. More generally, the component $\hat V_{MIX}$ is equal to zero for any unistage self-weighting design (i.e., a sampling design for which the sampling weights are all equal). For unequal probability designs, it is important to include the component $\hat V_{MIX}$ because its contribution (positive or negative) to the overall variance could be substantial (Brick, Kalton and Kim 2004).

Finally, an asymptotically $mpqI$-unbiased estimator of the total variance $V_{TOT} = V_{mpqI}(\hat R_I - R)$ is thus given by

    \hat V_{TOT}^{(TP)} = \hat V_{ORD}^{*} + \hat V_{NR} + \hat V_I + 2 \hat V_{MIX}.

4. Variance estimation: The reverse framework

In this section, we derive variance estimators under the reverse framework and both the NM and the IM approaches according to the method proposed by Shao and Steel (1999). Recall that, under this framework, we require the additional assumption that the response probabilities do not depend on the sample $s$. Under the NM approach, the total variance of $\hat R_I$ can be approximated by

    V(\hat R_I - R) \approx E_q V_p E_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) + E_{pq} V_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) + V_q E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b}),   (4.1)

where $\mathbf{a} = (a_1, \ldots, a_N)'$ and $\mathbf{b} = (b_1, \ldots, b_N)'$ denote the vectors of response indicators to variables $y$ and $x$, respectively.

Under the IM approach, the total variance of $\hat R_I$ can be approximated by

    V(\hat R_I - R) \approx E_{mq} V_p E_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) + E_{mpq} V_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) + E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b}).   (4.2)

Under both the NM and the IM approaches, an estimator of the first term on the right-hand side of (4.1) and (4.2) can be obtained by finding an asymptotically $pI$-unbiased estimator of $V_p E_I(\hat R_I \mid \mathbf{a}, \mathbf{b})$. Also, the second term on the right-hand side of (4.1) and (4.2) can be estimated by $\hat V_I$ given by (3.6). Under the NM approach, an estimator of the last term on the right-hand side of (4.1) can be obtained by estimating $V_q E_{pI}(\hat R_I \mid \mathbf{a}, \mathbf{b})$, whereas an estimator of the last term on the right-hand side of (4.2) can be obtained by estimating $E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ under the IM approach. As a result, the estimators of the first two terms in (4.1) and (4.2) are identical and thus are valid regardless of the approach (NM or IM) used for inference. Only the third term on the right-hand side of (4.1) and (4.2) depends on the approach used: in the case of the IM approach, correct specification and validation of the imputation model are crucial to achieve asymptotic unbiasedness of the third component, whereas in the case of the NM approach, the asymptotic unbiasedness of the third component relies on the correct specification of the nonresponse model.

4.1 Estimation of $V_p E_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$

Using a first-order Taylor expansion and expression (2.5), an estimator of $V_p E_I(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$, denoted by $\hat V_1$, is given by

    \hat V_1 = \sum_{i\in s}\sum_{j\in s} \hat\Delta_{ij} \xi_i \xi_j,   (4.3)

where

    \xi_i = \frac{1}{\bar x_r} \Big[ \frac{1}{\hat N_a} a_i (y_i - \bar y_r) - \hat R_r \frac{1}{\hat N_b} b_i (x_i - \bar x_r) \Big].
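A direct transcription of the linearized values $\xi_i$ and of (4.3) under SRSWOR can be sketched as follows. It is illustrative only: a single imputation class is assumed, and missing values must be numerically coded (e.g., as 0), since they are annihilated by the zero response indicator.

```python
def xi_values(y, x, a, b, w):
    """Linearized values xi_i entering (4.3) (single imputation class).

    Missing y- or x-values must be coded numerically; the corresponding
    indicator a_i = 0 (or b_i = 0) removes them from the computation."""
    Na = sum(wi * ai for wi, ai in zip(w, a))
    Nb = sum(wi * bi for wi, bi in zip(w, b))
    ybar_r = sum(wi * ai * yi for wi, ai, yi in zip(w, a, y)) / Na
    xbar_r = sum(wi * bi * xi for wi, bi, xi in zip(w, b, x)) / Nb
    R_r = ybar_r / xbar_r
    return [(ai * (yi - ybar_r) / Na - R_r * bi * (xi - xbar_r) / Nb) / xbar_r
            for ai, bi, yi, xi in zip(a, b, y, x)]

def v1_srswor_from_micro(xi, N):
    """(4.3) under SRSWOR: N^2 (1 - n/N) s_xi^2 / n, where s_xi^2 is the
    sample variance of the linearized values xi_i."""
    n = len(xi)
    xbar = sum(xi) / n
    s2 = sum((v - xbar) ** 2 for v in xi) / (n - 1)
    return N * N * (1 - n / N) * s2 / n
```

When $y$ is exactly proportional to $x$ among respondents, all $\xi_i$ vanish and the estimated sampling component is zero, as with the complete-data estimator (2.3).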


In other words, the estimator $\hat V_1$ is obtained from the complete-data variance estimator (2.2) by replacing $e_i$ by $\xi_i$. In the case of simple random sampling without replacement, the estimator (4.3) reduces to

    \hat V_1 = \Big(1 - \frac{n}{N}\Big) \frac{1}{\bar x_r^2} \Big[ \frac{s_{yr}^2}{r_y} + \hat R_r^2 \frac{s_{xr}^2}{r_x} - 2 \hat R_r \frac{r_{xy}}{r_x r_y} s_{xyr} \Big].   (4.4)

4.2 Estimation of $V_q E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ under the NM approach

First, note that

    E_{pI}(\hat R_I) \approx \frac{Y_a / N_a}{X_b / N_b},

where $(Y_a, N_a) = \sum_{i\in U} a_i (y_i, 1)$ and $(X_b, N_b) = \sum_{i\in U} b_i (x_i, 1)$. Using a first-order Taylor expansion, it can be shown that $V_q E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ can be approximated by

    V_q E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) \approx \frac{1}{N \bar X^2} \Big[ \frac{1 - p_y}{p_y} S_y^2 + R^2 \frac{1 - p_x}{p_x} S_x^2 - 2 R \frac{p_{xy} - p_x p_y}{p_x p_y} S_{xy} \Big],   (4.5)

where $S_y^2 = \frac{1}{N-1}\sum_{i\in U}(y_i - \bar Y)^2$, $S_x^2 = \frac{1}{N-1}\sum_{i\in U}(x_i - \bar X)^2$ and $S_{xy} = \frac{1}{N-1}\sum_{i\in U}(x_i - \bar X)(y_i - \bar Y)$, with $(\bar Y, \bar X) = \frac{1}{N}\sum_{i\in U}(y_i, x_i)$.

An estimator of $V_q E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ is obtained by estimating the unknown quantities in (4.5), which leads to

    \hat V_2^{(NM)} = \frac{1}{\hat N \bar x_I^2} \Big[ \frac{1 - \hat p_y}{\hat p_y} s_{Iy}^2 + \hat R_I^2 \frac{1 - \hat p_x}{\hat p_x} s_{Ix}^2 - 2 \hat R_I \frac{\hat p_{xy} - \hat p_x \hat p_y}{\hat p_x \hat p_y} s_{xyr} \Big]
            = \frac{1}{\bar x_I^2} \Big[ \Big( \frac{1}{\hat N_a} - \frac{1}{\hat N} \Big) s_{Iy}^2 + \hat R_I^2 \Big( \frac{1}{\hat N_b} - \frac{1}{\hat N} \Big) s_{Ix}^2 - 2 \hat R_I \Big( \frac{\hat N \hat N_{ab}}{\hat N_a \hat N_b} - 1 \Big) \frac{1}{\hat N} s_{xyr} \Big],   (4.6)

where $\hat p_y = \hat N_a / \hat N$, $\hat p_x = \hat N_b / \hat N$ and $\hat p_{xy} = \hat N_{ab} / \hat N$. The estimator (4.6) is asymptotically $pqI$-unbiased for the approximate variance (4.5), noting that $s_{Iy}^2$, $s_{Ix}^2$ and $s_{xyr}$ are asymptotically $pqI$-unbiased for $S_y^2$, $S_x^2$ and $S_{xy}$, respectively. In the case of simple random sampling without replacement, the estimator (4.6) reduces to

    \hat V_2^{(NM)} = \frac{1}{N \bar x_I^2} \Big[ \Big( \frac{n}{r_y} - 1 \Big) s_{Iy}^2 + \hat R_I^2 \Big( \frac{n}{r_x} - 1 \Big) s_{Ix}^2 - 2 \hat R_I \Big( \frac{n\, r_{xy}}{r_x r_y} - 1 \Big) s_{xyr} \Big].   (4.7)

4.3 Estimation of $E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ under the IM approach

Using a first-order Taylor expansion, it can be shown that $E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ can be approximated by

    E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b}) \approx \frac{1}{\mu_x^2} \Big( \frac{1}{E_q(N_a)} - \frac{1}{N} \Big) \sigma_\varepsilon^2 + \frac{\mu_y^2}{\mu_x^4} \Big( \frac{1}{E_q(N_b)} - \frac{1}{N} \Big) \sigma_\eta^2 - 2 \frac{\mu_y}{\mu_x^3} \Big[ E_q\Big( \frac{N_{ab}}{N_a N_b} \Big) - \frac{1}{N} \Big] \sigma_{\varepsilon\eta},   (4.8)

where $N_{ab} = \sum_{i\in U} a_i b_i$. An estimator of $E_q V_m E_{pI}(\hat R_I - R \mid \mathbf{a}, \mathbf{b})$ is obtained by estimating the unknown quantities in (4.8), which leads to

    \hat V_2^{(IM)} = \frac{1}{\bar x_I^2} \Big[ \Big( \frac{1}{\hat N_a} - \frac{1}{\hat N} \Big) s_{Iy}^2 + \hat R_I^2 \Big( \frac{1}{\hat N_b} - \frac{1}{\hat N} \Big) s_{Ix}^2 - 2 \hat R_I \Big( \frac{\hat N \hat N_{ab}}{\hat N_a \hat N_b} - 1 \Big) \frac{1}{\hat N} s_{xyr} \Big].   (4.9)

The estimator (4.9) is asymptotically $mpqI$-unbiased for the approximate variance (4.8). It is interesting to note that, under weighted MRHD imputation, the estimator $\hat V_2^{(NM)}$ in (4.6) obtained under the NM approach is identical to $\hat V_2^{(IM)}$ in (4.9) obtained under the IM approach. However, this may not be the case with a different imputation method. Also, the component $\hat V_2$ is negligible with respect to $\hat V_1$ when the sampling fraction $n/N$ is negligible, where $\hat V_2$ stands for $\hat V_2^{(NM)}$ or $\hat V_2^{(IM)}$. In this case, the component $\hat V_2$ may be omitted from the calculations.

Finally, an estimator of the total variance under the reverse framework is given by

    \hat V_{TOT}^{(RE)} = \hat V_1 + \hat V_I + \hat V_2.
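Under SRSWOR, the reverse-framework total assembles from the closed forms (4.4), the SRSWOR reduction of (3.6), and (4.7). The sketch below transcribes (4.7) and the final sum, again with hypothetical summary statistics as inputs (single imputation class).

```python
def v2_srswor(n, N, r_y, r_x, r_xy, s2_Iy, s2_Ix, s_xyr, R_I, xbar_I):
    """SRSWOR reduction (4.7) of the component V_2 (identical under the
    NM and IM approaches for weighted MRHD imputation)."""
    return ((n / r_y - 1) * s2_Iy
            + R_I ** 2 * (n / r_x - 1) * s2_Ix
            - 2 * R_I * (n * r_xy / (r_x * r_y) - 1) * s_xyr) / (N * xbar_I ** 2)

def v_tot_reverse(v1, v_i, v2):
    """Total-variance estimator under the reverse framework."""
    return v1 + v_i + v2
```

Under complete response ($r_y = r_x = r_{xy} = n$), every factor in (4.7) vanishes, so the component $\hat V_2$ is zero; it is also negligible whenever the sampling fraction $n/N$ is small, consistent with the remark above.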


Under the reverse framework, both the NM approach and the IM approach lead to the same estimator of the total variance. Thus, the variance estimator $\hat V_{TOT}^{(RE)}$ is robust in the sense that it is valid under either the NM approach or the IM approach.

5. Summary and conclusions

In this paper, we have derived variance estimators for the imputed estimator of a ratio under two different frameworks. The reverse framework facilitates the derivation of the variance expressions (in comparison with the customary two-phase framework), especially if the sampling fraction is small, in which case we can omit the component $\hat V_2$. However, unlike the two-phase framework, it requires an additional assumption that the response probabilities do not depend on the realized sample $s$. Also, the two-phase framework uses a natural decomposition of the total error that leads to a natural decomposition of the total variance. That is, the total variance can be expressed as the sum of the sampling variance, the nonresponse variance and the imputation variance, which allows the survey statistician to get an idea of the relative magnitude of each component. Under the reverse approach, there is no easy interpretation of the variance components (except the imputation variance). We have considered the case of weighted MRHD imputation within classes. Another version of weighted random hot-deck imputation, which we call weighted Joint Random Hot Deck (JRHD) imputation, is identical to weighted MRHD imputation, except that when both variables are missing, a donor $j$ is selected at random from the set of common donors (i.e., the set of respondents to both variables $y$ and $x$) with probability $w_j / \sum_{l\in s} w_l a_l b_l$ and the vector $(x_j, y_j)$ is imputed. This version of the method helps preserve relationships between survey variables, contrary to imputing each variable independently. The results for JRHD imputation can be obtained using techniques similar to those presented in this paper. Finally, the results presented in this paper can be easily extended to the case of both deterministic and random regression imputation performed within imputation classes.

Acknowledgements

The author thanks an Associate Editor for constructive comments and suggestions that helped improve the quality of the paper. David Haziza's work was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

Brick, M.J., Kalton, G. and Kim, J.K. (2004). Variance estimation with hot deck imputation using a model. Survey Methodology, 30, 57-66.

Deville, J.-C., and Särndal, C.-E. (1994). Variance estimation for the regression imputed Horvitz-Thompson estimator. Journal of Official Statistics, 10, 381-394.

Rao, J.N.K. (1996). On variance estimation with imputed survey data. Journal of the American Statistical Association, 91, 499-506.

Rao, J.N.K., and Shao, J. (1992). Jackknife variance estimation with survey data under hot deck imputation. Biometrika, 79, 811-822.

Särndal, C.-E. (1992). Methods for estimating the precision of survey estimates when imputation has been used. Survey Methodology, 18, 241-252.

Shao, J., and Steel, P. (1999). Variance estimation for survey data with composite imputation and nonnegligible sampling fractions. Journal of the American Statistical Association, 94, 254-265.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 167-172. Statistics Canada, Catalogue No. 12-001-X

Efficient bootstrap for business surveys

James Chipperfield and John Preston 1

Abstract The Australian Bureau of Statistics has recently developed a generalized estimation system for processing its large-scale annual and sub-annual business surveys. Designs for these surveys have a large number of strata, use Simple Random Sampling within Strata, have non-negligible sampling fractions, are overlapping in consecutive periods, and are subject to frame changes. A significant challenge was to choose a variance estimation method that would best meet the following requirements: valid for a wide range of estimators (e.g., ratio and generalized regression), requires limited computation time, can be easily adapted to different designs and estimators, and has good theoretical properties measured in terms of bias and variance. This paper describes the Without Replacement Scaled Bootstrap (WOSB) that was implemented at the ABS and shows that it is appreciably more efficient than Rao and Wu's (1988) With Replacement Scaled Bootstrap (WSB). The main advantages of the Bootstrap over alternative replicate variance estimators are its efficiency (i.e., accuracy per unit of storage space) and the relative simplicity with which it can be specified in a system. This paper describes the WOSB variance estimator for point-in-time and movement estimates that can be expressed as a function of finite population means. Simulation results obtained as part of the evaluation process show that the WOSB was more efficient than the WSB, especially when stratum sample sizes are small (sometimes as small as 5).

Key Words: Variance; Bootstrap; Stratified sampling.

1. Introduction

In 2000, the Australian Bureau of Statistics (ABS) first obtained a register of businesses containing taxation data from the Australian Taxation Office (ATO). The data items included turnover, sales, and other expense items. In 2001, the ABS used this register as a sampling frame for some surveys in order to improve the efficiency of its sample designs. This data is updated for each business at least annually. To make maximum use of these administrative data items in estimation, the ABS developed a generalized estimation system called ABSEST, with the capability of supporting generalized regression estimation (GREG) and variance estimation. ABSEST has been routinely used for the monthly ABS Retail Survey since July 2005.

A generalized estimation system is highly desirable for statistical agencies as it supports a variety of survey output requirements at high levels of statistical rigor for an acceptable cost. The ABS has invested considerable resources into its generalized estimation system for business surveys. Prior to 1998, the ABS's generalized estimation system was capable of Horvitz-Thompson, ratio, and two-phase estimation with variance estimates based on Taylor Series (TS) approximations. In 1999, the Taylor Series method was replaced with the Jackknife method. Subsequent feedback about the computer design and usability was that changes to the generalized estimation system made it increasingly complex to maintain and develop and that processing time could be undesirably long. These key features were important when choosing the variance estimation method for ABSEST.

Core survey output statistics for ABS business surveys are estimates at a point in time, estimates of movement between two time points, and estimates of rates. Business surveys are equal probability designs within stratum, are highly stratified (100s of strata), can be either single- or two-phase sample designs, and for surveys that sample on more than one occasion the overlapping sample can range from 0 to 100%. The sample sizes for business surveys range from less than 1,000 to 15,000; stratum-level sample sizes can be as low as 3 and as high as several hundred.

Section 2 introduces the GREG estimator. Section 3 discusses alternative variance estimators for GREG and justifies why the Bootstrap variance estimator was chosen for ABSEST. Section 4 describes the Without Replacement Scaled Bootstrap (WOSB) and Rao and Wu's (1988) With Replacement Bootstrap (WSB) variance estimators for point-in-time estimates under single-phase designs. Section 5 describes the WOSB for movement estimates. Section 6 measures the bias and variance properties of WOSB and WSB in a simulation study. Section 7 gives some concluding remarks.

2. Generalised regression (GREG) estimator

In this section we briefly describe the GREG that is implemented in ABSEST. Consider a finite population $U$ divided into $H$ strata $U = \{U_1, U_2, \ldots, U_H\}$, where $U_h$ is comprised of $N_h$ units. The finite population total of interest is $Y = \sum_h Y_h$, where $Y_h = \sum_{i\in U_h} y_{hi}$ and $h = 1, \ldots, H$. Within stratum $h$, the sample $s_h$ of $n_h$ units is selected

1. James Chipperfield, Australian Bureau of Statistics. E-mail: [email protected]; John Preston, Australian Bureau of Statistics. E-mail: [email protected].

168 Chipperfield and Preston: Efficient bootstrap for business surveys

from U_h by Simple Random Sampling without Replacement (SRSWOR). The complete sample set is denoted by s = {s_1, s_2, ..., s_H}.

Consider the case where a K-vector of auxiliary variables x_i = (x_1i, ..., x_ki, ..., x_Ki)′ is available for i ∈ s and the corresponding vector of population totals X = Σ_{i∈U} x_i is known. The GREG estimator (Särndal, Swensson and Wretman 1992, page 227) is given by

Ŷ_reg = Σ_{i∈s} w̃_i y_i = Σ_{i∈s} w_i y_i + (X − X̂)′ B̂,

where w̃_i = w_i g_i, w_i = N_h/n_h, B̂ = T̂^{−1} t̂ with T̂^{−1} being the generalised inverse of T̂, X̂ = Σ_{i∈s} w_i x_i, g_i = 1 + σ_i^{−2} x_i′ T̂^{−1}(X − X̂), T̂ = Σ_{i∈s} w_i x_i x_i′ σ_i^{−2}, t̂ = Σ_{i∈s} w_i x_i y_i σ_i^{−2}, σ_i^2 is a constant motivated by the superpopulation model y_i = x_i′β + ε_i such that ε_i is independently and identically distributed with mean 0 and variance σ_i^2, and E(B̂) = β. It is well known that Ŷ_reg is unbiased to O(n^{−1}). The weights w̃_i are stored for ready calculation of estimates. In practice, bounds will be placed on the weights w̃_i. If the weights w̃_i given by the above equation are outside these bounds, they are calculated through iteration (see Method 5 of Singh and Mohl 1996).

The expression for Ŷ_reg can be adapted to a range of estimates, including domains and multi-phase (see Estevao, Hidiroglou and Särndal 1995). For example, when x_i = 1, Ŷ_reg becomes the Horvitz-Thompson estimator given by Ŷ = Σ_h w_h Σ_{i∈s_h} y_hi, with estimated variance

Var̂(Ŷ) = Σ_h N_h^2 n_h^{−1} (1 − f_h) ŝ_h^2,

where ŝ_h^2 = (n_h − 1)^{−1} Σ_{i∈s_h} (y_hi − ŷ_h)^2, ŷ_h = Σ_{i∈s_h} y_hi/n_h and f_h = n_h/N_h.

3. Comparison of alternative variance estimators

The ABSEST variance estimation method was required to have bias and variance properties that were competitive, in simulation studies, with alternatives in the literature. In order to simplify the maintenance and development of the system, the variance estimation specifications were required to be generic, such that all calculations were largely independent of the estimator (ABSEST need only support SRSWOR within stratum and single stage sample designs). Also, strong consideration was given to minimising the computational costs.

Firstly, we considered the Bootstrap, Jackknife and Balanced Repeated Replication (BRR) methods (Shao and Tu 1995, Rao and Wu 1988). Consider estimating the variance of a function θ̂ = θ(Ŷ), where Ŷ is a P-vector of estimates Ŷ = Σ_{i∈s} w_i y_i, y_i is the response vector from unit i with elements y_pi, and θ is a smooth function. Estimating the variance using a replication method involves the following steps:

(i) independently sub-sampling from the set s a total of R times;

(ii) for each of the R sub-samples, computing w_i* = b_i* w_i, where b_i* depends upon the number of times unit i is selected in the sub-sample;

(iii) calculating θ̂* = θ(Ŷ*), where Ŷ* = Σ_{i∈s} w_i* y_i; and

(iv) estimating the variance of θ̂ by Var̂_rep(θ̂) = (R − 1)^{−1} Σ_{r=1}^{R} (θ̂^{*(r)} − θ̂)^2, where θ̂^{*(r)} is the estimate of θ based on the r^th replicate sample.

Note: the expression for the replicate weights, w_i* = b_i* w_i, includes the Jackknife, Bootstrap and Balanced Repeated Replication as special cases.

As we can express Ŷ_reg by a function θ̂, the variance of Ŷ_reg can be calculated by the above steps, where specifically steps (iii) and (iv) respectively become: (iii) calculating Ŷ*_reg = Σ_{i∈s} w̃_i* y_i, where w̃_i* = w_i* g_i* and g_i* has the same form as g_i but is calculated using the weights w_i* instead of the weights w_i for i ∈ s; and (iv) estimating the variance by Var̂_rep(Ŷ_reg) = (R − 1)^{−1} Σ_{r=1}^{R} (Ŷ^{*(r)}_reg − Ŷ_reg)^2.

The attractive feature of these replication methods is that only the selection of the replicate samples and the value b_i* is required to calculate unbiased variance estimates for many commonly used sample designs and for estimators that have good first order Taylor Series approximations. Also, if the replicate weights w̃_i* are stored, the variance estimates of Ŷ_reg require simple calculations that can be completed in a short time; this approach of storing replicate weights has been applied successfully by the ABS's generalized estimation system for household surveys. Once replicate weights are available, calculating the variance of a variety of analyses, such as linear regression, involves simple calculations that require little time and does not require the analyst to have knowledge about the sample design. Next we consider the relative merits of some replicate variance estimators for implementation in ABSEST.

The drop-one Jackknife forms replicate samples s* by dropping one unit at a time. This implies that R = n. For large-scale surveys this storage requirement is excessive. The delete-a-group Jackknife, while reducing R by dropping a group of units within a stratum at a time, would still have at least R = 2H replicates, as a minimum of two groups per stratum is required to calculate variance. Despite performing well in an empirical study where n_h = 2 (see Shao and Tu 1995, page 251), the Jackknife was rejected on the basis of its excessive storage requirement.

For stratified designs the scaled Balanced Repeated Replication (BRR) requires approximately R = H replicate weights. Firstly, the replicate samples are formed by randomly splitting the stratum sample s_h into two groups and then allocating one of these groups to s_h* for each
Statistics Canada, Catalogue No. 12­001­X Survey Methodology, December 2007 169

h = 1, ..., H. The allocation of groups to replicates, defined by a Hadamard matrix, is done in such a way as to eliminate between-stratum covariance in the replicate samples. The Grouped BRR (GBRR) (see Shao and Tu 1995) can arbitrarily reduce R at the cost of introducing between-stratum covariance in the replicate samples. Preston and Chipperfield (2002) showed in an empirical evaluation for a typical ABS business survey that BRR (and GBRR) was significantly more unstable than the Bootstrap.

In their summary of the literature, Kovar, Rao and Wu (1988) found that the scaled Bootstrap tended to have a larger bias compared with the Jackknife or TS when estimating the variance of GREG estimates. As the relative assessment of these methods varied according to the underlying simulation model and the stratum sample size, it was important to make an assessment based on a model and sample design that were typical of ABS business surveys. Section 6 shows these properties to be acceptable. Unlike the other replication methods, the value of R for the Bootstrap may be chosen arbitrarily and so meet storage and computation restrictions. Further, the selection of the Bootstrap replicate samples is more easily specified in a computer system compared with selection of the BRR replicate samples.

We considered the relative merits of a number of other variance estimators for implementation in ABSEST. The TS method was not suitable as its variance expression for complex estimands involves many terms specific to the estimand, making it difficult to adapt into a generalized system. This problem was addressed by Nordberg (2000), who described a method for variance calculation that automatically generates the Taylor Series expansions, implemented in a computer system called CLAN. CLAN can handle any function of means under Probability Proportional to Size (PPS) and cluster sampling. A limitation of CLAN is that it does not produce replicate weights which support complex analysis, such as regression analysis, either within or outside a statistical agency; it requires knowledge of the sample design; and it would be a relatively complex system to specify and maintain in a generalized system (in comparison to the Bootstrap). For the same reason, other linearized variance estimators (described in Estevao, Hidiroglou and Särndal 1995 and evaluated in Yung and Rao 1996) were rejected, despite good theoretical properties, good empirical results and being computationally efficient.

On the above considerations, the preferred variance estimation method for ABSEST was the Bootstrap. In the next section we describe the WOSB and WSB, where only the former is implemented in ABSEST.

4. Without replacement scaled bootstrap (WOSB) for point in time estimates

4.1 Method

For point-in-time GREG estimates, the Without Replacement Scaled Bootstrap (WOSB) variance estimator involves repeating the following R times:

(a) forming the set s^r by selecting m_h units by SRSWOR from s_h independently within each stratum h = 1, ..., H, where m_h = [n_h/2] and the operator [.] rounds its argument down to the nearest integer;

(b) calculating w*_hi = w_hi (1 − γ_h + γ_h (n_h/m_h) δ_hi) for i ∈ s_h, where γ_h = √((1 − f_h) m_h/(n_h − m_h)) and δ_hi is 1 if i ∈ s_h^r and 0 otherwise;

(c) calculating w̃*_hi = w*_hi g*_hi for i ∈ s; and

(d) calculating the r^th Bootstrap estimate of Ŷ_reg, Ŷ*_reg = Σ_{i∈s} w̃*_hi y_i.

The justification for m_h = [n_h/2] is given in section 4.2. The Bootstrap variance estimator is given by the Monte Carlo approximation

Var̂_B(Ŷ_reg) = (R − 1)^{−1} Σ_{r=1}^{R} (Ŷ^{*(r)}_reg − Ŷ_reg)^2.

The WSB method is the same as WOSB except that the replicate samples are selected by SRSWR and the scaling factor is instead γ_h = √((1 − f_h) m_h/(n_h − 1)), where m_h is often set to n_h − 1 in the literature. Preston and Chipperfield (2002) found that WOSB has significantly less replication error than the WSB, the error due to replicate sampling and conditional on the sample set.

It is easy to see that the WOSB and WSB estimators are unbiased estimators of Var̂(θ̂). The TS approximate variance is given by

Var̂(θ̂) = ∇θ̂′ V̂(Ŷ) ∇θ̂,

where V̂(Ŷ) is a P × P matrix with elements

Cov̂(Ŷ_p, Ŷ_p′) = N^2 (1 − f) ŝ_{p,p′} / n,

where

ŝ_{p,p′} = (n − 1)^{−1} Σ_{i∈s} (y_pi − ŷ_p)(y_p′i − ŷ_p′),
ŷ_p = n^{−1} Σ_{i∈s} y_pi,
Ŷ_p = Σ_{i∈s} w_i y_pi,

for p, p′ = 1, ..., P, and ∇θ′ = (∂θ/∂Y_1, ..., ∂θ/∂Y_P)|_Ŷ. It is easy to see that

E*(Var̂*(θ̂)) = ∇θ̂′ E*(V̂*(Ŷ)) ∇θ̂ = ∇θ̂′ V̂(Ŷ) ∇θ̂,

by noting that E*[Cov̂*(Ŷ*_p, Ŷ*_p′)] = Cov̂(Ŷ_p, Ŷ_p′), where E* denotes the expectation with respect to re-sampling.
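Steps (a), (b) and (d) above can be sketched for the Horvitz-Thompson case (g*_hi = 1). This is an illustrative sketch only, not ABSEST code; in particular, the square-root form of the scaling factor γ_h = √((1 − f_h) m_h/(n_h − m_h)) is our reading of the scaled Bootstrap, and all data and names are invented.

```python
import numpy as np

rng = np.random.default_rng(12345)

def wosb_variance(y_by_stratum, N_by_stratum, R=100):
    """WOSB variance of the Horvitz-Thompson total (the g_i = 1 case)."""
    # Full-sample estimate: Y_hat = sum_h (N_h / n_h) * sum_{i in s_h} y_hi
    y_hat = sum(N / len(y) * y.sum()
                for y, N in zip(y_by_stratum, N_by_stratum))

    reps = np.empty(R)
    for r in range(R):
        total = 0.0
        for y, N in zip(y_by_stratum, N_by_stratum):
            n = len(y)
            m = n // 2                                  # m_h = [n_h / 2]
            f = n / N
            gamma = np.sqrt((1 - f) * m / (n - m))      # assumed sqrt form
            delta = np.zeros(n)
            delta[rng.choice(n, size=m, replace=False)] = 1.0  # SRSWOR, step (a)
            w_star = (N / n) * (1 - gamma + gamma * (n / m) * delta)  # step (b)
            total += (w_star * y).sum()                 # step (d) with g = 1
        reps[r] = total

    # Monte Carlo approximation to the variance
    return y_hat, ((reps - y_hat) ** 2).sum() / (R - 1)

# Toy example: three strata with long-tailed y-values
ys = [rng.lognormal(3.0, 1.0, size=n) for n in (12, 20, 8)]
Ns = [400, 1000, 150]
y_hat, v_boot = wosb_variance(ys, Ns, R=200)
```

Because E*(w*_hi) = w_hi, the replicate totals are centred on the full-sample estimate, and the scaling factor reproduces the finite population correction in expectation.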


Note the scaling constants applied to w_hi to calculate the replicate weights are chosen so that the correct finite population correction factor is obtained. It therefore follows that the Monte Carlo approximation to the variance, Var̂_B(θ̂) = (R − 1)^{−1} Σ_{r=1}^{R} (θ̂^{*(r)} − θ̂)^2, is unbiased for Var(θ̂).

4.2 A note on the relative efficiency of WSB and WOSB sampling

To simplify notation, let v̂_boot = Var̂_B(θ̂). The variance of the Bootstrap variance estimator can be written as

Var(v̂_boot) = Var_s(E_*[v̂_boot | s]) + E_s[Var_*(v̂_boot | s)],   (1)

where the subscript s denotes expectation with respect to the sample design. If v̂_boot is unbiased (i.e., E_*[v̂_boot | s] = Var̂(θ̂)), then Var_s(E_*[v̂_boot | s]) does not depend upon how the replicate samples are selected. The term Var_*(v̂_boot | s) is the replication error conditional on the sample and is inversely proportional to R. The value of R is chosen to be sufficiently large such that Var_*(v̂_boot | s) is small relative to v̂_boot, the estimated sample variance. The efficiency of two Bootstrap estimators can be compared by the size of Var_*(v̂_boot | s) when both estimators have the same value of R. Next we summarise empirical results based on actual data that show the WOSB can be significantly more efficient than WSB. The benefits of efficiency are either reduced computation time and/or more accurate variance estimates.

Preston and Chipperfield (2002) compared the efficiency of WOSB with m_h = [n_h/2] and WSB with m_h = n_h − 1 (see Rao and Wu 1984) for the Australian Quarterly Economic Activity Survey in March 2000. This survey has stratum level sample sizes that vary from 4 into the 100s. The results (derived from Preston and Chipperfield 2002, Table 1) show that at the national level the size of Var_*(v̂_boot | s) was 54% smaller for WOSB compared with WSB sampling when R = 100 (see Preston and Chipperfield 2002 for more empirical estimates of Var_*(v̂_boot | s) for WSB and WOSB). In other words, WOSB required about half the number of replicates to achieve the same replication error as WSB. This represents a significant efficiency gain. Another benefit of WOSB over WSB is that the computational time in selecting the replicate samples is considerably less.

From empirical investigations, the choice of m_h = [n_h/2] for WOSB minimized Var_*(v̂_boot | s). As n_h increases, we suspect that the difference between WOSB and WSB will reduce to approximately zero. More work needs to be done to establish these properties.

5. Movement variance between single phase estimates

A key output requirement of many business surveys is the estimate of change between two time points. Denote the finite population at time t by U^(t) = {U_1^(t), U_2^(t), ..., U_H^(t)}, where U_h^(t) is the stratum h population at time t and is made up of N_h^(t) units. The population total at time t is Y^(t) = Σ_h Σ_{i∈U_h^(t)} y_hi^(t). Estimating the variance of Δ̂ = Ŷ^(t) − Ŷ^(t−1), the difference between two time periods, is the focus of this section. The terms corresponding to n_h, f_h and s_h at time t are denoted by n_h^(t), f_h^(t) and s_h^(t) respectively.

When sampling on two occasions, define N_hc, n_hc, n_hc^(1) and n_hc^(2) to be the number of units in the sets U_h^(1) ∩ U_h^(2), s_h^(c) = s_h^(1) ∩ s_h^(2), s_h^(1c) = s_h^(1) − s_h^(c) and s_h^(2c) = s_h^(2) − s_h^(c) respectively. In ABS business surveys the time 1 sample of size n_h^(1) is an SRSWOR from U_h^(1). The time 2 sample is the union of the following two samples: an SRSWOR of n_hc units from s_h^(1) and an SRSWOR of n_hc^(2) units from U_h^(2) − (U_h^(1) ∩ U_h^(2)). The time 2 sample is effectively an SRSWOR from U_h^(2). At the ABS, the size of the overlapping sample, n_hc, is controlled by the Permanent Random Number method (see Brewer, Gross and Lee 1999).

The estimator of Var(Δ̂) can be expressed as

Var̂(Δ̂) = Var̂(Ŷ^(1)) + Var̂(Ŷ^(2)) − 2 Cov̂(Ŷ^(1), Ŷ^(2)).

Consider the Horvitz-Thompson estimator Δ̂ = Ŷ^(2) − Ŷ^(1), where Ŷ^(t), t = 1, 2, is defined analogously to Ŷ. Tam (1985) shows that when U_h^(1) = U_h^(2), an unbiased estimator of Var(Δ̂) under the above sampling scheme is

Var̂(Δ̂) = Var̂(Ŷ^(1)) + Var̂(Ŷ^(2)) − 2 Cov̂(Ŷ^(1), Ŷ^(2)),

where

Var̂(Ŷ^(t)) = Σ_h N_h^2 (1 − f_h^(t)) s_h^(t)2 / n_h^(t),

Cov̂(Ŷ^(1), Ŷ^(2)) = Σ_h N_h^2 (1 − f_{12,h}) s_h^(12) n_hc / (n_h^(1) n_h^(2)),

s_h^(12) = (n_hc − 1)^{−1} Σ_{i∈s_h^(c)} (y_1i − ŷ_1)(y_2i − ŷ_2),

ŷ_t = n_hc^{−1} Σ_{i∈s_h^(c)} y_ti

for t = 1, 2, and f_{12,h} = n_h^(1) n_h^(2) / (n_hc N_h).

When U_h^(1) ≠ U_h^(2), a more general form of Tam's estimator is given by Var̂(Δ̂), except that

Var̂(Ŷ^(t)) = Σ_h N_h^(t)2 (1 − f_h^(t)) s_h^(t)2 / n_h^(t),

Cov̂(Ŷ^(1), Ŷ^(2)) = Σ_h [N_h^(1) N_h^(2) / (n_h^(1) n_h^(2))] n_hc (1 − f_{12,h}) s_h^(12),

and

f_{12,h} = n_h^(1) n_h^(2) N_hc / (n_hc N_h^(1) N_h^(2)).
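As a concrete illustration, Tam's estimator for the U_h^(1) = U_h^(2) case can be computed stratum by stratum from the formulas above. The function below is a sketch for a single stratum; the data and names are invented, and in practice the per-stratum terms would be summed over h.

```python
import numpy as np

def tam_movement_variance(y1, y2, overlap1, overlap2, N):
    """Var-hat(Delta) = Var(Y1) + Var(Y2) - 2 Cov(Y1, Y2) for one stratum
    under Tam's (1985) scheme with equal populations at both time points."""
    n1, n2, nc = len(y1), len(y2), len(overlap1)
    f1, f2 = n1 / N, n2 / N

    # s_h^(t)2: sample variances at each time point
    s1sq, s2sq = y1.var(ddof=1), y2.var(ddof=1)

    # s_h^(12): covariance over the overlapping sample s_h^(c)
    yc1, yc2 = y1[overlap1], y2[overlap2]
    s12 = ((yc1 - yc1.mean()) * (yc2 - yc2.mean())).sum() / (nc - 1)

    f12 = n1 * n2 / (nc * N)                       # f_{12,h}
    var1 = N**2 * (1 - f1) * s1sq / n1
    var2 = N**2 * (1 - f2) * s2sq / n2
    cov12 = N**2 * (1 - f12) * s12 * nc / (n1 * n2)
    return var1 + var2 - 2.0 * cov12

rng = np.random.default_rng(7)
N = 400
y1 = rng.lognormal(3.0, 1.0, size=12)              # time 1 sample, n_h^(1) = 12
y2 = 1.2 * y1 + rng.normal(0.0, 1.0, size=12)      # time 2 sample, same units
idx = np.arange(8)                                 # overlap s_h^(c), n_hc = 8
v_delta = tam_movement_variance(y1, y2, idx, idx, N)
```

The positive covariance between the two occasions, carried by the overlapping units, reduces the movement variance relative to treating the two samples as independent.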


For the remainder of this section we assume that Var̂(Δ̂) is unbiased for Var(Δ̂) when U_h^(1) ≠ U_h^(2). (It is worthwhile noting that Var̂(Δ̂) can take negative values when U_h^(1) ≠ U_h^(2). Nordberg (2000) gives an unbiased estimator of Var(Δ̂) for the regression estimator when U_h^(1) ≠ U_h^(2), but there is no obvious way in which it can be used with the Bootstrap as described in this paper.)

Estimating the variance of Δ̂_reg = Ŷ_reg^(2) − Ŷ_reg^(1), the movement between GREG estimates at times 1 and 2, using WOSB involves repeating the following R times:

(a) forming the replicate set s^r by independently selecting m_hc = [n_hc/2], m_hc^(1) = [n_hc^(1)/2] and m_hc^(2) = [n_hc^(2)/2] units by SRSWOR from the sets s_h^(c), s_h^(1c) and s_h^(2c) respectively;

(b) for i ∈ s_h^(1) calculating the replicate weights

w_hi^(1)* = (N_h/n_h^(1)) [1 − γ_hc (n_hc/n_h^(1)) − γ_hc^(1) (n_hc^(1)/n_h^(1)) + γ_hc (n_hc/m_hc) δ_hi^{r(1)}]

for i ∈ s_h^(c), and

w_hi^(1)* = (N_h/n_h^(1)) [1 − γ_hc (n_hc/n_h^(1)) − γ_hc^(1) (n_hc^(1)/n_h^(1)) + γ_hc^(1) (n_hc^(1)/m_hc^(1)) δ_hi^{r(1)}]

for i ∈ s_h^(1c), where

γ_hc^(1) = √{[n_h^(1)(1 − f_h^(1)) − n_hc(1 − f_{12,h})] m_hc^(1) / [n_hc^(1)(n_hc^(1) − m_hc^(1))]},

γ_hc = √{(1 − f_{12,h}) m_hc / (n_hc − m_hc)},

and δ_hi^{r(t)} equals 1 if unit i is selected in the replicate group at time point t and zero otherwise;

(c) calculating the replicate weights w_hi^(2)* analogously for i ∈ s_h^(2);

(d) calculating w̃_hi^(t)* = w_hi^(t)* g_i^(t)* for i ∈ s_h^(1), s_h^(2), where g_i^(t)* has the same form as g_i but is calculated using the weights w_hi^(t)* instead of w_hi^(t); and

(e) calculating Δ̂*_reg = Ŷ_reg^(2)* − Ŷ_reg^(1)*, where Ŷ_reg^(t)* = Σ_{i∈s^(t)} w̃_hi^(t)* y_i.

The WOSB variance estimator is given by

Var̂_B(Δ̂_reg) = (R − 1)^{−1} Σ_{r=1}^{R} (Δ̂_reg^{*(r)} − Δ̂_reg)^2,

where Δ̂_reg = Ŷ_reg^(2) − Ŷ_reg^(1) and Ŷ_reg^(t) = Σ_{i∈s^(t)} w̃_hi^(t) y_i. The proof that Var̂_B(Δ̂_reg) is unbiased is straightforward and is similar to the proof that Var̂_B(θ̂) is unbiased (see section 4).

The approach described above requires a separate set of replicate weights for movement and level variance estimates. Roberts, Kovačević, Mantel and Phillips (2001) consider approximate Bootstrap variance estimators of movement that only use the level replicate weights, hence reducing computational costs and simplifying the method and its implementation in a computing system.

6. Simulation study

This section summarizes a simulation study for point-in-time and movement estimates, carried out to empirically measure the bias and variability of WOSB and WSB over repeated sampling when R = 100. A population was generated at time points 1 and 2 from the models y_i^(1) = (0.75 x_1i + 0.25 x_2i) W(0, 2.5, 1) and y_i^(2) = 1.5 y_i^(1) W(0, 5, 1), where the auxiliary variables are given by x_1i = 0.25 x_2i + 0.75 [100 L(0, 1, 1)] and x_2i = 100 L(0, 1, 1), and where W(μ, γ, α) and L(μ, γ, α) are the Weibull and Log-normal distributions with location, shape and scale parameters given by μ, γ and α. These distributions reflect the long tails that are typical of economic survey data. The time 1 and time 2 populations were of size 3,000, with 2,500 population units common to both time points. Each population unit i was assigned to one of 5 strata at both time points using z_i, where z_i = x_1i W(0, 2.5, 1) and the stratum boundaries were z_i = 50, 100, 150, 250. This resulted in stratum population sizes that ranged from 400 to 1,000.

A total of 3,000 simulated stratified SRSWOR samples were taken from the population at times 1 and 2, where n_h^(t) = 12, n_hc^(1) = n_hc^(2) = 4 and n_hc = 8 for all h and t = 1, 2. For WOSB the replicate sample sizes are given in sections 4 and 5. For WSB the replicate sample sizes for movements were m_hc = n_hc − 1, m_hc^(1) = n_hc^(1) − 1 and m_hc^(2) = n_hc^(2) − 1, and for levels were m_h^(t) = n_h^(t) − 1. The WSB estimator for movements has the same form as WOSB but has slightly different scaling factors and takes replicate samples with replacement.

From each of the 3,000 simulated samples, Ŷ_reg^j is calculated, where Ŷ_reg^j is given by Ŷ_reg with x_i = (x_1i, x_2i), σ_i = 1 and j = 1, 2, ..., 3,000. The true standard error of Ŷ_reg is calculated by

S = √{(1/3,000) Σ_{j=1}^{3,000} (Ŷ_reg^j − Y)^2}.


The Bootstrap's estimated standard error of Ŷ_reg from the j^th sample is

Ŝ^j = √{(1/100) Σ_{r=1}^{100} (Ŷ_reg^{j*(r)} − Ŷ_reg^j)^2},

where Ŷ_reg^{j*(r)} is defined analogously to Ŷ_reg^{*(r)}. The Relative Bias (RB) of the Bootstrap's standard error is

RB(Ŝ) = (1/(3,000 S)) Σ_{j=1}^{3,000} (Ŝ^j − S).

The Relative Root Mean Squared Error (RRMSE) of the Bootstrap's estimated standard error is

RRMSE(Ŝ) = (1/S) √{(1/3,000) Σ_{j=1}^{3,000} (Ŝ^j − S)^2}.

Similar definitions of RRMSE and bias are used when estimating the movement variance. The 95% coverage probabilities (the percentage of 95% confidence intervals containing the true population total) of WOSB and WSB for levels and movements are also compared.

The results in Table 1 show that the RB and the RRMSE of the WOSB and WSB are both acceptably small. The bias of WSB's time point 1 estimates is slightly higher than WOSB's, resulting in slightly worse coverage probabilities.

Table 1
Bootstrap estimate of the standard error for movements and point-in-time estimates

                 Time point 1              Movement
Method      RB    RRMSE   C95(%)     RB    RRMSE   C95(%)
WOSB       0.7    17.3    94.7      2.1    20.7    95.3
WSB       -3.1    15.8    93.7     -1.3    19.6    94.6

7. Summary

From the simulation results, both the WOSB and WSB were considered to be reliably accurate over repeated sampling. Conditional on the sample, the WOSB was found to be significantly more efficient (up to 50%) than WSB for stratified sampling when the stratum sample size is sometimes small. As a result, the WOSB was implemented in ABSEST.

References

Brewer, K.R.W., Gross, W.F. and Lee, G.F. (1999). PRN sampling: The Australian experience. Proceedings of the International Association of Survey Statisticians, Helsinki.

Estevao, V., Hidiroglou, M.A. and Särndal, C.-E. (1995). Methodological principles for a generalised estimation system at Statistics Canada. Journal of Official Statistics, 11, 2, 181-204.

Kovar, J., Rao, J.N.K. and Wu, C.F.J. (1988). Bootstrap and other methods to measure errors in survey estimates. The Canadian Journal of Statistics, 25-44.

Nordberg, L. (2000). On variance estimation for measures of change when samples are coordinated by the use of permanent random numbers. Journal of Official Statistics, 16, 4, 363-378.

Preston, J., and Chipperfield, J.O. (2002). Using a generalised estimation methodology for ABS business surveys. Methodology Advisory Committee (available at www.abs.gov.au).

Rao, J.N.K., and Wu, C.F.J. (1984). Bootstrap inference for sample surveys. Proceedings of the Section on Survey Research Methods, American Statistical Association, 106-112.

Rao, J.N.K., and Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 401, 231-241.

Roberts, G., Kovačević, M., Mantel, H. and Phillips, O. (2001). Cross-sectional inference based on longitudinal surveys: Some experience with Statistics Canada surveys. Presented at the 2001 FCSM Research Conference, Arlington, VA, November 14-16; proceedings available at www.fcsm.gov.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.

Singh, A.C., and Mohl, C.A. (1996). Understanding calibration estimators in survey sampling. Survey Methodology, 22, 107-115.

Tam, S.M. (1985). On covariance in finite population sampling. The Statistician, 34, 429-433.

Yung, W., and Rao, J.N.K. (1996). Jackknife linearization variance estimators under stratified multi-stage sampling. Survey Methodology, 22, 23-31.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 173-185
Statistics Canada, Catalogue No. 12-001-X

Bayesian estimation in small areas when the sampling design strata differ from the study domains

Jacob J. Oleson, Chong Z. He, Dongchu Sun and Steven L. Sheriff 1

Abstract

The purpose of this work is to obtain reliable estimates in study domains when there are potentially very small sample sizes and the sampling design stratum differs from the study domain. The population sizes are unknown for both the study domain and the sampling design stratum. In calculating parameter estimates in the study domains, a random sample size is often necessary. We propose a new family of generalized linear mixed models with correlated random effects when there is more than one unknown parameter. The proposed model will estimate both the population size and the parameter of interest. General formulae for the full conditional distributions required for Markov chain Monte Carlo (MCMC) simulations are given for this framework. Equations for Bayesian estimation and prediction at the study domains are also given. We apply the model to the 1998 Missouri Turkey Hunting Survey, which stratified samples based on the hunter's place of residence; we require estimates at the domain level, defined as the county in which the turkey hunter actually hunted.

Key Words: Hierarchical Bayes; Markov chain Monte Carlo; Bayesian prediction; Random sample size; Spatial correlation; Stratification.

1. Introduction

Small sample sizes occur often when analyzing sample survey data. These small sample sizes arise frequently when studying subpopulations such as socio-demographic groups. We may also consider spatial regions and time periods as subpopulations or study domains. Due to small sample sizes, direct survey estimators could be highly unreliable. Estimation stemming from areas with small sample sizes has been termed small area estimation (SAE). Rao (2003) gives a nice review of many SAE techniques. Some recent small area review papers are found in Rao (2005) and Jiang and Lahiri (2006).

Appropriate models are needed in order to produce reliable small area statistics. Different model-based methods include the empirical best prediction method (see Prasad and Rao 1990, Jiang, Lahiri and Wan 2002, Das, Jiang and Rao 2004, Jiang and Lahiri 2006) and Bayesian methods (see Malec, Sedransk, Moriarity and LeClere 1997, Ghosh, Natarajan, Stroud and Carlin 1998, He and Sun 2000). For a good review of Bayesian small area estimation, readers are referred to Rao (2003). This paper concerns a practical implementation of Bayesian methodology. One critical step in implementing a Bayesian model is selecting prior distributions. The propriety of the posterior distribution and the robustness of the priors should be carefully examined. When an MCMC simulation such as the Gibbs sampler is used in the computation, the convergence of the Gibbs chain must be monitored. For more details, see Carlin and Louis (2000).

We consider a stratified cluster sampling design where, within each stratum, clusters are selected by simple random sampling without replacement. In our application, clusters are of unequal sizes and the cluster sizes are unknown at the time of designing the survey. We consider a domain estimation problem where domains cut across the design clusters. As a result, the domain population size is unknown and the domain sample size is random. In our application, realized sample sizes for the domains are small, and as such standard design-based domain estimation techniques (see Cochran 1977, Lohr 1999) are unreliable. We propose a fully Bayesian hierarchical model to get around this problem.

We begin by obtaining estimates of success rates and population sizes simultaneously at the small area level for individuals from each of the design strata. The estimates are obtained by borrowing information from the neighboring small areas. This is done through a spatial structure built into the Bayesian model. Therefore, the resulting estimates are much more stable than the direct survey estimates. We then compute a weighted average of the success rates from the design strata for the final small area estimate. For example, if a county is the small area, we compute a county-specific success rate estimate for each of the design strata and average them for a single county estimate. To combine the sampling design strata, the individual stratum population sizes should be known. In the case where these population sizes are not known, they can be estimated using our proposed model. This work is motivated by applying Bayesian methods to estimating turkey hunting success rates at the county level in Missouri. We propose a new

1. Jacob J. Oleson, Department of Biostatistics C22-GH, The University of Iowa, Iowa City, Iowa, 52242-1009, U.S.A.; Chong Z. He and Dongchu Sun, Department of Statistics, 146 Middlebush Hall, University of Missouri, Columbia, Missouri 65211-6100, U.S.A.; Steven L. Sheriff, Resource Science Center, Missouri Conservation Department, 1110 South College Avenue, Columbia, Missouri 65201.

family of generalized linear mixed models with correlated random effects when there is more than one unknown parameter. We will call this a bivariate generalized linear mixed model (bivariate GLMM). A generalized linear mixed model (GLMM) with possibly correlated random effects is often used when there is only one unknown parameter (Sun, Speckman and Tsutakawa 2000). The proposed model estimates both the population size and the parameter of interest and, hence, advances the current state of small area research.

The above-described scenario is present in the 1998 Missouri Turkey Hunting Survey (MTHS). This was a postseason mail survey that provided the Missouri Department of Conservation with information concerning the number of turkeys harvested by hunters on each day of the hunting season and the county in which the harvest occurred. Also, the total number of trips made to the counties by these hunters on each hunting day was recorded. Hunting success rates were then calculated from this information.

The MTHS example is presented in detail in Section 2. It is followed in Section 3 by a summary of the proposed methodology and by general formulae. Although we estimate success rates, the methodology is generalizable. We also give general formulae to find the estimates and predictions for the small areas, as well as full conditional distributions for use with MCMC simulations. Final comments are given in Section 4.

2. 1998 Missouri turkey hunting survey

2.1 Background to 1998 MTHS

The Missouri Department of Conservation began tracking hunter tendencies biennially in 1986 with the MTHS. This survey asked the hunter what county he/she hunted in, on what day that occurred, and whether the hunt was successful. It began as a simple random sample of all spring turkey hunting permit holders. He and Sun (1998) used the information from the 1996 survey to estimate turkey hunter success rates in all 114 counties of Missouri with a Bayesian Beta-Binomial model. He and Sun (2000) estimated county-specific hunting success rates per week of the hunting season. Only one harvested turkey was allowed per week in the 1996 spring hunting season. They used a GLMM and estimated only success rates. Oleson and He (2004) extended this model in order to estimate hunting success rates for each day of the hunting season. They found significant auto-correlation among the days of the hunting season and among the counties of Missouri when estimating success rates.

In 1998, the MTHS sampling scheme was changed. The frame is still the list of all turkey hunters registered to hunt in Missouri and contains, among other things, information on each hunter's county of residence. Simple random samples used previously put too much weight on the heavy population masses of Kansas City and St. Louis. Hence, counties near these metropolises received large samples and counties further away (e.g., the southern tiers) received insufficient samples. Ideally we would like to stratify by the county where hunters pursued turkeys, to draw samples representative at this domain level. Information about where the hunters pursued turkeys is unavailable until after questionnaires are returned, meaning this type of stratification is not possible.

An alternative is to stratify by where the hunter lives, since hunters tend to hunt near where they live (in locations with which they are familiar). This causes a problem in estimating the parameters of interest, which are hunting success rates per county. We would like estimates based on hunting location, but the sampling design uses the hunter's place of residence. The new sampling design of the MTHS is a stratified simple random sampling of clusters with unequal sizes. In this case, a cluster represents a registered hunter and its elements are the hunting trips for that hunter. The hunter's place of residence is used as a stratification factor. The design strata are: 1) Non-residents of Missouri, 2) Residents of Northern Missouri, 3) Residents of Southern Missouri, 4) Residents of the St. Louis metro area, and 5) Residents of the Kansas City metro area. Figure 1 shows the boundaries that were used in determining the four Missouri resident strata; these are based on the first three digits of the postal zip code. Proportional allocation was used to determine the number of sampled hunters in each stratum.

As shown in Table 1, there were 110,691 total permit holders. A sample of 8,000 was proportionally allocated to each of the five strata. In Table 1 the column "% of Sample" refers to the percent of the overall sample that comes from that particular stratum (sample replied/total sample replied). The column "% of Strata Sampled" contains the percent of hunters who were sampled from that stratum (sample replied/total permits). The number of clusters in the population and the number of clusters in the sample for each stratum are known fixed numbers. The hunting trips taken by each hunter are the elements within the cluster. Throughout the remainder of the paper, the population size is the number of hunting trips taken by all hunters in the frame, termed hunting pressure. For the MTHS example, the population size is unknown but not random; the sample size is best considered random since we do not know how many trips each hunter will take. Note that a hunter does not have to hunt in his/her design stratum. The hunter may hunt in many different counties during the season. For each study

domain (i.e., county), the population size is again unknown but not random; the sample size is again random, where we do not know in which county a hunter is going to hunt. Small sample sizes can still result; for example, Dunklin County, Pemiscot County, and New Madrid County in southeast Missouri had zero trips reported to them throughout the 3-week season in 1998, and Lawrence County in southwest Missouri had zero trips on the first day and only five in the remainder of the first week.

Table 1
Sample sizes and response rates from 5 strata for 1998 MTHS

Design Stratum    Total Permits   Sample Sent   Sample Replied   Response Rate (%)   % of Sample   % of Strata Sampled
Non-Resident          19,798          1,600           1,180             73.75            22.2            5.96
North Missouri        21,756          1,532             975             63.64            18.3            4.48
South Missouri        34,375          2,421           1,509             62.33            28.4            4.39
St. Louis             19,959          1,405             969             68.97            18.2            4.85
Kansas City           14,803          1,042             688             66.03            12.9            4.65
Total                110,691          8,000           5,321             66.51           100              4.81

The MTHS is a post-season mail survey where the questionnaires are mailed to the hunter's listed residence. The survey has questions specifying each day of the hunting season and asking hunters whether or not they hunted, in what county they hunted, and whether their hunt was successful. In total, 41% of the residents responded to the first mailing. Those who did not respond within two months were mailed the survey again, to which 26% responded from the second mailing. The survey was mailed a third time to those who had still not responded after two months, with 20% of the third mailing responding. This resulted in an overall response rate of 66.5%. From Table 1 we see that non-residents of Missouri had the highest response rate, at nearly 75%. The two metro areas of St. Louis and Kansas City then followed, with response rates of 69% and 66%, respectively. The rural areas in northern Missouri and southern Missouri had the lowest response rates, at approximately 63%.

The 1998 spring turkey hunting season in Missouri consisted of the 21 consecutive days beginning on Monday, 19 April. During the first week, a turkey hunter could harvest one bearded turkey. If successful during the first week, the hunter was allowed to harvest only one additional bearded turkey during the last two weeks of the season. If the hunter was unsuccessful during the first week, the hunter could harvest two bearded turkeys during the second two weeks, but only one bird per day could be taken. During this 3-week season, the turkey biologist wanted information concerning hunter success for the first day, due to the opening day effect in the hunting season, when hunting pressure normally exceeds that of any other day of the season. The remaining six days of the first week are combined into a second time period of interest, called week 1. Week 2 and week 3 are modeled separately to give a total of four time periods.
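The derived columns of Table 1 follow directly from the raw counts; as a quick check (not part of the original analysis), they can be recomputed in a few lines:

```python
# Recompute the derived columns of Table 1 from the raw counts:
# response rate = replied / sent, "% of Sample" = replied / total replied,
# "% of Strata Sampled" = replied / permits.
strata = {
    "Non-Resident":   (19798, 1600, 1180),
    "North Missouri": (21756, 1532, 975),
    "South Missouri": (34375, 2421, 1509),
    "St. Louis":      (19959, 1405, 969),
    "Kansas City":    (14803, 1042, 688),
}
total_replied = sum(replied for _, _, replied in strata.values())  # 5,321
for name, (permits, sent, replied) in strata.items():
    rate = 100 * replied / sent
    pct_sample = 100 * replied / total_replied
    pct_strata = 100 * replied / permits
    print(f"{name}: {rate:.2f}% {pct_sample:.1f}% {pct_strata:.2f}%")
```

For the non-resident stratum this reproduces 73.75%, 22.2% and 5.96%, matching the published row.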

Figure 1 Sampling domain boundaries for the four Missouri resident sampling domains (North, South, St. Louis, Kansas City)


The small area of interest is the county-time period combination for turkey hunters from each stratum. Turkey population management focuses on the county level as the smallest areal unit of interest, due to hunters' ability to indicate where they have hunted in reporting their success. A hunting license is required to hunt; thus we know how many total hunters there are in Missouri and in which stratum they live. Spreading the data out across 114 counties produces very sparse data within each of the counties. For this reason, rather than looking at success rates on a daily basis, we will use four time periods as defined above. In calculating the county-specific success rates, however, we use the number of hunting trips in each county. We do not know how many trips each hunter will take or in which county they will hunt. Thus, while we know the number of hunters in each stratum, the number of trips is unknown. Furthermore, while we know the number of hunters who have been sampled from each stratum, the number of trips to a specific county is unknown and random, because a different sample of hunters will yield a different number of trips.

There are two principal parameters to estimate and one random variable to predict. The first parameter is the total number of hunting trips taken by all hunters, known as hunting pressure. This is important to wildlife managers who are concerned about the quality of the hunting experience. Too many hunters in an area tend to interfere with each other's hunts, which in turn lowers the quality of the hunting experience. The second parameter to estimate is the hunting success rate, or the proportion of turkeys harvested per hunting trip. If there are not enough turkeys in a county, wildlife managers will close the county for a number of years in order for the turkey population to recover. The random variable to predict is the total number of turkeys harvested. In Missouri, every turkey that is harvested must be reported. In 1998, turkey hunters were required to go to a check station where their turkeys were tallied. It was expensive to maintain these check stations. One of the purposes of this project is to predict the total number of turkeys harvested at the county level and compare this to the actual number recorded at the check stations.

Next we present a model to estimate hunting success rate and hunting pressure, as well as predict total turkey harvest, simultaneously. The model accounts for random sample sizes, which the previous models of He and Sun (1998), He and Sun (2000), and Oleson and He (2004) did not.

2.2 The Model

Let $p_{ijk}$ denote the success rate, $y_{ijk}$ the number of successes, $n_{ijk}$ the number of trips in the sample, and $N_{ijk}$ the total number of trips, in county $i$, at time $j$, for individuals from stratum $k$. We model turkey harvests with independent binomial distributions

$$(y_{ijk} \mid n_{ijk}, p_{ijk}) \stackrel{\text{ind}}{\sim} \text{Binomial}(n_{ijk}, p_{ijk}), \quad (1)$$

where $i = 1, \ldots, I$ is the county (i.e., study domain), $j = 1, \ldots, J$ is the time period, and $k = 1, \ldots, K$ is the design stratum. Here $I = 114$, $J = 4$ and $K = 5$ for the MTHS.

The previous analyses of the MTHS assumed a fixed sample size, $n_{ijk}$, which we believe is best considered random. We don't know the number of trips made to each county until after the survey has been collected. We also don't know the total number of hunting trips taken in the small area, $N_{ijk}$, only the number of potential hunters. Since hunters must stop after their second turkey, counties with higher success rates would be expected to have fewer hunting trips (or days) for the same number of harvested birds than a county with lower success rates. If there is a correlation, then the sample size must be considered random. Also, in Bayesian hierarchical modeling, if the distribution of a sample size is independent of the distribution of the response variable, then the estimates are identical for random and non-random (fixed) sample size $n$; see Durbin (1969). The estimated success rate is smoother for a fixed $n_{ijk}$ than when it is random (Woodard, He and Sun 2003).

Malec et al. (1997) applied Bayesian small area estimation to the National Health Interview Survey. There are two major differences between our model and that of Malec et al. (1997). First, the population sizes, $N_{ijk}$, are known in their model but unknown in our model. This is the main reason that we introduce the bivariate GLMM. Secondly, the logit of the success rate is modeled as a linear function of covariates in their model. Therefore, the estimates depend on the values of covariates but not on the spatial locations. We will add a spatial component to the logit of success rates in addition to the covariates, so that the estimates depend on both covariates and spatial locations. This is necessary if some important covariates are not available.

To incorporate the randomness of the sample sizes, we model $n_{ijk}$ with Poisson distributions

$$(n_{ijk} \mid N_{ijk}) \stackrel{\text{ind}}{\sim} \text{Poisson}(R_k N_{ijk}). \quad (2)$$

The mean and variance of the Poisson distribution for $n_{ijk}$ is a constant multiplied by the population size, $N_{ijk}$. This constant $R_k$ is the ratio of the number of hunters in the sample for stratum $k$ to the total number of hunters from stratum $k$. This ratio can be calculated from Table 1.

For the Poisson distribution, the overall sample size, $n_{..k}$, is considered random. We presented in the previous section why we consider this assumption to be appropriate. If $n_{..k}$ were fixed, then the multinomial distribution would be a more appropriate model. The likelihoods of these two
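To make the two-stage structure of (1)-(2) concrete, here is a minimal simulation sketch for a single stratum (not the authors' code); the trip totals, success rates and county count are hypothetical, while the ratio uses the non-resident row of Table 1 and takes "sample" to mean the 1,600 questionnaires sent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for one stratum k: true trip totals N_ijk and
# per-trip success rates p_ijk in three counties.
N = np.array([800, 400, 150])       # total trips (unknown in practice)
p = np.array([0.20, 0.10, 0.05])    # success rate per trip
R_k = 1600 / 19798                  # sampled hunters / total hunters (Table 1)

# Eq. (2): sampled trips are Poisson with mean R_k * N_ijk.
n = rng.poisson(R_k * N)
# Eq. (1): observed harvests are Binomial(n_ijk, p_ijk).
y = rng.binomial(n, p)
print(n, y)
```

The simulated `n` plays the role of the random sample size that earlier MTHS analyses treated as fixed.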

approaches are very similar and yield comparable results in either context (see Agresti 2002, pages 8-9).

We model $p_{ijk}$ using its logit function and $N_{ijk}$ using a logarithm transformation, i.e.,

$$\eta_{ijk} = \log\left(\frac{p_{ijk}}{1 - p_{ijk}}\right), \qquad \omega_{ijk} = \log(N_{ijk}).$$

Linear mixed models are used for the priors on both $\eta_{ijk}$ and $\omega_{ijk}$ by assuming

$$\eta_{ijk} = \theta_{1jk} + u_{1ik} + e_{1ijk},$$
$$\omega_{ijk} = \theta_{2jk} + u_{2ik} + e_{2ijk}.$$

Here, for $a = 1, 2$, $\theta_{ajk}$ denotes the fixed effect due to the $j$th time period in stratum $k$, $u_{aik}$ represents a random county effect, and the random errors $e_{aijk}$ are iid $N(0, \delta^{(e)}_{ak})$.

To complete the Bayesian hierarchical model, we need to specify the priors for $\theta_{ak} = (\theta_{a1k}, \ldots, \theta_{aJk})'$, $u_{ak} = (u_{a1k}, \ldots, u_{aIk})'$, and $\delta^{(e)}_{ak}$. He and Sun (2000) and Oleson and He (2004) show that there is significant spatial correlation among counties of Missouri in estimating the success rates. They use a conditional auto-regressive (CAR) structure to model spatial dependence between neighboring counties. The joint density of $u_{ak}$ is given by

$$f(u_{ak}) = (2\pi\delta^{(u)}_{ak})^{-I/2}\,|I - \rho_{ak}C|^{1/2}\exp\left(-\frac{1}{2\delta^{(u)}_{ak}}\,u_{ak}^t(I - \rho_{ak}C)\,u_{ak}\right). \quad (3)$$

Here $\delta^{(u)}_{1k}$ and $\delta^{(u)}_{2k} > 0$ are variance components and $I - \rho_{ak}C$ is a nonnegative definite symmetric matrix (Besag 1974). $I$ is the $I \times I$ identity matrix and $C$ is an adjacency matrix whose component $c_{ij}$ is 1 if areas $i$ and $j$ share a common boundary, with $c_{ii} = 0$. We also define $\rho_{1k}$ and $\rho_{2k}$ to be the spatial correlation parameters. Let $\lambda_1 \le \cdots \le \lambda_I$ be the eigenvalues of the matrix $C$. The matrices $I - \rho_{ak}C$ are positive definite if $\lambda_1^{-1} \le \rho_{ak} \le \lambda_I^{-1}$ (Clayton and Kaldor 1987). For the Missouri data, $I = 114$ and the numerical values of $\lambda_1$ and $\lambda_{114}$ are -2.8931 and 5.6938, respectively. This means that the density of $u_{ak}$ exists if $\rho_{ak}$ is in (-0.3457, 0.1756).

For the remaining priors, we assume the following. $\theta_{ajk}$ is normal with mean $\mu_{ajk}$ and variance $\tau_{ajk}$. Let $\rho_{ak}$ be uniformly distributed on the interval $(\lambda_1^{-1}, \lambda_I^{-1})$. Finally, a common prior for the variance components $\delta^{(e)}_{ak}$ and $\delta^{(u)}_{ak}$ is inverse gamma (Gelman, Carlin, Stern and Rubin 1995), whose densities are proportional to

$$\frac{1}{(\delta^{(e)}_{ak})^{\alpha^{(e)}_{ak}+1}}\exp\left(-\frac{\beta^{(e)}_{ak}}{\delta^{(e)}_{ak}}\right) \quad \text{and} \quad \frac{1}{(\delta^{(u)}_{ak})^{\alpha^{(u)}_{ak}+1}}\exp\left(-\frac{\beta^{(u)}_{ak}}{\delta^{(u)}_{ak}}\right),$$

respectively.

To evaluate the posterior distribution, we apply MCMC methods such as Gibbs sampling (Gelfand and Smith 1990) to obtain samples from the posterior distribution. For an overview of MCMC methodologies see Gelman et al. (1995), Gilks, Richardson and Spiegelhalter (1996) and Robert and Casella (1999). The full conditional distributions required for this evaluation may be found directly from those given in Fact 1 in Section 3. Most of these conditional distributions ($\theta_{ak}$, $u_{ak}$, $\delta^{(e)}_{ak}$, $\delta^{(u)}_{ak}$) are of standard forms and are easily sampled from. The other full conditional distributions ($\eta_{ik}$, $\omega_{ik}$ and $\rho_{ak}$) have log-concave densities for the MTHS. The adaptive rejection sampling method of Gilks and Wild (1992) can be used to generate random samples from log-concave densities.

2.3 Bayesian estimation and prediction

At this point, we have obtained estimates of $(p_{ijk}, N_{ijk})$. We wish to pool the estimates from the strata together in order to estimate $p_{ij}$ and $N_{ij}$. We will also predict the unobserved number of harvests in county $i$, denoted $h_i$. To obtain the estimates of $(p_{ij}, N_{ij})$, let $(p^{(l)}_{ijk}, N^{(l)}_{ijk})$, $l = 1, \ldots, L$, be the output from the sampled Gibbs chain after the burn-in sample. Define

$$p^{(l)}_{ij} = \frac{\sum_{k=1}^{K} p^{(l)}_{ijk} N^{(l)}_{ijk}}{\sum_{k=1}^{K} N^{(l)}_{ijk}}, \quad l = 1, \ldots, L.$$

The posterior mean and variance of $p_{ij}$ can then be approximated by

$$\hat{E}(p_{ij}) = \frac{1}{L}\sum_{l=1}^{L} p^{(l)}_{ij} \quad (4)$$

and

$$\hat{V}(p_{ij}) = \frac{1}{L-1}\sum_{l=1}^{L}\{p^{(l)}_{ij}\}^2 - \frac{L}{L-1}\{\hat{E}(p_{ij})\}^2, \quad (5)$$

respectively. Similarly define $N^{(l)}_{ij} = \sum_{k=1}^{K} N^{(l)}_{ijk}$. The posterior mean and variance of $N_{ij}$ can be approximated from the MCMC simulation output as well.

We now focus on Bayesian prediction of the unobserved harvest by its posterior predictive distribution. We have that $y_{ijk}$ represents the number of harvested turkeys in county $i$, time period $j$ and sampling stratum $k$ for those in the sample, whereas for those not in the sample, we let the number of harvested turkeys in county $i$, during time $j$, stratum $k$, be represented as $y^{*}_{ijk}$. Thus, the number of turkeys harvested in county $i$ at time $j$ for hunters from stratum $k$ is $h_{ijk} = y_{ijk} + y^{*}_{ijk}$. Here $y_{ijk}$ is a known value and we need only find $y^{*}_{ijk}$. We may think of $y^{*}_{ijk}$ given $(n_{ijk}, N_{ijk}, p_{ijk})$ as a binomial random variable in the form of (1). Thus

$$(y^{*}_{ijk} \mid n_{ijk}, N_{ijk}, p_{ijk}) \stackrel{\text{ind}}{\sim} \text{Binomial}(N_{ijk} - n_{ijk},\ p_{ijk}). \quad (6)$$

Let $(p^{(l)}_{ijk}, N^{(l)}_{ijk})$, $l = 1, \ldots, L$, be the output from running a Gibbs chain after a burn-in sample. The predictive mean of $y^{*}_{ijk}$ given data $d = \{(y_{ijk}, n_{ijk}) : i = 1, \ldots, I,\ j = 1, \ldots, J,\ k = 1, \ldots, K\}$ is then
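A minimal sketch of the trip-weighted pooling and the moment approximations (4)-(5), using made-up draws in place of the actual Gibbs output for one (county, period) cell:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical MCMC draws standing in for p_ijk^(l), N_ijk^(l): L draws,
# K = 2 strata (values are fabricated, not the MTHS chain).
L, K = 1000, 2
p_draws = rng.beta(2, 15, size=(L, K))
N_draws = 1 + rng.poisson([600.0, 200.0], size=(L, K))

# Pool across strata, weighting each stratum by its drawn trip total.
p_ij = (p_draws * N_draws).sum(axis=1) / N_draws.sum(axis=1)
E_hat = p_ij.mean()                                          # eq. (4)
V_hat = (p_ij**2).sum() / (L - 1) - L / (L - 1) * E_hat**2   # eq. (5)
print(E_hat, V_hat)
```

Equation (5) is algebraically the usual sample variance of the pooled draws, so it can be cross-checked against `numpy.var` with `ddof=1`.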


$$\hat{E}(y^{*}_{ijk} \mid d) = \frac{1}{L}\sum_{l=1}^{L}(N^{(l)}_{ijk} - n_{ijk})\,p^{(l)}_{ijk}. \quad (7)$$

Finally we write

$$\hat{h}_{ijk} = y_{ijk} + \hat{E}(y^{*}_{ijk} \mid d), \qquad \hat{h}_i = \sum_{j=1}^{J}\sum_{k=1}^{K}\hat{h}_{ijk}. \quad (8)$$

The variability of $h_i$ is given by

$$\hat{V}(h_i \mid d) = \frac{1}{L-1}\sum_{l=1}^{L}\sum_{j=1}^{J}\sum_{k=1}^{K}(N^{(l)}_{ijk} - n_{ijk})\,p^{(l)}_{ijk}(1 - p^{(l)}_{ijk}) + \frac{1}{L-1}\sum_{l=1}^{L}\left\{\sum_{j=1}^{J}\sum_{k=1}^{K}\big[y_{ijk} + (N^{(l)}_{ijk} - n_{ijk})\,p^{(l)}_{ijk}\big] - \hat{h}_i\right\}^2.$$

The proof is a straightforward application of conditional expectations. Note that $h_{ijk}$ is defined as a random variable. It does not equal $N_{ijk}p_{ijk}$, just as $y_{ijk}$ does not equal $n_{ijk}p_{ijk}$. Therefore, one should not simply use $\hat{N}_{ijk}\hat{p}_{ijk}$ to estimate $h_{ijk}$, although it might be very close if $N_{ijk}$ is large enough.

2.4 Model fitting

Fifteen thousand iterations were run for the Gibbs chain, of which 5,000 were discarded as a burn-in sequence. Thus, posterior estimates are based on 10,000 autocorrelated samples from the posterior distribution. In monitoring the convergence of Gibbs sampling, we have used the diagnostics of Heidelberger and Welch (1983) as well as graphical monitoring of the sample paths. Convergence diagnostics and posterior summaries were performed with the BOA software (Smith 2005). We let the variance components $\delta^{(e)}_{ak}$ and $\delta^{(u)}_{ak}$ follow a non-informative IG(2, 1) prior, which gives a mean of one and infinite variance. We give data-dependent priors $\theta_{1jk} \sim N(-1.5, 16)$ and $\theta_{2jk} \sim N(4, 25)$. To obtain a data-dependent prior, we followed two steps, because a completely arbitrary data-dependent prior might lead to unreliable posterior estimates (Wasserman 2000). First, we began with noninformative (but conjugate) priors to obtain the posterior means and standard deviations for each of these parameters through MCMC simulations. We then set the data-dependent prior means to be close to the posterior means of $\theta_{1jk}$ and $\theta_{2jk}$, and the data-dependent prior standard deviations to be about ten times the posterior standard deviations obtained under the noninformative priors. The estimates using these two approaches were quite similar, but the model using the data-dependent priors gave smaller variances.

We fit many simplified models for comparison and model checking. As a model selection tool we use the Deviance Information Criterion (DIC) suggested by Spiegelhalter, Best, Carlin and van der Linde (2002). DIC is a generalization of the Akaike Information Criterion that measures the deviance over an MCMC run and includes a penalized fit measure taking into account the model dimension. Smaller values of DIC indicate a better-fitting model. Our model with data-dependent priors had DIC = 13,910.5. No alternate models had a significantly reduced DIC value from this model. The model with non-informative priors on $\theta_{1jk}$ and $\theta_{2jk}$ had DIC = 14,700.4. A reduced model with common correlation parameters $\rho_{11} = \rho_{12} = \rho_{13} = \rho_{14} = \rho_{15}$ and $\rho_{21} = \rho_{22} = \rho_{23} = \rho_{24} = \rho_{25}$ had DIC = 13,896.9.

As another model check, we have calculated statewide averages of the estimates using both the simple naive design-based estimates and the Bayesian estimates, which follow in Table 2. At the statewide level, sample sizes are large enough to consider the design-based estimates reliable. The statewide design-based estimates and model estimates match closely. Thus, the Bayes estimator performs well in terms of the design consistency property (see You and Rao 2003 and Jiang and Lahiri 2006). We note that the first day estimates are slightly lower due to smoothing, but the success rate estimates are still much higher for the first day than for any other time period.

Table 2
Success rate estimates for 1998 MTHS

Design Stratum  Period   Kills    Trips   Design-based   Bayesian
Non-Resident    Day 1      105      424      0.248         0.236
                Week 1     271    1,759      0.154         0.155
                Week 2     198    1,481      0.134         0.132
                Week 3      75      635      0.118         0.120
North           Day 1       82      452      0.181         0.171
                Week 1     211    1,535      0.138         0.137
                Week 2     161    1,663      0.097         0.097
                Week 3      99    1,129      0.088         0.090
South           Day 1      138      751      0.184         0.178
                Week 1     224    2,475      0.091         0.092
                Week 2     186    2,375      0.078         0.069
                Week 3      90    1,748      0.052         0.053
St. Louis       Day 1       54      346      0.156         0.149
                Week 1      91    1,280      0.071         0.073
                Week 2      85    1,257      0.068         0.069
                Week 3      45      828      0.054         0.059
Kansas City     Day 1       52      262      0.199         0.181
                Week 1      95      894      0.106         0.108
                Week 2      92      942      0.098         0.099
                Week 3      55      611      0.090         0.093
Overall         Day 1      431    2,235      0.193         0.181
                Week 1     892    7,943      0.112         0.111
                Week 2     722    7,718      0.094         0.088
                Week 3     364    4,951      0.074         0.075
Total                    2,409   22,847      0.105         0.102
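The harvest predictor in (7)-(8) is just an average over retained draws; a self-contained sketch with fabricated draws for one county (the real inputs would be the MTHS Gibbs output):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical draws for one county i: L draws over J = 4 periods and
# K = 5 strata, standing in for the retained Gibbs output.
L, J, K = 500, 4, 5
N_draws = rng.poisson(300, size=(L, J, K))   # N_ijk^(l)
p_draws = rng.beta(2, 15, size=(L, J, K))    # p_ijk^(l)
n_obs = np.full((J, K), 20)                  # observed sampled trips n_ijk
y_obs = np.full((J, K), 3)                   # observed sampled harvests y_ijk

# Eq. (7): predictive mean of the unsampled harvest y*_ijk.
E_y_star = ((N_draws - n_obs) * p_draws).mean(axis=0)
# Eq. (8): predicted county harvest = observed + predicted unsampled.
h_hat = (y_obs + E_y_star).sum()
print(h_hat)
```

Because the predictor adds the expected unsampled harvest to the observed count, it never falls below the harvest already reported by sampled hunters.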


2.5 Data analysis for MTHS

Posterior means and standard deviations for the parameters under all design strata are listed in Table 3. We note that the mean and variance estimates of $\theta_{12k}$ and $\theta_{13k}$, as well as $\theta_{22k}$ and $\theta_{23k}$, for $k = 1, \ldots, 5$, are approximately equal. This gives reason to believe that success rate estimates and hunting pressure estimates are similar for weeks 1 and 2 of the hunting season. There remains a difference in the first day and week 3 of the hunting season, though (Vangilder, Sheriff and Olsen 1990, Kimmel 2001).

Table 3
Posterior means (standard deviations) of model parameters

Parameter   Non-Res          North            South            St. Louis        K.C.
δ̂_e1        0.133 (0.034)    0.117 (0.030)    0.136 (0.036)    0.164 (0.051)    0.143 (0.040)
δ̂_e2        0.096 (0.017)    0.044 (0.007)    0.041 (0.006)    0.061 (0.011)    0.060 (0.011)
δ̂_z1        0.146 (0.040)    0.148 (0.041)    0.140 (0.038)    0.166 (0.053)    0.186 (0.059)
δ̂_z2        0.570 (0.092)    0.511 (0.081)    0.633 (0.094)    0.373 (0.062)    0.356 (0.063)
θ̂_11       -1.209 (0.131)   -1.530 (0.130)   -1.244 (0.135)   -1.420 (0.159)   -1.472 (0.160)
θ̂_12       -1.613 (0.109)   -1.748 (0.111)   -1.867 (0.130)   -1.983 (0.147)   -1.928 (0.150)
θ̂_13       -1.801 (0.113)   -2.081 (0.113)   -2.152 (0.129)   -2.032 (0.148)   -2.007 (0.148)
θ̂_14       -1.830 (0.132)   -2.135 (0.123)   -2.354 (0.138)   -2.115 (0.156)   -2.008 (0.153)
θ̂_21        3.368 (0.098)    3.333 (0.098)    3.297 (0.087)    3.305 (0.088)    3.308 (0.098)
θ̂_22        4.582 (0.097)    4.361 (0.093)    4.371 (0.084)    4.349 (0.081)    4.238 (0.090)
θ̂_23        4.404 (0.097)    4.429 (0.093)    4.459 (0.088)    4.293 (0.083)    4.229 (0.093)
θ̂_24        3.693 (0.101)    4.093 (0.095)    4.039 (0.086)    3.934 (0.082)    3.891 (0.094)
ρ̂_1         0.153 (0.019)    0.103 (0.071)    0.163 (0.011)    0.158 (0.016)    0.125 (0.068)
ρ̂_2         0.168 (0.007)    0.172 (0.003)    0.171 (0.004)    0.172 (0.004)    0.172 (0.004)

We also point out the correlation parameter estimates $\hat{\rho}_{1k}$ and $\hat{\rho}_{2k}$. The estimates of $\rho_{2k}$ for the hunting pressure are all near their upper boundary, suggesting a strong relationship between the counties when estimating the number of hunting trips taken. Most of the estimates of $\rho_{1k}$ are more than two standard deviations away from zero as well.

The simple naive design-based success rate estimates for each county are in the first row of Figure 2. Design-based estimates in the counties range from 0 to 0.55, with some counties having no estimate because $n_{ij} = 0$. Alternatively, the Bayesian estimates of the success rate in county $i$ for time $j$ are plotted in the second row of Figure 2. The Bayesian model success rate estimates produce a much more sensible range, from 0.03 to 0.30. We don't expect any county to have a success rate estimate of 0. Also, turkeys are quite difficult to hunt and a high success rate is not sensible (He and Sun 1998, He and Sun 2000, Oleson and He 2004). Thus a lower value, such as that produced by the model, makes more intuitive sense. Standard deviations for the model success rate estimates are given in the third row of Figure 2.

Success rate estimates tend to decrease over the course of the hunting season. In addition, the highest success rate estimates are in the northern portion of Missouri. This was also shown to be true in previous analyses of the MTHS using 1996 data (He and Sun 2000, Oleson and He 2004). We note that the highest success rate estimates are not in the same counties, though. The highest estimated rates from 1998 have moved slightly to the east of where they were in 1996. This shift is due to a time trend as noted by conservationists.

We also produce estimates of the population size $N_{ijk}$. The model-based estimates of $N_{ij}$ are plotted in the first row of Figure 3. The standard errors for the model-based estimates are plotted in the second row of Figure 3. The values plotted for weeks 1, 2, and 3 are the daily averages for those weeks. It is apparent that more people were hunting on the first day than on any other day of the hunting season.

We plot the actual check station data as the first map in Figure 4. Using formula (8) we predict the model-based total harvest $h_i$ in the second map of the first row. From these figures, the model-based predictions look to be reasonably accurate. We expect to see a small amount of overprediction, as is shown by comparing the plots. Some hunters may underestimate how often they went out to hunt and overestimate the number of birds they harvested. The overcount may possibly also be attributed to turkeys harvested but not reported at a check station. One more explanation could be that the hunters returning the survey are those who were more successful, and those not returning the survey were less successful or did not hunt. Comparing the predictions from the model to the actual harvest numbers is a confirmation that the model is appropriate and gives accurate predictions. Also, many states do not have check stations, and this shows that they could obtain appropriate estimates at the domain level even with a small sample at the domain level.
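The "simple naive" design-based estimates appear to be kills divided by trips; a few rows of the design-based column of Table 2 can be reproduced this way (a couple of published entries differ in the last digit, presumably from rounding):

```python
# Recompute selected design-based entries of Table 2 as kills / trips.
rows = [
    ("Non-Resident, Day 1",  105,   424, 0.248),
    ("North, Day 1",          82,   452, 0.181),
    ("Overall, Day 1",       431,  2235, 0.193),
    ("Total",               2409, 22847, 0.105),
]
for label, kills, trips, published in rows:
    est = kills / trips
    assert round(est, 3) == published  # matches the published value
    print(f"{label}: {est:.3f}")
```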


Figure 2 Hunting success rates from stratified 1998 MTHS (maps for Day 1, Week 1, Week 2 and Week 3). Row 1 - design-based estimates of success rates. Row 2 - Bayesian estimates of success rates. Row 3 - standard deviations of Bayesian estimates of success rates

Figure 3 Estimated number of hunting trips from stratified 1998 MTHS (maps for Day 1, Week 1, Week 2 and Week 3). Row 1 - Bayesian estimated number of trips. Row 2 - standard deviations of Bayesian estimates of number of trips


Figure 4 1998 check station data, Bayesian prediction, scatterplot of check station data by Bayesian prediction, and standard errors of the Bayesian prediction

3. General formulae for a bivariate GLMM

3.1 A general model

We ignore the time component in the general model for simplicity. Let $n_{ik}$ and $Y_{ik}$ be the sample sizes and response variables of interest in study domain $i$ and design stratum $k$, respectively. We assume that $\{Y_{ik}, n_{ik}\}$, given unknown parameters $\{\eta_{ik}, \omega_{ik}, \phi_1, \phi_2\}$, with $i = 1, \ldots, I$ and $k = 1, \ldots, K$, are independent. Assume that the conditional density (mass) function of $Y_{ik}$ given $n_{ik}$ belongs to the following family of probability densities,

$$g_1(y_{ik} \mid \eta_{ik}, n_{ik}, \phi_1) = \exp[A_1(\phi_1)\{y_{ik}\eta_{ik} - B_1(\eta_{ik}, n_{ik})\} + C_1(y_{ik}, n_{ik}, \phi_1)], \quad (9)$$

where $\eta_{ik}$ is an unknown parameter, and it is often assumed the scale parameter $\phi_1$ is known. The probability density function of $n_{ik}$ is in the family

$$g_2(n_{ik} \mid \omega_{ik}, \phi_2) = \exp[A_2(\phi_2)\{n_{ik}\omega_{ik} - B_2(\omega_{ik})\} + C_2(n_{ik}, \phi_2)], \quad (10)$$

where $\omega_{ik}$ is an unknown parameter equal to a function of the population size $N_{ik}$, and $\phi_2$ is often known. The joint density of $Y_{ik}$ and $n_{ik}$, a bivariate GLMM, is then

$$p(y_{ik}, n_{ik} \mid \eta_{ik}, \omega_{ik}, \phi_1, \phi_2) = g_1(y_{ik} \mid \eta_{ik}, n_{ik}, \phi_1)\, g_2(n_{ik} \mid \omega_{ik}, \phi_2). \quad (11)$$

The distribution family (10) is often called a generalized linear model, which includes binomial, Poisson, normal and


gamma distributions. See, for example, Sun et al. (2000). The distribution family (9) is a generalization of such a generalized linear model by including an additional parameter. Four special cases of (9) are the binomial, Poisson, normal and gamma distributions, which are all part of the exponential family.

The bivariate GLMM is applicable when estimates at the study domains are of interest and the sample size $n_{ik}$ is considered random as well as being part of the observation. The bivariate GLMM is also useful when estimates of $N_{ik}$ are required.

A linear mixed model can be used as a prior for $\eta_{ik}$ to account for the variability in $\eta_{ik}$. However, one might be interested in both $\eta_{ik}$ and $\omega_{ik}$, or a function of $\eta_{ik}$ and $\omega_{ik}$, where $\omega_{ik}$ is often a function of the population size. Here we would need to model $\eta_{ik}$ and $\omega_{ik}$ simultaneously. A general class of GLMMs for $\eta_{ik}$ and $\omega_{ik}$ could be

$$h_1(\eta_k) = X_1\theta_{1k} + S_1 u_{1k} + e_{1k}, \quad (12)$$
$$h_2(\omega_k) = X_2\theta_{2k} + S_2 u_{2k} + e_{2k}. \quad (13)$$

With $a = 1, 2$, $X_a = \{x_{aik}\}$ and $S_a = \{s_{aik}\}$ are known design matrices. The vector $\theta_{ak}$ is the vector of fixed effects, $u_{ak}$ is the vector of random effects, and the $e_{aik}$ are independent residual effects with $e_{aik} \sim N(0, \delta^{(e)}_{ak})$. In addition, $u_{ak}$ and $e_{aik}$ are assumed mutually independent.

3.2 Additional priors

To complete the Bayesian hierarchical model, we need to specify the priors for $(\theta_{ak}, u_{ak}, \delta^{(e)}_{ak})$, $a = 1, 2$, $k = 1, \ldots, K$. The common prior for the fixed effects $\theta_{ak}$ is normal with a large variance, or a constant prior. Random effects are often spatially correlated. The density may be of the CAR form, whose joint density is given by equation (3) with $B_{ak} = I - \rho_{ak}C$. Finally, a common prior for the variance components $\delta^{(e)}_{ak}$ is an inverse gamma distribution.

To evaluate the posterior distribution, MCMC methods, such as Gibbs sampling, may be used to obtain samples from the posterior distribution. We give the full conditional distributions in the general case below.

Fact 1 Let $(\Omega \mid \cdot)$ represent the conditional distribution of $\Omega$ given all other parameters and $[\Omega \mid \cdot]$ represent the conditional density. The conditional posterior densities of $\eta_{ik}$ and $\omega_{ik}$ are as follows.

i) $[\eta_{ik} \mid \cdot] \propto \exp\big(A_1(\phi_1)[y_{ik}\eta_{ik} - B_1(\eta_{ik}, n_{ik})] - \frac{1}{2\delta^{(e)}_{1k}}[h_1(\eta_{ik}) - x_{1i}\theta_{1k} - S_{1i}u_{1k}]^2\big)\, h_1'(\eta_{ik})$, or equivalently $v_{1ik} = h_1(\eta_{ik})$ has the conditional density

$$[v_{1ik} \mid \cdot] \propto \exp\Big(A_1(\phi_1)[y_{ik}h_1^{-1}(v_{1ik}) - B_1(h_1^{-1}(v_{1ik}), n_{ik})] - \frac{1}{2\delta^{(e)}_{1k}}(v_{1ik} - x_{1i}\theta_{1k} - S_{1i}u_{1k})^2\Big).$$

ii) $[\omega_{ik} \mid \cdot] \propto \exp\big(A_2(\phi_2)[n_{ik}\omega_{ik} - B_2(\omega_{ik})] - \frac{1}{2\delta^{(e)}_{2k}}[h_2(\omega_{ik}) - x_{2i}\theta_{2k} - S_{2i}u_{2k}]^2\big)\, h_2'(\omega_{ik})$, or equivalently $v_{2ik} = h_2(\omega_{ik})$ has the conditional density

$$[v_{2ik} \mid \cdot] \propto \exp\Big(A_2(\phi_2)[n_{ik}h_2^{-1}(v_{2ik}) - B_2(h_2^{-1}(v_{2ik}))] - \frac{1}{2\delta^{(e)}_{2k}}(v_{2ik} - x_{2i}\theta_{2k} - S_{2i}u_{2k})^2\Big).$$

For $a = 1, 2$, we have the following conditional distributions.

iii) $(\theta_{ak} \mid \cdot)$ are mutually independent and have conditional posterior distributions

$$N\bigg(\Big[\frac{1}{\tau_{ak}}I + \frac{1}{\delta^{(e)}_{ak}}X_a^t X_a\Big]^{-1}\Big[\frac{1}{\delta^{(e)}_{ak}}X_a^t(v_{ak} - S_a u_{ak}) + \frac{\mu_{ak}}{\tau_{ak}}\Big],\ \Big[\frac{1}{\tau_{ak}}I + \frac{1}{\delta^{(e)}_{ak}}X_a^t X_a\Big]^{-1}\bigg).$$

iv) $(u_{ak} \mid \cdot)$ are mutually independent and have conditional posterior distributions

$$N\bigg(\Big[\frac{1}{\delta^{(e)}_{ak}}S_a^t S_a + \frac{1}{\delta^{(u)}_{ak}}B_{ak}\Big]^{-1}\frac{1}{\delta^{(e)}_{ak}}S_a^t(v_{ak} - X_a\theta_{ak}),\ \Big[\frac{1}{\delta^{(e)}_{ak}}S_a^t S_a + \frac{1}{\delta^{(u)}_{ak}}B_{ak}\Big]^{-1}\bigg).$$

v) $(\delta^{(e)}_{ak} \mid \cdot)$ have posterior distributions

$$\text{Inverse Gamma}\Big(\alpha^{(e)}_{ak} + I/2,\ \beta^{(e)}_{ak} + \tfrac{1}{2}(v_{ak} - X_a\theta_{ak} - S_a u_{ak})^t(v_{ak} - X_a\theta_{ak} - S_a u_{ak})\Big).$$

vi) $(\delta^{(u)}_{ak} \mid \cdot) \sim \text{Inverse Gamma}\big(\alpha^{(u)}_{ak} + I/2,\ \beta^{(u)}_{ak} + \tfrac{1}{2}u_{ak}^t B_{ak} u_{ak}\big)$.

vii) $[\rho_{ak} \mid \cdot] \propto |B_{ak}|^{1/2}\exp\Big(-\frac{u_{ak}^t B_{ak} u_{ak}}{2\delta^{(u)}_{ak}}\Big)$.

viii) If $\phi_1$ has the prior density $p(\phi_1)$, then

$$[\phi_1 \mid \cdot] \propto p(\phi_1)\exp\big(A_1(\phi_1)[y_{ik}h_1^{-1}(v_{1ik}) - B_1(h_1^{-1}(v_{1ik}), n_{ik})] + C_1(y_{ik}, n_{ik}, \phi_1)\big).$$

ix) If $\phi_2$ has the prior density $p(\phi_2)$, then

Statistics Canada, Catalogue No. 12­001­X Survey Methodology, December 2007 183

$$[\phi_2 \mid \cdot] \propto p(\phi_2)\exp\Big( \sum_{i,k}\big\{ A_2(\phi_2)\big[n_{ik}h_2^{-1}(v_{2ik}) - B_2(h_2^{-1}(v_{2ik}))\big] + C_2(n_{ik}, \phi_2) \big\} \Big).$$

Examining parts i), ii), iii) and vii) of Fact 1, the conditional densities of $\eta_{ik}$, $\omega_{ik}$ and $\rho_{ak}$ are often log-concave.

3.3 Estimating quantities of study domains

It is often of interest to estimate certain quantities related to the study domains. For instance, to estimate a quantity at domain $i$, let

$$\psi_i = f_i(\eta_{i1}, \omega_{i1}; \ldots; \eta_{iK}, \omega_{iK}).$$

Bayesian estimates of $\psi_i$ can be easily computed based on a random sample from the joint posterior. For example, let $(\eta^{(l)}_{ik}, \omega^{(l)}_{ik})$, $l = 1, \ldots, L$ and $k = 1, \ldots, K$, be the output from MCMC simulations, and define

$$\psi^{(l)}_i = f_i(\eta^{(l)}_{i1}, \omega^{(l)}_{i1}; \ldots; \eta^{(l)}_{iK}, \omega^{(l)}_{iK}).$$

Given $y = \{y_{ik}, i = 1, \ldots, I, k = 1, \ldots, K\}$, the posterior mean and variance of $\psi_i$, the general forms of (4) and (5), can be approximated by

$$\hat E(\psi_i \mid y) = \frac{1}{L}\sum_{l=1}^{L}\psi^{(l)}_i$$

and

$$\hat V(\psi_i \mid y) = \frac{1}{L-1}\sum_{l=1}^{L}\{\psi^{(l)}_i\}^2 - \frac{L}{L-1}\{\hat E(\psi_i \mid y)\}^2,$$

respectively.

3.4 Predicting quantities of study domains

Now let $N_{ik}$ be the population size and $Y^*_{ik}$ the response value of interest for those not in the sample from study domain $i$ and design stratum $k$. We wish to predict the quantities $Y^*_i = \sum_{k=1}^{K} Y^*_{ik}$, the total response value in study domain $i$. For given $n_{ik}$, $Y^*_{ik}$ should be of the same family as $Y_{ik}$, such that

$$g_3(y^*_{ik} \mid \eta_{ik}, n_{ik}, N_{ik}, \phi_1) = \exp\big( A_1(\phi_1)\{ y^*_{ik}\eta_{ik} - B_1(\eta_{ik}, N_{ik} - n_{ik}) \} + C_1(y^*_{ik}, N_{ik} - n_{ik}, \phi_1) \big).$$

This is simply (9) with $n_{ik}$ replaced by $N_{ik} - n_{ik}$. To simplify notation, let $\xi$ denote the parameters in the model and $d$ represent the data. (In our case here $d = \{(y_{ik}, n_{ik}): i = 1, \ldots, I, k = 1, \ldots, K\}$, and $\xi$ might include the parameters in modeling both $y_{ik}$ and $n_{ik}$.) Under the prior $\pi(\xi)$ (either proper or improper), we have the posterior density $[\xi \mid d] \propto f(d \mid \xi)\pi(\xi)$.

Assume that a further observation $y^*_{ik}$ follows $g_3(y^*_{ik} \mid \xi, d)$ that may be dependent on $d$. The predictive density of $y^*_{ik}$ given $d$ is written as

$$[y^*_{ik} \mid d] = \int_{\xi} g_3(y^*_{ik} \mid \xi, d)\,[\xi \mid d]\, d\xi.$$

Under the squared error loss function, the best predictor is then the predictive mean given by

$$E(y^*_{ik} \mid d) = \int_{\xi}\Big( \int_{y^*_{ik}} y^*_{ik}\, g_3(y^*_{ik} \mid \xi, d)\, dy^*_{ik} \Big)[\xi \mid d]\, d\xi.$$

Similarly, the best predictor of $h(y^*_{i1}, \ldots, y^*_{iK})$ given $d$ is

$$E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid d\} = \int_{\xi} E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi, d\}\,[\xi \mid d]\, d\xi, \quad (14)$$

where

$$E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi, d\} = \int_{(y^*_{i1}, \ldots, y^*_{iK})} h(y^*_{i1}, \ldots, y^*_{iK}) \prod_{k=1}^{K}[y^*_{ik} \mid \xi, d]\; dy^*_{i1}\cdots dy^*_{iK}. \quad (15)$$

For the distribution family $g_3(y^* \mid \xi, d)$, the right hand side of (15) often has a closed form expression.

Bayesian predictions of (14) can be easily computed based on a random sample from the joint posterior of $\xi$. For example, let $\xi^{(l)}$, $l = 1, \ldots, L$, be the output from MCMC simulations. The posterior predictive mean (14) can then be approximated by

$$\hat E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid d\} = \frac{1}{L}\sum_{l=1}^{L} E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi^{(l)}, d\}.$$

The posterior predictive variance of $h(y^*_{i1}, \ldots, y^*_{iK})$ given $d$ may be written as

$$V\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid d\} = E[V\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi, d\} \mid d] + V[E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi, d\} \mid d].$$

This predictive variance can be approximated by

$$\hat V\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid d\} = \frac{1}{L}\sum_{l=1}^{L} V\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi^{(l)}, d\} + \frac{1}{L}\sum_{l=1}^{L}\big[ E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid \xi^{(l)}, d\} - \hat E\{h(y^*_{i1}, \ldots, y^*_{iK}) \mid d\} \big]^2.$$

4. Comments

In this article, we developed a bivariate GLMM containing two unknown canonical parameters. This was necessary to obtain estimates when the sample size was random and estimates of $N_{ik}$ were required in addition to estimates of $\eta_{ik}$. The model was built using two simultaneous GLMMs in a Bayesian hierarchical structure.
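As a concrete illustration of the Monte Carlo approximations in sections 3.3 and 3.4 above: $\hat E(\psi_i \mid y)$ and $\hat V(\psi_i \mid y)$ are just the sample mean and the two-term form of the ddof-1 sample variance of the MCMC draws. A minimal numpy sketch (simulated normal draws stand in for real MCMC output, which is an assumption for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 5000
psi = rng.normal(loc=2.0, scale=0.5, size=L)  # stand-in draws psi_i^(l)

# E-hat: (1/L) * sum of psi^(l)
E_hat = psi.mean()
# V-hat: (1/(L-1)) * sum psi^2 - (L/(L-1)) * E_hat^2, as in section 3.3
V_hat = (psi**2).sum() / (L - 1) - L / (L - 1) * E_hat**2

# The two-term form is algebraically the usual unbiased sample variance.
assert np.isclose(V_hat, psi.var(ddof=1))
```

The same pattern, averaging $E\{h(\cdot) \mid \xi^{(l)}, d\}$ and $V\{h(\cdot) \mid \xi^{(l)}, d\}$ over the draws $\xi^{(l)}$, gives the posterior predictive mean and variance approximations of section 3.4.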

Oleson, He, Sun and Sheriff: Bayesian estimation in small areas when design strata differ from domains

The proposed model has the advantage of being applicable to a wide array of problems. Introducing a random sample size and estimating the population sizes are useful techniques for many applications.

Naturally, we think that there is an inverse relationship between hunting success rates and the number of trips taken to a county. In modeling this, we may think of a two-fold CAR model to view the relationship between $\eta_{ik}$ and $N_{ik}$, such as that of Kim, Sun and Tsutakawa (2001). A multivariate CAR model may be another approach to address this situation.

For each of the spatial models, we have assumed a common correlation, $\rho_k$, across the entire state. It may be more reasonable to include additional correlation terms in different productivity regions defined by the Missouri Department of Conservation. This would be an interesting and complicated addition to the model. In addition, the spatial structure used in this paper is similar to that of He and Sun (2000) and Oleson and He (2004), where this spatial modeling worked well.

It may be useful to include the distance from the hunter's home to the hunting location to help estimate hunting pressure. Most hunters stay close to home when hunting, and this information could be incorporated into the hierarchical framework.

Note that the estimated harvest is higher than the check station harvest. This is partly because more successful hunters tend to reply to the mail survey. We are conducting research adjusting for the nonresponse bias.

Acknowledgements

The work was partially supported by Federal Aid in Wildlife Restoration Project W-13-R. Sun's research was supported by the National Science Foundation grant SES-0351523, and NIH grant R01-CA100760. The authors would like to thank Dr. Larry Vangilder and Mr. Jeff Beringer of the Missouri Department of Conservation for their helpful comments. The authors wish to express deep appreciation to the editors, an associate editor, and two referees for constructive comments on earlier versions of the paper.

References

Agresti, A. (2002). Categorical Data Analysis. New York: John Wiley & Sons, Inc.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B, 36, 192-236.

Carlin, B.P., and Louis, T.A. (2000). Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall Ltd.

Clayton, D., and Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics, 43, 671-681.

Cochran, W.G. (1977). Sampling Techniques, Third Edition. New York: John Wiley & Sons, Inc.

Das, K., Jiang, J. and Rao, J.N.K. (2004). Mean squared error of empirical predictor. The Annals of Statistics, 32, 818-840.

Durbin, J. (1969). Inferential aspects of the randomness of sample size in survey sampling. In New Developments in Survey Sampling, (Eds., N.L. Johnson and H.J. Smith). New York: Wiley-Interscience, 629-651.

Gelfand, A.E., and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.

Gelman, A., Carlin, J., Stern, H. and Rubin, D. (1995). Bayesian Data Analysis. London: Chapman & Hall.

Ghosh, M., Natarajan, K., Stroud, T. and Carlin, B.P. (1998). Generalized linear models for small-area estimation. Journal of the American Statistical Association, 93, 273-282.

Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.

Gilks, W.R., and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.

He, C.Z., and Sun, D. (1998). Hierarchical Bayes estimation of hunting success rates. Environmental and Ecological Statistics, 5, 223-236.

He, C.Z., and Sun, D. (2000). Hierarchical Bayes estimation of hunting success rates with spatial correlations. Biometrics, 56, 360-367.

Heidelberger, P., and Welch, P. (1983). Simulation run length control in the presence of an initial transient. Operations Research, 31, 1109-1144.

Jiang, J., and Lahiri, P. (2006). Mixed model prediction and small area estimation. Test, 15, 1-96.

Jiang, J., Lahiri, P. and Wan, S.-M. (2002). A unified jackknife theory for empirical best prediction with M-estimation. The Annals of Statistics, 30, 1782-1810.

Kim, H., Sun, D. and Tsutakawa, R.K. (2001). A bivariate Bayes method for improving the estimates of mortality rates with a twofold conditional autoregressive model. Journal of the American Statistical Association, 96, 1506-1521.

Kimmel, R.O. (2001). Regulating spring wild turkey hunting based on population and hunting quality. In Proceedings of the National Wild Turkey Symposium, 8, 243-250.

Lohr, S.L. (1999). Sampling: Design and Analysis. Duxbury Press.

Malec, D., Sedransk, J., Moriarity, C.L. and LeClere, F.B. (1997). Small area inference for binary variables in the National Health Interview Survey. Journal of the American Statistical Association, 92, 815-826.


Oleson, J.J., and He, C.Z. (2004). Space-time modeling for the Missouri Turkey Hunting Survey. Environmental and Ecological Statistics, 11, 85-101.

Prasad, N.G.N., and Rao, J.N.K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-171.

Rao, J.N.K. (2003). Small Area Estimation. Wiley-Interscience.

Rao, J.N.K. (2005). Inferential issues in small area estimation: Some new developments. Statistics in Transition, 7, 513-526.

Robert, C.P., and Casella, G. (1999). Monte Carlo Statistical Methods. Springer-Verlag.

Smith, B.J. (2005). Bayesian Output Analysis program (BOA), version 1.1.5. http://www.public-health.uiowa.edu/boa.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583-616.

Sun, D., Speckman, P.L. and Tsutakawa, R.K. (2000). Random effects in generalized linear mixed models (GLMMs). In Generalized Linear Models: A Bayesian Perspective, (Eds., D.K. Dey, S.K. Ghosh and B.K. Mallick). New York: Marcel Dekker, 23-39.

Vangilder, L.D., Sheriff, S.L. and Olsen, G.S. (1990). Characteristics, attitudes, and preferences of Missouri's spring turkey hunters. In Proceedings of the National Wild Turkey Symposium, 6, 167-176.

Wasserman, L. (2000). Asymptotic inference for mixture models using data-dependent priors. Journal of the Royal Statistical Society, Series B, 62, 159-180.

Woodard, R., He, C.Z. and Sun, D. (2003). Bayesian estimation of hunting success rate and harvest for spatially correlated post-stratified data. Biometrical Journal, 45, 985-1005.

You, Y., and Rao, J. (2003). Pseudo hierarchical Bayes small area estimation combining unit level models and survey weights. Journal of Statistical Planning and Inference, 111, 197-208.


Survey Methodology, December 2007, Vol. 33, No. 2, pp. 187-198, Statistics Canada, Catalogue No. 12-001-X

Small area estimation of average household income based on unit level models for panel data

Enrico Fabrizi, Maria Rosaria Ferrante and Silvia Pacei 1

Abstract

The European Community Household Panel (ECHP) is a panel survey covering a wide range of topics regarding economic, social and living conditions. In particular, it makes it possible to calculate disposable equivalized household income, which is a key variable in the study of economic inequity and poverty. To obtain reliable estimates of the average of this variable for regions within countries it is necessary to have recourse to small area estimation methods. In this paper, we focus on empirical best linear predictors of the average equivalized income based on "unit level models" borrowing strength across both areas and times. Using a simulation study based on ECHP data, we compare the suggested estimators with cross-sectional model-based and design-based estimators. In the case of these empirical predictors, we also compare three different MSE estimators. Results show that those estimators connected to models that take units' autocorrelation into account lead to a significant gain in efficiency, even when there are no covariates available whose population mean is known.

Key Words: European Community Household Panel; Average equivalized income; Linear mixed models; Empirical best linear unbiased predictor; MSE estimation.

1. Introduction

In recent years, the academic world has taken an increasing interest in the analysis of regional economic disparities that represent a serious challenge to the promotion of national economic growth, and thus to social cohesion. This is particularly true within the European Union, where regional disparities are a distinguishing feature of the economic landscape. This renewed interest in local economies has produced a growing demand for regional statistical information and has stimulated research on income distribution, poverty and social exclusion at the sub-national level.

In the 1990s, Eurostat (the EU's Statistics Bureau) launched the European Community Household Panel (ECHP), an annual panel survey of European households conducted using standardised methods throughout the EU's various member countries (Betti and Verma 2002; Eurostat 2002). The ECHP terminated in 2001, after eight waves. Currently, it is being replaced by the Survey on Income and Living Conditions in the Community (EU-SILC), which resembles the ECHP in many ways, but for which no data has yet been published. The ECHP panel survey covered a wide range of topics and, in particular, it made it possible to calculate disposable equivalized household income, which constitutes a key variable in the study of economic equity and poverty.

The ECHP was designed to provide reliable estimates for large areas within countries called NUTS1 (NUTS stands for the "Nomenclature of Territorial Units for Statistics", which is defined according to certain principles described on the EUROSTAT web site http://europa.eu.int/comm/eurostat/ramon/nuts/home regions en.html). Unfortunately NUTS1 correspond to areas (five groups of Administrative Regions in the Italian case) that are too large to effectively measure local area income disparity or to provide useful information for the purposes of regional governance. Therefore, to obtain estimates for a finer geographic detail, a small area estimation method has to be used, and the problem is to select an appropriate and effective method.

In this paper, in order to combine information from past surveys, related auxiliary variables and small areas, we consider several possible extensions of the well-known unit level nested error regression model (see Battese, Harter and Fuller 1988) for the estimation of the average of household equivalized income. Using ECHP panel survey data, we illustrate how such a model could be potentially useful in improving the efficiency of small area estimates by exploiting the correlation of individual household incomes over time.

In section 2, we present a general set-up for small area estimation using panel survey data and briefly review both design-based and model-based small area estimation methods. In this section, we develop empirical best linear unbiased predictors (EBLUP) and their mean squared error (MSE) estimators for selected unit level cross-sectional and time series models using the available theory on EBLUP for small area estimation (see Rao 2003, and Jiang and Lahiri 2006a, for details). We note that cross-sectional and time series models were considered in the small area literature,

1. Enrico Fabrizi, DMSIA, University of Bergamo, via dei Caniana 2, 24127, Bergamo, Italy. E-mail: [email protected]; M.R. Ferrante, Department of Statistics, University of Bologna, via Belle Arti 41, 40126, Bologna, Italy. E-mail: [email protected]; S. Pacei, Department of Statistics, University of Bologna, via Belle Arti 41, 40126, Bologna, Italy. E-mail: [email protected].

but only in the context of area level modelling (see Rao and Yu 1994; Ghosh, Nangia and Kim 1996; Datta, Lahiri, Maiti and Lu 1999; Datta, Lahiri and Maiti 2002; Pfeffermann 2002; among others).

In section 3, we briefly review the ECHP survey and describe how we use this survey data to conduct a Monte Carlo simulation study to compare different small area estimators and their MSE estimators. In section 4, we report results from the Monte Carlo simulation experiment. We note that the simulation experiment is aimed at evaluating design-based properties of all estimators, even if they are derived as model based predictors. We observed that the EBLUPs perform very well compared to the design-based estimators even though our pseudo-population exhibits signs of non-normality. The non-normality of the pseudo-population, however, seems to affect the efficiency of the MSE estimators. In our simulation, the Taylor series (see Prasad and Rao 1990; Datta and Lahiri 1999, among others) and the parametric bootstrap (see Butar and Lahiri 2003) MSE estimators are found to be more sensitive to the non-normality than the jackknife method of Jiang, Lahiri and Wan (2002). We end the paper with a few concluding remarks.

2. The small area estimation methods considered

To describe sample data, let $y_{dti}$ denote the value of a study variable for the $i$-th unit belonging to the $d$-th small area for time $t$ ($d = 1, \ldots, m$; $t = 1, \ldots, T$; $i = 1, \ldots, n_{dt}$). Moreover, let $x_{dti}$ be the vector of covariates' values associated with each $y_{dti}$ (and whose first element is equal to 1), and let $X_s = \{x'_{dti}\}$ be the $n \times p$ matrix of covariates' values for the whole sample ($n = \sum_{d,t} n_{dt}$). Let us suppose that we are interested in predicting small area means for the target variable at final time $T$: $\bar Y_{dT}$ ($d = 1, \ldots, m$). Let us also suppose that the vectors of mean population values of covariates are known for time $T$; we denote these vectors by $\bar X_{dT}$ ($d = 1, \ldots, m$).

2.1 Design-based estimators

A first solution to the small area estimation problem is to use direct estimators, that is, estimators employing only $y$ values obtained from the area (and time) which the parameter refers to. The simplest of direct estimators of the population mean is the weighted mean. We denote this direct estimator as $\bar y_{dT,\text{DIR}}$ ($d = 1, \ldots, m$) and we will be using it as a benchmark in the following sections.

Synthetic estimators may be generally defined as unbiased estimators for a larger area with acceptable standard errors. They are used to calculate estimates for small areas, under the hypothesis that small areas have the same characteristics as larger ones. Moreover, when information about auxiliary variables is available, a particular synthetic estimator, the regression estimator, may be obtained by fitting a regression model to all sample data. Note that the synthetic estimator is area specific with respect to the auxiliary variables but not with respect to the study variable.

For instance, if we consider only those observations from the last wave ($t = T$), the simple regression model would be given by:

$$y_{dTi} = x'_{dTi}\beta + e_{dTi}, \qquad E(e_{dTi}) = 0, \quad E(e^2_{dTi}) = \pi^2.$$

To take account of the complexity of the sampling design, the weighted least squares estimate $\tilde\beta_w$ of $\beta$ may be obtained, and thus the synthetic regression estimator will be given by:

$$\bar y_{dT,\text{RSYN}} = \bar X'_{dT}\tilde\beta_w, \qquad d = 1, \ldots, m. \quad (1)$$

Synthetic estimators usually display very low variances, but they may be severely biased whenever the model holding for the whole sample does not properly fit area-specific data. Composite estimators are weighted averages of a direct and a synthetic estimator. We consider the composite estimator:

$$\bar y_{dT,\text{COMP}} = \varphi_{dT}\,\bar y_{dT,\text{DIR}} + (1 - \varphi_{dT})\,\bar y_{dT,\text{RSYN}}, \quad (2)$$

where

$$\varphi_{dT} = \frac{\text{MSE}_D(\bar y_{dT,\text{RSYN}})}{\text{MSE}_D(\bar y_{dT,\text{DIR}}) + \text{MSE}_D(\bar y_{dT,\text{RSYN}})}$$

and $\text{MSE}_D$ signifies that the mean square error is evaluated in relation to the randomization distribution. This choice of $\varphi_{dT}$ leads to composite estimators $\bar y_{dT,\text{COMP}}$ that are approximately optimal in terms of $\text{MSE}_D$ (see Rao 2003, section 4.3). In practice, the quantities in the formula for the $\varphi_{dT}$'s are unknown and may be estimated from the data. Unbiased and consistent estimators can be obtained for $\text{MSE}_D(\bar y_{dT,\text{DIR}}) = V_D(\bar y_{dT,\text{DIR}})$ using standard formulas. An approximately design unbiased estimator of $\text{MSE}_D(\bar y_{dT,\text{RSYN}})$ can be obtained using the formulas discussed in Rao (2003, section 4.2.4). In particular, we calculate the approximation:

$$\text{mse}_D(\bar y_{dT,\text{RSYN}}) \approx (\bar y_{dT,\text{RSYN}} - \bar y_{dT,\text{DIR}})^2 - v_D(\bar y_{dT,\text{DIR}}),$$

where $\text{mse}_D$ and $v_D$ stand for the estimators of the corresponding $\text{MSE}_D$ and $V_D$. In particular, $v_D$ is the ordinary design unbiased estimator of $V_D$. We then take its average over $d$, as usual, in order to obtain a more stable estimator. In fact, one problem with $\text{mse}_D$ is that it can even be negative.
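Given direct and synthetic estimates with their (estimated) design MSEs, the composite estimator (2) is a one-line computation. A sketch with made-up inputs; in practice the two MSE values would come from the formulas above:

```python
def composite(y_dir, y_syn, mse_dir, mse_syn):
    """Composite estimator (2): weight the direct estimator by
    phi = MSE_D(synthetic) / (MSE_D(direct) + MSE_D(synthetic)),
    so the more reliable component gets the larger weight."""
    phi = mse_syn / (mse_dir + mse_syn)
    return phi * y_dir + (1.0 - phi) * y_syn

# Hypothetical area: noisy direct estimate, stable but biased synthetic one.
# phi = 4 / (36 + 4) = 0.1, so the composite leans on the synthetic value.
est = composite(y_dir=105.0, y_syn=98.0, mse_dir=36.0, mse_syn=4.0)
```

Note that a negative `mse_dir` or `mse_syn` (the negativity problem mentioned above) would push `phi` outside [0, 1], which is one practical reason the paper averages $\text{mse}_D$ over areas first.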


Moreover, a modified direct estimator borrowing strength over areas for estimating the regression coefficient can be used to improve estimator reliability. If auxiliary information is available, the generalized regression estimator (GREG),

$$\bar y_{dT,\text{GREG}} = \bar X'_{dT}\tilde\beta_w + \frac{\sum_{j\in s_{dT}} w_j e_j}{\sum_{j\in s_{dT}} w_j}, \quad (3)$$

approximately corrects the bias of the synthetic estimator by means of the term $(\sum_{j\in s_{dT}} w_j)^{-1}\sum_{j\in s_{dT}} w_j e_j$, based on the regression residuals $e_j$.

2.2 Model-based estimators

The model-based estimators we have considered are based on the specification of explicit models for sample data which approximate a hypothetical data-generating process. As a consequence, the problem of estimating $\bar Y_{dT}$ comes down to one of prediction. Moreover, mean square errors and other statistical properties of estimators are usually evaluated with respect to the data-generating process. We have focused here on "unit level" models relating $y_{dti}$ to a vector of covariates $x_{dti}$. The use of explicit models has several advantages, the most important of which being the opportunity to test underlying assumptions.

In the estimation of the small area means or totals of continuous variables, linear mixed models are very often used. A general linear mixed model can be described as follows:

$$y = X\beta + Z_1 v_1 + \ldots + Z_s v_s + e, \quad (4)$$

where $y = \{y_{dti}\}$ is the $n$-vector of sample observations, $\beta$ is a $p \times 1$ vector of fixed effects, $v_j$ is a $q_j \times 1$ vector of random effects ($j = 1, \ldots, s$), and $e = \{e_{dti}\}$ is a vector of errors; $X$ is assumed of rank $p$, and $Z_j = \{z'_{jdti}\}$ is an $n \times q_j$ incidence matrix of the $j$-th random effect. We assume that $E(v_j) = 0$, $V(v_j) = G_j$, $E(e) = 0$, $V(e) = R$ (all expectations are with respect to model (4)) and that $v_1, \ldots, v_s, e$ are mutually independent. As a consequence, the variance-covariance matrix of $y$ is given by:

$$V = V(y) = \sum_{j=1}^{s} Z_j G_j Z'_j + R = ZGZ' + R,$$

where $Z = [Z_1 | \ldots | Z_s]$. It is usually assumed that the matrices $G, R$ depend on a $k$-vector of variance components $\psi$, and so we can write $V(\psi) = ZG(\psi)Z' + R(\psi)$. Note that at the level of individual observations, model (4) can be rewritten as $y_{dti} = x'_{dti}\beta + z'_{1dti}v_1 + \ldots + z'_{sdti}v_s + e_{dti}$.

We consider different specifications for linear mixed models, all of which can be viewed as special cases of the general model (4). For the sake of simplicity, we have adopted a unit level notation when describing the models considered. The first model:

$$\text{MM1: } y_{dti} = x'_{dti}\beta + v_d + \alpha_t + e_{dti}, \quad (5)$$

may be obtained from formula (4) setting $s = 2$, $q_1 = m$, $q_2 = T$, $G_1 = \sigma^2_\nu I_m$, $G_2 = \sigma^2_\alpha I_T$, $R = \sigma^2_e I_n$. It includes independent area and time effects, and therefore area effects are assumed not to evolve over time. This random effects structure corresponds to the assumption of a constant covariance between units that belong to the same area, observed at two different points in time.

The second model:

$$\text{MM2: } y_{dti} = x'_{dti}\beta + \delta_{dt} + e_{dti}, \quad (6)$$

corresponds to the particular case in which $s = 1$, $q_1 = mT$, $G_1 = \sigma^2_\delta I_{mT}$, $R = \sigma^2_e I_n$. The effects of interaction between area and time are introduced, that is, we assume there are area effects which are not constant over time.

The third model:

$$\text{MM3: } y_{dti} = x'_{dti}\beta + v_d + \alpha^*_t + e_{dti}, \quad (7)$$

is obtained setting $s = 2$, $q_1 = m$, $q_2 = T$, $G_1 = \sigma^2_v I_m$, $R = \sigma^2_e I_n$, while the generic element $(h, k)$ of $G_2$ is $g_2(h, k) = \sigma^2_\alpha \rho_\alpha^{|h-k|}$, $h, k = 1, \ldots, T$. There are independent area and time effects, just as in MM1, but the time effects are assumed to follow an AR(1) process.

The fourth model:

$$\text{MM4: } y_{dti} = x'_{dti}\beta + \delta^*_{dt} + e_{dti}, \quad (8)$$

is similar to model MM2 in that it is characterized by time varying area effects, but the further assumption that such effects follow an AR(1) process is also introduced. Thus, provided we order observations by area, with respect to the general formula (4) we have $s = 1$, $q_1 = mT$, $G_1 = \text{diag}(G_{1d})$, $R = \sigma^2_e I_n$, where $G_{1d}$, $d = 1, \ldots, m$, is a $T \times T$ matrix whose generic element is $g_{1d}(h, k) = \sigma^2_\delta \rho_\delta^{|h-k|}$, $h, k = 1, \ldots, T$.

The last specification:

$$\text{MM5: } y_{dti} = x'_{dti}\beta + v_d + \alpha_t + e^*_{dti}, \quad (9)$$

may be obtained by (4) setting $s = 2$, $q_1 = m$, $q_2 = T$, $G_1 = \sigma^2_\nu I_m$, $G_2 = \sigma^2_\alpha I_T$. Provided we order observations by household and time, $R = \text{diag}(R_{di})$, where $R_{di}$ is a $T \times T$ matrix whose generic element is given by $r_{di}(h, k) = \sigma^2_e \rho_e^{|h-k|}$, $h, k = 1, \ldots, T$. There are independent area and time effects like in MM1, but errors are assumed to be autocorrelated according to an AR(1) process.
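The AR(1) covariance blocks used by MM3, MM4 and MM5 all have the same $\sigma^2 \rho^{|h-k|}$ pattern. A small helper that builds one such $T \times T$ block (the function name and inputs are ours, not the paper's):

```python
import numpy as np

def ar1_block(sigma2, rho, T):
    """T x T matrix with (h, k) entry sigma2 * rho**|h - k|, i.e. the
    within-household error covariance R_di of model MM5 (and, with the
    appropriate variance and correlation, the G blocks of MM3/MM4)."""
    idx = np.arange(T)
    # |h - k| via broadcasting of a column index against a row index
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Example: T = 4 waves, sigma2_e = 2.0, rho_e = 0.6
R_di = ar1_block(sigma2=2.0, rho=0.6, T=4)
```

Stacking such blocks with `scipy.linalg.block_diag` (or an equivalent loop) would give the full $R = \text{diag}(R_{di})$ of MM5, to be combined with $ZGZ'$ as in the variance formula above.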


In order to evaluate the impact that using past survey 1990; Datta and Lahiri 1999). More recently, due to the waves has on the efficiency of estimator, a cross­sectional advent of high­speed computers and powerful software, linear mixed model (SMM) using data from the last wave T resampling methods have been proposed. For instance, only, has been taken as the benchmark: Butar and Lahiri (2003) introduce a parametric bootstrap method based on the assumption of normality, but analyti­ ′ SMM: ydTi=x dTi β + ϑ d + e dTi (10) cally less onerous than the Taylor series method. Jiang et al. IID IID (2002) discuss a general jackknife method, which requires a 2 2 with ϑd ∼ N(0, σϑ ), ejdT ∼ N (0,σ e ). distributional assumption weaker than normality (posterior This is also a particular case of (4) obtained for = 2, s linearity). We aim to empirically compare the performance 2 2 q1= m , 1 = σ GIν m and = σ RIe n . Note that (10) is the of these three estimators within a context where the number standard nested error regression model of Battese et al. of areas is moderate and the assumption of normality may (1988). not hold perfectly true. The following is a short description We also consider the corresponding random error of the three estimation approaches. variance linear models (see Rao 2003; section 5.5.2) Let us define MSE[η% EBLUP ( ψˆ )] =E ( η% EBLUP ( ψˆ ) − η )2 , obtained by replacing ′dti β x in formulas (5) ­ (10) with a where expectation refers to model (4). It is possible to show general intercept θ. These models will be denoted as that, under normality, MM1*, MM2*, MM3*, MM4*, MM5*, SMM*. All the EBLUP assumptions made regarding random effects and residuals MSE[η% ( ψˆ )] EBLUP BLUP 2 remain unchanged. 
This latter group of models enables us to =g1()()( ψ + g 2 ψ + E η%% − η ) (12) explore the gains in efficiency obtained by exploiting the −1 repetition of the observation on the same unit when no where g1 ()(ψ =′ − kG GZ ′ V ZG) k ′ and 2 () ψ g = −1 − 1 −1 covariates are available at the population level. d′′(), X V X d with =′ − dm k ′ GZ ′ V X (see Rao 2003, In small area estimation, the aim is to predict scalar linear chapter 3). Using the following approximation, based on a combinations of fixed and random effects of the type Taylor series argument η=′ β mk + ′ v where m and k are × 1 p and × 1 q vectors E (η%%EBLUP − η EBLUP) 2 respectively, with q= ∑ j q j . The best linear unbiased ′ predictor (BLUP) of η can be obtained by estimating ≈tr[( ∂b′ / ∂ψ )( V ∂ b ′ / ∂ψ ) V ()] ψ =g 3 () ψ (,)β v minimizing the model MSE among all linear where ′= bk ′ GZ ′ V − 1 , a second order approximation to estimators: (12) can be found: BLUP % η% ( ψ ) =m′ β ( ψ ) + k ′ % ( ψ ). v (11) EBLUP MSE[η% ( ψˆ )] ≈g1 ( ψ ) + g 2 ( ψ ) + g 3 ( ψ ). (13) When the variance components in ψ are unknown, they Note that here ≈ means that the omitted terms are of may be estimated from the data and substituted into formula order (). −1 om An asymptotically unbiased estimator of (11), thus obtaining “empirical BLUP” η% EBLUP ( ψˆ ) = (13), based on Prasad and Rao (1990), is given by m′βˆ ( ψˆ ) + k ′ ( ψ ˆ ˆ ) v (see Rao 2003, chapter 6, and Jiang and Lahiri 2006b for details). EBLUP msePR (η% ) =g1 ( ψˆ ) + g 2 ( ψ ˆ ) + 2 g 3 ( ψ ˆ ). (14) As far as the estimation of ψ is concerned, a number of methods have been proposed in the literature, such as Datta and Lahiri (1999) show that, under normality and EBLUP Maximum Likelihood (ML) and Restricted Maximum REML or ML estimation of ψ, msePR ( η% ) estimates Likelihood (REML) which assume the normality of random MSE[ηψ% EBLUP (ˆ )] with a bias of order (). 
−1 om terms, and the MINQUE proposed by Rao (1972) which is Butar and Lahiri (2003) propose a parametric bootstrap non­parametric. In the present work we have opted for the estimation of (13) under the assumption of normality. We REML method, thus assuming normality. adapt their estimator to the models we are analysing, assuming the following bootstrap model: 2.3. Measures of uncertainty associated with predictors based on linear mixed models ind i)y** | v∼ N [ Xβˆ + Zv* , R ( ψˆ )] The difficult problem of estimating the MSE of EBLUP (15) estimators, taking the variability of the estimated variance ind and covariance components into account, has been faced in ii)v* ∼ N [ 0 , G (ψˆ )] the small area literature by adopting diverse approaches.

One popular method is based on the Taylor series where = v (v1 ′ , ... vs ′ ) ′ . The parametric bootstrap is then used approximation of MSE under normality (Prasad and Rao twice, once to estimate the first two terms of (13), thus

Statistics Canada, Catalogue No. 12­001­X Survey Methodology, December 2007 191

correcting the bias of g_1(ψ̂) + g_2(ψ̂), and once to estimate g_3(ψ). The following estimator of (13) is proposed:

mse_BL(η̃^EBLUP) = 2[g_1(ψ̂) + g_2(ψ̂)] − E_B[g_1(ψ̂*) + g_2(ψ̂*)] + E_B[{η̃(y*, β̂(ψ̂*), ψ̂*) − η̃(y*, β̂(ψ̂), ψ̂)}²]   (16)

where ψ̂* is the same as ψ̂ except that it is calculated on y* instead of y, and E_B is the expected value with respect to the bootstrap model (15). The bootstrap estimator (16) does not require the analytical derivation of g_3(ψ̂), which can be rather laborious when G and R have complicated structures.

Jiang et al. (2002) introduced a general jackknife estimator for the variance of empirical best predictors in linear and non-linear mixed models with M-estimation. In the problem we are investigating here, the estimator they propose can be written as:

mse_JLW(η̃^EBLUP) = g_1(ψ̂) − ((m−1)/m) Σ_{j=1}^{m} [g_1(ψ̂_{−j}) − g_1(ψ̂)] + ((m−1)/m) Σ_{j=1}^{m} (η̃^EBLUP_{−j} − η̃^EBLUP)²   (17)

where ψ̂_{−j} is the estimate of ψ calculated using all data except those from the j-th area. Similarly, η̃^EBLUP_{−j} = η̃^EBLUP(y_{−j}, β̂(ψ̂_{−j}), ψ̂_{−j}). It is worth pointing out that, on the basis of the simulation results reported in Jiang et al. (2002), mse_JLW is deemed to be more robust than mse_PR with regard to departures from the assumption of normality, which can also be expected to be crucial for mse_BL.

3. The simulation study based on the European Community Household Panel data

The target population of the ECHP survey consists of all resident households of a large subset of the EU member countries. Although general survey guidelines were issued by Eurostat, a certain degree of flexibility was allowed, so there are some differences in the sampling design across countries. As far as Italy is concerned, the survey is based on a stratified two-stage design, in which strata were formed by grouping the PSUs (municipalities) according to geographic region (NUTS2) and demographic size. For more details of the survey, see Eurostat (2002).

The ECHP deals with unit non-response, sample attrition and new entries using weighting and imputation. As attrition could lead to biased estimates of income if it does not occur at random, the effect of poverty on dropout propensity has been investigated (Rendtel, Behr and Sisto 2003; Vandecasteele and Debels 2004), and the results of these studies show that in the case of some countries, including Italy, this effect disappears under the control of weighting variables.

We have focused our attention on the eight ECHP waves available for Italy (1994-2001). Given that our aim is to assess whether the use of several successive observations of the same household could be profitable for the purposes of small area estimation, we have overlooked the problem of attrition and only considered those households that participated in the survey for all waves.

Our target variable is disposable, post-tax household income at the time of the last wave (2001). In studies of poverty and inequality, income is often equivalized according to an equivalence scale in order to avoid comparison problems caused by differences in the composition of households. We consider the widely used modified OECD scale, also adopted by Eurostat (2002) in its publications on income, poverty and social exclusion. According to this scale, equivalized income is calculated by dividing disposable household income by the number (k) of "equivalent adults", defined as k = 1 + 0.5a + 0.3c, where a is the number of adults other than the "head of the household" and c is the number of children aged 13 or less. In general, the equivalized income can be perceived as the amount of income that an individual, living alone, should dispose of in order to attain the same level of economic wellbeing he/she enjoys in his/her household.

Of the many covariates available in the rich ECHP questionnaire, we have chosen only those for which area means were available from the 2001 Italian Census results. Thus the chosen covariates are: the percentage of adults; the percentage of employed; the percentage of unemployed; the percentage of people with a high/medium/low level of education in the household; household typology (presence of children, presence of aged people, etc.); the number of rooms per capita; and the tenure status of the accommodation (rented, owned, etc.).

As we have said, the aim of this paper is to compare the performance of different estimators in the controlled environment of a simulation exercise. A number of works in the literature have compared small area estimators using Monte Carlo experiments in which samples are drawn from synthetic populations based either on Censuses (Falorsi, Falorsi and Russo 1994; Ghosh et al. 1996) or on the replication of sample units' records (Falorsi, Falorsi and Russo 1999; Lehtonen, Särndal and Veijanen 2003; Singh, Mantel and Thomas 1994). Since household income is not measured by the Italian Census (nor is it given by the results of other Censuses conducted by EU countries), we treated the ECHP survey data as the pseudo-population and then drew samples using stratified probability proportional to

size sampling, the size variable being given by the survey weights. This solution may not be as good as using data from a real Census population, but it is hopefully more realistic than generating population values of household income from a parametric model.

Monte Carlo samples of n = 1,000 (roughly 15% of the actual ECHP sample size) were drawn from the synthetic population by stratified random sampling without replacement, with strata given by the 21 NUTS2 regions. Thus these regions are treated as planned domains (as in the ECHP) for which the sample size in the small areas is established beforehand, so that the sampling fractions reflect the over-sampling of smaller regions exactly as they do in the actual ECHP sampling design. The region-specific sample sizes we obtained range from 14 to 112, being on average equal to 48. Therefore in our simulation n = 1,000, the number of small areas corresponds to that of the Italian regions (m = 21) and the number of points in time corresponds to the available ECHP waves (T = 8).

The distribution of equivalized household income in our pseudo-population (that is, the distribution obtained by weighted estimation from the ECHP sample data) is characterized by an overall mean of 22,547 Euros and a coefficient of variation of 0.59. The distribution is positively skewed, even though the skewness is not extreme (skewness coefficient γ_1 = μ_3/σ³ ≅ 2.5), and leptokurtic (kurtosis κ = μ_4/σ⁴ ≅ 14.3). The difference between mean and median is 9% of the mean. An interesting feature is given by the large disparities among administrative regions (which are the small areas of interest in our study). The mean of the equivalized household income ranges from 16,604 to 27,011 Euros; that is, the most affluent area has a mean equivalent income 62% higher than the poorest one. Also the coefficient of variation (ranging from 0.28 to 0.84), skewness (γ_1 ranging from 0.1 to 4.6) and kurtosis (κ ranging from −0.7 to 32.9) show that the distribution of our target variable differs considerably across areas.

To motivate the selected specifications of the random effects part of the considered linear mixed models (see section 2.2), an approach often recommended in textbooks (see Verbeke and Molenberghs 2000, chapter 9) has been followed: first we fit a standard OLS regression to our data using all available covariates; then we analyse the resulting residuals as a guide to identifying the random effects. This preliminary analysis has been conducted separately on several random samples of size 1,000, drawn according to the replication design described above.

The adjusted R² of the OLS regression is close to 0.35 in every observed sample. This rather low figure is the result of the nature of the phenomenon under study (household income is not easy to predict), the information contained in the survey, and the constraint represented by the need to include only those covariates for which the population total can be obtained from the Census.

Figure 1 contains box-and-whiskers plots of the residuals by area and wave constructed for one of the Monte Carlo samples (very similar findings may be observed in every sample). Analysis of the plots suggests that there is within-area and within-wave correlation, and thus the need to specify models including area and wave effects. From an analysis of residuals, it is less clear whether the inclusion of interaction effects (that is, time-varying area effects) would be beneficial or not.

Moreover, the residuals show a degree of autocorrelation, the average of the autocorrelation coefficients calculated over all individual residual histories being 0.27. Even though this autocorrelation level is not very high, for the sake of completeness we decided to also take into consideration models with autocorrelated errors or random effects. After having tested various different autocorrelation structures (ARMA(p, q), general linear, etc.), we found that the autoregressive process of order 1 provides the best fit to our data.

[Figure 1: box plots omitted]

Figure 1 Box-and-whiskers plots of residuals by wave (left) and area (right)
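As a purely illustrative sketch (ours, not the authors' SAS code), the modified OECD equivalisation k = 1 + 0.5a + 0.3c and the moment-based shape measures γ_1 = μ_3/σ³ and κ = μ_4/σ⁴ used above can be written as follows; the function names are hypothetical.

```python
def equivalised_income(household_income, other_adults, children_13_or_less):
    """Modified OECD scale: k = 1 + 0.5a + 0.3c, where a counts adults
    other than the household head and c counts children aged 13 or less."""
    k = 1 + 0.5 * other_adults + 0.3 * children_13_or_less
    return household_income / k

def shape_measures(values):
    """Return (gamma1, kappa) computed from the central moments of `values`."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n  # variance
    mu3 = sum((v - mean) ** 3 for v in values) / n
    mu4 = sum((v - mean) ** 4 for v in values) / n
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

# A couple with two young children and 42,000 Euros of disposable income:
# k = 1 + 0.5 + 0.6 = 2.1, so equivalised income is 20,000 Euros per member.
print(round(equivalised_income(42000, 1, 2)))  # 20000
```

Note that κ here is the raw fourth standardized moment, as in the paper's definition, not excess kurtosis.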


The apparent skewness of the residuals also suggests that the normality assumption for the errors does not hold exactly. We maintain this assumption for all the models we specify, and we use REML estimators for the variance components. In fact, we may expect departures from normality to have only a slight impact on the point values of the predictors: BLUP formulas can be derived without normality and, moreover, there are sound reasons to expect REML (and ML) estimators of ψ to perform well even if normality does not hold (see Jiang 1996 for details). Departures from normality may have a more serious impact on MSE estimation, a problem we examine in section 4.2 below.

4. Results

4.1 Point estimators

All computations involved in the simulation exercise described in section 3 were carried out using SAS version 9.1 for Windows. EBLUP estimators are obtained using Proc MIXED, and the generation of samples is based on Proc SURVEYSELECT.

Given that the primary goal of small area estimation is the precise estimation of area-specific parameters, we first evaluated how well the described estimators perform when predicting individual area values. Moreover, we also evaluated the amount of over-shrinkage connected with each estimator. In fact, small area estimates should reflect (at least approximately) the variability in the underlying area parameters taken as a whole.

We note that our simulation experiment is aimed at evaluating design-based properties of the estimators; that is, the population from which the random samples are generated is held fixed.

For the evaluation of the estimators' performance, we adopted an approach that is commonly found in the literature (see Rao 2003, section 7.2.6), using two indicators, the Average Absolute Relative Bias (AARB) and the Average Relative Mean Square Error (ARMSE):

AARB = m⁻¹ Σ_{d=1}^{m} | R⁻¹ Σ_{r=1}^{R} ( ỹ_dT(r)/Ȳ_dT − 1 ) |

ARMSE = m⁻¹ Σ_{d=1}^{m} R⁻¹ Σ_{r=1}^{R} ( ỹ_dT(r)/Ȳ_dT − 1 )²   (18)

where ỹ_dT(r) is the estimate for area d at time T in replicated sample r, while Ȳ_dT is the population mean being estimated. Note that AARB measures the bias of an estimator, whereas ARMSE measures its accuracy. The number of replications R is set at 500, a figure large enough to obtain stable Monte Carlo estimates of expected values and variances, and one frequently used in simulation studies on small area estimation (Heady, Higgins and Ralphs 2004; EURAREA Consortium 2004).

The gain in efficiency connected to each small area estimator is evaluated using the ratio of its ARMSE to the ARMSE of certain estimators we use as benchmarks. In particular, all estimators are compared with the weighted mean ȳ_dT,DIR, and we denote this ratio as AEFF_Dir. Moreover, EBLUP estimators associated with models (5)-(9), which use data from previous waves, are compared with the EBLUP estimator associated with the cross-sectional model (10), in order to assess the gain in efficiency deriving from the use of past waves. In this case the ratio is denoted as AEFF_Sect.

As far as the evaluation of the degree of shrinkage is concerned, we have compared the empirical standard deviation of the population area values,

ESD = [ m⁻¹ Σ_{d=1}^{m} ( Ȳ_dT − Ȳ_T )² ]^{1/2},

where Ȳ_T is the mean of the population values of the m areas at time T, with the empirical standard deviation of the estimated area values, which in the case of a simulation study is given by

esd = [ R⁻¹ Σ_{r=1}^{R} m⁻¹ Σ_{d=1}^{m} ( ỹ_dT(r) − ỹ̄_T(r) )² ]^{1/2},

where ỹ̄_T(r) is the mean of the estimated values for the m areas at time T in simulation run r. The comparison is carried out using the indicator

RESD = esd/ESD − 1   (19)

which tells us how the empirical standard deviation associated with one estimator differs from that of the population.

Table 1 contains the percentage values of AARB, ARMSE, AEFF and RESD obtained for the direct estimator, the design-based estimators given in (2) and (3), and the EBLUP estimators derived from models (5)-(10).

All estimators perform significantly better than ȳ_dT,DIR in terms of ARMSE, leading to AEFF_Dir values of less than 100%. We can also see that the design-based estimators are worse than the EBLUP estimators in terms of ARMSE, and that the gain in efficiency demonstrated by AEFF_Dir is particularly high in some cases (in excess of 50%). This result highlights the superior accuracy of the model-based estimators in question.
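The indicators in (18) can be computed from a matrix of replicated estimates; the following is a small sketch of ours (in Python rather than the authors' SAS), with a hypothetical function name.

```python
def aarb_armse(estimates, truth):
    """AARB and ARMSE as in (18).

    estimates[r][d]: estimate for area d in replicated sample r;
    truth[d]: population mean of area d at time T.
    """
    R, m = len(estimates), len(truth)
    aarb = armse = 0.0
    for d in range(m):
        rel = [estimates[r][d] / truth[d] - 1 for r in range(R)]
        aarb += abs(sum(rel) / R)             # |average relative error|
        armse += sum(x * x for x in rel) / R  # average squared relative error
    return aarb / m, armse / m

# Two replications over one area with true mean 100: estimates 110 and 90
# have zero average relative bias but a relative MSE of 0.01.
a, s = aarb_armse([[110.0], [90.0]], [100.0])
print(round(a, 6), round(s, 6))  # 0.0 0.01
```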


Table 1 Performance indicators - auxiliary information is available

Model   AARB%   ARMSE%   AEFF_Dir%   AEFF_Sect%   RESD%
DIR     0.0     0.787    100.0       -             15.6
COMP    2.7     0.552    70.1        -             -9.8
GREG    0.2     0.543    68.2        -             10.0
SMM     2.3     0.377    47.7        100.0         -8.7
MM1     3.1     0.358    45.3        95.0          2.4
MM2     2.4     0.427    54.1        113.4         -4.4
MM3     2.6     0.380    48.3        101.2         4.7
MM4     2.6     0.429    54.2        113.6         -8.0
MM5     2.9     0.318    40.4        84.7          -7.7

The most reliable EBLUP estimator is the one associated with the MM5 model, with independent area and time effects and residuals autocorrelated according to an AR(1) process, leading to a gain in efficiency of about 60% compared with the direct estimator. This is followed by the EBLUP estimator associated with model MM1, which differs from the previous one only in the absence of autocorrelated residuals.

In terms of bias, the GREG estimator gives the smallest value of AARB, as would be expected (Särndal, Swensson and Wretman 1992, chapter 7; Veijanen, Lehtonen and Särndal 2005). It is followed by the remaining estimators, all of which show similar values of AARB. Of the EBLUP estimators, those associated with the MM1 and MM5 models are more efficient in terms of ARMSE, but they are slightly more biased than the one associated with the SMM. This is probably due to the fact that we limit our evaluation of performance to the last wave; for this data subset we would expect the fit of the regression underlying the SMM, based on the last wave only, to be better than that based on the whole data set.

As far as the EBLUP estimators are concerned, the AEFF_Sect column shows that the gain in efficiency of the predictors based on borrowing strength over time is positive in some cases and negative in others. Models MM2 and MM4 (see formulas (6) and (8)), where interaction effects between area and time are present, are apparently inadequate, because the predictors associated with both models perform rather poorly. The performance of the predictor associated with MM3 (see (7)) is also slightly worse than that of the predictor associated with the cross-sectional model: this rather surprising result is probably due to the low number of waves, which does not allow for an effective estimation of the correlation coefficient between consecutive time effects.

As we have already said, the estimator associated with model MM5 is the one that performs best: it is considerably more efficient than the one associated with the SMM, with an AEFF_Sect of roughly 85%, representing a gain in efficiency of about 15% due to the consideration of more than one wave. The EBLUP estimator associated with MM1 also turns out to be more efficient than the one associated with the SMM, but in this case the gain is only about 5%. These results confirm that household-level data at several consecutive points in time may be employed, via certain kinds of longitudinal model, to produce more efficient estimates.

Moving on to the indicator of shrinkage reported in the last column of the table, we can see that the direct estimator overestimates the standard deviation of the population of area means by 15%. The same effect, albeit somewhat attenuated, is observed for the GREG estimator, whose standard deviation is over-inflated by 10%. On the contrary, the COMP estimator tends to "shrink" the estimates towards the centre of the distribution, leading to a reduction in the standard deviation of the area means of about 10% with respect to the population. These results are in line with those obtained by other authors comparing the same kinds of estimator (Heady et al. 2004; Spjøtvoll and Thomsen 1987).

The results obtained for the EBLUP estimators are more encouraging, as the calculated percentage difference is always less than 10% in absolute terms. Hence, in this respect all EBLUP estimators seem acceptable. Moreover, we might expect the BLUP estimators to be under-dispersed compared with the corresponding population parameters; in this case, the indicator RESD assumes positive values for some longitudinal EBLUP estimators because it is calculated only on the last wave, while the longitudinal models aim to predict m × T parameters.

Table 2 summarizes the results regarding the EBLUP estimators associated with the random error variance models, as described in the last paragraph of section 2. When no auxiliary variables are included in the models, the advantage of "borrowing strength" over time and area is singled out independently of the advantage associated with covariates. As expected, the improvements in efficiency measured by AEFF_Dir are smaller than those shown in Table 1, although the reductions in ARMSE remain significant. The ranking of the predictors associated with the various random effects specifications remains the same as that presented in Table 1, with the predictor associated with the MM5* model resulting the most efficient, as shown by ARMSE%. The gain in efficiency associated with this latter estimator compared with the direct estimator is about 43%.
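To make the AEFF columns concrete: up to the rounding of the published ARMSE figures, they are ratios of each estimator's ARMSE to the benchmark's ARMSE, expressed as percentages. A quick arithmetic check against Table 1 (a sketch of ours, not the authors' code):

```python
# ARMSE% values taken from Table 1.
armse = {"DIR": 0.787, "COMP": 0.552, "SMM": 0.377, "MM5": 0.318}

# AEFF_Dir: efficiency relative to the direct estimator.
aeff_dir_comp = 100 * armse["COMP"] / armse["DIR"]
# AEFF_Sect: efficiency relative to the cross-sectional EBLUP (SMM).
aeff_sect_mm5 = 100 * armse["MM5"] / armse["SMM"]

print(round(aeff_dir_comp, 1))  # 70.1, matching Table 1
print(round(aeff_sect_mm5, 1))  # 84.4, close to the 84.7 of Table 1 (rounding)
```

The small discrepancies arise because the ARMSE values in the table are themselves rounded to three decimals.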


Table 2 Performance indicators - auxiliary information is unavailable

Model   AARB%   ARMSE%   AEFF_Dir%   AEFF_Sect%   RESD%
SMM*    2.7     0.575    72.8        100.0        -7.6
MM1*    2.9     0.556    70.3        96.6         7.5
MM2*    2.8     0.639    80.8        111.0        -3.0
MM3*    3.7     0.574    72.6        99.7         8.6
MM4*    3.5     0.691    87.2        119.8        -6.7
MM5*    3.0     0.445    56.2        77.2         -6.3

With regard to bias, the EBLUP estimators obtained from the models with no covariates tend to be more biased than the corresponding ones with covariates.

The analysis of the AEFF_Sect column shows that the reduction in ARMSE allowed for by some of the models borrowing strength over time is larger than in the case where covariates are included, reaching 22% in the best case, that of the MM5* model. This last result is really encouraging. In fact, within the context of small area estimation, the absence of known population totals of covariates can be very limiting when trying to obtain reliable estimates. The observed ARMSE reduction connected to the consideration of more waves of a panel survey shows that estimates may be improved by "borrowing strength" over time when it is not possible to exploit auxiliary information.

With regard to the results on shrinkage, they may be considered acceptable in this case too, and one can see a relationship between the results obtained for EBLUP estimators derived from analogous models with or without covariates.

4.2 Comparing different estimators of the MSE of EBLUP estimators

In section 2.3 we reviewed three different estimators of the MSE associated with EBLUP estimators. In this section we compare the performances of these three estimators using our simulation exercise. Given that we are focusing on MSE estimation rather than on a comparison of EBLUP estimators derived from different models, we only consider the predictor associated with model MM5, which emerged as the best performer in the previous section.

Let us denote the predictor of Ȳ_dT by η̃_dT^EBLUP and its mean square error by MSE(η̃_dT^EBLUP). The following estimator,

mse_ACT(η̃_dT^EBLUP) = R⁻¹ Σ_{r=1}^{R} ( η̃_dT(r)^EBLUP − η̃̄_dT^EBLUP )² + ( η̃̄_dT^EBLUP − Ȳ_dT )²,

where η̃_dT(r)^EBLUP is η̃_dT^EBLUP calculated on the r-th replicated sample and η̃̄_dT^EBLUP = R⁻¹ Σ_{r=1}^{R} η̃_dT(r)^EBLUP, will be used as the benchmark for the comparison of the performance of the mean square error estimators described in section 2.3, because the true mean squared error is not known.

As in the case of the point estimators, all computations are done using SAS. To determine the Prasad-Rao estimator (14), the output of Proc MIXED's ESTIMATE statement is used with the option KENWARDROGER activated. The sum g_1(ψ̂) + g_2(ψ̂) is obtained from the output of Proc MIXED. The KENWARDROGER option allows for the calculation of an MSE inflation factor, described in Kenward and Roger (1997), which is equivalent to 2g_3(ψ̂) (see also Rao 2003, section 6.2.7).

The estimator mse_BL(η̃_dT^EBLUP) is re-sampling based. Hence the evaluation of its performance by means of a Monte Carlo exercise requires the implementation of two nested simulations: for each r (r = 1, ..., R), we run the R_BOOT replications needed to approximate expectations with respect to the bootstrap model. To limit the computational burden, we set R_BOOT = 150. Butar and Lahiri (2003) propose an analytical approximation of mse_BL, but only for models that are not as complex as the one in question.

For both mse_BL(η̃_dT^EBLUP) and mse_JLW(η̃_dT^EBLUP), we have prepared ad hoc SAS code using the output of Proc MIXED as input.

In order to compare the three MSE estimators, we employ the same measures used to evaluate the performance of the point estimators, AARB and ARMSE. As there is usually some concern about the under-estimation of MSE estimators, we are also interested in the sign of any bias associated with the estimators in question. Therefore, in the case of the MSE estimators we calculate not only the average of the absolute values of the estimates of the bias in each region (AARB), but also the average of these estimates without the absolute value (AARB′), so as to better understand whether the given estimators indeed tend to under-evaluate the MSE or not. Hence the calculated measures are:


AARB = m⁻¹ Σ_{d=1}^{m} | R⁻¹ Σ_{r=1}^{R} ( mse_*(η̃_dT^EBLUP)/mse_ACT(η̃_dT^EBLUP) − 1 ) |

AARB′ = m⁻¹ Σ_{d=1}^{m} R⁻¹ Σ_{r=1}^{R} ( mse_*(η̃_dT^EBLUP)/mse_ACT(η̃_dT^EBLUP) − 1 )

ARMSE = m⁻¹ Σ_{d=1}^{m} R⁻¹ Σ_{r=1}^{R} ( mse_*(η̃_dT^EBLUP)/mse_ACT(η̃_dT^EBLUP) − 1 )²

where the symbol * refers to the estimation procedures considered, namely PR, BL and JLW. The results of the comparisons, based on R = 500 Monte Carlo iterations, are reported in Table 3.

Table 3 Performance of MSE estimators of η̃_dT^EBLUP under model MM5

Estimator   AARB    AARB′    ARMSE
mse_PR      0.378   -0.383   0.238
mse_BL      0.377   -0.318   0.228
mse_JLW     0.337   0.036    0.261

In terms of ARMSE and AARB, the three estimators behave similarly, with no particular one emerging as clearly better than the other two. Nonetheless, the AARB′ column clearly shows that mse_PR and mse_BL systematically underestimate mse_ACT, whereas mse_JLW does not. This is probably due to the failure of the normality assumption for the error terms. In fact, as we foresaw in section 3, equivalized income is a positively skewed variable, and the regression residuals e also appear to be so. Normality is a crucial assumption in the derivation of mse_PR and mse_BL, while mse_JLW could be expected to be more robust in this respect. Our findings are consistent with the theoretical predictions and simulation results described in Jiang et al. (2002).

Although Bell (2001) noted that mse_JLW may be negative for some data sets because of the bias correction, this never happens in our simulations: for all replicated data sets, the second term in (17) gives a positive, and in most cases substantial, contribution to the estimate of the MSE. A discussion of modifications of (17) for the cases in which it returns negative values can be found in Jiang and Lahiri (2006b).

To conclude then, in the case of the present problem, mse_JLW emerges as the most appropriate of the three measures for estimating MSE(η̃_dT^EBLUP). This finding could be of importance for any application of normality-based linear mixed models theory to data sets in which the normality assumptions for the error terms do not hold exactly.

We replicated the simulation exercise also for the cross-sectional model without covariates, SMM*, which is often considered in simulations aimed at the comparison of different estimation methods. To this end we note that for this model the ratio σ̂_e²/σ̂_v² is around 12, leading to a predictor characterized by γ_i = σ̂_v² n_i (σ̂_v² n_i + σ̂_e²)⁻¹ ranging from 0.54 to 0.9. We note also that some, but not all, areas are characterized by the presence of outliers (the skewness coefficient γ_1 ranges from 0.1 to 4.6). In this setting the MSE estimators show a behaviour quite different from that illustrated in the case of model MM5. The results are shown in Table 4.

Table 4 Performance of MSE estimators of η̃_dT^EBLUP under model SMM*

Estimator   AARB    AARB′   ARMSE
mse_PR      0.449   0.262   0.503
mse_BL      0.376   0.213   0.376
mse_JLW     0.354   0.149   0.335

All estimators overestimate the actual MSE, although mse_JLW overestimates less than the other two. From a detailed analysis of the results for the individual areas, we find that the values of AARB′ (which represent the most apparent difference with respect to the results of Table 3) are driven by severe overestimation of the actual MSE in the areas characterized by the lowest levels of skewness and kurtosis. For these areas σ̂_e² largely overstates the actual variation in the data, thus leading to overestimation of g_1(σ_v², σ_e²). This is likely due to the fact that the failure of normality (the excess of kurtosis) causes the overestimation of σ_e². This problem did not appear in the case of model MM5 because of the presence of covariates and the AR(1) modelling of the individual residuals.

5. Concluding remarks and further developments

The results obtained show that, in general, EBLUP estimators derived from unit level linear mixed model specifications that "borrow strength over time", as well as over areas, provide a significant gain in efficiency compared both with the direct estimator and with other commonly used design-based estimators such as the optimal composite estimator and the GREG estimator. Moreover, the mean squared error of some of the longitudinal EBLUP estimators in question is considerably lower, on average over the areas, than that of the analogous cross-sectional EBLUP estimators.

Among the model specifications used to derive EBLUP estimators, those with independent time and area effects, whether inclusive of the autocorrelation of residuals or not, appear the most efficient, offering a gain in efficiency of about 55-60% compared with the direct estimator. These results also hold when covariates are removed; in fact, they offer the chance to obtain reliable small area estimates even in the absence of covariates, provided that repeated observations of the same unit at several points in time are available. Besides the shrinkage

effect connected to EBLUP estimators appears moderate, reducing the need for ensemble or multiple estimation (Rao 2003, chapter 9).

With regard to the estimation of the MSE of the small area estimators in question, we noted that the jackknife estimator provides the best results, being correct, on average, over the areas and thus more robust to departures from the standard assumptions of linear mixed models. This finding may be of importance to all applications of normality-based linear mixed models theory to data sets in which the normality assumptions do not hold exactly, as in the case of income data.

Acknowledgements

Research was partially funded by Miur-PRIN 2003 "Statistical analysis of changes of the Italian productive sectors and their territorial structure", coordinator Prof. C. Filippucci. The work of Enrico Fabrizi was partially supported by the grants 60FABR06 and 60BIFF04, University of Bergamo. We thank ISTAT for kindly providing the data used in this work.

References

Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error component model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.

Bell, W. (2001). Discussion of "Jackknife in the Fay-Herriot model with an example". Proceedings of the Seminar on Funding Opportunity in Survey Research, 98-104.

Betti, G., and Verma, V. (2002). Non-monetary or lifestyle deprivation. In EUROSTAT (2002), Income, Poverty Risk and Social Exclusion in the European Union, Second European Social Report, 87-106.

Butar, F., and Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112, 63-76.

Datta, G.S., and Lahiri, P. (1999). A unified measure of uncertainty of estimated best linear predictors in small area estimation problems. Statistica Sinica, 10, 613-627.

Datta, G.S., Lahiri, P. and Maiti, T. (2002). Empirical Bayes estimation of median income of four-person families by state using time series and cross-sectional data. Journal of Statistical Planning and Inference, 102, 83-97.

Datta, G.S., Lahiri, P., Maiti, T. and Lu, K.L. (1999). Hierarchical Bayes estimation of unemployment rates for the states of the U.S. Journal of the American Statistical Association, 94, 1074-1082.

EURAREA Consortium (2004). EURAREA: Enhancing Small Area Estimation Techniques to Meet European Needs, Project Reference Volume. Downloadable at http://www.statistics.gov.uk/eurarea/download.asp.

EUROSTAT (2002). European Social Statistics: Income, Poverty and Social Exclusion. 2nd report.

Falorsi, P.D., Falorsi, S. and Russo, A. (1994). Empirical comparison of small area estimation methods for the Italian Labour Force Survey. Survey Methodology, 20, 171-176.

Falorsi, P.D., Falorsi, S. and Russo, A. (1999). Small area estimation at provincial level in the Italian Labour Force Survey. Journal of the Italian Statistical Society, 1, 93-109.

Ghosh, M., Nangia, N. and Kim, D. (1996). Estimation of median income of four-person families: A Bayesian time series approach. Journal of the American Statistical Association, 91, 1423-1431.

Heady, P., Higgins, N. and Ralphs, M. (2004). Evidence-based guidance on the applicability of small area estimation techniques. Paper presented at the European Conference on Quality and Methodology in Official Statistics, Mainz, Germany, May 24-26.

Jiang, J. (1996). REML estimation: Asymptotic behavior and related topics. The Annals of Statistics, 24, 255-286.

Jiang, J., and Lahiri, P. (2006a). Estimation of finite population domain means: A model-assisted empirical best prediction approach. Journal of the American Statistical Association, 101, 301-311.

Jiang, J., and Lahiri, P. (2006b). Mixed model prediction and small area estimation. Editor's invited discussion paper, Test, 15, 1, 1-96.

Jiang, J., Lahiri, P. and Wan, S.M. (2002). A unified jackknife theory for empirical best prediction with M-estimation. The Annals of Statistics, 30, 1782-1810.

Kenward, M.G., and Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983-997.

Lehtonen, R., Särndal, C.-E. and Veijanen, A. (2003). The effect of model choice in estimation for domains, including small domains. Survey Methodology, 29, 1, 33-44.

Pfeffermann, D. (2002). Small area estimation: New developments and directions. International Statistical Review, 70, 125-143.

Prasad, N.G.N., and Rao, J.N.K. (1990). The estimation of mean squared errors of small area estimators. Journal of the American Statistical Association, 85, 163-171.

Rao, C.R. (1972). Estimation of variance and covariance components in linear models. Journal of the American Statistical Association, 67, 112-115.

Rao, J.N.K. (2003). Small Area Estimation. New York: John Wiley & Sons, Inc.

Rao, J.N.K., and Yu, M. (1994). Small area estimation by combining time series and cross-sectional data. Canadian Journal of Statistics, 22, 511-528.

Rendtel, U., Behr, A. and Sisto, J. (2003). Attrition effects in the European Community Household Panel. CHINTEX Project, European Commission.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Singh, A.C., Mantel, H.J. and Thomas, B.W. (1994). Time series EBLUPs for small areas using survey data. Survey Methodology, 20, 1, 33-43.

Spjøtvoll, E., and Thomsen, I. (1987). Application of some empirical Bayes methods to small area statistics. Bulletin of the International Statistical Institute, 4, 435-450.

Statistics Canada, Catalogue No. 12­001­X 198 Fabrizi, Ferrante and Pacei: Small area estimation of average household income based on unit level models

Vandecasteele, L., and Debels, A (2004). Modelling attrition in the Verbeke, G., and Molenberg, H. (2000). Linear Mixed Models for European Community Household Panel: The effectiveness of Longitudinal Data, New York: Springer­Verlag. weighting, 2 nd International Conference of ECHP Users, EPUNet 2004, Berlin, June 24­26. Veijanen, A., Lehtonen, R. and Särndal, C.­E. (2005). The Effect of Model Quality on Model­Assisted and Model­Dependent Estimators of Totals and Class Frequency for Domains, paper presented at SAE2005 Conference, Challenges in Statistics Production for Domains and Small Areas, August, 28­31 2005, Jyväskylä, Finland.

Survey Methodology, December 2007, Vol. 33, No. 2, pp. 199-210
Statistics Canada, Catalogue No. 12-001-X

Estimation of the coverage of the 2000 census of population in Switzerland: Methods and results

Anne Renaud 1

Abstract Coverage deficiencies are estimated and analysed for the 2000 population census in Switzerland. For the undercoverage component, the estimation is based on a sample independent of the census and a match with the census. For the overcoverage component, the estimation is based on a sample drawn from the census list and a match with the rest of the census. The over- and undercoverage components are then combined to obtain an estimate of the resulting net coverage. This estimate is based on a capture-recapture model, named the dual system, combined with a synthetic model. The estimators are calculated for the full population and different subgroups, with a variance estimated by a stratified jackknife. The coverage analyses are supplemented by a study of matches between the independent sample and the census in order to determine potential errors of measurement and location in the census data.

Key Words: Census; Coverage errors; Dual system; Multi-stage sampling plan; Measurement errors.

1. Introduction

In any census, some people are not enumerated and should be, while others are counted twice or should not have been enumerated. There is both undercoverage and overcoverage, and quite often, the combined result is net undercoverage. For example, net undercoverage is estimated at 1.6% in the United States in 1990 (Hogan 1993), 2.2% in the United Kingdom in 1991 (Brown, Diamond, Chambers, Buckner and Teague 1999) and 3% in Canada in 2001 (Statistics Canada 2004). By contrast, in the United States in 2000, there is estimated to be net overcoverage of 0.5% (Hogan 2003). Coverage deficiencies may vary greatly between subgroups of the population. In the United States in 2000, blacks were found to have a net undercoverage of 1.8%, while whites had an overcoverage of 1.1%. Also, values often vary between age classes and regions, for example. These coverage deficiencies, and other errors such as measurement errors, result in a biased picture of the population. They are therefore studied in order to obtain information on the quality of the available data and to find ways to improve censuses of the population.

The 2000 population census in Switzerland gives a picture of the population on December 5, 2000. In this article, coverage deficiencies in a Swiss census are estimated for the first time. Undercoverage, overcoverage and net coverage resulting from the 2000 census are all analysed. Undercoverage is estimated from a sample of individuals $S_P$, independent of the census, on which a coverage survey was organized a few months after the census (collection took place in April and May 2001). The data from the survey are matched with data from the census to determine whether persons in $S_P$ were enumerated. Overcoverage is estimated from a sample of individuals $S_E$ drawn from census records. A search for duplicates and other erroneous records then serves to determine whether a given record corresponds to a real person to be enumerated. Net coverage is estimated on the basis of a capture-recapture model known as the dual system (Wolter 1986, Fienberg 1992). The dual estimator is applied in homogeneous cells, and the results are recombined using a synthetic model to obtain results for different domains of the population (Hogan 2003). The purpose of the project is not to adjust the census figures but rather to obtain information on the quality of the 2000 census and potential improvements for future censuses.

This article describes the different steps followed in obtaining estimates, then presents the results. Sections 2 and 3 describe the data sets and the coverage estimators. Section 4 provides the details on constructing the different statuses used in the estimators. Section 5 describes the approach used to compare the values collected in the census and in the survey for the matched persons from $S_P$. Sections 6 and 7 present the numerical results and the conclusion.

2. The three data sets

2.1 Census

The 2000 census was conducted under the auspices of the Federal Statistical Office, with the reference date of December 5, 2000. Information was collected for 7.3 million inhabitants, 3.1 million households, 3.8 million dwellings and 1.5 million buildings. The different levels were then linked by common identifiers when the data were processed.
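The last point - linking the four collection levels by common identifiers - can be pictured with a toy sketch. The record layouts and identifiers below are hypothetical illustrations, not the actual census file formats.

```python
# Toy illustration of linking the four census levels (persons, households,
# dwellings, buildings) through common identifiers; all data are invented.
buildings = {"B1": {"commune": "Neuchâtel"}}
dwellings = {"D1": {"building_id": "B1"}}
households = {"H1": {"dwelling_id": "D1", "type": "private"}}
persons = [{"person_id": "P1", "household_id": "H1", "residence": "economic"}]

def building_of(person):
    """Follow the identifier chain person -> household -> dwelling -> building."""
    household = households[person["household_id"]]
    dwelling = dwellings[household["dwelling_id"]]
    return dwelling["building_id"]
```

For example, `building_of(persons[0])` follows the chain P1 -> H1 -> D1 and returns `"B1"`.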

1. Anne Renaud, Service de méthodes statistiques, Office fédéral de la statistique, Espace de l'Europe 10, CH-2010 Neuchâtel, Switzerland. E-mail: [email protected].

The collection of information on persons and households was the responsibility of Switzerland's 2,896 political communes. The latter had a choice between different methods of collection:

− TRADITIONAL: use of census agents;
− SEMI-TRADITIONAL: pre-printed questionnaires based on the communal register of inhabitants are mailed out and then collected by census agents;
− TRANSIT: pre-printed questionnaires are mailed out and mailed back;
− FUTURE: identical to TRANSIT except that links between households and dwellings are supplied by the commune;
− TICINO: similar to TRANSIT but limited to the canton of Tessin.

Most of the SEMI-TRADITIONAL, TRANSIT, FUTURE and TICINO communes also offered the option of completing questionnaires online. The 2,208 SEMI-TRADITIONAL, TRANSIT, FUTURE and TICINO communes that used the pre-printing of questionnaires based on communal registers of inhabitants account for nearly 96% of the population. For most of these communes, the tasks of mailing out questionnaires and controlling their return were organized at a national centre.

The data set for individuals contains 7,452,075 entries. One feature of this data set is that it contains two records for the same person if that person has two residences (2.3% of the population; for example, a student who both resides with his parents and has a residence close to his school). In the case of two residences, one is coded as the economic residence and the other as the civil residence. The economic residence is the place where the person spends the most time per week, and the civil residence is where the person's official papers are kept (birth certificate for Swiss citizens, residence permit for foreigners). Where there is just one residence, it is both the economic and the civil residence. Switzerland is considered to have a resident population of 7,280,010 based on the set of records showing the economic residence.

Households are classified as private, collective or administrative. Examples of private households are families, couples and persons living alone. Examples of collective households are groups of occupants of a home for the aged or a boarding school, or the inmates of a prison. Administrative households group together people with no fixed residence, travellers and persons - by building or commune - who could not be assigned to private or collective households (2.4% of the resident population).

Census data contain no imputation at the record level, since communes sent basic information for non-respondents (unit non-response). However, values are imputed in the case of missing data or inconsistency in questionnaires (item non-response).

The population of interest for coverage estimates is the resident population (based on economic residence) in private and administrative households. Collective households, which account for 2.3% of the enumerated resident population, are excluded from the estimates.

2.2 $S_P$ sample, coverage survey and matching (undercoverage)

The size objective for the $S_P$ sample is set at approximately 50,000 people. In the absence of existing frames in Switzerland, this value was determined approximately, based on experiences in other countries. In particular, the Australian results for 1996 were used, since the sampling plan for Australia's coverage survey was similar to the one for Switzerland in 2000 (ABS 1997).

The $S_P$ sample, which is independent of the census, is constructed in two parts: the canton of Tessin (TICINO) and the rest of Switzerland (NORD). Both parts use a multi-stage draw. The first stage consists in selecting 303 primary units - these are political communes for TICINO and postal codes for NORD - according to a stratified plan with a draw proportional to the number of buildings. The second stage consists in a simple random draw of a fixed number of 60 buildings per primary unit. In the NORD plan, these buildings are allocated to a maximum of three mail delivery routes, based on an intermediate sampling stage. The sampling is thus constructed so as to consolidate the field work while limiting the variability of the weights. For practical reasons and in light of available resources, postal codes that include a large proportion of buildings lacking complete postal addresses or coded as unoccupied are selected with a lower probability than other postal codes. These tend mainly to be postal codes in rural areas or industrial zones, which are unlikely to exhibit major coverage deficiencies. With the assistance of postal employees, complete lists of households are drawn up in the field within the sample of approximately 16,000 buildings. A sub-sample of buildings is then drawn so as to obtain a total of approximately 27,000 households. For more information on the sampling and survey procedure, see Renaud (2001) and, in greater detail, Renaud and Eichenberger (2002).

The coverage survey consists in contacting the 27,000 households - by telephone if a telephone number is found and in person if not. The variables collected are those that lend themselves to matching with the census and defining subgroups of interest for the coverage study (socio-demographic variables, addresses). The collection operation covers all members of all households in the selected buildings.
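The multi-stage draw described above can be sketched in a few lines. This is an illustrative reconstruction under simplifying assumptions (the stratification details and the intermediate mail-route stage are omitted), not the sampling program actually used; the `units` and `buildings_by_unit` structures are invented.

```python
import random

def draw_primary_units(units, n_units, rng):
    """First stage: draw primary units (postal codes or communes) without
    replacement, with probability proportional to their building counts."""
    pool = dict(units)
    chosen = []
    for _ in range(n_units):
        total = sum(pool.values())
        r = rng.uniform(0, total)
        cum = 0
        for unit_id, n_buildings in pool.items():
            cum += n_buildings
            if r <= cum:
                chosen.append(unit_id)
                del pool[unit_id]
                break
    return chosen

def draw_buildings(buildings_by_unit, chosen_units, n_per_unit, rng):
    """Second stage: simple random draw of a fixed number of buildings
    (60 in the coverage survey) within each selected primary unit."""
    return {u: rng.sample(buildings_by_unit[u],
                          min(n_per_unit, len(buildings_by_unit[u])))
            for u in chosen_units}

rng = random.Random(2000)
units = {"pc%03d" % i: rng.randint(50, 500) for i in range(20)}
buildings_by_unit = {u: ["%s-b%d" % (u, b) for b in range(n)]
                     for u, n in units.items()}
primary_units = draw_primary_units(units, 5, rng)
building_sample = draw_buildings(buildings_by_unit, primary_units, 60, rng)
```

Drawing proportional to building counts at the first stage, then a fixed number of buildings at the second, is what keeps the final weights roughly constant, as the text notes.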


The final sample $S_P$ contains $n_P = 49{,}883$ people in the population of interest (persons listed at their economic residence and residing in a private household). Of the households contacted, 88% were reached by telephone and 12% in person. The weighting depended on the sampling and an adjustment for non-response. The adjustment for non-response was based on a homogeneity model in cells constructed on the basis of the sampling strata and whether or not a telephone number was known to exist (interviews conducted by telephone or in person). It also incorporated an estimate of the proportion of true households among the households to be contacted, since a sizable portion of the households to be contacted actually consisted of vacant dwellings, stores or businesses. No calibration was applied, since the auxiliary data available were not independent of the census. There was no partial non-response. The weighting details are documented in Renaud and Potterat (2004).

Based on the questions asked in the survey and various plausibility controls, we hypothesize that the $S_P$ data are correct and usable for matching with the census. The quality criteria used are as follows:

− completeness: the record is sufficient to identify the person;
− appropriateness: the person should have been enumerated;
− uniqueness: the person is listed only once;
− belonging to population of interest: the person is listed at his/her economic residence and in a private household;
− correctness of location: the person is listed at the correct address on Census Day.

The matching between the $S_P$ sample and the census serves to determine the matching status $P_j$ of each element $j$ of $S_P$. Status $P_j$ is equal to 1 if the element is matched in the census (enumerated person) and 0 if this is not the case (person not enumerated). In our case, the data collected in the coverage survey, the final census data and images of the census questionnaires are used for automatic matching, manual matching and controls. No supplementary interview took place in addition to the coverage survey. Persons who moved between Census Day and the day of the survey were sampled at their address on the day of the survey and then searched for on a priority basis at the address they had indicated for the day of the census. No case was unresolved by the end of the process.

2.3 $S_E$ sample and search for erroneous records (overcoverage)

The size objective for the $S_E$ sample was set at approximately 55,000 persons. This value, somewhat greater than that of $S_P$, had little influence on the processing of the data, since there was no field work or interview supplementary to the census.

The $S_E$ sample was selected from the census data using a two-stage draw. Only elements included in the population of interest were eligible (records at the economic residence, without members of collective households). The primary units of $S_E$ were identical to the primary units of $S_P$ (postal codes and communes). However, the list of postal codes in the NORD plan that was used for $S_P$ did not correspond exactly to the list of postal codes that were present in the census data. Census records found in postal codes that did not exist in the list used for $S_P$ were therefore reallocated to existing codes, taking geographic location into account (this involved assigning fictitious postal codes for the sampling). In the second stage, records were drawn from the population of interest using a simple random plan, without intermediate stages. The allocation was done in such a way as to obtain constant weights in the sampling strata of the primary units. In the end, the sample contained $n_E = 55{,}375$ records (Renaud 2003).

We hypothesize that $S_E$ records are sufficient to identify persons (completeness), since there is little imputation in the census data and most questionnaires were pre-printed based on registers of inhabitants. Appropriateness and uniqueness were determined in a matching between $S_E$ and the rest of the census using a procedure similar to the matching between $S_P$ and the census. In our case, this involves a search for duplicates or triplicates of elements of $S_E$, supplemented by an analysis of suspect cases in $S_E$. An element is considered appropriate if it is not considered erroneous in the analysis of suspect cases (e.g., a note on the questionnaire indicating that the person has gone abroad). An element is considered unique if no duplicate or triplicate is detected in the census. There is no supplementary interview for $S_E$. There is therefore no information supplementary to the census for $S_E$ persons (actual location? actual type of residence or household?).

The search for duplicates/triplicates and suspect cases results in an enumeration status $E_j$ for each element $j$ of $S_E$. Status $E_j$ is equal to 1 if the element should indeed have been enumerated in the census (default value) and 0 if it should not have been enumerated. In practice, it can take on values between 0 and 1 if the case is not determined precisely. Thus, duplicates and triplicates receive respectively the values 1/2 and 1/3 if there is no information allowing the correct record to be determined from among the records detected. These cases, which are rare, consist of persons who completed more than one questionnaire in the census without any link having been made between those questionnaires during the processing of the data.
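The statuses $P_j$ and $E_j$, together with the survey weights, are all that the cell-level estimators of Section 3 need. The sketch below is a toy illustration with invented weights and statuses (the real computation also involves domains and 121 estimation cells); it shows the weighted rates and the resulting dual-system estimate for a single cell.

```python
def weighted_rate(weights, statuses):
    """Weighted mean of matching or enumeration statuses (the correct
    matches rate R_m or correct records rate R_c of Section 3)."""
    return sum(w * s for w, s in zip(weights, statuses)) / sum(weights)

def dual_system_estimate(C_k, w_P, P, w_E, E):
    """Estimated true total of one estimation cell:
    N_k = C_k * R_c,k / R_m,k, where F_k = R_c,k / R_m,k is the coverage
    correction factor combining over- and undercoverage."""
    R_m = weighted_rate(w_P, P)   # matching statuses from S_P
    R_c = weighted_rate(w_E, E)   # enumeration statuses from S_E
    return C_k * (R_c / R_m)

# Toy cell: 1,000 persons enumerated; 4 of 200 S_P persons unmatched
# (undercoverage) and one unresolved duplicate pair in S_E, scored 1/2 each.
w_P, P = [1.0] * 200, [0] * 4 + [1] * 196          # R_m = 0.98
w_E, E = [1.0] * 200, [0.5, 0.5] + [1] * 198       # R_c = 0.995
N_k = dual_system_estimate(1000, w_P, P, w_E, E)   # about 1015.3
```

The same two rates give the undercoverage rate $1 - \hat R_m$ and the overcoverage rate $1 - \hat R_c$ directly.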


3. Coverage estimators

3.1 Undercoverage and overcoverage

The undercoverage rate is estimated by $\hat R_{\mathrm{under}} = 1 - \hat R_m$, where $\hat R_m$ is the estimate of the correct matches rate based on the $S_P$ sample. Similarly, the overcoverage rate is defined as $\hat R_{\mathrm{over}} = 1 - \hat R_c$, where $\hat R_c$ is the estimate of the correct records rate based on the $S_E$ sample. The correct matches rate and the correct records rate are estimated by the weighted means of matching status $P_j$ and enumeration status $E_j$, as follows:

$$\hat R_m = \frac{\sum_{j \in S_P} w_{P,j}\, P_j}{\sum_{j \in S_P} w_{P,j}} \quad \text{and} \quad \hat R_c = \frac{\sum_{j \in S_E} w_{E,j}\, E_j}{\sum_{j \in S_E} w_{E,j}}, \qquad (1)$$

where $w_{P,j}$ is the weight of element $j$ of sample $S_P$ and $w_{E,j}$ is the weight of element $j$ of sample $S_E$. We note that the denominator of $\hat R_c$ is the sum of the weights $w_{E,j}$ of $S_E$ and not the number $C$ of known records in the census, so as to have a potentially less biased estimator.

The estimate of the undercoverage and overcoverage rates in a domain $d$ is given by $\hat R_{\mathrm{under},d} = 1 - \hat R_{m,d}$ and $\hat R_{\mathrm{over},d} = 1 - \hat R_{c,d}$, with

$$\hat R_{m,d} = \frac{\sum_{j \in S_P} w_{P,j}\, P_j\, I_{jd}}{\sum_{j \in S_P} w_{P,j}\, I_{jd}} \quad \text{and} \quad \hat R_{c,d} = \frac{\sum_{j \in S_E} w_{E,j}\, E_j\, J_{jd}}{\sum_{j \in S_E} w_{E,j}\, J_{jd}}. \qquad (2)$$

Identifiers $I_{jd}$ and $J_{jd}$ take on the value 1 if element $j$, respectively of $S_P$ and $S_E$, is found in domain $d$; otherwise their value is 0.

3.2 Net coverage

The net undercoverage rate is estimated by $\hat R_{\mathrm{netunder}} = 1 - \hat R_{\mathrm{net}}$, where $\hat R_{\mathrm{net}} = C/\hat N$ is the estimate of the net coverage rate, $C$ is the number enumerated in the population of interest and $\hat N$ is the estimate of the true total in the population of interest. If $\hat R_{\mathrm{netunder}}$ is negative, there is net overcoverage.

The estimate of the true total $\hat N$ is based on the dual model (Wolter 1986). This model is built on the principle of capture (census) and recapture (coverage survey). It is applied in estimation cells $k = 1, \ldots, K$ in order to best satisfy the assumptions of the model; see the discussion below. Thus, the estimate of the true total $\hat N$ is composed of the sum of the estimated true totals $\hat N_k$ in disjoint estimation cells covering the population of interest, $k = 1, \ldots, K$:

$$\hat N = \sum_{k=1}^{K} \hat N_k. \qquad (3)$$

The estimated totals $\hat N_k$ have the form given by the dual model:

$$\hat N_k = [N_{1+,k}] \left[ \frac{N_{+1,k}}{N_{11,k}} \right], \qquad (4)$$

where $N_{1+,k}$ is the total of records correctly counted in cell $k$ during capture (census), $N_{+1,k}$ is the total in $k$ during recapture (estimated from sample $S_P$) and $N_{11,k}$ is the number of records common to the two lists (estimated from matches between $S_P$ and the census).

The different terms of equation (4) are estimated using undercoverage and overcoverage estimates. This is an extension of the model in Wolter (1986), similar to the one used by Hogan (2003). Thus, the total of the records correctly counted in the census $N_{1+,k}$ is estimated by the enumerated total $C_k$ multiplied by the correct records rate $\hat R_{c,k}$ to take account of overcoverage. Also, the ratio between the total in the recapture $N_{+1,k}$ and the number of records common to the two lists $N_{11,k}$ is estimated by the inverse of the matching rate $\hat R_{m,k}$ between the coverage survey and the census in order to take account of undercoverage. We obtain

$$\hat N_k = [C_k \hat R_{c,k}][\hat R_{m,k}]^{-1} = C_k [\hat R_{c,k}\, \hat R_{m,k}^{-1}] = C_k \hat F_k, \qquad (5)$$

where $\hat F_k = \hat R_{c,k}\, \hat R_{m,k}^{-1}$ is the coverage correction factor in cell $k$. Factor $\hat F_k$ combines the effects of overcoverage and undercoverage of cell $k$ estimated from samples $S_P$ and $S_E$. We note that undercoverage in one domain may be offset by overcoverage in the same domain. Thus, nil net undercoverage in a domain does not mean that no coverage deficiency exists in it.

The proposed estimates are based on the assumptions of the dual model, the choice of estimation cells, and the choice of the statuses defining the estimators $\hat R_{c,k}$ and $\hat R_{m,k}$. The dual model is useful since it takes into account the fact that some persons are reached neither by the census (capture) nor by the coverage survey (recapture). However, a series of conditions must be met to avoid estimation biases. The coverage survey and the census must be totally independent. The matching must be of very high quality. The model must be applied in cells with persons who have the same probability of being enumerated in the census and the survey respectively; see Section 3.3. Lastly, the population must not change too much between Census Day and the day of the survey. As to the estimators $\hat R_{c,k}$ and $\hat R_{m,k}$, they are based on the quality of the matching and the search for erroneous records. Also, it is necessary to ensure that the definition of a correct match in $S_P$ and the definition of a correct record in $S_E$ are identical, i.e., that there is a balance between overcoverage and undercoverage; see Section 4. All those elements are taken into consideration insofar as possible in the present estimates.

The estimate of net undercoverage in a domain $d$ has the form $\hat R_{\mathrm{netunder},d} = 1 - \hat R_{\mathrm{net},d} = 1 - C_d/\hat N_d$, where $C_d$ is


the enumerated number in the domain and $\hat N_d$ is the estimate of the true total. The estimate of the true total $\hat N_d$ is based on a synthetic model that assumes that the correction factor is fixed in each cell $k = 1, \ldots, K$:

$$\hat N_d = \sum_{k=1}^{K} \hat N_{k,d} = \sum_{k=1}^{K} C_{k,d}\, \hat F_k. \qquad (6)$$

$C_{k,d}$ is the number enumerated in the population of interest in the intersection between cell $k$ and domain $d$, and $\hat F_k$ is the correction factor for the coverage in cell $k$. The hypothesis of the synthetic model is satisfied if the behaviour of any subset in the cell is identical to that of the entire cell. This homogeneity is best controlled by the choice of cells. Here we are using the homogeneous cells defined by the dual model.

3.3 Estimation cells

The estimation cells $k = 1, \ldots, K$ are constructed in such a way as to group together elements that have homogeneous probabilities of enumeration in the census and the survey respectively (dual hypothesis) and homogeneous net coverage rates (synthetic hypothesis). We want a minimum of 100 persons per cell in $S_E$ and $S_P$ in order to control the variance and limit the estimation bias. The variables defining the cells are selected using a logistic regression model and a discrimination method applied to the data from $S_P$ (binary variable: $P_j$). The three most influential variables are cross-tabulated: nationality in two categories, marital status in two categories and size of commune in three categories. The other variables are then successively integrated. Groupings are created when the cell sizes are too small (official language of commune in two categories, age class in seven categories and sex in two categories). In the end, 121 estimation cells are obtained; see Renaud (2004) for more details.

3.4 Variance of coverage estimators

The variance of the estimators is estimated by a stratified jackknife applied to the (identical) primary units of $S_P$ and $S_E$. We note that the variance of the estimated undercoverage $\hat R_{\mathrm{under}} = 1 - \hat R_m$ is equal to the variance of the estimated matching rate $\hat R_m$. Similarly, the variance of the overcoverage $\hat R_{\mathrm{over}} = 1 - \hat R_c$ is equal to that of the correct record rate $\hat R_c$, and the variance of the net undercoverage $\hat R_{\mathrm{netunder}} = 1 - \hat R_{\mathrm{net}}$ is equal to that of the net coverage rate $\hat R_{\mathrm{net}}$.

Let $\theta$ be the parameter of interest taking the form of a weighted mean of statuses in the case of undercoverage and overcoverage, and the form of a linear function of quotients between two weighted means in the case of net undercoverage. Its estimator is $\hat\theta$.

Let $h = 1, \ldots, H$ be the stratum used in the first stage of sampling, $i = 1, \ldots, m_h$ the number of the primary unit in stratum $h$ (postal code for NORD or commune for TICINO), and $j = 1, \ldots, n_{hi}$ the number of the person in primary unit $i$ of $h$. For the needs of the jackknife method, samples $S_P$ and $S_E$ are partitioned, in each stratum $h$, in $m_h$ subsets corresponding to the persons in primary units $\alpha = 1, \ldots, m_h$.

Let $\hat\theta_{(h\alpha)}$ be the estimator having the same form as $\hat\theta$ but calculated on the sample from which primary unit $\alpha$ of stratum $h$ has been removed. We note that estimators $\hat R_{m(h\alpha),k}$ and $\hat R_{c(h\alpha),k}$, $k = 1, \ldots, K$, are combined to form $\hat R_{\mathrm{net}(h\alpha)}$:

$$\hat R_{\mathrm{net}(h\alpha)} = C \left[ \sum_{k=1}^{K} C_k\, \frac{\hat R_{c(h\alpha),k}}{\hat R_{m(h\alpha),k}} \right]^{-1}. \qquad (7)$$

The corrected weights $w'_{hij}$ used to calculate the values $\hat R_{m(h\alpha)}$ and $\hat R_{c(h\alpha)}$ have the following form:

$$w'_{hij} = \begin{cases} 0 & \text{if } i = \alpha \\ w_{hij}\, \dfrac{m_h}{m_h - 1} & \text{if } i \in h \text{ and } i \neq \alpha \\ w_{hij} & \text{if } i \notin h. \end{cases} \qquad (8)$$

This form of correction is preferred to the quotient between the sum of the weights of the elements in the stratum and the sum of the weights without primary unit $\alpha$, since it allows us to take account of the variability due to the unknown number of elements in the stratum.

The jackknife estimator becomes:

$$\hat\theta_{JK} = \frac{\sum_h \sum_{\alpha=1}^{m_h} \hat\theta_{h\alpha}}{\sum_h m_h}, \qquad (9)$$

with pseudo values $\hat\theta_{h\alpha} = m_h\, \hat\theta - (m_h - 1)\, \hat\theta_{(h\alpha)}$. The estimator of its variance can take different forms; see the example of Shao and Tu (1995). We apply the following form:

$$v(\hat\theta_{JK}) = \sum_h \frac{m_h - 1}{m_h} \sum_{\alpha=1}^{m_h} \left( \hat\theta_{(h\alpha)} - \hat\theta_{(h\cdot)} \right)^2, \qquad (10)$$

with $\hat\theta_{(h\cdot)} = \sum_{\alpha=1}^{m_h} \hat\theta_{(h\alpha)}/m_h$. Lastly, we use $v(\hat\theta_{JK})$ as an estimator of the variance of $\hat\theta$. The estimates in the subgroups use the same form of estimator with integration of a domain indicator in the construction of $\hat\theta_{(h\alpha)}$. No correction for the finite population is applied in the estimates. Also, other variabilities are not taken into account, such as the variability induced by the weighting model for non-response in $S_P$.

Problems, such as the lack of stability of estimation in strata with few primary units, appeared in the course of applying this approach. However, tests on the sharing of some primary units and a comparison with the Taylor

linearization or a simple jackknife suggest that the estimators of variance by stratified jackknife that are presented in this document are fairly conservative.

4. Choice of correct matching statuses and enumerations

A key element of coverage estimates is the definition of the correct matching status for the elements of $S_P$ and the correct enumeration status for the elements of $S_E$. These correct statuses are defined on the basis of the statuses $P_j$ and $E_j$ determined during the matchings.

Is a match with a census element that is part of a collective household accepted as a correct match for an element of $S_P$, or is this a case of undercoverage of the population of interest? Is a duplicate outside the population of interest for an element of $S_E$ really considered a duplicate, and hence an instance of overcoverage, or should it be excluded? A clear definition is needed. Also, the statuses used in estimates of net undercoverage must be chosen in such a way as to satisfy the balance between over- and undercoverage; see the concept of "balancing," as, for example, in Hogan (2003). A match ($P_j = 1$) with an element outside the population of interest may, for example, be rejected as a correct match (correct match status = 0, no undercoverage) only if the search for correct records would also detect this element as incorrect because it is out of scope (correct enumeration status = 0, no overcoverage).

The criteria for defining correct statuses are constructed using information available for elements of $S_P$ and $S_E$. As regards $S_P$, we start with the assumption that census records that were matched with elements of $S_P$ serve to identify persons (completeness) and these persons should indeed have been enumerated (appropriateness). We also consider that they are unique, since uniqueness, while controlled by matches, is achieved in the great majority of cases controlled in $S_E$. The criteria of belonging to the population and correctness of location are controlled by comparison with the information collected in the coverage survey, considered as reference information. No supplementary data collection was organized to resolve ambiguous cases. As regards $S_E$, we have the criterion of completeness considered as having been met in the census data and the results concerning uniqueness and appropriateness obtained in the matching with the rest of the census. For duplicates and triplicates, we define $E_j = 1/d'$, with $d' =$ number of duplicates/triplicates in the population of interest according to the census. The criteria of belonging to the population of interest and correctness of location for the elements of $S_E$ cannot be controlled, since we have no reference data supplementing the census.

For estimates of net undercoverage, it is important to meet the balancing requirement. The criteria used in defining correct statuses are thus completeness, appropriateness and uniqueness. The criteria of belonging to the population of interest and correctness of location cannot be considered, since they are not usable in defining the correct enumeration status. The criteria of completeness, appropriateness and uniqueness are already integrated into the construction of the basic statuses $P_j$ and $E_j$. Thus, estimates are made with the basic statuses $P_j$ and $E_j$.

For estimates not using the dual system and the need for balancing, it is possible to use other criteria to define correct statuses. Other types of correct match statuses are used in the analysis of potential measurement errors in Section 5 and the more detailed analyses of matches and enumerations presented in Renaud (2004).

5. Comparison of matches

5.1 Potential measurement errors

Measurement errors or classification errors are related to coverage errors. A person who is classified in domain $d$ according to the census (e.g., a person between 10 and 19 years of age) but who in reality is outside the domain (e.g., a person 60 years of age) would end up as a case of overcoverage in domain $d$ and an undercoverage case outside that domain. This misclassification does not cause a coverage error at the overall level, but it causes an error at the level of subgroups of the population.

The reasons for differences between the values collected in two surveys such as the census and the coverage survey may be quite varied and difficult to dissociate. It is to be expected that there will be matching errors, differences resulting from collection methods (paper questionnaire or telephone/face-to-face interview) and data processing methods, or real differences due to the time lag between the collection periods (December 2000 and April-May 2001). Also, it is difficult to determine the correct response if there are two different values. What is the correct choice - the census? the survey? another value not collected?

Potential measurement errors with respect to census data are analysed on the basis of a set of matches between the independent sample $S_P$ and the census. We choose to determine which variables show respectively few or many potential classification problems, without making a judgment on the quality of either data collection. One use of this information is to evaluate the choice of estimation cells for the dual system and select subgroups for which the estimates of coverage deficiencies are soundest.

For the category variable $X$, we define the matching rate in the good domain $\hat R_X$ as follows:


RˆX = ( Σ_{j ∈ SP match} wP,j PX,j ) / ( Σ_{j ∈ SP match} wP,j ),   (11)

where wP,j is the weight of element j of matched sample SP (SP match) and the classification status PX,j is equal to 1 if element j appears in the same class in the census and the survey, and 0 otherwise. The value of RˆX is estimated with the set of matched elements and with the subgroup of elements without imputation in the census.

We also define a measure of asymmetry φX(d, d′) for classes d and d′ of variable X:

φX(d, d′) = ( Σ_{j ∈ SP match} wP,j Ij(d, d′) ) / ( Σ_{j ∈ SP match} wP,j Ij(d′, d) ),   (12)

where Ij(d, d′) = 1 if element j appears in domain d according to the survey and in domain d′ according to the census, and 0 otherwise. The factor φX(d, d′) is equal to 1 if there is a balance in the classification errors - in other words, if the number of elements in d according to the survey and d′ according to the census is equal to the number in d′ according to the survey and in d according to the census. The further the factor lies from 1, the less balance there is.

5.2 Potential location errors

Comparisons between the census and the survey can also be used to study people's geographic location. In the census data, we have a unique address if the person has a single residence and two addresses - principal and secondary - if the person has two residences. In the survey data, we have one or two addresses on Census Day, one or two addresses on the day of the survey and information on a possible move between the two dates. If a person has a single residence and has not moved, that person's principal address on Census Day and his/her principal address on the day of the survey are identical. The person does not have secondary addresses.

Different measures of distance are considered in order to determine potential location errors in the census. For practical reasons, including the data available, we define geographic areas around the person's principal address collected in the survey for Census Day (reference address). The areas are sets of political communes. They are defined on the basis of postal codes identified in the survey. The person's basic area is defined by the set of communes that have buildings within the postal code of the person's reference address. The definition of this area uses data from the Swiss building register, since the latter has information on buildings' postal addresses and the communes within which they are located. The extended area includes the communes within the basic area and the set of communes adjacent to them; see Renaud (2004) for examples.

Like classification errors, location errors do not cause coverage errors at the overall level, but they cause errors at the level of subgroups such as regions or types of communes. Different rates may be defined. We will retain the basic location rate and the extended location rate, both weighted by wP,j, the weight of element j of matched sample SP. The location status takes on the value 1 if the element lies within the basic area or the extended area, as the case may be; otherwise it equals 0. In particular, we will study the correctness of location of persons who have moved, in order to detect possible problems relating to the time lag between Census Day and the actual day of collection of census data.

6. Results

6.1 Estimates of coverage deficiencies

The overall net undercoverage rate is estimated at 1.41% with a standard deviation of 0.12%. The overcoverage rate is 0.35% (standard deviation = 0.03%) and the undercoverage rate is 1.64% (standard deviation = 0.11%). These results are of the same order of magnitude as those of other countries, although they are in the lower range; see Table 1.

Overcoverage is minor in the great majority of the domains studied. The highest rate is observed for persons between 20 and 31 years of age (0.93% with a standard deviation of 0.09%); see Table 2. However, undercoverage is high in several domains. For example, a rate of 8.03% (standard deviation = 0.85%) is observed for foreigners with temporary settlement permits ("other permits"), and a rate of 3.50% (standard deviation = 0.50%) is observed for 20-31-year-olds. Also, an undercoverage rate of 2.4% is observed in the Italian-speaking region of the country (language of commune: Italian; NUTS region: Ticino; and collection method: TICINO). However, these results are subject to relatively great variability (standard deviations of approximately 0.5%), since samples SP and SE include only 1,500 and 1,700 persons respectively in this region.

Net undercoverage is positive in all the domains studied. There is therefore no net overcoverage. The highest values are observed for foreigners with permanent or temporary permits (2.89% and 3.48%, standard deviations = 0.32% and 0.39%) as well as for 20-31-year-olds (2.84%, standard deviation = 0.36%). No significant difference is observed between males and females, between languages or between NUTS regions. Because of the small size of the sample with the collection variant TICINO, this method cannot be differentiated from the others used in the country. On the other hand, significant differences are observed between marital statuses, as well as between types and sizes of communes.
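Formulas (11) and (12) are weighted ratios over the matched sample. A minimal Python sketch (the record layout and function names are our own illustration, not from the paper):

```python
def matching_rate(records):
    """R_X in (11): weighted share of matched elements that fall in the
    same class of variable X in both the survey and the census.
    Each record is a tuple (weight, survey_class, census_class)."""
    total = sum(w for w, _, _ in records)
    agree = sum(w for w, s, c in records if s == c)
    return agree / total

def asymmetry(records, d, d_prime):
    """phi_X(d, d') in (12): weighted count classified d in the survey and
    d' in the census, over the weighted count of the reverse discordance."""
    num = sum(w for w, s, c in records if s == d and c == d_prime)
    den = sum(w for w, s, c in records if s == d_prime and c == d)
    return num / den
```

With equal weights, asymmetry() reduces to a simple ratio of the two off-diagonal cells of the survey-by-census cross-classification, which is how the factors quoted in Section 6.2 can be read off Tables 3 and 4.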


Table 1
International comparison of overall results. Estimated rates of overcoverage Rˆover, undercoverage Rˆunder and net undercoverage Rˆnetunder, with corresponding estimated standard deviations. References: Statistics Canada (1999, 2004), Hogan (1993, 2003), McLennan (1997) and Trewin (2003)

                     Overcoverage [%]  Undercoverage [%]  Net undercoverage [%]
Switzerland   2000   0.3 (0.03)        1.64 (0.11)        1.41 (0.12)
Canada        1996   0.74 (0.04)       3.18 (0.09)        2.45 (0.10)
              2001   0.96 (0.05)       3.95 (0.13)        2.99 (0.14)
United States 1990   3.1               4.7                1.6 (0.10)
              2000   -                 -                  -0.5 (0.21)
Australia     1996   0.2               1.8                1.6 (0.10)
              2001   0.9               2.7                1.8 (0.10)

We note that the net undercoverage rate is greater than the undercoverage rate in the case of permanent settlement permits. This effect, which is unrealistic, is due to the choice of estimation cells and the resulting smoothing. The construction of the cells made it necessary to group foreigners with permanent and temporary permits into a single category for aggregates so as to obtain the minimum size of 100 persons per cell. By making this grouping, we are treating foreigners as a homogeneous group, whereas this is not the case. This shows the limitations of the method and the difficulty of satisfying the assumptions of the models used in applying the approach. In the case of foreigners, we note, however, that the confidence intervals of the undercoverage and net undercoverage rates overlap. The consequences of the weaknesses of the application are therefore limited.

It should also be noted that the results are presented in domains defined by variables for which low levels of potential measurement errors were observed. The fact is that results for groups as defined by household or labour market characteristics would not be very reliable; see Section 6.2.

The precision of the results obtained is generally better than the objective set at the beginning of the project. That objective was to have a standard deviation of 0.3% for subgroups of 10,000 individuals in SP. In the case of, for example, age classes 32-44 and 45-59, which have between 10,000 and 12,000 persons, the standard deviations are 0.19% and 0.14%.

6.2 Potential measurement and classification errors

Of the 49,107 elements matched between the coverage survey and the census, 96% exhibit no difference in sex, the seven age classes, the three marital status classes and the three settlement permit classes (Swiss, permanent, temporary). The matching rate in the good domain RˆX is 99.3% for sex (with and without imputations), 98.3% for marital status (98.4% for non-imputed values) and 98.7% for settlement permits (98.8% for non-imputed values). The RˆX rate is 99.5% for age classes (with or without imputations). However, it should be noted that date of birth, along with surname and given name, was one of the main variables in the matching. Age differences are therefore possible only in the case of a non-automatic (computer-assisted or manual) matching. Three variables exhibit a matching rate in the good domain that is markedly lower than that observed for sex, age, permit and marital status. These are the variables for labour market status (in the labour market, unemployed, not in the labour market), position in household (alone, spouse, common-law union, person with child or children, other head of household, related to head of household, other; results limited to private households), and size of the person's household (according to economic residence and in private households). The RˆX rate is 90.4% for labour force status (91.1% for non-imputed values), 91.4% for position in household (94.9% for non-imputed values) and 88.3% for household size.

The measure of asymmetry φX(d, d′) takes on the value 1.33 for sex (d = male and d′ = female). There are more persons coded as males according to the survey who are coded as females in the census than there are females according to the survey who are males according to the census. The proportion of males is slightly higher in the survey. However, these results must be interpreted with caution, since they are based on very few cases; see Table 3. A McNemar test is just significant at the 5% level without taking the design into account, but it is no longer significant at that level if the design is factored in. On the other hand, quite substantial asymmetries are observed for marital status. There are fewer single persons in the survey who are married in the census than the reverse (factor 0.33 for d = single and d′ = married). Similarly, there are fewer married persons in the survey who are widowed in the census than the reverse (factor 0.42 for d = married and d′ = other).

Asymmetry is also observed for the settlement permit variable. The tendency is to have more Swiss persons in the survey who are described as foreigners in the census than the reverse, and to have more permanent permits in the survey and temporary permits in the census than the reverse (factors 5.22 for d = Swiss and d′ = foreigner with permanent permit and 3.83 for d = foreigner with permanent permit and d′ = foreigner with temporary permit). The factors calculated are based on few cases. However, they give an insight into the potential differences between data collection via the census questionnaire and a survey conducted mainly by telephone. The labour force status variable includes more divergent cases; see Table 4. Thus, for example, we observe fewer persons in the labour force in the survey who are not in the labour force in the census than the reverse (factor of 0.46 for d = in labour force and d′ = not in labour force). There are also fewer unemployed persons in the survey and persons not in the labour force in the census than the reverse (factor of 0.26 for d = unemployed and d′ = not in labour force). The position-in-household variable also exhibits asymmetries, but these are based on few elements, since the dispersion of the elements in the boxes (d, d′) is sizable. The census variables at the household level (position in household and size of household) are influenced by the complex process of household formation. They are less reliable than those concerning persons. The values at the household level are more reliable in the survey.

Table 2
Enumerated number C and estimated rates of overcoverage Rˆover, undercoverage Rˆunder and net undercoverage Rˆnetunder for different domains [%], with corresponding estimated standard deviations (SDs)

Variable           Category              C          Rˆover (SD)   Rˆunder (SD)  Rˆnetunder (SD)
Overall                                  7,121,626  0.35 (0.03)   1.64 (0.11)   1.41 (0.12)
Sex                Male                  3,497,940  0.37 (0.04)   1.74 (0.13)   1.46 (0.13)
                   Female                3,623,686  0.33 (0.03)   1.55 (0.10)   1.37 (0.13)
Age class          ≤ 9                     810,373  0.26 (0.05)   1.46 (0.21)   1.34 (0.26)
                   10-19                   833,185  0.27 (0.05)   1.30 (0.19)   1.04 (0.22)
                   20-31                 1,115,804  0.93 (0.09)   3.50 (0.34)   2.84 (0.36)
                   32-44                 1,544,721  0.33 (0.05)   1.65 (0.16)   1.43 (0.19)
                   45-59                 1,431,771  0.22 (0.04)   1.18 (0.14)   1.04 (0.14)
                   60-79                 1,146,709  0.10 (0.03)   0.91 (0.13)   0.82 (0.12)
                   ≥ 80                    239,063  0.11 (0.06)   1.20 (0.31)   1.03 (0.27)
Settlement permit  Swiss                 5,674,266  0.33 (0.03)   1.28 (0.09)   0.98 (0.10)
                   Foreigner, permanent  1,020,242  0.33 (0.06)   1.85 (0.29)   2.89 (0.32)
                   Foreigner, temporary    427,118  0.56 (0.11)   8.03 (0.85)   3.48 (0.39)
Marital status     Single                2,975,643  0.50 (0.05)   2.07 (0.18)   1.72 (0.19)
                   Married               3,377,223  0.23 (0.04)   1.27 (0.11)   1.25 (0.12)
                   Widowed                 369,339  0.25 (0.08)   1.23 (0.26)   0.79 (0.13)
                   Divorced                399,421  0.24 (0.08)   1.95 (0.35)   1.02 (0.10)
Commune language   German + Romansh      5,128,353  0.33 (0.04)   1.50 (0.11)   1.28 (0.12)
                   French                1,680,062  0.35 (0.06)   1.89 (0.25)   1.79 (0.27)
                   Italian                 313,211  0.53 (0.12)   2.35 (0.49)   1.56 (0.19)
NUTS region        Région lémanique      1,296,464  0.37 (0.07)   2.19 (0.38)   1.84 (0.28)
                   Espace Mittelland     1,640,489  0.35 (0.09)   1.39 (0.15)   1.25 (0.10)
                   Nordwestschweiz         976,699  0.18 (0.04)   1.50 (0.27)   1.32 (0.12)
                   Zurich                1,221,014  0.31 (0.05)   1.58 (0.19)   1.46 (0.13)
                   Ostschweiz            1,020,897  0.40 (0.07)   1.29 (0.23)   1.24 (0.12)
                   Zentralschweiz          665,904  0.36 (0.06)   1.57 (0.25)   1.19 (0.12)
                   Ticino                  300,159  0.54 (0.12)   2.38 (0.52)   1.57 (0.19)
Commune size       Small                 1,372,958  0.34 (0.05)   1.50 (0.15)   1.12 (0.14)
                   Medium                2,398,256  0.41 (0.07)   1.32 (0.16)   1.07 (0.19)
                   Large                 3,350,412  0.31 (0.03)   2.01 (0.19)   1.77 (0.19)
Type               City/town             2,078,780  0.35 (0.04)   1.96 (0.17)   1.82 (0.20)
                   Agglomeration         3,145,541  0.36 (0.06)   1.49 (0.19)   1.34 (0.12)
                   Rural                 1,897,305  0.32 (0.04)   1.56 (0.17)   1.07 (0.12)
Collection method  TRADITIONAL             265,607  0.39 (0.05)   1.91 (0.28)   1.07 (0.12)
                   SEMI-TRADITIONAL        174,501  0.37 (0.08)   1.07 (0.24)   1.16 (0.13)
                   TRANSIT + FUTURE      6,381,359  0.33 (0.03)   1.62 (0.11)   1.42 (0.12)
                   TICINO                  300,159  0.54 (0.12)   2.38 (0.52)   1.57 (0.19)
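The text notes that for foreigners the confidence intervals of the undercoverage and net undercoverage rates overlap. This can be checked numerically from the Table 2 figures for permanent permits (a quick sketch; the 95% normal multiplier z = 1.96 is our assumption, as the paper does not give the interval bounds):

```python
# Foreigners with permanent permits, from Table 2:
# undercoverage 1.85% (SD 0.29), net undercoverage 2.89% (SD 0.32).
z = 1.96  # assumed normal 95% multiplier
under_lo, under_hi = 1.85 - z * 0.29, 1.85 + z * 0.29
net_lo, net_hi = 2.89 - z * 0.32, 2.89 + z * 0.32
overlap = (under_hi >= net_lo) and (net_hi >= under_lo)
print(round(under_hi, 2), round(net_lo, 2), overlap)  # → 2.42 2.26 True
```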


Table 3 Comparison of values collected in the survey and the census for the sex variable

                              Survey
Census                        Male     Female   Total
Not matched      Total           393      383     776
Matched          Total        24,171   24,936  49,107
Matched          Male         23,967      166  24,133
                 Female          204   24,770  24,974
Matched          Male              6        0       6
(imputed value)  Female            0       13      13
Total                         24,564   25,319  49,883
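The "just significant" unadjusted McNemar result quoted in Section 6.2 can be reproduced from the discordant cells of Table 3 (204 survey-male/census-female versus 166 survey-female/census-male; our sketch, ignoring the survey design exactly as the unadjusted test described in the text does):

```python
# Discordant counts from the non-imputed matched block of Table 3.
survey_male_census_female = 204
survey_female_census_male = 166
b, c = survey_male_census_female, survey_female_census_male
stat = (b - c) ** 2 / (b + c)  # McNemar chi-square, 1 degree of freedom
print(round(stat, 2))          # → 3.9, just above the 5% critical value 3.84
```

The unweighted asymmetry ratio 204/166 ≈ 1.23 is of the same order as the weighted factor of 1.33 reported in the text.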

Table 4 Comparison of values collected in the survey and the census for the labour force status variable

                                         Survey
Census                                   Employed  Unemployed  Not in labour force  ≤ 15 years of age   Total
Not matched         Total                     424          23                  217                112     776
Matched             Total                  25,163         498               14,501              8,945  49,107
Matched             Employed               23,953         188                2,007                 13  26,161
                    Unemployed                300         221                  323                  1     845
                    Not in labour force       901          89               12,143                 18  13,151
                    ≤ 15 years of age           9           0                   28              8,913   8,950
Matched             Employed                  564          22                  312                  6     904
(imputed variable)  Unemployed                 14           8                   26                  1      49
                    Not in labour force        92          15                  881                  5     993
                    ≤ 15 years of age           0           0                    0                  0       0
Total                                      25,587         521               14,718              9,057  49,883
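The labour-force asymmetry factors can be approximated from the unweighted counts in Table 4 (rows are census categories, columns survey categories). The paper's factors of 0.46 and 0.26 are weighted, so the unweighted ratios below differ slightly:

```python
# Survey "in labour force" (employed or unemployed) but census "not in labour
# force", and the reverse, from the non-imputed matched block of Table 4.
in_lf_survey_not_in_lf_census = 901 + 89     # row "Not in labour force", cols Employed/Unemployed
not_in_lf_survey_in_lf_census = 2007 + 323   # col "Not in labour force", rows Employed/Unemployed
phi_in_lf = in_lf_survey_not_in_lf_census / not_in_lf_survey_in_lf_census

phi_unemployed = 89 / 323  # unemployed-vs-"not in labour force" discordant cells
print(round(phi_in_lf, 2), round(phi_unemployed, 2))  # → 0.42 0.28
```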

6.3 Potential location and time lag errors

Of the 49,107 elements matched between the coverage survey and the census, 97.7% are found within the basic area around the reference address collected in the survey. The corresponding value is 98.1% for persons who did not indicate any move between Census Day and the day of the survey. It is 83.9% for those who indicated a move (1,512 persons); see absolute numbers in Table 5.

It is worth noting that 9.4% of the persons in NORD who did not move are found close to their reference address but not in exactly the same building. While these problems of exact location have a negligible effect on the census data, they show the difficulty of identifying the buildings sampled when constructing lists of households in the field during the survey, as well as the difficulty of assigning persons to buildings during the processing of the census data. However, a supplementary survey would be needed to evaluate the respective effects of these two difficulties.

Efforts to locate persons who moved indicate that 151 = 145 + 6 persons were located near their address reported on the day of the survey and not near their Census Day address (9%, weighted). Also, a set of 688 persons in NORD, among the 922 located in the two basic areas, were actually found to be residing in the building on the day of the survey. During the coverage survey, special care was taken regarding questions on addresses on Census Day and on the day of the survey. We therefore believe that the addresses of persons who moved are of better quality in the survey data than in the census data. On this basis, we deduce that out of the 1,512 persons who moved, at least 151 + 688 = 839 are enumerated in the census at an address that they did not have on the official day of data collection but at an address that they had some time after that date. The exact time lag is not known, since the moving date was not collected in the survey.


Table 5 Comparison of the location of matched persons. The areas are defined for the address on Census Day (according to information collected in the survey) and for the address on the day of the survey (also according to information collected in the survey). Presence in the basic area, the extended area (outside the basic area) or outside the extended area for persons who did not move (stayed) and persons who moved (moved) between the census and the survey

                          Day of survey
                        Stayed        Moved
Census Day              Basic area    Basic area  Extended area  Outside extended area  Total
Basic area                  46,689           922             69                    277  1,268
Extended area                  258            42              4                      3     49
Outside extended area          648           145              6                     28    179
Missing                          0            15              1                      0     16
Total                       47,595         1,124             80                    308  1,512
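The counts used in Section 6.3 can be read directly from Table 5. A small consistency check (the numbers are copied from the table; the 688 figure comes from the text, not the table):

```python
# Movers enumerated outside the extended area of their Census Day address but
# found in the basic or extended area of their survey-day address: 145 + 6.
near_survey_day_address = 145 + 6
movers_total = 1124 + 80 + 308   # column totals of the "Moved" block in Table 5
residing_in_building = 688       # subset of the 922 movers, per the text
at_post_move_address = near_survey_day_address + residing_in_building
print(near_survey_day_address, movers_total, at_post_move_address)  # → 151 1512 839
```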

7. Conclusion

Overall coverage deficiencies in the 2000 census of population in Switzerland are of the same order of magnitude as those for the censuses of other countries. However, differences are noted for subgroups (e.g., regions). Of the three components, undercoverage is of great interest, since it not only serves to detect groups of persons not well enumerated, but it also lends itself to analysing location and measurement errors. As to overcoverage estimates, these are limited by the lack of information supplementary to the census for SE. In the future, they could be improved by collecting supplementary information on characteristics reported on Census Day in a survey of persons in that sample (e.g., location and household type). Net undercoverage estimates are based on several assumptions. The results in large domains seem reliable, but certain risks, notably related to the choice of estimation cells, exist when domains are smaller. For future estimates, we propose to evaluate the model approach applied in the United Kingdom instead of the estimation cells traditionally used in the United States.

An important element to review for future estimates is the choice of the population of interest. The decision to limit that population to persons in private households and the economic residence led to a few problems in the estimates, since it was difficult to delimit that population precisely. In a future estimation, collective households could be excluded so as to avoid practical problems relating to collection but retain all types of residences. The set of records for the economic residence would then be treated as a domain.

Estimating the coverage deficiencies of a census is an ambitious project that has proved to be worthwhile. The results provide information on the quality of the data from the 2000 census and the different coverage problems. Upcoming censuses will essentially be based on registers. Coverage estimates will be based on the experience acquired in making the 2000 estimates, with probable adaptations to take account of the new data collection system.

Acknowledgements

I wish to thank Philippe Eichenberger of the statistical methods unit of the Office fédéral de la statistique for the productive discussions throughout the project. Warm thanks are also extended to Dr. Rajendra Singh and his colleagues in the Decennial Statistical Studies Division of the U.S. Census Bureau for their assistance in developing methods and estimates. I also wish to thank all census personnel who performed tasks and provided information needed to carry out the project, and Paul-André Salamin of the statistical methods unit for his careful rereading of the article.

References

ABS (1997). The 1996 census of population and housing. Annual report 1996-97, Australian Bureau of Statistics.

Brown, J.J., Diamond, I.D., Chambers, R.L., Buckner, L.J. and Teague, A.D. (1999). A methodological strategy for a one-number census in the UK. Journal of the Royal Statistical Society, Series A, 162(2), 247-267.

Fienberg, S.E. (1992). Bibliography on capture-recapture modelling with application to census undercount adjustment. Survey Methodology, 18(1), 143-154.

Hogan, H. (1993). The post enumeration survey: Operation and results. Journal of the American Statistical Association, 88(423), 1047-1060.

Hogan, H. (2003). The accuracy and coverage evaluation: Theory and design. Survey Methodology, 29(2), 129-138.

McLennan, W. (1997). Census of Population and Housing, Data Quality - Undercount, Australia 1996. Information paper 2940.0, Australian Bureau of Statistics.


Renaud, A. (2001). Methodology of the Swiss Census 2000 Coverage Survey. Proceedings of the Survey Research Methods Section [CD-ROM], American Statistical Association.

Renaud, A. (2003). Estimation de la couverture du recensement de la population de l'an 2000. Échantillon pour l'estimation de la sur-couverture (E-sample). Methodology Report 338-0019, Federal Statistical Office.

Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. Springer Series in Statistics.

Statistics Canada (1999). Couverture. Rapport technique du recensement de 1996, 92-370-XIF, Statistics Canada.

Statistics Canada (2004). Couverture. Rapport technique du recensement de 2001, 92-394-XIF, Statistics Canada.

Renaud, A. (2004). Coverage estimation for the Swiss population census 2000. Estimation methodology and results. Methodology Report 338-0027, Swiss Federal Statistical Office.

Trewin, D. (2003). Census of Population and Housing, Data Quality - Undercount, Australia 2001. Information paper 2940.0, Australian Bureau of Statistics.

Renaud, A., and Eichenberger, P. (2002). Estimation de la couverture du recensement de la population de l'an 2000. Procédure d'enquête et plan d'échantillonnage de l'enquête de couverture. Methodology Report 338-0009, Federal Statistical Office.

Wolter, K.M. (1986). Some coverage error models for census data. Journal of the American Statistical Association, 81(394), 338-346.

Renaud, A., and Potterat, J. (2004). Estimation de la couverture du recensement de la population de l’an 2000. Echantillon pour l’estimation de la sous-couverture (P-sample) et qualité du cadre de sondage des bâtiments. Methodology Report 338-0023, Federal Statistical Office.

Statistics Canada, Catalogue No. 12-001-X Survey Methodology, December 2007 211 Vol. 33, No. 2, pp. 211­215 Statistics Canada, Catalogue No. 12­001­X

Use of a web-based convenience sample to supplement a probability sample

Marc N. Elliott and Amelia Haviland 1

Abstract

In this paper we describe a methodology for combining a convenience sample with a probability sample in order to produce an estimator with a smaller mean squared error (MSE) than estimators based on only the probability sample. We then explore the properties of the resulting composite estimator, a linear combination of the convenience and probability sample estimators with weights that are a function of bias. We discuss the estimator's properties in the context of web-based convenience sampling. Our analysis demonstrates that the use of a convenience sample to supplement a probability sample for improvements in the MSE of estimation may be practical only under limited circumstances. First, the remaining bias of the estimator based on the convenience sample must be quite small, equivalent to no more than 0.1 of the outcome's population standard deviation. For a dichotomous outcome, this implies a bias of no more than five percentage points at 50 percent prevalence and no more than three percentage points at 10 percent prevalence. Second, the probability sample should contain at least 1,000-10,000 observations for adequate estimation of the bias of the convenience sample estimator. Third, it must be inexpensive and feasible to collect at least thousands (and probably tens of thousands) of web-based convenience observations. The conclusions about the limited usefulness of convenience samples with estimator bias of more than 0.1 standard deviations also apply to direct use of estimators based on that sample.

Key Words: Bias; Composite estimator; Calibration.

1. Introduction

Web-based surveys have steadily increased in use and take a variety of forms (Couper 2000). For instance, web-based probability samples use a traditional sampling frame and provide web-mode as one response option or the only response option. Web-based probability samples can have high response rates and produce estimators with minimal non-response bias (Kypri, Stephenson and Langley 2004). In contrast, web-based convenience samples are based on "inbound" hits to web pages obtained from anyone online who finds the site and chooses to participate (sometimes as a result of advertising to a population that is not specifiable) or based on volunteerism from recruited panels that are not necessarily representative of the intended population.

The primary appeal of web-based convenience samples lies in the potentially very low marginal cost per case. Visits to a web site do not require expensive labor (as for phone calls) or materials (as for mailings) for each case, and they offer rapid data collection and reductions in marginal data processing costs per case. Even with some fixed costs, the total costs per case are potentially very low, especially for large surveys. The disadvantage of these samples is also clear: potentially large and unmeasured selection bias.

Most discussions of web-based convenience samples of which we are aware have either argued that probability samples are unimportant in general, tried to delineate the circumstances under which convenience samples may be useful, or dismissed the use of convenience samples entirely. We explore a different avenue by investigating the possibility of integrating web-based convenience samples into the context of probability sampling.

In this paper we describe a methodology for combining a convenience sample with a probability sample to produce an estimator with a smaller mean squared error (MSE) than estimators that employ only the probability sample. We then explore the properties of the resulting composite estimator, a linear combination of the convenience and probability samples with weights determined by bias. This leads to recommendations regarding the usefulness of supplementing probability samples with web-based convenience samples. Because the marginal costs of web-based convenience samples are very low, we focus on identifying situations in which the increase in effective sample size (ESS) attributable to the inclusion of the convenience sample may be sufficient to justify a dual-mode approach. We demonstrate that there are limited circumstances under which a supplemental web-based convenience sample may meaningfully improve MSE. While we focus on web-based convenience samples, the discussion that follows applies to other low-cost data collection methods with poor population coverage.

2. Problem context

2.1 Initial conditions

For the combined probability/convenience sample, we propose that the same survey be administered simultaneously to a traditional probability sample (with or

1. Marc N. Elliott, Ph.D. and Amelia Haviland, Ph.D., Rand Corporation, 1776 Main Street, Santa Monica, CA 90407, U.S.A.

without a web-based response mode) and a web-based convenience sample. We envision a multi-purpose survey with a number of survey outcomes. In this paper we will focus on the estimation of means, but future work might extend these results to other parameters, such as regression parameters. Although we will initially consider cases where the bias of convenience sample estimates is known, we will later consider the extent to which the probability sample provides a means of measuring the unknown bias in each parameter estimate from the convenience sample.

With known bias, one may combine the convenience and probability samples in a manner that minimizes MSE. If estimates from the convenience sample are very biased, the convenience sample will accomplish little. This possibility requires that the probability sample be large enough to stand on its own. Thus, one approach would be to set aside a small portion of the probability sample budget to create a large convenience sample supplement.

For example, consider a survey for which the primary interest is in estimates for the population as a whole, but for which subpopulation estimates would also be desirable if a sample size supporting adequate precision were affordable. Suppose further that one could draw 4,000 probability observations and 10,000 convenience observations for the cost of the probability sample of 5,000. For a given outcome, if bias is large, standard errors increase moderately through a small proportionate loss in sample size; if bias is small overall and within each subpopulation, there might be a "precision windfall," allowing acceptably precise subpopulation analyses.

2.2 Initial bias reduction

We will demonstrate that the bias of convenience sample estimators must be quite small for the sample to be useful, suggesting that it may be best to focus on estimating parameters that are typically subject to less bias than overall unadjusted population estimates of proportions or means, such as regression coefficients (Kish 1985).

Additionally, one might reduce bias by calibrating the convenience sample to known population values (Kalton and Kasprzyk 1986) or by applying propensity score weights that model membership selection between the two samples to observations from the convenience sample (Rosenbaum 2002; Rosenbaum and Rubin 1983). A small set of items can be included to allow the use of either approach. These items might include both items that predict differences between respondents to web surveys and other survey modes, as well as items tailored to the content of the particular survey. The design effect from the resulting weights will reduce the ESS for convenience sample estimators, but the low cost of these observations makes compensating for moderate design effects affordable.

We then can estimate the remaining bias for a given parameter as the difference between the estimate in the probability sample and the weighted estimate in the convenience sample.

3. Efficiency considerations

3.1 Linear combinations of biased and unbiased estimators of a population mean

The most efficient estimator that is a linear combination of the (weighted) convenience and probability samples is a special case of an estimator given in a result by Rao (2003, pages 57-58). The properties of this estimator lead to general recommendations regarding the conditions of probability sample size, convenience sample size, and convenience sample estimator bias under which the convenience sample meaningfully improves the ESS of the probability sample.

We begin by asking: What is the most efficient estimator of this form when the magnitude of the bias is known? We will later consider relaxing the assumption of known magnitude of the bias.

Let n1 and n2 be the effective sample sizes of the probability and convenience samples, respectively, after dividing nominal sample sizes by design effects associated with the sample design and non-response adjustments. This includes propensity score or other weighting in the case of the convenience sample. The former population has mean µ and variance σ1²; the latter has mean µ + ε and variance σ2², where ε is the known bias remaining after weighting and µ is the unknown parameter of interest. The corresponding sample means have expectation µ and µ + ε and variance σi²/ni for i = 1, 2 under an infinite population sampling model. We assume these two estimators are uncorrelated, as they come from independent samples.

From Rao (2003, pages 57-58), the most efficient composite estimator of µ takes the form

µˆ = [ x̄2 (σ1²/n1) + x̄1 (ε² + σ2²/n2) ] / ( ε² + σ1²/n1 + σ2²/n2 ),

with remaining bias

εc = ε (σ1²/n1) / ( ε² + σ1²/n1 + σ2²/n2 )

and

MSEc = (σ1²/n1)(ε² + σ2²/n2) / ( ε² + σ1²/n1 + σ2²/n2 ).
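The three formulas for the composite estimator translate directly into code. A small sketch (our own function, using the paper's notation for the arguments), together with a check of the two limiting cases ε = 0 and very large ε:

```python
def composite(x1, x2, s1sq, s2sq, n1, n2, eps):
    """Minimum-MSE linear combination of the probability-sample mean x1
    (variance s1sq/n1, unbiased) and the convenience-sample mean x2
    (variance s2sq/n2, known bias eps), after Rao (2003, pages 57-58)."""
    v1 = s1sq / n1                # MSE (variance) of the probability estimator
    m2 = eps ** 2 + s2sq / n2     # MSE of the convenience estimator
    denom = eps ** 2 + s1sq / n1 + s2sq / n2
    mu_hat = (x2 * v1 + x1 * m2) / denom
    bias_c = eps * v1 / denom
    mse_c = v1 * m2 / denom
    return mu_hat, bias_c, mse_c
```

With eps = 0 the estimator reduces to the usual inverse-variance pooling of the two means; as eps grows, the weight on x2 (and the MSE gain over the probability sample alone) vanishes.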

Statistics Canada, Catalogue No. 12-001-X. Survey Methodology, December 2007.

As can be seen, the composite estimator is a convex combination of the convenience sample and probability sample means. The influence of the former is determined by the ratio of the MSE (here, variance) of the probability sample mean to the sum of that term and the MSE of the convenience sample mean. Similarly, the remaining bias is the original bias multiplied by this same ratio, whereas the resultant MSE is the product of the two MSEs divided by their sum. Note that the bias εc approaches zero both as ε → 0 (no selection bias in the convenience sample estimate) and as ε → ∞ (no weight given to the convenience sample).

3.2 Quantifying the contributions of the convenience sample

We now can evaluate the contributions of the convenience sample based on the known remaining bias in its associated estimators. To this end, we will define several quantities. Let
$$\mathrm{ESS}_1 = \frac{\sigma_1^2/n_1}{\mathrm{MSE}_c}\, n_1 = \left( \frac{\varepsilon^2 + \sigma_1^2/n_1 + \sigma_2^2/n_2}{\varepsilon^2 + \sigma_2^2/n_2} \right) n_1$$
be the effective sample size needed for an unbiased sample mean with the same MSE as the composite estimator. To further simplify this expression, let us define the remaining standardized bias, E = ε/σ, and consider the case in which the observations from the convenience and probability populations have equal variance (σ1 = σ2 = σ). In this case, the increment to ESS1 attributable to the convenience sample, the difference between ESS1 with and without the convenience sample, is
$$\frac{1}{1/n_2 + E^2} = \frac{n_2}{1 + n_2 E^2}.$$

3.3 Maximum contribution of the convenience sample

As n2 → ∞, the increment to ESS1 approaches 1/E². This limit, the inverse of the squared standardized bias, is the maximum possible incremental contribution of the convenience sample to ESS1 (abbreviated MICCS). If the MICCS is small, then a convenience sample of any size cannot meaningfully improve MSE. If the MICCS is large enough to be meaningful, we then need to consider what convenience sample sizes are needed to achieve a large proportion of the MICCS.

To develop intuition for the magnitude of E (standardized bias), we consider the important case of a dichotomous outcome, for which E = ε/√(P(1 − P)), where P is the population probability of the outcome. Table 1 below translates bias for a dichotomous outcome from percentage points to standardized bias and then to the corresponding MICCS for P = 0.1 and P = 0.5.

Table 1
Maximum contributions of convenience samples to the estimation of a proportion, by bias in percentage points

E (Standardized   Bias in Percentage Points          MICCS&
Bias#)            (Overall Prevalence of Outcome)
                  10%            50%
0.01              0.3%           0.5%                10,000
0.02              0.6%           1.0%                 2,500
0.05              1.5%           2.5%                   400
0.10              3.0%           5.0%                   100
0.20              6.0%          10.0%                    25

# Of estimators of means using only the convenience sample.
& ESS added with an infinitely large convenience sample, relative to no use of a convenience sample.

For a proportion near 50%, a bias of 2.5 percentage points limits the potential increment to ESS1 to 400. The minimum increment to ESS1 that offsets the fixed cost of setting up the web-based response mode will vary by user, but we suspect increments of less than 100 will rarely be cost-effective. Table 1 then implies that convenience samples for which the standardized biases of estimators restricted to the convenience sample generally exceed 0.1 standard deviations will rarely prove cost-effective. For a dichotomous variable with P between 0.1 and 0.5, this corresponds to a bias of 3 to 5 percentage points.

How easily are biases of this magnitude achieved with adjusted estimates from convenience samples? Several studies have compared propensity-weighted web-based convenience samples to RDD surveys. One (Taylor 2000) advocated the stand-alone use of such convenience samples despite differences of as much as five percentage points in a number of estimates for dichotomous outcomes regarding political attitudes, with standardized biases of 0.05 to 0.10 if one treats RDD as a gold standard. Another (Schonlau, Zapert, Simon, Sanstad, Marcus, Adams, Spranca, Kan, Turner and Berry 2003) does not report magnitudes of differences, but does report that 29 of 37 items regarding health concerns exhibit differences that are statistically significant at p < 0.01. Given the reported sample sizes (and optimistically ignoring any DEFF from weighting), it can be shown that significance at that threshold implies point estimates of standardized bias exceeding 0.05 for estimators of 78% of items. The key outcome in a Slovenian comparison of a probability phone sample and a web-based convenience sample (Vehovar, Manfreda and Batagelj 1999) would be estimated with a standardized bias of more than 0.1 from the convenience sample even after extensive weighting adjustments. It should be noted that there may also be mode effects on responses for the Web mode when compared to a telephone mode among subjects randomized to response mode (Fricker, Galesic, Tourangeau and Yan 2005), so that not all differences between Web convenience samples and non-Web probability samples may result from selection.
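The percentage-point-to-MICCS translation in Table 1 can be reproduced directly. A short sketch (the helper names are ours, not from the paper):

```python
import math

def standardized_bias(bias, P):
    """Standardized bias E = eps / sqrt(P * (1 - P)) for a dichotomous
    outcome, where bias (eps) is expressed on the proportion scale."""
    return bias / math.sqrt(P * (1.0 - P))

def miccs(E):
    """Maximum incremental contribution to ESS1: the limit 1/E**2 of the
    increment n2 / (1 + n2 * E**2) as n2 grows without bound."""
    return 1.0 / E ** 2
```

For example, the third row of Table 1: a 1.5-point bias at P = 0.10, or a 2.5-point bias at P = 0.50, is a standardized bias of 0.05, capping the possible gain at 400 effective observations.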

Elliott and Haviland: Use of a Web-based convenience sample to supplement a probability sample

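In the equal-variance case developed above, the effective sample size of the composite is ESS1 = n1 + n2/(1 + n2E²). A quick sketch (the function name is ours) that reproduces rows of Table 2:

```python
def ess1(n1, n2, E):
    """ESS of the composite estimator under equal variances: the
    probability sample's ESS plus the increment n2 / (1 + n2 * E**2)."""
    return n1 + n2 / (1.0 + n2 * E ** 2)

# Rows of Table 2: with small bias most of the convenience sample's
# information is usable; with E = 0.10 the increment is capped near
# 1/E**2 = 100 no matter how large n2 becomes.
print(round(ess1(1_000, 10_000, 0.01)))   # 6000
print(round(ess1(10_000, 10_000, 0.10)))  # 10099
```

The E = 0.10 rows show the ceiling at work: however many convenience observations are added, the increment cannot exceed the MICCS.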
3.4 Actual contribution of the convenience sample

While the maximum possible increment (MICCS) is 1/E², the actual increment to ESS1 can be expressed as (k/(k + 1)) MICCS, where k = n2E². The shortfall of the actual increment to ESS1 from the MICCS can then be expressed as 1/[E²(1 + n2E²)]. This implies that the returns to ESS1 diminish with increasing size of the convenience sample, more quickly with large bias, since the bias eventually dominates any further variance reduction. Half of the MICCS is achieved when the ESS of the convenience sample is equal to the MICCS. For example, if the bias is 0.01 standard deviations and a convenience sample has an ESS of 10,000, then the MICCS is 10,000, but the actual incremental contribution to ESS1 will be 5,000. This suggests that convenience samples with ESS 2 to 20 times as large as the MICCS will suffice for most purposes, corresponding to 67% to 95% of the potential gain in ESS. Such heuristics in turn imply collecting 200 to 4,000 such cases when E is relatively large (E = 0.05 to 0.10) and 5,000 to 200,000 such cases when E is relatively small (E = 0.01 to 0.02). Table 2 provides illustrative examples of the ESS1 achieved at several combinations of sample sizes and bias.

Table 2
Examples of ESS1 at several sample sizes and levels of standardized bias

n1 (Probability   n2 (Convenience   E (Standardized   ESS1 for the         ESS1 / n1&
Sample Size)      Sample Size)      Bias#)            Composite Estimate
 1,000              1,000           0.01                1,909               1.909
 1,000              1,000           0.10                1,091               1.091
 1,000             10,000           0.01                6,000               6.000
 1,000             10,000           0.10                1,099               1.099
 1,000            100,000           0.01               10,091              10.091
 1,000            100,000           0.10                1,100               1.100
10,000              1,000           0.01               10,909               1.091
10,000              1,000           0.10               10,091               1.009
10,000             10,000           0.01               15,000               1.500
10,000             10,000           0.10               10,099               1.010
10,000            100,000           0.01               19,091               1.909
10,000            100,000           0.10               10,100               1.010

# Of estimators of means using only the convenience sample.
& ESS relative to no use of a convenience sample.

3.5 Precision for estimating bias

Heretofore, we have assumed a known bias in convenience sample estimators; in practice, the bias will need to be estimated using information from both samples. We next explore the extent to which the size of the probability sample also constrains the usefulness of the convenience sample through the need to precisely estimate the remaining bias.

We can estimate ε as the difference between the sample mean of the probability sample and the weighted mean of the convenience sample. The true standard error for the estimate of bias is σ_ε̂ = √(σ1²/n1 + σ2²/n2). If σ1 = σ2 = σ, the true standard error for the estimate of the standardized bias E is σ_Ê = √(1/n1 + 1/n2). No matter how large the convenience sample, this term can never be less than the inverse of the square root of the probability sample size.

It has been demonstrated that the relative error in MSE for a composite estimator is relatively insensitive to small errors in the estimates of bias (Schaible 1978), which is encouraging for well-estimated biases. Unfortunately, unless both the probability and convenience ESS are large, the standard error of the estimate of E is impractically large relative to the values of E that make the convenience supplement useful (E < 0.10). For example, suppose that a probability sample of ESS 1,000 and a convenience sample of ESS 5,000 yielded a point estimate of standardized bias of 0.02. If the point estimate were correct, the convenience sample would increase ESS1 by 1,667. But this estimate could also have a true bias of 0.088 standard deviations (the 95% upper confidence limit), which would imply that the increment would be less than 130.

If we assume that the convenience sample size will always be at least twice the probability sample size, these results imply that practical applications of this technique must have a minimum sample size of 1,000 to 10,000 for the probability sample if they are to address the uncertainty in the magnitude of bias in convenience sample estimators (standard errors of E in the 0.01 to 0.04 range).

4. Discussion

We describe a composite estimator that is a linear combination of an unbiased sample mean estimate from a probability sample and a biased (propensity-score weighted) sample mean estimate from a web-based convenience sample. We use the MSE of this composite estimator to characterize the contributions of the convenience sample to an estimator based only on the probability sample in terms of ESS. We then calculate the maximal contribution of the convenience sample, the role of the convenience sample size in approaching this limit, and the roles of both sample sizes in estimating bias with sufficient precision.

Practitioners sometimes assume that small probability samples are sufficient to estimate the bias in estimates from corresponding convenience samples. Our results suggest otherwise. We demonstrate that the standardized bias of web-based convenience sample estimators after initial adjustments to reduce bias must be quite small (no more than 0.1 standard deviations, and probably less than 0.05 standard deviations) for the MSE of the overall estimate to be meaningfully smaller than it would be without use of the convenience sample. We further demonstrate that convenience sample sizes of thousands or tens of thousands are also needed to realize practical gains. Finally, we demonstrate that a large probability sample size (1,000 to 10,000) is also needed for reasonably precise estimates of the remaining bias in initially bias-adjusted convenience sample estimators. Because the bias of estimates in an application to a multipurpose survey is likely to vary by outcome, the global decision to substitute a large number of inexpensive surveys for fewer traditional surveys must be made carefully.

The greatest opportunity for cost savings may be in large surveys, simply as a function of their size. On the other hand, the greatest proportionate gains in precision are likely to occur for samples of intermediate size. Gains might also be substantial for large samples in which the main inferences concern smaller subgroups. For example, a national survey of 100,000 individuals might make inference to 200 geographic subregions, with samples of 500 for each. If one supplemented this national sample with a very large web-based convenience sample, estimated the bias nationally, and elected to assume that the bias did not vary regionally, one might decrease the MSE of the subregion estimates substantially through the use of such a composite estimator.

As a final caveat, the conclusions about the limited usefulness of convenience samples with estimator bias of more than 0.1 standard deviations are not limited to attempts to use a composite estimator. The same approach can be applied to show that an estimator based only on a convenience sample of any size with a standardized bias of 0.2 (e.g., ten percentage points for a dichotomous variable with P = 0.5) will have an MSE greater than or equal to that of an estimate from a probability sample of size 25.

Acknowledgements

The authors would like to thank John Adams, Ph.D., for his comments on the manuscript, Matthias Schonlau, Ph.D., for his comments on an earlier version of the manuscript, and Colleen Carey, B.A., Kate Sommers-Dawes, B.A., and Michelle Platt, B.A., for their assistance in the preparation of the manuscript.

Marc Elliott is supported in part by the Centers for Disease Control and Prevention (CDC U48/DP000056); the views expressed in this article do not necessarily reflect the views of the Centers for Disease Control and Prevention.

References

Couper, M.P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 464-494.

Fricker, S., Galesic, M., Tourangeau, R. and Yan, T. (2005). An experimental comparison of web and telephone surveys. Public Opinion Quarterly, 69, 370-392.

Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12, 1-16.

Kish, L. (1985). Survey Sampling. New York: John Wiley & Sons, Inc.

Kypri, K., Stephenson, S. and Langley, J. (2004). Assessment of nonresponse bias in an internet survey of alcohol use. Alcoholism: Clinical and Experimental Research, 28, 630-634.

Rao, J.N.K. (2003). Small Area Estimation. New York: John Wiley & Sons, Inc.

Rosenbaum, P.R. (2002). Observational Studies, 2nd Edition. New York: Springer-Verlag.

Rosenbaum, P.R., and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Schaible, W.A. (1978). Choosing weights for composite estimators for small area statistics. Proceedings of the Section on Survey Research Methods, American Statistical Association, 741-746.

Schonlau, M., Zapert, K., Simon, L.P., Sanstad, K., Marcus, S., Adams, J., Spranca, M., Kan, H., Turner, R. and Berry, S. (2003). A comparison between responses from a propensity-weighted web survey and an identical RDD survey. Social Science Computer Review, 21, 1-11.

Taylor, H. (2000). Does internet research 'work'? Comparing online survey results with telephone surveys. International Journal of Market Research, 42, 41-63.

Vehovar, V., Manfreda, K.L. and Batagelj, Z. (1999). Web surveys: Can weighting solve the problem? Proceedings of the Section on Survey Research Methods, American Statistical Association, 962-967.



ACKNOWLEDGEMENTS

Survey Methodology wishes to thank the following people who have provided help or served as referees for one or more papers during 2007.

R.C. Bailey, George Mason University
M. Bankier, Statistics Canada
J.-F. Beaumont, Statistics Canada
Y. Bélanger, Statistics Canada
W. Bell, U.S. Bureau of the Census
Y. Berger, University of Southampton
C. Boudreau, Medical College of Wisconsin
M. Brick, Westat, Inc.
J. Chen, University of Waterloo, Canada
R. Clark, University of Wollongong, Australia
E. Dagum, University of Bologna
A. Dessertaine, EDF, R&D-OSIRIS-CLAMART
F. Dupont, INSEE
M. Elliott, University of Michigan
J. Eltinge, U.S. Bureau of Labor Statistics
L.R. Ernst, Bureau of Labour Statistics, U.S.A.
L. Fattorini, Università di Siena
R. Fisher, U.S. Department of Treasury
W. Fuller, Iowa State University
J. Gambino, Statistics Canada
D. Garriguet, Statistics Canada
C. Girard, Statistics Canada
E. Gois, Instituto Nacional de Estatistica, Portugal
B. Graubard, National Cancer Institute
R. Griffin, US Bureau of the Census
D. Haziza, Statistics Canada
S. Heeringa, ISR, Michigan
M. Hidiroglou, Statistics Canada
J. Jiang, University of California at Davis
W. Kalsbeek, U. of North Carolina
J.-K. Kim, Department of Applied Statistics, Seoul, Korea
P. Knottnerus, Statistics Netherlands
E. Korn, National Institute of Health
P. Kott, National Agricultural Statistics Service
D. Ladiray, INSEE
P. Lahiri, JPSM, University of Maryland
N. Laniel, Statistics Canada
M. Latouche, Statistics Canada
P. Lavallée, Statistics Canada
S. Lee, University of California at Los Angeles
S. Lele, University of Alberta
C. Léon, Statistics Canada
P.S. Lévy, RTI International
S. Lohr, Arizona State University
W.W. Lu, Acadia University
L. Mach, Statistics Canada
T. Maiti, Iowa State University
D. Malec, United States Bureau of the Census
H. Mantel, Statistics Canada
A. Matei, Université de Neuchâtel, Switzerland
S. Merad, Office for National Statistics, UK
J. Montequila, Westat, Inc.
R. Munnich, University of Tubingen
P.L. Nascimento Silva, University of Southampton
G. Nathan, Hebrew University of Jerusalem
J. Opsomer, Iowa State University
S.M. Paddock, RAND Corporation, U.S.A.
S. Pramanik, JPSM, University of Maryland
N. Prasad, University of Alberta
L. Qualité, Université de Neuchâtel, Switzerland
T.J. Rao, ISI
J.N.K. Rao, Carleton University
L.-P. Rivest, Université Laval
S. Rubin-Bleuer, Statistics Canada
M. del Mar Rueda, University of Granada
P. Saavedra, ORC Macro, U.S.A.
F. Scheuren, NORC
E. Schindler, US Bureau of the Census
K. Sinha, University of Illinois at Chicago, U.S.A.
C.J. Skinner, University of Southampton
P.J. Smith, Center for Disease Control and Prevention
M. Thompson, University of Waterloo
Y. Tillé, Université de Neuchâtel
F. Verret, Statistics Canada
S.G. Walker, University of Kent, U.K.
W. Winkler, US Bureau of the Census
K. Wolter, NORC
C. Wu, University of Waterloo
W. Yung, Statistics Canada

Acknowledgements are also due to those who assisted during the production of the 2007 issues: Cécile Bourque, Louise Demers, Anne­Marie Fleury, Roberto Guido, Liliane Lanoie, Micheal Pelchat and Isabelle Poliquin (Dissemination Division), Sheri Buck (Systems Development Division), François Beaudin (Official Languages and Translation Division) and Sophie Chartier and Gayle Keely (Household Survey Methods Division). Finally we wish to acknowledge Christine Cousineau, Céline Ethier and Denis Lemire of Household Survey Methods Division, for their support with coordination, typing and copy editing.

JOURNAL OF OFFICIAL STATISTICS
An International Review
Published by Statistics Sweden

JOS is a scholarly quarterly that specializes in statistical methodology and applications. Survey methodology and other issues pertinent to the production of statistics at national offices and other statistical organizations are emphasized. All manuscripts are rigorously reviewed by independent referees and members of the Editorial Board.

Contents Volume 23, No. 1, 2007

Challenges to the Confidentiality of U.S. Federal Statistics, 1910-1965, Margo Anderson and William Seltzer ...... 1
Efficient Stratification Based on Nonparametric Regression Methods, Enrico Fabrizi and Carlo Trivisano ...... 35
A Selection Strategy for Weighting Variables Under a Not-Missing-at-Random Assumption, Barry Schouten ...... 51
Imputing for Late Reporting in the U.S. Current Employment Statistics Survey, Kennon R. Copeland and Richard Valliant ...... 69
Incentives in Random Digit Dial Telephone Surveys: A Replication and Extension, Richard Curtin, Eleanor Singer and Stanley Presser ...... 91
Methods for Achieving Equivalence of Samples in Cross-National Surveys: The European Social Survey Experience, Peter Lynn, Sabine Häder, Siegfried Gabler and Seppo Laaksonen ...... 107
Book and Software Reviews ...... 125

Contents Volume 23, No. 2, 2007

Estimation of Nonresponse Bias in the European Social Survey: Using Information from Reluctant Respondents, Jaak Billiet, Michel Phillippens, Rory Fitzgerald and Ineke Stoop ...... 135
Measuring Disability in Surveys: Consistency over Time and Across Respondents, Sunghee Lee, Nancy A. Mathiowetz and Roger Tourangeau ...... 163
Quantifying Stability and Change in Ethnic Group, Ludi Simpson and Bola Akinwale ...... 185
Seasonal Adjustment of Weekly Time Series with Application to Unemployment Insurance Claims and Steel Production, William P. Cleveland and Stuart Scott ...... 209
Finite Population Small Area Interval Estimation, Li-Chun Zhang ...... 223
Predicting Natural Gas Production in Texas: An Application of Nonparametric Reporting Lag Distribution Estimation, Crystal D. Linkletter and Randy R. Sitter ...... 239
Disclosure Avoidance Practices and Research at the U.S. Census Bureau: An Update, Laura Zayatz ...... 253
Book and Software Reviews ...... 267

Contents Volume 23, No. 3, 2007

The Morris Hansen Lecture 2006: Statistical Perspectives on Spatial Social Science, Michael F. Goodchild ...... 269
Using Geospatial Information Resources in Sample Surveys, Sarah M. Nusser ...... 285
Discussion, Linda Williams Pickle ...... 291
Optimizing the Use of Microdata: An Overview of the Issues, Julia Lane ...... 299
Benchmarking the Effect of Cell Adjustment on Tabular Outputs: The Shortcomings of Current Approaches, Paul Williamson ...... 319
Summary of Accuracy and Coverage Evaluation for the U.S. Census 2000, Mary H. Mulry ...... 345
Resampling Variance Estimation in Surveys with Missing Data, A.C. Davison and S. Sardy ...... 371
Nonresponse Among Ethnic Minorities: A Multivariate Analysis, Remco Feskens, Joop Hox, Gerty Lensvelt-Mulders and Hans Schmeets ...... 387
Procedures for Updating Classification Systems: A Study of Biotechnology and the Standard Occupational Classification System, Neil Malhotra and Jon A. Krosnick ...... 409

All inquiries about submissions and subscriptions should be directed to [email protected].

The Canadian Journal of Statistics / La Revue Canadienne de Statistique

CONTENTS / TABLE DES MATIÈRES

Volume 35, No. 1, March/mars 2007

Paul GUSTAFSON: Editorial ...... 1
Report from the previous Editor ...... 5
Acknowledgement of referees' services ...... 7
Renato ASSUNÇÃO, Andréa TAVARES, Thais CORREA & Martin KULLDORFF: Space-time cluster identification in point processes ...... 9
Elvan CEYHAN, Carey E. PRIEBE & David J. MARCHETTE: A new family of random graphs for testing spatial segregation ...... 27
Aurélie LABBE & Mary E. THOMPSON: Multiple testing using the posterior probabilities of directional alternatives, with application to genomic studies ...... 51
Haolan LU, James S. HODGES & Bradley P. CARLIN: Measuring the complexity of generalized linear hierarchical models ...... 69
Aurore DELAIGLE: Nonparametric density estimation from data with a mixture of Berkson and classical errors ...... 89
Ozlem ILK & Michael J. DANIELS: Marginalized transition random effect models for multivariate longitudinal binary data ...... 105
Richard A. LOCKHART, John J. SPINELLI & Michael A. STEPHENS: Cramér-von Mises statistics for discrete distributions with unknown parameters ...... 125
P.W. FONG, W.K. LI, C.W. YAU & C.S. WONG: On a mixture vector autoregressive model ...... 135
Jung Wook PARK, Marc G. GENTON & Sujit K. GHOSH: Censored time series analysis with autoregressive moving average models ...... 151
Abdessamad SAIDI: Consistent testing for non-correlation of two cointegrated ARMA time series ...... 169
Forthcoming papers/Articles à paraître ...... 189
Complete online access to The Canadian Journal of Statistics ...... 191

Volume 35, No. 2, June/juin 2007

Pierre DUCHESNE: On consistent testing for serial correlation in seasonal time series models ...... 193
Jaakko NEVALAINEN, Denis LAROCQUE & Hannu OJA: On the multivariate spatial median for clustered data ...... 215
Liqun WANG: A unified approach to estimation of nonlinear mixed effects and Berkson measurement error models ...... 233
Juan Carlos PARDO-FERNÁNDEZ, Ingrid VAN KEILEGOM & Wenceslao GONZÁLEZ-MANTEIGA: Goodness-of-fit tests for parametric models in censored regression ...... 249
Song Xi CHEN & Tzee-Ming HUANG: Nonparametric estimation of copula functions for dependence modelling ...... 265
Victor DE OLIVEIRA: Objective Bayesian analysis of spatial data with measurement error ...... 283
Gonzalo GARCÍA-DONATO & Dongchu SUN: Objective priors for hypothesis testing in one-way random effects models ...... 303
Hui LI, Guosheng YIN & Yong ZHOU: Local likelihood with time-varying additive hazards model ...... 321
Forthcoming papers/Articles à paraître ...... 338
Complete online access to The Canadian Journal of Statistics ...... 339

Volume 35, No. 3, September/septembre 2007

David F. ANDREWS Robust likelihood inference for public policy...... 341

Zeny Z. FENG, Jiahua CHEN, Mary E. THOMPSON Asymptotic properties of likelihood ratio test statistics in affected­sib­pair analysis ...... 351

Hwashin H. SHIN, Glen TAKAHARA & Duncan J. MURDOCH Optimal designs for calibration of orientations...... 365

Jiancheng JIANG, Haibo ZHOU, Xuejun JIANG & Jianan PENG Generalized likelihood ratio tests for the structure of semiparametric additive models...... 381

Runze LI & Lei NIE A new estimation procedure for a partially nonlinear model via a mixed­effects approach...... 399

Axel MUNK, Matthias MIELKE, Guido SKIPKA & Gudrun FREITAG Testing noninferiority in three­armed clinical trials based on likelihood ratio statistics ...... 413

Inkyung JUNG & Martin KULLDORFF Theoretical properties of tests for spatial clustering of count data...... 433

QiQi LU & Robert B. LUND Simple linear regression with multiple level shifts...... 447

Forthcoming papers/Articles à paraître...... 459

Volume 36 (2008): Subscription rates/Frais d’abonnement ...... 460