Samples and Censuses. Author(s): Leslie Kish. Source: International Statistical Review / Revue Internationale de Statistique, Vol. 47, No. 2 (Aug., 1979), pp. 99-109. Published by: International Statistical Institute (ISI). Stable URL: http://www.jstor.org/stable/1402563

International Statistical Review, 47 (1979) 99-109. Longman Group Limited/Printed in Great Britain

Samples and Censuses¹

Leslie Kish The University of Michigan, Ann Arbor, Michigan 48106, U.S.A.

Summary

Two related topics receive brief but comprehensive reviews, for guiding decisions about three sources for collecting data. First, the relative advantages of samples, censuses, and registers are compared along eight criteria: cost, detail, timeliness, relevance, etc. Second, 15 methods are indicated for using samples in connection with censuses; they are sorted into five kinds of purposes: as substitutes for, or as aids to, censuses; from census tapes; censuses as auxiliary data for sampling. Finally, current and future paths are indicated for combining the strengths of the three sources, in order to obtain accurate estimates which are both timely and detailed for local areas and small domains.

The Editor draws attention to the timeliness of this contribution in connection with the design and execution of the censuses to be taken in 1980/81 and the many kinds of involved.

1 Introduction

This brief review paper has a double purpose: first to compare the relative advantages and disadvantages of samples and censuses; and then to indicate the variety of ways for using samples to good advantage in connection with censuses. Thus we note the ways first of competition, then of collaboration between the two sources of data. These two topics are distinct yet closely related, so that a joint treatment should benefit each topic.

The paper is aimed at readers who are acquainted both with sampling methods and with the conduct of complete censuses. Concise reminders of the possibilities and problems of both samples and censuses are listed and noted. Then the many opportunities offered by their combined uses are reviewed. Those possibilities, problems, and opportunities are too numerous to be spelled out in detail, or even to be intelligibly and correctly defined. Instead I merely present a reasonably comprehensive view of the diverse aspects.

First, a list of eight criteria is presented for assessing the relative advantages and problems of samples and of censuses; administrative registers are also compared with them. Then a list of fifteen methods is developed to exhibit the diverse uses of sampling in combination with censuses. Though brief, these two lists should remind readers of methods they read or thought about, or perhaps used. The lists aim to stimulate the reader to think and investigate further; they do not try to supply ready answers for diverse situations. I would compare these lists to the checklists airplane pilots must complete before they take off.

I especially hope that these lists will be useful for planning samples in connection with the censuses of 1980 and after, and especially in the vast majority of less developed countries. The remarks focus on censuses of population and housing, but most are also relevant to agricultural, industrial, and other censuses.

Instead of the details and depths of some treatments that these topics received elsewhere (see references), I am aiming here at comprehensiveness. I hope and expect that the readers will supply the technical details and the necessary parameters (cost, variances, requirements) for their own specific situations.

¹ I am grateful for remarks of F. Azorin, J. Desabie, and E. Szabady, also for stimulation from statisticians of 15 Asian countries at the Sampling Working Group at the East-West Center, Honolulu, 24 January - 16 February 1979.

2 Comparisons of samples with censuses

The list of eight criteria I use here grew through discussions, from introspection, and from recollecting experiences without recording their sources. The eight criteria grew from only three, and the number could be expanded if a longer list would be more useful. The scopes of some of the criteria are expanded with several closely related meanings appearing on a single line.

Table 1 Eight criteria for comparing three sources of data

Criteria  Samples  Census  Adm. Registers
Rich, Complex, Diverse, Flexible  * *
Accurate, Relevant, Pertinent  * ?
Inexpensive  * * * *
Timely, opportune, seasonal  * * *
Precise (large and complete)  * *
Detailed  ** *
Inclusive (coverage), credible, P.R.  * ?
Population content  * * *

These eight criteria represent my compromise between brevity and completeness. They seem to me to cover adequately the needs of diverse situations. There are no criteria for choosing criteria - here or elsewhere. But the reader must judge their adequacy for any situation, also how well they would be met by the three sources of data.

This list aims to overcome the common practice of choosing one data source, then justifying it with a single criterion. One director may say: 'We use sampling because it is much cheaper than a census would be'; and another in a similar situation counters, 'We prefer a census because it yields abundant geographic detail'. Both statements are correct but incomplete: all criteria should be considered jointly for a reasoned decision.

Checks (*) are used under each source to indicate its relative advantage according to each of the eight criteria. The relative advantages of samples and censuses are complementary: one is stronger where the other is weaker. These are my rough judgments for most situations, and the user can change them to suit specific situations. Yet the list may be useful and stimulating even when my checks seem arbitrary, and obviously subject to exceptions; for example, there are many inaccurate sample surveys, also censuses with poor coverage.

Sample surveys can be designed to obtain wide varieties of complex data, rich and deep in content. They can be tailored flexibly to fit a variety of needs and methods of collection. Such data are not gathered with complete censuses; attempts to do so would result in very high costs and also in low quality. Even less should one try to stretch the scope of administrative registers to those tasks. The observation procedures in sample surveys can be directed to obtain data which are relevant and pertinent to research and for decisions, and which are reasonably accurate for defined aims in many situations.

That small samples can be made inexpensive is well appreciated. But we should also stress that samples are and can be much more timely, and repeated more often - yearly, monthly, even weekly, if planned. For rapidly changing or fluctuating variables this can more than make up for their small sizes; I note epidemics as one example and seasonal crops as another. Flexibility in timing, in spacing, and in methods can be had in samples. They can meet the needs for timing in seasonal activities in crops, employment, etc. that censuses cannot.

Because they are large and complete, censuses are precise and detailed. Precision refers to the inverse of sampling variability; but censuses may be more inaccurate than samples due to errors of measurement. Censuses give data in great detail for small domains and especially for local areas, which samples fail to provide; and this is probably their principal continuing utility. But even here we sound several cautions. First, detail and precision are both lost over time, especially for unstable variables and populations (Waksberg, 1968). (Consider epidemics, industrial output, and fertility in these times of changing birth rates!) Second, precision for small local areas is subject to random fluctuations in time, and to relatively large correlated errors of enumerators (Hansen, Hurwitz, and Bershad, 1961).

Censuses seem often (though not always, nor necessarily) to obtain better coverage than sample surveys. Thus they tend to be more inclusive in population extent than sample surveys. This is partly because it is less difficult to check complete than sample coverage, but mostly because of the credibility aroused by the public relations campaigns for censuses. These can greatly improve censuses; though the propaganda can prejudice some attitude surveys - and even some census counts. But content - as distinct from the extent - of the survey population can be controlled and directed toward aims better in specialized surveys than in complete censuses, which must have more general aims. The definition of the population can be made to suit the survey's aims, but this flexibility may be prohibited by the public aspects of the census.

Its very high cost (roughly one hour's pay per capita) is the great disadvantage of complete censuses; also a reason for not doing them more often, or with greater depth and richness of data. Even if money were available, human and technical resources would be difficult to marshal for the sporadic efforts; even current censuses face formidable obstacles. Furthermore, beyond financial costs and strains on resources, we should consider also the social costs of the burdens on the time and attention of the respondents. The burden of public relation campaigns should be noted; but the campaigns may also contribute to national social cohesion, and yield benefits in self-assessment. Yet this may perhaps be done better if not repeated too often.

Another advantage of samples - overlooked but worth attention - is the confidentiality imparted by random selection. With the identification of respondents kept confidential, the source of individual responses is hidden in the published results. Confidentiality requirements are posing obstacles and limits to statistics for local areas and small domains. For these reasons deliberate random errors are being proposed - and these will tend to undermine the principal arguments for complete censuses.

The contrasts of costs and of other features are greater for large populations, because desired sample sizes tend to be fixed rather than proportional to population sizes. Thus a sample of 10 000 dwellings is only 1 : 10 000 in a country of 100 000 000 dwellings, but it would be a sample of 1 : 10 from 100 000. Thus the contrasts of sample/census in costs and in detail have different dimensions in different situations; and those also affect the other criteria.
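
The arithmetic of the last paragraph can be sketched as follows. This is only an illustration, assuming simple random sampling of dwellings without replacement and a hypothetical proportion of 0.5; the function names are invented for the example, and real census samples are seldom this simple. It suggests that a fixed sample of 10 000 has nearly the same standard error in both populations, although the sampling fractions differ a thousandfold.

```python
import math


def sampling_fraction(n, N):
    """Fraction of the population that falls in the sample."""
    return n / N


def se_proportion(p, n, N):
    """Standard error of a proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = (N - n) / (N - 1)
    return math.sqrt(fpc * p * (1 - p) / n)


n = 10_000  # fixed sample size, as in the text
for N in (100_000_000, 100_000):
    f = sampling_fraction(n, N)
    se = se_proportion(0.5, n, N)  # p = 0.5 is a hypothetical value
    print(f"N = {N:>11,}   fraction = 1:{round(1 / f):,}   se = {se:.5f}")
```

The finite population correction changes the result noticeably only when the sample is a sizable part of the population, which is one way to see why desired sample sizes tend to be fixed rather than proportional to population sizes.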

3 Administrative registers

Registers exist for a great variety of purposes in a bewildering variety of quality. Vital registers in Northern European countries (and few other places) have excellent coverage of basic population data, but not in most countries. There are also registers for voting, social security (welfare), medical care; lists for special populations in education, fertility clinics, hospitals. Also lists for utilities like electricity; telephones cover a large portion of dwellings in some places, but much less in others. And so on.

Registers can often be competitive sources to samples and censuses. Financing comes from their principal functions, and statistical uses are only auxiliary. Information may exist in files, on cards; or on tapes and discs more recently, which can make their use especially cheap - so cheap that sampling may not be needed. Of course, society pays for the true costs somehow, but with other aims in view. Changing the techniques of collection to provide good (complete, accurate, diverse) statistics would entail great additional expenses.

Data from registers, like censuses, are detailed and precise. However, their accuracy varies greatly: electric bills and reported wages may be more accurate than interviews; but birth and death registers are known to be woefully inaccurate in many countries. The data are often timely and up-to-date. The population coverage may be good or poor, and 'compulsory' reporting does not mean good coverage. Think of school records and of the history of establishing compulsory education in many countries. Many registers (like utilities) may have high coverage or low, and their contents may also differ greatly from the populations one needs.

Primarily, data from registers differ from other sources in being far from rich, but very inexpensive. That leads to an attractive prospect: to utilize registers, especially registers with good coverage, to obtain richer data. Registers are financed to serve the needs of individuals (like medical and utility lists) or of society (like taxes and military service); it seems attractive to obtain statistical data for 'little' extra cost. Without those registers we would not have historical , because budgets just for statistical data are not old. However, we should be more careful with our enthusiasm: how good and how cheap are the data? And these two qualities may well conflict. Obtaining and keeping data involves administrative costs, plus social burdens on the respondents. The richness of the data should be severely limited, because their quality would suffer from being a fringe activity of clerks and respondents, whose interests are elsewhere.

I believe we should separate personal data from statistical data. Personal data are needed for the individual's own use; should be confidential; need individual identification; and can have individualized form and content. In full contrast, statistical data are needed for population aggregates and averages; for public use; should be anonymous; and must be standardized in form and content.

Because of those conflicts I believe that sampling should be used to collect the statistical data which are not needed for personal data on registers. Sampling should get better and richer data from specially trained personnel, and cheaper data because of great reductions in size of the effort.

4 Samples connected with censuses

Methods of survey sampling have a large literature that needs no summary here. The methods are oriented to designs for distinct and separate surveys, but those methods have general applicability. Yet the applications of those methods to samples connected with censuses deserve some special attention here; these samples differ from surveys because of their double roots. They share methods, techniques, and theory with the survey samples. But their connection with censuses gives them both special functions and special advantages in funds and in resources. Hence they often have large sizes; especially in class I, as noted below. But they also share with censuses some inflexibilities in timing, also in the contents of their populations, and especially in the restrictions on data that official censuses can afford to collect without jeopardizing the wide cooperation they need and get.

The list of available methods should be inspected for their possible utilization and practical utility. If needed, any of the methods may be placed on one of four levels of availability for any specific situation, referring either to actual or to potential use: (1) successfully used; (2) used but not successfully or adequately; (3) not used though available; and (4) resources not now available. Needed methods at levels 2, 3, or 4 should be examined for possible transfer to level 1. But methods not needed badly enough currently belong elsewhere. On the other hand, methods which would be needed, but are not now available, should deserve special considerations from appropriate technicians.

Most of the methods listed here have appeared in the literature on samples connected with censuses. Those methods are described in some detail, with justifications and procedures, in the references; especially in Gurney and Manno (1976). Our treatment is much briefer, but our list is more complete than those in the references. These references are for the reader's convenience, and are far from complete. A documented search of the literature would be too difficult, and it would undoubtedly run into conflicts of priorities, and into obscure and difficult descriptions. Furthermore, a reported success (or failure) in any situation may be due to special and unknown circumstances, resources - or even to mistaken perceptions; the history of one office may not be the best guide to the future of another.

Table 2 Samples connected with censuses

1. Sample enumerations to supplement complete censuses:
   (a) Obtain richer, more diverse, detailed, deeper data
   (b) Reduce costs of collection and of tabulation
   (c) Reduce aggregate social burden on respondents
2. Samples added to complete censuses to evaluate and to improve them:
   (a) Evaluation studies of content (Post Evaluation Studies)
   (b) Coverage checks; dual coverage
   (c) Pilot studies of questions and techniques before the census
   (d) Quality control of individual enumerators, coders, processors
3. Samples from census records, microfilms, tapes:
   (a) Early (advanced, preliminary) tabulation and release
   (b) Complex, multivariate analyses of relations
   (c) Public use tapes for further, deeper analyses, without identification
4. Census as auxiliary data for samples:
   (a) Data for selections: measures of size, stratifiers, maps of enumeration areas; seldom addresses or names
   (b) Data for improved estimation with ratios, regressions
   (c) Samples added to censuses to serve as bases for continuing surveys
5. Joint uses of several sources:
   (a) Current estimates for local areas and small domains
   (b) Rotating (moving) monthly samples of 1/120 (or weekly 1/520)

There has been great recent growth in the use of samples connected with censuses, and in 1980 probably most countries will use one or more of the methods listed in Table 2. Their uses seem to be very uneven, more uneven probably than dictated by genuine differences in objective situations. There seems to be a great deal of arbitrariness in what methods have been used and where, depending on the choices of statisticians and on the decisions of ministers.

The presentation aims to help future choices and decisions. I hope especially that it can stimulate the combined use of several methods, where feasible. The list (Table 2) is meant to be complete, and the descriptions are brief but, I hope, adequate as reminders for readers who are acquainted with the methods, and can look up details elsewhere.

My five classes for 15 methods are somewhat arbitrary but useful, I hope. The classification is by different purposes, and notes will be made later about different procedures which may be used. The timing of the samples may be before, during, or after the census. The sample schedule may be added to the census schedule or be separate from it; and they may be independent from or be dependent on the census schedule. The sampling units may be households, enumeration areas (EA's), or administrative units. Some of the possible combinations of procedures and purposes are better than others.
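
Item 5(b) of Table 2 can be illustrated with a little arithmetic. The sketch below assumes, as an interpretation not spelled out in the table, that the rotating samples do not overlap, so that they cumulate over time; the function name is hypothetical. Under that assumption monthly samples of 1/120 or weekly samples of 1/520 each cumulate to 1/10 of the population in a year, and to the whole population in ten years.

```python
from fractions import Fraction


def years_to_cover(fraction_per_period, periods_per_year):
    """Years needed for non-overlapping periodic samples
    to cumulate to the whole population."""
    periods_needed = 1 / fraction_per_period
    return periods_needed / periods_per_year


monthly = Fraction(1, 120)
weekly = Fraction(1, 520)

print("years, monthly plan:", years_to_cover(monthly, 12))
print("years, weekly plan: ", years_to_cover(weekly, 52))
print("cumulated in one year:", monthly * 12, "and", weekly * 52)
```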

4.1 Sample enumerations as portions of and substitutes for complete censuses

On complete censuses each question is expensive, because it is multiplied by the sizes of the populations (of households, persons, farms, etc.). Hence, censuses are kept brief and simple, sometimes with precoded or easily coded items, to save costs of collection and tabulation. And - we add again - to reduce the respondent burden on society. However, ever more diverse data are being obtained with samples which are portions of the entire census.

These samples are substitutes for complete censuses; hence, they tend to be large, ranging from perhaps 1 : 100 to 1 : 4 of the complete census. These are very large compared to the usual survey samples, yet they can achieve most of the savings that sampling from censuses can yield.

It may be both rational and feasible to use two or more samples for diverse purposes. A widely spread and large (1 : 4 to 1 : 10) sample may be desired for items needed in great detail and for local areas, and for items which are not too difficult to obtain. On the other hand, smaller (1 : 100 to 1 : 20) and more concentrated samples may suffice for difficult data, for national statistics and for large domains; special enumerators may be used for those difficult data.

When several samples are used to get different data, there is a conflict concerning spreading them over many different units versus concentrating them in all the same units. Spreading the schedules avoids the concentration of respondent burden. But concentrating them reduces the total cost, and yields more information in the relations between the sets of variables. Thus, for example, one may design, as addition to a complete core census, one sample of 10 per cent, denoted by (AB), which combines both the (A) easier, and the (B) more difficult items. Alternatively the two samples may be separated into (A) a 10 per cent sample of easier items, and (B) a 2 per cent sample of difficult items. But we may prefer (AB) a 2 per cent combined sample plus (A) an 8 per cent sample of easier items. Furthermore, this last (A) may be an 8 per cent selection of households in all EA's done by the regular enumerators, with the (AB) added as a 2 per cent sample of EA's by special enumerators.

Reasonable sample plans should be influenced by three important factors: the number of visits to the household (or other respondent units); the kinds of enumerators used for the complete census, and for the sample; and the nature and size of the EA's.

The number of visits to the household is determined by three census tasks: (a) prelisting of all households within the EA's, (b) plus the complete census of core questions, (c) plus the sample enumerations. The three tasks may be done in one visit (abc), to save enumerators' travel and respondent burden. Or the prelisting (a) may be separated from a combined visit (bc) for the complete and sample censuses. The sample schedule may incorporate and replace the census schedule in plans (bc) and (abc), or the two may be given separately on the same visits. Or prelisting plus census visit (ab) may be separated from sample visit (c) and schedules. Or all three visits may be separate (a)(b)(c). Each plan has been proposed somewhere.

Methods and units of selection may also vary for diverse situations. Selecting individual persons is too bothersome; and it is not worthwhile, because individuals of the household tend to enter different cells in the tabulations by age, sex, etc. But households may be selected separately to spread the sample widely in all (or many) areas. Selecting households and using a combined schedule on single visits would involve all enumerators, hence may not be desirable for more difficult items. Selection of households also poses problems, and it is best done after prelisting households, which may involve additional costs. However, if the sample questionnaires are given after the census, special enumerators may be sent to the sample of households. However, the travel of specially trained enumerators may be reduced and single visits to sample households obtained, when enumeration areas (EA's) are used for selecting the sample. This procedure concentrates the sample and gives less detail (greater fluctuation) for local areas. But this fluctuation is less important where the EA's are smaller. It is also less important for many statistics which are less needed for local areas. If selections of entire EA's can be tolerated they are simpler to administer, because selection of households can cause problems and biases. It is less desirable to use larger units, administrative districts, for sample enumerations.

Diverse possibilities on several factors could lead to a great variety of possible designs; but considering connections between the factors, five designs may seem most salient, each with its own advantages and its own faults.

1. Households selected in all EA's on a single visit cut down the traveling cost and respondent burden, and spread the sample into all EA's. However, selection biases seem inevitable and often bad. Furthermore, by precluding the use of special enumerators, the observational biases of the regular enumerators also seem inevitable.
2. Households selected in all EA's after prelisting can be made, with care, to avoid (reduce) selection biases, and also to spread the sample. Expenses are increased by the special sample visit, especially if special enumerators are sent to all the EA's.
3. Selection of complete EA's can avoid the extra expenses, eliminate selection biases, and permit effective utilization of special interviewers. I also like it for its simplicity. Of course it increases the variances of statistics, because of clustering both of characteristics and of interviewer effects. This increase of the variance must be faced realistically. (a) It may be important in statistics for small geographical domains based on a few EA's. (b) For national statistics and for major provinces it may be less important than a reduced bias. (c) For statistics of most domains (like age, occupation, education), which are not geographic and cut across EA's, the increase in the variance is less. (d) Correction with ratio estimates to the full sample can become more effective here than in household selection, and can greatly reduce the clustering effect.
4. Two-stage selections of EA's then of households may bring effective compromises between (2) and (3). But it may be too complex for some situations.
5. It may be effective to use different methods in different areas. Especially, in urban areas (2) may not be too expensive, and (3) reserved for rural areas where homogeneity within EA's may be less.
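
The increase of variance under design (3) can be gauged with the common approximation deff = 1 + (b - 1) rho, where b is the average number of sample households per EA that fall in the domain of interest and rho is the intraclass correlation within EA's. The formula is not given in this paper, and the values of b and rho below are hypothetical; the sketch only illustrates points (a) to (c) above.

```python
def design_effect(b, rho):
    """Approximate design effect of cluster sampling: 1 + (b - 1) * rho."""
    return 1.0 + (b - 1.0) * rho


def effective_sample_size(n, b, rho):
    """Size of a simple random sample with about the same variance."""
    return n / design_effect(b, rho)


rho = 0.05  # hypothetical intraclass correlation within EA's

# Complete EA's of about 100 households, statistic for the whole EA
print("complete EA's, all households:", design_effect(100, rho))

# A domain that cuts across EA's, about 10 of its members per EA
print("cross-cutting domain:         ", design_effect(10, rho))

# Households spread over all EA's, roughly one per cluster
print("spread households:            ", design_effect(1, rho))

print("effective size of 10 000 in complete EA's:",
      round(effective_sample_size(10_000, 100, rho)))
```

Because b is smaller for domains that cut across EA's, the same rho gives a much smaller design effect for them, which agrees with point (c).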

Finally three technical points about the conflict of variance versus bias in selecting EA's versus households. First, that ratio adjustment to the complete census may reduce much of the selection bias of method (1), or the higher variance of method (3). Second, that the conflict is greater with small than with large sampling fractions. Third, that the clustering of method (3) makes the computation of variances more important. The 'design effect' of clustering may be large for some global national and local statistics; it will be much less for statistics based on small domains that cut across the EA's. On the other hand, 'design effects' are bound to be small, probably negligible, in most situations where households were selected with method (1) or (2).

We may note here the double function of enumeration areas for censuses; they partition the area of the population with clear, stable, and identifiable boundaries to facilitate its complete and unique coverage; and they also attempt to create equal and feasible work loads. These two aims often conflict and need to be harmonized and compromised. In addition, EA's are also used for sample surveys as well as for census samples, for both of which smaller units are preferred. The compact populations of EA's are more or less homogeneous; they also suffer from the homogeneity of correlated biases of enumerators. These factors need to be considered before using them for sampling units. Nevertheless they are convenient for samples connected with censuses. In separated sample surveys they are usually subsampled.

Much variation exists between censuses in the average size of EA's used. Smaller EA's are more efficient for complete (compact) sampling units. Smaller units can also be used to reduce the effects of correlated response variances of enumerators; this can be done either by restricting the workloads to single EA's or by spreading these among different local areas. Smaller EA's can also be used for flexibility in assigning unequal workloads to enumerators. There is no need to make each enumerator's workload exactly one EA; better boundaries and flexible workloads may be had by combining EA's into workloads. On the other hand, larger units permit choosing better boundaries. They may also be more convenient to administer uniformly. Finally, larger EA's are likely to have smaller coefficients of variation per EA but not per area, and probably larger variances per EA.

The size of EA's should also be related to the type of enumerators used, and to the kind of employment and the form of payment used. All of those can vary a great deal. They can be volunteers, or temporary employees, or a class of employees (teachers, civil servants) on temporary assignments. Basic needed qualifications can be varied according to need and to available labor supply. Those factors should affect the size of the EA's, the workloads, the period of the census, and also the feasible complexity and quality of the data collected. They should also influence the possible selection of special enumerators for the sample enumeration and for evaluation studies.

4.2 Samples added to censuses to evaluate and improve them

Whereas the portions above substitute for and resemble censuses, samples added to improve censuses differ more from both of them. Those samples are usually smaller, perhaps from 1 : 100 down to 1 : 1000, or even to 1 : 10 000.

Post-enumeration studies (PES) have been used to evaluate and check the quality of census enumerations, to estimate biases, and to measure response errors. In some versions the PES enumerators are given the census responses for their sample cases, then they use them to get the 'best' answers with more and better questions. In other versions, the PES interviewers are kept ignorant of census responses in order to get PES responses 'independent' of them. However, independence is not complete, because the respondents have not entirely forgotten. Reconciliation of the pairs of responses for a 'best' answer can come later.

Checks for completeness of coverage would usually follow the first version: the check enumerators would have a list of units in defined areas, and then try to find missed units. Checks for coverage independent of the census are less likely. On the other hand, sample studies using the techniques of 'dual coverage' (Marks, 1974) for estimating undercoverage are possible where lists of households (or other units) are available from some source. The procedures for and coverage from this source should be quite different from the census methods for the technique to be fairly effective.

Instead of a PES done after the census, a sample of high quality enumeration may be done simultaneously with the census. (Quality checks before the census seem unlikely.) A sample of EA's may be covered with better methods, better enumerators, longer , instead of the census methods used in the remainder of the country. The extra expense is less than with double coverage of sample areas, and the respondent burdens of double interviews are avoided also. The contrast of these check areas with areas covered by census methods yields estimates of the net bias. These estimates of the net differences from the sample/census comparisons have higher sampling errors than with double coverage. Also the method lacks estimates for gross errors which may be obtained from double coverage. But on the whole this is a simpler and cheaper method. The sample areas should be selected with careful matching (stratification) of control areas.

The sampling units for quality checks are more likely to be EA's or administrative districts than households, because households would be inconvenient. It may be possible to use for content and for coverage studies the same sample of EA's as for a sample enumeration; or the evaluation EA's may be a subsample of the enumeration EA's. However, it may be difficult to insist on good samples of the entire country for quality checks. It may be necessary to make inferences from restricted areas of the population, in the expectation that the results found there hold for the nation. Perhaps some 'experimental' design for representing diverse conditions can give adequate results. But such results seem convincing only if the sample/census differences are similar. If those differences vary we probably lack strong models for estimating average differences over the population.

For pilot studies it has been common practice to choose areas which are convenient and believed to yield a good test of questions and of techniques. This approach assumes that areas of difficulty are well known. It may be useful to supplant or supplement the subjective approach with a pretest based on a probability sample; this has been done in Hungary. This gives a 'Micro-census' in advance of the census; it may be investigated especially in difficult areas, perhaps, with increased sampling rates.

Evaluation surveys are designed to check the average quality of the census and of its major components. Quality control and correction of individual enumerators are different matters; they need specific treatment suited to actual field conditions, and to procedures of supervision. The quality control of editors and coders in the office is another specialized matter we may omit to treat here.
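
The 'dual coverage' check mentioned earlier in this section can be illustrated with the classical two-list estimate. This is a sketch under strong assumptions (the two sources are independent, and matching between them is free of error), not necessarily the procedure of the cited reference; the function names and the counts are hypothetical.

```python
def dual_system_estimate(n_census, n_list, n_matched):
    """Estimated true number of units from two independent sources:
    (count in census) * (count in list) / (count found in both)."""
    if n_matched <= 0:
        raise ValueError("at least one matched unit is needed")
    return n_census * n_list / n_matched


def census_coverage_rate(n_list, n_matched):
    """Share of the independent list that the census also found."""
    return n_matched / n_list


# Hypothetical counts of households in the sample areas
n_census = 900   # enumerated by the census
n_list = 800     # found on the independent list
n_matched = 720  # found by both sources

total = dual_system_estimate(n_census, n_list, n_matched)
coverage = census_coverage_rate(n_list, n_matched)
print("estimated households:", total)
print("estimated census coverage:", coverage)
print("estimated undercoverage:", round(1 - coverage, 3))
```

If the two sources tend to miss the same households, the independence assumption fails and the undercoverage is underestimated, which is one reason the procedures of the second source should be quite different from the census methods.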

4.3. Sampling from census records, microfilms, tapes
Where early tabulations and releases are wanted, it is convenient to base them on selections of entire EA's (or even administrative districts) in accord with the system of returns from the field collection. The selections should be predesignated and speeded along. They should represent good and valid samples; not merely the first arrivals which are bound to be a biased portion of the population. Perhaps methods should be prespecified for substituting for a few late arrivals. The need for these early tabulations is less for modern situations and procedures which are completely mechanized, such as prepunched variables.
Continuing advances both in statistical and in computing methods have made possible more complex analyses of census data. Demands exist for deeper multivariate analyses of relations. For some of these it is convenient to select samples from the entire census to reduce computations; though the need for sampling may be reduced with faster machines and better programs.
Public use tapes are also prepared from census records for the use of researchers. Identifying data are removed from the tapes, and a random selection helps greatly to prevent identification. Samples of households are preferred for these uses; spreading the sample reduces the level of sampling errors, and it also facilitates the estimation of those errors by avoiding clustering. Households are easier to select than persons, and they provide samples of persons, families, and households. The clustering of individuals in households matters little in analyses which seldom group multiple members of the same households.
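The preparation of a public use sample can be sketched as follows, as a minimal illustration rather than a description of any bureau's procedure. It assumes that the census records are held as a list of household dictionaries, and that the identifying fields are called `name` and `address`; the record layout, the field names, and the function name are all hypothetical.

```python
import random


def public_use_sample(households, rate, id_fields=("name", "address"), seed=None):
    """Draw a simple random sample of household records and drop identifiers.

    households : list of dicts, one per household (hypothetical layout)
    rate       : sampling fraction, e.g. 0.01 for 1 in 100
    id_fields  : keys treated as identifying data and removed from the output
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    # At least one household is drawn unless the file is empty.
    n = min(len(households), max(1, round(len(households) * rate)))
    chosen = rng.sample(households, n)  # selection without replacement
    return [
        {key: value for key, value in record.items() if key not in id_fields}
        for record in chosen
    ]
```

Selecting whole households, as the text recommends, keeps the persons of each selected household together, so the same file can serve analyses of persons, families, and households.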

4.4 Censuses as auxiliary data for samples
When comparing the advantages of sampling, we must also remember that good samples need and are based on census data. These are the chief sources in the selection process for measures of size for PPS sampling, and for stratifying variables; also for maps and other aids for sampling units, such as EA's. But addresses and names from censuses are seldom used, both because of confidentiality and because of their obsolescence. Census data can also be used to improve statistics, especially through ratio and regression estimates.
Some of the samples described in earlier sections could be used directly as bases for continuing surveys. This is especially true for samples of EA's, or larger units used for longer questionnaires or for quality checks. These could then have direct links with the census.
I end here with the suggestion that it may be useful to have a simple design that uses the same sample of EA's for pilot studies (IIc), sample enumeration (I), evaluation of content and coverage (IIa and IIb), early tabulation (IIIa), and for continuing surveys (IVc). This master sample can be augmented where needed, especially for the sample enumeration.
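The use of census counts as measures of size may be illustrated with a short sketch of systematic PPS selection of EA's. This is one common way of selecting with probability proportional to size, chosen here for brevity; it is not claimed to be the procedure of any particular survey. The function name, the EA labels, and the counts are hypothetical, and units larger than the selection interval are assumed to be handled separately as certainty selections.

```python
import random


def pps_systematic(sizes, n, seed=None):
    """Select n units with probability proportional to size (systematic PPS).

    sizes : dict mapping unit id -> census measure of size (positive number)
    Returns a list of (unit id, selection probability) pairs.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(sizes.values())
    interval = total / n
    if max(sizes.values()) > interval:
        raise ValueError("a unit exceeds the selection interval; "
                         "treat it as a certainty selection first")
    rng = random.Random(seed)
    start = rng.random() * interval  # random start within the first interval
    points = [start + k * interval for k in range(n)]
    selected, cumulative, k = [], 0, 0
    for unit, size in sizes.items():
        cumulative += size
        # No unit exceeds the interval, so it can contain at most one point.
        if k < n and points[k] < cumulative:
            selected.append((unit, n * size / total))
            k += 1
    return selected
```

For example, with census counts `{"EA1": 120, "EA2": 80, "EA3": 200, "EA4": 100, "EA5": 150, "EA6": 150}` and `n = 2`, each EA has selection probability 2 x size / 800, so that EA3 is selected with probability 0.5. The returned probabilities are what an estimator would use for weighting.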

5 Joint uses of several sources
Census data are usually obsolete, data from registers inadequate, and sample data lack detail, especially for local areas. Since the strengths and weaknesses of the three sources are complementary, it seems reasonable to try to combine the strengths of the three sources to obtain estimates for small domains, especially for local areas; estimates which are current, pertinent, and accurate. To the general needs of researchers have been added the needs of social planners, of administrators, and of policy makers for valid, current data for small domains and local areas.
Local area estimation has become a fast developing field, being pushed by increasing demands, as well as being simultaneously pulled along by new developments in computing technology and by new statistical techniques. Demographers have combined census data with registers of births, deaths, auto and school registrations, etc. There are 'component methods' (U.S. Bureau of the Census, 1966), and 'ratio-correlation methods'. Census and sample data may be combined in 'synthetic estimates' (Gonzalez and Hoza, 1978), and all three sources with a regression method (Ericksen, 1973, 1974). A new review of all these and other methods with extensive bibliography is by Purcell and Kish (1979). There appears to be no uniquely best procedure; results depend not only on the needed statistics, but especially on the qualities of the three sources; and those qualities vary widely from situation to situation.
I also believe that we may have future designs to obtain the detailed data of censuses from rotating samples. For example, a rotating monthly sample of 1:120 can cover the nation in 10 years. If needed to measure monthly changes you can have samples of 1:60 with 50 per cent overlaps. The collection period may be spread over the entire month, or be confined into representative weeks. The problems of local areas may also be approached from currently available samples.
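One way of approaching a local area from an existing sample is the 'synthetic estimate' cited above. A minimal sketch follows, under the usual simplifying assumption that rates within each population group are the same in the local area as in the nation: national sample means by group are weighted by the census counts of those groups in the local area. The function name, group labels, and numbers are hypothetical, and the cited authors' actual procedures may differ in detail.

```python
def synthetic_estimate(area_counts, group_means):
    """Synthetic estimate of a local-area mean or rate.

    area_counts : dict group -> census count of that group in the local area
    group_means : dict group -> national sample mean (or rate) for the group
    Assumption: each group's mean is the same locally as nationally.
    """
    total = sum(area_counts.values())
    if total <= 0:
        raise ValueError("the area has no census population")
    missing = set(area_counts) - set(group_means)
    if missing:
        raise KeyError("no sample mean for groups: %s" % sorted(missing))
    weighted = sum(count * group_means[g] for g, count in area_counts.items())
    return weighted / total
```

For a hypothetical area with census counts of 300, 500 and 200 in three age groups, and national sample rates of 0.12, 0.05 and 0.03, the estimate is (36 + 25 + 6) / 1000 = 0.067. The estimate is only as good as the assumption of common group rates and the currency of the census counts.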
We must distinguish the periods of collection, reference, and reporting. For example, suppose the collection period is the first week of each month, and the reference period is the previous month. A monthly sample of 1:400 would yield yearly samples of 12:400 = 3:100 without overlaps, or 1.5:100 with overlaps of 1/2. When properly designed, such samples could yield useful data annually or biennially for small domains and local areas. Their precision could be further improved with the methods noted above.
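The arithmetic of cumulating rotating samples can be checked with a few lines of Python. The sketch counts only the new units that enter the sample each month, which is how the figures in the text appear to be reckoned; the function name and its parameters are hypothetical.

```python
from fractions import Fraction


def cumulated_fraction(monthly_fraction, months, overlap=Fraction(0)):
    """Sampling fraction of distinct new units cumulated over several months.

    monthly_fraction : fraction sampled each month, e.g. Fraction(1, 400)
    months           : number of monthly samples cumulated
    overlap          : share of each monthly sample retained from the month before
    """
    new_each_month = Fraction(monthly_fraction) * (1 - Fraction(overlap))
    return min(new_each_month * months, Fraction(1))  # cannot exceed the whole population


# Figures from the text:
yearly_no_overlap = cumulated_fraction(Fraction(1, 400), 12)                 # 3/100
yearly_half_overlap = cumulated_fraction(Fraction(1, 400), 12, Fraction(1, 2))  # 3/200 = 1.5/100
ten_years = cumulated_fraction(Fraction(1, 120), 120)                        # 1, the whole nation
ten_years_overlap = cumulated_fraction(Fraction(1, 60), 120, Fraction(1, 2))    # also 1
```

Exact fractions are used so that 12:400 reduces to 3:100 without rounding error.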

References
Des Raj (1972). The Design of Sample Surveys. New York: McGraw-Hill. Sections 1.8-1.9, 19.1-19.14.
Ericksen, E.P. (1973). A method for combining sample survey data and symptomatic indicators to obtain population estimates for local areas. Demography, 10, 137-160.
Ericksen, E.P. (1974). A regression method for estimating population changes in local areas. Journal of the American Statistical Association, 69, 867-875.
FAO (1969). Report on the 1960 World Census of Agriculture: Methodology. Rome: FAO. Chapter IV, 69-104.
Gil, B. and Ghansah, D.K. (1968). Design and concepts of population censuses and sample surveys in a developing country - Ghana. In Caldwell, J.C. and Okonjo, C., eds, The Population of Tropical Africa. London: Longmans.
Gonzalez, M.E. and Hoza, C. (1978). Small area estimation with application to unemployment and housing estimators. Journal of the American Statistical Association, 73, 7-15.
Gurney, M. and Manno, C.S. (1971). New Florencia: A Case Study for the 1970 Censuses of Population and Housing, Unit IV, Sampling. Washington: U.S. Bureau of the Census, Series ISP3, No. le.
Hansen, M.H., Hurwitz, W.N. and Bershad, M.A. (1961). Measurement errors in censuses and surveys. Bulletin of the International Statistical Institute, 38/2, 359-374.
Hill, K. (1977). Census data required for indirect methods of estimating demographic parameters. Population Bulletin of the United Nations Economic Commission for Western Asia, No. 13, 56-66.
Kish, L. (1965). Survey Sampling. New York: John Wiley. Sections 1.3, 13.1-13.6.
Marks, E.S., Seltzer, W. and Krotki, K.J. (1974). Population Growth Estimation. New York: Population Council.
Purcell, N.J. and Kish, L. (1979). Estimation for small domains. Biometrics, 35, June.
United Nations Statistical Office (1969). Principles and Recommendations for the 1970 Population Censuses, ST/STAT/Series M/44, pp. 10.13.
United Nations Statistical Office (1978). Draft Principles and Recommendations for Population and Housing Censuses, E/CN.3/515.
United Nations Statistical Office (1979). Recommendations for the 1980 Population and Housing Censuses, forthcoming.
United States Bureau of the Census (1960). The accuracy of census statistics with or without sampling. Technical Paper No. 2.
United States Bureau of the Census (1966). Methods of population estimation: Part I, illustrative procedure of the Census Bureau's Component Method II. Current Population Reports, Series P-25, No. 339.
Waksberg, J. (1968). The role of sampling in population censuses: its effect on timeliness and accuracy. Demography, 3, 362-373.
Waksberg, J. and Hanson, R.H. (1965). Sampling Applications in Censuses of Population and Housing. Washington: U.S. Census Bureau, Technical Paper No. 13.
Yates, F. (1953). Sampling Methods for Censuses and Surveys. 2nd ed. London: Chas. Griffin & Co. Sections 1.2-1.4.
Zarkovich, S.S. (1965). Sampling Methods and Censuses. Rome: FAO. Chapters 1-6.

Résumé

Two related topics are reviewed briefly but comprehensively, to guide decisions about three sources for collecting data. First, the respective advantages of samples, censuses, and registers are compared along eight criteria: cost, detail, timeliness, relevance, etc. Second, fifteen methods are indicated for using samples in connection with censuses, sorted by five purposes: as substitutes for, or as aids to, censuses; sampling from census tapes; censuses as sources of information for sampling. Finally, current and future paths for combining the strengths of the three sources are indicated, in order to obtain estimates that are at once accurate, up to date, and detailed for small regions and small domains of study.