The Household Survey In The German Census 2011

Wolf Bihler, Dr. Andreas Berg

NTTS Brussels, 22 February 2011

Objectives

Two objectives:
- Estimation of over- and undercoverage of the German population registers and thus, indirectly, of the official number of people living in every municipality with more than 10,000 inhabitants
  - Statistical correction of population register data: population = people in registers - overcoverage + undercoverage
  - Results for demographic variables will be enumerated from the corrected population registers
- Estimation of variables not available in the registers (for instance employment, migration)
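The correction rule above is plain arithmetic. A minimal Python sketch with made-up figures for a hypothetical municipality; the function name is ours, not taken from the census software.

```python
def corrected_population(registered, overcoverage, undercoverage):
    """Population = people in registers - overcoverage + undercoverage."""
    return registered - overcoverage + undercoverage

# Hypothetical municipality: 25,000 people registered, 900 of them not actually
# living there (overcoverage), 400 residents missing from the register (undercoverage).
print(corrected_population(25_000, 900, 400))  # 24500
```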

Authors: Wolf Bihler, Andreas Berg, Destatis, February 2011

Sampling frame

- Address and building register (AGR) as of 1 September 2010
- The AGR contains data from the municipalities' population registers, geo-referenced data and data from the German Federal Employment Agency
- Sampling unit: address
- Addresses with a non-zero probability of being drawn: only residential addresses, excluding sensitive addresses (addresses relevant for sampling)

Sampling frame

Addresses relevant for sampling:
- Non-sensitive special addresses
- Addresses from the population registers, if positively tested as residential addresses
- Addresses found both in the geo-referenced data and in sources related to the German Federal Employment Agency
- Addresses found in only one of the latter sources, but positively tested as residential addresses
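The four rules can be read as a single eligibility test. The sketch below assumes hypothetical boolean flags per address; the actual AGR record layout and the residence-address test are not described in the slides, so this only illustrates the logic.

```python
from dataclasses import dataclass

@dataclass
class Address:
    # Hypothetical flags; not the AGR's actual fields.
    special: bool = False
    sensitive: bool = False
    in_population_register: bool = False
    in_geo_data: bool = False
    in_employment_agency_data: bool = False
    tested_residential: bool = False  # passed the residence-address test

def relevant_for_sampling(a: Address) -> bool:
    if a.special:
        # Special addresses enter the frame only if they are non-sensitive.
        return not a.sensitive
    return (
        # population register address confirmed as residential
        (a.in_population_register and a.tested_residential)
        # found in both the geo-referenced and the Employment Agency data
        or (a.in_geo_data and a.in_employment_agency_data)
        # found in only one of those two sources, but confirmed as residential
        or ((a.in_geo_data != a.in_employment_agency_data) and a.tested_residential)
    )
```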

Sampling frame

Figure: composition of the sampling frame. AGR: 27.7 million addresses; special addresses: 55,900 (sensitive: 36,300; non-sensitive: 19,500); population register addresses: 19.1 million; non-population-register addresses: 859,000; sampling frame: 19.9 million addresses.

Requirements for the sampling design

- Sampling unit: address; all individuals living at a drawn address are surveyed
- Only in this way can over- and undercoverage be identified by comparison with the persons recorded in the population registers
- Enumeration unit: the individual
- Sample size (people living at first or secondary residences): 7.853 million (9.6% of the official number of inhabitants as of 31 December 2009)

Requirements for the sampling design

- Precision requirements and regional units laid down by law have to be taken into account for the results
- Research project managed by the University of Trier and GESIS Mannheim: recommendations for the sampling method

Stratification

- Classical method with well-known theory, carried out in two stages
- Stratification variables:
  - First stage -> regional stratification
  - Second stage -> address size classes
- Allocation based on the official number of inhabitants of Germany as of 31 December 2009

Special addresses vs. non-special addresses

- Treated separately because of the different concepts

- Relatively small number of non-sensitive special addresses

Stratification, first stage

- Example: NUTS 2, NUTS 3

- Recommendation of the sampling project for the German Census 2011, demands from Rheinland-Pfalz and from large cities, precision restrictions

Precision restrictions

- Regarding the total number of inhabitants: for municipalities with more than 10,000 inhabitants and for city districts of cities with more than 400,000 inhabitants, coefficient of variation (relative standard error) <= 0.5%

- Regarding other variables: precision restrictions also at NUTS 3 level and for "Verbandsgemeinden" in Rheinland-Pfalz

Precision restrictions

- => Practical implementation: for Gemeindeverbände in Rheinland-Pfalz, in general a coefficient of variation <= 0.5%
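For orientation, the expected coefficient of variation of an estimated total under stratified simple random sampling without replacement can be computed with the standard textbook variance formula. This is a sketch that assumes this formula and an expansion estimator; the strata values are hypothetical, and the slides do not state which variance estimator was used.

```python
import math

def expected_cv(strata, total):
    """strata: iterable of (N_h, n_h, S2_h): addresses in the stratum, sampled
    addresses, variance of registered persons per address.
    Returns sqrt(V(t_hat)) / total for the stratified expansion estimator."""
    variance = sum(N * N * (1 - n / N) * S2 / n for N, n, S2 in strata)
    return math.sqrt(variance) / total

# One hypothetical stratum: 100 addresses, 25 sampled, S^2 = 4, 400 inhabitants.
cv = expected_cv([(100, 25, 4.0)], total=400)
print(round(cv, 4))         # 0.0866
print(cv <= 0.005)          # False: far from the 0.5% requirement
```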

Sampling points

Regional stratification into "sampling points" (SMPs):
- Districts of cities with more than 400,000 inhabitants (type 0)
- Municipalities with 10,000 inhabitants or more (type 1)
- Gemeindeverbände, or remainders of Gemeindeverbände, with more than 10,000 inhabitants (type 2)
- NUTS 3 remainders (type 3)
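A sketch of how the municipalities of one NUTS 3 region might be grouped into SMP types 1 to 3. It assumes that municipalities below 10,000 inhabitants are pooled per Gemeindeverband and that everything left over falls into a single NUTS 3 remainder; type 0 (city districts) is left out, and all names and counts are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 10_000  # inhabitants

def build_sampling_points(municipalities):
    """municipalities: list of (name, verband, inhabitants) for one NUTS 3
    region; verband is the Gemeindeverband name or None."""
    smps = []
    verband_rest = defaultdict(list)  # Gemeindeverband -> inhabitants of small members
    nuts3_rest = 0
    for name, verband, inhabitants in municipalities:
        if inhabitants >= THRESHOLD:
            smps.append((name, 1, inhabitants))  # type 1: large municipality
        elif verband is not None:
            verband_rest[verband].append(inhabitants)
        else:
            nuts3_rest += inhabitants
    for verband, sizes in verband_rest.items():
        if sum(sizes) > THRESHOLD:
            smps.append((verband, 2, sum(sizes)))  # type 2: (remaining) Gemeindeverband
        else:
            nuts3_rest += sum(sizes)
    if nuts3_rest:
        smps.append(("NUTS 3 remainder", 3, nuts3_rest))  # type 3
    return smps

region = [("A", None, 17000), ("B", "VG X", 6000), ("C", "VG X", 5000),
          ("D", "VG Y", 3000), ("E", None, 8000)]
print(build_sampling_points(region))
# [('A', 1, 17000), ('VG X', 2, 11000), ('NUTS 3 remainder', 3, 11000)]
```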

Number of sampling points by state and SMP type

State | SMP type 0 | SMP type 1 | SMP type 2 | SMP type 3 | Total
Baden-Württemberg | 3 | 245 | 128 | 35 | 411
Bayern | 10 | 214 | 30 | 71 | 325
Berlin | 12 | 0 | 0 | 0 | 12
Brandenburg | 0 | 72 | 6 | 14 | 92
Bremen | 3 | 1 | 0 | 0 | 4
Hamburg | 7 | 0 | 0 | 0 | 7
Hessen | 3 | 165 | 0 | 21 | 189
Mecklenburg-Vorpommern | 0 | 24 | 28 | 12 | 64
Niedersachsen | 2 | 204 | 68 | 35 | 309
Nordrhein-Westfalen | 16 | 337 | 0 | 18 | 371
Rheinland-Pfalz | 0 | 45 | 119 | 21 | 185
Saarland | 0 | 40 | 0 | 5 | 45
Sachsen | 4 | 66 | 14 | 10 | 94
Sachsen-Anhalt | 0 | 59 | 16 | 11 | 86
Schleswig-Holstein | 0 | 54 | 52 | 10 | 116
Thüringen | 0 | 33 | 5 | 17 | 55
Total | 60 | 1,559 | 466 | 280 | 2,365

Example: Kreis Alzey-Worms

Sampling point | Regional key | SMP type | Inhabitants
Alzey | 073310003003 | 1 | 17,732
VG Alzey-Land | 073315001 | 2 | 24,382
VG Eich | 073315002 | 2 | 12,488
VG Monsheim | 073315003 | 2 | 10,124
VG Westhofen | 073315004 | 2 | 11,721
VG Wöllstein | 073315005 | 2 | 11,851
VG Wörrstadt | 073315006 | 2 | 28,177
Osthofen (Stadt) | 07331 | 3 | 8,283

Stratification, second stage

- Within every sampling point, stratification into 8 address size classes (registered people with first and secondary residence)

- Uniform number of individuals in each class
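One simple way to form classes with a roughly uniform number of registered persons is to sort the addresses by size and cut the cumulative person count into equal parts. A minimal sketch under that assumption, not necessarily the exact rule applied in the census; the address sizes are hypothetical.

```python
def size_classes(address_sizes, k=8):
    """Assign each address a class 0..k-1 so that every class holds roughly
    the same number of registered persons (assumes at least one person)."""
    total = sum(address_sizes)
    order = sorted(range(len(address_sizes)), key=lambda i: address_sizes[i])
    classes = [0] * len(address_sizes)
    cumulative = 0
    for i in order:
        # The class is given by where this address starts on the cumulative person scale.
        classes[i] = min(k - 1, cumulative * k // total)
        cumulative += address_sizes[i]
    return classes

print(size_classes([4, 1, 2, 1], k=2))  # [1, 0, 0, 0]: 4 persons in each class
```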

Census household survey: Wiesbaden

No. | Address size class (registered persons, from-to) | Addresses: population | Addresses: sample | Addresses: sampling fraction (%) | Registered persons: population | Registered persons: sample, realized | Registered persons: sample, expected | Registered persons: sampling fraction (%)
1 | 0-4 | 19,875 | 435 | 2.2 | 37,697 | 816 | 825 | 2.2
2 | 4-6 | 8,109 | 162 | 2.0 | 37,699 | 762 | 753 | 2.0
3 | 6-10 | 4,885 | 98 | 2.0 | 37,698 | 752 | 756 | 2.0
4 | 10-14 | 3,131 | 63 | 2.0 | 37,706 | 765 | 759 | 2.0
5 | 14-19 | 2,276 | 46 | 2.0 | 37,684 | 747 | 762 | 2.0
6 | 19-25 | 1,746 | 35 | 2.0 | 37,693 | 781 | 756 | 2.1
7 | 25-35 | 1,301 | 26 | 2.0 | 37,705 | 755 | 754 | 2.0
8 | 35 or more | 708 | 70 | 9.9 | 37,720 | 3,697 | 3,729 | 9.8
Total | | 42,031 | 935 | 2.2 | 301,602 | 9,075 | 9,093 | 3.0
Non-sensitive special addresses | | 51 | 5 | 9.8 | 2,645 | 175 | 259 | 6.6

Expected relative standard error for the number of inhabitants: 0.13%

Allocation of the sample size

- Allocation of the sample size to the strata

- Microcensus: proportional allocation

- => Choose a form of allocation that accounts for the different variances between the strata
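A textbook allocation that accounts for different variances is Neyman allocation, with n_h proportional to N_h S_h. The sketch contrasts it with proportional allocation on hypothetical strata; Neyman allocation is named here only as the classical reference point, not as the method finally used.

```python
def proportional_allocation(n, N):
    """n_h proportional to the stratum size N_h (as in the Microcensus)."""
    total = sum(N)
    return [n * N_h / total for N_h in N]

def neyman_allocation(n, N, S):
    """n_h proportional to N_h * S_h: strata with more spread get more units."""
    weights = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]

N = [800, 200]  # addresses per stratum (hypothetical)
S = [1.0, 4.0]  # std. dev. of persons per address (hypothetical)
print(proportional_allocation(100, N))  # [80.0, 20.0]
print(neyman_allocation(100, N, S))     # [50.0, 50.0]
```

The small but heterogeneous second stratum receives far more sample under Neyman allocation, which is the effect the slide is after.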

Optimal allocation

- The sum of the squared expected coefficients of variation of the population count (first and secondary residences from the AGR) has to be minimized, with the cross-classifications of sampling points and address size classes as strata
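Written out in standard notation (ours, assuming stratified simple random sampling without replacement and an expansion estimator): for sampling point d with address size strata h in H_d, N_h addresses, n_h sampled addresses, S_h^2 the variance of registered persons per address and t_d the registered population of d, the quantity to be minimized over the n_h is the sum of the squared CVs. Only the first double sum depends on the n_h; the second is the constant finite-population term.

```latex
\operatorname{CV}^2(\hat t_d) = \frac{1}{t_d^2}\sum_{h\in H_d} N_h^2\Bigl(1-\frac{n_h}{N_h}\Bigr)\frac{S_h^2}{n_h},
\qquad
\sum_d \operatorname{CV}^2(\hat t_d) = \sum_d \sum_{h\in H_d} \frac{N_h^2 S_h^2}{t_d^2\, n_h} \;-\; \sum_d \sum_{h\in H_d} \frac{N_h S_h^2}{t_d^2}
```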

Constraints

- Enumeration unit is the individual, sampling unit is the address

- The number of people to be drawn (expected value) is fixed beforehand
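The expected number of people follows directly from the allocation: each address in stratum h is drawn with probability n_h / N_h, so the expected number of registered persons in the sample is the sum of n_h X_h / N_h, where X_h is the number of registered persons in the stratum. A short check against the Wiesbaden figures from the table above (the function name is hypothetical):

```python
def expected_persons(addresses, sampled, registered):
    """Sum over strata of (n_h / N_h) * X_h."""
    return sum(n / N * X for N, n, X in zip(addresses, sampled, registered))

# Wiesbaden, address size classes 1-8
N = [19875, 8109, 4885, 3131, 2276, 1746, 1301, 708]
n = [435, 162, 98, 63, 46, 35, 26, 70]
X = [37697, 37699, 37698, 37706, 37684, 37693, 37705, 37720]
print(round(expected_persons(N, n, X)))  # 9093, the expected total in the table
```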

Constraints

Recommendation of the research project:
- Optimal allocation with box constraints
- Lower and upper bounds for the sampling fraction

Municipality size (inhabitants) | Sampling fraction: lower bound | Sampling fraction: upper bound
10,000-30,000 | 5% | 50%
30,000-100,000 | 4% | 40%
100,000 or more | 2% | 40%
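A generic way to handle such box constraints is to clip the unconstrained optimum to the bounds and adjust a Lagrange multiplier until the expected number of persons meets its target. The sketch minimizes the sum of N_h^2 S_h^2 / (t_d^2 n_h), i.e. the sum of squared CVs without its constant term, using the fraction bounds from the table. It is our illustration, not the Trier/GESIS algorithm, and it assumes: which bound applies exactly at 30,000 and 100,000 inhabitants, real-valued n_h without rounding, and S_h^2 > 0 in every stratum.

```python
import math

def fraction_bounds(inhabitants):
    """Lower/upper sampling-fraction bounds (boundary handling is an assumption)."""
    if inhabitants >= 100_000:
        return 0.02, 0.40
    if inhabitants >= 30_000:
        return 0.04, 0.40
    return 0.05, 0.50

def box_constrained_allocation(strata, persons_to_draw, iterations=200):
    """strata: dicts with N (addresses), X (registered persons), S2 (variance of
    persons per address), t (registered persons of the stratum's sampling
    point) and inhabitants (of the municipality)."""
    w = [s["N"] ** 2 * s["S2"] / s["t"] ** 2 for s in strata]  # objective weights
    c = [s["X"] / s["N"] for s in strata]  # expected persons per sampled address
    lo, hi = [], []
    for s in strata:
        f_lo, f_hi = fraction_bounds(s["inhabitants"])
        lo.append(f_lo * s["N"])
        hi.append(f_hi * s["N"])

    def allocate(lam):
        # KKT solution for a fixed multiplier: unconstrained optimum clipped to the box.
        return [min(u, max(l, math.sqrt(wi / (lam * ci))))
                for wi, ci, l, u in zip(w, c, lo, hi)]

    def persons(n):
        return sum(ci * ni for ci, ni in zip(c, n))

    if not persons(lo) <= persons_to_draw <= persons(hi):
        raise ValueError("target not reachable within the sampling-fraction bounds")
    lam_lo = min(wi / (ci * u ** 2) for wi, ci, u in zip(w, c, hi))  # all at upper bound
    lam_hi = max(wi / (ci * l ** 2) for wi, ci, l in zip(w, c, lo))  # all at lower bound
    for _ in range(iterations):
        lam = math.sqrt(lam_lo * lam_hi)  # geometric bisection
        if persons(allocate(lam)) > persons_to_draw:
            lam_lo = lam  # too many persons: raise the multiplier
        else:
            lam_hi = lam
    return allocate(math.sqrt(lam_lo * lam_hi))

strata = [
    {"N": 1000, "X": 2000, "S2": 1.0, "t": 4000, "inhabitants": 150_000},
    {"N": 100, "X": 2000, "S2": 100.0, "t": 4000, "inhabitants": 150_000},
]
n = box_constrained_allocation(strata, persons_to_draw=200)
```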


Census household survey: Verbandsgemeinde Monsheim (Kreis Alzey-Worms)

No. | Address size class (registered persons, from-to) | Addresses: population | Addresses: sample | Addresses: sampling fraction (%) | Registered persons: population | Registered persons: sample, realized | Registered persons: sample, expected | Registered persons: sampling fraction (%)
1 | 0-2 | 1,252 | 626 | 50.0 | 1,317 | 676 | 659 | 51.3
2 | 2 | 659 | 33 | 5.0 | 1,318 | 66 | 66 | 5.0
3 | 2-3 | 443 | 31 | 7.0 | 1,317 | 93 | 92 | 7.1
4 | 3-4 | 398 | 76 | 19.1 | 1,319 | 249 | 252 | 18.9
5 | 4 | 329 | 16 | 4.9 | 1,316 | 64 | 64 | 4.9
6 | 4-5 | 302 | 52 | 17.2 | 1,318 | 222 | 227 | 16.8
7 | 5-6 | 244 | 38 | 15.6 | 1,318 | 207 | 205 | 15.7
8 | 6 or more | 128 | 64 | 50.0 | 1,320 | 760 | 660 | 57.6
Total | | 3,755 | 936 | 24.9 | 10,543 | 2,337 | 2,225 | 22.2
Non-sensitive special addresses: -

Expected relative standard error for the number of inhabitants: 0.16%

Sampling points not included in the optimization process

Allocation of the sample size:
- Fixed sampling fraction of 5% for all strata of Gemeindeverbände (except the strata of SMP type 2 in Rheinland-Pfalz) and of NUTS 3 remainders

Example: Kreis Alzey-Worms

Sampling point | SMP type | Inhabitants
Alzey | 1 | 17,732
VG Alzey-Land | 2 | 24,382
VG Eich | 2 | 12,488
VG Monsheim | 2 | 10,124
VG Westhofen | 2 | 11,721
VG Wöllstein | 2 | 11,851
VG Wörrstadt | 2 | 28,177
Osthofen (Stadt) | 3 | 8,283

Optimal allocation algorithm

Because of several different constraints, the classical formula for optimal allocation has to be modified.

Implementation of an algorithm developed by the University of Trier and GESIS Mannheim

Programmed in R

Special addresses

- No allocation to sampling points; instead, aggregation at NUTS 3 level

- Uniform sampling fraction of 10%

- Constraint: at least 2 addresses have to be drawn
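A minimal sketch of the 10% rule with the two-address minimum for one NUTS 3 stratum. The rounding rule is an assumption (round half up); with it, the 51 special addresses of Wiesbaden give 5 sampled addresses, matching the Wiesbaden table.

```python
import math

def special_sample_size(n_addresses, fraction=0.10, minimum=2):
    """Sample size for non-sensitive special addresses in one NUTS 3 stratum:
    10% of the addresses, at least `minimum`, never more than the stratum holds."""
    n = math.floor(fraction * n_addresses + 0.5)  # round half up (assumption)
    return min(n_addresses, max(minimum, n))

print(special_sample_size(51))  # 5
print(special_sample_size(12))  # 2: the minimum applies
```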


Implementation: normal addresses

- Normal addresses drawn in SAS using PROC SURVEYSELECT

- Number of sample addresses determined by the optimal allocation algorithm

- Simple random sampling without replacement in every stratum

- Output: IDs of the sampled addresses (household survey / post-enumeration survey) for marking in the AGR
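Outside SAS, the same per-stratum design can be sketched with Python's `random.sample`, which draws without replacement with equal probabilities. Stratum keys, address IDs and the seed are hypothetical, and this does not reproduce PROC SURVEYSELECT's random number stream.

```python
import random

def srswor_by_stratum(frame, sample_sizes, seed=12345):
    """frame: {stratum: [address_id, ...]}; sample_sizes: {stratum: n_h}.
    Draws n_h addresses per stratum by simple random sampling without replacement."""
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    sampled = []
    for stratum in sorted(frame):
        sampled.extend(rng.sample(frame[stratum], sample_sizes.get(stratum, 0)))
    return sampled

frame = {("SMP A", 1): list(range(100)), ("SMP A", 2): list(range(100, 150))}
ids = srswor_by_stratum(frame, {("SMP A", 1): 10, ("SMP A", 2): 5})
```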

Implementation: non-sensitive special addresses

- Non-sensitive special addresses drawn in SAS with PROC SURVEYSELECT
- Number of addresses to be drawn determined by the 10% rule
- Systematic sampling within each stratum (here: NUTS 3)
- Addresses sorted by address size within each stratum (to avoid extreme sampling errors)
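Systematic sampling after sorting by address size spreads the sample over the whole size range, which is how the sorting limits extreme samples. A minimal sketch with a possibly fractional interval k = N/n and a random start in [0, k); the addresses are hypothetical, and n is assumed not to exceed the stratum size.

```python
import math
import random

def systematic_sample(addresses, n, rng):
    """addresses: list of (address_id, size) in one stratum. Sorts by size,
    then takes every k-th address (k = N/n) from a random start in [0, k)."""
    ordered = sorted(addresses, key=lambda a: a[1])
    N = len(ordered)
    if n <= 0:
        return []
    k = N / n
    start = rng.random() * k
    # min() guards against floating-point round-off at the last position.
    return [ordered[min(N - 1, math.floor(start + i * k))][0] for i in range(n)]

stratum = [(f"addr{j}", size) for j, size in enumerate([3, 40, 12, 7, 25, 1, 60, 9, 15, 4])]
picked = systematic_sample(stratum, 2, random.Random(1))
```

With 10 addresses and n = 2, one address always comes from the five smallest and one from the five largest.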

Performance

- Drawing of addresses in SAS using PROC SURVEYSELECT:
  - Normal addresses: 15 s
  - Special addresses: under 1 s


Census household survey: Buxtehude

No. | Address size class (registered persons, from-to) | Addresses: population | Addresses: sample | Addresses: sampling fraction (%) | Registered persons: population | Registered persons: sample, realized | Registered persons: sample, expected | Registered persons: sampling fraction (%)
1 | 0-2 | 3,952 | 520 | 13.2 | 5,165 | 667 | 680 | 12.9
2 | 2-3 | 1,999 | 117 | 5.9 | 5,164 | 307 | 302 | 5.9
3 | 3-4 | 1,458 | 73 | 5.0 | 5,166 | 262 | 259 | 5.1
4 | 4-5 | 1,212 | 49 | 4.0 | 5,164 | 211 | 209 | 4.1
5 | 5-8 | 882 | 61 | 6.9 | 5,166 | 363 | 357 | 7.0
6 | 8-13 | 518 | 50 | 9.7 | 5,160 | 502 | 498 | 9.7
7 | 13-22 | 303 | 34 | 11.2 | 5,151 | 565 | 578 | 11.0
8 | 23 or more | 95 | 38 | 40.0 | 5,186 | 1,357 | 2,074 | 26.2
Total | | 10,419 | 942 | 9.0 | 41,322 | 4,234 | 4,957 | 10.2
Non-sensitive special addresses: -

Expected coefficient of variation (relative standard error) for the number of inhabitants: 0.54%

Sample sizes and sampling fractions by state: normal addresses

State | Addresses: population | Residents: population | Addresses: sample | Residents: sample, expected | Residents: sample, realized | Sampling fraction, addresses | Proportion of residents, expected | Proportion of residents, realized
SH | 857,132 | 2,960,763 | 76,280 | 286,395 | 285,567 | 8.90% | 9.67% | 9.65%
HH | 264,780 | 1,750,708 | 6,784 | 62,416 | 62,047 | 2.56% | 3.57% | 3.54%
NI | 2,343,781 | 8,177,912 | 242,175 | 805,248 | 804,257 | 10.33% | 9.85% | 9.83%
HB | 148,580 | 662,811 | 4,063 | 28,136 | 28,760 | 2.73% | 4.24% | 4.34%
NW | 4,084,030 | 18,146,467 | 333,514 | 1,487,655 | 1,487,714 | 8.17% | 8.20% | 8.20%
HE | 1,519,560 | 6,333,546 | 185,051 | 733,744 | 733,322 | 12.18% | 11.59% | 11.58%
RP | 1,253,844 | 4,150,466 | 190,604 | 552,412 | 552,844 | 15.20% | 13.31% | 13.32%
BW | 2,553,507 | 10,697,384 | 272,235 | 1,135,841 | 1,135,623 | 10.66% | 10.62% | 10.62%
BY | 3,102,221 | 12,840,749 | 270,153 | 1,165,681 | 1,166,233 | 8.71% | 9.08% | 9.08%
SL | 318,532 | 1,058,720 | 41,353 | 130,908 | 130,684 | 12.98% | 12.36% | 12.34%
BE | 310,273 | 3,451,337 | 7,421 | 124,352 | 122,367 | 2.39% | 3.60% | 3.55%
BB | 671,536 | 2,547,525 | 74,234 | 297,963 | 297,810 | 11.05% | 11.70% | 11.69%
MV | 391,046 | 1,666,991 | 30,105 | 142,248 | 142,510 | 7.70% | 8.53% | 8.55%
SN | 902,723 | 4,209,897 | 90,390 | 368,976 | 369,788 | 10.01% | 8.76% | 8.78%
ST | 618,954 | 2,325,709 | 75,999 | 242,236 | 242,822 | 12.28% | 10.42% | 10.44%
TH | 570,327 | 2,245,224 | 48,010 | 195,018 | 194,628 | 8.42% | 8.69% | 8.67%
Total | 19,910,826 | 83,226,209 | 1,948,371 | 7,759,228 | 7,756,976 | 9.79% | 9.32% | 9.32%

Sample sizes and sampling fractions by state: non-sensitive special addresses

State | Addresses: population | Residents: population | Addresses: sample | Residents: sample, expected | Residents: sample, realized | Sampling fraction, addresses | Proportion of residents, expected | Proportion of residents, realized
SH | 766 | 32,810 | 77 | 3,284 | 3,056 | 10.05% | 10.01% | 9.31%
HH | 256 | 19,192 | 26 | 1,949 | 2,142 | 10.16% | 10.16% | 11.16%
NI | 2,479 | 92,527 | 255 | 9,514 | 9,253 | 10.29% | 10.28% | 10.00%
HB | 150 | 7,911 | 16 | 847 | 779 | 10.67% | 10.70% | 9.85%
NW | 3,773 | 198,078 | 385 | 20,252 | 20,072 | 10.20% | 10.22% | 10.13%
HE | 1,516 | 60,482 | 151 | 6,029 | 6,241 | 9.96% | 9.97% | 10.32%
RP | 850 | 42,438 | 100 | 5,072 | 4,696 | 11.76% | 11.95% | 11.07%
BW | 3,376 | 139,840 | 339 | 14,067 | 12,900 | 10.04% | 10.06% | 9.22%
BY | 3,088 | 166,032 | 335 | 18,137 | 17,731 | 10.85% | 10.92% | 10.68%
SL | 295 | 10,210 | 29 | 993 | 1,004 | 9.83% | 9.72% | 9.83%
BE | 496 | 41,260 | 50 | 4,159 | 3,987 | 10.08% | 10.08% | 9.66%
BB | 418 | 27,601 | 47 | 3,125 | 2,904 | 11.24% | 11.32% | 10.52%
MV | 209 | 11,185 | 41 | 2,110 | 2,258 | 19.62% | 18.86% | 20.19%
SN | 850 | 56,656 | 86 | 5,741 | 6,220 | 10.12% | 10.13% | 10.98%
ST | 554 | 29,805 | 54 | 2,891 | 3,017 | 9.75% | 9.70% | 10.12%
TH | 471 | 26,010 | 58 | 3,186 | 2,801 | 12.31% | 12.25% | 10.77%
Total | 19,547 | 962,037 | 2,049 | 101,355 | 99,061 | 10.48% | 10.54% | 10.30%

Sample sizes and sampling fractions by state: totals

State | Addresses: population | Residents: population | Addresses: sample | Residents: sample, expected | Residents: sample, realized | Sampling fraction, addresses | Proportion of residents, expected | Proportion of residents, realized
SH | 857,898 | 2,993,573 | 76,357 | 289,679 | 288,623 | 8.90% | 9.68% | 9.64%
HH | 265,036 | 1,769,900 | 6,810 | 64,365 | 64,189 | 2.57% | 3.64% | 3.63%
NI | 2,346,260 | 8,270,439 | 242,430 | 814,762 | 813,510 | 10.33% | 9.85% | 9.84%
HB | 148,730 | 670,722 | 4,079 | 28,983 | 29,539 | 2.74% | 4.32% | 4.40%
NW | 4,087,803 | 18,344,545 | 333,899 | 1,507,907 | 1,507,786 | 8.17% | 8.22% | 8.22%
HE | 1,521,076 | 6,394,028 | 185,202 | 739,773 | 739,563 | 12.18% | 11.57% | 11.57%
RP | 1,254,694 | 4,192,904 | 190,704 | 557,484 | 557,540 | 15.20% | 13.30% | 13.30%
BW | 2,556,883 | 10,837,224 | 272,574 | 1,149,908 | 1,148,523 | 10.66% | 10.61% | 10.60%
BY | 3,105,309 | 13,006,781 | 270,488 | 1,183,818 | 1,183,964 | 8.71% | 9.10% | 9.10%
SL | 318,827 | 1,068,930 | 41,382 | 131,901 | 131,688 | 12.98% | 12.34% | 12.32%
BE | 310,769 | 3,492,597 | 7,471 | 128,511 | 126,354 | 2.40% | 3.68% | 3.62%
BB | 671,954 | 2,575,126 | 74,281 | 301,088 | 300,714 | 11.05% | 11.69% | 11.68%
MV | 391,255 | 1,678,176 | 30,146 | 144,357 | 144,768 | 7.70% | 8.60% | 8.63%
SN | 903,573 | 4,266,553 | 90,476 | 374,717 | 376,008 | 10.01% | 8.78% | 8.81%
ST | 619,508 | 2,355,514 | 76,053 | 245,127 | 245,839 | 12.28% | 10.41% | 10.44%
TH | 570,798 | 2,271,234 | 48,068 | 198,203 | 197,429 | 8.42% | 8.73% | 8.69%
Total | 19,930,373 | 84,188,246 | 1,950,420 | 7,860,583 | 7,856,037 | 9.79% | 9.34% | 9.33%

Post-enumeration survey

- Post-enumeration survey for quality assessment of the official population count
- Sample drawn immediately after the household survey, as a sub-sample of it
- Large municipalities with 10,000 inhabitants or more:
  - Design similar to the household survey, with aggregation of address size strata if needed
  - Uniform sub-sampling fraction of 5% in every stratum

Post-enumeration survey

- Smaller municipalities with fewer than 10,000 inhabitants:
  - Stratification into 2 municipality size classes (fewer than 2,000 inhabitants / more than 2,000 inhabitants)
  - Within each municipality size class, stratification analogous to the household survey, with a few additional possibilities for aggregating SMPs
  - For each (combined) SMP, a sample size of 0.3% of the official number of inhabitants as of 31 December 2009
  - Uniform allocation of the sample size (addresses) to the address size classes of a (combined) SMP
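The slides do not say how the 0.3% target is converted into a number of addresses, so this sketch takes the address count of a (combined) SMP as given and only shows the uniform split across its address size classes. Giving the remainder to the first classes is a hypothetical tie-break.

```python
def uniform_allocation(n_total, n_classes):
    """Split n_total addresses as evenly as possible over n_classes classes."""
    base, remainder = divmod(n_total, n_classes)
    # Hypothetical tie-break: the first `remainder` classes get one extra address.
    return [base + (1 if j < remainder else 0) for j in range(n_classes)]

print(uniform_allocation(30, 8))  # [4, 4, 4, 4, 4, 4, 3, 3]
```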

Outlook

What else needs to be done?
- Additional sampling of addresses newly detected in the AGR after 1 September 2010 (carried out in April 2011)
  - Most important source: another data delivery from the population registers as of 1 November 2010
  - Probably small numbers; a simple design is to be proposed
- Estimation and calculation of errors
  - Implementation of recommendations from the research project
  - Partial implementation of small area estimators
- Two parts:
  - First preliminary results, 18 months after the census reference date
  - Final results, 24 months after the census reference date

Thank you for your attention!

Questions?

Andreas Berg, [email protected]
Wolf Bihler, [email protected]