
Survey frame geocoding using administrative data sources

Mirosław Migacz, Chief GIS Specialist, Central Statistical Office of Poland
Dublin, 2 XI 2017

Survey frames

• Survey frame statistical units:
  • persons,
  • buildings,
  • enterprises,
  • farms.
• Georeference – localization of statistical units:
  • facilitates field work for interviewers,
  • facilitates survey management,
  • enables survey result presentation on maps.

Spatial data in official statistics

[Diagram: statistical identifiers paired with spatial datasets]
• TERC (voivodship): National Register of Boundaries (PRG)
• SIMC (locality): National Register of Geographical Names (PRNG)
• BREC (statistical region, census enumeration area): statistical region and census enumeration area boundaries
• ULIC (street): street axes
• NOBC (building, dwelling): statistical address points

PBA – spatial address databases

[Timeline diagram]
• stages: reference material acquisition; reference material processing; address point collection; address point database update
• date labels: January 2009 – December 2009; January 2010 – May 2010; January 2010 – June 2010; since July 2010, continuously

PBA vs survey frames

• PBA:
  • locations of buildings with at least one dwelling.
• Survey frames:
  • OBS – frame for social surveys,
  • OBR – frame for agricultural surveys,
  • BJS – statistical unit database (enterprises).

Survey frame geocoding

[Diagram: SURVEY FRAME and SPATIAL DATA combined in GEOCODING]
• survey frame: address data, TERYT identifiers
• spatial data: address points, TERYT identifiers

PBA vs survey frames
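The geocoding idea in the diagram (survey frame records and address points that share TERYT identifiers) can be sketched as a keyed lookup. This is only an illustration, not the office's actual procedure: the field names `locality_id`, `street_id` and `number` are assumptions, and the single address point reuses the Węgorzyno / Kolejowa 1 values shown later in the deck.

```python
from typing import NamedTuple


class AddressPoint(NamedTuple):
    locality_id: str  # TERYT locality identifier (SIMC)
    street_id: str    # TERYT street identifier
    number: str       # address number
    x: float
    y: float


def make_key(locality_id, street_id, number):
    """Join key: both identifiers plus a lightly normalized address number."""
    return (locality_id, street_id, number.strip().upper())


def build_index(points):
    """Index address points by the join key (assumed field names)."""
    return {make_key(p.locality_id, p.street_id, p.number): p for p in points}


def geocode(frame_record, index):
    """Return (x, y) for a survey frame record, or None if nothing pairs."""
    key = make_key(frame_record["locality_id"],
                   frame_record["street_id"],
                   frame_record["number"])
    point = index.get(key)
    return (point.x, point.y) if point else None


# Single address point taken from the example values in the slides.
points = [AddressPoint("0980062", "08828", "1", 281563.44, 636550.11)]
index = build_index(points)

hit = geocode({"locality_id": "0980062", "street_id": "08828", "number": " 1 "}, index)
miss = geocode({"locality_id": "0980062", "street_id": "08828", "number": "2a"}, index)
print(hit, miss)
```

A frame record acquires a georeference only when all parts of the key are present and equal, which is why missing identifiers in the source data matter so much for the hit rate.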

[Diagram: data sources PBA and other sources; survey frames OBS, OBR and BJS]

Survey frames vs data sources

[Diagram: data sources PRG and LPIS; survey frames OBS, OBR and BJS; label: complementary]

Survey frame geocoding

[Diagram, built up over several slides: data sources PBA, PRG and LPIS; survey frames OBS, OBR and BJS]
• Farm Structure Survey 2016
• Improvement of the use of administrative sources (ESS.VIP ADMIN)
• Improvement of the use of administrative sources (ESS.VIP ADMIN) - application

National Register of Boundaries (PRG)

• PRG: address points for all buildings
• EMUiA (gmina, LAU2): register of:
  - localities
  - streets
  - addresses

Register of localities, streets and addresses (EMUiA)

[Data model diagrams]
• feature classes: Address point, Street, Locality
• Locality, shown with Administrative unit; TERYT identifier – voidable
• Street, shown with Locality; TERYT identifier – voidable; Street name
• address, shown with locality and street

Register of localities, streets and addresses (EMUiA)

Address point
Gmina | Locality  | Loc. ID | Street name | Str. ID | Addr. # | X         | Y
      | Węgorzyno | X       | Kolejowa    | X       | 1       | 281563,44 | 636550,11

Locality
Gmina ID | Locality  | Loc. ID
X        | Węgorzyno | 0980062

Street
Gmina ID | Locality / Loc. ID | Street   | Str. ID
X        | X                  | Kolejowa | 08828

(X in an identifier column marks a value missing from the record.)

EMUiA – problems

• multiple localities with the same name within one voivodeship
• multiple streets with the same name within one voivodeship / gmina / locality
• typing errors
• completeness issues

EMUiA – solutions

• assign gmina ID to localities:
  • pairing by locality ID with TERYT locality register
  • spatial join
• assign gmina ID to address points:
  • spatial join
• assign locality ID to address points:
  • pair by both: gmina ID and locality name

EMUiA – solutions
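Two of the assignment steps listed under EMUiA – solutions (a spatial join for the gmina ID, then pairing by gmina ID plus locality name) can be sketched in plain Python. This is a minimal illustration under stated assumptions: a ray-casting point-in-polygon test stands in for whatever GIS spatial join was actually used, and the gmina IDs, locality IDs, square boundaries and coordinates are all invented.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices without a closing repeat."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_gmina_id(point, gminas):
    """Spatial join: return the ID of the first gmina polygon containing the point."""
    for gmina_id, ring in gminas.items():
        if point_in_polygon(point["x"], point["y"], ring):
            return gmina_id
    return None


def assign_locality_id(point, locality_index):
    """Pair by both gmina ID and locality name, so same-named localities stay apart."""
    key = (point["gmina_id"], point["locality_name"].strip().casefold())
    return locality_index.get(key)


# Hypothetical data: two square gminas, each with a locality of the same name.
gminas = {
    "G1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "G2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
locality_index = {
    ("G1", "nowa wieś"): "L-001",
    ("G2", "nowa wieś"): "L-002",
}
points = [
    {"x": 3.0, "y": 4.0, "locality_name": "Nowa Wieś"},
    {"x": 15.0, "y": 5.0, "locality_name": "Nowa Wieś "},
]
for p in points:
    p["gmina_id"] = assign_gmina_id(p, gminas)
    p["locality_id"] = assign_locality_id(p, locality_index)
    print(p["gmina_id"], p["locality_id"])
```

Pairing on the locality name alone would be ambiguous here, which is the first problem listed for EMUiA; adding the gmina ID to the key removes the ambiguity in this example.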

• assign street ID to address points:
  • pairing by street name with the street feature class
  • pairing with TERYT street catalogue
[…]

Pairing w/ TERYT street catalogue

Identifiers: Locality ID (SIMC), Street ID

Street name variations:
• NAZWA_1
• ULICA_1: NAZWA_2 + NAZWA_1
• ULICA_2: NAZWA_1 + NAZWA_2
• ULICA_3: CECHA + NAZWA_2 + NAZWA_1 (CECHA + NAZWA_1 if NAZWA_2 IS NULL)

Pairing by:
• SIMC + NAZWA_1
• SIMC + ULICA_1
• SIMC + ULICA_2
• SIMC + ULICA_3
• NAZWA_1
• ULICA_1
• ULICA_2
• ULICA_3

Pairing w/ TERYT street catalogue
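The name variations and the eight pairing keys could be implemented roughly as follows. This is a sketch under assumptions the slides do not state: the keys are tried in the listed order and the first hit wins, names are compared after case-folding and whitespace collapsing, and ULICA_1 and ULICA_2 collapse to NAZWA_1 when NAZWA_2 is null. The Kolejowa row reuses identifiers from the slides; the other catalogue rows and their IDs are invented.

```python
LABELS = ["NAZWA_1", "ULICA_1", "ULICA_2", "ULICA_3"]


def norm(text):
    """Case-fold and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.casefold().split())


def variants(cecha, nazwa_1, nazwa_2):
    """Return NAZWA_1, ULICA_1, ULICA_2, ULICA_3 for one catalogue row."""
    if nazwa_2:
        return [nazwa_1,
                f"{nazwa_2} {nazwa_1}",
                f"{nazwa_1} {nazwa_2}",
                f"{cecha} {nazwa_2} {nazwa_1}"]
    # Assumption: with NAZWA_2 null, ULICA_1 and ULICA_2 reduce to NAZWA_1.
    return [nazwa_1, nazwa_1, nazwa_1, f"{cecha} {nazwa_1}"]


def build_indexes(catalogue):
    """Build one SIMC-qualified and one name-only index per name variant."""
    with_simc = {label: {} for label in LABELS}
    name_only = {label: {} for label in LABELS}
    for row in catalogue:
        ref = (row["SIMC"], row["STREET_ID"])
        names = variants(row["CECHA"], row["NAZWA_1"], row["NAZWA_2"])
        for label, name in zip(LABELS, names):
            with_simc[label].setdefault((row["SIMC"], norm(name)), set()).add(ref)
            name_only[label].setdefault(norm(name), set()).add(ref)
    return with_simc, name_only


def pair(simc, street_name, with_simc, name_only):
    """Try the eight keys in slide order; return (key used, matched catalogue rows)."""
    name = norm(street_name)
    if simc is not None:
        for label in LABELS:
            hits = with_simc[label].get((simc, name))
            if hits:
                return f"SIMC + {label}", sorted(hits)
    for label in LABELS:
        hits = name_only[label].get(name)
        if hits:
            return label, sorted(hits)
    return None, []


catalogue = [
    # Identifiers taken from the slides' example.
    {"SIMC": "0980062", "STREET_ID": "08828", "CECHA": "ul.", "NAZWA_1": "Kolejowa", "NAZWA_2": None},
    # Hypothetical rows, for illustration only.
    {"SIMC": "0000001", "STREET_ID": "00002", "CECHA": "ul.", "NAZWA_1": "Kolejowa", "NAZWA_2": None},
    {"SIMC": "0000001", "STREET_ID": "00003", "CECHA": "al.", "NAZWA_1": "Mickiewicza", "NAZWA_2": "Adama"},
]
with_simc, name_only = build_indexes(catalogue)

print(pair("0980062", "Kolejowa", with_simc, name_only))
print(pair(None, "kolejowa", with_simc, name_only))
print(pair("0000001", "Adama  Mickiewicza", with_simc, name_only))
```

When the locality ID is missing and only the name-only keys can be used, a common street name pairs with several catalogue rows at once. In this toy catalogue "kolejowa" without a SIMC returns two rows.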

• pairing addresses with street catalogue by street names (string)
• multiple matches -> multiplying address point records
• result: 13 635 270 matched address point records (initial number of address points: 7 533 868)
• 275 453 (3.6%) out of 7 533 868 address points with a street name present but no street ID assigned

Survey frame geocoding
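As a side check on the street catalogue pairing figures, the short sketch below recomputes the two ratios and shows how a name-only join multiplies records. The three counts come from the slides; the toy catalogue is hypothetical apart from the first row. The computed share of unassigned points is about 3.66 per cent, which the slide gives as 3.6 per cent.

```python
from collections import defaultdict

# Figures reported on the slide.
matched_records = 13_635_270
address_points = 7_533_868
unassigned = 275_453

records_per_point = matched_records / address_points
unassigned_share = unassigned / address_points
print(f"records per initial address point: {records_per_point:.2f}")
print(f"share without street ID: {unassigned_share:.2%}")

# Toy one-to-many join: the same street name exists in several localities.
catalogue = [
    ("0980062", "Kolejowa"),  # locality ID from the slides
    ("0000001", "Kolejowa"),  # hypothetical
    ("0000002", "Kolejowa"),  # hypothetical
]
by_name = defaultdict(list)
for simc, name in catalogue:
    by_name[name].append(simc)

points = [{"id": 1, "street": "Kolejowa"}]
joined = [(p["id"], simc) for p in points for simc in by_name[p["street"]]]
print(len(points), "address point ->", len(joined), "joined records")
```

A join on the street name alone returns one record per catalogue row with that name, so a single address point becomes three records here. That is the mechanism behind a result set larger than the input.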

• agricultural survey frame: just over half of the records that qualified for pairing (identifiers present) acquired a georeference
• other survey frames: Q4 2017, Q1 2018

Conclusions on source data

• hope for data quality improvement over time (the PRG dataset tested is dated 13.06.2016)
• other techniques for record matching in order to assign identifiers to more address points:
  • building an address locator for ArcGIS geocoding tools
  • string distance analyses (e.g. stringdist Python module)
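The string distance idea can be tried with nothing more than the Python standard library. The sketch below uses `difflib` as a stand-in for the stringdist module named above; the 0.85 similarity cutoff, the helper name `fuzzy_street_match` and the sample names are assumptions chosen for illustration, not values from the project.

```python
import difflib


def norm(name):
    """Case-fold and collapse whitespace before comparing names."""
    return " ".join(name.casefold().split())


def fuzzy_street_match(name, catalogue_names, cutoff=0.85):
    """Return the most similar catalogue name at or above the cutoff, else None."""
    lookup = {norm(c): c for c in catalogue_names}
    hits = difflib.get_close_matches(norm(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


catalogue_names = ["Kolejowa", "Kościelna", "Adama Mickiewicza"]

print(fuzzy_street_match("Kolejwa", catalogue_names))           # typing error, one letter dropped
print(fuzzy_street_match("Adama Mickiewcza", catalogue_names))  # typing error in a longer name
print(fuzzy_street_match("Leśna", catalogue_names))             # no sufficiently similar name
```

A high cutoff keeps false pairings rare at the cost of leaving more records unmatched. In practice the comparison would likely be restricted to streets of the same locality, so that a typing error is not resolved to a street elsewhere.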

Mirosław Migacz, Chief GIS Specialist, Central Statistical Office of Poland

@mireslav

www.linkedin.com/in/migacz


www.slideshare.net/MirosawMigacz