
A Geocoding Best Practices Guide

By Daniel W. Goldberg
November 2008
University of Southern California GIS Research Laboratory

SPONSORING ORGANIZATIONS:
Canadian Association of Provincial Cancer Agencies
Canadian Partnership Against Cancer
Centers for Disease Control and Prevention
College of American Pathologists
National Cancer Institute
National Cancer Registrars Association
Public Health Agency of Canada

SPONSORS WITH DISTINCTION:
American Cancer Society
American College of Surgeons
American Joint Committee on Cancer

North American Association of Central Cancer Registries, Inc.

A GEOCODING BEST PRACTICES GUIDE

SUBMITTED TO

THE NORTH AMERICAN ASSOCIATION OF CENTRAL CANCER REGISTRIES
NOVEMBER 10, 2008

BY DANIEL W. GOLDBERG
UNIVERSITY OF SOUTHERN CALIFORNIA GIS RESEARCH LABORATORY


D. W. Goldberg

TABLE OF CONTENTS

List of Tables ...... vii
List of Figures ...... ix
List of Equations ...... x
List of Best Practices ...... xi
List of Acronyms ...... xiii
Foreword ...... xiv
Preface ...... xv
Acknowledgements ...... xviii
Dedication ...... xix
About This Document ...... xx
Executive Summary ...... xxiii
Part 1: The Concept and Context of Geocoding ...... 1
1. Introduction ...... 3
1.1 What is Geocoding? ...... 3
2. The Importance of Geocoding ...... 9
2.1 Geocoding's Importance to Hospitals and Central Registries ...... 9
2.2 Typical Research Workflow ...... 9
2.3 When To Geocode ...... 14
2.4 Success Stories ...... 17
3. Geographic Information Science Fundamentals ...... 19
3.1 Geographic Data Types ...... 19
3.2 Geographic Datums and Geographic Coordinates ...... 19
3.3 Projections and Regional Reference Systems ...... 20
Part 2: The Components of Geocoding ...... 23
4. Geocoding Process Overview ...... 25
4.1 Types of Geocoding Processes ...... 25
4.2 High-Level Geocoding Process Overview ...... 25
4.3 Software-Based Geocoders ...... 26
4.4 Input Data ...... 28
4.5 Reference Datasets ...... 31
4.6 The Geocoding Algorithm ...... 32
4.7 Output Data ...... 33
4.8 Metadata ...... 34


5. Address Data ...... 37
5.1 Types of Address Data ...... 37
5.2 First-Order Estimates ...... 41
5.3 Postal Address Hierarchy ...... 41
6. Address Data Cleaning Processes ...... 45
6.1 Address Cleanliness ...... 45
6.2 Address Normalization ...... 45
6.3 Address Standardization ...... 50
6.4 Address Validation ...... 51
7. Reference Datasets ...... 55
7.1 Reference Dataset Types ...... 55
7.2 Types of Reference Datasets ...... 55
7.3 Reference Dataset Relationships ...... 65
8. Feature Matching ...... 69
8.1 The Algorithm ...... 69
8.2 Classifications of Matching Algorithms ...... 71
8.3 Deterministic Matching ...... 71
8.4 Probabilistic Matching ...... 78
8.5 String Comparison Algorithms ...... 80
9. Feature Interpolation ...... 83
9.1 Feature Interpolation Algorithms ...... 83
9.2 Linear-Based Interpolation ...... 83
9.3 Areal Unit-Based Feature Interpolation ...... 90
10. Output Data ...... 93
10.1 Downstream Compatibility ...... 93
10.2 Data Loss ...... 93
Part 3: The Many Metrics for Measuring Quality ...... 95
11. Quality Metrics ...... 97
11.1 Accuracy ...... 97
12. Spatial Accuracy ...... 99
12.1 Spatial Accuracy Defined ...... 99
12.2 Contributors to Spatial Accuracy ...... 99
12.3 Measuring Positional Accuracy ...... 104
12.4 Geocoding Process Component Error Introduction ...... 104
12.5 Uses of Positional Accuracy ...... 105
13. Reference Data Quality ...... 111
13.1 Spatial Accuracy of Reference Data ...... 111
13.2 Attribute Accuracy ...... 111
13.3 Temporal Accuracy ...... 112
13.4 Cached Data ...... 114
13.5 Completeness ...... 115


14. Feature-Matching Quality Metrics ...... 119
14.1 Match Types ...... 119
14.2 Measuring Geocoding Match Success Rates ...... 121
14.3 Acceptable Match Rates ...... 124
14.4 Match Rate Resolution ...... 125
15. NAACCR GIS Coordinate Quality Codes ...... 127
15.1 NAACCR GIS Coordinate Quality Codes Defined ...... 127
Part 4: Common Geocoding Problems ...... 131
16. Quality Assurance/Quality Control ...... 133
16.1 Failures and Qualities ...... 133
17. Address Data Problems ...... 137
17.1 Address Data Problems Defined ...... 137
17.2 The Gold Standard of Postal Addresses ...... 137
17.3 Attribute Completeness ...... 138
17.4 Attribute Correctness ...... 139
17.5 Address Lifecycle Problems ...... 140
17.6 Address Content Problems ...... 141
17.7 Address Formatting Problems ...... 143
17.8 Residence Type and History Problems ...... 143
18. Feature-Matching Problems ...... 145
18.1 Feature-Matching Failures ...... 145
19. Manual Review Problems ...... 153
19.1 Manual Review ...... 153
19.2 Sources for Deriving Addresses ...... 155
20. Geocoding Software Problems ...... 159
20.1 Common Software Pitfalls ...... 159
Part 5: Choosing a Geocoding Process ...... 161
21. Choosing a Home-Grown or Third-Party Geocoding Solution ...... 163
21.1 Home-Grown and Third-Party Geocoding Options ...... 163
21.2 Setting Process Requirements ...... 163
21.3 In-House vs. External Processing ...... 164
21.4 Home-Grown or COTS ...... 165
21.5 Flexibility ...... 166
21.6 Process Transparency ...... 166
21.7 How To Select a Vendor ...... 167
21.8 Evaluating and Comparing Geocoding Results ...... 168
22. Buying vs. Building Reference Datasets ...... 171
22.1 No Assembly Required ...... 171
22.2 Some Assembly Required ...... 171
22.3 Determining Costs ...... 171
23. Organizational Geocoding Capacity ...... 173
23.1 How To Measure Geocoding Capacity ...... 173


Part 6: Working With Geocoded Data ...... 175
24. Tumor Records With Multiple Addresses ...... 177
24.1 Selecting From Multiple Case Addresses ...... 177
25. Hybridized Data ...... 179
25.1 Hybridized Data Defined ...... 179
25.2 Geocoding Impacts on Incidence Rates ...... 180
25.3 Implications of Aggregating Up ...... 181
26. Ensuring Privacy and Confidentiality ...... 183
26.1 Privacy and Confidentiality ...... 183
Glossary of Terms ...... 189
References ...... 201
Appendix A: Example Researcher Assurance Documents ...... 223
Appendix B: Annotated Bibliography ...... 231


LIST OF TABLES

Table 1 – Types of readers, concerns, and sections of interest ...... xxii
Table 2 – Alternative definitions of “geocoding” ...... 4
Table 3 – Possible input data types (textual descriptions) ...... 5
Table 4 – Common forms of input data with corresponding NAACCR fields and example values ...... 29
Table 5 – Multiple forms of a single address ...... 30
Table 6 – Existing and proposed address standards ...... 30
Table 7 – Example reference datasets ...... 32
Table 8 – Example geocoding component metadata ...... 34
Table 9 – Example geocoding process metadata ...... 35
Table 10 – Example geocoding record metadata ...... 35
Table 11 – Example postal addresses ...... 37
Table 12 – First order accuracy estimates ...... 41
Table 13 – Resolutions, issues, and ranks of different address types ...... 42
Table 14 – Example postal addresses in different formats ...... 45
Table 15 – Common postal address attribute components ...... 45
Table 16 – Common address verification data sources ...... 53
Table 17 – Common linear-based reference datasets ...... 57
Table 18 – Common postal address linear-based reference dataset attributes ...... 58
Table 19 – Common polygon-based reference datasets ...... 59
Table 20 – Common polygon-based reference dataset attributes ...... 63
Table 21 – Point-based reference datasets ...... 64
Table 22 – Minimum set of point-based reference dataset attributes ...... 65
Table 23 – Attribute relation example, linear-based reference features ...... 71
Table 24 – Attribute relation example, ambiguous linear-based reference features ...... 72
Table 25 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of ambiguity and spatial error, and worst-case resolution, passes 1-4 ...... 74
Table 26 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of ambiguity and spatial error, and worst-case resolution, pass 5 ...... 75
Table 27 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of spatial error, and worst-case resolution, pass 6 ...... 76
Table 28 – String comparison algorithm examples ...... 81
Table 29 – Metrics for deriving confidence in geocoded results ...... 98
Table 30 – Proposed relative positional accuracy metrics ...... 105
Table 31 – TTL assignment and freshness calculation considerations for cached data ...... 115
Table 32 – Simple completeness measures ...... 117
Table 33 – Possible matching outcomes with descriptions and causes ...... 120


Table 34 – NAACCR recommended GIS Coordinate Quality Codes (paraphrased) ...... 127
Table 35 – Classes of geocoding failures with examples for true address 3620 S. Vermont Ave, Los Angeles, CA 90089 ...... 134
Table 36 – Quality decisions with examples and rationale ...... 135
Table 37 – Composite feature geocoding options for ambiguous data ...... 151
Table 38 – Trivial data entry errors for 3620 South Vermont Ave, Los Angeles, CA ...... 154
Table 39 – Common sources of supplemental data with typical cost, formal agreement requirements, and usage type ...... 157
Table 40 – Geocoding process component considerations ...... 164
Table 41 – Commercial geocoding package policy considerations ...... 166
Table 42 – Topics and issues relevant to selecting a vendor ...... 167
Table 43 – Categorization of geocode results ...... 168
Table 44 – Comparison of geocoded cases per year to FTE positions ...... 173
Table 45 – Possible factors influencing the choice of dxAddress with decision criteria if they have been proposed ...... 178
Table 46 – Previous geocoding studies classified by topics of input data utilized ...... 232
Table 47 – Previous geocoding studies classified by topics of reference data source ...... 238
Table 48 – Previous geocoding studies classified by topics of feature matching approach ...... 244
Table 49 – Previous geocoding studies classified by topics of feature interpolation method ...... 248
Table 50 – Previous geocoding studies classified by topics of accuracy measure utilized ...... 252
Table 51 – Previous geocoding studies classified by topics of process used ...... 257
Table 52 – Previous geocoding studies classified by topics of privacy concern and/or method ...... 260
Table 53 – Previous geocoding studies classified by topics of organizational cost ...... 261


LIST OF FIGURES

Figure 1 – Typical research workflow ...... 10
Figure 2 – High-level data relationships ...... 26
Figure 3 – Schematic showing basic components of the geocoding process ...... 26
Figure 4 – Generalized workflow ...... 27
Figure 5 – Origin of both the 100 North (longer arrow pointing up and to the left) and 100 South (shorter arrow pointing down and to the right) Sepulveda Boulevard blocks (Google, Inc. 2008b) ...... 38
Figure 6 – Geographic resolutions of different address components (Google, Inc. 2008b) ...... 43
Figure 7 – Example address validation interface (https://webgis.usc.edu) ...... 52
Figure 8 – Vector reference data of different resolutions (Google, Inc. 2008b) ...... 56
Figure 9 – Example 3D building models (Google, Inc. 2008a) ...... 60
Figure 10 – Example building footprints in raster format (University of Southern California 2008) ...... 61
Figure 11 – Example building footprints in digital format (University of California, Los Angeles 2008) ...... 62
Figure 12 – Example parcel boundaries with centroids ...... 63
Figure 13 – Generalized feature-matching algorithm ...... 69
Figure 14 – Example relaxation iterations ...... 77
Figure 15 – Example of parcel existence and homogeneity assumptions ...... 86
Figure 16 – Example of uniform lot assumption ...... 87
Figure 17 – Example of actual lot assumption ...... 87
Figure 18 – Example of street offsets ...... 88
Figure 19 – Example of corner lot problem ...... 89
Figure 20 – Certainties within geographic resolutions (Google, Inc. 2008b) ...... 101
Figure 21 – Example of misclassification due to uncertainty (Google, Inc. 2008b) ...... 106
Figure 22 – Examples of different match types ...... 120
Figure 23 – Match rate diagrams ...... 123
Figure 24 – Example uncertainty areas from MBR of ambiguous streets vs. encompassing city (Google, Inc. 2008b) ...... 150


LIST OF EQUATIONS

Equation 1 – Conditional probability ...... 78
Equation 2 – Agreement and disagreement probabilities and weights ...... 79
Equation 3 – Size of address range and resulting distance from origin ...... 84
Equation 4 – Resulting output interpolated point ...... 84
Equation 5 – Simplistic match rate ...... 122
Equation 6 – Advanced match rate ...... 122
Equation 7 – Generalized match rate ...... 124


LIST OF BEST PRACTICES

Best Practices 1 – Fundamental geocoding concepts ...... 7
Best Practices 2 – Address data gathering ...... 11
Best Practices 3 – Residential history address data ...... 12
Best Practices 4 – Secondary address data gathering ...... 12
Best Practices 5 – Conversion to numeric spatial data ...... 13
Best Practices 6 – Spatial association ...... 14
Best Practices 7 – When to geocode ...... 17
Best Practices 8 – Geographic fundamentals ...... 22
Best Practices 9 – Geocoding requirements ...... 27
Best Practices 10 – Input data (high level) ...... 31
Best Practices 11 – Reference data (high level) ...... 32
Best Practices 12 – Geocoding algorithm (high level) ...... 33
Best Practices 13 – Output data (high level) ...... 33
Best Practices 14 – Input data types ...... 40
Best Practices 15 – Substitution-based normalization ...... 47
Best Practices 16 – Context-based normalization ...... 49
Best Practices 17 – Probability-based normalization ...... 50
Best Practices 18 – Address standardization ...... 51
Best Practices 19 – Address validation ...... 54
Best Practices 20 – Reference dataset types ...... 66
Best Practices 21 – Reference dataset relationships ...... 67
Best Practices 22 – Reference dataset characteristics ...... 68
Best Practices 23 – SQL-like feature matching ...... 70
Best Practices 24 – Deterministic feature matching ...... 78
Best Practices 25 – Probabilistic feature matching ...... 80
Best Practices 26 – String comparison algorithms ...... 82
Best Practices 27 – Linear-based interpolation ...... 85
Best Practices 28 – Linear-based interpolation assumptions ...... 90
Best Practices 29 – Areal unit-based interpolation ...... 92
Best Practices 30 – Output data ...... 93
Best Practices 31 – Output data accuracy ...... 99
Best Practices 32 – Input data implicit accuracies ...... 102
Best Practices 33 – Reference dataset accuracy ...... 103
Best Practices 34 – Positional accuracy ...... 108
Best Practices 35 – Reference dataset spatial accuracy problems ...... 112
Best Practices 36 – Reference dataset temporal accuracy ...... 113


Best Practices 37 – Geocode caching ...... 116
Best Practices 38 – Reference dataset completeness problems ...... 117
Best Practices 39 – Feature match types ...... 121
Best Practices 40 – Success (match) rates ...... 126
Best Practices 41 – GIS Coordinate Quality Codes ...... 129
Best Practices 42 – Common address problem management ...... 137
Best Practices 43 – Creating gold standard addresses ...... 138
Best Practices 44 – Input data correctness ...... 140
Best Practices 45 – Address lifecycle problems ...... 141
Best Practices 46 – Address content problems ...... 142
Best Practices 47 – Address formatting problems ...... 143
Best Practices 48 – Conceptual problems ...... 144
Best Practices 49 – Feature-matching failures ...... 146
Best Practices 50 – Unmatched addresses ...... 153
Best Practices 51 – Unmatched addresses manual review ...... 155
Best Practices 52 – Unmatched address manual review data sources ...... 156
Best Practices 53 – Common geocoding software limitations by component of the geocoding process ...... 160
Best Practices 54 – In-house versus external geocoding ...... 165
Best Practices 55 – Process transparency ...... 167
Best Practices 56 – Evaluating third-party geocoded results ...... 169
Best Practices 57 – Choosing a reference dataset ...... 172
Best Practices 58 – Measuring geocoding capacity ...... 174
Best Practices 59 – Hybridizing data ...... 180
Best Practices 60 – Incidence rate calculation ...... 181
Best Practices 61 – MAUP ...... 181
Best Practices 62 – Geocoding process privacy auditing when behind a firewall ...... 184
Best Practices 63 – Third-party processing (external processing) ...... 185
Best Practices 64 – Geocoding process log files ...... 186
Best Practices 65 – Geographic masking ...... 187
Best Practices 66 – Post-registry security ...... 187


LIST OF ACRONYMS

0D – Zero Dimensional
1D – One Dimensional
2D – Two Dimensional
3D – Three Dimensional
4D – Four Dimensional
CI – Confidence Interval
CBG – U.S. Census Bureau Census Block Group
COTS – Commercial Off The Shelf
CT – U.S. Census Bureau Census Tract
DHHS – U.S. Department of Health and Human Services
DoD – U.S. Department of Defense
DMV – Department of Motor Vehicles
E-911 – Emergency 911
EMS – Emergency Medical Services
FCC – Feature Classification Code
FGDC – Federal Geographic Data Committee
FIPS – Federal Information Processing Standards
FTE – Full Time Equivalent
GIS – Geographic Information System
G-NAF – Geocoded National Address File
GPS – Global Positioning System
IR – Information Retrieval
LA – Los Angeles
MBR – Minimum Bounding Rectangle
MCD – Minor Civil Division
MTFCC – MAF/TIGER Feature Class Code
NAACCR – North American Association of Central Cancer Registries
NCI – National Cancer Institute
NIH – U.S. National Institutes of Health
PO Box – USPS Post Office Box
RR – Rural Route
SES – Socio-Economic Status
SQL – Structured Query Language
SVM – Support Vector Machine
TIGER – Topologically Integrated Geographic Encoding and Referencing
TTL – Time to Live
URISA – Urban and Regional Information Systems Association
U.S. – United States
USC – University of Southern California
USPS – United States Postal Service
ZCTA – ZIP Code Tabulation Area


FOREWORD

The advent of geographic information science and the accompanying technologies (geographic information systems [GIS], global positioning systems [GPS], remote sensing [RS], and more recently location-based services [LBS]) has forever changed the ways in which people conceive of and navigate planet Earth. Geocoding is a key bridge linking the old world, in which streets and street addresses served as the primary location identifiers, with the modern world, in which more precise representations are possible and needed to explore, analyze, and visualize geographic patterns, their drivers, and their consequences. Geocoding, viewed from this perspective, brings together the knowledge and work of the geographer and the computer scientist. The author, Daniel Goldberg, has done an excellent job of laying out the fundamentals of geocoding as a process, using the best contributions from both of these once-disparate fields.

This book will serve as a rich reference manual for those who want to inject more science and less art (uncertainty) into their geocoding tasks. This is particularly important for medical geography and epidemiology applications, as recent research findings point to environmental conditions that may contribute to and/or exacerbate health problems and that vary over distances of hundreds and even tens of meters (e.g., as happens with proximity to freeways). These findings call for much better and more deliberate geocoding practices than many practitioners have used to date, and they bring the contents of this best practices manual to the fore. This book provides a long overdue summary of the state of the art of geocoding and will be essential reading for those who wish and/or need to generate detailed and accurate geographic positions from street addresses and the like.

John Wilson
June 6, 2008


PREFACE

In one sense, writing this manuscript has been a natural continuation of the balancing act that has been, and continues to be, my graduate student career. I am fortunate to be a Computer Science (CS) Ph.D. student at the University of Southern California (USC), working in the Department of Geography, advised by a Professor in the Department of Preventive Medicine, and supported, at the time of this writing, by the Department of Defense. While at times unbearably frustrating and/or strenuous, learning to tread the fine lines between these separate yet highly related fields (as well as blur them when necessary) has taught me some important lessons and given me a unique perspective from which I have written this manuscript and which I will carry with me throughout my career. This combination of factors has led to my involvement in many extremely interesting and varied projects in diverse capacities, and to contact with academics and professionals whom I would most likely never have otherwise met. Case in point: this manuscript.

In November of 2006, Dr. John P. Wilson, my always industrious and (at the time) Geography advisor (now jointly appointed in CS), was hard at work securing funding for his graduate students (as all good faculty members should spend the majority of their time doing). He identified an opportunity for a student to develop a GIS-based traffic pollution exposure assessment tool for his colleague in the USC Department of Preventive Medicine, my soon-to-be advisor Dr. Myles G. Cockburn, which was right in line with my programming skills. What started off as a simple question regarding the supposed accuracy of the geocodes being used for the exposure model quickly turned into a day-long discussion about the geocoder I had built during the previous summer as a Research Assistant for my CS advisor, Dr. Craig A. Knoblock.
This discussion eventually spawned several grant proposals, including one entitled Geocoding Best Practices Document Phase I: Consultant for NAACCR GIS Committee Meeting & Development of Annotated Outline, submitted to the North American Association of Central Cancer Registries (NAACCR) on April 21, 2006. To my great surprise, I was awarded the grant and immediately set to work creating the outline for the meeting and the Annotated Geocoding Reading List I had promised in my proposal. Ambitiously, I started reading and taking notes on the 150 latest geocoding works, at which point the NAACCR GIS Committee, chaired at that time by David O'Brien of the Alaska Cancer Registry, should have run for cover.

The first draft I produced after the in-person meeting at the June 2006 NAACCR Annual Meeting in Regina, Saskatchewan, Canada, was far too detailed, too CS-oriented, and too dense for anyone to make sense of. However, guided by the thoughtful but sometimes ruthless suggestions of Drs. Wilson and Cockburn, I was able to transform that draft into an earlier version of this document for final submission to the NAACCR GIS Committee, which then sent it to the NAACCR Executive Board for approval in October 2006. It was approved, and I was subsequently selected to write the full version of the current work, A Geocoding Best Practices Guide.

I dare say that this exercise proved longer and more in-depth than anyone could have anticipated. Looking back two years, I do not think I could have imagined what this project would eventually turn into: more than 200 pages of text, more than 200 references, an annotated reading list the size of a small phone book, example research assurance documents, and a full glossary.


At more than one-half million characters and spanning more than 250 pages, this document may at first seem a daunting read. This fear should be laid to rest, however. More than one-third of its length consists of front matter (e.g., Table of Contents, indices, Foreword, and Preface) and back matter (e.g., Glossary, References, and Appendices). Most of this material is intended as reference, and it is expected that only the most motivated and inquisitive readers will explore it all. The main content of the document, Sections 1-26, is organized so that an interested reader can quickly and easily turn to his or her topics of interest, at the desired level of detail, at a moment's notice through the use of the Table of Contents and the lists of figures and tables found in the front matter.

Beyond this concern, three major hurdles had to be overcome during the writing of this document. The first was the question of what the focus and tone should be. From the earliest conception, it was clear that this document should be a "Best Practices Guide," which implicitly meant that it should tell someone what to do when in a particular situation. The question, however, was who was to be informed. Was it the technical person performing the geocoding, who might run into a sticky situation and need direction as to which of two options to choose? Was it the manager who needed to know the differences between reference datasets in order to make the correct investment for their registry? Or was it the researcher who would utilize the geocoded data and needed to know what the accuracy measure meant and where it came from?
After lengthy discussion, it was determined that the first two, the person performing the geocoding and the person deciding on the geocoding strategy, would be the target audience, because they are the registry personnel for whom this document was being created. Therefore, this document goes into great detail about the technical aspects of the geocoding process, so that the best practices developed throughout the text can and should actually be applied during the process of geocoding. Likewise, the theoretical underpinnings are spelled out completely, so that the person deciding which geocoding process to apply can make the most informed decision possible.

The second hurdle was political in nature. During the process of determining the set of theoretical best practices presented in this document, it came to light that in some cases the current NAACCR standards and/or practices were insufficient or inappropriate, or precluded what I would consider the true best practice. Following lengthy discussion, it was decided that the set of best practices developed for this document should remain true to what should be done, not simply what the current standards allow. Therefore, in several places throughout the manuscript it is explicitly stated that the recommended best practices describe the ideal case and may not currently be compatible with other existing NAACCR standards. In these cases, I have attempted to provide justification and support for why these are the correct best practices, in the hope that they can be taken into consideration as the existing NAACCR standards are reviewed and modified over time.

The final challenge in creating this manuscript was the sheer diversity of the NAACCR member registries, in terms of the geocoding knowledge, resources, practices, and standards that needed to be addressed.
The members of the NAACCR GIS Committee who contributed to the production of this document came from every corner of the United States and various levels of government, and they represented the full geocoding spectrum, from highly advanced and extremely knowledgeable experts to individuals just starting out with more questions than answers. Although input from all of these varied user types undoubtedly led to a more accessible finished product, it was quite a task to produce a document that would be equally useful to all of them. I feel that their input helped produce a much stronger


text that should be appropriate for readers of all levels, from those just getting started to those with decades of experience who may be developing their own geocoders.

The content of this manuscript represents countless hours of work by many dedicated people. The individuals listed in the Acknowledgements section each spent a significant amount of time reviewing and commenting on every sentence of this document. Most participated in weekly Editorial Review Committee calls from March 2007 to March 2008, and all contributed to making this document what it is. In particular, I would like to thank Frank Boscoe for his steady leadership as NAACCR GIS Committee Chair during the period covering most of the production of this book. I take full responsibility for all grammatical errors and run-on sentences, and believe me when I tell you that this book would be in far worse shape had John Wilson not volunteered to copyedit every single word. I would not be writing this at all were it not for Myles Cockburn, so for better or worse, all blame should be directed toward him. The other members of the weekly Editorial Review Committee, namely Stephanie Foster, Kevin Henry, Christian Klaus, Mary Mroszczyk, Recinda Sherman, and David Stinchcomb, all volunteered substantial time and effort and contributed valuable expert opinions, questions, corrections, edits, and content, undoubtedly improving the quality of the final manuscript. These detailed and often heated discussions served to focus the content, tone, and direction of the finished product in a manner that I would have been incapable of on my own. I would not currently be a Ph.D. student, much less know what a geocoder was, were it not for the support of Craig Knoblock. Last but in no way least, Mona Seymour graciously volunteered her time to review portions of this manuscript, resulting in a far more readable text.
Sadly, nearly everyone who reads this document will already have been affected by the dreadful toll that cancer can take on a family member, friend, or other loved one. I wholeheartedly support the goal of NAACCR to work toward reducing the burden of cancer in North America, and I am honored to have been granted the opportunity to give in this small way to the world of cancer-related research. What follows is my attempt to contribute through the production of a Geocoding Best Practices Guide for use in standardizing the way that geocoding is discussed, performed, and used in scientific research and analysis.

Daniel W. Goldberg
June 6, 2008


ACKNOWLEDGEMENTS

Much of the material in this handbook was generated at the North American Association of Central Cancer Registries (NAACCR) GIS Workgroup meeting held in Regina, Saskatchewan, Canada, on June 16, 2006. The following individuals contributed to the development of this document:

Meeting Participants:
Francis P. Boscoe, Ph.D., New York State Cancer Registry
Myles G. Cockburn, Ph.D., University of Southern California
Stephanie Foster, Centers for Disease Control and Prevention
Daniel W. Goldberg, Ph.D. Candidate, Facilitator and Handbook Author, University of Southern California
Kevin Henry, Ph.D., New Jersey State Department of Health
Christian Klaus, North Carolina State Center for Health Statistics
Mary Mroszczyk, C.T.R., Massachusetts Cancer Registry
David O'Brien, Ph.D., Alaska Cancer Registry
David Stinchcomb, Ph.D., National Cancer Institute

Comments, Reviewers, and Editors:
Robert Borchers, Wisconsin Department of Health and Human Services
Francis P. Boscoe, Ph.D., New York State Cancer Registry
Myles G. Cockburn, Ph.D., University of Southern California
Stephanie Foster, Centers for Disease Control and Prevention
Kevin Henry, Ph.D., New Jersey State Department of Health
Christian Klaus, North Carolina State Center for Health Statistics
Mary Mroszczyk, C.T.R., Massachusetts Cancer Registry
David O'Brien, Ph.D., Alaska Cancer Registry
Mona N. Seymour, Ph.D. Candidate, University of Southern California
Recinda L. Sherman, M.P.H., C.T.R., Florida Cancer Data System
David Stinchcomb, Ph.D., National Cancer Institute
John P. Wilson, Ph.D., University of Southern California

This project has been funded in part with federal funds from the National Cancer Institute (NCI), National Institutes of Health (NIH), Department of Health and Human Services (DHHS) under Contract No. HHSN26120044401C and ADB Contract No. N02-PC-44401, and from the Centers for Disease Control and Prevention (CDC) under Grant/Cooperative Agreement No. U75/CCU523346. Daniel Goldberg was supported by a U.S. Department of Defense (DoD) Science, Mathematics, and Research for Transformation (SMART) Defense Scholarship for Service Program fellowship and National Science Foundation (NSF) Award No. IIS-0324955 during portions of the production of this document. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any of the above organizations or any person connected with them.


DEDICATION

This book is dedicated to the life and memory of Michael Owen Wright-Goldberg, beloved husband, son, and older brother (1975-2006).


ABOUT THIS DOCUMENT

BEST PRACTICES GUIDE

The main purpose of this document is to act as a best practices guide for the cancer registry community, including hospitals as well as state and provincial registries. Accordingly, it will advise those who want to know specific best practices that they should follow to ensure the highest level of confidence, reliability, standardization, and accuracy in their geocoding endeavors. These best practices will be framed as both policy and technical decisions that must be made by a registry as a whole and by the individual person performing the geocoding or using the results. Best practices are listed throughout the text, placed as close to the section of text that describes them as possible.

STANDARDIZATION

Due to a fundamental lack of standardization in the way that geocoding is defined and implemented across cancer registries, it is difficult to compare or integrate data created at different sources. This document will propose numerous definitions germane to the geocoding process, thus developing a consistent vocabulary for use as a first step toward a larger standardization process. Throughout the document, specific terms will be written in bold with definitions closely following. A geocoding best practice is a policy or technical decision related to geocoding recommended (but not required) by NAACCR for use in a cancer registry’s geocoding process. The geocoding best practices are a set of suggested best practices developed throughout this document. In addition, the document attempts to detail software implementation preferences, current limitations, and avenues for improvement that geocoding vendors should be aware are desired by the cancer research communities.

Note that the best practices developed in this document are not as-of-yet official NAACCR data standards, meaning that they will not be found in the current version of Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008), and thus their use is not officially required by any means. More specifically, although the content of Hofferkamp and Havener (2008) represents the current mandatory NAACCR data standards that registries are required to follow, the best practices found herein are recommended for adoption by researchers, registries, and/or software developers that seek to begin conducting their geocoding practices in a consistent, standardized, and more accurate manner. It is the hope of the author that the contents of this document will assist in the eventual official standardization of the geocoding process, fully accepted and recognized by the NAACCR Executive Board.
As such, software developers are encouraged to adopt and incorporate the recommendations included in this document to: (1) be ahead of the curve if/when the recommendations contained herein (or their derivatives and/or replacements) are accepted as true NAACCR data standards; and (2) improve the quality, transparency, usability, and legitimacy of their geocoding products.

LEARNING TOOL

To make informed decisions about geocoding choices, an understanding of both the theoretical and practical aspects of the geocoding process is necessary. Accordingly, this document provides a high level of detail about each aspect of the geocoding process such that a reader can obtain a complete understanding of the best practice recommended, other possible options, and the rationale behind the recommended practice. It serves to centralize much of the available research and practice scholarship on these topics to provide a single, comprehensive perspective on all aspects of the geocoding process.

The document has been specifically divided into six parts. Each part attempts to address the topics contained in it at a consistent level of detail. The decision was made to organize the document in this format so that it would be easy for a reader interested in certain topics to find the information he or she is looking for (e.g., to learn about components of geocoding or find solutions to an exact problem) without being bogged down in either too much or too little detail.

REFERENCE TOOL

Every attempt was made to back up all claims made in the document using published scientific literature so that it can be used as a reference tool. Appendix A includes example research assurance documents that can be tailored to an individual registry for ensuring that researchers understand the acceptable manner in which registry data may be obtained and used. The Annotated Bibliography included as Appendix B includes more than 250 of the most recently published geocoding works classified by the topic(s) they cover, and should prove a useful resource for those interested in further reading.

TYPES OF READERS

In this document, four distinct types of readers will be identified based on their specific roles in, or uses of, the geocoding process. These are: (1) the practitioner, (2) general interest, (3) process designer, and (4) data consumer groups. The roles of these groups are described in Table 1, as are their main concerns regarding the geocoding process and the sections in this document that address them.

SUGGESTED CITATION

Goldberg DW: A Geocoding Best Practices Guide. Springfield, IL: North American Association of Central Cancer Registries; 2008.


Table 1 – Types of readers, concerns, and sections of interest

Group: Practitioner
Role: Registry staff performing the geocoding task using some pre-defined method with existing tools, ultimately responsible for the actual production of the geospatial data from the raw aspatial address data
Concerns: Practical aspects of the geocoding process; handling instances in which data do not geocode
Sections of Interest: 1, 4, 5, 14, 15, 16, 17, 18, 19, 20, 24

Group: General Interest
Role: Registry staff interested in geocoding but not formally involved in the process as part of their duties, akin to the general public
Concerns: Why is geocoding important? How does geocoding fit into the larger operations of the registry?
Sections of Interest: 1, 2.1, 2.4, 3, 4, 11, 12.5, 14, 15, 18, 26

Group: Process Designers
Role: Registry staff overseeing and designing the geocoding process used at a registry, ultimately responsible for the overall outcome of the geocoding performed at a registry
Concerns: All design and policy decisions that affect the outcome of geocoding; data definition, representation, and validation; components and algorithms involved in the geocoding process; forms and formats of reference data sources; and accuracy metrics and reporting
Sections of Interest: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26

Group: Data Consumers
Role: Cancer researchers consuming the geocoded data; others responsible for monitoring annually reported aggregate statistics to discover important trends
Concerns: Accuracy of the geocoded output in terms of its lineage, confidence/reliability, accountability, and any assumptions that were used
Sections of Interest: 1, 3, 5.3, 6.1, 7, 8, 11, 12, 13, 14, 15, 18, 19, 20, 24, 25, 26


EXECUTIVE SUMMARY

PURPOSE

As a rule, health research and practice (and cancer-related research in particular) takes place across a multitude of administrative units and geographic extents (country-wide, state-wide, etc.). The data used to develop and test cancer-related research questions are created, obtained, and processed by disparate organizations at each of these different levels. Studies requiring the aggregation of data from multiple administrative units typically must integrate these disparate data, which occur in incompatible formats with unknown lineage or accuracy. The inconsistencies and unknowns amongst these data can lead to uncertainty in the results that are generated if the data are not properly integrated. This problem of data integration represents a fundamental hurdle to cancer-related research.

To overcome the difficulties associated with disparate data, a specific set of actions must be undertaken. First, key stakeholders must be identified and informed of potential issues that commonly arise and contribute to the problem. Next, a common vocabulary and understanding must be defined and developed such that thoughtful communication is possible. Finally and most importantly, advice must be provided in the form of a set of best practices so that processes can begin to be standardized across the health research communities. Together, these will allow health researchers to have a reasonable level of certainty as to how and where the data they are working with have been derived as well as an awareness of any overarching data gaps and limitations.

Person, place, event, and time form the four fundamental axes of information around which epidemiologic research is conducted. The spatial data representing the subject’s location is particularly susceptible to the difficulties that plague multi-source data because much of the spatial data are derived from textual addresses through the process of geocoding.
These data are vulnerable to inconsistencies and unknown quality because of the wide range of methods by which they are defined, described, collected, processed, and distributed. To contextualize the heterogeneity of current geocoding practices among cancer registries, see the recent work by Abe and Stinchcomb (2008), which highlights the far-ranging approaches used at several cancer registries throughout the United States. This lack of uniformity and/or standardization with regard to geocoding processes represents a current and significant problem that needs to be addressed.

Although there is a substantial amount of available literature on the many topics germane to geocoding, there is no single source of reference material one can turn to that addresses many or all of these topics in the level of detail required to make well-informed decisions. Recent works such as Rushton et al. (2006, 2008a), Goldberg et al. (2007a), and Mechanda and Puderer (2007) provide a great deal of review and detail on geocoding and related topics, but available scholarship as to specific recommendations, their rationale, and alternative considerations is lacking.

To these ends, the North American Association of Central Cancer Registries (NAACCR) has promoted the development of this work, A Geocoding Best Practices Guide, the purpose of which is to help inform and standardize the practice of geocoding as performed by the cancer registries and research communities of the United States and Canada. This work primarily focuses on the theoretical and practical aspects of the actual production of geocodes, and will briefly touch upon several important aspects of their subsequent usage in cancer-related research.


SCOPE

This document will cover the fundamental underpinnings of the geocoding process. Separate sections describe the components of the geocoding process, ranging from the input data, to the internal processing performed, to the output data. For each topic, choices that affect the accuracy of the resulting data will be explored and possible options will be listed.

INTENDED AUDIENCE

The primary purpose of this document is to provide a set of best practices that, if followed, will enable the standardization of geocoding throughout the cancer research communities. Thus, the main focus of this document will be to provide enough detailed information on the geocoding process such that informed decisions can be made on each aspect—from selecting data sources, algorithms, and software to be used in the process; to defining the policies with which the geocoding practitioner group perform their task and make decisions; to determining and defining the metadata that are associated with the output.

For those with varying levels of interest in the geocoding process, this document presents detailed information about the components and processes involved in geocoding, as well as sets of best practices designed to guide specific choices that are part of any geocoding strategy. Benefits and drawbacks of potential options also are discussed. The intent is to establish a standardized knowledge base that will enable informed discussions and decisions within local registries and result in the generation of consistent data that can be shared between organizations.

For researchers attempting to use geocoded data in their analyses, this document outlines the sources of error in the geocoding process and provides best practices for describing them. If described properly, accuracy values for each stage of the geocoding process can be combined to derive informative metrics capable of representing the accuracy of the output in terms of the whole process. The data consumer can use these to determine the suitability of the data with respect to the specific needs of their study.

For practitioners, this document presents detailed, specific solutions for common problems that occur during the actual process of geocoding, with the intent of standardizing the way in which problem resolution is performed at all registries.
Uniform problem resolution would remove one aspect of uncertainty (arguably the most important level) from the geocoding process and ultimately from the resulting data and analyses performed on them.

Most of the information contained in this document (e.g., examples, data sources, laws, and regulations) will primarily focus on U.S. and Canadian registries and researchers, but the concepts should be easily translated to other countries. Likewise, some of the information and techniques outlined herein may only be applicable to registries that perform their own geocoding instead of using a commercial vendor. Although the number of registries performing their own geocoding is currently small, this number has been and will continue to increase as access to geocoding software and required data sources continues to improve. Additionally, the information within this document should assist those registries currently using a vendor in becoming more knowledgeable about and involved in the geocoding process, better able to explain what they want a vendor to do under what circumstances, and more cognizant of the repercussions of choices made during the geocoding process.


Part 1: The Concept and Context of Geocoding

As a starting point, it is important to succinctly develop a concrete notion for exactly what geocoding is and identify how it relates to health and cancer-related research. In this part of the document, geocoding will be explicitly defined and its formal place in the cancer research workflow will be identified.




1. INTRODUCTION

This section provides the motivation for standardized geocoding.

1.1 WHAT IS GEOCODING?

Person, place, event, and time are the four key pieces of information from which epidemiologic research in general is conducted. This document will focus primarily on issues arising in the description, definition, and derivation of the place component. In the course of this research, scientists frequently use a variety of methods to determine trends, describe patterns, make predictions, and explain various geographic phenomena.

Although there are many ways to denote place, most people rely almost exclusively on locationally descriptive language to describe a geospatial context. In the world of cancer registries this information typically includes the address, city, and province or state of a patient at the diagnosis of their disease (dxAddress, dxCity, dxProvince, dxState), most commonly in the form of postal street addresses. These vernacular, text-based descriptions are easily understood by people, but they are not directly suitable for use in a computerized environment. Performing any type of geospatial mapping or investigation with the aid of a computer requires discrete, non-ambiguous, geographically valid digital data rather than descriptive textual strings.

Thus, some form of processing is required to convert these text descriptors into valid geospatial data. In the parlance of geographic information science (GIS), this general concept of making implicit spatial information explicit is termed georeferencing, or transforming non-geographic information, information that has no geographically valid reference that can be used for spatial analyses, into geographic information, information that has a valid geographic reference that can be used for spatial analyses (Hill 2006). Throughout the years, this general concept has been realized in a multitude of actual processes to suit the needs of various research communities.
For instance, a global positioning system (GPS) device can produce coordinates for a location on the Earth’s surface based on a system of satellites, calibrated ground stations, and temporally based calculations. The coordinates produced from these devices are highly accurate, but can be expensive in terms of the time and effort required to obtain the data, as they typically require a human to go into the field to obtain them.

Geocoding describes another method of georeferencing (Goldberg et al. 2007a). As seen in scholarship and practice, the term geocoding is used throughout almost every discipline of scientific research that includes any form of spatial analysis, with each field usually either redefining it to meet their needs or adopting another field’s definition wholesale. As a result, there is a great deal of confusion as to what geocoding—and its derivatives, most notably the terms geocode and geocoder—actually refer to.

What do these words mean, and how should they be used in the cancer registry field? For example, does geocoding refer to a specific computational process of transforming something into something else, or simply the concept of a transformation? Is a geocode a real-world object, simply an attribute of something else, or the process itself? Is a geocoder the computer program that performs calculations, a single component of the process, or the human who makes the decisions?

An online search performed in April of 2008 found the various definitions of the term geocoding shown in Table 2. These definitions have been chosen for their geographic diversity as well as for displaying a mix of research, academic, and industry usages. It is useful to contrast our proposed definition with these other definitions that are both more constrained and relaxed in their descriptions of the geocoding process to highlight how the proposed definition is more representative of the needs of the health/cancer research communities.

Table 2 – Alternative definitions of “geocoding”

Source: Environmental Sciences Research Institute (1999)
Definition: The process of matching tabular data that contains location information such as street addresses with real-world coordinates.
Possible Problems: Limited to coordinate output only.

Source: Harvard University (2008)
Definition: The assignment of a numeric code to a geographical location.
Possible Problems: Limited to numeric code output only.

Source: Statistics Canada (2008)
Definition: The process of assigning geographic identifiers (codes) to map features and data records.
Possible Problems: Limited input range.

Source: U.S. Environmental Protection Agency (2008)
Definition: The process of assigning latitude and longitude to a point, based on street addresses, city, state and USPS ZIP Code.
Possible Problems: Limited to coordinate output only.

As a further complication, it must be noted that the methods and data sources employed throughout all registries in the United States and Canada are quite diverse and varied, so a single definition explicitly defining, requiring, or endorsing a particular technology would not be useful. Each registry may have different restrictions or requirements on what can be geocoded in terms of types of input data (postal addresses, named places, etc.), which algorithms can be used, what geographic format the results must be in, what can be produced as output, or what data sources can be used to produce them. Differing levels of technical skills, budgetary and legal constraints, and varied access to types of geographic data, along with other factors, also may dictate the need for a broad definition of geocoding. As such, the definition offered herein is meant to serve the largest possible audience by specifically not limiting any of these characteristics of the geocoding process, intentionally leaving the door open for different flavors of geocoding to be considered as valid. In the future, as the vast body of knowledge of geocoding constraints, ontologies, and terminologies spreads and is utilized by registry personnel, it is expected that there will be a common desire in the registry community to achieve consensus on standardizing geocoding and geocoding-related processes to achieve economies of scale.



The remainder of this document will explicitly define geocoding as well as its component parts as follows:

Geocoding (verb) is the act of transforming aspatial locationally descriptive text into a valid spatial representation using a predefined process.

A geocoder (noun) is a set of inter-related components in the form of operations, algorithms, and data sources that work together to produce a spatial representation for descriptive locational references.

A geocode (noun) is a spatial representation of a descriptive locational reference.

To geocode (verb) is to perform the process of geocoding.

In particular, these definitions help to resolve four common points of confusion about geocoding that often are complicated by disparate understandings of the term: (1) the types of data that can be geocoded, (2) the methods that can be employed to geocode data, (3) the forms and formats of the outputs, and (4) the data sources and methods that are germane to the process. These definitions have been specifically designed to be broad enough to meet the diverse needs of both the cancer registry and cancer research communities.
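Purely as an illustration of this vocabulary, the noun definitions above can be sketched as a minimal data model. The class and field names here are hypothetical assumptions for the sketch, not part of any NAACCR standard; the coordinates are taken from Table 3.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: a geocode is a spatial representation of a
# descriptive locational reference (see the definitions above).

@dataclass(frozen=True)
class Geocode:
    """A spatial representation of a descriptive locational reference."""
    latitude: float
    longitude: float
    feature_type: str = "point"  # the base feature could also be a parcel or street

@dataclass
class GeocodeResult:
    """The outcome of geocoding one input record."""
    input_text: str                    # the aspatial, locationally descriptive text
    geocode: Optional[Geocode] = None  # None when the input is non-matchable

matched = GeocodeResult(
    input_text="3620 South Vermont Ave, Los Angeles, CA 90089",
    geocode=Geocode(latitude=34.022351, longitude=-118.291147),
)
unmatched = GeocodeResult(input_text="Across the street from Togo's")
```

Under this sketch, the geocode is a value, while the geocoder would be whatever set of components populates the `geocode` field.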

1.1.1 Issue #1: Many data types can be (and are) geocoded

There are many forms of information that registries and researchers need geocoded. Table 3 illustrates the magnitude of the problem in terms of the many different types of addresses that may be encountered to describe the same physical place, along with their best and worst resolutions resulting from geocoding and common usages.

Table 3 – Possible input data types (textual descriptions)

Name: The University of Southern California
Type: Named place
Usage: County counts
Best/Worst Case Output Resolution: Parcel-level / Non-matchable

Name: The University of Southern California GIS Research Lab
Type: Named place
Usage: Cluster screening
Best/Worst Case Output Resolution: Sub parcel-level / Non-matchable

Name: Kaprielian Hall, Unit 444
Type: Named place
Usage: Cluster screening
Best/Worst Case Output Resolution: Sub parcel-level / Non-matchable

Name: The northeast corner of Vermont Avenue and 36th Place
Type: Relative intersection
Usage: Cluster screening
Best/Worst Case Output Resolution: Intersection-level / Non-matchable

Name: Across the street from Togo’s, Los Angeles 90089
Type: Relative direction
Usage: Cluster screening
Best/Worst Case Output Resolution: Street-level / Non-matchable

Name: 3620 South Vermont Ave, Los Angeles, CA 90089
Type: Street address
Usage: Cluster screening
Best/Worst Case Output Resolution: Building-level / Street-level

Name: USPS ZIP Code 90089-0255
Type: USPS ZIP Code
Usage: County counts
Best/Worst Case Output Resolution: Building-level / USPS ZIP Code-level

Name: 34.022351, -118.291147
Type: Geographic coordinates
Usage: Cluster screening
Best/Worst Case Output Resolution: Sub parcel-level / Non-matchable


It should be clear from this list that a location can be described as a named place, a relative location, a complete postal address (or any portion thereof), or by its actual coordinate representation. All of these phrases except the last (actual coordinates) are commonly occurring representations found throughout health data that need to be translated into spatial coordinates. Obviously, some are more useful than others in that they relay more detailed information (some may not even geocode). Registry data standards are heading toward the enforcement of a single data format for input address data, but utilization of a single representation across all registries is presently not in place due to many factors. In keeping with the stated purpose of this document, the definition provided should be general enough to encompass each of the commonly occurring reporting styles (i.e., forms).

1.1.2 Issue #2: Many methods can be (and are) considered geocoding

Turning to the host of methods researchers have used to geocode their data, it becomes clear that there are still more varieties of geocoding. The process of utilizing a GPS device and physically going to a location to obtain a true geographic position has been commonly cited throughout the scientific literature as one method of geocoding. This is usually stated as the most accurate method, the gold standard. Obtaining a geographic position by identifying the geographic location of a structure through the use of georeferenced satellite or aerial imagery also has been defined as geocoding. The direct lookup of named places or other identifiable geographic regions (e.g., a U.S. Census Bureau ZIP Code Tabulation Area [ZCTA]) from lists or gazetteers (which are lists with names, types, and footprints of geographic features) also has been referred to as geocoding. Most commonly, geocoding refers to the use of interpolation-based computational techniques to derive estimates of geographic coordinates from GIS data such as linear street vector files or areal unit parcel vector files.
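As a rough sketch of the interpolation-based technique, the following assumes a straight street segment whose endpoints carry a known address range. All numbers and names are illustrative; a real geocoder would also handle odd/even address parity, offsets from the street centerline, and curved (multi-vertex) segments.

```python
def interpolate_along_segment(house_number, range_start, range_end,
                              seg_start, seg_end):
    """Estimate (lat, lon) for a house number by linear interpolation
    along a street segment.

    seg_start/seg_end are (lat, lon) endpoints of the street vector;
    range_start/range_end are the address numbers at those endpoints.
    """
    if range_end == range_start:
        fraction = 0.5  # degenerate range: drop the point at the midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp onto the segment
    lat = seg_start[0] + fraction * (seg_end[0] - seg_start[0])
    lon = seg_start[1] + fraction * (seg_end[1] - seg_start[1])
    return (lat, lon)

# Hypothetical example: number 3620 on a segment carrying 3600-3698.
est = interpolate_along_segment(3620, 3600, 3698,
                                (34.0230, -118.2915),
                                (34.0210, -118.2915))
```

The estimate lands about 20 percent of the way along the segment, which is why the quality of the underlying address ranges directly drives the accuracy of interpolated geocodes.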

1.1.3 Issue #3: Many output types are possible

Geocoding output is typically conceived of as a geographic point, a simple geographic coordinate represented as latitude and longitude values. However, the base geographic data used for the derivation of the point geocode (e.g., the polygon boundary of the parcel or the polyline of the street vector) also could be returned as the output of the geocoding process.
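For illustration, the two output forms can be sketched as GeoJSON-style geometries (the coordinates are made up for the sketch; note that GeoJSON orders coordinates longitude first, then latitude):

```python
# A point geocode: the most common output form.
point_output = {"type": "Point", "coordinates": [-118.291147, 34.022351]}

# The base feature used to derive that point (e.g., a parcel boundary)
# could also be returned; a polygon ring closes on its first vertex.
parcel_output = {
    "type": "Polygon",
    "coordinates": [[
        [-118.2915, 34.0221], [-118.2908, 34.0221],
        [-118.2908, 34.0226], [-118.2915, 34.0226],
        [-118.2915, 34.0221],
    ]],
}
```

Returning the base geometry alongside (or instead of) the derived point preserves information about how the point was produced.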

1.1.4 Issue #4: Geocoding can be (and usually is) a multi-component process

Finally, the geocoding process is not achieved by one single instrument, software, or geographic data source. The process of geocoding can be conceptualized as a single operation, but there are multiple components such as operations, algorithms, and data sources that work together to produce the final output. Each one of these components is the result of significant research in many different scientific disciplines. Each is equally important to the process. Thus, when one speaks of geocoding, it begs the question: are they speaking of the overall process, or do they mean one or more of the components? The proposed definition therefore must take this into account and make these distinctions.

By design, any of the processes stated earlier in Section 1.1.2 that are known as geocoding are valid (e.g., using a software geocoder, GPS in the field, or imagery would fit into this definition). By using the terms “locationally descriptive text” and “spatial representation,” any of the forms of data listed earlier in Sections 1.1.1 and 1.1.3 are valid as input and output, respectively. Finally, instead of explicitly stating what must be a part of a geocoder, it may be best to leave it open-ended such that different combinations of algorithms and data sources can be employed and still adhere to this definition.
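A minimal sketch of this multi-component view follows; the stage names and the toy reference dictionary are assumptions made for illustration, with the dictionary standing in for real data sources such as street vectors, parcels, or a gazetteer.

```python
def normalize(raw):
    """Address normalization: uppercase, strip punctuation, collapse spaces."""
    cleaned = raw.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def match(normalized, reference):
    """Feature matching: look the normalized address up in a reference
    dataset (here a plain dictionary, purely for illustration)."""
    return reference.get(normalized)

def geocode(raw, reference):
    """The overall process chains the components together."""
    return match(normalize(raw), reference)

reference = {"3620 S VERMONT AVE LOS ANGELES CA 90089": (34.022351, -118.291147)}
point = geocode("3620 S. Vermont Ave., Los Angeles, CA 90089", reference)
```

Swapping any single component (a different normalizer, a different reference dataset) can change the final output, which is why each component contributes to the overall accuracy of the process.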



Again, the primary purpose of this document is to assist registries in making the appropriate choices given their particular constraints and to explain the repercussions these decisions will have. Because the definition presented here is tailored specifically for a certain community of researchers with unique characteristics, it may not be appropriate for other research disciplines. It should be noted that although this definition allows for any type of geographic output, registries must at least report the results in the formats explicitly defined in Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008). Best practices relating to the fundamental geocoding concepts developed in this section are listed in Best Practices 1.

Best Practices 1 – Fundamental geocoding concepts

Topic: Geocoding concept
Policy Decision: What does geocoding refer to in their organization?
Best Practice: The meaning of geocoding within an organization should be consistent with that presented in this document.

Topic: Geocoding motivation
Policy Decision: When should geocoding be performed?
Best Practice: Geocoding should be performed when descriptive location data need to be transformed into numeric spatial data to support spatial analysis.

Topic: Geocoding vocabulary
Policy Decision: What are the definitions of geocoding and its related terms?
Best Practice: The definitions of geocoding should be based on those within this document.




2. THE IMPORTANCE OF GEOCODING

This section places geocoding in the larger context of spatial analysis performed as part of cancer-related research.

2.1 GEOCODING’S IMPORTANCE TO HOSPITALS AND CENTRAL REGISTRIES

As the component ultimately responsible for generating the spatial attributes associated with patient/tumor data, geocoding’s primary importance to cancer research becomes clear (Rushton et al. 2006, 2008a). Spatial analysis would be extremely difficult, if not impossible, in the absence of geocoding. Note that the patient’s address at the time of diagnosis is tumor-level data (i.e., each tumor will have its own record, and each record will have its own address). Being time-dependent, this address may vary with each cancer.

The recent work by Abe and Stinchcomb (2008) relating the results of a North American Association of Central Cancer Registries (NAACCR) GIS Committee survey of 72 NAACCR member registries crystallizes the importance of geocoding in current registry practice. They found that 82 percent of the 47 responding registries performed some type of geocoding, and that the average number of addresses geocoded was 1.7 times the annual caseload of the registry. For complete details of this survey, refer to NAACCR (2008a, 2008b).

2.2 TYPICAL RESEARCH WORKFLOW The important role of the geocoder in cancer-related research is easily highlighted through an example of a prototypical research workflow that utilizes geocoding as a compo- nent. Generally, a spatially inspired research investigation will have two basic components— data gathering and data analysis. Looking deeper, the data gathering portion involves data collection, consolidation, and processing. The data analysis portion involves hypothesis test- ing using statistical analysis to assess the phenomena in question given the relative strength of the supporting evidence. Figure 1 displays an outline of a research workflow that should solidify the importance of the geocoder as the link between descriptive locational data and numeric (digital) geographic information. 2.2.1 Data gathering Each registry will undoubtedly vary in its exact implementation of the data gathering protocol. Generally, when an incidence of cancer is diagnosed, a series of best efforts are made to obtain the most accurate data available about the person and their tumor. Many types of confidential data will be collected as the patient receives care, including identifiable information about the person and their life/medical history, as well as information about the diagnosed cancer. Portions of this identifiable information may be obtained directly from the patient through interviews with clerical or medical staff at a hospital. Here, the patient will be asked for his/her current home address, and possibly a current work address and former addresses. Time and/or confidentiality constraints may limit the amount of information that can be collected by a hospital and only the patient’s current post-diagnosis address may be available when consolidation occurs at the registry. The central registry therefore may be expected to

determine an address at diagnosis after the data have been submitted from a diagnosing facility for consolidation. Searching a patient’s medical records, or in some cases linking to Department of Motor Vehicles (DMV) records, may be options to help achieve this (more detail on these practices is provided in Section 19).

Figure 1 – Typical research workflow

In any case, the address data obtained represent the descriptive text describing the person’s location at one or more points in time. At this stage a record has been created, but it is as yet completely aspatial (or non-spatial), meaning that it does not include any spatial information. Although there are multiple methods to collect data and perform address consolidation, there are currently no standards in place. Best Practices 2 lists data gathering and metadata guides for the hospital and registry in the ideal cases. Best Practices 3 lists the type of residential history data that would be collected, also in the ideal case, but one that has clear backing in the registry (e.g., Abe and Stinchcomb 2008, pp. 123) and research communities (e.g., Han et al. 2005). Best Practices 4 briefly lists guides for obtaining secondary information about address data.



Best Practices 2 – Address data gathering
Policy Decision: When, where, how, and what type of address data should be gathered?
Best Practice: Residential history information should be collected.

Collect information as early as possible:
• As much data as possible at the diagnosing facility
• As much data as possible at the registry.

Collect information from the most accurate source: • Patient • Relative • Patient/tumor record.

Metadata should describe when it was collected: • Upon intake • After treatment • At registry upon arrival • At registry upon consolidation.

Metadata should describe where it was collected: • At diagnosing facility • At registry.

Metadata should describe how it was collected: • Interview in-person/telephone • Patient re-contacting • From patient/tumor record • Researched online.

Metadata should describe the source of the data: • Patient • Relative • Patient/tumor record.


Best Practices 3 – Residential history address data
Policy Decision: What type of residential history information should be collected?
Best Practice: Ideally, complete historical address information should be collected:
• Residential history of addresses
• How long at each address
• Type of address (home, work, seasonal).

Best Practices 4 – Secondary address data gathering
Policy Decision: Should secondary research methods be attempted at the registry? If so, when, which ones, and how often?
Best Practice: Secondary research should be attempted at the registry if address data from the diagnosing facility are inaccurate, incomplete, or not timely.

All available and applicable sources should be utilized until they are exhausted or enough information is obtained:
• Online searches
• Agency contacts
• Patient re-contacting.

Metadata should describe the data sources consulted: • Websites • Agencies • Individuals.

Metadata should describe the queries performed.

Metadata should describe the results achieved, even if unsuccessful.

Metadata will need to be stored in locally defined fields.

2.2.2 Conversion to numeric (i.e., digital) data

How long the patient record remains unaltered depends on several factors, including:

• The frequency with which data are reported to the central registry
• Whether per-record geocoding (geocoding a single record at a time) or batch-record geocoding (geocoding multiple records at once) is performed
• Whether the registry geocodes the data or the data already are geocoded when they reach the registry
• Whether geocoding is performed in-house or outsourced.

When the data finally are geocoded, addresses will be converted into numeric spatial representations that then will be appended to the record. Best practices related to the



conversion from descriptive to numeric spatial data (assuming that geocoding is performed at the registry) are listed in Best Practices 5. Note that the recommendation to immediately geocode every batch of data received may not be a feasible option for all registries under all circumstances because of budgetary, logistical, and/or practical considerations involved with processing numerous data files.

Best Practices 5 – Conversion to numeric spatial data
Policy Decision: How long can a record remain non-matched (e.g., must it be transformed immediately, or can it wait indefinitely)?
Best Practice: If records are obtained one at a time, they should be geocoded when a sufficient number have arrived to offset the cost of geocoding (i.e., to achieve economies of scale). If records are obtained in batches, they should be geocoded as soon as possible.

Policy Decision: Should a record ever be re-geocoded? If so, when and under what circumstances?
Best Practice: The record should retain the same geocode until it is deemed to be unacceptably inaccurate. If new reference data or an improved geocoder are obtained and will provably improve a record’s geocode, it should be re-geocoded.

If new or updated address data are obtained for a record, it should be re-geocoded.

Metadata should describe the reason for the inaccuracy determination.

Metadata should retain all historical geocodes.

2.2.3 Spatial association

Once the data are in this numeric form, the quality of the geocoding process can be assessed. If the quality is found to be sufficient, other desired attributes can be associated and spatial analysis can be performed within a GIS to investigate any number of scientific outcomes. For example, records can be visually represented on a map, or values from other datasets can be associated with individual records through the spatial intersection of the geocode and other spatial data. Common data associations include spatial intersection with U.S. Census Bureau data to associate socioeconomic status (SES) with a record or intersection with environmental exposure estimates. See Rushton et al. (2008b) for one example of how spatially continuous cancer maps can be produced from geocoded data and/or Waller (2008) for concise introductory material on the type of spatial statistical analysis typically performed on point and areal unit data in public health research. Best practices related to the conversion from descriptive to numeric to spatial data are listed in Best Practices 6.
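In the simplest vector case, the spatial intersection mentioned above (associating, for example, a census tract's attribute values with a geocoded point) reduces to a point-in-polygon test. The following Python sketch uses the standard ray-casting approach; the rectangular tract polygon and the income value are hypothetical stand-ins for real reference data, not part of the original text:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: does point (x, y) fall inside the polygon?

    polygon is a list of (x, y) vertices; the last vertex is assumed
    to connect back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical census-tract polygon carrying an SES attribute to associate.
tract = {
    "geometry": [(-118.30, 34.00), (-118.20, 34.00),
                 (-118.20, 34.10), (-118.30, 34.10)],
    "median_income": 52000,
}

geocode = (-118.25, 34.05)  # (longitude, latitude) of a geocoded record
if point_in_polygon(geocode[0], geocode[1], tract["geometry"]):
    record_ses = tract["median_income"]  # attribute is spatially associated
```

Production systems use spatial indexes and GIS libraries rather than a hand-rolled test, but the underlying association step is the same.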


Best Practices 6 – Spatial association
Policy Decision: Should data be allowed to be spatially associated with geocoded records? If so, when and where?
Best Practice: Spatial associations should be allowed if the data to be associated meet acceptable levels of accuracy and integrity. Spatial association should be examined every time an analysis is to be run (because the viability of statistical analysis techniques will vary with the presence or absence of these associations), and can be performed by data consumers or registry staff.

Metadata should include the complete metadata of the spatially associated data.

Policy Decision: What requirements must data meet to be spatially associated?
Best Practice: Data should be considered valid for association if:
• Its provenance, integrity, temporal footprint, spatial accuracy, and spatial resolution are known and can be proven.
• Its temporal footprint is within a decade of the time the record was created.
• Its spatial resolution is equal to or less than that of the geocode (i.e., only associate data of lower resolution).

2.3 WHEN TO GEOCODE

When the actual geocoding should be performed during the data gathering process is a critical decision that affects the overall accuracy of the output. There are several options that need to be considered carefully, as each has particular benefits and drawbacks that may influence the decision about when to geocode.

2.3.1 Geocoding at the central registry

The first option, geocoding at the central registry, is the process of geocoding patient/tumor records at the registry once they have been received from facilities and abstractors. Geocoding traditionally has been the role of the central registry. This can be accomplished when a record arrives or after consolidation. The first approach processes one case at a time, while the second processes batches of records at a time. One obvious benefit in the second case results from economies of scale. It will always be cheaper and more efficient to perform automatic geocoding for a set of addresses, or in batch mode, rather than on a single address at a time. Although the actual cost per produced geocode may be the same (e.g., one-tenth of a cent or less), the time required by staff members in charge of the process will be greatly reduced, resulting in definite cost savings.

Common and practical as this method may be, it also suffers from setbacks. First and foremost, if incorrect, missing, or ambiguous data that prevent a geocode from successfully being produced have been reported to a registry, they will be more difficult and time consuming to correct at this stage. In fact, most geocoders will not even attempt such corrections; instead, they will simply either output a less accurate geocode (e.g., one representing a successful geocode at a lower geographic resolution), or not output a geocode at all (Section 18 provides more discussion of these options).



Some of these problems can be rectified by performing interactive geocoding, whereby the responsible staff member is notified when problematic addresses are encountered and intervenes to choose between two equally likely options in the case of an ambiguous address or to correct an obvious and easily solvable error that occurred during data entry. Interactive geocoding, however, cannot solve the problems that occur when not enough information has been recorded to make an intelligent decision, and the patient cannot or should not be contacted to obtain further details. Further, interactive geocoding may be too time consuming to be practical.

2.3.2 Geocoding at the diagnosing facility

The second, less likely option, geocoding at the diagnosing facility, is the process of geocoding a record as the abstractor, clerical, intake, or registration personnel at the hospital (i.e., whomever on the staff is in contact with the patient) performs the data ingest, or while he or she conducts the patient interview and creates the electronic record or abstract. Geocoding at this point will result in the highest percentage of valid geocodes because the geocoding system itself can be used as a validation tool. In addition, staff can ask the patient follow-up questions regarding the address if the system returns it as non-matchable, a sentiment clearly echoed by emerging trends and attitudes in the cancer registry community (e.g., Abe and Stinchcomb 2008, pp. 123). Street-level geocoding at the hospital is ideal, but has yet to be realized at most facilities.
This is an example of one definition for the term real-time geocoding, the process of geocoding a record while the patient or the patient’s representative is available to provide more detailed or correct information using an iterative refinement approach. Data entry errors resulting from staff input error can be reduced if certain aspects of the address can be filled in automatically as the staff member enters them in a particular order, from lowest resolution to highest. For instance, the staff can start with the state attribute, followed by the United States Postal Service (USPS) ZIP Code. Upon entering this, in some cases both the county and city can be automatically filled in by the geocoding system, which the patient then can verify. However, if the USPS ZIP Code has other USPS-acceptable postal names or represents mail delivery to multiple counties, these defaults may not be appropriate. This process also may be performed as interactive or non-interactive geocoding. Looking past this and assuming the case of a USPS ZIP Code that can assign city and county attributes correctly, a further step can be taken. The street, as defined by its attributes (e.g., name, type, directional), can be validated by the geocoding system as actually existing within the already-entered county and USPS ZIP Code, and the building number can be tested as being within a valid range on that street. At any point, if invalid or ambiguous data are discovered by the geocoding system (or the address validation component or stand-alone system) as they are being entered, the staff can be instructed to ask follow-up questions to resolve the conflicts. Depending on the policies of the central registry, all that may be required of a hospital is to ensure that all of the steps that could have provided the patient with the opportunity to resolve the conflict were taken and their outcomes documented, even if the outcome was a refusal to clarify.
If a correct address can be determined, entered, and verified, the geocoding system then can associate any attributes that were not entered (e.g., the directional prefix or suffix of the street name), which can be approved and accepted by the staff member if correct, thereby increasing the completeness of the input address data.
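The lowest-to-highest-resolution entry and validation process just described can be sketched in code. The Python fragment below is illustrative only; the ZIP and street lookup tables are tiny hypothetical stand-ins for the USPS and street reference files a production geocoding system would consult:

```python
# Hypothetical reference tables standing in for real USPS/street data.
ZIP_TABLE = {
    "90089": {"city": "Los Angeles", "county": "Los Angeles", "state": "CA"},
}
STREET_TABLE = {
    # (zip, street name) -> valid building-number range on that street
    ("90089", "S VERMONT AVE"): (3500, 3700),
}

def validate_entry(state, zip_code, street, number):
    """Validate each address component against the coarser ones entered before it.

    Returns (ok, autofilled) where `autofilled` holds any attributes the
    system could fill in automatically for the patient to verify.
    """
    zip_info = ZIP_TABLE.get(zip_code)
    if zip_info is None or zip_info["state"] != state:
        return False, {}                # ZIP unknown or inconsistent with state
    # City and county can often be filled in from the ZIP Code alone.
    autofilled = {"city": zip_info["city"], "county": zip_info["county"]}
    rng = STREET_TABLE.get((zip_code, street.upper()))
    if rng is None:
        return False, autofilled        # street not found in this ZIP: follow up
    low, high = rng
    if not (low <= number <= high):
        return False, autofilled        # number outside valid range: follow up
    return True, autofilled

ok, filled = validate_entry("CA", "90089", "S Vermont Ave", 3620)
# ok is True; `filled` supplies the city and county for the patient to confirm.
```

Each failed check is exactly the point at which the staff member would be prompted to ask the patient a follow-up question.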


In the most accurate and highly advanced scenario possible, the generated point can be displayed on imagery and shown to the patient, who then can instruct the staff member in placing the point exactly in the center of their roofline, instead of at its original estimated location. Of course, this may not be desired or appropriate in all cases, such as when people are afraid to disclose their address for one reason or another. Further, if this strategy were employed without properly ensuring the patient’s confidentiality (e.g., performing all geocoding and mapping behind a firewall), explaining the technology that enables it (e.g., any of the variety of ArcGIS products and data layers), and what the goal of using it was (i.e., to improve cancer data with the hopes of eventually providing better prevention, diagnosis, and care), it would be understandable for patients to be uncomfortable and think that the diagnosing facility was infringing on their right to privacy.

As described, this process may be impossible for several reasons. It may take a substantially greater amount of time to perform than what is available. It assumes that an on-demand, case-by-case geocoder is available to the staff member, which may not be a reality for registries geocoding with vendors. If available, its use may be precluded by privacy or confidentiality concerns or constraints, or the reference datasets used may not be the correct type or of sufficient quality to achieve the desired level of accuracy. This scenario assumes a great deal of technical competence and an in-depth understanding of the geocoding process that the staff member may not possess. If this approach were to become widely adopted, further questions would be raised as to if and/or when residential history information also should be processed.
2.3.3 Considerations

These two scenarios illustrate that there are indeed benefits and drawbacks associated with performing the geocoding process at different stages of data gathering. When deciding which option to choose, an organization also should take note of any trends in past performance that can be leveraged or used to indicate future performance. For example, if less than 1 percent of address data fails batch-mode processing and requires very little of a staff member’s time to manually correct, it may be worth doing. However, if 50 percent of the data fail and the staff member is spending the majority of his or her time correcting erroneously or ambiguously entered data, another option in which input address validation is performed closer to the level of patient contact might be worth considering, but would require more individuals trained in geocoding—specifically the address validation portion (see Section 6.4 for more detail)—at more locations (e.g., hospitals and doctor’s offices). These types of tradeoffs will need to be weighed carefully and discussed between both the central registry and facility before a decision is made.

At a higher level, it also is useful to consider the roles of each of the two organizations involved: (1) the data submitters, and (2) the central registries. It can be argued that, ideally, the role of the data submitter is simply to gather the best raw data they can while they are in contact with the patient (although this may not be what actually occurs in practice for a variety of reasons), while the central registries are responsible for ensuring a standardized process for turning the raw data into its spatial counterpart.
Even if the data submitters perform geocoding locally before submitting the data to the central registries, the geocoded results may be discarded and the geocoding process applied again upon consolidation by the central registry to maintain consistency (in terms of geocoding quality due to the geocoding process used) amongst all geocodes kept at the central registry. However, even in currently existing and used standards, diagnosing facilities (and/or data submitters) are responsible for



some of the spatial fields in a record (e.g., the county), so the lines between responsibilities have already been blurred for some time.

Best Practices 7 – When to geocode
Policy Decision: Where and when is geocoding used?
Best Practice: Geocoding should be performed as early as possible (i.e., as soon as the address data become available), wherever the data are obtained.

Metadata should describe where the geocoding took place: • Diagnosing facility • Central registry.

Metadata should describe when the geocoding took place: • Upon intake • Upon transfer from a single registry • Upon consolidation from multiple registries • Every time it is used for analysis.

2.4 SUCCESS STORIES

The use of geocoding and geocoded data in health- and cancer-related research has a long, vivid, and exciting history, stretching back many years (e.g., the early attempts in Howe [1986]). The use of automated geocoding to facilitate spatial analyses in cancer research has enabled entirely new modes of inquiry that were not possible or feasible before. Several exemplary applications are noted here to illustrate the potential of the technique and the success that can be achieved. For a more comprehensive review of research studies that have utilized geocoding as a fundamental component, see the recent review article by Rushton et al. (2006).

Epidemiological investigations into the links between environmental exposure and disease incidence rely heavily on geocoded data and are particularly sensitive to the accuracy that can be obtained through the different methods and data sources that can be employed. For example, a whole series of studies investigating ambient pesticide exposure in California’s Central Valley all have used geocoding as the fundamental component for identifying the locations of individuals living near pesticide application sites (e.g., Bell et al. 2001; Rull and Ritz 2003; Rull et al. 2001, 2006a, 2006b; Reynolds et al. 2005; Marusek et al. 2006; Goldberg et al. 2007b; and Nuckols et al. 2007). Due to the rapid distance decay inherent in these environmental factors, a high level of spatial accuracy was necessary to obtain accurate exposure estimates.

Likewise, in a currently ongoing study, Cockburn et al. (2008) have uncovered evidence that the risk of mesothelioma with proximity to the nearest freeway (assessing the possible impact of asbestos exposure from brake and clutch linings) is two-fold higher for residences within 100 m of a freeway than those over 500 m away, using linear-interpolation geocoding based on TIGER/Line files (U.S. Census Bureau 2008d).
However, when comparing distances to freeways obtained from TIGER/Line file geocodes to those obtained from a parcel-based interpolation approach, it was shown that 24 percent of the data points had parcel-based geocode freeway distances in excess of 500 m greater than those derived from TIGER/Line files. This means that up to 24 percent of the data were misclassified in the original

analysis. If the misclassification varied by case/control status (under examination), then the true relative risk is likely very different from what was observed (biased either toward or away from the null).

In addition to its role in exposure analysis, geocoding forms a fundamental component of research studies investigating distance and accessibility to care (Armstrong et al. 2008). These studies typically rely on geocoded data for both a subject’s address at diagnosis (dxAddress) and the facility at which he or she was treated. With these starting and ending points, Euclidean, Great Circle, and/or network distance calculations can be applied to determine both the distance and time that a person must travel to obtain care. Studies using these measures have investigated such aspects as disparities in screening and treatment (Stitzenberg et al. 2007), the effects of distance on treatment selection and/or outcomes (e.g., Nattinger et al. 2001, Stefoski et al. 2004, Voti et al. 2005, Feudtner et al. 2006, and Lianga et al. 2007), and the targeting of regions for prevention and control activities (e.g., Rushton et al. 2004).



3. GEOGRAPHIC INFORMATION SCIENCE FUNDAMENTALS

This section introduces the fundamental geographic principles used throughout the remainder of the document, as well as common mistakes that often are encountered.

3.1 GEOGRAPHIC DATA TYPES

In general, GIS data are either vector- or raster-based. Vector-based data consist of vector objects or features and rely on points and discrete line segments to specify the locations of real-world entities. The latter are simply phenomena or things of interest in the world around us (i.e., a specific street like Main Street) that cannot be subdivided into phenomena of the same kind (i.e., more streets with new names). Vector data provide information relative to where everything occurs—they give a location to every object—but vector objects do not necessarily fill space, because not all locations need to be referenced by objects. One or more attributes (like street names in the aforementioned example) can be assigned to individual objects to describe what is where with vector-based data. Raster-based data, in contrast, divide the area of interest into a regular grid of cells in some specific sequence, usually row-by-row from the top left corner. Each cell is assigned a single value describing the phenomenon of interest. Raster-based data provide information relative to what occurs everywhere—they are space filling because every location in an area of interest corresponds to a cell in the raster—and as a consequence, they are best suited for representing things that vary continuously across the surface of the Earth.

Most geocoding applications work with vector-based GIS data. The fundamental primitive is the point, a 0-dimensional (0-D) object that has a position in space but no length. Geographic objects of increasing complexity can be created by connecting points with straight or curved lines. A line is a 1-D geographic object having a length and is composed of two or more 0-D point objects. Lines also may contain other descriptive attributes that are exploited by geocoding applications, such as direction, whereby one end point or node is designated as the start node and the other is designated as the end node.
A polygon is a geographic object bounded by at least three 1-D line objects or segments, with the requirement that they must start and end at the same location (i.e., node). These objects have a length and width, and from these properties one can calculate the area. Familiar 2-D shapes such as squares, triangles, and circles are all polygons in vector-based views of the world around us. Most GIS software supports both vector- and raster-based views of the world, and any standard GIS textbook can provide further information on both the underlying principles and the strengths and weaknesses of these complementary data models. The key aspects from a geocoding perspective relate to the methods used to: (1) determine and record the locations of these objects on the surface of the Earth, and (2) calculate distance, because many geocoding algorithms rely on one or more forms of linear interpolation.
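To make the linear interpolation just mentioned concrete, the Python sketch below estimates a coordinate for a building number along a street segment by interpolating within the segment's address range, in the style used with TIGER/Line-type reference data. The segment endpoints and address range are hypothetical:

```python
def interpolate_address(number, seg):
    """Estimate a coordinate for a building number by linear interpolation
    along a street segment's address range.

    seg carries the segment's start/end nodes and the address range
    running from the start node to the end node.
    """
    lo, hi = seg["from_number"], seg["to_number"]
    if not (min(lo, hi) <= number <= max(lo, hi)):
        raise ValueError("number outside segment's address range")
    # Fraction of the way along the segment, measured from the start node.
    t = (number - lo) / (hi - lo) if hi != lo else 0.5
    (x1, y1), (x2, y2) = seg["start"], seg["end"]
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# Hypothetical segment: addresses 100-198 run between these two nodes.
segment = {
    "start": (-118.2850, 34.0200),
    "end": (-118.2840, 34.0200),
    "from_number": 100,
    "to_number": 198,
}

x, y = interpolate_address(149, segment)
# 149 is halfway through the 100-198 range, so the estimated point falls
# at approximately the segment midpoint, (-118.2845, 34.0200).
```

Real geocoders additionally account for parity (odd/even house numbers on opposite sides of the street) and offset the point away from the street centerline; those refinements are omitted here.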

3.2 GEOGRAPHIC DATUMS AND GEOGRAPHIC COORDINATES

The positions or locations of objects on the surface of Earth are represented with one or more coordinate systems. Specifying accurate x and y coordinates for objects is


fundamental for all GIS software and location-based services. However, many different coordinate systems are used to record location, and one often needs to transform data in a GIS from one reference system to another. There are three basic options: (1) global systems, such as latitude and longitude, that are used to record position anywhere on Earth’s surface; (2) regional or local systems that aim to provide accurate positioning over smaller areas; and (3) postal codes and cadastral reference systems that record positions with varying levels of precision and accuracy. The reference system to be used for a particular geocoding application and accompanying GIS project will depend on the purpose of the project and how positions were recorded in the source data.

There usually is a geodetic datum that underpins whatever reference system is used or chosen. Most modern tools (e.g., GPS receivers) and data sources (e.g., U.S. Geological Survey National Map, U.S. Census Bureau TIGER/Line files) rely on the North American Datum of 1983 (NAD-83). This and other datums in use in various parts of the world provide a reference system against which horizontal and/or vertical positions are defined. A datum consists of an ellipsoid (a model of the size and shape of Earth that accounts for the slight flattening at the poles and other irregularities) and a set of point locations precisely defined with reference to that surface.

Geographic coordinates, which specify locations in terms of latitude and longitude, constitute a very popular reference system. The Prime Meridian (drawn through Greenwich, England) and the Equator serve as reference planes to define latitude and longitude. Latitude is the angle from the plane at the horizontal center of the ellipsoid, the Equator, to the point on the surface of the ellipsoid (at sea level). Longitude is the angle between the plane at the vertical center of the ellipsoid, the meridian, and the point on the surface of the ellipsoid.
Both are recorded in degrees, minutes, and seconds or in decimal degrees.

3.3 MAP PROJECTIONS AND REGIONAL REFERENCE SYSTEMS

Several different systems are used regionally to identify geographic positions. Some of these are true coordinate systems, such as those based on the Universal Transverse Mercator (UTM) or Universal Polar Stereographic (UPS) map projections. Others, such as the Public Land Survey System (PLSS) used widely in the Western United States, simply partition space into blocks. The systems that incorporate some form of map projection are preferred if the goal is to generate accurate geocoding results. A map projection is a mathematical function to transfer positions on the surface of Earth to their approximate positions on a flat surface (i.e., a computer monitor or paper map). Several well-known projections exist; the differences between them generally are determined by which property of the Earth’s surface they seek to maintain with minimal distortion (e.g., distance, shape, area, or direction).

Fortunately, a great deal of time and effort has been expended to identify the preferred map projections in most parts of the world. The State Plane Coordinate System (SPC), for example, was developed by U.S. scientists in the 1930s to provide local reference systems tied to a national geodetic datum. Each state has its own SPC system with specific parameters and projections. Smaller states such as Rhode Island use a single SPC zone; larger states such as California and Texas are divided into several SPC zones. The SPC zone boundaries in the latter cases typically follow county boundaries. The initial SPC system was based on the North American Datum of 1927 (NAD-27), and the coordinates were recorded in English units (i.e., feet). Some maps using NAD-27 coordinates are still in use today.
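As noted in Section 3.2, geographic coordinates may be recorded in degrees, minutes, and seconds, while most GIS software and geocoders expect signed decimal degrees. The conversion is simple arithmetic; a minimal Python sketch (the example coordinates are hypothetical):

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    By convention, southern and western hemispheres carry a negative sign.
    """
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -dd if hemisphere in ("S", "W") else dd

# 34 deg 3' 0" N, 118 deg 15' 0" W  ->  (34.05, -118.25)
lat = dms_to_decimal(34, 3, 0, "N")
lon = dms_to_decimal(118, 15, 0, "W")
```

Storing coordinates consistently in decimal degrees avoids a common source of silent error when datasets recorded in the two notations are combined.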
Improvements in the measurements of both the size and shape of Earth and of positions on the surface of Earth itself led to numerous efforts to refine these systems, such that the



1927 SPC system has been replaced for everyday use by the 1983 SPC system. The latter is based on NAD-83, and the coordinates are expressed in metric units (i.e., meters). The 1983 SPC system used Lambert Conformal Conic projections for regions with larger east-west than north-south extents (e.g., Nebraska, North Carolina, and Texas); Transverse Mercator projections were used for regions with larger north-south extents (e.g., Illinois and New Hampshire). There are exceptions—Florida, for example, uses the Lambert Conformal Conic projection in its north zone and the Transverse Mercator projection in its west and east zones. Alaska uses a completely different Oblique Mercator projection for the thin diagonal zone in the southeast corner of the state.

The choice of map projection and the accompanying coordinate system may have several consequences and is a key point to keep in mind during any aspect of the geocoding process, because the distance and area calculations required for geocoding rely on them. The most common mistake made from not understanding the distinctions between different coordinate systems occurs during distance calculations. Latitude and longitude record angles, so applying Euclidean distance functions to measure distances in this coordinate system is not appropriate; spherical distance calculations should be used instead. The simpler Euclidean calculations are appropriate at a local scale in a projected coordinate system, because the distortion caused by representing positions on a curved surface on a flat computer monitor and/or paper map is minimized. Some special care may be needed if/when distance calculations extend across two or more SPC zones, given the way positions are recorded in northings and eastings relative to some local origin. Additional information on these types of complications can be gleaned from standard GIS textbooks. Best practices relating to geographic fundamentals are listed in Best Practices 8.
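The planar-versus-spherical mistake described above is easy to demonstrate. The Python sketch below compares a naive Euclidean calculation applied directly to latitude/longitude (using a rough 111.32 km-per-degree scale with no latitude correction) against a great-circle (haversine) calculation on a spherical Earth of radius 6,371 km; the two points are hypothetical:

```python
import math

def naive_planar_km(lat1, lon1, lat2, lon2):
    """INCORRECT for lat/lon: treats angular degrees as if they were planar
    distances, scaling by ~111.32 km per degree with no latitude correction."""
    return 111.32 * math.hypot(lat2 - lat1, lon2 - lon1)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 6,371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two points one degree of longitude apart at 34 degrees N (hypothetical).
d_naive = naive_planar_km(34.0, -118.0, 34.0, -117.0)   # ~111 km
d_true = haversine_km(34.0, -118.0, 34.0, -117.0)       # ~92 km
# The naive result overstates the east-west distance by roughly 20 percent,
# because a degree of longitude shrinks with the cosine of the latitude.
```

An error of this magnitude would badly distort any distance-based exposure or accessibility measure, which is why geocoded coordinates should either be projected before planar calculations or measured with spherical formulas.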


Best Practices 8 – Geographic fundamentals
Policy Decision: What information should be kept about attributes of a reference dataset?
Best Practice: All metadata should be maintained about the type and lineage of the reference data (e.g., coordinate system, projection).

Policy Decision: What coordinate system should reference data be kept in?
Best Practice: Reference data should be kept in a geographic coordinate system using the North American Datum of 1983 (NAD-83) and projected when they need to be displayed or have distance-based calculations performed.

If a projected coordinate system is required, an appropriate one for the location/purpose should be used.

What projection should be An appropriate projection should be chosen based on the used to project reference geographic extent of the area of interest and/or what the data? projected data are going to be used for. For further information, see any basic GIS textbook.

In general: • For most cancer maps, use an equal area projection. • For maps with circular buffers, use a conformal projection. • For calculating distances, use a projection that minimizes distance error for the area of interest. What distance calculations In a projected coordinate space, planar distance metrics should be used? should be used.

In a non-projected (geographic) coordinate space, spherical distance metrics should be used.


Part 2: The Components of Geocoding

Geocoding is an extremely complicated task involving multiple processes and datasets, all working together simultaneously. Without a fundamental understanding of how these pieces fit together, intelligent decisions regarding them are impossible. This part of the document will first look at the geocoding process from a high level, and subsequently perform a detailed examination of each component of the process.


4. ADDRESS GEOCODING PROCESS OVERVIEW

This section identifies types of geocoding processes and outlines the high-level geocoding process, illustrating the major components and their interactions.

4.1 TYPES OF GEOCODING PROCESSES
Now that geocoding has been defined and placed within the context of the larger concept of spatial analysis, the technical background that makes the process possible will be presented. The majority of the remainder of this document will focus on the predominant type of geocoding performed throughout the cancer research community, software-based geocoding. Software-based geocoding is a geocoding process in which a significant portion of the components are software systems. From this point forward, unless otherwise stated, the term “geocoder” will refer to this particular arrangement.
The software-based geocoding option is presently by far the most economical option available to registries and is the most commonly used. This document will seek to inform specific decisions that must be made with regard to software-based geocoding. However, much of the information will be relevant to geocoding processes that utilize other tools (e.g., GPS devices or identification and coordinate assignment from aerial imagery). The accuracy and metadata reporting discussions in particular will be applicable to all types of geocoding processes.
In the following sections, the fundamental components of the geocoding process will be introduced. A high-level description of the components in the geocoding process and their interactions will be offered to illustrate the basic steps that a typical geocoder performs as it produces output from the input provided. Each of these steps, along with specific issues and best practice recommendations related to them, will be described in greater detail in the sections that follow. Additional introductory material on the overall geocoding process, components, and possible sources of error can be found in Armstrong and Tiwari (2008) and Boscoe (2008). The theoretical background presented in the following sections can be grounded by reviewing the detailed case study of the specific geocoding practices and products used in the New Jersey State Cancer Registry (NJSCR) (as well as several other registries) available in Abe and Stinchcomb (2008).

4.2 HIGH-LEVEL GEOCODING PROCESS OVERVIEW
At the highest level, most generalized geocoding processes involve three separate yet related components: (1) the descriptive locational input data (e.g., addresses); (2) the geocoder; and (3) the spatial output data. These high-level relationships are illustrated in Figure 2.
The input data to the geocoding process can be any descriptive locational textual information, such as an address or building name. The output can be any form of valid spatial data, such as latitude and longitude. Geocoding is the process used to convert the input into the output, which is performed by the geocoder.
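The three-component relationship just described can be sketched as a minimal interface: descriptive text goes in, a geocoder consults what it knows about the world, and valid spatial data comes out. The class names, field names, toy reference dictionary, and coordinate values below are all illustrative placeholders, not drawn from any particular geocoding product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Geocode:
    """Spatial output: a coordinate pair plus basic match lineage."""
    latitude: float
    longitude: float
    match_type: str  # e.g., "exact", "interpolated", "unmatched"

class Geocoder:
    """The process that converts descriptive locational text into spatial data."""

    def __init__(self, reference_data: dict):
        # Toy reference dataset: descriptive text -> known coordinates.
        self.reference_data = reference_data

    def geocode(self, input_text: str) -> Optional[Geocode]:
        coords = self.reference_data.get(input_text.strip().upper())
        if coords is None:
            return None  # no matching reference feature
        return Geocode(latitude=coords[0], longitude=coords[1], match_type="exact")

# input (descriptive text) -> geocoder -> output (valid spatial data)
g = Geocoder({"3620 S VERMONT AVE, LOS ANGELES, CA 90089": (34.02, -118.29)})
print(g.geocode("3620 S Vermont Ave, Los Angeles, CA 90089"))
```

Real geocoders replace the dictionary lookup with feature matching and interpolation against a full reference dataset, as described in the sections that follow.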


Figure 2 – High-level data relationships

4.3 SOFTWARE-BASED GEOCODERS
A software-based geocoder is composed of two fundamental components: the reference dataset and the geocoding algorithm, each of which may be composed of a series of sub-components and operations. The geocoding process with these new relationships is depicted in Figure 3.

Figure 3 – Schematic showing basic components of the geocoding process

It is likely that actual software implementations of a geocoder will vary in the nature of the components chosen and the conceptual representation of the geocoding system. Each registry will have its own geocoding requirements: the set of limitations, constraints, or concerns that influence the choice of a particular geocoding option. These may be technical, budgetary, legal, or policy related, and will necessarily guide the choice of a geocoding process. Best practices related to determining geocoding requirements are listed in Best Practices 9. Even though geocoding requirements may vary between registries, the NAACCR standards for reporting spatial data fields as defined in Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008) should be followed by all registries to ensure uniformity across registries.


Best Practices 9 – Geocoding requirements

Policy Decision: What considerations should affect the choice of geocoding requirements?
Best Practice: Technical, budgetary, legal, and policy constraints should influence the requirements of a geocoding process.

Policy Decision: When should requirements be reviewed and/or changed?
Best Practice: Ideally, requirements should be revisited annually, but registries may have constraints that extend or shorten this time period.

The process workflow shown in Figure 4 is a generalized abstraction of the geocoding process. It illustrates the essential components that should be common to any geocoder implementation; it is sufficient for registries with few requirements, while being detailed enough to illustrate the decisions that must be made at registries with many detailed requirements. This conceptualization also is sufficient to illustrate the generalized steps and requirements that geocoder vendors will need to accommodate to work with registries.

Figure 4 – Generalized workflow


4.4 INPUT DATA
Input data are the descriptive locational texts that are to be turned into computer-useable spatial data by the process of geocoding. As indicated earlier (Table 3), the wide variety of possible forms and formats of input data is the main descriptor of a geocoder’s flexibility, as well as a contributing factor to the overall difficulty of implementing geocoding.

4.4.1 Classifications of input data
Input data can first be classified into two categories: relative and absolute. Relative input data are textual location descriptions which, by themselves, are not sufficient to produce an output geographic location. These produce relative geocodes, geographic locations expressed relative to some other reference geographic location (i.e., based on an interpolated distance along or within a reference feature, in the case of line vectors and areal units, respectively). Without the reference geographic locations (i.e., the line vector or areal unit), the output locations for the input data would be unobtainable. Examples of these types of data include “Across the street from Togo’s” and “The northeast corner of Vermont Avenue and 36th Place.” These are not typically considered valid address data for submission to a central registry, but they nonetheless do occur. The latter location, for instance, cannot be located without identifying both of the streets as well as the cardinal direction in which one must head away from their exact intersection. Normal postal street addresses also are relative. The address “3620 South Vermont Avenue” is meaningless without understanding that “3620” denotes a relative geographic location somewhere along the geographic location representing “Vermont Avenue.” It should be noted that in many cases geocoding platforms do not support these types of input, and thus they may not be matchable, but advances in this direction are being made.
Absolute input data are textual location descriptions which, by themselves, are sufficient to produce an output geographic location. These input data produce an absolute geocode in the form of an absolute known location or an offset from an absolute known location.
Input data in the form of adequately referenced placenames, USPS ZIP Codes, or parcel identifiers are examples of the first because each can be directly looked up in a data source (if available) to determine a resulting geocode. Locations described in terms of linear addressing systems also are absolute by definition. For example, the Emergency 911-based (E-911) geocoding systems being mandated in rural areas of the United States are (in many cases) absolute because they use distances from known mileposts on streets as coordinates. These mileposts form a linear addressing system because each represents an absolute known location. It should be noted that in some cases this may not be true, because the only implementation action taken to adhere to the E-911 system was street renaming or renumbering.
With these distinctions in mind, it is instructive at this point to classify and enumerate several commonly encountered forms of input data that a geocoder can and must be able to handle in one capacity or another, because these may be the only information available when all other fields in a record are null. This list is presented in Table 4.


Table 4 – Common forms of input data with corresponding NAACCR fields and example values

Type: Complete postal address
NAACCR Field(s): 2330: dxAddress - Number and Street; 70: dxAddress - City; 80: dxAddress - State; 100: dxAddress - Postal Code
Example: 3620 S Vermont Ave, Unit 444, Los Angeles, CA 90089

Type: Partial postal address
NAACCR Field(s): 2330: dxAddress - Number and Street
Example: 3620 Vermont

Type: USPS PO box
NAACCR Field(s): 2330: dxAddress - Number and Street; 70: dxAddress - City; 80: dxAddress - State; 100: dxAddress - Postal Code
Example: PO Box 1234, Los Angeles CA 90089-1234

Type: Rural Route
NAACCR Field(s): 2330: dxAddress - Number and Street; 70: dxAddress - City; 80: dxAddress - State
Example: RR12, Los Angeles CA

Type: City
NAACCR Field(s): 70: dxAddress - City
Example: Los Angeles

Type: County
NAACCR Field(s): 90: County at dx
Example: Los Angeles County

Type: State
NAACCR Field(s): 80: dxAddress - State
Example: CA

Type: USPS ZIP Code, USPS ZIP+4 (United States Postal Service 2008a)
NAACCR Field(s): 100: dxAddress - Postal Code
Example: 90089-0255

Type: Intersection
NAACCR Field(s): 2330: dxAddress - Supplemental
Example: Vermont Avenue and 36th Place

Type: Named place
NAACCR Field(s): 2330: dxAddress - Supplemental
Example: University of Southern California

Type: Relative
NAACCR Field(s): 2330: dxAddress - Supplemental
Example: Northeast corner of Vermont Ave and 36th Pl

Type: Relative
NAACCR Field(s): 2330: dxAddress - Supplemental
Example: Off Main Rd
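The relative/absolute classification described in Section 4.4.1 can be sketched as a simple tagger. The patterns below are purely illustrative heuristics (real input classification is far subtler): corner/intersection phrasing and house-number-plus-street forms are tagged relative, while inputs that can be looked up directly (named places, ZIP Codes) fall through to absolute.

```python
import re

# Illustrative heuristics only -- not a production classifier.
RELATIVE_PATTERNS = [
    r"\bcorner of\b",
    r"\bacross (the street )?from\b",
    r"\boff\b",
    r"\w+ (ave|avenue|st|street|blvd|rd)\.? and \w+",  # intersections
    r"^\d+\s+\S",  # house number + street: relative to the street feature
]

def classify_input(text: str) -> str:
    """Tag input data as 'relative' or 'absolute' per Section 4.4.1."""
    t = text.lower()
    for pat in RELATIVE_PATTERNS:
        if re.search(pat, t):
            return "relative"
    # Named places, USPS ZIP Codes, parcel IDs, etc. are looked up directly.
    return "absolute"

print(classify_input("Northeast corner of Vermont Ave and 36th Pl"))  # relative
print(classify_input("3620 S Vermont Ave, Los Angeles, CA 90089"))    # relative
print(classify_input("90089-0255"))                                   # absolute
```

Note that ordinary postal street addresses are tagged relative here, matching the document's point that a house number only has meaning relative to its street feature.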


From this list, it is apparent that most input data are based on postal addressing systems, administrative units, named places, coordinate systems, or relative descriptions that use one of the others as a referent. Input data in the form of postal addresses, or portions thereof, are by far the most commonly encountered, and as such this document will focus almost exclusively on this input data type. Significant problems may appear when processing postal address data because they are among the “noisiest” forms of data available. As used here, “noisy” refers to the high degree of variability in the way they can be represented, and to the fact that they often include extraneous data and/or are missing required elements. To overcome these problems, geocoders usually employ two techniques known as address normalization and address standardization.

4.4.2 Input data processing
Address normalization organizes and cleans input data to increase its efficiency for use and sharing. This process attempts to identify the component pieces of an input address (e.g., street number, street name, or USPS ZIP Code) within the input string. The goal is to identify the correct pieces in the input data so that it will have the highest likelihood of being successfully assigned a geocode by the geocoder. In Table 5, several forms of the same address are represented to illustrate the need for address normalization.

Table 5 – Multiple forms of a single address
3620 South Vermont Avenue, Unit 444, Los Angeles, CA 90089-0255
3620 S Vermont Ave, #444, Los Angeles, CA 90089-0255
3620 S Vermont Ave, 444, Los Angeles, 90089-0255
3620 Vermont, Los Angeles, CA 90089
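The substitution-based normalization step can be sketched as follows: tokenize the input string, then standardize tokens through an abbreviation table. The tiny substitution table here mimics USPS Publication 28-style conventions but is a fabricated stand-in, not the real table.

```python
import re

# A tiny, fabricated stand-in for a USPS Publication 28-style table.
SUBSTITUTIONS = {
    "AVENUE": "AVE", "AV": "AVE", "BOULEVARD": "BLVD", "STREET": "ST",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "UNIT": "#", "APT": "#",
}

def normalize(address: str) -> str:
    """Tokenize an input address and apply abbreviation substitutions."""
    cleaned = address.upper().replace(".", "")
    tokens = re.split(r"[\s,]+", cleaned)
    return " ".join(SUBSTITUTIONS.get(tok, tok) for tok in tokens if tok)

# Several forms of the same address collapse toward one normalized form.
print(normalize("3620 South Vermont Avenue, Unit 444, Los Angeles, CA 90089-0255"))
print(normalize("3620 S. Vermont Ave, Unit 444, Los Angeles, CA 90089-0255"))
# Both print: 3620 S VERMONT AVE # 444 LOS ANGELES CA 90089-0255
```

Collapsing the variant spellings in Table 5 to one canonical form is what gives the downstream feature-matching step its best chance of success.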

Address standardization converts an address from one normalized format into another. It is closely linked to normalization and is heavily influenced by the performance of the normalization process. Standardization converts the normalized data into the correct format expected by the subsequent components of the geocoding process. Address standards may be used for different purposes and may vary across organizations because there is no single, set format; however, variability in formats presents a barrier to data sharing among organizations. Interoperability assumes an agreement to implement a standardized format. In Table 6, several existing or proposed address standards are listed. Best practices related to input data are listed in Best Practices 10.

Table 6 – Existing and proposed address standards

Organization: USPS
Standard: Publication 28 (United States Postal Service 2008d)

Organization: Urban and Regional Information Systems Association (URISA)/United States Federal Geographic Data Committee (FGDC)
Standard: Street Address Data Standard (United States Federal Geographic Data Committee 2008b)


Best Practices 10 – Input data (high level)

Policy Decision: What type of input data can and should be geocoded?
Best Practice: At a minimum, NAACCR standard address data should be able to be geocoded. Ideally, any type of descriptive locational data, both relative and absolute, in any address standard should be an acceptable type of input and geocoding can be attempted:
• Any form of postal address
• Intersections
• Named places
• Relative locations.

Policy Decision: What type of relative input data can and should be geocodable?
Best Practice: At a minimum, postal street addresses. Ideally, relative directional descriptions.

Policy Decision: What type of absolute input data can and should be geocodable?
Best Practice: At a minimum, E-911 locations (if they are absolute).

Policy Decision: What type of normalization can and should be performed?
Best Practice: Any reproducible technique that produces certifiably valid results should be considered a valid normalization practice:
• Tokenization
• Abbreviation (introduction/substitution).

Policy Decision: What type of standardization can and should be performed?
Best Practice: Any reproducible technique that produces certifiably valid results should be considered a valid standardization practice.

4.5 REFERENCE DATASETS
The reference dataset is the underlying geographic database containing the geographic features that the geocoder can use to generate a geographic output. This dataset stores all of the information the geocoder knows about the world and provides the base data from which the geocoder calculates, derives, or obtains geocodes. Interpolation algorithms (discussed in the next section) perform computations on the reference features contained in these datasets to estimate where the output of the geocoding process should be placed (using the attributes of the input address).
Reference datasets are available in many forms and formats. The sources of these data also vary greatly, from local government agencies (e.g., tax assessors) to national governmental organizations (e.g., the Federal Geographic Data Committee [FGDC]). Each must ultimately contain valid spatial representations that either can be returned directly in response to a geocoder query (as the output) or be used by other components of the geocoding process to deduce or derive the spatial output. A few examples of the numerous types of geographic reference data sources that may be incorporated into the geocoding process are listed in Table 7, with best practices listed in Best Practices 11.


Table 7 – Example reference datasets
Vector line file: U.S. Census Bureau’s TIGER/Line (United States Census Bureau 2008c)
Vector polygon file: Los Angeles (LA) County Assessor Parcel Data (Los Angeles County Assessor 2008)
Vector point file: Australian Geocoded National Address File (G-NAF) (Paull 2003)

Best Practices 11 – Reference data (high level)

Policy Decision: What types of reference datasets can and should be supported by a geocoder?
Best Practice: Linear-, point-, and polygon-based vector reference datasets should be supported by a geocoding system.

4.6 THE GEOCODING ALGORITHM
The geocoding algorithm is the main computational component of the geocoder. This algorithm can be implemented in a variety of ways, especially if trends about the input data or reference dataset can be determined a priori. Generally speaking, any algorithm must perform two basic tasks.
The first, feature matching, is the process of identifying the geographic feature in the reference dataset that corresponds to the input data and will be used to derive the final geocode output. A feature-matching algorithm is an implementation of a particular form of feature matching. These algorithms are highly dependent on both the type of reference dataset utilized and the attributes it maintains about its geographic features; the algorithm’s chances of selecting the correct feature vary with the number of attributes per feature. A substantial part of the overall quality of the output geocodes rests with this component because it is responsible for identifying and selecting the reference feature used for output derivation.
The next task, feature interpolation, is the process of deriving a geographic output from the reference feature selected by feature matching. A feature interpolation algorithm is an implementation of a particular form of feature interpolation. These algorithms also are highly dependent on the reference dataset, in terms of both the type of data it contains and the attributes it maintains about these features.
If one were to have a reference dataset containing valid geographic points for every address in one’s study area (e.g., the ADDRESS-POINT [Higgs and Martin 1995a, 2008] and G-NAF [Paull 2003] databases), the feature interpolation algorithm essentially returns this spatial representation directly from the reference dataset. More often, however, the interpolation algorithm must estimate where the input data should be located with reference to a feature in the reference dataset. Typical operations include linear or areal interpolation (see Section 8) when the reference datasets are street vectors and parcel polygons, respectively.
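Linear interpolation along a matched street-vector feature can be sketched as follows. The segment endpoints and the address range attributes are made-up values; real street segments are polylines in a projected coordinate system, and real interpolators apply offsets and dropbacks (see Section 8).

```python
def interpolate_along_segment(house_number, range_start, range_end, start_xy, end_xy):
    """Estimate a position by linear interpolation along a street segment
    whose reference attributes give an address range (range_start..range_end)."""
    lo, hi = min(range_start, range_end), max(range_start, range_end)
    if not (lo <= house_number <= hi):
        raise ValueError("house number outside the feature's address range")
    span = range_end - range_start
    # Fraction of the way along the segment; midpoint if the range is degenerate.
    t = 0.5 if span == 0 else (house_number - range_start) / span
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)

# A hypothetical block: addresses 3600-3700 along a 100 m segment.
print(interpolate_along_segment(3620, 3600, 3700, (0.0, 0.0), (100.0, 0.0)))
# -> (20.0, 0.0): 20% of the way along the segment
```

This is why the quality of the address-range attributes in the reference dataset directly bounds the positional accuracy of the interpolated geocode.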


Best Practices 12 – Geocoding algorithm (high level)

Policy Decision: What type of geocoding can and should be performed?
Best Practice: At a minimum, software-based geocoding should be performed.

Policy Decision: What forms of feature matching should the geocoding algorithm include?
Best Practice: The geocoding algorithm should consist of feature-matching algorithms consistent with the forms of reference data the system supports. Both probability-based and deterministic methods should be supported.

Policy Decision: What forms of feature interpolation should the geocoding algorithm include?
Best Practice: The geocoding algorithm should consist of feature interpolation algorithms consistent with the forms of reference data the system supports (e.g., linear-based interpolation if linear-based reference datasets are used).

4.7 OUTPUT DATA
The last component of the geocoder is the actual output data, which are the valid spatial representations derived from features in the reference dataset. As defined in this document, these data can have many different forms and formats, but each must contain some type of valid spatial attribute.
The most common format of output is a point described with geographic coordinates (latitude, longitude). However, the accuracy of these spatial representations suffers when they are interpolated, due to data loss during production. Alternate forms can include multi-point representations such as polylines or polygons. As noted, registries must at least report the results in the formats explicitly defined in Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008).
When using these output data, one must consider the geographic resolution they represent in addition to the type of spatial geometry. For example, a point derived by areal interpolation from a polygon parcel boundary should not be treated as equivalent to a point derived from the areal interpolation of a polygon USPS ZIP Code boundary (note that a USPS ZIP Code is not actually an areal unit; more details on this topic can be found in Section 5.1.4). These geocoder outputs, while in the same format and produced through the same process, do not represent data at the same geographic resolution and must be differentiated.

Best Practices 13 – Output data (high level)

Policy Decision: What forms and formats can and should be returned as output data?
Best Practice: The geocoding process should be able to return any valid geographic object as output. At a minimum, outputs showing the locations of points should be supported.

Policy Decision: What can and should be returned as output?
Best Practice: The geocoding process should be able to return the full feature it matched to (e.g., a parcel polygon if matching to a parcel reference dataset), in addition to an interpolated version.


4.8 METADATA
Without proper metadata describing each component of the geocoding process and the choices that were made at each step, it is nearly impossible to have any confidence in the quality of a geocode. With this in mind, it is recommended that all geocodes contain all relevant information about all components used in the process, as well as all decisions that each component made. Table 8, Table 9, and Table 10 list example geocoding component, process, and record metadata items. These lists are not a complete enumeration of every metadata item for every combination of geocoding component, method, and decision, nor do they contain complete metadata items for all topics described in this document. They should serve as a baseline starting point from which registries, geocoder developers, and/or vendors can begin discussing which geocode component information needs documentation. Details on each of the concepts listed in the following tables are described later in the document. Component- and decision-specific metadata items for these and other portions of the geocoding process are listed in-line throughout this document where applicable. The creation and adoption of similar metadata tables describing the complete set of geocoding topics covered in this document would be a good first step toward the eventual cross-registry standardization of geocoding processes; work on this task is currently underway.
Table 8 – Example geocoding component metadata

Input data
  Normalizer: Name of normalizer
  Normalizer version: Version of normalizer
  Normalizer strategy: Substitution-based; Context-based; Probability-based
  Standardizer: Name of standardizer
  Standardizer version: Version of standardizer
  Standard: Name of standard

Reference dataset
  Dataset type: Lines; Points; Polygons
  Dataset name: TIGER/Line files
  Dataset age: 2008
  Dataset version: 2008b

Feature matching
  Feature matcher: Name of feature matcher
  Feature matcher version: Version of feature matcher
  Feature-matching strategy: Deterministic; Probabilistic

Feature interpolation
  Feature interpolator: Name of feature interpolator
  Feature interpolator version: Version of feature interpolator
  Feature interpolator strategy: Address-range interpolation; Uniform lot interpolation; Actual lot interpolation; Geometric centroid; Bounding box centroid; Weighted centroid


Table 9 – Example geocoding process metadata

Substitution-based normalization
  Substitution table: USPS Publication 28 abbreviations
  Equivalence function: Exact string equivalence; Case-insensitive equivalence

Probabilistic feature matching
  Confidence threshold: 95%
  Probability function: Match-unmatch probability
  Attribute weights: Weight values

Uniform lot interpolation
  Dropback distance: 6 m
  Dropback direction: Reference feature orthogonal

Table 10 – Example geocoding record metadata

Substitution-based normalization
  Original data: 3620 So. Vermont Av
  Normalized data: 3620 S Vermont Ave

Probabilistic feature matching
  Match probability: 95%
  Unmatch probability: 6%

Uniform lot interpolation
  Number of lots on street: 6
  Lot width: Street length proportional
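Record-level metadata of the kind shown in Tables 8-10 might be carried alongside each geocode as a structured record, so that every decision made during production travels with the result. The field names and layout below are illustrative only; they are not a NAACCR standard or a required schema.

```python
import json

# Illustrative metadata record mirroring the example items in Tables 8-10.
geocode_metadata = {
    "input": {
        "original": "3620 So. Vermont Av",
        "normalized": "3620 S Vermont Ave",
        "normalizer_strategy": "substitution-based",
    },
    "reference_dataset": {
        "name": "TIGER/Line files",
        "type": "lines",
        "version": "2008b",
    },
    "feature_matching": {
        "strategy": "probabilistic",
        "match_probability": 0.95,
        "unmatch_probability": 0.06,
    },
    "feature_interpolation": {
        "strategy": "uniform lot",
        "dropback_distance_m": 6,
        "lots_on_street": 6,
    },
}

# Serializing the record keeps the geocode's full lineage attached to it.
print(json.dumps(geocode_metadata, indent=2))
```

A registry that stores such a record per geocode can later audit, filter, or re-geocode cases based on how each result was produced.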


5. ADDRESS DATA

This section presents a detailed examination of the issues specifically related to address data, including the various types that are possible, estimates of their accuracies, and the relationships between them.

5.1 TYPES OF ADDRESS DATA
Postal address data are the most common form of input data encountered. They can take many different forms, each with its own inherent strengths and weaknesses. These qualities are directly related to the amount of information that is encoded. There also are specific reasons for the existence of each form and, in some cases, plans for its eventual obsolescence. Several of the commonly encountered types will be described through examples and illustrations. Each example will highlight differences in the possible resolutions that can be represented and first-order estimates of expected levels of accuracy.

5.1.1 City-Style Postal Addresses
A city-style postal address describes a location in terms of a numbered building along a street. This address format can be described as consisting of a number of attributes that, when taken together, uniquely identify a postal delivery site. Several examples of traditional postal addresses for both the United States and Canada are provided in Table 11.

Table 11 – Example postal addresses
2121 West White Oaks Drive, Suite C, Springfield, IL, 62704-6495
1600 Pennsylvania Ave NW, Washington, DC 20006
Kaprielian Hall, Unit 444, 3620 S. Vermont Ave, Los Angeles, CA, 90089-0255
490 Sussex Drive, Ottawa, Ontario K1N 1G8, Canada

One of the main benefits of this format is the highly descriptive power it provides (i.e., the capability of identifying locations down to sub-parcel levels). In the United States, the attributes of a city-style postal address usually include a house number and street name, along with a city, state, and USPS ZIP Code. Each attribute may be broken down into more descriptive levels if these are not sufficient to uniquely describe a location. For example, unit numbers, fractional addresses, and/or USPS ZIP+4 Codes (United States Postal Service 2008a) are commonly used to differentiate multiple units sharing the same property (e.g., 3620 Apt 1, 3620 Apt 6E, 3620 ½, or 90089-0255 [which identifies Kaprielian Hall]). Likewise, pre- and post-directional attributes are used to differentiate individual street segments when several in the same city have the same name and are within the same USPS ZIP Code. This case often occurs when the origin of the address range of a street is in the center of a city and the range expands outward in opposite directions (e.g., the 100 North and 100 South Sepulveda Boulevard blocks, as depicted in Figure 5).


Figure 5 – Origin of both the 100 North (longer arrow pointing up and to the left) and 100 South (shorter arrow pointing down and to the right) Sepulveda Boulevard blocks (Google, Inc. 2008b)

Also, because this form of input data is so ubiquitous, suitable reference datasets and geocoders capable of processing it are widely available at many different levels of accuracy, resolution, and cost. Finally, the significant body of existing research explaining geocoding processes based upon this format makes it an enticing option for people starting out.
However, several drawbacks to using data in the city-style postal address format exist. These drawbacks are due to the multitude of possible attributes that give these addresses their descriptive power. When attributes are missing or not ordered correctly, or if extraneous information has been included, significant problems can arise during feature matching. These attributes also can introduce ambiguity when the same values can be used for multiple attributes. For instance, directional and street suffix indicators used as street names can cause confusion, as in “123 North South Street” and “123 Street Road.” Similar confusion also may arise when numbers and letters are used as street name values, as in “123 Avenue 2” and “123 N Street.” Non-English-based attributes are commonly encountered in some parts of the United States and Canada (e.g., “123 Paseo del Rey”), which further complicates the geocoding process.
A final, more conceptual problem arises from a class of locations that have ordinary city-style postal addresses but do not receive postal delivery service, such as a private development or gated community. These may be among the most difficult cases to geocode because postal address-based reference data are truly not defined for them, and systems relying heavily on postal address-based normalization or standardization may fail to process them. This also may occur with minor civil division (MCD) names (particularly townships) that are not mailing address components.

5.1.2 Post Office Box Addresses
A USPS post office (PO) box address designates a physical storage location at a U.S. post office or other mail-handling facility. By definition, these types of data do not represent the residences of individuals and should not be considered as such. Conceptually, a USPS PO box address removes the street address portion from an address, leaving only a USPS ZIP Code. Thus, USPS PO box data in most cases can never be geocoded to street-level accuracy. Exceptions include some limited-mobility facilities (e.g., nursing homes), for which a USPS PO box can be substituted with a street address using lookup tables and aliases. Also, the postal codes used in Canada serve a somewhat similar purpose but are instead among the most accurate forms of input data because of the organization of the Canadian postal system.
In the majority of cases, though, it is difficult to determine anything about the level of accuracy that can be associated with USPS PO box data in terms of how well they represent the residential location of an individual. As one example, consider the situation in which a person rents a USPS PO box at a post office near their place of employment because it is more convenient than receiving mail at their residence. If the person works in a completely different city than where they live, not even the city attribute of the USPS PO box address can be assumed to correctly represent the person’s residential location (or even the state when, for example, commuters go to Manhattan, NY, from New Jersey or Connecticut). Similarly, personal mail boxes may be reported and have the same lack of correlation with residence location. Because they are so frequently encountered, a substantial body of research exists on USPS PO boxes and their effect on the geocoding process and the studies that use them (e.g., Hurley et al. 2003, Shi 2007, and the references within).
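The lookup-table substitution mentioned above for limited-mobility facilities can be sketched as a simple alias resolution step applied before geocoding. The alias table entries here are fabricated examples; a real table would be curated from known facility records.

```python
# Fabricated alias table: known facility PO boxes -> street addresses.
PO_BOX_ALIASES = {
    "PO BOX 1234, LOS ANGELES CA 90089": "3620 S VERMONT AVE, LOS ANGELES CA 90089",
}

def resolve_po_box(address: str) -> str:
    """Substitute a facility PO box with its street address when an alias is
    known; otherwise return the input unchanged (geocodable only to the
    USPS ZIP Code level, not street level)."""
    return PO_BOX_ALIASES.get(address.strip().upper(), address)

print(resolve_po_box("PO Box 1234, Los Angeles CA 90089"))   # aliased facility
print(resolve_po_box("PO Box 9999, Los Angeles CA 90089"))   # no alias: unchanged
```

Records that fall through unchanged should be flagged so downstream users know the geocode reflects a postal facility, not a residence.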

5.1.3 Rural Route and Highway Contract Addresses

A Rural Route (RR) or Highway Contract (HC) address identifies a stop on a postal delivery route. This format is most often found in rural areas and is of the form “RR 16 Box 2,” which indicates that mail should be delivered to “Box 2” on the rural delivery route “Number 16.” These delivery locations can be composed of several physical cluster boxes at a single drop-off point where multiple residents pick up their mail, or they can be single mailboxes at single residences.

Historically, numerous problems have occurred when applying a geocoding process to these types of addresses. First and foremost, an RR by definition is a route traveled by the mail carrier denoting a path, not a single street (similar to a USPS ZIP Code, as will be discussed later). Until recently, it was therefore impossible to derive a single street name from the numbered RR portion of an RR address. Without a street name, feature matching to a reference street dataset is impossible (covered in Section 8.1). Further, the box number attribute of an RR address did not include any data needed for linear-based feature interpolation. There was no indication whether a box was standalone, nor did it relate to or inform the relative distance along a reference feature. Thus, it was unquantifiable and unusable in a feature interpolation algorithm.

Recently, however, these difficulties have begun to be resolved due to the continuing implementation of the E-911 service across the United States. In rural areas where RR addresses had historically been the predominant addressing system, producing the required E-911 geocodes from address geocoding was impossible (for the reasons just mentioned). To comply with E-911 regulations, local governments therefore assigned geocodes to the RR addresses (and their associated phone numbers) based on the existing linear-based referencing system of street mileposts.
This led to the creation and availability of a system of absolute geocodes for RR addresses. Also, for those areas where E-911 has been implemented, the USPS has taken the initiative to create the Locatable Address Conversion System (LACS) database. The primary role of this database is to enable RR to city-style postal street address conversion (United States Postal Service 2008c). The availability of this conversion tool enables a direct link between an RR postal address and the reference datasets capable of interpolation-based geocoding that require city-style postal addresses. The USPS has mandated that all Coding Accuracy Support System (CASS) certified software providers must support the LACS database to remain certified (United States Postal Service 2008b), so RR to city-style address translation is now available for most areas, but at a cost. Note that USPS CASS-certified systems are only certified to parse and standardize address data into valid USPS data. This certification is in no way a reflection of any form of certification of a geocode produced by the system.

5.1.4 USPS ZIP Codes and U.S. Census Bureau ZCTAs

The problems arising from the differences between USPS ZIP Codes and the U.S. Census Bureau’s ZIP Code Tabulation Areas (ZCTAs) are widely discussed in the geocoding literature, so they are only covered briefly in this document. The common misunderstanding is that the two refer to the same thing and can be used interchangeably, despite their published differences and the fact that their negative effects on the geocoding process have been widely publicized and documented (e.g., Krieger et al. 2002b, Hurley et al. 2003, Grubesic and Matisziw 2006). USPS ZIP Codes represent delivery routes rather than regions, while a ZCTA represents a geographic area. For an excellent review of USPS ZIP Code usage in the literature and a discussion of the differences, effects, and the multitude of ways they can be handled, see Beyer et al. (2008).

Best practices relating to the four types of input postal address data just described are listed in Best Practices 14.

Best Practices 14 – Input data types

Policy Decision: What types of input address data can and should be supported?
Best Practice: Any type of address data should be considered valid geocoding input (e.g., city-style and rural route postal addresses).

Policy Decision: What is the preferred input data specification?
Best Practice: If possible, input data should be formatted as city-style postal addresses.

Policy Decision: Should USPS PO box data be accepted for geocoding?
Best Practice: If possible, USPS PO box data should be investigated to obtain more detailed information and formatted as city-style postal addresses.

Policy Decision: Should RR or HC data be accepted for geocoding?
Best Practice: If possible, RR and HC data should be converted into city-style postal addresses.

Policy Decision: Should USPS ZIP Code and/or ZCTA data be accepted for geocoding?
Best Practice: If possible, USPS ZIP Code and/or ZCTA data should be investigated for more detailed information and formatted as a city-style postal address. If USPS ZIP Code and/or ZCTA data must be used, special care needs to be taken when using the resulting geocodes in research; see Beyer et al. (2008) for additional guidance.

Policy Decision: Should an input address ever be abandoned and not used for geocoding?
Best Practice: If the potential level of resulting accuracy is too low given the input data specification and the reference features that can be matched, lower-level portions of the input data should be used (e.g., USPS ZIP Code, city).


5.2 FIRST-ORDER ESTIMATES

The various types of input address data are capable of describing different levels of information, in both their best and worst cases. First-order estimates of the geographic resolution one can expect to achieve with each type are listed in Table 12.

Table 12 – First-order accuracy estimates

Data type                 Best Case                 Worst Case
Standard postal address   Sub-parcel                State
USPS PO box               USPS ZIP Code centroid    State
Rural route               Sub-parcel                State
U.S. National Grid        1 m2                      1,000 m2

5.3 POSTAL ADDRESS HIERARCHY

As noted in Section 5.1.1, city-style postal addresses are the most common form encountered in the geocoding process and are extremely valuable given the hierarchical structure of the information they contain. This implicit hierarchy often is used as the basis for multi-resolution geocoding processes that allow varying levels of geographic resolution in the resulting geocodes based on where a match can be made in the hierarchy. This relates directly to the ways in which people communicate and understand location, and is chiefly responsible for enabling the geocoding process to capture this same notion.

The following city-style postal address has all possible attributes filled in (excluding multiple street type suffixes), and will be used to illustrate this progression through the scales of geographic resolution as different attribute combinations are employed:

3620 ½ South Vermont Avenue East, Unit 444, Los Angeles, CA, 90089-0255

The possible variations of this address in order of decreasing geographic resolution (with 0 ranked as the highest) are listed in Table 13. Also listed are the best possible and most probable resolutions that could be achieved, along with the ambiguity introduced at each resolution. Selected resolutions are also displayed visually in Figure 6. The table and figure underscore two observations: (1) the elimination of attributes from city-style postal addresses degrades the best possible accuracy quite rapidly, and (2) different combinations of attributes have a significant impact on the geographic resolution, or granularity, of the resulting geocode. More discussion of the strengths and weaknesses of arbitrarily ranking geographic resolutions is presented in Section 15.1.


Table 13 – Resolutions, issues, and ranks of different address types

Rank 0 – 3620 North Vermont Avenue, Unit 444, Los Angeles, CA, 90089-0255
    Best resolution: 3D sub-parcel-level. Probable resolution: sub-parcel-level.
    Ambiguity: none.

Rank 1 – 3620 North Vermont Avenue, Los Angeles, CA, 90089-0255
    Best resolution: parcel-level. Probable resolution: parcel-level.
    Ambiguity: unit, floor.

Rank 2 – 3620 North Vermont Avenue, Los Angeles, CA, 90089
    Best resolution: parcel-level. Probable resolution: parcel-level.
    Ambiguity: unit, floor, USPS ZIP Code.

Rank 3 – 3620 Vermont Avenue, Los Angeles, CA, 90089
    Best resolution: parcel-level. Probable resolution: street-level.
    Ambiguity: unit, floor, street, USPS ZIP Code.

Rank 4 – Vermont Avenue, Los Angeles, CA, 90089
    Best resolution: street-level. Probable resolution: USPS ZIP Code-level.
    Ambiguity: building, unit, floor, street, USPS ZIP Code.

Rank 5 – 90089
    Best resolution: USPS ZIP Code-level. Probable resolution: USPS ZIP Code-level.
    Ambiguity: building, unit, floor, street, city.

Rank 6 – Vermont Avenue, Los Angeles, CA
    Best resolution: city-level. Probable resolution: city-level, though small
    streets may fall entirely into a single USPS ZIP Code.
    Ambiguity: building, unit, floor, street, USPS ZIP Code.

Rank 7 – Los Angeles, CA
    Best resolution: city-level. Probable resolution: city-level.
    Ambiguity: building, unit, floor, street, USPS ZIP Code.

Rank 8 – Vermont Avenue, CA
    Best resolution: state-level. Probable resolution: state-level.
    Ambiguity: building, unit, floor, street, USPS ZIP Code, city.

Rank 8 – CA
    Best resolution: state-level. Probable resolution: state-level.
    Ambiguity: building, unit, floor, street, USPS ZIP Code, city.
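The hierarchy-driven fallback that Table 13 implies can be sketched as a cascade that retries the match with progressively fewer attributes until some level of the hierarchy succeeds. The level definitions, attribute names, and functions below are illustrative assumptions for this sketch, not a reference implementation:

```python
# Hypothetical sketch of a multi-resolution fallback cascade: each step drops
# attributes from the input address and retries the match at a coarser
# geographic resolution, mirroring the progression in Table 13.

def match_cascade(address, match_fn):
    """Try resolutions from finest to coarsest; return the first that matches."""
    # Each entry: (resolution label, attributes required at that level).
    levels = [
        ("sub-parcel", ["number", "street", "unit", "city", "state", "zip4"]),
        ("parcel",     ["number", "street", "city", "state", "zip"]),
        ("street",     ["street", "city", "state", "zip"]),
        ("zip",        ["zip"]),
        ("city",       ["city", "state"]),
        ("state",      ["state"]),
    ]
    for resolution, attrs in levels:
        subset = {k: address[k] for k in attrs if address.get(k)}
        if len(subset) == len(attrs) and match_fn(resolution, subset):
            return resolution
    return None

# Toy matcher: pretend the reference data can only resolve down to ZIP level.
def toy_match(resolution, attrs):
    return resolution in ("zip", "city", "state")

addr = {"street": "Vermont Avenue", "city": "Los Angeles",
        "state": "CA", "zip": "90089"}
print(match_cascade(addr, toy_match))  # zip
```

The cascade degrades gracefully: an address missing its house number still geocodes, just at a coarser resolution, which is exactly the behavior the ranks in Table 13 describe.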


Figure 6 – Geographic resolutions of different address components (Google, Inc. 2008b)


6. ADDRESS DATA CLEANING PROCESSES

This section presents a detailed examination of the different types of processes used to clean address data and discusses specific implementations.

6.1 ADDRESS CLEANLINESS

The “cleanliness” of input data is perhaps the greatest factor determining whether a successful geocode can be produced. As Zandbergen concludes, “improved quality control during the original capture of input data is paramount to improving geocoding match rates” (2008, pp. 18). Address data are notoriously “dirty” for several reasons, including simple data entry mistakes and the use of non-standard abbreviations and attribute orderings. The addresses listed in Table 14 all refer to the same location, but are in completely different formats, exemplifying why various address-cleaning processes are required. The address-cleaning processes applied to prepare input address data for processing are detailed in the next sections.

Table 14 – Example postal addresses in different formats

3620 North Vermont Avenue, Unit 444, Los Angeles, CA, 90089-0255
3620 N Vermont Ave, 444, Los Angeles, CA, 90089-0255
3620 N. VERMONT AVE., UNIT 444, LA, CA
N Vermont 3620, Los Angeles, CA, 90089

6.2 ADDRESS NORMALIZATION

Address normalization is the process of identifying the component parts of an address so that they may be transformed into a desired format. This first step is critical to the cleaning process: without identifying which piece of text corresponds to which address attribute, it is impossible to subsequently transform them between standard formats or use them for feature matching. The typical component parts of a city-style postal address are displayed in Table 15.

Table 15 – Common postal address attribute components

Number               3620
Prefix Directional   N
Street Name          Vermont
Suffix Directional
Street Type          Ave
Unit Type            Unit
Unit Number          444
Postal Name          (Post Office name, USPS default or acceptable name for
                     given USPS ZIP Code)
USPS ZIP Code        90089-0255
State                CA
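A normalized address might be held in a simple record such as the following sketch. The field names mirror the components in Table 15 but are assumptions of this illustration, not a prescribed schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative container for the normalized attribute components of a
# city-style postal address (cf. Table 15); all fields optional because
# real input addresses frequently omit attributes.
@dataclass
class NormalizedAddress:
    number: Optional[str] = None            # e.g., "3620"
    prefix_directional: Optional[str] = None  # e.g., "N"
    street_name: Optional[str] = None       # e.g., "Vermont"
    suffix_directional: Optional[str] = None
    street_type: Optional[str] = None       # e.g., "Ave"
    unit_type: Optional[str] = None         # e.g., "Unit"
    unit_number: Optional[str] = None       # e.g., "444"
    postal_name: Optional[str] = None       # e.g., "Los Angeles"
    state: Optional[str] = None             # e.g., "CA"
    zip_code: Optional[str] = None          # e.g., "90089-0255"

addr = NormalizedAddress(number="3620", prefix_directional="N",
                         street_name="Vermont", street_type="Ave",
                         unit_type="Unit", unit_number="444",
                         postal_name="Los Angeles", state="CA",
                         zip_code="90089-0255")
print(addr.street_name)  # Vermont
```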


The normalization algorithm must attempt to identify the most likely address attribute to associate with each component of the input address. Decades of computer science research have been invested in this difficult parsing problem. Many techniques can be applied to it, some developed specifically to address it and others developed for other purposes but nonetheless directly applicable. These approaches range in their level of sophistication; examples from the simplistic to the highly advanced will now be described.

6.2.1 Substitution-Based Normalization

Substitution-based normalization makes use of lookup tables to identify commonly encountered terms based on their string values. This is the most popular method because it is the easiest to implement. This simplicity also makes it applicable to the fewest cases (i.e., it is only capable of substituting correct abbreviations and eliminating [some] extraneous data).

In this method, tokenization converts the string representing the whole address into a series of separate tokens by processing it left to right, with embedded spaces used to separate tokens. Because of this linear sequential processing, the original order of the input attributes is critical. A typical system will endeavor to populate an internal representation of the parts of the street address listed in Table 15, in the order presented. A set of matching rules defines the valid content each attribute can accept and is used in conjunction with lookup tables that list synonyms for identifying common attribute values. As each token is encountered, the system tries to match it to the next empty attribute in its internal representation, in sequential order. The lookup tables attempt to identify known token values from common abbreviations such as directionals (e.g., “n” being equal to “North,” with either being valid), while the matching rules limit the types of values that can be assigned to each attribute. To see how this works, the following address will be processed, matching it to the order of attributes listed in Table 15:

“3620 Vermont Ave, RM444, Los Angeles, CA 90089”

In the first step, a match is attempted between the first token of the address, “3620,” and the internal attribute at the first index, “number.” This token satisfies the matching rule for the attribute (i.e., that the data must be a number), so it is accepted and assigned. Next, a match is attempted between the second token, “Vermont,” and the address attribute at the second index, the pre-directional. This time, the match fails because the matching rule for this attribute requires a valid form of a directional, which this word is not. The current token “Vermont” is then tried against the next attribute (index 3, street name). The matching rule for this attribute places no restrictions on content, so the token is assigned. The next token, “Ave,” is tried against the attribute at index 4 (the post-directional), which fails. Another match is attempted with the attribute at the next index (5, street type), which succeeds, so it is assigned. The remaining tokens are subsequently assigned in a similar manner.

It is easy to see how this simplistic method can become problematic when keywords valid for one attribute, such as “Circle” and “Drive,” are used for others, as in “123 Circle Drive West,” with neither in the expected position of a street suffix type. Best practices related to substitution-based normalization are listed in Best Practices 15.
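The sequential matching just walked through might be sketched in code as follows. The matching rules and synonym tables here are small toy assumptions, not the full USPS Publication 28 tables a production system would use:

```python
import re

# Minimal sketch of substitution-based normalization: tokens are consumed
# left to right and assigned to the next unfilled attribute whose matching
# rule they satisfy. Rules and synonym tables are illustrative only.
DIRECTIONALS = {"n": "N", "north": "N", "s": "S", "south": "S",
                "e": "E", "east": "E", "w": "W", "west": "W"}
STREET_TYPES = {"ave": "Ave", "avenue": "Ave", "st": "St", "street": "St",
                "rd": "Rd", "road": "Rd"}

RULES = [
    ("number",           lambda t: t.isdigit()),
    ("pre_directional",  lambda t: t.lower() in DIRECTIONALS),
    ("street_name",      lambda t: True),                  # unrestricted
    ("post_directional", lambda t: t.lower() in DIRECTIONALS),
    ("street_type",      lambda t: t.lower() in STREET_TYPES),
]

def normalize(address):
    tokens = re.split(r"[,\s]+", address.strip())
    result = {}
    i = 0
    for token in tokens:
        # Try the token against each remaining attribute, in order.
        while i < len(RULES):
            attr, rule = RULES[i]
            i += 1
            if rule(token):
                result[attr] = token
                break
        else:
            break  # ran out of attributes; the remainder is left unparsed
    return result

print(normalize("3620 Vermont Ave"))
# {'number': '3620', 'street_name': 'Vermont', 'street_type': 'Ave'}
```

Because each attribute is considered only once and in order, this sketch exhibits exactly the fragility described above: a misplaced keyword such as “Circle” in “123 Circle Drive West” would be captured as the street name and derail the remaining assignments.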


Best Practices 15 – Substitution-based normalization

Policy Decision: When should substitution-based normalization be used?
Best Practice: Substitution-based normalization should be used as a first step in the normalization process, especially if no other more advanced methods are available.

Policy Decision: Which matching rules should be used in substitution-based normalization?
Best Practice: Any deterministic set of rules that create reproducible results that are certifiably valid should be considered acceptable.

Policy Decision: Which lookup tables (substitution synonyms) should be used in substitution-based normalization?
Best Practice: At a minimum, the USPS Publication 28 synonyms should be supported (United States Postal Service 2008d).

Policy Decision: Which separators should be used for tokenization?
Best Practice: At a minimum, whitespace should be used as a token separator.

Policy Decision: What level of token matching should be used for determining a match or non-match?
Best Practice: At a minimum, an exact character-level match should be considered a match.

6.2.2 Context-Based Normalization

Context-based normalization makes use of syntactic and lexical analysis to identify the components of the input address. The main benefit of this less commonly applied method is its support for reordering input attributes; this also makes it more complicated and harder to implement. Its steps are very similar to those taken by a programming language compiler, a tool used by programmers to produce an executable file from plain text source code written in a high-level programming language.

The first step, scrubbing, removes illegal characters and white space from the input datum. The input string is scanned left to right and all invalid characters are removed or replaced. Punctuation marks (e.g., periods and commas) are all removed, and all white-space characters are collapsed into a single space. All characters are then converted into a single common case, either upper or lower.

The next step, lexical analysis, breaks the scrubbed string into typed tokens. Tokenization converts the scrubbed string into a series of tokens using single spaces as the separator. The order of the tokens remains the same as in the input address. These tokens are then assigned a type based on their character content, such as numeric (e.g., “3620”), alphabetic (e.g., “Vermont”), or alphanumeric (e.g., “RM444”).

The final step, syntactic analysis, places the tokens into a parse tree based on a grammar. This parse tree is a data structure representing the decomposition of an input string into its component parts. The grammar is the organized set of rules that describe the language, in this case the valid combinations of tokens that can legitimately make up an address. Grammars are usually written in Backus-Naur form (BNF), a notation for describing grammars as combinations of valid components. The following is an example of an address described in BNF, in which a postal address is composed of two components: (1) the street-address-part, and (2) the locality-part. The street-address-part is composed of a house-number, a street-name-part, and an optional suite-number and suite-type, which would be preceded by a comma if they existed. The remaining components are composed in a similar fashion:


<postal-address>      ::= <street-address-part> <locality-part>
<street-address-part> ::= <house-number> <street-name-part> {"," <suite-type> <suite-number>}
<street-name-part>    ::= {<pre-directional>} <street-name> {<street-type>}
<locality-part>       ::= <city> "," <state> <zip-code> {"+" <zip4>}

The difficult part of context-based normalization is that the tokens described thus far have only been typed to the level of the characters they contain, not to the domain of address attributes (e.g., street name, post-directional). This level of domain-specific token typing can be achieved using lookup tables for common substitutions that map tokens to address components based on both character types and values. It is possible that a single token can be mapped to more than one address attribute. Thus, the tokens can be rearranged and placed in multiple orders that all satisfy the grammar, and constraints must be imposed to limit erroneous assignments. Possible options include using an iterative method that enforces the original order of the tokens as a first try, then relaxes the constraint by allowing only tokens of specific types to be moved in a specific manner, and so on. Certain keywords also can be suppressed such that their importance or relevance is minimized.

This is where context-based normalization becomes difficult in practice: writing these relaxation rules properly, in the correct order. One must walk a fine line and carefully consider what should be done to which components of the address and in what order; otherwise, the tokens in the input address might be moved from their original positions and seemingly produce “valid” addresses that misrepresent the true address. Best practices related to context-based normalization are listed in Best Practices 16.
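The scrubbing and lexical-analysis steps described above can be sketched as follows. The character whitelist and type names are illustrative choices for this sketch; a real implementation would follow the scrubbing rules in Best Practices 16:

```python
import re

# Sketch of the first two compiler-like steps of context-based normalization:
# (1) scrub - drop invalid characters, collapse whitespace, fold case;
# (2) lex - split into tokens and type each one by its character content.
def scrub(raw):
    # Keep alphanumerics, slashes, dashes, and spaces; everything else
    # (periods, commas, etc.) is replaced by a space, then collapsed.
    cleaned = re.sub(r"[^A-Za-z0-9/\- ]", " ", raw)
    return re.sub(r"\s+", " ", cleaned).strip().upper()

def lex(scrubbed):
    tokens = []
    for tok in scrubbed.split(" "):
        if tok.isdigit():
            kind = "numeric"
        elif tok.isalpha():
            kind = "alphabetic"
        else:
            kind = "alphanumeric"
        tokens.append((tok, kind))
    return tokens

print(lex(scrub("3620 Vermont Ave., RM444")))
# [('3620', 'numeric'), ('VERMONT', 'alphabetic'),
#  ('AVE', 'alphabetic'), ('RM444', 'alphanumeric')]
```

The typed token stream produced here is what the syntactic-analysis step would then arrange into a parse tree under the BNF grammar.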

6.2.3 Probability-Based Normalization

Probability-based normalization makes use of statistical methods to identify the components of the input address. It derives mainly from the field of machine learning, a subfield of computer science dealing with algorithms that induce knowledge from data. In particular, it is an example of record linkage, the task of finding features in two or more datasets that essentially refer to the same feature. These methods excel at handling the difficult cases: those that require combinations of substitutions, reordering, and removal of extraneous data. Being so powerful, they typically are very difficult to implement, and usually are seen only in research scenarios.

These algorithms essentially treat the input address as unstructured text that needs to be semantically annotated with the appropriate attributes from the target domain (i.e., address attributes). The key to this approach is the development of an optimal reference set, which is the set of candidate features that may possibly match an input feature. This term should not be confused with the reference datasets containing the reference features, even though the reference set will most likely be built from them. The reference set defines the search space of possible matches that a feature-matching algorithm processes to determine an appropriate match. In most cases, the complexity of performing this search (i.e., processing time) grows linearly with the size of the reference set. In the worst case, the search space can be composed of the entire reference dataset, resulting in non-optimal searching. The intelligent use of blocking schemes, or strategies designed to narrow the set of candidate values (O’Reagan and Saalfeld 1987, Jaro 1989), can limit the size of the search space.
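A blocking scheme can be illustrated with a toy sketch: rather than comparing an input against every reference record, candidates are limited to records sharing a cheap blocking key. The key choice (ZIP code) and the records below are assumptions for illustration:

```python
from collections import defaultdict

# Toy illustration of a blocking scheme for record linkage: records are
# grouped by a blocking key so that only records sharing the input's key
# need to be compared in detail.
def build_blocks(reference, key):
    blocks = defaultdict(list)
    for rec in reference:
        blocks[rec[key]].append(rec)
    return blocks

reference = [
    {"street": "Vermont Ave", "zip": "90089"},
    {"street": "Hoover St",   "zip": "90089"},
    {"street": "Main St",     "zip": "90012"},
]

blocks = build_blocks(reference, "zip")
candidates = blocks["90089"]  # only 2 of the 3 records need comparing
print(len(candidates))  # 2
```

Good blocking keys trade a small risk of excluding the true match against a large reduction in the number of expensive pairwise comparisons.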


Best Practices 16 – Context-based normalization

Policy Decision: When should context-based normalization be used?
Best Practice: If the correct software can be acquired or developed, context-based normalization should be used.

Policy Decision: Which characters should be considered valid and exempt from scrubbing?
Best Practice: All alpha-numeric characters should be considered valid. Forward slashes, dashes, and hyphens should be considered valid when they are between other valid characters (e.g., 1/2 or 123-B).

Policy Decision: What action should be taken with scrubbed characters?
Best Practice: Non-valid (scrubbed) characters should be removed and not replaced with any character.

Policy Decision: Which grammars should be used to define the components of a valid address?
Best Practice: Any grammar based on existing addressing standards can be used (e.g., the OASIS xAL Standard [Organization for the Advancement of Structured Information Standards 2008] or the proposed URISA/FGDC address standard [United States Federal Geographic Data Committee 2008b]). The grammar chosen should be representative of the address data types the geocoding process is likely to see.

Policy Decision: What level of token matching should be used for determining a match or non-match?
Best Practice: Only exact case-insensitive character-level matching should be considered a match.

Policy Decision: How far from their original position should tokens within branches of a parse tree be allowed to move?
Best Practice: Tokens should be allowed to move no more than two positions from their original location.

After creating a reference set, matches and non-matches between input address elements and their normalized attribute counterparts can be determined. The input elements are scored against the reference set individually as well as collectively using several measures. These scores are combined into vectors, and their likelihood of being matches or non-matches is determined using tools such as support vector machines (SVMs) that have been trained on a representative dataset. For complete details of a practical example using this method, see Michelson and Knoblock (2005). Best practices related to probability-based normalization are listed in Best Practices 17.
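The idea of scoring an input element against a reference-set entry with several measures and combining them can be sketched as follows. The specific measures and weights here are illustrative assumptions, and the simple weighted sum stands in for the trained SVM classifier of Michelson and Knoblock (2005):

```python
from difflib import SequenceMatcher

# Sketch: score a candidate pair with several string-similarity measures,
# collect them into a vector, and reduce the vector to one composite score.
def similarity_vector(a, b):
    a, b = a.lower(), b.lower()
    whole = SequenceMatcher(None, a, b).ratio()          # whole-string ratio
    tokens_a, tokens_b = set(a.split()), set(b.split())
    token = len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)
    prefix = 1.0 if b.startswith(a[:4]) else 0.0         # shared prefix flag
    return [whole, token, prefix]

def composite(vec, weights=(0.5, 0.3, 0.2)):
    # Hand-picked weights stand in for a trained classifier here.
    return sum(score * w for score, w in zip(vec, weights))

vec = similarity_vector("Vermont Ave", "Vermont Avenue")
print(composite(vec) > 0.5)  # True
```

In a real probability-based system the vector would be fed to a classifier trained on labeled match/non-match pairs rather than combined with fixed weights.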


Best Practices 17 – Probability-based normalization

Policy Decision: When should probability-based normalization be used?
Best Practice: If the output certainty of the resulting geocodes meets an acceptable threshold, probability-based normalization should be considered a valid option. Experiments should be run to determine what an appropriate threshold should be for a particular registry. These experiments should contrast the probability of getting a false positive versus the repercussions such an outcome will cause.

Policy Decision: What level of composite score should be considered a valid match?
Best Practice: This will depend on the confidence that is required by the consumers of the geocoded data. At a minimum, a composite score of 95% or above should be considered a valid match.

6.3 ADDRESS STANDARDIZATION

More than one address standard may be required or in use at a registry, whether during the geocoding process or for other purposes outside of it. Therefore, after attribute identification and normalization, transformation between common address standards may be required. The difficult portion of this process is writing the mapping functions: the algorithms that translate between a normalized form and a target output standard. These functions transform attributes into the desired formats by applying such tasks as abbreviation substitution, reduction, or expansion, and attribute reordering, merging, or splitting. These transformations are encoded within the mapping functions for each attribute in the normalized form.

Mapping functions must be defined a priori for each of the potential standards into which the geocoder may have to translate an input address, and there are commonly many. To better understand this, consider that during feature matching, the input address must be in the same standard as that used for the reference dataset before a match can be attempted. Therefore, the address standard used by every reference dataset in a geocoder must be supported (i.e., a mapping function is required for each). With the mapping functions defined a priori, the standardization process can simply execute the appropriate transformation on the normalized input address, producing a properly standardized address ready for the reference data source.

In addition to these technical requirements for address standard support, registries must select an address standard for their staff to report and record the data in. Several existing and proposed address standards were listed previously in Table 6. NAACCR recommends that when choosing an address standard, registries abide by the data standards in Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008), which reference United States Postal Service Publication 28 - Postal Addressing Standards (United States Postal Service 2008d) and the Canadian equivalent, Postal Standards: Lettermail and Incentive Lettermail (Canada Post Corporation 2008). Best practices related to address standardization are listed in Best Practices 18.
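A mapping function can be sketched as below: one function per target standard, each encoding the abbreviation substitutions and formatting that standard requires. The abbreviation tables are a small hand-picked subset in the style of USPS Publication 28, not the full standard, and the function name is an assumption of this sketch:

```python
# Illustrative mapping function: transform a normalized address (held as a
# dict) into one target standard's string form. A geocoder would define one
# such function per supported standard, a priori.
STREET_TYPE_ABBREVS = {"Avenue": "AVE", "Street": "ST", "Boulevard": "BLVD"}
DIRECTIONAL_ABBREVS = {"North": "N", "South": "S", "East": "E", "West": "W"}

def to_uppercase_abbreviated(addr):
    """Map a normalized address to an upper-case, abbreviated standard."""
    parts = [
        addr.get("number"),
        DIRECTIONAL_ABBREVS.get(addr.get("pre_directional"),
                                addr.get("pre_directional")),
        addr.get("street_name"),
        STREET_TYPE_ABBREVS.get(addr.get("street_type"),
                                addr.get("street_type")),
    ]
    # Drop missing attributes, join, and fold to the standard's case.
    return " ".join(p for p in parts if p).upper()

addr = {"number": "3620", "pre_directional": "North",
        "street_name": "Vermont", "street_type": "Avenue"}
print(to_uppercase_abbreviated(addr))  # 3620 N VERMONT AVE
```

Because the input to every mapping function is the same normalized form, adding support for a new reference dataset's standard means writing one new function rather than new pairwise translations.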


Best Practices 18 – Address standardization

Policy Decision: Which input data standards should be supported for standardization?
Best Practice: At a minimum, all postal address standards for all countries for which geocoding is to be performed should be supported.

Policy Decision: Which address standard should be used for record representation?
Best Practice: Only a single address standard should be used for recording standardized addresses along with a patient/tumor record. This should be the standard defined in the NAACCR publication Data Standards for Cancer Registries, Volume II. All input data should be standardized according to these guidelines.

Policy Decision: Which mapping functions should be used?
Best Practice: Mapping functions for all supported address standards should be created or obtained.

6.4 ADDRESS VALIDATION

Address validation is another important component of address cleaning; it determines whether an input address represents a location that actually exists. Validation should always be attempted because it has a direct effect on the accuracy of the geocode produced for the input data in question, as well as on other addresses that may be related to it (e.g., when performing linear-based interpolation as discussed in Section 9.2). Performing address validation as close to data entry as possible is the surest way to improve all aspects of the quality of the resulting geocode. Note that even though some addresses may validate, they still may not be geocodable due to problems or shortcomings with the reference dataset (the reverse also is true), which will be covered in more detail in Section 13.

In the ideal case, this validation will take place not at the central registry, but at the hospital. This practice currently is being implemented in several states (e.g., Kentucky, North Carolina, and Wisconsin) and is beginning to look like a feasible option, although regulations in some areas may prohibit it. The most commonly used source is the USPS ZIP+4 database (United States Postal Service 2008a), but others may be available for different areas and may provide additional help.

The simplest way to attempt address validation is to perform feature matching using a reference dataset containing discrete features. Discrete features are those in which a single feature represents only a single real-world entity (e.g., a point feature), as opposed to a feature that represents a range or series of real-world entities (e.g., a line feature), as described in Section 7.2.3. In contrast, continuous features can correspond to multiple real-world objects, such as a street segment, which has an address range that can correspond to several addresses. A simple approach would be to use a USPS CASS-certified product to validate each of the addresses, but because CASS systems are built for bulk mailers they are prohibited from validating against segment-like reference data, so parcel or address point reference data must be used.

An example of this can be seen in the address validation application shown in Figure 7, which can be found on the USC GIS Research Laboratory Web site (https://webgis.usc.edu). This image shows the USC Static Address Validator (Goldberg 2008b), a Web-based address validation tool that uses the USPS ZIP+4 database to search for all valid addresses that match the address entered by the user. Once the user clicks search, either zero, one, or more than one potential address will be returned, indicating that the information entered did not match any address, matched an exact address, or matched multiple addresses. This information allows the user to validate the address in question by determining and correcting any attributes that are wrong or incomplete and could have led to geocoding errors had the non-validated address been used directly.

Figure 7 – Example address validation interface (https://webgis.usc.edu)

If feature matching applied to a reference dataset of discrete features succeeds, the matched feature returned will be in one of two categories: a true or false positive. A true positive is the case when an input address is returned as being true, and is in fact true (e.g., it actually exists in the real world). A false positive is the case when an input address is re- turned as being true, and is in fact false (e.g., it does not actually exist in the real world). If feature matching fails (even after attempting attribute relaxation as described in Section 8.3.1) the input address will fall again into one of two categories, a true or false negative. A

52 November 10, 2008 D. W. Goldberg true negative is the case when an input address is returned as being false, and is in fact false. A false negative is the case when an input address is returned as being false, and is in fact true (e.g., it does actually exist in the real world). Both false positives and negatives also can occur due to temporal inaccuracy of reference datasets. False positives occur when the input address is actually invalid but appears in the reference dataset (e.g., it has not yet been removed). False negatives occur when the input address exists, but is not present in the reference dataset (e.g., it has not yet been added). To address these concerns, the level of confidence for the temporal accuracy of a reference da- taset needs to be determined and utilized. To assess this level of confidence, a registry will need to consider the frequency of reference dataset update as well as address lifecycle man- agement in the region and characteristics of the region (e.g., how old is the reference set, how often is it updated, and how frequently do addresses change in the region). More details on the roots of temporal accuracy in reference datasets are described in Section 13.3. Common reference data sources that can be used for address verification are listed in Table 16. Although parcel data have proven very useful for address data, it should be noted that in most counties, assessors are under no mandate to include the situs address of a par- cel (the actual physical address associated with the parcel) in their databases. In these cases, the mailing address of the owner may be all that is available, but may or may not be the ac- tual address of the actual parcel. As such, E-911 address points may be an alternative and better option for performing address validation. Best practices related to address validation are listed in Best Practices 19. 
Recent work by Zandbergen (2008) provides further discussion of the effect that discrete (address point- or parcel-based) versus continuous (address range-based street segment) reference datasets have on achievable match rates.

Table 16 – Common address verification data sources
USPS ZIP+4 (United States Postal Service 2008a)
U.S. Census Bureau Census Tracts
County or municipal assessor parcels

There appears to be general consensus among researchers and registries that improving address data quality at the point of collection should be a task that is investigated, with its eventual implementation into existing data entry systems a priority. It is as of yet unclear how utilizing address validation tools like the USC Web-based address validator shown in this section may or may not slow down the data entry process, because there have been no published reports detailing registry and/or staff experiences, time estimates, or overall cost increases. However, preliminary results presented at the 2008 NAACCR Annual Meeting (Durbin et al. 2008) on the experience of incorporating a similar system into a data entry system used by the State of Kentucky seem to indicate that the time increases are manageable, with proper user interface design having a large impact. More research is needed on this issue to determine the costs and benefits that can be obtained using these types of systems and the overall impact that they will have on resulting geocode quality.


Best Practices 19 – Address validation

When should address validation be used?
If a trusted, complete address dataset is available, it should be used for address validation during both address standardization and feature matching and interpolation.

Which data sources should be used for address validation?
The temporal footprint of the address validation source should cover the period for which the address in question was supposed to have existed in the dataset. If an assessor parcel database is available, it should be used as an address validation reference dataset.

What should be done with invalid addresses?
If an address is found to be invalid during address standardization, it should be corrected. If an invalid address is not correctable, it should be associated with the closest valid address.

What metadata should be maintained?
If an address is corrected or assigned to the closest valid address, the action taken should be recorded in the metadata, and the original address should be kept as well.


7. REFERENCE DATASETS

This section identifies and describes the different types of reference datasets and the relationships between them.

7.1 REFERENCE DATASET TYPES

Vector-based data, such as the U.S. Census Bureau's TIGER/Line files (United States Census Bureau 2008c), are the most frequently encountered reference datasets because their per-feature representations allow for easy feature-by-feature manipulation. The pixel-based format of raster-based data, such as digital orthophotos, can be harder to work with, which generally makes such data less applicable to geocoding. However, some emerging geocoding processes do employ raster-based data for several specific tasks, including feature extraction and correction (as discussed later).

7.2 TYPES OF REFERENCE DATASETS

The following sections offer more detail on the three types of vector-based reference datasets frequently used in geocoding processes (linear-, areal unit-, and point-based), organized by their degree of common usage in the geocoding process. The descriptions of each will, for the most part, be generalizations applicable to the whole class of reference data. It should also be noted that the true accuracy of a data source can only be determined with the use of a GPS device, or in some cases imagery, so these discussions again are generalizations about classes of data sources. An excellent discussion of the benefits and drawbacks of geocoding algorithms based on each type of reference dataset is available in Zandbergen (2008).

7.2.1 Linear-Based Reference Datasets

A linear-based (line-based) reference dataset is composed of linear-based data, which can be either simple-line or polyline vectors. The type of line vector contained typically can be used as a first-order estimate of the descriptive quality of the reference data source. Reference datasets containing only simple straight-line vectors usually will be less accurate than reference datasets containing polyline vectors for the same area (a straight line is merely the shortest possible distance between two endpoints). Curves typically are represented in these datasets by breaking single straight-line vectors into multiple segments (i.e., polylines). This scenario is depicted in Figure 8, which shows a polyline more accurately describing the shape of a street segment than a straight line.


Figure 8 – Vector reference data of different resolutions (Google, Inc. 2008b)

Line-based datasets underpin typical conceptions of the geocoding process and are by far the most cited in the geocoding literature. Most are representations of street networks (graphs), which are an example of a topologically connected set of nodes and edges. The nodes (vertices) are the endpoints of the line segments in the graph, and the edges (arcs) are the lines connecting the endpoints. The term "network" refers to the topological connectivity resulting from reference features sharing common endpoints, such that it is possible to traverse through the network from feature to feature. The literature commonly defines a graph as G = (V, E), indicating that the graph G is composed of the set of vertices V and the set of edges E. The inherent topological connectedness of these graphs enables searching; Dijkstra's (1959) shortest path algorithm, for example, is frequently used for route planning. Several well-known examples of street networks are provided in Table 17. Further details on street networks, alternative distance estimations, and their application to accessibility within the realm of cancer prevention and control can be found in Armstrong et al. (2008) and the references within.
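The shortest-path search enabled by this connectivity can be sketched with a minimal version of Dijkstra's algorithm over a toy street network; the node names and segment lengths below are invented for illustration.

```python
import heapq

def shortest_path_length(graph, source, target):
    """Dijkstra's (1959) algorithm over a street network G = (V, E).
    `graph` maps each node to a dict of {neighbor: edge_length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")  # target unreachable from source

# Toy street network: intersections A-D joined by segments (lengths in meters).
network = {
    "A": {"B": 100.0, "C": 250.0},
    "B": {"A": 100.0, "C": 100.0},
    "C": {"A": 250.0, "B": 100.0, "D": 50.0},
    "D": {"C": 50.0},
}
print(shortest_path_length(network, "A", "D"))  # 250.0, via A-B-C-D
```

The shared-endpoint topology is what makes this traversal possible; without it, the segments would be an unordered collection of lines rather than a searchable network.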

Table 17 – Common linear-based reference datasets
Name | Description | Coverage
U.S. Census Bureau's TIGER/Line files (United States Census Bureau 2008c) | Street centerlines | U.S.
NAVTEQ Streets (NAVTEQ 2008) | Street centerlines | Worldwide
Tele Atlas Dynamap, MultiNet (Tele Atlas 2008a, c) | Street centerlines | Worldwide

The first dataset listed, the U.S. Census Bureau's TIGER/Line files, is the most commonly used reference dataset in geocoding. The next two are competing products that are commercial derivatives of the TIGER/Line files for the United States. All three products essentially provide the same type of data, with the commercial versions containing improvements over the TIGER/Line files in terms of reference feature spatial accuracy and the inclusion of more aspatial attributes. The accuracy differences between products can be stunning, as can the differences in their cost. Commercial companies employ individuals to drive GPS-enabled trucks to obtain GPS-level accuracy for their polyline street vector representations. They also often include areal unit-based geographic features (polygons) (e.g., hospitals, parks, water bodies), along with data that they have purchased or collected themselves. These data collection tasks are not inexpensive, and these data therefore are usually very expensive, typically costing on the order of tens of thousands of dollars. However, part of the purchase price usually includes yearly or quarterly updates to the entire reference dataset, resulting in very temporally accurate reference data. In contrast, new releases of the TIGER/Line files have historically corresponded to the decennial Census, resulting in temporal accuracy far behind their commercial counterparts. Also, even though support for polyline representations is built into the TIGER/Line file data format, most features contained are in fact simple lines, with very few areal unit-based features included. However, while the commercial versions are very expensive, TIGER/Line files are free (or, to avoid time-consuming downloading, are available for reasonable fees on DVD), making them an attractive option.
Also, beginning in 2002, updates to the TIGER/Line files have been released once and now twice per year, resulting in continually improving spatial and temporal accuracy. In some areas, states and municipalities have created much higher quality line files; these eventually will be, or already have been, incorporated into the TIGER/Line files. Beginning in 2007, the U.S. Census Bureau has released MAF/TIGER files to replace the annual TIGER/Line files; these merge the U.S. Census Bureau's Master Address File (MAF) with the TIGER databases to create a relational database management system (RDBMS) (United States Census Bureau 2008b). Recent studies have begun to show that in some areas the TIGER/Line files are essentially as accurate as commercial files (Ward et al. 2005), and they are becoming more so over time. Some of this change is due to the U.S. Census Bureau's MAF/TIGER integration and adoption of the new American Community Survey (ACS) system (United States Census Bureau 2008a), which itself includes a large effort focused on improving the TIGER/Line files; other improvements are due to pressure from the FGDC. These improvements are enabling greater public participation, allowing higher-accuracy local-scale knowledge of street features and associated attributes (e.g., address ranges) to inform and improve the national-scale products.

All of the products listed in Table 17 share the attributes listed in Table 18. These represent the attributes typically required for feature matching using linear-based reference datasets. Note that most of these attributes correspond directly to the components of city-style postal address-based input data.

Table 18 – Common postal address linear-based reference dataset attributes
Attribute | Description
Left side street start address number | Beginning of the address range for the left side of the street segment
Right side street start address number | Beginning of the address range for the right side of the street segment
Left side street end address number | End of the address range for the left side of the street segment
Right side street end address number | End of the address range for the right side of the street segment
Street prefix directional | Street directional indicator
Street suffix directional | Street directional indicator
Street name | Name of street
Street type | Type of street
Right side ZCTA | ZCTA for addresses on the right side of the street
Left side ZCTA | ZCTA for addresses on the left side of the street
Right side municipality code | A code representing the municipality for the right side
Left side municipality code | A code representing the municipality for the left side
Right side county code | A code representing the county for the right side
Left side county code | A code representing the county for the left side
Feature class code | A code representing the class of the feature

In street networks, it is common for each side of the reference feature to be treated separately. Each can be associated with different address ranges and ZCTAs, meaning that one side of the street can be in one ZCTA while the other is in another (i.e., the street forms the boundary between two ZCTAs). The address ranges on each side do not necessarily need to be related, although they most commonly are. Attributes of lower geographic resolutions than the ZCTA (city name, etc.) usually are represented in the form of a code (e.g., a Federal Information Processing Standard [FIPS] code [National Institute of Standards and Technology 2008]), also applied to each side of the street independently. All features typically include an attribute identifying the class of the feature (e.g., a major highway without a separator, major highway with a separator, minor road, tunnel, freeway onramp). These classifications serve many functions, including allowing different classes of roads to be included or excluded during the feature-matching process, and enabling first-order estimates of road widths based on the class of road, the typical number of lanes in that class, and the typical lane width. In the TIGER/Line files released before March 2008, these are represented by a Feature Classification Code (FCC), which has subsequently been changed to the MAF/TIGER Feature Class Code (MTFCC) in the upgrade to MAF/TIGER files (United States Census Bureau 2008c). In the more advanced commercial versions, additional information such as one-way roads, toll roads, etc., is indicated by binary true/false values for each possible attribute.
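The per-side address ranges in Table 18 are what feature matching tests an input house number against. The sketch below is a hedged illustration only: the field names mirror the table's attributes but are not drawn from any product's actual schema, and the sample ranges are invented.

```python
# Hypothetical street segment record mirroring the Table 18 attributes
# (field names and values are illustrative, not from any vendor schema).
segment = {
    "street_name": "MAIN", "street_type": "ST",
    "left_from": 101, "left_to": 199,    # odd house numbers on the left side
    "right_from": 100, "right_to": 198,  # even house numbers on the right side
    "left_zcta": "90089", "right_zcta": "90007",
}

def match_side(seg, house_number):
    """Return which side of the segment (if any) an address number falls on,
    using the separate left/right ranges and the parity they imply."""
    for side in ("left", "right"):
        lo, hi = seg[f"{side}_from"], seg[f"{side}_to"]
        if lo <= house_number <= hi and house_number % 2 == lo % 2:
            return side
    return None

print(match_side(segment, 123))  # "left": odd, and within 101-199
```

Because each side carries its own ZCTA and municipality codes, identifying the side is what lets the geocoder also report the correct lower-resolution attributes for the match.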

7.2.2 Polygon-Based Reference Datasets

A polygon-based reference dataset is composed of polygon-based data. These datasets are interesting because they can represent both the most accurate and the most inaccurate forms of reference data. When the dataset represents true building footprints, it can be the most accurate data source one could hope for if the footprints are based on surveys; footprints derived from photographs have less, or unknown, accuracy. Conversely, when the polygons represent cities or counties, the dataset quickly becomes less appealing. Most polygon-based datasets contain only single-polygon representations, although some include polygons with multiple rings. Three-dimensional reference datasets such as building models are founded on these multi-polygon representations. Polygon reference features often are difficult and expensive to create initially, but when available, they typically are on the higher side of the accuracy spectrum. Table 19 lists some examples of polygon-based vector reference datasets, along with estimates of their coverages and costs.

Table 19 – Common polygon-based reference datasets
Source | Description | Coverage | Cost
Tele Atlas (2008c), NAVTEQ (2008) | Building footprints, parcel footprints | Worldwide, but sparse | Expensive
County or municipal assessors | Building footprints, parcel footprints | U.S., but sparse | Relatively inexpensive, but varies
U.S. Census Bureau | Census Block Groups, Census Tracts, ZCTA, MCD, MSA, Counties, States | U.S. | Free

The highest quality datasets one can usually expect to encounter are building footprints. These data typically enable the geocoding process to return a result with an extremely high degree of accuracy; automated geocoding results of higher quality are generally only obtainable through the use of 3-D models such as that shown in Figure 9. Three-dimensional models also are built from polygon representations but are even less commonly encountered.


Figure 9 – Example 3D building models (Google, Inc. 2008a)

Although both building footprints and 3-D polygons are becoming more commonplace in commercial mapping applications (e.g., Microsoft Virtual Earth and Google Maps have both for portions of hundreds of cities worldwide), these datasets often are difficult or costly to obtain, typically requiring a substantial monetary investment. They are most often found for famous or public buildings in larger cities, or for buildings on campuses where the owning organization has commissioned their creation. It is quite rare that building footprints will be available for every building in an entire city, especially for residential structures, but more and more are becoming available all the time. A person attempting to gather reference data can become frustrated because although maps depicting building footprints are widely available, digital copies of the underlying datasets can be difficult, if not impossible, to obtain. This happens frequently with paper maps created for insurance purposes (e.g., Sanborn Maps) and static digital images such as the USC Campus Map (University of Southern California 2008) shown in Figure 10. In many cases, it is obvious that digital geographic polygon data serve as the basis for online interactive mapping applications, as in the UCLA Campus Map shown in Figure 11 (University of California, Los Angeles 2008), but often these data are not made available to the general public for use as a reference dataset within a geocoding process.


Figure 10 – Example building footprints in raster format (University of Southern California 2008)

In contrast to building footprints, parcel boundaries are available far more frequently. These are descriptions of property boundaries, usually produced by local governments for taxation purposes. In most cases they are legally binding and therefore often are created with survey-quality accuracy, as shown in Figure 12. However, it should be noted that only a percentage of the total actually are produced from surveying, with others derived either from imagery or from legacy data. Therefore, "legally binding" does not equate to "highly accurate" in every instance. These data are quickly becoming available for most regions of the United States, with some states even mandating their creation and dissemination to the general public at low cost (e.g., California [Lockyer 2005]). Also, the U.S. FGDC has an initiative underway to create a national parcel file for the entire country within a few years (Stage and von Meyer 2005). As an example of their ubiquitous existence, the online site Zillow (Zillow.com 2008) appears to have obtained parcel data for most of the urban areas in the United States. The cost to obtain parcels usually is set by the locality and can vary dramatically, from free (e.g., Sonoma County, CA [County of Sonoma 2008]) to very expensive (e.g., $125,000 for the Grand Rapids, MI Metropolitan Area [Grand Valley Metropolitan Council 2008]). Also, because they are created for tax purposes, land and buildings that are not subject to local taxation (e.g., public housing, state-owned residential buildings, or residences on military bases or college campuses) may be omitted. The attributes that these parcel-based reference datasets have in common are listed in Table 20.


Figure 11 – Example building footprints in digital format (University of California, Los Angeles 2008)


Figure 12 – Example parcel boundaries with centroids

Table 20 – Common polygon-based reference dataset attributes
Attribute | Description
Name | The name of the feature used for search
Polygon coordinates | Set of polylines in some coordinate system
Index code/identifier | Code to identify the polygon within the reference data system

Similar to point-based reference features, parcel-based reference features are discrete (i.e., they typically describe a single real-world geographic feature). Thus, a feature-matching algorithm usually will either find an exact match or none at all. Unlike point features, these parcel-based features are complex geographic types, so spatial operations can be performed on them to create new data such as a centroid (i.e., interpolation). Also, the address associated with a parcel may be the mailing address of the owner, not the situs address (the address associated with the physical location of the parcel). The benefits and drawbacks of various centroid calculations are detailed in Section 9.3.

Again similar to point-based reference datasets, lower-resolution versions of polygon-based reference datasets are readily obtainable. For example, in addition to their centroids, the U.S. Census Bureau also freely offers polygon representations of MCDs, counties, and states. The low resolution of these polygon features may prohibit their direct use as spatial output, but they do have valuable uses. In particular, they are extremely valuable as the spatial boundaries of spatial queries when a feature-matching algorithm is looking for a line-based reference feature within another reference dataset. They can serve to limit (clip) the spatial domain that must be searched, thus speeding up the result, and should align well with U.S. Census Bureau Census Tract (CT), Census Block Group (CBG), etc. files from the same release.
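The centroid operation mentioned above for parcel polygons can be sketched with the standard area-weighted ("shoelace") centroid. This is one of several centroid definitions (Section 9.3 weighs the alternatives), and the coordinates below are invented for illustration.

```python
# Sketch: area-weighted centroid of a simple (non-self-intersecting) parcel
# ring given as (x, y) vertices; the ring is closed implicitly.

def polygon_centroid(ring):
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

# A 10 x 20 rectangular parcel, whose centroid is its center point.
print(polygon_centroid([(0, 0), (10, 0), (10, 20), (0, 20)]))  # (5.0, 10.0)
```

Note that for irregular or multi-ring parcels the area-weighted centroid can fall outside the polygon, which is one reason other centroid choices are sometimes preferred.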

7.2.3 Point-Based Reference Datasets

A point-based reference dataset is composed of point-based data. These are the least commonly encountered, partly because of their usability and partly because of the wide ranges in cost and accuracy. The usability of geographic points (in terms of interpolation potential) is almost non-existent because a point represents the lowest level of geographic complexity. Points contain no attributes that can be used for the interpolation of other objects, in contrast to datasets composed of more complex objects (e.g., lines) that do have attributes suitable for deriving new geographic objects (e.g., deriving a point from a line using the length attribute). Their usability is further reduced because most are composed of discrete features; they are nonetheless used in research studies. Although this discreteness is beneficial for improving the precision of the geocoder (i.e., it will only return values for input addresses that actually exist), it will lower the match rate achieved (more details on match rate metrics are described in Section 14.2). This behavior is in contrast to linear-based reference datasets, which can match any value within the address range of a feature. That scenario produces the exact opposite effect of the point-based reference set: the match rate rises, but precision falls. See Zandbergen (2008) for a detailed analysis of this phenomenon. The cost of production and accuracy of point-based reference datasets can range from extremely high cost and high accuracy when using GPS devices, such as the address points available for some parts of North Carolina, to extremely low cost and variable accuracy when building a cache of previously geocoded data (as described in Section 13.4). Several examples of well-known, national-scale reference datasets are listed in Table 21, and Abe and Stinchcomb (2008, p. 123) note that commercial vendors are beginning to produce and market point-level address data.
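The precision versus match-rate tradeoff just described can be made concrete with a small sketch; all reference data here are invented. A discrete point reference set matches only the address numbers it explicitly contains, while a range-based street segment accepts any parity-consistent number inside its range, whether or not that address exists on the ground.

```python
# Illustrative contrast between discrete (point-based) and continuous
# (range-based) matching; all values are hypothetical.

point_reference = {100, 104, 110}   # address numbers with known point locations
range_reference = (100, 198)        # even-numbered side of a street segment

def match_discrete(number):
    return number in point_reference

def match_range(number):
    lo, hi = range_reference
    return lo <= number <= hi and number % 2 == lo % 2

queries = [100, 102, 104, 150]      # suppose 102 and 150 do not actually exist
print(sum(map(match_discrete, queries)), "of", len(queries), "match discretely")
print(sum(map(match_range, queries)), "of", len(queries), "match by range")
```

The discrete set refuses the nonexistent addresses (higher precision, lower match rate), while the range-based segment matches them anyway (higher match rate, lower precision).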
The attributes listed in Table 22 are common to all products listed in Table 21. These form the minimum set of attributes required for a feature-matching algorithm to successfully match a feature in a point-based reference dataset.

Table 21 – Point-based reference datasets
Supplier | Product | Description | Coverage
Government | E-911 Address Points | Emergency management points for addresses | Portions of U.S.
Government | Postal Codes | Postal Code centroids | U.S./Canada
Government | Census MCD | Minor Civil Division centroids | U.S.
Government | Geographic Names Information System (United States Board on Geographic Names 2008) | Gazetteer of geographic features | U.S.
Government | GeoNames (United States National Geospatial-Intelligence Agency 2008) | Gazetteer of geographic features | World, excepting U.S.
Academia | Alexandria Digital Library (2008) | Gazetteer of geographic features | World


Table 22 – Minimum set of point-based reference dataset attributes
Attribute | Description
Name | The name of the feature used for the search
Point coordinates | A pair of values for the point in some coordinate system

The United Kingdom and Australia currently have the highest quality point-based reference datasets available, containing geocodes for every postal address in the country. Their creation processes are well documented throughout the geocoding literature (Higgs and Martin 1995a, Churches et al. 2002, Paull 2003), as are numerous studies performed to validate and quantify their accuracy (e.g., Gatrell 1989). In contrast, neither the United States nor Canada can currently claim the existence of a national-scale reference dataset containing accurate geocodes for all addresses in the country. The national-scale datasets that are available instead contain lower-resolution geographic features. In the United States, these datasets are mostly available from the U.S. Census Bureau (e.g., ZCTA centroids and points representing named places such as MCDs). These two datasets in particular are distributed in conjunction with the most common linear-based reference data source used, the U.S. Census Bureau TIGER/Line files (United States Census Bureau 2008c). USPS ZIP Codes are different from U.S. Census Bureau ZCTAs, and their (approximate) centroids are available from commercial vendors (covered in more detail in Section 5.1.4). Higher resolution point data have been created by individual localities across the United States, but these can be difficult to find unless one is active or has connections in the locality. Best practices relating to reference dataset types are listed in Best Practices 20.

7.3 REFERENCE DATASET RELATIONSHIPS

The implicit and explicit relationships that exist between different reference dataset types are similar to those between the components of postal address input data. These can be both structured, spatially hierarchical relationships and lineage-based relationships. An example of the first is the hierarchical relationship between polygon-based features available at the different geographic resolutions of Census delineations in the TIGER/Line files. Census blocks are at the highest resolution, followed by CBGs, CTs, ZCTAs, county subdivisions, counties, and/or other state subdivisions, etc. The spatially hierarchical relationships between these data types are important because data at lower resolutions represent an aggregation of the features at the higher level. When choosing a reference feature for interpolation, one can safely change from selecting a higher resolution representation to a lower one (e.g., from a block to a block group) without fear of introducing erroneous data (e.g., the first digit of the block number is the block group code). The inverse is not true because lower-resolution data are composed of multiple higher-resolution features (e.g., a block group contains multiple blocks). When attempting to increase the resolution of the feature type matched to, there will be a level of ambiguity introduced as to which is the correct higher resolution feature that should be selected.
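The safe direction of aggregation described above can be sketched using the standard 15-digit census block code layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block whose first digit identifies the block group); the example code value below is hypothetical.

```python
# Sketch: deriving every lower-resolution code nested above a census block
# by truncating the 15-digit block code. Safe in the block -> block group
# direction; the reverse (splitting a block group into blocks) is ambiguous.

def generalize(block_code):
    """Return the nested geographic codes containing a census block."""
    return {
        "state": block_code[:2],
        "county": block_code[:5],
        "tract": block_code[:11],
        "block_group": block_code[:12],  # tract + first digit of the block number
        "block": block_code,
    }

codes = generalize("060372077101025")  # a hypothetical Los Angeles County block
print(codes["block_group"])            # "060372077101"
```

No comparable function can go the other way: a block group code alone cannot tell you which of its constituent blocks an address falls in, which is exactly the ambiguity the text describes.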


Best Practices 20 – Reference dataset types

What reference dataset formats can and should be used?
Any reference dataset format should be supported by a geocoding process, both vector- and raster-based. At a minimum, vector-based must be supported.

What vector-based reference dataset types can and should be used?
Any vector-based reference dataset type should be supported by a geocoding process (e.g., point-, linear-, and polygon-based). At a minimum, linear-based must be supported.

Which data source should be obtained?
A registry should obtain the most accurate reference dataset they can given their budgetary and technical constraints. Cost may be the deciding factor as to which data source to use. There may be per-product limitations, so all choices and associated limitations should be fully investigated before acquisition.

When should a new data source be obtained?
A registry should keep their reference dataset up-to-date as best they can within their means. The update frequency will depend on budgetary constraints and the frequency with which vendors provide updates.

Should old data sources be discarded?
A registry should retain historical versions of all their reference datasets.

Where can reference data be obtained?
Local government agencies and the FGDC should be contacted to determine the types, amounts, and usability of reference datasets available. Commercial firms (e.g., Tele Atlas [2008c] and NAVTEQ [2008]) also can be contacted if needs cannot be met by public domain data.

How should reference data sources be kept?
Registries should maintain lists of reference datasets applicable to their area across all resolutions (e.g., TIGER/Lines [United States Census Bureau 2008c], national; county government roads, regional; parcel databases, local).

Examples of derivational lineage-based relationships include the creation of the NAVTEQ (2008) and Tele Atlas (2008c) products as enhanced derivatives of the TIGER/Line files, and geocode caching, in which the output of a feature interpolation method is used to create a point-based reference dataset (as described in Section 13.4). In either of these cases, the initial accuracy of the original reference dataset is a main determinant of the accuracy of later generations. This effect is less evident in the case of TIGER/Line file derivatives because of their continual updating, but is completely apparent in reference datasets created from cached results. Best practices related to these spatial and derivational reference dataset relationships are listed in Best Practices 21.


Best Practices 21 – Reference dataset relationships

Should primary or derivative reference datasets be used (e.g., TIGER/Lines or NAVTEQ)?
Primary source reference datasets should be preferred to secondary derivatives unless significant improvements have been made to the derivative and those improvements are fully documented and can be proven.

Should lower-resolution aggregate reference data be used over original individual features (e.g., block groups instead of blocks)?
Moving to lower resolutions (e.g., from block to block group) should only be done if feature matching is not possible at the higher resolution due to uncertainty or ambiguity.

In addition to the inter-reference dataset relationships among different datasets, intra-reference dataset relationships are at play between features within a single dataset. This can be seen by considering the various holistic metrics used to describe datasets, which are characteristics describing values over an entire dataset as a whole. Atomic metrics, in contrast, describe characteristics of individual features in a dataset. For example, datasets commonly purport the holistic metric "average horizontal spatial accuracy" as a single value (e.g., 7 m in the case of the TIGER/Line files). However, it is impossible to measure the horizontal spatial accuracy of every feature in the entire set, so where did this number come from? These holistic measures are calculated by choosing a representative sample of features and averaging their values to derive a metric. For this reason, holistic metrics usually are expressed along with a confidence interval (CI), which is a measurement of the percentage of data values that are within a given range of values. This is the common and recommended practice for describing the quality of spatial data, according to the FGDC data standards. For example, stating that the data are accurate to 7 m with a CI of 95 percent means that of the individual features that were tested out of all the possible features, roughly 95 percent fall within 7 m. The creator of the dataset usually does not (and usually cannot) guarantee that each and every feature within the dataset has this same accuracy (which would make it an atomic metric). Although a data consumer generally can trust CIs associated with holistic metrics, they must remain aware of the potential for individual features to vary, sometimes differing greatly from the values reported for the entire set.
This phenomenon is commonly most pronounced in the differences in values for feature metrics seen in different geographic regions covered by large datasets (e.g., feature accuracy in rural versus urban areas). Another aspect related to atomic and holistic feature completeness and accuracy is geo- graphical bias. In one sense, this describes the observation that the accuracy of geographic features may be a function of the area in which they are located. Researchers are beginning to realize that geocodes produced with similar reported qualities may not actually have the same accuracy values when they are produced for different areas. The accuracy of the geo- coding process as a whole has been shown to be highly susceptible to specific properties of the reference features, such as the length of the street segments (Ratcliffe 2001, Cayo and Talbot 2003, Bakshi et al. 2004) that are correlated with characteristics such as the rural or urban character of a region (e.g., smaller/larger postal code/parcel areas and the likelihood of USPS PO box addresses areas [Skelly et al. 2002, Bonner et al. 2003, McElroy et al. 2003, Ward et al. 2005]). Likewise, the preponderance of change associated with the reference fea- tures and input data in rapidly expanding areas will undoubtedly affect the geocoding process in different ways in different areas depending on the level of temporal dynamism of the local

built environment. This notion is partially captured by the newly coined term cartographic confounding (Oliver et al., 2005). Best practices relating to reference dataset characteristics are listed in Best Practices 22.

Best Practices 22 – Reference dataset characteristics

Policy Decision: Should holistic or atomic metrics be used to describe the accuracy of a reference dataset?
Best Practice: If the geographic variability of a region is low or the size of the region covered is small (e.g., city scale), the holistic metrics for the reference dataset should be used. If the geographic variability of a region is high or the size of the region covered is large (e.g., national scale), the accuracy of individual reference features within the area of the input data should be considered over the holistic measures.

Policy Decision: Should geographic bias be considered a problem?
Best Practice: If the geographic variability of a region is high or the size of the region covered is large (e.g., national scale), geographic bias should be considered as a possible problem.


8. FEATURE MATCHING

This section investigates the components of a feature-matching algorithm, detailing several specific implementations.

8.1 THE ALGORITHM
Many implementations of feature-matching algorithms are possible and available, each with its own benefits and drawbacks. At the highest and most general level, the feature-matching algorithm performs a single, simple role: it selects the reference feature in the reference dataset that correctly represents the input datum. The chosen feature then is used in the feature interpolation algorithm to produce the spatial output. This generalized concept is depicted in Figure 13. The matching algorithms presented in this section are non-interactive matching algorithms (i.e., they are automated and the user is not directly involved). In contrast, interactive matching algorithms involve the user in making choices when the algorithm fails to produce an exact match, either by having the user correct/refine the input data or by having the user make a subjective, informed decision between two equally likely options.

Figure 13 – Generalized feature-matching algorithm

8.1.1 SQL Basis
The form taken by feature-matching algorithms is dictated by the storage mechanism of the reference dataset. Because most reference datasets are stored in traditional relational database structures, most matching algorithms operate by producing and issuing queries defined using the Structured Query Language (SQL). These SQL queries take the following form:


SELECT <selection attributes> FROM <data sources> WHERE <attribute constraints>

The selection attributes are the attributes of the reference feature that should be returned from the reference dataset in response to the query. These typically include the identifiable attributes of the feature such as postal address components, the spatial geometry of the reference feature such as an actual polyline, and any other desired descriptive aspatial qualities such as road width or resolution. The data sources are the relational table (or tables) within the reference dataset that should be searched. For performance reasons (e.g., scalability), the reference dataset may be separated into multiple tables (e.g., one for each state) within a national-scale database. The attribute constraints form the real power of the query, and consist of zero, one, or more predicates. A predicate is an attribute/value pair defining what the value of an attribute must be for a feature to be selected. Multiple predicates can be linked together with "AND" and "OR" statements to form conjunctions and disjunctions. Nesting of predicates also is supported through the use of parentheses. To satisfy a query, the relational database engine used to store the reference dataset employs Boolean logic to evaluate the attribute constraints against each feature in the reference dataset, returning only those that evaluate to true. The following example would enforce the condition that only reference features whose 'name' attribute equals 'Vermont' and whose 'type' attribute equals either 'AVE' or 'ST' would be returned.

SELECT <selection attributes> FROM <data sources> WHERE name='Vermont' AND (type='AVE' OR type='ST')

Case sensitivity refers to whether or not a database differentiates between upper- and lower-case alphabetic characters when evaluating a query against reference features; if enforced, it can lead to many false negatives. This behavior is platform dependent and may be a user-settable parameter. Best practices related to SQL-type feature matching are listed in Best Practices 23.
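To make the query pattern and the case-sensitivity caveat concrete, the sketch below builds a small stand-in reference table in SQLite (the table name, columns, and rows are hypothetical, not a real reference dataset schema) and issues the predicate query from the example above:

```python
import sqlite3

# Hypothetical stand-in for a reference dataset table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE streets (name TEXT, type TEXT)")
con.executemany("INSERT INTO streets VALUES (?, ?)", [
    ("VERMONT", "AVE"),
    ("VERMONT", "ST"),
    ("vermont", "PL"),   # lower case on purpose
    ("MAIN", "ST"),
])

# Conjunction and disjunction of predicates, as in the example query.
rows = con.execute(
    "SELECT name, type FROM streets "
    "WHERE name = 'VERMONT' AND (type = 'AVE' OR type = 'ST')"
).fetchall()

# '=' compares text case-sensitively in SQLite, so a query for 'VERMONT'
# silently misses the 'vermont' row: a false negative. Normalizing both
# sides to upper case (per the NAACCR recommendation) avoids this.
all_vermont = con.execute(
    "SELECT name FROM streets WHERE UPPER(name) = 'VERMONT'"
).fetchall()
```

Converting all data to upper case before loading, as the best practices below recommend, makes the second, normalized form of the query unnecessary.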

Best Practices 23 – SQL-like feature matching

Policy Decision: What level of training does staff need to perform feature matching?
Best Practice: At a minimum, staff should be trained to understand how to create and work with simple database applications such as Microsoft Access databases.

Policy Decision: Should case-sensitivity be enforced?
Best Practice: Case-sensitivity should not be enforced in feature matching. All data should be converted to upper case as per NAACCR data standards.


8.2 CLASSIFICATIONS OF MATCHING ALGORITHMS
Feature-matching algorithms generally can be classified into two main categories: deterministic and probabilistic. A deterministic matching method is based on a series of rules that are processed in a specific sequence. These can be thought of as binary operations; a feature is either matched or it is not. In contrast, a probabilistic matching method uses a computational scheme to determine the likelihood, or probability, that a feature matches, and returns this value for each feature in the reference set. It should be noted that each normalization process from Section 6.2 can be grouped into these same two categories. Substitution-based normalization is deterministic, while context- and probability-based normalization are probabilistic. Address normalization can be seen as a higher-resolution version of the feature-matching algorithm. Whereas feature matching maps the entire set of input attributes from the input data to a reference feature, address normalization matches each component of the input address to its corresponding address attribute. Both processes link records to a reference set: actual features in the case of feature matching, and address attributes in the case of normalization. Note that Boscoe (2008) also can be consulted for a discussion of portions of the matching techniques presented in this section.

8.3 DETERMINISTIC MATCHING
The main benefit of deterministic matching is its ease of implementation. These algorithms are created by defining a series of rules and a sequential order in which they should be applied. The simplest possible matching rule is the following:

“Match all attributes of the input address to the corresponding attributes of the reference feature.”

This rule will either find and return a perfect match, or it will find nothing and return nothing: a binary operation. Because it is so restrictive, it is easy to imagine cases in which this rule would fail to match a feature even though the feature exists in reality (i.e., false negatives). As one example, consider a common scenario in which the reference dataset contains more descriptive attributes than the input address, as seen in the following two example items. The first is an example postal address with only the street number and name attributes defined. The second (Table 23) depicts a reference feature that is more descriptive (i.e., it includes the pre-directional and suffix attributes):

“3620 Vermont”

Table 23 – Attribute relation example, linear-based reference features

From   To     Pre-directional   Name      Suffix
3600   3700   South             Vermont   Ave

In both of these cases, the restrictive rule would fail to match, and no features would be returned when one or two (possibly) should have been.

8.3.1 Attribute Relaxation
In practice, rules less restrictive than the one previously listed tend to be created and applied. Attribute relaxation, the process of easing the requirement that all street address


attributes must exactly match a feature in the reference data source to obtain a matching street feature, often is applied to create these less restrictive rules. It generally is only applied in deterministic feature matching because probabilistic methods can account for attribute discrepancies through the weighting process. Relaxation is commonly performed by removing or altering street address attributes in an iterative manner using a predefined order, thereby increasing the probability of finding a match while also increasing the probability of error. Employing attribute relaxation, the rule previously defined could become:

“Match all attributes which exist in the input address to the corresponding attributes of the reference feature”

In this case, missing attributes in the input data will not prohibit a match, and the feature "3600-3700 South Vermont Ave" can be matched and returned. This example illustrates how attributes present in the reference features can be allowed to be missing in the input data, but nothing stops a matching algorithm from allowing the disconnect in the other direction, with attributes missing from the reference dataset but present in the input data. However, this example also shows how ambiguity can be introduced. Applying the same relaxed matching rule to the features listed in Table 24 would return two matches. More detail on feature-matching ambiguity is provided in Section 14.
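The effect of relaxed matching, including the ambiguity it can introduce, can be sketched as below. The reference rows, the pass ordering, and the attribute names are illustrative assumptions only, not the guide's prescribed implementation:

```python
# Two hypothetical reference features (cf. the ambiguous example above).
REFERENCE = [
    {"from": 3600, "to": 3700, "pre": "S", "name": "VERMONT", "type": "AVE"},
    {"from": 3600, "to": 3700, "pre": "",  "name": "VERMONT", "type": "PL"},
]

# Each pass lists the attributes that must still match exactly; later
# passes relax (drop) progressively more descriptive attributes.
PASSES = [("pre", "name", "type"), ("name", "type"), ("name",)]

def relaxed_match(addr):
    """Return the first pass that yields candidates, plus the candidates."""
    for attrs in PASSES:
        hits = [
            f for f in REFERENCE
            if f["from"] <= addr["number"] <= f["to"]
            and all(addr.get(a, "") in ("", f[a]) for a in attrs)
        ]
        if hits:
            return attrs, hits
    return None, []

# "3620 Vermont ST": the type disagrees with both features, so a match is
# found only after the type is relaxed, and it then matches BOTH features.
attrs, hits = relaxed_match({"number": 3620, "name": "VERMONT", "type": "ST"})
```

The two-feature result is exactly the ambiguity problem discussed above: the relaxed rule succeeds, but only at a lower level of certainty.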

Table 24 – Attribute relation example, ambiguous linear-based reference features

From   To     Pre-directional   Name      Suffix
3600   3700   South             Vermont   Ave
3600   3700                     Vermont   Pl

It is important to reiterate that relaxation algorithms should be implemented in an iterative manner, relaxing attributes in a specific order through a pre-defined series of steps and passes (Levine and Kim 1998). A pass relaxes a single (or multiple) attributes within a step. These passes start with the least descriptive attributes (those whose removal creates the least amount of error) and progress upward through more and more descriptive attributes. A step relaxes a single (or multiple) attributes at once, such that: (1) the resulting certainty of the relaxed address effectively moves to another level of geographic resolution, (2) the ambiguity introduced increases exponentially, or (3) the complexity of an interactive exhaustive disambiguation increases linearly. Within each step, several passes should be performed. These passes should relax the different attributes individually and then in conjunction, until no more combinations can be made without resulting in a step to another level of geographic resolution. The order in which they are relaxed can be arbitrary and will have minimal consequence because steps are the real influencing factor. Note that relaxing the house number increases the ambiguity linearly because n = the number of houses on the street, while relaxing all other attributes increases the ambiguity exponentially because n = the number of possible new segments that can be included. The preferred order of steps and passes is displayed in Table 25 through Table 27 (the pass ordering has been arbitrarily selected). The ambiguity column describes the domain of potential matches that could all equally be considered likely. The relative exponent and magnitude of ambiguity column is an estimate that shows how the magnitude of ambiguity should be calculated and the order of the derived exponent of this ambiguity (in

parentheses). The relative magnitude of spatial error column is an estimate of the total area within which the correct address should be contained and the exponent of this ambiguity (in parentheses). The worst-case resolution column lists the next level of accuracy that could be achieved when disambiguation is not possible, and assumes that the lower-order attributes below those being relaxed are correct. Note that the last two rows of Table 26 could belong to either pass 5 or 6 because the ambiguity has increased exponentially and the search complexity has increased linearly, but the effective level of geographic certainty remains the same (USPS ZIP Code).


Table 25 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of ambiguity and spatial error, and worst-case resolution, passes 1 – 4

Step 1 (ambiguity: none; spatial error: certainty of address location; worst-case resolution: single address location)
  Pass 1 – relax: none. Magnitude: (0) none.

Step 2 (ambiguity: multiple houses on single street; spatial error: length of street; worst-case resolution: single street)
  Pass 1 – relax: number. Magnitude: (0) # houses on street.

Step 3 (ambiguity: single house on multiple streets; spatial error: bounding area of locations containing same number house on all streets with the same name; worst-case resolution: USPS ZIP Code)
  Pass 1 – relax: pre. Magnitude: (1) # streets with same name and different pre.
  Pass 2 – relax: post. Magnitude: (1) # streets with same name and different post.
  Pass 3 – relax: type. Magnitude: (1) # streets with same name and different type.

Step 4 (ambiguity: multiple houses on multiple streets; spatial error: bounding area of all streets with the same name; worst-case resolution: USPS ZIP Code)
  Pass 1 – relax: number, pre. Magnitude: (2) # houses on street * # streets with same name and different pre.
  Pass 2 – relax: number, type. Magnitude: (2) # houses on street * # streets with same name and different type.
  Pass 3 – relax: number, post. Magnitude: (2) # houses on street * # streets with same name and different post.


Table 26 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of ambiguity and spatial error, and worst-case resolution, pass 5

Step 5, passes 1–3 (ambiguity: single house on multiple streets; spatial error: bounding area of locations containing same number house on all streets with the same name; worst-case resolution: USPS ZIP Code)
  Pass 1 – relax: pre, type. Magnitude: (2) # streets with same name and different pre * # streets with same name and different type.
  Pass 2 – relax: pre, post. Magnitude: (2) # streets with same name and different pre * # streets with same name and different post.
  Pass 3 – relax: post, type. Magnitude: (2) # streets with same name and different post * # streets with same name and different type.

Step 5, passes 5–8 (ambiguity: multiple houses on multiple streets; spatial error: bounding area of all streets with the same name; worst-case resolution: USPS ZIP Code)
  Pass 5 – relax: number, pre, type. Magnitude: (2) # houses on street * # streets with same name and different pre * # streets with same name and different type.
  Pass 6 – relax: number, pre, post. Magnitude: (2) # houses on street * # streets with same name and different pre * # streets with same name and different post.
  Pass 7 – relax: number, post, type. Magnitude: (2) # houses on street * # streets with same name and different post * # streets with same name and different type.
  Pass 8 – relax: number, pre, post, type. Magnitude: (2) # houses on street * # streets with same name and different pre * # streets with same name and different post * # streets with same name and different type.

Step 5/6, pass 9 (ambiguity: single house on multiple streets; spatial error: bounding area of locations containing same number house on all streets with the same name; worst-case resolution: USPS ZIP Code)
  Relax: pre, post, type. Magnitude: (3) # streets with same name and different pre * # streets with same name and different post * # streets with same name and different type.

Step 5/6, pass 10 (ambiguity: single house on multiple streets; spatial error: bounding area of all streets with the same name; worst-case resolution: USPS ZIP Code)
  Relax: number, pre, post, type. Magnitude: (3) # houses on street * # streets with same name and different pre * # streets with same name and different post * # streets with same name and different type.


Table 27 – Preferred attribute relaxation order with resulting ambiguity, relative magnitudes of spatial error, and worst-case resolution, pass 6

Step 6, passes 2–4 (ambiguity: single house on multiple streets in multiple USPS ZIP Codes; spatial error: bounding area of locations containing same number house on all streets with the same name in all USPS ZIP Codes; worst-case resolution: city)
  Pass 2 – relax: pre, type, USPS ZIP Code. Magnitude: (3) # streets with same name and different pre * # streets with same name and different type * # USPS ZIP Codes that have those streets.
  Pass 3 – relax: pre, post, USPS ZIP Code. Magnitude: (3) # streets with same name and different pre * # streets with same name and different post * # USPS ZIP Codes that have those streets.
  Pass 4 – relax: post, type, USPS ZIP Code. Magnitude: (3) # streets with same name and different post * # streets with same name and different type * # USPS ZIP Codes that have those streets.

Step 6, remaining passes (ambiguity: multiple houses on multiple streets in multiple USPS ZIP Codes; spatial error: bounding area of all streets with the same name in all USPS ZIP Codes; worst-case resolution: city)
  Relax: number, pre, type, USPS ZIP Code. Magnitude: (3) # houses on street * # streets with same name and different pre * # streets with same name and different type * # USPS ZIP Codes that have those streets.
  Relax: number, pre, post, USPS ZIP Code. Magnitude: (3) # houses on street * # streets with same name and different pre * # streets with same name and different post * # USPS ZIP Codes that have those streets.
  Relax: number, post, type, USPS ZIP Code. Magnitude: (3) # houses on street * # streets with same name and different post * # streets with same name and different type * # USPS ZIP Codes that have those streets.
  Relax: number, pre, post, type, USPS ZIP Code. Magnitude: (3) # houses on street * # streets with same name and different pre * # streets with same name and different post * # streets with same name and different type * # USPS ZIP Codes that have those streets.


An example of the first few iterations of the algorithm is depicted in Figure 14. This diagram shows how each step moves the certainty of the result to a lower geographic resolution. It should be noted that the authors who originally developed these attribute relaxation techniques recommend never relaxing the street name attribute (Levine and Kim 1998). In their case, this action led to the introduction of a great deal of error due to the similarity of different Hawaiian street names. Best practices relating to deterministic feature matching are listed in Best Practices 24.

Figure 14 – Example relaxation iterations


Best Practices 24 – Deterministic feature matching

Policy Decision: When should deterministic matching be used?
Best Practice: Deterministic matching should be the first feature-matching type attempted.

Policy Decision: What types of matching rules can and should be used?
Best Practice: Any deterministic set of rules can be used, but they should always be applied in the same order.

Policy Decision: What order of matching rules can and should be applied?
Best Practice: Rules should be applied in order of decreasing restrictiveness: tightly restrictive rules are applied first, and progressively less restrictive rules are applied subsequently upon a previous rule's failure.

Policy Decision: Should attribute relaxation be allowed?
Best Practice: Attribute relaxation should be allowed when using deterministic feature matching.

Policy Decision: What order should attributes be relaxed?
Best Practice: Attribute relaxation should occur as the series of steps and passes listed in this document.

8.4 PROBABILISTIC MATCHING
Probabilistic matching has its roots in the fields of probability and decision theory, and has been employed in geocoding processes since the outset (e.g., O'Reagan and Saalfeld 1987, Jaro 1989). The exact implementation details can be quite messy and mathematically complicated, but the general concept is quite simple. The unconditional probability (prior probability) is the probability of something occurring given that no other information is known. Mathematically, the unconditional probability, P, of an event, e, occurring is notated P(e), and is equivalent to 1 minus the probability of the event not occurring, that is, P(e) = 1 – P(¬e). In contrast, the conditional probability is the probability of something occurring given that other information is known. Mathematically, having obtained additional information, i, the conditional probability of i occurring given that e is true, P(i|e), is defined as the probability of i and e occurring together divided by the probability that e occurs alone, as in Equation 1.

P(i|e) = P(i ∧ e) / P(e)

Equation 1 – Conditional probability

In probabilistic matching, the match probability is a degree of belief, ranging from 0 to 1, that a feature matches. These systems report this degree of belief that a feature matches (the easy part) based on and derived from some criteria (the hard part). A degree of belief of 0 represents a 0 percent chance that the match is correct, while a 1 represents a 100 percent chance. The confidence threshold is the user-determined probability cutoff point above which a feature is accepted and below which it is rejected. To harness the power of these probabilities and achieve feature results that would not otherwise be obtainable, the use of this approach requires the acceptance of a certain level of risk that an answer could be wrong. There are many forms of probabilistic feature matching, as the entire field of record linkage is focused on this task. Years of research have been devoted to this problem, with


particular interest paid to health and patient records (e.g., Winkler 1995, Blakely and Salmond 2002). In this section, to illustrate the basic concepts and present a high-level overview, one common approach will be presented: attribute weighting.

8.4.1 Attribute Weighting
Attribute weighting is a form of probabilistic feature matching in which probability-based values are associated with each attribute and either subtract from or add to the composite score for the feature as a whole. The composite score then is used to determine a match or non-match. In this approach, each attribute of the address is assigned two probabilities, known as weights. These weights represent the level of importance of the attribute, and are a combination of the matched and unmatched probabilities. The matched probability is the probability of two attribute values matching, m, given that the two records match, M. Mathematically, this is denoted as the conditional probability P(m|M). This probability can be calculated with statistics over a small sample of the total dataset in which the input datum and the reference feature do actually match. The error rate, α, denotes instances in which the two attributes do not actually match even though the two records do match. Thus, P(m|M) = 1 – α. In the existing literature, the full probability notation usually is discarded, and P(m|M) is written simply as m. It should be noted that α generally is small, so m generally is high. The unmatched probability is the probability that the two attribute values match, m, given that the two records themselves do not match, ¬M. Mathematically, this is denoted by the conditional probability P(m|¬M). This second probability represents the likelihood that the attributes will match at random, and can be calculated with statistics over a small sample of the total dataset for which the input data and the reference do not match. Again, P(m|¬M) usually is denoted simply as u. It should be noted that u generally is low for directionals, but is higher for street names.

From these two probabilities, m and u, frequency indices for agreement, fa, and disagreement, fd, can be calculated and used to compute the positive and negative weights for agreement and disagreement, wa and wd, as in Equation 2.

fa = m / u,   fd = (1 − m) / (1 − u)
wa = log2(fa),   wd = log2(fd)

Equation 2 – Agreement and disagreement frequency indices and weights

These weights are calculated for each of the attributes in the reference dataset a priori. Composite scores for input data are created on-the-fly during feature matching by summing the attribute weights of the individual input attributes as compared against the reference feature attributes. Where an agreement is found, wa is added to the score, and where a disagreement is found, wd is subtracted. This composite score is the probability used to determine whether the feature is a match (i.e., whether it is above the confidence threshold). Excellent descriptions of this and other more advanced record linkage algorithms can be found in Jaro (1989), Blakely and Salmond (2002), Meyer et al. (2005), and Boscoe (2008), as well as in the references contained within each. Best practices related to probabilistic feature matching are listed in Best Practices 25.


Best Practices 25 – Probabilistic feature matching

Policy Decision: When should probabilistic matching be used?
Best Practice: Probabilistic matching should be used when deterministic feature matching fails, and if the consumers of the data are comfortable with the confidence threshold.

Policy Decision: What confidence threshold should be considered acceptable?
Best Practice: At a minimum, a 95% confidence threshold should be acceptable.

Policy Decision: What metadata should be maintained?
Best Practice: The metadata should describe the match probability.

Policy Decision: How and when should match probabilities for different attributes be calculated?
Best Practice: Match probabilities for different attributes should be calculated a priori for the reference dataset by using a computational approach that randomly selects records and iterates continuously until the rate stabilizes.

Policy Decision: How and when should unmatch probabilities for different attributes be calculated?
Best Practice: Unmatch probabilities for different attributes should be calculated a priori for the reference dataset by using a computational approach that randomly selects records and iterates continuously until the rate stabilizes.

Policy Decision: How and when should confidence thresholds be re-evaluated?
Best Practice: Confidence thresholds should continuously be re-evaluated based on the frequency with which attribute values are encountered.

Policy Decision: How and when should composite weights be re-evaluated?
Best Practice: Composite weights should continuously be re-evaluated based on the frequency with which attribute values are encountered.

8.5 STRING COMPARISON ALGORITHMS
Any feature-matching algorithm requires the comparison of strings of character data to determine matches and non-matches. There are several ways this can be attempted, some more restrictive or flexible than others in what they are capable of matching. The first, character-level equivalence, enforces that each character of two strings must be exactly the same. In contrast, essence-level equivalence uses metrics capable of determining if two strings are "essentially" the same. This allows minor misspellings in the input address to be handled, returning reference features that "closely match" what the input may have "intended." These techniques are applicable to both deterministic and probabilistic matching algorithms because relaxing the spelling of attributes using different string matching algorithms is a form of attribute relaxation. In all cases, careful attention must be paid to the accuracy effects when these techniques are employed because they can and do result in incorrect features being returned. Word stemming is the simplest version of an essence-level equivalence technique. These algorithms reduce a word to its root (stem), which then is used for essence-level equivalence testing. The Porter Stemmer (Porter 1980) is the most famous of these. It starts by removing common suffixes (e.g., "-ed," "-ing") and additionally applies more complex rules for specific substitutions, such as "-sses" being replaced with "-ss." The algorithm is fairly straightforward and runs as a series of steps. Each progressive step takes into account what has been done before, as well as word length and potential problems with a stem if a suffix is removed.
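As an illustration of suffix stripping, consider the sketch below. It is a deliberately simplified toy, NOT the full Porter algorithm: the real Porter Stemmer adds measure-based conditions and handles doubled consonants so that, for example, "running" stems to "run" rather than "runn".

```python
# Ordered suffix rules, longest first; the length check crudely guards
# against over-stemming very short words.
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def simple_stem(word):
    """Strip the first applicable suffix, leaving at least a 3-letter stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

print(simple_stem("glasses"), simple_stem("walked"), simple_stem("streets"))
# prints: glass walk street
```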


Phonetic algorithms provide an alternative method for encoding the essence of a word. These algorithms enable essence-level equivalence testing by representing a word in terms of how it sounds when it is pronounced (i.e., phonetically). The goal of these types of algorithms is to produce common representations for words that are spelled differently yet sound the same. The Soundex algorithm is the most famous of this class of algorithms. It has existed since the late 1800s and originally was used by the U.S. Census Bureau. The algorithm is very simple and consists of the following steps:

1) Keep the first letter of the string.
2) Remove all vowels and the letters y, h, and w, unless they are the first letter.
3) Replace all letters after the first with numbers based on a known table.
4) Remove any numbers that are repeated in a row.
5) Return the first four characters, padded on the right with zeros if there are fewer than four.
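The steps above can be implemented as follows. Note one refinement relative to the simplified wording of step 2: consonants are encoded to digits before the vowels are dropped, so consonants separated by a vowel receive separate digits; this is the standard behavior and is what yields R552 for "Running" in Table 28 below.

```python
# Standard Soundex letter-to-digit table.
CODES = {ch: str(d) for d, group in enumerate(
    ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1) for ch in group}

def soundex(word):
    word = word.upper()
    first = word[0]
    prev = CODES.get(first)          # code of the retained first letter
    digits = []
    for ch in word[1:]:
        if ch in "HW":               # h and w never separate consonants
            continue
        code = CODES.get(ch)         # vowels and Y have no code
        if code is not None and code != prev:
            digits.append(code)
        prev = code                  # a vowel resets the previous code
    return (first + "".join(digits) + "000")[:4]

print(soundex("Running"), soundex("Runs"), soundex("Street"))
# prints: R552 R520 S363
```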

Producing an encoded form of a word necessarily loses information (unless the encoding defines exact equivalents). Stemming and phonetic algorithms, while efficient and precise, suffer from this fact and can produce inaccurate results in the context of matching street names. In particular, two originally unrelated attribute values can become related during the process. Table 28 presents examples of words encoded by both algorithms that result in ambiguities.

Table 28 – String comparison algorithm examples

Original            Porter Stemmed       Soundex
Running Ridge       run ridg             R552 R320
Runs Ridge          run ridg             R520 R320
Hawthorne Street    hawthorn street      H650 S363
Heatherann Street   heatherann street    H650 S363

To minimize these negative effects of data loss, feature-matching algorithms can attempt string comparisons as a two-step process. The first pass can use an essence-level comparison to generate a set of candidate reference features. The second pass then can generate a probability-based score for each of the candidates using the original text of the attributes, not the essence-level derivations. The values from the second pass then can be used to determine the likelihood of correctness. Best practices related to string comparison algorithms are listed in Best Practices 26.
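The two-pass idea can be sketched as follows. The "skeleton" key below is a crude, illustrative stand-in for a real essence-level encoding such as Soundex, and difflib's similarity ratio stands in for a proper probability-based score; both are assumptions for this sketch, not the guide's prescription.

```python
import difflib

def skeleton(name):
    """Crude essence key: first letter plus de-duplicated consonants."""
    name = name.upper()
    out = [name[0]]
    for ch in name[1:]:
        if ch not in "AEIOUYHW" and ch != out[-1]:
            out.append(ch)
    return "".join(out)

def two_pass_match(query, reference_names):
    # Pass 1: essence-level comparison selects a candidate set.
    key = skeleton(query)
    candidates = [n for n in reference_names if skeleton(n) == key]
    # Pass 2: score candidates on the ORIGINAL strings, not the essences.
    return sorted(
        candidates,
        key=lambda n: difflib.SequenceMatcher(None, query.upper(), n).ratio(),
        reverse=True,
    )

ranked = two_pass_match("Hawthorn", ["HAWTHORNE", "HEATHERANN", "VERMONT"])
```

Both "HAWTHORNE" and "HEATHERANN" survive pass 1 (they share an essence key, mirroring the H650 collision in Table 28), but the second pass correctly ranks "HAWTHORNE" first using the original spellings.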


Best Practices 26 – String comparison algorithms

Policy Decision: When and how should alternative string comparison algorithms be used?
Best Practice: Alternative string comparison algorithms should be used when no exact feature matches can be identified. A two-step approach should be used to compare the original input with the essence-level equivalence match to determine the match and unmatched probabilities (as in the probability-based feature-matching approach).

Policy Decision: What types of string comparison algorithms can and should be used?
Best Practice: Both character- and essence-level string comparisons should be supported.

Policy Decision: When should character-level string equivalence be used?
Best Practice: Character-level equivalence should always be attempted first on every attribute.

Policy Decision: When and how should essence-level string equivalence be used?
Best Practice: Essence-level equivalence should only be attempted if character-level equivalence fails. Essence-level equivalence should only be attempted on attributes other than the street name. Only one essence-level equivalence algorithm should be applied at a time. They can be tried in succession, but one should not process the output of the other (i.e., they should both start with the raw data). Metadata should describe the calculated essence of the string used for comparison, and the strings that it was matched to in the reference dataset.

Policy Decision: What types of essence-level algorithms should be used?
Best Practice: Both stemming and phonetic algorithms should be supported by the geocoding process.

Policy Decision: Which word-stemming algorithms should be used?
Best Practice: At a minimum, the Porter Stemmer (Porter 1980) should be supported by the geocoding process.

Policy Decision: Which phonetic algorithms should be used?
Best Practice: At a minimum, the Soundex algorithm should be supported by the geocoding process.


9. FEATURE INTERPOLATION

This section examines each of the feature interpolation algorithms in depth.

9.1 FEATURE INTERPOLATION ALGORITHMS

Feature interpolation is the process of deriving an output geographic feature from geographic reference features (e.g., deriving a point for an address along a street centerline or the centroid of a parcel). A feature interpolation algorithm is an implementation of a particular form of feature interpolation. One can distinguish between separate classes of feature interpolation algorithms for linear- and areal unit-based reference feature types. Each implementation is tailored to exploit the characteristics of the reference feature types upon which it operates. It is useful to point out that interpolation is only ever required if the requested output geographic format is of lower geographic complexity than the features stored in the reference dataset. If a geocoding process uses a line-based reference dataset and is asked to produce a line-based output, no interpolation is necessary because the reference feature is returned in its native form. Likewise, a polygon-based reference dataset should return a native polygon representation if the output format requests it. Linear-based interpolation is most commonly encountered, primarily because linear-based reference datasets currently are the most prevalent. The advantages and disadvantages of each type of interpolation method will be explored in this section.

9.2 LINEAR-BASED INTERPOLATION

Linear-based feature interpolation operates on line segments (or polylines, which are a series of connected lines) and produces an estimation of an output feature using a computational process on the spatial geometry of the line. This approach was one of the first implemented, and as such, is detailed dozens of times in the scientific literature and the user manuals of countless geocoding platforms. With this abundance of information on the topic and data sources readily available (see Table 17), the discussion presented here will outline only the high-level details, focusing on identifying assumptions used in the process that affect the results and ways that they can be overcome.

For the purpose of this discussion, it will be assumed that the input data and the reference feature are correctly matched. In essence, linear-based interpolation attempts to estimate where along the reference feature the spatial output—in this case a point—should be placed. This is achieved by using the number attribute of the input address data to identify the proportion of the distance down the total length of the reference feature where the spatial output should be placed. The reference feature attribute used for this operation is the address range, which describes the valid range of addresses on the street (line segment) in terms of start and end addresses (and also serves to make street vectors continuous geographic objects). The address parity (i.e., even or odd) is an indication of which side of the street an input address falls on. This simplistic case presumes binary address parity for the reference street segment (i.e., one side of the street is even and the other is odd), which may not be the

case. More accurate reference data sometimes account for different parities on the same side of the street as necessary (non-binary address parity), and a more advanced geocoding algorithm can take advantage of these attributes. A common parity error in a reference data source is for an address to be listed as if it occurs on both sides of the street. An equally common address range error in a reference data source is for an address range to be reversed. This can mean that the address is on the wrong side of the street, that the address range start and end points of the street have been reversed, or a combination of both. These should be considered reference data source errors, not interpolation errors, although they are commonly viewed that way. In an effort to continue with the simplest possible case, interpolation will be performed on a simple line-based reference feature made up of only two points (i.e., the start, or origin, and end, or destination). The distance from the start of the street segment where the spatial output should be placed, d, is calculated as a proportion of the total street length, l; the number of the input address, a; and the size of the address range, r, which is equal to one-half the difference between the start address and end address of the address range, r_s and r_e respectively, as in Equation 3.

r = Abs(r_e − r_s) / 2,    d = l (a / r)

Equation 3 – Size of address range and resulting distance from origin

Using the distance that the output point should be located from the origin of the street vector, it is possible to calculate the actual position where the spatial output should be placed. This is achieved through the following calculation, with the origin of the street denoted x_0, y_0, the destination denoted x_1, y_1, and the output location denoted x_2, y_2, as in Equation 4. Note that although the Earth is an ellipsoid and spherical distance calculations would be the most accurate choice, planar calculations such as Equation 4 are most commonly employed because the error they introduce is negligible for short distances such as most typical street segments.

x_2 = d (x_1 − x_0),    y_2 = d (y_1 − y_0)

Equation 4 – Resulting output interpolated point
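Equations 3 and 4 can be combined into a small planar interpolation routine. This sketch makes one common implementation choice that the text leaves implicit: the fraction along the segment is computed from the address's offset within the range, (a − r_s) / (r_e − r_s), and the resulting displacement is added to the origin point:

```python
def interpolate_address(a, r_s, r_e, origin, dest):
    """Place address number `a` along a street segment, planar interpolation.

    r_s, r_e  : start and end of the segment's address range
    origin    : (x0, y0) start point of the street centerline
    dest      : (x1, y1) end point of the street centerline

    Returns the interpolated (x2, y2) on the centerline. Using the
    address's offset within the range as the proportion is a common
    reading of Equations 3 and 4, not the only possible one.
    """
    span = abs(r_e - r_s)
    t = 0.5 if span == 0 else abs(a - r_s) / span   # fraction along segment
    x0, y0 = origin
    x1, y1 = dest
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
```

For example, address 150 on a block with range 100-198 running from (0, 0) to (100, 0) lands slightly past the midpoint of the segment, at roughly x = 51.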

This calculated position will be along the centerline of the reference feature, corresponding to the middle of the street. Thus, a dropback usually is applied to move the output location away from the centerline toward and/or beyond the sides of the street where the buildings probably are located in city-style addresses. Experiments have been performed attempting to determine the optimal direction and length for this dropback but have found that the high variability in street widths and directions prohibits consistent improvements (Ratcliffe 2001, Cayo and Talbot 2003). Therefore, in practice, an orthogonal direction usually is chosen along with a standard distance. However, it is likely that better results could be achieved by inspecting the MTFCC of a road to determine the number of lanes and multiplying by the average width per lane. Best practices related to these fundamental components of the linear-based interpolation methods are listed in Best Practices 27.
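A minimal orthogonal dropback might look like the following. The side selection (e.g., derived from address parity) and the dropback distance (e.g., one-half the street width) are inputs the caller must supply; neither is prescribed by the interpolation itself:

```python
import math

def apply_dropback(point, origin, dest, dropback, side):
    """Offset an interpolated point orthogonally from the street centerline.

    point    : (x, y) interpolated location on the centerline
    origin   : (x0, y0) start of the street segment
    dest     : (x1, y1) end of the street segment
    dropback : offset distance, in the same planar units as the coordinates
    side     : +1 or -1, selecting which side of the street (e.g., parity)
    """
    dx, dy = dest[0] - origin[0], dest[1] - origin[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return point                       # degenerate segment: no direction
    nx, ny = -dy / length, dx / length     # unit normal to the segment
    return (point[0] + side * dropback * nx,
            point[1] + side * dropback * ny)
```

On an east-west segment, a point at (50, 0) with a 10-unit dropback moves to (50, 10) on one side or (50, -10) on the other.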


Best Practices 27 – Linear-based interpolation

Policy Decision: When should linear-based interpolation be used?
Best Practice: Linear-based interpolation should be used when complete and accurate point- or polygon-based reference datasets are not available. Linear-based interpolation should be used when input data cannot be directly linked with a point-based reference dataset and must be matched to features representing multi-entity features.

Policy Decision: When and how should the parameters for linear interpolation be chosen?
Best Practice: The parameters used for linear-based interpolation should be based upon the attributes available in the reference dataset.

Policy Decision: What parity information should be used for linear-based feature interpolation?
Best Practice: At a minimum, binary parity should be used. If more information is available in the reference dataset regarding the parity of an address, it should be used (e.g., multiple address ranges per side of street).

Policy Decision: What linear-interpolation function should be used?
Best Practice: At a minimum, planar interpolation should be used. If a spherical interpolation algorithm is available, it should be used.

Policy Decision: Should the same dropback value and direction always be used?
Best Practice: The same dropback value and direction should always be used, based on the width of the street as determined by:
• Number of lanes
• MTFCC codes
• Average width per lane

Policy Decision: Which dropback value and direction can and should be used?
Best Practice: An a priori dropback value of one-half the reference street's width (based on the street classification code and average classification street widths) should be applied in an orientation orthogonal to the primary direction of the street segment on which the interpolated output falls.

When performing linear-based interpolation in the manner just described, several assumptions are involved, and new geocoding methods are aimed at eliminating each (e.g., Christen and Churches [2005] and Bakshi et al. [2004]). The parcel existence assumption is that all addresses within an address range actually exist. The parcel homogeneity assumption is that each parcel is of exactly the same dimensions. The parcel extent assumption is that addresses on the segment start at one endpoint of the segment and completely fill the space on the street all the way to the other endpoint. These concepts are illustrated in Figure 15. Additionally, the corner lot assumption/problem is that when using a measure

of the length of the segment for interpolation, it is unknown how much real estate may be taken up along a street segment by parcels from other intersecting street segments (around the corner), and the actual street length may be shorter than expected. Address-range feature interpolation is subject to all of these assumptions (Bakshi et al. 2004).

Figure 15 – Example of parcel existence and homogeneity assumptions

Recent research has attempted to address each of these assumptions by incorporating additional knowledge into the feature interpolation algorithm about the true characteristics of the reference feature (Bakshi et al. 2004). First, by determining the true number of buildings along a reference feature, the parcel existence assumption can be alleviated. By doing this, the distance to the proper feature can be calculated more accurately. However, this approach still assumes that each of the parcels is of the same size, and is thus termed uniform lot feature interpolation. This is depicted in Figure 16.


Figure 16 – Example of uniform lot assumption
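Under the uniform-lot refinement, only the count and ordering of the addresses known to actually exist on the segment matter. A sketch, with illustrative names and the common assumption that the point should fall at the center of the lot:

```python
def uniform_lot_position(address, addresses_on_segment):
    """Fraction along the segment under the uniform-lot assumption.

    addresses_on_segment: the address numbers known to actually exist
    on this side of the segment (e.g., from an address verifier or a
    count of buildings). Each lot gets an equal share of the segment,
    and the output point falls at the center of the matching lot.
    """
    lots = sorted(addresses_on_segment)
    i = lots.index(address)           # rank of the target address
    return (i + 0.5) / len(lots)      # center of the i-th equal-width lot
```

Compare this with plain address-range interpolation: address 104 in the range 100-198 would land near the start of the block, but if only four addresses actually exist on the segment, the uniform-lot estimate places it at 0.375 of the way along.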

If the actual parcel sizes are available, the parcel homogeneity assumption can be overcome and the actual distance from the origin of the street segment can be calculated directly by summing the widths of each parcel until the correct one is reached; this is thus termed actual lot feature interpolation. This is depicted in Figure 17.

Figure 17 – Example of actual lot assumption
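Actual-lot interpolation replaces the equal-share assumption with a running sum of real parcel widths. A sketch under the same illustrative conventions:

```python
def actual_lot_position(address, lots):
    """Distance from the segment origin under actual-lot interpolation.

    lots: list of (address_number, width) tuples in order along the
    segment, with widths in the same units as the segment length.
    Sums the widths of the preceding parcels, then adds half the width
    of the matching parcel to land at its center.
    """
    distance = 0.0
    for number, width in lots:
        if number == address:
            return distance + width / 2.0
        distance += width
    raise ValueError("address not found on segment")
```

Note that this still measures from the segment endpoint, so the parcel extent and corner lot assumptions discussed next remain in force.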


However, the distance is still calculated using the parcel extent assumption that the addresses on a block start exactly at the endpoint of the street. This obviously is not the case because the endpoint of the street represents the intersection of centerlines of intersecting streets. The location of this point is in the center of the street intersection, and therefore the actual parcels of the street cannot start for at least one-half the width of the street (i.e., where the curb starts). This is depicted in Figure 18.

Figure 18 – Example of street offsets

The corner lot problem can be overcome in a two-step manner. First, the segments that make up the block must be determined. Second, an error-minimizing algorithm can be run to determine the most likely distribution of the parcels for the whole block based on the length of the street segments, the sizes of lots, and the combinations of their possible layouts. This distribution then can be used to derive a better estimate of the distance from the endpoint to the center of the correct parcel. This is depicted in Figure 19. None of the approaches discussed thus far can overcome the assumption that the building is located at the centroid of the parcel, which may not be the case.


Figure 19 – Example of corner lot problem

These small performance gains in the accuracy of the linear-based feature interpolation algorithm may hardly seem worth the effort, but this is not necessarily the case. Micro-scale spatial analyses, although not currently performed with great frequency or regularity, are becoming more and more prevalent in cancer- and health-related research in general. For example, a recent study of exposure to particulate matter emanating from freeways determined that the effect of this exposure is reduced greatly as one moves small distances away from the freeway, on the order of several meters (i.e., high-distance decay). Thus, if the accuracy of the geocoding process can be improved by just a few meters, cases can more accurately be classified as exposed or not (Zandbergen 2007), and more accurate quantifications of potential individual exposure levels can be calculated, as has been attempted with pesticides (Rull and Ritz 2003, Nuckols et al. 2007), for example. Best practices related to linear-based interpolation assumptions are listed in Best Practices 28.


Best Practices 28 – Linear-based interpolation assumptions

Policy Decision: When and how should linear-based interpolation assumptions be overcome?
Best Practice: If data are available and/or obtainable, all assumptions that can be overcome should be.

Policy Decision: Where can data be obtained to overcome linear-based interpolation assumptions?
Best Practice: Local government organizations should be contacted to obtain information on the number, size, and orientation of parcels, as well as address points.

Policy Decision: When and how can the parcel existence assumption be overcome?
Best Practice: If an address verifier is available, it should be used to verify the existence of parcels before interpolation is performed.

Policy Decision: When and how can the parcel homogeneity assumption be overcome?
Best Practice: If the parcel dimensions are available, these should be used to calculate the interpolated output location.

Policy Decision: When and how can the parcel extent assumption be overcome?
Best Practice: If the street widths are known or can be derived from the attributes of the data (street classification and average classification widths), these should be used to buffer the interpolation range geometry before performing interpolation.

Policy Decision: When and how can the corner lot problem be overcome?
Best Practice: If the layout and sizes of parcels for the entire block are available, they should be used in conjunction with the lengths of the street segments that compose the block to determine an error-minimizing arrangement, which should be used for linear-based interpolation.

9.3 AREAL UNIT-BASED FEATURE INTERPOLATION

Areal unit-based feature interpolation uses a computational process to determine a suitable output from the spatial geometry of polygon-based reference features. This technique has a unique characteristic—it can be either very accurate or very inaccurate, depending on the geographic scale of the reference features used. For instance, areal unit-based interpolation on parcel-level reference features should produce very accurate results compared to linear-based interpolation for the same input feature. However, areal unit-based interpolation at the scale of a typical USPS ZIP Code would be far less accurate in comparison to a linear-based interpolation for the same input data (noting again that USPS ZIP Codes are not actually areal units; see Section 5.1.4).

Centroid calculations (or an approximation thereof) are the usual interpolation performed on areal unit-based reference features. This can be done via several methods, with each emphasizing different characteristics. The simplest method is to take the centroid of the bounding box of the feature; it often is employed in cases for which complex computations are too expensive. A somewhat more complicated approach, the center-of-mass or geographic centroid calculation, borrows from physics and simply uses the shape and area to compute the centroid. This does not take into account any aspatial information about the contents of the areal unit that might make it more accurate. At the resolution of an urban

90 November 10, 2008 D. W. Goldberg parcel, this has been shown to be fairly accurate because the assumption that a building is in the center of a parcel is mostly valid, as long as the parcels are small (Ratcliffe 2001). However, as parcels increase in size (e.g., as the reference dataset moves from an urban area characterized by small parcels to a rural area characterized by larger parcels) this as- sumption becomes less and less valid and the centroid calculation becomes less accurate. In particular, on very large parcels such as farms or campuses, the center of mass centroid be- comes very inaccurate (Stevenson et al. 2000, Durr and Froggatt 2002). In contrast, algo- rithms that employ a weighted centroid calculation sometimes are more accurate when ap- plied to these larger parcels. These make use of the descriptive quality of the aspatial attributes associated with the reference feature (e.g., population density surfaces) to move the centroid toward a more representative location. To achieve this, the polygon-based features can be intersected with a surface created from finer resolution features to associate a series of values for each location throughout the polygon. This weight surface can be derived from both raster-based and individual feature reference data. In either case, the weighted centroid algorithm runs on top of this surface to calculate the position of the centroid from the finer resolution dataset, either the raster cell values in the first case or the values of the appropriate attribute for individual features. For example, in a relatively large areal unit such as a ZCTA (granted that not all ZCTAs are large), a weighted centroid algorithm could use information about a population distribution to calculate a more representative and probable centroid. This will produce a centroid closer to where the most people actually live, thereby increasing the probability that the geocode produced is closer to where the input data really are. 
This surface could be computed from a raster dataset with cell values equaling population counts or from a point dataset with each point having a population count attribute; essentially a method of looking at a point dataset as a non-uniformly distributed raster dataset. See Beyer et al. (2008) for a detailed evaluation of multiple weighting schemes. Best practices related to areal unit-based interpolation are listed in Best Practices 29.
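Once the polygon has been intersected with the finer-resolution surface, the weighted centroid reduces to a weighted mean of sample locations. A sketch, assuming the caller has already collected the (x, y, weight) samples falling inside the areal unit (raster cell centers with population counts, or point features with a population attribute):

```python
def weighted_centroid(points):
    """Population-weighted centroid of an areal unit.

    points: iterable of (x, y, weight) samples inside the polygon.
    With all weights equal this reduces to a simple mean of locations;
    larger weights pull the centroid toward where people actually live.
    """
    points = list(points)
    total = sum(w for _, _, w in points)
    if total == 0:
        raise ValueError("no population weight inside the unit")
    x = sum(px * w for px, _, w in points) / total
    y = sum(py * w for _, py, w in points) / total
    return (x, y)
```

For example, two samples at x = 0 and x = 10 with weights 1 and 3 give a centroid at x = 7.5, pulled three-quarters of the way toward the heavier sample rather than sitting at the geometric midpoint.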


Best Practices 29 – Areal unit-based interpolation

Policy Decision: When should areal unit-based interpolation methods be used?
Best Practice: Areal unit-based interpolation should be used over linear-based alternatives when the spatial resolution of the areal unit-based reference features is higher than that of the corresponding linear-based counterparts. Areal unit-based interpolation should be used when more accurate means have been tried and failed, and it is the only option left. Areal unit interpolation should not be used if metadata about the accuracy of the features is not available.

Policy Decision: When and which areal unit-based interpolation methods should be used?
Best Practice: At a minimum, geographic (center-of-mass) centroid calculations should be used. If appropriate information is available, weighted centroid approximations should be used. Feature-bounding box centroids should not be used.

Policy Decision: Which additional data sources should be used for areal unit-based centroid approximations?
Best Practice: Population density should be used for weighted centroid calculation for areal unit-based reference datasets containing reference features of lower resolution than parcels (e.g., USPS ZIP Codes).

Policy Decision: What metadata should be maintained?
Best Practice: If weighted centroids are calculated, the metadata for the datasets used in the calculation, identifiers for the grid cells containing the values used for calculation, and aggregates for the values used in the calculation (e.g., mean, min, max, range) should be recorded along with the geocoded record.


10. OUTPUT DATA

This section briefly discusses issues related to geocoded output data.

10.1 DOWNSTREAM COMPATIBILITY

The definition of geocoding presented earlier was specifically designed to encompass and include a wide variety of data types as valid output from the geocoding process. Accordingly, it is perfectly acceptable for a geocoding process to return a point, line, polyline, polygon, or some other higher-complexity geographic object. What must be considered, however, is that the output produced inevitably will need to be transmitted to and consumed and/or processed by some downstream component (e.g., a spatial statistical package). These requirements, capabilities, and limitations of the eventual data consumer and transmission mechanisms need to be considered when assessing an appropriate output format. In most cases, these constraints will tend to lean towards the production of simple points as output data.

10.2 DATA LOSS

When examining the available output options from a data loss perspective, one may consider a different option. Take the ambiguity problems inherent in moving from a lower resolution geographic feature to a higher one described earlier (Section 7.3), for example. The high-resolution data can always be abstracted to lower resolution later if necessary, but once converted they cannot be unambiguously converted back to their higher-resolution roots. For example, a parcel centroid can always be computed from a parcel boundary, but the other direction is not possible if new data are discovered that could have influenced the assignment of the centroid. Therefore, it may be advisable to always return and store the spatial output of the geocoding process at the highest level of geographic resolution possible. There is a risk associated with this process because of the temporal staleness problems that can occur with geocode caches (e.g., if the parcel boundaries change over time).

Best Practices 30 – Output data

Policy Decision: What geographic format should output data take?
Best Practice: At a minimum, output data should be a geographic point with a reference to the ID of the reference feature used. If other processes can handle it, the full geometry of the reference feature also should be returned.




Part 3: The Many Metrics for Measuring Quality

Notions of “quality” vary among the scientific disciplines. This term (concept) has become particularly convoluted when used to describe the geocoding process. In the information and computational sciences, the “quality” of a result traditionally refers to the notions of precision (accuracy) and recall (completeness), while in the geographical sciences these same terms take on different (yet closely related) meanings. Although the factors that contribute to the overall notion of geocode quality are too numerous and conceptually diverse to be combined into a single value, this is how it is generally described. The very nature of the geocoding process precludes the specification of any single quality metric capable of sufficiently describing the geocoded output. This part of the document will elaborate on the many metrics that can affect different aspects of quality for the resulting geocode.




11. QUALITY METRICS

This section describes the metrics that can be used to develop confidence in the overall quality of geocoded results.

11.1 ACCURACY Researchers must have a clear understanding of the quality of their data so that they can decide its fitness-for-use in their particular study. Each study undoubtedly will have its own unique data quality criteria, but in general the metrics listed in Table 29 could be used as a guide to develop a level of confidence about the quality of geocodes. Several aspects of con- fidence are listed along with their descriptions, factors, and example evaluation criteria. Further research is required to determine exactly if, how, or when these metrics could be combined to produce a single “quality” metric for a geocode. The topics in this table will be covered in more detail in the following sections. An excellent review of current studies look- ing at geocoding quality and its effects on subsequent studies is available in Abe and Stin- chcomb (2008). Research on how spatial statistical models can be used in the presence of geocodes (and geocoded dataset) of varying qualities (e.g., Zimmerman [2008]) is emerging as well.


Table 29 – Metrics for deriving confidence in geocoded results

Metric: Precision
Description: How close is the location of a geocode to the true location?
Example factors and criteria (> better than):
  Interpolation algorithm used: uniform lot > address range
  Interpolation algorithm assumptions: fewer assumptions > more assumptions
  Reference feature geometry size: smaller > larger
  Reference feature geometry accuracy: higher > lower
  Matching algorithm certainty: higher > lower

Metric: Certainty
Description: How positive can one be that the geocode produced represents the correct location?
Example factors and criteria (> better than):
  Matching algorithm used: deterministic > probabilistic
  Matching algorithm success: exact match > non-exact match; high probability > low probability; non-ambiguous match > ambiguous match
  Reference feature geometry accuracy: higher > lower
  Matching algorithm relaxation amount: none > some
  Matching algorithm relaxation type: attribute transposition > Soundex

Metric: Reliability
Description: How much trust can be placed in the process used to create a geocode?
Example factors and criteria (> better than):
  Transparency of the process: higher > lower
  Reputability of software/vendor: higher > lower
  Reputability of reference data: higher > lower
  Reference data fitness for area: higher > lower
  Concordance with other sources: higher > lower
  Concordance with ground truthing: higher > lower


12. SPATIAL ACCURACY

This section explores several contributing factors to spatial accuracy within different components and at different levels of the geocoding process.

12.1 SPATIAL ACCURACY DEFINED The spatial accuracy of geocoding output is a combination of the accuracy of both the processes applied and the datasets used. The term “accuracy” can and does mean several dif- ferent things when used in the context of geocoding. In general, accuracy typically is a measure of how close to the true value something is. General best practices related to geo- coding accuracy are listed in Best Practices 31.

Best Practices 31 – Output data accuracy

Policy Decision: When and how can and should accuracy information be associated with output data?
Best Practice: Any information available about the production of the output data should always be associated with the output data.

12.2 CONTRIBUTORS TO SPATIAL ACCURACY

Refining this definition of accuracy, spatial accuracy can be defined as a measure of how true a geographic representation is to the actual physical location it represents. This is a function of several factors (e.g., the resolution it is being modeled at, or the geographic unit used to bind it). For example, a parcel represented as a point would be less spatially accurate than the parcel being represented as a polygon. Additionally, a polygon defined at a local scale with thousands of vertices is more spatially accurate than the same representation at the national scale, where it has been generalized into a dozen vertices. With regard to the geocoding process, spatial accuracy is used to describe the resulting geocode and the limits of the reference dataset. The resulting positional accuracy of a geocode is dependent on every component of the geocoding process. The decisions made by geocoding algorithms at each step can have both positive and negative effects, either potentially increasing or decreasing the resulting accuracy of the spatial output.

12.2.1 Input data specification

Output geocode accuracy can be traced back to the very beginning of the process, when the input data initially are specified. There has been some research into associating first-order levels of accuracy with different types of locational descriptions (Davis Jr. et al. 2003, Davis Jr. and Fonseca 2007), but in practice, these distinctions rarely are quantified and returned as accuracy metrics with the resulting data. These different types of data specifications inherently encode different levels of information. As an example, consider the difference between the following two input data examples. The first is a relative locational description that, when considered as an address at the location, could refer to the address closest to the corner on either Vermont Ave or 36th Place (the corner lot problem from Section 9.2), while the second are locational data that describe a specific street address.


“The northeast corner of Vermont Avenue and 36th Place”

“3620 South Vermont Avenue, Los Angeles, CA 90089”

One description clearly inherently encodes the definition of a more precise location than the other. When compared to the specific street address, the relative description implicitly embeds more uncertainty into the location it is describing, which is carried directly into the geocoding process and the resulting geocode. Using the specific street address, a geocoding algorithm can uniquely identify an unambiguous reference feature to match. With the relative location, a geocoder can instead only narrow the result to a set of likely candidate buildings, or the area that encompasses all of them. Similarly, the amount of information encoded into a description has a fundamental effect on the level of accuracy that can be achieved by a geocoder. Consider the difference in implicit accuracy that can be assumed between the following two input data examples. The first specifies an exact address, while the second specifies a location somewhere on a street, inherently less accurate than the first. In this case the resulting accuracy is a function of the assumed geographic resolution defined by the amount of information encoded in the input data.

“831 Nash Street, El Segundo, CA 90245”

“Nash Street, El Segundo, CA 90245”

Relationships among the implicit spatial accuracies of different types of locational descriptions are shown in Figure 20, with:

a) Depicting accuracy to the building footprint (the outline)
b) Showing how the building footprint (the small dot) is more accurate than the USPS ZIP+4 (United States Postal Service 2008a) “90089-0255” (the polygon)
c) Showing the implicit resolution of combined street segments (the straight line) within a USPS ZIP Code (blue) along “Vermont Ave, 90089”
d) Showing both a relative direction (the large polygon) “northeast corner of Vermont and 36th” and the “3600-3700 block of Vermont Ave, Los Angeles, CA 90089”
e) Showing the relation of the building (the small dot) and USPS ZIP+4 (the small inner polygon) to the USPS ZIP “90089” (the larger polygon)
f-g) Showing the relations among the city, county, and state.


Figure 20 – Certainties within geographic resolutions (Google, Inc. 2008b)


Best practices related to input data are listed in Best Practices 32.

Best Practices 32 – Input data implicit accuracies

Policy Decision: What information quality metrics can and should be associated with input data?
Best Practice: At a minimum, metrics based on the relative amount of information contained in an input address should be associated with a record.

Policy Decision: When and how can and should the implicit spatial accuracy of input data be calculated?
Best Practice: Implicit spatial accuracy should always be calculated and associated with any input data that are geocoded. Implicit spatial accuracy should be calculated as the area for which the highest resolution reference feature can be unambiguously matched.

Policy Decision: When and how can and should the implicit level of information of input data be calculated?
Best Practice: The implicit level of information should always be calculated and associated with any input data that are geocoded. The implicit level of information should be calculated based on the types of features it contains (e.g., point, line, polygon); its overall reported accuracy; and any estimates of atomic feature accuracy within the region that are available.

12.2.2 Normalization and feature matching
The effects of input-data specificity on accuracy can also be seen clearly in both the address normalization and feature-matching algorithms. First, recall that substitution-based normalization can be considered an example of deterministic feature matching: it performs the same task at a different resolution (per attribute instead of per feature). As the normalization algorithm moves through the input string, if the value of the current input token does not match the rule for the corresponding address attribute, the attribute is skipped and the value is tried as a match for the next attribute. In the case of feature matching, the relaxation of attributes essentially removes them from the input data. Both of these processes can "throw away" data elements as they process an input address, lowering the accuracy of the result by implicitly lowering the amount of information encoded by the input description. For example, consider the following three addresses. The first is the real address, but the input data are presented as the second, and the feature-matching algorithm cannot match the input but can match the third. Throwing away the directional element "E" prevents any consideration that "E" might have been correct and the street name "Wall" wrong.

“14 E Mall St”

“14 E Wall St”

“14 Wall St”
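This relaxation behavior can be sketched in a few lines. The reference features, attribute names, and matching rules below are invented for illustration; real geocoders use far richer matching logic:

```python
# Hypothetical mini reference dataset: two street segments with address ranges.
REFERENCE = [
    {"number": (1, 99), "predir": "E", "name": "Mall", "suffix": "St"},
    {"number": (1, 99), "predir": "",  "name": "Wall", "suffix": "St"},
]

def match(addr, attrs=("predir", "name", "suffix")):
    """Deterministic matching: every listed attribute must agree exactly,
    and the house number must fall within the segment's address range."""
    return [f for f in REFERENCE
            if all(addr.get(a, "") == f[a] for a in attrs)
            and f["number"][0] <= addr["number"] <= f["number"][1]]

addr = {"number": 14, "predir": "E", "name": "Wall", "suffix": "St"}

exact = match(addr)                              # "14 E Wall St": no feature matches
relaxed = match(addr, attrs=("name", "suffix"))  # drop "E": matches "Wall St"
print(exact, relaxed)
```

Relaxation finds a match, but only by silently discarding the evidence that "E Mall St" may have been the intended street.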


12.2.3 Reference datasets
The reference datasets used by a geocoding process contribute to the spatial accuracy of the output in a similar manner. Different representations of reference features necessarily encode different levels of information, producing the very same accuracy effects seen as a consequence of the different levels of input data specification just described. Moreover, the spatial accuracy of the reference features may be the most important contributing factor to the overall accuracy of the spatial output. Interpolation algorithms operating on the reference features can only work with what they are given, and will never produce a result more accurate than the original reference feature. Granted, these interpolation algorithms can and do produce spatial outputs of varying degrees of spatial accuracy based on their intrinsic characteristics, but the baseline spatial accuracy of the reference feature is translated directly to the output of the interpolation algorithm. The actual spatial accuracy of these reference features can vary quite dramatically. Sometimes, the larger the geographic coverage of a reference dataset, the worse the spatial accuracy of its features; this has historically been observed when comparing street vectors based on TIGER/Line files to those produced by local governments. Likewise, the differences in spatial accuracy between free reference datasets (e.g., TIGER/Line) and commercial counterparts (e.g., NAVTEQ) can be quite striking, as discussed in Sections 4.5 and 7. Best practices relating to reference dataset accuracy are listed in Best Practices 33.

Best Practices 33 – Reference dataset accuracy

Policy Decision: When and how can and should atomic feature accuracy be measured and/or estimated?
Best Practice: Estimates of the atomic feature accuracy within a reference dataset should be made periodically by random selection and manual evaluation of the reference features within the region covered by the dataset.

12.2.4 Feature interpolation
Any time feature interpolation is performed, one should ask "how accurate is the result?" and "how certain can I be of the result?" In the geocoding literature, however, these questions have largely remained unanswered. Without going into the field and physically measuring the difference between the predicted and actual geocode values corresponding to a particular address, it is difficult to obtain a quantitative value for the accuracy of a geocode, due to the issues raised in the sections immediately preceding and following this one. However, it may be possible to derive a relative predicted certainty: a relative quantitative measure of the accuracy of a geocode based on information about how the geocode was produced (i.e., attributes of the reference feature used for interpolation), so long as one assumes that the reference feature was selected correctly (e.g., Davis Jr. and Fonseca 2007, Shi 2007). In other words, the relative predicted certainty is the size of the area within which one can be certain that the actual true value for the geocode falls. For instance, if address-range interpolation was used, the relative predicted certainty would correspond roughly to an oval-shaped area encompassing the street segment; if areal-unit interpolation was used, it would correspond to the area of the feature. Research into identifying, calculating, representing, and utilizing these types of certainty measures for geocodes is in its infancy, but once more fully developed it should provide a much richer description of the quality of a geocode and its suitability for use in research studies.
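As a rough illustration (not from the guide), the relative predicted certainty for the two interpolation styles can be expressed as areas. Here the address-range case is approximated as a capsule-shaped buffer around the street segment; the segment length, buffer radius, and polygon area are invented numbers:

```python
import math

def segment_certainty_area(segment_length_m, lateral_offset_m):
    """Address-range interpolation: an oval-shaped buffer around the street
    segment, approximated as a rectangle plus two semicircular end caps."""
    r = lateral_offset_m
    return 2 * r * segment_length_m + math.pi * r ** 2

def areal_unit_certainty_area(polygon_area_m2):
    """Areal-unit interpolation: the certainty area is the unit's own area."""
    return polygon_area_m2

# A short urban segment yields a far smaller certainty area than a ZIP polygon:
print(round(segment_certainty_area(100, 15)))   # buffer around a 100 m segment
print(areal_unit_certainty_area(2_500_000))     # a 2.5 km^2 ZIP Code polygon
```

Smaller certainty areas correspond to geocodes one can place more confidently, which is why segment-based matches generally outrank areal-unit centroids.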


12.3 MEASURING POSITIONAL ACCURACY
There are several ways to directly derive quantitative values for the positional accuracy of geocoded data, some more costly than others. The most accurate, and most expensive, is to go into the field with GPS devices and obtain positional readings for the address data to compare with the geocodes. This option requires a great deal of manpower, especially if the geographic coverage is large, and therefore may not be feasible for most organizations. However, it may be feasible with special effort if only a small subset of data needs to be analyzed for a small-area analysis.

Barring the ability to use GPS readings for large-scale accuracy measurements, other options exist. The simplest is to compare the newly produced geocodes with existing geocodes; new geocoding algorithms typically are evaluated and tested in this manner as they are developed. With a fairly small set of representative gold standard data, the spatial accuracy of new geocoding algorithms can be tested quickly to determine their usefulness. The key is investing resources to acquire appropriate sample data.

Another, much more common option is to use georeferenced aerial imagery to validate geocodes for addresses. Sophisticated mapping technologies for displaying geocodes on top of imagery are now available at low cost, or even free online (e.g., Google Earth [Google, Inc. 2008a]). These tools allow one to see visually, and measure quantitatively, how close a geocode is to the actual feature it is supposed to represent. This approach has successfully been used in developing countries to create geocodes for all addresses in entire cities (Davis Jr. 1993). The time required is modest and the approach appears to scale (e.g., Zimmerman et al. 2007, Goldberg et al. 2008d, and the references within each), although input verification requires a fair amount of cross-validation among those performing the data input. It should also be noted that points created using imagery are still subject to some error, because the images themselves may not be perfectly georeferenced, and should not be considered equivalent to ground truthing in every case. Imagery-derived points generally can, however, be considered more accurate than their feature interpolation-based counterparts. Recent research has begun to provide encouraging results on quantifying positional accuracy (e.g., Strickland et al. 2007), but studies covering larger geographic areas and larger sample sizes are still needed.
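The gold-standard comparison described above reduces to computing distances between paired points. A minimal sketch, assuming WGS84 coordinates and a record-keyed dictionary layout that is purely illustrative:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical gold-standard (e.g., GPS or imagery-derived) vs. new geocodes:
gold = {"rec1": (34.0205, -118.2856), "rec2": (34.0224, -118.2851)}
new  = {"rec1": (34.0206, -118.2854), "rec2": (34.0230, -118.2851)}

errors = [haversine_m(*gold[k], *new[k]) for k in gold]
print(f"median positional error: {sorted(errors)[len(errors) // 2]:.0f} m")
```

Summaries such as the median or 95th-percentile error over a representative sample give a quick read on whether a new geocoder is fit for purpose.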

12.4 GEOCODING PROCESS COMPONENT ERROR INTRODUCTION
In cases for which the cost or required infrastructure and staffing make it prohibitive to quantitatively assess the positional accuracy of geocoded output, other relative scales can and should be used. For example, one way to derive an estimate is a measurement that describes the total accuracy as a function of the accuracies of each component of the process. There currently are no agreed-upon standards for this practice; each registry may have, or be developing, its own requirements. One proposed breakdown of benchmarking component accuracy is listed in Table 30; it is a good start at collecting and identifying process errors, but may not be at a fine enough granularity to support real judgments about data quality. Note also that validating reference features and attributes is easier for some reference data types (e.g., address points and parcel centroids) than for others (e.g., street centerlines).


Table 30 – Proposed relative positional accuracy metrics

Component 1: Original address quality, benchmarked with address validation
Component 2: Reference attribute quality, benchmarked with address validation
Component 3: Geocoding match criteria, benchmarked to baseline data
Component 4: Geocoding algorithm, benchmarked against other algorithms

It is unclear at this point how these benchmarks can or should be quantified in such a manner that they may be combined into a total, overall accuracy measure.

Additionally, emerging research is investigating the links between the geographic characteristics of an area and the resulting accuracy of geocodes (e.g., Zimmerman 2006, Zimmerman et al. 2007). Although it has long been known that geocoding error is reduced in urban areas, where shorter street segments reduce interpolation error, the effects of other characteristics such as street pattern design, slope, etc., are unknown and are coming under increasing scrutiny (Goldberg et al. 2008b). The prediction, calculation, and understanding of the sources of geocoding error presented in this section warrant further investigation.

12.5 USES OF POSITIONAL ACCURACY
First and foremost, these quality metrics associated with the spatial values can be included in the spatial analyses performed by researchers to determine the impact of this uncertainty on their results. They also can be used for quality control and data validation. For example, the geocode and its accuracy can be used to validate other parts of the patient abstract, such as the dxCounty code, and conversely, a county can reveal inaccuracies in a geocoded result. To attempt this, one simply intersects the geocode with a county layer to determine whether the indicated county is indeed correct. Positional accuracy measures also allow boundary cases at risk of misclassification to be identified as potential problems before analyses are performed, enabling first-order estimates of the uncertainty resulting from these data being grouped into erroneous classifications.

12.5.1 Importance of Positional Accuracy
The importance of the positional accuracy of data produced by a geocoding process cannot be overstated. Volumes of literature in many disparate research fields describe the potentially detrimental effects of using inaccurate data; for a health-focused review, see Rushton et al. (2006) and the references within. There are, at present, no set standards for the minimum levels of accuracy that geocodes must have to be suitable in every circumstance, but in many cases common sense can be used to determine their appropriateness for a particular study's needs. When geocodes do not meet a desired level of accuracy, the researcher may have no choice other than conducting a case review or manually moving cases to the desired locations using some form of manual correction (e.g., as shown in Goldberg et al. [2008d]).
Here, a few illustrative examples demonstrate only the simplest of the problems that can occur, ranging from misclassification of subject data to misassignment of related data. Even though the majority of studies do not report or know the spatial accuracy of their geocoded data or how it was derived, some number usually is reported anyway; reported values range from less than 1 meter to several kilometers. The most common problem that inaccurate data can produce is shown in Figure 21: for a geocode lying close to the boundary between two geographic features, the potential spatial error is large enough that the geocode could in reality fall in either of the larger features.

These boundary cases represent a serious problem. Although the attributes and/or classifications associated with being inside one polygon might be correct, one cannot be sure when the positional error is larger than the distance to the boundary. The associated data could be wrong when a parcel straddles two USPS ZIP Codes (on the border) or when the USPS ZIP Code centroid is in the wrong (inaccurate) location. In either case, the wrong USPS ZIP Code data would be associated with the parcel.
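The point-in-polygon county check mentioned above can be sketched with a ray-casting test. All coordinates and the county shape here are invented for illustration; a real layer would hold actual county boundaries:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray running east from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# Hypothetical county layer: FIPS code -> boundary ring (lon, lat pairs).
county_layer = {
    "037": [(-119.0, 33.7), (-117.6, 33.7), (-117.6, 34.8), (-119.0, 34.8)],
}

def validate_county(lon, lat, recorded_fips):
    """Check whether a geocode falls inside the recorded dxCounty polygon."""
    poly = county_layer.get(recorded_fips)
    return poly is not None and point_in_polygon(lon, lat, poly)

print(validate_county(-118.29, 34.02, "037"))   # geocode agrees with abstract
print(validate_county(-116.00, 34.02, "037"))   # disagreement: flag for review
```

A disagreement does not say which value is wrong, only that the record deserves review, and the boundary-case caveats below apply with full force.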

Figure 21 – Example of misclassification due to uncertainty (Google, Inc. 2008b)

This problem was documented frequently in the early automatic geocoding literature (e.g., Gatrell 1989, Collins et al. 1998), yet there still is no clear rule for expressing the certainty of a classification made via a point-in-polygon association as a direct function of the spatial accuracy of the geocode and its proximity to boundaries. Even if metrics describing this phenomenon became commonplace, the spatial statistical analysis methods in common use are not sufficient to handle these conditional associations of attributes. Certain fuzzy-logic operations are capable of operating under these spatially based conditional associations of attributes, and their introduction to spatial analysis in cancer-related research could prove useful. Zimmerman (2008) provides an excellent review of current spatial statistical methods that can be used in the presence of incompletely or incorrectly geocoded data. However, it must be noted that some parcels and/or buildings can and do legitimately fall into two or more administrative units (boundary classifications), such as those lying right along the boundary of multiple regions. The county assignment for taxation purposes, for example, traditionally has been handled by agreements between county assessors' offices in such cases, meaning that spatial analysis alone cannot determine the correct attribute classification in all cases.


Because of these shortcomings, some have argued that an address should be linked directly with the polygon ID (e.g., CT) using lookup tables instead of through point-in-polygon calculations (Rushton et al. 2006). When the required lookup tables exist for the polygon reference data of interest this may prove a better option, but when they do not exist the only choice may be a point-in-polygon approach. Best practices related to positional accuracy are listed in Best Practices 34.

It cannot be stressed enough that in all cases, it is important for a researcher utilizing the geocodes to determine whether the reported accuracy suits the needs of the study. An example can be found in the study presented earlier (in Section 2.4) investigating whether living near a highway, and the subsequent exposure to asbestos from brake linings and clutch pads, affects the likelihood of developing mesothelioma. The distance decay of the particulate matter is on the order of meters, so a dataset of geocodes accurate only to the resolution of city centroids obviously would not suffice. Essentially, the scale of the phenomenon being studied needs to be determined and geocodes of the appropriate scale used. When the data are not at the desired/required level of accuracy, the researchers may have no choice but to conduct a case review, or manually move cases to the desired (correct) locations (more detail is covered in Sections 16 and 19).


Best Practices 34 – Positional accuracy

Policy Decision: When and how can and should GPS be used to measure the positional accuracy of geocoded data?
Best Practice: If possible, GPS measurements should be used to obtain the ground truth accuracy of as much geocoded output as possible. Covering large regions may not be a valid option for policy or budgetary reasons, but this approach may be feasible for small areas. Metadata should describe:
• Time and date of measurement
• Type of GPS
• Types of any other devices used (e.g., laser distance meters)

Policy Decision: When and how can and should imagery be used to measure the positional accuracy of geocoded data?
Best Practice: Imagery should be used to ground truth the accuracy of geocoded data if GPS is not an option. Metadata should describe:
• Time, date, and method of measurement
• Type and source of imagery

Policy Decision: When and how can existing geocodes be used to measure the positional accuracy of geocoded data?
Best Practice: If old values for geocodes exist, they should be compared against the newly produced values every time a new one is created. If geocodes are updated or replaced, metadata should describe:
• Justification for the change
• The old value

Policy Decision: When and how can and should georeferenced imagery be used to measure the positional accuracy of geocoded data?
Best Practice: If suitable tools exist, and if time, manpower, and budgetary constraints allow for georeferenced imagery-based geocode accuracy measurement, it should be performed on as much data as possible. Metadata should describe the characteristics of the imagery used:
• Source
• Vintage
• Resolution

Policy Decision: When and how can and should the positional accuracy metrics associated with geocodes be used in research or analysis?
Best Practice: At a minimum, the FGDC Content Standards for Digital Spatial Metadata (United States Federal Geographic Data Committee 2008a) should be used to describe the quality of the output geocode. The metrics describing the positional accuracy of geocodes should be used whenever analysis or research is performed using any geocodes. Ideally, confidence metrics should be associated with output geocodes and the entire process used to create them, including:
• Accuracy
• Certainty
• Reliability
Confidence metrics should be utilized in the analysis of geocoded spatial data.

Policy Decision: When and how can and should limits on acceptable levels of positional accuracy of data be placed?
Best Practice: There should be no limits placed on the best possible accuracy that can be produced or kept. There should be limits placed on the worst possible accuracy that can be produced or supported, based on the lowest-resolution feature that is considered a valid match (e.g., USPS ZIP Code centroid, county centroid); anything of lower resolution should be considered a geocoding failure. A USPS ZIP Code centroid should be considered the lowest acceptable match.

Policy Decision: When and how can and should a geocoded data consumer (e.g., researcher) ensure the accuracy of their geocoded data?
Best Practice: A data consumer should always ensure the quality of their geocodes by requiring specific levels of accuracy before use (e.g., for spatial analyses). If the data to be used cannot achieve the required levels of accuracy, they should not be used.


13. REFERENCE DATA QUALITY

This section discusses the detailed issues involved in the spatial and temporal accuracy of reference datasets, while also introducing the concepts of caching and completeness.

13.1 SPATIAL ACCURACY OF REFERENCE DATA
Geographical bias was introduced earlier to describe how the accuracy of reference features may depend on where they are located. This phenomenon can clearly be seen in the accuracy reported in rural areas versus urban areas, due to two factors. First, the linear-based feature interpolation algorithms used are more accurate when applied to shorter street segments than when applied to longer ones, and rural areas have a higher percentage of longer streets than do urban areas.

Second, the spatial accuracy of the reference features themselves differs across the reference dataset. Again, in rural areas it has been shown that reference datasets are less accurate than their urban counterparts. For example, the TIGER/Line files (United States Census Bureau 2008d) have been shown to have higher spatial accuracy in urban areas with short street segments. Additionally, as previously discussed, different reference datasets for the same area will have different levels of spatial accuracy (e.g., NAVTEQ [NAVTEQ 2008] may be better than TIGER/Line).

One aspect of these accuracy differences can be seen in the resolution differences depicted in Figure 8. A registry will need to make tradeoffs between the money and time it wishes to invest in reference data and the accuracy of the results it requires. There currently is no consensus among registries on this topic. Best practices related to reference dataset spatial accuracy problems are listed in Best Practices 35. Note that a distinction needs to be made between registries geocoding with a vendor and those geocoding themselves: in the first case, the registry may not have the authority to apply some of these best practices because the decision may be up to the particular vendor, while in the second it will have that authority.
In some instances it may also be beneficial for a registry to partner with another government organization (e.g., emergency response organizations or the U.S. Department of Health and Human Services) in performing these tasks, or to utilize their work directly.

13.2 ATTRIBUTE ACCURACY
The accuracy of the non-spatial attributes is as important as the spatial accuracy of the reference features. This can clearly be seen in both the feature-matching and feature-interpolation components of the process. If the non-spatial attributes in the reference dataset are incorrect, such as an incorrect or reversed address range for a street segment, a match may be impossible or an incorrect feature may be chosen during feature matching. Likewise, if the attributes are incorrect, the interpolation algorithm may place the resulting geocode in the wrong location, as in the common case of incorrectly defined address ranges. This is covered in more detail with regard to the input address in Section 17.3.
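The reversed-address-range failure mode is easy to see with classic linear interpolation. This sketch uses invented numbers and a one-dimensional street segment purely for illustration:

```python
def interpolate(house_number, from_number, to_number, from_xy, to_xy):
    """Classic address-range linear interpolation along a street segment:
    position the house proportionally between the segment's endpoints."""
    t = (house_number - from_number) / (to_number - from_number)
    return (from_xy[0] + t * (to_xy[0] - from_xy[0]),
            from_xy[1] + t * (to_xy[1] - from_xy[1]))

start, end = (0.0, 0.0), (100.0, 0.0)   # segment endpoints (arbitrary units)

correct = interpolate(110, 100, 198, start, end)    # range stored correctly
flipped = interpolate(110, 198, 100, start, end)    # from/to numbers reversed

print(correct)   # near the low-numbered end of the segment
print(flipped)   # thrown to the opposite end of the segment
```

The same house number lands at opposite ends of the segment, so a single reversed attribute silently shifts the geocode by nearly the full segment length.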


Best Practices 35 – Reference dataset spatial accuracy problems

Policy Decision: When should the feature spatial accuracy, feature completeness, attribute accuracy, or attribute completeness of a reference dataset be improved, or the dataset abandoned?
Best Practice: If the spatial accuracy of a reference dataset is sufficiently poor that it is the main contributor to consistently low-accuracy geocoding results, improvement or abandonment of the reference dataset should be considered. Simple examples of how to test some of these metrics can be found in Krieger et al. (2001) and Whitsel et al. (2004).

Policy Decision: When and how can and should characteristics of the reference dataset be improved?
Best Practice: If the cost of undertaking a reference dataset improvement is less than the cost of obtaining reference data of quality equivalent to the resulting improvement, improvement should be attempted if the time and money for the task are available.

Policy Decision: When and how can and should the feature spatial accuracy of the reference dataset be improved?
Best Practice: If the spatial accuracy of the reference data is consistently leading to output with sufficiently poor spatial accuracy, it should be improved if the time and money for the task are available. Improvements can be made by:
• Manual or automated conflation techniques (e.g., Chen C.C. et al. 2004)
• Using imagery (e.g., O'Grady 1999)
• Rubber sheeting (see Ward et al. 2005 for additional details)

Policy Decision: When and how can and should the attribute accuracy of the reference dataset be improved?
Best Practice: If the attribute accuracy of the reference data is consistently leading to a high proportion of false positive or false negative feature-matching results, it should be improved if the time and money for the task are available. Improvements can be made by:
• Updating the aspatial attributes through joins with data from other sources (e.g., an updated/changed street name list published by a city)
• Appending additional attributes (e.g., actual house numbers along a street instead of a simple range)

13.3 TEMPORAL ACCURACY
Temporal accuracy is a measure of how appropriate the time period a reference dataset represents is to the input data to be geocoded. It can have a large effect on the outcome of the geocoding process and affects both the spatial and non-spatial attributes. For example, although it is commonly assumed that the most recently created reference dataset will be the most accurate, this may not always be the case. The geography of the built environment is changing all the time: land is repurposed for different uses; cities expand their borders; parcels are combined or split; street names change; streets are renumbered; buildings burn and are destroyed or rebuilt; and so on. Input address data collected at one point in time most likely represent where the location existed at that particular instant. Although these changes may affect only a small number of features, the work to correct them in temporally inaccurate data versions may be time consuming. As a result, reference datasets from different periods have different characteristics in terms of the accuracy of both the spatial and aspatial data they contain. This could be seen as one argument for maintaining previous versions of reference datasets, although licensing restrictions may prohibit their retention in some cases.

A temporal extent is an attribute associated with a piece of data describing the time period for which it existed or was valid, and is useful for describing reference datasets. Because most people assume that the most recently produced dataset will be the most accurate, the appropriateness of using a dataset from one time period over another usually is not considered during the geocoding process. However, in some cases it may be more appropriate to perform the geocoding process with reference data from the point in time when the input data were collected, instead of the most recent versions. Several recent studies have attempted to investigate which reference dataset is most appropriate based on its temporal aspect and the time elapsed since input data collection (e.g., Bonner et al. 2003; Kennedy et al. 2003; McElroy et al. 2003; Han et al. 2004, 2005; Rose et al. 2004).
Although the aspatial attributes of historical reference datasets may be representative of the state of the world when the data were collected, newer datasets are typically more spatially accurate because the tools and equipment used to produce them have improved in precision over time. Barring the possibility that a street was actually physically moved between two time periods, as by natural calamity perhaps, the representation in the recent version will usually be more accurate than the older one. In these cases, the spatial attributes of the newer reference datasets can be linked with the aspatial attributes from the historical data. Most cities, as well as the U.S. Census Bureau, maintain the lineage of their data for this exact purpose, but some skill is required to temporally link the datasets together. The general practice when considering which reference dataset version to use is to progress hierarchically from the most recent to the least recent. Best practices relating to the temporal accuracy of reference datasets are listed in Best Practices 36.

Best Practices 36 – Reference dataset temporal accuracy

Policy Decision: When and how can and should historical reference datasets be used instead of temporally current versions?
Best Practice: In most cases, a hierarchical approach should be taken, from most recent to oldest. If the region of interest has undergone marked transformation (the construction or demolition of streets, renumbered buildings or renamed streets, or the merging or division of parcels) during the period between when the address was current and when the address is to be geocoded, the use of historical data should be considered.


13.4 CACHED DATA
One low-cost approach to producing point-based reference datasets is geocode caching, which stores the results of previously derived geocodes produced by an interpolation method. In situations for which the running time of a geocoder is a critical issue, this may also be an attractive option. The concept of geocode caching has also been termed empirical geocoding by Boscoe (2008, pp. 100). Using cached results instead of recomputing them every time can yield substantial performance gains when a lengthy or complex feature interpolation algorithm is used. There is no consensus about the economy of this approach; different registries may have different practices, and only a few registries currently make use of geocode caching. The most common case and strongest argument for caching is to store the results of interactive geocoding sessions so that improvements made to aspects of the geocoding process while working on a particular address can be re-leveraged in the future (e.g., creating better street centerline geometry or better address ranges).

Geocode caching essentially creates a snapshot of the current geocoder configuration (i.e., the state of the reference dataset and of the world as of the publication date, plus the feature-matching and interpolation algorithms that produce the geocodes). When the reference data and feature interpolation algorithms do not change frequently, geocode caching can be used. If, however, the resulting geocode may be different every time the geocoder is run (e.g., when any component of the geocoding process is dynamic or is intentionally changed or updated), using the cached data may produce outdated results.

There are potential dangers to using geocode caches in terms of temporal staleness, the phenomenon whereby previously geocoded results stored in a cache become outdated and no longer valid (i.e., low temporal accuracy), with validity determined on a per-registry basis because there is no consensus. Caching may also be moot if there is little chance of ever needing to re-process existing address data that have already been geocoded. As new geocoding algorithms are created, the cached results produced by older processes may prove less and less accurate, and at a certain point it may become apparent that these cached results should be discarded and a new version created and stored. In the data-caching literature, there are generally two choices: (1) associate a time to live (TTL) with each cached value upon creation, after which it is invalidated and removed; or (2) calculate a freshness value each time the entry is interrogated to determine its suitability, and remove it once the value passes a certain threshold (Bouzeghoub 2004). There are presently no set standards for determining either of these: the nature of the geocoding process, and of the ever-changing landscape, means numerous criteria must be accounted for in the first, and complex decay functions are possible for the second. General considerations relating to the assignment of TTL and the calculation of freshness are listed in Table 31.

114 November 10, 2008 D. W. Goldberg

Table 31 – TTL assignment and freshness calculation considerations for cached data

Consideration: TTL and freshness should depend on the source of the geocode.
  Example: GPS – indefinite TTL, high freshness; manual correction – indefinite TTL, high freshness; geocoded – time-varying TTL, medium freshness.

Consideration: TTL and freshness should be based on the match probability.
  Example: higher match score – longer TTL, higher freshness.

Consideration: TTL and freshness should be based on the likelihood of geographic change.
  Example: high-growth area – shorter TTL, lower freshness.

Consideration: TTL and freshness should depend on the update frequency of the reference data.
  Example: high frequency – shorter TTL, lower freshness.

Consideration: TTL and freshness should correlate with agreement between sources.
  Example: high agreement – longer TTL, high freshness.

Consideration: Freshness should correlate with time elapsed since geocode creation.
  Example: long elapsed time – lower freshness.
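The considerations in Table 31 can be sketched as a simple cache wrapper. Everything below is illustrative: the class, the per-source TTL values, the linear freshness decay, and the acceptability threshold are all invented for this sketch, and a real registry would tune each to its own data and policies.

```python
import time

# Hypothetical per-source TTLs in seconds, following Table 31: GPS and
# manually corrected geocodes effectively never expire, while interpolated
# ("geocoded") results get a finite, match-score-dependent lifetime.
TTL_BY_SOURCE = {
    "gps": None,              # indefinite TTL
    "manual": None,           # indefinite TTL
    "geocoded": 86400 * 365,  # one year, an illustrative baseline only
}

class GeocodeCache:
    """Cache of previously derived geocodes with TTL and freshness checks."""

    def __init__(self):
        # normalized address -> (geocode, source, match_score, created_at)
        self._entries = {}

    def put(self, address, geocode, source, match_score, now=None):
        now = time.time() if now is None else now
        self._entries[address] = (geocode, source, match_score, now)

    def _ttl(self, source, match_score):
        base = TTL_BY_SOURCE.get(source)
        if base is None:
            return None  # never expires
        # Higher match scores earn a longer TTL (Table 31, row 2).
        return base * match_score

    def freshness(self, address, now=None):
        """Return a 0..1 freshness value that decays linearly over the TTL."""
        now = time.time() if now is None else now
        _, source, score, created = self._entries[address]
        ttl = self._ttl(source, score)
        if ttl is None:
            return 1.0
        return max(0.0, 1.0 - (now - created) / ttl)

    def get(self, address, min_freshness=0.5, now=None):
        """Return a cached geocode, or None if absent or temporally stale."""
        if address not in self._entries:
            return None
        if self.freshness(address, now=now) < min_freshness:
            del self._entries[address]  # invalidate the stale entry
            return None
        return self._entries[address][0]
```

A stale entry is removed on lookup (the second data-caching choice described above); a TTL-only policy would instead check elapsed time against the TTL directly.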

In all cases where caching is used, a tradeoff exists between the acceptable levels of accuracy present in the old cached results and the cost of potentially having to recreate them. Best practices relating to geocode caching are listed in Best Practices 37.

13.5 COMPLETENESS

Whereas the accuracy of a reference dataset can be considered a measure of its precision (how accurate the reference features it contains are), the completeness of a reference dataset can be considered a measure of recall. In particular, a more complete reference dataset will contain more of the real-world geographic objects for its area of coverage than a less complete one, and using a more complete reference dataset yields better results from the geocoding process. As with accuracy, levels of completeness vary both between different reference datasets and within a single one. More in-depth discussions of precision and recall are provided in Section 14.2.1.

A distinction should be made between feature and attribute completeness. As recall measures, both refer to the amount of information maintained out of all possible information that could be maintained. The former is a measure of the number of features contained in the reference dataset in comparison to all possible features that exist in reality. The latter is a measure of the amount of information (number of attributes) contained per feature out of all information (possible attributes) that could possibly be used to describe it.


Best Practices 37 – Geocode caching

Policy Decision: Should geocode caching be used?
Best Practices:
• If addresses are to be geocoded more than once, the use of geocode caching should be considered.
• If geocoding speed is an issue, interpolation methods are too slow, and addresses are geocoded more than once, geocode caching should be used.
• Geocode results from interactive geocoding sessions should be cached.
• Metadata should describe all aspects of the geocoding process: the feature matched, the interpolation algorithm, and the reference dataset.

Policy Decision: When should a geocode cache be invalidated (e.g., when does temporal staleness take effect)?
Best Practices:
• If reference datasets or interpolation algorithms are changed, the geocode cache should be cleared.
• Temporal staleness (freshness and/or TTL evaluation) should be calculated both for an entire geocode cache and per geocode, every time a geocode is to be created.
• If a TTL has expired, the cache should be invalidated.
• If freshness has fallen below an acceptable threshold, the cache should be invalidated.
• If a cache is cleared (replaced), the original data should be archived.

There is no consensus as to how either of these completeness measures should be calculated or evaluated, because any measure would require a gold standard to compare against; as a result, there are very few cases in which either measure is reported with a reference data source. Instead, completeness measurements usually are expressed as comparisons against other datasets. For instance, it is typical to see one company’s product touted as “having the most building footprints per unit area” or “the greatest number of attributes,” describing feature completeness and attribute completeness, respectively. Using these metrics as anything other than informative comparisons among datasets should be avoided, because if the vendor actually had quantitative measures of completeness, they would surely be provided; their absence indicates that these values are not known. Some simple, albeit useful, quantitative measures that have been proposed are listed in Table 32.

Note that a small conceptual problem exists for the third row of the table. TIGER/Line files (United States Census Bureau 2008d) represent continuous features (address ranges), whereas USPS ZIP+4 (United States Postal Service 2008a) databases represent discrete features, but the street features themselves can be checked in terms of existence and address

range by making some modifications to the original structures (e.g., grouping all addresses in the USPS ZIP+4 per street to determine the ranges, and grouping by street name to determine street existence). Best practices relating to the completeness of the reference datasets are listed in Best Practices 38.

Table 32 – Simple completeness measures

Completeness Measures:
• True reference feature exists/does not exist in the reference dataset
• True original address exists/does not exist as an attribute of a feature in the reference dataset
• Compare one reference dataset to another (e.g., TIGER/Line files [United States Census Bureau 2008d] vs. USPS ZIP+4 [United States Postal Service 2008a])
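The comparison-based measures in Table 32 can be sketched as simple set operations. The street names below are invented, and a real comparison would first have to reconcile the two datasets' structures (e.g., grouping USPS ZIP+4 records per street), as noted above; the function names here are hypothetical.

```python
def feature_completeness(candidate, reference):
    """Fraction of reference features that are present in the candidate
    dataset (a recall-style feature completeness measure)."""
    candidate, reference = set(candidate), set(reference)
    return len(candidate & reference) / len(reference)

def attribute_completeness(feature_attrs, all_possible_attrs):
    """Fraction of possible attributes actually populated for one feature."""
    populated = {k for k, v in feature_attrs.items() if v is not None}
    return len(populated & set(all_possible_attrs)) / len(all_possible_attrs)

# Invented example: street names extracted from two hypothetical datasets.
tiger_streets = {"MAIN ST", "FIRST AVE", "ELM ST"}
zip4_streets = {"MAIN ST", "FIRST AVE", "ELM ST", "OAK DR"}

# Treating the second list as "truth", the first covers 3 of 4 streets.
print(feature_completeness(tiger_streets, zip4_streets))  # 0.75
```

Note that neither dataset is actually a gold standard, so the result is only an informative comparison between the two, exactly as cautioned above.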

Best Practices 38 – Reference dataset completeness problems

Policy Decision: When and how can and should the attribute completeness of the reference dataset be improved?
Best Practices:
• If the attribute completeness of the reference data is consistently leading to a high proportion of false positive or false negative feature-matching results, it should be improved, if the time and money for the task are available.
• Improvements can be made by filling in the missing aspatial attributes through joins with data from other sources (e.g., a street name file from the USPS), and by appending local-scale knowledge of alternate names using alias tables.

Policy Decision: When and how can and should the feature completeness of the reference dataset be improved?
Best Practices:
• If the feature completeness of a reference dataset is consistently leading to a high proportion of false negative feature-matching results, it should be improved, if the time and money for the task are available.
• Improvements can be made by intersecting with other reference datasets that contain the missing features (e.g., a local road layer being incorporated into a highway layer).




14. FEATURE-MATCHING QUALITY METRICS

This section describes the different types of possible matches and their resulting levels of accuracy, and develops alternative match rates.

14.1 MATCH TYPES

The result of the feature-matching algorithm represents a critical part of the quality of the resulting geocode. Many factors complicate the feature-matching process and result in different match types being achievable. In particular, the normalization and standardization processes are critical for preparing the input data. If these algorithms do a poor job of converting the input to a form and format consistent with that of the reference dataset, it will be very difficult, if not impossible, for the feature-matching algorithm to produce a successful result.

However, even when these processes are applied well and the input data and reference datasets both share a common format, there still are several potential pitfalls. These difficulties are exemplified by investigating the domain of possible outputs from the feature-matching process, listed in Table 33 along with the descriptions and causes of each. An input address can have no corresponding feature in the reference dataset (i.e., the “no match” case), or it can have one or more. These matches can be perfect, meaning that every attribute is exactly the same between the input address and the reference feature, or non-perfect, meaning that some of the attributes do not match between the two. Examples that would result in some of these are depicted in Figure 22. Note that in most cases, an ambiguous perfect match indicates either an error in the reference dataset (E-911 is working toward eliminating these), or incompletely defined input data matching multiple reference features.

Once a single feature (or multiple features) is successfully retrieved from the reference set by changing the SQL and re-querying if necessary (i.e., attribute relaxation), the feature-matching algorithm must determine the suitability of each of the features selected through the use of some measures.
The real power of a feature-matching algorithm therefore is twofold: (1) it first must be able to realize that no match has been returned, and then (2) subsequently and automatically alter and regenerate the SQL to attempt another search for matching features using a different set of criteria. Thus, one defining characteristic distinguishing different feature-matching algorithms is how this task of generating alternate SQL representations to query the reference data is performed. Another is the measures used to determine the suitability of the selected features.
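The relax-and-requery loop described above can be sketched against a toy reference table. The schema, the attribute values, and the relaxation order below are all invented for illustration; production geocoders use far more sophisticated criteria for deciding which attributes to relax and in what order.

```python
import sqlite3

# Toy reference dataset: one street table with a few aspatial attributes.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE streets
                (name TEXT, type TEXT, city TEXT, zip TEXT)""")
conn.executemany("INSERT INTO streets VALUES (?, ?, ?, ?)", [
    ("MAIN", "ST",  "LOS ANGELES", "90007"),
    ("MAIN", "AVE", "LOS ANGELES", "90012"),
])

# Attributes are dropped from the WHERE clause one at a time, in an
# invented order (least discriminating first).
RELAXATION_ORDER = ["zip", "type", "city"]

def match_with_relaxation(attrs):
    """Return (matching rows, list of relaxed attributes).

    Tries an exact query first; on no match, removes one attribute at a
    time from the WHERE clause and re-queries (attribute relaxation).
    """
    active = dict(attrs)
    relaxed = []
    for attempt in [None] + RELAXATION_ORDER:
        if attempt is not None:
            active.pop(attempt, None)
            relaxed.append(attempt)
        where = " AND ".join(f"{k} = ?" for k in active)
        sql = f"SELECT name, type, city, zip FROM streets WHERE {where}"
        rows = conn.execute(sql, list(active.values())).fetchall()
        if rows:
            return rows, relaxed
    return [], relaxed

# A wrong USPS ZIP Code prevents an exact match; relaxing it succeeds.
rows, relaxed = match_with_relaxation(
    {"name": "MAIN", "type": "ST", "city": "LOS ANGELES", "zip": "90011"})
```

Recording the `relaxed` list alongside the result is one simple way to keep the per-attribute decision trail that Section 14.4 argues for.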


Table 33 – Possible matching outcomes with descriptions and causes

Perfect match (code P)
  Description: A single feature in the reference dataset could be matched to the input datum, and both share every attribute.
  Cause: The combination of input attributes exactly matches those of a single reference feature.

Non-perfect match (code Np)
  Description: A single feature in the reference dataset could be matched to the input datum, and both share some but not all attributes.
  Cause: At least one, but not all, of the combinations of input attributes exactly match those of a single reference feature.

Ambiguous perfect match (code Ap)
  Description: Multiple features in the reference dataset could be matched to the input datum, and each shares every attribute.
  Cause: The combination of input attributes exactly matches those of multiple reference features.

Ambiguous non-perfect match (code Anp)
  Description: Multiple features in the reference dataset could be matched to the input datum, and each shares some but not all attributes.
  Cause: At least one, but not all, of the combinations of input attributes exactly matches those of multiple reference features.

No match (code N)
  Description: No features in the reference dataset could be matched to the input datum.
  Cause: The combination of input attributes is not found in the reference dataset.
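The outcome codes in Table 33 can be assigned mechanically once the candidate features and their agreement with the input are known. The representation below (one boolean per candidate, true when every attribute agrees with the input datum) is a simplification invented for this sketch.

```python
def classify_match(candidates_perfect):
    """Map feature-matching results to the Table 33 outcome codes.

    candidates_perfect: one bool per candidate reference feature,
    True if every attribute agrees with the input datum.
    """
    n = len(candidates_perfect)
    if n == 0:
        return "N"                                   # no match
    if n == 1:
        return "P" if candidates_perfect[0] else "Np"  # (non-)perfect
    return "Ap" if all(candidates_perfect) else "Anp"  # ambiguous
```

A policy like that in Best Practices 39 can then be applied directly, e.g., accepting only the "P" and "Np" outcomes and routing the rest to review.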

Figure 22 – Examples of different match types


Much like the address normalization process, there are both simplistic and complex ways to achieve this, and each has its particular strengths and weaknesses and is suitable under certain conditions. This process of feature matching is closely related to the computer science field of record linkage, and many fundamental research questions and concepts developed therein have been applied to the task of feature matching. Best practices related to feature match types are listed in Best Practices 39.

Best Practices 39 – Feature match types

Policy Decision: Which match types should be considered acceptable?
Best Practice: Perfect and non-perfect non-ambiguous matches should be considered acceptable.

Policy Decision: Which match types should be considered unacceptable?
Best Practice: Ambiguous matches should not be considered acceptable.

Policy Decision: What should be done with unacceptable matches?
Best Practice: Non-acceptable matches should be reviewed, corrected, and re-processed.

Policy Decision: What metadata should be maintained?
Best Practice: Metadata should describe the reason why an action was taken (e.g., the match type) and what action was taken.

Policy Decision: What actions can and should be taken to correct unacceptable matches?
Best Practice: At a minimum, manual review/correction and attribute relaxation should be attempted.

14.2 MEASURING GEOCODING MATCH SUCCESS RATES

Match rates can be used to describe the completeness of the reference data with regard to how much of the input data they contain, assuming that all input data are valid and should rightfully exist within them. Match rates also can be used to test the quality of the input data in the reverse case (i.e., when the reference data are assumed to be complete and unmatchable input data are assumed to be incorrect).

Under no circumstances should a high match rate be understood as equivalent to a high accuracy rate; the two terms mean fundamentally different things.

A geocoder resulting in a 100 percent match rate should not be considered accurate if all of the matches are to the city or county centroid level.

14.2.1 Precision and recall

Precision and recall metrics are often used to determine the quality of an information retrieval (IR) strategy. This measurement strategy considers two sets of data: (1) the set of data that should have correctly been selected and returned by the algorithm, and (2) the set of data that is actually selected and returned by the algorithm, the latter being where the problems arise (Raghavan et al. 1989). The set of data that was actually returned may contain some data that should not have been returned (i.e., incorrect data), and it may be missing some data that should have been returned. In typical IR parlance, recall is a measure that indicates how


much of the data that should have been obtained actually was obtained. Precision is a measure of the retrieved data’s correctness.

14.2.2 Simplistic match rates

In the geocoding literature, the related term match rate often is used to indicate the percentage of input data that were able to be assigned to a reference feature. Although this is related to the recall metric, the two are not exactly equivalent: the match rate, as typically defined in Equation 5, does not capture the notion of the number of records that should have been matched. The match rate usually is defined as the number of matched records (i.e., records from the input data that were successfully linked to a reference feature) divided by the total number of input records:

Match Rate = # Matched Records / # All Records

Equation 5 – Simplistic match rate

This version of the match rate calculation corresponds to Figure 23(a), in which the match rate would be the difference between the areas of records attempted and records matched.

14.2.3 More representative match rates

It may be of more interest to qualify the denominator of this match rate equation in some way to make it closer to a true recall measure, eliminating some of the false negatives. To do this, one needs to determine a more representative number of records with addresses that should have matched. For example, if a geocoding process is limited to using a local-scale reference dataset with limited geographic coverage, input data corresponding to areas outside of this coverage will not be matchable. If they are included in the match rate, they are essentially false negatives; they should simply have been excluded from the calculation instead. It might therefore be reasonable to define the match rate by subtracting these records with addresses that are out of the area from the total number of input records:

Match Rate = # Matched Records / (# All Records - # Records out of state, county, etc.)

Equation 6 – Advanced match rate

This match rate calculation corresponds to Figure 23(b), in which the match rate would be the difference between the areas of records matched and the records within the coverage area, not simply the records attempted, resulting in a more representative, higher match rate. Registries need to use caution here, because the attributes used to determine what should rightfully have been included in or excluded from geocodability within an area (e.g., county, USPS ZIP Code) are themselves subject to error.


Figure 23 – Match rate diagrams

14.2.4 A generalized match rate

This approach can be generalized even further. There are several categories of data that will not be possible to match with the feature-matching algorithm. For instance, data that are outside of the area of coverage of the reference dataset, as in the last example, will possess this property, as will input data in a format not supported by the geocoding process. For example, if the geocoding process does not support input in the form of named places, intersections, or relative directions, input in any one of these forms will never be able


to be successfully matched to a reference feature and will be considered “invalid” by this geocoder (i.e., the input data may be fine, but the reference data do not contain a match). Finally, a third category comprises data that are simply garbage and will never be matched to a reference feature because they do not describe a real location. This type of data most typically results from data entry errors, when the wrong data have been entered into the wrong field upon entry (e.g., a person’s birth date being entered as his or her address). One can take this representative class of invalid data, or data that are impossible to match (or impossible to match without additional research into the patient’s usual residence address at the time of diagnosis), into account in the determination of a match rate as follows:

Match Rate = # Matched Records / (# All Records - # Records impossible to match)

Equation 7 – Generalized match rate

This match rate calculation corresponds to Figure 23(c), in which the match rate is no longer based on the total number of records attempted; instead, it only includes records that should have been matchable, based on a set of criteria applied. In this case, the set of addresses that should have successfully matched is made up of the area resulting from the union of each of the areas. The match rate then is the difference between this area and the area of records that matched, resulting in an even more representative, higher match rate.

14.2.5 Non-match classification

The difficult part of obtaining a match rate using either of the two latter equations (Equation 6 or Equation 7) is classifying the reason why a match was not obtainable for input data that cannot be matched. If one were processing tens of thousands of records of input data in batch and 10 percent resulted in no matches, it might be too difficult and time-consuming to go through each one and assign an explanation.

Classifying input data into general categories such as valid or invalid input format should be fairly straightforward. This could be accomplished for address input data simply by modifying the address normalization algorithm to return a binary true/false along with its output, indicating whether or not it was able to normalize into a valid address. One could also use the lower-resolution attributes (e.g., USPS ZIP Code) to get a general geographic area to compare with the coverage of the reference dataset for classification as inside or outside the coverage area. Although not exactly precise, these two options could produce first-order estimates for the respective number of non-matches that fall into each category, and could be used to derive more representative values for match rates given the reference dataset constraints of a particular geocoding process.
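The three match rates (Equations 5, 6, and 7), together with the first-order non-match counts just described, can be sketched as follows. The record counts in the example are invented.

```python
def simplistic_match_rate(n_matched, n_all):
    """Equation 5: matched records over all input records."""
    return n_matched / n_all

def advanced_match_rate(n_matched, n_all, n_out_of_area):
    """Equation 6: records outside the coverage area are excluded
    from the denominator."""
    return n_matched / (n_all - n_out_of_area)

def generalized_match_rate(n_matched, n_all, n_impossible):
    """Equation 7: all records impossible to match are excluded
    (out of area, unsupported input format, or garbage)."""
    return n_matched / (n_all - n_impossible)

# Invented example: 1,000 input records, 800 matched; a USPS ZIP Code
# coverage check flags 50 as out of area, and the normalizer flags
# another 50 as garbage (so 100 are impossible to match in total).
print(round(simplistic_match_rate(800, 1000), 3))        # 0.8
print(round(advanced_match_rate(800, 1000, 50), 3))      # 0.842
print(round(generalized_match_rate(800, 1000, 100), 3))  # 0.889
```

As the section argues, each successive denominator refinement produces a more representative (and higher) rate for the same set of matches.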
14.3 ACCEPTABLE MATCH RATES

An acceptable match rate is a specific match rate value that a geocoding process must meet for the geocoded data to be considered valid for use in a research study. What constitutes an acceptable match rate is a complex subject and includes many factors, such as the type of feature matched to or the particular linkage criteria used at a registry. Further, it should be noted that the overall match rate derives from both the input data and the reference data, which together constrain the total value. Other than early reports by Ratcliffe (2004), an exhaustive search at the time of this writing found no other published work investigating this topic. There is a wealth of literature on both the selection bias resulting from

match rates as well as how these rates may effectively change between geographies (c.f., Oliver et al. 2005, and the references within), but a quantitative value for an acceptable match rate for cancer-related research has not been proposed. It is not possible to recommend using the percentages Ratcliffe (2004) defined, as they are derived from and meant to be applied to a different domain (crime instead of health), but it would be interesting to repeat his experiments in the health domain to determine whether health and crime have similar cutoffs. Further research into using the more advanced match rates just described would be useful.

What can be stated generally is that an acceptable match rate will vary by study, with the primary factor being the level of geographic aggregation taking place. Researchers will need to think carefully about whether the match rates they have achieved allow their geocoded data to be safely used for drawing valid conclusions. Each particular study will need to determine whether the qualities of its geocodable versus non-geocodable data may be indicative of bias in demographic or tumor characteristics, from which conclusions should be drawn about the suitability and representativeness of the data (Oliver et al. 2005).

14.4 MATCH RATE RESOLUTION

The discussion thus far has developed a measure of a holistic-level match rate, which is a match rate for the entire address as a single component. An alternative is an atomic-level match rate, which is a match rate associated with each individual attribute that together compose the address. This type of measure conveys far more information about the overall match rate because it defines it at a higher resolution (i.e., the individual attribute level as opposed to the whole-address level). Essentially, this extends the concept of match rate beyond an overall percentage for the dataset as a whole to the level of each geocoded result.
To achieve this type of match rate resolution, ultimately all that is required is documentation of the geocoding process. If each process applied, from normalization and standardization to attribute relaxation, recorded or reported the decisions made as it processed a particular input datum along with the result it produced, this per-feature match rate could be obtained and an evaluation of the types of address problems in one’s input records could be conducted. For instance, if probabilistic feature matching was performed, what was the uncertainty cutoff used, and what were the weights for each attribute that contributed to the composite weight? If deterministic feature matching was used, which attributes matched, and which ones were relaxed and to what extent? This type of per-feature match rate typically is not reported with the output geocode when using commercial geocoding software, as many of the details of the geocoding process are hidden “under the hood,” even though it is part of the feature-matching process; however, codes pertaining to the general match process are generally available. Best practices related to success rates (match rates) are listed in Best Practices 40.



Best Practices 40 – Success (match) rates

Policy Decision: Which metrics should be used to describe the success rate of feature-matching algorithms?
Best Practice: At a minimum, feature-matching success should be described in terms of match rates.

Policy Decision: How should match rates be calculated?
Best Practices:
• At a minimum, match rates should be computed using the simplistic match rate formula. If constraints permit, more advanced match rates should be calculated using the other equations (e.g., the advanced and generalized match rate formulas).
• Metadata should describe the type of match rate calculated and the variables used, along with how they were calculated.

Policy Decision: How should an advanced match rate be calculated?
Best Practices:
• First-order estimates for the number of input addresses outside the coverage area for the current set of reference datasets should be calculated using lower-resolution reference datasets (e.g., USPS ZIP Code reference files). This number should be subtracted from the set of possible matches before doing the match rate calculation.
• The metadata should describe the lower-resolution reference dataset used for the calculation.

Policy Decision: How should a generalized match rate be calculated?
Best Practice: If the normalization algorithm can output an indication of why it failed, this should be used for classification, and the resulting classification used to derive counts. This number should be subtracted from the set of possible matches before doing the match rate calculation.

Policy Decision: At what resolution and for what components can and should match rates be reported?
Best Practice: Match rates should be reported for all aspects of the geocoding process, at both the holistic and atomic levels.

Policy Decision: How can atomic-level match rates be calculated?
Best Practice: If the geocoding process is completely transparent, information about the choices made and the output of each component of the geocoding process can be measured and combined to calculate atomic-level match rates.
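The atomic-level match rates described in Section 14.4 can be as simple as tallying, per attribute, how often that attribute matched versus how often it had to be relaxed. The per-record log structure below is hypothetical.

```python
from collections import Counter

def atomic_match_rates(per_record_results):
    """Aggregate per-attribute match rates from per-record decision logs.

    per_record_results: list of dicts mapping attribute name ->
    True (matched) or False (relaxed or failed), one dict per geocode.
    """
    matched, total = Counter(), Counter()
    for result in per_record_results:
        for attr, ok in result.items():
            total[attr] += 1
            if ok:
                matched[attr] += 1
    return {attr: matched[attr] / total[attr] for attr in total}

# Invented decision logs for two geocoded records.
logs = [
    {"street_name": True, "street_type": True,  "zip": False},
    {"street_name": True, "street_type": False, "zip": False},
]
rates = atomic_match_rates(logs)
# street_name always matched; zip never did, pointing at a systematic
# problem with the input addresses' USPS ZIP Codes.
```

This kind of breakdown is exactly what a transparent geocoding process enables, and what is typically hidden "under the hood" in commercial software.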


15. NAACCR GIS COORDINATE QUALITY CODES

This section introduces the NAACCR GIS Coordinate Quality Codes and discusses their strengths and weaknesses.

15.1 NAACCR GIS COORDINATE QUALITY CODES DEFINED

For geocoding output data to be useful to consumers, metadata describing the quality associated with them are needed. To this end, NAACCR has developed a set of GIS Coordinate Quality Codes (Hofferkamp and Havener 2008, p. 162) that indicate at a high level the type of data represented by a geocode. It is crucial that these quality codes be associated with every geocode produced at any time by any registry.

Without such baseline codes associated with geocodes, researchers will have no way to judge how good the data they run their studies on are (which will depend on the study size, resolution, etc.) short of follow-up contact with the data provider or performing the geocoding themselves, and therefore will have no idea how representative their results are.

Abbreviated versions of these codes are listed in Table 34, and correspond roughly to the hierarchy presented earlier. For exact codes and definitions, refer to Data Item #366 of Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008, p. 162).

Table 34 – NAACCR recommended GIS Coordinate Quality Codes (paraphrased)

Code  Description
 1    GPS
 2    Parcel centroid
 3    Match to a complete street address
 4    Street intersection
 5    Mid-point on street segment
 6    USPS ZIP+4 centroid
 7    USPS ZIP+2 centroid
 8    Assigned manually without data linkage
 9    5-digit USPS ZIP Code centroid
10    USPS ZIP Code centroid of Post Office Box or Rural Route
11    City centroid
12    County centroid
98    Coordinate quality is unknown
99    Geocoding was attempted but unable or unwilling to assign coordinates

Likewise, researchers should refrain from using any data that do not have accuracy metrics like the codes in the previous table, and they should insist that these be reported in


geocoded data they obtain. It is up to the researcher to decide whether or not to use geocodes with varying degrees of reported quality, but it should be clear that incorporating data without quality metrics can and should lower the confidence that anyone can have in the results produced. Further, the scientific community at large should require that research undergoing peer review for possible scientific publication indicate the lineage and accuracy metrics for the data used as the basis for the studies presented, or at least note their absence as a limitation of the study.

There are three points to note about the present NAACCR GIS Coordinate Quality Codes and other similar schemes for ranking geocodes (e.g., SEER census tract certainty [Goldberg et al. 2008c]). The first is that code 98 (coordinate quality unknown) is effectively the same as having no coordinate quality at all. Utilization of this code should therefore be avoided as much as possible, because it essentially endorses producing geocodes without knowing anything about coordinate quality.

Second, the codes listed in this table are exactly what they indicate they are: qualitative codes describing characteristics of the geocodes. No quantitative values can be derived from them, and no calculations can be based upon them to determine such things as the direction or magnitude of the true error associated with a geocode. Thus, they serve little function other than to group geocodes into classes that are (rightfully or wrongfully) used to determine their suitability for a particular purpose or research study.

Finally, the current standard states that “Codes are hierarchical, with lower numbers having priority” (Hofferkamp and Havener 2008, p. 162).
When taken literally, the standard only discusses the priority that should be given to one geocode over another, not the actual accuracy of geocodes; nonetheless, it has ramifications for the geocoding process, because geocoding developers may use it to guide their work. Without specifically stating it, the table can be seen to imply a hierarchical accuracy scheme, with lower values (e.g., 1) indicating a geocode of higher accuracy and higher values (e.g., 12) indicating a geocode of lower accuracy.

Unfortunately, this may not be correct in all cases, and geocoding software developers and users need to be aware that the choice of which is the “best” geocode to choose/output should not be determined from the ranks in this table alone. Currently, however, most commercial geocoders do in fact use hierarchies such as this in the rules that determine the order of geocodes to attempt, which may not be as good as human intervention and is definitely incorrect in some cases. For instance, of the approximately 900,000 street segments in California that have both a ZCTA and place designation in the TIGER/Line files (where both the left and right side values are the same for the ZCTA and place) (United States Census Bureau 2008d), approximately 300,000 have corresponding ZCTA areas that are larger than the corresponding place areas for the same segment. Recall that matching to a feature with a smaller area and calculating its centroid is more likely to result in a geocode with greater accuracy. Taken together, it is clear that in the cases where a postal address fails to match and a matching algorithm relaxes to try the next feature type in the implied hierarchy, one-third of the time choosing the USPS ZIP Code is the wrong choice (ignoring the fact that ZCTA and USPS ZIP Codes are not the same; see Section 5.1.4 for details). Goldberg et al. (2008c) can be consulted for further discussion of this topic.
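The ZCTA-versus-place example above suggests choosing a fallback feature by expected positional accuracy (smaller polygon area) rather than by fixed code rank. A minimal sketch of that idea, with invented feature kinds and area figures:

```python
def best_fallback(candidates):
    """Pick the candidate feature whose polygon area is smallest, rather
    than the one ranked highest by a fixed code hierarchy.

    candidates: list of (feature_kind, area_sq_km) tuples for the
    geographic features a failed postal address could fall back to.
    """
    return min(candidates, key=lambda c: c[1])

# Invented areas: here the place polygon is smaller than the ZCTA, so
# its centroid is the better fallback even though a fixed hierarchy
# would try the USPS ZIP Code first.
candidates = [("usps_zip_centroid", 120.0), ("place_centroid", 35.0)]
print(best_fallback(candidates))  # ('place_centroid', 35.0)
```

A production implementation would compute areas from the actual reference geometries rather than take them as given, and would still need tie-breaking rules when areas are unavailable.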
It should be clear that although GIS coordinate quality codes such as those advocated by NAACCR are good first steps toward geocoding accountability, there is still much work to be done before they truly represent quantitative values about the geocodes they describe. Abe and Stinchcomb (2008, p. 124) clearly articulate the need for “geocoding

128 November 10, 2008 D. W. Goldberg software [to] automatically record a quantitative estimate of the positional accuracy of each geocode based on the size and spatial resolution of the matched data source, [which] could be used to provide a positional ‘confidence interval’ to guide the selection of geocoded records for individual spatial analysis research projects.” Best practices related to GIS Coor- dinate Quality Codes are listed in Best Practices 41.

Best Practices 41 – GIS Coordinate Quality Codes

Policy Decision: When and which GIS coordinate quality codes should be used?
Best Practice: At a minimum, the NAACCR GIS Coordinate Quality Codes specified in Standards for Cancer Registries: Data Standards and Data Dictionary (Hofferkamp and Havener 2008, p. 162) should always be associated with any geocoded output. Geocode qualities of less than full street address (code 3) should be candidates for manual review.

Policy Decision: When and how can and should NAACCR GIS Coordinate Quality Codes be assigned?
Best Practice: NAACCR GIS Coordinate Quality Codes should always be assigned in the same manner, based on the type of reference feature matched and the type of feature interpolation performed.

Policy Decision: What other metadata can and should be reported?
Best Practice: If possible, metadata about every decision made by the geocoding process should be reported along with the results (and stored outside of the present NAACCR record layout).

Policy Decision: Should any geocodes without NAACCR GIS Coordinate Quality Codes be used for research?
Best Practice: Ideally, any geocodes without NAACCR GIS Coordinate Quality Codes should not be used for research. If geocodes without NAACCR GIS Coordinate Quality Codes must be used, this should be stated as a limitation of the study.


Part 4: Common Geocoding Problems

Throughout this document, potential problems regarding the geocoding process have been discussed as each component has been introduced. This part of the document will list specific problems and pitfalls that are commonly encountered, and provide advice on the best and recommended ways to overcome them. In all cases, the action(s) taken should be documented in metadata that accompany the resulting geocode, and the original data should be maintained for historical lineage.


16. QUALITY ASSURANCE/QUALITY CONTROL

This section provides insight into possible methods for overcoming problems that may be encountered in the geocoding process.

16.1 FAILURES AND QUALITIES

As discussed throughout this document, an address may fail to geocode to an acceptable level of accuracy (including not geocoding at all) for any number of reasons, including errors within the address itself, errors in the reference dataset, and/or the uncertainty of a particular interpolation algorithm. In Table 35, classes of problems from the previous sections are listed along with example cases or reasons why they would have occurred for an input whose true address is "3620 S. Vermont Ave, Los Angeles, CA 90089." These classifications will be used in the following sections to enumerate the possible options and describe the recommended practice for each type of case. Note that each registry may have its own regulations that determine the protocol of action regarding how certain classes of problems are handled, so some of the recommended solutions may not be applicable universally. In addition to these processing errors, there are also acceptable "quality" levels that may be required at a registry. The current reporting standards to which vendors are held are found within the NAACCR GIS Coordinate Quality Codes (Hofferkamp and Havener 2008, p. 162) as listed in Table 34. Although the shortcomings of these codes have been discussed in Section 15.1, they will be used to guide the recommended decisions and practices. The items in these tables are by no means exhaustive; registries may face many more problems than are listed. For such cases, the remainder of this section provides the details as to why a particular option is recommended, with the hope that similar logic can be used to determine the appropriate action in other circumstances.


Table 35 – Classes of geocoding failures with examples for true address 3620 S. Vermont Ave, Los Angeles, CA 90089

1. Not geocoded – Failed to geocode because the input data are incorrect.
   Example: 3620 S Verment St, Los Angeles, CA 90089
2. Not geocoded – Failed to geocode because the input data are incomplete.
   Example: 3620 Vermont St, Los Angeles, CA 90089
3. Not geocoded – Failed to geocode because the reference data are incorrect.
   Example: Address range for the 3600-3700 segment in the reference data is listed as 3650-3700
4. Not geocoded – Failed to geocode because the reference data are incomplete.
   Example: Street segment does not exist in the reference data
5. Not geocoded – Failed to geocode because the reference data are temporally incompatible.
   Example: Street segment name has not been updated in the reference data
6. Not geocoded – Failed to geocode because of a combination of one or more of 1-5.
   Example: 3620 Vermont St, Los Angeles, CA 90089, where the reference data have not been updated to include the 3600-3700 address range for the segment
7. Geocoded – Geocoded to incorrect location because the input data are incorrect.
   Example: 3620 S Verment St, Los Angeles, CA 90089 was (incorrectly) relaxed and matched to 3620 Ferment St, Los Angeles, CA 90089
8. Geocoded – Geocoded to incorrect location because the input data are incomplete.
   Example: 3620 Vermont St, Los Angeles, CA 90089 was arbitrarily (incorrectly) assigned to 3620 N Vermont St, Los Angeles, CA 90089
9. Geocoded – Geocoded to incorrect location because the reference data are incorrect.
   Example: The address range for 3600-3700 is reversed to 3700-3600
10. Geocoded – Geocoded to incorrect location because the reference data are incomplete.
    Example: Street segment geometry is a generalized straight line when the real street is extremely curvy
11. Geocoded – Geocoded to incorrect location because of interpolation error.
    Example: Interpolation (incorrectly) assumes equal distribution of properties along the street segment
12. Geocoded – Geocoded to incorrect location because of dropback error.
    Example: Dropback placement (incorrectly) assumes a constant distance and direction
13. Geocoded – Geocoded to incorrect location because of a combination of one or more of 7-12.
    Example: The address range for 3600-3700 is reversed to 3700-3600, and a dropback of length 0 is used
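The interpolation and dropback errors described in classes 11 and 12 come from a standard calculation that is easy to sketch. The sketch below places an address number along a straight segment by assuming an even distribution over the address range, then offsets the point perpendicular to the street by a fixed dropback distance; both assumptions are exactly the ones those failure classes flag as error sources. All coordinates and parameter values are hypothetical:

```python
def interpolate_with_dropback(start, end, low, high, number, dropback=0.0):
    """Place an address on a straight street segment by linear interpolation.

    Assumes addresses are evenly distributed between the range endpoints
    (the assumption behind interpolation error), then offsets the point
    perpendicular to the segment by a constant dropback distance (the
    assumption behind dropback error).
    """
    t = (number - low) / (high - low)          # fraction along the address range
    x = start[0] + t * (end[0] - start[0])
    y = start[1] + t * (end[1] - start[1])
    # Unit vector perpendicular to the segment direction.
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = (dx * dx + dy * dy) ** 0.5
    px, py = -dy / length, dx / length
    return (x + dropback * px, y + dropback * py)

# 3620 within the 3600-3700 range on a hypothetical east-west segment.
point = interpolate_with_dropback((0, 0), (100, 0), 3600, 3700, 3620, dropback=5.0)
```

Because real parcels are rarely evenly spaced and real setbacks are rarely constant, both steps of this calculation contribute positional error even when the correct segment is matched.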

Table 36 – Quality decisions with examples and rationale

Decision: When only a USPS PO box is available, yet a USPS ZIP+4 is correct, should the geocoded address be based on the USPS ZIP+4 centroid or the USPS ZIP+5 centroid?
Practice: The address should be geocoded to the USPS ZIP+4.
Rationale: The USPS ZIP+5 will be based on the USPS PO box address, which is less accurate than the USPS ZIP+4 based on the address.

Decision: When only an intersection is available, should the centroid of the intersection or the centroid of one of the properties on the corners be used?
Practice: The centroid of one of the corner properties should be used.
Rationale: This increases the likelihood that the geocode is on the correct location from 0 (the intersection centroid will never be correct) to 1/number of corners.

Decision: If the location of the address is known, should the geocode be manually moved to it (e.g., manually dragged using a map interface)?
Practice: The geocode should be moved if the location is known.
Rationale: A known location should be used over a calculated one.

Decision: If the location of the building for an address is known, should the geocode be manually moved to its centroid?
Practice: The geocode should be moved if the building location is known.
Rationale: A known location should be used over a calculated one.

Decision: If only a named place is available as an address, should research be performed to determine an address, or should the next lower resolution attribute be used (e.g., city name)?
Practice: Research for the address of a named place should be attempted before moving to the next lower resolution attribute.
Rationale: The address information may be trivially available, and it will dramatically improve the resulting geocode.

Decision: If the geocode is less accurate than USPS ZIP Code centroid (GIS Coordinate Quality Code 10), should it be reviewed for manual correction?
Practice: Geocodes with accuracy less than GIS Coordinate Quality Code 10 should be reviewed for manual correction.
Rationale: After USPS ZIP Code level certainty, the appropriateness of using a geocode in all but large area aggregation studies diminishes rapidly.

Decision: Should manual steps be taken in order to get a geocode for every record?
Practice: Manual processing should be attempted to get a geocode for every record.
Rationale: Patients should not be excluded from research studies because their address was not able to be geocoded.

Table 36 – Quality decisions with examples and rationale (continued)

Decision: If a street segment can be matched, but the address cannot, should the center point of the segment or the centroid of the minimum bounding rectangle (MBR) encompassing the segment be used?
Practice: The centroid of the MBR should be used.
Rationale: In the case where a street is straight, the centroid of the MBR would be the center point of the street. In the case of a curvy street, using the centroid minimizes the possible error from any other point on the street.

Decision: If two connected street segments are ambiguously matched, should their intersection point or the centroid of the MBR encompassing them be used?
Practice: The centroid of the MBR should be used.
Rationale: In the case where the two streets are straight, the centroid of their MBR would be the intersection point between them (assuming their lengths are similar and the angle between them is 180 degrees). In the case of two curvy streets, the angle between them being sharp, or the lengths being dramatically different, using the centroid minimizes the possible error from any other point on the two streets.

Decision: If two disconnected street segments are ambiguously matched, should the centroid of the MBR encompassing them be used?
Practice: The centroid of the MBR should be used.
Rationale: The centroid of their MBR minimizes the possible error from any other point on the two streets.

Decision: If an address geocodes differently now than it has in the past, should all records with that geocode be updated?
Practice: All records should be updated to the new geocode if it is more accurate.
Rationale: Research studies should use the most accurate geocode available for a record.
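The MBR centroid recommended in the street-segment decisions above is simple to compute: collect the segment vertices, take the bounding box, and return its center. The segment coordinates in this sketch are hypothetical:

```python
def mbr_centroid(segments):
    """Centroid of the minimum bounding rectangle of one or more segments.

    Each segment is a list of (x, y) vertices. The MBR centroid bounds
    the worst-case distance to any point on the segments, which is why
    it is preferred over a segment midpoint or an intersection point
    when the match is ambiguous.
    """
    xs = [x for seg in segments for x, _ in seg]
    ys = [y for seg in segments for _, y in seg]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)

# Two connected, straight, equal-length segments meeting at (50, 0):
a = [(0, 0), (50, 0)]
b = [(50, 0), (100, 0)]
center = mbr_centroid([a, b])
```

For this straight, equal-length pair the MBR centroid lands exactly on the shared intersection point, matching the rationale in the table; for curvy or unequal segments it deviates from the intersection in the direction that reduces worst-case error.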


17. ADDRESS DATA PROBLEMS

This section introduces various types of problems at registries that occur with address data (e.g., dxAddress, dxCity, dxZIP, dxState), including lifecycle and formatting problems.

17.1 ADDRESS DATA PROBLEMS DEFINED

Details regarding the exact issues related to a selected set of representative postal addresses are presented next to illustrate the ambiguities that are introduced as one iteratively removes attributes. The best-possible-case scenario is presented first. Best practices relating to the management of common addressing problems are listed in Best Practices 42.

Best Practices 42 – Common address problem management

Policy Decision: What types of lists of common input address problems and solutions should be maintained?
Best Practice: Lists of problems that are both common (occur more than once) and uncommon, with recommended solutions, should be maintained and consulted when problems occur. Examples of common problems include:
• 15% error in dxCounty

17.2 THE GOLD STANDARD OF POSTAL ADDRESSES

The following address example represents the gold standard in postal address data. It contains valid information in each of the possible attribute fields and provides enough information to produce a geocode down to the sub-parcel unit or the floor level.

“3620 ½ South Vermont Avenue East, Unit 444, Los Angeles, CA, 90089-0255”

In the geographic scale progression used during the feature-matching algorithm, a search for this address is first confined by a state, then by a city, then by a detailed USPS ZIP Code to limit the number of possible candidate features to within an area. Next, street name ambiguity is removed by the prefix and suffix directionals associated with the name, "South" and "East," respectively, as well as the street type indication, "Avenue." Parcel identification then becomes attainable through the use of the street number, "3620," assuming that a parcel reference dataset exists and is accessible to the feature-matching algorithm. Next, a 3-D geocode can finally be produced from the sub-parcel identification by combining the unit indicators, "½" and "Unit 444," to determine the floor and the unit on the floor, assuming that this is an apartment building and that a 3-D building model is available to the feature-matching algorithm. Note that both "½" and "444" can mean different things in different localities (e.g., they can both refer to subdivided parcels, subdivisions within a parcel, or even lots in a trailer park).

This example illustrates the best-possible-case scenario in terms of postal address specification and reference dataset availability, and it is, for most registries, rarely encountered. This is because reference datasets of this quality do not exist for many large regions, details such as the floor plan within a building are seldom needed, and input data are hardly ever specified this completely. It often is assumed that utilization of the USPS ZIP+4 database will provide the gold standard reference dataset, but it actually is only the most up-to-date source for address validation alone and must be used in conjunction with other sources to obtain the spatial aspect of an output geocode, which may be subject to some error. The practice of transforming an incompletely described address into a gold standard (completely described) address is performed by most commercial geocoders, as evidenced by the full attributes of the matched feature generally included with the geocode result. Best practices relating to gold standard addresses are listed in Best Practices 43.
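The scale progression just described (state, then city, then USPS ZIP Code, then street attributes) amounts to a cascade of filters over candidate reference features. A sketch of that narrowing is below; the feature records and field names are illustrative, not a real reference dataset schema:

```python
def narrow_candidates(features, query):
    """Successively restrict candidate features by coarse-to-fine attributes.

    Filters are applied in geographic-scale order. A filter that would
    eliminate every remaining candidate is skipped, loosely mimicking
    the attribute relaxation performed by real feature-matching
    algorithms when part of the input is wrong or missing.
    """
    for field in ("state", "city", "zip", "name", "type", "predir", "postdir"):
        if field not in query:
            continue  # attribute absent from the input address
        kept = [f for f in features if f.get(field) == query[field]]
        if kept:  # only commit the filter if something survives
            features = kept
    return features

reference = [
    {"state": "CA", "city": "Los Angeles", "zip": "90089",
     "name": "Vermont", "type": "Ave", "predir": "S"},
    {"state": "CA", "city": "Los Angeles", "zip": "90044",
     "name": "Vermont", "type": "Ave", "predir": "S"},
]
hits = narrow_candidates(reference, {"state": "CA", "city": "Los Angeles",
                                     "zip": "90089", "name": "Vermont"})
```

With the full gold-standard attribute set, each filter strictly shrinks the candidate pool; the ambiguity problems discussed in the next sections arise precisely when some of these attributes are missing and the cascade stops short of a single feature.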

Best Practices 43 – Creating gold standard addresses

Policy Decision: Should non-gold standard addresses have information added or removed to make them "gold standard"?
Best Practice: In the case where legitimate attributes of an address are missing and can be non-ambiguously identified, they should be added to the address. Metadata should include:
• Which attributes were added
• Which sources were used

17.3 ATTRIBUTE COMPLETENESS

The following depiction of standard address data is far more commonly encountered than the gold standard address:

“3620 Vermont Avenue, Los Angeles, CA, 90089”

Here, the street directionals, sub-parcel, and additional USPS ZIP Code components of the address have been removed. A feature-matching algorithm processing this case could again fairly quickly limit its search for matching reference features to within the USPS ZIP Code as in the last example, but from that point, problems may arise due to address ambiguity, the case when a single input address can match to more than one reference feature, usually indicative of an incompletely described input address. This can occur at multiple levels of geographic resolution for numerous reasons.

This last address shows the case of street segment ambiguity, where multiple street segments all could be chosen as the reference feature for interpolation based on the information available in the input address. First, multiple streets within the same USPS ZIP Code can, and routinely do, have the same name, differing only in the directional information associated with them indicating which side of a city they are on. Further, the address range information commonly associated with street reference features that is used to distinguish them, which will be covered in more detail later, often is repeated for these streets (e.g., 3600-3700 South Vermont, 3600-3700 North Vermont, and 3600-3700 Vermont). Thus, the feature-matching algorithm may be presented with multiple options capable of satisfying the input address.

Moving to a finer scale, street address ambiguity is the case when a single input address can match to more than one reference address on a single street segment, as in the case where a correct street segment can unambiguously be determined, but a specific location along the street cannot because the address number is missing:

“South Vermont Avenue East, Los Angeles, CA, 90089”

At a still finer scale, sub-parcel address ambiguity is the case when a single input address can match to more than one reference feature contained within the same parcel of land. This problem often arises for large complexes of buildings such as Co-op City in the Bronx, NY, or, as in the following example, the Cardinal Gardens residence buildings on the USC campus, all sharing the same postal street address:

“3131 S. McClintock Avenue, Los Angeles, CA, 90007”

In these ambiguous cases, most feature-matching algorithms alone do not contain enough knowledge to be able to pick the correct one. A detailed analysis of the different methods for dealing with these cases is presented in Section 18.

17.4 ATTRIBUTE CORRECTNESS

"831 North Nash Street East, Los Angeles, CA, 90245"

This case exemplifies the beginning of a "slippery slope": the correctness of address attributes. This example lists the USPS ZIP Code "90245" as being within the city "Los Angeles." In this particular case, the association is incorrect. The city "Los Angeles" does not contain the USPS ZIP Code "90245," which may at first be considered a typographical error in the USPS ZIP Code. However, the USPS ZIP Code is in reality correct, but it is part of an independent city, "El Segundo," which is within Los Angeles County. Therefore, one of these attributes is indeed wrong and should be ignored and not considered during the feature selection process, or better yet, corrected and replaced with the appropriate value. There are many reasons why these types of errors can and do occur. For instance, people sometimes refer to the city or locality in which they live by the name of their neighborhood, instead of the city's official political name or their post office name. Because neighborhood names are often only locally known, they frequently are not included in national-scale reference datasets, and therefore are not applicable and can appear to be incorrect. In Los Angeles, one obvious example is "Korea Town," an area several miles in size slightly southwest of downtown LA that most residents of the city would recognize by name immediately, but that would not be found as an official name in the TIGER/Line files. Also, the reverse is possible, as in the previous El Segundo address example. People may mistakenly use the name "Los Angeles" instead of the valid city name "El Segundo," because they lack the local knowledge and assume that because the location is part of the "Los Angeles Metropolitan Area," "Los Angeles" is the correct name to use.

This disconnect between local-level knowledge possessed by the people creating the data (e.g., the patient describing it or the hospital staff recording it) and the non-local-level knowledge possessed by the sources creating the reference datasets presents a persistent difficulty in the geocoding process.


Similarly, USPS ZIP Codes and ZCTAs are maintained by separate organizations that do not necessarily share all updates with each other, resulting in the possibility that the data may not be consistent with each other. As a result, it is often the case that the input address data refer to the USPS ZIP Code, while the reference data source may be using the ZCTA (e.g., in the TIGER/Line files). Finally, USPS ZIP Code routes have a dynamic nature, changing over time for the purpose of maintaining efficient mail delivery; therefore, the temporal accuracy of the reference data may be an issue. USPS ZIP Codes may be added, discontinued, merged, or split, and the boundaries of the geographic regions they are assumed to represent may no longer be valid. Thus, older address data entered as valid in the past may no longer have the correct (i.e., current) USPS ZIP Code. Although these changes generally can be considered rare, they may have a large impact on research studies in particular regions. Best practices relating to input data correctness are listed in Best Practices 44.

Best Practices 44 – Input data correctness

Policy Decision: Should incorrect portions of address data be corrected?
Best Practice: If information is available to deduce the correct attributes, they should be chosen and associated with the input address. Metadata should include:
• The information used in the selection
• The attributes corrected
• The original values

17.5 ADDRESS LIFECYCLE PROBLEMS

The temporal accuracy of address data further depends on what stage in the address lifecycle both the input address and the reference data are at. New addresses take time to get into reference datasets after they are created, resulting in false-negative matches from the feature-matching algorithm. Likewise, addresses linger in reference datasets after they have been destroyed, resulting in false positives. For new construction in many areas, addresses are assigned by county/municipal addressing staff after a developer has received permission to develop the lots. How and when the phone companies and the USPS are notified of the new address thereafter depends on the developer, staffing issues, and other circumstances. Thus, an address may not appear in reference data for some time although it is already being reported at the diagnosing facility. Similarly, upon destruction, an address may still appear to be valid within a reference dataset for some time when it is in fact invalid. Also, just because an address is not in the reference dataset today does not mean that it was invalid in the past (e.g., during the time period when the address was reported). These issues need to be considered when dealing with address data whose lifecycle status could be in question. Also, the length of time an individual was at an address (i.e., tenure of address) should be considered in research projects. Best practices related to address lifecycle problems are listed in Best Practices 45.


Best Practices 45 – Address lifecycle problems

Policy Decision: When and how can address lifecycle problems be accommodated in the geocoding process?
Best Practice: Address lifecycle problems can be overcome by obtaining the most recent address reference data for the region as soon as it becomes available, and by maintaining historical versions once new ones are obtained.

Policy Decision: When and how should historical reference datasets be used?
Best Practice: The use of historical reference data may provide higher quality geocodes in the cases of:
• Historical addresses where changes have been made to the streets or numbering
• Diagnosis dates that may approximate the date the diagnosis address was in existence
• If available, tenure of address should be taken into consideration during research projects

17.6 ADDRESS CONTENT PROBLEMS

In many cases, the content of the address used for input data will have errors, including addresses with missing, incorrect, or extra information. For all of these cases there are two options, correcting the error or leaving it as is, and choosing between them depends upon the certainty obtainable for the attributes in question, which can be determined by inspecting both the other attributes and the reference dataset. It should be noted that in some cases this extra information may be useful. For example, "101 Main Street Apt 5" might be either "N Main St" or "S Main St," but perhaps only one is an apartment building. Best practices related to address content problems are listed in Best Practices 46.


Best Practices 46 – Address content problems

Policy Decision: What can and should be done with addresses that are missing attribute information?
Best Practice: If a correct reference feature can be unambiguously identified in a reference dataset from the amount of information available, the missing information from the reference feature should be amended to the original address and denoted as such in the metadata record to distinguish it as assumed data. If a reference feature cannot be unambiguously identified, the missing data should remain absent.

Policy Decision: What can and should be done with addresses that have incorrect attribute information?
Best Practice: If the information that is wrong is obviously the effect of an easily correctable data entry error (e.g., data placed into the wrong field), it should be corrected and the correction indicated in the metadata. This action should only be taken if it can be proven, through the identification of an unambiguous reference feature corresponding to the corrected data, that this is the only possible explanation for the incorrect data. If it can be proven that there is more than one reference feature that could correspond to the corrected data, or there are multiple equally likely options for correcting the data, it should be left incorrect.

Policy Decision: What can and should be done with addresses that have extra attribute information?
Best Practice: If the extra information is clearly not an address attribute and/or is the result of data entry error, it can be removed, and this must be indicated in the metadata. It must be proven, through the use of the reference dataset, that this is the only possible reason for declaring the data extraneous before the removal can be made. Extraneous information such as unit, floor, building name, etc., should be moved into the Supplemental Field (NAACCR Item #2335) so that it can be retained for possible utilization at a later time. If there are equally probable options as to why this information was included, it should be retained.

Policy Decision: What is the best way to correct address errors?
Best Practice: In the ideal case, addresses should be validated as they are entered at the hospital using, at a minimum, the USPS ZIP+4 database.


17.7 ADDRESS FORMATTING PROBLEMS

Incorrectly formatted addresses and addresses with non-standard abbreviations should be handled by the address normalization and standardization processes. If not, human intervention may normalize and standardize them. Best practices related to address formatting are listed in Best Practices 47.
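Standardization of the kind just described can be sketched as token-level substitution against a lookup of USPS-style standard abbreviations. The table below is a tiny illustrative subset, not the full USPS Publication 28 list, and the tokenization is deliberately naive:

```python
# Small, illustrative subset of USPS-style standard abbreviations.
STANDARD = {
    "AVENUE": "AVE", "AV": "AVE", "BOULEVARD": "BLVD", "STREET": "ST",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}

def standardize(address):
    """Uppercase, strip basic punctuation, and map tokens to standard forms.

    Tokens not found in the lookup pass through unchanged, so house
    numbers, names, and already-standard abbreviations are preserved.
    """
    tokens = address.upper().replace(",", "").replace(".", "").split()
    return " ".join(STANDARD.get(t, t) for t in tokens)

out = standardize("3620 South Vermont Avenue, Los Angeles")
```

A production standardizer would first normalize the address into named components (number, predirectional, name, type, and so on) before substituting, so that a street literally named "Avenue Street" is not mangled; this sketch shows only the substitution step.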

Best Practices 47 – Address formatting problems

Policy Decision: What can and should be done with address data that are incorrectly formatted?
Best Practice: If the address is formatted in a known format, the address normalization process could be applied to try to identify the components of the address and subsequently reformat it into a more standard format, which should be noted in the metadata. If the format of the original data is unrecognizable or the address normalization fails, it should be left in its original format.

Policy Decision: What can and should be done with address data that include non-standard abbreviations?
Best Practice: The address normalization and standardization components of the geocoding process should be applied to correct the data, and the corrections should be noted in the metadata. If these processes fail, the data should be left in their original format.

Policy Decision: What should be done with extraneous address data?
Best Practice: Any extra information describing the location or address should be moved into the Supplemental Field (NAACCR Item #2335) for retention in the case that it becomes useful in the future.

17.8 RESIDENCE TYPE AND HISTORY PROBLEMS

Not knowing the type or tenure of address data can introduce uncertainty into the resulting geocode that is not captured merely by a quality code. This shortcoming usually is listed as a limitation of a study and is indicative of a larger public health data issue: these data are not collected during the primary data collection, after which point they generally are difficult to obtain. The missing information relates to items such as the tenure of residence, whether it is a home or work address, whether it is a seasonal address, and whether the address is really representative of the patient's true location if they move frequently or spend a lot of time traveling or on the road. As such, it is recommended that a tenure-of-residence attribute (i.e., length of time at the address) also be associated with an address so that researchers will have a basic understanding of how well the address really represents the location of a patient. This fits with the current trend of opinion in the registry community (e.g., Abe and Stinchcomb 2008). The collection of historical addresses may not be practical for all addresses collected, but it could certainly be attempted on small subsets of the total data to be used in small studies. Currently, the NAACCR record layout does not include fields for these data items, so they would need to be stored outside of the current layout. In the future, a hierarchical and extendable format such as Health Level Seven (HL-7) (Health Level Seven 2007) could be adopted or embedded to capture these additional attributes within the NAACCR layout. Best practices related to conceptual problems are listed in Best Practices 48.


Best Practices 48 – Conceptual problems

Policy Decision: What can and should be done to alleviate address conceptual problems?
Best Practice: As much data as possible should be included about the type of address reported along with a record, including:
• Tenure of residence
• Indication of current or previous address
• Indication of seasonal address or not
• Indication of residence or work address
• Housing type (e.g., single family, apartment building)
• Percent of day/week/month/year spent at this address


18. FEATURE-MATCHING PROBLEMS

This section discusses the various types of problems that occur during feature matching, as well as possible processing options that are available for non-matched addresses.

18.1 FEATURE-MATCHING FAILURES

There are two basic reasons why feature matching can fail: (1) ambiguously matching multiple features, and (2) not matching any features. When this occurs, the address can either remain non-matched and be excluded from a study, or an attempt can be made to reprocess it in some different form or using another method. Recent research has shown that if a non-matchable address and the patient data it represents are excluded from a study, significant bias can be introduced. In particular, residents in certain types of areas (e.g., rural areas) are more likely to report addresses that are non-matchable, and therefore data from such areas will be underrepresented in the study. It follows that simply excluding non-matchable addresses from a study is not recommended (Gregorio et al. 1999, Kwok and Yankaskas 2001, Durr and Froggatt 2002, Bonner et al. 2003, Oliver et al. 2005). For this reason, researchers and registries are advised to re-attempt feature matching by:

• Hierarchical geocoding, or using iteratively lower resolution portions of the input address for geocoding
• Feature disambiguation, or trying to disambiguate between the ambiguous matches
• Attribute imputation, or trying to impute the missing data that caused the ambiguity
• Pseudocoding, or determining an approximate geocode from other information
• Composite feature geocoding, or deriving and utilizing new reference features based on the ambiguous matches
• Waiting it out, or simply doing nothing and attempting geocoding after a period of time (e.g., after the reference datasets have been updated)
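The first of these strategies, hierarchical geocoding, can be sketched as a fallback loop that retries matching with progressively lower-resolution portions of the address and records which level finally succeeded, so that a quality code can be assigned. The `geocode_at` function here stands in for a real feature matcher and is purely hypothetical, as are the level names:

```python
# Resolution levels to attempt, finest first.
LEVELS = ("street_address", "street", "zip", "city", "county", "state")

def hierarchical_geocode(address_parts, geocode_at):
    """Retry geocoding at successively lower resolutions.

    `geocode_at(level, parts)` is a stand-in for a real matcher: it
    returns a coordinate pair or None. The level that matched is
    returned alongside the geocode so the result can be tagged with
    an appropriate coordinate quality code.
    """
    for level in LEVELS:
        result = geocode_at(level, address_parts)
        if result is not None:
            return result, level
    return None, None  # exhausted every level: remains non-matched

# Toy matcher that only succeeds at USPS ZIP Code resolution.
def toy_matcher(level, parts):
    return (-118.28, 34.02) if level == "zip" else None

loc, matched_level = hierarchical_geocode({"zip": "90089"}, toy_matcher)
```

Recording `matched_level` with each output is what allows the bias evaluation discussed next: if one subpopulation systematically matches only at coarse levels, that pattern is visible in the level distribution.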

Best practices relating to feature-matching failures are listed in Best Practices 49. Similar to the warning that match rates may be indicative of bias in one’s geocoded data (Section 14.3), researchers need to be aware that using any of the following procedures to obtain a geocode for all of their data may also introduce bias into their datasets. A careful evaluation of the bias introduction from the use of these methods should be undertaken to determine if this may be an issue for one’s particular dataset. This is an ongoing area of research and more detailed investigations into this topic are required before specific advice can be given on how to identify and deal with these problems.


Best Practices 49 – Feature-matching failures

Policy Decision: When and how can and should non-matchable addresses be handled?

Best Practice: All non-matchable addresses should be re-attempted using:
• Attempt to obtain more information from source
• Hierarchical geocoding
• Feature disambiguation
• Attribute imputation
• Composite feature geocoding

Policy Decision: When and how can and should ambiguous feature matches be handled?

Best Practice: Any time an ambiguous feature match occurs, only a single feature (which may be a composite feature) should be used for calculating the resulting geocode. If extra information is available that can be used to determine the correct feature, then it should be used, and the metadata should record what was used and why that feature was chosen. If extra information is not available and/or the correct feature cannot be identified, a geocode resulting from the interpolation of a lower resolution feature, composite feature, or bounding box should be returned.

Policy Decision: When should a lower-resolution feature be returned from a feature-matching algorithm?

Best Practice: If the relative predicted certainty produced from feature interpolation using an attribute of lower resolution (e.g., USPS ZIP Code after street address is ambiguous) is less than that resulting from using a composite feature (if the features are topologically connected) or a bounding box (if they are not topologically connected), it should be returned.

Policy Decision: When should a derived composite feature be used for feature interpolation?

Best Practice: If the matched features are topologically connected and the predicted certainty produced from feature interpolation using a composite feature (e.g., street segments joined together) is less than that resulting from using an attribute of lower resolution, it should be used for interpolation.

Policy Decision: When should a derived bounding box be used for feature interpolation?

Best Practice: If the matched features are not topologically connected and the relative predicted certainty produced from feature interpolation using a bounding box that encompasses all matched features is less than that resulting from using a lower resolution, it should be used for feature interpolation.

Policy Decision: How and when can and should missing attributes be imputed?

Best Practice: Whether or not to impute missing attribute information will depend on the subjectivity of the registry or researcher. Metadata should indicate:
• Which attributes are imputed
• The sources used for imputing them
• The original values of any attributes that have been changed

Policy Decision: How and when should pseudocoding be used?

Best Practice: Whether or not to pseudocode will depend on the subjectivity of the registry or researcher. Metadata should indicate:
• Which attributes were used to determine the pseudocode
• The calculation used for approximating the pseudocode

Policy Decision: How and when can and should geocoding be re-attempted at a later date after the reference datasets have been updated?

Best Practice: Geocoding should be re-attempted at a later date after the reference datasets have been updated when it is obvious that the geocoding failed because the reference datasets were out-of-date (e.g., geocoding an address in a new development that is not present in current versions of a dataset).



18.1.1 Hierarchical Geocoding

The first approach, hierarchical geocoding, is the one most commonly attempted. The lower resolution attribute chosen depends both on the reason why geocoding failed in the first place and on the desired level of accuracy and confidence required for the research study, and is subject to the warnings regarding implied accuracies within arbitrary feature hierarchies discussed in Section 15.1. To make the choice of lower resolution feature more accurate, one could use information about the ambiguous features themselves. If the two or more features returned from the feature-matching algorithm are of the same level of geographic resolution, the most probable course of action is to return the next level of geographic resolution to which they both belong. For example, if two streets are returned and both are in the same USPS ZIP Code, then a geocode for that USPS ZIP Code should be returned. If the two streets are in separate USPS ZIP Codes, yet the city is the same, the geocode for the city should be returned. The levels of accuracy for each of these would be the same as those of the corresponding levels of geographic resolution presented earlier, in Figure 20.

18.1.2 Feature Disambiguation

In the second approach, feature disambiguation, an attempt is made to determine which of the possible options is the correct choice. How this is done depends on why the ambiguity occurred as well as on any other information that may be available to help choose the correct option. Ambiguity can result from an error in the reference dataset in the rare case that two separate reference features are described by the same attributes, but this usually indicates an error in the database and will not be discussed here. Much more likely is the case in which ambiguity results from the input data not being described with enough detail, such as omitting a directional field or the house number.
Here, disambiguation typically requires the time and subjectivity of a registry staff member, and is essentially interactive geocoding, but it could be done after the fact. The staff member selects one of the ambiguous matches as correct based on other information associated with the input data, or by reasoning about what the matches have in common and returning the result of what can be deduced from this. The staff member performing the geocoding process can take into account any type of extra information that could be used to indicate and select the correct one. Going back to the source of the data (i.e., the hospital) to obtain some of this information may or may not be an option—if it is, it should be attempted. For instance, if an input address was simply “Washington Township, NJ” without any form of a street address, USPS ZIP Code, or county (of which there are multiple), but it was known that the person was required to visit a hospital in a certain county due to particular treatment facilities being available, the county of the hospital could be assumed (Fulcomer et al. 1998). If a second hypothetical address, “1200 Main St.,” geocoded in the past, but now after E-911 implementation the street has been renamed and renumbered such that the new address is “80 N. Main Street,” and the reference data have not yet caught up, the registry could make the link between the old address and the new one based on lists of E-911 changes for their area. A third and more common example may occur when the directional attribute is missing from a street address (e.g., “3620 Vermont Ave,” where both “3620 N. Vermont Ave” and “3620 S. Vermont Ave” exist). Solving these cases is the most difficult, unless some other information is available that can disambiguate between the possible options.


18.1.3 Attribute Imputation

Another approach that can be taken is to impute the missing input address attributes that would be required. Unless there is only a single, obvious choice for imputing the missing attributes that have rendered the original input data non-matchable, assigning values will introduce some uncertainty into the resulting spatial output. There currently is no consensus as to why, how, and under what circumstances attribute imputation should be attempted. At the time of this writing, imputing or not imputing is a judgment call that is left up to the registry, person, software, and most importantly, the circumstances of the input address. A researcher will need to be aware of the greatest possible area of uncertainty that should be associated with the spatial output resulting from imputed data. Also, imputing different attributes will introduce different levels of uncertainty, from one-half the total length of a street in the case of a missing building number and a non-ambiguous street reference feature, to the MBR of possible city boundaries in the case for which ambiguous city names matched and one was imputed as the correct answer.

In all cases, registry staff and researchers need to be aware of the tradeoffs that result from imputing attributes. The confidence/validity one has in the imputed attributes increases if they have been verified from multiple sources. But, as the number of imputed attributes rises, so does the likelihood of error propagation. Therefore, these imputed values need to be marked as such in the metadata associated with a geocode so that a researcher can choose whether or not to utilize a geocode based on them. The recent works by Boscoe (2008) and Henry and Boscoe (2008) can provide further guidance on many of these issues.

18.1.4 Pseudocoding

Another approach that can be taken is to impute an actual output geocode based on other available information or a predefined formula, known as pseudocoding.
This has recently been defined by Zimmerman (2008) as the process of determining pseudocodes, which are approximate geocodes. These pseudocodes can be derived by deterministically reverting to a lower resolution portion of the input address (i.e., following the hierarchies presented in Section 15), or by more complex probabilistic/stochastic methods, such as assigning approximate geocodes based on a specific mathematical distribution function across a region. Like attribute imputation, there currently is no consensus as to why, how, and under what circumstances pseudocoding should be attempted, but Zimmerman (2008) provides insight on how one should work with these data as well as different techniques for creating them.

18.1.5 Composite Feature Geocoding

If disambiguation through attribute imputation or the subjectivity of a staff member fails, the only option left other than reverting to the next best level of resolution or simply holding off for a period of time may be to create a new feature from the ambiguous matches and use it for interpolation, termed here composite feature geocoding. This approach can be seen as an application of the task of delimiting boundaries for imprecise regions (e.g., Reinbacher et al. 2008). This approach already is essentially taken every time a geocode with the quality “midpoint of street segment” is generated, because the geocoder fundamentally does the same task—derive a centroid for the bounding box of the conjunction of all ambiguous features. Here, “all ambiguous features” consists of only a single street, and the centroid is derived using a more advanced calculation than strictly the “centroid of the bounding box of the ambiguous features.” These generated features would be directly applicable to the quantitative measures based on reference data feature resolution and size called for by Abe and Stinchcomb (2008, p. 124).

If geocoding failed because the street address was missing the directional indicator, resulting in ambiguity between reference features that were topologically connected, one could geocode to the centroid of the overall feature created by forming an MBR that encompasses the ambiguously matched features, paying attention, if relevant to the study, to whether or not the entire street is within a single boundary of interest. The relative predicted certainty one can assume from this is, at best, one-half of the total length of the street segment, as depicted in Figure 20(d). This level of accuracy may be more acceptable than simply reverting to the next level of geographic resolution.

However, taking the center point of multiple features in the ambiguous case may not be possible when the input data do not map to ambiguous features that are topologically connected (e.g., when the streets have the same name but different types and are spatially disjoint). Estimating a point from these two non-connected features can be achieved by taking the mid-point between them, but the area of uncertainty of this action essentially increases to the size of the MBR that encompasses both. This is depicted in Figure 24, in which the left image (a) displays the area of uncertainty for the ambiguously matched streets for the non-existent address “100 Sepulveda Blvd, Los Angeles CA 90049,” with the “100 North Sepulveda” block represented by the longer line, the “100 South Sepulveda” block represented by the shorter line, and the MBR of the two (the area of uncertainty) represented by the box. This is in contrast to the size of the area of uncertainty for the whole of the City of LA, as shown in red versus the small turquoise dot representing the same MBR on the image to the right (b).

a) 100 North (longer line) and 100 South Sepulveda (shorter line) with MBR (box)
b) MBR of North and South Sepulveda (small dot) and LA City (outline)

Figure 24 – Example uncertainty areas from MBR of ambiguous streets vs. encompassing city (Google, Inc. 2008b)

Depending on the ambiguous features matched, the size of the resulting dynamically created MBR can vary greatly—from the (small) area of two blocks as in Figure 24, where the street segments are located next to each other, to the (large) area of an entire city, where streets with the same names and ranges appear on opposite sides of the city with only the USPS ZIP Code differing. Thus, it is impossible to indicate that taking the MBR always will be the correct choice in every case, because the accuracy of a static feature, such as a single city polygon, will in contrast always be the same for all features within it, no matter which of its child street segments are ambiguously matched, and may represent a smaller area in some cases.

This tradeoff can be both good and bad in that the area of the feature with the static boundary (e.g., the city polygon) can be tested against that of the feature with the dynamic boundary (i.e., the dynamically created MBR of the ambiguous features) to determine and choose whichever has the smaller area of uncertainty (i.e., the one with the maximum relative predicted certainty). In addition, whether the ultimate consumer of the output can handle spatial data with variable-level accuracy, as in the MBR approach, or will require the static-level accuracy that a uniform areal unit-based approach will produce, needs to be considered. Variations of all of the practices listed in this section may or may not be a cost-effective use of registry resources and will vary by registry (e.g., if accurate data are required for incidence rates). The possible options for dealing with ambiguity through composite feature geocoding as described in this section are listed in Table 37.

Table 37 – Composite feature geocoding options for ambiguous data

Problem: Ambiguity between connected streets
Example: 100 Sepulveda: ambiguous between 100 N and 100 S, which are connected
Options:
• Intersection of streets
• Centroid of MBR of streets

Problem: Ambiguity between disconnected streets
Example: 3620 Vermont: ambiguous between 3620 N Vermont and 3620 S Vermont, which are not connected
Options:
• Centroid of MBR of streets

18.1.6 Waiting It Out

The final approach is to simply wait for the reference data sources to be updated and try the geocoding process again. This option is suitable if the staff member thinks the address data are indeed correct and that the reference files are out-of-date or contain errors and omissions. This is most often the case in rapidly expanding areas of the country where new construction is underway, or in old areas where significant reorganization of parcels or streets has taken place and street names and parcel delineations have changed between the temporal footprint of the reference data and the current time period. Updating the reference files may help match input data that are not represented in the old reference files. In some cases, however, it may be more suitable to use reference data more representative of the time period in which the addresses were collected (i.e., the remainder of the input data might suffer from the newly updated reference datasets). Also, waiting keeps the record in a non-matched state, meaning that it cannot be included in research or analyses, the exact problem pointed out in the opening of this section.




19. MANUAL REVIEW PROBLEMS

In this section, methods for attempting manual review are delineated, as are the benefits and drawbacks of each.

19.1 MANUAL REVIEW

It is inevitable that some input will not produce an output geocode when run through the geocoding process. Also, the level of accuracy obtainable for a geocode may not be sufficient for it to be used. In some of these cases, manual review may be the only option. Best practices relating to unmatched addresses are listed in Best Practices 50. Boscoe (2008) and Abe and Stinchcomb (2008) can also be used as a guide in dealing with these circumstances.

Best Practices 50 – Unmatched addresses Policy Decision Best Practice When and how can and If the geocoding is done per record, an unmatched should unmatched addresses address should be investigated to determine a be handled? corrective action after it is processed, if time and money are available for the task.

If the geocoding process is done in-batch, all unmatched addresses should be grouped by failure class (from Table 35) and processed together after the processing has completed, if time and money are available for the task. When and how should The same geocoding process used for the original geocoding be re-attempted on geocoding attempt should be applied again after the the updated input addresses? unmatched address has been corrected.

Manual review is both the most accurate and most time-consuming way to handle non-matchable addresses. Depending on the problem that caused the address to be non-matchable, the time it takes to perform a manual review can range from a few seconds to a few hours. An example of the first case would be when one of the components of the address attributes is obviously wrong because of incorrect data entry, such as the simple examples listed in Table 38. This can be easily corrected by personal review, but might be difficult for a computer program to recognize and fix, although advances are being made (e.g., employing Hidden Markov Models and other artificial intelligence approaches as in Churches et al. [2002] and Schumacher [2007]). Few studies have quantified the exact effort/time required, but the NJSCR reports being able to achieve processing levels of 150 addresses per hour (Abe and Stinchcomb 2008). Goldberg et al. (2008d) also provide an analysis of the factors involved in these types of processes.


Table 38 – Trivial data entry errors for 3620 South Vermont Ave, Los Angeles, CA

Erroneous Version                            Error
3620 Suoth Vermont Avve, Los Angels CA       Misspelling
3620 South Vermont Ave, CA, Los Angeles      Transposition
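A minimal sketch of how trivial misspellings like those in Table 38 might be corrected automatically, using Python's standard difflib to match each garbled token against a vocabulary drawn from the reference dataset. The vocabulary and similarity cutoff are illustrative assumptions; production systems use the more sophisticated approaches cited above (e.g., Hidden Markov Models).

```python
# Sketch of automated correction of trivial data entry errors by
# fuzzy-matching tokens against a reference vocabulary. The vocabulary
# and cutoff below are illustrative only.

import difflib

REFERENCE_VOCAB = ["SOUTH", "NORTH", "VERMONT", "AVE", "LOS", "ANGELES", "CA"]

def correct_token(token, vocab=REFERENCE_VOCAB, cutoff=0.75):
    """Return the closest vocabulary word, or the token unchanged."""
    matches = difflib.get_close_matches(token.upper(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token.upper()

def correct_address(address):
    # Strip commas, then correct token by token.
    return " ".join(correct_token(t) for t in address.replace(",", " ").split())

fixed = correct_address("3620 Suoth Vermont Avve, Los Angels CA")
# → "3620 SOUTH VERMONT AVE LOS ANGELES CA"
```

Note that this handles the misspelling row of Table 38 but not the transposition row, which requires reordering tokens rather than respelling them; that is exactly the kind of case that still tends to need the manual review described in this section.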

There are solutions that require research on the part of the staff but fall short of re-contacting the patient, which is usually not an option. Here, a staff member may need to consult other sources of information to remedy the problem when the input data are not trivially and/or unambiguously correctable. The staff member may obtain more detailed information from other sources if missing data are causing the geocoding process to not match or to match ambiguously. This task boils down to querying the individual address components, their combinations, and aliases against different sources (e.g., the USPS ZIP+4 database [United States Postal Service 2008a], other reference sets, local datasets, address points, and/or parcels) to identify an error/alias in the input data or an error/alias in the address or address range in the reference data, as well as querying the patient’s name to find other additional address information (covered in the next section). If a geocode has too low of a resolution to be useful, the staff member can reason what is the most likely candidate at the most reasonable level of geographic resolution that they can determine. Examples of when this could be done are listed previously in Section 18.1.2. Further options include re-contacting the hospital registry if the record is one that requires annual follow-up, in which case a corrected version of the same address may already have been obtained. As noted earlier, re-contacting the source of the data (e.g., the hospital) may or may not be a viable option. At the other end of the spectrum are corrections that would require contacting the patient to obtain corrected or more detailed information about their address in the case that it originally was provided incorrectly or with insufficient detail. Typically, this would not be a task performed by a registry during the course of manual review to gain more information to successfully geocode a record.
Instead, this would normally only be conducted by individual researchers for research purposes in special studies. Common examples for which this would be the only option include such things as an address consisting of only a USPS ZIP Code, a city/town name, or some other descriptive non-matchable term such as “Overseas” or “Military.” If this approach is a valid option, it typically will result in the highest accuracy because the staff member can potentially keep attempting to geocode the address with the patient on the telephone, asking for more information until they obtain a successful geocode of sufficient accuracy. Best practices related to manually reviewing unmatched addresses are listed in Best Practices 51.


Best Practices 51 – Unmatched addresses manual review

Policy Decision: When and how can and should manual review of unmatched addresses be attempted?

Best Practice: If the time and money are available, manual review should be attempted for any and all addresses that are not capable of being processed using automated means.

Policy Decision: When and how can and should incorrect data (data entry errors) be corrected?

Best Practice: If the error is obviously a data entry error and the correction is also obvious, it should be corrected and the change noted in the metadata.

Policy Decision: When and how can and should a geocode at a higher resolution be attempted to be reasoned?

Best Practice: If the geographic resolution of the output geocode is too low to be useful (e.g., county centroid), a staff member should attempt to reason what better, higher resolution geocode could be assigned based on other information about the patient/tumor (e.g., use the city centroid of the diagnosing facility if it is known they visited a facility in their city).

19.2 SOURCES FOR DERIVING ADDRESSES

Often, even though a record’s address may not contain enough or correct content to be directly useful, other information associated with the record may lend itself to providing a more accurate address. For example, if a patient-reported address is not useful, the staff member might be able to link with the state DMV and obtain a valid address for the patient. This solution assumes that a working relationship exists between a registry and the DMV in which the former may obtain data from the latter, and in some cases this may not be feasible at the registry level. In these cases, a registry may be able to work with a government agency to set up a relationship of this type at a higher level, instead of the registry obtaining the patient’s DMV data directly.

It must be stated that when registries utilize additional sources for gathering additional information about individuals, the intention is not to “snoop” on any particular individual. The purpose of gathering this information is entirely altruistic and central to facilitating their role in reducing the burden of cancer.

In general, linking with large administrative databases such as the DMV or Medicare can be valuable for augmenting demographic information, such as address, on a cancer record. However, these databases exist for administrative purposes and are not intended for surveillance or research. The limitations of these databases for cancer registry objectives need to be understood. For example, although the DMV requires a street address in addition to a USPS PO box, the address listed in the DMV may have been updated and overwritten since the time of cancer diagnosis. Cancer registry personnel must fully understand the data collection methods to make correct assumptions when attempting to supplement cancer registry data. These situations obviously will be specific to each registry and dependent on local laws. Other sources of data associated with a patient that have been used in the literature, as well as other possible sources, are found in Table 39. Some of these sources (e.g., phone books) can be accessed for free as online services; others may require agreements to be made between the registry and private or public institutions. By far, the most common approach is

to look for a patient’s name in parcel ownership records and associate the address if the match seems reasonable (e.g., a one-to-one match is found between name and parcel during the correct time period when the person was known to be living in that city). Issues related to querying some of these online sources are covered in Section 26. Best practices related to data sources for manual review are listed in Best Practices 52.
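The one-to-one parcel ownership linkage described above can be sketched as follows. The parcel record layout, field names, and acceptance rule are hypothetical; a real linkage would also apply the name-standardization and privacy safeguards discussed elsewhere in this document.

```python
# Sketch of linking a patient name to parcel ownership records:
# accept the parcel's address only when exactly one owner-name match
# exists within the period the patient was known to live in the city.
# The record layout (owner/address/owned_from/owned_to) is hypothetical.

def link_parcel_address(patient_name, known_year, parcels):
    """Return a parcel address only on an unambiguous one-to-one match."""
    name = patient_name.strip().upper()
    hits = [
        p for p in parcels
        if p["owner"].strip().upper() == name
        and p["owned_from"] <= known_year <= p["owned_to"]
    ]
    if len(hits) == 1:   # one-to-one: a reasonable linkage
        return hits[0]["address"]
    return None          # zero or ambiguous: leave unmatched

parcels = [
    {"owner": "Jane Q. Public", "address": "12 Elm St",
     "owned_from": 2001, "owned_to": 2006},
    {"owner": "Jane Q. Public", "address": "99 Oak Ave",
     "owned_from": 2007, "owned_to": 2008},
]
addr = link_parcel_address("Jane Q. Public", 2005, parcels)
# → "12 Elm St"
```

Restricting acceptance to exactly one match within the correct time period implements the "if the match seems reasonable" criterion conservatively; any ambiguity leaves the record for manual review rather than risking a false linkage.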

Best Practices 52 – Unmatched address manual review data sources

Policy Decision: When and how can and should alternative sources of information be reviewed to assist in address correction?

Best Practice: If the problem with the input address is not trivially correctable, alternative sources of information should be reviewed to attempt address correction, if time and money are available for the task.

If a linkage can be determined with a suitable level of certainty, it should be made as long as the privacy and confidentiality concerns in Section 26 are satisfied.

Metadata should include:
• The source of the supplemental data
• The staff member who made the linkage
• The method of linkage (i.e., automatic/manual)
• The linkage criteria
• The date the linkage was made


Table 39 – Common sources of supplemental data with typical cost, formal agreement requirements, and usage type

Supplemental Data Source                          | Cost     | Formal Agreement Required | Usage Type
DMV                                               | free     | yes                       | batch
Phone Books                                       | free     | no                        | per-record
Phone Companies                                   | free     | yes                       | batch
Utility Companies                                 | free     | yes                       | batch
State Bureau of Vital Statistics and Registration | free     | yes                       | batch
Social Security Administration                    | free     | yes                       | per-record
Military                                          | free     | yes                       | per-record
Educational Institutions                          | free     | yes                       | per-record
USPS ZIP+4 Database                               | not free | no                        | batch
County Deeds/Real Estate Transaction Registries   | not free | no                        | batch
Municipal or County Assessor Databases            | varies   | no                        | batch
Municipal Resident Lists                          | free     | no                        | per-record
State or Municipal Voter Registration Databases   | varies   | no                        | batch
Google Earth                                      | free     | no                        | per-record
Social Security Death Index                       | free     | no                        | per-record
People Finding Web Sites                          | free     | no                        | per-record
Medicare/Medicaid                                 | free     | yes                       | per-record
Vital Statistics                                  | free     | yes                       | batch
“Googling” a Person’s Name                        | free     | no                        | per-record




20. GEOCODING SOFTWARE PROBLEMS

This section provides insight into required or possible methods of overcoming problems with geocoding software.

20.1 COMMON SOFTWARE PITFALLS

Several common problems and limitations often occur when using commercial geocoding packages. Perhaps the most frustrating is that commercial geocoding processes by their very nature do not reveal much about their inner workings. The actual algorithms implemented in commercial products can be held as trade secrets and not provided in detail (they are, for the most part, sold to make money). As such, a registry or researcher may not know the exact choices made by the components, what the other possible options were, or how the final choice was decided. Some older geocoding platforms simply return the spatial output without any metadata reporting how or why it was derived or providing information on its quality. This should prevent a user from having confidence in those results, and such platforms should be avoided. Registries and consumers of commercial software should push these vendors to be more open about the inner workings of the geocoding processes that they implement, and about reporting metadata. However, even when commercial software packages expose their algorithms and assumptions, it will take a time commitment from a staff member to read and understand them. Best Practices 53 contains several common limitations that are encountered in working with geocoding software and recommended NAACCR actions for overcoming them. Note that the intent of this section is to remain “software neutral” by refraining from advocating any particular commercial geocoding platform. Also, as the cost of geocoding continues to drop, even becoming free (Goldberg 2008a), some of the issues in this section may no longer apply.


Best Practices 53 – Common geocoding software limitations by component of the geocoding process

Aspect: Input Data
Limitation: Not accepting intersections as input.
Best Practice: The input data will need to be prepared such that one street or the other is chosen and used for input. The street that will produce the lower level of uncertainty should be chosen (e.g., the shorter one).

Limitation: Not accepting named places as input.
Best Practice: The input data will need to be prepared such that the next highest level of resolution is used for input (e.g., move from named building to USPS ZIP Code).

Aspect: Normalization/Parsing
Limitation: Cannot change the order of address attributes (tokens).
Best Practice: The input data will need to be prepared such that the address attributes are in the accepted order.

Aspect: Standardization
Limitation: An input address standard is not supported.
Best Practice: The input data will need to be prepared such that the input data are in the accepted address standard.

Aspect: Reference Dataset
Limitation: Only linear-based reference datasets are supported.
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Limitation: There is no control over which reference dataset is used.
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Aspect: Feature Matching
Limitation: There is no control over which feature-matching algorithm is used.
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Limitation: There is no control over the parameters of the feature-matching algorithm used (e.g., confidence interval).
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Limitation: There is no control over which feature interpolation algorithm is used.
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Limitation: There is no control over the parameters of the feature interpolation algorithm used.
Best Practice: If particular parameters must be settable (e.g., dropback distance and direction; which assumptions are used: uniform lot, address range, etc.), a different geocoding process should be identified and obtained.

Aspect: Output Data
Limitation: Only capable of producing a single point as output (i.e., no higher complexity geometry types).
Best Practice: This is how most geocoders operate so, at present, in most circumstances this will have to be acceptable.

Aspect: Metadata
Limitation: No metadata reported along with the results.
Best Practice: Geocodes returned from a geocoder without any metadata should not be included as the spatial component of the record.

Limitation: No coordinate quality reported along with the results.
Best Practice: Geocodes returned from a geocoder without metadata describing, at a minimum, a coordinate quality should not be included as the spatial component of the record.

D. W. Goldberg

Part 5: Choosing a Geocoding Process

The choice of a geocoding process and/or components used by a registry will necessarily depend foremost on the restrictions and constraints the registry must adhere to, be they legal, budgetary, technical, or otherwise. The material in this part of the document is presented to help a registry determine the correct solution for them, given their particular restrictions.

November 10, 2008 161



21. CHOOSING A HOME-GROWN OR THIRD-PARTY GEOCODING SOLUTION

This section will identify the requirements that must be defined to determine the appropriate geocoding method to be used.

21.1 HOME-GROWN AND THIRD-PARTY GEOCODING OPTIONS

In practice, many different geocoding process options are available to registries and the general public. At the top level, they can either be developed in-house or obtained from a third party. Of the ones that are not developed by the registry, many different varieties are available. They can range in price as well as quality, and are available in many different forms. There are complete software systems that include all of the components of the geocoding process and can be used right out of the box, or each component can be purchased separately. There are freely available online services that require no software other than a Web browser (e.g., Goldberg 2008a, Google, Inc. 2008c, Yahoo!, Inc. 2008), and there are proprietary commercial versions of the same (e.g., Tele Atlas 2008b). Choosing the right option will depend on many factors, including budgetary considerations, accuracy required, level of security provided, technical ability of staff, accountability and metadata reporting capabilities, and flexibility required.

21.2 SETTING PROCESS REQUIREMENTS

Setting the requirements of one's geocoding process should be the first task attempted, even before beginning to consider the different available options. The items listed in Table 40 should serve as a starting point for discussions among registry staff designing the geocoding process when determining the exact requirements of the registry in terms of the geocoding software, the vendor, and the reference data. It also is worthwhile to have a single person designated as the dedicated liaison between the vendor and the registry for asking and answering questions, both to build relationships and to become an expert on the topics. Keep in mind that geocoding outcomes, at a minimum, must meet the standards set forth in the NAACCR GIS Coordinate Quality Code Item (see Section 15).


Table 40 – Geocoding process component considerations

Process Component: Software
  Accuracy: What is the minimum acceptable level of spatial accuracy? Are multiple levels of accuracy accepted or required?
  Capacity: What throughput must be achievable? How many concurrent users are expected?
  Reliability: Must the software be failsafe?
  Transparency: What level of overall transparency is required? What level of transparency is required per component?
  Reportability: What information must be reported along with a result?

Process Component: Vendor
  Accuracy: What level of accuracy and completeness do they guarantee?
  Capacity: What capacity can they accommodate?
  Reliability: How reliable are their results?
  Transparency: What information is available about the process they use?
  Reportability: What information do they report along with their results?

Process Component: Reference datasets
  Accuracy: What level of accuracy is associated with the dataset as a whole? What level of accuracy is associated with individual features?
  Completeness: How complete a representation do the reference features provide for the area of interest?
  Reliability: How reliable can the reference features be considered?
  Transparency: How were the reference features created?
  Lineage: Where and when did this data source originate? Is there a date/source specified for each feature type? What processes have been applied to this data source?

21.3 IN-HOUSE VS. EXTERNAL PROCESSING

Whether to perform the geocoding process in-house or by utilizing a third-party contractor or service is perhaps the most important decision that a registry must make regarding their geocoding process. The NAACCR documents (2008a, 2008b) and the work by Abe and Stinchcomb (2008) that presents a compressed version report that 28 percent of their registry survey respondents (n=47) utilize in-house geocoding, while 70 percent utilize external resources (33% using commercial vendors, 37% using different divisions within their organization). Performing the process in-house enables a registry to control every aspect of the process, but forces them to become experts on the topic. Contracting a third-party entity to perform the geocoding releases the registry from knowing the intimate details of the


implementation of the geocoding process, but at the same time can keep them in the dark about what is really happening during the process and the choices that are made. The costs associated with using a vendor will vary between registries because the requirements the vendor will have to meet will vary between registries. Another option is to use a mixture of both in-house and vendor geocoding. The common case of this is sending out all of the data to a vendor, then working on the problem cases that are returned in-house. Best practices related to geocoding in-house or externally are listed in Best Practices 54.

Best Practices 54 – In-house versus external geocoding

Policy Decision: When can and should a registry geocode data in-house or use a third party?
Best Practice:
• If the cost of geocoding using a third-party provider is higher than obtaining or developing all components of the geocoding process, or if suitable confidentiality and/or privacy requirements cannot be met by a third party, it should be performed in-house.
• If the technical requirements or costs for geocoding in-house cannot be met by a registry and suitable confidentiality and/or privacy requirements can be met by a third party, it should be performed by a third party.

21.4 HOME-GROWN OR COTS

If the choice is made to perform the geocoding in-house, the registry must further decide if they wish to implement their own geocoder, or if they prefer to use a commercial off-the-shelf (COTS) software package. Both of these options have their strengths and weaknesses, and this choice should be considered carefully. On one hand, a COTS package will be, for the most part, ready to start geocoding right out of the box. However, there will be a substantial up-front cost, and the details of its inner workings may or may not be available. A home-grown geocoding solution, on the other hand, can take a significant amount of time to develop and test, possibly costing several times more than a commercial counterpart, but its inner workings will be known and can be molded to the particular needs of the registry and modified as needed in the future. As of this writing, only a few registries currently use a composite/home-grown solution (e.g., the NJSCR [Abe and Stinchcomb 2008]), but as open-source geocoding platforms become available, such as the one being developed at the University of Southern California (Goldberg et al. 2008a), this may change.

A comprehensive list of both (1) the commonly encountered costs of using commercial vendors (high end and low end for per-record and batch processing) and (2) the costs of performing in-house geocoding, including all associated costs for setup (purchasing reference data, purchasing geocoding software, developing custom geocoding software, employee training, etc.), is missing from the registry community. In many cases, the contracts under which the data or services are obtained from vendors are confidential, which may be the major hurdle in assembling this list. However, these data items would be extremely useful for registries just starting to form their geocoding processes and procedures.


21.5 FLEXIBILITY

Commercial software providers tend to create "one-size-fits-all" solutions that appeal to the largest market possible. They usually are not capable of creating individualized software releases for each of their customers geared toward fitting their exact needs. This strategy, while beneficial to the software vendor, almost always results in software that does not exactly meet the needs of the consumer. Surely, most of their customers' requirements will be met (otherwise they would have never bought the software), but undoubtedly some specific requirement will not be fulfilled, causing difficulty at the consumer end.

With regard to geocoding software in particular, the lack of flexibility may be the most problematic. The general nature of geocoding, with many different types of input data, reference data sources, and output types, necessitates a certain degree of dynamism in the process. How easily a particular strategy can be adapted to changing reference datasets, new geocoding algorithms, and varying required levels of accuracy can be considered a defining metric of any geocoding process. Therefore, a critical factor that must be taken into consideration when choosing between in-house and commercial geocoding is the amount of flexibility that can be accommodated in the process.

To address this question, a registry will need to determine the anticipated amount of leeway that they will need and investigate whether commercially available packages suit their needs. Examples of specific issues that may need to be considered are listed in Table 41. If the needs of a registry are such that no commercial platform can accommodate them, the only option may be to develop their own in-house version.

Table 41 – Commercial geocoding package policy considerations

• The ability to specify/change reference data sources
• The ability to specify/change feature-matching algorithms to suit particular study needs
• The ability to specify/change required levels of accuracy
• The ability to specify/change feature interpolation algorithms
• The ability to control offset/backset parameters

21.6 PROCESS TRANSPARENCY

Process transparency within a geocoding strategy can be critical in determining the confidence one can associate with the results. In the worst possible case, no metadata are returned with a geocode to indicate how it was derived or where it came from. Data in this form should, for the most part, be avoided because they give no indication of the reliability of the data from which the results were obtained. The amount of transparency associated with a geocoding process can be measured in terms of the amount and type of information that is returned with the result. A process that reports every decision made at every step of the process would be completely transparent, while the worst-case scenario just presented would be completely non-transparent.

Commercial geocoding packages vary in terms of transparency. Some do not expose the inner workings of their geocoding process to customers based on "trade secret" concerns, and as such the consumer must take the vendor at their word regarding the operation of the geocoding software. Others, however, report more detailed information. A registry will need to decide what level of transparency it requires and either select a commercial geocoding package accordingly or develop one in-house if nothing is commercially available that meets these needs. Best practices related to process transparency are listed in Best Practices 55.

Best Practices 55 – Process transparency

Policy Decision: What minimum information needs to be reported by a geocoding process along with the output (e.g., level of transparency)?
Best Practice:
• Full metadata describing the reference data.
• The type of feature match.
• The type of interpolation performed.

21.7 HOW TO SELECT A VENDOR

If the choice has been made to select a vendor to perform geocoding, a registry should ask specific questions to determine both the knowledge the vendor has about the topic and the capabilities they will likely be able to provide. Example questions are provided in Table 42.

Table 42 – Topics and issues relevant to selecting a vendor

Data sources: What types of data sources are used? What versions? How often are they updated? What level of accuracy do they have? How are those levels guaranteed?
Feature matching: What algorithms are used? How are exceptional cases handled?
Normalization/Standardization: What algorithms are used?
Input data: What types of input data can be handled? What happens to input data that are not able to be handled?
Output data: What output formats are supported? What information is provided along with the output? What level of spatial accuracy can be achieved? Can they provide the NAACCR certainty codes from Table 34?
Confidentiality: What safeguards and guarantees are in place?
Cost: What is the per-record cost? Do they negotiate separately with each registry and require non-disclosure agreements?
Feedback capability: Do corrections submitted by a cancer registry lead to updates in their data sources?

In addition to receiving answers to these questions, it is advisable for a registry to test the accuracy of their vendor periodically. Published literature exists describing methods for testing both a third-party contractor and commercially purchased software (e.g., Krieger et al. 2001, Whitsel et al. 2004). Essentially, a registry wishing to validate the results from a vendor


can set up lists of input data that include particularly hard cases to geocode successfully, which then can be submitted to the vendor to determine how well they are handled. Also, the registry can maintain a small set of ground truth data obtained with GPS devices to measure the spatial accuracy of the output data returned from the vendor. It should be noted that it may be beneficial for registries to coordinate in compiling and maintaining a large-area ground truth dataset for this purpose—instead of each registry maintaining a small set just for the area around their geographic location—to leverage the existing work of other registries.

At the time of writing, there are no established rules between vendors and registries as to who is responsible for ensuring the correctness of the geocoded results. By default, the registry is ultimately responsible, but it may be worthwhile to initiate discussions on this topic with vendors before entering into any agreements.

21.8 EVALUATING AND COMPARING GEOCODING RESULTS

Although geocoding outcomes between two geocoders will be identical for most addresses, there always are subsets of addresses that will generate different outcomes for which there is no clear consensus on which is more correct. Methods of objectively comparing geocoding results that can be applied to both the results returned from vendors and those from other agencies are just emerging (e.g., Lovasi et al. 2007, Mazumdar et al. 2008). The objective when comparing geocoding outcomes is one of placing addresses into the categorization listed in Table 43. The third category represents the instances in which one party believes that all parties involved should be satisfied with the geocodes while another party believes otherwise.

Table 43 – Categorization of geocode results

1) Addresses that can be geocoded to the satisfaction of all parties concerned
2) Addresses that cannot be geocoded to the satisfaction of all parties concerned
3) Addresses for which there is disagreement as to whether they belong in (1) or (2)

The differences are based on assumptions used during the geocoding process, which generally are universally applied for all data processed using a single geocoder, and thus are relatively easy to identify. For example, it may be simple to determine if and which hierarchy of feature matching was used (e.g., matching a USPS ZIP Code centroid every time a street-level match fails). Explicitly defining the required criteria of the geocoding process is the way for registries to make their vendor's geocoding outcomes more similar to those that they may generate in-house.

When the geocoding output of two individuals is compared (as in the between-agency case), the assumptions are less easy to identify because they generally are made on a per-record basis. There will be addresses for which one person thinks an address can geocode based on X number of edits to the address, and another person disagrees and thinks that the record should not be geocoded because the edits required for such a match are based on assumptions not supported by the data at hand. Fortunately, these addresses generally comprise less than 5 percent of all registry records. Best practices related to evaluating third-party geocoding results are listed in Best Practices 56.
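The feature-matching fallback hierarchy mentioned above (try a street-level match first; drop to a USPS ZIP Code centroid only when that fails) can be sketched as follows. This is a minimal illustration, not any particular geocoder's implementation; the toy reference data, function names, and quality labels are all assumptions made for the example.

```python
# Illustrative sketch of a feature-matching fallback hierarchy.
# The reference data and quality labels below are hypothetical.

STREET_MATCHES = {"123 MAIN ST 90089": (34.0205, -118.2856)}
ZIP_CENTROIDS = {"90089": (34.0224, -118.2851)}

def match_street(addr):
    """Toy street-level matcher: exact lookup on a normalized key."""
    return STREET_MATCHES.get(addr["street_key"])

def match_zip_centroid(addr):
    """Toy fallback matcher: USPS ZIP Code centroid lookup."""
    return ZIP_CENTROIDS.get(addr["zip"])

# Ordered from highest to lowest resolution; each entry pairs a
# quality label (for the output metadata) with a matcher function.
HIERARCHY = [("street", match_street), ("zip_centroid", match_zip_centroid)]

def geocode(addr, hierarchy=HIERARCHY):
    """Return (coordinates, quality label) from the first matcher that succeeds."""
    for quality, matcher in hierarchy:
        point = matcher(addr)
        if point is not None:
            return point, quality
    return None, "unmatched"
```

Recording the quality label alongside the point is what makes the fallback behavior identifiable downstream, which is exactly the assumption-auditing problem this section describes.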


Best Practices 56 – Evaluating third-party geocoded results

Policy Decision: When can and should results from a third-party provider be verified?
Best Practice: The results from a vendor should be verified after every submission to ensure some desired level of quality.

Policy Decision: How and what can and should be used to verify the quality of a vendor?
Best Practice:
• A pre-compiled list of problem addresses to check exceptional case handling.
• Resubmitting data to check for consistent results.
• A small set of GPS ground-truthed data can be used to check the spatial accuracy, with its size based on confidence intervals (although a recommendation as to its calculation needs more research).
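In the simplest case, checking vendor output against a GPS ground-truth set reduces to computing the distance between each vendor geocode and its ground-truth coordinate and summarizing the per-record errors. A sketch under stated assumptions: the haversine formula is the standard great-circle distance, while the record-keyed dictionaries are hypothetical data structures chosen for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def positional_errors(vendor, ground_truth):
    """Per-record distance (m) between vendor geocodes and GPS ground truth.

    Both arguments map a record ID to a (lat, lon) pair; only IDs
    present in both sets are compared.
    """
    return {
        rid: haversine_m(*vendor[rid], *ground_truth[rid])
        for rid in vendor.keys() & ground_truth.keys()
    }
```

Summaries such as the median and maximum of the returned error values then give a registry a defensible, repeatable accuracy figure for each vendor submission.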




22. BUYING VS. BUILDING REFERENCE DATASETS

This section introduces methods for determining which reference datasets to use.

22.1 NO ASSEMBLY REQUIRED

Another decision that registries will be confronted with when they choose not to use a vendor relates to obtaining the reference datasets used in the geocoding process. These can either be obtained (possibly purchased) and used directly, created at the registry, or a combination thereof whereby the registry adds value to an acquired dataset.

As noted earlier, the accuracy and completeness of these reference datasets can vary dramatically, as can the costs of obtaining them. The price of free reference datasets such as TIGER/Line files (United States Census Bureau 2008d) makes them attractive. The relatively lower levels of accuracy and completeness of these (when compared to commercial variants), however, may not be sufficient for a registry's needs. Likewise, although commercial datasets such as Tele Atlas (2008c) will undoubtedly improve the overall accuracy of the geocoding process, their cost may be prohibitive.

22.2 SOME ASSEMBLY REQUIRED

An alternative approach is for a registry to improve the reference data source, be it a public data source (e.g., TIGER/Line files) or a commercial counterpart. If the technical ability is available, this option may prove worthwhile. For example, one simple improvement that can be made to TIGER/Line files that greatly improves the accuracy of linear-based interpolation is to associate more accurate address ranges with each street reference feature, thus enabling uniform lot interpolation (Bakshi et al. 2004). The actual number of parcels per street is typically available from local assessors' offices, and they may be willing to share that information with a registry. Usually, these values can easily be associated with each of the street reference features in the TIGER/Line files, unless it is determined they will compromise the synchronicity with census attribute or population data already in use.
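The improvement just described—replacing a segment's potential address range with the actual parcel count—changes where a house number lands along the segment. A simplified one-dimensional sketch of the two interpolation styles follows; it assumes consecutive even house numbers on one side of the street and equal-width lots, which are illustrative simplifications, not how any particular geocoder is implemented.

```python
def address_range_offset(house_no, range_lo, range_hi, seg_len_m):
    """Linear (address-range) interpolation: place the house number
    proportionally within the segment's potential address range."""
    frac = (house_no - range_lo) / (range_hi - range_lo)
    return frac * seg_len_m

def uniform_lot_offset(house_no, range_lo, n_parcels, seg_len_m, step=2):
    """Uniform-lot interpolation: divide the segment into n_parcels
    equal-width lots and place the address at the center of its lot.
    Assumes house numbers run range_lo, range_lo + step, ... on one side."""
    lot = (house_no - range_lo) // step      # 0-based lot index
    lot_width = seg_len_m / n_parcels
    return (lot + 0.5) * lot_width
```

On a 100 m segment with a potential range of 100–198 but only five actual parcels (100, 102, 104, 106, 108), house 104 lands about 4 m along the segment under address-range interpolation but at the center of the third lot, 50 m along, under uniform-lot interpolation.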
Another option is for the registry to directly contact the local government offices that make use of GIS-related data sources to determine if suitable reference data sources exist for use in the geocoding process. Instead of just obtaining the number of parcels along street segments as in the previous example, why not get the actual parcel boundary files from the local assessor's office? If they exist, the assessor's office may be willing to share them with the registry, for free or for some fee. In some cases they are becoming available at the state level. Either of these options effectively increases the matching ability, or the ability of the reference dataset to match addresses while still maintaining the same linkage criteria.

22.3 DETERMINING COSTS

In general, a registry will need to decide if the increased level of accuracy they can expect from an improved reference dataset is worth the additional cost (e.g., does the level of improved geocode accuracy directly attributable to using Tele Atlas [2008c] instead of TIGER/Line files justify the price of purchasing them?). Further, they need to determine the potential costs of upgrading an existing dataset themselves versus buying one that is already enhanced. The calculation of these costs should include both the initial ascertainment and acquisition costs, as well as the cost of adding any value to the datasets, if that is to be performed. The cost of maintenance for software and reference data for in-house operations also should be taken into consideration. Best practices relating to choosing a reference dataset are listed in Best Practices 57.

Best Practices 57 – Choosing a reference dataset

Policy Decision: What type of reference data should be chosen?
Best Practice:
• If a registry requires the ability to investigate and fix addresses that might be incorrect on a street, multi-entity (range) reference data should be used.
• If a registry needs all addresses to be validated as existing, single-entity (discrete) reference features should be used (i.e., point- or areal unit-based).
• The scale of the reference features should match the scale of the input data element being matched:
  − National-scale city names: national gazetteers (included in TIGER/Lines)
  − National-scale USPS ZIP Codes: national USPS ZIP+4 databases
• The highest-resolution reference dataset should be chosen (given budgetary constraints).

Policy Decision: Which point data sources should be used?
Best Practice:
• If no feature interpolation is possible given the types of data to be geocoded, point-based reference datasets should be considered.
• All geocoding processes should, at a minimum, include the national-scale gazetteers included with TIGER/Line files.

Policy Decision: When and which line data sources should be used?
Best Practice:
• All geocoding processes should contain at least one linear-based reference dataset.
• All geocoding processes should, at a minimum, include the TIGER/Line reference files.

Policy Decision: When and which polygon-based reference datasets should be used?
Best Practice:
• If high-resolution data sources are available (e.g., parcel boundaries, building footprints), they should be included in the geocoding process.
• If 3-D geocoding is required, building models should be used.

Policy Decision: When multiple reference dataset types are available, which should be used?
Best Practice:
• The reference feature type that produces the lowest amount of uncertainty should always be used (e.g., linear-based reference datasets with high-resolution small features may be more suitable than large parcels in rural areas).


23. ORGANIZATIONAL GEOCODING CAPACITY

This section explores the possible options for developing geocoding capacity at a registry.

23.1 HOW TO MEASURE GEOCODING CAPACITY

The amount of geocoding performed at a registry will necessarily affect the choice of the geocoding option selected. Any decisions made in determining an appropriate geocoding process need first and foremost to take into account the research needs and policies of the registry, with the amount of geocoding likely to be performed also considered as a factor. The number of cases geocoded can vary dramatically between registries, and a first-order-of-magnitude estimate of the approximate number of cases will have an effect on which strategy should be undertaken. To estimate an approximate yearly number of cases, a registry should determine how many cases they have had in previous years. These prior numbers can be good indicators of future needs, but may be biased depending on the particular circumstances contributing to them. For instance, policy changes implemented within a registry as of a particular year can increase or decrease these numbers, and potential future policies will need to be taken into account.

The costs of data and software are effectively "sunk" costs, with the real cost of yearly geocoding performed at a registry depending on the amount of time spent on the task. Therefore, in addition to determining an average number of cases per year, a registry should also determine the average amount of time the interactive geocoding process takes on a per-case basis (because batch-match cost is trivial on a per-record basis). The specific geocoding policies in place at a registry will have a substantial effect on this estimate, and likewise the estimated amount of time per case may affect these policies. Time is particularly dependent on the desired geographic level of output.
For instance, if a policy were in place that every input address needed to be geocoded to street-level accuracy, the time necessary to achieve this might quickly render the policy infeasible, but requiring a geocode for every address at county-level accuracy may be substantially quicker. The most reliable cost estimates for geocoding at a registry often are obtained when a registry charges for cost recovery because, most likely, the client will set the geocoding criteria. Based on the number of geocoding cases that a registry has determined likely and the costs for these (as determined by the average amount of time per case), a registry should evaluate the necessity of creating one or more full-time equivalent (FTE) positions dedicated to the task. Example numbers of cases geocoded per year and resulting FTE positions from registries are provided in Table 44 (Abe and Stinchcomb 2008). Best practices relating to measuring geocoding capacity are listed in Best Practices 58.

Table 44 – Comparison of geocoded cases per year to FTE positions

Registry          Cases Geocoded Per Year    Number of FTE Positions
North Carolina    10,000+                    1
New Jersey        80,000+                    2
New York          100,000+                   4
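The capacity arithmetic behind figures like those in Table 44 is straightforward: multiply the expected yearly caseload by the average interactive time per case and divide by the working hours of one position. A sketch, in which the 1,800 work-hours-per-year figure and the function name are illustrative assumptions rather than registry standards:

```python
def fte_required(cases_per_year, minutes_per_case, work_hours_per_year=1800):
    """Rough number of full-time-equivalent positions needed for
    interactive geocoding, ignoring (trivial) batch-match time."""
    hours_needed = cases_per_year * minutes_per_case / 60.0
    return hours_needed / work_hours_per_year
```

For example, at 100,000 cases per year and an average of two minutes of interactive work per case, roughly 1.9 FTEs are needed; a policy that pushes the average to ten minutes per case would require closer to 9.3, which is how per-case time estimates feed back into geocoding policy.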


Best Practices 58 – Measuring geocoding capacity

Policy Decision: When and how should the predicted number of geocoding cases be calculated?
Best Practice:
• The number of geocoding cases should be calculated before selecting a geocoding process.
• This can be done by estimating from the number of cases geocoded in previous years.

Policy Decision: When and how should average processing time per geocoded case be calculated?
Best Practice: The average per-geocode processing time should be calculated as cases are processed.

Policy Decision: How many FTE positions are required for the geocoding process?
Best Practice: This number will depend on the number of cases that need to be geocoded and the actual work time associated with processing each geocode, which also will depend on the level to which the process is automated.


Part 6: Working With Geocoded Data

After data have been successfully geocoded, they are subsequently utilized in any number of ways. Due to the nature of cancer registries and the types of data that they routinely work with, several peculiarities exist as to how and why particular types of processing can and sometimes must take place. Some of these safeguard the rights of individuals, while others are artifacts of the way in which cancer registries do business. This part of the document will discuss several of the more important issues in play here.




24. TUMOR RECORDS WITH MULTIPLE ADDRESSES

This section discusses the issues that explain why multiple addresses sometimes are generated for a single tumor record and how these should be interpreted and used in subsequent analyses.

24.1 SELECTING FROM MULTIPLE CASE GEOCODES

It is common for a record to have several addresses associated with it, each one representing the individual's residence. This multi-address situation can occur for several reasons: a patient may be seen by multiple facilities, each of which records an address for the patient (which can be the same or different), or the patient may be seen at the same facility on multiple occasions. Additionally, if multiple abstracts with multiple addresses were received for a single tumor during registry consolidation (in the case that the subject moved their residence, reported different addresses at different times, and/or was treated at multiple facilities), this situation also will arise. Further, one or more facilities may have two different versions of the same address for a single patient, or two different addresses may be maintained because the patient actually moved.

In these cases when multiple addresses exist for a record, one address must be selected as the primary for use in spatial analyses. The others need not be discarded or removed from the abstract, but the primary address should be the only one used for performing analysis. In the ideal case, when confronted with multiple addresses, one should strive to use the address that is based on the more accurate data. The standard in place at registries is to use the patient's usual address at diagnosis (dxAddress), no matter what its quality.

However, if it is unclear which is the primary dxAddress out of several possible addresses associated with a single patient (in the case of multiple tumor abstracts being received), a decision needs to be made as to which should be used. For instance, consider the case when one geocode was produced the first time the patient saw a doctor several decades ago using only the USPS ZIP Code, and another geocode was subsequently created using a full street address obtained from historical documents of unknown accuracy.
These both represent the dxAddress, but have different values. The first geocode is most likely more temporally accurate because the data were reported by the patient him or herself, while the second is most likely more spatially accurate, if one assumes that the address obtained is correct. There is no current consensus on how to proceed in these instances, but there has been a recent push to standardize the way that registries deal with records that have multiple addresses. Research is underway to develop best practices for this situation, and the considerations listed in Table 45 have been identified as possible factors that could or should influence these types of decisions.


Table 45 – Possible factors influencing the choice of dxAddress, with decision criteria if they have been proposed

1. Abstract submission date – earliest to latest
2. Address of diagnosing facility
3. Age at diagnosis
4. Amount of time elapsed between diagnosis and first contact – least to most
5. Class of case – class code, in this order: {1, 0, 2, 3, 4, 5, 6, 7, 8, 9}
6. Current address of patient
7. Date of diagnosis on abstract
8. Date of last contact
9. External address reference – e.g., a motor vehicles database with an address near the diagnosis time
10. Facility referred from
11. Facility referred to
12. Fuzziness of match – e.g., the level of massaging required to standardize the address
13. Marital status at diagnosis
14. Particular reporting facility – use the address from the most trusted one
15. Place of death
16. Specificity (geographical) of address type – street address > USPS ZIP Code only > county only
17. Timeliness of reporting – time elapsed between times of case reportability and submission (less is better)
18. Type of street address-related data submitted – e.g., prison address < known residential address; standard street address < USPS PO box or rural route location; contingent on patient age (elderly in nursing homes assumed appropriate, young in college assumed appropriate)
19. Type of reporting facility – e.g., American College of Surgeons or NCI hospitals < other in-state hospital < in-state clinic < other sources
20. Vital status – alive < dead

D. W. Goldberg

25. HYBRIDIZED DATA

This section introduces the concept of hybridized data and the ways in which it is produced and used.

25.1 HYBRIDIZED DATA DEFINED

Typically, once the geocodes have been created for a set of input data, the next step in the spatial analysis procedure is to associate other relevant aspatial data with them. This process has been termed hybrid georeferencing, which describes the association of attributes from other datasets through a spatial join operation based on common spatial attributes (Martin and Higgs 1996). Hybrid data are the data created through hybrid georeferencing, with attributes from more than one dataset joined by common spatial attributes.

The point-in-polygon method is an approach in which a point that is spatially contained within a polygon has the attributes of the polygon associated with it. This is the most common method of hybrid georeferencing and is suitable when the secondary data that a researcher wishes to associate with a geocode are in a vector-based polygon format. Alternatively, when the data are in a raster-based format, this notion of area within a geographic feature typically does not exist because of their pixel-based nature. In these cases, spatial buffers (i.e., catchment areas) around the point normally are used to obtain aggregate values over an area of the raster to associate with the point.

Without the support of a GIS or spatial database capable of performing spatial operations, some type of non-spatial association, either probabilistic or deterministic text linkage, may be the only option. These non-spatially oriented methods of associating supplemental data usually consist of relational database-like procedures in which an aspatial attribute associated with the geocode is used as a key into another dataset containing values for objects with the same key. For example, a city name attribute associated with a geocode can be used as a key into a relational database containing city names as identifiers for demographic information.
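The point-in-polygon test at the heart of this kind of spatial join can be sketched in a few lines. In practice a GIS or spatial database would perform this operation; the self-contained ray-casting routine below, with an invented census tract polygon and attribute, is only an illustration of the technique.

```python
# Minimal point-in-polygon (ray casting) sketch for hybrid georeferencing:
# a geocoded point inherits the attributes of the polygon containing it.

def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a rightward ray from
    (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

# Hypothetical census tract polygon carrying an attribute to transfer
tract = {"geometry": [(0, 0), (10, 0), (10, 10), (0, 10)], "tract_id": "06037-1234"}
geocode = (3.5, 7.2)

if point_in_polygon(geocode[0], geocode[1], tract["geometry"]):
    hybrid_record = {"point": geocode, "tract_id": tract["tract_id"]}
    print(hybrid_record["tract_id"])  # the point inherits the tract attribute
```

The relational-join alternative described above is even simpler: the geocode's city name attribute is used as a dictionary key into a table of demographic values, with no geometry involved at all.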
Note that some vendors and organizations hybridize their reference data (e.g., maintain a layer of municipality names joined with streets).

This association of attributes with geocodes from other datasets using spatially based methods must be undertaken carefully; users need to pay careful attention to the base layers, source, and data used for hybridizing. A researcher must consider how accurate and representative these associations are, given the information that they know about both their geocodes and the supplemental data sources. The literature has shown that point-in-polygon methods are capable of incorrectly associating data from the wrong polygon to a geocode for numerous reasons (e.g., as a result of the data distribution the point-in-polygon methods assume [Sadahiro 2000], simple inaccuracies of the supplemental data, resolution differences between the polygon boundaries and the street networks from which geocodes were produced [Chen W. et al. 2004], or inaccuracy in the point's location). For these reasons, associating higher-level data with geocodes through non-spatial joins (e.g., obtaining a CT by indexing off the block face in the TIGER/Line files) may be an attractive option rather than relying on a geocoded point.

The accuracy and appropriateness of the geocode in terms of both spatial and temporal characteristics also must be considered. Geocodes close to the boundaries of areal units can and do get assigned to the wrong polygons, resulting in the incorrect linkage of attributes. Likewise, the polygons used for the hybridization may not be representative of the spatial characteristics of the region they represent at the relevant time. For example, if a case was diagnosed in 1990, it may be more appropriate to associate 1990 Census data with the residence address, rather than Census data from 2000 or 2010. Characteristics of both the geocode and the supplemental datasets used to derive hybrid data need to be recorded along with the hybrid datasets produced so that registries and researchers can weigh these factors when deciding on the appropriateness, correctness, and usefulness of the results derived from them. Best practices relating to the hybridization of data are listed in Best Practices 59. Rushton et al. (2006) and Beyer et al. (2008) and the references within can provide further guidance on the topics in this section.

Best Practices 59 – Hybridizing data

Policy decision: Which hybridization method should be used?
Best practice: If spatial operations are supported, the point-in-polygon method can be used to associate with the geocode an aggregate value calculated from the values within the polygon. If spatial operations are not supported, relational joins should be used to match the geocode to values in the secondary data based on shared keys between the two.

Policy decision: When, how, and which secondary data should be spatially associated with geocoded data (i.e., creating hybrid data)?
Best practice: Hybrid georeferenced data should not be considered certain if the uncertainty in the geocoded record is larger than the distance to the nearest boundary.

Policy decision: When and how should the certainty of hybrid data be calculated?
Best practice: Hybrid georeferenced data should not be considered certain if the uncertainty in the geocoded record is larger than the distance to the nearest boundary.

Policy decision: What metadata should be maintained?
Best practice: When geocoded locations are associated with other attributes through hybrid georeferencing, the distance to the closest boundary should be included with the record.
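The certainty rule in Best Practices 59 (flag a linkage as uncertain when the geocode's positional uncertainty exceeds its distance to the nearest polygon boundary) can be sketched directly. The planar geometry helpers and the uncertainty radius below are simplified illustrations, not a production GIS routine.

```python
# Illustrative check: is a hybrid attribute assignment "certain", i.e.,
# is the geocode's error radius smaller than its distance to the nearest
# boundary of the areal unit it fell in?
import math

def dist_point_to_segment(px, py, x1, y1, x2, y2):
    """Distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(px - x1, py - y1)
    # Clamp the projection of the point onto the segment to [0, 1]
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

def distance_to_boundary(px, py, polygon):
    """Minimum distance from the point to any edge of the polygon."""
    n = len(polygon)
    return min(
        dist_point_to_segment(px, py, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )

polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]   # hypothetical areal unit
point, uncertainty = (8.0, 5.0), 3.0             # geocode + its error radius

d = distance_to_boundary(point[0], point[1], polygon)
certain = uncertainty <= d
print(d, certain)  # boundary is 2.0 units away, so this linkage is uncertain
```

Recording `d` alongside the hybrid record is exactly the metadata practice recommended above.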

25.2 GEOCODING IMPACTS ON INCIDENCE RATES

When determining incidence rates, cancer registries need to be able to "spatially match" the geographic resolution/accuracy of the numerators with the denominators. In this case, the numerators are the case counts and the denominators are the corresponding population counts. Mismatches can and frequently do occur between these two data sources, which can have dramatic effects capable of either erroneously inflating or deflating the observed incidence rates. For instance, it is possible for a cancer registry to have a very accurate case location (e.g., based on geocoding with a satellite map), but then come to an incorrect conclusion in the analysis of incidence rates because the denominator is based on different (less accurate) geography. Care should be taken to ensure that the geographic resolutions/accuracies of the data used to create the cases (numerators) and of the denominator are derived at commensurate scales. For the cases, it should further be noted that resolution/accuracy needs to be considered both in terms of the underlying geocoding reference dataset (e.g., TIGER/Lines [United States Census Bureau 2008d] vs. Tele Atlas [2008c]) and in terms of the resulting geocoded spatial output (e.g., USPS ZIP Code centroid match vs. satellite imagery manual placement). Best practices relating to the calculation of incidence rates are listed in Best Practices 60.

Best Practices 60 – Incidence rate calculation

Policy decision: At what resolutions should reference locations and geocodes be used to calculate incidence rates?
Best practice: Incidence rates should only be calculated when the geocode and reference locations are at the same resolution.
Best practice: Incidence rates should be calculated only after considering:
• Appropriateness of geographic area of analysis
• Issues of confidentiality and data suppression
• Tenure of residence
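As a concrete illustration of keeping numerator and denominator geography commensurate, the sketch below computes rates only for units present at the same resolution in both inputs. The counts and unit names are invented for illustration.

```python
# Sketch: incidence rate per 100,000, computed only where case counts
# (numerators) and populations (denominators) share a geographic unit
# at the same resolution. All figures below are invented.

def incidence_rate_per_100k(case_counts, populations):
    """Rate per 100,000 for each unit present in both inputs."""
    rates = {}
    for unit, cases in case_counts.items():
        if unit in populations and populations[unit] > 0:
            rates[unit] = 100_000 * cases / populations[unit]
    return rates

cases = {"County A": 42, "County B": 7}
population = {"County A": 210_000, "County B": 35_000}

print(incidence_rate_per_100k(cases, population))
# Both counties work out to 20 cases per 100,000
```

Mixing resolutions (say, cases geocoded to ZIP Code centroids divided by county populations) breaks exactly this unit-by-unit correspondence, which is why the best practice requires matching resolutions before any rate is reported.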

25.3 IMPLICATIONS OF AGGREGATING UP

When the output spatial location from the geocoding process is derived from a reference feature composed of a significant amount of area (e.g., a city), its use in spatial analyses performed with smaller areal units can create problems with the reliability of the research results. Data gathered at one scale and applied at another may be affected by the modifiable areal unit problem (MAUP), which is referred to often in the geography and GIS literatures (e.g., Openshaw 1984, Grubesic and Murray 2004, Gregorio et al. 2005, Zandbergen and Chakraborty 2006). This should be taken into account when assessing the validity of spatial analyses. Best practices relating to the MAUP are listed in Best Practices 61.

Best Practices 61 – MAUP

Policy decision: When does the MAUP need to be accounted for in spatial analysis using hybridized geocode data?
Best practice: The MAUP may need to be accounted for when data gathered at one scale are applied at another.


26. ENSURING PRIVACY AND CONFIDENTIALITY

This section details the reasons for ensuring privacy and confidentiality in the geocoding process and identifies key approaches that help to facilitate these outcomes.

26.1 PRIVACY AND CONFIDENTIALITY

Patient privacy and confidentiality need to be foremost concerns in any health-based data collection or research study. At no point during any of the processes undertaken as part of a geocoding endeavor should a patient ever be identifiable to anyone other than the staff members who have been trained to deal with such data. However, both the input and output data used in the geocoding process are necessarily identifiable information specifically because of what they encode—a location related to a person. The simple act of issuing a query to a geocoding process reveals both these input and output data, so great care needs to be taken to assure that this is done in a secure fashion. An excellent and extremely detailed survey of privacy issues directly related to cancer registries is available in Gittler (2008a) and the extensive references within, and a state-by-state breakdown of applicable statutes and administrative regulations can be found in Gittler (2008b).

Cancer registries already have existing policies in place to securely collect, store, and transmit patient data both within the organization and to outside researchers. These practices also should be applied to every aspect of the geocoding process. The difficulty in actually doing this, however, should be evident from the basic nature of the geocoding process. Geocoding is an extraordinarily complex process, involving many separate components working in concert to produce a single result. Each of these components and the interactions between them need to be subject to these same security constraints. The simplest case is when the geocoding process is performed in a secure environment at the registry, behind a firewall with all components of the process kept locally, under the control of registry staff.
In this scenario, because everything needed is at one location and under the control of a single entity, it is possible to ensure that the flow of the data as it moves through the geocoding process never leaves a secure environment. The cost of this level of security equates to the cost of gathering data and bringing it behind the firewall. A differentiation needs to be made between an information leak, which is a breach of privacy that may be overridden by public health law and unavoidable in the interest of public health, and a breach of confidentiality, which is an outright misuse of the data.

If certain aspects of the geocoding process involve third parties, these interactions need to be inspected carefully to ascertain any potential security concerns. For instance, if address validation is performed by querying a county assessor's database outside of the secure environment of the registry, sensitive information may be exposed. Every Web server keeps logs of the incoming requests, including the form parameters (which in this case will be a patient address) as well as the source (the Internet protocol [IP] address of the machine requesting the service). These IP addresses are routinely reversed (e.g., using "whois" tools such as those provided by the American Registry of Internet Names and Numbers [2008]) to determine the issuing organization (the registry) for usage and abuse monitoring of their services. It would not take a great leap for someone to conclude that when a registry queries for an address, they are using it to determine information about people with a disease of some kind. Any time that data are transferred from the registry to another entity, measures need to be taken to ensure security, including at a minimum encrypting the data before transmitting it securely (e.g., hypertext transfer protocol over secure socket layer, or HTTPS). Best practices relating to geocoding process auditing are listed in Best Practices 62.

Best Practices 62 – Geocoding process privacy auditing when behind a firewall

Policy decision: When and how should geocoding be audited for information leakages and/or breaches of confidentiality?
Best practice: The geocoding process and all related components should be audited for information leakages and/or breaches of confidentiality any time any component is changed.
Best practice: Information workflow diagrams and organizational charts should be used to determine where, how, and between which organizations information flows as it is processed by a geocoding system from start to finish.
Best practice: At any point where the information leaves the registry, exactly what information is being transmitted and in what manner should be carefully examined to determine if it can be used to identify the patient or tumor.

Policy decision: When should private, confidential, or identifiable information about a patient or tumor be released during the geocoding process?
Best practice: As little private, confidential, or identifiable information about a patient or tumor as possible should be released, and only in a secure fashion, during the geocoding process.

Upon receiving a query to geocode an address from a cancer registry, the service could reasonably assume that this specific address has something to do with a disease. In this case, the verification service would essentially be receiving a list of the addresses of cancer patients, and the registry would be exposing identifiable health information. Several simple measures can be taken to avoid this, but the specific practices used may depend on the local policies as to what is allowed versus what is not allowed. As a first example, cancer case addresses submitted to address-verification services can be intermixed with other randomly generated, yet perfectly valid addresses (as well as other invalid addresses, such that patient records would not be the only ones that were invalid). This is a simple example of ensuring k-anonymity (Sweeney 2002), where in this case k equals the number of real addresses plus the number of fake addresses (which may or may not be an acceptable level). However, this may increase the cost of the service if the registry is charged on a per-address basis. Alternatively, a registry could require the third party to sign a confidentiality agreement to protect the address data transmitted from the registry.

If a registry uses a third party for any portion of the geocoding process, it needs to ensure that this outside entity abides by and enforces the same rigorous standards for security as employed by the registry. These assurances should be guaranteed in legal contracts, and the contractors should be held accountable if breaches in security are discovered (e.g., financial penalties for each disclosure). Example assurance documents are provided in Appendix A. These data also should be transmitted over a secure Internet connection. Free services such as the geocoding application programming interfaces (APIs) now offered by search engines competing in the location-based service market (e.g., Google, Inc. 2008c, Yahoo!, Inc. 2008) cannot, and most likely will not, be able to honor any of these assurances. In these cases, the registry itself must take measures to assure the required level of security, such as those previously mentioned (e.g., submitting randomized bundles of erroneous as well as real data). Best practices related to third-party processing are listed in Best Practices 63.
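The decoy-mixing idea described above can be sketched in a few lines: each real case address is hidden among randomly chosen decoys before the batch is sent to an external verification service. The decoy source and batch sizes are hypothetical.

```python
# Illustrative k-anonymity sketch: intermix real case addresses with
# decoy addresses and shuffle, so an external verification service
# cannot tell which queries correspond to patients.
import random

def build_query_batch(real_addresses, decoy_addresses, decoys_per_real=4):
    """Mix the real addresses with decoys and shuffle the combined batch."""
    n_decoys = decoys_per_real * len(real_addresses)
    batch = list(real_addresses) + random.sample(decoy_addresses, n_decoys)
    random.shuffle(batch)
    return batch

real = ["123 Main St, Anytown"]                                # hypothetical case address
decoys = [f"{100 + i} Oak Ave, Anytown" for i in range(20)]    # hypothetical decoy pool

batch = build_query_batch(real, decoys)
# Here k = 5 in the sense used above: one real address plus four decoys
print(len(batch))  # 5
```

As the text notes, this multiplies the number of queries (and thus the cost of a per-address service) by roughly the chosen k.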

Best Practices 63 – Third-party processing (external processing)

Policy decision: When and how can geocoding be performed using external sources without revealing private, confidential, or identifiable information about a patient or tumor?
Best practice: If geocoding, or any portion thereof (e.g., address validation), is performed using external sources, all information except the minimum necessary attributes (e.g., address attributes) should be removed from records before they are sent for processing.

Policy decision: When and how should third-party processing confidentiality requirements be determined?
Best practice: Any third party that processes records must agree to the same confidentiality standards used at the registry before data are transmitted to them.

Policy decision: How can and should data be transmitted to third-party processors?
Best practice: Confidential postal mail or secure network connections should be used to transmit data that have been encrypted from registries to third-party processing services.

Policy decision: How can and should data be submitted to third-party processing services that cannot ensure confidentiality?
Best practice: The data should be submitted as part of a randomized set of data, or some other method that ensures k-anonymity.
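The "minimum necessary attributes" rule in Best Practices 63 amounts to stripping every field except the address attributes before a record leaves the registry. The field names below are hypothetical, not drawn from any registry standard.

```python
# Sketch of the minimum-necessary rule: keep only the address attributes
# a third-party processor actually needs, dropping identifying and
# clinical fields before transmission. Field names are illustrative.

ADDRESS_FIELDS = {"street", "city", "state", "zip"}

def minimum_necessary(record, keep=ADDRESS_FIELDS):
    """Return a copy of the record containing only the allowed fields."""
    return {field: value for field, value in record.items() if field in keep}

patient_record = {
    "name": "Jane Doe",        # identifying -- must not be transmitted
    "diagnosis": "C50.9",      # clinical -- must not be transmitted
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "90089",
}

print(minimum_necessary(patient_record))
```

A registry would also keep a local key (outside the transmitted record) so the processed result can be rejoined to the full case record on return.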

Registries also should be cautious of the logs maintained by any geocoding process. Typically, for performance and tracking of query-based services (e.g., Web sites and databases), logs of performance-related metrics are generated so that the processes can be optimized. These logs potentially could be used to recreate the queries, or portions thereof, that created them, essentially reverse engineering the original input. Here, the policies of the registry can and should enforce that after an input query has been submitted and a result returned, no trace of the query should be retained. Although information and statistics on query performance are useful for improving the geocoding process, this information represents a patient's location, and needs to be securely discarded or de-identified in some fashion. Best practices relating to log files are listed in Best Practices 64.


Best Practices 64 – Geocoding process log files

Policy decision: When should log files containing records of geocoding transactions be cleared?
Best practice: Log files containing any private, confidential, or identifiable information about a patient or tumor should be deleted immediately after the geocode is produced.

At the other end of the geocoding process, once spatial locations have been securely produced, confidentiality still must be ensured regarding these spatial locations once they are visualized. Recent research has shown that geocodes can be used to determine the addresses from which they were derived (a process known as reverse geocoding [Brownstein et al. 2006, Curtis et al. 2006]). Researchers need to ensure that the methods they use to display their data (e.g., their presentation on a dot map) do not lend themselves to this type of reverse geocoding, in which the input locational data can be derived from the spatial output. To accomplish this, several methods of geographic masking have been described in the literature (for detailed reviews and technical approaches see Armstrong et al. 1999, Zimmerman et al. 2008, and Chen et al. 2008). One method is to use aggregation to place the spatial location into a larger class of data from which it cannot be individually recognized. Alternatively, one could use a random offset to move the true spatial location a random distance in a random direction. Other methods exist as well, and all serve the same purpose—protecting the privacy of the individual patient. This, of course, comes at the expense of actually introducing spatial error or uncertainty into the data from which the study results will be derived.

Commonly, researchers limit geographic masking of geocodes to within either county or CBG boundaries to develop meaningful conclusions; that is, they want to ensure that the geographically masked point and the original point are within the same county/CBG. This presents additional security concerns because in many counties, the universe of address points is quite commonly available and downloadable as GIS data. It is possible to develop geographic masking procedures that meet standards for k-anonymity as measured against the universe of all known address points. See Sweeney (2002) for guidance on how this can be accomplished. Depending on how much of the data actually need to be displayed, a researcher could first use the accurate data to develop their results, and then use geographically masked data in the presentation of the results. It also is possible to use geographically masked data for analyses as well, but this approach requires making the analysis scale larger. Best practices related to masking geocoded data are listed in Best Practices 65.
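The random-offset masking method described above displaces the true point by a random distance in a random direction. The sketch below shows the core displacement only; a production procedure would additionally verify that the masked point stays within the same county/CBG and meets a k-anonymity standard against known address points, checks omitted here. The coordinates and distance band are invented.

```python
# Illustrative random-offset geographic masking: move a point a random
# distance (within a band) in a uniformly random direction. Containment
# and k-anonymity checks, required in practice, are intentionally omitted.
import math
import random

def random_offset(x, y, min_dist, max_dist):
    """Displace (x, y) by a uniformly random bearing and distance."""
    angle = random.uniform(0, 2 * math.pi)
    dist = random.uniform(min_dist, max_dist)
    return x + dist * math.cos(angle), y + dist * math.sin(angle)

true_point = (100.0, 200.0)   # hypothetical projected coordinates (e.g., meters)
masked = random_offset(*true_point, min_dist=50, max_dist=250)

moved = math.hypot(masked[0] - true_point[0], masked[1] - true_point[1])
print(round(moved, 1))  # displacement falls within the commanded 50-250 band
```

The minimum distance matters: without it, some points would barely move and remain reverse-geocodable, which defeats the purpose of the mask.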


Best Practices 65 – Geographic masking

Policy decision: How and when should geocoded data be masked to ensure the required levels of confidentiality?
Best practice: Any time that geocoded data are visualized (e.g., displayed on a map, Web site, or in a presentation), they should be masked to conform to the confidentiality requirements of the registry.

Policy decision: How can and should geocoded data be masked when visualized?
Best practice: Any proven and suitable method that accomplishes the required level of confidentiality can be used:
• Randomization
• Aggregation
• Randomized offset

It seems obvious, but is worth mentioning, that once a registry has released data to a researcher, the researcher can/may misuse the data in a fashion that the registry did not intend, even after Institutional Review Board approval. To prevent these types of abuses (and protect themselves), registries should follow the best practices listed in Best Practices 66, as well as make use of research assurance documents such as those in Appendix A.

Best Practices 66 – Post-registry security

Policy decision: How should registries be involved with researchers to ensure proper use of geocoded data?
Best practice: Registries should work closely with researchers to be aware of the subsequent results/publications.
Best practice: Registries should provide guidance to researchers on how to securely store, transfer, use, report, and visualize geocoded data.

Policy decision: How can registries protect themselves from the liability of researchers misusing geocoded data?
Best practice: Registries should have an assurance document that details the appropriate use of the geocoded data and that researchers must initial prior to release of the data.


GLOSSARY OF TERMS

Absolute Geocode: An absolute known geographic (spatial) location or an offset from an absolute known location.
Absolute Input Data: See Absolute Locational Description.
Absolute Locational Description: A description which, by itself, contains enough information to produce an output geographic location (e.g., locations described in terms of linear addressing systems).
Acceptable Match Rate: The match rate value a geocoding process must meet such that the geocoded data can be considered valid for use in a research study.
Accuracy: A measure of how close to a true value something is.
Actual Lot Feature Interpolation: A linear-based feature interpolation algorithm that is not subject to the parcel homogeneity assumption or parcel existence assumption.
Address Ambiguity: With regard to feature matching, the case when a single input address can possibly match to multiple reference features.
Address Matching: A specialized case of feature matching, strictly dealing with matching postal street addresses to features in the reference data source, usually TIGER-type street segments or areal unit delineations (Census delineations, USPS ZIP Code delineations, etc.).
Address Normalization: The process of organizing and cleaning address data to increase efficiency for data use and sharing.
Address Parity: An indication of which side of a street an address falls on, even or odd (assumes binary address parity for a street segment).
Address Range: Aspatial attributes associated with linear-based reference data (i.e., street vectors), describing the valid range of addresses on the street.
Address Range Feature Interpolation: A linear-based feature interpolation algorithm that is subject to the parcel existence, parcel homogeneity, and parcel extent assumptions.
Address Standardization: The process of converting an address from a normalized format into a specified format (e.g., United States Postal Service [2008d], United States Federal Geographic Data Committee [2008b]).
Address Validation: The process of determining whether or not an address actually exists.
Administrative Unit: An administratively defined level of geographic aggregation. Typical units include (from largest to smallest): country, state, county, major county subdivision, city, neighborhood, census tract, census block, and parcel.
Advanced Match Rate: A match rate measure that normalizes the number of records attempted by removing those that are outside of the geographic boundaries of the entity performing the geocoding.


Approximate Geocodes: See Pseudocodes.
Arc: See Edge.
Areal Unit: See Polygon.
Areal Unit-Based Data: See Polygon-Based Data.
Areal Unit-Based Feature Interpolation: A feature interpolation algorithm that uses a computational process (e.g., geographic centroid) to determine a suitable output from the spatial geometry of polygon-based reference features (e.g., parcel).
Areal Unit-Based Reference Dataset: See Polygon-Based Reference Dataset.
Aspatial: Data or attributes describing data that do not refer to spatial properties.
Atomic-Level Match Rate: A match rate associated with an individual attribute of an input address.
Atomic Metrics: Metrics that describe the characteristics of individual members of a set.
Attribute Constraints: With regard to SQL-like feature matching, one or more predicates that limit the reference features returned from a reference dataset in response to a query.
Attribute Imputation: With regard to feature matching, the process of imputing missing attribute values for an input address using other known variables about the address, region, and/or individual to which it belongs.
Attribute Relaxation: Part of the feature-matching process. The procedure of easing the requirement that all street address attributes (street number, name, pre-directional, post-directional, suffix, etc.) must exactly match a feature in the reference data source to obtain a matching street feature, thereby increasing the probability of finding a match, while also increasing the probability of error. This is commonly performed by removing or altering street address attributes in an iterative manner using a predefined order.
Attribute Weighting: A form of probabilistic feature matching in which probability-based values are associated with each attribute and either subtract from or add to the composite score for the feature as a whole.
Backus-Naur Form (BNF): A notation used to construct a grammar describing the valid components of an object (e.g., an address).
Batch-Geocoding: A geocoding process that operates in an automated fashion and processes more than a single input datum at a time.
Best Practice: A policy or technical decision that is recommended but not required.
Binary Address Parity: The case when all addresses on one side of a street segment are even and all addresses on the other side of the street segment are odd.
Blocking Scheme: Method used in record linkage to narrow the set of possible candidate values that can match an input datum.
Breach of Confidentiality: An outright misuse of confidential data, including its unauthorized release or its use for an unintended purpose.


Cancer Registry: A disease registry for cancer.
Case Sensitivity: Whether or not a computational system differentiates between the case of alphabetic characters (i.e., upper-case and lower-case).
Character-Level Equivalence: A string comparison algorithm that enforces character-by-character equivalence between two or more strings.
City-Style Postal Address: A postal address that describes a location in terms of a numbered building along a street, with more detailed attributes possible for defining higher resolution within structures or large-area geographic features.
Composite Feature Geocoding: A geocoding process capable of creating and utilizing composite features in response to an ambiguous feature-matching scenario.
Composite Feature: A geographic feature created through the union of two or more disparate geographic features (e.g., a bounding box encompassing all of the geographic features).
Composite Score: The overall score of a reference feature being a match to an input datum, resulting from the summation of the individually weighted attributes.
Conditional Probability: The probability of something occurring, given that other information is known.
Confidence Interval: The percentage of the data that are within a given range of values.
Confidence Threshold: The level of certainty above which results will be accepted and below which they will be rejected.
Context-Based Normalization: An address normalization method that makes use of syntactic and lexical analysis.
Continuous Feature: With regard to geocoding reference features, a geographic feature that corresponds to more than one real-world entity (e.g., a street segment).
Coordinate System: A system for delineating where objects exist in a space.
Corner Lot Assumption: See Corner Lot Problem.
Corner Lot Problem: The issue arising during linear-based feature interpolation that, when using a measure of the length of the segment for interpolation, it is unknown how much real estate may be taken up along a street segment by parcels from other intersecting street segments (around the corner), so the actual street length may be shorter than expected.
Data Source: With regard to SQL-like feature matching, the relational table or tables of the reference dataset that should be searched.
Deterministic Matching Algorithm: A matching algorithm based on a series of predefined rules that are processed in a specific order.
Discrete Feature: With regard to geocoding reference features, a geographic feature that corresponds to a single real-world entity (e.g., a single address point).
Disease Registry: An organization that gathers, stores, and analyzes information about a disease for its area.


Dropback: The offset used in linear-based interpolation to move from the centerline of the street vector closer to the face of the parcel. Edge: The topological connection between nodes in a graph. Empirical Geocoding: See Geocode Caching. Error Rate: With regard to probabilistic feature matching, denotes a measure of the in- stances in which two address attributes do not match, even though two records do. Essence-Level Equivalence: A string comparison algorithm that determines if two or more strings are “essentially” the same (e.g., a phonetic algorithm). False Negative: With regard to feature matching, the result of a true match being re- turned from the feature-matching algorithm as a non-match. False Positive: With regard to feature matching, the result of a true non-match being re- turned from the feature-matching algorithm as a match. Feature: An abstraction of a real-world phenomenon. Feature Disambiguation: With regard to feature matching, the process of determining the correct feature that should be matched out of a set of possible candidates using additional information and/or human intuition. Feature Interpolation: The process of deriving an output geographic feature from a geo- graphic reference features. Feature Interpolation Algorithm: An algorithm that implements a particular form of fea- ture interpolation. Feature Matching: The process of identifying a corresponding geographic feature in the reference data source to be used to derive the final geocode output for an input. Feature-Matching Algorithm: An algorithm that implements a particular form of feature matching. Footprint: See Geographic Footprint. Gazetteer: A digital data structure that is composed of a set of geographic features and maintains information about their geographic names, geographic types, and geo- graphic footprints. Generalized Match Rate: A match rate measure that normalizes the number of records attempted by removing those that could never be successfully geocoded. 
Geocode (n.): A spatial representation of descriptive locational text.
Geocode (v.): To perform the process of geocoding.
Geocode Caching: Storing and re-using the geocodes produced for input data from previous geocoding attempts.
Geocoder: A set of inter-related components in the form of operations, algorithms, and data sources that work together to produce a spatial representation for aspatial locationally descriptive text.
Geocoding: The act of transforming aspatial locationally descriptive text into a valid spatial representation using a predefined process.


Geocoding Algorithm: The computational component of the geocoder that determines the correct reference dataset feature based on the input data and derives a spatial output.
Geocoding Data Consumer-Group: The group in the geocoding process that utilizes geocoded data (e.g., researchers).
Geocoding General Interest-Group: The group in the geocoding process that has a general interest in the geocoding process (e.g., the general public).
Geocoding Practitioner-Group: The group in the geocoding process that performs the task of the actual geocoding of input data.
Geocoding Process Designer-Group: The group in the geocoding process that is in charge of making policy decisions regarding how geocoding will be performed.
Geocoding Requirements: The set of limitations, constraints, or concerns that influence the choice of a particular geocoding option. These may be technical, budgetary, legal, and/or policy-related.
Geographical Bias: The observation that the accuracy of a geocoding strategy may be a function of the area in which the geocode resides.
Geographic Coordinate System: A coordinate system that represents the Earth as an ellipsoid.
Geographic Feature: A feature associated with a location relative to the Earth.
Geographic Footprint: A spatial description representing the location of a geographic feature on Earth.
Geographic Information System: A digital system that stores, displays, and allows for the manipulation of digital geographic data.
Geographic Location: See Geographic Feature.
Geographic Name: A name that refers to a geographic feature.
Geographic Type: A classification that describes a geographic feature, taken from an organized hierarchy of terms.
Gold Standard: Data that represent the true state of the world.
Georeference: To transform non-geographic information (information that has no geographically valid reference that can be used for spatial analyses) into geographic information (information that has a valid geographic reference that can be used for spatial analyses).
Georeferenced: Describes information that was originally non-geographic and has been transformed into geographic information.
Georeferencing: The process of transforming non-geographic information into geographic information.
GIS Coordinate Quality Codes: A hierarchical scheme of qualitative codes that indicate the quality of a geocode in terms of the data it represents.


Global Positioning System: The system of satellites, calibrated ground stations, and temporally based calculations used to obtain geographic coordinates using digital receiver devices.
Grammar: An organized set of rules that describe a language.
Graph: A topologically connected set of nodes and edges.
Highway Contract Address: See Rural Route Address.
Hierarchical Geocoding: A feature-matching strategy that uses geographic reference features of progressively lower accuracy.
Holistic-Level Match Rate: A match rate associated with an overall set of input address data.
Holistic Metrics: Metrics that describe the overall characteristics of a set.
Hybrid Data: Data that are created by the intersection of two datasets along a shared key; this key can be spatial, as in a geographic intersection, or aspatial, as in a relational linkage in a relational database.
Hybrid Georeferencing: The procedure of associating relevant data (spatial or aspatial) with geocoded data.
Information Leak: The release of information to individuals outside of the owning institution; may be a breach of privacy that is overridden by public health laws and unavoidable in the interest of public health.
In-House Geocoding: When geocoding is performed locally rather than sent to a third party.
Input Data: The non-spatial locationally descriptive text that is to be turned into spatial data by the process of geocoding.
Interactive Geocoding: When the geocoding process allows for intervention when problems or issues arise.
Interactive Matching Algorithm: See Interactive Geocoding.
Interactive-Mode Geocoding: See Interactive Geocoding.
K-Anonymity: The case when a single individual cannot be uniquely identified out of at least k other individuals.
Latitude: The north-south axis describing a location in the Geographic Coordinate System.
Lexical Analysis: The process by which an input address is broken up into tokens.
Line: A 1-D geographic object that has a length and is composed of two or more 0-D point objects.
Linear-Based Data: Geographic data that are based upon lines.
Linear-Based Feature Interpolation: A feature interpolation algorithm that operates on lines (e.g., street vectors) and produces an estimate of an output feature using a computational process on the geometry of the line (e.g., estimation between the endpoints).
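The linear-based feature interpolation defined above can be illustrated with a minimal sketch. The segment endpoints, address range, and house number below are hypothetical, and the dropback step is omitted:

```python
def interpolate_along_segment(house_num, range_start, range_end,
                              seg_start, seg_end):
    """Estimate a point along a street segment for a house number,
    assuming parcels fill the segment uniformly (i.e., the parcel
    extent and parcel homogeneity assumptions)."""
    # Fraction of the way along the segment's address range.
    t = (house_num - range_start) / float(range_end - range_start)
    x = seg_start[0] + t * (seg_end[0] - seg_start[0])
    y = seg_start[1] + t * (seg_end[1] - seg_start[1])
    return x, y

# Hypothetical segment holding addresses 100-198, in projected coordinates.
print(interpolate_along_segment(150, 100, 198, (0.0, 0.0), (98.0, 0.0)))
```

Because the estimate relies on the parcel existence, extent, and homogeneity assumptions, the returned point is an approximation, not the true parcel location.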


Line-Based Data: See Linear-Based Reference Dataset.
Linear-Based Reference Dataset: A reference dataset that is composed of linear-based data.
Location-Based Service: A service that is provided based upon the geography of the client.
Longitude: The east-west axis describing a location in the Geographic Coordinate System.
Lookup Tables: With regard to substitution-based address normalization, a set of known normalized values for common address tokens (e.g., those from the United States Postal Service [2008d]).
Machine Learning: The subfield of computer science dealing with algorithms that induce knowledge from data.
Mapping Function: The portion of the address standardization algorithm that translates between an input normalized form and the target output standard.
Match: With regard to substitution-based address normalization, the process of identifying an alias for an input address token within a lookup table of normalized values.
Matching Ability: With regard to a reference dataset, a measure of its ability to match an input address while maintaining the same consistent matching criteria as applied to other reference datasets.
Match Probability: With regard to probabilistic feature matching, the degree of belief, ranging from 0 to 1, that a feature matches.
Matched Probability: With regard to probabilistic feature matching, the probability that the attribute values of an input datum and a reference feature match when the records themselves match.
Match Rate: A measure of the amount of input data that were successfully geocoded (i.e., assigned an output geocode).
Matching Rules: With regard to substitution-based address normalization, the set of rules that determine the valid associations between an input address token and the normalized values in a lookup table.
Metadata: Descriptions associated with data that provide insight into attributes about them (e.g., lineage).
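The lookup tables and matching rules used in substitution-based normalization can be sketched as follows. The alias tables here are a tiny illustrative subset, not the full USPS standard:

```python
# Hypothetical alias tables mapping common token variants to their
# normalized values (a small subset for illustration only).
SUFFIX_ALIASES = {"st": "ST", "str": "ST", "street": "ST",
                  "ave": "AVE", "av": "AVE", "avenue": "AVE"}
DIRECTIONAL_ALIASES = {"n": "N", "no": "N", "north": "N",
                       "s": "S", "so": "S", "south": "S"}

def normalize_token(token, table):
    """Return the normalized value for an address token,
    or None if no matching rule exists in the lookup table."""
    return table.get(token.lower().strip("."))

print(normalize_token("Str.", SUFFIX_ALIASES))    # "ST"
print(normalize_token("North", DIRECTIONAL_ALIASES))  # "N"
```

A real implementation would apply context-sensitive matching rules (e.g., "St" can mean "Street" or "Saint" depending on position), which is why the glossary distinguishes lookup tables from matching rules.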
NAACCR Data Standard: A mandatory (required) data formatting and/or content scheme for a particular data item.
Network: A topologically connected graph of nodes and edges.
Node: The endpoints in a graph that are topologically connected together by edges.


North American Association of Central Cancer Registries (NAACCR): A professional organization that develops and promotes uniform data standards for cancer registration; provides education and training; certifies population-based registries; aggregates and publishes data from central cancer registries; and promotes the use of cancer surveillance data and systems for cancer control and epidemiologic research, public health programs, and patient care to reduce the burden of cancer in North America.
Non-Binary Address Parity: The case when addresses along one side of a street segment can be even, odd, or both.
Non-Interactive Matching Algorithm: See Batch Geocoding.
Non-Spatial: See Aspatial.
Output Data: The valid spatial representations returned from the geocoder, derived from features in the reference dataset.
Parcel Existence Assumption: The assumption used in linear-based feature interpolation that all parcels associated with the address range of a reference feature exist.
Parcel Extent Assumption: The assumption used in linear-based feature interpolation that parcels associated with the address range of a reference feature start immediately at the beginning of the segment and fill all space to the other end.
Parcel Homogeneity Assumption: The assumption used in linear-based feature interpolation that all parcels associated with the address range of a reference feature have the same dimensions.
Parse Tree: A data structure representing the decomposition of an input string into its component parts.
Pass: With regard to attribute relaxation, the relaxation of a single address attribute within a step that does not result in a change of geographic resolution.
Per-Record Geocoding: A geocoding process that produces a geocode for a single input datum at a time; it may or may not be automated.
Place Name: See Geographic Name.
Phonetic Algorithm: A string comparison algorithm that is based on the way that a string is pronounced.
Point: A 0-dimensional (0-D) object that has a position in space but no length.
Point-Based Data: Geographic data that are based upon point features.
Point-Based Reference Dataset: A reference dataset that is composed of point-based data.
Point-Based Feature Interpolation: A feature interpolation algorithm that operates on points (e.g., a database of address points) and simply returns the reference feature point as output.
Point-In-Polygon Association: The process of spatially intersecting a point geographic feature with an areal unit-based geographic feature such that the attributes of the areal unit can be associated with the point, or vice versa.
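The point-in-polygon association described above is commonly implemented with a ray-casting test; a minimal sketch, using a hypothetical areal unit simplified to a square:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the ray's y-coordinate?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical census tract simplified to a square of vertices.
tract = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, tract))   # True
print(point_in_polygon(15, 5, tract))  # False
```

In practice a GIS performs this test against real areal-unit boundaries (e.g., census tracts) so that the unit's attributes can be attached to a geocoded point.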


Polygon: A geographic object bounded by at least three 1-D line objects or segments, with the requirement that they must start and end in the same location (i.e., node).
Polygon-Based Data: Geographic data that are based upon polygon features.
Polygon-Based Reference Dataset: A reference dataset that is composed of polygon features.
Polyline: A geographic object that is composed of a series of lines.
Porter Stemmer: A word-stemming algorithm that works by removing common suffixes and applying substitution rules.
Postal Address: A form of input data describing a location in terms of a postal addressing system.
Postal Code: A portion of a postal address designating a region.
Postal Street Address: A form of input data containing attributes that describe a location in terms of a postal street addressing system (e.g., a USPS street address).
Postal ZIP Code: The USPS address portion denoting the route a delivery address is on.
Post Office Box Address: A postal address designating a storage location at a post office or other mail-handling facility.
Precision: Regarding information retrieval, a measure of how correct the data retrieved are.
Predicate: See Query Predicate.
Probability-Based Normalization: An address normalization method that makes use of probabilistic methods (e.g., support vector machines or Hidden Markov Models).
Prior Probability: See Unconditional Probability.
Projection: A mathematical function to transfer positions on the surface of the Earth to their approximate positions on a flat surface.
Pseudocoding: The process of deriving pseudocodes using a deterministic or probabilistic method.
Pseudocodes: An approximation of a true geocode.
Query Predicate: In an SQL query, the list of attribute-value pairs indicating which attributes of a reference feature must contain what values for it to be returned.
Raster-Based Data: Data that divide the area of interest into a regular grid of cells in some specific sequence, usually row-by-row from the top left corner; each cell is assigned a single value describing the phenomenon of interest.
Real-Time Geocoding: With regard to patient/tumor record geocoding upon intake, the process of geocoding a patient/tumor record while the patient is available to provide more detailed or correct information, using an iterative refinement approach.
Recall: Regarding information retrieval, a measure of how complete the data retrieved are.
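The precision and recall measures defined in this glossary follow directly from the true/false positive and negative counts also defined here; a minimal sketch with hypothetical counts:

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: fraction of returned matches that are correct.
    Recall: fraction of true matches that were actually returned."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical feature-matching run: 80 true positives, 20 false
# positives, 40 false negatives.
p, r = precision_recall(80, 20, 40)
print(p, r)
```

A geocoder tuned for a high match rate often trades precision for recall: relaxing matching criteria returns more candidates, some of them false positives.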


Record Linkage: A sub-field of computer science relating to finding records in two or more datasets that essentially refer to the same feature.
Reference Dataset: The underlying geographic database containing the geographic features the geocoder uses to derive a geographic output.
Reference Data Source: See Reference Dataset.
Reference Set: With regard to record linkage, the set of candidate features that may possibly be matched to an input feature.
Registry: See Disease Registry.
Relative Geocode: A geographic (spatial) location that is relative to some other reference geographic feature.
Relative Input Data: See Relative Locational Description.
Relative Locational Description: A description which, by itself, does not contain enough information to produce an output geographic location (e.g., a location described in terms of directions from some other feature).
Relative Predicted Certainty: A relative quantitative measure of the accuracy of a geocode based on information about how the geocode was produced.
Reverse Geocoding: The process of determining the address used to create a geocode from the output geographic location.
Rural Route Address: A postal address identifying a stop on a postal delivery route.
Scrubbing: The component of address normalization that removes illegal characters and white space from an input datum.
Selection Attributes: With regard to SQL-like feature matching, the attributes of the reference feature that should be returned from the reference dataset in response to a query.
Simplistic Match Rate: A match rate measure computed as the number of input data successfully assigned a geocode divided by the number of input data attempted.
Situs Address: The actual physical address associated with a parcel.
Socioeconomic Status: Descriptive attributes associated with individuals or groups referring to social and economic variables.
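The simplistic and generalized match rates defined in this glossary can be contrasted in a short sketch; the record counts below are hypothetical:

```python
def simplistic_match_rate(matched, attempted):
    """Records geocoded divided by records attempted."""
    return matched / attempted

def generalized_match_rate(matched, attempted, ungeocodable):
    """Remove records that could never be geocoded (e.g., blank or
    foreign addresses) from the denominator before computing the rate."""
    return matched / (attempted - ungeocodable)

# Hypothetical registry file: 1,000 records attempted, 850 matched,
# 50 with addresses that could never be successfully geocoded.
print(simplistic_match_rate(850, 1000))
print(generalized_match_rate(850, 1000, 50))
```

The generalized rate is always at least as high as the simplistic rate, which is why reports should state which measure they use.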
Software-Based Geocoding: A geocoding process in which a significant portion of the components are software systems.
Soundex: A phonetic algorithm that encodes the sound of a string using a series of character removal and substitution rules from a known table of values.
Spatial Accuracy: A measure of the correctness of a geographic location based on some metric; can be qualitative or quantitative.
Spatial Resolution: A measure describing the scale of geographic data; can be qualitative or quantitative.
Stemming: See Word Stemming.
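The Soundex algorithm defined above can be sketched in a few lines. This is a simplified version: it applies the classic digit substitutions and collapses adjacent duplicate codes, but omits some edge-case rules (e.g., the special treatment of 'H' and 'W' between consonants with the same code):

```python
# Classic Soundex digit substitutions (vowels, Y, H, and W get no code).
CODES = {c: d for d, letters in
         {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
          "4": "L", "5": "MN", "6": "R"}.items()
         for c in letters}

def soundex(name):
    """Encode a name as its first letter plus three digits."""
    name = name.upper()
    first = name[0]
    digits = [CODES.get(c, "") for c in name]
    # Keep a code only if it differs from the immediately preceding code;
    # vowel positions are empty strings, so vowel-separated repeats survive.
    collapsed = [d for i, d in enumerate(digits)
                 if d and (i == 0 or d != digits[i - 1])]
    # The first letter is kept as a letter, so drop its own digit code.
    if CODES.get(first) and collapsed and collapsed[0] == CODES[first]:
        collapsed = collapsed[1:]
    return (first + "".join(collapsed) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # both R163
```

Because "Robert" and "Rupert" share a code, a Soundex-based matcher would treat them as essentially equivalent, illustrating the essence-level equivalence defined earlier in this glossary.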


Step: With regard to attribute relaxation, the relaxation of multiple address attributes at once that results in a change of geographic resolution.
Street Address Ambiguity: With regard to feature matching, the case when a single input address can possibly match to multiple addresses along a street reference feature.
Street Network: A linear-data graph structure with edges representing streets and nodes representing their intersections.
Street Segment Ambiguity: With regard to feature matching, the case when a single input address can possibly match to multiple street reference features.
String Comparison Algorithm: An algorithm that calculates a similarity measure between two or more input strings.
Sub-Parcel Address Ambiguity: With regard to feature matching, the case when a single input address can possibly match to multiple addresses (e.g., buildings) within a single parcel reference feature.
Substitution-Based Normalization: An address normalization method that makes use of lookup tables for identifying commonly encountered terms based on their string values.
Syntactic Analysis: The process by which the tokens representing an input address are placed into a parse tree based on the grammar which defines possible valid combinations.
Temporal Accuracy: A measure of how appropriate the time period the reference dataset represents is to the input data to be geocoded; can be qualitative or quantitative.
Temporal Extent: An attribute associated with a datum describing the time period for which it existed or was valid.
Temporal Staleness: The phenomenon that occurs when data become out-of-date and less accurate with the passage of time (e.g., a geocode cache becoming outdated after a newer, more accurate reference dataset becomes available).
Topologically Integrated Geographic Encoding and Referencing (TIGER)/Line Files: A series of vector data products created and distributed by the U.S. Census Bureau to support its mission (United States Census Bureau 2008d).
Token: An attribute of an input address after it has been split into its component parts.
Tokenization: The process used to convert the single complete string representing the whole address into a series of separate tokens.
Topological: Describes the connection between nodes and edges in a graph.
Toponym: See Geographic Name.
True Negative: With regard to feature matching, the result of a true non-match being returned from the feature-matching algorithm as a non-match.
True Positive: With regard to feature matching, the result of a true match being returned from the feature-matching algorithm as a match.
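Tokenization as defined above can be sketched minimally; the sample address is hypothetical, and a real lexical analyzer would also classify each token (house number, directional, suffix, and so on):

```python
import re

def tokenize(address):
    """Split a complete address string into tokens on whitespace
    and commas; empty tokens are discarded."""
    return [t for t in re.split(r"[,\s]+", address.strip()) if t]

print(tokenize("3620 S. Vermont Ave, Los Angeles, CA 90089"))
```

These tokens are then the input to syntactic analysis, which arranges them into a parse tree according to the address grammar.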


Unconditional Probability: The probability of something occurring, given that no other information is known.
Uniform Lot Feature Interpolation: A linear-based feature interpolation algorithm that is subject to the parcel homogeneity assumption and parcel existence assumption.
United States Bureau of the Census: The United States federal agency responsible for performing the Census.
United States Census Bureau: See United States Bureau of the Census.
United States Postal Service (USPS): The United States federal agency responsible for mail delivery.
Unmatched Probability: With regard to probabilistic feature matching, the probability that the attribute values of an input datum and a reference feature match when the records themselves do not match.
Urban and Regional Information Systems Association (URISA): A non-profit professional and educational association that promotes the effective and ethical use of spatial information and information technologies for the understanding and management of urban and regional systems.
Vector: An object with a direction and magnitude, commonly a line.
Vector-Based Data: Geographic data that consist of vector features.
Vector Feature: Phenomena or things of interest in the world around us (e.g., a specific street like Main Street) that cannot be subdivided into phenomena of the same kind (e.g., more streets with new names).
Vector Object: See Vector Feature.
Vertex: See Node.
Waiting It Out: The process of holding off on re-attempting geocoding for a period of time until something happens to increase the probability of a successful attempt (e.g., new reference datasets are released).
Weights: With regard to probabilistic feature matching, numeric values calculated as a combination of matched and unmatched probabilities and assigned to each attribute of an address to denote its level of importance.
Word Stemming: To reduce a word to its fundamental stem.
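The matched and unmatched probabilities and the weights built from them follow the classic Fellegi-Sunter record-linkage formulation (not spelled out in this glossary), in which an attribute's agreement weight is the log-ratio of its matched (m) and unmatched (u) probabilities. A minimal sketch with hypothetical probabilities:

```python
import math

def agreement_weight(m, u):
    """Weight contributed when an attribute agrees: log2(m / u)."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight contributed when an attribute disagrees: log2((1-m)/(1-u))."""
    return math.log2((1 - m) / (1 - u))

# Hypothetical probabilities for a street-name attribute:
# m = P(names agree | records truly match),
# u = P(names agree | records do not match).
m, u = 0.95, 0.05
print(agreement_weight(m, u))     # positive: agreement is strong evidence
print(disagreement_weight(m, u))  # negative: disagreement counts against
```

Summing these weights over all address attributes yields a composite score that a probabilistic feature-matching algorithm compares against match and non-match thresholds.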
ZIP Code Tabulation Area: A geographical areal unit defined by the United States Bureau of the Census.


REFERENCES

Abe T and Stinchcomb D 2008 Geocoding Best Practices in Cancer Registries. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 111-126.
Agarwal P 2004 Contested nature of "place": Knowledge mapping for resolving ontological distinctions between geographical concepts. In Egenhofer et al. (eds) Proceedings of the 3rd International Conference on Geographic Information Science (GIScience). Berlin, Springer Lecture Notes in Computer Science No 3234: 1-21.
Agouris P, Beard K, Mountrakis G, and Stefanidis A 2000 Capturing and Modeling Geographic Object Change: A SpatioTemporal Gazetteer Framework. Photogrammetric Engineering and Remote Sensing 66(10): 1224-1250.
Agovino PK, Niu X, Henry KA, Roche LM, Kohler BA, and Van Loon S 2005 Cancer Incidence Rates in New Jersey's Ten Most Populated Municipalities, 1998-2002. NJ State Cancer Registry Report. Available online at: http://www.state.nj.us/health/ces/documents/cancer_municipalities.pdf. Last accessed April 23rd, 2008.
Alani H 2001 Voronoi-based region approximation for geographical information retrieval with gazetteers. International Journal of Geographical Information Science 15(4): 287-306.
Alani H, Kim S, Millard DE, Weal MJ, Hall W, Lewis PH, and Shadbolt N 2003 Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation. In Proceedings of Knowledge Capture (K-Cap '03), Workshop on Knowledge Markup and Semantic Annotation.
Alexandria Digital Library 2008 Alexandria Digital Library Gazetteer. Available online at: http://alexandria.ucsb.edu/clients/gazetteer. Last accessed April 23rd, 2008.
American Registry for Internet Names 2008 WHOIS Lookup. Available online at: http://www.arin.net. Last accessed April 23rd, 2008.
Amitay E, Har'El N, Sivan R, and Soffer A 2004 Web-a-Where: Geotagging Web Content. In Sanderson et al. (eds) Proceedings of the 27th Annual International ACM SIGIR Conference (SIGIR '04). New York, ACM Press: 273-280.
Arampatzis A, van Kreveld M, Reinbacher I, Jones CB, Vaid S, Clough P, Joho H, and Sanderson M 2006 Web-based delineation of imprecise regions. Computers, Environment and Urban Systems.
Arbia G, Griffith D, and Haining R 1998 Error Propagation Modeling in Raster GIS: Overlay Operations. International Journal of Geographical Information Science 12(2): 145-167.
Arikawa M and Noaki K 2005 Geocoding Natural Route Descriptions using Sidewalk Network Databases. In Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration, 2005 (WIRI '05): 136-144.
Arikawa M, Sagara T, Noaki K, and Fujita H 2004 Preliminary Workshop on Evaluation of Geographic Information Retrieval Systems for Web Documents. In Kando and Ishikawa (eds) Proceedings of the NTCIR Workshop 4 Meeting: Working Notes of the Fourth NTCIR Workshop Meeting. Available online at: http://research.nii.ac.jp/ntcir-ws4/NTCIR4-WN/WEB/NTCIR4WN-WEB-ArikawaM.pdf. Last accessed April 23rd, 2008.


Armstrong MP, Rushton G, and Zimmerman DL 1999 Geographically Masking Health Data to Preserve Confidentiality. Statistics in Medicine 18(5): 497-525.
Armstrong MP, Greene BR, and Rushton G 2008 Using Geocodes to Estimate Distances and Geographic Accessibility for Cancer Prevention and Control. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 181-194.
Armstrong MP and Tiwari C 2008 Geocoding Methods, Materials, and First Steps toward a Geocoding Error Budget. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 11-36.
Axelrod A 2003 On Building a High Performance Gazetteer Database. In Kornai and Sundheim (eds) Proceedings of the Workshop on the Analysis of Geographic References held at the Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 63-68.
Bakshi R, Knoblock CA, and Thakkar S 2004 Exploiting Online Sources to Accurately Geocode Addresses. In Pfoser et al. (eds) Proceedings of the 12th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS '04): 194-203.
Beal JR 2003 Contextual Geolocation: A Specialized Application for Improving Indoor Location Awareness in Wireless Local Area Networks. In Gibbons (ed) The 36th Annual Midwest Instruction and Computing Symposium (MICS2003), Duluth, Minnesota.
Beaman R, Wieczorek J, and Blum S 2004 Determining Space from Place for Natural History Collections: In a Distributed Digital Library Environment. D-Lib Magazine 10(5).
Bell EM, Hertz-Picciotto I, and Beaumont JJ 2001 A case-control study of pesticides and fetal death due to congenital anomalies. Epidemiology 12(2): 148-156.
Berney LR and Blane DB 1997 Collecting Retrospective Data: Accuracy of Recall After 50 Years Judged Against Historical Records. Social Science & Medicine 45(10): 1519-1525.
Beyer KMM, Schultz AF, and Rushton G 2008 Using ZIP Codes as Geocodes in Cancer Research. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 37-68.
Bichler G and Balchak S 2007 Address matching bias: ignorance is not bliss. Policing: An International Journal of Police Strategies & Management 30(1): 32-60.
Bilhaut F, Charnois T, Enjalbert P, and Mathet Y 2003 Geographic reference analysis for geographic document querying. In Kornai and Sundheim (eds) Proceedings of the Workshop on the Analysis of Geographic References held at the Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 55-62.
Blakely T and Salmond C 2002 Probabilistic record linkage and a method to calculate the positive predictive value. International Journal of Epidemiology 31(6): 1246-1252.
Block R 1995 Geocoding of Crime Incidents Using the 1990 TIGER File: The Chicago Example. In Block et al. (eds) Crime Analysis Through Computer Mapping. Washington, DC, Police Executive Research Forum: 15.
Bonner MR, Han D, Nie J, Rogerson P, Vena JE, and Freudenheim JL 2003 Positional Accuracy of Geocoded Addresses in Epidemiologic Research. Epidemiology 14(4): 408-411.


Boscoe FP, Kielb CL, Schymura MJ, and Bolani TM 2002 Assessing and Improving Census Tract Completeness. Journal of Registry Management 29(4): 117-120.
Boscoe FP, Ward MH, and Reynolds P 2004 Current Practices in Spatial Analysis of Cancer Data: Data Characteristics and Data Sources for Geographic Studies of Cancer. International Journal of Health Geographics 3(28).
Boscoe FP 2008 The Science and Art of Geocoding: Tips for Improving Match Rates and Handling Unmatched Cases in Analysis. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 95-110.
Boulos MNK 2004 Towards Evidence-Based, GIS-Driven National Spatial Health Information Infrastructure and Surveillance Services in the United Kingdom. International Journal of Health Geographics 3(1).
Bouzeghoub M 2004 A framework for analysis of data freshness. In Proceedings of the 2004 International Workshop on Information Quality in Information Systems: 59-67.
Bow CJD, Waters NM, Faris PD, Seidel JE, Galbraith PD, Knudtson ML, and Ghali WA 2004 Accuracy of city postal code coordinates as a proxy for location of residence. International Journal of Health Geographics 3(5).
Brody JG, Vorhees DJ, Melly SJ, Swedis SR, Drivas PJ, and Rudel RA 2002 Using GIS and Historical Records to Reconstruct Residential Exposure to Large-Scale Pesticide Application. Journal of Exposure Analysis and Environmental Epidemiology 12(1): 64-80.
Broome J 2003 Building and Maintaining a Reliable Enterprise Street Database. In Proceedings of the Fifth Annual URISA Street Smart and Address Savvy Conference.
Brownstein JS, Cassa CA, and Mandl KD 2006 No Place to Hide - Reverse Identification of Patients from Published Maps. New England Journal of Medicine 355(16): 1741-1742.
Can A 1993 TIGER/Line Files in Teaching GIS. International Journal of Geographical Information Science 7(8): 561-572.
Canada Post Corporation 2008 Postal Standards: Lettermail and Incentive Lettermail. Available online at: http://www.canadapost.ca/tools/pg/standards/pslm-e.pdf. Last accessed April 23rd, 2008.
Casady T 1999 Privacy Issues in the Presentation of Geocoded Data. Crime Mapping News 1(3).
Cayo MR and Talbot TO 2003 Positional Error in Automated Geocoding of Residential Addresses. International Journal of Health Geographics 2(10).
Chalasani VS, Engebretsen O, Denstadli JM, and Axhausen KW 2005 Precision of Geocoded Locations and Network Distance Estimates. Journal of Transportation and Statistics 8(2).
Chavez RF 2000 Generating and Reintegrating Geospatial Data. In Proceedings of the 5th ACM Conference on Digital Libraries (DL '00): 250-251.
Chen CC, Knoblock CA, Shahabi C, and Thakkar S 2003 Building Finder: A System to Automatically Annotate Buildings in Satellite Imagery. In Agouris (ed) Proceedings of the International Workshop on Next Generation Geospatial Information (NG2I '03), Cambridge, Mass.
Chen CC, Knoblock CA, Shahabi C, Thakkar S, and Chiang YY 2004 Automatically and Accurately Conflating Orthoimagery and Street Maps. In Pfoser et al. (eds) Proceedings of the 12th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS '04), Washington, DC: 47-56.


Chen MF, Breiman RF, Farley M, Plikaytis B, Deaver K, and Cetron MS 1998 Geocoding and Linking Data from Population-Based Surveillance and the US Census to Evaluate the Impact of Median Household Income on the Epidemiology of Invasive Streptococcus Pneumoniae Infections. American Journal of Epidemiology 148(12): 1212-1218.
Chen W, Petitti DB, and Enger S 2004 Limitations and potential uses of census-based data on ethnicity in a diverse community. Annals of Epidemiology 14(5): 339-345.
Chen Z, Rushton G, and Smith G 2008 Preserving Privacy: Deidentifying Data by Applying a Random Perturbation Spatial Mask. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL, CRC Press: 139-146.
Chiang YY and Knoblock CA 2006 Classification of Line and Character Pixels on Raster Maps Using Discrete Cosine Transformation Coefficients and Support Vector Machines. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06).
Chiang YY, Knoblock CA, and Chen CC 2005 Automatic extraction of road intersections from raster maps. In Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems: 267-276.
Chou YH 1995 Automatic Bus Routing and Passenger Geocoding with a Geographic Information System. In Proceedings of the 1995 IEEE Vehicle Navigation and Information Systems Conference: 352-359.
Christen P and Churches T 2005 A Probabilistic Deduplication, Record Linkage and Geocoding System. In Proceedings of the Australian Research Council Health Data Mining Workshop (HDM05), Canberra, AU. Available online at: http://acrc.unisa.edu.au/groups/health/hdw2005/Christen.pdf. Last accessed April 23rd, 2008.
Christen P, Churches T, and Willmore A 2004 A Probabilistic Geocoding System Based on a National Address File. In Proceedings of the Australasian Data Mining Conference, Cairns, AU. Available online at: http://datamining.anu.edu.au/publications/2004/aus-dm2004.pdf. Last accessed April 23rd, 2008.
Chua C 2001 An Approach in Pre-processing Philippine Address for Geocoding. In Proceedings of the Second Philippine Computing Science Congress (PCSC 2001).
Chung K, Yang DH, and Bell R 2004 Health and GIS: Toward Spatial Statistical Analyses. Journal of Medical Systems 28(4): 349-360.
Churches T, Christen P, Lim K, and Zhu JX 2002 Preparation of Name and Address Data for Record Linkage Using Hidden Markov Models. Medical Informatics and Decision Making 2(9).
Clough P 2005 Extracting Metadata for Spatially-Aware Information Retrieval on the Internet. In Jones and Purves (eds) Proceedings of the 2005 ACM Workshop on Geographic Information Retrieval (GIR '05): 17-24.
Cockburn M, Wilson J, Cozen W, Wang F, and Mack T 2008 Residential proximity to major roadways and the risk of mesothelioma, lung cancer and leukemia. Under review, personal communication.
Collins SE, Haining RP, Bowns IR, Crofts DJ, Williams TS, Rigby AS, and Hall DM 1998 Errors in Postcode to Enumeration District Mapping and Their Effect on Small Area Analyses of Health Data. Journal of Public Health Medicine 20(3): 325-330.
County of Sonoma 2008 Vector Data - GIS Data Portal - County of Sonoma. Available online at: https://gis.sonoma-county.org/catalog.asp. Last accessed April 23rd, 2008.

204 November 10, 2008 D. W. Goldberg

Cressie N and Kornak J 2003 Spatial Statistics in the Presence of Location Error with an Application to Remote Sensing of the Environment. Statistical Science 18(4): 436-456.
Croner CM 2003 Public Health GIS and the Internet. Annual Review of Public Health 24: 57-82.
Curtis AJ, Mills JW, and Leitner M 2006 Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of Health Geographics 5(44).
Dao D, Rizos C, and Wang J 2002 Location-based services: technical and business issues. GPS Solutions 6(3): 169-178.
Davis Jr. CA 1993 Address Base Creation Using Raster/Vector Integration. In Proceedings of the URISA 1993 Annual Conference, Atlanta, GA: 45-54.
Davis Jr. CA and Fonseca FT 2007 Assessing the Certainty of Locations Produced by an Address Geocoding System. GeoInformatica 11(1): 103-129.
Davis Jr. CA, Fonseca FT, and De Vasconcelos Borges KA 2003 A Flexible Addressing System for Approximate Geocoding. In Proceedings of the Fifth Brazilian Symposium on GeoInformatics (GeoInfo 2003), Campos do Jordão, São Paulo, Brazil.
Dawes SS, Cook ME, and Helbig N 2006 Challenges of Treating Information as a Public Resource: The Case of Parcel Data. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences: 1-10.
Dearwent SM, Jacobs RR, and Halbert JB 2001 Locational Uncertainty in Georeferencing Public Health Datasets. Journal of Exposure Analysis and Environmental Epidemiology 11(4): 329-334.
Densham I and Reid J 2003 A geo-coding service encompassing a geo-parsing tool and integrated digital gazetteer service. In Kornai and Sundheim (eds) Proceedings of the Workshop on the Analysis of Geographic References held at the Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL ’03): 80-81.
Diez-Roux AV, Merkin SS, Arnett D, Chambless L, Massing M, Nieto FJ, Sorlie P, Szklo M, Tyroler HA, and Watson RL 2001 Neighborhood of Residence and Incidence of Coronary Heart Disease. New England Journal of Medicine 345(2): 99-106.
Dijkstra EW 1959 A note on two problems in connexion with graphs. Numerische Mathematik 1: 269-271.
Dru MA and Saada S 2001 Location-based mobile services: the essentials. Alcatel Telecommunications Review 1: 71-76.
Drummond WJ 1995 Address Matching: GIS Technology for Mapping Human Activity Patterns. Journal of the American Planning Association 61(2): 240-251.
Dueker KJ 1974 Urban Geocoding. Annals of the Association of American Geographers 64(2): 318-325.
Durbin E, Stewart J, and Huang B 2008 Improving the Completeness and Accuracy of Address at Diagnosis with Real-Time Geocoding. Presentation at The North American Association of Central Cancer Registries 2008 Annual Meeting, Denver, CO.
Durr PA and Froggatt AEA 2002 How Best to Georeference Farms? A Case Study From Cornwall, England. Preventive Veterinary Medicine 56: 51-62.
Efstathopoulos P, Mammarella M, and Warth A 2005 The Meta Google Maps “Hack”. Unpublished Report. Available online at: http://www.cs.ucla.edu/~pefstath/papers/google-meta-paper.pdf. Last accessed April 23rd, 2008.


Eichelberger P 1993 The Importance Of Addresses: The Locus Of GIS. In Proceedings of the URISA 1993 Annual Conference, Atlanta, GA: 212-213.
El-Yacoubi MA, Gilloux M, and Bertille JM 2002 A Statistical Approach for Phrase Location and Recognition within a Text Line: An Application to Street Name Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2): 172-188.
Environmental Systems Research Institute 1999 GIS in Health Care Terms. ArcUser Magazine April-June. Available online at: http://www.esri.com/news/arcuser/0499/terms.html. Last accessed April 23rd, 2008.
Feudtner C, Silveira MJ, Shabbout M, and Hoskins RE 2006 Distance From Home When Death Occurs: A Population-Based Study of Washington State, 1989–2002. Pediatrics 117(5): 932-939.
Fonda-Bonardi P 1994 House Numbering Systems in Los Angeles. In Proceedings of the GIS/LIS ’94 Annual Conference and Exposition, Phoenix, AZ: 322-331.
Foody GM 2003 Uncertainty, Knowledge Discovery and Data Mining in GIS. Progress in Physical Geography 27(1): 113-121.
Fortney J, Rost K, and Warren J 2000 Comparing Alternative Methods of Measuring Geographic Access to Health Services. Health Services and Outcomes Research Methodology 1(2): 173-184.
Frank AU, Grum E, and Vasseur B 2004 Procedure to Select the Best Dataset for a Task. In Egenhofer et al. (eds) Proceedings of the 3rd International Conference on Geographic Information Science (GIScience). Berlin, Springer Lecture Notes in Computer Science No 3234: 81-93.
Fremont AM, Bierman A, Wickstrom SL, Bird CE, Shah M, Escarce JJ, Horstman T, and Rector T 2005 Use Of Geocoding In Managed Care Settings To Identify Quality Disparities. Health Affairs 24(2): 516-526.
Frew J, Freeston M, Freitas N, Hill LL, Janee G, Lovette K, Nideffer R, Smith TR, and Zheng Q 1998 The Alexandria Digital Library Architecture. In Nicolaou and Stephanidis (eds) Proceedings of the 2nd European Conference on Research and Advanced Technology for Digital Libraries (ECDL ’98). Berlin, Springer Lecture Notes in Computer Science No 1513: 61-73.
Fu G, Jones CB, and Abdelmoty AI 2005a Building a Geographical Ontology for Intelligent Spatial Search on the Web. In Proceedings of the IASTED International Conference on Databases and Applications (DBA-2005).
Fu G, Jones CB, and Abdelmoty AI 2005b Ontology-based Spatial Query Expansion in Information Retrieval. Berlin, Springer Lecture Notes in Computer Science No 3761: 1466-1482.
Fulcomer MC, Bastardi MM, Raza H, Duffy M, Dufficy E, and Sass MM 1998 Assessing the Accuracy of Geocoding Using Address Data from Birth Certificates: New Jersey, 1989 to 1996. In Williams et al. (eds) Proceedings of the 1998 Geographic Information Systems in Public Health Conference, San Diego, CA: 547-560. Available online at: http://www.atsdr.cdc.gov/GIS/conference98/proceedings/pdf/gisbook.pdf. Last accessed April 23rd, 2008.
Gabrosek J and Cressie N 2002 The Effect on Attribute Prediction of Location Uncertainty in Spatial Data. Geographical Analysis 34: 262-285.
Gaffney SH, Curriero FC, Strickland PT, Glass GE, Helzlsouer KJ, and Breysse PN 2005 Influence of Geographic Location in Modeling Blood Pesticide Levels in a Community Surrounding a US Environmental Protection Agency Superfund Site. Environmental Health Perspectives 113(12): 1712-1716.


Gatrell AC 1989 On the Spatial Representation and Accuracy of Address-Based Data in the United Kingdom. International Journal of Geographical Information Science 3(4): 335-348.
Geronimus AT and Bound J 1998 Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples. American Journal of Epidemiology 148(5): 475-486.
Geronimus AT and Bound J 1999a RE: Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples. American Journal of Epidemiology 150(8): 894-896.
Geronimus AT and Bound J 1999b RE: Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples. American Journal of Epidemiology 150(9): 997-999.
Geronimus AT, Bound J, and Neidert LJ 1995 On the Validity of Using Census Geocode Characteristics to Proxy Individual Socioeconomic Characteristics. Technical Working Paper 189. Cambridge, MA, National Bureau of Economic Research.
Gilboa SM, Mendola P, Olshan AF, Harness C, Loomis D, Langlois PH, Savitz DA, and Herring AH 2006 Comparison of residential geocoding methods in population-based study of air quality and birth defects. Environmental Research 101(2): 256-262.
Gittler J 2008a Cancer Registry Data and Geocoding: Privacy, Confidentiality, and Security Issues. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 195-224.
Gittler J 2008b Cancer Reporting and Registry Statutes and Regulations. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 227-234.
Goldberg DW 2008a Free Online Geocoding. Available online at: https://webgis.usc.edu/Services/Geocode/Default.aspx. Last accessed October 20th, 2008.
Goldberg DW 2008b Free Online Static Address Validation. Available online at: https://webgis.usc.edu/Services/AddressValidation/StaticValidator.aspx. Last accessed October 20th, 2008.
Goldberg DW, Wilson JP, and Knoblock CA 2007a From Text to Geographic Coordinates: The Current State of Geocoding. URISA Journal 19(1): 33-46.
Goldberg DW, Zhang X, Marusek JC, Wilson JP, Ritz B, and Cockburn MG 2007b Development of an Automated Pesticide Exposure Analyst for California’s Central Valley. In Proceedings of the URISA GIS in Public Health Conference, New Orleans, LA: 136-156.
Goldberg DW, Wilson JP, Knoblock CA, and Cockburn MG 2008a The Development of an Open-Source, Scalable and Flexible Geocoding Platform. In preparation.
Goldberg DW, Shahabi K, Wilson JP, Knoblock CA, and Cockburn MG 2008b The Geographic Characteristics of Geocoding Error. In preparation.
Goldberg DW, Wilson JP, Knoblock CA, and Cockburn MG 2008c Geocoding Quality Metrics: One Size Does Not Fit All. In preparation.
Goldberg DW, Wilson JP, Knoblock CA, and Cockburn MG 2008d An effective and efficient approach for manually improving geocoded data. International Journal of Health Geographics, submitted September 23rd, 2008.
Goodchild MF and Hunter GJ 1997 A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11(3): 299-306.


Google, Inc. 2008a Google Earth. Available online at: http://earth.google.com. Last accessed April 23rd, 2008.
Google, Inc. 2008b Google Maps. Available online at: http://www.maps.google.com. Last accessed April 23rd, 2008.
Google, Inc. 2008c Google Maps API Documentation. Available online at: http://www.google.com/apis/maps/documentation. Last accessed April 23rd, 2008.
Grand Valley Metropolitan Council 2008 REGIS: Purchase Digital Data. Available online at: http://www.gvmc-regis.org/data/ordering.html. Last accessed April 23rd, 2008.
Gregorio DI, Cromley E, Mrozinski R, and Walsh SJ 1999 Subject Loss in Spatial Analysis of Breast Cancer. Health & Place 5(2): 173-177.
Gregorio DI, DeChello LM, Samociuk H, and Kulldorff M 2005 Lumping or Splitting: Seeking the Preferred Areal Unit for Health Geography Studies. International Journal of Health Geographics 4(6).
Griffin DH, Pausche JM, Rivers EB, Tillman AL, and Treat JB 1990 Improving the Coverage of Addresses in the 1990 Census: Preliminary Results. In Proceedings of the American Statistical Association Survey Research Methods Section, Anaheim, CA: 541-546. Available online at: http://www.amstat.org/sections/srms/Proceedings/papers/1990_091.pdf. Last accessed April 23rd, 2008.
Grubesic TH and Matisziw TC 2006 On the use of ZIP Codes and ZIP Code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data. International Journal of Health Geographics 5(58).
Grubesic TH and Murray AT 2004 Assessing the Locational Uncertainties of Geocoded Data. In Proceedings of the 24th Urban Data Management Symposium. Available online at: http://www.tonygrubesic.net/geocode.pdf. Last accessed April 23rd, 2008.
Han D, Rogerson PA, Nie J, Bonner MR, Vena JE, Vito D, Muti P, Trevisan M, Edge SB, and Freudenheim JL 2004 Geographic Clustering of Residence in Early Life and Subsequent Risk of Breast Cancer (United States). Cancer Causes and Control 15(9): 921-929.
Han D, Rogerson PA, Bonner MR, Nie J, Vena JE, Muti P, Trevisan M, and Freudenheim JL 2005 Assessing Spatio-Temporal Variability of Risk Surfaces Using Residential History Data in a Case Control Study of Breast Cancer. International Journal of Health Geographics 4(9).
Hariharan R and Toyama K 2004 Project Lachesis: Parsing and Modeling Location Histories. In Egenhofer et al. (eds) Proceedings of the 3rd International Conference on Geographic Information Science (GIScience). Berlin, Springer Lecture Notes in Computer Science No 3234: 106-124.
Harvard University 2008 The Public Health Disparities Geocoding Project Monograph Glossary. Available online at: http://www.hsph.harvard.edu/thegeocodingproject/webpage/monograph/glossary.htm. Last accessed April 23rd, 2008.
Haspel M and Knotts HG 2005 Location, Location, Location: Precinct Placement and the Costs of Voting. The Journal of Politics 67(2): 560-573.
Health Level Seven, Inc. 2007 Application Protocol for Electronic Data Exchange in Healthcare Environments, Version 2.6. Available online at: http://www.hl7.org/Library/standards.cfm. Last accessed April 23rd, 2008.
Henry KA and Boscoe FP 2008 Estimating the accuracy of geographical imputation. International Journal of Health Geographics 7(3).


Henshaw SL, Curriero FC, Shields TM, Glass GE, Strickland PT, and Breysse PN 2004 Geostatistics and GIS: Tools for Characterizing Environmental Contamination. Journal of Medical Systems 28(4): 335-348.
Higgs G and Martin DJ 1995a The Address Data Dilemma Part 1: Is the Introduction of Address-Point the Key to Every Door in Britain? Mapping Awareness 8: 26-28.
Higgs G and Martin DJ 1995b The Address Data Dilemma Part 2: The Local Authority Experience, and Implications for National Standards. Mapping Awareness 9: 26-39.
Higgs G and Richards W 2002 The Use of Geographical Information Systems in Examining Variations in Sociodemographic Profiles of Dental Practice Catchments: A Case Study of a Swansea Practice. Primary Dental Care 9(2): 63-69.
Hild H and Fritsch D 1998 Integration of vector data and satellite imagery for geocoding. International Archives of Photogrammetry and Remote Sensing 32(4): 246-251.
Hill LL 2000 Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints. In Borbinha and Baker (eds) Research and Advanced Technology for Digital Libraries, 4th European Conference (ECDL ’00). Berlin, Springer Lecture Notes in Computer Science No 1923: 280-290.
Hill LL 2006 Georeferencing: The Geographic Associations of Information. Cambridge, MA MIT Press.
Hill LL and Zheng Q 1999 Indirect Geospatial Referencing Through Place Names in the Digital Library: Alexandria Digital Library Experience with Developing and Implementing Gazetteers. In Proceedings of the 62nd Annual Meeting of the American Society for Information Science, Washington, DC: 57-69.
Hill LL, Frew J, and Zheng Q 1999 Geographic Names: The Implementation of a Gazetteer in a Georeferenced Digital Library. D-Lib Magazine 5(1).
Himmelstein M 2005 Local Search: The Internet Is the Yellow Pages. Computer 38(2): 26-34.
Hofferkamp J and Havener L (eds) 2008 Standards for Cancer Registries: Data Standards and Data Dictionary, Volume II (12th Edition). Springfield, IL North American Association of Central Cancer Registries.
Howe HL 1986 Geocoding NY State Cancer Registry. American Journal of Public Health 76(12): 1459-1460.
Hurley SE, Saunders TM, Nivas R, Hertz A, and Reynolds P 2003 Post Office Box Addresses: A Challenge for Geographic Information System-Based Studies. Epidemiology 14(4): 386-391.
Hutchinson M and Veenendall B 2005a Towards a Framework for Intelligent Geocoding. In Spatial Intelligence, Innovation and Praxis: The National Biennial Conference of the Spatial Sciences Institute (SSC 2005), Melbourne, AU.
Hutchinson M and Veenendall B 2005b Towards Using Intelligence To Move From Geocoding To Geolocating. In Proceedings of the 7th Annual URISA GIS in Addressing Conference, Austin, TX.
Jaro M 1984 Record Linkage Research and the Calibration of Record Linkage Algorithms. Statistical Research Division Report Series SRD Report No. Census/SRD/RR-84/27. Washington, DC United States Census Bureau. Available online at: http://www.census.gov/srd/papers/pdf/rr84-27.pdf. Last accessed April 23rd, 2008.
Jaro M 1989 Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association 84: 414-420.


Johnson SD 1998a Address Matching with Stand-Alone Geocoding Engines: Part 1. Business Geographics: 24-32.
Johnson SD 1998b Address Matching with Stand-Alone Geocoding Engines: Part 2. Business Geographics: 30-36.
Jones CB, Alani H, and Tudhope D 2001 Geographical Information Retrieval with Ontologies of Place. In Montello (ed) Proceedings of the International Conference on Spatial Information Theory: Foundations of Geographic Information Science. Berlin, Springer Lecture Notes In Computer Science No 2205: 322-335.
Karimi HA, Durcik M, and Rasdorf W 2004 Evaluation of Uncertainties Associated with Geocoding Techniques. Journal of Computer-Aided Civil and Infrastructure Engineering 19(3): 170-185.
Kennedy TC, Brody JG, and Gardner JN 2003 Modeling Historical Environmental Exposures Using GIS: Implications for Disease Surveillance. In Proceedings of the 2003 ESRI Health GIS Conference, Arlington, VA. Available online at: http://gis.esri.com/library/userconf/health03/papers/pap3020/p3020.htm. Last accessed April 23rd, 2008.
Kim U 2001 Historical Study on the Parcel Number and Numbering System in Korea. In Proceedings of the International Federation of Surveyors Working Week 2004.
Kimler M 2004 Geo-Coding: Recognition of geographical references in unstructured text, and their visualization. PhD Thesis. University of Applied Sciences in Hof, Germany. Available online at: http://langtech.jrc.it/Documents/0408_Kimler_Thesis-GeoCoding.pdf. Last accessed April 23rd, 2008.
Krieger N 1992 Overcoming the Absence of Socioeconomic Data in Medical Records: Validation and Application of a Census-Based Methodology. American Journal of Public Health 82(5): 703-710.
Krieger N 2003 Place, Space, and Health: GIS and Epidemiology. Epidemiology 14(4): 384-385.
Krieger N and Gordon D 1999 RE: Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples. American Journal of Epidemiology 150(8): 894-896.
Krieger N, Williams DR, and Moss NE 1997 Measuring Social Class in US Public Health Research: Concepts, Methodologies, and Guidelines. Annual Review of Public Health 18(1): 341-378.
Krieger N, Waterman P, Lemieux K, Zierler S, and Hogan JW 2001 On the Wrong Side of the Tracts? Evaluating the Accuracy of Geocoding in Public Health Research. American Journal of Public Health 91(7): 1114-1116.
Krieger N, Chen JT, Waterman PD, Soobader MJ, Subramanian SV, and Carson R 2002a Geocoding and Monitoring of US Socioeconomic Inequalities in Mortality and Cancer Incidence: Does the Choice of Area-Based Measure and Geographic Level Matter? American Journal of Epidemiology 156(5): 471-482.
Krieger N, Waterman PD, Chen JT, Soobader MJ, Subramanian SV, and Carson R 2002b ZIP Code Caveat: Bias Due to Spatiotemporal Mismatches Between ZIP Codes and US Census-Defined Areas: The Public Health Disparities Geocoding Project. American Journal of Public Health 92(7): 1100-1102.
Krieger N, Waterman PD, Chen JT, Soobader M, and Subramanian SV 2003 Monitoring Socioeconomic Inequalities in Sexually Transmitted Infections, Tuberculosis, and Violence: Geocoding and Choice of Area-Based Socioeconomic Measures. Public Health Reports 118(3): 240-260.


Krieger N, Chen JT, Waterman PD, Rehkopf DH, and Subramanian SV 2005 Painting a Truer Picture of US Socioeconomic and Racial/Ethnic Health Inequalities: The Public Health Disparities Geocoding Project. American Journal of Public Health 95(2): 312-323.
Krieger N, Chen JT, Waterman PD, Rehkopf DH, Yin R, and Coull BA 2006 Race/Ethnicity and Changing US Socioeconomic Gradients in Breast Cancer Incidence: California and Massachusetts, 1978–2002 (United States). Cancer Causes and Control 17(2): 217-226.
Krishnamurthy S, Sanders WH, and Cukier M 2002 An adaptive framework for tunable consistency and timeliness using replication. In Proceedings of the International Conference on Dependable Systems and Networks: 17-26.
Kwok RK and Yankaskas BC 2001 The Use of Census Data for Determining Race and Education as SES Indicators: A Validation Study. Annals of Epidemiology 11(3): 171-177.
Laender AHF, Borges KAV, Carvalho JCP, Medeiros CB, da Silva AS, and Davis Jr. CA 2005 Integrating Web Data and Geographic Knowledge into Spatial Databases. In Manolopoulos and Papadopoulos (eds) Spatial Databases: Technologies, Techniques and Trends. Hershey, PA Idea Group Publishing: 23-47.
Lam CS, Wilson JP, and Holmes-Wong DA 2002 Building a Neighborhood-Specific Gazetteer for a Digital Archive. In Proceedings of the Twenty-second International ESRI User Conference: 7-11.
Lee J 2004 3D GIS for Geo-coding Human Activity in Micro-scale Urban Environments. In Egenhofer et al. (eds) Geographic Information Science: Third International Conference (GIScience 2004), College Park, MD: 162-178.
Lee MS and McNally MG 1998 Incorporating Yellow-Page Databases in GIS-Based Transportation Models. In Easa (ed) Proceedings of the American Society of Civil Engineers Conference on Transportation, Land Use, and Air Quality: 652-661. Available online at: repositories.cdlib.org/itsirvine/casa/UCI-ITS-AS-WP-98-3.
Leidner JL 2004 Toponym Resolution in Text: “Which Sheffield is it?” In Sanderson et al. (eds) Proceedings of the 27th Annual International ACM SIGIR Conference (SIGIR ’04): 602.
Levesque M 2003 West Virginia Statewide Addressing and Mapping Project. In Proceedings of the Fifth Annual URISA Street Smart and Address Savvy Conference, Providence, RI.
Levine N and Kim KE 1998 The Spatial Location of Motor Vehicle Accidents: A Methodology for Geocoding Intersections. Computers, Environment, and Urban Systems 22(6): 557-576.
Li H, Srihari RK, Niu C, and Li W 2002 Location Normalization for Information Extraction. In Proceedings of the 19th International Conference on Computational Linguistics: 1-7.
Liang S, Banerjee S, Bushhouse S, Finley AO, and Carlin BP 2007 Hierarchical multiresolution approaches for dense point-level breast cancer treatment data. Computational Statistics & Data Analysis 52(5): 2650-2668.
Lind M 2001 Developing a System of Public Addresses as a Language for Location Dependent Information. In Proceedings of the 2001 URISA Annual Conference. Available online at: http://www.adresseprojekt.dk/files/Develop_PublicAddress_urisa2001e.pdf. Last accessed April 23rd, 2008.


Lind M 2005 Addresses and Address Data Play a Key Role in Spatial Infrastructure. In Proceedings of GIS Planet 2005 International Conference and Exhibition on Geographic Information, Workshop on Address Referencing. Available online at: http://www.adresseprojekt.dk/files/ECGI_Addr.pdf. Last accessed April 23rd, 2008.
Locative Technologies 2006 Geocoder.us: A Free US Geocoder. Available online at: http://geocoder.us. Last accessed April 23rd, 2008.
Lockyer B 2005 Office of the Attorney General of the State of California Legal Opinion 04-1105. Available online at: http://ag.ca.gov/opinions/pdfs/04-1105.pdf. Last accessed April 23rd, 2008.
Los Angeles County Assessor 2008 LA Assessor - Parcel Viewer. Available online at: http://assessormap.co.la.ca.us/mapping/viewer.asp. Last accessed April 23rd, 2008.
Lovasi GS, Weiss JC, Hoskins R, Whitsel EA, Rice K, Erickson CF, and Psaty BM 2007 Comparing a single-stage geocoding method to a multi-stage geocoding method: how much and where do they disagree? International Journal of Health Geographics 6(12).
MacDorman MF and Gay GA 1999 State Initiatives in Geocoding Vital Data. Journal of Public Health Management and Practice 5(2): 91-93.
Maizlish NA and Herrera L 2005 A Record Linkage Protocol for a Diabetes Registry at Ethnically Diverse Community Health Centers. Journal of the American Medical Informatics Association 12(3): 331-337.
Markowetz A 2004 Geographic Information Retrieval. Diploma Thesis, Philipps University. Available online at: http://www.cs.ust.hk/~alexmar/papers/DA.pdf. Last accessed April 23rd, 2008.
Markowetz A, Chen YY, Suel T, Long X, and Seeger B 2005 Design and implementation of a geographic search engine. In Proceedings of the 8th International Workshop on the Web and Databases (WebDB). Available online at: http://cis.poly.edu/suel/papers/geo.pdf. Last accessed April 23rd, 2008.
Martin DJ 1998 Optimizing Census Geography: The Separation of Collection and Output Geographies. International Journal of Geographical Information Science 12(7): 673-685.
Martin DJ and Higgs G 1996 Georeferencing People and Places: A Comparison of Detailed Datasets. In Parker (ed) Innovations in GIS 3: Selected Papers from the Third National Conference on GIS Research UK. London, Taylor & Francis: 37-47.
Martins B and Silva MJ 2005 A Graph-Ranking Algorithm for Geo-Referencing Documents. In Proceedings of the Fifth IEEE International Conference on Data Mining: 741-744.
Martins B, Chaves M, and Silva MJ 2005a Assigning Geographical Scopes To Web Pages. In Losada and Fernández-Luna (eds) Proceedings of the 27th European Conference on IR Research (ECIR 2005). Berlin, Springer Lecture Notes in Computer Science No 3408: 564-567.
Martins B, Silva MJ, and Chaves MS 2005b Challenges and Resources for Evaluating Geographical IR. In Jones and Purves (eds) Proceedings of the 2005 ACM Workshop on Geographic Information Retrieval (GIR’05): 17-24.
Marusek JC, Cockburn MG, Mills PK, and Ritz BR 2006 Control Selection and Pesticide Exposure Assessment via GIS in Prostate Cancer Studies. American Journal of Preventive Medicine 30(2S): 109-116.
Mazumdar S, Rushton G, Smith BJ, Zimmerman DL, and Donham KJ 2008 Geocoding Accuracy and the Recovery of Relationships Between Environmental Exposures and Health. International Journal of Health Geographics 7(13).


McCurley KS 2001 Geospatial Mapping and Navigation of the Web. In Shen and Saito (eds) Proceedings of the 10th International World Wide Web Conference: 221-229.
McEathron SR, McGlamery P, and Shin DG 2002 Naming the Landscape: Building the Connecticut Digital Gazetteer. International Journal of Special Libraries 36(1): 83-93.
McElroy JA, Remington PL, Trentham-Dietz A, Robert SA, and Newcomb PA 2003 Geocoding Addresses from a Large Population-Based Study: Lessons Learned. Epidemiology 14(4): 399-407.
Mechanda M and Puderer H 2007 How Postal Codes Map to Geographic Areas. Geography Working Paper Series #92F0138MWE. Ottawa, Statistics Canada.
Meyer M, Radespiel-Tröger M, and Vogel C 2005 Probabilistic Record Linkage of Anonymous Cancer Registry Records. In Studies in Classification, Data Analysis, and Knowledge Organization: Innovations in Classification, Data Science, and Information Systems. Berlin, Springer: 599-604.
Michelson M and Knoblock CA 2005 Semantic Annotation of Unstructured and Ungrammatical Text. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, Scotland.
Miner JW, White A, Palmer S, and Lubenow AE 2005 Geocoding and Social Marketing in Alabama’s Cancer Prevention Programs. Preventing Chronic Disease 2(A17).
Ming D, Luo J, Li J, and Shen Z 2005 Features Based Parcel Unit Extraction From High Resolution Image. In Moon (ed) Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS’05): 1875-1878.
Murphy J and Armitage R 2005 Merging the Modeled and Working Address Database: A Question of Dynamics and Data Quality. In Proceedings of GIS Ireland 2005, Dublin.
National Institute of Standards and Technology 2008 Federal Information Processing Standards Publications. Available online at: http://www.itl.nist.gov/fipspubs. Last accessed April 23rd, 2008.
Nattinger AB, Kneusel RT, Hoffmann RG, and Gilligan MA 2001 Relationship of Distance From a Radiotherapy Facility and Initial Breast Cancer Treatment. Journal of the National Cancer Institute 93(17): 1344-1346.
NAVTEQ 2008 NAVSTREETS. Available online at: http://developer.navteq.com/site/global/dev_resources/170_navteqproducts/navdataformats/navstreets/p_navstreets.jsp. Last accessed April 23rd, 2008.
Nicoara G 2005 Exploring the Geocoding Process: A Municipal Case Study using Crime Data. Masters Thesis, The University of Texas at Dallas, Dallas, TX.
Noaki K and Arikawa M 2005a A Method for Parsing Route Descriptions using Sidewalk Network Databases. In Proceedings of the 2005 International Cartographic Conference (ICC 2005).
Noaki K and Arikawa M 2005b A geocoding method for natural route descriptions using sidewalk network databases. In Kwon et al. (eds) Proceedings of the 4th International Workshop on Web and Wireless Geographical Information Systems (W2GIS 2004), Revised Selected Papers. Berlin, Springer Lecture Notes in Computer Science No 3428: 38-50.
Nuckols JR, Ward MH, and Jarup L 2004 Using Geographic Information Systems for Exposure Assessment in Environmental Epidemiology Studies. Environmental Health Perspectives 112(9): 1007-1015.
Nuckols JR, Gunier RB, Riggs P, Miller R, Reynolds P, and Ward MH 2007 Linkage of the California Pesticide Use Reporting Database with Spatial Land Use Data for Exposure Assessment. Environmental Health Perspectives 115(1): 684-689.


NAACCR 2008a NAACCR GIS Committee Geographic Information Systems Survey. Available online at: http://www.naaccr.org/filesystem/word/GIS%20survey_Final.doc. Last accessed April 23rd, 2008.
NAACCR 2008b NAACCR Results of GIS Survey. Spring 2006 NAACCR Newsletter. Available online at: http://www.naaccr.org/index.asp?Col_SectionKey=6&Col_ContentID=497. Last accessed April 23rd, 2008.
O’Grady KM 1999 A DOQ Test Project: Collecting Data to Improve TIGER. In Proceedings of the 1999 ESRI User’s Conference, San Diego, CA. Available online at: http://gis.esri.com/library/userconf/proc99/proceed/papers/pap635/p635.htm. Last accessed April 23rd, 2008.
O’Reagan RT and Saalfeld A 1987 Geocoding Theory and Practice at the Bureau of the Census. Statistical Research Report Census/SRD/RR-87/29. Washington, DC United States Bureau of the Census.
Oliver MN, Matthews KA, Siadaty M, Hauck FR, and Pickle LW 2005 Geographic Bias Related to Geocoding in Epidemiologic Studies. International Journal of Health Geographics 4(29).
Olligschlaeger AM 1998 Artificial Neural Networks and Crime Mapping. In Weisburd and McEwen (eds) Crime Mapping and Crime Prevention. Monsey, NY Criminal Justice Press: 313-347.
Olston C and Widom J 2005 Efficient Monitoring and Querying of Distributed, Dynamic Data via Approximate Replication. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering: 1-8.
Openshaw S 1984 The modifiable areal unit problem. Concepts and Techniques in Modern Geography 38. Norwich, GeoBooks.
Openshaw S 1989 Learning to Live with Errors in Spatial Databases. In Goodchild and Gopal (eds) Accuracy of Spatial Databases. Bristol, PA Taylor & Francis: 263-276.
Oppong JR 1999 Data Problems in GIS and Health. In Proceedings of Health and Environment Workshop 4: Health Research Methods and Data, Turku, Finland. Available online at: geog.queensu.ca/h_and_e/healthandenvir/Finland%20Workshop%20Papers/OPPONG.DOC. Last accessed April 23rd, 2008.
Ordnance Survey 2008 ADDRESS-POINT: Ordnance Survey’s Map Dataset of All Postal Addresses in Great Britain. Available online at: http://www.ordnancesurvey.co.uk/oswebsite/products/addresspoint. Last accessed April 23rd, 2008.
Organization for the Advancement of Structured Information Standards 2008 OASIS xAL Standard v2.0. Available online at: http://www.oasis-open.org/committees/ciq/download.html. Last accessed April 23rd, 2008.
Paull D 2003 A Geocoded National Address File for Australia: The G-NAF What, Why, Who and When? Available online at: http://www.psma.com.au/file_download/24. Last accessed April 23rd, 2008.
Porter MF 1980 An algorithm for suffix stripping. Program 14(3): 130-137.
Purves R, Clough P, and Joho H 2005 Identifying imprecise regions for geographic information retrieval using the web. In Proceedings of GISRUK: 313-318.

214 November 10, 2008 D. W. Goldberg

Raghavan VV, Bollmann P, and Jung GS 1989 Retrieval system evaluation using recall and precision: problems and answers. In Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '89): 59-68.
Ratcliffe JH 2001 On the Accuracy of TIGER-Type Geocoded Address Data in Relation to Cadastral and Census Areal Units. International Journal of Geographical Information Science 15(5): 473-485.
Ratcliffe JH 2004 Geocoding Crime and a First Estimate of a Minimum Acceptable Hit Rate. International Journal of Geographical Information Science 18(1): 61-72.
Rauch E, Bukatin M, and Baker K 2003 A confidence-based framework for disambiguating geographic terms. In Kornai and Sundheim (eds) Proceedings of Workshop on the Analysis of Geographic References held at Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 50-54.
Reid J 2003 GeoXwalk: A Gazetteer Server and Service for UK Academia. In Koch and Sølvberg (eds) Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '03). Berlin, Springer Lecture Notes in Computer Science No 2769: 387-392.
Reinbacher I 2006 Geometric Algorithms for Delineating Geographic Regions. Ph.D. Thesis, Utrecht University, NL. Available online at: http://igitur-archive.library.uu.nl/dissertations/2006-0620-200747/full.pdf. Last accessed April 23rd, 2008.
Reinbacher I, Benkert M, van Kreveld M, Mitchell JSB, and Wolff A 2008 Delineating Boundaries for Imprecise Regions. Algorithmica 50(3): 386-414.
Revie P and Kerfoot H 1997 The Canadian Geographical Names Data Base. Available online at: http://geonames.nrcan.gc.ca/info/cgndb_e.php. Last accessed April 23rd, 2008.
Reynolds P, Hurley SE, Gunier RB, Yerabati S, Quach T, and Hertz A 2005 Residential Proximity to Agricultural Pesticide Use and Incidence of Breast Cancer in California, 1988-1997. Environmental Health Perspectives 113(8): 993-1000.
Riekert WF 2002 Automated Retrieval of Information in the Internet by Using Thesauri and Gazetteers as Knowledge Sources. Journal of Universal Computer Science 8(6): 581-590.
Rose KM, Wood JL, Knowles S, Pollitt RA, Whitsel EA, Diez-Roux AV, Yoon D, and Heiss G 2004 Historical Measures of Social Context in Life Course Studies: Retrospective Linkage of Addresses to Decennial Censuses. International Journal of Health Geographics 3(27).
Rull RP and Ritz B 2003 Historical pesticide exposure in California using pesticide use reports and land-use surveys: an assessment of misclassification error and bias. Environmental Health Perspectives 111(13): 1582-1589.
Rull RP, Ritz B, Krishnadasan A, and Maglinte G 2001 Modeling Historical Exposures from Residential Proximity to Pesticide Applications. In Proceedings of the Twenty-First Annual ESRI User Conference, San Diego, CA.
Rull RP, Ritz B, and Shaw GM 2006 Neural Tube Defects and Maternal Residential Proximity to Agricultural Pesticide Applications. American Journal of Epidemiology 163(8): 743-753.

Rull RP, Ritz B, and Shaw GM 2006 Validation of Self-Reported Proximity to Agricultural Crops in a Case-Control Study of Neural Tube Defects. Journal of Exposure Science and Environmental Epidemiology 16(2): 147-155.
Rushton G, Peleg I, Banerjee A, Smith G, and West M 2004 Analyzing Geographic Patterns of Disease Incidence: Rates of Late-Stage Colorectal Cancer in Iowa. Journal of Medical Systems 28(3): 223-236.
Rushton G, Armstrong M, Gittler J, Greene B, Pavlik C, West M, and Zimmerman D 2006 Geocoding in Cancer Research - A Review. American Journal of Preventive Medicine 30(2): S16–S24.
Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MW, and Zimmerman DL (eds) 2008a Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press.
Rushton G, Cai Q, and Chen Z 2008b Producing Spatially Continuous Prostate Cancer Maps with Different Geocodes and Spatial Filter Methods. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 69-94.
Sadahiro Y 2000 Accuracy of count data estimated by the point-in-polygon method. Geographical Analysis 32(1): 64-89.
Schlieder C, Vögele TJ, and Visser U 2001 Qualitative Spatial Representation for Information Retrieval by Gazetteers. In Montello (ed) Proceedings of the 5th International Conference on Spatial Information Theory. Berlin, Springer Lecture Notes in Computer Science No 2205: 336-351.
Schockaert S, De Cock M, and Kerre EE 2005 Automatic Acquisition of Fuzzy Footprints. In Meersman et al. (eds) On the Move to Meaningful Internet Systems 2005 (OTM 2005). Berlin, Springer Lecture Notes in Computer Science No 3762: 1077-1086.
Schootman M, Jeffe D, Kinman E, Higgs G, and Jackson-Thompson J 2004 Evaluating the Utility and Accuracy of a Reverse Telephone Directory to Identify the Location of Survey Respondents. Annals of Epidemiology 15(2): 160-166.
Schumacher S 2007 Probabilistic Versus Deterministic Data Matching: Making an Accurate Decision. DM Direct Special Report (January 18, 2007 Issue). Available online at: http://www.dmreview.com/article_sub.cfm?articleId=1071712. Last accessed April 23rd, 2008.
Sheehan TJ, Gershman ST, MacDougal L, Danley RA, Mroszczyk M, Sorensen AM, and Kulldorff M 2000 Geographic Surveillance of Breast Cancer Screening by Tracts, Towns and ZIP Codes. Journal of Public Health Management Practices 6: 48-57.
Shi X 2007 Evaluating the Uncertainty Caused by P.O. Box Addresses in Environmental Health Studies: A Restricted Monte Carlo Approach. International Journal of Geographical Information Science 21(3): 325-340.
Skelly C, Black W, Hearnden M, Eyles R, and Weinstein P 2002 Disease surveillance in rural communities is compromised by address geocoding uncertainty: A case study of campylobacteriosis. Australian Journal of Rural Health 10(2): 87-93.
Smith DA and Crane G 2001 Disambiguating Geographic Names in a Historical Digital Library. In Constantopoulos and Sølvberg (eds) Research and Advanced Technology for Digital Libraries, 5th European Conference. Berlin, Springer Lecture Notes in Computer Science No 2163: 127-136.

Smith DA and Mann GS 2003 Bootstrapping toponym classifiers. In Kornai and Sundheim (eds) Proceedings of Workshop on the Analysis of Geographic References held at Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 45-49.
Smith GD, Ben-Shlomo Y, and Hart C 1999 RE: Use of Census-based Aggregate Variables to Proxy for Socioeconomic Group: Evidence from National Samples. American Journal of Epidemiology 150(9): 996-997.
Soobader M, LeClere FB, Hadden W, and Maury B 2001 Using Aggregate Geographic Data to Proxy Individual Socioeconomic Status: Does Size Matter? American Journal of Public Health 91(4): 632-636.
Southall H 2003 Defining and identifying the roles of geographic references within text: Examples from the Great Britain Historical GIS project. In Kornai and Sundheim (eds) Proceedings of Workshop on the Analysis of Geographic References held at Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 69-78.
Stage D and von Meyer N 2005 An Assessment of Parcel Data in the United States 2005 Survey Results. Federal Geographic Data Committee Subcommittee on Cadastral Data. Available online at: http://www.nationalcad.org/showdocs.asp?docid=170. Last accessed April 23rd, 2008.
Statistics Canada 2008 Census geography – Illustrated Glossary: Geocoding – plain definition. Available online at: http://geodepot.statcan.ca/Diss2006/Reference/COGG/Short_RSE_e.jsp?REFCODE=10&FILENAME=Geocoding&TYPE=L. Last accessed April 23rd, 2008.
Stefoski Mikeljevic J, Haward R, Johnston C, Crellin A, Dodwell D, Jones A, Pisani P, and Forman D 2004 Trends in postoperative radiotherapy delay and the effect on survival in breast cancer patients treated with conservation surgery. British Journal of Cancer 90: 1343-1348.
Stevenson MA, Wilesmith J, Ryan J, Morris R, Lawson A, Pfeiffer D, and Lin D 2000 Descriptive Spatial Analysis of the Epidemic of Bovine Spongiform Encephalopathy in Great Britain to June 1997. The Veterinary Record 147(14): 379-384.
Strickland MJ, Siffel C, Gardner BR, Berzen AK, and Correa A 2007 Quantifying geocode location error using GIS methods. Environmental Health 6(10).
Stitzenberg KB, Thomas NE, Dalton K, Brier SE, Ollila DW, Berwick M, Mattingly D, and Millikan RC 2007 Distance to Diagnosing Provider as a Measure of Access for Patients With Melanoma. Archives of Dermatology 143(8): 991-998.
Sweeney L 2002 k-Anonymity: A Model For Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems 10(5): 557-570.
Tele Atlas Inc. 2008a Dynamap Map Database. Available online at: http://www.teleatlas.com/OurProducts/MapData/Dynamap/index.htm. Last accessed April 23rd, 2008.
Tele Atlas Inc. 2008b Geocode.com // Tele Atlas Geocoding Services. Available online at: http://www.geocode.com. Last accessed April 23rd, 2008.
Tele Atlas Inc. 2008c MultiNet Map Database. Available online at: http://www.teleatlas.com/OurProducts/MapData/Multinet/index.htm. Last accessed April 23rd, 2008.

Temple C, Ponas G, Regan R, and Sochats K 2005 The Pittsburgh Street Addressing Project. In Proceedings of the 25th Annual ESRI International User Conference. Available online at: http://gis2.esri.com/library/userconf/proc05/papers/pap1525.pdf. Last accessed April 23rd, 2008.
Tezuka T and Tanaka K 2005 Landmark Extraction: A Web Mining Approach. In Cohn and Mark (eds) Proceedings of the 7th International Conference on Spatial Information Theory. Berlin, Springer Lecture Notes in Computer Science No 3693: 379-396.
Thrall GI 2006 Geocoding Made Easy. Geospatial Solutions. Available online at: http://ba.geospatial-online.com/gssba/article/articleDetail.jsp?id=339221. Last accessed April 23rd, 2008.
Tobler W 1972 Geocoding Theory. In Proceedings of the National Geocoding Conference, Washington, DC Department of Transportation: IV.1.
Toral A and Muñoz R 2006 A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia. In Proceedings of the EACL-2006 Workshop on NEW TEXT: Wikis and blogs and other dynamic text sources: 56-61. Available online at: acl.ldc.upenn.edu/W/W06/W06-2800.pdf. Last accessed April 23rd, 2008.
United States Board on Geographic Names 2008 Geographic Names Information System. Reston, VA United States Board on Geographic Names. Available online at: http://geonames.usgs.gov/pls/gnispublic. Last accessed April 23rd, 2008.
United States Census Bureau 2008a American Community Survey. Washington, DC United States Census Bureau. Available online at: http://www.census.gov/acs. Last accessed April 23rd, 2008.
United States Census Bureau 2008b MAF/TIGER Accuracy Improvement Project. Washington, DC United States Census Bureau. Available online at: http://www.census.gov/geo/mod/maftiger.html. Last accessed April 23rd, 2008.
United States Census Bureau 2008c Topologically Integrated Geographic Encoding and Referencing System. Washington, DC United States Census Bureau. Available online at: http://www.census.gov/geo/www/tiger. Last accessed April 23rd, 2008.
United States Department of Health and Human Services 2000 Healthy People 2010: Understanding and Improving Health (Second Edition). Washington, DC United States Government Printing Office. Available online at: http://www.healthypeople.gov/Document/pdf/uih/2010uih.pdf. Last accessed April 23rd, 2008.
United States Federal Geographic Data Committee 2008a Content Standard for Digital Geospatial Metadata. Available online at: http://www.fgdc.gov/metadata/csdgm. Last accessed April 23rd, 2008.
United States Federal Geographic Data Committee 2008b Street Address Data Standard. Reston, VA United States Federal Geographic Data Committee. Available online at: http://www.fgdc.gov/standards/projects/FGDC-standards-projects/streetaddress/index_html. Last accessed April 23rd, 2008.
United States National Geospatial-Intelligence Agency 2008 NGA GNS Search. Bethesda, MD United States National Geospatial-Intelligence Agency. Available online at: http://geonames.nga.mil/ggmagaz/geonames4.asp. Last accessed April 23rd, 2008.
United States Postal Service 2008a Address Information System Products Technical Guide. Washington, DC United States Postal Service. Available online at: http://ribbs.usps.gov/files/Addressing/PUBS/AIS.pdf. Last accessed April 23rd, 2008.

United States Postal Service 2008b CASS Mailer's Guide. Washington, DC United States Postal Service. Available online at: http://ribbs.usps.gov/doc/cmg.html. Last accessed April 23rd, 2008.
United States Postal Service 2008c Locatable Address Conversion System. Washington, DC United States Postal Service. Available online at: http://www.usps.com/ncsc/addressservices/addressqualityservices/lacsystem.htm. Last accessed April 23rd, 2008.
United States Postal Service 2008d Publication 28 – Postal Addressing Standards. Washington, DC United States Postal Service. Available online at: http://pe.usps.com/text/pub28/welcome.htm. Last accessed April 23rd, 2008.
University of California, Los Angeles 2008 Interactive UCLA Campus Map. Los Angeles, CA University of California, Los Angeles. Available online at: http://www.fm.ucla.edu/CampusMap/Campus.htm. Last accessed April 23rd, 2008.
University of Southern California 2008 UPC Color Map. Los Angeles, CA University of Southern California. Available online at: http://www.usc.edu/private/about/visit_usc/USC_UPC_map_color.pdf. Last accessed April 23rd, 2008.
Vaid S, Jones CB, Joho H, and Sanderson M 2005 Spatio-textual indexing for geographical search on the web. In Proceedings of the 9th Symposium on Spatial and Temporal Databases (SSTD-05).
Van Kreveld M and Reinbacher I 2004 Good NEWS: Partitioning a Simple Polygon by Compass Directions. International Journal of Computational Geometry and Applications 14(4): 233-259.
Veregin H 1999 Data Quality Parameters. In Longley et al. (eds) Geographical Information Systems, Volume 1 (Second Edition). New York, Wiley: 177-189.
Vestavik Ø 2004 Geographic Information Retrieval: An Overview. Unpublished, presented at Internal Doctoral Conference, Department of Computer and Information Science, Norwegian University of Science and Technology. Available online at: http://www.idi.ntnu.no/~oyvindve/article.pdf. Last accessed April 23rd, 2008.
Vine MF, Degnan D, and Hanchette C 1998 Geographic Information Systems: Their Use in Environmental Epidemiologic Research. Journal of Environmental Health 61: 7-16.
Vögele TJ and Schlieder C 2003 Spatially-Aware Information Retrieval with Graph-Based Qualitative Reference Models. In Russell and Haller (eds) Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference: 470-474.
Vögele TJ and Stuckenschmidt H 2001 Enhancing Gazetteers with Qualitative Spatial Concepts. In Tochtermann and Arndt (eds) Proceedings of the Workshop on Hypermedia in Environmental Protection.
Vögele TJ, Schlieder C, and Visser U 2003 Intuitive modeling of place name regions for spatial information retrieval. In Kuhn et al. (eds) Proceedings of the 6th International Conference on Foundations of Geographic Information Science (COSIT 2003). Berlin, Springer Lecture Notes in Computer Science No 2825: 239-252.
Voti L, Richardson LC, Reis IM, Fleming LE, MacKinnon J, and Coebergh JWW 2005 Treatment of local breast carcinoma in Florida. Cancer 106(1): 201-207.
Waldinger R, Jarvis P, and Dungan J 2003 Pointing to places in a deductive geospatial theory. In Kornai and Sundheim (eds) Proceedings of Workshop on the Analysis of Geographic References held at Joint Conference for Human Language Technology and Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL '03): 10-17.

Waller LA 2008 Spatial Statistical Analysis of Point- and Area-Referenced Public Health Data. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 147-164.
Walls MD 2003 Is Consistency in Address Assignment Still Needed? In Proceedings of the Fifth Annual URISA Street Smart and Address Savvy Conference, Providence, RI.
Ward MH, Nuckols JR, Giglierano J, Bonner MR, Wolter C, Airola M, Mix W, Colt JS, and Hartge P 2005 Positional accuracy of two methods of geocoding. Epidemiology 16(4): 542-547.
Werner PA 1974 National Geocoding. Annals of the Association of American Geographers 64(2): 310-317.
Whitsel EA, Rose KM, Wood JL, Henley AC, Liao D, and Heiss G 2004 Accuracy and Repeatability of Commercial Geocoding. American Journal of Epidemiology 160(10): 1023-1029.
Whitsel EA, Quibrera PM, Smith RL, Catellier DJ, Liao D, Henley AC, and Heiss G 2006 Accuracy of commercial geocoding: assessment and implications. Epidemiologic Perspectives & Innovations 3(8).
Wieczorek J 2008 MaNIS/HerpNet/ORNIS Georeferencing Guidelines. Available online at: http://manisnet.org/GeorefGuide.html. Last accessed April 23rd, 2008.
Wieczorek J, Guo Q, and Hijmans RJ 2004 The Point-Radius Method for Georeferencing Locality Descriptions and Calculating Associated Uncertainty. International Journal of Geographical Information Science 18(8): 745-767.
Wilson JP, Lam CS, and Holmes-Wong DA 2004 A New Method for the Specification of Geographic Footprints in Digital Gazetteers. Cartography and Geographic Information Science 31(4): 195-203.
Winkler WE 1995 Matching and Record Linkage. In Cox et al. (eds) Business Survey Methods. New York, Wiley: 355-384.
Wong WS and Chuah MC 1994 A Hybrid Approach to Address Normalization. IEEE Expert: Intelligent Systems and Their Applications 9(6): 38-45.
Woodruff AG and Plaunt C 1994 GIPSY: Georeferenced Information Processing System. Journal of the American Society for Information Science: 645-655.
Wu J, Funk TH, Lurmann FW, and Winer AM 2005 Improving Spatial Accuracy of Roadway Networks and Geocoded Addresses. Transactions in GIS 9(4): 585-601.
Yahoo!, Inc. 2008 Yahoo! Maps Web Services - Geocoding API. Available online at: http://developer.yahoo.com/maps/rest/V1/geocode.html. Last accessed April 23rd, 2008.
Yang DH, Bilaver LM, Hayes O, and Goerge R 2004 Improving Geocoding Practices: Evaluation of Geocoding Tools. Journal of Medical Systems 28(4): 361-370.
Yildirim V and Yomralioglu T 2004 An Address-based Geospatial Application. In Proceedings of the International Federation of Surveyors Working Week 2004.
Yu L 1996 Development and Evaluation of a Framework for Assessing the Efficiency and Accuracy of Street Address Geocoding Strategies. Ph.D. Thesis, University at Albany, State University of New York - Rockefeller College of Public Affairs and Policy, New York.
Zandbergen PA 2007 Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads. BMC Public Health 7(37).
Zandbergen PA 2008 A comparison of address point, parcel and street geocoding techniques. Computers, Environment and Urban Systems.

Zandbergen PA and Chakraborty J 2006 Improving environmental exposure analysis using cumulative distribution functions and individual geocoding. International Journal of Health Geographics 5(23).
Zillow.com 2008 Zillow - Real Estate Search Results. Available online at: http://www.zillow.com/search/Search.htm?mode=browse. Last accessed April 23rd, 2008.
Zimmerman DL 2006 Estimating spatial intensity and variation in risk from locations coarsened by incomplete geocoding. Technical report #362, Department of Statistics and Actuarial Science, University of Iowa: 1-28. Available online at: http://www.stat.uiowa.edu/techrep/tr362.pdf. Last accessed April 23rd, 2008.
Zimmerman DL 2008 Statistical Methods for Incompletely and Incorrectly Geocoded Cancer Data. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 165-180.
Zimmerman DL, Armstrong MP, and Rushton G 2008 Alternative Techniques for Masking Geographic Detail to Protect Privacy. In Rushton et al. (eds) Geocoding Health Data - The Use of Geographic Codes in Cancer Prevention and Control, Research, and Practice. Boca Raton, FL CRC Press: 127-138.
Zimmerman DL, Fang X, Mazumdar S, and Rushton G 2007 Modeling the probability distribution of positional errors incurred by residential address geocoding. International Journal of Health Geographics 6(1).
Zong W, Wu D, Sun A, Lim EP, and Goh DHL 2005 On Assigning Place Names to Geography Related Web Pages. In Marlino et al. (eds) Proceedings of the 2005 ACM/IEEE Joint Conference on Digital Libraries: 354-362.

APPENDIX A: EXAMPLE RESEARCHER ASSURANCE DOCUMENTS

Cancer registries should already have formalized policies governing the distribution of registry data, specifying who can access the data and for what purposes. The following pages contain an example researcher data release request document that registries can use as a starting point for standardizing these procedures, if needed. Also included is an example researcher assurances agreement, which can partially protect the registry by specifying the acceptable uses of registry data and outlining the responsibilities of the researcher.

Research Review Procedure

Before the release of any data, all research proposals requesting the use of confidential cancer registry data must be reviewed by the Name_of_Registry for compliance with the following criteria:

1. the proposed research will be used to determine the sources of cancer among the residents of Name_of_Locality or to reduce the burden of cancer in Name_of_Locality;

2. the data requested are necessary for the efficient conduct of the study;

3. adequate protections are in place to provide secure conditions to use and store the data;

4. assurances are given that the data will only be used for the purposes of the study, and assurances that confidential data will be destroyed at the conclusion of the study (see Assurances Form);

5. the researcher has adequate resources to carry out the proposed research;

6. the proposal has been reviewed and approved by the Name_of_Committee_for_the_Protection_of_Human_Subjects or is exempt from such review;

7. any additional safeguards needed to protect the data from inadvertent disclosure due to unique or special characteristics of the proposed research have been required of the researcher; and

8. the research methodology has been reviewed for scientific excellence by a nationally recognized peer group, or if such a review has not taken place, that an ad hoc peer review subcommittee of the Name_of_Advisory_Committee containing appropriately qualified scientists has performed a peer review of the research.

Additionally, all relevant research fees have been paid prior to data release.

Name_of_Health_Division Name_of_Registry

Research Proposal Review

Please complete each section of this form and return with all attachments to: Address_of_Registry.

Principal Investigator:
Date:
Organization:
Address:
City:
State:
ZIP:
Tel:
Fax:
Email:
Title of Research Project:
List other institutions or agencies that will collaborate in conducting the project:

Note: please attach a copy of the proposed protocol or methods section of your project.

1. Section of Legislative_Code states "the purpose of the registry shall be to purpose_of_registry." In the section below, please describe how your proposed research will be used to determine the sources of cancer among the residents of Name_of_Locality or to reduce the burden of cancer in Name_of_Locality. If additional space is needed, please attach a separate sheet.

2. Details of the data elements necessary for the conduct of the study.

3. Describe procedures for identifying patients (patient population).

4. All protocols that include a request for confidential data require peer review for scientific merit. Name_of_Registry accepts review by nationally recognized peer review groups. Please indicate below whether or not such a review has been performed.

No
Yes. If your proposal has been reviewed for scientific merit, please attach a copy of that review.

If your proposal has not been reviewed for scientific merit by a nationally recognized peer review group, the Division shall convene an ad hoc peer review subcommittee of the Cancer Registry Advisory Committee. The data shall not be released unless and until the proposed research is judged to be scientifically meritorious by the peer group. Review for scientific merit must be completed prior to Committee for Protection of Human Research Subjects Institutional Review Board (IRB) review if one has not already been performed.

5. All requests for confidential data must be approved by an IRB established in accordance with Section of Legislative_Code. Please indicate whether or not this proposal has already been approved by an IRB.

No. Please indicate the approximate review date: ______
Yes. Date: ______
If your proposal has been approved by an IRB, please attach a copy of the approval. Name_of_Health_Division may require approval by the Name_of_Health_Division IRB. Please contact Name_of_Contact_Person at Phone_Number_of_Contact_Person for instructions on obtaining Name_of_Health_Division IRB approval.

6. Confidential data must be protected against inadvertent disclosure. In the section below, please address the following issues (if additional space is needed, please attach a separate sheet):

a) How you will provide secure conditions to use and store the data:

b) Assurances that the data will be used only for the purposes of the study:

c) Assurances that confidential data will be destroyed at the conclusion of the research:

The review committee may require additional safeguards if it is determined that these are necessary due to unique or special characteristics of your proposed research.

7. Prior to the release of confidential data, assurances must be given that you have adequate financial resources to carry out the proposed research. Please document adequate project funding and attach supporting documentation. If additional space is needed, please attach a separate sheet.

8. Please complete the following Researcher Assurances Form on page 5.

Attachments (please check applicable boxes):

Research protocol attached
IRB approval
Project funding
Peer review approval
Researcher Assurances Form

Date reviewed by Name_of_Health_Division administration: ______

Approved
Denied

Comments:

Researcher Assurances Form

The undersigned agrees to (initial each statement, sign and date):

___ accept responsibility for the ethical conduct of the study and the protection of the rights, privacy and welfare of the individuals whose private health information is retained in the Name_of_Registry;

___ conduct this study in compliance with the protocol as reviewed and approved by Name_of_Registry and/or the Advisory Committee;

___ submit all proposed study changes, including those accepted by an IRB, to Name_of_Registry to seek approval prior to implementing changes. This includes but is not limited to a change in venue, a change in PI or other investigators, a change in study focus, and any change requiring IRB approval;

___ report upon discovery all unanticipated problems, protocol violations, and breaches of confidentiality to Name_of_Registry;

___ submit copies of literature and formal presentations generated using Name_of_Registry data;

___ pay all relevant fees prior to receiving Name_of_Registry data (see Schedule of Research Fees); and

___ destroy the complete dataset received from Name_of_Registry upon conclusion of the study and inform Name_of_Registry.

I agree to comply with the above requirements. I attest that the information in this Research Proposal Review Form and its attachments is true and complete. I also attest that I have no conflicts of interest to disclose regarding this study.

Non-compliance with this agreement may result in termination of the study approval. This means approval for use of Name_of_Registry study data may be revoked. If this occurs, proof is required that all data obtained from Name_of_Registry for the purposes of this study have been destroyed. In addition, no investigator on this study may benefit from the use of Name_of_Registry data, either monetarily (including grant funding) or through publications, presentations, or any other means.

______ Date
______ (PI signature)

APPENDIX B: ANNOTATED BIBLIOGRAPHY

The tables on the following pages provide an annotated bibliography of the majority of previous work related to the field of geocoding to date. The manuscripts listed include those explicitly about the geocoding process itself, as well as those that use geocoding as part of the research presented and address some aspect of the process, whether explicitly or implicitly. Each table focuses on a specific theme, and the works listed within are further classified by the aspect of the theme to which they are relevant. These tables can guide the reader to relevant works for further background reading on a topic, and show how other research studies have addressed issues related to the geocoding process.

Table 46 – Previous geocoding studies classified by topics of input data utilized

Columns: Input Data Types (Named Places, Relative Descriptions, Postal Addresses, USPS PO Boxes, Postal Codes, Rural Routes); Process (Parsing, Normalization, Standardization, Validation); Accuracy (Ambiguity Resolution, Temporality, Correctness). Each ● marks a topic addressed by the study.

Abe and Stinchcomb 2008 ● ● ● ● ● ● ● ● ● ● ● ● ●
Agarwal 2004 ● ● ● ● ● ●
Agouris et al. 2000 ● ● ● ●
Agovino et al. 2005 ● ● ● ● ●
Alani 2001 ● ● ● ●
Alani et al. 2003 ● ● ● ●
Amitay et al. 2004 ● ● ● ● ●
Arampatzis et al. 2006 ● ● ● ● ● ● ●
Arbia et al. 1998
Arikawa and Noaki 2005 ● ● ● ● ● ● ●
Arikawa et al. 2004 ● ● ●
Armstrong et al. 2008 ● ● ●
Armstrong and Tiwari 2008 ● ● ● ● ● ● ● ● ● ● ● ●
Axelrod 2003 ● ● ● ● ●
Bakshi et al. 2004 ● ●
Beal 2003 ● ● ●
Beaman et al. 2004 ● ● ● ● ● ● ● ● ● ● ●
Berney and Blane 1997 ● ● ● ●
Beyer et al. 2008 ● ● ● ● ● ●
Bichler and Balchak 2007 ● ● ● ● ● ● ● ●
Bilhaut et al. 2003 ● ● ● ● ● ● ● ● ● ● ●
Blakely and Salmond 2002 ● ● ● ●
Block 1995 ●
Bonner et al. 2003 ● ●
Boscoe et al. 2002 ● ● ● ● ● ● ● ● ● ● ● ●
Boscoe et al. 2004 ● ● ● ●
Boscoe 2008 ● ● ● ● ● ● ● ● ● ● ● ●
Bow et al. 2004 ● ● ● ● ● ●
Brody et al. 2002 ● ● ●
Cayo and Talbot 2003 ● ● ● ●
Chalasani et al. 2005 ● ● ● ● ● ● ● ● ●
Chavez 2000 ● ● ● ●
Chen, C.C. et al. 2003 ●
Chen, C.C. et al. 2004 ●
Chen, M.F. et al. 1998 ● ● ● ●
Chen, W et al. 2004 ● ● ●
Chen et al. 2008 ● ● ●
Chou 1995 ● ● ●
Christen and Churches 2005 ● ● ● ● ● ● ● ●
Christen et al. 2004 ● ● ● ● ● ● ● ●
Chua 2001 ● ● ● ● ● ● ● ● ● ●


g alidation mbiguit orrectness emporalit Named Places Named Descriptions Relative Postal Addresses USPS PO Boxes Postal Codes Rural Routes Parsin Normalization Standardization V A Resolution T C Chung et al. 2004 ● Churches et al. 2002 ● ● ● ● ● ● ● Clough 2005 ● ● ● ● ● ● ● ● ● ● Collins et al. 1998 ● ● ● Croner 2003 ● ● ● ● ● Curtis et al. 2006 ● ● Davis Jr. 1993 ● Davis Jr. and Fonseca 2007 ● ● ● ● ● ● ● ● ● ● ● ● ● Davis Jr. et al. 2003 ● ● ● ● ● ● ● ● ● ● ● ● ● Dawes et al. 2006 ● Dearwent et al. 2001 ● ● ● ● ● ● ● ● Densham and Reid 2003 ● ● ● ● ● Diez-Roux et al. 2001 ● Drummond 1995 ● ● ● ● ● ● ● ● Dueker 1974 ● ● ● ● ● Durr and Froggatt 2002 ● ● ● Efstathopoulos et al. 2005 ● ● ● ● ● Eichelberger 1993 ● ● ● ● ● ● ● ● ● ● El-Yacoubi et al. 2002 ● ● ● ● Fonda-Bonardi 1994 ● ● ● ● ● Foody 2003 ● ● Fortney et al. 2000 ● ● ● Fremont et al. 2005 ● Frew et al. 1998 ● ● ● ● ● ● ● Fu et al. 2005a ● ● ● ● ● ● ● Fu et al. 2005b ● ● ● ● ● ● ● Fulcomer et al. 1998 ● ● ● ● ● ● ● ● ● ● ● Gaffney et al. 2005 ● ● ● Gatrell 1989 ● ● Geronimus and Bound 1998 ● Geronimus and Bound 1999a ● Geronimus and Bound 1999b ● Geronimus et al. 1995 ● ● Gilboa et al. 2006 ● ● Gilboa et al. 2006 ● ● Goldberg et al. 2007 ● ● ● ● ● ● ● ● ● ● ● ● ● ● Gregorio et al. 1999 ● ● ● ● ● Gregorio et al. 2005 ● ● ● ● ● Griffin et al. 1990 ● ● ● ● Grubesic and Matisziw 2006 ● ● ● ● Grubesic and Murray 2004 ● Han et al. 2004 ● ● ● ● ● ● ● ● ● ● Han et al. 2005 ● ●


g alidation mbiguit orrectness emporalit Named Places Named Descriptions Relative Postal Addresses USPS PO Boxes Postal Codes Rural Routes Parsin Normalization Standardization V A Resolution T C Hariharan and Toyama 2004 ● ● ● ● ● Haspel and Knotts 2005 ● ● Henshaw et al. 2004 ● Higgs and Martin 1995a ● ● ● ● Higgs and Martin 1995b ● ● ● ● ● ● ● Higgs and Richards 2002 ● ● Hill 2000 ● ● ● ● ● ● ● ● ● ● Hill and Zheng 1999 ● ● ● ● ● ● ● ● ● ● Hill et al. 1999 ● ● ● ● ● ● ● ● ● ● Himmelstein 2005 ● ● ● ● ● ● Hurley et al. 2003 ● ● ● ● Hutchinson and Veenendall 2005a ● ● ● ● ● ● ● ● ● ● Hutchinson and Veenendall 2005b ● ● ● ● ● ● ● ● ● ● Jaro 1984 ● ● ● Jaro 1989 ● ● ● Johnson 1998a ● ● ● ● ● ● ● ● ● ● Johnson 1998b ● ● ● ● ● ● ● ● ● ● Jones et al. 2001 ● ● ● ● ● ● ● ● Karimi et al. 2004 ● ● ● ● ● ● ● Kennedy et al. 2003 ● ● ● ● Kim 2001 ● ● ● ● ● ● ● ● ● Kimler 2004 ● ● ● ● ● ● ● ● ● Krieger 1992 ● ● ● Krieger 2003 ● ● ● ● ● ● Krieger and Gordon 1999 ● Krieger et al. 1997 ● ● ● Krieger et al. 2001 ● ● ● ● ● ● ● ● ● ● ● Krieger et al. 2002a ● ● ● ● ● ● Krieger et al. 2002b ● ● ● ● ● ● Krieger et al. 2003 ● ● ● Krieger et al. 2005 ● ● ● Krieger et al. 2006 ● ● ● ● Kwok and Yankaskas 2001 ● ● ● Laender et al. 2005 ● ● ● ● ● ● ● ● ● ● ● Lam et al. 2002 ● ● ● ● Lee 2004 ● ● ● ● ● ● ● Lee and McNally 1998 ● ● ● ● ● ● ● ● ● Leidner 2004 ● ● ● ● ● ● Levesque 2003 ● ● ● ● ● ● ● ● ● ● Levine and Kim 1998 ● ● ● ● ● ● ● ● ● ● ● Li et al. 2002 ● ● ● ● ● ● ● Lind 2001 ● ● ● ● ● ● ● ● Lind 2005 ● ● ● ● ● ● ● ●


g alidation mbiguit emporalit Named Places Named Descriptions Relative Postal Addresses USPS PO Boxes Postal Codes Rural Routes Parsin Normalization Standardization V A Resolution T Correctness Lovasi et al. 2007 ● ● ● ● ● ● ● ● ● ● ● Maizlish and Herrera 2005 ● Markowetz 2004 ● ● ● ● ● ● ● Markowetz et al. 2005 ● ● ● ● ● ● ● Martin 1998 ● ● ● ● ● ● ● Martin and Higgs 1996 ● ● Martins and Silva 2005 ● ● ● ● Martins et al. 2005a ● ● ● ● Martins et al. 2005b ● ● ● ● Mazumdar et al. 2008 ● ● ● ● McCurley 2001 ● ● ● ● ● ● ● ● ● ● McEathron et al. 2002 ● ● ● ● ● ● McElroy et al. 2003 ● ● ● ● ● ● ● ● ● ● ● ● Mechanda and Puderer 2007 ● ● ● ● ● ● ● ● ● ● ● Michelson and Knoblock 2005 ● Miner et al. 2005 ● ● Ming et al. 2005 Murphy and Armitage 2005 ● ● ● ● ● ● ● ● ● ● Nicoara 2005 ● ● ● ● ● ● ● ● ● ● ● ● Noaki and Arikawa 2005a ● ● ● ● ● ● ● ● Noaki and Arikawa 2005b ● ● ● ● ● ● ● ● Nuckols et al. 2004 ● ● ● ● ● O’Reagan and Saalfeld 1987 ● ● Oliver et al. 2005 ● ● ● ● ● ● ● ● ● Olligschlaeger 1998 ● ● ● Oppong 1999 ● ● ● ● ● Paull 2003 ● ● ● ● ● ● Purves et al. 2005 ● ● ● ● ● ● ● Ratcliffe 2001 ● ● ● ● ● ● ● ● Ratcliffe 2004 ● ● ● ● ● ● ● ● ● ● ● ● Rauch et al. 2003 ● ● ● ● ● ● Reid 2003 ● ● ● ● ● Reinbacher 2006 ● ● ● ● ● ● ● Reinbacher et al. 2008 ● ● ● ● ● ● ● Revie and Kerfoot 1997 ● Riekert 2002 ● ● Rose et al. 2004 ● ● ● ● ● ● ● ● Rull et al. 2006 ● ● Rushton et al. 2006 ● ● ● ● ● ● ● ● Rushton et al. 2008b ● ● ● Schlieder et al. 2001 ● ● ● ● ● ● Schockaert et al. 2005 ● ● ● ●


g alidation mbiguit emporalit Named Places Named Descriptions Relative Postal Addresses USPS PO Boxes Postal Codes Rural Routes Parsin Normalization Standardization V A Resolution T Correctness Schootman et al. 2004 ● ● ● ● ● ● ● ● ● ● ● ● ● Sheehan et al. 2000 ● ● ● ● Shi 2007 ● ● ● ● ● ● Smith and Crane 2001 ● ● ● Smith and Mann 2003 ● ● ● Smith et al. 1999 ● ● Soobader et al. 2001 ● ● ● Southall 2003 ● ● ● ● ● ● ● ● ● Stevenson et al. 2000 ● ● ● Strickland et al. 2007 ● ● ● Temple et al. 2005 ● ● ● Tezuka and Tanaka 2005 ● ● ● ● Thrall 2006 ● ● ● ● ● ● ● ● ● Tobler 1972 ● ● ● ● ● Toral and Muñoz 2006 ● ● ● ● ● UN Economic Commission 2005 ● ● ● ● ● ● ● ● ● United States Department of Health ● and Human Services 2000 Vaid et al. 2005 ● ● ● ● Van Kreveld and Reinbacher 2004 ● ● Vestavik 2004 ● ● ● ● ● ● ● Vine et al. 1998 ● ● ● ● ● ● ● ● Vögele and Schlieder 2003 ● ● ● ● ● ● Vögele and Stuckenschmidt 2001 ● ● ● ● ● ● Vögele et al. 2003 ● ● ● ● ● ● Waldinger et al. 2003 ● ● ● ● ● ● Waller 2008 ● ● Walls 2003 ● ● ● ● ● ● ● ● ● ● ● ● Ward et al. 2005 ● ● ● ● ● ● Werner 1974 ● ● ● ● ● Whitsel et al. 2004 ● ● ● ● ● ● ● ● Whitsel et al. 2006 ● ● ● ● ● ● ● ● ● ● Wieczorek 2008 ● ● ● ● ● ● ● Wieczorek et al. 2004 ● ● ● ● ● ● ● Wilson, et al. 2004 ● ● ● ● Winkler 1995 ● ● Wong and Chuah 1994 ● ● ● ● ● ● ● ● ● Woodruff and Plaunt 1994 ● ● ● ● ● ● ● ● Wu et al. 2005 ● ● ● ● Yang et al. 2004 ● ● ● ● ● ● ● ● ● ● ● Yildirim and Yomralioglu 2004 ● ● Yu 1996 ● ● ● ● ● ● ● ● Zandbergen 2007 ● ●


g alidation mbiguit emporalit Named Places Named Descriptions Relative Postal Addresses USPS PO Boxes Postal Codes Rural Routes Parsin Normalization Standardization V A Resolution T Correctness Zandbergen 2008 ● ● ● ● ● ● ● ● ● Zandbergen and Chakraborty 2006 ● Zimmerman 2006 ● ● ● ● Zimmerman 2008 ● ● ● ● ● ● Zimmerman et al. 2007 ● ● ● ● Zimmerman et al. 2008 ● ● ● ● Zong et al. 2005 ● ● ● ● ● ● ● ●


Table 47 – Previous geocoding studies classified by topics of reference data source

Columns – Reference Data Source Type: Gazetteer, Point-Based, Line-Based, Area Unit-Based, Imagery; Process: Conflation, Adding Value; Accuracy: Resolution, Spatial, Temporality, Non-Spatial Attributes. A ● marks each topic a study addresses.

rea Unit-Based dding Value emporalit Gazetteer Point-Based Line-Based A Imager Conflation A Resolution Spatial T Non-Spatial Attributes Abe and Stinchcomb 2008 ● ● ● ● ● ● ● ● ● ● Agouris et al. 2000 ● ● ● ● ● ● ● Agovino et al. 2005 ● ● ● ● Alani 2001 ● ● ● ● ● ● ● Alani et al. 2003 ● ● ● ● ● Amitay et al. 2004 ● ● Arampatzis et al. 2006 ● ● ● Arbia et al. 1998 ● ● ● ● ● Arikawa and Noaki 2005 ● ● ● Arikawa et al. 2004 ● ● ● Armstrong et al. 1999 ● ● ● ● ● ● Armstrong et al. 2008 ● ● ● ● Armstrong and Tiwari 2008 ● ● ● ● ● Axelrod 2003 ● ● ● ● ● ● ● ● Bakshi et al. 2004 ● ● ● Beal 2003 ● ● Beaman et al. 2004 ● ● Beyer et al. 2008 ● ● ● ● ● Bichler and Balchak 2007 ● ● ● ● ● ● Bilhaut et al. 2003 ● ● ● ● ● Block 1995 ● ● ● ● ● Bonner et al. 2003 ● ● ● ● Boscoe et al. 2002 ● ● ● ● ● ● Boscoe et al. 2004 ● ● ● ● ● ● Boscoe 2008 ● ● ● Boulos 2004 ● ● ● ● ● ● ● ● ● ● Bow et al. 2004 ● ● ● ● Brody et al. 2002 ● ● ● ● ● ● ● Broome 2003 ● ● ● ● ● ● Can 1993 ● ● ● ● ● ● ● Cayo and Talbot 2003 ● ● ● Chalasani et al. 2005 ● ● ● ● ● Chavez 2000 ● ● ● ● ● Chen, C.C. et al. 2003 ● ● ● ● ● ● Chen, C.C. et al. 2004 ● ● ● ● ● ● ● ● Chen, M.F. et al. 1998 ● Chen, W et al. 2004 ● ● ● ● ● ● ● Chiang and Knoblock 2006 ● ● ● ● ● Chiang et al. 2005 ● ● ● ● ● Chou 1995 ●


rea Unit-Based dding Value emporalit Gazetteer Point-Based Line-Based A Imager Conflation A Resolution Spatial T Non-Spatial Attributes Christen and Churches 2005 ● ● ● ● ● Christen et al. 2004 ● ● ● ● Chua 2001 ● ● ● ● ● Chung et al. 2004 ● ● ● ● Churches et al. 2002 ● ● Clough 2005 ● ● ● ● ● Collins et al. 1998 ● ● ● ● ● ● Cressie and Kornak 2003 Croner 2003 ● ● ● ● ● ● ● ● Curtis et al. 2006 ● ● ● ● Davis Jr. 1993 ● ● ● ● ● ● ● ● Davis Jr. and Fonseca 2007 ● ● ● ● ● ● Davis Jr. et al. 2003 ● ● ● ● ● ● Dawes et al. 2006 ● ● ● ● ● ● ● Dearwent et al. 2001 ● ● ● ● Densham and Reid 2003 ● Diez-Roux et al. 2001 ● ● Drummond 1995 ● ● ● ● ● Dueker 1974 ● ● ● ● ● ● ● Durr and Froggatt 2002 ● ● ● ● ● Efstathopoulos et al. 2005 ● Eichelberger 1993 ● ● ● ● Fonda-Bonardi 1994 ● ● ● ● ● Foody 2003 ● ● Fortney et al. 2000 ● ● ● Frank et al. 2004 ● ● ● ● ● ● ● ● Fremont et al. 2005 ● Frew et al. 1998 ● ● ● ● ● ● ● ● Fu et al. 2005a ● ● ● Fu et al. 2005b ● ● ● Fulcomer et al. 1998 ● ● ● ● ● Gabrosek and Cressie 2002 ● Gatrell 1989 ● ● ● ● ● Geronimus and Bound 1998 ● ● Geronimus and Bound 1999a ● ● Geronimus and Bound 1999b ● ● Geronimus et al. 1995 ● ● Gilboa et al. 2006 ● ● Goldberg et al. 2007 ● ● ● ● ● ● ● ● ● ● ● Goodchild and Hunter 1997 ● ● ● ● Gregorio et al. 1999 ● ● ● Gregorio et al. 2005 ● ● ●


rea Unit-Based dding Value emporalit Gazetteer Point-Based Line-Based A Imager Conflation A Resolution Spatial T Non-Spatial Attributes Griffin et al. 1990 ● ● ● ● ● ● ● ● Grubesic and Matisziw 2006 ● ● ● ● Grubesic and Murray 2004 ● ● ● ● ● Han et al. 2004 ● ● ● Han et al. 2005 ● ● Hariharan and Toyama 2004 ● ● ● Haspel and Knotts 2005 ● ● ● Henshaw et al. 2004 ● ● Higgs and Martin 1995a ● ● ● ● ● ● Higgs and Martin 1995b ● ● ● ● ● ● ● Higgs and Richards 2002 ● ● ● Hild and Fritsch 1998 ● ● ● ● ● ● Hill 2000 ● ● ● ● ● ● ● Hill and Zheng 1999 ● ● ● ● ● ● ● Hill et al. 1999 ● ● ● ● ● ● ● Hurley et al. 2003 ● ● ● ● ● Hutchinson and Veenendall 2005a ● ● ● ● Hutchinson and Veenendall 2005b ● ● ● ● Johnson 1998a ● ● ● ● ● Johnson 1998b ● ● ● ● ● Jones et al. 2001 ● ● ● ● ● Karimi et al. 2004 ● ● ● ● Kennedy et al. 2003 ● ● ● ● Kim 2001 ● Kimler 2004 ● ● ● ● Krieger 1992 ● ● Krieger 2003 ● ● ● Krieger and Gordon 1999 ● ● Krieger et al. 1997 ● ● Krieger et al. 2001 ● ● ● Krieger et al. 2002a ● ● ● Krieger et al. 2002b ● ● ● Krieger et al. 2003 ● ● Krieger et al. 2005 ● ● Krieger et al. 2006 ● ● ● ● Kwok and Yankaskas 2001 ● ● ● Laender et al. 2005 ● ● ● ● ● ● ● Lam et al. 2002 ● ● ● ● ● Lee 2004 ● ● ● ● ● Lee and McNally 1998 ● ● ● ● ● Levesque 2003 ● ● ● ● ● ● ● ● ● ● ● Levine and Kim 1998 ● ● ● ● ● ● ● ●


rea Unit-Based dding Value emporalit Gazetteer Point-Based Line-Based A Imager Conflation A Resolution Spatial T Non-Spatial Attributes Li et al. 2002 ● ● ● Lind 2001 ● ● ● ● ● ● ● Lind 2005 ● ● ● ● ● ● ● Lovasi et al. 2007 ● ● ● ● ● ● Markowetz 2004 ● ● ● Markowetz et al. 2005 ● ● ● Martin 1998 ● ● ● ● ● ● ● Martin and Higgs 1996 ● ● ● ● ● Martins and Silva 2005 ● Martins et al. 2005a ● Martins et al. 2005b ● Mazumdar et al. 2008 ● ● ● ● ● ● ● ● McCurley 2001 ● ● ● ● ● ● McEathron et al. 2002 ● ● ● ● ● ● ● ● ● McElroy et al. 2003 ● ● ● ● ● Mechanda and Puderer 2007 ● ● ● ● ● ● ● Miner et al. 2005 ● Ming et al. 2005 ● ● ● ● ● ● Murphy and Armitage 2005 ● ● ● ● ● ● Nicoara 2005 ● ● ● ● ● ● ● ● Noaki and Arikawa 2005a ● ● ● Noaki and Arikawa 2005b ● ● ● Nuckols et al. 2004 ● ● ● ● ● O’Reagan and Saalfeld 1987 ● ● ● ● ● ● Oliver et al. 2005 ● ● ● ● Olligschlaeger 1998 ● ● ● ● ● ● ● ● Openshaw 1989 ● ● ● ● ● ● Oppong 1999 ● ● ● ● Paull 2003 ● ● ● ● ● ● ● Purves et al. 2005 ● ● ● Ratcliffe 2001 ● ● ● ● ● ● Ratcliffe 2004 ● ● ● ● ● Rauch et al. 2003 ● ● ● ● Reid 2003 ● ● ● ● ● Reinbacher 2006 ● ● ● ● ● Reinbacher et al. 2008 ● ● ● ● ● Revie and Kerfoot 1997 ● ● ● ● ● Riekert 2002 ● Rose et al. 2004 ● ● ● ● ● Rull et al. 2006 ● ● ● ● Rushton et al. 2006 ● ● ● ● ● ● Rushton et al. 2008b ● ● ●


rea Unit-Based dding Value emporalit Gazetteer Point-Based Line-Based A Imager Conflation A Resolution Spatial T Non-Spatial Attributes Schlieder et al. 2001 ● ● ● ● ● Schockaert et al. 2005 ● ● ● Schootman et al. 2004 ● ● ● ● ● Sheehan et al. 2000 ● ● ● ● Shi 2007 ● ● ● ● Smith and Crane 2001 ● Smith and Mann 2003 ● Smith et al. 1999 ● ● Soobader et al. 2001 ● ● Southall 2003 ● ● ● ● ● Stevenson et al. 2000 ● ● ● ● ● Strickland et al. 2007 ● ● ● ● ● ● Temple et al. 2005 ● ● ● ● ● ● ● ● Thrall 2006 ● ● Toral and Muñoz 2006 ● ● ● ● UN Economic Commission 2005 ● ● ● ● ● ● ● ● ● Vaid et al. 2005 ● ● ● ● Van Kreveld and Reinbacher 2004 ● Veregin 1999 ● ● ● ● Vestavik 2004 ● ● ● ● Vine et al. 1998 ● ● ● ● ● ● ● ● Vögele and Schlieder 2003 ● ● ● ● ● Vögele and Stuckenschmidt 2001 ● ● ● ● ● Vögele et al. 2003 ● ● ● ● ● Waldinger et al. 2003 ● ● ● ● Waller 2008 ● ● ● ● ● ● Walls 2003 ● ● ● ● ● ● Ward et al. 2005 ● ● ● ● ● Werner 1974 ● ● ● ● Whitsel et al. 2004 ● ● ● ● ● Whitsel et al. 2006 ● ● ● ● Wieczorek 2008 ● ● ● ● ● ● ● ● Wieczorek et al. 2004 ● ● ● ● ● ● ● ● Wilson, et al. 2004 ● ● ● ● ● Woodruff and Plaunt 1994 ● ● ● ● Wu et al. 2005 ● ● ● ● ● ● ● Yang et al. 2004 ● ● ● ● Yildirim and Yomralioglu 2004 ● Yu 1996 ● ● ● Zandbergen 2007 ● ● ● ● Zandbergen 2008 ● ● ● ● ● ● ● Zandbergen and Chakraborty 2006 ● ● ●


Zimmerman 2006 ● ● ● ● ● ● ●
Zimmerman 2008 ● ● ● ● ● ● ●
Zimmerman et al. 2007 ● ● ● ● ● ● ●
Zong et al. 2005 ● ●


Table 48 – Previous geocoding studies classified by topics of feature-matching approach

Columns – Matching Type: Deterministic, Probability-Based; Process: String Comparison, Relaxation; Accuracy: Match Type, Match Rate. A ● marks each topic a study addresses.

y Deterministic Probabilit String Comparison Relaxation Match Type Match Rate Abe and Stinchcomb 2008 ● ● ● ● ● ● Agouris et al. 2000 ● Alani 2001 ● Amitay et al. 2004 ● Arampatzis et al. 2006 ● ● ● Armstrong et al. 2008 ● Armstrong and Tiwari 2008 ● ● ● ● ● ● Bakshi et al. 2004 ● Beal 2003 ● ● ● Beaman et al. 2004 ● Beyer et al. 2008 ● Bichler and Balchak 2007 ● ● ● ● ● ● Bilhaut et al. 2003 ● ● Blakely and Salmond 2002 ● ● ● ● ● ● Block 1995 ● ● Bonner et al. 2003 ● ● Boscoe et al. 2002 ● ● Boscoe 2008 ● ● ● ● ● ● Bow et al. 2004 ● ● ● ● Cayo and Talbot 2003 ● Chavez 2000 ● Chen, M.F. et al. 1998 ● ● Chen, W et al. 2004 ● ● Chen et al. 2008 ● Chiang and Knoblock 2006 ● Chiang et al. 2005 ● Christen and Churches 2005 ● ● ● Christen et al. 2004 ● ● ● Chua 2001 ● ● ● ● Chung et al. 2004 Churches et al. 2002 ● ● ● ● ● ● Clough 2005 ● ● ● ● ● Davis Jr. 1993 ● ● Davis Jr. and Fonseca 2007 ● ● ● ● Davis Jr. et al. 2003 ● ● ● ● Dearwent et al. 2001 ● ● ● ● ● Densham and Reid 2003 ● ● Drummond 1995 ● ● ● ● ● ● Durr and Froggatt 2002 ● ● Efstathopoulos et al. 2005 ● ● Eichelberger 1993 ● ● El-Yacoubi et al. 2002 ●


y Deterministic Probabilit String Comparison Relaxation Match Type Match Rate Fu et al. 2005a ● ● ● Fu et al. 2005b ● ● ● Fulcomer et al. 1998 ● ● ● ● ● Gabrosek and Cressie 2002 ● Gilboa et al. 2006 ● ● ● ● ● Goldberg et al. 2007 ● ● ● ● ● ● Goodchild and Hunter 1997 Gregorio et al. 1999 ● ● ● ● ● Gregorio et al. 2005 ● ● ● ● ● Grubesic and Matisziw 2006 ● Grubesic and Murray 2004 ● ● ● ● ● ● Han et al. 2004 ● Hariharan and Toyama 2004 ● Haspel and Knotts 2005 ● ● Higgs and Martin 1995b ● ● ● Higgs and Richards 2002 ● ● Hill 2000 ● ● Hill and Zheng 1999 ● ● Hill et al. 1999 ● ● Hurley et al. 2003 ● ● Jaro 1984 ● ● ● ● ● ● Jaro 1989 ● ● ● ● ● ● Johnson 1998a ● ● ● ● Johnson 1998b ● ● ● ● Jones et al. 2001 ● ● Karimi et al. 2004 ● ● ● ● ● ● Kimler 2004 ● ● ● ● Krieger 1992 ● Krieger 2003 ● ● Krieger et al. 2001 ● ● Krieger et al. 2002a ● ● Krieger et al. 2002b ● ● Krieger et al. 2003 ● ● Krieger et al. 2005 ● ● Krieger et al. 2006 ● ● Kwok and Yankaskas 2001 ● ● Laender et al. 2005 ● ● ● ● ● Lam et al. 2002 ● Lee 2004 ● ● ● Lee and McNally 1998 ● Leidner 2004 ● ● ● Levine and Kim 1998 ● ● ● ● ● Li et al. 2002 ● ● Lind 2001 ●


y Deterministic Probabilit String Comparison Relaxation Match Type Match Rate Lind 2005 ● Lovasi et al. 2007 ● ● ● ● MacDorman and Gay 1999 ● Maizlish and Herrera 2005 ● ● ● ● ● ● Markowetz 2004 ● ● Markowetz et al. 2005 ● ● Martin and Higgs 1996 ● Martins and Silva 2005 ● ● Martins et al. 2005a ● ● Martins et al. 2005b ● ● Mazumdar et al. 2008 ● ● ● ● McCurley 2001 ● ● ● ● McElroy et al. 2003 ● ● ● ● ● ● Mechanda and Puderer 2007 ● ● Meyer et al. 2005 ● ● ● ● ● ● Michelson and Knoblock 2005 ● ● ● ● Ming et al. 2005 ● Murphy and Armitage 2005 ● ● ● ● ● ● Nicoara 2005 ● ● ● ● ● ● Nuckols et al. 2004 ● O’Reagan and Saalfeld 1987 ● ● ● ● ● ● Oliver et al. 2005 ● ● Olligschlaeger 1998 ● ● ● Paull 2003 ● Porter 1980 ● ● Purves et al. 2005 ● ● ● Raghavan et al. 1989 ● ● ● Ratcliffe 2001 ● ● ● Ratcliffe 2004 ● ● ● ● Rauch et al. 2003 ● ● ● Reid 2003 ● Reinbacher 2006 ● ● ● Reinbacher et al. 2008 ● ● ● Revie and Kerfoot 1997 ● Riekert 2002 ● Rose et al. 2004 ● ● Rull et al. 2006 ● Rushton et al. 2006 ● ● ● ● ● ● Rushton et al. 2008b ● Schootman et al. 2004 ● ● Schumacher 2007 ● ● ● ● ● ● Shi 2007 ● ● ● Soobader et al. 2001 ●


y Deterministic Probabilit String Comparison Relaxation Match Type Match Rate Strickland et al. 2007 ● ● ● ● Tezuka and Tanaka 2005 ● Thrall 2006 ● ● ● ● ● ● Vine et al. 1998 ● ● Waldinger et al. 2003 ● Waller 2008 ● Walls 2003 ● ● ● ● Ward et al. 2005 ● ● ● ● ● Whitsel et al. 2004 ● ● Whitsel et al. 2006 ● ● ● ● ● ● Wieczorek 2008 ● ● Wieczorek et al. 2004 ● ● Wilson, et al. 2004 ● Winkler 1995 ● ● ● ● ● ● Wong and Chuah 1994 ● ● ● ● ● ● Woodruff and Plaunt 1994 ● ● Wu et al. 2005 ● ● ● ● ● Yang et al. 2004 ● ● ● ● ● ● Yu 1996 ● ● ● ● ● ● Zandbergen 2007 ● ● Zandbergen 2008 ● ● ● ● ● ● Zandbergen and Chakraborty 2006 ● ● Zimmerman 2006 ● ● ● Zimmerman 2008 ● ● Zimmerman et al. 2007 ● ● ● Zimmerman et al. 2008 ● ● Zong et al. 2005 ● ● ●
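
The String Comparison column above traces back to the record-linkage similarity measures introduced by Jaro (1984, 1989). Below is a minimal sketch of the basic Jaro similarity, written for illustration rather than taken from any one study's implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro string similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only count as matches within this sliding window.
    window = max(len(s1), len(s2)) // 2 - 1
    s1_matched = [False] * len(s1)
    s2_matched = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == c:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    t1 = [c for c, m in zip(s1, s1_matched) if m]
    t2 = [c for c, m in zip(s2, s2_matched) if m]
    transpositions = sum(a != b for a, b in zip(t1, t2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3
```

For example, the classic pair "MARTHA"/"MARHTA" yields a similarity of 17/18 (about 0.944), reflecting one transposition among six matched characters.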


Table 49 – Previous geocoding studies classified by topics of feature interpolation method

Columns – Interpolation Type: Point-Based, Line-Based, Area Unit-Based; Accuracy: Point-Based, Line-Based, Area Unit-Based. A ● marks each topic a study addresses.

rea Unit-Based rea Unit-Based Point-Based Line-Based A Point-Based Line-Based A Abe and Stinchcomb 2008 ● ● ● ● ● ● Agouris et al. 2000 ● ● Agovino et al. 2005 ● Alani 2001 ● ● Amitay et al. 2004 ● ● Armstrong et al. 2008 ● ● ● ● Armstrong and Tiwari 2008 ● ● ● ● Bakshi et al. 2004 ● ● Beal 2003 ● ● ● ● ● ● Beyer et al. 2008 ● ● Bichler and Balchak 2007 ● ● Bilhaut et al. 2003 ● ● ● ● Boscoe et al. 2002 ● ● Boscoe 2008 ● ● ● ● ● Bow et al. 2004 ● ● ● ● ● ● Brody et al. 2002 ● ● ● Cayo and Talbot 2003 ● ● ● ● Chalasani et al. 2005 ● ● ● ● ● ● Chen, C.C. et al. 2003 ● ● Chen, C.C. et al. 2004 ● ● ● ● ● ● Chen, M.F. et al. 1998 ● ● Chen, W et al. 2004 ● ● Chen et al. 2008 ● ● ● Chiang and Knoblock 2006 ● ● ● ● Chiang et al. 2005 ● ● ● ● Christen and Churches 2005 ● ● Christen et al. 2004 ● ● Chua 2001 ● ● ● ● ● ● Chung et al. 2004 ● ● ● ● Churches et al. 2002 ● ● Clough 2005 ● ● ● ● Collins et al. 1998 ● ● ● ● ● ● Cressie and Kornak 2003 ● ● ● Curtis et al. 2006 ● ● ● ● ● ● Davis Jr. 1993 ● ● ● ● ● ● Davis Jr. and Fonseca 2007 ● ● ● ● ● ● Davis Jr. et al. 2003 ● ● ● ● ● ● Dearwent et al. 2001 ● ● ● ● Densham and Reid 2003 ● ● Drummond 1995 ● ● ● Dueker 1974 ● ● ● Durr and Froggatt 2002 ● ● ● ● ● ●


rea Unit-Based rea Unit-Based Point-Based Line-Based A Point-Based Line-Based A Fortney et al. 2000 ● ● ● ● Fu et al. 2005a ● ● ● ● Fu et al. 2005b ● ● ● ● Fulcomer et al. 1998 ● ● Gatrell 1989 ● ● Gilboa et al. 2006 ● ● Goldberg et al. 2007 ● ● ● ● ● ● Goodchild and Hunter 1997 ● ● ● ● Gregorio et al. 1999 ● ● ● Gregorio et al. 2005 ● ● Grubesic and Matisziw 2006 ● ● ● Grubesic and Murray 2004 ● ● Han et al. 2004 ● ● Haspel and Knotts 2005 ● Henshaw et al. 2004 ● ● Higgs and Martin 1995b ● ● ● Higgs and Richards 2002 ● ● ● ● Hill 2000 ● ● ● Hill and Zheng 1999 ● ● ● Hill et al. 1999 ● ● ● Hurley et al. 2003 ● ● ● ● Hutchinson and Veenendall 2005a ● ● ● Hutchinson and Veenendall 2005b ● ● ● Johnson 1998a ● ● ● Johnson 1998b ● ● ● Karimi et al. 2004 ● ● ● ● Kennedy et al. 2003 ● Kimler 2004 ● ● Krieger et al. 2001 ● ● Krieger et al. 2002a ● ● Krieger et al. 2002b ● ● Krieger et al. 2003 ● ● Krieger et al. 2005 ● ● Krieger et al. 2006 ● ● ● ● Kwok and Yankaskas 2001 ● ● Laender et al. 2005 ● ● ● Lam et al. 2002 ● ● ● ● Lee 2004 ● ● Lee and McNally 1998 ● ● Levine and Kim 1998 ● ● ● ● Li et al. 2002 ● ● ● ● Lind 2001 ● ● ● Lind 2005 ● ● ● Lovasi et al. 2007 ● ● ● ●


rea Unit-Based rea Unit-Based Point-Based Line-Based A Point-Based Line-Based A Markowetz 2004 ● ● Markowetz et al. 2005 ● ● Martin 1998 ● ● ● ● Mazumdar et al. 2008 ● ● ● ● ● ● McCurley 2001 ● ● ● ● ● ● McEathron et al. 2002 ● ● ● ● ● ● McElroy et al. 2003 ● ● ● ● Miner et al. 2005 ● Ming et al. 2005 ● ● ● ● Murphy and Armitage 2005 ● ● Nicoara 2005 ● ● ● Noaki and Arikawa 2005a ● Noaki and Arikawa 2005b ● Oliver et al. 2005 ● ● ● Olligschlaeger 1998 ● ● Ratcliffe 2001 ● ● ● ● Ratcliffe 2004 ● ● ● ● Rauch et al. 2003 ● ● Reid 2003 ● ● Reinbacher 2006 ● ● ● ● Reinbacher et al. 2008 ● ● ● ● Revie and Kerfoot 1997 ● ● ● ● Rose et al. 2004 ● ● ● ● Rull et al. 2006 ● Rushton et al. 2006 ● ● ● ● ● ● Rushton et al. 2008b ● ● Sadahiro 2000 ● ● Schlieder et al. 2001 ● ● ● ● ● ● Schockaert et al. 2005 ● ● Schootman et al. 2004 ● ● ● ● Sheehan et al. 2000 ● ● ● ● Shi 2007 ● ● ● ● Southall 2003 ● Stevenson et al. 2000 ● ● Strickland et al. 2007 ● ● ● ● ● ● Thrall 2006 ● ● Tobler 1972 ● ● ● Van Kreveld and Reinbacher 2004 ● ● Vine et al. 1998 ● ● Vögele and Schlieder 2003 ● ● ● ● ● ● Vögele and Stuckenschmidt 2001 ● ● ● ● ● ● Vögele et al. 2003 ● ● ● ● ● ● Waller 2008 ● ● ● ● ● ● Walls 2003 ● ● ● ●


rea Unit-Based rea Unit-Based Point-Based Line-Based A Point-Based Line-Based A Ward et al. 2005 ● ● ● ● Whitsel et al. 2004 ● ● ● ● Whitsel et al. 2006 ● ● ● ● Wieczorek 2008 ● ● ● ● ● ● Wieczorek et al. 2004 ● ● ● ● ● ● Wilson, et al. 2004 ● ● ● ● Woodruff and Plaunt 1994 ● ● Wu et al. 2005 ● ● Yang et al. 2004 ● ● ● ● Yu 1996 ● ● Zandbergen 2007 ● ● ● ● Zandbergen 2008 ● ● ● ● ● ● Zandbergen and Chakraborty 2006 ● ● ● Zimmerman 2006 ● ● ● ● Zimmerman 2008 ● ● ● Zimmerman et al. 2007 ● ● ● ● Zong et al. 2005 ● ●
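
Line-Based interpolation, the most commonly evaluated method in the studies above, places an address proportionally along a street segment according to the segment's address range. A minimal sketch follows; the coordinates and address ranges are illustrative assumptions.

```python
# Minimal sketch of line-based feature interpolation: a house number is placed
# proportionally along a street segment between its endpoint coordinates,
# according to where it falls in the segment's address range.

def interpolate(number: int, low: int, high: int,
                start: tuple[float, float],
                end: tuple[float, float]) -> tuple[float, float]:
    """Linearly interpolate an address number between a segment's endpoints."""
    if not low <= number <= high:
        raise ValueError("address number outside the segment's range")
    # Degenerate range: place the point at the segment midpoint.
    fraction = (number - low) / (high - low) if high != low else 0.5
    return (start[0] + fraction * (end[0] - start[0]),
            start[1] + fraction * (end[1] - start[1]))

# 150 falls halfway through the 100-200 block, so halfway along the segment.
print(interpolate(150, 100, 200, (-118.30, 34.10), (-118.28, 34.12)))
```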


Table 50 – Previous geocoding studies classified by topics of accuracy measures utilized

Columns – Accuracy Measures: Spatial, Temporality, Resolution, Bias Introduction, Quality Codes; Estimates: Spatial, Temporality, Resolution. A ● marks each topic a study addresses.

emporalit emporalit Spatial T Resolution Bias Introduction Quality Codes Spatial T Resolution Abe and Stinchcomb 2008 ● ● ● ● ● ● ● ● Agouris et al. 2000 ● ● ● ● ● ● Agovino et al. 2005 ● ● ● Alani 2001 ● ● ● ● Alani et al. 2003 ● ● ● Amitay et al. 2004 ● ● Arampatzis et al. 2006 ● ● ● ● Arikawa et al. 2004 ● ● ● ● ● ● Armstrong et al. 2008 ● ● ● ● Armstrong and Tiwari 2008 ● ● ● ● Axelrod 2003 ● ● ● Bakshi et al. 2004 ● ● Beal 2003 ● ● ● Beaman et al. 2004 ● ● Berney and Blane 1997 ● ● Beyer et al. 2008 ● ● ● ● ● ● Bichler and Balchak 2007 ● ● ● Bilhaut et al. 2003 ● ● ● ● Blakely and Salmond 2002 ● Bonner et al. 2003 ● ● ● ● ● ● ● Boscoe et al. 2002 ● Boscoe 2008 ● ● ● ● ● ● Bow et al. 2004 ● ● ● ● ● Brody et al. 2002 ● ● ● ● ● ● ● ● Casady 1999 ● ● ● ● ● ● Cayo and Talbot 2003 ● ● ● ● Chalasani et al. 2005 ● ● Chavez 2000 ● ● ● Chen, C.C. et al. 2003 ● ● Chen, C.C. et al. 2004 ● ● Chen, W et al. 2004 ● ● ● ● ● Chen et al. 2008 ● ● ● ● Christen and Churches 2005 ● ● Christen et al. 2004 ● ● Chua 2001 ● Churches et al. 2002 ● ● ● Clough 2005 ● ● ● ● Collins et al. 1998 ● ● ● ● ● Cressie and Kornak 2003 ● ● ● ● Curtis et al. 2006 ● ● ● ● Davis Jr. 1993 ● ● ● ● ● Davis Jr. and Fonseca 2007 ● ● ● ● ● ●


emporalit emporalit Spatial T Resolution Bias Introduction Quality Codes Spatial T Resolution Davis Jr. et al. 2003 ● ● ● ● ● ● Dearwent et al. 2001 ● ● ● ● ● Diez-Roux et al. 2001 ● ● Dru and Saada 2001 ● ● Drummond 1995 ● ● ● ● ● ● ● Dueker 1974 ● ● Durr and Froggatt 2002 ● ● ● ● ● Fonda-Bonardi 1994 ● ● Foody 2003 ● ● ● Fortney et al. 2000 ● ● ● ● ● ● Fremont et al. 2005 ● ● Frew et al. 1998 ● ● ● ● ● ● ● Fu et al. 2005a ● ● ● ● Fu et al. 2005b ● ● ● ● Fulcomer et al. 1998 ● ● ● ● ● ● ● Gabrosek and Cressie 2002 ● ● ● ● ● Gaffney et al. 2005 ● ● Gatrell 1989 ● ● ● ● Geronimus and Bound 1998 ● ● ● Geronimus and Bound 1999a ● ● ● Geronimus and Bound 1999b ● ● ● Geronimus et al. 1995 ● ● ● Gilboa et al. 2006 ● ● ● Goldberg et al. 2007 ● ● ● ● ● ● ● ● Goodchild and Hunter 1997 ● ● ● ● Gregorio et al. 1999 ● ● ● ● Gregorio et al. 2005 ● ● ● ● Grubesic and Matisziw 2006 ● ● ● ● ● Grubesic and Murray 2004 ● ● ● Han et al. 2004 ● ● ● ● Han et al. 2005 ● ● Hariharan and Toyama 2004 ● ● ● ● ● ● Haspel and Knotts 2005 ● Henshaw et al. 2004 ● ● ● ● Higgs and Martin 1995a Higgs and Martin 1995b ● ● Higgs and Richards 2002 ● ● ● ● ● Hill 2000 ● ● ● ● ● ● Hill and Zheng 1999 ● ● ● ● ● ● Hill et al. 1999 ● ● ● ● ● ● Himmelstein 2005 ● ● ● ● Hurley et al. 2003 ● ● ● ● ● ● Hutchinson and Veenendall 2005a ● ● Hutchinson and Veenendall 2005b ● ●


emporalit emporalit Spatial T Resolution Bias Introduction Quality Codes Spatial T Resolution Jones et al. 2001 ● ● ● ● Karimi et al. 2004 ● ● ● ● ● Kennedy et al. 2003 ● ● ● ● Kimler 2004 ● ● ● ● Krieger 1992 ● ● ● Krieger 2003 ● ● ● ● Krieger and Gordon 1999 ● ● ● Krieger et al. 1997 ● Krieger et al. 2001 ● Krieger et al. 2002a ● ● ● ● ● ● Krieger et al. 2002b ● ● ● ● ● Krieger et al. 2003 ● ● ● Krieger et al. 2005 ● ● ● Krieger et al. 2006 ● ● ● ● Kwok and Yankaskas 2001 ● ● ● Laender et al. 2005 ● ● Lam et al. 2002 ● ● ● ● Lee 2004 ● ● ● ● ● Lee and McNally 1998 ● ● ● Levesque 2003 ● ● ● ● ● ● Levine and Kim 1998 ● ● ● ● ● Li et al. 2002 ● ● ● ● Lind 2001 ● ● Lind 2005 ● ● Lovasi et al. 2007 ● ● ● ● ● ● ● ● Markowetz 2004 ● ● Markowetz et al. 2005 ● ● Martin 1998 ● ● ● ● ● ● ● Martin and Higgs 1996 ● ● ● ● Martins et al. 2005b ● ● ● ● Mazumdar et al. 2008 ● ● ● ● ● ● McCurley 2001 ● McEathron et al. 2002 ● ● McElroy et al. 2003 ● ● ● ● ● ● ● Mechanda and Puderer 2007 ● ● ● ● ● ● ● Murphy and Armitage 2005 ● ● ● Nicoara 2005 ● ● Noaki and Arikawa 2005a ● Noaki and Arikawa 2005b ● Nuckols et al. 2004 ● ● ● O’Reagan and Saalfeld 1987 ● ● ● ● Oliver et al. 2005 ● ● ● ● ● Olligschlaeger 1998 ● ● Oppong 1999 ● ● ●


emporalit emporalit Spatial T Resolution Bias Introduction Quality Codes Spatial T Resolution Paull 2003 ● ● ● Purves et al. 2005 ● ● ● ● Ratcliffe 2001 ● ● ● ● Ratcliffe 2004 ● ● ● ● ● Rauch et al. 2003 ● ● ● Reid 2003 ● ● Reinbacher 2006 ● ● ● ● Reinbacher et al. 2008 ● ● ● ● Revie and Kerfoot 1997 ● ● ● ● ● Rose et al. 2004 ● ● ● ● ● ● ● ● Rull et al. 2006 ● ● ● ● Rushton et al. 2006 ● ● ● ● ● ● ● ● Rushton et al. 2008b ● ● Sadahiro 2000 ● Schlieder et al. 2001 ● ● ● ● Schockaert et al. 2005 ● ● ● ● Schootman et al. 2004 ● ● ● ● ● ● Sheehan et al. 2000 ● ● ● ● ● ● Shi 2007 ● ● ● ● ● ● Smith et al. 1999 ● ● ● Soobader et al. 2001 ● ● ● Southall 2003 ● ● Stevenson et al. 2000 ● ● Strickland et al. 2007 ● ● ● ● Temple et al. 2005 ● ● ● ● ● ● Thrall 2006 ● Vaid et al. 2005 ● ● Vestavik 2004 ● ● ● Vine et al. 1998 ● ● ● ● Vögele and Schlieder 2003 ● ● ● ● Vögele and Stuckenschmidt 2001 ● ● ● ● Vögele et al. 2003 ● ● ● ● Waldinger et al. 2003 ● ● ● ● Waller 2008 ● ● ● ● Walls 2003 ● ● ● Ward et al. 2005 ● ● ● ● ● Werner 1974 ● Whitsel et al. 2004 ● ● ● ● ● Whitsel et al. 2006 ● ● ● ● ● ● Wieczorek 2008 ● ● ● ● ● Wieczorek et al. 2004 ● ● ● ● ● Wilson, et al. 2004 ● ● ● ● Woodruff and Plaunt 1994 ● ● ● ● Wu et al. 2005 ● ● ● ● ● ●


emporalit emporalit Spatial T Resolution Bias Introduction Quality Codes Spatial T Resolution Yang et al. 2004 ● ● ● Yu 1996 ● ● ● ● Zandbergen 2007 ● ● ● ● ● Zandbergen 2008 ● ● ● ● ● ● Zandbergen and Chakraborty 2006 ● ● ● Zimmerman 2006 ● ● ● ● ● Zimmerman 2008 ● ● ● ● ● ● ● ● Zimmerman et al. 2007 ● ● ● ● ● Zimmerman et al. 2008 ● ● ● ●


Table 51 – Previous geocoding studies classified by topics of process used

Columns – Manual Process: GPS, Maps / Imagery, Raster Data, Supplemental; Automated Process: Batch-Mode, Single-Mode. A ● marks each topic a study addresses.

GPS Maps / Imagery Raster Data Supplemental Batch-Mode Single-Mode Abe and Stinchcomb 2008 ● ● ● ● ● Agouris et al. 2000 ● ● Agovino et al. 2005 ● ● Arampatzis et al. 2006 ● Armstrong and Tiwari 2008 ● ● Bakshi et al. 2004 ● ● Beal 2003 ● Beaman et al. 2004 ● ● Beyer et al. 2008 ● Bichler and Balchak 2007 ● ● Bilhaut et al. 2003 ● Bonner et al. 2003 ● ● Boscoe et al. 2002 ● ● Boscoe 2008 ● ● ● Bow et al. 2004 ● Brody et al. 2002 ● ● Cayo and Talbot 2003 ● ● ● Chalasani et al. 2005 ● ● ● ● ● Chavez 2000 Chen, C.C. et al. 2003 ● ● Chen, C.C. et al. 2004 ● ● ● Chen, W et al. 2004 ● Christen and Churches 2005 ● ● Christen et al. 2004 ● ● Clough 2005 ● Curtis et al. 2006 ● ● ● Dao et al. 2002 ● Davis Jr. 1993 ● ● Dearwent et al. 2001 ● ● Dru and Saada 2001 ● Drummond 1995 ● ● Dueker 1974 ● Durr and Froggatt 2002 ● ● ● Fortney et al. 2000 ● ● ● Frew et al. 1998 ● Fulcomer et al. 1998 ● Gilboa et al. 2006 ● Goldberg et al. 2007 ● ● ● Gregorio et al. 1999 ● Gregorio et al. 2005 ● Han et al. 2004 ● ● ●


GPS Maps / Imagery Raster Data Supplemental Batch-Mode Single-Mode Haspel and Knotts 2005 ● Henshaw et al. 2004 ● ● Higgs and Martin 1995b ● ● Hill 2000 ● Hill and Zheng 1999 ● Hill et al. 1999 ● Hurley et al. 2003 ● ● ● Hutchinson and Veenendall 2005a ● Hutchinson and Veenendall 2005b ● Karimi et al. 2004 ● ● ● Kennedy et al. 2003 ● ● ● Krieger 1992 ● Krieger 2003 ● ● ● Krieger et al. 2001 ● Krieger et al. 2002a ● Krieger et al. 2002b ● Krieger et al. 2003 ● Krieger et al. 2005 ● Krieger et al. 2006 ● Kwok and Yankaskas 2001 ● ● Lee 2004 ● ● Lee and McNally 1998 ● Levesque 2003 ● Levine and Kim 1998 ● ● Lovasi et al. 2007 ● ● ● MacDorman and Gay 1999 ● Mazumdar et al. 2008 ● ● ● McEathron et al. 2002 ● ● ● ● McElroy et al. 2003 ● ● ● Mechanda and Puderer 2007 ● ● Ming et al. 2005 ● Nicoara 2005 ● Olligschlaeger 1998 ● Purves et al. 2005 ● Ratcliffe 2001 ● ● Rauch et al. 2003 ● Rose et al. 2004 ● ● ● Rushton et al. 2006 ● ● ● ● Rushton et al. 2008b ● Schootman et al. 2004 ● ● ● Strickland et al. 2007 ● ● ● Temple et al. 2005 ● ● ● ●


GPS Maps / Imagery Raster Data Supplemental Batch-Mode Single-Mode Thrall 2006 ● Vine et al. 1998 ● ● ● ● ● Ward et al. 2005 ● ● ● ● ● Whitsel et al. 2004 ● Whitsel et al. 2006 ● ● Wieczorek 2008 ● ● ● ● ● Wieczorek et al. 2004 ● ● ● ● ● Wu et al. 2005 ● Yang et al. 2004 ● ● Zandbergen 2007 ● Zandbergen 2008 ● ● Zandbergen and Chakraborty 2006 ● Zimmerman 2006 ● ● ● ● Zimmerman et al. 2007 ● ● ● ●


Table 52 – Previous geocoding studies classified by topics of privacy concern and/or method

Columns – Privacy Type: Data Leak, Self-Identifying; Process: Masking, Randomization, Aggregation. A ● marks each topic a study addresses.

g -Identifying f ggregation Data Lea Sel Maskin Randomization A Arikawa et al. 2004 ● ● ● ● ● Armstrong et al. 1999 ● ● ● ● Beyer et al. 2008 ● ● Boscoe et al. 2004 ● ● ● Boulos 2004 ● ● ● ● Brownstein et al. 2006 ● ● Casady 1999 ● ● ● Chen et al. 2008 ● ● ● ● Christen and Churches 2005 ● ● Churches et al. 2002 ● ● ● ● Croner 2003 ● ● ● Curtis et al. 2006 ● ● ● ● Dao et al. 2002 ● ● Gittler 2008a ● ● ● Goldberg et al. 2007 ● ● ● ● ● MacDorman and Gay 1999 ● Mazumdar et al. 2008 ● ● ● ● Miner et al. 2005 ● ● Oppong 1999 ● ● ● Rushton et al. 2006 ● ● ● ● Rushton et al. 2008b ● ● Stevenson et al. 2000 ● ● Sweeney 2002 ● ● ● ● ● Vine et al. 1998 ● ● ● Waller 2008 ● Zimmerman et al. 2008 ● ● ● ● ●


Table 53 – Previous geocoding studies classified by topics of organizational cost

Columns – Cost: Obtaining Reference Data, Per-Geocode, Manpower. A ● marks each topic a study addresses.

Obtaining Reference Data Per-Geocode Manpower Beal 2003 ● ● Boscoe et al. 2002 ● ● Boscoe et al. 2004 ● Davis Jr. 1993 ● ● ● Johnson 1998a ● ● Johnson 1998b ● ● Krieger 1992 ● Krieger 2003 ● ● Krieger et al. 2001 ● Martin and Higgs 1996 ● McElroy et al. 2003 ● ● ● Miner et al. 2005 ● Strickland et al. 2007 ● ● Temple et al. 2005 ● Thrall 2006 ● ● ● Whitsel et al. 2004 ● Whitsel et al. 2006 ● ●


North American Association of Central Cancer Registries, Inc.
2121 W. White Oaks Drive, Suite B
Springfield, IL 62704
Phone: 217.698.0800
Fax: 217.698.0188
[email protected]
www.naaccr.org