A Geocoding Best Practices Guide
Total Page:16
File Type:pdf, Size:1020Kb
A Geocoding Best Practices Guide By Daniel W. Goldberg November 2008 University of Southern California GIS Research Laboratory SPONSORING ORGANIZATIONS: Canadian Association of Provincial Cancer Agencies Canadian Partnership Against Cancer Centers for Disease Control and Prevention College of American Pathologists National Cancer Institute National Cancer Registrars Association Public Health Agency of Canada SPONSORS WITH DISTINCTION: American Cancer Society American College of Surgeons American Joint Committee on Cancer North American Association of Central Cancer Registries, Inc. A GEOCODING BEST PRACTICES GUIDE SUBMITTED TO THE NORTH AMERICAN ASSOCIATION OF CENTRAL CANCER REGISTRIES NOVEMBER 10, 2008 BY DANIEL W. GOLDBERG UNIVERSITY OF SOUTHERN CALIFORNIA GIS RESEARCH LABORATORY This page is left blank intentionally. D. W. Goldberg TABLE OF CONTENTS List of Tables .............................................................................................................. vii List of Figures .............................................................................................................. ix List of Equations .......................................................................................................... x List of Best Practices ................................................................................................... xi List of Acronyms ........................................................................................................ xiii Foreward .................................................................................................................... xiv Preface ......................................................................................................................... xv Acknowledgements .................................................................................................. xviii Dedication .................................................................................................................. xix About This Document ................................................................................................ xx Executive Summary ................................................................................................. xxiii Part 1: The Concept and Context of Geocoding ........................................................... 1 1. Introduction ............................................................................................................................. 3 1.1 What is Geocoding?....................................................................................................... 3 2. The Importance of Geocoding ............................................................................................. 9 2.1 Geocoding's Importance to Hospitals and Central Registries ................................ 9 2.2 Typical Research Workflow ......................................................................................... 9 2.3 When To Geocode ...................................................................................................... 14 2.4 Success Stories .............................................................................................................. 17 3. Geographic Information Science Fundamentals .............................................................. 19 3.1 Geographic Data Types .............................................................................................. 19 3.2 Geographic Datums and Geographic Coordinates ................................................ 19 3.3 Map Projections and Regional Reference Systems ................................................. 20 Part 2: The Components of Geocoding ...................................................................... 23 4. Address Geocoding Process Overview ............................................................................. 25 4.1 Types of Geocoding Processes .................................................................................. 25 4.2 High-Level Geocoding Process Overview ............................................................... 25 4.3 Software-Based Geocoders ........................................................................................ 26 4.4 Input Data ..................................................................................................................... 28 4.5 Reference Datasets ...................................................................................................... 31 4.6 The Geocoding Algorithm ......................................................................................... 32 4.7 Output Data .................................................................................................................. 33 4.8 Metadata ........................................................................................................................ 34 November 10, 2008 iii A Geocoding Best Practices Guide 5. Address Data ......................................................................................................................... 37 5.1 Types of Address Data ................................................................................................ 37 5.2 First-Order Estimates .................................................................................................. 41 5.3 Postal Address Hierarchy............................................................................................ 41 6. Address Data Cleaning Processes ...................................................................................... 45 6.1 Address Cleanliness ..................................................................................................... 45 6.2 Address Normalization ............................................................................................... 45 6.3 Address Standardization ............................................................................................. 50 6.4 Address Validation ....................................................................................................... 51 7. Reference Datasets ............................................................................................................... 55 7.1 Reference Dataset Types ............................................................................................ 55 7.2 Types of Reference Datasets ...................................................................................... 55 7.3 Reference Dataset Relationships ............................................................................... 65 8. Feature Matching .................................................................................................................. 69 8.1 The Algorithm .............................................................................................................. 69 8.2 Classifications of Matching Algorithms .................................................................... 71 8.3 Deterministic Matching ............................................................................................... 71 8.4 Probabilistic Matching ................................................................................................. 78 8.5 String Comparison Algorithms .................................................................................. 80 9. Feature Interpolation ............................................................................................................ 83 9.1 Feature Interpolation Algorithms .............................................................................. 83 9.2 Linear-Based Interpolation ......................................................................................... 83 9.3 Areal Unit-Based Feature Interpolation ................................................................... 90 10. Output Data ........................................................................................................................ 93 10.1 Downstream Compatibility ...................................................................................... 93 10.2 Data Loss .................................................................................................................... 93 Part 3: The Many Metrics for Measuring Quality ...................................................... 95 11. Quality Metrics .................................................................................................................... 97 11.1 Accuracy ...................................................................................................................... 97 12. Spatial Accuracy .................................................................................................................. 99 12.1 Spatial Accuracy Defined .......................................................................................... 99 12.2 Contributors to Spatial Accuracy ............................................................................ 99 12.3 Measuring Positional Accuracy .............................................................................. 104 12.4 Geocoding Process Component Error Introduction ......................................... 104 12.5 Uses of Positional Accuracy ................................................................................... 105 13. Reference Data Quality .................................................................................................... 111 13.1 Spatial Accuracy of Reference Data .....................................................................