A NEW APPROACH FOR GEOCODING POSTAL CODE-BASED DATA IN HEALTH RELATED STUDIES

by

Andrei Rosu

A thesis submitted to the Department of Geography
In conformity with the requirements for the degree of Master of Science

Queen’s University
Kingston, Ontario, Canada
(September, 2014)

Copyright ©Andrei Rosu, 2014

Abstract

Geocoding involves the conversion of textual addresses or names of places into digital coordinates. Health researchers often use geocoding to study the spatial distribution of populations based on a certain health outcome. To conduct any form of geocoding, health researchers generally require address-based data (e.g. street addresses) that are commonly obtained through survey questionnaires or hospital registries. Due to Canadian privacy and confidentiality laws, high-precision addresses must be masked or aggregated to coarser geographies. In Canada, most health studies adopt postal codes, as they are widely available and accessible. Traditionally, postal codes are geocoded using the Statistics Canada geocoding methodology, which links postal codes to geographic representation points. However, this approach can lead to biased results in accessibility or spatial pattern analysis studies, as postal codes (particularly those in rural areas) are at times displaced far from actual residences. This research therefore introduces a new postal code geocoding approach that can potentially improve upon the traditional approach by considering the land use within postal code boundaries. Using two study areas (the City of Kingston and the province of Ontario), the new and traditional approaches were compared to determine which of the two better represents populations (based on residential location) at the postal code geography. Results showed that the new approach significantly improved how populations are represented in rural areas, with minimal improvements for urban areas.
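The contrast drawn above, between a single representation point that ignores land use and a point placed with regard to where people actually live, can be illustrated with a minimal sketch. This is not the thesis's implementation: the coordinates, the population counts, the use of a population-weighted mean of residential parcel centroids, and the function names `weighted_point` and `mean_distance` are all assumptions chosen for illustration, and distances are treated as planar (in km).

```python
from math import hypot


def weighted_point(points, weights):
    """Return the weighted mean (x, y) of a list of planar points."""
    total = sum(weights)
    x = sum(p[0] * w for p, w in zip(points, weights)) / total
    y = sum(p[1] * w for p, w in zip(points, weights)) / total
    return (x, y)


def mean_distance(origin, points, weights):
    """Return the weighted mean straight-line distance from origin to points."""
    total = sum(weights)
    return sum(
        hypot(p[0] - origin[0], p[1] - origin[1]) * w
        for p, w in zip(points, weights)
    ) / total


# Hypothetical rural postal code covering a 10 km x 10 km square.
# A point that ignores land use is assumed to sit at the centre of the area.
boundary_centre = (5.0, 5.0)

# Hypothetical residential parcels clustered in one corner, with populations.
residential_centroids = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
residential_population = [40, 40, 20]

# A land-use-aware point: population-weighted mean of the residential parcels.
land_use_point = weighted_point(residential_centroids, residential_population)

# Compare how far residents are, on average, from each candidate point.
error_centre = mean_distance(
    boundary_centre, residential_centroids, residential_population
)
error_land_use = mean_distance(
    land_use_point, residential_centroids, residential_population
)

print("land-use-aware point:", land_use_point)
print("mean distance from boundary centre (km):", round(error_centre, 2))
print("mean distance from land-use-aware point (km):", round(error_land_use, 2))
```

In this toy setting the land-use-aware point lies much closer to the residences than the centre of the area does, which is the kind of rural displacement the abstract describes. Real data would require projected coordinates and actual postal code and land-use boundaries.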
The impact of the new approach was also examined using population accessibility to medical clinics in the City of Kingston. The level of impact was based on the amount of population that was misallocated (by the two approaches) to non-nearest medical clinics. No significant difference was found between the results of the two approaches, with the new approach misallocating approximately the same amount of population (in both urban and rural areas) as the traditional approach. A larger study area that incorporates a higher number of rural postal codes (where the new approach has a higher geocoding positional accuracy) is suggested for future research.

Acknowledgements

I would like to wholeheartedly thank my supervisor Dr. Dongmei Chen for her incredible support, patience and advice. Dr. Chen showed great faith in my abilities and motivated me to do my best. I also admired how Dr. Chen was always a few steps ahead, always knowing what to do next; it was just up to me to put in the work. Her determination, confidence, vision and enthusiasm are qualities I will strive to emulate.

Additionally, I would like to thank my work supervisors, Dr. Ian Janssen and Dr. William Pickett, for their kindness, patience and understanding with regard to my graduate studies. I am very grateful for their support.

I would like to thank Dr. Masroor Hussain and Dr. Chen’s lab, who kept me in touch with my graduate work. I very much appreciate it when others show the same passion for geography, and I am glad that as a group we were able to share ideas and experiences related to our graduate studies and life.

Additionally, I would like to thank Afshin Vafaei, who provided a lot of assistance with the statistical component of my master’s. He was very helpful in advising me on which statistical analyses would be most appropriate for my two studies. Thanks to Afshin, I also rediscovered my passion for learning statistics, especially conceptual statistics.
Furthermore, I would like to thank Dr. Janssen’s lab for being great role models for me; they have all worked very hard and shown great commitment towards their graduate work, which kept me motivated with my own. I would also like to thank Mr. Sean Howard, who provided me with a lot of information on postal code geocoding methodologies that are currently used in the industry. Also, thanks to Luis Hernandez, who offered me some key ideas for my master’s work, especially for my first analysis. I am also thankful to Dr. Jason Gilliland, who provided me with an important reference for my thesis. Additionally, I would like to thank Ian Smith (Niagara College professor), who was the first to encourage me to pursue a master’s in Geography at Queen’s!

Finally, I would like to thank my family and friends; their support has always been very important to me.

Table of Contents

Abstract
Acknowledgements
List of Figures
List of Tables
List of Abbreviations
Chapter 1 Introduction
Chapter 2 Literature Review
  2.1 Introduction: Defining Geocoding
  2.2 Geocoding Approaches and Conceptualization
  2.3 Privacy and Confidentiality Issues
  2.4 Structure of Postal Code in Canada
  2.5 Traditional Approaches to Postal Code Geocoding
    2.5.1 How Representation Points are Calculated for Geographic Areas
  2.6 Positional Error Involved in Postal Code Geocoding
  2.7 Impact of Positional Error in Accessibility based Studies in Health
  2.8 Summary
Chapter 3 Proposing a New Postal Code Geocoding Approach for Health Studies
  3.1 Conceptualizations and Implementation
Chapter 4 Methodology
  4.1 Comparing the proposed approach to the traditional approach
    4.1.1 Study areas and materials
    4.1.2 Topology check and data integrity
    4.1.3 Analysis
  4.2 Impact of the Proposed Approach on Accessibility Studies in Health
    4.2.1 Introduction
    4.2.2 Study Area and Materials
    4.2.3 Analysis
Chapter 5 Results
  5.1 Results: Comparing the Proposed to the Traditional Approach
  5.2 Results: Impact of the Proposed Approach on Accessibility Studies in Health
Chapter 6 Discussion and Conclusions
  6.1 Key Findings
  6.2 Meaning of Findings
  6.3 Contributions of this research and future research directions
  6.4 Final Conclusions
References
Appendix

List of Figures

Figure 2.1: An example of converting descriptive location data into geographic coordinates.
Figure 2.2: Forces that contributed to the evolution of geocoding in public health practices.
Figure 2.3: An illustration of a digital road network used for geocoding addresses in ArcGIS.
Figure 2.4: Example of different levels of census geographies that postal codes are geocoded to.
Figure 2.5: An illustration of how postal codes are linked to census geographic areas.
Figure 2.5.1: An illustration of how postal code data is geocoded using the Postal Code Conversion File (PCCF).
Figure 2.5.2: A mean weighted formula used for calculating the representation point for a dissemination area.
Figure 2.5.3: A minimum squared distance formula for calculating dissemination area representative points.
Figure 2.6: Geographic representation points used for calculating nearest distances to healthcare facilities.
Figure 3.1: An illustration of how the dissemination block population density is aggregated into the postal code boundary.
Figure 3.2: An example of how population from the postal code is aggregated into the residential land-use boundaries.
Figure 3.3: An example of how dissemination blocks are overlaid with the postal code boundary.
Figure 3.4: An example of how the proposed approach postal code point is created based on the use of residential land-use boundaries.
Figure 3.5: An example of how the proposed approach postal code point is created based on the use of dwelling boundaries.
Figure 4.1.1: The postal code boundaries in the province of Ontario study area.
Figure 4.1.2: Study area based in the west side of the City of Kingston, ON.
Figure 4.1.3: Materials used within the west Kingston study area.
Figure 4.1.4: An example of measuring the distance between the proposed and traditional approach postal code points and residential areas.
Figure 4.2.1: An example of how the proposed approach postal code points are assigned to the nearest clinic.
Figure 4.2.2: An example of how populations are misallocated when they are aggregated to nearest proposed approach postal code points.
Figure 4.2.3: An example of how populations are misallocated when they are aggregated to the nearest proposed approach postal code points.
Figure 4.2.4: Materials used in the analysis for the City of Kingston study area (second analysis).
Figure 4.2.5: Randomly generated population points.
Figure 4.2.6: Illustration of how road distances were calculated between population points and the nearest medical clinics.
Figure 5.1.1: Frequency charts of the mean distance measures obtained for the west City of Kingston study area.
Figure 5.1.2: Frequency charts of the distance measures obtained for the Ontario study area.
Figure 5.2.1: Spatial pattern of matching and non-matching misallocated population points.

List of Tables

Table 2.1: Geocoding steps performed in a typical GIS program.
Table 2.2: A list of alphabetical letters that belong to the first character of the FSA (Forward Sortation Area) component of the postal code.
Table 2.3: A list of individual features that are represented by their own local delivery unit (LDU) in urban and rural areas.
Table 2.4: Shortest distance measures between census geographic points and healthcare facilities categorized by rural, small town, suburban and urban scales.