Towards Real-Time Geodemographic Information Systems: Design, Analysis and Evaluation
Total Page:16
File Type:pdf, Size:1020Kb
Towards real-time geodemographic information systems: design, analysis and evaluation Muhammad Adnan Department of Geography University College London (UCL) Thesis submitted in the fulfillment of the requirement for the degree of: Doctor of Philosophy (PhD) September, 2011 Word Count: 60, 881 Declaration DECLARATION I, Muhammad Adnan, confirm that the work presented in this thesis 'Towards real- time geodemographic information systems: design, analysis and evaluation' is exclusively my own. Where information has been derived from other sources, I confirm that this has been indicated in this thesis. The views expressed in this publication are those of the author and not necessarily of the University College London. Signed: Date: Acknowledgments ACKNOWLEDGEMENTS I must thank, first of all, my supervisor Paul Longley, who has provided support, advice and guidance throughout this PhD. I will always be thankful to him for his constant encouragement, support, and help during this PhD. Paul's guidance and encouragement enabled me to complete this research successfully. Secondly, I am grateful to Alex Singleton, my second supervisor, for his support and guidance during the PhD. Paul and Alex have always been very kind and supportive during the course of the PhD research, and it wouldn't have been possible to complete the PhD without their support and guidance. Thanks must go to Local Futures Group (LFG) who funded this PhD through a Knowledge Transfer Partnership. The team at LFG was very supportive throughout the research. Before the Knowledge Transfer Partnership, PhD was supported by a UCL's SPLINT (Spatial Literacy in Teaching and Learning) grant. Many people gave me very valuable ideas, support or materials through my research. I would like to thank Pablo Mateos, Maurizio Gibin, James Cheshire, Dan Lewis, Chris Gale, John Fisher, Liane Hoogland, Amreen Sandhu, Juliana Cipa, Amna Abdul Wahid and Muhammad Salman for their valuable ideas and moral support. Finally, I would like to thank my friends, colleagues and family with whom I have shared different bits of the process of this PhD. It would have been very difficult to complete the PhD research without the moral support of my family in Pakistan, so special thanks goes to them. Abstract ABSTRACT Geodemographic classifications provide discrete indicators of the social, economic and demographic characteristics of people living in small neighbourhood areas. They have been regarded as products, which are the final 'best' outcome that can be achieved using available data and algorithms. However, reduction in the cost of geocomputation, increased network bandwidths and increasingly accessible spatial data infrastructures have together created the potential for the creation of classifications in near real-time within distributed environments. Current geodemographic classifications are said to be 'closed' in nature due to the data and algorithms used. This thesis is a step towards an open geodemographic information system that allows users to specify the importance of their selected variables and then perform a range of statistical analysis functions which are necessary to create classifications tailored to user requirements. This thesis discusses the socio-economic data sources currently used in the creation of geodemographic classifications, and explains the work towards the creation of a non-conventional data sources arising out of the UCL's surname database. Such data sources are seen as key to the creation of tailor made classifications. The thesis explains and compares different cluster analysis techniques for the segmentation of geodemographic classifications. The development of an online information system employs an optimisation of k-means clustering algorithm. This optimisation uses CUDA (Computer Unified Development Architecture) for parallel processing of computationally expensive k-means on NVIDIA's graphics cards. The concluding chapters of the thesis set out the architecture of a real-time geodemographic information system. The thesis also presents the results of the creation of bespoke local area classifications. The developmental work culminates in a pilot real-time geodemographic information system for the specification, estimation and testing of classifications on the fly. Table of Contents TABLE OF CONTENTS DECLARATION ................................................................................................................ 2 ACKNOWLEDGEMENTS ................................................................................................... 3 ABSTRACT ...................................................................................................................... 4 TABLE OF CONTENTS ...................................................................................................... 5 LIST OF FIGURES ........................................................................................................... 10 LIST OF TABLES ............................................................................................................. 15 LIST OF ABBREVIATIONS ............................................................................................... 17 1 CHAPTER 1: INTRODUCTION.................................................................................... 19 1.1 GEODEMOGRAPIHCS ................................................................................................. 19 1.2 MOTIVATION FOR THIS PHD ........................................................................................ 20 1.3 AIMS AND OBJECTIVES ............................................................................................... 22 1.4 METHODS AND OUTPUTS ............................................................................................ 23 1.5 THESIS STRUCTURE ................................................................................................... 25 2 CHAPTER 2: GEODEMOGRAPHIC CLASSIFICATIONS .................................................. 30 2.1 AREA CLASSIFICATIONS .............................................................................................. 30 2.2 GEODEMOGRAPHIC CLASSIFICATION .............................................................................. 34 2.3 HISTORY OF GEODEMOGRAPHICS CLASSIFICATION AS A SOCIAL MEASUREMENT ........................ 35 2.3.1 PRE 1981 DEVELOPEMENTS ............................................................................................. 35 2.3.2 POST 1981 DEVELOPEMENTS (1981 - 2005) ..................................................................... 38 2.3.3 CURRENT DEVELOPEMENTS............................................................................................... 39 2.4 HOW GEODEMOGRAPHIC CLASSIFICATION DIFFERS FROM OTHER STATISTICAL SOURCES .............. 40 2.4.1 GEODEMOGRAPHICS VS IMD (INDEX OF MULTIPLE DEPRIVATION) .................................... 40 2.5 A CRITICAL REVIEW OF GEDEMOGRAPHIC CLASSIFICATIONS.................................................. 42 2.5.1 DOES ONE SIZE FIT ALL? THE NEED FOR BESPOKE CLASSIFICATIONS ........................................... 42 2.5.2 OPEN METHODS ............................................................................................................. 43 2.5.3 PUBLIC CONSULTATION .................................................................................................... 44 2.6 BESPOKE VS STANDARD CLASSIFICATIONS ....................................................................... 44 2.7 HOW GEODEMOGRAPHIC CLASSIFICATIONS ARE COMPUTED ................................................ 46 2.7.1 SPECIFICATION ............................................................................................................... 46 Table of Contents 2.7.2 NORMALISATION ............................................................................................................ 48 2.7.3 CLUSTERING ................................................................................................................... 49 2.7.4 VISUALISATION ............................................................................................................... 50 2.8 THE INTERNATIONAL DIMENSION .................................................................................. 52 2.9 CONCLUSION ........................................................................................................... 54 3 CHAPTER 3: THE CREATION OF A GEO-REFERENCED GLOBAL NAMES DATABASE ...... 55 3.1 INTRODUCTION ........................................................................................................ 55 3.2 OVERVIEW OF DIFFERENCES IN NAMING CONVENTIONS ...................................................... 58 3.2.1 TELEPHONE DIRECTORIES.................................................................................................. 58 3.2.2 ELECTORAL REGISTERS ..................................................................................................... 63 3.2.3 SURNAME COUNTS ......................................................................................................... 64 3.3 ACCOMMODATING GEOGRAPHIC VARIATIONS IN NAMING CONVENTIONS: .............................. 64 3.4 CHOOSING AN APPROPRIATE DBMS (DATABASE MANAGEMENT SYSTEM) ................................ 67 3.5 GEO-REFERENCING NAME ADDRESS DATA ....................................................................... 68 3.5.1 CREATION OF THE POSTAL MASTER TABLE ........................................................................... 72 3.6 GEO-REFERENCING NAME DATA