A Framework for Big Data in Urban Mobility and Movement Patterns Analysis

Eusebio Amechi Odiari

Submitted in accordance with the requirements for the degree of Doctor of Philosophy (PhD)

The University of Leeds, Faculty of Environment, School of Geography

September, 2018


The candidate confirms that the work submitted is his own and that appropriate credit has been given where reference has been made to the work of others.

This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement. The right of Eusebio Amechi Odiari to be identified as Author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.


Acknowledgements

I would like to thank my supervisors Prof Mark Birkin, Prof Susan Grant-Muller and Dr Nicolas Malleson for the fervent guidance they have given me throughout the research work. This work was supported by a grant from the Consumer Data Research Centre (CDRC), Economic and Social Research Council (ESRC) and by funding from the Association of Train Operating Companies (ATOC). I am indebted to Tony Magee and Nick Wilson of ATOC for scoping the project, and for their encouragement and technical support. This project has been a multi-disciplinary one, cutting across schools within the Faculty of Environment (School of Geography and Institute for Transport Studies), the Leeds Institute for Data Analytics (LIDA) and the project industrial sponsors, the Association of Train Operating Companies (ATOC). Managing the project would not have been possible without the unique PhD administrative structure within the School of Geography, University of Leeds, which includes the project support team (Dr Nik Lomax) and the independent assessor (Dr Gordon Mitchell). I am particularly grateful for their technical support, research advice and industrial guidance. I would like to acknowledge the moral support, the several technically stimulating discussions, and the generosity I have enjoyed from colleagues at the Consumer Data Research Centre (CDRC), Leeds Institute for Data Analytics. They include, in no particular order, Emily Sheard, Michelle Morris, Rachel Oldroyd, Robin Lovelace, Nick Hood, Stephen Clark, Tom Waddington, and Phani Kumar. The same acknowledgement goes to postgraduate colleagues at the School of Geography and the Institute for Transport Studies (ITS), and to teaching, research and support staff at the School of Geography and the CDRC.


Abstract

Novel large consumer datasets (so-called 'Big Data') are increasingly readily available. These datasets are typically created for a particular purpose, and as such are skewed; further, they do not have the broad spectrum of attributes required for their wider application. Railway ticket data are an example of consumer data, and often carry little or no supplementary information about the passengers who purchase them, or about the context in which a ticket was used (like the crowding level in the train). These gaps in consumer data present challenges in using the data for planning and for inference on the drivers of mobility choice. Heckman's in-depth discussion of 'sample selection' bias and 'omitted variables' bias (Heckman, 1977), and Rubin's seminal paper on 'missing values' (Rubin, 1976), laid the framework for addressing omitted-variable and missing-data problems today. On the strength of these, a powerful set of complementary, concerted methodologies is developed to harness railway consumer (ticketing) data. A novel spatial microsimulation methodology suitable for skewed interaction data was developed to combine LENNON ticketing, National Rail Travel Survey, and Census interaction data, yielding an attribute-rich micro-population. The micro-population was used as input to a GIS network logistically constrained by the General Transit Feed Specification (GTFS); this identifies the context of passenger mobility. Bayesian models then enable the identification of passenger behaviour, such as missing daily trip rates with season tickets, and flows to group stations. Case studies using the micro-level synthetic data reveal a mechanism of rail-heading phenomena in West Yorkshire, and the impact of a new station at Kirkstall Forge. The spatial microsimulation and GIS-GTFS methods are potentially useful to network operators for the management and maintenance of the railways. The representativeness of the micro-level population created has the potential to alter multi-agent transport simulation genres, by precluding the need for the complexities of utility-maximizing traffic assignment.

Table of Contents

Acknowledgements ...... ii
Abstract ...... iii
Table of Contents ...... iv
List of Tables ...... x
List of Figures ...... xi
List of Abbreviations ...... xiv
Chapter 1 INTRODUCTION AND OVERVIEW ...... 2
1.1 Background ...... 3
1.2 Urban mobility concept ...... 6
1.2.1 Trip based mobility ...... 6
1.2.2 Alternative mobility concepts ...... 6
1.3 Research questions and hypothesis ...... 7
1.4 Aims and objective of thesis ...... 9
1.5 Conceptual methodology ...... 12
1.6 Scope and structure of the thesis ...... 14
1.7 Summary table ...... 17
Chapter 2 REVIEW OF CONSUMER MOBILITY ANALYSIS ...... 18
2.1 Review of the transport concepts ...... 19
2.2 Consumer data and the rail sector ...... 21
2.3 Nomenclature for gaps in data ...... 23
2.3.1 Statistical missing data mechanism ...... 24
2.3.2 Ascertain missingness mechanisms ...... 26
2.4 Review of spatial-microsimulation ...... 27
2.4.1 Deterministic spatial microsimulation ...... 28
2.4.2 Stochastic spatial microsimulation ...... 29
2.4.3 Remarks ...... 29
2.5 Transport modelling genres ...... 30
2.5.1 Trip-based mobility models ...... 31
2.5.1.1 Trip based mobility ...... 31
2.5.1.2 Direct demand forecasting ...... 32
2.5.2 Activity-based mobility models ...... 33
2.5.2.1 Review of multi-agent transport models ...... 35

2.5.2.2 Alternative disaggregate models ...... 36
2.6 GIS logistical network models ...... 37
2.7 Review of Bayesian modelling framework ...... 38
2.7.1 Data imputation strategies ...... 39
2.7.2 Bayesian analysis strategies ...... 40
2.8 Chapter summary ...... 42
Chapter 3 SETTINGS, DATA AND PRE-PROCESSING ...... 44
3.1 West Yorkshire study data ...... 45
3.2 Railways statistical missing data ...... 49
3.2.1 Missing data mechanism ...... 49
3.2.2 LENNON pre-processing ...... 51
3.2.3 Little's MCAR test ...... 55
3.3 Reconciling datasets ...... 58
3.3.1 Boundaries and re-zoning ...... 58
3.3.2 Income for Census ...... 62
3.4 Preliminary summary of data ...... 65
3.4.1 O-D spatial interaction ...... 66
3.4.2 Classification of variables and values ...... 67
3.4.3 Covariate regression ...... 69
3.5 Remarks ...... 72
Chapter 4 SPATIAL MICROSIMULATION ...... 74
4.1 Literature on spatial microsimulation ...... 74
4.2 The Lagrange multiplier ...... 78
4.3 Deterministic and stochastic strategies ...... 79
4.4 Methodology for spatial-microsimulation ...... 81
4.4.1 Concepts behind methods ...... 82
4.4.2 Procedures of the methods ...... 84
4.4.3 Synopsis of methodology ...... 88
4.5 Experiment on microsimulation methods ...... 89
4.5.1 Methodology of experiment ...... 91
4.5.1.1 Settings and data ...... 91
4.5.1.2 Data preparation ...... 93
4.5.1.3 Experiment procedure ...... 94
4.5.2 Results on deterministic and stochastic methods ...... 96
4.5.2.1 Effect of increasing the number of constraints ...... 96

4.5.2.2 Sensitivity of TAE to population of a zone ...... 100
4.5.2.3 Prediction of TAE for non-constrained attributes ...... 102
4.5.2.4 Effect of sample-ratio on accuracy of result ...... 106
4.5.2.5 Influence of level of skew in seed sample ...... 107
4.6 Remarks ...... 109
Chapter 5 WEST YORKSHIRE MICROSIMULATION ...... 110
5.1 Review of the literature ...... 111
5.2 Methodology ...... 116
5.2.1 Data and settings ...... 116
5.2.2 Mechanism of methods ...... 117
5.2.2.1 IPF in the Bayesian framework ...... 121
5.2.2.2 HC/SA in the Bayesian framework ...... 122
5.2.2.3 GS in the Bayesian framework ...... 123
5.2.3 GS population synthesis ...... 124
5.2.4 Procedures for implementation ...... 127
5.3 Results ...... 130
5.3.1 Stage 1 - Linking the Census and NRTS ...... 131
5.3.2 Stage 2 - Linking-in the LENNON ticketing data ...... 136
5.3.3 Cross-tabulations of variables ...... 141
5.4 Discussions and conclusions ...... 143
5.4.1 Validation of results ...... 145
5.4.2 Precautions and practice ...... 148
5.4.3 Conclusions ...... 150
Chapter 6 GIS-GTFS NETWORK SIMULATION ...... 151
6.1 Introduction to transport system ...... 152
6.2 Methodology for GIS-GTFS network ...... 153
6.2.1 GTFS use case scenario ...... 154
6.2.2 GIS network dataset ...... 157
6.2.3 Combined GIS-GTFS network ...... 158
6.3 Results from query of GIS-GTFS network dataset ...... 160
6.3.1 Micro-level mobility on the railways ...... 160
6.3.2 Disaggregate flow on transit lines ...... 163
6.3.3 Aggregate flow on transit lines ...... 165
6.4 Discussion of GIS-GTFS network simulation ...... 170

Chapter 7 IMPUTATION OF BEHAVIOURAL DATA ...... 172
7.1 Introductory background ...... 173
7.1.1 Missing values ...... 173
7.1.2 Concepts adopted ...... 174
7.1.3 Summary ...... 177
7.2 Review of imputation strategies ...... 178
7.2.1 Ad-hoc imputation ...... 178
7.2.2 Heuristic imputation ...... 180
7.2.3 Parametric imputation ...... 181
7.2.4 Remarks ...... 186
7.3 Technicality of LENNON data ...... 187
7.4 Method for Bayesian imputation ...... 192
7.4.1 The Bayesian set up ...... 192
7.4.2 Main mobility model ...... 194
7.4.3 Missing covariates models ...... 199
7.4.3.1 Missing journey factors ...... 199
7.4.3.2 Missing ticket validity period ...... 206
7.4.3.3 Missing group station flows ...... 210
7.5 Result of Bayesian imputation ...... 213
7.5.1 Distance decay parameters ...... 214
7.5.2 Imputed covariates ...... 222
7.5.2.1 Missing journey factors ...... 222
7.5.2.2 Missing ticket validity periods ...... 225
7.5.2.3 Missing group station indicator ...... 228
7.6 Discussion and remarks ...... 230
7.6.1 General remarks ...... 230
Chapter 8 CASE STUDIES IN WEST YORKSHIRE ...... 234
8.1 Introduction ...... 235
8.2 Literature review ...... 239
8.2.1 Spatial modelling concept ...... 239
8.2.2 Exchangeability phenomena ...... 242
8.2.3 Identically distributed and non-stationarity ...... 244
8.3 Methodology ...... 246
8.3.1 CAR-BYM model of rail-heading ...... 247
8.3.2 Kriging model for new rail station ...... 249

8.4 Results ...... 252
8.4.1 Rail-heading phenomena ...... 253
8.4.1.1 Variation due to latent random effects ...... 253
8.4.1.2 Propensity to rail-heading ...... 256
8.4.2 New Kirkstall Forge station ...... 258
8.4.2.1 Aggregate Kirkstall Forge flows ...... 259
8.4.2.2 Disaggregate OD flows to Kirkstall Forge ...... 262
8.5 Remarks ...... 265
Chapter 9 DISCUSSIONS AND CONCLUSIONS ...... 269
9.1 Ethical issues ...... 270
9.2 Research conclusions ...... 273
9.3 Summary of thesis achievements ...... 275
9.3.1 Spatial microsimulation ...... 275
9.3.2 GIS-GTFS network simulation ...... 277
9.3.3 Bayesian modelling ...... 279
9.4 Future research agenda ...... 281
9.4.1 Wider RSP ticket coverage and PTE tickets ...... 281
9.4.2 Spatial microsimulation methods ...... 282
9.4.3 GIS-GTFS applications ...... 283
9.4.4 Bayesian framework improvements ...... 284
9.4.4.1 TS-CS nature of big data ...... 284
9.4.4.2 Sensitivity of Bayesian models ...... 284
9.4.5 Additional agenda ...... 285
9.5 Limitations of the work ...... 286
9.5.1 Flow limited to within West Yorkshire ...... 286
9.5.2 Clumpy simulated populations ...... 287
9.5.3 Modifiable accuracy in different zones ...... 287
9.5.4 Practical limitations of methodologies ...... 288
9.6 Future application of developed tools ...... 289
List of References ...... 290
Appendices ...... 312
Appendix A R-script for spatial microsimulation experiment ...... 313
A.1 Effect of number of constraints ...... 313
A.2 Effect of variations in sample ratios ...... 313
A.3 Effect of skew in seed data ...... 314

A.4 Sensitivity of TAE to population of zones ...... 314
A.5 Prediction of non-constrained attributes ...... 314
Appendix B R-script for stage 1 and stage 2 spatial microsimulation ...... 315
B.1 Stage 1 R-script (Census-NRTS) ...... 315
B.2 Stage 2 R-script (Census-NRTS-LENNON) ...... 315
Appendix C ArcGIS creation of GIS-GTFS network model ...... 316
C.1 Model used to create network linkages between Postcode Units, road networks, and the railways ...... 317
C.2 Detailed specification of the connectivity between elements of the transport network ...... 319
C.3 Model used to iteratively solve the detailed mobility of each passenger on the rail network ...... 320
Appendix D Bayesian model for imputation and mobility analysis ...... 321
D.1 Main model ...... 321
D.2 Missing data imputation model ...... 321
D.3 Specification of priors ...... 322
D.4 Initial conditions of the models ...... 322
D.5 R-script code that wraps the Bayesian models ...... 322
D.6 Code for the 'dsum' construct implemented within JAGS ...... 322
D.7 Variables considered for the imputation models ...... 323
Appendix E CAR-BYM and Geospatial Kriging models ...... 324
E.1 Summary of CAR-BYM rail-heading model ...... 324
E.1.1 Model of likelihood functions and priors ...... 324
E.1.2 R-script including initial conditions ...... 324
E.2 Summary of Geospatial Kriging model ...... 324
E.2.1 Model of likelihood functions and priors ...... 324
E.2.2 R-script including initial conditions ...... 324


List of Tables

Table 2.1 LENNON table snippet showing season ticket infills ...... 24
Table 3.1 Current industry journey factor estimates ...... 55
Table 3.2 Poisson regression results for count of passengers interacting between train station OD pairs ...... 71
Table 5.1 Table illustrating spatial microsimulation with the incorporation of multi-level simultaneous seed hierarchies ...... 128
Table 5.2 The variables in the NRTS seed and Census target, as well as the colour-coded variable categories created ...... 132
Table 5.3 The variables in the LENNON ticketing data and those inherited from the NRTS during the Census-NRTS simulation ...... 137
Table 5.4 Query of m-IPF cross-tabulated passenger attributes ...... 141
Table 7.1 Formal model of the missing 'journey factors' ...... 201
Table 7.2 Formal model of the missing 'ticket validity periods' ...... 208
Table 7.3 List of variables considered in the models reported in the thesis ...... 232
Table 8.1 Structural and non-structural rail-heading ...... 256


List of Figures

Figure 1.1 Study area of West Yorkshire County UK ...... 5
Figure 2.1 Traditional 4-stage transportation modelling concept ...... 31
Figure 2.2 Activity-based mobility modelling concept ...... 34
Figure 2.3 Complementary technologies used in project ...... 43
Figure 3.1 NRTS, Census, and LENNON relational table ...... 47
Figure 3.2 Class Diagram for the railways GTFS information showing the relational variables ...... 48
Figure 3.3 Flowchart for pre-processing LENNON ...... 53
Figure 3.4 MCAR test for missing Journey Factors ...... 56
Figure 3.5 MCAR test for missing 'Ticket Validity' periods ...... 57
Figure 3.6 MSOA and Postcode Sector re-zonation in West Yorkshire ...... 61
Figure 3.7 'Income' derived from 'Occupation' - 2011 Census ...... 63
Figure 3.8 'Income' derived from 'Occupation' in 2011 Census ...... 64
Figure 3.9 'Income' from 2011 Small Area Income Estimates (SAIE) ...... 64
Figure 3.10 3D OD plot (infilled vs missing) for West Yorkshire ...... 66
Figure 3.11 Dendrogram of LENNON covariates vs values ...... 68
Figure 3.12 Receipts vs Distance, Ticket Validity, and Group Size ...... 69
Figure 3.13 Probability density plots for the colour-coded ticket points ...... 70
Figure 4.1 Constrained optimisation using Lagrange Multiplier ...... 79
Figure 4.2 Data structures used in spatial microsimulation ...... 83
Figure 4.3 Re-modelling in spatial microsimulation procedures ...... 85
Figure 4.4 Range of spatial microsimulation methodologies ...... 89
Figure 4.5 Cross-table of categorical NRTS covariates ...... 92
Figure 4.6 Re-zonation of Sectors to Areas (Postcodes) ...... 94
Figure 4.7 Investigation of spatial microsimulation methods ...... 95
Figure 4.8 TAE vs No. of constraints (deterministic m-IPF) ...... 99
Figure 4.9 TAE vs No. of constraints (stochastic SA) ...... 99
Figure 4.10 TAE for decreasing zonal population (L-R) - deterministic ...... 101
Figure 4.11 TAE for decreasing zonal population (L-R) - stochastic ...... 101
Figure 4.12 TAE vs constraints - deterministic m-IPF ...... 105

Figure 4.13 TAE vs constraints - stochastic SA ...... 105
Figure 4.14 TAE vs sample ratio - deterministic m-IPF ...... 106
Figure 4.15 TAE vs sample ratio - stochastic SA ...... 107
Figure 4.16 TAE vs skew in data - deterministic m-IPF ...... 108
Figure 4.17 TAE vs skew in data - stochastic SA ...... 108
Figure 5.1 Rail network - West Yorkshire (www.wymetro.com) ...... 117
Figure 5.2 Illustration of Gibbs, Metropolis-Hastings and m-IPF ...... 118
Figure 5.3 Distribution of Census, NRTS, m-IPF, SA populations ...... 135
Figure 5.4 Distribution of Census, m-IPF (weighted), m-IPF (uniform) ...... 138
Figure 5.5 Variable categories for Census, NRTS, m-IPF ...... 140
Figure 6.1 Use case diagram for GTFS information ...... 155
Figure 6.2 Processing steps for the GTFS to embed within a GIS ...... 156
Figure 6.3 Steps to constrain a GIS network dataset with GTFS information ...... 159
Figure 6.4 Screenshot of animated trains in West Yorkshire ...... 161
Figure 6.5 Screenshot of animated urban mobility along transit lines ...... 162
Figure 6.6 Train volumes for different trips during the day ...... 162
Figure 6.7 Headway and service provision - Huddersfield to Leeds ...... 163
Figure 6.9 Passenger trace - Wakefield Westgate to Leeds station ...... 165
Figure 6.10 Real-time volumes at Leeds Station (P2015/13). Upper plot: flows aggregated every 1 hour; lower plot: flows aggregated every 10 mins ...... 166
Figure 6.11 Flow volumes - Wakefield group stations ...... 166
Figure 6.12 Flow volumes - transit lines in West Yorkshire ...... 168
Figure 6.13 Flow volumes - rail lines in West Yorkshire ...... 168
Figure 6.14 In-house TCI (Index) in West Yorkshire ...... 169
Figure 7.1 Major PTE Urban Areas in UK (excluding Greater London) ...... 188
Figure 7.2 (BR) Group of train stations ...... 190
Figure 7.3 Missing 'Journey Factors' script ...... 203
Figure 7.4 DAG main model including missing 'Journey Factors' ...... 205
Figure 7.5 Missing 'Ticket Validity' script ...... 208
Figure 7.7 DAG main model including missing 'Ticket Validity' ...... 209
Figure 7.8 Missing 'Group Station indicator' script ...... 211

Figure 7.9 DAG main model including missing 'Group Station' indicators ...... 212
Figure 7.10 Distance decay - Poisson 'Journey Factor' ...... 216
Figure 7.11 Distance decay - mixture of Normal 'Journey Factor' ...... 216
Figure 7.12 Journey factor for a 7 Day Season ticket (typical result) ...... 223
Figure 7.13 Journey factor for a Flexible Season ticket (typical result) ...... 223
Figure 7.14 Ticket Validity MTH ticket (typical result) ...... 227
Figure 7.15 Ticket Validity MTA ticket (another typical result) ...... 227
Figure 7.16 Group Station indicator (typical example) ...... 229
Figure 7.17 Group Station indicator (another typical example) ...... 229
Figure 8.1 Rail network, zones, and new Kirkstall Forge station ...... 237
Figure 8.2 Exclusive, identically distributed, and exchangeable (hierarchical) outcome variables ...... 241
Figure 8.3 CAR-BYM multi-variate rail-heading R-script ...... 248
Figure 8.4 Aggregate kriging script for new Kirkstall Forge ...... 251
Figure 8.5 Disaggregate kriging script for new Kirkstall Forge ...... 251
Figure 8.6 Rail-heading propensity in West Yorkshire ...... 257
Figure 8.7 Aggregate passenger volumes at Kirkstall Forge ...... 261
Figure 8.8 Aggregate passenger volumes at Apperley Bridge ...... 261
Figure 8.9 3D OD plot without Kirkstall Forge flows ...... 264
Figure 8.10 3D OD plot with Kirkstall Forge flows included ...... 264
Figure 8.11 Density of train stations in West Yorkshire ...... 267
Figure 8.12 Volumes of train service in West Yorkshire ...... 267
Figure 9.1 Development of General Data Protection Regulation ...... 272


List of Abbreviations

Abbreviation  Brief description  (page of first occurrence)

1D  1-dimensional  (82)
2D  2-dimensional  (82)
3-D (3D)  3-dimensional  (82)
ABM  Agent based models  (36)
ANPR  Automatic Number Plate Recognition  (269)
ARC  Advanced Research Computing  (220)
ArcGIS  Environmental Systems Research Institute (ESRI) GIS software  (154)
ATOC  Association of Train Operating Companies  (2)
BB  Blackburn Postcode Area  (93)
BD  Bradford Postcode Area  (93)
BI  Bayesian Imputation  (181)
BIY  Bingley train station code  (67)
BR  British Rail  (3)
BUGS  Bayesian Inference Using Gibbs Sampling  (15)
BYM  Besag-York-Mollie model convolution  (11)
CAR  Conditionally Autoregressive  (11)
CCTV  Closed-Circuit Television  (3)
CDRC  Consumer Data Research Centre  (2)
CI  Credible Intervals  (221)
CRG  Cross Gates train station code  (68)
CS  Cross-Section  (45)
DAG  Directed Acyclic Graph  (193)
DCM  Discrete Choice Models  (35)
D-Code  Destination station code (for example BIY, LDS)  (67)
DfT  Department for Transport UK  (155)
DN  Doncaster Postcode Area  (93)
EC  European Commission  (269)
EEA  European Economic Area  (271)
EIR  Environmental Information Regulations  (269)
EM  Expectation Maximisation  (182)
EMB  Expectation-Maximisation with Bootstrapping  (40)
EMME  Multimodal Equilibrium - transportation planning software  (35)

ESRC  Economic and Social Research Council  (271)
ESRI  Environmental Systems Research Institute  (157)
ESS  Effective Sample Size  (217)
GA  Genetic Algorithm spatial microsimulation  (81)
GDPR  General Data Protection Regulation  (247)
GIS  Geographical Information System  (8)
GPS  Global Positioning System  (3)
GS  Gibbs sampler based population synthesis (used interchangeably with Gibbs sampling)  (115)
GS  Gibbs sampling (used interchangeably with MCMC Gibbs sampler based population synthesis)  (115)
GTFS  General Transit Feed Specification  (11)
GWR  Geographically Weighted Regression  (240)
HC  Hill Climbing  (81)
HD  Huddersfield Postcode Area  (93)
HDI  High Density Interval  (219)
HG  Harrogate Postcode Area  (93)
HX  Halifax Postcode Area  (93)
ICT  Information and Communication Technology  (3)
ID  Individual Identity  (52)
IDW  Inverse Distance Weighting  (177)
iid  Independent identically distributed  (45)
IMRAD  Introduction, Method, Results, and Discussion format for scientific articles  (16)
INSPIRE  Infrastructure for Spatial Information in Europe  (269)
IPF  Iterative Proportional Fitting  (13)

ITA  Integrated Transport Authority  (45)
ITS  Institute for Transport Studies  (ii)
KEI  Keighley train station code  (68)
KNN  K-nearest neighbour  (179)
LDS  Leeds train station code  (67)
LENNON  Latest Earnings Networked Nationally Over Night  (4)
LIDA  Leeds Institute for Data Analytics  (ii)
L-R  From Left to Right  (102)

LS  Leeds Postcode Area  (93)
LSOA  Lower Layer Super Output Areas  (58)
MAR  Missing At Random  (12)
MATSim  Multi-Agent Transport Simulation Toolkit  (35)
MAUP  Modifiable Areal Unit Problem  (61)
MC  Monte Carlo  (15)
M-C  Markov chains  (221)
MCAR  Missing Completely At Random  (24)
MCMC  Markov chain Monte Carlo  (15)
MCSE  Monte Carlo Standard Error  (219)
MI  Multiple Imputation  (181)
MICE  Multiple Imputation by Chained Equations  (40)
m-IPF  Multi-dimensional Iterative Proportional Fitting  (13)
ML  Maximum Likelihood  (181)
MNAR  Missing Not-At-Random  (24)
MOIRA  Model of inter-regional activity  (144)
MSOA  Middle Layer Super Output Areas  (60)
MTA  Train ticket of 30-60 days validity  (52)
MTH  Train ticket of 180-359 days validity  (205)
MTL  Train ticket of 90-180 days validity  (41)
NRTS  National Rail Travel Survey  (10)
NS-Sec  National Statistics Socio-economic Classification  (60)
NTC  Non-commute by Train Passengers  (92)
NTM  National Transport Model - UK  (151)
OA  Output Areas  (58)
O-Code  Origin station code (for example BIY, LDS)  (67)
OD  Origin-Destination  (51)
ODM  Origin-destination matrix  (144)
OL  Oldham Postcode Area  (93)
ONS  Office for National Statistics  (29)
ORR  Office for Rail Regulation, now called Office of Rail and Road  (186)
OS  Ordnance Survey  (157)
PDFH  Passenger Demand Forecasting Handbook  (32)
PTA  Passenger Transport Authority  (186)
PTE  Passenger Transport Executives  (19)
R2  R-squared value  (59)
RAM  Random Access Memory  (147)

RF  Random Forest  (179)
RSP  Rail Settlement Plan  (178)
S  Sheffield Postcode Area  (93)
SA  Simulated Annealing  (81)
SAIE  Small Area Income Estimates of the Office for National Statistics UK  (62)
SAR  Sample of Anonymized Records  (28)
S-B  Simulation-based Population Synthesis  (81)
SCENES  Scenarios for European Transport - Europe  (151)
SDG  Steer Davies Gleave Group  (19)
SMILE  Strategic Model for Integrated Logistic Evaluation - Dutch  (151)
SON  Steeton & Silsden train station code  (68)
SQL  Structured Query Language  (157)
STAN  Strategic Planning of Freight Transportation  (151)
STD  Standard train ticket  (52)
SVM  Support Vector Machine  (179)
TAE  Total Absolute Error  (102)
TB  Terabyte  (220)
TC  Commute by Train Passengers  (89)
TCI  Transport Coverage Index  (168)
TEM  Transport Economics Model - Dutch  (151)
TfGM  Transport for Greater Manchester  (159)
TfL  Transport for London  (186)
TOC  Train Operating Company  (2)
TRANSIMS  Transportation Analysis and Simulation System  (37)
TS-CS  Time Series Cross-Section  (45)
UK  United Kingdom  (2)
UKSA  United Kingdom Statistics Authority  (269)
USA  United States of America  (28)
WF  Wakefield Postcode Area  (82)
WKF  Wakefield Westgate train station code  (140)
WYCA  West Yorkshire Combined Authority  (45)
YO  York Postcode Area  (93)


PART I

This part sets the scene of the thesis, introducing the aims, objectives, research questions and hypotheses, and outlining the structure of the thesis. The literature on the concepts behind the work presented in the thesis is reviewed. This part further describes the setting of the study area and the data used in the research, and presents preliminary pre-processing of these data, as is typical in research of this nature.


Chapter 1 INTRODUCTION AND OVERVIEW

This research has taken place within one of the research themes of the Consumer Data Research Centre (CDRC) at the University of Leeds, in collaboration with the Association of Train Operating Companies (ATOC). The ideas of the project that align with ATOC's interest are in exploring the opportunities of using ATOC data to develop new analytical methods for understanding, simulating and predicting urban mobility and the individual movement patterns of railway passengers. The aim is to explore opportunities of integrating consumer datasets within the railways with other relevant datasets, to construct a more detailed picture of passenger numbers and journeys, and to relate these through spatial analysis to other drivers of mobility.

The consumer datasets available in the rail sector are ticketing data. These, however, are measured for a particular purpose: to accrue sales revenue for appropriation to train operating companies (TOCs). In the UK there are about 5m daily ticket sales, representing a large consumer dataset. The ticketing data are also comprehensive, with coverage of the entire UK and about 756 different ticket products representing the rich heterogeneity of passenger ticket use. These qualities of consumer ticketing data result in them being described as 'big data', with the potential to revolutionize our understanding of daily urban phenomena (Kitchin, 2014) if they are harnessed appropriately.

As a result of this potential, analysts tend to want to exploit large consumer datasets fully, often beyond what they were originally intended for. This extended use raises a number of issues. In the case of the rail sector, ticketing data often have little or no supplementary information about the passengers who purchase them, such as socio-demographic attributes, journey purpose, and access and egress modes. Such information is necessary in order to identify particular drivers of mobility. Furthermore, the ticketing data do not have information about the context in which the ticket was used, such as the level of crowding in the train when the ticket was used, or the waiting times the passenger endured (such information is a purchasing factor for passengers who regularly travel on the service). Such information would be necessary for causal inference on passenger mobility behaviour.

Other issues with railway ticketing data relate to the daily rate of use of season tickets, the tenure of use of flexible period tickets, and the passenger's particular choice of railway station when stations are described as belonging to the same group on the ticket product (for example, Wakefield Westgate and Wakefield Kirkgate stations form a group called Wakefield BR [1]). These latter issues relate to idiosyncrasies of passengers' behaviour, and a lack of understanding of this behaviour presents challenges in using these data for planning and management of the railways.

In addition, consumer ticketing data which were originally generated for a particular purpose reveal information only on the subset of the wider population who consume that particular digital service. These data are as such skewed (and further do not have the broad spectrum of attribute values required for their wider application). To illustrate the skew: in the rail sector, ticketing data consist of ~60% commuters, whilst the wider Census population consists of ~10% commuters, making ticketing data a skewed (or biased) subset of the wider Census population.

[1] BR in this instance is the abbreviation for British Rail.
All these issues with consumer ticketing data are typical of the type of gaps in novel sources of big data (like retail reward cards, Twitter data, network GPS traces, etc.), and these gaps have to be plugged before such data are used beyond the original intention for which they were measured. In the case of ticketing data, originally intended for revenue accumulation, the gaps have to be plugged prior to using such data for detailed mobility analysis.

1.1 Background

Urban mobility and movement patterns have been the subject of studies for over a century (Morris, 2013), as an understanding of the rich tapestry of urban movement is fundamental in human psychology and sociology studies, and in the process of developing operationally efficient policy initiatives and management strategies for businesses and the economy. Traditional urban mobility and movement studies have typically relied on stated surveys, which inherently do not accurately reveal behavioural attributes. These stated preference surveys further suffer from low sample ratios, as comprehensive samples are often expensive to conduct, making a case for the increasingly readily available and typically exhaustive revealed consumer data.

Volumes of novel individual-level spatial-temporal data are now readily available from a variety of ICT and pervasive technologies, and closed-circuit televisions (CCTVs). Examples of this novel information include passenger ticketing data, mobile phone Global Positioning System (GPS) presence used to capture part of passengers' origin to destination journeys, and CCTV imagery within and outside railway infrastructure. These consumer data contain a wealth of unexplored information about peoples' behaviour, with the potential to revolutionise our understanding of social phenomena. These novel data have been called 'big data' [2] due to their volume, velocity, wide variety and veracity. The LENNON [3] (ATOC, 2003a) ticketing database records 3.7m daily ticket transactions, representing 756 different ticket product descriptions, sold from over 8000 ticket machines in stations, on trains and over the web, thus forming so-called 'big data'. Big data has been identified in the UK as one of the great scientific and economic drivers for the next decade (Willetts, 2013), and this has spurred massive research investment in facilities where such datasets can securely reside, in readiness for safe dissemination for research purposes.

The strengths that big data brings include high sample rates and variability, yielding higher fidelity in reconstructing behaviour and a broader range of solutions (Whittaker, 1935), potentially revealing behavioural activity with accuracy and precision. A drawback of big data is that it reveals the utility-maximising aptitude of humans and may not always contain enough variability to ascertain the full spectrum of behaviour (Adamowicz et al., 1994). An example would be the choice behaviour of consumers given a choice set of 10 products: most consumers will go for the top three products if these represent the highest utility, providing little consumer information on the remaining products and thereby exhibiting a limited range of sensed attributes. In a stated survey such issues can be better controlled, perhaps by sampling equal amounts for each product category.

The unique attributes of novel sources of consumer data make a case for reviewing the analysis methods traditionally adopted for measured stated surveys, to integrate the wealth of increasingly available novel big data. Furthermore, traditional geospatial analysis methods have been developed on the assumption that data are measured stated surveys. Big data and consumer data are distinct, as they typically reveal a skewed subset of the wider population who consume the digital service, and as such conventional methodologies developed for traditional measured stated survey data need to be reviewed for application to these novel datasets.

[2] Big data is a term coined to describe the range of novel large datasets that are increasingly readily available, like the comprehensive rail ticketing dataset, and other data from sensors of consumers of digital services. Business analysts refer to the five 'V's, and describe big data as having volume, velocity, variety, veracity and value. In GIS, big data refers to data that are so large and unstructured that they do not fit onto conventional hardware and software information tools.

[3] LENNON, Latest Earnings Networked Nationally Over Night, is one of the world's largest transport settlement systems, capable of handling and allocating to Britain's passenger train operators more than £3.5 billion in annual passenger revenue from over 450 million train tickets sold via more than 8000 ticket machines in stations, on trains and over the web. Each ticket sale transaction is polled, validated and apportioned by LENNON and assigned to the relevant (one of about 27) Train Operating Companies (TOCs) within 24 hours. LENNON is managed by the Rail Settlement Plan (RSP) on behalf of the train operators.

This research focuses on case studies in West Yorkshire, as shown in Figure 1.1. The inland metropolitan county of West Yorkshire has a population of 2.2 million and covers 2030 km2. Major railway lines traverse the county, serving about 20 million rail passengers monthly. Mobility analysis from the 1950s onwards only had to anticipate aggregate numbers of passenger trips in deciding policy on which transport infrastructure to install. With operational effectiveness needs and saturated urban areas, the travel behaviour and activity patterns of individuals now have to be fully understood. Representative models of the physical environment and urban morphology have to be reconciled with individual-level urban mobility in deciding policy on the operational efficiency of infrastructure.

Figure 1.1 Study area of West Yorkshire County UK.

1.2 Urban mobility concept

Urban mobility refers to the ability to move freely and easily, including travel, transport and communications, in time, physical space and in virtual environments (the Internet). Urban mobility includes concepts like group mobility, social mobility and migration (Buscher, 2014). In this thesis, urban mobility refers to the movement processes of individuals, with a view to describing their activity and decision-making processes. Movement patterns are usually established by developing a parametric model of mobility representing the conceived dynamics (Batty et al., 2012). The rail industry in the UK transports over 4m passengers daily, achieved by about 25 Train Operating Companies (TOCs) through Network Rail infrastructure, which includes a network of about 2500 stations and 21,000 miles of track defining 11,000 route miles (Rail Delivery Group, 2015). Traditional transport analysis is often based on aggregate data on numbers of trips; however, there has been increasing interest (Chapin, 1974, Hägerstrand, 1970) in understanding activity patterns in relation to information models of the environment.

1.2.1 Trip based mobility

The classic trip-based concept of mobility analysis serves this purpose and has benefited from decades of research and intellectual effort, producing the classic 4-stage transport model (Ortúzar S and Willumsen, 2011). Many variants and improvements of the classic 4-stage transport model exist, incorporating developments over the years. However, the primary desire to fulfil an activity is what secondarily yields a passenger trip, so in principle an understanding of mobility phenomena will be better achieved by concentrating on the primary activities. Activity-based methods concentrate on patterns of activity and tour models (Bowman and Ben-Akiva, 2001) and have evolved to better incorporate time-space geographic constraints, interactions, change and adaptations in mobility, which better explain why people travel and describe the phenomena of individual movement patterns.

1.2.2 Alternative mobility concepts

Activity-based methods evolved from concepts applicable in traditional trip-based transport models (Jones et al., 1983), and in many instances the traditional trip-based and the novel activity-based concepts in mobility analysis complement one another. As such, mobility analysis is potentially better served by seamless integration of trip- and activity-based methods within one framework. The strength that traditional trip-based methods bring is a wealth of established geospatial and transport modelling techniques. These are suitable for empirical validation using stated preference survey data, and for potentially exploiting opportunities of using novel revealed-preference-type consumer big data. The more recent multi-agent paradigm combines trip- and activity-based modelling concepts of mobility by creating attribute-rich micro-level demand data, useful to both trip and activity genres. In many practical situations, the choice of mobility modelling concept or method (trip, activity, multi-agent) has largely been decided by the type of data available for analysis (Dong et al., 2006, Horni et al., 2016).

1.3 Research questions and hypothesis

A summary of the research questions addressed in this thesis is as follows:

- Would the pre-processing stages of consumer data have an impact on its usefulness?
- Can railway consumer data be harnessed to augment estimates of passenger attributes and activity?
- Can the context of ticket use be identified from a logistical network model?
- Would current industry infilling methods be improved by developing a principled methodology, exploiting micro-level ticketing data?

The details of the research questions and the hypotheses are as follows. The first research question asks whether the rail sector ticketing data pre-processing stages have a significant impact on its usefulness in identifying facets of passenger mobility, and in revealing the mechanism through which the data were created and became available. The hypothesis is that the mechanism through which the missing values occurred in the ticketing data can be established by unravelling the LENNON ticket journeys into time-series cross-sections, separating the outward and inward parts of the journey, establishing factors and offsets for the daily rates of ticket use [4] (journey factors) and ticket validity periods, and creating augmented data for flows to so-called group stations (a toy illustration of this unravelling follows the footnote below).

[4] The daily rates of one-way and return tickets are assumed known and are set at 1 and 2 respectively. However, the daily rate of a season ticket is a behavioural attribute peculiar to an individual passenger. A 7-day season ticket can be used 10 times by a regular commuter, and 14 times by a salesperson visiting multiple locations daily.
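To make this concrete, the R sketch below splits a single hypothetical return-ticket record into outward and inward time-series cross-section rows. The column names are illustrative only and do not reflect the actual LENNON schema.

    # One hypothetical LENNON-style record for a return ticket (invented columns)
    ticket <- data.frame(id = "T001", origin = "BIY", dest = "LDS",
                         product = "RETURN", date = as.Date("2018-09-03"))

    # Unravel into time-series cross-section rows: one row per journey leg.
    # Each leg is one journey; the ticket's daily rate (journey factor) is 2.
    legs <- data.frame(id       = ticket$id,
                       from     = c(ticket$origin, ticket$dest),
                       to       = c(ticket$dest, ticket$origin),
                       leg      = c("outward", "inward"),
                       date     = ticket$date,
                       journeys = 1)
    legs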

The second research question is: can big consumer data generated within the rail sector industry be harnessed to augment current industry estimates of individual-level passenger attributes and mobility activity? The hypothesis is that such harnessed data would have a significant impact on the quality of inputs to rail industry planning models. Accurate and representative micro-data are significant to network operators in the rail industry, and their absence is often the major impediment to successful strategic, tactical, and operational analysis.

The third research question is: can the context (for example crowding, waiting, transit stops, etc.) in which the ticketing data were generated be identified, to aid causal inferences on the drivers of mobility phenomena on the rail network? The hypothesis is that an accurate Geographical Information System (GIS) rail network model, with accurate transit schedule information, would create a logistical model of the rail network. If a representative population of railway passengers is used as input to such a model, the context of mobility can be identified by overlaying the events associated with each passenger. Further, such a model would be beneficial to network operators for management, maintenance and intervention assessment on the railways. The GIS would serve as an integrator, analyser and visual thematic display (Egenhofer and Kuhn, 1999), seamlessly synergising the various tools and applications utilized for railway passenger and cargo mobility and infrastructure analysis.
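As a small illustration of the kind of endogenous context such a logistical model can overlay, the R sketch below derives the waiting time a passenger endured at a station from a GTFS-style departure schedule. All times are invented for illustration.

    # Scheduled departures from an origin station (GTFS-style stop_times),
    # hypothetical values for illustration only
    departures <- as.POSIXct(c("2018-09-03 07:58:00",
                               "2018-09-03 08:13:00",
                               "2018-09-03 08:28:00"))
    arrival <- as.POSIXct("2018-09-03 08:01:30")  # passenger reaches the platform

    # The next feasible departure determines the endured wait
    next_dep <- min(departures[departures >= arrival])
    as.numeric(difftime(next_dep, arrival, units = "mins"))  # 11.5 minutes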

The final research question is: would it be possible to augment current industry infilling strategies for missing values in ticketing data with a principled methodology based on scientific best practices? The missing values occur in the variables for daily rates of ticket use (journey factors), tenures of flexible validity tickets, and flows to group stations. The hypothesis is that the current industry strategy, of using proxy trip rates for season tickets and logical rules-based methods to assign passenger journeys made with multi-modal tickets, can be augmented by a principled imputation method. An improvement in the detail of passenger demand through the use of big data will enable the investigation of a broad range of case studies.

The research indicates that Little's missing data test (Little, 1988) can guide the identification of the relevant data variables and values required to plug gaps in consumer datasets. On the basis of this, relevant additional datasets are sought for integration with the ticketing data. These are socio-demographic passenger attributes (like income, journey purpose, etc., typically available within the Census), and information relating to passenger behaviour arising from service provision during their within-network journeys (typically available within the NRTS). Once the relevant datasets are identified, it is often found that the process of integration requires adjustments in resolution and in geographies of scale to maintain consistency between the datasets (Deming and Stephan, 1940). Often one dataset has to be systematically adjusted to fit the resolution and system process (Howe et al., 2008, Weber et al., 2014) associated with the other datasets, and the adjustment methodology adopted heavily impinges on the quality of the thesis. The indication is that modelling mobility at finer scales requires integrating big datasets with other relevant conventional data sources, and the validation of such integration impinges on the quality of such models.
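For reference, Little's test is available in R; the sketch below uses the mcar_test() implementation in the naniar package on a small synthetic dataset whose variables are invented for illustration.

    library(naniar)                      # mcar_test() implements Little (1988)
    set.seed(42)
    d <- data.frame(fare = rnorm(200, mean = 10),
                    dist = rnorm(200, mean = 25))
    d$fare[d$dist > 26] <- NA            # missingness driven by an observed variable
    mcar_test(d)                         # small p-value: 'missing completely at
                                         # random' is rejected for these data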

Rubin's work (Rubin, 1976) indicates that, for objective imputation and analysis of data, the missing data mechanism has to be distinct from the mobility mechanism being inferred, and that any additional relevant variables with sought attributes must explain the difference between observed and missing values. Bayesian methods in particular have useful application here: missing data are imputed within the context of a mobility model, separate missing data models can be included, and additional relevant information is easily integrated.
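The practical force of the missing-at-random (MAR) assumption can be seen in a small simulation. In the R sketch below (all variables invented), a naive complete-case summary is biased, but conditioning on the observed covariate that drives the missingness removes the bias.

    set.seed(1)
    n <- 10000
    income <- rnorm(n, mean = 30, sd = 5)            # observed covariate
    trips  <- rpois(n, exp(0.05 * income))           # outcome of interest

    # MAR: the chance that the outcome is missing depends only on the
    # OBSERVED covariate, never on the (possibly missing) outcome itself
    miss <- rbinom(n, 1, plogis(-3 + 0.08 * income)) == 1
    trips_obs <- replace(trips, miss, NA)

    mean(trips_obs, na.rm = TRUE) - mean(trips)      # naive complete-case bias
    # Conditioning on the observed covariate removes the bias (MAR at work):
    tapply(trips_obs, cut(income, 5), mean, na.rm = TRUE) -
      tapply(trips, cut(income, 5), mean)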

1.4 Aims and objective of thesis

A summary of the aims and objectives of the research project is listed below, followed by further details:

1. Develop a methodology for harnessing consumer datasets available within the UK rail sector, to create a requisite dataset rich enough that it can be deployed to any of the transport modelling concepts (trip, activity, multi-agent).

2. Study individual-level mobility and movement patterns to enable better forecasts of passenger demand.

3. Develop novel techniques, based on statistical best practices, for imputing missing values in mobility variables within rail sector consumer data.

4. Create a Bayesian modelling framework for spatial analysis of harnessed rail sector ticketing data, revealing the mechanism of a range of mobility phenomena.


To meet the challenges of efficiently operating the railways while ensuring economic, social and environmental sustainability, innovative ways are needed to analyse the multivariate inter-relationships between the facets of the network infrastructure, passengers and surrounding habitats, and to visualise these railway network mobility interactions. This research develops such a framework for urban mobility and movement patterns analysis. The research explores opportunities for reviewing modelling tools, using consumer big data available within the UK rail sector to develop new methods for understanding, simulating and predicting individual movement patterns. The aim is to consider frameworks relating population mobility to demographic characteristics, relative location, time of the day, and other likely drivers of behaviour and interaction on the railways.

The key to any urban mobility exercise is in part to establish the volume, characteristics and location of commuters and passengers in the network (Lomax et al., 2013). In the absence of an attribute-rich, comprehensive and representative population, a simulated population is often necessary for the analysis of complex mobility on the railways. In this research, effort is focused on harnessing available large rail sector consumer datasets, to create a requisite dataset rich enough that it can be deployed to any of the transport modelling concepts (trip, activity, multi-agent). An attribute-rich synthetic population of individuals, including both exogenous and endogenous attributes, would enable an accurate assignment of the passengers to the rail network, thereby precluding the need for complex transit assignment in traditional transport models. Further, on the strength of Rubin's work on missing data (Rubin, 1976), spatial analysis strategies are developed for imputing missing behavioural passenger attributes, as well as for the analysis of a range of case studies, enabling the detailed mechanism of a range of mobility phenomena to be identified.

A summary of the objectives to fulfil the aims of the project is enumerated below:

1) Identify variables from relevant databases that plug gaps (by explaining the difference between observed and missing values) in railway ticketing data (Chapter 3).

2) Explore a range of deterministic and stochastic constrained optimisation (spatial microsimulation) methodologies, to develop a strategy suitable for combining skewed revealed interaction data with data from measured stated surveys. This creates a representative micro-level population rich in exogenous attributes (Chapter 4 and Chapter 5).

3) Build a GIS network dataset constrained by detailed transit schedule information from the General Transit Feed Specification (GTFS) database, and feed the individual-level synthetic population through the logistical GIS network. This creates a population rich in endogenous (within-network) attributes, like the volumes in the train or the waiting times each passenger endured (Chapter 6).

4) Create an animation for visualising the individual passengers and the transit trains moving within the railway network, with a query facility to identify the context of individual mobility and overlaid aggregate mobility. This adds contextual information to the micro-level population (Chapter 6). Visualisations make it easier to assimilate large amounts of passenger activity.

5) Construct a Bayesian mobility and imputation model (Chapter 7), which includes the following constructs (a minimal illustrative sketch is given after this list):

- Passenger frequency counts modelled as a negative Binomial distribution, ticket validity periods as a Poisson distribution, and group station flows as a Bernoulli distribution.
- Daily rates of ticket use (journey factors) modelled as a mixture of normal distributions.
- Journey factors truncated to be within industry experience of the limits of journey factors (~3).
- Ticket validity periods censored to be shorter than the maximum ticket validity.
- Flows to group stations as augmented flows restricted to sum to one flow (using the 'dsum' construct).

6) Impute the missing values in the variables for journey factors, ticket validity periods and group station flows, and perform sensitivity analysis on an intuition about priors of drivers of mobility (Chapter 7).

7) Investigate the case study of rail-heading using a Conditionally Autoregressive (CAR) model with a non-structural Besag-York-Mollie (BYM) component (Chapter 8).

8) Investigate the intervention (new building infrastructure) of a new rail station using a Geo-statistical kriging model (Chapter 8).

1.5 Conceptual methodology

The skew, unobserved variables and missing values in the LENNON ticketing dataset represent gaps which need to be plugged to exploit the full potential of LENNON as a data source for urban mobility. Heckman (1977) highlighted the bias in analysis results due to gaps resulting from 'omitted variables' and 'skewed samples'. The lack of context of use of these ticketing data further presents challenges for causal inference on the drivers of mobility choice.

Rubin (1976) proved that if, as in the UK railways, the mechanism of missing data values (a design feature of the railway measuring sensor) is distinct from the phenomena of railways mobility being inferred, then the simultaneous behaviour (joint distribution) of the mobility variables is simplified. If new variables that explain the difference between the observed and missing data are found and incorporated in the model, then the missing data would be described as 'MAR', or missing at random (Little, 1988, Rubin, 1976), with respect to the new dataset, and the previously intractable joint distribution becomes amenable. As such, Rubin's seminal paper on 'missing values' (Rubin, 1976) laid the framework for addressing missing data problems today. In this thesis, a set of concerted complementary methodologies is developed for identifying other relevant datasets to combine with available rail sector data, to plug the gaps and to create a requisite dataset for subsequent mobility analysis.

Spatial microsimulation techniques are investigated for use in combining rail sector big data (LENNON ticket data), using reference surveys (NRTS [5] and 2011 Census) as a target set, to supplement absent passenger attributes. In this thesis, both deterministic and stochastic spatial microsimulation strategies are discussed for combining the various datasets, creating a representative micro-mobility population of railway passengers embedded in the wider population within a geographic region (West Yorkshire, UK). A deterministic methodology particularly suited to adjusting skewed rail-mobility consumer big data is developed, combining information on all rail tickets sold in the UK with the 2011 Census commute-to-work data and the NRTS, yielding the representative micro-level synthetic population. The deterministic strategy using multi-dimensional iterative proportional fitting (m-IPF) is presented in a concise and accessible way, highlighting nuances, precautions in practical application, and the associated advantages of the methodology. The spatial microsimulation methodology presented in this thesis aims at extending, for the first time, the concept of simultaneously constraining 3 levels of variables, enabling the inclusion of origin-destination information associated with mobility interaction in the spatial microsimulation process. The simulated population created includes weights which represent the probability density for rail commuters and passengers, such that a sample of the synthetic population according to the density yields representative rail passengers, whilst a uniform sample yields the wider population sample.

[5] The National Rail Travel Survey (NRTS) was conducted in 2004/2005 in the UK, in areas outside London. The NRTS measured passenger characteristics like access and egress modes to the railway network, passenger final destination and home address, purpose of travel, ticket type, time of arrival at station and travel start time, as well as a range of socio-demographic characteristics.
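As a toy illustration of the fitting step that underlies the m-IPF strategy described above, the R sketch below uses the Ipfp() routine from the mipfp package (one of several available IPF implementations; the scripts actually used in this work are reproduced in Appendices A and B) to adjust a uniform seed to known margins. The numbers are invented, and both margins sum to the same total.

    library(mipfp)                                 # provides Ipfp()
    # Toy 2 x 3 problem: adjust a uniform seed so that it reproduces known
    # row and column totals (e.g. sex by ticket type)
    seed <- array(1, dim = c(2, 3))
    target.list <- list(1, 2)                      # constrain dimension 1 and 2
    target.data <- list(c(60, 40), c(20, 50, 30))
    fit <- Ipfp(seed, target.list, target.data)
    round(fit$x.hat, 1)                            # fitted cells honour both margins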

The attribute-rich synthetic population of individuals produced from spatial microsimulation (inheriting attributes from LENNON, NRTS and the Census) is representative of all railway passengers in the study area, and these individuals are used as input into a rail GIS network that has been logistically constrained by real transit (GTFS) schedules. This produces a rich dataset of railway passengers including both exogenous and endogenous attributes, identifying the rich space-time context and volumes of individual passengers on trains, platforms and stations. The objective is that the representativeness and contextual detail of the synthetic passenger demand population will enable an equitable assignment of the passengers to the rail network, precluding the need for the complexities of a utility-optimizing transit assignment.

Bayesian strategies are developed for imputing missing values in rail sector ticketing data. The missing data values are information on passenger daily trip rates (journey factors), especially with season tickets, accurate tenures of flexible tickets, and precise flows to grouped stations. For imputation, a non-parsimonious and comprehensive model is a desired feature (Gelman, 2007, Knutti and Sedlácek, 2013), and all the relevant covariates derived from the spatial microsimulation and GIS network simulation are included. This ensures the inclusion of most of the passenger heterogeneity. For causality and intervention case studies, however, there is a need to balance the parsimony of the models with the need to capture relevant complexity in urban mobility (Angrist et al., 1996, Ho et al., 2007, Pearl, 2009, Shadish et al., 2002).

1.6 Scope and structure of the thesis

The structure of the work presented is as follows. Chapter 2 introduces the thesis, starting with a preview of transportation concepts, a review of the concept of big data and consumer data, and the gaps that occur in such datasets, as these gaps introduce bias into analysis results. Spatial microsimulation is reviewed as a tool for combining disparate datasets to create a demand population for use in mobility modelling. An introduction to traditional trip-based mobility models and the alternative activity-based concepts of mobility is presented. The logistical GIS network models that identify the context of mobility data are presented, as well as a review of the literature on Bayesian methods developed for data imputation and the analysis of urban mobility and movement patterns.

In Chapter 3, the setting and a brief description of the study area are presented, as well as the data pre-processing. Little's test for investigating the nature of the missing data mechanism is presented, and reviewed for use in identifying the nature of other relevant datasets that are necessary to combine with the ticketing big data to produce an enhanced dataset. The preliminary data pre-processing steps are presented, as they enable the reconciliation of the geographies and variables in disparate datasets. Also in Chapter 3, a preliminary summary of the ticketing data is presented to enable an initial visualisation of the mobility interaction.

Chapter 4 presents spatial microsimulation, the first of three concerted complementary methodologies for harnessing big data. The range of spatial microsimulation strategies is assessed to establish the accuracy of their results under different input data types. This assesses the suitability of each methodology for application to novel consumer data inputs. The spatial microsimulation assessments cover cases where the sample ratio of the seed takes a range of percentages, and where the number of variables informing the target population is limited, together with the effect this might have on predictive values for the wider population attributes. Finally, as consumer data tend to be a skewed representation of the wider population, the effect of skewed seed data on the spatial microsimulation is assessed.

In Chapter 5, the spatial microsimulation methodology is used to enhance the railway ticketing data for the West Yorkshire study area. For expedience, the flows analysed are restricted to those that are initiated and terminated within the West Yorkshire study area. The NRTS is considered a skewed dataset as it represents a subset of the population who travel by rail.

The LENNON ticketing data are also skewed, as they represent only the population of railway users, whilst the Census interaction data are a 100% sample of the UK population. The multi-dimensional iterative proportional fitting algorithm is developed for application to these skewed input data. A 3-level hierarchical target constraint is developed to account for the origin-destination interaction inherent in mobility phenomena. This produces a representative population of railway passengers, embedded in the wider population, and inheriting the attributes of the datasets involved.

Chapter 6 presents the second concerted complementary methodology developed to harness consumer data available from the rail sector: GIS-GTFS network modelling. The procedure for incorporating the general transit feed specification (GTFS) information into a GIS network dataset is detailed in a practice-oriented way. The resulting model is called the GIS-GTFS network simulation model, and the passengers fed into this logistical network are derived from the earlier spatial microsimulation procedure. Each passenger's time of arrival at the train station and time of first train facilitate a solution of their route on the GIS-GTFS network. The details of events on the network are overlaid, and endogenous attributes are extracted for each passenger, forming an additional rich set of attributes.

Chapter 7 presents the construction of a fully Bayesian predictive model of mobility on the railways. For prediction, the pertinent covariates derived from the spatial microsimulation and GIS-GTFS network simulation are included in the Bayesian model. Imputation strategies in the literature are reviewed, highlighting the advantages of the Bayesian imputation method and leading on to the choice of that strategy. Details of the directed acyclic graphs (DAGs) which form the basis of the solutions preferred for the Bayesian models are presented. The Markov chain Monte Carlo (MCMC) algorithm of the BUGS software (Lunn et al., 2012) is used to implement the Bayesian model. Strategies are presented for the censoring, truncation, and 'dsum' constructs that enable restrictions on the values of journey factors, ticket validity periods, and augmented flows to group stations. Results are presented for the imputed missing values.

The case studies are presented in Chapter 8. These pertain to analysing the mechanism of rail-heading within the study area. A Bayesian model is created which adjusts for various covariates and estimates posterior influences on rail-heading due to within-Postcode and between-Postcode effects. The Bayesian model estimates the proportion of rail-heading attributable to the covariates adjusted for. Transit flows that do not originate and terminate in the West Yorkshire study area are excluded, thereby precluding the analysis of any rail-heading associated with and occurring at the West Yorkshire boundaries. The effects of a new train station at Kirkstall Forge are also investigated in Chapter 8. A geostatistical kriging model is developed whereby observations at train stations are treated as sampling distributions from a continuous mobility interaction across the West Yorkshire County. The details of the projected passenger demand for the new station at Kirkstall Forge are presented and discussed.
Chapter 9 is a discussion and synopsis of the thesis, along with projections for the application of the research findings and their relevance to the rail sector. The overriding importance of ethical considerations in consumer data applications and research is discussed. The conclusions of the thesis are also presented in Chapter 9, as well as the limitations of the work and the work anticipated for the future. The thesis does not strictly follow the IMRAD (Docherty and Smith, 1999, Sollaci and Pereira, 2004) structure of scientific presentations (i.e. introduction, literature review, methodology, case study and results, discussion and conclusion). Instead, in Part 1 of the thesis, an introduction in Chapter 1 is followed by a short literature review in Chapter 2, serving to overview and introduce the concepts developed in the thesis. The brief literature review is followed in Chapter 3 by setting the scene: introducing the study area, the analysis datasets used, and some conventional data pre-processing. In Part 2 the set of three concerted methodologies which form the crux of the thesis are presented separately in individual chapters, highlighting how they complement one another. Each of these chapters (Chapters 4, 5, 6, and 7) contains a more detailed literature review, as well as methodology, results, discussion and remarks. The chapters in Part 2 are presented in this way because the methodologies are multi-disciplinary and divergent (each method deals with a different type of data gap), and the presentation would read less succinctly were they discussed jointly. In Part 3 (Chapters 8 and 9), the thesis reverts to the traditional IMRAD format.

Some of the computer models and code used to generate the results are included in the Appendices. These highlight the pertinent data pre-processing stages and enable the results reported in this thesis to be reproduced. The main datasets are the 2011 Census, NRTS, GTFS, and LENNON ticketing data. Some of these datasets are publicly available, and others can be requested from ATOC.

1.7 Summary table

Thesis aspect | Brief description

Aims | The aim is to explore opportunities for integrating consumer datasets within the railways with other relevant datasets, to construct a more detailed picture of passenger numbers and journeys, and to relate these through spatial analysis to other drivers of mobility.

Objectives | 1. Identify variables from relevant databases that plug gaps in railway ticketing data (Chapter 3). 2. Explore a range of strategies for combining revealed skewed interaction data with data from stated surveys (Chapters 4 and 5). 3. Build a GIS network dataset constrained by detailed GTFS transit schedule information (Chapter 6). 4. Create an animation for visualising the individual passengers and the transit trains on the railway network (Chapter 6). 5. Construct a Bayesian analysis and imputation model to study individual-level mobility (Chapter 7). 6. Investigate the case study of rail-heading and the intervention of a new rail station (Chapter 8).

Research questions | Would the pre-processing stages of consumer data have an impact on its usefulness? Can railway consumer data be harnessed to augment estimates of passenger attributes and activity? Can the context of ticket use be identified from a logistical network model? Would current industry infilling methods be improved by developing a principled methodology?

Summary of methods | 1. Spatial microsimulation to combine relatively skewed and disparate datasets. 2. GIS-GTFS network model to simulate the context of passenger mobility. 3. Bayesian imputation and analysis, which incorporates additional passenger behavioural information and rail sector knowledge.


Chapter 2 REVIEW OF CONSUMER MOBILITY ANALYSIS

In this thesis, a framework for the use of large consumer data (big data) in urban mobility and movement pattern analysis is developed. This chapter is a literature review of the methodologies used to improve the consumer ticketing data, highlighting the concepts of the methods to show how they complement one another. The ticketing data are from the UK rail sector, and after improvement they are used in transport modelling and mobility analysis. The review explains how the separate methodologies fit into the wider project. Traditional transport models are reviewed first, because the new transport modelling genres developed for novel consumer datasets have evolved from the traditional methods. A review that defines and critiques consumer data, using the example of rail sector ticketing data, is then presented, along with the definitions given in the literature to describe the nature of the gaps in consumer data. The strategies for identifying relevant variables used to plug the gaps in consumer data are reviewed.

The chapter then describes what is meant by spatial microsimulation and how it is used to combine variables in disparate datasets. Spatial microsimulation methods are briefly reviewed, as they are used to create a synthetic attribute-rich micro-level demand population that is traditionally used as input to transport models. The range of trip- and activity-based transport modelling genres is reviewed, along with the more recent multi-agent modelling genre. The review forms a basis for highlighting areas of potential improvement. The important concept of using GIS to further improve the synthetic demand population used in transport models is presented. The recent availability of detailed transit schedules (GTFS) used to complement GIS network models is reviewed, as these help to create a contextually rich micro-level demand population. Data imputation strategies used to plug missing values in datasets are reviewed, and Bayesian modelling is introduced as a framework for simultaneous imputation (to identify idiosyncratic passenger behaviour) and analysis of micro-level data. Literature on the concepts applicable to the development of Bayesian models is reviewed, highlighting their usefulness in the spatial analysis of a range of urban mobility case studies. The chapter ends by presenting an overview flowchart of the project.

2.1 Review of the transport concepts

The traditional transport modelling paradigm for mobility analysis is to establish the volume, distribution, location, and perhaps the characteristics of passengers on the network. Traditional transport modelling techniques predict the volumes generated by and attracted to a zone using empirical estimates based on the size, morphology and geodemographic attributes of the zone. The propensity and potential of a zone in such terms determine the volumes of zonal passenger origins and destinations. These strategies enable an estimation of the passenger demand for use in transport models (Ortuzar and Willumsen, 2002). In other transport applications, a count at specific traffic junctions (Willumsen, 1981) or a population survey (Richardson et al., 1995) gives an indication of the demand for use in the transport models. In modern applications the demand can be estimated from the trace revealed by passengers in the process of consuming digital services while fulfilling daily activities. Developments in technology have meant that it is now more readily possible to sense this revealed data. In the particular case of the rail sector, the consumer data depicting the volume of passengers are the ticketing data, which can potentially be exhaustive.

However, there are issues with consumer data which prevent their direct use, and these represent the so-called gaps in novel consumer data. As discussed in Chapter 1, these gaps come in the form of an incomplete set of covariates related to the attributes of passengers who might have used a ticket (like the passenger journey purpose, or socio-demographic attributes), skew in the distribution of these consumer data relative to a wider representative population (like the Census), and lack of observation of the context of mobility (like the crowding in the train when the ticket was used, or the waiting times a passenger might have endured). Further gaps are the missing values for passenger daily rates of season ticket use, the number of days of use of flexible period tickets, precise flows to group stations (SDG, 2014, SDG, 2016), and flows associated with regional multi-modal tickets (these latter multi-modal, so-called PTE flows are not dealt with in this thesis6).

6 There is a wider point here that the definition of missing data really depends on what the analyst wants to do with the data and the resulting model. For some types of modelling, particular data are not needed and so are not considered missing.

The constrained optimisation method called spatial microsimulation is traditionally used in the field of geography (Deming and Stephan, 1940) to take a survey sample and replicate it in an optimal way, so as to create a micro-level representation, and the volume, of a wider reference population like the Census. For transport applications, this technique is used to create a micro-level synthetic population that serves as the demand input to transport models. Different types of transport models serve different purposes and require particular demand inputs, depending on whether the unit of mobility in the transport model is the trip a passenger makes or the activity patterns a passenger creates. As the particular transport model depends on whether trips or activities are the unit of analysis, a review of trip- and activity-based mobility concepts provides a broad picture of transport modelling concepts. Multi-agent transport models (Rieser et al., 2007) have the potential to combine the trip- and activity-based concepts, yielding multi-agents with attributes relating to both individual trips and individual activities.

In conventional transport models (gravity models, direct demand models, multi-agent models, etc.), having established the origin and destination demand, it is typically possible to estimate the passenger modes of transport and route choices using methods established in the literature (de Dios Ortuzar and Willumsen, 1994, Hensher and Button, 2007). However, it is particularly difficult to ascertain the context of passenger mobility. Such information would be pertinent for causal inference on passenger behaviour and on the drivers of mobility. The difficulty in establishing the context of mobility results from a dearth of sensors on all modes of transport to measure the idiosyncrasies of passenger behaviour relating to the choices they make. Individual confidentiality issues also limit the amount of information accessible from GPS traces of passenger mobility. In the case of the rail sector, however, detailed information on transit schedules (GTFS) is now available to the public. The implication is that a logistical rail network can be built by incorporating the GTFS information within a Geographical Information System (GIS) (Longley, 2005, Poletti et al., 2017). Assume then that it is possible to synthesize a micro-level demand population of rail passengers with attributes on passenger arrival times at train stations, first trains taken, number of stops, etc. In such a scenario, if the representative population is fed through a representative logistical GIS network, the overlaid events would represent the context of passengers' experience on the rail network. This would yield contextual information, like the crowding in the train when each passenger made their journey, or the waiting times each passenger might have endured. These concepts are developed in the thesis, facilitated by developments in technology that have created readily available individual-level large consumer data. These new developments form the basis of new transport modelling genres. As such, novel sources of large consumer data have the potential to alter traditional transport and multi-agent modelling genres, for instance by precluding the need for complex route choice and utility-maximizing traffic assignment in transport models (Horni et al., 2016, Ortuzar and Willumsen, 2002). These issues are explored in detail in this thesis.
The Bayesian modelling framework (Albert, 2009, Gelman et al., 2014b, Kruschke, 2014), which is becoming increasingly popular for the analysis of complex phenomena, is presented in this thesis, emphasizing its relevance to mobility analysis. In situations where a detailed picture of mobility is required, missing values like the passenger daily rate of use of season tickets (journey factors) can be objectively identified using Bayesian models. Bayesian models enable the imputation of such missing values to be accomplished within a mobility model, thereby better reflecting the mechanism through which the data were created in the first place. An additional advantage of the Bayesian framework is that it consists of simple constructs (Lunn et al., 2012, Plummer, 2003) that facilitate parsimonious modelling of complex phenomena. For instance, the Bayesian truncation and censoring constructs can be used to good effect in mobility models to reflect that, although the journey factors are not known completely, the rail industry has empirical knowledge from local surveys of their range of values, dependent on the length of the journey and timetable restrictions. In the Bayesian framework, a mobility model can be informed of such empirical knowledge using simple constructs (Lunn et al., 2012, Plummer, 2003, Turnbull, 1976).
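As an illustration of how truncation encodes such industry knowledge, the minimal sketch below uses a simple grid approximation (rather than the BUGS constructs employed in the thesis) to restrict the posterior of a season ticket's daily trip rate to a survey-informed range; all counts and bounds are hypothetical.

```python
# A minimal sketch of a truncated posterior for a daily trip rate
# (journey factor).  Industry knowledge is encoded as a flat prior that
# is zero outside a plausible range of 0.5-4 trips per day.
import numpy as np
from scipy.stats import poisson

grid = np.linspace(0.01, 6.0, 600)                      # candidate rates
prior = ((grid >= 0.5) & (grid <= 4.0)).astype(float)   # truncated flat prior

# Hypothetical observed daily trip counts for comparable passengers.
observed_daily_trips = np.array([2, 1, 2, 3, 2])

like = np.ones_like(grid)
for y in observed_daily_trips:
    like *= poisson.pmf(y, grid)        # Poisson likelihood at each rate

dx = grid[1] - grid[0]
post = prior * like
post /= post.sum() * dx                 # normalise to a proper density

print("posterior mean trip rate:", np.sum(grid * post) * dx)
```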

2.2 Consumer data and the rail sector

As briefly discussed in Chapter 1, consumer ticketing data are now increasingly available in the rail sector. Traditional datasets for railway mobility analysis, however, have been measured, stated surveys, designed to be random samples representative of the entire population. Typical examples relevant to the rail sector are the National Rail Travel Survey (DfT, 2013a) and the Census interaction data (Stillwell, 2006), the latter being a 100% sample. These surveys are typically conducted within the railways or at residences within the wider populace. Typically, a range of passenger socio-demographic attributes, journey origins and final destinations, and times of rail station access, first train, egress, etc. are stated and enumerated. Developments in technology have, however, meant that revealed data can be sensed more readily, and this has resulted in increasingly available novel large consumer datasets (Kitchin, 2014, Manyika et al., 2011, Marr, 2015, Viktor and Kenneth, 2013). A good example of such data is the rail sector ticketing data from ATOC's LENNON database. These ticketing data are procured from 8000 points of sale, with about 4.7m daily ticket sales and about 756 different ticket products representing the rich heterogeneity in passenger use.

The rail sector ticketing data were originally meant to sum revenues for allocation to regional train operating companies (TOCs), but their new use for mobility analysis means that the accurate and precise flows on the network associated with each ticket now need to be known (SDG, 2014, SDG, 2016). Each ticket further needs to be associated with a passenger, to relate socio-demographic attributes to drivers of mobility. As a result, the number of journeys per ticket (the journey factor) now needs to be ascertained. Currently, the numbers of journeys made with season tickets are unknown. A single or return ticket would be used once or twice respectively, but a 7-day season ticket, for instance, would have a range of daily rates of use. A lecturer using such a ticket to commute to work during the week would perhaps use the ticket about 10 times, i.e. an average daily rate of less than twice. A student using such a ticket to get to early and late lectures within the 7-day validity of the ticket would perhaps use it more than twice daily. Further, some train stations are grouped together, so that passenger flows to specific stations within the group are unknown. Such requisite information is necessary to build an accurate picture of mobility on the railways. Regional multi-modal transport operators sell tickets of specific coverage that are valid for use on buses and trains; there exists no accurate figure for the number of rail journeys made with these PTE7 tickets.

7 Passenger Transport Executives (PTEs) manage multi-modal ticket sales in regional conurbations outside Greater London. These particular tickets were not available to the project, and as such were not analysed during the research. However, mobility associated with PTE tickets is conceptually similar to mobility related to group stations: PTE ticket use is typically restricted to the stations within a pre-defined rail-zone, just as group station ticket use is restricted to the stations within the pre-defined group. As a result, the analysis methods developed for group stations are applicable to PTE tickets when they become available.

The literature (Viktor and Kenneth, 2013) reports that the attraction of large consumer data lies in the potential contextual richness of such data. In many consumer data scenarios, however, the context of the data is not explicitly recorded and is therefore unknown. In the case of ticketing data, the context would represent the passenger's detailed experience on the rail network. This includes how many people were in each train (representing crowding), passenger waiting times, number of stops, transfers, etc., all of which influence the passenger's attitude towards the railways. Crowding or long waiting times on trains on particular routes could result in a passenger preference for an alternative longer route, with such causal inference only identifiable from knowledge of the context and sequence of mobility phenomena. An additional issue with the LENNON ticketing data, typical of many large consumer datasets, is that they represent a skewed subset of the wider population, rather than a random subset, because the data have arisen for a specific purpose and not from a generalised data collection effort. This poses a challenge in developing methodologies for reconciling or combining these datasets with reference datasets like the Census, which typically contain the sought socio-demographic attributes. In this thesis, methodologies are developed for addressing the above-mentioned gaps and shortfalls of novel consumer data, in particular those available and applicable to the UK rail sector consumer ticketing dataset. Once the data gaps are resolved, the ticketing data are improved, forming a requisite dataset for mobility spatial analysis and thereby enabling a use beyond the original intention of revenue accretion.

2.3 Nomenclature for gaps in data

The different types of gaps in the ticketing data have to be put into context formally, to enable objective methodologies to be developed to plug such gaps and thereby improve the consumer ticketing data. Traditionally, datasets are arranged in rectangular attribute tables of records (rows) representing tuples, and fields (columns) representing variables, as illustrated in the snippet of the LENNON dataset in Table 2.1. Each row in the table represents a ticket sale to an individual (or to a group of individuals each procuring a ticket product with the same attributes). Each ticket includes attributes like purchase time (aggregated into monthly periods for anonymization), station entry (i.e. origin name) and station exit, journeys (number of journeys made per ticket issue), estimated track miles covered per ticket and cost, ticket type, and additional product codes describing further details of the ticket.

Missing data can occur at a primary level, when one or more complete records (rows) are missing; this is called unit omission. At a secondary level, when values for one or more variables within a unit are missing, it is called item omission. The LENNON ticketing dataset is assumed exhaustive (i.e. comprehensive), with entire UK coverage; as such, this research does not account for infilling methods for unit omission, whereby whole tickets are missing. It is assumed that all ticket sales are known and accounted for, but that some details (values) relating to the number of journeys, specific origins and destinations, etc. are missing and require infilling. Cases of unit omission (Schafer and Graham, 2002) are typically more relevant to stated preference surveys, which are designed to be a subset representative of the population.
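The distinction between unit and item omission can be made concrete with a short sketch; the LENNON-like column names below are hypothetical.

```python
# A minimal sketch distinguishing item omission (missing values within a
# record) from unit omission (whole records absent) on a toy table.
import numpy as np
import pandas as pd

tickets = pd.DataFrame({
    "ticket_type": ["single", "return", "7day_season", "monthly_season"],
    "origin":      ["LDS",    "HGT",    "LDS",         None],
    "journeys":    [1,        2,        np.nan,        np.nan],
})

# Item omission: count missing values per variable (column).
print(tickets.isna().sum())

# Rows containing at least one missing item.  Unit omission (entirely
# absent rows) cannot be detected from the table itself; it needs an
# external register of expected records, and LENNON is assumed exhaustive.
print(tickets[tickets.isna().any(axis=1)])
```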

Table 2.1 LENNON table snippet showing season ticket infills.

2.3.1 Statistical missing data mechanism

Here an illustrative example is used to explain the statistical mechanisms of missing data (Allison, 2001, Little and Rubin, 2014), using the concepts of missing at random (MAR), missing not at random (MNAR), and missing completely at random (MCAR). Consider passenger behaviour: it is known that work activity typically takes place at a specific spatial location and time. In order to model a daily trip (or tour) undertaken with a monthly season ticket between an entry and exit station, the number of times the ticket is used daily could depend on the socio-economic characteristics of a passenger (job type, stage in life, affluence, etc.), urban morphology (transport, work and housing infrastructure), and other endogenous and exogenous network influences (station access/egress, cost of travel, comfort, etc.).

Consider the hypothetical case of a salesperson and a lecturer. The behaviour of a salesperson, who chooses to use the season ticket to travel between sales-work destinations many times a day, would be different to that of a lecturer who commutes to work daily on a season ticket. If the dataset available for mobility analysis included a job-type variable, then similar salespersons who use the train for work-related travel activity would be likely to record higher journey factors across season ticket product ranges, and an observation of the job-type variable would capture this behaviour. Within the group of salespersons, if journey factor information for a number of them is missing, it can in principle be deduced or inferred by conceiving a random variation in their behaviour (their journey factors). Within the group of similar salespersons, the frequency of use of season tickets can be treated as a normal random phenomenon, conditional on the other attributes of the salespersons. This embodies the concept that any information missing therein is missing at random (MAR) (Little and Rubin, 2014), and objective inference on such missing data can in principle be achieved from just the observed salespersons. Within a wider group of salespersons and lecturers, had the information on job type not been available, a model of the missing journey factor information could not assume a random variation in journey factors, as this variation would more probably be systematic. Under such conditions any missing information is missing not at random (MNAR) (Little and Rubin, 2014, Rubin, 1976), and any objective inference would be based on both the observed information on salespersons and lecturers and, importantly, on the unobserved job-type information. This random dependence of the season ticket journey factor solely on the observed job-type descriptions in the MAR case, and on the unobserved job-type information in the MNAR case, is central to the statistical modelling of passenger behaviour and to inference on any missing information. MAR and MNAR are terms used to describe the statistical mechanism through which data variables or values became missing. The terminology adopted in the literature to describe this missing phenomenon is the 'missingness' (Rubin, 2004).
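The salesperson/lecturer example can be simulated directly. In the minimal sketch below (all numbers illustrative), journey factors are deleted at a rate that depends only on the observed job-type variable, so the data are MAR given job type; were the job-type column then discarded, the identical missingness would become MNAR with respect to the reduced dataset, and the naive overall mean would be biased.

```python
# A minimal simulation of MAR vs MNAR missingness in journey factors.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
job = rng.choice(["salesperson", "lecturer"], size=n)
# Salespersons make more daily trips on a season ticket than lecturers.
rate = np.where(job == "salesperson", 4.0, 2.0)
jf = rng.poisson(rate).astype(float)

df = pd.DataFrame({"job": job, "journey_factor": jf})

# Missingness depends only on the observed job variable -> MAR given job.
miss = rng.random(n) < np.where(job == "salesperson", 0.4, 0.1)
df.loc[miss, "journey_factor"] = np.nan

# Conditional on job, observed means match the complete-data means
# (MAR missingness is ignorable within groups):
print(df.groupby("job")["journey_factor"].mean())

# Marginally they do not: dropping 'job' makes the mechanism MNAR,
# and the naive overall mean is biased relative to the truth.
print(df["journey_factor"].mean(), jf.mean())
```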

A special MAR scenario occurs when the unobserved component of a dataset is the result of a traditional measured random sample of the population. In such a case, objective statistical inference can be drawn by analysing just the sample. Inference in such a scenario will be independent of the particular observed sample, as an alternative measured random sample of the same population would be expected to yield the same results. In addition, the inference does not depend on the unobserved part of the population that inadvertently did not make it into the random sample. This special 'missingness' MAR scenario, independent of the observed and unobserved data, is described as missing completely at random (MCAR) (Heitjan and Basu, 1996, Little, 1988, Schafer and Graham, 2002).

2.3.2 Ascertaining missingness mechanisms

Many data analysis and imputation strategies are based on the data being MCAR or, at the very least, MAR. To ensure the validity of the proffered solutions, methods have to be developed to test and ascertain the veracity of these assumptions. However, inferring the mechanism of missing data is not straightforward, as there are no definitive tests for this in the literature (Potthoff et al., 2006). Typical missingness tests developed in the literature rely on assumptions about the data that in turn cannot be tested (Rhoads, 2012). A missing data test proposed by Little (1988), applicable to data with a monotone pattern of missing data, ascertains the correlation between sets of covariates to infer dependence. If dependence is established (between a complete case and a covariate with missing values), then it becomes plausible to infer that the variables complement each other, enabling the missing values to be described as MAR. In Little's test, the null hypothesis is that the missing data are MCAR: a p-value of p ≥ 0.05 indicates weak evidence against the null hypothesis, and the hypothesis that the data are missing completely at random is not rejected; if p < 0.05, the null hypothesis that the missing data are MCAR is rejected. Rubin (1976) postulated further conditions for ascertaining the missing data mechanism. Rubin asserted that if (as in the rail sector) the missing data mechanism (a design feature of the railway measuring sensor) is distinct from the phenomenon (railway mobility) being inferred, then the simultaneous behaviour of the (mobility) variables is simplified. If, additionally, new variables that explain the difference between the observed and missing data are found and incorporated in the model, then the missing data can be described as MAR, or missing at random (Little, 1988, Odiari, 2016, Rubin, 1976), with respect to the new dataset. Under such conditions, the previously intractable joint distribution becomes amenably simplified. A formal and detailed presentation of this assertion is given in Chapter 3 (Section 3.2.1), justifying when inference on datasets can be based on just the observed data.
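The sketch below is not Little's test itself, but illustrates the idea behind it in a simplified, single-covariate form: complete cases are compared with incomplete cases on a fully observed covariate, and a systematic difference between the two groups is evidence against MCAR. The data, and the use of a two-sample t-test in place of Little's chi-squared statistic, are illustrative simplifications.

```python
# A simplified MCAR diagnostic in the spirit of Little's test.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
fare = rng.gamma(shape=2.0, scale=20.0, size=500)      # fully observed
journey_factor = rng.poisson(2.0, size=500).astype(float)

# Make journey factors go missing more often on cheap tickets (not MCAR).
missing = rng.random(500) < np.clip(1.0 - fare / 80.0, 0.05, 0.9)
journey_factor[missing] = np.nan

# Compare the observed covariate across complete and incomplete cases.
t, p = ttest_ind(fare[~missing], fare[missing], equal_var=False)
print(f"p = {p:.4f}" + ("  ->  reject MCAR" if p < 0.05 else ""))
```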

Understanding the missing data mechanism requires that the analyst comprehends the details of the data creation process. Once the data creation mechanism can be traced, it becomes possible to separate the observed data into components associated with each variable crystallisation stage. Unravelling the data in this way provides insight into the missing data mechanism, and the strategy has applications for causal inference (Leung, 2004, Pearl, 2009, Pearl et al., 2016). In the case of the ticketing data, which have missing values in the variables for journey factors, ticket validity periods and flows to group stations, the unravelling process separates the outward and inward flows for return tickets, and creates offset and exposure variables to account for the different journeys per ticket and tenures of different ticket types. Additional augmented data are created to represent the range of possible flows to group stations (a similar treatment could be adopted for PTE ticket use, had those data been available to the project). These processes highlight and expose the missingness in the dataset, enabling Little's test to be applied to indicate the nature of additional relevant variables that would make the dataset MAR. The MAR dataset in turn enables infilling methodologies to be developed based on Rubin's postulations.
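A minimal sketch of this unravelling, with hypothetical fields, splits a return ticket into outward and inward flow records and carries the ticket tenure as an exposure variable for a subsequent rate model.

```python
# A minimal sketch of unravelling ticketing data into flow records.
import pandas as pd

tickets = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "ticket_type": ["single", "return", "7day_season"],
    "origin": ["LDS", "HGT", "SHF"],
    "destination": ["HGT", "LDS", "LDS"],
    "validity_days": [1, 1, 7],       # tenure per ticket product
})

# The return leg of a return ticket: swap origin and destination labels.
returns = tickets[tickets["ticket_type"] == "return"]
inward = returns.rename(columns={"origin": "destination",
                                 "destination": "origin"})

flows = pd.concat([tickets, inward], ignore_index=True)
flows["exposure"] = flows["validity_days"]   # offset for a rate model
print(flows[["ticket_id", "ticket_type", "origin", "destination",
             "exposure"]])
```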

2.4 Review of spatial microsimulation

Apart from journey factors, ticket validity periods, and group station flows, which are variables with inherently missing values within the LENNON ticketing data, the LENNON data have a range of additional attributes (ticket type, origin-destination station, cost of ticket, etc.) defined on the ticket product. For mobility analysis, however, a range of additional variables is necessary (like the passenger associated with each ticket, journey purpose, socio-demographic attributes, etc.). It is unlikely that all these variables would be available in a single dataset, so methods typically have to be developed to combine the available data with other datasets containing the sought additional attributes. When large novel consumer data are considered for use beyond their original purpose, data ownership, legal and ethical issues are first resolved. A typical data analytics paradigm then involves investigating an appropriate theoretical context, and linking to additional datasets with the sought attributes to enable the development of such a framework. When a full repertory of attributes becomes available, the data analytics process typically involves an iterative cycle of classification of the variables and data values, regression, and predictive analytics.

Due to the disparate nature of datasets, linking is usually more complicated than a database operation or logical rules on common attributes. Typically the datasets have different resolutions and geographic spatial scales, requiring more sophisticated reconciliation methods. One sophisticated strategy for combining datasets is the spatial microsimulation methodology, developed by population geographers for micro-level population synthesis. Early Census requirements meant that, despite complete counts of the population, only a small sample of anonymized records (SAR) of cross-tabulations of all individual attributes was released, alongside comprehensive sets of aggregate tables typically limited to only three variables per table (Williamson et al., 1993). Despite the anonymization and aggregation in respect of individual privacy, population geographers realized that comprehensive and disaggregated records would enable the construction of more detailed pictures of geographic phenomena. This led to a need to reconcile the small SAR cross-tabulation (called the sample or, at times, the proposal distribution) with aggregate Census data (called the target distribution) to create comprehensive micro-data using the spatial microsimulation methodology (Deming and Stephan, 1940).

2.4.1 Deterministic spatial microsimulation

The first application of spatial microsimulation (also called population synthesis) is widely attributed to Deming and Stephan (1940), who applied the method to the USA Census. A synthetic population was formed from a combination of multiples of individuals in a sample of anonymized records (the seed data) that best fits the aggregate values defined in Census tables, using the Lagrange multiplier constrained optimization method (Bertsekas, 2014). With the development of computers and numerical analysis, instead of solving the linear equations using the Lagrange multiplier method, iterative methods gradually converging towards an optimum were proposed (Fletcher, 2013, Kelley, 1999). It became increasingly efficient to resort to these iterative proportional fitting (IPF) strategies, as they are faster to compute, less sensitive to numerical and round-off errors, and simpler in algebra than the comparative formulations by Lagrange and Fermat (Fermat, 1891, Lagrange, 1867). The IPF algorithm has found applications, under various names, in fields as diverse as geography (Birkin and Clarke, 1988, Harland et al., 2012, Upton, 1985), transport engineering (Duguay et al., 1976, Fratar, 1954, Furness, 1965), economics (Bacharach, 1970), and statistics and computer science (Lavrakas, 2008).
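The classical two-dimensional case can be sketched in a few lines: a seed cross-tabulation is alternately rescaled to match known row and column marginals (the aggregate constraints). The m-IPF developed in this thesis extends the same idea to further dimensions; the numbers below are illustrative.

```python
# A minimal sketch of two-dimensional iterative proportional fitting.
import numpy as np

seed = np.array([[10.0, 5.0],
                 [ 3.0, 7.0]])           # sample cross-tabulation (proposal)
row_targets = np.array([40.0, 60.0])     # aggregate constraint, dimension 1
col_targets = np.array([55.0, 45.0])     # aggregate constraint, dimension 2

w = seed.copy()
for _ in range(100):
    w *= (row_targets / w.sum(axis=1))[:, None]   # rescale to fit rows
    w *= (col_targets / w.sum(axis=0))[None, :]   # rescale to fit columns

print(np.round(w, 2))                    # fitted weights
print(w.sum(axis=1), w.sum(axis=0))      # both marginals now match
```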

2.4.2 Stochastic spatial microsimulation

Apart from the above deterministic (IPF) spatial microsimulation strategies, which yield the same result on repeat, the alternative simulation strategies are stochastic, typically Markov chain Monte Carlo (MCMC) based methods. MCMC methods are efficient for converging to solutions of intractable, complex constrained optimization problems (Asmussen and Glynn, 2007, Brooks et al., 2011). Population geographers have developed a range of MCMC variants to complement traditional deterministic IPF algorithms. Some of these methods have been branded hill-climbing and have been compared to IPF methods by Kurban et al. (2011). The strategies branded simulated annealing have been compared with IPF by Harland et al. (2012). Another MCMC-based strategy, the genetic algorithm, was used to simulate network traffic by reconciling observed and estimated flows, opening the realm for application to transport problems (Dimitriou et al., 2006). Whilst established stochastic MCMC methods are based around the Metropolis-Hastings algorithm (Chib and Greenberg, 1995), more recent implementations built around the Gibbs sampling algorithm have emerged for application to transport (Chib, 1995, Farooq and Bierlaire, 2017). These latter Gibbs-based methods are in their infancy. Synthetic reconstruction is another spatial microsimulation strategy, developed in the UK (Birkin and Clarke, 1988, Birkin et al., 2006), exploiting the probabilistic indicative potential of the Sample of Anonymized Records (SARs) dataset from the Office for National Statistics (ONS).

2.4.3 Remarks

The difference between the alternative deterministic and stochastic spatial microsimulation strategies lies in the numerical algorithm or probabilistic method adopted for estimating the weights assigned to individuals in the seed data (to replicate the seed to form the synthetic population). In deterministic methods the weights are arithmetic fractions of items in the proposal distribution, iteratively improved to fit the aggregate target constraints. In stochastic methods, random samples are taken recursively with replacement from the seed data to improve the objective function being optimized, until the proposal distribution converges to the target distribution. As a result, deterministic strategies yield fractions of individuals, while stochastic procedures yield integer multiple counts of individuals. Widely used deterministic strategies are reported in the literature (Barthelemy et al., 2016, Lovelace and Dumont, 2016), as are stochastic strategies (Kavroudakis, 2015, Williamson et al., 1998).
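The contrast can be sketched with a minimal stochastic reweighting of a toy seed, here a greedy hill-climb; a simulated annealing variant would occasionally accept worse moves to escape local optima. All values are illustrative.

```python
# A minimal sketch of stochastic (hill-climbing) population synthesis:
# integer counts of seed individuals are perturbed at random, and a move
# is kept only if it does not worsen the fit to the aggregate constraint,
# yielding whole-person weights rather than IPF's fractional ones.
import numpy as np

rng = np.random.default_rng(3)
seed_age = np.array([0, 0, 1, 1])        # two age classes in a 4-person seed
target = np.array([30, 70])              # zone totals per age class

counts = np.full(4, 25)                  # start: 25 copies of each person

def error(c):
    synth = np.bincount(seed_age, weights=c, minlength=2)
    return np.abs(synth - target).sum()

for _ in range(5000):
    trial = counts.copy()
    i = rng.integers(4)
    trial[i] = max(0, trial[i] + rng.choice([-1, 1]))
    if error(trial) <= error(counts):    # greedy accept
        counts = trial

print(counts, np.bincount(seed_age, weights=counts, minlength=2))
```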

Spatial microsimulation has been reviewed as a useful tool to create a micro-level demand population for use in subsequent transport modelling stages. As the usefulness of the demand population created depends on the particular transport modelling genre adopted, the traditional trip-based and activity-based transport models, and the more recent multi-agent models, are briefly reviewed (Horni et al., 2016, Stillwell, 2006, Stillwell and Duke-Williams, 2003). In the next section, traditional transport modelling genres are reviewed, leading on to details of how novel Geographical Information System (GIS) logistical models can further improve the contextual detail in synthetic demand populations, in the process creating a dataset rich in exogenous (outside-network) and endogenous (within-network) attributes and precluding the need for the route choice and transit assignment stages of the traditional 4-stage transport models. The indication, therefore, is that the availability of novel consumer data has the potential to alter transport modelling genres. This forms the crux of the work related to the spatial microsimulation and GIS network simulation presented in this thesis.

2.5 Transport modelling genres

Analytical transport planning models are used to manage phenomena on the railways (Assad, 1980). Historically these have been classed into three model types, reflecting their use in the hierarchy of complexities of railway management (Anthony, 1965). Strategic models are designed to address long-term management issues, like building a new station or a high-speed line on the National Rail infrastructure (Preston, 2012, Treasury, 2011). Tactical models address medium-term rail network supply and demand management by simulating patterns of carriage, freight and passenger flows, thereby assigning demand to trains to enable optimal scheduling. Finally, operational models address day-to-day management activities like timetable setting and re-scheduling. A severe impediment to the effective use of these strategic, tactical and operational models, however, is the unavailability of objective, detailed seed data on passenger counts and generalized costs (Amos et al., 2010, Dempster, 2012, Leung, 2004, Rao and Rao, 2009). This research aims to plug that particular gap by creating a rich, requisite micro-level synthetic population, likely to have a significant impact on the quality of inputs to strategic, tactical and operational rail-sector planning models.

2.5.1 Trip-based mobility models

One of the traditional modelling frameworks in the transport sector is the classic four-stage trip-based transport mobility model, in which the unit of measurement of travel phenomena is the trip. In the traditional four-stage model, the mobility phenomenon is conceived as comprising four steps: trip generation, trip distribution, mode choice and route choice, as illustrated in Figure 2.1. In practice, a further equilibrium of the network of flows is sought, to optimize a particular utility of the network passengers (Patriksson, 2015, Willumsen, 1981). Behavioural models and analytical methodologies are then developed for each stage to ascertain the influence of the associated exogenous and endogenous passenger and network attributes.

Figure 2.1 Traditional 4-stage transportation modelling concept.

2.5.1.1 Trip-based mobility

Trip generation is traditionally modelled by expressions relating the socio-demographic characteristics of zones to their ability to attract and produce flows of passengers. Measures of the estimates for the zones are then normalised to ensure that the outward and inward flows for each zone are equal. Trip distribution follows on from generation, and cross-tabulates the number of trips originating in and destined for the zones in a matrix. Various strategies have evolved over the years for modelling trip distribution, including the gravity model (Zipf, 1946), the radiation model (Simini et al., 2012), entropy maximisation models (Wilson, 1969), and a host of other so-called spatial interaction models (Fotheringham, 1983, Wilson, 1971). Mode choice models then yield the choice behaviour of the population toward a particular mode of transport. Early requirements meant that diversion curves (de Dios Ortuzar and Willumsen, 1994) were adopted for splitting travel into the various modes, positing that the monetary and time costs of travel are the driving influences on mode choice. Disaggregate choice models have more recently been put forward as an effective modelling strategy for mode choice (Ben-Akiva and Lerman, 1985a, McFadden, 1973). Route choice models (Ben-Akiva et al., 2004) appraise passengers' perceptions of route characteristics to forecast the paths of travel likely to be chosen. Typically a multinomial logit regression is used to model the choice set of routes, with characteristics including length, time, cost, scenic nature, etc., yielding an optimal choice of route.
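A minimal sketch of such a multinomial logit route-choice model follows; the route attributes and taste coefficients are illustrative, not estimated values.

```python
# A minimal sketch of multinomial logit route choice: each candidate
# route receives a utility from its characteristics, and the softmax of
# the utilities gives the choice probabilities.
import numpy as np

# Route attributes: [in-vehicle time (min), fare (GBP), transfers]
routes = np.array([[35.0, 4.2, 0],
                   [28.0, 5.6, 1],
                   [45.0, 3.1, 0]])

beta = np.array([-0.05, -0.30, -0.60])   # disutility per unit attribute

utility = routes @ beta
prob = np.exp(utility - utility.max())   # subtract max for stability
prob /= prob.sum()
print(np.round(prob, 3))                 # probability each route is chosen
```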

Much of the modelling effort in the literature has concentrated on developing the individual stages of the 4-stage model. The modelling framework developed in this thesis combines traditional spatial microsimulation from geography with transport GIS network analysis, optimized by utilizing information from the general transit feed specification (GTFS8), to provide an alternative strategy to the 4-stage model. The NRTS, Census and LENNON ticketing data are combined to simulate a representative population of railway passengers embedded in the wider population, removing the need to generate and distribute trips, and to choose modes and routes, for passengers. The simulated population inherits the attributes of the NRTS, Census and LENNON data, leaving only an analysis stage to identify the drivers of mobility.

2.5.1.2 Direct demand forecasting

While increasingly sophisticated mobility models with increased behavioural content and improved data were being developed, the railway industry sought to improve mobility models by resorting to simplified transport demand models. One such effort yielded the Passenger Demand Forecasting Handbook (PDFH), a reference handbook for railway demand forecasting in the UK and a comprehensive guide to the state of practice. The PDFH effectively integrates the various stages of the traditional 4-stage model into a single model, creating a so-called direct demand model, as illustrated in Figure 2.1. The PDFH utilizes econometric elasticities (Worsley, 2012) to forecast the change in passenger demand due to a commensurate change in endogenous and exogenous drivers of mobility (Wardman et al., 2007, Worsley, 2012). The methods developed in this research are geared toward the category of increasingly sophisticated and richer behavioural models, and are complementary to the established PDFH, better explaining the dynamics of mobility in areas where the PDFH might have weaknesses. As a simplified elasticity-based demand model (Kumar, 1980, Willumsen, 1993), the PDFH has been chosen in the rail industry as a trade-off between modelling cost, scale and complexity. As reported in the literature (Chapin, 1974, Epstein and Axtell, 1996, Ettema and Timmermans, 1997, Jones et al., 1983), newer and alternative demand forecasting strategies have resulted from a need to test hypotheses relating individual behaviour to macroscopic mobility regularities, in order to deliver an efficient passenger service on better operated infrastructure. The trip is a derived demand, resulting from a desire to fulfil a series of activities in a specific order within a constrained space and time; yet it is the trip that is adopted as the basic unit of research in many mobility studies (Zhang et al.), with less attention devoted to activity patterns and tour models. This is mainly due to the difficulties in defining activity-pattern choice sets, in coping with the size of the range of possible tour models (Bifulco et al., 2010, Bowman and Ben-Akiva, 2001), and in reconciling analytical models of mobility interaction with physical representations of the urban environment and transport infrastructure. Ultimately, it is the desire to fulfil an activity that generates the mobility, so in developing richer mobility models, activity patterns and not trip counts form a better unit of research (Axhausen and Gärling, 1992, Bhat and Koppelman, 1999, Bowman and Ben-Akiva, 2001, McNally and Rindt, 2007).

8 The GTFS enables the identification of the accurate location of the trains, and when integrated with a GIS network dataset enables the identification of a utility-maximizing route of passenger travel, constrained by the service provisions of the rail network.
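The multiplicative elasticity form of such direct demand forecasting can be sketched as follows; the elasticities and driver changes are illustrative numbers, not PDFH values.

```python
# A minimal sketch of elasticity-based direct demand forecasting:
# forecast demand scales with each driver raised to its elasticity.
base_demand = 1_000_000          # annual journeys on a flow (hypothetical)

drivers = {                      # driver: (ratio of new to old, elasticity)
    "fare":            (1.05, -0.9),   # 5% fare rise, negative elasticity
    "journey_time":    (0.95, -0.8),   # 5% faster journeys
    "regional_income": (1.02,  1.3),   # 2% income growth
}

forecast = base_demand
for name, (ratio, elasticity) in drivers.items():
    forecast *= ratio ** elasticity

print(f"forecast demand: {forecast:,.0f} journeys")
```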

2.5.2 Activity-based mobility models

Activity-based discrete choice strategies (Train et al., 1987) model mobility behaviour as a series of interrelated choices. The illustration developed in this research, shown in Figure 2.2, is a disaggregate conceptual model of mobility based on individual choice behaviour, enabling time-space geographic constraints (shown by the constrained lines of flows) to be incorporated in modelling tours, and enabling subsumed individual activity patterns (concentric circles) to be modelled. The new conceptual mobility model is individual-level and disaggregated, and mobility is conceived as broken down into stages whereby a household activity pattern, for instance, subsumes an individual's activity pattern.

Within this conception, tour models are conceived so that the tour time-of-day choice is constrained by the tour destination choice, in turn constrained by the tour mode choice and route choice (Figure 2.2). This conception enables the incorporation of geographic space-time constraints and of behavioural analytical methodologies based on discrete choice models (DCMs), developed to model the mobility (Hess and Daly, 2014). DCMs are amenable to use with a wide range of data types as input, and offer great flexibility in model specification, rendering them suitable for defining a wealth of models (Ben-Akiva and Lerman, 1985b, Brownstone, 2001, McFadden, 1980, McFadden and Train, 2000).

Figure 2.2 Activity-based mobility modelling concept.

In this thesis the activity modelling construct shown in Figure 2.2 is not developed any further, due to the unavailability of activity schedule data to the project. However, the micro-level synthetic population created in this research contains a level of detail comparable to that expected of modern multi-agent activity-based models. The novel developments in the spatial microsimulation and the GIS-GTFS network models proposed in this thesis facilitate accurate assignment of passengers to the transport traffic, thereby precluding the need for utility-maximizing traffic assignment in traditional multi-agent models (Horni et al., 2016).

2.5.2.1 Review of multi-agent transport models

Many traditional surveys generate stated preference data (George et al., 2014), as opposed to consumer data, which tend to be revealed preference. The revealed circumstantial detail in consumer data potentially enables the creation of rich contextual detail, in turn enabling causal inference on the drivers of passenger demand (Grimmer, 2015, Varian, 2014). The context of big data is typically identified by replicating the physical and logistical circumstances of data creation. It is difficult and tedious to replicate a real physical system (Assad, 1980), but in the case of the rail sector the recent availability of both pre-scheduled and real-time transit train movement information, the so-called GTFS information (ATOC, 2017), enables the accurate reproduction of transit mobility and of the associated circumstances surrounding the passenger's experience within the logistical GIS railway infrastructure. This forms the GIS-GTFS network model.

The multi-agent transport modelling strategies based on the traditional 4-stage model (Ortuzar and Willumsen, 2002), and implemented in software like MATSim (Gao et al., 2010, Horni et al., 2016) and EMME (Consultants, 1999), broadly consist of the following stages: demand and activity generation, modal and route choice, network and traffic simulation, and then adaptation, feedback and learning for traffic assignment. MATSim and EMME are in the main traffic assignment tools, typically targeted at the last two of the four stages, while adopting external solutions for the first two. For the rail sector, an application which exploits novel consumer data in the spatial microsimulation process, and then identifies the context of mobility using a GIS and the GTFS information, would address the first three stages of multi-agent modelling. The final stage of traffic assignment arose in multi-agent genres because the simulated input demand is typically an imprecise reflection of passenger mobility, requiring utility equilibrium models for the assignment of flows to the transport network. With novel consumer data, which when improved potentially reveal accurate and precise contextual detail of mobility, the necessity of complex utility-optimizing traffic assignment may need a review. In essence, if the simulated data created are rich enough (containing information on passenger arrival time at the train station, first train, intermediate stops, final destination, ticket restrictions, etc.), they preclude the need for the added complexity of optimizing the passengers' assignment to the network traffic. Creating such a representative micro-level synthetic demand population of railway passengers forms a major part of the research in this thesis.

MATSim has not been used in conjunction with population synthesis methods based on large, skewed consumer mobility interaction data, and there are no papers in the literature that have used mobility interaction data as the basis for creating a synthetic population. MATSim, being a traffic assignment model, would remodel the mobility of passengers derived from a spatial microsimulation as part of the process of creating a marginal-utility transport assignment model (Horni et al., 2016). As a result, data output from MATSim and similar traffic assignment models strictly can no longer be seen as portraying native mobility data that can be used for, say, subsequent Bayesian regression and causal inference on the drivers of mobility; in such a scenario, the causal inference would identify the attributes of the marginal utility model developed as part of the MATSim implementation. The approach presented in this thesis excludes the complexities of remodelling the input demand data (Nagel and Flötteröd, 2012, Patriksson, 2015). Instead, the approach enhances the data by incorporating endogenous attributes from the GIS-GTFS network simulation, in readiness for subsequent spatial analysis by Bayesian modelling as developed in this research (Gelman et al., 2014b, Lunn et al., 2000). The framework for harnessing typically skewed consumer data, including mobility interaction through a 3-level simultaneous constraint hierarchy within a spatial microsimulation, and then simulating rich exogenous and endogenous attributes using a GIS-GTFS model, has the potential to extend or alter the traditional multi-agent transport simulation genres. Applying this novel strategy to big data would be the first time such a method is presented in the literature.

2.5.2.2 Alternative disaggregate models

Apart from the trip-based, activity-based and multi-agent transport modelling genres, an alternative framework is agent-based modelling. Agent-based models (ABMs) are maturing as a concept for providing insights into urban phenomena as influenced by environmental and socio-demographic characteristics, with applications in transport modelling (Huynh et al., 2014, Miller et al., 2004, Raney et al., 2003, Salvini and Miller, 2005). In ABMs, insights gained from traditional transport modelling exercises are used to assign behaviour to digital agents. These agents are simulated as they engage in mobility, and the pattern of mobility that evolves is studied. Traditionally, ABM implementations are not aimed at replicating reality (Malleson and Birkin, 2012), but instead seek to disaggregate the complexities of mobility into constituent parts which, when combined, create a richer picture of the evolution of the mobility process (Heppenstall et al., 2011).

In transport ABMs, each individual or entity is modelled as an agent with behaviour, autonomy, activity schedules and household associations, to simulate the evolution of mobility. ABMs are disaggregating models. The ABM transport demand model for Sydney (Huynh et al., 2014), incorporating the TRANSIMS package, consists of a population synthesizer and a route planner, which combined are akin to the spatial microsimulation implemented in this thesis; the traffic micro-simulator of the TRANSIMS package is akin to the GIS-GTFS network simulation implemented in this thesis. ABM implementations tend to include a logit choice model to objectively describe the utility-maximizing decision-making processes of agents. The difference between the framework for urban mobility presented in this thesis and ABM is that the former aims to replicate reality, whilst the latter instead explores the emergent logit choice phenomena yielded by different scenarios of interacting agents. Replicating reality yields more revealing results; hence this research is a progression from ABM.

2.6 GIS logistical network models

Geographic Information Systems (GIS) are established in the literature (Goodchild, 1992, Longley, 2005), and their strength lies in the ability to scale across any physical expanse, providing a holistic view of objects in their real-life context. This strength gives GIS enormous potential for use in building a model replicating a logistical rail network. The strengths of GIS technology include structured layered features, accurate feature classes, precise attribute data and topologically rich analytical models. As such, the role of GIS (as in a realistic rail network model) is that of an integrator, analyser and visual thematic display (Egenhofer and Kuhn, 1999, Odiari, 2011) which seamlessly synergises the transit schedule information embedded in the GTFS (ATOC, 2017). GIS is inherently ideal as an integrator firstly because of its ability to scale across any expanse (geospatial, layer, schematic and temporal), ideally suited to identifying global and local trends. Secondly, GIS supports a methodical, homogeneous collection of common features with rich topological relationships across features, enabling seamless application across workflows. Thirdly, in GIS, annotations and descriptive attribute data are very specifically defined, such that, for example, if an attribute in a feature class is of a specific data type, the field will only accept accurately and precisely formatted data as input; this strong data typing is ideally suited to precise and accurate spatial analysis (Odiari, 2011).

The GTFS information consists of detailed transit schedules, and when used in conjunction with a GIS it enables a query of the precise mobility of each transit train in space-time on the rail network. The context of mobility of each passenger input to the rail network is identified by querying the passenger's network routes solved using the GIS-GTFS model. Model Builders (ESRI, 2016) are used in this research to automate and simplify the GIS analysis steps. Once the model of the rail network is built, the inherent nature of GIS9 enables an assessment of 'what-if' scenarios, enabling interventions to be assessed. As an integrator, analyser and visual thematic display (Egenhofer and Kuhn, 1999), a GIS can be extended for application as a strategic, tactical and operational railway model. Once the attribute-rich micro-level representative data have been created using the first two methodologies of spatial microsimulation and GIS-GTFS network modelling, the outstanding issue becomes the missing behavioural attributes of the rail passengers. As highlighted earlier, these missing values include journey factors, ticket validity periods and flows to group stations, and are deduced by creating Bayesian models of the mobility phenomena, whereby any missing values are identified just like the parameters of the mobility model. Bayesian modelling is reviewed in the next section in the context of its application to the imputation of missing ticketing values and for use in mobility analysis, highlighting the particular attraction of such a framework.

9 Strengths highlighted above make GIS suited for interventions analysis.
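To make the space-time query concrete, a minimal R sketch is given below of how the plain-text GTFS tables can be interrogated for the scheduled trains serving a given origin-destination pair; the stop codes, file path and time window are illustrative assumptions rather than values from the ATOC feed.

```r
# Illustrative sketch: find scheduled trains serving an origin-destination
# pair within a departure window, using the plain-text GTFS tables. The stop
# codes ("LDS", "KEI") and the file path are assumptions for illustration.
stop_times <- read.csv("gtfs/stop_times.txt", stringsAsFactors = FALSE)

org <- subset(stop_times, stop_id == "LDS")  # calls at the origin station
dst <- subset(stop_times, stop_id == "KEI")  # calls at the destination station

# A trip serves the O-D pair if it calls at both stops, with the origin
# preceding the destination in the stop_sequence ordering
od <- merge(org, dst, by = "trip_id", suffixes = c("_o", "_d"))
od <- subset(od, stop_sequence_o < stop_sequence_d)

# Fixed-width GTFS times (HH:MM:SS) compare correctly as text
peak <- subset(od, departure_time_o >= "08:00:00" & departure_time_o <= "09:00:00")
peak[, c("trip_id", "departure_time_o", "arrival_time_d")]
```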

2.7 Review of Bayesian modelling framework

Big data tend to reveal customer choice behaviour, which can evolve and interact with changes in service provision. As a result, caution has to be exercised in adopting traditional black-box machine learning, neural network and artificial intelligence analysis solutions that assume a sufficient volume of data alone guarantees objective inference. This perception takes no cognisance of, and does not incorporate knowledge of, changes in service provision, which are the circumstances of the data generation and which may present a time-variant dataset. A neglect of the mechanism of data creation has been reported in the literature, leading to the so-called 'big data hubris' (Bollen et al., 2011, Butler, 2013, Lazer et al., 2014). These point to a need to understand the mechanism of the data generation process, especially in relation to the complexities of mobility, prior to data imputation or to adopting an analysis methodology.


2.7.1 Data imputation strategies

The choice of imputation method for each dataset depends on the statistical mechanism of omissions (Graham, 2003, Little and Rubin, 2014), the data type and relationship between key variables (Honaker and King, 2010), and the percentage and distribution of omissions (Allison, 2001, Dong and Peng, 2013). The concept of missing at random (MAR) is central to current state-of-the-art imputation methods (Little and Rubin, 2002), as it enables inferences on the population to be made from incompletely observed data. A number of data imputation (also called infilling) methods have been developed over the years, as widely reported in the literature (Allison, 2001, Graham, 2009, Heckman, 1979, Rubin, 1976, Rubin, 1996, Schafer and Graham, 2002, Stekhoven and Bühlmann, 2012). These are broadly classified as ad-hoc, heuristic, or parametric methods. Ad-hoc methods replace the missing values directly based on an assumption about knowledge of the missing values. Heuristic and parametric imputations are principled methods, as they do not assume knowledge of the missing values (by, for example, assuming that they are equal to the mean of the other values); instead, principled methods take the context of the observed data as the basis for estimating the missing values. Heuristic methods are non-parametric, with no inherent assumptions about the distribution or model of the data, but exploit practical evolutionary learning algorithms to predict the missing data values. Parametric methods, on the other hand, assume a prior parametric model and distribution for the data. Summaries of a range of established ad-hoc, heuristic, and parametric imputation strategies are detailed in the literature (Horton and Lipsitz, 2001, Rubin, 1996). Ad-hoc imputation methods are attractive because they are simple to implement, but involve strong unsubstantiated assumptions about the data, such as assuming the missing data are the mean of the observed data in the means method (Schafer, 1997, Schafer and Graham, 2002), or list-wise deletion (King et al., 1998), which assumes the data are MCAR. Current industry rail ticket infilling methods adopt a logical-rules-based ad-hoc method which assumes the passenger always adopts the cheapest route of travel (SDG, 2011, SDG, 2016, Taylor, 2013b). The disadvantages of ad-hoc imputation strategies in over-simplifying the imputation problem are widely reported in the literature (Allison, 2001, Little and Rubin, 2002, Little and Rubin, 2014, Rubin and Schenker, 1991, Schafer and Graham, 2002, Van Buuren et al., 1999, van der Heijden et al., 2006). In the rail sector, how missing data values are dealt with is important, as they relate to ticket-revenue allocation to TOCs.
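The contrast between an ad-hoc and a principled imputation can be sketched briefly in R; the toy variables below are hypothetical, and the MICE call stands in for the wider family of principled methods discussed above.

```r
# Sketch contrasting an ad-hoc means infill with a principled multiple
# imputation. Variable names (journey_factor, distance, cost) are hypothetical.
library(mice)

tickets <- data.frame(
  journey_factor = c(1.0, NA, 2.1, NA, 1.4, 0.9),
  distance       = c(12, 35, 8, 40, 15, 10),
  cost           = c(3.2, 7.5, 2.1, 8.0, 3.9, 2.8)
)

# Ad-hoc means method: every missing value replaced by the observed mean,
# understating variance and ignoring the covariate context
adhoc <- tickets
adhoc$journey_factor[is.na(adhoc$journey_factor)] <-
  mean(tickets$journey_factor, na.rm = TRUE)

# Principled alternative: multiple imputation by chained equations (MICE)
# with predictive mean matching, conditioning on the observed covariates
imp <- mice(tickets, m = 5, method = "pmm", seed = 1, printFlag = FALSE)
complete(imp, 1)  # first of five completed datasets
```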

Non-parametric heuristic imputations aim to capture the complexities and non-linearities found in realistic mobility data by employing machine learning strategies (Shah et al., 2014). A change in train access or timetable, resulting in a change in cross-sectional demand parameters over time, can be better captured by a model that allows time-trend shifts across period cross-sections of data. True mobility phenomena derived from contextually rich datasets include complexities and non-linearities (Giannotti et al., 2011) not easily captured by default parametric and semi-parametric models, resulting in a need for suitable machine learning strategies. A range of heuristic imputation methods are established in the literature (Batista and Monard, 2002, Siddiqui and Ali, 1998, Stekhoven and Bühlmann, 2011). A disadvantage of heuristic methods lies in their complexity and inability to replicate the data generation process and models. The predictive mean matching method (Durrant, 2005) often used in heuristic strategies, for instance, infills randomly from observed values, such that in a hierarchical dataset with observations missing in one of the hierarchies, the imputation will consistently fail. Parametric methods typically involve creating a regression model representing the mechanism of data generation, and using this as a basis for imputing missing values (Enders and Bandalos, 2001, Gelman et al., 2014b, Rubin and Schenker, 1991, Van Buuren, 2007). The parametric methods differ by including models in the imputation process to reflect the mobility phenomena. MICE (Van Buuren, 2007), EMB (Honaker et al., 2011) and BUGS (Spiegelhalter et al., 1996) are attractive imputation strategies that enable the uncertainty associated with the imputation to be estimated. The weakness of MICE methods is the implicit MAR assumption, which is neither ascertained nor substantiated. EMB and BUGS are suited to realistic problems with a large number of covariates (typically >20), while BUGS further enables the easy creation of models better reflecting reality (Lunn et al., 2012, Plummer, 2003), making the Bayesian framework the strategy of choice.

2.7.2 Bayesian analysis strategies

Research has shown that the particular mobility model adopted (whether variants of Lévy flight, random walk or the more traditional models (Gonzalez et al., 2008, Simini et al., 2012)) heavily impinges on the accuracy of the forecasts based on the data. Imputing missing values is conceptually equivalent to forecasting unknown values, and as such the need to include an appropriate mobility model cannot be overstated for imputation applications. The particular model adopted has to take cognizance of the data generation process, especially in consumer data applications where data are not collected purposefully but revealed from consumer responses to typically evolving service provision. As mentioned earlier, this was poignantly brought to the fore in the 'big data hubris', where being large (50 million records) was assumed self-sufficient to provide correct inferences, without taking account of the data mechanism and generation process, in reconciling with a stated survey of about 1,152 data points (Butler, 2013, Lazer et al., 2014). Bayesian modelling is increasingly gaining prominence for data imputation and analysis, as many of the advanced ad-hoc and heuristic methods increasingly adopt Bayesian procedures to solve their intractable expressions (Roy et al., 2017, Paddock, 2002). Bayesian strategies have potential for application to mobility on the railways, which tends to be non-linear and complex (Krajzewicz et al., 2002). To put these into perspective with an example: rail passengers are attracted to season tickets because they can be used multiple times daily, and they also tend to be cheaper commuter fares (Hensher and Bullock, 1979). One-way and single return tickets, on the other hand, have truncated uses restricted to one single and one return journey respectively. All passengers are considered identically distributed (Clauset, 2011) by coming from the same pool (the distribution of railway passengers). However, the passengers clustered by ticket product can also be described as exchangeable in the statistical sense (De Finetti, 1972), since the behaviours across the groups are independent subject to belonging to the same distribution of railway passengers. As a result of these complexities, mobility on the railways is better modelled as hierarchical mixtures, best represented within the Bayesian framework (Gelman et al., 2014b), making Bayesian modelling the particularly apt strategy (McElreath, 2015). The rate of use of season tickets would depend on the length of the journeys the passenger makes: with only 24 hours in the day, passengers on lengthy journeys can repeat these fewer times than passengers making shorter trips. An adequate imputation method would need to reflect the potential differences in daily journey rates (journey factors) associated with the distances the passengers travel. It is further known that a passenger's daily trip rate also relates to their age bracket (DfT, 2013b). The Bayesian framework has simple constructs that enable the censoring of journey factors subject to the passenger's travel distance (Lunn et al., 2012, Plummer, 2003). Another set of tickets with missing values are those with flexible periods of validity. For instance, the ticket product MTL is valid for between 90 and 180 days, and whilst a fare is paid, no information about the specific number of days of validity of these tickets is recorded.
As such, in imputing these values, an adequate model would need to apply a 90-180 day truncated limit on the imputed values for the ticket validity. The Bayesian framework has such simple constructs for truncating the validity periods of flexible tickets. This facilitates the inclusion of industry knowledge garnered from a wealth of empirical railway experience. Within the Bayesian modelling framework, the strength for the analyst lies in the ability to easily and flexibly specify an adequate representative model that captures the relevant facets of mobility. Bayesian models are particularly suited to hierarchical phenomena (Raudenbush and Bryk, 2002) associated with the time-series cross-sectional nature of the ticketing data. Within this framework, the Bayesian methodology enables data augmentation and imputation by weights (Horton and Kleinman, 2007), suitable for application to the imputation of flows to grouped stations (and to PTE ticket use). All these constructs can be brought to effective use with a few simple steps, implemented within a mobility model, thereby taking cognizance of the mechanism that generated the data in the first instance. On this basis, there is a compelling reason to adopt Bayesian modelling for the imputation of railway ticketing data and for mobility analysis.
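As a brief illustration of the truncation construct, a minimal JAGS-style model (cf. Lunn et al., 2012, Plummer, 2003) is sketched below for a flexible-validity product; the node names, data values and priors are hypothetical, and the sketch assumes the rjags package and a JAGS installation.

```r
# Minimal JAGS sketch of the truncation construct for a flexible-validity
# ticket: unobserved validity days (coded NA) are imputed like parameters of
# the model, restricted by T(,) to the known 90-180 day product window.
library(rjags)

model_string <- "
model {
  for (i in 1:N) {
    # dnorm in JAGS is parameterised by precision tau; T(90, 180) truncates
    # each imputed validity to the feasible product window
    validity[i] ~ dnorm(mu, tau) T(90, 180)
  }
  mu  ~ dunif(90, 180)     # weak prior over the feasible window
  tau ~ dgamma(0.01, 0.01) # vague precision prior
}"

dat  <- list(validity = c(120, NA, 150, NA, 95), N = 5)
jm   <- jags.model(textConnection(model_string), data = dat, n.chains = 2)
post <- coda.samples(jm, variable.names = c("mu", "validity"), n.iter = 2000)
summary(post)  # posterior draws for the two missing validity periods
```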

2.8 Chapter summary

The ATOC railway ticketing data are available within the LENNON ticketing database, representing tickets sold on the UK rail sector. Integrating these with other relevant datasets and feeding them through a logistical rail network yields a dataset rich in endogenous and exogenous attributes, a prerequisite for objective imputation (Rubin, 1976). A Bayesian imputation identifies the missing idiosyncratic behaviour of passengers, thereby creating an improved dataset useful for subsequent mobility analysis (Viktor and Kenneth, 2013). The concepts applied in the project and reviewed in this chapter include three methodologies developed to harness railway ticketing consumer data for utilization in mobility analysis. The project stages are summarised in the flowchart of Figure 2.3. In the next chapter, the setting and a brief description of the study area are presented, as well as the data pre-processing. The strategy is presented for identifying the nature of other relevant datasets to combine with the consumer ticketing data. The preliminary data pre-processing stages are presented, as well as a preliminary summary of the ticketing data.

Figure 2.3 Complementary technologies used in project.

Chapter 3 SETTINGS, DATA AND PRE-PROCESSING

This chapter describes the geographic area of the county of West Yorkshire from which the datasets are derived, and briefly overviews the existing transport infrastructure. The chapter contextualizes the consumer ticketing data used for mobility analysis on the railways. The volume and nature (velocity, variety, and veracity) of the consumer ticketing data aptly define it as 'big data', making its use appropriate for developing and validating a framework for the use of such datasets in urban mobility and movement pattern analysis. The terminology used to describe the statistical missing data mechanism in the rail sector dataset is presented in more detail. The pre-processing stages required to unravel the data are detailed, as these facilitate a better understanding of the mechanism through which the missing values occurred. The criteria that enable an assertion of the covariates necessary for objective imputation are developed. Little's test for missing values is presented as the strategy for identifying the relevant covariates: it indicates the variables which, when added to the database, would explain the difference between the observed and missing data values, thereby making the missing values MAR.

The framework presented in this thesis harnesses the railway ticketing data by optimal combination with other relevant datasets. To facilitate combining the datasets used in the research, the geographies of the datasets need to be aligned and reconciled; the re-zonation method adopted for this is presented (Openshaw and Rao, 1995). To facilitate the process of relating the datasets, their covariates need to be associated. When no direct association exists, a relation can often be achieved by creating a new variable by restructuring an existing one (or a combination of others). To illustrate this, the deduction of a useful income variable for the Census using an occupation variable is presented. Then, a preliminary summary of the datasets is presented. A 3-D origin-destination plot gives an indication of the flow volumes associated with each train station, as well as the potential effect of an imputation on these volumes. A classification of the continuous and categorical variables and values is used to preview relationships between disparate data variables and values. Further, a spatial interaction regression model on the complete-case dataset is used to give a preliminary insight into the mobility phenomena.

3.1 West Yorkshire study data

The analysis in this thesis is focused on West Yorkshire (see Figure 1.1), an inland metropolitan county made up of five districts in the north of England, with a population of 2.2 million, covering 2030 km². The largest hub is the city of Leeds, which includes Leeds Bradford International Airport, major railways and three major motorways that traverse the county. Leeds is also a planned location for a terminus or a major spur hub for a proposed high-speed railway development in the UK. The East Coast Main Line is a key rail artery on the eastern side of Great Britain, connecting Leeds to London and, through the Channel Tunnel, to Europe. The West Yorkshire Integrated Transport Authority (WYCA-ITA) is responsible for developing the strategy for the existing integrated transport system in the county, serving about 20 million rail passengers monthly. The methodology developed in this thesis is scalable and generalizable to other rail transport networks within the UK, or in other countries where large consumer revealed datasets and similar background stated choice surveys are available. The datasets combined were the 2011 Census interaction data (ONS, 2013), the National Rail Travel Survey (NRTS) (DfT, 2013a) and 'big data' consisting of railway tickets sold in the West Yorkshire study area. The ticketing data were procured from the Association of Train Operating Companies' (ATOC's) LENNON ticketing database (ATOC, 2003b). The relational table illustrating the association of the attributes in each dataset is shown in Figure 3.1. The LENNON dataset is described as time-series cross-sectional (TS-CS), whereby each ticket sale is recorded within a 4-week period forming a cross-section. Thirteen (13) sequential four-week time periods are defined over each year, and the data are partitioned according to the time period of procurement of individual tickets. Within each time period, the dataset is assumed independent and identically distributed (iid). The independence of the data means that each ticket sale is independent of all others; the identical distribution means that each data point is sourced from the pool or distribution of UK railway passengers. In essence, if a passenger buys a train ticket, it does not affect or influence the ability of another passenger to buy a train ticket, effectively assuming that tickets are procured from an infinitely large pool or picked with replacement. By definition, an iid dataset is exchangeable (but not necessarily vice versa), and as such ticketing data within a period are statistically exchangeable (De Finetti, 1972, Kingman, 1978, Koch et al., 1982).

On this basis, the LENNON ticketing data are re-classified into cross-sectional sets over each 4-week time period, enabling the analysis of each cross-section. This rests on the strong assumption that ticket purchasing patterns within each 4-week period (approximately a month) are stable, without fluctuations inside the period; any seasonal changes are assumed to occur across periods, i.e. on a monthly basis. On the strength of this assumption, the LENNON ticketing data are TS-CS, and within each period (CS) the data are statistically exchangeable, presuming that the mobility phenomena within each period are fundamentally similar (De Finetti, 1972, Gelman et al., 2014a), with nominal changes due to variations in service provision, seasonal variation, and the like. In this thesis, for expedience, a cross-section of the ticketing data is analysed, although the time-series feature of large consumer data is an attractive feature for analysis to identify trends in mobility phenomena. In a wider context, the time-series (TS) feature of novel big data can be accommodated within an exchangeability modelling framework by considering each time-series set as a hierarchy, whereby the mobility phenomena are modelled as a mixture of exchangeable cross-sections. In the literature, a range of models have been developed to accommodate TS-CS datasets within the Bayesian framework, based on autoregressive and non-stationary data models (Beck, 2008, Beck and Katz, 1995, Brunsdon et al., 1996, Fotheringham et al., 2003, Lunn et al., 2012).
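The construction of the cross-sections is mechanically straightforward; the sketch below illustrates the partitioning of ticket sales into thirteen sequential 28-day periods, with the field names and year start date as hypothetical assumptions.

```r
# Sketch: partition ticket sales into thirteen sequential 4-week (28-day)
# cross-sections, so each period can be analysed as an exchangeable
# cross-section. Field names and the year start date are hypothetical.
year_start <- as.Date("2016-01-01")
sales <- data.frame(sale_date = as.Date(c("2016-01-05", "2016-02-10", "2016-11-30")))

days_in      <- as.integer(sales$sale_date - year_start)
sales$period <- pmin(days_in %/% 28 + 1, 13)  # periods 1..13
split(sales, sales$period)                    # one cross-section per period
```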

Another relevant rail sector dataset used in the research is the General Transit Feed Specification (GTFS), detailed transit schedule information sourced from ATOC (ATOC, 2017) (but typically also available from transport agencies (TransitFeeds, 2017)). The GTFS describes network trips and their geographic shapes, routes, stops and stop-times, schedule calendar dates and exceptions, transfers, service frequencies, and feed metadata (Google, 2016). This information facilitates the detailed specification of transit movement (scheduled and in real-time). The variables within the data tables of the GTFS information are represented by the Class Diagram shown in Figure 3.2, with further details available in the literature (Odiari, 2018 (awaiting review)). The GTFS information is published at intervals of between 5 and 9 days, and at the end of each monthly ticketing period or after each fare round (ATOC, 2017).


Figure 3.1 NRTS, Census, and LENNON relational table

Figure 3.2 Class Diagram for the railways GTFS information showing the relational variables.


3.2 Railways statistical missing data

The formalisation of the terminology used to describe the statistical missing data mechanism in the rail sector dataset is presented here. This formalisation puts the missing value concept in a context that can be related to Rubin's work (Rubin, 1976) as applicable to the rail sector ticketing data. Three points forming a corollary to Rubin's postulation, and which are a prerequisite for valid inference on datasets with incomplete values, are: 1) when data are not observed at random, they are not representative of the populace being sampled; 2) the mechanism that leads to missing data has to be distinct from the mobility mechanism being inferred; and 3) additional relevant variables have to be sought and linked to the observed data to explain the difference between the measured and missing values, thereby making the data MAR. The relevance of these in the context of the rail sector ticketing data is presented in the following section.

3.2.1 Missing data mechanism

Season tickets valid for travel between an entry and an exit station may typically be used several times in a day. The passenger's choice of ticket usage would depend on the activity the passenger wants to fulfil, which is related to their socio-economic, demographic, urban morphology, station access and egress, and other network variables. Conceptually, if prior knowledge exists of the unique passenger characteristics (i.e. parameters) θ that are likely to influence (cause or yield) a known distribution of season ticket usage x ∈ X, then it is possible to estimate the ticket usage (journey factor, station entry, exit, etc.) from the given set of parameters. A formal corollary is that the likelihood of a set of parameters θ given a set of outcomes x is, by definition, equal to the probability of the observed outcomes given those parameter values. This is statistically defined as ℒ(θ|x) = P(X = x|θ), the so-called likelihood function, where ℒ stands for the likelihood and P the probability. In the context of ticketing data, the outcome attributes X would be made up of an observed component X_OBS and a missing component X_MIS. In the situation where the observed values of X_OBS capture the entire essence of the system parameters θ, the data are missing at random (MAR). For example, if the choice of rail entry station is solely dependent on access facilities provided at stations, then any omission within the station entry variable will be MAR provided access facilities are observed. If access is not observed, then any omission in the station entry variable would be classed as missing not at random (MNAR), as the missingness is dependent on the unobserved (missing) station access variable.

The names MAR and MNAR are not very intuitive; the reference to randomness (as briefly described in Chapter 2) simply implies that the missing values can be considered distributed randomly, or not randomly, with respect to the observed variables. In the example given in the previous paragraph, for passengers presented with the same access facility, the likelihood of buying a season ticket is random. If there are pertinent considerations made in buying a season ticket (like speed of journey), and that consideration is not observed, the missing values in the other covariates would be considered MNAR (see also Section 2.3.1).

Formally, let the LENNON dataset X be partitioned into an observed part X_OBS and a missing part X_MIS, whereby X = (X_OBS, X_MIS). Let M be defined as the matrix of missingness, with the same dimensions as X, with elements 1 or 0 depending on whether the corresponding elements in X were observed (1) or missing (0). When data are missing, the probability model describing the data is the full joint probability P(X_OBS, X_MIS, M | θ, φ), where φ is the parameter of missingness. Integrating over the missing values X_MIS gives:

\[
\int P(X_{OBS}, X_{MIS}, M \mid \theta, \varphi)\, dX_{MIS}
= \int P(M \mid X_{OBS}, X_{MIS}, \varphi)\, P(X_{OBS}, X_{MIS} \mid \theta)\, dX_{MIS} \tag{1}
\]

Observe the first term on the right-hand side under the integral: it is the distribution of M, that is P(M | X, φ). Assuming that the data are statistically missing at random (MAR), the expression simplifies to P(M | X, φ) = P(M | (X_OBS, X_MIS), φ) = P(M | X_OBS, φ), as the missingness depends only on the observed data (Little and Rubin, 2002, Schafer, 1997). If, in addition to the data being MAR, φ and θ are distinct (this is the case if the mechanism that led to the data not being observed is distinct from the mobility mechanism being inferred), then, with φ independent of the LENNON data parameter θ, the full joint distribution reduces to:

\[
P(X_{OBS}, M \mid \theta, \varphi) = P(M \mid X_{OBS}, \varphi) \int P(X_{OBS}, X_{MIS} \mid \theta)\, dX_{MIS} \tag{2}
\]

\[
P(X_{OBS}, M \mid \theta, \varphi) = P(M \mid X_{OBS}, \varphi)\, P(X_{OBS} \mid \theta) \tag{3}
\]

\[
\therefore\ P(X_{OBS} \mid \theta) \propto \mathcal{L}(\theta \mid X_{OBS}) \tag{4}
\]

The profound inference from the above is that the likelihood of θ given only the observed data is equivalent to the full joint probability. Under such circumstances, inference on θ based on the observed data X_OBS would be valid and objective, provided the data are MAR and the cause of the data being missing is independent of the (mobility) mechanism being inferred. On the railways, journey factors for season tickets are unobserved simply because there are no sensors on the network, and the mere fact that these values are not observed does not affect a passenger's mobility or use of, say, a season ticket. In such circumstances φ and θ are independent.

The implication of Equations (1) to (4) is that the observed components of the rail ticketing data are sufficient to make inference about the drivers of the mobility phenomena, and by implication for use in objective imputation10. In practical applications, all that is required is to ensure that the data are MAR, by adding relevant data variables that explain the difference between the observed and missing values. In the next section, the unravelling of the LENNON data is presented to expose the mechanism of missing data, as a precursor to utilizing Little's test to ascertain the nature of the variables that would make the LENNON ticketing data MAR.

3.2.2 LENNON pre-processing

LENNON consists of discrete non-negative counts of journeys per ticket (see Table 2.1). Each tuple in the LENNON dataset is a sale, whilst the number of ticket issues for that tuple is assumed to be an in-house aggregation of the ticketing data to enhance data anonymization, more especially as the tuple reflects the same ticket product. For the purposes of analysis, the ticket issues associated with each tuple are treated as a unique count (frequency) of passengers. To analyse the process leading to the travel counts, it is necessary to normalise the journeys per ticket by dividing through by the period of validity of the ticket, so that each ticket yields a comparable rate11 of use. This produces the average daily rate of ticket use (journey factor); each ticket then has the same relative exposure (Dalgaard, 2008). Whilst the journey factor is a rate, an exposure (offset) variable is created equal to the period of ticket validity (Gardner et al., 1995). Certain tickets have a range of days of validity as opposed to a specific validity (for example, MTA, STD and MTC/D/F product code tickets are respectively valid for between 30-60, 90-180 and 181-359 days). As such there are bounds on the period of validity of certain tickets, which fall between known ranges. To account for this, the maximum ticket validity period is included as the ticket exposure, and a validity indicator is created (Landerman et al., 1997) to reflect when a ticket has a non-specific range of validity. A validity variable (a rate) is created whereby flexible tickets are considered unobserved, whilst fixed-validity tickets are considered observed. To resolve the season ticket directionality in LENNON, return journeys on season tickets from one origin to another destination (OD pair), say from Leeds to Morley, have outward and inward components created in the database to account for the journey being two-way. This is distinct from current industry practice of deducing the volume of season tickets in each direction by redistributing OD return tickets based on data distributions within NRTS and PTE surveys (Taylor, 2013a). By resolving the directionality of season tickets, the basic mobility processes are effectively captured and reflected in the dataset by treating each leg of a return journey distinctly. (An additional ID is created to identify the association of these return flows, such that the inward part of the journey can only happen sequel to the outward part. Further, the description of the ticket product, for example a day return, would impose a restriction that both legs of the ticket occur on the same day.) Finally, the unit journey cost is derived by normalising the gross revenue receipts (dividing by ticket issues and journeys made). The resulting journey cost is treated as an interaction variable, just like the distance associated with each journey. Tickets sold to grouped stations can have flows to any one of the stations within the group. To account for this phenomenon, augmented flows are created for each possible origin station (and similarly for each possible destination station) within a group station cluster, with weights which are subsequently imputed subject to a probability constraint that only one origin (and one destination) station is finally chosen by the passenger (Horton and Laird, 1999); the sum of each set of augmented weights adds to one. Figure 3.3 details the pre-processing steps applied to LENNON to better reflect the steps in the passenger mobility phenomena.

10 As pointed out in Rubin's work, the proviso is that the mechanism that led to the missing data (i.e. the non-availability of sensors on the rail network to capture every passenger) is distinct from the mobility mechanism being inferred.

11 Predictive mean matching (Landerman et al., 1997) algorithms, which are central to many multiple imputation strategies, would fail if the rate phenomena in mobility are not all individually identified and offset accordingly.
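An illustrative R sketch of the main pre-processing steps is given below; all field names, product codes and the example group station are hypothetical, and the sketch condenses the normalisation, directionality and augmentation logic described above.

```r
# Sketch of the LENNON pre-processing steps described above; field names and
# product codes are hypothetical. Journeys are normalised by validity to a
# daily rate, with an exposure offset kept for later rate modelling.
tix <- data.frame(
  origin = "LDS", dest = "MLY", product = c("SDS", "7DS"),
  journeys = c(2, 40), max_validity = c(1, 7), flexible = c(FALSE, FALSE)
)
tix$exposure       <- tix$max_validity            # offset variable (days)
tix$journey_factor <- tix$journeys / tix$exposure # daily rate of use

# Resolve season-ticket directionality: each O-D return is split into an
# outward and an inward leg, linked by a shared return-pair id
out_leg <- transform(tix, leg = "outward", pair_id = seq_len(nrow(tix)))
in_leg  <- transform(tix, leg = "inward",  pair_id = seq_len(nrow(tix)),
                     origin = dest, dest = origin)
tix_dir <- rbind(out_leg, in_leg)

# Augment flows to a grouped station: one candidate row per member station,
# with weights (to be imputed later) summing to one across the group
group_members <- c("BDI", "BDQ")  # illustrative two-station group
aug <- do.call(rbind, lapply(group_members, function(s)
  transform(tix_dir[tix_dir$dest == "MLY", ], dest = s,
            weight = 1 / length(group_members))))
```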

Figure 3.3 Flowchart for pre-processing LENNON.

PTE tickets were not available to the research project; however, in concept, these PTE Metrocard and Day/Metro Rover tickets are treated similarly to the tickets for flows to grouped stations. Like season tickets, PTE tickets have a cap on the number of stations where they are valid for use, the difference being that PTE tickets tend to have a wider choice set of associated origin/destination stations, and a wider choice of transport modes (feeder bus services as well as trains). Had PTE tickets been available, the mobility phenomena associated with them could be accounted for by creating augmented data within the base LENNON dataset for each of the combinations of possible flows associated with each ticket, as was achieved with group station tickets.

For practical applications, the more relevant variables are observed, the greater the likelihood that the missingness can be ascribed as MAR (Rubin, 1976), enabling an exploitation of the MAR assumptions to proffer strategies for identifying unbiased parameters and for imputing missing values. Despite the non-existence of procedures for inferring conclusively that a dataset is MAR, there are procedures developed in the literature (Little, 1988) which have been found useful in indicating the type of variables that would complement a variable with missing values. Once this complementary relationship is inferred, it is possible to look for additional variables that capture a similar range of phenomena to the complementary variable. This concept is explained here with a practical scenario. In the LENNON dataset, the journey factor is likely to be missing (unknown or unobserved) if the ticket is a season ticket, as illustrated by the struck-through12 current industry journey factor estimates in Table 3.1. A realistic estimate of season ticket daily rates of use (journey factors) ought, for instance, to incorporate additional effects like the length of the journey made with the season ticket or the size of the group associated with the ticket. It can be envisaged that lengthy journeys would be less likely to be made many times in the day, particularly if the group sizes are large; multiple daily journeys can be more readily accomplished if the journey is a short distance. Perhaps also, exogenous socio-demographics like the wealth or income of a passenger and the ticket type (1st class or standard) would influence whether the value of the passenger's time allows for frequent daily train trips. Endogenous (within-network) factors, like the number of changeovers made during a trip or the season of ticket use (i.e. the period of settlement), can likewise affect the daily trip rates. The tendency is that these other covariates that influence the daily rates of ticket use would have a level of correlation with the journey factor variable. Little's test is similar to a t-test, and assesses a measure of the relationship between the covariates in the ticket database. Sets of covariates that relate best with variables with missing values are an indication of the type of variables that would explain the unobserved missing values; such well-correlated variables are best for inclusion to make the ticketing data MAR. Little's missingness test is implemented below on covariates within LENNON.

12 The journey factors that have been struck through correspond to estimates using current industry ad-hoc and logical-rule-based strategies. The current methods adopt dated proxy trip rates for the journey factor, with no consideration for evolving mobility dynamics.

Table 3.1 Current industry journey factor estimates.

3.2.3 Little's MCAR test

Little's MCAR test (Little, 1988) reveals whether unique patterns exist in the missing data, to assess if the probability of missing an observation is dependent on the observed and/or missing values. Significantly large p values (p ≥ 0.05) from Little's test indicate weak evidence against the null hypothesis that the data are MCAR. For the purposes of Little's test, the journey factors in the LENNON dataset are normalised such that all journeys that have fixed observed periods of validity (single tickets or day returns) have a normalised journey factor of 1. Ticket types that have a range of validity are dealt with by assuming that they are fully used at least up until the minimum ticket-type period of validity; thereafter the remaining validity is classed as missing. However, the precise journey factor value of 1 set for tickets with fixed validity periods does not account for the uncertainty and stochasticity of real situations: for instance, some business and leisure passengers use season tickets on average more often than once a day, whilst commuters use them less often than an average of once a day over a 4-week period (HC-700-II, 2006). This unwarranted precision and lack of variability lead to singularity failures of Little's MCAR test. As such, a small normal random effect (μ = 1, σ = 0.05), in line with empirical evidence (SDG, 2011, SDG, 2014), is applied to the normalised journey factors for single and day return tickets, and similarly to the factors for flexible tickets (up until the minimum period of validity).
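A minimal sketch of this jitter-and-test procedure is given below, using the naniar R package's mcar_test() as one available implementation of Little (1988); the variable names and simulated values are hypothetical.

```r
# Sketch of the jitter-plus-test procedure described above; the naniar
# package's mcar_test() is used here as one available implementation of
# Little (1988). Variable names and simulated data are hypothetical.
library(naniar)

set.seed(1)
n  <- 500
jf <- ifelse(runif(n) < 0.3, NA, 1)  # journey factor: 1 if fixed validity
dat <- data.frame(
  journey_factor = jf + rnorm(n, mean = 0, sd = 0.05),  # mu = 1, sigma = 0.05 jitter
  travel_dist    = rlnorm(n, 2, 0.5),
  gross_receipt  = rlnorm(n, 1, 0.4)
)

# Little's test: the null hypothesis is MCAR, so small p-values indicate
# that the missingness pattern relates to the observed covariates
mcar_test(dat)
```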

There are missing data values in the journey factor, ticket validity and group station indicator variables. To reveal any unique patterns in the missing data values within these variables, Little's MCAR test is applied to each pair of a fully observed variable and a variable with missing values. This assesses whether the missing data mechanism is dependent on a variable within the dataset. The plots in Figure 3.4 and Figure 3.5 show the resulting p values (and Chi-Square variation). The missing journey factors, for example, are dependent on the Product Description variable (Chi-Sq. X² = 3.9, p value = 0.05), as shown by the red line in the Little's MCAR plot in Figure 3.4. This reveals that the journey factors can be derived from a random subset of the Product Description. As such, information from variables related to Product Description is best suited to identify the missing journey factors. A survey, for instance, identifying passengers' choices of ticket Product Description (there are about 756 unique products) would be well-suited information for identifying any missing journey factors. In essence, 'Product Description' and variables that are associated with the Product Description are best suited for a journey factor imputation.


Figure 3.4 MCAR test for missing ‘Journey Factors’.


Figure 3.5 MCAR test for missing ‘Ticket Validity’ periods.

The p values (26 ≤ X² ≤ 18216; 0 ≤ p ≤ 1.45e−06) in Figure 3.5 for the ticket validity periods (offset factors) indicate that the identifiable patterns that exist in the missing values within these variables are not significantly related to any variables within the dataset. In such a scenario, additional variables need to be sought and linked in to explain the difference between the missing and observed values within the ticket validity variable. Such additional variables could be deduced from knowledge of the data generation mechanism. The Grouped Station Indicator variable is binary, so t-tests (Aron and Aron, 1994) are better suited to assess its relationship with the quantitative variables (Size of Group, Gross Receipt, Travel Distance, Maximum Validity, etc.), excluding the other categorical variables (Origin, Destination, Ticket Description and Codes, etc.), which may nevertheless also be important in describing the complex mobility phenomena.

The synopsis of results from Little's MCAR test is that the Product Description variable best captures the random idiosyncrasies that relate to a passenger's decision on the number of journeys per ticket per day. As such, the Product Description, as well as additional variables that reveal passenger choice of ticket product description, would be best suited to reveal the missing journey factors within LENNON. On the other hand, the ticket validity period variable reveals a pattern in its missing values; however, Little's test did not identify any other significant variables within LENNON that could potentially be used to impute the missing ticket validity factors. In such a case, an understanding of the data generation mechanism could enable an imputation model to be built (Allison, 2001), or give an indication of appropriate variables that could explain the difference between the observed and missing data values.

3.3 Reconciling datasets

As mentioned in the introduction to this chapter, to facilitate combining disparate datasets, their geographic spatial scales have to be reconciled, and they must have commensurate variables which enable the datasets to be related. The typical data pre-processing that facilitates the reconciliation of the scale and attributes of the datasets is presented in the following sub-sections: the re-zonation process used in this research to reconcile the spatial scale, and the creation of a new income variable within the Census from alternative available variables. These procedures typically form the first geo-data pre-processing stages, as is the case in this research thesis, which develops the framework for harnessing large consumer data.

3.3.1 Boundaries and re-zoning

The geographical boundaries of the UK are complex, multi-layered and non-uniform due to a convoluted evolving history (Byrne, 2000, ONS, 2017). Postcode boundaries were created for the purposes of mail delivery, and have divisions starting from the larger Areas (about 120 of them) covering the UK, down to Districts, Sectors (shown on the right of Figure 3.6 for West Yorkshire), and then smaller Postcode Units containing about 15 addresses. On the other hand, the Census boundaries created by the Office for National Statistics (ONS) consist of layers starting from the smallest, called Output Areas (OAs), created from traditional enumeration districts and made up of about 129 households. The OAs are aggregated to form Lower Layer Super OAs with about 672 households, and then Middle Layer Super OAs (shown on the left of Figure 3.6 for West Yorkshire). Whilst the Census data are spatially scaled to OA boundaries, the NRTS and LENNON ticketing data are scaled to Postcode boundaries13. As such, the geography boundary of the Census (LSOAs) was reconciled with that of the NRTS (Postcode Sectors) (Openshaw and Alvanides, 2001). This enables subsequent association with the Postcode Units of the focal LENNON ticketing data.

13 Within the NRTS, Postcode Unit information exists for each passenger's entry and exit train station, along with home address and final destination Postcode Sector. The Census consists of location of usual residence and place of work at OA levels. The LENNON data contain train station of entry and exit at Postcode Unit levels. The re-zonation aims to reconcile the Census OAs with the NRTS Postcode Sectors, to enable subsequent association with LENNON Postcode Units.

To explain the rationale behind the re-zonation, it is noteworthy that the LENNON ticketing data have information on station entry and exit Postcode Unit, but no information on a representative passenger who might have used the ticket. The NRTS data also have this station entry and exit information at Postcode Unit level. In addition, the NRTS has information on the passenger's home address and final destination at Postcode Sector level. Thus, the NRTS has representative information on the passengers who might have used the tickets, their entry and exit stations, and their home address and final destination. The NRTS also has information on how often a passenger makes a journey in a week, the purpose of the journey, income, etc., thereby enabling passengers to be broadly classified as commuters and non-commuters. The Census interaction data (also called the flow data) have information at OA and LSOA levels on the home address, final work destination, and mode of travel to work for the entire population. Although there are several modes of travel to work14, these can be broadly grouped into commute by train and non-commute by train. As seen from Figure 3.1, there are no variables directly linking the LENNON consumer data to the Census information; the association between the LENNON ticketing data and the Census is established via the NRTS. This is the case because the NRTS has the requisite variables to associate with both LENNON and the Census, as seen from the arrows in Figure 3.1. The passenger home address and final destination within the NRTS can be linked to those within the Census, and the passenger entry and exit stations within the NRTS can be linked to the LENNON ticketing data entry and exit stations. The only issue, however, is that whilst the NRTS and LENNON are published using the Postcode boundaries (Postcode Unit for station entry and exit, and Postcode Sector for home address and final destination), the Census is published at Output Area (OA or LSOA) levels. As such, the NRTS can be directly linked to LENNON (as the boundaries are the same), but a re-zonation is required to link the NRTS (at Postcode levels) to the Census (at OA level). The re-zonation pixelates the Census OA boundaries, and then re-aggregates the pixels to form Postcode Sector boundaries. This forms the basis of linking the NRTS to the Census flow data, and then associating this with the LENNON ticketing consumer data.

14 Underground, metro, light rail, tram, train, bus, minibus or coach, taxi, motorcycle, scooter, moped, driving a car or van, passenger in a car or van, bicycle, on foot, other mode of travel, working mainly from home.

The process of re-zoning the Census data from LSOA boundaries into Postcode Sectors, so as to be comparable with the NRTS (Postcode Sectors), was achieved by creating a regular square fishnet mesh across the region of interest (West Yorkshire), as shown in Figure 3.6, using purpose-developed R-scripts. The variables re-zoned are location of usual residence and place of work by a range of socio-demographic attributes (including age, gender, cars and vans in household, family status, country of birth, hours worked, social grade, economic activity, NS-Sec, industry, and occupation). As the datasets pertain to interaction between origins and destinations, the re-zoning was performed for the origin variables and then repeated for the destination variables. The re-zonation proceeded as follows. In the 2011 Census interaction data, the number of professionals travelling from a usual residence in Middle Super Output Area (MSOA) E02000809 to a place of work in E02000371 is 42; in this category of occupation, the number in the reverse direction, i.e. those with usual residence at E02000371 who work in E02000809, is 29. The re-zonation consists of dividing MSOAs E02000371 and E02000809 into equal-sized fishnet squares, and assigning a commensurate proportion of the occupation category (i.e. professionals) to each square. The finer the fishnet, the more accurate the re-zonation result; as such, the fishnet squares were made 25m in size15 (finer than shown in Figure 3.6). To re-zone the distribution of professionals from MSOAs to Postcode Sectors, the fishnet is applied to the entire study area (West Yorkshire), and the commensurate proportion of professionals in each fishnet square is deduced. The number of professionals in each Postcode Sector is then estimated by aggregating the fishnet weights that fall within the area of each particular Postcode Sector boundary. This process is repeated for the other categories within the socio-demographic variables. For instance, apart from the 'professionals' category within the occupation socio-demographic attribute, there is a category for 'managers, directors, and senior officials'; this occupation category has 12 people with usual residence in E02000809 who work in E02000371, and 6 people with usual residence in E02000371 who work in E02000809.

15 The choice of 25m was deemed adequate for this research: even the unusually small Postcode Sectors in the West Yorkshire area, LS155 and LS52 (~8000 m²), with rectangular shapes of about 50m by 150m, are still adequately resolved by a 25m square fishnet.
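A condensed sketch of the fishnet re-zonation is given below using the sf R package; the shapefile names and the attribute field are hypothetical, and st_interpolate_aw() performs the same area-weighted split-and-reaggregate that the 25m fishnet achieves explicitly (assuming geometries in a metric projection such as the British National Grid).

```r
# Sketch of the fishnet re-zonation using the sf package; shapefile paths and
# the 'professionals' count field are hypothetical assumptions.
library(sf)

msoa    <- st_read("msoa_west_yorkshire.shp")  # source zones with counts
sectors <- st_read("postcode_sectors_wy.shp")  # target zones

# Explicit fishnet, as in the text: a 25m square mesh over the study area
# (cellsize is in the units of the projection, here assumed to be metres)
mesh <- st_make_grid(msoa, cellsize = 25, square = TRUE)

# Area-weighted re-zonation of an extensive variable (a population count):
# each source zone's count is split in proportion to the area of overlap and
# re-aggregated into the target Postcode Sector boundaries
professionals_by_sector <- st_interpolate_aw(
  msoa["professionals"], sectors, extensive = TRUE
)
```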

To minimize the adverse effects of the Modifiable Areal Unit Problem (MAUP)16, geographical weights were applied to the distribution of population attribute proportions according to the location of the Postcode Units. This re-zonation spatially scaled the Census attributes to Postcode boundaries, thereby enabling a direct association of Census variables with commensurate variables in the NRTS, and then LENNON. In practice, the Census interaction data can be structured as many contingency tables, each representing a variable and the categories within the variable. For instance, the occupation socio-demographic variable would consist of nine contingency tables representing the occupation categories (managers, directors and senior officials; professional occupations; associate professional and technical occupations; administrative and secretarial occupations; skilled trades occupations; caring, leisure and other service occupations; sales and customer service occupations; process, plant and machine operatives; and elementary occupations). Each contingency table represents the appropriate flow between the interacting set of MSOAs (Stillwell, 2006).

Figure 3.6 MSOA and Postcode Sector re-zonation in West Yorkshire

16 MAUP represents situations where the result of areal analysis within a geography is dependent on the definition of the geography boundaries. During re-zonation, this phenomenon can be alleviated by using point-based weights associated with Postcode Units, or by offsetting by the size of a geographic area.

3.3.2 Income for Census

The UK Census does not explicitly contain an 'income' variable to enable direct relation with the 'income' variable in the NRTS. Income is important in travel analysis (Jara-Díaz, 1998, Murakami and Young, 1997), so a new 'income' variable was created from the 'occupation' variable within the Census. The choice of 'occupation' was the result of regressing the Census variables 'social grade', 'economic activity', 'NS-Sec', 'industry' and 'occupation' against income from the 2011 ONS Small Area Income Estimates (Henretty, 2011/12). The 'occupation' variable was chosen as it related (i.e. regressed) best with the Small Area Income Estimates, having one of the highest R² values, the lowest Akaike Information Criterion (AIC) (Akaike, 1987), and the most variables with statistically significant p-values. The 'occupation' variable also had an added advantage in having only nine categories requiring reconciliation with the seven NRTS income categories; 'NS-Sec17' and 'industry', which had commensurate R² values, had 15 and 21 variable categories respectively, making them more difficult to reconcile with the NRTS. As shown in Figure 3.7, the personal services and the elementary occupations were combined into one income bracket (income step 2), just as the managers and senior officials were combined with the professional occupations to form the highest income bracket (income step 7). This re-categorised the 9 categories of the occupation variable into 7 categories. Figure 3.8 shows a choropleth of the equivalent 'income' variable derived from the 'occupation' variable within the Census, while Figure 3.9 shows the choropleth for income derived from the commensurate 'income' variable within the ONS Small Area Income Estimates (SAIE). On the strength of the high R² and low AIC of the regression between the Census 'occupation' variable and the SAIE 'income' variable, 'occupation' is chosen as the closest proxy to 'income' out of the available socio-demographic attributes. The strength of the association between the 'occupation' and 'income' variables is buttressed and validated by their choropleth plots (Figure 3.8 and Figure 3.9). On the strength of the above, the 7 occupation categories created from the Census are assumed to be the best proxies to the 7 income categories within the NRTS. The fact that the Census 'occupation' variable and the NRTS 'income' variable both have 7 categories does not in itself mean they correspond; however, the trends of the variables are assumed to be the same as they regress well, although the bands of the categories within both variables may not match exactly. The 'occupation' variable is the best proxy to 'income' from the range of available variables, and this illustrates the nature of the considerations made in reconciling disparate datasets. The basis for expecting a correspondence between 'occupation' and 'income' is that, in the literature (Duncan, 1961, Fuchs, 2004, Soltow, 1960, Wright and Perrone, 1977), incomes decrease in going from managers, directors and senior officials, to professional occupations, associate professional and technical occupations, administrative and secretarial occupations, skilled trades occupations, caring, leisure and other service occupations, sales and customer service occupations, process, plant and machine operatives, and elementary occupations. Similarly, the salaries reported in the NRTS 'income' variable are in bands which progressively decrease. As such, although the bands may not be exact, one can serve as a proxy for the other. Nevertheless, care has to be taken in reporting results from such considerations; in this research, none of the conclusions reported are based in particular on proxy variables.

17 NS-Sec stands for National Statistics Socio-economic Classification, and is the official socio-economic categorisation adopted in the United Kingdom.
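The selection step can be sketched as below: each candidate Census variable (structured as one column per category share) is regressed against the SAIE income estimate, and the fits are compared on R² and AIC. The data frame names are hypothetical.

```r
# Sketch of the proxy-selection step: each candidate Census variable's
# category shares are regressed against the SAIE small-area income estimate
# and compared on R-squared and AIC. 'saie' and the candidate data frames
# (occupation, nssec, industry) are hypothetical, row-aligned by small area.
candidates <- list(occupation = occupation, nssec = nssec, industry = industry)

scores <- t(sapply(candidates, function(X) {
  fit <- lm(saie$income ~ ., data = X)  # income regressed on category shares
  c(r_squared = summary(fit)$r.squared, aic = AIC(fit))
}))
scores[order(scores[, "aic"]), ]  # best proxy: high R-squared, low AIC
```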

Figure 3.7 Categories of ‘Income’ derived from ‘Occupation’ within the 2011 Census.

Figure 3.8 Map of ‘Income’ derived from ‘Occupation’ within the 2011 Census.

Figure 3.9 Map of ‘Income’ from 2011 Small Area Income Estimates (SAIE).

3.4 Preliminary summary of data

A typical data analytics paradigm is to summarise important attributes of a dataset, to perform clustering and classification of the variables and values, and then to carry out predictive analysis by regression. In practical applications this paradigm is implemented as an iterative development cycle, performing such mobility analysis on the available 'complete case' dataset (i.e. the subset of the dataset with all tuples containing any missing values removed). The LENNON ticketing data, without infilling the missing values and linking with other relevant datasets, would have the potential to yield results subject to bias, or at best results that are only indicative of the system coefficients. However, such vague indications derived from a complete-case analysis can be beneficial in deciding the direction of subsequent investigations once the missing data values are eventually imputed: if subsequent analysis results are markedly qualitatively different, a cogent reason can be sought, adding to the body of knowledge on urban mobility. In the preliminary analysis of the complete-case LENNON data presented below, the first data summary consists of an origin-destination (OD) plot of the volumes of passenger flows between the railway stations in the West Yorkshire study area. Then, a visual summary of the relationships between the variables, categories and values within the LENNON dataset is presented using hierarchical classification. The mobility covariates in the rail sector ticketing data are mostly categorical, so there is only a limited range of clustering techniques that can usefully be applied; in this preliminary setting, a variant of the K-mode clustering technique is adopted. This is followed by a basic predictive regression analysis to give an indication of the parameter values. A complete-case analysis of the available data gives an indication (albeit biased) of the relevant parameters of mobility. This will indicate variables that are collinear, such that subsequently, when the relevant explanatory predictors are selected, failures due to multicollinearity in redundant covariates can be avoided (Tu et al., 2005). Variable selection methods for datasets with missing values are limited in the literature, and software implementations are not currently readily available (Garcia et al., 2010). In many practical spatial interaction regression scenarios in the literature (Besag, 1974, Ewing, 1974, Wilson, 1971), the count of passenger groups is set as the dependent response variable, and this value is observed for all ticket sales. The explanatory variables consist of main effects terms (origin and destination) and the interaction distance and cost terms associated with the mobility.
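The complete-case spatial interaction regression takes the familiar Poisson form; a minimal R sketch is given below, with the LENNON field names as hypothetical assumptions.

```r
# Sketch of the complete-case spatial interaction regression described above:
# passenger counts as the Poisson response, origin and destination station
# main effects, and log-distance and log-cost interaction terms. The 'lennon'
# data frame and its field names are hypothetical.
cc <- na.omit(lennon)  # complete cases only: tuples with missing values dropped

fit <- glm(
  Freq ~ factor(origin) + factor(dest) + log(distance) + log(cost),
  family = poisson(link = "log"),
  data   = cc
)
summary(fit)  # indicative (potentially biased) mobility parameters
```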

3.4.1 O-D spatial interaction

A 3D origin-destination summary of the LENNON ticketing data is shown in Figure 3.10. The plots show the volume of passengers travelling from an origin train station to a destination station in West Yorkshire. Apart from not having a full repertory of socio-demographic and network attributes associated with railway passengers, the LENNON ticketing data lack journey factors, ticket validity periods and a precise measure of flows to group stations. Notice the fuller volume of passengers in the upper plot, reflecting values from the current industry infilling of ticket journey factors and assignment of flows to group stations in the LENNON ticketing dataset. The lower plot in Figure 3.10 is indicative of the lower flow volumes predicted when it is assumed, for instance, that all the season tickets have journey factors of unity, and flows to group stations are not assigned. These plots are indicative of the systematic errors that can be introduced when the gaps in ticketing data are not appropriately infilled.

Figure 3.10 3D OD plot (infilled vs missing) for West Yorkshire.

3.4.2 Classification of variables and values

A hierarchical classification was applied to the LENNON tickets to cluster each ticket into a group. As the variables within LENNON are of mixed data types (continuous, ordinal, categorical nominal), the ‘CluMix’ R-software package is applied (Hummel et al., 2017). The particular difficulty in clustering data variables and values of mixed data types is the inability to ascribe a distance between variable values once categorical values are involved. Various strategies have been devised for ascribing a measure of similarity between categorical variables (Galili, 2015, Gower, 1971, Podani, 1999), and some of these are implemented in the ‘CluMix’ R-software package.
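The CluMix internals are not reproduced here, but a minimal sketch of the underlying idea, using the Gower (1971) dissimilarity as implemented in the standard cluster package, might look as follows; the toy data frame and its values are illustrative only.

```r
library(cluster)  # daisy() implements the Gower dissimilarity for mixed types

# Toy mixed-type ticket records (factors and continuous variables)
tickets <- data.frame(
  origin = factor(c("BIY", "LDS", "SON", "LDS")),
  dest   = factor(c("LDS", "KEI", "CRG", "KEI")),
  dist   = c(21.5, 27.0, 44.8, 27.0),   # travel distance
  cost   = c(3.10, 4.20, 6.80, 4.10)    # ticket cost
)

# Gower distance range-scales numeric variables and simple-matches factors
d <- daisy(tickets, metric = "gower")

# Agglomerative hierarchical classification of the tuples
plot(hclust(d), main = "Hierarchical classification of tickets")
```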

The results of the classification are shown in Figure 3.11, which illustrates the relationship between the mixed types of categorical18 and continuous19 variables within the dataset. A mixed set of variables that are established in the literature as drivers of mobility interaction have been chosen (i.e. origin, destination, frequency count, travel distance and cost). The dendrogram on the left of the plot shows the relationship between the dependent (‘Freq’) and independent covariates, whilst the dendrogram on the top shows the relationship between the tuples of ticketing values. The heat map enables relationships to be visualised between the hierarchical values in the tree dendrogram and the variables of mobility, and enables a distinction between the main effects and the interaction effects. An inspection of Figure 3.11 reveals a range of preliminary insights into the mobility interaction associated with the railway network in the West Yorkshire study area. Expectedly, lower costs are in general associated with lower travel distances. There are some unusually high-distance journeys associated with low costs. These are mainly observed in a select number of flows originating from BIY (Bingley) and destined for LDS (Leeds). Similar observations are seen in a group of flows originating in LDS (Leeds) and destined for KEI (Keighley). The dendrogram and heat map hierarchical classification tool can be used to investigate the data values further, for instance to study fare-to-distance ratios across all origins and destinations in the study area. Volumes of flows are visually revealed by the area coverage of each colour, and these volumes can be associated with train stations (based on the station colour codes provided). Volumes of particularly long-distance journeys are those associated with flows between SON (Steeton & Silsden) and CRG (Cross Gates), and between LDS (Leeds) and KEI (Keighley). The hierarchical classification map can also be used to good effect in investigating outliers in the data and the range of values associated with each variable. A limitation of the hierarchical heat map is perhaps the limited variety in the range of colours available as the number of train stations increases, such that the distinction between colours becomes blurred. The heat map in Figure 3.11 shows results for 21 railway stations.

18 The categorical variables in Figure 3.10 are the origin stations (O-Code) and the destination stations (D-Code). This preliminary summary includes only two categorical variables to start the process of visualising the mobility dynamics. However, a range of additional categorical variables can be included in a systematic study.

19 The continuous variables investigated in this instance are the travel distance and travel cost. Further continuous variables can be included for a systematic study of the relationship between the covariates.

Figure 3.11 Dendrogram of LENNON covariates vs values.

3.4.3 Covariate regression

Once the ticketing data are unravelled, exposing the different stages and aspects of the mobility, it is possible to conduct a preliminary exploration of the relationship between the variables of mobility. The gross receipts and travel distance would be influenced by the size of the group and the period of validity of the ticket, creating an interaction depicted below:

$$\text{Gross Receipt} \;\propto\; (\text{Ticket Validity}) \times (\text{Size of Group}) \times (\text{Travel Distance})$$

Figure 3.12 illustrates this interaction by relating the gross receipts (log) to the product of ticket validity, size of group and travel distance (log). The offsets between standard (grey), short-term season (blue) and long-term season (red) tickets (with distributions shown in Figure 3.13) are influenced by missing information on the rates of season ticket use within the day and on the lengths of use of flexible period tickets. The systematic vertical shift (offset) in the values would typically be resolved by normalising the gross receipt over the journey factors and period of validity of the tickets (for the different ticket types: long-term season, short-term season and standard tickets), had such information been available. The scatter observed in each of the colour strands is indicative of heterogeneity in mobility over the West Yorkshire county in the 4-week period.

Figure 3.12 Receipts vs Distance, Ticket Validity, and Group Size.

Figure 3.13 Probability density plots for the colour-coded ticket points.


A classic predictive regression paradigm for investigating mobility flows of passengers is the spatial interaction model. The histograms in Figure 3.13 are distributions of counts of tickets and are typical of Normal mixtures, Poisson and negative Binomial counts. The homoscedasticity and linear trend of the variables indicate that a Poisson distribution model for a count outcome variable would be as robust as a negative Binomial model (typically used to account for non-uniform variance across variables in the dataset). Following on from the visual OD matrix shown in Figure 3.10 and the preliminary variable investigation shown in Figure 3.12 and Figure 3.13, a Poisson regression model can be used to model the counts of passengers interacting in mobility between the train stations in West Yorkshire. The explanatory variables consist of main effects for passenger origins and destinations, while the interaction effect is captured by the distances between the centroids of the zones. The results of such a model follow on from work widely available in the literature (Batty and Mackie, 1972, Birkin et al., 2010, Dennett, 2012, Fotheringham, 1983, Stillwell, 1978, Wilson, 1971), and yield the coefficients listed in Table 3.2. As is typical of spatial interaction models, the distance parameter is negative, reflecting the traditional distance decay.
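In R, such a model can be fitted with the base glm() function. The sketch below uses a toy OD data frame with assumed column names, and is not the exact specification behind Table 3.2.

```r
# Toy OD table (illustrative values only; not the LENNON extract)
od <- expand.grid(origin = c("BIY", "LDS", "SON"),
                  dest   = c("KEI", "CRG", "HFX"))
od$dist <- c(9.8, 27.0, 36.0, 30.1, 8.2, 18.4, 12.6, 40.2, 25.3)  # km
od$cost <- c(2.4, 4.2, 5.9, 4.6, 2.1, 3.3, 2.8, 6.3, 4.4)         # GBP
od$Freq <- c(130, 310, 45, 290, 510, 120, 60, 75, 210)            # counts

# Poisson spatial interaction model: origin and destination main effects
# plus log-distance and log-cost interaction terms (cf. Wilson, 1971)
fit <- glm(Freq ~ origin + dest + log(dist) + log(cost),
           family = poisson(link = "log"), data = od)
summary(fit)  # a negative log(dist) coefficient reflects distance decay
```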

Table 3.2 Poisson regression results for count of passengers interacting between train station OD pairs.

Covariates (Independent)    Coefficients (partial)
Intercept                    3.688 *** (0.0868)
Travel Distance             -0.488 *** (0.0015)
Travel Cost                  3.152 *** (0.0021)
R-squared                    0.96
No. of observations          197,313

Notes: Standard errors are reported in parentheses. Significance at the 90%, 95%, 99% and ~100% levels is indicated by *, **, *** and **** respectively.


3.5 Remarks

It is noteworthy that the preliminary data investigation presented in this chapter is not exhaustive. There is a plethora of other data-point grouping algorithms, including but not limited to discriminant analysis, K-means, K-modes, Neural Networks, Random Forests, naïve Bayes and Support Vector Machines. Typically, data grouping methods that require a training dataset are termed classification, and otherwise clustering. The algorithms also differ in the definition of the group data centre, the distance between groups, and the iteration procedure. The preliminary investigations in this chapter illustrate an application suitable for datasets that contain a variety of data types.

A slightly more elaborate predictive analytics approach for a preliminary investigation of the LENNON ticketing data would be to conduct a hierarchy of logical tests on the covariates using regression trees (Breiman et al., 1984). The trees enable the target variables to take on sets of values, indicating only the relevant covariates in the mobility model, thus forming a basis for prediction (Therneau et al., 2017). Regression trees are not explored further in this section, as any results would only be indicative given the missing values in the covariates. Furthermore, the methodology developed in this thesis for spatial analysis is based on the Bayesian modelling framework. In discussing the pre-processing stages in this chapter, the data cleaning aspects, identification of outliers, visualisation of the ranges of values within each covariate, and correlation between the variables have not been emphasised, for various reasons. The less rudimentary cleaning stages are absorbed within the process of unravelling the consumer ticketing data. Outliers are treated differently in the Bayesian framework adopted for spatial analysis in this thesis: they are readily accommodated as a genuine aspect of the data being modelled, by introducing priors, for instance t-distributions with fat tails, instead of the conventional normal distributions. This chapter introduced the West Yorkshire study area and the associated consumer data. The gaps in the consumer data were discussed, and the formalisation of the concepts that enable the plugging of the data gaps was presented. The following chapters in Part 2 of the thesis present the set of three complementary methodologies to harness the consumer data. Once the data are harnessed, they become useful for objective mobility analysis.


PART 2

This part presents the core aspects of the thesis. In this part, a set of powerful concerted complementary methodologies is developed, which together form the framework that enables big consumer data to be harnessed for use in urban mobility and movement patterns analysis.

Chapter 4 SPATIAL MICROSIMULATION

This chapter presents spatial microsimulation, the first of three concerted methodologies used to harness consumer datasets. Consumer data are typically sensed for a particular purpose, and do not have the broad spectrum of attributes needed for wider application. Consumer data are also not like traditional measured stated preference surveys of a representative wider population; instead, they are revealed preferences from consumers of a particular service, thereby forming a skewed subset of the wider population. Further, apart from unobserved attributes, consumer data tend to have missing values, as the sensors do not always capture the full context of the data generation process. Finally, certain idiosyncratic behaviours of passengers (like daily rates of season ticket use, and the particular choice of exit station when the ticket is valid for travel to a group of stations) are typically unobserved. These issues with novel consumer datasets form the gaps in the data, and these gaps have to be plugged to create a harnessed and requisite dataset for mobility analysis. The aim of this chapter is to address the first set of gaps in consumer data, i.e. to increase the spectrum of data attributes for the case when the consumer data (like the ticketing data) are a skewed subset of the wider population (like the Census). The objective is to validate current spatial microsimulation methods for application to typical consumer datasets. To achieve this, experiments are conducted to ascertain the behaviour of a range of spatial microsimulation methods when the input data suffer from the issues associated with typical consumer data.

4.1 Literature on spatial microsimulation

In the absence of an attribute-rich, comprehensive and representative population, a simulated population is necessary for the analysis of mobility on the railways. Consumer datasets tend to have comprehensive specific coverage, but are not representative of the entire population. The heterogeneity in these datasets can be better harnessed by integrating them with other relevant measured stated preference surveys, which are designed to be random samples representative of the population. In this chapter, both deterministic and stochastic spatial microsimulation strategies are discussed for combining various datasets, creating a representative micro-mobility population of railway passengers embedded in the wider population within a geographic region (West Yorkshire, UK). As discussed in Chapter 2, a deterministic methodology particularly suited to adjusting skewed rail-mobility consumer data is developed, combining information on all rail tickets sold in the UK with the 2011 Census commute-to-work data and the National Rail Travel Survey (NRTS), yielding a representative micro-level population. The deterministic strategy using multi-dimensional iterative proportional fitting (m-IPF) is presented in a practice-oriented way to enable reproduction of the methodology, and to highlight nuances, precautions, and the associated advantages of the methodology. The simulated population created includes weights which represent the probability density of the railway passenger population. As set forth in Chapters 1 and 2, the skewed nature of consumer data poses a challenge in integrating them with other relevant datasets from measured stated preference surveys designed to be random samples representative of the population. Traditional spatial microsimulation strategies tend to assume that the disparity between the datasets being combined is normally distributed (Cox and Snell, 1968, Robert, 2004). As such, this chapter validates the range of established spatial microsimulation strategies for use in adjusting skewed ‘big data’20 for consistency with survey data and established theory.

Early Census exigencies (see Section 2.4) meant that despite complete counts of the populace, only a small sample of anonymised records (SAR) of cross-tabulations of all individual attributes was released, alongside comprehensive sets of aggregate tables typically limited to only three variables per table. Despite the anonymised data and aggregation in respect of individual privacy, population geographers realised that comprehensive and disaggregated records would enable the construction of more detailed pictures of geographic phenomena in nature. The need to create such micro-data led geographers to develop the field of study of spatial microsimulation (Birkin and Clarke, 1988). The first application of spatial microsimulation (population synthesis) is widely attributed to Deming and Stephan (1940), who applied the method to the USA Census. A synthetic population was formed by estimating an optimal combination of multiples of individuals in a sample of anonymised records that would best fit the aggregate values defined in Census tables, using the Lagrange multiplier method (Bertsekas, 2014).

20 Big data is a term coined to describe the range of novel large consumer datasets that are increasingly readily available from sensors of consumers of digital services. Refer to Page 4 for another definition.

The Lagrange multiplier was developed in the 18th century (Clarke, 1990, Gay, 1966, Suzuki, 2005) to solve optimisation problems encompassed by a constraint. The conditions imposed on the objective function requiring optimisation can take a number of forms, including least squares, maximum likelihood and chi-squares (Deming and Stephan, 1940). As discussed in Chapter 2, with the development of computers and numerical analysis, instead of solving the linear equations derived from the objective function, iterative methods gradually converging towards an optimum were proposed (Fletcher, 2013, Kelley, 1999), and it became increasingly more efficient to resort to these iterative proportional fitting (IPF) strategies. In population geography and demography, IPF came to prominence relatively recently with a range of policy-relevant applications and solutions: Birkin and Clarke (1989) used IPF to simulate individual and household incomes at small area levels, Rees (1994) used IPF to project the age and gender structure of urban areas, Ballas et al. (2007) addressed in detail the use of spatial microsimulation as a framework for decision support for policy analysis, and Ballas and Clarke (2001) assessed the impact of aggregate national policies within segregate local communes. The IPF methods have improved with burgeoning use, and while earlier effort concentrated on developing the microsimulation steps on different platforms, the advent of more capable computers and suites of statistical software packages like R and R-Studio (RStudioTeam, 2016) has enabled an automation of the processes. Effort then shifted to concerns of numerical stability and propagation of errors in IPF methods (Birkin and Clarke, 1995, Wong, 1992). Further concerns relate to converting fractional values for simulated individuals to integer counts of whole individuals, and to developing strategies for internally and externally validating IPF results (Lovelace and Ballas, 2013, Upton, 1985). Spatial microsimulation is now considered a mature application (Lomax and Norman, 2016), prompting a re-visiting of questions concerning the reliability of, and confidence in, the unconstrained attributes reproduced by the IPF process (Birkin and Clarke, 2011): how accurately are the distributions of those variables not included in the spatial microsimulation replicated? Further unanswered questions concern whether a large number of benchmark constraining variables generally translates to better microsimulation results (Smith et al., 2009), the effect of the quality of the seed sample, and the sensitivity of the different spatial microsimulation strategies to sample ratios (Tanton, 2014). These are some of the questions answered in this chapter in addressing the practicality of spatial microsimulation methods.

Apart from deterministic (IPF) spatial microsimulation strategies, which yield the same result on repeat, the alternative strategies are stochastic, typically Markov chain Monte Carlo (MCMC) based methods (Metropolis et al., 1953). The MCMC algorithms are an efficient means of converging to solutions of intractable complex problems, such as constrained optimisation problems of the spatial microsimulation class (Asmussen and Glynn, 2007, Brooks et al., 2011). Population geographers have developed a range of MCMC variants to complement traditional deterministic IPF. Some of these methods (already discussed in Section 2.4.2) have been branded simulated annealing and hill climbing, and have been compared with the IPF method (Harland et al., 2012, Kurban et al., 2011). Another MCMC-based strategy, the genetic algorithm, was used to reconcile observed and estimated network traffic flows (Dimitriou et al., 2006). A further stochastic strategy for spatial microsimulation is synthetic reconstruction (Birkin and Clarke, 1988, Birkin et al., 2006), which used the Sample of Anonymised Records (SARs) from the Office for National Statistics (ONS). More recently, further sets of MCMC-based population synthesis methods have been developed in the transportation sector: one is the simulation-based strategy which adopts Gibbs sampling (Farooq et al., 2013), and another adopts generative methods based on rejection sampling (Sun and Erath, 2015).

As mentioned earlier (in Chapter 1 and Chapter 2), novel consumer datasets, the so-called big data, tend to be a skewed subset of the population. As a result of this skew, the assumption of normally distributed errors between the target21 population (also referred to as a posterior) and the seed data (also referred to as a proposal22 distribution) no longer holds (Deming and Stephan, 1940, Rubin, 1976). As big data tend to suffer sample bias (Kitchin, 2014), methodologies which assume normal errors between datasets need to be reviewed prior to application to these novel consumer datasets. Consequently, spatial microsimulation strategies traditionally set up for representative population seed samples need to be validated for application to skewed consumer seed data. For the deterministic IPF, the promise in application to skewed data lies in establishing the behaviour of the iterations that define the procedure. For the stochastic procedures, typically based on variants of the MCMC (Richey, 2010, Rosenthal, 2011), the promise lies in the ability of the sampling and optimisation chains to converge to the target population, given the skewed seed. In this chapter, the behaviour of deterministic and stochastic spatial microsimulation procedures is investigated, to understand the nature of the methodologies in anticipation of application to the railway consumer data from the West Yorkshire study area.

21 Target distribution refers to the distribution of the population being sought. The posterior distribution is the result of a stochastic simulation; the aim is that the posterior distribution converges to the target distribution.

22 The proposal distribution is the seed sample which is replicated to simulate a representative population.

4.2 The Lagrange multiplier

Spatial microsimulation methods can be described as ways of solving particular optimisation problems, and the Lagrange multiplier is one such deterministic method (Sun and Yuan, 2006). In spatial microsimulation the typical aim is to create the best possible (i.e. optimal) synthetic representative population of a geographic zone, limited (i.e. constrained) such that the aggregate volume and characteristics of the zone are fulfilled. The classic solution strategy for constrained optimisation problems has been to use calculus (Bertsekas, 2014, Suzuki, 2005). In scenarios with multiple variables, partial derivatives enable us to assess the effect of a change in one variable when all the others are kept constant, yielding the total effect of partial changes in each variable. The optimal solution is thus inferred when the total effect of changes in the variables is tangent to the constraint of the problem (Bertsekas, 2014). The premise of the Lagrange multiplier is that the objective function (the function being optimised) is tangent to the constraint at the optimum. The gradient vector of the objective at that point is collinear with that of the constraint, and equivalent subject to a proportionality constant λ (where λ is defined as the Lagrange multiplier). This is illustrated in Figure 4.1: at the constrained optimum, the objective function and the constraint function are tangent, and the vector gradients of the functions (represented by the red and blue arrows respectively) are collinear and proportional, with proportionality constant λ, the Lagrange multiplier. The results in Equations 5a and 5b (embedded in Figure 4.1) are solved for λ, yielding the other optimisation parameters (Bertsekas, 2014, Deming and Stephan, 1940, Everett III, 1963). The Lagrange multiplier method is discussed here for historical reasons but, as Section 4.5.1 will discuss, is intractable in situations with thousands of individuals (as is the case in this study).
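For reference, the stationarity condition illustrated in Figure 4.1 can be stated compactly as follows (a generic sketch for a single equality constraint g(x) = c; Equations 5a and 5b in the figure are its component-wise form):

$$\nabla f(\mathbf{x}^{*}) \;=\; \lambda\,\nabla g(\mathbf{x}^{*}), \qquad g(\mathbf{x}^{*}) = c$$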

Figure 4.1 Constrained optimisation using Lagrange Multiplier.

4.3 Deterministic and stochastic strategies

The methods used for spatial microsimulation can be broadly classed as deterministic or stochastic strategies. The particular name given to a deterministic method depends on the description of the objective function: least squares, chi-squares difference and maximum normal likelihood all refer to conditions imposed on the objective function. Another method based on the Lagrange multiplier is entropy maximisation, a solution concept derived from the thermodynamics discipline but with wide context and out-of-discipline application (Batty and Mackie, 1972, Wilson, 2010). The entropy maximisation modelling concept is premised on the maximum information being derived by models with the least amount of pre-conceptions about the solution. A general constraint imposed on the outcome of these models yields an exponential power-law functional form which has wide application in modelling many phenomena in science (Shore and Johnson, 1980, Wilson, 1969). In preference to using the Lagrange method and solving the normal expressions for the objective functions, the iterative proportional fitting (IPF) algorithm is typically used in practice, and has been shown to be an algebraically simple, computationally fast routine which is also robust to input errors (Bishop et al., 2007, Deming and Stephan, 1940, Fienberg, 1970).
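As an illustration of that exponential functional form, Wilson's doubly constrained entropy-maximising spatial interaction model (Wilson, 1971) can be written as below; this is quoted as a standard reference form rather than a model estimated in this chapter:

$$T_{ij} = A_i \, O_i \, B_j \, D_j \, \exp(-\beta c_{ij})$$

where $T_{ij}$ is the flow between origin $i$ and destination $j$, $O_i$ and $D_j$ are the origin and destination totals, $A_i$ and $B_j$ are balancing factors, $c_{ij}$ is the travel cost, and $\beta$ is the cost-decay parameter.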

In this chapter, the IPF23 strategies are explored for application to deterministic spatial microsimulation, in a bid to identify the behaviour of the different solution strategies for subsequent application to simulating a micro-population of railway passengers. The alternatives to the deterministic strategies are the stochastic ones, which have been called combinatorial optimisation and reweighting strategies. Strictly, however, the deterministic (IPF) strategies also produce weights for the individual seed (Lovelace and Ballas, 2013); the contrast is perhaps in the integer weights produced by stochastic strategies and the fractional weights produced by deterministic strategies. In optimisation theory (Bertsimas, 1988, Mühlenbein et al., 1988), all problems with discrete variables are categorised as combinatorial optimisation, so strictly both the deterministic and stochastic strategies can be classed as combinatorial optimisation as well as re-weighting strategies. Here, deterministic and stochastic microsimulation strategies are the preferred nomenclature, rather than re-weighting or combinatorial optimisation. As distinct from deterministic strategies, which iteratively improve the entire seed sample, the stochastic strategies are typically variants of the MCMC strategy, and involve two stages: an exploratory stage and an optimisation stage. The exploratory stage involves randomly or systematically searching the range of values within the seed sample to find potential replacement samples that may improve a proposal density, and the optimisation stage decides on the implementation of any improvements. Two sampling strategies are predominant in the MCMC methods, the Metropolis-Hastings and the Gibbs sampler, used to conceive ways of effectively sampling from the target distribution (also called the posterior). In Metropolis-Hastings sampling (from which the simulated annealing and hill climbing strategies are derived (Kavroudakis, 2015, Williamson et al., 1993)), proposal seed samples are taken and recursively improved by repeated random samples, each time comparing with the target and deciding on a criterion for acceptance. The Gibbs sampler is distinct as it is premised on being able to sample the full joint posterior distribution (Chib, 1995, Farooq and Bierlaire, 2017). In practical applications where the Gibbs sampler is used for spatial microsimulation, a parametric model of the system being simulated is created, combining variables from the seed sample with partial views of the full joint posterior within the target cross tables. The parameters of this model are iteratively and sequentially improved, conditional on the last set of improvements, until convergence. The converged parameter values are used as a basis for estimating posterior predictive distributions which form the simulated population (Farooq and Bierlaire, 2017). In this section, the literature review on the range of deterministic and stochastic spatial microsimulation methods has been presented. This review serves as a precursor to the concepts and procedures of the spatial microsimulation methodologies, which are presented next.

23 The notable shortcoming of IPF is that it does not converge if there are non-common marginal values or non-representative seed elements for the benchmark constraints (Saito, 1992, Barthelemy and Suesse, 2016). These are issues explored in this chapter.

4.4 Methodology for spatial-microsimulation

The methodology for deterministic and stochastic spatial microsimulation is presented below. Concepts applicable to both strategies are first presented, and then the particular procedures adopted for the deterministic and stochastic methods are presented separately. The IPF strategy is the more popular of the deterministic spatial microsimulation strategies (Tanton, 2014), and has evolved over the years into a multi-dimensional application better suited to combining multi-facetted datasets (Barthelemy et al., 2016). Other deterministic strategies based on the Lagrange multiplier method include the least squares, maximum likelihood and chi-squares methods (Deming and Stephan, 1940). A detailed exposition of deterministic spatial microsimulation methods can be found in the literature (Tanton and Edwards, 2012). There are four generally known MCMC-based stochastic strategies, namely hill climbing, simulated annealing, the genetic algorithm, and simulation-based population synthesis: the hill climbing (HC) method is akin to MCMC rejection sampling (Gilks et al., 1995); simulated annealing (SA) is akin to the MCMC Metropolis-Hastings algorithm (Chib and Greenberg, 1995); the genetic algorithm (GA) mimics the optimal evolution process of nature (Davis and Principe, 1993); and simulation-based population synthesis (S-B) is based on the Gibbs sampler (Farooq and Bierlaire, 2017). These deterministic and stochastic methods are briefly discussed and subsequently assessed for application to skewed datasets typical of the NRTS and LENNON ticketing data from the rail sector. More detailed treatments of the HC and SA algorithms tailored to population geography are available in the literature (Ballas et al., 2005, Williamson et al., 1998).

4.4.1 Concepts behind methods

The concept of spatial microsimulation is embodied in the illustration presented in Figure 4.2. In practical applications in population and transport geography, the right-hand table depicts a representative population sample, or a subset of the population sample, listing a number of individuals in the rows (tuples) and the attributes of these individuals within the column fields. In the literature, this table is referred to as the seed, the sample, the individual-level data, or the survey. The left rectangular slab represents the structural form of zonal population attributes, usually aggregated and cross-tabulated. The dimensions of the aggregate structure could represent a zonal attribute and the categories therein. The vertical grey pillar (on its own), for example, represents the ‘Residence’ variable and the geographic zone categories therein; if values were included in each cube making up the vertical grey pillar, these would be the populations associated with each residential zone. A similar description follows for the horizontal brown pillar, which (on its own) represents the ‘Destination’ attribute and the zone categories that make up the destinations. The vertical pillar represents a one-dimensional (1D) aggregate constraint, just as the horizontal pillar also forms a 1D aggregate constraint. If the vertical or horizontal pillars are further sliced length-wise, they become 2D constraints, with the second dimension representing, say, the age variable, sliced into its categories (16-24yrs, 25-34yrs etc.). The slabs in Figure 4.2 form 2D constraints, and in this instance the blue slab in front represents an aggregate array cross-tabulation of ‘Residence’ versus ‘Destination’. The blue slab is further sliced along the vertical axis, creating the third ‘Age’ variable, with categories therein; in such an instance, the aggregate constraint would be a 3D array. The categories making up these ‘Residence’, ‘Destination’ and ‘Age’ variables are illustrated in the blown-up section (highlighted in the oval). As seen in the blown-up section, for passengers residing in Postcode WF110 and working at destinations in Postcode BD88, there are populations of 8, 12, 5 and 13 of ages 16-24yrs, 25-34yrs, 35-64yrs and over 65yrs respectively. The subsequent slabs coloured green, pink, purple, etc. also represent 3D aggregate constraints. As seen, the variables involved in each slab are ‘Residence’, ‘Destination’, and another demographic attribute, and each of the variables is further sub-divided into categories (typical of the blown-up ‘Age’ variable in the circle). The 2011 Census interaction data consist of several such 3D arrays (like those in Figure 4.2), made up of aggregates for location of usual residence and place of work for a range of socio-demographic attributes (age, gender, income, mode of commute etc.). Zonal population aggregates of cross-table data are usually referred to simply as the aggregate, marginal, constraint, count, target, posterior distribution or Census (as the structural form is typical of published Census counts). Typically in practice, the data available to population geographers consist of aggregate and seed datasets, and spatial microsimulation enables the combination of these to yield a representative individual-level dataset. If the microsimulation process is successful, then the representative micro-level population created inherits the attributes of the aggregate and the seed.

Figure 4.2 Data structures used in spatial microsimulation.

In practice, while the right-hand individual-level table in Figure 4.2 represents the only structure permissible for the seed data, the left side represents the range of marginal data structures for the aggregate data. In essence, the constraints can be any combination of 1D, 2D, 3D, up to nD arrays (0 ≤ n ≤ ∞). This effectively implies that the constraint could be made up of several differently multi-dimensional arrays; these arrays are effectively partial views of the full joint distribution of the variables. The 2011 Census interaction data imply a marginal (Upton, 1985) constraint made up of eight (8) sets of 3D slabs: a slab each for the age, gender, quasi-income, ethnicity, commute mode, household-type, household-cars and household-children variables, cross-tabulated against the residence and destination variables and the categories therein. Although the seed data are limited to the tabular depiction, a D-column table (0 ≤ D ≤ ∞) is in fact a D-dimensional structured array.

The intuition behind creating a detailed micro-level population from an individual-level seed and zonal aggregates of the population is as follows: if a zone consists of an aggregate of say 20 people with particular zonal characteristics, for instance that they are mostly aged and on high incomes, then, provided a seed sample representative of the entire region is available, the zone can be reconstituted from the sample by making an optimised selection of aged people on high incomes from the sample. To fulfil the constraint of sustaining the volume of people in the zone, the limited available aged and high-income individuals in the sample might have to be replicated many times; hence the concept of weights, which are indicative of how many times a type of individual is replicated to fulfil the synthetic population. In the case of mobility interaction, the weight assigned to an individual would be indicative of how representative the individual is of passengers on a particular residence-destination flow. When the seed sample used in spatial microsimulation is skewed and not representative of the wider target population, the weight simulated for an individual also reflects how representatively that individual was captured in the sample seed. If, for example, the population consists of 20 people, 10 aged and 10 young, and the sample seed is made up of only 1 young individual and 10 aged, then the weight assigned to the young individual would be 10, whilst that assigned to each aged individual would be 1. Such a result would not be reflective of the young individual being ten-fold representative of the zone; it simply implies that the commensurate proportion of young people is missing from the seed by a ten-fold factor. The weight as such is indicative of a combination of how representative an individual is of a zone, and the relative proportion and representativeness of that individual in the seed.
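As a toy check of this intuition, the replication weights reduce to a ratio of target to seed counts (illustrative numbers from the example above):

```r
# Zonal target composition vs counts captured in the seed sample
target <- c(young = 10, aged = 10)
seed   <- c(young = 1,  aged = 10)

target / seed  # replication weights: young = 10, aged = 1
```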

4.4.2 Procedures of the methods

The difference between the alternative spatial microsimulation procedures lies in the numerical algorithm or probabilistic method adopted for estimating or calculating the weights assigned to individuals in the seed data. Typically, the seed sample consists of a broad range of attributes for each individual, forming a rich joint distribution of variables which is limited only by being a sample instead of the whole population. On the other hand, the aggregate data (like the Census) are anonymised for confidentiality, and the zonal attributes are aggregated to form cross-tabulations. This cross-tabulation creates the challenge of relating individuals within, say, an age category to the same individuals within the gender, income and other demographic attribute categories of the seed sample. The solution is to optimise the choice of individuals to minimise the difference between the marginal totals of the aggregate data and the synthesised population; in other words, to satisfy the condition that the micro-population created from the seed has margins adjusted and constrained to the aggregate margins. In such a case the residual between the aggregate and the synthetic population is minimised. Figure 4.3 is used to illustrate the similarities between the deterministic and stochastic procedures.

Figure 4.3 Re-modelling in spatial microsimulation procedures.

An explanation of the IPF stages is detailed in the literature (Lovelace and Dumont, 2016). With reference to Figure 4.3, the seed data in deterministic methods have a structure typical of the middle-left table in Figure 4.3. This seed data table is remodelled into a binary table (bottom-left table) such that it has the same variables and variable categories as the target table (shown as the upper-left table); the result is that the seed table is re-structured to produce the lower-left table in Figure 4.3. There are several software algorithms in the literature suited to performing this restructuring (Dowle et al., 2017). As a result of the re-structuring, the individual-level seed and aggregate tables now have the same column headings, and this forms the basis for reconciling the two tables (the seed and the target). The reconciliation yields the 3D array on the right of Figure 4.3. Following this, considering one zone at a time, the sample seed is iteratively multiplied to fulfil the attributes of each variable category in each zone. This process is iteratively repeated for each variable category, within each variable in turn, within each geographic zone, until the results do not change substantially, yielding convergence. The criterion for convergence is applied to the objective function, and is set up such that the difference between the distribution of the resultant simulated population and the target population (typically the Census) is minimal, and hence optimised. The proposal distribution used in the deterministic procedure consists of the entire seed sample, which is simply replicated iteratively to fulfil the target distributions for each category of variable and within each geographic zone. The deterministic procedure is conceptually different from the stochastic procedures in that, in the deterministic process of replicating the passengers to fulfil the target distribution, fractions of individuals are used to achieve a more accurate solution. The mechanism of deterministic procedures is to specify a proposal population from all of the seed, and to multiply these by weights to replicate the distribution of the target. The proposal distribution is iteratively improved subject to the objective function, until convergence to the target distribution. The ability to select fractional individuals ensures that non-absorbing regular iterative transitions are created at each iteration step (Bajram Spaseski and Ler, 2013), ensuring convergence to the stationary target distribution. Widely used deterministic strategies are reported in the literature (Barthelemy et al., 2016, Lovelace and Dumont, 2016).
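A minimal base-R sketch of this iterative fitting loop, for a toy seed and two 1D marginals, is shown below; production implementations such as the mipfp R package (Barthelemy and Suesse, 2016) generalise the same idea to multi-dimensional arrays. All numbers here are illustrative.

```r
# Toy IPF: fit a flat 3x2 seed (age band x gender) to two 1D marginals
seed <- matrix(1, nrow = 3, ncol = 2)
row_target <- c(8, 12, 5)   # zonal age-band counts
col_target <- c(15, 10)     # zonal gender counts (same grand total: 25)

for (iter in 1:100) {
  seed <- seed * (row_target / rowSums(seed))              # fit row marginal
  seed <- sweep(seed, 2, col_target / colSums(seed), `*`)  # fit column marginal
}
round(seed, 2)  # fractional weights consistent with both marginals
```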
The procedure for stochastic spatial microsimulation can also be illustrated using the data tables in Figure 4.3. In the Metropolis-Hastings based stochastic cases, i.e. hill climbing (HC) and simulated annealing (SA), once the seed is re-structured and the array on the right in Figure 4.3 is created, then, for each geographic zone, the seed is sampled, forming the proposal distribution. The volume of the sample selected is equal to the population of the zone. This selection is then improved as specified by the particular stochastic algorithm adopted (i.e. HC or SA), until an optimum replication of the target population is achieved.

The general procedure is as follows: if the marginal indicates a zone has a population N, the hill climbing method randomly selects an initial choice set of size N from the individual seed (creating the first proposal distribution); the procedure then randomly selects an element N_i from the proposal N to be considered for replacement, and an element D from the sample seed as a potential replacement for N_i. If D improves the choice set by improving the objective function, i.e. by reducing the total absolute error24 (TAE), then it is retained, otherwise it is discarded. The process outlined above is repeated until a convergence criterion is met. A known disadvantage of this HC version (Morris, 1993, Williamson et al., 1998) is that the process could converge irreversibly to a sub-optimal solution, a so-called ‘local optimum’. This will result in a sub-optimal simulated population for a zone, whereby a more radical change of the choice set might have further improved the accuracy (TAE) of the simulated population. Alternative HC algorithms exist whereby all the random selections are with replacement, and the cycle consists of repeatedly assessing the contribution of each element N_i of sample N to the TAE, with only the worst contributor put up for potential change. The ever-forward steps of the HC algorithm (at the detriment of getting stuck at a local minimum) are what give it the name hill climbing, with the classic potential entrapment at foothills if the climb does not incorporate back-stepping in the search for the optimum (i.e. the top of the hill).

To circumvent the entrapment associated with the HC algorithm, the simulated annealing (SA) method, which is akin to the Metropolis-Hastings MCMC, is introduced, making provision for both forward and backward steps in the optimisation algorithm. The SA optimisation stage is mapped to a law of thermodynamics given by $p(\delta E) = e^{-\delta E / T}$, where T is defined as the maximum tolerable change in TAE. A measured worsening change in TAE (i.e. δE) is accepted provided a randomly generated number falls below the value calculated for p(δE). The tolerance T is subsequently reduced after a defined number of changes to the initial seed have taken place, so the choice set (proposal density) gradually converges to an optimum. In practice, T can initially be set to a broad value of 1000; an initial choice set is then randomly selected, and the change in TAE (δE) assessed. Any change yielding an improvement in TAE is adopted, while a change worsening the TAE is further assessed by comparing the value of p(δE) with a randomly generated number between 0 and 1: if p(δE) is larger, the worsening replacement element is adopted, otherwise it is rejected, and the process cycle is repeated until convergence.

The third generally used stochastic spatial microsimulation strategy is the genetic algorithm (GA), which mimics the perceived process of evolutionary change in nature. The advantage of genetic algorithms is that they ensure that the spatial microsimulation constrained optimisation solution does not get stuck at local optima, stationary or inflection points, as the entire solution space is repeatedly searched. The GA is not explored any further in this research because it is a variant of the HC and SA methods, albeit with more parameters which are difficult to tune (Williamson et al., 1998). Further, GA algorithms in practical problems do not yield better results than the HC and SA methods (Williamson et al., 1998).

24 TAE stands for Total Absolute Error, an evaluation criterion used in spatial microsimulation. The TAE is equal to the absolute value of the objective function at the optimum. In cases where the object of the simulation is to minimise the absolute difference in simulated and target volumes, the TAE is the sum of the absolute differences in volumes for all variable categories.
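A compact R sketch of the SA acceptance rule described above (a toy illustration of the Metropolis-style criterion, not the implementation used in the experiments):

```r
# Toy sketch of the SA acceptance rule; delta_E is the change in TAE
# caused by a candidate replacement element, Temp the current tolerance
sa_accept <- function(delta_E, Temp) {
  if (delta_E <= 0) return(TRUE)    # improvements are always adopted
  runif(1) < exp(-delta_E / Temp)   # worsening moves kept with p(dE) = e^(-dE/T)
}

sa_accept(delta_E = 5, Temp = 1000)  # with a broad initial T, usually TRUE
```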
4.4.3 Synopsis of methodology

A summary of the range of deterministic and stochastic spatial microsimulation methods is shown in Figure 4.4. These methods are explored in a controlled experiment to ascertain their behaviour in scenarios where the seed sample is biased or skewed, as would be the case with typical novel consumer and observational data. Recall that these datasets are skewed because they represent revealed activity from a subset of the population: only those who consume the particular digital service. Further studies on deterministic and stochastic spatial microsimulation methods investigate the effect of sample ratios and the number of constraint variables on the accuracy of spatial microsimulation results. This is a precursor to the application of the methodologies to realistic data in the case study of railway mobility in the West Yorkshire study area.

Figure 4.4 Range of spatial microsimulation methodologies.

4.5 Experiment on microsimulation methods

The previous sections have detailed a number of different microsimulation methods. Each has numerous advantages and disadvantages (such as the precision of deterministic methods, the likelihood of finding local sub-optima in stochastic methods, and their limited computational power requirements). It is extremely difficult to determine, from a purely theoretical standpoint, which spatial microsimulation method would be the most appropriate for harnessing the relatively novel consumer data. Therefore, controlled experiments will be conducted to assess the behaviour of deterministic and stochastic spatial microsimulation strategies. These experiments are conducted using only the NRTS25 data for the West Yorkshire county. For the experiment, the 8 variables which have relevance to railway mobility (and are typical of those available in the Census interaction data) are selected from the NRTS. These variables are ‘age’, ‘gender’, ‘income’, ‘commute mode’, ‘household-type’, ‘household-cars’, ‘ethnicity’, and ‘household children’. From these variables, aggregate data arrays of different dimensions (1D, 2D, 3D up to 8D) are created; these represent the ‘true’ population (i.e. the target constraints as seen in Figure 4.2, Section 4.4.1). The seed, on the other hand, is created by taking tuple (row) samples from the NRTS data. A seed of low sample ratio is created by taking fewer samples from the NRTS dataset, a skewed seed is created by systematically taking samples from an ordered NRTS dataset, and so on. To assess the propensity of a particular methodology to predict non-constrained variables, for example, a 3D target could be used to generate a synthetic NRTS population from a random NRTS seed. This synthetic population is then compared with the original NRTS, and the difference between the synthetic and actual populations is assessed for those variables not included as part of the 3D target constraint. A Monte Carlo experiment is designed to test a range of deterministic and stochastic spatial microsimulation methods, for different configurations of the target and seed, enabling the sensitivity and accuracy of the spatial microsimulation methods to be assessed. A summary of the experiments consists of the following, with specific details included in the subsequent sections:

1. The number of target constraints is increased from 1 through to 7 (one fewer than the full set of variables), and the effect of the number of constraints on the TAE26 is assessed.

2. The predictive accuracy of the deterministic and stochastic methods is assessed for those variables not included as target constraints in the spatial microsimulation process.

3. Sample ratios of the seed are varied from 0.05% to 85%, for a target constraint created by aggregating a full set of the 7 variables.

4. The skew in the seed samples is progressively increased, while the resulting TAE values are compared for the deterministic and stochastic methods. The skew is achieved by taking a random sample and then various degrees of systematic samples, as described below.

The methodology of the experiment is presented below, detailing some of the data preparation and procedures for the experiment.

25 The National Rail Travel Survey (NRTS) was conducted in 2004/2005 in the UK, in areas outside London. The NRTS measured passenger characteristics like access and egress modes to the railway network, passenger final destination and home address, purpose of travel, ticket type, time of arrival at the station and travel start time, as well as a range of socio-demographic characteristics.

26 The total absolute error (TAE) is a measure of the fit between the simulated population and the actual target population. The absolute differences in the count of individuals in each variable category for the synthetic and actual target populations are summed to yield the TAE.
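The TAE defined in footnote 26 reduces to a one-line computation; the category counts below are illustrative only.

```r
# Summed absolute differences between simulated and target category counts
tae <- function(simulated, target) sum(abs(simulated - target))

tae(simulated = c(8, 12, 5, 13), target = c(10, 10, 6, 12))  # -> 6
```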

4.5.1 Methodology of experiment

The experiment assessed the efficacy of deterministic and stochastic spatial microsimulation methods. The deterministic method tested is the predominant m-IPF (Barthelemy and Suesse, 2016). Alternative deterministic methods are the chi-squares, maximum likelihood and least squares methods (Barthelemy and Suesse, 2016, Deming and Stephan, 1940); however, the implementations of these are based on the Lagrange multiplier method, and were found to be intractable for the large datasets (>20,000 tuples) used in the experiment, so those results are not presented. The stochastic methods assessed are the simulated annealing (SA) and hill climbing (HC) methods, which are respectively variants of the Metropolis-Hastings and rejection sampling methods. The results presented are for the SA method, as the HC yields sporadic sub-optimal results, thereby precluding an ability to control the experiment. More recent simulation methods based on the Gibbs sampler have been developed (Farooq and Bierlaire, 2017), but discussion of these is deferred to the next chapter, as they are not considered established enough to warrant comparison with the m-IPF, HC, and SA methods.

4.5.1.1 Settings and data

A preliminary summary of the NRTS variables chosen for use in assessing the deterministic and stochastic spatial microsimulation methods is presented in Figure 4.5. The summary is presented to highlight that a range of the data variables are normally distributed, albeit skewed relative to the wider population (as the NRTS consists of a larger percentage of commuters relative to the Census). Different sampling regimes, i.e. systematic and random, are adopted to enable the data to be tailored for the desired experiments. The plots shown in Figure 4.5 are for NRTS flows emanating and terminating in the county of West Yorkshire. Going from left to right on the upper row, and similarly from left to right on the lower row, the summary of the data shows the following. Income has been divided into seven brackets (step-10 to step-70)27, as specified in the NRTS questionnaire (DfT, 2013a). The display of volumes of passengers in each socio-demographic category across the ordinal income bands shows normal distributions, with peak volumes associated with the step-40 income bracket. Expectedly, the income-income plot indicates a modal income in the step-40 category (£17,501-£35,000 in 2004). As the NRTS survey was conducted on the railways, only a minority of the passengers captured do not commute by train (NTC), with a higher volume of passengers commuting by train (TC), about 66%. This distribution is quite unlike the wider population, in which a far lower percentage (~8%) commute by rail (ONS, 2013). The age-income relationship shows a positive increasing trend, apart from the over-65 age bracket, where income reverts to the lower income categories. Both the male and female genders are normally distributed about the modal step-40 income category. The slightly higher volume of female passengers is indicative that more males use other means (perhaps the car) for mobility and commute. Car ownership increases with income, and income increases as household type goes from A to D, indicating that more mature households tend to have more cars available at the disposal of household members. The 16-24yrs bracket indicates a fair number with over 3 cars, and this is perhaps indicative of young adults in established households where there are 3 cars or more. Expectedly, the majority of the passengers are of European descent.

27 The income values are based on values in the period 2001 and 2004/2005. Income bracket step-10 earns less than £7,000, step-20 earns £7,000-£12,500, step-30 earns £12,501-£17,500, step-40 earns £17,501-£35,000, step-50 earns £35,001-£50,000, step-60 earns £50,001-£75,000, and step-70 earns more than £75,000.

Figure 4.5 Cross-table of categorical NRTS covariates.

4.5.1.2 Data preparation

For the purposes of managing the computational demands of the spatial microsimulation algorithms, the NRTS data, which were originally zoned to Postcode Sector boundaries, were spatially aggregated to Postcode Areas. It is assumed that this did not affect the results of the experiment, as both the deterministic and stochastic methods are equally affected. Postcode geographies in the UK run from the larger Areas (about 120 of them covering the UK), to Districts, Sectors (see Figure 3.6 for West Yorkshire), and then the smaller Postcode Units (containing about 15 addresses each). For the spatial microsimulation experiment, the Postcode Sectors in West Yorkshire, which originally numbered about 350, were spatially aggregated up to the coarser Area geography. This resulted in the 5 Postcode Areas shown in Figure 4.6. The NRTS seed samples comprise about 65,000 tuples for the West Yorkshire population of about 2.4m (a sampling ratio of about 3%). As such, small Postcode Sectors like LS52 and BD127, with populations of about 1,600 and 4,300 respectively, have only 3 NRTS samples each. Such low sample counts (only 3) were not deemed representative of either the 1,600 or the 4,300 zonal populace; hence the further need to aggregate the volumes, facilitating adequate representation of each geographic zone. The resulting Postcode Areas (BD, HD, HX, LS, WF) covering West Yorkshire have NRTS samples ranging from 2,599 (in HX) to 24,121 (in LS), with a variance of 6.5E+07.
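A sketch of the sector-to-area collapse might look as follows (illustrative sector labels; the actual NRTS geography handling may differ):

```r
# Collapse Postcode Sectors to Postcode Areas by stripping everything
# from the first digit onwards (the Area is the leading letter block)
sectors <- c("LS5 2", "BD12 7", "HX1 1", "WF11 0")
areas   <- sub("[0-9].*$", "", sectors)
table(areas)  # BD HX LS WF
```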

To restrict the analysis to within the West Yorkshire county, Postcode Areas were clipped, and as a consequence there were nominal remnants of some peripheral Postcode Areas; these are reflected in the legend shown in Figure 4.6. For instance, in the left-hand corner of the West Yorkshire county, the map when displayed in a GIS indicated that parts of the Oldham Postcode Area (OL148, OL138, OL145 etc.) were captured, and these were included in the analysis. Similarly, sub-sections of other Postcode Areas were captured, including Sheffield, York, Harrogate, Blackburn and Doncaster (S, YO, HG, BB and DN). These might arguably have been removed from the analysis, as the sub-sections are not representative of the Postcode Areas referenced. They have however been retained, to replicate situations where a zone has a low number of counts compared to other typical values in the analysis. In effect, the inclusion of these marginal Postcode Areas enables an assessment of the sensitivity of the different spatial microsimulation strategies to nominally low sample counts from a particular geographic zone.

Figure 4.6 Re-zonation of Sectors to Areas (Postcodes).

4.5.1.3 Experiment procedure

The procedure implemented to assess the various spatial microsimulation strategies is illustrated in the flowchart in Figure 4.7. The state-of-the-art deterministic m-IPF (Barthelemy et al., 2016) and the established stochastic SA and HC methods (Kavroudakis, 2015) were assessed for suitability under various scenarios: first, when the sample ratio of the seed is varied from 0.05% to 85%; second, when the number of target constraints is increased from 1 through to 7; third, when the skew in the seed samples is progressively increased; and fourth, when the predictive accuracy of the deterministic and stochastic methods is assessed for those variables not included as target constraints in the spatial microsimulation process. In each case, the total absolute error (TAE) is measured in a Monte Carlo experiment. The Monte Carlo experiment was necessary to ensure that the results were not an artefact of a particular choice of seed or target data. For example, in the experiment assessing the effect of variations in sample ratio, when a 55% sample is required, the experiment is repeated28 (350 times) for different 55% samples with randomly different seed values. This ensures that the resulting distribution of TAE values is a true reflection of the range of influences that a 55% sample could have on the TAE for the deterministic and stochastic spatial microsimulation methods.

28 350 repeats were adopted in the software code developed. However, a sensitivity analysis indicated that there were no noticeable changes in the results beyond 100 repeats.
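The Monte Carlo loop can be sketched in a few lines of R. In the sketch below, nrts, target and run_microsim() are hypothetical stand-ins for the NRTS seed table, the aggregate target, and a deterministic or stochastic microsimulation routine; they are not the actual thesis code of Appendix A.

    # Monte Carlo distribution of TAE for a fixed 55% sample ratio (sketch).
    tae <- function(sim, target) sum(abs(sim - target))  # total absolute error

    set.seed(42)
    n_repeats <- 350
    taes <- replicate(n_repeats, {
      idx <- sample(nrow(nrts), size = round(0.55 * nrow(nrts)))  # fresh 55% sample
      sim <- run_microsim(seed = nrts[idx, ], target = target)    # m-IPF or SA
      tae(sim, target)
    })
    plot(density(taes), main = "TAE distribution, 55% sample ratio")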

Figure 4.7 Investigation of spatial microsimulation methods.

The implementation of the deterministic IPF procedure involves optimizing the fit between a sample seed and a target population. The seed would typically be a sample taken from the wider population, while the target would typically be derived from aggregate tables within the Census. During the standard IPF procedure, however, the seed is actually fit to the marginals of the target table. As such, IPF procedures in their standard form do not exploit the full joint (conditional) distribution of the target, but only its marginal distributions. These considerations need to be borne in mind when interpreting the accuracy of the results derived. To date, there are no practical implementations of the IPF that exploit the full conditional distribution of the target, and this forms the crux of the methodology developed in the next chapter. The stochastic spatial microsimulation procedures, on the other hand, inherently exploit the full joint distribution of the target, as samples are drawn from the seed and the counts of all variable categories are directly compared with the target. These considerations are discussed further at the conclusion of the Monte Carlo experiments reported in this chapter.
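To make the marginal-fitting behaviour concrete, the toy R sketch below runs a classic two-constraint IPF on a made-up 2 x 2 seed; the row and column marginals converge to the targets while the joint structure of the target is never consulted.

    # Toy classic IPF: fit a 2x2 seed to row and column marginals only (sketch).
    seed <- matrix(c(10, 5, 8, 12), nrow = 2,
                   dimnames = list(age = c("16-24", "25+"), cars = c("0", "1+")))
    row_target <- c(40, 60)  # made-up target marginal over age
    col_target <- c(55, 45)  # made-up target marginal over cars

    for (i in 1:100) {
      seed <- seed * (row_target / rowSums(seed))              # fit row marginal
      seed <- sweep(seed, 2, col_target / colSums(seed), `*`)  # fit column marginal
    }
    rowSums(seed); colSums(seed)  # both match; the joint remains unconstrained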

4.5.2 Results on deterministic and stochastic methods

The results presented are the distribution density plots and heat maps of the TAE values when a particular experiment is repeated multiple times in a Monte Carlo experiment. Regions of high TAE density reflect the predominant modal TAE value. The TAE density plot also reflects the sensitivity of each deterministic and stochastic spatial microsimulation method to particular changes. The R software (RStudioTeam, 2016) developed to implement each of the experiments is discussed in Appendix A.1 to Appendix A.5 (and included on the CD accompanying the thesis).

4.5.2.1 Effect of increasing the number of constraints

There are 8P3 ways of choosing an ordered set of three (3) from eight (8) variables. To establish the effect of the number of constraint variables, Monte Carlo sampling was implemented on the choice set of constraints, and the distribution of the TAEs for each sample set is displayed in the density plots in Figure 4.8 and Figure 4.9. 100 different combinations of 1-way, 2-way, 3-way, up to 6-way array tables were separately created to form the target, and each of these was implemented in the spatial microsimulation. Initially, the same 5% random sample of the NRTS formed the seed used for all the deterministic and stochastic spatial microsimulation exercises. However, a sensitivity analysis showed no marked changes in the trend of the results when the sample ratio (η) of the seed was increased above 5% (5% ≤ η ≤ 85%). Similarly, there were no changes in the trend of results when several combinations (ξ) of the 5% random sample (1 ≤ ξ ≤ 100) were adopted as the seed and the results presented as a 3D map. The Monte Carlo experiment as such smoothed out the effect of any particular choice set of variables and/or samples, objectively revealing the effects of increasing the number of target variables for the deterministic and stochastic spatial microsimulation methodologies. The results in Figure 4.8 and Figure 4.9 reveal that, for both the m-IPF and SA methods, the TAE values reduce as the number of constraints increases, as indicated by a drift of the density plots towards the origin (to the left). The TAEs for the stochastic method (SA) are considerably lower than those for the deterministic m-IPF. As observed in Figure 4.8, the results for 1 target constraint (1-Constr) are not reported on the plots, due to instability in the IPF method when only one constraint variable is used. It was found that for particular choices of 1-way (1D) constraints, and for low sample ratios (< 5%), the deterministic IPF crashed, the seed being completely non-representative of the particular choice of target and resulting in a simulated population of zero.
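The constraint-sampling step can be sketched in R as below; the variable names are illustrative stand-ins for the eight NRTS covariates.

    # Sample 100 random 3-way constraint sets from the 8 NRTS variables (sketch).
    vars <- c("hh_children", "ethnicity", "hh_type", "cars",
              "gender", "age", "income", "commute_mode")
    set.seed(1)
    combos <- replicate(100, sample(vars, 3), simplify = FALSE)
    # each element of combos defines one 3-way target cross-table, e.g.
    # xtabs(~ hh_children + age + income, data = nrts) for that particular draw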

The large TAE values for the m-IPF are perhaps due to two issues. First, the deterministic method only exploits the marginals of the target constraint, whilst the stochastic methods exploit the full joint (conditional) distribution of the target constraint. The use of marginals in the deterministic method effectively relaxes the constraint on the TAE results, thereby allowing a more flexible, sub-optimal range of results. This issue has been reported in the literature concerning the IPF methodology (Farooq and Bierlaire, 2017). In Chapter 5, a methodology is developed to overcome this particular shortcoming of the IPF.

The second reason for the higher TAE values from the m-IPF relates to the sensitivity of IPF to small sample ratios, due to the inclusion of peripheral zones (YO, S, OL) which have fewer and poorly representative samples. For each zone (Postcode Sector) in the geography, the SA method draws a sample from the whole NRTS of size equal to the population of the zone being simulated (the proposal distribution). This sample is then iteratively improved by taking further individual samples from the NRTS to replace elements of the proposal distribution. The same result would be achieved more efficiently if the initial proposal distribution for a zone were formed from only those samples with pertinence or relevance to the zone being simulated. As such, even if the initial sample for the SA were chosen from the entire NRTS, subsequent SA iterations improving the sample would narrow it down to those records relevant to the particular zone being simulated. The m-IPF, in contrast, starts with a sample (proposal) made up of the entire NRTS. Subsequent iterations of the m-IPF map this sample onto the marginal distribution of the population of the zone being simulated. As a result, elements of the whole NRTS that are not relevant to a zone are filtered out in the mapping process, as they have no corresponding values in the target to map onto.

As a result of this mapping, the m-IPF tends to produce fewer simulated elements, corresponding to only those that could be mapped onto the target population. As such, for the m-IPF, if a particular constraint cell lacked a representative in the sample, the IPF would not converge to a full population, whereas the stochastic SA method satisfices29. The stochastic methods generate a full population for a zone once the aggregate targets are defined for the zone, even when the zone is poorly30 sampled. A commensurate m-IPF strategy, on the other hand, would not simulate the full volume of the zone, thereby yielding particularly high (poorer) TAE values.

29 The concept of convergence is relative. When a method like the SA satisfices, it creates a full but sub-optimal population. This is because the method starts with a proposal distribution made up of the full requisite population volume; the SA methodology simply iteratively improves this proposal by replacing elements of the distribution one at a time, on each occasion maintaining the requisite volume of the zone. In essence, an argument could be made that full SA convergence has not been achieved, as the simulated population does not exactly match the target population; the TAE measure is devised to describe the extent of the convergence. On the same basis, an m-IPF that did not yield the requisite population cannot strictly be described as not having converged, as the methodology converged as best it could, yielding the higher TAE value. In many of these deterministic and stochastic procedures, convergence is adduced after a pre-defined number of iterations.

30 Typically a zone would have at least one sample, and with the stochastic methods such a sample would bear a high weight to create the full simulated population. In a commensurate m-IPF, the created weight would be lower, as the seed is replicated to yield just the requisite volume of similar individuals in the zonal population.

Figure 4.8 TAE vs No. of constraints (deterministic m-IPF).

Figure 4.9 TAE vs No. of constraints (stochastic SA).

4.5.2.2 Sensitivity of TAE to population of a zone

The mechanism through which the TAE is generated is explained using the heat maps presented in Figure 4.10 and Figure 4.11, respectively for TAE results generated from the deterministic m-IPF and the stochastic SA methods. The heat map on the left of Figure 4.10, for the deterministic method, shows the variation of TAE for each simulated zone as the zonal population decreases from left to right along the horizontal axis. The heat map on the right of Figure 4.10 shows a similar measure, but with the TAE normalised by the zonal population. The heat maps in Figure 4.11 are the corresponding TAE values generated from the stochastic SA procedure, the left being the raw TAE and the right the normalised TAE. In both the deterministic and stochastic procedures, as seen from the plots on the right, the regions of higher normalised TAE values are those zones with small populations, and this effect is more pronounced in the results for the deterministic procedure (the right of Figure 4.10). The default results, shown by the plots on the left, indicate that zones with higher populations tend to yield commensurately higher TAE values; this effect is again more pronounced for the deterministic m-IPF method (the left of Figure 4.10). Whilst the higher-population zones have higher total absolute error values (TAEs), when the TAE is normalised by the population of the zone the trend is reversed. This indicates that lower-population zones are more susceptible to spatial microsimulation errors; for practical purposes, it suggests that a larger sample size renders a lower susceptibility to spatial microsimulation errors. Since the deterministic m-IPF method is more sensitive to this effect, it yields the higher TAE values reported earlier in Figure 4.8 and Figure 4.9. A further observation from the heat maps in Figure 4.10 and Figure 4.11 is that variables with fewer categories record higher error consistency (and lower variability) across the variable categories, as seen for instance in the commute mode (TC/NTC), ethnicity (Europe/Non) and household children (Yes Child/No Child) variables. Variables with many categories record richer variability, as seen for instance in the household-type or cars variables. This indicates that the richer a variable in terms of its number of categories, the better its ability to extract the heterogeneity in a simulated population.
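Expressed in R, the two heat-map quantities are simply the raw zonal TAE and its population-normalised counterpart; sim_counts, target_counts and zone_pop below are hypothetical zone-by-category count matrices and a zonal population vector.

    # Raw and population-normalised TAE per zone (sketch).
    tae_zone <- rowSums(abs(sim_counts - target_counts))  # one TAE per zone
    tae_norm <- tae_zone / zone_pop                       # small zones now stand out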


Figure 4.10 TAE for decreasing zonal population (L-R) - deterministic.

Figure 4.11 TAE for decreasing zonal population (L-R) - stochastic.

4.5.2.3 Prediction of TAE for non-constrained attributes

The second comparison of the deterministic m-IPF and stochastic SA methods addresses the question: how well predicted are the values of those variables not included as constraints in the spatial microsimulation? In essence, if subsets of variables within the NRTS are used as target constraints during the spatial microsimulation, how well are the simulated zonal aggregates predicted for those variables not included in the set of aggregate target constraints? For example, if the age, income, gender and commute-mode variables are used to create the aggregate cross-table constraints, how well would the different spatial microsimulation methods predict the variables household-type, cars, ethnicity, and household-children, which were not used to create the aggregate constraint? This question addresses the robustness of the different spatial microsimulation strategies, since the synthetic population traditionally inherits the attributes of all the datasets involved, irrespective of whether those variables constituted the constraint. The procedure for predicting the TAE values for those variables not included as constraints in the SA and m-IPF methods is as follows. Take the example case where the aggregate target constraint is created from only the age, income, gender and commute-mode variables, yielding a 4-way (4D) aggregate array. The deterministic m-IPF and stochastic SA spatial microsimulation methods are then implemented to create two sets of simulated populations for the geographic zones (one set from m-IPF and another from SA). Note that the identity information is known for all the individuals forming the simulated population, as this detail is carried along in the simulation process. For each zone, the individuals simulated are aggregated by variable categories. For instance, for Zone 1 using the m-IPF method, there could be 34 individuals simulated in the age category Ag16-24 and 27 in the age category Ag25-34. Using the SA method, these might amount to 46 in the Ag16-24 band and 35 in the Ag25-34 band. The actual NRTS population might indicate that Zone 1 has only 42 people in age category Ag16-24 and 33 in the Ag25-34 band. In that instance the total absolute error (TAE) for the populations created using the m-IPF and SA methods would be 14 and 6 respectively. It is noteworthy that, as the identities of the individuals simulated using the m-IPF and SA are known, the remaining attributes of these individuals can easily be read from the original NRTS table.

As a result, the volumes of these individuals within the variable categories household-type, cars, ethnicity, and household-children (not used as constraints in the spatial microsimulation) can be deduced. For each zone, the resulting simulated volumes for each of the unconstrained variable categories are presented in tabular form (creating a table of zones versus variable categories). The TAE is estimated as the absolute difference between each of the two simulated tables (m-IPF and SA) and the corresponding table of actual NRTS volumes. These give the TAE values predicted for the unconstrained variables under the m-IPF and SA methods. Using this method, TAE values are deduced as the number of constraints increases from a 1-way (1D) aggregate to a 7-way (7D) aggregate, using the ordered sequence of variables (1) household-children, (2) ethnicity, (3) household-type, (4) cars, (5) gender, (6) age, (7) income, and (8) commute-mode. When only a 1-way constraint is adopted, that is household-children; when a 2-way constraint is used, that is household-children and ethnicity. A 3-way constraint adds the household-type variable, forming a 3-way (3D) constraint made up of household-children, ethnicity, and household-type. The same logic follows for an increasing number of constraints.

When only the 1-way constraint is used, the TAE is calculated separately for the variable categories in that particular constraint (i.e. NoChild and YesChild). The TAE is also calculated for the variable categories of all the other variables not included in the constraint, namely (2) ethnicity, (3) household-type, (4) cars, (5) gender, (6) age, (7) income, and (8) commute-mode, with respective variable categories: Europe and Non-Europe; Household-A, Household-B, Household-C, and Household-D; Car-0, Car-1, Car-2, and Car-3+; Female and Male; Age16-24, Age25-34, Age35-64, and Age65+; Income-10, Income-20, Income-30, Income-40, Income-50, Income-60, and Income-70; No-train and Train. For, say, a 5-way constraint, the first five variables are used as the constraint, i.e. (1) household-children, (2) ethnicity, (3) household-type, (4) cars, and (5) gender. The TAE is reported for the variable categories within these first five variables, as well as for the variable categories of those variables not included in the constraint set (i.e. (6) age, (7) income, and (8) commute-mode) with categories Age16-24, Age25-34, Age35-64, Age65+, Income-10 through Income-70, No-train, and Train. These results are visually summarised using the stacked plots presented in Figure 4.12 and Figure 4.13.
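A condensed R sketch of this book-keeping is given below, assuming hypothetical data frames sim (the simulated individuals for one zone) and actual (the corresponding NRTS individuals), with shared factor levels across the two.

    # TAE per variable, for constrained and unconstrained sets alike (sketch).
    constrained   <- c("hh_children", "ethnicity", "hh_type", "cars", "gender")
    unconstrained <- c("age", "income", "commute_mode")

    tae_by_var <- sapply(c(constrained, unconstrained), function(v) {
      sum(abs(table(sim[[v]]) - table(actual[[v]])))  # category-wise absolute error
    })
    tae_by_var  # stacked by variable category, this underlies Figures 4.12 and 4.13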

The stacked column charts in Figure 4.12 and Figure 4.13 show the TAE values for the different numbers of constraints. The categories of the constraint variables are listed along the horizontal axis. '1 constraint' refers to when the first variable (household-child) and its categories (No-Child and Yes-Child) are used to build the aggregate constraint. '3 constraints' refers to when the first three variables (household-child, ethnicity, and household-type) and their categories (No-Child and Yes-Child; Europe and Non-Europe; Household-A, Household-B, Household-C, and Household-D) are used to build the aggregate constraint. The same applies for 5 and 7 constraints. When 1 constraint is applied (the pink-orange parts of the stacked columns), the deterministic m-IPF procedure yields no particular trend across the range of variable categories, whilst the stochastic SA yields lower TAE values for the constraint variable categories (No-Child and Yes-Child), with higher TAE values for the other variables. Similarly, for 3 constraint variables (household-child, ethnicity, and household-type), shown by the green parts of the stacked columns, the deterministic m-IPF shows no particular trend in TAE values across the variable categories on the horizontal axis. The stochastic SA method, however, shows lower TAE values for the 3 constrained variables, as seen from the green parts of the stacked columns for the categories No-Child and Yes-Child, Europe and Non-Europe, and Household-A to Household-D. This pattern of no trend in the deterministic TAE values and a clear trend in the stochastic TAE values continues when there are 5 and 7 variable constraints. The results in Figure 4.12 indicate that, for the deterministic m-IPF procedure, predictive accuracy is insensitive to the particular choice set of constraint variables: there is no trend in accuracy between the variables included in and those excluded from the constraint set. The stochastic SA method, on the other hand, reveals relatively lower TAE values for those variables forming the constraints (Figure 4.13); taken alone, the SA results indicate that accuracy can only be guaranteed for those variables included as constraints, with the non-constrained variables predicted with lower accuracy and precision. Comparing the m-IPF and SA results, however, there is a large margin of difference between the two procedures, as indicated in Figure 4.12 and Figure 4.13: the TAE results from the m-IPF (50K scale range) are much larger than those for the SA (5K scale range). As subsequently discussed in the next chapter, the TAE does not always form an objective assessor of the quality of spatial microsimulation.

Figure 4.12 TAE vs constraints - deterministic m-IPF.

Figure 4.13 TAE vs constraints - stochastic SA.

4.5.2.4 Effect of sample-ratio on accuracy of result

The third experiment assesses the effect of the sample ratio of the seed on the accuracy of the TAE results from the deterministic and stochastic spatial microsimulation strategies. Monte Carlo sampling was implemented to choose a range of seed samples of fixed sample ratio. The procedure consisted of selecting samples with a sample ratio v (0.05% ≤ v ≤ 85%) from the full set of NRTS data. This procedure was repeated 250 times for each value of v, and the corresponding 250 TAE values calculated. The density plots shown in Figure 4.14 and Figure 4.15 are created using the 250 estimates for each value of the sample ratio; the Monte Carlo averages out any variability due to the particular choice of sample set. The TAE plot of Figure 4.14 shows that the accuracy of the simulated m-IPF population increases with sample ratio, indicated by a leftward drift of the plots as the sample ratio increases from 55% to 85%. Further, the increase in density plot heights with sample ratio indicates a nominal increase in the precision of the results. A comparison of Figure 4.14 and Figure 4.15 shows that the deterministic m-IPF is more sensitive to changes in sample ratio than the stochastic method. Like the m-IPF, the SA shows a reduction in TAE values with increasing sample ratio. This is accompanied by a slight reduction in density levels, implying a reduction in precision (perhaps influenced by the increased variability in larger samples). The stochastic procedure is seen to be less sensitive to changes in sample ratio, as indicated by the nominal drift of the plots in Figure 4.15.

Figure 4.14 TAE vs sample ratio - deterministic m-IPF.

Figure 4.15 TAE vs sample ratio - stochastic SA.

4.5.2.5 Influence of level of skew in seed sample

Traditionally, it is assumed that the disparity between the seed and the aggregate constraint is normally distributed (Ireland and Kullback, 1968, Namazi-Rad et al., 2014). As such, the robustness of the deterministic and stochastic strategies needs to be assessed for when the seed is skewed, prior to application to typically skewed novel consumer datasets. The NRTS data are stored sorted by origin-purpose and destination-purpose; there are 10 categories31 for the origin and destination purposes. In this form, a random sample of the NRTS yields a representative seed. Sampling the NRTS with replacement creates a coarsely skewed sample (equivalent to a single bootstrap sample) (Efron and Tibshirani, 1986). Further, selecting the first N tuples of the NRTS creates a strongly skewed seed, as the sample captures particular types of passengers with similar travel purposes. A Monte Carlo experiment is conducted, repeating each sampling design about 250 times, producing a distribution of the TAE values for the deterministic and stochastic procedures, as shown in Figure 4.16 and Figure 4.17.

31 The categories of origin and destination purpose are: shopping; home; normal workplace; other workplace/meeting; personal business (e.g. doctor, hospital, bank); visiting friends or relatives; sport or entertainment (e.g. concerts, theatre); other leisure activity; school as a student; school accompanying a pupil; taking someone to the airport, station or hotel; meeting someone at a station, airport or hotel; and other purposes not listed.

The sensitivity of the deterministic method to skew in the seed is depicted in Figure 4.16, with the seed progressing from a random sample without replacement, to a random sample with replacement, to dividing the entire NRTS into two halves and selecting the first N/2 tuples from each half (N ranged from 5% to 50% of the entire sample), and finally to selecting the first N tuples from the NRTS. A similar experiment conducted using the stochastic SA procedure is shown in Figure 4.17. As seen from the typical results in Figure 4.16 and Figure 4.17, seeds that better represent the wider population produce in general lower TAE values, indicative of a more accurately simulated representative population. The stochastic procedure, however, is less susceptible to the quality of the seed data: the TAE distribution depicted in Figure 4.17 shows only nominal increases in TAE as the seed becomes less random.
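The four seed-sampling designs can be sketched in R as follows, with nrts again a hypothetical stand-in for the purpose-sorted NRTS table and n the required seed size.

    # Four seed designs of increasing skew (sketch); nrts is sorted by trip purpose.
    n    <- round(0.25 * nrow(nrts))
    half <- nrow(nrts) %/% 2

    seed_random <- nrts[sample(nrow(nrts), n), ]                  # representative
    seed_boot   <- nrts[sample(nrow(nrts), n, replace = TRUE), ]  # coarsely skewed
    seed_halves <- rbind(nrts[seq_len(n %/% 2), ],                # first N/2 of each half
                         nrts[half + seq_len(n %/% 2), ])
    seed_first  <- nrts[seq_len(n), ]                             # strongly skewed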

Figure 4.16 TAE vs skew in data - deterministic m-IPF.

Figure 4.17 TAE vs skew in data - stochastic SA.

4.6 Remarks

The overall intelligence garnered from the experiments indicates that whilst the stochastic procedure yields more accurate (lower) TAE results than the deterministic procedure, the SA results are also less sensitive to changes in the features of the seed data used in the spatial microsimulation. This lack of sensitivity calls for further assessment of the stochastic procedures, to confirm that lower TAE values do actually imply that the distribution of the target is replicated by the simulated population. As shown in Chapter 5, lower TAE values do not imply that the distributions of the attribute values of the simulated population are more representative, which would be the desired result in spatial microsimulation. In Chapter 5, an m-IPF methodology is developed to enable an exploitation of the full joint distribution of the target population (as opposed to just the marginals used in this chapter), and this forms a more robust basis for comparing the deterministic and stochastic spatial microsimulation procedures. Re-zonation of the geographic scales from Postcode Sectors to Postcode Areas, or the re-categorization of variables during spatial microsimulation, is sometimes necessary to fit the problem within the computational memory limitations of the R software (RStudioTeam, 2016). R limits the size of individual objects: a standard atomic vector, table or array cannot exceed 2^31 - 1 elements32. As a consequence, a seed table of 50,000 passengers, from 341 zones, with 7 income and 6 household-type categories, could accommodate another 2-category variable but not one with 3 categories, since 341 × 50,000 × 7 × 6 × 3 = 2,148,300,000, which exceeds 2^31 = 2,147,483,648. This illustrates a limitation in using so-called 'big data' on the R software platform, and points to areas where computing infrastructure needs to be developed. The R code used to implement the deterministic and stochastic experiments is within the CD accompanying this thesis, as described in Appendix A.1 to Appendix A.5.

32 Apart from the object size limitation, the memory available to run a script may also be limited by the user's address-space limitations or the system memory. However, existing high-performance computing infrastructure can ameliorate system memory limitations, to the tune of about 2 terabytes.
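The element-count check is easy to script before attempting to build such an array; the dimensions below are those quoted above.

    # Feasibility check: standard R atomic objects max out at 2^31 - 1 elements.
    dims <- c(zones = 341, passengers = 50000, income = 7, hh_type = 6, extra = 3)
    prod(dims)             # 2,148,300,000 cells
    prod(dims) < 2^31 - 1  # FALSE: the table would exceed the element limit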

Chapter 5 WEST YORKSHIRE MICROSIMULATION

This chapter follows on from the results of the previous chapter (Chapter 4), where the methodology of spatial microsimulation was presented and the behaviour of deterministic and stochastic spatial microsimulation procedures was investigated. Such investigations enable an assessment of the suitability of each methodology for application to varied datasets, which in the case of the LENNON ticketing and NRTS datasets are a skewed subset of the wider population. In this chapter, for the first time, a case study is presented using spatial microsimulation to synthesize an individual-level representative population of railway passengers interacting between geographic zones and through the railway network within the West Yorkshire County. The synthetic data created can be used as input to multi-agent transport models like MATSim (Horni et al., 2016) and SATURN (Hall and Willumsen, 1980), or, as presented in Chapter 6, as input to a GIS-GTFS33 network model. Software like MATSim enables the simulation of individual passenger movements, whilst, as shown in this thesis, the GIS-GTFS model enables the identification of the context of each passenger's mobility, revealing information about actual waiting times endured, the number of stops, crowding in trains, etc. The resulting attribute-rich endogenous and exogenous data from a GIS-GTFS model are useful for subsequent spatial analysis (as presented in Chapter 7 and Chapter 8) to identify particular drivers of mobility interaction on the railway network. In Chapter 4, the sensitivity and behaviour of the deterministic m-IPF method was compared with that of the stochastic SA method, and the influence of different types of input seed data on the TAE34 was investigated. The m-IPF results were found to be more sensitive to skew in the seed data than the SA results. In this chapter, a revised multi-dimensional iterative proportional fitting (m-IPF) methodology is developed to address the shortcomings of the standard m-IPF adopted in Chapter 4. The revised m-IPF developed in this thesis is shown to be conceptually more robust than

33 The GIS-GTFS model is a logistical GIS network incorporating detailed transit schedule information (GTFS).

34 The acronyms are defined in the List of Abbreviations.

the newer stochastic simulation-based methods presented in the literature (Farooq and Bierlaire, 2017), and results show the developed m-IPF to outperform the SA method, producing a simulated population that better reflects the sought distribution of the target population. The revised m-IPF is presented in a practice-oriented way to highlight some of the procedural detail, enabling the results to be reproduced. The datasets combined in the spatial microsimulation implemented in this chapter are rail-sector ticketing data from ATOC's LENNON database, the National Rail Travel Survey (NRTS), and the 2011 Census interaction data. The latter two are stated (survey) data, whilst the former are revealed consumer data. The m-IPF implementation in this chapter consists of a two-stage IPF. The first stage uses spatial microsimulation to combine the 2011 Census interaction data with the National Rail Travel Survey (NRTS), producing a first-stage micro-level population. The second stage then combines the micro-population created in the first stage with the LENNON ticketing data. This creates a representative, attribute-rich micro-mobility population of railway passengers interacting on the rail network. It is noteworthy that the individual-level data are the NRTS and LENNON ticketing data, which form biased seed samples because they represent a skewed subset of the wider UK population. The wider UK reference population used is the 2011 Census interaction data.
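In outline, and with mipf(), nrts, census_od and lennon as hypothetical stand-ins for the revised m-IPF routine and the three datasets (not existing package objects), the two stages chain as follows.

    # Two-stage m-IPF outline (sketch).
    stage1_pop <- mipf(seed = nrts,       target = census_od)  # NRTS x Census interaction
    stage2_pop <- mipf(seed = stage1_pop, target = lennon)     # enrich with LENNON flows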

5.1 Review of the literature

In Chapter 4, the deterministic m-IPF was shown to produce poorer (higher) TAE values than the stochastic SA method, and the SA method further showed insensitivity to skew in the seed data. These results might give the impression that the SA methodology is superior to the m-IPF. It is, however, noteworthy that the SA method creates a synthetic population even when there are no seed data for a particular zone; such an SA-simulated population would not be objective. In such a scenario the m-IPF would not simulate any data, resulting in poorer (higher) TAE values for the m-IPF and thereby giving the false perception that SA methods are superior. As such, when non-representative or 'non-existent' data are used, lower TAEs do not necessarily imply a more objective population. The ability of a spatial microsimulation methodology to yield a simulated population whose distribution matches the density (and not just the TAE volume) of the target population is a necessity in simulating an objective population. This in itself raises questions about the efficacy of the TAE as a yardstick for spatial microsimulation convergence35. Similar issues concerning the use of the TAE as the yardstick for convergence are reported in the literature (Lovelace et al., 2015, Vidyattama et al., 2011). In principle, a more objective criterion for assessing spatial microsimulation would be the ability to yield a simulated population reproducing the full joint (or conditional) distribution of the target population; a better simulated population is one reproducing the distribution of the target attributes. The methodology presented in this chapter compares the distributions of the simulated and target populations as a basis for assessing the objectivity of results created from the deterministic m-IPF and stochastic SA spatial microsimulation methods. A main criticism of the IPF (and m-IPF) methods in the literature is that they only utilize the marginal information of the target (Farooq and Bierlaire, 2017); as a result, when the sample seed is replicated to yield the simulated population, the heterogeneity in the target population is not reproduced. To address this concern, a methodology is developed in this chapter that exploits the full joint distribution of the target population (as opposed to just the marginals). This is achieved by restructuring the seed data to yield cross-tabulations of variables matching the cross-tabulation of the target variables. The m-IPF presented below is developed and applied in a practice-oriented way to a case study of railway mobility in the West Yorkshire County.

35 As reported in the literature, when the seed data are representative of the target population, the m-IPF and SA methods yield comparable results, with differences dependent only on the convergence criteria adopted: total absolute error, percentage classification error, etc. (Duran-Heras et al., 2018, Ryan et al., 2009). However, consumer data are typically a skewed sub-set of the wider population (those who consume a particular digital service), and such a consumer-data seed cannot be assumed to reflect the distribution of the target wider population. Under such conditions, spatial microsimulation developed on the assumption of a representative seed needs to be reviewed (Deming and Stephan, 1940, Kruschke, 2014, Lunn et al., 2012). Some of the TAE results presented in Chapter 4 for skewed data are nonsensical because of the use of the TAE as a convergence yardstick. The quality of a spatial microsimulation result should include an assessment of the density of the simulated population: a simulation more reflective of the distribution of the target population is a more objective population. In the literature, in geographic areas where comparable TAE values were reported for both the m-IPF and SA methods, the seed and the target had similar distributions (Tanton, 2014).

To facilitate a practice-oriented presentation of the revised m-IPF methodology, details of the process of restructuring the data are presented. This enables a better appreciation of the mechanism for achieving the advantages of data restructuring within spatial microsimulation, as the restructuring facilitates an exploitation of the full conditional distribution of the target. This section does not serve as a step-by-step manual to the developed procedure; however, the concepts are explicitly presented. A guide to the detailed steps can be garnered from the R-script implementation of the developed m-IPF procedure presented in Appendix B, and within the CD accompanying the thesis.

Figure 4.3 is a clear visual illustration of the process and mechanism of the m-IPF method. The individual-level seed data in the middle-left table includes the individual ID, as well as the 'Age', 'Income' and 'Cars-in-Household' variables. These variables have a combined 13 variable categories (i.e. 4 for Age, 5 for Income, and 4 for Cars), as depicted in the re-modelled individual-level seed table at the bottom left of Figure 4.3. If, instead of presenting the re-modelled table as shown, the variable categories were cross-tabulated, then instead of the 13 variable categories we would have 80 cross-tabulated categories (i.e. 4 x 5 x 4). The re-modelled table would then have categories like [16-24][Step_30][Car_2], and there would be 80 potential combinations of these. Referring back to Figure 4.3, the constraint table (shown as the top-left table in Figure 4.3) would need to have the same variable categories as the re-modelled seed table to facilitate a reconciliation of the two datasets (i.e. the seed and target). In essence, the target would have 80 distinct cross-tabulated variable categories. Referring again to the m-IPF procedure depicted by the rectangular block on the right of Figure 4.3, the available re-modelled seed sample (the bottom-left table) is shown on one face of the rectangular block, as indicated by arrow 3; similarly, the target data is shown on the top face, as indicated by arrow 1. The standard m-IPF procedure involves iteratively mapping the individuals in the seed to the target volumes, taking each category of variable in sequence, and then each zone in sequence. As each variable (say Age, Income, Cars, etc.) is mapped in sequence, the standard m-IPF procedure is in effect mapping the seed to the marginal distribution of the target (without reference to the values of the other variables). As such, only the marginal distributions of the target are exploited in the standard m-IPF procedure.
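In R, the restructuring of the 13 separate categories into the 80 cross-tabulated cells amounts to building a single interaction factor; the column names below are illustrative.

    # Re-model the seed: one combined category per Age x Income x Cars cell (sketch).
    seed$cell <- interaction(seed$age, seed$income, seed$cars, sep = "_")
    nlevels(seed$cell)  # up to 4 * 5 * 4 = 80 cross-tabulated categories
    # the target constraint must be tabulated over the same 80 cells, e.g.
    # xtabs(count ~ zone + cell, data = target_long)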

However, by cross-tabulating the variables in the seed, and correspondingly those of the target, the full36 joint distribution of the target is exploited, yielding a more granular and objective match between the seed and the target. This concept is explored further below in comparing the mechanism of the m-IPF with the stochastic methods. The intuition behind the improved simulation due to cross-tabulation of the seed is that the simulated population consists of individuals who satisfy specific combinations of age, income, and number of household cars. This is more specific and accurate than an alternative scenario whereby any individual satisfying either the age, or income, or household cars criterion would qualify for inclusion. Further, cross-tabulating the seed has an advantage when dealing with flow interaction data involving an origin and a destination: by associating an origin and destination with each individual in the seed, the cross-tabulation ensures the incorporation of the mobility interaction in the microsimulation process. This methodology also resolves issues which arise when the seed sample replicated to create the demand does not have the variety of the target population, the so-called 'zero-cell-value problem' (Guo and Bhat, 2007, Lomax and Norman, 2015). This problem is aggravated by the difficulty of simultaneously controlling for more than one level of constraint (say origin, destination, and individual, i.e. a simultaneous 3-level constraint hierarchy (Müller and Axhausen, 2012)). In this regard, the novelty introduced in this research is to restructure the railway ticketing data by converting an 'individual' unit in the seed data to an 'origin-individual-destination' unit, thereby also better reflecting the mobility interaction (see the sketch after the footnote below). This strategy serves as a means of exploiting the conditional marginal distributions (instead of just the marginals) in multi-dimensional target datasets (Farooq et al., 2013). Introducing this construct enables the variable categories in the seed to be matched to those in the target, thereby alleviating the zero-cell-value issue. The additional data hierarchies created by restructuring the seed data, however, change the dimension of the data table, effectively reducing its length. This alludes to the advantage of using novel big consumer ticketing data, which are of such volume and variety that a reduction (e.g. 3-fold) in data points would still provide a seed with adequate count and variability.

36 Strictly, this would be the full conditional distribution (as opposed to the full joint distribution), as typically the list of variables available for spatial microsimulation would not form an exhaustive set; typically not all variables are observed.
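The 'origin-individual-destination' construct extends the same idea to the interaction data; again a sketch with illustrative column names.

    # Convert each 'individual' record to an 'origin-individual-destination' unit
    # (sketch), so that seed cells carry the mobility interaction as well.
    seed$od_cell <- interaction(seed$origin, seed$cell, seed$destination, sep = "|")
    # fitting now matches origin, person attributes and destination together,
    # easing the 3-level (origin, individual, destination) constraint hierarchy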

The methodology presented in this chapter further serves to explore whether the newer MCMC simulation-based methods that employ Gibbs sampling (GS) are revolutionary compared to the established SA and Hill Climbing (HC) stochastic methods (Williamson et al., 1998), which are based on the Metropolis-Hastings algorithm. An explanation of the mechanisms of the IPF, GS, SA and HC procedures is presented and formalized, and the conceptual inconsistency in the newer GS methods is reviewed. The methodology developed in this chapter highlights the limitations of the GS methods when the input data are non-representative (i.e. skewed) relative to the wider population. Under such conditions, an imputation (missing data) model would be required for objective simulation using the GS method; such an implementation incorporating an imputation model is not currently available in the literature. Further particular challenges in creating a representative synthetic demand population have been highlighted in the literature (Farooq et al., 2013, Guo and Bhat, 2007, Müller, 2011, Müller and Axhausen, 2012), and some of these have been addressed in recent research. The issues addressed include external validation (Edwards et al., 2011) and non-integers in deterministic strategies (Lovelace and Ballas, 2013). Additional issues raised in the literature, about fitting more than one contingency table and scalability with respect to the number of attributes (Farooq et al., 2013), have been largely addressed by the multi-dimensional IPF methods (Barthelemy et al., 2016). Despite the positive reviews of recent MCMC simulation-based population synthesis (Farooq et al., 2013), protagonists of the established IPF indicate that its lower sensitivity to numerical and round-off errors, and its avoidance of sub-optimal solutions, make the IPF a preferred solution in many complex mobility scenarios. Following on from the above review, this chapter sets the foundation for establishing m-IPF methods as the current preferred strategy for spatial microsimulation when the seed sample is skewed relative to the target population. The methodology developed addresses the shortfalls highlighted in the literature as detractors of the m-IPF (Farooq and Bierlaire, 2017). The m-IPF is applied to rail-sector ticketing data, and the case study of the railways in West Yorkshire presented in this research is the first application of spatial microsimulation to synthesize individual railway passengers from the Census, the NRTS (DfT, 2013a) and LENNON ticketing data. Whilst the Census is representative of the wider population, the NRTS and LENNON are respectively stated and revealed preference data from a skewed subset of the wider population who use the train service.

5.2 Methodology

The study area consists of the railway network in the county of West Yorkshire, as illustrated in Figure 5.1. Some aspects of the study area were introduced earlier (see Section 3.1). The rail-sector consumer data for the West Yorkshire study area are harnessed by separately applying the deterministic and the stochastic spatial microsimulation methodologies, providing a basis for validating both. As discussed earlier, stochastic and deterministic spatial microsimulation are broadly classed as constrained optimization strategies. The stochastic methods are variants of MCMC methods, and typically consist of transition chains converging to a solution (Plummer et al., 2006). The m-IPF (Barthelemy and Suesse, 2016, Fienberg, 1970) is currently the predominant deterministic method amongst population geographers, and consists of weighted projections converging to a solution (Ruschendorf, 1995). Protagonists of the newer stochastic MCMC simulations are typically transport modellers, and in this section the pitfalls of the deterministic and stochastic methods are explained by illustrating their associated mechanisms. This forms the basis for adopting the m-IPF strategy as the method of choice, and for developing a 3-level hierarchical constraint model to operationalize the process of creating an objective, attribute-rich, representative micro-level population of railway passengers.

5.2.1 Data and settings

The West Yorkshire inland metropolitan county is made up of five districts in the north of England, with a population of 2.2 million, covering about 2,030 km2. A layout of the railways in the West Yorkshire study area is shown in Figure 5.1. The flows analysed in this research consist of those which both originate and terminate within the West Yorkshire study area. A large volume (~40%) of railway (National Rail) flows through West Yorkshire either emanate or terminate outside the county (WYCA, 2016), and this affects the volume of passengers simulated in this research. In a more realistic setting, the entire volume of tickets captured in the LENNON flows for the entire UK would have been more representative of mobility on the rail network. However, even then, at this stage of development of the computational tools, there would still be the limitation on the size of data variables and associated tables (2^31 cells) that can be handled by the analysis software (RStudioTeam, 2016). The inferences deduced from these analyses are valid irrespective of the reduced datasets, as the emphasis is placed more on dealing with the skewed nature of the data than on an ability to predict precise passenger demand.

Figure 5.1 Rail network - West Yorkshire (www.wymetro.com)

5.2.2 Mechanism of methods

The mechanisms of the stochastic MCMC variants based on Metropolis-Hastings (SA and HC), the newer Gibbs-sampler-based synthesis (GS) (Farooq and Bierlaire, 2017), and the deterministic m-IPF can intuitively be explained using the Bayesian framework. Details of the Bayesian framework are developed in Chapter 7; here Bayesian concepts are simply used to compare the deterministic and stochastic spatial microsimulation methods. The distinction between traditional statistical (so-called frequentist) methods and Bayesian methods is simple: when creating models representing a phenomenon, frequentist methods conceive the system parameters as fixed quantities, with confidence intervals defining our certainty in the values, whilst Bayesian methods conceive the system parameters as distributions (density functions), with credible intervals defining the interval of higher confidence in the propensity of the solution (Lunn et al., 2012). As such, in the Bayesian framework each system parameter is represented by a range of values, with those within the credible interval occurring more often. For example, for a classic spatial interaction model37 I_ij = O_i D_j e^(δ d_ij), in the Bayesian framework the distance decay parameter δ (Dennett, 2012), instead of being a fixed point value (say -1.5 in a frequentist approach), is a distribution. Figure 5.2 illustrates such a parameter distribution surface (also called the posterior); in this instance a two-parameter (δ, θ) model yields a 2-D parameter surface. In the frequentist framework the parameter values would be fixed quantities δ2 and θ1 instead of the surface distributions shown in Figure 5.2. The Bayesian notion of parameters as distributions has been found intuitive (Lunn et al., 2012), and the parameter value typically quoted would be the modal (or median) value of the curve on each axis. The credible intervals (CIs) define regions of parameter values which have a higher propensity of representing the phenomenon being modelled.

Figure 5.2 Illustration of Gibbs, Metropolis-Hastings and m-IPF.

37 In the classic expression, I_ij represents the interaction or flow between two entities, typically zones, with an origin zone O_i and a destination zone D_j, while δ is the so-called distance decay parameter, representing the propensity of individuals to resist travelling longer distances d_ij between O_i and D_j.

In the next section the Bayesian concept of model parameter values is used to explain, in an accessible manner, the conceptual differences between the deterministic m-IPF and the range of stochastic (SA, HC, and GS) spatial microsimulation methodologies. Before explaining the deterministic and stochastic methods, the concept of Bayesian analysis is presented to aid subsequent understanding of the Bayesian terminology. By way of a simple example, imagine a process represented by points scattered on a plane, for which it is desired to derive a unique description or characteristic (perhaps the slope, intercept, and a measure of the scatter of points). These unique characteristics reflect the process which generated the points, and the aim would be to deduce an optimal line that best describes the range of points. In the Bayesian framework, the slope and intercept38 are conceived as distributions (typically like those shown in Figure 5.2), reflecting that the process which generated the data has stochastic variation. The modes of these distributions describe parameter values of higher propensity, as shown in Figure 5.2, whereby θ might represent the slope and δ the intercept, with modes at the peaks near θ1 and δ2 respectively. In the Bayesian methodology, a model is created for the data by specifying a likelihood function. For example, if we can assume that the outcome data points are randomly scattered, then the likelihood function could appropriately be a normal distribution; if, in the alternative, the outcome variable were count data, then perhaps a Poisson or negative binomial likelihood function would be appropriate. The model created for the data is called the 'likelihood' (or more specifically the likelihood function). Having created a model for the data, we secondly create models for each of the parameters (in this case the slope and intercept). As the points are conceived to be randomly scattered around a line, a model for the slope could be a normal distribution, and, as the intercept tends to be a fixed constant, we might conceivably model it with a flat or uniform distribution. Other models could be adopted for the parameters based on our intuition and knowledge about the process which generated the data (Lunn et al., 2012). The models for each of the parameters are called 'priors', as they represent our prior belief about the parameters.

38 Note that the Bayesian framework adopts the MCMC algorithm to solve a wide range of optimisation problems. The slope and intercept are parameters of the conceived model requiring an optimal solution.
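The likelihood-times-prior logic can be demonstrated with a toy grid computation in R for a single slope parameter (made-up data; intercept fixed at zero for brevity).

    # Toy posterior for a slope parameter: posterior ~ likelihood x prior (sketch).
    set.seed(7)
    x <- 1:10
    y <- 2.2 * x + rnorm(10)                      # made-up scattered points

    theta     <- seq(0, 4, length.out = 401)      # candidate slope values
    log_lik   <- sapply(theta, function(b) sum(dnorm(y, b * x, 1, log = TRUE)))
    log_prior <- dnorm(theta, 0, 10, log = TRUE)  # weakly informative prior
    post      <- exp(log_lik + log_prior - max(log_lik + log_prior))

    theta[which.max(post)]  # modal (highest-propensity) slope
    plot(theta, post, type = "l", xlab = "slope", ylab = "unnormalised posterior")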

In the Bayesian framework, based on the classic Bayes rule (Bayes et al., 1763), the likelihood and prior models are multiplied to give the qualitative distribution of the posterior. In practical applications, the MCMC procedure is adopted to deduce an optimal distribution of the posterior, yielding the parameter distribution shown in Figure 5.2. The MCMC optimisation procedure consists of first specifying an initial proposal distribution, from which an initial parameter value is sampled. A new posterior density is deduced (by multiplying the prior parameter model by the data likelihood function at the proposed parameter value), and a new sample is taken from this new posterior parameter distribution. The initial proposal and the new posterior parameter values are compared, and a choice of which one to keep is made such that values of higher posterior density have a higher probability of being retained. The criteria for making this choice distinguish the different MCMC39 procedures, but in any event the procedure described above is repeated recursively. As values of higher posterior density have a higher propensity of being chosen, the parameter surface created peaks at the modal parameter value. The intuition behind using the MCMC methods, particularly in the Bayesian framework, is simple: the essence of statistical analysis is to summarize the phenomena represented by a dataset, and to make inference based on these (Fisher, 1922). In complex scenarios, a closed-form expression cannot always be derived for the models that represent the data; under such conditions the MCMC algorithm can be used to estimate optimal parameter values representing optimal solutions for such complex models. Once a model is created by the practitioner or data analyst to represent a particular objective function being optimized, the MCMC algorithm can be used to deduce parameter values for that function. The Markov chain Monte Carlo (MCMC) algorithm is in essence a two-pronged process: exploration and optimisation. The repeated sampling (the Monte Carlo) enables an exploration of the solution space, while the criteria and steps for deciding which posterior values to retain constitute the optimisation process.

39 More technical descriptions of the MCMC procedure are widely available in the literature; here an accessible description is presented for brevity.
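The propose/compare/retain loop itself is only a few lines of R; the sketch below targets a stand-in one-dimensional posterior with a Gaussian random-walk proposal.

    # Toy Metropolis random walk (sketch): higher-density values are kept more often.
    set.seed(1)
    log_post <- function(v) dnorm(v, mean = 2, sd = 1, log = TRUE)  # stand-in posterior

    n_iter <- 5000
    chain  <- numeric(n_iter)
    for (t in 2:n_iter) {
      prop     <- chain[t - 1] + rnorm(1, 0, 0.5)        # random-walk proposal
      accept   <- log(runif(1)) < log_post(prop) - log_post(chain[t - 1])
      chain[t] <- if (accept) prop else chain[t - 1]     # retain by acceptance rule
    }
    hist(chain[-(1:500)], breaks = 50, main = "Sampled posterior (burn-in dropped)")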

5.2.2.1 IPF in the Bayesian framework

We shall now use Figure 5.2 to illustrate and succinctly explain the mechanism of the classic IPF algorithm. In the IPF, the posterior parameter space is projected onto each of the parameter planes in turn, as shown by the brown curves in Figure 5.2. The marginal distribution P(δ) on the δ plane is created by integrating over θ, i.e. P(δ) = ∫ P(θ, δ) dθ. In implementation, the mechanism of the IPF procedure is such that the distribution of the sample seed (x ∈ X) is multiplied by a weight distribution to project it onto this marginal distribution, creating a fit and a new proposal distribution. This new proposal is then weighted and projected onto the second plane, the θ plane, creating a new fit on the marginal P(θ) = ∫ P(θ, δ) dδ. Each projection is achieved by creating the appropriate weight to enable the seed to iteratively fit the P(δ) and P(θ) marginal distributions. The process is repeated until convergence, whereby the weight matrix reaches an equilibrium, or does not change beyond a pre-defined convergence criterion. The particular criticism of the IPF is that it aims to fit the seed data to the marginal distributions of the target population (the brown curves in Figure 5.2), and not to the full joint distribution of the target (represented by the full mesh). As a result, the simulated population lacks the full variability of the target. The m-IPF strategy developed in this chapter aims to resolve this problem by restructuring the seed data to match the cross-tabulation of the target data, thereby fitting to the full joint distribution of the target. The IPF method depends on multiplying a proposal distribution by a weight to project it onto a marginal (or, as in this research, onto a full or conditional) posterior distribution. The proposal distribution is constituted from a sample (seed) of the population. As such, in areas or zones where a particular seed does not exist, no weight can be created, and no simulated population is created for that zone. In essence, for the IPF to yield accurate results, the proposal distribution needs to be reflective of the target population. This aspect of the IPF can in fact be seen as an advantage, as it ensures that no simulation results are created, rather than results that are absurd40.

40 When the seed is non-representative of the target population, the stochastic HC and SA methods still yield a simulated population. Such results distinguish population reconstruction from population synthesis. In reconstruction, the simulated population is created arbitrarily, provided that the marginal values of the target are fulfilled. In synthesis, on the other hand, a zonal population emanates from replicating samples associated with that particular zone, excluding any sample associated with other zones.
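The projection-and-weighting cycle described above can be sketched in a few lines of base R. The seed cross-table and target marginals below are illustrative stand-ins, not project data.

```r
# Minimal two-constraint IPF sketch: fit a seed cross-table to target marginals.
seed  <- matrix(c(10, 5, 8, 12), nrow = 2)   # seed joint distribution (illustrative)
row_t <- c(40, 60)                           # target marginal, e.g. P(delta), as counts
col_t <- c(55, 45)                           # target marginal, e.g. P(theta), as counts

fit <- seed
for (iter in 1:100) {
  fit <- fit * (row_t / rowSums(fit))              # project onto the first marginal
  fit <- sweep(fit, 2, col_t / colSums(fit), `*`)  # project onto the second marginal
  if (max(abs(rowSums(fit) - row_t)) < 1e-8) break # pre-defined convergence criterion
}
fit / seed   # the weight matrix at equilibrium
```

Note that the fitting happens only against the two marginals; nothing in the loop constrains the interior cells to the target's joint distribution, which is precisely the criticism the m-IPF restructuring addresses.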

5.2.2.2 HC/SA in the Bayesian framework

There are two established MCMC samplers, the Metropolis-Hastings and the Gibbs sampler. The Metropolis algorithm is akin to the spatial microsimulation methods of Hill Climbing (HC) (Williamson et al., 1998) and Simulated Annealing (SA) (Kavroudakis, 2015) used for population synthesis. The Metropolis algorithm can be described as a random walk in the parameter space (Kruschke, 2014). When applying the HC and SA for synthesis, a proposal sample of size equal to the population being simulated is taken from the population to form the seed. A parameter of this proposed seed, say the marginal aggregate volume of people aged 65+, could represent a point (along the 휃-axis shown in Figure 5.2) in the parameter space. If this sample point (the aggregate volume) is equal to the marginal volume of people aged 65+ in the actual population of the zone, the point is accepted. Realistically however, the population would have many more parameters than just the aged 65+; the combination of these parameter values creates the parameter space ℝ푁, where 푁 represents the dimension of the parameter space. In practice the Metropolis-Hastings algorithm recursively samples and updates the initial proposal distribution, forming the true posterior parameter space which satisfies the target parameter constraints. The variants of the Metropolis algorithm (Simulated Annealing, Hill Climbing, etc.) differ in the criteria for deciding when to reject parameter values that do not exactly match or optimise the objective function yielding the target.

The Metropolis-based HC and SA methods suffer similar issues to IPF methods in that the proposal distribution has to be reflective of the target distribution. In practice, for the HC and SA methods to be efficient, a strategy additionally needs to be developed for tuning the proposal distribution to that of the target. As widely reported in the literature (Kruschke, 2014, Lunn et al., 2012), if the proposal distribution is too narrow or too broad, a prohibitively large number of steps (chains) may be required to adequately explore the full parameter space, with the attendant possibility of the trajectory becoming stuck at a local optimum (Kruschke, 2014, Williamson et al., 1998). Due to the general and broad nature of the Metropolis-based synthesis methods, any proposal distribution would yield a result, albeit sub-optimal and biased. This may at times create the false impression that the HC and SA are better methods, for yielding a result albeit absurd, when methods like the IPF would, as an inherent precautionary measure, not yield any result. These issues were intimated earlier (see Section 5.1), and bring into question the sole use of 푇퐴퐸 as a yardstick for assessing the quality of spatial microsimulation results. A more robust yardstick would bring into consideration the ability of the population synthesis method to replicate the full joint or conditional distribution of the target population. These issues are highlighted in this chapter in validating the various deterministic and stochastic spatial microsimulation methods.
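The acceptance criteria that distinguish HC from SA can be sketched as follows. This is an illustrative fragment only, with 푇퐴퐸 as the objective function and hypothetical counts; it is not the implementation used in this research.

```r
# Simulated-annealing acceptance rule sketch for population synthesis.
# 'tae' is the total absolute error between simulated and target counts.
tae <- function(pop_counts, target_counts) sum(abs(pop_counts - target_counts))

accept_swap <- function(tae_current, tae_proposed, temperature) {
  delta <- tae_proposed - tae_current
  # Hill Climbing keeps only improvements (delta < 0); SA also keeps some
  # worsening moves, with probability exp(-delta / T), to escape local
  # optima. As T cools towards 0, SA degenerates to Hill Climbing.
  delta < 0 || runif(1) < exp(-delta / temperature)
}

# e.g. after replacing one sampled individual in the proposal population:
accept_swap(tae_current = 120, tae_proposed = 124, temperature = 10)
```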

5.2.2.3 GS in the Bayesian framework

The Gibbs sampler (Geman and Geman, 1987) is designed to be a random walk on the parameter space, like the Metropolis (Hastings). The difference in the Gibbs sampler (GS) is that the parameters are dealt with one at a time (in series) when exploring the parameter space, as opposed to the Metropolis algorithm, where random samples of the seed (reflecting the full joint parameter distribution) are considered jointly, in parallel. In implementing the GS method (referring again to the 2-D parameter space of Figure 5.2), arbitrary values 훿1 and 휃1 are initially chosen for the parameters. The value of 휃1 is first updated by sampling from the posterior distribution conditional on 훿1. This conditional distribution is the parameter curve created by slicing through the joint parameter posterior at the point 훿1 (i.e. 푝(휃|훿1, 푋) shown in Figure 5.2). Randomly sampling the resulting conditional parameter curve yields an update of 휃1, a point on the curve 휃2. Secondly, the value of 훿1 is updated by sampling from the posterior distribution conditional on 휃2 (i.e. 푝(훿|휃2, 푋)), thereby yielding the point 훿2. This process is repeated, each time updating 휃푖 and 훿푖 for 푖 = 1, 2, 3, …, thereby gradually simulating the posterior parameter surface. In practice, as the joint posterior is unknown, the conditional distribution is derived from a model of the covariates, and the parameters of the population are deduced by sampling from the posterior predictive distribution.
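A minimal sketch of these alternating conditional updates is given below (not the thesis implementation). A bivariate normal with known correlation stands in for the joint posterior 푝(휃, 훿|푋), so that the full conditionals are available in closed form; all names are illustrative.

```r
# Minimal Gibbs sampler sketch: a bivariate normal with correlation 'rho'
# stands in for the joint posterior p(theta, delta | X). Each parameter is
# updated in turn from its full conditional, as described above.
gibbs <- function(n_iter = 10000, rho = 0.8) {
  theta <- delta <- numeric(n_iter)
  th <- de <- 0                  # arbitrary initial values theta_1, delta_1
  s  <- sqrt(1 - rho^2)          # conditional standard deviation
  for (i in seq_len(n_iter)) {
    th <- rnorm(1, mean = rho * de, sd = s)   # sample p(theta | delta, X)
    de <- rnorm(1, mean = rho * th, sd = s)   # sample p(delta | theta, X)
    theta[i] <- th; delta[i] <- de
  }
  cbind(theta, delta)
}

set.seed(1)
draws <- gibbs()
cor(draws)   # recovers rho, i.e. the joint posterior surface is simulated
```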

The GS has similarities to the IPF in that the procedure does not suffer the inefficiency of rejected proposals, as is the case with the Metropolis-based algorithms. As such, for well-behaved problems (Roberts and Rosenthal, 2004), the IPF and GS tend to converge to a solution quicker and hence more efficiently. The particular challenge in adopting the GS method is that the analyst must be able to deduce the conditional probability of each parameter given all other parameters, to enable posterior samples to be generated. In scenarios where one of the datasets used is a skewed representation of a wider dataset, for example when the NRTS and LENNON data are used in conjunction with the Census, a missing data model needs to be created to explain the difference between the skewed and holistic datasets. Such a missing data model cannot always be readily created without knowledge of the statistical mechanism through which the data were created and through which the missing values became unavailable in the first instance. This issue is elaborated on further in the next section as a particular impediment to recent GS simulation-based population synthesis methods (Farooq et al., 2013). Proponents of such simulation-based GS methods (Farooq and Bierlaire, 2017, Grapperon et al., 2016) have heralded them as superior to m-IPF methods, even in scenarios where the datasets combined are skewed and non-representative of the wider datasets. The next section is important as it highlights why such an assertion of the superiority of GS methods in skewed data scenarios is conceptually implausible, and presents a formalisation of the claim to the contrary.

5.2.3 GS population synthesis

In the case of consumer data like ticketing data, which are typically skewed relative to the wider population, the skew implies that the data cannot be assumed to stem from a randomised control survey of the wider population. Inclusion of the skewed data with data from a wider population in an MCMC design matrix yields a so-called missing-not-at-random (MNAR) scenario, whereby the disparity between the skewed data and the wider population is systematic. In such a case, a model has to be incorporated to reconcile the skewed distribution with that of the wider population. Otherwise, inferences drawn from an analysis which directly combines the two disparate distributions would not be conceptually consistent, as the two datasets represent different phenomena and attendant data generation processes. To demonstrate this formally, let the sample seed from the wider population be represented as 푋, which in turn can be conceived as partitioned into an observed part 푋푂퐵푆 and a missing part 푋푀퐼푆, whereby 푋 = (푋푂퐵푆, 푋푀퐼푆). Note that the observed part corresponds to the subset typical of skewed consumer data. 푀 is defined as the matrix of missingness41, of the same dimension as 푋, with elements 1 or 0 depending on whether the corresponding element of 푋 was observed (1) or missing (0).

41 ‘Missingness’ was originally a colloquial term that relates to the statistical mechanism that resulted in data values being missing, and refers to the pattern of distribution of the variables with missing data values.

When data are missing, the probability model to describe the data is the full joint probability 푃(푋푂퐵푆, 푋푀퐼푆, 푀|휃, 휑), where 휃 represents the parameters of the phenomena being modelled, and 휑 the parameters of missingness (Rubin, 1978), which describe the statistical missing data mechanism.

Integrating over the missing values 푋푀퐼푆 gives:

∫ 푃(푋푂퐵푆, 푋푀퐼푆, 푀|휃, 휑) 푑푋푀퐼푆 (6)

= ∫ 푃(푀|푋푂퐵푆, 푋푀퐼푆, 휑) 푃(푋푂퐵푆, 푋푀퐼푆|휃)푑푋푀퐼푆

Observe the first term on the right-hand side (after the equals sign and under the integral): it is the distribution of 푀, that is 푃(푀|푋, 휑). Assuming that the data are statistically missing at random (MAR) (Rubin, 1978), this first expression under the integral in Eqn. (6) simplifies to 푃(푀|푋, 휑) = 푃(푀|(푋푂퐵푆, 푋푀퐼푆), 휑) = 푃(푀|푋푂퐵푆, 휑), as by definition the ‘missingness’ then depends only on the observed data (Little and Rubin, 2002, Schafer, 1997). Suppose, in addition to the data being statistically missing at random (MAR), that 휑 and 휃 are distinct, such that the mechanism that yields the missing data is separate from the mechanism being inferred. A practical example would be a scenario whereby the season ticket journey factors (rates of ticket use) are not known simply because there are not enough sensors on the rail network to capture and register each passenger's repeat journeys. These missing journey factors are distinct from the drivers of the mobility phenomena being inferred in the analysis of ticketing data. As such, the missing data mechanism for season ticket journey factors, 휑, is distinct from the mobility mechanism being inferred, 휃. With 휑 distinct from, and as such independent of, the LENNON data parameter 휃, the full joint distribution reduces to:

푃(푋푂퐵푆, 푀|휃, 휑) = 푃(푀|푋푂퐵푆, 휑) ∫ 푃(푋푂퐵푆, 푋푀퐼푆|휃) 푑푋푀퐼푆 (7)

푃(푋푂퐵푆, 푀|휃, 휑) = 푃(푀|푋푂퐵푆, 휑) 푃(푋푂퐵푆|휃) (8)

∴ 푃(푋푂퐵푆|휃) ∝ ℒ(휃|푋푂퐵푆) (9)

The above holds because the ‘missingness’ depends only on the observed data, and 푃(푀|푋푂퐵푆, 휑) can be seen as a normalising constant.

Based on the above, the likelihood of 휃, the parameter of mobility, given only the observed data is equivalent to the full joint probability, and under such circumstances inference based on 휃 would be valid. This deduction assumes that the data are MAR, and that the missing data mechanism is independent of the mobility mechanism being inferred. However, as the consumer ticketing data are skewed (biased), they are not MAR, and valid inference regarding the wider synthetic population (i.e. the full joint distribution) cannot therefore be deduced from a combination of the skewed seed consumer data with data from the wider population within the Gibbs MCMC model framework. Skewed consumer data are an MNAR subset, and require an imputation model to be built prior to, or integral with, any Gibbs-based MCMC simulation-based population synthesis strategy. The imputation model would account for the difference between the skewed consumer data attribute values and those of the wider population.

The recently proposed MCMC simulation-based population synthesis (Farooq et al., 2013) is based on using partial views of the available joint distributions of agents' attributes to create a mobility model. This model forms the basis for simulating draws from the target distribution. However, where the agent population is skewed, the resulting covariates (pertaining to only these agents) are skewed and not a random subset of covariates from the wider population. As such there is no conceptually consistent basis for combining agent attributes with marginal or conditional attributes from the wider population. As a result, claims of the superiority of simulation-based spatial microsimulation over IPF strategies as reported in the literature (Farooq et al., 2013, Grapperon et al., 2016) are unfounded in their current form. What is required is to incorporate an imputation model within the more traditional model used to derive the conditional probability of each parameter given the others. This would then form a basis for generating samples from the conditional distributions. Such an implementation can be fulfilled within the Bayesian framework (Lunn et al., 2012), which enables the simultaneous incorporation of an imputation model within an analysis model. To the best of our knowledge, such a simulation-based population synthesis incorporating a missing data model has not been reported in the literature, and for expedience it is beyond the scope of this research. Based on this conceptual shortfall, the MCMC simulation-based population synthesis is not investigated further, but is discussed at the end of the thesis as scheduled future work.

5.2.4 Procedures for implementation

The first hint in the literature about the potential of microsimulation for use with interaction data dates back to 1968 (Ireland and Kullback, 1968), but to date no application has been reported in the literature which includes the origin-destination (O-D) interaction as an explicit variable, constrained simultaneously along with the other variables of mobility in the procedure. To ensure that the identity (ID) of each passenger is carried along, the ID information is included in the seed sample; during the spatial microsimulation this information is not subject to constraint, thereby producing a spatially micro-simulated42 population with the distribution of the IDs included. It is noteworthy that attaching each passenger's particular origin and destination to the passenger's attributes, and the categories therein, effectively incorporates the mobility interaction in the spatial microsimulation process. To date this is the first implementation of such a hierarchical constraint model, and this marks this work out as a pioneering spatial micro-mobility implementation. The methodology developed ensures that the full distribution within the target cross-tables is exploited (as opposed to only the marginals). Each individual (for example of age Ag25-34) is associated with a unique origin (out of the 푁 origins) and a unique destination (out of the 푀 destinations); additional variability is inherently introduced to the Ag25-34 category by appending an origin and destination pair. In essence the single attribute category Ag25-34 now has 푁 × 푀 potential variations to it, as do all the other attributes and their categories. The attachment of origin and destination pairs to each variable category value in the seed yields a 3-level simultaneous hierarchical constraint; as a result, the spatial microsimulation optimises a set of 3 attributes simultaneously. This is illustrated in Table 5.1, showing the re-modelling of the individual-level seed and the match to the target array. In effect, the objective function is equivalent to the cross-tabulated target distribution, instead of the marginal as in standard IPF. This yields a synthetic population which better follows the target distribution.

42 We make a distinction between spatial microsimulation and population reconstruction, whereby the former includes an identification of each individual in the micro-data and the latter creates a constituent volume of the population with no individual identities. For the purposes of mobility on the railways, it is essential that the more demanding spatial microsimulation is implemented, and this chapter highlights the particular methodology for achieving it.
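The re-structuring of the seed that creates this hierarchy can be sketched as follows. The data frame, column names and codes below are illustrative, not the project's actual schema; the bracketed category convention follows the example given later in this chapter.

```r
# Sketch of re-structuring the seed so that each attribute category is
# cross-classified with the individual's origin-destination pair.
# Column names and codes are illustrative, not the project's actual schema.
seed <- data.frame(
  id     = c("P1", "P2"),
  origin = c("LS2 5", "BD3 4"),
  dest   = c("BD3 4", "LS2 5"),
  age    = c("Ag16-24", "Ag25-34")
)

# [origin][category][destination] becomes the new constrained category,
# e.g. "Ag16-24" -> "[LS2 5][Ag16-24][BD3 4]"
seed$age_od <- paste0("[", seed$origin, "][", seed$age, "][", seed$dest, "]")
seed
```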

Table 5.1 Illustration of spatial microsimulation with the incorporation of multi-level simultaneous seed hierarchies.


A number of practical issues arise as a result of introducing multi-level hierarchies, and these are summarised below so that necessary precautions can be taken in practical applications.

• Numerical approximation errors are introduced during the process of re-zoning (re-scaling) from one geographic boundary definition to another (say from Postcode to MSOA zones). A practical remedy is to convert the marginal values to percentages, forcing each marginal to sum to one. Subsequent multiplication by the zonal populations then yields the individual weights (see the sketch after this list).

• An increase in granularity, by creating a 3-level hierarchy in the seed, requires a commensurate increase in the volume of the seed (creating a compelling case for big data). In essence, with a more complex finer-grain seed, there is increased uncertainty in the ability to match the full variability of the population, thereby requiring a larger seed volume to sustain the accuracies. An 푁 by 푀 hierarchy effectively reduces the seed by a factor of 푁 × 푀, commensurately affecting the accuracy and precision of the results due to the effectively reduced sample ratios.

• The multi-level hierarchy increases the propensity of a mismatch in the number of variable categories between the seed and target data. In the West Yorkshire case, there were ~2192 Census variable categories and ~1351 seed categories. In such a scenario the m-IPF synthesises lower population volumes, and the stochastic procedures instead create full volumes, albeit sub-optimal.

• Counts (of mobility in this case) inherently create sparse contingency matrices. A further hierarchy of constraints exacerbates this sparsity. A practical remedy for zero-inflated contingency matrices in spatial microsimulation is to add a small value, typically of the order of the error tolerance of the problem (Lomax and Norman, 2016).

In implementing the precautions listed above, each particular scenario is dealt with separately. For example, adding small values to sparse contingency matrices may introduce mass to structural zeros, equivalent to adding individuals to the mobility, effectively altering the dynamics. Matching the categories of the seed and target identifies the structural zeros. Adding small amounts where sampling zeros occur introduces cross-relations between the attributes of individuals; this is managed using a sensitivity analysis to monitor such impacts on the mobility dynamics and keep them to a minimum. Such impacts on the dynamics should be kept to the order of magnitude of the convergence tolerance.
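Two of the precautions above are sketched below with illustrative values: normalising marginals to proportions before re-scaling, and adding a nominal value to sampling zeros only, never to structural zeros.

```r
# Sketch of two of the precautions above; all values are illustrative.

# 1. Re-zoning: convert marginals to proportions, then multiply by the
#    zonal population, so each marginal is forced to sum to the zone total.
marginal <- c(30, 50, 20)
weights  <- (marginal / sum(marginal)) * 1200   # zonal population of 1200

# 2. Sparse contingency tables: add a nominal value (of the order of the
#    convergence tolerance) to sampling zeros only; structural zeros,
#    i.e. known-impossible cells, must keep zero mass.
target     <- matrix(c(5, 0, 0, 7), nrow = 2)
structural <- matrix(c(FALSE, FALSE, TRUE, FALSE), nrow = 2)
tol <- 1e-3
target[target == 0 & !structural] <- tol
target
```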

5.3 Results

In developing the methodology (Section 5.2.3), a formalisation of the statistical theory was presented showing that the Gibbs MCMC (GS) simulation-based population synthesis (Farooq et al., 2013) requires an imputation model to enable application to situations where the seed is skewed relative to the target. As a result the GS methodology was not developed further in this thesis. In Chapter 4, the Hill Climbing (HC) and Simulated Annealing (SA) spatial microsimulation strategies were shown to yield better (lower) 푇퐴퐸 values when compared with the standard IPF strategy, which relates seed data to the marginal distribution of the target data. In this chapter, the IPF methodology is developed to exploit the full joint distribution of the target (instead of just the marginal), and applied to simulate a population of railway passengers in the West Yorkshire study area. The datasets combined are the 2011 Census interaction data, the NRTS survey, and the LENNON ticketing data.

Results are presented for the first spatial microsimulation, combining the Census information with the NRTS, revealing the results produced using the improved m-IPF strategy and the corresponding results from the stochastic SA method. Results are then presented for the second spatial microsimulation, which combines the simulated population from the first stage (i.e. the population resulting from combining the NRTS with the Census) with the LENNON ticketing data. This second spatial microsimulation creates an attribute-rich micro-level population of railway passengers embedded in the wider population. As opposed to quoting 푇퐴퐸 values, the density of each simulated population is presented, enabling comparison with the density of the seed and that of the target population. The appropriate choice of density plot for comparing the distributions from one synthetic population to another was the normalised density plot with a total unit (1.0) area. The horizontal axis was chosen to be indicative of the number of variables involved in the spatial microsimulation, with the curvature of the density serving to holistically distinguish the influence of the individual variables within the synthetic population. The density plot typical of that in Figure 5.3 was chosen, with a detailed description of the facets of the plot included in the results section. Following the density results, as a form of validation, query results of the synthetic micro-level cross-tabulated data are presented, postulating that the intuitive nature of the results is indicative of the success of the spatial microsimulation process. These results are further discussed in the context of the wider project.

5.3.1 Stage 1 - Linking the Census and NRTS

In the re-developed m-IPF, the cross-tabulations in the target data are exploited by re-structuring the seed, associating an origin and destination with each variable category. The results presented in Figure 5.3 are density functions of the simulated population proportions within each variable category. As there are 8 variables (travel purpose, household type, income, etc.), the span of the horizontal axis is 0.125 (i.e. 1/8), reflective of the granularity in the scale or level of detail resulting from the number of data variables. Note however that the total area under the density plot always sums to unity, to enable comparison of results from different spatial microsimulation exercises. If, for instance, there were 20 variables used in the spatial microsimulation, then the span of the horizontal axis would be 0.05 (i.e. 1/20). In the results presented in Figure 5.3, there are two categories (NTC and TC) in the travel purpose variable. In the Census, the majority of the population consists of non-train commuters (NTC), with a smaller proportion of train commuters (TC). The high disparity in the simulated population proportions of NTC and TC means that the ‘travel purpose’ variable contributes a substantial amount at the extreme (near 0.125) of the horizontal axis of the density plot. As seen in Figure 5.3, the density plot of the Census has a higher value toward the 0.125 end of the horizontal axis, reflective of the large disparity in NTC and TC proportions. The population created by the deterministic m-IPF is seen to have a similar density to the Census. This reflects the fact that the deterministic m-IPF yields a simulated population made up of a larger proportion of NTC passengers. Since the Census formed the target in this simulation, the inference is that the deterministic m-IPF procedure, with a hierarchy of cross-tabulated seed data, yields a simulated population which matches the distribution of the target. The stochastic SA method on the other hand, as seen in Figure 5.3, does not yield the distribution of the target; instead it better replicates the seed (the NRTS) used in the spatial microsimulation. The 8 variables used for the spatial microsimulation have on average about 4 variable categories each, placing the modal density near the ~0.03 proportion (i.e. 1/(8×4)). The majority of the variables (other than ‘travel purpose’) do not present particular disparities in the population proportions simulated for their categories; as such the mode of the density plot is located near ~0.03. The software script used to generate the density plots is described in Appendix B.1, and included in the CD accompanying the thesis.
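As an indication of how such a normalised category-proportion density might be constructed (a sketch only; the actual script is given in Appendix B.1), the proportions could be computed as follows, with function and variable names that are illustrative:

```r
# Sketch of the category-proportion quantity underlying the density plot.
# 'pop' is any simulated population data frame; names are illustrative.
category_proportions <- function(pop, vars) {
  unlist(lapply(vars, function(v) {
    p <- table(pop[[v]]) / nrow(pop)   # proportion in each category of variable v
    p / length(vars)                   # so all variables together sum to one
  }))
}

# With 8 variables the proportions span at most 1/8 = 0.125 of the axis;
# comparing two populations is then a comparison of two density curves, e.g.:
# plot(density(category_proportions(census_pop, vars)))
# lines(density(category_proportions(mipf_pop, vars)), col = "blue")
```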

The results presented in Figure 5.3 illustrate the simulated population distributions when the Census and NRTS data are combined. In this first spatial microsimulation, the NRTS (DfT, 2013a) is the seed, which forms a skewed subset (not a random representation) of the target 2011 Census (Stillwell and Duke-Williams, 2003, UK-Data-Service, 2011, UKDS, 2016) flow data. The variables included in the NRTS seed are ‘Individual ID’, as well as those indicated in Figure 3.1. Table 5.2 highlights these variables, and the table on the left (within Table 5.2) identifies the variable categories adopted in the first spatial microsimulation procedure.

Table 5.2 The variables in the NRTS seed and Census target, as well as the colour-coded variable categories created.


The target Census flow data are already presented as 3퐷 aggregate arrays, made up of tables of location of usual residence and place of work by a range of socio-demographic attributes (including age, gender, income, ethnicity, commute mode, household type, cars and children). To facilitate reconciling the NRTS seed with the full conditional distribution of the Census target, the NRTS seed data are restructured. The restructuring creates a cross-classification of each individual variable category such that, for example, an individual in the NRTS seed in the age bracket Ag16-24 who travelled from a residence in Postcode Sector LS2 5 to a work destination at BD3 4 would be reclassified as [LS2 5][Ag16-24][BD3 4]. This new reclassification forms a seed category, as opposed to the previous category, which was just [Ag16-24]. In essence, the socio-demographic attribute categories within the NRTS seed are associated with an origin and a destination, forming a more granular seed dataset. By so doing, ~1351 seed variable categories and ~2192 Census variable categories were established for the spatial microsimulation.

The role of the Census interaction data is that of a set of 3퐷 target tables and, as described earlier in Chapter 4, the m-IPF methodology makes provision for the inclusion of a combination of target arrays of different dimensions (1퐷, 2퐷, 3퐷, …, 푛퐷). After re-structuring the seed, the process involves identifying the matches between the seed and target variable categories. This gives an indication of the extent of sampling zeros (as opposed to structural zeros). A decision is then made, via a sensitivity analysis, on the nominal amount required to be added to the seed table to alleviate the effect of structural zeros. The sampling zeros are accommodated, since the m-IPF would simply yield a simulated population of non-complete volume; the SA method would yield the requisite volume, albeit in many instances sub-optimal. The final review of results aims to ascertain that adding values to structural zeros has only a nominal effect, by confirming that the simulated population consists of individuals whose combinations of attributes form a subset of the individual attributes within the NRTS seed.

The results show that the deterministic iterative proportional fitting (IPF) spatial microsimulation strategy produced a synthetic population with a similar distribution (and vertical median line) to the target Census. The stochastic SA strategy on the other hand created a simulated population distribution (and median line) similar to the NRTS sample seed, indicating that the proposal distribution from a skewed seed does not evolve to the target

Census distribution in the stochastic SA43 spatial microsimulation. Under conditions where the seed sample is skewed or biased relative to the target, Figure 5.3 is a crucial result: it indicates that the deterministic m-IPF methodologies, with multi-level simultaneous constraints applied, are better at replicating the distribution of a target population than the stochastic MCMC based alternative. The reason and mechanism for this is that the deterministic method specifies a proposal population from the seed, but with the distribution of the target data. This proposal distribution is iteratively improved subject to the objective function, until convergence to the target distribution. The ability in this instance to exploit the full variability in the target data, as well as the ability inherent in m-IPF to select fractional individuals, ensures that non-absorbing regular iterative transitions are created (Bajram Spaseski and Ler, 2013) at each iteration step, ensuring convergence to the stationary target distribution. With stochastic methods, statistically, repeated sampling of the seed will yield a distribution similar to the seed; otherwise, subject to the constraints on the sampling process and with no hill climbing element, repeated sampling will yield a distribution in which the constrained interacting variables follow the constraint. It is envisaged that the outcome distribution yielded (i.e. either similar to the seed or to the constraint) would be conditional upon the skew between the seed and the constrained variables. If the seed has dominance in this interaction, the simulation process will converge to the distribution of the seed and not to that of the intended target distribution, as seen in the SA method applied to the NRTS and Census. Further, the exploratory stage of the SA, which involves sampling whole numbers of individuals44, precludes fine-scale optimisation of the proposal distribution, contributing to the so-called ‘sub-optimal trap’ (Williamson et al., 1998).

43 It is noteworthy to emphasise that the SA approach implemented in the spatial microsimulation of the Census and the NRTS data has exactly the same set of cross-classified seed and benchmark constraints as the m-IPF strategy. The re-structuring of the seed ensures that the full conditional distribution of the constraint is the target.

44 In the IPF procedure fractions of individuals are achievable. Recall that the IPF maps a seed population onto the target population such that, if the seed has say 4 individuals of a specific required attribute, and the target population is 5, then each seed individual has a weight of 1.25. This indicates that a whole individual and a quarter of an individual are simulated in the IPF. In the SA methodology, on the other hand, only whole-number multiples of individuals are created.

The stochastic sampling procedures are akin to the Metropolis-Hastings algorithm, which is premised on the proposal distribution being a good approximation of the joint posterior target (Lunn et al., 2012); hence the present limitation of the SA procedure. The results in Figure 5.3 indicate that although the stochastic methods may yield commensurately lower 푇퐴퐸 values, they do not yield the crucial distribution of the target data, especially where the seed is skewed relative to the target. This highlights areas of potential improvement of existing stochastic microsimulation strategies. In the stochastic algorithm, despite the restructuring of the seed, there is no inherent feature of the methodology that requires that the simulated distribution and the target distribution are similar. Instead, all that is required in the SA method is that the absolute magnitude of the difference in volumes of the simulated and target quantities be kept to a minimum. In the developed m-IPF method, on the other hand, whilst the objective function45 (Bešinović et al., 2013, Everitt, 2012, Rao and Rao, 2009, Törn and Zilinskas, 1989) is also the 푇퐴퐸, the re-structuring of the seed inherently also ensures that the fit to the target distribution is the objective function.

Figure 5.3 Distribution of Census, NRTS, m-IPF, SA populations.

45 The objective function is the function that is desired or required to be minimised or maximised. In traditional spatial microsimulation, it is desired to minimise the total absolute error (푇퐴퐸).

5.3.2 Stage 2 - Linking-in the LENNON ticketing data

Once the Census interaction data and NRTS are combined to create a synthetic population, the challenge then lies in linking-in the LENNON ticketing data. As presented in Section 3.1, the LENNON ticketing data can be assumed to be a time series cross-section (TS-CS) dataset with each time period made up of 4 weeks. This assumption is made because the ticketing data available to the project had been aggregated into 4-week periods for confidentiality and anonymization. The end of each period coincides with rail industry setting rounds, such that substantial changes are not made to the rail service provisions in between the setting rounds. On this basis the rail mobility dynamics can be assumed to be in equilibrium, with only nominal changes during each period. As such, ticket sales in a period which belong to the same cross-section are 푖푖푑 (independent and identically distributed). For analysis, a cross-section of LENNON ticketing data for a 4-week period is extracted, forming the target dataset. The simulated population from the NRTS-Census spatial microsimulation is then used as the seed ‘sample’ (albeit a comprehensive one). The previous section (Stage 1 spatial microsimulation) demonstrated that if the full distribution of the target is used (instead of the marginal), the m-IPF outperforms the SA by producing a synthetic population better matching the distribution of the target. As such, the m-IPF deterministic strategy has been adopted as the strategy of choice for combining datasets when the seed is skewed and does not have the same distribution as the target data. When the seed distribution (in this case the simulated NRTS-Census micro-population) is a skewed representation of the target population (in this case the LENNON ticketing data, representative of only the railways population), meaningful features of the joint probability distribution are better predicted using the ‘median’ of the distribution, instead of the traditionally used ‘mean’, as a best guess of the average of the distribution46 of the simulated population (Pearl et al., 2016). On these bases, the median line as well as the curvature of the density plots is used to assess the similarity in the joint distributions of the simulated and target populations.

46 The choice of the median minimises the total absolute error, whilst the mean minimises the total square error. The literature provides, however, that the median is a statistically better guess of estimates of the variable of the distribution in complex scenarios like mobility interaction.
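The point of the footnote can be illustrated with a few lines of R on a stand-in skewed sample; a log-normal distribution is used here purely for illustration.

```r
# Sketch: for a skewed distribution the median is a more robust "best guess"
# of a typical value than the mean (the median minimises total absolute
# error, the mean minimises total squared error).
set.seed(1)
flows <- rlnorm(10000, meanlog = 0, sdlog = 1)   # stand-in skewed data
mean(flows)     # pulled upward by the long right tail (~1.65)
median(flows)   # close to the bulk of the distribution (~1.0)
```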

The simulated population, with the full set of attributes from the Census-NRTS data, is combined with the LENNON ticketing data based on the shared attributes between these datasets. The shared attributes, previously shown in Figure 3.1, include station entry and exit, class of ticket, as well as variables relating to ticket, journey and passenger type. These variables are summarised in Table 5.3. As the simulated population from the Census-NRTS already consisted of a cross-tabulation of the requisite variables, only the LENNON data needed to be re-structured. The re-structuring serves two purposes: first, it enables verification that the variable categories in the Census-NRTS data match those in the LENNON data. Secondly, it enables the origin-destination of each ticket recorded within LENNON to be attached to the other endogenous ticket attributes, thereby creating 3퐷 arrays and ensuring that the joint distributions of the seed and target are reconciled (as opposed to just the marginals).

Table 5.3 The variables in the LENNON ticketing data and those inherited from the NRTS during the Census-NRTS simulation.


As seen from Table 5.3, the seed (Census-NRTS) in this second simulation consists of attributes inherited from the NRTS which are relatable to the LENNON attributes. In order to limit the clustering of the simulated population created, only 3퐷 arrays in the seed are exploited at a time (i.e. the origin-destination and an endogenous railway/ticket attribute). These are matched with the corresponding re-structured LENNON data, constituted from the origin-destination and a ticket attribute. The R-script used to fulfil the second m-IPF is described in Appendix B.2, and included in the CD accompanying the thesis. Figure 5.4 shows the distribution of the simulated NRTS-Census-LENNON micro-mobility population, when the simulated (NRTS-Census combined with LENNON) population is sampled with probability distribution equal to the weights derived from the spatial microsimulation. The distribution of the Census population has also been included as the second plot in Figure 5.4, to enable comparison with the simulated population. As seen in the density plot of Figure 5.4, the median line of the Census is distinct from that of the population derived using the m-IPF simulation weights. This is because the target distribution in this second simulation is the LENNON ticketing data, which is distinct from the Census (even when a comparable set of variables is investigated).

Figure 5.4 Distribution of Census, m-IPF (weighted), m-IPF (uniform).

The results produced validate the spatial microsimulation methodology, as the simulated population replicates the target distribution provided that the full distribution of the target is used as the constraint (and not just the marginal). In this instance, the stage 2 spatial microsimulation replicates the railways population according to the weights deduced using the m-IPF spatial microsimulation strategy. An objective, attribute-rich, representative micro-level population of rail passengers is created. The micro-simulated population provides a cross-tabulation of the variables that facilitates relating population mobility to demographic characteristics, relative location, time of the day, and other likely drivers of behaviour and interaction.

The plot in Figure 5.5 supports the results produced from the stage 2 spatial microsimulation, which combined the NRTS-Census with the LENNON data. The results (see Figure 5.5) reveal the distribution of the simulated individual-level NRTS-Census-LENNON population when sampled with probability equal to the spatial microsimulation weights. The peaks and troughs of each plot in Figure 5.5 measure the proportion of individuals that fall within each variable category (marked by the horizontal black stripes). Observation of the plots reveals that the red (NRTS) and blue (NRTS-Census-LENNON) lines have similar troughs and peaks for each variable category displayed. This is especially so for those variables known in the literature to be important attributes of rail passengers, for example travel purpose, which in this instance is categorised into non-train commuters (NTC) and train commuters (TC). This is emphasised in the inset of Figure 5.5. To understand the plot in Figure 5.5, note that all the variables involved in the spatial microsimulation contribute a total unit value on the vertical axis. As such, if there are 6 variables in the spatial microsimulation, then each variable will have a total value of about 1/6 (approximately 0.17). Then, if a variable has say 4 categories, this value will be spread across the 4. Assuming that each of these variable categories contributes equally to the simulated population, the value for each category on the vertical axis in Figure 5.5 would be about 0.04 (i.e. (1 ÷ 6) ÷ 4). If, as seen in Figure 5.5, a particular variable, say travel purpose, is made up of two categories TC and NTC (i.e. train commute and non-train commute), and one of the categories yields a higher proportion of the simulated population, then the prevailing category will be at about 0.13 while the auxiliary category will be much lower, at say 0.04. This is the case for the population simulated from the Census-NRTS-LENNON data, and makes intuitive sense, as train commuters would be the predominant simulated population.

Figure 5.5 Variable categories for Census, NRTS, m-IPF.

The NRTS data (red plot) show that whilst passengers who do not commute by train (NTC) form a 0.05 proportion (i.e. ~0.05 × 6 variables × 100% = 30%), the passengers who commute by train (TC) form 0.11 (~0.11 × 6 variables × 100% = 66%). Similarly, the simulated population from the NRTS-Census-LENNON also shows that NTC forms 0.04 (24%) whilst TC is ~0.13 (76%). Both datasets indicate that the proportion of passengers commuting by train (TC) is at least twice that of the non-commuters. This is intuitively anticipated of a survey constrained to railway passengers (the NRTS) and of a representative simulated population of railway passengers (from combining NRTS-Census-LENNON), and is indicative that the individual-level population derived by IPF is representative of railway passengers. The much higher value of NTC than TC in the grey (Census) plot in Figure 5.5 is quite the opposite of the red and blue plots, and reflective of the distribution in the wider Census population, where the majority do not commute by train (NTC). Typically only ~8% (Stillwell, 2006) of the wider UK population commute by train (TC).

5.3.3 Cross-tabulations of variables

As a validation of the population synthesis data, and to show the level of detail achievable from the m-IPF hierarchical constraint methodology, a snippet of micro-level cross-tabulation results is shown in Table 5.4. The result shown is for passenger ID-50074660. This passenger is described as NTC, indicating no commute to work by train (though not precluding occasional use of the train for other purposes). The time of arrival at the station is 35 minutes before the first departing train at 15:00. The passenger originally came from Postcode Sector WK29 and accessed the train service at Wakefield Westgate station (WKF). The passenger's final destination, in this instance the residence, is in Postcode Sector LS123.

Table 5.4 Query of m-IPF cross-tabulated passenger attributes.

Further visually mapped query results are shown in Figure 5.6 and Figure 5.7. The results are intuitive, and this provides some external validation of the micro-simulated population. Figure 5.6 shows the results of a query of the population who reside in Postcode Sectors (PS's) in Bradford, who commute to work by rail, have a Railcard, regularly buy a return ticket, travel within 15 miles of their typical residence, earn between £17.5k and £35k (at 2011 rates) per annum, and live in a household with no car and no children. The results indicate higher volumes of passengers who are more likely to use the train based on their proximity to a rail station (shown by the green dots, with the connecting curves as rail lines). Figure 5.7 shows passengers who reside in Leeds (LS), who do not commute to work by rail (NTC), and who live in a household with three or more cars (Car-3+). It is seen that PS's in the vicinity of railway stations tend to have a higher number of passengers, as they have easier access to the rail network. The high volume of passengers simulated for the LS179 Sector (near the middle of the map in Figure 5.7) is perhaps reflective of an affluent neighbourhood, occupied by households with 3 or more cars (Car-3+) who do not commute by train (NTC).


Figure 5.6 Query results for Bradford (BD)47.

Figure 5.7 Query results for Leeds (LS).

47 The white zones in Figure 5.6 and Figure 5.7 indicate the zones where less than 5% of the population satisfy the query.

5.4 Discussions and conclusions

Spatial microsimulation enabled the combination of the various attributes in the disparate datasets, and produced granular cross-tabulations of the combined variables in the datasets. Such granularity and cross-variability in individual passenger attributes can only be achieved through spatial microsimulation. Spatial microsimulation also enables the independent socio-demographic and within-network variables defined in the Census, NRTS and LENNON databases to be dependently cross-tabulated, showing the previously unavailable variability of passenger attributes at Postcode Sector level.

The combined results from Chapter 4 and this Chapter 5 show that the stochastic methods (SA) are less sensitive to changes in seed sample ratios and skew, requiring a marked change in sample ratio to affect the stochastic 푇퐴퐸 values. An increase in the number of constraints yields lower 푇퐴퐸 values for both the deterministic and stochastic strategies. The standard deterministic IPF shows an improvement in the accuracy of the 푇퐴퐸 results as the sample ratio is increased and as the skew in the seed data is reduced. The standard IPF however produces poorer 푇퐴퐸 values than the stochastic SA.

In this chapter, the standard IPF was developed by incorporating multi-level simultaneous hierarchies in the seed, thereby enabling the exploitation of the full joint distributions in the target. When this new methodology was applied to the test case of mobility on the West Yorkshire railways, the deterministic m-IPF method yielded a simulated population with a distribution similar to that of the target population, whilst the stochastic method produced a simulated population distribution similar to that of the seed data. This warrants the preference for the improved m-IPF method in replicating a target distribution, and its better suitability for spatial microsimulation when the seed data is a skewed representation of the target population. This chapter has applied the deterministic m-IPF procedure to combine data from the NRTS, Census and LENNON to create an attribute-rich, representative, individual-level population of railway passengers. The spatial microsimulation case study in this chapter has been presented in a practice-oriented, accessible way, to highlight the mechanisms of the different spatial microsimulation strategies and to enable the results to be reproduced. The potential deficiencies in the classic IPF when applied to skewed consumer big data have as such been addressed by developing a structured multi-level iterative proportional fitting (m-IPF) algorithm.

The stochastic methods are variants of the Markov chain Monte Carlo (MCMC) algorithm. The HC and SA spatial microsimulation strategies are variants of the Metropolis-Hastings MCMC algorithm. The implementation of the HC and SA strategies is based on the 푇퐴퐸 being the objective function. An optimisation therein guarantees a low 푇퐴퐸, but does not guarantee that the simulated population created will have the distribution of the target. To yield the target distribution using the SA and HC methods, a measure of the difference in the distributions of the seed and target is required for inclusion in the objective function. To the best of our knowledge, such MCMC chains are currently not available in the literature.

The simulation-based population synthesis methods are variants of the Gibbs MCMC algorithm. In this chapter, our statistical formalisation shows that Gibbs sampler based population synthesis (Farooq and Bierlaire, 2017) is conceptually inconsistent in its current state of development. When the seed data are skewed relative to the target, the missing aspect of the seed cannot be described as missing at random relative to the target. Under such conditions, inference about the target cannot be based on just the seed data (Acuna and Rodriguez, 2004, Allison, 2001, Little and Rubin, 2014, Rubin, 1976, Schafer and Graham, 2002), which would be described as statistically missing-not-at-random (MNAR). A missing data model would as such need to be incorporated in the spatial microsimulation process. To the best of our knowledge, such Gibbs MCMC based imputation within population synthesis or spatial microsimulation is currently not available in the literature. These methods have promise in exploiting the strengths of MCMC methodologies, but their application at the current stage of development requires particular expertise on the part of the data analyst.

The internal and external validation of the spatial microsimulation results derived in this chapter is discussed below, with reference to the typical challenges, particularly in external validation. Then, particular precautions in the practical implementation of the spatial microsimulation strategies developed in this chapter are presented. These precautions are pertinent in ensuring robust solutions in a wide range of application scenarios. The chapter ends by drawing conclusions about the choice of spatial microsimulation strategy to adopt when reconciling skewed consumer data with other datasets from randomised control experiments or from comprehensive measured surveys of a wider population.


5.4.1 Validation of results

MOIRA48 is the rail industry's principal reference planning tool, including a comprehensive representation of travel on the railways; it is used for planning by estimating flows and linking the service supply side of the rail facility to passenger demand. MOIRA is used to create the origin-destination matrix (ODM), a contingency table of flows between every train station in the UK, and it is considered a reference dataset. MOIRA data are typically released as estimates of annual flows on the rail network, thereby allowing a comparison with aggregated microsimulation results. The West Yorkshire ticketing data used for analysis consist of only those flows emanating from and terminating in the West Yorkshire County. In addition, tickets sold by the regional passenger transport executives (PTEs) have not been included. Flows associated with PTE tickets and the restriction to West Yorkshire (WY) flows are estimated to account for about 55% of all flows. For validation, the flow volumes estimated from the developed m-IPF spatial microsimulation are compared to MOIRA estimates. As the m-IPF simulated population is restricted to only those flows emanating from and terminating in West Yorkshire, for comparison the MOIRA flows are processed to exclude all non-WY flows. As a result, the difference in MOIRA and m-IPF flows would be expected to be of the order of magnitude of the regional PTE ticket flows. As stated above, MOIRA flows are aggregated annually, but in addition MOIRA does not distinguish flows to specific stations within Group Stations49; the m-IPF on the other hand disaggregates flows to group stations. As such, for the comparison of MOIRA and m-IPF, it has been found simpler to exclude those flows associated with group stations. Excluding flows with entry and exit stations outside WY, and excluding group station flows, MOIRA yielded 47m flows annually while the m-IPF yielded 18m flows. The exclusions applied before comparison are sketched after the footnotes below.

48 MOIRA is an acronym for ‘model of inter-regional activity’. It is the railway industry's reference demand forecasting software, used to assess the impacts of service supply (timetable) changes on passenger demand, enabling strategic planning.

49 British Rail (BR) tickets are normally issued to and from individual stations. Where multiple stations with different access routes exist in a locality, it has been found expedient to issue tickets which enable passengers to choose any of the stations in the locality to access the train service. These tickets bear the name of the locality, without mention of any specific station in the locality. Such local stations are classed as group stations.
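A sketch of the exclusions applied before the comparison is given below. The flow table and station lists are hypothetical placeholders for the project data; the station codes are purely illustrative.

```r
# Hypothetical stand-ins for the project data: an O-D flow table and the
# lists of West Yorkshire stations and group stations (codes illustrative).
flows <- data.frame(
  origin = c("LDS", "HUD", "LDS", "BDI"),
  dest   = c("BDI", "LDS", "MAN", "BDQ"),
  volume = c(9000, 4000, 7000, 1200)
)
wy_stations    <- c("LDS", "HUD", "BDI", "BDQ")
group_stations <- c("BDQ")

# Keep WY-to-WY flows only, and drop flows touching group stations.
comparable <- subset(
  flows,
  origin %in% wy_stations & dest %in% wy_stations &
    !(origin %in% group_stations | dest %in% group_stations)
)
aggregate(volume ~ origin + dest, data = comparable, FUN = sum)  # annual O-D totals
```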

The comparative results from MOIRA and the simulated population from m-IPF are shown by the heat maps and the 3D bar charts in Figure 5.8. The heat maps are chosen to highlight the similarity in the trend of results derived from MOIRA and m-IPF. The left-hand plot is the MOIRA plot; notice the large volumes recorded for flows associated with Leeds train station. The m-IPF plot on the right of Figure 5.8 was created by querying the m-IPF simulated population to aggregate flows by origin and destination stations. Despite the station-by-station similarity in the trend of the MOIRA and m-IPF results, the m-IPF estimates are typically less than half the size of the MOIRA flows. This is revealed in the 3D bar charts in the middle of Figure 5.8, where the left bar chart represents MOIRA flows and the right m-IPF flows. Note the scales of the two bar charts, which indicate that the predicted m-IPF flows for Leeds station are about half the MOIRA flows for Leeds. This trend continues for the large-volume stations within high-population zones, where there would be greater confidence in the results of the spatial microsimulation. The periphery stations (see Figure 4.6, Figure 5.1, and Figure 8.1), where only part of the Postcode Area geography falls within the West Yorkshire study area, reveal large and varied disparity between the MOIRA and m-IPF estimates.

There is an explanation for the large variance between the MOIRA and m-IPF results for small-volume and peripheral stations in the West Yorkshire (WY) study area. The non-publicly available MOIRA has known accuracy issues (Taylor, 2013b, Wardman et al., 2007), and the m-IPF yields a reduced population whenever the seed does not fully represent the content of the population. In addition, with the m-IPF, all peripheral Postcode Areas that intersect the boundary of WY are clipped. The consequence is that the population simulated for such clipped zones is equal to the population of the whole zone multiplied by the proportion of the zone retained. If a Postcode Area (PA) has a total population of say 120, and two-thirds (2/3) of that PA falls outside the WY study area boundary, then the maximum population the m-IPF could simulate for the PA would be only 40 (i.e. the pro-rata proportion of the population occupying the PA within WY). MOIRA, on the other hand, would in such a situation yield the full volume of the PA, because passengers located in the part of the PA outside WY can still access the train station within the PA. This feature adds to the discrepancy between m-IPF and MOIRA flows. It is observed that peripheral stations in the WY study area have markedly lower m-IPF volumes when compared to MOIRA flows. This scenario might be further exacerbated when the flows associated with such peripheral stations are low compared to the regional average.

A regression of the MOIRA and m-IPF station-to-station flows, as indicated in the text at the bottom of Figure 5.8, shows a significant 푅2 value of about 82%, indicative of a similarity in the trend of station-to-station flows between MOIRA and m-IPF. The parameter of the regression indicates a value of 2.5, implying that the m-IPF flows are about 40% of the MOIRA flows. The non-perfect 푅2 value (<100%) reflects those stations where there are disparities between the MOIRA and m-IPF results. These flows are represented by the yellow squares dotted around the middle right, lower right, and parts of the lower left extreme of the OD heat map in Figure 5.8.
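The form of this regression can be reproduced in outline as follows. The vectors below are hypothetical stand-ins for the matched station-to-station annual flows, constructed only to show the shape of the comparison; the exact specification used for Figure 5.8 is not shown in the text, so a through-origin fit is assumed here.

```r
# Hypothetical matched station-to-station annual flows (MOIRA vs m-IPF),
# constructed here only to illustrate the form of the comparison.
set.seed(1)
mipf  <- rlnorm(100, meanlog = 8, sdlog = 1)
moira <- 2.5 * mipf * exp(rnorm(100, sd = 0.3))  # MOIRA ~2.5x m-IPF, with scatter

fit <- lm(moira ~ 0 + mipf)   # regression through the origin: moira ~ slope * mipf
summary(fit)$r.squared        # coefficient of determination (82% in the text)
coef(fit)                     # slope (~2.5 in the text): m-IPF ~40% of MOIRA
```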

Figure 5.8 Comparison of MOIRA and simulated m-IPF station flows.

The external validation adopted has been to infer that queries of the cross-tabulated synthetic population yield results that are intuitive and expected of the mobility phenomena associated with the Postcode Sectors being queried. External validation has been reported to be more difficult to achieve (Lovelace and Dumont, 2016), and in the case researched the only reference data were the aggregate data from MOIRA. The availability of smart card and O-D survey data from WYCA-ITA in the future would provide an opportunity for external validation. Internal validation of the spatial microsimulation results is achieved using the total absolute error (푇퐴퐸) convergence criterion, which is satisfied prior to terminating the microsimulation procedures. In addition to this, a visual inspection of the density distribution of the simulated population variable categories (as shown in Figure 5.3 and Figure 5.4) reveals whether the simulated population has the appropriate distribution of the target population. Further to this, the median line is a good indication of the similarity in the distributions of the seed data and the target distribution.

5.4.2 Precautions and practice

The inclusion of the identity (ID) for the ~50,000 seed samples for the West Yorkshire County meant that an additional dimension of 50,000 factors (or characters) was added to the seed array, commensurately increasing the computer RAM requirements. The project programs developed using the R software required a maximum of 350GB of computer memory and usable storage (RAM), but the upper limit of R data object capacity was restricted to 2^32 (5GB) by the R-studio platform. The choice of variables and data tables was managed to fit this 5GB limit, and the Arc350 computer (University-of-Leeds, 2017) was used.

Once the IDs are included in the stage 1 simulated population (NRTS-Census), it is possible to couple in the additional variables in LENNON, as illustrated in Figure 3.1. By so doing, individuals can be linked to the LENNON ticketing data. In many microsimulation strategies, the individual identities are typically not revealed; the difficulties in estimating these are often excused by the fact that a concerted drive to develop them has not been pursued, due to the need to maintain confidentiality or anonymization of data. Strategies that do not provide identity information for the simulated micro-population are typically called population reconstruction instead of spatial microsimulation (Birkin and Clarke, 1988).

50 This supercomputer is Linux CentOS(7) based, with 24 cores and 768GB of memory, and 350TB usable storage.

The code (R-scripts) used to implement the experiments on stochastic and deterministic spatial microsimulation strategies using data from the NRTS, and then the Stage 1 / Stage 2 spatial microsimulation, are respectively included in Appendices A.1-A.5 and Appendices B.1-B.2. These scripts have been developed from the R packages ‘mipfp’ and ‘sms’ (Barthelemy et al., 2016, Kavroudakis, 2015, Team, 2000), and care has been taken to wrap the pertinent aspects of the scripts inside functions, to make the scripts versatile for use across the range of constraint structures typically found in general spatial microsimulation problems.

In implementing the m-IPF procedure, zero-inflated design matrices result in divisions by zero which crash the algorithm. To forestall this it was necessary, when dealing with count data from mobility interaction (which inherently tend to be zero-inflated), to add a nominal value to the zero entries in the data frame. Typically, as recommended in the literature (Lomax and Norman, 2016), the nominal addition would be of an order typical of the expected error margins in the spatial microsimulation. The effect of the nominal addition can be investigated by assessing the spurious cross-tabulated variability introduced into the simulated population; if the nominal addition is sufficiently small, this spurious cross-variable population would be negligible in the simulated population.

Another issue with simulated populations at the current state-of-the-art is that, as seed data typically have low sample ratios (≤5%), the simulated populations tend to consist of clusters of individuals that have had to be replicated many times to create the requisite population. The clustering is amplified in scenarios where the full conditional distribution of the target is exploited (as in the 3-level hierarchical constraint introduced in this research). The increasing availability of big data for use as seed in spatial microsimulation has the potential to alleviate the clustering. To curb the impact on accuracy of boundary areas that do not fall completely within the study area, the research would have to extend the study area substantially further than the area of interest in the analysis. This would commensurately impact the hardware and software requirements of a solution strategy. This is a particular area where current Bayesian learning strategies can be developed for spatial microsimulation, as these Bayesian strategies require commensurately less computer memory to run, but could be time-consuming in implementation.
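The zero-cell adjustment can be sketched as follows, using the ‘mipfp’ package named above with a toy zero-inflated OD seed and hypothetical marginal totals.

library(mipfp)
# A 3x3 origin-destination seed with zero-inflated counts (illustrative)
seed <- matrix(c(0, 12, 0,
                 5,  0, 7,
                 0,  3, 0), nrow = 3, byrow = TRUE)
target.row <- c(20, 15, 10)   # hypothetical origin (row) totals
target.col <- c(10, 20, 15)   # hypothetical destination (column) totals
# Add a nominal value to zero cells so the IPF updates never divide by zero;
# it should be small relative to the expected simulation error margin
eps <- 1e-3
seed[seed == 0] <- eps
res <- Ipfp(seed, target.list = list(1, 2),
            target.data = list(target.row, target.col))
round(res$x.hat, 2)           # fitted OD table honouring both margins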

5.4.3 Conclusions

The conclusion from this chapter is that the improved m-IPF deterministic strategy is preferred over current stochastic strategies for use in spatial microsimulation when the seed sample is a skewed subset (not a random representation) of the target population. The marked majority of revealed consumer data, like the rail-sector ticketing data, loyalty card data, social media data, and many surveys like the NRTS on the railways, are measurements on a skewed subset of the wider population. These skewed datasets are best combined with other more representative datasets using the deterministic m-IPF procedures, provided also that multi-level seed data constraints are developed to enable reconciliation with the full joint (or conditional) distribution of the target constraint. A novel contribution of the spatial microsimulation methodology presented in this research is in identifying the behaviour of deterministic and stochastic strategies when, as is typical of big data scenarios, the seed data are skewed relative to the target distribution. For such cases, the simultaneous hierarchical constraint methodology developed in this research identifies the superiority of the m-IPF strategy over MCMC-based Metropolis-Hastings population synthesis. The MCMC-based Gibbs sampling strategies adopted in the literature are not conceptually consistent for application to skewed data without the inclusion of a missing data model. The mechanism through which the m-IPF strategy converges to a stationary distribution of a desired target population when the seed is skewed is highlighted. The methodology developed in this research explains the mechanism of the weights51 derived from m-IPF strategies, and addresses the problem of ‘zero cell values’. The hierarchical constraint also incorporates mobility interaction in the population synthesis, thereby providing a more robust conceptual framework for the spatial microsimulation process. There are no papers in the literature that have simultaneously used the origins and destinations of mobility interaction data as a basis for creating a synthetic population.

51 The weights derived for each individual simulated in the m-IPF spatial microsimulation are indicative of two things: first, how representative that individual is of a particular OD flow; and second, how non-representative that individual is within the seed sample taken for a particular OD flow.

Chapter 6 GIS-GTFS NETWORK SIMULATION

In this chapter, the micro-mobility of passengers on the UK railways system is reproduced by building a GIS network model logistically constrained by transit schedule information called the General Transit Feed Specification (GTFS), thereby creating a GIS-GTFS network model. The micro-level synthetic passengers created from spatial microsimulation in the previous two chapters are used as input to the GIS-GTFS network model. The thinking is that if a representative passenger population is fed through a representative logistical rail network, the result would replicate each passenger’s experience on the rail network. An overlay of each passenger’s experience and activity on the rail network would reveal the context of passenger mobility, revealing information like the crowding in the train when each passenger made their journey, the number of stops each passenger made, or the waiting times they might have endured. This framework enables the reconciliation of a simulated, spatially located micro-level population outside the network with the within-network micro-mobility phenomena. The combination of socio-demographic and contextual information produces an attribute-rich population including both the exogenous and endogenous network attributes. To the best of our knowledge, no application exists in the research literature that utilizes GTFS information within a GIS and uses this framework to harness passenger ticketing information. This chapter presents the GIS-GTFS network model in a practice-oriented way, to enable the recreation of the results. The micro-level passenger information produced is potentially useful as input to rail-sector planning models. The GIS-GTFS model is useful for management, maintenance, and interventions assessment on the railways. The attribute-rich contextual information on mobility at individual level derived from the GIS-GTFS model is subsequently used, as demonstrated in the next chapter (Chapter 7), as a basis for identifying passenger behaviour, like missing daily trip rates with season tickets and precise flows to stations clustered into groups. Such additional behavioural information is necessary in order to identify particular drivers of mobility. In Chapter 7, Bayesian modelling is used to impute these behavioural attributes into the output from the GIS-GTFS model, and to simultaneously analyse and infer specific drivers of mobility phenomena on the railways network.

6.1 Introduction to transport systems

In traditional transport modelling, say the classic 4 stages of trip generation, distribution, choice of mode and route assignment, the methodology is as follows: a projection of the number of trips from each region is made, and a spatial interaction model (Batty and Mackie, 1972, Fotheringham, 1983) is used to sub-divide these flows between regions. A choice model decides which route passengers take, and then these are allocated iteratively to attain equilibrium in the system. The 4-stage model and variants of it have been the reference transport models and have been widely practised in the transport industry (de Dios Ortuzar and Willumsen, 1994, Hensher and Button, 2007). National country-specific transport models (STAN, TEM, SMILE, SCENES, NTM) (De Jong et al., 2004, RAND-Corporation, 2017) have been developed, as well as more sector-specific models tailored to different transport sectors (MOIRA, PDFH52 etc.) (Worsley, 2012). MOIRA is the reference transport model in the UK railway industry, while the PDFH is the reference demand forecasting tool. The modelling framework developed in this thesis combines spatial microsimulation from geography with a transport GIS network model, to provide an alternative framework to the 4-stage model. Novel sources of large consumer data reflect passenger utility, and spatial microsimulation enables attribute-rich micro-level demand populations to be synthesized. The recent availability of detailed transit schedule information enables the creation of logistical GIS network models. When the demand population is fed through a logistical GIS network through the use of real-time GTFS53 information, estimates of passenger movement within the rail network can be achieved. This GIS-GTFS network dataset enables a query to identify all the micro-level within-network (endogenous) passenger attributes. These attributes, when combined with those derived from spatial microsimulation, create an attribute-rich micro-level population suited for detailed mobility analysis and spatial analysis.

52 The PDFH is the Passenger Demand Forecasting Handbook, a core source of evidence and a reference strategy for forecasting railway passenger demand in the UK. The PDFH provides guidance on changes in rail demand due to service quality, fares and external factors.

53 Recall the GTFS was detailed in Section 3.1 and in the Class Diagram of Figure 3.2, and defines a common format for public transport schedules and the corresponding geocoded information.

It is the availability of novel consumer data sources and GTFS information that has enabled the demand population and GIS network simulation (De Jong et al., 2007), with the potential to alter conventional transport modelling genres. The novelty of the GTFS information within a GIS network analysis, combined with spatial microsimulation developed for skewed big data, enables our method to generate passengers, distribute them to chosen railway stations, and allocate them to specific routes more accurately than would have been possible using the traditional 4-stage methods (Zhao and Kockelman, 2002).

6.2 Methodology for GIS-GTFS network

A Geographical Information System (GIS) produces layered thematic visual results. The integrity of the layers is sustained by the strong data-typing structure of GIS, which ensures that only similar, specific information is contained within each layer. The strengths that GIS technology can bring, as such, include structured layered features, and accurate and precise feature classes and attribute data (Odiari, 2011). At the core of the structure of GIS are data models which provide the relational database for coupling, querying and analysis of disparate datasets, thereby making GIS readily suited to act as an integrator which synergizes the applications and tools interacting with it. These strengths of GIS facilitate topologically rich analytical models, enabling GIS to form computable physical models.

The GTFS54 (as mentioned earlier) defines a standard geographic format for public transport schedules. The Class Diagram for GTFS information was introduced in Chapter 3 and detailed in Figure 3.2. A class diagram illustrates the relationship between the variables within the GTFS tables. A use case scenario (Adolph et al., 2002) can then illustrate how the GTFS information integrates the activities of the different operators, for example: standards developers, train operators, application developers and train customers. Such a conceived use case scenario enables a relational data model (Codd, 1970) to be created for the GTFS information, enabling the development of a GIS to relate to, and query, the GTFS. Integrating a GIS network dataset for the transport network with the GTFS information enables the solution of the route of passenger travel, constrained by the service provisions of the rail network. The GIS-GTFS model presents a scenario whereby an attribute-rich demand population, inherently containing the utility-maximizing behaviour of passengers, can be assigned to the GIS-GTFS network traffic. This work as such has the potential to radically alter traditional multi-agent transport simulation genres by precluding the need for utility-maximizing traffic assignment and the inherent complexities therein. A GIS-GTFS model as such forms the computable physical model of the transportation network. In this research, the synthetic passengers from the spatial microsimulation (developed in Chapter 4 and Chapter 5) are used as input to the GIS-GTFS network model created in this research. Overlaying each passenger’s mobility reveals the space-time context of mobility and activity on the logistical rail network. The methodology for the GTFS use case scenario, the GIS network model and the combined GIS-GTFS network are presented in the next sub-sections.

54 The GTFS information does not contain information on the number of train carriages in use on particular scheduled routes and trip times. The network operators have such information available, but this was not part of the data supplied to the research project. As such, the passenger count at a location in time is used as a proxy for any potential crowding on the rail network and in the train carriages.

6.2.1 GTFS use case scenario

The GTFS information on its own enables the identification of the precise location of the trains. However, when combined with a GIS network dataset, the location of train passengers in transit on the rail terminals, on platforms and within the trains can be solved. The GTFS was originally developed as a lightweight Google standard (Antrim and Barbeau, 2013, McHugh, 2013), and consists of datasets from agencies on trips, routes, stops and stop times, schedule calendar dates and exceptions, geographic trip shapes, transfers, service frequencies, and feed metadata (Google, 2016). The GTFS information from ATOC and DfT (TransitFeeds, 2017) is regularly updated to reflect the correct up-to-date transit activity, and published at intervals of between 5 and 9 days, at the end of each monthly ticketing period. The industry aim in producing the GTFS is to enable sharing of data and standards between the interested parties in the rail sector. The functionality of the GTFS is captured in the Use Case diagram in Figure 6.1. The variables within the data tables of the GTFS information were represented by the Class Diagram shown in Figure 3.2 (Chapter 3, Section 3.1). The class diagram facilitates the creation of a GIS data model to reconcile with the GIS, to simulate the precise location of every in-service train on the railways network within West Yorkshire.
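As an illustration of such a relational data model, the sketch below declares three core GTFS tables in SQLite from R; the field names follow the published GTFS standard, while the database file name and the use of RSQLite (rather than the project's own toolchain) are illustrative assumptions.

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), "gtfs.sqlite")     # hypothetical file name
dbExecute(con, "CREATE TABLE stops (
  stop_id TEXT PRIMARY KEY, stop_name TEXT, stop_lat REAL, stop_lon REAL)")
dbExecute(con, "CREATE TABLE trips (
  trip_id TEXT PRIMARY KEY, route_id TEXT, service_id TEXT, shape_id TEXT)")
dbExecute(con, "CREATE TABLE stop_times (
  trip_id TEXT, stop_id TEXT, arrival_time TEXT,
  departure_time TEXT, stop_sequence INTEGER)")
dbDisconnect(con)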

The GTFS nomenclature defines trips, shapes and routes. A trip represents the journey made by the train in going through the stops in a time-specific sequence. The trips produce the shape traced out by the train. Routes specify the service lines provided by the network operators to enable mapping of the network. In essence, the trips form the shapes, which are combined in parts to form the routes. In the UK, the railway lines have evolved with advances in signalling to permit flow in both directions, unlike in many countries where the transit lines are double-tracked so that trips occur in only one direction along each transit line. As such, in the GIS-GTFS developed for the UK railways, transit evaluators were specified to apply to both the to-transit and fro-transit directions. The ArcGIS created for the GIS-GTFS model required extensive coding input (running into a full year), as detailed in Appendix C, and was not a ‘plug-and-play’ model typical of, say, MATSim (Gao et al., 2010, Horni et al., 2016).

Figure 6.1 Use case diagram for GTFS information.

The processing procedure for the GTFS55 information, prior to enabling a GIS, is summarised in the flow chart of Figure 6.2. GTFS information is regularly published by the Department for Transport (DfT) and the Association of Train Operating Companies (ATOC), and the different regional integrated transport authorities (ITAs) facilitate the publication of detailed GTFS information to encourage the development of applications for journey planning, research on accessibility of transport, and comparison of transport service levels.

Figure 6.2 Processing steps for the GTFS to embed within a GIS.

55 Historically, the GTFS information was originally conceived at Google and TriMet, as a platform for transport agencies to incorporate standard transit data into mapping services and Google Maps. Prior to this, there had not been any standard for public transit data. The standard was originally released under the Google name, but the reference to Google was dropped to attract wider acceptance of the standard.

6.2.2 GIS network dataset

The GIS is the computable physical model of the transportation network. The connectivity between facets of the transport network56 is defined within the GIS. For instance, the GIS defines the connectivity between the household Postcode Units, entrances to streets, the road network, station entrances and transit lines. The connectivity ensures, for instance, that it is possible to traverse from the transit lines to the road network only via the train station. Further specification of end-point connectivity ensures that passengers can travel from one transport link to another only via the link vertex. That way, even if network linkages visually intersect, there would be no possibility of inadvertently facilitating a turn through the intersection. ArcGIS (ESRI, 2016) models are built to form the methodology for creating linear-referenced features, to geospatially integrate these features, and to form a consistent dataset. The minor roads are split at points where the Postcode Units and rail stations connect to the road network. The detailed specification of the connectivity adopted is shown in Appendix C.1. The connectivity groups created within the GIS make it possible to traverse features within a group, but a linkage to features in another group is only possible via special points and linkages. For example, it would be reasonable to expect to traverse between the minor roads and A- and B-roads; however, connecting onto a motorway would normally occur at a major slip road, junction, or roundabout. As such, two connectivity groups are created for the road networks, one made up of the motorways, and another made up of the A-, B- and minor roads. Traversal between these groups would only be possible at appropriate slip roads, junctions, or roundabouts. Similarly, it would reasonably be expected that passengers can only access or egress the train transit lines via the train station and station entrance. As such, the transit lines and the general network of roads form two separate groups, such that it is possible to traverse between the groups only through the station-entrance link and point. Connectivity between groups of features is only feasible through points on the network (e.g. roundabouts, station points, street-Postcode points, junctions etc.), and when these points are associated with a link, they tend to form a separate connectivity group that facilitates linkage and passenger movement between groups. Such a methodological, homogeneous collection of features with rich topological relationships across features is a particular strength of GIS, and facilitates an ability to scale across any expanse, ideally suited for identifying global and local trends.

56 The shapefile data on the UK road network are available from the Department for Transport (DfT). The shapefile data on the UK railway network are available from Network Rail. These data can also be procured from the Ordnance Survey via Edinburgh DataShare (Research Data Service, University of Edinburgh, 2016), and are also regularly published and updated at www.data.gov.uk.

6.2.3 Combined GIS-GTFS network

ESRI (ESRI, 2016) have developed a GIS tool for adding GTFS to a network dataset (Morang, 2016), by creating a GTFS SQL (Structured Query Language) database that is queried by the GIS in the process of solving each passenger’s route on the network. The stops and transit lines created from the GTFS are reconciled with the station and track shapefiles procured from the Ordnance Survey (OS) (EdinburghDataShare, 2016). A summary of the steps in creating the GIS network datasets and the GTFS information, integrating these, and solving each passenger’s route is shown in the flow chart of Figure 6.3. The procedure consists of creating a linear-referenced GIS network dataset. Linear referencing enables the location of features to be specified and measured relative to reference locations along the transit linkages. Connectivity is specified between the transport network features, thereby building links across the road network features. The GTFS SQL database is integrated into the GIS network using the ESRI GTFS toolkit (Morang, 2016). Routes are simulated between each OD pair for each passenger in the micro-population, based on time of arrival at the station, time of first train, access distance and mode, etc. (inherited from the NRTS during spatial microsimulation). These attributes inherently encompass the utility-maximizing aptitude of each passenger. The route for each individual passenger is solved by finding the quickest route traversed, constrained by the transit schedules and impedances derived from the GTFS, and subject to any additional restrictions derived from the LENNON ticketing information. The solved route events for all the passengers are overlaid, merged at intersections, or queried to identify the desired endogenous passenger attributes (like waiting times, time in each train, number of stops, average speed of journey, etc.). The ArcGIS models that have been developed to create the GIS-GTFS network are described in Appendix C. These models are included in the CD accompanying the thesis, and are used to create the network linkages between Postcode Units, minor roads, A- and B-roads, motorways, railway stations and rail lines. These models solve the detailed mobility of each passenger on the rail network, to identify waiting times, number of stops, volumes of passengers etc.
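To indicate how such a GTFS SQL database can be queried, the hedged sketch below lists the trips serving a given stop within a morning window; 'LDS' is a hypothetical stop_id, and the database is assumed to have been built from the GTFS feed as sketched earlier.

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), "gtfs.sqlite")
departures <- dbGetQuery(con, "
  SELECT t.route_id, st.trip_id, st.departure_time
  FROM stop_times AS st
  JOIN trips AS t ON st.trip_id = t.trip_id
  WHERE st.stop_id = 'LDS'
    AND st.departure_time BETWEEN '07:00:00' AND '09:00:00'
  ORDER BY st.departure_time")
dbDisconnect(con)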

Figure 6.3 Steps to constrain a GIS network dataset with GTFS information.

6.3 Results from query of GIS-GTFS network dataset

Once the GIS-GTFS network simulation model is developed and built, the prevailing paradigm of interaction becomes to query the geographic database, and to visualize or animate the results. For the GIS-GTFS network model, the analysis cuts across local and global scales, enabling query and visualisation of micro-level mobility, disaggregate network flows, and aggregate flow volumes. Some of these results are presented in the following sections. Each passenger within the micro-population created from spatial microsimulation has a Postcode Sector of usual residence and final travel destination, as well as a time of arrival at the station and a first train. When the synthetic population is used as input to the GIS-GTFS network, information about the time of first train57 is used to simulate the passenger’s exact flow.

6.3.1 Micro-level mobility on the railways

Figure 6.4 is a screenshot of the trains in motion within West Yorkshire County (with a 15km buffer applied to include all the relevant stations in the map). These results are derived from the GIS-GTFS network model. The GTFS information analysed covers a period (set by the train operators to be 4 weeks) starting on 7th March 2015, to coincide with the LENNON ticketing information available to the project. All the trains on display in Figure 6.4 have unique daily and weekly schedules. The train routes (trips) and trains are time-activated, implying that the animation can be synchronised to 24-hour daily times. There are 256 unique trips across West Yorkshire, ranging from 1.2km trips (between Swinton and Mexborough) to 108km trips (between Moston and York). To create the service provided by the train operators, the 256 unique trips are replicated at different times within the day and week, to form about 17,350 different train trips. The black dots represent train stations58 (see Figure 6.4).

57 To achieve a better and more realistic mobility animation, each passenger can be assigned to their Postcode Unit derived from the spatial microsimulation. The assignment could be further optimized using train station access mode and distance. This, however, has not been deemed necessary in this research, which focuses mainly on the train mode of transport and on movement within the rail network.

58 The dots that are not traversed are either railway stations that have been closed or now form part of the Transport for Greater Manchester (TfGM) Metrolink service. TfGM flows emanate from outside West Yorkshire and are as such not included in the analysis.

Figure 6.4 Screenshot of animated trains in West Yorkshire.

Figure 6.5 is a zoomed-in screenshot of animated trains (with embedded passengers) on the rail network. On the left in the main window are passengers waiting to board a train. The zoomed-in section in the lower-left window shows the train Y53016-IB09 (indicated by the red arrow in the main window, with the associated passengers embedded). This is the Blackpool-to-York service, travelling from Accrington to York on the 07:02-09:04am train (Route No. 2664, Trip ID 79293, Service ID 451, and Shape ID 979). By querying the GIS-GTFS network model created, such detailed attributes associated with each passenger’s mobility on the rail network are extracted and included as endogenous (within-network) variables for subsequent use in Bayesian spatial analysis (see Chapter 7 and Chapter 8). Further micro-level results include the real-time volumes of passengers within a train over the length of its trips. Figure 6.6 shows the volume of passengers within a train over the length of trips. Information on the number of trains laid out for a service was not available; as such, passenger volumes have been reported. Crowding will occur if sufficient trains are not put in service following insight into the volume of passengers in the train. In Figure 6.6, the morning trips (09:07am) are seen to constitute large volumes. Guiseley is seen to have consistently sizeable volumes, perhaps indicative that it is a spur route to hub stations like Leeds.


Figure 6.5 Screenshot of animated urban mobility along transit lines.

Figure 6.6 Train volumes for different trips during the day.

6.3.2 Disaggregate flow on transit lines

Figure 6.7 shows typical disaggregate passenger flows on transit lines on the rail network. Both the flows depicted on the left and the right of Figure 6.7 are for passengers similarly travelling between Huddersfield and Leeds. The origin and destination of the two flows are the same, but the GIS-GTFS enables particular details of the flows to be disaggregated. The first passenger travelled between 08:00-11:59am (left plot) while the second travelled between 15:00-18:59pm (right plot). The details indicate that the 08:00am trip stopped at Dewsbury, whilst the 15:00pm trip was a non-stop train. Analysis revealed that in the daytime the left route enjoys a higher frequency of trains and shorter intervals between consecutive train arrivals at the stops59.

Figure 6.7 Headway and service provision - Huddersfield to Leeds.

59 The service areas are the same for both periods, as mobility outside the network is programmed to be time-independent. A GIS-GTFS network dataset can similarly be created for alternative mobility modes outside the rail network, thereby incorporating the characteristics of alternative modes in the analysis. This was considered outside the purview of the immediate project.

Figure 6.8 is another example where a query of the GIS-GTFS network model enabled the disaggregation of seemingly similar mobility on the rail network. In Figure 6.8, the transit journey from York can be disaggregated to distinguish the flows. The 11am train (brown line) stops at Shipley, whilst the 8am train (green line) travels direct between Leeds and Keighley, and only the 4pm train stops at Cononley. Using the GIS-GTFS model to query the journey reveals distinctive transit routes, which are disaggregated by the actual stops that the train makes. Figure 6.9 gives further examples of results of a query of the GIS-GTFS network revealing disaggregate mobility. As seen in Figure 6.9, the details are revealed of the route traversed by a passenger travelling between Wakefield Westgate and Leeds. This particular example was chosen to reflect the impact of intermediate stations stated by a few passengers and enumerated in the NRTS. Such convoluted journeys occur as outliers, and perhaps represent passengers travelling in groups whereby members of the group are accompanied to particular stations. The GIS-GTFS network model enables the identification of such outlier behaviour for further investigation of the details of travel, in a bid to improve the service provisions.

Figure 6.8 Headway and service provision - Huddersfield to Leeds.

Figure 6.9 Passenger trace - Wakefield Westgate to Leeds station.

6.3.3 Aggregate flow on transit lines

The events associated with the micro-level and disaggregate flows on the network can be overlaid to create an aggregation of mobility. Such information is relevant for creating visual summaries of mobility phenomena on the rail network. A typical result is the detailed volume of passengers within train stations over the period of the day60, as shown for Leeds train station in Figure 6.10. In Figure 6.10, the upper plot represents the flows aggregated every 1 hour, while flows in 10-minute intervals are shown in the lower plot. Such analyses enable the passenger flows to be disaggregated right down to real-time mobility. Figure 6.11 illustrates further aggregate passenger flow volumes on the rail transit lines. Overlaying each passenger’s network journey in time and space reveals the rich tapestry of mobility on the railways. This detailed space-time picture of passenger movement on the railways enables, for example, the association of the socio-demographic attributes of passengers with their distinct behaviour on the rail network, facilitating the construction of richer pictures of mobility. A typical overlay of events is shown at close range in Figure 6.11.

60 The volumes depicted were derived from LENNON ticket sales aggregated into monthly periods. As such, the daily patterns are derived from the volume for the whole month.
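A minimal sketch of this kind of temporal aggregation is given below, assuming 'event_time' holds the solved arrival times of passengers at a station; the event times here are randomly generated for illustration.

# Hypothetical passenger event times over one day
event_time <- as.POSIXct("2015-03-07 00:00:00", tz = "UTC") +
  sort(runif(500, 0, 24 * 3600))
hourly  <- table(cut(event_time, breaks = "1 hour"))   # 1-hour bins (upper plot)
ten_min <- table(cut(event_time, breaks = "10 mins"))  # 10-minute bins (lower plot)
barplot(hourly, las = 2, main = "Hourly station flows (illustrative)")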

Figure 6.10 Real-time volumes at Leeds Station (P2015/13). The upper plot shows flows aggregated every 1 hour, while the lower plot shows flows aggregated every 10 mins.

Figure 6.11 Flow volumes - Wakefield group stations.

Figure 6.11 reveals flows in the vicinity of the Wakefield group of stations (i.e. Westgate and Kirkgate). Such local aggregation creates a visual picture of the volumes of passengers associated with each transit service that traverses Wakefield. The result shown in Figure 6.11 indicates that a large proportion of the passenger volume is associated with flows occurring between Westgate and Kirkgate. The ticketing data would not always distinguish these flows, as the stations are grouped as the same on many ticket products. This research enables an identification of specific flows between Westgate and Kirkgate, as well as a disaggregation of these flows temporally if desired. The indication from these results is that a disruption between Westgate and Kirkgate would have the most effect on the volumes of passengers in the vicinity. Figure 6.11 is a local view of the aggregate mobility volumes along transit lines, whilst Figure 6.12 shows a global view of passenger volumes along transit lines in the West Yorkshire study area. Figure 6.12, for instance, can serve as a visual summary of the service fulfilled by each transit line, as such information is not readily available from LENNON. The LENNON ticketing data do not include information on the time of passenger travel; however, the GIS-GTFS model enables a deduction of the aggregate or disaggregate temporal flows of each passenger at different locations on the network. A further paradigm for the display of passenger volumes is shown in Figure 6.13. In this instance, the aggregate global view of passenger volumes associated with each geographic railway line is displayed (as opposed to the straight transit lines between stations). In the West Yorkshire study area, Figure 6.13 reveals that aggregate flow volumes tend to radiate outwards, decreasing from Leeds Station. This reveals that Leeds Station is the hub of rail mobility in the county, and service provision would need to have core strengths in the vicinity of Leeds. Such information can be deduced from the ticketing data, but the strength of the GIS-GTFS model lies in its ability to enable visualisation of such query results on mobility on the rail network. A temporal visualisation of aggregate flows throughout the day would reveal a map of passenger flow volumes at different stations and locations throughout the day. Such temporal information would not be readily available from the LENNON ticketing data. The results shown in Figure 6.13 can further be used to good effect to reveal the volumes of transit trains required to ply different segments of the rail lines. Such information can further be useful in estimating the load that sections of the rail lines might have endured, as part of a maintenance scheduling tool.

Figure 6.12 Flow volumes - transit lines in West Yorkshire.

Figure 6.13 Flow volumes – rail lines in West Yorkshire.


A further aggregate analysis yields a measure of regional access to rail services, shown in Figure 6.14. The image reveals a typical in-house transport coverage index (TCI), which is a combined measure of the volume of trains at a station, the density of rail stations in a defined road-network buffer, the maximum waiting time at each station for a train service, and the area of non-overlapping service coverage. The equation developed for the TCI is shown in Equation 6. These measures of transport coverage are predefined network performance attributes, as the network operators pre-define train routes and the associated number of stops, waiting times etc.

TCI = (No. of Trips × Stops in Area) / (Max. Waiting Time × Overlap Service Coverage)        (6)

It would be desirable to have TCI values uniformly distributed with wide coverage across the geography (West Yorkshire County). The TCI is useful as it is indicative of areas where service management requires attention due to heightened mobility activity. The TCI is also useful as an indicator of areas with limited access to rail services (the blank white areas with zero TCI). The GIS-GTFS network model automatically deduces TCI values for the geography, and particular TCI values can be associated with each passenger as an additional attribute for use in mobility analysis.
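Equation 6 transcribes directly into a small R function, shown below with hypothetical inputs for a single zone.

# TCI from Equation 6; arguments are the predefined network attributes
tci <- function(n_trips, stops_in_area, max_wait, overlap_coverage) {
  (n_trips * stops_in_area) / (max_wait * overlap_coverage)
}
tci(40, 3, 30, 2)   # e.g. 40 trips, 3 stops, 30 min max wait, overlap 2 -> TCI = 2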

Figure 6.14 In-house TCI (Index) in West Yorkshire.

6.4 Discussion of GIS-GTFS network simulation

In this chapter, the micro-mobility of passengers on the UK railways system has been replicated by creating a GIS-GTFS network. This enabled the reconciliation of a simulated, spatially located micro-population outside the network with a within-network micro-mobility population. This combination produces a representative micro-level population. The GIS network model, incorporating detailed transit schedules, is used to identify the context of ticket use, revealing additional endogenous passenger attributes such as the waiting times the passenger might have endured, number of stops, average transit speeds, volume of associated passengers (a proxy for crowding), etc. Such individual-level revealed information augments current industry estimates of passenger attributes and activity on the railways, which were hitherto based on traditional low sample ratio stated-preference surveys prone to survey biases. The research as such has a significant impact on the quality of inputs available to strategic, tactical and operational rail industry planning models. The GIS-GTFS model is potentially useful to network operators for management, maintenance and interventions assessment on the UK railways. The representative micro-level population can aid in creating segmentations of railway customer satisfaction by linking in satisfaction surveys. Such segmentation can aid in answering pertinent policy questions related to how concessions might affect different socio-demographic groups, and in identifying how different population groups are affected by the various service provisions. The GIS-GTFS network model can additionally reveal the volume of passengers and the transit train weight that each segment of track bears in service, thereby enabling more holistic track maintenance scheduling.

The GTFS information published by ATOC and DfT (ATOC, 2017) for the jurisdiction of the West Yorkshire Combined Authority does not incorporate fares information; as such, the fares variable adopted was that included in the LENNON dataset. The possibility of further optimising the solved passenger routes based on derived fares was not incorporated, but is considered a future research effort. Conventional shapefile and database file formats do not support date and time in the same field; however, the ArcGIS geodatabase feature enabled the incorporation of precise times of travel (in seconds) for each passenger, thereby enabling the replication of realistic mobility. The flows analysed in this chapter (and in the thesis) consist of flows originating and terminating in West Yorkshire County. This restriction has been imposed because the LENNON ticket availability was restricted to West Yorkshire County. Similarly, the NRTS availability was limited to the region of Yorkshire and the Humber. A large volume (in the region of 20~35%) of National Rail flows through West Yorkshire either emanates from or terminates outside the County, and this affects the volume of passengers simulated, since the LENNON tickets formed the final target dataset. Internal validation of the GIS-GTFS was conducted by comparing the flow times deduced for each passenger’s movement with those deduced from the current Google software results (Cardno and Mulgan, 2003, Rafiah et al., 2004). These results coincided with Google for the entire 5% sample of the individual mobilities solved using the GIS-GTFS model.
This exercise was repeated several times over a period of 1 year, and on each occasion the results deduced from the GIS-GTFS model coincided with those from the current Google data. This provided the internal validation of the GIS-GTFS network model. Unfortunately, Google does not provide historical passenger travel plans, and as such it was not possible to compare Google with results deduced from the GIS-GTFS model using historical GTFS information. It was deemed sufficient that current GIS-GTFS and Google results coincided for all cases tested over a period of one year. Scheduled smart card data would have been ideal for external validation of the GIS-GTFS model; however, such data were not made available within the time frame of the research project. The availability of smart card data in the future would provide an opportunity for a more comprehensive external validation of the GIS-GTFS model results. The GIS-GTFS network simulation developed in this research is reliant on a representative simulated population (in this case from the m-IPF, but it could also have been from SA). The strategy, however, does not assume a model for assigning passengers to the network traffic, as is the case with traditional multi-agent traffic modelling software (Anderstig and Mattsson, 1998, Gao et al., 2010, Horni et al., 2016); instead, the richness of the endogenous and exogenous attributes in the synthetic population is exploited. The synthetic population contains information on passenger arrival time at the train station, first train, intermediate stops, final destination, ticket restrictions, access and egress modes and distances, as well as a plethora of additional information to accurately assign the passenger to the network traffic. Any resulting space-time congestion simply reflects the actual situation on the network, and precludes the need for the complexities of optimizing the utility of passenger mobility in traditional multi-agent modelling. Such micro-level demand and supply information is potentially useful to network operators for planning, management, maintenance, and interventions assessment on the railways.

Chapter 7 IMPUTATION OF BEHAVIOURAL DATA

Having used the m-IPF (spatial microsimulation) and the GIS-GTFS network model to enhance the LENNON ticketing data (in Chapter 5 and Chapter 6 respectively), an individual-level attribute-rich dataset is yielded. The attributes from the m-IPF include those variables belonging to individuals surveyed in the NRTS, as well as product information on individual tickets within the LENNON database. These attributes are optimised by reconciling with Census information on aggregate passenger mobility interaction. The GIS-GTFS network model, on the other hand, further enhances the individual-level data from the m-IPF by using the simulated passengers as input to the logistical GIS-GTFS rail network, constrained by the transit schedules. Such a logistical network enables the identification of the context of each passenger’s mobility on the rail network. These additional contextual attributes, like the waiting times the passenger might have endured, number of stops, crowding in the train, length of journey, etc., all enhance the range of individual-level attributes associated with each passenger. The resulting enhanced LENNON ticketing data is ideally suited to identifying drivers of mobility on the railways, aimed at predicting individual movement patterns, considering frameworks relating population mobility to demographic characteristics, relative location, time of the day, and other likely drivers of behaviour and interaction. Despite the rich set of exogenous and endogenous attributes associated with simulated passengers from the m-IPF and the GIS-GTFS model, certain passenger behavioural attributes are still missing, like daily rates of season ticket use (so-called journey factors), length of use of flexible period tickets, and precise passenger flows to group stations61. These further missing values need to be imputed to create a comprehensive estimate of passenger demand.

61 It is noteworthy that the precise flows associated with regional multi-modal Passenger Transport Executive (PTE) tickets are unknown (and hence missing), but these are conceptually similar to Group Station flows (which are inherently missing). The similarity is that both PTE and Group Station ticket types restrict passenger flows to a pre-defined number of known stations. PTE tickets were however not available to the project, and as such have not been included in the analysis.

7.1 Introductory background

In order to bring out the imputation aspect more strongly, this section highlights the specific deficiencies in the LENNON ticketing data which this research seeks to address. These deficiencies are missing values in the variables for journey factors, ticket validity and grouped stations, and these three problems will be addressed in this chapter. In this section (Section 7.1), the background to each of these items is given, indicating by way of simple examples how they arise, and why they are important. Further, examples are used to explain the concepts behind the methodologies developed to resolve the missing values by way of developing imputation strategies.

7.1.1 Missing values

Suppose, for example, it was established from the m-IPF and GIS-GTFS that a passenger associated with a particular ticket in the LENNON database was also surveyed during the NRTS, and was found to travel regularly with a 7-day season ticket between a specific origin and destination station. The m-IPF and GIS-GTFS strategies enabled each LENNON ticket to be assigned to a passenger; however, there is no indication of whether that passenger, who might have been assigned a season ticket, used the ticket many times in the day. Such a passenger could, for instance, have used the season ticket three times in the day, but the m-IPF and GIS-GTFS would not identify this information. Apart from the daily rate of ticket use, the 7-day season might have been used on only 5 days (for a 5-day working week), in which case the ticket would have been used for only 5 days of a 7-day validity tenure. Such information would be necessary in building a compound picture of passenger demand. Apart from fixed-period season tickets (like the 7-day season ticket or monthly season ticket), there are flexible-period tickets which could, for instance, be valid for between 30 and 60 days, without a firm indication of the number of days the ticket might actually have been used within the 60-day maximum limit. Such information would be necessary in building a richer picture of passenger mobility.

Apart from daily rates of ticket use and validity periods of flexible tickets, the precise passenger flows to those stations clustered into groups (so-called group station flows) are missing. In the UK railways network, tickets are typically issued to individual origin and destination stations. In some cases, however, there are a number of stations in the same locality, whereby a passenger may desire to travel to one of the stations and return from another station in the same locality. To accommodate this particular passenger service demand, British Rail (BR) introduced a series of station groups. In West Yorkshire, typical examples of group stations are Bradford BR (made up of Bradford Interchange and Bradford Forster Square, see Figure 7.2) and Wakefield BR (made up of Wakefield Kirkgate and Wakefield Westgate). Tickets are, for example, issued for the destination station Wakefield BR, and the passenger may choose to alight at Kirkgate and return from Westgate. It would be desirable for the train operators to, for instance, decide whether to lay out more transit trains for journeys from Bradford Interchange or for those from Forster Square, and also to determine the temporal nature of the demand across these stations within the same group, in order to achieve an optimal service provision closely tailored to the demand. In essence, for the purposes of mobility analysis, a richer picture of mobility entails establishing the precise flow volumes to each of the stations within a group.
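To make the journey-factor gap concrete, the toy R sketch below shows how the journeys implied by a single 7-day season ticket vary with two unobserved quantities; all numbers are illustrative assumptions.

days_used     <- 5   # e.g. a 5-day working week within the 7-day validity
trips_per_day <- 2   # e.g. a commuter's out-and-back; could plausibly be 3
days_used * trips_per_day   # implied journeys: 10
7 * trips_per_day           # naive full-validity assumption: 14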
7.1.2 Concepts adopted
As intimated earlier, several methods have been developed for imputing missing data values in datasets (Allison, 2001, Efron, 1994, Little and Rubin, 2002, Rubin, 1976). These methods range from simple ad-hoc or logical-rules-based infilling methods to the more advanced principled imputation methods. The ad-hoc or logical-rules methods are typically based on estimating missing values by interpolating between existing observed values. These simple methods develop rules derived from an understanding of the data creation process and scenario. They are called ad-hoc because they are not general solutions, but are instead problem-specific solutions created for a particular purpose. The principled imputation methods are based on more sophisticated rules, typically involving creating a model for the data generation process, or creating an algorithm that enables a heuristic discovery of the data generation process. This research is focused on developing a principled imputation method that complements current logical-rules-based industry infilling methods. The Bayesian imputation strategy is the method of choice developed here because it is seen as a more robust application, which includes not just a model of the data generation process, but also a model for the mechanism of missing data. This comprehensive imputation approach implies that the missing passenger behaviour can be inferred by replicating the context in which the mobility data were generated. In the Bayesian framework, the mobility model that describes the data generation process is solved simultaneously with the imputation model that describes the statistical mechanism of missing data.

The choice of Bayesian imputation and analysis is threefold. First, the Bayesian strategy is based on and driven by the powerful MCMC62 technique, which is an established platform (Geyer, 1992, Hastings, 1970, Metropolis and Ulam, 1949, Robert, 2004) for solving complex intractable models (like those associated with urban mobility on the railways). The MCMC is based on inferring posterior parameter distributions from the likelihood values deduced from available data, alongside a prior intuition about the parameter distribution. Secondly, the Bayesian framework provides a method for imputing missing values within the context of a model of the data (mobility) generation process. As such, the imputation and analysis can be conducted in the same step, providing for a robust methodology. Thirdly, there exist tools within the BUGS implementation of the Bayesian strategy (Lunn et al., 2012) that enable more realistic yet parsimonious models to be created, thereby better reflecting a realistic mobility scenario. The intuitions behind these three specific strengths of the Bayesian strategy are explained further here in a non-technical manner.

The MCMC is a stochastic algorithm developed to facilitate sampling the posterior distribution of the parameters of a model (as intimated earlier), irrespective of the model complexities (non-linear, hierarchical, mixtures, etc.). In the Bayesian framework, the parameter values deduced are distributions (instead of fixed values with confidence intervals, as is the case in traditional frequentist statistical analysis). The intuition behind a parameter being a distribution is that it makes provision for a range of values for the parameter, reflecting the variability that would be likely based on the data variables being analysed. To illustrate, imagine a large classroom made up of Chinese students and German students in equal proportion, with ages randomly scattered across, say, 5yrs to 15yrs. If it is desired to deduce a parameter governing the relationship between age and height, the traditional frequentist paradigm would be to regress the data variables (age and height) and infer a statistical parameter value of, say, 1.5 (indicating that on average a unit increment in age yields a 1.5 unit increase in height). In the frequentist approach, a 95% confidence interval in the results (say 1.2 to 1.7) is additionally specified. The Bayesian approach is different, as the parameter value reported would be a distribution, perhaps indicating two peaks at 1.3 and 1.5, better reflecting the different growth rates of Chinese and German students (Duan et al., 2013). The parameter distribution in this case might look like two joined normal distributions with modes at 1.3 and 1.5. The portions of the normal curves with lower heights, away from the modes, reflect the less likely parameter values based on the scatter in ages and heights. This simple example highlights the intuitive nature of a Bayesian modelling strategy for solving complex problems (in this case a mixture and a hierarchy of students).

The second aspect that makes Bayesian imputation and modelling attractive is as follows. Bayesian modelling implemented using the BUGS language (Goudie and Mukherjee, 2016, Lunn et al., 2012) is declarative, and not procedural like traditional software languages. This means that the order in which components of models, or whole models, are specified is not relevant. An advantage of the declarative feature is that it is possible to simultaneously specify a mobility regression model (representing the relationship between an outcome variable and pertinent covariates) alongside an imputation model (describing the statistical mechanism of missing data values). The declarative nature of the BUGS modelling language then enables the simultaneous MCMC solution of the mobility model and the missing data model. In essence, this results in effectively identifying the missing values within the context of the mobility model, which represents the circumstances and conditions through which the data were created in the first place.

The third aspect that makes Bayesian imputation and modelling attractive is as follows. The realistic yet parsimonious modelling ability within the BUGS Bayesian framework is achieved through the availability of simple but effective constructs in the modelling language. For example, consider the large class of Chinese and German students, and assume that 15% of student heights and 20% of student ages are missing. Assume also that our experience of measuring student heights and ages has indicated that the scale used to measure student height only reaches 1.7m, and that pupils younger than 6yrs are observed to be less likely to be able to articulate their ages. Our indication then would be that the missing heights are definitely those above the 1.7m limit of the height bar, whilst the missing ages are more likely to be ages less than 6yrs. It is intuitive to expect to better infer the missing values by creating an additional missing data model that informs our Bayesian regression model about this additional information on the missing heights and ages. To implement these, we recall from Chapter 6 that the Bayesian framework is based on the fact that a posterior estimate is equal to (or strictly, proportional to) the product of our prior intuition about a parameter value and the likelihood contribution of the values within the data points. In the case of censored heights, we can assume the missing height data have a standard (perhaps normal) distribution, and also include a statement that the likelihood contributions of the missing heights are larger than those for heights near the limit of the scale. There are simple in-built constructs within the Bayesian framework to automatically implement this censoring, and a strength of the Bayesian BUGS language is the availability of many simple but effective constructs that enable a better replication of realistic scenarios. In the case of the missing ages, we may have additional knowledge that the minimum age for entry into school is, say, 5yrs. In that case, the missing ages would in any event be truncated to a lower limit of 5yrs. This additional missing data information can simply be modelled using the truncation construct in BUGS Bayesian modelling, creating a model that better represents the real classroom phenomena.

62 The Markov chain Monte Carlo (MCMC) methods constitute one of the most important algorithms of the 20th century, useful in solving a wide range of intractable optimisation problems typically found in nature.
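A minimal sketch of these censoring and truncation constructs is given below, written in the JAGS dialect of the BUGS language and embedded in R via the 'rjags' package; all data, limits and group labels follow the hypothetical classroom example, not the project models.

library(rjags)
model_string <- "
model {
  for (i in 1:N) {
    # Censoring: cens[i] = 1 flags a height recorded only as 'above the scale limit'
    cens[i] ~ dinterval(height[i], lim)
    height[i] ~ dnorm(mu.h[group[i]], tau.h)
    # Truncation: school entry means no ages below 5yrs
    age[i] ~ dnorm(mu.a[group[i]], tau.a) T(5, )
  }
  for (g in 1:2) {                 # the two student sub-groups
    mu.h[g] ~ dnorm(0, 1.0E-4)     # vague priors on sub-group means
    mu.a[g] ~ dnorm(0, 1.0E-4)
  }
  tau.h ~ dgamma(0.01, 0.01)
  tau.a ~ dgamma(0.01, 0.01)
}"
dat <- list(N = 4, lim = 1.7,
            height = c(1.45, 1.62, NA, NA),  # NA heights are censored above 1.7m
            cens   = c(0, 0, 1, 1),
            age    = c(7, NA, 12, 9),        # the NA age is imputed by the MCMC
            group  = c(1, 1, 2, 2))
ini  <- list(height = c(NA, NA, 1.8, 1.8))   # start censored heights above the limit
jm   <- jags.model(textConnection(model_string), data = dat, inits = ini, n.chains = 1)
post <- coda.samples(jm, c("mu.h", "mu.a"), n.iter = 2000)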

7.1.3 Summary

In this chapter, having presented the deficiencies in the LENNON ticketing data in simple terms, and explained the concepts and intuition behind the methods developed for imputation, a review of data imputation strategies is presented. Emphasis is placed on the imputation of data typical of those available in the railways industry. Simple ad-hoc, logical-rules-based, and more advanced principled imputation strategies are reviewed. A principled Bayesian imputation is then developed as a complement to the current logical-rules-based imputation strategy used in the UK railways industry to impute the LENNON ticketing data (SDG, 2014, Taylor, 2013b). A simultaneous Bayesian analysis strategy is developed for application to railways mobility in the West Yorkshire study area. The methodology for imputation and mobility spatial analysis highlights the Bayesian hierarchical modelling strategy, as well as the procedure for applying constraints to parameters and data values in mobility models. These constructs enable more realistic models to be created, and facilitate the incorporation of industry experience in estimates of parameters and data values. The Bayesian strategy is shown to be straightforward and intuitive in application, setting the stage for making predictions and inference in a number of case studies. The infilling strategies developed in this chapter facilitate the creation of the requisite dataset for use in the spatial mobility analysis in Chapter 8.

7.2 Review of imputation strategies

A number of data infilling (also called imputation) methods have been developed and have evolved over the years, as widely reported in the literature (Allison, 2001, Graham, 2009, Heckman, 1979, Rubin, 1976, Rubin, 1996, Schafer and Graham, 2002, Stekhoven and Bühlmann, 2012). These methodologies can broadly be classified as ad-hoc methods, heuristic techniques, and parametric (including Bayesian) methods. The name ad-hoc is used in a figurative manner, and simply serves to highlight that ad-hoc imputation methods are developed for particular problem-specific applications. Ad-hoc methods (Sterne et al., 2009) replace the missing values directly, based on an assumption about knowledge of the missing values. Heuristic (Stekhoven and Bühlmann, 2012) and parametric (Allison, 2001) imputations are principled methods, as they do not assume knowledge of the missing values but use the information and context of the observed data as the basis for estimating the missing values. Heuristic methods develop practical evolutionary learning algorithms, or exploit established learning algorithms, to estimate the missing data values. Parametric methods assume a prior parametric model and distribution for the data, assumed to explain the mechanism through which the data were created, and use this as a basis for estimating the missing data. Heuristic methods are non-parametric, with no inherent assumptions about a model or the distribution of the data. In practice, ad-hoc, heuristic and parametric methods variously employ statistical tools, Bayes' theorem, and stochastic and iterative strategies in the process of deriving a solution.

7.2.1 Ad-hoc imputation

Ad-hoc imputation methods involve strong assumptions which, if not met, invalidate the inferences that are based on the data (see also Chapter 2, Section 2.7.1). Common ad-hoc methods (Bartlett et al., 2014, Grace-Martin, 2018) are list-wise deletion, simple means, last observation carried forward (LOCF), inverse distance weighting (IDW) and Kriging, and logical-rules-based imputation (ORR, 2014c). Variants of these are reported widely in the literature (Allison, 2001). List-wise deletion drops units (i.e. records) in the dataset that have any missing item. The simple means method replaces a missing data item with the mean of the associated variable, while the LOCF method replaces the missing value (item) by the last observed value of that variable (Donders et al., 2006). IDW and Kriging (and Co-Kriging) are interpolation techniques also used for data imputation. IDW and Kriging (Kbiob, 1951, Shepard, 1968) are weighted averaging methods, and are ad-hoc since the missing point is imputed directly by assuming a relationship to nearby observed points. Current industry infilling practice applied to the LENNON ticketing data has been to use proxy trip rates for season tickets, and ad-hoc logical rules about where multi-modal tickets are likely to be used. The SDG63 assignment for PTE tickets (SDG, 2011, SDG, 2012, SDG, 2016, Taylor, 2013b) assumes that the cheapest alternative is chosen: for example, MetroCard Zone 1-5+6 tickets (sold by regional operators) are unlikely to be used on a Harrogate-Leeds journey, as a Rail Settlement Plan (RSP) ticket (sold by UK National Rail) would be cheaper and the more likely ticket choice for that journey. With ad-hoc methods, data are completely deleted or effectively made up. When data are simply deleted, the inferences are biased unless the unlikely assumption of complete randomness (MCAR64) in the data is satisfied. On the other hand, effectively made-up data create invalid statistics and cannot be recommended as a robust strategy (van der Heijden et al., 2006). As a result, ad-hoc methods only produce useful results when the proportion of missing data is small (<5%) (Acuna and Rodriguez, 2004), or when knowledge of the mechanism of missing data is firmly established. In the absence of such knowledge, principled imputation strategies would better leverage the accuracy and precision of revealed consumer data (the so-called big data), in particular when applied to urban mobility and movement patterns analysis as researched in this thesis.

63 Steer Davies Gleave (SDG) are the current ATOC consultants developing the PTE infill strategies for the rail industry using methodologies introduced by Mott MacDonald to create the Rail in the North Demand and Revenue Model and the annual Origin-Destination Matrix.

64 In a randomized control survey, only a proportion of the population is surveyed or sampled (as specified by the sample ratio). This sample is statistically representative of the population. Those parts of the population that are not sampled (i.e. unobserved) are as such missing. In a randomized control experiment the missing part (i.e. those that were not surveyed) occurs entirely at random and is said to be missing completely at random (MCAR). Note however that more technical MCAR definitions are widely available in the literature; these require that the missing part is independent of all the variables that define the investigated problem, as well as of the unobserved parameters of the problem.
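For illustration, the ad-hoc strategies reviewed above reduce to one-line operations in R; a toy sketch (the data frame and its columns are hypothetical) makes plain how directly the missing values are deleted or made up:

```r
# Toy data with missing items (all names hypothetical).
df <- data.frame(journeys = c(2, NA, 3, 1, NA),
                 miles    = c(10, 12, NA, 9, 11))

# List-wise deletion: drop any record (row) with a missing item.
deleted <- na.omit(df)

# Simple means: replace each NA with the mean of its variable.
means <- as.data.frame(lapply(df, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}))

# LOCF: carry the last observed value forward (zoo::na.locf is one implementation).
locf <- zoo::na.locf(df$journeys, na.rm = FALSE)
```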

7.2.2 Heuristic imputation

Many consumer datasets pertaining to passenger travel behaviour reveal the idiosyncratic variability in passenger behaviour, which often reflects in a broad range of phenomena. Non-parametric heuristic imputation methods are often suited to situations where an exhaustive search for an optimal solution is not practical, but a satisfactory indicative solution would suffice. These are typically complex problems, like those posed by the non-linearity, mixtures, and hierarchical phenomena associated with mobility interactions. Parametric imputation methods depend on constructing models based on intrinsic prior knowledge of systems that are not fully understood, with no guarantee on the results of such typically cyclic development exercises. The non-parametric strategies, on the other hand, are iterative in nature and rely on large datasets, making them suited to novel large multivariate consumer datasets. Many non-parametric imputation strategies (support vector machines (SVM), K-nearest neighbours (KNN), etc.) have been developed over the years (Mazumder et al., 2010, Troyanskaya et al., 2001). Random Forest (RF) imputation, for example, is a non-parametric strategy for mixed data types (continuous, categorical, etc.) with the advantage of having in-built error estimates when used for applications that require data imputation. RF methods also have the advantage of being applicable without the need for a test dataset (Stekhoven and Bühlmann, 2012). A broad range of non-parametric imputation strategies is widely discussed in the literature (Paddock, 2002, Roy et al., 2017). A disadvantage of heuristic imputation methods is that they require expertise and experience to use effectively, as the technical limitations of the learning algorithms are not always established (Paddock, 2002). Heuristic methods tend to consist of a number of not-readily-accessible evolutionary stages, potentially leading to systematic errors and cognitive bias. Further, since heuristic methods do not define parametric models of the data, they are less suited to prediction. This is predicated on the fact that heuristic methods tend to rely on internal validation using a training and a test dataset. External validation would require quantifying the confidence in the prediction, by being able to ascertain the results in scenarios different from those presented by both the training and test data (Harrell et al., 1996). For urban mobility, in situations where a better understanding of movement patterns is sought, an optimal solution resulting from an exhaustive search of the solution space is desirable, leading to a preference for parametric imputation and analysis methods.
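As a sketch of how such a heuristic imputation is invoked in practice, the missForest package implements the Random Forest strategy of Stekhoven and Bühlmann (2012) described above; rail_df is a hypothetical data frame of mixed continuous and categorical ticket attributes:

```r
library(missForest)

set.seed(42)
imp <- missForest(rail_df)   # iterates Random Forest fits until imputations stabilise
rail_complete <- imp$ximp    # the completed data frame
imp$OOBerror                 # in-built out-of-bag error estimate; no test set required
```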

7.2.3 Parametric imputation

Parametric data imputation strategies are developed by first creating equations (the models) that relate sets of quantities as explicit functions of a number of independent variables. These variables are indexed by fixed values or probability distributions called the parameters of the problem. The parameters are unique characteristics of the problem (or population) being analysed. The analyst assumes that such a model explains the data generation process (i.e. the mechanism through which the data were created), and this model forms the basis for imputing any missing data values. A model of the parameter is created based on an intuition about the form of the probability distribution that would adequately represent the distribution of the parameter, and the likely distribution of the variables of the problem based on the data they contain. Over the years (Hald, 2008) standard forms of these models (equations) have been developed and established, empirically or theoretically, and found to be suitable for particular problems and scenarios. Parametric imputations (Allison, 2001) are principled methods, as they do not assume knowledge of the missing values but use the information and context of the observed data as the basis for estimating them. In practice, parametric methods variously employ statistical tools, Bayes' theorem, and stochastic and iterative strategies in deriving a solution, and can be developed using the so-called Frequentist or Bayesian paradigms65 (Bayarri and Berger, 2004).

65 In the Frequentist paradigm, for the case of a coin flipped with probability \(\theta\) of heads, if the coin is flipped 14 times and ends up tails 4 times, then the best (maximum likelihood) estimate of \(\theta\) is \(10/14\), which is a fixed quantity. Using standard notation, the probability of throwing heads \(H\) two times in succession is \(\Pr\{HH \mid data\} = (10/14)^2 \approx 0.51\). The Bayesian paradigm, on the other hand, is based on Bayes' rule: \(\Pr\{\theta \mid data\} \propto \Pr\{data \mid \theta\} \times \Pr\{\theta\}\). In essence, the posterior (distribution) of \(\theta\) given the data is proportional to the product of the likelihood of \(\theta\) given the data and our prior beliefs about \(\theta\). The toss of a coin is binomially distributed, so the likelihood of \(\theta\) is the binomial function. The posterior probability \(\Pr\{HH \mid data\}\) of throwing heads two times in succession is a binomial distribution tuned by our prior beliefs. If the prior belief is that any value of \(\theta\) is possible, then it can be shown that in the Bayesian framework \(\Pr\{HH \mid data\} = 0.49\). If the prior belief is such that the estimate of \(\theta\) reduces to its maximum likelihood value, then \(\Pr\{HH \mid data\} = 0.51\), which is equivalent in value to the Frequentist case.

A number of these principled parametric imputation methods have evolved in the literature (King et al., 2001, Schafer, 1997, Van Buuren, 2007); the most widely reported of these are maximum likelihood (ML), multiple imputation (MI), and Bayesian imputation (BI). Implementations of these imputation methods in currently available software (Buuren and Groothuis-Oudshoorn, 2011, Graham et al., 1996, Rubin, 1978) assume that the missing values are missing at random (MAR). The MAR assumption holds when the observed data values are sufficient to explain the missing data being imputed. In essence, the missing values are necessarily dependent only on the observed data values (see Chapter 2 and Chapter 3 for details), and the mechanism that caused the missing data is distinct from the analysis mechanism being inferred. Whilst the ML methods are only tenable when the data are MAR, the MI and BI methods can be developed by including missing data models to accommodate scenarios where the MAR assumption does not hold (i.e. when the data are MNAR66). The maximum likelihood methods (Allison, 2012, Yuan and Bentler, 2000) infer the missing data from likelihood functions of the observed data. To explain this in simple terms using standard notation, Bayes' rule (see also footnote 34) can be written in terms of the likelihood as \(\Pr\{\theta \mid data\} \propto \mathcal{L}\{\theta \mid data\} \times \Pr\{\theta\}\), where \(\theta\) is the model parameter and \(\mathcal{L}\) is the notation for the likelihood. In principle, modelling aims to find the value of the parameter \(\theta\) that best fits the data, and this is equivalent to ascertaining the value of \(\theta\) that maximizes the data likelihood \(\mathcal{L}\). When there are no missing values, and there are, say, \(M\) data variables with \(N\) observations each, the likelihood function is the product of the joint probability functions \(\mathcal{F}_i(\cdot)\) of each observation \(i\), where \(\theta\) are the model parameters being estimated. This is written in Equation 7 (Allison, 2012). In essence, \(\mathcal{F}_i(\cdot)\) can be seen as the contribution of observation \(i\) to the likelihood. If now there are missing values in the first two data variables, with, say, \(N_{comp}\) complete observations and \(N - N_{comp}\) observations with missing values in those first two variables,67 then the likelihood function becomes that shown in Equation 8, where \(\mathcal{F}_i^{*}\) is the joint probability density in the absence of the missing data values in the two variables.

66 A detailed explanation of missing data types have been presented in Chapter 2 and Chapter 3.

67 The likelihood function is deduced in the assumption that the missing data are MAR. - 183 -

\[ \mathcal{L} = \prod_{i=1}^{N} \mathcal{F}_i(data_{i1}, data_{i2}, \ldots, data_{iM}, \theta) \qquad (7) \]

\[ \mathcal{L} = \prod_{i=1}^{N_{comp}} \mathcal{F}_i(data_{i1}, data_{i2}, \ldots, data_{iM}, \theta) \prod_{i=N_{comp}+1}^{N} \mathcal{F}_i^{*}(data_{i3}, \ldots, data_{iM}, \theta) \qquad (8) \]

The \(\mathcal{F}_i^{*}\) is the joint probability density of the observed variables for the incomplete observations, deduced by integrating out the missing values. For brevity of the expression, replacing \(data_{ij}\) by \(t_{ij}\), the \(\mathcal{F}_i^{*}\) is derived from

\[ \mathcal{F}_i^{*}(t_{i3}, \ldots, t_{iM}, \theta) = \iint \mathcal{F}_i(t_{i1}, t_{i2}, \ldots, t_{iM}, \theta) \, dt_{i1} \, dt_{i2}. \]

In the case of missing data values, the maximum likelihood (ML) imputation finds the value of \(\theta\) that maximizes the expression in Equation 8, using any of a range of numerical algorithms (Dempster et al., 1977). The ML imputation has limitations if the missing values are in the dependent variable, as the likelihood expression then reduces to the complete-case scenario whereby observations with missing values are deleted (Allison, 2012). Another limitation of ML imputation is that, even when the missing values occur in the predictor variables, there are limited implementations of ML imputation in current commercially available software. An ML imputation contrivance developed in the literature is a two-stage expectation maximisation (EM) and Monte Carlo process (Honaker et al., 2011). The EM is an algorithm for estimating the means, variances, and covariances of a design matrix, as the model parameters are functions of these. A regression exercise solved by a Monte Carlo procedure is then used to estimate the missing values. Examples of the ML imputation procedure when the variables are assumed to have a multivariate normal distribution are available in the literature, and these highlight particular details of the ML imputation (Allison, 2012, Honaker and King, 2010).
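The two-stage EM contrivance of Honaker et al. (2011) is implemented in their Amelia II R package (via EM run on bootstrapped samples, under a multivariate normal assumption); a minimal sketch, with rail_df again a hypothetical analysis table, might read:

```r
library(Amelia)

a.out <- amelia(rail_df, m = 5)   # EM on bootstrapped samples; 5 completed datasets
summary(a.out)
plot(a.out)                       # compare observed and imputed densities
```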

Multiple imputation (MI) methods have wider application and are developing prominence in the literature (Azur et al., 2011, Horton and Lipsitz, 2001, Rubin, 1996). The MI methods are typically a three-stage process. First, the missing values (omissions) in the dataset are filled in with plausible values; these could typically be derived by an ad-hoc mean imputation. Second, a random error term is added to the regression expression created from the design matrix, whilst repeated regression imputations are performed, each repeat creating a set of plausible values for the missing data. Third, efficient inferences are created by applying complete-data methods to the sets of plausible values (Allison, 2001, Ibrahim and Molenberghs, 2009, Royston, 2005). Typically, only about 5-10 repeats are recommended in the literature to produce estimates that are consistent, asymptotically efficient and asymptotically normal (Azur et al., 2011, Buuren and Groothuis-Oudshoorn, 2011). A general-purpose MI method is multiple imputation by chained equations (MICE) (Azur et al., 2011, Buuren and Groothuis-Oudshoorn, 2011, White et al., 2011), which is also a three-stage process like MI, but the second stage is an iterative process whereby, after the missing data are randomly imputed, a new parameter value is drawn from the observed-data posterior distribution. This parameter value is used to impute the missing values, and then new parameter values are estimated. This process is repeated, thereby iteratively improving the estimates of the missing values. The process is akin to a Gibbs sampling procedure (Plummer, 2003, Smith and Roberts, 1993), and in implementing MI or MICE the convergence of the iterative second stage needs to be assessed, as well as statistically evaluating the quality of and confidence in the imputation results. The main drawback of MICE is that it has no formal theoretical justification (Marchenko, 2011, Rubin, 1996), and it is not geared toward accurately predicting the missing values but rather toward valid statistical inference in scenarios of missing data. In many applications, as in railway passenger mobility, the essence is to predict the missing values accurately, as these reflect, for instance, in the estimates of journey factors for season tickets, which in turn affect the comprehensive volumes estimated for passenger demand. Bayesian imputations in the standard form are premised on the data being MAR, just like ML, MI, and MICE imputation. However, Bayesian imputation can easily be adapted for application to scenarios where the data are MNAR, i.e. missing-not-at-random (see Chapter 2 and Chapter 3 for a detailed discussion of MNAR). This adaptation is achieved through the inclusion of a model to represent the statistical mechanism of the missing data (Lunn et al., 2012). Further, Bayesian imputation is unique in that the method simultaneously infers missing values as part of the system identification process. Bayesian imputation models are also suited to hierarchical phenomena and spatial non-stationarity, which are particular features of mobility interaction (Brunsdon et al., 1996, Van Nes, 2002).
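A sketch of the three-stage MI workflow using the mice package (Buuren and Groothuis-Oudshoorn, 2011) cited above follows; the outcome and covariate names are placeholders rather than the thesis variables:

```r
library(mice)

imp  <- mice(rail_df, m = 5, maxit = 10, seed = 1)   # stages 1-2: chained imputations
fits <- with(imp, glm(Freq ~ distance + cost,        # stage 3: analyse each completed set
                      family = poisson))
pool(fits)                                           # combine estimates by Rubin's rules
```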

Bayesian imputation is similar in concept to the procedure for MICE, in the sense that the parameter estimates are iteratively improved through an MCMC-based Gibbs sampling routine. In the Bayesian framework, however, alternative sampling routines can similarly be adopted to optimize the iterative convergence of the parameter estimation procedure. The basic Bayesian imputation strategy assumes that the data are MAR, and that the parameters responsible for the missing-data mechanism (\(\phi\)) are distinct from the parameters (\(\theta\)) of the parametric model. In the case of LENNON ticketing data, \(\theta\) and \(\phi\) are distinct (Rubin, 1976), since \(\phi\) is a design feature of the ticket measuring system whilst \(\theta\) is a property of the mobility dynamics.

In a fully68 Bayesian framework, an original covariate model is first created which regresses the outcome variable on all the other covariates. Then, additional models are created, each using one of the partially observed covariates as the outcome. This sequence is repeated for all the other partially observed covariates; as such, if there are \(M\) covariates with missing data values, then \(M\) sequential additional covariate models are created. The original outcome variable is not included in the additional sequence of models, as the influence of the original outcome on a missing covariate is implicitly included in the original model. Also, if multiple covariates have missing values, then once an additional model is created for a particular missing variable, that variable is not further included in any subsequent additional model. An additional feature of Bayesian imputation is the creation of a missing data model, which incorporates any additional knowledge the analyst might have about the mechanism of the missing data. If, for instance, in imputing the journey factors (i.e. the daily rate of ticket use), experience shows that longer-distance journeys are, for reasons of expedience, less likely to be made many times in a day, then this additional knowledge can be incorporated. The incorporation can be achieved, perhaps, by empirically determining how often a season ticket from, say, London to Manchester is used when compared to a similar shorter-distance season ticket from, say, Bradford to Leeds. A missing data model may then indicate that the journey factors for season tickets are a function of the travel distance variable within the analysis dataset. The details of such procedures are highlighted in the methodology within this chapter.

68 The adjective ‘fully’ is used here to indicate that the analyst makes the choice of prior in the Bayesian model, and a sensitivity analysis is conducted to decide whether the solution is influenced by the choice of priors.

7.2.4 Remarks

In this research, the Bayesian imputation framework is adopted because these methods increasingly have strengths previously only attributable to heuristic methods, and produce models suited for predictive analysis. LENNON ticketing data are supplied as cross-sections of tickets sold in thirteen 4-week periods per annum, forming a time-series cross-section (TS-CS) dataset. In the research presented in this thesis, only one cross-section (4-week period) is analysed at a time. Within each cross-section of data, it is assumed that there are no sharp jumps in mobility activity. This assumption is premised on the fact that within a rail industry period there are no sharp endogenous railway infrastructure changes (time-table alterations, new routes, lengthy line closures, etc.), and on this basis no sharp changes in mobility trends occur. In the rail sector, changes like time-table adjustments are only conducted after a period, and typically during fare rounds. As a result, in the imputation analysis, the data have been assumed to be identically distributed within a period. Hitherto, non-parametric heuristic methods would be adopted for the analysis of such TS-CS data, to accommodate time trends in the mobility variables. In addition, semi-parametric methods have been used to introduce heuristics which accommodate space-time variations, capturing non-linearities in TS-CS data by employing machine learning strategies (Shah et al., 2014). More recently, however, Bayesian methods have been applied to analyse time variations, as seen in publications in the literature (Brodersen et al., 2015, Dominici et al., 2002, Honaker and King, 2010). Although the TS-CS nature of the ticketing data is not fully exploited, due to time limitations and expedience on the research project, such analysis of TS-CS ticketing data is discussed as work anticipated for the future. Urban mobility tends to be a local-area phenomenon, and Bayesian models are increasingly suited to analysing such hierarchical and spatially non-stationary mechanisms (Carlin et al., 1992, Gelman and Hill, 2006, Van Nes, 2002). Bayesian models are suited to the analysis of hierarchical, non-linear, and complex phenomena like mobility on the railways. An ability to exploit such a framework while imputing missing attribute data is a particular advantage of the Bayesian modelling framework. Bayesian models also suit interaction phenomena, including the Poisson and negative binomial distributions used to model the variously dispersed count data associated with urban mobility (Lunn et al., 2012).

7.3 Technicality of LENNON data

Most analysis in the railways sector of Great Britain is currently based on ticketing data from LENNON (ATOC, 2003a). The LENNON database consists of over 2.7m daily ticket transactions sold from over 8000 ticket machines in stations, on trains and over the web. The revenue from these tickets is polled, validated and apportioned by the Rail Settlement Plan (RSP) to about 30 associated train operating companies (TOCs). Prior to 2008 an origin-destination matrix (ODM) was created directly from the LENNON ticketing data and then infilled with Travelcards and Airport links tickets sold by Transport for London (TfL) (ORR, 2014b). The ODM is a contingency table relating origin and destination train stations, made up of journeys made per ticket, resulting revenue and passenger travel distance (ORR, 2012a). Since 2008 an industry-owned replacement demand matrix called MOIRA69 has been created from LENNON, augmented by TfL Travelcards, airport links, and tickets sold in major urban areas by Passenger Transport Executives (PTEs)70. MOIRA is now used to create ODMs. The PTEs (also called PTAs71) within the UK are shown in Figure 7.1. The Office for Rail Regulation (ORR72) publishes annual ODM statistics on rail passenger travel in England, Wales and Scotland. There are known shortfalls in the current ODMs resulting from the shortfalls in the LENNON ticketing data. These shortfalls, highlighted in Chapter 2 and Chapter 3, include missing information on passenger daily trip rates (journey factors) for season tickets, accurate tenures of flexible tickets, and precise flows to grouped stations.

69 Recall that MOIRA is the acronym for ‘model of inter-regional activity’, the railway industry’s reference demand forecasting software for assessing the impacts of supply service (time-table) changes on passenger demand, enabling strategic planning.

70 Passenger Transport Executives (PTEs) sponsor ticket sales in major urban areas outside Greater London. Transport ticket products sold in the various PTE zones tend to be unique to that zone; as such, mobility analysis for a particular PTE would typically be tailored. About 28 train operating companies (TOCs) manage trains and stations across regions of Britain.

71 The PTAs are the Passenger Transport Authorities, renamed Integrated Transport Authorities (ITAs) since the Local Transport Act of 2008.

72 The Office for Rail Regulation (ORR) has been called the Office of Rail and Road (ORR) since 1st April 2015.

Figure 7.1 Major PTE Urban Areas in UK (excluding Greater London).


The current industry strategy for resolving the missing data gaps in LENNON is to use proxy trip rates for the journey factors. These proxies are empirical estimates from dated surveys. Similar proxies are developed for flexible-tenure tickets. Flows to group stations, and any unknown direction of passenger travel, are resolved by reconciling flow estimates with those of dated surveys like the National Rail Travel Survey (NRTS), and then randomly assigning ticket flows to achieve the NRTS estimates. Infilling and adjustments for PTE tickets are based on ad-hoc manual fixes. For example, MetroCard Zone 1-5+6 tickets (sold by regional operators) are unlikely to be used on a Harrogate-Leeds journey, as the cheaper Rail Settlement Plan (RSP) tickets (sold by UK National Rail) would be the more likely ticket choice. Various private companies (SDG, 2011, SDG, 2012, SDG, 2016, Taylor, 2013b) have severally been commissioned, in collaboration with rail industry working groups (ORR, 2014a, ORR, 2012b), to produce improvements to MOIRA, but the shortfalls persist and have impacted the consistency and accuracy of the annual flows reported (ORR, 2014c). This calls for a review and validation, in line with current scientific best practice, of the infilling and imputation strategies used to augment LENNON and produce MOIRA; this forms the crux of the work presented in this chapter. As mentioned earlier (in Section 2.3), geographic datasets arranged in rectangular attribute tables consist of records (rows) representing tuples and fields (columns) representing variables. As introduced earlier, each row in the LENNON table represents a ticket sale to an individual (or to groups who purchased ticket products with the same description). Each ticket includes attributes like purchase time (monthly period, in the case of the anonymized dataset supplied to the project), station entry (origin) and station exit (destination), estimated journey factor (number of journeys per ticket), estimated track miles covered per ticket and cost, ticket type, and additional product codes describing further details about the ticket. Missing data can occur at a primary level, when one or more complete records (rows) are missing; this is called unit omission. At a secondary level, when values for one or more variables within a unit are missing, it is called item omission. The scenario of unit omission is beyond the infilling scope of this thesis, as it is assumed that all ticket sales are known, but that some ticket details (journey factors for season tickets, precise flows to grouped stations or PTE stations, etc.) necessary for accurate forecasts of passenger demand are missing; this scenario is referred to as item omission.

The missing journey factors for season tickets have historically been estimated using ad-hoc methods (ORR, 2012b, ORR, 2014c), resulting in the fractional journeys highlighted in the snippet of LENNON data in Table 3.1 (Section 3.2.2). Another example of an item omission within the West Yorkshire ticketing data is entry and exit stations described as Bradford BR, Pontefract BR, Wakefield BR, etc., with the BR (British Rail) suffix. The train stations with BR name extensions do not exist on the rail network; within the LENNON database they represent a group of stations, with the precise flows to individual stations within the group unknown (missing). The illustration in Figure 7.2 shows flows from, say, Leeds to Bradford Forster Square, as distinct from flows from Leeds to Bradford Interchange. Prior to validation, the LENNON ticketing database would often not distinguish these two flows, describing them both as from Leeds to Bradford BR.

Figure 7.2 Bradford (BR) Group73 of train stations.

73 Typically in the LENNON dataset, it is observed that lengthy journeys to railway stations located in close proximity and serving overlapping regions are ascribed to a group name. However, shorter journeys to the same train stations would be assigned tickets indicating the specific name of the entry and exit station. There is no specific indication of the length of journey that warrants a flow being assigned to a group station, and perhaps the criteria for assignment are a historical nuance in the railway industry.

The Bradford Interchange and Bradford Forster Square stations are in close proximity (~700m), but the mobility flows to these stations are via completely different routes, and when precise flows to these stations are not recorded within LENNON, the flows are unknown (unobserved) and structurally classed as an item omission. Such missing information on precise flows to group stations is relevant in creating a rich picture of mobility, and is necessary for causal inference on the drivers of mobility on the rail network. Item omissions can further be sub-divided into three patterns: univariate, monotone, or random. Univariate omission is where all the missing values occur within one single variable. An example would be if the journey factors for season tickets were the only missing data values in the LENNON datasets; such an omission would be classed as univariate as it only occurs in the variable for journey factors. Monotone omission is where the missing values occur systematically across the set of variables being investigated. An example would be when, say, regional PTE MetroCard tickets are simply appended to LENNON tickets from RSP. If the PTE tickets consistently lack precise information on specific entry and exit stations, journey factors and distances covered, such systematic missing information (across the PTE tickets) would lead the entire dataset to be described as having a monotone pattern of missing values. In some instances, a monotone pattern of the missing values is not readily apparent from visual inspection of the data table. However, there exists software that enables an assessment of a data table for monotone patterns by re-organizing the data variables and tuples to see if any monotone patterns are exposed. There exist special imputation algorithms particularly suited to monotone patterns of missing values (Horton and Lipsitz, 2001, Van Buuren, 2007, Yuan, 2010), so identifying such patterns enables these specialist imputation methodologies to be exploited. A random pattern of omission is the case where the missing values are randomly scattered across the data table, with no obvious deducible systematic pattern. A classic example would be a randomized control trial, where all the measurements that were not captured during the survey would be classed as missing, with a random pattern of missing values.
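One example of such software is the md.pattern() function of the R mice package, which tabulates the joint missingness of the variables and re-orders them so that any monotone structure becomes visible (rail_df is a hypothetical ticketing table):

```r
library(mice)
md.pattern(rail_df)   # rows = distinct observed/missing patterns, with marginal counts
```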


7.4 Method for Bayesian imputation

The methodology for Bayesian modelling and imputation is presented in this section. The main regression model is presented, containing the outcome variable and covariates. Some of these covariates have missing values, so other relevant covariates created from m-IPF and GIS-GTFS are selected (see Little's Test in Section 3.2.3), such that the combined set of variables is sufficient to explain the missing data. As mentioned earlier, in the Bayesian framework separate data models are created for all the covariates with missing values. These covariates are the journey factor, the ticket validity period, and the group station flows. For each model, directed acyclic graphs (DAGs) are created (Pearl et al., 2016). DAGs are directed graphs used to visually portray the crystallisation of variables in the data generation sequence. DAGs conceptualize the relations between the covariates, and provide a systematic approach to modelling consumer and observational data, as a basis for subsequent objective causal inference (Pearl et al., 2016). Once the DAGs are created and reconciled with the regression and imputation models, the BUGS Bayesian software (Lunn et al., 2000, Plummer, 2003, Spiegelhalter et al., 2007) is used to identify the parameters74 of the model. Finally, the regression results for the main model are presented, alongside the imputation results for those covariates with missing values. These identified missing values (journey factor, ticket validity period, and group station flows) from the Bayesian modelling are the behavioural passenger attributes which, when infilled into LENNON, further enhance the ticketing data. The LENNON ticketing data, having been enhanced by m-IPF, GIS-GTFS, and Bayesian imputation, form a requisite dataset for the subsequent mobility analysis presented in the next chapter (Chapter 8).

7.4.1 The Bayesian set up

In the Bayesian framework, modelling and imputation occur simultaneously. Whilst the modelling is where the conventional parameters of the regression model are identified, the imputation is the identification of values for any missing covariate data items. The Bayesian regression model is similar to classical regression in that a model of the sampling distribution of the data is specified, and a relationship is then formed between the distribution of the response variable and the explanatory variables. The difference, however, is that in the Bayesian modelling framework prior distributions are specified for the regression coefficients and unknown parameter values (Gelman et al., 2014b, Lunn et al., 2000, Plummer, 2003), instead of assuming that they are constant as in classical frequentist regression (Best et al., 2000, Ewing, 1974, Gardner et al., 1995, Jaccard and Turrisi, 2003). The Bayesian approach is illustrated in Equations 8, where in Equation 8a the response variable \(Y_i\) is count data modelled as a negative binomial distribution. Equation 8b is typical of a regression model in the classic frequentist approach, where \(X_k\) represents the \(k\)-th covariate with corresponding coefficient \(\beta_k\). The inclusion of Equation 8c is the difference between Bayesian and classical regression modelling. Equation 8c makes provision for specifying the priors (currently left blank). The choice of priors is a pertinent consideration, typically developed during model implementation in Bayesian modelling. The choice of priors could be informative, vague or non-informative, improper, or some mixture of these. As a general rule, if a sensitivity analysis indicates that the choice of prior is influential on the results, then robust conclusions cannot be drawn from the data model. Under such conditions a more informative prior (based on, say, industry knowledge of the particular problem) or an alternative model should better inform the prior.

74 Note that in the Bayesian framework, parameters refer to both the traditional system parameters and the missing values.

\[ Y_i \sim \text{Negative Binomial}(\rho, \gamma) \qquad (8a) \]

\[ \log(\mu) = \beta_0 + \sum_{k=1}^{p} \beta_k X_k \quad \text{where} \quad \mu = \gamma \left( (1-\rho)/\rho \right) \qquad (8b) \]

\[ \text{with priors} \quad \beta_0 \sim \text{dflat}(\;); \quad \beta_k \sim \text{Normal}(\,,\,); \quad \gamma \sim \text{Gamma}(\,,\,) \qquad (8c) \]
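A BUGS-language rendering of Equations 8a-8c might look like the sketch below; the hyperparameters, and the diffuse dnorm used in place of the OpenBUGS dflat() so that the code also runs in JAGS, are assumptions rather than the thesis specification:

```r
nb_model <- "
model {
  for (i in 1:N) {
    Y[i] ~ dnegbin(rho[i], gamma)                  # (8a) negative binomial count
    rho[i] <- gamma / (gamma + mu[i])              # parameterised so E[Y[i]] = mu[i]
    log(mu[i]) <- beta0 + inprod(beta[], X[i, ])   # (8b) log-linear predictor
  }
  beta0 ~ dnorm(0, 1.0E-6)                         # (8c) vague priors (dflat() in OpenBUGS)
  for (k in 1:p) { beta[k] ~ dnorm(0, 1.0E-4) }
  gamma ~ dgamma(0.1, 0.1)
}"
```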

In Bayesian imputation, the missing values in any of the covariates are simply left as null (NA), and during the simultaneous process of solving the regression model any missing values are inferred from the posterior, just as the unknown model parameters are identified. During the solution, an appropriate sampler is selected based on the composition of the model, such that the posterior samples converge to the appropriate posterior distribution. As with many principled imputation strategies, the Bayesian framework implicitly assumes that any missing data values in the covariates are missing at random (MAR). In this research, considerable effort is made (through the m-IPF and GIS-GTFS applied to the LENNON ticketing data) to achieve a dataset that is MAR. However, as alluded to earlier, a strength of the Bayesian framework is that it admits a missing data model to ensure that the imputed values are a true reflection of the mobility mechanism being modelled. As such, a main mobility model is created to reflect the relationship between the count of passengers and the other covariates (Equation 8). This mobility model is then complemented by missing data models for the journey factor, the ticket validity period, and the flows to group stations (the details are presented below). For prediction and imputation, the comprehensiveness of a mobility model is a desired feature (Gelman, 2007, Knutti and Sedlácek, 2013), and in such exercises, as has been done in this research, the relevant covariates derived from the spatial microsimulation (m-IPF) and network simulation (GIS-GTFS) are included. This ensures the inclusion of the heterogeneous effects of all the variables in the mobility phenomena recorded. For causality and intervention case studies, however, there is a need to be more parsimonious, by creating models that capture the essence of mobility while simplifying its features (Angrist et al., 1996, Ho et al., 2007, Pearl, 2009, Shadish et al., 2002). The detailed model specification, with the covariates designed for imputation and prediction, is presented in the following sections within this chapter, along with the directed acyclic graphs (DAGs75) (Pearl et al., 2016). The detailed list and names of the exogenous and endogenous covariates used to build the Bayesian imputation model can be found in the scripts in Appendix D, included in the CD accompanying this thesis.

7.4.2 Main mobility model

The regression mobility model of interest is designed to exploit the range of modelling capabilities within the BUGS Bayesian modelling framework. The model constructed is described as a negative binomial, centred, spatial-interaction, non-linear, hierarchical model, shown in Equation 9. The model is negative binomial because the outcome variable is the count of passengers on a particular origin-destination (OD) flow on the rail network. Due to the aggregation of the LENNON ticketing data supplied to the project, the count is made up of passengers who have purchased similar ticket products, so that the counts are disaggregated by detailed specification of ticket product.

75 A directed acyclic graph (DAG) models probabilities, connectivity, and causality between variables. DAGs use nodes denoted by circles, and edges connecting the nodes, as primitives. The acyclic nature of DAGs implies that a trace from any first node to other nodes will not re-visit the same node twice. This feature of DAGs makes them particularly useful for probabilistic causal inference and intervention studies.

As widely established and reported in the literature, the likely distribution of count data is typically Poisson or negative binomial (Gardner et al., 1995). The choice of negative binomial evolved during the project, as such a model was seen to be less sensitive to the choice of parameter priors, and yielded MCMC chains that converge efficiently. The choice of likelihood function is typically part of the model development stages. Negative binomial distributions are known in the literature to be better than the Poisson distribution when modelling count data with high variability (over-dispersion) (Gschlößl and Czado, 2008, Rodríguez, 2013), non-uniform variances (i.e. heteroscedasticity), and many zeros in the design matrix, i.e. zero-inflation (Gardner et al., 1995, Greene, 1994).

The model is also described as centred because each continuous covariate is normalised by subtracting a value (equivalent to the mean of the particular variable) from every value of that variable. The practical advantage found in centring is that the MCMC chains developed to identify the parameters and missing values converge more efficiently, and with narrower credible intervals, indicative of a more precise solution. As reported in the literature (Lunn et al., 2012), centring is useful in reducing the correlation between parameters of the model, thereby enabling the MCMC algorithm to explore the parameter space more efficiently (Albert, 2009, Kruschke, 2014, Lunn et al., 2012).

Further, the main mobility model is described as spatial-interaction non-linear because the entry and exit stations are included as categorical covariates of the model. These entry and exit points reflect the mobility interaction of passengers in space. In addition, the main model includes the logarithm of a variable derived from the along-track distance travelled by passengers. The non-linearity is introduced by taking the log of the distance, and the parameter of this variable represents the propensity of passengers to resist travelling longer distances between an origin and destination. Finally, the model is described as hierarchical because a distinction is made between passengers who use one-way, return, and flexible season tickets. The assumption is that passengers on flexible tickets behave quite distinctly from those on inflexible tickets. By creating a hierarchical model, all passengers belong to the same hyper-group of railway passengers, but belong to separate sub-groups depending on the particular ticket type that the passenger uses. The notion of hierarchical models is founded on the concepts of exchangeability and conditional dependence (De Finetti, 1972).
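The hierarchy described above can be sketched in BUGS by letting each ticket-type sub-group draw its coefficient from a common railway-passenger hyper-distribution; the fragment below is illustrative (the names and hyper-priors are assumed), not the Appendix D specification:

```r
hier_fragment <- "
model {
  for (t in 1:n.ticket.types) {
    beta.ticket[t] ~ dnorm(mu.beta, tau.beta)   # sub-group effect for each ticket type
  }
  mu.beta  ~ dnorm(0, 1.0E-4)                   # hyper-priors shared by all passengers
  tau.beta ~ dgamma(0.1, 0.1)
}"
```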

The main model in Equation 9 was alluded to in presenting Equation 8 earlier. Equation 9 is structured similarly to Equation 8, but is reproduced more elaborately, with the specification of the priors excluded. The specification of priors is part of an elaborate endeavour in Bayesian modelling, and typically an iterative development exercise. This exercise involves assessing both the sensitivity of the model results to the choice of prior, and the effects of the model initial conditions on the convergence of the model.

The outcome variable \(Y[i]\) represents the \(i\)-th passenger count for journeys from station entry \([O_i]\) to exit \([D_i]\). Each passenger in the count \(Y[i]\) made a journey with an \(i\)-th type of ticket that was used \(J\) times per day. Note that the ticket type is a categorical variable, so that, for instance, the \(i\)-th ticket and the \(k\)-th ticket within the ticket-type variable could be the same. Each of these tickets has a validity period \(V\). The journey factor and ticket validity (\(J\) and \(V\)) are rates, and as such are operated on by the log (i.e. as an offset), as indicated in Equation 9b. Subsequently, \(J\) and \(V\) can be transferred to the right-hand side of Equation 9b, as is the practice in dealing with rates and offsets in classic regression models (Frome, 1983, Zeileis et al., 2008). There are missing values in the covariate for journey factors (\(J\)) and in the covariate for ticket validity period (\(V\)). The Bayesian framework treats any missing observations as parameters of the model, just like the coefficients \(\beta\), which are yet to be identified and as such unknown. In the BUGS implementation of the Bayesian modelling (Lunn et al., 2012), the solutions are implemented by optimizing the choice of sampling technique (e.g. Gibbs sampling, Metropolis-Hastings, rejection sampling) adopted in the MCMC algorithm.

\[ Y[i] \sim \text{Neg. Binomial}(\rho_i, \gamma) \quad \text{where} \quad \mu_i = \gamma (1 - \rho_i)/\rho_i \qquad (9a) \]

\[ \log\!\left(\frac{\mu_i}{J_i V_i}\right) = \beta_o[O_i] + \beta_d[D_i] + \sum_{m=1}^{\varpi} \beta_{mis}[M[i]] + \sum_{k=1}^{\nu} \beta_k[X_{ki}] + \beta_{OD}[D_{OD}] \qquad (9b) \]

Note that the \([O_i]\) and \([D_i]\) in Equation 9b refer to the station entry and exit respectively, which are distinct from the passenger's residence of origin and final work destination, which in other contexts can also be described as an origin-destination. In developing the mobility model, both the residence and entry station, and the exit station and final destination, were considered.

In traditional spatial interaction models, \(Y[i]\) would be structured as a matrix (Dennett, 2012), but in this instance the variables are left in vector format, due to the increased number of variables associated with each \([OD]\) pair. These variables include ticket type, journey purpose, etc. In essence, due to the increased variability in the datasets, a specific \([OD]\) pair could be traversed by passengers with different ticket types; as such, it is not possible to aggregate into \([OD]\) pairs and sustain the traditional rectangular \([OD]\) interaction matrix. The vector \([OD]\) format thus stems from the need to exploit the variability in the ticketing data by distinguishing the heterogeneity of passengers on the same \([OD]\) flow. The \([M[i]]\) in Equation 9 represents an internal nested-indexing nomenclature used to represent categorical covariates (Lunn et al., 2012). The nested indexing precludes the need to re-model each categorical covariate as separate binary variables, as is typical in regression (Long and Freese, 2006). As depicted in Equation 9, there are \(\varpi\) categorical variables in the model, and these are represented by \([M[i]]\) when associated with the \(i\)-th passenger count. For example, one of the categorical variables in \([M[i]]\) is the group station indicator variable. There are missing values in that group station indicator covariate, as described next.

Recall that during the data pre-processing, if a single flow occurred to a group of stations, we would not know through which particular station within the group the flow actually occurred. If there are, say, 4 stations A, B, C, D in a group, and a ticket indicates a flow to this group of stations, then there is no way of identifying whether the passenger actually travelled through station A, B, C, or D. The analysis procedure developed consists of augmenting that single flow by replacing it with four flows: one to A, one to B, one to C, and one to D. These form a set of augmented flows, and are identified as such in the consumer data table. However, each set of augmented flows is actually a single flow multiplied by the number of stations in the group. As the particular station indicated by the flow is not known, all the augmented flows in the Bayesian modelling framework are indicated as NA, with flows to non-group stations indicated as '1'. As such, the condition imposed on the group station indicators is that the NAs for augmented flows to the group stations sum to '1'. In essence, \(\sum_{i=m}^{n} I_i = 1\), where \(I\) is the group station indicator, \(m\) is the index of the first station in the group, and \(n\) is the index of the last station in the group sequence.
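One way of encoding this constraint in the BUGS language is to treat the indicators for one group as a one-hot vector: a multinomial with size 1 forces the NA indicators to sum to one, so the sampler imputes which station in the group actually received the flow. The sketch below is illustrative (the Dirichlet prior and index names are assumptions), not necessarily the Appendix D implementation:

```r
group_fragment <- "
model {
  # I[m:n] supplied as NA for the augmented flows of one group station;
  # dmulti with size 1 enforces sum(I[m:n]) = 1.
  I[m:n] ~ dmulti(pi[1:(n - m + 1)], 1)
  pi[1:(n - m + 1)] ~ ddirch(alpha[])   # prior shares across stations in the group
}"
```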

In Equation 9b, \([X_{ki}]\) represents the continuous variables, of which there are \(\nu\). The \([D_{OD}]\) represents the non-linear negative exponential (or power law) distance decay, which is used to model the propensity of passengers to resist travelling further distances to fulfil activities. The power law and negative exponential models of distance decay are established in the spatial interaction modelling literature (Dennett, 2012, Wilson, 1971).

In formulating the models, an assumption of missing at random (MAR76) is made, premised on the additional variables from the NRTS, Census, and GIS-GTFS network simulation being necessary and sufficient to explain the missing values in the LENNON datasets. As such, the 'missingness'77 of the enhanced rail data (created from the spatial microsimulation of NRTS-Census-LENNON data and the GIS-GTFS network simulation) can be described as MAR. To complement the development of a robust modelling framework, and to forestall a situation where the missing data might not be MAR as assumed, main mobility models are constructed to describe the fundamental passenger mobility interaction. In addition to these main models, additional missing data models are created for those covariates with missing data values. Further to these two models (i.e. main and missing data), missingness models are constructed where knowledge is available of the statistical mechanism through which the missing values became unobserved. Any further mobility information available to the analyst is incorporated within this missingness model. Directed acyclic graphs (DAGs78) are further created to visually represent the conditional dependencies of the covariates used in the models (as subsequently explained).

76 The missing at random (MAR) concept was discussed earlier in Chapter 4, and refers to the situation where the observed values (\(X_{obs}\)) capture the entire essence of the system parameters \(\theta\), and are sufficient to explain the difference between the observed and missing data (\(X_{mis}\)). In such circumstances the missing data are uninformative, and the data are MAR.

77 The phrase ‘missingness’ refers to the statistical mechanism of missing data, distinguishing whether the missing data depend on the magnitude and/or values of the observed data or on those of the missing unobserved data (Rubin, D.B. 1976).

78 DAGs are the basis for solutions provided by Bayesian networks; a DAG is a probabilistic graphical model used to represent the conditional dependencies of the covariates.

7.4.3 Missing covariates models

The methodologies for the development of the three imputation models are presented. The models enable the infilling of missing values in the covariates for journey factor, ticket validity period, and flows to group stations. Aspects of the RStudio software code used to implement the methodology are presented, to enable the results created to be reproduced and to highlight the strengths of the Bayesian framework. The full R scripts of the model specification used to implement the Bayesian imputation are described in Appendix D, and included in the CD accompanying the thesis.

7.4.3.1 Missing journey factors

The journey factor is the daily rate of ticket use. The LENNON information on season ticket products does not reveal how many times a ticket is used. As such, for season tickets, the journey factors are required to be imputed to attain accurate passenger volumes and forecasts of demand on the rail network. Single and day return tickets are respectively used once and twice daily, so their journey factors are fixed at 1 and 2. Season tickets, on the other hand, have no restrictions on their daily rates of use, and can be used by the passenger an unlimited number of times daily. As a result, specific journey factors for season tickets are unknown (missing values). In the Bayesian framework, once a main model describing the phenomena being modelled is created, additional imputation models can be included to complement this model; further models can also be created to exploit any disparate information that might be available. The Bayesian framework is particularly useful in this regard, creating models that harvest data from different sources. Additionally, the Bayesian BUGS language is declarative (as opposed to the procedural traditional statistics packages), meaning that the order of presentation of the models is unimportant. These concepts are applied in creating imputation models for those covariates with missing items. A set of chained models is used to impute the missing journey factors. In the Bayesian framework, these models are implemented in parallel, so that feedback is achieved between the variables of each of the models. The journey factor imputation models are stated formally in Table 7.1 (with further details in Appendix D and the CD accompanying the thesis). The journey factor imputation is made up of three chained models run in parallel (see Table 7.1). Model 1 within Table 7.1 is the main mobility model, relating the count of passengers (i.e. 'Freq') used as the outcome variable to the origin and destination zones, and a range of potential confounding covariates. This model is derived from Equation 9, and is a negative binomial model. Model 2 in Table 7.1 is the actual imputation model for the missing journey factor values. In the Bayesian framework, the outcome variable in the missing data model is the covariate with missing values, in this case the journey factor. Model 2 is a truncated Poisson regression model relating the journey factor rates to the passenger origins and destinations (as well as a range of potential confounding covariates). In the Bayesian framework, the count (i.e. the 'Freq' outcome variable) is excluded from the missing covariate model. Similarly, any subsequent missing data models will not include previously used outcome variables (Kruschke, 2014, Lunn et al., 2012). However, other explanatory variables can be included in the missing data model. The third model, Model 3 (within Table 7.1), is a specific missingness model for the journey factors. The addition of this sort of model enables the incorporation into the modelling process of any additional specific knowledge about the missing journey factors. In this case, the statistical mechanism through which the daily rates of ticket use (journey factors) became missing indicates that the propensity for high daily rates of ticket use increases with shorter travel distances. Model 3 is a Bernoulli model. Further details of the models are included below, as Model 1, Model 2, and Model 3 complement one another.
It is noteworthy that other modelling constructs were considered in the research as part of the model development process. For example, for the main mobility model (Model 1), a Poisson regression was investigated, as the outcome variable is count data. However, to accommodate the large variability in the rail mobility phenomena, the negative binomial model yielded better Bayesian convergence characteristics. A mixture-of-normals model was similarly explored for potential use as the journey factor imputation model (Model 2). The mixture-of-normals model could potentially reflect passenger use of restricted single and return tickets, as distinct from passenger use of the more flexible season tickets, by capturing the categories of passengers who use the different types of tickets. However, the mixture-of-normals model negatively affected the efficient convergence characteristics of the main mobility model, and was therefore not adopted for Model 2. The Bayesian modelling framework has the potential to yield models that better reflect realistic phenomena; however, at the current state of the art of Bayesian modelling, considerable time is typically invested in developing such models.

Table 7.1 Formal model of the missing ‘journey factors’.

Variable (Outcome (O) or Exposure/predictor (E)) | Model 1: passenger flows | Model 2: journey factor | Model 3: missingness
Passenger count (Freq)                           | O/NL                     |                         |
Origin - station entry                           | E/N                      | E/N                     |
Destination - station exit                       | E/N                      | E/N                     |
Distance (journey, access, or egress)            | E/N                      | E/N                     | E/U
Cost (monetary, or time)                         | E/N                      | E/N                     |
Daily rate of ticket use - journey factor        | E/N                      | O/PL/T                  | E/N
Ticket validity period (days/months)             | E/N                      | E/N                     |
Group station flow                               | E/N                      | E/N                     |
Missingness indicator                            |                          |                         | O/BL

The type of outcome likelihood model (i.e. Poisson, negative Binomial, etc.) is indicated in the table above. The likelihood (L) in the journey factor imputation models can be one of the following: negative Binomial (NL), Poisson (PL), or Bernoulli (BL). For example, in Model 1 the passenger count is the outcome variable and its likelihood is modelled as a negative Binomial regression, so the passenger count (Freq) row for Model 1 reads O/NL (i.e. an outcome variable (O) with a negative Binomial likelihood (NL)). Had the likelihood been modelled as a Poisson regression, the passenger count row for Model 1 would have read O/PL (with PL indicating a Poisson likelihood); BL would similarly indicate a Bernoulli likelihood model.

The prior specification for an exposure (E) variable can be modelled by a wide range of functions. In Table 7.1 the prior specifications adopted were normal (N), gamma (G), and uniform (U) distributions. For example, the distance exposure variable as used in Model 2 has a normal prior, so the distance entry in the column for Model 2 reads E/N (i.e. an exposure variable (E) with a normal prior (N)). It is noteworthy that the parameters of a prior distribution (say the mean and variance of a normal distribution) can in turn have priors of their own. In Bayesian models, considerable attention is also given to the choice of initial values (detailed in the appendix) to ensure that stationary convergence is achieved efficiently (i.e. in reasonable time).
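As a small hedged illustration of such hierarchical priors (the names b.dist, mu.b and tau.b are illustrative, not those of the thesis scripts), a coefficient's normal prior can itself be given hyperpriors in BUGS/JAGS syntax:

    # Normal prior on a distance coefficient, whose mean and precision
    # in turn have their own (hyper)priors; names and values illustrative
    b.dist ~ dnorm(mu.b, tau.b)
    mu.b ~ dnorm(0, 1.0E-4)
    tau.b ~ dgamma(0.01, 0.01)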

Unlike frequentist models, Bayesian models allow such constructs to be included in the modelling process with relative ease. The Bayesian framework has a number of constructs that enable industry experience and knowledge to be incorporated in the imputation process, facilitating the construction of more realistic models. For instance, the missing journey factors for season tickets have been researched, and estimates are available from industry experience (SDG, 2016). Industry estimates suggest that journey factors for season tickets lie on average between 1 and 4 uses daily. This passenger behaviour derived from industry knowledge can be incorporated by truncating the journey factor to lie between 1 and 4, using the T( , ) construct; this has been included in Model 2 in Figure 7.3. A more realistic truncation is achieved by implementing a truncation limit that varies with the particular ticket product the passenger uses. This is captured by introducing a variable lower journey factor limit within the construct T(Jest[i], 4), where Jest[i] is the industry journey factor estimate for the type of ticket product associated with the i-th passenger or group. The truncation construct used to ensure that the journey factor lies between, say, 1 and 4, dependent on the distance travelled, is indicated in Table 7.1. Where an outcome variable is truncated (T) or censored (C), this is appended to the appropriate entry; for example, the journey factor in Model 2 is a truncated outcome variable indicated as O/PL/T (with the T indicating truncation).
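In BUGS/JAGS syntax, the product-specific truncation amounts to a one-line change to Model 2 (a sketch only; Jest would be supplied as data):

    # Product-specific lower truncation limit on the imputed journey factor:
    # Jest[i] is the industry estimate for the ticket product of passenger i
    jnf[i] ~ dpois(mu2[i]) T(Jest[i], 4)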

A further missing data model can be added to reflect that the propensity to make repeated daily journeys reduces as the length of the journey increases. For instance, a long journey between London and Edinburgh is less likely to be made several times in a day than a shorter journey from Leeds to Wakefield. This knowledge can be incorporated in the missing data model, as shown by Model 3 within Figure 7.3 (see footnote 79). In essence, the propensity for higher daily rates of use of season tickets decreases as the distance travelled with the ticket increases. In Model 3 (within Figure 7.3), a missing data indicator (miss.jnft[i]) indicates whether the journey factor is missing, forming the basis for implementing the missing data model. For the Bayesian enthusiast, the pseudocode specification of the model is included in Figure 7.3. Further detailed specification of this journey factor model is included in Appendix D and the CD accompanying the thesis.

79 Model 3 was subsequently excluded from the analysis, as in this instance the inclusion of an extra missing data model did not improve the solution.

Figure 7.3 Missing 'Journey Factors' script.

The Directed Acyclic Graph (DAG) created to portray the relationships between the variables and covariates of the main model, the journey factor imputation model, and the missing data model is shown in Figure 7.4. The DAG serves to combine the individual declarative mobility and imputation models, and forms the basis of the MCMC algorithms implemented in the BUGS software. The DAG layout of Figure 7.4 is unconventional in two regards. First, the covariates of the model are not explicitly specified: $\theta_F$ represents the combined covariate parameters of the negative Binomial model (Model 1), $\theta_J$ represents the combined parameters of the missing journey factors model (Model 2), and $\theta_D$ represents the combined parameters of the journey factor missingness indicator model (Model 3). This layout was chosen to facilitate easy explanation of the DAG and for visual simplicity (DAGs being notorious for looking complicated); while implementing the model, however, the parameters associated with each covariate are explicitly specified. The second unconventional aspect of the DAG presented in Figure 7.4 is its layout, which was structured so that the relationships between the variables are visually easy to assimilate. Traditionally, DAGs are laid out from left to right (Pearl et al., 2016), starting with the explanatory variables and then any confounders; the variables are typically laid out in the sequence in which they crystallise during the data generation process, with the outcome variable at the far right of a typical DAG.

In the DAG created and shown in Figure 7.4, $F_i$ ('Freq') represents the count of passengers, $X_i$ are the observed covariates, and $J_i$ are the journey factors (split into observed and missing components). $M_i$ is the journey factor missingness indicator, a fully observed binary variable made up of the components $M_i^{mis}$ and $M_i^{obs}$: when a journey factor value is missing, $M_i$ is '0', forming $M_i^{mis}$; otherwise $M_i$ is '1' when the journey factor is observed, forming $M_i^{obs}$. The $\theta_F$, $\theta_J$, and $\theta_D$ have already been introduced in the previous paragraph. The left and right sides of the DAG are models for the observed and missing data values respectively. $F_i$ is the outcome, with confounders ($\theta_F$, $X_i$, $J_i$) pointing towards it; this models the fact that the count of passengers is related to the combined mobility system parameters, the variables of mobility, and the journey factor. The journey factor $J_i$ is deliberately constructed as both a proxy confounder and a mediator, in the sense that it causes both the outcome $F_i$ and the exposure $M_i^{obs}$, while also acting as a mediator between $\theta_J$ and $F_i$. The terminology typically used to explain the relationships between covariates and variables in DAGs is reported in the literature (Tennant, 2018).

The DAG underpins the solution mechanism implemented in the MCMC Bayesian imputation, since DAGs are created in the belief that they specify the mechanism through which the data were generated. DAGs visually portray the combined Bayesian models implemented, and specify the relationship and function of each variable (as exposure, proxy confounder, confounder, mediator, collider, outcome, etc.) in a causal framework (Tennant, 2018; see footnote 80). The DAG also serves a primary role in model testing and causal search (Pearl et al., 2016). If, for instance, the DAG in Figure 7.4 has been created as portraying the mechanism of mobility data generation, then we can test the model. The path $X_i \rightarrow J_i \rightarrow M_i^{obs}$, for example, indicates that $X_i$ and $M_i^{obs}$ are independent given $J_i$ (Jensen, 1996). As such, if we regress $M_i^{obs}$ on $X_i$ and $J_i$, and it turns out that the parameter of the variable $X_i$ is not zero, then the DAG model is wrong. In such a case the causal model specified is incorrect and should be reviewed. By proceeding in the same way for the other covariates and relationships, we test and develop the DAG to create the model which identifies the cause and effect of the various drivers of mobility.
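Formally, and under the usual faithfulness assumption, the chain $X_i \rightarrow J_i \rightarrow M_i^{obs}$ implies the conditional independence that the regression test checks; a logistic regression is one hedged choice of test, not necessarily the one used in the thesis scripts:

$$X_i \perp M_i^{obs} \mid J_i \quad\Longleftrightarrow\quad \beta_X = 0 \;\;\text{in}\;\; \operatorname{logit} P\!\left(M_i^{obs}=1\right) = \beta_0 + \beta_X X_i + \beta_J J_i$$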

Figure 7.4 DAG main model including missing ‘Journey Factors’.

80 The words confounder, mediator, outcome, etc., used to describe the function of the variables and covariates in a model, are self-explanatory, with definitions detailed in the literature (Tennant, 2018).

7.4.3.2 Missing ticket validity period

The ticket validity period is the number of days during which a particular ticket product can be used. For example, a single or day return ticket can only be used for 1 day, whilst a 7 day season ticket can be used for 7 days, creating ticket validity periods of 1 and 7 respectively. As mentioned in Chapter 2 and Chapter 3, certain tickets are sold that are valid for, say, 70 days; on the ticket product these are described as being valid for between 60-90 days, so there is no way of telling from the face value of the ticket product that it was a 70 day ticket. Under such conditions the ticket validity period is not precisely known, and is considered missing. These ticket products are described as flexible period tickets, and the actual period over which the ticket was used is relevant information in establishing accurate passenger demand and mobility. As such, a method is developed for imputing the ticket validity period.

The flexible ticket products have codes which indicate the band of validity of the ticket. MTA tickets (see footnote 81) are valid for between 30-60 days, MTL for 90-180 days, and MTH for 180-359 days, with no indication on the product description of the exact validity period of the ticket. For instance, two separate MTL tickets might actually have been used for 91 days and 171 days respectively, with no indication of these actual durations on the ticket product. The passengers associated with these tickets would have contributed differently to the passenger demand on the network: the 91 day ticket would be used for just over 3 ticketing periods (see footnote 82) of 4 weeks each, whilst the 171 day ticket would have been used for over 6 ticketing periods. The 171 day ticket would thus still be contributing to passenger demand for at least 2 whole periods after the 91 day ticket had expired. As the average daily passenger demand estimated on the network is the volume of passengers in a period (4 weeks) divided by the length of the period (28 days), the number of days of validity of flexible tickets needs to be ascertained to obtain objective estimates of passenger demand in specific ticketing periods. This section therefore develops a Bayesian strategy for imputing ticket validity periods.
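The arithmetic behind this example, with 28 day (4 week) ticketing periods, is:

$$\frac{91}{28} \approx 3.3 \text{ periods}, \qquad \frac{171}{28} \approx 6.1 \text{ periods}$$

so the 171 day ticket remains live throughout at least two whole ticketing periods after the 91 day ticket has expired, and the average daily demand attributed to a period is that period's passenger volume divided by 28.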

81 The full meaning of these abbreviations (MTA, MTL, MTH, etc.) could not be established; they may be historically adopted names whose actual meanings have been lost in the evolution of the UK rail sector over time.

82 This research is constrained to analysis in ticketing periods, as the LENNON ticketing data supplied to the project are aggregated into ticketing periods of 4 weeks.

Conventional imputation strategies like ML, MI, and MICE would deduce the missing ticket validity periods by regressing the variables and covariates of the individual-level, attribute-rich ticketing data derived from m-IPF and GIS-GTFS. This would be under the assumption that the m-IPF and GIS-GTFS procedures yielded individual-level data that are MAR, such that the available observed data are sufficient to explain the missing values. Note, however, that there is additional information that would not be exploited by the ML, MI, and MICE procedures: the flexible validity periods of tickets like the MTA, MTH and MTL, although not precisely known, are limited to a time band. For example, we have the information that MTA and MTH ticket products have validity periods lying between 30-60 days and 180-359 days respectively. The advantage of the Bayesian framework is that simple constructs exist that enable us to include such additional information in the regression exercise. In the BUGS Bayesian framework, many such constructs are inbuilt, making it easier to build more comprehensive yet parsimonious models. In the case of MTA tickets, for instance, which we know are limited to between 30-60 days of use, the BUGS modelling framework includes this information via the inbuilt censoring construct C(vmin, vmax), which specifies that the exact missing value is not known, but that it lies between the minimum (vmin) and maximum (vmax) validity periods, additional information derived from the ticket product description. The exact value of a particular validity period is then derived from a Bayesian regression model of the data together with this additional information about the time limits of ticket use; such additional information is easily incorporated in the Bayesian framework, making for a robust model.

The Bayesian procedure for imputing the missing ticket validity periods (for those ticket products with a non-fixed validity) accounts for the correlation with the other covariates (origin, destination, distance, cost, journey factor, etc.) using conditional regressions on the remaining not-yet-modelled incomplete covariates and any complete covariates. Table 7.2 is a formal model of the missing values in the ticket validity variable. The outcome and exposure variables are indicated using the convention that the first character (O or E) indicates whether the variable is an outcome (O) or exposure (E); the second character indicates the type of likelihood specified for an outcome, or the prior specified for an exposure; and any further constructs adopted in the model, like truncation (T) or censoring (C), are indicated as the third character.
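A hedged sketch of how such interval information might be declared follows. The thesis scripts use the BUGS C(vmin, vmax) censoring construct; the JAGS analogue shown here is dinterval, and the variable names (tv, lim, cens, dist, cost) are illustrative only:

    # tv[i] is the ticket validity period in days (NA for flexible products);
    # lim[i,1:2] holds the band from the product code, e.g. (30, 60) for MTA.
    # The observed indicator cens[i] = 1 declares that tv[i] lies within the
    # band; initial values for the censored tv[i] must also lie inside it.
    for (i in 1:N) {
      tv[i] ~ dpois(mu4[i])
      log(mu4[i]) <- a0 + a1 * dist[i] + a2 * cost[i]
      cens[i] ~ dinterval(tv[i], lim[i, 1:2])
    }
    a0 ~ dnorm(0, 1.0E-4); a1 ~ dnorm(0, 1.0E-4); a2 ~ dnorm(0, 1.0E-4)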

Table 7.2 Formal model of the missing ‘ticket validity periods’.

Outcome (O) or Exposure/predictor (E) variable | Model 4: ticket validity
Origin – station entry                         | E/N
Destination – station exit                     | E/N
Distance (journey, access, or egress)          | E/N
Cost (monetary, or time)                       | E/N
Ticket validity period (days/months)           | O/PL/T
Group station flow                             | E/N

To accommodate the interests of Bayesian analysts, pseudocode for the ticket validity period model is included as Model 4 within Figure 7.5. Model 4 in Figure 7.5 is added to the models in Figure 7.3 to enable imputation of the missing ticket validity periods. Note that the outcome variables used in the previous models (Model 1 to Model 3 within Figure 7.3) are not included in this new regression, as their influence on the missing covariates is already implicitly accounted for through the earlier models.

Figure 7.5 Missing 'Ticket Validity' script.

It is noteworthy that additional model constructs are typically considered during the model development stages. For example, the Poisson regression models chosen for the journey factors and ticket validity periods were considered for replacement by a likelihood function composed of a mixture of Normal distributions. The mixture of Normal distributions is premised on the observation that passenger behaviour tends to differ between fixed and flexible tickets; the mixture captures this difference in passenger behaviour, whilst accepting that the passengers come from the wider pool of railway passengers. The change from Poisson to Normal mixtures (Kruschke, 2014, Plummer, 2003) yielded only a nominal improvement in the precision of the results, hence the Normal mixtures model was not adopted to generate the results presented.

The corresponding DAG for the missing ticket validity period imputation model (Model 4) is shown in Figure 7.7. Observe that this DAG is similar to the earlier DAG of Figure 7.4 (see footnote 83). The differences are the exclusion of the journey factor indicator model (which made limited impact on the overall model), and the inclusion of the observed and missing components of $V_i$ (respectively on the left and right of the DAG). The inclusion of $V_i$ reflects its conditional dependencies on the other covariates, forming the basis of the workings of the Bayesian imputation model. The $\theta_V$ is the parameter of the covariate for the ticket validity period. The DAG specifies the conditional dependencies between the covariates $J_i$, $X_i$, $V_i$, and $F_i$. These relationships can be verified from the covariates within the dataset to ascertain the validity of the DAG (as highlighted in Section 7.4.3.1). If the DAG does not reflect the conditional dependencies between the covariates and variables, then it is a wrong model of the data; in such a scenario, the DAG would need redesigning in the light of where it went wrong.

Figure 7.7 DAG main model including missing ‘Ticket Validity’.

83 The layouts of the DAGs are made similar to highlight the model development stages, in which relevant variables are added incrementally while the convergence efficiencies achieved for different model constructs are explored.

7.4.3.3 Missing group station flows

There are two stations in Wakefield, called Wakefield Westgate and Wakefield Kirkgate. Many rail tickets sold for journeys to or from these two stations are ascribed 'Wakefield BR', and passengers can legitimately make journeys to either station. As a result, the two stations in Wakefield are described as group stations. As highlighted in Chapter 3, there are many group stations on the UK railway network. Group station tickets were primarily designed to make passenger journeys easier to plan logistically. Moreover, the original purpose for collating ticketing data within LENNON was revenue appropriation to network operators, not passenger mobility analysis, making the need for precise information on passenger mobility less pertinent.

The ticket products sold for flows to stations which are administratively clustered into a group are unique: a specific station name is not written on the ticket, but rather the name of the group. As another example, a ticket ascribed 'Leeds to Bradford BR' is valid for destination flows to Bradford Forster Square or Bradford Interchange. Forster Square and the Interchange, although at physically different geographic locations, belong to the same group of stations called Bradford BR. When a passenger uses a ticket valid for entry to or exit from a group of stations, it is not possible from the face value of the ticket to know which station in the group the passenger eventually used for entry to or exit from the train network; any one of the stations in the group would be a feasible choice. The issue that arises in using LENNON ticketing data for mobility analysis is that the precise flows to group stations would need to be known in order to ascertain accurate passenger demand at each train station on the rail network. As a result, group station flows are ascribed as unknown and need to be imputed.

The unique strategy developed in this research to identify the more probable station of entry and exit is to augment the flows, by assuming that the single flow to a group station becomes multiple flows to all the stations within the group. If the flow emanated from a group station and terminated in a group station, then augmented flows are created assuming the passenger travelled from all the stations at the entry end, and likewise to all the stations at the exit end. A Bayesian imputation methodology is then developed, as shown below, indicating that all the augmented flows are unknown but that only one flow actually occurred, so that the sum of the augmented flows to a group of stations equals one.
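Formally, if a ticket records a flow to a station group $G$, the augmented binary indicator flows $z_s$ created for each member station $s \in G$ must satisfy:

$$z_s \in \{0,1\} \;\;\forall\, s \in G, \qquad \sum_{s \in G} z_s = 1$$

so that exactly one of the augmented flows is realised.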

The Bayesian imputation model for flows to group stations is created as follows. Having created the augmented flows originating and terminating at a group station, a binary group station indicator variable is defined to be '1' if a flow is precisely defined on the ticket product, and 'NA' otherwise. All train station flows that are not associated with a group of stations are inherently precisely defined and have an indicator of '1', while the stations within a group have indicators of 'NA', requiring imputation. Two indicator variables are created: one for the origin flows and another for the destination flows. The constraint imposed on the model is that the NA's for a particular flow to or from a group of stations must sum to 1, with each NA taking a value in {0, 1}. Model 5 within Figure 7.8 is an implementation of such an imputation model for the group station flows: it is the Bernoulli regression model for the group station indicators with missing values. As the augmented flows to a particular group of stations must sum to '1', the 'dsum' construct within the JAGS software (Kruschke, 2014, Plummer, 2003) is used to implement this constraint and resolve which station attracted the flow, as shown in Model 6 within Figure 7.8. The 'dsum' ensures that, within the Bernoulli model, the sum of the flows to stations within a group for a specific passenger OD flow results in a unit flow. The full implementation of the 'dsum' construct is included in Appendix D.6 and on the CD accompanying the thesis. In the models within Figure 7.8 (in line with the convention established in the main model of Figure 7.3), the likelihood function is a Bernoulli distribution with parameter mu.4 and outcome gp.st.ind. The parameters of the Bernoulli regression are the gita.i, i = 1, ..., N, where N is the number of covariates in the regression model. Note that the arguments of the 'dsum' construct are the indicator values that are required to sum to 1. A sketch of Models 5 and 6 is given below.
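The sketch assumes a two-station group for simplicity; the names mu.4, gp.st.ind and gita follow the text, while Nflows, idx1 and idx2 (the indices of the two augmented flows belonging to original flow f) are illustrative additions, with the full implementation in Appendix D.6:

    # Model 5: Bernoulli regression for the group station indicators;
    # gp.st.ind[i] is NA (to be imputed) for each augmented flow
    for (i in 1:N) {
      gp.st.ind[i] ~ dbern(mu.4[i])
      logit(mu.4[i]) <- gita.0 + gita.1 * dist[i]
    }
    # Model 6: the dsum construct constrains the augmented indicators of
    # each original flow f to sum to one realised station flow; the data
    # supply one[f] = 1, and initial values must satisfy the constraint
    for (f in 1:Nflows) {
      one[f] ~ dsum(gp.st.ind[idx1[f]], gp.st.ind[idx2[f]])
    }
    gita.0 ~ dnorm(0, 1.0E-4); gita.1 ~ dnorm(0, 1.0E-4)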

Figure 7.8 Missing 'Group Station indicator' script.

The DAG incorporating the missing data model for the group station flows is shown in Figure 7.9. Here $\theta_G$ represents the joint parameters of the Bernoulli imputation model, with the group station indicators $Gp_i$ as the outcome variable. The DAG in Figure 7.9 is similar to those in Figure 7.4 and Figure 7.7, with the difference that an imputation model has been included for the missing group station indicator variable. The DAG in Figure 7.9 illustrates the conditional dependencies between the covariates of the main regression model and the imputation models, and was developed in part by assessing the relationships between the covariates to ensure that it adequately portrays the data relationships observed. The DAG includes the main mobility model and all the relevant models for imputation of the missing journey factors, ticket validity periods, and group station indicators. Having been developed by studying the regression expressions represented by the conditional dependencies of the data variables and covariates, the DAG serves as a visual and analytical tool that aids understanding of the causal processes of data generation. This is a step beyond the traditional statistical inference paradigm of simply establishing how correlated sets of variables are and identifying parameters that describe their joint distribution. The causal inference paradigm of developing a DAG answers questions about how changes in any one variable cause changes in another, and how large such causal changes might be.

Figure 7.9 DAG main model including missing 'Group Station' indicators.

7.5 Result of Bayesian imputation

The full software scripts for the implementation of the Bayesian imputation models are described in Appendix D, with the details included on the CD accompanying the thesis. The scripts have been included to enable the results to be reproduced. The results produced are values for the parameters of the mobility model, and the missing data values for the journey factors, ticket validity periods, and indicators for group station flows. The imputation results are at individual (or group) level for each tuple with missing values. (Recall that the micro-level synthetic population was created from spatial microsimulation and GIS-GTFS network simulation.) The Bayesian results enable the level of uncertainty in each result to be measured from the associated credible intervals, and also present an assessment of the accuracy, efficiency, and representativeness of the Bayesian model. The individual-level imputed data enable identification of the idiosyncratic behaviour of each individual within the railway network.

Each of the covariates with missing values has about 10% missing data, representing over 23,000 individuals with missing attributes. The journey factor variable, for instance, has about 7.5% missing values, implying over 19,000 individual results for the imputed journey factors. It would be impractical to present all of these results, so only a few typical and representative results are presented; the full results can easily be generated by running the model scripts included in Appendix D.

The first set of results presented is for the distance decay parameter of the main model, which is a negative Binomial model. The distance decay portrays the propensity of passengers to travel shorter distances on the network. In typical spatial interaction mobility models (Batty and Mackie, 1972, Besag, 1974, Dennett, 2012, Ewing, 1974, Wilson, 1971), the distance decay parameter is negative, reflecting a passenger's resistance to travelling longer distances. The second set of results is made up of three sub-sets: the imputation results for the journey factors, the ticket validity periods, and the group station indicators. The imputed journey factors are derived from the truncated Poisson regression model, and results are presented for different ticket products. The imputed ticket validity periods are generated using a Poisson regression model, with results similarly presented for different ticket products. The imputed group station indicators are generated using a Bernoulli model; recall that the indicators identify the passenger flow to a particular station within a group of stations.

7.5.1 Distance decay parameters

In order to identify the distance decay parameter, the count of passengers traversing from origin to destination Postcodes and accessing the train network at the various entry and exit stations formed the outcome variable, yielding the distance decays shown in Figure 7.10 and Figure 7.11. The pertinent covariates comprise the origin (Postcode of usual residence) and final destination, station entry and exit points, travel purpose, travel distance, journey factor, ticket validity period, and group station indicator. An additional range of socio-demographic variables pertaining to each passenger was included in the imputation: income, age, gender, household type, number of cars in the household, number of children, and ethnicity. Further covariates included the number of stops, travel time, arrival and departure times on the rail network, and station access and egress distances. The list of all attributes considered in the main and imputation models is reproduced in Table 7.3 and in Appendix D.7; these attributes are derived from over 160 variables inherited from the Census, NRTS, LENNON, and the GIS-GTFS model. In identifying the distance decay, two modelling configurations were investigated for the journey factors. In the first, the outcome variable in the missing data model for the journey factors was assumed to follow a Poisson distribution (see Model 2, Figure 7.3); in the second, it was assumed to follow a mixture of Normal distributions. Typical results are shown in Figure 7.10 and Figure 7.11.

Recall that the essence of Bayesian MCMC modelling is to simulate a Markov chain whose stationary distribution is the sought target distribution, usually the posterior distribution of the model parameters being estimated. Before the simulated chains are used to obtain estimates of parameter values, a number of checks are needed to ensure that the results are objective. The simulated chain has to converge to a stationary distribution before objective deductions can be made about the parameter values. This convergence can be visually ascertained using the trace plot, typified by the top left plots in Figure 7.10 and Figure 7.11. The trace plot (in which the vertical axis is the parameter value) shows the history of a parameter value as the iterations of the chain evolve, with the evolution of the chain along the horizontal axis. Typically a number of chains are simulated simultaneously to facilitate a more efficient investigation of the parameter space. If a chain becomes stationary, the majority of the parameter values searched will be scattered around the mean parameter value, with no systematic long term trends, and the chain will have converged. Typically, when a chain has converged (after an initial unsettled period), the trace resembles a so-called 'caterpillar'. If multiple chains are simulated and have converged, there will be no systematic difference in their scatter. The top left plot in Figure 7.10 resembles a trace plot that has converged, giving more confidence in subsequent deductions from this trace; the three trace plots are scattered on top of one another. Under such conditions, the parameter values deduced from a density distribution of such a chain (as displayed in the bottom right plot of Figure 7.10) would statistically represent a correct estimate of the parameter distribution.

The trace plot can visually reveal a range of issues with the Bayesian MCMC procedure. If, for instance, the step size of the random walk iterations is too small, so that it takes many iterations for the chain to traverse the mean of the posterior distribution, then the chain requires a very long time to explore the entire parameter space, and the trace plot exhibits long term trends reflecting the parts of the parameter space being investigated. If parameter deductions are made from such a trace plot, the mean parameter value is subject to the part of the plot used to create the parameter density plot. The remedy would typically be to run the chains for a much longer period, a requirement the trace plot will have revealed. The trace plot in the top left of Figure 7.11 exhibits such non-convergence: the three chains do not overlap, indicative of non-convergence. Under such conditions the chains would need to run for much longer before converging, which is indicative of an inefficient MCMC, and the analyst should be cautious in subsequent deductions made from such trace plots. The parameter values shown in the bottom right plot of Figure 7.11 hover around the range -2.0 to 0, with a pronounced amount of uncertainty in the actual peaks. Such parameter density distributions could nevertheless still be useful in revealing the nature of the mixture of Normal distributions that define the mobility mechanism. In summary, the Poisson model adopted for the journey factors yields trace plots that mix well, as shown in the upper-left plot of Figure 7.10. The trace plots in the upper-left plot of Figure 7.11 show that the mixture of Normal model, when adopted for the journey factor variable, yields less efficient chains that are not stable over the same number of iterations. The Poisson model is thus the more efficient model for imputing the journey factors.

Figure 7.10 Distance decay – Poisson ‘Journey Factor’.

Figure 7.11 Distance decay – mixture of Normal 'Journey Factor'.

A property of Markov chains (MC) is that, during iteration, a sample on the chain depends only on the state attained in the previous step of the chain, and on none other. This is a unique attribute of the random-walk-like MC (see footnote 84) that ensures an efficient exploration (sampling) of the parameter space to attain the equilibrium distribution of a targeted distribution. When a Bayesian MCMC chain has converged and is stable, the autocorrelation between chain samples indicates the dependence of any one sample on a subsequent sample; this autocorrelation should decrease rapidly (from a value of 1) as the distance (lag) between the chain samples increases. This is a required property of an efficient Markov chain. The top right plots in Figure 7.10 and Figure 7.11 plot the autocorrelation against different lags. In the top right plot of Figure 7.10, where a Poisson model is used to impute the journey factors, the autocorrelation of two out of the three chains is high when the lag is low and drops rapidly as the lag increases. One of the autocorrelation plots in Figure 7.10, however, does not drop rapidly, indicative of an issue with that particular chain. An investigation of the corresponding trace plot reveals that earlier parts of that chain, between iterations 2000-2500, had not converged; including this part yields an autocorrelation plot that does not show strong independence of subsequent chain samples as the iteration progresses.

Autocorrelation is important because it is indicative of how much information is available in the Markov chains. For example, sampling $N$ iterations from a Markov chain that has high autocorrelation yields less information about the stationary distribution than would be obtained from $N$ independent draws from the stationary distribution. As such, the autocorrelation plot is indicative of the efficiency of the Markov chain. The intuition behind adopting autocorrelation as an assessment of the efficiency of MCMC chains is simple: if a statistically representative sample of a population is desired, it makes more sense to take independent samples from the population. A sample taken from members of the same family within the population would not be representative of the wider population; such a sample would be autocorrelated, coming from individuals related by being in the same family. The autocorrelation plot on the top right of Figure 7.11 shows that the chains are highly correlated even at lags as high as 30. This is indicative of a poor, non-independent chain of samples; inference on parameter values deduced from such a chain (as shown in Figure 7.11) would only be indicative, and a poor reflection of the true parameter distribution. Inference about the quality of the Bayesian models can thus be made by visually inspecting the autocorrelation plots.

An additional indicator derived from the autocorrelation plot is the effective sample size (ESS) (see footnote 85). Recall that the essence of the MCMC is to sample the parameter space efficiently. The MCMC iterations are essentially made up of an exploratory stage (the Monte Carlo) and an optimisation stage (the Markov chains). If the MCMC is efficient, then the majority of the candidate samples are accepted, reflecting the quality of the optimising Markov chain. A high candidate acceptance rate implies that many effective independent samples are taken; in the ultimate case where no samples are rejected, the effective sample size is equivalent to the number of iterations. The ESS (Neal, 2001) is thus a measure of the efficiency of the chains, and this additional information is revealed within the autocorrelation plot. For example, embedded in the autocorrelation plot of Figure 7.10 is the ESS, which is 401.2, indicating that out of 5000 iterations about 401 effective samples were generated. In Figure 7.11, however, the ESS is low at 82, indicating that the corresponding chain would have to run roughly five times as long as the chains derived from the Poisson model used to generate Figure 7.10. The ESS is a measure of the number of independent samples that would need to be taken to achieve the same inference as achieved from the trace plot; alternatively, the ESS can be interpreted as the length of chain left over if the chains are thinned (coarsely sampled) to remove the autocorrelation. In summary, the Poisson model, which has the higher ESS (401.2), converges faster than the mixture of Normal model, which has the lower ESS (82), indicative of the long run times required for convergence of the mixture of Normal model. Referring back to the trace plots, the mixture of Normal model requires longer run times to stabilise before any inference can be deemed statistically representative.

84 The random walk (RW) is distinct from the general Markov chain (MC) in that, whilst an MC's iterations depend only on the current state of the chain, the RW's increments do not depend on any previous state.
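For reference, the effective sample size reported in these autocorrelation plots is commonly defined from the chain length $N$ and the lag-$k$ autocorrelations $\rho_k$ (Kruschke, 2014) as:

$$\mathrm{ESS} = \frac{N}{1 + 2\sum_{k=1}^{\infty}\rho_k}$$

in which the sum is, in practice, truncated at the first lag where $\rho_k$ becomes negligible.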

85 The ESS is typically defined as the number of independent samples that would need to be taken to achieve the same inference as is presented in the trace plot. Alternatively, the ESS can be interpreted as the length of chain that would be left over if the chains were thinned to remove the autocorrelation.

To further improve confidence in the results of the MCMC Bayesian models, Gelman-plots are created, as shown in the bottom left of Figure 7.10 and Figure 7.11. These plots depend on multiple chains (see footnote 86) being run during the MCMC, and provide a strategy for comparing the chains as a basis for improving confidence in the parameter estimates. When multiple chains are applied and converge to the same location, parameter stationarity can be confirmed by conducting the Gelman-Rubin diagnostic (Kruschke, 2014, Plummer et al., 2006). This diagnostic computes the within-chain variability and compares it to the across-chain (between-chain) variability. The intuition is that if a stationary solution has been reached, the variability across the chains should be small and comparable to the within-chain variability. The Gelman-Rubin statistic computes the variance of all the chains combined and compares this with the mean of the variance within each chain. Values substantially higher than 1 are indicative of non-stationarity and hence non-convergence. Typically, the parameter distribution of non-converged chains tends to be broad, with the potential to shrink if the MCMC simulation is run for longer; chains with a high Gelman-Rubin statistic are therefore said to have high shrinkage factors. The Gelman-plot reveals how the shrinkage factor changes as the number of iterations increases. The plots on the bottom left of Figure 7.10 and Figure 7.11 are these Gelman-plots. They show that whilst the model with the Poisson journey factor attains a shrinkage factor of 1 nearer 4300 iterations (Figure 7.10), the model with the mixture of Normal attains a shrinkage factor of 1 at 4000 iterations (Figure 7.11). The important contribution of such Gelman-plots is to indicate that after about 4000 iterations the within-chain and between-chain variability are comparable, implying stationarity in the chains. This indicates that the burn-in period of the chains should be set at over 4300 iterations in order to expect stationary, converged chains, thereby ascribing confidence in the parameter results.
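Formally, for $M$ chains each of length $N$, with $W$ the mean of the within-chain sample variances and $B = \frac{N}{M-1}\sum_{m}(\bar{\theta}_m - \bar{\theta})^2$ the between-chain variance computed from the chain means $\bar{\theta}_m$, the Gelman-Rubin shrinkage factor is:

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}, \qquad \hat{V} = \frac{N-1}{N}\,W + \frac{1}{N}\,B$$

Values of $\hat{R}$ close to 1 indicate that the pooled and within-chain variances agree, i.e. stationarity.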

86 The chains are typically designed to start from different initial values (also enabling an assessment of the sensitivity of the model to initial starting points). Models that remain sensitive to initial values in the long run cannot be considered stable. When the MCMC iteration is started and the chains are created, it often takes a while for the parameter space to be explored and for the chain to traverse from the initial value to a stationary parameter solution. It is common practice to discard the unstable region of the trace between the initial value and the beginning of a stationary parameter trace; this discarded region is called the burn-in.

A further visual assessment tool for Bayesian modelling is the density plot, shown in the bottom right of Figure 7.10 and Figure 7.11. These plots are smoothed histograms of the parameter values sampled from the trace plots (top left in Figure 7.10 and Figure 7.11). When a Poisson model is used for the journey factors, the density plot in Figure 7.10 shows that the three trace plots produce overlapping density plots, indicative of convergence of the chains within 5000 iterations. When a mixture of Normal model is used for the journey factors, on the other hand, the density plot in Figure 7.11 shows that the three plots do not overlap, giving a visual indication that the chains have not converged. The density plots further display additional statistics for assessing the MCMC results: the highest density interval (HDI, see footnote 87) and the numerical Monte Carlo standard error (MCSE). The HDI is the span of values that covers 95% of the parameter density plot; the parameter values within the band of the HDI are considered credible. The MCSE is indicative of the amount of scatter in the parameter estimates. The HDI limits (shown by the vertical markers) for the three chains in Figure 7.10 (the Poisson journey factor model) are similar, with a level of variability due to the finite length of the chains. The HDI limits for the density plots in Figure 7.11 (the mixture of Normal journey factor model) show a higher level of variability, indicative that the chains have not fully converged.

The 95% HDI for the Poisson journey factor model lies on average in the range (-2.4 < 95% HDI < -0.9), whilst that for the mixture of Normal journey factor model is nominally narrower at (-1.7 < 95% HDI < -0.4), indicating that despite the poorer chains in Figure 7.11, the results are indicative of those that would be obtained had the chains been run for much longer (Kruschke, 2014). In many practical situations, when a model is complicated by virtue of the number of covariates and their corresponding likelihood function specifications, and when the various Bayesian constructs are deployed, the models can run for a substantial period before convergence. It is, further, not simple to establish the correct initial conditions for the variables, making it difficult to ascertain the length of the burn-in periods. For instance, the models used to produce the results presented in Figure 7.10 and Figure 7.11 ran for 15 days on Arc3, the University of Leeds Linux CentOS 7 based supercomputer, on a node with 24 cores and 768GB of memory, with 350TB of usable storage (University-of-Leeds, 2017).

The MCSE is equivalent to the standard deviation of the chain divided by the square root of the effective sample size (ESS) introduced earlier, i.e. $\mathrm{MCSE} = \mathrm{SD}/\sqrt{\mathrm{ESS}}$. The MCSE is a measure of the convergence of the chains, as the scatter typical of non-converged chains is reflected in a higher standard deviation; this amount is then normalised by the square root of the ESS, the ESS being an indication of the effective number of independent samples underlying the chain. As seen from Figure 7.10 and Figure 7.11, using the Poisson model for the journey factors yields a lower MCSE (0.026) than using a mixture of Normal distributions to model the journey factor (an MCSE of 0.142), indicative of less variation and more stationarity in the former model. As expected, both models yielded negative distance decay parameters, as seen from the values on the horizontal axes of the parameter density distributions. The negative distance decay reflects the propensity of the majority of passengers to travel shorter distances to fulfil activities. Both distance decay credible intervals (see footnote 88) lie in the range -2.4 < CI < -0.5.

In summary, the goal in an MCMC Bayesian modelling exercise is to obtain a posterior estimate of the parameter values. Considerations made in assessing the model include a measure of how representative the MCMC chains are of the posterior estimate being identified; the size of the chain, ensuring that enough effective samples were taken for the results to be accurate, adequately exploring the range of potential results, and reasonably stable to reflect convergence; and the efficiency with which the results were attained, as this reflects the quality of the model and ensures the results can be reproduced in a limited time frame.

87 The HDI in Bayesian modelling is the span of values that covers 95% of the parameter density plot. The parameter values within the band of the HDI are considered credible. The 95% HDI is equivalent to the credible interval, as distinguished from the confidence interval used in frequentist statistics.

88 Note that in the Bayesian context, the credible interval (CI) can be loosely thought of as the range of parameter values that represents 95% of the most credible results. Note also that the highest density interval (HDI) in Bayesian modelling is subtly distinct from the CI, and represents the span of values that cover 95% of the parameter density plot.


7.5.2 Imputed covariates

The imputation results are presented in this section for the journey factors, ticket validity periods, and flows to group stations. For each set of results, the trace plots are presented, as well as the autocorrelation plots, Gelman shrinkage factor plots, and parameter density plots. These plots were discussed exhaustively in Section 7.5.1, so that detail is not repeated here; reference is instead made to Section 7.5.1. For each set of results presented, an assessment is made of the level of scatter about the mean in the trace plots to infer stationarity of the MCMC chains; this scatter in the trace plots is sometimes referred to as mixing. The autocorrelation plots enable an assessment of the quality of the MCMC chains, ensuring that each parameter sample is effective. Recall the property of Markov chains (MC) that a sample on the chain depends only on the state attained in the previous step of the chain, and on none other; when such independence is not achieved, the effective sample size is low, necessitating larger sample sizes for objective inference. The Gelman-plots reveal the shrinkage factor of the chains, which indicates the number of iterations required to achieve a converged solution; in such a state the parameter density plot has narrower (more precise) peaks, which are said to have shrunk in breadth. Finally, the estimated parameter density plots for the missing values are assessed to reveal the relevant credible intervals of the solution. In each case, just as for the main mobility model presented in detail in Section 7.5.1, three sampling chains are run simultaneously.

7.5.2.1 Missing journey factors

Recall that the imputation model for the journey factors is presented as Model 2 within Figure 7.3. It is a Poisson regression model with truncated parameter values, and its outcome variable is the journey factor being imputed. Recall also that, in developing the Bayesian imputation model, the journey factor imputation model excludes the outcome variable of the previously developed main mobility model (i.e. the passenger count variable). There are about 19,000 missing data values in the journey factor variable, so only a sample of typical results is presented, along with the software code (in the Appendices) used to implement the model. The detailed specifications of the journey factor imputation model, as well as the priors and initial conditions of the variables, are discussed in Appendix D.2 to Appendix D.5. Typical Bayesian results for the imputed journey factors are shown in Figure 7.12 and Figure 7.13.

Figure 7.12 Journey factor for a 7 Day Season ticket (typical result).

Figure 7.13 Journey factor for a Flexible Season ticket (typical result).

The results presented in Figure 7.12 and Figure 7.13 are journey factors for two different ticket types: Figure 7.12 is for a 7 day Season ticket, while Figure 7.13 is for a flexible Season ticket. As seen from the overlapping trace plots in the top left graphs, there is scatter about the mean, indicative of adequate mixing of the chains, in turn indicating that the chains have converged to a stationary solution. The stationary trace plots indicate that the chains are representative of the posterior distribution and that there are no pronounced influences of the initial values. The autocorrelation plots show rapid drops as the lag between samples from a chain increases; this is indicative of the independence of samples, as expected from a well-behaved Markov chain process. As a result of this efficient MCMC sampling, the effective sample size (ESS) is numerically high for both plots (at 12785.1 and 9262.1), as indicated in the top right plots of Figure 7.12 and Figure 7.13. The resulting large effective samples ensure that stable and accurate estimates are made from the distributions of the chains. The Gelman-plots in the bottom left of Figure 7.12 and Figure 7.13 indicate that the within-chain variance becomes comparable with the across-chain variance, with the respective Gelman-Rubin statistics (shrinkage factors) yielding values between 1.1 and 2.0 at about the 31500 iteration mark, indicating that the chains become stationary and converge after about 31500 iterations. The density plots presented for the missing journey factors, in the bottom right plots of Figure 7.12 and Figure 7.13 respectively, are sharp, overlapping and normally distributed, indicating that the results can be considered stable, accurate, and representative of the posterior target. The density plots of the parameters do not traverse zero and are as such objective. The median (and modal) values presented in Figure 7.12 and Figure 7.13 are respectively 2.0 and 1.0. The deduction is that the railway passenger using the particular 7 day Season ticket has a propensity to use it twice a day, whilst the passenger using the particular flexible Season ticket used it on average once daily. Note the small peak at 2.0 in Figure 7.12, perhaps a residual effect of separating flexible return tickets into outward and inward journeys. It is noteworthy that the imputed values are of the same order of magnitude as the journey factors adopted for such ticket products in the rail industry (SDG, 2012, SDG, 2016). The Bayesian journey factors add detail by deducing values for each individual ticket, thereby capturing the heterogeneity in individual mobility behaviour. The precision obtained at individual level is an improvement on the aggregate proxy values hitherto used in the rail industry (Taylor, 2013b).

7.5.2.2 Missing ticket validity periods

Typical results for the ticket validity periods are shown in Figure 7.14 and Figure 7.15. In Figure 7.14, the ticket validity period estimated is 185 days for an MTH ticket, described as a standard Season ticket valid for between 181-359 days. In Figure 7.15, the ticket validity period estimated is 32 days for an MTA ticket, described as a Season ticket valid for between 30-60 days. Current industry infilling ascribes values of 180 days and 30 days respectively to these tickets. The credible intervals (95% HDI) deduced from the Bayesian imputation of the MTH and MTA tickets are respectively (180 < 95% HDI < 212) and (< 95% HDI < 42); these credible intervals encompass the 180 days and 30 days adopted in the railway industry. The advantage of the Bayesian imputation over the ad-hoc rail sector infilling is that the missing values are deduced with cognisance of the data generation process, so the infilling better reflects the mobility mechanism. The Bayesian imputation also has the advantage of giving a measure of the uncertainty in the imputation.

A generic investigation of the imputation results shown in Figure 7.14 and Figure 7.15 shows the top left trace plots exhibiting varied levels of mixing. The results for the MTH ticket (Figure 7.14) indicate a trace plot scattered about a mean value, but with visual signs of truncation at the lower extreme of the chain where the parameter values are lowest; the MTH chains also exhibit a level of wandering in one of the chains. The MTA ticket (Figure 7.15), on the other hand, reveals scatter and mixing in the trace plots with no obvious signs of truncation. The MTH ticket perhaps exhibits a heightened influence of the truncation because the lower truncation limit is close to the missing value (within 3%). The MTA ticket, by contrast, had the lower truncation limit removed to enable enough samples to be generated. The stationary nature of the trace plot for the MTA ticket indicates that the MTA chains reached stationary convergence, albeit without any lower truncation limit.

The autocorrelation plots show that for the MTH ticket, 2 out of 3 chains exhibited a depreciation in autocorrelation with lag. The non-converging chain could be due to a poor choice of initial condition, which is also reflected in the corresponding trace plot that occasionally drifts to high parameter values. The MTH ticket has a low ESS (89.2), indicative of the low quality of the MCMC chains. The MTA ticket, on the other hand, shows a rapid drop in autocorrelation with lag, resulting in a higher ESS (829), indicative of high quality MCMC chains. The Gelman-plots, in the bottom left of Figure 7.14 and Figure 7.15, show that the shrinkage factor for the MTH ticket reaches 1.0 at about the 4000 iteration mark, indicating that the burn-in threshold should be about 4000 iterations and that the chains for the MTH ticket have not completely converged to a stationary solution. The shrinkage factor of the chains for the MTA ticket, on the other hand, reaches a value of 1.0 at about 2500 iterations, indicating a lower burn-in period than for the MTH ticket and that the chains do converge within the iterations conducted (i.e. < 5000 iterations).

The density plot for the imputed MTH ticket, shown in the bottom right plot of Figure 7.14, exhibits density distributions that do not overlap precisely, symptomatic of an MCMC solution that has not converged. There is a skew in the density toward the 180 mark, perhaps a result of the lower truncation limit applied to the imputation model of the MTH ticket validity period. The variability in the density plots and the low effective sample size result in a high MCSE value of 1.29. The imputed MTA ticket, on the other hand, has a lower MCSE (0.204), indicative of density plots with low variability and chains with high ESS values. The indication from the density plots is that applying the lower truncation limit to the model for the imputed ticket validity periods results in MCMC chains that require much longer runs. Such longer-running MCMC chains can potentially yield more precise and accurate results, as seen in the narrower density plot for the MTH ticket. In many practical modelling scenarios, there is a trade-off between the time available to fully develop and run the model, and the requirement for an indicative solution to the particular problem.
In summary, the validity periods imputed for flexible tickets, as shown in Figure 7.14 and Figure 7.15, are more reflective of passenger behaviour than the proxy aggregate values assumed in current rail sector applications. The ticket validity periods derived are influenced by the wider ticket dataset, inherently better reflecting the wide range of passenger behaviours. The imputed ticket validity results better reflect the actual dynamics of ticket use: each passenger's behaviour is conditionally independent of others and reflective of that passenger's particular circumstances, and is as such typically variable. The current rail industry standard is to assume that tickets with validity between 30-60 days are used for 30 days, with similar proxy factors applied to all other flexible tickets; this is not reflective of the heterogeneity and idiosyncrasies associated with individual passengers' ticket use behaviour.

Figure 7.14 Ticket Validity MTH ticket (typical result).

Figure 7.15 Ticket Validity MTA ticket (another typical result).

7.5.2.3 Missing group station indicator

Typical imputation results for the group station indicators are shown in Figure 7.16 and Figure 7.17. Recall that if a single flow occurred to a group station, then augmented flows are created to each of the stations in the group, with the proviso that the sum of these binary group station indicators will be 1. In essence, if there are 3 stations in a group and a single flow occurred to these, then we would create three flows, one to each station. We would also create group station indicators which are unknown (NA). The Bayesian imputation then imposes a constraint such that the sum of the NAs will equal 1, whilst the NAs are the outcome variable with a Bernoulli (or Binomial) likelihood function. The Bayesian model for the missing group station indicators is depicted in Model 5 and Model 6 within Figure 7.8. The detailed software code is described in Appendix D.2 to Appendix D.5.

Typical results shown in Figure 7.16 and Figure 7.17 indicate mixing in the chains, as seen in the upper-left plots. The autocorrelation values also show marked drops, indicative of convergence of the solutions. In some instances, as seen in Figure 7.17, some of the chains exhibit a slower drop in autocorrelation with lag, perhaps a result of the sensitivity of the procedure to initial values. The ESS values are typically high, at about 10% of the number of iterations. The Gelman-plots exhibit shrinkage factors that drop to values close to 1.0 after about 2,500-3,000 iterations, indicating that the results would improve further had the burn-in period been extended to about 3,000 iterations. The density plots exhibit low MCSE (typically about 0.03), indicative of chains that converged to stationary solutions.

Many of the results show a split in the density plots between the binary values of '0' and '1'. The results could be interpreted as '1' or '0', depending on the value at which the highest density occurred. For instance, the results shown in the bottom right of Figure 7.16 and Figure 7.17 are density plots for flows to the group stations Kirkgate and Westgate. In Figure 7.16, for Westgate station, the density plot has the higher peak at '1', indicating that the flow occurred to Westgate station. In Figure 7.17, for Kirkgate station, the higher peak also occurred at '1', indicating that this imputed group station flow was to Kirkgate station. An alternative interpretation would be to leave the densities as they are, and adopt the median density values as the proportion of flows to a particular station; such proportions still sum to 1. Under this interpretation, the cumulative sum of flows to a particular station would be indicative of the passenger demand at that station.
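The sum-to-one constraint on the unknown indicators described above can be expressed with the 'dsum' construct (Model 6). The following is a minimal sketch for a single three-station group, written for JAGS via the rjags package; the names and vague priors are illustrative stand-ins, not the thesis scripts of Appendix D.

```r
library(rjags)
library(coda)

# Three augmented flows to one group station; the binary indicators are
# unknown (NA) but their observed total constrains them to sum to 1.
group_model <- "
model {
  for (s in 1:3) {
    ind[s] ~ dbern(p[s])                 # missing group station indicators
    p[s] ~ dbeta(1, 1)                   # vague prior on station probabilities
  }
  total ~ dsum(ind[1], ind[2], ind[3])   # observed total imposes the constraint
}"

jm <- jags.model(textConnection(group_model),
                 data = list(total = 1),
                 # dsum needs initial values consistent with the observed sum
                 inits = function() list(ind = c(1, 0, 0)),
                 n.chains = 3)
update(jm, 1000)                          # burn-in
post <- coda.samples(jm, "ind", n.iter = 5000)
summary(post)                             # posterior share of the flow per station
```

In the thesis models the Bernoulli probabilities are themselves informed by passenger covariates; here they are left with vague priors purely to show the constraint mechanism.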

Figure 7.16 Group Station indicator (typical example).

Figure 7.17 Group Station indicator (another typical example).

7.6 Discussion and remarks

During Bayesian imputation of missing values, the strategy has been to include as many relevant covariates as possible from those yielded by the spatial microsimulation and the GIS-GTFS network model. It can be conceived that Day cards or Return tickets serve a particular purpose and, when compared to other ticket products (like Season tickets), would have journey factors reflective of the ticket product description (as demonstrated by Little's test in Chapter 3, Section 3.2.3). The choice of a particular ticket product reflects the nature of the mobility activity the passenger wishes to fulfil. This implies, for instance, that the missing journey factors for season tickets can be inferred from knowledge of the ticket product, the activity being fulfilled, and the type of passenger. The indication then is that the further inclusion of, say, the ticket price attribute in the imputation would enable feedback on the influence of price on the propensity for higher journey factors. In essence, a fuller spectrum of exogenous and endogenous attributes makes the imputation of missing data values more objective, as postulated by Rubin (Rubin, 1976). This is the concept applied in this research within the Bayesian framework. The imputation of missing data values in this project has incorporated the majority of the attributes derived from the spatial microsimulation, which combined the LENNON tickets, Census interaction and NRTS data. Recall also that further contextual within-network attributes were derived from the GIS-GTFS model.

7.6.1 General remarks

Journey factors for season tickets are missing simply because the season ticket product description, as well as the network sensors, do not capture journey numbers for season tickets. It can reasonably be assumed that the mobility of passengers is neither constrained nor affected by the presence or absence of count sensors on the network, as all passengers are assumed to be using a ticket within its validity. As such, it can be stated that the mechanism of missing values is distinct from the mobility mechanism. This is one of the pre-requisites which form the basis for objective imputation of missing data values. In some transport systems, the sensors on the network record passenger journeys up to a fixed cap. A typical example is the case of TfL89 transit (buses and trains), where fares are capped after a number of journeys have been accumulated. In essence, once the journey factor reaches a predefined amount, the fare charged no longer increases and stays fixed at the cap value. A missing journey factor in that instance would indicate that the passenger exceeded a minimum threshold number of journeys. In these circumstances the mechanism for the missing data is related to the number of journeys made, which is in turn related to the mobility, and so the missing data mechanism is related to, and not distinct from, the mobility mechanism. Under such conditions (as with TfL) an objective imputation cannot be developed simply by including more exogenous and endogenous attributes, as would otherwise have sufficed under Rubin's postulation (Rubin, 1976). When the missing data mechanism is not distinct from the mobility mechanism, a model of the missing data mechanism would have to be created as part of any data imputation exercise. Such a model would be similar to the one developed in this research for the missing journey factor indicator (Model 3, Table 7.1 and Figure 7.3).

The imputation database consists of a total of about 160 exogenous and endogenous variables. Some of the variables are correlated, and these have been carefully selected to avoid issues reported in the literature concerning correlated covariates in MCMC algorithms (Congdon, 2014, Gelman et al., 2014a, Lunn et al., 2000, Plummer, 2003, Smith and Roberts, 1993). The choice of variable is sometimes driven by the efficiency of convergence of the Bayesian model, as well as by the particular mobility analysis context. It was, for instance, observed that there were no marked advantages in using station entry/exit as opposed to residential origin/final destination. The former, however, yielded better convergence in the mobility model, so was adopted unless the context of a particular case study warranted otherwise. All the variables considered during the development stages of the imputation models are listed in Table 7.3, with those variables eventually adopted highlighted by an emboldened 'Y' in red. The R-software created to fulfil the Bayesian imputation is described in Appendix C, with the detailed software scripts included in the CD accompanying this thesis.

89 TfL stands for Transport for London, the local government body responsible for the entire public transport network within the 32 administrative London boroughs and the City of London, an area and population of about 1,600 km² and 8.5 million respectively. The TfL transport system spans the whole of Greater London.

Table 7.3 List of variables considered in the models reported in the thesis.

| Outcome (O) or exposure/predictor (E) variable | Model 1: flows | Model 2: journey factor | Model 4: ticket validity | Model 5: group stn. ind. |
|---|---|---|---|---|
| Passenger count (Freq) | Y | | | |
| Origin – station entry | Y | Y | Y | Y |
| Destination – station exit | Y | Y | Y | Y |
| Origin – house/other Postcode | Y | Y | Y | Y |
| Destination – work/other Postcode | Y | Y | Y | Y |
| Distance (journey, access, or egress) | Y | Y | Y | Y |
| Cost (monetary, or time) | Y | Y | | |
| Purpose | Y | Y | Y | Y |
| Age/Gender | Y | Y | Y | Y |
| Income | Y | Y | Y | Y |
| Household type/children/cars | Y | Y | Y | Y |
| Ethnicity | Y | Y | Y | Y |
| Access mode to railways | Y | Y | | Y |
| Egress mode from railways | Y | Y | | Y |
| Daily rates of ticket use – journey factor | Y | Y | | |
| Ticket validity period (days/months) | Y | Y | Y | |
| Group station flow/indicator | Y | Y | Y | Y |
| Number of stops | Y | Y | Y | Y |
| Travel time (gross/actual) | Y | Y | Y | Y |
| Stn. arrival/actual departure time | Y | Y | Y | Y |
| Station access/egress distance | Y | Y | Y | Y |

Note: Model 3 has been excluded because it was not adopted to produce the results presented. Model 6 is excluded as it is simply the 'dsum' construct (included in the CD accompanying the thesis). The 'Y's in bold indicate the variables that were eventually used.

PART 3

This is the last part of the thesis and is made up of Chapter 8 and Chapter 9. Having harnessed the large consumer data to create an attribute-rich, micro-level representative population of railway passengers embedded in the wider population, two case studies using such micro-level data are presented. The case studies highlight typical uses of such micro-data in the detailed spatial analysis of mobility phenomena. In the last chapter of this part, ethical issues regarding consumer big data are reviewed, the research presented is discussed, and a synopsis of the thesis is given, together with the limitations of the work, conclusions, and the work anticipated for the future.

Chapter 8 CASE STUDIES IN WEST YORKSHIRE

This chapter presents two case studies of urban mobility on the railways in West Yorkshire. The first case study investigates the phenomenon called rail-heading, whereby passengers travel further to access the rail service when there are closer access points. The second case study investigates the effect of an intervention: a new railway station at Kirkstall Forge. The hypothesis is that the availability of representative micro-level passenger information facilitates the investigation of such mobility phenomena. Recall that the representative railways population was created by harnessing the large consumer ticketing data available from the railways, using a set of complementary technologies: m-IPF spatial microsimulation, the GIS-GTFS data model used for network simulation, and Bayesian imputation, as presented in the previous chapters of this thesis. The rail network in the West Yorkshire County is shown in Figure 8.1, indicating the stations, Rail Zones, and the new Kirkstall Forge railway station. The Bayesian analysis shows that rail-heading is directly affected by the geographic location of residence of the passenger, indicating that the phenomenon is attributable to service provision, and less so to the behavioural attributes of passengers. The mobility dynamics of the new station at Kirkstall Forge are adequately modelled as a geostatistical phenomenon driven by mobility activity at nearby railway stations.

The basic modelling process adopted for the two case studies is as follows. The hypotheses are that the rail-heading phenomenon is either an individual-level, local-level, or global-level phenomenon. At the individual level, rail-heading would be ascribed to the particular idiosyncratic heterogeneity in the behavioural attributes of each passenger. At a local level, rail-heading could be ascribed to the socio-demographic attributes of groups of passengers, or to local geographic influences. At scales across the West Yorkshire study area, global-level rail-heading would be enmeshed within the latent heterogeneity of societal mobility. To assess these rail-heading hypotheses, the rich individual-level dataset created by harnessing the LENNON ticketing data is used90 for mobility analysis as follows. The rail-station access distance of each simulated passenger formed the outcome variable. The explanatory variables consisted of the entry and exit stations, the Postcode Sector of residence and destination of passengers, number of household cars, train stops along each journey, railway access mode, travel cost, journey times, and egress mode. These variables enable an investigation of the range of individual, local, and global mechanisms for rail-heading.

For the impact of a new station at Kirkstall Forge, the hypothesis is that each rail station in the study area is effectively a point for sampling the mobility interaction across the rail network in the West Yorkshire study area. Each sampling produces a sampling distribution which is conditionally independent of mobility activity at other locations across the network (as such, the point measurements enjoy exchangeability). The hierarchy of independence is associated with the origin and destination stations, as well as the distance between associated pairs of stations. The distance between any two stations also serves to weight the extent of the relationship between stations. A hierarchical geostatistical kriging model is adopted to model the effect of a new rail station. The outcome variable is the count of passengers traversing between two stations. The explanatory variables are the associated origin and destination stations, as well as the distance between the rail stations. Details of these models are presented in the methodology section of this chapter.

90 The LENNON ticketing data were harnessed by combining them with the Census and NRTS data. The simulated population was then used as input to a GIS-GTFS data model. These yielded the requisite analysis datasets with rich sets of exogenous and endogenous attributes.

8.1 Introduction

Rail-heading is the name given to the phenomenon in the rail sector whereby passengers are observed to regularly travel further distances to access a rail service when there are closer access points to that service. This propensity for rail-heading could be the result of a plethora of reasons. Passengers could travel further to cross a zonal boundary so as to benefit from lower within-zone fares. This may result in seemingly unexplained congestion on alternative modes of transport, with attendant adverse logistics impacts (Fowkes et al., 1985, Whiteing et al., 2003), highlighting a need for changes in the rail fare structure. Rail-heading could also result from passengers wanting to optimize their utility by accessing the rail network at an entry point that provides greater service frequency and fewer stops, for a quicker journey. Passengers could also be attracted to routes where car parking facilities are available at lower cost. Other influences could be the choice of a more scenic route, or of a route that enables the fulfilment of a range of activities.

The phrase rail-heading was first used in a Fareham Borough Council report on future railway fares policy (Viccars, 2002). In a bid to make rail travel an effective alternative mode of transport, rail fares are typically structured to encourage network travel using various ticket products. Managing these ticket products often becomes complex, as there are about 856 different ticket products available (as at the time of this research, June 2017), with added complexity each time new products are introduced. Often, the effects of restructured fares are not straightforward to manage for either the passengers or the rail authorities. An instance given in the Fareham report was that a restructured ticket product desirably increased rail passenger demand around Southampton and Portsmouth, but inadvertently led to a prevalence of passenger journeys by other modes to the peripheral stations of London. Rail passengers used the cheaper Southampton fares to get to within the vicinity of the City of London, and to benefit from lower central London fares. The resultant congestion and pollution at the peripheral London stations was neither anticipated nor desirable. In the Fareham case, the response of the railway authorities was to recommend a simplification of ticket products, to enable an easier study and anticipation of the rail-heading phenomenon. As a general point, although the London area (Greater London) is seen as a particular case for rail (and road) where special measures might be needed, rail-heading is not a special problem for London, but is something known to occur in numerous locations around the UK. As a result, in response to the Fareham case, further nationwide policies were introduced aimed at minimizing fare differentials set by different Passenger Transport Executives. Both the Fareham and nationwide policy recommendations are simplified solutions to a complex phenomenon, as rail-heading could be an individual (i.e. micro-level) phenomenon. The dearth of micro-level information on passengers makes it more difficult to address individual phenomena like rail-heading; this research aims to address that dearth. A detailed spatial analysis would indicate the categories of passengers responsible for driving rail-heading, enabling tailored solutions to the problem. This section develops a Bayesian spatial analysis to understand the drivers of the rail-heading phenomenon in the West Yorkshire study area. The Bayesian framework developed includes a range of methodologies that have been applied in ecological, medical and epidemiological research to study spatial variation in the risk of disease (Bivand et al., 2008, Lawson et al., 2003, Penny et al., 2011). These methods have been developed in this research to ascertain the propensity for rail-heading in spatial locations.

Figure 8.1 Rail network, zones, and new Kirkstall Forge station.

Another case study investigated is the micro-level demand resulting from a new railway station at Kirkstall Forge. New railway stations are typically introduced as incentives to encourage individuals to use rail as the mode of choice in fulfilling daily activities. Current cost-benefit analyses and impact assessments rely on gravity laws or spatial interaction models, which project an average estimate of the passengers who would use the railway facility. It is typically assumed that, in the UK urban setting, each railway station has a catchment area which forms the pool of potential passengers. This is normally set at a radius of 800 m (Preston, 1987, Preston, 1991), although the size of the population in the adjoining catchment neighbourhood will also influence the demand for the rail service.
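For concreteness, a minimal sketch of this conventional catchment calculation is given below, using the R sf package; the coordinates and residential points are synthetic stand-ins, not project data.

```r
library(sf)

# Two hypothetical stations on the British National Grid (EPSG:27700, metres)
stations <- st_as_sf(data.frame(name = c("A", "B"),
                                x = c(425000, 427500), y = c(433000, 434000)),
                     coords = c("x", "y"), crs = 27700)

# Synthetic residential locations standing in for the surrounding population
set.seed(42)
homes <- st_as_sf(data.frame(x = runif(500, 424000, 428000),
                             y = runif(500, 432000, 435000)),
                  coords = c("x", "y"), crs = 27700)

catchments <- st_buffer(stations, dist = 800)   # the conventional 800 m radius
lengths(st_intersects(catchments, homes))       # potential passengers per station
```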

This simplistic model of demand for a new railway service is useful in many settings where an estimate of aggregate demand is all that is desired. However, in scenarios where services are increasingly tailored to individuals, the idiosyncrasies of passenger circumstance have to be included in any demand framework. The current gravity-law modelling construct is known to suffer flaws in the form of analytic inconsistencies (Simini et al., 2012), with suggestions that mobility is better modelled as a stochastic process of local mobility decisions. The availability of strategies for creating individual-level populations of railway passengers opens up opportunities for micro-level spatial analysis. The Bayesian framework inherently has the potential to construct models conceptually better at reflecting the stochasticity in individual mobility behaviour (Spiegelhalter et al., 2002). Bayesian models, when applied to mobility, are premised on individual movement being a chained sequence of events that depend more on local decisions, guided by an overarching wider activity schedule. This informs the basis for using Markov chain Monte Carlo (MCMC) models to represent a wide range of phenomena in nature (Gilks et al., 1995). Another aspect of new train station demand not typically captured within the framework of the gravity model is that part of the demand arises from transfers from other modes of transport (Leeds, 2006); as such, any assessment of the benefits of a new rail station would require the incorporation of information from other transport modes. The strategy proposed in this thesis for investigating the impact of a new station exploits the detailed endogenous and exogenous information associated with the wider population (with rail passengers embedded). A recent report commissioned by Network Rail (Gleave, 2011) indicated that a new station investment has value beyond being an access point to the railway network, and should increasingly be seen as a focal point of cities and a centre of economic activity. As such, an assessment of the impact of a new station ought to incorporate a broad range of related attributes of a simulated population, like the income status of passengers, journey purpose, origin and destination locations of passengers, as well as a broad range of additional socio-economic attributes embedded in the simulated micro-level population. The Bayesian framework holds promise in this regard, providing a flexible framework for assessing a range of covariates, as well as enabling constructs that better reflect the mobility scenario being modelled. This framework facilitates the incorporation of additional information garnered from experience and knowledge about the variables being modelled.

8.2 Literature review

This section reviews the concepts behind the range of methods developed to analyse spatial phenomena in this research. Two modern modelling notions are reviewed: exchangeability, whereby the phenomena being modelled can be conceived as a hierarchy of similar events; and non-stationarity, whereby the phenomena being modelled are a geographically related set of identically distributed events. These modelling concepts encompass the descriptive range of urban phenomena. They supersede the standard regression models traditionally fitted to data, which do not aim to capture detailed spatial variation or any hierarchical aspects. The newer modelling concepts are more representative of the complexities of mobility phenomena, as mobility is a locally varying phenomenon influenced by a hierarchy of events (Simini et al., 2012). The availability of novel consumer data has opened the realm for the development of more sophisticated analysis methods, buttressed by developments in computing that give the analyst the ability to easily and flexibly specify more representative models capturing relevant facets of mobility. These demands have been exacerbated by the push for more sustainable and efficient management practices, and for services specifically tailored to the demands of the individual.

8.2.1 Spatial modelling concept

Models of urban geographic phenomena are captured by representing physical objects at different scales in a geographical information system (GIS). GIS data can be separated into two categories, vectors and rasters (Dangermond, 1992). Raster data include imagery and pictures, but the predominant GIS data category is vectors. The vector modelling constructs typically used are points, network lines, or areal polygons, depending on the context being modelled. Points are used to represent discrete, non-adjacent features where length or area information is not relevant. For example, features like a location of interest, a school, a hospital, or a railway station can be represented by a point. Lines are used to represent linear features with only one dimension, like roads, river trails, railway lines, and streets. Areal polygons are used to represent 2-dimensional features like the boundary of a city, a school perimeter, or a rail travel zone. The objects are typically described by sets of variables and values, and a model is constructed to capture the joint relationships between the facets of objects, variables and values.

A first basic model of mobility on the railways, for example, could assume that the behaviours of passengers are mutually exclusive, and deal with each passenger in isolation. Such an assumption would be naïve: the resulting model would be exhaustive without providing any summary of the mobility, and would not be representative, as it does not account for the interaction between passengers. A second, simple but more effective, model can assume that all passengers belong to the identical pool of railway travellers, and that one individual's ticket procurement is distinct and independent of another passenger's access to procuring a ticket. Such mobility can be captured by a global model describing the unique joint distribution of the variables describing the objects (passengers). The unique distribution would be the parametric value of the mobility model, and the observed and unobserved variables would describe the heterogeneity in passenger behaviour. Such a modelling framework has yielded the classic spatial interaction models of mobility: the gravity models, entropy maximisation models, radiation models, etc. (Fotheringham, 1983, Gonzalez et al., 2008, Simini et al., 2012, Wilson, 1969). A further improvement on this second type of model could account for non-stationarity in mobility behaviour, positing that mobility in one location could have location-specific influences which die away with distance. In such a case the dependence between objects is due to similarity in geographic proximity, and not to similarity in object types.

There are fundamentally two systems for modelling such non-stationarity91. In the first system, described above, each object is assumed to be influenced more by other objects in close geographic proximity, and less so by objects further away. In such a case, objects are related by geographic proximity. In the second system, objects are related by being similar. Such objects may not necessarily be in the same geographic proximity, but are proximal by virtue of coming from the same distribution, warranting their attributes to be similar. The first notion, involving spatial proximity, is adopted in developing non-stationary analysis models like geographically weighted regression (GWR), typically applied to areal lattices (Brunsdon et al., 1996, Cleveland and Devlin, 1988, Fotheringham et al., 1998), and geostatistical kriging (Cressie, 1990, Stein, 2012, Lake, 2016), typically applied to point objects. The second notion, proximity of objects by virtue of similarity, is adopted in developing hierarchical models and mixture models (Carlin et al., 1992, Gelman and Hill, 2006, Richardson and Best, 2003) based on the notion of exchangeability (De Finetti, 1972).

In summary, three fundamental modelling systems are identified. The first system is the simple model of independent objects. The second system construes objects as independent samples which are identically distributed; this second system can be further complicated by introducing geographic (so-called spatial) non-stationarity. The third system is the case where the objects are considered to belong to hierarchies of groups, whereby objects in a group are considered similar irrespective of their geography. These three scenarios are represented in Figure 8.2. The first case (on the left) represents objects y₁, …, yₙ which are mutually exclusive. In the middle case, the objects are independent and identically distributed, with distribution θ. In the third case (on the right), the objects are conditionally independent, subject to the higher parameter attributes μ, σ². These frameworks represent cases where the objects are, respectively, exclusive, identically distributed, and exchangeable, as discussed briefly below.

91 It is noteworthy to point out that, formally, geographic non-stationarity is conceptually distinct from group non-stationarity, albeit both are founded on the concept of objects (for example rail passengers) which are identically distributed by coming from the same wider pool of railway passengers. The distinction is that the former is founded on the concept of object independence, whilst the latter is premised on the concept of groups of objects that are conditionally independent (exchangeable).

Figure 8.2 Exclusive, identically distributed, and exchangeable (hierarchical) outcome variables.
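The second and third systems sketched in Figure 8.2 can be written down directly in BUGS/JAGS syntax. The following minimal sketch, with illustrative names and vague priors assumed, contrasts a single identically distributed pool with an exchangeable hierarchy of groups.

```r
# Second system: one shared distribution theta for all objects (iid).
iid_model <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(theta, tau) }
  theta ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}"

# Third system: exchangeable group-level parameters drawn from a common
# hyper-distribution governed by (mu, sigma^2).
hier_model <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(theta[group[i]], tau) }
  for (g in 1:G) { theta[g] ~ dnorm(mu, tau.g) }   # exchangeable group means
  mu ~ dnorm(0, 0.001)                             # hyper-parameter
  tau ~ dgamma(0.001, 0.001)
  tau.g ~ dgamma(0.001, 0.001)
}"
```

Under the hierarchical form, the order in which the groups are drawn is immaterial, which is exactly the exchangeability property discussed in the next section.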

8.2.2 Exchangeability phenomena

Consider a scenario of passenger access distance to train stations. The access distance in a particular zone, say Zone 1, could simply be a result of the zone being large and having limited transport facilities, such that there is only one station in the middle of the zone. As a result, the access distance of rail passengers in Zone 1, which is geographically large, would be relatively high compared to the zonal average access distance to a train station (i.e. considering the individual access distances to train stations across the other zones in the region). For another zone in the region, say Zone 2, of smaller geographic size and similarly with one train station in the centre, the resulting access distances for passengers will be relatively lower than for Zone 1. Imagine there is a lattice of circular zones in the region, all with a single station in the middle of the zone. For a zone, say Zone 3, at a distance away from Zone 1 and Zone 2, the access distance to the train station in Zone 3 would not depend on its proximity to Zone 1 or Zone 2; instead, the size of Zone 3 would determine the access distance to its train station. In this example, the size of the geography is the unique parameter driving railway station access distances. In this simplistic example, the access distance is assumed to be a straight-line path, whereby the topology of the zones is such that the road network is a non-convoluted92 straight line.

To formalize some of the concepts presented, the notion of conditional independence is introduced by developing the simple example of train station access distance further. Assume that the example of the circular zones is complicated, such that instead of a single train station in the middle of each zone, there are a number of train stations in each zone. Assume also that the distribution of the count of stations in each zone is governed by a unique parameter derived from, say, a Poisson likelihood function (to represent counts of stations in zones). Then the relationship between the size of geography and the number of stations in the zones is independent, conditional on the parameter governing the numbers of stations in the zones. To see why this is the case, imagine that as we go farther away from the centre of the region (made up of all the zones), the number of stations in a zone decreases. Similarly, as we go farther away from the centre, the size of the zones increases. In essence then, the parametric relationship is that the farther away from the centre, the larger the zones and the fewer the stations. However, conditional on the parameter value (i.e. within those zones of the same size and distance from the centre), the zones are independent, and any variation in values among these similar zones can be considered purely random. As such, we say that similar zones are independent of other sets of similar zones, conditional on the parameter value.

In the case described above, the access distance to train stations across the entire region (i.e. including all the zones) is not driven by the geographic proximity of the zones. Instead, it is driven by the parameter distribution across the entire region. Zones of similar parameter value have the same number of stations within them. In such a case, a model of zonal access distance would be better constructed as a hierarchy of aggregate sets of zones, with each aggregate set constituted by zones with statistically similar attributes. The zones within an aggregate are similar, but independent of each other. The aggregate zones in turn are conditionally independent of other aggregate zones. As such, the order in which the aggregate zones are drawn is not important, and the variation in the zones within an aggregate can be considered random. Sets of aggregate zones whose order of drawing is unimportant are described as exchangeable (De Finetti, 1972, Gelman and Hill, 2006, Lunn et al., 2012). The parameter value of the entire region is called the hyper-parameter governing the phenomena in the region. The identical distributions within each aggregate set of zones (conditional on the hyper-parameter) are in turn governed by a lower-level parameter distribution, which is conditionally independent of the values in any other aggregate group.

In this research, the notion of exchangeability is used to develop a hierarchical conditionally autoregressive (CAR) model of rail-heading in the West Yorkshire study area. The rail-heading model developed accounts for geographical similarity as well as similarity in clusters of phenomena. The advantage of developing the hierarchical model within the Bayesian framework is that it enables the degree of similarity across the various characteristics of mobility to be quantified (for example, passengers travelling from the same location or for the same purpose can be classed as being within the same hierarchy, or the hierarchy may be constituted by income bracket, or even age group), and this is useful in conducting studies on the drivers of mobility phenomena.

92 In the wider Yorkshire study area, the realistic topology is such that the roads are convoluted due to diversions around hills, rivers, etc.

8.2.3 Identically distributed and non-stationarity

Whilst the notions of conditional independence and exchangeability presented in the previous section address hierarchical mobility models representing clusters of phenomena each constituted by similar events, the notions of identical distribution and non-stationarity are important in conceptualising mobility whereby the proximity of locations implies proximal phenomena and dependency. A major rail station (MS) of high mobility activity could, for instance, be located between two smaller stations (called SS1 and SS2). The distance from MS to SS1 could be three times the distance from MS to SS2. However, if mobility is driven by a station's catchment-area socio-demographic profile, then it is plausible that SS1 could have higher mobility activity than SS2, despite SS1 being farther away from MS. The conventional notion, however, has been to assume that locations in the same vicinity behave similarly, and as such a kernel can be defined to express the relatedness (and non-stationarity) of geographically proximal locations. Such a framework is a development of the concept of independence and identical distribution, as illustrated by the middle sketch of Figure 8.2. In such a framework, the constituent parts (which could be points, lines, or areas) are assumed to be identically distributed, thereby enabling a global average to represent the heterogeneity in such phenomena. In the case of mobility on the railways, these units are the passengers in mobility interaction on the network. In modelling mobility on the railways in this instance, the conventional assumption would be that the data are independent and identically distributed (iid), in the sense that the passengers arise from the same (identical) population of railway travellers, and that one individual's ticket procurement is distinct from (independent of) another passenger's access to procuring a ticket.

To extend this iid concept to account for geographic non-stationarity, kernels are defined postulating that points in close geographic proximity behave similarly. The kernel density function effectively weights the influence on phenomena towards geographically proximal aspects. This concept has been used in the literature to create relatively established methods like geographically weighted regression (GWR) (Brunsdon et al., 1996, Fotheringham et al., 1998, McMillen, 2004). In GWR, phenomena in a geographic vicinity are assumed to be driven more by aspects in that vicinity, and less so by more distant aspects; the density of the kernel decides the nature of the influence. The concept of GWR has been applied in transport to explore spatial variations in access to different modes of transport (Andersson, 2017), and to develop planning tools for estimating predictors of collisions on transport networks (Hadayeghi et al., 2010). The notion of GWR is conceptually similar to Poisson-gamma moving average models (Ickstadt and Wolpert, 1997), which express the relationship between areal Poisson counts as a convolution or moving average. The similarity between GWR and Poisson-gamma models is the notion that phenomena at an areal location are related to those at other locations in the same geographic vicinity. The non-stationarity in areal phenomena can be extended to point phenomena, such that geographically proximal points are more similar and related than points at a distant geography. Such concepts have been developed in the field of geostatistics (Cressie, 1990, Stein, 2012), dealing with spatial-temporal non-stationarity. Whilst GWRs are applicable to areas, point phenomena are similarly modelled using geostatistical kriging methods (Cressie, 1990, Lake, 2016, Stein, 2012), which assume that phenomena in the vicinity of points are a weighted function of the phenomena at the individual points. In kriging, instead of defining a kernel to weight the influence of points on the mechanism being modelled, an interpolation enables the deduction and imputation of unobserved point values. Whilst GWR is aimed at areal activity, kriging is aimed at point phenomena. Kriging has been used in the literature to estimate radioactive contamination at different point locations on the Island of Rongelap (Diggle et al., 1998). A similar concept is developed in this research to assess the influence of a new railway station on the mobility interaction on the railway network. The railway network is seen as a hive of mobility interaction, and each railway station is seen as a sampling point on a network of continuous mobility interaction. The activity measured at each station represents the sampling distribution of mobility taken at a point on a continuous network. On this basis, a geostatistical kriging model is developed to impute phenomena at the location of a new station. The Bayesian framework is chosen for this analysis, as the kriging model developed is hierarchical, and such hierarchical models can be implemented only in the Bayesian framework. Double constraints that ensure balancing of the origin and destination demand populations on the rail network can be embedded within a larger graphical kriging model in the Bayesian framework (Escobar and West, 1995, Roeder, 1990).

8.3 Methodology

A conditionally autoregressive (CAR) model is developed for use in investigating the rail-heading phenomenon within the West Yorkshire study area. In principle, the CAR model specifies how phenomena in one geographic area Lᵢ relate to phenomena in another area Lⱼ, where i, j ≤ N, with N the number of areas in the geography being analysed. Weights are specified to express the spatial dependence between locations within areal or lattice data. A range of CAR models are implemented in the literature (Lee, 2013, Lunn et al., 2012). Information about the geographic boundaries and spatial adjacency of the areal data is imported from a GIS into a Bayesian model (Lunn et al., 2000, Plummer, 2003, Spiegelhalter et al., 2007, Sturtz et al., 2010). A univariate conditional distribution is then created to relate covariates at one geographic location to those at other locations. The Bayesian CAR model is enhanced to account for mobility associated not only with geographical non-stationarity, but also with non-geographic (so-called unstructured) hierarchical phenomena. Secondly, a geostatistical kriging model is developed to ascertain the influence of a new railway station at Kirkstall Forge, West Yorkshire. Vectors of random variables Pᵢ and Pⱼ, associated respectively with point locations i and j (where i, j ≤ N, the number of points in the geography), are related via a specified multivariate distribution. The multivariate covariance is then specified as a function of the distance Dᵢⱼ between the points. The parameters of this function are deduced to enable an estimate at a new, unobserved location representing a new rail station. The kriging model developed in this research is hierarchical, to better reflect the range of clustered phenomena associated with a new railway station; this is the first application of such a hierarchical kriging model in the literature. The analysis is illustrative, highlighting the Bayesian development of these methodologies. It is noteworthy that the ability to perform Bayesian modelling to understand the drivers of mobility in these case studies is a result of the availability of the synthetic, attribute-rich, and representative micro-level data created by the set of complementary methodologies developed in the previous chapters: spatial microsimulation of skewed seed data, GIS-GTFS network simulation, and Bayesian imputation. The attribute-rich dataset thus facilitates the CAR model developed to undertake the rail-heading case study, whilst a geostatistical hierarchical kriging model is developed for the second case study, which investigates the effect of the new station at Kirkstall Forge.

8.3.1 CAR-BYM model of rail-heading

This section develops the rail-heading analysis methodology for West Yorkshire County. The analysis excludes flows that emanate from and terminate outside the County, as these flows were not available to the project93. A conditionally autoregressive (CAR) spatial model (Fotheringham et al., 2003, Lee, 2013, Lunn et al., 2012) is used to study the within- and across-area variability in individual-level access distance to the rail service. The rail-heading methodology developed in this research is similar to that adopted in the literature for areal-level disease mapping (Besag et al., 1991, Kelsall and Wakefield, 1999, Richardson and Best, 2003). The CAR method uses an area-level hierarchical random effect (Gelman et al., 2014b) to smooth the individual-level station access distance. An additional unstructured random effect Ru[i] is added to the CAR model to capture any effects of unobserved, unstructured, hidden (latent) covariates (Lunn et al., 2012). Within the Bayesian model, Ru[i] takes on a prior distribution, typically a normal distribution, and Bayesian learning enables the exact form of this distribution to be deduced by identifying its parameter values. The formulation that includes hierarchical random effects is called the BYM model (Besag et al., 1991), named after the authors (Besag, York and Mollié) who made the original model proposal. As such, the rail-heading model developed in this research is called the CAR-BYM model, and it is used to deduce the area-level and non-area-level drivers of the rail-heading phenomenon.

In essence, the passenger access distance within the database is regressed against the socio-demographic and other exogenous and endogenous covariates. If, for instance, a particular age or income bracket results in higher access distance then, all other variables being constant, the inference is that passengers in that age or income category have a higher propensity for travelling further to access a rail service. Similarly, if passengers with lower travel times for the distance travelled regress better with access distance, then the inference is that rail-heading is driven by a desire of that group of passengers to optimise time utility. If particular origin-destination (OD) flows are associated with an increased access distance, then perhaps that flow is poorly served, and passengers have to travel further to access that entry station. A summary of the salient aspects of the R-software code for implementing the CAR-BYM multivariate model for the rail-heading phenomenon is included in Figure 8.3, and reproduced in Appendix E.1 with model annotations. Details of the CAR-BYM model implemented in OpenBUGS (Spiegelhalter et al., 2007, Sturtz et al., 2010) software are included in Appendix E.2.

93 The impact is that the rail-heading phenomena reported would not account for those passengers crossing the West Yorkshire zone 5, 6 and 7 boundaries (see Figure 8.1) to benefit from cheaper rail fares. Further, as discussed in Chapter 5 and Chapter 6, the comprehensiveness of passenger volumes may be compromised; however, this effect would be nominal due to the representativeness of the simulated population.

Figure 8.3 CAR-BYM multi-variate rail-heading R-script.

8.3.2 Kriging model for new rail station

The conditionally autoregressive models developed for investigating rail-heading would not be appropriate for investigating the effect of a new station. This is because, whilst the incidence of rail-heading can typically be associated with a zone or with a particular type or group of passengers with specific attributes, a new train station is appropriately associated with the station as a point feature with many inherent attributes and characteristics. Mobility activity observed at a railway station is conceived as a sampling distribution at a specific point on a continuous network of passengers in mobility. As a result, the rail passenger mobility related to a new station is modelled using geostatistical models which have their origins in kriging methodologies (Cressie, 1990, Lake, 2016, Stein, 2012). The development of such a geostatistical model is achieved by applying methods of Bayesian inference using the Gibbs sampling methodology (Kruschke, 2014, Lawson et al., 2003, Lunn et al., 2012, Plummer, 2003, Spiegelhalter et al., 2007, Sturtz et al., 2010). In the Bayesian framework, the vector of passenger attributes associated with each point-location train station is modelled as a multivariate normal distribution. The covariance matrix of this distribution is a function of the distance between the points, to reflect the correlation between geographically proximal points. The BUGS implementation ensures that the covariance matrix is symmetric and positive definite, to be conceptually consistent with the Cartesian spread and correlation coefficients of the station points. The correlation function is defined to decrease linearly with distance, down to a pre-defined distance of no correlation. In the Bayesian kriging model, a new station is implemented by including an unknown point at a defined geographic coordinate. The rail mobility associated with this point is then deduced by interpolation, derived from the posterior predictive distribution (Albert, 2009, Box and Tiao, 2011, Gelfand et al., 1990, Gelman et al., 2014a, Kruschke, 2014, Lunn et al., 2012, Plummer, 2003). Two kriging models have been developed. The first model, shown in Figure 8.4, aggregates the flows associated with each station. In the second model, shown in Figure 8.5, the station inward and outward flows are disaggregated into components, to establish the contribution of each of the other stations to the aggregate inflow and outflow volumes recorded at the new station, and vice versa. The second model is hierarchical on the origin and destination stations. Detailed R-script specifications and annotations of these models are included in Appendix E.2.
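A minimal BUGS-syntax sketch of this kriging construct is given below, under assumed names: outcomes at the N observed stations plus one unknown point are jointly multivariate normal, with correlation falling linearly with distance to zero at a cut-off. It is an illustrative outline, not the thesis scripts of Figures 8.4 and 8.5.

```r
krig_model <- "
model {
  # M = N + 1: the observed stations plus the new station; the unobserved
  # element of P[] is imputed from the posterior predictive distribution.
  P[1:M] ~ dmnorm(mu[], Omega[,])
  for (i in 1:M) { mu[i] <- beta0 }
  for (i in 1:M) { for (j in 1:M) {
    # linear decay in covariance, reaching zero at distance d.max
    R[i, j] <- sigma2 * max(0, 1 - D[i, j] / d.max)
  } }
  Omega[1:M, 1:M] <- inverse(R[, ])
  beta0 ~ dnorm(0, 0.001)
  sigma2 ~ dunif(0, 100)    # uniform priors, consistent with the choice below
  d.max ~ dunif(1, 50)
}"
```

In practice a small nugget term is often added to the diagonal of R to keep the matrix inversion numerically stable; the sensitivity of such models to the priors on sigma2 and d.max is discussed next.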

This geostatistical model developed to infer the impact of a new station is similar to models implemented in the literature (Diggle et al., 1998), and such models tend to produce results that are sensitive to the choice of priors. As such, guided by an example of estimating radioactivity at point locations on an island (Diggle et al., 1998), a sensitivity analysis of the choice of priors was included as part of the model development process, as discussed next. As discussed in Section 7.4.1, during Bayesian modelling a likelihood function is created to model the outcome; similarly, a prior is specified to model the parameters. Priors can broadly be distinguished into two categories: informative or vague. In practice, vague priors tend to be used in simple situations to show the objectivity of the model, in the sense that the prior has minimal influence on the model results. In these simple situations, the analyst has an intuition about the solution, and a uniform distribution in the neighbourhood of the solution would present as a vague (also called uninformative) prior. In most practical situations, informative priors tend to be used, as there is limited knowledge about the parameter results and a tendency for sensitivity to the choice of prior. In such situations the analyst exploits any knowledge of the data generation mechanism in specifying an appropriate prior to garner a solution to the model. The sensitivity analysis chooses a distribution for the prior, as well as the parameters of this distribution. If, for instance, a mechanism being modelled is conceived as a phenomenon represented by a normal distribution, but the phenomenon further contains a lot of legitimate outliers, it would make sense to use a t-distribution as the prior, simply because the t-distribution has thicker tails than the normal distribution, and these tails would better contain and model those outliers (Lange et al., 1989, Peel and McLachlan, 2000). Further, having chosen the particular distribution to model the prior (i.e. normal, t-distribution, gamma, beta, etc.), there is the further challenge of specifying values94 for the parameters of this distribution. If a normal distribution prior were chosen, for instance, then the mean and variance (μ, σ²) of the distribution would need specifying. The sensitivity analysis as such involves iterating through the range of distributions potentially suitable for the prior, as well as iterating through the range of parameters of each distribution.

94 Note that in Bayesian modelling, the values of the parameters of the prior distribution could in turn be distributions. This would depend on the complexity of the model being constructed by the analyst.

In the case of the geostatistical model for investigating the effect of a new station, the priors adopted were uniform distributions, as other distributions failed for the wide range of initial values tested. The sensitivity analysis then involved iterating through a range of values for the parameters (minimum, maximum) of the uniform distribution, and through a range of initial conditions, until convergence was achieved for the Bayesian model.
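Such a sweep is mechanical to script. The sketch below, with a deliberately simplified stand-in model and illustrative names throughout, re-fits the model over a grid of uniform-prior bounds and records the Gelman diagnostic for each fit.

```r
library(rjags)
library(coda)

sens_model <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(mu, 1 / (sigma * sigma)) }
  mu ~ dnorm(0, 0.001)
  sigma ~ dunif(prior.lo, prior.hi)   # uniform-prior bounds passed in as data
}"

grid <- expand.grid(lo = c(0, 0.1), hi = c(10, 100))
y <- rnorm(50, 5, 2)                  # toy data standing in for station flows

psrf <- apply(grid, 1, function(g) {
  jm <- jags.model(textConnection(sens_model),
                   data = list(y = y, N = length(y),
                               prior.lo = g["lo"], prior.hi = g["hi"]),
                   # initial values drawn within the current prior bounds
                   inits = function() list(mu = rnorm(1),
                                           sigma = runif(1, g["lo"] + 0.01, g["hi"])),
                   n.chains = 3, quiet = TRUE)
  update(jm, 1000)                    # burn-in
  post <- coda.samples(jm, c("mu", "sigma"), n.iter = 4000)
  gelman.diag(post)$mpsrf             # multivariate shrink factor per fit
})
cbind(grid, psrf)                     # convergence across the prior grid
```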

Figure 8.4 Aggregate kriging script for new Kirkstall Forge.

Figure 8.5 Disaggregate kriging script for new Kirkstall Forge.

8.4 Results

The results of investigating the rail-heading phenomenon and the impact of a new rail station at Kirkstall Forge are presented below. The mechanism of the rail-heading is presented first. The rail-heading reported is restricted to phenomena arising from transit travel within the West Yorkshire study area; as such, rail-heading that might be associated with rail zone 5, 6, and 7 passenger mobility would not have been captured. The phenomenon of rail-heading is most pronounced in Postcode Area BD1, and in particular for passengers who started their journeys within Postcode Sectors BD12, BD11, BD15, and BD71. These Postcode Sectors are in the vicinity of the Bradford BR group of stations (i.e. Bradford Forster Square and Bradford Interchange). The rail-heading is associated with passengers who make lengthy trips of about 29 km from the neighbourhood of Bradford BR to the Wakefield group of stations (i.e. Wakefield Kirkgate and Wakefield Westgate) to access the train station. These passengers make the 29 km trip by alternative modes of transport in order to fulfil a relatively shorter train trip of about 17 km from Wakefield to Leeds. In the case of the new station at Kirkstall Forge, the aggregate volume of passengers associated with the new station, and the mobility interaction between the new station and all the other existing stations on the rail network, are estimated. These results are presented in the form of an origin-destination (OD) map. The advantage of estimating the new station flows using the hierarchical kriging model is the ability to deduce the disaggregate composition of these passengers, enabling the rail industry to use such information to estimate the impact of the new station on various socio-demographic groups, or to identify which residential and work-related areas would be most impacted by such a new station. The flows estimated from the kriging model were validated by comparison with MOIRA flows, and with recent flows published by the ORR now that the Kirkstall Forge station has been open since June 2016. The flows estimated in this research are about 45% less than those projected by the ORR. This was anticipated, as the flows emanating from and terminating outside West Yorkshire (between 25% and 45% of the total) were not included in the analysis. In addition, the PTE tickets were not included in the analysis, and these account for between 25% and 35% of county rail flows. Further, it has not been possible to validate the disaggregate passenger volumes deduced from the kriging, as there are no available datasets that disaggregate passenger flows between railway stations by socio-demographic or other attributes of passengers.

8.4.1 Rail-heading phenomena

Results for the propensity for rail-heading in the West Yorkshire study area are presented. The coupled CAR-BYM geospatial model captures the within- and across-area differences in passenger access distance to train stations, and this in turn is a measure of the propensity for rail-heading.

8.4.1.1 Variation due to latent random effects

The ratio of the variation due to the structural-areal versus the non-structural latent random effect reveals whether rail-heading is a geographic zonal phenomenon, or is embedded in the heterogeneity of passenger socio-demographic attributes or urban morphology. The covariates of mobility (travel distance, cost, household car ownership, train station access and egress mode, number of stops, etc.), as well as a number of socio-demographic attributes, are individually tested to ascertain any direct effect on, or relationship with, access distance to train stations. The results shown in Table 8.1 give a better insight into the mechanism of rail-heading. In summary, the results show that the total travel distance that passengers within a particular Postcode Sector boundary are willing to endure relates directly to how far afield they would typically be willing to travel to access the rail service. The more household cars passengers own, the more willing they are to engage in rail-heading. Rail-heading passengers are driven by a requirement for fewer stops on a commensurate journey, perhaps accessing the train service at stations where they can benefit from direct trains (or trains with fewer stops) to get to their final destinations quicker. Passengers who partake in rail-heading are more likely to access the station by car, and less likely to do so by walking. The R-software model in Figure 8.3 was developed to generate the results presented in Table 8.1. The CAR component of the CAR-BYM model specifies the relationship between variables at one area (say a Postcode Sector) and those at all other areas, using conditional regressions (Besag et al., 1991). The BYM component imposes Bayesian learning about the strength of area-specific spatial dependencies, by including a non-areal random distribution to capture any unobserved, unstructured (i.e. non-areal) random variations. The simulated population includes information on the distance each passenger travelled to access a train station, the passenger's origin Postcode (typically their residence) and final destination Postcode, entry and exit train station points, as well as a plethora of endogenous and exogenous passenger attributes.

Rail-heading mapping involves producing Postcode-level summaries of the distance travelled to access train services. The model developed is shown in the pseudo-code in Figure 8.3. Typically, spatial dependencies are modelled using hierarchical, spatially structured random effects, with the variables assumed conditionally independent given the random effects (Clayton and Kaldor, 1987, Kelsall and Wakefield, 1999). In the case of rail-heading, the outcome variable Y[i,k] is the passenger count in Postcode Sectors, with a Poisson likelihood function (seen in Figure 8.3). The covariate E[i,k] is a measure of the expected count of passengers, deduced by including the perimeter (or area) of the Postcode Sector. The covariate X[i,k] represents each variable that was separately adjusted for (see the list of covariates in Table 8.1). The travel distance adopted is a ratio: the numerator is the passenger distance travelled to access the train station; the denominator is the expected passenger travel distance, worked out from the distance between the usual residence and the nearest train station, calculated along the road network. The term alpha[k] is an intercept which represents the baseline distance travelled by passengers. In the spatial mobility data, sampling variability would often obscure any systematic between-area differences in access distances to train stations. A remedy is to use a spatial hierarchical model by way of an area-level random effect S[i,k] with a normally distributed CAR prior. As such, S[i,k] captures the effect of unobserved, spatially structured covariates (i.e. those related to areal phenomena). To capture the effect of unstructured unobserved covariates, an additional random effect U[i,k] is attached to the CAR model. U[i,k] is defined as a normally distributed random effect, with posterior parameters revealing any true unobserved non-structural variation. A Bayesian model then identifies the extent of the structured (i.e. spatial) and non-structural data variation, by computing measures (i.e. pS, pU, pX and pE; see Figure 8.3 and Table 8.1) of the posterior proportion of the total within- and between-area variation explained by each random effect.

The relative magnitudes of the measures pS, pU, pX and pE enable an inference of how much the propensity for rail-heading is associated with the fact that passenger journeys originate from different Postcode Sectors, or whether it is just a function of the size of the different Postcode Sector geographies. These measures also indicate whether rail-heading is mainly associated with other unobserved heterogeneity between passengers across the West Yorkshire study area.

As observed from the pseudo-code in Figure 8.3, pS is a measure of the influence on rail-heading of random variations in area-level access distances to the rail network. In essence, it answers the question: how does the Postcode Sector in which a passenger's journey originates affect their propensity to rail-heading? Then, pU is a measure of the influence on rail-heading of the heterogeneity in passenger mobility. Note that pU is not associated with any Postcode Sector influence on rail-heading; in essence pU answers the question: how do the random differences not associated with the Postcode Sector where a passenger starts their journey affect their propensity to rail-heading? Note that pS and pU are divided by 'sumvar' (see the pseudo-code in Figure 8.3). The sumvar95 is an aggregate measure of the standard deviation associated with the covariates S[i,k], U[i,k] and E[i,k]. Within the pseudo-code of Figure 8.3, pX is the measure for the particular variable that was adjusted for (i.e. travel distance, number of cars, journey time, etc. as listed in Table 8.1). Finally, pE (as seen in the pseudo-code of Figure 8.3) is a measure of the standard deviation (variation) in Postcode Sector sizes in the West Yorkshire study area.

The standardization is uniformly applied in computing the measures pS, pU, pX and pE: all the measures are divided by 'sumvar'. The division has no influence on the relative values reported for pS, pU, pX and pE. The standardization is however included here to enable the comparison of this model with any similar developments in the literature (Besag et al., 1991, Lunn et al., 2012). The results reported in Table 8.1 as such enable the relative comparison of the values pS, pU, pX and pE; their absolute magnitudes are less significant. QR80 is the 80% quantile ratio, used to quantify the spread between the 10% and 90% quantiles of a specific random effect. QR80 is related to the average spread in the passenger count variable after allowing for the difference in size of the Postcode Sector associated with each count. Further details of models similar to the CAR-BYM are in the literature (Lunn et al., 2000, Plummer, 2003, Sturtz et al., 2010). Spatial models are usually designed to quantify variation over a 2-dimensional space, and are widely used in environmental sciences, image processing, and medicine (Lawson et al., 2003, Penny et al., 2011, Ripley, 2005). The development presented in this thesis is the first application to investigate rail-heading phenomena and mobility interaction on the railways.

95 This value is used for standardization to enable comparison of the results with similar endeavours in other diverse fields of research.
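As a hedged illustration only (not the thesis code), a QR80-style quantile ratio can be computed in R from the posterior samples of a random effect as follows; the function name and the exponentiation onto the count scale are assumptions.

    # Hypothetical sketch: 80% quantile ratio from posterior samples of a
    # random effect (one MCMC draw per iteration for a given Postcode Sector).
    qr80 <- function(samples) {
      q <- quantile(exp(samples), probs = c(0.10, 0.90))  # spread on the count scale
      unname(q[2] / q[1])                                 # 90th over 10th quantile
    }
    qr80(rnorm(10000, mean = 0, sd = 0.5))                # toy usage example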

Table 8.1 Structural and non-structural rail-heading.

Covariates        pS    pU    pX    pE    QR80
Travel distance   0.65  0.35  0.97  0.04  10.44b
No. of cars       0.63  0.37  0.87  0.02  1.34b
No. of stops      0.58  0.42  0.86  0.02  98.7m
Access mode       0.55  0.45  0.93  0.04  12.3m
Travel cost       0.53  0.46  0.93  0.04  1.49m
Journey time      0.50  0.50  1.04  0.04  5.82m
Income            0.47  0.53  1.03  0.06  226.8k
Egress mode       0.45  0.55  0.88  0.04  576.8k
Age               0.37  0.63  1.04  0.06  167.9k
Residual          0.29  0.71  0.00  0.06  150.46

pS is the measure of variability due to the structural random effect. pU is due to the non-structural latent random effect. pX is the proportion of the variance explained by the individual covariate, and pE is the proportion explained by the difference in perimeter/area of the zones. 'k', 'm', and 'b' represent thousand, million and billion respectively.

8.4.1.2 Propensity to rail-heading

The local propensity to rail-heading at Postcode Sector level is shown in Figure 8.6. The simulated population shows large numbers of passengers (the darker shaded areas in the proximity of the Bradford stations shown in Figure 8.6) travelling as far as the Wakefield stations to access the network en-route to Leeds station. The daily timetables reveal that on average a journey from the Bradford stations to Leeds takes longer by rail than a journey from the Wakefield stations to Leeds. This is probably because the Leeds to London high speed trains96 all stop at Wakefield, so the journey is around 15 minutes; these intercity services are also more reliable than the local network trains and less subject to high overcrowding. Looking at the map in Figure 8.6, the darker areas (indicative of rail-heading) around Brighouse would be individuals who would otherwise use the Huddersfield to Leeds rail line, but that line is part of the TransPennine97 Express service and is extremely unreliable, particularly in winter. The provision of access roads from Bradford to Wakefield perhaps also serves as a further incentive to rail-heading. This is perhaps further amplified by passengers crossing from railway zone 4 to zone 3 during their journey, thereby also benefitting from cheaper fares. Some peripheral Postcode Sectors also reveal a higher propensity to rail-head, like the Oldham Postcodes towards the western extreme of the county (which contain the Todmorden and Walsden stations). However, any pronounced boundary rail-heading would be attenuated because the analysed flows have been restricted to those that emanate and terminate in West Yorkshire. To better reflect the rail-heading phenomena in access distance, the zonal areas have been included in the model as an offset, to normalize for the typically larger numbers of passengers associated with large Postcode Sectors.

96 The current high speed train service between Leeds and London Kings Cross has an average journey time of 2.28 hrs, and the quickest journey takes 1.98 hrs. The weekday service is provided 32 times daily. Journey times tend to be longer on weekends and holidays, typically due to heightened logistical issues and passenger congestion.

Figure 8.6 Rail-heading propensity in West Yorkshire.

97 The TransPennine Express is a UK TOC which runs regular express regional railway and intercity services between the major cities of Northern England and Scotland, 24 hours daily, including through New Year's Eve night. TransPennine Express also runs a popular daily service between York, Leeds and Manchester Airport (also running at least every 3 hours at night).

8.4.2 New Kirkstall Forge station

In the spatial interaction paradigm, each rail station is conceptually assumed to have a propensity both to attract and to produce passenger volumes. Kriging has been used to estimate the aggregate volume of outflows from the new Kirkstall Forge station. Similarly, the disaggregate volumes of flows from Kirkstall Forge to all other stations in the West Yorkshire study area have been estimated. These results are presented below in the form of a 3D origin-to-destination (OD) matrix. Apart from Kirkstall Forge, further comparable results were derived for the relatively new Apperley Bridge98 station. The results presented in this section are not intended to be exhaustive, but primarily serve to illustrate in a practice-oriented way the Bayesian framework for estimating the impact of a new train station. The particular advantage of the Bayesian method is that the estimates can be disaggregated to estimate the impact of the new station on other existing stations, and further to deduce the category of passengers most impacted by the new station, or the type of customers an anchor tenant at the train station might be expected to encounter. Due to the limitations in the data available to the project, the flows analysed in this case study have been restricted to those that emanate from and terminate in the West Yorkshire study area. About 25-35% of flows in the West Yorkshire study area have been reported in the literature to be associated with flows emanating from or terminating outside the county; this is anticipated to affect the volume of passengers deduced for the new Kirkstall Forge station. The constitution of the passengers, however, is not expected to be affected adversely, as the distribution of the simulated population used for analysis was representative. Further, tickets sold by regional integrated transport operators (the so-called PTE tickets) have been excluded from the analysis, as these PTE tickets have not been available to the project. As reported earlier in Chapter 2 and Chapter 3, the PTE tickets could account for about 25-35% of flows within the West Yorkshire study area, and this will also affect the passenger volumes deduced for the new Kirkstall Forge station.

98 Apart from Kirkstall Forge station, which opened in June 2016, the Apperley Bridge railway station is also a relatively new station; it opened in December 2015 and is along the same stretch of rail-line as the Kirkstall Forge station.

8.4.2.1 Aggregate Kirkstall Forge flows

The plots shown in Figure 8.7 and Figure 8.8 are typical outputs from the Bayesian modelling: Figure 8.7 presents the results for Kirkstall Forge station, while Figure 8.8 presents those for Apperley Bridge station. These results have been generated using the model scripted in Figure 8.4, and are posterior estimates of the aggregate volumes of passengers associated with the new railway stations (i.e. Kirkstall Forge and Apperley Bridge). Bayesian results typical of those in Figure 8.7 and Figure 8.8 have been discussed in detail in Chapter 7, and reference is hereby made to that discussion. Recall that the trace plot, which is usually made up of a number of chains (in this case 3 chains), indicates whether the MCMC has reached a stationary distribution and as such converged. As seen in Figure 8.7 and Figure 8.8, the upper left panels show the trace plots, which exhibit adequate mixing of the chains (after a burn-in of 10,000). There are no systematic trends as the traces progress along the horizontal axis, indicating that the influence of the initial trace position has been overcome. Visually also, the three chains are evenly distributed about the mean parameter value (~8.6 in Figure 8.7 and ~8.1 in Figure 8.8). The even scatter in the plots and the overlaid nature of the traces, coupled with no systematic trends or influence of initial conditions, are indicative of MCMC chains that are representative of the posterior target distribution and have converged.

The next set of plots, in the top right of Figure 8.7 and Figure 8.8, are the autocorrelation plots. A unique characteristic of MCMC sampling is that, if suitably designed, a sample at a point on the chain is only dependent on the immediately previous sample, and on no other prior samples. As such, a test for the adequacy of the MCMC is to examine the autocorrelation between the samples with lag. As seen in the top right plots in Figure 8.7 and Figure 8.8, the autocorrelation plots indicate that the MCMC chains are largely uncorrelated, indicating that subsequent samples from the chain would be independent, and effective (and efficient) for use in posterior estimates. Recall that the variability and independence in values deduced from subsequent steps of the MCMC chain are indicative of the volume of parameter space explored. The more space explored, the more accurate the results, and this is related to the volume of effective samples. The results in the upper-right panels (in Figure 8.7 and Figure 8.8) show the autocorrelation decreasing with lag, suggesting that the posterior results would be accurate; this is numerically portrayed by the effective sample sizes (ESS) (Kass et al., 1998), which are relatively high (at 1147.8 and 965.4 respectively).

The lower left panel in Figure 8.7 shows the shrinkage factor99, which rapidly decreases to a value close to 1.0. A value less than 1.1 indicates that the between-chain (i.e. across chains) variance is similar to the within-chain variance, in turn indicative of convergence of the solution (Kruschke, 2014). This is the case because if the variability within a chain is similar to that across chains, then the multiple chains would be considered of similar variability, and as such stable and representative of the posterior estimates of the target distribution. The density plots of the parameter values in the lower-right panels of Figure 8.7 and Figure 8.8 show that the three chains produce overlapping densities, indicative of the representativeness of the results. Further, the limits of the 95% highest density interval (HDI) (see Chapter 7, Section 7.4.1) for the density plots in Figure 8.7 and Figure 8.8 are respectively similar, and this in turn is indicative of convergence of the solution. Further, the numerical Monte Carlo Standard Error (MCSE) estimates for the variance in the chains indicate low values, with MCSE 0.0405 and 0.0412 as shown in Figure 8.7 and Figure 8.8 respectively. The high ESS coupled with the low MCSE values indicate that the MCMC designed for estimating the aggregate volume of passengers associated with the Kirkstall Forge and Apperley Bridge stations is efficient. Note, however, that the horizontal scale of the parameter distribution has been adjusted to reflect accurate volumes. The original scale of the data values was normalised by a factor deduced from a sensitivity analysis. This enabled the chains to converge efficiently, and so the horizontal scale of the final parameter density distribution was re-adjusted to counter the original normalisation. The robustness of the results shown in Figure 8.7 and Figure 8.8 was validated by estimating passenger flows at known locations in close proximity to the new stations. The Shipley and Bingley train stations were chosen for this purpose, and served as reference points that enabled further balancing and fine-tuning of the model. Further, two additional reference railway stations (Leeds and Wakefield) were chosen to calibrate the model.

99 The shrinkage factor is also called the potential scale reduction factor, and is referred to as the Brooks-Gelman-Rubin statistic. The intuition behind the statistic is that if the chains are stationary, and by implication have settled into representative sampling, the variance within a chain would be the same as across chains. In such circumstances, the ratio of the variances would approach 1.0.
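The diagnostics discussed above can be reproduced in R with the coda package. The sketch below is purely illustrative, substituting synthetic, mildly autocorrelated chains for the actual MCMC output of the Figure 8.4 model.

    library(coda)
    # Illustrative only: three synthetic chains standing in for the MCMC
    # output (in practice taken from the BUGS/JAGS run after burn-in).
    set.seed(1)
    chains <- mcmc.list(lapply(1:3, function(i)
      mcmc(as.numeric(8.6 + 0.2 * arima.sim(list(ar = 0.3), n = 5000)))))
    traceplot(chains)       # mixing and stationarity (upper-left panels)
    autocorr.plot(chains)   # autocorrelation decaying with lag (upper-right)
    gelman.diag(chains)     # Brooks-Gelman-Rubin shrink factor (~1.0 => converged)
    effectiveSize(chains)   # effective sample size (ESS)
    summary(chains)         # posterior summaries, incl. time-series (MC) standard error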

Figure 8.7 Aggregate passenger volumes at Kirkstall Forge.

Figure 8.8 Aggregate passenger volumes at Apperley Bridge. - 262 -

8.4.2.2 Disaggregate OD flows to Kirkstall Forge

The disaggregate origin-to-destination (OD) flows associated with the new Kirkstall Forge railway station were estimated using the model in Figure 8.5. This kriging model is hierarchical in the origin and destination covariates. The disaggregate volumes of passengers are achieved by including the origin and destination stations using the nested indexing facility of the BUGS Bayesian modelling software. The details of the model in Figure 8.5 are included in Appendix E.2, alongside the R-script model wrapper. The results are shown in the 3D OD matrices in Figure 8.9 and Figure 8.10. These disaggregate results facilitate ascertaining the inflows and outflows between the new Kirkstall Forge station and all other stations in the study area. The disaggregate mobility is facilitated by including origin and destination hierarchies (the nested indexing) within the kriging model. This enables the disaggregation of conventional point phenomena into constituents associated with particular origins and destinations. These modelling methods highlight the flexibility of the Bayesian framework in capturing the complexities of mobility. A further disaggregation, to identify particular passenger types within the estimated volumes for the new rail station, can be achieved by including hierarchies of particular individual-level attributes. The plot in Figure 8.9 shows the OD flows in the West Yorkshire study area prior to the introduction of Kirkstall Forge station (and Apperley Bridge station). The plot in Figure 8.10 shows the estimated OD flows in the West Yorkshire study area following the introduction of Kirkstall Forge station (and Apperley Bridge station). Flows associated with Kirkstall Forge station are shown at the extremes of the horizontal and vertical axes. The highest flows are shown in deep red, lower flows in light yellow, and the lowest flows in green. The white boxes represent low flows which were rounded to zero (see below). It is worth noting that the OD data used to estimate the demand at the new Kirkstall Forge station were created by aggregating the count of passengers within the different variable-categories of passengers on particular OD flows. This count was further normalised100 and rounded to zero decimal places (as the counts ought to be integers for the Poisson hierarchical Bayesian model).

100 The passenger count had to be normalised by dividing by a large factor (1000) to enable the Bayesian regression model to execute.
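As a hedged illustration of the nested indexing device (and not the Figure 8.5 kriging model itself, which is given in Appendix E.2), a BUGS-style spatial interaction fragment might look as follows; all names and priors are assumptions introduced for illustration.

    # Hedged sketch: nested indexing to disaggregate flows by origin and
    # destination station (illustrative fragment, not the Appendix E.2 model).
    od_string <- "
    model {
      for (n in 1:Nflows) {
        y[n] ~ dpois(lambda[n])
        log(lambda[n]) <- mu + o[orig[n]] + d[dest[n]] - beta * dist[n]
      }
      for (s in 1:Nstations) {
        o[s] ~ dnorm(0, tau.o)   # origin-station (production) effects
        d[s] ~ dnorm(0, tau.d)   # destination-station (attraction) effects
      }
      mu ~ dnorm(0, 1.0E-4)
      beta ~ dnorm(0, 1.0E-4)
      tau.o ~ dgamma(0.5, 0.0005)
      tau.d ~ dgamma(0.5, 0.0005)
    }"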

Apart from the normalisation of the count variable, the travel distance and geography scale had to be similarly normalised, by dividing the appropriate variables by 1000. Otherwise, it was not possible to execute the Bayesian model for the combinations of priors and initial conditions explored. The need for normalisation arose from the sensitivity of the solution to the extreme counts, which caused a failure of the algorithm. It was thought that this large variation in counts might have affected the ability to specify appropriate parameters for the prior distribution. It was not possible within the timeframe of the project to fully explore this issue. A compromise was to normalise the count data, travel distance and geography scale; the Bayesian model was then executed and the results appropriately re-scaled.
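A minimal sketch of this pre- and post-scaling step is shown below; the data frame 'flows' and its columns are hypothetical stand-ins for the project data, not the thesis code.

    # Hypothetical sketch of the normalisation applied before model fitting:
    scale_factor <- 1000
    flows <- data.frame(count = c(12800, 430, 96000), dist = c(5400, 12000, 800))
    flows$count_norm <- round(flows$count / scale_factor)  # integer counts; small flows become 0
    flows$dist_norm  <- flows$dist / scale_factor
    # ... fit the Poisson hierarchical model on the *_norm variables ...
    # Posterior flow estimates are then re-scaled to the original units:
    # posterior_counts <- posterior_counts_norm * scale_factor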

The normalisation did not affect the structure of the Bayesian model; however, a division by a factor of 1000 followed by rounding to zero decimal places would have resulted in data loss. This inadvertently also reduces the accuracy and precision of the results, and it was not possible to preserve the total passenger flows. As a result, the estimated flow trends were reported as shown in the heat-map OD plots in Figure 8.9 and Figure 8.10. The validation of the results was established by comparing the flows estimated using the Bayesian kriging model with MOIRA flows estimated since the introduction of the new Kirkstall Forge station. As a result of the normalisation and rounding to zero decimals, the flows predicted for those stations with low passenger counts were zero, and are not accurately indicated on the OD heat maps (see Figure 8.9 and Figure 8.10). The MOIRA results for the stations with the largest flows are 10.7m for Leeds (LDS) and 8.6m for Huddersfield (HUD), while Kirkstall Forge (KLF) was 108k. The estimates from the Bayesian kriging model were 3.2m, 2.3m, and 94k respectively. As the MOIRA flows tended to be about 2.5 times the simulated estimates, it is expected that the 94k estimated from the Bayesian kriging would actually amount to about 250k flows if multi-modal tickets and flows beyond West Yorkshire were included. The Kirkstall Forge station opened in 2016, and the MOIRA estimate for entry and exit flows once the station actually opened increased from 95k during 2016/17 to 150k during 2017/18. The anticipation, based on the Bayesian kriging results from this research, is that the Kirkstall Forge flow volumes will stabilize at about 250k per annum in the coming years if the conditions on the rail network remain stationary.

The software code used to estimate the impact of the new station at Kirkstall Forge is described briefly in Appendix E.2, and the R-script code used to generate the results is included in the CD accompanying the thesis.

Figure 8.9 3D OD plot without Kirkstall Forge flows.

Figure 8.10 3D OD plot with Kirkstall Forge flows included.

8.5 Remarks

The results from the case study of rail-heading in West Yorkshire, and of the impact of a new station at Kirkstall Forge, illustrate the strengths of the Bayesian framework in constructing a parsimonious model that captures the salient features of mobility phenomena on the railways. In the case of rail-heading, Bayesian analysis provided the ability to explore the contribution to the mechanism of rail-heading of geographic areal attributes, and the contribution of heterogeneity in latent attributes. In the case of the new railway station at Kirkstall Forge, the ability to disaggregate mobility into hierarchies enabled the identification of OD flows between the new stations and all other stations on the network. In the conventional kriging framework, only aggregate volumes of flows would be feasible, as no way exists of disaggregating flows from the same geocode. When Bayesian analysis is applied to complex realistic data like the railways data, knowledge of the data generation process is paramount in constructing an objective mobility model that converges efficiently, and is accurate, stable and representative of the posterior distribution. Apart from this, knowledge of the data aids in the specification of initial values, and of priors on the model parameters. Sometimes, as has been the case in this research, simple frequentist models can give indications of the parameter values, and guide the choice of initial values and priors. These practical steps have aided model development, but to date no universal Bayesian model development process has guaranteed convergence within specific time constraints. The Bayesian mobility analysis presented in this chapter is not intended to be exhaustive, as the focus was on simulating and imputing a micro-level representative population. However, aspects of the modelling have been presented to illustrate some of the strengths of Bayesian models, to enable reproduction of the results, and to potentially facilitate further investigation. Apart from the strengths of the Bayesian approaches, the disadvantages are most notably the high technical skills required to specify the models, the priors, and the initial conditions, and to run the models on WinBUGS, OpenBUGS, JAGS, or Stan. These tend to require advanced software skills and statistical knowledge, and a firm intuition about the solutions tenable, with no guaranteed convergence. The high computational load, in terms of the RAM required for large data tables and arrays, and the computation time (often taking up to 10 days on a supercomputer), all add to the challenges in adopting Bayesian modelling.
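To make the workflow concrete, the sketch below shows a minimal end-to-end JAGS run from R via the rjags package, using a toy Poisson regression rather than any of the thesis models; the model string, data and parameter names are illustrative assumptions.

    library(rjags)
    # Toy Poisson model, illustrative only (not one of the thesis models):
    model_string <- "
    model {
      for (i in 1:N) { y[i] ~ dpois(exp(alpha + beta * x[i])) }
      alpha ~ dnorm(0, 1.0E-4)
      beta  ~ dnorm(0, 1.0E-4)
    }"
    set.seed(42)
    x <- runif(50)
    y <- rpois(50, exp(1 + 0.5 * x))
    m <- jags.model(textConnection(model_string),
                    data = list(y = y, x = x, N = 50), n.chains = 3)
    update(m, 10000)                                   # burn-in
    s <- coda.samples(m, c("alpha", "beta"), 50000)    # posterior draws
    summary(s)                                         # estimates and MC error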

In the case of the flows emanating and terminating within West Yorkshire investigated here, the phenomena of rail-heading observed in the vicinity of Brighouse and the Bradford stations (see Figure 8.6) are driven by passengers accessing the train service at Wakefield to minimize the number of stops and, to an extent, travel time. More detailed behavioural research is needed to further understand the idiosyncratic details of why passengers exhibit rail-heading behaviour. This research has demonstrated that the phenomenon exists, and that a structural Bayesian model can be fitted to it. The maps in Figure 8.11 and Figure 8.12 support the mechanism of rail-heading identified by the Bayesian analysis. The heat maps illustrate the zonal concentration of train stations in Figure 8.11, and the frequency of train services in Figure 8.12. The darker regions are respectively indicative of higher density and frequency of trains in the different zones. Figure 8.11 and Figure 8.12 combine to give an indication of zonal access to train services in the West Yorkshire study area. As seen in Figure 8.11, the density of rail stations is low between the vicinity of Brighouse and the Bradford stations. However, there are commensurately higher densities between the Wakefield or Brighouse stations and Leeds. This, coupled with the high volume of train services in the vicinity of these stations (Brighouse and Wakefield), might have prompted the choice of access for trips to Leeds. Further, the Brighouse station is in Rail Zone 4 (see Figure 8.1), whilst Wakefield is in Rail Zone 3, making Wakefield ultimately the cheaper choice for trips to Leeds. An investigation of the train timetables and fares (Google, 2017, National-Rail, 2018, WYCA, 2018) in the period around 4.30 pm, when the rail-heading is prevalent, confirms that the Brighouse station is served less frequently; this, coupled with the higher fares, perhaps encourages passengers to seek alternative modes to access stations that are better served at their chosen time of travel. These insights are useful in decisions about periods where it would be beneficial to adjust the train timetables (typically done during the fare rounds). The analyses discussed above are not exhaustive but preliminary and indicative. Due to the challenges in developing current state-of-the-art Bayesian models, a considerable amount of time is spent in the development process. As such, Bayesian approaches are currently best presented as a method of providing confidence in existing industry assumptions, and also as a method that challenges existing industry models.


Figure 8.11 Density of train stations in West Yorkshire.

Figure 8.12 Volumes of train service in West Yorkshire.
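Service-frequency summaries like Figure 8.12 can be derived directly from a GTFS feed. The sketch below is a hedged illustration in R, counting scheduled departures per station from the standard GTFS stop_times.txt and stops.txt files; the feed path is a hypothetical placeholder.

    # Hedged sketch: scheduled departures per station from a GTFS feed
    # (file names and columns follow the public GTFS specification;
    # the "gtfs/" path is a hypothetical placeholder).
    stop_times <- read.csv("gtfs/stop_times.txt", stringsAsFactors = FALSE)
    stops      <- read.csv("gtfs/stops.txt", stringsAsFactors = FALSE)
    freq <- aggregate(trip_id ~ stop_id, data = stop_times, FUN = length)
    names(freq)[2] <- "departures"
    merge(freq, stops[, c("stop_id", "stop_name")], by = "stop_id")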

The impact of a new station at Kirkstall Forge has been identified using a hierarchical Bayesian model developed within the geo-statistics kriging framework. The hierarchy within the model enables the disaggregation of the point mobility into constituent parts, forming the volumes of passengers interacting with other train stations within the West Yorkshire study area. Project exigencies have meant that the mobility model created included the main-effect origin and destination variables, and the distance variable, as is typical in traditional spatial interaction models. Within Bayesian models it is possible to incorporate additional variables to assess their influence, but these were not implemented in this research due to the challenges of ensuring convergence of the MCMC model sampler within the time constraints of the project. Efficient convergence of Bayesian models has been a particular handicap to an otherwise robust analysis framework. MOIRA data recently made available to the project indicate that the volumes deduced for the new Kirkstall Forge station are of the order of the aggregate magnitudes derived from MOIRA (see Section 8.4.2.2). This serves as an external validation of the hierarchical kriging model developed to investigate the impact of a new railway station at Kirkstall Forge. The difference in magnitudes is attributable to the exclusion of PTE tickets and the limitation of the flows to those that emanate and terminate in West Yorkshire.

Bayesian modelling strategies show increasing promise because of the flexible features that enable the investigation of a wide range of case studies. The uncertainty associated with the results can also be assessed from the distribution of the parameter values deduced. The Bayesian framework includes constructs that make it suited to the analysis of novel, rich, micro-level consumer datasets. It would, for instance, be straightforward to implement so-called evidence synthesis where many disparate datasets are available to construct models (Gelman et al., 2014a, Kruschke, 2014, Lunn et al., 2012). These features make Bayesian models pertinent to the analysis of consumer data.

In this chapter, a number of arguments have been put forward about why passengers partake in rail-heading. These arguments are by no means exhaustive. It has not been possible in this research to fully understand the mechanism of rail-heading. Further studies would need to be conducted to get at the underlying choice drivers of rail-heading. The types of social surveys involved are beyond what is achievable within the time constraints of this research.

Chapter 9 DISCUSSIONS AND CONCLUSIONS

For completeness, the ethical issues related to large consumer data (so-called big data) are discussed. Ethical issues are important considerations in developing a framework for the use of disparate consumer datasets. The ethical issues highlight the regulatory and legal frameworks in place to foster diligence in handling consumer datasets. Knowledge of these ethical concerns will help ensure the safety and security of datasets. This will help in the sustainable availability of data for future research. This chapter then presents the main conclusions of the research, providing the final word on the value of the analysis conducted in the research as presented in the thesis. The potential impact of the research is presented in the conclusion, along with a tie-in with the aims and objectives of the research, in such a way as to reflect the abstract of the thesis. Then the summary of the thesis is presented, highlighting the relevant findings from the research. A synopsis briefly summarises the details presented in the chapters of the thesis. The summary highlights and presents a general survey of the pertinent issues, and a brief discussion reviews some of the concepts behind the powerful methodologies developed, drawing special attention to how each of the concerted methods complements the other methods adopted.

Work anticipated to form a future research agenda is presented, highlighting areas of development that tie in with current related research activity. The potential availability of additional consumer datasets presents an opportunity to extend the investigation beyond the boundaries of the current study area. This in turn facilitates an analysis of additional boundary mobility phenomena. The stochastic MCMC-based population synthesis methods are an area of future work, to ensure their suitability for application to skewed consumer data. The Bayesian framework has potential for application to the analysis of a wide range of complex mobility phenomena. Being based on the ArcGIS platform, the GIS-GTFS network simulation developed in this research has potential for use as a tool for interventions assessment on the rail network, and as an educational tool. This chapter underscores the limitations of the work conducted during the research. The future application of the methodologies developed, and the impacts of the research on the railway industry, are discussed.

9.1 Ethical issues

Ethical issues are of paramount importance in the use of consumer big data in research on urban mobility and movement patterns. The UK's Data Protection Act 1998 established laws for public bodies to encourage openness in data access and uphold information rights in the public interest, while promoting privacy rights for individuals (ICO, 2015). The Freedom of Information law (OGL, 2015) defines the public's rights and the procedures for access to information held by public bodies. The Human Rights Act 1998, especially Articles 8 and 14, prescribes an individual's inalienable right to respect for private and family life, ensuring that private data on individuals have to be handled with discretion so as not to contravene the privacy laws or violate particular ethical discrimination provisions. The Statistics and Registration Service Act was introduced in 2007, establishing the UK Statistics Authority (UKSA), supported by the Office for National Statistics (ONS), to establish regulations guiding the sharing and confidentiality of personal information. Datasets within the UKSA and ONS form the most complete sets of data within the UK, and access to such data for this research requires adherence to strict guidelines laid down by the ONS. The UK Environmental Information Regulations 2004 were derived from European law (ICO, 2015) and cover spatial information, information developed through the EC's INSPIRE initiative (INSPIRE, 2015), environmental information including CCTV recordings and other soft copies held by public authorities including universities, as well as data from vehicle automatic number plate recognition (ANPR) devices. The Environmental Information Regulations (EIR) prescribe directives to relevant authorities for storing and providing access to such datasets, and the CDRC, which is the parent centre for the research project reported in this thesis, abides by the EIR prescription. By virtue of the above UK laws governing data (ARCHIVE, 2011), consumer data of a sensitive and confidential nature related to the urban mobility project can safely, ethically and legally be shared if the Data Protection Principles (JUSTICE, 2008) are adhered to. The principles consist of important aspects considered jointly during the processes of data procurement and use. They require that data be procured with the consent, within the expectations of use (for public, government or commercial benefit), and with the forbearance of the subject of the data. The data procurement and processing should be seen as satisfactory and fair to the subject of the data. The purpose for which the data was collected and processed should be lawful and properly specified, such that the data is adequate, relevant and not excessive. The personal data must be accurate, properly labelled, dated and kept secure. In the UK, government regulations preclude organisations from keeping personal data for longer than necessary, and require anonymising the data to protect the identity of the data subjects in accordance with individuals' rights (ARCHIVE, 2011). The data should not be transferred to a non-European Economic Area (EEA) country, and within the EEA, access to the data should be controlled and managed according to regulatory standards. An application to use the secure 2011 UK Census data is limited to September 2015 to 2016, with strict guidelines on extending this period.
Big data for urban mobility research often come from secondary individual customer information (like ticketing data), and in many instances, as discussed above, there could be ethical implications of not adhering to strict rules concerning the security and safety of the data. Big data is novel and particularly attractive, and the moral principles that govern access to such information, and the consensual use of such data, are not firmly established. This calls for a brief review of the ethical guidelines developed by governments, public institutions (including research institutions) and private institutions that intend to use novel big data (Rayport, 2011). In many instances of concern, the private authority already has access to private individual information by virtue of the services it provides to the individual, and the lure of identifying the mind-set of the customer by studying behavioural data may overwhelm these institutions into unwholesome practices. Similarly, external organisations, in a bid to secure commercial advantage, have sought to aggressively access competitors' data, without being fully aware of the legal, safety and ethical consequences. A recent incident which springs to mind concerns Cambridge Analytica (Cadwalladr and Graham-Harrison, 2018), whereby individual Facebook (Ellison et al., 2007) data was purportedly unduly acquired and analysed to develop models of voting patterns and the electoral process. This has led to calls for institutions to employ data officers (as with the CDRC), and to introduce additional consumer privacy laws, online tracking laws, hacking laws, right-to-be-forgotten laws, and the EU Article 29 guidelines for electronic surveillance for national security and inter-state exchange of data. The UK along with the EU are currently (as at the time of publication) introducing the new General Data Protection Regulation (GDPR) (Kuner, 2012, Mantelero, 2013), a framework for personal data protection laws.

The aim is to harmonise data privacy laws, and to supersede many of the outdated personal data laws across the EU. The GDPR framework came into force on May 25, 2018, and in the UK a new Data Protection Act 2018 has also come into force (UK-Legislation, 2018). Figure 9.1 is a summary of the salient aspects of the GDPR, highlighting its prominence in issues related to consumer data safety and security in the EU. Public and private institutions, like the ESRC-funded CDRC, involved in using big consumer data about people for research face significant ethical and sometimes legal ramifications concerning discrimination, sexism, racial bias and bigotry. This is especially the case in big data analytics, geographic modelling and causality simulation, where different datasets may have behavioural relationships to particular sections of society. The University of Leeds has developed academic integrity standards as a bedrock for ethical research, which extend to the use and sharing of data, and the university now has an established research ethics review process (UoL, 2015). The Consumer Data Research Centre (CDRC) subscribes to the ethical standards of the university.

Figure 9.1 Development of General Data Protection Regulation.

9.2 Research conclusions

This research set out to develop a framework for harnessing the large consumer data (so-called big data) available from the railways, for urban mobility and movement pattern analysis. The objective was to harness the consumer data to create a requisite dataset, rich enough in exogenous and endogenous attributes that it can be deployed across the range of trip-based, activity-based and multi-agent transport modelling concepts. The traditional consumer data available from the railways is the passenger ticketing data, typically at the individual level. Harnessing this dataset yields a representative, attribute-rich synthetic micro-level population of railway passengers. Studies on such micro-level data enable the identification of individual-level mobility and movement patterns on the railways, enabling better forecasts of passenger demand. Any missing passenger behavioural attributes can further be imputed by adopting robust Bayesian methods based on statistical best practice for imputing missing values. Bayesian models, apart from their use for imputation, represent a new analytical method for the spatial analysis of consumer ticketing data, revealing the mechanisms of a range of mobility phenomena on the railway network and in the wider environment. This research explores opportunities for reviewing modelling tools, using the big consumer ticketing data available within the UK railways, to develop new methods for understanding, simulating and predicting individual movement patterns on the railways, considering frameworks relating population mobility to demographic characteristics, relative location, time of day, and other likely drivers of behaviour and interaction. A set of three complementary concerted methodologies was developed for these: spatial microsimulation of skewed data; GIS-GTFS network simulation incorporating real-time transit schedules; and Bayesian imputation and modelling, incorporating special modelling constructs to better reflect the complexities of mobility. The tools developed facilitate meeting the challenges of efficiently operating the railways while ensuring economic, social and environmental sustainability. These innovative methodologies are requisite to the analysis of the multivariate inter-relationships between the facets of the network infrastructure, passengers and surrounding habitats, and to the visualisation of these railway network mobility interactions. The rich micro-level synthetic population can potentially have a significant impact on the quality of inputs to strategic, tactical and operational rail-sector planning models. The GIS-GTFS model is potentially useful to network operators for management, maintenance and interventions assessment on the railway network.

A powerful set of complementary concerted methodologies has been developed in this thesis to harness the consumer (ticketing) data available from the UK railways. The first of the methodologies is spatial microsimulation, which is used to combine LENNON ticketing, NRTS and Census interaction data, to yield a micro-population inheriting the attributes of the combined datasets. A novel spatial microsimulation methodology suitable for typically skewed consumer interaction data is developed. The second methodology created is a logistical GIS network which incorporates information on transit schedules from the publicly available General Transit Feed Specification (GTFS). The micro-population created from the first methodology is used as input to the logistical GIS-GTFS network, to reconcile the simulated population with the context of passenger mobility within the rail network. The resulting micro-population consists of railway passengers embedded in the wider population, and this forms a dataset rich in endogenous and exogenous attributes, useful for subsequent analysis. The third methodology, the Bayesian framework, is used as the imputation model to identify missing daily trip rates with season tickets, flows to group stations, and other idiosyncratic behaviour of individual passengers. Bayesian models are further developed to investigate a range of case studies of the spatial analysis of urban mobility and movement patterns in the West Yorkshire study area. The framework presented in this thesis for the use of big data in urban mobility and movement pattern analysis complements and provides an alternative to the traditional four-stage paradigm for transport modelling. The availability of novel data from consumers of digital services can in many instances supersede the need to formulate the generation of initial trips in transport models. Consumer datasets like the LENNON tickets are often large and comprehensively include all the trips, precluding the need to construct or generate a volume of trips. Further, knowing the origins and destinations of every ticket sold precludes the need for the second classic transport modelling stage of trip distribution. For the consumer ticketing data, the trip distribution is already captured during the issuing of tickets from the over 8000 ticket sales facilities online, within stations and at retail outlets. The Bayesian framework provides the opportunity to impute and analyse mobility in one step. It is advantageous to identify missing values in the same context from which the data were generated. Within the Bayesian regression, the imputed data values can be tailored to values informed by experience (in industry applications), by innovatively using simple and effective constructs like censoring, truncation and mutual exclusivity of parameter values. These enable the creation of more realistic models (Lunn et al., 2012).

9.3 Summary of thesis achievements

9.3.1 Spatial microsimulation

The detailed heterogeneity in large consumer datasets can be better harnessed by integrating them with other relevant datasets from measured stated-preference surveys, which are designed to be random samples representative of the population. In Chapters 4 and 5 of this thesis, both deterministic and stochastic spatial microsimulation strategies are discussed for combining various datasets to create a representative micro-mobility population of railway passengers embedded in the wider population within a geographic region (West Yorkshire, UK). A deterministic methodology particularly suited to adjusting skewed rail-mobility consumer big data is developed, combining information on all rail tickets sold in the UK with the 2011 Census commute-to-work data and the National Rail Travel Survey (NRTS), yielding a representative micro-level population. The simulated population created includes weights which represent the probability density for rail commuters, such that a sample of the synthetic population according to the density yields representative rail passengers. In transport modelling exercises, demand estimation generates a synthetic population and is a core component of the modelling process, critical to the accuracy of any subsequent transport simulation. Our research makes fundamental contributions to the demand generation process by creating an attribute-rich micro-level representative population. In particular, microsimulation papers in the literature (Guo and Bhat, 2007, Müller, 2011) highlight the challenges in correctly simulating a population due to the so-called 'zero-cell-value' problem and the difficulty in simultaneously controlling for more than one level of constraint (say individual and household within a zone, i.e. a 2-level constraint hierarchy). This thesis proposes and develops, for the first time, a methodology which, as well as resolving the peculiarities of combining typical consumer data with other measured surveys, resolves these demand population issues. The highlights of the spatial microsimulation methodology are as follows:

. Development of a methodology for harnessing typically skewed large consumer big data available from the railways.

. Investigates, for the first time, the behaviour of deterministic and stochastic microsimulation strategies when, as is typical in novel big data scenarios, the seed data are skewed relative to the target distribution. Our research has identified the suitability of the multi-dimensional IPF (m-IPF) strategy. We have highlighted the mechanism through which the m-IPF strategy converges to a stationary distribution of a desired target population when the seed is skewed. We point out why the stochastic MCMC-based methods converge to the distribution of the seed data instead, and suggestions are made about strategies for developing the MCMC-based stochastic methods for application to skewed seed samples.

. Validation of the deterministic and stochastic spatial microsimulation methodologies, and an explanation of the mechanisms that result in a preference for the deterministic (IPF) procedures (over stochastic ones) in applications to novel consumer data.

. Underscores that in a typical big data scenario, the weights derived from spatial microsimulation reflect how representative individuals are of an origin-destination flow, and importantly also reflect the level of skew (bias) in the big data (used as seed). In the literature on IPF strategies (Guo and Bhat, 2007, Müller and Axhausen, 2010), it has traditionally been assumed that the IPF weights are only an indication of how representative an individual is of a zone of assignment. This distinction gives a better insight into the spatial microsimulation mechanism.

. Simulation of a demand population that better reflects mobility interaction, by applying simultaneous constraints to origin and destination attributes in the IPF procedure (a so-called 3-level constraint hierarchy). Simultaneously constraining for origin and destination creates a conceptually consistent demand population by including the mobility interaction in the spatial microsimulation process. To the best of our knowledge, this is the first time such a strategy has been applied to big data in the literature.

. Ensures that the aggregate and seed data always have matching seeded variable-categories, and as such reveals any potential zero-cell value. Further, the methodology developed in this research simultaneously applies a 3-level hierarchy, enabling an exploitation of the full joint distribution of the target dataset (as opposed to just the marginals). The hierarchy is made up of an individual, as well as their origin and destination, forming the 3-level simultaneous constraint. This methodology is as such novel in extending the current 2-level hierarchies in the literature (Müller and Axhausen, 2012), and in applying this to the novel big consumer data available on the railways (a toy illustration of multi-dimensional IPF follows this list).
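The sketch below is a minimal, hedged illustration of multi-dimensional IPF with simultaneous margins, using the R package mipfp as one available implementation; the toy array dimensions and totals are invented, and this is not the thesis implementation.

    library(mipfp)  # assumption: mipfp's Ipfp() as one multi-dimensional IPF implementation
    # Toy 3-way seed: individual type x origin zone x destination zone
    seed <- array(1, dim = c(2, 3, 3))            # uniform (unskewed) starting weights
    target.list <- list(1, 2, 3)                  # constrain all three margins at once
    target.data <- list(c(60, 40),                # individual-type totals
                        c(30, 30, 40),            # origin-zone totals
                        c(20, 50, 30))            # destination-zone totals
    fit <- Ipfp(seed, target.list, target.data)   # iterate to the target margins
    round(fit$x.hat, 1)                           # fitted micro-population weights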

9.3.2 GIS-GTFS network simulation

Apart from not having a full repertoire of passenger attributes, the ticketing consumer data available from the railways do not have information about the context in which a ticket was used (such as the level of crowding in the train when the ticket was used, or the waiting times the passenger endured). This lack of context presents challenges in the planning and management of the railways, and in causal inference on the drivers of mobility choice. The second of the three concerted complementary methodologies developed in this thesis utilises the synthetic population from spatial microsimulation as input to a GIS model. In Chapter 6, the GIS network model is developed. The GIS network model is logistically constrained by the transit schedules now made publicly available as GTFS information. The new GIS-GTFS network model accentuates the application of new and improved analysis techniques to transform big consumer ticketing data into useful decision-support information for the railway sector. This is achieved by creating a dataset containing contextual information on passenger mobility. The research presented in this thesis contributes to the state of the art through the creation of a network simulation model based on consumer big data and GTFS transit schedule information. To the best of our knowledge, no application exists that has exploited railway ticket information in conjunction with GTFS information for network simulation to identify the context of passenger mobility using actual railway ticket information. The aptness of the contribution is highlighted when it is considered that even established transport simulation tools like MATSim have only recently (Poletti et al., 2017) started to incorporate GTFS information within the modelling framework, making the contribution presented in this thesis quite timely. The GIS-GTFS methodology develops a range of novel methodologies and data analysis techniques, with the results highlighted below:

. Development of a network simulation model based on consumer ticketing big data reconciled with novel transit schedules.

. Development of GIS-GTFS network analysis to create a set of rich endogenous attributes, enabling the assignment of passengers to the network traffic, and avoiding the complexities of utility optimisation for traffic assignment.

. The creation of rich demand data implies no need for the utility-based traffic assignment adopted in transport modelling, potentially altering traditional modelling genres.

. The GIS-GTFS model enables visualisation and animation of the mobility of both passengers and transit trains on the railway network.

. Representative attribute-rich simulated populations (from tickets, Census, and NRTS surveys) are best suited for subsequent use in mobility regression analysis.

. The GIS-GTFS network model is useful for management, maintenance and interventions assessment on the railways. Within the traditional GIS paradigm, such models are invaluable for use in inferring 'what if' scenarios on the network.

In addition, the GIS-GTFS network created demonstrates the mechanism of mobility evolution on the rail network. This presents, in a practice-oriented way, the accurate assignment to the railway network traffic of the rich synthetic population, which contains information on passenger arrival time at the train station, first train, intermediate stops, final destination, ticket restriction, access and egress modes and distances, as well as a plethora of additional information. This precludes the need to conduct the complicated utility-optimizing traffic assignment. Learning, feedback and adaptation are traditional transport modelling tools used for assigning passenger journeys (i.e. flows) to the traffic network. MATSim and EMME (Consultants, 1999), again referenced in this instance as state-of-the-art tools, have core expertise in traffic assignment through feedback and adaptation of the flows deduced from spatial microsimulation. In both methods the passenger effectively re-assesses their strategy for travelling between an origin and destination (OD), and in some cases changes the OD. In MATSim each passenger has a set of activity schedules assigned, and these are stochastically selected and perturbed until a predefined marginal utility objective function converges to a stationary equilibrium (Gao et al., 2010, Horni et al., 2016). The EMME assignment is conversely a deterministic procedure which iteratively arrives at an optimal solution satisfying a non-linear flow-travel time objective function (Barthélemy, 2011, Spiess and Florian, 1989). The strategy presented in this thesis introduces a different perspective to transport simulation by creating a demand dataset rich in exogenous and endogenous attributes, so as to avoid the complexities of the utility optimisation stage for traffic assignment. As such, our strategy exploits the rich attributes in the big-data-based synthetic population in deciding each passenger's assignment, better reflecting the congestion points within the network.
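As a hedged toy illustration of this schedule-based assignment idea (not the ArcGIS implementation), each simulated passenger can simply be matched to the first scheduled departure after their arrival at the station; all names and data below are invented.

    # Toy schedule-based assignment: board the first feasible departure.
    departures <- data.frame(station = "LDS",
                             dep_min = c(480, 495, 510))   # timetable, minutes past midnight
    passengers <- data.frame(id = 1:3, station = "LDS",
                             arrive_min = c(478, 496, 505))
    passengers$board_min <- sapply(passengers$arrive_min, function(t)
      min(departures$dep_min[departures$dep_min >= t]))    # first train at or after arrival
    passengers$wait_min <- passengers$board_min - passengers$arrive_min
    passengers   # boarding times and waits follow directly from the schedule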


9.3.3 Bayesian modelling

In Chapter 7, the Bayesian modelling framework is presented, and in Chapter 8 two spatial analysis case studies were investigated using the Bayesian models. The particular strength of Bayesian modelling as implemented in BUGS is its flexibility (Lunn et al., 2012), and the ability to create models that capture a range of complex phenomena. Bayesian modelling is attractive because of its flexibility in specifying constraints on values, and its hierarchical and interaction modelling capabilities. Bayesian strategies explicitly ascertain the uncertainty associated with each inference, and are easily set up for posterior predictions to answer the many what-if situations that arise in case studies. In imputing journey factors, for instance, industry experience indicates that daily journey rates on season tickets do not exceed two trips, making it objective to truncate journey factors at two, especially for lengthy journeys. Tickets with flexible periods of validity are usually used for at least the minimum validity period, making it sensible to censor the imputed validity periods accordingly. Whilst single and return tickets are restricted to use once and twice respectively, season ticket use would normally fluctuate around the period of validity, making it appropriate to model journey factors as a finite mixture. Augmented flows were created to represent flows to group stations, with the proviso that these flows represent a single flow and as such add up to one flow. The 'dsum' Bayesian construct made simple provision for representing such a feature in the model (a minimal sketch of the truncation and 'dsum' constructs is given after the list of highlights below). All these Bayesian modelling constructs enabled the creation of an objective model incorporating the typical traditional frequentist constructs like Poisson, Negative Binomial, Bernoulli and Binomial regression. The Bayesian concept of applying a large number of relevant variables in the imputation is premised on Rubin's work (Rubin, 1976), whereby the set of observed variables is sufficient to explain the missing values. To further enhance the application of this concept, the Bayesian framework enables the introduction of modelling constructs to ensure that the parameters and values deduced from the modelling exercise better reflect empirical evidence from the wealth of industry experience. This reflects a more robust modelling methodology. The highlights of the Bayesian model include:

. Facilitates the identification of additional detail about the idiosyncratic behaviour of individual passengers, including: each passenger's daily journey rate, the frequency of use of flexible-tenure tickets within the period of validity, and individual passenger flows to particular stations in scenarios where stations are grouped together.

. Bayesian imputation methods complement the logical rules-based infilling methods currently applied in the rail industry, as they exploit the variability and richness in the observed data (by utilizing a mobility model representing the data generation process) in imputing missing values.

. Bayesian regression models are flexible in specifying constraints on values, and have established hierarchical and interaction modelling capabilities.

. Bayesian strategies explicitly ascertain the uncertainty associated with each inference, and are easily set up for posterior predictions to answer the many what-if situations that arise in case studies.

. The 'dsum' construct in Bayesian models enables the augmentation of data and the subsequent filtering out of the relevant dynamics from the augmented dataset.

. The Bayesian concept enables the application of a large number of covariates in the imputation process; this is premised on Rubin's work (Rubin, 1976), whereby the set of observed variables is sufficient to explain the missing values.

• The credible intervals deduced for the journey factors using the Bayesian methodology straddle the range of aggregate proxy values of journey factors adopted in industry. However, the Bayesian methodology disaggregates the journey factors to the individual level, thereby facilitating the construction of richer pictures of passenger behaviour and the associated movement patterns on the railway network.

• The Bayesian framework enables the identification of the propensity for rail-heading within zones in the geographic study area. Excluding boundary phenomena (whereby passengers cross a travel-zone boundary to access cheaper fares), it is observed that within West Yorkshire passengers who have access to cars travel further to access a train station. The overriding motivation for rail-heading is to optimize travel cost, stops and times on the rail network. The case study identified groups of passengers who travel from Postcode Sectors nearer the Bradford stations to access the rail network further afield at the Wakefield stations en route to Leeds station; more regular high-speed train services are observed to be available from Wakefield to Leeds station.
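To make the modelling constructs described above concrete, the sketch below illustrates, in JAGS syntax called from R via the rjags package, a truncated Poisson journey factor and the 'dsum' device for flows to a pair of grouped stations. This is a minimal illustration under assumed toy data and invented variable names (jf, dist, flow_a, flow_b); it is not the thesis model itself.

    library(rjags)

    model_string <- "
    model {
      for (i in 1:N) {
        # Daily journey factors on season tickets, truncated at two
        # journeys per day to reflect the industry rule of thumb
        jf[i] ~ dpois(lambda[i]) T(, 2)
        log(lambda[i]) <- beta0 + beta1 * dist[i]
      }
      # Augmented flows to two grouped stations constrained to sum
      # to the single observed flow
      flow_total ~ dsum(flow_a, flow_b)
      flow_a ~ dpois(mu_a)
      flow_b ~ dpois(mu_b)
      beta0 ~ dnorm(0, 0.001)
      beta1 ~ dnorm(0, 0.001)
      mu_a ~ dgamma(1, 0.1)
      mu_b ~ dgamma(1, 0.1)
    }"

    jm <- jags.model(textConnection(model_string),
                     data = list(N = 10, dist = runif(10, 1, 50),
                                 jf = rep(1, 10), flow_total = 120),
                     # dsum needs initial values that satisfy the sum
                     inits = list(flow_a = 60, flow_b = 60))
    post <- coda.samples(jm, c("lambda", "flow_a", "flow_b"), n.iter = 1000)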

9.4 Future research agenda

This research was part of an industry-academia joint venture. The academic side of the activity was itself inter-collegiate and multi-disciplinary, while the industry partners also had consultancies and data providers associated with the research activity. Whilst the industry interest leaned more towards immediate, practice-oriented applications and results, the academic side had to balance these needs against producing work of a standard suitable for doctoral research. Balancing the disparate demands of the multi-faceted project interest parties meant that a single work of this size and tenure could not cover all the potential avenues of research established and revealed during the course of the work. The following sections briefly discuss those avenues of research identified to be of potential interest for future work, along with some suggestions for their implementation.

9.4.1 Wider RSP ticket coverage and PTE tickets

There are a range of areas where the research presented would benefit from further work, primarily in conducting a more comprehensive study of the mechanism of rail-heading. Current rail-industry indications are that the rail-heading¹⁰¹ phenomenon is prevalent at rail zone boundaries, as passengers travel further in order to access the rail service within the West Yorkshire rail zones and so benefit from cheaper within-zone fares. Extending the ticketing dataset to cover the West region (as opposed to the West Yorkshire county studied in the current research) would facilitate such analysis. ATOC have indicated interest in such research and are positively disposed to releasing the relevant ticketing data. This study would require commensurate NRTS data of similar geographic coverage, and the DfT have indicated willingness to support such an effort. The methodologies developed in this research are suitable and directly applicable to such analysis. The rail sector are keen to incorporate 'boundary rail-heading' models into fare pricing models, and an understanding is needed of the mobility mechanism and the categories of passengers who travel further to rail-zone boundaries in order to benefit from within-zone lower fares. This strand of research is an extension of the current rail-heading analysis, which is restricted to flows within the West Yorkshire county.

¹⁰¹ Recall that rail-heading is the name given to the phenomenon whereby passengers travel further to access the rail service, even when commensurately closer access points to the railways exist.

Another data-driven area of research concerns incorporating PTE¹⁰² tickets into the mobility analysis framework developed during the current research. PTE ticket sales account for about 25% of ticket sales in the West Yorkshire county, and represent a substantial influence on the mobility dynamics recorded in the county. ATOC are keen that the PTE tickets are incorporated by infilling the RSP ticket database with them. This would give more comprehensive estimates of passenger demand and, through implementation within the urban mobility and movement patterns framework developed in the research, more comprehensive disaggregate demand estimates would be ascertained. There is the further opportunity to assign flows associated with PTE tickets to the network using multi-agent utility-maximizing traffic assignment strategies (Axhausen and Gärling, 1992, Horni et al., 2016). These results can be reconciled with the traditional logical-rules-based assignment strategies currently used in the rail industry (SDG, 2012, SDG, 2016). Ongoing discussions with the WYCA-ITA about the potential availability of PTE ticket data have recorded progress.

9.4.2 Spatial microsimulation methods

A shortfall of deterministic spatial microsimulation strategies is that they traditionally fit seed data to the marginals of an aggregate target array. Fitting to the marginal distributions compromises the accuracy of the methodology; a preference would be to fit the seed data to the full joint distribution of the target array. In the current research, the multi-level simultaneous constraint created from the seed data inherently facilitates fitting the seed to the full joint distribution of the target. This is the case because the target was the aggregate Census interaction data, which are released as 3D (origin-destination versus a socio-demographic attribute) tables. Converting the seed into a 3-level simultaneous hierarchy in effect converts it into the same dimension as the 3D full joint distribution of the target. This was the basis for achieving accurately distributed simulated populations using the methodology developed in the research. However, a limitation of this method arises in scenarios where the target array has many dimensions: it then becomes cumbersome to similarly convert the seed into many multi-level simultaneous constraints, with an attendant effective multiplicative reduction in sample size.
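As a toy contrast between marginal and full-joint fitting, the sketch below uses the mipfp R package (cited in the reference list). The arrays are invented stand-ins for the seed and the 3D Census interaction target; in the thesis the equivalent effect is obtained by building the three-level simultaneous constraint into the seed, rather than by supplying the joint table directly as a target.

    library(mipfp)

    set.seed(1)
    # Seed: origin x destination x socio-demographic attribute
    seed <- array(runif(2 * 2 * 3), dim = c(2, 2, 3))
    # Toy stand-in for the full joint 3D target table
    target <- array(c(10, 20, 30, 40, 50, 60,
                      15, 25, 35, 45, 55, 65), dim = c(2, 2, 3))

    # Traditional IPF: constrain only the three one-way margins
    fit_marginal <- Ipfp(seed,
                         target.list = list(1, 2, 3),
                         target.data = list(apply(target, 1, sum),
                                            apply(target, 2, sum),
                                            apply(target, 3, sum)))

    # Constraining the full joint distribution in a single step
    fit_joint <- Ipfp(seed,
                      target.list = list(1:3),
                      target.data = list(target))

    # fit_marginal$x.hat matches only the margins of the target,
    # whereas fit_joint$x.hat reproduces the full joint table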

¹⁰² Recall PTE tickets are Passenger Transport Executive tickets. PTEs like the WYCA-ITA manage multi-modal (both bus and train) ticket sales in regional conurbations outside Greater London.

An alternative strategy would be to develop Bayesian methods for reconciling seed data when the target arrays are of very high dimensions. The Bayesian literature (Lunn et al., 2012) includes developments in advanced synthesis and meta-analysis strategies, which become advantageous as the dimensions of the aggregate target become high. Some preliminary efforts at developing such synthesis methods have been reported in the literature (Farooq et al., 2013). However, such efforts have not been validated for application to consumer datasets, which are known to be inherently skewed relative to the wider population. The development of population synthesis methods for application to consumer data and social media data is therefore a potential area for further research. This would resolve questions regarding whether stochastic-simulation-based population synthesis methods (typically developed in transportation studies) are comprehensively superior to deterministic spatial microsimulation methods (typically developed in population geography studies).

9.4.3 GIS-GTFS applications

Front-end applications are a logical next step to enable non-specialists to engage with the technology developed, and their development could form part of future work. The GIS-GTFS model of mobility developed as part of the research enables the animation of individual rail passengers and individual trains on the rail network. The GIS-GTFS model creates information on the full space-time volume of representative synthetic passengers on the railway network, including volumes within trains, on platforms, and within train terminals. This model was developed on the commercial ArcGIS platform, which is widely used in GIS education, and it can serve as an educational tool illustrating the extent of performance achievable from the ArcGIS software platform. In addition, the GIS-GTFS model has potential if reproduced in Java (to speed up data processing and make the output open-source). Such an open-source Java application could be served to passengers' mobile phones to visualize predicted micro-level flow volumes of passengers throughout the day, on a chosen day of the week, or when there has been a timetable change during fare rounds, so that passengers can plan journeys accordingly. This concept would be equivalent to visualising the 'weather' of passenger demand in trains, on platforms and in train terminals, and such applications are targeted at improving customer satisfaction.
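As a pointer to how such an application might begin, the sketch below reads the stop_times table of a GTFS feed in R (using the data.table package, which appears in the reference list) and counts scheduled departures per hour at one station. The archive name and stop identifier are placeholders, not values from the actual ATOC feed.

    library(data.table)

    # Extract stop_times.txt from a (hypothetical) GTFS zip archive
    stop_times <- fread(unzip("gb-rail-gtfs.zip", files = "stop_times.txt"))

    # GTFS departure_time is 'HH:MM:SS', and the hour may exceed 23
    # for after-midnight services, so parse it from the string
    stop_times[, hour := as.integer(substr(departure_time, 1, 2))]

    # Scheduled departures per hour at a placeholder stop_id
    departures <- stop_times[stop_id == "LDS"][, .N, by = hour][order(hour)]
    print(departures)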

9.4.4 Bayesian framework improvements

The Bayesian modelling framework holds promise as a tool for the analysis of complex phenomena like urban mobility. The accuracy of the methodology depends on the efficiency of the chains designed to optimize the solutions. In many instances also, the choice of priors and initial conditions required for objective solutions to be achieved is, to an extent, still an art. However, there are opportunities for broadening the application of the Bayesian methodology to fully exploit the potential of consumer data and so-called big data.

9.4.4.1 TS-CS nature of big data

The distinctive time-series cross-sectional (TS-CS) nature of the railways ticketing data was not exploited during the research; only the cross-sectional (CS) nature of the data was analysed. As such, there exists an opportunity to exploit the TS-CS nature of the ticketing data, which is a particular advantage of big data. The Bayesian modelling framework is suited to the analysis of hierarchical phenomena associated with longitudinal data, which better reveal development trends and the mechanism of mobility change on the railways. The LENNON ticketing data are captured in periods lasting four weeks each, providing a longitudinal (non-panel) dataset suitable for investigating the time trend of mobility on the UK railways. Basis functions have been employed in the literature (Broomhead and Lowe, 1988); with ticketing data, these methods can be used within a Bayesian framework to model time variations in mobility across cross-sectional datasets, reflecting seasonal changes in passenger demand due to the range of time-dependent endogenous and exogenous influences. Bayesian convolution moving-average models, as proposed in the literature (Ickstadt and Wolpert, 1997), can also be used to explore the TS-CS ticketing data. These methods have not been widely explored.
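As a simple illustration of the basis-function idea, the sketch below builds Gaussian radial basis functions over thirteen four-weekly periods and uses them as covariates in a Poisson regression of simulated ticket sales. A frequentist glm is used purely for brevity; in the proposed research these terms would sit inside the Bayesian model, and all data here are simulated.

    set.seed(42)
    periods <- 1:13                  # four-weekly LENNON periods in a year
    centres <- seq(1, 13, by = 3)    # assumed knot locations for the basis
    # Gaussian radial basis functions (bandwidth of two periods assumed)
    rbf <- sapply(centres, function(k) exp(-(periods - k)^2 / (2 * 2^2)))

    # Simulated seasonal ticket sales peaking mid-year
    sales <- rpois(13, lambda = 200 + 80 * sin(2 * pi * periods / 13))

    # The fitted basis coefficients trace out the seasonal profile
    fit <- glm(sales ~ rbf, family = poisson)
    summary(fit)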

9.4.4.2 Sensitivity of Bayesian models

In traditional Bayesian models, predictive accuracy is achieved by a combination of sensitivity analysis on the priors and the use of various information criteria to strike a balance between a parsimonious simple model and a variable-laden complete model (McElreath, 2015). In the case of missing data, there are no strategies to establish the missing-data mechanism beforehand (i.e. whether it is MAR or not) (Little, 1988), so the modelling strategy adopted is to include as many relevant variables as may potentially explain the difference between the observed and missing data (Collins et al., 2001). The many data points in a big data scenario, however, could accentuate the noise in the data, introducing higher variability in parameter estimates. There exists an opportunity to develop a robust framework for conducting sensitivity analysis of Bayesian models, as at present the Bayesian exercise is as much an art as a science. Bayesian models can also take substantial running times, and improved methodologies that aid in anticipating the convergence times of Bayesian models represent a potential area for a particular breakthrough.
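In the absence of tools for anticipating convergence, practice falls back on post-hoc diagnostics. The sketch below, assuming a deliberately trivial simulated model, shows the routine checks with the rjags and coda packages: the Gelman-Rubin statistic across parallel chains and the effective sample size.

    library(rjags)
    library(coda)

    jm <- jags.model(textConnection("
      model {
        for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
        mu ~ dnorm(0, 0.001)
        tau ~ dgamma(0.1, 0.1)
      }"),
      data = list(y = rnorm(50, 5, 2), N = 50), n.chains = 3)

    samples <- coda.samples(jm, c("mu", "tau"), n.iter = 2000)
    gelman.diag(samples)    # scale reduction factors near 1 suggest convergence
    effectiveSize(samples)  # low values flag slow mixing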

9.4.5 Additional agenda

Aggregations of the flows predicted in this research are an order of magnitude lower than those reported from MOIRA (ORR, 2014c). Future availability of disaggregated MOIRA results will enable a more robust comparison with results from the Bayesian models. An in-house rail industry project could facilitate the comparison of the disaggregate individual-level results from this research with the flows derived from MOIRA; this would provide a forum to validate MOIRA and to make suggestions for the improvement of the ticketing data captured within the ATOC RSP. A challenge with the m-IPF is the clustering of the simulated population when the seed has a low sample ratio. This presents an opportunity to integrate further reference data to improve the variability in the seed. The large consumer dataset used in this research to a large extent alleviated this problem, but in standard applications low sample ratios could severely hamper the quality of the simulation results. Data collected through GPS tracking of samples of passengers could complement the seed data and alleviate the issue of clustering in the micro-simulated populations. The GPS data would amount to a revealed-preference counterpart to the stated NRTS, and could further serve as validation data to be integrated directly into the spatial microsimulation stage of the framework developed. Because the NRTS is already a representative sample, the GPS sample would not need to be procured by installing devices on a representative population in order to be effectively harnessed. Such passenger GPS tracks were not available to the project.

The developed framework potentially serves the growing data assimilation interests for Agent Based Models (ABMs) of passenger mobility. The rich micro-level passenger attributes and activity can guide the behaviour assigned to agents, and the global attributes identified from regressing the passenger attributes and volumes would guide the identification of parameters to monitor in the evolution of phenomena in ABMs.

9.5 Limitations of the work

There have been some limitations in the results derived from the analysis of the consumer ticketing data from the railways; these are discussed in more detail in the following sections. First, the full breadth of mobility phenomena in West Yorkshire could not be explored, since the flows were limited to those originating and terminating in West Yorkshire; as such, a range of interesting mobility phenomena that occur across the boundary could not be explored. Another limitation is that the hierarchical spatial microsimulation introduced in this thesis to incorporate interaction in the mobility simulation process has some drawbacks: an increase in hierarchy results in an increase in the dimensionality of the mobility, which in turn effectively reduces the sample ratio (the relative number of sample points), with attendant consequences for the accuracy of results. Additional limitations in the results arise from the low sample ratios of the NRTS, albeit the most comprehensive survey in its class. The low sample ratios result in the clustering of the simulated population for many zones. These shortfalls impose limitations on the inferences from the research, as discussed below.

9.5.1 Flow limited to within West Yorkshire

The case study presented in this research is based on flows emanating from and ending in West Yorkshire. As such, interregional flows are excluded, implying that about 60% of actual flows have been analysed. Apart from not yielding a comprehensive picture of mobility in the county, this may obscure some interesting boundary phenomena like rail-heading, whereby passengers travel further afield to access the rail service in order to restrict travel to one zone and benefit from cheaper within-zone fares. The rail-heading model developed in this research as such did not account for boundary rail-heading. A more comprehensive study would include both within-region and cross-region flows, to fully exploit the increased accuracies achievable from the representative micro-level synthetic passenger information; such a wider study would achieve a more objective identification of passenger volumes and mobility phenomena. In addition, tickets sold by regional Passenger Transport Executives (PTEs) have not been included, and these amount to ~25% of all flows (SDG, 2014). The exclusion of PTE ticket sales therefore means that the results do not reflect a full picture of urban mobility on the rail network within the West Yorkshire study area. This limitation needs to be borne in mind in the interpretation of the results deduced from the analysis presented in this thesis.

9.5.2 Clumpy simulated populations

The disaggregation of mobility interaction (into origin-destination pairs) effectively increases the dimensionality of the sample seed. This requires a commensurate increase in the volume of the sample seed to ensure populated contingency matrices, in turn ensuring the integrity of any simulated population. The analysis presented in this research is a two-stage spatial microsimulation in which the NRTS data are used as the seed in one of the stages. The relatively small sample ratio of the NRTS results in clusters of individuals with exactly the same attributes within the simulated population: the so-called clumpy data. Although clumpy data are a widespread phenomenon in spatial microsimulation, there are no established indices for measuring their extent. The LENNON ticketing data made available to the project were, for confidentiality, anonymized by aggregating ticket sales into monthly periods; this aggregation further increases the extent of clumpy data. The NRTS used for simulating a representative population is to date the most comprehensive reference survey of railway passengers in the UK. Nonetheless, with the growing number and variety of railway passengers, the NRTS has been reported (ORR, 2014a) to suffer shortfalls due to falling sample ratios. These limitations ought to be borne in mind in order to manage expectations of the predictive accuracy of the simulated population.
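Although no established index exists, simple candidates can be computed directly from a simulated population, as in the sketch below; the data frame and its attributes are invented for illustration.

    # Two naive clumpiness indicators: the proportion of unique
    # attribute profiles, and the largest replication count
    clumpiness <- function(pop) {
      counts <- table(do.call(paste, pop))  # replication of each profile
      list(prop_unique = length(counts) / nrow(pop),
           max_replication = max(counts))
    }

    # Toy simulated population: a low prop_unique and a high
    # max_replication both indicate heavy clumping
    pop <- data.frame(age  = sample(c("16-24", "25-44", "45+"), 1000, TRUE),
                      mode = sample(c("rail", "car", "bus"), 1000, TRUE))
    clumpiness(pop)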

9.5.3 Modifiable accuracy in different zones

The NRTS sample of about 70,000 for West Yorkshire represents a sample ratio of about 3.5% of the approximately 2.2 million population. The difference in zonal sizes implies an associated MAUP-related variation in sample ratio across the county. The small sample ratios in some zones imply that there are regions of high replication of the NRTS seed. In an area like Huddersfield (population ~136,100), the simulated (m-IPF) population of ~102,600, about 75% of the population, reflects that the sample seed lacks about 25% of the variability in the Census; in effect, 25% of the unique types of people in the Census for Huddersfield were not captured in the NRTS survey. In the area of Leeds (population ~900,800), the simulated population was ~900,400, capturing about 99% of the full variability in the Leeds population. Despite the NRTS capturing almost the full Census variability for the area of Leeds, the simulated population results showed that, of the 900,400 simulated Leeds population, there are only 8,570 (i.e. ~1%) unique individuals, with replication of individuals ranging from 1 to 2,856. These issues and limitations have to be borne in mind when using simulated populations from the different geographic zones.

9.5.4 Practical limitations of methodologies

The method of spatial microsimulation introduced in this research extends the conventional IPF optimisation routines established in geography and transport. This is achieved by increasing the cross-tabulation of the seed by attaching the origin and destination to each variable. By so doing, the 3D Census target information is reconciled with the cross-tabulated seed: the full joint distribution of the target is fitted to the seed, instead of just the marginals as in traditional IPF. This modification ensures that the simulated population created not only has an appropriately low TAE but, more importantly, has the population distribution of the target population. It also facilitates population synthesis when the seed is skewed relative to the target. However, a limitation of this method arises in scenarios where the target array has many dimensions: it then becomes cumbersome to similarly convert the seed into many simultaneous multi-level constraints, and the attendant effective multiplicative reduction in sample size increases the clumpiness of the simulated population.

As discussed earlier, Bayesian strategies enable the construction of parsimonious models of complex phenomena. The promise of Bayesian methods hinges on an ability to specify sampling chains that efficiently optimize a solution. In practical situations, however, priors and their parameters, as well as initial conditions, have to be appropriately specified. For complex problems the analyst does not have an educated intuition about the solution, making it a challenge to specify priors and initial conditions. These stages of the modelling process can take considerable effort to fulfil, and therein lies the bottleneck in developing Bayesian model solutions. The idiosyncrasies in potential strategies for overcoming these bottlenecks make Bayesian modelling in its current form as much an art as a science. However, the potential for modelling complex phenomena while exploiting the veracity of novel consumer data, achievable within the Bayesian framework, makes the challenge worthwhile. A further challenge is that Bayesian models can take substantial run times, with attendant non-guaranteed convergence; a plethora of issues can cause this, ranging from the specification of the models and the MCMC chains to the priors and initial conditions. These are potential areas for particular breakthroughs, by improving these stages of the methodology to create tools that enable the analyst to anticipate the convergence characteristics and run times of Bayesian models.


9.6 Future application of developed tools

This research simulated a representative and attribute-rich micro-mobility population of railway passengers embedded in the wider population. The simulated population can serve as input to multi-agent transport modelling tools, and as input to strategic, tactical and operational rail industry planning models. The methodology developed is applicable to a wide variety of typical consumer data, which inherently form a skewed subset of the wider population. In the research, the datasets combined were the 2011 Census interaction data (ONS, 2013), the National Rail Travel Survey (NRTS) (DfT, 2013a) and 'big data' consisting of every railway ticket sold in the West Yorkshire study area. However, the methodology developed in this research is scalable and generalizable to other rail transport networks within the UK, or in other countries where large revealed consumer datasets and similar background stated-choice surveys are available. The GIS-GTFS model is potentially useful to network operators for management, maintenance and intervention assessment on the UK railways. The representative micro-level population can aid in creating segmentations of railway customers satisfied by particular service implementations; such segmentation can aid in answering pertinent policy questions related to how concessions might affect different socio-demographic groups within populations. The GIS-GTFS network model developed can be used as a tool for visualising or animating the mobility of simulated passengers and transit trains on the railway network, and such an application can be served to potential railway customers to facilitate journey planning. The Bayesian modelling framework provides a tool for the imputation of passenger behavioural attributes, and is applicable in establishing comprehensive volumes of passenger demand and the drivers of mobility phenomena on the railways.

List of References

ACUNA, E. & RODRIGUEZ, C. 2004. The treatment of missing values and its effect on classifier accuracy. Classification, Clustering, and Data Mining Applications. Springer.
ADAMOWICZ, W., LOUVIERE, J. & WILLIAMS, M. 1994. Combining Revealed and Stated Preference Methods for Valuing Environmental Amenities. Journal of Environmental Economics and Management, 26, 271-292.
ADOLPH, S., COCKBURN, A. & BRAMBLE, P. 2002. Patterns for effective use cases, Addison-Wesley Longman Publishing Co., Inc.
AKAIKE, H. 1987. Factor analysis and AIC. Psychometrika, 52, 317-332.
ALBERT, J. 2009. Bayesian computation with R, Springer Science & Business Media.
ALLISON, P. D. 2001. Missing data, Sage Publications.
ALLISON, P. D. Handling missing data by maximum likelihood. SAS Global Forum, 2012. Statistical Horizons, Haverford, PA.
AMOS, P., BULLOCK, D. & SONDHI, J. 2010. High-speed rail: The fast track to economic development. The World Bank, 1-28.
ANDERSSON, J. 2017. Using Geographically Weighted Regression (GWR) to explore spatial variations in the relationship between public transport accessibility and car use: a case study in Lund and Malmö, Sweden. Student thesis series INES.
ANDERSTIG, C. & MATTSSON, L.-G. 1998. Modelling land-use and transport interaction: policy analyses using the IMREL model. Network Infrastructure and the Urban Environment. Springer.
ANGRIST, J. D., IMBENS, G. W. & RUBIN, D. B. 1996. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444-455.
ANTHONY, R. N. 1965. Planning and Control Systems: A Framework for Analysis, Division of Research, Graduate School of Business Administration, Harvard University.
ANTRIM, A. & BARBEAU, S. J. 2013. The many uses of GTFS data – opening the door to transit and multimodal applications. Location-Aware Information Systems Laboratory at the University of South Florida, 4.
ARCHIVE. 2011. Managing and Sharing Data - UK Data Archive [Online]. Available: www.data-archive.ac.uk/media/2894/managingsharing.pdf [Accessed 9th March 2015].
ARON, A. & ARON, E. N. 1994. Statistics for psychology, Prentice-Hall, Inc.
ASMUSSEN, S. & GLYNN, P. W. 2007. Stochastic simulation: algorithms and analysis, Springer Science & Business Media.
ASSAD, A. A. 1980. Models for rail transportation. Transportation Research Part A: General, 14, 205-220.
ATOC. 2003a. New Settlement System Goes Live Across The UK Rail Network [Online]. http://www.atoc.org/: ATOC. [Accessed December 28 2015].

ATOC 2003b. New Settlement System Goes Live Across The UK Rail Network. 28 August 2003 ed.: Association of Train Operating Companies.
ATOC, D., RSP 2017. GB Rail GTFS feed.
AXHAUSEN, K. W. & GÄRLING, T. 1992. Activity-based approaches to travel analysis: conceptual frameworks, models, and research problems. Transport Reviews, 12, 323-341.
AZUR, M. J., STUART, E. A., FRANGAKIS, C. & LEAF, P. J. 2011. Multiple imputation by chained equations: what is it and how does it work? International Journal of Methods in Psychiatric Research, 20, 40-49.
BACHARACH, M. 1970. Biproportional matrices and input-output change, CUP Archive.
BAJRAM SPASESKI, N. & LER, D. 2013. Application of Discrete-Time Markov Models.
BALLAS, D., CLARKE, G., DORLING, D., EYRE, H., THOMAS, B. & ROSSITER, D. 2005. SimBritain: a spatial microsimulation approach to population dynamics. Population, Space and Place, 11, 13-34.
BALLAS, D. & CLARKE, G. P. 2001. Modelling the local impacts of national social policies: a spatial microsimulation approach. Environment and Planning C: Government and Policy, 19, 587-606.
BALLAS, D., KINGSTON, R., STILLWELL, J. & JIN, J. 2007. Building a spatial microsimulation-based planning support system for local policy making. Environment and Planning A, 39, 2482-2499.
BARTHELEMY, J., SUESSE, T., NAMAZI-RAD, M. & BARTHELEMY, M. J. 2016. Package 'mipfp'.
BARTHELEMY, J. & SUESSE, T. F. 2016. Package mipfp: Multidimensional Iterative Proportional Fitting and Alternative Models. R package version 3.1.
BARTHÉLEMY, M. 2011. Spatial networks. Physics Reports, 499, 1-101.
BARTLETT, J. W., CARPENTER, J. R., TILLING, K. & VANSTEELANDT, S. 2014. Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics, 15, 719-730.
BATISTA, G. E. & MONARD, M. C. 2002. A Study of K-Nearest Neighbour as an Imputation Method. HIS, 87, 48.
BATTY, M., AXHAUSEN, K. W., GIANNOTTI, F., POZDNOUKHOV, A., BAZZANI, A., WACHOWICZ, M., OUZOUNIS, G. & PORTUGALI, Y. 2012. Smart cities of the future. The European Physical Journal Special Topics, 214, 481-518.
BATTY, M. & MACKIE, S. 1972. The calibration of gravity, entropy, and related models of spatial interaction. Environment and Planning A, 4, 205-233.
BAYARRI, M. J. & BERGER, J. O. 2004. The interplay of Bayesian and frequentist analysis. Statistical Science, 58-80.
BAYES, T., PRICE, R. & CANTON, J. 1763. An essay towards solving a problem in the doctrine of chances.
BECK, N. 2008. Time-series-cross-section methods. Oxford Handbook of Political Methodology, 475-93.
BECK, N. & KATZ, J. N. 1995. What to do (and not to do) with time-series cross-section data. American Political Science Review, 89, 634-647.
BEN-AKIVA, M., RAMMING, M. S. & BEKHOR, S. 2004. Route choice models. Human Behaviour and Traffic Networks, 23-46.

BEN-AKIVA, M. E. & LERMAN, S. R. 1985a. Discrete choice analysis: theory and application to travel demand, MIT Press.
BEN-AKIVA, M. E. & LERMAN, S. R. 1985b. Discrete choice analysis: theory and application to travel demand, Cambridge, Mass; London, MIT Press.
BERTSEKAS, D. P. 2014. Constrained optimization and Lagrange multiplier methods, Academic Press.
BERTSIMAS, D. 1988. Probabilistic combinatorial optimization problems. Massachusetts Institute of Technology.
BESAG, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 192-236.
BESAG, J., YORK, J. & MOLLIÉ, A. 1991. Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1-20.
BEŠINOVIĆ, N., QUAGLIETTA, E. & GOVERDE, R. M. 2013. A simulation-based optimization approach for the calibration of dynamic train speed profiles. Journal of Rail Transport Planning & Management, 3, 126-136.
BEST, N. G., ICKSTADT, K. & WOLPERT, R. L. 2000. Spatial Poisson regression for health and exposure data measured at disparate resolutions. Journal of the American Statistical Association, 95, 1076-1088.
BHAT, C. R. & KOPPELMAN, F. S. 1999. Activity-based modeling of travel demand. Handbook of Transportation Science. Springer.
BIFULCO, G. N., CARTENÌ, A. & PAPOLA, A. 2010. An activity-based approach for complex travel behaviour modelling. European Transport Research Review, 2, 209-221.
BIRKIN, M. & CLARKE, G. 1995. Using microsimulation methods to synthesize census data. Census Users' Handbook, 363-387.
BIRKIN, M., CLARKE, G. & CLARKE, M. 2010. Refining and Operationalizing Entropy-Maximizing Models for Business Applications. Geographical Analysis, 42, 422-445.
BIRKIN, M. & CLARKE, M. 1988. SYNTHESIS—a synthetic spatial information system for urban and regional analysis: methods and examples. Environment and Planning A, 20, 1645-1671.
BIRKIN, M. & CLARKE, M. 1989. The generation of individual and household incomes at the small area level using synthesis. Regional Studies, 23, 535-548.
BIRKIN, M. & CLARKE, M. 2011. Spatial microsimulation models: A review and a glimpse into the future. Population Dynamics and Projection Methods. Springer.
BIRKIN, M., TURNER, A. & WU, B. A synthetic demographic model of the UK population: Methods, progress and problems. Regional Science Association International British and Irish Section, 36th Annual Conference, 2006.
BISHOP, Y. M., FIENBERG, S. E. & HOLLAND, P. W. 2007. Discrete multivariate analysis: theory and practice, Springer Science & Business Media.

BIVAND, R. S., PEBESMA, E. J., GÓMEZ-RUBIO, V. & PEBESMA, E. J. 2008. Applied spatial data analysis with R, Springer.
BOLLEN, J., MAO, H. & ZENG, X. 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2, 1-8.
BOWMAN, J. L. & BEN-AKIVA, M. E. 2001. Activity-based disaggregate travel demand model system with activity schedules. Transportation Research Part A: Policy and Practice, 35, 1-28.
BOX, G. E. & TIAO, G. C. 2011. Bayesian inference in statistical analysis, John Wiley & Sons.
BREIMAN, L., FRIEDMAN, J., STONE, C. J. & OLSHEN, R. A. 1984. Classification and regression trees, CRC Press.
BRODERSEN, K. H., GALLUSSER, F., KOEHLER, J., REMY, N. & SCOTT, S. L. 2015. Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9, 247-274.
BROOKS, S., GELMAN, A., JONES, G. & MENG, X.-L. 2011. Handbook of Markov Chain Monte Carlo, CRC Press.
BROOMHEAD, D. S. & LOWE, D. 1988. Radial basis functions, multi-variable functional interpolation and adaptive networks. Royal Signals and Radar Establishment Malvern (United Kingdom).
BROWNSTONE, D. 2001. Discrete choice modeling for transportation. University of California Transportation Center.
BRUNSDON, C., FOTHERINGHAM, A. S. & CHARLTON, M. E. 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geographical Analysis, 28, 281-298.
BUSCHER, V. D., L.; WEBB, M.; AOUN, C. 2014. Urban Mobility in the Smart City Age. In: EBI, K. (ed.) Smart Cities Cornerstone Series. http://smartcitiescouncil.com/resources/urban-mobility-smart-city-age: Arup, The Climate Group, Schneider Electric.
BUTLER, D. 2013. When Google got flu wrong. Nature, 494, 155.
BUUREN, S. & GROOTHUIS-OUDSHOORN, K. 2011. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45.
BYRNE, T. 2000. Local government in Britain: everyone's guide to how it all works, Penguin.
CADWALLADR, C. & GRAHAM-HARRISON, E. 2018. Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. The Guardian, 17.
CARDNO, A. J. & MULGAN, N. J. 2003. Travel route planner system and method. Google Patents.
CARLIN, B. P., GELFAND, A. E. & SMITH, A. F. 1992. Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 389-405.
CHAPIN, F. S. 1974. Human activity patterns in the city: things people do in time and in space, New York; London, Wiley.
CHIB, S. 1995. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313-1321.
CHIB, S. & GREENBERG, E. 1995. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
CLARKE, F. H. 1990. Optimization and nonsmooth analysis, SIAM.

CLAUSET, A. 2011. A brief primer on probability distributions. Santa Fe Institute. http://tuvalu.santafe.edu/~aaronc/courses/7000/csci7000-001_2011_L0.pdf.
CLAYTON, D. & KALDOR, J. 1987. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics, 671-681.
CLEVELAND, W. S. & DEVLIN, S. J. 1988. Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596-610.
CODD, E. F. 1970. A relational model of data for large shared data banks. Communications of the ACM, 13, 377-387.
COLLINS, L. M., SCHAFER, J. L. & KAM, C.-M. 2001. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330.
CONGDON, P. 2014. Applied Bayesian modelling, John Wiley & Sons.
CONSULTANTS, I. 1999. Emme/2 User's Manual: Release 9.2. Montréal, Canada.
COX, D. R. & SNELL, E. J. 1968. A general definition of residuals. Journal of the Royal Statistical Society. Series B (Methodological), 248-275.
CRESSIE, N. 1990. The origins of kriging. Mathematical Geology, 22, 239-252.
DALGAARD, P. 2008. Rates and Poisson regression. Introductory Statistics with R. Springer.
DANGERMOND, J. 1992. What is a Geographic Information System (GIS)? Geographic Information Systems (GIS) and Mapping—Practices and Standards. ASTM International.
DAVIS, T. E. & PRINCIPE, J. C. 1993. A Markov chain framework for the simple genetic algorithm. Evolutionary Computation, 1, 269-288.
DE DIOS ORTUZAR, J. & WILLUMSEN, L. G. 1994. Modelling transport, Wiley New Jersey.
DE FINETTI, B. 1972. Probability, induction and statistics: the art of guessing, London, J. Wiley.
DE JONG, G., DALY, A., PIETERS, M., MILLER, S., PLASMEIJER, R. & HOFMAN, F. 2007. Uncertainty in traffic forecasts: literature review and new results for The Netherlands. Transportation, 34, 375-395.
DE JONG, G., GUNN, H. & WALKER, W. 2004. National and international freight transport models: an overview and ideas for future development. Transport Reviews, 24, 103-124.
DEMING, W. E. & STEPHAN, F. F. 1940. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11, 427-444.
DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 1-38.
DEMPSTER, M. A. 2012. Deterministic and Stochastic Scheduling: Proceedings of the NATO Advanced Study and Research Institute on Theoretical Approaches to Scheduling Problems held in Durham, England, July 6-17, 1981, Springer Science & Business Media.
DENNETT, A. 2012. Estimating flows between geographical locations: 'get me started in' spatial interaction modelling. UCL Working Paper Series, 181, 1-24.

DFT 2013a. National Rail Travel Survey – Overview report. In: TRANSPORT, D. F. (ed.) Rail statistics. www.gov.uk: Department for Transport.
DFT. 2013b. National Travel Survey: England 2013 [Online]. Department for Transport. Available: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/342160/nts2013-01.pdf [Accessed 12 December 2017].
DIGGLE, P. J., TAWN, J. & MOYEED, R. 1998. Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47, 299-350.
DIMITRIOU, L., TSEKERIS, T. & STATHOPOULOS, A. 2006. Genetic-algorithm-based micro-simulation approach for estimating turning proportions at signalized intersections. IFAC Proceedings Volumes, 39, 159-164.
DOCHERTY, M. & SMITH, R. 1999. The case for structuring the discussion of scientific papers: Much the same as that for structuring abstracts. BMJ: British Medical Journal, 318, 1224.
DOMINICI, F., MCDERMOTT, A., ZEGER, S. L. & SAMET, J. M. 2002. On the use of generalized additive models in time-series studies of air pollution and health. American Journal of Epidemiology, 156, 193-203.
DONDERS, A. R. T., VAN DER HEIJDEN, G. J., STIJNEN, T. & MOONS, K. G. 2006. Review: a gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59, 1087-1091.
DONG, X., BEN-AKIVA, M. E., BOWMAN, J. L. & WALKER, J. L. 2006. Moving from trip-based to activity-based measures of accessibility. Transportation Research Part A: Policy and Practice, 40, 163-180.
DONG, Y. & PENG, C.-Y. J. 2013. Principled missing data methods for researchers. SpringerPlus, 2, 1-17.
DOWLE, M., SRINIVASAN, A., GORECKI, J., SHORT, T., LIANOGLOU, S., ANTONYAN, E. & DOWLE, M. M. 2017. Package 'data.table'. R package version, 1.
DUAN, Y., BREHM, W., STROBL, H., TITTLBACH, S., HUANG, Z. & SI, G. 2013. Steps to and correlates of health-enhancing physical activity in adulthood: an intercultural study between German and Chinese individuals. Journal of Exercise Science & Fitness, 11, 63-77.
DUGUAY, G., JUNG, W. & MCFADDEN, D. 1976. SYNSAM: A methodology for synthesizing household transportation survey data, Urban Travel Demand Forecasting Project, Institute of Transportation Studies.
DUNCAN, O. D. 1961. A socioeconomic index for all occupations. Class: Critical Concepts, 1, 388-426.
DURRANT, G. B. 2005. Imputation methods for handling item-nonresponse in the social sciences: a methodological review. National Center for Research Methods Working Paper, 2.
EDINBURGHDATASHARE 2016. Research Data Service, University of Edinburgh. Edinburgh Data Share.
EDWARDS, K. L., CLARKE, G. P., THOMAS, J. & FORMAN, D. 2011. Internal and external validation of spatial microsimulation models: small area estimates of adult obesity. Applied Spatial Analysis and Policy, 4, 281-300.
EFRON, B. 1994. Missing data, imputation, and the bootstrap. Journal of the American Statistical Association, 89, 463-475.

EFRON, B. & TIBSHIRANI, R. 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 54-75.
EGENHOFER, M. & KUHN, W. 1999. Interacting with GIS. Geographical Information Systems: Principles, Techniques, Applications, and Management, 1, 401-412.
ELLISON, N. B., STEINFIELD, C. & LAMPE, C. 2007. The benefits of Facebook "friends:" Social capital and college students' use of online social network sites. Journal of Computer-Mediated Communication, 12, 1143-1168.
ENDERS, C. K. & BANDALOS, D. L. 2001. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430-457.
EPSTEIN, J. M. & AXTELL, R. 1996. Growing artificial societies: social science from the bottom up, Brookings Institution Press.
ESCOBAR, M. D. & WEST, M. 1995. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90, 577-588.
ESRI 2016. ArcGIS, 10.3. Redlands, California: ESRI.
ETTEMA, D. & TIMMERMANS, H. 1997. Activity-based approaches to travel analysis: [selected papers presented at the workshop, May 1995].
EVERETT III, H. 1963. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11, 399-417.
EVERITT, B. 2012. Introduction to optimization methods and their application in statistics, Springer Science & Business Media.
EWING, G. 1974. Gravity and linear regression models of spatial interaction: a cautionary note. Economic Geography, 50, 83-88.
FAROOQ, B. & BIERLAIRE, M. Simulation-based population synthesis using Gibbs sampling. Conference on Synthetic Population, 2017.
FAROOQ, B., BIERLAIRE, M., HURTUBIA, R. & FLÖTTERÖD, G. 2013. Simulation based population synthesis. Transportation Research Part B: Methodological, 58, 243-263.
FERMAT, P. D. 1891. Oeuvres de Fermat, ed. Charles Henry and Paul Tannery, 5, 1891-1922.
FIENBERG, S. E. 1970. An iterative procedure for estimation in contingency tables. The Annals of Mathematical Statistics, 907-917.
FISHER, R. 1922. On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc. Lond. A, 222, 309-368.
FLETCHER, R. 2013. Practical methods of optimization, John Wiley & Sons.
FOTHERINGHAM, A. S. 1983. A new set of spatial-interaction models: the theory of competing destinations. Environment & Planning A.
FOTHERINGHAM, A. S., BRUNSDON, C. & CHARLTON, M. 2003. Geographically weighted regression: the analysis of spatially varying relationships, John Wiley & Sons.
FOTHERINGHAM, A. S., CHARLTON, M. E. & BRUNSDON, C. 1998. Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis. Environment and Planning A, 30, 1905-1927.

FOWKES, A., NASH, C. & WHITEING, A. 1985. Understanding trends in inter-city rail traffic in Great Britain. Transportation Planning and Technology, 10, 65-80.
FRATAR, T. J. 1954. Vehicular trip distribution by successive approximations. Traffic Quarterly, 8.
FROME, E. L. 1983. The analysis of rates using Poisson regression models. Biometrics, 665-674.
FUCHS, V. R. 2004. Reflections on the socio-economic correlates of health. Journal of Health Economics, 23, 653-661.
FURNESS, K. 1965. Time function iteration. Traffic Engineering and Control, 7, 458-460.
GALILI, T. 2015. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics, 31, 3718-3720.
GAO, W., BALMER, M. & MILLER, E. 2010. Comparison of MATSim and EMME/2 on greater Toronto and Hamilton area network, Canada. Transportation Research Record: Journal of the Transportation Research Board, 118-128.
GARCIA, R. I., IBRAHIM, J. G. & ZHU, H. 2010. Variable selection for regression models with missing data. Statistica Sinica, 20, 149.
GARDNER, W., MULVEY, E. P. & SHAW, E. C. 1995. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin, 118, 392.
GAY, P. 1966. Age of enlightenment.
GELFAND, A. E., HILLS, S. E., RACINE-POON, A. & SMITH, A. F. 1990. Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-985.
GELMAN, A. 2007. Statistical modeling, causal inference and social science. URL http://www.stat.columbia.edu/gelman/blog.
GELMAN, A., CARLIN, J. B., STERN, H. S., DUNSON, D. B., VEHTARI, A. & RUBIN, D. B. 2014a. Bayesian data analysis, CRC Press, Boca Raton, FL.
GELMAN, A., CARLIN, J. B., STERN, H. S. & RUBIN, D. B. 2014b. Bayesian data analysis, Chapman & Hall/CRC, Boca Raton, FL, USA.
GELMAN, A. & HILL, J. 2006. Data analysis using regression and multilevel/hierarchical models, Cambridge University Press.
GEMAN, S. & GEMAN, D. 1987. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. Readings in Computer Vision. Elsevier.
GEORGE, G., HAAS, M. R. & PENTLAND, A. 2014. Big data and management. Academy of Management Journal, 57, 321-326.
GEYER, C. J. 1992. Practical Markov chain Monte Carlo. Statistical Science, 473-483.
GIANNOTTI, F., NANNI, M., PEDRESCHI, D., PINELLI, F., RENSO, C., RINZIVILLO, S. & TRASARTI, R. 2011. Unveiling the complexity of human mobility by querying and mining massive trajectory data. The VLDB Journal—The International Journal on Very Large Data Bases, 20, 695-719.
GILKS, W. R., RICHARDSON, S. & SPIEGELHALTER, D. 1995. Markov chain Monte Carlo in practice, CRC Press.

GLEAVE, S. D. 2011. The value of station investment: research on regenerative impacts. London: Network Rail.
GONZALEZ, M. C., HIDALGO, C. A. & BARABASI, A.-L. 2008. Understanding individual human mobility patterns. Nature, 453, 779-782.
GOODCHILD, M. F. 1992. Geographical information science. International Journal of Geographical Information Systems, 6, 31-45.
GOOGLE. 2016. Google Transit API [Online]. https://developers.google.com: Google, Creative Commons Attribution 3.0 License. Available: https://developers.google.com/transit/gtfs/reference/ [Accessed July 26 2017].
GOOGLE. 2017. Google Maps Distance Matrix API | Google Developers [Online]. Available: https://developers.google.com/maps/documentation/distance-matrix/ [Accessed 30 December 2017].
GOUDIE, R. J. & MUKHERJEE, S. 2016. A Gibbs sampler for learning DAGs. The Journal of Machine Learning Research, 17, 1032-1070.
GOWER, J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics, 857-871.
GRACE-MARTIN, K. 2018. The Analysis Factor [Online]. The Analysis Factor. Available: https://www.theanalysisfactor.com/seven-ways-to-make-up-data-common-methods-to-imputing-missing-data/ [Accessed May 10, 2018].
GRAHAM, J. W. 2003. Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80-100.
GRAHAM, J. W. 2009. Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576.
GRAHAM, J. W., HOFER, S. M. & MACKINNON, D. P. 1996. Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197-218.
GRAPPERON, A. 2016. Behavioral Approach to Estimation of Smart Card Holders Socio-Demographic Characteristics in a Public Transportation System. École Polytechnique de Montréal.
GRAPPERON, A., FAROOQ, B. & TREPANIER, M. 2016. Information fusion of smart card data with travel survey.
GREENE, W. H. 1994. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models.
GRIMMER, J. 2015. We are all social scientists now: how big data, machine learning, and causal inference work together. PS: Political Science & Politics, 48, 80-83.
GSCHLÖßL, S. & CZADO, C. 2008. Modelling count data with overdispersion and spatial effects. Statistical Papers, 49, 531-552.
GUO, J. & BHAT, C. 2007. Population synthesis for microsimulating travel behavior. Transportation Research Record: Journal of the Transportation Research Board, 92-101.
HADAYEGHI, A., SHALABY, A. S. & PERSAUD, B. N. 2010. Development of planning level transportation safety tools using Geographically Weighted Poisson Regression. Accident Analysis & Prevention, 42, 676-688.

HÄGERSTRAND, T. 1970. What about people in Regional Science? Papers of the Regional Science Association, 24, 6-21.
HALD, A. 2008. A history of parametric statistical inference from Bernoulli to Fisher, 1713-1935, Springer Science & Business Media.
HALL, M. & WILLUMSEN, L. 1980. SATURN - a simulation-assignment model for the evaluation of traffic management schemes. Traffic Engineering & Control, 21.
HARLAND, K., HEPPENSTALL, A., SMITH, D. & BIRKIN, M. 2012. Creating realistic synthetic populations at varying spatial scales: a comparative critique of population synthesis techniques. Journal of Artificial Societies and Social Simulation, 15, 1-15.
HARRELL, F. E., LEE, K. L. & MARK, D. B. 1996. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15, 361-387.
HASTINGS, W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.
HC-700-II 2006. How fair are the fares? Train fares and ticketing - Committee's sixth report of session 2005-06 - Volume II. In: COMMITTEE, G. B. P. H. O. C. T. (ed.) HC 700-II.
HECKMAN, J. J. 1977. Sample selection bias as a specification error (with an application to the estimation of labor supply functions). National Bureau of Economic Research, Cambridge, Mass., USA.
HECKMAN, J. J. 1979. Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society, 153-161.
HEITJAN, D. F. & BASU, S. 1996. Distinguishing "missing at random" and "missing completely at random". The American Statistician, 50, 207-213.
HENRETTY, N. 2011/12. Small area income estimates for middle layer super output areas, England and Wales. 16 December 2016 ed.: Office for National Statistics.
HENSHER, D. A. & BULLOCK, R. G. 1979. Price elasticity of commuter mode choice: effect of a 20 per cent rail fare reduction. Transportation Research Part A: General, 13, 193-202.
HENSHER, D. A. & BUTTON, K. J. 2007. Handbook of transport modelling, Emerald Group Publishing Limited.
HEPPENSTALL, A. J., CROOKS, A. T., SEE, L. M. & BATTY, M. 2011. Agent-based models of geographical systems, Springer Science & Business Media.
HESS, S. & DALY, A. 2014. Handbook of choice modelling, Edward Elgar Publishing.
HO, D. E., IMAI, K., KING, G. & STUART, E. A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15, 199-236.
HONAKER, J. & KING, G. 2010. What to do about missing values in time-series cross-section data. American Journal of Political Science, 54, 561-581.
HONAKER, J., KING, G. & BLACKWELL, M. 2011. Amelia II: A program for missing data. Journal of Statistical Software, 45, 1-47.
HORNI, A., NAGEL, K. & AXHAUSEN, K. W. 2016. The multi-agent transport simulation MATSim, Ubiquity Press, London.

HORTON, N. J. & KLEINMAN, K. P. 2007. Much ado about nothing. The American Statistician, 61.
HORTON, N. J. & LAIRD, N. M. 1999. Maximum likelihood analysis of generalized linear models with missing covariates. Statistical Methods in Medical Research, 8, 37-50.
HORTON, N. J. & LIPSITZ, S. R. 2001. Multiple imputation in practice: comparison of software packages for regression models with missing variables. The American Statistician, 55, 244-254.
HOWE, D., COSTANZO, M., FEY, P., GOJOBORI, T., HANNICK, L., HIDE, W., HILL, D. P., KANIA, R., SCHAEFFER, M. & ST PIERRE, S. 2008. Big data: The future of biocuration. Nature, 455, 47-50.
HUMMEL, M., EDELMANN, D. & KOPP-SCHNEIDER, A. 2017. CluMix: Clustering and Visualization of Mixed-Type Data.
HUYNH, N. N., CAO, V., WICKRAMASURIYA DENAGAMAGE, R., BERRYMAN, M., PEREZ, P. & BARTHELEMY, J. 2014. An agent based model for the simulation of road traffic and transport demand in a Sydney metropolitan area.
IBRAHIM, J. G. & MOLENBERGHS, G. 2009. Missing data methods in longitudinal studies: a review. Test, 18, 1-43.
ICKSTADT, K. & WOLPERT, R. L. 1997. Multiresolution assessment of forest inhomogeneity. Case Studies in Bayesian Statistics. Springer.
ICO. 2015. What are the Environmental Information Regulations? [Online]. Information Commissioner's Office. Available: www.ico.org.uk/for-organisations/guide-to-the-environmental-information-regulations/what-are-the-eir/ [Accessed 9th March 2015].
INSPIRE. 2015. Infrastructure for Spatial Information in the European Community [Online]. INSPIRE. Available: http://inspire.ec.europa.eu/ [Accessed 9th May 2015].
IRELAND, C. T. & KULLBACK, S. 1968. Contingency tables with given marginals. Biometrika, 55, 179-188.
JACCARD, J. & TURRISI, R. 2003. Interaction effects in multiple regression, Sage.
JARA-DÍAZ, S. R. 1998. Time and income in travel demand: towards a microeconomic activity framework. Theoretical Foundations of Travel Choice Modelling. Elsevier.
JENSEN, F. V. 1996. An introduction to Bayesian networks, UCL Press, London.
JONES, P. M., DIX, M. C., CLARKE, M. I. & HEGGIE, I. G. 1983. Understanding travel behaviour.
JUSTICE. 2008. Freedom of information guidance, Exemptions guidance, Section 40 - Personal information [Online]. Available: www.justice.gov.uk/downloads/information-access-rights/foi/foi-exemption-s40.pdf [Accessed 9th March 2015].
KASS, R. E., CARLIN, B. P., GELMAN, A. & NEAL, R. M. 1998. Markov chain Monte Carlo in practice: a roundtable discussion. The American Statistician, 52, 93-100.
KAVROUDAKIS, D. 2015. sms: Microdata for Geographical Analysis in R. Journal of Statistical Software, 68, 1-23.
KRIGE, D. G. 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa.

KELLEY, C. T. 1999. Iterative methods for optimization, SIAM.
KELSALL, J. & WAKEFIELD, J. 1999. Discussion of 'Bayesian models for spatially correlated disease and exposure data', by Best et al. Bayesian Statistics, 6, 151.
KING, G., HONAKER, J., JOSEPH, A. & SCHEVE, K. List-wise deletion is evil: what to do about missing data in political science. Annual Meeting of the American Political Science Association, Boston, 1998.
KING, G., HONAKER, J., JOSEPH, A. & SCHEVE, K. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Association, 2001. Cambridge Univ Press, 49-69.
KINGMAN, J. F. 1978. Uses of exchangeability. The Annals of Probability, 183-197.
KITCHIN, R. 2014. Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1, 2053951714528481.
KNUTTI, R. & SEDLÁCEK, J. 2013. Robustness and uncertainties in the new CMIP5 climate model projections. Nature Climate Change, 3, 369.
KOCH, G., PROBABILITY, I. C. O. E. I., STATISTICS & DE FINETTI, B. 1982. Exchangeability in probability and statistics: proceedings of the International Conference on Exchangeability in Probability and Statistics, Rome, 6th-9th April, 1981, in honour of Professor Bruno de Finetti, North-Holland Publ.
KRAJZEWICZ, D., HERTKORN, G., RÖSSEL, C. & WAGNER, P. SUMO (Simulation of Urban MObility) - an open-source traffic simulation. Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002), 2002. 183-187.
KRUSCHKE, J. 2014. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan, Academic Press.
KUMAR, A. 1980. Use of incremental form of logit models in demand analysis.
KUNER, C. 2012. The European Commission's proposed data protection regulation: A Copernican revolution in European data protection law.
KURBAN, H., GALLAGHER, R., KURBAN, G. A. & PERSKY, J. 2011. A beginner's guide to creating small-area cross-tabulations. Cityscape, 225-235.
LAGRANGE, J. L. 1867. Oeuvres de Lagrange (14 vols). Gauthier-Villars, Paris.
LAKE, L. W. 2016. Kriging and cokriging [Online]. petrowiki.org: Society of Petroleum Engineers International. Available: www.petrowiki.org [Accessed 6th January 2016].
LANDERMAN, L. R., LAND, K. C. & PIEPER, C. F. 1997. An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods & Research, 26, 3-33.
LANGE, K. L., LITTLE, R. J. & TAYLOR, J. M. 1989. Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881-896.
LAVRAKAS, P. J. 2008. Encyclopedia of survey research methods, Sage Publications.
LAWSON, A. B., BROWNE, W. J. & RODEIRO, C. L. V. 2003. Disease mapping with WinBUGS and MLwiN, John Wiley & Sons.

LAZER, D., KENNEDY, R., KING, G. & VESPIGNANI, A. 2014. The parable of Google Flu: traps in big data analysis. Science, 343, 1203-1205.
LEE, D. 2013. CARBayes: an R package for Bayesian spatial modeling with conditional autoregressive priors. Journal of Statistical Software, 55, 1-24.
LEEDS, I. 2006. KONSULT Knowledgebase on Sustainable Urban Land Use and Transport. Institute for Transport Studies, University of Leeds, Leeds, UK. http://konsult.leeds.ac.uk/public/level0/l0_hom.htm.
LEUNG, J. Y. 2004. Handbook of scheduling: algorithms, models, and performance analysis, CRC Press.
LITTLE, R. & RUBIN, D. 2002. Statistical Analysis with Missing Data.
LITTLE, R. J. 1988. A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.
LITTLE, R. J. & RUBIN, D. B. 2014. Statistical analysis with missing data, John Wiley & Sons.
LOMAX, N. & NORMAN, P. 2015. Estimating Population Attribute Values in a Table: "Get Me Started in" Iterative Proportional Fitting. The Professional Geographer, 1-11.
LOMAX, N. & NORMAN, P. 2016. Estimating Population Attribute Values in a Table: "Get Me Started in" Iterative Proportional Fitting. The Professional Geographer, 68, 451-461.
LOMAX, N., NORMAN, P., REES, P. & STILLWELL, J. 2013. Subnational migration in the United Kingdom: producing a consistent time series using a combination of available data and estimates. Journal of Population Research, 30, 265-288.
LONG, J. S. & FREESE, J. 2006. Regression models for categorical dependent variables using Stata, Stata Press.
LONGLEY, P. 2005. Geographic information systems and science, John Wiley & Sons.
LOVELACE, R. & BALLAS, D. 2013. 'Truncate, replicate, sample': A method for creating integer weights for spatial microsimulation. Computers, Environment and Urban Systems, 41, 1-11.
LOVELACE, R., BIRKIN, M., BALLAS, D. & VAN LEEUWEN, E. 2015. Evaluating the performance of iterative proportional fitting for spatial microsimulation: New tests for an established technique. Journal of Artificial Societies and Social Simulation, 18, 21.
LOVELACE, R. & DUMONT, M. 2016. Spatial Microsimulation with R, CRC Press.
LUNN, D., JACKSON, C., BEST, N., THOMAS, A. & SPIEGELHALTER, D. 2012. The BUGS book: A practical introduction to Bayesian analysis, CRC Press.
LUNN, D. J., THOMAS, A., BEST, N. & SPIEGELHALTER, D. 2000. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325-337.
MALLESON, N. & BIRKIN, M. 2012. Estimating individual behaviour from massive social data for an urban agent-based model. Modeling Social Phenomena in Spatial Context. Lit Verlag, Berlin.
MANTELERO, A. 2013. The EU Proposal for a General Data Protection Regulation and the roots of the 'right to be forgotten'. Computer Law & Security Review, 29, 229-235.

MANYIKA, J., CHUI, M., BROWN, B., BUGHIN, J., DOBBS, R., ROXBURGH, C. & BYERS, A. H. 2011. Big data: The next frontier for innovation, competition, and productivity.
MARCHENKO, Y. Chained equations and more in multiple imputation in Stata 12. 2011 Italian Stata Users Group Meeting, 2011.
MARR, B. 2015. Big Data: Using SMART big data, analytics and metrics to make better decisions and improve performance, John Wiley & Sons.
MAZUMDER, R., HASTIE, T. & TIBSHIRANI, R. 2010. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, 11, 2287-2322.
MCELREATH, R. 2015. Statistical Rethinking. Texts in Statistical Science. CRC Press.
MCFADDEN, D. 1973. Conditional logit analysis of qualitative choice behavior.
MCFADDEN, D. 1980. Econometric models for probabilistic choice among products. Journal of Business, S13-S29.
MCFADDEN, D. & TRAIN, K. 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447-470.
MCHUGH, B. 2013. Pioneering open data standards: The GTFS story. Beyond transparency: open data and the future of civic innovation, 125-135.
MCMILLEN, D. P. 2004. Geographically weighted regression: the analysis of spatially varying relationships. Oxford University Press.
MCNALLY, M. G. & RINDT, C. R. 2007. The activity-based approach. Handbook of Transport Modelling: 2nd Edition. Emerald Group Publishing Limited.
METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H. & TELLER, E. 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087-1092.
METROPOLIS, N. & ULAM, S. 1949. The Monte Carlo Method. Journal of the American Statistical Association, 44, 335-341.
MILLER, E. J., HUNT, J. D., ABRAHAM, J. E. & SALVINI, P. A. 2004. Microsimulating urban systems. Computers, Environment and Urban Systems, 28, 9-44.
MORANG, M. 2016. Using GTFS Data in ArcGIS Network Analyst [Online]. Available: http://transit.melindamorang.com/ [Accessed 24 September 2016].
MORRIS, A. E. J. 2013. History of urban form before the industrial revolution, Routledge.
MORRIS, P. The breakout method for escaping from local minima. AAAI, 1993. 40-45.
MÜHLENBEIN, H., GORGES-SCHLEUTER, M. & KRÄMER, O. 1988. Evolution algorithms in combinatorial optimization. Parallel Computing, 7, 65-85.
MÜLLER, K. IPF within multiple domains: Generating a synthetic population for Switzerland, presentation. 11th Swiss Transport Research Conference, Ascona, 2011.
MÜLLER, K. & AXHAUSEN, K. W. 2010. Population synthesis for microsimulation: State of the art, ETH Zürich, Institut für Verkehrsplanung, Transporttechnik, Strassen-und Eisenbahnbau (IVT).

MÜLLER, K. & AXHAUSEN, K. W. 2012. Multi-level fitting algorithms for population synthesis. Arbeitsberichte Verkehrs-und Raumplanung, 821.
MURAKAMI, E. & YOUNG, J. 1997. Daily travel by persons with low income, US Federal Highway Administration, Washington, DC.
NAGEL, K. & FLÖTTERÖD, G. Agent-based traffic assignment: Going from trips to behavioural travelers. Travel Behaviour Research in an Evolving World - Selected papers from the 12th international conference on travel behaviour research, 2012. International Association for Travel Behaviour Research, 261-294.
NAMAZI-RAD, M.-R., MOKHTARIAN, P. & PEREZ, P. 2014. Generating a dynamic synthetic population - using an age-structured two-sex model for household dynamics. PloS One, 9, e94761.
NATIONAL-RAIL. 2018. Travel tickets on the National Rail network [Online]. National Rail. Available: http://www.nationalrail.co.uk/times_fares/ticket_types.aspx [Accessed 31 May 2018].
NEAL, R. M. 2001. Annealed importance sampling. Statistics and Computing, 11, 125-139.
ODIARI, E. 2011. The use of Geographic Information Systems (GIS) for Building Information Modelling (BIM). Master of Science, Cranfield University.
ODIARI, E. A., BIRKIN, M., GRANT-MULLER, S. & MALLESON, N. 2016. Infilling Missing Values in Big Consumer Datasets. In: ODIARI, E. (ed.) CDRC - Internal Report. University of Leeds: Leeds Institute for Data Analytics.
ODIARI, E. A., BIRKIN, M., GRANT-MULLER, S., MAGEE, T. & MALLESON, N. 2018 (awaiting review). The use of big data in simulating attributes and mobilities of passengers on the UK rail network. Transportation Research Part A: Policy and Practice.
OGL. 2015. How to make a freedom of information (FOI) request [Online]. GOV.UK. Available: www.gov.uk/make-a-freedom-of-information-request/the-freedom-of-information-act [Accessed 9 March 2015].
ONS. 2013. Census reveals details of how we travel to work in England and Wales [Online]. nationalarchives.gov.uk. Available: www.nationalarchives.gov.uk [Accessed 7 April 2017].
ONS. 2017. Geography data - Hierarchical Representation of UK Statistical Geographies [Online]. ONS. Available: https://ons.maps.arcgis.com/ [Accessed 2 November 2017].
OPENSHAW, S. & ALVANIDES, S. 2001. Designing zoning systems for representation of socio-economic data. Time and Motion of Socio-Economic Units. Taylor and Francis, London.
OPENSHAW, S. & RAO, L. 1995. Algorithms for reengineering 1991 Census geography. Environment and Planning A, 27, 425-446.
ORR. 2012a. Contract Form - Origin destination matrix 2011-12 and 2012-13 [Online]. data.gov.uk: Office of Rail Regulation. Available: data.gov.uk [Accessed 28 December 2015].

ORR. 2012b. Origin - Destination Matrix 2011/12 Summary Report April 2013 [Online]. orr.gov.uk: Office of Rail Regulation. Available: orr.gov.uk [Accessed 28 December 2015].
ORR. 2014a. Estimates of Station Usage 2013/14 [Online]. orr.gov.uk: Office of Rail Regulation. Available: orr.gov.uk [Accessed 28 December 2015].
ORR. 2014b. Framework Agreement for Consultancy Services - Order Form [Online]. ORR: Office of Rail Regulation. Available: data.gov.uk [Accessed 28 December 2015].
ORR. 2014c. Origin - Destination Matrix 2013/14 Summary Report [Online]. orr.gov.uk: Steer Davies Gleave. Available: orr.gov.uk [Accessed 28 December 2015].
ORTUZAR, J. D. D. & WILLUMSEN, L. G. 2002. Modelling transport.
ORTÚZAR, J. D. D. & WILLUMSEN, L. G. 2011. Modelling Transport, Chichester, West Sussex, United Kingdom, John Wiley & Sons.
PADDOCK, S. M. 2002. Bayesian nonparametric multiple imputation of partially observed data with ignorable nonresponse. Biometrika, 89, 529-538.
PATRIKSSON, M. 2015. The traffic assignment problem: models and methods, Courier Dover Publications.
PEARL, J. 2009. Causality, Cambridge University Press.
PEARL, J., GLYMOUR, M. & JEWELL, N. P. 2016. Causal inference in statistics: a primer, John Wiley & Sons.
PEEL, D. & MCLACHLAN, G. J. 2000. Robust mixture modelling using the t distribution. Statistics and Computing, 10, 339-348.
PENNY, W. D., FRISTON, K. J., ASHBURNER, J. T., KIEBEL, S. J. & NICHOLS, T. E. 2011. Statistical parametric mapping: the analysis of functional brain images, Elsevier.
PLUMMER, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 2003. Vienna, Austria, 125.
PLUMMER, M., BEST, N., COWLES, K. & VINES, K. 2006. CODA: convergence diagnosis and output analysis for MCMC. R News, 6, 7-11.
PODANI, J. 1999. Extending Gower's general coefficient of similarity to ordinal characters. Taxon, 331-340.
POLETTI, F., BÖSCH, P. M., CIARI, F. & AXHAUSEN, K. W. 2017. Public transit route mapping for large-scale multimodal networks. ISPRS International Journal of Geo-Information, 6, 268.
POTTHOFF, R. F., TUDOR, G. E., PIEPER, K. S. & HASSELBLAD, V. 2006. Can one assess whether missing data are missing at random in medical studies? Statistical Methods in Medical Research, 15, 213-234.
PRESTON, J. 1987. The evaluation of new local rail stations in West Yorkshire. University of Leeds.
PRESTON, J. 1991. Demand forecasting for new local rail stations and services. Journal of Transport Economics and Policy, 183-202.
PRESTON, J. 2012. High Speed Rail in Britain: about time or a waste of time? Journal of Transport Geography, 22, 308-311.
RAFIAH, M., RICE, J. R., FERGUSON, J. S. G., SADLER, A. J. & HARRISON, P. R. 2004. Integrated journey planner. Google Patents.

RAIL DELIVERY GROUP. 2015. Rail Delivery Group [Online]. www.raildeliverygroup.com. Available: http://www.raildeliverygroup.com/about-us/britains-railway-in-numbers/198-keeping-britain-on-the-move.html [Accessed 15 January 2015].
RAND CORPORATION 2017. Developing a New National Transport Model for the UK. In: FOX, J. (ed.). RAND Corporation.
RANEY, B., CETIN, N., VÖLLMY, A., VRTIC, M., AXHAUSEN, K. & NAGEL, K. 2003. An agent-based microsimulation model of Swiss travel: First results. Networks and Spatial Economics, 3, 23-41.
RAO, S. S. & RAO, S. S. 2009. Engineering optimization: theory and practice, John Wiley & Sons.
RAUDENBUSH, S. W. & BRYK, A. S. 2002. Hierarchical linear models: Applications and data analysis methods, Sage.
RAYPORT, J. F. 2011. What Big Data Needs: A Code of Ethical Practices. MIT Technology Review, Cambridge, Massachusetts.
REES, P. 1994. Estimating and projecting the populations of urban communities. Environment & Planning A, 26, 1671-1697.
RHOADS, C. H. 2012. Problems with tests of the missingness mechanism in quantitative policy studies. Statistics, Politics, and Policy, 3.
RICHARDSON, A. J., AMPT, E. S. & MEYBURG, A. H. 1995. Survey methods for transport planning, Eucalyptus Press, Melbourne.
RICHARDSON, S. & BEST, N. 2003. Bayesian hierarchical models in ecological studies of health-environment effects. Environmetrics, 14, 129-147.
RICHEY, M. 2010. The evolution of Markov chain Monte Carlo methods. American Mathematical Monthly, 117, 383-413.
RIESER, M., NAGEL, K., BEUCK, U., BALMER, M. & RÜMENAPP, J. 2007. Agent-oriented coupling of activity-based demand generation with multiagent traffic simulation. Transportation Research Record: Journal of the Transportation Research Board, 10-17.
RIPLEY, B. D. 2005. Spatial statistics, John Wiley & Sons.
ROBERT, C. P. 2004. Monte Carlo methods, Wiley Online Library.
ROBERTS, G. O. & ROSENTHAL, J. S. 2004. General state space Markov chains and MCMC algorithms. Probability Surveys, 1, 20-71.
RODRIGUEZ, G. 2013. Models for count data with overdispersion.
ROEDER, K. 1990. Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. Journal of the American Statistical Association, 85, 617-624.
ROSENTHAL, J. S. 2011. Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo, 93-112.
ROY, J., LUM, K. J., DANIELS, M. J., ZELDOW, B., DWORKIN, J. & RE III, V. L. 2017. Bayesian nonparametric generative models for causal inference with missing at random covariates. arXiv preprint arXiv:1702.08496.
ROYSTON, P. 2005. Multiple imputation of missing values: update of ice. Stata Journal, 5, 527.
RSTUDIO TEAM 2016. RStudio: Integrated Development Environment for R. RStudio, Inc.
RUBIN, D. B. 1976. Inference and missing data. Biometrika, 63, 581-592.

RUBIN, D. B. Multiple imputations in sample surveys - a phenomenological Bayesian approach to nonresponse. Proceedings of the Survey Research Methods Section of the American Statistical Association, 1978. American Statistical Association, 20-34.
RUBIN, D. B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473-489.
RUBIN, D. B. 2004. Multiple imputation for nonresponse in surveys, John Wiley & Sons.
RUBIN, D. B. & SCHENKER, N. 1991. Multiple imputation in health-care databases: An overview and some applications. Statistics in Medicine, 10, 585-598.
RÜSCHENDORF, L. 1995. Convergence of the iterative proportional fitting procedure. The Annals of Statistics, 23, 1160-1174.
SALVINI, P. & MILLER, E. J. 2005. ILUTE: An operational prototype of a comprehensive microsimulation model of urban systems. Networks and Spatial Economics, 5, 217-234.
SCHAFER, J. L. 1997. Analysis of incomplete multivariate data, CRC Press.
SCHAFER, J. L. & GRAHAM, J. W. 2002. Missing data: our view of the state of the art. Psychological Methods, 7, 147.
SDG 2011. Research on Ticketing in Major Urban Areas. In: SDG (ed.) Research on Ticketing in Major Urban Areas. Passenger Demand Forecasting Council (PDFC).
SDG 2012. Demand in the Centro PTE area. Research on PTE Tickets. Passenger Demand Forecasting Council - ATOC.
SDG 2014. Origin-Destination Matrix 2013/14 Summary Report. In: GLEAVE, S. D. (ed.) Origin-Destination Matrix. orr.gov.uk: Office of Rail and Road.
SDG 2016. Origin-Destination Matrix 2014/15 Summary Report. In: GLEAVE, S. D. (ed.) Origin-Destination Matrix. orr.gov.uk: Office of Rail and Road.
SHADISH, W. R., COOK, T. D. & CAMPBELL, D. T. 2002. Experimental and quasi-experimental designs for generalized causal inference, Wadsworth Cengage Learning.
SHAH, A. D., BARTLETT, J. W., CARPENTER, J., NICHOLAS, O. & HEMINGWAY, H. 2014. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology, 179, 764-774.
SHEPARD, D. A two-dimensional interpolation function for irregularly-spaced data. Proceedings of the 1968 23rd ACM National Conference, 1968. ACM, 517-524.
SHORE, J. & JOHNSON, R. 1980. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26, 26-37.
SIDDIQUI, O. & ALI, M. W. 1998. A comparison of the random-effects pattern mixture model with last-observation-carried-forward (LOCF) analysis in longitudinal clinical trials with dropouts. Journal of Biopharmaceutical Statistics, 8, 545-563.
SIMINI, F., GONZÁLEZ, M. C., MARITAN, A. & BARABÁSI, A.-L. 2012. A universal model for mobility and migration patterns. Nature, 484, 96-100.

SMITH, A. F. & ROBERTS, G. O. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B (Methodological), 3-23.
SMITH, D. M., CLARKE, G. P. & HARLAND, K. 2009. Improving the synthetic data generation process in spatial microsimulation models. Environment and Planning A, 41, 1251-1268.
SOLLACI, L. B. & PEREIRA, M. G. 2004. The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey. Journal of the Medical Library Association, 92, 364.
SOLTOW, L. 1960. The distribution of income related to changes in the distributions of education, age, and occupation. The Review of Economics and Statistics, 450-453.
SPIEGELHALTER, D., THOMAS, A., BEST, N. & LUNN, D. 2007. OpenBUGS user manual, version 3.0.2. MRC Biostatistics Unit, Cambridge.
SPIEGELHALTER, D. J., BEST, N. G., CARLIN, B. P. & VAN DER LINDE, A. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 583-639.
SPIEGELHALTER, D. J., THOMAS, A., BEST, N. G., GILKS, W. & LUNN, D. 1996. BUGS: Bayesian inference using Gibbs sampling. Version 0.5 (version ii), http://www.mrc-bsu.cam.ac.uk/bugs, 19.
SPIESS, H. & FLORIAN, M. 1989. Optimal strategies: a new assignment model for transit networks. Transportation Research Part B: Methodological, 23, 83-102.
STEIN, M. L. 2012. Interpolation of spatial data: some theory for kriging, Springer Science & Business Media.
STEKHOVEN, D. J. & BÜHLMANN, P. 2011. MissForest - non-parametric missing value imputation for mixed-type data. Bioinformatics, 28, 112-118.
STEKHOVEN, D. J. & BÜHLMANN, P. 2012. MissForest - non-parametric missing value imputation for mixed-type data. Bioinformatics, 28, 112-118.
STERNE, J. A., WHITE, I. R., CARLIN, J. B., SPRATT, M., ROYSTON, P., KENWARD, M. G., WOOD, A. M. & CARPENTER, J. R. 2009. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338, b2393.
STILLWELL, J. 1978. Interzonal migration: some historical tests of spatial-interaction models. Environment and Planning A, 10, 1187-1200.
STILLWELL, J. 2006. Providing access to census-based interaction data in the UK: that's WICID. The Journal of Systemics, Cybernetics and Informatics, 4, 63-68.
STILLWELL, J. & DUKE-WILLIAMS, O. 2003. A new web-based interface to British census of population origin-destination statistics. Environment and Planning A, 35, 113-132.
STURTZ, S., LIGGES, U. & GELMAN, A. 2010. R2OpenBUGS: a package for running OpenBUGS from R. URL http://cran.r-project.org/web/packages/R2OpenBUGS/vignettes/R2OpenBUGS.pdf.
SUN, L. & ERATH, A. 2015. A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies, 61, 49-62.

SUN, W. & YUAN, Y.-X. 2006. Optimization theory and methods: nonlinear programming, Springer Science & Business Media.
SUZUKI, J. 2005. The lost calculus (1637-1670): Tangency and optimization without limits. Mathematics Magazine, 78, 339-353.
TANTON, R. 2014. A review of spatial microsimulation methods. International Journal of Microsimulation, 7, 4-25.
TANTON, R. & EDWARDS, K. 2012. Spatial microsimulation: A reference guide for users, Springer Science & Business Media.
TAYLOR, J. 2013a. Robust Foundations. In: FICKLING, R. (ed.) Rail in the North Demand and Revenue Model. \\UKMANCFP02\Projects(Odd)\313427 Rail in the North\4.0 Reporting\Base Matrix Report: Mott MacDonald.
TAYLOR, J. 2013b. Robust Foundations - Rail in the North Demand and Revenue Model. Rail in the North Demand and Revenue Model - Base Matrix Report. Mott MacDonald, University of Leeds, and MVA Consultancy.
TEAM, R. C. 2000. R language definition. Vienna, Austria: R Foundation for Statistical Computing.
TENNANT, P. 2018. RE: Causal inference with observational data: challenges and pitfalls. Personal communication to ODIARI, E., 8 September 2018.
THERNEAU, T., ATKINSON, B., RIPLEY, B. & RIPLEY, M. B. 2017. Package 'rpart'. Available online: cran.ma.ic.ac.uk/web/packages/rpart/rpart.pdf (accessed 20 April 2016).
TÖRN, A. & ZILINSKAS, A. 1989. Global optimization, Springer.
TRAIN, K. E., MCFADDEN, D. L. & BEN-AKIVA, M. 1987. The demand for local telephone service: A fully discrete model of residential calling patterns and service choices. The RAND Journal of Economics, 109-123.
TRANSITFEEDS. 2017. ATOC GTFS - TransitFeeds [Online]. TransitFeeds. Available: https://transitfeeds.com/ [Accessed 26 July 2017].
TREASURY, H. M. S. 2011. National infrastructure plan 2011. Infrastructure UK, UK Treasury Department (HM Treasury), http://cdn.hm-treasury.gov.uk/national_infrastructure_plan291111.pdf.
TROYANSKAYA, O., CANTOR, M., SHERLOCK, G., BROWN, P., HASTIE, T., TIBSHIRANI, R., BOTSTEIN, D. & ALTMAN, R. B. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520-525.
TU, Y.-K., KELLETT, M., CLEREHUGH, V. & GILTHORPE, M. S. 2005. Problems of correlations between explanatory variables in multiple regression analyses in the dental literature. British Dental Journal, 199, 457-461.
TURNBULL, B. W. 1976. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B (Methodological), 290-295.
UK-DATA-SERVICE 2011. 2011 Census United Kingdom - Safeguarded. 08-12-2014 ed.: ONS.
UK-LEGISLATION. 2018. Data Protection Act 2018 [Online]. Available: http://www.legislation.gov.uk/ [Accessed 31 May 2018].
UKDS 2016. Census Support: Flow Data. Flow Data 15-01-2015 ed. www.wicid.ukdataservice.ac.uk: ONS / SASPAC.

UNIVERSITY-OF-LEEDS. 2017. Advanced Research Computing [Online]. University of Leeds. Available: http://arc.leeds.ac.uk/systems/arc3/ [Accessed 15 April 2017].
UOL. 2015. Good research practice & research ethics [Online]. University of Leeds. Available: www.leeds.ac.uk/ethics [Accessed 5 June 2015].
UPTON, G. J. 1985. Modelling cross-tabulated regional data. In: NIJKAMP, P., LEITNER, H. & WRIGLEY, N. (eds.) Measuring the Unmeasurable. Martinus Nijhoff, 197-218.
VAN BUUREN, S. 2007. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219-242.
VAN BUUREN, S., BOSHUIZEN, H. C. & KNOOK, D. L. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.
VAN DER HEIJDEN, G. J., DONDERS, A. R. T., STIJNEN, T. & MOONS, K. G. 2006. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example. Journal of Clinical Epidemiology, 59, 1102-1109.
VAN NES, R. 2002. Design of multimodal transport networks: A hierarchical approach.
VARIAN, H. R. 2014. Big data: New tricks for econometrics. The Journal of Economic Perspectives, 28, 3-27.
VICCARS, A. 2002. Report to the Executive for Decision, 4 November 2002 [Strategic Rail Authority Consultation - Future Fares Policy] [Online]. Fareham Borough Council. Available: http://www.fareham.gov.uk/crs/executive/021104/reports-public/xpt-021104-r18-avi.pdf.
VIDYATTAMA, Y., MIRANTI, R., MCNAMARA, J., TANTON, R. & HARDING, A. 2011. Issues in spatial microsimulation estimation: a case study of child poverty.
MAYER-SCHÖNBERGER, V. & CUKIER, K. 2013. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.
WARDMAN, M., LYTHGOE, W. & WHELAN, G. 2007. Rail passenger demand forecasting: cross-sectional models revisited. Research in Transportation Economics, 20, 119-152.
WEBER, G. M., MANDL, K. D. & KOHANE, I. S. 2014. Finding the missing link for big biomedical data. JAMA, 311, 2479-2480.
WHITE, I. R., ROYSTON, P. & WOOD, A. M. 2011. Multiple imputation using chained equations: issues and guidance for practice. Statistics in Medicine, 30, 377-399.
WHITEING, T., BROWNE, M. & ALLEN, J. 2003. City logistics: the continuing search for sustainable solutions. Global Logistics and Distribution Planning, 4, 308-332.
WHITTAKER, J. M. 1935. Interpolatory function theory, The University Press.
WILLETTS, D. 2013. Eight great technologies, Policy Exchange.
WILLIAMSON, P., BIRKIN, M. & REES, P. The simulation of whole populations using data from small area statistics, samples of anonymised records and national surveys. Research on 1991 Census Conference, University of Newcastle, September, 1993.
WILLIAMSON, P., BIRKIN, M. & REES, P. H. 1998. The estimation of population microdata by using data from small area statistics and samples of anonymised records. Environment and Planning A, 30, 785-816.
WILLUMSEN, L. 1981. Simplified transport models based on traffic counts. Transportation, 10, 257-278.
WILLUMSEN, L. G. 1993. Multi-modal modelling in congested networks: SATURN and SATCHMO. Traffic Engineering & Control.
WILSON, A. 2010. Entropy in Urban and Regional Modelling: Retrospect and Prospect. Geographical Analysis, 42, 364-394.
WILSON, A. G. 1969. The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy, 108-126.
WILSON, A. G. 1971. A family of spatial interaction models, and associated developments. Environment and Planning A, 3, 1-32.
WONG, D. W. 1992. The Reliability of Using the Iterative Proportional Fitting Procedure. The Professional Geographer, 44, 340-348.
WORSLEY, T. 2012. Rail Demand Forecasting Using the Passenger Demand Forecasting Handbook. On the Move - Supporting Paper, 2, 89-91.
WRIGHT, E. O. & PERRONE, L. 1977. Marxist class categories and income inequality. American Sociological Review, 32-55.
WYCA 2016. West Yorkshire Transport Strategy Evidence Base. Transport Strategy 2016-2036. www.westyorks-ca.gov.uk: West Yorkshire Combined Authority.
WYCA. 2018. M-Metro Transport for West Yorkshire [Online]. Transport for West Yorkshire. Available: https://www.wymetro.com/trains/ [Accessed 31 May 2018].
YUAN, K.-H. & BENTLER, P. M. 2000. Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30, 165-200.
YUAN, Y. C. 2010. Multiple imputation for missing data: Concepts and new development (Version 9.0). SAS Institute Inc, Rockville, MD, 49, 1-11.
ZEILEIS, A., KLEIBER, C. & JACKMAN, S. 2008. Regression models for count data in R. Journal of Statistical Software, 27, 1-25.
ZHANG, L., XIONG, C. & BERGER, K. Multimodal Inter-Regional Origin-Destination Demand Estimation.
ZHAO, Y. & KOCKELMAN, K. M. 2002. The propagation of uncertainty through travel demand models: an exploratory analysis. The Annals of Regional Science, 36, 145-163.
ZIPF, G. K. 1946. The P 1 P 2/D hypothesis: on the intercity movement of persons. American Sociological Review, 11, 677-686.

Appendices

This section summarises the directories and file structure of the compact disc (CD) which accompanies this thesis. The CD contains all the software code referred to in the thesis, together with 'readme' files that locate the R-software scripts, GIS-GTFS models, and the WinBUGS, OpenBUGS, and JAGS scripts developed as part of the research. The CD is provided to enable reproduction of the research results and findings, and to enable future development of the project by other practitioners. For those datasets not available for public use, the schemata are included instead, as these enable the creation of proxy data.


Appendix A R-script for spatial microsimulation experiment

The pertinent aspects of the code are the checks which ensure that the lists of variables and variable-categories are the same for the seed and the target datasets. This similarity in structure forms the basis for reconciling the datasets and ensures that the seed is optimally replicated. The algorithms search all the variable-categories in the target and seed data to confirm that each variable-category is consistent between the two; by so doing, the micro-population created references the correct geographic zone, variable, and variable-category, which safeguards the quality of the simulated population. The R-script code within this Appendix A is aimed at assessing the behaviour of standard spatial microsimulation methods when the target and seed data have features that are not typical of randomised, controlled data from traditional surveys. The data used to implement the R-script code listed in Appendix A are derived by post-processing and sampling the 2004/5 NRTS dataset.
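The fragment below is a minimal sketch of this consistency check, not the thesis script itself. It assumes (hypothetically) that the seed is a data frame of individuals and that the targets are held as a named list of marginal count tables, one per constraint variable; the definitive versions are on the CD.

# Check that every constraint variable and variable-category appears in
# both the seed and the target before attempting any reconciliation.
check_categories <- function(seed, target_margins) {
  for (v in names(target_margins)) {
    seed_cats   <- sort(unique(as.character(seed[[v]])))
    target_cats <- sort(names(target_margins[[v]]))
    if (!identical(seed_cats, target_cats)) {
      stop(sprintf("Variable '%s': seed and target categories differ (%s vs %s)",
                   v,
                   paste(seed_cats, collapse = "/"),
                   paste(target_cats, collapse = "/")))
    }
  }
  invisible(TRUE)
}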

A.1 Effect of number of constraints

This R-script assesses whether an increase in the number of constraint variables used in spatial microsimulation results in lower values of total absolute error (TAE). The algorithm is implemented for both the deterministic and the stochastic strategies.
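The sketch below illustrates the deterministic side of this experiment using the ipfp package (the approach of Lovelace & Dumont, 2016, cited in the references); it is a stand-in for the thesis scripts. The objects cons_list (a list of 0/1 indicator matrices, one per constraint variable, with individuals in columns) and targets_list (the matching marginal counts for one zone) are hypothetical.

library(ipfp)

# Total absolute error between simulated and target marginal counts
tae <- function(sim, target) sum(abs(sim - target))

# Fit IPF weights using only the first k constraints, then measure the
# error over ALL margins, fitted or not.
run_ipf <- function(cons_list, targets_list, k) {
  A_fit <- do.call(rbind, cons_list[1:k])
  y_fit <- unlist(targets_list[1:k])
  w <- ipfp(y_fit, A_fit, x0 = rep(1, ncol(A_fit)))   # IPF weights for this zone
  A_all <- do.call(rbind, cons_list)
  y_all <- unlist(targets_list)
  tae(as.vector(A_all %*% w), y_all)
}

# TAE as the number of constraint variables increases
sapply(seq_along(cons_list), function(k) run_ipf(cons_list, targets_list, k))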

A.2 Effect of variations in sample ratios

The R-script included here assesses the effect of different sample ratios of the seed data on the results of the deterministic Iterative Proportional Fitting (IPF), the stochastic Hill Climbing (HC), and the Simulated Annealing (SA) spatial microsimulation methodologies. In effect, this experiment investigates how different seed sample ratios affect the accuracy of a simulated population under the deterministic and stochastic methodologies. Monte Carlo sampling is included in the spatial microsimulation methodology to attenuate the effect of the particular variable values drawn into a seed of a given sample ratio.
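A minimal sketch of the Monte Carlo wrapper follows. It assumes a hypothetical simulate_population() routine (standing in for the A.1 script) that returns a TAE value, and a seed data frame; both names are illustrative only.

# Average the TAE over repeated Monte Carlo subsamples of the seed at a
# given sample ratio, to attenuate the effect of any one draw.
mc_tae_for_ratio <- function(seed, ratio, n_draws = 30, ...) {
  taes <- replicate(n_draws, {
    sub <- seed[sample(nrow(seed), size = ceiling(ratio * nrow(seed))), ]
    simulate_population(sub, ...)$tae   # hypothetical: returns the TAE
  })
  mean(taes)
}

# Example usage across a range of sample ratios
sapply(c(0.01, 0.05, 0.10, 0.25, 0.50), mc_tae_for_ratio, seed = seed)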

A.3 Effect of skew in seed data

The script evaluates the performance of spatial microsimulation methods when the seed data are skewed relative to the target distribution. This is an important aspect of the research, as a typical feature of consumer data is that they tend to be a skewed subset of the wider population. The established optimisation methods typically used to reconcile disparate datasets all assume that the datasets being combined have similar distributions. The distribution of railway ticketing data has about 40% rail commuters, whilst the distribution of the wider Census population has about 8% rail commuters; the ticketing data are therefore skewed relative to the wider population. Observational data revealed by consumers of digital services tend to be skewed relative to traditional analysis data from randomised, controlled surveys, so these novel consumer data need to be validated prior to combining them with traditional survey data. This experiment ascertains the behaviour of deterministic and stochastic microsimulation methods when the seed data are designed to be skewed relative to the target.
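The sketch below shows one way such a deliberately skewed seed can be constructed, using the commuter shares quoted above. It assumes a hypothetical population data frame with a logical rail_commuter column; it is an illustration, not the thesis design.

# Draw a seed of size n in which rail commuters are over-represented
# (~40% in ticketing data versus ~8% in the Census).
make_skewed_seed <- function(population, n, p_rail = 0.40) {
  rail  <- population[population$rail_commuter, ]
  other <- population[!population$rail_commuter, ]
  rbind(rail[sample(nrow(rail),  round(p_rail * n),       replace = TRUE), ],
        other[sample(nrow(other), round((1 - p_rail) * n), replace = TRUE), ])
}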

A.4 Sensitivity of TAE to population of zones

This script is intended to assess the ability of the spatial microsimulation methods to yield accurate simulated populations for zones that are relatively small compared to other zones. This is achieved by adapting the code in Appendix A.1 and assessing how the TAE values change for peripheral Postcode Areas of lower population (like OL, S, and HX) when compared to larger Postcode Areas of higher population (like LS, BD, and WF). Note that the population of some of the Postcode Areas is low simply because the full Area was not taken into consideration, having been clipped to fit into the West Yorkshire study area.

A.5 Prediction of non-constrained attributes

This script assesses the ability of the spatial microsimulation methods to predict those variables within the target which were not included as constraints in the spatial microsimulation process. This is achieved by adapting the script in Appendix A.3, running the model with fewer than 8 constraint variables as appropriate, and then estimating the difference in TAE for those variables which were not included as target constraints.

Appendix B R-script for stage 1 and stage 2 spatial microsimulation

The R-scripts included in Appendix B are used to combine the Census and NRTS datasets, and then to combine these further with the LENNON ticketing data. The scripts have been adapted to accommodate situations where the seed is skewed relative to the target. This is achieved by attaching an origin and destination to each individual's socio-demographic attributes. Using such a dataset ensures that the seed is fitted to the full conditional distribution of the target. This facilitates a 3-level hierarchical constraint, which in turn ensures not only that better TAE values are achieved, but also that the seed reproduces the distribution of the target. In practical implementations, it was necessary to augment the seed data by adding a nominal few seed records of the requisite characteristics to ensure consistency with the aggregate target, while monitoring their emergence in the final simulated population. On other occasions it has been prudent to reduce the target volumes to be commensurate with the seed, but caution is necessary in doing so, to ensure that the errors or discrepancies introduced do not affect the data structure. A further precaution is to ensure that, in the standard IPF implementation, the marginals across each aggregate constraint are equal, as the variables in the aggregate tables represent the same population. There is also the option of working with probabilities, such that the marginals are normalised to sum to one, as sketched below.
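The following sketch illustrates these two precautions on a hypothetical targets_list (a list of marginal count vectors, one per aggregate constraint); it is an illustration under those assumptions, not the thesis implementation.

# (i) Rescale every aggregate constraint so its marginal total matches a
#     common population total (by default, that of the first constraint).
harmonise_margins <- function(targets_list, total = NULL) {
  if (is.null(total)) total <- sum(targets_list[[1]])
  lapply(targets_list, function(m) m * total / sum(m))
}

# (ii) Alternatively, normalise each marginal to probabilities summing to one.
as_probabilities <- function(targets_list) {
  lapply(targets_list, function(m) m / sum(m))
}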

B.1 Stage 1 R-script (Census-NRTS)

The R-script developed to implement the Stage 1 spatial microsimulation combines the 2011 Census interaction data (flow data) with the 2004/5 National Rail Travel Survey (NRTS) dataset. The m-IPF spatial microsimulation methodology is developed to facilitate a 3-level simultaneous hierarchical constraint. Note that the NRTS is a skewed subset of the Census population.

B.2 Stage 2 R-script (Census-NRTS-LENNON)

Stage 2 is the spatial microsimulation conducted to combine the simulated population from Stage 1 with the LENNON ticketing data. The LENNON data are used as the target dataset.

Appendix C ArcGIS creation of GIS-GTFS network model

A range of ArcGIS models has been created to facilitate development of the GIS-GTFS network model. The Model Builder software was chosen so that the stages adopted in creating aspects of the GIS-GTFS model could be visualised; these stages are summarised in the flow chart below. The network dataset created needed quality assessment to ensure connectivity at peripheral zones, where adequate splits on features might not have been successfully created due to the long distances between Postcode Units and minor roads. The detailed ArcGIS Model Builder models and the datasets are included in the CD accompanying the thesis, as described at the beginning of the Appendices.

[Flow chart: stages in creating the GIS-GTFS network model]

C.1 Model used to create network linkages between Postcode Units, road networks, and the railways

The ArcGIS Model Builder model created to build logistical network links between the Postcode Units, road entrances, minor roads, A and B roads, motorways, railway entrances, railway stations, and railway trains and lines is shown here; the details are included in the CD accompanying the thesis. Two sets of line features, respectively connecting the Postcode Units to the minor roads and the train stations to the road network, are created such that there are no topological conflicts in the GIS network. The connectors make provision for where passengers can access the transit lines and the Postcode Units. The connectors also enable modelling of delays in boarding the train, as well as restricting access to particular train stations for passengers with disabilities where no accessible provisions exist at a station. A precaution in building the connecting links is to establish the functional stations from the GTFS database, as otherwise there is a risk of including connections for the many disused stations (e.g. Milnrow, New Hey, Derker, etc., previously on the Lancashire and Yorkshire Railway line which was closed in 2009) that still exist on conventional databases of UK railway stations.

[ArcGIS Model Builder diagram: network linkages between Postcode Units, road networks, and the railways]

The integrity of shared feature vertices is validated, and connectivity points between Postcode Units and minor roads, and between railway stations and the road network, are created by splitting the features appropriately. This integration is achieved using the model builder below.

[ArcGIS Model Builder diagram: feature splitting and creation of connectivity points]

C.2 Detailed specification of the connectivity between elements of the transport network

The connectivity rules between elements of the transport network define the manner in which the network is traversed. As detailed in Section 6.2.2, the connectivity ensures, for instance, that it is not possible to travel from the train network to the roads other than through the railway stations, station entry points, and non-motorway roads. In the figure below, connectivity between the feature classes is defined in the top table. The GTFS SQL database enables queries of the transit schedule during network analysis, and the travel time evaluator within ArcGIS calculates the traversal time when solving Network Analyst problems. The embedded ArcGIS model is crucial in matching (snapping) the GTFS transit stops and transit ends to the station and track shapefiles from Ordnance Survey.

[Figure: connectivity rules and feature-class connectivity table for the transport network]

C.3 Model used to iteratively solve the detailed mobility of each passenger on the rail network

The ArcGIS Model Builder model created to solve the detailed mobility of each passenger on the rail network is shown here. The model identifies additional endogenous passenger attributes such as waiting times, number of stops, average speed of trains, passenger access to rail service provision, and actual distance travelled. It is also adapted to estimate passenger volumes in the trains, and to facilitate animation of transit trains with passengers embedded. Each simulated passenger has a time of arrival at the station, a time of first train, and specifications of intermediate stations; this information forms the basis for solving each passenger's mobility. The routes and times for trains are derived from the GTFS information. Further details of these ArcGIS models are included in the CD accompanying the thesis.

[ArcGIS Model Builder diagram: iterative solution of each passenger's mobility on the rail network]

Appendix D Bayesian model for imputation and mobility analysis

In Appendix D, the Bayesian models developed in the WinBUGS, OpenBUGS, and JAGS environments are specified in more detail. Included are the specifications of the data likelihood functions, the system parameter priors, and the initial conditions. The Bayesian models also include a specification of the missing data models in the sections where these were developed. In the JAGS implementation, separate model files are created to keep the procedural R statistical modelling platform distinct from the declarative BUGS-language specification of the Bayesian models. The main model is included in Section D.1, the missing data imputation models in Section D.2, the specification of the priors in Section D.3, the initial conditions in Section D.4, and the R-script code that wraps all the components of the Bayesian model in Section D.5. The detailed R-scripts developed to implement the Bayesian modelling are included in the CD accompanying the thesis.

D.1 Main model

The main mobility model has a negative binomial data likelihood, and the continuous parameters are centred. The spatial interaction of mobility is accounted for by the inclusion of categorical origin and destination effects. Non-linearity is introduced into the model by the inclusion of a (log-)distance decay term representing the propensity of passengers to resist longer travel distances. The missing journey factors and ticket validity periods are treated as hierarchical phenomena. A simplified sketch follows.
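The JAGS model string below is a simplified sketch of this likelihood structure, not the full thesis specification: a negative binomial outcome parameterised through its mean, categorical origin and destination effects, and a log-distance decay term. All variable names (y, orig, dest, dist, and the dimensions N, N_orig, N_dest) are illustrative.

main_model <- "
model {
  for (i in 1:N) {
    y[i] ~ dnegbin(p[i], r)                 # negative binomial likelihood
    p[i] <- r / (r + mu[i])                 # parameterised via the mean mu[i]
    log(mu[i]) <- b0 + b_orig[orig[i]] + b_dest[dest[i]] + b_dist * log(dist[i])
  }
  for (j in 1:N_orig) { b_orig[j] ~ dnorm(0, tau_o) }   # categorical origin effects
  for (k in 1:N_dest) { b_dest[k] ~ dnorm(0, tau_d) }   # categorical destination effects
  b0 ~ dnorm(0, 0.001)
  b_dist ~ dnorm(0, 0.001)                  # (log-)distance decay coefficient
  tau_o ~ dgamma(0.01, 0.01)
  tau_d ~ dgamma(0.01, 0.01)
  r ~ dgamma(0.01, 0.01)                    # overdispersion parameter
}
"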

D.2 Missing data imputation model

In the Bayesian framework, the missing data are treated as random variables, and additional covariate models are built using the partially observed variables (those with missing data values) as the outcome variables. The outcome variable of the main data model is not included within these additional (imputation) models. Further, once a variable has been used as an outcome in one of the imputation models, it is no longer included in the specification of any further imputation models. The order in which the imputation models are built is not important, as the BUGS script is declarative rather than procedural. In addition, as the variables are assumed to be statistically exchangeable, the order in which they are listed is unimportant.
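The fragment below illustrates the pattern in its simplest form: a hypothetical partially observed categorical covariate (income) is given its own likelihood, so that JAGS samples its NA entries as additional stochastic nodes. The thesis imputation models additionally condition on the other covariates; this fragment omits that for brevity.

imputation_fragment <- "
  for (i in 1:N) {
    income[i] ~ dcat(p_income[1:K])   # NA entries become stochastic nodes
  }
  p_income[1:K] ~ ddirch(alpha[1:K])  # Dirichlet prior; alpha supplied as data
"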

D.3 Specification of priors

A likelihood function is specified for each of the outcome variables, and sets of parameter priors are specified for the explanatory variables. Recall that the explanatory variables can themselves be used as outcome variables in the imputation models. The choice of a particular prior is typically made through a sensitivity analysis, in which various candidate prior distributions are explored to establish which one yields a better model in terms of successful execution and efficient convergence. Note that the prior distributions themselves have parameters; on occasion these parameters are in turn given distributions (hyperpriors), and a further sensitivity analysis is conducted to enable a choice of these.
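One way such a sensitivity exercise can be run is sketched below: the same model is refitted under alternative prior blocks and the fits compared. Here model_string_a and model_string_b are assumed variants of the D.1 model differing only in their priors, and jags_data is the prepared data list; DIC is used as one convenient comparison, though the thesis criterion also includes convergence behaviour.

library(rjags)

fit_a <- jags.model(textConnection(model_string_a), data = jags_data, n.chains = 3)
fit_b <- jags.model(textConnection(model_string_b), data = jags_data, n.chains = 3)
update(fit_a, 5000)                  # burn-in under prior set A
update(fit_b, 5000)                  # burn-in under prior set B
dic.samples(fit_a, n.iter = 5000)    # penalised deviance, prior set A
dic.samples(fit_b, n.iter = 5000)    # penalised deviance, prior set B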

D.4 Initial conditions of the models

This Appendix D.4 includes the list of initial conditions specified for the model system parameters, as well as the initial values for the first elements of the categorical variables. A range of initial values is typically tried out prior to establishing which ones facilitate efficient convergence of the particular Bayesian model.
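An illustrative initial-value generator follows, matching the parameter names of the D.1 sketch rather than the thesis variable names: one list per chain, with dispersed starting points and fixed RNG seeds so that runs are reproducible.

make_inits <- function(chain) {
  list(b0     = rnorm(1, 0, 1),
       b_dist = rnorm(1, 0, 0.1),
       r      = runif(1, 0.5, 2),
       .RNG.name = "base::Mersenne-Twister",   # reproducible, per-chain RNG
       .RNG.seed = 1000 + chain)
}
inits <- lapply(1:3, make_inits)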

D.5 R-script code that wraps the Bayesian models

The R-script code that wraps all the components of the Bayesian model is included here. The wrapper manages the data pre-processing, and also contains the priors, the initial conditions specified for the system parameters, the initial values for the first elements of categorical variables, and the variables within the main and missing data models.
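A minimal wrapper in the spirit of D.5 is sketched below, reusing main_model and inits from the D.1 and D.4 sketches; the data object names (ticket_counts, origin_id, dest_id, distance_km) are assumptions for illustration.

library(rjags)
library(coda)

# Prepare the data list expected by the model string
jags_data <- list(y = ticket_counts, orig = origin_id, dest = dest_id,
                  dist = distance_km, N = length(ticket_counts),
                  N_orig = max(origin_id), N_dest = max(dest_id))

# Compile, burn in, sample, and check convergence
m <- jags.model(textConnection(main_model), data = jags_data,
                inits = inits, n.chains = 3)
update(m, 10000)                                           # burn-in
post <- coda.samples(m, c("b0", "b_dist", "r"),
                     n.iter = 20000, thin = 10)
gelman.diag(post)                                          # convergence diagnostic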

D.6 Code for the ‘dsum’ construct implemented within JAGS

The 'dsum' construct is implemented within the JAGS environment (Kruschke, 2014; Plummer, 2003), and enables constraints to be applied to the augmented flows to group stations on the railway network. The 'dsum' distribution is a particular feature of the JAGS software for Bayesian modelling.
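A minimal sketch of the construct follows, reduced to a group station with two (hypothetical) member stations: the individual flows are latent Poisson counts, and the observed aggregate ticketed flow constrains their sum.

dsum_sketch <- "
model {
  flow[1] ~ dpois(lambda[1])       # latent flow to member station 1
  flow[2] ~ dpois(lambda[2])       # latent flow to member station 2
  total ~ dsum(flow[1], flow[2])   # observed aggregate constrains the sum
  lambda[1] ~ dgamma(1, 0.01)
  lambda[2] ~ dgamma(1, 0.01)
}
"
# Note: JAGS requires initial values for flow[] that already satisfy the
# observed total, otherwise the sampler cannot be initialised.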

D.7 Variables considered for the imputation models

Outcome (O), explanatory/predictor (E),      Model 1:  Model 2:  Model 4:  Model 5:
or exposure variable                         flows     journey   ticket    group
                                                       factor    validity  stn. ind.

Passenger count (Freq)                          Y
Origin - station entry                          Y         Y         Y         Y
Destination - station exit                      Y         Y         Y         Y
Origin - house/other Postcode                   Y         Y         Y         Y
Destination - work/other Postcode               Y         Y         Y         Y
Distance (journey, access, or egress)           Y         Y         Y         Y
Cost (monetary, or time)                        Y         Y
Purpose                                         Y         Y         Y         Y
Gender                                          Y         Y         Y         Y
Age                                             Y         Y         Y         Y
Income                                          Y         Y         Y         Y
Household type/children/cars                    Y         Y         Y         Y
Ethnicity                                       Y         Y         Y         Y
Access mode to railways                         Y         Y                   Y
Egress mode from railways                       Y         Y                   Y
Daily rates of ticket use - journey factor      Y         Y
Ticket validity period (days/months)            Y         Y         Y
Group station flow/indicator                    Y         Y         Y         Y
Number of stops                                 Y         Y         Y         Y
Travel time (gross/actual)                      Y         Y         Y         Y
Stn. arrival/actual departure time              Y         Y         Y         Y
Station access/egress distance                  Y         Y         Y         Y

Note: Model 3 has been excluded because it was not adopted to produce the results presented. Model 6 is excluded as it is simply the 'dsum' construct included in the CD accompanying the thesis.

Appendix E CAR-BYM and Geospatial Kriging models

E.1 Summary of CAR-BYM rail-heading model

The CAR-BYM model created in this research and implemented in the OpenBUGS software (Spiegelhalter et al., 2007; Sturtz et al., 2010) is included here to enable a reproduction of the results reported in this thesis. The specifications of the priors and initial values are also included. The JAGS implementation of the CAR-BYM model consists of two parts: the R-script wrapper, which contains the data pre-processing stages as well as the initial conditions for the parameter values and the unobserved data values, and the model specification itself.

E.1.1 Model of likelihood functions and priors

The likelihood functions, the covariates, and the prior specifications are included here; a minimal sketch follows.
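The block below is a minimal BYM (Besag-York-Mollié) sketch in the BUGS language, written as an R function for R2OpenBUGS's write.model(). It assumes Poisson counts y[i] with expected values E[i] and an adjacency structure (adj, weights, num) prepared with the usual GeoBUGS tools; it is illustrative, not the full thesis specification.

library(R2OpenBUGS)

bym_model <- function() {
  for (i in 1:N) {
    y[i] ~ dpois(mu[i])
    log(mu[i]) <- log(E[i]) + alpha + S[i] + U[i]
    U[i] ~ dnorm(0, tau.u)                              # unstructured heterogeneity
  }
  S[1:N] ~ car.normal(adj[], weights[], num[], tau.s)   # spatially structured CAR term
  alpha ~ dflat()
  tau.u ~ dgamma(0.5, 0.0005)
  tau.s ~ dgamma(0.5, 0.0005)
}
write.model(bym_model, "bym_model.txt")   # model file passed to bugs()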

E.1.2 R-script including initial conditions

The R-software script developed to implement the Bayesian CAR-BYM model is included here; it includes the specification of the initial conditions and the information needed to run the model.

E.2 Summary of Geospatial Kriging model

The specification of the likelihood functions, the covariates, and the priors for the geostatistical kriging model used to investigate the impact of a new station at Kirkstall Forge are included here.

E.2.1 Model of likelihood functions and priors

The likelihood functions, the covariates, and the prior specifications are included here for the disaggregate kriging model; a minimal sketch follows.
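The block below is a minimal Bayesian kriging sketch using the GeoBUGS spatial.exp Gaussian-process prior (available in WinBUGS/OpenBUGS). The names obs, xcoord, and ycoord are illustrative, and a constant mean (ordinary kriging) is assumed; the thesis models on the CD are the definitive specifications.

kriging_model <- "
model {
  for (i in 1:N) {
    obs[i] ~ dnorm(S[i], tau.e)      # nugget / measurement error
    mu[i] <- beta0                   # constant mean (ordinary kriging)
  }
  # Powered-exponential spatial covariance: phi is the decay rate and
  # kappa the power (kappa = 1 gives the exponential variogram)
  S[1:N] ~ spatial.exp(mu[], xcoord[], ycoord[], tau.s, phi, kappa)
  beta0 ~ dflat()
  tau.e ~ dgamma(0.5, 0.0005)
  tau.s ~ dgamma(0.5, 0.0005)
  phi ~ dunif(0.01, 10)
  kappa <- 1
}
"
writeLines(kriging_model, "kriging_model.txt")   # model file passed to bugs()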

E.2.2 R-script including initial conditions

The R-software scripts developed to implement the kriging (standard and hierarchical) models are included here; they include the specification of the initial conditions and the attributes required to run the models.