Leveraging Twitter Data to Support Transit Planning and Operations
by Omar Kabbani

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Department of Civil and Mineral Engineering
University of Toronto

© Copyright by Omar Kabbani 2020

Abstract

Twitter provides an unfiltered and timestamped feed of information that can be aggregated to generate valuable insights. This research creates a framework for processing a public Twitter feed to generate insights on rider satisfaction and to identify passenger-related transit incidents. Detecting these incidents in real time enables transit agencies to immediately respond to them by dispatching security, safety, or maintenance crews, and in the context of the current COVID-19 pandemic, to provide targeted cleaning measures to combat the spread of the virus. Using natural language processing, we identify eyewitness tweets about transit and then extract latent information from the tweets such as location details, sentiments, and topics. This enables agencies to respond to an incident faster and to identify spatial and temporal patterns for incidents and interests throughout the network.

Acknowledgments

I would like to thank my family and friends, whose names would span pages if I were to mention them one by one. Additionally, I would like to thank my research supervisors, Professors Amer Shalaby, Tamer El-Diraby, and Willem Klumpenhouwer, who believed in my abilities and presented me with this opportunity. This work would not have been possible without their constant guidance and supervision. A big thank you to the vibrant community of Twitter users whose tweets made this work possible.
Lastly, I would like to thank Calgary Transit, S&P North America, the Natural Sciences and Engineering Research Council, and the Canadian Urban Transit Research and Innovation Consortium for funding this project.

Table of Contents

Chapter 1: Introduction
  1.1 Background
  1.2 Motivation
  1.3 Aims and Objectives
  1.4 Research Framework
  1.5 Limitations
  1.6 Case Study
  1.7 Thesis Organization
Chapter 2: Literature Review
  2.1 Social Media Analytics
  2.2 Social Media Analytics – Implementation in Transit
  2.3 Twitter Data
Chapter 3: Methodology
  3.1 Data Collection
    3.1.1 Transit Agency Operations
    3.1.2 Business Requirements
    3.1.3 Twitter Data
  3.2 Develop Social Media Analytics Tool
    3.2.1 Input
    3.2.2 Data Extraction and Filtering
    3.2.3 Data Clustering
    3.2.4 Data Filtering
    3.2.5 Social Network Setup and Analysis
    3.2.6 Output
  3.3 Develop a Detector for Passenger-related Incidents
    3.3.1 Transit Tweet Detector
    3.3.2 In-the-moment/Eyewitness Tweet Detector
Chapter 4: Results and Discussion
  4.1 Results
    4.1.1 Global Tweets Containing the Word “bus”
    4.1.2 Tweets from Calgary
    4.1.3 Tweets Mentioning @calgarytransit
  4.2 Discussion
    4.2.1 Assigning Generic Transit Tweets to Agencies
    4.2.2 Categorical, Temporal, and Sentiment Analysis of Tweets Mentioning @calgarytransit
    4.2.3 Validation
    4.2.4 Extraction of Latent Location Information
    4.2.5 Disruption Days
Chapter 5: Conclusion
  5.1 Summary
  5.2 Implications for Public Transit Operations
  5.3 Future Work
Appendix
References

List of Tables

Table 2.1: Properties of Twitter's API
Table 3.1: Terms for the transit detector bag of words
Table 3.2: Bigrams for the transit detector bag of words
Table 3.3: Terms associated with personal incidents on transit
Table 3.4: Features for detecting in-the-moment/eyewitness tweets
Table 4.1: Critical tweets about disruptive riders from the dataset of global tweets
Table 4.2: Critical tweets about alcohol/drug use from the dataset of global tweets
Table 4.3: Critical tweets about crowding and COVID-19 from the dataset of global tweets
Table 4.4: In-the-moment/eyewitness transit tweets from Calgary
Table 4.5: In-the-moment/eyewitness transit tweets that mention @calgarytransit
Table 4.6: Inferred transit agencies for the tweets from Table 6
Table 4.7: Sentiment analysis of the 69 in-the-moment tweets that mention @calgarytransit
Table 4.8: Complaint tweets with neutral sentiments