Bus Ridership Prediction Using Machine Learning

BUS RIDERSHIP PREDICTION USING MACHINE LEARNING INTEGRATED WITH GEOGRAPHIC INFORMATION SYSTEM

by

Pengyu Li

A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Civil Engineering

Spring 2020

© 2020 Pengyu Li
All Rights Reserved

Approved: __________________________________________________________
Sue McNeil, Ph.D.
Professor in charge of thesis on behalf of the Advisory Committee

Approved: __________________________________________________________
Sue McNeil, Ph.D.
Chair of the Department of Civil and Environmental Engineering

Approved: __________________________________________________________
Levi T. Thompson, Ph.D.
Dean of the College of Engineering

Approved: __________________________________________________________
Douglas J. Doren, Ph.D.
Interim Vice Provost for Graduate and Professional Education and Dean of the Graduate College

ACKNOWLEDGMENTS

I would like to express my deepest appreciation to my advisor, Professor Sue McNeil, for the countless hours of mentorship and advice throughout my undergraduate and graduate studies. Her continuous help and encouragement constantly motivate me to pursue the best outcomes in my academic career and personal life. I would also like to thank her for the trust she placed in me when she decided to extend our working relationship into my graduate study. Without her guidance and persistent suggestions, this thesis would not have been possible.

I would like to thank Professor Earl “Rusty” Lee, who introduced me to the world of geographic information systems (GIS). I worked for Dr. Lee during my junior and senior years and learned a great deal about accessibility in transportation planning and traffic modeling.
Without the basic knowledge of census data and GIS that I gained outside of class, I would not have been able to conduct the data geoprocessing in this study, or even to come up with the idea for this thesis.

I would like to thank Professor Nii Attoh-Okine and his student Dr. Ahmed Lasisi. My friend and colleague Ahmed helped me learn the basics of machine learning, which allowed me to further explore machine learning algorithms and apply them to this study.

To Cathy and David at DART First State, thank you for providing me with the important data for this study, and thank you for walking me through the planning operations of the bus agency and answering my numerous questions along the way.

Lastly, thank you to my parents, who had no obligation to finance my expensive overseas study but supported me as always. Thank you for your endless love and encouragement, and for always respecting my choices.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION
    1.1 Problem Statement
    1.2 Research Questions
    1.3 Motivation and Research Objective
    1.4 Proposed Methodology
        1.4.1 Data Collection and Pre-processing
        1.4.2 Hypothetical Bus Stop Creation
        1.4.3 Spatial Data Analysis
        1.4.4 Visualization and Recommendation
    1.5 Outline

2 LITERATURE REVIEW
    2.1 Introduction
    2.2 Need for New Travel Demand Models
    2.3 Bus Stop/ Bus Route Design and Optimization
    2.4 Bus Route/ Bus Stop Designs in GIS Applications
    2.5 Machine Learning Applications to Bus Transit
    2.6 Influential Variables
    2.7 Summary

3 DATA SET DESCRIPTION
    3.1 Defining the Study Area
    3.2 Brief Review of DART’s Bus Network
    3.3 Description of DART’s Ridership Data
    3.4 Bus Stop Visualization and Geoprocessing in ArcGIS
        3.4.1 Mapping the Existing Bus Stops
        3.4.2 Creating the Hypothetical Bus Stops
    3.5 Description of Demographic Data at the Census Block Group Level
    3.6 Land Use Data
    3.7 Employment Data
    3.8 Geoprocessing and Spatial Analysis
    3.9 Data Assumptions and Limitations

4 METHODOLOGY
    4.1 Introduction
    4.2 Machine Learning
        4.2.1 Supervised Learning
        4.2.2 Machine Learning Algorithms
    4.3 K-Fold Cross-validation
    4.4 Prediction Outputs
    4.5 Additional Data Processing

5 RESULTS AND DISCUSSIONS
    5.1 Introduction
    5.2 Model Outputs and Performance
        5.2.1 Summary Statistics for Alternative Algorithms
        5.2.2 The Best Performing Model - Lightgbm
    5.3 Mapping the Ridership
    5.4 Feature Importance

6 CONCLUDING REMARKS
    6.1 Conclusions
    6.2 Contributions to the Transportation Planning Field
    6.3 Future work

REFERENCES

LIST OF TABLES

Table 1. Top Ten Bus Stops by Daily Ridership
Table 2. ACS Dataset and Selected Attributes
Table 3. Land Use Types
Table 4. Statistical Description of Prediction Outputs vs. Historical Ridership (On)
Table 5. Statistical Description of Prediction Outputs vs. Historical Ridership (Off)
Table 6. Statistical Description of Prediction Outputs vs. Historical Ridership (Total)
Table 7. Top Ten Bus Stops by Predicted Total Daily Ridership

LIST OF FIGURES

Figure 1: DART Ridership vs. Expense
Figure 2: DART Bus Stops in Delaware
Figure 3: DART Bus Stops in Wilmington-Newark Area
Figure 4: Relative Frequency of “On” Ridership at Bus Stops (Passengers/Day)
