Master Thesis, University of Waterloo, 2013
Total Page:16
File Type:pdf, Size:1020Kb
A Framework for Traffic Collision Prediction Using Historical Accident Information and Real-Time Sensor Data: A Case Study for the City of Ottawa by Enrique A. Reverón Yanes A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering Ottawa-Carleton Institute for Electrical and Computer Engineering Department of Systems and Computer Engineering Carleton University Ottawa, Ontario July 2019 © 2019, Enrique A. Reverón Yanes Abstract According to recent studies, beyond being a major worldwide problem with huge economic impact, traffic collisions are poised to become as well one of the most important leading causes of death. Proactive traffic enforcement and intervention should be based on a thorough analysis on the collision data available to identify leading causes of accidents, the most prone locations as well as to predict the conditions for collision occurrence. This thesis presents a novel framework for collision prediction that takes into consideration historical and real-time factors, such as weather, geospatial information and social event data that can be obtained with existing sensor technology. A prototype is proposed, implemented and evaluated for the city of Ottawa, Canada, to predict: (1) accident frequency (collision vs no- collisions) and (2) accident severity (in terms of fatal, injury and property damage only accidents). The best performance was achieved in both cases using gradient boosted trees. ii Acknowledgements I would like to express my gratitude to my family in Canada, Perla and Leonardo, for their full support and encouragement during these two years that I have spent studying in Canada. It has been a very difficult journey for us, I will be grateful forever for your support and love. I would like to thank to my family outside Canada; my father Enrique, my sisters Mariela and Miriam, my aunt Maria Julieta, my cousins Maria Julieta and Jose Manuel, my step-daughter Ivana; for their motivation and support. This achievement is dedicated specially to my mother, Aida; I have a special debt with her, I would have liked it so much that you celebrated this with me in person. Finally, I would like to thank Dr. Ana-Maria Cretu for her guidance, motivation and encouragement through my study and research. iii Table of Contents Abstract .............................................................................................................................................................. ii Acknowledgements..................................................................................................................................... iii List of Tables ................................................................................................................................................ viii List of Figures .................................................................................................................................................. x List of Algorithms ...................................................................................................................................... xix List of Appendices ..................................................................................................................................... xxi List of Acronyms ........................................................................................................................................ xxii 1. Introduction ........................................................................................................................................... 1 1.1. Research Problem Justification ......................................................................................................... 1 1.2. Thesis Objectives...................................................................................................................................... 2 1.3. Thesis Organization ................................................................................................................................ 3 2. Literature Review ................................................................................................................................ 4 2.1. Accident Factors ....................................................................................................................................... 4 2.2. Accident Trends ..................................................................................................................................... 10 2.3. Methods for Accident Modeling and Prediction..................................................................... 12 2.3.1. Neural Networks ........................................................................................................................ 13 2.3.2. Fuzzy Techniques ...................................................................................................................... 15 2.3.3. Decision Trees ............................................................................................................................. 16 2.3.4. Regression ..................................................................................................................................... 18 2.3.5. Bayesian Networks.................................................................................................................... 19 2.3.6. Association Rules ....................................................................................................................... 20 2.3.7. Other Techniques for Traffic Accident Modeling and Prediction ....................... 20 2.3.8. Variable/Attribute Selection ................................................................................................ 22 iv 2.4. Summary of Literature ....................................................................................................................... 23 3. Methodology ....................................................................................................................................... 24 3.1.1. Business Understanding ......................................................................................................... 25 3.1.2. Data Understanding .................................................................................................................. 26 3.1.3. Data Preparation ........................................................................................................................ 27 3.1.4. Modeling ......................................................................................................................................... 27 3.1.5. Evaluation ...................................................................................................................................... 28 3.1.6. Deployment................................................................................................................................... 28 4. Proposed Framework for Accident Prediction ................................................................... 30 4.1. Identification and Selection of Important Features (Collision Framework Risk Factors) ...................................................................................................................................................................... 30 4.2. Definition of the Collision Framework Process ...................................................................... 32 5. Prototype Implementation: Business/Data understanding and Preparation ..... 35 5.1. Business/Data understanding ........................................................................................................ 35 5.1.1. Determine Business Objectives ........................................................................................... 35 5.1.2. Inventory of Available Resources ...................................................................................... 35 5.1.3. Data Description ......................................................................................................................... 39 5.1.4. Data Exploration ......................................................................................................................... 47 5.2. Data Quality Verification ................................................................................................................... 62 5.3. Data preparation ................................................................................................................................... 66 5.3.1. Data Selection and Cleaning .................................................................................................. 66 5.3.2. Data Construction ...................................................................................................................... 68 5.4. Data Integration ..................................................................................................................................... 92 5.5. Data formatting ...................................................................................................................................... 96 6. Prototype Implementation: Modeling, Model Comparison and Evaluation ......... 98 6.1. Metrics to Evaluate the Performance of a Binary Classifier ............................................. 99 v 6.1.1. Confusion Matrix .......................................................................................................................