Copyright ©Alexander S Lee 2019
Combined Model Approach to the Problem of Ranking

Item Type: text; Electronic Dissertation
Authors: Lee, Alexander S.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date: 28/09/2021 08:08:32
Link to Item: http://hdl.handle.net/10150/631941

COMBINED MODEL APPROACH TO THE PROBLEM OF RANKING

by
Alexander S. Lee

A Dissertation Submitted to the Faculty of the
DEPARTMENT OF SYSTEMS AND INDUSTRIAL ENGINEERING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2019

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that an accurate acknowledgement of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.

SIGNED: Alexander S. Lee

ACKNOWLEDGEMENTS

Over the years since my first year at the University of Arizona, I met many mentors, colleagues, and friends. I am extremely thankful for the experiences that I gained for both my professional and personal growth. I will cherish all the great memories and friendships formed during my time at the University of Arizona. I am very grateful to Dr. Wei-Hua Lin, Dr. Young-Jun Son, Dr. Ricardo Valerdi, and Dr. Joseph Valacich for serving on my doctoral committee.
I especially want to thank my advisor, Dr. Wei-Hua Lin, for his academic guidance, encouragement, and patience. His research advice over the years helped shape me into a better researcher. In addition, I want to thank Dr. Larry Head for his advice on careers and networking in conferences. I also want to thank the University of Arizona Department of Systems & Industrial Engineering faculty and staff. I especially want to thank Linda Cramer and Mia Schnaible for their guidance with the paperwork and sending important emails and reminders on careers, seminars, and upcoming events. Their help definitely made my life a lot easier! I extend my thanks to all former and current University of Arizona Systems & Industrial Engineering students I have met over the years. Personally, I want to thank Dong Xu, Chao Meng, Sojung Kim, Byung-Ho Beak, Matthew Dabkowski, Tommy Ryan, and Danny Thebeau for their friendship and guidance. I feel very lucky to have met them and call them my friends. Most importantly, I want to thank my family, especially my parents and my brother, for their continuous love and support. None of the things that I have accomplished would have been possible without them.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. INTRODUCTION
  1.1. Main Research Goal
  1.2. Objectives
    1.2.1. Qualitative and Quantitative Elements of Combined Models
    1.2.2. Quantification of Similarity for Road Segment Clustering
    1.2.3. Road Hotspot Identification
    1.2.4. Unsupervised Learning Hybrid Model
    1.2.5. Traffic Congestion Measurement
    1.2.6. Combined Model based on Machine Learned Ranking Algorithms for Voting Bias Detection
    1.2.7. Human-Machine Symbiosis
  1.3. Organization of the Remainder of the Dissertation
2. BACKGROUND AND LITERATURE REVIEW
  2.1. Road Safety and Hotspot Identification
  2.2. Traffic Congestion Measurement
    2.2.1. Specific Aspects of Traffic Congestion
    2.2.2. Hybrid Model
    2.2.3. Principal Component Analysis and Ranking
  2.3. Machine Learning Algorithms
    2.3.1. Voting Bias
    2.3.2. Combined Model
  2.4. Literature Summary
3. PROPOSED METHODOLOGIES
  3.1. Enhanced Empirical Bayesian Method
    3.1.1. Similarity Measure
      3.1.1.1. Road Segment Similarity Measure
      3.1.1.2. Proportion Discordance Ratio (PDR)
    3.1.2. Empirical Bayesian Method
    3.1.3. Enhancement to the Empirical Bayesian Method
  3.2. Unsupervised Learning Hybrid Model
    3.2.1. Normalized Scoring Method (NSM)
    3.2.2. Principal Component Analysis (PCA)
    3.2.3. Proportion Discordance Ratio (PDR) Similarity Matrix
  3.3. Machine Learned Ranking Combined Model - Supervised Learning Approach
    3.3.1. Inverse Mean-Squared Error (MSE) Weighted Sum
      3.3.1.1. Support Vector Machines (SVM)
      3.3.1.2. Neural Networks (NN)
  3.4. Methodology Summary
4. ENHANCED EMPIRICAL BAYESIAN METHOD – CASE STUDY IN PHOENIX, ARIZONA
  4.1. Background
  4.2. Description of the Sites and Data used in the Arizona Case Study
  4.3. Results and Discussion
    4.3.1. Overall Road Hotspot Identification
    4.3.2. Road Hotspot Identification for Different Timeframes
      4.3.2.1. Seasons
      4.3.2.2. Days of the Week
      4.3.2.3. Times of the Day
  4.4. Chapter Summary
5. HYBRID MODEL RANKING FOR TRAFFIC CONGESTION ASSESSMENT OF METROPOLITAN AREAS
  5.1. Background
  5.2. Dataset Description
  5.3. Results and Discussion
    5.3.1. Normalized Scoring Method (NSM)
    5.3.2. Principal Component Analysis (PCA)
    5.3.3. Proportion Discordance Ratio (PDR) Similarity Matrix
  5.4. Chapter Summary
6. BASEBALL HALL OF FAME MACHINE LEARNING COMBINED MODEL
  6.1. Background
  6.2. Dataset Description
  6.3. Results and Discussion
    6.3.1. Outfielders
    6.3.2. Infielders
    6.3.3. Starting Pitchers
    6.3.4. Overall Summary of Results
  6.4. Chapter Summary
7. SUMMARY AND CONCLUSIONS
  7.1. Research Summary
    7.1.1. Contributions to Combined Model for Addressing a Class of Problems Pertaining to Extreme Values and Rare Events
    7.1.2. Contributions to Exploring and Clarifying Similarity for Road Segment Clustering
    7.1.3. Contributions to Unsupervised Learning Hybrid Model for Traffic Congestion Measurement and Ranking
    7.1.4. Contributions to Exploring the Weighted Sum of Classification Probabilities using Machine Learned Ranking Algorithms
  7.2. Firsts in the Research
  7.3. Future Research Directions
RESEARCH FUNDING SOURCES
REFERENCES

LIST OF FIGURES

Figure 4.1: Initial Cluster Centroids of 55 Road Segments in Arizona State Route 101
Figure 4.2: 2014 4th Quarter Predictions vs. Actual Number of Crashes in Arizona State Route 101
Figure 4.3: Arizona State Route 101 Ratios vs. Enhanced EB Predictions
Figure 4.4: Arizona State Route 101 Seasonal Heat Map
Figure 4.5: Arizona State Route 101 Daily Heat Map
Figure 4.6: Arizona State Route 101 Hourly Heat Map
Figure 5.1: Metropolitan Area Normalized Scoring Method (NSM) Scores and Clusters
Figure 5.2: Metropolitan Area Normalized Scoring Method (NSM) Scores Ordered Based on TomTom’s Ranks
Figure 5.3: Principal Component Analysis Graph
Figure 5.4: First Principal Component Scores Ordered Based on TomTom’s Ranks
Figure 5.5: Proportion Discordance Ratio (PDR) Similarity Matrix
Figure 6.1: Top 30 Outfielders – Hall of Fame (HOF) Probability 95% Confidence Intervals
Figure 6.2: Top 30 Infielders – Hall of Fame (HOF) Probability 95% Confidence Intervals
Figure 6.3: Top 30 Starting Pitchers – Hall of Fame (HOF) Probability 95% Confidence Intervals
Figure 6.4: ROC Curve

LIST OF TABLES

Table