Fakultät Bauingenieurwesen, Institut für Bauinformatik

THESIS

A Deep Learning Approach to Big Data: An Application to Traffic Prediction

Submitted by: Falk Hügle (3254131)

Advisors: Prof. Dr.-Ing. Raimar Scherer, Dipl.-Ing. Ngoc Trung Luu

Declaration of Authorship (Selbstständigkeitserklärung)

I hereby declare that I have written this thesis without unauthorized assistance from third parties and without the use of aids other than those specified; all ideas taken directly or indirectly from external sources are identified as such. This thesis has not previously been submitted, in identical or similar form, to any examination authority in Germany or abroad, nor has it been published.

New York, 19.05.2017

......
Falk Hügle

Contents

List of Figures
List of Tables
List of Algorithms
1 Introduction
  1.1 Subject and Goals
  1.2 Structure
2 Academic and Historical Context
  2.1 Academic Context
  2.2 Historical Context
3 Theory
  3.1 Machine Learning
    3.1.1 Basics
    3.1.2 Learning Paradigms
    3.1.3 Data in Machine Learning
    3.1.4 Statistical Learning Theory
    3.1.5 Loss and Cost Functions
    3.1.6 Types of Problems
    3.1.7 Types of Models
    3.1.8 Hyperparameter Optimization
    3.1.9 Assessing Performance
  3.2 Artificial Neural Networks
    3.2.1 Basics
    3.2.2 Artificial Neurons
    3.2.3 Types of Architectures
    3.2.4 Learning Algorithms
    3.2.5 Improving Generalization
  3.3 Deep Learning
    3.3.1 Theoretical Justification
    3.3.2 Challenges in Training Deep Neural Networks
    3.3.3 Solutions to Challenges
    3.3.4 New Developments
  3.4 Big Data
4 Application to Traffic Prediction
  4.1 Problem Description
  4.2 Related Research
  4.3 Traffic Research Basics
  4.4 Data and Data Pre-Processing
  4.5 Model
  4.6 Implementation
  4.7 Training
  4.8 Model Evaluation
  4.9 Possible Improvements and Future Research
  4.10 Other Applications in Civil Engineering
  4.11 Conclusion
References
Acronyms

List of Figures

3.1 Loss Functions - Regression
3.2 Loss Functions - Classification
3.3 Linear Regression example
3.4 Classification example
3.5 Density Estimation example
3.6 Biological Neuron
3.7 Artificial Neuron
3.8 Activation Functions
3.9 Perceptron and Multilayer Perceptron
3.10 Autoencoder
3.11 Fully Connected and Stacked Elman Network
3.12 Boltzmann Machine and Restricted Boltzmann Machine
3.13 Deep Boltzmann Machine
3.14 Momentum and Nesterov Momentum update rule
3.15 Recurrent Neural Network unrolled in time
3.16 Underfitting and Overfitting
3.17 Comparison Deep Learning and conventional models
3.18 Rectified Linear Activation Function
3.19 Maxout Unit
3.20 Long Short-Term Memory Unit
3.21 Unsupervised Pre-Training effectiveness

4.1 Fundamental Diagram of Traffic Flow
4.2 Traffic Plot: Occupancy vs. Flow
4.3 Traffic Plot: Speed vs. Flow
4.4 Traffic Plot: Occupancy vs. Speed
4.5 Traffic and Weather Station locations
4.6 Incident Data format
4.7 Model Architecture
4.8 Correlation Significance Matrix
4.9 Correlation Matrix
4.10 Training and Validation Error
4.11 Trained Weights - Convolutional Layers
4.12 Trained Weights - LSTM Layer
4.13 Mixture Components Probabilities

List of Tables

3.1 Performance Metrics Classification
3.2 Model Comparison 1
3.3 Model Comparison 2

4.1 Hyperparameters - Model
4.2 Traffic Regimes
4.3 Hyperparameters - Learning Algorithm
4.4 Hyperparameters - Regularization
4.5 Result MAE and MSE
4.6 Result Accuracy, Precision and Recall
4.7 Model Comparison LSTM vs. Trivial - Paired T-Test
4.8 Model Comparison LSTM vs. Trivial - McNemar Test
4.9 Model Comparison LSTM vs. RNN - Paired T-Test
4.10 Model Comparison LSTM vs. RNN - McNemar Test

List of Algorithms

1 Perceptron Learning Algorithm
2 Batch Gradient Descent
3 Mini-Batch Stochastic Gradient Descent
4 Backpropagation
5 Backpropagation Through Time
6 Contrastive Divergence

Chapter 1

Introduction

1.1 Subject and Goals

The primary goals of this thesis are to give an overview of the field of Deep Learning (DL), i.e. Machine Learning (ML) with deep Artificial Neural Networks (ANNs), and to demonstrate an application of DL to a particular Big Data (BD) problem. Specifically, a broad background introduction to the theory of ML and ANNs is provided. A comprehensive exploration of the field of DL then elucidates, from a theoretical perspective, why DL works, what its advantages are over shallow ANNs, which problems occur in training Deep Neural Networks (DNNs), and which recent theoretical advances have helped overcome these challenges. Furthermore, the subject of BD is explored in some detail. In this context, it is shown how, in light of ever larger and more ubiquitous Data Sets, DL presents itself as an excellent approach to addressing BD problems. In order to further substantiate the validity of this data-driven method, a practical application of DL to Traffic Modeling (TM) is discussed in detail. This use case exemplifies a scenario in which learning complex patterns directly from data is advantageous compared to explicitly modeling the relationships between a system's constituent parts. This research project encompasses the acquisition and Pre-Processing of a massive Data Set, model design and its implementation using state-of-the-art DL libraries, model training leveraging cloud-accessible supercomputing resources, as well as the generation of predictions on test data and their statistical analysis.

1.2 Structure

Chapter 2 lays out the academic context of DL, including its definition and relationship to other fields. Furthermore, the history of DL is explored in the broader context of the history of Artificial Intelligence (AI) and, in particular, ANN research. Chapter 3 focuses on theory. The subject requires a thorough understanding of ML and ANNs, which are each discussed in dedicated subchapters. These two subchapters are to be understood as a reference for the rest of the thesis and can be skipped if the reader has sufficient background knowledge. The third subchapter provides a detailed view of DL, in particular its theoretical justification, the challenges that exist in training DNNs, and the solutions available to overcome these challenges. Furthermore, promising new developments in the field are outlined. The fourth subchapter provides an overview of the subject of BD and discusses its relationship to DL. Chapter 4 describes an application of DL to the BD problem of TM. In particular, the underlying data, model design, model implementation, and training are explained in detail. Moreover, possible use cases of the model, as well as results of example calculations, are given, based on which model quality is compared to alternative approaches. Lastly, other possible applications of DL in Civil Engineering are discussed. No particular section is dedicated to the importance and achievements of DL; instead, relevant references are included throughout the thesis, e.g. in chapter 2 and subchapter 3.3.

Chapter 2

Academic and Historical Context

2.1 Academic Context

Definition Deep Learning

Bengio [1, Introduction] defines Deep Learning (DL) broadly as a method to solve hard-to-formalize problems using a computer, by means of learning complicated concepts from experience. Concepts are defined in terms of their relationship to simpler concepts, in a deep, hierarchical fashion. Alternatively, DL can be described as a set of algorithms that model data by learning multiple levels of representation and abstraction, using processing layers composed of linear and nonlinear transformations [2]. The purpose of these algorithms is to make predictions, classify data, control intelligent agents, or to generate new data with the same characteristics as the original data. Sometimes it is claimed that DL is synonymous with Artificial Neural Networks (ANNs); however, this is incorrect. There are ANNs that are not deep, and there is DL that does not involve ANNs, since learning multiple levels of function compositions can be achieved using different algorithms [1, Introduction].

Hierarchical Academic Context

DL is a subset of Representation Learning (RpL) [3], i.e. the learning of data representations or "features" from the data itself, as opposed to manually engineering them. Specifically, it is the subset of RpL that deals with learning deep hierarchies of such representations. RpL, in turn, is a branch of Machine Learning (ML) [4] that is concerned with giving computers the ability to learn from experience, i.e. to learn from examples and past mistakes, as opposed to explicitly programming them. ML can be considered a subfield of Artificial Intelligence (AI) [5] that studies the design of machines that exhibit intelligence, i.e. the design of intelligent agents that perceive, reason about, and act on their environment in order to achieve goals associated with human thinking and activities [6, 7, 8]. In fact, ML is the subset of AI that is mostly concerned with perception, Learning, and, to a lesser degree, reasoning, in all of which DL methods are highly relevant. Lastly, AI can be viewed as a branch of Computer Science (CS), the study of computation, information processes, and the design of computational systems [9]. Hence, DL can be considered an approach to AI [1, Introduction], which currently, as a result of recent theoretical advances and the availability of large Data Sets1, enjoys greater popularity than alternative approaches, such as Knowledge-Based Systems (KBSs) [10] and Probabilistic Graphical Models (PGMs) [11]. While DL has proven valuable for narrow AI tasks, such as Automatic Speech Recognition (ASR) and Computer Vision (CV), it remains to be seen which role it will play in solving Artificial General Intelligence (AGI) [12], i.e. AI capable of performing any task that can be performed by a human.

1 compare 3.4

Relationship to Similar Fields

DL has clear overlaps with Statistics and Optimization, as it heavily relies on methods that originated in these fields, such as Gradient Descent (GD).1 However, Statistics has a stronger focus on summarizing data, hypothesis testing, and on quantifying confidence in predictions. DL, on the other hand, focuses on algorithms leveraging different representations of the data in order to make the best possible predictions on unseen data. DL is not synonymous with Data Mining (DM) [13], which is largely exploratory data analysis, i.e. the discovery of unknown properties of the data and their transformation for further use. In contrast, DL is more concerned with learning a model of the Data Generating Process. DL and DM overlap, but they are not strict subsets of each other. While both can be considered Big Data (BD) methods, DM often involves questions about efficient data management, organization, and storage, which are not usually of primary concern in DL. DL uses DM techniques, and vice versa. For instance, DM methods are often employed as Pre-Processing steps for DL applications. Lastly, DL is distinct from Computational Neuroscience (CNS) [14], whose primary focus is to understand how the brain works and to construct computational models of it. DL, by contrast, concerns itself with building systems that can solve the same type of problems the brain can solve, regardless of similarity.

2.2 Historical Context

This section is largely based on Bengio's and Schmidhuber's publications [1, ch. 1.2.1; 15] on the history of DL, which should be consulted for a more comprehensive overview. The history of DL is inseparably connected with the history of CS, AI, and, in particular, ANNs. Since the early days of computing, people have wondered whether they could endow computers with the ability to learn, instead of having to program them explicitly. Hence, DL only appears to be a new concept, when in fact it has been around since the 1940s, albeit under different names. The history of DL can be roughly divided into three distinct waves of enthusiasm, interspersed with two periods of pessimism.

Cybernetics (1940s - 1960s)

The Cybernetics period is characterized by the emergence of simple, linear models inspired by Neuroscience, which had demonstrated that the brain is a network propagating electrical signals between neurons. This suggested the possibility of constructing an electronic brain. In 1943, McCulloch and Pitts [16] presented the first model of an Artificial Neuron (AN)2, which could differentiate between objects of two categories. However, it did not learn, i.e. its parameters had to be set manually. In 1949, Hebb [17] first published ideas about how Learning might be implemented in the brain, nowadays referred to as Hebbian Learning. In 1956, the Dartmouth Conference brought together scientists from different fields to discuss possible approaches to simulating aspects of intelligence on a machine. At this conference, Logic Theorist, the first Automated Theorem Prover, was presented. Apart from earlier checkers-playing programs, this is often considered the first true AI program. Also at this conference, John McCarthy coined the term "Artificial Intelligence", thus giving birth to the field [5, p. 17]. In 1958 and 1962, Rosenblatt [18, 19] described the Perceptron3, the earliest implementation of an Artificial Neuron on a computer.

1 compare 3.2.4 2 compare 3.2.2 3 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Perceptron

In 1960, Widrow and Hoff [20] described ADALINE1, a physical system implementing a learnable version of a McCulloch-Pitts Neuron, similar to the Perceptron. However, error signals were taken with respect to the Net Input before thresholding, essentially implementing GD Learning2. These early single-Layer, single-Unit ANNs were effectively equivalent to Linear Regression3 models. Soon, the first models featuring Hidden Layers (HLs) appeared [21]. In fact, networks with up to 8 Layers were described [22], which can be considered the first instances of DL.

First AI Winter (1974 - 1980)

By the mid-70s, AI had become overhyped, following overly optimistic statements by leading researchers.4 Furthermore, Rosenblatt had made overly optimistic predictions about the Perceptron's capabilities.5 However, Minsky and Papert [23] discovered a severe limitation of the Perceptron, namely, that it could not learn the XOR function. In 1973, Lighthill [24] pointed out that the field had failed to deliver on its promises, and provided a pessimistic outlook on the scalability of the methods developed so far. In aggregate, these events caused a loss of confidence in AI, leading to severe funding cuts for basic AI research during the following 10 years.

Connectionism (1980s - 1990s)

This period is characterized by the hypothesis that intelligence is best understood as an emergent phenomenon arising from the connections between computational units. The early 1980s saw extensive research into models of Symbolic Reasoning, i.e. Expert Systems (ExSs) [25]; however, it was not clear how the associated mechanisms could be implemented by the brain. Connectionism, which relies on the concept of Distributed Representations (DRs)6 [26], was a more plausible model of cognition. It became clear that the limitations of Perceptrons identified by Minsky two decades earlier could be addressed by multilayer networks, i.e. Multilayer Perceptrons (MLPs)7, which were shown to be universal function approximators [27, 28]. In 1975 and 1980, Fukushima described the Cognitron [29] and the Neocognitron [30]. These models were among the earliest examples of deep, multilayered networks, representing data hierarchically. In 1981, Werbos successfully applied the Backpropagation (BP) algorithm8 to training ANNs [31]. BP, an efficient method for computing error gradients in deep networks, was further popularized by a 1986 landmark paper by Rumelhart [32], who demonstrated that it could give rise to useful DRs in the HLs. The paper also introduced the idea of using BP in training Recurrent Neural Networks (RNNs)9, whose connections have directed cycles enabling them to model temporal data. BP became one of the driving forces behind the resurgence of interest in ANNs in the mid-1980s. In 1989, LeCun [33] applied BP to Handwritten Digit Recognition (HDR), giving rise to one of the first commercially successful applications of ANNs, an automatic check reading system.

1 ADAptive LInear Element 2 compare 3.2.4 3 compare 3.1.6, Regression 4 H. A. Simon, 1965: "Machines will be capable, within twenty years, of doing any work a man can do." 5 F. Rosenblatt: "The Perceptron may eventually be able to learn, make decisions, and translate languages." 6 compare 3.2.1, Principle of Distributed Representations 7 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Multilayer Perceptron 8 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation 9 compare 3.2.3, Directed Architectures, Recurrent Neural Networks

Another important milestone of Connectionism, apart from BP, was the Hopfield Net (HN) [34], which showed how an RNN could serve as an associative, content-addressable memory.

Second AI Winter (late 1980s - early 1990s)

By the end of the 1980s, it became apparent that, despite being promising from a theoretical perspective, in practice BP only worked well for shallow networks, limiting architectures to one or two HLs. In particular, RNNs, which are deep in time1, could not be trained successfully to learn long-term dependencies. Specifically, in a 1991 landmark paper, Hochreiter [35] identified the Vanishing and Exploding Gradient (VEG) Problem2, which Schmidhuber [15] calls the Fundamental DL Problem. Like two decades earlier, overambitious goals had been set, e.g. Japan's Fifth Generation Project for AI aimed at producing a fully conversational AI within a short time frame. Moreover, products had become overhyped and could not deliver on their initial promise. For instance, investors became disappointed with the limitations of ExSs, resulting in a collapse of the market for LISP machines [36]. These developments led to government funding cuts for AI research, hampering ANN development in particular. During this period, the field moved to other methods, such as Kernel Machines [37] and PGMs [11, 38].

Deep Learning (2006 - today)

Up to 2006, it was widely believed that Deep Neural Networks (DNNs) were hard to train, and it was not obvious whether it could be done using BP. While some groundwork had already been laid much earlier, the lack of computing power did not allow algorithms to scale to non-trivial problem sizes. As early as 1997, Hochreiter and Schmidhuber [39] described the Long Short-Term Memory (LSTM) Unit3, one of the earliest methods developed to overcome the Fundamental DL Problem. Another key advance in DL was the concept of Unsupervised Pre-Training (UPT). Already in 1991, Schmidhuber [40] had shown how UPT could be used to train stacks of RNNs. In 2006, the concept was rediscovered in a landmark paper by Hinton [41] describing a greedy, layer-wise Pre-Training method for a DNN. This demonstrated one of the first scalable approaches to training DNNs, coining the term "Deep Learning". Recently, DL has established itself as the superior method for CV tasks. For instance, in 2011, a DL system reached super-human performance in a traffic sign recognition contest for the first time [42]. In 2012, the winner of the ImageNet competition4 was a DL model [43]. The record-low error rates set by this model have since been improved, reaching super-human performance in 2015 [44]. DL had a similarly transformative effect on ASR, where error rates could be decreased considerably compared to traditional systems [45, 46]. In Natural Language Processing (NLP) and Machine Translation (MT), DL is quickly approaching the performance of the best known alternative algorithms, and is now often the method of choice for various other Learning problems.

1 compare 3.3.1, Types of Depth 2 compare 3.3.2, Vanishing and Exploding Gradient Problem 3 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit 4 The ImageNet Large Scale Visual Recognition Challenge is an important object detection and image classification competition.

The period since 2006 is characterized by a strong increase in the number of DL research groups and by a defragmentation of the field. DL groups now often study NLP, CV, and ASR, while traditionally these had been separate research areas. Since 2013, the field has had its own conference, the "International Conference on Learning Representations" [47], and has thus become a field in its own right. The availability of large Data Sets1 and better computational resources played an important role in scaling up the toy models of the past, contributing heavily to the popularity of the field. Furthermore, with the release of various DL programming libraries, e.g. Theano [48], TensorFlow [49], and Torch [50], DL has become an easily accessible technology, driving a quickly growing market for AI-related products.

1 compare 3.4

Chapter 3

Theory

3.1 Machine Learning

3.1.1 Basics

Definition

Machine Learning (ML) [51] is concerned with giving computers the ability to learn from experience, i.e. to learn from examples and past mistakes, as opposed to explicit programming. Examples of ML include teaching a computer program to detect fraudulent credit card transactions, or to generate convincing images of handwritten digits.

Notation

For ease of notation, throughout this chapter P(x) is used as a shorthand to denote cumulative probability functions F_X(x), and p(x) to denote both probability mass functions p_X(x) as well as probability density functions f_X(x). Furthermore, p(x) is referred to as a density in either case. The infimum/supremum of a set {f(x) : x ∈ X} is denoted inf_{x∈X} f(x) / sup_{x∈X} f(x), even though it is not necessarily an element of the set. Unless stated otherwise, superscripts j on x^j and y^j denote indexes, not powers. In ML, the data can be endowed with probabilistic semantics [4]. Let x and y denote realizations of the random input vector X and the random target vector Y, such that x ∈ X and y ∈ Y, where the input space X ⊆ R^n and the target space Y ⊆ R^k are compact. The following definitions of basic terminology are largely adapted from [52].

Terminology

The Data Generating Process underlying the data is assumed to be representable in the following way. The random input vectors X are generated independently and identically distributed according to a fixed but unknown distribution P(x). Subsequently, additional data, the targets Y, also referred to as labels, may be randomly realized according to a fixed but unknown conditional distribution P(y|x).1 Hence, the data are fully characterized either by the joint distribution of inputs and targets P(x, y) = P(y|x)P(x), or by the distribution of the inputs P(x) in case no targets exist. A Training Set (TrS) of size m is a set of m independent and identically distributed random samples drawn from the data distribution. It is denoted as S_m = {(x^1, y^1), ..., (x^m, y^m)}, or S_m = {x^1, ..., x^m} in case no targets exist. The Learning Problem is the task of inferring a function h(x) from the TrS that best describes particular characteristics of the data distribution, i.e. that best generalizes (compare 3.2.5) to data beyond the TrS. The Learning Problem is further specified by the applicable Learning Paradigm (compare 3.1.2).

1 This is a generalization of the case in which the targets are a deterministic function of the inputs Y = f(X).

The Hypothesis Space H is the family of admissible functions from which h can be chosen. Without loss of generality [52, p. 23], one can assume H to be defined with respect to a parameter space Θ, i.e. to represent the set of parameterized functions {h(x, θ) : θ ∈ Θ}, where θ is a specific parameter vector.1 A Learning Machine, also referred to as Learning System, Learning Method, Learner, and Model2, is mathematically equivalent to the Hypothesis Space H. Conceptually, a Learning Machine can be thought of as an abstract machine whose state is a particular choice h from a set of functions H it can implement, i.e. a particular choice of admissible model parameters θ ∈ Θ. When in a configuration θ, the Learning Machine converts its inputs x to outputs h(x, θ).3 Learning, also referred to as Training, is the process of transitioning of the Learning Machine from an Initial State h_0, i.e. θ_0, to some final state h_f, i.e. θ_f. A Learning Algorithm A : H × Z^m → H, i.e. A : Θ × Z^m → Θ, maps an Initial State θ_0 and a TrS S_m to a, possibly approximately and possibly locally, optimal hypothesis function h* = h(x, θ*), by actualizing the process of Learning. Z denotes the product space X × Y, or X in case no targets exist. When in state θ*, the machine is said to be trained.4
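To make these abstractions concrete, the following minimal Python sketch casts a linear model as a Hypothesis Space {h(x, θ) = θ_0 + θ_1 x} and a Learning Algorithm as a map from a Training Set to a trained parameter vector θ*. The names h and least_squares_learner are illustrative and not from the thesis.

    import numpy as np

    # Hypothesis Space: the parameterized family h(x, theta) = theta_0 + theta_1 * x
    def h(x, theta):
        return theta[0] + theta[1] * x

    # A Learning Algorithm A maps an initial state theta_0 and a Training Set S_m
    # to an optimal parameter vector theta*. Here the optimization problem has a
    # closed-form (least squares) solution, so the Initial State is unused.
    def least_squares_learner(theta_init, S_m):
        X = np.array([x for x, y in S_m])
        Y = np.array([y for x, y in S_m])
        A = np.column_stack([np.ones_like(X), X])   # design matrix
        theta_star, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return theta_star

    S_m = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]      # toy Training Set
    theta_star = least_squares_learner(None, S_m)   # trained state theta*
    print(h(1.5, theta_star))                       # output of the trained machine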

3.1.2 Learning Paradigms

Learning can be categorized into three fundamental Learning Paradigms that further specify the Learning Problem and mainly differ in how much information is available to the Learning Algorithm (LA).

Supervised Learning

In Supervised Learning (SL) [4, chs. 1.1.1 and 1.2], also referred to as Predictive Learning, one is given a Training Set (TrS) of labeled data S_m = {(x^1, y^1), ..., (x^m, y^m)}, consisting of pairs of inputs x and targets y. In the simplest instance of SL, the objective is to infer a parameterized function h : X → Y from the Training Data that maps inputs to predictions ŷ = h(x; θ), such that the mapping generalizes5 well beyond the TrS. This encompasses Regression and Classification problems.6 An Action Space A, such that h : X → A, with A different from Y, has to be considered whenever predictions and targets have different domains. A Decision Rule δ : A → Y that maps the predictions of h into the target space is applied whenever the action and target space differ.

1 In a linear model with one real-valued input variable and one real-valued output variable, the Hypothesis Space is H = {θ_0 + θ_1 x : θ_0 ∈ R, θ_1 ∈ R}. 2 Often, the term Learning Algorithm is used synonymously with Learning Machine. This, however, overloads the term, since Learning Algorithm also refers to the algorithm used to train the Learning Machine. 3 Some Learning Machines are intimately connected to a particular Learning Algorithm. As a result, the combination of Learning Machine and Learning Algorithm may be referred to as Learning Machine. 4 If the associated optimization problem has a closed-form solution, an Initial State is not required. 5 compare 3.2.5 6 compare 3.1.6, Regression and Classification

More generally, SL infers an estimator p̂(y|x; θ) of the conditional density p(y|x). The true density p(y|x) is assumed to have known parametric form p(y; ϕ(x)), where ϕ are distribution parameters.1 Therefore, h maps inputs to predictions of the distribution parameters ϕ̂(x; θ) = h(x; θ), which allows p̂(y|x; θ) to be expressed indirectly as p̂(y; ϕ̂(x; θ)). This is referred to as conditional Density Estimation (DE).2 The process of Learning manifests itself through an error correction procedure. The model's prediction errors are associated with a Cost Function (CF)3 to be minimized. During Learning, the model parameters θ are successively adjusted such that this cost decreases. SL is loosely analogous to a teacher presenting training examples to a student, whose goal it is to infer how the targets depend on the inputs, such that good predictions can be made for unseen inputs. Each time a training example is presented to the student, she makes a prediction and receives feedback from the teacher. Based on this feedback, the student adapts her internal model so as to improve her predictions.

Unsupervised Learning

In Unsupervised Learning (UL) [4, chs. 1.1.1 and 1.3], also referred to as Descriptive Learning, one is given a TrS of unlabeled data S_m = {x^1, ..., x^m}.4 The goal is to infer aspects of the underlying distribution from the Training Data. In the most general case, a parameterized function h(x; θ), representing an estimator p̂(x; θ) of the data density p(x), is learned.5 As in SL, Learning is the process of successively adjusting the model parameters θ in such a way that some CF is minimized. However, there is no error signal, since no targets exist that could be predicted. Instead, the CF quantifies the degree of inadequacy of p̂(x; θ) in representing the Training Data. UL is roughly analogous to a student who learns on their own by observing the world and constructing an internal model of it. For instance, children intuitively learn to represent the concept of gravity without the need for a teacher. Since labeling data requires a human expert, the vast majority of existing data is unlabeled. It has been investigated how SL could benefit from these large amounts of unlabeled data. The resulting methods, such as Unsupervised Pre-Training (UPT)6 [53], which involve both types of data, are sometimes referred to as Semi-Supervised Learning [54].

Reinforcement Learning

Reinforcement Learning (RiL) [4, ch. 13] is concerned with teaching an agent to take appropriate actions in an environment. There is no supervisor or TrS. Instead, the agent learns by observing a reward signal determined by her actions and the state of the environment. Specifically, the agent attempts to find a policy, i.e. a mapping from states of the environment to possible actions, that maximizes an expectation of the discounted, cumulative future reward. An example of RiL is teaching a Learning System to play video games. Pixel data on the screen represents the environment, and the game's score is the reward signal [55]. RiL is out of scope for this thesis and is not discussed further.

1 Note that the distribution parameters ϕ are different from the model parameters θ. The distribution parameters fully specify the density function. For instance, the distribution parameters of a Normal Distribution are the mean µ and the standard deviation σ. 2 compare 3.1.6, Density Estimation 3 compare 3.1.5, Definition Cost Function 4 The Training Set consists of inputs only, i.e. no targets exist. 5 compare 3.1.6, Density Estimation 6 compare 3.3.3, Special Initialization Schemes, Unsupervised Pre-Training

3.1.3 Data in Machine Learning

Types of Variables

While variables are often categorized by their associated Scale of Measure [56], this thesis emphasizes their differences with respect to encoding. Based on this criterion, one can distinguish three types of variables. Numeric Variables represent continuous measurements. Variables of this type are encoded as real numbers. In terms of Scales of Measure, they are measured on the Interval Scale1 or the Ratio Scale2. Ordinal Variables represent discrete measurements that can be meaningfully ordered, i.e. ranked. Variables of this type are encoded as integers. In terms of Scales of Measure, they are measured on the Ordinal Scale3, Interval Scale, or Ratio Scale. Nominal Variables represent discrete, qualitative measurements that cannot be meaningfully ordered. Variables of this type are encoded as One-Hot Vectors, i.e. k-element binary vectors with exactly one element equal to 1 and the other elements either all 0 or all −1, where k is the number of states the variable can take. In the special case k = 2, a single binary number suffices to encode the variable. In terms of Scales of Measure, Nominal Variables are measured on the Nominal Scale4. This thesis refers to Nominal Variables as Categorical Variables. Furthermore, Binary Variables exclusively refer to Categorical Variables with 2 categories, never to Ordinal Variables.
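As a toy illustration of the One-Hot encoding described above, the following sketch encodes a Categorical Variable with k = 3 states using the {0, 1} convention (the helper name one_hot is illustrative, not from the thesis):

    import numpy as np

    def one_hot(value, categories):
        """Encode a categorical value as a k-element binary vector (0/1 convention)."""
        vec = np.zeros(len(categories))
        vec[categories.index(value)] = 1.0
        return vec

    colors = ["Red", "Blue", "Green"]     # Nominal Variable with k = 3 states
    print(one_hot("Blue", colors))        # [0. 1. 0.]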

Data Pre-Processing

Data Pre-Processing [57] is the process of transforming raw data into a Training Set (TrS). Different Machine Learning (ML) models require different degrees of data Pre-Processing. While Decision Trees [58] can be fed relatively unprocessed data, other models, such as Artificial Neural Networks (ANNs), require heavily preprocessed Training Data in order to work well. One Pre-Processing step that should always be taken is Outlier Detection [59], since outliers introduce a bias into the trained model. However, this should be limited to detecting impossible values and obvious bad measurements. Data points that are merely unlikely or surprising should be retained, since they may contain valuable information. Another recommended Pre-Processing step is the Handling of Missing Values [60]. Samples with missing attributes can either be deleted or updated with imputed values. Unless values are missing randomly, deleting samples can introduce a bias and should be avoided. The simplest way of imputing values is replacing them with a constant, such as the mean over the good samples. A more sophisticated approach is Interpolation, e.g. linear or spline Interpolation. A yet more sophisticated imputation method is to employ a Learning Model, such as a Denoising Autoencoder5. This can be considered a form of nonlinear Interpolation based on the entire information contained in the Data Set.
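As a minimal sketch of the simplest strategy above, mean imputation in NumPy (the helper name impute_mean is not from the thesis):

    import numpy as np

    def impute_mean(X):
        """Replace NaNs in each column with that column's mean over the good samples."""
        X = X.copy()
        col_means = np.nanmean(X, axis=0)           # per-attribute mean, ignoring NaNs
        nan_rows, nan_cols = np.where(np.isnan(X))
        X[nan_rows, nan_cols] = col_means[nan_cols]
        return X

    X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
    print(impute_mean(X))    # NaNs replaced by column means 2.0 and 3.0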

1 Variables measured on the Interval Scale do not possess a unique and non-arbitrary zero value, e.g. temperatures measured in degrees Celsius. Therefore, differences, but not ratios, of measurements are meaningful. 2 Variables measured on the Ratio Scale possess a unique and non-arbitrary zero value, e.g. temperatures measured in degrees Kelvin. Therefore, differences as well as ratios of measurements are meaningful. 3 Variables measured on the Ordinal Scale possess a unique and non-arbitrary rank order, but do not reflect relative degrees of difference, e.g. "Strongly Disagree", "Disagree", etc. Therefore, neither differences nor ratios of measurements are meaningful. 4 Variables measured on the Nominal Scale are non-numeric and do not possess a rank order, e.g. "Red", "Blue", "Green". Therefore, differences and ratios of these measurements are meaningless. 5 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Autoencoder

Often, attributes are measured in different units, which can result in vastly different variable scales and ranges, e.g. the number of bathrooms and the price of houses. Particularly when working with ML systems trained with Gradient Descent1, Feature Scaling [61] is a strongly recommended Pre-Processing step. Rescaling the data to the unit range, or standardizing to zero mean and unit standard deviation2, are possible options. Typically, some form of Feature Decorrelation and Dimensionality Reduction is in order, both of which can be achieved by applying Principal Component Analysis (PCA) [62]. High-dimensional data is often concentrated around a lower-dimensional manifold in Feature Space.3 Feature Decorrelation by PCA, in conjunction with Feature Scaling by Standardization, leads to benign error surfaces, which in turn accelerates Gradient Descent. In terms of Dimensionality Reduction, PCA returns a new set of uncorrelated variables associated with the axes of highest variation in Feature Space, the Principal Components. By retaining only the first k Principal Components, one can often capture almost all of the information contained in the Data Set. As a result of this information being encoded in fewer variables, Learning is sped up considerably.
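The following NumPy sketch combines both steps, standardization followed by PCA via the singular value decomposition. It is an illustrative implementation under the assumptions above, not the thesis's Pre-Processing pipeline; all function names are hypothetical.

    import numpy as np

    def standardize(X):
        """Zero mean, unit standard deviation per attribute."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    def pca(X, k):
        """Project data onto the first k Principal Components."""
        U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        explained = s**2 / np.sum(s**2)     # fraction of variance per component
        return X @ Vt[:k].T, explained[:k]

    X = np.random.randn(100, 5) @ np.random.randn(5, 5)   # correlated toy data
    Z, explained = pca(standardize(X), k=2)
    print(Z.shape, explained)               # (100, 2) plus captured variance ratios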

3.1.4 Statistical Learning Theory

Definition

Statistical Learning Theory (SLT) [63, 37] is the mathematical treatment of Machine Learning (ML). It provides a rigorous framework to assess the Generalization Performance of models learned using a finite set of Training Data, i.e. to assess how well a Learning Algorithm (LA) can be expected to generalize its learned knowledge to unseen data. One of the central concerns of SLT is the derivation of bounds providing probabilistic guarantees on the Generalization Error (GE) of different LAs.4 This subsection discusses SLT as it relates to Supervised Learning (SL)5. Much mathematical rigor is omitted so as not to obscure the main results. Vapnik [52] provides a rigorous, complete treatment of the material.

The Learning Problem

To formalize the Learning Problem, one defines a non-negative Loss Function (LF)6, L(h(x; θ), y), that, for given input x and target y, quantifies the inadequacy of h(x; θ). In case h(x; θ) outputs a prediction ŷ, the LF assigns a penalty to the prediction error e = ŷ − y [65].7 The expected value of the LF evaluated at θ is called the Risk of θ.

R(θ) = E_{X,Y}[L(h(X; θ), Y)] = ∫_{X×Y} L(h(x; θ), y) dP(x, y)    (3.1)

The goal of Learning is to find the model parameter setting θ* that minimizes the Risk.

θ* = arg min_{θ∈Θ} R(θ)    (3.2)

1 compare 3.2.4, Gradient Descent with Backpropagation, Basic Framework 2 This is accomplished by subtracting from each attribute its mean and dividing by its standard deviation. 3 compare 3.3.1, Principle of Deep Compositions 4 Much of the material covered in this subsection is part of a particular branch of SLT, called Vapnik-Chervonenkis theory. Extensions and alternatives to the concepts discussed here exist, for instance Stability Theory [64]. 5 compare 3.1.2, Supervised Learning 6 compare 3.1.5, Definition Loss Function 7 The concept of Prediction Error is not to be confused with two concepts commonly used in Statistics, (a) "Error" ε = y − f(x), which is derived from the relationship y = f(x) + ε, where f(x) is the true functional dependency of y on x, and ε is a random variable, and (b) "Residual" r = ε̂ = y − ŷ, which is derived from the relationship y = h(x; θ) + ε̂.

Empirical Risk Minimization

The minimization (3.2) cannot be carried out, since R(θ) cannot be computed, as it depends on the unknown joint distribution P(x, y). However, an estimator R̂_m for R, the Empirical Risk (ER) based on a sample of size m, can be computed from the sample S_m.

R̂_m(θ) = (1/m) Σ_{j=1}^{m} L(h(x^j; θ), y^j)    (3.3)
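As a concrete illustration of (3.3), the following sketch computes the Empirical Risk of a parameterized hypothesis under the Squared Loss (all names are illustrative, not from the thesis):

    import numpy as np

    def h(x, theta):
        """A simple parameterized hypothesis: h(x, theta) = theta_0 + theta_1 * x."""
        return theta[0] + theta[1] * x

    def empirical_risk(theta, S_m):
        """R_hat_m(theta): mean of the per-example Losses over the TrS, eq. (3.3)."""
        return np.mean([(h(x, theta) - y) ** 2 for x, y in S_m])   # Squared Loss

    S_m = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9)]
    print(empirical_risk(np.array([1.0, 2.0]), S_m))   # ER of the candidate theta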

Hence, instead of minimizing the actual Risk R(θ), the Empirical Risk Minimization (ERM) principle refers to finding the parameter setting θ̂*_m that minimizes the ER R̂_m(θ).

θ̂*_m = arg min_{θ∈Θ} R̂_m(θ)    (3.4)

In other words, the average error on the Training Set (TrS), i.e. the Training Performance, is used as an approximation for the expected error over all possible samples, i.e. the Generalization Performance of the trained model. A desirable property of any statistical estimator is Consistency1. Specifically, Consistency of ERM means that both the true Risk of the ER minimizer R(θ̂*_m), as well as the minimal ER R̂_m(θ̂*_m)2, converge in probability to the smallest possible Risk R(θ*). That means, as TrS size increases, both the Generalization Performance as well as the Training Performance of the ER minimizer approach the best possible Generalization Performance [52, ch. 3.1].3 Vapnik [67] showed that, under some technical conditions, these two convergence requirements are equivalent to uniform convergence in probability of the ER R̂_m(θ) to the true Risk R(θ) over the full set of functions {h(x; θ) : θ ∈ Θ}.

lim_{m→∞} P( sup_{θ∈Θ} (R(θ) − R̂_m(θ)) > ε ) = 0,  ∀ε > 0    (3.5)

Hence, finding the necessary and sufficient conditions for the Consistency of ERM reduces to finding necessary and sufficient conditions for uniform convergence in probability of the ER R̂_m(θ) to the true Risk R(θ). Consistency alone is not strong enough a requirement to establish the soundness of the ERM principle. Even if ERM is consistent, it may be possible to construct cases where the asymptotic rate of convergence of R(θ̂*_m) to R(θ*) is arbitrarily small. Hence, in addition to Consistency, Fast Convergence of ERM is required [52, ch. 3.12]. Concretely, Fast Convergence means that there exists an m_0 such that, ∀m > m_0,

P( R(θ̂*_m) − R(θ*) > ε ) < e^{−cε²m}    (3.6)

where c > 0 is a constant [65], i.e. as TrS size increases, the Generalization Performance of the model configuration found via ERM converges exponentially fast to the best possible Generalization Performance.

1 An estimator T̂_m for a parameter T is consistent if it converges in probability to the true parameter value T, i.e. lim_{m→∞} P(|T̂_m − T| > ε) = 0, ∀ε > 0. Informally, this expresses the property that, as sample size increases, it becomes increasingly unlikely that the value of the estimator is far off the true value [66, ch. 1.8]. 2 Incidentally, it really is uniform one-sided convergence in probability, which is a slightly different concept. As long as no mathematician is around, this difference can be swept under the rug. 3 Recall that R(θ̂*_m) is not computable since P(x, y) is unknown, while R̂_m(θ̂*_m) is computable by definition. Furthermore, observe that it is not a requirement that θ̂*_m converge to θ*, since ERM is not concerned with finding a function that replicates the true minimizing function, but rather with finding a function whose Risk is as close as possible to the Risk of the function with the lowest possible Risk.

Growth Function and VC-Dimension

Let N^Θ(S_m) denote the number of different separations of the input vectors of sample S_m that can be achieved using a set of indicator functions {h(x; θ) : θ ∈ Θ}.1 The Growth Function (GF) [52, ch. 4.9.1] is defined as

G^Θ(m) = ln sup N^Θ(S_m)    (3.7)

where the supremum is taken over all possible samples of size m. Any GF is either linear in m,

G^Θ(m) = m ln 2    (3.8)

or bounded by a logarithmic function for m > d,

G^Θ(m) = m ln 2 if m ≤ d,  G^Θ(m) < d(ln(m/d) + 1) else    (3.9)

The GF allows for the construction of a criterion that pins down easier-to-verify conditions for Consistency and Fast Convergence of ERM. The limit condition

lim_{m→∞} G^Θ(m)/m = 0    (3.10)

is a necessary and sufficient condition for uniform convergence of the ER R̂_m(θ) to the true Risk R(θ) over the full set of functions {h(x; θ) : θ ∈ Θ}, and therefore is a necessary and sufficient condition for the Consistency of ERM. Furthermore, it is also a sufficient condition for Fast Convergence of ERM. This criterion is distribution independent, i.e. it holds for every P(x, y). The VC-Dimension [52, ch. 4.9] of the set of functions {h(x; θ) : θ ∈ Θ} is the integer d that satisfies G^Θ(d) = d ln 2 and G^Θ(d + 1) ≠ (d + 1) ln 2. If G^Θ(m) is linear, such a d does not exist and the VC-Dimension is said to be infinite. Conversely, if G^Θ(m) is bounded by a logarithmic function with coefficient d, the VC-Dimension is finite and equal to d. Equivalently, if the VC-Dimension of a set of indicator functions {h(x; θ) : θ ∈ Θ} is equal to d, then (a) there exists a set of d points, such that these points can be separated in all of the 2^d possible ways by functions of this set, and (b) there does not exist a set of d + 1 points for which this can be done, i.e. for every set of d + 1 points there exists a binary labeling {y^1, ..., y^{d+1}} of the points, such that no function in the set can accomplish the corresponding separation.2 Hence, the VC-Dimension is a measure of the capacity of the set of hypothesis functions considered by the LA. It shows that the notion of capacity is not merely equivalent to the number of function parameters, but rather represents a more subtle notion of complexity. Observe that (3.10) is satisfied if and only if the VC-Dimension d is finite.

1 Everything that follows assumes {h(x; θ) : θ ∈ Θ} to be a set of indicator functions. Generalizations to real-valued functions exist. 2 For example, the VC-Dimension of the set of linear indicator functions h(x; θ) = I(θ_0 + Σ_{i=1}^{n} θ_i x_i ≤ 0), θ ∈ R^{n+1}, in n dimensions is equal to n + 1. This is easy to see for n = 2, since in the plane one can always find a set of 3 points such that there exists a parameterization of a line for each of the 8 possible ways the points could be separated by this line. However, it is not possible to find such a set of 4 different points, since for 4 different points in the plane it must be true that either (a) at least 3 points lie on a line, in which case there is no line separating the outermost two points on the line from the other two points, or (b) they form a convex quadrilateral whose diagonals connect two disjoint sets of two points, which cannot be separated by a line, or (c) they form a non-convex quadrilateral, in which case there does not exist a line separating the point inside the quadrilateral's convex hull from the other 3.

Generalization Bounds

If (3.10) holds, i.e. if the VC-Dimension d is finite, Consistency and Fast Convergence of ERM are guaranteed, and two important types of distribution-independent bounds relating to the generalization ability of ERM also hold [52, ch. 4.12]:

(1) A probabilistic guarantee on the Generalization Performance of the ER minimizer θ̂*_m

P( R(θ̂*_m) ≤ R̂_m(θ̂*_m) + CI_1(m/d, m, η) ) ≥ 1 − η    (3.11)

where CI_1(m/d, m, η) is a confidence interval whose specific form depends on the set of functions {h(x; θ) : θ ∈ Θ} under consideration. It is a monotonically decreasing function of m/d, m, and η. The above inequality guarantees that, with some minimum probability, the GE of the model found using ERM is no greater than its Training Error (TrE) plus some buffer. The buffer is smaller, the more samples are used in training the model, the less complex the model, and the higher the allowed probability η for (3.11) to fail.

(2) A probabilistic guarantee on the lowest achievable Generalization Error R(θ*)

P( R(θ*) ≥ R̂_m(θ̂*_m) − CI_2(m/d, m, η) ) ≥ 1 − 2η    (3.12)

where CI_2(m/d, m, η) is a different confidence interval whose specific form also depends on the set of hypothesis functions considered. The construction of the second bound relies on the fact that the first bound holds. Hence, CI_2 can sometimes be expressed as the sum of CI_1 and another term that may depend on m and η. This inequality guarantees that, with some minimum probability, the lowest achievable GE is no smaller than the GE of the model found using ERM minus some buffer. These bounds1 show that the model capacity represented by the VC-Dimension d acts as a trade-off parameter. Evidently, the higher d, the lower the TrE, but the wider the confidence intervals.

Structural Risk Minimization

ERM is a large-sample method, appropriate if m/d is large, since this causes the confidence interval CI_1 to be small. If, however, m/d is small, a small TrE does not imply a small GE. It may even be possible to reduce the TrE to zero while having poor generalization ability. This is referred to as Overfitting2. Therefore, if only little Training Data is available, d needs to be kept small. Structural Risk Minimization (SRM) takes this effect into consideration. Under this paradigm, minimizing GE is a trade-off, expressed by (3.11), between minimizing ER and obtaining a tight generalization bound, with VC-Dimension d, i.e. model capacity, as the adjustment parameter. On the one hand, ER decreases as d increases. On the other hand, CI_1 increases as d increases.

Let the set of functions {h(x; θ) : θ ∈ Θ} be endowed with a structure, such that S_k = {h(x; θ) : θ ∈ Θ_k} with S_1 ⊂ S_2 ⊂ ... ⊂ S_n ⊂ ..., and let the VC-Dimension d_k of each subset be finite, satisfying d_1 < d_2 < ... < d_n < ...; then SRM corresponds to the following two-step process [68]. First, a minimization of the ER is carried out for each subset. Subsequently, the S_n is selected that minimizes the sum of the minimal ER and the corresponding confidence interval.

1 Vapnik [52, ch. 4] derives concrete expressions for CI_1 and CI_2 for various sets of functions {h(x; θ) : θ ∈ Θ} with finite VC-Dimension. 2 compare 3.2.5, Generalization Error

In practice, it is rarely possible to compute the VC-Dimension of a set of functions. As a result, it is generally impossible to derive an exact expression for the confidence interval. However, SRM provides a theoretical justification for the concept of Regularization1, and concrete algorithms can be derived in some cases [69]. While for Artificial Neural Networks (ANNs) the VC-Dimension cannot generally be computed [70], there are ways of inducing structure, allowing for the model capacity to be varied qualitatively [68]. For instance, model capacity can be systematically increased by increasing the number of Units in the Hidden Layer (HL), thus endowing the set of functions implementable by the network with a structure in the above-defined sense. Another way of inducing structure is to adjust the regularization parameter used in Weight Decay (WD) Regularization2. Decreasing the value of this parameter systematically reduces the constraints on the set of functions implementable by the network. In both cases, SRM corresponds to training multiple models with different capacity, subsequently picking the one with the lowest GE.
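The following sketch illustrates this practical reading of SRM under stated assumptions: a nested family of polynomial models of increasing degree (a stand-in for, e.g., growing HL sizes), each fit by ERM, with the final choice made on held-out data as a proxy for the GE. It is an illustration, not the thesis's procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 60)
    y = np.sin(3 * x) + rng.normal(0, 0.2, 60)       # noisy Data Generating Process
    x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

    best = None
    for degree in range(1, 10):                      # nested sets S_1 ⊂ S_2 ⊂ ...
        theta = np.polyfit(x_tr, y_tr, degree)       # ERM within the subset
        val_err = np.mean((np.polyval(theta, x_va) - y_va) ** 2)   # proxy for GE
        if best is None or val_err < best[0]:
            best = (val_err, degree, theta)

    print("selected capacity (degree):", best[1])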

3.1.5 Loss and Cost Functions

The terms Loss Function (LF), Cost Function (CF), Error Function, and Objective Function are often used synonymously. At least two distinct concepts exist that will be distinguished here.

Definition Loss Function

In Supervised Learning (SL)3, for a single given input-target pair (x, y), a LF

L(h(x; θ), y)    (3.13)

quantifies the inadequacy of the model output h(x; θ). Typically, it is real-valued and non-negative. In case4 h(x; θ) outputs a prediction ŷ, the LF assigns a penalty to the Prediction Error e = ŷ − y, with L(ŷ, y) = 0 if and only if ŷ = y [71]. In case h(x; θ) outputs estimators for the distribution parameters ψ ∈ Ψ of a parametric density p(y; ψ), i.e. ψ̂ = h(x; θ), the LF expresses the negative of a Goodness-of-Fit measure of p̂ given the input-target pair.5 In Unsupervised Learning (UL), the LF also quantifies the inadequacy of the model output, however, without relating it to a target. In what follows, only the SL case is considered. Aggregation of the Losses of multiple output-target pairs leads to the concept of a CF, which is used as an Objective Function to be minimized by a Learning Algorithm (LA). The choice of LF should reflect the salient characteristics of the Learning Problem at hand. If the application dictates that large deviations of predictions from targets are disproportionately worse than small ones, the LF should take account of that. Moreover, the LF affects the convergence rate of the LA as well as the bounds on the Generalization Error (GE) [72].6

Types of Loss Functions

Below, commonly used LFs [72, 73] are exhibited, along with their derivatives needed in Backpropagation (BP)7.

1 compare 3.2.5 2 compare 3.2.5, Weight Decay 3 compare 3.1.2, Supervised Learning 4 compare 3.1.1, Terminology 5 Note that the distribution parameters ψ are different from the model parameters θ. The distribution parameters fully specify a parametric density, e.g. the mean and standard deviation parameter of the Normal Distribution, while the model parameters parameterize the Learning Machine. 6 compare 3.1.4, Generalization Bounds 7 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation

It is assumed for simplicity that the targets are scalars. The generalization to vector-valued model outputs is straightforward. Assuming h(x; θ) is an n-dimensional vector and h_j its jth component, the Loss is equal to the sum of the component Losses, and the derivative of the Loss with respect to a specific component h_j is the derivative of the corresponding component Loss.

L(h(x; θ), y) = Σ_{i=1}^{n} L(h_i, y_i)    (3.14)

∂/∂h_j L(h(x; θ), y) = ∂/∂h_j Σ_{i=1}^{n} L(h_i, y_i) = Σ_{i=1}^{n} ∂/∂h_j L(h_i, y_i) = ∂/∂h_j L(h_j, y_j)    (3.15)

The Squared Loss (SqrL) is most commonly used in Regression problems1, e.g. in Artificial Neural Networks (ANNs) with Linear Output Units (LinOUs)2. It assigns a disproportionately higher Loss to large deviations of predictions from targets.

L_2(ŷ, y) = (ŷ − y)²    (3.16)

∂/∂ŷ L_2(ŷ, y) = 2(ŷ − y)    (3.17)

The Absolute Loss (AbsL) is another LF for Regression that assigns a Loss proportional to the magnitude of the prediction error.

L_1(ŷ, y) = |ŷ − y|    (3.18)

∂/∂ŷ L_1(ŷ, y) = sign(ŷ − y)    (3.19)

Other LFs for Regression exist, such as the Absolute Percentage Loss, the Huber Loss [74], and the Vapnik ε-insensitive Loss [75]. The last two address the sensitivity of the SqrL to outliers. Figure 3.1 shows a plot of the Regression LFs described above.

Figure 3.1: Regression Loss Functions
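A minimal sketch of the two Regression LFs and their derivatives, (3.16)-(3.19), in NumPy (function names are illustrative):

    import numpy as np

    def squared_loss(y_hat, y):
        return (y_hat - y) ** 2          # eq. (3.16)

    def squared_loss_grad(y_hat, y):
        return 2 * (y_hat - y)           # eq. (3.17)

    def absolute_loss(y_hat, y):
        return np.abs(y_hat - y)         # eq. (3.18)

    def absolute_loss_grad(y_hat, y):
        return np.sign(y_hat - y)        # eq. (3.19); undefined at 0, sign() returns 0 there

    print(squared_loss(2.5, 2.0), absolute_loss_grad(2.5, 2.0))   # 0.25 1.0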

1 compare 3.1.6, Regression 2 compare 3.2.2, Types of Activation Functions, Linear Activation Function

The Indicator Loss (IndL), also referred to as 0/1-Loss [76], is a LF used in Binary Classification (BC) problems1. If y and ŷ are both binary in {0, 1} or {−1, 1}, i.e. if h(x; θ) maps directly into class predictions, this Loss indicates whether prediction and target disagree.

L_{0/1}(ŷ, y) = I(ŷ ≠ y)    (3.20)

If y ∈ {0, 1} and h(x; θ) ∈ R, the following version of the IndL is considered

L_{0/1}(h(x; θ), y) = y I(h(x; θ) < b) + (1 − y) I(1 − h(x; θ) ≤ b)    (3.21)

with b = 0 if h(x; θ) is unbounded, as in ANNs with a Linear Output Unit (OU), and b = 0.5 if h(x; θ) ∈ (0, 1), as in the case of a Logistic OU.2 This LF is neither convex nor differentiable, since its derivative is zero almost everywhere and does not exist at the discontinuity. Minimization problems based on this Loss are NP-hard [77] and inaccessible to gradient-based approaches. Hence, in practice, smooth, convex approximations are employed. The Negative Log Likelihood (NLL) Loss is a very general LF. A particular instantiation of it is used in BC. If y ∈ {0, 1} and ŷ ∈ (0, 1), it is defined as

L_nll(ŷ, y) = −y ln ŷ − (1 − y) ln(1 − ŷ)    (3.22)

∂/∂ŷ L_nll(ŷ, y) = −y/ŷ + (1 − y)/(1 − ŷ) = (ŷ − y)/(ŷ(1 − ŷ))    (3.23)

This can be regarded as a smooth, convex version of the IndL. The NLL Loss for BC never assigns zero Loss, and the Loss approaches infinity as the prediction approaches the wrong value. More generally, the NLL Loss is used in Density Estimation (DE) problems3

L_nll(h(x; θ), y) = −ln p̂(y; h(x; θ)) = −ln p̂(y|x; θ)    (3.24)

∂/∂h_i L_nll(h(x; θ), y) = −(1/p̂(y; h(x; θ))) ∂/∂h_i p̂(y; h(x; θ))    (3.25)

where p̂(y|x; θ) is an estimator for the density of the targets, conditional on the inputs. Technically, it is not a proper LF, as it can take negative values. Example (1): Observe that the NLL Loss for BC (3.22) is a special case of the more general NLL Loss (3.24), where p̂ is the density of a Bernoulli distributed random variable [78]

−ln p̂(y; h(x; θ)) = −ln(h(x; θ)^y (1 − h(x; θ))^{1−y}) = −ln(ŷ^y (1 − ŷ)^{1−y}) = −y ln ŷ − (1 − y) ln(1 − ŷ) = L_nll(ŷ, y)    (3.26)

Minimizing this Loss is analogous to Maximum Likelihood Estimation of the probability parameter of a Bernoulli distributed random variable [79] based on a single training example. Example (2): In conditional DE, assuming a normally distributed target, the model outputs h(x; θ) = (μ̂(x; θ), σ̂(x; θ)) map into the distribution parameter space of the Normal Distribution.

L_nll(h(x; θ), y) = −ln p̂(y; h(x; θ)) = −ln( 1/(√(2π) σ̂(x; θ)) ) + (1/2) (y − μ̂(x; θ))²/(σ̂(x; θ))²    (3.27)

1 compare 3.1.6, Classification 2 If y ∈ {−1, 1}, this version of the IndL simplifies to L_{0/1}(h(x; θ), y) = I(y h(x; θ) ≤ 0) and L_{0/1}(h(x; θ), y) = I(y(h(x; θ) − 0.5) ≤ 0), respectively. 3 compare 3.1.6, Density Estimation

∂/∂μ̂ L_nll(h(x; θ), y) = −(y − μ̂)/σ̂²
∂/∂σ̂ L_nll(h(x; θ), y) = 1/σ̂ − (y − μ̂)²/σ̂³    (3.28)
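A short sketch of Example (2), checking the analytic gradients (3.28) of the Gaussian NLL numerically (illustrative code, not from the thesis):

    import numpy as np

    def gaussian_nll(mu, sigma, y):
        """Eq. (3.27): negative log likelihood of y under N(mu, sigma^2)."""
        return np.log(np.sqrt(2 * np.pi) * sigma) + 0.5 * (y - mu) ** 2 / sigma ** 2

    def gaussian_nll_grads(mu, sigma, y):
        """Eq. (3.28): analytic partial derivatives w.r.t. mu and sigma."""
        return -(y - mu) / sigma ** 2, 1.0 / sigma - (y - mu) ** 2 / sigma ** 3

    mu, sigma, y, eps = 0.5, 1.2, 1.7, 1e-6
    num_dmu = (gaussian_nll(mu + eps, sigma, y) - gaussian_nll(mu - eps, sigma, y)) / (2 * eps)
    print(gaussian_nll_grads(mu, sigma, y)[0], num_dmu)   # analytic vs. numerical gradient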

The Cross Entropy (CE) Loss is widely used in Multiclass Classification (MC) problems, as in ANNs with a Softmax Output Layer (SOL)1. It is a generalization of the NLL Loss for BC to k ≥ 2 classes. In this case, y is modeled as a k-dimensional One-Hot Vector2. Furthermore, h(x; θ) is a k-dimensional vector whose components are non-negative and sum to one. Thus, it models the probability mass function of a Categorically distributed random variable.

L_ce(h(x; θ), y) = −Σ_{l=1}^{k} y_l ln h_l    (3.29)

∂/∂h_l L_ce(h(x; θ), y) = −y_l/h_l    (3.30)

The CE Loss has its theoretical justification in Information Theory. It measures how well p(y|x) is approximated by h(x; θ) [1]. In analogy to the NLL Loss for BC, it can be viewed as a smooth, convex approximation to the IndL for k-class Classification problems.3 Figure 3.2 shows a plot of the LFs for BC described above. Two additional LFs commonly used, the Logistic Loss and the Hinge Loss, are included for comparison.

Figure 3.2: Classification Loss Functions
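A compact sketch of (3.29) with a Softmax output, as it would appear in a MC setting (illustrative, with an assumed 3-class example):

    import numpy as np

    def softmax(z):
        """Map raw scores to a probability vector (non-negative, sums to one)."""
        e = np.exp(z - np.max(z))        # shift for numerical stability
        return e / e.sum()

    def cross_entropy(h, y):
        """Eq. (3.29): CE Loss between prediction h and One-Hot target y."""
        return -np.sum(y * np.log(h))

    h = softmax(np.array([2.0, 1.0, 0.1]))   # model output for k = 3 classes
    y = np.array([1.0, 0.0, 0.0])            # One-Hot target
    print(cross_entropy(h, y))               # penalizes low probability on the true class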

Definition Cost Function In Machine Learning (ML),CFs quantify aggregated Loss, and are employed as Objective Functions to be minimized by aLA. This minimization, i.e. Learning, is done with respect to the model parameters θ on the of output-target pairs {(h(x1; θ), y1),..., (h(xm; θ), ym)}  1 1 m m associated with a particular Training Set (TrS) Sm = (x , y ), ..., (x , y ) . It is therefore

1 compare 3.2.2, Types of Activation Functions, Softmax Activation 2 compare 3.1.3, Types of Variables 3 Note that for the case k=2, the NLL Loss forBC is recovered

18 convenient to make the dependency on the parameters explicit and omit the dependency on the constant TrS. Hence, the Cost

C(θ) = g(L(h(x¹; θ), y¹), ..., L(h(xᵐ; θ), yᵐ)) = g(l¹, ..., lᵐ)   (3.31)

∂C/∂θ_i = Σ_{j=1}^{m} (∂g/∂l^j)(∂l^j/∂θ_i)   (3.32)

aggregates the Losses l^j = L(h(x^j; θ), y^j) over the TrS using the Aggregation Function g. Mean Aggregation is based on the arithmetic average. Deep Learning (DL) applications almost exclusively use this particular Aggregation Function. Incidentally, the Empirical Risk (ER) (3.3) defines a CF based on Mean Aggregation.

g_mean(l¹, ..., lᵐ) = (1/m) Σ_{j=1}^{m} l^j   (3.33)

Other Aggregation Functions are sometimes employed [80]. For instance, Sum Aggregation differs from Mean Aggregation only by a constant factor m. Therefore, the corresponding minimization problem has the same solution. However, Costs based on different sample sizes are no longer comparable.

g_sum(l¹, ..., lᵐ) = Σ_{j=1}^{m} l^j   (3.34)

Root Mean Aggregation is defined as the square root of the mean Loss. For scaling purposes, it is typically used in conjunction with the SqrL.

g_rm(l¹, ..., lᵐ) = √((1/m) Σ_{j=1}^{m} l^j)   (3.35)

In some instances, a CF based on Max Aggregation is constructed. Minimizing this CF is equivalent to minimizing the maximum absolute Loss over the TrS, i.e. the worst-case scenario. This type of aggregation is related to the minimax criterion in Statistical Decision Theory and Robust Optimization [81].

g_max(l¹, ..., lᵐ) = max(|l¹|, ..., |lᵐ|)   (3.36)
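The four aggregation schemes amount to a few lines of NumPy; the following is a minimal sketch with illustrative names, not code from the thesis:

import numpy as np

losses = np.array([0.5, 2.0, 0.1, 1.4])  # per-example losses l^1, ..., l^m

g_mean = losses.mean()                   # eq. (3.33)
g_sum  = losses.sum()                    # eq. (3.34), = m * g_mean
g_rm   = np.sqrt(losses.mean())          # eq. (3.35)
g_max  = np.abs(losses).max()            # eq. (3.36), worst case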

Types of Cost Functions
The Mean Squared Error (MSE) is the most widely used CF in Regression problems. It combines the SqrL with the Mean Aggregation Function.

C_mse(θ) = g_mean(l¹, ..., lᵐ) = (1/m) Σ_{j=1}^{m} L_2(ŷ^j, y^j) = (1/m) Σ_{j=1}^{m} (ŷ^j − y^j)²   (3.37)

∂C_mse(θ)/∂θ_i = (1/m) Σ_{j=1}^{m} (∂L_2/∂ŷ^j)(∂ŷ^j/∂θ_i) = (2/m) Σ_{j=1}^{m} (ŷ^j − y^j)(∂ŷ^j/∂θ_i)   (3.38)

MSE computes the average SqrL over the TrS. It is sensitive to outliers, which is a result of the fact that the SqrL penalizes large errors disproportionately. During Learning, the model is pushed towards reducing large prediction errors to the detriment of smaller ones. Regression models trained on MSE learn estimators for the conditional mean of the targets given the inputs, E(Y|x) = ∫_Y y dP(y|x) [82, ch. 3.1.2].

The Mean Absolute Error (MAE) is another CF used in Regression. It combines the AbsL and Mean Aggregation.

C_mae(θ) = g_mean(l¹, ..., lᵐ) = (1/m) Σ_{j=1}^{m} L_1(ŷ^j, y^j) = (1/m) Σ_{j=1}^{m} |ŷ^j − y^j|   (3.39)

∂C_mae(θ)/∂θ_i = (1/m) Σ_{j=1}^{m} (∂L_1/∂ŷ^j)(∂ŷ^j/∂θ_i) = (1/m) Σ_{j=1}^{m} sign(ŷ^j − y^j)(∂ŷ^j/∂θ_i)   (3.40)

MAE computes the average AbsL over the TrS. It is more robust to outliers than MSE, as it places the same importance on large and small prediction errors. However, it is harder to optimize than MSE due to it not being differentiable everywhere. Since in ML numerical LAs are common, this does not constitute an obstacle. In the case of a one-dimensional target, Regression models trained on MAE learn estimators for the conditional median of the target given the input, i.e. an estimator for m, such that ∫_{−∞}^{m} dP(y|x) ≥ 1/2 and ∫_{m}^{∞} dP(y|x) ≥ 1/2 [82, ch. 3.1.2].¹ Analogously, the Negative Log Likelihood CF and the Cross Entropy CF are defined by combining the NLL Loss and the CE Loss with Mean Aggregation. In practical DL applications, additional terms implementing some form of Regularization² are often added to the CF. Typically, these terms take the form of norms on the parameter vector. For instance, the Weight Decay (WD) term α‖θ‖_p with p = 1 or 2 effectively restricts the capacity of the model, with the regularization parameter α controlling the strength of the effect. This is an instance of Structural Risk Minimization (SRM).³
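The difference between the two Costs is easiest to see in code. The following minimal NumPy sketch (illustrative, not the thesis implementation) evaluates both Costs and their gradients with respect to the predictions:

import numpy as np

y_hat = np.array([1.0, 2.5, 0.0])
y     = np.array([1.2, 2.0, 4.0])   # last example is an outlier

m = len(y)
c_mse = np.mean((y_hat - y) ** 2)        # eq. (3.37)
c_mae = np.mean(np.abs(y_hat - y))       # eq. (3.39)

grad_mse = 2.0 / m * (y_hat - y)         # eq. (3.38), w.r.t. y_hat
grad_mae = 1.0 / m * np.sign(y_hat - y)  # eq. (3.40), w.r.t. y_hat

# The outlier dominates grad_mse, while every example contributes
# equally (up to sign) to grad_mae.
print(grad_mse, grad_mae)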

3.1.6 Types of Problems
Regression
In the context of Machine Learning (ML), Regression refers to predicting one or multiple continuous target variables⁴ from one or multiple input variables⁵ [84, ch. 3.2; 4, ch. 1.2.2]. The use of the term differs slightly from its use in statistics, where it simply refers to modeling the relationship between dependent and independent variables, without the requirement for the dependent variables to be continuous [85]. In ML, the term Regression is essentially synonymous with Curve Fitting [86]. The purpose of Regression models is to infer and quantify causal relationships between variables that can then be used for prediction. Examples of Regression problems include:
• predicting house prices from their square footage and number of rooms
• predicting daily returns of a stock index based on the daily returns of the previous five days
• predicting Traffic Speeds at different locations given previous measurements and auxiliary variables⁶

1 The conditional median is a special conditional quantile inf{y : P(y|x) ≥ τ}, τ ∈ [0, 1], with τ = 0.5. CFs for arbitrary conditional quantiles exist. In particular, C_qτ(θ) = (1/m) Σ_{j=1}^{m} L_qτ(ŷ^j, y^j) with L_qτ(ŷ, y) = 2(ŷ − y)(I(ŷ − y > 0) − τ). Notice that C_q0.5 and L_q0.5 reduce to C_mae and L_1, respectively [83].
2 compare 3.2.5
3 compare 3.1.4, Structural Risk Minimization
4 also referred to as labels, dependent variables, explained variables, predicted variables, endogenous variables, response variables or regressands
5 also referred to as features, independent variables, explanatory variables, predictor variables, exogenous variables, exposure variables or regressors
6 compare 4

The semantics of a Regression model are intimately tied to the Cost Function (CF)¹ used to train it. In most instances, a model of the mean of the targets conditional on the inputs² is considered. Moreover, models of the conditional median or arbitrary conditional quantiles³, and even of the conditional mode⁴, exist. Formally⁵, in Regression, one constructs a parameterized model h(x; θ) with parameters θ ∈ Θ that maps the n-dimensional input⁶ x to predictions ŷ of the targets y ∈ R^k, i.e. ŷ = h(x; θ). For one-dimensional targets⁷ y and a Training Set (TrS) of m samples of input-target pairs S_m = {(x¹, y¹), ..., (xᵐ, yᵐ)}, the most common CF associated with Regression is the Mean Squared Error (MSE) Cost C_mse based on the Squared Loss L_2

C_mse(θ) = (1/m) Σ_{j=1}^{m} L_2(ŷ^j, y^j) = (1/m) Σ_{j=1}^{m} (h(x^j; θ) − y^j)²   (3.41)

This Cost Function is an instance of the Empirical Risk (ER) based on a sample of size m, i.e. C_mse(θ) = R̂_m(θ), which is an estimator of the true Risk

R(θ) = E_{X,Y}[L_2(h(X; θ), Y)] = ∫_{X×Y} L_2(h(x; θ), y) dP(x, y)   (3.42)

C_mse(θ) reflects the average squared error over the TrS induced by the parameter configuration θ, and the Learning Problem consists in finding the parameter vector θ̂*_m that minimizes this CF

θ̂*_m = arg min_{θ∈Θ} C_mse(θ)   (3.43)

This is an instance of Empirical Risk Minimization (ERM) and equivalent to solving a general optimization problem [88, ch. 4]. C_mse(θ̂*_m) reflects the Training Performance of the trained model, i.e. C_mse(θ̂*_m) = R̂_m(θ̂*_m), and is an estimator of its Generalization Performance R(θ̂*_m). Lastly, h(x; θ̂*_m) is an estimator for the expectation of Y, conditional on x⁸

E[Y|x] = ∫_{−∞}^{∞} y dP(y|x) = ∫_{−∞}^{∞} y p(y|x) dy   (3.44)

1 compare 3.1.5, Definition Cost Function
2 This models the expectation of the targets given the inputs.
3 This is referred to as Quantile Regression and, for a particular quantile q_τ with τ ∈ [0, 1], models the value below which the target falls with a probability of τ, given the inputs. For τ = 0.5, Median Regression is recovered. Note that the median is different from the mean if the conditional density is asymmetric [83].
4 This is referred to as Mode Regression or Modal Regression and models the most likely target given the inputs. Note that the mode is different from the mean and from the median if the conditional density is asymmetric [87].
5 compare 3.1.4
6 The elements of x can be continuous or discrete and may each have different domains.
7 This formalism trivially generalizes to the multidimensional case y ∈ R^k.
8 In order to model conditional quantiles q(Y|x) = inf{y : P(y|x) ≥ τ}, τ ∈ [0, 1], or the conditional mode mode(Y|x) = arg max_y p(y|x), the Squared Loss L_2(ŷ, y) has to be replaced appropriately. For instance, for the conditional median m = q_0.5, the Absolute Loss L_1(ŷ, y) is used, while for arbitrary conditional quantiles, L_qτ(ŷ, y) is used; compare 3.1.5. For the conditional mode, more esoteric Loss Functions are available [87].

The simplest example of Regression is Linear Regression (LinR) [89, p. 26] in n input variables and a one-dimensional target. In this case, the Hypothesis Space is the set of linear functions in n variables, i.e. the set of hyperplanes {θ^T x : θ ∈ R^{n+1}}, and the parameter space is the (n+1)-dimensional Euclidean Space. A bias term θ_0 is accommodated by introducing a degenerate feature x_0 that is always equal to one. Hence, the line h(x; θ) = θ_0 + Σ_{i=1}^{n} θ_i x_i = θ^T x is fitted to the Training Data S_m using the C_mse Cost Function. LinR has the closed-form solution θ̂*_m = (X^T X)^{−1} X^T y, where X is an m × (n+1) feature matrix with [X]_{ji} = x^j_i and y is an m-dimensional target vector. Figure 3.3 shows a randomly generated problem instance for n = 1.

Figure 3.3: ŷ(x) is the solution of a randomly generated LinR problem based on a Training Set of size m = 50. For comparison, the solution ŷ(x, x²) of a slightly more complicated model, which includes x² as an explanatory variable, is shown as well. Even though a quadratic term is included, this is still referred to as LinR. Evidently, the quadratic model is better able to capture the global structure of the data.
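The closed-form solution is a one-liner in NumPy. The sketch below (illustrative names, randomly generated data in the spirit of the figure) fits both the linear and the quadratic model; np.linalg.solve is used instead of an explicit matrix inverse for numerical stability:

import numpy as np

rng = np.random.default_rng(0)
m = 50
x = rng.uniform(-3, 3, m)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0, 0.5, m)  # quadratic ground truth

def fit_linr(X, y):
    """Closed-form LinR solution theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X_lin  = np.column_stack([np.ones(m), x])        # degenerate feature x0 = 1
X_quad = np.column_stack([np.ones(m), x, x**2])  # still LinR: linear in theta

theta_lin  = fit_linr(X_lin, y)
theta_quad = fit_linr(X_quad, y)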

Classification
In the context of ML, Classification refers to predicting one or multiple discrete target variables from one or multiple input variables, where each target variable belongs to exactly one of multiple distinct classes, also called categories. In the special case of two classes, this is called Binary Classification (BC), otherwise Multiclass Classification (MC) [84, ch. 2.1; 4, ch. 1.2.1]. In BC, the classes are often referred to as the Positive and the Negative Class. One can distinguish direct Classification methods that directly predict class membership, e.g. a Perceptron¹, and indirect Classification methods that first solve the intermediate problem of predicting a probability distribution over class labels, e.g. a Multilayer Perceptron (MLP)² with Softmax Output Layer (SOL)³. Hence, indirect Classification models are actually Regression models in the statistics sense, whose continuous output must be further processed in order to predict a discrete class label.

1 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Perceptron 2 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Multilayer Perceptron 3 compare 3.2.2, Types of Activation Functions, Softmax Activation

Classification models partition Feature Space into non-overlapping subsets, one for each class, such that each point in Feature Space maps to a unique class. A subset associated with a particular class need not be connected, i.e. it can be the union of several disconnected sets. In particular in BC, the partitioning hypersurface is called the Decision Boundary. If the Classifier induces a linear Decision Boundary, it is referred to as a Linear Classifier. While the purpose of direct Classification is exclusively predicting class labels, indirect Classification can also be employed to infer and quantify causal relationships between input variables and class probabilities, as well as for sampling from the predicted probability distribution. Examples of Classification problems include:
• classifying emails into spam and not spam
• classifying whether the price of a stock index will go up or down based on whether it went up or down during the previous five days
• classifying whether there will be traffic Congestion at different locations given previous measurements at these locations and auxiliary variables¹
Formally, in Classification, one constructs a parameterized model h(x; θ) with parameters θ ∈ Θ that maps the n-dimensional input² x to predictions ŷ of the targets³ y, i.e. ŷ = h(x; θ).

For a one-dimensional binary target y and a TrS of m samples of input-target pairs S_m = {(x¹, y¹), ..., (xᵐ, yᵐ)}, a possible CF associated with Classification is the Indicator Cost C_0/1 based on the Indicator Loss L_0/1

C_0/1(θ) = (1/m) Σ_{j=1}^{m} L_0/1(ŷ^j, y^j) = (1/m) Σ_{j=1}^{m} I(h(x^j; θ) ≠ y^j)   (3.45)

This requires the type of Classifier h(x; θ) that maps directly into class labels {0, 1} or {−1, 1}. In this case, C_0/1(θ) reflects the fraction of misclassified training cases induced by the parameter configuration θ, and the Learning Problem consists in finding the parameter vector θ̂*_m that minimizes this fraction

θ̂*_m = arg min_{θ∈Θ} C_0/1(θ)   (3.46)

Unless C_0/1(θ̂*_m) = 0⁴, this problem is NP-hard [77]. Therefore, in practice, different CFs are employed. A popular CF for BC is the Negative Log Likelihood (NLL) Cost C_nll for BC based on the NLL Loss for BC, L_nll

C_nll(θ) = (1/m) Σ_{j=1}^{m} L_nll(h(x^j; θ), y^j)
         = −(1/m) Σ_{j=1}^{m} (y^j ln h(x^j; θ) + (1 − y^j) ln(1 − h(x^j; θ)))   (3.47)

1 compare 4
2 The elements of x could be continuous or discrete, and may each have different domains.
3 If only one target variable is present, it is encoded as a binary scalar y ∈ {0, 1} or y ∈ {−1, 1}, where 1 is referred to as the Positive Class and 0 or −1 as the Negative Class. If the target takes k > 2 values, it is encoded as a k-dimensional One-Hot Vector, i.e. y ∈ {0, 1}^k or y ∈ {−1, 1}^k, where exactly one element of the vector is equal to 1. If l target variables are present, the encodings are y ∈ {0, 1}^l or y ∈ {−1, 1}^l, and y ∈ {0, 1}^{kl} or y ∈ {−1, 1}^{kl}, respectively. Depending on the type of Classification method used, predictions either have the same domain as the targets (direct Classification) or reflect real-valued probabilities (indirect Classification), i.e. ŷ ∈ [0, 1], ŷ ∈ [0, 1]^k, ŷ ∈ [0, 1]^l, and ŷ ∈ [0, 1]^{kl}, respectively, where in the second and last case vector elements referring to the same variable sum to one.
4 compare 3.2.4, Perceptron Learning Algorithm

This Cost Function is appropriate for the type of Classifier h(x; θ) that maps to probabilities. h(x; θ̂*_m) is an estimator for the probability of the event "Y belongs to the Positive Class", conditional on x, i.e. p̂(1|x) = h(x; θ̂*_m). Furthermore, p̂(0|x) = 1 − h(x; θ̂*_m), since there are only two classes. One has thus obtained a complete estimate p̂(y|x) of the distribution of Y, conditional on x. Subsequently, Classification is performed using a decision rule δ, i.e. ŷ = δ(h(x; θ̂*_m)). Typically, the mode of the estimated conditional distribution, i.e. the most likely y, is predicted as the class label

ŷ = arg max_{y∈Y} p̂(y|x)   (3.48)

In BC using the NLL Cost, this is equivalent to ŷ = I(h(x; θ̂*_m) > 0.5) in case y ∈ {0, 1}, or ŷ = I(h(x; θ̂*_m) > 0.5) − I(h(x; θ̂*_m) ≤ 0.5) in case y ∈ {−1, 1}.
MC, with k > 2 classes, can be implemented in one of several ways. For instance, one can train k binary Classifiers mapping to probabilities, such that the ith Classifier learns to distinguish between targets of class i and targets from all other classes. Alternatively, one can train k(k−1)/2 pairwise Classifiers, one for every possible pair of different classes. In order to pick a class label, the results from these binary Classifiers are then aggregated in one of several ways, e.g. using voting schemes [84, ch. 3.1]. Preferably, a true multiclass Classifier, such as an MLP with SOL, should be used instead of multiple binary Classifiers.
One of the simplest examples of a Classification model is Logistic Regression (LogR) [89, p. 128] in n input variables and a one-dimensional binary target. In this case, the Hypothesis Space is the set of functions {σ(θ^T x) : θ ∈ R^{n+1}}, where σ(z) = 1/(1 + e^{−z}) is the Logistic Function. The parameter space is the (n+1)-dimensional Euclidean Space. A bias term θ_0 is accommodated by introducing a degenerate feature x_0 that is always equal to one. Hence, the parameterized Logistic Function h(x; θ) = σ(θ_0 + Σ_{i=1}^{n} θ_i x_i) = σ(θ^T x) is fitted to the Training Data S_m using the C_nll CF. LogR does not have a closed-form solution, i.e. a numerical Learning Algorithm (LA) is required. The probabilities predicted by the trained model are then fed into a Decision Rule δ that assigns the Positive Class label if h(x; θ̂*_m) > 0.5 and the Negative Class label otherwise. The Decision Rule thus defines a linear Decision Boundary in the form of an (n−1)-dimensional hyperplane, embedded in the n-dimensional Feature Space.¹ Figure 3.4 shows a randomly generated problem instance for n = 2.

1 This is easy to see, since h(x; θ̂*_m) > 0.5 is equivalent to θ^T x > 0, which defines a half-space. The acceptance region for the Positive Class is separated from the rejection region by the hyperplane defined by θ^T x = 0. For n = 2, this leads to θ_0 + θ_1 x_1 + θ_2 x_2 > 0 ⇒ x_2 > −(θ_1/θ_2) x_1 − (θ_0/θ_2).

Figure 3.4: b̂_1 is the solution, i.e. Decision Boundary, of a randomly generated LogR problem based on a Training Set of size m = 100. For comparison, the solution b̂_2 of a slightly more complicated model, including x_1², x_2² and x_1 x_2 as explanatory variables, is shown as well. The nonlinear Classifier has more capacity and is able to correctly classify the small population of positive samples in the region of Feature Space where x_1 and x_2 are simultaneously small.
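Since LogR has no closed-form solution, a numerical LA such as plain Gradient Descent can be used. The following minimal NumPy sketch (illustrative names, hyperparameters and toy data; not the thesis implementation) minimizes the C_nll Cost (3.47):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logr(X, y, lr=0.1, n_steps=1000):
    """Gradient descent on the NLL Cost (3.47); X includes the x0 = 1 column."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_steps):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m   # gradient of C_nll w.r.t. theta
        theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)         # linearly separable toy labels

theta = fit_logr(X, y)
y_hat = (sigmoid(X @ theta) > 0.5).astype(float)  # decision rule delta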

Density Estimation
In Density Estimation (DE), one estimates the (joint) probability density function of a set of random variables given a finite number of samples from the associated distribution [4, ch. 1.3, 14.7, 26.1]. Here, the term "density" is used for both the continuous and the discrete case. DE can be categorized by what assumptions are made a priori about the density. In parametric DE, a parametric¹ density is assumed whose parameters are predicted from the input. If a parametric Mixture Density [4, ch. 11.2], e.g. a Gaussian Mixture Model (GMM), is assumed, some authors speak of semiparametric DE [90, 91].² Nonparametric DE is a more general approach in which no assumptions regarding the density are made. Instead, the model implicitly encodes it via the model parameters.³

1 The term "parametric" refers to the distribution parameters ϕ ∈ Φ which fully specify the density, e.g. the mean and standard deviation of a Normal Distribution. These parameters are distinct from the model parameters θ ∈ Θ.
2 The reason for this is that, asymptotically in the number of components, a Mixture Density can approximate any density over a given domain arbitrarily well. For instance, a Mixture of Gaussians can approximate any density over real vectors, and a Mixture of Bernoulli distributions can approximate any density over binary vectors [92].
3 compare 3.2.3, Undirected Models, Boltzmann Machine and Restricted Boltzmann Machine

DE can be further categorized by whether a conditional or an unconditional density is modeled. Conditional DE refers to directly modeling the density of the targets, conditional on the input. Neither a joint density model of inputs and targets, nor a model of the inputs alone is learned.¹ Therefore, methods for conditional DE can be viewed as Discriminative Models (DisMs)² and are learned using Supervised Learning (SL)³. Associated models are generally parametric. In unconditional DE, the joint density of the data is modeled. Since there is no distinction between inputs and targets, methods for unconditional DE can be viewed as Generative Models (GenMs) and are learned using Unsupervised Learning (UL).
The purpose of DE is to obtain a complete picture of the characteristics of the modeled data. In fact, a set of random variables is fully specified by its joint density. The estimated density can then be used to perform Prediction or Classification and to quantify the associated uncertainty. Another use case of DE is sampling, i.e. drawing random samples from the learned density for the purpose of simulation or the generation of synthetic data [94]. Examples of DE problems include:
• estimating the density of the mass of stars (unconditional DE)
• estimating the density of the daily change in the USD/EUR exchange rate, conditional on the previous five changes (conditional DE)
• estimating the density of traffic speeds at multiple locations, conditional on previous measurements at these locations and auxiliary variables⁴ (conditional DE)
Formally, in parametric conditional DE, one assumes a parametric conditional distribution p(y; ϕ) of the targets y, parameterized and fully specified by the distribution parameters ϕ ∈ Φ. One then constructs a model h(x; θ), parameterized by model parameters θ ∈ Θ, that maps the n-dimensional input x to estimates of the distribution parameters ϕ̂ = h(x; θ). Thus, one obtains an estimate of the density of the targets, conditional on the input: p̂(y|x; θ) = p̂(y; ϕ̂) = p̂(y; h(x; θ)).
In parametric unconditional DE, one assumes a parametric distribution p(x; ϕ) of the data x, parameterized and fully specified by the distribution parameters ϕ ∈ Φ. In this case, h(x; θ) is a direct model for the density whose model parameters correspond to the distribution parameters, i.e. p̂(x; ϕ) = p̂(x; θ) = h(x; θ). This is equivalent to a simple distribution parameter estimation problem.
In nonparametric conditional DE, one constructs a model h(x; θ) for an estimate of the density of the targets, conditional on the input, p̂(y|x; θ) = h(x; θ). No explicit assumptions are made about the density.
In nonparametric unconditional DE, one constructs a model h(x; θ) for an estimate of the density of the data, p̂(x; θ) = h(x; θ). Although this looks equivalent to the parametric unconditional case, density models are usually constructed implicitly as a product of factors.⁵ As a result, computing probabilities, much less obtaining an explicit expression for the density, is typically infeasible. Asymptotically in the number of Artificial Neurons (ANs), these models are capable of encoding any distribution over a given domain arbitrarily well [95].⁶

1 One could always solve the more general problem of modeling a full joint density of inputs and targets, and then use Bayes' Theorem to obtain the conditional density of the targets given the inputs. However, this means solving the problem indirectly, which has drawbacks [93, p. 477].
2 compare 3.1.7, Discriminative vs. Generative
3 compare 3.1.2, Supervised Learning
4 compare 4
5 compare 3.2.3, Undirected Models, Boltzmann Machine and Restricted Boltzmann Machine
6 These models are at the intersection of Artificial Neural Networks (ANNs) and Probabilistic Graphical Models (PGMs) [11].

For a one-dimensional target y and a TrS of m samples of input-target pairs S_m = {(x¹, y¹), ..., (xᵐ, yᵐ)}, the most common CF associated with parametric conditional DE is the NLL Cost C_nll based on the NLL Loss L_nll

C_nll(θ) = (1/m) Σ_{j=1}^{m} L_nll(h(x^j; θ), y^j)
         = −(1/m) Σ_{j=1}^{m} ln p̂(y^j; h(x^j; θ)) = −(1/m) Σ_{j=1}^{m} ln p̂(y^j|x^j; θ)   (3.49)

C_nll(θ) reflects the average NLL taken over the TrS, and the Learning Problem consists in finding the parameter vector θ̂*_m that minimizes this Cost

θ̂*_m = arg min_{θ∈Θ} C_nll(θ)   (3.50)

This is equivalent to the Maximum Likelihood method [96], with the only difference that the equivalent minimization problem is solved. h(x; θ̂*_m) is an estimator for the distribution parameters ϕ of p(y; ϕ), giving rise to p̂(y|x; θ̂*_m). The remaining three cases, i.e. parametric unconditional, nonparametric conditional, and nonparametric unconditional, are formally analogous, except that in the last case approximations to the NLL Loss may be used, due to the fact that it may not be feasible to obtain a closed-form expression for the density.
One of the simplest, interesting examples of DE is fitting a k-component GMM to observed data. This falls into the category of parametric unconditional DE. For one-dimensional data, the Hypothesis Space is the set of mixtures of Normal Densities with k components, i.e. the set of functions {Σ_{i=1}^{k−1} θ_i φ(x; θ_{k−1+i}, θ_{2k−1+i}) + (1 − Σ_{i=1}^{k−1} θ_i) φ(x; θ_{2k−1}, θ_{3k−1}) : θ_1, ..., θ_{k−1} ∈ [0, 1]; θ_k, ..., θ_{2k−1} ∈ R; θ_{2k}, ..., θ_{3k−1} ∈ R_+}, where φ(x; μ, σ) is the Normal Density of X, parameterized by its mean μ and standard deviation σ. The model parameter vector θ = (θ_1, ..., θ_{3k−1})^T = (ρ_1 ... ρ_{k−1} μ_1 ... μ_k σ_1 ... σ_k)^T, where ρ_i is the mixture weight of the ith component density. For k = 2, the density h(x; θ) = θ_1 φ(x; θ_2, θ_4) + (1 − θ_1) φ(x; θ_3, θ_5) is fitted to the Training Data S_m = {x¹, ..., xᵐ} using the C_nll Cost. Figure 3.5 shows a randomly generated problem instance.

Figure 3.5: p̂_2(x) is the solution density of a randomly generated GMM DE problem based on a Training Set of size m = 2000. A 2-component GMM was assumed a priori for p̂_2(x), while the Training Set is drawn from a 3-component GMM. For comparison, the solution density p̂_3(x), for which a 3-component GMM was assumed, is shown as well. The 2-component model is unable to fit the data well, since it is forced to aggregate two of the three mixture components. The 3-component model, on the other hand, recovers a good estimate of the true density.
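A direct way to reproduce such an experiment is to minimize the C_nll Cost (3.49)–(3.50) numerically; the sketch below (illustrative, using SciPy's general-purpose optimizer rather than a dedicated LA such as Expectation-Maximization) fits a 2-component GMM to one-dimensional data:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
# Training data drawn from a 2-component mixture
x = np.concatenate([rng.normal(-2, 0.5, 1000), rng.normal(1, 1.0, 1000)])

def nll(theta):
    """C_nll for a 2-component GMM: theta = (rho1, mu1, mu2, sigma1, sigma2)."""
    rho1, mu1, mu2, s1, s2 = theta
    p = rho1 * norm.pdf(x, mu1, s1) + (1 - rho1) * norm.pdf(x, mu2, s2)
    return -np.mean(np.log(p + 1e-300))   # guard against log(0)

theta0 = np.array([0.5, -1.0, 0.5, 1.0, 1.0])
bounds = [(0.01, 0.99), (None, None), (None, None), (0.01, None), (0.01, None)]
res = minimize(nll, theta0, bounds=bounds, method="L-BFGS-B")
print(res.x)   # estimated (rho1, mu1, mu2, sigma1, sigma2)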

3.1.7 Types of Models
In Machine Learning (ML), one can distinguish models by whether they are discriminative or generative [97], and by whether they are deterministic or probabilistic [84, ch. 1.2; 63, p. 348].

Discriminative vs. Generative
Discriminative Models (DisMs) either learn a model of the density of the targets y conditional on the inputs x, p(y|x), or a quantity derived from this conditional density, e.g. the conditional mean μ(Y|x) = ∫_Y y dP(y|x) = ∫_Y y p(y|x) dy, the conditional mode mode(Y|x) = arg max_y p(y|x), or, if y is univariate, a conditional quantile q_τ(Y|x) = inf{y : P(y|x) ≥ τ}, τ ∈ [0, 1], with the conditional median as a special case, m(Y|x) = q_0.5(Y|x). In DisMs, neither a joint model of the inputs and targets, nor a model of the inputs alone is learned. They are generally learned using Supervised Learning (SL)¹ and are commonly employed for Regression, Classification and conditional Density Estimation (DE) tasks².
Generative Models (GenMs) learn a model of the joint density of targets and inputs, p(x, y). By the product rule of probability, p(x, y) = p(y|x)p(x) holds. Therefore, a GenM is more general than a DisM, since it implicitly constructs a full model of the Data Generating Process.³ In fact, a DisM can be obtained from a GenM by dividing the joint density by the marginal density of the inputs, i.e. p(y|x) = p(x, y)/p(x). The marginal density of the inputs can be obtained by marginalizing out the targets, i.e. p(x) = ∫_Y p(x, y) dy.⁴ In some instances, there is no distinction between targets and inputs, in which case there is just data x. Models learning the joint density p(x) are also considered GenMs. GenMs are learned using SL⁵ or Unsupervised Learning (UL)⁶, and are commonly employed for joint DE and, indirectly⁷, for Regression and Classification tasks.
Since they make fewer assumptions on the structure of the data, DisMs are preferred over GenMs if predicting targets is the objective. It has been shown [98] that GenMs have a higher asymptotic error than the corresponding DisMs. However, GenMs converge faster to this higher error. On small Data Sets, GenMs can outperform DisMs if simplifying assumptions on the structure of the data are introduced. An extreme example of this is the conditional independence assumption of the Naive Bayes model [99]. These additional assumptions act as a regularizer⁸, preventing the GenM from picking up on spurious patterns. However, as a result of their restrictive assumptions, GenMs trained on large Data Sets may be biased, rendering it impossible to capture certain dependencies that a DisM would be able to capture.

Deterministic vs. Probabilistic
Deterministic Models (DetMs) output properties associated with a probability density, such as an expectation, quantiles, or its mode. These derived quantities alone are, in general, not sufficient to fully characterize the underlying density. In other words, a DetM returns concrete values rather than a distribution over values. Typically, DetMs are DisMs.⁹

1 compare 3.1.2, Supervised Learning
2 compare 3.1.6, Density Estimation
3 compare 3.1.1, Terminology
4 Vice versa, a DisM can be extended into a GenM by constructing a model for the inputs and multiplying it with the conditional density, i.e. p(x, y) = p(y|x)p(x). However, p(x) would have to be learned from scratch.
5 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Autoencoder
6 compare 3.2.3, Undirected Models, Boltzmann Machine and Restricted Boltzmann Machine
7 by first modeling a joint density, from which a conditional density is obtained that, in turn, is the basis for solving the Regression or Classification task as in the DisM case
8 compare 3.2.5
9 compare 3.2.3, Directed Models, Feedforward Neural Networks, Multilayer Perceptron

Probabilistic Models (ProMs) either output distribution parameters sufficient to fully characterize a probability density¹, or implicitly represent a probability density parameterized by the model parameters². In the latter case, randomness is intrinsically generated by the model³, e.g. by Stochastic Neurons⁴ in an Artificial Neural Network (ANN). A ProM can be used to generate random samples from the output distribution. ProMs can be DisMs⁵ or GenMs⁶.

3.1.8 Hyperparameter Optimization
Definition

A Learning Algorithm (LA) A maps a Training Data Set S_m^train to an optimal hypothesis function h* that best captures certain characteristics of the data distribution.⁷ The LA arrives at h* by optimizing a Cost Function (CF)⁸ that depends on h(x; θ) with respect to the model parameters θ ∈ Θ, thus finding the optimal model parameters θ̂*_m. A is itself parameterized by a set of Hyperparameters (HPs) λ ∈ Λ

h* = h(x; θ*) = A(θ_0, S_m^train; λ)   (3.51)

Hence, the HPs λ are meta-parameters of the LA, distinct from the model parameters θ. In the case of an Artificial Neural Network (ANN), the network weights represent the model parameters θ, while the Learning Rate (LR)⁹ and the regularization parameters of the CF are examples of HPs.¹⁰
Hyperparameter Optimization (HPO) is implemented in the context of Hold-Out Validation, which is a model selection method [63, ch. 7]. The Data Set is partitioned into a Training Set (TrS), a Validation Set (VaS) and a Test Set (TeS). Different models are trained on the TrS using different HP settings λ. The objective of Learning is to find the HP setting that minimizes the Generalization Error (GE)¹¹. Since the GE cannot be computed, it is estimated using the Validation Error (VaE). Thus, the goal of HPO is to find the HP setting λ* that minimizes the VaE

λ* = arg min_{λ∈Λ} (1/n) Σ_{(x,y)∈S_n^val} L(h(x; θ*), y)
   = arg min_{λ∈Λ} (1/n) Σ_{(x,y)∈S_n^val} L(A(θ_0, S_m^train; λ)(x), y)   (3.52)

where L is a Loss Function (LF) and S_n^val = {(x¹, y¹), ..., (xⁿ, yⁿ)} is the VaS containing n samples. Notice that the evaluation of A(θ_0, S_m^train; λ) constitutes an inner loop that involves training an entire model, and is therefore computationally expensive. Notice further that the Validation Set is never directly used for Training. Rather, it serves as out-of-sample data to estimate the

1 compare 3.2.3, Directed Models, Mixture Density Models 2 compare 3.2.3, Undirected Models, Boltzmann Machine and Restricted Boltzmann Machine 3 The inputs are assumed to be random, i.e. to be generated by the density p(x). However, this does not count as intrinsic randomness. 4 compare 3.2.2, Stochastic Binary Activation 5 compare 3.2.3, Directed Models, Mixture Density Models 6 compare 3.2.3, Undirected Models, Boltzmann Machine and Restricted Boltzmann Machine 7 compare 3.1.1, Terminology 8 compare 3.1.5, Definition Cost Function 9 compare 3.2.4 10 Sometimes, quantities that are not parameters of A are consideredHPs, e.g. the number of Hidden Units in a particular Hidden Layer of an ANN. 11 compare 3.1.1 and 3.2.5

Generalization Performance of the trained models. The TeS is reserved for estimating the GE of the final model, after HPO. The problem of HPO has thus been cast as a global optimization problem

λ* = arg min_{λ∈Λ} Ψ(λ)   (3.53)

Ψ is called the Response Function, and Ψ(λ) over λ ∈ Λ is called the Response Surface. Particular elements λ ∈ Λ are referred to as trial points. An important question is how to choose these trial points so as to avoid unnecessary evaluations of Ψ. Bergstra and Bengio [100] show that, for a given Data Set, typically only few of the HPs have a significant impact on Validation Performance. This phenomenon is referred to as Low Effective Dimensionality of Ψ. However, the subset of relevant HPs is, in general, different for different Data Sets.

Types of Hyperparameter Optimization
Manual Search This method involves guessing different HP settings λ in a trial-and-error manner, guided only by the intuition of the researcher. Manual Search (MS) is often used in practice, since it does not require additional programming. It is sometimes implemented in combination with other methods [101, 102]. The main advantage of MS is its easy implementation. Moreover, it allows researchers to gain intuition about the Response Surface while exploring different HP settings. The main drawbacks of MS are its tediousness and that, in the presence of a large number of parameters, a person can, due to cognitive limitations, easily fail to notice regularities in the Response Surface.

Grid Search In Grid Search (GS), a HP grid is set up that is then searched exhaustively. As a first step, Λ is bounded and discretized, often logarithmically. Hence, for an n-dimensional HP space, an n-dimensional grid is obtained whose grid points represent all possible HP configurations. Subsequently, Ψ(λ) is evaluated for every grid point, requiring the training of ∏_k d_k models, where d_k is the number of grid points along the kth dimension. The HP setting corresponding to the lowest VaE is retained. Apart from being easy to implement and parallelize, GS is advantageous for low-dimensional λ, as it searches the entire HP space systematically. Depending on the granularity of the grid, it is unlikely to miss the optimal HP configuration by much. However, for high-dimensional λ, the method falls victim to the Curse of Dimensionality [103], since the number of grid points grows exponentially in the number of dimensions. Another drawback of GS is that it wastes time evaluating models along HP axes that have little effect on the objective, i.e. it may train different complicated models varying only in irrelevant parameters.

Random Search In Random Search (RS), HP settings are picked uniformly at random over Λ. This method has been shown to be more efficient than GS. Bergstra and Bengio [100] show that, for ANNs, RS finds better HP configurations than GS in a fraction of the time. This is a consequence of the Low Effective Dimensionality of Ψ. RS explores the entire HP space evenly, i.e. all trials try different settings for each HP, as opposed to exactly repeating the same settings for subsets of HPs while varying irrelevant ones. Just like MS and GS, RS is easy to implement and parallelize. It is often the default method for HPO, and its results can be used as a baseline for comparison with more sophisticated methods.
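A minimal sketch of RS over a two-dimensional, logarithmically scaled HP space follows; train_and_validate is a hypothetical stand-in for the expensive inner loop, i.e. training a model under the given HPs and returning its VaE:

import numpy as np

rng = np.random.default_rng(3)

def train_and_validate(lr, weight_decay):
    """Hypothetical inner loop: train a model with these HPs, return the VaE."""
    return (np.log10(lr) + 2) ** 2 + 0.1 * (np.log10(weight_decay) + 4) ** 2

best_vae, best_hp = np.inf, None
for _ in range(50):                    # 50 trial points
    lr = 10 ** rng.uniform(-5, 0)      # log-uniform over [1e-5, 1]
    wd = 10 ** rng.uniform(-6, -1)
    vae = train_and_validate(lr, wd)
    if vae < best_vae:
        best_vae, best_hp = vae, (lr, wd)

print(best_hp, best_vae)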

Bayesian Hyperparameter Optimization Bayesian Hyperparameter Optimization (BHPO) is an application of Bayesian Optimization (BO), a gradient-free, global optimization procedure based on a trade-off between Exploitation and Exploration of the search space [104, 105]. In BO, the objective is treated as a random function for which a prior distribution is assumed. Function evaluations are considered experiments whose outcomes constitute data used to obtain the posterior distribution over the objective.¹ The posterior is then used to select a point for the subsequent evaluation of the objective, i.e. to propose a new experiment. The specific selection criterion, which is expressed by an Acquisition Function, is not only based on improvement of the objective (Exploitation), but also on visiting promising regions of the search space (Exploration). Therefore, a point with a poor expected objective function value could be chosen if it also has more uncertainty associated with it. After each function evaluation, the posterior becomes the new prior and the process is repeated. Hence, BO explores the search space intelligently, attempting to gain information about the location of the optimum using as few objective function evaluations as possible, by implementing a sequence of few high-quality experiments. In contrast to gradient-based optimization, which relies on local information, BO uses the information obtained from all previous evaluations of the objective. This procedure can be parallelized, such that different experiments are run simultaneously. Furthermore, it is possible to take the variable evaluation time of the objective under different parameter settings into account [106]. BO is therefore well-suited for HPO, where evaluation of the objective Ψ is expensive and only few configurations can be explored. In this setting, it is advantageous to spend additional computational resources on searching for an appropriate HP setting to evaluate next. BHPO has been demonstrated to outperform MS and GS, as well as RS [107].
Typically, a Gaussian Process (GP) is used as the function prior [106]. A GP, X(t) ∼ GP(m(t), k(t, t′)), with Mean Function m(t) and positive semidefinite Covariance Function k(t, t′) on a set T, is a set of random variables {X(t) : t ∈ T}, such that ∀ n ∈ N and ∀ t_1, ..., t_n ∈ T, the vector of random variables (X(t_1), ..., X(t_n)) follows a multivariate Normal Distribution with mean vector μ and covariance matrix Σ [108], i.e. (X(t_1), ..., X(t_n)) ∼ N(μ, Σ), with μ_i = m(t_i) and Σ_ij = k(t_i, t_j). The Mean Function represents the most likely function, while the Covariance Function, often referred to as the Kernel Function, governs the variance of the function values (t = t′) as well as their covariance with all other function values (t ≠ t′).² Hence, a GP defines a Stochastic Process, a distribution over functions, where a particular set of function values {x(t_i) : t_i ∈ T, i = 1, ..., n, n ∈ N} represents a realization of the set of random variables {X(t_i) : t_i ∈ T, i = 1, ..., n, n ∈ N}, and can thus be used as a prior for BHPO.
One of the advantages of GPs as function priors is ease of inference. Given a set of observations, the posterior is itself a GP whose parameters have closed-form expressions. Specifically, if X(t) is a GP with Mean Function m(t) and Covariance Function k(t, t′), then, given a set of n observations O = {x(t_i) : t_i ∈ T, i = 1, ..., n, n ∈ N}, the posterior is itself a GP with Mean Function m̃(t) and Covariance Function k̃(t, t′)

X(t) | O ∼ GP(m̃(t), k̃(t, t′))   (3.54)

1 This is simply an application of Bayes' Theorem. The new information contained in the observed data from the experiments modifies the prior, which leads to the posterior. The posterior then contains information about the prior beliefs as well as the observed data.
2 The set T is arbitrary, e.g. T = N, in which case the GP reduces to a multivariate Normal Distribution, or T = R. Hence, a GP can be viewed as a generalization of a multivariate Normal Distribution to potentially infinitely many dimensions. Furthermore, T could be multidimensional, e.g. T = R^k, in which case the GP is referred to as a Gaussian Random Field.

m̃(t) = m(t) + k^T K^{−1} (x − m)   (3.55)
k̃(t, t′) = k(t, t′) − k^T K^{−1} k   (3.56)

where x is an n-element vector of observations, such that x_i = x(t_i), m is the corresponding n-element vector of prior means, such that m_i = m(t_i), k is an n-element vector of covariances, such that k_i = k(t_i, t), and K is an n × n covariance matrix, such that K_ij = k(t_i, t_j), i, j = 1, ..., n. Note that, at the observation coordinates t_i, i = 1, ..., n, the posterior is deterministic, i.e. k̃(t_i, t_i) = 0, and coincides with the observations, i.e. m̃(t_i) = x(t_i).
For the purposes of BHPO, Ψ(λ) is assumed to be a GP on the set Λ with Mean Function m(λ) and Covariance Function k(λ, λ′).¹ Each HP setting λ has a Gaussian random variable Ψ(λ) associated with it that represents the VaE under this HP setting. The characteristics of the particular instance of the GP, e.g. smoothness etc., crucially depend on the choice of the Covariance Function, and to a lesser extent on the choice of the Mean Function [108, ch. 4]. In BHPO, the Mean Function of the prior is typically assumed to be the zero function. A possible choice for the Covariance Function is the Squared Exponential Kernel

k_SE(λ, λ′) = β_0 exp(−(1/2) r²(λ, λ′))   (3.57)

where

r²(λ, λ′) = (λ − λ′)^T B (λ − λ′)   (3.58)

and B is a scaling matrix. This particular choice implies a very strong smoothness assumption on Λ, such that similar inputs have similar outputs. In practice, less restrictive Kernels are preferred. There are various other important issues related to the choice of Kernel. A more comprehensive discussion of this subject can be found in Snoek et al. [106].
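The closed-form posterior (3.54)–(3.56) translates directly into NumPy. The following minimal sketch (illustrative names; zero Mean Function, Squared Exponential Kernel with β_0 = 1 and B = I, plus a small jitter term for numerical stability) computes the posterior mean and variance at query points:

import numpy as np

def k_se(a, b):
    """Squared Exponential Kernel (3.57) with beta0 = 1 and B = I."""
    d2 = (a[:, None, :] - b[None, :, :]) ** 2
    return np.exp(-0.5 * d2.sum(-1))

def gp_posterior(t_obs, x_obs, t_query):
    """Posterior mean/variance (3.55)-(3.56) under a zero-mean GP prior."""
    K = k_se(t_obs, t_obs) + 1e-9 * np.eye(len(t_obs))        # jitter
    k = k_se(t_obs, t_query)                                  # n x q
    alpha = np.linalg.solve(K, x_obs)
    mean = k.T @ alpha                                        # m_tilde
    v = np.linalg.solve(K, k)
    var = k_se(t_query, t_query).diagonal() - (k * v).sum(0)  # k_tilde(t, t)
    return mean, var

t_obs = np.array([[0.0], [1.0], [2.0]])   # observed HP settings
x_obs = np.array([1.0, 0.2, 0.5])         # observed VaEs
mean, var = gp_posterior(t_obs, x_obs, np.array([[1.5]]))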

The last ingredient of BHPO is the Acquisition Function a : Λ → R_+ that maps every λ to a scalar indicating how desirable it is to evaluate Ψ(λ) next. The next point to evaluate, λ_next, is the point that maximizes the Acquisition Function

λ_next = arg max_{λ∈Λ} a(λ)   (3.59)

Let σ̃(λ) = √(k̃(λ, λ)), μ̃(λ) = m̃(λ), γ(λ) = (Ψ(λ_best) − μ̃(λ))/σ̃(λ), and Z(λ) = (Ψ(λ) − μ̃(λ))/σ̃(λ). Furthermore, let Φ and φ denote the distribution and density function of the Standard Normal Distribution. There are several popular choices for the Acquisition Function that have closed-form expressions under the GP prior, such as the Probability of Improvement [109]

a_PI(λ) = P(Ψ(λ) < Ψ(λ_best)) = P(Z(λ) < γ(λ)) = Φ(γ(λ))   (3.60)

the Expected Improvement [104]

a_EI(λ) = E[max(Ψ(λ_best) − Ψ(λ), 0)] = σ̃(λ) ∫_{−∞}^{∞} max(γ(λ) − z, 0) φ(z) dz
        = σ̃(λ)(γ(λ)Φ(γ(λ)) + φ(γ(λ)))   (3.61)

and the Upper Confidence Bound [110]

a_UCB(λ) = μ̃(λ) − κσ̃(λ)   (3.62)

1 Note that, given the definition of a GP above, Ψ corresponds to X, λ corresponds to t, and Λ corresponds to T. Further note that Λ is a multidimensional space, with the number of dimensions equal to the number of HPs.
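Given the posterior mean and standard deviation from the GP, the three acquisition functions (3.60)–(3.62) take only a few lines of NumPy/SciPy. This is an illustrative sketch, not a reference implementation; psi_best denotes Ψ(λ_best), the best VaE observed so far:

import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, psi_best, kappa=1.0):
    """PI (3.60), EI (3.61) and the Confidence Bound (3.62) from the posterior."""
    gamma = (psi_best - mu) / sigma
    a_pi = norm.cdf(gamma)                                      # Probability of Improvement
    a_ei = sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))  # Expected Improvement
    a_ucb = mu - kappa * sigma                                  # eq. (3.62)
    return a_pi, a_ei, a_ucb

mu = np.array([0.4, 0.6])
sigma = np.array([0.1, 0.3])
print(acquisitions(mu, sigma, psi_best=0.5))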

Gradient-Based Hyperparameter Optimization Gradient-Based Hyperparameter Optimization (GHPO) makes use of the so-called Hypergradients ∂Ψ(λ)/∂λ of the VaE with respect to the HPs of a Gradient Descent (GD) procedure. Hypergradients are found by chaining derivatives backwards through the entire training procedure using Backpropagation (BP)¹. Naive implementations of GHPO necessitate caching of the complete weight history during Training [111, 112]. For large models, this quickly fails due to memory constraints. Recently, Maclaurin et al. [113] put forth a method that solves the memory issue. They obtain the necessary weight history by reversing Stochastic Gradient Descent (SGD) with Momentum², effectively reverse-engineering the weights iteratively, going backwards through the training procedure. Overall, this allows for the computation of the required Hypergradients in time comparable to a forward run of SGD. The question of whether more involved LAs can be reversed is an active area of research. GHPO makes it possible to optimize thousands of HPs simultaneously, including Weight Decay (WD) parameters, fine-grained LR schedules, etc.

3.1.9 Assessing Performance
Definition
Model Performance is a property of a trained model defined with respect to the true data distribution, i.e. not with respect to the Training Set (TrS) [1, ch. 8.1]. It is quantified by a Performance Metric (PM) and measures the degree of usefulness of the model. It can have meaning by itself or relative to the performance of alternative models. Therefore, it may be used to compare trained models to trivial models, such as a constant or a Random Walk (RW)³, or to rank competing models. Typically, the PM reflects the true goal of Learning, which may be different from what is expressed by minimizing the Cost Function (CF)⁴. If they are identical, the CF evaluated on the Validation Set (VaS) is used. If they differ, the metric associated with the true Learning goal is considered instead. For instance, in Classification⁵, the Negative Log Likelihood (NLL) Cost for Binary Classification (BC) is used as the optimization objective, but one really cares about Classification accuracy.

Performance Metrics
All CFs used for Regression are valid PMs. For BC, possible PMs are summarized in Table 3.1 [114]. A detailed discussion of these direct and derived metrics is out of scope for this thesis. These PMs are also applicable in the multiclass case, where they can be computed for each individual class [84, ch. 3.1].

1 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation 2 compare 3.2.4, Gradient Descent with Backpropagation, Extensions with Momentum, Gradient Descent with Momentum 3 compare 4.8, Test Results 4 compare 3.1.5, Definition Cost Function 5 compare 3.1.6, Classification

                 ŷ = 1                  ŷ = 0
y = 1   True Positives (TP)    False Negatives (FN)   Recall REC = TP/(TP+FN)            False Neg. Rate FNR = FN/(TP+FN)
y = 0   False Positives (FP)   True Negatives (TN)    False Pos. Rate FPR = FP/(TN+FP)   True Neg. Rate TNR = TN/(TN+FP)

Derived metrics (N = TP + FN + FP + TN):
Misclassification Rate   MCR = (FP+FN)/N
Accuracy                 AC  = (TP+TN)/N
Precision                PRE = TP/(TP+FP)
False Discovery Rate     FDR = FP/(TP+FP)
False Omission Rate      FOR = FN/(TN+FN)
Negative Pred. Value     NPV = TN/(TN+FN)
F1 Score                 F1  = 2·PRE·REC/(PRE+REC)

Table 3.1: Performance Metrics for Classification - False Positives: Type I Errors, False Negatives: Type II Errors
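The metrics in Table 3.1 follow directly from the four confusion-matrix counts; a minimal sketch with illustrative counts:

tp, fn, fp, tn = 80, 20, 10, 890
n = tp + fn + fp + tn

rec = tp / (tp + fn)               # Recall
pre = tp / (tp + fp)               # Precision
fpr = fp / (tn + fp)               # False Positive Rate
acc = (tp + tn) / n                # Accuracy
mcr = (fp + fn) / n                # Misclassification Rate
f1  = 2 * pre * rec / (pre + rec)  # F1 Score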

Comparing Models
The simplest approach to model comparison is a qualitative assessment based on eyeballing the respective performances. It is, however, preferable to use suitable Statistical Tests for performance ranking. Statistical Tests allow for an assessment of whether the observed differences between models are statistically significant, i.e. whether one can safely assume that they are not merely due to chance. Any comparison should be based on Validation Performance, an estimator for Generalization Performance¹. Great care should be taken when using Statistical Tests to compare Model Performance, particularly when comparing multiple models [115]. A Two-Sample T-Test [116, ch. 10.2] can be used to compare PMs associated with Regression problems, such as the Mean Squared Error (MSE) or the Mean Absolute Error (MAE). The McNemar Test [117] or, if the Data Set is small, Fisher's Exact Test [118] should be considered for Binary Classifiers. These tests do not compare the proportion of correctly classified samples like a Two-Sample T-Test; instead, they take into account how many of the samples misclassified by one model are correctly classified by the other model. Whether 30 misclassified samples out of 10 000 is significantly better than 40 depends on the off-diagonal terms of the Contingency Table. Consider Tables 3.2 and 3.3 below [119]. Model A is significantly better² than B, since it correctly classifies 11 cases that B gets wrong, while B only correctly classifies one of the samples misclassified by A. However, Model A is not significantly better than C.

            B wrong   B correct                C wrong   C correct
A wrong        29         1         A wrong       15        15
A correct      11       9,959       A correct     25       9,945

Table 3.2: Model A vs. Model B Table 3.3: Model A vs. Model C
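A minimal sketch of the McNemar Test applied to the two Contingency Tables (using the common continuity-corrected chi-square statistic; an illustrative implementation, not the exact calculation used above):

from scipy.stats import chi2

def mcnemar_p(b, c):
    """McNemar test with continuity correction.
    b: cases model 1 gets wrong and model 2 gets right; c: vice versa."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return chi2.sf(stat, df=1)   # p-value, 1 degree of freedom

print(mcnemar_p(1, 11))   # A vs. B: ~0.009 -> significant
print(mcnemar_p(15, 25))  # A vs. C: ~0.15  -> not significant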

1 compare 3.1.4 and 3.2.5 2 details of calculation omitted

3.2 Artificial Neural Networks
3.2.1 Basics
Definition
Artificial Neural Networks (ANNs) are Learning Systems loosely based on Biological Neural Networks (BNNs), such as the human brain. ANNs consist of interconnected Artificial Neurons (ANs), also known as Units¹, which communicate by sending signals between each other.² Learning in an ANN corresponds to adapting the connection strengths between its ANs [121, ch. 1]. Depending on the particular architecture³, the Units may be organized in Layers, giving rise to the notion of depth. Signals between two Units may be passed in one direction only, or in both, and there may be cycles in the connection pattern. Hence, ANNs can be represented as directed or undirected graphs.
Formally⁴, an ANN is a Learning Machine whose model parameters θ are the network weights and biases representing the connection strengths between its ANs.⁵ Therefore, an ANN is either:
(1) a parameterized function h(x; θ) mapping from inputs x to predictions ŷ.⁶ Hence, the outputs of the network are point predictions. In this case, the ANN constitutes a discriminative, deterministic model⁷, trainable using Supervised Learning (SL), or a combination of Unsupervised Learning (UL) and SL⁸, and representable by a directed graph. These models solve Regression and Classification problems.⁹
(2) a parameterized probability density p̂(y|x; θ) of the targets y, conditional on the inputs x. Hence, the output of the network is a conditional density.¹⁰ In this case, the ANN constitutes a discriminative, probabilistic model, trainable using SL or a combination of SL and UL, and representable by a directed graph. These models solve Density Estimation (DE)¹¹ problems and can be used for sampling¹². Since they contain (1) as a special case, they (indirectly) solve Regression and Classification problems.
(3) a model encoding a parameterized probability density p̂(x; θ) of the Training Data x.¹³ In contrast to (2), the modeled density has no pre-specified form and cannot, in general, be expressed analytically. In this case, the ANN constitutes a generative, probabilistic model, trainable using UL, and representable by an undirected graph. These models can be used to generate samples from the same distribution as the Training Data, as well as from a conditional distribution of a subset of the variables, given the remaining ones.

1 compare 3.2.2
2 The human brain has about 10^14 connections between neurons, i.e. parameters, while the largest ANN trained to date has on the order of 10^11 parameters [120].
3 compare 3.2.3
4 compare 3.1.4
5 The network biases can be viewed as weights connecting to features that are always equal to one. Throughout this thesis, the biases may be lumped together with the weights or made explicit, depending on the context.
6 compare 3.2.3, Directed Models, Feedforward Neural Networks, Multilayer Perceptron
7 compare 3.1.7
8 compare 3.1.2
9 compare 3.1.6, Regression and Classification
10 compare 3.2.3, Directed Architectures, Mixture Density Networks
11 compare 3.1.6, Density Estimation
12 Sampling refers to drawing random samples from the output density; compare 4.5, Model Architecture, Sampling
13 compare 3.2.3, Undirected Architectures, Boltzmann Machine and Restricted Boltzmann Machine

Learning in an ANN corresponds to the adaptation of the network weights and biases θ using an appropriate Learning Algorithm (LA), transitioning the network from an untrained, initial configuration θ_0 to a final, trained configuration θ_f.¹

Advantages and Disadvantages
ANNs can learn complicated function mappings that would be hard to express using explicit rules, such as computer programs, and hard to represent in explicit mathematical models, such as differential equations. The problem of predicting aspects of the Traffic Flow in a highway network² is an example, since it comprises many variables interacting in complicated ways, and its geometry imposes difficult boundary conditions. A model capturing all of the system's complexities can be obtained more conveniently by learning the relevant relationships directly from data.
Moreover, ANNs have great computational power. Universal Approximation Theorems [122, 28] show that Multilayer Perceptrons (MLPs) with at least one Hidden Layer (HL) are Universal Function Approximators.³ Furthermore, Recurrent Neural Networks (RNNs) with finite architecture and rational-valued weights can be shown [123] to be Universal Turing Machines⁴. When irrational weights are permitted, RNNs have super-Turing⁵ capabilities [126], which is, of course, practically irrelevant, since infinite-precision numbers would be required.
Compared to other Machine Learning (ML) models, such as Support Vector Machines (SVMs) [69], Decision Trees [58], or Logistic Regression (LogR) [89, p. 128], ANNs are a very general class of models suitable for a wide range of problems. By choosing an appropriate architecture, Output Units (OUs), and Cost Function (CF), ANNs can be made to emulate many other ML models.⁶ They can be employed to solve Regression, Classification, and Density Estimation problems, can learn generative joint density models of high-dimensional data, and handle problems with multiple outputs in a straightforward manner.
However, ANNs are somewhat hard to use. Some background knowledge is necessary to realize their full potential. In particular, an understanding of Data Normalization, Regularization methods⁷, and Hyperparameter Optimization (HPO)⁸ is imperative to avoid common pitfalls, such as slow Training and bad Generalization Performance. Moreover, ANNs are black-box models. In large, complex networks, it may not be obvious what is represented by the patterns of activity in the ANs, and what implicit rules are encoded by the network weights, thus making the model hard to interpret. However, there are ways to address this shortcoming. By maximizing the activity of an AN with respect to the network input, it is possible to identify what individual Units represent [127]. Furthermore, rule extraction algorithms can be employed to better understand the knowledge encoded by the weights [128].

1 compare 3.1.1, Basics, Terminology
2 compare 4
3 This means they can approximate any continuous function on compact subsets of R^n.
4 Informally, a Universal Turing Machine is a Turing Machine that can simulate any other Turing Machine. A Turing Machine is an abstract model of a computer that can compute anything a desktop computer can compute. For a more formal definition, compare [124, ch. 1].
5 Informally, this means it could solve problems not solvable by a Turing Machine, such as the Halting Problem [125].
6 compare 3.2.2
7 compare 3.2.5
8 compare 3.1.8

Lastly, in order to train very large ANNs as in Deep Learning (DL) applications, considerable storage and processing resources are necessary [127]. However, the existence of the human brain, at present believed to be comparable with the fastest supercomputers in terms of Floating Point Operations Per Second¹, while fitting in a volume of only 1 200 cm³ and consuming only 100 W of power, indicates that the efficiency of ANNs can likely be improved dramatically.

Principle of Distributed Representations
ANNs with HLs derive their computational power from making use of Distributed Representations (DRs) [1, ch. 15.4]. In a DR of concepts, a particular Unit contributes to representing multiple concepts, and a particular concept can be represented by multiple Units, i.e. there is a Many-to-Many relationship between Unit Activations and represented concepts. In contrast, in the Symbolic Representations (SRs) used by other ML methods, there is a One-to-One relationship, where a particular Unit represents exactly one concept and every concept is represented by exactly one Unit.
With SRs, each training example represents a point in n-dimensional Feature Space. Hence, with m training examples and O(m) parameters, one can distinguish O(m) regions in Feature Space. With DRs, on the other hand, h linear threshold features, i.e. a HL with h Binary Threshold Hidden Units (HUs)², can partition Feature Space into Σ_{i=0}^{n} (h choose i) = O(h^n) regions [130]. O(m) training examples allow for O(m/n) distributed features, i.e.³ h ≈ m/n, such that the number of distinguishable regions is O((m/n)^n), i.e. exponential in the input dimension. This constitutes an exponential gain in representational power over models using SRs.
DRs improve Generalization Performance⁴, since attributes are shared between concepts. For example, the descriptions of the concepts "dog" and "cat" share the attributes "has fur" and "has four legs". This allows concepts to be represented more efficiently, which, in turn, improves Generalization. In particular, an ANN can distinguish many regions in Feature Space while model capacity remains limited. For instance, the VC-Dimension⁵ of an ANN with Binary Threshold Units (BTUs) is O(w log w), where w is the number of weights [131]. This means the network is actually unable to use its full representational power, since it cannot learn arbitrary functions from representation space to output.

3.2.2 Artificial Neurons
Structure of an Artificial Neuron
Artificial Neurons (ANs) are inspired by Biological Neurons (BNs), which constitute Biological Neural Networks (BNNs). Figure 3.6 is a schematic depiction of a BN. Via the synapses, a neuron receives electrical signals from other neurons connected to it. If the total input exceeds a certain threshold, the neuron fires, sending an electrical signal along its axon to its many axon terminals. At the terminals, neurotransmitters are released into the synaptic cleft, where they bind to receptors located on the dendrites of the connected neurons, thereby propagating the signal. Properties of the synapse determine the degree of modulation the signal undergoes during propagation. As a result of this synaptic plasticity, Learning and Memory are possible in BNNs.

1 Currently, the fastest supercomputer is the Sunway TaihuLight, which is capable of 125 PFLOPS [129].
2 compare 3.2.2, Types of Activation Functions, Binary Activation
3 An n × h weight matrix contains p = nh parameters. In order to not have more parameters than training examples, it has to be the case that m = nh ⟹ h = m/n.
4 compare 3.2.5
5 compare 3.1.4, Growth Function and VC-Dimension

Figure 3.6: Biological Neuron [132, Fig. 1]

Figure 3.7 displays the basic structure of an AN, as used in Artificial Neural Networks (ANNs). The neuron computes a, possibly nonlinear, transformation f of its input vector x = (x_1, \ldots, x_k) to its scalar output a, the neuron's Activation. The components of the input vector are either inputs to the network or Activations of neurons in lower Layers. An Aggregation Function (AgF)1 q computes a weighted average of its inputs using the neuron's weight vector w = (w_1, \ldots, w_k) and bias term2 b. The output z of the AgF is called the Net Input. Lastly, the Net Input is fed through the Activation Function (AcF) g, sometimes referred to as Transfer Function, resulting in a, which may then feed into multiple neurons in higher Layers. If the neuron is part of an Output Layer (OL), a constitutes a network output.

Figure 3.7: Artificial Neuron

The source of the AN's computational power is the ability to learn its parameters. By adjusting w and b, it is able to adapt its behavior until it produces appropriate outputs. If q and g are differentiable functions, the parameters can be learned using Backpropagation (BP)3. In this case, the Activation's derivatives with respect to the parameters are of interest, which are obtained by repeated application of the Chain Rule.

\frac{\partial a}{\partial w_i} = \frac{\partial a}{\partial f} \frac{\partial f}{\partial w_i} = \frac{\partial a}{\partial g} \frac{\partial g}{\partial z} \frac{\partial z}{\partial q} \frac{\partial q}{\partial w_i}

\frac{\partial a}{\partial b} = \frac{\partial a}{\partial f} \frac{\partial f}{\partial b} = \frac{\partial a}{\partial g} \frac{\partial g}{\partial z} \frac{\partial z}{\partial q} \frac{\partial q}{\partial b}

1 In the vast majority of cases, an arithmetic average is computed.
2 The bias term is optional. If present, it can be thought of as a weight applied to an input that is always equal to 1.
3 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation

An AN is a heavily simplified model of a BN. Not only is much of the complexity of BNs not represented, the AN also behaves differently. Usually, ANs are equipped with continuous Transfer Functions, while BNs spike, i.e. send discrete signals. Despite this, the AN appears to be a practical abstraction typifying a useful computational device.
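To make the composition a = g(q(x)) concrete, the following NumPy sketch implements a single AN with Sum Aggregation and a sigmoid AcF (both discussed below), together with the Chain-Rule derivatives from above. The function names are illustrative and not taken from any library; this is a minimal sketch, not the thesis's implementation.

import numpy as np

def neuron_forward(x, w, b):
    """Forward pass of a single AN: Sum Aggregation q followed by a Sigmoid AcF g."""
    z = w @ x + b                      # Net Input: q(x) = w^T x + b
    a = 1.0 / (1.0 + np.exp(-z))       # Activation: a = g(z) = sigma(z)
    return a, z

def neuron_gradients(x, w, b):
    """Derivatives of the Activation w.r.t. the parameters via the Chain Rule."""
    a, _ = neuron_forward(x, w, b)
    dg_dz = a * (1.0 - a)              # sigma'(z) = sigma(z)(1 - sigma(z))
    return dg_dz * x, dg_dz            # da/dw_i = g'(z) x_i ; da/db = g'(z)

x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
a, z = neuron_forward(x, w, b=0.3)
da_dw, da_db = neuron_gradients(x, w, b=0.3)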

Types of Aggregation Functions

Sum Aggregation Function

Almost all applications use a Sum AgF. This is equivalent to computing the weighted arithmetic average of the inputs plus a bias term.

q(x) = \sum_{i=1}^{k} w_i x_i + b = w^T x + b    (3.63)

The main advantages of this AgF are that it has simple derivatives, and that it leads to benign Error Surfaces. This allows for Training with gradient-based local optimization algorithms, such as Gradient Descent (GD).1

\frac{\partial q}{\partial w_i} = x_i, \qquad \frac{\partial q}{\partial b} = 1    (3.64)

Given its ubiquity, ANs using this AgF are not given distinctive names. Rather, naming is determined by the AcF alone, while Sum Aggregation is implicitly assumed.

Product Aggregation Function

The Product AgF [133] is of historical interest. It computes a weighted geometric average of its inputs, while a bias term is typically not included.

f(x) = \prod_{i=1}^{k} x_i^{w_i}    (3.65)

This type of AgF gives rise to Product Units (PUs) and Product Unit Networks (PUNs). While, in comparison, the information capacity of such networks is higher, i.e. they can learn certain Boolean Functions using fewer Units, their Error Surface has more local minima. Therefore, global optimization algorithms, such as Particle Swarm Optimization [134], may be employed for Training.

Types of Activation Functions

Linear Activation

The Linear AcF, sometimes referred to as Identity AcF, is the simplest of all Activations. It simply passes through its input.

g(z) = z, \qquad \frac{\partial g}{\partial z} = 1    (3.66)

Figure 3.8 displays a plot of the Linear AcF, along with other AcFs for comparison. Assuming Sum Aggregation, it gives rise to Linear Units (LUs), which compute a weighted average of their inputs plus a bias term.2

f(x) = g(q(x)) = \sum_{i=1}^{k} w_i x_i + b = w^T x + b    (3.67)

1 compare 3.2.4, Gradient Descent with Backpropagation, Basic Framework
2 Often, the literature refers to the composition (g \circ q)(x) as Linear AcF, since it is understood that Sum Aggregation is used. Analogously, this is the case for all f that follow. The same convention is used throughout this thesis.

LUs are used exclusively in the OL of ANNs, since multiple Layers of LUs are equivalent to a single Layer of LUs. Given that the range of the Linear AcF is the real numbers, Linear Output Units (OUs) are employed for solving Regression Problems1. In fact, an ANN without Hidden Layer (HL), with a Linear OU, trained using the Mean Squared Error (MSE) Cost Function (CF)2, is equivalent to Linear Regression (LinR).

Sigmoid Activation

The Sigmoid AcF, also referred to as Logistic AcF, applies a sigmoid nonlinearity to its input.

g(z) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \frac{\partial g}{\partial z} = \sigma(z)(1 - \sigma(z))    (3.68)

where \sigma is the Logistic Function. Figure 3.8 displays a plot of the Sigmoid AcF, along with other AcFs for comparison. This AcF, assuming Sum Aggregation, gives rise to Sigmoid Units (SUs) computing

f(x) = g(q(x)) = \sigma(w^T x + b)    (3.69)

SUs can be used as Hidden Units (HUs) or OUs. As HUs, they compute a smooth, continuous, and nonlinear transformation of the output of the Layer below. In contrast to LUs, multiple Layers of SUs increase the modeling capacity of the network. As the range of the Logistic Function is the open interval (0, 1), the output of a SU can be interpreted as a probability. Therefore, ANNs with a single SU in their OL can be employed for solving Binary Classification (BC) Problems3. Incidentally, an ANN without HL, with a Sigmoid OU, trained using the Negative Log Likelihood (NLL) CF for BC4, is equivalent to Logistic Regression (LogR).

Hyperbolic Tangent Activation

The Hyperbolic Tangent AcF, i.e. Tanh AcF, applies a tanh nonlinearity to its input.

g(z) = \tanh(z), \qquad \frac{\partial g}{\partial z} = 1 - \tanh^2(z)    (3.70)

This is a version of the Logistic Function rescaled and stretched to the range (-1, 1). Figure 3.8 displays a plot of the Tanh AcF, along with other AcFs for comparison. Using Sum Aggregation, this leads to so-called Tanh Units (TUs) computing

f(x) = g(q(x)) = \tanh(w^T x + b)    (3.71)

TUs are used in HLs, where they are generally preferred to SUs. There is evidence [101, 135] that TUs learn faster and sometimes find better local minima. Since the range of the Tanh AcF is symmetric around zero, its average output is zero. In contrast, SUs produce positively biased Activations, pushing higher HLs towards saturation. This leads to vanishing gradients in saturated Units, and backpropagated error signals are nearly cancelled. This, in turn, slows down Learning considerably.5

1 compare 3.1.6, Regression
2 compare 3.1.5, Types of Cost Functions, Mean Squared Error
3 compare 3.1.6, Classification
4 compare 3.1.5, Types of Cost Functions, Negative Log Likelihood Cost Function
5 While this effect can be countered by negative biases, unfortunate random weight initialization may put the network in a position in weight space from which it cannot recover.

Rectified Linear Activation

The Rectified Linear AcF, which, assuming Sum Aggregation, gives rise to Rectified Linear Units (ReLUs), is discussed in the Deep Learning (DL) section1.

Binary Activation

The Binary AcF is discrete and discontinuous, mapping its input to the set {0, 1}.2 Thus, it computes

g(z) = I(z \geq 0)    (3.72)

where I is the indicator function. Figure 3.8 displays a plot of the Binary AcF, along with other AcFs for comparison.

Figure 3.8: Activation Functions

Assuming Sum Aggregation, this AcF gives rise to Binary Threshold Units (BTUs), also called McCulloch-Pitts Units [16], computing

f(x) = g(q(x)) = I(w^T x + b \geq 0)    (3.73)

While biologically plausible, the Binary AcF is not differentiable. Therefore, BTUs cannot be trained with gradient-based methods. The Perceptron3 is, in essence, a single BTU.

Stochastic Binary Activation

This AcF is stochastic and shares characteristics with the Sigmoid and Binary Threshold AcF. Like the Sigmoid AcF, it applies a sigmoid nonlinearity to its inputs, modeling the probability of outputting 1. However, the actual output is binary. Hence, g(z) is a Bernoulli distributed random variable, parameterized by z.

g(z) = B, \quad \text{with } B \sim \text{Ber}\left(\frac{1}{1 + e^{-z}}\right)    (3.74)

Assuming Sum Aggregation, this AcF gives rise to Stochastic Binary Units (SBUs), computing

f(x) = g(q(x)) = B, \quad \text{with } B \sim \text{Ber}\left(\frac{1}{1 + e^{-(w^T x + b)}}\right)    (3.75)

1 compare 3.3.3, Special Types of Activation Function, Rectified Linear Activation
2 Sometimes, {−1, 1} is used.
3 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Perceptron

SBUs are used in Undirected Architectures, such as Restricted Boltzmann Machines (RBMs)1.

Softmax Activation

The Softmax AcF is different in nature from the AcFs discussed so far, in that it maps its input vector to a vector, instead of a scalar. Thus, it is associated with a Layer, instead of a Unit.

g(z)_j = \frac{e^{z_j}}{\sum_{i=1}^{k} e^{z_i}}, \quad j = 1, \ldots, k

\frac{\partial g_j}{\partial z_j} = \frac{e^{z_j} \sum_{i \neq j} e^{z_i}}{\left(\sum_{i=1}^{k} e^{z_i}\right)^2}, \quad j = 1, \ldots, k

\frac{\partial g_j}{\partial z_i} = -\frac{e^{z_j + z_i}}{\left(\sum_{i=1}^{k} e^{z_i}\right)^2}, \quad j = 1, \ldots, k; \; i \neq j

Assuming Sum Aggregation, this AcF gives rise to a Softmax Layer, which is always an OL. Each Unit's output range is the open interval (0, 1), with the implied constraint that the total output sums to unity.

f(x)_j = g(q(x))_j = \frac{e^{w_j^T x + b_j}}{\sum_{i=1}^{k} e^{w_i^T x + b_i}}, \quad j = 1, \ldots, k    (3.76)

\sum_{j=1}^{k} f(x)_j = 1    (3.77)

Hence, the output of a Softmax Layer can be thought of as modeling a random variable with Categorical Distribution [4, p. 35], the multivariate analog of the Bernoulli Distribution. It models a random event with k mutually exclusive outcomes, where the output of the jth Unit represents the probability of the jth outcome. Softmax Output Layers (SOLs) are therefore used in Multiclass Classification (MC) Problems2. In fact, an ANN without HL, with a SOL, trained using the Cross Entropy (CE) CF3, is equivalent to Multinomial LogR.4
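As an illustration, the following NumPy sketch computes a Softmax output. Since the exponentials in (3.76) overflow easily for large Net Inputs, the maximum is subtracted first, a standard implementation trick that leaves the result unchanged because the common factor cancels. The function name is illustrative and not tied to any library.

import numpy as np

def softmax(z):
    """Numerically stable Softmax: subtracting max(z) does not change the
    output (the common factor cancels) but avoids overflow in exp."""
    shifted = z - np.max(z)
    e = np.exp(shifted)
    return e / e.sum()

z = np.array([2.0, 1.0, -1.5])
p = softmax(z)
assert np.isclose(p.sum(), 1.0)   # outputs sum to unity, as in (3.77)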

3.2.3 Types of Architectures

One way of classifying Artificial Neural Networks (ANNs) is by type of architecture. This subsection discusses types of ANNs in this context. Since there are far too many types of networks to cover, the focus is on the most common ones in each category.

Directed Architectures

ANNs representable by directed graphs are characterized by unidirectional information flow with respect to individual connections. Each directed edge represents a connection between two Artificial Neurons (ANs) and has an associated adjustable weight. Networks of this type have Input Units (IUs), Hidden Units (HUs)5 and Output Units (OUs). IUs are not computational Units, they merely generate copies of the input, which then feeds into the computational Units. Only HUs and OUs are true ANs.

1 compare 3.2.3, Undirected Architectures, Restricted Boltzmann Machine
2 compare 3.1.6, Classification
3 compare 3.1.5, Types of Cost Functions, Cross Entropy Cost Function
4 This is a generalization of the probabilistic interpretation of the SU as modeling a Bernoulli random variable, to the multivariate case.
5 In these models, HUs are optional.

ANNs in this category can be further classified by whether or not their connections form directed cycles.

Feedforward Neural Networks

In a Feedforward Neural Network (FNN), information flow is unidirectional with respect to the network as a whole, i.e. information is only passed forward between inputs and outputs. In particular, network connections do not form directed cycles. The Perceptron [18, 19, 136, ch. 1] is the simplest example of an FNN, and is mainly of historical interest. It is a degenerate network consisting of a single Binary Threshold OU1 connected to multiple inputs. It thus computes the following function

\hat{y} = h(x; \theta) = I(w^T x + b \geq 0)    (3.78)

with \theta = \{w, b\}, where w is the Perceptron's weight vector and b its bias term. I denotes the indicator function. Figure 3.9 a) depicts a graphical representation of a Perceptron. The Perceptron is a discriminative, deterministic2 model that is trained with a custom Learning Algorithm (LA), the Perceptron Learning Algorithm (PLA)3. It is used exclusively for Binary Classification (BC)4. This architecture is fairly limited in its capabilities [23]. It can only learn linear Decision Boundaries and, in case the Training Data is not linearly separable, Training does not even converge to an approximate solution. The Multilayer Perceptron (MLP) [136, ch. 4] is an ANN with one or more Hidden Layers (HLs), and one or more OUs in its Output Layer (OL). It can be loosely regarded as a generalization of the Perceptron, in which multiple individual Perceptrons are interconnected hierarchically. However, instead of Binary Threshold Units (BTUs), other types of ANs are typically used. The HLs of an MLP are traditionally composed of Sigmoid Units (SUs) or Tanh Units (TUs)5, while the characteristics of the OL depend on the type of problem at hand. For Regression6, the OL is composed of Linear Units (LUs), for BC, SUs are employed, while MLPs for Multiclass Classification (MC) feature a Softmax Output Layer (SOL)7. An MLP with L HLs, where the lth HL contains n_l HUs, computes the following function

\hat{y} = h(x; \theta) = o(W_{L+1}^T a_L + b_{L+1})
a_l = g(W_l^T a_{l-1} + b_l), \quad l = L, \ldots, 2    (3.79)
a_1 = g(W_1^T x + b_1)

1 compare 3.2.2, Types of Activation Functions, Binary Activation
2 compare 3.1.7
3 compare 3.2.4, Perceptron Learning Algorithm
4 compare 3.1.6, Classification
5 compare 3.2.2, Types of Activation Functions, Sigmoid Activation and Hyperbolic Tangent Activation
6 compare 3.1.6, Regression
7 compare 3.2.2, Types of Activation Function, Softmax Activation

with \theta = \{W_1, \ldots, W_{L+1}, b_1, \ldots, b_{L+1}\}, where W_l, l = 1, \ldots, L+1, is an n_{l-1} \times n_l weight matrix, whose ith row and jth column element w_{ij}^l is the weight connecting Unit i in Layer l-1 to Unit j in Layer l. Analogously, b_l is an n_l-element vector whose ith element is the bias of Unit i in Layer l. Furthermore, g and o are the transfer functions of the HLs and OL (applied elementwise), and a_l is an n_l-element vector whose ith element is the Activation of the ith HU in Layer l. Evidently, the zeroth Layer is the Input Layer (IL).1 Figure 3.9 b) depicts a graphical representation of an MLP.

Figure 3.9: a) Perceptron with three inputs b) MLP with three inputs, one HL, and two OUs
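The following NumPy sketch is a direct transcription of (3.79), assuming Tanh HUs and a Linear OL for Regression. All names, dimensions, and the random initialization are illustrative only; this is a minimal sketch, not the thesis's implementation.

import numpy as np

def mlp_forward(x, Ws, bs, g=np.tanh, o=lambda z: z):
    """Forward pass of an MLP as in (3.79): L Hidden Layers with AcF g,
    followed by an Output Layer with AcF o (here: Linear, for Regression)."""
    a = x
    for W, b in zip(Ws[:-1], bs[:-1]):   # Hidden Layers 1..L
        a = g(W.T @ a + b)
    return o(Ws[-1].T @ a + bs[-1])      # Output Layer L+1

rng = np.random.default_rng(0)
n = [3, 5, 4, 2]                          # input dim, two HLs, output dim
Ws = [rng.normal(0, 0.1, (n[l], n[l + 1])) for l in range(3)]
bs = [np.zeros(n[l + 1]) for l in range(3)]
y_hat = mlp_forward(rng.normal(size=3), Ws, bs)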

The MLP is a deterministic or probabilistic, discriminative model, trained using some version of Gradient Descent (GD) in conjunction with Backpropagation (BP)2. It can be used for Regression, Classification or (conditional, parametric) Density Estimation3. Incidentally, if the HLs in an MLP are omitted, the network emulates Linear Regression (LinR) [89, p. 26], Logistic Regression (LogR) [89, p. 128], and Multinomial LogR [137, p. 803] if trained with the Mean Squared Error (MSE), Negative Log Likelihood (NLL) for BC, and Cross Entropy (CE) Cost Function (CF)4, respectively. Therefore, MLPs constitute a direct generalization of these classical methods. In fact, they have been shown to be Universal Function Approximators [27, 28], and are thus inherently more powerful than Perceptrons. An Autoencoder (AE) [32, 138] is a special type of FNN that predicts its own input, i.e. the Training Set (TrS) consists of the input-target pairs S_m = \{(x^1, x^1), \ldots, (x^m, x^m)\}. Therefore, if the x^j are n-dimensional vectors, there must be n OUs. Furthermore, the AE has an odd number L of HLs, where the ((L+1)/2)th HL contains fewer HUs than there are OUs, thus creating an information bottleneck. The lower Layers, including the Bottleneck Layer (BL), constitute the Encoder, while the higher Layers, also including the BL, are referred to as the Decoder. Hence, an AE

1 Note that the matrix multiplications imply sum aggregation of the inputs (as opposed to product aggregation); compare 3.2.2, Types of Aggregation Functions
2 compare 3.2.4, Gradient Descent with Backpropagation
3 compare 3.1.6, Density Estimation
4 compare 3.1.5, Types of Cost Functions

with L HLs computes the following function

\hat{x} = h(x; \theta) = o(W_{L+1}^T a_L + b_{L+1})
a_l = g(W_l^T a_{l-1} + b_l), \quad l = L, \ldots, l^*+1
a_{l^*} = g^*(W_{l^*}^T a_{l^*-1} + b_{l^*})    (3.80)
a_l = g(W_l^T a_{l-1} + b_l), \quad l = l^*-1, \ldots, 2
a_1 = g(W_1^T x + b_1)

with \theta = \{W_1, \ldots, W_{L+1}, b_1, \ldots, b_{L+1}\}, where the notation follows the same logic as before, and l^* = (L+1)/2 denotes the index of the BL. The BL forces the AE to discover compact encodings of the data. These code vectors a_{l^*} are thus represented by fewer bits of information than required to describe the input, which necessarily leads to a loss of information when reconstructing it. Figure 3.10 depicts a graphical representation of an AE.

Figure 3.10: AE with four inputs and a BL with three HUs

AEs are Generative Models (GenMs) since they construct a full model of the data. While classical AEs are deterministic, probabilistic versions exist, such as the Denoising Autoencoder (DAE) [139], which adds noise to the inputs, thus eliminating the requirement for a bottleneck.1 AEs are trained with GD in conjunction with BP, which, in this context, can be considered a Supervised Learning (SL) method used to implement Unsupervised Learning (UL)2. Training an AE is equivalent to minimizing the reconstruction error of a model. An AE with k Linear Units in its BL, trained using the MSE CF, is roughly equivalent to Principal Component Analysis (PCA) [62], in that the k extracted features span the same subspace as the first k Principal Components [140]. Hence, an AE with nonlinear BL can be regarded as a generalization of PCA.

1 If no noise were added, a regular AE could simply learn to copy the data, loss-free.
2 compare 3.1.2

Convolutional Neural Networks (CNNs) [141, 1, ch. 9] perform Convolutions instead of general matrix multiplications.1 This is equivalent to a set of constraints on the connectivity between Layers. Specifically, a particular HU is connected to only a subset of Units, such that each HU connects to a different subset, and inputs to neighboring HUs are shifted by a constant offset. Furthermore, the incoming weight vectors of all HUs in a Layer are tied. The HUs thus function as replicated feature detectors, with each HU acting on a particular Receptive Field. Typically, multiple feature detectors are defined per Layer. Convolutional Layers are followed by so-called Pooling Layers that perform downsampling operations, which average or take the maximum of the outputs of neighboring HUs, reducing the size of the representation. Hence, CNNs are well-suited for data with grid-like structure, such as images, which can be viewed as 2D grids, univariate time series (1D grids), and videos (3D grids). The Weight Tying implemented by CNNs is a form of Regularization2, which helps generalization and allows scaling the architecture. In fact, CNNs have been among the first deep networks to perform well on problems of non-trivial size, such as Handwritten Digit Recognition (HDR) [142], and helped win competitions in general Object Recognition (OR) [43]. CNNs are a large subject in and of themselves. For a more in-depth discussion of the topic, refer to [1, ch. 9].
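The following minimal NumPy sketch illustrates the idea for a univariate time series, i.e. the 1D-grid case: a single shared weight vector slides over the input, so all HUs of the resulting feature map are tied and each sees a different Receptive Field, followed by a max-Pooling step. This illustrates the principle only; it is not an implementation of any particular CNN library.

import numpy as np

def conv1d_valid(x, w):
    """'Valid' 1D convolution (strictly speaking cross-correlation, as is
    usual in CNNs): the shared weight vector w slides over the series x."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def max_pool1d(a, size=2):
    """Non-overlapping max pooling, downsampling the representation."""
    trimmed = a[:len(a) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

x = np.array([0., 1., 2., 3., 2., 1., 0., 1.])
feature_map = conv1d_valid(x, w=np.array([-1., 0., 1.]))   # an edge detector
pooled = max_pool1d(feature_map)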

Recurrent Neural Networks

Connections between ANs in a Recurrent Neural Network (RNN) [32] form directed cycles, allowing information flowing along those cycles to persist in the network. Activations computed in a particular clock cycle3 influence computations in later clock cycles. Hence, an RNN is a generalization of an FNN that includes feedback connections. Inputs can be provided to, and outputs can be read from, the network sequentially. Therefore, RNNs are well-suited for processing Sequence Data, such as Time-Series Data. While FNNs map input vectors to target vectors in a One-to-One manner, RNNs can be applied to problems involving sequences of vectors associated with a large variety of temporal input-output patterns [143], such as

• One-to-Many, i.e. an initial input followed by multiple outputs. An example of this is the automatic generation of image captions, where an image is provided and a sequence of words describing the contents is returned [144].
• Many-to-One, i.e. multiple inputs followed by a single output. This includes Sequence Classification tasks, such as Sentiment Analysis (SA), where a sequence of word vectors, representing a sentence, is mapped to a prediction of its sentiment [145].
• Many-to-Many Unsynced, i.e. multiple inputs followed by multiple outputs. For instance, in Machine Translation (MT), a sequence of word vectors is mapped to a sequence of word vectors. The output sequence is produced after the entire input sequence has been read [146].
• Many-to-Many Synced, i.e. multiple time steps of parallel inputs and outputs. An example of this is predicting stock prices, one time step ahead, using the shifted input sequence as target sequence.

1 These operations are not Convolutions in the precise mathematical sense.
2 compare 3.2.5
3 The term "clock cycle" refers to a set of computations triggered sequentially. It does not refer to CPU clock cycles, several of which could pass during one network clock cycle. In equations describing the dynamics of an RNN, all computations performed during a clock cycle carry the same time index.

RNNs compare favorably to traditional models for Time Series analysis, such as Autoregressive (AR) models. AR models are essentially LinR models, predicting values of a Time Series given a predetermined number of past values. However, they do not incorporate a Memory, i.e. a Hidden State, and are unable to model nonlinear dependencies. Linear Dynamical Systems (LDSs) [147] and Hidden Markov Models (HMMs) [148] are generalizations of AR models that incorporate some form of Memory, which allows them to model long-term dependencies. In these models, it is not necessary to decide in advance on the number of past values to be considered. RNNs, in turn, are a generalization of both LDSs and HMMs with fewer restrictions on the Hidden State. Unlike in HMMs, the Hidden State of an RNN, i.e. the state of its HUs, uses Distributed Representations (DRs)1 and, unlike in LDSs, evolves according to nonlinear dynamics. These features endow RNNs with great computational power. They can emulate circuits on a microchip, implementing constructs such as While Loops. In fact, they have been shown [123] to be Turing Complete. Informally, this means that they are general computers able to simulate arbitrary programs. Like FNNs, RNNs are Discriminative Models (DisMs) that can be deterministic or probabilistic, depending on the characteristics of the OL. RNNs unrolled in time2 can be viewed as Deep Feedforward Neural Networks (DFNNs). This potentially infinite Depth in Time3 renders their Training difficult, mainly due to the Vanishing and Exploding Gradient (VEG) Problem.4 RNNs have traditionally been trained with some form of GD in conjunction with Backpropagation Through Time (BPTT)5. Recently, alternative LAs based on Hessian-Free Optimization6 or Connectionist Temporal Classification (CTC) [149] have been investigated. Fully Connected RNNs (FCRNNs) [15] are the most general type of RNN, in which all Units have connections to all non-IUs. An FCRNN is described by the following equations

\hat{y}_t = h(x_t; \theta) = o(W_{io}^T x_t + W_{ho}^T a_t + W_{oo}^T \hat{y}_{t-1} + b_o)
a_t = g(W_{ih}^T x_t + W_{hh}^T a_{t-1} + W_{oh}^T \hat{y}_{t-1} + b_h)    (3.81)
\hat{y}_0 = y^0
a_0 = a^0

with \theta = \{W_{ih}, W_{io}, W_{hh}, W_{ho}, W_{oh}, W_{oo}, b_h, b_o\}, where the subscripts on the weight matrices and bias vectors refer to the types of Units they connect. That is, i, h, and o denote input, hidden, and output. For instance, the ith row and jth column element of W_{ho} is the weight w_{ij}^{ho} connecting the ith HU to the jth OU. Furthermore, the subscript t denotes time. Lastly, y^0 and a^0 are vectors of initial values of \hat{y}_t and a_t. Figure 3.11 a) depicts a graphical representation of an FCRNN. Elman Networks [150] are RNNs in which some of the connections are omitted. There exists a distinct, fully recurrent HL in which every HU has a connection to every other HU, including itself. However, there are no connections between OUs, no connections from IUs to OUs, and no connections from OUs back to HUs.

1 compare 3.2.1, Principle of Distributed Representations
2 compare Figure 3.15
3 compare 3.3.1, Types of Depth
4 compare 3.3.2, Vanishing and Exploding Gradient Problem
5 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation Through Time
6 compare 3.3.3, Special Learning Algorithms, Hessian-Free Optimization

This architecture can be extended by stacking multiple recurrent HLs, which has the same benefits as increasing depth in FNNs. What is obtained is essentially an MLP whose HLs are fully recurrent, rendering it Deep in Representation as well as Deep in Time. A Stacked Elman Network with L HLs is a Dynamical System computing the following function

\hat{y}_t = h(x_t; \theta) = o(W_{L+1}^T a_{L,t} + b_{L+1})
a_{l,t} = g(W_l^T a_{l-1,t} + U_l^T a_{l,t-1} + b_l), \quad l = L, \ldots, 2    (3.82)
a_{1,t} = g(W_1^T x_t + U_1^T a_{1,t-1} + b_1)
a_{l,0} = a_l^0

with \theta = \{W_1, \ldots, W_{L+1}, U_1, \ldots, U_L, b_1, \ldots, b_{L+1}\}, where U_l denotes the recurrent weight matrix of the lth HL. For instance, the ith row and jth column element of U_l is the recurrent weight u_{ij}^l, connecting the ith HU in Layer l to the jth HU in Layer l. Furthermore, a_l^0 is the vector of initial values of a_{l,t}, i.e. Layer l's Initial State. Figure 3.11 b) depicts a graphical representation of a Stacked Elman RNN.

Figure 3.11: a) Fully Connected RNN with two inputs, two HUs, and one OU b) Stacked Elman RNN with two inputs, two fully recurrent HLs with two HUs each, and one OU

Of course, the Layers are an artifact of omitting connections from an FCRNN. The advantage of multiple stacked Layers over a single large Layer is a reduction in the number of parameters, while enforcing hierarchical representations.
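A minimal NumPy transcription of (3.82) is sketched below, assuming Tanh HUs and a Linear OL; the dimensions and random initialization are illustrative, not taken from the thesis's model.

import numpy as np

def elman_forward(X, Ws, Us, bs, Wo, bo, g=np.tanh):
    """Run a Stacked Elman RNN (3.82) over a sequence X of shape (T, n_in).
    Ws/Us/bs hold input weights, recurrent weights, and biases per HL."""
    L = len(Ws)
    a = [np.zeros(b.shape) for b in bs]        # Initial States a_{l,0}
    outputs = []
    for x_t in X:                              # clock cycles t = 1..T
        inp = x_t
        for l in range(L):                     # bottom-up through the stack
            a[l] = g(Ws[l].T @ inp + Us[l].T @ a[l] + bs[l])
            inp = a[l]
        outputs.append(Wo.T @ a[-1] + bo)      # Linear Output Layer
    return np.array(outputs)

rng = np.random.default_rng(1)
n_in, n_h, n_out, T = 2, 4, 1, 10
Ws = [rng.normal(0, .3, (n_in, n_h)), rng.normal(0, .3, (n_h, n_h))]
Us = [rng.normal(0, .3, (n_h, n_h)) for _ in range(2)]
bs = [np.zeros(n_h) for _ in range(2)]
y = elman_forward(rng.normal(size=(T, n_in)), Ws, Us, bs,
                  rng.normal(0, .3, (n_h, n_out)), np.zeros(n_out))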

Mixture Density Networks

Mixture Density Networks (MDNs) [151] are FNNs or RNNs with a special OL. They are discriminative, probabilistic models used for parametric, conditional Density Estimation (DE)1. As such, their outputs are not predictions of targets, but rather parameters of the probability density of targets, conditional on inputs. The particular parametric density modeled is a Mixture Distribution [4, ch. 11.2]. Hence, they generalize Mixture Models by stacking them onto an ANN, thus conditioning the mixture density on the input. The choice of OUs depends on the Mixture Model chosen. A popular choice is the Gaussian Mixture Model (GMM), which can asymptotically, i.e. with enough mixture components,

1 compare 3.1.6, Density Estimation

approximate any density arbitrarily well [92]. A k-component GMM is parameterized by the distribution parameters \varphi = \{\mu_1, \ldots, \mu_k, \Sigma_1, \ldots, \Sigma_k, \alpha\}, where \mu_i is the ith component mean, \Sigma_i is the ith component covariance matrix, and \alpha is a k-component mixture weight vector, whose elements sum to one. Hence, its density is defined by

p(y; \varphi) = \sum_{i=1}^{k} \alpha_i \phi(y; \mu_i, \Sigma_i)    (3.83)

The mixture weights \alpha_i represent the prior probabilities that y is chosen from the ith component density \phi_i = \phi(y; \mu_i, \Sigma_i), where \phi is the density of the Multivariate Normal Distribution

\phi(y; \mu, \Sigma) = (2\pi)^{-k/2} |\Sigma|^{-1/2} e^{-\frac{1}{2}(y-\mu)^T \Sigma^{-1} (y-\mu)}    (3.84)

In the corresponding MDN, this distribution is conditioned on the output of an ANN, parameterized by θ, that outputs estimators of the GMM parameters as functions of the input

\hat{p}(y|x; \theta) = \sum_{i=1}^{k} \hat{\alpha}_i(x; \theta) \, \phi(y; \hat{\mu}_i(x; \theta), \hat{\Sigma}_i(x; \theta))    (3.85)

Independence and a common variance can be assumed for the elements of y within each mixture component. Then, \Sigma_i reduces to a diagonal matrix, where all diagonal elements are equal to \sigma_i^2. This greatly reduces the number of parameters while still retaining the full generality of the framework, i.e. asymptotic universal approximation. Assuming y is an n-dimensional vector, the OL must contain kn LUs (for k mean vectors), k Units with a transfer function restricting their outputs to the positive reals (for k non-negative standard deviation parameters), and a Softmax Layer comprised of k individual Units. Therefore, a Feedforward MDN with L HLs and k mixture components computes the following function

\hat{\mu} = h_1(x; \theta) = W_\mu^T a_L + b_\mu
\hat{\sigma} = h_2(x; \theta) = \exp(W_\sigma^T a_L + b_\sigma)    (3.86)
\hat{\alpha} = h_3(x; \theta) = \text{softmax}(W_\alpha^T a_L + b_\alpha)

The equations describing the HLs are the same as in the MLP case. The full set of parameters \theta = \{W_1, \ldots, W_L, W_\mu, W_\sigma, W_\alpha, b_1, \ldots, b_L, b_\mu, b_\sigma, b_\alpha\} includes an n_L \times kn weight matrix W_\mu, an n_L \times k weight matrix W_\sigma, an n_L \times k weight matrix W_\alpha, as well as the corresponding biases. The equations for a Recurrent MDN are completely analogous, except that the outputs are also indexed by time. MDNs are trained using GD with BP or BPTT, depending on whether the underlying network is an FNN or an RNN. Furthermore, this type of network requires the NLL CF, which drives the parameter estimates to values such that the GMM fits the observed Training Data well. An interesting use case of MDNs is the solution of Inverse Problems. In this type of problem, the inputs are effects of some process, while the targets represent causes. In situations where multiple causes have similar effects, a regular FNN observing the effect wrongly predicts an average cause.1 For instance, both low and high traffic density cause low traffic flow. When low traffic flow is observed, an average density is predicted, even though average density corresponds to high flow.2 Hence, for Inverse Problems with multimodal conditional density, it is preferable to consider the full distribution in order to avoid mode averaging.

1 if the MSE CF is used
2 compare 4.3
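The following NumPy sketch implements the Output Layer mapping (3.86) for the diagonal GMM case, assuming the last Hidden Layer's Activation a_L has already been computed by the underlying network; all names, dimensions, and initializations are illustrative.

import numpy as np

def mdn_head(a_L, W_mu, b_mu, W_sigma, b_sigma, W_alpha, b_alpha, k, n):
    """Map the last HL's Activation a_L to GMM parameters as in (3.86):
    k mean vectors, k positive standard deviations, k mixture weights."""
    mu = (W_mu.T @ a_L + b_mu).reshape(k, n)     # unconstrained means
    sigma = np.exp(W_sigma.T @ a_L + b_sigma)    # exp keeps sigma > 0
    z = W_alpha.T @ a_L + b_alpha
    e = np.exp(z - z.max())
    alpha = e / e.sum()                          # softmax: weights sum to 1
    return mu, sigma, alpha

rng = np.random.default_rng(2)
nL, k, n = 8, 3, 2                               # hidden width, components, target dim
mu, sigma, alpha = mdn_head(
    rng.normal(size=nL),
    rng.normal(0, .1, (nL, k * n)), np.zeros(k * n),
    rng.normal(0, .1, (nL, k)), np.zeros(k),
    rng.normal(0, .1, (nL, k)), np.zeros(k), k, n)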

Undirected Architectures

ANNs representable by undirected graphs have Visible Units (VUs) and HUs.1 VUs are fully computational ANs, used to provide input to, and read output from, the network, while HUs compute distributed representations of the data. Undirected models employ ANs which are inherently stochastic, and can be understood as Probabilistic Graphical Models (PGMs) [11]. Undirected Models are fundamentally different from Directed Models. They can be thought of as parameterized, stochastic Dynamical Systems whose state is defined by their Units' Activations. Often, they are comprised of Stochastic Binary Units (SBUs)2. With n Units, the State Space corresponds to the corners of an n-dimensional hypercube. The parameters, i.e. the weights, determine the dynamics of the system as it traverses State Space. A model is associated with an Energy Function (EF) that depends on the model's current state and parameters. High-energy states are unlikely to be traversed, and vice versa, i.e. when running, the model will be in low-energy states most of the time. Hence, the model implicitly defines a joint probability distribution over states that can be associated with concrete objects. For instance, states can represent images, with pixels corresponding to individual Units, thus defining a parameterized, and therefore learnable, distribution over images. Undirected Models are probabilistic, generative models, trained using UL. They can be further classified by whether or not they are fully connected. A Boltzmann Machine (BM) [152] is a fully connected3 network of SBUs, some of which act as VUs, others as HUs. It can be regarded as a stochastic generalization of Hopfield Nets (HNs) [34] with HUs. The negative EF of a BM with m VUs and n HUs is given by

-E(v, h; \theta) = v^T W_{vv} v + v^T W_{vh} h + h^T W_{hh} h + v^T b_v + h^T b_h    (3.87)

where v and h are m- and n-dimensional binary vectors representing the State of the VUs and HUs. W_{vv} and W_{hh} are m \times m and n \times n weight matrices whose ith row and jth column elements are the weights w_{ij}^{vv} and w_{ij}^{hh} if i < j and zero otherwise4, W_{vh} is an m \times n weight matrix, and b_v and b_h are m- and n-dimensional vectors of VU and HU biases. The probability of a particular State (v, h) is related to its Energy and follows a Boltzmann Distribution, also referred to as Gibbs Distribution [153]

p(v, h; \theta) = \frac{1}{Z} \bar{p}(v, h; \theta) = \frac{1}{Z} e^{-E(v, h; \theta)} = \frac{e^{-E(v, h; \theta)}}{\sum_{v'} \sum_{h'} e^{-E(v', h'; \theta)}}    (3.88)

where Z is the Partition Function (PF), a normalization constant, summing the unnormalized probabilities of all 2^{n+m} possible states. Since the PF contains exponentially many terms, computing the state probabilities is intractable for all but models of trivial size. However, the unnormalized probabilities \bar{p}(v, h; \theta) can be computed efficiently. Defining the Energy Gap of VU i, \Delta E_i^v, and HU j, \Delta E_j^h, as the increase in Energy

1 In these models, HUs are mandatory.
2 compare 3.2.2, Types of Activation Functions, Stochastic Binary Activation
3 Fully connected Undirected Models do not have self-connections, i.e. every Unit connects to every other Unit, but not to itself.
4 in order to not count self-connections, and to not double count symmetric connections

when the respective Unit is turned on, compared to when it is turned off, everything else equal, the negative Energy Gaps are given by

-\Delta E_i^v = E(v_i = 0, v_{-i}, h; \theta) - E(v_i = 1, v_{-i}, h; \theta)
             = \sum_{j<i} w_{ji}^{vv} v_j + \sum_{j>i} w_{ij}^{vv} v_j + \sum_j w_{ij}^{vh} h_j + b_i^v    (3.89)
-\Delta E_j^h = E(h_j = 0, v, h_{-j}; \theta) - E(h_j = 1, v, h_{-j}; \theta)
             = \sum_i w_{ij}^{vh} v_i + \sum_{i<j} w_{ij}^{hh} h_i + \sum_{i>j} w_{ji}^{hh} h_i + b_j^h

where v_{-i} and h_{-j} denote the states of all VUs, except i, and all HUs, except j. It can be shown that the probability that VU i, and HU j, is on is given by

p(v_i = 1 | v_{-i}, h; \theta) = \frac{1}{1 + e^{\Delta E_i^v}}    (3.90)
p(h_j = 1 | v, h_{-j}; \theta) = \frac{1}{1 + e^{\Delta E_j^h}}

Hence, the probability that a Unit is on is the larger, the more turning it on decreases the system's Energy. This is precisely the probability that an SBU will turn on, since the negative Energy Gap is just the weighted sum of all incoming Activations plus the Unit's bias.1 A BM operates as follows. Starting in a random Initial State, every SBU computes its probability of being on, conditional on the State of all other Units, according to (3.90). Subsequently, one Unit is chosen at random, updates its state, and the process repeats. The BM thus traverses State Space, updating one Unit at a time. Eventually, it reaches a dynamic equilibrium, referred to as Thermal Equilibrium, where it traverses states according to the unchanging probability distribution given by (3.88). The sequence of traversed states is an unbiased sample from this analytically intractable distribution. This process is referred to as Gibbs Sampling, a particular type of Markov Chain Monte Carlo (MCMC) sampling [154]. Hence, a BM trained on a TrS S_m = \{x^1, \ldots, x^m\} is an implicit2, parameterized estimator for the density of the data, given by the marginal density over its Visible States

h(x; \theta) = \hat{p}(x; \theta) = p(v; \theta) = \sum_{h'} p(v, h'; \theta)    (3.91)

with \theta = \{W_{vv}, W_{vh}, W_{hh}, b_v, b_h\}. Figure 3.12 a) depicts a representation of a BM. Assuming a model trained on images, random samples from the distribution over images can be collected by recording states of the VUs during runtime. One can condition on a particular partial state by putting the corresponding VUs into the desired configuration and then preventing them from updating. The model then generates likely completions of the partial image. Training of BMs is extraordinarily difficult. Intuitively, the goal is to adjust the weights and biases such that the joint probability of the Training Data is maximized. This is accomplished by minimizing the NLL Cost. Using a gradient-based optimization procedure, the gradient of the Log Likelihood of a training example with respect to the parameters is required, for instance

\frac{\partial \ln p(v; \theta)}{\partial w_{ij}^{vh}} = \sum_{h'} p(h'|v; \theta) v_i h_j' - \sum_{v'} \sum_{h'} p(v', h'; \theta) v_i' h_j'    (3.92)
= E_{p(h'|v; \theta)}(v_i h_j) - E_{p(v', h'; \theta)}(v_i h_j)

1 The above formulas describe a simplified version of the original BM, which also incorporates the concept of a Temperature.
2 Implicit, since no analytic expression can be obtained in general; however, samples from the distribution can be obtained.

The gradients with respect to the weights w_{ij}^{vv}, w_{ij}^{hh} and biases b_i^v, b_j^h are largely analogous. The first term, called the Positive Phase, is the expectation of v_i h_j, conditional on the VUs being in state v, i.e. the expectation with respect to the conditional distribution p(h|v; \theta), while the second term, called the Negative Phase, is the expectation with respect to the distribution p(v, h; \theta). Informally, changing the weight w_{ij}^{vh} in the direction of this gradient has the effect of lowering the energy of states where the VUs are set to v, making them more likely, while also raising the energy of competing, naturally occurring low-energy States, making them less likely. However, due to the intractability of the sums, running over exponentially many terms, this type of exact Learning is infeasible. The expectations have to be approximated by sample averages obtained by Gibbs Sampling. This approach is still unsatisfactory, mainly because only one Unit at a time can be updated according to (3.90). As a result, it takes a long time until Thermal Equilibrium is reached. While there are approaches to Training based on introducing even more approximations, such as Mean Field Approximation [155], BMs have, due to this limitation, not become practically relevant. Recently, there has been research into using Quantum Computers1 to train BMs [156], an approach that may eventually lead to a revival of interest in this architecture. A Restricted Boltzmann Machine (RBM) [157] is a tractable version of the BM, where all connections between HUs, and between VUs, are omitted. Therefore, this network is representable by a bipartite graph. All Units are SBUs, although real-valued versions employing Rectified Linear Units (ReLUs)2 exist [158]. The negative Energy Function of an RBM with m VUs and n HUs is defined by

-E(v, h; \theta) = v^T W h + v^T b_v + h^T b_h    (3.93)

which is analogous to the BM case, except that W_{vv} and W_{hh} are set to zero, and W_{vh} is renamed to W. The probability of a particular state p(v, h; \theta) is also analogous to the BM case. The expressions for the negative Energy Gaps simplify to

-\Delta E_i^v = E(v_i = 0, v_{-i}, h; \theta) - E(v_i = 1, v_{-i}, h; \theta) = \sum_j w_{ij} h_j + b_i^v    (3.94)
-\Delta E_j^h = E(h_j = 0, v, h_{-j}; \theta) - E(h_j = 1, v, h_{-j}; \theta) = \sum_i w_{ij} v_i + b_j^h

Due to the structure of the RBM as a bipartite graph, the VUs are conditionally independent of each other given the HUs, and vice versa, i.e. for i = 1, \ldots, m and j = 1, \ldots, n

p(v_i = 1 | v_{-i}, h; \theta) = p(v_i = 1 | h; \theta) = \frac{1}{1 + e^{\Delta E_i^v}}    (3.95)
p(h_j = 1 | v, h_{-j}; \theta) = p(h_j = 1 | v; \theta) = \frac{1}{1 + e^{\Delta E_j^h}}

Hence, in contrast to the BM, there exist tractable expressions for the probability of a Visible

1 The company D-Wave Systems has been selling Quantum Annealers, a type of non-universal Quantum Computer, with up to 2000 qubits to companies such as Google and Lockheed Martin. However, at this point, it has not been conclusively established that these machines provide a true advantage over classical computers, i.e. "Quantum Supremacy".
2 compare 3.3.3, Special Types of Activation Functions, Rectified Linear Activation

State conditional on a Hidden State, and vice versa

p(v|h; \theta) = \prod_{i=1}^{m} p(v_i|h; \theta) = \prod_{i=1}^{m} \frac{1}{1 + e^{\Delta E_i^v}}    (3.96)
p(h|v; \theta) = \prod_{j=1}^{n} p(h_j|v; \theta) = \prod_{j=1}^{n} \frac{1}{1 + e^{\Delta E_j^h}}

An RBM operates in the following way. Starting in a random Initial State, a Hidden State is sampled conditional on the Visible State. Subsequently, a Visible State is sampled conditional on this Hidden State, and so forth. These alternating, parallel updates (3.96) of HUs and VUs are possible due to the conditional independencies induced by the graph structure. Therefore, the RBM reaches Thermal Equilibrium faster than a BM, which must update Units individually. The sequence of traversed states represents an unbiased sample from (3.88). This process is referred to as Block Gibbs Sampling [159]. Hence, an RBM trained on a TrS S_m = \{x^1, \ldots, x^m\} is an implicit, parameterized estimator for the density of the data, given by the marginal density over its Visible States

h(x; \theta) = \hat{p}(x; \theta) = p(v; \theta) = \sum_{h'} p(v, h'; \theta)    (3.97)

with \theta = \{W, b_v, b_h\}. RBMs can be shown to be Universal Distribution Approximators [95]. Figure 3.12 b) depicts a representation of an RBM.

Figure 3.12: a) BM with two VUs and three HUs b) RBM with three VUs and four HUs
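A minimal NumPy sketch of one Block Gibbs Sampling step is given below, using the fact that, by (3.95), the conditional "on"-probability of each Unit is the logistic function of its net input, so that each whole Layer can be sampled in parallel. All names and sizes are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs_step(v, W, b_v, b_h, rng):
    """One step of Block Gibbs Sampling in an RBM: sample all HUs in
    parallel given the VUs, then all VUs in parallel given the HUs (3.96)."""
    p_h = sigmoid(v @ W + b_h)             # p(h_j = 1 | v) for all j at once
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)           # p(v_i = 1 | h) for all i at once
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(3)
m, n = 6, 4                                # numbers of VUs and HUs
W = rng.normal(0, 0.1, (m, n))
v = (rng.random(m) < 0.5).astype(float)    # random Initial State
for _ in range(100):                       # approach Thermal Equilibrium
    v, h = block_gibbs_step(v, W, np.zeros(m), np.zeros(n), rng)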

RBMs are trained using GD with Contrastive Divergence (CD)1. Since exact Learning is still infeasible, the gradient has to be estimated using Block Gibbs Sampling. The algorithm exploits the inherent parallelism of the model when drawing those samples. Incidentally, during the Positive Phase, Thermal Equilibrium is reached in a single iteration. A Deep Boltzmann Machine (DBM) [160] is a generalization of the RBM, incorporating additional HLs. Of course, a DBM is just a BM with missing connections. The negative Energy of a DBM with m VUs, L HLs, and n_l, l = 1, \ldots, L, HUs in its lth HL, is defined by

-E(v, h_1, \ldots, h_L; \theta) = v^T W_1 h_1 + \ldots + h_{L-1}^T W_L h_L + v^T b_v + \ldots + h_L^T b_h^L    (3.98)

1 compare 3.2.4, Gradient Descent with Contrastive Divergence

with \theta = \{W_1, \ldots, W_L, b_v, b_h^1, \ldots, b_h^L\}. This model features a more structured Hidden State than the RBM, which endows it with greater modeling capacity. Due to the layered structure, all Units in odd Layers are conditionally independent of each other given all Units in even Layers, and vice versa. Therefore, samples from the model are obtained by alternating parallel updates of odd and even Layers, which allows for a tractable LA. Figure 3.13 depicts a graphical representation of a DBM.

Figure 3.13: DBM with three HLs

Mixed Architectures

A Deep Belief Net (DBN) [41] is a probabilistic, generative model composed of SBUs. It features multiple HLs and a Visible Layer (VL), without intra-Layer connections. The top two HLs have undirected connectivity, i.e. form an RBM, while the other Layers are connected top-down via directed connections, terminating in the VL. Mathematically, this model can be obtained by successively stacking and training RBMs, while restricting the upwards connectivity after Training, except for the top two Layers. DBNs are important in Deep Learning (DL) as they can be transformed into a discriminative, pre-trained MLP.1

3.2.4 Learning Algorithms

The set of parameters \theta of an Artificial Neural Network (ANN) is the set of network weights and biases. An ANN Learning Algorithm (LA) maps an Initial State \theta^0 and a Training Set (TrS) S_m to an optimal2 configuration of network parameters \theta^* by minimizing a Cost Function (CF) C(\theta).3 Hence, a LA is an optimization algorithm for solving a problem of the form

\text{minimize}_{\theta \in \Theta} \; C(\theta)    (3.99)

with solution

\theta^* = \arg\min_{\theta \in \Theta} C(\theta)    (3.100)

1 compare 3.3.3, Special Initialization Schemes, Unsupervised Pre-Training
2 In general, a local optimum is sought, which is often also approximate.
3 compare 3.2.1 and 3.1.5

For clarity, all algorithms in this subsection are presented in informal pseudocode, abstracting away implementation details, such as bounds of loops etc.

Perceptron Learning Algorithm

The Perceptron Learning Algorithm (PLA) [18] is intimately connected to the Perceptron1 architecture. It is mostly of historical interest. The TrS consists of m input-target pairs, i.e. S_m = \{(x^1, y^1), \ldots, (x^m, y^m)\}, with y^j \in \{0, 1\}, j = 1, \ldots, m. The Perceptron computes class predictions \hat{y} = h(x; \theta) = I(\theta^T x + b \geq 0). Without loss of generality, bias terms are omitted henceforth for convenience.2 The associated CF is the Indicator Cost, counting the number of misclassified training samples.3

C(\theta) = \sum_{j=1}^{m} I(\hat{y}^j \neq y^j) = \sum_{j=1}^{m} I(I(\theta^T x^j \geq 0) \neq y^j)    (3.101)

Using the above assumptions, Algorithm (1) outlines the PLA.

Algorithm 1 Perceptron Learning Algorithm

Input: Training Set S_m
Output: optimal parameters \theta^*
1: \theta_0 = 0
2: cmt: cycle through Training Set till convergence
3: while \theta changes do
4:   for j = 1 : m do
5:     cmt: update parameters if case is incorrectly classified
6:     \hat{y}^j = I(\theta_n^T x^j \geq 0)
7:     \theta_{n+1} = \theta_n - (\hat{y}^j - y^j) x^j
8: return \theta_n

The geometric intuition behind the PLA is that the weight vector is the normal vector to a separating hyperplane, the Decision Boundary. Each time a training example is incorrectly classified, the feature vector is added to or subtracted from the weight vector, shifting the associated hyperplane in a direction that fixes the error. By the Perceptron Convergence Theorem, if the TrS is linearly separable, the PLA is guaranteed to converge to a solution in which all training examples are classified correctly in a finite number of steps. Otherwise, the algorithm does not converge, not even to an approximate solution [19].
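A minimal NumPy transcription of Algorithm (1) is given below, with the bias folded into the weight vector via a constant-1 feature, as described in the text, and a maximum number of epochs added as a safeguard, since the PLA does not converge on non-separable data. Names and data are illustrative.

import numpy as np

def perceptron_learning(X, y, max_epochs=1000):
    """Algorithm 1 in NumPy. X holds one training example per row; the bias
    is assumed folded into the weights via a constant-1 feature."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        changed = False
        for x_j, y_j in zip(X, y):            # cycle through the TrS
            y_hat = float(theta @ x_j >= 0)
            if y_hat != y_j:                  # misclassified: shift hyperplane
                theta -= (y_hat - y_j) * x_j
                changed = True
        if not changed:                       # all examples classified correctly
            return theta
    return theta                              # may not have converged

X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
y = np.array([0., 0., 0., 1.])                # AND function (linearly separable)
theta = perceptron_learning(X, y)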

Gradient Descent with Backpropagation

Basic Framework

Gradient Descent (GD) with Backpropagation (BP) refers to a class of LAs combining the well-known GD Algorithm [161, ch. 2.3.1] with an algorithm to efficiently compute gradients of complex compositions of functions, such as the error gradients in ANNs. GD is a numerical, unconstrained, First-Order Optimization algorithm based on the idea of successively changing the parameters \theta in the direction opposite to the error gradient \nabla_\theta C(\theta). The Learning Rate (LR) \eta > 0 determines the step size in this direction of steepest descent on the CF C(\theta).

1 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Perceptron
2 Biases can be viewed as weights connected to a feature that is always equal to one.
3 This Cost is not actually stated explicitly, however it is more intuitive to think about it in these terms.

An ANN represented by hypothesis1 h(x; \theta) is trained using a TrS consisting of m input-target pairs S_m = \{(x^1, y^1), \ldots, (x^m, y^m)\}. It is assumed here that the optimization objective is a piecewise differentiable CF using Mean Aggregation2

C(\theta) = \frac{1}{m} \sum_{j=1}^{m} L(h(x^j; \theta), y^j)    (3.102)

This sets up the discussion in the Supervised Learning (SL) context. The case of Unsupervised Learning (UL) is completely analogous, except that no targets exist.3 In practice, gradient calculations are performed in parallel for the entire TrS. Let X and Y denote m \times n and m \times k matrices, such that their jth rows are the transposes of x^j and y^j, respectively. Hence, the TrS can be rewritten as S_m = \{(X, Y)\} and passed in its entirety to a BP subroutine that performs matrix operations, thus taking advantage of GPU speedups [162, 163]. Using the above assumptions, Algorithm (2) outlines the GD algorithm.

Algorithm 2 Batch Gradient Descent

Input: Training Set S_m, Initial State \theta^0
Output: optimal parameters \theta^*
1: \theta_0 = \theta^0
2: cmt: cycle through Training Set till convergence
3: while not converged do
4:   cmt: compute Cost grad and update params
5:   \nabla_\theta C(\theta_n) = BP(L, h, \theta_n, S_m)
6:   \theta_{n+1} = \theta_n - \eta \nabla_\theta C(\theta_n)
7: return \theta_n

For Recurrent Neural Networks (RNNs), the TrS has an additional time dimension, i.e. S_{m,T} = \{(X_t, Y_t) : t = 1, \ldots, T\}. In this case, the GD algorithm is analogous to Algorithm (2), except that the BP(L, h, \theta_n, S_m) subroutine is replaced by a subroutine BPTT(L, h, \theta_n, S_{m,T}) implementing Backpropagation Through Time (BPTT). What constitutes convergence is a subject in itself. In general, it is not trivial to determine whether GD has converged sufficiently. Incidentally, the actual Learning Problem4 is not to find a function that minimizes Training Error (TrE), but to infer a function that generalizes well from the TrS. Hence, during Training, the TrE and the Cost computed on a separate Validation Set (VaS), the Validation Error (VaE), are monitored. Learning is stopped according to a specific stopping criterion chosen to maximize Generalization Performance.5 Batch Gradient Descent (BGD) is the name of the elementary GD algorithm described above. There are extensions to this basic framework addressing various of its shortcomings.

Extension with Stochastic Gradient

If the Training Data is highly redundant, i.e. if a small subset of it is representative of the entire TrS, then gradients computed based on this subset are nearly exact. Hence, BGD wastes computational resources by processing the entire TrS before making a parameter update.

1 compare 3.1.1 and 3.2.1
2 compare 3.1.5, Definition Cost Function, Mean Aggregation
3 compare 3.1.2
4 compare 3.1.1, Terminology
5 compare 3.2.5, Early Stopping

Stochastic Gradient Descent (SGD) [101] is an approximate method that performs a parameter update after each individual training case, instead of looping through the entire TrS and then computing the average gradient. The training samples are assumed to be shuffled randomly. SGD thus estimates the average gradient based on a single training example, resulting in a noisy gradient estimate \tilde{\nabla}_\theta C(\theta_n). In expectation, the noise averages out over successive training cases and the parameters move towards a local optimum. This stochasticity in the gradient estimate is an advantage and a disadvantage at the same time. SGD is usually much faster than BGD, particularly on large, redundant TrSs, since parameters are updated in every iteration. Due to the noisy gradient, it is able to escape shallow local minima in the error surface, and therefore tends to find better solutions [164, 165]. Furthermore, SGD is well-suited for Online Learning, where training examples are observed in the form of a continuous stream of data. On the other hand, the stochastic gradient prevents full convergence. Instead, the parameters end up fluctuating around the minimum, with the size of the fluctuations depending on the LR. Incidentally, the PLA (Algorithm 1), described earlier, is equivalent to SGD on the Mean Squared Error (MSE) CF with a LR equal to 1. Mini-Batch Stochastic Gradient Descent (MBSGD) [1, ch. 8.1.3, 166, chs. 3.4, 4.2] is another approximate method that updates parameters after looking at a fixed number of training cases. It is the de facto default for Deep Learning (DL) applications [1, ch. 8.3.1].

Let S_{m,q:r} = \{(X_{q:r}, Y_{q:r})\} denote the subset of the TrS including training samples q through r-1, corresponding to the respective rows in X and Y. These chunks of data are referred to as Mini Batches (MBs) of size r-q. MBSGD processes non-overlapping MBs S_{j:j+b} of size b, thus striking a compromise between fast parameter updates and exploitation of GPU parallelism [162, 163]. Note that SGD, described earlier, is a special case of MBSGD with b = 1. In MBSGD, it is paramount to randomly shuffle the Training Data in order to make individual MBs as representative of the TrS as possible. Using the above assumptions, Algorithm (3) outlines the MBSGD algorithm.

Algorithm 3 Mini-Batch Stochastic Gradient Descent

Input: Training Set S_m, Initial State \theta^0
Output: optimal parameters \theta^*
1: \theta_0 = \theta^0
2: cmt: cycle through Training Set till convergence
3: while not converged do
4:   cmt: compute approx. Cost grad based on current Mini-Batch; update params
5:   for each non-overlapping chunk j : j + b do
6:     \tilde{\nabla}_\theta C(\theta_n) = BP(L, h, \theta_n, S_{m,j:j+b})
7:     \theta_{n+1} = \theta_n - \eta \tilde{\nabla}_\theta C(\theta_n)
8: return \theta_n
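The following NumPy sketch outlines Algorithm (3). The BP subroutine is abstracted into a callback grad_fn, and a fixed number of epochs stands in for the convergence test; as a usage example, a hypothetical MSE gradient for LinR is plugged in. All names are illustrative.

import numpy as np

def mbsgd(theta, X, Y, grad_fn, eta=0.01, b=32, epochs=10, seed=0):
    """Skeleton of Algorithm 3. grad_fn(theta, X_mb, Y_mb) stands in for the
    BP subroutine, returning the average Cost gradient over a Mini-Batch."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    for _ in range(epochs):
        perm = rng.permutation(m)          # shuffle so MBs stay representative
        for j in range(0, m, b):           # non-overlapping chunks j : j+b
            idx = perm[j:j + b]
            theta = theta - eta * grad_fn(theta, X[idx], Y[idx])
    return theta

def mse_grad(theta, X_mb, Y_mb):
    """Hypothetical grad_fn: average MSE gradient for Linear Regression."""
    return 2.0 * X_mb.T @ (X_mb @ theta - Y_mb) / len(Y_mb)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = X @ np.array([1., -2., 0.5])
theta = mbsgd(np.zeros(3), X, Y, mse_grad)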

Extension with Momentum

GD may take a long time to converge if the error surface is pathological. For example, assume a model with only two parameters whose error surface is shaped like a tilted, elongated bowl with low curvature along the stretched axis and high curvature along the other. Starting at a random initial point, the direction of steepest descent is likely almost orthogonal to the vector pointing towards the minimum. Hence, with a high LR, GD oscillates across the ravine, advancing only slowly in the desirable direction. If the LR is too high, these oscillations can result in divergence.

Gradient Descent with Momentum (GDwM) [167] accumulates past gradients in a velocity vector, building up speed in directions of consistent error reduction, while oscillations along directions of unstable gradient cancel. This is analogous to the movement of a ball dropped onto the error surface while under the influence of gravity and friction. Concretely, the parameter update rule in the basic GD LA (Algorithm 2) changes to

v_{n+1} = \mu v_n - \eta \nabla_\theta C(\theta_n)    (3.103)
\theta_{n+1} = \theta_n + v_{n+1}

where \mu \in [0, 1) is the Momentum parameter, and v_n is the velocity vector at iteration n, with v_0 = 0. This update rule can be expressed in terms of an exponentially decaying sum of all previous gradients.

\theta_{n+1} = \theta_n - \eta \sum_{k=0}^{n} \mu^k \nabla_\theta C(\theta_{n-k})    (3.104)

Momentum accelerates in directions of consistent gradient until a terminal velocity, corresponding to a step size of \eta / (1 - \mu), is reached. With \mu = 0.9, a speedup by a factor of 10 is possible. This acceleration in directions of low curvature is similar to the effect of Second-Order Optimization algorithms, which weigh update directions by a factor related to the inverse of the associated curvatures. It has been demonstrated that GDwM with \mu = (\sqrt{R} - 1)/(\sqrt{R} + 1) converges considerably faster than regular GD, in O(1/\sqrt{R}) times the number of iterations [168]. In addition, it allows for higher LRs, since there is less risk of divergent oscillations. Gradient Descent with Nesterov Momentum (GDwNM) [169, 170] is a variation of GDwM, in which the gradient correction is performed after a step in the direction of the velocity vector is taken. Its update rule is

v_{n+1} = \mu v_n - \eta \nabla_\theta C(\theta_n + \mu v_n)    (3.105)
\theta_{n+1} = \theta_n + v_{n+1}

Figure 3.14 illustrates the update rules of GDwM and GDwNM. The correspondence to the notation used in this thesis is obvious.

Figure 3.14: GDwM (top) and GDwNM (bottom) update rule [170, Fig. 1].

If \mu v_n happens to point in a direction of increased error, the post-gradient update \eta \nabla_\theta C(\theta_n + \mu v_n) provides a stronger and more immediate correction than the pre-gradient update \eta \nabla_\theta C(\theta_n). In some cases, particularly for high \mu, this leads to improved stability and performance of GDwNM over GDwM.
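The update rules (3.103) and (3.105) translate directly into code. In the following sketch, grad_fn stands in for the error-gradient computation, and a toy quadratic Cost serves as a usage example; both are illustrative assumptions.

import numpy as np

def momentum_step(theta, v, grad_fn, eta=0.01, mu=0.9):
    """Classical Momentum update (3.103)."""
    v_new = mu * v - eta * grad_fn(theta)
    return theta + v_new, v_new

def nesterov_step(theta, v, grad_fn, eta=0.01, mu=0.9):
    """Nesterov Momentum update (3.105): the gradient is evaluated after
    the look-ahead step mu * v, giving a more immediate correction."""
    v_new = mu * v - eta * grad_fn(theta + mu * v)
    return theta + v_new, v_new

grad_fn = lambda th: 2.0 * th                 # toy quadratic Cost C = ||theta||^2
theta, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(100):
    theta, v = nesterov_step(theta, v, grad_fn)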

Extension with varying Learning Rates

The Performance of GD depends critically on the LR. If it is too high, oscillations can occur that cause GD to diverge. On the other hand, if the LR is too small, Learning progresses slowly.

Gradient Descent with Learning Rate Schedules

Close to the local minimum, the LR should be turned down in order to allow the algorithm to settle. This is particularly relevant in SGD and MBSGD, where gradients are subject to random fluctuations due to heterogeneity of the MBs [1, ch. 8.3.1]. To this end, LR Schedules [171] are introduced that make the LR a deterministic function of the iteration number, e.g. \eta_n = O(n^{-1}). This is referred to as Gradient Descent with Learning Rate Schedule (GDwLRS). The update rule becomes

\theta_{n+1} = \theta_n - \eta_n \nabla_\theta C(\theta_n)    (3.106)

Gradient Descent with Adaptive Learning Rates

In Multilayer ANNs, error gradients with respect to weights in low Layers can be significantly smaller than those associated with Layers close to the output. As a result, optimal LRs may vary widely for different weights. Ideally, every parameter should have its own adaptive LR that automatically increases when the associated component of the error gradient is small but consistent, and automatically decreases when oscillations occur. Methods of this type are referred to as Gradient Descent with Adaptive Learning Rates (GDwALR). RProp1 [172] is an instance of GDwALR. It is a full-batch method with the following update equations

\eta_n = 1.2 \, \eta_{n-1} \circ I(\nabla_\theta C(\theta_n) \circ \nabla_\theta C(\theta_{n-1}) > 0) + 0.5 \, \eta_{n-1} \circ I(\nabla_\theta C(\theta_n) \circ \nabla_\theta C(\theta_{n-1}) \leq 0)    (3.107)
\theta_{n+1} = \theta_n - \eta_n \circ \text{sign}(\nabla_\theta C(\theta_n))

where \eta_n is a structure of LRs of the same dimensions as \theta_n, with \eta_0 = \eta \mathbf{1}. The hyperparameter \eta is a small base LR. I denotes the indicator function and all operations are applied elementwise. This method ignores the size of the gradient, only considering its direction. This prevents parameter updates from becoming small when the gradient is small, which helps Learning to quickly escape from plateaus on the error surface. If the signs of the last two gradients agree, the LR is increased; if they disagree, it is decreased more strongly, so that the step size can die down quickly when oscillations occur. It is useful to limit the maximum step size to a value smaller than 50. RMSProp2 [173] is a MB version of RProp. Its update equations are

v_{n+1} = \gamma v_n + (1 - \gamma)(\nabla_\theta C(\theta_n))^2
\eta_n = \eta / \sqrt{v_{n+1}}    (3.108)
\theta_{n+1} = \theta_n - \eta_n \circ \nabla_\theta C(\theta_n)

1 Resilient Propagation
2 Root Mean Square Propagation

where v_n is an Exponential Moving Average (EMA) of an estimate of the uncentered variance of the gradient, with v_0 = 0, and \gamma \in [0, 1] is a decay term. All operations are performed elementwise. By multiplying with this LR, the gradient associated with each parameter is effectively divided by a type of EMA of past gradients. The effect of this scheme is that, similar to RProp, parameter updates are sensitive to the sign of the gradient, but relatively insensitive to its magnitude. This way, Learning does not slow down on plateaus of the error surface. For \gamma = 0, a simplified version of RProp is recovered that misses the explicit rule for adapting individual LRs as a function of gradient sign consistency. By dividing by an average of past squared gradients instead, updates are better averaged over successive MBs, using past and current gradient magnitudes for weighing. However, RMSProp is not sensitive to sign consistency of the gradient, i.e. LRs are not dampened if the gradients oscillate. Adam1 [174] is an instance of GDwALR that combines properties of RMSProp and Momentum. The simplified2 update equations are

m_{n+1} = β_1 m_n + (1 − β_1) ∇_θ C(θ_n)
v_{n+1} = β_2 v_n + (1 − β_2)(∇_θ C(θ_n))^2    (3.109)
η_n = η / √(v_{n+1})

θ_{n+1} = θ_n − η_n ◦ m_{n+1}

where m_n is an EMA of the first moment of the gradient, similar to the velocity vector in GDwM, albeit different, since it is an average rather than a sum, with m_0 = 0. The term v_n is an EMA of the second moment of the gradient, identical to the corresponding term in RMSProp, with v_0 = 0. Lastly, β_1, β_2 ∈ [0, 1] are decay terms. In addition to the advantages of RMSProp, Adam's sensitivity to gradient sign consistency dampens updates in directions of oscillation, although no acceleration can occur in directions of consistent gradient. Adam has been successfully combined with Nesterov Momentum [175].
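The simplified Adam update of (3.109) translates into the following sketch; the small constant eps stands in for the numerical-stability terms mentioned in the footnote, and bias correction is omitted as in the text.

    import numpy as np

    def adam_simplified(theta, grad_C, eta=0.001, beta1=0.9, beta2=0.999,
                        n_iter=1000, eps=1e-8):
        m = np.zeros_like(theta)  # EMA of the gradient (first moment)
        v = np.zeros_like(theta)  # EMA of the squared gradient (second moment)
        for _ in range(n_iter):
            g = grad_C(theta)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g**2
            theta = theta - eta / (np.sqrt(v) + eps) * m
        return theta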

Backpropagation
BP [32] is an algorithm used to efficiently compute error gradients in the GD LA, and an instance of Dynamic Programming [103]. It is of extraordinary practical significance, as it renders Training of ANNs of non-trivial size feasible. BP has two phases, a Forward Pass and a Backward Pass. In the Forward Pass, the network is evaluated on its input. In the Backward Pass, an error signal in the form of recursively accumulated error gradients is propagated backwards through the architecture via iterative application of the Chain Rule. Exact error gradients with respect to all model parameters are thus obtained in a single Forward-Backward Pass. This method of computing error gradients is far more efficient than naive approaches, such as the Finite Difference Method (FDM), which requires evaluating the network twice for every parameter and only provides approximate gradients. If the time complexity of evaluating the network is O_net, then the time complexity of BP is also O_net, while the time complexity of FDM^3 is O(p · O_net), where p is the number of model parameters.

1 Adaptive Moment Estimation
2 The full equations contain terms addressing numerical issues and bias correction, which are omitted here for clarity.
3 In this context, the number of model parameters refers to scalar parameters, i.e. each element in a weight matrix counts as a model parameter. Hence, p can be very large, which renders the linear speedup of BP over FDM significant.

In what follows, it is assumed that the network is a Multilayer Perceptron (MLP) with n_i inputs, L Hidden Layers (HLs) with n_l Hidden Units (HUs) in its lth HL, and n_o Output Units (OUs). Moreover, all HUs are equipped with transition function g, and all OUs are equipped with transition function o. The general principle, however, carries over to any Feedforward Neural Network (FNN) architecture. For notational convenience and without loss of generality, it is further assumed that the network does not have biases, i.e. θ = {W^1, ..., W^{L+1}}.

Furthermore, Z^l and A^l denote the matrices of Net Inputs and Activations of the lth Layer, and ∇_{W^l} denotes the gradient of the CF with respect to W^l, with all other gradients following the same notational convention.^1 All functions are applied elementwise, and ◦ denotes elementwise multiplication. Using the above assumptions, Algorithm (4) outlines the BP algorithm.^2

Algorithm 4 Backpropagation
Input: Loss Fun. L, Hyp. Fun. h as in (3.79), params θ, Batch {(X, Y)} w. m samples
Output: average Loss grad over batch (1/m) ∑_{j=1}^m ∇_θ l^j(θ)
1: cmt: 1. Forward Pass: recursively compute Activations
2: A^0 = X
3: cmt: loop over Hidden Layers
4: for l = 1, ..., L do
5:   Z^l = A^{l−1} W^l
6:   A^l = g(Z^l)
7: Z^{L+1} = A^L W^{L+1}
8: Ŷ = o(Z^{L+1})
9: cmt: 2. Backward Pass: recursively compute Net Input grads, then compute
10: cmt: weight grads (summed over batch) by multiplying with Activations
11: ∇_Ŷ = L'(Ŷ, Y)
12: ∇_{Z^{L+1}} = o'(Z^{L+1}) ◦ ∇_Ŷ
13: ∇_{W^{L+1}} = (A^L)^T ∇_{Z^{L+1}}
14: cmt: loop over Hidden Layers
15: for l = L, ..., 1 do
16:   ∇_{Z^l} = g'(Z^l) ◦ (∇_{Z^{l+1}} (W^{l+1})^T)
17:   ∇_{W^l} = (A^{l−1})^T ∇_{Z^l}
18: cmt: assemble structure of summed gradients
19: ∇_θ = {∇_{W^1}, ..., ∇_{W^{L+1}}}
20: return ∇_θ / m
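A direct NumPy transcription of Algorithm (4) for an MLP without biases is sketched below; the choice of tanh HUs, linear OUs, and the squared-error derivative L'(Ŷ, Y) = Ŷ − Y are illustrative assumptions.

    import numpy as np

    def backprop(W, X, Y, g=np.tanh, g_prime=lambda z: 1 - np.tanh(z)**2):
        # W is a list [W1, ..., W_{L+1}]; linear output units are assumed
        m = X.shape[0]
        # 1. Forward Pass: recursively compute Activations
        A, Z = [X], []
        for l in range(len(W) - 1):
            Z.append(A[-1] @ W[l])
            A.append(g(Z[-1]))
        Y_hat = A[-1] @ W[-1]               # o(z) = z for linear OUs
        # 2. Backward Pass: recursively compute Net Input grads
        dZ = Y_hat - Y                      # L'(Yhat, Y) with o'(z) = 1
        grads = [None] * len(W)
        grads[-1] = A[-1].T @ dZ
        for l in range(len(W) - 2, -1, -1):
            dZ = g_prime(Z[l]) * (dZ @ W[l + 1].T)
            grads[l] = A[l].T @ dZ
        return [gW / m for gW in grads]     # average over the batch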

Backpropagation Through Time
BPTT [176] is a generalized version of BP for RNNs. RNNs can be unrolled in time and considered deep FNNs with tied weights, i.e. all time slices share the same weights. Figure 3.15 depicts a schematic of a Stacked Elman RNN, unrolled in time.

1 The term gradient is used loosely in this context. The quantity ∇_{W^l} is a matrix of the same dimensions as W^l, whose elements are the sums (over the batch of training examples passed to BP) of the respective partial derivatives.
2 Biases can be easily accommodated by prepending them as a first row to all weight matrices, and by prepending a column of ones to the input and all Activation matrices.

Figure 3.15: Stacked Elman RNN with two HLs, unrolled in time. Layers are represented by blocks and each arrow represents full connectivity. The HL Activations at t = 0 are initialized to a_1^0 and a_2^0, respectively. Incidentally, this is the unrolled representation of 3.11 b).

The output, and hence the error, is distributed through time. While t indexes the time component of the error, a second time index τ ≤ t is introduced to reference quantities that can affect the error component in t. BPTT pretends that different time slices contain different weights, and computes gradients of error component t with respect to each weight in time slice τ ≤ t. In the end, gradients are summed over both time indexes to account for the fact that there is only one weight that affects multiple time components of the error, affecting each of them through contributions from multiple clock cycles.

In what follows, it is assumed that the network is a Stacked Elman RNN with n_i inputs, L fully recursive HLs with n_l HUs in its lth HL, and n_o OUs. All HUs are equipped with transition function g, and all OUs with transition function o. The general principle, however, carries over to any RNN architecture. In addition to the assumptions made for BP, it is assumed that inputs and targets are present at every time step. The network parameters are θ = {W^1, ..., W^{L+1}, U^1, ..., U^L}. If biases are treated as weights, it is further necessary to prepend a row of zeros to each recurrent weight matrix U and to return zero gradients for the respective elements. Using the above assumptions, Algorithm (5) outlines the BPTT algorithm. Any uninitialized quantities are assumed to be zero.^1

1 The below algorithm can be extended to account for the more general case where inputs and targets are not present at every time step by setting the respective elements of Xt and Y t to zero.

Algorithm 5 Backpropagation Through Time
Input: Loss Fun. L, Hyp. Fun. h as in (3.82), params θ, Batch {(X_t, Y_t)} w. m samples
Output: average Loss grad over batch (1/m) ∑_{j=1}^m ∇_θ l^j(θ)
1: cmt: 0. initialize Hidden States
2: for l = 1, ..., L do
3:   A^{l,0} = A_l^0
4: cmt: 1. Forward Pass: recursively compute Activations
5: cmt: loop over time steps
6: for t = 1, ..., T do
7:   A^{0,t} = X_t
8:   cmt: loop over Hidden Layers
9:   for l = 1, ..., L do
10:    Z^{l,t} = A^{l−1,t} W^l + A^{l,t−1} U^l
11:    A^{l,t} = g(Z^{l,t})
12:  Z^{L+1,t} = A^{L,t} W^{L+1}
13:  Ŷ_t = o(Z^{L+1,t})
14: cmt: 2. Backward Pass: recursively compute Net Input grads, then compute
15: cmt: weight grads (summed over batch) by multiplying with Activations
16: cmt: loop backwards over time components of the error indexed by t
17: for t = T, ..., 1 do
18:   cmt: loop backwards over time indexes τ that affect error in t
19:   for τ = t, ..., 1 do
20:     ∇_Ŷ^{t,τ} = L'(Ŷ_t, Y_t)
21:     ∇_{Z^{L+1}}^{t,τ} = o'(Z^{L+1,τ}) ◦ ∇_Ŷ^{t,τ}
22:     ∇_{W^{L+1}}^{t,τ} = (A^{L,τ})^T ∇_{Z^{L+1}}^{t,τ}
23:     cmt: loop over Hidden Layers
24:     for l = L, ..., 1 do
25:       ∇_{Z^l}^{t,τ} = g'(Z^{l,τ}) ◦ (∇_{Z^{l+1}}^{t,τ} (W^{l+1})^T + ∇_{Z^l}^{t,τ+1} (U^l)^T)
26:       ∇_{W^l}^{t,τ} = (A^{l−1,τ})^T ∇_{Z^l}^{t,τ}
27:       ∇_{U^l}^{t,τ} = (A^{l,τ−1})^T ∇_{Z^l}^{t,τ}
28: cmt: 3. compute total gradients
29: ∇_{W^l} = ∑_{t=1}^T ∑_{τ=1}^t ∇_{W^l}^{t,τ}
30: ∇_{U^l} = ∑_{t=1}^T ∑_{τ=1}^t ∇_{U^l}^{t,τ}
31: cmt: assemble structure of summed gradients
32: ∇_θ = {∇_{W^1}, ..., ∇_{W^{L+1}}, ∇_{U^1}, ..., ∇_{U^L}}
33: return ∇_θ / m

This basic BPTT framework can be extended to learn the initial Hidden States A_l^0, l = 1, ..., L, by making them parameters of the model.

Weight Initialization
The GD LA is still underspecified. In particular, it has not been addressed what initial parameters θ_0 to use in line 1 of Algorithm (2).

Weights in a multilayer ANN have to be initialized to different values. This is known as Symmetry Breaking. Were all weights initialized to the same value, e.g. to 0, some weights could never become different from each other during Learning. For instance, in an MLP^1, elements of W^1

Gradient Descent with Contrastive Divergence
GD with Contrastive Divergence (CD) [177] is a popular LA for Restricted Boltzmann Machines (RBMs)^6 over binary data. CD refers to a subroutine returning an error gradient, comparable to the BP subroutine. Since it is implicitly understood that GD is performed, the literature sometimes refers to the whole LA as CD. Described below is a more general version of CD, called CD-k, which reduces to CD for k = 1.

1 compare 3.2.3, Directed Architectures, Feedforward Neural Networks, Multilayer Perceptron
2 compare 3.2.2, Types of Activation Functions
3 compare 3.1.3, Data Pre-Processing
4 LeCun actually recommends this initialization based on scaled TUs with Activation Function g(z) = 1.7159 tanh((2/3) z).
5 compare 3.3.3, Special Initialization Schemes
6 compare 3.2.3, Undirected Architectures, Restricted Boltzmann Machine

Informally, CD creates an energy minimum at the data by (a) lowering the energy of states associated with the data in a so-called Positive Phase, and (b) raising the energy of close-by states in a Negative Phase. Learning stops when the model produces states from the same distribution as the data, and the two phases offset each other. In the Positive Phase, the Visible Units (VUs) are initialized to the data and the distribution over Hidden States is computed. The energy of the associated states is then lowered. In the Negative Phase, m parallel Markov Chains are started at the data, and k full steps of Block Gibbs Sampling [159] are performed, producing a reconstruction of the data and the associated Hidden States. When computing these reconstructions, the model drifts into likely close-by states, whose energy is then raised. Although for low k the Negative Phase does not bring the model into thermal equilibrium, Learning still works.

In what follows, an RBM is assumed with n_v VUs and n_h HUs, all of which are Stochastic Binary Units (SBUs)^1. As before, it is assumed that the network does not have biases, i.e. θ = {W}. The TrS consists of the input matrix only, i.e. S_m = {X}.

Let V and H denote m × n_v and m × n_h matrices of Visible and Hidden States. Furthermore, let P_v and P_h denote matrices of the corresponding conditional state probabilities. Lastly, let ber(P) denote a function, applied elementwise, returning a matrix of Bernoulli distributed random variables with probability parameters P. Since RBMs are Density Estimation (DE)^2 models, their associated CF, minimized in the outer GD procedure, is the Negative Log Likelihood (NLL) CF^3. Its negative scaled gradient with respect to the parameters evaluates to

−m ∇_θ C_nll(θ_n) = ∇_θ ℓ(θ|X) = ∇_θ log p̂(X; θ) = ∇_W log p(V; W)
= E_{p(h|v;W)}(V^T H) − E_{p(v,h;W)}(V^T H)    (3.110)

E_{p(h|v;W)}(V^T H) = ⟨V^T H⟩_0^∞ = V^T E_{p(h|v;W)}(H) = V^T P_h
E_{p(v,h;W)}(V^T H) = ⟨V^T H⟩_∞^∞ ≈ ⟨V^T H⟩_k^∞ ≈ ⟨V^T H⟩_k^m    (3.111)

where ℓ(θ|X) is the Log Likelihood function of the parameters given the Training Data, and log p̂(X; θ) denotes the scalar quantity log ∏_{j=1}^m p̂(x^j; θ).^4 Hence, the n_v × n_h gradient matrix ∇_θ log p̂(X; θ) is the sum of m gradient matrices, ∑_{j=1}^m ∇_θ log p̂(x^j; θ), associated with the individual training cases. Since C_nll is defined as an average over training cases, the scaling factor m in (3.110) is needed. E_{p(h|v;W)}(V^T H) and E_{p(v,h;W)}(V^T H) are sometimes referred to as the Positive and Negative Gradient. Lastly, ⟨V^T H⟩_r^n is an appropriately scaled sample average over a sample of size n, obtained after r steps of Block Gibbs Sampling. For r = 0, samples are drawn from p(h|v; W), where the VUs are clamped to the data and never change. V^T H represents exactly one sample for each of the m training cases, implicitly summed over via the matrix product. For r > 0, samples are drawn from p(v, h; W), such that the Visible States themselves change. There is no notion of a training case. Rather, the rows of V and H contain samples, i.e. n Markov Chains are initialized at the VUs and r steps of Gibbs Sampling are performed to arrive at V^T H. Therefore, a scaling factor m is needed to account for TrS size. Concretely, drawing n samples

1 compare 3.2.2, Types of Activation Function, Stochastic Binary Activation 2 compare 3.1.6, Density Estimation 3 compare 3.1.5, Types of Cost Functions, Negative Log Likelihood Cost Function 4 This holds since the training cases are i.i.d.

in each case, the expressions evaluate as follows

⟨V^T H⟩_0^n = (1/n) ∑_{s=1}^n V_s^T H_s = (1/n) ∑_{s=1}^n V^T H_s  ⟹  [⟨V^T H⟩_0^n]_{ij} = (1/n) ∑_{s=1}^n ∑_{l=1}^m v_i^l h_j^{l,s}    (3.112)
⟨V^T H⟩_{r>0}^n = m (1/n) V_r^T H_r  ⟹  [⟨V^T H⟩_{r>0}^n]_{ij} = m (1/n) ∑_{s=1}^n v_i^s h_j^s

where [A]_{ij} denotes the element in the ith row and jth column of matrix A. In the basic version of CD described here, the scaling factor and the number of samples in the Negative Phase coincide, i.e. m = n. This is the case since the Markov Chains are initialized at the data.^1 Hence, the CD subroutine is embedded in the outer GD procedure by changing line 6 in Algorithm (2) to

∇_θ C_nll(θ_n) = −CD(θ_n, S_m, k)/m    (3.113)

where it returns the approximate gradient of the Log Likelihood of the data under the current model

∇_θ log p̂(X; θ) ≈ CD(θ_n, S_m, k) = ⟨V^T H⟩_0^∞ − ⟨V^T H⟩_k^m = V^T P_h − ⟨V^T H⟩_k^m    (3.114)

Using the above assumptions, Algorithm (6) outlines the CD subroutine.

Algorithm 6 Contrastive Divergence
Input: params θ, Batch {(X, Y)} w. m samples, hyperparam k
Output: sum of Log Likelihood Loss grad over batch ∑_{j=1}^m ∇_θ l^j(θ)
1: cmt: 1. Positive Phase
2: cmt: set Visible State to input, compute Positive Gradient E_pos
3: V = X
4: P_h = g(V W)
5: H = ber(P_h)
6: E_pos = V^T P_h
7: cmt: 2. Negative Phase
8: cmt: perform k steps of Block Gibbs Sampling, compute approx. Negative Gradient
9: for i = 1, ..., k do
10:  P_v = g(H W^T)
11:  V = ber(P_v)
12:  P_h = g(V W)
13:  H = ber(P_h)
14: E_neg = V^T H
15: cmt: compute approx. gradient of Log Likelihood
16: ∇_θ = ∇_W = E_pos − E_neg
17: return ∇_θ

1 Observe that the Positive Gradient is given exactly by V^T P_h. This is equivalent to evaluating the sample average over an infinitely large sample, ⟨V^T H⟩_0^∞. The Negative Gradient is approximated by ⟨V^T H⟩_r^m, which involves two layers of approximation. By only taking r steps of Block Gibbs Sampling, the sample V^T H is not drawn from the equilibrium distribution p(v, h; W), but rather from an approximation of it. Secondly, the expectation is approximated by a sample average over a sample of size m.
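Algorithm (6) translates almost line by line into NumPy; the sketch below assumes the logistic sigmoid as the conditional probability function g of the SBUs and a binary data matrix X.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ber(P, rng):
        # elementwise Bernoulli samples with success probabilities P
        return (rng.random(P.shape) < P).astype(P.dtype)

    def contrastive_divergence(W, X, k=1, rng=None):
        rng = rng or np.random.default_rng()
        # 1. Positive Phase: clamp the Visible Units to the data
        V = X
        P_h = sigmoid(V @ W)
        H = ber(P_h, rng)
        E_pos = V.T @ P_h
        # 2. Negative Phase: k steps of Block Gibbs Sampling
        for _ in range(k):
            P_v = sigmoid(H @ W.T)
            V = ber(P_v, rng)
            P_h = sigmoid(V @ W)
            H = ber(P_h, rng)
        E_neg = V.T @ H
        # approximate gradient of the Log Likelihood, summed over the batch
        return E_pos - E_neg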

While the outer GD loop is identical to Algorithm (2), it is not obvious that all extensions to the basic framework can be applied with CD. The relevant papers assume exact gradients or, in the case of SGD and MBSGD, unbiased stochastic gradients. CD, however, returns approximate, biased gradients. General guidelines [178] for training RBMs using GD with CD suggest that MBs and Momentum can be used. A shortcoming of CD-k is that, for small k, the Negative Phase does not reach high-energy configurations that are far from the data points. This can be remedied by increasing k during Training. A further improvement is Persistent Contrastive Divergence (PCD) [179], which uses persistent negative Fantasy Particles that can reach high-energy states far from any data point. In PCD, the Negative Phase is not initialized at the data, but at the Fantasy States of the previous GD iteration.

Other Learning Algorithms
Second Order Methods (SOMs), such as Levenberg-Marquardt [180, 181], make use of local curvature information and can be employed to train ANNs. Typically, the gradient is not the direction of largest possible error reduction if steps of arbitrary size are allowed. Rather, the optimal direction depends on the ratio of gradient to curvature. Directions with a small gradient can be optimal if the associated curvature is even smaller. In SOMs, parameter updates in low-curvature directions are amplified by reweighing them along each eigendirection of the Hessian^1 by the inverse of the associated curvature [170]. Momentum, described earlier, achieves a similar amplification by accumulating past gradients. In general, SOMs are computationally expensive since inversion of the Hessian is required. For sufficiently large networks, this becomes infeasible. Hessian-Free Optimization (HFO) is a LA that avoids construction and inversion of the Hessian while still using curvature information to inform parameter updates. It is discussed in more detail in the "Deep Learning" section^2. Evolutionary Optimization Algorithms (EOAs) [182] have also been successfully used for ANN Training [183, 184].

3.2.5 Improving Generalization

Generalization Error
In the context of Supervised Learning (SL)^3, it is essential that the trained model generalize well to unseen data. A model that makes good predictions for data not in the Training Set (TrS) is said to have low Generalization Error (GE). Therefore, the primary goal of Learning is to find a good model of the Data Generating Process, not to find a model with low Training Error (TrE) [185, ch. 9].^4

1 For a network with p parameters, the Hessian, i.e. curvature matrix, is the p × p matrix of second-order partial derivatives of the CF.
2 compare 3.3.3
3 compare 3.1.2, Supervised Learning
4 compare 3.1.1, Terminology

High GE is caused by either Underfitting or Overfitting, phenomena that are best understood in the context of the Bias-Variance Trade-off [121]. If model capacity^1 is too low, the model is unable to fit the true regularities in the data and is said to have high Bias. This situation is often referred to as Underfitting, which leads to high Test Error (TeE) and poor generalization. If, on the other hand, model capacity is too high, the model fits uncharacteristic, accidental regularities in the TrS and is said to have high Variance. This scenario is called Overfitting and is characterized by low TrE but poor generalization. Bias and Variance can be traded off by adjusting model complexity. Figure 3.16 illustrates Underfitting and Overfitting.

Figure 3.16: The thick, solid line is an 8th order polynomial that overfits the data and produces implausible out-of-sample predictions. The thin, solid line is a first order polynomial that underfits the data, thus also producing implausible out-of-sample predictions. A model of medium complexity, such as the second order polynomial represented by the dashed line, sensibly trades off Bias and Variance.

Methods to Prevent Underfitting
Underfitting is addressed by increasing model capacity, e.g. by adding more Layers or Hidden Units (HUs). In case Training has not converged, a longer Training period can be considered. However, training the model on more data cannot alleviate Underfitting, since a high-Bias model already lacks the capacity necessary to represent the salient features of the data.

Methods to Prevent Overfitting
In practice, Overfitting is more difficult to address. Methods to combat it are often collectively referred to as Regularization.

More Training Data
The most effective way to prevent Overfitting is to train the model on more data. This method is preferred over other Regularization techniques if enough computational resources are available. The bigger the Data Set, the less likely it is that accidental regularities persist.

1 compare 3.1.4, Growth Function and VC-Dimension

Early Stopping
This method can be viewed as Regularization in time. When using an iterative Learning Algorithm (LA), such as Gradient Descent (GD) with Backpropagation (BP)^1, TrE decreases over time, while GE first decreases and then increases again once the model begins to overfit. In Early Stopping (ES), part of the data is held out as a Validation Set (VaS) that is used to compute a proxy for GE after each parameter update.^2 Training is stopped after Validation Error (VaE) has started to increase. The model with minimal VaE is then retained. While Training is often stopped after a particular number of consecutive increases in VaE, other stopping criteria have been investigated [186]. With Sigmoid Units (SUs) and Tanh Units (TUs)^3, ES effectively limits model capacity by stopping Training before HUs saturate, i.e. before the model becomes too nonlinear.
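A minimal sketch of ES with the "consecutive increases" stopping criterion; model, update, val_error, and patience are generic placeholders, not fixed by the text.

    import copy

    def train_with_early_stopping(model, update, val_error, patience=10):
        # stop after `patience` consecutive increases in VaE,
        # retaining the model with minimal VaE seen so far
        best_err, best_model, n_bad = float("inf"), copy.deepcopy(model), 0
        while n_bad < patience:
            update(model)              # one parameter update on the TrS
            err = val_error(model)     # proxy for GE, computed on the VaS
            if err < best_err:
                best_err, best_model, n_bad = err, copy.deepcopy(model), 0
            else:
                n_bad += 1
        return best_model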

Weight Decay
In Weight Decay (WD) [187], a term inhibiting weight growth is added to the Objective Function. It is therefore another example of limiting model capacity. Ng [188] showed that WD is particularly effective in the presence of a large number of irrelevant features. Two types of Weight Decay are commonly used, L1 and L2 Regularization. L1 Regularization, also referred to as LASSO^4 [189], adds the ℓ1-norm of the weight vector to the unregularized Cost Function (CF)^5 C̄(θ)

C(θ) = C̄(θ) + α ‖θ‖_1 = C̄(θ) + α ∑_{i=1}^k |θ_i|    (3.115)

where α is a tunable Hyperparameter (HP)^6 that controls the Bias-Variance Trade-off. The larger α, the stronger the Regularization effect, the higher the Bias, and the lower the Variance. The contribution of the Regularization term to the gradient is always constant in magnitude with respect to θ_i and equal to it in sign.

∂C/∂θ_i = ∂/∂θ_i (C̄(θ) + α ‖θ‖_1) = ∂C̄(θ)/∂θ_i + α sign(θ_i)    (3.116)

This implies that during Training, L1 Regularization exerts the same constant shrinkage force on all weights, regardless of their magnitude.^7 Consequently, many of the irrelevant weights are pushed to exactly zero, while the remaining weights grow larger, all magnitudes being equally penalized. In the lowest network Layer, this behavior is akin to Feature Selection, where only a small number of the most relevant input features is selected. L2 Regularization [190], sometimes called Tikhonov Regularization, imposes a penalty on the ℓ2-norm, i.e. the (Euclidean) length, of the weight vector.

C(θ) = C̄(θ) + α ‖θ‖_2^2 = C̄(θ) + α ∑_{i=1}^k θ_i^2    (3.117)

While always equal in sign to θ_i, the contribution of the Regularization term to the gradient is also proportional to θ_i.

∂C/∂θ_i = ∂/∂θ_i (C̄(θ) + α ‖θ‖_2^2) = ∂C̄(θ)/∂θ_i + 2α θ_i    (3.118)

1 compare 3.2.4, Gradient Descent with Backpropagation
2 The VaS is completely separate from the TrS, and not used in gradient computation.
3 compare 3.2.2, Types of Activation Functions, Sigmoid Activation and Hyperbolic Tangent Activation
4 Least Absolute Shrinkage and Selection Operator
5 compare 3.1.5, Definition Cost Function
6 compare 3.1.8
7 This is the case because weight updates are proportional to this gradient, with a proportionality factor equal to the Learning Rate; compare 3.2.4

During Training, a proportionally stronger force towards zero is exerted on large weights than on small weights. Compared to L1 Regularization, a stronger shrinkage force is thus exercised on large weights, while weights close to zero are less affected. This favors diffuse weight vectors with small elements over sparse, peaky weight vectors, giving rise to robust models whose outputs vary smoothly with their inputs. Elastic Net Regularization (ENR) is a method combining L1 and L2 Regularization: both penalty terms described above are added to the Objective Function. ENR has been shown [191] to outperform L1 Regularization, producing a similar feature selection effect in the lowest Layer. Apart from this, it achieves a grouping effect where sets of correlated features tend to get selected together. L0 Regularization refers to directly limiting the number of Artificial Neurons (ANs). In this case, a penalty on the ℓ0-"norm" of the weight vector is imposed. The corresponding Regularization term is lim_{p→0} ‖θ‖_p^p, which counts the number of nonzero weights [192]. Whenever all incoming or outgoing weights of a Unit are equal to zero, the Unit is effectively removed from the network.
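The gradient contributions in (3.116) and (3.118) can be sketched as follows; grad_Cbar, a placeholder for the gradient of the unregularized CF, is an assumption.

    import numpy as np

    def regularized_grad(theta, grad_Cbar, alpha, kind="l2"):
        g = grad_Cbar(theta)
        if kind == "l1":
            return g + alpha * np.sign(theta)  # constant shrinkage force, (3.116)
        return g + 2 * alpha * theta           # force proportional to theta, (3.118)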

Weight Constraints
Apart from imposing penalties on the norm of the weight vector, i.e. penalizing the size of each weight in the network separately, weight constraints on the incoming weight vectors of each Unit can be considered. In this approach, the penalties incurred by individual weights are no longer independent. In Max-Norm Regularization (MNR) [193, 194], the squared length of the incoming weight vector θ of each Unit is upper bounded by some value c

‖θ‖_2^2 < c    (3.119)

which is a tunable HP. If the constraint is violated after a parameter update, the entire vector is scaled down by projecting θ onto the surface of a ball of radius √c. The advantage of MNR over WD is that the constraint has no effect unless it is violated. When weights become too large, the constraint causes them to compete with each other. Over time, irrelevant weights are pushed to zero, solely as a result of relevant weights growing.
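The projection step of MNR might be sketched as follows, under the assumption that the columns of W hold the incoming weight vectors of the Units:

    import numpy as np

    def max_norm_project(W, c):
        # rescale every column whose squared length exceeds c
        # onto the surface of a ball of radius sqrt(c)
        norms_sq = (W ** 2).sum(axis=0, keepdims=True)
        scale = np.minimum(1.0, np.sqrt(c / np.maximum(norms_sq, 1e-12)))
        return W * scale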

Weight Sharing
In Weight Sharing (WS), groups of weights are tied together and thereby forced to have identical values at all times. This method is one of the basic ideas underlying Convolutional Neural Networks (CNNs)^1. Often, weights are tied in order to build replicated feature detectors, e.g. receptive fields in the context of CNNs. In this manner, the number of model parameters can be decreased substantially, allowing for smaller TrSs [185, ch. 8.7.3].^2

Adding Noise
Injecting noise into an Artificial Neural Network (ANN) can improve Generalization Performance. Adding Noise to Inputs forces the network to learn robust representations of the data. For instance, in Denoising Autoencoders (DAEs)^3, some fraction of the input variables is randomly omitted for each training case. Adding zero-mean Gaussian noise to the inputs of an ANN without a Hidden Layer (HL), with a Linear Output Unit (OU), and trained on the Mean Squared Error (MSE) CF is equivalent to L2 Regularization with HP α equal to the noise variance [195].

1 compare 3.2.3, Directed Models, Feedforward Neural Networks, Convolutional Neural Networks
2 compare 4.5
3 compare 3.2.3, Directed Models, Feedforward Neural Networks, Autoencoders

Adding Noise to Weights in multilayer ANNs is not exactly equivalent to L2 WD. However, it has been shown [196, 197] to improve convergence and generalization in Recurrent Neural Networks (RNNs)^1.

Model Averaging
Average predictions from multiple models are typically superior to the predictions of an individual model. High-capacity models with low Bias and high Variance may overfit individually, but when their predictions are averaged, Variance decreases as the errors average out [121, ch. 7.2]. The more diverse the component models, i.e. the more their predictions disagree, the more effective Model Averaging (MA) is [198, 199]. However, strongly biased component models are detrimental to MA. Hence, disagreement between models is encouraged, while individual unbiasedness is also required. This is achieved by employing different architectures, different Activation Functions (AcFs), and so forth.

Bayesian Methods
The Learning Problem can be viewed in a Bayesian context [200, 201]. In this framework, the model parameters θ are considered random variables and are assigned a prior probability density p(θ) reflecting an a priori belief about their distribution.^2 Let S = ((x^1, y^1), ..., (x^m, y^m)) = (X, Y) denote the observed Training Data. A likelihood function p(S|θ) specifies how likely the observed TrS is under the model. Since X is assumed to be independent of θ, the likelihood function can be expressed as

p(S|θ) = p(X, Y|θ) = p(Y|X; θ) p(X|θ) = p(Y|X; θ) p(X)    (3.120)

Using Bayes' Theorem and (3.120), the following relationship holds

p(θ|S) = p(S|θ) p(θ) / p(S) = (1/Z) p(Y|X; θ) p(θ)    (3.121)

Hence, the posterior density p(θ|S) is proportional to the product of the likelihood p(S|θ) and the prior p(θ). It reflects the prior beliefs about the parameters, modified by the evidence of how consistent different weight configurations are with the observed data. The normalizing constant Z = p(Y|X) ensures that the posterior is a valid density, i.e. that it integrates to unity. In non-Bayesian Learning, when maximizing the Log Likelihood function, or equivalently, when minimizing the Negative Log Likelihood (NLL) CF^3, one seeks the weight vector

θ_ML = arg max_{θ∈Θ} p(S|θ) = arg min_{θ∈Θ} −log p(S|θ) = arg min_{θ∈Θ} −log p(Y|X; θ) = arg min_{θ∈Θ} C_nll(θ)    (3.122)

without consideration of the prior.^4 Maximum A Posteriori (MAP) Learning [187] makes predictions using the model with the parameter setting θ_MAP corresponding to the mode of the posterior, i.e. the most probable

1 compare 3.2.3, Directed Architectures, Recurrent Neural Networks
2 Often, it is assumed that parameters are normally distributed with mean zero to express the prior belief that weights in a sensible model tend to be small.
3 compare 3.1.5, Types of Cost Functions, Negative Log Likelihood Cost Function
4 The last equality uses the fact that (a) the targets are conditionally independent given the input, i.e. p(Y|X; θ) = ∏_{j=1}^m p(y^j|X; θ), (b) for all i ≠ j, y^j is conditionally independent of x^i given x^j, i.e. p(y^j|X; θ) = p(y^j|x^j; θ), and (c) for all c > 0, arg min_x f(x) = arg min_x (c f(x)).

weight vector given the Training Data.

θ_MAP = arg max_{θ∈Θ} p(θ|S) = arg min_{θ∈Θ} (−log p(Y|X; θ) − log p(θ)) = arg min_{θ∈Θ} (C_nll(θ) − (1/m) log p(θ))    (3.123)

This is similar to (3.122), except that the additional term −(1/m) log p(θ) is considered. This model can be learned using GD^1 in conjunction with BP. Once the MAP estimator^2 θ̂*_MAP is found, predictions can be obtained from ŷ = h(x; θ̂*_MAP). In conditional Density Estimation problems^3, it leads to the predictive density p̂(y|x; θ̂*_MAP). Incidentally, in a model predicting the mean of a Gaussian, and assuming a Gaussian prior p(θ), MAP Learning is equivalent to minimizing an MSE CF plus an L2 weight penalty. Hence, the term −(1/m) log p(θ) manifests itself as an L2 penalty. This is intuitive, since under the prior large weights are improbable. Full Bayesian Learning (FBL) takes advantage of the full posterior distribution instead of only considering the most likely parameter setting. In this paradigm, predictions are derived from the predictive density p̂(y|x, S), which is an average of p̂(y|x; θ) over all parameter configurations, weighted by their posterior probabilities

p̂(y|x, S) = ∫_Θ p̂(y|x; θ) p(θ|S) dθ    (3.124)

Hence, FBL is an instance of MA where the average is taken over all possible configurations of the same model, as opposed to averaging over different models. A popular method for approximately solving (3.124) is Markov Chain Monte Carlo (MCMC) sampling [202]. The main advantage of FBL is the applicability of high-capacity models when little Training Data is available. In this framework, Overfitting is essentially impossible [201]. In case of a small Data Set, the posterior may be multimodal or smeared out. Picking one set of parameters means ignoring all other plausible configurations, resulting in poor generalization.^4 However, when taking a posterior-weighted average over all configurations, the predictive density reflects this uncertainty. Predictions derived from it are then akin to the predictions of a low-capacity model.

Dropout
Dropout (DO) [193, 194] is a Regularization method that helped set records on important benchmark problems [43] and is now used ubiquitously. During Training, a fraction p of the Units is randomly omitted for each training case. The Activations of the respective Units are zeroed out by multiplying them elementwise with a binary masking vector m whose components are drawn from a Bernoulli Distribution. Consider a fully connected Layer composed of n Units, Transition Function g, and Activations a, receiving input from an m-Unit Layer

a = m ◦ g(W^T x)  with  m_i ∼ Ber(1 − p) iid    (3.125)

Frequently, AcFs are applied that satisfy g(0) = 0. In this case, DO can be viewed as an operation on the weight matrix. Hence, (3.125) can be transformed such that the m × n

1 compare 3.2.4, Gradient Descent with Backpropagation
2 compare 3.1.4, Empirical Risk Minimization
3 compare 3.1.6, Density Estimation
4 In contrast, if lots of Training Data is available, the posterior is sharply spiked around some value θ_peak, and (3.124) is approximately equal to p(Y|X; θ_peak) ≈ p(Y|X; θ_MAP).

weight matrix W is multiplied with a binary masking matrix M whose columns are randomly set to zero vectors.

a = g((M ◦ W)^T x)  with  M_{·i} = m_i 1_{m×1} and m_i ∼ Ber(1 − p) iid    (3.126)

The random omission of Units implies that for each training case a separate model is randomly sampled from the set of 2^H possible distinct models, where H is the number of HUs in the network. Although, on average, each of the sampled models is trained on only a single training case, Learning succeeds due to extensive WS between models.^1 To make predictions on unseen test data, the model containing all available Units is employed. To account for the larger number of Units in this full model, all weights are scaled by a factor of 1 − p. This can be considered an adequate and fast approximation to averaging all 2^H model outputs. DO works because it prevents individual Units from relying on complex co-adaptations that may not be robust to changes in the data. Instead of being able to rely on the same connected Units being present at all times, each Unit is forced to learn a transformation that is useful by itself, as well as complementary to the computations performed by the exponentially large set of models it contributes to. Furthermore, unlike L1 and L2 Regularization, which pull weights towards zero, the WS between models in DO provides a strong impetus for weights to be pulled towards their correct values. Lastly, DO can be viewed as an extreme instance of MA using an exponential number of models.
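A sketch of (3.125) together with the test-time weight scaling; the tanh Transition Function and the layer shapes are illustrative assumptions.

    import numpy as np

    def dropout_forward(W, x, p, g=np.tanh, rng=None, train=True):
        if train:
            rng = rng or np.random.default_rng()
            # omit a fraction p of the Units: keep each with probability 1 - p
            mask = (rng.random(W.shape[1]) < (1 - p)).astype(x.dtype)
            return mask * g(W.T @ x)
        # full model at test time: scale the weights by 1 - p
        return g((1 - p) * (W.T @ x))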

Unsupervised Pre-Training
Unsupervised Pre-Training (UPT), also called Generative Pre-Training, refers to using Unsupervised Learning (UL) techniques to aid SL^2. This method is believed to act as a Regularizer and is discussed in the "Deep Learning" section^3 of this thesis.

1 WS exists since a Unit possesses the same incoming and outgoing weights in all models containing it.
2 compare 3.1.2
3 compare 3.3.3, Special Initialization Schemes, Unsupervised Pre-Training

3.3 Deep Learning

3.3.1 Theoretical Justification

Types of Depth
For the purpose of this thesis, Deep Learning (DL) refers to Deep Neural Networks (DNNs). There are two types of depth. Depth in Representation in Feedforward Neural Networks (FNNs)^1 is related to the number of transformations between input and output. FNNs with more than two Hidden Layers (HLs) are called Deep Feedforward Neural Networks (DFNNs). Depth in Time, on the other hand, is a natural property of Recurrent Neural Networks (RNNs)^2, where inputs and outputs are separated by multiple transformations along the time dimension. Depth in Time is qualitatively equivalent to Depth in Representation when the RNN is unrolled in time and viewed as a DFNN.^3 If stacked into Layers, RNNs can additionally possess Depth in Representation, thus becoming Deep Recurrent Neural Networks (DRNNs).

Comparison to Conventional Methods
Depth in Representation is what sets DL apart from other Machine Learning (ML) approaches. Figure 3.17 depicts the relationship between DL and conventional methods.

Figure 3.17: Comparison of DL to conventional methods. Shaded boxes represent model components that are able to learn from data [1, Fig. 1.5].

1 compare 3.2.3, Directed Architectures, Feedforward Neural Networks
2 compare 3.2.3, Directed Architectures, Recurrent Neural Networks
3 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation Through Time

Artificial Neural Networks (ANNs) with HLs automate Feature Learning and use the Principle of Distributed Representations (DRs)^1, which allows for exponential gains in representational efficiency over Symbolic Representations (SRs). DL automatically learns multiple levels of representations corresponding to multiple levels of abstraction. This representational depth corresponds to the additional Principle of Deep Compositions (DCs), allowing for a further exponential gain in representational efficiency over shallow ANNs [1, chs. 6.4.1, 15.5].

Principle of Deep Compositions
The Principle of DCs reflects the assumption that the data was generated by a composition of hierarchical factors. This assumption is made in addition to the Smoothness Assumption^2 of most ML methods, and in addition to the implicit assumption of ANNs that the data is representable in a distributed manner. DL encourages the automatic discovery of a hierarchy of features, such that intermediate-level representations are shared across tasks. This is analogous to subroutines being shared in high-level programs. For instance, in the case of image data, the lowest level in the hierarchy represents edge detectors, the subsequent levels represent simple shapes, such as squares and circles, while the highest levels represent complex abstractions, such as cats or whether a person wears a jacket [2, ch. 1]. One major advantage of DCs is that they help deal with the Curse of Dimensionality [1, ch. 5.1], which refers to the fact that the number of variable configurations grows exponentially in the number of variables. The difficulty is how to generalize to all possible configurations when almost none of them is represented in the Training Set (TrS). Under the Smoothness Assumption of conventional ML methods, this type of Generalization corresponds to local template matching; e.g. in Classification^3, the k-Nearest Neighbor algorithm [203] predicts the class of the closest^4 training example. However, this only works well if there are roughly as many training examples as regions of the input space that need to be distinguished, and if those training examples cover the entire input space. In high dimensions, almost none of the possible configurations are represented in the TrS. DCs, on the other hand, allow for an exponential number of regions to be distinguished using only a linear number of training cases. It is thus no longer necessary for the TrS to cover the entire input space. Another advantage of DCs is that they make the task of disentangling the factors of variation in the data more efficient. According to the Manifold Hypothesis [1, ch. 5.11.3], d-dimensional data is often concentrated near lower-dimensional manifolds^5 embedded in input space. The local manifold coordinates represent the main factors of variation in the data. Interesting variations in output only occur along local coordinate directions on the manifold, or when moving from one manifold to another. There is evidence [204] that the Manifold Hypothesis is valid for a large number of common Data Sets.

1 compare 3.2.1, Principle of Distributed Representations
2 similar model inputs generate similar outputs
3 compare 3.1.6, Classification
4 by some distance measure in Feature Space
5 connected regions in input space

An ANN with HLs disentangles underlying factors of variation by representing them as high-level features that are linearly combined in the Output Layer (OL). Varying individual output weights corresponds to moving along the local coordinates of the data manifold. Although FNNs with one HL are Universal Function Approximators, an exponential number of Hidden Units (HUs) [205] may be required. DCs make the representation of arbitrary functions more efficient. It has been demonstrated that there exists a family of functions that can be approximated efficiently by an architecture of depth d, but that cannot be efficiently approximated by an architecture of depth less than d. Montufar [206] shows that the number of regions distinguishable by a fully connected DFNN of Rectified Linear Units (ReLUs)^1 with d inputs, l Layers, and n HUs per HL is O((n/d)^{d(l−1)} n^d). Hence, a deep, narrow architecture can distinguish exponentially more regions than a shallow network with the same number of parameters, given the same number of inputs. These advantages, i.e. alleviating the Curse of Dimensionality and increasing the efficiency of disentangling factors of variation, are the basis for the improved Generalization of DCs. Instead of only informing the model how to generalize in a single neighborhood, each training example is used to generalize non-locally [207, 208]. Furthermore, Generalization is helped by the fact that high-capacity models can be constructed with relatively few parameters, thus greatly reducing the risk of Overfitting^2. There is ample empirical evidence confirming that greater depth indeed corresponds to better Generalization [209, 210, 43, 211].

3.3.2 Challenges in Training Deep Neural Networks
The literature provides extensive evidence that training Deep Neural Networks (DNNs) is more difficult than training shallow Artificial Neural Networks (ANNs) [209, 210].

Vanishing and Exploding Gradient Problem
The Vanishing and Exploding Gradient (VEG) Problem [35] was originally identified as an obstacle to training Recurrent Neural Networks (RNNs)^3. The dynamics of an RNN are characterized by repeated application of the same recurrent weight matrix U to the previous time step's Activations. If no intermediate inputs are present, this is equivalent to multiplying the Initial State by U^t. If U has Eigenvalue Decomposition QΛQ^{−1}, such that Λ is a diagonal matrix whose elements are the eigenvalues of U, it follows that

U^t = (QΛQ^{−1})^t = (QΛQ^{−1})(QΛQ^{−1})(QΛQ^{−1})^{t−2} = QΛ(Q^{−1}Q)ΛQ^{−1}(QΛQ^{−1})^{t−2} = QΛ^2Q^{−1}(QΛQ^{−1})^{t−2} = ... = QΛ^tQ^{−1}    (3.127)

Evidently, any eigenvalue λ_i = [Λ]_{ii} smaller than 1 in absolute value vanishes, while any eigenvalue greater than 1 in absolute value explodes for large t, since lim_{t→∞} λ_i^t = 0 if |λ_i| < 1, and lim_{t→∞} |λ_i^t| = ∞ if |λ_i| > 1 [1, ch. 8.2.5].

1 compare 3.3.3, Special Types of Activation Functions, Rectified Linear Activation
2 compare 3.2.5, Generalization Error
3 compare 3.2.2, Directed Architectures, Recurrent Neural Networks

In Gradient Descent (GD) with Backpropagation Through Time (BPTT)^1, a vanishing gradient prevents an effective error-correction signal from traveling back through a large number of time steps. As a result, the RNN is unable to learn long-term dependencies, i.e. it cannot adapt its parameters to adjust its output in response to events occurring many time steps earlier. An exploding gradient, on the other hand, makes Learning unstable. In Deep Feedforward Neural Networks (DFNNs), a qualitatively similar problem occurs, except that multiple weight matrices associated with different Layers are involved. VEG in RNNs is addressed by using special types of Units, such as Long Short-Term Memory (LSTM) Units or Gated Recurrent Units (GRUs)^2, special Initialization Schemes, such as Orthogonal Initialization (OI)^3, as well as special Learning Algorithms (LAs), such as Hessian-Free Optimization (HFO)^4. In DFNNs, VEG is addressed by employing special Activation Functions (AcFs), such as the Rectified Linear AcF^5, special Units, such as Maxout Units (MOUs)^6, as well as special Initialization Schemes, such as Glorot Uniform Initialization (GUI), He Normal Initialization (HNI), and Unsupervised Pre-Training (UPT)^7.

Asynchronous Saturation Behavior
Asynchronous Saturation Behavior (ASB) is caused by poor weight initialization [135]. The following analysis assumes that the network input is properly standardized to have zero mean. Network weights initialized to values that are too small cause the standard deviation of Tanh Hidden Unit (HU) Activations to be small as well. This effect increases with depth, i.e. the higher the Layer, the smaller the standard deviation of its Activations.^8 When training using GD with Backpropagation (BP), weights of higher Layers receive smaller weight updates than weights in lower Layers. As a result, lower Layers in Tanh networks saturate early. The precise mechanisms behind this behavior are not well understood.^9 Conversely, if weights are initialized to values that are too large, Sigmoid HUs in high network Layers tend to saturate quickly, i.e. become either close to 0 or 1 after a few updates. This effect increases with depth.^10 Backpropagated gradients are thus multiplied with small numbers, leading to a vanishing error gradient in lower Layers.^11 In this scenario, weight updates are smaller for Layers closer to the input. As a result, higher Layers in Sigmoid networks saturate early, which considerably slows down Learning in lower Layers. Ways to address ASB include special Initialization Schemes, such as GUI and HNI, as well as UPT.

1 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation Through Time
2 compare 3.3.3, Special Types of Units
3 compare 3.3.3, Special Initialization Schemes
4 compare 3.3.3, Special Learning Algorithms
5 compare 3.3.3, Special Activation Functions
6 compare 3.3.3, Special Units
7 compare 3.3.3, Special Initialization Schemes
8 Recall that Activations of Tanh Units are symmetric around zero.
9 Naively, one could hypothesize it to be a phenomenon associated with BP. The error gradient with respect to the weights in Layer l is computed by multiplying the gradient with respect to the Net Input to Layer l by the Activations of Layer l − 1. Hence, one may be led to believe that the closer these Activations are to zero, the smaller the error gradient with respect to the weights, and the smaller the weight update. However, when properly accounting for the changes in the gradient with respect to the Net Input between Layers, this effect is offset.
10 Recall that Activations of Sigmoid Units (SUs) are symmetric around 0.5.
11 This can be seen from the BP equations. Computation of the error gradient with respect to the Net Input of Layer l involves multiplication by the gradient of the AcF with respect to the Net Input.

Internal Covariate Shift
One of the central challenges in training DNNs is dealing with the strong dependencies that exist between the parameters of different Layers. Training must simultaneously (a) adapt the weights of the lower Layers to provide adequate input to the final setting of the higher Layers, and (b) adapt the weights of the higher Layers to make use of the final settings of the lower Layers. The final Layer settings, however, are not known until after Training [53]. In Training with GD^1, the error gradient indicates how each parameter should change assuming the other parameters do not change. However, the parameters of all Layers are updated in parallel, which can lead to the destruction of existing function compositions [1, p. 317]. As a result of Deep Compositions (DCs)^2 in DNNs, small parameter updates in low Layers can cause large changes in the distribution of Net Inputs to higher Layers. This effect increases with network depth and is referred to as Internal Covariate Shift (ICS) [212]. Consequently, parameters in higher Layers need to adapt constantly to account for these changes and may get pushed into saturation, which slows down Learning. Batch Normalization (BaN)^3 is a method to address ICS.

Large Number of Local Minima
One of the reasons ANNs fell out of favor in the late 1980s was the belief that their associated Cost Functions (CFs)^4 exhibited a Large Number of Local Minima (LNLM), many of them corresponding to a much higher error than the global minimum [2, ch. 4.2]. It has since been demonstrated that, from an optimization perspective, local minima are not a major concern for DNNs. Recent theoretical results [213] suggest that, in high dimensions, most critical points are saddle points rather than local minima. In fact, for random functions in n dimensions, the probability that a critical point is a local minimum with high error relative to the global minimum is exponentially small in n. Stochastic Gradient Descent (SGD)^5 typically escapes from these saddle points, although this process can be slow. Hence, saddle points had been mistaken for local minima [214]. There is evidence [215] that for large DFNNs, local minima are essentially all equivalent in terms of Generalization Performance^6, and that recovering the global minimum is of no practical relevance, as it often corresponds to Overfitting. SGD tends to converge to high-quality local minima as measured by Training Error (TrE), while for the small-scale networks of earlier decades, the probability of convergence to poor local minima was not negligible. However, this leaves open the possibility that there exist local minima with better Generalization Error (GE) that SGD is unlikely to converge to. Therefore, the problem of LNLM still exists, albeit not in an optimization sense. Rather, it refers to the fact that SGD likely converges to one of many local minima that are close to the global minimum in terms of TrE, but far from it in terms of Test Error (TeE).

1 compare 3.2.4, Gradient Descent with Backpropagation
2 compare 3.3.1, Principle of Deep Compositions
3 compare 3.3.3, Batch Normalization
4 compare 3.1.5, Definition Cost Function
5 compare 3.2.4, Gradient Descent with Backpropagation, Extension with Stochastic Gradient, Stochastic Gradient Descent
6 compare 3.2.5, Generalization Error

3.3.3 Solutions to Challenges

Special Types of Activation Functions
Rectified Linear Activation
The Rectified Linear Activation Function (AcF) applies a hard nonlinearity to its inputs. It computes

g(z) = max(0, z)
∂g/∂z = I(z ≥ 0)    (3.128)

and gives rise to Rectified Linear Units (ReLUs) computing

f(x) = g(q(x)) = max(0, ∑_{i=1}^k w_i x_i + b) = max(0, w^T x + b)    (3.129)

where the notational conventions defined earlier^1 are used. Figure 3.18 depicts a plot of the Rectified Linear AcF.

Figure 3.18: Rectified Linear Activation Function

First proposed for real-valued Restricted Boltzmann Machines (RBMs)^2 [158], ReLUs are now among the most popular Units for Deep Feedforward Neural Networks (DFNNs). They have been shown [216] to outperform Tanh Units (TUs). In fact, they helped set records on several benchmarks across various domains [43, 217, 218]. ReLUs train faster since no exponentials must be computed and, more importantly, the derivative of the Rectified Linear AcF is simply zero or one, which prevents the gradient from vanishing in deep networks. Hence, the Rectified Linear AcF addresses the Vanishing and Exploding Gradient (VEG) Problem^3 in DFNNs. Generalizations of this AcF exist that fix various of its shortcomings, giving rise to Leaky ReLUs [219], Parametric Rectified Linear Units (PReLUs) [220], and S-Shaped Rectified Linear Units (SReLUs) [221].

1 compare 3.2.2, Types of Activation Functions
2 compare 3.2.3, Undirected Architectures, Restricted Boltzmann Machines
3 compare 3.3.2, Vanishing and Exploding Gradient Problem
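In code, (3.128) and (3.129) reduce to a few lines; the helper names are illustrative.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)        # g(z) = max(0, z)

    def relu_grad(z):
        return (z >= 0).astype(z.dtype)  # I(z >= 0); zero or one, no exponentials

    def relu_unit(w, x, b=0.0):
        return relu(w @ x + b)           # f(x) = max(0, w^T x + b)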

Special Types of Units
Maxout Unit
Maxout Units (MOUs) [222] generalize ReLUs and have similar benefits in terms of overcoming VEG in DFNNs. They output the maximum taken over k Net Inputs, where each Net Input is associated with its own weight matrix. Figure 3.19 depicts a MOU with three components.

Figure 3.19: Maxout Unit

The equations associated with a Layer of MOUs are

a = max_{j=1,...,k} z_j
z_j = W_j^T x + b_j    (3.130)

where the max is applied elementwise, such that a_i = max_{j=1,...,k} z_{ji}, and j indexes the MOU components, not the Layer, whose index has been dropped for clarity. A single MOU with k components can approximate any k-component, piecewise linear function and, asymptotically, any convex function of the input x. For instance, a 2-component MOU can emulate a ReLU, a Leaky ReLU, a PReLU, and any other Unit with a 2-component piecewise linear AcF. It is sometimes claimed that MOUs can therefore be viewed as learning their AcF.^1 MOUs work well in conjunction with Dropout (DO)^2, since the model averaging induced by DO becomes exact when using MOUs.^3

1 This is misleading given the definition of AcF used in this thesis. The AcF of a MOU is fixed and depends on k inputs, i.e. g(z_1, ..., z_k), just as the Rectified Linear AcF is fixed but depends on only one input g(z). What can be learned is the Activation viewed as a function of the input x, i.e. g(x) = g(z_1(x), ..., z_k(x)). However, this is also true for any other AcF, so MOUs are not special in that sense.
2 compare 3.2.5, Dropout
3 With TUs, the averaging is only approximate.
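A Layer of MOUs per (3.130) can be sketched as follows; Ws and bs, lists holding one weight matrix and one bias vector per component, are illustrative names.

    import numpy as np

    def maxout_layer(Ws, bs, x):
        # stack the k Net Inputs and take the elementwise maximum
        Z = np.stack([W.T @ x + b for W, b in zip(Ws, bs)])  # shape (k, n)
        return Z.max(axis=0)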

Long Short-Term Memory Unit
Long Short-Term Memory (LSTM) Units [39] address VEG in Recurrent Neural Networks (RNNs). LSTM-RNNs are able to learn dependencies over time lags of thousands of discrete time steps. This is achieved by introducing an explicit long-term Memory that can be written to and read from. In particular, a Memory Cell is controlled by three Logistic^1 Gating Units that represent continuous approximations to binary Logic Gates. These Gating Cells interact multiplicatively with the Memory and, when in a particular configuration, allow the error gradient to flow backwards through the Memory Cell unattenuated.

Figure 3.20 displays an LSTM Unit. A candidate value c_t is computed from the current input x_t and the previous Activations a_{t−1}.^2 The state of the Forget Gate f_t determines whether the previous Memory State m_{t−1} is retained or erased. If the Forget Gate is open, i.e. close to 1 in value, the Memory State persists; if it is closed, i.e. close to 0 in value, it is wiped.^3 The Input Gate i_t controls write access to the Memory Cell. If it is open, the candidate value is written into Memory, i.e. added to what has persisted from the previous Memory State, producing the current Memory State m_t. Lastly, the Output Gate o_t controls read access. If open, the Memory content flows through a tanh nonlinearity and becomes the Unit's current Activation a_t. If the Output Gate is closed, the Unit has 0 Activation.

Figure 3.20: Long Short-Term Memory Unit

1 compare 3.2.2, Types of Activation Functions, Sigmoid Activation
2 Assuming there are multiple LSTM Units in this Layer, the Unit receives input from all of them, hence a_{t−1} is a vector.
3 The Forget Gate should be more aptly named Keep Gate.

The equations describing the dynamics of the (first) Hidden Layer (HL) in an LSTM-RNN are

c_t = tanh(W_c^T x_t + U_c^T a_{t−1} + b_c)
f_t = σ(W_f^T x_t + U_f^T a_{t−1} + b_f)
i_t = σ(W_i^T x_t + U_i^T a_{t−1} + b_i)    (3.131)
o_t = σ(W_o^T x_t + U_o^T a_{t−1} + b_o)
m_t = f_t ◦ m_{t−1} + i_t ◦ c_t
a_t = o_t ◦ tanh(m_t)

where the above-defined quantities are written as vectors^1, σ is the Logistic Function, all functions are applied elementwise, and ◦ denotes elementwise multiplication. These equations replace the corresponding equation in (3.82) of conventional RNNs, where the Layer index has been dropped for clarity. Schmidhuber [223] points out that there is no evidence that the second tanh squashing function in (3.131) is needed. In fact, if it is omitted, an LSTM Unit reduces to a regular recurrent Hidden Unit (HU), provided that the Input and Output Gates are open and the Forget Gate is closed. An LSTM-RNN can then exactly emulate a regular RNN. Backpropagation Through Time (BPTT)^2 requires error gradients with respect to the Layer's Net Inputs z_{c,t} = W_c^T x_t + U_c^T a_{t−1} + b_c, z_{f,t}, z_{i,t}, z_{o,t}, and z_{m,t} = m_{t−1}. These quantities are defined recursively in terms of the error gradients with respect to the Layer's Activations ∇_a^t and Memory States ∇_m^t.

∇_z^{c,t} = i_t ◦ tanh'(z_{c,t}) ◦ (v_t ◦ ∇_a^t + ∇_m^t)
∇_z^{f,t} = m_{t−1} ◦ σ'(z_{f,t}) ◦ (v_t ◦ ∇_a^t + ∇_m^t)
∇_z^{i,t} = c_t ◦ σ'(z_{i,t}) ◦ (v_t ◦ ∇_a^t + ∇_m^t)    (3.132)
∇_z^{o,t} = σ'(z_{o,t}) ◦ tanh(m_t) ◦ ∇_a^t
∇_z^{m,t} = f_t ◦ (v_t ◦ ∇_a^t + ∇_m^t)

with v_t = o_t ◦ tanh'(m_t), σ'(z) = (1 − σ(z)) σ(z), and tanh'(z) = 1 − tanh^2(z), all gradients being vectors. The corresponding matrix equations for batch processing are analogous to the equations in Algorithm (5). As per (3.131), each Memory Cell has a recurrent self-connection via an identity transformation that allows information to circulate in Memory indefinitely. Likewise, error signals can be backpropagated through potentially infinitely many time steps without change in magnitude. It is this mechanism that allows LSTM-RNNs to overcome VEG. LSTM networks have been successfully applied to a large variety of sequence processing tasks, such as Connected Handwriting Recognition (CHR) [224], where they have won several competitions [225], and Machine Translation (MT) [226, 146]. Furthermore, they are now widely used in Automatic Speech Recognition (ASR) applications [46, 227]. There are various extensions to the basic LSTM framework. For instance, Peephole-LSTM Units [223] introduce connections from the Memory Cell to the Gating Cells. This addresses the limitation of conventional LSTM Units that, when the Output Gate is closed, none of the Gates can access the Memory State they control. Peephole connections help the network make better use of information conveyed in the distance between events.

1 since all LSTM Units in the Layer are computed in parallel
2 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation Through Time
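One forward step of the Layer dynamics in (3.131) might be sketched as follows; the params dictionary holding one (W, U, b) triple per equation is an illustrative convention.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(params, x, a_prev, m_prev):
        W_c, U_c, b_c = params["c"]
        W_f, U_f, b_f = params["f"]
        W_i, U_i, b_i = params["i"]
        W_o, U_o, b_o = params["o"]
        c = np.tanh(W_c.T @ x + U_c.T @ a_prev + b_c)  # candidate value
        f = sigmoid(W_f.T @ x + U_f.T @ a_prev + b_f)  # Forget Gate
        i = sigmoid(W_i.T @ x + U_i.T @ a_prev + b_i)  # Input Gate
        o = sigmoid(W_o.T @ x + U_o.T @ a_prev + b_o)  # Output Gate
        m = f * m_prev + i * c                         # new Memory State
        a = o * np.tanh(m)                             # new Activation
        return a, m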

Gated Recurrent Unit

Gated Recurrent Units (GRUs) [228] are another type of Gated Unit used to combat VEG, similar to LSTM. GRUs essentially merge the Hidden and Memory State, and combine the Forget and Input Gate, such that new memories can only be formed when old memories are forgotten. These simplifications reduce the number of parameters compared to LSTM Units. The standard GRU update is sketched below.
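In the notation of (3.131), with an update gate $u_t$ and a reset gate $r_t$ (symbols introduced here for illustration only), the GRU equations of [228] read

$$u_t = \sigma(W_u^T x_t + U_u^T a_{t-1} + b_u)$$
$$r_t = \sigma(W_r^T x_t + U_r^T a_{t-1} + b_r)$$
$$c_t = \tanh(W_c^T x_t + U_c^T (r_t \circ a_{t-1}) + b_c)$$
$$a_t = (1 - u_t) \circ a_{t-1} + u_t \circ c_t$$

The last line makes the merging explicit: the retention coefficient $(1 - u_t)$ and the write coefficient $u_t$ sum to one, so new content can only be written to the degree that old content is forgotten.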

Special Initialization Schemes

Glorot Uniform Initialization

Glorot Uniform Initialization (GUI) [135], also called Xavier Initialization, assumes TUs. It initializes the weights of Layer $l$ of a Feedforward Neural Network (FNN) to random values drawn from a zero-mean Uniform Distribution with standard deviation $\sqrt{2/(n_{l-1}+n_l)}$, i.e. uniformly from the interval $\left[-\sqrt{6/(n_{l-1}+n_l)},\ \sqrt{6/(n_{l-1}+n_l)}\right]$, where $n_{l-1}$ and $n_l$ are the number of Units in Layers $l-1$ and $l$.1 When using TUs, this Initialization Scheme ensures that the variance of Activations, as well as the variance of backpropagated error gradients with respect to Net Inputs, is approximately constant between Layers. Furthermore, as Training progresses, the variance of the error gradients with respect to the weights remains similar between different Layers. These effects help remedy Asynchronous Saturation Behavior (ASB)2, and indirectly VEG, in DFNNs, which occur with naive Initialization Schemes.

He Normal Initialization

He Normal Initialization (HNI) [220] initializes the weights of Layer $l$ of an FNN to random values drawn from a zero-mean Normal Distribution with standard deviation $\sqrt{2/n_{l-1}}$,3 where $n_{l-1}$ is the number of Units in Layer $l-1$. The derivation of this particular standard deviation is based on arguments similar to those used for deriving GUI, albeit under less restrictive assumptions. This Initialization Scheme outperforms GUI for very deep networks of ReLUs and PReLUs. In fact, the assumptions made by GUI are not justified in ReLU networks. Like GUI, HNI helps remedy ASB, and indirectly VEG, in DFNNs.

Orthogonal Initialization

Orthogonal Initialization (OI) for RNNs [229] initializes the $n \times n$ recurrent weight matrix $U$ to a random orthogonal matrix4 $A$, obtained from the Singular Value Decomposition $N = A \Sigma B^T$ of an $n \times n$ matrix $N$ whose elements are drawn from a Standard Normal Distribution. This Initialization ensures that all eigenvalues of $U$ have absolute value equal to one. Informally, if the largest eigenvalue is greater than one in absolute terms, gradients explode; if it is smaller than one, gradients vanish during Backpropagation (BP). This Initialization Scheme is therefore useful for Training RNNs when long-term dependencies exist in the data. OI thus addresses VEG in RNNs.
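As a minimal NumPy sketch, the three schemes discussed in this subsection can be written as follows (function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    """GUI: uniform on [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out):
    """HNI: zero-mean Gaussian with standard deviation sqrt(2/n_in), suited to ReLUs."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def orthogonal(n):
    """OI: orthogonal factor of the SVD of a standard-normal matrix."""
    N = rng.normal(size=(n, n))
    A, _, _ = np.linalg.svd(N)   # A is orthogonal; all its eigenvalues have modulus 1
    return A
```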

Unsupervised Pre-Training

Unsupervised Pre-Training (UPT) [41], also referred to as Generative Pre-Training, is a weight Initialization method for DFNNs based on Unsupervised Learning (UL)5. Weights found by UPT can serve as a starting point for further supervised fine-tuning, facilitating convergence to local minima with better generalization. UPT thus addresses the problem of the existence of a Large Number of Local Minima (LNLM) in DFNNs.

1 The original paper reports the interval $\left[-\sqrt{6/(n_l+n_{l+1})},\ \sqrt{6/(n_l+n_{l+1})}\right]$, which is due to a different Layer numbering convention. 2 compare 3.3.2, Asynchronous Saturation Behavior 3 The original paper reports $\sqrt{2/n_l}$, which is due to a different Layer numbering convention. 4 A matrix $A$ is orthogonal if $A^T A = A A^T = I$. Moreover, orthogonal matrices preserve the norm of vectors: $\|Av\|_2 = \|v\|_2$. 5 compare 3.1.2, Unsupervised Learning

Specifically, a stack of RBMs1 is trained in a greedy, layer-wise fashion, such that the HU Activations of each RBM are used as input to the Visible Units (VUs) of the next Layer's RBM. Each Layer is trained separately using Gradient Descent (GD) with Contrastive Divergence (CD)2, starting with an RBM whose VUs connect to the input data [41]. Subsequently, all trained RBMs are stacked, and either converted into a generative3 Deep Belief Net (DBN)4 or into a discriminative DFNN. The DFNN can then be fine-tuned in a supervised manner using GD with BP. Alternatively, stacks of Autoencoder (AE)5 variants can be used instead of RBMs [230, 209]; a procedural sketch is given at the end of this subsection. Figure 3.21 provides a visualization of empirical evidence that DFNNs initialized by UPT are superior to models initialized randomly.

Figure 3.21: Performance of models initialized randomly (left) and models initialized with UPT (right) as a function of model depth [53, Fig 1]. Evidently, deep pre-trained networks do not suffer from Overfitting. Furthermore, pre-trained networks converge to lower TeE, irrespective of depth.

There is evidence [53] that the benefits of UPT stem from it acting as a Regularizer6. It has been hypothesized that, in case of DFNNs, whose error surfaces are highly non-convex, setting a particular starting point imposes constraints on the parameter configurations reachable during optimization. UPT appears to initialize weights in high-quality basins of attraction whose associated local minima correspond to solutions with low Generalization Error (GE). Evidence that UPT helps optimization itself, i.e. helps find solutions with lower Training Error (TrE) [209], has been disputed on the grounds of methodological flaws [53]. UPT is of major historical significance, as it demonstrated how very deep models could be trained efficiently, which had previously been deemed infeasible, thus igniting the Deep Learning (DL) revolution.7
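The following is a minimal NumPy sketch of the greedy, layer-wise procedure for binary Units, using CD-1, the simplest form of CD; all names are illustrative, and practical details (Mini-Batching, momentum, weight decay) are omitted. Inputs are assumed to lie in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, epochs=10, lr=0.05):
    """Train one RBM with CD-1 on visible data V (m x n_visible)."""
    m, n_vis = V.shape
    W = rng.normal(0, 0.01, size=(n_vis, n_hidden))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h = sigmoid(V @ W + b_h)                         # positive phase
        h = (rng.random(p_h.shape) < p_h).astype(float)    # sample hidden states
        p_v = sigmoid(h @ W.T + b_v)                       # one Gibbs step down
        p_h2 = sigmoid(p_v @ W + b_h)                      # and back up
        W += lr * (V.T @ p_h - p_v.T @ p_h2) / m           # CD-1 updates
        b_v += lr * (V - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h2).mean(axis=0)
    return W, b_h

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise UPT: each RBM's HU activations feed the next RBM's VUs."""
    weights, inp = [], X
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(inp, n_hidden)
        weights.append((W, b_h))          # later used to initialize a DFNN
        inp = sigmoid(inp @ W + b_h)      # deterministic up-pass to the next layer
    return weights
```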

Special Learning Algorithms

1 compare 3.2.3, Undirected Architectures, Restricted Boltzmann Machine 2 compare 3.2.4, Gradient Descent with Contrastive Divergence 3 compare 3.1.7, Discriminative vs. Generative 4 compare 3.2.3, Mixed Architectures, Deep Belief Network 5 compare 3.2.3, Directed Models, Feedforward Neural Networks, Autoencoder 6 compare 3.2.5 7 compare 2.2

Hessian-Free Optimization

Hessian-Free Optimization (HFO) [231] is a Learning Algorithm (LA) that entirely avoids construction and inversion of the Hessian, while still using curvature information to derive parameter updates. It achieves this by constructing local quadratic approximations of the error surface and performing Conjugate GD [161, ch. 2.3.2] on the approximations. HFO has been shown to be very effective for training RNNs if significant long-term dependencies in the data exist [232], thus addressing the challenge of VEG in RNNs.

Batch Normalization

Batch Normalization (BaN) [212; 1, p. 317] is an adaptive reparametrization method that addresses the problem of Internal Covariate Shift (ICS)1 and indirectly ASB and VEG. It is applicable whenever Training is done in batches, as in Mini-Batch Stochastic Gradient Descent (MBSGD)2. BaN significantly reduces the problem of coordinating updates across Layers by fixing aspects of the Net Input distribution of each Layer. Consequently, weight updates no longer have to compensate for changes in this distribution. For a particular HL, the following normalization update is performed

$$\bar{Z} = \frac{Z - \mu}{\sigma} \tag{3.133}$$

where $Z$ and $\bar{Z}$ are the $m \times n$ matrices of unnormalized and normalized Net Inputs (excluding biases), and $\mu$ and $\sigma$ are the mean and standard deviation matrices for the current Mini-Batch (MB), obtained by stacking $m$ identical mean and standard deviation row-vectors into a matrix. Furthermore, $m$ is the number of training examples in the MB, $n$ is the number of HUs, and all operations in (3.133) are performed elementwise. The result of this transformation is that every Net Input of each HU in the Layer has zero mean and unit standard deviation with respect to the current batch. However, depending on the AcF used, this transformation may decrease network capacity. For instance, in case of Sigmoid Units (SUs)3, scaling the standard deviation of Net Inputs to 1 restricts the Units to their linear regime. Therefore, a second transformation is applied

$$\tilde{Z} = \bar{Z} \circ \gamma + \beta \tag{3.134}$$

where $\gamma$ and $\beta$ are learnable scale and shift parameters. This allows the LA to reverse the normalization, if necessary. In fact, for $\gamma = \sigma$ and $\beta = \mu$, the normalization step is reversed exactly. At first glance, applying a reverse transformation seems counterintuitive. However, the result of applying both transformations is that (a) the network retains its full expressive power and (b) the learning dynamics are changed favorably. While the mean of $Z$ depends on complex interactions of parameters in lower Layers through a deep composition of functions, the mean of $\tilde{Z}$ only depends on a single parameter $\beta$. This reparametrization is more easily learnable by GD. In summary, BaN replaces the Net Input (including biases) of each HL with the transformed Net Input $\tilde{Z}$. It thereby retains the capacity of the model, since it only affects the mean and standard deviation of each Unit, while allowing the relationships between Units and the nonlinear statistics of an individual Unit to change. When the trained model is used on test data, $\mu$ and $\sigma$ are replaced by running averages collected during Training.

1 compare 3.3.2, Internal Covariate Shift 2 compare 3.2.4, Gradient Descent with Backpropagation, Extension with Stochastic Gradient, Mini Batch Stochastic Gradient Descent 3 compare 3.2.2, Types of Activation Functions, Sigmoid Activation

BaN allows higher Learning Rates (LRs)1, acts as a Regularizer2, and achieves competitive results in Object Recognition (OR) with fewer training samples [212]. There is evidence that BaN is also beneficial in training RNNs [233].
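A minimal NumPy sketch of the two transformations, for one Layer and one MB; the small constant eps is a numerical-stability detail added here, not part of (3.133):

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-5):
    """BaN forward pass for one layer, per (3.133)/(3.134).
    Z: m x n matrix of net inputs for the current mini-batch."""
    mu = Z.mean(axis=0)                    # per-unit batch mean (row vector)
    sigma = Z.std(axis=0)                  # per-unit batch standard deviation
    Z_bar = (Z - mu) / (sigma + eps)       # (3.133); eps guards against division by zero
    Z_tilde = Z_bar * gamma + beta         # (3.134); learnable scale and shift
    return Z_tilde, mu, sigma              # mu/sigma feed the running averages for test time
```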

3.3.4 New Developments

Attention Mechanisms

In Artificial Neural Networks (ANNs), computation scales with input size, since all input is considered at once. Humans, on the other hand, build up a representation over time by successively focusing on parts of the input. For instance, when looking at an image, humans perform a series of glimpses determining where to look next, using an Attentional Process. Recurrent Neural Networks (RNNs)3 can model a similar Attentional Process capable of extracting information by adaptively selecting parts of the data. In that manner, the amount of computation is controlled independently of input size. There are variations on how this is implemented, depending on the task at hand. In [234], instead of learning a Hidden State $h_t$ directly from the input, a recurrent Attention model $f_\alpha$ produces an Attention Vector

$$\alpha_t = f_\alpha(W_\alpha^T x_t + U_\alpha^T \alpha_{t-1} + b_\alpha) \tag{3.135}$$
which is normalized, such that it sums to one. It is then used to generate a glimpse

$$g_t = \alpha_t^T x_t \tag{3.136}$$
a compressed, Attention-weighted representation of the input. This glimpse, rather than the full input, is subsequently used to inform the Hidden State.

$$h_t = f_h(w_g g_t + U_h^T \alpha_{t-1} + b_h) \tag{3.137}$$
where $f_h$ is the Hidden Layer (HL) Activation Function (AcF). This recurrent model is used to sequentially process image data. Larger images can thus be processed without the need to scale the model. Apart from visual recognition tasks, Attention has been successfully used for Automatic Speech Recognition (ASR) [235], Machine Translation (MT) [226], Handwriting Synthesis (HS) [197], and Textual Entailment Recognition (TER) [236].
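A minimal NumPy sketch of one such step follows; the softmax normalization and the tanh Hidden AcF are assumptions made for concreteness, and all parameter names are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_step(x_t, alpha_prev, p):
    """One recurrent attention step in the spirit of (3.135)-(3.137)."""
    # Attention weights over input components; normalized to sum to one
    alpha_t = softmax(p["W_a"].T @ x_t + p["U_a"].T @ alpha_prev + p["b_a"])
    g_t = alpha_t @ x_t                 # glimpse: attention-weighted input summary
    h_t = np.tanh(p["w_g"] * g_t + p["U_h"].T @ alpha_prev + p["b_h"])  # hidden state
    return alpha_t, h_t
```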

External Addressable Memory

One way to extend the capabilities of an ANN is to provide it with interfaces allowing it to interact with external programs, such as Databases (DBs), Search Engines, Theorem Verifiers, or Memory. The Neural Turing Machine (NTM) [237] is an RNN connected to an external addressable Memory that can be written to and read from. It can be viewed as an end-to-end differentiable computer. Specifically, the NTM consists of a Controller, implemented as an RNN, that receives inputs, emits outputs, and uses an Attentional Process to control a Read and a Write Head that interact with an external Memory. The Controller is the equivalent of a differentiable CPU whose operations are learned via Gradient Descent (GD)4, while the Memory, a set of addresses storing vectors of information, is analogous to RAM.

1 compare 3.2.4 2 compare 3.2.5 3 compare 3.2.3, Directed Architectures, Recurrent Neural Networks 4 compare 3.2.4, Gradient Descent with Backpropagation, Basic Framework

Hence, an NTM is capable of treating Memory contents as variables and of using them in algorithms learned by the Controller. This effectively separates computation and Memory, as in a modern computer. In contrast, the computational and Memory resources of conventional ANNs are blended together in the network weights and Unit Activations. As the Memory requirements of a task increase, conventional networks can neither dynamically allocate new memory, nor easily learn algorithms that act independently of particular variable values. In NTMs, variable values can be written to Memory, freeing the Controller to focus on learning global regularities. In contrast to LSTM-RNNs, the behavior of the Controller is independent of Memory size, provided the Memory is not completely filled. The NTM has been shown to infer simple algorithms, such as copying and sorting. Using its explicit Memory storage and retrieval capabilities, it outperforms [237] LSTM-RNNs, which need to store Memory via weight adjustment. Recently, the Differentiable Neural Computer (DNC) [238] has been developed, which generalizes the NTM by adding a mechanism preventing allocated blocks of Memory from interfering with each other. This is accomplished by freeing Memory when necessary and by tracking the order of Memory writes with pointers.
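The Attentional addressing of Memory can be illustrated by the content-based read weighting used in the NTM [237]. The sketch below is a deliberate simplification: it omits the interpolation, shifting and sharpening stages of the full addressing mechanism.

```python
import numpy as np

def content_read(M, key, beta):
    """Content-based read from memory M (N addresses x W-dim vectors):
    softmax over scaled cosine similarities between a key emitted by the
    Controller and each memory row; beta sharpens the focus."""
    sims = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)
    w /= w.sum()          # attention weights over memory addresses
    return M.T @ w        # read vector: weighted blend of memory rows
```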

Generative Adversarial Networks

Generative Adversarial Networks (GANs) [94] are models consisting of two competing ANNs, trained with a form of Adversarial Learning. GANs are based on ideas from Game Theory [239]. Specifically, learning GANs is equivalent to finding Nash Equilibria [240] of non-cooperative two-player games. A GAN consists of a generative network, the Generator G, and a discriminative network, the Discriminator D, competing with each other. G learns to produce samples from a distribution $P_g$ matching the characteristics of the data distribution1 $P$, while D learns to predict the probability $p_d$ that a given sample comes from $P$ rather than $P_g$. Simply put, G learns to produce convincing fake data, while D learns to recognize real data. Of course, the overall goal of the method is to learn an implicit model of the data $p_g = \hat{p}(x; \theta_g)$, i.e. to solve a Density Estimation (DE) Problem2. D is merely constructed to help achieve this goal. Both networks are Multilayer Perceptrons (MLPs)3, which renders the whole system trainable by GD with Backpropagation (BP). Hence, there is no need for Markov Chain Monte Carlo (MCMC) sampling, as in training Restricted Boltzmann Machines (RBMs)4. Samples from $P_g$ are generated by passing input noise $Z$, drawn from a prior $P_z$, through the generator MLP. G implicitly defines $P_g$ through its learned mapping $h_g(z; \theta_g)$. D, on the other hand, learns a scalar function $p_d = h_d(x; \theta_d)$. D is trained to maximize the predicted probability that a sample originated from the data distribution, if this is actually the case, i.e. to minimize $-\log h_d(X; \theta_d)$. G is simultaneously trained to make D assign high probability to its outputs coming from the data distribution, i.e. to minimize $\log(1 - h_d(h_g(Z; \theta_g)))$. In principle, this corresponds to a

1 This is a slight abuse of terminology, since the true data distribution P (x) is unknown. The TrS contains an unbiased sample from it, inducing an empirical distribution, which is an estimator for P (x). In what follows this empirical distribution is referred to as ”data distribution” for convenience; compare 3.1.1, Terminology 2 compare 3.1.6, Density Estimation 3 compare 3.2.2, Directed Architectures, Feedforward Neural Networks, Multilayer Perceptron 4 compare 3.2.3, Undirected Architectures, Restricted Boltzmann Machine

Cost Function (CF)1 of the form

$$C(\theta_g, \theta_d) = \frac{1}{m} \sum_{j=1}^{m} \Big( \log\big(1 - h_d(h_g(z^j; \theta_g); \theta_d)\big) - \log h_d(x^j; \theta_d) \Big) \tag{3.138}$$
where the $x^j$ are drawn from the Training Set (TrS), i.e. are realizations from the data distribution $P$, and the $z^j$ are realizations from the prior $P_z$. It can be shown [94] that there is a unique equilibrium where $P_g = P$ and $p_d = 1/2$, at which point Learning stops. In practice, this CF is not minimized jointly in $\theta_g$ and $\theta_d$, but rather in an alternating fashion, by performing $k$ steps of optimizing $h_d$, followed by one step of optimizing $h_g$. Furthermore, for numerical reasons, G is trained to maximize $\log(h_d(h_g(Z; \theta_g)))$, rather than to minimize $\log(1 - h_d(h_g(Z; \theta_g)))$. This change does not affect the optimal solution. Recently, GANs have achieved astonishing results in Image Synthesis tasks [241, 242].

1 compare 3.1.5, Definition Cost Function
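The alternating scheme ($k$ Discriminator steps, then one Generator step with the non-saturating objective) can be sketched in PyTorch as follows. G and D are assumed to be given nn.Modules, with D ending in a sigmoid; the data loader is assumed to yield batches of real samples, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

def train_gan(G, D, data_loader, k=1, epochs=10, z_dim=100):
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    bce = nn.BCELoss()                              # D outputs a probability
    for _ in range(epochs):
        for x in data_loader:
            m = x.size(0)
            for _ in range(k):                      # k discriminator steps
                z = torch.randn(m, z_dim)
                d_loss = bce(D(x), torch.ones(m, 1)) + \
                         bce(D(G(z).detach()), torch.zeros(m, 1))
                opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            z = torch.randn(m, z_dim)               # one generator step:
            g_loss = bce(D(G(z)), torch.ones(m, 1)) # maximize log D(G(z))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```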

3.4 Big Data

Definition

Big Data (BD) is a recent, loosely defined term that refers to Data Sets whose large size and complexity render their management and analysis using traditional tools, such as Relational Database Management Systems (RDBMSs) and standard Statistics Software, challenging [243]. The challenges posed by BD have been categorized into the "3Vs" [244]:

• Volume refers to the large quantity of data and poses a challenge in terms of data storage.

• Velocity refers to the high rate at which data is produced and must be processed, often in real time. This poses a challenge in terms of computational resources.

• Variety refers to the many different types of data that have to be dealt with, e.g. video, audio, text, etc. This poses a challenge in terms of data Pre-Processing and Database (DB) management.

Based on the "3Vs", the following definition of BD has been proposed [245]: Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value. There is no generally accepted data volume cutoff to classify what counts as BD. Presently, anything from several gigabytes (GB) to hundreds of petabytes (PB) could fall under this definition [246], while these bounds are constantly increasing. In fact, whether a particular application is labeled BD depends not only on the amount of data but also on the user's capability.

Types of Data

Structured Data (SD) refers to information having some predefined organization, e.g. a data model1. It often resides in RDBMSs or spreadsheets, making search and querying straightforward. Unstructured Data (UD) refers to information with no predefined organization, i.e. no data model, such as raw text, audio, or video [247]. Sometimes, the term is taken to apply to all data not stored in an RDBMS. SD can be regarded as UD if its structure is irrelevant to a particular analysis task, e.g. in case of HTML documents. Often, BD applications deal with UD. This poses a challenge with respect to data storage and processing when using traditional systems, since UD cannot be easily stored in an RDBMS. It is estimated that UD accounts for approximately 80 percent of all data and is growing at a faster rate than SD [248]. Semi-Structured Data (SSD) is data without the formal structure associated with RDBMSs that nevertheless contains some degree of structure to separate data elements, such as markup [249]. Due to the lack of detailed organization, SSD is often considered UD. XML files and Email can be viewed as examples of SSD.

1 A data model is a particular organization of the elements of the data, representing the relationships between them, as well as their relationships to other data.

Historical Trends and Underlying Drivers

Data Set sizes have increased exponentially over the last decades. While in the early 1900s Data Sets comprising hundreds to thousands of manually compiled measurements were common, the period between the 1950s and the 1980s saw small synthetic Data Sets of tens of thousands of samples. Data Sets in the 1990s routinely consisted of up to hundreds of thousands of training examples, whereas today, Data Sets composed of tens of millions of training examples are common [1, Introduction]. The reason for the growth of Data Sets is twofold. (1) There has been an exponential increase in the number of entities generating data, as well as in the generated data volume itself, including

• data generated by an increasing number of sensors, e.g. in mobile devices, weather sensors, traffic sensors, etc. It has been predicted that the Internet of Things (IoT)1 will lead to the existence of a trillion sensors worldwide within a decade [250].

• data passively generated by an increasing amount of human activity, e.g. financial transactions in High-Frequency Trading and credit card payments [251, p. 4].

• data actively generated by an increasing number of users of digital technology [252], including posts to Social Media, Blogs and Wikis.

IBM estimates that, as of 2012, 2.5 exabytes (EB) of data were generated daily, while 90 percent of the data in existence that year had been created in the two previous years [253, 254]. (2) There has been an exponential decrease in the price of data storage [255], with the average cost for one GB of storage halving every 16 months, falling from $437,500 per GB in 1980 to roughly $0.02 per GB in 2016 [256]. This makes it possible to store the massive amounts of data generated at almost no cost. In essence, the above trends can be viewed as direct results of the, as yet unfolding, Digital Revolution and the associated interconnection of the world via the internet. These phenomena are, in turn, driven by Moore's Law [257] and broader technological progress in general.

Examples of Big Data

Large Data Sets can be found in a variety of domains. Examples include:

• Enterprise Customer Data, e.g. in 2006, Netflix published a Data Set of several GB, representing part of its customer database, in the context of a competition to better predict movie ratings [258].

• Social Media Data, e.g. as of 2010, Facebook stores 140 billion photos, which requires 14 PB of storage [255].

• Healthcare Analytics Data, e.g. Optum Labs collected more than 30 million Electronic Health Records in order to create a database for predictive analytics tools aimed at improving patient care [259].

• Financial Transaction Data, e.g. as of 2013, the New York Stock Exchange records and stores about 5 terabytes (TB) of data per day [260].

• Science and Research Data, e.g. the Large Hadron Collider at CERN generates around 15 PB of data per year [255].

1 IoT refers to the connection of a large variety of ”smart” devices, such as refrigerators, thermostats and vehicles etc. to the internet, in order to endow them with the ability to collect and exchange data.

• Genomics Data, e.g. it is projected that, within a decade, up to 2 billion human genomes will have been sequenced, requiring 40 EB of data storage [261].

• Traffic Sensor Data, e.g. the California Department of Transportation (CDoT) records real time traffic data from 39,000 traffic sensors, requiring several GB of storage daily.1

Importance of Big Data

The main benefit of BD is that it provides a basis for replacing and supporting human decision making with automated systems leveraging large Data Sets. Analysis of these Data Sets, i.e. BD Analytics, can reveal valuable insights in the form of hidden patterns that may otherwise be overlooked due to limited human cognitive capacity2. According to a 2011 McKinsey report [263], BD has the potential of driving new waves of productivity, innovation and growth throughout every sector of the global economy, while data are becoming, apart from human capital and hard assets, an essential factor of production. BD could create substantial value for retail businesses, with an estimated 60 percent unrealized increase in operating margins, by allowing organizations to better tailor products to specific customer segments. McKinsey further estimates that BD holds a potential annual value of $300 billion for the US healthcare system3, mainly in the form of productivity increases and improvements in pricing practices. Moreover, the report finds that, by preventing tax and welfare fraud, BD could unlock savings of $250 billion in European government administration. Lastly, BD is relevant in science, since automatically extracting information from vast amounts of data accelerates scientific discovery and innovation. It has been suggested that this data-driven approach to science is a new and distinct scientific paradigm, apart from the theoretical, empirical and computational paradigms [264]. Sometimes it is claimed that Data Science, the study of the generalizable extraction of knowledge from data [265], is emerging as a field in its own right, although this is disputed.

Technologies to address Big Data challenges

The "3Vs" discussed earlier relate to a broader set of challenges posed by the management of large Data Sets. The main concerns include how to store, organize, query and analyze data. In terms of storage, single hard drives with a capacity of hundreds of PB are not available. Possible remedies are Distributed Storage solutions, such as Distributed Data Stores and Distributed File Systems, which, apart from capacity gains, achieve (a) higher read speeds, since reading from multiple small stores is faster than reading from a single large store [255], and (b) higher reliability due to data redundancy [266].

1 compare 4 2 Miller's Law [262] states that the number of objects in working memory is approximately limited to seven, plus or minus two. 3 Total US healthcare spending in 2010 was $2.6 trillion.

As for data organization and querying, conventional SQL-based RDBMSs are inadequate for UD, since they are based on Codd's relational data model [267], which requires SD. On the other hand, traditional DBs do not scale well to BD, i.e. sequentially processed workloads resulting from complex queries on very large tables can become prohibitively expensive. Distributed NoSQL1 and NewSQL2 DBs that make use of parallel query processing are possible solutions. However, their better scalability and performance comes at the cost of functionality, e.g. no support for join operations or constraints. Furthermore, the capabilities of distributed DBs are limited by Brewer's CAP Theorem3 [270]. In terms of data analysis, traditional statistics approaches, developed with small sample sizes in mind, are often inappropriate for BD, since they (1) typically do not scale well to large Data Sets, e.g. methods can require long run times and may not be easily parallelizable [271], (2) often cannot deal with UD, and (3) run the risk of discovering accidental regularities when applied to high-dimensional data. For instance, a naive exhaustive search for statistically significant correlations will likely turn up mostly spurious correlations [272]. Deep Learning (DL)4, on the other hand, is aptly suited for extracting knowledge from BD, since it (1) scales well to large Data Sets, e.g. algorithms are parallelizable and can leverage GPU speedup [162, 163], (2) extracts high-level, hierarchical, semantic representations from possibly UD5, which are better suited than raw data for common tasks such as Classification and Regression6, and (3) is equipped to infer a generalizable7 model of the, possibly highly nonlinear, multivariate Data Generating Process8. This constitutes a more robust data analysis approach than a naive, large-scale application of simple statistical methods.

1 NoSQL originally stands for "Non SQL" but is often translated to "Not only SQL" to express that SQL-like query languages are supported. Various types of NoSQL DBs exist; however, all are usually non-relational, distributed and horizontally scalable to clusters. They do not, in general, provide the full ACID, i.e. Atomicity, Consistency, Isolation and Durability, guarantees of traditional DBs [268]. 2 NewSQL is appropriate for SD. It is a new type of RDBMS, which aims at providing the same performance and scalability as NoSQL DBs, while maintaining the full ACID guarantees of traditional DBs. Various types of NewSQL DBs exist, all of which support the relational data model, use SQL as query language, and horizontally scale into clusters [269]. 3 The CAP Theorem states that it is impossible to simultaneously achieve Consistency, Availability and Partition tolerance in a distributed system. 4 compare 3.3 5 compare 3.3.1 6 compare 3.1.6, Regression and Classification 7 compare 3.1.1 and 3.2.5, Generalization Error 8 compare 3.1.1

The above BD solutions are implemented on computer clusters, effectively parallelizing storage and workload across multiple machines. Apache Hadoop [273] is an example of a popular software framework often used to tackle BD problems. In essence, Hadoop consists of a distributed storage solution, called Hadoop Distributed File System (HDFS), as well as an implementation of MapReduce [274], an execution engine for parallel processing.1 In general, large Hadoop clusters are difficult to set up and manage. Servers and storage hardware must be purchased and provisioned. Furthermore, software must be deployed and managed. Cloud services such as Amazon Web Services (AWS) [275] provide a straightforward way of outsourcing these difficulties. For instance, Amazon Elastic MapReduce (EMR) [276] is a fully managed, dynamically scalable Hadoop framework accessible in the Cloud in a pay-on-demand manner.

1 MapReduce is a programming model for processing large Data Sets with a parallel, distributed algorithm on a cluster. A Map method creates or processes the input data stored in a distributed manner, producing an arbitrary number of intermediate outputs. Subsequently, a Reduce method performs a summary operation on the outputs of the Map method, creating final outputs and saving them into distributed storage.
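As an illustration of this programming model, the following word-count sketch uses mrjob, one of several Python libraries for writing MapReduce jobs that can run locally or on Hadoop/EMR; the class name is illustrative:

```python
from mrjob.job import MRJob

class WordCount(MRJob):
    def mapper(self, _, line):
        # Map: emit (word, 1) for every word in an input line
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Reduce: sum the intermediate counts per word
        yield word, sum(counts)

if __name__ == "__main__":
    WordCount.run()
```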

Chapter 4

Application to Traffic Prediction

4.1 Problem Description

As a practical application of Deep Learning (DL)1, the problem of Traffic Modeling (TM) and the derived problem of Congestion Prediction (CP) are considered. CP is of particular interest in Civil Engineering. It is estimated [277] that in 2014, the economic cost of Congestion was $160 billion ($960 per auto commuter) in the United States, up from $42 billion ($400) in 1982. In particular, 6.9 billion (42) hours and 3.1 billion (19) gallons of fuel were wasted, compared to 1.8 billion (18) hours and 0.5 billion (4) gallons in 1982. These economic costs are projected to increase to $192 billion ($1,100) in 2020, with projected time and fuel wastage of 8.3 billion (47) hours and 3.8 billion (21) gallons, respectively. A useful Traffic Model should be able to help prevent Congestion, a situation where demand for transportation capacity exceeds its supply. In particular, predictions obtained from a capable Traffic Model can be used (a) to inform road work and estimate its impact on travel times, (b) as input for Traffic Management Systems, and (c) to help determine possible locations of new roads. This thesis discusses the design, training, and evaluation of a Deep Recurrent Neural Network (DRNN)2 for TM. Specifically, the network models the joint probability density of Traffic Speeds, multiple time steps into the future, at multiple locations across Los Angeles, conditional on past and current values of traffic variables at these locations, Traffic Incident Data, Weather Data, and Date-Time Indicator variables.3 This is a more general objective than direct CP. Instead of predicting whether or not there is Congestion, i.e. instead of solving a Binary Classification (BC) Problem4, the model produces continuous, probabilistic predictions of Traffic Speeds, i.e. solves a Density Estimation (DE) Problem5. The model can be viewed as a completely general, probabilistic Traffic Simulator that can be employed to answer Congestion queries, such as:

• What is the probability that Congestion6 will occur at a particular location?

• Given Congestion at a particular location, what is the probability that it will dissolve within one hour?

• Assuming a 30 minute partial road closure, in 10 minutes, what are the locations where Congestion will occur with probability greater than 0.9?

CP is just one application of the model. In principle, it can be used to address arbitrary queries whose answers can be derived either analytically from the conditional joint density, or empirically via simulation.

1 compare 3.3 2 compare 3.2.3, Directed Models and 3.3.1, Types of Depth 3 compare 4.4 4 compare 3.1.6, Classification 5 compare 3.1.6, Density Estimation 6 What Traffic Speed constitutes Congestion is left fully general in this framework.

The problem of TM in all its complexity, taking into account Incident Reports, Weather Data and other auxiliary variables, can hardly be formalized explicitly. Casting the issue as a Big Data (BD) problem1 and using supervised2 DL to extract relevant relationships from large amounts of Traffic Data is a promising alternative. To the best of the author's knowledge, no comparable DL Traffic Model exists at the time of writing this thesis.

4.2 Related Research

Traditional approaches to TM and CP rely on (a) Partial Differential or Difference Equation models, such as the Simple Continuum Model [278], the Higher Order Continuum Model [279], or the Cell Transition Model [280], (b) Queueing Theory [281], and (c) Simulation [282]. However, these classical methods are limited due to the overly simplistic assumptions necessary to retain tractability. Moreover, models may involve a tedious parameter calibration process. With the availability of large amounts of traffic sensor data, the application of Machine Learning (ML) methods to TM and CP became possible. Learning relevant relationships directly from data effectively outsources the cognitive overhead involved in deciding which aspects of a particular Traffic Network should be represented at what level of detail [283, 284, 285]. This project takes inspiration from Horvitz et al. [286], who study Congestion prediction in the Seattle Metropolitan Area using a Probabilistic Graphical Model (PGM) [11]. Their model attempts to predict the time until traffic at certain hot spots congests given that it flows freely, and vice versa. The model takes Traffic Sensor Data, Incident Data, Weather Data and Indicator Data as input. It is unclear how well their model actually performs. The performance measures used seem somewhat arbitrary, and no comparison with simpler models is made. It may well be the case that their results could be matched by a naive model that simply predicts (conditional) historical averages. The model presented in this thesis3 is more general in many respects. It considers a larger number of locations, constructs a full probabilistic model of Traffic Speeds, not just predictions of Congestion duration, and, in the spirit of DL, learns all features from the ground up4, while their model involves Feature Engineering. Due to its greater generality, the model described in this thesis can, in principle, replicate any prediction made by their model. Further inspiration is taken from Ma et al. [287], who use a combination of a DRNN5 and a Restricted Boltzmann Machine (RBM)6 for Congestion prediction. They train their model on a GPS Data Set collected by 4 000 taxis in Ningbo, China over a one-month period. The RNN-RBM architecture learns the spatiotemporal distribution of Congestion events across 515 locations and across time steps of various lengths. Their results indicate that large-scale, network-wide Congestion pattern modeling using DL is feasible and may outperform conventional models. In particular, they report an 88% accuracy in Congestion prediction. However, insufficient specifics on Test Run execution and evaluation are provided to conclusively assess this result. Depending on the prediction horizon and other details, it may be possible to outperform this result using a Random Walk model that simply predicts the last known Congestion state.7

1 compare 3.4 2 compare 3.1.2, Supervised Learning 3 compare 4.5 4 compare 3.3.1 5 compare 3.2.3, Directed Models, Recurrent Neural Networks 6 compare 3.2.3, Undirected Models, Restricted Boltzmann Machine 7 compare 4.8, Model Evaluation, Test Results

While their model produces binary Congestion predictions, the model described in this thesis is a more general, continuous density estimator modeling a larger number of locations, trained on a Data Set spanning a 30 times longer time period. While using an RBM as top-layer density estimator is compelling, since it is inherently more powerful than a Gaussian Mixture Model (GMM), it adds a fair amount of complexity to the Training process [288]. In particular, the architecture cannot be trained end-to-end with Gradient Descent (GD)1, and is therefore currently infeasible for the Data Set considered in this thesis.

4.3 Traffic Research Basics

In Traffic Research, three fundamental quantities are of interest: Traffic Speed $V$, Traffic Density $D$, and Traffic Flow $Q$. These variables are related by the Fundamental Equation of Traffic Flow

$$Q = DV \tag{4.1}$$
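As a quick numerical illustration with invented values: a Traffic Density of $D = 20$ vehicles per kilometer moving at a Traffic Speed of $V = 60$ km/h implies a Traffic Flow of
$$Q = DV = 20\ \mathrm{veh/km} \cdot 60\ \mathrm{km/h} = 1200\ \mathrm{veh/h}$$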

Figure 4.1 illustrates the Fundamental Diagram of Traffic Flow, which is a consequence of equation (4.1).

Figure 4.1: Fundamental Diagram of Traffic Flow

1 compare 3.2.4, Gradient Descent With Backpropagation, Basic Framework

4.4 Data and Data Pre-Processing

Raw Data Description

Sensor Data

The majority of the data used in this project is Traffic Sensor Data, including Speed, Flow, and Occupancy1 measurements, published in daily files by the California Department of Transportation (CDoT) [289]. The data consist of 5-minute aggregates of measurements taken at 30-second intervals by close to 5 000 Traffic Stations. After Pre-Processing2, 100 Traffic Stations remain under consideration, covering 18 major freeways in the Los Angeles Metropolitan Area. The data span a period of two and a half years, from Jan 1st 2014 to June 30th 2016 (the Data Period), for a total of 261 192 usable time points3 (the Time Grid). Traffic sensors are mainly single-lane, single-loop detectors4. For each lane, Flow in number of vehicles per 30 seconds, and Occupancy in percent, is measured at 30-second intervals. Occupancy is defined as the fraction of the interval in which the presence of a vehicle is detected and is proportional to Traffic Density. CDoT then compiles this raw data into 5-minute aggregates, summing 30-second Flows and averaging 30-second Occupancy. Speed is inferred using a sophisticated algorithm [290]. Lastly, CDoT aggregates the data to freeway level by summing Flow, taking the arithmetic mean of Occupancy, and taking the harmonic mean of Speed over lanes; a sketch of this aggregation is given after Figures 4.2 to 4.4. Figures 4.2 to 4.4 display log frequency plots of the data collected by a randomly selected sensor. Evidently, the data reflect many of the characteristics expected theoretically.

Figure 4.2: Occupancy vs. Flow Figure 4.3: Speed vs. Flow Figure 4.4: Occupancy vs. Speed
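The lane-to-freeway aggregation described above can be sketched in NumPy as follows; per-lane arrays for a single 5-minute interval are assumed, and a plain (unweighted) harmonic mean is used, matching the description:

```python
import numpy as np

def freeway_aggregate(flow, occupancy, speed):
    """Aggregate per-lane measurements to freeway level:
    sum flows, arithmetic mean of occupancy, harmonic mean of speed."""
    q = flow.sum()                            # total flow over lanes
    occ = occupancy.mean()                    # arithmetic mean occupancy
    v = speed.size / (1.0 / speed).sum()      # harmonic mean of lane speeds
    return q, occ, v
```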

Incident Data

Traffic Incident Reports are invaluable for predicting traffic, since accidents and other traffic hazards can directly lead to Congestion. CDoT distributes monthly files containing information about the Time, Location, Duration and Type of Traffic Incidents that occurred during the respective month. The most frequent Traffic Incidents are unspecified traffic hazards and accidents; road closures and animal hazards also occur. The hope is that the model learns a relationship between Incidents at given coordinates and Traffic Speeds at nearby sensors. Which sensors are located near particular coordinates has to be inferred from the Training Data. Files for each month corresponding to at least one Time Grid point were downloaded for further processing.

1 compare 4.3 2 compare 4.4, Sensor Data 3 This number takes into account missing data and the fact that 2016 was a leap year. 4 This type of sensor is an induction loop detecting the presence of vehicles.

Weather Data

It is hypothesized that Visibility, Precipitation Rate, and Wind Speed can significantly impact traffic. The Los Angeles Metro Area covers 12 500 square kilometers, spanning multiple microclimates. Eight Weather Stations were selected whose locations best overlay the area covered by the Traffic Stations. Figure 4.5 displays the locations of all Traffic and Weather Stations under consideration.

Figure 4.5: Map of the Los Angeles Metro Area. Small red dots represent traffic sensors, while large blue dots represent Weather Stations. No suitable Weather Station exists around the Rosemead area.

Data from these Weather Stations is readily available from Wolfram Alpha through Mathematica [291]. All available measurements during the Data Period of Visibility in km, Precipitation Rate in cm/h, and Wind Speed in km/h, are thus queried for further processing.

Indicator Data

Traffic patterns differ throughout the day, week and year. For instance, on a business day one expects traffic to be heavier during rush hour than at night. Traffic on Saturdays is typically lighter than on business days. Over the course of a year, traffic is generally lighter on holidays and school holidays. The remaining variation emerges from the aggregation of countless micro patterns, where each driver contributes their own idiosyncratic behavior.1 It is hypothesized that these differences manifest themselves in learnable variations in local relationships.

1 The last point is speculative.

Five binary variables reflecting Day of the Week, twelve binary variables for Month of the Year, as well as two binary variables for Holiday and School Holiday, are generated for each Time Grid point.1 Furthermore, two real-valued variables are included to encode Time of Day information.2 For every Time Grid point, these variables are generated as $\sin(T)$ and $\cos(T)$, where $T$ is a periodic function defined over the Time Grid domain, taking value 0 at midnight and linearly approaching $2\pi$ as time approaches the following midnight. Holiday and School Holiday data were obtained from [292] and [293], respectively. In slight abuse of terminology, the set of variables described above will be referred to as Indicator Variables.
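A minimal pandas/NumPy sketch of the Time of Day encoding (the timestamp handling is illustrative):

```python
import numpy as np
import pandas as pd

def time_of_day_features(timestamps):
    """Encode Time of Day as the pair (sin T, cos T), with T running
    linearly from 0 at midnight to 2*pi at the following midnight."""
    ts = pd.to_datetime(pd.Series(timestamps))
    seconds = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
    T = 2 * np.pi * seconds / 86400.0          # 86 400 seconds per day
    return np.sin(T), np.cos(T)
```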

Data Pre-Processing The total size of the unprocessed Traffic Data Set is 150 gigabytes (GB).

Sensor Data

Most of the necessary Pre-Processing3 of the Sensor Data is done prior to publishing. Specifically, CDoT employs a sophisticated diagnostics routine to identify bad data points. Subsequently, in a process called imputation, missing data points are filled in. This method is largely based on Chen et al. [294]. Full details are available on CDoT's website [295]. Nevertheless, some Pre-Processing is still necessary. First, the daily raw data files are aggregated into monthly files. Subsequently, relevant attributes are uploaded into a database, where the data are processed on a month-by-month basis using SQL queries. At this point, no BD tools are required, since the files are less than 5 GB in size and can be dealt with in memory. The raw data contain measurements from close to 5 000 sensors, many of which are of low quality. To ensure the model is learned primarily from actual measurements, only those stations are retained for which (a) on average over the Data Period, less than one third of the data was imputed, and (b) no individual month exists where all measurements were imputed; a sketch of this filter is given below. After this Pre-Processing step, a final set of 946 stations remains under consideration. Of the remaining sensors, 100 are selected at random in order to keep model size in line with the number of available training samples. Lastly, a variety of common-sense sanity check queries are run. This includes inspecting minimum and maximum values of the Speed, Flow and Occupancy variables, as well as double-checking for missing or implausible values. Problems discovered in this process are fixed in an ad hoc manner.4
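A minimal pandas sketch of the station filter (table and column names are hypothetical):

```python
import pandas as pd

def select_stations(stats: pd.DataFrame) -> pd.Index:
    """stats: one row per (station_id, month) with a column 'imputed_fraction'.
    Keep stations with on average < 1/3 imputed data over the Data Period
    and no month in which all measurements were imputed."""
    g = stats.groupby("station_id")["imputed_fraction"]
    keep = (g.mean() < 1 / 3) & (g.max() < 1.0)
    return keep[keep].index
```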

Incident Data

As a first step, the Incident raw data files are filtered for events that occurred in the region covered by the final set of Traffic Stations. Entries with missing or implausible data, such as negative values for Incident Duration, are dropped. After this stage, the data contain 362 993 plausible Incident reports. Events are recorded at points in time that do not necessarily correspond to Time Grid points. Therefore, all Incident Times are rounded up to the next full 5 minutes.5

1 These binary Indicator variables are equal to one if the condition is true and zero otherwise. For instance, the Monday variable is equal to one whenever the corresponding Time Grid point falls on a Monday. 2 Due to the circular nature of time, two variables are needed to encode this information unambiguously. 3 compare 3.1.3, Data Pre-Processing 4 The number of problematic data points discovered this way is negligible. Hence, a discussion of the methods for finding and fixing them is omitted. 5 Rounding down would mean the Training Data contained information about future events that should not be available to the model. Training on this data could lead to overly optimistic evaluation of model performance.

In total, 48 different Incident Types are recorded, 31 of which occur with non-negligible frequency and are therefore retained. Different types of Incidents should be distinguishable by the model, since they impact traffic in different ways. Hence, Incident Type is encoded as a distributed binary pattern, such that five binary variables are sufficient to encode $2^5 = 32$ different incidents.1 Thus, a pattern $p_1, \ldots, p_5$ is assigned to each of the events under consideration; the encoding is illustrated after Figure 4.6. The pattern of all zeros is reserved for the non-event. Incident Location information is used to construct two properly normalized real-valued variables, latitude and longitude. To express Incident Duration, two records are generated for every Incident, one marking the start, and another the end of an event. The point in time where an event ends is defined as the starting point plus Duration, rounded up to the next full 5 minutes. Consequently, a variable start flag with domain $\{-1, 0, 1\}$ is included, distinguishing beginning (1), end (−1), as well as the non-event (0). Multiple events can co-occur at different locations. Moreover, in case of very short Duration events, start and end may fall on the same time point. In order to be able to feed simultaneous events, ten independent copies of the above variables, i.e. ten "macro fields", are added.2 Incident times are aligned with the Time Grid, filling in zeros for incident-free Grid points. Lastly, events are randomly distributed across the 10 macro fields, such that each contains roughly the same number of events.3 Figure 4.6 displays a screenshot of the final Incident Data table, cut off after the second macro field.

Figure 4.6: Incident Data
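A minimal NumPy sketch of the distributed binary encoding of Incident Types; the bit order is an arbitrary illustrative choice:

```python
import numpy as np

def encode_incident_type(type_index):
    """Encode an incident type index in {1, ..., 31} into the five 0/1
    variables p_1..p_5; the all-zero pattern (index 0) is reserved for
    the non-event."""
    assert 0 <= type_index < 32
    return np.array([(type_index >> bit) & 1 for bit in range(5)])

# e.g. encode_incident_type(5) -> array([1, 0, 1, 0, 0])
```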

Weather Data

Most Weather Stations record measurements at regular 1-hour intervals, others at irregular points in time with lower or higher frequency. For each station, every measurement time point is rounded up to the next full 5 minutes and aligned with the Time Grid. Missing data is treated by repeating the last available measurement.4

1 In principle, an integer variable with domain $\{0, \ldots, 31\}$ could be fed to the network. While this is the most parsimonious encoding of the data, it imparts an unjustified ordinal property to the variable. Alternatively, Incident Type could be modeled using a One-Hot Encoding. While this allows the network to model different Incident Types completely independently, it requires 32 additional variables; compare 3.1.3, Types of Variables 2 The cutoff of ten is reasonable, as in only 0.34 percent of all cases more than ten events co-occur. 3 The variances of the respective variables would otherwise differ, which would slow down Training. 4 Interpolation between measurements would mean the Training Data contained information about future time points that should not be available to the model.
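A minimal pandas sketch of the Weather Data alignment described above; the DataFrame is assumed to be indexed by measurement timestamps, and all names are illustrative:

```python
import pandas as pd

def align_weather(weather: pd.DataFrame, time_grid: pd.DatetimeIndex) -> pd.DataFrame:
    """Round measurement times up to the next full 5 minutes, align with the
    Time Grid, and forward-fill gaps with the last available measurement
    (never backwards, to avoid leaking future information)."""
    weather = weather.copy()
    weather.index = weather.index.ceil("5min")      # round up, not down
    weather = weather[~weather.index.duplicated(keep="last")]
    return weather.reindex(time_grid).ffill()
```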

Indicator Data

No further Pre-Processing is required for the Indicator variables.

Final Data Set The final Data Set contains the following variables recorded at each of the 261 192 Time Grid points.

• speed1, flow1, occupancy1, ... , speed100, flow100, occupancy100

• p 1,1, ..., p 5,1, start flag 1, latitude 1, longitude 1, ..., p 1,10, ..., p 5,10, start flag 10, latitude 10, longitude 10

• visibility 1, precipitation rate 1, wind speed 1, ..., visibility 8, precipitation rate 8, wind speed 8

• time of day sin, time of day cos, monday, ..., sunday, january, ..., december, holiday, school holiday

4.5 Model

Input and Output

The model solves a parametric conditional DE problem1, i.e. it produces an estimate $\hat{p}(y|x; \theta)$ of the density of the targets $y$, conditional on the inputs $x$, where $\theta$ are model parameters. The model's input comprises past and current values of all variables of the final Data Set as described in the Data section2, where each variable is properly standardized, unless it is a Binary Indicator variable3. That is, at time $t$ the model input is

$$x = (x_t, x_{t-1}, \ldots, x_{t-b}) \tag{4.2}$$
where $b$ is the number of time steps in the Look-Back Period, and where all $t$ refer to a point on the Time Grid as defined in the Data section4. The targets $y$ are the differences, i.e. changes, in Traffic Speed at all $m_s$ sensor locations5 from one Time Grid point to the next, based on the current and $f$ future time points. That is, at time $t$ the targets are

$$y = (y_t, y_{t+1}, \ldots, y_{t+f}) \tag{4.3}$$
where $f+1$ is the number of time steps in the Look-Forward Period, and $y_t = (y_{1,t}\ y_{2,t}\ \ldots\ y_{m_s,t})^T$ with $y_{l,t} = v_{l,t+1} - v_{l,t}$,6 where $v_{l,t}$ is the Traffic Speed at location $l$ and time point $t$. The reason why differences, as opposed to levels, are modeled is that Traffic Speeds have a non-trivial autoregressive component. Informally, this means that Traffic Speeds in $t+1$ are typically close to Traffic Speeds in $t$. By considering changes, no time is spent learning this obvious relationship. The conditional density $\hat{p}(y|x; \theta)$ is represented implicitly. By repeated application of Bayes' Theorem, it can be written as a product of factors

$$\hat{p}(y|x; \theta) = \hat{p}(y_t, y_{t+1}, \ldots, y_{t+f} \mid x_{\le t}; \theta) = \prod_{\tau=t}^{t+f} \hat{p}(y_\tau \mid y_{<\tau}, x_{\le t}; \theta) \tag{4.4}$$

1 compare 3.1.6, Density Estimation 2 compare 4.4, Final Data Set 3 compare 3.1.3, Data Pre-Processing 4 compare 4.4, Raw Data Description 5 for this project $m_s = 100$ 6 The time indexing of $y$ aligns input and output by clock cycle of the network, i.e. $y_t$ is computed from $x_t$ in the same clock cycle. However, $y_t$ contains a prediction about changes of Traffic Speed to the next time step $t+1$.

with $x_{\le t} = (x_t, x_{t-1}, \ldots, x_{t-b})$ and $y_{<\tau} = (y_{\tau-1}, y_{\tau-2}, \ldots, y_t)$. The model produces, at each time step $\tau = t, \ldots, t+f$, an estimate of the density of the targets $y_\tau$, conditional on all previous realizations $y_{<\tau}$ and all inputs $x_{\le t}$. The realizations of $y_{<\tau}$ are realizations of model predictions $\hat{y}_{<\tau}$. For each time step, the prediction $\hat{y}_\tau$ becomes part of the next time step's input $\hat{x}_{\tau+1}$.1 Hence, (4.4) can be written as
$$\hat{p}(y|x; \theta) = \prod_{\tau=t}^{t+f} \hat{p}(y_\tau \mid \hat{x}_\tau; \theta) \tag{4.5}$$

This is a generalization of parametric conditional DE to sequence data; the main principle, however, remains unchanged. The true density of the targets in $\tau$ is assumed to have a known, parametric form $\hat{p}(y_\tau; \varphi_\tau)$, where $\varphi_\tau$ are the respective distribution parameters. Thus, the model's output comprises estimates of these distribution parameters for each time point

$$\hat{\varphi} = (\hat{\varphi}_t, \hat{\varphi}_{t+1}, \ldots, \hat{\varphi}_{t+f}) \tag{4.6}$$
modeled as functions of the respective time step's inputs, i.e. $\hat{\varphi}_\tau = h(\hat{x}_\tau; \theta)$. Hence, the estimate of the conditional density of the targets $\hat{p}(y_\tau \mid \hat{x}_\tau; \theta)$ is expressed indirectly as $\hat{p}(y_\tau; \hat{\varphi}_\tau) = \hat{p}(y_\tau; h(\hat{x}_\tau; \theta))$.2 The density of Traffic Speed changes implies a density in Traffic Speed levels. Thus, the network implicitly models the joint density of Traffic Speeds at $m_s$ sensor locations, $f+1$ steps into the future, conditional on current and $b$ past values of all input variables

$$\hat{p}(v_{1,t+1}, \ldots, v_{m_s,t+1}, \ldots, v_{1,t+f+1}, \ldots, v_{m_s,t+f+1} \mid x_t, x_{t-1}, \ldots, x_{t-b}; \theta) \tag{4.7}$$
Samples from this distribution, i.e. joint realizations of the underlying random variables

$V_{1,t+1}, \ldots, V_{m_s,t+1}, \ldots, V_{1,t+f+1}, \ldots, V_{m_s,t+f+1}$ can be obtained by computing the cumulative sum of sequentially sampled differences. Samples from the model, differences as well as derived levels, will also be referred to as model outputs.

Type of Model

The model under consideration is discriminative and probabilistic.3 Modeling a conditional density is strictly more general than modeling a derived quantity, such as the conditional expectation or the conditional mode, as in certain types of Regression and Classification Problems4. In particular, a Probabilistic Model (ProM) can answer all the questions a Deterministic Model (DetM) can answer, and more. For instance, modeling the conditional expectation of future Traffic Speeds is a special case of modeling their conditional density, since the expectation can either be obtained directly from the density, or estimated based on samples drawn from it. In fact, samples from the model can be used to answer queries about arbitrary events.5 Furthermore, in case of a ProM, any point prediction, obtained either analytically or through sampling, conveys a measure of its own uncertainty. This is valuable information if actions are taken based on these predictions. Although, by the earlier definition, the model is discriminative, it nevertheless has a strong generative flavor.6 Were its input limited to Traffic Speeds, it would be classified as generative, since its output would be a model of the entire input.

1 compare 4.5, Model Architecture 2 Note the difference between the distribution parameters $\varphi$ that fully specify the assumed data distribution, and the model parameters $\theta$, i.e. the network weights. 3 compare 3.1.7 4 compare 3.1.6, Deterministic vs. Probabilistic 5 compare 4.5, Use Cases 6 compare 3.1.7, Discriminative vs. Generative

Model Architecture

The model is a Deep Recurrent Mixture Density Network (DRMDN)1 using Long Short-Term Memory (LSTM) Units2 (LSTM-DRMDN), whose lowest Layers consist of replicated feature detectors, analogous to Convolutional Layers in Convolutional Neural Networks (CNNs)3. The architecture comprises $C$ hidden Convolutional Layers, $L$ fully recurrent hidden LSTM Layers, and $k$ Gaussian Mixture Components in its Output Layer (OL). Figure 4.7 depicts a schematic of the architecture unrolled in time.

Figure 4.7: Model Architecture, unrolled in time

At time $t$, the model is fed all inputs $x_\tau$ for $\tau = t-b, \ldots, t$ (equivalently $\tau = t_s, \ldots, t$) in chronological order. No outputs are produced during this Encoding Phase. Rather, the model builds up an internal representation of the past and current traffic situation in its Hidden State. In the Decoding Phase $\tau = t, \ldots, t+f$ (equivalently $\tau = t, \ldots, t_e$), this internal representation is unraveled. For each $\tau$, the model output $\hat{\varphi}_\tau = (\hat{\mu}_\tau, \hat{\lambda}_\tau, \hat{\alpha}_\tau)$ is produced,4 fully parameterizing $\hat{p}(y_\tau; \hat{\varphi}_\tau)$. A sample $\hat{y}_\tau$ is drawn from this Gaussian Mixture Density, which is then, along with the most recent input, appropriately converted to generate the input for the next time step, i.e. $\hat{x}_{\tau+1} = \mathrm{convert}(\hat{y}_\tau, \hat{x}_\tau)$.

The conversion function convert takes the last known inputs $\hat{x}_\tau$ and, based on the predicted changes $\hat{y}_\tau$, updates the Traffic Speed variables. Hence, the network represents a recurrent model of Traffic Speeds only, while remaining agnostic about all other non-deterministic inputs. In constructing $\hat{x}_{\tau+1}$, the last known values of Flow, Occupancy, Visibility, Precipitation Rate and Wind Speed are repeated, Incident Variables are set to zero, and Indicator Variables are rolled forward deterministically.

1 compare 3.2.3, Directed Models, Recurrent Neural Networks and Mixture Density Networks 2 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit 3 compare 3.2.3, Directed Models, Feedforward Neural Networks, Convolutional Neural Networks 4 $\hat{\mu}_\tau$ and $\hat{\lambda}_\tau$ are themselves sets of $k$ vectors, one for each mixture component; compare 4.5, Model Equations.

Alternatively, one could construct a fully Generative Model (GenM), whose output parameterizes the joint density of all non-deterministic variables. However, this would dramatically increase the number of model parameters. Furthermore, model capacity would be wasted on learning irrelevant relationships, e.g. a complete model of the weather. Another alternative is to feed zero inputs for all unknown variables in the Decoding Phase. However, this would mean imposing a significant input change between the Encoding and Decoding Phase. Since the same input weight matrix is used, additional computational resources would have to be spent on learning how to discard this change. Further research is needed to explore possible downsides of the chosen approach.

Model Equations

For $\tau = t, \ldots, t_e$, sampling is done using
$$\hat{y}_\tau = \mathrm{sample}(\hat{\mu}_\tau, \hat{\lambda}_\tau, \hat{\alpha}_\tau, Q) \tag{4.8}$$
where sample draws from a $k$-component GMM with mean vector parameters $\hat{\mu}_\tau = (\hat{\mu}_{1,\tau}, \ldots, \hat{\mu}_{k,\tau})$, covariance matrix parameters $\hat{\Sigma}_\tau = (\hat{\Sigma}_{1,\tau}, \ldots, \hat{\Sigma}_{k,\tau})$, and mixture probability parameter vector $\hat{\alpha}_\tau = (\hat{\alpha}_{1,\tau}, \ldots, \hat{\alpha}_{k,\tau})^T$ such that $\sum_{i=1}^{k} \hat{\alpha}_{i,\tau} = 1$. The covariance matrices are constructed from the predicted eigenvalues $\hat{\lambda}_\tau = (\hat{\lambda}_{1,\tau}, \ldots, \hat{\lambda}_{k,\tau})$ as

$$\hat{\boldsymbol{\Sigma}}_{\tau,i} = \mathbf{Q}_i \hat{\boldsymbol{\Lambda}}_{\tau,i} \mathbf{Q}_i^T = \mathbf{Q}_i\, \mathrm{diag}(\hat{\boldsymbol{\lambda}}_{\tau,i})\, \mathbf{Q}_i^T, \qquad i = 1,\dots,k \qquad (4.9)$$

where diag maps a vector to the corresponding diagonal matrix, and $\mathbf{Q} = (\mathbf{Q}_1,\dots,\mathbf{Q}_k)$ are hard-coded, orthogonal matrices associated with the k mixture components. The reasons for this design choice are explained in the subchapter "Parametrization of Covariance Matrices". The sample function is described in the subchapter "Sampling".
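As a quick numerical illustration of (4.9) (a minimal sketch, not thesis code), the following NumPy fragment builds a covariance matrix from non-negative eigenvalues and an orthogonal matrix and verifies its defining properties:

```python
import numpy as np

def covariance_from_eigenvalues(Q, lam):
    """Sigma = Q diag(lam) Q^T, cf. equation (4.9)."""
    return Q @ np.diag(lam) @ Q.T

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal matrix
lam = np.array([2.0, 1.0, 0.5, 0.1])              # non-negative eigenvalues
Sigma = covariance_from_eigenvalues(Q, lam)

assert np.allclose(Sigma, Sigma.T)                   # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)   # positive semidefinite
```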

For $\tau = t,\dots,t_e$, the OL is described by

$$\hat{\boldsymbol{\mu}}_\tau, \hat{\boldsymbol{\lambda}}_\tau, \hat{\boldsymbol{\alpha}}_\tau = \mathrm{gmm}(\mathbf{a}_{C+L,\tau}; \boldsymbol{\theta}_{gmm}) \qquad (4.10)$$

where gmm is a function outputting the distribution parameters of a GMM, $\mathbf{a}_{C+L,\tau}$ denotes the Activations of the highest Hidden Layer (HL), and $\boldsymbol{\theta}_{gmm} = \mathbf{W}_{C+L+1} = (\mathbf{W}_\mu, \mathbf{W}_\lambda, \mathbf{W}_\alpha, \mathbf{b}_\mu, \mathbf{b}_\lambda, \mathbf{b}_\alpha)$ are the parameters associated with the OL. $\mathbf{W}_\mu$ and $\mathbf{W}_\lambda$ are $n_{C+L} \times m_s k$ matrices, $\mathbf{W}_\alpha$ is an $n_{C+L} \times k$ matrix, $\mathbf{b}_\mu$ and $\mathbf{b}_\lambda$ are $m_s k$-element vectors, $\mathbf{b}_\alpha$ is a k-element vector,1 and $n_{C+L}$ is the number of Units in the highest HL. The equations underlying gmm are identical to those describing the OL of a Mixture Density Network (MDN)2, with standard deviations interpreted as eigenvalues.

For $\tau = t,\dots,t_e$, the dynamics of the recurrent LSTM Layers are described by

$$\mathbf{a}_{l,\tau}, \mathbf{m}_{l,\tau} = \mathrm{lstm}(\mathbf{a}_{l-1,\tau}, \mathbf{a}_{l,\tau-1}, \mathbf{m}_{l,\tau-1}; \boldsymbol{\theta}_{lstm,l}), \qquad l = C+L,\dots,C+1 \qquad (4.11)$$

where $\mathbf{a}_{l,\tau}$ and $\mathbf{m}_{l,\tau}$ denote the Activations and Memory States of the associated LSTM Units, with Initial States $\mathbf{a}_{l,t-b-1} = \mathbf{a}_l^0$ and $\mathbf{m}_{l,t-b-1} = \mathbf{m}_l^0$. Furthermore3, $\boldsymbol{\theta}_{lstm,l} = (\mathbf{W}_l, \mathbf{U}_l, \mathbf{a}_l^0, \mathbf{m}_l^0) = (\mathbf{W}_{c,l}, \mathbf{W}_{f,l}, \mathbf{W}_{i,l}, \mathbf{W}_{o,l}, \mathbf{U}_{c,l}, \mathbf{U}_{f,l}, \mathbf{U}_{i,l}, \mathbf{U}_{o,l}, \mathbf{b}_{c,l}, \mathbf{b}_{f,l}, \mathbf{b}_{i,l}, \mathbf{b}_{o,l}, \mathbf{a}_l^0, \mathbf{m}_l^0)$ denote the associated model parameters, including the learnable initial Hidden and Memory States. The equations underlying lstm are detailed in an earlier chapter.4

1 For brevity, dimensions of matrices and vectors are henceforth no longer stated. They are implied by the dimensions of the Training Data and by the conventions used in earlier chapters.
2 compare 3.2.3, Directed Models, Mixture Density Networks
3 $\mathbf{W}_l$ comprises $\mathbf{W}_{c,l}, \mathbf{W}_{f,l}, \mathbf{W}_{i,l}, \mathbf{W}_{o,l}, \mathbf{b}_{c,l}, \mathbf{b}_{f,l}, \mathbf{b}_{i,l}$, and $\mathbf{b}_{o,l}$, while $\mathbf{U}_l$ comprises $\mathbf{U}_{c,l}, \mathbf{U}_{f,l}, \mathbf{U}_{i,l}$, and $\mathbf{U}_{o,l}$.
4 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit

For $\tau = t_s,\dots,t_e$, the Convolutional Layers are described by

$$\mathbf{a}_{l,\tau} = \mathrm{conv}(\mathbf{a}_{l-1,\tau}; \boldsymbol{\theta}_{conv,l}), \qquad l = C,\dots,1 \qquad (4.12)$$

$$\mathbf{a}_{0,\tau} = \hat{\mathbf{x}}_\tau = I(\tau \le t)\,\mathbf{x}_\tau + I(\tau > t)\,\mathrm{convert}(\hat{\mathbf{y}}_{\tau-1}, \hat{\mathbf{x}}_{\tau-1}) \qquad (4.13)$$

where $\boldsymbol{\theta}_{conv,l} = \mathbf{W}_l = (\mathbf{W}_{sensor,l}, \mathbf{W}_{incident,l}, \mathbf{W}_{weather,l}, \mathbf{W}_{indicator,l}, \mathbf{b}_{sensor,l}, \mathbf{b}_{incident,l}, \mathbf{b}_{weather,l}, \mathbf{b}_{indicator,l})$ are the associated model parameters, and I is the indicator function. Note that $\hat{\mathbf{x}}_\tau$ is identical to the input $\mathbf{x}_\tau$ during the Encoding Phase $\tau = t_s,\dots,t$, but constructed from sampled output and previous input during the Decoding Phase $\tau = t+1,\dots,t_e$. The equations underlying the functions $\mathrm{conv}_l$, $l = C,\dots,1$, are described in the subchapter "Convolutional Layers". Thus, the complete set of learnable model parameters is

$$\boldsymbol{\theta} = \{\boldsymbol{\theta}_{conv,1},\dots,\boldsymbol{\theta}_{conv,C}, \boldsymbol{\theta}_{lstm,C+1},\dots,\boldsymbol{\theta}_{lstm,C+L}, \boldsymbol{\theta}_{gmm}\} \qquad (4.14)$$

The above equations trivially generalize to matrix equations when m training cases are processed in parallel.

Parametrization of Covariance Matrices

When dealing with GMM-MDNs, the question arises of how to parameterize the component covariance matrices $\boldsymbol{\Sigma}_i$, $i = 1,\dots,k$. In what follows, it is assumed that the covariance matrices are of dimension $n \times n$. Typically, the literature suggests setting $\boldsymbol{\Sigma}_i = \mathbf{I}\sigma_i^2$, where $\mathbf{I}$ is an $n \times n$ identity matrix and $\sigma_i^2$ is a scalar [151]. Alternatively, it is sometimes proposed to set $\boldsymbol{\Sigma}_i = \mathrm{diag}(\boldsymbol{\sigma}_i^2)$, where $\boldsymbol{\sigma}_i$ is an n-element vector of standard deviations. The k covariance matrices thus contain k or kn independent parameters in total. Asymptotically, any distribution can be modeled arbitrarily well by a GMM using only diagonal component covariance matrices [92]. However, there is a concern that too many components may be needed in order to capture all relevant features of the data. In particular, preliminary analysis of the Data Set indicates that Traffic Speed changes at different sensors can be highly correlated. Figure 4.8 displays a matrix indicating which correlations of Traffic Speed changes are significant at the 99% confidence level, all sensors considered. Figure 4.9 displays the corresponding correlation matrix.

Figure 4.8: Correlation Significance Matrix, 94.65% significant (94.53% positive)
Figure 4.9: Correlation Matrix

This shows that most sensor correlations are significant and positive, albeit small in absolute terms, while large significant correlations, both positive and negative, also exist. If the covariance matrices are assumed to be diagonal, these correlations cannot be modeled directly. On the other hand, fully parameterizing a covariance matrix requires $n(n+1)/2$, i.e. $O(n^2)$, parameters1, which is prohibitively expensive for large n.2 If $\mathbf{Q}$ is an orthogonal matrix and $\boldsymbol{\Lambda}$ a diagonal matrix with non-negative elements, then $\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ is a symmetric, positive semidefinite matrix, and hence a covariance matrix, whose eigenvalues are the elements of $\boldsymbol{\Lambda}$. Therefore, n independent numbers suffice to parameterize a densely populated covariance matrix, i.e. a covariance matrix with non-zero off-diagonal elements that can be used to model correlations between variables directly. Evidently, only a small subset of the space of all possible covariance matrices is spanned by this parametrization. This decomposition method is chosen for the model and expressed by equation (4.9).

The orthogonal matrices $\mathbf{Q}_i$, $i = 1,\dots,k$ are hard-coded. They are derived by performing Singular Value Decompositions on the sample covariance matrices of different subsets of the Training Data3,4. Specifically5, $\mathbf{Q}\mathbf{S}\mathbf{V} = \hat{\boldsymbol{\Sigma}}_s$ with6 $[\hat{\boldsymbol{\Sigma}}_s]_{ij} = \frac{1}{n_s - 1}\sum_{t \in T}(d_{i,t} - \bar{d}_i)(d_{j,t} - \bar{d}_j)$, where $\bar{d}_i = \frac{1}{n_s}\sum_{t \in T} d_{i,t}$, and T denotes the set of Time Grid points in the sub-sample. It is thus ensured that the space of covariance matrices spanned by $\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ contains sensible elements, such as $\hat{\boldsymbol{\Sigma}}_s$. By choosing different sub-samples of the Training Data, e.g. all weekends, or all weekdays during morning rush hour7, Learning is biased towards discovering mixture components that correspond to the respective traffic regimes. Hard-coding the $\mathbf{Q}_i$ amounts to pre-wiring knowledge into the model, thus reducing Learning latency. A sketch of this derivation is given below.
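A minimal sketch, assuming a matrix D of Traffic Speed changes with one row per sensor and one column per Time Grid point of the regime sub-sample:

```python
import numpy as np

def regime_Q(D):
    """Return the orthogonal matrix Q from the SVD of the unbiased sample
    covariance of one traffic-regime sub-sample; D has shape (n, |T|)."""
    Sigma_hat = np.cov(D)                 # unbiased estimator, cf. footnote 6
    Q, S, Vt = np.linalg.svd(Sigma_hat)   # for covariance matrices, Vt = Q^T
    return Q
```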

Sampling

Sampling from a k-component GMM based on (4.9) is performed in two steps. First, one of the k mixture components is picked at random, such that each component has probability8 $\hat{\alpha}_i$ of being selected. This corresponds to drawing a sample from a k-component Categorical Distribution with probability vector $\hat{\boldsymbol{\alpha}}$, and can be accomplished using the Gumbel-Max method [296]. First, a vector $\mathbf{u}$ of k independent, uniformly distributed random variables in [0, 1] is drawn. This vector is converted into a vector $\mathbf{g}$ of k independent random variables with Gumbel Distribution via the transformation $\mathbf{g} = -\log(-\log(\mathbf{u}))$. Finally, a sample c from the Categorical Distribution is drawn by selecting

$$c = \arg\max_i(\log\hat{\alpha}_i + g_i) \qquad (4.15)$$

Secondly, $\hat{\mathbf{y}}$ is drawn from the Multivariate Normal Distribution corresponding to the selected mixture component, $\mathcal{N}(\hat{\boldsymbol{\mu}}_c, \hat{\boldsymbol{\Sigma}}_c)$. It is a well-known fact that if $\mathbf{z}$ is a vector of independent Standard Normals, i.e. $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, then $\mathbf{L}\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{L}\mathbf{L}^T)$ and $\mathbf{L}\mathbf{z} + \boldsymbol{\mu} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{L}\mathbf{L}^T)$. Hence,

1 A covariance matrix is symmetric and therefore has n(n+1)/2 independent parameters: n variances as diagonal elements and n(n−1)/2 off-diagonal covariances.
2 in this project, n = 100
3 compare 4.7
4 Data in the Validation and Test Sets are not included, in order to avoid leaking unknowable information into the training process.
5 For covariance matrices, a Singular Value Decomposition is equivalent to an Eigenvalue Decomposition. In particular, $\mathbf{S}$ is a diagonal matrix of the eigenvalues of $\hat{\boldsymbol{\Sigma}}_s$ and $\mathbf{V} = \mathbf{Q}^T$.
6 This is an unbiased estimator for the covariance.
7 compare 4.7
8 In this section, the time index is dropped for clarity.

a vector of n independent Standard Normals $\mathbf{z}$ is generated1, and one must then find $\mathbf{L}$ such that $\hat{\boldsymbol{\Sigma}}_c = \mathbf{L}\mathbf{L}^T$. Since the covariance matrix is modeled as an Eigenvalue Decomposition, $\hat{\boldsymbol{\Sigma}}_c = \mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c\mathbf{Q}_c^T = \mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c^{1/2}\hat{\boldsymbol{\Lambda}}_c^{T/2}\mathbf{Q}_c^T = (\mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c^{1/2})(\mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c^{1/2})^T$ holds, and therefore $\mathbf{L} = \mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c^{1/2}$. Consequently, a sample $\hat{\mathbf{y}}$ is obtained by letting

$$\hat{\mathbf{y}} = \mathbf{Q}_c\hat{\boldsymbol{\Lambda}}_c^{1/2}\mathbf{z} + \hat{\boldsymbol{\mu}}_c \qquad (4.16)$$
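The two-step procedure of (4.15) and (4.16) can be sketched in a few lines of NumPy (a minimal single-sample version; the parallelized Theano implementation is described in 4.7):

```python
import numpy as np

def sample_gmm(mu, lam, alpha, Q, rng):
    """Draw one sample from a k-component GMM with covariance matrices
    Sigma_i = Q_i diag(lam_i) Q_i^T.
    Shapes: mu, lam (k, n); alpha (k,); Q (k, n, n)."""
    # Step 1: Gumbel-Max draw from the Categorical Distribution, cf. (4.15)
    g = -np.log(-np.log(rng.uniform(size=alpha.shape)))
    c = int(np.argmax(np.log(alpha) + g))
    # Step 2: reparameterized Gaussian sample, cf. (4.16)
    z = rng.standard_normal(mu.shape[1])       # independent Standard Normals
    return Q[c] @ (np.sqrt(lam[c]) * z) + mu[c]
```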

Convolutional Layers

Let $[\;]$ define a stacking operator on column vectors, i.e. $[\mathbf{v}_1\, \mathbf{v}_2 \dots \mathbf{v}_n] = (\mathbf{v}_1^T\, \mathbf{v}_2^T \dots \mathbf{v}_n^T)^T$, where $\mathbf{v}_j$, $j = 1,\dots,n$ are vectors of possibly different size. In what follows, the indices s, r, w, and i reference variables associated with Traffic Station Data, Incident Report Data, Weather Data and Indicator Data, respectively. Recall2 that

$$\mathbf{x}_\tau = [\mathbf{x}_\tau^s\, \mathbf{x}_\tau^r\, \mathbf{x}_\tau^w\, \mathbf{x}_\tau^i] = [[\mathbf{x}_{1,\tau}^s \dots \mathbf{x}_{m_s,\tau}^s], [\mathbf{x}_{1,\tau}^r \dots \mathbf{x}_{m_r,\tau}^r], [\mathbf{x}_{1,\tau}^w \dots \mathbf{x}_{m_w,\tau}^w], [\mathbf{x}_{1,\tau}^i]] \qquad (4.17)$$

$$\begin{aligned}
\mathbf{x}_{j,\tau}^s &= [\mathrm{speed}_{j,\tau}, \mathrm{occupancy}_{j,\tau}, \mathrm{flow}_{j,\tau}], & j = 1,\dots,m_s\\
\mathbf{x}_{j,\tau}^r &= [p1_{j,\tau},\dots,p5_{j,\tau}, \mathrm{latitude}_{j,\tau}, \mathrm{longitude}_{j,\tau}, \mathrm{start\_flag}_{j,\tau}], & j = 1,\dots,m_r\\
\mathbf{x}_{j,\tau}^w &= [\mathrm{visibility}_{j,\tau}, \mathrm{precipitation\_rate}_{j,\tau}, \mathrm{wind\_speed}_{j,\tau}], & j = 1,\dots,m_w\\
\mathbf{x}_{j,\tau}^i &= [\mathrm{time\_of\_day\_sin}_{j,\tau},\dots,\mathrm{school\_holiday}_{j,\tau}], & j = 1,\dots,m_i
\end{aligned} \qquad (4.18)$$

where $m_s = 100$ is the number of Traffic Stations, $m_r = 10$ is the number of macro fields recording incident data3,4, $m_w = 8$ is the number of Weather Stations, and $m_i = 1$. The equations implemented by $\mathrm{conv}_l$, $l = C,\dots,1$ are5

$$\mathbf{a}_{l,\tau} = [\mathbf{a}_{l,\tau}^s\, \mathbf{a}_{l,\tau}^r\, \mathbf{a}_{l,\tau}^w\, \mathbf{a}_{l,\tau}^i] = [\widetilde{\mathbf{W}}_{s,l}^T \mathbf{a}_{l-1,\tau}^s\;\; \widetilde{\mathbf{W}}_{r,l}^T \mathbf{a}_{l-1,\tau}^r\;\; \widetilde{\mathbf{W}}_{w,l}^T \mathbf{a}_{l-1,\tau}^w\;\; \widetilde{\mathbf{W}}_{i,l}^T \mathbf{a}_{l-1,\tau}^i] \qquad (4.19)$$

$$\widetilde{\mathbf{W}}_{s,l} = \begin{pmatrix} \mathbf{W}_{s,l} & \dots & \mathbf{0}\\ \vdots & \ddots & \vdots\\ \mathbf{0} & \dots & \mathbf{W}_{s,l} \end{pmatrix} \qquad (4.20)$$

with $\mathbf{W}_{s,l}$ an $n_{s,l-1} \times n_{s,l}$ matrix, where $n_{s,l-1}$ and $n_{s,l}$ are the numbers of features relating to Traffic Station Data in Convolutional Layers $l-1$ and $l$, respectively. Incidentally, $n_{s,0} = 3$, counting the features Speed, Occupancy, and Flow. $\widetilde{\mathbf{W}}_{s,l}$ is an $m_s n_{s,l-1} \times m_s n_{s,l}$ matrix containing $m_s$ copies of $\mathbf{W}_{s,l}$ along its diagonal6, while all other elements are equal to zero. $\widetilde{\mathbf{W}}_{r,l}$, $\widetilde{\mathbf{W}}_{w,l}$, and $\widetilde{\mathbf{W}}_{i,l}$ are constructed analogously from $\mathbf{W}_{r,l}$, $\mathbf{W}_{w,l}$, and $\mathbf{W}_{i,l}$, with $n_{r,0} = 8$, $n_{w,0} = 3$, and $n_{i,0} = 23$.7 The quantities $n_{s,l}$, $n_{r,l}$, $n_{w,l}$, and $n_{i,l}$, $l = 1,\dots,C$ are model parameters that will be specified later. Using Convolutional Layers as low-level feature extractors instead of fully connected Layers reduces the number of associated model parameters $\boldsymbol{\theta}_{conv,l} = \{\mathbf{W}_{s,l}, \mathbf{W}_{r,l}, \mathbf{W}_{w,l}, \mathbf{W}_{i,l}\}$, $l = C,\dots,1$ by orders of magnitude, as the sketch below illustrates.

1 There are several ways of doing this based on a vector $\mathbf{u}$ of uniformly distributed random variables in [0, 1]. For instance $\mathbf{z} = \Phi^{-1}(\mathbf{u})$, where $\Phi$ is the distribution function of the Standard Normal Distribution, applied elementwise. Alternatively, the Box-Muller method [297] can be used.
2 compare 4.4, Final Data Set
3 compare 4.4, Data Pre-Processing
4 Of course, there is no inherent repetition in the indicator data, hence $m_i = 1$. For convenience, this part of the data is treated like the others.
5 compare 4.5, Model Architecture, Model Equations
6 For performance reasons, the Convolutional Layers are actually implemented in a more efficient, mathematically equivalent, albeit less intuitive way; compare 4.7, Improving Computational Efficiency
7 compare 4.4, Final Data Set
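The Weight Sharing expressed by (4.19) and (4.20) can be illustrated with a small NumPy sketch (illustrative only; shapes follow the Traffic Station block, and np.kron materializes the block-diagonal matrix that the actual implementation avoids):

```python
import numpy as np

m_s, n_prev, n_next = 100, 3, 4                 # stations, features in l-1 and l
rng = np.random.default_rng(0)
W_s = rng.standard_normal((n_prev, n_next))     # one shared Feature Detector
W_tilde = np.kron(np.eye(m_s), W_s)             # block-diagonal matrix (4.20)

a_prev = rng.standard_normal(m_s * n_prev)      # stacked station activations
a_next = W_tilde.T @ a_prev                     # as in (4.19)

# Equivalent without the large matrix: apply W_s per station and restack
a_fast = (a_prev.reshape(m_s, n_prev) @ W_s).reshape(-1)
assert np.allclose(a_next, a_fast)
```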

Use Cases

After Training1, the model can be used to answer the following types of questions.2

(1) What is the expected Traffic Speed, r time steps into the future, at sensor location l? This is equivalent to making a point prediction using a network that models the conditional expectation of the targets. The conditional expectation can be approximated by averaging N samples and becomes exact as N tends to infinity.

$$\bar{v}_{l,t+r}^N = \frac{1}{N}\sum_{i=1}^{N} \hat{v}_{l,t+r,i} \qquad (4.21)$$

$$E(v_{l,t+r}|\mathbf{x}_{\le t}) = \lim_{N\to\infty} \bar{v}_{l,t+r}^N \qquad (4.22)$$

Since the expected value of a GMM is equal to the weighted sum of its component means, the special case r = 1 can be obtained directly from the model output, without sampling.3

$$E(y_{l,t}|\mathbf{x}_{\le t}) = E(v_{l,t+1} - v_{l,t}|\mathbf{x}_{\le t}) = \sum_{i=1}^{k} \hat{\alpha}_{i,t}\hat{\mu}_{i,l,t} \qquad (4.23)$$

In order to approximate the case r > 1, predicted means can be successively fed to the next time step. However, it is not clear how poor the resulting approximation can be. Further research should be undertaken to examine this question.

(2) What is the most likely evolution of Traffic Speed at a particular sensor? This is equivalent to predicting the mode of the conditional marginal density of the respective targets, which differs from predicting their mean unless the density is symmetric. Analytically, finding the mode corresponds to the following computation

$$\mathrm{mode}(\mathbf{y}_l) = \arg\max_{\mathbf{y}_l} \hat{p}(\mathbf{y}_l; \hat{\boldsymbol{\varphi}}) = \arg\max_{\mathbf{y}_l} \int \hat{p}(\mathbf{y}; \hat{\boldsymbol{\varphi}})\, d\mathbf{y}_{l^-} \qquad (4.24)$$

where $\mathbf{y}_l = (y_{l,t},\dots,y_{l,t+f})$, and the integration is with respect to all variables corresponding to sensor locations other than l. However, the above expression is in general intractable.4 Furthermore, these densities are typically multi-modal, in which case the highest mode is sought. In practice, samples from the model can be binned with the required resolution. The bin containing the most samples then serves as an approximation of the mode. This method is only feasible if a small subset of targets is considered; otherwise a more sophisticated mode-hunting scheme [298] must be applied.

1 compare 4.7
2 This list is not exhaustive.
3 Note that $y_{l,t+r} = v_{l,t+r+1} - v_{l,t+r}$, for r > 1, is only Gaussian conditional on information available in t + r − 1, i.e. given y

(3) What is the probability of an arbitrary event A? In this case, N samples are drawn from the model. The frequency with which A occurs is a Monte Carlo estimate of its probability p(A), i.e.

$$\hat{p}_N(A) = \frac{1}{N}\sum_{i=1}^{N} I_A(\hat{\mathbf{y}}_i) \qquad (4.25)$$

$$p(A) = \lim_{N\to\infty} \hat{p}_N(A) \qquad (4.26)$$

where $I_A$ is the indicator function, which is equal to 1 if its argument makes A true, and 0 otherwise. Examples for A include:

• Congestion develops at sensor location 34 within the next 15 minutes, where Congestion is defined as Traffic Speed lower than 15 mph.
• Congestion develops at sensor locations 1 and 2 and lasts longer than 30 minutes, where Congestion is defined as Traffic Speed lower than 20 mph.

A minimal sketch of this estimator is given below.
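In the sketch, draw_trajectory and the event predicate are hypothetical stand-ins for model sampling and concrete event definitions:

```python
def estimate_event_probability(draw_trajectory, is_event, N=10_000):
    """Monte Carlo estimate of p(A), cf. (4.25): the frequency with which
    the event predicate holds across N sampled trajectories."""
    hits = sum(is_event(draw_trajectory()) for _ in range(N))
    return hits / N

# Example predicate: speed at sensor 34 drops below 15 mph within 3 steps;
# a trajectory is assumed to be an array of shape (time_steps, sensors).
is_congested_34 = lambda traj: bool((traj[:3, 34] < 15.0).any())
```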

(4) What is the probability of an arbitrary event A, conditional on an arbitrary event B? By Bayes' Theorem, p(A|B) = p(A, B)/p(B) holds. Monte Carlo estimates can be computed for both terms. However, if B has low probability, it may take a large number of samples to obtain reasonable accuracy. In this case, p(A|B) can be approximated by fixing the input to the model so as to make B true, and then estimating p(A). It thus becomes possible to answer queries such as:

• Assuming Traffic is congested at sensor location 3, what is the probability that the Congestion will last for more than 15 minutes?
• Assuming that in 15 minutes, at location 22, Traffic comes to a halt for 10 minutes, what are the probabilities that Congestion will develop at each other sensor location?

This is relevant in gauging the effect of temporary road closures in the context of road work scheduling. In this way, the switching of lane directions can be simulated and evaluated. For instance, as a result of Congestion in the southbound direction of a highway, a Traffic Management System may consider reversing the direction of a northbound lane. However, it may not be clear how this would impact traffic elsewhere in the highway network. Conditioning the model on this scenario corresponds to conditioning on proportionally increased Occupancy in the northbound direction and proportionally decreased Occupancy in the southbound direction. Furthermore, using the Fundamental Equation of Traffic Flow1, an educated guess about Flow and Speed in both directions along the switched segment can be obtained. Similarly, a planned highway would introduce a connection between two sensor locations. To simulate its impact on the entire traffic network, one could condition the model on estimates of Occupancy, Flow and Speed at sensor locations near the start and end of the planned highway. For instance, Occupancy would likely be reduced at locations along the existing highway, downstream of the new exit.

4.6 Implementation

The model is implemented in Python 3, using the DL library Keras 1.0.2 [299] with a Theano 0.9.0 [48] backend.

1 compare 4.3

In essence, Theano is a Python library for Automatic Differentiation. It has been developed to facilitate the implementation of DL models. Backpropagation (BP)1 involves the computation of error gradients, i.e. gradients of the Cost Function (CF) with respect to all model parameters. Analytical expressions for these gradients need to be derived, which quickly becomes unwieldy for large models. Theano automates this process for arbitrary compositions of piecewise differentiable functions. Specifically, Theano expresses the computation of C(θ) as a directed, acyclic computational graph whose nodes correspond to elementary tensor operations. Each node encapsulates the computation of its output from its inputs, as well as the computation of the gradients of its output with respect to its inputs. This enables the computation of the gradients ∇θC(θ) by Reverse-Mode Differentiation, i.e. the recursive accumulation of node gradients as dictated by the Chain Rule. The computational graph associated with a model is a symbolic representation of a numerical computation. Theano applies a variety of graph optimizations to this data structure, yielding a semantically equivalent but computationally more efficient and potentially more numerically stable graph. Lastly, many expressions are compiled down to low-level C code for speed. A minimal sketch of this mechanism is given below.

Keras is built on top of Theano and provides a straightforward API for stacking various types of Artificial Neural Network (ANN) Layers. However, the model presented in this thesis is non-standard in the sense that outputs of a particular time step are fed as input to the next time step2, which is not natively supported. Furthermore, the GMM OL is not supported in Keras. Hence, it was necessary to rewrite and extend some of the Keras classes. In particular, the following modules were modified:

• backend/theano_backend.py
• engine/training.py
• layers/recurrent.py
• utils/generic_utils.py
• objectives.py
• optimizers.py

All modified files are submitted along with the zipped version of this thesis, in the folder code/python/keras.
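The mechanism can be illustrated with a few lines of Theano. This is a generic sketch of symbolic differentiation, not code from the modified Keras modules:

```python
import numpy as np
import theano
import theano.tensor as T

W = theano.shared(np.random.randn(3, 2), name='W')  # model parameter
x = T.vector('x')                                   # symbolic input
y = T.vector('y')                                   # symbolic target

h = T.tanh(T.dot(x, W))                             # tiny one-layer model
cost = T.mean((h - y) ** 2)                         # scalar cost C(theta)

grad_W = T.grad(cost, wrt=W)                        # reverse-mode gradient
f = theano.function([x, y], [cost, grad_W])         # optimized, compiled graph

c, g = f(np.ones(3), np.zeros(2))                   # numerical evaluation
```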

4.7 Training

Model Setup

The model3 is initialized with 2 Convolutional Layers, 2 LSTM Layers, and 5 mixture components. Table 4.1 lists all, as yet unspecified, model Hyperparameters (HPs).

Hyperparameter                                 Value
Nbr. Time Steps in Look-Back Period b          24
Nbr. Time Steps in Look-Forward Period f + 1   12
Nbr. Feature Detectors in Conv Layers          s = [4, 3], r = [5, 3], w = [4, 2], i = [5, 3]
Nbr. HUs in all LSTM Layers                    128

Table 4.1: Model Hyperparameters

1 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation
2 compare 4.5, Model Architecture
3 compare 4.5, Model Architecture

The number of Time Steps in the Look-Back Period covers current observations of all input variables along with their most recent 2-hour history. The Look-Forward Period represents a 1-hour modeling horizon. Given the sensor counts $n_s = 100$, $n_r = 10$, $n_w = 8$, $n_i = 1$, the number of Feature Detectors results in 487 and 349 HUs for Conv Layers 1 and 2, respectively.1 In this configuration, the model contains 505 967 independent, learnable parameters, namely 373 093 weights, 1 290 biases, and 512 initial hidden and memory states. Corresponding to the 5 mixture components, 5 traffic regimes were identified and used to prepare the orthogonal matrices Q, as described earlier2. Table 4.2 details the considered traffic regimes.

Regime   Description                    Applicable
1        Weekday Morning Rush Hour      Mo–Fr 6:30AM–9:55AM
2        Weekday Day Traffic            Mo–Fr 10:00AM–3:25PM
3        Weekday Afternoon Rush Hour    Mo–Fr 3:30PM–7:55PM
4        Weekend Day Traffic            Sa–Su 10:00AM–9:55PM
5        Night Traffic                  otherwise

Table 4.2: Traffic Regimes

Training Setup

The Data Set is split into a Training Set (TrS), a Validation Set (VaS), and a Test Set (TeS) in proportions [24 : 3 : 3]. Hence, the TrS comprises 2 years of data, while the VaS and the TeS each comprise 3 months of data. As Learning Algorithm (LA), Adam3 in conjunction with Backpropagation Through Time (BPTT)4 is employed. All non-recurrent weight matrices use Glorot Uniform Initialization (GUI)5, while all recurrent weight matrices use Orthogonal Initialization (OI)6; a sketch of both schemes is given after Table 4.3. Table 4.3 lists all, as yet unspecified, LA HPs.

Hyperparameter              Value
Mini Batch size             512
Base Learning Rate η        0.00729
Adam decay parameter β1     0.64110
Adam decay parameter β2     0.91620

Table 4.3: Learning Algorithm Hyperparameters
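For reference, minimal NumPy versions of the two initialization schemes (sketches of the standard formulations, not the Keras internals):

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """GUI: uniform in [-limit, limit] with limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def orthogonal_init(n, rng):
    """OI: orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))    # sign fix for a uniform distribution
```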

Given the large number of model parameters relative to the number of training samples, Regularization is required in order to prevent Overfitting and to obtain good Generalization Performance.7 Table 4.4 lists all, as yet unspecified, Regularization HPs.8

1 compare 4.5, Convolutional Layers
2 compare 4.5, Parametrization of Covariance Matrices
3 compare 3.2.4, Gradient Descent with Backpropagation, Extensions with Adaptive Learning Rate, Adam
4 compare 3.2.4, Gradient Descent with Backpropagation, Backpropagation Through Time
5 compare 3.3.3, Special Initialization Schemes, Glorot Uniform Initialization
6 compare 3.3.3, Special Initialization Schemes, Orthogonal Initialization
7 compare 3.2.5, Generalization Error
8 compare 3.2.5, Methods to Prevent Overfitting, Dropout and Weight Decay

Hyperparameter                      Value
Dropout Conv Layers                 [0, 0]
Dropout LSTM Layers                 [0.25, 0.25]
ℓ1 penalty (all weight matrices)    0
ℓ2 penalty (all weight matrices)    0

Table 4.4: Regularization Hyperparameters

Choice of Hyperparameters

All model HPs shown in Table 4.1, as well as the Mini-Batch (MB) size, were determined by Manual Search (MS)1. MB size can be optimized independently of all other HPs. The larger the MB size, the greater the benefit of GPU parallelism [162, 163], i.e. the more multiply-add operations per second are performed. On the other hand, as MB size increases, the number of parameter updates per Epoch decreases. Since each Epoch requires a fixed number of operations to complete, and a parameter update at a particular iteration on average causes a fixed reduction in error, independent of MB size, these two effects are opposing in terms of average error reduction per second [300]. For this project, the optimal MB size was determined to be such that GPU load is maximized at all times. The remaining LA HPs shown in Table 4.3, i.e. the Base Learning Rate (LR) η and decay parameters β1 and β2, as well as all Regularization HPs shown in Table 4.4, were found by applying Bayesian Hyperparameter Optimization (BHPO)2 to a reduced model. Specifically, BHPO was executed for 15 iterations on a model with only 1 mixture component, in which the LSTM Layers had been replaced by regular recurrent Layers, as used in Elman Networks.3 The Upper Confidence Bound with parameter κ = 2 was used as Acquisition Function. It is hoped that the thus-found HPs are also suitable for the full model, to which applying BHPO would be prohibitively expensive. The optimized HPs were tested on the reduced model with the LSTM Layers swapped back in, resulting in a far better Validation Error (VaE) than all previously tried configurations. This result strengthens the hypothesis that good HPs are valid across similar models. Ideally, BHPO should be applied to the full model, which should be considered in future research.4

1 compare 3.1.8, Types of Hyperparameter Optimization, Manual Search
2 compare 3.1.8, Types of Hyperparameter Optimization, Bayesian Hyperparameter Optimization
3 compare 3.2.3, Directed Architectures, Recurrent Neural Networks, Elman Networks
4 GPU technology geared towards DL is currently progressing at a fantastic rate. NVIDIA recently announced the Tesla V100 with special instructions for tensor operations. This GPU is capable of 100 TFLOPS, an order of magnitude improvement over the GPU used for this project. Once these cards become accessible in the cloud, applying BHPO to the full model should be feasible.

At first glance, it may seem surprising that the optimal Dropout (DO) fraction is zero for both Conv Layers. However, due to extensive Weight Sharing (WS)1 and sparse connectivity, these Layers are already heavily regularized. In fact, the Conv Layers consist of a large number of small replicated Feature Detectors that, by themselves, do not have enough capacity to overfit. Hence, each Unit in a Conv Layer represents an essential feature, akin to an input feature. Dropping any one of these features constitutes a significant information loss. In contrast to fully connected Layers without Weight Sharing, other Units cannot learn to compensate for this information loss, as they receive entirely different input. The fact that ℓ1 Weight Decay (WD)2 acts as a Feature Selector explains why no ℓ1 penalty is appropriate for weight matrices in Conv Layers, or for the non-recurrent weight matrices of the first LSTM Layer. All other weight matrices connect to or from Layers that are relatively small compared to the network input and output. Hence, every attribute of their heavily compressed Distributed Representation (DR)3 is likely essential.

Cost Function

The model is trained using the Negative Log Likelihood (NLL) CF4. Using (3.31), (3.33), (3.24), and (4.5), it can be expressed as

$$\begin{aligned}
C(\boldsymbol{\theta}) &= \frac{1}{m}\sum_{j=1}^{m} L_{nll}(h(\mathbf{x}^j; \boldsymbol{\theta}), \mathbf{y}^j) = -\frac{1}{m}\sum_{j=1}^{m} \log\hat{p}(\mathbf{y}^j|\mathbf{x}^j; \boldsymbol{\theta})\\
&= -\frac{1}{m}\sum_{j=1}^{m} \log\prod_{t=1}^{T} \hat{p}(\mathbf{y}_t^j|\hat{\mathbf{x}}_t^j; \boldsymbol{\theta}) = -\frac{1}{m}\sum_{j=1}^{m}\sum_{t=1}^{T} \log\hat{p}(\mathbf{y}_t^j|\hat{\mathbf{x}}_t^j; \boldsymbol{\theta})\\
&= \frac{1}{m}\sum_{j=1}^{m}\sum_{t=1}^{T} nll_{GMM}(\mathbf{y}_t^j; \hat{\boldsymbol{\varphi}}(\hat{\mathbf{x}}_t^j; \boldsymbol{\theta}))
\end{aligned} \qquad (4.27)$$

where $nll_{GMM}(\mathbf{y}_t^j; \hat{\boldsymbol{\varphi}}(\hat{\mathbf{x}}_t^j; \boldsymbol{\theta}))$ is the NLL of $\mathbf{y}_t^j$ under a GMM parameterized by $\hat{\boldsymbol{\varphi}}(\hat{\mathbf{x}}_t^j; \boldsymbol{\theta})$, m is the number of training samples, and T = f + 1 is the number of time steps in the Look-Forward Period5.

Improving Computational Efficiency

Considering the summands in the last term of (4.27) and dropping the dependency on $\hat{\mathbf{x}}_t^j$, for any particular $\mathbf{y} = \mathbf{y}_t^j$, the following relationship holds

$$nll_{GMM}(\mathbf{y}; \boldsymbol{\theta}) = -\log L_{GMM}(\mathbf{y}; \boldsymbol{\theta}) = -\log\sum_{i=1}^{k} \alpha_i L_N(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) \qquad (4.28)$$

with $\boldsymbol{\theta} = \{\boldsymbol{\alpha}, \boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_1,\dots,\boldsymbol{\Sigma}_k\}$ and $L_{GMM}$ and $L_N$ denoting the Likelihood of $\mathbf{y}$ under a GMM and a Normal Distribution, respectively. The latter is given by

$$L_N(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \phi(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-\frac{n}{2}}\, |\boldsymbol{\Sigma}|^{-\frac{1}{2}}\, e^{-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})} \qquad (4.29)$$

where $\phi(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the density of a Multivariate Normal $\mathbf{y}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Furthermore, using (4.9) and the fact that $\mathbf{Q}$ is orthogonal, it follows that $\boldsymbol{\Sigma}^{-1} = \mathbf{Q}^{-T}\boldsymbol{\Lambda}^{-1}\mathbf{Q}^{-1} = \mathbf{Q}\boldsymbol{\Lambda}^{-1}\mathbf{Q}^T$, and $\log|\boldsymbol{\Sigma}| = \log|\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T| = \log|\mathbf{Q}|^2|\boldsymbol{\Lambda}| = \log|\boldsymbol{\Lambda}| = \mathrm{tr}(\log\boldsymbol{\Lambda})$.

1 compare 3.2.5, Methods to Prevent Overfitting, Weight Sharing
2 compare 3.2.5, Methods to Prevent Overfitting, Weight Decay, L1 Regularization
3 compare 3.2.1, Principle of Distributed Representations
4 compare 3.1.5, Types of Cost Functions, Negative Log Likelihood Cost
5 compare 4.5, Model Input and Output

113 Therefore,

$$\begin{aligned}
nll_N(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) &= -\log\phi(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma})\\
&= \frac{1}{2}\left((\mathbf{y}-\boldsymbol{\mu})^T\mathbf{Q}\boldsymbol{\Lambda}^{-1}\mathbf{Q}^T(\mathbf{y}-\boldsymbol{\mu}) + \mathrm{tr}(\log\boldsymbol{\Lambda})\right) + \frac{n}{2}\log(2\pi)\\
&= \frac{1}{2}\left(g_1(\mathbf{y}, \boldsymbol{\mu}, \boldsymbol{\Lambda}, \mathbf{Q}) + g_2(\boldsymbol{\Lambda})\right) + cnst\\
&= mnll_N(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\lambda}, \mathbf{Q}) + cnst
\end{aligned} \qquad (4.30)$$

where $mnll_N(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\lambda}, \mathbf{Q})$ is the main part of the NLL of $\mathbf{y}$ under a Multivariate Normal Distribution. Using (4.29) and (4.30), (4.28) can be expressed as a function of the main parts of its component NLLs $mnll_N(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)$, $i = 1,\dots,k$:

$$\begin{aligned}
L_{GMM}(\mathbf{y}; \boldsymbol{\theta}) &= \sum_{i=1}^{k} \alpha_i\,\phi(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \sum_{i=1}^{k} \alpha_i\,\phi(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)\\
&= \sum_{i=1}^{k} \alpha_i\, e^{\log\phi(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)} = \sum_{i=1}^{k} \alpha_i\, e^{-nll_N(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)}
\end{aligned} \qquad (4.31)$$

$$nll_{GMM}(\mathbf{y}; \boldsymbol{\theta}) = -\log\left(\sum_{i=1}^{k} \alpha_i\, e^{-mnll_N(\mathbf{y}; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)}\right) + cnst \qquad (4.32)$$

where $\boldsymbol{\theta} = \{\boldsymbol{\alpha}, \boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_k, \boldsymbol{\lambda}_1,\dots,\boldsymbol{\lambda}_k\}$ now reflects the alternative parametrization of the component covariance matrices. Hence, when training with Mini-Batch Stochastic Gradient Descent (MBSGD)1, a naive implementation of (4.27) requires a triple loop over batch samples, time steps and mixture components, where at each iteration matrix multiplications must be performed to compute the function $g_1$ in (4.30). Owing to the diagonal structure of $\boldsymbol{\Lambda}$, it is possible to parallelize this computation into k large matrix multiplies, thus removing the two outer loops and exploiting GPU speedups [162, 163]. Let $\mathbf{nll}_{GMM}$ and $\mathbf{mnll}_N^i$ denote $m \times T$ matrices whose jth-row, tth-column elements are $nll_{GMM}(\mathbf{y}_t^j; \boldsymbol{\theta})$ and $mnll_N(\mathbf{y}_t^j; \boldsymbol{\mu}_i, \boldsymbol{\lambda}_i, \mathbf{Q}_i)$, respectively2; then

$$\mathbf{nll}_{GMM} = -\log\left(\sum_{i=1}^{k} \alpha_i\, e^{-\mathbf{mnll}_N^i}\right) + cnst \qquad (4.33)$$

Evidently, (4.27) can be equivalently computed by summing $\mathbf{nll}_{GMM}$ over columns and averaging over rows. Let further $\mathrm{rshv}(\mathbf{v}, (a, b))$ and $\mathrm{rsht}(\widetilde{\mathbf{T}}, (a, b))$ define reshape operators taking a vector $\mathbf{v}$ or a tensor $\widetilde{\mathbf{T}}$ and reshaping it into an $a \times b$ matrix. Lastly, $\mathrm{rd}(\widetilde{\mathbf{T}})$ defines a reduce operator that sums a matrix or a 3-tensor $\widetilde{\mathbf{T}}$ along its last dimension. Instead of performing a double loop to populate $\mathbf{mnll}_N^i$, the computation can be performed in parallel as follows

$$\mathbf{mnll}_N^i = \frac{1}{2}\left(\mathbf{g}_1^i + \mathbf{g}_2^i\right) \qquad (4.34)$$

$$\mathbf{g}_1^i = \mathrm{rshv}\left(\mathrm{rd}\left(\frac{1}{\mathbf{L}^i} \circ (\mathbf{D}^i\mathbf{Q}_i)^2\right), (m, T)\right) \qquad (4.35)$$

$$\mathbf{g}_2^i = \mathrm{rd}(\log\mathbf{L}^i) \qquad (4.36)$$

1 compare 3.2.4, Gradient Descent with Backpropagation, Extensions with Stochastic Gradient, Mini Batch Stochastic Gradient Descent
2 Other matrix quantities are defined analogously.

$$\mathbf{L}^i = \mathrm{rsht}(\widetilde{\mathbf{L}}^i, (mT, n)) \qquad (4.37)$$

$$\mathbf{D}^i = \mathrm{rsht}(\widetilde{\mathbf{D}}^i, (mT, n)) \qquad (4.38)$$

where $\widetilde{\mathbf{L}}^i$ and $\widetilde{\mathbf{D}}^i$ are 3-dimensional tensors of dimensions $m \times T \times n$, with n the model output dimension, i.e. the number of sensors. Furthermore, $[\widetilde{\mathbf{L}}^i]_{j,t,l} = \lambda_{t,l}^j$ and $[\widetilde{\mathbf{D}}^i]_{j,t,l} = y_{t,l}^j - \mu_{t,l}^j$. The division, outer multiplication, and squaring in (4.35) are performed elementwise. This optimization achieves an order of magnitude speedup in calculating the CF. A similar optimization is performed in order to parallelize the drawing of m samples1 from the model, which is needed during Training and Testing2. A sample from a GMM with multivariate components is an n-element vector that can be obtained according to (4.16). When processing a batch of m training examples naively, m matrix multiplications have to be carried out. These operations can be executed in parallel by replacing the m diagonal matrices $\boldsymbol{\Lambda}$ with a dense matrix $\mathbf{L}$ and introducing elementwise multiplication

$$\hat{\mathbf{Y}} = (\mathbf{L}^{1/2} \circ \mathbf{Z})\,\mathbf{Q} + \mathbf{M} \qquad (4.39)$$

where $\mathbf{Z}$ is an $m \times n$ matrix of Standard Normals, $\mathbf{L}$ and $\mathbf{M}$ are $m \times n$ matrices whose ith-row, jth-column elements contain $\lambda_j^i$ and $\mu_j^i$, and $\hat{\mathbf{Y}}$ is an $m \times n$ matrix whose rows contain the m samples of the GMM. Lastly, obvious optimizations of the LSTM Layers3 are implemented, whereby matrix multiplications involving $\mathbf{W}_f, \mathbf{W}_i, \mathbf{W}_g, \mathbf{W}_o$ and $\mathbf{U}_f, \mathbf{U}_i, \mathbf{U}_g, \mathbf{U}_o$ are performed in parallel via horizontal concatenation, thereby replacing eight small matrix multiplies by two large multiplies. The implementation of the latter two optimizations can be inspected by consulting the files objectives.py and layers/recurrent.py. Minimal sketches of the evaluation of (4.33) and of the gate fusion are given below.
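Two of these optimizations can be sketched compactly in NumPy. Note that the thesis evaluates (4.33) directly; the log-sum-exp stabilization below is an assumed standard safeguard, not a step described above:

```python
import numpy as np

def nll_gmm(mnll, log_alpha):
    """Evaluate (4.33) up to the additive constant via log-sum-exp:
    -log sum_i alpha_i exp(-mnll_i), stable for large mnll values.
    mnll: array (k, m, T); log_alpha: broadcastable against mnll."""
    a = log_alpha - mnll                   # log(alpha_i * exp(-mnll_i))
    a_max = a.max(axis=0, keepdims=True)   # factor out the largest term
    return -(a_max + np.log(np.exp(a - a_max).sum(axis=0, keepdims=True)))[0]

# LSTM gate fusion (sketch): one fused input-to-gate matrix replaces four.
n_in, n_hid = 64, 128
rng = np.random.default_rng(0)
W_fused = rng.standard_normal((n_in, 4 * n_hid))       # [W_f W_i W_g W_o]
x = rng.standard_normal(n_in)
pre_f, pre_i, pre_g, pre_o = np.split(x @ W_fused, 4)  # per-gate pre-activations
```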

Training Run

The model was trained with the above specifications for 100 Epochs.4 During Training, the VaE was computed after each Epoch. The model with the overall lowest VaE was then retained; this final model is referred to as the Trained Model throughout the rest of this thesis. Amazon Web Services (AWS) [275] Cloud Computing resources were leveraged for performance. In particular, the Training run was executed on a p2.xlarge EC2 instance comprising 4 Intel Xeon E5-2686v4 CPUs, one half of an NVIDIA K80 GPU with 2 496 CUDA cores, 61 GB of RAM, and 12 GB of GPU RAM. Training took 21 hours, resulting in a Training Error (TrE) of 194.24 and a VaE of 188.92.5 Figure 4.10 displays a plot of the TrE and VaE over time.

1 compare 4.5, Sampling
2 compare 4.5, Model Architecture
3 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit
4 An Epoch is a full pass over the TrS.
5 Errors represent values of the NLL CF (4.27).

Figure 4.10: Training and Validation Error

4.8 Model Evaluation

Trained Model

Figure 4.11 displays plots of the Feature Detectors, i.e. the Conv Layer weight matrices of the Trained Model. It appears that all Feature Detectors learn different, linearly independent features, confirming that their size is not too large. Only detector L w 0 maps all incoming signals to 0 for its fourth output feature, suggesting that three weather features suffice in the lowest Layer. This zeroing out of signals can be interpreted as a form of learned self-regularization. The Indicator Feature Detector L i 0 contains some large weights in its first two rows, indicating that the time-of-day variables1 are deemed useful. Inspection of rows 8, 9, and 22 reveals that the model has learned to make use of the fact that Saturdays, Sundays and Holidays differ from regular business days. It is hard to say what precise effects these relatively large weights have, but it is clear that their influence will be stronger compared to other weights. Furthermore, the model seems to have understood that all weekdays are similar except Friday, when people tend to leave work early. Overall, these observations indicate that the low-level Feature Detectors in the Conv Layers indeed learn useful features. The presence of large weights in all Detectors of the first Conv Layer shows that all types of input variables, i.e. Sensor, Incident, Weather and Indicator variables, were in fact relevant for learning a model of traffic.

1 compare 4.4, Final Data Set

Figure 4.11: Trained Feature Detectors, Conv Layer 0 (bottom), Conv Layer 1 (top); from left to right: Traffic, Incident, Weather, Indicator

Figure 4.12 shows plots of the weight matrices $\mathbf{W}_g^1$ and $\mathbf{U}_g^1$ of the highest LSTM Layer, along with histograms of the corresponding weight distributions. These plots illustrate that the model indeed learns as expected. Some weights grow fairly large, while most remain small. The diagonal structure in the recurrent weight matrix is due to the Orthogonal Initialization scheme that was used.

Figure 4.12: Trained Weights $\mathbf{W}_g^1$, $\mathbf{U}_g^1$

Lastly, the learned mixture distribution is analyzed. Figure 4.13 displays a plot of the Mixture Component probabilities over time, as seen in the first week of the TeS. It is apparent that the model indeed learned to use the components as intended. Presumably, this is a direct result of hard-coding the Q matrices. The first two days in this week are Saturday and Sunday. During this period, the mixture components corresponding to Weekend Day Traffic (Regime 4) and Night Traffic (Regime 5) have high probabilities. During business days, the probability of the component corresponding to Morning Rush Hour (Regime 1) peaks in the early mornings, while the component modeling Weekday Afternoon Rush Hour (Regime 3) peaks in the afternoon. Only the component for Weekday Day Traffic (Regime 2) is not used; instead, the model shows a spike in the probability of Regime 4. This week is typical for the entire TrS. Therefore, computational resources might be saved by learning a 4-component model omitting Regime 2.

Figure 4.13: Mixture Component Probabilities

Test Results

The Trained Model, referred to as "LSTM", is evaluated on the TeS and compared to a naive Random Walk model, referred to as "Trivial", which always predicts a zero change in Traffic Speed. The Trained Model is also compared to a simpler Recurrent Neural Network (RNN), referred to as "RNN", with just 1 Conv Layer and 1 standard Recurrent Layer, i.e. no LSTM Units are used1. The comparison with the Random Walk model is useful to establish that the Trained Model has value at all. Often, researchers commit the error of training a model to predict levels (instead of changes) of time series that have a Unit Root2. They then overlay predictions and targets, or compute R², to show that their predictions are highly accurate. However, this high accuracy is entirely explained by methodological flaws: these types of time series have to be differenced before training a model. The comparison with the simpler RNN model should confirm whether representational depth3 and long-term memory capabilities4 are really beneficial for learning a model of traffic. Added model complexity should always be paid for by improved performance. Comparisons are performed in two ways, namely in terms of the Mean Absolute Error (MAE) and Mean Squared Error (MSE)5 of predicted Traffic Speed levels (derived from predicted differences), and in terms of Accuracy, Precision and Recall6 of binary Congestion Prediction, where Congestion is defined as Traffic Speed slower than 25 mph.

1 For a fair comparison, the Layer size of the "RNN" model was increased to match the number of parameters of the "LSTM" model. Both models were trained using the same HP settings.
2 Informally, this means values do not change much from time step to time step. For these time series, "excellent" predictions can be obtained by simply predicting the last known value.
3 compare 3.3.1, Types of Depth
4 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit
5 compare 3.1.5, Types of Cost Functions, Mean Absolute Error and Mean Squared Error
6 compare 3.1.9, Performance Metrics

Tables 4.5 and 4.6 display the respective Performance Metrics (PMs) side by side for all three models, broken down by prediction horizon. The best result in each row is highlighted. For MAE, MSE, and Accuracy, the Trained Model outperforms both competitor models. The "RNN" model wins overall in terms of Precision, while the "Trivial" model achieves the best Recall.

         MAE                        MSE
t      Trivial   RNN    LSTM     Trivial    RNN     LSTM
5       1.53     1.54    1.51      6.62     6.54     6.24
10      2.21     2.21    2.15     15.66    15.13    14.13
15      2.71     2.71    2.61     25.38    23.99    22.02
20      3.13     3.10    2.96     35.15    32.40    29.32
25      3.48     3.44    3.26     44.69    40.12    35.85
30      3.80     3.73    3.51     53.87    47.05    41.55
35      4.09     3.99    3.72     62.70    53.23    46.51
40      4.35     4.22    3.90     71.18    58.70    50.78
45      4.60     4.42    4.06     79.38    63.58    54.49
50      4.84     4.61    4.21     87.35    67.98    57.74
55      5.06     4.79    4.34     95.13    71.98    60.60
60      5.28     4.95    4.45    102.76    75.72    63.18
all     3.76     3.64    3.39     56.66    46.37    40.20

Table 4.5: Results MAE and MSE

         ACC                          PRE                          REC
t      Trivial   RNN    LSTM      Trivial   RNN    LSTM      Trivial   RNN    LSTM
5       98.82   98.81   98.83      88.29   89.05   88.92      88.29   87.19   87.89
10      98.26   98.28   98.32      82.79   84.88   84.45      82.80   80.39   81.87
15      97.84   97.91   97.97      78.65   82.07   81.55      78.66   75.07   77.41
20      97.49   97.60   97.70      75.20   79.90   79.39      75.21   70.36   73.60
25      97.19   97.36   97.48      72.25   78.32   77.75      72.27   66.13   70.35
30      96.92   97.14   97.29      69.53   76.83   76.35      69.55   62.22   67.33
35      96.65   96.94   97.13      66.92   75.47   75.23      66.94   58.58   64.56
40      96.42   96.78   97.00      64.58   74.42   74.29      64.60   55.24   62.15
45      96.19   96.62   96.87      62.33   73.36   73.33      62.35   52.11   59.85
50      95.96   96.48   96.75      60.09   72.31   72.35      60.11   49.28   57.72
55      95.75   96.35   96.65      57.98   71.33   71.59      58.00   46.70   55.92
60      95.54   96.25   96.56      55.95   70.40   70.83      55.96   44.49   54.24
all     96.92   97.21   97.38      69.55   77.36   77.17      69.56   62.31   67.74

Table 4.6: Results ACC: Accuracy [%], PRE: Precision [%], REC: Recall [%]

In order to disambiguate the above results, pairwise comparisons are performed using Statistical Tests. For MAE and MSE, a Paired Two-Sample T-Test is performed, while the comparison of Congestion Classification capabilities is done using a McNemar Test1. Tables 4.7 and 4.8 show a comparison between the "LSTM" and the "Trivial" model. Tables 4.9 and 4.10 compare "LSTM" with "RNN". Results that are significant at a high confidence level are highlighted. These results show that the Trained Model significantly outperforms both competitor models in all categories. This is not a contradiction of the results in Table 4.6: even though the Trained Model is outperformed in terms of Recall and Precision by the simpler models, it is the better model overall. A minimal sketch of the McNemar statistic used here is given after Table 4.10.

1 compare 3.1.9, Comparing Models

         MAE                                            MSE
t      diff   r. adv   p. diff   t-stat    p-val      diff    r. adv   p. diff    t-stat    p-val
5      0.02    1.31     0.02      75.40   <10e-6       0.38     5.74     0.39     117.60   <10e-6
10     0.06    2.71     0.06      99.22   <10e-6       1.53     9.77     1.52     142.07   <10e-6
15     0.10    3.69     0.11     121.02   <10e-6       3.36    13.24     3.36     161.04   <10e-6
20     0.17    5.43     0.16     140.88   <10e-6       5.83    16.59     5.83     177.48   <10e-6
25     0.22    6.32     0.22     158.41   <10e-6       8.84    19.78     8.84     192.44   <10e-6
30     0.29    7.63     0.29     174.90   <10e-6      12.32    22.87    12.32     206.50   <10e-6
35     0.37    9.05     0.37     191.37   <10e-6      16.19    25.82    16.19     219.92   <10e-6
40     0.45   10.34     0.45     206.76   <10e-6      20.40    28.66    20.40     232.73   <10e-6
45     0.54   11.74     0.54     221.88   <10e-6      24.89    31.36    24.89     244.90   <10e-6
50     0.63   13.02     0.63     236.28   <10e-6      29.61    33.90    29.62     256.68   <10e-6
55     0.72   14.23     0.73     250.80   <10e-6      34.53    36.30    34.53     268.16   <10e-6
60     0.83   15.72     0.83     264.69   <10e-6      39.58    38.52    39.58     279.31   <10e-6
all    0.37    9.76     0.37     641.05   <10e-6      16.46    29.04    16.46     691.71   <10e-6

Table 4.7: Model Comparison LSTM vs. Trivial - Paired T-Test - diff: difference, r. adv: relative advantage [%], p. diff: paired difference, t-stat: t-statistic, p-val: p-value

t      mr∧cr        mw∧cw     mr∧cw     mw∧cr     diff      r. adv    chi sqr     p-val
5      2 475 841    26 039     3 688     3 232       456     14.11       30.05   <10e-6
10     2 458 893    35 916     7 768     6 223     1 545     24.83      170.61   <10e-6
15     2 445 831    42 115    12 069     8 785     3 284     37.38      517.15   <10e-6
20     2 434 667    46 557    16 379    11 197     5 182     46.28      973.79   <10e-6
25     2 425 073    49 866    20 555    13 306     7 249     54.48    1 551.87   <10e-6
30     2 416 150    52 600    24 717    15 333     9 384     61.20    2 198.74   <10e-6
35     2 407 956    55 053    28 894    16 897    11 997     71.00    3 143.15   <10e-6
40     2 400 631    57 052    32 836    18 281    14 555     79.62    4 144.38   <10e-6
45     2 393 719    59 087    36 514    19 480    17 034     87.44    5 181.93   <10e-6
50     2 387 056    61 183    40 093    20 468    19 625     95.88    6 359.55   <10e-6
55     2 380 873    62 817    43 822    21 288    22 534    105.85    7 798.82   <10e-6
60     2 375 038    64 453    47 351    21 958    25 393    115.64    9 303.33   <10e-6
all   29 001 728   612 738   314 686   176 448   138 238     78.34   38 909.43   <10e-6

Table 4.8: Model Comparison LSTM vs. Trivial - McNemar Test - mr: Model right, mw: Model wrong, cr: Competing Model right, cw: Competing Model wrong, diff: difference, r. adv: relative advantage [%], chi sqr: Chi-Squared statistic, p-val: p-value

         MAE                                            MSE
t      diff   r. adv   p. diff   t-stat    p-val      diff    r. adv   p. diff    t-stat    p-val
5      0.03    1.95     0.03     106.03   <10e-6       0.30     4.59     0.30     121.56   <10e-6
10     0.06    2.71     0.06     123.81   <10e-6       1.00     6.61     1.00     132.16   <10e-6
15     0.10    3.69     0.10     140.21   <10e-6       1.97     8.21     1.96     139.81   <10e-6
20     0.14    4.52     0.14     154.13   <10e-6       3.08     9.51     3.08     145.86   <10e-6
25     0.18    5.23     0.18     165.99   <10e-6       4.27    10.64     4.28     151.10   <10e-6
30     0.22    5.90     0.23     177.96   <10e-6       5.50    11.69     5.51     156.09   <10e-6
35     0.27    6.77     0.27     189.07   <10e-6       6.72    12.62     6.73     160.67   <10e-6
40     0.32    7.58     0.32     199.60   <10e-6       7.92    13.49     7.93     164.57   <10e-6
45     0.36    8.14     0.36     209.49   <10e-6       9.09    14.30     9.10     167.85   <10e-6
50     0.40    8.68     0.41     218.36   <10e-6      10.24    15.06    10.25     170.80   <10e-6
55     0.45    9.39     0.45     226.73   <10e-6      11.38    15.81    11.39     173.59   <10e-6
60     0.50   10.10     0.50     234.36   <10e-6      12.54    16.56    12.54     176.38   <10e-6
all    0.25    6.93     0.25     619.80   <10e-6       6.17    13.30     6.17     490.52   <10e-6

Table 4.9: Model Comparison LSTM vs. RNN - Paired T-Test - diff: difference, r. adv: relative advantage [%], p. diff: paired difference, t-stat: t-statistic, p-val: p-value

t      mr∧cr        mw∧cw     mr∧cw     mw∧cr     diff     r. adv    chi sqr     p-val
5      2 476 129    26 463     3 400     2 808      529     21.08       56.45   <10e-6
10     2 460 409    36 803     6 269     5 319      950     17.86       77.88   <10e-6
15     2 448 721    43 268     9 189     7 622    1 567     20.56      146.06   <10e-6
20     2 439 074    48 115    11 974     9 637    2 337     24.25      252.72   <10e-6
25     2 431 116    51 678    14 548    11 458    3 090     26.97      367.15   <10e-6
30     2 423 858    54 749    17 006    13 187    3 819     28.96      483.05   <10e-6
35     2 417 425    57 336    19 385    14 654    4 731     32.28      657.55   <10e-6
40     2 411 926    59 325    21 571    15 978    5 593     35.00      833.09   <10e-6
45     2 406 738    61 260    23 530    17 272    6 258     36.23      959.82   <10e-6
50     2 401 981    63 143    25 173    18 503    6 670     36.05    1 018.61   <10e-6
55     2 397 837    64 616    26 841    19 506    7 335     37.60    1 160.86   <10e-6
60     2 394 161    65 867    28 308    20 464    7 844     38.33    1 261.55   <10e-6
all   29 109 375   632 623   207 194   156 408   50 786     32.47    7 093.52   <10e-6

Table 4.10: Model Comparison LSTM vs. RNN - McNemar Test - mr: Model right, mw: Model wrong, cr: Competing Model right, cw: Competing Model wrong, diff: difference, r. adv: relative advantage [%], chi sqr: Chi-Squared statistic, p-val: p-value
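For reference, the McNemar statistic in Tables 4.8 and 4.10 can be reproduced from the discordant counts; the reported values match the version without continuity correction (a sketch using SciPy for the p-value):

```python
from scipy.stats import chi2

def mcnemar(mr_cw, mw_cr):
    """Chi-squared statistic (b - c)^2 / (b + c), 1 degree of freedom, where
    b = Model right & Competing Model wrong and c = the reverse case."""
    stat = (mr_cw - mw_cr) ** 2 / (mr_cw + mw_cr)
    return stat, chi2.sf(stat, df=1)

stat, p = mcnemar(3688, 3232)      # t = 5 row of Table 4.8
print(round(stat, 2), p < 1e-6)    # 30.05 True
```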

The above results suggest that the Trained Model has explanatory value and that all its complexity is in fact justified. In particular, it largely outperforms a competitor model lacking representational depth and long-term memory capabilities, showing that these features are essential for modeling traffic.

4.9 Possible Improvements and Future Research

The model discussed in this thesis represents an iteration at an early stage of model development. Many modifications are conceivable that would likely improve performance. For instance, instead of using a GMM as OL, a less restrictive, more powerful density estimator could be employed. While the GMM can asymptotically model any density to arbitrary accuracy [92], many components may be necessary, which increases the number of model outputs. A real-valued RBM1 could be used as OL. However, this would render Training more difficult, since a combination of GD2 and Contrastive Divergence (CD)3 would have to be employed [288]. A Real-Valued Neural Autoregressive Distribution Estimator (RNADE) [301], which is the real-valued analog of a NADE [302], a tractable density estimator performing comparably to an RBM, should be tried instead. The resulting architecture would be trainable end-to-end by GD with BP. Lastly, it should be investigated whether the model can benefit from incorporating a Generative Adversarial Network (GAN)4 [94], an architecture that has been shown to act as a powerful conditional density estimator [241, 242]. Model Performance, as reported earlier5, would likely improve further if predictions were derived from large Monte Carlo samples drawn from the Trained Model.6 Alternatively, the model's OL could be replaced with a Layer of Linear Units (LUs) or Sigmoid Units (SUs)7 to directly model Traffic Speed or Congestion conditions. It would be interesting to determine what degree of prediction accuracy can be gained by switching to these special-purpose model instances.

1 compare 3.2.3, Undirected Models, Restricted Boltzmann Machine
2 compare 3.2.4, Gradient Descent with Backpropagation
3 compare 3.2.4, Gradient Descent with Contrastive Divergence
4 compare 3.3.4, Generative Adversarial Networks
5 compare 4.8, Test Results
6 compare 4.5, Sampling and Use Cases
7 compare 3.2.2, Types of Activation Functions, Linear Activation and Sigmoid Activation

Furthermore, future research should be undertaken to compare model predictions directly with those produced by classical models1. While it is expected that, owing to its less restrictive assumptions, the DL approach decisively outperforms classical models, this advantage should be rigorously quantified. However, generating large-scale Test results using classical models is computationally expensive, since all their knowledge is encoded in local rules, i.e. differential equations would need to be solved over the entire traffic network for every single sample in the TeS. In order to further substantiate the validity of the model, future research should investigate ways of visualizing the learned model dynamics. This would provide a means of confirming whether the model understands how congestion shock waves propagate through the traffic network. It is expected that visualization would reveal that DL indeed learns to infer the local rules encoded in the PDEs of classical models. Visualization could be accomplished by plotting predictions on a GIS2-based map platform [287].

4.10 Other Applications in Civil Engineering

Historically, ANNs have been used in various domains of Civil Engineering, most notably Hydraulic Engineering and Structural Engineering. For instance, in Hydraulic Engineering, ANNs have been applied to Runoff Prediction [303, 304] and Tidal Forecasting [305]. Runoff Predictions of peak flow, time of peak, base flow, etc. serve as input for the design of hydraulic structures, while Tidal Forecasting is relevant for structure installation in marine environments. In these areas, ANNs are often a cost-effective alternative to mathematical models requiring extensive calibration. In Structural Engineering, ANNs have, among others, been applied to Wall Deflection Estimation [306] and Pile Capacity Estimation [307]. These use cases establish ANNs as a time-saving alternative to the Finite Element Method (FEM). Pile Capacity Estimation is a particularly difficult problem due to the large number of parameters involved and their complex relationships. DL could augment the above-mentioned ANN models in both Hydraulic and Structural Engineering with Depth in Representation3, potentially resulting in increased forecasting accuracy. These problem settings are special cases of the more general problem of inferring a function from sensor data. With the increasing number of sensors coming online4, DL could, in principle, be applied to any particular instance of this general problem. Another application for DL in Civil Engineering is the inference of very accurate and cheap-to-compute approximation functions that can be used as subroutines in higher-level simulation or optimization procedures. Assume a bridge is being optimized, e.g. the cost of the bridge is minimized with respect to a set of parameters, such as overall height or the dimensions of beams, under a set of constraints, e.g. maximum forces in particular joints given different load scenarios. A detailed, parameterized computer model of the bridge can be implemented, such that different load scenarios can be simulated. Each scenario evaluation is expensive, i.e. in order to determine the forces acting on the joints under a particular load scenario, FEM has to be performed.

1 compare 4.2
2 Geographic Information System
3 compare 3.3.1, Types of Depth
4 compare 3.4

A TrS of input-target pairs can now be generated, where the inputs are bridge and load scenario parameters, and the targets are FEM-derived joint forces. This TrS effectively contains samples from a complicated function joint_forces = f(bridge_params, load_params). While generation of this TrS is computationally expensive, a DL model can infer from it a highly accurate interpolation function fapprox. The thus-derived function fapprox is cheaper to evaluate, since it only involves a few matrix multiplies, as opposed to solving a system of PDEs over a large region. Moreover, it is likely far more accurate than approximation functions derived using simplified static models of the bridge. Hence, fapprox can be evaluated billions of times in the inner loop of any arbitrary optimization or simulation procedure, without sacrificing much accuracy. A minimal sketch of such a surrogate is given below.
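The sketch below uses Keras with hypothetical Layer sizes; X and Y stand for the FEM-generated parameter and joint-force arrays and are not part of this thesis:

```python
from keras.models import Sequential
from keras.layers import Dense

def build_f_approx(n_params, n_forces):
    """Small feedforward surrogate for joint_forces = f(bridge, load params)."""
    model = Sequential()
    model.add(Dense(256, activation='relu', input_dim=n_params))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(n_forces))           # linear outputs for the joint forces
    model.compile(loss='mse', optimizer='adam')
    return model

# model = build_f_approx(X.shape[1], Y.shape[1])
# model.fit(X, Y, nb_epoch=100, batch_size=128)  # Keras 1.x argument names
```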

4.11 Conclusion

It has been conclusively demonstrated1 that a sophisticated DL model2 can produce significantly better Traffic Speed and Congestion predictions than more elementary competitor models. This outperformance is achieved using only naive predictions from a probabilistic general-purpose model, without any task-specific fine-tuning. The comparison with alternative models provides strong indications that the inherent advantages of deep architectures3, as well as the application of methods facilitating long-term memory4 in temporal models, are in fact beneficial for TM. In light of the various costs associated with Congestion5, the model presented in this thesis carries substantial economic and environmental value for the Los Angeles Metro Area and any other sensor-equipped location for which it could be replicated. These results confirm that DL can successfully infer high-quality models from vast Data Sets, thus further substantiating its role as a well-suited approach to the most difficult BD problems, and as a method holding the potential to confer considerable societal benefit.

1 compare 4.8, Test Results
2 compare 4.5, Model Architecture
3 compare 3.3.1, Principle of Deep Compositions
4 compare 3.3.3, Special Types of Units, Long Short-Term Memory Unit
5 compare 4.1, Problem Description

References

Books
[1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016.
[4] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[5] Stuart Jonathan Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd edition). Prentice Hall, 2009. ISBN: 978-0136042594.
[6] Richard Bellman. An Introduction to Artificial Intelligence: Can Computers Think? Thomson Course Technology, 1978.
[7] P. Winston. Learning by Building Identification Trees. Boston: Addison-Wesley Publishing Company, 1992.
[8] David Lynton Poole, Alan K. Mackworth, and Randy Goebel. Computational Intelligence: A Logical Approach. Vol. 1. Oxford University Press, New York, 1998.
[10] Rajendra Akerkar and Priti Sajja. Knowledge-Based Systems. Jones & Bartlett Publishers, 2010.
[11] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Series in Representation and Reasoning. San Mateo, CA: Morgan Kaufmann, 1988.
[13] Jiawei Han, Jian Pei, and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier, 2011.
[14] William W. Lytton. From Computer to Brain: Foundations of Computational Neuroscience. Springer Science & Business Media, 2007.
[25] Peter Jackson. Introduction to Expert Systems. Addison-Wesley Pub. Co., Reading, MA, 1986.
[37] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[38] Michael Irwin Jordan. Learning in Graphical Models. Vol. 89. Springer Science & Business Media, 1998.
[51] Ryszard S. Michalski, Jaime G. Carbonell, and Tom M. Mitchell. Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
[52] Vladimir Naumovich Vapnik. Statistical Learning Theory. Vol. 1. Wiley, New York, 1998.
[60] Craig K. Enders. Applied Missing Data Analysis. Guilford Press, 2010.
[61] Joel Grus. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., 2015.
[63] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. Springer, Berlin, 2001.
[66] Erich Leo Lehmann and George Casella. Theory of Point Estimation. Springer Science & Business Media, 2006.
[82] Graham Elliott and Allan Timmermann. Economic Forecasting. Princeton University Press, 2016.
[83] Roger Koenker. Quantile Regression. Vol. 38. Cambridge University Press, 2005.
[84] Peter Flach. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, 2012.
[85] Brian S. Everitt. The Cambridge Dictionary of Statistics. Cambridge University Press, 2006.
[86] Sandra Arlinghaus. Practical Handbook of Curve Fitting. CRC Press, 1994.
[88] David G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1969.
[89] David A. Freedman. Statistical Models: Theory and Practice. Cambridge University Press, 2009.

[92] Geoffrey J. McLachlan and Kaye E. Basford. Mixture Models: Inference and Applications to Clustering. Vol. 84. Marcel Dekker, 1988.
[93] Vladimir Naumovich Vapnik and Samuel Kotz. Estimation of Dependences Based on Empirical Data. Vol. 40. Springer-Verlag, New York, 1982.
[96] Russell B. Millar. Maximum Likelihood Estimation and Inference: With Examples in R, SAS and ADMB. Vol. 111. John Wiley & Sons, 2011.
[105] Jonas Mockus. Bayesian Approach to Global Optimization: Theory and Applications. Vol. 37. Springer Science & Business Media, 2012.
[116] Jay L. Devore and Kenneth N. Berk. Modern Mathematical Statistics with Applications. Cengage Learning, 2007.
[118] Ronald Aylmer Fisher. Statistical Methods for Research Workers. Genesis Publishing Pvt Ltd, 1925.
[121] Simon Haykin. Neural Networks: A Comprehensive Foundation. Vol. 2. 2004.
[124] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
[136] Simon S. Haykin et al. Neural Networks and Learning Machines. Vol. 3. Pearson, Upper Saddle River, NJ, 2009.
[153] J. Willard Gibbs. Elementary Principles in Statistical Mechanics. Courier Corporation, 2014.
[161] Jan Snyman. Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Vol. 97. Springer Science & Business Media, 2005.
[182] John H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992.
[185] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[200] Andrew Gelman et al. Bayesian Data Analysis. Vol. 2. Chapman & Hall/CRC, Boca Raton, FL, 2014.
[201] Radford M. Neal. Bayesian Learning for Neural Networks. Vol. 118. Springer Science & Business Media, 2012.
[264] Tony Hey, Stewart Tansley, Kristin M. Tolle, et al. The Fourth Paradigm: Data-Intensive Scientific Discovery. Vol. 1. Microsoft Research, Redmond, WA, 2009.
[291] Wolfram Research, Inc. Mathematica, Version 10.1. Wolfram Research, Inc., 2015.

Articles
[2] Yoshua Bengio et al. "Learning deep architectures for AI". In: Foundations and Trends in Machine Learning 2.1 (2009), pp. 1–127.
[3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013), pp. 1798–1828.
[9] David Evans. "Introduction to Computing: Explorations in Language, Logic, and Machines". 2009.
[12] Ben Goertzel. "Human-level artificial general intelligence and the possibility of a technological singularity: A reaction to Ray Kurzweil's The Singularity Is Near, and McDermott's critique of Kurzweil". In: Artificial Intelligence 171.18 (2007), pp. 1161–1173.
[15] Jürgen Schmidhuber. "Deep learning in neural networks: An overview". In: Neural Networks 61 (2015), pp. 85–117.

[16] Warren S McCulloch and Walter Pitts. “A logical calculus of the ideas immanent in nervous activity”. In: The Bulletin of Mathematical Biophysics 5.4 (1943), pp. 115–133.
[17] DO Hebb. “The Organization of Behavior”. Wiley, New York, 1949.
[18] Frank Rosenblatt. “The perceptron: A probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6 (1958), p. 386.
[19] F Rosenblatt. “Principles of Neurodynamics”. Spartan, New York, 1962.
[20] Bernard Widrow, Marcian E Hoff, et al. “Adaptive switching circuits”. In: IRE WESCON Convention Record. Vol. 4. 1. New York, 1960, pp. 96–104.
[22] Alexey Grigorevich Ivakhnenko. “Polynomial theory of complex systems”. In: IEEE Transactions on Systems, Man, and Cybernetics 1.4 (1971), pp. 364–378.
[23] ML Minsky and SA Papert. “Perceptrons”. The MIT Press, Cambridge, Mass., 1969 (Rev. Edition, 1988).
[24] J Lighthill et al. “Artificial Intelligence: A Paper Symposium”. In: Science Research Council, London (1973).
[26] Geoffrey E Hinton and Terrence J Sejnowski. “Learning and relearning in Boltzmann machines”. In: Parallel Distributed Processing 1 (1986).
[27] George Cybenko. “Approximation by superpositions of a sigmoidal function”. In: Mathematics of Control, Signals, and Systems (MCSS) 2.4 (1989), pp. 303–314.
[28] Kurt Hornik. “Approximation Capabilities of Multilayer Feedforward Networks”. In: Neural Networks 4.2 (1991), pp. 251–257.
[29] Kunihiko Fukushima. “Cognitron: A self-organizing multilayered neural network”. In: Biological Cybernetics 20.3-4 (1975), pp. 121–136.
[30] Kunihiko Fukushima. “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition”. In: Biological Cybernetics 36 (1980), pp. 192–202.
[31] Paul J Werbos. “Applications of advances in nonlinear sensitivity analysis”. In: System Modeling and Optimization. Springer, 1982, pp. 762–770.
[32] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Learning Internal Representations by Error Propagation”. In: Parallel Distributed Processing 1 (1986), pp. 318–362.
[33] Yann LeCun et al. “Backpropagation applied to handwritten zip code recognition”. In: Neural Computation 1.4 (1989), pp. 541–551.
[34] John J Hopfield. “Neural networks and physical systems with emergent collective computational abilities”. In: Proceedings of the National Academy of Sciences 79.8 (1982), pp. 2554–2558.
[36] Philip Leith. “The rise and fall of the legal expert system”. In: European Journal of Law and Technology 1.1 (2010), pp. 179–201.
[39] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural Computation 9.8 (1997), pp. 1735–1780.
[40] Jürgen Schmidhuber. “Learning complex, extended sequences using the principle of history compression”. In: Neural Computation 4.2 (1992), pp. 234–242.
[41] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets”. In: Neural Computation 18.7 (2006), pp. 1527–1554.
[42] Johannes Stallkamp et al. “The German traffic sign recognition benchmark: a multi-class classification competition”. In: Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE. 2011, pp. 1453–1460.
[43] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet classification with deep convolutional neural networks”. In: Advances in Neural Information Processing Systems. 2012, pp. 1097–1105.
[44] Olga Russakovsky et al. “ImageNet large scale visual recognition challenge”. In: International Journal of Computer Vision 115.3 (2015), pp. 211–252.
[45] Abdel-rahman Mohamed and Geoffrey Hinton. “Phone recognition using restricted Boltzmann machines”. In: Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE. 2010, pp. 4354–4357.
[46] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. “Speech recognition with deep recurrent neural networks”. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE. 2013, pp. 6645–6649.
[48] The Theano Development Team et al. “Theano: A Python framework for fast computation of mathematical expressions”. In: arXiv preprint arXiv:1605.02688 (2016).
[49] Martín Abadi et al. “TensorFlow: Large-scale machine learning on heterogeneous distributed systems”. In: arXiv preprint arXiv:1603.04467 (2016).
[50] R. Collobert, K. Kavukcuoglu, and C. Farabet. “Torch7: A Matlab-like Environment for Machine Learning”. In: BigLearn, NIPS Workshop. 2011.
[53] Dumitru Erhan et al. “Why does unsupervised pre-training help deep learning?” In: Journal of Machine Learning Research 11.Feb (2010), pp. 625–660.
[54] Xiaojin Zhu. “Semi-supervised learning literature survey”. Technical Report, University of Wisconsin–Madison, 2005.
[55] Volodymyr Mnih et al. “Playing Atari with deep reinforcement learning”. In: arXiv preprint arXiv:1312.5602 (2013).
[56] Stanley Smith Stevens. “On the Theory of Scales of Measurement”. In: Science 103 (1946), pp. 677–680.
[57] SB Kotsiantis, D Kanellopoulos, and PE Pintelas. “Data Preprocessing for Supervised Learning”. In: International Journal of Computer Science 1.2 (2006), pp. 111–117.
[58] J. Ross Quinlan. “Induction of Decision Trees”. In: Machine Learning 1.1 (1986), pp. 81–106.
[59] Victoria J Hodge and Jim Austin. “A survey of outlier detection methodologies”. In: Artificial Intelligence Review 22.2 (2004), pp. 85–126.
[62] K Pearson. “On Lines and Planes of Closest Fit to Systems of Points in Space”. In: Philosophical Magazine 2.11 (1901), pp. 559–572.
[64] Olivier Bousquet and André Elisseeff. “Stability and generalization”. In: Journal of Machine Learning Research 2.Mar (2002), pp. 499–526.
[65] Vladimir N Vapnik. “An overview of statistical learning theory”. In: IEEE Transactions on Neural Networks 10.5 (1999), pp. 988–999.
[67] Vladimir N Vapnik and A Ja Chervonenkis. “The necessary and sufficient conditions for consistency of the method of empirical risk”. In: Pattern Recognition and Image Analysis 1.3 (1991), pp. 284–305.
[68] Vladimir Vapnik. “Principles of risk minimization for learning theory”. In: NIPS. 1991, pp. 831–838.
[69] Corinna Cortes and Vladimir Vapnik. “Support-vector networks”. In: Machine Learning 20.3 (1995), pp. 273–297.
[70] Wolfgang Maass. “Vapnik-Chervonenkis dimension of neural nets”. In: The Handbook of Brain Theory and Neural Networks (1995), pp. 1000–1003.
[71] Clive WJ Granger. “Outline of Forecast Theory using Generalized Cost Functions”. In: Spanish Economic Review 1.2 (1999), pp. 161–173.
[72] Lorenzo Rosasco et al. “Are Loss Functions All the Same?” In: Neural Computation 16.5 (2004), pp. 1063–1076.
[74] Peter J Huber et al. “Robust Estimation of a Location Parameter”. In: The Annals of Mathematical Statistics 35.1 (1964), pp. 73–101.
[75] Vladimir Vapnik, Steven E Golowich, Alex Smola, et al. “Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing”. In: Advances in Neural Information Processing Systems (1997), pp. 281–287.
[76] S Lee and Alessandro Verri. “Pattern Recognition with Support Vector Machines”. In: SVM 2002 (2002).
[77] Tan Nguyen and Scott Sanner. “Algorithms for Direct 0–1 Loss Optimization in Binary Classification.” In: ICML (3). 2013, pp. 1085–1093.
[78] Andreas Buja, Werner Stuetzle, and Yi Shen. “Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications”. Working draft, November 2005.
[79] Charles Elkan. “Maximum likelihood, Logistic Regression, and Stochastic Gradient Training”. 2013.
[80] Rob J Hyndman and Anne B Koehler. “Another Look at Measures of Forecast Accuracy”. In: International Journal of Forecasting 22.4 (2006), pp. 679–688.
[81] Sergio Verdu and H Poor. “On Minimax Robustness: A General Approach and Applications”. In: IEEE Transactions on Information Theory 30.2 (1984), pp. 328–340.
[87] Weixin Yao and Longhai Li. “A new regression model: modal linear regression”. In: Scandinavian Journal of Statistics 41.3 (2014), pp. 656–671.
[90] Laurent Bordes, Stéphane Mottelet, Pierre Vandekerkhove, et al. “Semiparametric Estimation of a Two-Component Mixture Model”. In: The Annals of Statistics 34.3 (2006), pp. 1204–1232.
[94] Ian Goodfellow et al. “Generative Adversarial Nets”. In: Advances in Neural Information Processing Systems. 2014, pp. 2672–2680.
[95] Nicolas Le Roux and Yoshua Bengio. “Representational Power of Restricted Boltzmann Machines and Deep Belief Networks”. In: Neural Computation 20.6 (2008), pp. 1631–1649.
[97] JM Bernardo et al. “Generative or Discriminative? Getting the Best of Both Worlds”. In: Bayesian Statistics 8 (2007), pp. 3–24.
[98] Andrew Y Ng and Michael I Jordan. “On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes”. In: Advances in Neural Information Processing Systems 2 (2002), pp. 841–848.
[99] David J Hand and Keming Yu. “Idiot’s Bayes–Not So Stupid After All?” In: International Statistical Review 69.3 (2001), pp. 385–398.
[100] James Bergstra and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization”. In: Journal of Machine Learning Research 13.Feb (2012), pp. 281–305.
[101] Yann A LeCun et al. “Efficient BackProp”. In: Neural Networks: Tricks of the Trade. Springer, 1998, pp. 9–48.
[102] Hugo Larochelle et al. “An Empirical Evaluation of Deep Architectures on Problems with many Factors of Variation”. In: Proceedings of the 24th International Conference on Machine Learning. ACM. 2007, pp. 473–480.
[103] Richard Bellman. “Dynamic Programming”. Princeton University Press, 1957.
[104] J Močkus. “On Bayesian methods for seeking the extremum”. In: Optimization Techniques IFIP Technical Conference. Springer. 1975, pp. 400–404.
[106] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical Bayesian Optimization of Machine Learning Algorithms”. In: Advances in Neural Information Processing Systems. 2012, pp. 2951–2959.
[107] James S Bergstra et al. “Algorithms for Hyper-Parameter Optimization”. In: Advances in Neural Information Processing Systems. 2011, pp. 2546–2554.
[108] Christopher KI Williams and Carl Edward Rasmussen. “Gaussian Processes for Machine Learning”. The MIT Press, 2006.
[109] Harold J Kushner. “A new Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise”. In: Journal of Basic Engineering 86.1 (1964), pp. 97–106.
[110] Niranjan Srinivas et al. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design”. In: arXiv preprint arXiv:0912.3995 (2009).
[111] Yoshua Bengio. “Gradient-Based Optimization of Hyperparameters”. In: Neural Computation 12.8 (2000), pp. 1889–1900.
[112] Justin Domke. “Generic Methods for Optimization-Based Modeling”. In: AISTATS. Vol. 22. 2012, pp. 318–326.
[113] Dougal Maclaurin, David K Duvenaud, and Ryan P Adams. “Gradient-based Hyperparameter Optimization through Reversible Learning.” In: ICML. 2015, pp. 2113–2122.
[115] Steven L Salzberg. “On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach”. In: Data Mining and Knowledge Discovery 1.3 (1997), pp. 317–328.
[117] Quinn McNemar. “Note on the Sampling Error of the Difference between Correlated Proportions or Percentages”. In: Psychometrika 12.2 (1947), pp. 153–157.
[120] Andrew Trask, David Gilmore, and Matthew Russell. “Modeling Order in Neural Word Embeddings at Scale.” In: ICML. 2015, pp. 2266–2275.
[122] G Cybenko. “Approximation by Superposition of Sigmoidal Functions”. In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[123] Hava T Siegelmann and Eduardo D Sontag. “Turing Computability with Neural Nets”. In: Applied Mathematics Letters 4.6 (1991), pp. 77–80.
[125] Hava T Siegelmann and Eduardo D Sontag. “Analog Computation via Neural Networks”. In: Theoretical Computer Science 131.2 (1994), pp. 331–360.
[126] José L Balcázar, Ricard Gavaldà, and Hava T Siegelmann. “Computational Power of Neural Networks: A Characterization in Terms of Kolmogorov Complexity”. In: IEEE Transactions on Information Theory 43.4 (1997), pp. 1175–1183.
[127] Quoc V Le. “Building high-level Features using large scale Unsupervised Learning”. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE. 2013, pp. 8595–8598.
[128] Tameru Hailesilassie. “Rule Extraction Algorithm for Deep Neural Networks: A Review”. In: arXiv preprint arXiv:1610.05267 (2016).
[130] Razvan Pascanu, Guido Montufar, and Yoshua Bengio. “On The Number of Response Regions of Deep Feed Forward Networks with Piece-Wise Linear Activations”. In: arXiv preprint arXiv:1312.6098 (2013).
[131] Eduardo D Sontag and Héctor J Sussmann. “Backpropagation can give Rise to Spurious Local Minima even for Networks without Hidden Layers”. In: Complex Systems 3.1 (1989), pp. 91–106.
[133] Richard Durbin and David E Rumelhart. “Product Units: A Computationally Powerful and Biologically Plausible Extension to Backpropagation Networks”. In: Neural Computation 1.1 (1989), pp. 133–142.
[134] Russell Eberhart and James Kennedy. “Particle Swarm Optimization”. In: Proceedings of IEEE International Conference on Neural Networks. IV. IEEE. 1995, pp. 1942–1948.
[135] Xavier Glorot and Yoshua Bengio. “Understanding the Difficulty of Training Deep Feedforward Neural Networks.” In: AISTATS. Vol. 9. 2010, pp. 249–256.
[137] William H Greene. “Econometric Analysis”. 7th edition. Boston: Pearson Education, 2012.
[138] Pierre Baldi. “Autoencoders, Unsupervised Learning, and Deep Architectures.” In: ICML Unsupervised and Transfer Learning 27.37-50 (2012), p. 1.
[139] Pascal Vincent et al. “Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion”. In: Journal of Machine Learning Research 11.Dec (2010), pp. 3371–3408.
[140] Pierre Baldi and Kurt Hornik. “Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima”. In: Neural Networks 2.1 (1989), pp. 53–58.
[141] Yann LeCun et al. “Generalization and Network Design Strategies”. In: Connectionism in Perspective (1989), pp. 143–155.
[142] Yann LeCun et al. “Gradient-based Learning Applied to Document Recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.
[144] Oriol Vinyals et al. “Show and Tell: A Neural Image Caption Generator”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 3156–3164.
[145] Aditya Timmaraju and Vikesh Khanna. “Sentiment Analysis on Movie Reviews using Recursive and Recurrent Neural Network Architectures”. In: Semantic Scholar (2015).
[146] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. “Sequence to Sequence Learning with Neural Networks”. In: Advances in Neural Information Processing Systems. 2014, pp. 3104–3112.
[148] Lawrence Rabiner and B Juang. “An Introduction to Hidden Markov Models”. In: IEEE ASSP Magazine 3.1 (1986), pp. 4–16.
[149] Alex Graves et al. “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks”. In: Proceedings of the 23rd International Conference on Machine Learning. ACM. 2006, pp. 369–376.
[150] Jeffrey L Elman. “Finding Structure in Time”. In: Cognitive Science 14.2 (1990), pp. 179–211.
[151] Christopher M Bishop. “Mixture Density Networks”. Technical Report, Aston University, 1994.
[152] David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. “A Learning Algorithm for Boltzmann Machines”. In: Cognitive Science 9.1 (1985), pp. 147–169.
[154] Christophe Andrieu et al. “An Introduction to MCMC for Machine Learning”. In: Machine Learning 50.1 (2003), pp. 5–43.
[155] Hilbert J Kappen and FB Rodriguez. “Boltzmann Machine Learning using Mean Field Theory and Linear Response Correction”. In: Advances in Neural Information Processing Systems (1998), pp. 280–286.
[156] Misha Denil and Nando De Freitas. “Toward the Implementation of a Quantum RBM”. In: NIPS Deep Learning and Unsupervised Feature Learning Workshop. Vol. 5. 2. 2011.
[158] Vinod Nair and Geoffrey E Hinton. “Rectified Linear Units Improve Restricted Boltzmann Machines”. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010, pp. 807–814.
[159] Asja Fischer and Christian Igel. “An Introduction to Restricted Boltzmann Machines”. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (2012), pp. 14–36.
[160] Ruslan Salakhutdinov and Geoffrey E Hinton. “Deep Boltzmann Machines.” In: AISTATS. Vol. 1. 2009, p. 3.
[162] Vishakh Hegde and Sheema Usmani. “Parallel and Distributed Deep Learning”. 2016.
[163] Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber. “Multi-Column Deep Neural Networks for Image Classification”. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE. 2012, pp. 3642–3649.
[164] Tom M Heskes and Bert Kappen. “On-line Learning Processes in Artificial Neural Networks”. In: North-Holland Mathematical Library 51 (1993), pp. 199–233.
[165] Olivier Bousquet and Léon Bottou. “The Tradeoffs of Large Scale Learning”. In: Advances in Neural Information Processing Systems. 2008, pp. 161–168.
[166] Léon Bottou, Frank E Curtis, and Jorge Nocedal. “Optimization Methods for Large-Scale Machine Learning”. In: arXiv preprint arXiv:1606.04838 (2016).
[167] Ning Qian. “On the Momentum Term in Gradient Descent Learning Algorithms”. In: Neural Networks 12.1 (1999), pp. 145–151.
[168] Boris T Polyak. “Some Methods of Speeding up the Convergence of Iteration Methods”. In: USSR Computational Mathematics and Mathematical Physics 4.5 (1964), pp. 1–17.
[169] Yurii Nesterov. “A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k^2)”. In: Soviet Mathematics Doklady. Vol. 27. 2. 1983, pp. 372–376.
[170] Ilya Sutskever et al. “On the Importance of Initialization and Momentum in Deep Learning.” In: ICML (3) 28 (2013), pp. 1139–1147.
[171] Herbert Robbins and Sutton Monro. “A Stochastic Approximation Method”. In: The Annals of Mathematical Statistics (1951), pp. 400–407.
[172] Martin Riedmiller and Heinrich Braun. “RPROP – A Fast Adaptive Learning Algorithm”. In: Proc. of ISCIS VII. Citeseer. 1992.
[174] Diederik Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
[176] Paul J Werbos. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model”. In: Neural Networks 1.4 (1988), pp. 339–356.
[177] Geoffrey E Hinton. “Training Products of Experts by Minimizing Contrastive Divergence”. In: Neural Computation 14.8 (2002), pp. 1771–1800.
[178] Geoffrey Hinton. “A Practical Guide to Training Restricted Boltzmann Machines”. In: Momentum 9.1 (2010), p. 926.
[179] Tijmen Tieleman. “Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient”. In: Proceedings of the 25th International Conference on Machine Learning. ACM. 2008, pp. 1064–1071.
[180] Kenneth Levenberg. “A Method for the Solution of Certain Non-Linear Problems in Least Squares”. In: Quarterly of Applied Mathematics 2.2 (1944), pp. 164–168.
[181] Donald W Marquardt. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters”. In: Journal of the Society for Industrial and Applied Mathematics 11.2 (1963), pp. 431–441.
[183] Xin Yao. “Evolving Artificial Neural Networks”. In: Proceedings of the IEEE 87.9 (1999), pp. 1423–1447.
[184] Jürgen Schmidhuber, Daan Wierstra, and Faustino Gomez. “Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning”. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc. 2005, pp. 853–858.
[186] Lutz Prechelt. “Early Stopping – But When?” In: Neural Networks: Tricks of the Trade. Springer, 1998, pp. 55–69.
[188] Andrew Y Ng. “Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance”. In: Proceedings of the Twenty-First International Conference on Machine Learning. ACM. 2004, p. 78.
[189] Robert Tibshirani. “Regression Shrinkage and Selection via the Lasso”. In: Journal of the Royal Statistical Society. Series B (Methodological) (1996), pp. 267–288.
[190] Geoffrey E Hinton. “Learning Translation Invariant Recognition in a Massively Parallel Network”. In: International Conference on Parallel Architectures and Languages Europe. Springer. 1987, pp. 1–13.
[191] Hui Zou and Trevor Hastie. “Regularization and Variable Selection via the Elastic Net”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005), pp. 301–320.
[192] Maxwell D Collins and Pushmeet Kohli. “Memory Bounded Deep Convolutional Networks”. In: arXiv preprint arXiv:1412.1442 (2014).
[193] Geoffrey E Hinton et al. “Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors”. In: arXiv preprint arXiv:1207.0580 (2012).
[194] Nitish Srivastava et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” In: Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.
[195] Chris M Bishop. “Training with Noise is Equivalent to Tikhonov Regularization”. In: Neural Computation 7.1 (1995), pp. 108–116.
[196] Kam-Chuen Jim, C Lee Giles, and Bill G Horne. “An Analysis of Noise in Recurrent Neural Networks: Convergence and Generalization”. In: IEEE Transactions on Neural Networks 7.6 (1996), pp. 1424–1438.
[197] Alex Graves. “Generating Sequences with Recurrent Neural Networks”. In: arXiv preprint arXiv:1308.0850 (2013).
[198] Ludmila I Kuncheva and Christopher J Whitaker. “Measures of Diversity in Classifier Ensembles and their Relationship with the Ensemble Accuracy”. In: Machine Learning 51.2 (2003), pp. 181–207.
[199] Peter Sollich and Anders Krogh. “Learning with Ensembles: How Overfitting can be Useful”. In: Advances in Neural Information Processing Systems (1996), pp. 190–196.
[202] Sungjin Ahn, Anoop Korattikara Balan, and Max Welling. “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In: ICML. 2012.
[203] Naomi S Altman. “An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression”. In: The American Statistician 46.3 (1992), pp. 175–185.
[204] Lawrence Cayton. “Algorithms for Manifold Learning”. In: Univ. of California at San Diego Tech. Rep. (2005), pp. 1–17.
[205] Andrew R Barron. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function”. In: IEEE Transactions on Information Theory 39.3 (1993), pp. 930–945.
[206] Guido F Montufar et al. “On the Number of Linear Regions of Deep Neural Networks”. In: Advances in Neural Information Processing Systems. 2014, pp. 2924–2932.
[207] Yoshua Bengio and Martin Monperrus. “Non-Local Manifold Tangent Learning.” In: NIPS. 2004, pp. 129–136.
[208] Yoshua Bengio, Hugo Larochelle, and Pascal Vincent. “Non-local Manifold Parzen Windows”. In: NIPS. Vol. 18. 2005, pp. 115–122.
[209] Yoshua Bengio et al. “Greedy Layer-Wise Training of Deep Networks”. In: Advances in Neural Information Processing Systems 19 (2007), p. 153.
[210] Dumitru Erhan et al. “The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training.” In: AISTATS. Vol. 5. 2009, pp. 153–160.
[211] Ian J Goodfellow et al. “Multi-Digit Number Recognition from Street View Imagery Using Deep Convolutional Neural Networks”. In: arXiv preprint arXiv:1312.6082 (2013).
[212] Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. In: arXiv preprint arXiv:1502.03167 (2015).
[213] Alan J Bray and David S Dean. “Statistics of critical points of Gaussian fields on large-dimensional spaces”. In: Physical Review Letters 98.15 (2007), p. 150201.
[214] Yann N Dauphin et al. “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization”. In: Advances in Neural Information Processing Systems. 2014, pp. 2933–2941.
[215] Anna Choromanska et al. “The Loss Surfaces of Multilayer Networks.” In: AISTATS. 2015.
[216] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. “Deep Sparse Rectifier Neural Networks.” In: AISTATS. Vol. 15. 106. 2011, p. 275.
[217] Matthew D Zeiler and Rob Fergus. “Visualizing and Understanding Convolutional Networks”. In: European Conference on Computer Vision. Springer. 2014, pp. 818–833.
[218] George E Dahl, Tara N Sainath, and Geoffrey E Hinton. “Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout”. In: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE. 2013, pp. 8609–8613.
[219] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. “Rectifier Nonlinearities Improve Neural Network Acoustic Models”. In: Proc. ICML. Vol. 30. 1. 2013.
[220] Kaiming He et al. “Delving Deep into Rectifiers: Surpassing Human-level Performance on ImageNet Classification”. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, pp. 1026–1034.
[221] Xiaojie Jin et al. “Deep Learning with S-shaped Rectified Linear Activation Units”. In: arXiv preprint arXiv:1512.07030 (2015).
[222] Ian J Goodfellow et al. “Maxout Networks.” In: ICML (3) 28 (2013), pp. 1319–1327.
[223] Felix A Gers, Nicol N Schraudolph, and Jürgen Schmidhuber. “Learning Precise Timing with LSTM Recurrent Networks”. In: Journal of Machine Learning Research 3.Aug (2002), pp. 115–143.
[224] Alex Graves et al. “A Novel Connectionist System for Unconstrained Handwriting Recognition”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 31.5 (2009), pp. 855–868.
[226] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”. In: arXiv preprint arXiv:1409.0473 (2014).
[227] Alex Graves and Navdeep Jaitly. “Towards End-To-End Speech Recognition with Recurrent Neural Networks.” In: ICML. Vol. 14. 2014, pp. 1764–1772.
[228] Kyunghyun Cho et al. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches”. In: arXiv preprint arXiv:1409.1259 (2014).
[229] Andrew M Saxe, James L McClelland, and Surya Ganguli. “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks”. In: arXiv preprint arXiv:1312.6120 (2013).
[230] Marc’Aurelio Ranzato et al. “Efficient Learning of Sparse Representations with an Energy-based Model”. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. MIT Press. 2006, pp. 1137–1144.
[231] James Martens. “Deep learning via Hessian-free optimization”. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010, pp. 735–742.
[232] James Martens and Ilya Sutskever. “Learning recurrent neural networks with Hessian-free optimization”. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011, pp. 1033–1040.
[233] Tim Cooijmans et al. “Recurrent Batch Normalization”. In: arXiv preprint arXiv:1603.09025 (2016).
[234] Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. “Recurrent Models of Visual Attention”. In: Advances in Neural Information Processing Systems. 2014, pp. 2204–2212.
[235] Jan K Chorowski et al. “Attention-based Models for Speech Recognition”. In: Advances in Neural Information Processing Systems. 2015, pp. 577–585.
[236] Tim Rocktäschel, Edward Grefenstette, et al. “Reasoning about Entailment with Neural Attention”. In: arXiv preprint arXiv:1509.06664 (2015).
[237] Alex Graves, Greg Wayne, and Ivo Danihelka. “Neural Turing Machines”. In: arXiv preprint arXiv:1410.5401 (2014).
[238] Alex Graves, Greg Wayne, et al. “Hybrid Computing using a Neural Network with Dynamic External Memory”. In: Nature 538.7626 (2016), pp. 471–476.
[239] Drew Fudenberg and Jean Tirole. “Game Theory”. Cambridge, MA, 1991.
[240] John Nash. “Non-Cooperative Games”. In: Annals of Mathematics (1951), pp. 286–295.
[241] Alec Radford, Luke Metz, and Soumith Chintala. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”. In: arXiv preprint arXiv:1511.06434 (2015).
[242] Scott Reed et al. “Generative Adversarial Text to Image Synthesis”. In: Proceedings of The 33rd International Conference on Machine Learning. Vol. 3. 2016.
[243] Chris Snijders, Uwe Matzat, and Ulf-Dietrich Reips. “‘Big Data’: big gaps of knowledge in the field of internet science”. In: International Journal of Internet Science 7.1 (2012), pp. 1–5.
[244] Doug Laney. “3D data management: Controlling data volume, velocity and variety”. In: META Group Research Note 6 (2001), p. 70.
[245] Andrea De Mauro, Marco Greco, and Michele Grimaldi. “A formal definition of Big Data based on its essential features”. In: Library Review 65.3 (2016), pp. 122–135.
[248] Goutam Chakraborty and Murali Krishna Pagolu. “Analysis of unstructured data: Applications of text analytics and sentiment mining”. In: SAS Global Forum. 2014, pp. 1288–2014.
[250] Janusz Bryzek. “Roadmap for the trillion sensor universe”. In: Berkeley, CA, April 2 (2013).
[251] Federal Reserve. “The Federal Reserve Payments Study 2016”. Dec. 22, 2016. url: https://www.federalreserve.gov/paymentsystems/files/2016-payments-study-20161222.pdf.
[255] Catalin Boja, Adrian Pocovnicu, and Lorena Batagan. “Distributed Parallel Architecture for ‘Big Data’”. In: Informatica Economica 16.2 (2012), p. 116.
[257] Gordon E Moore et al. “Cramming more components onto integrated circuits”. In: Proceedings of the IEEE 86.1 (1998), pp. 82–85.
[261] Zachary D Stephens et al. “Big data: astronomical or genomical?” In: PLoS Biology 13.7 (2015), e1002195.
[262] George A Miller. “The magical number seven, plus or minus two: some limits on our capacity for processing information.” In: Psychological Review 63.2 (1956), p. 81.
[263] James Manyika et al. “Big data: The next frontier for innovation, competition, and productivity”. 2011.
[265] Vasant Dhar. “Data science and prediction”. In: Communications of the ACM 56.12 (2013), pp. 64–73.
[266] Pietro Gonizzi et al. “Redundant Distributed Data Storage”.
[267] Edgar F Codd. “A relational model of data for large shared data banks”. In: Communications of the ACM 13.6 (1970), pp. 377–387.
[269] Katarina Grolinger et al. “Data management in cloud environments: NoSQL and NewSQL data stores”. In: Journal of Cloud Computing: Advances, Systems and Applications 2.1 (2013), p. 22.
[270] Seth Gilbert and Nancy Lynch. “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services”. In: ACM SIGACT News 33.2 (2002), pp. 51–59.
[271] Michael I Jordan et al. “On statistics, computation and scalability”. In: Bernoulli 19.4 (2013), pp. 1378–1390.
[272] Cristian S Calude and Giuseppe Longo. “The deluge of spurious correlations in big data”. In: Foundations of Science (2016), pp. 1–18.
[274] Jeffrey Dean and Sanjay Ghemawat. “MapReduce: simplified data processing on large clusters”. In: Communications of the ACM 51.1 (2008), pp. 107–113.
[277] David Schrank et al. “2015 Urban Mobility Scorecard”. 2015. url: http://d2dtl5nnlpfr0r.cloudfront.net/tti.tamu.edu/documents/mobility-scorecard-2015-wappx.pdf.
[278] Michael J Lighthill and Gerald Beresford Whitham. “On kinematic waves. II. A theory of traffic flow on long crowded roads”. In: Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. Vol. 229. 1178. The Royal Society. 1955, pp. 317–345.
[279] Harold J Payne. “Models of freeway traffic and control.” In: Mathematical Models of Public Systems (1971).
[280] Carlos F Daganzo. “The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory”. In: Transportation Research Part B: Methodological 28.4 (1994), pp. 269–287.
[281] Jian Wang and Ling Wang. “Congestion analysis of traffic networks with direction-dependant heterogeneity”. In: Physica A: Statistical Mechanics and its Applications 392.2 (2013), pp. 392–399.
[282] Chen Liu, Qing-pu Zhang, and Xue Zhang. “Emergence and disappearance of traffic congestion in weight-evolving networks”. In: Simulation Modelling Practice and Theory 17.10 (2009), pp. 1566–1574.
[283] Corinne Ledoux. “An urban traffic flow model integrating neural networks”. In: Transportation Research Part C: Emerging Technologies 5.5 (1997), pp. 287–300.
[284] Mark S Dougherty and Mark R Cobbett. “Short-term inter-urban traffic forecasts using neural networks”. In: International Journal of Forecasting 13.1 (1997), pp. 21–31.
[285] Kit Yan Chan et al. “Neural-network-based models for short-term traffic flow forecasting using a hybrid exponential smoothing and Levenberg–Marquardt algorithm”. In: IEEE Transactions on Intelligent Transportation Systems 13.2 (2012), pp. 644–654.
[286] Eric J Horvitz et al. “Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service”. In: arXiv preprint arXiv:1207.1352 (2012).
[287] Xiaolei Ma et al. “Large-Scale Transportation Network Congestion Evolution Prediction using Deep Learning Theory”. In: PLoS ONE 10.3 (2015), e0119044.
[288] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. “Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription”. In: arXiv preprint arXiv:1206.6392 (2012).
[294] Chao Chen et al. “Detecting Errors and Imputing Missing Data for Single-Loop Surveillance Systems”. In: Transportation Research Record: Journal of the Transportation Research Board 1855 (2003), pp. 160–167.
[296] Emil Julius Gumbel and Julius Lieblein. “Statistical Theory of Extreme Values and some Practical Applications: A Series of Lectures”. 1954.
[297] George EP Box, Mervin E Muller, et al. “A Note on the Generation of Random Normal Deviates”. In: The Annals of Mathematical Statistics 29.2 (1958), pp. 610–611.
[298] Prabir Burman and Wolfgang Polonik. “Multivariate Mode Hunting: Data Analytic Tools with Measures of Significance”. In: Journal of Multivariate Analysis 100.6 (2009), pp. 1198–1218.
[300] Yoshua Bengio. “Practical recommendations for gradient-based training of deep architectures”. In: Neural Networks: Tricks of the Trade. Springer, 2012, pp. 437–478.
[301] Benigno Uria, Iain Murray, and Hugo Larochelle. “RNADE: The Real-Valued Neural Autoregressive Density-Estimator”. In: Advances in Neural Information Processing Systems. 2013, pp. 2175–2183.
[302] Hugo Larochelle and Iain Murray. “The Neural Autoregressive Distribution Estimator.” In: AISTATS. Vol. 1. 2011, p. 2.
[303] Sajjad Ahmad and Slobodan P Simonovic. “Developing Runoff Hydrograph using Artificial Neural Networks”. 2001, pp. 1–17.
[304] Konda Thirumalaiah and Makarand C Deo. “Hydrological Forecasting using Neural Networks”. In: Journal of Hydrologic Engineering 5.2 (2000), pp. 180–189.
[305] Ching-Piao Tsai and Tsong-Lin Lee. “Back-propagation Neural Network in Tidal-Level Forecasting”. In: Journal of Waterway, Port, Coastal, and Ocean Engineering 125.4 (1999), pp. 195–202.
[306] Anthony TC Goh, KS Wong, and BB Broms. “Estimation of Lateral Wall Movements in Braced Excavations using Neural Networks”. In: Canadian Geotechnical Journal 32.6 (1995), pp. 1059–1064.
[307] HI Park and CW Cho. “Neural Network Model for Predicting the Resistance of Driven Piles”. In: Marine Georesources and Geotechnology 28.4 (2010), pp. 324–344.

URLs

[47] url: http://www.iclr.cc/doku.php?id=ICLR2017:main&redirect=1.
[73] url: http://www2.stat.duke.edu/~sayan/statlearn.pdf.
[91] url: http://nic.schraudolph.org/teach/ml03/ML_Class4.pdf.
[114] url: https://en.wikipedia.org/wiki/F1_score.
[119] url: https://www.cs.toronto.edu/~hinton/csc2535/notes/lec6a.pdf.
[129] url: https://www.top500.org/lists/2016/11/.
[132] url: https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Neuron/index.html.
[143] url: http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
[173] url: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
[187] url: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec9.pdf.
[225] url: http://www.cvc.uab.es/icdar2009/competitions.html.
[246] url: http://www.kdnuggets.com/2016/10/new-poll-largest-dataset-analyzed-data-mined.html.
[247] url: https://en.wikipedia.org/wiki/Unstructured_data.
[249] url: https://en.wikipedia.org/wiki/Semi-structured_data.
[252] url: http://www.internetlivestats.com/internet-users/.
[253] url: http://www.fsn.co.uk/channel_bi_bpm_cpm/mastering_big_data_cfo_strategies_to_transform_insight_into_opportunity#.WNLHGG_ytyx.
[254] url: https://www-01.ibm.com/software/in/data/bigdata/.
[256] url: http://www.statisticbrain.com/average-cost-of-hard-drive-storage/.
[258] url: http://www.netflixprize.com.
[259] url: https://www.optum.com/content/dam/optum/Landing%20Page/ls/OptumDay2015/1_OptumLabs_P.Wallace.pdf.
[260] url: http://www.forbes.com/sites/tomgroenfeldt/2013/02/14/at-nyse-the-data-deluge-overwhelms-traditional-databases/#5deb6a2e2eb7.
[268] url: http://nosql-database.org/.
[273] url: http://hadoop.apache.org/.
[275] url: https://aws.amazon.com/.
[276] url: https://aws.amazon.com/emr/.
[289] url: http://pems.dot.ca.gov.
[290] url: http://pems.dot.ca.gov/?dnode=Help&content=help_calc#Speeds.
[292] url: http://www.officeholidays.com.
[293] url: http://achieve.lausd.net/domain/36.
[295] url: http://pems.dot.ca.gov/?dnode=Help&content=help_calc#diag.
[299] url: https://github.com/fchollet/keras.

Acronyms

AbsL Absolute Loss
AcF Activation Function
AE Autoencoder
AgF Aggregation Function
AGI Artificial General Intelligence
AI Artificial Intelligence
AN Artificial Neuron
ANN Artificial Neural Network
AR Autoregressive
ASB Asynchronous Saturation Behavior
ASR Automatic Speech Recognition
AWS Amazon Web Services
BaN Batch Normalization
BC Binary Classification
BD Big Data
BGD Batch Gradient Descent
BHPO Bayesian Hyperparameter Optimization
BL Bottleneck Layer
BM Boltzmann Machine
BN Biological Neuron
BNN Biological Neural Network
BO Bayesian Optimization
BP Backpropagation
BPTT Backpropagation Through Time
BTU Binary Threshold Unit
CD Contrastive Divergence
CDoT California Department of Transportation
CE Cross Entropy
CF Cost Function
CHR Connected Handwriting Recognition
CNN Convolutional Neural Network
CNS Computational Neuroscience
CP Congestion Prediction
CS Computer Science
CTC Connectionist Temporal Classification
CV Computer Vision
DAE Denoising Autoencoder
DB Database
DBM Deep Boltzmann Machine
DBN Deep Belief Net
DC Deep Composition
DE Density Estimation
DetM Deterministic Model
DFNN Deep Feedforward Neural Network
DisM Discriminative Model
DL Deep Learning
DM Data Mining
DNC Differentiable Neural Computer
DNN Deep Neural Network
DO Dropout
DR Distributed Representation
DRMDN Deep Recurrent Mixture Density Network
DRNN Deep Recurrent Neural Network
EB ExaByte
EF Energy Function
EMA Exponential Moving Average
EMR Elastic MapReduce
ENR Elastic Net Regularization
EOA Evolutionary Optimization Algorithm
ER Empirical Risk
ERM Empirical Risk Minimization
ES Early Stopping
ExS Expert System
FBL Full Bayesian Learning
FCRNN Fully Connected RNN
FDM Finite Difference Method
FEM Finite Element Method
FNN Feedforward Neural Network
GAN Generative Adversarial Network
GB GigaByte
GD Gradient Descent
GDwALR Gradient Descent with Adaptive Learning Rates
GDwLRS Gradient Descent with Learning Rate Schedule
GDwM Gradient Descent with Momentum
GDwNM Gradient Descent with Nesterov Momentum
GE Generalization Error
GenM Generative Model
GF Growth Function
GHPO Gradient-Based Hyperparameter Optimization
GMM Gaussian Mixture Model
GP Gaussian Process
GRU Gated Recurrent Unit
GS Grid Search
GUI Glorot Uniform Initialization
HDFS Hadoop Distributed File System
HDR Handwritten Digit Recognition
HFO Hessian-Free Optimization
HL Hidden Layer
HMM Hidden Markov Model
HN Hopfield Net
HNI He Normal Initialization
HP Hyperparameter
HPO Hyperparameter Optimization
HS Handwriting Synthesis
HU Hidden Unit
ICS Internal Covariate Shift
IL Input Layer
IndL Indicator Loss
IoT Internet of Things
IU Input Unit
KBS Knowledge-Based System
LA Learning Algorithm
LDS Linear Dynamical System
LF Loss Function
LinOU Linear Output Unit
LinR Linear Regression
LNLM Large Number of Local Minima
LogR Logistic Regression
LR Learning Rate
LSTM Long Short-Term Memory
LU Linear Unit
LUI LeCun Uniform Initialization
MA Model Averaging
MAE Mean Absolute Error
MAP Maximum A Posteriori
MB Mini-Batch
MBSGD Mini-Batch Stochastic Gradient Descent
MC Multiclass Classification
MCMC Markov Chain Monte Carlo
MDN Mixture Density Network
ML Machine Learning
MLP Multilayer Perceptron
MNR Max-Norm Regularization
MOU Maxout Unit
MS Manual Search
MSE Mean Squared Error
MT Machine Translation
NADE Neural Autoregressive Distribution Estimator
NLL Negative Log Likelihood
NLP Natural Language Processing
NTM Neural Turing Machine
OI Orthogonal Initialization
OL Output Layer
OR Object Recognition
OU Output Unit
PB PetaByte
PCA Principal Component Analysis
PCD Persistent Contrastive Divergence
PF Partition Function
PGM Probabilistic Graphical Model
PLA Perceptron Learning Algorithm
PM Performance Metric
PReLU Parametric Rectified Linear Unit
ProM Probabilistic Model
PU Product Unit
PUN Product Unit Network
RBM Restricted Boltzmann Machine
RDBMS Relational Database Management System
ReLU Rectified Linear Unit
RiL Reinforcement Learning
RNADE Real-Valued Neural Autoregressive Distribution Estimator
RNN Recurrent Neural Network
RpL Representation Learning
RS Random Search
RW Random Walk
SA Sentiment Analysis
SBU Stochastic Binary Unit
SD Structured Data
SGD Stochastic Gradient Descent
SL Supervised Learning
SLT Statistical Learning Theory
SOL Softmax Output Layer
SOM Second Order Method
SqrL Squared Loss
SR Symbolic Representation
SReLU S-Shaped Rectified Linear Unit
SRM Structural Risk Minimization
SSD Semi-Structured Data
SU Sigmoid Unit
SVM Support Vector Machine
TB TeraByte
TeE Test Error
TER Textual Entailment Recognition
TeS Test Set
TM Traffic Modeling
TrE Training Error
TrS Training Set
TU Tanh Unit
UD Unstructured Data
UL Unsupervised Learning
UPT Unsupervised Pre-Training
VaE Validation Error
VaS Validation Set
VEG Vanishing and Exploding Gradient
VL Visible Layer
VU Visible Unit
WD Weight Decay
WS Weight Sharing
