<<

Computer Vision and Machine Learning for Autonomous Vehicles

by Zhilu Chen

A Dissertation Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering

August 2017

APPROVED:

Prof. Xinming Huang, Major Advisor

Prof. Lifeng Lai

Prof. Haibo He

Abstract

The autonomous vehicle is an engineering technology that can improve transportation safety, alleviate traffic congestion and reduce carbon emissions. Research on autonomous vehicles can be categorized by functionality, for example, object detection or recognition, path planning, navigation, lane keeping, speed control and driver status monitoring. The research topics can also be categorized by the equipment or techniques used, for example, image processing, machine learning, and localization. This dissertation primarily reports on computer vision and machine learning and their implementations for autonomous vehicles. The vision-based system can effectively detect and accurately recognize multiple objects on the road, such as traffic signs, traffic lights, and pedestrians. In addition, an autonomous lane keeping system has been proposed using end-to-end learning. In this dissertation, a road simulator is built using data collection and augmentation, which can be used for training and evaluating autonomous driving algorithms. The Graphics Processing Unit (GPU) based traffic sign detection and recognition system can detect and recognize 48 traffic signs. The implementation has three stages: pre-processing, feature extraction, and classification. A highly optimized and parallelized version of Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) is used. The system can process 27.9 frames per second with active pixels of 1,628 × 1,236 resolution, and with minimal loss of accuracy. In an evaluation using the BelgiumTS dataset, the experimental results indicate that the detection rate is about 91.69% with false positives per window of 3.39 × 10⁻⁵, and the recognition rate is about 93.77%. We report on two traffic light detection and recognition systems.

The first system detects and recognizes red circular lights only, using image processing and SVM. Its performance is better than that of traditional detectors, and it achieves the best performance with 96.97% precision and 99.43% recall. The second system is more complicated. It detects and classifies different types of traffic lights, including green and red lights in both circular and arrow forms. In addition, it employs image processing techniques, such as color extraction and blob detection, to locate the candidates. Subsequently, a pre-trained PCA network is used as a multi-class classifier for obtaining frame-by-frame results. Furthermore, an online multi-object tracking technique is applied to overcome occasional misses and a forecasting method is used to filter out false positives. Several additional optimization techniques are employed to improve the detector performance and to handle the traffic light transitions. A multi-spectral data collection system is implemented for pedestrian detection, which includes a thermal camera and a pair of stereo color cameras. The three cameras are first aligned using the trifocal tensor, and the aligned data are processed by using computer vision and machine learning techniques. Convolutional channel features (CCF) and the traditional HOG+SVM approach are evaluated over the data captured from the three cameras. Through the use of the trifocal tensor and CCF, training becomes more efficient. The proposed system achieves only a 9% log-average miss rate on our dataset. The autonomous lane keeping system employs an end-to-end learning approach for obtaining the proper steering angle to maintain the car in a lane. The convolutional neural network (CNN) model uses raw image frames as input, and it outputs the steering angles corresponding to the input frames. Unlike the traditional approach, which manually decomposes the problem into several parts, such as lane detection, path planning, and steering control, the model learns to extract useful features

on its own and learns to steer from human behavior. More importantly, we find that having a simulator for training and evaluation is important. We then build the simulator using image projection, vehicle dynamics, and vehicle trajectory tracking. The test results reveal that the model trained with augmented data using the simulator has better performance and achieves about a 98% autonomous driving time on our dataset. Furthermore, a vehicle data collection system is developed for building our own datasets from recorded videos. These datasets are used in the above studies and have been released to the public for autonomous vehicle research. The experimental datasets are available at http://computing.wpi.edu/Dataset.html.

Acknowledgements

I would like to express my gratitude to my advisor, Professor Xinming Huang, for the opportunity to do research at WPI and for his guidance throughout my research. Thanks to Professors Haibo He, Lifeng Lai and many other professors for their help; I have learned a lot from them. Thanks to my family and my friends for giving me courage and confidence.

Contents

Abstract i

Acknowledgements iv

Contents ix

List of Tables x

List of Figures xv

List of Abbreviations xvii

1 Introduction 1
1.1 Motivations ...... 1
1.2 Summary of Contributions ...... 3
1.3 Outline ...... 8

2 Background 10
2.1 Datasets ...... 11
2.2 Object detection and recognition ...... 13
2.2.1 Traffic sign ...... 13

2.2.2 Traffic light ...... 14
2.2.3 Pedestrian ...... 15
2.3 Lane keeping ...... 19

3 A GPU-Based Real-Time Traffic Sign Detection and Recognition System 21
3.1 Introduction ...... 22
3.2 Traffic Sign Detection and Recognition System ...... 23
3.2.1 System Overview ...... 23
3.2.2 Pre-processing ...... 24
3.2.3 Traffic Sign Detection ...... 26
3.2.4 Traffic Sign Recognition ...... 29
3.3 Parallelism on GPU ...... 29
3.4 Experimental Results ...... 31
3.5 Conclusions ...... 35

4 Automatic Detection of Traffic Lights Using Support Vector Machine 36
4.1 Introduction ...... 37
4.2 Proposed Method for Traffic Light Detection ...... 38
4.2.1 Locating candidates based on color extraction ...... 38
4.2.2 Traffic light detection using ...... 38
4.2.3 An improved method using SVM ...... 40
4.3 Data Collection and Performance Evaluation ...... 43
4.4 Conclusions ...... 46

5 Accurate and Reliable Detection of Traffic Lights Using Multi-Class Learning and Multi-Object Tracking 48
5.1 Introduction ...... 49
5.2 Data Collection and Experimental Setup ...... 51
5.2.1 Training data ...... 52
5.2.2 Test data ...... 58
5.3 Proposed Method of Traffic Light Detection and Recognition ...... 58
5.3.1 Locating candidates based on color extraction ...... 60
5.3.2 Classification ...... 65
5.3.2.1 PCANet ...... 65
5.3.2.2 Recognizing green traffic lights using PCANet ...... 66
5.3.2.3 Recognizing red traffic lights using PCANet ...... 69
5.3.3 Stabilizing the detection and recognition output ...... 69
5.3.3.1 The problem of frame-by-frame detection ...... 69
5.3.3.2 Tracking and data association ...... 71
5.3.3.3 Forecasting ...... 72
5.3.3.4 Minimizing delays ...... 74
5.4 Performance Evaluation ...... 76
5.4.1 Detection and recognition ...... 76
5.4.2 False positives evaluation ...... 78
5.5 Discussion ...... 79
5.5.1 Comparison with related work ...... 79
5.5.2 Limitation and plausibility ...... 80
5.6 Conclusions ...... 83

6 Pedestrian Detection for Autonomous Vehicle Using Multi-spectral Cameras 84
6.1 Introduction ...... 85
6.2 Data Collection and Experimental Setup ...... 87
6.2.1 Data Collection Equipment ...... 87
6.2.2 Data Collection and Experimental Setup ...... 90
6.3 Proposed Method ...... 90
6.3.1 Overview ...... 90
6.3.2 Trifocal tensor ...... 91
6.3.3 Sliding windows vs. region of interest ...... 93
6.3.4 Detection ...... 97
6.3.5 Information fusion ...... 99
6.3.6 Additional constraints ...... 100
6.3.6.1 Disparity-size ...... 100
6.3.6.2 Road horizon ...... 100
6.4 Performance Evaluation ...... 102
6.5 Discussion ...... 105
6.6 Conclusions ...... 108

7 End-to-End Learning for Lane Keeping of Self-Driving Cars 109
7.1 Introduction ...... 110
7.2 Implementation Details ...... 111
7.2.1 Data pre-processing ...... 111
7.2.2 CNN implementation details ...... 113
7.3 Evaluation ...... 116

7.4 Discussion ...... 119
7.4.1 Evaluation ...... 119
7.4.2 Data augmentation ...... 122
7.5 Conclusions ...... 123

8 Building an Autonomous Lane Keeping Simulator Using Real-World Data and End-to-End Learning 124
8.1 Introduction ...... 125
8.2 Building a Simulator ...... 128
8.2.1 Overview ...... 128
8.2.2 Image projection ...... 130
8.2.3 Vehicle dynamics and vehicle trajectory tracking ...... 134
8.2.4 CNN implementation ...... 142
8.3 Experiment ...... 143
8.3.1 Data collection ...... 143
8.3.2 Data augmentation ...... 147
8.3.3 Evaluation using simulator ...... 147
8.4 Discussion ...... 151
8.5 Conclusions ...... 153

9 Conclusions 154

Bibliography 157

List of Tables

3.1 HOG parameters in our system ...... 28

4.1 Evaluation result based on Rin/Rout for different p values ...... 45
4.2 Evaluation result: precision and recall ...... 46

5.1 Number of training samples of Green ROI-n and Red ROI-n ...... 58
5.2 Information of 23 test sequences ...... 59
5.3 Test result of 17 sequences that contain traffic lights ...... 78
5.4 Number of false positives in traffic-light-free sequences ...... 79
5.5 Results of several recent works on traffic lights detection ...... 81

8.1 Evaluation result using the simulator, with and without augmented data. 149

List of Figures

2.1 Performance results from the Caltech Pedestrian Detection Benchmark. 17

3.1 Three stages in our proposed system. ...... 24
3.2 48 classes of traffic signs can be detected and recognized in our system. 25
3.3 An example of color enhancement. ...... 26
3.4 Selecting ROI from the original image. ...... 27
3.5 Grouping detected windows. ...... 29
3.6 Normal CUDA kernel launches. ...... 30
3.7 CUDA kernel launches using CUDA streams. ...... 30
3.8 HOG time on CPU and GPU. ...... 32
3.9 The total processing time when HOG is computed using OpenCV on GPU. ...... 33
3.10 The total processing time when using our optimized GPU code. ...... 34

4.1 Applying traffic light detector on a candidate. ...... 40
4.2 Are they traffic lights or not? Dark background on the top and bright background at the bottom. ...... 41
4.3 The left traffic light has bright background and the right traffic light has dark background. ...... 42

4.4 Rin/Rout values for true positive candidates (left) and true negative candidates (right). Y-axis is from 0 to 2000. ...... 44

4.5 Rin/Rout values for true positive candidates (left) and true negative candidates (right). Y-axis is from 0 to 20. ...... 45
4.6 Both traffic lights are detected. ...... 47

5.1 Examples of 5 classes of Green ROI-1. ...... 53
5.2 Examples of 5 classes of Green ROI-3. ...... 54
5.3 Examples of 5 classes of Green ROI-4. ...... 55
5.4 Examples of 3 classes of Red ROI-1. ...... 56
5.5 Examples of 3 classes of Red ROI-3. ...... 57
5.6 Flowchart of the proposed method of traffic light detection and recognition. ...... 60
5.7 Color extraction, blob detection and closing operation. ...... 62
5.8 A sample frame from our traffic light dataset. ...... 64
5.9 The structure of two-stage PCANet. ...... 66
5.10 An arrow light in three consecutive frames. The middle one is vague and looks similar to a circular light. A detector often fails on such a vague frame. ...... 70
5.11 All traffic lights are detected and recognized correctly in the frame. ...... 77

6.1 Instrumentation setup with both thermal and stereo cameras mounted on the roof of a vehicle. ...... 88
6.2 Framework of the proposed pedestrian detection method. ...... 92
6.3 Proper alignment of color and thermal images using trifocal tensor. ...... 94
6.4 Examples of pedestrians in color and thermal images. ...... 96

6.5 The relationship between the mean disparity and the height of an object. 101
6.6 Performance of different input data combinations, all using HOG features. ...... 103
6.7 Performance improvement by adding disparity-size and road horizon constraints. ...... 104
6.8 Performance of different input data combinations, all using CCF. ...... 105
6.9 A pedestrian is embedded in the shadow of a color image. ...... 106
6.10 An example thermal image with two pedestrians. ...... 107

7.1 Comparison between the traditional approach and end-to-end learning. 111
7.2 An example of image frame from the dataset. ...... 112
7.3 Histogram of steering angles in training data. ...... 114
7.4 The proposed CNN architecture for ...... 115
7.5 Histogram of error of predicted steering angles during test. ...... 117
7.6 An example frame with the ground truth angle, predicted angle and their respective projected path ...... 118
7.7 Visualization of the results from first two convolutional layers. ...... 119
7.8 An example of the disadvantage of frame by frame evaluation with 5 consecutive frames: the error in the middle frame is false ...... 121

8.1 Comparison between the traditional framework and end-to-end learning.126 8.2 The flowchart of test phase...... 131 8.3 The flowchart of training phase, using original data and augmented data.132

xiii 8.4 Example of original image and generated images given arbitrary camera poses. (a) Original image. A checkerboard pattern on a flat surface. (b) Generated image as if the camera is shifted left by 50 mm. (c) Generated image as if the camera is rotated right by 15.25 degrees. (d) Generated image as if the camera is shifted left by 50 mm and rotated right by 15.25 degrees...... 135 8.5 Camera calibration and ground surface estimation. (a) Selected points in the image taken by the center camera. (b) Cameras and selected points in the world coordinates...... 136 8.6 A virtual bicycle vehicle dynamics...... 138 8.7 Correction of vehicle’s position and orientation using vehicle trajectory tracking. (a) Ground truth and predicted trajectory. (b) Ground truth and predicted orientation. (c) Ground truth and predicted steering wheel angle...... 141 8.8 An example of cropped image frame from the dataset...... 143 8.9 The CNN structure used, slightly modified from NVIDIA’s PilotNet. 144 8.10 Our data collection system, including three forward facing cameras, a USB hub, a laptop and access to OBD-II port...... 145 8.11 Example frames under different weather or lighting condition. (a) Cloudy. (b) Shadowed. (c) Foggy. (d) Sunny...... 146 8.12 Example of original image and augmented images given arbitrary vehi- cle poses. (a) Original image. (b) Augmented image as if the vehicle is shifted right by 0.5 m. (c) Augmented image as if the vehicle is rotated left by 7 degrees. (d) Augmented image as if the vehicle is shifted right by 0.5 m and rotated left by 7 degrees...... 148

8.13 An example of the simulation result, produced by the CNN trained with data augmentation. (a) Overview of the trajectory in a test sequence. (b) Trajectory zoomed-in in the black rectangle in (a). (c) Trajectory zoomed-in in the black rectangle in (b). ...... 150
8.14 An example of failure. The vehicle is going out of lane to the right because another vehicle is changing lane, and lane markings are partially blocked. ...... 152
8.15 An example of failure. The vehicle is going out of lane to the right because of unclear lane markings. ...... 152

List of Abbreviations

ACF Aggregated Channel Features

CCF Convolutional Channel Features

CNN Convolutional Neural Network

FN False Negatives

FOV Field of View

FP False Positives

FPPI False Positives Per Image

FPPW False Positives Per Window

FPS Frames per Second

GPU Graphics Processing Unit

HOG Histograms of Oriented Gradients

LKAS Lane Keeping Assist System

LQR Linear Quadratic Regulator

MOT Multi-object Tracking

MR Miss Rate

ODE Ordinary Differential Equation

PCA Principal Component Analysis

PCANet PCA network

RBF Radial Basis Function

ROI Region of Interest

SLAM Simultaneous Localization and Mapping

SMA Simple Moving Average

SVM Support Vector Machine

TP True Positives

Chapter 1

Introduction

In this chapter, we first introduce the background and discuss the motivations of our work in Section 1.1. The major contributions of our work are summarized in Section 1.2. Finally, the organization of this dissertation is presented in Section 1.3.

1.1 Motivations

Road safety is an important topic. Data from the Insurance Institute for Highway Safety (IIHS) revealed that in 2012, red-light-running crashes caused around 133,000 injuries and 683 deaths on US roads [1]. These injuries and deaths may be reduced or avoided with the introduction of more advanced technologies, and many researchers are dedicated to the area of autonomous vehicles. Therefore, we believe that this topic is meaningful and important. Related to the topic of autonomous vehicles are cameras, which are common in our daily lives and are much cheaper than some other sensors. Vision-based systems are also intuitive, as humans use their eyes to understand

the surrounding environment. In addition, humans can easily interpret the information obtained from images or videos, which makes building manually labeled datasets easier. Therefore, we believe that the vision-based approach is reasonable. In addition to using some public datasets, we design and deploy our own data collection system to build our own datasets, especially when the public datasets are limited or not ideal. Object detection and recognition are important for understanding a road scene. Traffic signs, traffic lights, pedestrians, and many other objects on the road need to be detected and recognized to guide drivers or autonomous driving systems. Our projects witness the evolution of object detection and recognition in computer vision. Initially, hand-crafted features (e.g., HOG) proved their effectiveness in detecting objects with certain shapes or patterns. A classifier, such as SVM or AdaBoost, is often used upon the extracted features. Image processing is often used as a pre-processing or post-processing step, and certain assumptions are often made to improve the detector's performance. Later, researchers found a more generic way of detecting objects, without using hand-crafted features. It is called two-stage training. The first stage performs unsupervised training on all of the training data to determine the best method for extracting features, and the second stage performs supervised training to train the classifiers based on these features. After the two-stage training approach, the one-stage approach became popular again, but with an end-to-end learning Convolutional Neural Network (CNN) instead of hand-crafted features. The CNN takes raw images as input and outputs the classified labels. As the CNN is trained, it learns how to extract information from the raw images and how to classify them. The training is one stage and supervised, and there is no clear boundary between the feature extractor and the classifier in the model. At present, CNNs deliver state-of-the-art performance in object detection and recognition.

Besides object detection and recognition, we are also motivated to look at the lane keeping problem, which is an essential part of autonomous cars. A CNN is used here as well: it takes raw image frames as input and outputs the steering angles corresponding to the input frames, to keep the vehicle within the lane. This is a regression problem instead of a classification problem. A simulator is then built to provide augmented training data and a proper evaluation metric. Knowledge of 3D geometry in computer vision, vehicle dynamics, and vehicle trajectory tracking is also used in the simulator.

1.2 Summary of Contributions

We design and implement a group of systems for autonomous vehicles. Our contributions are listed as follows:

• Design and implement a traffic sign detection and recognition system. Traffic sign detection and recognition are important functions for autonomous vehicles. The detection process identifies the existence of traffic signs and their locations in an image, and the recognition process identifies the types of the detected signs. Our GPU-based traffic sign detection and recognition system is able to detect and recognize 48 traffic signs. The implementation features three stages: pre-processing, feature extraction and classification. A highly optimized and parallelized version of HOG+SVM is used. The system can process 27.9 frames per second with active pixels of 1,628 × 1,236 resolution, and with minimal loss of accuracy. Evaluated on the BelgiumTS dataset, the experimental results indicate that the detection rate is about 91.69% with false positives per window of 3.39 × 10⁻⁵ and the recognition rate is about 93.77%.

We emphasize our contributions in the following aspects:

– Our system is able to detect and recognize 48 traffic signs, with a good detection rate and recognition rate.

– We optimized and parallelized the computation of HOG on GPU, as well as some pre-processing steps and the deployed SVM classifier.

– Our system achieves real-time performance on high-resolution images.

• Design and implement two traffic light detection and recognition systems. Two traffic light detection and recognition systems are presented. The first system detects and recognizes red circular lights only, using image processing and SVM. Its performance is better than that of traditional detectors. The second system detects and classifies different types of traffic lights, including green and red lights in both circular and arrow forms. It combines computer vision and machine learning techniques. Color extraction and blob detection are used to locate the candidates, followed by the PCA network (PCANet) classifiers. The PCANet classifier consists of a PCANet and a linear SVM. Our experimental results suggest that the proposed method is highly effective for detecting both green and red traffic lights. We emphasize our contributions in the following aspects:

– For the first system, we demonstrate that detection using a fixed threshold ratio is not very effective and the SVM-based classification has much better performance.

– For the first system, we empirically add more parameters of a candidate to the SVM input, and this can achieve better performance.

– For the first system, we build a traffic light dataset from the original videos captured while driving on the streets.

– For the second system, we demonstrate that combining image processing and PCANet can help with detecting and recognizing various types of traffic lights, including green and red lights in both circular and arrow forms.

– For the second system, an online multi-object tracking technique is applied to overcome occasional misses, and a forecasting method is used to filter out false positives.

– For the second system, several additional optimization techniques are employed to improve the detector performance and to handle the traffic light transitions.

– For the second system, we build our own dataset of traffic lights from recorded driving videos, including circular lights and arrow lights in various directions.

• Design and implement a pedestrian detection system. Pedestrian detection is a critical feature for self-driving cars or advanced driver assistance systems. Our system consists of a thermal camera and a color stereo camera. Data received from multiple cameras are aligned using the trifocal tensor based on pre-calibrated parameters. In addition, candidates are generated using sliding windows at multiple scales. A reconfigurable detector framework is proposed, in which feature extraction and classification are two separate stages. The input to the detector can be the color image, disparity map, thermal data, or any combination of these. When applied to convolutional channel features, feature extraction uses the first three convolutional layers of a pre-trained convolutional

neural network, cascaded with an AdaBoost classifier. The evaluation results indicate that it significantly outperforms the traditional histogram of oriented gradients features. When combining the color and thermal images, the proposed detector can achieve a 9% log-average miss rate. We emphasize our contributions in the following aspects:

– We design and assemble a multi-spectral camera system mounted on a vehicle to collect data for pedestrian detection.

– We build a dataset for multi-spectral pedestrian detection from on-road driving data. These data contain many complex scenarios that are challenging for detection and classification.

– We propose a machine learning based method for pedestrian detection by combining stereo vision and thermal images. The evaluation results show satisfactory performance.

– An experimental dataset is built by labeling the data collected when driving on the city roads.

• Design and implement a lane keeping system. We present an end-to-end learning approach for obtaining the proper steering angle to maintain the car in the lane. The CNN model uses raw image frames as input and outputs the steering angles accordingly. The model is trained and evaluated using the comma.ai dataset, which contains the front view image frames and the steering angle data captured when driving on the road. Unlike the traditional approach, which manually decomposes the autonomous driving problem into technical components such as lane detection, path planning and

steering control, the end-to-end model can directly steer the vehicle from the front view camera data after training. It learns how to keep the car in the lane from human driving data. Further discussion of this end-to-end approach and its limitations is also provided. We emphasize our contributions in the following aspects:

– We present a working system for lane keeping using the end-to-end learning approach.

– We provide the evaluation results and discussion of this system. The need for building a simulator is discussed.

• Design and implement a simulator for the lane keeping system. In addition to the state-of-the-art end-to-end learning method that predicts the steering wheel angle for the purpose of staying in the lane, a simulator is built using image projection, vehicle dynamics and vehicle trajectory tracking, which can be helpful in both training and evaluation. The simulation results demonstrate the effectiveness and accuracy of the end-to-end learning method and the benefits of using the simulator. We emphasize our contributions in the following aspects:

– We describe the implementation details of building a simulator for vision-based autonomous lane keeping. Although many recent works exist on lane keeping algorithms, comparing and evaluating them is difficult. Built on real-world data, this simulator employs image projection, vehicle dynamics modeling, and vehicle trajectory tracking to predict vehicle movement and its corresponding camera views. The simulator can be used for both training and the evaluation of lane keeping algorithms.

– The end-to-end learning approach produces the proper steering angle from camera image data, with the aim of maintaining the self-driving vehicle in a lane. A highly effective end-to-end learning system is demonstrated using the aforementioned simulator for both training and evaluation. The CNN model trained with augmented data from the simulator performs significantly better than the model trained with recorded data only.

– We build a dataset for autonomous vehicle research. The dataset contains recorded video frames from three forward facing cameras (left, center, and right) as well as steering wheel angle and vehicle speed information.

1.3 Outline

This dissertation is organized as follows. Chapter 2 summarizes the background of autonomous vehicles, especially the computer vision and machine learning techniques related to this dissertation. Chapter 3 presents a GPU-based system for real-time traffic sign detection and recognition that can classify 48 traffic signs included in the library. Chapter 4 presents a method for the automatic detection of circular red traffic lights that integrates both image processing and support vector machine techniques. Chapter 5 presents a novel approach that combines computer vision and machine learning techniques for the accurate detection and classification of different types of traffic lights, including green and red lights in both circular and arrow forms. Chapter 6 presents a novel instrument for pedestrian detection by combining a thermal camera with a color stereo camera. Chapter 7 presents an end-to-end learning approach for obtaining the proper steering

angle to maintain the car in the lane. Chapter 8 presents the implementation of a simulator for the lane keeping system, using image projection, vehicle dynamics and vehicle trajectory tracking, which can be helpful for both training and evaluation. Chapter 9 draws the conclusions.

Chapter 2

Background

Carnegie Mellon University completed the first project involving autonomous vehicles in the US in 1995, which included autonomous driving from Pittsburgh, PA, to San Diego, CA. The vehicle was equipped with a computer, a camera, and a GPS. In 2004, the US Defense Advanced Research Projects Agency (DARPA) started a competition for autonomous vehicles, but none of the teams completed the 150-mile course. In 2005, five teams completed the DARPA challenge, and Stanford University's autonomous car, called Stanley, took first place. In 2007, the DARPA challenge involved a 60-mile course in an urban environment, and Carnegie Mellon University's autonomous car, called Boss, took first place. In 2016, Stanford University's autonomous car called Shelley ran on the track at a speed of nearly 120 mph. Nowadays, many vehicle manufacturers are developing their own autonomous vehicles, including Ford, Mercedes Benz, Volkswagen, Audi, and BMW. In addition, many IT companies have also joined this area, including Google, Uber, NVIDIA, and Tesla. For example, Google started a self-driving car project in 2009, which is now called Waymo. It claims that it drives more than 25,000 autonomous miles each week, mostly

on complex city streets. In other words, autonomous vehicles are being developed rapidly, in both their hardware and their software. This dissertation focuses on the computer vision and machine learning techniques used in this field, such as the detection and recognition of traffic signs, traffic lights and pedestrians, as well as lane keeping for self-driving cars. Many other topics not covered in this dissertation are also important, such as pixel-level segmentation, 3D reconstruction, and Simultaneous Localization and Mapping (SLAM).

2.1 Datasets

Machine learning techniques rely heavily on data. Datasets are often built using real-world data, with manually labeled ground truth. For example, the KITTI dataset [2–5] uses the autonomous driving platform Annieway to capture data from the real world. The sensors mounted on the car are cameras, a 360-degree Velodyne laser scanner and a GPS. The data are manually processed and are divided into several subsets, such as stereo, flow, object, tracking, and road. Furthermore, many datasets are built for specific tasks. For example, the Belgium Traffic Sign Dataset [6] and the German Traffic Sign Benchmark [7] are aimed at detecting and recognizing a group of European traffic signs in images. The Traffic Lights Recognition (TLR) public benchmarks [8] are for the detection of green or red circular traffic lights in images. The INRIA person dataset [9] and the Caltech Pedestrian Detection Benchmark [10] are for the detection of upright persons in images. The comma.ai dataset contains images captured from a forward facing camera, as well as vehicle status such as the speed, gear, and steering. It is used for end-to-end learning of the lane keeping functionality.

The datasets built from real-world data are extremely useful for researchers. However, collecting and labeling these data is tedious and time consuming, and the information obtained is limited to the types of sensors used. Therefore, real-world datasets often have limited amounts of data and focus on certain functionalities. On the other hand, some datasets are built using simulators or game engines, and they can provide much more information with little human effort. For example, a dataset generated from a computer game has been proposed for road scene segmentation [11]. The researchers claim that generating the annotation takes seven seconds per image on average, whereas a human annotator takes 90 minutes per image. In such datasets, the rich information about the 3D scene and object movements is helpful to researchers, and these data can be generated easily. However, whether the models trained on virtual data can be applied in the real world is questionable, as the images from game engines and the real world have inherent differences. Nevertheless, these virtual datasets provide solid alternatives for researchers to try out their new algorithms. An increasing number of datasets are becoming available as researchers keep collecting data and building their own datasets. Using the existing datasets reduces the time and effort needed to verify an algorithm, as collecting and labeling data are very time consuming. It also makes it easier to compare one's work with the existing work of other researchers who use the same dataset [12, 13], because works done on different datasets cannot be compared directly. However, sometimes researchers must collect their own data, if the existing datasets are not ideal or are not available. In addition, the newly built datasets can benefit other researchers.

2.2 Object detection and recognition

Object detection and recognition are important aspects of autonomous vehicles. This dissertation focuses on the detection and recognition of traffic signs, traffic lights and pedestrians. In addition, many other objects not covered in this dissertation can also be detected and recognized to guide drivers or autonomous driving systems, such as vehicles, road markings, and traffic cones.

2.2.1 Traffic sign

Several existing works focused on detecting and recognizing a particular class of traffic signs, such as stop signs or speed limit signs [14, 15]. These designs were optimized and can be highly efficient for detecting and recognizing a specific class of signs, but they are hardly useful for other types of signs. Other research papers attempted to detect and recognize multiple signs using common features such as shapes and colors [6, 16, 17]. Advanced image processing algorithms were proposed and analyzed thoroughly in order to obtain accurate results. However, these previous works primarily focused on the algorithms, and computing time was less of a concern, which prevents those designs from becoming practically useful. Some other works investigate the trade-off between accuracy and computing time [18–20]. Many of them claimed to achieve real-time performance at a high accuracy, but the datasets that they used varied. Without using the same dataset, it is unfair to compare the accuracy of different designs. It is also worth mentioning that the image resolution is another important factor that can affect the processing time as well as accuracy. A higher resolution image can reveal small objects in it. As a result, traffic signs can be detected and recognized even when they are far away,

thus leaving more time for drivers to respond.

2.2.2 Traffic light

Spot light detection [21, 22] is a method based on the fact that a traffic light is much brighter than the lamp holder, which is usually black. A morphological top-hat operator was used to extract the bright areas from gray-scale images, followed by a number of filtering and validating steps. In [23], an interactive multiple-model filter was used in conjunction with the spot light detection. More information was used to improve its performance, such as status switching probability, estimated position and size. The fast radial symmetry transform is a fast variation of the circular Hough transform, which can be used to detect circular traffic lights as demonstrated in [24]. Several other methods also combined the vehicle GPS information. A geometry-based filtering method was proposed to detect traffic lights using mobile devices at low computational cost [25]. The GPS coordinates of all traffic lights were presumably available, and a camera projection model was used. Mapping traffic light locations was introduced in [26] by using tracking, back-projection and triangulation. Google also presented a mapping and detection method in [27] which was capable of recognizing different types of traffic lights. It predicted when traffic lights should become visible with the help of GPS data, followed by classifying possible candidates. Geometric constraints and temporal filtering were then applied during the detection. The inter-frame information is also helpful for detecting traffic lights. A method that used a Hidden Markov Model to improve the accuracy and stability of the results was demonstrated in [28]. The state transition probability of traffic lights was considered, and information from several previous frames was used. Reference [29] introduced a traffic light detector based on template matching. The assumption was that the two

off lamps in the traffic light holder are similar to each other and neither of them looks similar to the surrounding background. Deep learning [30, 31] is a class of machine learning algorithms that uses many layers to extract hidden features. Unlike hand-crafted features such as Histograms of Oriented Gradients (HOG) features [9], it learns features from training data. PCANet is a simple, yet effective deep learning network proposed in [32]. Principal Component Analysis (PCA) is employed to learn the filter banks. It can be used to extract features of faces, handwritten digits and object images. It has been tested on several datasets and delivers surprisingly good results [32]. Using PCANet in traffic light detection or other similar applications has not been researched thus far. Integration of detection and tracking has been used in a few works related to autonomous vehicles. The trajectory of the traffic light was used to validate the theoretical result in [23]. A Kalman filter was employed to predict the traffic sign positions, and it was claimed that the tracking algorithm was able to improve the overall system reliability [33, 34]. Utilizing accumulated classifier decisions from a tracked speed limit sign, a majority voting scheme was proven to be very robust against accidental mis-classifications [14].

2.2.3 Pedestrian

The Caltech Pedestrian Detection Benchmark [10] has been widely used by researchers. It contains frames from a single vision camera with pedestrians annotated. Based on the CVPR2015 snapshot of the results on the Caltech-USA pedestrian benchmark, it was stated in [35] that at ~95% recall, the state-of-the-art detectors made ten times more errors than the human-eye baseline, which is still a huge gap that calls for research attention. Figure 2.1(a) shows some top quality detection methods presented in [36]. Overall, the detector performance has been improved as new

methods were introduced in recent years. Traditional methods such as Viola–Jones (VJ) [37] and Histogram of Oriented Gradients (HOG) [9] were often included as the baseline. A total of 44 methods were listed in [38] for the Caltech-USA dataset, and 30 of them made use of HOG or HOG-like features. Channel features [39] and Convolutional Neural Networks [40–42] also achieved impressive performance on pedestrian detection. The Convolutional Channel Features (CCF) [43], which combine a boosting forest model and low-level features from a CNN, are one of the top performers listed in the Caltech Pedestrian Detection Benchmark, as shown in Figure 2.1(b). Despite the progressive improvement of detection results on the datasets, color cameras still have many limitations. For instance, color cameras are sensitive to the lighting conditions. Most of these detection methods may fail if the image quality is impaired under poor lighting conditions. Thermal cameras can be employed to overcome some limitations of color cameras, because they are not affected by lighting conditions. Several research works using thermal data for pedestrian detection and tracking were summarized in [44]. Background subtraction was applied in [45] for people detection, since the camera was static. HOG features and Support Vector Machine (SVM) were employed for classification [46]. A two-layered representation was described in [47], where the still background layer and the moving foreground layer were separated. The shape cue and appearance cue were used to detect and locate pedestrians. In [48], a window based screening procedure was proposed for potential candidate selection. The Contour Saliency Map (CSM) was used to represent the edges of a pedestrian, followed by AdaBoost classification with adaptive filters. Assuming the region occupied by a pedestrian has a hot spot, candidates were selected based on thermal intensity value [49] and then classified by an SVM. In addition, both Kalman filter prediction and tracking were

(a) Benchmark results of different methods as reported in [36].

(b) Benchmark results of different methods as of May 2016.

Figure 2.1: Performance results from the Caltech Pedestrian Detection Benchmark.

incorporated for further improvement. A new contrast invariant descriptor [50] was introduced for far infrared images, which outperformed HOG features by 7% at 10⁻⁴ FPPW for people detection. The Shape Context Descriptor (SCD) was also used for pedestrian detection in [51], followed by an AdaBoost classifier. The HOG features were considered not suitable for this task because of the small size of the target, variations of pixel intensities and lack of texture information. Probabilistic models for pedestrian detection in far infrared images were presented in [52]. The method in [53] found the head regions at the initial stage, then confirmed the detection of a pedestrian by the histograms of Sobel edges in the region. For ADAS applications, several pedestrian detection research works were summarized in [54], including the use of color cameras and thermal cameras, as well as sensor fusion such as radar and stereo vision cameras. A benchmark for multispectral pedestrian detection was presented in [55] and several methods were analyzed. However, the color-thermal pairs were manually annotated and it is unclear if any automatic point registration algorithms were used. The combination of stereo vision cameras and a thermal camera was used in [56]. The trifocal tensor was used to align the thermal image with the color and disparity images. Candidates were selected based on disparity, and HOG features were extracted from the color, thermal and disparity images. Concatenated HOG features were then fed to a radial basis function (RBF) SVM classifier to obtain the final decision. Furthermore, more sophisticated applications or systems can be built upon pedestrian detection, such as pedestrian tracking across multiple driving recorders [57] and crowd movement analysis [58].

2.3 Lane keeping

Maintaining the vehicle within the lane is important for driving safety. The lane keeping assist system (LKAS) has been studied by many researchers. Lane keeping assist systems [59–62] are able to provide torque to maintain the vehicle within the lane, and often alert the driver with warning messages or sound. Cameras are usually used in such systems, and lane markings must be recognized. In addition, the systems also distinguish intended and unintended lane departure by utilizing more information such as blinker state, braking or steering angle. The LKAS needs to be accurate and robust for autonomous cars. Although industrial companies have achieved a lot in this area, they seldom publicize their technologies. It is therefore necessary for researchers to study the theories, algorithms and implementations of the LKAS. Deep reinforcement learning [63] was used in several research works on autonomous driving [64–66]. The systems learned the optimal policy function given the feedback of the reward. These systems went beyond the basic lane keeping feature, and were able to direct the vehicle to stay on a path and avoid collisions. The vehicle did not necessarily have to stay in a lane, and other vehicles on the road were often involved. The learning and evaluation were often done in a virtual simulator, because the learning requires rich ground truth information and needs to interact with the environment. Inverse reinforcement learning [67], on the other hand, was used to estimate the reward from expert demonstrations. For real-world systems, sensors and algorithms are employed to interpret the surrounding environment, without the rich ground truth information available in a simulator. The vision-based approaches use cameras because they are cost effective. An early research work demonstrated an autonomous vehicle, ALVINN [68], using a neural

network to find the proper direction. The input data came from a camera and a laser range finder, and the input resolution was very small. For large resolution color images, an end-to-end learning approach using a convolutional neural network was demonstrated in [69]. That system was designed for off-road mobile robots, not for autonomous vehicles on the road. An end-to-end learning approach using a convolutional neural network for self-driving cars was demonstrated in [70], and the network was trained and evaluated with the help of a simulator. The idea of building the simulator using image projection and vehicle dynamics was described, but there were few technical details. The network was later named PilotNet, and its effectiveness was validated and visualized in [71, 72]. Our previous work [73] followed this approach using a different dataset and network, and demonstrated the necessity of building the simulator in both the training and evaluation stages. Building the simulator requires knowledge of computer vision, vehicle dynamics and vehicle trajectory tracking. Most autonomous vehicle driving frameworks present a consistent decoupling between low-level control and path planning, while constraining the dynamics of the system to satisfy the vehicle's motion. Typically, the nominal path is obtained by optimization-based methods [74], sampling-based approaches [75] and notable searching algorithms [76]. In terms of the system dynamics and control, Rami et al. [77] proposed linear system dynamics and control for high-speed drifting. Galceran et al. [78] adopted a proportional-derivative (PD) feedback controller for torque-based steering. Approximating the non-linearity of the vehicle dynamics, DeSantis et al. [79] applied Jacobian linearization to the vehicle dynamics to design a path-tracking controller, but this approximation ignores the higher-order polynomial terms of the system dynamics, which can lead to problems in controlling a vehicle when the error is large.

Chapter 3

A GPU-Based Real-Time Traffic Sign Detection and Recognition System

This chapter presents a GPU-based system for real-time traffic sign detection and recognition which can classify 48 different traffic signs included in the library. The proposed design implementation has three stages: pre-processing, feature extraction and classification. For high-speed processing, we propose a window-based histogram of oriented gradients algorithm that is highly optimized for parallel processing on a GPU. For detecting signs of various sizes, the processing is applied at 32 scale levels. For more accurate recognition, multiple levels of support vector machines are employed to classify the traffic signs. The proposed system can process 27.9 frames per second of video with active pixels of 1,628 × 1,236 resolution. Evaluated on the BelgiumTS dataset, the experimental results show that the detection rate is about 91.69% with false positives per window of 3.39 × 10⁻⁵ and the recognition rate is about 93.77%.

3.1 Introduction

Traffic sign detection and recognition are important functions in an Advanced Driver Assistance System (ADAS). The detection process determines the existence of traffic signs in an image and their locations. Accurately detecting the signs also improves the recognition rate by filtering out redundant information while retaining the useful information in an image. Recognition identifies the signs from the detection result. In the real world, knowing the content of the sign is much more important than simply knowing the existence of a sign. Many existing works have been carried out to improve the accuracy of detection and recognition. In practice, processing time and hardware efficiency also need to be considered. A traffic sign detection and recognition system often contains three stages: pre-processing, detection and recognition. The pre-processing stage is optional, but it is usually included in a real-time system. It identifies and selects the regions of interest in the original image frame, which often contains a large number of pixels. Effectively, it reduces the computational tasks and improves the efficiency of the subsequent stages. The second stage detects and locates traffic signs in the selected regions produced by the pre-processing stage. In some systems, the detection stage also identifies the categories of the signs based on shapes, such as round, rectangle, triangle, etc. These categories are called super-classes. The final stage recognizes the detected signs and sends the processing results (i.e., the types of signs and their locations) to the display and control units of an ADAS system. Typically, feature extraction and pattern classification algorithms are computationally intensive. Much research has been done to optimize the algorithms themselves to improve the accuracy, but very little research has been focused on the implementation

to improve the efficiency. In this chapter, we propose to utilize the many-core architecture of a GPU to accelerate the traffic sign detection and recognition algorithms through massive parallel processing. The objective is to reduce the computing time considerably such that the GPU implementation can detect and recognize traffic signs in real time.

3.2 Traffic Sign Detection and Recognition System

3.2.1 System Overview

The proposed system contains three main stages: pre-processing, detection and recognition, as shown in Fig. 3.1. At first, we perform red and blue color extractions respectively and select the regions of interest (ROI). Next, Histograms of Oriented Gradients (HOG) [80] features are extracted on the grayscale image and a sliding window searches the image exhaustively to find the candidates using a linear Support Vector Machine (SVM). Color-based HOG detectors are then applied to these candidates to eliminate false positives, followed by a rectangle grouping operation to locate the detected traffic signs. Finally, the detected signs are delivered to a cascade classifier which contains several linear SVMs. The recognized traffic sign is highlighted with a green rectangle on the image. Furthermore, a standard image of the identified class of traffic sign, scaled to the same size, is placed next to the rectangle, which is used to indicate the actual position and class of the sign. For the proposed system, the BelgiumTS dataset is employed for both training and testing. Our system is able to detect and recognize 48 classes of traffic signs selected from the BelgiumTS dataset [81], as shown in Fig. 3.2. These signs have an aspect ratio of 1:1 with red or blue colors on them.

Figure 3.1: Three stages in our proposed system.

Although HOG and SVM have been commonly used for detecting and recognizing objects, it is still challenging to find a good balance between accuracy and efficiency. In order to reduce the computing latency, we employ linear kernel SVMs in our implementation. In order to obtain better accuracy, we use multiple HOG features and SVMs in our system, as shown in Fig. 3.1.
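
As a rough illustration of this three-stage flow, the following C++ outline sketches how the stages could be wired together. The function names and types are placeholders rather than our actual implementation, and each stage is only stubbed out; the real steps are described in Sections 3.2.2 to 3.2.4.

    // Schematic outline of the three-stage pipeline; names are placeholders.
    #include <opencv2/opencv.hpp>
    #include <vector>

    struct Detection { cv::Rect box; int signClass; };

    // Stage 1 (Section 3.2.2): red/blue color extraction -> candidate ROIs.
    static std::vector<cv::Rect> selectColorROIs(const cv::Mat&) { return {}; }
    // Stage 2 (Section 3.2.3): sliding-window HOG + linear SVMs + grouping.
    static std::vector<cv::Rect> detectSigns(const cv::Mat&,
                                             const std::vector<cv::Rect>&) { return {}; }
    // Stage 3 (Section 3.2.4): cascade of SVMs picking one of the 48 classes.
    static std::vector<Detection> recognizeSigns(const cv::Mat&,
                                                 const std::vector<cv::Rect>&) { return {}; }

    std::vector<Detection> processFrame(const cv::Mat& frame)
    {
        std::vector<cv::Rect> rois  = selectColorROIs(frame);
        std::vector<cv::Rect> signs = detectSigns(frame, rois);
        return recognizeSigns(frame, signs);
    }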

3.2.2 Pre-processing

Color and shape information are commonly used as features of traffic signs. Although road images often contain objects whose color and shape information is similar to that of traffic signs, it is still a simple yet effective way to use such information to identify the ROI. We perform color extraction using an adaptive threshold method proposed in [82]. By using red color enhancement, we obtain an image whose pixel

Figure 3.2: 48 classes of traffic signs can be detected and recognized in our system.

value fR is computed as

fR = max(0, min(xR − xG, xR − xB) / s) (3.1)

s = xR + xG + xB (3.2)

where xR, xG and xB are the pixel values of the red, green and blue channels, respectively. The global threshold is then set to µ + 4 · σ, where µ is the mean and σ is the standard deviation of the red values of the original image pixels. Applying this threshold to

the image results in a binary image IR which is used in the following processing steps. We also perform blue color enhancement and thresholding using the same method and obtain a blue-color-enhanced binary image IB. Fig. 3.3 shows an example of blue color enhancement.

Figure 3.3: An example of color enhancement.

Next, we find contours in the binary images using the algorithm in [83] and then place a bounding box around the contours of each object. Small rectangles whose width or height is less than 32 pixels are ignored to minimize the interference of small objects and color fragments in the image. Bounding boxes that have similar sizes and locations are combined to avoid overlapping. Fig. 3.4 shows an ROI selected from the original image after pre-processing.
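
For concreteness, the following C++/OpenCV sketch shows one way the red-channel pre-processing described above could be written (the blue channel is analogous). It is a minimal illustration rather than our actual code: the statistics for the µ + 4σ threshold are taken over the enhanced red image here, whereas the text's "red values" could also be read as the raw R channel, and the final merging of overlapping boxes is omitted.

    // Sketch of the pre-processing stage for the red channel (Eqs. 3.1-3.2,
    // adaptive threshold, contour bounding boxes). Not the original code.
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Rect> redColorROIs(const cv::Mat& bgr)
    {
        std::vector<cv::Mat> ch;
        cv::split(bgr, ch);                              // ch[0]=B, ch[1]=G, ch[2]=R
        cv::Mat b, g, r;
        ch[0].convertTo(b, CV_32F);
        ch[1].convertTo(g, CV_32F);
        ch[2].convertTo(r, CV_32F);

        cv::Mat s = r + g + b;                           // Eq. (3.2)
        cv::Mat rg = r - g, rb = r - b, num, fR;
        cv::Mat denom = s + 1e-6f;                       // small epsilon avoids /0
        cv::min(rg, rb, num);                            // min(xR - xG, xR - xB)
        cv::divide(num, denom, fR);                      // Eq. (3.1)
        fR = cv::max(fR, 0.0);

        cv::Scalar mu, sigma;                            // global threshold mu + 4*sigma
        cv::meanStdDev(fR, mu, sigma);
        cv::Mat IR = fR > (mu[0] + 4.0 * sigma[0]);      // binary image I_R (CV_8U)

        std::vector<std::vector<cv::Point>> contours;    // contours -> bounding boxes
        cv::findContours(IR, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Rect> rois;
        for (const auto& c : contours) {
            cv::Rect box = cv::boundingRect(c);
            if (box.width >= 32 && box.height >= 32)     // drop small fragments
                rois.push_back(box);
        }
        return rois;                                     // overlapping boxes still need merging
    }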

3.2.3 Traffic Sign Detection

In many cases, the selected ROI from pre-processing stage often contains no traffic sign. In order to provide valid inputs to the classification stage, at first traffic signs must be detected accurately. False positives need to be eliminated as much as possible. Applying the HOG method, we compute the HOG features on ROI at different scales and then use a sliding window to search the entire ROI to find traffic signs.

26 Figure 3.4: Selecting ROI from the original image.

The HOG features can be computed from an RGB image or a grayscale image. For an RGB image, horizontal and vertical gradients are computed in the three channels for red, green and blue respectively, and only those that have the maximal magnitude compared with the other two channels are selected for HOG processing. Thus its computational workload is three times that of a grayscale image. We first convert the original RGB image to a grayscale image IGRAY and use it to compute the HOG features that are fed to a linear SVM to determine if there are traffic signs in the image. Although most of the existing work also applied HOG to a grayscale image for traffic sign detection, this approach had a very high false positive rate. In order to reduce the false positive rate, our system also extracts the HOG features from the red image IR and the blue image IB, but only on the frames where the detection is reported positive on IGRAY. Two more SVMs are trained for the red and blue images respectively to eliminate some false positive frames. In addition, these SVMs also classify the detected traffic signs into

Table 3.1: HOG parameters in our system

Parameter          Value
Window Size        32 by 32 pixels
Block Size         8 by 8 pixels
Cell Size          8 by 8 pixels
Window Stride      8 by 8 pixels
Block Stride       8 by 8 pixels
Scaling factor     1.1
Levels             32

several super-classes, such as red circle, red triangle, blue circle, etc. The HOG parameters in our system are shown in Table 3.1. The window size is fixed, but the size of the traffic sign in an image is unknown. Thus, the original image has to be scaled to many different levels, and HOG feature extraction and classification are then performed at each level. The size of the image at each level, Sl, is computed as

Sl = S0/fl (3.3)

where S0 is the original image size, l is the level number and fl is the level scaling factor defined as

fl = 1.1^(l−1) (3.4)

In our design, 32 scaling levels are applied. Thus our system is able to detect traffic signs sized from 32 by 32 pixels up to 614 by 614 pixels. As shown in the central figure of Fig. 3.5, the same traffic sign is detected by multiple windows at different positions and also at different scale levels. To avoid overlapping, we perform a grouping operation that combines these detected traffic signs at the same location into a single box as shown in Fig. 3.5.

Figure 3.5: Grouping detected windows.

3.2.4 Traffic Sign Recognition

The final step of our design is traffic sign recognition. The SVM method is applied to classify the detected traffic signs into the 48 classes listed in Fig. 3.2. Each of the final detected windows is first classified by the SVMs mentioned in 3.2.3 to determine and confirm its category. Once its category is determined, it is classified by a multi-class SVM within that category. The SVMs are trained using k-fold cross-validation to improve the accuracy. It is also worth mentioning that we use the BelgiumTSC dataset to train the SVMs that classify the different classes of traffic signs in each category.

3.3 Parallelism on GPU

Since the pre-processing and HOG algorithms are complex and require extensive computations, in this section we describe the GPU-based acceleration. Pre-processing is a typical point operation, which is well suited for GPU implementation. The HOG computation is more complicated, and we develop several special techniques to handle it. There is a GPU version of HOG in the OpenCV library that accelerates the computation significantly compared to the CPU version. However, we find that there is still room to improve its efficiency. As mentioned in 3.2.3, the HOG features need to be computed at many different scaling levels of the original image, and the gaps between levels can be reduced or eliminated. Once the input data of each level is prepared, there is no data dependency between different levels during the HOG computation. In the OpenCV implementation, each level stalls until the computation of the previous level is done to ensure data synchronization between kernels, as shown in Fig. 3.6. Such stalls are unnecessary and can be avoided by using CUDA streams. As illustrated in Fig. 3.7, kernels can run in multiple CUDA streams at the same time and can be synchronized within a stream without affecting the others. By using CUDA streams, we reduce the gaps between levels significantly and thus improve the efficiency of the HOG computation.
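The idea is sketched below in CUDA C++: the per-level kernels are distributed over a small pool of streams so that levels no longer wait on one another. The kernel itself and the device buffers are placeholders; only the stream handling reflects the scheme described above.

#include <cuda_runtime.h>

// Placeholder kernel: computes HOG block histograms for one pyramid level.
__global__ void computeBlockHistograms(const unsigned char* image,
                                       float* histograms,
                                       int width, int height) {
    // Per-block histogram computation omitted in this sketch.
}

// Launch the per-level kernels on separate CUDA streams so that levels
// overlap instead of stalling on one another.
void launchAllLevels(unsigned char** d_levels, float** d_histograms,
                     const int* widths, const int* heights, int numLevels) {
    const int kNumStreams = 4;
    cudaStream_t streams[kNumStreams];
    for (int i = 0; i < kNumStreams; ++i)
        cudaStreamCreate(&streams[i]);

    dim3 block(16, 16);
    for (int l = 0; l < numLevels; ++l) {
        dim3 grid((widths[l] + 15) / 16, (heights[l] + 15) / 16);
        // Levels are distributed round-robin across the streams; kernels in
        // different streams may execute concurrently.
        computeBlockHistograms<<<grid, block, 0, streams[l % kNumStreams]>>>(
            d_levels[l], d_histograms[l], widths[l], heights[l]);
    }
    for (int i = 0; i < kNumStreams; ++i) {
        cudaStreamSynchronize(streams[i]);  // synchronize only once, at the end
        cudaStreamDestroy(streams[i]);
    }
}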

Figure 3.6: Normal CUDA kernel launches.

Figure 3.7: CUDA kernel launches using CUDA streams.

For better performance, the GPU version of HOG in OpenCV is highly optimized for data re-use. The image is divided into many blocks, and the block histograms are computed only once, even though a block can belong to multiple windows. When extracting the HOG feature of a window, we need to look up the already computed block histograms and line them up. However, after we adjust the detection windows in our system, the locations and sizes of those windows change and their HOG features need to be recomputed. Moreover, those windows can be anywhere in the image, so it is impossible to reuse the block histograms. Computing the HOG features of those windows is inefficient even with the previous GPU design, because there are gaps between windows and the windows cannot be massively parallelized. In order to solve this problem, we propose a window-based HOG solution on the GPU. All windows are extracted and stacked together to form an image whose width is the window width and whose height is the window height multiplied by the number of windows. The newly constructed image is then sent to the GPU for block histogram computation. As a result, the HOG computation for multiple windows runs in parallel on GPU threads. Furthermore, we optimize this method by filtering out blocks that cross two windows, since these blocks are not useful. With our parameter settings, there are 9 blocks in a window and 3 blocks that cross two windows. By filtering out these cross-window blocks, the total computation is reduced by 25%.
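A minimal C++/OpenCV sketch of the window-stacking step is shown below, assuming 32 × 32 windows that lie inside the image; the function name is ours, and the blocks of the stacked image that straddle two window bands would simply be discarded after the GPU pass.

#include <opencv2/imgproc.hpp>
#include <vector>

// Stack all adjusted detection windows into a single tall image (one window
// per 32-pixel row band) so that their block histograms can be computed in
// one GPU pass instead of one launch per window.
cv::Mat stackWindows(const cv::Mat& gray, const std::vector<cv::Rect>& windows)
{
    const cv::Size winSize(32, 32);
    cv::Mat stacked(winSize.height * static_cast<int>(windows.size()),
                    winSize.width, gray.type());
    for (size_t i = 0; i < windows.size(); ++i) {
        cv::Mat patch;
        // Windows may have arbitrary sizes after grouping; resize to 32x32.
        cv::resize(gray(windows[i]), patch, winSize);
        patch.copyTo(stacked.rowRange(static_cast<int>(i) * winSize.height,
                                      static_cast<int>(i + 1) * winSize.height));
    }
    return stacked;
}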

3.4 Experimental Results

The proposed traffic sign detection and recognition algorithms are evaluated on a Tesla K20 GPU platform. The pre-processing stage on the GPU takes about 13 to 17 ms. The detection and recognition stages account for most of the processing time. At first, we compare the HOG computing time on the CPU and GPU at each scaling level. As shown in Fig. 3.8, the speedup from GPU acceleration is significant when the scaling level is small. The original size of the test image is 1,628 by 1,236 pixels, and the parameter settings are as listed in Table 3.1. The OpenCV library is employed for comparing the HOG computing time on CPU and GPU.

Figure 3.8: HOG computing time on CPU and GPU.

Secondly, we test our optimized GPU implementation using 2000 images from the BelgiumTS dataset. Each image is 1,628 by 1,236 pixels. The total execution time of all three stages is compared between the original OpenCV HOG GPU version and our optimized version. Initialization time, such as reading images and SVMs, is ignored. Post-processing time, such as recording and displaying results, is also ignored. Fig. 3.9 shows the total execution time for each frame when using the OpenCV GPU code for HOG computation. Fig. 3.10 shows the execution time of our optimized GPU code. We can see that the overall computing time is reduced and some peaks are suppressed. The average frame rate of the OpenCV version on the GPU is 21.3 fps. Our optimized GPU code achieves an average frame rate of 27.9 fps, which is about 31% faster than the OpenCV version.

Figure 3.9: The total processing time when HOG is computed using OpenCV on GPU.

Finally, we evaluate the detection rate and classification rate of the proposed system using the BelgiumTS dataset [6]. Each test image is 1,628 by 1,236 pixels. We test 1918 images and the detection rate is 91.69%. We also measure the false positive rate using background images provided by the BelgiumTS dataset.

Figure 3.10: The total processing time when using our optimized GPU code.

Based on our HOG parameters described in Table 3.1, we extract over 20 million windows from those images at different scaling levels. The number of false positives is 684. Thus the False Positives Per Window (FPPW) is 3.39 × 10^-5. Similarly, we use the BelgiumTSC dataset to evaluate the classification rate. Each image in the BelgiumTSC dataset contains one traffic sign with some background. We resize each image to our window size of 32 by 32 pixels before computing HOG features and performing SVM classification. We use 4,492 images for training and 2,520 images for testing. All training and test images are from the BelgiumTSC dataset, and the classification rate is 93.77%.

3.5 Conclusions

This chapter presents a real-time traffic sign detection and recognition system on the GPU. It is capable of detecting and recognizing 48 classes of traffic signs of various sizes in each image frame. The detection rate is about 91.69% and the recognition rate is about 93.77%. The system can process 27.9 fps video with active pixels of 1,628 × 1,236 resolution. Since each frame is processed individually, no information from previous frames is required. As part of our future work, information from previous frames will be considered for tracking traffic signs, which is expected to further improve the detection accuracy.

Chapter 4

Automatic Detection of Traffic Lights Using Support Vector Machine

Many traffic accidents at intersections are caused by drivers who miss or ignore the traffic signals. In this chapter, we present a new method for automatic detection of traffic lights that integrates both image processing and support vector machine techniques. An experimental dataset with 21299 samples is built from original videos captured while driving on the streets. When compared to traditional object detection and existing methods, the proposed system provides significantly better performance, with 96.97% precision and 99.43% recall. The system framework is extensible, so users can introduce additional parameters to further improve the detection performance.

4.1 Introduction

Automatic detection of traffic lights should be an essential feature of advanced driver assistance systems and future self-driving vehicles. It is an important road safety issue today that many traffic accidents at intersections are caused by drivers running red lights. Recent data from the Insurance Institute for Highway Safety (IIHS) show that in 2012, on US roads, red-light-running crashes caused about 133,000 injuries and 683 deaths [1]. The introduction of automatic traffic light detection, especially red light detection, therefore has important social and economic impacts. Because road images often contain a complex background and many objects, it is a challenge to develop an algorithm that detects traffic lights precisely. Most of the existing algorithms are based on color, shape and gradient information, but their detections are not very reliable. Since the traffic lights themselves do not have sufficient features, traditional feature-based object detection algorithms also do not work well. In this chapter, we propose a new method that combines computer vision and machine learning techniques in conjunction with inter-frame information. While driving on the road, data have been collected by recording video with a camera mounted behind the front windshield. The data sets are then labeled for training and evaluation of the proposed algorithm. Our experimental results suggest the proposed method is highly effective for detecting red traffic lights. The rest of the chapter is organized as follows. In Section 4.2, we propose an improved method that combines computer vision and machine learning techniques for traffic light detection. Data collection and performance evaluation are presented in Section 4.3, followed by conclusions in Section 4.4.

4.2 Proposed Method for Traffic Light Detection

4.2.1 Locating candidates based on color extraction

In this chapter, we focus on the detection of red circular traffic lights only. Green or yellow lights can be detected by applying similar techniques. At first, we apply color extraction to locate the candidate traffic lights. The images are converted to the hue, saturation, and value (HSV) color space, and the red color is extracted based on the hue values. A flood-filling method is applied for region labeling and blob extraction, and the resulting blobs are considered potential candidates. In many previous works, a variety of morphological filtering techniques were applied to eliminate some candidates for the purpose of reducing false positives. However, any filtering has a possibility of missing the true traffic lights, because the traffic lights are not always clear due to their size in the image and the obscure background. Thus we simply perform an aspect ratio check and keep all blobs that pass the check as candidates for potential traffic lights. The objective of eliminating false positives is addressed in a later part of the proposed method.

4.2.2 Traffic light detection using template matching

Once the candidates are located, we apply a template matching method to detect the traffic lights [29]. Here we consider the traditional and most popular design of a traffic light, in which the red, yellow and green lights are round and vertically positioned in that order. For horizontally positioned traffic lights, the same method can be applied with a few modifications. Typically, only one of the three colored lights is turned on at a time. In the previous step, we located potential candidates of the red lights in the image. When the red light is on, the yellow and green lights are off, and these two off lights look very similar. So we use the yellow light area ROI_ref as the template, shown as the yellow rectangular area in Fig. 4.1. Similarly, the green light area is highlighted as the green rectangular area. We can perform template matching in the green rectangular area with ROI_ref. In fact, we purposely make the green rectangular area slightly larger than the yellow one, which provides more accurate results for template matching. The minimal value among the template matching results is recorded as R_in and the corresponding area is recorded as ROI_in. For the three vertical traffic lights, the assumption is that the two off lights are almost identical and that there should not be any similar objects in the neighboring area. The background areas around the traffic light bounding box are highlighted as blue rectangular areas in Fig. 4.1. Using the same reference ROI_ref as the template, we perform template matching in the blue rectangular areas. The smallest value of the template matching results is R_out and its corresponding area is ROI_out. Since the yellow and green lights are both off, they appear almost identical and the R_in value is small. In contrast, the R_out value is often very large. We can set a threshold value p: if the ratio R_in/R_out < p, a traffic light is detected, and otherwise not. This template matching method does not require high resolution images. It works well even if the candidates are small because the traffic lights are a long way away. In addition, it is effective at eliminating some false positives. As an improvement to the detection method, additional constraints were considered in [29], such as requiring the mean and variance of the pixel values at the positions of the two off lights to be smaller than a certain threshold, because those regions should be dark.

Figure 4.1: Applying the traffic light detector on a candidate.
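The sketch below shows one way the R_in and R_out scores could be computed with cv::matchTemplate; the TM_SQDIFF_NORMED measure is chosen so that smaller values indicate better matches, and the struct and rectangle arguments are illustrative rather than taken from our implementation.

#include <opencv2/imgproc.hpp>
#include <algorithm>
#include <vector>

// Compute R_in (best match of the off-light template inside the expected
// green-light area) and R_out (best match in the background strips around
// the candidate). With TM_SQDIFF_NORMED, smaller values mean better matches.
struct MatchScores { double rIn; double rOut; };

MatchScores scoreCandidate(const cv::Mat& frame,
                           const cv::Rect& roiRef,     // yellow-light area
                           const cv::Rect& roiGreen,   // enlarged green area
                           const std::vector<cv::Rect>& backgroundAreas)
{
    cv::Mat tmpl = frame(roiRef);
    MatchScores s{1.0, 1.0};

    cv::Mat result;
    cv::matchTemplate(frame(roiGreen), tmpl, result, cv::TM_SQDIFF_NORMED);
    cv::minMaxLoc(result, &s.rIn, nullptr, nullptr, nullptr);  // minimum = best

    double best = 1.0;
    for (const cv::Rect& bg : backgroundAreas) {
        cv::matchTemplate(frame(bg), tmpl, result, cv::TM_SQDIFF_NORMED);
        double m;
        cv::minMaxLoc(result, &m, nullptr, nullptr, nullptr);
        best = std::min(best, m);
    }
    s.rOut = best;
    return s;                // a light is reported if s.rIn / s.rOut < p
}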

However, the assumption that R_out is much larger than R_in is not always true. For example, when the traffic lights are small or unclear in the image, the off lights are seen as dark regions. If the background is also dark, such as trees or buildings, the template matching score for the background, R_out, can also be very small. Then R_in/R_out is likely to be above the given threshold p, and as a result the true traffic light is missed. Additional constraints on the mean and variance of the pixel values do not solve this problem either. In addition, it is difficult to choose a universal value for the threshold p. Thus, we propose an improved method that is integrated with machine learning algorithms.

4.2.3 An improved method using SVM

Due to the varied backgrounds and object sizes in the image, it is difficult to manually set a threshold for the R_in/R_out ratio obtained from template matching. So we propose to build a support vector machine (SVM) that can automatically find the optimal settings for the parameters (or features) extracted from the image through machine learning. This requires a large dataset of both positive and negative samples for training

the SVM. For each candidate, we use the R_in and R_out values in conjunction with the pixel

values m_ref, m_in and m_out to form a vector, where m_ref, m_in and m_out are the mean pixel

values of the areas ROI_ref, ROI_in and ROI_out, respectively.

Each vector becomes a sample S_1 for the SVM.

S_1 = {R_in, R_out, m_ref, m_in, m_out}    (4.1)

The SVM is able to automatically adjust its parameters through the training process. As demonstrated in Section 4.3, using the SVM to find the parameters yields a huge leap in detection accuracy compared to manually setting the threshold p. However, we discover that the bounding box of a candidate by itself is not sufficient to determine whether it is a traffic light or not. If candidates are cut out from the original image, sometimes even a human can hardly decide. Fig. 4.2 shows some examples of candidates extracted from road images. The candidates in the first row have a dark background and those in the second row have a bright background. As we can see, it is difficult to identify a traffic light when the background is dark, while it is easier to spot a traffic light against a bright background. Fig. 4.3 gives an example with both scenarios: the left traffic light has a bright background while the right one has a dark background. We also find that the brake lights of black vehicles, which are usually red, are a major contributor to false positives.

Figure 4.2: Are they traffic lights or not? Dark background on the top and bright background at the bottom.

In order to improve the detection performance, we propose to add the location information of the candidate bounding box as additional input to the SVM.

Figure 4.3: The left traffic light has a bright background and the right traffic light has a dark background.

The idea is that the size and ratio of a traffic light, as well as its location, should be consistent among all training samples. For instance, a traffic light cannot be located as low as the vehicle brake lights shown in Fig. 4.2. Each bounding box B has four parameters,

B = {x, y, w, h}    (4.2)

where (x, y) are the coordinates of the upper-left corner (or origin) of the bounding box, and w and h are its width and height, respectively. Intuitively, it is impossible for traffic lights to appear down on the road surface, therefore y should be within a certain range, and so should x. There are implicit relationships between the size and position of a traffic light in an image. Again, it is difficult to capture these relationships explicitly through image processing. We propose to introduce the additional information of the bounding box

B by including it in the SVM input sample. Thus we form a new vector S_2 for

each candidate, where

S_2 = S_1 ∪ B = {R_in, R_out, m_ref, m_in, m_out, x, y, w, h}    (4.3)

As demonstrated later in Section 4.3, the expanded SVM vector shows a significant improvement in detection performance. It is worth noting that the proposed method can be expanded further by including more parameters and features in the SVM. The proposed method is a framework that utilizes the SVM as a machine learning tool to automatically find optimal parameter settings for traffic light detection.
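As an illustration of this framework, the following sketch assembles S_2 feature vectors and trains an RBF-kernel SVM with OpenCV's ml module; the Candidate container and the use of trainAuto for cross-validated parameter search are choices made for the sketch, not details of our implementation.

#include <opencv2/ml.hpp>
#include <vector>

// One traffic-light candidate with its template-matching scores, mean pixel
// values and bounding box (hypothetical container for the S_2 features).
struct Candidate {
    float rIn, rOut, mRef, mIn, mOut;
    cv::Rect box;
    int label;   // 1 = traffic light, 0 = not a traffic light
};

cv::Ptr<cv::ml::SVM> trainDetector(const std::vector<Candidate>& samples)
{
    cv::Mat features(static_cast<int>(samples.size()), 9, CV_32F);
    cv::Mat labels(static_cast<int>(samples.size()), 1, CV_32S);
    for (int i = 0; i < static_cast<int>(samples.size()); ++i) {
        const Candidate& c = samples[i];
        float row[9] = {c.rIn, c.rOut, c.mRef, c.mIn, c.mOut,
                        static_cast<float>(c.box.x), static_cast<float>(c.box.y),
                        static_cast<float>(c.box.width),
                        static_cast<float>(c.box.height)};
        cv::Mat(1, 9, CV_32F, row).copyTo(features.row(i));
        labels.at<int>(i, 0) = c.label;
    }
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    // trainAuto performs k-fold cross-validation over C and gamma.
    svm->trainAuto(cv::ml::TrainData::create(features, cv::ml::ROW_SAMPLE, labels));
    return svm;
}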

4.3 Data Collection and Performance Evaluation

As an experimental setup, we mount a camera behind the front windshield and record videos while driving on the road. We extract traffic light candidates using the process discussed in 4.2.1 and obtain a data set with 21299 candidates from 2706 images. These images are extracted from actual videos covering four independent instances of circular traffic lights. Each image has a resolution of 1920-by-1080 pixels. We compare these candidates with manually labeled ground truth and find that there are 4526 true traffic lights and 16773 negative candidates. This newly constructed dataset is used to evaluate the proposed detection method. In order to compare the performance among different methods, we use two metrics, precision and recall, where

precision = true positives / (true positives + false positives)    (4.4)

Figure 4.4: R_in/R_out values for true positive candidates (left) and true negative candidates (right). The Y-axis ranges from 0 to 2000.

recall = true positives / (true positives + false negatives)    (4.5)

The dataset with 21299 candidates is shuffled randomly. When training the proposed SVM, half of the candidates are used as training data and the remaining half are used for testing. There is no overlap between the training and test data.

We first evaluate the ratio R_in/R_out and its feasibility for detecting traffic lights in the image. Fig. 4.4 shows that the ratio values for true negative candidates are generally larger than those of true positive candidates. But if we zoom into the R_in/R_out values with the Y axis from 0 to 20, as in Fig. 4.5, we can see that many true negative candidates also have small R_in/R_out values. Therefore, choosing a fixed threshold p is not an effective method to separate the positive and negative candidates, because some true positive candidates would be classified as negative and vice versa.

Table 4.1 lists the evaluation results based on R_in/R_out for different p values. TP, FP, TN and FN stand for True Positives, False Positives, True Negatives and False Negatives, respectively. The results show that it is difficult to balance precision and recall with a fixed threshold value. Thus we opt to use an SVM for classification based on the R_in and R_out values.

Figure 4.5: R_in/R_out values for true positive candidates (left) and true negative candidates (right). The Y-axis ranges from 0 to 20.

Table 4.1: Evaluation results based on R_in/R_out for different p values

Threshold | Precision | Recall | TP | FP | TN | FN
p = 1.5 | 47.52% | 96.51% | 4368 | 4824 | 11949 | 158
p = 1.0 | 60.90% | 89.79% | 4064 | 2609 | 14164 | 462
p = 0.5 | 78.48% | 76.14% | 3446 | 945 | 15828 | 1080
p = 0.2 | 95.95% | 45.01% | 2037 | 86 | 16687 | 2489

Table 4.2 shows the performance of different detection methods. We use the classic object detection method with Haar-like features and the AdaBoost algorithm as a baseline, which yields 76.89% precision and 73.40% recall. If we set the threshold p = 0.5 for the R_in/R_out ratio, the detection performance is only slightly better than the baseline.

Next, an SVM with a radial basis function (RBF) kernel is trained with {R_in, R_out} as input for traffic light classification. Table 4.2 shows that the SVM improves recall by 15.14% but precision by only 2.28%, compared with using a fixed threshold.

As proposed in Section 4.2, when we add the pixel values m_ref, m_in and m_out, in addition to the R_in and R_out values, to form the SVM input vector S_1, the detection performance improves significantly to 89.09% precision and 96.60% recall. Furthermore, the origin and geometry information {x, y, w, h} of the bounding box is added to form S_2. The improved method achieves 96.97% precision and 99.43% recall, which is reasonably accurate and reliable.

Table 4.2: Evaluation results: precision and recall

Detection method | Precision | Recall
Haar, AdaBoost | 76.89% | 73.40%
R_in/R_out, p = 0.5 | 78.48% | 76.14%
{R_in, R_out}, SVM | 80.76% | 91.28%
S_1, SVM | 89.09% | 96.60%
S_2, SVM | 96.97% | 99.43%

Fig. 4.6 shows an example of detected traffic lights in an image. Although their backgrounds are drastically different, both traffic lights are detected and marked in the image. Our system is implemented in C++ and executed on an Intel i5-3570K processor at 3.4 GHz. The processing time for each image frame is approximately 60 ms to 90 ms. For a real-time implementation, we are currently migrating the design to an FPGA platform.

4.4 Conclusions

In this chapter, we propose a new method that can detect traffic lights accurately and reliably. Color extraction is applied to locate the candidates. A template matching technique is applied to provide quantitative information about a traffic light and its surrounding areas. We also demonstrate that detection using a fixed threshold ratio is not very effective and that the SVM-based classification has much better performance. In addition, we empirically add more parameters of a candidate to the SVM input, which achieves the best performance of 96.97% precision and 99.43% recall. As an additional contribution, we build a traffic light dataset with 21299 samples from original videos captured while driving on the streets. This dataset can be used by others for computer vision and machine learning research.

Figure 4.6: Both traffic lights are detected.

Chapter 5

Accurate and Reliable Detection of Traffic Lights Using Multi-Class Learning and Multi-Object Tracking

Automatic detection of traffic lights is of great importance to road safety. This chapter presents a novel approach that combines computer vision and machine learning techniques for accurate detection and classification of different types of traffic lights, including green and red lights in both circular and arrow forms. Initially, color extraction and blob detection are employed to locate the candidates. Subsequently, a pre-trained PCA network is used as a multi-class classifier to obtain frame-by-frame results. Furthermore, an online multi-object tracking technique is applied to overcome occasional misses, and a forecasting method is used to filter out false positives. Several additional optimization techniques are employed to improve the detector performance and handle the traffic light transitions. When evaluated on the test video sequences, the proposed system successfully detects the traffic lights in the scene with high accuracy and stable results, and is ready to be integrated into advanced driver assistance systems or self-driving vehicles. We build our own dataset of traffic lights from recorded driving videos, including circular lights and arrow lights in different directions. Our

experimental dataset is available at http://computing.wpi.edu/Dataset.html.

5.1 Introduction

Automatic detection of traffic lights is an essential feature of an advanced driver assistance system or self-driving vehicle. It is a critically important road safety issue today that many traffic accidents at intersections are caused by drivers running red lights. Recent data from the Insurance Institute for Highway Safety (IIHS) show that in 2012, on US roads, red-light-running crashes caused about 133,000 injuries and 683 deaths [1]. The introduction of an automatic traffic light detection system, especially for red light detection, has important social and economic impacts. In addition to detecting traffic lights, it is also important to recognize whether they appear as circular lights or as directional arrow lights. For example, a red left arrow light and a green circular light can appear at the same time. Without recognition, a detection system can get confused because valuable information has been lost. There are few papers in the literature that combine detection and recognition of traffic lights. Based on our survey, there are also very few datasets available for traffic lights. The Traffic Lights Recognition (TLR) public benchmarks [8] contain image sequences with

traffic lights and ground truth. However, the images in that dataset do not have high resolution, and the number of physical traffic lights is limited because the image sequences are converted from a short video. In addition, the dataset only contains circular traffic lights, which is not always the case in real applications. Therefore, we opt to build our own dataset for traffic light detection, including circular lights and arrow lights in all three directions. Our dataset of traffic lights can be used by other researchers in computer vision and machine learning. In this chapter, we propose a new method that combines computer vision and machine learning techniques. Color extraction and blob detection are used to locate the candidates, followed by PCA network (PCANet) [32] classifiers. The PCANet classifier consists of a PCANet and a linear Support Vector Machine (SVM). Our experimental results suggest the proposed method is highly effective for detecting both green and red traffic lights of many types. Despite the effectiveness of PCANet and many outstanding achievements made by computer vision researchers, object detection from a single image still makes frequent errors, which may cause serious problems in safety-critical real-world applications such as Advanced Driver Assistance Systems (ADAS). Traditional frame-by-frame detection methods ignore the inter-frame information in a video. Since the objects in a video are normally in continuous motion, their identities and trajectories are valuable information that can improve the frame-based detection results. Unlike a pure tracking problem, which tracks a marked object from the first frame, tracking-by-detection algorithms involve frame-by-frame detection, inter-frame tracking and data association. In addition, multi-object tracking (MOT) algorithms can be employed to distinguish different objects and keep track of their identities and trajectories. When it becomes a multi-class problem, such as recognizing different types of traffic lights, an additional

procedure such as a voting scheme is often applied. In addition, the method needs to address the situation in which the traffic light status changes suddenly during the detection process. The rest of the chapter is organized as follows. Section 5.2 describes our data collection and experimental setup. In Section 5.3, we propose a method that combines computer vision and machine learning techniques for traffic light detection using PCANet. In Section 5.3.3, we propose a MOT-based method that stabilizes the detection and improves the recognition results. Performance evaluation is presented in Section 5.4, followed by some discussion in Section 5.5 and conclusions in Section 5.6.

5.2 Data Collection and Experimental Setup

In this chapter, we focus on the detection of red and green traffic lights and the recognition of their types. Amber lights can be detected using similar techniques, but we do not consider them here due to lack of data. The recognition of arrow lights requires high resolution input frames; otherwise all lights appear as colored dots or balls in the frame and it is impossible to recognize them. We mount a smartphone behind the front windshield and record videos while driving on the road. Several hours of video are recorded around the city of Worcester, Massachusetts, USA, during both the summer and winter seasons. Subsequently, we select a subset of video frames to build the dataset, since most of the frames do not contain traffic lights. In addition, passing an intersection only takes a few seconds in the case of green lights, and at red lights the frames are almost identical because the vehicle is stopped.

Thus the length of the selected video for each intersection is very short. Several minutes of traffic-light-free frames are retained in our dataset for assessment of false positives. Each image has a resolution of 1920×1080 pixels. To validate the proposed approach and to avoid overlap between training and test data, the data collected in the summer is used for training and the data collected in the winter is used for testing. Our traffic light dataset is made available online at http://computing.wpi.edu/Dataset.html.

5.2.1 Training data

All the training samples are taken from the data collected during the summer. Input data to the classifier are obtained from the candidate selection procedure described in 5.3.1, and the classifier output goes to the tracking algorithm for further processing. Thus evaluation of the classifier is independent of the candidate selection and the post-processing (tracking). The classifier is trained to distinguish true and false traffic lights, and to recognize the types of the traffic lights. OpenCV [84] is used for SVM training, which chooses the optimal parameters by performing 10-fold cross-validation. The positive samples, which contain the traffic lights, are manually labeled and extracted from the dataset images. The negative samples, such as segments of trees and vehicle tail lights, are obtained by applying the candidate selection procedure to the traffic-light-free images. The green lights and red lights are classified separately. For green lights, there are three types of samples based on their aspect ratios. The first type is called Green ROI-1; it contains one green light in each image and its aspect ratio is approximately 1:1. The second type is called Green ROI-3; it contains the traffic light holder area, which has one green light and two off lights, and its aspect ratio is approximately 1:3. The third type is called Green ROI-4. It contains the traffic light

holder area, which has one green round light, one green arrow light, and two off lights, and its aspect ratio is approximately 1:4.

Figure 5.1: Examples of 5 classes of Green ROI-1.

Each type of sample image has several classes. Green ROI-1 and Green ROI-3 both have five classes, including negative samples, as shown in Fig. 5.1 and Fig. 5.2. These 5 classes from top to bottom are Green Negative (GN-1; GN-3), Green Arrow Left (GAL-1; GAL-3), Green Arrow Right (GAR-1; GAR-3), Green Arrow Forward (GAF-1; GAF-3) and Green Circular (GC-1; GC-3). Green ROI-4 also has five classes, including negative samples, as shown in Fig. 5.3. The five classes from top to bottom are Green Negative (GN-4), Green Circular and Green Arrow Left (GCGAL-4), Green Circular and Green Arrow Right (GCGAR-4), Green Arrow Forward and Left (GAFL-4) and Green Arrow Forward and Right (GAFR-4). The Green Negative samples are obtained from traffic-light-free videos by using the color extraction method discussed in Section 5.3.1. For red lights, there are two types of sample images based on their aspect ratios. The first type is called Red ROI-1, as shown in Fig. 5.4. It contains one red light in each image and its aspect ratio is approximately 1:1. The other type is called Red

Figure 5.2: Examples of 5 classes of Green ROI-3.

Figure 5.3: Examples of 5 classes of Green ROI-4.

Figure 5.4: Examples of 3 classes of Red ROI-1.

ROI-3, as shown in Fig. 5.5. It contains the traffic light holder, which holds one red light and two off lights, and its aspect ratio is approximately 1:3. Each type of sample image has three classes: Red Negative (RN-1; RN-3), Red Arrow Left (RAL-1; RAL-3) and Red Circular (RC-1; RC-3). The Red Negative samples are obtained from traffic-light-free videos by using the color extraction method mentioned in 5.3.1. The red lights do not have ROI-4 data because the red light is on top, followed by an amber light and one or two green lights at the bottom. If the red light is on, the amber and green lights beneath it must be off. These three lights form the ROI-3 vertical setting, regardless of the status of the 4th light at the very bottom. Table 5.1 shows the number of training samples of Green ROI-n and Red ROI-n, where n is 1, 3 or 4. The features of a traffic light itself may not be as rich as those of other objects such as a human or a car. For example, a circular light is just a colored blob that looks similar to other objects of the same color. Therefore, it is difficult to distinguish true traffic lights from other false candidates solely based on color analysis. The ROI-3 and ROI-4 samples are images of the holders, which provide additional information for detection and classification. The approach of combining all this information is explained in 5.3.2.2.

Figure 5.5: Examples of 3 classes of Red ROI-3.

Table 5.1: Number of training samples of Green ROI-n and Red ROI-n

Class | n = 1 | n = 3 | n = 4
GN-n | 13218 | 13218 | 13213
GAL-n | 1485 | 835 | -
GAR-n | 1717 | 617 | -
GAF-n | 2489 | 1018 | -
GC-n | 3909 | 3662 | -
GCGAL-n | - | - | 369
GCGAR-n | - | - | 281
GAFL-n | - | - | 749
GAFR-n | - | - | 1005
RN-n | 7788 | 7619 | -
RAL-n | 1214 | 1235 | -
RC-n | 4768 | 5035 | -

5.2.2 Test data

All test images are taken from the data collected in the winter. The ground truths are manually labeled and are used for validating the results. In our proposed method, a tracking technique is used to further improve the performance. However, traffic lights can move out of the image or change state during the tracking process. Therefore the test sequences need to cover many possible scenarios for all types of lights. Detailed information about the test sequences is shown in Table 5.2.

5.3 Proposed Method of Traffic Light Detection and Recognition

Fig. 5.6 shows the flowchart of our proposed method of traffic light detection and recognition, which consists of three stages. Firstly, color extraction and candidate selection are performed on the input image. Secondly, to determine whether the selected candidates are traffic lights and what types of lights they are, they are processed by the PCANet and SVM. Finally, tracking and forecasting techniques are applied to improve the performance and stabilize the final output.

Table 5.2: Information of 23 test sequences

Seq ID | Frames | Traffic lights | Types of traffic lights | Description
1 | 91 | 182 | Green circular×2. | Lights in all frames.
2 | 90 | 180 | Green circular×2. | Lights in all frames.
3 | 61 | 147 | Green arrow left×3. | Lights in all frames.
4 | 48 | 144 | Green circular×3. | Lights in all frames.
5 | 156 | 312 | Red circular×2. | Lights in all frames.
6 | 156 | 211 | Green circular×2. | Lights at start, then move out.
7 | 214 | 428 | Green circular×2. | Lights in all frames.
8 | 76 | 152 | Red circular×2. | Lights in all frames.
9 | 245 | 305 | Green circular×2. | Lights at start, then move out.
10 | 174 | 177 | Green circular×2. | Lights at start, then move out.
11 | 91 | 348 | Red circular×3; green arrow left; green arrow right; green arrow forward; green circular. | Red lights at start, then green lights.
12 | 56 | 280 | Red arrow left; green arrow right×2; green arrow forward×2. | Lights in all frames.
13 | 82 | 70 | Green circular×2. | Lights at start, then move out.
14 | 259 | 518 | Green circular×2. | Lights in all frames.
15 | 65 | 325 | Red arrow left; green arrow right×2; green arrow forward×2. | Lights in all frames.
16 | 185 | 242 | Green circular×2. | Lights at start, then move out.
17 | 93 | 186 | Red circular×2. | Lights in all frames.
18 | 630 | 0 | None. | No traffic lights.
19 | 580 | 0 | None. | No traffic lights.
20 | 416 | 0 | None. | No traffic lights.
21 | 550 | 0 | None. | No traffic lights.
22 | 759 | 0 | None. | No traffic lights.
23 | 3035 | 0 | None. | No traffic lights.
Total | 8112 | 4207 | - | -

Figure 5.6: Flowchart of the proposed method of traffic light detection and recognition.

5.3.1 Locating candidates based on color extraction

To locate the traffic lights, color extraction is applied to locate the Region of Interest (ROI), i.e., the candidates. The images are converted to the hue, saturation, and value (HSV) color space. Compared with the RGB color space, the HSV color space is more robust against illumination variation and is more suitable for segmentation [85]. The desired color is extracted from an image mainly based on the hue values, which results in a binary image. Suppose the HSV value of the ith pixel in an image is

HSV_i = {h_i, s_i, v_i}    (5.1)

In order to extract green pixels, we set the color thresholds based on empirical data:

40 ≤ h_i ≤ 90    (5.2)

60 ≤ s_i ≤ 255    (5.3)

110 ≤ v_i ≤ 255    (5.4)

In order to extract red pixels, besides (5.3) and (5.4), one of the following conditions must hold:

165 ≤ h_i ≤ 180    (5.5)

0 ≤ h_i ≤ 20    (5.6)

These values are adjustable and similar settings can be found in [28]. Note that the threshold values we choose work well in OpenCV [84] and may need proper conversion in order to work with other libraries. Blob detection can be implemented using flood-fill or contour following, and the resulting blobs are considered potential candidates. However, it is possible that an arrow light may be labeled as two different regions, because the head and tail of an arrow

are sometimes separated with a gap between them. When the traffic lights are closer to the camera, it is more likely that the gaps can be clearly seen, which affects the result of blob extraction. To solve this problem, a closing operation is performed on the binary image obtained from color extraction. Closing is a typical morphological operation in image processing: it applies a dilation followed by an erosion, which eliminates gaps and holes in the binary image. Therefore, the arrow light can be detected as a whole, and the candidates after closing are more reliable than the original candidates. Fig. 5.7 shows the original result of color extraction and blob detection (top right), and the result with the closing operation (bottom right).

Figure 5.7: Color extraction, blob detection and closing operation.

The side effect of the closing operation is that it might connect a green light with other green objects in the background such as trees. When the traffic lights are far away from the camera, this problem is more likely to occur because the black
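A minimal sketch of this color extraction and closing step is given below, using the thresholds of (5.2)-(5.6); the function name and the 5 × 5 elliptical structuring element are illustrative choices, and in the actual pipeline both the original masks and the closed masks are kept, as explained next.

#include <opencv2/imgproc.hpp>

// Build binary masks for green and red pixels using the HSV thresholds of
// (5.2)-(5.6), then apply a morphological closing so that the head and tail
// of an arrow light merge into a single blob.
void extractColorMasks(const cv::Mat& bgrFrame, cv::Mat& greenMask,
                       cv::Mat& redMask)
{
    cv::Mat hsv;
    cv::cvtColor(bgrFrame, hsv, cv::COLOR_BGR2HSV);

    // Green: 40 <= h <= 90, 60 <= s <= 255, 110 <= v <= 255.
    cv::inRange(hsv, cv::Scalar(40, 60, 110), cv::Scalar(90, 255, 255), greenMask);

    // Red wraps around hue 0, so two hue ranges are combined.
    cv::Mat redLow, redHigh;
    cv::inRange(hsv, cv::Scalar(0, 60, 110), cv::Scalar(20, 255, 255), redLow);
    cv::inRange(hsv, cv::Scalar(165, 60, 110), cv::Scalar(180, 255, 255), redHigh);
    redMask = redLow | redHigh;

    // Closing = dilation followed by erosion; fills small gaps and holes.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(greenMask, greenMask, cv::MORPH_CLOSE, kernel);
    cv::morphologyEx(redMask, redMask, cv::MORPH_CLOSE, kernel);
}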

borders of the traffic light holders are thin. However, when the traffic lights are far away, the gaps are more likely to be filled by the halo of the lights, or become invisible due to the limits of image resolution. Therefore, the original candidates are more reliable than those after closing. It is difficult to decide in advance whether the morphological closing operation should be applied, so we choose to keep both the original candidates and the candidates after the closing operation. In case overlapping candidates are identified through the classification, the candidate with the aspect ratio closest to one is selected. The objective of eliminating false positives is addressed in a later part of the proposed method. Fig. 5.8 shows an example of the road images. In this image, there are four green traffic lights, but 895 green candidates can be extracted using the method mentioned above. This requires our classifier to be very strong in order to filter out the negative candidates while retaining positive ones. However, even if the classifier were able to filter out 99% of the negative candidates, there would still be about 9 false positives remaining in this image, which is an unacceptable result. Therefore, pre-filtering and post-validation steps are necessary in addition to the classifier itself. For red traffic lights, the number of candidates is much smaller than that of the green traffic lights; for example, there are 19 red candidates in Fig. 5.8 after color extraction. In many previous works [21, 22, 25, 29], a variety of morphological filtering techniques were applied to eliminate some candidates for the purpose of reducing false positives. However, any filtering has a possibility of missing the true traffic lights, because the traffic lights are not always clear due to their size and the obscure background in an image. Thus only an aspect ratio check is performed in the proposed method, and all blobs that pass the check are kept as candidates. The aspect ratio ar

Figure 5.8: A sample frame from our traffic light dataset.

is defined as

ar = w/h    (5.7)

where w is the width and h is the height of the candidate. In order to pass the aspect ratio check, the following inequality must hold:

2/3 ≤ ar ≤ 3/2 (5.8)

The aspect ratio check reduces the number of candidates. In Fig. 5.8, the number of green candidates is reduced to 51 and the number of red candidates is reduced to 9 after the aspect ratio check.

5.3.2 Classification

5.3.2.1 PCANet

The PCANet classifier is applied to determine whether a candidate is a traffic light or not. The PCANet classifier consists of a PCA network and a multi-class SVM. The structure of PCANet is simple, consisting of a number of PCA stages followed by an output stage. The number of PCA stages can vary, but the typical value is 2, giving the so-called two-stage PCANet. As shown in [32], a two-stage PCANet outperforms a single-stage PCANet in most cases, but further increasing the number of stages does not necessarily provide better performance, according to their empirical experience. Therefore, a two-stage PCANet is used in our proposed method. The structure of PCANet emulates that of a traditional convolutional neural network [86]: the filter bank consists of PCA filters, the nonlinear layer is binary hashing (quantization), and the pooling layer is the block-wise histogram of binary vectors. There are two parts in a PCA stage: patch mean removal and PCA filter convolution. For each pixel of the input image, there is a patch of pixels of the same size as the filter. The mean is removed from each patch, followed by PCA filter convolution. The PCA filters are obtained by unsupervised learning during the training process. The number of PCA filters can also vary; its impact is discussed in [32]. Generally speaking, more PCA filters lead to better performance. In this chapter, we choose 8 filters for both PCA stages and find this sufficient to deliver good performance. The output stage consists of binary hashing and a block-wise histogram. The outputs of the PCA stages are converted to binary values, with positive values mapped to one and all others to zero. Thus a binary vector is obtained for each patch, and the length of this vector is fixed. This binary vector is then converted to a decimal value. The block-wise histogram of these decimal values forms the output features, which are then fed to the SVM. Fig. 5.9 shows the structure of a two-stage PCANet, where the number of filters in stage 1 is m and in stage 2 is n.

Figure 5.9: The structure of two-stage PCANet.
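The output stage can be sketched as follows for a single stage-1 feature map; real PCANet implementations typically use overlapping blocks and process every stage-1 map, so the non-overlapping block layout and the function name here are simplifications of ours.

#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Simplified sketch of the PCANet output stage for one stage-1 feature map:
// the n stage-2 filter responses are binarized (positive -> 1, else 0),
// packed into an n-bit decimal code per pixel, and summarized by block-wise
// histograms that are concatenated into the feature vector fed to the SVM.
std::vector<float> pcanetOutputStage(const std::vector<cv::Mat>& stage2Responses,
                                     int blockSize)
{
    const int n = static_cast<int>(stage2Responses.size());
    cv::Mat code = cv::Mat::zeros(stage2Responses[0].size(), CV_32F);
    for (int k = 0; k < n; ++k) {
        cv::Mat bit = (stage2Responses[k] > 0) / 255;   // binary hashing (0/1)
        cv::Mat bitF;
        bit.convertTo(bitF, CV_32F);
        code += std::pow(2.0f, static_cast<float>(k)) * bitF;   // to decimal
    }

    std::vector<float> features;
    const int numBins = 1 << n;                          // 2^n histogram bins
    for (int y = 0; y + blockSize <= code.rows; y += blockSize) {
        for (int x = 0; x + blockSize <= code.cols; x += blockSize) {
            std::vector<float> hist(numBins, 0.0f);
            cv::Mat block = code(cv::Rect(x, y, blockSize, blockSize));
            for (int r = 0; r < block.rows; ++r)
                for (int c = 0; c < block.cols; ++c)
                    hist[static_cast<int>(block.at<float>(r, c))] += 1.0f;
            features.insert(features.end(), hist.begin(), hist.end());
        }
    }
    return features;
}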

5.3.2.2 Recognizing green traffic lights using PCANet

As mentioned in 5.3.1, due to the large number of green objects in an image, such as trees, street signs, and green vehicles, the classifier must be strong enough to eliminate the potential false positives while maintaining a high detection rate. Using the green areas as candidates is not sufficient. For example, a fragment of tree leaves may occasionally look similar to the green lights in some frames, which causes false positive “flashing” in the video of detection results. To solve this problem, a validation step is applied to the system. It is assumed that the traffic lights always appear in a holder. The traffic light holder contains three

or four lamps that are vertically aligned in our collected data. Note that horizontal traffic lights are also often used and can be processed using the same method if such a dataset is available. In addition, these lamps have certain valid combinations. The traffic light holder area thus contains important information that can help us detect the traffic lights. In a vertical traffic light holder, the bottom lamp is always green. Therefore, the position of a potential traffic light holder can be located according to the green area. The aspect ratio of the green area is approximately 1:1, and the green area is called ROI-1. The traffic light holder area with three lamps is called ROI-3 and the one with four lamps is called ROI-4. Suppose the rectangular bounding box of ROI-1 is RROI−1, where

RROI−1 = {xROI−1, yROI−1, wROI−1, hROI−1} (5.9)

Similarly there are bounding boxes RROI−3 for ROI-3 and RROI−4 for ROI-4 where

RROI−3 = {xROI−3, yROI−3, wROI−3, hROI−3} (5.10)

RROI−4 = {xROI−4, yROI−4, wROI−4, hROI−4} (5.11)

The variables xROI−i, yROI−i are the coordinates of the top-left corner of the bounding box RROI−i , wROI−i is its width and hROI−i is its height. The RROI−3 can be obtained based on RROI−1 as follows, where the coefficients are determined empirically based on the assumption that the lights are vertically aligned and the green light is the lowest light:

xROI−3 = xROI−1 − 0.1 × wROI−1 (5.12)

yROI−3 = yROI−1 − 2.5 × hROI−1 (5.13)

wROI−3 = 1.2 × wROI−1 (5.14)

hROI−3 = 3.6 × hROI−1 (5.15)

In the case of horizontally aligned lights, these coefficients should be changed accordingly. Similarly, the RROI−4 can be obtained based on RROI−1 as follows:

xROI−4 = xROI−1 − 0.1 × wROI−1 (5.16)

yROI−4 = yROI−1 − 3.9 × hROI−1 (5.17)

wROI−4 = 1.2 × wROI−1 (5.18)

hROI−4 = 5.1 × hROI−1 (5.19)
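For illustration, the holder regions can be derived from a detected green blob as in the following sketch of (5.12)-(5.19); the function names are ours, and in practice the resulting rectangles must be clipped to the image boundary.

#include <opencv2/core.hpp>

// Derive the candidate holder regions ROI-3 and ROI-4 from the bounding box
// of a detected green blob (ROI-1), using the empirical coefficients of
// (5.12)-(5.19). The green light is assumed to be the lowest lamp.
cv::Rect greenRoi3(const cv::Rect& roi1)
{
    return cv::Rect(cvRound(roi1.x - 0.1 * roi1.width),
                    cvRound(roi1.y - 2.5 * roi1.height),
                    cvRound(1.2 * roi1.width),
                    cvRound(3.6 * roi1.height));
}

cv::Rect greenRoi4(const cv::Rect& roi1)
{
    return cv::Rect(cvRound(roi1.x - 0.1 * roi1.width),
                    cvRound(roi1.y - 3.9 * roi1.height),
                    cvRound(1.2 * roi1.width),
                    cvRound(5.1 * roi1.height));
}

// In practice the derived rectangle should be intersected with the image
// bounds, e.g. roi3 = greenRoi3(roi1) & cv::Rect(0, 0, img.cols, img.rows).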

All samples of ROI-1 are resized to 10 × 10 pixels, all samples of ROI-3 to 10 × 33 pixels and all samples of ROI-4 to 10 × 43 pixels. Three PCANet classifiers are trained separately for ROI-1, ROI-3 and ROI-4. Each classifier is able to perform multi-class classification, such as distinguishing left arrows, right arrows, circular lights and negative samples. In order to combine the results of these three classifiers, several methods are evaluated using the test dataset. An intuitive solution is the voting strategy: the results of ROI-1, ROI-3 and ROI-4 are voted into several classes and the class that has the most votes is selected as the final result. However, this method is not accurate. The ROI-3 may contain only part of a traffic light holder if it is actually a four-light holder, and the ROI-4 may contain background if it is actually a three-light holder.

Therefore, the positive results of ROI-3 and ROI-4 are both considered possible regions. If any positive result of ROI-1 overlaps with these regions, it is considered a true positive green light. This is a more plausible approach because the two cases mentioned above do contain the traffic light holders that define the possible regions. Although the class types determined by ROI-3 and ROI-4 may be inaccurate, ROI-1 is capable of providing an accurate result.

5.3.2.3 Recognizing red traffic lights using PCANet

Red traffic lights are recognized in a similar way to green lights. The bounding boxes of Red ROI-1 and ROI-3 are expressed in the same way as those of the green lights shown in (5.9) and (5.10). Assuming the lights are vertically aligned and the red light is the top light, the RROI−3 can be obtained based on RROI−1 using Equations 5.12, 5.14, 5.15 and

yROI−3 = yROI−1 − 0.1 × hROI−1 (5.20)

5.3.3 Stabilizing the detection and recognition output

5.3.3.1 The problem of frame-by-frame detection

Frame-by-frame detection is important, but not sufficient to render stable output. The reasons are twofold. One aspect is that no detector can perform perfectly under all possible scenarios. The other aspect is that the input data sometimes is not of good quality. For example, vehicle vibrations may cause the camera to lose focus, making the frames blurry. A red arrow light in such a situation may look identical to a circular red light and can hardly be recognized even by human eyes, as shown in the center image of Fig. 5.10. However, the arrow light is clear in other

frames. If the detector recognizes this arrow light in previous frames and keeps track of it, a correct estimate can be provided in the blurry frame even if the detector gives an incorrect result. In addition, there may be multiple lights in a frame, so the lights need to be distinguished and not confused with each other.

Figure 5.10: An arrow light in three consecutive frames. The middle one is blurry and looks similar to a circular light. A detector often fails on such a frame.

The goal of multi-object tracking is to recover the complete tracks of multiple objects and to estimate their current states. There are two categories of multi-object tracking methods: batch methods and online methods. The batch methods require the detection results of the entire sequence before analyzing the identity and constructing the trajectory of each object, which makes them impractical for real-time applications. The online methods are based on information that is available up to the current frame, and so can provide results in real time. Traffic light detection is a time-critical application that needs to give immediate feedback to the driver or controller, therefore multi-object tracking must be done using an online method. The online methods track objects from previous frames, and associate the tracking result with the detection result of the current frame.

5.3.3.2 Tracking and data association

Here we propose an intuitive approach that is optimized for the traffic light detection application. For a video camera at 30 frames per second (FPS), the motion of the lights between adjacent frames is small. Therefore, an object in the next frame should be found near its location in the previous frame. Since color is an important feature of traffic lights, the mean shift method is employed to locate a traffic light based on its previous position. Given a traffic light in the previous frame, the mean shift procedure calculates the histogram of the hue channel of the HSV color space, and then calculates the histogram back-projection in the current frame in order to locate the light. There are other tracking methods, such as the particle filter, which has been proven to work for tracking multiple people [87]. We do not adopt it for two reasons. One is that traffic lights are small objects in a high resolution image of 1920 × 1080 pixels. This makes it difficult for the particles to locate the traffic lights accurately and may require a large number of particles, which is computationally expensive. The other reason is that the weight of each particle cannot be evaluated effectively. The assumption that the detection confidence of a particle is higher when it gets closer to the actual position of the light is not true: the lights are so small in the image that a small deviation may lose the target completely. In addition, our detector is trained on images of complete traffic lights, so it can neither distinguish partial lights from the background nor give them higher confidence values. For data association, [87] employs greedy data association and observes results similar to the Hungarian algorithm [88]. In our approach, the tracking result is simply associated with the detection result when they overlap. The reason is that the traffic lights are nearly motionless between adjacent frames and mean shift performs well in locating them. In addition, unlike people detection, traffic lights do not intersect with each other and there is no need to consider the identity switch problem, which makes it easier to associate the tracking and detection results. Once the association is established, the detected regions are used for mean shift tracking in the next frame, instead of the regions found by mean shift itself. This solves the scale problem of mean shift, and the detected regions are considered more accurate than the tracking result. Building trajectories of the objects can overcome occasional misses, but still cannot filter out false positives. For example, if a rear light of a car is misclassified as a red traffic light in several frames, its trajectory is very likely to be built by the multi-object tracking algorithm. However, the time series data for each object can be obtained from online multi-object tracking. Since the time series data consist of classification results over time, they can be used to generate the final output using forecasting and time series analysis.
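A minimal sketch of this hue-histogram back-projection and mean shift step is given below; the function name, the 30-bin histogram and the termination criteria are illustrative choices, not parameters taken from our implementation.

#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

// Track a previously detected light into the current frame: a hue histogram
// of the detected region is back-projected onto the new frame and mean shift
// moves the search window to the densest nearby region.
cv::Rect trackLight(const cv::Mat& prevFrame, const cv::Rect& prevBox,
                    const cv::Mat& currFrame)
{
    const int histSize = 30;
    const float hueRange[] = {0.0f, 180.0f};
    const float* ranges[] = {hueRange};
    const int channels[] = {0};   // hue channel only

    cv::Mat prevHsv, currHsv, hist, backProj;
    cv::cvtColor(prevFrame, prevHsv, cv::COLOR_BGR2HSV);
    cv::cvtColor(currFrame, currHsv, cv::COLOR_BGR2HSV);

    cv::Mat roi = prevHsv(prevBox);
    cv::calcHist(&roi, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    cv::calcBackProject(&currHsv, 1, channels, hist, backProj, ranges);

    cv::Rect window = prevBox;   // lights barely move between frames
    cv::meanShift(backProj, window,
                  cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT,
                                   10, 1.0));
    return window;               // associated with a detection if they overlap
}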

5.3.3.3 Forecasting

Given the previous detection or recognition results of a target, the estimate of its current state is the final output. Such a process is called forecasting and time series analysis. Multi-object tracking algorithms focus on building the trajectories and pay little attention to filtering out false positives. The idea is that the accumulated classification results of a false object often have different patterns compared to those of a true object, which can be used to filter out false positives. It is based on the assumption that the detector has the ability to distinguish true positives from false positives to some extent, at least better than random guessing. Otherwise, it is

impossible to filter out the false positives. Some methods can be used to address the false positive problem. In [87], a tracker is only initialized in certain regions of the image, and is deactivated or terminated when there is no associated detection for a certain number of frames. Tracklet confidence is introduced in [89], which is influenced by factors such as length, occlusion and the affinity between tracking and detection. In this chapter, we employ a simple forecasting technique after online multi-object tracking, aiming at stabilizing the imperfect output of traffic light detection and recognition. For each object, there is a binary time series where 1 denotes that the detection result is true and 0 otherwise. The simple moving average (SMA) of the

time series is then calculated. Let n be the window size of the SMA, b_i be the value of the time series in the ith frame, and S_m be the SMA value in frame m. The formula is

S_m = (b_{m-(n-1)} + b_{m-(n-2)} + ... + b_{m-1} + b_m) / n    (5.21)

or alternatively

S_m = S_{m-1} - b_{m-n}/n + b_m/n    (5.22)

This can be interpreted as S_m being propagated from S_{m-1} while replacing the oldest value with the newest value in the sliding window. S_m can be used to determine whether the object is considered positive: a threshold t is set to determine the final output b̂_m as

b̂_m = 1 if S_m ≥ t, and b̂_m = 0 if S_m < t    (5.23)

When b̂_m is positive, a majority voting scheme is used to determine the type of the traffic light. The history labels of this particular light are voted into corresponding

bins, and the bin with the most votes gives the type of the traffic light.

5.3.3.4 Minimizing delays

Forecasting and time series analysis usually introduce delays. As the window size n grows, the delays become more severe. The delay at the head of a trajectory helps avoid picking up false positives, because false positives are expected to be occasional and inconsistent; however, slowly picking up true positives produces misses, i.e., false negatives. On the other hand, the delay at the tail of a trajectory helps avoid dropping true positives, because true positives are expected to be consistent with minimal and temporary errors; however, slowly dropping false positives produces erroneous output and increases the total number of false positives in the sequence. The delays must be balanced so that their side effects are minimized while their useful functions are not compromised. At the head of trajectories, a dynamic threshold and a modified moving average are employed. Suppose in frame m the moving average Ŝ_m is modified as

Ŝ_m = (b_{m-(n-1)} + b_{m-(n-2)} + ... + b_{m-1} + b_m) / n,  if m ≥ n
Ŝ_m = (b_1 + b_2 + ... + b_{m-1} + b_m) / m,                  if m < n    (5.24)

and set the threshold tˆm with a positive constant value α as

  t m ≥ n tˆm = (5.25)  m t + α(1 − n ) m < n

At the beginning, the threshold is high and it drops slowly when more frames are available. The output from the first n frames is suppressed because of insufficient

information to make a reliable decision. In a video at 30 FPS, 5 frames correspond to about 167 ms; according to [90], human reaction time is over a second, so such delays are acceptable. As a result, a true object with high confidence is picked up quickly, while false positives can still be filtered out. At the tail of trajectories, an object that no longer exists needs to be dropped quickly. Traffic lights may change their states or move out of the image during the tracking process. The state transition is sudden: there is usually at most one frame that shows both lights on or both off, indicating the transition is taking place. This particular frame does not exist in many cases, so it is not a reliable indicator of when the transition occurs. However, traffic lights are motionless in adjacent frames, and the last valid position of a currently off light is still useful. When a transition happens, it can be determined whether a detected light belongs to the same traffic light holder as a previously detected light of a different color. Subsequently, the transition is identified and the expired information is dropped. On the other hand, when positive detections of a light near the edge of the image are lost for a few consecutive frames, the object is dropped to avoid erroneous output. Occlusion is not considered in this chapter, because it is not safe to predict the state of a light without actually seeing it completely.
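A minimal sketch of the warm-up behavior in Eqs. (5.24) and (5.25) follows; the window size, base threshold, and \alpha are placeholder values chosen for illustration.

```python
def modified_sma(values, m, n=5):
    """Eq. (5.24): average over the full window once m >= n,
    otherwise over the m frames seen so far (frame index m is 1-based)."""
    if m >= n:
        return sum(values[m - n:m]) / n
    return sum(values[:m]) / m

def dynamic_threshold(m, n=5, t=0.6, alpha=0.4):
    """Eq. (5.25): start with a raised threshold and relax it linearly
    toward t until n frames are available."""
    return t if m >= n else t + alpha * (1 - m / n)

# the threshold relaxes toward t as more frames of the trajectory arrive
for m in range(1, 8):
    print(m, round(dynamic_threshold(m), 3))
```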

5.4 Performance Evaluation

5.4.1 Detection and recognition

Fig. 5.11 shows an example frame with detected traffic lights. Here two metrics named precision and recall are used, where

\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}    (5.26)

\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}    (5.27)

The true positives (TP) are samples that belong to this class and are correctly recognized as this class. The false positives (FP) are samples that do not belong to this class but are incorrectly recognized as this class. The false negatives (FN) are samples that belong to this class but are erroneously recognized as other classes.

Figure 5.11: All traffic lights are detected and recognized correctly in the frame.

The true positives here must be both detected and recognized correctly. A detected but misclassified light does not provide the correct identity of the actual light, which is a false negative; meanwhile, it provides a false identity of another type of light, which is a false positive. Therefore, a detected but misclassified light is counted as both a false positive and a false negative. For example, if a red left-arrow light is detected but recognized as a red circular light, then the number of false negatives and the number of false positives are both incremented by 1. Table 5.3 shows the results of the test sequences with different configurations, such as using HOG or PCANet, with or without tracking. It is clear that PCANet outperforms HOG and that the tracking technique further improves the performance.
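As a worked check of Eqs. (5.26) and (5.27) together with the counting rule above, the numbers for sequence 17 with PCANet and tracking in Table 5.3 reproduce the reported precision and recall:

```python
tp, fn, fp = 178, 0, 1            # sequence 17, PCANet + tracking (Table 5.3)
precision = tp / (tp + fp)        # 178 / 179 ≈ 0.994
recall = tp / (tp + fn)           # 178 / 178 = 1.0
print(f"precision {precision:.1%}, recall {recall:.1%}")   # 99.4%, 100.0%
# a detected but misclassified light would increment both fn and fp by one
```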

Table 5.3: Test results of the 17 sequences that contain traffic lights (each cell: TP / FN / FP / Precision / Recall)

Seq. ID | HOG | HOG + Tracking | PCANet | PCANet + Tracking
1 | 182 / 0 / 9 / 95.3% / 100% | 162 / 12 / 13 / 92.6% / 93.1% | 182 / 0 / 6 / 96.8% / 100% | 162 / 12 / 6 / 96.4% / 93.1%
2 | 179 / 1 / 13 / 93.2% / 99.4% | 171 / 1 / 4 / 97.7% / 99.4% | 180 / 0 / 13 / 93.3% / 100% | 172 / 0 / 15 / 92.0% / 100%
3 | 143 / 4 / 48 / 74.9% / 97.3% | 135 / 4 / 8 / 94.4% / 97.1% | 145 / 2 / 3 / 98.0% / 98.6% | 135 / 4 / 0 / 100% / 97.1%
4 | 140 / 4 / 10 / 93.3% / 97.2% | 132 / 0 / 0 / 100% / 100% | 139 / 5 / 3 / 97.9% / 96.5% | 132 / 0 / 0 / 100% / 100%
5 | 102 / 210 / 0 / 100% / 32.7% | 154 / 150 / 0 / 100% / 50.7% | 298 / 14 / 0 / 100% / 95.5% | 304 / 0 / 0 / 100% / 100%
6 | 211 / 0 / 51 / 80.5% / 100% | 186 / 17 / 41 / 81.9% / 91.6% | 211 / 0 / 42 / 83.4% / 100% | 186 / 17 / 32 / 85.3% / 91.6%
7 | 411 / 17 / 15 / 96.5% / 96.0% | 420 / 0 / 11 / 97.4% / 100% | 428 / 0 / 6 / 98.6% / 100% | 420 / 0 / 0 / 100% / 100%
8 | 136 / 16 / 6 / 95.8% / 89.5% | 420 / 0 / 11 / 97.4% / 100% | 428 / 0 / 6 / 98.6% / 100% | 144 / 0 / 0 / 100% / 100%
9 | 302 / 3 / 374 / 44.7% / 99.0% | 297 / 0 / 128 / 69.9% / 100% | 303 / 2 / 99 / 75.4% / 99.3% | 297 / 0 / 37 / 88.9% / 100%
10 | 168 / 9 / 14 / 92.3% / 94.9% | 169 / 0 / 10 / 94.4% / 100% | 140 / 37 / 6 / 95.9% / 79.1% | 160 / 9 / 5 / 97.0% / 94.7%
11 | 325 / 23 / 18 / 94.8% / 93.4% | 306 / 30 / 22 / 93.3% / 91.1% | 329 / 19 / 2 / 99.4% / 94.5% | 314 / 22 / 3 / 99.1% / 93.5%
12 | 218 / 62 / 33 / 86.9% / 77.9% | 232 / 28 / 11 / 95.5% / 89.2% | 211 / 69 / 33 / 86.5% / 75.4% | 201 / 59 / 29 / 87.4% / 77.3%
13 | 67 / 3 / 5 / 93.1% / 95.7% | 54 / 8 / 17 / 76.1% / 87.1% | 66 / 4 / 1 / 98.5% / 94.3% | 54 / 8 / 17 / 76.1% / 87.1%
14 | 485 / 33 / 83 / 85.4% / 93.6% | 510 / 0 / 144 / 78.0% / 100% | 493 / 25 / 34 / 93.5% / 95.2% | 510 / 0 / 13 / 97.5% / 100%
15 | 282 / 43 / 21 / 93.1% / 86.8% | 295 / 10 / 7 / 97.7% / 96.7% | 280 / 45 / 0 / 100% / 86.2% | 271 / 34 / 0 / 100% / 88.9%
16 | 231 / 11 / 44 / 84.0% / 95.4% | 230 / 4 / 35 / 86.8% / 98.3% | 201 / 41 / 19 / 91.4% / 83.1% | 220 / 14 / 16 / 93.2% / 94.0%
17 | 186 / 0 / 144 / 56.4% / 100% | 178 / 0 / 110 / 61.8% / 100% | 186 / 0 / 12 / 93.9% / 100% | 178 / 0 / 1 / 99.4% / 100%
Total | 3586 / 439 / 879 / 80.3% / 89.1% | 3612 / 253 / 548 / 86.8% / 93.45% | 3752 / 273 / 276 / 93.1% / 93.2% | 3698 / 167 / 168 / 95.7% / 95.7%

The results are not perfect, due to the limited amount of training data and occasional quality issues in the captured video, as shown in Fig. 5.10.

5.4.2 False positives evaluation

The number of false positives is evaluated over several traffic-light-free sequences, as shown in Table 5.4. Again, PCANet outperforms HOG, and the tracking technique improves the performance. The number of false positives increases rapidly if there are mis-recognized objects: a single mis-recognized object produces 30 false positives per second at a 30 FPS frame rate. False positives are not eliminated completely in our proposed method because of the trade-off between precision and recall: eliminating more false positives may cause more false negatives, increasing precision but decreasing recall, or vice versa. Reference [27] argues that false-positive green lights are dangerous and should be eliminated as much as possible, yielding 99% precision and 62% recall.

Table 5.4: Number of false positives in traffic-light-free sequences (each cell: No. / No. per frame)

Seq. ID | HOG | HOG + Tracking | PCANet | PCANet + Tracking
18 | 150 / 0.2381 | 12 / 0.0190 | 39 / 0.0619 | 0 / 0
19 | 45 / 0.0776 | 35 / 0.0603 | 56 / 0.0966 | 26 / 0.0448
20 | 11 / 0.0264 | 0 / 0 | 18 / 0.0433 | 12 / 0.0288
21 | 127 / 0.2309 | 23 / 0.0418 | 37 / 0.0673 | 9 / 0.0164
22 | 280 / 0.3689 | 125 / 0.1647 | 40 / 0.0527 | 6 / 0.0079
23 | 179 / 0.0590 | 85 / 0.0280 | 105 / 0.0346 | 80 / 0.0264
Total | 792 / 0.1327 | 280 / 0.0469 | 295 / 0.0494 | 133 / 0.0223

While such an argument is reasonable for practical applications, we do not perform such adjustments in this chapter. Instead, we demonstrate highly accurate and well-balanced precision and recall results to validate the proposed approach as well as the performance improvements brought by PCANet and tracking.

5.5 Discussion

5.5.1 Comparison with related work

Table 5.5 compares several recent papers on traffic light detection and recognition. However, it is difficult to compare them directly, because different testing data and different evaluation metrics were used. There are benchmarks for object detection and image classification, such as ImageNet [91], but no benchmark has yet been created for multi-class traffic light detection and classification, so researchers use their own collected data in their respective papers. Some papers [25–27] utilize information other than images, such as GPS data and prior knowledge of traffic light locations.

Some papers focus on a specific type of traffic light, while others handle multiple colors and types at the same time. These factors make it difficult to compare their performance appropriately. Efficiency is also hard to compare, since the image sizes in these papers vary. A higher resolution camera can provide clear images of traffic lights that are farther away, whereas a distant traffic light may occupy only a few pixels in a lower resolution image; the system may therefore detect a traffic light slightly earlier, giving the driver additional time to respond. However, a larger image size leads to higher computational cost and longer processing time. Another factor is that different hardware platforms were used in the implementations, such as desktop and on-board systems, and additional hardware modules may also be involved, such as GPS and an inertial measurement unit (IMU) [26].

5.5.2 Limitation and plausibility

This chapter presents a prototype system that can effectively detect several common types of traffic lights in a vertically aligned setting. We would like to emphasize that the proposed system is extendable: the ROI selection can be modified for other types of traffic lights, such as horizontally aligned lights, and the multi-class classification can be trained if sufficient data are provided. We feel confident that the proposed system can be extended to detect all types of traffic lights, and even to other tasks, with some modification. Different lighting conditions, color distortion, motion blur and scene variations may compromise the system performance in the real world. Thus, the robustness of the trained model is a key factor in addition to detection accuracy.

Table 5.5: Results of several recent works on traffic light detection

Paper | Year | Method | Light types | Image size | Timing | Performance
Our approach | 2016 | PCANet; multi-object tracking | Green circular; red circular; green arrow; red arrow | 1920×1080 | 3 Hz | Precision 95.7%; recall 95.7%
[23] | 2014 | Spot light detection; adaptive template matching; multiple model filter; single object tracking | Green circular; red circular; amber circular | - | - | Average accuracy 97.6%; false alarms ignored in detection
[28] | 2014 | Image processing; hidden Markov models | Green circular; red circular; amber circular | 648×488 | 25 frames per second | Overall detection rate 98.33% and 91.34% in different scenarios
[24] | 2014 | Fast radial symmetry transform | Red circular; amber circular | 240×320 | Most time-consuming part ~1.82 s | Precision 84.93%; recall 87.32%
[25] | 2013 | Filtering scheme with GPS information | Green circular; red circular | 720×480 | 15.7 ms per frame | Precision 88.2%; recall 81.5%
[26] | 2011 | Traffic light mapping and localization using GPS information; several probabilistic stages | Green circular; red circular; amber circular | 1.3 megapixel | Real-time; 15 Hz frame input | Accuracy 91.7%
[27] | 2011 | Traffic light mapping and localization using GPS information; onboard system | Green circular; red circular; amber circular; green arrow; red arrow; amber arrow | 2040×1080 | 4 Hz | Precision 99%; recall 62%

The robustness of our trained models can be improved by training with more data collected under all kinds of conditions using different cameras. Researchers in machine learning often focus on investigating better algorithms, but sometimes getting more data beats a clever algorithm [92]. However, detecting traffic lights in severe weather or at night may require different algorithms or even additional sensors, and little research has been done on such topics. This will be part of our future work as more data become available.

The processing time depends on the image size as well as the number of candidates in an image. The image size in our dataset is 1920×1080, which is considerably larger than in most of the other papers in Section 5.5.1. Our implementation is currently a single-threaded version running at approximately 3 Hz on a CPU. It can be accelerated by using multiple CPU threads, GPUs or FPGA hardware. Previously we have successfully employed GPUs to accelerate a traffic sign detection system in [93] and a fast deep learning system in [94]. Using dedicated hardware is another option to accelerate such systems; the most time-consuming part, the PCANet classification, has been accelerated on an FPGA in our latest work [95].

Since the proposed system is based on a camera sensor, its reliability is directly affected by the video quality. Many factors can affect the output images, such as the camera sensor, its configuration, and post-processing procedures. An example of a data quality problem is shown in Fig. 5.10. The proposed method is also not expected to work at night. Traffic lights at night appear in different ways depending on the camera and its configuration: there may be a halo effect around the lights, or the lights may appear white at the center with only thin colored rings at the edge. A solution for one camera may not be suitable for another camera. Therefore, we decided not to investigate the night-time problem.

5.6 Conclusions

In this chapter, we propose a system that can detect multiple types of green and red traffic lights accurately and reliably. Color extraction and blob detection are applied to locate the candidates with proper optimization. A classification and validation method using PCANet is then used for frame-by-frame detection. A multi-object tracking method and a forecasting technique are employed to improve accuracy and produce stable results. As an additional contribution, we build a traffic light dataset from videos captured by a camera mounted behind the windshield. This dataset has been released to the public for computer vision and machine learning research and is available online at http://computing.wpi.edu/Dataset.html.

Chapter 6

Pedestrian Detection for Autonomous Vehicle Using Multi-spectral Cameras

Pedestrian detection is a critical feature of an autonomous vehicle or advanced driver assistance system. This chapter presents a novel instrument for pedestrian detection that combines stereo vision cameras with a thermal camera. A new dataset for vehicle applications is built from data recorded by the test vehicle while driving on city roads. Data received from the multiple cameras are aligned using the trifocal tensor with pre-calibrated parameters. Candidates are generated from each image frame using sliding windows across multiple scales. A reconfigurable detector framework is proposed, in which feature extraction and classification are two separate stages. The input to the detector can be the color image, disparity map, thermal data, or any of their combinations. When applying convolutional channel features, feature extraction utilizes the first three convolutional layers of a pre-trained convolutional neural network, cascaded with

an AdaBoost classifier. The evaluation results show that it significantly outperforms the traditional histogram of oriented gradients features. The proposed pedestrian detector with multi-spectral cameras can achieve a 9% log-average miss rate. The experimental dataset is made available at http://computing.wpi.edu/dataset.html.

6.1 Introduction

Automatic and reliable detection of pedestrians is an important function of an autonomous vehicle or advanced driver assistance system (ADAS). Research on pedestrian detection depends heavily on data, as different data and methods may yield different evaluation results. The most commonly used sensor in data collection is a regular color camera, and many datasets have been built, such as the INRIA person dataset [9] and the Caltech Pedestrian Detection Benchmark [10]. Thermal cameras have also been considered lately, and different methods of pedestrian detection have been developed based on thermal data [44]. It is worth investigating whether methods developed for one type of sensor data are applicable to other types of sensors. A method may no longer work when the nature of the data changes; e.g., finding hot objects by thresholding intensity values in a thermal image is not applicable to a regular color image. Some methods, such as gradient and shape based feature extraction, may still be applicable, since an object has similar silhouettes in both color and thermal images. In addition, data from different sensors may contain complementary information, and combining them may result in better performance. Multiple cameras can form stereo vision, which provides additional disparity and depth information. An example of combining stereo vision color cameras and a thermal camera for pedestrian detection can be found in [56].

The data collection environment is also very important. Unlike static cameras for surveillance applications, cameras mounted on a moving vehicle may observe much more complex backgrounds and pedestrians at varying distances. Therefore, different pedestrian detection algorithms are needed than for surveillance camera applications. To use multiple sensors on a vehicle, a cooperative multi-sensor system needs to be designed and new algorithms that can coherently process multi-sensor data need to be investigated. The contributions of this chapter are listed as follows:

1. A multi-spectral camera instrument is designed and assembled on a moving vehicle to collect data for pedestrian detection.

2. A new dataset for multi-spectral pedestrian detection is built from on-road driving data. These data contain many complex scenarios that are challenging for detection and classification.

3. We propose a machine learning based algorithm for pedestrian detection by combining stereo vision and thermal images. Evaluation results show satisfactory performance.

The rest of the chapter is organized as follows. Section 6.2 describes our instrumental setup for data collection. In Section 6.3, we propose a framework that combines stereo vision color cameras and a thermal camera for pedestrian detection using different feature extraction methods and classifiers. Performance evaluations are presented in Section 6.4, followed by further discussion in Section 6.5 and conclusions in Section 6.6.

6.2 Data Collection and Experimental Setup

6.2.1 Data Collection Equipment

To collect on-road data for pedestrian detection, we design and assemble a custom test equipment rig. This design enables the data collection system to be mobile on the test vehicle as well as maintaining calibration between data collection runs. The completed system can be seen in Figure 6.1. The ZED stereo vision camera from Stereolabs is chosen for providing color images as well as disparity information. The ZED camera can capture high resolution side-by-side video that contains synchronized left and right video streams, and can create a disparity map of the environment in real time using the graphics processing unit (GPU) in the host computer. Furthermore, an easy-to-use SDK is provided, which allows for camera control and output configuration. In addition, the on-board cameras are pre-calibrated and come with known intrinsic parameters, which makes image rectification and disparity map generation easier. The thermal camera is a FLIR Vue Pro, a long wavelength infrared (LWIR) camera. It uses an uncooled vanadium-oxide microbolometer with a 640 × 512 resolution at a full 30 Hz, paired with a 13 mm germanium lens providing a 45 × 35 degree field of view (FOV). This IR camera has a wide −20 to 50 °C operating range, which allows for rugged outdoor use. The thermal camera also provides Bluetooth wireless control and video data recording via its on-board microSD card, as well as an analog video output. Both the stereo vision and thermal cameras must remain fixed relative to each other for consistency of data collection. A threaded rod is custom cut to length and each end is threaded into the respective camera's tripod mounting hole. This provides

Figure 6.1: Instrumentation setup with both thermal and stereo cameras mounted on the roof of a vehicle.

a rigid connection between the color and thermal cameras. An electrical junction box is utilized as an appropriately sized, waterproof enclosure that provides high impact resistance. The top lid is replaced with an impact-resistant clear acrylic sheet so that the stereo vision cameras can be situated safely behind it. A circular hole is cut into the top lid so that the thermal camera lens can fit through, and the camera is mounted via the lens barrel. This is essential, as even clear acrylic would block most, if not all, of the IR spectrum used by the thermal camera. The mounting system is designed, modeled, and built using aluminum extrusions. The entire structure is completely portable and can be mounted to any vehicle with a ski rack: the aluminum extrusions sit between the front and back ski rack hold-downs. Cable management is also crucial in our design, as long cables are needed for communication between the laptop inside the vehicle and the cameras on the roof. To avoid interference and safety issues, the cables must run down the back of the vehicle, through the trunk and into the vehicle cabin, which requires approximately 20 feet of cable. This creates an issue for the ZED stereo vision camera, as it operates on the high speed USB 3.0 protocol, which allows a maximum length of about 10 feet due to signal degradation and loss. To resolve this issue, an active USB extension cable is used. A total of four cables terminated from the camera setup are wrapped together with braided cable sleeves to prevent tangling and ensure robustness. An analog frame grabber is employed to capture the real-time analog output of the IR camera instead of directly recording to the on-board microSD card, in order to ensure proper synchronization between the thermal camera and the stereo vision cameras. With the analog frame grabber, we are able to capture at precisely 30 FPS. AVI files are generated using software provided along with the frame grabber. These AVI files are

then converted into image sequences.

6.2.2 Data Collection and Experimental Setup

Our dataset is made available online at http://computing.wpi.edu/dataset.html. The data are collected while driving on city roads; highway driving data are not collected, since pedestrians are rarely seen on highways. A total of 58 data sequences are extracted from approximately three hours of driving on city roads across multiple days and lighting conditions. There are 4330 frames in total in which a person or multiple people are in clear view and un-occluded, similar to the Caltech-USA reasonable set [36]. However, unlike the Caltech-USA reasonable set, we do not discard small samples. In fact, more than half of the pedestrian samples in our dataset are no more than 50 pixels in height, due to the image resolution and their distance to the cameras, which makes our dataset more challenging. Each frame contains the stereo color images, thermal image and disparity map. Since the cameras have different angles of view and fields of view, the 58 usable sequences are rather short, ensuring that the pedestrians are within the view of all cameras. Furthermore, video frames without any pedestrians are not included in our dataset.

6.3 Proposed Method

6.3.1 Overview

Figure 6.2 shows the flowchart of our proposed pedestrian detection method. Disparity data are generated from the stereo color data. Thermal data are obtained from the thermal camera and reconstructed according to the point registration using the trifocal

tensor. Instead of concatenating the features of different data sources and training a single classifier, feature extraction and classification are performed independently for each data source before the decision fusion stage. The decision fusion stage uses the confidence scores of the classifiers, along with some additional constraints, to make the final decision. The proposed detector system can be reconfigured using different feature extraction and classification methods, such as HOG with SVM or CCF with AdaBoost. The decision fusion stage can utilize information from one or multiple classifiers, and the performance of different configurations can be evaluated and compared.

6.3.2 Trifocal tensor

The three cameras have different angles of view and fields of view, making point registration (pixel-level alignment) essential for windowed detection across multi-spectral images. A simple overlay with fixed pixel offsets does not work, because every object has its own offset values depending on its distance to the camera. Therefore, the trifocal tensor [56, 96] is used for pixel-level alignment of the color and thermal images. The trifocal tensor T is a set of three 3 × 3 matrices that can be denoted as {T_1, T_2, T_3} in matrix notation, or T_i^{jk} in tensor notation [96], with two contravariant and one covariant indices. The idea of the trifocal tensor is that given a point correspondence x ↔ x' ↔ x'' across the three views, there is a relation

[x']_\times \left( \sum_i x^i T_i \right) [x'']_\times = 0_{3 \times 3}.    (6.1)

One method to compute the trifocal tensor T is the normalized linear algorithm.

Figure 6.2: Framework of the proposed pedestrian detection method.

Given a point-point-point correspondence x ↔ x' ↔ x'', there is a relation

x^i x'^j x''^k \epsilon_{jqs} \epsilon_{krt} T_i^{qr} = 0_{st}

where 4 out of the 9 equations (over all choices of s and t) are linearly independent. Therefore, at least 7 point-point-point correspondences are needed to compute the 27 elements of the trifocal tensor. The trifocal tensor can be computed from a set of equations of the form At = 0, using the algorithm for the least-squares solution of a homogeneous system of linear equations.

Given a correct correspondence x ↔ x', it is possible to determine the corresponding point x'' in the third view without reference to image content. It can be denoted as x''^k = x^i l'_j T_i^{jk}, and can be obtained by using the trifocal tensor and the fundamental matrix F_{21}. The line l' goes through x' and is perpendicular to l'_e = F_{21} x. Both the trifocal tensor and the fundamental matrix F_{21} can be pre-computed and only need to be computed once, as long as the placement of the cameras remains unchanged. An alternative method is the epipolar transfer x'' = (F_{31} x) \times (F_{32} x'). However, this method has a serious problem: it fails for all points lying on the trifocal plane. Therefore, the trifocal tensor is a practical solution for point registration.

In our experiment, the cameras are calibrated using a checkerboard. The pattern is made of different materials, making it visible to both the color and thermal cameras. Figure 6.3 shows the use of the trifocal tensor in aligning color and thermal images.
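The point-transfer step can be sketched in Python as follows. This is an illustrative implementation of the construction above, not the dissertation's code; the tensor storage layout (T[i] holding T_i as a 3 × 3 array indexed by (j, k)) and the assumption that points are given in homogeneous coordinates with last entry 1 are our own choices.

```python
import numpy as np

def transfer_to_third_view(x, x_prime, T, F21):
    """Transfer a correspondence (x, x') into the third (thermal) view via
    x''^k = x^i l'_j T_i^{jk}.  T is a (3, 3, 3) array with T[i] = T_i
    (rows indexed by j, columns by k); F21 is the fundamental matrix between
    views 1 and 2; x and x_prime are homogeneous 3-vectors.  Sketch only."""
    le = F21 @ x                       # epipolar line l'_e = F21 x in view 2
    # line through x' perpendicular to l'_e
    l_prime = np.array([le[1], -le[0],
                        -x_prime[0] * le[1] + x_prime[1] * le[0]])
    x_dprime = sum(x[i] * (T[i].T @ l_prime) for i in range(3))
    return x_dprime / x_dprime[2]      # normalize the homogeneous coordinates
```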

6.3.3 Sliding windows vs. region of interest

There are two main methods to locate a pedestrian: sliding window detection and Region of Interest (ROI) extraction. In sliding window detection, a small

93 (a) Color image. (b) Thermal image.

(c) Reconstructed thermal image using trifocal tensor and disparity information. (d) Red-cyan anaglyph of color and reconstructed thermal images.

Figure 6.3: Proper alignment of color and thermal images using trifocal tensor.

sliding window is moved over the entire image, often at multiple scales, to perform an exhaustive search. Each window is classified, followed by some post-processing such as bounding box grouping. ROI extraction first finds potential candidates using pre-processing techniques based on, for example, color or pixel intensity, and then filters out negatives from these candidates using a classifier or other constraints. It is often more efficient, as the number of candidates is much smaller than the number of sliding windows. For pedestrian detection, both ROI extraction and sliding window detection have been employed in the literature. The sliding window detection method is a universal approach but is computationally expensive. On the other hand, ROI extraction is often used for thermal images, because pedestrians are often hotter than the surrounding environment and the ROIs can be segmented based on pixel intensity values. However, we find that ROI extraction on thermal images does not always work well. The assumption that pedestrians are hotter is not always true, for various reasons. For instance, a pedestrian wearing heavy layers of clothing does not appear with distinctively high pixel intensity values in a thermal image, and thus cannot be located by simple morphological operations. As another example, a road surface exposed to intense sunlight can have a higher temperature than human bodies. Although false positives introduced by hot objects such as vehicle engines can be filtered out in later steps, the loss of true positives becomes a serious problem. As a result, we feel the sliding window detection method is more reliable in these complex scenarios: the classifier can analyze the windowed samples thoroughly and make an accurate decision. Figure 6.4 shows some examples of our pedestrian samples in color images and the corresponding thermal images, where rows 1 and 3 are color samples corresponding to the thermal samples in rows 2 and 4, respectively.

Figure 6.4: Examples of pedestrians in color and thermal images.

However, the sliding window detection method also has its own drawbacks besides the much higher computational cost. The total number of windows in an image often reaches 10^5 or more. Even a fair classifier with a False Positives Per Window (FPPW) rate of 10^{-4} would still result in 10 False Positives Per Image (FPPI). Since 2009, the evaluation metric has been changed from FPPW to FPPI [38]. To address this problem, many state-of-the-art CNN-based classifiers have been proposed in recent years. An alternative approach is to combine information from additional sensors. Our proposed approach

of multi-spectral cameras is along this line.

6.3.4 Detection

In this chapter, we only compare the HOG and CCF methods for the task of pedestrian detection, for the following reasons:

1. The HOG method has always been included as a baseline on the Caltech-USA dataset. Among the 44 methods reported on the Caltech-USA dataset [38], 30 employed HOG or HOG-like features.

2. CCF is one of the best-performing methods reported on the Caltech-USA dataset as of May 2016. The idea of combining low-level CNN features and a boosting forest model is promising.

3. The goal of this chapter is to investigate the combination of multi-spectral cameras and the improvement it brings to pedestrian detection. We make our dataset public so that other researchers can continue this study and discover better solutions in the future.

HOG features have been widely used in object detection. The method defines overlapping blocks in a windowed sample, and cells within each block. Histograms of unsigned gradients over several orientation bins are computed in all blocks and concatenated as features. HOG features are often combined with an SVM and the sliding window method for detection at different scales. At the training stage, the positive samples are manually labeled. The initial negative samples are randomly selected from the training images, as long as they do not overlap with the positive samples. All samples are scaled to a standard window size

of 20 × 40 for training; the smallest sample in our data is 11 × 22. After the initial training, the detector is tested on the training set and more false positives are added back to the negative sample set. These false positives are often called hard negatives, and this procedure is often called hard negative mining. It can be repeated a few times until the performance improvement becomes marginal. Once the detector is trained, it is ready to perform detection on the test dataset and give a decision score for each window. Each frame, with an original size of 640 × 480, is scaled into different sizes, and the detector with a fixed size of 20 × 40 is then applied to the scaled images to find pedestrians of various sizes at different locations in a frame.

CCF uses low-level features from a pre-trained CNN model, cascaded with a boosting forest model such as Real AdaBoost [97] as a classifier. The lower-level features from the CNN are considered generic descriptors for objects, which contain richer information than channel features, while the boosting forest model replaces the remaining parts of the CNN. Thus we avoid training a complete end-to-end CNN model for a specific object detection application, which would require substantial computation, storage and time. In our experiment, we apply settings similar to those described in [43], except for the parameters of the scales and the number of octaves, in order to detect far-away pedestrians that are as small as 20 × 40 pixels. The conv3-3 layer of the VGG-16 model is used for feature extraction. The windowed sample size is 128 × 64 instead of 20 × 40, and the feature vector length for a 20 × 40 sample is 1296. The training samples of CCF are taken from the training stage of HOG, similar to the method described in [43], which uses aggregated channel features (ACF) [98] to select training samples for CCF. Caffe [99] is used for CCF feature extraction on a GPU-based computer platform. At the test stage, the CCF method running on the GPU platform is

considerably faster than the HOG method, but it requires more memory and disk space for data storage.
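The multi-scale scanning procedure described above for the fixed-size 20 × 40 detector can be sketched as follows. This is an illustrative sketch: `score_fn` stands in for the trained HOG+SVM (or any window classifier), and the stride, pyramid step, and score threshold are placeholder values. OpenCV's `cv2.resize` is used for rescaling.

```python
import cv2

def detect_multiscale(frame, score_fn, win=(20, 40), stride=4,
                      scale_step=1.2, thresh=0.0):
    """Run a fixed 20x40 window detector over an image pyramid and return
    boxes as (x, y, w, h, score) in the coordinates of the original frame."""
    ww, wh = win
    boxes, scale = [], 1.0
    img = frame
    while img.shape[0] >= wh and img.shape[1] >= ww:
        for y in range(0, img.shape[0] - wh + 1, stride):
            for x in range(0, img.shape[1] - ww + 1, stride):
                s = score_fn(img[y:y + wh, x:x + ww])   # classify one window
                if s > thresh:
                    boxes.append((int(x * scale), int(y * scale),
                                  int(ww * scale), int(wh * scale), s))
        scale *= scale_step                              # next pyramid level
        img = cv2.resize(frame, None, fx=1.0 / scale, fy=1.0 / scale)
    return boxes  # followed in practice by bounding-box grouping
```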

6.3.5 Information fusion

The idea of combining the information from the color image, disparity map and thermal data for decision making is referred to as information fusion. One approach is to concatenate these features together [56]: a single classifier can be trained on the concatenated features, and the final decisions for the test instances can be obtained from that classifier. This approach has the disadvantage that classifier training becomes challenging as the feature dimension increases. Furthermore, if a new type of feature needs to be added or an existing feature needs to be removed, the classifier must be re-trained, which is time consuming. An alternative approach to information fusion is to employ multiple classifiers; an example can be found in [100]. Each classifier makes a decision on a certain type or subset of features, and the final result is obtained using a decision fusion technique such as majority voting or the sum rule [101]. This approach has the advantage that the structure of the system is reconfigurable: without re-training the classifiers, adding or removing different types of features becomes very convenient. Therefore, we choose the latter approach to make our system reconfigurable, so that various settings and methods can be evaluated. Specifically, an SVM is used at the decision fusion stage, with the confidence scores from the classifiers in the previous stage as its inputs, which is more appropriate than commonly used statistical decision fusion methods in the case of multi-source data [102, 103]. The data from different sources are often not equally reliable, and neither are the classifiers, so the confidence scores must be weighted when obtaining the final decision from information fusion.
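A minimal sketch of this score-level fusion stage using scikit-learn's SVC follows; the score values, labels, and the linear kernel are illustrative choices, not the exact configuration used in our system.

```python
import numpy as np
from sklearn.svm import SVC

# Each row holds the confidence scores that the per-modality classifiers
# (color, thermal, disparity) assigned to one candidate window; y marks
# whether that window is truly a pedestrian.  The numbers are made up.
scores_train = np.array([[0.9, 1.2, 0.3],
                         [0.2, -0.5, 0.1],
                         [1.1, 0.8, 0.9],
                         [-0.3, 0.4, -0.2]])
y_train = np.array([1, 0, 1, 0])

fusion = SVC(kernel="linear")        # learns how much weight each source gets
fusion.fit(scores_train, y_train)

new_window_scores = np.array([[0.7, 1.0, 0.0]])
print(fusion.predict(new_window_scores))   # fused decision for a new window
```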

6.3.6 Additional constraints

6.3.6.1 Disparity-size

Besides the features extracted from an image frame, additional constraints can be incorporated into the decision fusion stage to further improve the detector performance. An example is the disparity-size relationship. Figure 6.5 shows the disparity and height relationship of the positive samples in the form of a linear regression line d = [h  1] B, where d is the mean disparity, h is the height of the sample, and B is a 2 × 1 coefficient matrix. Given the mean disparity \hat{d} and height \hat{h} of a sample, the residual r = |\hat{d} - [\hat{h}  1] B| can be used to estimate whether the sample is possibly a pedestrian or not. From Figure 6.5 we can see that a number of samples have very small mean disparity and lie far below the regression line. This is because the disparity information is not accurate when an object is far away from the camera. In fact, the stereo vision camera we use automatically clamps the disparity value beyond a certain distance; objects beyond that distance yield zero disparity, which makes the estimation for small samples inaccurate.
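A sketch of fitting the disparity-size regression and computing the residual is given below; the sample heights and disparities are made-up numbers for illustration only.

```python
import numpy as np

# heights (pixels) and mean disparities of labeled pedestrian samples
# (illustrative values, not measurements from our dataset)
h = np.array([30, 45, 60, 80, 120], dtype=float)
d = np.array([4.0, 6.2, 8.1, 10.9, 16.5])

# fit d = [h 1] B by least squares, matching the regression line of Fig. 6.5
A = np.column_stack([h, np.ones_like(h)])
B, *_ = np.linalg.lstsq(A, d, rcond=None)

def disparity_size_residual(height, mean_disp):
    """Residual r = |d_hat - [h_hat 1] B|: a large residual suggests the
    candidate's size is inconsistent with its disparity (likely not a
    pedestrian), subject to the far-range clamping caveat noted above."""
    return abs(mean_disp - np.array([height, 1.0]) @ B)

print(disparity_size_residual(50, 7.0))
```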

6.3.6.2 Road horizon

During detection, a few reasonable assumptions can be made to filter out more false positives while retaining the true positives. The assumptions vary depending on the application and may involve color, shape, position, etc. One assumption here is that pedestrians stand on the road, i.e., the lower bound of a pedestrian must be below the road horizon. The road horizon can be automatically detected in an image. This kind of simple constraint may or may not improve the detector performance, and

Figure 6.5: The relationship between the mean disparity and the height of an object.

experiments should be carried out to determine its effectiveness.

6.4 Performance Evaluation

There are a total of 58 labeled video sequences in our dataset. We use 39 of them for training and the remaining 19 for testing. Figure 6.6 shows the performance of different settings, including disparity map, color image, thermal data, and their combinations, all based on HOG features. Generally, the more types of information are used, the better the performance. The disparity-only setup performs the worst. The color-image-only setup is better, followed by the combination of color and disparity. Note that the thermal-only setup outperforms the combination of color and disparity; the heat signature of pedestrians seems more recognizable in thermal images. The combination of color, thermal and disparity information achieves the best performance, with about a 36% log-average miss rate (MR).

Figure 6.7 shows the performance of the HOG features with the disparity-size information and road horizon constraint added. The road horizon improves the log-average MR by about 5%. Despite little improvement from adding the disparity-size information alone, the combination of both provides nearly a 7% improvement in log-average MR.

Figure 6.8 shows the performance of different settings using CCF. The performance of disparity only is again the worst, while the thermal image performs very well. However, it is interesting to see that disparity does not provide any improvement when combined with color or thermal; in fact, combining with disparity results in lower performance. This is due to the fact that the CCF implementation accepts 8-bit images as input, so the precision of the disparity values is reduced. In comparison, CCF outperforms HOG

Figure 6.6: Performance of different input data combinations, all using HOG features.

Figure 6.7: Performance improvement by adding disparity-size and road horizon constraints.

Figure 6.8: Performance of different input data combinations, all using CCF.

in almost all settings except for disparity. The best performance comes from CCF with the combination of color and thermal, which achieves a 9% log-average MR. Similarly, we also attempted to add the disparity-size information and road horizon constraint to the CCF method, but the performance changes are negligible.

6.5 Discussion

Although the combination of multi-spectral cameras can improve the performance of pedestrian detection, the performance is still highly dependent on the instrument. Our thermal camera has a resolution of 640 × 480, which is relatively low.

Figure 6.9: A pedestrian is embedded in the shadow of a color image.

To accommodate the resolution and FOV of the thermal camera, the color cameras have to be set to the same resolution. In addition, color cameras are sensitive to the lighting condition, so the image quality sometimes cannot be guaranteed. Figure 6.9 shows an example, with a bounding box drawn on the detected pedestrian in both the color and thermal images. It is obvious that the thermal image provides much better information about the presence of the pedestrian, who is hardly identifiable in the color image due to the shadow. Although thermal images seem to be dominant in our experiment, their reliability still needs improvement. Figure 6.10 shows a thermal image taken on a hot sunny day. The two circled pedestrians are not bright compared to the surroundings, which contradicts the assumption of distinct thermal intensity made in many existing research works. In such cases, methods or operations based on pixel intensity values, such as intensity thresholding or head recognition using hot spots, become unreliable. On the contrary, some shape or gradient based methods may still perform well, such as the HOG and CCF described in this chapter.

Figure 6.10: An example thermal image with two pedestrians.

6.6 Conclusions

In this chapter, a novel pedestrian detection instrumentation is designed using both thermal and RGB-D stereo cameras. Data are collected from on-road driving and an experimental dataset is built with pedestrians labeled as ground truth. A reconfigurable multi-stage detector framework is proposed. Both HOG and CCF based detection methods are evaluated using the multi-spectral dataset with various combinations of thermal, color, and disparity information. The experimental results show that CCF significantly outperforms the HOG features. The combination of color and thermal images using the CCF method results in the best performance of 9% log-average miss rate. For future work, other advanced feature extraction and classification methods will be considered to further improve the pedestrian detector performance.

Chapter 7

End-to-End Learning for Lane Keeping of Self-Driving Cars

Lane keeping is an important feature for self-driving cars. This chapter presents an end-to-end learning approach to obtaining the proper steering angle to maintain the car in the lane. The convolutional neural network (CNN) model takes raw image frames as input and outputs the steering angles accordingly. The model is trained and evaluated using the comma.ai dataset, which contains front view image frames and the steering angle data captured while driving on the road. Unlike the traditional approach, which manually decomposes the autonomous driving problem into technical components such as lane detection, path planning and steering control, the end-to-end model can directly steer the vehicle from the front view camera data after training; it learns how to keep in lane from human driving data. Further discussion of this end-to-end approach and its limitations is also provided.

7.1 Introduction

Lane keeping is a fundamental feature for self-driving cars. Despite the many sensors installed on autonomous cars, such as radar, LiDAR, ultrasonic sensors and infrared cameras, ordinary color cameras are still very important for their low cost and ability to obtain rich information. Given an image captured by the camera, one of the most important tasks for a self-driving car is to find the proper vehicle control input to maintain it in lane. The traditional approach divides the task into several parts, such as lane detection [104, 105], path planning [106, 107] and control logic [108, 109], and they are often researched separately. The lane markings are usually detected by image processing techniques such as color enhancement, the Hough transform and edge detection. Path planning and control logic are then performed based on the lane markings detected in the first stage. In this approach, the performance relies heavily on the feature extraction and interpretation of the image data, and the manually defined features and rules are often not optimal. Errors can also accumulate from one processing stage to the next, leaving the final result inaccurate. On the other hand, an end-to-end learning approach for self-driving cars has been demonstrated in [70] using convolutional neural networks (CNNs). End-to-end learning takes the raw image as input and outputs the control signal automatically. The model is self-optimized based on the training data and there are no manually defined rules. These are the two major advantages of end-to-end learning: better performance and less manual effort. Because the model is self-optimized based on the data to give maximum overall performance, the intermediate parameters are self-adjusted to be optimal. Moreover, there is no need to detect and recognize certain categories of pre-defined objects, to label those objects during training, or to design control logic

based on observation of these objects. As a result, less manual effort is required. Figure 7.1 compares the traditional approach with the end-to-end learning approach.

Figure 7.1: Comparison between the traditional approach and end-to-end learning.

This chapter presents the end-to-end learning approach to produce the proper steering angle from camera image data, aimed at maintaining the self-driving car in lane. The model is trained and evaluated using the comma.ai dataset, which contains image frames and the steering angle data captured while driving. The rest of the chapter is organized as follows. Section 7.2 provides the details of our implementation, including data pre-processing and the CNN architecture. The evaluation results are presented in Section 7.3, followed by discussions in Section 7.4 and conclusions in Section 7.5.

7.2 Implementation Details

7.2.1 Data pre-processing

The data used in this chapter are from the comma.ai driving dataset. The dataset contains 7.25 hours of driving data, including 11 video clips recorded at 20 Hz and other measurements such as steering angle, speed, GPS data, etc. The image frames

are of size 320 × 160 pixels and are cropped from the original video frames; the original frames are not provided by the dataset. An example frame from the dataset is shown in Figure 7.2.

Figure 7.2: An example image frame from the dataset.

For lane keeping, only the image frames and the steering angle data are used. The steering angle data are recorded at 100 Hz, and they are aligned with the image frames using the alignment stamps provided by the dataset. In case multiple steering angle samples correspond to the same image frame, their average is used to form a one-to-one mapping between each image frame and its corresponding steering angle. Before training the CNN model, the data need to be further processed. First of all, to simplify the problem, driving at night is not considered in this chapter, so the four clips recorded at night are excluded. Second, the data contain many scenarios, such as driving forward, changing lanes, making turns, driving on straight or curved roads, and driving at normal speed or moving slowly in a traffic jam. To train a lane keeping model, data that meet the following criteria are selected: driving

at normal speed, no lane changes or turns, and both straight and curved roads. After data selection, the remaining data come from 7 video clips with a total of about 2.5 hours. Finally, five video clips containing 152K frames are used for training and two video clips containing 25K frames are used for testing.

During the training stage, one important issue needs to be addressed: the training data are highly unbalanced, as shown in Figure 7.3. As highway roads tend to be mostly straight and curved roads make up only a small percentage, a model trained on these unbalanced data may tend to drive straight while still achieving a low loss. To remove this bias, the data for curved roads are up-sampled by a factor of five, where curved roads are defined as frames whose absolute steering angle is larger than five degrees. The data are then randomly shuffled before training.
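A sketch of this pre-processing is given below. It assumes `frames` is a NumPy array of image frames and that `frame_ts`/`angle_ts` are sorted timestamp arrays; the nearest-frame assignment stands in for the dataset's own alignment stamps, and the up-sampling factor and 5-degree cutoff follow the description above.

```python
import numpy as np

def align_and_balance(frames, frame_ts, angles, angle_ts, up_factor=5):
    """Average the 100 Hz steering samples that fall on each 20 Hz frame,
    then up-sample frames whose absolute steering angle exceeds 5 degrees."""
    idx = np.searchsorted(frame_ts, angle_ts, side="right") - 1
    per_frame = np.full(len(frames), np.nan)
    for i in range(len(frames)):
        matched = angles[idx == i]
        if matched.size:
            per_frame[i] = matched.mean()      # one angle per frame
    keep = ~np.isnan(per_frame)
    X, y = frames[keep], per_frame[keep]
    curved = np.abs(y) > 5.0                   # curved-road frames
    X = np.concatenate([X] + [X[curved]] * (up_factor - 1))
    y = np.concatenate([y] + [y[curved]] * (up_factor - 1))
    order = np.random.permutation(len(y))      # shuffle before training
    return X[order], y[order]
```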

7.2.2 CNN implementation details

The CNN architecture that we propose is shown in Figure 7.4; it is similar to those in [70] and [110] but much simpler. The loss layer used during training is the Euclidean loss, which computes the sum of squared differences between the predicted

steering angle and the ground truth steering angle:

\frac{1}{2N} \sum_{i=1}^{N} \| x_i^1 - x_i^2 \|_2^2

The CNN model is trained using Caffe [99]. It consists of three convolutional layers and two fully connected layers. The input layer is the raw RGB image, and the output layer is the predicted steering angle for the input image. The first convolutional layer uses a 9×9 kernel and a 4×4 stride, and the following two convolutional layers use a 5×5 kernel and a 2×2 stride. The convolutional layers are mainly for feature extraction and the fully connected layers are mainly for steering angle prediction, but there is no clear boundary between them since the model is trained end-to-end. Dropout layers are used for preventing

Figure 7.3: Histogram of steering angles in training data.

Figure 7.4: The proposed CNN architecture for deep learning.

over-fitting. There are no pooling layers because the feature maps are small. The CNN architecture, as well as the hyper-parameters used, can be further tuned through more experiments. Overall, the CNN architecture is not the major concern of this work, for two reasons. First, we feel the dataset is too small. Although the training and testing data contain more than 170K frames, equal to about 2.5 hours of driving, this is actually insufficient to train a generic lane keeping model that uses raw images as input. The appearance of the roads can be very complex due to different curves, road markings, lighting conditions, etc. In fact, the proportion of data for curved roads is relatively small, with only about 20 minutes of driving. For training a model that outputs a continuous predicted steering angle, this amount of data is not sufficient. The other reason is that tuning a model requires a proper evaluation metric, which is also limited by the current dataset. The details of the evaluation method will be discussed in Section 7.4.
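The description above maps onto a small regression network. The following PyTorch sketch is our approximation only: the original model was trained in Caffe, and the channel counts, fully connected sizes, and dropout rate are not specified in the text, so those values are placeholders. The 7 × 17 feature map size follows from the stated kernels and strides for a 320 × 160 input.

```python
import torch
import torch.nn as nn

class LaneKeepNet(nn.Module):
    """Three conv layers + two fully connected layers, 320x160 RGB input,
    single steering-angle output; channel widths and FC sizes are guesses."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=9, stride=4), nn.ReLU(),   # 9x9, stride 4
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),  # 5x5, stride 2
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),  # 5x5, stride 2
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(64 * 7 * 17, 512), nn.ReLU(),   # 7x17 map for 160x320 input
            nn.Dropout(0.5),
            nn.Linear(512, 1),                        # predicted steering angle
        )

    def forward(self, x):                             # x: (N, 3, 160, 320)
        return self.regressor(self.features(x))

model = LaneKeepNet()
criterion = nn.MSELoss()          # matches the Euclidean loss up to a constant factor
out = model(torch.zeros(2, 3, 160, 320))
loss = criterion(out, torch.zeros(2, 1))
```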

7.3 Evaluation

The trained model is evaluated using two test video clips containing 25K frames. For each frame, the predicted steering angle is compared with the ground truth value. The histogram of the error is shown in Figure 7.5. The standard deviation of the error is 3.26 degrees and the mean absolute error is 2.42 degrees. To better understand the errors, the predicted angle and the ground truth angle are compared in each frame and the results can be visualized. Figure 7.6 shows an example frame along with the ground truth angle and the predicted angle. The projected paths for both angles are plotted using the same approximation as in [110]. The path using the ground truth angle is in blue and the path using

Figure 7.5: Histogram of error of predicted steering angles during test.

Figure 7.6: An example frame with the ground truth angle, the predicted angle and their respective projected paths.

the predicted angle is in green. The simulated steering wheels for both angles are also drawn for better visualization. Figure 7.7 also visualizes the feature maps from the first two convolutional layers. The top-right 4 × 4 cells are results from the first convolutional layer, and the bottom 4 × 8 cells are results from the second convolutional layer. As expected, the convolutional layers automatically learned to extract the lane markings as a kind of feature during training. The model does not use any manually defined or hand-crafted features, since it can learn useful features from the data automatically.

Figure 7.7: Visualization of the results from the first two convolutional layers.

7.4 Discussion

7.4.1 Evaluation

As an evaluation metric, computing the difference between the ground truth angle and the predicted angle is actually questionable. Firstly, the ground truth provided by the human driver is not globally optimal: the human driver cannot maintain the vehicle in the center of the lane all the time. As long as the vehicle stays in lane, the predicted angles are fine and do not have to be exactly the same as the human driver's. Secondly, both the vehicle movement and the steering control are continuous, so frame-by-frame evaluation is not appropriate. Consider two scenarios on a straight road. In the first scenario, the steering angle turns to the left a bit, then quickly turns to the right a bit to maintain the vehicle in the lane, and this process is repeated. In the second scenario, the steering angle turns to the left a bit and stays at

that angle for a period of time, then it turns to the right a bit and stays for a while. In the second scenario, the vehicle would actually drive out of the lane most of the time. In these two scenarios, the histogram of the errors, the mean absolute error and the standard deviation of the error are the same; however, the first scenario is fine while the second one is completely unacceptable. Figure 7.8 shows an example of the disadvantage of this type of frame-by-frame evaluation. The frames and their predicted angles are from the test dataset, and the 5 frames are in chronological order. We can see that the middle frame has a huge error of 10 degrees. However, the recorded ground truth does not seem correct in this frame: by looking at the previous and following frames, we find that the ground truth in this frame is transitioning from left to right. This example shows that evaluating the error frame by frame is not appropriate.

To solve this problem, a simulator is needed to provide feedback based on the predicted angle. The simulator should be able to generate the frames and simulate the vehicle movement realistically, with the frames generated according to the vehicle position and orientation. One way to do so is to use a virtual game engine, as described in [11, 111]. The advantage of using a virtual engine is that physics simulation and other mechanisms are built in, so the vehicle movement simulation and frame generation can be done realistically. Besides, the ground truth information is very rich in the virtual world: information such as vehicle position, orientation and velocity can be easily obtained, as can that of other objects. The disadvantage is that the frames are computer-generated graphics, not real images captured from driving in the real world. Although they look very realistic with a state-of-the-art game engine, the details and variations they provide still cannot match data from the real world. Alternatively, we can generate the next frames according to the control inputs using


Figure 7.8: An example of the disadvantage of frame-by-frame evaluation with 5 consecutive frames: the error in the middle frame is false.

recorded frames, i.e., data captured in the real world. This can be achieved by either a learning approach [112] or a 3D image projection approach [70]. The learning approach trains auto-encoders to embed road frames and learns a transition model in the embedded space; the next few frames can then be generated from the current frame image and the current control inputs. On the other hand, the 3D image projection approach assumes the ground is a flat surface and solves the 3D geometry [113] to generate the next frame from the actual recorded frame, using the predicted camera shift and rotation. The camera shift and rotation can be obtained from vehicle movement simulation, which can be computed using vehicle kinematic or dynamic models [108, 109].
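As one common choice of such a vehicle model, a kinematic bicycle model can provide the pose update that drives the flat-ground image projection described above. The sketch below is illustrative only; the wheelbase, time step, and the example speed and steering values are placeholders, not parameters from the dissertation.

```python
import math

def bicycle_step(x, y, yaw, v, steer, dt=0.05, wheelbase=2.7):
    """One step of a kinematic bicycle model: given the current pose,
    speed v (m/s) and steering angle (rad), return the pose after dt
    seconds.  The resulting camera shift and rotation can then be used
    to project the next frame from the recorded one."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return x, y, yaw

# simulate a short left curve at 20 m/s with a constant 2-degree steer angle
pose = (0.0, 0.0, 0.0)
for _ in range(20):
    pose = bicycle_step(*pose, v=20.0, steer=math.radians(2.0))
print(pose)
```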

7.4.2 Data augmentation

Since we are not supposed to drive off the lane when recording, the data obtained from human driving lack the error correction process. The human driver is able to maintain the vehicle within the lane, but a model trained on such data is not robust to errors and the vehicle may slowly drift away. To train a model that can correct small errors such as vehicle shifts and rotations, error correction data must be provided during training. One solution is to perform data augmentation by randomly creating shifts and rotations and generating the corresponding frames based on the 3D geometry described above. The correction control input can again be computed using the vehicle kinematic or dynamic models. The comma.ai dataset does not contain the original sized frames or camera calibration parameters, so the simulator and data augmentation are not included in this chapter. Our current work collects real-world data using multiple cameras, and all the aforementioned techniques will be incorporated in future work.

7.5 Conclusions

This chapter presents the end-to-end learning approach to lane keeping for self-driving cars, which can automatically produce proper steering angles from image frames captured by the front-view camera. The CNN model is trained and evaluated using the comma.ai dataset, which contains image frames and the steering angle data captured from road driving. The test results show that the model can produce relatively accurate steering of the vehicle. Further discussions on evaluation and data augmentation are also presented for future improvement.

Chapter 8

Building an Autonomous Lane Keeping Simulator Using Real-World Data and End-to-End Learning

Autonomous lane keeping is an important safety feature for intelligent vehicles. This chapter presents a state-of-the-art end-to-end learning method using a convolutional neural network (CNN) that takes front view camera data as input and produces the proper steering wheel angle to keep the vehicle in lane. A novel method of data augmentation is proposed using a vehicle dynamic model and vehicle trajectory tracking, which can create additional training data as if the vehicle drives off-lane at a random displacement and orientation. Real-world driving data are recorded from three front-view cameras on the left, center, and right. A lane keeping simulator is built using the recorded data in conjunction with image projection and vehicle dynamics estimation.

124 tion. Experimental results demonstrate that the end-to-end learning method with augmented data can achieve high accuracy for autonomous lane keeping and very low failure rate. The simulator can serve as a platform for both training and evaluation of vision-based autonomous driving algorithms. The experimental dataset is made available at http://computing.wpi.edu/dataset.html.

8.1 Introduction

Lane keeping is a fundamental feature for intelligent and autonomous vehicles. Despite the many sensors installed on autonomous cars, such as radar, LiDAR, ultrasonic sensors and infrared cameras, ordinary color cameras are still very popular owing to their low cost and ability to obtain rich information. Given the video images from the front-view camera, a vision-based lane keeping system can automatically output the proper steering angles to maintain the vehicle in lane. A traditional framework divides the task into several stages, including lane detection [104, 105], path planning [106, 107] and control logic [108, 109]. Applying image processing techniques such as color enhancement, Hough transform and edge detection, the lane detection stage identifies the lane markings on the road. Path planning and control logic are then employed to provide the proper steering angle adjustment for the vehicle. In this approach, the performance of lane detection relies heavily on the feature extraction and interpretation of image data. Errors can also accumulate from one processing stage to the next, leaving the final control output less accurate. In contrast, an end-to-end learning method has the advantages of better performance and less manual effort. End-to-end learning for self-driving cars has been successfully demonstrated in [70] using convolutional neural networks (CNNs), which take the images from cameras as input and produce the vehicle control output automatically.

Figure 8.1: Comparison between the traditional framework and end-to-end learning.

The model is self-optimized based on the training data and does not need manually defined features. The user does not need to label the detected objects and their categories during the training process. Figure 8.1 is a comparison between the traditional framework and the end-to-end learning approach for vision-based automatic lane keeping. Although the approach of end-to-end learning for lane keeping is not new, the existing work has several deficiencies. For instance, the error difference between the recorded “ground truth” and the predicted steering angle is not the best evaluation metric. Since it is hardly possible for a human driver to maintain the vehicle perfectly in the center of the lane at all times, the recorded angles are not optimal. Thus, the predicted angles do not have to be exactly the same as the ground truth angles recorded from the human driving experience. It is more important to predict the position and orientation of the vehicle in the very next time step given the current vehicle speed and steering angle control. As long as the vehicle stays in lane, the steering angle is acceptable. By using a simulator, the effects of the control input can be simulated and monitored, thereby providing a more reliable evaluation metric.

Furthermore, we need to provide data to train the deep neural network to take appropriate steering angle actions when the vehicle drifts away from the center of the lane. However, the recorded driving data lack this type of action since it is unsafe to drive off the lanes during data collection. To solve this dilemma, we propose a data augmentation method based on a vehicle dynamic model and vehicle trajectory tracking. Given any displacement and orientation, the model can generate a projected trajectory and a sequence of steering angle controls. Correspondingly, we can also create the augmented front views using image projection based on the shifted location and orientation. Therefore, the system becomes a simulator that can not only generate augmented data for training the convolutional neural network but also be used as a platform to evaluate the performance of other vision-based lane keeping algorithms. The main contributions of this chapter are listed as follows:

1. This chapter presents a simulator for vision-based autonomous lane keeping. Although there are many recent works on lane keeping algorithms, it is hard to compare and evaluate them. Built on the recorded driving data, this simulator employs image projection, vehicle dynamics modeling, and vehicle trajectory tracking to predict vehicle movement and its corresponding camera views. The simulator can be used for both training and evaluation of lane keeping algorithms.

2. An end-to-end learning method is proposed that can generate proper steering angles from front-view camera data, which can maintain the vehicle in lane. A highly effective end-to-end learning system is demonstrated using the aforementioned simulator. The CNN model trained with augmented data from the simulator performs significantly better than the model trained with recorded data only.

3. A completely new dataset for autonomous lane keeping is developed and was made available at http://computing.wpi.edu/dataset.html. The dataset contains recorded video frames from three forward-facing cameras (left, center, and right) as well as steering wheel angles and vehicle speed information.

The rest of the chapter is organized as follows. Section 8.2 provides the implementation details of our simulator, including image projection, vehicle dynamics, vehicle trajectory tracking as well as the CNN architecture. The experiment and evaluation results are presented in Section 8.3, followed by discussions in Section 8.4 and conclusions in Section 8.5.

8.2 Building a Simulator

8.2.1 Overview

For the evaluation of vision-based lane keeping algorithms, a simulator is needed to provide feedback based on the predicted angle. The simulator can generate image frames according to the vehicle position and orientation, and it can also simulate the vehicle movement given a steering angle input. Therefore, a simulator for self-driving cars has two important components: a graphics engine and a physics engine. The graphics engine utilizes the information of the surrounding environment, as well as the pose of the camera, to generate images. The physics engine simulates vehicle movement based on the input control actions. A virtual game engine usually contains both graphics and physics engines, and some autonomous driving simulators were built upon it [11, 111].

Vehicle movement simulation and frame generation can be integrated into the game engine. Besides, the ground truth information is very rich in the virtual world: information such as vehicle position, orientation and velocity can be easily obtained, and so can that of other objects. Despite these advantages, a significant drawback of these virtual simulators is that the generated images are still quite different from real-world data. Although they look very realistic with advanced graphics techniques, the details and variations of virtual images still cannot match data from the real world. It is risky to train a model using virtual game engines and then deploy the model for real-world driving. It would be better to build a simulator from real-world data. Different camera views can be generated from recorded video frames by a learning approach [112] or a 3D image projection approach [70]. The learning approach trains auto-encoders to embed road frames and learns a transition model in the embedded space; the next few frames can then be generated based on the current frame image and the current control inputs. On the other hand, the 3D image projection approach assumes the ground is a flat surface and solves the 3D geometry [113] to generate the next frame based on the actual recorded frame. The camera shift and rotation can be obtained from vehicle movement simulation, which can be estimated using vehicle kinematic or dynamic models [108, 109]. In our simulator, the image projection approach is employed for rendering the images. The CNN takes the image as input, and the vehicle dynamics is used to simulate vehicle movement given the control action. Figure 8.2 shows the detailed operations of the simulator when testing the CNN-based lane keeping algorithm. The predicted position is constantly validated against the ground truth position, and a failure is recorded if the error exceeds a threshold value.

More importantly, the simulator can be very useful when training the neural network by providing a large amount of additional training data through augmentation. When using the simulator for training, the vehicle trajectory tracking replaces the CNN controller to provide the control actions that can gradually correct an initial position shift and/or orientation rotation. Practically, assuming an arbitrary shift and rotation of the vehicle from the ground truth, the vehicle trajectory tracking block can produce the proper steering angle control actions. Combined with the generated camera view from the image projection process, augmented data can be generated. Figure 8.3 shows the operation flow of the simulator in the training phase, during which many augmented data can be generated from each ground truth image by an arbitrary shift and rotation of the vehicle.
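To make the test-phase flow concrete, the following is a minimal Python sketch of the closed-loop evaluation in Figure 8.2. The helpers predict_angle (the CNN) and render_view (the image projection of Section 8.2.2) are hypothetical placeholders, a simple kinematic bicycle update stands in for the full vehicle dynamics block, and the parameter values follow the estimates given later in this chapter.

```python
# A minimal sketch of the closed-loop test phase (Figure 8.2), not the exact
# implementation used in this dissertation. `predict_angle` and `render_view`
# are hypothetical stand-ins for the CNN and the image projection step.
import math

L_F, L_R = 1.0, 1.7       # distances from mass center to front/rear axles (m)
STEER_RATIO = 17.8        # steering wheel angle / road wheel turning angle
DT = 0.1                  # 10 Hz frames
FAIL_THRESHOLD = 1.0      # lateral error threshold (m)

def bicycle_step(x, y, psi, v, steer_wheel_deg):
    """One kinematic bicycle update, standing in for the dynamics block."""
    sigma_f = math.radians(steer_wheel_deg) / STEER_RATIO
    beta = math.atan(L_R / (L_F + L_R) * math.tan(sigma_f))
    x += v * math.cos(psi + beta) * DT    # theta = psi + beta
    y += v * math.sin(psi + beta) * DT
    psi += v / L_R * math.sin(beta) * DT
    return x, y, psi

def run_test(sequence, predict_angle, render_view):
    """sequence: ground-truth records with keys 'x', 'y', 'psi', 'v'."""
    x, y, psi = sequence[0]["x"], sequence[0]["y"], sequence[0]["psi"]
    failures = 0
    for gt in sequence:
        frame = render_view(x, y, psi)      # projected front view at the current pose
        steer = predict_angle(frame)        # CNN steering wheel angle (degrees)
        x, y, psi = bicycle_step(x, y, psi, gt["v"], steer)
        dx, dy = x - gt["x"], y - gt["y"]
        lateral = -dx * math.sin(gt["psi"]) + dy * math.cos(gt["psi"])  # horizontal shift only
        if abs(lateral) > FAIL_THRESHOLD:
            failures += 1
            x, y, psi = gt["x"], gt["y"], gt["psi"]   # reset to ground truth after a failure
    return failures
```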

8.2.2 Image projection

Rendering the image according to the vehicle position and orientation is required by the simulator, in order to provide more instances for machine learning and a better evaluation metric. However, without using a gaming engine, data collected in the real world are sparse, often along a single trajectory as the car goes. These data themselves are far from enough to cover all possible positions and orientations. Therefore, these data must be transformed for an arbitrary position and orientation, using computer vision knowledge of image projection based on 3D geometry. Given a point in world coordinates X_w = (x_w, y_w, z_w) and the corresponding point in image coordinates p = (p_1, p_2), we have the relations

$$p^h = X_w^h M_{ex} M_{in} \qquad (8.1)$$

$$X_w^h = c\,(x_w,\; y_w,\; z_w,\; 1)$$

$$p^h = d\,(p_1,\; p_2,\; 1)$$

where p^h and X_w^h are 1 × 3 and 1 × 4 homogeneous coordinates, c and d are arbitrary nonzero constants, M_ex is the 4 × 3 extrinsic matrix and M_in is the 3 × 3 intrinsic matrix. The extrinsic matrix contains a rotation matrix and a translation vector, which define the camera's position and orientation in the world coordinates. Therefore the extrinsic matrix changes if the camera is shifted or rotated. The intrinsic matrix defines the transformation from camera coordinates to image coordinates, including parameters such as focal length, aspect ratio, location of the principal point, etc. The intrinsic matrix stays the same even if the camera is shifted or rotated. The extrinsic and intrinsic matrices can be obtained through a calibration procedure.

Figure 8.2: The flowchart of the test phase.

Figure 8.3: The flowchart of the training phase, using original data and augmented data.

Given an image taken in the real world with known calibration parameters M_ex, M_in and its pixel coordinates p, the new pixel coordinates p̃ need to be found with a new extrinsic matrix M̃_ex when the camera is shifted and rotated. The physical geometry of the 3D scene is required in order to find the projection parameters. In the case of highway lane keeping simulation, we make the assumption that the ground surface is flat, i.e., z_w = 0. According to formula 8.1, the mapping of p to p̃ can then be obtained as follows:

$$X_w^h = p^h M_{in}^{-1} M_{ex}^{-1} \qquad (8.2)$$

$$\tilde{p}^h = X_w^h \tilde{M}_{ex} M_{in} \qquad (8.3)$$

Note that the lens distortion, if any, needs to be corrected before performing such an image projection. Figure 8.4 shows some examples of transforming an original image according to the camera's virtual position and orientation.

The additive black area on the generated image is usually not an issue for vehicle simulation, since the captured images from front-view cameras are often cropped to retain only the middle section as the region of interest. Another challenging task is ground surface estimation during calibration. To estimate the calibration parameters, especially M_ex in formula 8.1 with the assumption z_w = 0 for the ground surface, the three cameras used in our system need to be deployed on the vehicle, and the world coordinates need to be established properly. When calibrating the cameras in the lab, a checkerboard pattern is usually used, as shown in Figure 8.4. However, estimating the ground surface needs a very large pattern, which is hard to craft and deploy. In our experiment, a flat parking lot with existing markings is used for ground surface estimation. Physical dimensions of the markings are measured manually while the corresponding images are captured by the cameras installed on the vehicle. Figure 8.5 shows the selected points in the image taken by the center camera during the calibration. The physical locations of the cameras and the selected points in the world coordinates are also shown in Figure 8.5. Three cameras are installed on the left, center and right of the vehicle, all facing forward, because they provide a better field of view than a single camera. In fact, the camera nearest to the vehicle's virtual position is selected as the source in equations 8.2 and 8.3. Therefore, the generated images have better quality and fewer additive black areas after projection.
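As an illustration of the flat-ground projection in equations 8.2 and 8.3, the sketch below composes the two ground-plane homographies with NumPy and warps the original image with OpenCV. It uses the usual column-vector convention (so the matrices appear transposed relative to the equations above), and the calibration values K, R and t are assumed to come from the procedure just described.

```python
# A sketch of the flat-ground (z_w = 0) image projection of equations 8.2-8.3,
# written in OpenCV's column-vector convention; not the exact code of this work.
import numpy as np
import cv2

def ground_homography(K, R, t):
    """Homography mapping ground-plane points (x_w, y_w, 1) to image pixels."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

def reproject(image, K, R_orig, t_orig, R_new, t_new):
    """Generate the view from a shifted/rotated camera, assuming a flat ground."""
    H_orig = ground_homography(K, R_orig, t_orig)   # ground plane -> original image
    H_new = ground_homography(K, R_new, t_new)      # ground plane -> new camera view
    H = H_new @ np.linalg.inv(H_orig)               # original image -> new image
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```

As noted above, lens distortion would be removed (for example with cv2.undistort) before this warp, and the camera nearest to the virtual pose would be chosen as the source image.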

8.2.3 Vehicle dynamics and vehicle trajectory tracking

According to [108], the bicycle vehicle dynamics shown in Figure 8.6 is captured by the following equations:

Figure 8.4: Example of original image and generated images given arbitrary camera poses. (a) Original image. A checkerboard pattern on a flat surface. (b) Generated image as if the camera is shifted left by 50 mm. (c) Generated image as if the camera is rotated right by 15.25 degrees. (d) Generated image as if the camera is shifted left by 50 mm and rotated right by 15.25 degrees.

Figure 8.5: Camera calibration and ground surface estimation. (a) Selected points in the image taken by the center camera. (b) Cameras and selected points in the world coordinates.

$$\begin{aligned}
\dot{x} &= v\cos\theta \\
\dot{y} &= v\sin\theta \\
\dot{\theta} &= \omega \\
\theta &= \psi + \beta \\
\dot{\psi} &= \frac{v}{l_r}\sin\beta \\
\dot{v} &= a \\
\beta &= \arctan\!\left(\frac{l_r}{l_f + l_r}\tan\sigma_f\right)
\end{aligned}$$

where P = (x, y, θ) ∈ R² × S¹ is the state of the position and orientation, v and ω are the linear velocity and angular velocity, respectively, which are also the control inputs, a is the acceleration, and σ_f is the turning angle. l_f and l_r are the distances from the vehicle's mass center to the front and rear axles. In our test vehicle, we use the estimated values l_f = 1 m and l_r = 1.7 m. The dynamics in Figure 8.6 are feedback linearized by introducing a nonlinear mapping from the current nonlinear system to a new linear system with a new state variable z = [x, y, ẋ, ẏ]:

$$\dot{z} = Az + Bu, \qquad
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \ddot{x} \\ \ddot{y} \end{bmatrix}
= A \begin{bmatrix} x \\ y \\ \dot{x} \\ \dot{y} \end{bmatrix} + Bu$$

Figure 8.6: The virtual bicycle vehicle dynamics model.

where the state matrix A, the input matrix B and the input vector u in the new linear system are

$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
u = \begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix}.$$

After the feedback linearization, the whole problem is transformed into searching for the proper gain K for the linear system. To solve this optimal control problem, a Linear Quadratic Regulator (LQR) is used to acquire the optimal gain K. The quadratic cost is defined as follows:

$$J = \int_0^{\infty} \left( x^\top Q x + u^\top R u \right) dt \qquad (8.4)$$

  1 0 0 0       0 1 0 0 1 0     where Q =   and R =  , x and u are the state and control effort 0 0 1 0 0 1     0 0 0 1 respectively. Practically, Q and R matrices do not have to be identity matrices but positive definite, and the entries can be tuned to achieve required performance accordingly. Once the gain K is computed, the feedback control law and the ordinary differential equation (ODE) of the new linear system are described as follows:

$$\begin{aligned}
e &= z - z_d \\
u &= -Ke + u_d \\
\dot{e} &= (A - BK)\,e \\
\dot{z} &= \dot{z}_d + \dot{e}
\end{aligned}$$

where e is the error between the true state and the desired state, K is the gain computed based on the cost defined in equation 8.4 with the A and B matrices, u ∈ R² is the input vector, u_d = (ẍ_d, ÿ_d) is the reference input given by the ground truth, and ė, ż, ż_d are the derivatives of the error, the state and the desired state, respectively. A is the 4 × 4 state matrix, and B is the 4 × 2 input matrix.

$$v = \dot{x}\cos\theta + \dot{y}\sin\theta \qquad (8.5)$$

$$\omega = \frac{1}{v}\left(\ddot{y}\cos\theta - \ddot{x}\sin\theta\right) \qquad (8.6)$$

The control input for the nonlinear system can then be calculated by remapping the new input variables of the linear system back to the original inputs of the nonlinear system, i.e., the linear velocity v and the angular velocity ω, as shown in equations 8.5 and 8.6. The results in Figure 8.7 demonstrate the effectiveness and correctness of the vehicle trajectory tracking controller design. A vehicle with the feedback control law has the capability of converging to and following the desired trajectory, even though there exists an initial error. At the beginning, owing to some error between the predicted and actual orientations, the steering angle is positive and large, which helps the vehicle correct its orientation in a short time. After 2 seconds, the predicted orientation and the ground truth converge. The vehicle orientation does not change rapidly for the next few seconds, which matches the fact that the steering angle of the vehicle remains in a small range near zero.
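The trajectory tracking controller can be summarized in a few lines. The sketch below builds the A and B matrices of the linearized system, obtains the LQR gain from the cost of equation 8.4 with SciPy's continuous-time Riccati solver, and remaps the new inputs back to (v, ω) with equations 8.5 and 8.6. It is a simplified sketch rather than the exact implementation; Q and R are taken as identity matrices as in the text.

```python
# A sketch of the feedback-linearized LQR trajectory tracker (Section 8.2.3).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.block([[np.zeros((2, 2)), np.eye(2)],
              [np.zeros((2, 2)), np.zeros((2, 2))]])   # 4x4 state matrix
B = np.vstack([np.zeros((2, 2)), np.eye(2)])           # 4x2 input matrix
Q, R = np.eye(4), np.eye(2)                            # identity weights as in the text

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)                        # optimal gain, u = -K e + u_d

def tracking_control(z, z_d, u_d, theta):
    """z, z_d: current/desired [x, y, x_dot, y_dot]; u_d: desired [x_ddot, y_ddot]."""
    e = z - z_d
    u = -K @ e + u_d                                   # new linear-system input [x_ddot, y_ddot]
    x_dot, y_dot = z[2], z[3]
    v = x_dot * np.cos(theta) + y_dot * np.sin(theta)          # equation 8.5
    omega = (u[1] * np.cos(theta) - u[0] * np.sin(theta)) / v  # equation 8.6
    return v, omega
```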

Figure 8.7: Correction of vehicle's position and orientation using vehicle trajectory tracking. (a) Ground truth and predicted trajectory. (b) Ground truth and predicted orientation. (c) Ground truth and predicted steering wheel angle.

8.2.4 CNN implementation

Convolutional neural networks (CNNs) [40–42] have achieved impressive performance in image classification. In this chapter, learning the human driver's control is not a classification problem but a regression problem; therefore the loss layer during training is the Euclidean loss, which computes the sum of squared differences between the predicted steering angle and the ground truth steering angle: $\frac{1}{2N}\sum_{i=1}^{N}\lVert x_i^1 - x_i^2\rVert_2^2$, where N is the number of instances, $x_i^1$ is the i-th predicted value and $x_i^2$ is the i-th ground truth value. The CNN is used as a steering angle predictor given the input image. It does not take the entire image frame as input since only the center section is the region of interest for lane keeping. The images are cropped before being fed to the CNN, as shown in Figure 8.8. The proposed CNN architecture is shown in Figure 8.9, and it is based on PilotNet [70, 72]. It has 5 convolutional layers and 3 fully-connected layers. There are no pooling layers because the feature maps are small. The convolutional layers are mainly for feature extraction and the fully connected layers are mainly for steering angle prediction, but there is no clear boundary between them since the model is trained end-to-end. Unlike PilotNet, our input image size is 400 × 150 instead of 200 × 66, and the first convolutional layer has a 4 × 4 stride and 9 × 9 kernel instead of a 2 × 2 stride and 5 × 5 kernel. The PilotNet system uses the vehicle's turning radius r as the steering command and makes the inverse turning radius 1/r the output to avoid infinite numbers when driving straight. Our CNN uses the steering wheel angle as the output, which is more intuitive. The proposed CNN model is trained using our own dataset on the Caffe [99] and Matlab software platforms.
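For illustration, the following PyTorch sketch mirrors the architecture described above: a 400 × 150 input crop, a 9 × 9 first convolution with stride 4, five convolutional layers and three fully connected layers, and a single steering wheel angle output trained with a squared-error loss. The filter counts and the remaining kernel sizes follow PilotNet and are assumptions; the actual model in this chapter was defined and trained in Caffe.

```python
# A hedged PyTorch sketch of the PilotNet-style regressor, not the Caffe model
# actually trained in this work. Layer widths after the first layer are assumed.
import torch
import torch.nn as nn

class LaneKeepingCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=9, stride=4), nn.ReLU(),   # 150x400 -> 36x98
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),  # -> 16x47
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),  # -> 6x22
            nn.Conv2d(48, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 4x20
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 2x18
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 2 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 1),                # steering wheel angle (degrees)
        )

    def forward(self, x):
        return self.regressor(self.features(x))

model = LaneKeepingCNN()
loss_fn = nn.MSELoss()                       # Caffe's Euclidean loss differs only by a 1/2 factor
dummy = torch.zeros(1, 3, 150, 400)          # (batch, channels, height, width)
loss = loss_fn(model(dummy), torch.zeros(1, 1))
```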

Figure 8.8: An example of a cropped image frame from the dataset.

8.3 Experiment

8.3.1 Data collection

To capture images, three forward-facing cameras are mounted on the dashboard of the car, from left to right. Because the cameras are not waterproof, installing them on top of the vehicle is not appropriate. To avoid re-calibration each time, the cameras remain stationary once installed. Multi-threaded programming and software triggers are used to synchronize the three cameras to capture images at 10 Hz. The shutter time is set to auto with an upper-bound value to avoid an extremely low frame rate when the lighting condition is too dark. The image resolution is set to 1288 × 968, and captured images are stored as color image sequences. Meanwhile, the steering angle and speed information are recorded by accessing the CAN bus via the OBD-II port. The data from the OBD-II port are decoded by our customized program and then saved with time stamps, in order to synchronize with the image data. The steering wheel angle decoded from the OBD-II port has a precision of 0.07 degree and the speed data has a precision of 1 km/h, or approximately 0.28 m/s.

Figure 8.9: The CNN structure used, slightly modified from NVIDIA's PilotNet.

Figure 8.10: Our data collection system, including three forward-facing cameras, a USB hub, a laptop and access to the OBD-II port.

The steering wheel angle s needs to be converted to the vehicle's turning angle σ_f in Figure 8.6 by dividing by the steering ratio k, i.e., σ_f = s/k, where k has an estimated value of 17.8 in our experiment. Figure 8.10 shows our data collection system on a vehicle, including three forward-facing cameras, a USB hub, a laptop computer and an interface to the OBD-II port. The experimental data were collected on 7 occasions over 6 different days, approximately 1 hour each. Different lighting and weather conditions are included, such as sunny, cloudy and foggy, as shown in Figure 8.11. Night time driving is not included in our data. The collected data are then refined to be used for the task of lane keeping.


Figure 8.11: Example frames under different weather or lighting conditions. (a) Cloudy. (b) Shadowed. (c) Foggy. (d) Sunny.

Recorded data that meet any of the following criteria are discarded: non-highway driving, speed lower than 40 mph, change of lane, extreme lighting conditions, equipment failure, and sequences shorter than 1 minute. After refinement, about 3 hours of driving data are valid. Among the 7 groups of collected data, 4 groups were used for training and the other 3 groups for testing. This is to prevent overlaps between training and test data. Overall, the training data contain 68082 frames, nearly 2 hours at 10 Hz. The test data contain 32053 frames, nearly 1 hour at 10 Hz. The training data sequences are randomly shuffled before being applied to the CNN model.
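The refinement step can be expressed as a simple filter over the recorded sequences. The sketch below assumes each sequence carries per-frame speeds and a few flags for the discard criteria listed above; the field names are illustrative, not the actual format of our recordings.

```python
# A sketch of the data refinement filter described above; field names are assumed.
MIN_SPEED_MPH = 40
MIN_FRAMES = 60 * 10     # one minute of frames at 10 Hz

def refine(sequences):
    kept = []
    for seq in sequences:
        if not seq["is_highway"]:
            continue
        if seq["lane_change"] or seq["extreme_lighting"] or seq["equipment_failure"]:
            continue
        if min(seq["speed_mph"]) < MIN_SPEED_MPH or len(seq["frames"]) < MIN_FRAMES:
            continue
        kept.append(seq)
    return kept
```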

8.3.2 Data augmentation

Ideally, the training dataset should contain some error correction scenarios such that the trained CNN model is capable of handling errors, so that the vehicle stays in the lane instead of drifting away. Such error correction data introduce initial errors in the vehicle's position and/or orientation, and then provide the proper control action to correct such errors and guide the vehicle back to the lane. The original data collected from highway driving lack such error correction data, because of the safety concern of performing such dangerous maneuvers on the highway. Therefore, we propose to apply a data augmentation technique that can generate this type of error correction data virtually. This is one of the important benefits of building a simulator. Once the data are collected and the world coordinates established, it is possible to obtain the ground truth of the vehicle's position and orientation at any given time. For each frame, errors can be added manually to the vehicle's position and orientation. By using the knowledge of image projection and 3D geometry, the augmented images can be generated accordingly. At the same time, the correct control action is provided by the vehicle trajectory tracking algorithm. Therefore, the augmented data can be used as part of the training data to improve the model's robustness. In our experiment, each frame is randomly augmented 10 times by shifting the vehicle position and changing its orientation. Figure 8.3 shows the entire process of data augmentation. Figure 8.12 shows examples of augmented images.
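The augmentation loop itself only needs to sample a perturbation for each frame and query the two simulator components. The sketch below shows this bookkeeping with hypothetical helpers: render_from_pose stands in for the image projection of Section 8.2.2 and corrective_steering for the trajectory tracking controller of Section 8.2.3; the perturbation ranges are illustrative only.

```python
# A sketch of the per-frame augmentation bookkeeping (10 random perturbations per
# frame). The two helper functions are hypothetical stand-ins for the image
# projection and trajectory tracking blocks described earlier in this chapter.
import random

N_AUG = 10
MAX_SHIFT_M = 1.0        # illustrative lateral shift range
MAX_YAW_DEG = 7.0        # illustrative rotation range

def augment_frame(frame, pose, render_from_pose, corrective_steering):
    """Return (image, steering_label) pairs for one recorded frame."""
    samples = [(frame["image"], frame["steering"])]          # keep the original pair
    for _ in range(N_AUG):
        d_lat = random.uniform(-MAX_SHIFT_M, MAX_SHIFT_M)
        d_yaw = random.uniform(-MAX_YAW_DEG, MAX_YAW_DEG)
        image = render_from_pose(frame, pose, d_lat, d_yaw)  # image projection (Sec. 8.2.2)
        label = corrective_steering(pose, d_lat, d_yaw)      # trajectory tracking (Sec. 8.2.3)
        samples.append((image, label))
    return samples
```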

8.3.3 Evaluation using simulator

In our previous work [73], it is shown that the difference between the ground truth angle and the predicted angle is not an effective metric for evaluating the performance of lane keeping systems.


Figure 8.12: Example of an original image and augmented images given arbitrary vehicle poses. (a) Original image. (b) Augmented image as if the vehicle is shifted right by 0.5 m. (c) Augmented image as if the vehicle is rotated left by 7 degrees. (d) Augmented image as if the vehicle is shifted right by 0.5 m and rotated left by 7 degrees.

Hereby we propose a new metric: measuring the percentage of driving time during which the vehicle stays in lane. Our simulator can be employed as an evaluation platform for autonomous lane keeping. The process flow of using the simulator for evaluation is illustrated in Figure 8.2. Given the initial steering angle provided by the CNN model, the vehicle position and orientation are updated by the vehicle dynamics. Subsequently, a front-view camera image is generated through image projection according to the current vehicle position and orientation. The new image is then fed to the CNN model, which produces the steering angle for the next time step. The same process repeats for all frames in a test sequence. At each time step, the amount of position difference from the ground truth is calculated. For simplicity, the longitudinal difference is fixed to zero, and the horizontal shift is compared with a threshold value. If the horizontal shift is larger than the threshold, it is considered a lane keeping failure.

The threshold is set to 1 meter in our experiment. For each failure occurrence, the next 60 frames are automatically marked as a manual driving period. All other frames without failure are considered the autonomous driving period. The final criterion is the percentage of autonomous driving time (autonomy):

$$A = \frac{t_a}{t_a + t_m} \qquad (8.7)$$

where t_a and t_m represent the autonomous time and the manually controlled time, respectively. Figure 8.13 shows an example of the simulation results when comparing the vehicle positions with the ground truth. The steering angles are produced by the CNN model trained with data augmentation. In our experiment, the CNNs trained with and without augmented data are both evaluated using the simulator, and the results are shown in Table 8.1. The error of position is only evaluated when the vehicle is in autonomous driving mode; the data during the manually controlled time in simulation are not evaluated. The percentage of autonomous driving time using the model trained with augmented data is 98.32% and the number of failures is 9, which are significantly better than the results of 82.09% and 98 without augmented data.

Table 8.1: Evaluation result using the simulator, with and without augmented data.

Augmented Data    Autonomy    No. of Failures    Error of Position: Mean (m)    Error of Position: Std. Dev. (m)
Yes               98.32%      9                  0.2179                         0.1813
No                82.09%      98                 0.2670                         0.2071
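The autonomy metric of equation 8.7 can be computed directly from the per-frame lateral errors produced by the simulator. The sketch below applies the 1 m failure threshold and counts the failure frame together with the following 60 frames as manual driving, which is a slight simplification of the rule stated above.

```python
# A sketch of the autonomy metric (equation 8.7) from simulated lateral errors.
FAIL_THRESHOLD_M = 1.0
MANUAL_FRAMES = 60       # frames marked as manual driving after each failure

def autonomy(lateral_errors):
    failures, manual, remaining = 0, 0, 0
    for err in lateral_errors:
        if remaining > 0:                    # inside a manual driving period
            manual += 1
            remaining -= 1
        elif abs(err) > FAIL_THRESHOLD_M:    # new failure: this frame plus the next 60 are manual
            failures += 1
            manual += 1
            remaining = MANUAL_FRAMES
        # otherwise the frame counts as autonomous driving
    autonomous = len(lateral_errors) - manual
    return autonomous / (autonomous + manual), failures
```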

In addition, the simulation results also show that the error of the steering wheel angle is not an effective metric for performance evaluation.

Figure 8.13: An example of the simulation result, produced by the CNN trained with data augmentation. (a) Overview of the trajectory in a test sequence. (b) Trajectory zoomed in on the black rectangle in (a). (c) Trajectory zoomed in on the black rectangle in (b).

The model trained with augmented data has a mean error of 0.3042 degrees and a standard deviation of 1.6029 degrees. The model trained without augmented data has a mean error of 0.3118 degrees and a standard deviation of 1.2043 degrees. One can hardly tell which model is better from the mean error and standard deviation of the steering angles. The deployed simulator with the CNN predictor runs at approximately 13 frames per second (FPS). Considering the input data rate of 10 Hz, the end-to-end lane keeping system is able to run in real time. The hardware platform is a desktop computer with an Intel i5 3570K processor running at 3.4 GHz, 32 GB DDR3 RAM and one NVIDIA GTX 1080 GPU.

8.4 Discussion

It is worth investigating the causes of some failures during evaluation. For example, a failure case is shown in Figure 8.14. The vehicle is moving out of the lane to the right because the front vehicle is changing lanes and the lane markings are partially blocked. Another case is shown in Figure 8.15, with a cast shadow on the road. In most cases, we believe the quality of the input data plays a role in those failures, which can be attributed to factors such as shadows on the road, extreme lighting conditions, camera exposure settings, etc. Because of the complicated scenarios in the real world, the robustness of a model needs to be fully examined prior to deployment. Therefore, a simulator built on real-world data becomes very useful.

Figure 8.14: An example of failure. The vehicle is going out of lane to the right because another vehicle is changing lane, and lane markings are partially blocked.

Figure 8.15: An example of failure. The vehicle is going out of lane to the right because of unclear lane markings.

8.5 Conclusions

This chapter presents an autonomous driving simulator that is built on real-world data, with recordings from three front-view cameras, steering wheel angles and vehicle speed information. A vehicle dynamic model and trajectory tracking are incorporated in the simulator to predict the vehicle movement. With proper calibration, a 3D image projection technique can be applied to generate updated front-view images at the current vehicle position and orientation. The simulator can be used for both training and evaluation of vision-based lane keeping algorithms. Moreover, an end-to-end learning lane keeping system is proposed using a CNN model to predict the steering angle from the front-view camera input. The CNN model trained with augmented data results in significantly better performance than using only the original recorded data, when measured by the percentage of autonomous driving time. This new real-world driving dataset is shared online and can bring benefits to research and education in autonomous vehicle technology.

Chapter 9

Conclusions

This dissertation presents the design and implementation of a group of systems for autonomous vehicles. The real-time GPU-based traffic sign detection and recognition system is capable of detecting and recognizing 48 classes of traffic signs of any size in each image frame. The detection rate is about 91.69% and the recognition rate is about 93.77%. The system can process 27.9 fps video with the active pixels of a 1,628 × 1,236 resolution. Because each frame is processed individually, no information from previous frames is required. As part of our future work, information from previous frames will be considered for tracking traffic signs, which is expected to further improve the detection accuracy. Two traffic light detection and recognition systems are presented. The first system detects and recognizes red circular lights only, using image processing and SVM. The performance is better than that of traditional detectors. The second system is more complicated. It detects and classifies different types of traffic lights, including green and red lights in both circular and arrow forms.

Color extraction and blob detection are applied to locate the candidates with proper optimization. A classification and validation method using PCANet is then used for frame-by-frame detection. The multi-object tracking method and forecasting technique are employed to improve the accuracy and produce stable results. As an additional contribution, we build a traffic light dataset from the videos captured via a camera mounted behind the windshield. A novel pedestrian detection instrumentation is designed using both thermal and RGB-D stereo cameras. Data are collected from on-road driving and an experimental dataset is built with the bounding box labeling of pedestrians as the ground truth. A reconfigurable multi-stage detector framework is proposed. Both HOG and CCF based detection methods are evaluated using data from the multi-spectral cameras and their various combinations. The experimental results indicate that the approach using CCF outperforms that involving HOG features. The combination of color and thermal images using the CCF method can achieve the best performance of about 9% log-average miss rate. For future work, other advanced feature extraction and classification methods will be considered to further improve the detector performance. The lane keeping system employs an end-to-end learning approach to obtain the proper steering angle for maintaining the car in the lane. The CNN model is trained and evaluated using the comma.ai dataset, which contains image frames and the steering angle data captured from road driving. The test results show that the model can produce relatively accurate steering of the vehicle. Further discussions on evaluation and data augmentation are also presented for future improvement. A simulator for the lane keeping system is built using image projection, vehicle dynamics and vehicle trajectory tracking. This is important for data augmentation and evaluation. The test results show that the model trained with augmented data using the simulator has better performance.

Our on-vehicle data collection systems are also implemented and deployed, and our own datasets are built from recorded driving videos. These datasets are used in most of our projects and can benefit other researchers in the future. Our experimental datasets are available at http://computing.wpi.edu/Dataset.html.

Bibliography

[1] “Red light running,” Insurance Institute for Highway Safety. [Online]. Available: http://www.iihs.org/iihs/topics/t/red-light-running/topicoverview

[2] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[3] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research (IJRR), 2013.

[4] J. Fritsch, T. Kuehnl, and A. Geiger, “A new performance measure and eval- uation benchmark for road detection algorithms,” in International Conference on Intelligent Transportation Systems (ITSC), 2013.

[5] M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” in Con- ference on Computer Vision and Pattern Recognition (CVPR), 2015.

[6] M. Mathias, R. Timofte, R. Benenson, and L. V. Gool, “Traffic sign recognition - how far are we from the solution?” in Proceedings of IEEE International Joint Conference on Neural Networks (IJCNN 2013), August 2013.

[7] S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, “Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark,” in International Joint Conference on Neural Networks, no. 1288, 2013.

[8] “Traffic Lights Recognition public benchmarks.” [Online]. Available: http://www.lara.prd.fr/benchmarks/trafficlightsrecognition

[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detec- tion,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, June 2005, pp. 886–893 vol. 1.

[10] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: A benchmark,” in CVPR, June 2009.

[11] S. R. Richter, V. Vineet, S. Roth, and V. Koltun, “Playing for data: Ground truth from computer games,” in European Conference on Computer Vision (ECCV), ser. LNCS, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., vol. 9906. Springer International Publishing, 2016, pp. 102–118.

[12] J. Greenhalgh and M. Mirmehdi, “Real-time detection and recognition of road traffic signs,” Intelligent Transportation Systems, IEEE Transactions on, vol. 13, no. 4, pp. 1498–1506, 2012.

[13] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, “Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition,” Neural Networks, no. 0, pp. –, 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608012000457

[14] C. Keller, C. Sprunk, C. Bahlmann, J. Giebel, and G. Baratoff, “Real-time recognition of U.S. speed signs,” in Intelligent Vehicles Symposium, 2008 IEEE, June 2008, pp. 518–523.

[15] W. Liu, Y. Wu, J. Lv, H. Yuan, and H. Zhao, “U.s. speed limit sign detection and recognition from image sequences,” in Control Automation Robotics Vision (ICARCV), 2012 12th International Conference on, Dec 2012, pp. 1437–1442.

[16] F. Zaklouta, B. Stanciulescu, and O. Hamdoun, “Traffic sign classification us- ing k-d trees and random forests,” in Neural Networks (IJCNN), The 2011 International Joint Conference on, July 2011, pp. 2151–2155.

[17] P. Sermanet and Y. LeCun, “Traffic sign recognition with multi-scale convolu- tional networks,” in Neural Networks (IJCNN), The 2011 International Joint Conference on, July 2011, pp. 2809–2813.

[18] E. Herbschleb and P. H. N. de With, “Real-time traffic sign detection and recognition,” pp. 72 570A–72 570A–12, 2009. [Online]. Available: http://dx.doi.org/10.1117/12.806171

[19] A. D. L. Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol, “Road traffic sign detection and classification,” IEEE Transactions on Industrial Elec- tronics, vol. 44, pp. 848–859, 1997.

[20] K. Par and O. Tosun, “Real-time traffic sign recognition with map fusion on multicore/many-core architectures,” Acta Polytechnica Hungarica, vol. 9, no. 2, 2012.

[21] R. de Charette and F. Nashashibi, “Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates,” in Intelligent Vehicles Symposium, 2009 IEEE, June 2009, pp. 358–363.

[22] ——, “Traffic light recognition using image processing compared to learning processes,” in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, Oct 2009, pp. 333–338.

[23] G. Trehard, E. Pollard, B. Bradai, and F. Nashashibi, “Tracking both pose and status of a traffic light via an interacting multiple model filter,” in Information Fusion (FUSION), 2014 17th International Conference on, July 2014, pp. 1–7.

[24] S. Sooksatra and T. Kondo, “Red traffic light detection using fast radial symme- try transform,” in Electrical Engineering/Electronics, Computer, Telecommu- nications and Information Technology (ECTI-CON), 2014 11th International Conference on, May 2014, pp. 1–6.

[25] T.-P. Sung and H.-M. Tsai, “Real-time traffic light recognition on mobile devices with geometry-based filtering,” in Distributed Smart Cameras (ICDSC), 2013 Seventh International Conference on, Oct 2013, pp. 1–7.

[26] J. Levinson, J. Askeland, J. Dolson, and S. Thrun, “Traffic light mapping, local- ization, and state detection for autonomous vehicles,” in Robotics and Automa- tion (ICRA), 2011 IEEE International Conference on, May 2011, pp. 5784– 5791.

[27] N. Fairfield and C. Urmson, “Traffic light mapping and detection,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on, May 2011, pp. 5421–5426.

[28] A. Gomez, F. Alencar, P. Prado, F. Osorio, and D. Wolf, “Traffic lights detection and state estimation using hidden Markov models,” in Intelligent Vehicles Symposium Proceedings, 2014 IEEE, June 2014, pp. 750–755.

[29] S. Salti, A. Petrelli, F. Tombari, N. Fioraio, and L. Di Stefano, “Traffic sign detection via interest region extraction,” Pattern Recognition, vol. 48(4), pp. 1039–1049, 2015.

[30] G. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, July 2006.

[31] I. Arel, D. Rose, and T. Karnowski, “Deep machine learning - a new frontier in artificial intelligence research [research frontier],” Computational Intelligence Magazine, IEEE, vol. 5, no. 4, pp. 13–18, Nov 2010.

[32] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “Pcanet: A simple deep learning baseline for image classification?” arXiv preprint arXiv:1404.3606, 2014.

[33] S. Lafuente-Arroyo, S. Maldonado-Bascon, P. Gil-Jimenez, H. Gomez-Moreno, and F. Lopez-Ferreras, “Road sign tracking with a predictive filter solution,” in IEEE Industrial Electronics, IECON 2006 - 32nd Annual Conference on, Nov 2006, pp. 3314–3319.

[34] S. Lafuente-Arroyo, S. Maldonado-Bascon, P. Gil-Jimenez, J. Acevedo- Rodriguez, and R. Lopez-Sastre, “A tracking system for automated inventory of road signs,” in Intelligent Vehicles Symposium, 2007 IEEE, June 2007, pp. 166–171.

[35] S. Zhang, R. Benenson, M. Omran, J. H. Hosang, and B. Schiele, “How far are we from solving pedestrian detection?” CoRR, vol. abs/1602.01237, 2016. [Online]. Available: http://arxiv.org/abs/1602.01237

[36] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: An evaluation of the state of the art,” PAMI, vol. 34, 2012.

[37] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004. [Online]. Available: http://dx.doi.org/10.1023/B:VISI.0000013087.49260.fb

[38] R. Benenson, M. Omran, J. H. Hosang, and B. Schiele, “Ten years of pedestrian detection, what have we learned?” CoRR, vol. abs/1411.4304, 2014. [Online]. Available: http://arxiv.org/abs/1411.4304

[39] P. Dollar, Z. Tu, P. Perona, and S. Belongie, “Integral channel features,” pp. 91.1–91.11, 2009, doi:10.5244/C.23.91.

[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

[41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556

[42] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[43] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Convolutional channel features for pedestrian, face and edge detection,” CoRR, vol. abs/1504.07339, 2015. [Online]. Available: http://arxiv.org/abs/1504.07339

[44] R. Gade and T. B. Moeslund, “Thermal cameras and applications: a survey,” Machine Vision and Applications, vol. 25, no. 1, pp. 245–262, 2014. [Online]. Available: http://dx.doi.org/10.1007/s00138-013-0570-5

[45] W. Li, D. Zheng, T. Zhao, and M. Yang, “An effective approach to pedestrian detection in thermal imagery,” in Natural Computation (ICNC), 2012 Eighth International Conference on, May 2012, pp. 325–329.

[46] F. Suard, A. Rakotomamonjy, A. Bensrhair, and A. Broggi, “Pedestrian detec- tion using infrared images and histograms of oriented gradients,” in 2006 IEEE Intelligent Vehicles Symposium, 2006, pp. 206–212.

[47] C. Dai, Y. Zheng, and X. Li, “Pedestrian detection and tracking in infrared imagery using shape and appearance,” Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 288–299, 2007, special issue on Advances in Vision Algorithms and Systems beyond the Visible Spectrum. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1077314206001925

[48] J. W. Davis and M. A. Keck, “A two-stage template approach to person detection in thermal imagery,” Applications of Computer Vision and the IEEE Workshop on Motion and Video Computing, IEEE Workshop on, vol. 1, pp. 364–369, 2005.

[49] F. Xu, X. Liu, and K. Fujimura, “Pedestrian detection and tracking with night vision,” IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 1, pp. 63–71, March 2005.

[50] D. Olmeda, A. de la Escalera, and J. M. Armingol, “Contrast invariant features for human detection in far infrared images,” in Intelligent Vehicles Symposium (IV), 2012 IEEE, June 2012, pp. 117–122.

[51] W. Wang, J. Zhang, and C. Shen, “Improved human detection and classifi- cation in thermal images,” in 2010 IEEE International Conference on Image Processing, Sept 2010, pp. 2313–2316.

[52] M. Bertozzi, A. Broggi, C. H. Gomez, R. I. Fedriga, G. Vezzoni, and M. DelRose, “Pedestrian detection in far infrared images based on the use of probabilistic templates,” in 2007 IEEE Intelligent Vehicles Symposium, June 2007, pp. 327– 332.

[53] T. T. Zin, H. Takahashi, and H. Hama, “Robust person detection using far infrared camera for ,” in Innovative Computing, Information and Control, 2007. ICICIC ’07. Second International Conference on, Sept 2007, pp. 310–310.

[54] D. Geronimo, A. M. Lopez, A. D. Sappa, and T. Graf, “Survey of pedestrian detection for advanced driver assistance systems,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1239–1258, July 2010.

[55] S. Hwang, J. Park, N. Kim, Y. Choi, and I. S. Kweon, “Multispectral pedes- trian detection: Benchmark dataset and baseline,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 1037–1045.

[56] S. J. Krotosky and M. M. Trivedi, “On color-, infrared-, and multimodal-stereo approaches to pedestrian detection,” IEEE Transactions on Intelligent Trans- portation Systems, vol. 8, no. 4, pp. 619–629, Dec 2007.

[57] K. H. Lee and J. N. Hwang, “On-road pedestrian tracking across multiple driv- ing recorders,” IEEE Transactions on Multimedia, vol. 17, no. 9, pp. 1429–1438, Sept 2015.

[58] W. Liu, R. W. H. Lau, X. Wang, and D. Manocha, “Exemplar-amms: Recog- nizing crowd movements from pedestrian trajectories,” IEEE Transactions on Multimedia, vol. 18, no. 12, pp. 2398–2406, Dec 2016.

[59] R. Risack, N. Mohler, and W. Enkelmann, “A video-based lane keeping assis- tant,” in Proceedings of the IEEE Intelligent Vehicles Symposium 2000 (Cat. No.00TH8511), 2000, pp. 356–361.

[60] S. Ishida and J. E. Gayko, “Development, evaluation and introduction of a lane keeping assistance system,” in IEEE Intelligent Vehicles Symposium, 2004, June 2004, pp. 943–944.

[61] J. F. Liu, J. H. Wu, and Y. F. Su, “Development of an interactive lane keeping control system for vehicle,” in 2007 IEEE Vehicle Power and Propulsion Conference, Sept 2007, pp. 702–706.

[62] A. H. Eichelberger and A. T. McCartt, “Toyota drivers’ experiences with dynamic radar cruise control, pre-collision system, and lane-keeping assist,” Journal of Safety Research, vol. 56, pp. 67 – 73, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0022437515001061

[63] Y. Li, “Deep reinforcement learning: An overview,” CoRR, vol. abs/1701.07274, 2017. [Online]. Available: http://arxiv.org/abs/1701.07274

[64] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, “End-to-end deep reinforcement learning for lane keeping assist,” CoRR, vol. abs/1612.04340, 2016. [Online]. Available: http://arxiv.org/abs/1612.04340

[65] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “Safe, multi-agent, reinforcement learning for autonomous driving,” CoRR, vol. abs/1610.03295, 2016. [Online]. Available: http://arxiv.org/abs/1610.03295

[66] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, “Deep reinforcement learning framework for autonomous driving,” CoRR, vol. abs/1704.02532, 2017. [Online]. Available: http://arxiv.org/abs/1704.02532

[67] S. Sharifzadeh, I. Chiotellis, R. Triebel, and D. Cremers, “Learning to drive using inverse reinforcement learning and deep q-networks,” CoRR, vol. abs/1612.03653, 2016. [Online]. Available: http://arxiv.org/abs/1612.03653

[68] D. A. Pomerleau, “Advances in neural information processing systems 1,” D. S. Touretzky, Ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, ch. ALVINN: An Autonomous Land Vehicle in a Neural Network, pp. 305–313. [Online]. Available: http://dl.acm.org/citation.cfm?id=89851.89891

[69] Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp, “Off-road obstacle avoidance through end-to-end learning,” in Proceedings of the 18th International Conference on Neural Information Processing Systems, ser. NIPS’05. Cambridge, MA, USA: MIT Press, 2005, pp. 739–746. [Online]. Available: http://dl.acm.org/citation.cfm?id=2976248.2976341

[70] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, “End to end learning for self-driving cars,” CoRR, vol. abs/1604.07316, 2016. [Online]. Available: http://arxiv.org/abs/1604.07316

[71] M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. D. Jackel, U. Muller, and K. Zieba, “Visualbackprop: visualizing cnns for autonomous driving,” CoRR, vol. abs/1611.05418, 2016. [Online]. Available: http://arxiv.org/abs/1611.05418

[72] M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. D. Jackel, and U. Muller, “Explaining how a deep neural network trained with end-to-end learning steers a car,” CoRR, vol. abs/1704.07911, 2017. [Online]. Available: http://arxiv.org/abs/1704.07911

[73] Z. Chen and X. Huang, “End-to-end learning for lane keeping of self-driving cars,” in 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017.

[74] J. Hardy and M. Campbell, “Contingency planning over probabilistic obstacle predictions for autonomous road vehicles,” IEEE Transactions on Robotics, vol. 29, no. 4, pp. 913–929, 2013.

[75] E. Frazzoli, M. A. Dahleh, and E. Feron, “Real-time motion planning for agile autonomous vehicles,” in American Control Conference, 2001. Proceedings of the 2001, vol. 1. IEEE, 2001, pp. 43–49.

[76] M. Likhachev and D. Ferguson, “Planning long dynamically feasible maneu- vers for autonomous vehicles,” The International Journal of Robotics Research, vol. 28, no. 8, pp. 933–945, 2009.

[77] R. Y. Hindiyeh, “Dynamics and control of drifting in automobiles,” , March, 2013.

[78] E. Galceran, R. M. Eustice, and E. Olson, “Toward integrated motion planning and control using potential fields and torque-based steering actuation for au- tonomous driving,” in Proceedings of the IEEE Intelligent Vehicle Symposium, Seoul, Korea, June 2015, pp. 304–309.

[79] R. DeSantis, “Path-tracking for articulated vehicles via exact and jacobian lin- earization,” IFAC Proceedings Volumes, vol. 31, no. 3, pp. 159–164, 1998.

[80] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detec- tion,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, June 2005, pp. 886–893 vol. 1.

[81] “BelgiumTS Dataset,” 2010. [Online]. Available: http://btsd.ethz.ch/ shareddata/

168 [82] F. Zaklouta and B. Stanciulescu, “Real-time traffic sign recognition in three stages,” Robotics and Autonomous Systems, vol. 62, no. 1, pp. 16 – 24, 2014, new Boundaries of Robotics. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0921889012001236

[83] S. Suzuki and K. Abe, “Topological structural analysis of digitized binary im- ages by border following.” Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32–46, 1985.

[84] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, vol. 25, no. 11, pp. 120–126, 2000.

[85] H. Cheng, X. Jiang, Y. Sun, and J. Wang, “Color image segmentation: advances and prospects,” Pattern Recognition, vol. 34, no. 12, pp. 2259–2281, 2001.

[86] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Pro- cessing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.

[87] M. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “On- line multiperson tracking-by-detection from a single, uncalibrated camera,” Pat- tern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 9, pp. 1820–1833, Sept 2011.

[88] H. W. Kuhn, “The Hungarian method for the assignment problem,” in 50 Years of Integer Programming 1958-2008. Springer, 2010, pp. 29–47.

169 [89] S.-H. Bae and K.-J. Yoon, “Robust online multi-object tracking based on track- let confidence and online discriminative appearance learning,” in Computer Vi- sion and Pattern Recognition (CVPR), 2014 IEEE Conference on, June 2014, pp. 1218–1225.

[90] K. Basak, S. N. Hetu, Zhemin Li, C. L. Azevedo, H. Loganathan, T. Toledo, Runmin Xu, Yan Xu, Li-Shiuan Peh, and M. Ben-Akiva, “Modeling reaction time within a traffic simulation model,” in 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), Oct 2013, pp. 302–309.

[91] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.

[92] P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10, pp. 78–87, Oct. 2012.

[93] Z. Chen, X. Huang, Z. Ni, and H. He, “A gpu-based real-time traffic sign detection and recognition system,” in Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2014 IEEE Symposium on, Dec 2014, pp. 1–5.

[94] Z. Chen, J. Wang, H. He, and X. Huang, “A fast deep learning system using gpu,” in 2014 IEEE International Symposium on Circuits and Systems (IS- CAS), June 2014, pp. 1552–1555.

[95] Y. Zhou, W. Wang, and X. Huang, “FPGA design for PCANet deep learning network,” in Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International Symposium on, May 2015, pp. 232–232.

[96] R. Hartley and A. Zisserman, Multiple view geometry in computer vision. Cam- bridge university press, 2003.

[97] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence- rated predictions,” Machine Learning, vol. 37, no. 3, pp. 297–336, 1999. [Online]. Available: http://dx.doi.org/10.1023/A:1007614523901

[98] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1532–1545, Aug 2014.

[99] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadar- rama, and T. Darrell, “Caffe: Convolutional architecture for fast feature em- bedding,” arXiv preprint arXiv:1408.5093, 2014.

[100] M. Rohrbach, M. Enzweiler, and D. M. Gavrila, “High-level fusion of depth and intensity for pedestrian classification,” in Joint Pattern Recognition Symposium. Springer, 2009, pp. 101–110.

[101] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, Mar 1998.

[102] B. Waske and J. A. Benediktsson, “Fusion of support vector machines for classification of multisensor data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 12, pp. 3858–3866, Dec 2007.

[103] R. Pouteau, B. Stoll, and S. Chabrier, “Support vector machine fusion of mul- tisensor imagery in tropical ecosystems,” in Image Processing Theory Tools and Applications (IPTA), 2010 2nd International Conference on, July 2010, pp. 325–329.

[104] J. Zhao, B. Xie, and X. Huang, “Real-time lane departure and front collision warning system on an fpga,” in 2014 IEEE High Performance Extreme Com- puting Conference (HPEC), Sept 2014, pp. 1–5.

[105] A. J. Humaidi and M. A. Fadhel, “Performance comparison for lane detection and tracking with two different techniques,” in 2016 Al-Sadeq International Conference on Multidisciplinary in IT and Communication Science and Appli- cations (AIC-MITCSA), May 2016, pp. 1–6.

[106] C. Li, J. Wang, X. Wang, and Y. Zhang, “A model based path planning algo- rithm for self-driving cars in dynamic environment,” in 2015 Chinese Automa- tion Congress (CAC), Nov 2015, pp. 1123–1128.

[107] S. Yoon, S. E. Yoon, U. Lee, and D. H. Shim, “Recursive path planning us- ing reduced states for car-like vehicles on grid maps,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2797–2813, Oct 2015.

[108] J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli, “Kinematic and dynamic vehicle models for autonomous driving control design,” in 2015 IEEE Intelligent Vehicles Symposium (IV), June 2015, pp. 1094–1099.

[109] D. Wang and F. Qi, “Trajectory planning for a four-wheel-steering vehicle,” in Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation, vol. 4, 2001, pp. 3320–3325 vol.4.

[110] The comma.ai driving dataset. [Online]. Available: https://github.com/ commaai/research

[111] S. Minhas, A. Hernández-Sabaté, S. Ehsan, K. Díaz-Chito, A. Leonardis, A. M. López, and K. D. McDonald-Maier, LEE: A Photorealistic Virtual Environment for Assessing Driver-Vehicle Interactions in Self-driving Mode. Cham: Springer International Publishing, 2016, pp. 894–900.

[112] E. Santana and G. Hotz, “Learning a driving simulator,” CoRR, vol. abs/1608.01230, 2016. [Online]. Available: http://arxiv.org/abs/1608.01230

[113] R. Szeliski, Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
