
Image-based Perceptual Learning Algorithm for Autonomous Driving

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Yunming Shao, M.S.

Graduate Program in Geodetic Science

The Ohio State University

2017

Dissertation Committee:

Dr. Dorota A. Grejner-Brzezinska, Advisor

Dr. Charles Toth, Co-advisor

Dr. Alper Yilmaz

Dr. Rongjun Qin

Copyrighted by

Yunming Shao

2017

Abstract

Autonomous driving is widely acknowledged as a promising solution to modern traffic problems such as congestion and accidents. It is a complicated system comprising sub-modules such as perception, path planning, and control. During the last few years, research and experimentation have shifted from the academic to the industrial sector.

Different sensor configurations exist depending on the manufacturer, but an imaging component is used by every company. In this dissertation, we mainly focus on innovating and improving the camera perception algorithms using deep learning. In addition, we propose an end-to-end control approach that maps image pixels directly to control commands. This dissertation contributes to the development of autonomous driving in the three following aspects:

Firstly, a novel dynamic object detection architecture using still images is proposed.

Our dynamic detection architecture utilizes a Convolutional Neural Network (CNN) with an end-to-end training approach. In our model, we consider multiple requirements for dynamic object detection in autonomous driving in addition to accuracy, such as inference speed, model size, and energy consumption. These are crucial to the deployment of a detector in a real autonomous vehicle. We determine our final architecture by exploring different pre-trained feature extractors and different combinations of multi-scale feature layers. Our architecture is extensively tested on the KITTI visual benchmark datasets [84] and achieves accuracy comparable to the state-of-the-art approaches in real time.

Secondly, to take advantage of the contextual information in video sequences, we develop a video object detection framework based on the CNN and Long Short Term Memory (LSTM). LSTM is a special kind of Recurrent Neural Network (RNN). The architecture proposed in chapter 3 acts as the still image detector and feature extractor, and the LSTM is responsible for exploiting the temporal information in the video stream. The input to the LSTM can be the visual features of the still image detector, the detection results of an individual frame, or both. We found that a combination of proper visual features and detection results, used as the temporal information to the LSTM, achieves better performance than either one used alone.

Finally, we design an end-to-end control algorithm that takes video sequences as input and directly outputs control commands. We mainly focus on supervised deep learning methods, i.e., convolutional neural networks and recurrent neural networks, and train them using simulated and real road data. As in the video object detection task, the recurrent neural network is designed to take advantage of the temporal information.

Through experiments, we evaluate several proposed network architectures and recommend the one with the best performance.


Acknowledgment

First and foremost, I would like to thank my adviser, Dr. Dorota Grejner-Brzezinska, for her valuable mentorship, patience, and encouragement during my Ph.D. study. Her continuous support as I repeatedly shifted research focus and explored new ideas is greatly appreciated. I would also like to thank my co-adviser, Dr. Charles Toth, for his guidance of my study and his insight into the research topics. In addition, I learned a lot by attending the weekly group meetings he hosted. I also appreciate his patient review of and constructive feedback on my dissertation.

I would also like to give thanks to Dr. Alper Yilmaz and Dr. Rongjun Qin for serving on the committee of my dissertation defense. Both of them provided precious comments and feedback, which helped improve my dissertation. I would further like to thank my fellow students and colleagues for maintaining a professional research environment and for the helpful discussions.

Finally, I would like to thank my family, especially my wife Yan Gao. They have always encouraged and supported me, and I would not have been able to complete this work without all of their love and encouragement.


Vita

2010………………………. B.S. Mapping Engineering, Chinese University of Petroleum

2013………………………. M.S. Mapping Engineering, Chinese Academy of Sciences

2013 to present…………… Graduate Student, School of Earth Sciences, The Ohio State University

Fields of Study

Major Field: Geodetic Science


Table of Contents

Abstract ...... ii

Acknowledgment ...... iv

Vita ...... v

Fields of Study ...... v

Table of Contents ...... vi

List of Acronyms ...... x

List of Tables ...... xii

List of Figures ...... xiv

Chapter 1: Introduction ...... 1

1.1 Motivation ...... 1

1.2 History and Present ...... 3

1.3 System Architecture ...... 6

1.3.1 Sensor Input ...... 7

1.3.2 Perception ...... 10

1.3.3 Planning ...... 13


1.3.4 Control ...... 15

1.4 Contributions...... 17

1.5 Dissertation Organization ...... 19

Chapter 2: Foundation and Literature Review ...... 21

2.1 Computer Vision and Deep Learning ...... 21

2.1.1 Computer Vision ...... 21

2.1.2 Deep Learning ...... 27

2.1.3 Datasets ...... 28

2.2 Neural Network ...... 29

2.2.1 Neurons and Neural Network Architecture ...... 29

2.2.2 Activation Functions ...... 31

2.2.3 Training A Neural Network ...... 34

2.3 Convolutional Neural Network ...... 39

2.3.1 Architecture ...... 39

2.3.2 Convolution ...... 41

2.3.3 Pooling ...... 43

2.3.4 Dropout...... 45

2.3.5 Transfer Learning ...... 46

2.4 Recurrent Neural Network ...... 48


2.4.1 Architecture ...... 49

2.4.2 Training RNNs ...... 50

Chapter 3: Dynamic Object Detection on Road ...... 52

3.1 Related Work...... 52

3.1.1 Traditional Approaches ...... 52

3.1.2 CNNs for Object Detection ...... 54

3.2 Model Architecture ...... 56

3.4 Experiments ...... 64

3.4.1 Quantitative Results ...... 64

3.4.2 Qualitative Results ...... 70

3.5 Conclusion ...... 74

Chapter 4: Video Object Detection for Multiple Tracking ...... 75

4.1 Introduction ...... 75

4.2 Related Work...... 77

4.3 Long Short Term Memory ...... 79

4.4 Methods ...... 83

4.4.1 System Overview ...... 83

4.4.2 LSTM Choice ...... 84

4.4.3 Training ...... 85


4.5 Experiments ...... 86

4.5.1 Quantitative Results ...... 86

4.5.2 Qualitative Results ...... 88

4.6 Conclusion ...... 92

Chapter 5: End to End Learning for Vehicle Control ...... 93

5.1 Related Work...... 93

5.2 Network Architecture ...... 95

5.3 Implementation and Training ...... 101

5.4 Experiments and Results ...... 104

5.4.1 Simulation Study ...... 104

5.4.2 Real Road Test ...... 110

5.5 Summary ...... 114

Chapter 6: Summary and Future Work ...... 116

6.1 Summary and Contributions...... 116

6.2 Future Work ...... 118

References ...... 121


List of Acronyms

ANN Artificial Neural Network

BGD Batch Gradient Descent

BPTT Backpropagation Through Time

CNN Convolutional Neural Network

DPM Deformable Part Model

ELU Exponential Linear Units

FCN Fully Convolutional Network

GPS Global Positioning System

GRU Gated Recurrent Unit

HOG Histogram of Oriented Gradients

INS Inertial Navigation System

IOU Intersection Over Union

LSTM Long Short Term Memory

MDP Markov Decision Processes

MPC Model Predictive Control

NMS Non-Maximum Suppression

PID Proportional, Integral, Derivative


ReLU Rectified Linear Unit

RNN Recurrent Neural Network

RPN Region Proposal Network

SGD Stochastic Gradient Descent

SIFT Scale Invariant Feature Transform

SLAM Simultaneous Localization And Mapping

SSD Single Shot Detector

SVM Support Vector Machine

SVR Support Vector Regression

YOLO You Only Look Once


List of Tables

Table 3.1: Detection accuracy summary for different feature extractors. The mean Average Precision (mAP) is the mean value of the AP over all classes. The mAP and AP are given in percent, and the speed is measured in Frames Per Second (FPS). The AP for each class (Car, Cyclist, Pedestrian) at the different difficulty levels, i.e., Easy (E), Moderate (M), and Hard (H), is also calculated...... 67

Table 3.2: Detection accuracy summary obtained by utilizing different feature layers. The mean Average Precision (mAP) is the mean value of the AP over all classes. The mAP and AP are given in percent, and the speed is measured in Frames Per Second (FPS)...... 70

Table 4.1: Detection performance using VGG16 as the backbone CNN architecture and different inputs to the LSTM units. The mean Average Precision (mAP) is the mean value of the AP over all classes; the mAP and AP are given in percent, and the speed is measured in Frames Per Second (FPS). “Base” means no LSTM units are used; “D” means only detected bounding boxes are input to the LSTM units; “F” means only visual features are input to the LSTM units; “F&D” means both detected bounding boxes and visual features are input to the LSTM units. The AP for each class (Car, Cyclist, Pedestrian) at the different difficulty levels, i.e., Easy (E), Moderate (M), and Hard (H), is also calculated...... 87

Table 4.2: Detection performance using MobileNet as the backbone CNN architecture and different inputs to the LSTM units. The mean Average Precision (mAP) is the mean value of the AP over all classes; the mAP and AP are given in percent, and the speed is measured in Frames Per Second (FPS). “Base” means no LSTM units are used; “D” means only detected bounding boxes are input to the LSTM units; “F” means only visual features are input to the LSTM units; “F&D” means both detected bounding boxes and visual features are input to the LSTM units. The AP for each class (Car, Cyclist, Pedestrian) at the different difficulty levels, i.e., Easy (E), Moderate (M), and Hard (H), is also calculated...... 88

Table 5.1: Mean Absolute Error (MAE) and Standard Deviation (Std) of the steering angles, in radians, for different architectures on the simulated testing datasets. “GT” means that the ground truth values are used as feedback to the next time step; “OP” means that the network’s actual outputs are used as feedback. The L1, L2, L2 Fb, and L2 Fb Skip architectures are illustrated in figures 5.1-5.4...... 108


Table 5.2: Mean Absolute Error (MAE) and Standard Deviation (Std) of the steering angles, in radians, for different architectures on the real road testing datasets. “GT” means that the ground truth values are used as feedback to the next time step; “OP” means that the network’s actual outputs are used as feedback. The L1, L2, L2 Fb, and L2 Fb Skip architectures are illustrated in figures 5.1-5.4...... 112


List of Figures

Figure 1.1: The architecture of the autonomous driving system ...... 7
Figure 1.2: The Flea3 camera from Point Grey (left) and the HDL-64E Lidar from Velodyne (right) used for autonomous vehicles ...... 9
Figure 1.3: Sub-modules of the perception module ...... 11
Figure 1.4: Sub-modules of the planning module ...... 13
Figure 1.5: A deep learning neural network that replaces the perception, planning and control modules ...... 16
Figure 1.6: Relation between modules discussed in different chapters ...... 19
Figure 2.1: An example of vehicle detection using our proposed method. The image is from the KITTI Visual Benchmark dataset ...... 23
Figure 2.2: Semantic segmentation sample from the Cityscapes dataset [65]. Each pixel is assigned to a predefined set of classes and illustrated with a specific color. For example, all the pedestrian instances are recognized and pictured in red ...... 26
Figure 2.3: A cartoon drawing of a biological neuron (left) and its mathematical model (right) [86]. x = [x0, x1, x2] is the input vector; ωi and bi are the weight and bias parameters, respectively, which are learned by training. The activation function allows the model to represent more complicated real-world problems ...... 30
Figure 2.4: A 3-layer Neural Network architecture (left) and its corresponding mathematical equations (right). x is the input vector; Wi and Bi are the weight and bias parameters, respectively, which are learned by training. h1 and h2 are the intermediate vectors after hidden layers 1 and 2 ...... 31
Figure 2.5: The sigmoid function (left) and the hyperbolic tangent function (right) ...... 33
Figure 2.6: The ReLU function (left) and the leaky ReLU function (right) ...... 34
Figure 2.7: The LeNet-5 neural network architecture [45]. It is a typical CNN architecture comprised of convolution, activation, pooling and fully connected layers. It was designed to recognize hand-written letters and digits. The subsampling in this architecture is actually a 2x2 pooling layer ...... 40

Figure 2.8: An example of the convolution operation. The input feature map has spatial dimension 32 × 32 × 3. After applying 6 filters of size 5 × 5 × 3, stride 1, and zero padding 0, the size of the output feature map is 28 × 28 × 6, and the number of parameters in this convolution layer is 456 ...... 43
Figure 2.9: An example of max pooling operating on the 28 × 28 × 6 feature map with a 2 × 2 window and stride 2, resulting in a 14 × 14 × 6 feature map. The pooling operation is applied on each channel of the feature map and thus preserves the channel dimension ...... 44
Figure 2.10: An example of applying dropout to a standard neural network. Hidden layers 1 and 2 in (a) are both fully connected layers. Each circle represents a neuron and the arrow line connecting them represents the axon with data flowing on it. Red circles represent the input vector; green circles represent the intermediate feature map; and blue circles represent the output vector. A red cross in a circle means that there is no data flowing through that neuron ...... 46
Figure 2.11: Transfer learning with CNNs. (a) The pre-trained VGG-16 model [95] on the ImageNet dataset for classification. (b) Transfer learning on a small dataset. (c) Transfer learning on a medium dataset. “conv3-64” denotes 64 filters of size 3x3. “fc-4096” denotes a fully connected layer with output vector size 4096. The layers in the red box are frozen, which means their weight and bias parameters will not be updated. The layers in the blue box will be learned during the transfer learning process ...... 48
Figure 2.12: An example of an RNN. The left part of the equal sign is the recursive description of the RNN, and the right part is the corresponding RNN model unrolled over a time sequence. The red arrow indicates the backpropagation direction at time step t. x_t is the input at time step t, h_t is the output at time step t, and c_t is the cell state at time step t ...... 50
Figure 3.1: Our proposed object detection architecture, including three steps. (1) Feature extraction: the input image is fed into a feature extractor pre-trained on ImageNet, such as VGG16 [95], and multi-scale feature maps are produced. (2) Prediction with a convolution: anchors are created on the selected feature maps and feature map combinations, and a convolution layer is applied to these anchors to predict the offsets to the anchors, the associated confidence and the class probability. (3) Refinement: usually a Non-Maximum Suppression (NMS) is good enough to filter out the repeated bounding boxes ...... 57
Figure 3.2: An example of prediction on a feature map. The feature map of size W × H with P channels has K = 3 anchors produced on each grid cell. By convolving the feature map with (4 + C) × K filters of size 3 × 3 × P, where C is the number of classes to distinguish, and proper zero padding, we get an output map with the same width W and height H but a different channel size (4 + C) × K. The output map can be interpreted as C class probabilities and 4 bounding box offsets for each anchor ...... 59
Figure 3.3: The Intersection-Over-Union (IOU) ratio calculation and Non-Maximum Suppression (NMS). (a) The IOU ratio is the area of the intersection over the area of the union, which indicates how much two bounding boxes overlap. (b) The detection results before and after Non-Maximum Suppression (NMS) ...... 60
Figure 3.4: The precision-recall curves for car, cyclist and pedestrian at the different difficulty levels easy, moderate, and hard ...... 69
Figure 3.5: Example of successful detection on the KITTI test datasets. We use SqueezeNet V2 with three feature maps for the detection. Each color corresponds to an object category ...... 72
Figure 3.6: Example of detection errors on the KITTI test datasets. From top to bottom, the errors are: part of a tree predicted to be a cyclist; a still cyclist predicted to be a pedestrian; a missed detection; and a predicted bounding box that does not fit the object. We use SqueezeNet V2 with three feature maps for the detection. Each color corresponds to an object category ...... 73
Figure 4.1: A standard architecture of LSTM units. Each line carries a vector of data, and the arrow denotes the direction of the data flow. Two merging lines mean data concatenation, while a line separating into two lines means that the data it carries is copied and flows in different directions. The circles represent pointwise operations, such as vector addition or multiplication, while the boxes are sigmoid neural network layers. x_t is the input at time step t, h_t is the output at time step t, and c_t is the cell state at time step t. σ and tanh are the sigmoid function and the hyperbolic tangent function, respectively, defined and described in figure 2.5 ...... 80
Figure 4.2: The gate structure in LSTM units. It is composed of a sigmoid neural network layer σ and a pointwise multiplication operation. It can control whether the information can flow through it ...... 81
Figure 4.3: Overview of our video object detection architecture. The dashed line means that either the feature vector or the still image detection results can be inputs to the Long Short Term Memory (LSTM) units. “Detection” denotes the detection results, i.e., bounding box coordinates and class labels from the Convolutional Neural Network (CNN) or LSTM. “Features” denotes the visual feature vector from the CNN feature extractor ...... 84
Figure 4.4: Video object detection without considering the contextual information. Only the base MobileNet is used to detect objects in each individual frame. Each color corresponds to an object category ...... 90
Figure 4.5: Video object detection considering the contextual information. MobileNet is used to detect objects in each individual frame, and the output visual feature vector and detection results are fed into the LSTM units. Each color corresponds to an object category ...... 91
Figure 5.1: End-to-end control architecture with a one-layer Long Short Term Memory (LSTM), called “L1”. At each time step except the first, both the deep feature vector of the current time step and the output control commands of the last time step are fed into the LSTM unit to update the cell state ...... 97

Figure 5.2: End-to-end control architecture with two layers of Long Short Term Memory (LSTM), called “L2”. The outputs of the first LSTM layer act as the inputs to the second LSTM layer ...... 98
Figure 5.3: End-to-end control architecture with two LSTM layers and a feedback mechanism. We add the outputs of the second LSTM layer at the previous time step as the third input to the first LSTM layer at the current time step. This architecture is called “L2 Fb” ...... 99
Figure 5.4: End-to-end control architecture with two LSTM layers, a feedback mechanism, and a skip structure. We call this architecture “L2 Fb Skip” ...... 100
Figure 5.5: The training diagram of our proposed architecture using the BackPropagation Through Time (BPTT) algorithm ...... 102
Figure 5.6: The architecture at test time. It can produce a steering angle from each frame of the video sequence ...... 104
Figure 5.7: The Udacity self-driving car simulator (https://www.udacity.com) ...... 105
Figure 5.8: An example of the collected video frames from the left, center, and right cameras ...... 106
Figure 5.9: The steering angle distribution before and after data augmentation. Too many steering angles are located in the range of [−0.3, 0] before data augmentation, but the steering angle distribution becomes balanced after data augmentation ...... 107
Figure 5.10: Backbone CNN feature extractor for simulated data. “8-3 × 3” means a convolution layer with eight 3 × 3 filters; “Relu” means a ReLU activation layer that adds non-linearity to the system; “Max 2 × 2” is a max pooling layer with filter size 2 × 2 to reduce the feature map dimension; “Dropout 0.2” is a dropout layer with dropout rate 0.2 to prevent overfitting to the training data; “FC 50” means a fully connected layer with output feature vector length 50 ...... 108
Figure 5.11: Sample frames from the training video sequences ...... 111
Figure 5.12: The predicted steering angles of architecture “L2 Fb Skip GT” on the test datasets ...... 114

Chapter 1: Introduction

1.1 Motivation

People must move from one place to another to satisfy both their physiological and psychological needs. Thus, our ancestors invented various forms of transportation, which became one of the necessities of life. Animals, such as horses, were the main means of transport for a long time, until the 19th century, when travel by railway emerged. Without a doubt, the persistent improvement in production efficiency and the popularization of the automobile during the last century have greatly expanded the boundaries of our daily life activities. People, especially those in developed countries like the United States, rely heavily on cars for activities such as work, shopping, and entertainment.

However, driving by human beings is subject to errors and mistakes that have caused countless deaths over the years. Worldwide, per the Global Road Crash Data [1], traffic crashes are a major cause of death and injury, estimated at 1.3 million fatalities each year, or on average 3,287 deaths per day. In the United States alone, there are over 37,000 deaths and an additional 2.35 million injuries in road crashes each year. Of these, 94% are caused by human error, as reported by research from the U.S. National Highway Traffic Safety Administration (NHTSA) [4]. The cost of traffic crashes is incredibly high, reaching USD $518 billion globally and $230.6 billion in the United States. Unless actions are taken, traffic crashes are predicted to be the fifth leading cause of death by 2030.

In addition to traffic crashes, traffic congestion and the difficulty of parking are a nuisance and inconvenience experienced by many commuters every day. The American Driving Survey [2] from the American Automobile Association (AAA) Foundation is the most current and comprehensive survey on how much Americans drive daily and yearly. It reveals that American drivers spend an average of 17,600 minutes on the road each year, i.e., about 48 minutes each day, which is equivalent to seven 40-hour weeks at the office. That is a substantial amount of time that could be saved. Another problem is parking, which is frustrating, especially in dense urban areas. A study [3] of parking in Los Angeles found that there were 18.6 million parking spaces, occupying 14% of the incorporated L.A. land in 2010. Each of the 5.6 million vehicles in L.A. takes more than 3 spaces, but residents still complain about the lack of parking spaces and the difficulty of finding one.

It is widely believed that autonomous driving is the most promising technology to eliminate the problems described above. Autonomous driving enables cars to sense the environment and navigate without human intervention. First, autonomous driving is expected to be much safer because an autonomous vehicle is not distracted or subject to fatigue, as human drivers are. Most traffic accidents are caused by human error, simply because no one can guarantee 100 percent focus on the road while driving. With multiple sensors installed, autonomous vehicles can better perceive and understand the surrounding environment, thus also improving safety. Second, autonomous driving can reduce and even totally avoid traffic jams because autonomous vehicles have better driving behavior and, most importantly, are connected and can communicate with each other. For example, a bad driver's sudden braking can cause miles of traffic jams on a busy freeway. Autonomous driving is expected to avoid this. Third, an autonomous vehicle can park itself after delivering the passenger to a destination. A more promising scenario is vehicle sharing, which will increase the service time of each vehicle and reduce the total number of vehicles. In this way, a large amount of parking space will no longer be necessary. In addition, autonomous driving will benefit specific groups of people, such as the elderly or disabled. In conclusion, autonomous driving has the potential to benefit the transportation system in multiple ways.

1.2 History and Present

Autonomous driving experiments date back to the 1920s [5], but it was not until the 1980s that significant progress was made by Carnegie Mellon University with its Navlab vehicle, which operated in structured environments. Since then, the European project PROMETHEUS [6], the ARGO project carried out at the University of Parma, Italy, and the California Partners for Advanced Transportation Technology (PATH) program [7] [8] all set their own milestones. However, the real breakthrough was achieved during the 2004 and 2005 Defense Advanced Research Projects Agency (DARPA) Grand Challenges and the 2007 DARPA Urban Challenge. No team finished the 2004 DARPA Grand Challenge, which challenged the competitors to finish a 142-mile course without human intervention and was the first long-distance competition for autonomous vehicles. In the next year, Stanford's robot car "Stanley" [9] completed the 132-mile course within 6 hours and 54 minutes, thus winning the 2005 DARPA Grand Challenge. In 2007, DARPA decided to make the race even tougher by moving the challenge to an urban environment. The urban environment is more challenging because it is more complicated, involves more road users in the traffic, and requires interaction with other autonomous and human-operated vehicles. Carnegie Mellon University's robot car named "BOSS" [10] won the race by finishing all the missions in a little more than 4 hours. Since then, other similar competitions have taken place around the world, which together demonstrate that autonomous driving is feasible. The three DARPA challenges and other competitions further improved the techniques of autonomous driving and, most importantly, successfully attracted the attention of the public to autonomous vehicles.

After the DARPA Grand Challenges, the industry gradually took over from academic research institutes. Among the companies involved, Google has been leading the development and has tested its prototype vehicles for millions of miles. In January 2017, Google launched Waymo, a standalone company under Google's parent company, Alphabet. Other companies have also invested significant resources in autonomous driving. For example, NVIDIA has released its autonomous driving development platforms Drive PX and Drive PX 2 to the market [140]. Other IT giants, such as Apple and Baidu, and traditional car manufacturers such as GM, Ford, and Toyota have all built their own independent departments working on autonomous driving. Small startups such as Otto and nuTonomy are also developing their own systems, and some have been acquired at high prices by bigger players such as Uber and Ford.


An autonomous vehicle is defined by Nevada state law as "a motor vehicle that uses artificial intelligence, sensors and Global Positioning System coordinates to drive itself without the active intervention of a human operator" [11]. Both the Society of Automotive Engineers (SAE) International and the National Highway Traffic Safety Administration (NHTSA) released their own classifications of automation levels, but NHTSA abandoned its classification by adopting the SAE standard in September 2016. The SAE classification standard, SAE J3016 [11], defines six levels of automation based on the amount of human intervention or attentiveness required. In general, the SAE J3016 levels and definitions include:

• Level 0 – No Automation: The full-time performance by the human driver of all aspects of the dynamic driving task, even when enhanced by warning or intervention systems

• Level 1 – Driver Assistance: The driving mode-specific execution by a driver assistance system of either steering or acceleration/deceleration using information about the driving environment and with the expectation that the human driver performs all remaining aspects of the dynamic driving task

• Level 2 – Partial Automation: The driving mode-specific execution by one or more driver assistance systems of both steering and acceleration/deceleration using information about the driving environment and with the expectation that the human driver performs all remaining aspects of the dynamic driving task

• Level 3 – Conditional Automation: The driving mode-specific performance by an Automated Driving System of all aspects of the dynamic driving task with the expectation that the human driver will respond appropriately to a request to intervene

• Level 4 – High Automation: The driving mode-specific performance by an Automated Driving System of all aspects of the dynamic driving task, even if a human driver does not respond appropriately to a request to intervene

• Level 5 – Full Automation: The full-time performance by an Automated Driving System of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver

1.3 System Architecture

The system required to achieve autonomous driving is very complicated and comprises many sub-systems. Here we simplify the system into three primary modules, which are the perception/localization module, the planning module and the control module, as shown in figure 1.1. Many other researchers separate perception and localization into two tasks, making the system composed of four modules; however, we treat them as a single task because they are so closely related. Even though we simplify the system into a single path, it is a closed-loop system due to the dynamic, interactive environment. A closed-loop system means that the outputs of the current time step affect the inputs of the next time step. For example, the steering angle and throttle signals output by the system will navigate the vehicle to a new position and orientation, where the camera will capture a new view of the surroundings as part of the sensor input.


Figure 1.1: The architecture of the autonomous driving system

Perception refers to the ability to collect information and extract relevant knowledge from the environment. For autonomous driving, the perception module is responsible for perceiving and understanding the surrounding environment, taking inputs such as images or point clouds. Some researchers compare the perception module to the human eye. We argue that the eye itself cannot understand the scene it sees; the brain must be involved as well.

The tasks of the perception module include not only the detection of other vehicles and pedestrians and the detection and understanding of traffic signs/lights, but also traffic scene understanding, which is more challenging for a machine. For example, it must be able to recognize emergency vehicles such as fire trucks and understand that they have higher priority. The vehicle localization task is included in the perception module because reliable localization relies on an understanding of the urban environment.

The planning module takes all the perception results and makes decisions about the vehicle's future motion. These tasks can include optimal path planning, lane changing, left or right turns, speeding up or stopping, and so on. The control module then executes these planned decisions by sending the steering angle, throttle and brake strength levels, computed by the optimal control algorithms, to the vehicle's transmission system.
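To make the data flow between the three modules concrete, the sketch below outlines one iteration of the closed loop in Python. The class and function names (PerceptionModule, drive_one_step, vehicle.apply, etc.) are illustrative assumptions introduced for this example only; they do not correspond to components defined elsewhere in this dissertation.

from dataclasses import dataclass
from typing import List

@dataclass
class DetectedObject:
    bbox: tuple        # (x_min, y_min, x_max, y_max) in image coordinates
    label: str         # e.g., "car", "pedestrian", "cyclist"
    score: float       # detection confidence

@dataclass
class ControlCommand:
    steering_angle: float   # radians
    throttle: float         # 0.0 .. 1.0
    brake: float            # 0.0 .. 1.0

def drive_one_step(image, perception, planner, controller, vehicle):
    """One iteration of the perception -> planning -> control cycle."""
    objects: List[DetectedObject] = perception.detect(image)    # perceive surroundings
    trajectory = planner.plan(objects, vehicle.state())         # plan a reference path
    command: ControlCommand = controller.track(trajectory, vehicle.state())
    vehicle.apply(command)   # the new pose changes the next camera view,
                             # which is what closes the loop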

1.3.1 Sensor Input


There is no doubt that multiple sensors should be used on autonomous vehicles to guarantee functionality, reliability and safety. In addition to the GPS/INS system [136, 137], there are four types of commonly used sensors for autonomous driving: Radar, Ultrasonic, Lidar and Camera. Each of them perceives the world in a different way. Usually a low-cost GPS/INS system can localize the vehicle with an accuracy of several meters, which is insufficient for autonomous vehicles. Furthermore, it remains a challenge to maintain even this accuracy in urban areas because of satellite signal blockage by tall buildings. Radar uses radio waves to determine the range, angle and velocity of objects, and works equally well under different lighting and weather conditions such as rain or snow. Its drawback is the low resolution compared with other sensors such as Lidar. Ultrasonic sensors work in a similar way but emit high-frequency sound. Compared with Radar, Ultrasonic sensors can only perceive the environment within a short range. While both work under difficult lighting and weather conditions, their common drawback is the lack of color, contrast and optical-character information perceived from objects.


Figure 1.2: The Flea3 camera from Point Grey (left) and the HDL-64E Lidar from Velodyne (right) used for autonomous vehicles.

There is an ongoing debate among the research and industrial sectors about whether Lidar or Camera should be the primary sensor for autonomous vehicles. Figure 1.2 shows a typical camera and Lidar used for autonomous driving. Google and many other companies are building their autonomous vehicles with Lidar acting as the central sensor. Their assumption is that the sensor price will drop quickly enough to catch up with the development of autonomous driving.

Lidar [138] is short for "Light Detection and Ranging"; it works by emitting laser light and measuring the time the light travels from the emitter to the target, thus measuring the distance. Its measurements include another important value, the reflectivity, which captures the reflectance characteristics of different targets and enables Lidar to classify them. These measurements can be used to generate a 2D/3D point cloud, which is an excellent map representation of the surrounding environment. Therefore, Lidar excels at 3D mapping and vehicle localization. Some of the most popular state-of-the-art localization methods [23], [24], [25] rely mostly on a pre-built map of the environment generated with Lidar. The vehicle localization can then be achieved by correlating the online Lidar measurements with the pre-built map. Mapping using Lidar does not suffer from severe lighting conditions, such as shadows or direct sunlight, and enjoys much higher resolution than Radar.

However, in general, a camera performs much better than Lidar when used as a perception sensor. Generally speaking, the problem with Lidar in the perception task is the lack of color, texture and appearance information, not to mention that Lidar measurements are sparse compared with camera images. Almost all the information regarding the environment can be captured by cameras and stored in images. Color, contrast, and optical-character information give cameras a capability set entirely missing from all other sensors. In addition, cameras are cheap and small enough to be deployed in mass production. Thus, for the perception task, the remaining challenge is developing more advanced computer vision algorithms that can better utilize the camera outputs. In this dissertation, we mainly investigate the perception part of autonomous driving, focusing on learning-based computer vision algorithms that take images and video as input.

1.3.2 Perception

The perception module is the most significant module in the autonomous driving system. It is the foundation and prerequisite for the subsequent modules to function well.

The tasks of this module include mapping/localization, object detection, object tracking, semantic segmentation, scene understanding, and so on. Figure 1.3 describes the tasks that are closely related to autonomous driving. Most of these tasks are within the domain of computer vision, which is our focus in this dissertation.

Figure 1.3: Sub-modules of the perception module

Environment mapping and vehicle localization are mutually dependent problems and can be regarded as a simultaneous localization and mapping problem, known in short as SLAM. SLAM addresses the problem of building a spatial map of an environment while simultaneously localizing the robot relative to this map. SLAM is generally regarded as one of the most important tasks in the perception module. However, SLAM is still mostly solved by traditional Bayes filter based algorithms, such as the Extended Kalman Filter [28] or particle filter [29], and by graph-based optimization techniques [26], [27]. These are also intensive research areas but not our focus. In this dissertation, we focus on learning-based algorithms applied to image data.

While mapping and localization inform the vehicle where it is in its surroundings, object detection and object tracking tell the vehicle where the obstacles are. The object detection sub-module is responsible for locating and classifying all critical objects in the traffic scene, and usually outputs 2D or 3D bounding boxes around the objects. There are two categories of objects: static objects and dynamic objects. Static objects, such as traffic signs/lights and lane markings, are relatively easy to handle because their positions in the environment do not change over time. An immediate solution is therefore to annotate these static objects on the map, provided a reliable mapping solution is available. However, dynamic object detection is more challenging since we can only rely on real-time online observations from the on-board sensors to detect them. In this dissertation, we focus on dynamic object detection and tracking.

The objective of the object tracking sub-module is to associate moving target objects across consecutive video frames. Visual tracking [135] is a challenging task in computer vision due to target deformations, scale changes, partial occlusions, and motion blur. Segmentation is a partition of an image into several "coherent" parts, but without any attempt at understanding what these parts represent. Semantic segmentation is a further partition of an image into several semantically meaningful parts at the pixel level, classifying each part into one of a set of pre-determined classes. It has been demonstrated, for example, for discovering the drivable area in images. Scene understanding aims to understand the meaning and global structure of the scene semantically, which is a high-level task. Its foundation is reliable object detection/tracking and semantic segmentation.

In the past five years, deep learning based methods have dominated computer vision research. Deep learning is so powerful that most of these perception sub-tasks have achieved performance that traditional methods never reached. In addition, deep learning is a very general method that addresses each problem within a unified framework, while in traditional methods very different specific algorithms must be proposed for different tasks. In this dissertation, we mainly explore and propose novel deep learning based algorithms for some of the perception tasks in the application of autonomous driving.

1.3.3 Planning

The planning module is responsible for outputting a drivable reference path or trajectory for the control module to follow. The perception results, such as the location and orientation of other vehicles and the state of the traffic lights, together with prior knowledge about the road network, vehicle dynamics, and sensor models, are used to plan an optimal path.

Planning is difficult because of the high dynamics of the real driving scenario. The planning module in autonomous driving systems is hierarchically separated into three consecutive tasks, which are route planning, behavioral decision making, and motion planning. We demonstrate their relationship in figure 1.4.

Figure 1.4: Sub-modules of the planning module


In the route planning sub-module, a route is planned through the road network from the vehicle's current location to the requested destination. This can be formulated as the problem of finding a minimum-cost path on a road network graph, by representing the road network as a directed graph with edge weights corresponding to the cost of traversing each edge. The cost is usually the length of the route, the time taken to travel it, or a balance of both. The output of the route planning task is a sequence of waypoints through the road network. The route planning problem has attracted significant interest in the transportation research community, and state-of-the-art algorithms, such as [30] and [31], can offer an optimal route on a continent-scale network in milliseconds. Thus, route planning is generally regarded as a solved problem.
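As a minimal illustration of this graph formulation, the following sketch runs Dijkstra's algorithm on a toy road network whose edge weights stand for traversal cost. The graph and node names are invented for the example and are not taken from the cited algorithms [30], [31], which are far more sophisticated.

import heapq

def shortest_route(graph, start, goal):
    """Dijkstra search on a directed road-network graph.
    graph: dict mapping node -> list of (neighbor, edge_cost)."""
    queue = [(0.0, start, [start])]   # (accumulated cost, node, path so far)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path          # sequence of waypoints through the network
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Toy example: costs could be road length, travel time, or a weighted mix of both.
road_network = {"A": [("B", 2.0), ("C", 5.0)], "B": [("C", 1.0), ("D", 4.0)],
                "C": [("D", 1.0)], "D": []}
print(shortest_route(road_network, "A", "D"))   # -> (4.0, ['A', 'B', 'C', 'D'])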

Given the optimal route from the route planning sub-module, the behavioral decision-making task helps the vehicle navigate the selected route and interact with other traffic participants. Given a sequence of waypoints, the behavioral layer is responsible for deciding an appropriate driving behavior based on the behavior of other traffic participants, road conditions, and traffic light states. The behavioral decision-making problem is commonly modeled using probabilistic planning formalisms, such as Markov Decision Processes (MDPs) in [32] and Partially Observable Markov Decision Processes (POMDPs) in [33] and [34].

As the last sub-module of planning, motion planning executes the decided driving behavior, e.g., a left/right turn or a lane change, by creating a path or trajectory that can be followed by the control module. The resulting path or trajectory should be comfortable for the passengers, feasible for the mechanical system of the vehicle, and free of collisions with obstacles detected by the perception module. In practice, numerical approximation methods are usually used due to the high computational cost, such as graph-search approaches [35] that search for the shortest path on a constructed graphical discretization of the vehicle's state space.

1.3.4 Control

In the autonomous driving system, the control module stabilizes the vehicle along the reference trajectory by computing the steering angle and acceleration/brake level. The difficulties lie in the presence of modeling errors and other forms of uncertainty. In previous studies, various theories and methods have been investigated that fully consider the dynamics of the vehicle by compensating for delays and disturbances. These include the PID (Proportional, Integral, Derivative) control method [36] [37], MPC (Model Predictive Control) [38], the fuzzy control method [39], the model reference adaptive method [40], and the SVR (Support Vector Regression) method [41]. The most practical and reliable control method is the PID controller because it is simple, reliable, and easy to implement.
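As a generic illustration of the PID idea, and not a controller used in this dissertation, the sketch below computes a steering correction from the cross-track error between the vehicle and the reference trajectory; the gains and error values are arbitrary example numbers.

class PIDController:
    """Proportional-Integral-Derivative controller for trajectory tracking."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        """error: e.g., lateral offset from the reference trajectory (meters)."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Control output, e.g., a steering angle correction in radians.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PIDController(kp=0.3, ki=0.01, kd=0.05)   # example gains, tuned per vehicle
steering = controller.step(error=0.4, dt=0.05)          # 0.4 m cross-track error at 20 Hz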

1.3.5 Deep Learning Involvement

We already mentioned that deep learning is dominating the computer vision research community. In this dissertation, we study how computer vision techniques can improve the perception ability of an autonomous vehicle. It is obvious that deep learning plays a big role in the perception module of our high-level autonomous system. For example, many object detection research solutions compete against each other by designing and utilizing deep convolutional neural networks. In fact, as a very powerful machine learning tool, deep learning is also applied to the planning and control modules in the research and industrial communities.

However, deep learning can play a bigger role in the autonomous driving system than just helping vehicles improve their perception ability. The most aggressive attempt is to replace all three modules with a single deep learning neural network (Figure 1.5), which maps the image or video inputs directly to control commands, i.e., the steering angle and throttle/braking level. The entire system is a black box with no explicit intermediate modules. NVIDIA has already demonstrated that a system implemented with a convolutional neural network is powerful enough to output steering angles to operate in diverse road and weather conditions [42].

Figure 1.5: A deep learning neural network that replaces the perception, planning and control modules
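A minimal sketch of such an end-to-end network is given below, assuming PyTorch and a 66 x 200 RGB input similar in spirit to the NVIDIA setup [42]; the layer sizes are illustrative only and differ from the architectures proposed later in Chapter 5.

import torch
import torch.nn as nn

class EndToEndSteering(nn.Module):
    """Maps a camera frame directly to a steering angle (regression)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(             # convolutional feature extractor
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(             # fully connected head
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 1),                       # single output: steering angle
        )

    def forward(self, x):                           # x: (batch, 3, 66, 200)
        return self.regressor(self.features(x))

model = EndToEndSteering()
angle = model(torch.zeros(1, 3, 66, 200))           # dummy frame -> predicted angle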

We can also implement a deep learning neural network to replace the combination of the perception and planning modules, i.e., map the input images to a planned trajectory, or to replace the combination of the planning and control modules, i.e., map the perception results to control commands. In summary, deep learning neural networks are heavily involved in all the modules of the high-level autonomous driving system presented in this dissertation.

1.4 Contributions

With this general architecture of an autonomous driving system in mind, we do not intend to cover everything in this dissertation. In fact, covering every detail of an autonomous driving system in one dissertation is not possible. In this dissertation, we focus on improving the perception ability of an autonomous vehicle from the perspective of still image and video object detection. In addition, we propose an end-to-end control algorithm based on supervised deep learning techniques. The planning module and traditional control methods will not be studied in this dissertation.

In this dissertation, the following contributions are made towards making autonomous driving a reality:

• A novel deep learning-based dynamic object detection architecture using still images is proposed. Our detection architecture utilizes a Convolutional Neural Network (CNN) with an end-to-end training approach that tries to balance detection accuracy and speed. In our model, we also consider the special requirements for dynamic object detection in the autonomous driving application, such as model size and energy consumption, which are also crucial to deploying a detector in a real autonomous vehicle. We determine our final architecture by exploring different pre-trained feature extractors and different combinations of multi-scale feature layers. Unlike traditional detectors, vehicles, pedestrians and cyclists can be detected within the same pipeline in our implementation. Our architecture is extensively tested on the KITTI visual benchmark datasets and achieves accuracy comparable to the state-of-the-art approaches in real time.

• We propose a video object detection framework for multiple object tracking in the autonomous driving application. The architecture proposed in chapter 3 acts as the still image detector and feature extractor, and a special recurrent neural network, the Long Short Term Memory (LSTM), is responsible for taking advantage of the temporal information from consecutive video sequences. The input to the LSTM can be the features from different layers of the still image detector, the detection results of an individual frame, or both. We found that a combination of feature vector and detection results, used as the temporal information to the LSTM, achieves the best video object detection accuracy in real time among our proposed models. We also experiment with our developed tracker on the KITTI visual benchmark datasets and compare it with other algorithms using various criteria.

• We design an end-to-end learning algorithm that takes video sequences as input and directly outputs control commands, such as the steering angle. We mainly focus on designing different architectures using supervised learning methods, i.e., convolutional neural networks and recurrent neural networks, and train them using simulated data and real road data collected by the Udacity Self-Driving Car Engineer Nanodegree (https://www.udacity.com). As in the object tracking task, the recurrent neural network is designed to take advantage of the temporal information.

1.5 Dissertation Organization

Figure 1.6: Relation between modules discussed in different chapters.

Figure 1.6 describes the relationship between chapters and different modules in our autonomous driving architecture. This dissertation is organized as follows:

In Chapter 2, we first review several computer vision tasks that are closely related to the autonomous driving perception system, such as object detection, object tracking and semantic segmentation. Because we rely heavily on deep learning algorithms in our work, we then review the building blocks of a deep learning algorithm, such as convolution layers, activation functions and SGD optimization methods. In addition, two commonly used deep learning architectures, the Convolutional Neural Network and the Recurrent Neural Network, are reviewed.

Chapter 3 investigates state-of-the-art object detectors for still images. Considering the special requirements of the autonomous driving application, we propose a novel deep learning-based detection architecture, aiming to detect the dynamic objects on the road in real time while keeping accuracy comparable to the state-of-the-art approaches.

In Chapter 4, we extend our proposed detection architecture to address the video object detection problem for online multiple object tracking. Specifically, we study how the temporal information in the video can be exploited with the Long Short Term Memory deep learning architecture.

In Chapter 5, the end-to-end autonomous driving approach is explored. We propose and compare several supervised learning architectures by experimenting on simulated and real road data, and select the one that displayed the best performance.

Chapter 6 concludes this dissertation and suggests some potential future work.


Chapter 2: Foundation and Literature Review

In this chapter, we first review several computer vision tasks that are closely related to the autonomous driving perception system, such as object detection, object tracking and semantic segmentation. Because our work relies heavily on deep learning algorithms, we also review the building blocks of a deep learning algorithm, such as convolution layers, activation functions and Stochastic Gradient Descent (SGD) optimization methods. In addition, two commonly used deep learning architectures, the Convolutional Neural Network and the Recurrent Neural Network, are carefully reviewed.

2.1 Computer Vision and Deep Learning

Most of the problems we study in this dissertation belong to the category of computer vision, and we mainly focus on developing deep learning based algorithms to address them. Therefore, we review the basics of computer vision and the deep learning techniques that are related to visual object detection, tracking, and end-to-end learning.

2.1.1 Computer Vision

The ultimate goal of computer vision is to enable a computer to perceive similarly to the human vision system. Computer vision research deals with a number of tasks, and it has been a hot research topic since the 1970s [125]. Here we do not intend to cover all of these tasks but review only those that are relevant to this dissertation's research focus.


These are object detection, object tracking, semantic segmentation and scene understanding. We chose these tasks based on two criteria: (1) they must be closely related to autonomous driving; (2) they can be addressed with learning-based methods, such as deep learning.

2.1.1.1 Object Detection

Reliable object detection is a crucial requirement for realizing autonomous driving because awareness of other traffic participants and obstacles is necessary to avoid accidents. Object detection is difficult due to varying object appearances, shadows, occlusions, etc. The traditional detection pipeline has three steps: region of interest extraction, object classification and refinement. Of course, some preprocessing is necessary before the detection pipeline, such as exposure adjustment, camera calibration, and image rectification.

The naïve region of interest extraction method uses a sliding window over the image at different scales. However, it is very expensive and time consuming. Several alternatives have been proposed to improve the efficiency. For example, Selective Search [53] exploits segmentation to efficiently extract approximate locations instead of performing an exhaustive search over the full image. Classification of the object in a region of interest labels the object with a predefined class. The Support Vector Machine (SVM) [54] combined with Histogram of Oriented Gradients (HOG) features has become one of the most efficient and fastest classification approaches so far. The purpose of the refinement step is to filter out detection results with low confidence and to delete repeated detections.
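The refinement step is commonly implemented with Non-Maximum Suppression (NMS) based on the Intersection-Over-Union (IOU) of candidate boxes. The sketch below is a simplified, generic version of that idea; the 0.5 threshold is an example value, not a setting prescribed by this dissertation.

def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop overlapping duplicate detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep   # indices of the retained detections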


With deep learning introduced to solve the object detection problem, CNNs are utilized in an end-to-end fashion and have significantly improved the performance. The most successful and powerful CNN methods are the R-CNN variants of various speeds, including R-CNN [55], Fast R-CNN [56], and Faster R-CNN [57]. They are two-stage CNN architectures, i.e., two networks exist for region proposal and classification, respectively. At the same time, one-stage CNN methods have been proposed that achieve precision and recall comparable with the state-of-the-art algorithms and, most importantly, run much faster than the R-CNN methods.

YOLO [59] and SSD [58] represent the most commonly used one-stage methods. Figure 2.1 shows the vehicle detection results on an image from the KITTI Visual Benchmark dataset using our proposed method. In this figure, all the vehicles are detected and marked with green bounding boxes. The reader is referred to chapter 3 for more details of our object detection methods.

Figure 2.1: An example of vehicle detection using our proposed method. The image is from the KITTI Visual Benchmark dataset.


2.1.1.2 Object Tracking

Tracking is responsible for estimating the states of single or multiple objects over time in a sequence of images. The states include the location, velocity, acceleration and orientation of the objects of interest in each image. In autonomous driving, the states of other traffic participants are important for planning the trajectory and avoiding possible accidents by predicting their future locations. It is particularly difficult to predict the future behaviors of pedestrians or bicyclists because they can change the direction of their movements fairly abruptly. Beyond this, the challenges of object tracking are similar to those of object detection: occlusion, intersection of objects, poor lighting conditions, etc.

If the detection is reliable, object tracking can be formulated as a Bayesian inference problem in a recursive manner. In that formulation, the goal is to estimate the posterior probability density function of the states given the current observation and the previous states. There are two steps in each recursion: a prediction step using a motion model and a correction step using an observation model. Extended Kalman filter and particle filter algorithms [60], [61] are widely used models in this context. Non-recursive approaches, which optimize a global energy function with respect to all trajectories in a temporal window, are also popular and more robust to detection errors. For example, [62] and [63] belong to this approach; both focus on reducing the search space.
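The sketch below illustrates this two-step recursion for a single tracked object with a constant-velocity motion model and a linear Kalman filter. It is a generic textbook example with made-up noise values, not the specific models used in [60], [61].

import numpy as np

dt = 0.1                                   # time between frames (s)
F = np.array([[1, 0, dt, 0],               # constant-velocity motion model
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # we only observe the (x, y) position
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                       # process noise (example value)
R = np.eye(2) * 0.5                        # measurement noise (example value)

def kalman_step(x, P, z):
    """One prediction + correction cycle.
    x: state (x, y, vx, vy), P: covariance, z: detected (x, y) in this frame."""
    # Prediction step with the motion model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correction step with the observation model
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new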

Tracking by detection is the most popular formulation for solving the tracking problem. In this formulation, a detector is used to detect and classify a certain class of objects in each frame. Recently, researchers have recognized that recurrent neural networks can be used to take advantage of the historical temporal information to reduce detection errors [64], and they have achieved promising tracking results. This differs from traditional Bayesian methods in that it can use not only the historical location information but also the historical feature information. In chapter 4, we propose a novel recurrent neural network architecture to address the object detection in video problem.

2.1.1.3 Semantic Segmentation

While object detection intends to assign a label to each object, the goal of semantic segmentation is to assign each pixel a label from a predefined set of classes. Figure 2.2 illustrates the segmentation results, with all pixels of a certain class painted in a specific color, on a sample from the Cityscapes dataset [65]. Segmentation of street scenes, containing, e.g., vehicles and pedestrians, into semantic regions is essential to autonomous driving. The traditional semantic segmentation problem was posed as maximum a posteriori inference in a conditional random field [66]. However, the success of applying deep learning to classification and detection tasks has extended to the semantic segmentation task, and deep learning based methods outperform traditional approaches in both accuracy and speed [67, 68].


Figure 2.2: Semantic segmentation sample from the Cityscapes dataset by [65]. Each pixel is assigned a label from a predefined set of classes and illustrated with a specific color. For example, all the pedestrian instances are recognized and colored red.

2.1.1.4 Scene Understanding

Scene understanding is a high-level task of computer vision, which involves several sub-tasks including object detection, object tracking, semantic segmentation, depth estimation, etc. Each of these tasks describes a particular aspect of a scene. Complex traffic scene understanding is one of the biggest challenges to realizing fully autonomous driving. For example, urban scenarios comprise many independently moving traffic participants, ambiguous visual features, and illumination changes that together make them complex scenes. The goal of scene understanding is to obtain a rich and compact representation of the scene. Several promising works investigate the urban traffic scene understanding problem, including [69], [70] and [71]. However, researchers in this area agree that improvements in scene understanding rely on the advancement of other computer vision tasks, such as object detection and semantic segmentation.


2.1.2 Deep Learning

Deep learning is an approach that allows a computer to learn multiple levels of representation and abstraction from data such as images, sound, and text. Its main algorithm is inspired by the structure and function of the neurons in the human brain, and is thus called an artificial neural network. In recent years, its popularity and usefulness have grown tremendously, owing to more powerful computers, larger datasets, and techniques to train deeper networks. We focus on reviewing its application and development in computer vision research, especially the computer vision tasks related to autonomous driving.

In 2012, [44] created a deep convolutional neural network applied to image classification and won the 2012 ImageNet Large-Scale Visual Recognition Challenge, which started the revolution of computer vision research using deep learning. Soon researchers discovered ways to apply it to other computer vision tasks, and most of these approaches outperform traditional ones. Deep learning is now applied to object detection [55], [59], semantic segmentation [72, 74], pose estimation [73], depth map estimation [75], and many more. The success lies in the much more powerful feature extraction ability of deep neural networks compared with traditional manually engineered feature methods such as SIFT [76] and HOG [77]. Because our work in this dissertation is largely built on convolutional neural networks, and most of the improvements of deep CNNs were made by modifying the network structure and training pipeline, we review the building blocks of a CNN in section 2.3.


Another type of deep neural network, which we review in detail in section 2.4, is the Recurrent Neural Network, which uses the historical output as part of the current input. It differs from CNNs, which operate in a feed-forward fashion and do not use historical information. The Recurrent Neural Network can naturally take advantage of temporal information and memorize historical states, due to its recurrent input property. Combined with CNNs, it has been applied to many computer vision tasks where video sequences are available as the input, for example, image captioning [78, 79], video object detection [80], and video question answering [81]. In chapter 4, we address the video object detection problem by proposing a novel deep learning pipeline combining CNNs and RNNs to take advantage of the temporal information in video sequences.

2.1.3 Datasets

Datasets and benchmarks play a critical role in deep learning based computer vision research by providing problem specific data with ground truth. A well-defined benchmark provides a standard platform for researchers to compete with each other by proposing more advanced algorithms. A few large-scale and publicly available datasets should be mentioned here, which have had a major impact on deep learning based computer vision research. For example, ImageNet [132], PASCAL VOC [126], and Microsoft COCO [133] are datasets aiming to improve the state-of-the-art in tasks such as object classification, object detection, and semantic segmentation. The PETS [82] and MOTChallenge [83] datasets were presented to address the single and multiple object tracking problems.


For autonomous driving, the KITTI Visual Benchmark Suite dataset [84] was introduced to address stereo, optical flow, visual SLAM, and 2D/3D object detection problems. The dataset was captured from an autonomous driving vehicle with data from multiple sensors such as GPS/INS, stereo cameras, and a 3D laser scanner. In 2013, the dataset was extended to the tasks of object tracking and road/lane detection. Other datasets related to autonomous driving exist; for example, the Caltech Pedestrian Detection Benchmark [85] comprises 250,000 sequential images recorded by a vehicle while driving through regular urban scenarios, and the Cityscapes dataset [86] is provided especially for semantic and instance segmentation of real-world urban traffic scenes. However, the KITTI dataset has established itself as the standard benchmark in the context of autonomous driving applications and has attracted state-of-the-art algorithms to compete with each other on it. Therefore, we also train and fine-tune our deep learning architectures on the KITTI dataset.

2.2 Neural Network

Artificial Neural Networks are at the core of deep learning. One definition of deep learning is the study of neural networks that contain more than one hidden layer.

2.2.1 Neurons and Neural Network Architecture

The concept of the neural network was originally associated with the brain, but nowadays it is more of an engineering construct for deep learning tasks. The neuron is the basic computational unit of the brain; it produces output signals along its axon by receiving input signals from its dendrites. An axon connects to the dendrites of other neurons and transfers the signals it carries to those neurons as input. Figure 2.3 is a cartoon drawing of a biological neuron and its mathematical model. The neuron has a firing mechanism that emits a spike along the axon when the sum of its input signals is above a threshold. Similarly, the activation function in the artificial neuron applies a non-linear operation to the sum of the inputs, and allows the model to represent more complicated real-world problems. Details of commonly used activation functions are reviewed in section 2.2.2.

Figure 2.3: A cartoon drawing of a biological neuron (left) and its mathematical model (right) [86]. x = [x0, x1, x2] is the input vector; ωi and bi are the weight and bias parameters, respectively, and they are learned by training. The activation function allows the model to represent more complicated real-world problems.

After understanding the role of one neuron, we can view a neural network as a graph with stacks of connected neurons. Figure 2.4 describes a 3-layer neural network architecture and its corresponding mathematical equations. In neural networks, neurons are usually organized into distinct layers, and the fully connected layer is the most common type of layer in regular neural networks. Every neuron in one fully connected layer is connected to every neuron in the neighboring fully connected layers, while neurons within the same fully connected layer are not connected. In figure 2.4, each circle represents a neuron and the arrow connecting them represents the axon with data flowing on it. Note that cycles are not allowed in such a neural network because they would produce infinite loops. A simple activation function that changes all negative inputs to zero is used in the mathematical equations. The neural network works by mapping the input to the output. By training the neural network with data, we obtain estimates of all the weight parameters Wi and bias parameters Bi.

Figure 2.4: A 3-layer Neural Network architecture (left) and its corresponding mathematical equations (right). X is the input vector; Wi and Bi are the weight and bias parameters, respectively, and they are learned by training. H1 and H2 are the intermediate vectors after hidden layers 1 and 2.
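As an illustration of the forward pass sketched in figure 2.4, the following minimal example implements a 3-layer fully connected network with a simple activation; the layer sizes and random weights are hypothetical and chosen only for demonstration.

import numpy as np

def relu(x):
    # Simple activation that sets all negative inputs to zero
    return np.maximum(0, x)

# Hypothetical layer sizes: 3 inputs, two hidden layers of 4 neurons, 2 outputs.
rng = np.random.default_rng(0)
W1, B1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, B2 = rng.standard_normal((4, 4)), np.zeros(4)
W3, B3 = rng.standard_normal((2, 4)), np.zeros(2)

def forward(x):
    # Each fully connected layer is a matrix multiply plus bias, followed by the activation.
    h1 = relu(W1 @ x + B1)      # hidden layer 1
    h2 = relu(W2 @ h1 + B2)     # hidden layer 2
    return W3 @ h2 + B3         # output layer (no activation here)

y = forward(np.array([1.0, -2.0, 0.5]))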

2.2.2 Activation Functions

An activation function, or activation layer, is applied to every single number in the previous activation layer, and it introduces non-linearity into the deep learning system. It is used throughout deep neural networks, such as CNNs and Recurrent Neural Networks.


The activation layer is the reason why a deep neural network works; without it, as shown below, the deep neural network collapses to a one-layer neural network.

Assume we have a two-layer neural network without activation functions, and express the first layer as,

H1 = W1 ∙ X + B1    (2.1)

and the second layer as

H2 = W2 ∙ H1 + B2    (2.2)

where X is the input vector; Wi and Bi are the weight and bias parameters, respectively, and they are the parameters learned by training. H1 and H2 are the intermediate vectors after hidden layers 1 and 2.

Substituting the first layer into the second layer, we have,

H2 = W2 ∙ H1 + B2

   = W2 ∙ (W1 ∙ X + B1) + B2    (2.3)

   = (W2 ∙ W1) ∙ X + (W2 ∙ B1 + B2)

which is just a simple one-layer neural network. This shows that the composition of any number of linear layers is again linear, so an activation function must be applied to add non-linearity to the system.

An ideal activation function should be highly non-linear and continuously differentiable. In traditional neural networks, and in the early stage after CNNs were proposed, non-linear functions like tanh and sigmoid (figure 2.5) were used. However, researchers found that they all lead to the vanishing gradient problem [47] as the number of layers increases, i.e., the lower layers of the network train very slowly because the gradient decreases exponentially through the layers. This is a critical problem because we rely on the gradients to update the weight and bias parameters during learning.

Figure 2.5: The sigmoid function (left) and hyperbolic tangent function (right).

To address the vanishing gradient problem, the Rectified Linear Unit (ReLU) (Figure 2.6 left) was proposed by [46]. The ReLU layer applies the function f(x) = max(0, x) to all of the values in the input feature map. In other words, this layer changes all negative activations to 0, and thus increases the non-linear properties of the model. It is also very computationally efficient and converges much faster than the sigmoid in practice. However, ReLU is non-negative and therefore has a mean activation larger than zero. According to the justification in [48], we prefer an activation function with a mean closer to zero to decrease the bias shift effect.

The “Leaky ReLU” (LReLU) (Figure 2.6 right) is such a zero-mean activation function, proposed by [49]; it replaces the negative part of ReLU with a linear function and has been shown to be superior to ReLU. The Parametric Rectified Linear Unit (PReLU) is a generalization of LReLU that learns the slope a of the negative part, and has yielded improved learning behavior on large image benchmarks [50]. Its mathematical form is f(x) = max(ax, x), where a is learned during the training process.

Figure 2.6: The ReLU function (left) and leaky ReLU function (right).

Other alternative activation functions include the Exponential Linear Unit (ELU) [48] and Maxout [51]. In practice, when designing and training our deep neural networks, we first choose ReLU because of its simplicity and computational efficiency. It is also common to try LReLU, ELU, or Maxout, but we avoid tanh and sigmoid because of the vanishing gradient problem they cause.
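For reference, the activation functions discussed above can be written in a few lines. This is only an illustrative sketch; the slope and alpha values are the commonly quoted defaults rather than values prescribed in this dissertation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # The negative part is replaced by a small linear slope a instead of zero
    return np.where(x > 0, x, a * x)

def elu(x, alpha=1.0):
    # Exponential Linear Unit: negative saturation pushes the mean activation toward zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3, 3, 7)
print(relu(x), leaky_relu(x), elu(x))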

2.2.3 Training A Neural Network

At a high level, a neural network can be seen as a function mapping the input to the output, and the unknown parameters are the weights and biases in the network. We learn these parameters by feeding numerous training data to the neural network, which is called a data-driven approach. If the training data are accompanied by ground truth for a specific problem, it belongs to the supervised learning category. In this dissertation, we only deal with supervised learning problems. For example, both images and the corresponding ground truth bounding boxes and class labels are provided as the training data for the object detection task. In this section, we briefly review the main steps to train a neural network and the key techniques involved in this process.

• Setting up

The first step is to define the problem mathematically and to choose a model structure for the problem. Specifically, we need to determine the number of hidden layers in the network, the number of neurons in each layer, and the activation function used. If it is a Convolutional Neural Network, we must also determine the filter size in each layer, the stride, the padding, and so on. The reader can refer to section 2.3 for more details. One important task in this step is to determine the loss function to be optimized. The loss function is a function with the weights and biases as its parameters. In our case, usually two kinds of loss are used: the cross-entropy loss [87] for classification problems and the L2 loss for regression problems. The L2 loss minimizes the squared differences between the estimated and ground truth values.
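As a small illustration of the two loss types mentioned above, the sketch below computes a cross-entropy loss for a classification output and an L2 loss for a regression output; the example inputs are hypothetical.

import numpy as np

def cross_entropy_loss(probs, label):
    # probs: predicted class probabilities (after softmax); label: ground truth class index
    return -np.log(probs[label] + 1e-12)

def l2_loss(pred, target):
    # Squared differences between the estimated and ground truth values (regression)
    return np.sum((pred - target) ** 2)

print(cross_entropy_loss(np.array([0.7, 0.2, 0.1]), 0))    # small loss: confident and correct
print(l2_loss(np.array([1.0, 2.0]), np.array([1.5, 1.0])))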

• Data preprocessing and augmentation

Two commonly used image data preprocessing methods are mean subtraction and normalization. Data augmentation is commonly used when training on small datasets and serves to create new training data by rotating, translating, and flipping the original image. We can also augment an image by changing its brightness or adding shadows.
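The sketch below shows what these preprocessing and augmentation steps might look like for an image stored as a NumPy array; the shift range and brightness factors are illustrative assumptions only.

import numpy as np

def augment(image, rng):
    """Create a new training sample from an H x W x 3 image array with values in [0, 255]."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                           # horizontal flip
    shift = rng.integers(-10, 11)
    out = np.roll(out, shift, axis=1)                   # crude horizontal translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)  # brightness change
    return out

def preprocess(image, mean):
    # Mean subtraction followed by normalization to a roughly unit range
    return (image.astype(np.float32) - mean) / 255.0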


• Weight initialization

Before training our neural network, all the weight and bias parameters must be initialized. The first idea is to initialize them with small random numbers, for example drawn from a Gaussian with zero mean and 0.01 standard deviation [44]. However, several papers showed that small random numbers are not suited to deep neural networks, because the non-linear activation functions will lead to either very large or vanishing gradients, as explained in [90]. Instead, more advanced initialization methods such as Xavier [88] and LSUV [89] were proposed to account for the non-linearity of the activation function. This is still an active research area, but Xavier initialization is usually good enough to allow the weight parameters to be learned efficiently.
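A minimal sketch of Xavier initialization follows, assuming the uniform variant scaled by the fan-in and fan-out of the layer; the layer sizes are hypothetical.

import numpy as np

def xavier_init(fan_in, fan_out, rng):
    """Xavier/Glorot initialization: the variance is scaled by the number of input and output units."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W = xavier_init(fan_in=512, fan_out=256, rng=rng)   # weights for a 512 -> 256 layer
b = np.zeros(256)                                   # biases are commonly initialized to zero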

• Optimization and Backpropagation

Once the loss function is properly defined and the weight W and bias B parameters are initialized, the next step is to find the W and B that minimize the loss function, i.e., optimization. This is the core purpose of training a neural network. Gradient Descent [91] is currently the most effective and popular optimization method for training a deep neural network; it computes the gradient of the loss function with respect to each parameter W, B and updates them based on these gradients. Assume θ denotes the parameters to be updated; the update equation is,

θ = θ − αL′(θ)    (2.4)

where L′(θ) is the gradient of the loss function with respect to θ, and α is the learning rate, which controls how far θ moves in the negative gradient direction.


If the training data are very large, for example the ILSVRC challenge has millions of training images, it is unrealistic to compute the loss function over the entire training set just to perform a single parameter update. A practical approach is to compute the gradient over batches of the training data, called Mini-batch Gradient Descent. In the extreme case, a batch contains only one training example, and the optimization is called Stochastic Gradient Descent (SGD). In practice, a batch size that is a power of 2 is used, such as 4, 16, or 32.
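The following sketch illustrates one epoch of mini-batch gradient descent; grad_fn, the batch size, and the learning rate are placeholders for a concrete model and are not taken from this dissertation.

import numpy as np

def sgd_epoch(theta, X, Y, grad_fn, lr=0.01, batch_size=32, rng=None):
    """One epoch of mini-batch gradient descent: theta <- theta - lr * gradient on each batch.
    grad_fn(theta, x_batch, y_batch) is assumed to return the gradient of the loss w.r.t. theta."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(len(X))            # visit the training data in random order
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        theta = theta - lr * grad_fn(theta, X[idx], Y[idx])
    return theta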

It is difficult to compute the gradients efficiently when the neural network is very deep. Backpropagation [91] is the approach that addresses this; it computes the gradients of expressions through recursive application of the chain rule at each layer. The chain rule tells us

dz/dx = (dz/dy) ∙ (dy/dx)    (2.5)

if z = f(y) and y = g(x), which means that z is a function of y, and y is a function of the parameter x. Here dz/dx is the derivative of z with respect to x, dz/dy is the derivative of z with respect to y, and dy/dx is the derivative of y with respect to x. Backpropagation allows us to efficiently compute the gradients on the connections of the neural network with respect to a loss function.

• Parameter Update and Learning Rate

Once the gradient is computed with backpropagation, it is used to perform a parameter update. There are several approaches for performing the update; the naive one is equation 2.4, repeated here as equation 2.6, where α is the learning rate and is a hyperparameter. In deep learning, hyperparameters are a set of variables set before actually training the system, and they are often chosen by hand or by search algorithms.

θ = θ − αL′(θ) (2.6)

Momentum update [92] is another approach that almost always converges faster in the presence of high curvature and noisy gradients. The update equations are,

v = mu ∙ v − αL′(θ)    (2.7)

θ = θ + v    (2.8)

where v is initialized at zero and mu is another hyperparameter. v is built up in directions that have a consistent gradient. Another variant of the momentum update is proposed in [93] and is reported to perform better. Another important and commonly used method is the Adam update, first proposed in [94]. The Adam update formula adds element-wise scaling of the gradient based on the historical sum of squares in each dimension.

As mentioned above, training neural networks can involve many hyperparameter settings, and the learning rate α is the most important hyperparameter. Training would take too long to converge if α is too small, and the loss will get out of control if α is too large. Usually a learning rate of 0.01 is a good starting point. In deep neural networks, it is usually helpful to reduce the learning rate over time, which is called learning rate decay.
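The update rules above can be sketched as follows; the hyperparameter values (mu, the betas, and the decay schedule) are common defaults used here for illustration only.

import numpy as np

def momentum_update(theta, grad, v, lr=0.01, mu=0.9):
    # The velocity accumulates gradients with decay mu (eq. 2.7-2.8)
    v = mu * v - lr * grad
    return theta + v, v

def adam_update(theta, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running averages of the gradient (m) and its element-wise square (s)
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)            # bias correction, t starts at 1
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s

def step_decay(lr0, step, drop=0.5, every=10000):
    # Simple learning rate decay: multiply the rate by `drop` every `every` steps
    return lr0 * drop ** (step // every)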

2.3 Convolutional Neural Network

Deep convolutional neural networks (CNNs) have been at the heart of spectacular advances in deep learning, and they constitute a very useful tool for machine learning practitioners. The input to a CNN is multi-dimensional, such as a Red Green Blue (RGB) image, and the spatial information is preserved after convolution. Although CNNs were used as early as the 1990s to solve character recognition tasks [43], their current significant influence is due to much more recent work, when a deep CNN was used to outperform the state-of-the-art methods in the ImageNet image classification challenge [44]. Since then, CNNs have been used extensively in many computer vision and natural language processing problems.

2.3.1 Architecture

A classic CNN architecture is made of a sequence of layers, and each layer is a module that transforms one volume of data into another. Usually, the intermediate result is called an activation layer. There are four main types of layers: the convolutional layer, the activation layer, the pooling layer, and the fully connected layer. A typical CNN architecture looks like this:

Input → Conv → Activation Function → Pooling → Fully Connected Layer

An example is the famous LeNet-5 architecture [45] (figure 2.7), used for handwritten and machine-printed character recognition.


Figure 2.7: The LeNet-5 neural network architecture [45]. It is a typical CNN architecture comprised of convolution layer, activation layer, pooling layer and fully connected layer. It was designed to recognize the hand-written letters and digits. The subsampling in this architecture is actually a 2x2 pooling layer.

Input is the original multi-dimensional data. In computer vision, the input is the raw pixel values of the image with three color channels. Color channels specify the color of each pixel of the image. For example, in RGB (Red, Green, Blue), an image has three numbers for each pixel that directly correspond to the R, G and B elements of the computer display.

The Conv layer slides a filter over the activation layer, computing a dot product between the filter weights and the corresponding values of the activation layer and summing them to a single number;

The Activation layer (included in the convolution layer in LeNet-5) applies a non-linear function to each of the individual values of the activation layer. For example, ReLU, i.e., f(x) = max(0, x), is the most commonly used activation function;


The Pooling layer (i.e., the subsampling layer in LeNet-5) performs a down-sampling operation along each channel;

The Fully Connected layer is the same as the layer in an ordinary neural network, and each element in this layer is connected to all the elements in the previous activation layer.

We will review each of the above layers in the following sections.

2.3.2 Convolution

The convolution is the basic operation in a Convolutional Neural Network. It takes an image or a feature map of dimensions W, H, D, which correspond to the width, height, and channel number of that image or feature map. A square filter of width F convolves with the input feature map in a sliding window fashion, and it must have the same number of channels as the feature map. When it convolves, the values in the filter are multiplied with the corresponding values in the feature map and summed to one single value. F and the number of filters K are the hyperparameters chosen when the filter is designed. There are two other hyperparameters, the stride S and the zero padding number P:

Stride S: The filter moves across the feature map from left to right and top to bottom with step size S. We call this step size the stride. For example, we move the filter one pixel at a time over the feature map when the stride is 1.

Zero padding number P: To control the spatial size of the output feature map and take advantage of the information at the border, it is common to pad the input feature map with zeros around the border. The size of this zero-padding is called the zero padding number.

At this point, we can calculate the size of the output feature map using the following equation:

W′ = (W − F + 2P)/S + 1    (2.9)

H′ = (H − F + 2P)/S + 1    (2.10)

D′ = K    (2.11)

where W′, H′, D′ are the width, height, and channel number of the output feature map, respectively, and W, H, D correspond to the width, height, and channel number of the input layer. Figure 2.8 shows an example of the convolution operation.

Usually we also care about the number of parameters in a convolution operation, because it is closely related to the model size and computational complexity of our model. These are the parameters we need to learn during the training process. The number of parameters N in a convolution layer is:

N = K ∙ (F ∙ F ∙ D + 1)    (2.12)


Figure 2.8: An example of convolution operation. The input feature map has spatial dimension 32 × 32 × 3. After applying 6 filters with size 5 × 5 × 3, stride 1, and zero padding 0, the size of the output feature map is 28 × 28 × 6, and the number of parameters in this convolution layer is 456.
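The output size and parameter count formulas (equations 2.9-2.12) can be checked with a small helper; the example call below reproduces the numbers quoted for figure 2.8.

def conv_output_shape(W, H, D, F, K, S=1, P=0):
    """Output size (eq. 2.9-2.11) and parameter count (eq. 2.12) of a convolution layer."""
    W_out = (W - F + 2 * P) // S + 1
    H_out = (H - F + 2 * P) // S + 1
    D_out = K
    n_params = K * (F * F * D + 1)      # +1 accounts for the bias of each filter
    return (W_out, H_out, D_out), n_params

# Example of figure 2.8: 32x32x3 input, six 5x5x3 filters, stride 1, no padding.
print(conv_output_shape(32, 32, 3, F=5, K=6, S=1, P=0))   # ((28, 28, 6), 456)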

2.3.3 Pooling

Pooling is a simple but important operation in CNNs; it allows small translation invariance and helps reduce the size of the input feature map while retaining the most important information. It is also referred to as a down-sampling layer. The pooling operation is applied to each channel of the feature map, and thus preserves the channel dimension. There are two types of commonly used pooling operations: average pooling and max pooling. Figure 2.9 shows an example of max pooling operating on a 28 × 28 × 6 feature map with a 2 × 2 window and stride 2, resulting in a 14 × 14 × 6 feature map. The stride here has the same meaning as in the convolution layer. The max pooling operation takes a filter (normally of size 2x2) and a stride of the same length, applies it to the input feature map, and outputs the maximum number in every sub-region that the filter covers. Average pooling is the same as max pooling except that it outputs the average of the numbers in the sub-region.

Figure 2.9: An example of max pooling operating on a 28 × 28 × 6 feature map with a 2 × 2 window and stride 2, resulting in a 14 × 14 × 6 feature map. The pooling operation is applied to each channel of the feature map, and thus preserves the channel dimension.

Assume we apply an F × F filter window to a feature map with dimensions W, H, D and stride S; the resulting feature map dimensions W′, H′, D′ are:

W′ = (W − F)/S + 1    (2.13)

H′ = (H − F)/S + 1    (2.14)

D′ = D    (2.15)

In practice, there are only two commonly used pooling operations, both of them max pooling: max pooling with filter size 3 × 3 and stride 2, and max pooling with size 2 × 2 and stride 2. In fact, the latter is much more commonly seen in various CNNs. Note that it is rare to use zero-padding for the pooling operation.

2.3.4 Dropout

After several convolutional, ReLU, and pooling layers, the output represents high-level features of the data. Usually a fully connected layer follows to learn a non-linear combination of these features and output a one-dimensional array. In a fully connected layer, each neuron has full connections to every neuron in the previous layer, as in regular Neural Networks (Figure 2.4). Their activations can be computed with a matrix multiplication followed by a bias offset, as explained in the neural network section above.

In practice, overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. Such a model is too dependent on the training data and will not generalize to new data. For example, deep neural networks often have tens of millions of parameters while only a limited amount of data is available, so they tend to overfit. Dropout is an effective and surprisingly simple technique used in deep learning to prevent overfitting, and it is a vital feature in almost every state-of-the-art neural network implementation. It works by randomly deactivating part of the network during training. The primary idea is to randomly drop neurons from each layer of the neural network with a probability, such as 0.5, by setting these neurons to zero in the training phase (Figure 2.10). In the testing phase, all neurons are used for prediction. Dropout provides another benefit besides avoiding overfitting, which is forcing the neural network to learn more from the data. This is equivalent to training multiple neural networks to learn different aspects of the data and averaging their outputs.

Figure 2.10: An example of applying dropout to a standard neural network. Hidden layers 1 and 2 in (a) are both fully connected layers. Each circle represents a neuron and the arrow connecting them represents the axon with data flowing on it. Red circles represent the input vector; green circles represent the intermediate feature maps; and blue circles represent the output vector. A red cross in a circle means that no data flows through that neuron.
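A minimal sketch of dropout follows, using the inverted-dropout variant that is common in practice (surviving activations are rescaled during training so that no extra scaling is needed at test time); the drop probability is illustrative.

import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=None):
    """Inverted dropout on an activation array h: during training each neuron is zeroed
    with probability p_drop and the survivors are rescaled by 1/(1 - p_drop)."""
    if not train:
        return h                      # at test time, all neurons are used for prediction
    rng = rng or np.random.default_rng()
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask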

2.3.5 Transfer Learning

We reviewed the normal training protocol and the commonly used techniques to train a regular neural network in section 2.2.3. Training a convolutional neural network is no different from training a regular neural network. However, in practice, very few people train a deep CNN from scratch, i.e., with randomly initialized parameters. Instead, it is very common to pre-train a CNN on a very large dataset such as ImageNet, which contains millions of images and 1000 classes, and use it as a feature extractor or an initialization. We can take a published pre-trained model, such as VGG-16 (Figure 2.11 (a)) [95], which is usually trained for several weeks across multiple GPUs on ImageNet.

If our dataset is very small, for example only several hundred training images, we use the pre-trained model as a feature extractor. Specifically, we take a model pre-trained on ImageNet, replace the last fully connected layer with a new one, and treat the rest of the model as a feature extractor. For example, in Figure 2.11 (b), only the last fully connected layer is trainable and the rest of the VGG-16 model is frozen, i.e., its weight parameters are not updated at training time.

If we have a medium-sized dataset, we have the choice to train more layers of the pre-trained model. Usually we keep some of the earlier layers fixed and only fine-tune some higher-level layers of the network. The reason is that the earlier CNN layers contain more generic features (such as edges, lines, and corners) that should be useful for many tasks, while later CNN layers become more specific to the data in the original dataset. For example, in Figure 2.11 (c), we add more trainable layers compared with Figure 2.11 (b) because we have more training data in this case.


Figure 2.11: Transfer learning with CNNs. (a) The pre-trained VGG-16 model [95] on the ImageNet dataset for classification. (b) Transfer learning on a small dataset. (c) Transfer learning on a medium dataset. “conv3-64” denotes 64 filters with size 3x3. “fc-4096” denotes a fully connected layer with output vector size 4096. The layers in the red box are frozen, which means the weight and bias parameters will not be updated. The layers in the blue box will be learned during the transfer learning process.
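The two transfer learning settings of figure 2.11 (b) and (c) can be sketched with the Keras API shipped with TensorFlow; the input size, the number of classes, and the choice of which block to unfreeze are illustrative assumptions, not the configuration used in this work.

import tensorflow as tf

# Load VGG16 pre-trained on ImageNet, without its fully connected classification head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# Small dataset (figure 2.11 (b)): freeze every pre-trained layer and train only the new head.
for layer in base.layers:
    layer.trainable = False

# Medium dataset (figure 2.11 (c)): additionally unfreeze the last convolutional block.
for layer in base.layers:
    if layer.name.startswith("block5"):    # VGG16 layer names run block1_conv1 ... block5_conv3
        layer.trainable = True

# Attach a new head for a hypothetical 10-class problem and compile.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
              loss="sparse_categorical_crossentropy")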

2.4 Recurrent Neural Network

So far, we have only discussed feedforward neural networks, in which the connections between neurons do not form a cycle, which means the information can only move in one direction. In contrast, the connections between neurons in a recurrent neural network form a directed cycle, which makes the RNN a natural candidate for processing sequence data such as text, audio, and video. In this section, we review the architecture of the RNN, its optimization methods, and its application to video sequence data.

2.4.1 Architecture

An RNN models a dynamic system, where the output depends not only on the current input but also on the previous state of the system. The system maintains a state vector s_t. At every time step, the system produces an output h_t, updates the state s_t, and sends the state s_t to the next time step. In this way, the system has memory of the historical information. Sequence data, for example time-series data, changes over time but stays consistent across adjacent time steps. An RNN is suitable for processing sequences of data, because the historical information can influence and support the output at the current time step. For example, it is better to know the location of the vehicle in the previous frame if we want to detect the car in the current frame of a video sequence.

Figure 2.12 shows a common form of the RNN architecture. The left part of the equal sign is the recursive description of the RNN architecture, and the right part is the corresponding RNN model unrolled over a time sequence. x_t is the input at time step t, for example, a frame of a video sequence. s_t is the state at time step t, and it is the memory of the network. The state s_t is calculated based on the previous state s_{t−1} and the current input x_t, i.e., s_t = f(s_{t−1}, x_t). The function f is usually a non-linear function such as tanh or ReLU, which were introduced in section 2.2.2. h_t is the output at time step t. For example, if we want to detect a vehicle in the video sequence, it would be the bounding box indicating the location of the vehicle in the frame. Note that the diagram in figure 2.12 has an output at each time step, but this may not be necessary depending on the task.

Figure 2.12. An example of an RNN. The left part of the equal sign is the recursive description of the RNN, and the right part is the corresponding RNN model unrolled over a time sequence. The red arrow indicates the backpropagation direction at time step t. x_t is the input at time step t, h_t is the output at time step t, and s_t is the cell state at time step t.
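A minimal sketch of one vanilla RNN time step follows, assuming a tanh state update and a linear output readout; the dimensions and weight values are hypothetical. Note that the same weight matrices are reused at every time step.

import numpy as np

def rnn_step(x_t, s_prev, W_xs, W_ss, W_sh, b_s, b_h):
    """One vanilla RNN time step: the new state depends on the previous state and the
    current input, s_t = tanh(W_xs x_t + W_ss s_{t-1} + b_s); the output is read from s_t."""
    s_t = np.tanh(W_xs @ x_t + W_ss @ s_prev + b_s)
    h_t = W_sh @ s_t + b_h
    return h_t, s_t

# Hypothetical sizes: 8-dimensional inputs, 16-dimensional state, 4-dimensional outputs.
rng = np.random.default_rng(0)
W_xs, W_ss = rng.standard_normal((16, 8)) * 0.1, rng.standard_normal((16, 16)) * 0.1
W_sh, b_s, b_h = rng.standard_normal((4, 16)) * 0.1, np.zeros(16), np.zeros(4)

s = np.zeros(16)                                  # initial state
for x in rng.standard_normal((5, 8)):             # a short input sequence of 5 steps
    h, s = rnn_step(x, s, W_xs, W_ss, W_sh, b_s, b_h)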

2.4.2 Training RNNs

We have described how to train a regular feedforward neural network in section 2.2.3.

In a feedforward network, backpropagation moves backward from the error (the difference between the estimate and the ground truth in supervised learning) through each hidden layer, and assigns the weights in these hidden layers their share of responsibility for the error by calculating their partial derivatives. These derivatives are then used by the Gradient Descent learning algorithm to adjust the weights in the direction that decreases the error. Training an RNN relies on a generalization of backpropagation called Backpropagation Through Time (BPTT). In the case of an RNN, time is expressed as the ordered series of calculations linking one time step to the next.

In figure 2.12, the red arrow indicates the backpropagation direction at time step t. Through backpropagation, the derivatives of the error at t with respect to the weight parameters in each layer are calculated based on the chain rule. The key difference is that the weights in each layer of an RNN are shared, while the weights in a feedforward network are all different. Because it takes too much time and computer memory to backpropagate through very long sequences, in practice Truncated BPTT is used to reduce the cost by only backpropagating through a limited number of time steps. The drawback is that the RNN cannot learn the long-range dependencies that full BPTT could capture, due to the truncation.

The vanilla RNN architecture described in figure 2.12 can easily suffer from the exploding or vanishing gradient problem [16, 17] when the input sequences are too long. In this case, the RNN cannot learn anything. We explore this problem in detail in chapter 4 and introduce Long Short-Term Memory units, an advanced RNN architecture.


Chapter 3: Dynamic Object Detection on Road

Robust and accurate object detection is one of the most crucial elements of autonomous driving. Although object detection with Lidar data has been explored by several researchers, the camera is the dominant sensor for the object detection task due to the abundant information it produces. It is, of course, ideal to combine Lidar and camera data to obtain more robust and accurate object detection results. However, in this chapter, we focus on computer vision based object detection, and we develop a deep learning based method to detect vehicles, pedestrians, and cyclists in images.

3.1 Related Work

3.1.1 Traditional Approaches

The traditional detection pipeline often consists of region of interest (ROI) extraction, object classification, and refinement steps. ROI extraction recognizes the areas that could contain the objects of interest. The sliding window approach is the simplest: it shifts a box over the image at a fixed step size and various scales. A large number of bounding boxes can be produced by the sliding window approach, and it is too time consuming to feed all of them to the classifier. Fortunately, the number of bounding boxes can be reduced with filtering methods, such as [98], by assuming certain sizes, ratios, and positions of the candidate bounding boxes. Several alternatives were proposed to improve the efficiency. For example, Selective Search [53] exploits segmentation to efficiently extract approximate locations instead of performing an exhaustive search over the full image, and the Edgeboxes approach [99] uses object boundaries as a feature to score candidate bounding boxes extracted from other methods. In the object detection task, we describe the detected object by its location (bounding box coordinates) and category.

The classification step determines which classes the objects in the candidate bounding box areas belong to. The classification can be quite costly due to the large number of candidate bounding boxes. Therefore, the classifier must be able to quickly recognize objects in the bounding boxes. An early classifier, such as [100], improves the classification speed by efficiently discarding candidate boxes in the background region and focusing on more promising regions. With the introduction of the Support Vector Machine (SVM) by [101], which maximizes the margin of all samples from a linear decision boundary, the SVM became the dominant classifier due to its speed and accuracy.

The purpose of the refinement step is to filter out detection results with low confidence and to delete repeated detections. A simple Non-Maximum Suppression (NMS) [102] is usually good enough for the refinement purpose.

Reliable classification relies on robust features learned from the training data. Two of the most commonly used feature descriptors are the Scale Invariant Feature Transform (SIFT) [104] and the Histogram of Oriented Gradients (HOG) [103]. The Support Vector Machine (SVM) [54] combined with HOG features became the most efficient and fast traditional classification algorithm. Rather than learning the appearance of whole objects, the idea of splitting a complex appearance into simple parts is more promising, because this allows more flexibility and requires less training data to learn the appearance of the objects. The Deformable Part Model (DPM) by [105] is the product of this idea.

However, all previous methods depend on hand-crafted features that rely on specific feature descriptors and are difficult to design. With the introduction of deep learning methods to object detection, convolutional neural networks have dominated this task while significantly boosting performance.

3.1.2 CNNs for Object Detection

Convolutional Neural Networks are the leading method for the visual object detection task. We review some of the milestone methods. In 2013, [55] proposed region-based Convolutional Neural Networks (R-CNN) for object detection, which brought dramatic improvements compared with traditional methods. The R-CNN approach uses the traditional Selective Search [53] method to propose regions of interest, and then classifies these regions using a CNN. However, this approach is expensive due to the large number of region proposals, and it contains a lot of duplicated computation from overlapping region proposals. To overcome this problem, Fast R-CNN [56] runs a CNN extractor over the whole image and makes region proposals from the resulting feature map. This significantly reduces the computation time by sharing the computation of feature extraction.

However, both R-CNN and Fast R-CNN still rely on the traditional Selective Search method to propose regions of interest. The Multibox detector [106] and Faster R-CNN [57] are among the first papers to use a CNN to propose the regions of interest. They generate a collection of boxes with different locations, scales, and aspect ratios on a selected intermediate feature map and project them onto the original image. These boxes are called “anchors”, “a priori boxes”, or “default boxes” in different publications. For each anchor, the CNN model is trained to predict a class label (a classification problem) and an offset by which the anchor should be moved to fit the ground truth bounding box (a regression problem). During training, we minimize the sum of both the classification error and the regression error. Since being proposed, the Faster R-CNN style methods have had a significant influence on the object detection community and led to many follow-up methods such as R-FCN [107]. R-FCN is a fully convolutional neural network, in which there are only CNN layers and no fully connected layers. This reduces the number of parameters, making the model smaller and more computationally efficient.

So far, all these methods consist of two stages. The first stage, called the Region Proposal Network (RPN) in Faster R-CNN, pushes the whole image through a feature extractor and proposes the regions of interest. The purpose of this stage is to produce anchors at different spatial locations, scales, and aspect ratios. In the second stage, these anchors are used to crop features from a certain intermediate feature map, and a class label and an offset to the anchor position are predicted. In both stages, the loss function is the sum of the classification error and the regression error. This two-stage mechanism stops these methods from becoming real-time detectors because of the duplicated computation.

Alternatively, the single-stage approach can achieve real-time detection. The single-stage approach utilizes a single feed-forward convolutional network to directly predict the class labels and anchor offsets without a region proposal network, which largely accelerates detection. You Only Look Once (YOLO) [59] divides the input image into a grid, and each grid cell predicts two bounding boxes as the anchors. Each anchor is responsible for directly predicting the location of the bounding box and the class probability. YOLO is the first real-time detector. Recently, the authors of YOLO proposed various improvements drawing on the rapid development of the object detection community; for example, they replaced the fully connected layers with a convolution layer. SSD [58] produces anchors at multiple feature maps at different scales, and predicts class probabilities and offsets using these multi-scale features. In comparison, YOLO uses only the topmost feature layer for the prediction. In this chapter, we propose a one-stage CNN-based object detector that takes into account the special requirements of the autonomous driving application.

3.2 Model Architecture

As described in figure 3.1, we propose a novel one-stage object detection system based on Fully Convolutional Neural Networks (FCNN), designed with the following factors in mind. Firstly, the precision and recall should be as high as possible. Precision and recall are two important measures of the accuracy of object detection; the reader can refer to section 3.4 for their definitions. This is critical because ideally all the objects of interest must be reliably detected. Secondly, the detection speed has to allow real-time operation, which is rarely considered in regular computer vision benchmarking competitions. The speed requirement is important because it is directly related to the time latency of the vehicle control module. Finally, the model size should fit the vehicle's embedded processor. Even though the computing capability of on-board computers has improved greatly, it is beneficial to reduce the model size and energy consumption of the object detection system.

Figure 3.1: Our proposed object detection architecture, consisting of three steps. (1) Feature extraction: the input image is fed into a feature extractor pre-trained on ImageNet, such as VGG16 [95], and multi-scale feature maps are produced. (2) Prediction with a convolution: anchors are created on selected feature maps and feature map combinations, and a convolution layer is applied to these anchors to predict the offsets to the anchors, the associated confidence, and the class probabilities. (3) Refinement: usually a Non-Maximum Suppression (NMS) is good enough to filter out the repeated bounding boxes.

Overall, the object detection system consists of three steps: feature extraction, prediction and refinement.

Feature extraction: this relies on a feature extractor pre-trained on the ImageNet classification task, which is the backbone convolutional neural network behind our detector. VGG16 [95] is a commonly used benchmark extractor in various object detectors, including the R-CNN methods and R-FCN. However, its large model size and computational complexity prevent it from being the ideal feature extractor for real-time object detection.


Recently, several small-size models were proposed whose authors claim comparable accuracy to VGG16 on the ImageNet classification task, including SqueezeNet [110] and MobileNet [111]. The backbone feature extractor constitutes a large part of the model size of our proposed object detection system. We compare popular feature extractors, including those mentioned above, in the experiments and results section 3.4.

Prediction: This is the key step of our object detection system. For a feature map of size W × H with P channels, K anchors (default bounding boxes) with pre-determined shapes and sizes are produced, centered on W × H uniformly distributed spatial grid cells. Each anchor is associated with C class probabilities and 4 bounding box offsets, where C is the number of classes to distinguish. As shown in figure 3.2, by convolving the feature map with (4 + C) × K filters of size 3 × 3 × P and proper zero padding, we get an output map with the same width W and height H but a different channel size (4 + C) × K. The output map can be interpreted as C class probabilities and 4 bounding box offsets for each anchor. We assign the label with the highest class probability to each bounding box.


Figure 3.2: An example of prediction on a feature map. The feature map of size W × H with P channels has K = 3 anchors produced on each grid cell. By convolving the feature map with (4 + C) × K filters of size 3 × 3 × P, where C is the number of classes to distinguish, and proper zero padding, we get an output map with the same width W and height H but a different channel size (4 + C) × K. The output map can be interpreted as C class probabilities and 4 bounding box offsets for each anchor.

Assume the position and shape of an anchor are described as (x̂_i, ŷ_j, ŵ_k, ĥ_k), i ∈ [1, W], j ∈ [1, H], k ∈ [1, K]. Here x̂_i, ŷ_j are the coordinates of the center of the anchor, and ŵ_k, ĥ_k are the width and height of the k-th anchor. For each anchor (i, j, k), four offsets (δx_ijk, δy_ijk, δw_ijk, δh_ijk) are predicted by the convolution. Following the method presented in [57], called Faster R-CNN, the final bounding box coordinates (x_p, y_p, w_p, h_p) are:

x_p = x̂_i + ŵ_k ∙ δx_ijk,

y_p = ŷ_j + ĥ_k ∙ δy_ijk,

w_p = ŵ_k ∙ exp(δw_ijk),

h_p = ĥ_k ∙ exp(δh_ijk).    (3.1)

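A small sketch of the box decoding of equation 3.1, together with its inverse transformation (which appears later as equation 3.5 and is used to build the regression targets), follows; the function names are hypothetical.

import numpy as np

def decode_box(anchor, offsets):
    """Apply the predicted offsets to an anchor following eq. 3.1.
    anchor = (x_hat, y_hat, w_hat, h_hat), offsets = (dx, dy, dw, dh)."""
    x_hat, y_hat, w_hat, h_hat = anchor
    dx, dy, dw, dh = offsets
    x = x_hat + w_hat * dx
    y = y_hat + h_hat * dy
    w = w_hat * np.exp(dw)
    h = h_hat * np.exp(dh)
    return x, y, w, h

def encode_box(anchor, gt):
    """Inverse transformation (eq. 3.5): compute the offsets that map the anchor onto the ground truth box."""
    x_hat, y_hat, w_hat, h_hat = anchor
    xg, yg, wg, hg = gt
    return (xg - x_hat) / w_hat, (yg - y_hat) / h_hat, np.log(wg / w_hat), np.log(hg / h_hat)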

The prediction step of our system differs from the last layer of the Region Proposal Network (RPN) in Faster R-CNN: the RPN is only responsible for predicting bounding box proposals for classification, and the fully connected layers in Faster R-CNN are used to classify and regress the bounding box proposals.

Refinement: Similar to the traditional SVM + HOG detection pipeline, multiple bounding boxes will be detected surrounding an object in the image. Therefore, a Non-Maximum Suppression (NMS) is used to filter out redundant bounding boxes to obtain the final detections. An Intersection-Over-Union (IOU) overlap ratio is calculated between two bounding box detection results, each associated with a confidence score indicating the probability that an object of interest exists in it (Figure 3.3 (a)). If the IOU overlap is higher than a threshold, the bounding box detection result with the lower confidence score is deleted (Figure 3.3 (b)).

Figure 3.3: The Intersection-Over-Union (IOU) ratio calculation and Non-Maximum Suppression (NMS). (a) The IOU ratio is the area of intersection over the area of the union, which indicates how much two bounding boxes overlap. (b) The detection results before and after Non-Maximum Suppression (NMS).
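The IOU computation and the NMS procedure described above can be sketched as follows; the corner-based box format and the 0.5 threshold are illustrative choices.

import numpy as np

def iou(box_a, box_b):
    """Intersection-Over-Union of two boxes given as (x1, y1, x2, y2) corners."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop any remaining box that overlaps it too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:] if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep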

3.3 Implementation


In the last section, we described our one-stage model architecture. This architecture can work with various backbone CNN models and embraces multiple ways to take advantage of their feature maps. In this section, we describe the key implementation steps, including the bounding box matching strategy, the training loss function, and the choice of anchor shapes.

Bounding Boxes Matching: To train our object detection neural network, we need to determine which anchor is responsible for a given ground truth detection. For example, there are only a few ground truth objects in an image, but there are usually several thousand anchors with different locations and shapes. First, we compute the IOU ratios between each ground truth bounding box and all the anchors, and match the anchor with the highest IOU ratio to that ground truth. The reason is that we always want to select the closest anchor to match the ground truth bounding box, to minimize the transformation needed. To allow our network to predict multiple bounding boxes with high confidence scores, we then match anchors to any ground truth with an IOU ratio larger than a threshold (0.7 in our implementation, chosen after multiple experiments).

Training Loss: By defining a proper training loss, our object detection system can be trained in an end-to-end fashion. For one feature map, we define our multi-task loss function as a weighted sum of the classification loss (Lcls) and the regression loss (Lreg):

Loss = (1/N) ∙ (Lcls + Lreg)    (3.2)

where N is the number of matched anchors.


The classification loss Lcls is the cross-entropy loss, i.e., the log loss over multiple classes:

Lcls = − Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} Σ_{c=1}^{C} I_ijk ∙ y_c ∙ log(p_c)    (3.3)

where i ∈ [1, W], j ∈ [1, H], k ∈ [1, K], c ∈ [1, C]. W and H are the width and height of the current feature map, respectively, K is the number of anchors at each grid cell position, and C is the number of classes to distinguish. I_ijk is assigned 1 if the k-th anchor at position (i, j) is matched to a ground truth bounding box, and 0 otherwise. y_c ∈ {0, 1} is the ground truth label and p_c ∈ [0, 1] is the normalized probability output by our neural network.

The regression loss Lreg is the bounding box regression loss:

Lreg = Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} I_ijk ∙ [smoothL1(δx_ijk − δx^G_ijk) + smoothL1(δy_ijk − δy^G_ijk) + smoothL1(δw_ijk − δw^G_ijk) + smoothL1(δh_ijk − δh^G_ijk)]    (3.4)

where (δx_ijk, δy_ijk, δw_ijk, δh_ijk) are the anchor offsets output by our neural network, and (δx^G_ijk, δy^G_ijk, δw^G_ijk, δh^G_ijk) are the ground truth bounding box offsets computed with:

δx^G_ijk = (x^G − x̂_i)/ŵ_k,

δy^G_ijk = (y^G − ŷ_j)/ĥ_k,

δw^G_ijk = log(w^G/ŵ_k),

δh^G_ijk = log(h^G/ĥ_k).    (3.5)


where (x^G, y^G, w^G, h^G) are the coordinates of a ground truth bounding box and (x̂_i, ŷ_j, ŵ_k, ĥ_k) are the coordinates of the corresponding anchor. Note that the above equations are the inverse transformation of equation 3.1.

The smooth L1 function, smoothL1, is:

smoothL1(x) = { 0.5x²,       if |x| < 1
              { |x| − 0.5,   otherwise        (3.6)

which was first proposed and used in Fast R-CNN [56], and is less sensitive to outliers than the L2 loss.
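A short sketch of the smooth L1 function of equation 3.6 and its use on a set of offset residuals follows; the numeric values are made up for illustration.

import numpy as np

def smooth_l1(x):
    """Smooth L1 of eq. 3.6: quadratic near zero, linear for |x| >= 1 (less sensitive to outliers)."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

# The regression loss of eq. 3.4 sums smooth L1 over the four offsets of each matched anchor.
pred = np.array([0.1, -0.3, 2.0, 0.05])     # predicted offsets (dx, dy, dw, dh)
target = np.array([0.0, -0.2, 0.5, 0.0])    # ground truth offsets
reg_loss = np.sum(smooth_l1(pred - target))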

Anchor Shape: The shape of an anchor can be can be characterized by its width � and its height ℎ. In Faster R-CNN [57], the anchor shapes are arbitrarily chosen by reshaping a 16 × 16 square box using 3 scales and 3 aspect ratios. However, a better way to choose the anchors is to make their shapes similar to the ground truth bounding boxes.

The problem can be formulated as follows: Given a number of the ground truth bounding box shape, find k anchors such that the sum of the distance between each ground truth bounding box to its nearest anchor is minimized. This problem can be effectively solved by K-means Clustering algorithm [112], which clusters data by trying to separate them into k group of equal variance and minimize a criterion, such as the �2 distance. For example, we can extract the heights and widths of all the ground truth bounding boxes from the KITTI object detection training datasets, run the K-means Clustering algorithm on them, and find k anchor shapes.
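The anchor shape selection can be sketched with scikit-learn's KMeans as below; the placeholder widths and heights stand in for the values extracted from the KITTI labels, and k = 9 is an illustrative choice rather than the value used in this work.

import numpy as np
from sklearn.cluster import KMeans

def anchor_shapes_from_gt(widths, heights, k=9, seed=0):
    """Cluster ground truth box shapes (width, height) into k anchor shapes with K-means."""
    shapes = np.stack([widths, heights], axis=1)
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(shapes)
    return km.cluster_centers_               # k rows of (anchor_width, anchor_height)

# Hypothetical usage with box sizes extracted from the training labels.
rng = np.random.default_rng(0)
w = rng.uniform(20, 200, size=1000)          # placeholder widths in pixels
h = rng.uniform(20, 120, size=1000)          # placeholder heights in pixels
anchors = anchor_shapes_from_gt(w, h, k=9)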


3.4 Experiments

Our system is implemented in Python using TensorFlow [139]. We experiment with and evaluate our architecture on the KITTI detection dataset, which includes 7481 training images. We randomly extract 30% of the images from the training set as validation data. The rule of thumb for splitting a dataset into training and validation sets is to enable the model to learn efficiently from the data without overfitting to it. For a dataset on the order of several thousand images, 70% training/30% validation or 80% training/20% validation are commonly used split ratios, and their influence on the object detection accuracy is negligible. Since the official object detection test server is used for competition and has usage limitations, we primarily report the detector performance on the validation dataset, as is common convention. We use Stochastic Gradient Descent (SGD) with momentum and a mini-batch size of 20 to optimize the loss function defined in section 3.3. The initial learning rate is set to 0.01 and reduced by half every 10000 steps. To compare every model equally, we train them all for a fixed 50000 steps. All timing information is measured on an NVIDIA GeForce GTX Titan Xp GPU with an Intel Xeon E3 3.2GHz CPU and 16GB RAM.

3.4.1 Quantitative Results

The detected objects are located by bounding boxes. The Intersection-Over-Union (IOU) mentioned in section 3.2 is used to measure the overlap between predicted and ground truth bounding boxes. Following the KITTI and PASCAL evaluation criteria [126], we consider a detection correct if IOU > 70% for vehicles. For pedestrians and cyclists, we require IOU > 50% for a correct detection. Taking car detection as an example, we define the following terms:

True Positive (TP): The instance that a car is correctly detected as a car.

False Positive (FP): The instance that a pedestrian or cyclist is detected as a car.

True Negative (TN): The instance that a pedestrian or cyclist is not detected as a car.

False Negative (FN): The instance that a car is not correctly detected as a car.

Precision reflects the percentage of detected cars that are really cars. Precision is defined as:

Precision = (# of TP) / (# of TP + # of FP)    (3.7)

PASCAL and KITTI do not rank algorithms based on recall, but recall is very important for autonomous driving because it reflects the percentage of cars that are correctly detected. Recall is defined as:

Recall = (# of TP) / (# of TP + # of FN)    (3.8)

Average Precision (AP): We evaluate the performance of object detection using the PASCAL criteria [126], which measure detection accuracy by average precision. For each class, the detection results are first ranked by their detection confidence scores, and then the precision/recall curve is calculated. The AP is a way to summarize the shape of the precision/recall curve, and is defined as the average precision at a set of n equally spaced recall levels [1/n, 2/n, …, n/n]:


AP = (1/n) ∙ Σ_{r ∈ {1/n, 2/n, …, n/n}} p(r)    (3.9)

where p(r) is the precision at each recall level r, interpolated by taking the maximum precision measured at any recall that exceeds r:

p(r) = max_{r̃: r̃ ≥ r} p(r̃)    (3.10)

where p(r̃) is the measured precision at recall r̃.
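A sketch of the interpolated AP computation of equations 3.9-3.10 follows; the toy precision/recall values are made up, and n = 11 is just a common choice for the number of recall levels.

import numpy as np

def average_precision(recalls, precisions, n=11):
    """AP as in eq. 3.9-3.10: average the interpolated precision at n equally spaced recall
    levels, where the interpolated precision at level r is the maximum precision observed
    at any recall >= r. recalls/precisions come from the ranked detections."""
    ap = 0.0
    for r in np.linspace(1.0 / n, 1.0, n):
        mask = recalls >= r
        p_interp = np.max(precisions[mask]) if np.any(mask) else 0.0
        ap += p_interp / n
    return ap

# Toy precision/recall curve for one class.
rec = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
prec = np.array([1.0, 0.9, 0.8, 0.6, 0.5])
print(average_precision(rec, prec))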

The mean Average Precision (mAP) is simply the mean of the AP values over all classes. Table 3.1 summarizes the detection performance for different feature extractors, including the mAP, the inference speed measured in Frames Per Second (FPS), and the AP for each class at different difficulty levels. The difficulty levels are defined as follows:

• Easy (E): Bounding box with minimum height of 40 pixels, fully visible object, and maximum truncation level of 15%

• Moderate (M): Bounding box with minimum height of 25 pixels, object is partly occluded, and maximum truncation level of 30%

• Hard (H): Bounding box with minimum height of 25 pixels, object is difficult to see, and maximum truncation level of 50%

In our experiments, we compared six feature extractors, and only one feature map is extracted and utilized for detection. VGG16 [95], ResNet50 [127], and ResNet101 [127] are representative regular feature extractors with feedforward convolutional and pooling layers. They perform very well on the ImageNet classification task; in particular, ResNet101 is the winner of the 2015 ImageNet classification competition. SqueezeNet [128] is built upon the fire module, which is composed of a squeeze layer with 1x1 convolution filters and an expand layer with a mixture of 1x1 and 3x3 convolutional filters. It has two versions with different combinations of squeeze and expand layers, but both effectively reduce the model size and accelerate the classification speed on the ImageNet benchmark. The last feature extractor in our experiments is MobileNet [129], which was designed for mobile and embedded applications. It has a much smaller model size and is thus less powerful at feature extraction. It was applied to explore the speed limit of our object detection architecture.

Feature Extractor      mAP   Speed (FPS)   Car: E / M / H        Cyclist: E / M / H    Pedestrian: E / M / H
VGG16 [95]             85.6  17.3          92.3 / 85.6 / 82.8    85.6 / 76.3 / 75.6    80.4 / 72.3 / 67.4
ResNet50 [127]         86.5  23.5          92.7 / 89.4 / 84.5    85.9 / 79.4 / 77.8    82.3 / 73.1 / 68.2
ResNet101 [127]        89.4  20.4          94.3 / 90.8 / 85.4    87.2 / 80.3 / 79.1    82.1 / 74.2 / 69.0
SqueezeNet V1 [128]    78.2  56.3          92.4 / 85.7 / 75.1    88.7 / 77.9 / 72.8    77.3 / 69.9 / 64.1
SqueezeNet V2 [128]    82.3  33.2          95.2 / 91.2 / 82.1    92.2 / 82.1 / 77.1    81.1 / 72.3 / 67.7
MobileNet [129]        66.3  67.4          70.6 / 72.7 / 63.9    64.3 / 57.3 / 51.5    73.3 / 69.7 / 63.5

Table 3.1: Detection accuracy summary for different feature extractors. The mean Average Precision (mAP) is the mean of the AP values over all classes. The mAP and AP are given in percent, and the speed is measured in Frames Per Second (FPS). The AP for each class (Car, Cyclist, Pedestrian) at the Easy (E), Moderate (M), and Hard (H) difficulty levels is also reported.


As shown in table 3.1, ResNet101 achieved the best mAP but could only run at 20.4 frames per second. VGG16 and ResNet50 run at the same speed level as ResNet101, but with inferior mAP values. While there is no universal definition of real-time detection, in this dissertation we consider detection to be real time if it runs faster than 30 frames per second. For autonomous driving, even a 1% accuracy improvement is meaningful given the extremely high requirements of traffic safety, so our goal is to find the detector with the highest accuracy under the real-time constraint. With MobileNet, we obtained an inference speed of 67.4 frames per second (table 3.1), which is well beyond real time. Considering both accuracy and speed, we conclude that SqueezeNet V2 is the best feature extractor in our experiments. Figure 3.4 illustrates the precision/recall curves for car, cyclist and pedestrian at different difficulty levels.


Figure 3.4: Precision/recall curves for car, cyclist and pedestrian at the easy, moderate, and hard difficulty levels.

Speed: As shown in table 3.1, with a proper feature extractor, several of our proposed models can achieve real-time inference speed on the KITTI dataset. Another factor influencing the inference speed is the number of feature layers used in our model. In table 3.1, only one feature layer is used. We have experimented with different feature layer combinations, and the results are summarized in table 3.2. As expected, the more feature layers used in our architecture, the higher the achieved mAP; however, the corresponding inference speed decreases. We also found that SqueezeNet V2 with three feature layers used for detection can achieve the same level of accuracy as VGG16 with only one feature layer. It is superior to VGG16 because its inference speed is more than 10 frames per second faster.

Feature Extractor      Feature Layers               mAP   Speed (FPS)
VGG16 [95]             Conv5_3                      85.6  17.3
VGG16 [95]             Conv5_3, Conv4_3             88.5  14.6
VGG16 [95]             Conv6, Conv5_3, Conv4_3      89.8  13.8
SqueezeNet V2 [128]    Fire9                        82.3  33.2
SqueezeNet V2 [128]    Fire9, Fire6                 84.9  29.6
SqueezeNet V2 [128]    Fire11, Fire9, Fire6         85.7  28.9

Table 3.2: Detection accuracy summary for different combinations of feature layers. The mean Average Precision (mAP) is the mean of the AP values over all classes. The mAP is given in percent, and the speed is measured in Frames Per Second (FPS).

3.4.2 Qualitative Results

Examples of successful detections on KITTI test datasets are visualized in figure 3.5.

Examples of detection errors of different types on the KITTI test datasets are visualized in figure 3.6. In both figures, we use SqueezeNet V2 with three feature maps for detection, and each color corresponds to an object category. From figure 3.5, we can see that the detection of cars, cyclists and pedestrians succeeds even under strong illumination, heavy occlusion, high truncation, and crowded objects. However, the detection can also fail with different types of errors, such as missed detections and false positives, as illustrated in figure 3.6.


Figure 3.5: Examples of successful detections on the KITTI test datasets. We use SqueezeNet V2 with three feature maps for detection. Each color corresponds to an object category.


Figure 3.6: Examples of detection errors on the KITTI test datasets. From top to bottom, the errors are: part of a tree is predicted to be a cyclist; a stationary cyclist is predicted to be a pedestrian; a missed detection; and a predicted bounding box that does not fit the object. We use SqueezeNet V2 with three feature maps for detection. Each color corresponds to an object category.


3.5 Conclusion

In this chapter, we proposed a novel object detection architecture using still images. Our detection architecture utilizes a Convolutional Neural Network (CNN) with an end-to-end training approach. In our model, we consider multiple requirements for dynamic object detection in autonomous driving applications, including accuracy, inference speed, and model size. These are crucial to deploying a detector in a real autonomous vehicle. We determined our final architecture by exploring different pre-trained feature extractors and different combinations of multi-scale feature layers. Our architecture was intensively tested on the KITTI visual benchmark datasets and achieved accuracy comparable to the state of the art in real time.


Chapter 4: Video Object Detection for Multiple Tracking

In the previous chapter, we developed a CNN-based dynamic object detection framework for still images. However, the visual data from an autonomous vehicle's cameras are image sequences, not discrete still images, so it is essential to develop object detection algorithms for video sequences, called video object detection. Video object detection is also the foundation and the most important part of the online multiple object tracking task. In this chapter, we aim to improve object detection precision and recall by utilizing the contextual information in video with a special kind of Recurrent Neural Network.

4.1 Introduction

Video can be regarded as an image sequence with small changes between adjacent frames, and thus contextual information exists within a video. We can process each video frame individually, for example by running a CNN detector on each frame to detect vehicles and other objects. However, in this way we neglect the contextual information in the video. To better exploit video sequence data, the contextual information must be used, and a promising way to take advantage of it is by utilizing RNNs.

The biggest problem with utilizing RNNs is the difficulty of training them because of the long-term dependency problem, which results in exploding or vanishing gradients. This problem was explored in depth in [16] and [17]. We describe this problem next and provide several ways to address it.

The exploding and vanishing gradients problem is not specific to any particular type of neural network; it arises from gradient-based learning methods combined with certain activation functions introduced in chapter 2. Gradient-based methods learn the parameters, such as the weight parameters W, by understanding how a change in a parameter's value will affect the network's output. The ideal situation is that a small change of the parameter causes a proportionate reaction in the output; in other words, the gradients of the network's output with respect to the weight parameters in each layer should stay at proper values.

The exploding gradients problem happens when the gradients of the network's output with respect to the weight parameters become too large, causing serious instability in the learning procedure. [17] proposed a simple but effective method called gradient clipping to deal with the exploding gradients problem. It prevents gradients from blowing up by rescaling them so that their norm is at most a threshold value.
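As a hedged illustration (not the exact procedure of [17]), the following NumPy sketch rescales a list of gradient arrays whenever their global norm exceeds the threshold; deep learning frameworks such as Tensorflow provide an equivalent built-in operation (e.g., tf.clip_by_global_norm).

    import numpy as np

    def clip_gradients(grads, threshold):
        """Rescale gradients so that their global norm is at most `threshold`."""
        global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        if global_norm > threshold:
            grads = [g * (threshold / global_norm) for g in grads]
        return grads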

The vanishing gradients problem happens when the gradients of the network's output with respect to the weight parameters become too small, causing the learning procedure to take too many steps. The vanishing gradients problem depends on the choice of the activation function. Many common activation functions (e.g., sigmoid or tanh) squash their input into a very small output range in a very non-linear fashion. As a result, there are large regions of the input space which are mapped to an extremely small range. This problem becomes worse when we have multiple activation layers.

The vanishing gradients problem is relatively difficult to solve. In feedforward neural networks, such as CNNs, we deal with it by using activation functions that do not squash the input space into a small region. An example of this kind of activation function is the popular Rectified Linear Unit (ReLU), which maps x to max(0, x). We can also design an RNN with ReLU as the only activation function, but a more popular and effective method is to redesign the RNN structure. Long Short Term Memory is the most successful RNN structure; it can learn long-term temporal information while avoiding the vanishing gradients problem.

4.2 Related Work

Tracking by detection is a popular and promising scheme for multi-object tracking [123], [124]. These methods first detect the bounding boxes of objects of specific categories, and then connect the bounding boxes belonging to the same trajectory across time frames using data association algorithms. It is obvious that the tracking-by-detection scheme heavily depends on robust detection in video. We now review the video object detection literature and then propose our method.

Compared with CNNs, the use of RNNs in object detection is not well studied. All the object detection architectures we mentioned in chapter 3 are mainly based on deep CNNs, for example Fast R-CNN [56], Faster R-CNN [57] and YOLO [59]. They all belong to the state-of-the-art methods for still image detection. However, things are changing and researchers have begun to incorporate RNNs into video-based tasks.

Existing video object detection methods incorporate RNNs only on the bounding boxes obtained from the above still image detectors. [119] builds bounding box sequences from nearby high-confidence bounding boxes in neighboring video frames. Bounding box sequences are re-scored to the average confidence score, and other bounding boxes close to a sequence are suppressed. T-CNN [80] defines a tubelet as a sequence of associated bounding boxes across video frames, and generates tubelets by applying tracking algorithms to still image bounding boxes and then re-scoring them based on their classification. In this way, it incorporates temporal and contextual information from these tubelets in the video. Similar approaches taking advantage of contextual information from bounding boxes include [119] and [120].

These methods are built on the assumption that the bounding boxes on individual video frames convey contextual information. We argue that the high-level feature maps of individual video frames also carry contextual information, which is richer than that of bounding boxes. By extracting temporal and contextual information from high-level feature maps, several papers demonstrate improvements on visual tasks including activity recognition [116], human dynamics [121] and video classification [122]. These models generate deep CNN features over tens of seconds, which form the input to an RNN. Inspired by them, we propose a novel video object detection architecture for tracking purposes that learns contextual information from feature maps with LSTM units. In addition, our method can further include bounding box contextual information to improve precision and recall.


Before presenting our methods, we first introduce a special kind of RNN, Long Short Term Memory (LSTM), on which our video object detection framework builds.

4.3 Long Short Term Memory

Long Short Term Memory is designed to address the long-term dependency problem, and thus can learn long-term history information from a sequence of inputs. It avoids the vanishing gradients problem because its backpropagation learning process does not involve repeated matrix multiplication by the weight parameters W. Since it was first introduced by [15], many variations of LSTM have been proposed. We use the original version in our implementation, and here we investigate one LSTM unit in detail. As shown in figure 4.1, there are four neural network layers in one LSTM unit that interact in a special way. At each time step, the LSTM is trained to add new information to the memory, forget some memory, and output its memory.


Figure 4.1: A standard architecture of LSTM units. Each line carries a vector, and the arrow denotes the direction of the data flow. Two merging lines denote data concatenation, while a line separating into two lines means that the data it carries is copied and flows in different directions. The circles represent pointwise operations, such as vector addition or multiplication, while the boxes are neural network layers. x_t is the input at time step t, h_t is the output at time step t, and C_t is the cell state at time step t. σ and tanh are the sigmoid function and hyperbolic tangent function, respectively, defined and described in figure 2.5.

There are two key concepts that need to be understood in LSTM. First, the cell state C_t carries the history information, i.e., the memory, which can be updated each time a new input comes in. The cell state is the core idea behind LSTM and enables the LSTM to learn from temporal information. The other key concept is the gate; as the name indicates, it controls whether information can flow through it. A gate is composed of a sigmoid neural network layer and a pointwise multiplication operation, as shown in figure 4.2. We have introduced and compared various sigmoid layers in chapter 2. The output of the sigmoid layer is a number between zero and one, representing how much of the information can pass through. Next, we will introduce the three different gates in LSTM.

Figure 4.2: The gate structure in LSTM units. It is composed of a sigmoid neural network layer σ and a pointwise multiplication operation. It controls whether information can flow through it.

The inputs to the LSTM unit at time step t are the cell state C_{t-1} at time step t-1, the output h_{t-1} at time step t-1, and the input x_t at the current time step t. The "forget gate layer" updates C_{t-1} by deciding which part of the memory should be forgotten, expressed mathematically by equation 4.1. Its inputs are h_{t-1} and x_t, and its output f_t is a vector with values between zero and one because of the sigmoid function σ. f_t has the same dimension as the cell state C_{t-1}, so it can be pointwise multiplied with C_{t-1}. In this way, part of the memory is forgotten.

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (4.1)

The second gate in the LSTM, called the "input gate layer", is designed to add new memory to the cell state C_{t-1}. It is expressed mathematically by equation 4.2. The newly produced candidate cell state C̃_t, denoted by equation 4.3 and having the same dimension as C_{t-1}, works as the input vector to update the cell state C_{t-1}. However, how much of the new candidate cell state C̃_t contributes to C_{t-1} is controlled by the input gate layer. It works in the same way as the forget gate layer, with h_{t-1} and x_t as inputs, and its output i_t is multiplied with C̃_t.

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \qquad (4.2)

\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \qquad (4.3)

At this point, the new cell state C_t is generated by combining the outputs of the forget and input gate layers, which is expressed as:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (4.4)

Finally, the updated cell state C_t is used to generate the output h_t at the current time step t. The "output gate layer", denoted by equation 4.5, works in the same way as the forget and input gate layers, with output o_t. To get the output h_t, an activation function tanh first acts on C_t to produce a vector in the range [-1, 1], and then this vector is pointwise multiplied with the output o_t of the output gate layer:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \qquad (4.5)

h_t = o_t * \tanh(C_t) \qquad (4.6)

In one LSTM unit, the weights and biases of the different layers, i.e., W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o, are the trainable parameters which are learned and updated during the training stage of the network. Once we are satisfied with the losses, we test our network on the validation and test datasets using these weight parameters.
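For clarity, the following is a minimal NumPy sketch of a single LSTM time step implementing equations 4.1-4.6. The dense-matrix parameterization and the concatenation of h_{t-1} and x_t follow the equations above, while the function and variable names are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev,
                  W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
        """One LSTM time step following equations 4.1-4.6."""
        z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
        f_t = sigmoid(W_f @ z + b_f)             # forget gate, eq. 4.1
        i_t = sigmoid(W_i @ z + b_i)             # input gate, eq. 4.2
        C_tilde = np.tanh(W_C @ z + b_C)         # candidate cell state, eq. 4.3
        C_t = f_t * C_prev + i_t * C_tilde       # updated cell state, eq. 4.4
        o_t = sigmoid(W_o @ z + b_o)             # output gate, eq. 4.5
        h_t = o_t * np.tanh(C_t)                 # output, eq. 4.6
        return h_t, C_t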


4.4 Methods

4.4.1 System Overview

The overview of our proposed video object detection architecture is illustrated in figure 4.3. The proposed model is a deep neural network that takes raw video frames as input and returns the bounding boxes and class labels of the objects in each frame. Specifically, (1) a CNN is first chosen to process each individual frame, which is exactly the same as still image detection. This step produces two outputs: robust visual features and the object detection results (i.e., bounding boxes and class labels). (2) We then use LSTM in the next stage to learn the contextual and temporal information, as it is temporally deep and appropriate for sequence processing. In our model, both the visual features and the detection results of each individual frame are inputs to the LSTM units, while other video object detection architectures only use detection results. This is where our proposed pipeline differs from other video object detection architectures. (3) Finally, the object detection results are directly regressed from the LSTM units, and at the same time the cell states of the LSTM units are updated. We will demonstrate the performance improvements in the experiments section.


Figure 4.3: Overview of our video object detection architecture. The dashed line means that either the feature vector or the still image detection results (or both) can be input to the Long Short Term Memory (LSTM) units. "Detection" denotes the detection results, i.e., bounding box coordinates and class labels from the Convolutional Neural Network (CNN) or LSTM. "Features" denotes the visual feature vector from the CNN feature extractor.

4.4.2 LSTM Choice

Since LSTM was first proposed, several variants have been designed, such as [18], [19] and [20]. An important variation of the standard LSTM described above is the Gated Recurrent Unit (GRU), introduced by [21]. It combines the forget and input gates into a single update gate, and merges the cell state and hidden state. In [22], the authors compared eight popular variants of LSTM, including the ones mentioned above, and found that the most commonly used LSTM variants perform reasonably well on various datasets and that using any of these variants does not significantly improve LSTM performance. Therefore, we only implement the standard LSTM in our study, as described in section 4.3.

4.4.3 Training

There are three phases in the end-to-end training of our proposed video object detection system: (1) the pre-training phase of the CNN for feature learning on ImageNet datasets, (2) the traditional CNN-based object detection training phase on individual video frames, and (3) the LSTM training phase for contextual information learning with video sequences. The first phase of training the CNN for feature learning is conducted on the ImageNet dataset with 1000 classes and millions of training examples. Usually we do not train it ourselves but use the so-called transfer learning techniques described in chapter 2, section 2.3.5.

We perform the second and third phases ourselves. The second phase is actually a fine-tuning process, which means we only fine-tune the parameters of the last few layers with our own data. This phase is the same as the training procedure for still images in chapter 3. Finally, we add the LSTM units and train the entire video object detection network to learn the contextual information from video sequences. There are two streams of data flowing into the LSTMs: the robust visual feature vector and the detection information, i.e., bounding boxes and class labels. In previous studies, only the detection information is learned by the LSTM units.


4.5 Experiments

Our system is implemented in Python using Tensorflow [139]. We experiment with and evaluate our architecture on the KITTI object tracking datasets, which include 21 sequences with more than 8000 labeled frames. We split them into 16 training sequences and 5 validation sequences. To better train the CNN module, we also use the KITTI object detection datasets in addition to the individual frames of the tracking datasets. Since the official tracking test server only evaluates tracking results and has usage limitations, we primarily report performance on the tracking validation set, as is common practice for video object detection tasks. All timing information is measured on an NVIDIA GeForce GTX Titan Xp GPU with an Intel Xeon E3 3.2 GHz CPU and 16 GB RAM.

4.5.1 Quantitative Results

Our proposed video object detection architecture must be built on a still image detection model. We use the still image detection pipeline proposed in chapter 3 as the base model. We tested three ways to take advantage of the contextual information hidden in the video sequences. First, we fed only the detection results of individual frames into the LSTM units. Second, we fed only the visual feature vector extracted from individual frames into the LSTM units. Finally, both the detection results and the visual feature vector were fed into the LSTM units. Therefore, there are three ways for the LSTM units to learn the contextual information.


Model          mAP   Speed (FPS)   Car: E / M / H        Cyclist: E / M / H    Pedestrian: E / M / H
VGG16 (Base)   85.6  17.3          92.3 / 85.6 / 82.8    85.6 / 76.3 / 75.6    80.4 / 72.3 / 67.4
VGG16 (D)      86.7  16.5          92.5 / 86.7 / 83.4    85.7 / 76.7 / 75.8    81.4 / 73.5 / 67.9
VGG16 (F)      88.5  12.1          95.4 / 87.9 / 82.6    86.9 / 77.5 / 75.8    82.3 / 75.8 / 68.3
VGG16 (F&D)    89.3  10.2          95.7 / 88.3 / 84.5    86.3 / 78.6 / 76.6    82.5 / 77.8 / 74.2

Table 4.1: Detection performance using VGG16 as the backbone CNN architecture with different inputs to the LSTM units. The mean Average Precision (mAP) and AP are given in percent, and the speed is measured in Frames Per Second (FPS). Base means no LSTM units are used; D means only detected bounding boxes are input to the LSTM units; F means only visual features are input to the LSTM units; F&D means both detected bounding boxes and visual features are input to the LSTM units. The AP for each class (Car, Cyclist, Pedestrian) at the Easy (E), Moderate (M), and Hard (H) difficulty levels is also reported.

We chose VGG16 and MobileNet as the backbone CNN feature extractors in our experiments, because they are representative of a powerful but heavy feature extractor and a less powerful but lightweight feature extractor, respectively. We summarize their performance in tables 4.1 and 4.2. In both tables, Base means no LSTM units are used; D means only detected bounding boxes are input to the LSTM units; F means only visual features are input to the LSTM units; F&D means both detected bounding boxes and visual features are input to the LSTM units. From the two tables, we draw the following observations:


• Both the detection results and the visual feature vector help improve detection accuracy, but the latter provides a larger accuracy increase

• Adding the LSTM units reduces the inference speed

• The contextual information (visual feature vector and detection results) produced by MobileNet provides a more significant accuracy boost

Model             mAP   Speed (FPS)   Car: E / M / H        Cyclist: E / M / H    Pedestrian: E / M / H
MobileNet (Base)  66.3  67.4          70.6 / 72.7 / 63.9    64.3 / 57.3 / 51.5    73.3 / 69.7 / 63.5
MobileNet (D)     68.6  65.9          71.2 / 71.4 / 64.2    65.6 / 58.5 / 52.8    75.1 / 71.4 / 64.7
MobileNet (F)     73.5  61.5          78.6 / 77.5 / 67.5    67.7 / 61.4 / 55.6    77.4 / 75.7 / 69.3
MobileNet (F&D)   77.7  60.4          79.5 / 78.9 / 67.9    68.3 / 62.8 / 57.2    79.4 / 74.9 / 68.4

Table 4.2: Detection performance using MobileNet as the backbone CNN architecture with different inputs to the LSTM units. The mean Average Precision (mAP) and AP are given in percent, and the speed is measured in Frames Per Second (FPS). Base means no LSTM units are used; D means only detected bounding boxes are input to the LSTM units; F means only visual features are input to the LSTM units; F&D means both detected bounding boxes and visual features are input to the LSTM units. The AP for each class (Car, Cyclist, Pedestrian) at the Easy (E), Moderate (M), and Hard (H) difficulty levels is also reported.

4.5.2 Qualitative Results

We described quantitatively in the previous section how the historical feature maps and detection results can improve the video detection accuracy. In this section, we qualitatively demonstrate the same effects. Figure 4.4 displays the object detection results for several video frames from the KITTI tracking test datasets. In this result, only the base MobileNet is used to detect objects in each individual frame, without considering contextual information. We can see that the cyclist was mistakenly predicted to be both a cyclist and a pedestrian in the third frame, which is caused by changes in the background and illumination.

Figure 4.5 shows the detection results for the same frames as figure 4.4. However, in figure 4.5, we utilized the contextual information hidden in the video by using the LSTM units to learn from both the historical visual feature vector and the detection results. The benefit is obvious: there is no detection error in the third frame. This means that the LSTM units learn from the historical information and reject the false prediction that would be made from a single frame.


Figure 4.4: Video object detection without considering the contextual information. Only the base MobileNet is used to detect objects in each individual frame. Each color corresponds to an object category.


Figure 4.5: Video object detection considering the contextual information. MobileNet is used to detect objects in each individual frame, and the output visual feature vector and detection results are fed into the LSTM units. Each color corresponds to an object category.


4.6 Conclusion

In this chapter, we developed a video object detection framework based on CNN and Long Short Term Memory (LSTM). Previous studies learn contextual information only from the individual frame detection results. Our framework can learn the contextual information from both the visual feature vector and the individual frame detection results. The architecture proposed in chapter 3 acts as the still image detector and feature extractor, and the LSTM is responsible for exploiting the contextual information in the video. We evaluated our framework on the KITTI tracking datasets. Through experiments, we demonstrated quantitatively and qualitatively that the contextual information hidden in video sequences can help improve detection accuracy and reduce false detections.


Chapter 5: End to End Learning for Vehicle Control

In the previous chapters, we studied the mapping from image pixels to object classification and location. This information is essential for understanding the traffic scene in order to control the vehicle. However, the power of deep learning goes beyond object recognition. In this chapter, we explore the possibility of mapping image pixels directly to control commands, such as the steering angle, which we call end-to-end vehicle control in this dissertation. This end-to-end learning protocol is attractive because it simulates how humans drive and skips the perception and path planning modules. In this chapter, we propose a deep learning architecture involving both CNNs and LSTM that can map a video sequence to control commands.

5.1 Related Work

Surprisingly, the idea of end-to-end vehicle control was put into practice as early as 1989, as reported in [113]. The Autonomous Land Vehicle In a Neural Network (ALVINN) utilized a shallow 3-layer neural network that could predict the vehicle control command from the input image pixels. To our knowledge, this was the first attempt to map image pixels to vehicle control. Its success implies the potential of neural networks for directly controlling vehicles.


Another important project, DARPA Autonomous Vehicle (DAVE) [114], was conducted by DARPA. In this project, a radio-controlled (RC) car learned to avoid obstacles in a junk-filled off-road setting. The RC car was trained on several hours of human driving data and tested in a different environment. The training data included the video recorded by two cameras and the steering commands sent by the human operator. Because of the limited training data and the shallow network used, its performance was not reliable enough to control the RC car in a complex real-life environment; on average, it crashed every 20 meters, as reported in the paper.

The most recent successful demonstration, NVIDIA's end-to-end learning system for autonomous vehicles [42], implemented an idea similar to ALVINN. This system can control the vehicle in certain real traffic scenarios, such as highway lane following and local road driving. It improves upon the ALVINN system in two aspects. First, its learning architecture is a complex multi-layer Convolutional Neural Network with 27 million connections and 250 thousand parameters. This enables the system to learn more powerful representations from the input image pixels. Second, a hundred hours of driving data, including the images from three cameras and the corresponding control commands, is utilized as training data. This is necessary to train complex CNNs with numerous parameters. In addition, this research also benefits from the computing power provided by massively parallel Graphics Processing Units (GPUs), which significantly accelerates training and pushes the inference speed to real time.

Instead of mapping from image pixels to control commands, Deepdriving [115] proposed mapping from image pixels to a pre-defined affordance representation. This is a small number of perception indicators that represent the traffic scene, such as the distance to lane markings and the distance to other vehicles. These affordance representations are understandable to humans, whereas the process of mapping from image pixels to control commands is like a black box and is not understandable by humans. However, affordance representations are difficult to define in complex, real urban traffic scenes. And, similar to the output of a perception module, the affordance representation must then be fed into a rule-based control algorithm.

So far, the above approaches all work on still image pixels and do not consider the temporal information between adjacent images. However, temporal information is essential to understanding sequence data, such as video and audio. [116] demonstrated a class of deep learning architectures combining Convolutional and Recurrent Neural Networks. These architectures are both spatially and temporally deep, and can be applied to various computer vision tasks that involve image sequence inputs, such as activity recognition and video description. Another work [117] incorporates an FCN-LSTM architecture, which can map a video sequence to discrete driving actions, such as "go straight", "turn left", "turn right" and "stop", and to continuous driving actions such as angular speed. Inspired by this work, we propose a deep architecture involving both CNNs and LSTM that can map a video sequence to control commands. The proposed network architecture only takes the current and previous frames as inputs to predict the control commands, and never incorporates future frame information.

5.2 Network Architecture


In our architecture, both Convolutional and Recurrent Neural Networks are adopted. First, a CNN extracts a deep feature vector from each frame of the video sequence. Temporal information exists between adjacent feature vectors because they are extracted from neighboring frames of the video sequence. To take advantage of this temporal information, a special kind of RNN called Long Short Term Memory (LSTM) is used, because it is good at learning long-term dependency relationships. We described the mechanics of LSTM in chapter 4.

Figure 5.1 depicts our first end-to-end control architecture, called "L1". For each frame, the deep feature vector extracted by the CNN is fed into the LSTM unit as the input. The LSTM unit outputs the predicted control commands, i.e., steering angle and throttle level, and at the same time passes its outputs to the LSTM unit of the next time step. In other words, at each time step except the first, both the deep feature vector of the current time step and the output control commands of the last time step are fed into the LSTM unit to update the cell state. As mentioned in chapter 4, the cell state in the LSTM unit is responsible for learning historical temporal information from the inputs.


Figure 5.1: End-to-end control architecture with a one-layer Long Short Term Memory (LSTM), called "L1". At each time step except the first, both the deep feature vector of the current time step and the output control commands of the last time step are fed into the LSTM unit to update the cell state.

Inspired by [134], and to examine whether stacking more LSTM layers increases performance, we propose a two-layer LSTM architecture, as shown in figure 5.2. Compared with architecture "L1", it stacks one more LSTM layer above the original LSTM, and the outputs of the first LSTM layer act as the inputs to the second LSTM layer. In theory, the first LSTM layer serves as an encoder that encodes the visual features into its cell state, and the second LSTM layer serves as a decoder that regresses the control commands. This increases the network depth in the time dimension.


Figure 5.2: End-to-end control architecture with two Long Short Term Memory (LSTM) layers, called "L2". The outputs of the first LSTM layer act as the inputs to the second LSTM layer.

With the expectation of reducing the model noise, we modify the architecture "L2" by adding a feedback mechanism. Specifically, as described in figure 5.3, we add the outputs of the second LSTM layer at the previous time step as a third input to the first LSTM layer at the current time step. We call this architecture "L2 Fb".


Figure 5.3: End-to-end control architecture with two Long Short Term Memory (LSTM) layers and a feedback mechanism. We add the outputs of the second LSTM layer at the previous time step as a third input to the first LSTM layer at the current time step. This architecture is called "L2 Fb".

We can further tweak the "L2 Fb" architecture by having the deep feature vector skip the first LSTM layer and feed directly into the second LSTM layer, as shown in figure 5.4; we call this architecture "L2 Fb Skip". It is inspired by the image captioning architecture in [116], which is called "factored" in that paper. The purpose is to separate the responsibilities of the first and second LSTM layers, which forces the cell state of the first LSTM layer to be updated independently of the feature vector.


Figure 5.4: End-to-end control architecture with two Long Short Term Memory (LSTM) layers, a feedback mechanism, and a skip structure. We call this architecture "L2 Fb Skip".

Up to now, we have defined four types of end-to-end control architectures: "L1", "L2", "L2 Fb" and "L2 Fb Skip". They can all be trained in an end-to-end fashion. During training, there are two ways to produce the feedback input to the first LSTM layer of the next time step:

• One is to use the ground truth control commands of the current time step as the feedback input to the first layer at the next time step.

• The other is to use the predicted control commands of the current time step as the feedback input to the first layer at the next time step.


This creates two different methods to train the "L2 Fb" and "L2 Fb Skip" architectures. In order to compare the performance of each end-to-end control architecture, we performed intensive experiments on them, which are explained in section 5.4. In the next section, we describe the details of the implementation and the training methods.

5.3 Implementation and Training

All our architectures are implemented using the Python API of the Tensorflow [118] deep learning framework. Similar to CNNs, Tensorflow has its own built-in LSTM layer and other types of RNN layers. All timing information is measured on an NVIDIA GeForce GTX Titan Xp GPU with an Intel Xeon E3 3.2 GHz CPU and 16 GB RAM.

For the real road data, we choose VGG16 [95] as the CNN visual feature extractor in all our end-to-end control architectures. It is pre-trained for object classification on the ImageNet dataset with millions of pictures and 1000 classes, which gives it the ability to extract general features of almost arbitrary objects. The output of its first fully connected layer is a feature vector of size 4096, which represents the high-level visual features. Because the CNN is pre-trained on such a large dataset, we assume it can extract meaningful feature representations without further training on our own small datasets. Therefore, while training our proposed end-to-end control architectures, the weight and bias parameters of the CNN are frozen and only the parameters in the LSTM layers are updated.
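A minimal tf.keras sketch of this setup is shown below, only to illustrate freezing the pre-trained VGG16 and training the LSTM head. The LSTM width (128 units), the optimizer, the mean squared error loss, and the two-dimensional output (steering and throttle) are illustrative assumptions rather than the exact configuration used in our experiments.

    import tensorflow as tf

    # Pre-trained VGG16; its first fully connected layer ("fc1") gives a 4096-d feature.
    vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
    feature_extractor = tf.keras.Model(inputs=vgg.input,
                                       outputs=vgg.get_layer("fc1").output)
    feature_extractor.trainable = False          # freeze CNN weights and biases

    # Only the LSTM and the small output head are trainable.
    frames = tf.keras.Input(shape=(10, 224, 224, 3))             # M = 10 frames per sequence
    feats = tf.keras.layers.TimeDistributed(feature_extractor)(frames)
    h = tf.keras.layers.LSTM(128, return_sequences=True)(feats)  # assumed LSTM width
    commands = tf.keras.layers.Dense(2)(h)                       # steering angle, throttle
    model = tf.keras.Model(frames, commands)
    model.compile(optimizer="adam", loss="mse")                  # assumed optimizer/loss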


We described how to train a general RNN using the backpropagation through time (BPTT) algorithm in chapter 2. The BPTT algorithm can also be used to train our end-to-end control architectures. We use mini-batch SGD optimization within the BPTT algorithm. Figure 5.5 depicts the training diagram of our proposed architecture. The mini-batch data are fed into our proposed CNN/LSTM architecture after data augmentation, which produces a predicted steering angle. At the same time, the recorded steering angle is adjusted, regarded as the ground truth, and compared with the predicted steering angle to produce the steering error. Finally, the BPTT algorithm is used to adjust the weight and bias parameters in the LSTM units.

Figure 5.5: The training diagram of our proposed architecture using the BackPropagation Through Time (BPTT) algorithm.

Next, we explain two further important parts of our training procedure, mini-batch data generation and data augmentation, because they differ from data generation and augmentation for regular discrete still images.


Our goal is to generate mini-batch training data with N sequences and M frames in each sequence. We randomly select N frames from all the training data as the first frames of the N sequences. Since we define the length of each sequence as M, we fill each sequence by adding the M-1 frames following the first frame. The next time a mini batch is generated, we randomly re-select N starting frames and fill each sequence using the same method. For training, the loss for a mini batch is the sum of squared Euclidean distances:

L = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( \left\| S_{ij}^{gt} - S_{ij}^{pre} \right\| + \left\| T_{ij}^{gt} - T_{ij}^{pre} \right\| \right) \qquad (5.1)

where N is the number of independent sequences in one batch and M is the length of each sequence. S_{ij}^{gt} and S_{ij}^{pre} are the ground truth and predicted steering angles at sequence i and frame j, respectively. T_{ij}^{gt} and T_{ij}^{pre} are the ground truth and predicted throttle levels at sequence i and frame j, respectively. ‖·‖ is the squared Euclidean norm. In our implementation, for all proposed end-to-end control architectures we choose the number of independent sequences N = 4 and the sequence length M = 10 for each mini batch.
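The following NumPy sketch illustrates this sequence sampling scheme and the loss of equation 5.1; the array shapes and function names are illustrative assumptions.

    import numpy as np

    def sample_minibatch(frames, steering, throttle, n_seq=4, seq_len=10):
        """Pick N random starting frames and take M consecutive frames for each sequence."""
        starts = np.random.randint(0, len(frames) - seq_len, size=n_seq)
        idx = np.stack([np.arange(s, s + seq_len) for s in starts])   # shape (N, M)
        return frames[idx], steering[idx], throttle[idx]

    def minibatch_loss(s_pred, s_gt, t_pred, t_gt):
        """Squared steering and throttle errors averaged over the mini batch (eq. 5.1)."""
        n, m = s_gt.shape
        return np.sum((s_gt - s_pred) ** 2 + (t_gt - t_pred) ** 2) / (n * m)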

The training data must be augmented to mimic the vehicle being at different positions on the road so that the network learns to recover from them. For the simulation and real road data we use different augmentation strategies, such as small random shifts, small random rotations, and horizontal flipping, which are described in the experiments section. However, we must apply the same augmentation strategy to every image of a sequence to preserve the contextual information among its frames. This is different from augmenting discrete still images, to which different augmentation strategies can be applied independently.

Once trained, the architecture can produce a steering angle from each frame of the video sequences.

Figure 5.6: The architecture at test time. It can produce a steering angle from each frame of the video sequences.

5.4 Experiments and Results

5.4.1 Simulation Study

Before training on real road data, we first evaluated our network's performance in simulation. We chose the open source Udacity (https://www.udacity.com) self-driving car simulator (figure 5.7), which has both a training mode and an autonomous mode. In the training mode, we collected training data by manually controlling the simulated vehicle. In the autonomous mode, the simulator feeds the real-time video sequences into our trained network and executes the steering angles it outputs. Note that there are two tracks in the simulator; we collected all the training data from the first track and validated our network performance on the second track. In this way, we showed that our network generalizes rather than memorizing the road.


Figure 5.7: The Udacity self-driving car simulator (https://www.udacity.com).

We collected 6428 video frames as training data from each of the left, center, and right cameras mounted on the simulated vehicle. In addition, we also collected 1607 video frames from each camera as test data. To further test our network performance, we ran the simulator in autonomous mode on the second track. Figure 5.8 shows an example of the collected video frames from the left, center, and right cameras.


Figure 5.8: An example of the collected video frames from the left, center, and right cameras.

Figure 5.9 (a) displays the distribution of steering angles in the training data for the center camera, which clearly shows the imbalance of the training data: too many steering angles lie in the range [−0.3, 0]. To balance the data, we augmented the training data in two ways:

• Use left/right cameras: Images from the left and right cameras are also used, with the steering angle adjusted by 0.25. We should note that adding a constant angle to the steering is a simplified substitute for modeling the actual shift of the left and right cameras, and not the best way; but in the simulation case, this simplification is good enough.

• Horizontal flipping: We horizontally flip the images to account for driving in the opposite direction. This also increases the amount of training data.

The steering angle distribution after augmentation is illustrated in figure 5.9 (b); the training data is more balanced now.
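A minimal sketch of these two augmentation steps follows. The sign convention for the left/right camera correction and the negation of the steering angle when flipping are assumptions of this illustration.

    import numpy as np

    STEER_OFFSET = 0.25   # constant correction applied to left/right camera images

    def augment_sample(center_img, left_img, right_img, steering):
        """Expand one labeled sample into several training samples."""
        samples = [
            (center_img, steering),
            (left_img,  steering + STEER_OFFSET),   # as if the car were further left
            (right_img, steering - STEER_OFFSET),   # as if the car were further right
        ]
        # Horizontal flip: mirror each image and negate its steering angle.
        samples += [(np.fliplr(img), -angle) for img, angle in samples]
        return samples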


Figure 5.9: The steering angle distribution before and after data augmentation. Too many steering angles lie in the range [−0.3, 0] before data augmentation, while the distribution becomes balanced after data augmentation.

As shown in figure 5.8, the simulated scene is a simple one-way road without other traffic participants. It is not necessary to use a complex CNN feature extractor, so we designed a simple version, illustrated in figure 5.10. The input video frames are first resized to 16 × 32 before being fed into the CNN feature extractor. The first layer "8-3 × 3" is a convolution layer with eight 3 × 3 filters, followed by a "Relu" activation layer to add non-linearity to the system. Then a max pooling layer "Max 2 × 2" is used to reduce the feature map dimension. These three layers are repeated once, followed by a dropout layer "Dropout 0.2" with dropout rate 0.2 to prevent overfitting to the training data. Finally, a visual feature vector of length 50 is obtained after a fully connected layer "FC 50".

If we do not intend to take advantage of the contextual information within the video sequences, the above visual feature vector is processed by an additional "Relu" layer and a fully connected layer "FC 1" to output the steering angle value. We call this architecture "Base CNNs". Otherwise, the visual feature vector is fed into the LSTM unit to output the steering angle; in this way, the contextual information is learned by the network architecture.
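As a minimal tf.keras sketch of the "Base CNNs" variant described above; the padding mode, the number of filters in the repeated stage, and the optimizer/loss choice are illustrative assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers

    # "Base CNNs" head for the simulated data (figure 5.10): 16x32 input, two
    # conv/ReLU/pool stages, dropout, a 50-d feature vector, and one steering output.
    base_cnn = tf.keras.Sequential([
        layers.Conv2D(8, (3, 3), activation="relu", padding="same",
                      input_shape=(16, 32, 3)),                      # "8-3x3" + "Relu"
        layers.MaxPooling2D((2, 2)),                                 # "Max 2x2"
        layers.Conv2D(8, (3, 3), activation="relu", padding="same"), # repeated stage
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),                                         # "Dropout 0.2"
        layers.Flatten(),
        layers.Dense(50, activation="relu"),                         # "FC 50" + "Relu"
        layers.Dense(1),                                             # "FC 1": steering angle
    ])
    base_cnn.compile(optimizer="adam", loss="mse")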

Figure 5.10: Backbone CNN feature extractor for simulated data. "8-3 × 3" means a convolution layer with eight 3 × 3 filters; "Relu" means a ReLU activation layer to add non-linearity to the system; "Max 2 × 2" is a max pooling layer with filter size 2 × 2 to reduce the feature map dimension; "Dropout 0.2" is a dropout layer with dropout rate 0.2 to prevent overfitting to the training data; "FC 50" means a fully connected layer with an output feature vector of length 50.

Architecture   Base CNNs   L1      L2      L2 Fb (GT)   L2 Fb (PRE)   L2 Fb Skip (GT)   L2 Fb Skip (PRE)
MAE (radian)   0.036       0.045   0.041   0.039        0.043         0.036             0.046
Std (radian)   0.084       0.081   0.092   0.087        0.080         0.083             0.075

Table 5.1: Steering angle Mean Absolute Error (MAE) and Standard Deviation (Std), in radians, of the different architectures on the simulated test datasets. "GT" means that the ground truth values are used as feedback to the next time step; "PRE" means that the network's own predictions are used as feedback. The L1, L2, L2 Fb, and L2 Fb Skip architectures are illustrated in figures 5.1-5.4.

Using the above CNN feature extractor, we trained our network architectures on the simulated training datasets. After training, the simulated vehicle drives along the track in the autonomous mode, which means the vehicle relies on the steering angles output by our trained network. It turned out that all of the network architectures enable the simulated vehicle to drive without crashing or touching the edge of the road. In fact, one can hardly tell the difference in performance between the different architectures, since all of them result in almost the same driving quality.

To further compare the architectures, we evaluated them quantitatively on the test datasets. We calculate the Mean Absolute Error (MAE) and the standard deviation (Std) of the steering angle difference. The MAE can be calculated with the following equation:

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| S_i^{pre} - S_i^{gt} \right| \qquad (5.2)

where n is the number of predicted steering angles, S_i^{pre} is the i-th predicted steering angle, and S_i^{gt} is the i-th ground truth steering angle.

Std can be computed with:

\bar{e} = \frac{1}{n} \sum_{i=1}^{n} \left( S_i^{pre} - S_i^{gt} \right) \qquad (5.3)

Std = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( S_i^{pre} - S_i^{gt} - \bar{e} \right)^2} \qquad (5.4)

where \bar{e} is the mean of the predicted steering angle errors.
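A short NumPy sketch of these two metrics (equations 5.2-5.4):

    import numpy as np

    def steering_metrics(pred, gt):
        """MAE and standard deviation of the steering error (eqs. 5.2-5.4)."""
        err = pred - gt
        mae = np.mean(np.abs(err))
        std = np.sqrt(np.mean((err - err.mean()) ** 2))   # equivalent to np.std(err)
        return mae, std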

Table 5.1 summarizes the performance of each architecture. It is not obvious that the LSTM-based architectures outperform the CNN-only architecture, even though the LSTM-based architectures have the ability to utilize the contextual information in the video sequences. One possible reason is that the driving scene is too simple, and the CNN-only architecture is powerful enough to learn the mapping from the video frame to the steering angle. Overall, the simulation study verified that our proposed architectures are able to output correct steering angles to control the vehicle.

5.4.2 Real Road Test

After testing in the simulator, we evaluated our network architectures on real road data. Udacity collected video sequence data together with steering angle, speed and throttle levels. In our study, we used 74370 video frames from the cameras and the corresponding steering angles as training data, and 5614 frames as test data. Figure 5.11 shows two sample frames from the training video sequences. Compared with the simulated environment, the images collected on real roads are more complicated: there are other moving vehicles on the road, the lane markings are sometimes unclear, and the lighting conditions change a lot. All these factors make it more difficult for our model to learn to map images to steering angles. Since we have ten times more training data and a much more complicated driving scene than in the simulated case, we use a more powerful feature extractor, VGG16 [95], as the backbone CNN.


Figure 5.11: Sample frames from the training video sequences.

Similar to the simulation study, the training data is unbalanced and too many steering angles concentrate around zero. To balance the real road training data, we augment it in the following ways:

• Brightness augmentation: Because the training data were collected on a cloudy afternoon, when the lighting conditions were not good, we change the brightness of the images to simulate different lighting conditions. Specifically, we convert the image to the HSV color model, scale the V channel up or down, and convert the image back to RGB.

• Horizontal flipping: We horizontally flip the images to account for driving in the opposite direction. This also increases the amount of training data.

• Image shifts: The images are randomly shifted horizontally to simulate the effect of the car being at different positions on the road, and an offset corresponding to the shift is added to the steering angle. We also shift the images vertically by a random amount to simulate the effect of driving up or down a slope.


The brightness augmentation does not contribute to balancing the data, but it helps the architectures perform better quantitatively on the test datasets. Random image shifts and flipping in fact enlarge the training datasets and make them more balanced.
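A minimal OpenCV sketch of the brightness augmentation described above; the scaling range is an illustrative assumption.

    import cv2
    import numpy as np

    def random_brightness(image_rgb, low=0.4, high=1.3):
        """Scale the V channel in HSV space to simulate different lighting conditions."""
        hsv = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2HSV).astype(np.float32)
        hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(low, high), 0, 255)
        return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)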

Architecture   Base CNNs   L1      L2      L2 Fb (GT)   L2 Fb (PRE)   L2 Fb Skip (GT)   L2 Fb Skip (PRE)
MAE (radian)   0.140       0.085   0.081   0.069        0.073         0.066             0.069
Std (radian)   0.193       0.121   0.113   0.097        0.102         0.092             0.097

Table 5.2: Steering angle Mean Absolute Error (MAE) and Standard Deviation (Std), in radians, of the different architectures on the real road test datasets. "GT" means that the ground truth values are used as feedback to the next time step; "PRE" means that the network's own predictions are used as feedback. The L1, L2, L2 Fb, and L2 Fb Skip architectures are illustrated in figures 5.1-5.4.

We follow the same training procedure as for the simulated data. After training, we evaluated the pre-defined network architectures on the test datasets. As in the simulation study, we summarize the results in table 5.2, which lists the mean absolute error and standard deviation of the steering angle. From the results, we can conclude that:

• In general, considering contextual information by using LSTM units improves the prediction accuracy: all CNN and LSTM based architectures outperform the CNN-only architecture. However, the architecture L2 has larger MAE and Std values than L1, which means that simply stacking one LSTM layer on top of another does not improve the prediction accuracy.

• The feedback mechanism reduces the prediction error, and ground truth feedback during training is better than prediction feedback.

• The skip mechanism also helps. It is designed to separate the responsibilities of the first and second LSTM layers. This result is in accordance with the conclusion of the image captioning task in [116].

In summary, the architecture “L2 Fb Skip GT” displayed the best performance.

To take a closer look at the predicted steering angles, we plotted figure 5.12, which compares the ground truth steering angles and the steering angles predicted by the "L2 Fb Skip GT" architecture on the test datasets. In general, they match very well, but show larger differences when the ground truth steering angles are large. In other words, the predicted steering angles overshoot the ground truth when its magnitude is large. In addition to the figure, we also generated a video that compares the steering angle predictions while driving.


Figure 5.12: The predicted steering angles of architecture “L2 Fb Skip GT” on the test datasets.

5.5 Summary

In this chapter, we proposed several novel end-to-end vehicle control architectures. End-to-end vehicle control avoids the perception and path planning modules and directly maps the image pixels to a control command, such as the steering angle, using a deep neural network. Previous end-to-end control architectures focus on utilizing different Convolutional Neural Networks and still images. However, the camera data collected from an autonomous vehicle are video sequences, not discrete images, so it is not wise to neglect the potential of the contextual information hidden in the video sequences. In fact, this contextual information is frequently used in other computer vision tasks such as image captioning.

Inspired by this, we propose to incorporate LSTM units into the end-to-end vehicle control architecture to take advantage of the contextual information in the video sequences. We explored different ways to combine LSTM units with our end-to-end control architecture and evaluated them on both simulated and real road datasets. By comparing the different end-to-end control architectures quantitatively, we found that two LSTM layers with skip and feedback mechanisms achieve the best performance.

Due to the simplicity of the simulated driving scene, adding LSTM to the architecture did not produce any obviously better performance there. However, we found that adding LSTM units generally improves the steering angle prediction accuracy for complicated real-world driving scenarios. Therefore, contextual information should be considered in future autonomous driving applications involving video sequence data.


Chapter 6: Summary and Future Work

In this chapter, we summarize the main findings and contributions of the dissertation, and discuss suggestions for future work.

6.1 Summary and Contributions

In this dissertation, several aspects of autonomous driving have been studied. We focused on extracting meaningful information from camera data using various deep learning algorithms. The camera sensor is used because it can capture abundant information about the traffic environment, including color, texture, contrast, appearance, and optical characters. However, sensor fusion is usually preferred because the camera can be influenced by the environment, weather, and lighting conditions. Notably, in the last few years, deep learning techniques have outperformed traditional computer vision algorithms on almost every visual task. In addition, the application of deep learning was also investigated here as an optional vehicle control approach. Witnessing the rapid development and improvement of deep learning techniques, we are confident that they will be more heavily involved in autonomous driving research and practice, especially when image data are used to sense the environment.


Perception is the most important module of an autonomous driving system. It is the foundation and a necessary prerequisite for the subsequent modules. The perception module is in charge of understanding the traffic scene, and is composed of mapping/localization, object detection, object tracking, semantic segmentation, scene understanding, and so on. In chapters 3 and 4, we proposed several deep learning architectures to detect objects from both still images and video sequences. For still images, a one-stage end-to-end object detection pipeline based on CNNs was proposed. Through experiments with different feature extractors and different combinations of feature maps on the KITTI datasets, we found the extractor that achieved the best balance of accuracy and speed. For object detection in video sequences, we insert LSTM units after the feature extractor to take advantage of the contextual information hidden in the video sequences. Our proposed architecture is superior to previously published research, because we enable the LSTM units to learn the contextual information from both feature vectors and detected bounding boxes. We achieved better and smoother video detection results compared to methods, such as R-CNN [55] and Fast R-CNN [56], that do not consider the contextual information in video sequences.

It turns out that the power of deep learning goes far beyond improving the perception ability of autonomous driving. In chapter 5, we designed an end-to-end control algorithm that takes video sequences as input and directly outputs the control commands. We mainly focus on supervised learning methods, i.e., convolutional neural networks and recurrent neural networks, and train them using simulated data and real road data. As in the video object detection task, the recurrent neural network (LSTM) is designed to take advantage of the temporal information. The primary contribution of this research lies in our innovations in the way the LSTM units cooperate with each other, the feedback mechanism, and the skip mechanism in our proposed control architecture.

In summary, this work demonstrated the power of deep learning techniques applied to autonomous driving research problems of image and video object detection, object tracking, and end-to-end control. With enough annotated data and computation resources, deep learning can not only improve the perception ability of an autonomous vehicle, but can also be deeply involved in the whole autonomous driving system.

6.2 Future Work

Overall, the algorithms proposed in this dissertation improve the perceptual performance of an autonomous vehicle in the areas of object detection, object tracking, and end-to-end control. An important but not well-studied topic related to this dissertation is the determination of boundary conditions: for example, what object detection accuracy is needed to guarantee that the path planning and control modules can maneuver the vehicle without any possibility of a crash or accident. The boundary condition for every module of the autonomous driving system needs to be studied further through extensive testing and experimentation under various road conditions.

Each topic discussed in this dissertation can lead to different future research directions.


First of all, we evaluated our still-image object detection architecture on only a few pre-trained feature extractors, and we fine-tuned only the last several layers of each extractor. If more training data is available, fine-tuning more layers of the feature extractor should improve the detection accuracy. In addition, more feature map combinations from different feature extractors should be explored for object detection.

Furthermore, we integrated various ImageNet pre-trained feature extractors, such as VGG16 [95] and SqueezeNet [110], into our object detection architectures, which limits the feature maps we can use. In future work, it would be more flexible to design our own feature extractor and train it on large-scale public datasets.
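
As a concrete example of the fine-tuning strategy discussed above, the following sketch loads an ImageNet pre-trained VGG16 [95] backbone with TensorFlow/Keras, freezes the early layers, and leaves only the last few layers trainable; the input resolution and the number of unfrozen layers are assumptions for illustration, not the settings used in our experiments.

# Minimal sketch: freeze a pre-trained backbone and fine-tune only its last layers.
import tensorflow as tf

backbone = tf.keras.applications.VGG16(include_top=False,
                                       weights="imagenet",
                                       input_shape=(300, 300, 3))

# Freeze everything, then unfreeze only the last few layers for fine-tuning.
for layer in backbone.layers:
    layer.trainable = False
for layer in backbone.layers[-4:]:      # "-4" is an illustrative choice
    layer.trainable = True

trainable = [l.name for l in backbone.layers if l.trainable]
print("Fine-tuned layers:", trainable)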

For video object detection, our ultimate goal is to use the detection results for multiple object tracking. Compared with detection, object tracking can provide more useful information to the path planning and control modules, such as an object's trajectory, speed, and movement direction. Object tracking can also predict the future states of the objects of interest. Current multiple object tracking systems compare the current-frame detections with the historical detections and associate them by comparing distance, color, and other features. A promising research direction is to perform this association with deep learning techniques [130]. In that work, objects are detected and associated using multiple learned cues over a temporal window, which outperforms traditional data association methods on multiple benchmark datasets.
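
For reference, the classical association step mentioned above can be sketched as follows: detections are matched to existing tracks by intersection-over-union (IoU) using the Hungarian algorithm. This is only a baseline illustration of the traditional approach that learned, multi-cue association methods such as [130] aim to improve on; the box coordinates below are illustrative.

# Minimal sketch: IoU-based data association between tracks and detections.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

tracks = np.array([[10, 10, 50, 50], [100, 100, 160, 180]], dtype=float)
detections = np.array([[12, 11, 52, 49], [98, 102, 158, 178]], dtype=float)

# Cost matrix = 1 - IoU; the Hungarian algorithm finds the minimum-cost matching.
cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
row_idx, col_idx = linear_sum_assignment(cost)
for t, d in zip(row_idx, col_idx):
    print(f"track {t} -> detection {d} (IoU = {1.0 - cost[t, d]:.2f})")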

Finally, our end-to-end control architecture is based on supervised deep learning models, namely the convolutional neural network and the recurrent neural network. However, so-called corner cases exist because no training data set can cover all driving scenarios. In theory, an autonomous vehicle may crash when facing a situation that it cannot infer from its previously learned experience. An ideal end-to-end controller also has to be able to learn from unsupervised data online, meaning that it can accumulate knowledge from unlabeled data; this is how humans learn to drive. Therefore, unsupervised deep learning models need to be studied further for application to the end-to-end control system. For example, the authors of [131] successfully trained a simulated agent to generate collision-free motions from unlabeled data with Deep Q-Networks.
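
For illustration, the following is a minimal sketch of the kind of DQN-style update used in [131]: a Q-network maps a perception feature vector to discrete control actions and is trained from reward signals rather than labeled steering angles. The feature size, action set, and toy transition are assumptions, and practical implementations add experience replay and a target network.

# Minimal sketch: one Bellman update of a Q-network for discrete control actions.
import numpy as np
import tensorflow as tf

num_features = 64        # assumed size of a perception feature vector
num_actions = 5          # e.g., discretized steering commands (assumed)
gamma = 0.99             # discount factor

q_net = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_actions),   # one Q-value per action
])
q_net.compile(optimizer="adam", loss="mse")

# One toy transition (state, action, reward, next_state); in practice these come
# from the agent's own driving experience, not from human labels.
state = np.random.rand(1, num_features).astype("float32")
next_state = np.random.rand(1, num_features).astype("float32")
action, reward, done = 2, 1.0, False

# Bellman target: r + gamma * max_a' Q(s', a'), applied to the taken action only.
target_q = q_net.predict(state, verbose=0)
next_q = q_net.predict(next_state, verbose=0)
target_q[0, action] = reward + (0.0 if done else gamma * np.max(next_q))

q_net.train_on_batch(state, target_q)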


References

[1] Association for Safe International Road Travel. Road crash statistics. In http://asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics.

[2] American Driving Survey 2014–2015. http://publicaffairsresources.aaa.biz/wp-content/uploads/2016/09/AmericanDrivingSurvey2015.pdf

[3] Chester, M., Fraser, A., Matute, J., Flower, C., & Pendyala, R. (2015). Parking infrastructure: A constraint on or opportunity for urban redevelopment? A study of Los Angeles County parking supply and growth. Journal of the American Planning Association, 81(4), 268-286.

[4] National Highway Traffic Safety Administration. Critical reasons for crashes investigated in the National Motor Vehicle Crash Causation Survey, 2015. USA.

[5] 'Phantom Auto' will tour city. The Milwaukee Sentinel. Google News Archive. 8 December 1926. Retrieved 23 July 2013.

[6] Thorpe, C., Hebert, M. H., Kanade, T., & Shafer, S. A. (1988). Vision and navigation for the Carnegie-Mellon Navlab. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(3), 362-373.

[7] Dickmanns, E. D., Behringer, R., Dickmanns, D., Hildebrandt, T., Maurer, M., Thomanek, F., & Schiehlen, J. (1994, October). The seeing passenger car 'VaMoRs-P'. In Intelligent Vehicles' 94 Symposium, Proceedings of the (pp. 68-73). IEEE.

[8] Varaiya, P. (1993). Smart cars on smart roads: problems of control. IEEE Transactions on automatic control, 38(2), 195-207.

[9] Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., ... & Lau, K. (2006). Stanley: The robot that won the DARPA Grand Challenge. Journal of field Robotics, 23(9), 661-692.

[10] Urmson, C., Anhalt, J., Bagnell, D., Baker, C., Bittner, R., Clark, M. N., ... & Gittleman, M. (2008). Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8), 425-466.

[11] On-road Automated Vehicle Standards Committee. SAE J3016: Taxonomy and Definitions for Terms Related to On-Road Motor Vehicle Automated Driving Systems. SAE International.

[12] Li, B., Zhang, T., & Xia, T. (2016). Vehicle detection from 3D lidar using fully convolutional network. arXiv preprint arXiv:1608.07916.


[13] Engelcke, M., et al. (2016). Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks. arXiv preprint arXiv:1609.06666.

[14] Li, B. (2016). 3D fully convolutional network for vehicle detection in point cloud. arXiv preprint arXiv:1611.08069.

[15] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

[16] Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2), 157-166.

[17] Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. ICML (3), 28, 1310-1318.

[18] Gers, F. A., & Schmidhuber, J. (2000). Recurrent nets that time and count. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on (Vol. 3, pp. 189-194). IEEE.

[19] Koutnik, J., Greff, K., Gomez, F., & Schmidhuber, J. (2014). A clockwork rnn. arXiv preprint arXiv:1402.3511.

[20] Yao, K., Cohn, T., Vylomova, K., Duh, K., & Dyer, C. (2015). Depth-Gated Recurrent Neural Networks. arXiv preprint arXiv:1508.03790.

[21] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

[22] Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2016). LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems.

[23] Levinson, J., Montemerlo, M., & Thrun, S. (2007, June). Map-Based Precision Vehicle Localization in Urban Environments. In Robotics: Science and Systems (Vol. 4, p. 1).

[24] Levinson, J., & Thrun, S. (2010, May). Robust vehicle localization in urban environments using probabilistic maps. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 4372-4378). IEEE.

[25] Wolcott, R. W., & Eustice, R. M. (2015, May). Fast LIDAR localization using multiresolution Gaussian mixture maps. In Robotics and Automation (ICRA), 2015 IEEE International Conference on (pp. 2814-2821). IEEE.

[26] Grisetti, G., Kümmerle, R., Stachniss, C., Frese, U., & Hertzberg, C. (2010, May). Hierarchical optimization on manifolds for online 2D and 3D mapping. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 273-278). IEEE.

[27] Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. J., & Dellaert, F. (2012). iSAM2: Incremental smoothing and mapping using the Bayes tree. The International Journal of Robotics Research, 31(2), 216-235.

[28] Nieto, J., Bailey, T., & Nebot, E. (2006). Scan-SLAM: Combining EKF-SLAM and scan correlation. In Field and service robotics (pp. 167-178). Springer Berlin/Heidelberg.


[29] Doucet, A., De Freitas, N., Murphy, K., & Russell, S. (2000, June). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence (pp. 176-183). Morgan Kaufmann Publishers Inc..

[30] Goldberg, A. V., & Harrelson, C. (2005, January). Computing the shortest path: A search meets graph theory. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 156-165). Society for Industrial and Applied Mathematics.

[31] Geisberger, R., Sanders, P., Schultes, D., & Vetter, C. (2012). Exact routing in large road networks using contraction hierarchies. Transportation Science, 46(3), 388-404.

[32] Brechtel, S., Gindele, T., & Dillmann, R. (2011, October). Probabilistic MDP-behavior planning for cars. In Intelligent Transportation Systems (ITSC), 2011 14th International IEEE Conference on (pp. 1537-1542). IEEE.

[33] Ulbrich, S., & Maurer, M. (2013, October). Probabilistic online POMDP decision making for lane changes in fully automated driving. In Intelligent Transportation Systems-(ITSC), 2013 16th International IEEE Conference on (pp. 2063-2067). IEEE.

[34] Brechtel, S., Gindele, T., & Dillmann, R. (2014, October). Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs. In Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on (pp. 392-399). IEEE.

[35] Dolgov, D., Thrun, S., Montemerlo, M., & Diebel, J. (2010). Path planning for autonomous vehicles in unknown semi-structured environments. The International Journal of Robotics Research, 29(5), 485-501.

[36] Le-Anh, T., & De Koster, M. B. M. (2006). A review of design and control of automated guided vehicle systems. European Journal of Operational Research, 171(1), 1-23.

[37] Cheein, F., De La Cruz, C., Bastos, T., & Carelli, R. (2010). Slam-based cross-a-door solution approach for a robotic wheelchair. International Journal of Advanced Robotic Systems, 7(2), 155-164.

[38] Lenain, R., Thuilot, B., Cariou, C., & Martinet, P. (2005, April). Model predictive control for vehicle guidance in presence of sliding: application to farm vehicles path tracking. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on (pp. 885-890). IEEE.

[39] Choomuang, R., & Afzulpurkar, N. (2005). Hybrid Kalman filter/fuzzy logic based position control of autonomous mobile robot. International Journal of Advanced Robotic Systems, 2(3), 20.

[40] Wang, W., Nonami, K., & Ohira, Y. (2008). Model reference sliding mode control of small helicopter XRB based on vision. International Journal of Advanced Robotic Systems, 5(3), 26.

[41] Gunter, L., & Zhu, J. (2005). Computing the solution path for the regularized support vector regression. Ann Arbor, 1001, 48109.

[42] Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., ... & Zhang, X. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.

[43] Le Cun, Y., Bottou, L., & Bengio, Y. (1997, April). Reading checks with multilayer graph transformer networks. In Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on (Vol. 1, pp. 151-154). IEEE.

[44] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

[45] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

[46] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).

[47] Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02), 107-116.

[48] Clevert, D. A., Unterthiner, T., & Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289.

[49] Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013, June). Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML (Vol. 30, No. 1).

[50] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026-1034).

[51] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. arXiv preprint arXiv:1302.4389.

[52] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.

[53] Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International journal of computer vision, 104(2), 154-171.

[54] Dollár, P., Appel, R., Belongie, S., & Perona, P. (2014). Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1532-1545.

[55] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

[56] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2016). Region-based convolutional networks for accurate object detection and segmentation. IEEE transactions on pattern analysis and machine intelligence, 38(1), 142-158.

[57] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).


[58] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21-37). Springer International Publishing.

[59] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).

[60] Giebel, J., Gavrila, D., & Schnörr, C. (2004). A bayesian framework for multi-cue 3d object tracking. Computer Vision-ECCV 2004, 241-252.

[61] Breitenstein, M. D., Reichlin, F., Leibe, B., Koller-Meier, E., & Van Gool, L. (2011). Online multiperson tracking-by-detection from a single, uncalibrated camera. IEEE transactions on pattern analysis and machine intelligence, 33(9), 1820-1833.

[62] Zhang, L., Li, Y., & Nevatia, R. (2008, June). Global data association for multi-object tracking using network flows. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8). IEEE.

[63] Andriyenko, A., & Schindler, K. (2011, June). Multi-target tracking by continuous energy minimization. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1265-1272). IEEE.

[64] Kahou, S. E., Michalski, V., & Memisevic, R. (2015). Ratm: Recurrent attentive tracking model. arXiv preprint arXiv:1510.08660.

[65] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., ... & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3213-3223).

[66] He, X., Zemel, R., & Ray, D. (2006). Learning and incorporating top-down cues in image segmentation. Computer Vision–ECCV 2006, 338-351.

[67] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).

[68] He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. arXiv preprint arXiv:1703.06870.

[69] Ess, A., Müller, T., Grabner, H., & Van Gool, L. J. (2009, September). Segmentation-Based Urban Traffic Scene Understanding. In BMVC (Vol. 1, p. 2).

[70] Geiger, A., Lauer, M., Wojek, C., Stiller, C., & Urtasun, R. (2014). 3d traffic scene understanding from movable platforms. IEEE transactions on pattern analysis and machine intelligence, 36(5), 1012-1025.

[71] Seff, A., & Xiao, J. (2016). Learning from Maps: Visual Common Sense for Autonomous Driving. arXiv preprint arXiv:1611.08583.

[72] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2012). Scene parsing with multiscale feature learning, purity trees, and optimal covers. arXiv preprint arXiv:1202.2160.


[73] Toshev, A., & Szegedy, C. (2014). Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1653-1660).

[74] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).

[75] Zbontar, J., & LeCun, Y. (2016). Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 17(1-32), 2.

[76] Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on (Vol. 2, pp. 1150-1157). IEEE.

[77] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.

[78] Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3128-3137).

[79] Fang, H., Gupta, S., Iandola, F., Srivastava, R. K., Deng, L., Dollár, P., ... & Lawrence Zitnick, C. (2015). From captions to visual concepts and back. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1473-1482).

[80] Kang, K., Li, H., Xiao, T., Ouyang, W., Yan, J., Liu, X., & Wang, X. (2017). Object detection in videos with tubelet proposal networks. arXiv preprint arXiv:1702.06355.

[81] Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). Vqa: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2425-2433).

[82] Ferryman, J., & Shahrokni, A. (2009, December). Pets2009: Dataset and challenge. In Performance Evaluation of Tracking and Surveillance (PETS-Winter), 2009 Twelfth IEEE International Workshop on (pp. 1-6). IEEE.

[83] Milan, A., Leal-Taixé, L., Reid, I., Roth, S., & Schindler, K. (2016). Mot16: A benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831.

[84] Geiger, A., Lenz, P., & Urtasun, R. (2012, June). Are we ready for autonomous driving? the kitti vision benchmark suite. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 3354-3361). IEEE.

[85] Dollár, P., Wojek, C., Schiele, B., & Perona, P. (2009, June). Pedestrian detection: A benchmark. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 304-311). IEEE.

[86] Karpathy, A. (2016). Cs231n: Convolutional neural networks for visual recognition. Online Course.

[87] De Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A tutorial on the cross-entropy method. Annals of operations research, 134(1), 19-67.


[88] Glorot, X., & Bengio, Y. (2010, May). Understanding the difficulty of training deep feedforward neural networks. In Aistats (Vol. 9, pp. 249-256).

[89] Mishkin, D., & Matas, J. (2015). All you need is a good init. arXiv preprint arXiv:1511.06422.

[90] Sussillo, D., & Abbott, L. F. (2014). Random walk initialization for training very deep feedforward networks. arXiv preprint arXiv:1412.6558.

[91] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

[92] Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1-17.

[93] Sutskever, I., Martens, J., Dahl, G. E., & Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. ICML (3), 28, 1139-1147.

[94] Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[95] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[96] Ning, G., Zhang, Z., Huang, C., He, Z., Ren, X., & Wang, H. (2016). Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking. arXiv preprint arXiv:1607.05781.

[97] Gan, Q., Guo, Q., Zhang, Z., & Cho, K. (2015). First step toward model-free, anonymous object tracking with recurrent neural networks. arXiv preprint arXiv:1511.06425.

[98] Broggi, A., Bertozzi, M., Fascioli, A., & Sechi, M. (2000). Shape-based pedestrian detection. In Intelligent Vehicles Symposium, 2000. IV 2000. Proceedings of the IEEE (pp. 215-220). IEEE.

[99] Wang, X., Yang, M., Zhu, S., & Lin, Y. (2013). Regionlets for generic object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 17-24).

[100] Viola, P., Jones, M. J., & Snow, D. (2003, October). Detecting pedestrians using patterns of motion and appearance. In Proceedings of the Ninth IEEE International Conference on Computer Vision (p. 734). IEEE.

[101] Suykens, J. A., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural processing letters, 9(3), 293-300.

[102] Neubeck, A., & Van Gool, L. (2006, August). Efficient non-maximum suppression. In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on (Vol. 3, pp. 850-855). IEEE.

[103] Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (Vol. 1, pp. 886-893). IEEE.

[104] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.


[105] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE transactions on pattern analysis and machine intelligence, 32(9), 1627-1645.

[106] Erhan, D., Szegedy, C., Toshev, A., & Anguelov, D. (2014). Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2147-2154).

[107] Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. arXiv preprint arXiv:1605.06409.

[108] Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229.

[109] Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. arXiv preprint arXiv:1612.08242.

[110] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.

[111] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[112] Arthur, D., & Vassilvitskii, S. (2007, January). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027-1035). Society for Industrial and Applied Mathematics.

[113] Pomerleau, D. A. (1989). Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (pp. 305-313).

[114] Net-Scale Technologies, Inc. Autonomous off-road vehicle control using end-to-end learning, July 2004. Final technical report. URL: http://net-scale.com/doc/net-scale-dave-report.pdf.

[115] Chen, C., Seff, A., Kornhauser, A., & Xiao, J. (2015). Deepdriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2722-2730).

[116] Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2625-2634).

[117] Xu, H., Gao, Y., Yu, F., & Darrell, T. (2016). End-to-end learning of driving models from large-scale video datasets. arXiv preprint arXiv:1612.01079.

[118] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.


[119] Han, W., Khorrami, P., Paine, T. L., Ramachandran, P., Babaeizadeh, M., Shi, H., ... & Huang, T. S. (2016). Seq-nms for video object detection. arXiv preprint arXiv:1602.08465.

[119] Zhu, X., Wang, Y., Dai, J., Yuan, L., & Wei, Y. (2017). Flow-Guided Feature Aggregation for Video Object Detection. arXiv preprint arXiv:1703.10025.

[120] Kang, K., Li, H., Yan, J., Zeng, X., Yang, B., Xiao, T., ... & Ouyang, W. (2016). T-cnn: Tubelets with convolutional neural networks for object detection from videos. arXiv preprint arXiv:1604.02532.

[121] Fragkiadaki, K., Levine, S., Felsen, P., & Malik, J. (2015). Recurrent network models for human dynamics. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4346-4354).

[122] Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., & Toderici, G. (2015). Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4694-4702).

[123] Xiang, Y., Alahi, A., & Savarese, S. (2015). Learning to track: Online multi-object tracking by decision making. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4705-4713).

[124] Hong, S., You, T., Kwak, S., & Han, B. (2015, June). Online tracking by learning discriminative saliency map with convolutional neural network. In International Conference on Machine Learning (pp. 597-606).

[125] Forsyth, D., & Ponce, J. (2011). Computer vision: a modern approach. Upper Saddle River, NJ; London: Prentice Hall.

[126] Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2), 303-338.

[127] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

[128] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.

[129] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[130] Sadeghian, A., Alahi, A., & Savarese, S. (2017). Tracking the untrackable: Learning to track multiple cues with long-term dependencies. arXiv preprint arXiv:1701.01909.

[131] Sharifzadeh, S., Chiotellis, I., Triebel, R., & Cremers, D. (2016). Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks. arXiv preprint arXiv:1612.03653.


[132] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248-255). IEEE.

[133] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014, September). Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[134] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems(pp. 3104-3112).

[135] Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. Acm computing surveys (CSUR), 38(4), 13.

[136] Yi, Y., & Grejner-Brzezinska, D. A. (2001, June). Tightly-coupled GPS/INS integration using unscented Kalman filter and particle filter. In Proceedings of the 19th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS 2006) (pp. 2182-2191).

[137] Grejner-Brzezinska, D. A., Toth, C. K., Sun, H., Wang, X., & Rizos, C. (2011). A robust solution to high-accuracy geolocation: Quadruple integration of GPS, IMU, pseudolite, and terrestrial laser scanning. IEEE Transactions on instrumentation and measurement, 60(11), 3694-3708.

[138] Toth, C. K., Zaletnyik, P., Laky, S., & Grejner-Brzezińska, D. (2011). The potential of full-waveform LiDAR in mobile mapping applications. Archiwum Fotogrametrii, Kartografii i Teledetekcji, 22.

[139] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Ghemawat, S. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

[140] The AI Car Computer for Autonomous Driving. Retrieved from http://www.nvidia.com/object/drive-px.html
