Image-Based Perceptual Learning Algorithm for Autonomous Driving

Image-Based Perceptual Learning Algorithm for Autonomous Driving

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Yunming Shao, M.S.

Graduate Program in Geodetic Science

The Ohio State University

2017

Dissertation Committee:

Dr. Dorota A. Grejner-Brzezinska, Advisor
Dr. Charles Toth, Co-advisor
Dr. Alper Yilmaz
Dr. Rongjun Qin

Copyrighted by Yunming Shao 2017


Abstract

Autonomous driving is widely acknowledged as a promising solution to modern traffic problems such as congestion and accidents. It is a complicated system comprising sub-modules such as perception, path planning, and control. During the last few years, research and experimentation have shifted from the academic to the industrial sector. Sensor configurations differ among manufacturers, but an imaging component is used by every company. In this dissertation, we focus on innovating and improving camera perception algorithms using deep learning. In addition, we propose an end-to-end control approach that maps image pixels directly to control commands. This dissertation contributes to the development of autonomous driving in the three following aspects:

Firstly, a novel architecture for detecting dynamic objects in still images is proposed. Our detection architecture utilizes a Convolutional Neural Network (CNN) trained end to end. In our model, we consider multiple requirements for dynamic object detection in autonomous driving in addition to accuracy, such as inference speed, model size, and energy consumption, which are crucial to deploying a detector in a real autonomous vehicle. We determine our final architecture by exploring different pre-trained feature extractors and different combinations of multi-scale feature layers. Our architecture is intensively tested on the KITTI vision benchmark datasets [84] and achieves accuracy comparable to state-of-the-art approaches in real time.

Secondly, to take advantage of the contextual information in video sequences, we develop a video object detection framework based on a CNN and Long Short-Term Memory (LSTM). LSTM is a special kind of Recurrent Neural Network (RNN). The architecture we propose in Chapter 3 acts as the still-image detector and feature extractor, and the LSTM is responsible for exploiting the temporal information in the video stream. The input to the LSTM can be the visual features of the still-image detector, the detection results of an individual frame, or both. We found that feeding the LSTM a combination of suitable visual features and detection results achieves better performance than using either one alone.

Finally, we design an end-to-end control algorithm that takes video sequences as input and directly outputs control commands. We focus on supervised learning methods, i.e., convolutional and recurrent neural networks, and train them using simulated data and real road data. As in the video object detection task, the recurrent neural network is designed to take advantage of the temporal information. Through experiments, we evaluate several proposed network architectures and recommend the one with the best performance.
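As a rough illustration of the feature-and-detection fusion summarized in the second contribution, the sketch below shows one way per-frame visual features and per-frame detection results could be concatenated and fed to an LSTM. This is a minimal PyTorch-style sketch under assumed names and dimensions (TemporalDetectionHead, feat_dim, det_dim, and the 6-value detection encoding are all hypothetical), not the dissertation's actual implementation, which is developed in the body chapters.

```python
# Hypothetical sketch (not the dissertation's code): fuse per-frame CNN
# features with per-frame detection outputs and model temporal context
# with an LSTM, as outlined in the abstract's second contribution.
import torch
import torch.nn as nn

class TemporalDetectionHead(nn.Module):
    def __init__(self, feat_dim=256, det_dim=6, hidden_dim=128, out_dim=6):
        super().__init__()
        # The LSTM consumes the concatenation of visual features and
        # detection results, so both cues inform the temporal state.
        self.lstm = nn.LSTM(feat_dim + det_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)  # refined per-frame output

    def forward(self, feats, dets):
        # feats: (batch, frames, feat_dim) features from the still-image detector
        # dets:  (batch, frames, det_dim) per-frame detections (e.g., box + score)
        x = torch.cat([feats, dets], dim=-1)  # use both cues, not just one
        h, _ = self.lstm(x)                   # temporal aggregation across frames
        return self.head(h)

# Toy usage with stand-in tensors: 2 clips of 10 frames each.
model = TemporalDetectionHead()
refined = model(torch.randn(2, 10, 256), torch.randn(2, 10, 6))
print(refined.shape)  # torch.Size([2, 10, 6])
```

The concatenation step reflects the abstract's finding that combining visual features with detection results outperforms using either input alone.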
Acknowledgment

First and foremost, I would like to thank my adviser, Dr. Dorota Grejner-Brzezinska, for her valuable mentorship, patience, and encouragement during my Ph.D. study. Her continuous support as I repeatedly shifted research focus and explored new ideas is greatly appreciated. I would also like to thank my co-adviser, Dr. Charles Toth, for guiding my study and for his insight into the research topics. In addition, I learned a great deal by attending the weekly group meetings he hosted, and I appreciate his patient review of and constructive feedback on my dissertation. I would also like to thank Dr. Alper Yilmaz and Dr. Rongjun Qin for serving on my dissertation defense committee; both provided precious comments and feedback that helped improve the dissertation. I would further like to thank my fellow students and colleagues for maintaining a professional research environment and for many helpful discussions. Finally, I would like to thank my family, especially my wife, Yan Gao. They have always encouraged and supported me, and I would not have been able to complete this work without their love and encouragement.

Vita

2010 ...................... B.S. Mapping Engineering, China University of Petroleum
2013 ...................... M.S. Mapping Engineering, Chinese Academy of Sciences
2013 to present ..... Graduate Student, School of Earth Sciences, The Ohio State University

Fields of Study

Major Field: Geodetic Science

Table of Contents

Abstract
Acknowledgment
Vita
Fields of Study
Table of Contents
List of Acronyms
List of Tables
List of Figures

Chapter 1: Introduction
  1.1 Motivation
  1.2 History and Present
  1.3 System Architecture
    1.3.1 Sensor Input
    1.3.2 Perception
    1.3.3 Planning
    1.3.4 Control
  1.4 Contributions
  1.5 Dissertation Organization

Chapter 2: Foundation and Literature Review
  2.1 Computer Vision and Deep Learning
    2.1.1 Computer Vision
    2.1.2 Deep Learning
    2.1.3 Datasets
  2.2 Neural Network
    2.2.1 Neurons and Neural Network Architecture
    2.2.2 Activation Functions
    2.2.3 Training a Neural Network
  2.3 Convolutional Neural Network
    2.3.1 Architecture
    2.3.2 Convolution
    2.3.3 Pooling
    2.3.4 Dropout
    2.3.5 Transfer Learning
  2.4 Recurrent Neural Network
    2.4.1 Architecture
    2.4.2 Training RNNs

Chapter 3: Dynamic Object Detection on Road
  3.1 Related Work
    3.1.1 Traditional Approaches
    3.1.2 CNNs for Object Detection
