Towards the Design of Neural Network Framework for Object Recognition and Target Region Refining for Smart Transportation Systems
By
Yiheng Zhao
Thesis submitted in partial fulfillment of the requirements for the Master of Computer Science degree in Electrical and Computer Engineering
School of Electrical Engineering and Computer Science Faculty of Engineering University of Ottawa
© Yiheng Zhao, Ottawa, Canada, 2018

Abstract

Object recognition systems have a significant influence on modern life. Face, iris and fingerprint recognition applications are commonly applied for security purposes; ASR (Automatic Speech Recognition) is commonly implemented for subtitle generation on videos and audio, such as on YouTube; HWR (Handwriting Recognition) systems are essential at post offices for cheque and postcode detection; ADAS (Advanced Driver Assistance Systems) are widely applied to improve the safety of drivers, passengers and pedestrians. Object recognition techniques are crucial and valuable for academia, commerce and industry.
Accuracy and efficiency are two important standards for evaluating the performance of recognition techniques. Accuracy covers how many objects can be detected in a real scene and how many of them can be correctly classified. Efficiency means the speed of system training and sample testing. Traditional object detection methods, such as a HOG (Histogram of Oriented Gradient) feature detector combined with an SVM (Support Vector Machine) classifier, cannot compete with neural network frameworks in either efficiency or accuracy. Since neural networks have better performance and potential for improvement, it is worthwhile to gain insight into this field to design more advanced recognition systems.
In this thesis, we list and analyze sophisticated techniques and frameworks for object recognition. To understand the mathematical theory behind network design, state-of-the-art networks from the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) are studied. Based on this analysis and the concept of edge detectors, a simple CNN (Convolutional Neural Network) structure is designed as a trial to explore the possibility of utilizing a network of high width and low depth for region proposal selection, object recognition and target region refining. We adopt Le-Net as the template, taking advantage of the multi-kernel design of GoogLe-Net.
We conducted experiments on vehicle and face samples from the ImageNet dataset to test the performance of this simple structure. The accuracy for single object detection is 81% on average, and for plural object detection 73.5%. We refined the network in many aspects to reach a final accuracy of 95% for single object detection and 89% for plural object detection.
Acknowledgements

Firstly, I thank Professor Azzedine Boukerche, who gave me the chance to pursue a master's degree at the University of Ottawa and to do research in the Paradise Lab. I am also grateful for his kindness in providing me financial support, with which I could purchase the books, documents, software and equipment necessary for my thesis experiments. During my studies, he taught us important academic and social networking skills, and he led us on visits to many famous companies to gain insight into industry.

Secondly, I thank the University of Ottawa, which offered me the chance to learn computer science in depth, provided a great environment for study and good academic resources in its library, and gave me health benefits that I have enjoyed.

Thirdly, I appreciate my colleagues in the Paradise Lab. They always accompanied and supported me, and they inspired me while I was writing this thesis.

Finally, I am grateful to have been born into such a happy family. My parents gave me unconditional help, care and financial support, and did everything to ensure that I have a great life. I am proud of them and will do my best to make them proud of me.
Table of Contents

Abstract
Acknowledgements
Table of Contents
Nomenclature
1 Introduction
   1.1 Background
   1.2 Motivation and Contribution
   1.3 Thesis Outline
2 Theories and Literature Review
   2.1 Related Knowledge
      2.1.1 Convolution
      2.1.2 Artificial Neural Network
      2.1.3 Over-fitting and Under-fitting Problem
      2.1.4 Techniques for Network Training
      2.1.5 Techniques for Plural Object Detection
   2.2 State-of-the-art CNNs and Frameworks
      2.2.1 Le-Net
      2.2.2 Alex-Net
      2.2.3 ZF-Net
      2.2.4 VGG-Net
      2.2.5 GoogLe-Net
      2.2.6 Microsoft Res-Net
      2.2.7 Crafting GBD-Net
      2.2.8 Summary
   2.3 Frameworks for Plural Object Detection
      2.3.1 R-CNN
      2.3.2 SPP Network
      2.3.3 Fast R-CNN
      2.3.4 Faster R-CNN
      2.3.5 Summary
3 Research Design and Methodology
   3.1 Analysis and Design
      3.1.1 General Structure
      3.1.2 Conv Layer
      3.1.3 Pooling Layer
      3.1.4 Output Layer
      3.1.5 Training
   3.2 Summary
4 Experiments and Results
   4.1 Functional Test
      4.1.1 Single Object Recognition
      4.1.2 Plural Objects Recognition
      4.1.3 Reliability on Scale and Illumination Variance
   4.2 Improvements
      4.2.1 Convolutional Operation
      4.2.2 Activation Functions
      4.2.3 Pooling Layer
      4.2.4 Plural Object Detection
   4.3 Summary
5 Conclusion and Future Work
   5.1 Conclusion
   5.2 Future Work
References
Nomenclature

Abbreviations
AED     Average Euclidean Distance
AFN     Average Face Label Number
AFT     Accuracy of Face Samples Test
ANN     Artificial Neural Network
AVN     Average Vehicle Label Number
AVT     Accuracy of Vehicle Samples Test
BBR     Bounding-box Regression
BP      Back Propagation
CNN     Convolutional Neural Network
DBN     Deep Belief Network
DFT     Discrete Fourier Transform
FPS     Frames Per Second
HOG     Histogram of Oriented Gradient
ILSVRC  ImageNet Large Scale Visual Recognition Challenge
IOU     Intersection Over Union
MLE     Maximum Likelihood Estimation
MLP     Multilayer Perceptron
NoI     Number of Indications
NoL     Number of correct Labels
OCR     Optical Character Recognition
RBF     Radial Basis Function
ReLU    Rectified Linear Unit
RNN     Recurrent Neural Network
RoF     Label Range of Face
RoI     Region of Interest
RoV     Label Range of Vehicle
SIFT    Scale-Invariant Feature Transform
SPP     Spatial Pyramid Pooling
SURF    Speeded-Up Robust Features
SVM     Support Vector Machine
Tanh    Hyperbolic Tangent

Mathematical Symbols
⊗            Convolutional operation
sigmoid(.)   Sigmoid function
Max(A, B)    Max operation; outputs the larger of A and B
A            Arbitrary matrix or vector
Loss(.) / L  Arbitrary loss function
δ            Sensitivity map
σ            Learning rate
In           Input data
Out          Output data
Lcls         Binary log loss function
N            Number of anchor locations
bo           Proposal region representation
f(.)         Arbitrary activation function
Chapter 1 Introduction
Although human eyes have a very high recognition rate for the natural world, human vision is unstable, easily fatigued and time-consuming for elaborate, tedious or repetitive tasks, such as matching gene strands or criminal faces, or detecting tumours, arteriosclerosis or other malignant changes in a large database. Moreover, our physical and mental states dramatically affect the recognition rate. According to the Canadian government, 40% of fatal driving accidents happen when drivers are not in a fit state (e.g. under the influence of alcohol) [94]. Why not let machines help us with those tasks? With appropriate techniques and equipment, machines work efficiently and stably all the time, free of emotional influences. In this chapter, we explain what we have learnt about computer vision, why we chose this field, and what the essential techniques and contributions of our project are.
This chapter is separated into three sections: background, motivation and contribution, and thesis outline. In the first section, the basic concepts and history of recognition technique development are introduced chronologically. Then, we give a detailed explanation of why we chose this field and what we achieved utilizing current techniques and theories. The last section is a brief outline of the remaining content.
1.1 Background
Object recognition is not a new research topic. Related to computer vision and image processing, it is a leading-edge discipline aiming at classifying certain object categories in images or videos, and it is applied in many visual assistant systems. The concept of OCR (Optical Character Recognition) [2] was first conceived in the 1870s to help blind people. During that period, many inventions sprang up, such as Fournier d'Albe's Optophone [3] and a statistical machine [2]. In 1963, Roberts presented a novel algorithm to construct and display a 3D array of solid objects from a 2D photograph [1]. This led recognition techniques to develop from simple character recognition to more complex pattern recognition systems. During the next few decades, corner [4-7], edge [8, 9] and blob [10, 79] detection algorithms were proposed for low-level image processing. Due to the simplicity of their features, those algorithms were only applied to image enhancement and smoothing for a long period.
After the 1990s, as machine learning techniques bloomed, new recognition models gradually replaced conventional pattern-based techniques. Initially, shallow learning approaches, such as logistic regression [11], Bayes classifiers [12], decision trees [13], boosting [14], and the SVM (Support Vector Machine) [15], replaced template-based matching methods. Traditional image matching methods linearly match each input pixel against the target; however, this only works for objects of fixed size, affine pose, contour and position, and a slight shape deformation causes matching inconsistency. In contrast, machine learning classifiers are robust to those changes thanks to their non-linear decision boundaries. One advantage is that classifier parameters can be trained automatically using regression algorithms, such as ordinary least squares regression [16], linear regression [17] or logistic regression [11]. Another advantage is high classification accuracy: the non-linear boundaries between categories can clearly divide the ground truth of different classes. The SVM [15] even addresses feature shortage by mathematically increasing the data dimension with specific kernels. The advent of machine learning began a new era for computer vision.
Although ANNs (Artificial Neural Networks) had already been proposed by that time [18-21], they were much less popular than shallow learning frameworks, because the latter need relatively few training samples and have a high processing speed. A traditional ANN, such as an MLP (Multilayer Perceptron) network, has a very complex structure with full connections between all nodes. At that time, it was almost impossible to compute all parameters of a complex network with acceptable time cost and accuracy, because images, as 2-dimensional datasets, have abundant inputs, which generate millions or even billions of parameters to train. Besides, more samples are necessary to ensure the network converges to a stable state. As a consequence, ANNs were only applied to small-scale problems, such as data prediction and speech recognition, fields in which training samples are plentiful and inputs are few.
A decade later, with the enlargement of datasets (e.g. ImageNet, LabelMe), the concept of parameter sharing [23], and the BP (Back Propagation) algorithm [22], deep learning techniques gradually took the place of shallow learning. The first sophisticated network for handwriting recognition [23] was designed by LeCun, who imported the convolution concept into the neural network. Since then, more and more innovative structures [24-27] and strategies have come up to speed up the training and testing process as well as decrease the error rate. In 2010, ImageNet held the Large Scale Visual Recognition Challenge, aiming at facilitating the development of recognition techniques. The host provided a large-scale image dataset of 1000 categories captured from different points of view, which enabled complex network generation. Advanced structures, such as Alex-Net [24], GoogLe-Net [25] and Res-Net [26], achieved top-5 recognition rates beyond 90% in the competition.
Limited by their structure, classical networks cannot detect plural objects within one image. Given the increasing demands of recognition in real situations, a series of frameworks based on CNN have been designed to address this problem. As a pioneer, R-CNN [28] was proposed to segment images into region proposals, each assumed to contain only one object, and to recognize those region proposals one by one through a neural network. Further techniques, such as SPP [29], Fast R-CNN [30] and Faster R-CNN [31], are all based on this concept.
Today, relying on the development of recognition techniques, many applications have been designed to improve people's lives, combined with mobile network routing algorithms [105-110]. For instance, Victor Shaburov and Yurii Monastyrshin implemented a facial recognition system [32] in Looksery for facial modification of photos, and the app was later purchased by Snapchat. Facebook's research group also designed DeepFace, a 9-layer neural network for face recognition achieving 97% accuracy. Created by the Australian Border Force, SmartGate [33] was introduced at Sydney Airport to promote faster and more secure international travel; it applies facial and fingerprint recognition techniques for electronic passport checking. OCR techniques have been widely utilized for document scanning. For some top-500 global vehicle companies, such as Tesla, road detection is one of the most important foundations of autopilot systems.
In the foreseeable future, with the development of AR (Augmented Reality) techniques, image recognition will occupy an even more important position. It is my interest to study this field deeply and to make achievements on AR devices that improve people's lives. This project is a trial to understand the basic theory of computer vision and to implement my ideas for a recognition network.
1.2 Motivation and Contribution
Although traditional vision algorithms (e.g. SIFT [6] and SURF [7]) and shallow learning algorithms (e.g. SVM [15]) are easy to apply in many detection systems, they suffer many restrictions from environmental factors. In particular, non-linear illumination variation, unexpected noise, and object rotation, deformation and movement sharply reduce the accuracy rate [35]. The ILSVRC (ImageNet Large Scale Visual Recognition Challenge) shows that CNNs have a lower error rate than traditional methods for object recognition.
Figure 1.1: The ImageNet ILSVRC challenge results (top-5 error (%)) from 2010 to 2015 [84]
Figure 1.1 illustrates the trend of the best top-5 error rate of published recognition systems between 2010 and 2015. The Y-axis is the error rate in percent, while the X-axis shows the publication years and related network names. According to the statistics, the top-5 error falls every year, from nearly 28.2% to 3.6%. Between 2011 and 2012, the error rate dropped sharply by 9.4%; that is the year CNN [24] was introduced into the competition. The bars for 2010 and 2011 indicate the results of systems based on shallow methods (e.g. HOG and SVM), while the rest are designed based on CNN. As network depth increased, accuracy reached 96.4% in 2015 with Res-Net [26], surpassing human-level performance.
From the bar chart (Figure 1.1), we conclude that CNN performs better than most classical recognition methods, so studying CNN is a good start for robust system development. Although there are many other neural network structures that achieve image recognition, CNN is the most efficient one due to weight sharing. That is the reason why only CNN is practical for high-resolution image detection, rather than the RNN (Recurrent Neural Network) [37-39], GAN (Generative Adversarial Network) [95] or DBN (Deep Belief Network) [36], and why we chose CNN for study and research.
However, current CNN structures mainly focus on accuracy and efficiency improvements for single object recognition in small-scale images (usually 224*224, following the training datasets). When directly applied to a real scene with multiple objects and a complex background, such a system can hardly locate items, because:
The sizes of real-scene images are not constant. We cannot simply re-scale the inputs in proportion, because the deformation of the image is likely to cause feature deficiency. Nor can we crop a region of the required size, because without pre-processing techniques the computer does not know which area should be kept and which should be discarded. Manually refining images is not feasible for a large-scale dataset such as ImageNet.
Followed by the classifying layer, fully connected layers combine all high-level features together regardless of which object those features belong to. Each element of the output vector is a non-linear weighted sum of all detected features, which only represents the category distribution, label number or likelihood of the entire input.
As for image segmentation, although the final goal is to separate different objects in the same image, CNN is only applied to find possible features. Later, the D-CNN [44] structure was created to restore images from CNN outputs. In this structure, a de-convolutional network that applies the opposite operation is inserted into the process: it decodes the feature map to restore the original image. However, its efficiency is low, and its accuracy cannot compete with combined techniques. Besides, the output of this network cannot distinguish overlapping objects of the same category.
Classical CNN frameworks need the assistance of image segmentation to achieve plural object detection. The idea is to first cut the original image into region proposals of the same size with image segmentation algorithms [40-42] in a pre-processing step, then utilize a CNN to recognize the proposals one by one. Commonly, post-processing algorithms [65] are also implemented to refine object boxes and remove redundant indications. The additional processing leads to heavy computational cost [30, 31]: region proposal and feature generation are the two primary time-consuming processes (60%-80% of the total time cost). Ren et al. [31] report that they replace the pre-processing step with a simple neural network, but they still need to apply different networks in their framework. It remains a challenge to detect plural objects precisely and efficiently. Exploring whether there exists a network that can directly recognize raw images of multiple objects without region proposals is a motivation for this thesis.
By studying and analyzing state-of-the-art networks and frameworks, a novel simple network structure is proposed in this thesis that achieves multi-object detection with a low error rate and time cost. Experiments are conducted to test the performance of this structure, and based on analysis of the results, improvements are designed and applied to the network. This thesis demonstrates that a simple structure for plural object detection exists. Although the proposed structure may not be optimal, it offers a new idea for network design.
1.3 Thesis Outline
In Chapter 2, related knowledge, such as what convolutional computation is and what problems currently exist in CNNs, is illustrated in detail. Then the components of a CNN are analyzed to explain the functions of the different layers in the network. Following this explanation is an introduction to the most advanced networks published between 2012 and 2017, showing how developers improved on existing networks to achieve better performance. To address the plural object recognition problem, we also cover several classical frameworks, such as R-CNN. A brief analysis comparing the different networks and frameworks is appended, where their advantages and disadvantages are also demonstrated.
In Chapter 3, our network structure, inspired by the analysis in Chapter 2, is proposed and explained in detail. The main idea is to have a single network with wider hidden layers and without a fully connected layer.
In Chapter 4, we first implement various experiments measuring the error rate and time cost of both training and testing. Further improvements are then made according to the experiments: many techniques for structure and training efficiency are considered and imported into this network. A comparison of the two networks is given to show the effect of the improvements.
The last chapter gives a summary of this thesis, evaluating the new CNN structure by comparing it with the most advanced R-CNN frameworks. Its shortcomings and potential solutions are discussed. Furthermore, we list plans for future improvement and directions of study.
Chapter 2 Theories and Literature Review
Before explaining our network, it is better to deeply understand the theories behind CNNs as well as how other scholars have contributed to their development; we can be inspired by their work and avoid many mistakes by reading their analysis. This chapter gives a view of the most useful knowledge and is divided into three sections. In the first section, related theories and mathematical operations are stated to give a general idea of what a CNN is, why it works for image recognition, and how to train networks. In the second section, masterpiece CNNs published at ILSVRC are demonstrated in detail, followed by a brief summary. In the last section, we introduce frameworks that achieve plural object detection based on CNN.
2.1 Related Knowledge
This section has five parts, focusing on explaining the CNN structure, the related mathematical methods, how to train networks, and how to solve problems that arise during the training and testing process.
2.1.1 Convolution.
Classical ANNs [20, 21, 45, 46] have complicated connections between neurons to simulate the natural brain. All neurons are fully connected between layers, and the links are represented by weights and biases. Those parameters describe the significance of one neuron's effect on its corresponding neighbors. Since an image is a 2-dimensional dataset determined by width and height, a normal ANN for image recognition includes abundant parameters; for example, a 100*100 image with a 3-layer network structure has more than 3*9.3*10^157 possible connection configurations. It is impossible for current computers to train all the connections in acceptable time. Derived from signal processing, the convolutional operation is applied in neural networks to prevent this parameter explosion, and it is what constitutes the CNN architecture.
The convolutional operation [47] combines two signals by the integral of the product of the two functions. In digital signal processing, a real signal can be regarded as the convolution of many simple impulses with scaling and shifting. These impulses describe the most important information of the real signal while their sizes are much smaller than the signal, so they can be selected as features to represent it. Conversely, the convolutional operation can also be utilized to separate impulses by finding appropriate feature filter functions (also called kernels).
Similarly, images can be regarded as 2-dimensional compound discrete signals, and the integral of the convolutional operation on continuous signals becomes a sum for images. Considering efficiency, the best way to recognize an object is on its feature maps, because features form a much smaller dataset that still contains the most important properties without much redundant data. Besides, many feature spaces are invariant to rotation, scale and illumination. To extract features from the intensity map, the convolutional operation is the most appropriate method. We assume in this section that kernels have already been found; the process of finding appropriate kernels is introduced in the network training section.
The convolutional operation is similar to the filtering operation of edge detectors [8, 9], except that the kernel must be rotated by 180 degrees, because of how signals combine on edge areas. To be specific, each element in the feature map is the accumulation of the weighted input within a certain region, where the weights consist of the flipped impulse response (see Figure 2.1). Starting at the upper-left corner, the kernel moves over the image from left to right, then one row down, then from left to right again, until reaching the bottom-right corner.
Figure 2.1: 2D Convolutional Operation [85]
Out(y, x) = Σ_i Σ_j In(i, j) * h(y - i, x - j)    (2.1)
When dealing with edges, the missing pixels cause an inconsistency in size between the input and output maps. Zero-padding is applied to solve this problem: to keep the sizes consistent, the original image is extended by zero matrices whose width depends on the kernel size. In most CNNs, the kernel size is odd (1*1, 3*3, 5*5, ...) to ensure that the kernel can be centred on a pixel.
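The padded convolution described above, including the 180-degree kernel flip of Formula 2.1, can be sketched in plain Python (an illustrative toy, not the thesis implementation; the function name is our own):

```python
def conv2d_same(image, kernel):
    """2D convolution with zero-padding so the output keeps the input size.
    The kernel is flipped by 180 degrees, matching true convolution."""
    kh, kw = len(kernel), len(kernel[0])      # odd kernel sizes assumed (e.g. 3*3)
    ph, pw = kh // 2, kw // 2                 # padding width derived from kernel size
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:   # zero-padding: skip outside pixels
                        # flipped kernel index implements the 180-degree rotation
                        acc += image[yy][xx] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out

# A 3*3 averaging kernel leaves a constant image unchanged in the interior,
# while zero-padding attenuates the borders.
img = [[1.0] * 5 for _ in range(5)]
k = [[1 / 9.0] * 3 for _ in range(3)]
res = conv2d_same(img, k)
```

With "same" padding the output is 5*5 like the input; the centre value stays 1.0 while the corner drops to 4/9, since five of the nine window positions fall in the zero padding.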
Although parameters are sharply reduced by sharing weights within the same layer, convolutional computation still has complexity O(Iw*Ih*Ww*Wh). For example, an 800*600 image with 4 types of 5*5 kernels costs 4.8*10^7 multiplications. To improve the computing process, the convolution theorem can be utilized. It states that the convolution of two signals in the spatial domain corresponds to pointwise multiplication in Fourier space. So, we can first transform the original image and the kernels from the spatial domain to the frequency domain, perform the multiplication once, and then transform the result back to the spatial domain. This method scales much more gently as the kernel size enlarges. This thesis takes advantage of the principle to accelerate the operating process.
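The convolution theorem can be verified with a small 1D sketch (a hypothetical illustration: a naive O(N^2) DFT is used for clarity, whereas real systems use the FFT to obtain the claimed speedup):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform; an FFT computes the same in O(N log N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(X):
    """Inverse DFT, scaled by 1/N."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def circular_conv(x, h):
    """Direct circular convolution, for comparison with the Fourier route."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, 0.0, 1.0]   # kernel zero-padded to the signal length

# Convolution theorem: multiply the spectra pointwise, then invert.
spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_fourier = [c.real for c in idft(spectrum)]
direct = circular_conv(signal, kernel)
```

Both routes give [3, 5, 7, 5]; for images the same identity holds with a 2D DFT, and zero-padding turns the circular convolution into the linear one used in conv layers.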
2.1.2 Artificial Neural Network.
Inspired by biological neural networks, the ANN was proposed decades ago. Belonging to deep machine learning (with multiple hidden layers), its goal is to generate a classifier or predictor automatically through mathematical methods. To achieve this goal, most neural networks consist of nodes, also named "neurons" or "cells". Each has a specific function or structure with untrained parameters, and all of them are connected following certain principles. The training process adjusts parameters iteratively until the value of the loss function converges. Constituted by an input layer, hidden layers and an output layer, the network is treated as a black box that can be applied to complex tasks.
Commonly designed for prediction and recognition, four of the most famous ANN models are the RNN [37, 38, 39], DBN [36], CNN [24, 25, 26, 27], and GAN [95]. In a simple RNN, cells are linearly connected while each of them has a complex internal structure for data transformation (see Figure 2.2). Similar to an FA (Finite Automaton), the inputs of each cell consist of the data vector and the previous output state. This property allows the network to predict the tendency of a dataset from its history. The main problem for this network is finding an appropriate cell structure to prevent the "forgetting" problem: cells only have short-term memory, so the influence of past inputs fades quickly as processing continues. In a very long input sequence, the original data lose their relation to the current prediction, leading to huge prediction deviation. This phenomenon also causes a bottleneck for CNNs as the depth of the structure increases; methods to solve this problem are introduced later in this chapter. On the other hand, the DBN is constructed layer by layer, where each layer is a sub-network called a Restricted Boltzmann Machine (see Figure 2.3). Training this network means circularly re-inputting outputs until all outputs become stable. This network is adept at image restoration and parameter initialization for complex networks.
Figure 2.2: A Simple RNN Structure [86] Figure 2.3: Structure of DBN [87]
This thesis mainly focuses on the CNN for object recognition, because it exploits the stationarity of statistics and the locality of pixel dependencies, unlike other neural networks. Compared to standard feedforward neural networks [48, 49], CNNs have fewer connections and parameters (thanks to parameter sharing) and so are easier to train, while their theoretically best performance is likely to be only slightly worse. The GAN inherits the CNN's concepts while inserting a de-convolution structure for image imitation; as it is similar to the CNN, we primarily focus on the CNN structure.
The CNN is a feed-forward multi-layer architecture. Training minimizes the value of a loss function rather than stabilizing output vectors as in the DBN. A complete CNN is composed of many layers: convolution layers (including activation function layers), subsampling layers, fully connected layers, and a classification layer.
When dealing with high-dimensional images (e.g. 600*800), it is impractical to fully connect neurons to the next layer; for instance, a 600*800 input would need (600*800)^2 connections to a following layer of the same size. As the core of the CNN, the main purpose of the convolutional layer (conv layer) is to reduce links by local connection. Instead of connecting a neuron to all elements in the next layer, neurons only connect to the ones within a local region, called the receptive field. For example, a neuron of the first layer only connects to a 5*5 field of neurons of the second layer in the corresponding position. Then a 600*800 input has 600*800*5*5 connections, which is 1/19200 of full connection.
It would still be impossible to adjust all the parameters if each connection had a unique weight. Compared to other neural networks, the significant improvement of the CNN is weight sharing via the convolutional operation: neurons only connect to a receptive field in the next layer, and all receptive fields use the same weights. For instance, the previous 600*800*5*5 connections share only 5*5 = 25 parameters (similar to edge operators [8, 9]). This property connects different adjacent input locations to other adjacent locations. Macroscopically, the process of the convolutional layers can be considered feature extraction and abstraction. Features are separated into low level, middle level and high level depending on the depth of the network; their representation progresses from points to corners to edges, becoming more abstract to describe global properties of objects.
To keep a diversified representation for object distinction, more than one kernel with different parameters is applied in one conv layer, which generates "channels". The values of the kernels are the weights that detect different types of local features, adjusted by machine learning. For example, the first layer of Le-Net [50] only has 6 channels with different 5*5 kernels; in total, there are only 150 weights and 12 biases, compared to 32,768 weights and 1024 biases for a regular network [46]. The map of the current layer is a linear combination of different channels in a specific sequence (see [50]). To ensure the training process generates different kernels, the initialization of each kernel should be randomly distributed.
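The savings from weight sharing can be made concrete with a quick count (an illustrative sketch; it assumes the common convention of one bias per output channel, so the totals need not match the bias accounting quoted from [50] above):

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Shared weights: one kernel per (input channel, output channel) pair,
    plus one bias per output channel, regardless of image size."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def fc_params(in_size, out_size):
    """Fully connected: every input neuron has its own weight per output."""
    return in_size * out_size + out_size

# A Le-Net-style first layer: six 5*5 kernels on a single-channel image.
n_conv = conv_params(5, 5, 1, 6)   # 156 parameters, whatever the image size
# A fully connected layer mapping a 32*32 image to 6 outputs, for comparison.
n_fc = fc_params(32 * 32, 6)       # 6150 parameters
```

The conv-layer count is independent of the input resolution, which is precisely why sharing makes large images tractable.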
At the end of the conv layer, various activation functions are available to solve non-linear classification and over-fitting problems. The activation function decides whether a neuron should contribute to the output by mapping inputs into a certain range; it also enables a more principled parameter adjustment. Popular activation functions include the ReLU (Rectified Linear Unit) [51], Sigmoid and Tanh (Hyperbolic Tangent) functions. The selection of activation function depends on the task the network will solve. Usually, ReLU is the most popular for image recognition due to its sparse activation, better gradient propagation, scale-invariance and efficient computation [51]. Furthermore, ReLU has a steeper profile that leads to faster learning. There are also other versions of ReLU, such as leaky ReLU and noisy ReLU, which address different problems. In Chapter 4, the performance of the standard ReLU, Sigmoid and identity functions is compared in terms of accuracy and training speed.
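The activation functions named above can be written down directly (a minimal sketch; the leaky-ReLU slope 0.01 is a typical choice, not one fixed by the thesis):

```python
import math

def relu(x):
    """Sparse activation: negative inputs are cut to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any input into (0, 1); gradients vanish for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Zero-centred squashing into (-1, 1)."""
    return math.tanh(x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU variant: a small slope for x < 0 avoids 'dead' neurons
    that would otherwise never activate again."""
    return x if x > 0 else alpha * x
```

ReLU's piecewise-linear shape is also what makes its gradient trivial: 1 for positive inputs and 0 otherwise, with no exponentials to evaluate.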
The feature maps of conv layers are compressed by pooling layers (also called subsampling layers) to reduce the spatial size and data entropy. More importantly, pooling extracts invariances from features for property description. Theoretically, it is possible to construct a network without pooling layers. However, even one feature map of a 96*96 image with a 7*7 kernel has 9216 feature points, requiring 451,584 multiplications. In real networks, such as Res-Net or GoogLe-Net, a conv layer can have as many as 256 channels, so the total computation for one convolution layer rises to 115,605,504; moreover, sophisticated networks have more than 50 conv layers. Such processing costs a great deal of time in training and practical application, and subsampling layers are designed to address this problem. In some work [93], pooling layers can also be replaced by a strided convolution layer (the stride of the kernel is n rather than 1).
Figure 2.4: Process of Max-pooling Layer [88]
The pooling layer selects salient features and removes irrelevant data to speed up processing (see Figure 2.4). The most common setting for a pooling layer is a 2*2 filter with a stride of 2; the output of such a layer is 1/4 the size of its input, which sharply reduces computational cost. Mean pooling and max pooling are the two common subsampling approaches: for mean pooling, each point of the output feature map is the average value of a 2*2 filter window, whereas for max pooling it is the maximum value of the corresponding region. Note that there are only two common strategies for max pooling: 1) a 3*3 filter with a stride of 2, called overlapping pooling; 2) a 2*2 filter with a stride of 2. A larger filter is destructive to the result. Conv layers and subsampling layers are interleaved and repeated many times until the feature map can precisely represent the classes. In practice, max pooling performs better than mean pooling.
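Max and mean pooling with the common 2*2 filter and stride 2 can be sketched as follows (a toy illustration with our own function name; even input dimensions are assumed):

```python
def pool2x2(feature_map, mode="max"):
    """Non-overlapping 2*2 pooling with stride 2: the output is 1/4 the
    input size. Assumes even height and width for simplicity."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            window = [feature_map[y][x], feature_map[y][x + 1],
                      feature_map[y + 1][x], feature_map[y + 1][x + 1]]
            # max pooling keeps the salient response; mean pooling averages it
            row.append(max(window) if mode == "max" else sum(window) / 4.0)
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled_max = pool2x2(fmap, "max")     # keeps the strongest response per window
pooled_mean = pool2x2(fmap, "mean")   # smoother, but dilutes strong features
```

On this 4*4 map the max variant keeps the peaks (4 and 8 survive), which hints at why max pooling tends to preserve discriminative features better than averaging.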
Different from conv layers, in FC (fully connected) layers each neuron connects to all neurons in the next layer, the same as in an MLP (Multilayer Perceptron) network [52]. Local features are combined with weights and biases to form global features that best describe the entire image. It is worth noting that conv layers with 1*1 kernels are easily confused with FC layers: even with a 1*1 kernel, a conv layer shares its parameter across all neurons, while in an FC layer each neuron has its own parameters. For example, a 32*32 input to a conv layer needs, per channel, 1 weight and 1 bias, while an FC layer needs 32*32 weights as well as 1 bias per output neuron.
Global features are fed into the classification (output) layer for the final recognition. The classification layer can be a simple classifier, such as Softmax regression [43], an RBF (Radial Basis Function) [53] or the logistic function [11]; more complex classifiers, such as the SVM (Support Vector Machine) [15], are also available to increase accuracy. Softmax regression is a simple but very popular classifier that normalizes the outputs of the network and computes the gradient based on the multinomial distribution (see Formula 2.2). Different from the logistic function, whose focus is on two-class problems, Softmax can classify multiple categories by outputting k categories' probabilities. All outputs are mapped between 0 and 1, and their sum is 1.
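Softmax as just described, mapping k raw class scores to probabilities that sum to 1, can be sketched as follows (the max subtraction is a standard numerical-stability trick, not part of Formula 2.2):

```python
import math

def softmax(scores):
    """Map k raw class scores to probabilities in (0, 1) summing to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical class scores from the last network layer.
probs = softmax([2.0, 1.0, 0.1])
```

The largest score always receives the largest probability, and the subtraction of the maximum leaves the result unchanged while preventing overflow of the exponential for large scores.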