Implementation of Convolutional Neural Networks in Fpga for Image Classification
Total Page:16
File Type:pdf, Size:1020Kb
IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORKS IN FPGA FOR IMAGE CLASSIFICATION A Thesis Presented to the Faculty of California State Polytechnic University, Pomona In Partial Fulfillment Of the Requirements for the Degree Master of Science In Electrical Engineering By Mark A. Espinosa 2019 SIGNATURE PAGE THESIS: IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORKS IN FPGA FOR IMAGE CLASSIFICATION AUTHOR: Mark A. Espinosa DATE SUBMITTED: Spring 2019 Department of Electrical and Computer Engineering Dr. Zekeriya Aliyazicioglu Thesis Committee Chair Electrical & Computer Engineering Dr. James Kang Electrical & Computer Engineering Dr. Tim Lin Electrical & Computer Engineering ii ACKNOWLEDGEMENTS To my mom and dad who have always believed in me in all my endeavors all my life. To my wife who has kept me going and cheered me on every step of the way. Thank you and love you all. iii ABSTRACT Machine Learning and Deep Learning its sub discipline are gaining popularity quickly. Machine Learning algorithms have been successfully deployed in a variety of applications such as Natural Language Processing, Optical Character Recognition, and Speech Recognition. Deep Learning particularly is suited to Computer Vision and Image Recognition tasks. The Convolutional Neural Networks employed in Deep Learning Neural Networks train a set of weights and biases which with each layer of the network learn to recognize key features in an image. This work set out to develop a scalable and modular FPGA implementation for Convolutional Neural Networks. It was the objective of this work to attempt to develop a system which could be configured to run as many layers as desired and test it using a currently defined CNN configuration, AlexNet. This type of system would allow a developer to scale a design to fit any size of FPGA from the most inexpensive to the costliest cutting-edge chip on the market. The objective of this work was achieved, and all layers were accelerated including Convolution, Affine, ReLu, Max Pool, and Softmax layers. The performance of the design was assessed, and it was determined its maximum achievable performance was approximately 3 GFLOPS. iv TABLE OF CONTENTS SIGNATURE PAGE .................................................................................................................................... ii ACKNOWLEDGEMENTS .......................................................................................................................... iii ABSTRACT ...................................................................................................................................................iv TABLE OF CONTENTS .............................................................................................................................. v LIST OF TABLES ..................................................................................................................................... xii LIST OF FIGURES ...................................................................................................................................... xvi 1.0 CURRENT WORK AND PRACTICE ........................................................................................... 1 1.1 Machine Learning ........................................................................................................................ 1 1.2 Deep Learning .............................................................................................................................. 4 1.3 Core Deep Learning Architectures ............................................................................................ 5 1.4 AlexNet.......................................................................................................................................... 5 1.5 VGGnet ......................................................................................................................................... 7 1.6 ResNet ........................................................................................................................................... 8 1.7 Convolutional Neural Networks in FPGA: Literature Review .............................................. 10 1.7.1 Up to Now .......................................................................................................................... 10 1.7.2 Other Works ...................................................................................................................... 11 2.0 THESIS STATEMENT ................................................................................................................. 15 3.0 FUNDAMENAL CONCEPTS ...................................................................................................... 16 3.1 Convolutional Neural Networks ............................................................................................... 16 3.1.1 Convolutional Neural Networks: What are they ............................................................ 16 3.1.2 Convolution Operation: Theory, Properties, and Mathematical Representation ....... 17 3.1.2.1 Sparse Interactions ....................................................................................................... 20 v 3.1.2.2 Parameter Sharing ........................................................................................................ 21 3.1.2.3 Sparse Interactions ....................................................................................................... 24 3.1.3 Activation and Pooling ...................................................................................................... 25 3.2 Convolutional Neural Network in Practice.............................................................................. 26 3.2.1 Practical Convolution ....................................................................................................... 26 3.2.2 Stride .................................................................................................................................. 27 3.2.3 Padding .............................................................................................................................. 28 3.2.4 Putting It All Together ...................................................................................................... 30 4.0 DCNN DESIGN AND IMPLEMENTATION ............................................................................. 33 4.1 Similarities .................................................................................................................................. 33 4.1.1 Bus Protocol ....................................................................................................................... 33 4.1.2 DSP Slices........................................................................................................................... 34 4.1.3 Scalability ........................................................................................................................... 34 4.1.4 Data Format....................................................................................................................... 34 4.2 Distinctions ................................................................................................................................. 35 4.2.1 Simple Interface ................................................................................................................ 35 4.2.2 RTL Instead of High-Level Synthesis .............................................................................. 35 4.2.3 Flexible Design ................................................................................................................... 36 4.3 Goals ........................................................................................................................................... 36 4.4 Tools ............................................................................................................................................ 37 4.5 Hardware .................................................................................................................................... 38 5.0 FPGA TOP-LEVEL ARCHITECURAL DESIGN AND CONCEPT ....................................... 41 5.1 AXI BUS Overview .................................................................................................................... 41 vi 5.2 Data Format ............................................................................................................................... 44 5.2.1 Fixed Point ......................................................................................................................... 45 5.2.2 Floating Point Single and Double Precision .................................................................... 46 5.2.3 Floating Point Half Precision ........................................................................................... 48 5.2.4 Data Format Decision ....................................................................................................... 49 5.3 DSP48 Floating Point Logic ...................................................................................................... 50 5.4 Top Level Architecture and Concept ....................................................................................... 51 5.5 Top Level Module Descriptions ................................................................................................ 53 5.6 Top Level Port Descriptions ....................................................................................................