Journal of Pure & Applied Sciences www.Suj.sebhau.edu.ly ISSN 2521-9200 Received 23/03/2020 Revised 16/08/2020 Published online 05/10/2020

Capsule Network Implementation On FPGA

*Salim A. Adrees, Ala A. Abdulrazeg
Department of Computer Engineering, College of Engineering, Omar Al-Mukhtar University, Libya
*Corresponding author: [email protected]

Abstract: A capsule neural network (CapsNet) is a new approach in artificial neural networks (ANN) that captures hierarchical relationships better than earlier architectures. On a graphics processing unit (GPU), CapsNet performs considerably better than a convolutional neural network (CNN) at recognizing highly overlapping digits in images. Nevertheless, this new method has not yet been designed as an accelerator on a field programmable gate array (FPGA) to measure its speedup and compare it with the GPU. This paper aims to design a CapsNet module (accelerator) on FPGA. The performance of FPGA and GPU is compared, mainly in terms of speedup and accuracy. The results show that the training time on GPU using MATLAB is 789.091 s, the model evaluation accuracy is 99.79%, and the validation accuracy is 98.53%. One iteration of the routing algorithm takes 0.043622 s in MATLAB, while on the FPGA it takes 650 ns (0.00000065 s), meaning the FPGA module is roughly 67,000 times faster than the GPU.

Keywords: Artificial intelligence, Capsule neural network, Deep learning, FPGA, Image recognition.

I. Introduction

Nowadays, machine learning powers many different fields, such as websites, cameras, and smartphones. Machine learning [1] has been used to identify objects in an image and extract them for further processing. The general idea of machine learning is to take a real dataset and apply a learning algorithm to it, such as deep learning [1], neural networks [2], the perceptron algorithm [3], the k-nearest neighbour algorithm [4], decision trees [5], etc. Among these, the most dominant technique is deep learning. The convolutional neural network (CNN) has dominated the field of deep learning because it is very good at extracting and analyzing data to form the relationships between inputs and outputs.

Unfortunately, CNN has a drawback in its basic architecture that limits its effectiveness [6-8]. The main drawback is the max pooling function in the convolutional layer, where the information in the image is extracted [1]. After convolution is applied to the input image using feature-extracting filters, max pooling takes place as shown in Fig. 1. The max pooling function detects only the presence of a certain object and ignores other important features, such as its precise location and its relationships to the other objects in the same image [6-8].

JOPAS Vol. 19, No. 5, 2020, p. 50. Capsule Network Implementation On FPGA. Adrees & Abdulrazeg.

Fig. 1: Convolutional layers in CNN

The capsule network is a new approach in the machine learning field. Hinton and his team introduced a dynamic routing mechanism for capsule networks in 2017 [6]. The approach proved that it reduces both the error rate on the MNIST dataset and the size of the training dataset required, and its results were better than those of CNN on highly overlapping digits. A CNN learns to classify objects by passing the image through convolutional layers that extract low-level features in the first layers, then grouping the features learned there to extract more complex features in deeper layers [1]. CNN tries to overcome its limitation by using a large training dataset, but the fundamental issue in CNN remains the use of the max pooling function to extract features and feed them to the next layer [9]. For this reason, CapsNet was developed to replace the concept of invariance with equivariance. Equivariance means that if the object appears in the image in a different rotation or position, the model still accounts for these properties, while invariance means that the model aims only to detect the presence of the object, ignoring its other properties.

In the last few years, there has been a race between GPU and FPGA vendors over the amount of computation that can be handled efficiently at high speed. In addition, the popularity of machine learning and its heavy computational demands make it one of the important points of comparison between GPU and FPGA [10].

In this paper, we implement CapsNet on FPGA and GPU and compare their accuracy and speedup. Vivado is used for the FPGA implementation and MATLAB for the GPU implementation. The organization of this paper is as follows. Section II gives details of the CapsNet algorithm. The FPGA implementation is described in Section III. Section IV presents the results on GPU, and Section V presents the results on FPGA. Finally, Section VI concludes the paper.

II. Capsule neural network

The hierarchy of this algorithm is divided into three main blocks, as shown in Fig. 2. Each block has sub-operations, which are [6]:

1. Primary capsules:
   a) Convolution.
   b) Reshaping process.
   c) Squashing function.
2. Higher-level capsules:
   a) Routing between capsules.
3. Loss calculation:
   a) Margin loss.
   b) Reconstruction loss.

Fig. 2: Structure of the capsule networks

The key feature of this algorithm is the routing algorithm, which represents the mid-section of the CapsNet algorithm; the hardware part will be designed for it. The mathematical representation of the routing algorithm is shown in Procedure 1. The procedure receives û, the output of the primary capsules, the layer number l, and the number of routing iterations r.

Procedure 1: Routing algorithm

1: procedure ROUTING(û_{j|i}, r, l)
2:   for all capsule i in layer l and capsule j in layer (l+1): b_ij ← 0
3:   for r iterations do
4:     for all capsule i in layer l: c_i ← softmax(b_i)                 ▷ softmax computes Eq. (1)
5:     for all capsule j in layer (l+1): s_j ← Σ_i c_ij û_{j|i}
6:     for all capsule j in layer (l+1): v_j ← squash(s_j)              ▷ squash computes Eq. (2)
7:     for all capsule i in layer l and capsule j in layer (l+1): b_ij ← b_ij + û_{j|i} · v_j
8:   return v_j

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{1}$$

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} \tag{2}$$
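Procedure 1 and Eqs. (1) and (2) can be sketched as follows. This is a minimal NumPy illustration, not the paper's MATLAB or Verilog code; the array shapes and the 8-capsule example are assumptions for demonstration only.

```python
# Illustrative sketch of Procedure 1 (routing by agreement).
# u_hat[i, j, :] is the prediction vector u_hat_{j|i} from primary
# capsule i to digit capsule j (shapes are hypothetical).
import numpy as np

def squash(s):
    """Eq. (2): scale s_j to a length in [0, 1) while keeping its direction."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + 1e-9)

def routing(u_hat, r):
    """Dynamic routing for r iterations; returns digit-capsule outputs v_j."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                 # initial logits b_ij = 0
    for _ in range(r):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)    # Eq. (1): softmax over j
        s = np.einsum('ij,ijd->jd', c, u_hat)   # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                           # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # b_ij += u_hat_{j|i} . v_j
    return v

u_hat = 0.1 * np.random.default_rng(1).standard_normal((8, 10, 16))
v = routing(u_hat, r=3)
print(v.shape)  # (10, 16); every ||v_j|| < 1 by construction of squash
```

The squashing function guarantees that each output vector's length lies in [0, 1), so it can be interpreted as the probability that the entity represented by capsule j is present.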


First, all initial logits b are assigned to zero and the coupling coefficients are calculated using equation (1); then the sum s_j of all inputs to each digit capsule is calculated; the final step is applying the squashing function, equation (2), to s_j; then the initial logits b are updated, and this is the end of the first iteration. This process is repeated r times. The complexity of the routing algorithm is the reason for choosing it for the hardware implementation, as will be shown in the results section. The dataset used with this algorithm is the Modified National Institute of Standards and Technology (MNIST) dataset, which consists of 70,000 images (28 × 28): 60,000 training images and 10,000 testing images [11]. This part is implemented in MATLAB, profiling is applied to the code to find the timing of each part, and the code is executed on GPU.

III. FPGA implementation

The hardware implementation is done in prediction mode, not in training mode. First, the primary capsule layer is implemented in MATLAB, and the output of this layer is passed to the hardware implementation. Fig. 3 shows the flowchart of the hardware design.

Fig. 3: Flowchart of the hardware design (Start → inputs û[160][31], flag = 0 → Softmax → Replication_1 → Dot_Pro → Replication_2 → Squashing → Reshaping → flag == 1? → MAG → Update_b → flag = 1 → End)

Fixed-point representation with 16-bit precision is used in this design. This relatively high number of bits is reserved for accuracy because accuracy is one of the aims of this research. A matrix û[160][31] and a loop flag are the inputs of the flowchart. The main obstacles in the hardware design are the square root function, used to calculate the magnitude in the MAG function, and the exponential function, used in the softmax function. The square root function is adopted from previous work [12]. The exponential function is applied to the b matrix; simulation in MATLAB shows that the values in this matrix are almost equal to zero except for one row, for all images in the MNIST dataset. Therefore, replacing exp(b) with 2^b does not affect the results and saves many clock cycles.

The software used in this part is Xilinx Vivado, which provides the timing simulation of the hardware design.

IV. GPU Results

CapsNet is implemented on GPU and the module is trained using the MNIST dataset. The results show that the error rate and loss after each epoch are close to zero. Fig. 4 shows the results of the MATLAB simulation.

Fig. 4: Error rate and loss after each epoch

The model evaluation accuracy is 99.79% and the validation accuracy is 98.53%. Timing analysis, performed by profiling the MATLAB code, shows that the largest share of the overall execution time is taken by the routing algorithm, at 0.043622 s. Fig. 5 shows the profiling summary of the code in MATLAB.
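The exp(b) to 2^b substitution described in Section III can be illustrated with a short sketch. Since 2^b = exp(b · ln 2), swapping the base only rescales the logits, so the ordering of the coupling coefficients, and hence the winning capsule, is unchanged. The logit values below are hypothetical, chosen to mimic the "almost all zeros except one row" pattern observed in the MATLAB simulation.

```python
# Compare softmax built on exp(b) with the hardware-friendly 2**b variant.
import numpy as np

def softmax_exp(b):
    e = np.exp(b - b.max())          # Eq. (1), numerically stabilised
    return e / e.sum()

def softmax_pow2(b):
    e = np.power(2.0, b - b.max())   # 2**b: a single shift-friendly base
    return e / e.sum()

b = np.array([0.0, 0.0, 0.0, 3.0, 0.0])  # hypothetical logits: one dominant entry
print(np.argmax(softmax_exp(b)), np.argmax(softmax_pow2(b)))  # 3 3
```

Because 2^b is monotonically increasing in b, the largest coefficient is produced by the same capsule in both variants, which is why the prediction is unaffected.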


Fig. 5: Profiling summary of the code

Based on this profiling summary, the part that should be implemented on FPGA is the routing-by-agreement algorithm, because it is the most time consuming. The Vivado software tool is used to implement the ASM chart for the complete design.

V. FPGA Results

The routing algorithm is implemented on FPGA. The input û_{j|i} is taken from the MATLAB implementation; the prediction for this û_{j|i} in MATLAB is the digit 5. The hardware design must give the same result for the same û_{j|i}, otherwise the design is wrong.

The clock period is 20 ns (a 50 MHz clock). The modelling style used in this research is behavioural modelling, which is considered the highest level of abstraction provided by Verilog. A module can be described in terms of the desired algorithm without concern for hardware implementation details, such as how many adders, multipliers, and dividers are required.

The module is correctly synthesized in Vivado and is ready to be tested using û_{j|i} as its primary input. After applying û_{j|i} to the module, the result is as shown in Fig. 6. The values are stored in memory in binary, and the decimal representations of these binary words are 41, 156, 110, 170, 20, 62881, 47, 143, 0, and 33. Multiplying these numbers by 2^-16 gives the actual real values, because fixed-point numbers with 16-bit precision are used. The output shows that the digit appearing in the image is 5, so the MATLAB and FPGA results are equal.
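The fixed-point decoding step above can be reproduced with a few lines. This is an illustrative sketch (the Q0.16 interpretation, multiplying each 16-bit word by 2^-16, follows the paper's description):

```python
# Decode the memory words reported for Fig. 6 into real values and
# pick the digit capsule with the largest magnitude.
raw = [41, 156, 110, 170, 20, 62881, 47, 143, 0, 33]  # words for digits 0..9
real = [x * 2**-16 for x in raw]                      # multiply by 2^-16
digit = max(range(10), key=lambda d: real[d])         # largest capsule wins
print(digit, round(real[digit], 4))  # 5 0.9595
```

The sixth word (62881) dominates all others, so the predicted digit is 5, matching the MATLAB result.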

Fig. 6: Results of FPGA for the same û_{j|i}

The timing simulation on FPGA shows that the output delay is 650 ns. This result shows a significant improvement in timing over the MATLAB simulation, which addresses the second objective of this work.
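The size of this improvement follows from a short arithmetic check on the two measured times:

```python
# Ratio between one routing iteration in MATLAB (0.043622 s) and the
# measured FPGA output delay (650 ns), plus the equivalent GPU clock-cycle
# count at the 20 ns clock period used for the FPGA design.
gpu_time_s = 0.043622
fpga_time_s = 650e-9
speedup = gpu_time_s / fpga_time_s
gpu_cycles = round(gpu_time_s / 20e-9)
print(round(speedup), gpu_cycles)  # 67111 2181100
```

At a 20 ns clock period, 0.043622 s corresponds to 2,181,100 clock cycles on the GPU side versus about 32 cycles on the FPGA, the figures summarized in Table 1.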


Table 1 shows the difference in timing between GPU and FPGA. In addition, the accuracy of the prediction is not affected, even when the process is repeated with ten different values of û_{j|i}.

Table 1: Summary of the timing simulation and the accuracy on FPGA and GPU.

            GPU                              FPGA
  Timing    0.043622 s (43.622 × 10⁶ ns)     650 ns
            2,181,100 clock cycles           32 clock cycles

VI. Conclusion

First, the most challenging part of this research was extracting the inputs from the MATLAB implementation, because of the complexity of the code and of CapsNet itself. Second, fixed-point numbers are easier and more efficient than floating-point numbers: implementing floating-point numbers in hardware requires IP cores, and these IP cores consume many clock cycles. In addition, fixed-point numbers do not affect the results, and the error is negligible.

The FPGA module is designed in the behavioural modelling style. Resources utilized by the module are not addressed in this style; the software tool is responsible for finding how many adders, multipliers, and registers are needed, which is one of the advantages of this style. However, the drawback of behavioural design is that it is difficult for the software tool to synthesise such a design and translate the Verilog code into hardware; thus, if the module is very complex, behavioural modelling is not a good choice.

Finally, the routing-by-agreement algorithm is the most time-consuming part of the CapsNet algorithm, and implementing it on FPGA minimizes the output delay without affecting the accuracy of the output.

References

[1]- Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[2]- A. Muthuramalingam, S. Himavathi, and E. Srinivasan, "Neural network implementation using FPGA: issues and application," International Journal of Information Technology, vol. 4, no. 2, pp. 86-92, 2008.
[3]- Y. Taright and M. Hubin, "FPGA implementation of a multilayer perceptron neural network using VHDL," in ICSP'98, 1998 Fourth International Conference on Signal Processing, 1998, vol. 2, pp. 1311-1314: IEEE.
[4]- Z.-H. Li, J.-F. Jin, X.-G. Zhou, and Z.-H. Feng, "K-nearest neighbor algorithm implementation on FPGA using high level synthesis," in 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), 2016, pp. 600-602: IEEE.
[5]- R. Narayanan, D. Honbo, G. Memik, A. Choudhary, and J. Zambreno, "Interactive presentation: An FPGA implementation of decision tree classification," in Proceedings of the Conference on Design, Automation and Test in Europe, Nice, France, 2007.
[6]- S. Sabour, N. Frosst, and G. E. Hinton, "Dynamic routing between capsules," in Advances in Neural Information Processing Systems, 2017, pp. 3856-3866.
[7]- G. E. Hinton, S. Sabour, and N. Frosst, "Matrix capsules with EM routing," 2018.
[8]- A. Ahmad, B. Kakillioglu, and S. Velipasalar, "3D capsule networks for object classification from 3D model data," in 2018 52nd Asilomar Conference on Signals, Systems, and Computers, 2018, pp. 2225-2229: IEEE.
[9]- A. Shahroudnejad, P. Afshar, K. N. Plataniotis, and A. Mohammadi, "Improved explainability of capsule networks: Relevance path by agreement," in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 549-553: IEEE.
[10]- E. Nurvitadhi et al., "Can FPGAs beat GPUs in accelerating next-generation deep neural networks?," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 5-14: ACM.
[11]- L. Deng, "The MNIST database of handwritten digit images for machine learning research [best of the web]," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141-142, 2012.
[12]- Y. Li and W. Chu, "A new non-restoring square root algorithm and its VLSI implementations," in Proceedings, International Conference on Computer Design: VLSI in Computers and Processors, 1996, pp. 538-544: IEEE.
