A Memristor-Based Cascaded Neural Networks for Specific Target
Total Page:16
File Type:pdf, Size:1020Kb
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 Article A Memristor-based Cascaded Neural Networks for Specific Target Recognition Sheng-Yang Sun, Hui Xu, Jiwei Li, Yi Sun, Qingjiang Li, Zhiwei Li * and Haijun Liu * College of Electronic Science, National University of Defense Technology, Changsha 410073, China; [email protected] (S.-Y. S.); [email protected] (H. X.); [email protected],cn (J. L.); [email protected] (Y. S.); [email protected] (Q. L.) * Correspondence: [email protected] (Z. L.); [email protected] (H. L.) 1 Abstract: Multiply-accumulate calculations using a memristor crossbar array is an important method 2 to realize neuromorphic computing. However, the memristor array fabrication technology is 3 still immature, and it is difficult t o f abricate l arge-scale a rrays w ith h igh-yield, w hich restricts 4 the development of memristor-based neuromorphic computing technology. Therefore, cascading 5 small-scale arrays to achieve the neuromorphic computational ability that can be achieved by 6 large-scale arrays, which is of great significance for promoting the application of memristor-based 7 neuromorphic computing. To address this issue, we present a memristor-based cascaded framework 8 with some basic computation units, several neural network processing units can be cascaded by this 9 means to improve the processing capability of the dataset. Besides, we introduce a split method to 10 reduce pressure of input terminal. Compared with VGGNet and GoogLeNet, the proposed cascaded 11 framework can achieve 93.54% Fashion-MNIST accuracy under the 4.15M parameters. Extensive 12 experiments with Ti/AlOx/TaOx/Pt we fabricated are conducted to show that the circuit simulation 13 results can still provide a high recognition accuracy, and the recognition accuracy loss after circuit 14 simulation can be controlled at around 0.26%. 15 Keywords: cascaded neural networks; memristor crossbar; convolutional neural networks 16 1. Introduction 17 Convolutional neural networks (CNNs) have been widely used in computer vision tasks such 18 as image classification [1][2][3], they are widely popular in industry for their superior accuracy on 19 datasets. Some concepts about CNNs were proposed by Fukushima in 1980 [4]. In 1998, LeCun et 20 al. proposed the LeNet-5 CNNs structure based on a gradient-based learning algorithm [5]. With 21 the development of deep learning algorithms, the dimension and complexity of CNNs layers have 22 grown significantly. However, state-of-the-art CNNs require too many parameters and compute at 23 billions of FLOPs, which prevents them from being utilized in embedded applications. For instance, 24 the AlexNet [1] proposed by Krizhevsky et al. in 2012, requires 60M parameters and 650K neurons; 25 ResNet [6], which is broadly used in detection task [7], has a complexity of 7.8 GFLOPs and fails to 26 achieve real-time applications even with a powerful GPU. 27 For the past few years, embedded neuromorphic processing systems have acquired significant 28 advantages, such as the ability to solve the image classification problems while consuming very 29 little area and power consumption [8]. In addition to these advantages, neuromorphic computing 30 can also be used to simulate the functions of human brains [9]. With the rapid development of the 31 VLSI industry, increasingly more devices are beginning to be miniaturized and integrated [10]. As 32 mentioned above, CNNs lead to an exploding number of network synapses and neurons in which the 33 relevant hardware costs will be huge [11], these complex computations and high memory requirements 34 make it impractical for the application of embedded systems, robotics and real-time or mobile scenarios 35 in general. 36 In recent years, the memristor [12] has received significant attention as synapses for neuromorphic 37 systems. The memristor is a non-volatile device that can store the synaptic weights of neural networks. © 2019 by the author(s). Distributed under a Creative Commons CC BY license. Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 2 of 11 38 Voltage pulses can be applied to memristors to change their conductivity and tune them to a specific 39 resistance [13]. A memristor crossbar array can naturally carry out vector-matrix multiplication which 40 is a computationally expensive operation for neural networks. It has been recently demonstrated that 41 analog vector-matrix multiplication can be of orders of magnitude that are more efficient than ASIC, 42 GPUs or FPGA based implementations [14]. 43 Some neural network architectures based on the memristor crossbar have been proposed [15–17], 44 these network architectures utilize memristor crossbar to perform convolution computation iterations 45 which also cost too much time and require high memory. On the other hand, the fabrication technology 46 of large-scale memristor crossbar array is still immature [18,19], however useful cascaded framework 47 in real applications with memristor-based architectures have seldom been reported. To make full 48 use of limited memristor resources and make the system work at high speed in real-time processing 49 applications, we present a memristor-based cascaded framework with some neuromorphic processing 50 chip. That means several neural network processing chips can be cascaded to improve the processing 51 capability of the dataset. The basic computation unit in this work builds on our prior work devoloping 52 memristor-based CNNs architecture [20], which has validated that the three-layer CNNs with Abs 53 activation function can get desired recognition accuracy. 54 The rest of this paper is organized as follows, Section II presents the cascaded method based on 55 the basic computation unit and a split method, including the circuits implemented based on memristor 56 crossbar array. Section III exhibits the experimental results. The final Section IV concludes the paper. 57 2. Proposed Cascaded Method 58 To have a better understanding of cascaded method, we first introduce basic units that make up 59 the cascaded network, and then a detailed description of our cascaded CNN framework is presented. 60 2.1. Basic Computation Unit 61 To take full advantage of limited memristor array, a three-layer simplified CNNs is used as the 62 basic computation unit (BCU). The structure of this network is shown in Figure1. Max Selected … Spike Flow Classification Input Image Convolution Layer Pooling Layer Fully Connected Layer Figure 1. The basic computation unit architecture. It consists of three layers, behind the input layer is convolution layer which consists of k kernels, followed by a average-pooling layer and a fully connected layer. 63 The simplified CNNs includes three layers. The convolution layer includes k kernels with kernel 64 size Ks × Ks followed by absolute nonlinearity function (Abs), which extracts the features from the 65 input images and produce the feature maps. The average-pooling obtains spatial invariance while 66 scaling feature maps with pooling size Ps × Ps from their preceding layers. A sub-sample is taken from 67 each feature map, and it reduces the data redundancy. The fully connected layer (FCL) performs the 68 final classification or image reconstruction, it takes the extracted feature maps and multiplies a weight 69 matrix following a dense matrix vector multiplication pattern. G = H(I∗) = s(W · G∗ + b) = s(W · (G∗∗ ∗ W∗) + b) (1) = s(W · (d(W∗∗ ∗ I∗ + b∗) ∗ W∗) + b) Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 3 of 11 M-BCUs (a) W × H BCU #1_1 1 1 N-BCUs W2 × H2 BCU #2_1 P-BCUs W H F F 1 × 1 1 … 2 output input image BCU #1_2 BCU #3_1 … BCU #3_P … BCU #2_N W × H BCU #1_M W2 × H2 W1 × H1 Part #1 Part #2 Part #3 BCU #1_1 (b) W × H W1 × H1 BCU #1_2 f 1_1 (I * ) BCU #2_1 W × H image BCU #3_1 W × H W1 × H1 # BCU #1_3 1_ 2 * input image (!∗) f (I ) 2_1 3_1 2_1 image fimage (F) fimage ( fimage ) output W × H W1 × H1 1_ 3 * fimage (I ) Figure 2. The diagram of the proposed cascaded framework. (a): Standard cascaded network framework "M-N-P". (b): The typical "3-1-1" cascaded type. C×W×H C∗×W∗×H∗ 70 where H : R ! R is the transformation in the BCU (in other words, BCU 71 performs the image transformation), C is the number of channels of input image, s indicates the ∗ 72 hyperbolic tangent function, d indicates the Abs function, ∗ represents the convolution operator, b, b ∗ ∗∗ 73 are bias value, and W, W , W represent the weight matrix of each layer, respectively (refer to Figure 74 1). 75 2.2. The Proposed Cascaded Framework 76 Aim to combine several monolithic networks (BCUs) to obtain better performance, we propose a 77 cascaded CNN network, whose specific design can be seen in Figure2a. C×W×H 78 The cascaded framework includes three parts. Given the output G 2 R generated from a C×W×H C×W×H 79 part, a reconstruction transformation f : R ! R is applied to aggregate outputs over 80 all BCUs of this part, where C is the number of channels of input image, W and H are the spatial th 81 dimensions. The output of k part is described as Fk = Gk_1 ⊕ Gk_2 ⊕ ... ⊕ Gk_n (2) Fk+1 = Gk+1_1 ⊕ Gk+1_2 ⊕ ... ⊕ Gk+1_m (3) = Hk+1_1(Fk) ⊕ ... ⊕ Hk+1_m(Fk) th th 82 where Gk_n is the output of the n BCU of the k part, and ⊕ represents the reconstruction operator 83 which combined with several original outputs using an element-wise addition to produce a new 84 output. The new output Fk is treated as input to feed into the k + 1 part.