Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs)

Viraj Bangari,∗ Bicky A. Marquez, Heidi B. Miller, and Bhavin J. Shastri† Department of Physics, Engineering Physics & Astronomy, Queen's University, Kingston, ON K7L 3N6, Canada

Alexander N. Tait National Institute of Standards and Technology (NIST), Boulder, Colorado 80305, USA

Mitchell A. Nahmias, Thomas Ferreira de Lima, Hsuan-Tung Peng, and Paul R. Prucnal
Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA
(Dated: July 3, 2019)

Convolutional Neural Networks (CNNs) are powerful and highly ubiquitous tools for extracting features from large datasets for applications such as computer vision and natural language processing. However, a convolution is a computationally expensive operation in digital electronics. In contrast, neuromorphic photonic systems, which have experienced a surge of interest over the last few years, promise higher bandwidths and energy efficiencies for neural network training and inference. Neuromorphic photonics exploits the advantages of optics, including the ease of analog processing and of busing multiple signals on a single waveguide at the speed of light. Here, we propose a Digital Electronic and Analog Photonic (DEAP) CNN hardware architecture that has the potential to be 2.8 to 14 times faster while maintaining the same power usage as current state-of-the-art GPUs.

arXiv:1907.01525v1 [eess.SP]
∗ [email protected]
† [email protected]

I. INTRODUCTION

The success of CNNs for large-scale image recognition has stimulated research in developing faster and more accurate algorithms for their use. However, CNNs are computationally intensive and therefore result in long processing latency. One of the primary bottlenecks is computing the matrix multiplication required for forward propagation. In fact, over 80% of the total processing time is spent on the convolution [1]. Therefore, techniques that improve the efficiency of even forward-only propagation are in high demand and researched extensively [2, 3].

In this work, we present a complete digital electronic and analog photonic (DEAP) architecture capable of performing highly efficient CNNs for image recognition. The competitive MNIST handwriting dataset [4] is used as a benchmark test for our DEAP CNN. At first, we train a standard two-layer CNN offline, after which the network parameters are uploaded to the DEAP CNN. Our scope is limited to forward propagation, but includes power and speed analyses of our proposed architecture.

Due to their speed and energy efficiency, photonic neural networks have been widely investigated from different approaches that can be grouped into three categories: (1) reservoir computing [5–8]; reconfigurable architectures based on (2) ring resonators [9–12]; and (3) Mach-Zehnder interferometers [13, 14]. Reservoir computing in the discrete photonic domain successfully implements neural networks for fast information processing; however, the predefined random weights of their hidden layers cannot be modified [8].

An alternative approach uses silicon photonics to design fully programmable neural networks [15], using a so-called broadcast-and-weight protocol [10–12]. This protocol is capable of implementing reconfigurable, recurrent and feedforward neural network models, using a bank of tunable silicon microring resonators (MRRs) that recreate on-chip synaptic weights. Therefore, such a protocol allows one to emulate physical neurons. Mach-Zehnder interferometers have also been used to model synaptic-like connections of physical neurons [14]. The advantage of the former approach over the latter is that it has already demonstrated fan-in, inhibition, time-resolved processing, and autaptic cascadability [12]. The DEAP CNN design is therefore compatible with mainstream silicon photonic device platforms. This approach leverages the advances in silicon photonics, which have recently progressed to the level of sophistication required for large-scale integration. Furthermore, the proposed architecture allows the implementation of multi-layer networks to realize the deep learning framework.

Inspired by the work of Mehrabian et al. [16], which lays out a potential architecture for photonic CNNs with DRAM, buffers, and microring resonators, our design goes a step further by considering a specific input representation, as well as an example of how an algorithm for tasks such as MNIST handwritten digit recognition can be mapped to photonics. Moreover, we consider the summation of multi-channel inputs, multi-dimensional kernels, the limitations on weights being between 0 and 1, and the architecture for the depth of kernels or inputs.

This work is divided into five sections. Following this introduction, in Section (II) we describe convolutions as used in the field of signal processing, and then introduce silicon photonic devices to perform convolutions in photonics. Section (III) introduces a hardware-inspired algorithm to perform such full photonic convolutions. In Section (IV), we utilize our previously described architecture to build a two-layer DEAP CNN for MNIST handwritten digit recognition. Finally, in Section (V), we show an energy-speed benchmark test, where we compare the performance of DEAP with the empirical dataset DeepBench [17]. Note that we have made the high-level simulator and mapping tool for the DEAP architecture publicly available [18].

II. CONVOLUTIONS AND PHOTONICS

II.1. Convolutions Background

A convolution of two discrete domain functions f and g is defined by:

(f ∗ g)[t] = Σ_{τ=−∞}^{∞} f[τ] g[t − τ],  (1)

where (f ∗ g) represents a weighted average of the function f[τ] when it is weighted by g[−τ] shifted by t. The weighting function g[−τ] emphasizes different parts of the input function f[τ] as t changes.

In digital image processing, a similar process is followed. The convolution of an image A with a kernel F produces a convolved image O. An image is represented as a matrix of numbers with dimensionality H × W, where H and W are the height and width of the image, respectively. Each element of the matrix represents the intensity of a pixel at that particular spatial location. A kernel is a matrix of real numbers with dimensionality R × R. The value of a particular convolved pixel is defined by:

O_{i,j} = Σ_{k=1}^{R} Σ_{l=1}^{R} F_{k,l} A_{i+k,j+l}.  (2)

Using matrix slicing notation, Eq. (2) can be represented as a dot product of two vectorized matrices:

O_{i,j} = vec(F)^T · vec((A_{m,n})_{m∈[i,i+R], n∈[j,j+R]})^T.  (3)

A convolution reduces the dimensionality of the input image to (H − R + 1) × (W − R + 1), so a padding of zero values is normally applied around the edges of the input image to counteract this. A schematic illustration of a convolution in digital image processing is shown at the top of Fig. 1.

Figure 1. Schematic illustration of a convolution. At the top of the figure, an input image is represented as a matrix of numbers with dimensionality H × W × D, where H, W and D are the height, width and depth of the image, respectively. Each element A_{i,j} of A represents the intensity of a pixel at that particular spatial location. The kernel F is a matrix with dimensionality R × R × D, where each element F_{i,j} is a real number. The kernel is slid over the image using a stride S equal to one. As the image has multiple channels (or depth) D, the same kernel is applied to each channel. Assuming H = W, the overall output dimensionality is (H − R + 1)². The bottom of the figure shows how a convolution operation is generalized into a single matrix-matrix multiplication, where the kernel F is transformed into a vector F with DR² elements, and the image A is transformed into a matrix A of dimensionality DR² × (H − R + 1)². Therefore, the output is represented by a vector with (H − R + 1)² elements.

When convolutions are used to perform parallel matrix multiplications in neural networks such as CNNs, a convolution operation is defined as:

O_{i,j} = vec(F)^T · vec((A_{m,n,k})_{m∈[iS,iS+R], n∈[jS,jS+R], k∈[1,D]})^T,  (4)

where the input A has dimensionality H × W × D, the kernel F has dimensionality R × R × D, and D refers to the number of channels within the input image. The additional parameter S is referred to as the "stride" of the convolution. This convolution is similar to Eq. (3), except that the outputs from each channel are summed together in the end, and that the stride parameter is always equal to 1 in image processing. The dimensionality of the output feature is:

⌈(H − R)/S + 1⌉ × ⌈(W − R)/S + 1⌉ × K,  (5)

where K is the number of different kernels applied to an image, and ⌈·⌉ is the ceiling function. Table (I) contains a summary of all the convolutional parameters described so far.

Table I. Summary of Convolutional Parameters

Parameter  Meaning
N          Number of input images
H          Height of input image including padding
W          Width of input image including padding
D          Number of input channels
R          Edge length of kernel
K          Number of kernels
S          Stride

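As a concreteness check, Eqs. (2) and (3) can be verified numerically. The sketch below (our illustration, not part of the published simulator; sizes are arbitrary) evaluates a single-channel convolution directly and then as the single matrix product shown at the bottom of Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 6
R = 3
A = rng.uniform(size=(H, W))   # input image
F = rng.uniform(size=(R, R))   # kernel

# Direct evaluation of Eq. (2): one dot product per output pixel.
O = np.zeros((H - R + 1, W - R + 1))
for i in range(H - R + 1):
    for j in range(W - R + 1):
        O[i, j] = np.sum(F * A[i:i + R, j:j + R])

# The same convolution as one matrix product (bottom of Fig. 1):
# each column of Amat is a vectorized R x R patch of A.
cols = [A[i:i + R, j:j + R].ravel()
        for i in range(H - R + 1) for j in range(W - R + 1)]
Amat = np.stack(cols, axis=1)   # shape R^2 x (H - R + 1)^2
o = F.ravel() @ Amat            # Eq. (3) applied to every pixel at once
assert np.allclose(o, O.ravel())
```

The matrix form is what makes the operation amenable to parallel hardware, since all output pixels become one inner-product sweep.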
One of the challenges with convolutions is that they are computationally intensive operations, taking up 86% to 94% of execution time for CNNs [1]. For heavy workloads, convolutions are typically run on graphics processing units (GPUs), as they are able to perform many mathematical operations in parallel. A GPU is a specialized hardware unit that is capable of performing a single mathematical operation on large amounts of data at once. This parallelization allows GPUs to compute matrix-matrix multiplications at speeds much higher than a CPU [19]. The convolution operation can be generalized into a single matrix-matrix multiplication [20]. This is shown at the bottom of Fig. 1, where the kernel F is transformed into a vector F with dimensionality KDR² × 1, and the image is transformed into a matrix A of dimensionality KDR² × (⌈(H − R)/S⌉ + 1)(⌈(W − R)/S⌉ + 1)K. Therefore, the output is represented by a vector with (⌈(H − R)/S⌉ + 1)(⌈(W − R)/S⌉ + 1)K elements; in this particular case K = 1, S = 1 and H = W.

II.2. Silicon Photonics Background

An emerging alternative to GPU computing is optical computing using silicon photonics for ultrafast information processing. Silicon photonics is a technology that allows for the implementation of photonic circuits by using the existing complementary metal-oxide-semiconductor (CMOS) platform for electronics [21]. In recent years, the silicon photonic based broadcast-and-weight architecture has been shown to perform multiply-accumulate operations at frequencies up to five times faster than conventional electronics [22]. Therefore, there is motivation to explore how photonics can be used to perform convolutions, and how it compares to GPU-based implementations.

MRRs are the essential devices of our approach. An MRR is a circular waveguide that is coupled with either one or two waveguides. Such silicon waveguides can be manufactured to have a width of 500 nm and a thickness of 220 nm. These waveguides have a bend radius of 5 µm and can support TE- and TM-polarized wavelengths between 1.5 µm and 1.6 µm [21]. The single-waveguide configuration is called an all-pass MRR, see Fig. 2(a). The light from the waveguide is transferred into the ring via a directional coupler and then recombined. The effective index of refraction between the waveguide and the MRR and the circumference of the MRR cause the recombined wave to have a phase shift, thereby interfering with the intensity of the original light. The transfer function of the intensity of the light coming out of the through port with respect to the light going into the input port of the all-pass resonator is described by:

T_n(φ) = (a² − 2ra cos(φ) + r²) / (1 − 2ra cos(φ) + (ar)²).  (6)

The parameter r is the self-coupling coefficient, and a defines the propagation loss from the ring and the directional coupler. The phase φ depends on the wavelength λ of the light and the radius d of the MRR [23]:

φ = 4π² d n_eff / λ,  (7)

where n_eff is the effective index of refraction between the ring and waveguide. The value of n_eff can be modified to indirectly change the resonance peak. Such tuning is usually done by applying a current to the ring proportional to the variable A_i. This process heats the ring, yielding a shift of the resonance peak. Figure 2(b) shows an example of such tuning: the orange curve represents the Lorentzian line shape described by Eq. (6), centered at the initial phase of the ring resonator, indicating that the MRR is in resonance with the incoming light. The blue triangle curve shows how this phase can be modified by heating the MRR.

Figure 2. (a) All-pass MRR and (b) transfer function: the orange curve represents the Lorentzian line shape described by Eq. (6), centered at the initial phase where the MRR is in resonance with the incoming light. The blue triangle curve shows how this phase can be modified by heating the MRR via the application of a current through A_i.

The phase for an all-pass resonator corresponding to a particular intensity modulation value can be computed by inverting Eq. (6):

φ_i = arccos( [A_i(1 + (ar)²) − a² − r²] / [2ra(A_i − 1)] ),  (8)

resulting in a modulated intensity equal to A_i:

I_mod = T_n(φ_i)|E_0|² = A_i,  (9)

where E_0 is the amplitude of the electric field.

An alternative double-waveguide configuration is called the add-drop MRR. The transfer function of the through-port light intensity with respect to the input light is:

T_p(φ) = ((ar)² − 2r²a cos(φ) + r²) / (1 − 2r²a cos(φ) + (r²a)²);  (10)

and the transfer function of the drop-port light intensity with respect to the input light is:

T_d(φ) = (1 − r)²a / (1 − 2r²a cos(φ) + (r²a)²).  (11)

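The modulation scheme of Eqs. (6), (8) and (9) can be checked numerically. The sketch below assumes r = a = 0.99 (the values quoted later in Sec. IV) and uses the sign convention in Eq. (8) under which the inversion of Eq. (6) is exact; it is an illustration of the relations above, not the published simulator.

```python
import numpy as np

r = a = 0.99  # self-coupling and loss coefficients (values from Sec. IV)

def T_n(phi):
    """All-pass through-port transmission, Eq. (6)."""
    return ((a**2 - 2*r*a*np.cos(phi) + r**2)
            / (1 - 2*r*a*np.cos(phi) + (a*r)**2))

def drive_phase(A_i):
    """Phase realizing a target modulated intensity A_i, Eq. (8)."""
    num = A_i * (1 + (a*r)**2) - a**2 - r**2
    return np.arccos(num / (2*r*a*(A_i - 1)))

# Round trip: the tuned phase reproduces the requested intensity, Eq. (9).
for target in (0.2, 0.5, 0.8):
    assert np.isclose(T_n(drive_phase(target)), target)
```

Because Eq. (8) is an exact algebraic inversion of Eq. (6), the round trip holds to floating-point precision for any reachable intensity.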
In the case where the coupling losses are negligible, a ≈ 1, the relationship between the add-drop through and drop transfer functions is T_p = 1 − T_d. In addition, if we connect the through and drop ports to a balanced photodiode and a TIA as in Fig. 3(a), we get an effective transfer function of g(T_d − T_p), where g is the gain of the TIA. Therefore, we get a modulation of:

I_mod = g(T_d(φ_i) − T_p(φ_i))|E_0|² = A_i.  (12)

At the output of the balanced photodiode, the transfer function T_d − T_p is shown by the blue triangle curve in Fig. 3(b). The orange circle and green curves are Lorentzian line shapes, centered at the initial phase where the MRR is in resonance with the incoming light, described by Eqs. (11) and (10), respectively. In contrast, Figs. 3(c) and (d) are centered at a modified phase (φ + 0.2), according to a specific value of the current A_i. Here we aim to demonstrate how to represent positive and negative kernel values in analog photonics. This can be achieved by incorporating a balanced PD at the output of the add-drop MRR. In panels (c) and (d), the blue curves show such positive and negative kernel values from the drop and through outputs, respectively. The orange triangle curves show the TIA transfer function g(T_d − T_p), where g amplifies T_d − T_p by a factor of two.

Figure 3. (a) Add-drop configuration with O/E conversion and amplification. (b) Output of the balanced photodiode, the transfer function T_d − T_p. Orange circle and green curves are the drop and through ports, described by Eqs. (11) and (10), respectively. In panels (c) and (d), the phase-shifted (φ + 0.2) blue curves show positive and negative kernel values from the drop and through outputs, respectively. The orange triangle curves show how those values can be amplified by a factor of two using a TIA at the output of the balanced photodiode. The phase shifts are achieved by the application of a current through A_i.

II.3. Dot Products with Photonics

The fundamental operation of a convolution is the dot product of two vectorized matrices. Therefore, one needs to understand how to compute a vector dot product using photonics before proposing an architecture capable of performing convolutions.

A wavelength-multiplexed signal consists of k electromagnetic waves, each with angular frequency ω_i, i = 1, …, k. If it is assumed that each wave has an amplitude of E_0 and a power enveloping function µ_i whose modulation frequency is significantly smaller than ω_i, then the slowly varying envelope approximation and a short-time Fourier transform can be used to derive an expression for the multiplexed signal in the frequency domain:

E_mux(ω) = E_0 Σ_{i=1}^{k} √µ_i δ(ω − ω_i),  (13)

where δ(ω − ω_i) is the Dirac delta function and µ_i ≥ 0, since power envelopes are not negative. If the enveloping function is prevented from amplifying the electric field, µ_i can further be restricted to the domain 0 ≤ µ_i ≤ 1. Next, we introduce tunable linear filters H⁺(ω) and H⁻(ω) such that when they interact with multiple fields, the following weighted signals are created:

E_w⁻(ω) = H⁻(ω) E_mux(ω),
E_w⁺(ω) = H⁺(ω) E_mux(ω).  (14)

Assuming that the two signals are fed into a balanced photodiode (balanced PD) with spectral response R(ω), the induced photocurrent is described by:

i_PD = ∫_{−∞}^{∞} dω R(ω) (|E_w⁺(ω)|² − |E_w⁻(ω)|²)
     = ∫_{−∞}^{∞} dω R(ω) (|H⁺(ω)|² − |H⁻(ω)|²) |E_mux(ω)|²
     = Σ_{i=1}^{k} R(ω_i) (|H⁺(ω_i)|² − |H⁻(ω_i)|²) E_0 µ_i.  (15)

Assuming that R(ω) is roughly constant in the area of spectral interest, one can set A_i = E_0 R_0 µ_i and F*_i = |H⁺(ω_i)|² − |H⁻(ω_i)|², resulting in a photocurrent equal to:

i_PD = Σ_{i=1}^{k} A_i F*_i = A⃗ · F⃗*.  (16)

The through and drop ports of an MRR can be used to implement the linear filters H⁺ and H⁻ such that |H⁺|² = T_d and |H⁻|² = T_p. Knowing that T_p = 1 − T_d with minimal losses, we can set a particular weight using:

F*_i = 2T_d(φ_i) − 1.  (17)

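A back-of-the-envelope simulation of this weighting scheme, assuming ideal lossless rings so that T_p = 1 − T_d and taking E_0 R_0 = 1 (our simplifications, for illustration only), reproduces the dot product of Eq. (16):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0, 1, 9)    # input intensities A_i (normalized pixels)
F = rng.uniform(-1, 1, 9)   # desired weights F*_i in [-1, 1]

Td = (F + 1) / 2            # drop-port transmission chosen per Eq. (17)
Tp = 1 - Td                 # through port in the ideal case a ~ 1

# The balanced PD subtracts the two detected powers, Eqs. (15)-(16):
i_pd = np.sum(A * (Td - Tp))
assert np.isclose(i_pd, A @ F)
```

The subtraction at the balanced PD is what extends the achievable weight range from [0, 1] (a single filter) to [−1, 1].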
The phase φ_i can be obtained by using Eqs. (10) and (11) to get:

φ_i = arccos( −(1/(2r²a)) [ 2(1 − r)²a/(F*_i + 1) − 1 − (r²a)² ] ),  (18)

from which we can see that F*_i can be between −1 and 1, since T_d is a filter that only represents values between 0 and 1. In order to perform a dot product with a weight vector w⃗ whose components are not limited to the range −1 to 1, a gain g_TIA can be applied to the photocurrent such that:

A⃗ · F⃗ = g_TIA A⃗ · F⃗* = g_TIA Σ_{i=1}^{k} A_i F*_i,  (19)

if:

g_TIA = max_{1≤i≤k} |F_i|,  (20)

then:

F⃗ = g_TIA F⃗*,  (21)

assuming that each φ_i corresponds to a weighting of w*_i. This electronic gain can be achieved using a transimpedance amplifier (TIA), which can be manufactured in a standard CMOS process [24] and packaged or integrated with the photonic chip [21]. A diagram of the electro-optic architecture described in this section is presented in Fig. 4. From now on, this amalgamation of electronic and optical components is referred to as a photonic weight bank (PWB). PWBs similar to the one in Fig. 4 have been successfully implemented in the past [11, 25, 26].

Figure 4. An electro-optic architecture that performs dot products. A_i (i = 1, …, k) are input elements encoded in intensities, multiplexed by a WDM and linked to the weight banks via a silicon waveguide. F_i are filter values that modulate the MRRs in the PWB. Drop and through output ports are connected to a balanced PD, where the matrix multiplication is performed, followed by an amplifying TIA.

We can represent negative inputs between −1 and 1 by modifying the power enveloping function to µ_i = (x_i + 1)/2. If the same set of derivations is followed, we can modify Eq. (19) to be:

x⃗ · w⃗ = g( Σ_{i=1}^{k} A_i F*_i + Σ_{i=1}^{k} E_0 R_0 F*_i ).  (22)

The second term in this sum is a predictable bias current term that can conceptually be subtracted before feeding into the TIA. This is a disadvantage of supporting negative inputs, as additional optical or electronic control circuitry would need to be designed. Another trade-off is a loss in precision due to a larger range of inputs needing to be represented, analogous to the loss in precision with signed integers in classical computing.

III. PERFORMING CONVOLUTIONS USING PHOTONICS

The goal of this section is to present a photonic architecture capable of performing convolutions for CNNs. This new architecture is called DEAP.

For a maximum number of input channels D_m and a maximum kernel edge length R_m as bounding parameters for DEAP, we represent the range of convolutional parameters that a particular implementation of DEAP can support. If a convolutional parameter described in Table (I) does not have a complementary bounding parameter, the DEAP architecture can support arbitrary values of said convolutional parameter.

III.1. Producing a Single Convolved Pixel

First, we consider an architecture that can produce one convolved pixel at a time. To handle convolutions for kernels with dimensionality up to R_m × R_m × D_m, we will require R_m² lasers with unique wavelengths, since a particular convolved pixel can be represented as the dot product of two 1 × R_m² vectors. To represent the values of each pixel, we require D_m R_m² modulators (one per kernel value), where each modulator keeps the intensity of the corresponding carrier wave proportional to the normalized input pixel value. The R_m² lasers are multiplexed together using wavelength division multiplexing (WDM), and the multiplexed signal is then split into D_m separate lines. On every line, there are R_m² all-pass MRRs, resulting in D_m R_m² MRRs in total. Each WDM line will modulate the signals corresponding to a subset of R_m² pixels on channel k, meaning that the modulated wavelengths on a particular line correspond to the pixel inputs (A_{m,n,k})_{m∈[i,i+R_m], n∈[j,j+R_m]}, where k ∈ [1, D_m].

The D_m WDM lines will then be fed into an array of D_m PWBs. Each PWB will contain R_m² MRRs with the weights corresponding to the kernel values at a particular channel. For example, the PWB on line k should contain the vectorized weights for the kernel (F_{m,n,k})_{m∈[1,R_m], n∈[1,R_m]}. Each MRR within a PWB should be

tuned to a unique resonant wavelength within the multiplexed signal. The outputs of the weight bank array are electrical signals, each proportional to the dot product (F_{m,n,k})_{m∈[1,R_m], n∈[1,R_m]} · (A_{p,q,k})_{p∈[i,i+R_m], q∈[j,j+R_m]}. Finally, the signals from the weight banks need to be added together. This can be achieved using a passive voltage adder. The output from this adder will therefore be the value of a single convolved pixel. Fig. 5 shows a complete picture of what such an architecture would look like.

To perform a convolution with a kernel edge length less than R_m, one can set (F_{m,n,k})_{m∈[R+1,R_m], n∈[R+1,R_m]} to zero. Similarly, if the number of input channels is less than D_m, then the modulators (A_{m,n,k})_{m∈[1,H], n∈[1,W]} should also be set to zero, with k ∈ [D + 1, D_m] in this case.

Figure 5. Photonic architecture for producing a single convolved pixel. Input images are encoded in intensities A_{l,h}, where the pixel inputs A_{m,n,k} with m ∈ [i, i + R_m], n ∈ [j, j + R_m], k ∈ [1, D_m] are represented as A_{l,h}, l = 1, …, D and h = 1, …, R². Considering the bounding parameters, we set D = D_m and R = R_m. Likewise, the filter values F_{m,n,k} are represented as F_{l,h} under the same conditions. We use an array of R² lasers with different wavelengths λ_h to feed the MRRs. The input and kernel values A_{l,h} and F_{l,h} modulate the MRRs via electrical currents proportional to those values. Once the parallel matrix multiplications are performed, the voltage adder sums all signals from the weight banks; here, R are resistance values. The output is then the convolved feature.

Algorithm 1 Convolutions for CNNs using DEAP
1: A is the input image
2: F is the kernel
3: R is the edge length of the kernel
4: O is a memory block to store the convolution
5: S is the stride
6: H and W are the height and width of the input image
7: function convolve(A, F, R, O, S, H, W)
8:   for (k = 1; k ≤ K; k = k + 1) do
9:     load kernel weights from F[:,:,:,k]
10:    for (h = 1; h ≤ H − R + 1; h = h + S) do
11:      for (w = 1; w ≤ W − R + 1; w = w + S) do
12:        load inputs from A[h:min(h+R,H), w:min(w+R,W), :]
13:        perform convolution
14:        store results in O[h/S, w/S, k]
15:      end for
16:    end for
17:  end for
18: end function
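Algorithm 1 translates almost line for line into NumPy. The sketch below is a software model only (0-based indexing; the function name mirrors the pseudocode but is otherwise our choice), not a model of the photonic datapath timing:

```python
import numpy as np

def convolve(A, F, S=1):
    """Software model of Algorithm 1.

    A: input image of shape (H, W, D); F: kernels of shape (R, R, D, K);
    S: stride. Returns the convolved output O."""
    H, W, D = A.shape
    R, _, _, K = F.shape
    Ho = (H - R) // S + 1
    Wo = (W - R) // S + 1
    O = np.zeros((Ho, Wo, K))
    for k in range(K):                                      # line 8: per kernel
        Fk = F[:, :, :, k]                                  # line 9: load weights
        for i, h in enumerate(range(0, H - R + 1, S)):      # line 10
            for j, w in enumerate(range(0, W - R + 1, S)):  # line 11
                patch = A[h:h + R, w:w + R, :]              # line 12: load inputs
                O[i, j, k] = np.sum(patch * Fk)             # lines 13-14
    return O
```

In the hardware, line 12 corresponds to re-driving the modulator array and line 13 to the analog weight-bank/adder path; only the loop bookkeeping remains digital.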

III.2. Performing a Full Convolution

In the previous section, we discussed how DEAP can produce a single convolved pixel. In order to perform a convolution of arbitrary size, one would need to stride along the input image and readjust the modulation array. Since the same kernel is applied across the set of inputs, the weight banks do not need to be modified until a new kernel is applied. Fig. 6(a) demonstrates this process on an input with S = 1. To handle S > 1, the inputs being passed into DEAP should also be strided accordingly. In this approach, the inputs should have been zero-padded before being passed into DEAP. In pseudocode, performing a convolution with K filters can be implemented as shown in Algorithm 1.

The DEAP architecture also allows for parallelization by treating the photonic architecture proposed in the previous section as a single-output "convolutional unit". By creating n_conv instances of these convolutional units, one can produce n_conv pixels per cycle by passing the next set of inputs to each unit. This is demonstrated in Fig. 6(b) for n_conv = 2. The computation of output pixels can be distributed across the convolutional units, resulting in a runtime complexity of O(KHW / (S² n_conv)).
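Under these assumptions, the cycle count of a full convolution can be sketched with a small helper (a hypothetical function of ours, counting one output pixel per unit per cycle):

```python
import math

def n_cycles(H, W, K, R, S, n_conv):
    """Cycles to produce all output pixels with n_conv convolutional units."""
    Ho = (H - R) // S + 1
    Wo = (W - R) // S + 1
    return math.ceil(K * Ho * Wo / n_conv)

# MNIST-sized first layer from Sec. IV: 28x28 input, eight 5x5 kernels.
assert n_cycles(28, 28, 8, 5, 1, 1) == 4608   # 8 * 24 * 24 pixels
assert n_cycles(28, 28, 8, 5, 1, 2) == 2304   # two units halve the cycle count
```

This makes the O(KHW / (S² n_conv)) scaling concrete: doubling n_conv halves the number of cycles, at the cost of duplicating the modulator and weight-bank hardware.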


Figure 6. (a) Cycling through a convolution using DEAP. (b) Performing a convolution with two convolutional units.

IV. PHOTONIC CONVOLUTIONAL NEURAL NETWORKS

In this section, we show how DEAP can be used to run a CNN. CNNs are a type of neural network that was developed for image recognition tasks. A CNN consists of some combination of convolutional, nonlinear, pooling and fully connected layers [27], see Fig. 7(a). As introduced previously, convolutions perform a highly efficient and parallel matrix multiplication using kernels [3]. Furthermore, since kernels are typically smaller than the input images, the feature extraction operation allows efficient edge detection, thereby reducing the amount of memory required to store those features. CNNs are suitable for implementation in photonic hardware since they demand fewer resources for matrix multiplication and memory usage. The linear operation performed by convolutions allows single feature extraction per kernel. Hence, many kernels are required to extract as many features as possible. For this reason, kernels are usually applied in blocks, allowing the network to extract many different features all at once and in parallel.

In feed-forward networks, it is typical to use a rectified linear unit (ReLU) activation function. Since ReLUs are piecewise linear functions that model an overall nonlinearity, they allow CNNs to be easily optimized during training. The pooling layer introduces a stage where a set of neighboring pixels is encompassed in a single operation. Typically, such an operation consists of applying a function that determines the maximum value among neighboring values; an average operation can be implemented likewise. These approaches describe max and average pooling, respectively. This statistical operation allows for a direct down-sampling of the image, since the dimensions of the object are reduced by a factor of two. With this step, we aim to make our network invariant and robust to small translations of the detected features.

The triplet convolution-activation-pooling is usually repeated several times for different kernels, keeping the pooling and activation functions invariant. Once all possible features are detected, the addition of a fully connected layer is required for the classification stage. This layer prepares and shows the solutions of the task.

CNNs are trained by changing the values of the kernels, analogous to how feed-forward neural networks are trained by changing the weighted connections [28]. The estimated kernel and weight values are required in the testing stage. In this work, this stage is performed by our on-chip DEAP CNN. Figure 7(b) shows a high-level overview of the proposed on-chip testing architecture. Here, the testing input values stored in the PC modulate the intensities of a group of lasers with identical powers but unique wavelengths. These modulated inputs would


be sent into an array of photonic weight banks, which would then perform the convolution for each channel. The kernels obtained in the training step are used to modulate these weight banks. Finally, the outputs of the weight banks would be summed using a voltage adder, which produces the convolved feature. The DEAP simulator works using the transfer functions of the MRRs, the through-port and drop-port summing equations at the balanced PDs, and the TIA gain term to simulate a convolution. The simulator assumes that the MRRs can only be controlled with 7 bits of precision, as has been empirically observed in a lab setting. The MRR self-coupling coefficient is set equal to the loss, r = a = 0.99 [29], in Eqs. (6), (10) and (11).

Figure 7. Block diagrams describing: (a) a typical CNN, which contains convolutions, activation functions, pooling and fully connected layers; in this case we exemplify the diagram with an MNIST-based recognition task that predicts the number 5; and (b) the DEAP architecture. The input image, kernel weights, and convolved features are stored in the computer (PC), along with the commands to implement the activation function off-chip. The input image, kernel weights and convolved features are transferred to the chip via DACs from the SDRAM. Then, the convolution is performed on-chip. Finally, the output is digitized via an ADC and stored in an SDRAM connected to the computer.

The interfacing of optical components with electronics would be facilitated by the use of digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), while the storage of outputs and retrieval of inputs would be achieved using GDDR SDRAM memories. The SDRAM is connected to a computer, where the information is already in a digital representation. There, the implementation of the ReLU nonlinearity and the reuse of the convolved feature to perform the next convolution can be carried out. The idea is to use the same architecture to implement the triplet convolution-activation-pooling in hardware.

In this work, we trained the CNN to perform image recognition on the MNIST dataset. The training stage uses the ADAM optimizer and the back-propagation algorithm to compute the gradient function. The optimized parameters to solve MNIST can be categorized in two groups: (i) two 5 × 5 × 8 different kernels, and (ii) two fully connected layers of dimensions 128 × 800 and 128 × 10, with their respective bias terms. These kernels are then defined by eight 5 × 5 different filters.

In the following, we use our DEAP CNN simulator to recognize new input images, obtained from a set of 500 images intended for the test step. Our simulator only works at the transfer level and does not simulate noise or distortion from analog components. The process of feature extraction performed by the DEAP CNN is illustrated in Fig. 8(a). As can be seen in the illustration, a 28 × 28 input image from the test dataset is filtered by the first 5 × 5 × 8 kernel, using stride one. The output of this process is a 24 × 24 × 8 convolved feature, with the ReLU activation function already applied. Following the same process, the second group of filters is applied to the convolved feature to generate the second output, i.e. a 20 × 20 × 8 convolved feature.

After the second ReLU is applied to the output, average pooling is utilized for invariance and down-sampling of the convolved features. The average pooling is implemented by a 2 × 2 kernel whose elements are all 1/4. However, the stride of one was kept; therefore the pooled feature has dimensionality 19 × 19 × 8. The down-sampling is implemented offline: from the 19 × 19 × 8 output, a simple algorithm extracts the elements that have even indexes. The result of this process is a 10 × 10 × 8 pooled output. Finally, the first fully connected layer is fed with the flattened version of the pooled object. The resultant vector feeds the last fully connected layer, where the result of the MNIST classification appears.

The results of the MNIST task solved by our simulated DEAP CNN are shown in Fig. 8(b).
The results of the MNIST task solved by our simulated DEAP CNN are shown in Fig. 8(b). For a test set of 500 images, we obtained an overall accuracy of 98%. This performance was compared to the results obtained using a standard two-layer CNN that includes a max pooling layer.


Figure 8. (a) An illustrative block diagram of the two-layer DEAP CNN solving MNIST. (b) Results of the MNIST task using a simulated DEAP CNN.

We found that this standard network achieves an overall accuracy of 98.6%. Therefore, we can conclude that our simulator is sufficiently robust despite the 7-bit precision assumed in the DEAP CNN simulation.
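The effect of this limited precision can be reproduced with a simple uniform quantizer. This is a hedged sketch; the exact quantization scheme used by the simulator may differ:

```python
import numpy as np

def quantize(w, bits=7):
    """Uniformly quantize an array to 2**bits levels over its own [min, max] range,
    mimicking an MRR weight that can only be set with `bits` bits of precision."""
    w = np.asarray(w, dtype=float)
    lo, hi = w.min(), w.max()
    if hi == lo:                      # constant array: nothing to quantize
        return w.copy()
    step = (hi - lo) / (2**bits - 1)
    return lo + np.round((w - lo) / step) * step
```

With 7 bits the worst-case rounding error is half a step, i.e. (max − min)/254, which is consistent with the simulated accuracy (98%) staying close to the full-precision baseline (98.6%).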

V. ENERGY AND SPEED ANALYSES

V.1. Energy Estimation

The energy used by a single DEAP convolutional unit depends on the R and D parameters. The 100-wavelength limitation for MRRs constrains the maximum R to be 10, as each multiplexed waveguide will carry R² signals. The number of MRRs used in the modulator array is equal to R²D, meaning that only certain D and R² values are allowed for a finite number of MRRs. Assuming that a maximum of 1024 MRRs can be manufactured in the modulator array, a convolutional unit can support a large kernel size with a limited number of channels (R = 10, D = 12) or a small kernel size with a large number of channels (R = 3, D = 113). We will consider both edge cases to get a range of energy consumption values. For the smaller convolution size, we will have R² lasers, R² MRRs and DACs in the modulator array, R²D MRRs and D TIAs in the weight bank array, and one ADC to convert back into a digital signal.

With 100 mW per laser, 19.5 mW per MRR, 26 mW per DAC, 17 mW per TIA [30] and 76 mW per ADC, we get a power usage of 112 W for the large kernel size and 95 W for the smaller kernel size. Therefore, we estimate a single convolution unit to use around 100 W when 1024 modulators are used to represent inputs.
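The MRR budget constraint and the power tally can be captured in a few lines. This is a hedged sketch: the component counts follow the simplified description above, so the totals are order-of-magnitude figures rather than a reproduction of the 95-112 W estimates, which depend on details of the full architecture:

```python
MRR_BUDGET = 1024  # assumed maximum number of MRRs in the modulator array

def max_channels(R, budget=MRR_BUDGET):
    """Largest channel count D such that the R**2 * D modulator MRRs fit the budget."""
    return budget // R**2

def unit_power_watts(R, D):
    """Rough power tally for one convolutional unit; per-component powers in mW
    are taken from the text, component counts are a simplified reading of it."""
    mw = (R**2 * 100             # lasers
          + R**2 * (19.5 + 26)   # modulator-array MRRs and their DACs
          + R**2 * D * 19.5      # weight-bank MRRs
          + D * 17               # TIAs
          + 76)                  # one ADC
    return mw / 1000.0
```

For R = 3, `max_channels` recovers the D = 113 edge case quoted above.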

V.2. DEAP Performance

The time it takes for light to propagate from the WDM input to just before the balanced PDs is estimated by the following equation:

t_prop = k · 2πr_MRR / c,    (23)

where c is the speed of light, 2πr_MRR is the circumference of the MRR and k is the number of MRRs. Assuming 100 MRRs with a radius of 10 μm [11, 31], the PWB has a propagation time of around 21 ps and a throughput of 1/t_prop ≈ 50 GS/s. The bottlenecks come from the fact that the balanced PDs have a throughput of 25 GS/s [30] and the TIA has a throughput of 10 GS/s [32]. An individual MRR can be modulated at speeds of 128 GS/s [31], meaning that the modulation frequency of the MRRs does not bottleneck the throughput of the PWB. The throughput of a PWB is around 5 GS/s. The DACs [33] and ADCs [34] both operate at 5 GS/s and support up to 7 bits. The GDDR6 SDRAM operates at 16 Gb/s per pin with a 256-bit bus [35]. Consequently, the speed of the system is limited by the throughput of the DACs/ADCs, resulting in DEAP producing a single convolved pixel at 5 GS/s, or t = 200 ps.

DeepBench [17] is an empirical dataset that records how long various types of GPUs took to perform a convolution for a given set of convolutional parameters. Table II contains the parameters used for each of these benchmarks, and Table III contains the power consumption of the benchmarked GPUs. The speeds of the various GPUs were taken directly from Ref. [17], while the speed of the DEAP convolution was estimated using the following equation:

t_runtime = 200 ps × (NK / n_conv) × (⌊(H − R)/S⌋ + 1)(⌊(W − R)/S⌋ + 1),    (24)

where n_conv is the number of DEAP convolutional units used in parallel. In some of the benchmarks the kernel edge lengths were not equal, hence the parameters Rw and Rh, which correspond to the width and height of the kernels. For each of the selected benchmarks, R²D ≤ 1024, meaning that the convolutional network is compatible with a DEAP implementation.

Table II. Benchmarking parameters for DEAP (W, H: input width and height; D: input channels; N: batch size; K: number of kernels; Rw, Rh: kernel width and height; S: stride)
W    H    D    N   K    Rw  Rh  S
700  161  1    4   32   5   20  2
112  112  64   8   128  3   3   1
7    7    832  16  256  1   1   1
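Eqs. (23) and (24) are easy to check numerically. The helper names below are hypothetical, and the 200 ps figure is the DAC/ADC-limited time per convolved pixel derived above:

```python
import math

C = 2.998e8  # speed of light, m/s

def t_prop(k=100, r_mrr=10e-6):
    """Eq. (23): propagation time through k MRRs of radius r_mrr (seconds)."""
    return k * 2 * math.pi * r_mrr / C

def t_runtime(W, H, D, N, K, Rw, Rh, S, n_conv=1, t_pixel=200e-12):
    """Eq. (24), generalized to unequal kernel edges Rw x Rh (seconds)."""
    out_pixels = ((H - Rh) // S + 1) * ((W - Rw) // S + 1)
    return t_pixel * (N * K / n_conv) * out_pixels
```

For 100 MRRs of 10 μm radius, `t_prop()` gives the ≈21 ps quoted above, and doubling `n_conv` halves the estimated runtime.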

Table III. Benchmarked GPUs with power consumption
GPU                      Power Usage (W)
AMD Vega FE [36]         375
AMD MI25 [37]            300
Tesla P100 [38]          250
NVIDIA GTX 1080 Ti [39]  250

Figure 9. Estimated DEAP convolutional runtime compared to actual GPU runtimes from DeepBench benchmarks.

The estimated DEAP runtimes using one and two convolutional units were plotted against the actual DeepBench runtimes in Fig. 9. From this, we can see that using two convolutional units performs slightly better than all the GPU benchmarks. While the mean GPU power consumption is 295 W, DEAP with a single convolutional unit uses about 110 W. Therefore, DEAP can perform convolutions between 1.4 and 7.0 times faster than the mean GPU runtime while using 0.37 times the energy. Using two convolutional units doubles the speed of DEAP, meaning that DEAP can be between 2.8 and 14 times faster than a conventional GPU while using almost 0.75 times the energy. DEAP with a single unit is expected to perform at a speed somewhat similar to the GPUs.

VI. CONCLUSION

We have proposed a photonic network, DEAP, suited for convolutional neural networks. DEAP was estimated to perform convolutions between 2.8 and 14 times faster than a GPU while using roughly 0.75 times the energy. A linear increase in processing speed corresponds to a linear increase in energy consumption, allowing DEAP to be as scalable as electronics. High-level software simulations have shown that DEAP is theoretically capable of performing a convolution, and we demonstrated that our DEAP CNN can solve the MNIST handwritten-digit recognition task with an overall accuracy of 98%. The largest bottleneck is the I/O interfacing with digital systems via DACs and ADCs; if photonic DACs [40] and ADCs [41] can be built with higher bit precisions, replacing the electronic components with optical ones could significantly decrease the runtime and make the speedup over GPUs even higher.

In order to realize a physical implementation, a number of issues still need to be solved. Packaging a silicon photonic chip with an electronic chip with a high I/O count is a challenging RF engineering task, but it is a central thrust in the roadmap for silicon photonic foundries [21]. There also needs to be control circuitry that routes the outputs of the SDRAM into the relevant DACs and from the ADCs into the SDRAM. Since we assume that the control circuitry can operate significantly faster than a memory access, we believe it will have a negligible impact on the overall throughput. Another issue is that DEAP processes data in the analog domain, whereas GPUs perform floating-point arithmetic. Though floating-point arithmetic does have some degree of error due to rounding in the mantissa, its errors are deterministic and predictable. The errors from photonics, on the other hand, are due to stochastic shot, spectral, Johnson-Nyquist and flicker noises, as well as quantization noise in the ADC and distortion from the RF signals applied to the modulators. However, artificially adding random noise to CNNs has been shown to reduce over-fitting [42], meaning that some degree of stochastic behaviour is tolerable in the domain of machine learning problems. Finally, MRRs have only been shown to have up to 7 bits of precision, which is significantly smaller than the range and precision supported by even half-precision (16-bit) floating-point representations.

In conclusion, photonics has the potential to perform convolutions at speeds faster than top-of-the-line GPUs while having a lower energy consumption. Moving forward, the greatest challenges to overcome have to do with increasing the precision of photonic components so that they are comparable to classical floating-point representations. Overall, silicon photonics has the potential to outperform conventional electronic hardware for convolutions while having the ability to scale up in the future.

ACKNOWLEDGMENT

Funding for B.J.S., B.A.M., H.B.M., and V.B. was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Queen's Research Initiation Grant (RIG).

[1] X. Li, G. Zhang, H. H. Huang, Z. Wang, and W. Zheng, in 2016 45th International Conference on Parallel Processing (ICPP) (2016) pp. 67–76.
[2] M. Jaderberg, A. Vedaldi, and A. Zisserman, CoRR abs/1405.3866 (2014), arXiv:1405.3866.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016).
[4] Y. LeCun and C. Cortes, (2010).
[5] F. Duport, A. Smerieri, A. Akrout, M. Haelterman, and S. Massar, Scientific Reports 6, 22381 (2016).
[6] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, Nature Communications 4, 1364 (2013).
[7] K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, Nature Communications 5, 3541 (2014).
[8] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, Optics Express 20, 3241 (2012).
[9] P. R. Prucnal and B. J. Shastri, Neuromorphic Photonics (CRC Press, Taylor & Francis Group, Boca Raton, FL, USA, 2017).
[10] A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, Journal of Lightwave Technology 32, 4029 (2014).
[11] A. N. Tait, A. X. Wu, T. F. de Lima, E. Zhou, B. J. Shastri, M. A. Nahmias, and P. R. Prucnal, IEEE Journal of Selected Topics in Quantum Electronics 22, 312 (2016).
[12] A. N. Tait, T. Ferreira de Lima, M. A. Nahmias, H. B. Miller, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, arXiv e-prints, arXiv:1812.11898 [physics.app-ph] (2018).
[13] T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, Optica 5, 864 (2018).
[14] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, Nat. Photonics 11, 441 (2017).
[15] T. F. de Lima, H. Peng, A. N. Tait, M. A. Nahmias, H. B. Miller, B. J. Shastri, and P. R. Prucnal, Journal of Lightwave Technology 37, 1515 (2019).
[16] A. Mehrabian, Y. Al-Kabani, V. J. Sorger, and T. A. El-Ghazawi, CoRR abs/1807.08792 (2018), arXiv:1807.08792.
[17] Baidu Research, DeepBench.
[18] V. Bangari, B. Marquez, H. Miller, and B. J. Shastri, DEAP, https://github.com/Shastri-Lab/DEAP (2019).
[19] G. Tan, L. Li, S. Triechle, E. Phillips, Y. Bao, and N. Sun, in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11 (ACM, New York, NY, USA, 2011) pp. 35:1–35:11.
[20] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, CoRR abs/1410.0759 (2014), arXiv:1410.0759.
[21] A. Rahim, T. Spuesens, R. Baets, and W. Bogaerts, Proceedings of the IEEE 106, 2313 (2018).
[22] M. A. Nahmias, B. J. Shastri, A. N. Tait, T. F. de Lima, and P. R. Prucnal, Opt. Photon. News 29, 34 (2018).
[23] W. Bogaerts, P. De Heyn, T. Van Vaerenbergh, K. De Vos, S. Kumar Selvaraja, T. Claes, P. Dumon, P. Bienstman, D. Van Thourhout, and R. Baets, Laser & Photonics Reviews 6, 47 (2012), arXiv:1208.0765.
[24] H. Zheng, R. Ma, and Z. Zhu, Analog Integrated Circuits and Signal Processing 90, 217 (2017).
[25] M. Lipson, Journal of Lightwave Technology 23, 4222 (2005).
[26] A. N. Tait, H. Jayatilleka, T. F. de Lima, P. Y. Ma, M. A. Nahmias, B. J. Shastri, S. Shekhar, L. Chrostowski, and P. R. Prucnal, Opt. Express 26, 26422 (2018).
[27] K. O'Shea and R. Nash, CoRR abs/1511.08458 (2015), arXiv:1511.08458.
[28] K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks (MIT Press, Cambridge, MA, USA, 1997).
[29] Y. Tan and D. Dai, Journal of Optics 20, 054004 (2018).
[30] Z. Huang, C. Li, D. Liang, K. Yu, C. Santori, M. Fiorentino, W. Sorin, S. Palermo, and R. G. Beausoleil, Optica 3, 793 (2016).
[31] J. Sun, R. Kumar, M. Sakib, J. B. Driscoll, H. Jayatilleka, and H. Rong, Journal of Lightwave Technology 37, 110 (2019).
[32] M. Atef and H. Zimmermann, Analog Integr. Circuits Signal Process. 76, 367 (2013).
[33] B. Sedighi, M. Khafaji, and J. C. Scheytt, International Journal of Microwave and Wireless Technologies 4, 275 (2012).
[34] J. Fang, S. Thirunakkarasu, X. Yu, F. Silva-Rivas, C. Zhang, F. Singor, and J. Abraham, IEEE Transactions on Circuits and Systems I: Regular Papers 64, 1673 (2017).
[35] Micron Technology, Inc., GDDR6 SGRAM MT61K256M32 8Gb: 2 channels x16/x8 GDDR6 SGRAM.
[36] Advanced Micro Devices, Inc., Radeon Vega Frontier Edition (liquid-cooled).
[37] Advanced Micro Devices, Inc., Radeon Instinct MI25 accelerator.
[38] NVIDIA Corporation, NVIDIA Tesla P100 GPU accelerator.
[39] NVIDIA Corporation, GeForce GTX 1080 Ti.
[40] F. Zhang, B. Gao, X. Ge, and S. Pan, Optical Engineering 55, 031115 (2015).
[41] M. A. Piqueras, P. Villalba, J. Puche, and J. Martí, in 2011 IEEE International Conference on Microwaves, Communications, Antennas and Electronic Systems (COMCAS 2011) (2011) pp. 1–6.
[42] Z. You, J. Ye, K. Li, and P. Wang, CoRR abs/1805.08000 (2018), arXiv:1805.08000.