Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-Cnns) Viraj Bangari , Bicky A
Total Page:16
File Type:pdf, Size:1020Kb
IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, VOL. 26, NO. 1, JANUARY/FEBRUARY 2020 7701213 Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs) Viraj Bangari , Bicky A. Marquez, Member, IEEE, Heidi Miller , Alexander N. Tait , Member, IEEE, Mitchell A. Nahmias , Thomas Ferreira de Lima , Student Member, IEEE, Hsuan-Tung Peng , Paul R. Prucnal, Life Fellow, IEEE, and Bhavin J. Shastri , Senior Member, IEEE Abstract—Convolutional Neural Networks (CNNs) are powerful bottlenecks is computing the matrix multiplication required for and highly ubiquitous tools for extracting features from large forward propagation. In fact, over 80% of the total processing datasets for applications such as computer vision and natural time is spent on the convolution [1]. Therefore, techniques that language processing. However, a convolution is a computationally expensive operation in digital electronics. In contrast, neuromor- improve the efficiency of even forward-only propagation are in phic photonic systems, which have experienced a recent surge of high demand and researched extensively [2], [3]. interest over the last few years, propose higher bandwidth and In this work, we present a complete digital electronic and energy efficiencies for neural network training and inference. Neu- analog photonic (DEAP) architecture capable of performing romorphic photonics exploits the advantages of optical electronics, highly efficient CNNs for image recognition. The competitive including the ease of analog processing, and busing multiple signals on a single waveguide at the speed of light. Here, we propose a MNIST handwriting dataset [4] is used as a benchmark test for Digital Electronic and Analog Photonic (DEAP) CNN hardware our DEAP CNN. At first, we train a standard two-layer CNN architecture that has potential to be 2.8 to 14 times faster while us- offline, after which network parameters are uploaded to the ing almost 25% less energy than current state-of-the-art graphical DEAP CNN. Our scope is limited to the forward propagation, but processing units (GPUs). includes power and speed analyses of our proposed architecture. Index Terms—Deep learning, machine learning, neuromorphic Due to their speed and energy efficiency, photonic neural net- photonics, photonic neural networks, convolutional neural network works have been widely investigated from different approaches (CNN). that can be grouped into three categories: (1) reservoir com- puting [5]–[8]; reconfigurable architectures based on (2) ring- I. INTRODUCTION resonators [9]–[12], and (3) Mach-Zehnder interferometers [13], HE success of convolutional neural networks (CNNs) [14]. Reservoir computing in the discrete photonic domain T for large-scale image recognition has stimulated research successfully implement neural networks for fast information in developing faster and more accurate algorithms for their processing, however the predefined random weights of their use. However, CNNs are computationally intensive and hidden layers cannot be modified [8]. therefore results in long processing latency. One of the primary An alternative approach uses silicon photonics to design fully programmable neural networks [15], using a so-called Manuscript received April 18, 2019; revised August 14, 2019; accepted broadcast-and-weight protocol [10]–[12]. This scheme is based August 31, 2019. Date of publication October 4, 2019; date of current version on wavelength-division multiplexing (WDM) and is capable of November 4, 2019. The work of V. Bangari, B. A. Marquez, and B. J. Shastri implementing reconfigurable, recurrent and feedforward neu- was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Queen’s Research Initiation Grant (RIG). The work ral network models, using a bank of tunable silicon micror- of B. A. Marquez was also supported by the 2019 Queens Postdoctoral Fund. ing resonators (MRRs) that recreate on-chip synaptic weights. (Corresponding authors: Viraj Bangari; Bhavin J. Shastri.) Therefore, such a protocol allows it to emulate physical neurons. V. Bangari, B. A. Marquez, and H. Miller are with the Department of Physics, Engineering Physics & Astronomy, Queen’s University, Kingston, Mach-Zehnder interferometers (MZIs) have been also used to ON KL7 3N6, Canada (e-mail: [email protected]; [email protected]; model synaptic-like connections of physical neurons [14]. Fully [email protected]). reconfigurable photonic architectures based on MZI arrays are A. N. Tait was with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA. He is now with the Physical Measurement introduced in the work of Bagherian et al. [16]. This architecture Laboratory, National Institute of Standards and Technology (NIST), Boulder, uses constructive or destructive interference effects in MZIs to CO 80305 USA (e-mail: [email protected]). implement a matrix-vector operation on photonic signals. This M. A. Nahmias, T. Ferreira de Lima, H.-T. Peng, and P. R. Prucnal are with the Department of Electrical Engineering, Princeton University, Prince- approach is limited to using a single wavelength of light, and ton, NJ 08544 USA (e-mail: [email protected]; [email protected]; being all-optical, the architecture must grapple with both am- [email protected]; [email protected]). plitude and phase, with challenges of phase noise accumulation B. J. Shastri is with the Department of Physics, Engineering Physics & Astronomy, Queen’s University, Kingston, ON KL7 3N6, Canada, and also with from one nonlinear stage to another. the Department of Electrical Engineering, Princeton University, , Princeton, NJ The advantage of the broadcast-and-weight approach over the 08544 USA (e-mail: [email protected]). coherent approach is that it has already demonstrated fan-in, Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org. inhibition, time-resolved processing, and autaptic cascadabil- Digital Object Identifier 10.1109/JSTQE.2019.2945540 ity [12]. The DEAP-CNN architecture is therefore designed 1077-260X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 7701213 IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, VOL. 26, NO. 1, JANUARY/FEBRUARY 2020 using banks of tunable MRRs. The DEAP-CNN design is compatible with mainstream silicon photonic device platforms. Consequently, this approach leverages the advances in silicon photonics that have recently progressed to the level of sophis- tication required for large-scale integration. Furthermore, this proposed architecture allows the implementation of multi-layer networks to implement the deep learning framework. Inspired by the work of Mehrabian et al. [17], which lays out a potential architecture for photonic CNNs, our design goes a step further by considering specific details about a complete physical implementation, as well as an example of how an algorithm for tasks such as MNIST can be mapped to photonics. In particular, our approach exploits the use of multi-dimensional kernels and inputs, and the summation of multi-channel inputs that leads us to implement on-chip general photonic convolutions. Fur- thermore, we break the limitations on weights being between −1 and +1 by introducing transimpedance amplifiers (TIAs), Fig. 1. Schematic illustration of a convolution. At the top of the figure, an input image is represented as a matrix of numbers with dimensionality H × W × D which is a unique feature of our proposed DEAP-accelerator, where H, W ,andD are the height, width, and depth of the image, respectively. and has not been considered in related recent works [18]. Each element Ai,j of A represents the intensity of a pixel at that particular spatial F R × R × D This work is divided in five sections: Following this location. The kernel is a matrix with dimensionality , where each element Fi,j is defined as a real number. The kernel is slid over the image by introduction, in Section (II), we describe convolutions as used using a stride S equal to one. As the image has multiple channels (or depth) in the field of signal processing. Then, we introduce sili- D, the same kernel is applied to each channel. Assuming H = W , the overall (H − R +1)2 con photonic devices to perform convolutions in photonics. output dimensionality is . The bottom of the figure shows how a convolution operation generalized into a single matrix-matrix multiplication. Section (III) introduces a hardware inspired algorithm to per- where the kernel F is transformed into a vector F with DR2 elements, and form such full photonic convolutions. In Section (IV), we utilize the image A is transformed into a matrix A of dimensionality DR2 × (H − R +1)2 (H − R +1) our previously described architecture to build a two-layer DEAP . Therefore, the output is represented by a vector with elements. CNN for MNIST handwritten digit recognition. Finally, in Section (V), we show an energy-speed benchmark test, where we compare the performance of DEAP with the empirical dataset Using matrix slicing notation, Eq. (2) can be represented as a DeepBench [19]. Note, we have made the high level simulator dot product of two vectorized matrices: and mapping tool written in Python for the DEAP architecture T m∈[i,i+R] T publicly available [20]. Oi,j = vec(F ) · vec((Am,n)n∈[j,j+R]) . (3) A convolution reduces the dimensionality of the