

SILICON PHOTONICS FOR ARTIFICIAL INTELLIGENCE APPLICATIONS


Bicky A. MARQUEZ1,*, Matthew J. FILIPOVICH1, Emma R. HOWARD1, Viraj BANGARI1, Zhimu GUO1, Hugh D. MORISON1, Thomas FERREIRA DE LIMA2, Alexander N. TAIT2,3, Paul R. PRUCNAL2, Bhavin J. SHASTRI1,2
1 Department of Physics, Engineering Physics & Astronomy, Queen’s University, Kingston, ON K7L 3N6, Canada
2 Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA
3 Applied Physics Division, National Institute of Standards and Technology, Boulder, CO 80305, USA
*[email protected]

Artificial intelligence enabled by neural networks has found applications in many fields (e.g. medicine, finance, autonomous vehicles). Software implementations of neural networks on conventional computers are limited in speed and energy efficiency. Neuromorphic engineering aims to build processors in which the hardware mimics the neurons and synapses of the brain for distributed and parallel processing. Neuromorphic engineering enabled by silicon photonics can offer sub-nanosecond latencies and can extend the domain of artificial intelligence applications to high-performance computing and ultrafast learning. We discuss current progress and the challenges these demonstrations face in scaling to practical systems for training and inference.

https://doi.org/10.1051/photon/202010440

Analog computing has recently been considered a potential avenue to decrease the energy and time requirements for executing algorithms such as deep neural networks. Analog special-purpose hardware requires the manufacturing of machines that physically model each individual component of such networks. This proves to be a significant challenge as current deep networks scale up to thousands or even billions of neurons to solve complex artificial intelligence (AI) tasks. To enable the use of analog machines to map brain circuitry, the functions of biological neurons must be modelled. The most common neural models are spiking artificial neurons and perceptrons. While spiking artificial neurons are biologically realistic, the field of AI is currently perceptron-based [1]. Perceptrons implement multiply-accumulate (MAC) operations. MAC operations serve to quantify the number of multiplications and additions required to run deep networks. A perceptron of M inputs can perform M MAC operations per time step. Multiple MAC operations can be executed in parallel to implement any type of artificial neural network (ANN). MACs are currently the most burdensome hardware bottleneck in ANNs; for instance, the deep network AlexNet requires 724 million MACs to process a single ImageNet image [2].

The photonic platform is currently one of the most promising technologies to tackle the expensive calculations performed by deep networks. Silicon photonics offers high scalability, high bandwidth, a low footprint, and low energy consumption [3]. The high-bandwidth and multiwavelength parallel properties of light allow for optical information processing at a high data rate. The ability of neuromorphic photonic systems to provide a substantial improvement in our computing capabilities is moving ever closer, with, potentially, PetaMAC/second/mm² processing speeds.

In this article, we describe a photonic scheme that can perform parallel MAC operations on-chip and introduce two photonic platforms that allow for AI acceleration: i) a special-purpose photonic architecture for executing the direct feedback alignment (DFA) algorithm for neural network training [6], and ii) an implementation of a Long Short-Term Memory (LSTM) neural network [7].


Both proposed designs offer fundamental speed and bandwidth advantages over digital electronic implementations.

BACKGROUND: BRAINS AND COMPUTATION

Digital computers are typically computing systems that perform logical and mathematical operations with high accuracy. Nowadays, such complex systems significantly outweigh human capabilities for calculation and memory. Nevertheless, if we were to compare a human agent with a digital machine, we would see that many abstractions must be made to perform a one-to-one comparison. Such abstractions assume that human cognitive processes are completely procedural and follow standard logic. However, most human cognitive acts do not follow a set of well-defined instructions. Therefore, a one-to-one mapping between human and digital computers might not be suitable.

Analog neuromorphic computing approaches might be better suited to mimic cognitive processes. The goal is to create a one-to-one mapping between the neural system and the analog machine, where each biological quantity is modelled by an equivalent analog artificial model. For an architecture such as the human brain, this could be a demanding requirement. The human brain contains approximately 100 billion neurons and 100 trillion synaptic interconnections that must be represented in an artificial machine. However, a subset of the brain circuitry can still be represented in an artificial machine to simulate some of the human cognitive processes.

Recently, the most significant advances in the field of AI have been achieved using the perceptron, shown in Fig. 1, as the artificial model of the neuron. The output y of the neuron represents the signal sent from the axon of a biological neuron and is mathematically described by

y = f(W · x + b).

The inputs x_i transmit information to the neuron through the weights W_i, which correspond to the strengths of the synapses. The summation of all weighted inputs and their transformation via the activation function f are associated with the physiological role of the neuron’s cell body. The bias b represents an extra variable that remains in the system even if the other inputs are absent.

ANNs are built using perceptrons as neural primitives such that the synaptic connections are either positive or negative, mimicking excitatory and inhibitory neural behaviour. A nonlinear activation function can be used to define activated and deactivated behaviours in artificial neurons. ANNs can be categorized as either feedforward neural networks (where connections between neurons do not form a cycle) or recurrent neural networks (where cycles exist).

Figure 1. Schematic diagram of a perceptron.
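As a concrete illustration of the perceptron equation above, here is a minimal numerical sketch in Python with NumPy (the sigmoid choice of f, the bias value, and all variable names are our own illustrative assumptions, not taken from the text); it computes y = f(W · x + b) and makes the M-MAC cost explicit:

```python
import numpy as np

def perceptron(x, w, b):
    """y = f(w . x + b): weighted addition followed by a nonlinear
    activation f (here a sigmoid). The dot product over M inputs
    costs exactly M multiply-accumulate (MAC) operations."""
    z = np.dot(w, x) + b              # M multiplications, M additions
    return 1.0 / (1.0 + np.exp(-z))   # activation f

M = 4
x = np.array([0.5, 1.0, -0.3, 0.8])  # inputs (signals from upstream neurons)
w = np.array([0.2, -0.5, 0.7, 0.1])  # weights (synaptic strengths, signed for
                                     # excitatory/inhibitory connections)
y = perceptron(x, w, b=0.1)
print(f"y = {y:.3f} at a cost of {M} MACs per time step")
```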


Attempts to build fast and efficient perceptron-based ANNs have been reported throughout recent years. An interesting computing-acceleration technique consists in using hardware units to perform MAC operations at high speeds. A MAC unit performs multiplication and accumulation processes: (a + w · x). Multiple MAC operations can be run in parallel to perform complex operations such as convolutions and digital filters. MACs are typically used in implementations of ANNs in digital electronics [4]. Nevertheless, the serialization of the summands to perform weighted addition makes this process inefficient; consequently, chip designers are looking for alternative solutions such as full parallelism. One of the most promising technologies for this purpose is based on the photonic platform.

PHOTONIC PERCEPTRON AND MAC OPERATIONS

A scalable photonic architecture that implements parallel MACs can be achieved using on-chip wavelength division multiplexing (WDM) techniques [8]. This design uses microring resonators (MRRs) [9], i.e. photonic synapses, to encode input values and weights onto multiple wavelength signals. Tuning a given MRR on and off resonance changes the transmission of each signal through the respective filter, effectively multiplying the signal by a desired weight. An advantage of using MRRs is the ability to tune the weight values using a variety of different methods: thermally, electro-optically, or through light absorption in phase-change [5] or graphene materials. In this work, tuning is performed by thermally modifying the refractive index of the MRR waveguide. The application of voltage values to the heater allows us to map real-valued numbers onto the device.

Figure 2. Add-drop MRR weight bank with a balanced photodetector implementing M element-wise multipliers to perform N MAC operations in parallel.

An array of M MRRs can emulate the weighted addition of a single neuron if add-drop MRRs and a balanced photodetector are incorporated into the model, as shown in Fig. 2. In this illustration we show how to perform M MAC operations in parallel in photonics. Input values to the neuron can be mapped to voltage values V_i that tune each individual MRR(x_i). Each voltage value has a one-to-one correspondence with an MRR transmission profile T_i, and the same principle holds for the weight values. The experimental implementation of this method requires the use of M lasers with different wavelengths λ_i (with i = 1, …, M) that represent M channels. Two MRRs with different on- and off-resonance configurations at the same wavelength λ_1 will therefore perform an element-wise multiplication, as shown in Fig. 3.

Figure 3. (a) Transmission versus wavelength curves of two different MRRs (MRR(x_1), MRR(W_1)) performing an element-wise optical multiplication, and (b) the product of such multiplication.

Here, we show an illustration of the multiplication between two transmission elements, x_1 and W_1, yielding the resulting value R. In Fig. 3(a), the element x_1 is tuned to have the maximum optical transmission, whereas W_1 is tuned to half the maximum. To implement x_1, MRR(x_1) is set on-resonance with λ_1, and MRR(W_1) is tuned to be half off-resonance at the same wavelength. They represent the real-valued numbers 1 and 0.5, respectively. The result of this multiplication, shown in Fig. 3(b), is R = 0.5. A similar process is followed with the remaining pairs (MRR(x_i), MRR(W_i)) for i > 1. Once the weighted addition is performed using a balanced photodetector, an on-chip nonlinear function can be added by using a microring modulator.
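The element-wise multiplication of Fig. 3 can be reproduced with a toy numerical model. The sketch below is our own idealization (a Lorentzian drop-port line shape standing in for the true add-drop MRR transfer function; the wavelength, linewidth, and detuning values are arbitrary): cascading MRR(x_1) and MRR(W_1) on the same channel multiplies their transmissions, recovering R = 0.5.

```python
import numpy as np

def drop_port_transmission(wl, resonance, fwhm=0.1):
    """Idealized Lorentzian drop-port response of an MRR filter:
    unity transmission on resonance, falling off with detuning."""
    return 1.0 / (1.0 + ((wl - resonance) / (fwhm / 2)) ** 2)

wl_1 = 1550.0  # carrier wavelength lambda_1 in nm (arbitrary choice)

# MRR(x1) set on-resonance: transmission ~ 1 encodes x1 = 1
x1 = drop_port_transmission(wl_1, resonance=1550.0)

# MRR(W1) detuned by half a linewidth: transmission 0.5 encodes W1 = 0.5
W1 = drop_port_transmission(wl_1, resonance=1550.0 + 0.05)

# Two filters in series on the same channel multiply their transmissions
R = x1 * W1
print(f"x1 = {x1:.2f}, W1 = {W1:.2f}, R = {R:.2f}")  # -> 1.00, 0.50, 0.50
```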


Based on this scheme, we can design systems to solve many complex AI tasks. In the following sections, we will describe how to efficiently implement ANN training and inference on photonic chips.

APPLICATIONS

To implement ANNs on photonic chips, we stack N element-wise multipliers that perform weighted additions, as shown in Fig. 4. The N × M input values received from digital-to-analog converters (DACs) modulate the intensities of a group of M lasers with identical powers but unique wavelengths. These modulated inputs are sent into an array of N × M photonic weight banks (uploaded from the DACs), which then perform the multiplications for each channel. This architecture is a general representation of the multiwavelength platform, as it can be used for inference, as demonstrated in [8], as well as for in situ training.

ON-CHIP NEURAL NETWORK TRAINING

Benefiting from the speed and energy advantages of photonics over traditional digital computers, the DFA training algorithm can be implemented in situ on silicon photonic hardware [6]. The DFA algorithm trains ANNs by propagating the error through fixed random feedback connections directly from the output to each hidden layer. The DFA algorithm has been used to train ANNs on the MNIST, CIFAR-10, and CIFAR-100 datasets, and yields performance comparable to the popular backpropagation training algorithm [10].
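For readers unfamiliar with DFA, the sketch below is our own minimal NumPy rendition of the update rule from [10], not the photonic implementation of [6]; the network sizes, tanh nonlinearity, and squared-error loss are illustrative choices. The key difference from backpropagation is that the hidden layer's error signal arrives through a fixed random matrix B rather than through the transpose of the forward weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 8, 16, 4, 0.1

W1 = rng.normal(0, 0.1, (n_hid, n_in))   # forward weights (trained)
W2 = rng.normal(0, 0.1, (n_out, n_hid))  # output weights (trained)
B = rng.normal(0, 0.1, (n_hid, n_out))   # fixed random feedback (never trained)

x = rng.normal(size=n_in)                # one training example
target = rng.normal(size=n_out)

h = np.tanh(W1 @ x)                      # hidden layer with tanh nonlinearity
y = W2 @ h                               # network output
e = y - target                           # output-layer error

# DFA: project the output error straight to the hidden layer through B,
# where backpropagation would instead use W2.T @ e
dh = (B @ e) * (1.0 - h ** 2)            # tanh derivative
W2 -= lr * np.outer(e, h)                # gradient-style weight updates
W1 -= lr * np.outer(dh, x)
```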


Figure 4. The input and kernel values modulate the MRRs via electrical currents proportional to those values.
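Behaviourally, the weight-bank array of Fig. 4 computes a matrix-vector product in a single pass. The toy model below is our own abstraction (it ignores all optical physics and simply treats each wavelength channel as a vector element and each weight-bank row plus its balanced photodetector as one output neuron), making the N × M MAC count explicit:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 8  # N weight-bank rows (neurons), M wavelength channels

# Input modulators: the laser on wavelength lambda_i carries input x_i
x = rng.uniform(0.0, 1.0, M)

# MRR weight banks: effective transmissions in [-1, 1]; negative weights
# correspond to routing light to the opposite photodiode of a balanced pair
W = rng.uniform(-1.0, 1.0, (N, M))

# Each balanced photodetector sums its row's weighted channels, so the
# full array executes all N x M MACs in one pass
y = W @ x
print(f"{N * M} MACs per pass -> outputs {np.round(y, 3)}")
```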

A DFA photonic integrated circuit can be designed with two connected blocks with M = 10 and N = 100. This design could perform 2000 MACs per pass, enabling weight updates between two layers of 1000 neurons in 1000 passes.

LONG SHORT-TERM MEMORY NEURAL NETWORK

Similar to the DFA circuit, LSTM networks [11] can also be implemented using the multiwavelength photonic architecture [7]. An LSTM network is a recurrent architecture that offers advantages for time-series processing. Neuromorphic photonic LSTMs offer a solution to the growing demand for high-speed, high-bandwidth neural networks in time-series applications, including video processing, autonomous driving, and optical communications. The performance of the photonic LSTM for inference tasks was tested by applying the network to a simple univariate time-series problem in simulation. The simulation of this task demonstrates that even very small photonic LSTM networks, performing up to 64 MACs per pass, can be highly effective at inference on time-series data.

CONCLUSION

Neuromorphic photonics promises exciting developments for the future of AI. In an effort to extend the bounds of digital computers for AI applications, the high-bandwidth operation and full programmability of analog photonic integrated circuits can facilitate ultrafast learning and inference in ANNs. Current implementations of photonic machines face complex technical challenges that many research groups and companies have begun addressing, including the control of the processing unit and efficient memory access. Successful solutions to these problems could enable the wide-scale adoption of photonic processors to tackle practical AI applications.

REFERENCES

[1] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (The MIT Press, 2016)
[2] V. Sze, Y.-H. Chen, T.-J. Yang et al., Proc. IEEE 105, 2295 (2017)
[3] P.R. Prucnal, B.J. Shastri, Neuromorphic Photonics (CRC Press, 2017)
[4] M.A. Nahmias, T. Ferreira de Lima, A.N. Tait et al., IEEE J. Sel. Top. Quantum Electron. 26, 1 (2020)
[5] J. Feldmann, N. Youngblood, C.D. Wright et al., Nature 569, 208 (2019)
[6] M.J. Filipovich, Z. Guo, B.A. Marquez et al., in Proc. IEEE Photon. Conf. (IPC), paper TuA3.2 (2020)
[7] E.R. Howard, B.A. Marquez, B.J. Shastri, in Proc. IEEE Photon. Conf. (IPC), paper TuA3.2 (2020)
[8] V. Bangari et al., IEEE J. Sel. Top. Quantum Electron. 26, 7701213 (2020)
[9] A.N. Tait, T. Ferreira de Lima, E. Zhou et al., Sci. Rep. 7, 7430 (2017)
[10] A. Nøkland, Adv. Neural Inf. Process. Syst. 29, 1037 (2016)
[11] S. Hochreiter, J. Schmidhuber, Neural Comput. 9, 1735 (1997)
