All-Optical Machine Learning Using Diffractive Deep Neural Networks
Authors: Xing Lin1,2,3,†, Yair Rivenson1,2,3,†, Nezih T. Yardimci1,3, Muhammed Veli1,2,3, Mona Jarrahi1,3 and Aydogan Ozcan1,2,3,4,*

Affiliations:
1Electrical and Computer Engineering Department, University of California, Los Angeles, CA, 90095, USA
2Bioengineering Department, University of California, Los Angeles, CA, 90095, USA
3California NanoSystems Institute (CNSI), University of California, Los Angeles, CA, 90095, USA
4Department of Surgery, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
*Correspondence to: [email protected]
†These authors contributed equally to this work.

Abstract: We introduce an all-optical Diffractive Deep Neural Network (D2NN) architecture that can learn to implement various functions after deep learning-based design of passive diffractive layers that work collectively. We experimentally demonstrated the success of this framework by creating 3D-printed D2NNs that learned to implement handwritten digit classification and the function of an imaging lens at terahertz wavelengths. With the existing plethora of 3D-printing and other lithographic fabrication methods, as well as spatial light modulators, this all-optical deep learning framework can perform, at the speed of light, various complex functions that computer-based neural networks can implement. It will find applications in all-optical image analysis, feature detection, and object classification, and it also enables new designs for cameras and optical components that can learn to perform unique tasks using D2NNs.

Main Text

Deep learning is one of the fastest-growing machine learning methods of this decade (1). It uses multi-layered artificial neural networks implemented in a computer to digitally learn data representation and abstraction, and to perform advanced tasks with performance comparable, or even superior, to that of human experts. Recent examples where deep learning has made major advances in machine learning include medical image analysis (2), speech recognition (3), language translation (4), and image classification (5), among many others (1, 6). Beyond some of these mainstream applications, deep learning methods are also being used to solve inverse imaging problems (7-13).

Here we introduce an all-optical deep learning framework in which the neural network is physically formed by multiple layers of diffractive surfaces that work in collaboration to optically perform an arbitrary function that the network can statistically learn. We term this framework a Diffractive Deep Neural Network (D2NN) and demonstrate its learning capabilities through both simulations and experiments. A D2NN can be physically created by using several transmissive and/or reflective layers, where each point on a given layer represents an artificial neuron that is connected to the neurons of the following layers through optical diffraction (see Fig. 1A). Following Huygens' principle, our terminology is based on the fact that each point on a given layer acts as a secondary source of a wave, the amplitude and phase of which are determined by the product of the input wave and the complex-valued transmission or reflection coefficient at that point.
Therefore, an artificial neuron in a diffractive deep neural network is connected to other neurons of the following layer through a secondary wave that is modulated in amplitude and phase by both the input interference pattern created by the earlier layers and the local transmission/reflection coefficient at that point. As an analogy to standard deep neural networks (see Fig. 1D), one can consider the transmission/reflection coefficient of each point/neuron as a multiplicative "bias" term, a learnable network parameter that is iteratively adjusted during the training of the diffractive network using an error back-propagation method. After this numerical training phase, implemented in a computer, the D2NN design is fixed and the transmission/reflection coefficients of the neurons of all layers are determined. Once physically fabricated, using e.g., 3D-printing or lithography, this D2NN design can then perform, at the speed of light propagation, the specific task it was trained for, using only optical diffraction and passive optical components/layers, creating an efficient and fast way of implementing machine learning tasks.

To experimentally validate this all-optical deep learning framework, we used the terahertz (THz) part of the electromagnetic spectrum and a standard 3D-printer to fabricate different layers that collectively learn to perform unique functions using optical diffraction through D2NNs. In our analysis and experiments, we focused on transmissive D2NN architectures (Fig. 1) with phase-only modulation at each layer. For example, using five 3D-printed transmission layers, containing a total of 0.2 million neurons and ~8.0 billion connections trained using deep learning, we experimentally demonstrated the function of a handwritten digit classifier. This D2NN design formed a fully-connected network, achieved 91.75% classification accuracy on the MNIST dataset (14), and was also validated experimentally. Furthermore, based on the same D2NN framework, but using different 3D-printed layers, we experimentally demonstrated the function of an imaging lens using five transmissive layers with 0.45 million neurons stacked in 3D. This second 3D-printed D2NN was much more compact in depth, with 4 mm axial spacing between successive network layers, which is why it had a much smaller number of connections (<0.1 billion) among its neurons than the digit classifier D2NN, which had 30 mm axial spacing between its layers. While there are various other powerful methods to achieve handwritten digit classification and to design lenses, the main point of these results is the introduction of the diffractive neural network as an all-optical machine learning engine that is scalable and power-efficient, implementing various functions using passive optical components that present large degrees of freedom which can be learned through training data.

Optical implementation of learning in artificial neural networks is promising due to the parallel computing capability and power efficiency of optical systems (15-17). Being quite different in its physical operation principles from previous opto-electronics-based deep neural networks (15, 18-20), the D2NN framework provides a unique all-optical deep learning engine that efficiently operates at the speed of light using passive components and optical diffraction.
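As a concrete illustration of the numerical training phase described above, consider the following minimal sketch (Python/PyTorch) of optimizing the phase values of phase-only diffractive layers by error back-propagation. Everything here is an illustrative assumption rather than our actual implementation: the grid size, neuron pitch, wavelength, layer spacing, and loss function are arbitrary, and the FFT-based angular-spectrum propagator is a common numerical stand-in for the Rayleigh-Sommerfeld forward model adopted in this work.

```python
import torch

# Illustrative parameters (not the values used in this work)
n, num_layers = 64, 5          # neurons per side, number of diffractive layers
wavelength = 750e-6            # THz-range wavelength in meters (assumed)
pitch, dz = 400e-6, 30e-3      # neuron spacing and axial layer spacing (assumed)

# The only learnable parameters: one phase value per neuron per layer
phases = [torch.zeros(n, n, requires_grad=True) for _ in range(num_layers)]

# FFT-based angular-spectrum transfer function for one inter-layer gap;
# evanescent components are suppressed.
fx = torch.fft.fftfreq(n, d=pitch)
FX, FY = torch.meshgrid(fx, fx, indexing="ij")
arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
H = torch.where(
    arg > 0,
    torch.exp(1j * 2 * torch.pi * dz / wavelength * torch.sqrt(arg.clamp(min=0))),
    torch.zeros(n, n, dtype=torch.cfloat),
)

def propagate(field):
    """Free-space propagation of a complex field over one layer gap."""
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

def forward(field):
    """Phase-only modulation at each layer, diffraction in between,
    and intensity detection at the output plane."""
    for phi in phases:
        field = propagate(field * torch.exp(1j * phi))
    return field.abs() ** 2

optimizer = torch.optim.Adam(phases, lr=0.01)
inp = torch.ones(n, n, dtype=torch.cfloat)   # plane-wave illumination
target = torch.zeros(n, n)
target[24:40, 24:40] = 1.0                   # toy target intensity pattern

for step in range(200):
    optimizer.zero_grad()
    loss = torch.mean((forward(inp) - target) ** 2)
    loss.backward()                          # error back-propagation
    optimizer.step()                         # update the phase "biases"
```

In a task such as digit classification, the toy intensity target above would be replaced by detector regions assigned to the different classes; once training converges, the phase maps are fixed and fabricated as passive layers.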
Benefiting from various high-throughput and wide-area 3D fabrication methods used for engineering micro- and nano-scale structures, this approach can scale up the number of neurons and layers of the network to cover, e.g., tens of millions of neurons and hundreds of billions of connections over large-area optical components, and can optically implement various deep learning tasks involving, e.g., image analysis (2), feature detection (21), or object classification (22). Furthermore, it can be used to design optical components with new functionalities, and might lead to new camera or microscope designs that can perform various unique imaging tasks that can be statistically learned through diffractive neural networks. As we discuss later in the manuscript, reconfigurable D2NNs can also be implemented using various transmission- or reflection-based spatial light modulators, providing a flexible optical neural network design that can learn an arbitrary function, be used for transfer learning, and further improve or adjust its performance as needed by, e.g., new data or user feedback.

Results

D2NN Architecture. The architecture of a D2NN is depicted in Fig. 1. Although a D2NN can be implemented in transmission or reflection mode by using multiple layers of diffractive surfaces, without loss of generality we focus here on coherent transmissive networks with phase-only modulation at each layer, where each layer is approximated as a thin optical element. In this case, each layer of the D2NN modulates the wavefront of the transmitted field through the phase values (i.e., biases) of its neurons. Following the Rayleigh-Sommerfeld diffraction equation, one can consider every single neuron of a given D2NN layer as a secondary source of a wave that is composed of the following optical mode (23):

$w_i^l(x, y, z) = \frac{z - z_i}{r^2} \left( \frac{1}{2\pi r} + \frac{1}{j\lambda} \right) \exp\left( \frac{j 2\pi r}{\lambda} \right)$,  (1)

where $l$ represents the $l$-th layer of the network, $i$ represents the $i$-th neuron of layer $l$, located at $(x_i, y_i, z_i)$, $\lambda$ is the illumination wavelength, $r = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}$, and $j = \sqrt{-1}$. The amplitude and relative phase of this secondary wave are determined by the product of the input wave to the neuron and its transmission coefficient ($t$), both of which are complex-valued functions. Based on this, for the $l$-th layer of the network, one can write the output function ($n_i^l$) of the $i$-th neuron located at $(x_i, y_i, z_i)$ as:

$n_i^l(x, y, z) = w_i^l(x, y, z) \cdot t_i^l(x_i, y_i, z_i) \cdot \sum_k n_k^{l-1}(x_i, y_i, z_i) = w_i^l(x, y, z) \cdot |A| \cdot e^{j \Delta\theta}$,  (2)

where we define $m_i^l(x_i, y_i, z_i) = \sum_k n_k^{l-1}(x_i, y_i, z_i)$ as the input wave to the $i$-th neuron of layer $l$, $|A|$ refers to the relative amplitude of the secondary wave, and $\Delta\theta$ refers to the additional phase delay that the secondary wave encounters due to the input wave to the neuron and its transmission coefficient. These secondary waves diffract between the layers and interfere with each other, forming a complex wave at the surface of the next layer that feeds its neurons. The transmission coefficient of a neuron is composed of amplitude and phase terms, i.e., $t_i^l(x_i, y_i, z_i) = a_i^l(x_i, y_i, z_i) \exp\left(j \phi_i^l(x_i, y_i, z_i)\right)$, and since we focus on a phase-only D2NN architecture, the amplitude $a_i^l(x_i, y_i, z_i)$ is assumed to be a constant, ideally 1, ignoring the optical losses, a topic that we expand on in the Discussion section.
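To make Eqs. (1)-(2) concrete, the following minimal sketch evaluates them numerically for a single pair of layers: the secondary-wave weights of Eq. (1) are assembled into a dense matrix, each neuron applies its phase-only transmission coefficient, and Eq. (2) carries the modulated field to the next layer. The grid size, neuron pitch, and wavelength are illustrative assumptions; only the 30 mm layer spacing matches a value quoted above, and the pitch-squared area factor is one possible discretization choice.

```python
import numpy as np

# Illustrative discretization (not the parameters used in this work)
wavelength = 750e-6   # THz-range wavelength in meters (assumed)
pitch = 400e-6        # neuron spacing (assumed)
n = 32                # neurons per side, kept small so W fits in memory
dz = 30e-3            # 30 mm axial layer spacing, as for the digit classifier

# Transverse neuron coordinates, identical on both layers
coords = (np.arange(n) - n / 2) * pitch
xx, yy = np.meshgrid(coords, coords)
dx = xx.ravel()[:, None] - xx.ravel()[None, :]
dy = yy.ravel()[:, None] - yy.ravel()[None, :]
r = np.sqrt(dx**2 + dy**2 + dz**2)   # distance from each source to each target

# Eq. (1): secondary-wave weight w_i from every neuron of layer l to every
# point of layer l+1; here z - z_i = dz for all neuron pairs.
W = (dz / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
    * np.exp(1j * 2 * np.pi * r / wavelength)
W *= pitch**2   # area element of each secondary source (discretization choice)

# Phase-only transmission coefficients t_i = exp(j*phi_i), amplitude fixed
# at 1; in a trained D2NN these phases would come from the learning phase.
phi = np.random.uniform(0.0, 2.0 * np.pi, n * n)
t = np.exp(1j * phi)

# Eq. (2): each neuron sums the waves reaching it (its input m_i), multiplies
# by t_i, and re-radiates; W carries the result to the next layer's surface.
m = np.ones(n * n, dtype=complex)   # plane-wave input to layer l
field_next = W @ (t * m)            # complex field feeding layer l+1
```

Chaining this step across the layers, with the intensity read out at a detector plane after the final layer, reproduces the forward model behind the training sketch given earlier.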