Attojoule Scale Computation of Large Optical Neural Networks

by

Alexander Sludds

B.S., Massachusetts Institute of Technology (2018)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, June 2019.

© Massachusetts Institute of Technology 2019. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 18, 2019
Certified by: Dirk Englund, Associate Professor, Thesis Supervisor
Accepted by: Katrina LaCurts, Chair, Master of Engineering Thesis Committee

Abstract

The ultra-high bandwidth and low energy cost of modern photonics offer many opportunities for improving both the speed and energy efficiency of classical information processing. Recently, a new architecture has been proposed which allows for substantial energy reductions in matrix-matrix products by utilizing balanced homodyne detection for computation and optical fan-out for data delivery. In this thesis I work towards the analysis and implementation of both analog and digital optical neural networks. For analog optical neural networks, I discuss the physical implementation of this system as well as the limits imposed on it by shot noise, crosstalk, and electro-optic/opto-electronic information conversion. From these results, it is found that femtojoule-scale energy consumption per multiply-and-accumulate operation is achievable in the near term, with further energy gains foreseeable with emerging technology. This thesis also presents a system-scale throughput and energy analysis of digital optical neural networks, which can enable very high data speeds (> 10 GHz bandwidth per transmitter) with CMOS-compatible voltages and weight-transmitter power dissipation comparable to a modern CPU.

Thesis Supervisor: Dirk Englund
Title: Associate Professor

Acknowledgments

I would first like to thank my thesis advisor, Professor Dirk Englund. His consistent vision and encouragement have enabled the work in this thesis as well as my substantial personal development.

I would also like to thank my collaborators Dr. Ryan Hamerly and Liane Bernstein. I believe we make a great team, and I am always inspired by the unique thought processes and ideas that both of you bring, as well as your patience.

I would like to thank Professors Vivienne Sze and Joel Emer for their advice and support. They have taught me to think deeply about what makes a good benchmark and how to weigh system-level tradeoffs, which has in turn helped me think about good next experimental steps for pushing existing benchmarks.

I would like to thank the other lab members who have gifted their expertise and experience in many subjects to aid this research. In particular, I would like to thank Christopher Panuski for his wealth of knowledge in applied integrated optics, Christopher Foy and Mohamed Ibrahim for their knowledge of modern CMOS processes and their integration with photonics, and Ian Christen for his experience in integrated optical processes.
I would like to thank the National Science Foundation, which has awarded me a Graduate Research Fellowship. With this funding and support I will seek out positive contributions I can make through research and teaching to better our world.

Finally, I must express my very profound gratitude to my parents and to my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Contents

1 Introduction
2 Motivation: Neural Networks and their Computational Complexity
  2.1 Fully-Connected Neural Networks
    2.1.1 Back Propagation
  2.2 Measures of Neural Network Success
  2.3 Convolutional Neural Networks (CNNs)
  2.4 Computational Complexity of Fully-Connected Neural Networks
    2.4.1 Strassen Method
    2.4.2 Coppersmith-Winograd Algorithm
  2.5 Computational Complexity of CNNs
    2.5.1 Fast Fourier Transform for Convolution
3 Analog Optical Neural Networks
  3.1 System Architecture
  3.2 Energy Analysis and Standard Quantum Limit
    3.2.1 Analysis of Shot Noise
    3.2.2 Analysis of Thermal Noise
    3.2.3 Analysis of Crosstalk
    3.2.4 Standard Quantum Limit
  3.3 Lowering the Standard Quantum Limit through Hardware-Aware Training
  3.4 A Discussion of the Landauer Limit
  3.5 System Design
    3.5.1 Modeling Digital Micromirror Device (DMD) Diffraction Patterns
    3.5.2 Polarization and Star Configuration
    3.5.3 Current Status of Experiment
4 Digital Optical Neural Networks
  4.1 Digital ONN Device Architecture
  4.2 Experiment
    4.2.1 Results
  4.3 Discussion
    4.3.1 Transmitter Source Energy Consumption
    4.3.2 Receiver Energy Consumption
    4.3.3 Discretization Scheme
    4.3.4 Energy Efficiency
    4.3.5 Speed of Computation
  4.4 Conclusion
5 Conclusion and Outlook

List of Figures

2-1 Fully-Connected Neural Network Architecture
2-2 Convolutional Neural Network Architecture
2-3 Toeplitz Matrix Conversion Scheme
2-4 Patch Matrix Method
3-1 Our Analog ONN Architecture
3-2 The ratio of shot noise to thermal noise in our architecture
3-3 Data batching on our ONN architecture
3-4 (A) The definite integral over an incident Gaussian beam, representing the fraction of power absorbed by a photodetector. (B) An incident Gaussian mode bleeding over into neighboring photodetectors; note the non-unity fill factor of the receiver array. (C) The generated crosstalk matrix.
3-5 Inference and training in the presence of crosstalk
3-6 A comparison of shot noise for several neural network models
3-7 A plot of DRAM access energy for our architecture
3-8 A comparison of shot noise during training and inference
3-9 A high-level overview of our experimental setup
3-10 DMD diffraction pattern imaged onto a camera with interference
3-11 DMD device specifications
3-12 Simulated DMD far-field diffraction pattern, all mirrors on
3-13 Experimental DMD diffraction pattern, all mirrors on
3-14 Simulated DMD diffraction pattern, every other row off
3-15 Experimental measurement of the far-field DMD diffraction pattern, every other row off
3-16 Imaged zero-order mode from the DMD onto a camera
3-17 Star configuration for imaging the DMD onto a camera, with polarizing beamsplitters shown
4-1 A description of our digital ONN architecture
4-2 A description of our experimental setup
4-3 A plot of SNR and BER for a digital ONN with 1 MHz bandwidth per transmitter
4-4 A plot of SNR and BER for a digital ONN with 100 MHz bandwidth per transmitter
4-5 A plot of SNR and BER for a digital ONN with 10 GHz bandwidth per transmitter

List of Tables

4.1 Digital ONN Experimental Results
4.2 The two figures of merit for the optical architecture: the energy consumption per MAC and the latency of a matrix-matrix product. The latency is discussed in Subsection 4.3.5.

Chapter 1

Introduction

Modern computing is the study and creation of machines especially tailored to processing, combining, and analyzing vast amounts of data. Recently, there has been substantial interest in building computers that can efficiently perform the computations required for machine learning. Machine learning has become an indispensable statistical tool that has substantially advanced the fields of machine vision [1], game playing [2], [3], and healthcare diagnostics [4], to name a few. In recent years, deep learning hardware has trended toward increasing specialization, moving from CPU-based systems to Application-Specific Integrated Circuits (ASICs) optimized for machine learning [5], [6]. One reason is that the end of Moore's law and Dennard scaling, together with Amdahl's law [7], [8], limits the ability of CMOS to continue scaling the energy efficiency and throughput of general-purpose computation. However, even ASICs face a critical problem that limits their scaling: copper is an extremely lossy material. The bandwidth required by modern computing pushes the material limits of copper, so interconnect and memory-access energies become the limiting factors in computation [9], [10]. For this reason, memory access is the dominant factor in the energy consumption of modern machine learning hardware [11]. To overcome this problem, optical neural networks have been developed, combining the high bandwidth and communication efficiency of optics with the cost effectiveness, scalability, and computational efficiency of CMOS electronics.

Past research from our group resulted in the design and experimental testing of a photonic integrated circuit capable of multiplying by arbitrary 8×8 unitary matrices at a 100 kHz reprogramming speed [12]. This circuit was shown to have high fidelity on vowel recognition tasks, making it a unique proof-of-concept system. This platform may in the future allow for low-power reprogrammable photonic information processing, similar to what an FPGA provides for electronic signal processing. Other recent research in the field has exploited the novel properties of photonics for computation, such as the explicit implementation of brain-inspired computation through optical spiking neural networks [13], passive computation with millimeter-wave-scale diffractive optical neural networks [14], and advanced dataflow implementations with reservoir computation [15].
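To make the role of the per-MAC photon budget concrete, the short Python sketch below models a shot-noise-limited optical matrix-vector product of the kind analyzed in Chapter 3. It is a simplified illustration, not the implementation from this thesis: shot noise is approximated as additive Gaussian noise whose variance scales inversely with an assumed mean photon number per MAC, detection is otherwise ideal, and the vector sizes and photon budgets are arbitrary choices for demonstration.

import numpy as np

rng = np.random.default_rng(0)

def optical_matvec(W, x, photons_per_mac):
    # Ideal matrix-vector product plus a Gaussian approximation to shot noise.
    # Each output accumulates W.shape[1] multiply-accumulates; the independent
    # per-MAC noise terms (RMS ~ 1/sqrt(photons_per_mac) in signal units)
    # add in quadrature across the accumulation.
    y = W @ x
    n_macs = W.shape[1]
    sigma = np.sqrt(n_macs / photons_per_mac)
    return y + rng.normal(0.0, sigma, size=y.shape)

n = 1000                                   # MACs per output neuron (assumed)
W = rng.normal(size=(10, n)) / np.sqrt(n)  # normalized random weight matrix
x = rng.normal(size=n)                     # random input activation vector
y_ideal = W @ x

for photons in (1e2, 1e4, 1e6):
    y = optical_matvec(W, x, photons)
    rel_err = np.linalg.norm(y - y_ideal) / np.linalg.norm(y_ideal)
    print(f"{photons:9.0f} photons/MAC -> relative output error {rel_err:.3f}")

In this toy model the output error falls as the square root of the photon (and hence energy) budget per MAC, which is the qualitative content of the standard quantum limit discussed in Chapter 3: attojoule- to femtojoule-scale operation trades directly against achievable numerical precision.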
