
BACK TO BASICS

ARTIFICIAL INTELLIGENCE: FROM ELECTRONICS TO OPTICS

Sylvain GIGAN1,3,*, Florent KRZAKALA2,3, Laurent DAUDET3, Igor CARRON3
1 Laboratoire Kastler Brossel, ENS-Université PSL, CNRS, Sorbonne Université, Collège de France, Paris, France
2 Laboratoire de Physique de l'École Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
3 LightOn, 2 rue de la Bourse, Paris, France
*[email protected]

Machine learning and big data are currently revolutionizing our way of life, in particular with the recent emergence of deep learning. Powered by CPUs and GPUs, they are currently hardware-limited and extremely energy-intensive. Optics, either integrated or in free space, offers a very promising alternative for performing machine learning tasks optically at high speed and low power consumption. We here review the history and current state of the art of optical computing and optical machine learning.

https://doi.org/10.1051/photon/202010449

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Since the dawn of micro-electronics and the emergence of the laser, both optics and electronics platforms have been competing for information processing and transmission. While electronics has been overwhelmingly dominating computing for the last 50 years thanks to Moore's law, optics and photonics have been increasingly dominant for communications, from long-distance communications with optical fibers to optical interconnects in data centers. Machine learning, which also originated in the 1950s, has seen tremendous developments in the last decade. The emergence of deep neural networks has become the de facto standard for big data analysis and many of the tasks that we today consider normal: from voice recognition to translation, from image analysis to future self-driving cars. However, machine learning's progress requires exponentially increasing resources, be it in memory, raw computing power, or energy consumption. We introduce in this article the basics of neural networks, and see how this new architecture shatters the status quo and provides optics a new opportunity to shine in computing, whether in free space or in integrated photonic circuits.

Optical computing. Classical computing, such as the one running on our PCs, is based on the so-called Von Neumann architecture laid out in the 1940s, where a program is stored in a memory, and instructions are read and executed on a processor, while inputs and outputs are exchanged with the memory through a communication bus. This architecture has been basically unchanged since its inception, and has only improved thanks to the progress of microelectronics and nanolithography, which allows the feature sizes of components to shrink to 7 nanometers or less nowadays. This has tremendously diminished the Ohmic losses and the energy consumption, down to a few pJ per operation, and allowed the operating clock frequencies of the components to reach several GHz. Component density has driven the number of transistors on a processor to several tens of billions, while driving its cost down. This is the well-known Moore's law, leading to the observation that a good desktop PC nowadays has a processing power of several TeraFlops (10¹² floating-point operations per second).
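
As a toy illustration of this stored-program principle, the short Python sketch below (purely hypothetical, not any real instruction set) emulates a minimal Von Neumann machine: program and data sit in the same memory, and every instruction and every operand transits through the same read/write interface, the communication bus whose limits reappear later in this article as the "Von Neumann bottleneck".

```python
# Minimal, purely illustrative Von Neumann machine (hypothetical instruction set).
# Program and data live in the same memory; every access goes through
# read()/write(), standing in for the shared communication bus.

MEMORY = [
    ("LOAD", 10),   # acc <- mem[10]
    ("ADD",  11),   # acc <- acc + mem[11]
    ("STORE", 12),  # mem[12] <- acc
    ("HALT", None),
    None, None, None, None, None, None,
    3, 4, 0,        # data: mem[10]=3, mem[11]=4, mem[12]=result
]

def read(addr):             # every fetch crosses the "bus"
    return MEMORY[addr]

def write(addr, value):
    MEMORY[addr] = value

def run():
    acc, pc = 0, 0
    while True:
        op, arg = read(pc)  # instruction fetch
        pc += 1
        if op == "LOAD":
            acc = read(arg) # data fetch
        elif op == "ADD":
            acc += read(arg)
        elif op == "STORE":
            write(arg, acc)
        elif op == "HALT":
            return acc

print(run())   # 7, also stored back into MEMORY[12]
```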


Optics has several advantages compared to electronics: its intrinsic parallelism, its almost unlimited bandwidth, and the ability to perform some transformations (such as a Fourier transform) by simple propagation, compared to electronics [1]. Thus, optics has from the very start been considered as a viable alternative for analog computing. In the eighties, the emergence of optical non-linearities, semiconductor lasers and optical memories gave hope that optics might be used to build an all-purpose computing platform. Alas, progress in optics failed to match Moore's law's exponential pace, and the hope of building such an all-purpose optical computer was abandoned in the nineties [2]. Still, optics found numerous applications in storage, and of course in telecommunications, both in long-distance links with optical fibers and, more recently, in interconnects.

Figure 1. Some historical examples of optical computing. Left: the 1972 Tilted Plane Optical Processor used for all-optical synthetic aperture radar image processing (from Kozma et al., Applied Optics 11.8 (1972): 1766-1777). Right: a vector-matrix multiplier with optical feedback (from Psaltis et al., Optics Letters 10.2 (1985): 98-100). Reprinted with permission from © The Optical Society.

Neural networks. In parallel, a computing paradigm resolutely different from conventional programming also emerged in the 1950s: artificial neural networks (ANN), on which all modern artificial intelligence is based. It is (loosely) inspired from the structure and behaviour of the brain, where neurons are connected to each other in very complex networks, and where the response of a neuron can be triggered in a complex and non-linear way by the electric influx it receives from many other neurons. Artificial neural networks are similarly made of "neurons" or nodes, that integrate signals from other neurons, with various weights, and emit a resulting signal based on a non-linear activation function. This signal is, in turn, fed to a number of other neurons. The network also includes input and output neurons, that either receive or send information to and from the outside world. Just like the brain, a neural network can be made to "learn", i.e. be optimized for a given task, by adjusting its weights; for instance, being fed with images at the input, it can learn to classify them into categories. The analogy with the brain stops there: while the brain counts approximately 80 billion neurons and 100 trillion connections, ANNs have to be limited to far fewer neurons and weights, and to much simpler architectures, in order to make the training of the network possible. Several typical architectures have been developed over the last decades, to maximize efficiency on a given task while keeping the training of the neural network computationally tractable. Most of the time, neurons are organized in layers of various sizes (number of neurons) and connectivity. Architectures range from the simplest networks, such as the perceptron (a single layer linking N inputs to a single output), which was one of the earliest ANNs, to multi-layered feedforward neural networks (where neurons are organized in successive layers and information is passed from layer to layer), to recurrent networks (where information can flow backwards and be fed back to previous layers). The connections from one layer to the next can be very sparse, as in convolutional layers, or dense (all-to-all connected). The performance of an ANN depends on its structure: for instance, a perceptron is good for simple linear classification, but more complex tasks require more complex network structures.

Figure 2. Structure of an artificial neural network. Left: an artificial "neuron", comprising several input values and one or many outputs resulting from a non-linear combination of the inputs. Center and right: two popular ANN architectures.
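
To make this vocabulary concrete, here is a minimal NumPy sketch of the forward pass just described (layer sizes and weights are arbitrary, for illustration only): each "neuron" computes a weighted sum of its inputs and applies a non-linear activation, stacking such layers gives a feedforward network, and a single layer with one output is the perceptron.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, activation=np.tanh):
    """One layer of 'neurons': weighted sum of the inputs + non-linear activation."""
    return activation(W @ x + b)

# Arbitrary sizes for illustration: 4 inputs -> 3 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = rng.normal(size=4)                  # input "neurons"
h = layer(x, W1, b1)                    # hidden layer
y = layer(h, W2, b2, activation=lambda z: 1 / (1 + np.exp(-z)))  # output in (0, 1)

# The perceptron is the degenerate case: a single layer, N inputs, one output.
perceptron = layer(x, rng.normal(size=(1, 4)), np.zeros(1), activation=np.sign)
print(y, perceptron)
```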


Deep learning. While artificial intelligence saw good progress, until the early 2000s its overall performance on day-to-day tasks remained modest and did not find any clear real-life applications. This changed tremendously in the last two decades, thanks to the emergence of a powerful architecture: deep learning, and its corollary networks, known as deep neural networks. Deep neural networks are layered networks with a large number of "hidden" layers. Pioneers such as Yann LeCun, Yoshua Bengio and Geoffrey Hinton have shown that deep neural networks have the ability to solve highly complex problems [3]: in essence, while the first layers pre-process the input information (for instance contours in images, or words in text), deeper layers gradually distil more abstract concepts, such as identifying an object or extracting the sense of a text. Nowadays, deep learning has demonstrated unprecedented performance at tasks that we only recently believed would be forever out of reach of machines, from beating the best player at the game of Go, to self-driving cars, to language translation, to give just a few salient examples. Such deep networks have grown to unbelievable sizes, with up to tens of billions of parameters (weights) to be trained. Thus, a key enabling concept that has allowed deep learning to scale to large sizes is the ability to train such large networks efficiently: the back-propagation algorithm, a concept perfectly matched to deep architectures, where the network can be trained layer by layer, from the last to the first, with a gradient-descent algorithm. Thanks to these advances, machine learning has gained the ability to make sense of complex and very large-scale information; this is sometimes coined "big data".
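
A minimal sketch of this training loop, on a made-up toy regression task, is given below (NumPy, for illustration only): the error is propagated backwards through the layers with the chain rule, from the last layer to the first, and every weight is updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = sin(x) on a few points (purely illustrative).
X = np.linspace(-2, 2, 64).reshape(-1, 1)
Y = np.sin(X)

W1, b1 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    # Forward pass, layer by layer.
    H = np.tanh(X @ W1 + b1)          # hidden layer
    P = H @ W2 + b2                   # linear output layer
    loss = np.mean((P - Y) ** 2)

    # Backward pass: gradients flow from the last layer to the first.
    dP = 2 * (P - Y) / len(X)         # dLoss/dP
    dW2 = H.T @ dP
    db2 = dP.sum(axis=0)
    dH = dP @ W2.T                    # propagate the error to the hidden layer
    dZ = dH * (1 - H ** 2)            # through the tanh non-linearity
    dW1 = X.T @ dZ
    db1 = dZ.sum(axis=0)

    # Gradient-descent update of every weight.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"final mean-squared error: {loss:.4f}")
```
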
GPUs and CPUs. The rise of deep learning and big data has been mostly powered by Moore's law, allowing the training and inference of very large neural networks. An important factor driving deep learning has been the transition to Graphics Processing Units (GPUs). Initially designed for computer graphics, these specialized processors are optimized for the parallel processing of large vectors and matrices. For neural networks, where training and inference require a vast number of such multiplications, GPUs turned out to be much more powerful than CPUs (Central Processing Units) and are now ubiquitous; incidentally, NVIDIA, the leader in GPUs for deep learning, now has a capitalization on par with Intel. However, GPUs and CPUs are still enormously power-hungry: it has been shown that training a single neural network can use as much energy as 5 cars over their lifetime and, more globally, big data and data centers already account for an estimated 4% of our energy consumption, a share that may grow to over half of our energy consumption in the next decade if nothing changes. Meanwhile, Moore's law is officially stalling: nanolithography and transistors are reaching their physical limits, and progress in consumption and speed is getting much slower [4]. Worse, the implementation of neural networks on both CPUs and GPUs suffers from the so-called "Von Neumann bottleneck": the bus transferring data between memory and computing units ultimately limits performance.
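
Stepping back to the arithmetic itself, the reason GPUs fit neural networks so well is that a layer applied to a whole batch of inputs is just one large matrix-matrix product. A small NumPy illustration (arbitrary sizes): looping over the samples and the single batched product give the same numbers, but the batched form is the one that massively parallel hardware, whether a GPU or the optical processors discussed below, executes efficiently.

```python
import numpy as np

rng = np.random.default_rng(3)

batch, n_in, n_out = 256, 1024, 512       # arbitrary illustrative sizes
X = rng.normal(size=(batch, n_in))        # a batch of input vectors
W = rng.normal(size=(n_in, n_out))        # one layer's weight matrix

# Sample-by-sample processing: many small matrix-vector products.
y_loop = np.stack([x @ W for x in X])

# Batched processing: one large matrix-matrix product, the operation
# that parallel hardware is built to accelerate.
y_batch = X @ W

assert np.allclose(y_loop, y_batch)
print(y_batch.shape)                      # (256, 512)
```
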

The dawn of optical Machine Learning. To overcome this fundamental problem, some non-conventional computing hardware has been introduced, called "neuromorphic", where circuits directly emulate the connectivity and functions of a neural network, instead of running a program on a CPU or GPU. This approach, which broadly belongs to non-Von Neumann architectures, should be much more energy efficient and fast. Of all the possible implementations of neuromorphic computing, optics and photonics stand out, with unique advantages. First, light can propagate virtually without loss or heating, whether in free space, in many materials, or in integrated waveguides. This propagation can be used to emulate the connectivity between two neural layers, but also convolutions, etc. Second, photons do not naturally interact, meaning that it is possible to multiplex information, and that power consumption is independent of the operating frequency. Finally, thanks to tremendous progress in optoelectronics, detectors (from fast photodiodes to CMOS cameras), modulators (from fast integrated electro-optic modulators to spatial light modulators) and sources (lasers) are extremely efficient and can be mass-produced. The semiconductor industry naturally provides the backbone to produce photonic integrated circuits. In short, optics has several key advantages to implement neural networks in a nearly ideal way. Still, optics faces several challenges, in particular the difficulty of achieving non-linearities in hidden layers, or the challenge of scaling and tuning networks with integrated optics, preventing it, to date, from providing a truly versatile platform for deep learning. Yet, optics can provide a very solid alternative in specialized implementations, from ultrafast small-scale networks, to convolutions and pre-processing in imaging, to reservoir computing (a type of recurrent neural network with fixed weights). After pioneering works in the 80s and 90s, many impressive advances have been reported in academia in the last decade, and industry has also shown a renewed interest, whether within big companies or through start-up creations.

Figure 3. LightOn's optical processor. Left: scheme of the random-projection principle; information is encoded on a spatial light modulator, a random matrix multiplication is achieved by passing through a disordered material, and the result is read off a camera sensor. Right: example of an advanced machine learning task, here the automatic detection of conformational changes of a large molecule in molecular dynamics calculations (here on a SARS-CoV-2 molecule, responsible for the COVID-19 disease) [5].

An example, LightOn. As an illustration of how optics can benefit machine learning, LightOn (the company we co-founded in 2016) has proposed a solution to perform optical machine learning, based on our experience in free-space light propagation in complex media. In essence, we currently provide very large-scale random matrix multiplication (corresponding to a dense, all-to-all connectivity) between millions of inputs (spatial light modulator pixels) and millions of outputs (camera pixels). Able to operate at several kHz, this corresponds to several Peta-operations per second (typical of supercomputers), with a matrix size that could not even be stored in the memory of a conventional computer, and with a consumption of a few tens of Watts. While apparently very specific, the operation we propose can be useful in many data processing applications, from inference to training [5], or even molecular dynamics (see Fig. 3). In fact, these random multiplications can be seen as universal compression engines, with performance guarantees that are well matched to the very statistical nature of modern machine learning. Of course, this is just one approach to optical machine learning, and other approaches, based on free space or integrated optics, fixed or tunable weights, linear or non-linear effects, shallow or deep architectures, also propose various solutions to accelerate machine learning and support its future growth.
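
The operation is easy to emulate numerically. The NumPy sketch below (toy dimensions, a simulation only, not LightOn's actual hardware or software interface) mimics the chain of Fig. 3: the data are encoded as a binary pattern on the "spatial light modulator", the disordered medium acts as a fixed random complex matrix (its transmission matrix), and the "camera" records intensities only; the squared modulus conveniently plays the role of a non-linearity, so the resulting random features can feed a simple linear classifier or a compression stage.

```python
import numpy as np

rng = np.random.default_rng(42)

n_in, n_out = 1000, 500   # toy sizes; the real device couples millions of pixels

# Fixed complex Gaussian matrix standing in for the transmission matrix of the
# disordered medium: it is never stored or trained, only applied, which is what
# the optical hardware does physically.
T = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2 * n_in)

def optical_random_features(x_binary):
    """Emulate SLM encoding -> scattering -> camera: y = |T x|^2 (intensities only)."""
    return np.abs(T @ x_binary) ** 2

# A batch of random binary "SLM patterns" (in practice, binarized input data).
X = (rng.random(size=(32, n_in)) > 0.5).astype(float)
features = np.stack([optical_random_features(x) for x in X])

print(features.shape)   # (32, 500): dense, all-to-all random projection of every input
```
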
CONCLUSION
In conclusion, we have presented a historic perspective on optical computing and shown that, after having failed at proposing an all-purpose computing platform in the 20th century, optics and photonics have more recently emerged as very appealing solutions for hybrid hardware implementations of neural networks, able to sustain the growth in computing power and supersede electronics beyond Moore's law. Optical neural networks have recently rebooted the interest in optical computing, and we believe this is just the beginning.

RÉFÉRENCES
[1] J.W. Goodman, Opt. Photonics News 2, 11 (1991)
[2] R. Athale, D. Psaltis, Opt. Photonics News 27, 32 (2016)
[3] Y. LeCun, Y. Bengio, G. Hinton, Nature 521, 436 (2015)
[4] M.M. Waldrop, Nature 530, 144 (2016)
[5] LightOn white paper, "Photonic Computing for Massively Parallel AI", https://lighton.ai/wp-content/uploads/2020/05/LightOn-White-Paper-v1.0-S.pdf
