
BACK TO BASICS

ARTIFICIAL INTELLIGENCE: FROM ELECTRONICS TO OPTICS

Sylvain GIGAN1,3,*, Florent KRZAKALA2,3, Laurent DAUDET3, Igor CARRON3
1 Laboratoire Kastler Brossel, ENS-Université PSL, CNRS, Sorbonne Université, Collège de France, Paris, France
2 Laboratoire de Physique de l'École Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
3 LightOn, 2 rue de la Bourse, Paris, France
*[email protected]

Machine learning and big data are currently revolutionizing our way of life, in particular with the recent emergence of deep learning. Powered by CPUs and GPUs, they are currently hardware-limited and extremely energy-intensive. Optics, either integrated or in free space, offers a very promising alternative for performing machine learning tasks optically at high speed and low power consumption. We here review the history and current state of the art of optical computing and optical machine learning.

https://doi.org/10.1051/photon/202010449

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Since the dawn of micro-electronics and the emergence of the laser, both optics and electronics platforms have been competing for information processing and transmission. While electronics has been overwhelmingly dominating computing for the last 50 years thanks to Moore's law, optics and photonics have been increasingly dominant for communications, from long-distance communications with optical fibers to optical interconnects in data centers. Machine learning, which also originated in the 1950s, has seen tremendous developments in the last decade. The emergence of deep neural networks has become the de facto standard for big data analysis and many of the tasks that we today consider normal: from voice recognition to translation, from image analysis to future self-driving cars. However, machine learning's progress requires exponentially increasing resources, be it in memory, raw computing power, or energy consumption. We introduce in this article the basics of neural networks, and see how this new architecture shatters the status quo and provides optics a new opportunity to shine in computing, whether in free space or in integrated photonic circuits.

Optical computing. Classical computing, such as the one running on our PCs, is based on the so-called Von Neumann architecture laid out in the 1940s, where a program is stored in a memory, and instructions are read and executed on a processor, while inputs and outputs are exchanged with the memory through a communication bus. This architecture has been basically unchanged since its inception, and has only improved thanks to the progress of microelectronics and nanolithography, which allows the feature sizes of components to shrink to 7 nanometers or less nowadays. This has tremendously diminished the Ohmic losses and the energy consumption, down to a few pJ per operation, and allowed the operating clock frequencies of the components to reach several GHz. Component density has driven the number of transistors on a processor to several tens of billions, while driving its cost down. This is the well-known Moore's law, leading to the observation that a good desktop PC nowadays has a processing power of several TeraFlops (10¹² floating-point operations per second).
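
As a toy illustration of this stored-program principle, the short Python sketch below (purely hypothetical, not any real instruction set) emulates a minimal Von Neumann machine: program and data sit in the same memory, and every instruction and every operand transits through the same read/write interface, the communication bus whose limits reappear later in this article as the "Von Neumann bottleneck".

```python
# Minimal, purely illustrative Von Neumann machine (hypothetical instruction set).
# Program and data live in the same memory; every access goes through
# read()/write(), standing in for the shared communication bus.

MEMORY = [
    ("LOAD", 10),   # acc <- mem[10]
    ("ADD",  11),   # acc <- acc + mem[11]
    ("STORE", 12),  # mem[12] <- acc
    ("HALT", None),
    None, None, None, None, None, None,
    3, 4, 0,        # data: mem[10]=3, mem[11]=4, mem[12]=result
]

def read(addr):             # every fetch crosses the "bus"
    return MEMORY[addr]

def write(addr, value):
    MEMORY[addr] = value

def run():
    acc, pc = 0, 0
    while True:
        op, arg = read(pc)  # instruction fetch
        pc += 1
        if op == "LOAD":
            acc = read(arg) # data fetch
        elif op == "ADD":
            acc += read(arg)
        elif op == "STORE":
            write(arg, acc)
        elif op == "HALT":
            return acc

print(run())   # 7, also stored back into MEMORY[12]
```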


Optics has several advantages compared to electronics: its intrinsic parallelism, its almost unlimited bandwidth, and the ability to perform some transformations (such as a Fourier transform) by simple propagation, compared to electronics [1]. Thus, optics has from the very start been considered as a viable alternative for analog computing. In the eighties, the emergence of optical non-linearities, semiconductor lasers and optical memories gave hope that optics might be used to build an all-purpose computing platform. Alas, progress in optics failed to match Moore's law's exponential pace, and the hope of building such an all-purpose optical computer was abandoned in the nineties [2]. Still, optics found numerous applications in storage, and of course in telecommunications, both in long-distance links with optical fibers and, more recently, in interconnects.

Figure 1. Some historical examples of optical computing. Left: the 1972 Tilted Plane Optical Processor used for all-optical synthetic aperture radar image processing (from Kozma et al., Applied Optics 11.8 (1972): 1766-1777). Right: a vector-matrix multiplier with optical feedback (from Psaltis et al., Optics Letters 10.2 (1985): 98-100). Reprinted with permission from © The Optical Society.

Neural networks. In parallel, a computing paradigm resolutely different from conventional programming also emerged in the 1950s: artificial neural networks (ANN), on which all modern artificial intelligence is based. It is (loosely) inspired from the structure and behaviour of the brain, where neurons are connected to each other in very complex networks, and where the response of a neuron can be triggered in a complex and non-linear way by the electric influx it receives from many other neurons. Artificial neural networks are similarly made of "neurons" or nodes, that integrate signals from other neurons, with various weights, and emit a resulting signal based on a non-linear activation function. This signal is, in turn, fed to a number of other neurons. The network also includes input and output neurons, that either receive or send information to and from the outside world. Just like the brain, a neural network can be made to "learn", i.e. be optimized for a given task, by adjusting its weights; for instance, being fed with images at the input, it can learn to classify them into categories. The analogy with the brain stops there: while the brain counts approximately 80 billion neurons and 100 trillion connections, ANNs have to be limited to far fewer neurons and weights, and to much simpler architectures, in order to make the training of the network possible. Several typical architectures have been developed over the last decades, to maximize efficiency on a given task while keeping the training of the neural network computationally tractable. Most of the time, neurons are organized in layers of various sizes (number of neurons) and connectivity. Architectures range from the simplest networks, such as the perceptron (a single layer linking N inputs to a single output), which was one of the earliest ANNs, to multi-layered feedforward neural networks (where neurons are organized in successive layers and information is passed from layer to layer), to recurrent networks (where information can flow backwards and be fed back to previous layers). The connections from one layer to the next can be very sparse, as in convolutional layers, or dense (all-to-all connected). The performance of an ANN depends on its structure: for instance, a perceptron is good for simple linear classification, but more complex tasks require more complex network structures.

Figure 2. Structure of an artificial neural network. Left: an artificial "neuron", comprising several input values and one or many outputs resulting from a non-linear combination of the inputs. Center and right: two popular ANN architectures.
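
To make this vocabulary concrete, here is a minimal NumPy sketch of the forward pass just described (layer sizes and weights are arbitrary, for illustration only): each "neuron" computes a weighted sum of its inputs and applies a non-linear activation, stacking such layers gives a feedforward network, and a single layer with one output is the perceptron.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, activation=np.tanh):
    """One layer of 'neurons': weighted sum of the inputs + non-linear activation."""
    return activation(W @ x + b)

# Arbitrary sizes for illustration: 4 inputs -> 3 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = rng.normal(size=4)                  # input "neurons"
h = layer(x, W1, b1)                    # hidden layer
y = layer(h, W2, b2, activation=lambda z: 1 / (1 + np.exp(-z)))  # output in (0, 1)

# The perceptron is the degenerate case: a single layer, N inputs, one output.
perceptron = layer(x, rng.normal(size=(1, 4)), np.zeros(1), activation=np.sign)
print(y, perceptron)
```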


Deep learning. While artificial intelligence saw good progress, until the early 2000s its overall performance on day-to-day tasks remained modest and did not find any clear real-life applications. This changed tremendously in the last two decades, thanks to the emergence of a powerful architecture: deep learning, and its corollary networks, known as deep neural networks. Deep neural networks are layered networks with a large number of "hidden" layers. Pioneers such as Yann LeCun, Yoshua Bengio and Geoffrey Hinton have shown that deep neural networks have the ability to solve highly complex problems [3]: in essence, while the first layers pre-process the input information (for instance contours in images, or words in text), deeper layers gradually distil more abstract concepts, such as identifying an object or extracting the sense of a text. Nowadays, deep learning has demonstrated unprecedented performance at tasks that we only recently believed would be forever out of reach of machines, from beating the best player at the game of Go, to self-driving cars, to language translation, to give just a few salient examples. Such deep networks have grown to unbelievable sizes, with up to tens of billions of parameters (weights) to be trained. Thus, a key enabling concept that has allowed deep learning to scale to large sizes is the ability to train such large networks efficiently: the back-propagation algorithm, a concept perfectly matched to deep architectures, where the network can be trained layer by layer, from the last to the first, with a gradient-descent algorithm. Thanks to these advances, machine learning has gained the ability to make sense of complex and very large-scale information; this is sometimes coined "big data".
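
A minimal sketch of this training loop, on a made-up toy regression task, is given below (NumPy, for illustration only): the error is propagated backwards through the layers with the chain rule, from the last layer to the first, and every weight is updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = sin(x) on a few points (purely illustrative).
X = np.linspace(-2, 2, 64).reshape(-1, 1)
Y = np.sin(X)

W1, b1 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    # Forward pass, layer by layer.
    H = np.tanh(X @ W1 + b1)          # hidden layer
    P = H @ W2 + b2                   # linear output layer
    loss = np.mean((P - Y) ** 2)

    # Backward pass: gradients flow from the last layer to the first.
    dP = 2 * (P - Y) / len(X)         # dLoss/dP
    dW2 = H.T @ dP
    db2 = dP.sum(axis=0)
    dH = dP @ W2.T                    # propagate the error to the hidden layer
    dZ = dH * (1 - H ** 2)            # through the tanh non-linearity
    dW1 = X.T @ dZ
    db1 = dZ.sum(axis=0)

    # Gradient-descent update of every weight.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"final mean-squared error: {loss:.4f}")
```
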
GPUs and CPUs. The rise of deep learning and big data has been mostly powered by Moore's law, allowing the training and inference of very large neural networks. An important factor driving deep learning has been the transition to Graphics Processing Units (GPUs). Initially designed for computer graphics, these specialized processors are optimized for the parallel processing of large vectors and matrices. For neural networks, where training and inference require a vast number of such multiplications, GPUs turned out to be much more powerful than CPUs (Central Processing Units) and are now ubiquitous; incidentally, NVIDIA, the leader in GPUs for deep learning, now has a capitalization on par with Intel. However, GPUs and CPUs are still enormously power-hungry: it has been shown that training a single neural network can use as much energy as 5 cars over their lifetime and, more globally, big data and data centers already account for an estimated 4% of our energy consumption, a share that may grow to over half of our energy consumption in the next decade if nothing changes. Meanwhile, Moore's law is officially stalling: nanolithography and transistors are reaching their physical limits, and progress in consumption and speed is getting much slower [4]. Worse, the implementation of neural networks on both CPUs and GPUs suffers from the so-called "Von Neumann bottleneck": the bus transferring data between memory and computing units ultimately limits performance.
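
Stepping back to the arithmetic itself, the reason GPUs fit neural networks so well is that a layer applied to a whole batch of inputs is just one large matrix-matrix product. A small NumPy illustration (arbitrary sizes): looping over the samples and the single batched product give the same numbers, but the batched form is the one that massively parallel hardware, whether a GPU or the optical processors discussed below, executes efficiently.

```python
import numpy as np

rng = np.random.default_rng(3)

batch, n_in, n_out = 256, 1024, 512       # arbitrary illustrative sizes
X = rng.normal(size=(batch, n_in))        # a batch of input vectors
W = rng.normal(size=(n_in, n_out))        # one layer's weight matrix

# Sample-by-sample processing: many small matrix-vector products.
y_loop = np.stack([x @ W for x in X])

# Batched processing: one large matrix-matrix product, the operation
# that parallel hardware is built to accelerate.
y_batch = X @ W

assert np.allclose(y_loop, y_batch)
print(y_batch.shape)                      # (256, 512)
```
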

The dawn of optical Machine Learning. To overcome this fundamental problem, some non-conventional computing hardware has been introduced, called "neuromorphic", where circuits directly emulate the connectivity and functions of a neural network, instead of running a program on a CPU or GPU. This approach, which broadly belongs to non-Von Neumann architectures, should be much more energy efficient and fast. Of all the possible implementations of neuromorphic computing, optics and photonics stand out, with unique advantages. First, light can propagate virtually without loss or heating, whether in free space, in many materials, or in integrated waveguides. This propagation can be used to emulate the connectivity between two neural layers, but also convolutions, etc. Second, photons do not naturally interact, meaning that it is possible to multiplex information, and that power consumption is independent of the operating frequency. Finally, thanks to tremendous progress in optoelectronics, detectors (from fast photodiodes to CMOS cameras), modulators (from fast integrated electro-optic modulators to spatial light modulators) and sources (lasers) are extremely efficient and can be mass-produced. The semiconductor industry naturally provides the backbone to produce photonic integrated circuits. In short, optics has several key advantages to implement neural networks in a nearly ideal way. Still, optics faces several challenges, in particular the difficulty of achieving non-linearities in hidden layers, or the challenge of scaling and tuning networks with integrated optics, preventing it, to date, from providing a truly versatile platform for deep learning. Yet, optics can provide a very solid alternative in specialized implementations, from ultrafast small-scale networks, to convolutions and pre-processing in imaging, to reservoir computing (a type of recurrent neural network with fixed weights). After pioneering works in the 80s and 90s, many impressive advances have been reported in academia in the last decade, and industry has also shown a renewed interest, whether within big companies or through start-up creations.

Figure 3. LightOn's optical processor. Left: scheme of the random-projection principle; information is encoded on a spatial light modulator, a random matrix multiplication is achieved by passing through a disordered material, and the result is read off a camera sensor. Right: example of an advanced machine learning task, here the automatic detection of conformational changes of a large molecule in molecular dynamics calculations (here on a SARS-CoV-2 molecule, responsible for the COVID-19 disease) [5].

An example, LightOn. As an illustration of how optics can benefit machine learning, LightOn (the company we co-founded in 2016) has proposed a solution to perform optical machine learning, based on our experience in free-space light propagation in complex media. In essence, we currently provide very large-scale random matrix multiplication (corresponding to a dense, all-to-all connectivity) between millions of inputs (spatial light modulator pixels) and millions of outputs (camera pixels). Able to operate at several kHz, this corresponds to several Peta-operations per second (typical of supercomputers), with a matrix size that could not even be stored in the memory of a conventional computer, and with a consumption of a few tens of Watts. While apparently very specific, the operation we propose can be useful in many data processing applications, from inference to training [5], or even molecular dynamics (see Fig. 3). In fact, these random multiplications can be seen as universal compression engines, with performance guarantees that are well matched to the very statistical nature of modern machine learning. Of course, this is just one approach to optical machine learning, and other approaches, based on free space or integrated optics, fixed or tunable weights, linear or non-linear effects, shallow or deep architectures, also propose various solutions to accelerate machine learning and support its future growth.
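
The operation is easy to emulate numerically. The NumPy sketch below (toy dimensions, a simulation only, not LightOn's actual hardware or software interface) mimics the chain of Fig. 3: the data are encoded as a binary pattern on the "spatial light modulator", the disordered medium acts as a fixed random complex matrix (its transmission matrix), and the "camera" records intensities only; the squared modulus conveniently plays the role of a non-linearity, so the resulting random features can feed a simple linear classifier or a compression stage.

```python
import numpy as np

rng = np.random.default_rng(42)

n_in, n_out = 1000, 500   # toy sizes; the real device couples millions of pixels

# Fixed complex Gaussian matrix standing in for the transmission matrix of the
# disordered medium: it is never stored or trained, only applied, which is what
# the optical hardware does physically.
T = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2 * n_in)

def optical_random_features(x_binary):
    """Emulate SLM encoding -> scattering -> camera: y = |T x|^2 (intensities only)."""
    return np.abs(T @ x_binary) ** 2

# A batch of random binary "SLM patterns" (in practice, binarized input data).
X = (rng.random(size=(32, n_in)) > 0.5).astype(float)
features = np.stack([optical_random_features(x) for x in X])

print(features.shape)   # (32, 500): dense, all-to-all random projection of every input
```
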
CONCLUSION
In conclusion, we have presented a historic perspective on optical computing and shown that, after having failed at proposing an all-purpose computing platform in the 20th century, optics and photonics have more recently emerged as very appealing solutions for hybrid hardware implementations of neural networks, able to sustain the growth in computing power and supersede electronics beyond Moore's law. Optical neural networks have recently rebooted the interest in optical computing, and we believe this is just the beginning.

RÉFÉRENCES
[1] J.W. Goodman, Opt. Photonics News 2, 11 (1991)
[2] R. Athale, D. Psaltis, Opt. Photonics News 27, 32 (2016)
[3] Y. LeCun, Y. Bengio, G. Hinton, Nature 521, 436 (2015)
[4] M.M. Waldrop, Nature 530, 144 (2016)
[5] LightOn white paper, "Photonic Computing for Massively Parallel AI", https://lighton.ai/wp-content/uploads/2020/05/LightOn-White-Paper-v1.0-S.pdf
