
Mitchell A. Nahmias, Bhavin J. Shastri, Alexander N. Tait, Thomas Ferreira de Lima and Paul R. Prucnal

OPTICS & PHOTONICS NEWS, JANUARY 2018 (image: Getty Images)


NEUROMORPHIC PHOTONICS

Photonic neural networks have the potential to revolutionize the speed, energy efficiency and throughput of modern computing—and to give Moore’s law–style scaling a new lease on life.

[Figure: Synergistic approach. Neuromorphic photonics uses modern fabrication techniques to implement efficient, scalable analog photonics operations. Diagram labels: microwave photonics, limited functionality; analog photonics; fabrication industry; neuromorphic photonics; free-space networks, cannot be integrated; unsustainable performance; scalable computing models.]

In an age overrun with information, the ability to process vast volumes of data has become crucial. The proliferation of microelectronics has enabled the emergence of next-generation industries to support emerging artificial-intelligence services and high-performance computing. These data-intensive enterprises rely on continual improvements in hardware, and the demand for data will continue to grow as smart gadgets multiply and become ever more integrated into our daily lives. Unfortunately, however, those prospects are running up against a stark reality: the exponential hardware scaling in digital electronics, most famously embodied in Moore’s law, is fundamentally unsustainable.

This situation suggests that the time is ripe for a radically new approach: neuromorphic photonics. An emerging field at the nexus of photonics and neuroscience, neuromorphic photonics combines the advantages of optics and electronics to build systems with high efficiency, high interconnectivity and high information density. In the pages that follow, we take a look at some of the traditional challenges of photonic information processing, describe the photonic neural-network approaches being developed by our lab and others, and offer a glimpse at the future outlook for this emerging field.

Moving beyond Moore

In the latter half of the 20th century, microprocessors faithfully adhered to Moore’s law, the well-known prediction of exponentially improving performance. As Gordon Moore originally predicted in 1965, the density of transistors, clock speed, and power efficiency in microprocessors doubled approximately every 18 months for most of the past 60 years.

Yet this trend began to languish over the last decade. A law known as Dennard scaling, which states that microprocessors would proportionally increase in performance while keeping their power consumption constant, has broken down since about 2006; the result has been a trade-off between speed and power efficiency. Although transistor densities have so far continued to grow exponentially, even that scaling will stagnate once device sizes reach their fundamental quantum limits in the next ten years.

One route toward resolving this impasse lies in photonic integrated circuit (PIC) platforms, which have recently undergone rapid growth. Photonic communication channels are not bound by the same physical laws as electronic ones; as a result, photonic interconnects are slowly replacing electrical wires as communication bottlenecks worsen. PICs are becoming a key part of communication systems in data centers, where

1047-6938/18/01/34/8-$15.00 ©OSA

microelectronic compatibility and high-yield, low-cost manufacturing are crucial. Because of their integration, PICs can allow photonic processing at a scale impossible with discrete, bulky optical-fiber counterparts, and scalable, CMOS-compatible silicon-photonic systems are on the cusp of becoming a commercial reality.

PICs have several unique traits that could enable practical, scalable photonic processing and could leapfrog the current stagnation of Moore’s law–like scaling in electronic-only settings:

Speed. Electronic microprocessor clock rates cannot exceed about four gigahertz before hitting thermal-dissipation limits, and parallel architectures, such as graphics processing units, are limited to even slower timescales. In contrast, each channel in a photonic system, by default, can operate at upwards of twenty gigahertz to support fiber-optic communication rates.

Information density. Paradoxically, despite the large sizes of on-chip photonic devices (whose lower bound on size must exceed the wavelength of the light that travels through them), PICs can pack orders of magnitude more information in every square centimeter. One reason is that photonic signals operate much faster, thereby shuffling much more data through the system per second. Another is that lightwaves exhibit the superposition property, which allows for optical multiplexing: waveguides can carry many signals along different wavelengths or time slots simultaneously without taking up additional space. This combination enables an enormous amount of information, easily more than one terabyte per second, to flow through a waveguide only half a micron wide.

Energy efficiency. Photonic operations have the potential to consume orders of magnitude less power than digital approaches. This property comes from so-called linear photonic operations (that is, those that can be described using linear algebra). Transmission elements are sometimes considered to dissipate no energy; however, it always takes energy to generate, modulate and receive signals. Nonetheless, the lack of a fundamental energy cost per operation means that photonic processors may not be subject to the unfavorable scaling laws that have stymied further performance returns in electronic systems.

Photonic signal processing

Optical signal processing has a rich history, but optical systems have had difficulty achieving scalability in computing. Extensive research has focused on implementing optical-computing operations using both digital bits and continuous-valued analog signals. Concepts for neuro-inspired photonic computing originally

[Figure: Neural nets: The photonic edge. Von Neumann architectures (left), relying on sequential input-output through a central processor, differ fundamentally from more decentralized neural-network architectures (middle). Photonic neural nets (right) can solve the interconnect bottleneck by using one waveguide to carry signals from many connections (easily N² ~ 10,000) simultaneously. Panel titles: Von Neumann architecture; Neural network architecture; Photonic neural network.]

envisioned systems that used vertically oriented light sources or spatial light modulators together with free-space holographic routing. Many researchers imagined that an optical computer would consist of a 3-D holographic cube programmed to route signals between arrays of LEDs.

Although optical logic devices later developed into the switches and routers that form today’s telecommunications infrastructure, optical computing did not achieve the same level of success. Researchers realized that the scaling laws for electronic components could continue to address the bottlenecks in traditional processors for many years to come. The ceaseless march of Moore’s law meant that, while optical computing systems might outperform electronics in the short term, microprocessors would eclipse them in several years.

A close look at the hardware reveals that the past challenges of optical computing, and particularly of optical neural computing, lay chiefly in a few factors: the continued favorable scaling of electronic devices, the packaging difficulties associated with free-space coupling and holographic interconnects, and the difficulty in shrinking optical devices. Now, about 30 years later, the landscape has changed tremendously. With Moore’s law confronting fundamental limitations, the scaling of electronics can no longer be taken for granted. Meanwhile, large-scale integration techniques are starting to emerge in photonics, driven by telecommunication applications and a market need for increased information flow both between and within processors.

These changes have led to an explosion in PICs, which are already finding their way into fast Ethernet switches in servers and data centers. Microwave photonics is also emerging as a contender for radio-frequency applications, now enabled by the low cost of microchip photonic integrated components. Researchers have implemented digital photonic devices in various technologies, including fibers, waveguides, semiconductor devices and resonators.

Both the analog and the digital approaches to optical computing, however, still face challenges. Increasing the number of analog operations leads to noise and degrades signal integrity, limiting the potential complexity of optical processors. And, while digital systems filter out noise during every step and can fix errors after they occur (making it easy for engineers to design complex systems with many interacting components), the high scaling cost of digital photonic devices makes this approach both prohibitively expensive and impractical.

Photonic neural networks

Neural network approaches represent a hybrid between the purely digital and analog approaches, allowing for more efficient processors that are both less resource-intensive and robust to noise. But what is a neural network?

Most modern microprocessors follow the so-called von Neumann architecture, in which machine instructions and data are stored in memory and share a central communication channel, or bus, to a processing unit. Instructions define a procedure to operate on data, which is continually shuffled back and forth between memory and the processor.

Neural networks function quite differently. Individually, neurons can perform simple operations such as adding inputs together or filtering out weaker signals. In groups, however, they can implement far more complex operations through the formation of networks. Instead of using digital 0’s and 1’s, neural networks represent information in analog signals, which can take the form of either continuous real-number values or of spikes in which information is encoded in the timing between short pulses. Rather than abiding by a sequential set of instructions, neurons process data in parallel and are programmed by the connections between them.

[Figure: Electronic vs. photonic neural nets. Neuromorphic architectures potentially sport better speed-to-efficiency characteristics than state-of-the-art electronic neural nets (such as IBM’s TrueNorth, Stanford University’s Neurogrid, the University of Heidelberg’s HICANN), as well as advanced digital electronic systems (such as the University of Manchester’s SpiNNaker). Axes: efficiency (GMAC/s/W) versus computational speed (MMAC/s/cm²). Plot labels: neuromorphic photonics; neuromorphic electronics; digital electronics; microwave electronics; von Neumann efficiency wall.]

The input into a particular neuron is a linear combination (also referred to as a weighted addition) of


the output of other neurons. These connections can be weighted with negative and positive values, respectively, which are called (borrowing the language of neuroscience) inhibitory and excitatory synapses. The weighting is therefore represented as a real number, and the interconnection network can be expressed as a matrix.

Photonics appears to be an ideal technology with which to implement neural networks. The greatest computational burden in neural networks lies in the interconnectivity: in a system with N neurons, if every neuron can communicate with every other neuron (plus itself), there will be N² connections. Just one more neuron adds N more connections, a prohibitive situation if N is large. Photonic systems can address this problem in two ways: waveguides can boost interconnectivity by carrying many signals at the same time through optical multiplexing; and low-energy photonic operations can reduce the computational burden of performing linear functions such as weighted addition. For example, by associating each node with a color of light, a network could support N additional connections without necessarily adding any physical wires.

We can understand this better through the example of a multiply-accumulate (MAC) operation. Each such operation represents a single multiplication, followed by an addition. Since, mathematically, MAC operations are the building blocks of dot products, matrix multiplications, convolutions and Fourier transforms, they underlie much of high-performance computing. They also constitute the most costly operations in both hardware-based neural networks and machine-learning algorithms. In the digital domain, MACs occur in a serial fashion, which means that the time and energy costs increase with the number of inputs.

In contrast, passive lightwave devices, such as wavelength-sensitive filters, do not inherently dissipate energy and can efficiently perform such operations in parallel. They can therefore greatly enhance high-performance computing, especially systems that rely on matrix multiplication. In addition, reprogrammability is possible with tunable photonic elements. These advantages have motivated researchers to investigate a variety of photonic neural models that exhibit a range of interesting properties.

[Photo: A neural network being tested at Princeton University. Credit: Princeton University Lightwave Lab, 2017.]

A spectrum of implementations

One such photonic neural model, currently under investigation in our lab, involves engineering dynamical lasers to resemble the biological behavior of neurons. Laser neurons, operating optoelectronically, can operate at approximately 100 million times the speed of their biological counterparts, which are rate-limited by biochemical interactions. These lasers represent neural spikes via optical pulses by operating under a dynamical regime called excitability. Excitability is a behavior in feedback systems in which small inputs that exceed some threshold cause a major excursion from equilibrium, which, in the case of a laser neuron, releases an optical pulse. This event is followed by a recovery back to equilibrium, the so-called refractory period.

We have found a theoretical link between the dynamics of semiconductor lasers and a common neuron model used in computational neuroscience, and have demonstrated how a laser with an embedded graphene section could effectively emulate such behavior. Building from these results, a number of research groups have fabricated, tested and proposed laser neurons with various feedback conditions. These include two-section models in semiconductor lasers, photonic-crystal nanocavities, polarization-sensitive vertical cavity lasers, lasers with optical feedback or optical injection, and linked photodetector–laser systems with receiverless connections or

[Figure: Left: A photonic neural network that can be implemented in silicon photonics. Right: The on-chip system with modulator neurons displays a characteristic oscillation called a Hopf bifurcation, which confirms the presence of an integrated neural network. Panels: neuron state [s1, s2] (mV) versus time (ms) for self weight W_F = 0.449, 0.529 and 0.629. Princeton University Lightwave Lab, 2017 / A. Tait et al., Sci. Rep. 7, 7430 (2017).]

resonant tunneling. A recently demonstrated approach based on optical modulators has the potential to exhibit much lower conversion costs from one processing stage to another, and to be fully integrated on silicon-photonic platforms.

Toward scalable networks

Researchers have lately investigated interconnection protocols that can tune to any desired network configuration. Arbitrary weights allow a wide array of potential applications based on classical neural networks. Several notable approaches use complementary physical effects in this regard.

Broadcast-and-weight. A broadcast-and-weight neural network architecture, demonstrated by our group at the Princeton Lightwave Lab, uses groups of tunable filters to implement weights on signals encoded onto multiple wavelengths. Tuning a given filter on and off resonance changes the transmission of each signal through that filter, effectively multiplying the signal by a desired weight. The resulting weighted signals travel into a photodetector, which can receive many wavelengths in parallel to perform a summing operation.

Broadcast-and-weight takes advantage of the enormous information density available to on-chip photonics through the use of optical multiplexing, and is compatible with a number of laser neuron models. Filter-based weight banks have also been investigated both theoretically and experimentally in the form of closely packed microring filters, prototyped in a silicon-photonic platform. And the interconnect architecture of a fully integrated superconducting optoelectronic network recently proposed by scientists at the U.S. National Institute of Standards and Technology, said to offer potentially unmatched energy efficiency, could be compatible with broadcast-and-weight.

Coherent. A coherent approach, which uses destructive or constructive interference effects in optical interferometers to implement a matrix-vector operation on incoming signals, was recently demonstrated by a research team led by Marin Soljačić and Dirk Englund at the Massachusetts Institute of Technology, USA. In such an architecture there is no need to convert from the optical domain to the electrical domain; hence, interfacing a coherent system with photonic, nonlinear nodes (for example, based on the Kerr effect) could in principle allow for energy-efficient, passive all-optical processors.

The coherent approach is, however, limited to only one wavelength, and requires devices much larger than tunable filters, which puts a cap on the information density that the approach can achieve in its current form. In addition, all-optical interconnects must grapple with both amplitude and phase, and no solution has yet been proposed to prevent phase noise accumulation from one stage to another. Nonetheless, the investigation of large-scale networking schemes is a promising direction for the integration of various technologies in the field towards highly scalable on-chip photonic systems.

Reservoir computing. A contrasting approach to tunable neural networks being pursued by a number of labs, reservoir computing extracts useful information from a fixed, possibly nonlinear system of interacting nodes. Reservoirs require far fewer tunable elements than neural-network models to run effectively, making


them less challenging to implement in hardware; however, they cannot be easily programmed. These systems have utilized optical-multiplexing strategies in both time and wavelength. Experimentally demonstrated photonic reservoirs have displayed state-of-the-art performance in benchmark classification problems.

Marching ahead

It remains to be seen in what ways photonic processing systems will complement microelectronic hardware, but current technological developments look promising. For example, the fixed cost of electronic-to-photonic conversion is no longer as energetically unfavorable as in the past. A modern silicon-photonic link can transmit a photonic signal using only femtojoules of energy per bit of information, whereas thousands of femtojoules of energy are consumed per operation in even the most efficient digital electronic processors, including IBM’s TrueNorth cognitive computing chip and Google’s tensor processing unit.

The comparisons should get better still as performance scaling in optoelectronic devices continues to improve. New modulators or lasers based on plasmonic localization, graphene modulation or nanophotonic cavities have the potential to increase efficiency. The next generation of photonic devices could potentially consume only hundreds of attojoules of energy per time slot, allowing analog photonic MAC-based processors to consume even less per operation.

In light of these developments, photonic neural networks could find a place in many applications. These systems can act as a coprocessor for performing computationally intense linear operations (including MACs, Fourier transforms and convolutions) by implementing them in the photonic domain, potentially decreasing the energy consumption and increasing the throughput of signal processing, high-performance computing and artificial-intelligence algorithms. This could be a boon for data centers, which increasingly depend on such operations and have consistently doubled their energy consumption every four years.

Photonic processors also have unmatched speeds and latencies, which make them well suited for specialized applications requiring either real-time response times or fast signals. One example is a front-end processor in radio-frequency transceivers. As the wireless spectrum becomes increasingly overcrowded, the use of large, adaptive phased-array antennas that receive many more radio waves simultaneously may soon become the norm. Photonic neural networks could perform complex statistical operations to extract important data, including the separation of mixed signals or the classification of recognizable radio-frequency signatures.

Still another application example lies in low-latency, ultrafast control systems. It’s well understood that recurrent neural networks can solve various problems that involve minimizing or maximizing some known function. A processing method known as Hopfield optimization requires the solution to such a problem during each step of the algorithm, and could utilize the short convergence times of photonic networks for nonlinear optimization.

Fiber optics once rendered copper cables obsolete for long-distance communications. Neuromorphic photonic processing has the potential to one day usher in a similar paradigm shift in computing, creating a smarter, more efficient world.

Mitchell A. Nahmias, Bhavin J. Shastri, Alexander N. Tait, Thomas Ferreira de Lima and Paul R. Prucnal (prucnal@princeton.edu) are with the Department of Electrical Engineering, Princeton University, Princeton, N.J., USA.

References and Resources

P. Prucnal and B. Shastri. Neuromorphic Photonics (CRC Press, 2017).
M. Nahmias et al. J. Sel. Top. Quantum Electron. 19, 1800212 (2013).
A. Tait et al. J. Lightwave Technol. 32, 4029 (2014).
B. Shastri et al. Sci. Rep. 5, 19126 (2015).
P. Prucnal et al. Adv. Opt. Photon. 8, 228 (2016).
A. Tait et al. Sci. Rep. 7, 7430 (2017).
Y. Shen et al. Nat. Photon. 11, 441 (2017).
J.M. Shainline et al. Phys. Rev. Appl. 7, 034013 (2017).
G. Van der Sande et al. Nanophotonics 6, 561 (2017).
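
A numerical illustration

The weighted addition described under "Photonic neural networks" and the excitable behavior described under "A spectrum of implementations" can be illustrated with a toy numerical sketch. It is written in plain Python and does not reproduce the authors' laser model. The leaky integrate-and-fire rule is an assumed stand-in for the "common neuron model" mentioned in the text, and every name and parameter value (weighted_sum, network_inputs, run_excitable_neuron, threshold, leak, refractory_steps) is hypothetical, chosen only for illustration.

```python
# Toy sketch (assumption: not the authors' model). Shows weighted addition
# as multiply-accumulate (MAC) operations and a simple excitable neuron.


def weighted_sum(weights, signals):
    """Input to one neuron: a linear combination of the other neurons' outputs."""
    total = 0.0
    for w, s in zip(weights, signals):
        total += w * s  # one MAC: a multiplication followed by an addition
    return total


def network_inputs(weight_matrix, signals):
    """Inputs to all N neurons: an N x N matrix times an N-vector (N * N MACs)."""
    return [weighted_sum(row, signals) for row in weight_matrix]


def run_excitable_neuron(inputs, threshold=1.0, leak=0.5, refractory_steps=2):
    """Assumed leaky integrate-and-fire rule: fire above threshold, then recover."""
    state = 0.0
    rest = 0  # remaining steps of the refractory period
    spikes = []
    for x in inputs:
        if rest > 0:
            # Refractory period: the neuron ignores input while it recovers.
            rest -= 1
            spikes.append(0)
            continue
        state = leak * state + x  # leaky integration of the weighted input
        if state > threshold:
            spikes.append(1)  # excursion from equilibrium: emit a pulse
            state = 0.0
            rest = refractory_steps
        else:
            spikes.append(0)
    return spikes


# Negative weights play the role of inhibitory synapses, positive weights
# the role of excitatory synapses. Values are arbitrary.
W = [
    [0.0, 0.5, -0.5],
    [1.0, 0.0, 0.25],
    [-1.0, 2.0, 0.0],
]
s = [1.0, 2.0, 4.0]

inputs = network_inputs(W, s)
mac_count = len(W) * len(s)
spikes = run_excitable_neuron([0.4, 0.4, 0.4, 0.4, 2.0, 2.0, 2.0, 2.0])
```

With these values, inputs comes out as [-1.0, 2.0, 3.0] at a cost of N² = 9 MACs, and spikes comes out as [0, 0, 0, 0, 1, 0, 0, 1]: the weak inputs never cross the threshold, the first strong input triggers a pulse, and the next two steps are lost to the refractory period. In the digital sketch the MACs run one after another, which is the serial cost the article contrasts with parallel photonic weighting.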
