commentary Why future supercomputing requires optics H. John Caulfi eld and Shlomi Dolev Could optical technology off er a solution to the heat generation and bandwidth limitations that the computing industry is starting to face? The benefi ts of energy-effi cient passive components, low crosstalk and parallel processing suggest that the answer may be yes.

here are points in time at which a Matrix SLM well-known technology, which has been used and gradually improved Output vector T detector array for decades, reaches its physical limitations, forcing a shift towards dramatically diff erent technology. Automobiles, television, computers, photography and communication are all domains in Input vector which such shift s have occurred. Th e VCSEL array point at which these massive shift s in technology occur is oft en decades aft er the initial invention and oft en requires substantial investment. Th e time is fast approaching for such a shift in computer technology, as the frequency (bandwidth) limitations of silicon electronics and printed metallic tracks are reached. Th e computer industry has already acknowledged the situation by introducing multicore technology Lenses which implicitly recognizes the use of a distributed and parallel architecture to Lenses scale computer power. For decades, optics have proved to be the best means of conveying information from one point to another, as can clearly Figure 1 | The Stanford Vector Matrix Multiplier. The use of a parallel approach greatly improves be seen by the growth of the massive computing performance. VCSEL, vertical cavity surface emitting laser; SLM, spatial light modulator. Figure fi bre optics communications industry. reproduced with permission from ref. 11, © 2009 OSA. Th e fact that light beams can be easily transmitted in parallel in free space, and do not suff er from crosstalk, is of great electronic switching element — and that industrial-scale optical processing devices importance. Th e fabrication of multicore great progress has been made in all- have been partially successful6. In addition, chips requires fast buses to connect cores, optical switching and logic, there is an recent research activity in academia7–10, and new designs are considering the use opportunity to avoid light-to-electronics and the United States government of optical interconnects to aid reliable and and electronics-to-light conversions in the contract reviews from the Air Force and fast communication1. Indeed, the fi rst steps design of all-optical computers. from DARPA, are clear signs for great towards commercializing the use of optics It is important to note that the design excitement in the fi eld. for communication among cores were of optical computers does not necessarily Th e widely accepted Strong Church– recently reported by fi rms such as and have to mimic the design of the electronic Turing thesis asserts that anything can be Avago, among others2–5. computer, just as the design of electronic computed by a digital computer. Accepting But optical technology also has the computers did not mimic the design of this suggests that optics off ers no new opportunity to go beyond being just mechanical computing devices. Optics computational capability. To be useful, the a convenient pipe for ultrafast data implies a new means for the application best that optics can do is to accomplish transmission, and actually perform data of parallelism, with many beams in space the same goals as digital electronics, processing. Given that the basic computing representing individual computations. but in a more effi cient or advantageous element of a computer is a transistor — an Recently, attempts to produce manner. Th e advantage could be in speed,

NATURE PHOTONICS | VOL 4 | MAY 2010 | www.nature.com/naturephotonics 261

© 2010 Macmillan Publishers Limited. All rights reserved

nnphoton_.2010.94_MAY10.inddphoton_.2010.94_MAY10.indd 261261 110.4.160.4.16 99:42:01:42:01 AAMM commentary

energy consumption, heat generation or a 1 NAND the fundamental minimal operation another metric. A A’ required — the Boolean logic gates. Speed is a frequently suggested Laser Rolf Landauer12 showed that the energy advantage for optical computing. Ironically, needed to operate such a gate stems from the speed of optical processing is ultimately 0 0 the need to erase one of the output bits, as limited by the speed of electronic input B the Boolean logic gates have two binary and output. In essence, optical processing 0 AND inputs but only one binary output. Shortly is inextricably tied to Moore’s law for aft er that, Bennett, Landauer, Keyes and b electronic computing, and the maximum others realized that if the extra bit was benefi t that optics can off er today is to kept, there would be no minimum energy replace electronic functions that slow per gate operation. Aside from the energy system speeds. It is not the speed of needed to input data and readout results, components that counts; it is the speed the logic itself could be free. of the supercomputer as a whole. Optics Recently, the concept of logic systems can help in this respect. An algorithm’s that consume zero energy has been computational complexity is normally the proposed. Two independent ways to way algorithms are compared. Electronics Figure 2 | The concept of ‘zero-energy’ logic do this were recently discovered by has two ways to handle complexity: the use gates using optical passive elements. a, Powered Caulfi eld, Shamir, Hardy, Soref and of space (parallelism) and time. Optics has electronic elements to perform AND and NAND others13–15, accomplishing the theoretical a third way: fan-in and fan-out, in which boolean functions. A, A’ and B are the inputs of the thermodynamic ‘permission’ to do so. many independent beams may be modifi ed device. b, Rochon prisms. Optics, when compared with electronics, by a single pixel in an optical processor. has two important advantages — superior In principle, supercomputers based on fan-in and fan-out, and signal transmission optical technology may help to mitigate the Stanford multiplier. Th e input is a without the need to apply a voltage. the heat generation that plagues electronic vector of light sources that are projected Once generated, light propagates by itself computers. Th e heat problem of electronic in parallel by lenses on the rows of a slide according to Maxwell’s equations, and devices becomes much worse as speed or on a spatial light modulator, and the today we can manipulate the apparent increases and density decreases, both of output that encodes the multiplication speed of light in some striking ways, which are scaling rapidly according to product is obtained by gathering light without violating Maxwell’s equations Moore’s law (the two-year doubling in the from each row to a sensitive detector. or relativity. We also have at our number of transistors on a silicon chip). Loading data in parallel to the input vector disposal numerous passive elements for Early workers envisaged optical computers as and the spatial light modulator leads to a manipulating light that require no energy rivals of electronic computers. It now appears great improvement in computing speed. to be supplied and do not generate heat, for that they are allies in keeping Moore’s law Th e architecture supports principal DSP example waveguides, fi bres, mirrors, prisms alive. Faster operations require more power, primitives such as vector-by-matrix and beamsplitters. while smaller components provide less and matrix-by-matrix multiplications, Figure 2 presents two designs for optical area through which to remove the resulting convolution, correlation and discrete implementation of an energy-free logical heat from each element. Th e increased use Fourier transforms. Functioning as a gate. Both designs are zero-energy devices. of optical components that require little co-processor of a legacy processor, the Figure 2a is an example of using a new kind or, in the case of passive devices, no heat architecture can effi ciently process integers of logic (called directed logic) that gives dissipation may prove very important. Th ere as well as complex numbers and perform the same result as conventional Boolean are several approaches that use the special extended-precision arithmetic. Th ese logic but does so in a way that is optics- characteristics of light for information building blocks enable DSP-intensive friendly13. Th is device produces both A processing. Th ey are described in turn below. applications, and the fi nding of optimal AND B and also its complement A NAND solutions to several non-deterministic B. Note that NANDs can be combined Vector matrix multiplier polynomial-time complete problems using to implement any other kind of Boolean A vector matrix optical multiplier (Fig. 1) exponential space. Th e device operational logic gate. Figure 2b shows a generalized uses a lens to split the light received rate, of the order of at least 16 tera eight-bit optical logic element14,15. It is essentially from each entry of an input vector to integer operations per second, and power a physically embodied lookup table that the corresponding entries of the vectors rating, of more than 16,000 giga eight-bit computes A AND B, with NAND obtained of the matrix. When a beam traverses instructions per watt, show great promise by the use of additional Rochon prisms. the matrix, the intensity of the beam is for enhancing existing applications By choosing the beams to defl ect or not changed according to the value of the and introducing new optical computer to defl ect, we can implement any Boolean entry that the beam reached. Another applications such as video compression logic gate using only passive components lens is used to sum the resulting light or a three-dimensional geometry engine. (Rochon prisms of two diff erent intensity of each resulting vector. A high- Th e applications of such a device include displacements and defl ectors, shown in the performance, power-effi cient, optical motion estimation, string matching or fi gure as mirrors). digital signal processor (DSP) has recently solution to more instances of NP-hard been described11: when compared with (non-deterministic, polynomial-time) Optical pre-processing current technology, it provides a gain in computing problems. Designs based on optical pre-processing processing speed of two to three orders of (repeated copying and doubling the result magnitude while consuming signifi cantly Zero-energy logic iteratively) yield an exponential growth in less power. Its operation can be understood In the 1960s, the study of the physics the number of results with the number of by the architecture presented in Fig. 1, of computing rapidly focused on iterations. Th e idea is that an intermediate

262 NATURE PHOTONICS | VOL 4 | MAY 2010 | www.nature.com/naturephotonics

© 2010 Macmillan Publishers Limited. All rights reserved

nnphoton_.2010.94_MAY10.inddphoton_.2010.94_MAY10.indd 262262 110.4.160.4.16 99:42:02:42:02 AAMM commentary

result is recorded as an optical pattern on a Vertex 2 b a fi lm and then projected and (optically) modifi ed to gain the next multiplicative step over a larger fi lm, thus gaining exponential size results within a linear number of steps. Th e doubling process can be executed once, as a pre-processing stage, to support any future input. For example, all the Hamiltonian paths in a clique can be represented in a very large Vertex 3 matrix, with the transparent points in Vertex 1 each line of the maxtrix representing a single Hamiltonian path. When an actual weighted graph is given, the weight vector of the graph is multiplied in parallel by the Figure 3 | Architecture for solving Hamiltonian path problem. a, Each column represents a city, and each pre-processed matrix to obtain the best box in a column represent the history of a light beam that may arrive there. Red arrows represent light Hamiltonian path. beams; the left one is blocked by a barrier that represent a non-existing edge from vertex 1 to vertex 2. The Pre-processing by modifying and lowest box is a path that starts in the vertex represented by the column; the second group of three boxes copying previously obtained results has represents a path that started in another vertex and continued to the vertex represented by the column; proved to be useful in architectures that the last group of nine boxes represent paths of length two. b, Photo of a typical set-up, reproduced with are based on the representation of Turing permission from ref. 17, © 2007 Springer. machine confi gurations as a location in three-dimensional space16. Traditional microprocessors support a set of copying masks block the propagation of in a way similar to the diffi culties in fi rst computation primitives, such as addition light from these locations. prototypes for electronics-based processing and multiplication. A new set of primitives units. Th is is the beginning of an alliance that fi ts the optics capabilities, and is An outlook for the future between optics and electronics. It is an based on the Turing machine mapping Ultimately, electronics and optics in exciting time. ❐ to confi gurations, is suggested in ref. 17. computing are more complementary than One such representation is designed to competitive, and all applications of optics John Caulfi eld is in the physics department at Fisk solve the Hamiltonian path problem, or will involve some degree of electronics. University, 1000 17th Avenue, North Nashville, in fact the ‘travelling salesman’ problem. Th e development of faster and smaller Tennessee 37208, USA; Shlomi Dolev is at the Intuitively, if we allow many light beams to electronics makes optical processing ever Department of Computer Science, Ben-Gurion traverse all possible Hamiltonian paths in more useful and attractive in reducing the University of the Negev, Beer-Sheva 84105, . parallel, and design the traversal length to heat load and in increasing the net speed e-mail: hjc@fi sk.edu; [email protected] be proportional to the edges’ weights, the of a computing system. What is important beam that succeeds in doing so fi rst must is how to partition a task so that the net References have traversed the shortest Hamiltonian eff ect of both electronics and optics is 1. http://web.mit.edu/newsoffi ce/2009/optical-computing.html 2. http://www.acreo.se/templates/Page____2215.aspx path. In some cases, the number of beams optimized in a well-defi ned manner. Major 3. http://www.avagotech.com/pages/home/ that arrive simultaneously at a location companies, universities and government 4. http://techresearch.intel.com/articles/Tera-Scale/1419.htm representing a city may grow exponentially agencies are now studying the role that 5. Assefa, S. Xia, F. & Vlasov, Y. Nature 464, 80–84 (2010). 6. http://www.thirdwave.de/3w/tech/optical/EnLight256.pdf with the number of cities, creating a optics can have in obtaining the advantages 7. Dolev, S., Haist, T. & Oltean, M. (eds) in Proc. 1st Int. Workshop number that can overwhelm any physical of faster operation at lower heat burden. on Optical Supercomputing Vol. 5172 (Springer, 2008). device. Moreover, the traversal history of a Although quantum entanglement is a 8. Dolev, S. & Oltean, M. (eds) in Proc. 2nd Int. Workshop on Optical Supercomputing Vol. 5882 (Springer, 2009). beam can be lost. Th e suggested apparatus hot topic among physicists and computer 9. Caulfi eld, H. J., Dolev, S. & Green, W. M. J. (eds) J. Opt. Soc. Am. A uses a distinct location for each possibility scientists, it seems premature to expect it 26 (Optical High-Performance Computing feature issue) (2009). of beam traversal16. to solve our computing problems in the 10. Caulfi eld, H. J., Dolev, S. & Green, W. M. J. (eds) Appl. Opt. A 48 (Optical High-Performance Computing feature issue) (2009). Figure 3 shows three columns, with near future, as it is still at the stage of basic 11. Tamir, D., Shaked, N., Wilson, P. & Dolev, S. J. Opt. Soc. Am. A each column representing a city. Th e research. Currently, classical optics alone 26, A11–A20 (2009). arrival of a beam at a certain slot in seems to be ready to join electronics on 12. Landauer, R. IBM J. Res. Dev. 5, 183–191 (1961). 13. Hardy, J. & Shamir, J. Opt. Express 15, 150–165 (2007). a column corresponds to a particular the silicon chip to realize operations more 14. Caulifi eld, H. J., Soref, R. A, Zavalin, A. & Hardy, J. traversal path; the higher the slot is, powerful than either technology alone can Opt. Commun. 271, 365–376 (2007). the longer the path it represents. Th is deliver. Optical interconnects are already 15. Caulfi eld, H. J. in Proc. 2nd Int. Workshop on Optical allows detection of whether there exists used in practice to help remove electronic Supercomputing Vol. 5882, 30–36 (Springer, 2009). 16. Dolev, S. & Nir, Y. US patent 20050013531 (2005). 18–20 a Hamiltonian path. Th e architecture is bottlenecks , and optical-based zero- 17. Dolev, S. & Fitoussi, H. in Proc. 4th Int. Conf. FUN 2007, 120–134 designed for a fully connected graph that energy logic may relieve some of the (Springer, 2007). allows the representation of any input heating problems that Moore’s law implies. 18. Bozhevolnyi, S. I. Plasmonic Nanoguides and Circuits (World Scientifi c, 2008). graph by the use of barriers that block the Th e appropriate question today is 19. Brongersma, M. L. et al. in Plasmonic Nanoguides and Circuits light traversing from one city to another. no longer ‘why bother with optical (ed. Bozhevolnya, S. I.) (World Scientifi c, 2008). Th e barriers prevent light beams from computing?’ but rather ‘what is the 20. Fainman, Y., Ikeda, K. & Tan, D. T. H. in Proc. 2nd Int. Workshop on Optical Supercomputing Vol. 5882, 2–4 (Springer, 2009). traversing edges that do not exist in the best way to incorporate optics into input graph. Some locations in a column future supercomputers?’ Th ere are great Acknowledgements represent a path that has traversed a certain challenges in the actual prototyping We thank William Green, Shaya Fainman, Joseph Rosen, city more than once; pre-processing by for mass production of optical devices, Nati Shaked and Hen Fitoussi for helpful inputs.

NATURE PHOTONICS | VOL 4 | MAY 2010 | www.nature.com/naturephotonics 263

© 2010 Macmillan Publishers Limited. All rights reserved

nnphoton_.2010.94_MAY10.inddphoton_.2010.94_MAY10.indd 263263 110.4.160.4.16 99:42:02:42:02 AAMM