3D CMOS-Memristor Hybrid Circuits: Devices, Integration, Architecture, and Applications

Kwang-Ting Cheng and Dmitri B. Strukov Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 93106 {timcheng, strukov}@ece.ucsb.edu

ABSTRACT that can directly benefit from 3D integration [5] include those demanding significant amounts of memory access In this paper, we give an overview of our recent research (such as imaging, networking, and computing) as well as efforts on monolithic 3D integration of CMOS and those whose performance is dominated by interconnects memristive nanodevices. These hybrid circuits combine a (such as and FPGAs [6] ). As both military and CMOS subsystem with several layers of nanowire crossbars, industry are moving toward multi-core, multi-threaded, and consisting of arrays of two-terminal memristors, all multi-media applications, the demand for lower latency and connected by an area-distributed interface between the higher bandwidth between computing elements and memory CMOS subsystem and the crossbars. This approach is growing fast. combines the advantages of CMOS technology, including its high flexibility, functionality and yield, with the extremely Beyond these immediate needs, there is tremendous high density of nanowires, nanodevices and interface vias. potential in the field of neuromorphic circuits. Indeed, in As a result, the 3D hybrids can overcome limitations many cases, conventional using digital pertinent to other 3D integration techniques (such as architectures are inadequate for real-time applications such as through-silicon vias) and enable 3D circuits with pattern classification (including threat detection and face unprecedented memory density (up to 1014 bits on a single 1- recognition). Neuromorphic architectures may be a viable cm2 chip) and aggregate interlayer communication bandwidth solution to address challenges associated with inherent (up to 1018 bits per second per cm2) at manageable power requirements for massive, parallel information processing, dissipation. Such performance represents a significant step provided that dense integration between memory and towards addressing the most pressing needs of modern processing elements can be obtained. compact electronic systems. In order to realize the true potential of 3D integration, vertical stacking must maintain a sufficiently high density of Categories and Subject Descriptors vertical interconnects to provide high-bandwidth and low- B.3.m [Memory Structures]: Miscellaneous. B.7.1 latency communication to and from each (“tier”) in the [Integrated Circuits]: Types and Design Styles. stack, without sacrificing too much area for vias. Presently, General Terms vertically stacking of multiple tiers of 3D ICs is implemented Design, Reliability. using through-silicon vias (TSVs) (e.g. [7] ), micro-bumps, or capacitive/inductive couplings (e.g. [8] ). Such vertical Keywords integration solutions suffer from power dissipation problems, 3D Integration; Memristors. low interconnect density (because of a poor accuracy of wafer alignment, as compared with that of photolithographic 1. Introduction masks defining features on a single wafer for multiple Three-dimensional (3D) circuits are a natural way of metallization layers), low yields (due to the lack of the increasing integration density to overcome the inevitable known-good-die assembly), and poor cost efficiency [9] . As limitations in the lateral scaling of electron devices [1] . In a result, such solutions cannot provide adequate density and comparison with 2D planar circuits, 3D integrated circuits throughput that will be required, for example, for real time (ICs) offer the potential benefits of better performance, signal processing from focal plane arrays [10] . higher connectivity, reduced interconnect delays, lower 3D integration could fully benefit from monolithic ICs, power consumption, better space utilization, and more but previous approaches to 3D monolithic integration have flexible heterogeneous integration [2] [3] [4] . Applications been limited by the requirement to integrate active components in a vertical stack (e.g. [11] [12] [13] ). Such Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are vertical integration of active devices faces severe thermal not made or distributed for profit or commercial advantage and that copies management, design complexity, and testability challenges. bear this notice and the full citation on the first page. To copy otherwise, or Moreover, multilayer CMOS circuits with thin-film republish, to post on servers or to redistribute to lists, requires prior specific have inadequate characteristics for high permission and/or a fee. performance memory and logic applications. ISPD’12, March 25–28, 2012, Napa, California, USA. Copyright 2012 ACM 978-1-4503-1167-0/12/03...$10.00.

33 These challenges can be overcome with 3D hybrid [31] , high performance reconfigurable logic circuits [25] circuits that monolithically integrate conventional CMOS [33] [34] [35] [38] [39] and to achieve high bandwidth circuits with quasi-passive, two-terminal, nanoscale devices between monolithically integrated memory and logic in a distributed in additional layers. To pursue this idea, we System-On-Chip fashion. In addition, due to its unique formed the Center of 3D Hybrid CMOS-Nano Circuits properties, CMOL may be the first technology to enable bio- (HyNano), funded by the AFOSR with an award from the inspired massively parallel advanced information processing 2011 Multidisciplinary Research Program of the University [14] [37] [40] . Research Initiative (MURI). HyNano consists of 9 faculty The main objectives of our recent efforts are extending members from multiple disciplines, including computer the CMOL concept to the third dimension, and to engineering, electrical engineering, materials science, and simultaneously make this approach practical by improving physics, from four universities: University of California, memristive crosspoint devices, understanding the solid state Santa Barbara (the lead institution), Stony Brook University physics of their operation, and developing those computer (SUNY), at Ann Arbor, and architectures with the greatest benefit from 3D CMOL University of Massachusetts at Amherst. In this paper we technology. give a brief overview of the main ideas underpinning this approach and discuss preliminary results, including the 3. Technical Challenges and Recent Progress conceptual design, simulations and key experimental milestones. 3.1 Resistive Switching “Memristive” Devices 2. Overview of 3D Hybrid Circuits In the simplest form the memristive (a) S OFF Figure 1 demonstrates the main idea of CMOL circuits ion device consists of three V [14] [15] [16] - the acronym stands for Cmos+MOLecular layers, top and bottom conductor scale devices, which were recently extended to 3D circuits (metallic) A electrode ON [27] . Such circuits are based on the combination of a single and a of some conventional CMOS chip and several layers of quasi-passive material sandwiched in nanoscale crosspoint devices. In the simplest case these are between them (Fig. 2). (b) two-terminal resistive switching (“memristive”) devices (Fig. By applying voltage I state ON 1c), which can be reversibly switched between high and low bias across electrodes of -Vt such device, the electrical conductivity state OFF (a) (b) (c) 200 of a thin film can be 0 V 100 changed reversibly and add‐on top 50 nm hp state ON uA ) nanowire ( retained for sufficiently level 0 + similar Pt long time between high VOFF +Vt VON ‐100 CMOS TiO2 V two‐terminal Current stack <50 ns TiOx conductive (ON) state nanodevices Pt ‐ bottom nanowire at each ‐200 ‐2 ‐1 0 1 2 and the highly resistive level crosspoint Voltage ( V ) (OFF) state. Figure 2b Fig. 2. (a) Device structure and (b) shows schematically an typical hysteretic I–V behavior for Fig. 1. The basic idea of 3D CMOL circuits [15] [16] [26] : (a) I-V for a bipolar bipolar switching, shown hybrid circuit cartoon, (b) crossbar topology, and (c) micrograph of switching device. In schematically [15] [16] [19] [26] . array of metal oxide memristive devices, and typical switching I-V bipolar devices curves [21] . electrical stress (voltage or current bias) of opposite polarity is required to the device between ON and OFF states resistive states [16] [17] [18] [21] [23] . and a particular shape of I-V in the ON/OFF states might The main advantage of memristive devices is their very vary and will depend on the particular device and/or high density. This high density may be sustained with their additional layers integrated in the device stack. Note that in interconnection by crossbar circuits, in which nanodevices addition to the digital mode operation most of the memristive are sandwiched between two layers of parallel nanowires devices can be switched continuously between ON and OFF (Fig. 1b). Because of crossbar regularity and lack of overlay states (Fig. 2b) by applying gradually increasing electrical (alignment) requirements, they allow for simple and cost- stress. Also, Fig. 2 does not show the so-called “forming efficient fabrication. step,” which might be required before the devices can be switched reversibly. Such a forming step is essentially a one- Moreover, in CMOL circuits, high density and low cost time application of relatively large voltage bias and might be nanodevices are combined with the most attractive properties eliminated in properly engineered devices. of CMOS technology, including high flexibility, functionality and yield. Such synergy can be effectively Resistive switching has been observed experimentally exploited to build terabit scale digital memories [24] [30] for at least fifty years [20] . The interest in devices exhibiting

34 resistive bistability was greatly revived as further scaling of based on memristive properties common for Pt/TiO2-x/Pt, CMOS technology has become increasingly challenging. was design to tune device conductance at a specific bias Although, many material systems exhibit resistive switching point to 1% relative accuracy (which is roughly equivalent to and several are compatible with the CMOS processes, these seven-bit precision) within its dynamic range even in the devices face their own challenges. In particular, they suffer presence of large variations in switching behavior. from low yield and high device-to-device variability. Moreover, the device operation physics, which is likely 3.2 Crossbar Circuits different for different material systems, is still not well The footprint of the memristive devices could be very understood. small and essentially defined by the overlap area of the two 2 There are three main classes of the materials systems for electrodes (), i.e. close to 4Fnano , where Fnano is a memristive devices. The first one is based on chalcogenide half pitch. The most natural way of sustaining the density of materials, with resistance modulation in this materials is single devices is to integrate them into a passive crossbar induced by transition from the crystalline state to a structures, which are implemented with mutually disordered amorphous state [22] . While this concept is the perpendicular layers of parallel wires (electrodes) with thin most mature and best understood, rather slow set process (i.e. film sandwiched between two layers (Fig. 3a). Unlike active annealing time to get to crystalline state) and limited devices of CMOS circuits (), two-terminal endurance may be a problem in the context of the proposed memristive devices could have only one critical dimension, hybrid circuits. i.e. film thickness [15] , which can be controlled without the use of expensive fabrication techniques. This is why Resistive switching in metal-oxide devices has been crossbars can be built with advanced patterning techniques observed in a wide range of material systems including non- such as nanoimprint lithography, so that Fnano may be stoichiometric binary/ternary oxides and perovskites [21] . potentially scaled down to just a few nanometers (essentially This group includes the most promising candidates for limited by quantum mechanical tunneling), which far hybrid circuits, due to CMOS compatibility. The switching exceeds the limits of conventional optical lithography. mechanisms in these devices are, however, the least Finally, fabrication of many thin-film, memristive devices understood, in part due to very rich experimental phenomena does not require high temperatures, thus enabling back-end observed. monolithic integration of multiple layers on a CMOS base (Fig. 1a), so that the effective device footprint may be In solid state electrolytes and some a-Si and SiO2 2 devices, the most likely mechanism is metallic filament reduced even further to 4Fnano /K, where K is the number of formation via redox reaction at the interface. Such reaction vertically integrated layers. results in the penetration of electrode material into the insulating film, with subsequent diffusion upon bias (a) reversing [18] . While these devices exhibit some of the best top endurance and low reset currents, the main concern here is nanowire level volatility due to high mobility cations (which may also create similar problems for CMOS integration). A similar mechanism is resisitive switching devices at each likely behind switching in some organic films. Such material bottom nanowire crosspoint systems may be very attractive because of potentially low level fabrication costs, but their temperature stability is a concern. (b) While there is an abundance of literature on bistable I-V curves, statistics and yield data for memristive devices are +VWRITE rarely reported, most likely due to poor results [16] . Still, A there are some very encouraging data, e.g. as obtained very recently for p-Si/a-Si/Ag junctions [23] , the sample-to- sample distribution of the threshold voltage in these devices -VWRITE is very narrow (r.m.s. deviation ~10%). In addition, these (c) devices combine long retention time, high ON/OFF ratio (~104), and large endurance (> 108 cycles). Also, excellent reproducibility has been reported in [24] which demonstrates VREAD integration of such devices in a 30 by 30 crossbar structure. A Though reproducibility of crosspoint memristive devices is perhaps still the most critical challenge at the moment, in V some cases even relative large amount of device-to-device out Fig. 3. Passive crossbar array: (a) A schematic of the structure variations might be mitigated by using approach proposed in and the idea of (b) writing and (c) reading a particular bit [26] [41] . In this work a simple feedback algorithm, which is [31] .

35 The basic operation (i.e. reading/writing the state of the potentially very low cost and low overhead. The area crosspoint devices) of passive crossbar circuits can be interface is enabled by (i) the crossbar array which is rotated explained using simplified equivalent circuits shown on Fig. by an angle α with respect to the mesh of CMOS-controlled 3 [26] [31] . Assuming digital mode operation, in the ON vias; and (ii) a double decoding scheme that provides unique state (representing logic 1) memristive device is essentially a access to each crosspoint device. More specifically, as Figs. , so that the application of a voltage VREAD, with Vt < 4a-c show, two types of vias, one connecting to the lower VREAD < VON, where Vt and VON are denoted in Figure 2, to (shown with blue dots) and the other to the upper (red dots) one nanowire (say, the second horizontal wire in Fig. 3c) wire level in the crossbar, are arranged into a square array leading to the particular crosspoint memristive device gives a with a side length of 2βFCMOS, which is also equal to the side substantial current injection into the second wire (Fig. 3c). length of the “cells” grouping two vias of each kind. At the This current pulls up voltage Vout which, e.g., can now be crossbar rotation angle α = arcsin(1/β), the vias naturally read out by a sense . To have low current at voltages subdivide the wires into fragments of length 2(βFCMOS)/Fnano. above ~Vt, the diode property prevents parasitic currents Here FCMOS is the CMOS half pitch, while β > 1 is a which might be induced in other ON-state cells by the output dimensionless number that depends on the cell size (i.e. voltage (see the red line in Fig. 3c). In OFF state (which complexity) in the CMOS subsystem; it is not arbitrary, but represents logic zero) the crosspoint current is very small, is chosen from the spectrum of possible values β = (r2 + 1/2 giving a nominally negligible contribution to output signals 1) ×Fnano/FCMOS, where r is an integer so that the precise at readout. In order to switch the cell into ON state, the two number of devices on the wire fragment is r2- 2 2 nanowires leading to the device are fed by voltages VWRITE, 1≈β (FCMOS/Fnano) . with V < V < 2V (Fig. 3b). The left inequality WRITE ON WRITE The decoding scheme in CMOL is based on two separate ensures that this operation does not disturb the state of “semi- address arrays (one for each level of wire in the crossbar so selected” devices contacting just one of the biased that there are a total of 4N edge channels to provide access to nanowires. The write 0 operation is performed similarly two different via controllers (one 'blue' and one 'red') in each using the reciprocal switching with threshold V . OFF of N2 addressing cells in the CMOS plane. In contrast to standard memory arrays, in CMOL each control and data line 3.3 CMOS-Nano Crossbar Integration pair electrically connects the peripheral input/outputs to a via One of the most important challenges of the hybrid instead of a single memory element. In turn, each via is circuits is the efficient implementation of the integration of connected to a wire fragment in the crossbar. The two CMOS and nano crossbars. First of all, there is a conceptual perpendicular sets of wire fragments provide unique access problem that arises from the mismatch of half-pitches Fnano to any crosspoint device even for large values of β. For and FCMOS of the nano and CMOS subsystems. Even if Fnano example, selecting pins δv and b4 (which are highlighted ≈ FCMOS, and the high area density of memristive devices is with blue and red circles, respectively) provides access to the achieved by stacking many layers, the overhead for leftmost of the two shown devices on Fig. 4c, while pins δv programming circuitry, which is required to program all the and c4 for the rightmost device. devices, should be small so as not to comprise the density The total number of crosspoint devices that can be advantages. Secondly, actual integration of memristive accessed by the N×N array of CMOS addressing cells is devices, placed on top of the CMOS stack is a substantial ~N2β2(F /F )2, which would be much larger than N2, if challenge by itself. CMOS nano Fnano

(a) 123 456 (b) (c) control lines for “red” vias Fig. 4. Original CMOL 2F 2βF nano CMOS α crossbar with sandwiched 123456 circuits: (a) top view of the memristive crosspoints f f crossbar structure, (b) cut- ζ (top layer) e ζ e away illustration, and (c) ε ε data corresponding equivalent data d d lines lines δ circuit diagram of the δ c for for 2 c “red” configuration logic in CMOS 2(βFCMOS) “blue” γ b γ vias / Fnano vias layer for the N = 5 primitive b β a β cell array. CMOS α a (bottom layer) α i ii iii iv v vi iiiiiiivvvi control lines for “blue” vias

36 (a) nanowire (b) (c) 100 nm (d) layer switching layer

nanowire layer 40 µm 40 µm CMOS layer

Fig. 5. Hybrid circuit demonstration [25] : (a) Conceptual illustration of the hybrid circuit, (b) optical micrograph of the as-received CMOS chip, (c) a hybrid chip with the memristor crossbars built on top; and (d) scanning electron microscope image of a fragment of the memristor crossbar array (where 3 nanowires cross 3 other nanowires, forming 9 memristors) with junction areas of 100×100 nm2. interfaces. For example, a 100-gate-scale hybrid the other hand, increasing N leads to larger readout delays, CMOS/memristor circuit has been fabricated, in which voltage drop across crossbar wires and, most importantly, memristive TiO2-x film is integrated onto a foundry-built leakage currents via semi-selected devices (Fig. 3). This is CMOS platform using nanoimprint lithography, with why implementing strong nonlinearity in the I-V is one of the materials and processes that are compatible with CMOS [32] most important goals for the resistive switching devices in [25] . This implementation features an area-distributed the context of passive crossbar memories. CMOL-like interface with tilted crossbar fabricated with Another concern is the effect of the defective crosspoint nanoimprint technology (Figure 5). To make alignment devices on the memory performance. The defect density for between the CMOS and crossbar layers feasible, memristive devices is likely to be much higher than that of interconnects between the memristor layer and the CMOS conventional CMOS technology so that some novel defect layer have been implemented using larger contact pads and fault tolerance schemes must be considered. A detailed connected the nanowires to the tungsten vias in the CMOS analysis of CMOL memories (which neglects an issue of substrate. leakage via semi-selected devices) with global and quasi- 3.4 Applications local (“dash”) structure of matrix blocks was carried out [30] [31] . Both analyses rely on the combination of two major Memory Arrays. The most natural applications of the 3D techniques for increasing their defect tolerance: the memory hybrid circuits are embedded memories and stand-alone matrix reconfiguration (the replacement of some rows and memory chips, with their simple matrix structure. Such columns with the largest number of bad memory cells for memories are an extension of the so-called resistive spare lines), and error correction codes. The best results were memories (RRAM) [28] . (Memories based on phase change achieved in the later architecture using a synergy of bad bit devices are already in the development stage [29] .) For the exclusion with BCH error-correction codes: the defect memory application, each memristive device corresponds to tolerance is up to 8 to 12% depending on the required access a single-bit , while the CMOS subsystem may be speed (Fig. 6). As a result, CMOL memories may be the first used for coding, decoding, line driving, sensing, and technology to reach the terabit frontier. input/output functions. Reconfigurable Circuits. To implement logic operations, Having larger crossbar arrays helps to approach the each cell (or supercell) hosts some CMOS logic gate in 2 ideal density 1/(2Fnano) because for N×N crossbar array addition to configuration (pass ) circuitry (Fig. 7). peripheral area overhead (i.e. sense , decoders etc.) In the original CMOL field-programmable gate array is proportional to N*logN, while useful area scales as N2. On (FPGA), each cell contains a CMOS inverter, and any multi- input NOR gate can be implemented with the inverter and a 1 2 ) 10 few memristive devices with diode-like functionality [33]

CMOS Ideal CMOS [34] [35] . During the reconfiguration stage, all logic gates F ( N

/ are disabled and any memristive device can be configured to A 100 Access time (ns)

= the high or low resistive state, similar to the operation of the 3 a 10 CMOL memory. Once configured, memristive devices do 30 not change state and those set to the low resistive state 100 10-1 represent electrical connection between logic gates. Ideal CMOL For example, a particular input of the NOR gate can be F F CMOS/ nano=10 connected to any of the outputs of other NOR gates that are

Area per useful bit, bit, per useful Area -2 10 -5 -4 -3 -2 -1 0 within the “connectivity domain” by programming the 10 10 10 10 10 10 corresponding memristive devices (Fig. 7d). The cells that Fraction of bad nanodevices, q are not within the connectivity domain of each other can be Fig. 6. Density (in terms of chip area per bit) for CMOL memory as connected by dedicating some CMOS cells for routing a function of defective device fraction, for several memory access purposes, e.g., to use two inverters in sequence as a buffer. time values, and for a particular FCMOS/Fnano ratio [31] .

37 (a) A B C (d) intrinsic to CMOL FPGA gates and thus enabling much nanodevices ROFF/M more efficient use of nanosubsystem. Wide fan-in allows to R ON F A F compare an n-bit pattern (which is programmed as a binary B weight in the memristive devices) with the streaming data in C F B Cwire Rpass just one cycle and enables average nanodevice utilization (b) (c) close to ~15% for these circuits. Moreover, multiple patterns A C may be compared in parallel enabling a throughput of up to 3×1019 bits/second/cm2 for pattern matching for an aggregate of 1010 bits for the 45-nm CMOS technology and practicable power density, even with conservative assumptions for nanodevice density and performance characteristics, and Fig. 7. CMOL FPGA circuits [35] : (a) The idea of diode NOR without any optimization. logic; (b) basic inverter and (c) latch CMOS cells; For clarity, panel (d) shows only nanodevices and nanowires participating in the Figure 8 shows the results of successful experimental NOR gate demonstrated on panel a. demonstration of the CMOL FPGA-like concept [25] . In this Preliminary theoretical results show a very substantial slightly simplified version of the original CMOL FPGAs density advantage (on the average, about two orders of [43] complete logic gates are implemented in CMOS magnitude) over the purely CMOS circuits, and a subsystem while memristive devices are used only to connect considerable leading edge over alternative hybrid circuit selectively CMOS gates. In particular, Figure 8 shows one concept, so-called nanoPLA [36] . of the many possible signal routings for NOT, AND, OR, NAND and NOR gates and a D-type flip-flop which were On the other hand, the simulation results show that the successfully programmed and tested. It is worth mentioning speed of CMOL FPGA circuits is not much faster than that that this work is perhaps the first ever successful of CMOS. The situation may be rather different in custom demonstration of using this novel technology at such scale. logic circuits, were CMOL may lose a part of its density advantage, but become considerably faster than CMOS. An Bio-inspired Information Processing. Perhaps the most example of such a circuit, performing fast, parallel imaging exciting application of CMOL circuits is in bio-inspired processing, was presented in [42] . The simulated time of information processing – for example, in the field of artificial of a large (1,024×1,024 pixel) image with a neuromorphic networks (ANN). The motivation behind 32×32 window function (at 12-bit precision) is close to just ANN comes from the fact that the mammalian brain still 25 μs. This time has to be compared with estimated 3,500 us remains much more efficient (in power and processing for a CMOS circuit based on the same design rules. This speed) for a number of computational tasks, such as pattern speed advantage is an explicit result of small CMOL recognition and classification, as compared to conventional footprint: the whole circuit processing one input pixel may computers, despite the exponential progress in the be placed behind a 25×25 um2 pixel sensor. As a result, the performance of the latter during the past several decades [14] communication delays are cut to the bone. [37] [40] . The main reason behind poor performance of the conventional CMOS-based circuits is that they cannot Slightly modified original CMOL FPGA circuits have provide high complexity, connectivity and massive parallel potentials to perform massively parallel high throughput information processing which is required typical for pattern matching far exceeding the state-of-the-art biological neural networks. conventional implementations [38] [39] . Pattern matching applications take the full advantage of the very wide fan-in The very structure of CMOL circuits makes them uniquely suited for the implementation of (a) (b) Voltages (V) ultrafast ANN [14] [37] and, in general, A NOT gate VA 3.300 0.002 mixed-signal information processing C V -0.001 3.296 systems [42] [41] . For example, the A C C CMOS subsystem would implement less Voltages (V) A AND gate V -0.001 3.299 -0.001 3.299 dense but also more complex somatic A cells. The area-distributed interface and A B V B -0.001-0.001 3.297 3.298 B dense fragmented crossbar structure of C V -0.004-0.005-0.005 3.295 1 C C CMOL ensure very rich interconnect Voltages (V) OR gate A among somas cells which are connected V 0.002 0.000 3.297 3.298 A to ech other via nanowire-memristive A C B V 0.000 3.297 -0.000 3.297 B B device – nanowire links, i.e. with C 1 VC -0.004 3.297 3.297 3.296 memristive devices acting as artificial synapses. Crude estimates have shown Fig. 8. Experimental demo of a CMOL FPGA-like circuit [25] : (a) CMOS layer fabric that artificial synapses smaller than 10 on a die; and (b) equivalent circuits and digital logic results from the chip tester. nanometres in length could be used to

38 make artificial neural networks of sufficient complexity and experimental milestones are very encouraging, more work is connectivity to challenge the computational performance of certainly needed in order to fulfill the potentials of CMOL the human brain [37] . technology. The most critical issues for CMOL technology still remains yield, reproducibility of the memristive devices Very recently, a key operation for analog and mixed and their viable integration with conventional signal processing - multiply-and-add (i.e. dot-product) has complementary metal oxide semiconductor technology. been demonstrated with hybrid CMOS/memristor circuits (Fig. 9) [41] . To realize such circuitry the memristive 5. Acknowledgments devices implement density-critical configurable weights (e.g. The authors would like to acknowledge useful artificial synapses), while CMOS is used for the summing discussions with K. Likharev as well as other members of the amplifier, which provides gain and signal restoration (e.g. HyNano team. This work is supported by the Air Force implementing simple soma function in the context of ANN). Office of Scientific Research (AFOSR) under the MURI As a result, individual voltages applied to memristors can be grant FA9550-12-1-0038. multiplied by the unique weight (conductance) of memristor and summed up by CMOS amplifier - all in analog fashion. 6. References 4. Summary [1] D.J. Frank, R.H. Dennard, E. Nowak, P.M. Solomon, Y. Taur, and H.S.P. Wong, “Device scaling limits of Si The development of 3D CMOL circuits should greatly MOSFETs and their application dependencies”, Proc. benefit digital memory and logic circuits and may, for the IEEE, vol. 89, pp. 259–288, 2001. first time, enable large scale bio-inspired neuromorphic [2] A.W. Topol et al., “Three-dimensional integrated circuits. Practical introduction of such circuits will enable circuits”, IBM Journal of Research and Development, new compact, fast electronic systems which would open up a vol.50, pp.491-506, July 2006. number of new applications. For example, the initial [3] V.F. Pavlidis and E.G. Friedman, “Interconnect-based experimental demonstrations with just two crossbar layers, design methodologies for three-dimensional integrated circuits”, Proc. IEEE, vol.97, pp.123-140, Jan. 2009. with Fnano = 50 nm, and FCMOS = 130 nm, would enable a 2 [4] S. Borkar, “3D integration for energy efficient system data density of up to 20 Gbit/cm combined with a design”, in: Proc. 2009 VLSI Technology Symposium, throughput, between CMOS and nano subsystems, of up to Kyoto, Japan, June 2009, pp. 58-59. 15 2 5×10 bits per second per cm , assuming a manageable [5] R. S. Patti, “Three-dimensional integrated circuits and the power density [23] [24] [44] . With an assumption of more future of system-on-chip designs”, Proceedings of the aggressive, though still realistic, technology parameters, i.e. IEEE, vol. 94, pp.1214-1224, June 2006. 10 crossbar layers, Fnano = 10 nm, and FCMOS = 45 nm, the [6] D.B. Strukov and A. Mishchenko, “Monolithically maximum density will be close to 2.5 Tbit/cm2 while stackable hybrid FPGA”, in: Proc. Design Automation ensuring similar or superior throughput. The high memory and Test in Europe, Dresden, Germany, Mar. 2010, pp. 661-666. density and aggregate throughout of such circuits open [7] F. Liu et al., “A 300-mm wafer-level three-dimensional unprecedented information processing capabilities, e.g., a 19 2 8 integration scheme using tungsten through-silicon via and pattern matching rate of about 10 bits/second/cm for 4×10 hybrid Cu-adhesive bonding”, in: Proc. IEEE locally stored 500-bit-wide patterns. International Electron Devices Meeting, Dec. 2008, pp. 1- 4. While the initial simulation results and recent [8] W. R. Davis et al., “Demistifying 3D ICs: The pros and cons of going vertical”, IEEE Design and Test of Computers, vol. 22, pp. 498-510, 2005. 20KΩ 2 [9] E.-K. Kim and J. Sung, “Yield challenges in wafer Output stacking technology”, Microelectron Reliab., Vol. 48, CMOS opamp Input1 Input2 pp.1102–1105, 2008.

Output(V) 1 [10] S.M. Chai, A. Gentile, W.E. Lugo-Beauchamp, J. Fonseca, J.L. Cruz-Rivera, and D.S. Wills, “Focal-plane processing architectures for real-time hyperspectral image 0.0 processing”, Appl. Optics, vol. 39, pp. 835-849, 2000. Programming Programming Programming -0.1 Programming [11] K. Sakuma et al., “3D chip-stacking technology with

Input2(V) -0.2 through-silicon vias and low-volume leadfree 0.0 interconnections”, IBM Journal of Research and Programming -0.1 Development, vol. 52, pp. 611 – 622, 2008. -0.2 Input1 (V) Input1 [12] H. Wei et al., “Monolithic three-dimensional integrated 0 50 100 circuits using carbon nanotube FETs and interconnects”, Time (s) in: Proc. IEEE International Electron Devices Meeting Fig. 9. Illustration of analog multiply and add circuitry operation (IEDM), San Francisco, CA, Dec. 2009, pp.1-4. with IC summing amplifier and memristive devices set with high [13] T. Naito et al., “World's first monolithic 3D-FPGA with TFT SRAM over 90 nm 9 layer Cu CMOS”, in: Proc. precision to 15, 30 and 60μA states with adaptive variation tolerant algorithm [41] .

39 Symposium on VLSI Technology (VLSIT), Honololu, HI, [29] H.-B. Kang et al., “Core circuit technologies for PN- June 2010, pp.219-220. diode-cell PRAM”, J. Semiconductor Technology and [14] K. Likharev, A. Mayr, I. Muckra, and O. Türel, Science, vol. 8, pp. 128-133, 2008. “CrossNets: High-performance neuromorphic [30] D. Strukov and K. Likharev, “Prospects for terabit-scale architectures for CMOL circuits”, Ann. NY Acad. Sci, vol. nanoelectronic memories”, Nanotechnology, vol. 16, pp. 1006, pp. 146-163, 2003. 137-148, Jan. 2005. [15] K.K. Likharev and D.B. Strukov, “CMOL: Devices, [31] D.B. Strukov and K.K. Likharev, “Defect-tolerant circuits, and architectures”, in: Introducing Molecular architectures for nanoelectronic crossbar memories”, J. Electronics, G. Cuniberti, G. Fagas, and K. Richter, Eds., Nanoscience Nanotechnology, vol. 7, pp. 151-167, 2007. Berlin: Springer, pp. 447-478, 2005. [32] J. Borghetti et al., “A hybrid nanomemristor/transistor [16] K.K. Likharev, “Hybrid CMOS/nanoelectronic circuits: logic circuit capable of self-programming”, Proc. Opportunities and challenges”, J. and National Academy of Sciences, vol. 106, pp. 1699-1703, Optoelectronics, vol. 3, pp. 203-230, 2008. 2009. [17] D.B. Strukov, G. Snider, D. Stewart, R.S. Williams, “The [33] D.B. Strukov and K.K. Likharev, “CMOL FPGA: A missing memristor found”, , Vol. 453, pp.80-83, reconfigurable architecture for hybrid digital circuits with 2008. two-terminal nanodevices”, Nanotechnology, vol. 16, pp. [18] R. Waser, R. Dittman, G. Staikov, K. Szot, “Redox-based 888-900, 2005. resistive switching memories – nanoionic mechanisms, [34] D.B. Strukov and K.K. Likharev, “CMOL FPGA prospects, and challenges”, Advanced Materials, vol. 21, circuits”, in: Proc. Int. Conf. Computer Design, Las pp. 2632-2663, 2009. Vegas, NE, Jun. 2006, pp. 213-219. [19] D.B. Strukov and H. Kohlstedt, “Resistive switching [35] D.B. Strukov and K.K. Likharev, “A reconfigurable phenomena in thin films: Materials, devices, and architecture for hybrid CMOS/nanodevice circuits”, in: applications”, Materials Research Society Bulletin, vol. Proc. Field Programmable Gate Arrays, Monterey, CA, 37, Feb 2012. Feb. 2006, pp. 131-140. [20] M.T. Hickmott, “Low-frequency negative resistance in [36] A. DeHon, “Array-based architecture for FET-based thin anodic oxide films”, J. Appl. Phys. vol. 33, pp. 2669- nanoscale electronics”, IEEE Trans. Nanotechnology, vol. 2682, 1962. 2, pp. 23-32, 2003. [21] J.J. Yang et al. “Memristive switching mechanism for [37] K. K. Likharev, "CrossNets: Neuromorphic hybrid metal/oxide/metal nanodevices”, Nature Nanotechnology, CMOS/nanoelectronic networks", Science of Advanced vol. 3, pp. 429-433, 2009. Materials, vol. 3, pp. 322-331 (2011). [22] Y.C. Chen et al., “Ultra-thin phase-change bridge [38] F. Alibart, T. Sherwood, and D.B. Strukov, “Hybrid memory device using GeSb”, in: Proc. Electron Device CMOS/nanodevice circuits for high throughput pattern Meeting, San Francisco, CA, Dec. 2006, pp. 1 – 4. matching applications”, in: Proc. AHS’11, pp. 279-286, [23] S.H. Jo and W. Lu, “CMOS compatible nanoscale San Diego, CA, Jun. 2011. nonvolatile switching memory”, Nano Lett., vol. 8, [39] D.B. Strukov, “Hybrid CMOS/Nanodevice circuits with pp.392-397, 2008. tightly integrated memory and logic functionality”, in: [24] S.H. Jo, K.-H. Kim, and W. Lu, “High-density crossbar Proc. Nanotech‘11, vol. 2, pp. 9-12, Boston, MA, Jun. arrays based on a-Si memristive system”, Nano Lett., 2011. vol. 9, pp. 870-874, 2009. [40] D.B. Strukov, “Smart connections”, Nature 476, pp. 403- [25] Q. Xia et al., “Memristor-CMOS hybrid integrated 405, 2011. circuits for reconfigurable logic”, Nano Letters, vol. 9, pp. [41] F. Alibart, L. Gao, B. Hoskins, and D.B. Strukov, “High- 3640-3645, 2009. precision tuning of state for memristive devices by [26] D.B. Strukov and K.K. Likharev, Reconfigurable nano- adaptable variation-tolerant algorithm”, Nanotechnology, crossbar architectures, in: Nanoelectronics, R.Waser, vol. 23, art. 075201, 2012. Eds., 2012. [42] D.B. Strukov and K.K. Likharev, “Reconfigurable hybrid [27] D. B. Strukov and R. S. Williams, "Four-dimensional CMOS/nanodevice circuits for image processing”, IEEE address topology for circuits with stacked multilayer Trans. Nanotechnology, vol. 6, pp. 696-710, 2007. crossbar arrays," Proc. National Academy of Sciences, [43] G.S. Snider, R.S. Williams, “Nano/CMOS architectures vol. 106, pp. 20155-20158, Dec. 2009. using a field-programmable nanowire interconnect”, [28] K.K. Likharev, “Resistive and hybrid CMOS/nanodevice Nanotechnology, vol. 18, art. 035204, 2007. memories”, in: J. E. Brewer and M. Gill (eds.), [44] K.-H. Kim, S.H. Jo, S. Gaba and W. Lu, “Nanoscale Nonvolatile Memory Technologies with Emphasis on resistive memory with intrinsic diode characteristics and Flash, Wiley, Hoboken, NJ, (2008), pp. 696-703. long endurance”, Applied Physics Letters, vol. 96, art. 053106, 2010.

40