All-NDR Crossbar Logic

Dmitri B. Strukov, Member, IEEE, and Konstantin K. Likharev, Fellow, IEEE

Abstract: We propose new crossbar circuits in which logic functionality, signal restoration, and connectivity are all performed by similar bistable two-terminal devices with negative differential resistance (NDR) in one of the states. The gate isolation challenge is met by using the device's nonlinearity together with a multiphase clocking scheme. A preliminary evaluation shows that, for at least some applications, the all-NDR approach enables circuit density and data throughput higher than those of hybrid CMOL FPGA ICs.

Manuscript received April 18th, 2011. This work was supported by the National Science Foundation grants CCF-1017579 and CCF-0829947.
D.B. Strukov is with the Electrical and Computer Engineering Department, University of California at Santa Barbara, Santa Barbara, CA, 93106 USA (phone: 805-893-2971; fax: 805-893-3262; e-mail: strukov@ece.ucsb.edu).
K.K. Likharev is with the Physics and Astronomy Department, Stony Brook University, Stony Brook, NY 11794-3800 USA (phone: 631-632-8159; fax: 631-632-4976; e-mail: [email protected]).

I. INTRODUCTION

Attempts to utilize the negative differential resistance (NDR) effect in two-terminal devices (based, e.g., on band-to-band tunneling in Esaki diodes [1] or on resonant tunneling through quantum wells [2]) for computing have a long history. Perhaps the most serious of them were based on so-called Goto pairs [3]. Unfortunately, because the performance characteristics of NDR-based circuits (such as density and/or power) were inferior to those of circuits based on CMOS technology, the Goto pair logic did not go mainstream. However, the impending end of CMOS scaling may give NDR logic another chance, because two-terminal NDR devices have only one critical dimension, which may be controlled by layer thickness, so that the reduction of their lateral dimensions may be pursued much more aggressively than that of transistors.

The most plausible way to sustain a high density of two-terminal devices is to build crossbars incorporating them at each crosspoint. The regular topology of crossbars allows their fabrication using such advanced patterning technologies as nanoimprint [4, 5], interference lithography [6], and block-copolymer lithography [7], which may be scaled down beyond the 10-nm frontier without using prohibitively expensive equipment. However, in order to use this opportunity, effective ways to form arbitrary logic circuits from the regular crossbar fabric have to be found.

A possible way here is to use bistable crosspoint devices which could be switched between their low-current (OFF) state and a high-current (ON) state which would feature an NDR branch. Such circuits can perform signal restoration, and thus avoid using for that purpose a CMOS subsystem or other three-terminal devices, as all previously proposed hybrid CMOS/nanocrossbar circuits do, resulting in a significant footprint overhead (see, e.g., reviews [8-10]). For that, however, the gate isolation problem should be solved. Namely, in all earlier NDR logic circuits we are aware of, Goto pair connection has been performed using some other two-terminal devices (see, e.g., Refs. 11, 12). In simple (two-layer, uniform) crossbars, this is not an option. In this paper, we propose a solution of the gate isolation problem, based on a multiphase clocking scheme and a specifically engineered device nonlinearity.

II. ALL-NDR LOGIC CIRCUITS

Figure 1a shows an example of a device capable of performing all the functions we need. Its stack consists of two back-to-back Esaki diodes (which ensure a symmetric NDR characteristic in the ON state), in series with a tunnel barrier to suppress current at small voltages (the feature crucial for gate isolation, see below), and a bistable (“memristive”) layer [13]. Figure 1b shows simulated I-V curves based on a crude model of the stack. In particular, each Esaki diode was simulated using Eqs. 1a-c and the specific parameters listed in Ref. 14, with the current scaled down by a factor of $10^4$, corresponding to ~25×25 nm² germanium junctions. The tunnel barrier was approximated using the expression $I = 10^{-15}\sinh(40V)$ (in SI units), roughly corresponding to a 6-nm TiO2 tunnel barrier of the same area. For simplicity, the ON resistance of the switch was assumed to be much smaller, and its OFF resistance much larger, than the resistance scale of the rest of the stack. (For scaling to sub-10-nm lateral dimensions, other means of combining the NDR effect with bistability, such as self-assembled molecular monolayers with single-electron tunneling [15], may be more realistic.)

Fig. 1. Resistive switching devices (also known as “memristive switches” or “latching switches”) with symmetric negative differential characteristics: (a) device stack with the NDR effect due to back-to-back Esaki diodes and energy diagram (schematically) and (b) its simulated I-V characteristics. The inset of panel (b) shows the same I-V curve in a log-log scale. Panel (a) labels: top electrode, tunnel barrier, n-type, p-type, n-type (back-to-back Esaki diodes), memristive layer (“on” and “off” states), bottom electrode. Panel (b) plots I (μA) against V (Volts) for the ON and OFF states.

Figure 2 shows the results of simulation of the transfer curve (Vclk vs. Vout) for a Goto pair based on two NDR devices of the type shown in Fig. 1a-b, with no input current injected into its central node, voltage-biased with a symmetric clock signal. Similarly to the conventional case [3, 11, 12], the two stable states of the Goto pair correspond to either high (Vout = V0, in our case V0 ≈ 0.5 V) or low (Vout = -V0) output voltage.

Figure 3a shows the transfer curve for a buffer gate whose symmetry is broken during the clock ramp-up. More general, reconfigurable threshold gates [16] with the Boolean function

$$y = V_0\,\mathrm{sgn}\Big[\sum_i x_i - \Theta\Big] \qquad (1)$$

(where $x_i, y \in \{-V_0, +V_0\}$ are the digital inputs and output, respectively, and Θ is the switching threshold) may also be readily implemented by feeding the Goto pair with several input currents. For each gate, Θ is set by properly biasing the Goto pair, i.e. by connecting its central node, via an additional NDR device, to the corresponding bias voltage Vup or Vdown. For example, Fig. 3b shows a 2-input gate with a switching threshold of Θ = -V0, i.e. a 2-input OR gate. Figs. 3c and 3d show 3-input gates with Θ = -2V0 and Θ = 0, respectively, and the results of their numerical simulations.

Figure 4 shows the idea of multiphase clocking used for gate isolation. There are four distinct phases of each clock signal. Every clock voltage is set to ±VDD/2 during one of the phases (REF). During this phase, the outputs of the Goto pairs driven by this clock are either high or low (either +V0 or -V0) and serve as references for the next, “evaluation” pairs, whose clock voltage is being ramped up from zero to VDD/2 (Fig. 4b). The Goto pairs driven by the evaluation clock are forced to calculate a certain Boolean function of their inputs. During the next two phases, the clock voltage is set to zero, disabling the corresponding Goto pairs. The resulting computation is performed in a systolic fashion, with the direction determined by the clocking sequence, similar to reversible parametron-like logic circuits [17-19] (some of them referred to as “quantum-dot cellular automata”, QCA) and adiabatic CMOS circuits [20].

The gate isolation is quite sufficient due to the high nonlinearity of the NDR devices at low voltages. For example, for the Goto pair being evaluated, the current from its output pair is negligible (below 100 pA even for a fan-out of 10), given the maximum bias of V0 = 0.5 V applied over one NDR device connected in series with several other NDR devices (Fig. 4a). In addition, if necessary, the reference Goto pair can be reinforced with more NDR devices to minimize the effect of the output current on its state (see gray lines in Fig. 4a). In this case each evaluation stage should be followed by a reinforcement phase (in Fig. 4a, provided by clock VRE).

III. DISCUSSION

The proposed all-NDR logic scheme maps naturally to hybrid CMOL IC circuits ([9, 10], Fig. 5), with CMOS circuitry used only for configuration, clock distribution, and I/O purposes. For example, Fig. 6 shows a full adder implementation using an all-NDR crossbar circuit. (Note the dual-rail implementation, which is a necessity due to the lack of signal inversion by a Goto pair.) With such an implementation, the full adder comprises two logic levels and a total of 8 gates (i.e. 8 CMOS cells in CMOL circuits).
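The count of eight gates in two logic levels can be checked against one possible decomposition. The sketch below is an assumption rather than the wiring of Fig. 6: it uses only non-inverting 3-input threshold gates of the form of Eq. (1), with thresholds of -2, 0, and +2 (in units of V0), and carries complementary rails explicitly. Logic values are written as +1 and -1, and the function names are hypothetical.

```python
from itertools import product

def threshold_gate(inputs, theta):
    """Eq. (1) in units of V0: +1 if sum(inputs) exceeds theta, else -1."""
    return 1 if sum(inputs) - theta > 0 else -1

def full_adder_dual_rail(a, b, c_in):
    """Dual-rail full adder built from eight non-inverting threshold gates.

    Inputs are +1/-1. Complement rails are formed here by negation only to
    model the dual-rail inputs; no gate performs an inversion.
    """
    x = (a, b, c_in)
    nx = (-a, -b, -c_in)              # complementary input rails

    # Level 1: six gates, thresholds -2, 0, +2 on each set of rails.
    or3 = threshold_gate(x, -2)       # OR of the true rails
    c_out = threshold_gate(x, 0)      # majority = carry out
    and3 = threshold_gate(x, +2)      # AND of the true rails
    n_and3 = threshold_gate(nx, -2)   # OR of complements = NOT(and3)
    n_c_out = threshold_gate(nx, 0)   # majority of complements = NOT(c_out)
    n_or3 = threshold_gate(nx, +2)    # AND of complements = NOT(or3)

    # Level 2: two majority gates produce the sum rails.
    s = threshold_gate((and3, or3, n_c_out), 0)
    n_s = threshold_gate((n_and3, n_or3, c_out), 0)
    return s, n_s, c_out, n_c_out

if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        rails = [2 * bit - 1 for bit in bits]   # map 0/1 to -1/+1
        print(bits, full_adder_dual_rail(*rails))
```

In this sketch the sum is obtained as the majority of AND3, OR3, and the complemented carry, which works because AND3 implies OR3 and excludes a low carry. Any buffer (delay) gates needed to balance the pipeline are not modeled.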

Fig. 2. Results of simulation of a Goto pair [3] implemented with ON-state NDR devices: (a) a set of device I-V curves, showing the currents I(Vout-Vclk) and I(Vclk-Vout) flowing, respectively, through the upper and lower NDR devices of the Goto pair, for four cases: VCLK = 0 V, 0.55 V, 0.67 V, and 1.05 V (from top to bottom); and (b) the transfer curve of the Goto pair at zero input current. The top inset in panel (b) shows the equivalent circuit of the Goto pair, while the bottom inset shows a typical time evolution of the clock voltage. The output voltage of the Goto pair corresponds to the equality of the two currents shown in panel (a), i.e. to the points A-G of intersection of the blue and red curves in that panel. These points are also shown in panel (b); points E and I correspond to unstable equilibrium. Arrows show voltage leaps as VCLK is changed. Panel (a) plots I (μA) against V (Volts); panel (b) plots VOUT (Volts) against VCLK (Volts).
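The way a Goto pair becomes bistable as the clock is ramped up can be illustrated with a deliberately simplified model. The sketch below is not the device stack of Fig. 1: it assumes a toy odd-symmetric I-V curve, I = Ip·(V/Vp)·exp(1 - |V|/Vp), with a single peak and no valley, and it assumes that the two devices are tied to rails at +Vclk and -Vclk. The values of `V_P` and `I_P` and all function names are hypothetical, so the resulting V0 is not expected to match the ~0.5 V quoted in the text.

```python
import math

V_P = 0.1    # peak voltage of the toy I-V curve, volts (assumed)
I_P = 1e-6   # peak current of the toy I-V curve, amperes (assumed)

def ndr_current(v):
    """Toy odd-symmetric I-V curve with an NDR branch for |v| > V_P."""
    return I_P * (v / V_P) * math.exp(1.0 - abs(v) / V_P)

def node_current(v_out, v_clk):
    """Net current into the central node of the pair.

    The upper device sits between the +v_clk rail and the node, the lower
    device between the node and the -v_clk rail.
    """
    return ndr_current(v_clk - v_out) - ndr_current(v_out + v_clk)

def stable_outputs(v_clk, tol=1e-9):
    """Stable node voltages of the toy Goto pair for a given clock level."""
    if v_clk <= V_P:
        # Both devices are on their positive-resistance branch:
        # the symmetric state v_out = 0 is the only equilibrium.
        return [0.0]
    # For v_clk > V_P the symmetric state is unstable. The net current is
    # positive just above v_out = 0 and negative at v_out = v_clk, so a
    # stable equilibrium lies in between; bisection finds it.
    lo, hi = 1e-6 * v_clk, v_clk
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if node_current(mid, v_clk) > 0:
            lo = mid
        else:
            hi = mid
    v0 = 0.5 * (lo + hi)
    return [-v0, v0]   # the mirror state follows from the odd symmetry

if __name__ == "__main__":
    for v_clk in (0.05, 0.15, 0.2, 0.3):
        print(v_clk, stable_outputs(v_clk))
```

With a realistic device model the same balance of currents gives the intersection points of Fig. 2a; the toy curve only reproduces the qualitative change from one stable state to two.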

Fig. 3. Threshold gates and their calculated transfer curves: (a) buffer, (b) OR gate, (c) 3-input threshold gate with Θ = -2V0 (i.e. fan-in-3 OR gate), and (d) a similar gate with the threshold set to Θ = 0 (i.e. majority gate). Symmetric input vectors are not shown. kclk and kup denote the number of nanodevices connected in parallel to the clock and pull-up (Vup = V0) voltage sources, respectively. All transfer curves are drawn on the same scale, similar to that shown in Figs. 3a and 2b. Panel annotations: (a) kclk = 1; (b) kclk = 2, kup = 1; (c) kclk = 3, kup = 2; (d) kclk = 2. Each panel plots VOUT (Volts) against VCLK (Volts) for the listed combinations of VIN = ±V0.
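The correspondence between the thresholds quoted for Fig. 3 and the named gates follows directly from Eq. (1). The sketch below enumerates the truth tables in volts, taking V0 = 0.5 V from the text. The threshold Θ = 0 for the buffer is an assumption (the source does not state it), and the function names are hypothetical.

```python
from itertools import product

V0 = 0.5  # logic level in volts (the text quotes V0 of about 0.5 V)

def threshold_output(inputs, theta, v0=V0):
    """Eq. (1): y = V0 * sgn(sum(x_i) - theta), all quantities in volts."""
    drive = sum(inputs) - theta
    if drive == 0:
        raise ValueError("input sum equals the threshold; Eq. (1) is undefined")
    return v0 if drive > 0 else -v0

def truth_table(n_inputs, theta, v0=V0):
    """Map every vector of +/-V0 inputs to the gate output."""
    return {x: threshold_output(x, theta, v0)
            for x in product((-v0, v0), repeat=n_inputs)}

# (number of inputs, threshold in volts) for the gates of Fig. 3
GATES = {
    "buffer (Fig. 3a, threshold assumed)": (1, 0.0),
    "2-input OR (Fig. 3b)": (2, -V0),
    "3-input OR (Fig. 3c)": (3, -2 * V0),
    "3-input majority (Fig. 3d)": (3, 0.0),
}

if __name__ == "__main__":
    for name, (n_inputs, theta) in GATES.items():
        print(name)
        for x, y in truth_table(n_inputs, theta).items():
            print("  ", x, "->", y)
```

Raising the threshold of the 3-input gate to +2V0 in the same function gives a 3-input AND gate, which is how a single Goto pair can be reconfigured by its bias connection alone.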

The circuit density depends substantially on whether a dataflow systolic-like architecture may be used. For example, if the most significant input bits are supplied with a certain time delay (i.e. exactly when they are needed), a 32-bit ripple-carry adder has only 32×8 CMOS cells. On the other hand, a fully balanced scheme [16], in which most gates are configured as delay (buffer) elements, requires a ~(32×6)×(32×2) array of cells. Using the results of our prior work [9, 10], according to which the linear size of the CMOS cell is ~6FCMOS, and assuming that the number of control cells for practical connectivity domain sizes is negligible, these two cases correspond to an adder area of about (100×FCMOS)² and (660×FCMOS)², respectively (the square root of the cell count times the cell size). The former number yields a circuit density slightly higher than that for the CMOL FPGA technology [9]. However, in both cases the all-NDR approach allows for a ×6 higher throughput (at the same power consumption and similar latency) due to its inherently pipelined structure, capable of processing a new set of data every fourth clock cycle.

On the negative side, this concept shares CMOL's major challenge, the need for a high-yield technology of bistable device fabrication. In addition, it imposes a requirement pertinent to NDR devices, that of a sufficiently high peak-to-valley ratio. Also, mapping arbitrary circuits to a deeply

Fig. 4. Four-phase clocking scheme: (a) an example of a circuit and (b) clock voltage evolution in time. Gray lines show (optional) additional Goto pairs providing input signal reinforcement. The blue line in panel (b) shows the pull-up voltage Vup associated with this clock phase. Panel (a) is organized by clock phase (phase 1 to phase 4); panel (b) plots the clock voltages, between -VDD/2 and +VDD/2, against time t, with the clock states labeled REF, EVAL, and OFF.
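The phase relationship between the four clocks can be written down as a small schedule. The sketch below is an illustration under assumptions: each clock is taken to cycle through EVAL, REF, OFF, OFF, with clock k+1 lagging clock k by one phase, so that the stage being evaluated always reads a stage that is holding its reference level. The optional reinforcement clock VRE is not modeled, and `v_dd`, the clock indexing, and the function names are hypothetical.

```python
PHASES = ("EVAL", "REF", "OFF", "OFF")   # sequence seen by a single clock
N_CLOCKS = 4

def clock_state(clock, step):
    """State of clock `clock` (0..3) during phase step `step` (0, 1, 2, ...)."""
    return PHASES[(step - clock) % N_CLOCKS]

def clock_amplitude(clock, step, frac, v_dd=1.0):
    """Clock amplitude at fraction `frac` (0..1) of the way through a step.

    The clock is symmetric, so the two rails of the Goto pair sit at plus
    and minus this amplitude.
    """
    state = clock_state(clock, step)
    if state == "EVAL":
        return 0.5 * v_dd * frac   # ramp from zero to VDD/2
    if state == "REF":
        return 0.5 * v_dd          # hold at VDD/2: outputs serve as references
    return 0.0                     # OFF: the Goto pairs are disabled

def schedule(n_steps):
    """Table of clock states: one row per phase step, one column per clock."""
    return [[clock_state(k, t) for k in range(N_CLOCKS)] for t in range(n_steps)]

if __name__ == "__main__":
    for t, row in enumerate(schedule(8)):
        print(t, row)
```

In every row of the printed table exactly one clock is in EVAL and the clock just before it is in REF, which is the systolic ordering described in the text; each stage returns to EVAL every fourth step.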

Fig. 5. Example of a simple logic circuit and its plausible implementation using the CMOL hybrid IC [9, 10]: (a) mapping of the simple circuit (shown in the bottom right corner) to the crossbar; (b) the generic hybrid circuit and (c) its CMOS cell; (d) an example of CMOS circuitry allocation for clocking and biasing cells. The allocation is rather flexible, and the ratio of the numbers of control cells and logic cells may be changed in the field with an enable signal and some extra peripheral circuitry. Note that, for clarity, panel (a) does not show crosspoint devices in their OFF state. Also, several modifications of the CMOL hybrid circuits, including fully CMOS-compatible [21] and monolithically stackable 3D hybrid circuits [22], may be used instead.
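The adder areas quoted in the Discussion follow from the CMOS cell counts and the ~6FCMOS cell size. The short check below assumes that the cells are packed into a square block and that control cells are neglected, as in the text; the function and variable names are hypothetical.

```python
import math

CELL_SIZE_F = 6  # linear size of one CMOS cell, in units of F_CMOS (from the text)

def block_side_in_f(n_cells, cell_size_f=CELL_SIZE_F):
    """Side of a square block holding n_cells cells, in units of F_CMOS."""
    return cell_size_f * math.sqrt(n_cells)

systolic_cells = 32 * 8                # 32-bit adder with delayed inputs
balanced_cells = (32 * 6) * (32 * 2)   # fully balanced scheme

if __name__ == "__main__":
    print("systolic side:", block_side_in_f(systolic_cells), "F_CMOS")
    print("balanced side:", round(block_side_in_f(balanced_cells), 1), "F_CMOS")
    print("area ratio:", balanced_cells / systolic_cells)
```

The two sides come out close to the rounded figures of 100 and 660 FCMOS, and the fully balanced scheme costs 48 times the area of the delayed-input one.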

pipelined structure with a multiphase clock scheme presents an additional logic synthesis and layout challenge. Similar problems have arisen in the context of parametron-like logic circuits [17-19]; however, unlike those circuits, the all-NDR concept does not impose the nearest-neighbor restriction on gate connectivity. A detailed analysis of the performance of the NDR circuits is our next goal.

Fig. 6. Full adder implementation with all-NDR crossbar logic: (a) mapping scheme, and (b) crossbar mapping. In panel (a), each box represents one gate with the switching threshold (in units of V0) indicated inside the box. Also, for convenience, in both panels each gate is marked with its row and column position. Panel (b), for clarity, does not show the pull-up, pull-down and parallel Goto pairs (reflecting the factor kclk) of the stage, as well as any OFF-state crosspoint devices. Panel (a) labels: inputs a, b, cIN and their complements ~a, ~b, ~cIN; outputs cOUT, s, ~s, ~cOUT; gate thresholds of -2, 0, and 2; three columns of gates driven by clocks VCLK 1, VCLK 2, and VCLK 3.

ACKNOWLEDGMENT

Useful discussions with R. Brayton, D. Hammerstrom, A. Mishchenko, M. Stan, and X. Qiu are gratefully acknowledged.

REFERENCES

[1] L. Esaki, “New phenomenon in narrow germanium p-n junctions”, Phys. Rev., vol. 109, pp. 603-604, 1958.
[2] R. Tsu and L. Esaki, “Tunneling in a finite superlattice”, Appl. Phys. Lett., vol. 22, pp. 562-564, 1973.
[3] E. Goto et al., “Esaki diode high-speed logical circuits”, IRE Trans. Elec. Comp., vol. EC-9, pp. 25-29, 1960.
[4] D. Bratton, D. Yang, J.Y. Dai, and C.K. Ober, “Recent progress in high resolution lithography”, Polymers for Advanced Technologies, vol. 17, pp. 94-103, 2006.
[5] V. S. Sreenivasan, “Nanoscale manufacturing enabled by imprint lithography”, MRS Bulletin, vol. 33, pp. 855-865, 2008.
[6] H.H. Solak, “Nanolithography with coherent extreme ultraviolet light”, J. Phys. D: Appl. Phys., vol. 39, pp. R171-188, 2006.
[7] I.W. Hamley, “Nanostructure fabrication using block copolymers”, Nanotechnology, vol. 14, pp. R39-R54, 2003.
[8] M. Stan et al., “Molecular electronics: From devices and interconnect to circuits and architectures”, Proc. IEEE, vol. 91, pp. 1940-1957, 2003.
[9] K.K. Likharev and D.B. Strukov, “CMOL: Devices, circuits, and architectures”, Lecture Notes in Physics, vol. 680, pp. 447-477, 2005.
[10] K.K. Likharev, “Hybrid CMOS/nanoelectronic circuits: Opportunities and challenges”, J. Nanoel. & Optoel., vol. 3, pp. 203-230, 2008.
[11] A.C. Seabaugh, “Resonant tunneling and quantum integrated circuits”, in: Proc. IEEE Cornell Conf. on Adv. Concepts in High Speed Semicond. Dev. and Circ., Ithaca, NY, Aug. 1995, pp. 455-459.
[12] P. Mazumder et al., “Digital circuit applications of resonant tunneling devices”, Proc. IEEE, vol. 86, pp. 664-686, 1998.
[13] J.J. Yang et al., “Memristive switching mechanism for metal/oxide/metal nanodevices”, Nature Nanotechnology, vol. 3, pp. 429-433, 2008.
[14] S.L. Sarnot and A.B. Bhattacharyya, “[...] transient analysis”, Electronics Letters, vol. 5, pp. 275-277, 1969.
[15] N. Simonian, J. Li and K. Likharev, “Negative differential resistance at sequential single-electron tunneling through atoms and molecules”, Nanotechnology, vol. 18, art. 424006, 2007.
[16] C. Pasha et al., “Threshold logic circuit design of parallel adders using resonant tunneling devices”, IEEE Trans. VLSI, vol. 8, pp. 558-572, 2000.
[17] K.K. Likharev, “Dynamics of some single flux quantum devices. I. Parametric quantron”, IEEE Trans. Magn., vol. 13, pp. 242-244, 1977.
[18] K.K. Likharev and A.N. Korotkov, “Single-electron parametron: Reversible computation in a discrete state system”, Science, vol. 273, pp. 763-765, 1996.
[19] M.T. Niemier and P.M. Kogge, “Problems in designing with QCA: Layout equals timing”, Int. J. Circ. Theo. Appl., vol. 29, pp. 49-62, 2001.
[20] W.C. Athas and L.J. Svenson, “Reversible logic issues in adiabatic CMOS”, in: Proc. Phys. & Computation, Dallas, TX, Nov. 1994, pp. 111-118.
[21] Q. Xia et al., “Memristor-CMOS hybrid integrated circuits for reconfigurable logic”, Nano Letters, vol. 9, pp. 3640-3645, 2009.
[22] D.B. Strukov and R.S. Williams, “Four-dimensional address topology for circuits with stacked multilayer crossbar arrays”, Proc. Natl. Acad. Sci., vol. 106, pp. 20155-20158, 2009.