
Integration, Architecture, and Applications of 3D CMOS‐ Circuits

K.‐T. Tim Cheng and Dmitri Strukov, Univ. of California, Santa Barbara

ISPD 2012

3D Hybrid CMOS/NANO

[Figure: CMOS stack with add-on nanodevices; top and bottom nanowire levels, with similar two-terminal nanodevices at each crosspoint]

• CMOS stack + nano add-on
• Nanowire crossbar of two-terminal devices

Resistive Switching “Memristive” Devices (latching switches, a.k.a. resistive switches, a.k.a. programmable diodes, a.k.a. memristive switches)

+ Wide range of material systems and physical phenomena

[Figure: I–V curve of a Pt / TiO2 / TiOx / Pt device, 50 nm hp, <50 ns; axes: Voltage (V) from ‐2 to 2, Current (μA) from ‐200 to 200]

J. Yang et al., Nature Nano (2008)

Area‐Distributed “CMOL” Interfaces

[Figure: CMOL interface cartoon; labels: interface via (“pin”), nanodevices (latching switches), gold, nanowire levels (nanoimprint), interface pins, tip radii 2-10 nm, CMOS stack (just a cartoon), MOSFET, Si wafer]

K. Likharev (2004, 2005); D. Strukov and K. Likharev (2006)
http://www.oxfordplasma.de/process/sibo_wtc.htm

AFOSR‐MURI HyNano: 3D Hybrid CMOS‐Nano Circuits

The HyNano Team

• Michael Chabinyc, Materials, UCSB
• Tim Cheng, ECE, UCSB (Director)
• Marivi Fernandez‐Serra, Physics, Stony Brook
• Konstantin K. Likharev, Physics, Stony Brook
• Wei Lu, EECS, Michigan
• Susanne Stemmer, Materials, UCSB
• Dmitri Strukov, ECE, UCSB
• Luke Theogarajan, ECE, UCSB
• Qiangfei Xia, ECE, UMass Amherst

Project Overview

APPLICATIONS: information processing; 3D hybrid memories; 3D hybrid SoC
ARCHITECTURES/CIRCUITS: 3D CMOS/nano circuits w. area‐distributed interface; mixed‐signal CrossNets
Optical lithography; e‐beam; nanoimprint; compact models
DEVICES: reproducible, high‐performance, and high‐endurance devices; drift diffusion and ab‐initio modeling
MATERIALS: a‐Si; metal oxide; organic; solid electrolyte

Thrust Area #1: Application/Architecture/Ckt Exploration

• Memory arrays for high‐performance computing
• CMOL‐based FPGA
• Neuromorphic networks for bio‐inspired information processing
• Evolvable analog circuits
• Tunable bias network for analog design
• Weighted multiply and add circuits
• High precision Digital‐to‐Analog converter

“CMOL” Interface – Integrating CMOS with Crossbar Memory Array

[Figure: CMOL interface cartoon; labels: interface via (“pin”), nanodevices (latching switches), gold, nanowire levels (nanoimprint), interface pins, CMOS stack (just a cartoon), MOSFET, Si wafer]

Addressing Crossbar Memory Array

• There are two types of pins: blue and red
• Each array of pins has its own decoding scheme

Double decoding scheme:
• An array of $N^2$ blue pins is uniquely accessed with $2N$ control signals.
• Another $2N$ control signals serve the corresponding $N^2$ red pins.

Double Decoding Scheme
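The double decoding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pins are taken to sit on an N × N grid numbered row-major, and the function names (`pin_select_lines`, `crosspoint_select_lines`, `control_signal_count`) are hypothetical, not from the slides. Each pin is picked by one row line and one column line, so $N^2$ pins need $2N$ signals, and a crosspoint is picked by one blue pin plus one red pin.

```python
def pin_select_lines(pin_index: int, n: int) -> tuple[int, int]:
    """Return the (row_line, column_line) pair selecting one pin of an n x n array.

    Assumption: pins are numbered row-major, 0 .. n*n - 1.
    """
    if not 0 <= pin_index < n * n:
        raise ValueError("pin index out of range")
    return divmod(pin_index, n)


def crosspoint_select_lines(blue_pin: int, red_pin: int, n: int) -> dict[str, tuple[int, int]]:
    """Select a crosspoint by decoding its blue pin and its red pin independently."""
    return {
        "blue": pin_select_lines(blue_pin, n),
        "red": pin_select_lines(red_pin, n),
    }


def control_signal_count(n: int) -> int:
    """2N lines for the blue array plus 2N lines for the red array."""
    return 2 * n + 2 * n
```

For example, with N = 4 the two pin arrays of 16 pins each are driven by 16 control signals in total.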

• Four decoders

[Figure: decoder organization; labels: demux, select array decoder, select mux/demux decoder, data I/O]

[Figure sequence: Crossbar Construction (top view, side view with the CMOL interface cartoon, bottom level, top level); Connectivity Domain]

Unused Address Space

The red pin can only interact with blue pins in its connectivity domain

Address space provided by yellow cells is wasted!

Key Geometric Parameters

• Distance between nanowires is $2F_{nano}$
• Size of cell is $2\beta F_{CMOS}$
• $\beta^2 = r^2 + 1$, where $r$ is an odd integer > 1
• Crossbar is tilted by an angle $\alpha = \arctan(1/r)$ with respect to the pins
• Number of reachable crosspoints per segment is $r^2 - 1$

[Figure: Crossbar Construction – Bottom Level]

Adding a Second Crossbar Layer

Connectivity domain in the first crossbar layer

Connectivity domain in the second crossbar layer

Blue pins are common to all crossbar layers. Red pins are "redefined" for each layer using the pin translation wires. The mapping is done through pin translation.

Build sequence (figure captions):
• First layer of red pins.
• First layer of red and blue pins.
• Layer of (bluish) wires connected to the blue pins.
• Single (orange) wire connected to a red pin. The cross‐points with the bottom wires are shown in green.
• First complete crossbar layer.
• A single pin translation wire (in yellow).
• Every orange wire is “translated” into another point using the same type of pin translation wire.
• The first crossbar layer with its pin translation wires is then “buried” in SiO2.
• We start to build the next crossbar layer (bluish wires).
• We add the orange wires (the cross‐points are formed).
• And we add the pin translation wires and repeat the process…

Maximum Number of Layers

• Each layer has $N^2$ cells.
• There are $r^2 - 1$ cross points per cell.
• That gives us a total of $N^2(r^2 - 1)$ cross points per layer.
• The double decoding scheme allows us to address up to $N^4$ locations.
• Which means that we can (potentially) have up to $N^2/(r^2 - 1)$ crossbar layers.

How Does it Stand Up as a Memory?

                        Memristor   PCM      STTRAM   DRAM     Flash    HDD
Density (F^2)           <4          8–16     37–64    6–8      4–6      2/3
Energy per bit† (pJ)    0.1–3       2–27     0.1      2        10000    1–10×10^9
Read time (ns)          10–100(?)   20–70    10–30    10–50    25000    5–8×10^6
Write time (ns)         ~10         50–500   13–95    10–50    200000   5–8×10^6
Retention               years       years    weeks?   <
Endurance (cycles)      >10^12      10^7     10^15    10^15    10^6     10^4

If Successful, 3D Hybrids Can Achieve…

• Unprecedented memory density
  – Footprint of a nano‐device is $4F_{nano}^2/K$, for $K$ vertically integrated crossbar layers
  – Potentially up to $10^{14}$ on a single 1‐cm² chip
• Enormous memory bandwidth
  – Potentially up to $10^{18}$ bits/second/cm²
• At manageable power dissipation
• With abundant redundancy for yield/reliability
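The density claim combines two relations from the slides: the per-device footprint $4F_{nano}^2/K$ and the layer bound $N^2/(r^2 - 1)$ from the Maximum Number of Layers slide. A rough Python sketch of both follows; the function names and the sample values of N, r, and $F_{nano}$ in the usage note are illustrative assumptions, not figures from the talk.

```python
def max_crossbar_layers(n: int, r: int) -> int:
    """Integer part of the bound N^2 / (r^2 - 1) on addressable crossbar layers."""
    if r <= 1 or r % 2 == 0:
        raise ValueError("r must be an odd integer greater than 1")
    return (n * n) // (r * r - 1)


def devices_per_cm2(f_nano_nm: float, k_layers: int) -> float:
    """Devices per cm^2 when each device occupies 4 * F_nano^2 / K of chip area."""
    nm2_per_cm2 = 1e14  # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2
    footprint_nm2 = 4.0 * f_nano_nm ** 2 / k_layers
    return nm2_per_cm2 / footprint_nm2
```

As a usage example under these assumptions, `max_crossbar_layers(64, 3)` gives the layer bound for a 64 × 64 pin array with r = 3, and `devices_per_cm2(10.0, 1)` gives the single-layer device count per cm² at a 10 nm nanowire half-pitch; the count scales linearly with the number of layers K.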


CMOL‐Based FPGA

• Programming of crosspoint memristors is similar to CMOL digital memories
• Uniform fabric with CMOS cells
• Crossbar wires for routing

[Figure: CMOL FPGA cell; labels: cell, A, B, F, A+B, nanodevices, R_ON, C_wire, R_pass, CMOS inverter]

Density: CMOS vs. CMOL

Metrics (units)              2009    2010    2011    2012    2013    Comments
Half-pitch F_CMOS (nm)       50      45      40      36      32      In accordance with ITRS
Half-pitch F_nano (nm)       20      18      16      14      12      -
CMOS memories (Gbits/cm²)    6.7     8.2     10.5    13      16      Follows ITRS (with A = 6F_CMOS^2)
CMOL memories (Gbits/cm²)    4       10      23      36      67      Initial progress impacted by q
CMOS FPGA (Mgates/cm²)       0.4     0.5     0.6     0.8     1.0     Rescaled from 0.18 μm rules
CMOL FPGA (Mgates/cm²)       625     775     1,000   1,200   1,500   -

Metrics (units)              2016    2019    2022    2025    2028    Comments
Half-pitch F_CMOS (nm)       30      28      26      24      22      Grows slower than in ITRS
Half-pitch F_nano (nm)       10      6       4       3.5     3       -
CMOS memories (Gbits/cm²)    18      21      25      29      35      Follows A = 6F_CMOS^2
CMOL memories (Gbits/cm²)    100     350     900     1,200   1,700   Spectacular progress at lower q
CMOS FPGA (Mgates/cm²)       1.1     1.3     1.5     1.7     2.1     Rescaled from 0.18 μm rules
CMOL FPGA (Mgates/cm²)       1,700   2,000   2,300   2,700   3,200   -
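The comment "A = 6F_CMOS^2" on the CMOS memories rows of the density table can be checked numerically. The Python sketch below assumes one bit per cell of area A, which is an interpretation rather than something stated on the slide, and the function name is hypothetical.

```python
def cmos_memory_density_gbits_per_cm2(f_cmos_nm: float) -> float:
    """Bits per cm^2 (in Gbits) for a cell area of 6 * F_CMOS^2, one bit per cell."""
    nm2_per_cm2 = 1e14  # 1 cm^2 expressed in nm^2
    cell_area_nm2 = 6.0 * f_cmos_nm ** 2
    return nm2_per_cm2 / cell_area_nm2 / 1e9


# Half-pitch values (nm) taken from the F_CMOS rows of the table
for f_cmos in (50, 45, 32, 22):
    print(f_cmos, round(cmos_memory_density_gbits_per_cm2(f_cmos), 1))
```

Under this assumption the computed values land close to the tabulated CMOS memory densities for the same half-pitch.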

Thrust Areas

#2: High‐Performance/‐Yield Devices
Integrating CMOS with devices of different materials:
• a‐Si
• Metal oxide
• Organic
• Solid‐state electrolyte

#3: 3D Hybrids Integration
Using:
• Nanoimprint
• E‐beam lithography
• Optical lithography
• Heterogeneous wafer‐level integration

[Figure: (a) E‐Beam Crossbar Arrays (Lu); (b) <20nm Overlay Alignment (Xia); scale bars 50 μm and 100 μm]

Integrated Crossbar Array/CMOS System (PI: Lu)

[Figure: crossbar array; CMOS; integrated crossbar/CMOS chip with probe card attached]

Kim et al., Nano Lett. 12, 389–395 (2012).

Performance of a‐Si and Metal‐Oxide Device Array

[Figure: device images showing the “on” filament and “off” states; scale bar 100 nm]

• Tight distribution from 256 devices measured
• Devices showed excellent on/off and intrinsic characteristics

BACKUP SLIDES

Thrust Area #3: High‐Performance/‐Yield/‐Reproducibility Devices

• a‐Si (Lu)
• Metal oxide (Stemmer, Xia)
• Organic (Chabinyc)
• Solid‐state electrolyte (Lu)

a‐Si Memristive Devices and Arrays (PI: Lu)

[Figure: I–V switching curves (1st cycle and after 2nd cycle; Current vs. Voltage (V)); “on” filament and “off” images (scale bar 100 nm); histogram of # of devices vs. Vth (V)]

Lu et al., Nano Lett. (2008, 2009)

Project Organization

UCSB

• Digital and analog 3D hybrid circuit architectures (Cheng, Strukov, Theogarajan)
• CMOS circuit design for CMOL integration (Cheng, Strukov, Theogarajan)
• Memristive device modeling (Strukov)
• MBE fabrication of memristive devices (Stemmer)
• Organic memristive devices (Chabinyc)

U. Michigan
• a-Si & solid electrolyte devices; 3D integration with CMOS (Lu)

UMass
• Metal oxide memristive devices; 3D integration with CMOS (Xia)

Stony Brook University
• Ab-initio simulation of memristive devices (Fernandez-Serra, Likharev)
• Mixed-signal neuromorphic 3D hybrid circuit architectures (Likharev)

<------ experiment ------> <------ theory/modeling ------>

Crossbar Architecture

‐ Xbar to preserve density
‐ Passive (no ) but nonlinear I‐V
‐ Common way (from periphery)

[Figure: crossbar with top and bottom (nano)wire levels and similar two‐terminal devices at each crosspoint; Read and Write biasing with voltages Vr, Vr/2, Vw, Vw/2; CMOS for decoding and sensing]

July 2011 MURI Kickoff

Generic Memory Array

• Asserting a word line makes the access element place the contents of the memory element on the bit line. A particular bit is then selected with a MUX.

[Figure: generic memory array; labels: access element, memory element, decoder, multiplexer]

Generic Memory Array

• An array of $N^2$ memory elements can be uniquely accessed using $2N$ control signals (word + bit lines).
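A toy Python model of this word-line/bit-line access may help. It is a sketch under simplifying assumptions (ideal access elements, a square N × N array stored as nested lists); `one_hot` and `read_bit` are hypothetical names, not part of the slides.

```python
def one_hot(index: int, width: int) -> list[int]:
    """Decoder output: exactly one of `width` lines is asserted."""
    if not 0 <= index < width:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(width)]


def read_bit(memory: list[list[int]], row: int, col: int) -> int:
    """Read one bit of an N x N array using N word lines and an N-input mux."""
    n = len(memory)
    word_lines = one_hot(row, n)
    # The asserted word line lets its row drive the bit lines; other rows contribute 0.
    bit_lines = [sum(word_lines[i] * memory[i][j] for i in range(n)) for j in range(n)]
    # The multiplexer then picks a single bit line.
    return bit_lines[col]
```

With N word lines and N mux select positions, every one of the $N^2$ cells is reachable, which is the $2N$-signal addressing stated above.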

[Figure: other representation of the same array]

Area‐Distributed “CMOL” Interfaces (II)

Most important feature: pin array tilt by angle $\alpha = \arcsin(F_{nano}/F_{CMOS}) = \arctan(1/r)$

[Figure: tilted pin array; labels: pin 1, pin 2A, pin 2B, A, B, angle α, dimensions 2F_CMOS, 2rF_nano, 2F_nano]

Every nanowire (and hence every crosspoint) may be addressed from CMOS!

K. Likharev (2004, 2005); D. Strukov and K. Likharev (2006)

A Possible Solution

With this particular connectivity domain geometry (r=3), we can cover all the plane... But that is not always the case.

The pin translation wires are another layer of wires on top of the crossbar.

We can add more crossbar layers by simply inserting a layer of pin translation wires between them.

Crossbar Analysis

The crossbar is rotated by an angle α such that:

$\alpha = \arctan(1/r)$

where r is an odd integer greater than 1.
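For a concrete feel of these quantities, the short Python sketch below evaluates the relations listed on the Key Geometric Parameters slide ($\alpha = \arctan(1/r)$, $\beta^2 = r^2 + 1$, and $r^2 - 1$ reachable crosspoints per segment) for a given r. The function name `cmol_geometry` and the returned dictionary layout are hypothetical choices made for illustration.

```python
import math


def cmol_geometry(r: int) -> dict[str, float]:
    """Evaluate tilt angle, beta, and crosspoints per segment for an odd r > 1."""
    if r <= 1 or r % 2 == 0:
        raise ValueError("r must be an odd integer greater than 1")
    alpha = math.atan(1.0 / r)   # crossbar tilt relative to the pin array
    beta = math.sqrt(r * r + 1)  # from beta^2 = r^2 + 1
    return {
        "alpha_deg": math.degrees(alpha),
        "beta": beta,
        "crosspoints_per_segment": r * r - 1,
    }


print(cmol_geometry(3))
```

For r = 3, the case used in the connectivity domain example, this gives a tilt of roughly 18.4 degrees and 8 reachable crosspoints per segment.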

Once we set r and β (the CMOS cell complexity), the angle α and Fnano are fixed, as are the length of the wires in the crossbar and the number of memristive devices reachable per wire segment.

Crossbar Analysis

The parameter r also sets the maximum, minimum, and average path lengths that electric signals have to propagate to access a bit (a memristive device).

These path lengths are given by:

• Maximum (worst) case: $2F_{nano}(r^2 - r + 1)$
• Minimum (best) case: $2F_{nano} r$
• Average (real) case: $2F_{nano}(r^2 + 1)/2$

3D Hybrid Integration with Multi‐Layer Crossbars

[Figure: multi‐layer stack; labels: CMOS layer with N data/control lines and N² access devices/vias, via translation layer, crossbar layer (out of N total), ~N²β² crosspoint devices per layer, crosspoint device in 1st layer, crosspoint device in 2nd layer; grid with rows A–E and columns 1–5 showing the connectivity domain in the 1st layer, the via translation wires, and the connectivity domain in the 2nd layer]

D. Strukov and R. S. Williams (2009)

Project Overview

APPLICATIONS (Cheng, Likharev, Strukov, Theogarajan): information processing; 3D hybrid memories; 3D hybrid SoC
ARCHITECTURES/CIRCUITS (Cheng, Likharev, Strukov, Theogarajan): 3D CMOS/nano circuits w. area‐distributed interface; mixed‐signal CrossNets
Optical lithography (Strukov, Stemmer); e‐beam (Lu); nanoimprint (Xia, Chabinyc); compact models (Likharev, Strukov)
DEVICES: reproducible, high‐performance, and high‐endurance devices; drift diffusion and ab‐initio modeling (Fernandez‐Serra, Strukov)
MATERIALS: a‐Si (Lu); metal oxide (Xia, Stemmer); organic (Chabinyc); solid electrolyte (Lu)

IC Applications Continue to Demand More Memory and Higher Bandwidth

[Figure: IC applications by memory size and I/O (axis ticks 100 to 600, “High” at the upper ends); labels: GPU, CPU, Chipset & FPGA; Networking; Application; Baseband; BT/WiFi; PM Chip; Transceiver; Peripheral I/O Controller; PA and Discrete]

For most applications running on high‐end SoCs, the amount of available memory and the memory bandwidth have been and will continue to be the bottlenecks.

Bistable Two‐Terminal Devices (latching switches, a.k.a. resistive switches, a.k.a. programmable diodes, a.k.a. memristive switches)

Demonstrated with many materials; no clear winner yet; few reproducibility reports, e.g.:

• Si / a-Si / M: S. H. Jo and W. Lu (2008)
• Ti / Pt / TiO2 / Pt: Q. Xia et al. (2009); J. Borghetti et al. (2010)

Device Requirements Vary for Different Ckts/Architectures/Applications

Dynamic range of resistance    Signal: DC           Signal: AC
Small                          Tuning               ‐
Large                          Memory, FPGA, DAC    MAC

CMOS‐CMOL Integration: Initial Demonstration

PI: Xia

[Figure: panels (a)–(e)]

Xia, Strukov et al. (2009)