NEMO5 on Blue Waters – A Flexible Package for Modeling Problems

Gerhard Klimeck Network for Computational

PRAC: Accelerating Nano-scale Transistor Innovation PI: Gerhard Klimeck Blue Waters Symposium June 2018 Moore’s Law Forever? Number of transistors: Moores law is continuing

http://jai-on-asp.blogspot.com 2005: free lunch is over, updated 2009 CPU’s are not getting faster! Number of transistors: Moores law is continuing

Clock speed: no longer scaling

http://jai-on-asp.blogspot.com 2005: free lunch is over, updated 2009 Power is the Limit! Number of transistors: Moores law is continuing

Clock speed: no longer scaling Power: todays limitation ~100W

http://jai-on-asp.blogspot.com 2005: free lunch is over, updated 2009 Limited Performance Improvements

Number of transistors: Moores law is continuing

Clock speed: no longer scaling Power: todays limitation ~100W Performance Gain: limited Supply voltage (Vdd) stopped http://jai-on-asp.blogspot.com scaling at around 2003 2005: free lunch is over, updated 2009 What is Special about 100W ?

Power: todays limitation ~100W

http://jai-on-asp.blogspot.com 2005: free lunch is over, updated 2009 Intel Projection from 2004 How do we burn power? Switching Circuits & Leakage => Transistors

Power: todays limitation ~100W

http://jai-on-asp.blogspot.com CMOS Inverter Dynamic / Switching Power: Vdd P ØCharging a capacitor network leakage ! H L ! ∝ !!!!!!!!! conduct ØReduce frequency L N Vss ØReduce capacitance => device size K ØReduce voltage J Static Power: ØLeakage through transistors

! ∝ !!""!∝ 1/exp!(V!!)! L “Fundamental” Limit Vg S D log Id Metal Oxide Threshold n+ p n+

Vg↑ S≥60 mV/dec

0 Vdd Vg

E

x “Fundamental” Limit Vg S D log Id Metal Oxide Threshold n+ p n+

DOS(E), log f(E) Vg↑ S≥60 mV/dec ` Ef 0 Vdd Vg

E

x

log ! ! ∝ −!! !! Device Scaling for Performance

log I Dynamic / Switching Power: d ØCharging a capacitor network ! ! ∝ !!!!!!!!!! Ø Reduce supply voltage I OFF S≥60 mV/dec

0 Vdd Vg Vdd Vdd Static Power: ØLeakage through transistors

! ∝ !!"" ∝ 1/exp!(V!!)! Device Scaling for Performance

log Id

I OFF S≥60 mV/dec

From: Eric Pop, Nanotechnology

0 Vdd Vg

Static Power: ØLeakage through transistors

! ∝ !!"" ∝ 1/exp!(V!!)! Device Scaling for Performance

log Id log Id ION

S<<60 mV/dec I S≥60 mV/dec OFF S≥60 mV/dec IOFF

0 Vdd Vg 0 Vdd Vg

Static Power: ØLeakage through transistors

! ∝ !!"" ∝ 1/exp!(V!!)! Need a Different Switch

log I G d n-type S D Metal ION Oxide p+ i n+

S<<60 mV/dec log f(E) S≥60 mV/dec IOFF Ef 0 Vdd Vg

Cold injection Static Power: ØLeakage through transistors

! ∝ !!"" ∝ 1/exp!(V!!)! Ultra-Thin-Body (8nm) InAs BTBT Device

§ Do not need phonons for direct current § Highly non-parabolic conduction band § Realistic valence band features

Gerhard Klimeck UTB Band Edges

• 1D quantization => Bandgap raised from bulk 0.37eV => ~0.6eV • Realistic Poisson Solution Doping 1x1018/cm3 • High doping regions flat bands • Central gate region good control

Doping 1x1018/cm3 § Gerhard Klimeck UTB: Current Hotspots in Energy

• 1D quantization => Bandgap raised from bulk 0.37eV => ~0.6eV • Realistic Poisson Solution Doping 1x1018/cm3 • High doping regions flat bands • Central gate region good control

• Bandgap narrow at »Certain energies »Certain biases

Gerhard Klimeck UTB Current Density in Energy

• 1D quantization => Bandgap raised from bulk 0.37eV => ~0.6eV • Realistic Poisson Solution Doping 1x1018/cm3 • High doping regions flat bands • Central gate region good control

• Bandgap narrow at »Certain energies »Certain biases • 2 energy regions of current flow »Low Ec / high Ev »High Ec / low Ev

Gerhard Klimeck UTB Current as a function of Gate Voltage

• Current vs. Gate Voltage

Gerhard Klimeck Application to pin InAs UTB and Nanowire

Single-Gate UTB 20nm 20nm

6nm 6nm

GAA Nanowire 20nm

-3 NA=ND=5e19 cm 6nm t =1nm ox Double-Gate UTB

Gerhard Klimeck BTBT in pin InAs Devices – SG-UTB / DG-UTB / NW

Subthreshold Swing @Vds=0.2 V

Nanowires can deliver very steep turn-offs!

Gerhard Klimeck Tunnel FETs

J. Charles, P. Sarangapani, D. Lemus, Chin Yi Chen, P. Long, Tarek Ameen, X. Guo, T. Kubis, G. Klimeck Nitride Tunnel Hetero-structures

Multi-Scale, Multi-Physics 23Partitioning is critical Nitride Tunnel Hetero-structures

ON and OFF current need careful24 Geometry designs phonon scattering degrades transistor performance

Local current density 1016

1014

1012

scattering

1010 A/(eV*m) BlueWaters simulation: Phonon scattering degrades ON/OFF ratio of 3HJ TFETs.

At 30nm Lg , VDD must be increased ~0.07V to maintain on/off ratio 25 5. Chemically doped tunneling FETs Channel thickness optimization Application: Tunneling FETs Impact: § Low power electronics. § TFET’s performance Ion/Ioff can be improved up § Mobile/medical devices. 4 Simulation method: 10 if optimized channel thickness is used. § Quantum transmitting boundary method coupled with Poisson equation WSe2 BP InAs § 10 band WSe2 , Phosphorene (BP), and InAs tight binding Hamiltonian Achievement: V § Channel thickness design rule. DD ü Compact model developed. VDD source oxide drain

oxide

Details: arXiv:1804.11034; https://arxiv.org/abs/1804.11034 26 Benchmarking by industry: Charge-devices continue to shine D. Nikonov, I. Young (Intel) 102 2012 ST: Spin-Torque Charge-based devices outperform ST transfer 101 others in electronic switching Spin-wave triad

ST oscillator 0 All-spin 10 logic ST Transfer/ Graphene pn junction SpinFET BISFET ST Domain-Wall Nanomagnet -1 majority 10 CMOS high performance gate Logic

Heterojunction CMOS low power -2 10 III-V Energy(fJ)

Graphene nanoribbon 10-3 Preferred corner Tunnel-FETs 10-4 10-1 100 101 102 103 104 Delay (ps)

Energy vs delay of inverters with fanout 4 with current-controlled switching, Vdd=0.01 V Power Problem: Tunneling Transistors to the Rescue!

For a little while! Intel Roadmap Today: non-planar 3D devices Better gate control!

Intel 22nm finFET

http://www.goldstandardsimulations.com/index.php/news/blog_search/simulation-analysis-of-the-intel-22nm-finfet/ http://www.chipworks.com/media/wpmu/uploads/blogs.dir/2/files/2012/08/Intel22nmPMOSfin.jpg Today: non-planar 3D devices Better gate control!

22nm = 176 atoms 8nm = 64 atoms

http://www.goldstandardsimulations.com/index.php/news/blog_search/simulation-analysis-of-the-intel-22nm-finfet/ http://www.chipworks.com/media/wpmu/uploads/blogs.dir/2/files/2012/08/Intel22nmPMOSfin.jpg Today: non-planar 3D devices Better gate control! 1,085 atoms

22nm = 176 atoms 8nm = 64 atoms

http://www.goldstandardsimulations.com/index.php/news/blog_search/simulation-analysis-of-the-intel-22nm-finfet/ http://www.chipworks.com/media/wpmu/uploads/blogs.dir/2/files/2012/08/Intel22nmPMOSfin.jpg Roadmap of finite atoms! Atomistic Modeling => NEMO

nm Node 22 14 10 7 5 Node atoms 176 122 80 56 40 Critical atoms 64 44(?) 29(?) 20(?) 14(?) Electrons 160-190 64-80 30-38 18-23 11-15 Nanoelectronics Today

• $300 billion industry • International Technology Roadmap for (ITRS) • Moore’s Law • Countable number of atoms • Quantum effects • Devices (transistors) • Smaller, faster, more energy efficient • Designs, materials Band-to-band tunneling Topological insulators Single atom transistors

IEEE Elec. Dev. Lett. 30, 602 (2009) Nature Physics 6, 584 (2010) Nature Nanotechnology 7, 242 (2012) http://www.chipworks.com/media/wpmu/uploads/blogs.dir34 /2/files/2012/08/Intel22nmPMOSfin.jpg NEMO5 - Bridging the Scales From Ab-Initio to Realistic Devices Ab-Initio TCAD

Goal: Approach: • Device performance with realistic • Ab-initio: extent, heterostructures, fields, etc. • Bulk constituents for new / unknown materials • Small ideal superlattices Problems: • Map ab-initio to tight binding • Need ab-initio to explore new (binaries and superlattices) material properties • Current flow in ideal structures • Ab-initio cannot model non- • Study devices perturbed by: equilibrium. • Large applied biases • TCAD uses quantum corrections • Disorder 35 • Phonons Quantum Transport far from Equilibrium

Macroscopic dimensions NonLaw-Equilibriumof Equilibrium Quantum: Statisticalρ = exp (−( HMechanics− µN)/kT)

Diffusive Atomic Ballistic Drift / dimensions Diffusion Quantum Σs Boltzmann Unified model Transport µ1 H µ2 Non-WhichEquilibrium GreenFormalism? Functions Σ1 Σ2 S D SILICON S D INSULATOR VG VD VG VD Gerhard Klimeck,I Supriyo Datta I Multi-Scale Modeling

Atom Geometry Positions Construction ~10-50 million o Valence Force Field Atomistic atoms Strain (VFF) Method Relaxation

Piezoelectric Hamiltonian Electrical / o Piezoelectric eff. Valence Potential Construction Magnetic field Pol. charge density Electrons o Empirical tight binding Optical Prop. ~0.5-10 million Single Particle 3 5 States Electronic Str sp d s* + spin orbit

e-e Many Body Optical Prop. o SCP: Poisson + LDA interactions Interactions Electronic Str Few States o Slater Determinants A Journey Through Nanoelectronics Tools NEMO and OMEN

NEMO-1D NEMO-3D NEMO3Dpeta OMEN NEMO5 Transport Yes - - Yes Yes Dim. 1D any any any any Atoms ~1,000 50 Million 100 Million ~140,000 100 Million Crystal [100] [100] [100], Any Any Cubic, ZB Cubic, ZB Cubic,ZB, WU Any Any Strain - VFF VFF - MVFF Multi- - Spin, physics Classical Parallel 3 levels 1 level 3 levels 4 levels 4 levels Comp. 23,000 cores 80 cores 30,000 cores 220,000 co 100,000 cores

38 A Journey Through Nanoelectronics Tools NEMO and OMEN

NEMO-1D NEMO-3D NEMO3Dpeta OMEN NEMO5 Transport Yes - - Yes Yes Dim. 1D any any any any Atoms ~1,000 50 Million 100 Million ~140,000 100 Million Crystal [100] [100] [100], Any Any Cubic, ZB Cubic, ZB Cubic,ZB, WU Any Any Strain - VFF VFF - MVFF Multi- - Spin, physics Classical Parallel 3 levels 1 level 3 levels 4 levels 4 levels Comp. 23,000 cores 80 cores 30,000 cores 220,000 co 100,000 cores

39 A Journey Through Nanoelectronics Tools NEMO and OMEN

NEMO-1D NEMO-3D NEMO3Dpeta OMEN NEMO5 Transport Yes - - Yes Yes Dim. 1D any any any any Atoms ~1,000 50 Million 100 Million ~140,000 100 Million Crystal [100] [100] [100], Any Any Cubic, ZB Cubic, ZB Cubic,ZB, WU Any Any Strain - VFF VFF - MVFF Multi- - Spin, physics Classical Parallel 3 levels 1 level 3 levels 4 levels 4 levels Comp. 23,000 cores 80 cores 30,000 cores 220,000 co 100,000 cores

40 A Journey Through Nanoelectronics Tools NEMO and OMEN

NEMO-1D NEMO-3D NEMO3Dpeta OMEN NEMO5 Transport Yes - - Yes Yes Dim. 1D any any any any Atoms ~1,000 50 Million 100 Million ~140,000 100 Million Crystal [100] [100] [100], Any Any Cubic, ZB Cubic, ZB Cubic,ZB, WU Any Any Strain - VFF VFF - MVFF Multi- - Spin, physics Classical Parallel 3 levels 1 level 3 levels 4 levels 4 levels Comp. 23,000 cores 80 cores 30,000 cores 220,000 co 100,000 cores

41 A Journey Through Nanoelectronics Tools NEMO and OMEN

NEMO-1D NEMO-3D NEMO3Dpeta OMEN NEMO5 Transport Yes - - Yes Yes Dim. 1D any any any any Atoms ~1,000 50 Million 100 Million ~140,000 100 Million Crystal [100] [100] [100], Any Any Cubic, ZB Cubic, ZB Cubic,ZB, WU Any Any Strain - VFF VFF - MVFF Multi- - Spin, physics Classical Parallel 3 levels 1 level 3 levels 4 levels 4 levels Comp. 23,000 cores 80 cores 30,000 cores 220,000 co 100,000 cores

42 Modeling Electronic Transport of Metal Interconnects

Daniel Valencia

Electrical and Computer Engineering Why interconnect power dissipation is a BIG deal

Total interconnect length in Metal 1 and Intermediate wiring ~ 3km/cm2 !!!

Electron travels through far more metal than semiconductor!!!

2004, Nir Magen, Intel http://lp-hp.com/pangrle/files/2011/09/barry1.png ITRS 2007 Interconnect

45 45 Sources of resistance in interconnects

1. Surface and Surface roughness scattering (Metal-dielectric Interface) 2. Grain Boundary scattering (Metal-Metal interface)

2 3. Barrier scattering (Metal-Metal Interface) 1

3

Image from EE-311, V.K.Saraswat, Stanford University 2003

ITRS Interconnect 46(2010) 46 Challenges

Resistivity as a function of the Atomistic structure of an width interconnect

Traditional Model

Challenges of Traditional Models Challenges of Atomistic Models o Require experimental input o Structure becomes an arrangement of o Fail to describe the effect of surface atoms instead of a continuous bulk roughness and grain boundary material individually o Atoms models requires a quantum o Pure fitting process and lack of treatment physical meaning of those parameters o Grain has 104 to 105 atoms à cannot be modelled by ab initio Atomistic models can be used to o Atom positions are unknown a priori à tackles those shortcomings the structure must be minimized Results: Resistivity at room temperature Resistivity at room At room T, resistivity:ρT = ρGB + ρSR + ρe−ph temperature Simulated copper thin- filmLength

Grain 1 Grain 2 Grain 3 Width

Source Drain

The resistivity for copper thin films at room T is simulated assuming same structures than before but Results: o At room T, the e-ph effects are o e-ph method correctly describes the included by the Molecular Dynamics + increasing in the resistivity at room Landauer formalism temperature o 10 different “snapshots” are averaged o e-ph contribution is smaller than SR per configuration effect for the simulated results Mode Space: Computational Burden Reduction

Daniel Lemus

Electrical and Computer Engineering Purdue University Computational complexity of Non- Equilibrium Green’s Functions (NEGF)

NEGF equations best solved with recursive Green’s function (RGF) algorithm Device Hamiltonian ! rows for total device

Solution for RGF: # *(+) for + layers

With RGF:

Time-to-solution: Matrix mult./inversions $(&') ,- = / − # − Σ- 23 † ,4 = ,-Σ4,- Memory: Σ4 = ,464 " rows for $(&)) Σ- = ,-6- + ,-64 + ,46- single layer 54 Computational complexity - example

• Hundreds of thousands of rows » ! "# time to solution $ » ! " memory footprint NEGF equations solved • Realistic basis sets 500 million times » 10 rows per atom 80 million CPU hours can be • Self-consistency spent on designing one device » 100 iterations • Multiple energies, momentums All of these factors needed for » 500 instances realistic simulations of nanodevices • Multiple electrostatic response tests » 10 instances A way to reduce this workload • Calculation of inelastic scattering is required » 100 instances • Geometry/material variations » 10 instances 55 Mode space basis reduction

2880 81

2880 ∗ 81 !" ∗

81 ! × # × !" !# !$ = ∗ (+,-- 81 ( 2880 )* !$ 2880

3 nm 3 nm device using sp3d5s*: Reduced to 2.8% 2880x2880 matrices (single unit cell) of original rank . /3 = ~23,887,872,000 operations

3 nm3 nm device using 81 modes: Huge reduction in 81x81 matrices computational complexity . /3 = ~531,441 operations

56 Speedup assessment

209

57 Memory reduction

7.14

58 Realistic simulations solvable in mode space

Example: 5.43 nm-wide nanowire solved with inelastic scattering

Computation on Blue Waters: • 160 hours • 16,384 cores 5.43 • 2.6 million core hours nm Peak memory: • 23.5 GB per node

Reduced basis to 2.8% 20.63 nm If simulated in full basis, would require: • Mode space opens capability to solve realistically-sized device Time to solution: with realistic physics 547 million core hours 3.81 years on 16,384 cores • Devices up to 18 nm width have been solved in mode space with Peak memory: inelastic scattering 167.8 GB per node 59 All Electron Model Development

YuanChen Chu

Electrical and Computer Engineering Purdue University Standard Electron – Hole Model vs. All Electron Model ØLack of physics based charge model

Excess-charge model: • Unclear particle nature !" or ℎ$ ? • Heuristic delimiter with arbitrary prefactors • Large impact on electrostatics Recombination and Generation Modeling

Kuang Chung Wang

Electrical and Computer Engineering Purdue University Modeling recombination and generation

Problem: Current NEGF models fails to address recombination and generation mechanism efficiently with the versatilities of geometry. Solution: Innovate new method based on Büttiker Probe NEGF model. Application: Enable device prediction with dominant recombination generation processes. 2D materials Solar cell Generation - - - - - Top gate Photon Bot gate + + + III-nitrides LED SRH Radiative - - - - - Photon - - - - - Phonon, heat

+ + + + + + https://www.theverge.com/2018/1/22/16921450/trump-tariffs-solar-cells-panels-jobs-clean-energy-imports- china-suniva-solarworld 65 Pro and Cons of the state of art tools

Semi-classical: Commercialized software ü Spatially continuous recombination generation. e e e e Carriers are particle like.(Newton physics) e e e e h h Efn Efp assumed. h h h h

Multi-eqneq: NEMO5, state of art quantum transport LED sim. ü Carriers are wave like. (Quantum physics) Eq and Neq regions required. Eq: complete thermalization Eq: recombination only in Eq Neq: discontinuous η

Büttiker Probe RG ü Carriers are wave like. (Quantum physics) ü Continues η (bandtail, scattering) ü Incomplete thermalization. Slower? Convergence? 66 Quantum Computing Single Atom Devices P:Si

Chin Yi Chen Band structure of 2D densely doped Phosphorus in Si (Si:P ! doped layer) Application: Si:P * doped layer: Achievement: § Fundamental component of donor based § Band structure matches ARPES measurement. quantum computing. § ARPES: ü Angle-resolved photoemission spectroscopy Simulation method: New discovery: § Poisson Schrodinger solver § Si:P * doped layer has a dielectric constant two § 10 band Si/P tight binding Hamiltonian times of bulk Si.

Impact: § Accurate sub-band structure calculation. ü Important for single electron transport in donor quantum computing application.

Simulation EK along [110] APPES EK along [110] +, − ., = 0.1234 0, − ., = +1234

3Γ 2Γ

2Γ Γ &'() 1Γ 1Γ

69 Non-Local Scattering with a New Recursive Nonequilibrium Green’s Function Method

James Charles, Prasad Sarangapani, Yuanchen Chu, Gerhard Klimeck, and Tillmann Kubis Network for Computational Nanotechnology (NCN) Electrical and Computer Engineering

[email protected] Scattering In Nanoscale Devices

biosensing devices photoelectronic devices logic devices

band tails recombination rates sensitivity variations degraded subthreshold slope phonon-assisted tunneling

Stanene on Graphene Sensors and Actuators B: Chemical, Volume Phys. Chem. Chem. Phys., 2016, 18, 16302 207, Part A, 2015, 801–810 http://newsroom.intel.com/docs/DOC-2035

What do these devices have in common? Host: impurities, defects, vibrations, fluctuations

need to account for scattering in transport models

71 Results: Performance: Non-local RGF Comparisons

Si 2x2x20 nm square nanowire in !"#$%!∗ – single energy point Full inversion: 150 GB for single inversion (not feasible) Question: How does non-local RGF compare to local RGF?

2.5 25 60 peak memory ratio 150 peak memory 2.0 20 best fit to '(.* 50 1.5 100 40 15

1.0 30 time [h] time 50 10 local RGF 20 local RGF 0.5 5 10 0.0 0 Peak Simulation Memory [GB] [GB] Memory Simulation Peak nonlocal RGF / local RGF time RGF / local RGF nonlocal 0 0 0.0 0.5 1.1 1.6 2.2 2.7 0.00.3 0.5 0.8 1.1 1.4 1.6 1.9 2.22.4 2.7 3.0 nonlocal RGF / local RGF time memory time RGF local / RGF nonlocal nonlocality range in transport direction [nm] nonlocality range in transport direction [nm] non-local RGF allows full control over amount of non-locality covered 72 Multi Quantum Well LEDs J. Geng, P. Sarangapani, E. Nelson, C. Wordelman, B. Browne, T. Kubis, and G. Klimeck Multi-Quantum Well LED

• Medium to high power blue light • Multiphysics » Barriers in equilbrium (red) » Wells in nonequilibrium (green) • Strong scattering

74 nanoHUB.org • >1.4 million annual visitors • >500 simulation tools » Nanoelectronics, nanophotonics, materials science, molecular electronics, carbon-based systems, Microelectromechanical systems, uncertainty quantifications • >5,000 resources (video lectures, presentations, tutorials, etc.) • nanoHUB-U (online courses) » Nanoscale transistors (Prof. Mark Lundstrom) » Fundamentals of Nanoelectronics (Prof. Supriyo Datta) » From Atoms to Materials: Predictive Theory and Simulations (Prof. Alejandro Strachan) » Principles of Electronic Nanobiosensors (Prof. Ashraf Alam)

75 NEMO5 on nanoHUB.org

• OMEN/NEMO5-based nanoHUB tools » Brillouin Zone Viewer ü Crystal theory » Crystal Viewer ü Visualize crystals, design your own » Bandstructure Lab ü Semiconductor fundamentals » 1dhetero ü Poisson-Schroedinger Solver for 1 dimensional heterostructures » RTDNEGF – ü Calculate current through resonant tunneling diodes » Quantum Dot Lab ü Eigenstates (energy levels) of particle in a box system » nanoFET ü Calculate transport properties of transistors ü NEMO5 powered version of OMENwire, OMEN_FET • Distribution and Support Group on nanoHUB.org » https://nanohub.org/groups/nemo5distribution » Source code, example, discussion forum, run NEMO5 on Purdue Resources

76 NEMO5 nanoHUB Usage

Include one cool breakthrough image

NEMO Powered Tools >27,000 Users >500,000 Simulation Runs >3,700 students in 381 classrooms iNEMO Group

• PI: Gerhard Klimeck • 3 Research Faculty: Tillmann Kubis, Michael Povolotskyi, Rajib Rahman • Research Scientist: Jim Fonseca • 2 Postdocs: Bozidar Novakovic, Jun Huang • Students: Tarek Ameen, James Charles, Chin- Yi Chen, Fan Chen, Yuanchu Chen, Rifat Ferdous, Jun Zhe Geng, Xinchen Guo, Yuling Hsueh, Hesameddin Ilatikhameneh, Daniel Lemus, Pengyu Long, Daniel Mejia Padilla, Kai Miao, Samik Mukherjee, Harshad Sahasrabudhe, Prasad Sarangapani, Frederico Severgnini, Archana Tankasala, Gustavo Valencia, Daniel Valencia Hoyos, Kuang Chung Wang, Yu Wang, Evan Wilson • Ryan Mokos, Sharif Islam • PRAC • Intel, Samsung, Philips, TSMC

Materials Structures Devices 78