nanoHUB.org: online simulations and more
Network for Computational Nanotechnology (NCN)

Parallel Computing for Realistic Nanoelectronic Simulations Gerhard Klimeck, NCN Technical Director

September 12th, 2005

Univ. of Florida, Univ.of Illinois, MIT, Morgan State, Northwestern, Purdue, Stanford, UTEP

Simulation is essential for Nanotechnology Development

Hint from the Industry:
• No new devices / circuits designed without software!

Simulation Problem:
• Accepted nano simulation tool suite does NOT exist.

Approach:
• Conduct research in Modeling and Simulation of:
  . Nanoelectromechanics
  . Nano-bio sensors
  . Computational science
• DEVELOP and DEPLOY to nanoscience and nanotechnology community

[Figure labels: Characterization, Fabrication]

Online simulation and MORE

Real users! Real usage! >6,600 users, >71,000 simulations

[Figure: nanoHUB.org offerings: online simulation, collaboration, animations, learning modules, seminars, courses, tutorials]

Middleware for Deployment

Remote access to simulators and compute power

[Figure: nanoHUB.org, internet, unique middleware, tool, physical machine]

Mostly Batch Operation

Old system: PUNCH
• Jobs are run on specific hardware
• Most jobs are serial
• Most jobs are batch oriented and not interactive

New system: In-VIGO
• Jobs can be distributed to various machines
• Virtual machines can generate a specific OS on various machines

Middleware for Deployment (continued)

Remote access to simulators and compute power

[Figure: nanoHUB.org, internet, unique middleware, virtual machine, tool, physical machine; a job manager dispatches jobs to a cluster and, in future, to the TeraGrid]

Operation modes shown: Mostly Batch Operation, Remote Desktop (VNC)

Parallel Computing: Why and How?

Motivation:
• Serial computation on a single CPU requires
  . Too much time
  . Too much memory

Available Resources:
• Traditional parallel computers
  . with shared memory
  . expensive in FLOPS/$
• Cluster computers / Beowulfs
  . Distributed memory - independent computers on a dedicated network
  . “cheap” in FLOPS/$

Objective:
• Distribute workload on N CPUs
• Best achievable: Tparallel = Tserial / N
  => Keep available N CPUs busy

Approaches:
• Maximize the work on each CPU
• Minimize communication between CPUs
  . Communication is your “enemy”
⇒ Identify “identical code segments” that can be treated independently of each other

Technologies:
• Batch processing with shells
  . Independent executables - trivial
• Automated compiler-based parallelism
  . Typically limited to shared memory machines and few CPUs / threads
• OpenMP - shared memory model
• MPI - message passing interface
  . Only communicate where programmer decides => Minimize communication
  . Runs on almost all parallel computers
  . Runs on clusters / Beowulfs
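The objective Tparallel = Tserial / N is the yardstick behind the speed-up and efficiency figures quoted on the benchmark slides. A minimal Python sketch of those two quantities follows; the function names are illustrative and not taken from any NCN code.

```python
def ideal_parallel_time(t_serial: float, n_cpus: int) -> float:
    """Best achievable run time: N CPUs stay busy and never wait on each other."""
    return t_serial / n_cpus


def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(speed_up: float, n_cpus: int) -> float:
    """Fraction of the ideal N-fold speed-up that was actually reached."""
    return speed_up / n_cpus
```

For instance, a measured speed-up of 32 on 64 CPUs corresponds to `efficiency(32, 64)`, i.e. half of the ideal value.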

Really Embarrassingly Parallel

[Figure: independent jobs 1, 2, 3, 4, 5, ..., N-1 running side by side along the time axis]

• Spawn independent jobs one single time
• Possibly trivial data aggregation

⇒ shell-based parallelism or
⇒ MPI-based parallelism
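A minimal sketch of this pattern in Python: every job is spawned once, no job talks to another, and the results are combined at the end. `simulate` is a hypothetical stand-in for one independent run; a thread pool is used only to keep the sketch portable, and real CPU-bound runs would be separate shell jobs, processes, or MPI ranks.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(job_id: int) -> int:
    """Hypothetical stand-in for one independent simulation run."""
    return job_id * job_id


def run_independent_jobs(n_jobs: int, n_workers: int = 4) -> int:
    # Spawn all jobs a single time; no job communicates with any other job.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(simulate, range(n_jobs)))
    # Trivial aggregation once everything has finished.
    return sum(results)
```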

Embarrassingly Parallel

[Figure: master CPU 0 hands work to CPUs 1, 2, 3, 4, 5, ..., N-1, collects the results, and hands out work again along the time axis]

• Data needs to be aggregated by one master CPU ...
• Master distributes work to slaves repeatedly
=> MPI-based approaches
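The repeated distribute-and-aggregate cycle can be sketched without MPI. In the Python sketch below, a thread pool plays the role of the worker CPUs and the calling code plays the master; `evaluate` and the refinement rule are hypothetical and chosen only so the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(x: float) -> float:
    """Hypothetical work item handled by one worker."""
    return (x - 3.0) ** 2


def master_loop(candidates, n_rounds=5, n_workers=4):
    """Master repeatedly distributes candidates and aggregates the scores."""
    best = None
    with ThreadPoolExecutor(max_workers=n_workers) as workers:
        for _ in range(n_rounds):
            # Distribute the work items and gather the results.
            scores = list(workers.map(evaluate, candidates))
            # Aggregate on the master: keep the lowest score.
            _, best = min(zip(scores, candidates))
            # Master decides the next batch around the current best.
            step = (max(candidates) - min(candidates)) / 4.0
            candidates = [best - step, best - step / 2, best,
                          best + step / 2, best + step]
    return best
```

The master needs every result of one round before it can issue the next round, which is what separates this pattern from the single-spawn case.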

Coherently Parallel Processes: Lockstep Parallelism

[Figure: CPUs 1 to 6 stepping through the following phases along the time axis]

• Compute
• Communicate odd <=> odd+1
• Communicate odd <=> odd-1
• Communicate all-2-all, aggregate
• Compute ...
• Repeat
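The lockstep cycle can be simulated inside a single process. In the Python sketch below each list entry stands for the data owned by one CPU (ranks 1 to N on the slide, indices 0 to N-1 in the code). The averaging rule is a made-up placeholder for the real compute phase, and the loop starts the cycle at the communication phases.

```python
def lockstep(values, n_steps):
    """Simulate ranks 1..N in one process; each list entry is one rank's data."""
    n = len(values)
    history = []
    for _ in range(n_steps):
        left = [None] * n    # value received from rank - 1
        right = [None] * n   # value received from rank + 1
        # Phase 1: odd rank r <=> r + 1 (1-based ranks, 0-based indices).
        for i in range(0, n - 1, 2):
            right[i], left[i + 1] = values[i + 1], values[i]
        # Phase 2: odd rank r <=> r - 1.
        for i in range(2, n, 2):
            left[i], right[i - 1] = values[i - 1], values[i]
        # Compute: every rank averages itself with the neighbours it heard from.
        new = []
        for i in range(n):
            local = [v for v in (left[i], values[i], right[i]) if v is not None]
            new.append(sum(local) / len(local))
        values = new
        # All-to-all aggregate: a global sum that every rank would agree on.
        history.append(sum(values))
    return values, history
```

Splitting the neighbour exchange into two phases means each rank talks to only one partner at a time, so no rank waits on a partner that is busy with a different exchange.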

Parallelization on Multiple Levels

• Sometimes the work cannot be easily distributed amongst many CPUs
  . Maybe there are not enough members in the high level loop
  . Maybe parallelism is only efficient on a few CPUs on the low level loop
• Sometimes the levels of available parallelism change
  . User runs an experiment with just a single bias point
  . User runs a model that has no “k” level parallelism

Presentation Outline

• nanoHUB.org - Online Simulation and More
• Parallel Applications:
  . Motivation
  . Approaches
• Embarrassingly Parallel
  . Genetic Algorithm Optimization
  . GENES - Genetically engineered nanoelectronic structures
• Lock-Step Parallelism
  . NEMO 3-D - multimillion atom quantum dot simulation
  . Parallel performance
  . Simulation results
• Multi-Level Parallelism
  . NEMO 1-D - a real CAD tool with dynamic parallelism requirements

Basic Genetic Algorithm

[Figure: parameter sets S1, S2, ..., Si, ..., SN in Generation M are ranked by fitness and evolved into Generation M+1; the lowest-ranked sets are discarded (RIP)]

• Genetic algorithm parameter optimization is based on:
  . Survival of good parameter sets
  . Evolution of new parameter sets
  . Survival of a diverse population

• Optimization can be performed globally, rather than locally.

Basic Evolution Operations

• Each set (Si) consists of several parameters (Pj)
• The parameters Pj can be of different kinds: real, integers, symbols, ...

[Figure: crossover operation: Set 1 (P1, P2, ..., Pk) and Set 2 (Q1, Q2, ..., Qk) exchange genes to give Set 1' (P1, Q2, ..., Pk) and Set 2' (Q1, P2, ..., Qk); mutation operation: Set 3 (P1, P2, ..., Pk) becomes Set 3' (P1, P2', ..., Pk); panel labels: Gross Exploration, Fine Tuning]

• Crossover explores different combinations of existing genes.
• Mutation creates new gene values.
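The two operators can be sketched in a few lines of Python. This is a minimal illustration and not the GENES implementation: it assumes real-valued parameters only, a single-point crossover, and Gaussian mutation, and the `rate` and `scale` defaults are arbitrary.

```python
import random


def crossover(set_p, set_q, rng):
    """Single-point crossover: the children recombine existing gene values."""
    cut = rng.randrange(1, len(set_p))
    return set_p[:cut] + set_q[cut:], set_q[:cut] + set_p[cut:]


def mutate(params, rng, rate=0.1, scale=0.05):
    """Mutation: each real-valued gene may be replaced by a new nearby value."""
    return [p + rng.gauss(0.0, scale) if rng.random() < rate else p
            for p in params]
```

Passing an explicit `random.Random(seed)` as `rng` keeps a run reproducible, which helps when comparing serial and parallel versions of the same optimization.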

Global Optimization via Genetic Algorithms

$$F(x, y) = \frac{\sin(x)}{x}\,\frac{\sin(y)}{y} + 0.7\,\frac{\sin(x - 4)}{(x - 4)}\,\frac{\sin(y - 4)}{(y - 4)}$$
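A short Python sketch of this test function shows why a global search is needed. It assumes the formula reads F(x, y) = sin(x)/x · sin(y)/y + 0.7 · sin(x-4)/(x-4) · sin(y-4)/(y-4), which is a reconstruction of the slide text; the helper `sinc` and the name `fitness` are illustrative.

```python
import math


def sinc(t: float) -> float:
    """sin(t)/t with the removable singularity at t = 0 filled in."""
    return 1.0 if t == 0.0 else math.sin(t) / t


def fitness(x: float, y: float) -> float:
    # Assumed reading of the slide's test function: two sinc-shaped peaks.
    return sinc(x) * sinc(y) + 0.7 * sinc(x - 4.0) * sinc(y - 4.0)
```

Under this reading the surface has a tall peak near (0, 0) and a lower one near (4, 4), so a purely local search started near (4, 4) can settle on the lower peak.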

Global Optimization: Genetic Algorithm Development

Genetic Algorithm Convergence: pop = 100, 300 generations, steady-state (10%), 2-point crossover p = 0.85, mutation p = 1/2

GENES - RTD Structural Analysis

• Allow genetic algorithm to vary 5 different structural parameters:
  . 3 Thicknesses: well, barrier, spacer
  . 2 Dopings: low doped spacer, unintentional doping in center
• Start from “random” population of 5 parameters.
• Well width is larger than nominal.
• Unintentional doping is larger than nominal.

Mapping of Orbitals to Bulk Bandstructure

Bulk semiconductors are described by:
• Conduction and valence bands, bandgaps (direct, indirect), effective masses
• 10-30 physically measurable quantities

Tight Binding Models are described by:
• Orbital interaction energies.
• 15-30 theoretical parameters

High Dimensional Fitting Problem:
• 15-30 theoretical interaction energies
• 10-30 data points of bands and masses

[Figure: atomic orbitals, structure size 0.2nm; axes x, y, z; (1/4)Vsa,pc: anion s orbital coupled to cation p orbital]

Semiconductor Compounds: cation: In, Ga, Al; anion: Sb, As, P

• Match experimental data in various electron transport areas of the Brillouin zone:
  . Effective masses of electrons at Γ, X and L
  . Effective masses of holes at Γ
  . Bandedges at Γ, X and L
• Each individual material poses a 15 dimensional fitting problem.

[Figure: grid of compounds, cations In, Ga, Al against anions Sb, As, P; axes x, y, z; (1/4)Vsa,pc: anion s orbital coupled to cation p orbital]

Simulation in Si and Ge: 2nd Nearest Neighbor Model - Optimization in a 37-dimensional Space

Si Target:
• Energy at X
• Masses at X
• Hole masses

Value      Target     Rel. Error   Description
0.0006     0.0000     5.72E-04     Vhh_E
0.6760     0.7500     9.87E-02     kz_X_Delta
1.1330     1.1300     2.62E-03     E_kz_X_Delta
0.8931     0.9163     2.53E-02     mlong_kz_X_Delta
0.1932     0.1905     1.40E-02     mtran_kz_X_Delta
-0.1515    -0.1530    1.00E-02     Vlh_mlong_001
-0.5281    -0.5370    1.65E-02     Vhh_mlong_001
-0.2399    -0.2340    2.52E-02     Vso_mlong_001
0.0451     0.0450     3.25E-03     Delta_so
1.5745     1.6000     1.59E-02     E_kz_K_Delta
2.8168     3.3500     1.59E-01     Cgam_Eg
1.7742     2.0500     1.35E-01     E_kz_L_Delta

Ge Target:
• Energy at L
• Masses at L
• Hole masses

Value      Target     Rel. Error   Description
0.0001968  0          1.97E-04     Vhh_E
0.664905   0.664      1.36E-03     E_kz_L_Delta
1.575913   1.59       8.86E-03     mlong_kz_L_Delta
0.081305   0.0807     7.50E-03     mtran_kz_L_Delta
-0.04369   -0.0438    2.61E-03     Vlh_mlong_001
-0.04232   -0.0426    6.48E-03     Vlh_mlong_011
-0.04191   -0.043     2.54E-02     Vlh_mlong_111
-0.27446   -0.284     3.36E-02     Vhh_mlong_001
-0.34398   -0.376     8.52E-02     Vhh_mlong_011
-0.37411   -0.352     6.28E-02     Vhh_mlong_111
-0.09513   -0.095     1.40E-03     Vso_mlong_001
0.0393357  0.038      3.51E-02     Cgam_mlong
0.8012496  0.805      4.66E-03     Cgam_Eg
0.8        0.84       4.76E-02     kz_X_Delta
1.1020801  1.16       4.99E-02     E_kz_X_Delta
0.2882422  0.3        3.92E-02     Delta_so

Exhaustive search example:
• 10^37 evaluations
• 1 evaluation/sec
• Runtime on 1000 CPUs: => 3x10^26 years

PGA: 12 hrs on 128 CPUs
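The exhaustive-search estimate is simple arithmetic and can be checked in a few lines of Python. The sketch assumes that the 10^37 evaluations come from 10 grid points in each of the 37 dimensions, which the slide does not state explicitly, and that each CPU sustains the quoted 1 evaluation per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(points_per_dim, n_dims, evals_per_sec_per_cpu, n_cpus):
    """Wall-clock years needed to evaluate every point of a full grid."""
    evaluations = points_per_dim ** n_dims
    seconds = evaluations / (evals_per_sec_per_cpu * n_cpus)
    return seconds / SECONDS_PER_YEAR
```

Calling `exhaustive_search_years(10, 37, 1.0, 1000)` gives a value on the order of 3x10^26 years, in line with the figure on the slide.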

GENES: Genetically Engineered Nanostructures

• Genetic Algorithms:
  . Can find GLOBAL minima
  . Explore a design space driven by design requirements

• Demonstrated application to:
  . Optical filter designs
  . Nanodevice design
  . Inverse problem solution - What was being built?
  . Nanoscale basis representation optimization

•Runs on cheap Beowulf cluster computers!

NEMO 3-D: Multi-Million Atom Electronic Structure Simulations

[Figure: Atomic Orbitals (size: 0.2nm) -> Structure (millions of atoms) -> Nanoscale Quantum States (Artificial Atoms, size 20nm) -> Designed Optical Transitions, Sensors; Quantum Dot Arrays, Computing]

Eigenvector problem for matrices of order 10^8, demonstrated for the first time:
• 21 million atom system
• Volume: (78nm)^3
• NSF TeraGrid - NCSA Itanium

Will deploy on nanoHUB cluster next year

NEMO 3-D calculation sequence

1. Geometry Construction
2. Strain: valence force field method (VFF)
3. Electronic Structure: 20-band nearest-neighbor tight-binding model

InAs QDs embedded in a GaAs barrier material; d=18.09 nm and h=1.7 nm, on a 0.6nm wetting layer

• Strain computed in huge domain
• Electronic structure computed in central domain

[Figure: nested STRAIN and STRAIN+ELECTRONIC domains with dot dimensions d and h]

Parallelization and Methods

Parallelization:
• Divide simulation domain into slices.
• Communication only from one slice to the next (nearest neighbor)
• Communication overhead across the surfaces of the slices.
• Limiting operation: complex sparse matrix-vector multiplication
• Enable Hamiltonian storage or re-computation on the fly.

Methods:
• Strain computed in open source CG algorithm
• Electronic structure needs eigenvalues and eigenvectors. Matrix is Hermitian
• Released NEMO 3-D methods:
  . Standard 2-pass Lanczos
  . PARPACK about 10x slower
  . Folded Spectrum Method (Zunger), also typically slower than Lanczos
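Why slicing keeps communication at the slice surfaces can be seen in a toy version of the limiting operation. The Python sketch below is not NEMO 3-D code: it uses a real, 1-D nearest-neighbour chain instead of a complex 3-D tight-binding Hamiltonian, and the values of `onsite` and `hop` are arbitrary. Each slice computes its share of y = H v from its own data plus one halo value per neighbouring slice.

```python
def chain_matvec(v, onsite=0.0, hop=-1.0):
    """Reference: y = H v for a nearest-neighbour chain (tridiagonal H)."""
    n = len(v)
    y = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y.append(onsite * v[i] + hop * (left + right))
    return y


def sliced_matvec(v, n_slices, onsite=0.0, hop=-1.0):
    """Same product; each slice only sees its own data plus two halo values."""
    n = len(v)
    bounds = [n * s // n_slices for s in range(n_slices + 1)]
    y = []
    for s in range(n_slices):
        lo, hi = bounds[s], bounds[s + 1]
        # The only "communication": one surface value from each neighbour slice.
        halo_left = v[lo - 1] if lo > 0 else 0.0
        halo_right = v[hi] if hi < n else 0.0
        padded = [halo_left] + v[lo:hi] + [halo_right]
        y.extend(onsite * padded[j] + hop * (padded[j - 1] + padded[j + 1])
                 for j in range(1, len(padded) - 1))
    return y
```

In three dimensions the halo is a whole surface layer of atoms rather than a single value, which is the communication overhead named in the list above.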

Parallelization Benchmark: Multi-machine comparison

Comparison:
• Strain - communication limited
• Electronics scale well
• Cluster networks appear noisy
• Clusters deliver better FLOPs/$

NEMO 3-D Memory Requirements

Memory Requirements:
• How much memory is needed?
• Is the memory used efficiently?
• Strain calculation tops off in terms of efficiency
• Electronic structure requirements do not top off yet

Performed on NSF TeraGrid

Proof of Concept: Extraction of Targeted Interior Eigenstates

[Figure: spectrum of N states, N ~ 2 mln, with energy scales ~40 eV and ~10 meV marked; targeted states labeled S and P]

[Figure: S and P states shown for sizes 11.3 nm, 22.6 nm, 33.9 nm, 50.9 nm, 67.8 nm and 101.7 nm]

Unique and targeted eigenstates of correct symmetry can be computed in all electronic computational domains

Extent of the Strain Field

[Figure: electron state energies (S, P) for 2 MLN ATOMS, Energy (eV) from 1.31 to 1.39 versus d (nm), 40 to 160; curves labeled FREE SURFACE / Mixed B.C. and DEEPLY BURIED / Fixed B.C.; dimensions d and h]

• Domain ratio d/h is fixed to 2
• Electronic domain always contains 2 mln atoms
• Strain domain contains up to 64 mln atoms
• Computations for deeply buried dot and dot array with free surface

Result: strain field is long-range!

Arrays of quantum dots: neighbor distance and finite cap size

• Cap thickness c
• Interdot distance b is the parameter, measured from QD edge to edge
• Base thickness a is large (to ensure convergence)

[Figure: electronic energies, Energy (eV) from 1.310 to 1.316 versus b (nm), 10 to 35, for c = 23.18 nm, c = 17.52 nm and c = 11.87 nm; labels: Free / open surface BC, Fixed atom position BC]

• Computed: the low-energy edge of the lowest electronic miniband
• Dots are coupled both via strain and quantum-mechanically
• Anomalous dependence for the thinnest cap is due to strong strain relaxation via the top surface of the sample

Vertically Coupled Two-Dot Molecule

[Figure: electronic quantum-molecular energies, Energy (eV) from 1.28 to 1.36 versus interdot distance D (A), 40 to 140; S and P orbitals of uncoupled dots; bonding orbital and antibonding orbital, levels E1 and E2]

Dot geometry: H = 2.83 nm, R = 6.5 nm, R2 = 6.96 nm; D is a parameter

Vertically Coupled Seven-Dot Molecule: Identical dots

• Application in lasers
• Electronics: 6.1 Ma
• 7 identical dots, without strain => symmetric miniband with 7 states
• Can derive this analytically from 1 dot simulations

[Figure: seven stacked dots with dimension labels including 13.0 nm, 55.4 nm and 84.8 nm; states E1 to E7]

Vertically Coupled Seven-Dot Molecule: Strain asymmetry => Non-identical dots

• Application in lasers
• Strain: 44.7 Ma
• Electronics: 6.1 Ma
• 7 identical dots, with strain => asymmetric miniband
• Ground state in the BOTTOM!
• Cannot derive this analytically!

[Figure: seven stacked dots with dimension labels including 13.0 nm, 55.4 nm and 84.8 nm; states E1 to E7]

Modeling capability absolutely unique!!

NEMO: A User-friendly Quantum Device Design Tool
Formalism, Physics, and Technology

[Figure: components of the tool
 Formalism: Green Function Theory & Boundary Cond.
 Physics: Phonons, Ionized Dopants, Charging, Bandstructure, Alloy Disorder, Interface Roughness
 Tool: Resonance Finder, Novel Grid Gen., Material Param. Database, Hybrid C, FORTRAN, FORTRAN90, Graphical User Interface, Batch Run Interface, Library of Examples
 Software Engineering: Object-Oriented Principles, Documentation]

Approximately 250,000 lines of code

Resonant Tunneling Diode

[Figure: Current versus Voltage; 12 different I-V curves: 2 wafers, 3 mesa sizes, 2 bias directions]

Layer structure:
• 50 nm 1e18 InGaAs
• 7 ml nid InGaAs
• 7 ml nid AlAs
• 20 ml nid InGaAs
• 7 ml nid AlAs
• 7 ml nid InGaAs
• 50 nm 1e18 InGaAs

Conduction band diagrams for different voltages and the resulting current flow.

Algorithm Overview

2D integral, for many bias points!

$$I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)$$

• Total energy “E”
• Momentum “k” accounts for transverse freedom

[Figure labels: k, z, Ez]

Transport in Momentum and Energy Space

2D integral:

$$I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)$$

1D integral - Tsu-Esaki:

$$I \propto \rho_{2D} \int dE\; T(E)\,\bigl(f_L(E) - f_R(E)\bigr)$$

[Figure: two density-of-states panels, DOS (kx=0.00) and DOS (kx=0.03), Energy (eV) from 0.10 to 0.25 versus Distance (nm), 40 to 90]

Resonance coupling depends on the transverse momentum

Transport in Momentum and Energy Space: Consequences of Approximations

[Figure: the same 2D integral and 1D (Tsu-Esaki) integral as on the previous slide, with the DOS (kx=0.00) and DOS (kx=0.03) panels, Energy (eV) from 0.10 to 0.25 versus Distance (nm), 40 to 90]

Previously Used Approximations: 1D Charge & 2D Current

• Charge self-consistency results in a dramatic overshoot
• Problem:
  . alignment of emitter bound and central resonance states
  . assumption of parabolic transverse subbands.
• Partial Solution:
  . Perform self-consistency for N, V using the 1D integral.
  . Follow calculation with a single pass of a 2D integral for the current.

Really Required: 2D integral throughout

• Needed:
  . Perform self-consistency for N, V, and I using the double integral in k and E.
  . Single CPU: 1-2 weeks to compute.
  . Need to parallelize the code.

Need for Parallelization: Look at Algorithm

2D integral:

$$I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)$$

Have three levels of possible parallelization:
• Loop over bias points: coarse grain
• Loop over momentum points: medium grain
• Loop over energy points: fine grain
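The three levels map directly onto three nested loops. The Python sketch below is a toy and not NEMO code: `toy_transmission` is a hypothetical Lorentzian resonance that ignores the bias dependence of the real band profile, the Fermi levels and temperature are arbitrary, and the integral is a plain rectangle sum on uniform grids.

```python
import math


def fermi(energy, mu, kt=0.025):
    """Fermi function with chemical potential mu (all energies in eV)."""
    return 1.0 / (1.0 + math.exp((energy - mu) / kt))


def toy_transmission(energy, k):
    """Hypothetical Lorentzian resonance whose position shifts with k."""
    e_res, gamma = 0.10 + 0.5 * k * k, 0.005
    return gamma ** 2 / ((energy - e_res) ** 2 + gamma ** 2)


def current(bias, k_points, e_points, mu=0.05):
    """Rectangle-rule version of the 2D integral for one bias point."""
    dk = k_points[1] - k_points[0]
    de = e_points[1] - e_points[0]
    total = 0.0
    for k in k_points:                    # medium grain: momentum loop
        for e in e_points:                # fine grain: energy loop
            window = fermi(e, mu) - fermi(e, mu - bias)
            total += 2.0 * math.pi * k * toy_transmission(e, k) * window * dk * de
    return total


def iv_curve(biases, k_points, e_points):
    # Coarse grain: every bias point is an independent calculation.
    return [current(v, k_points, e_points) for v in biases]
```

Any of the three loops can be handed to separate CPUs. The outer one needs the least communication per unit of work, which is why the slide calls it coarse grained.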

Parallelization of Bias Points

• Master-Slave approach
  . master only distributes and gathers individual bias points
• This I-V has 70 bias points:
  . Good scaling up to 15 CPUs
  . Strong steps at 36 and 24 CPUs / Load imbalance
• Max speed-up:
  . 21 at 24 CPU, 87% efficiency
  . 30 at 36 CPU, 83% efficiency
• Large CPU speed-up:
  . 32 at 64 CPU, 50% efficiency

Parallelization of Momentum Integral

• J(k) is a smooth function for electrons
• Need only a few k-points to resolve
• Exception:
  . Hole transport: J(k) is spiked in k

Parallelization of Momentum Integral

• K-grid has 21 points
• Even bin distribution in one communication step.
• Results:
  . Good scaling up to 8 CPUs
  . Strong steps at 11 and 21 CPUs / Load imbalance
• Max speed-up:
  . 10 at 11 CPU, 91% efficiency
  . 18 at 21 CPU, 86% efficiency
• Large CPU speed-up:
  . 18 at 64 CPU, 28% efficiency
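The steps in the speed-up curve follow from dealing whole tasks to CPUs. A minimal Python sketch of the idealized bound, assuming equal-cost tasks, no communication cost, and that every CPU computes (the function name is illustrative):

```python
import math


def max_speedup(n_tasks: int, n_cpus: int) -> float:
    """Upper bound on speed-up when equal-cost tasks are dealt out in whole units."""
    rounds = math.ceil(n_tasks / n_cpus)   # the busiest CPU runs this many tasks
    return n_tasks / rounds
```

For the 21-point k-grid the bound is 7 on 10 CPUs, 10.5 on 11 through 20 CPUs, and 21 from 21 CPUs upward, which reproduces the steps at 11 and 21 CPUs and the lack of gain beyond 21. The measured speed-ups sit below the bound because communication is not free.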

Parallelization of Energy Integral

• Adaptive mesh refinement starting from 50-200 nodes
• Refinement ends on 2 nodes at a time for 1 resonance
• Energy grid is at the lowest level -> communication overhead
• Maximum speed-up:
  . 7.5 at 10 CPU, 75% efficiency
  . 15 at 40 CPU, 38% efficiency
• Large CPU speed-up:
  . 13 at 64 CPU, 20% efficiency

[Figure: two log-scale plots; one spans 10^-7 to 10^7 over 0 to 1, the other spans 10^2 to 10^7 over 0.09998 to 0.10002]

Parallelization of One Level

• Parallelization on 64 CPUs is unsatisfactory in all one level algorithms!
  . For this particular benchmark - typical for electron RTD computation

• How about parallelization on multiple levels?

Parallelization of Multiple Levels

• Parallelization on 64 CPUs is unsatisfactory in all one level algorithms!

• How about parallelization on multiple levels?
  . 4 possibilities: I-K, I-E, K-E, I-K-E, all of them implemented
  . Try to maximize number of CPUs in the coarser grids
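Splitting the CPUs across two levels can be explored with the same whole-task bound used for one level. The Python sketch below is an idealized estimate and not the NEMO 1-D scheduler: it assumes equal-cost tasks, ignores communication entirely, and simply tries every split of the CPUs between the bias (I) and momentum (K) loops.

```python
import math


def level_bound(n_tasks: int, n_cpus: int) -> float:
    """Idealized speed-up of one loop level with whole tasks per CPU."""
    return n_tasks / math.ceil(n_tasks / n_cpus)


def best_two_level_split(n_cpus: int, n_bias: int, n_k: int):
    """Try every p_bias * p_k <= n_cpus and keep the best idealized speed-up."""
    best = (0.0, 1, 1)
    for p_bias in range(1, n_cpus + 1):
        p_k = n_cpus // p_bias
        bound = level_bound(n_bias, p_bias) * level_bound(n_k, p_k)
        if bound > best[0]:
            best = (bound, p_bias, p_k)
    return best
```

With 64 CPUs, 70 bias points and 21 k-points, the best two-level bound is well above the bound for bias-only parallelism. Because the sketch leaves out communication, it cannot show why the coarser level should receive more CPUs in practice.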

Parallelization of Bias and Momentum

• Still see some load imbalance problems
• Maximum speed-up:
  . 40 at 43 CPU, 93% efficiency
  . 46 at 57 CPU, 81% efficiency
• Large CPU speed-up:
  . 45 at 64 CPU, 70% efficiency
• Compared to I parallelism only:
  . Large CPU speed-up: 32 at 64 CPU, 50% efficiency

Parallelization of Bias and Energy

• Much fewer load imbalance problems
• Maximum speed-up:
  . 43 at 58 CPU, 74% efficiency
• Large CPU speed-up:
  . 42 at 64 CPU, 65% efficiency
• Compared to I parallelism only:
  . Large CPU speed-up: 32 at 64 CPU, 50% efficiency

NEMO 1-D Parallelization Conclusions

• Efficient parallelization for realistic RTD simulations is non-trivial.
• Parallelization on multiple levels:
  . Flexibility to tackle different problems
  . Enabled fully charge self-consistent simulations
  . Reduced compute time from 14 days to 6 hours.
• Beowulf clusters are affordable and useful for computational electronics.

More Physics -> Better results: Full band integration + Exchange & Correlation

• Calculate the exchange and correlation potential in the local density approximation.
• Exchange and correlation energy does not (in general) eliminate the bistability, but it does reduce it.
• Inclusion of scattering in the simulation reduces the bistability region as well.

Parallel Image Processing

nanoHUB: more than computation

[Figure: nanoHUB.org offerings: online simulation, courses, tutorials, collaboration, learning modules, seminars, themes]

Your software?