nanoHUB.org online simulations and more Network for Computational Nanotechnology (NCN)
Parallel Computing for Realistic Nanoelectronic Simulations Gerhard Klimeck, NCN Technical Director Purdue University
September 12th, 2005
Univ. of Florida, Univ. of Illinois, MIT, Morgan State, Northwestern, Purdue, Stanford, UTEP
Simulation is essential for Nanotechnology Development

Hint from the Semiconductor Industry:
• No new devices / circuits designed without software!
Simulation Problem:
• Accepted nano simulation tool suite does NOT exist.
Approach:
• Conduct research in Modeling and Simulation of:
  - Nanoelectronics
  - Nanoelectromechanics
  - Nano-bio sensors
  - Computational science
• DEVELOP and DEPLOY to nanoscience and nanotechnology community
[Figure labels: Characterization, Fabrication]
Online simulation and MORE

Real users! Real usage!
• >6,600 users
• >71,000 simulations
[Figure labels around nanoHUB.org: online simulation; collaboration; animations; learning modules; seminars; courses, tutorials]

Middleware for Deployment
Remote access to simulators and compute power
[Diagram labels: nanoHUB.org, unique middleware, internet, tool, physical machine; mostly batch operation]

Old system: PUNCH
• Jobs are run on specific hardware
• Most jobs are serial
• Most jobs are batch oriented and not interactive
New system: In-VIGO
• Jobs can be distributed to various machines
• Virtual machines can generate a specific OS on various machines
Middleware for Deployment

Remote access to simulators and compute power
[Diagram labels: nanoHUB.org, unique middleware, internet, virtual machine, tool, physical machine, job manager, TeraGrid cluster (future); mostly batch operation, remote desktop (VNC)]

Old system: PUNCH
• Jobs are run on specific hardware
• Most jobs are serial
• Most jobs are batch oriented and not interactive
New system: In-VIGO
• Jobs can be distributed to various machines
• Virtual machines can generate a specific OS on various machines
Parallel Computing: Why and How?

Motivation:
• Serial computation on a single CPU requires
  - Too much time
  - Too much memory
Available Resources:
• Traditional parallel computers
  - with shared memory
  - expensive in FLOPS/$
• Cluster computers / Beowulfs
  - Distributed memory - independent computers on a dedicated network
  - "cheap" in FLOPS/$
Objective:
• Distribute workload on N CPUs
• Best achievable: T_parallel = T_serial / N
• Keep available N CPUs busy

Approaches:
• Maximize the work on each CPU
• Minimize communication between CPUs
  - Communication is your "enemy"
⇒ Identify "identical code segments" that can be treated independently of each other
Technologies:
• Batch processing with shells
  - Independent executables - trivial
• Automated compiler-based parallelism
  - Typically limited to shared memory machines and few CPUs / threads
• OpenMP - shared memory model
• MPI - message passing interface
  - Only communicate where programmer decides
  - Minimize communication
  - Runs on almost all parallel computers
  - Runs on clusters / Beowulfs
Really Embarrassingly Parallel

[Diagram: independent jobs 1, 2, 3, 4, 5, ..., N-1 running side by side along the time axis]
• Spawn independent jobs one single time
• Possibly trivial data aggregation
⇒ shell-based parallelism or
⇒ MPI-based parallelism
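A minimal Python sketch of this pattern, under stated assumptions: the names run_independent_jobs and simulate_bias_point are hypothetical, and a thread pool merely stands in for the shell-spawned executables or MPI ranks named on the slide. Each job is started once, nothing is exchanged between jobs, and the only aggregation is collecting the return values.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent_jobs(job, inputs, max_workers=4):
    """Spawn every job once; jobs never communicate with each other."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so aggregation is trivial.
        return list(pool.map(job, inputs))


def simulate_bias_point(voltage):
    # Hypothetical stand-in for one independent simulation run.
    return voltage * voltage


results = run_independent_jobs(simulate_bias_point, [0.0, 0.1, 0.2, 0.3])
```

For CPU-bound work in Python, separate processes or separate batch jobs would be the more realistic choice; threads are used here only to keep the sketch short.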
Embarrassingly Parallel

[Diagram: master CPU 0 and CPUs 1, 2, 3, 4, 5, ..., N-1 along the time axis; work returns to CPU 0 and is handed out again]
• Data needs to be aggregated by one master CPU ...
• Master distributes work to slaves repeatedly
=> MPI-based approaches
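The master/worker pattern above can be sketched as follows. This is an illustration only: threads and queues stand in for MPI ranks and messages, and master_worker is a hypothetical name, not part of any tool mentioned in the talk. The master hands out tasks one by one and gathers every result itself.

```python
import queue
import threading


def master_worker(tasks, work, n_workers=4):
    """Master distributes tasks to workers and aggregates all results."""
    task_q = queue.Queue()
    result_q = queue.Queue()

    def worker():
        while True:
            item = task_q.get()
            if item is None:          # sentinel: no more work
                break
            idx, payload = item
            result_q.put((idx, work(payload)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    n_tasks = 0
    for idx, payload in enumerate(tasks):
        task_q.put((idx, payload))    # master distributes work
        n_tasks += 1
    for _ in threads:
        task_q.put(None)              # one sentinel per worker

    results = [None] * n_tasks
    for _ in range(n_tasks):
        idx, value = result_q.get()   # master aggregates
        results[idx] = value
    for t in threads:
        t.join()
    return results
```

Calling master_worker once per generation of an optimizer would give the repeated distribute-and-gather cycle shown in the diagram.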
Coherently Parallel Processes: Lockstep Parallelism

[Diagram: CPUs 1 to 6 advancing together along the time axis]
• Compute
• Communicate odd <=> odd+1
• Communicate odd <=> odd-1
• Communicate all-2-all, aggregate
• Compute ...
• Repeat
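The two pairwise communication phases can be written down explicitly. The sketch below is only an illustration of the schedule on the slide; exchange_pairs and lockstep_schedule are hypothetical names, and CPUs are numbered 1..N as in the diagram.

```python
def exchange_pairs(n_cpus, offset):
    """Pairs (odd, odd + offset) that communicate in one lockstep phase."""
    pairs = []
    for odd in range(1, n_cpus + 1, 2):
        partner = odd + offset
        if 1 <= partner <= n_cpus:    # edge CPUs may have no partner
            pairs.append((odd, partner))
    return pairs


def lockstep_schedule(n_cpus):
    """One iteration: compute, two neighbor exchanges, global aggregate."""
    return [
        ("compute", []),
        ("communicate", exchange_pairs(n_cpus, +1)),
        ("communicate", exchange_pairs(n_cpus, -1)),
        ("all-to-all", []),
    ]
```

For six CPUs the first phase pairs (1,2), (3,4), (5,6) and the second pairs (3,2), (5,4), so every CPU talks to both neighbors without two messages competing for the same CPU in one phase.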
Parallelization on Multiple Levels

• Sometimes the work cannot be easily distributed amongst many CPUs
  - Maybe there are not enough members in the high level loop
  - Maybe parallelism is only efficient on a few CPUs on the low level loop
• Sometimes the levels of available parallelism change
  - User runs an experiment with just a single bias point
  - User runs a model that has no "k" level parallelism
Presentation Outline

• nanoHUB.org - Online Simulation and More
• Parallel Applications:
  - Motivation
  - Approaches
• Embarrassingly Parallel
  - Genetic Algorithm Optimization
  - GENES - Genetically engineered nanoelectronic structures
• Lock-Step Parallelism
  - NEMO 3-D - multimillion atom quantum dot simulation
  - Parallel performance
  - Simulation results
• Multi-Level Parallelism
  - NEMO 1-D - a real CAD tool with dynamic parallelism requirements
Basic Genetic Algorithm

[Figure: the parameter sets S1, S2, ..., Si, ..., SN of Generation M pass through evolution and fitness ranking to give the sets S1', S2', ..., Si', Si'', ..., SN' of Generation M+1; the lowest-ranked sets are discarded (RIP)]

• Genetic algorithm parameter optimization is based on:
  - Survival of good parameter sets
  - Evolution of new parameter sets
  - Survival of a diverse population
• Optimization can be performed globally, rather than locally.
Basic Evolution Operations

• Each set (Si) consists of several parameters (Pj)
• The parameters Pj can be of different kinds: real, integers, symbols, ...

Crossover operation (gross exploration):
  Set 1 (P1, P2, ..., Pk) and Set 2 (Q1, Q2, ..., Qk) become Set 1' (P1, Q2, ..., Pk) and Set 2' (Q1, P2, ..., Qk)
Mutation operation (fine tuning):
  Set 3 (P1, P2, ..., Pk) becomes Set 3' (P1, P2', ..., Pk)

• Crossover explores different combinations of existing genes.
• Mutation creates new gene values.
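The two operations can be sketched in a few lines of Python. This is a generic illustration, not the GENES implementation; the function names are hypothetical and parameter sets are represented as plain lists.

```python
import random


def two_point_crossover(set_a, set_b, rng):
    """Swap the genes between two randomly chosen cut points."""
    n = len(set_a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = set_a[:i] + set_b[i:j] + set_a[j:]
    child_b = set_b[:i] + set_a[i:j] + set_b[j:]
    return child_a, child_b


def mutate(param_set, index, new_value):
    """Return a copy of the set with one gene replaced by a new value."""
    child = list(param_set)
    child[index] = new_value
    return child


rng = random.Random(0)
set_1 = ["P1", "P2", "P3", "P4"]
set_2 = ["Q1", "Q2", "Q3", "Q4"]
set_1p, set_2p = two_point_crossover(set_1, set_2, rng)
set_3p = mutate(set_1, 1, "P2'")
```

Crossover only recombines values that already exist in the population, while mutation is the only step that introduces a value no set had before.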
Global Optimization via Genetic Algorithms

F(x, y) = \frac{\sin(x)}{x}\,\frac{\sin(y)}{y} + 0.7\,\frac{\sin(x - 4)}{(x - 4)}\,\frac{\sin(y - 4)}{(y - 4)}
Global Optimization: Genetic Algorithm Development

Genetic Algorithm Convergence
pop = 100, 300 generations, steady-state (10%), 2-point crossover p = 0.85, mutation p = 1/2
GENES - RTD Structural Analysis

• Allow genetic algorithm to vary 5 different structural parameters:
  - 3 thicknesses: well, barrier, spacer
  - 2 dopings: low doped spacer, unintentional doping in center
• Start from "random" population of 5 parameters.
• Well width is larger than nominal.
• Unintentional doping is larger than nominal.
Mapping of Orbitals to Bulk Bandstructure

Bulk semiconductors are described by:
• Conduction and valence bands, bandgaps (direct, indirect), effective masses
• 10-30 physically measurable quantities
Tight-binding models are described by:
• Orbital interaction energies
• 15-30 theoretical parameters
High-dimensional fitting problem:
• 15-30 theoretical interaction energies
• 10-30 data points of bands and masses
[Figure: atomic orbitals, structure size 0.2 nm; (1/4)V_{sa,pc}: anion s orbital coupled to cation p orbital; axes x, y, z]
Semiconductor Compounds: cation: In, Ga, Al; anion: Sb, As, P

• Match experimental data in various electron transport areas of the Brillouin zone:
  - Effective masses of electrons at Γ, X and L
  - Effective masses of holes at Γ
  - Bandedges at Γ, X and L
• Each individual material poses a 15 dimensional fitting problem.
[Figure: grid of compounds, cations In, Ga, Al against anions Sb, As, P; (1/4)V_{sa,pc}: anion s orbital coupled to cation p orbital; axes x, y, z]
Simulation in Si and Ge: 2nd Nearest Neighbor Model - Optimization in a 37-dimensional Space

Si target: energy at X, masses at X, hole masses

Value      Target     Rel. Error   Description
 0.0006     0.0000    5.72E-04     Vhh_E
 0.6760     0.7500    9.87E-02     kz_X_Delta
 1.1330     1.1300    2.62E-03     E_kz_X_Delta
 0.8931     0.9163    2.53E-02     mlong_kz_X_Delta
 0.1932     0.1905    1.40E-02     mtran_kz_X_Delta
-0.1515    -0.1530    1.00E-02     Vlh_mlong_001
-0.5281    -0.5370    1.65E-02     Vhh_mlong_001
-0.2399    -0.2340    2.52E-02     Vso_mlong_001
 0.0451     0.0450    3.25E-03     Delta_so
 1.5745     1.6000    1.59E-02     E_kz_K_Delta
 2.8168     3.3500    1.59E-01     Cgam_Eg
 1.7742     2.0500    1.35E-01     E_kz_L_Delta

Ge target: energy at L, masses at L, hole masses

Value       Target    Rel. Error   Description
 0.0001968   0        1.97E-04     Vhh_E
 0.664905    0.664    1.36E-03     E_kz_L_Delta
 1.575913    1.59     8.86E-03     mlong_kz_L_Delta
 0.081305    0.0807   7.50E-03     mtran_kz_L_Delta
-0.04369    -0.0438   2.61E-03     Vlh_mlong_001
-0.04232    -0.0426   6.48E-03     Vlh_mlong_011
-0.04191    -0.043    2.54E-02     Vlh_mlong_111
-0.27446    -0.284    3.36E-02     Vhh_mlong_001
-0.34398    -0.376    8.52E-02     Vhh_mlong_011
-0.37411    -0.352    6.28E-02     Vhh_mlong_111
-0.09513    -0.095    1.40E-03     Vso_mlong_001
 0.0393357   0.038    3.51E-02     Cgam_mlong
 0.8012496   0.805    4.66E-03     Cgam_Eg
 0.8         0.84     4.76E-02     kz_X_Delta
 1.1020801   1.16     4.99E-02     E_kz_X_Delta
 0.2882422   0.3      3.92E-02     Delta_so

Exhaustive search expl:
• 10^37 evaluations
• 1 evaluation/sec
• Runtime on 1000 CPUs => 3x10^26 years
PGA: 12 hrs on 128 CPUs
GENES: Genetically Engineered Nanostructures

• Genetic Algorithms:
  - Can find GLOBAL minima
  - Explore a design space driven by design requirements
• Demonstrated application to:
  - Optical filter designs
  - Nanodevice design
  - Inverse problem solution - What was being built?
  - Nanoscale basis representation optimization
• Runs on cheap Beowulf cluster computers!
NEMO 3-D: Multi-Million Atom Electronic Structure Simulations

[Figure labels: atomic orbitals (size 0.2 nm); structure with millions of atoms; nanoscale quantum states (artificial atoms, size 20 nm); designed optical transitions; sensors; quantum dot arrays; computing]

• Eigenvector problem for matrices of order 10^8
• NSF TeraGrid - NCSA Itanium: demonstrated for the first time a 21 million atom system, volume (78 nm)^3
• Will deploy on nanoHUB cluster next year
NEMO 3-D calculation sequence

1. Geometry construction
2. Strain: valence force field method (VFF)
3. Electronic structure: 20-band nearest-neighbor tight-binding model

InAs QDs embedded in a GaAs barrier material, d = 18.09 nm and h = 1.7 nm, on a 0.6 nm wetting layer
• Strain computed in huge domain
• Electronic structure computed in central domain
[Figure labels: STRAIN domain; STRAIN+ELECTRONIC domain; dot dimensions d and h]
Parallelization and Methods

• Divide simulation domain into slices.
• Communication only from one slice to the next (nearest neighbor)
• Communication overhead across the surfaces of the slices.
• Limiting operation: complex sparse matrix-vector multiplication
• Enable Hamiltonian storage or re-computation on the fly.

• Strain computed in open source CG algorithm
• Electronic structure needs eigenvalues and eigenvectors. Matrix is Hermitian
• Released NEMO 3-D methods:
  - Standard 2-pass Lanczos
  - PARPACK, about 10x slower
  - Folded Spectrum Method (Zunger), also typically slower than Lanczos
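Why slicing keeps communication local can be seen in a toy matrix-vector product. The sketch below is a deliberately reduced stand-in, not NEMO 3-D code: it uses a real tridiagonal chain (on-site value diag, nearest-neighbor coupling hop) instead of a complex sparse 3-D Hamiltonian, runs the slices one after another in a single process, and uses hypothetical function names.

```python
def split_into_slices(n_sites, n_cpus):
    """Contiguous [start, stop) index ranges, one per CPU."""
    base, extra = divmod(n_sites, n_cpus)
    bounds, start = [], 0
    for rank in range(n_cpus):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sliced_matvec(diag, hop, x, n_cpus):
    """y = H x for a nearest-neighbor chain, computed slice by slice."""
    n = len(x)
    y = [0.0] * n
    for start, stop in split_into_slices(n, n_cpus):
        # "Communication": each slice needs only one boundary value
        # from the slice on its left and one from the slice on its right.
        left_halo = x[start - 1] if start > 0 else 0.0
        right_halo = x[stop] if stop < n else 0.0
        local = [left_halo] + x[start:stop] + [right_halo]
        for i in range(start, stop):
            j = i - start + 1
            y[i] = diag * local[j] + hop * (local[j - 1] + local[j + 1])
    return y
```

The work per slice grows with its volume while the exchanged data grows only with its surface, which is the reason the overhead stays confined to the slice boundaries.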
Parallelization Benchmark: Multi-machine comparison

Comparison:
• Strain - communication limited
• Electronics scale well
• Cluster networks appear noisy
• Clusters deliver better FLOPs/$
NEMO 3-D Memory Requirements

Memory requirements:
• How much memory is needed?
• Is the memory used efficiently?
• Strain calculation tops off in terms of efficiency
• Electronic structure requirements do not top off yet
[Figure note: performed on NSF TeraGrid]
Proof of Concept: Extraction of Targeted Interior Eigenstates

[Figure labels: N states, N ~ 2 mln; ~40 eV; ~10 meV; P and S states]
Proof of Concept: Extraction of Targeted Interior Eigenstates

[Figure labels: domain sizes 11.3 nm, 22.6 nm, 33.9 nm, 50.9 nm, 67.8 nm, 101.7 nm; N states, N ~ 2 mln; P and S states]

Unique and targeted eigenstates of correct symmetry can be computed in all electronic computational domains.

Extent of the Strain Field
[Figure: electron state energies of the S and P states, energy (eV) from 1.31 to 1.39, versus d (nm) from 40 to 160; curves labeled FREE SURFACE and DEEPLY BURIED, with fixed B.C.; 2 mln atoms; dot height h]

• Domain ratio d/h is fixed to 2
• Electronic domain always contains 2 mln atoms
• Strain domain contains up to 64 mln atoms
• Computations for deeply buried dot and dot array with free surface
Result: strain field is long-range!
Arrays of quantum dots: neighbor distance and finite cap size

• Cap thickness c (parameter)
• Interdot distance b, measured from QD edge to edge
• Free / open surface BC
• Fixed atom position BC
• Base thickness a is large (to ensure convergence)
[Figure: electronic energies, energy (eV) from 1.310 to 1.316, versus b (nm) from 10 to 35, for c = 11.87 nm, c = 17.52 nm and c = 23.18 nm]

• Computed: the low-energy edge of the lowest electronic miniband
• Dots are coupled both via strain and quantum-mechanically
• Anomalous dependence for the thinnest cap is due to strong strain relaxation via the top surface of the sample
Vertically Coupled Two-Dot Molecule

[Figure: electronic quantum-molecular energies, energy (eV) from 1.28 to 1.36, versus interdot distance D (Å) from 40 to 140; S and P orbitals of uncoupled dots; bonding and antibonding orbitals E1, E2]

Dot parameters: H = 2.83 nm, R1 = 6.5 nm, R2 = 6.96 nm; D is a parameter
Vertically Coupled Seven-Dot Molecule: Identical dots

• Application in lasers
• Electronics: 6.1 Ma
• 7 identical dots, without strain => symmetric miniband with 7 states
• Can derive this analytically from 1 dot simulations
[Figure: stack of seven dots with dimension labels 62.5 nm, D = 13.96 nm, 2.83 nm, 13.0 nm, 55.4 nm, 84.8 nm; states E1 to E7]
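One textbook way to see why identical dots give a symmetric miniband is a nearest-neighbor chain model. This is an assumption made here for illustration, not necessarily the derivation meant on the slide: each dot contributes one level e0 (for example from a single-dot simulation), neighboring dots couple with a strength t, and the N levels of the chain are e0 + 2 t cos(k pi / (N + 1)) for k = 1..N. The function name and the numerical values are hypothetical.

```python
import math


def chain_miniband(e0, t, n_dots=7):
    """Levels of n_dots identical, nearest-neighbor coupled dots."""
    return sorted(
        e0 + 2.0 * t * math.cos(k * math.pi / (n_dots + 1))
        for k in range(1, n_dots + 1)
    )


# Hypothetical numbers: single-dot level 1.30 eV, coupling -5 meV.
levels = chain_miniband(1.30, -0.005)
```

The seven levels come in pairs placed symmetrically around e0, with one level at e0 itself. Once strain makes the dots non-identical, e0 differs from dot to dot and this closed form no longer applies.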
Vertically Coupled Seven-Dot Molecule: Strain asymmetry => Non-identical dots

• Application in lasers
• Strain: 44.7 Ma
• Electronics: 6.1 Ma
• 7 identical dots, with strain => asymmetric miniband
• Ground state in the BOTTOM!
• Cannot derive this analytically!
[Figure: stack of seven dots with dimension labels 62.5 nm, D = 13.96 nm, 2.83 nm, 13.0 nm, 55.4 nm, 84.8 nm; states E1 to E7]

Modeling capability absolutely unique!!
NEMO: A User-friendly Quantum Device Design Tool
Formalism, Physics, and Technology

[Diagram labels: Software Engineering; Object-Oriented Principles; Documentation; Physics; Phonons; Ionized Dopants; Charging; Bandstructure; Alloy Disorder; Interface Roughness; Formalism; Green Function Theory & Boundary Cond.; Tool; Resonance Finder; Novel Grid Gen.; Material Param. Database; Hybrid C, FORTRAN, FORTRAN90; Graphical User Interface; Batch Run Interface; Library of Examples]

Approximately 250,000 lines of code

Resonant Tunneling Diode
12 different I-V curves: 2 wafers, 3 mesa sizes, 2 bias directions
[Figure: current versus voltage]

Layer structure:
  50 nm  1e18  InGaAs
   7 ml  nid   InGaAs
   7 ml  nid   AlAs
  20 ml  nid   InGaAs
   7 ml  nid   AlAs
   7 ml  nid   InGaAs
  50 nm  1e18  InGaAs

Conduction band diagrams for different voltages and the resulting current flow.
Algorithm Overview

2D integral, for many bias points!
I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)

• Total energy "E"
• Momentum "k" accounts for transverse freedom
[Figure axes: k, z, Ez]
Transport in Momentum and Energy Space

2D integral:
I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)
1D integral - Tsu-Esaki:
I \propto \rho_{2D} \int dE\; T(E)\,\bigl(f_L(E) - f_R(E)\bigr)

[Figure: density of states, energy (eV) from 0.10 to 0.25 versus distance (nm) from 40 to 90, for kx = 0.00 and kx = 0.03; axes k, z, Ez]

Resonance coupling depends on the transverse momentum
Transport in Momentum and Energy Space: Consequences of Approximations

2D integral:
I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)
1D integral - Tsu-Esaki:
I \propto \rho_{2D} \int dE\; T(E)\,\bigl(f_L(E) - f_R(E)\bigr)

[Figure: density of states, energy (eV) from 0.10 to 0.25 versus distance (nm) from 40 to 90, for kx = 0.00 and kx = 0.03]
Previously Used Approximations: 1D Charge & 2D Current

• Charge self-consistency results in a dramatic overshoot
• Problem:
  - alignment of emitter bound and central resonance states
  - assumption of parabolic transverse subbands
• Partial Solution:
  - Perform self-consistency for N, V using the 1D integral.
  - Follow calculation with a single pass of a 2D integral for the current.
Really Required: 2D integral throughout

• Charge self-consistency results in a dramatic overshoot
• Problem:
  - alignment of emitter bound and central resonance states
  - assumption of parabolic transverse subbands
• Partial Solution:
  - Perform self-consistency for N, V using the 1D integral.
  - Follow calculation with a single pass of a 2D integral for the current.
• Needed:
  - Perform self-consistency for N, V, and I using the double integral in k and E.
  - Single CPU: 1-2 weeks to compute.
  - Need to parallelize the code.
Need for Parallelization: Look at Algorithm

2D integral:
I \propto 2\pi \int k\,dk \int dE\; T(E,k)\,\bigl(f_L(E) - f_R(E)\bigr)

Have three levels of possible parallelization:
• Loop over bias points - coarse grain
• Loop over momentum points - medium grain
• Loop over energy points - fine grain
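The three nested loops can be made concrete with a toy serial version of the double integral. Everything physical in this sketch is an assumption made for illustration: the Lorentzian toy_transmission is not the NEMO transmission, the contact Fermi levels and temperature are arbitrary, the constant prefactor is dropped, and all function names are hypothetical. Only the loop structure is the point.

```python
import math

KT = 0.0259  # eV, thermal energy near room temperature (assumed)


def fermi(energy, mu):
    return 1.0 / (1.0 + math.exp((energy - mu) / KT))


def toy_transmission(energy, k, bias):
    # Hypothetical single resonance that shifts with k and with bias.
    e_res = 0.10 + 0.5 * k * k - 0.5 * bias
    gamma = 0.005
    return gamma ** 2 / ((energy - e_res) ** 2 + gamma ** 2)


def trapezoid(ys, xs):
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def current(bias, k_grid, e_grid, mu=0.05):
    per_k = []
    for k in k_grid:                       # medium grain: momentum points
        integrand = [                      # fine grain: energy points
            toy_transmission(e, k, bias) * (fermi(e, mu) - fermi(e, mu - bias))
            for e in e_grid
        ]
        per_k.append(k * trapezoid(integrand, e_grid))
    return trapezoid(per_k, k_grid)        # constant prefactor omitted


def iv_curve(biases, k_grid, e_grid):
    # Coarse grain: every bias point is independent of the others.
    return [current(v, k_grid, e_grid) for v in biases]


k_grid = [i * 0.01 for i in range(21)]
e_grid = [i * 0.001 for i in range(501)]
iv = iv_curve([0.0, 0.1], k_grid, e_grid)
```

Each of the three loops could be handed to a different group of CPUs; the outer loop has the most work per item and the least communication, the inner loop the opposite.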
Parallelization of Bias Points

• Master-Slave approach
  - master only distributes and gathers individual bias points
• This I-V has 70 bias points:
  - Good scaling up to 15 CPUs
  - Strong steps at 36 and 24 CPUs / load imbalance
• Max speed-up:
  - 21 at 24 CPUs, 87% efficiency
  - 30 at 36 CPUs, 83% efficiency
• Large CPU speed-up:
  - 32 at 64 CPUs, 50% efficiency
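The quoted efficiencies are speed-up divided by CPU count, and the steps follow from how many rounds of equal-cost tasks each CPU has to work through. The sketch below is a simplified model with hypothetical function names: it assumes identical task costs and ignores the master CPU and all communication, so it reproduces the trend rather than the measured numbers.

```python
def efficiency(speedup, n_cpus):
    """Parallel efficiency: achieved speed-up per CPU."""
    return speedup / n_cpus


def ideal_static_speedup(n_tasks, n_cpus):
    """Best speed-up when equal-cost tasks are dealt out in whole rounds."""
    rounds = -(-n_tasks // n_cpus)   # ceiling division
    return n_tasks / rounds


# 70 bias points: the busiest CPU handles 3 points up to 34 CPUs, 2 from 35 on.
step_before = ideal_static_speedup(70, 34)
step_after = ideal_static_speedup(70, 35)
```

Adding CPUs helps only when it lowers the number of rounds, which is why the speed-up curve climbs in steps instead of a straight line; with 70 tasks this model puts the steps at 24 and 35 working CPUs.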
Parallelization of Momentum Integral

• J(k) is a smooth function for electrons
• Need only a few k-points to resolve
• Exception:
  - Hole transport: J(k) is spiked in k
Parallelization of Momentum Integral

• K-grid has 21 points
• Even bin distribution in one communication step
• Results:
  - Good scaling up to 8 CPUs
  - Strong steps at 11 and 21 CPUs / load imbalance
• Max speed-up:
  - 10 at 11 CPUs, 91% efficiency
  - 18 at 21 CPUs, 86% efficiency
• Large CPU speed-up:
  - 18 at 64 CPUs, 28% efficiency
Parallelization of Energy Integral

[Figure: logarithmic plot spanning 10^-7 to 10^7 over the range 0 to 1, with a zoom spanning 10^2 to 10^7 over the range 0.09998 to 0.10002]

• Adaptive mesh refinement starting from 50-200 nodes
• Refinement ends on 2 nodes at a time for 1 resonance
• Energy grid is at the lowest level -> communication overhead
• Maximum speed-up:
  - 7.5 at 10 CPUs, 75% efficiency
  - 15 at 40 CPUs, 38% efficiency
• Large CPU speed-up:
  - 13 at 64 CPUs, 20% efficiency
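Why the energy level parallelizes poorly can be seen from a small adaptive-refinement sketch. The refinement rule used here (add a midpoint wherever linear interpolation misses the function by more than tol) is an assumption chosen for simplicity, not the rule used in NEMO, and the Lorentzian resonance and function names are hypothetical.

```python
def refine_grid(f, grid, tol, max_passes=30):
    """Insert midpoints until linear interpolation of f is within tol."""
    grid = sorted(grid)
    for _ in range(max_passes):
        new_points = []
        for a, b in zip(grid, grid[1:]):
            mid = 0.5 * (a + b)
            if abs(f(mid) - 0.5 * (f(a) + f(b))) > tol:
                new_points.append(mid)
        if not new_points:
            break                      # converged: nothing left to refine
        grid = sorted(grid + new_points)
    return grid


def narrow_resonance(energy, e_res=0.1, gamma=1e-5):
    # Hypothetical sharp transmission resonance.
    return gamma ** 2 / ((energy - e_res) ** 2 + gamma ** 2)


coarse = [i / 50 for i in range(51)]
fine = refine_grid(narrow_resonance, coarse, tol=1e-3)
added = [e for e in fine if e not in coarse]
```

After the first pass only a handful of nodes near the resonance are added per pass, so most CPUs have nothing to do while the few new points are evaluated and exchanged.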
Parallelization of One Level

• Parallelization on 64 CPUs is unsatisfactory in all one-level algorithms!
  - For this particular benchmark - typical for electron RTD computation
• How about parallelization on multiple levels?
Parallelization of Multiple Levels

• Parallelization on 64 CPUs is unsatisfactory in all one-level algorithms!
• How about parallelization on multiple levels?
  - 4 possibilities: I-K, I-E, K-E, I-K-E; all of them implemented
  - Try to maximize number of CPUs in the coarser grids
Parallelization of Bias and Momentum

• Still see some load imbalance problems
• Maximum speed-up:
  - 40 at 43 CPUs, 93% efficiency
  - 46 at 57 CPUs, 81% efficiency
• Large CPU speed-up:
  - 45 at 64 CPUs, 70% efficiency
• Compared to I parallelism only:
  - Large CPU speed-up: 32 at 64 CPUs, 50% efficiency
Parallelization of Bias and Energy

• Far fewer load imbalance problems
• Maximum speed-up:
  - 43 at 58 CPUs, 74% efficiency
• Large CPU speed-up:
  - 42 at 64 CPUs, 65% efficiency
• Compared to I parallelism only:
  - Large CPU speed-up: 32 at 64 CPUs, 50% efficiency
NEMO 1-D Parallelization Conclusions

• Efficient parallelization for realistic RTD simulations is non-trivial.
• Parallelization on multiple levels:
  - Flexibility to tackle different problems
  - Enabled fully charge self-consistent simulations
  - Reduced compute time from 14 days to 6 hours.
• Beowulf clusters are affordable and useful for computational electronics.
More Physics -> Better results: Full band integration + Exchange & Correlation

• Calculate the exchange and correlation potential in the local density approximation.
• Exchange and correlation energy does not, in general, eliminate the bistability, but it does reduce it.
• Inclusion of scattering in the simulation reduces the bistability region as well.
Parallel Image Processing
nanoHUB: more than computation

[Figure labels around nanoHUB.org: online simulation; courses, tutorials; collaboration; learning modules; seminars, themes]

Your software?