Case Study: Beyond Homogeneous Decomposition with Qbox - Scaling Long-Range Forces on Massively Parallel Systems

Donald Frederick, LLNL. Presented at Supercomputing '11.
Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551. LLNL-PRES-508651

Outline
• Problem Description
• Computational Approach
• Changes for Scaling

Computer Simulations of Materials
Computer simulations are widely used to predict the properties of new materials or to understand the properties of existing ones.

Simulation of Materials from First Principles
First-principles methods calculate the properties of a given material directly from fundamental physics equations.
• No empirical parameters: predictions can be made about complex or novel materials, under conditions where experimental data is unavailable or inconclusive.
• Chemically dynamic: as atoms move, chemical bonds can be formed or broken.
• Computationally expensive: solving the quantum equations is time-consuming, limiting system sizes (e.g. hundreds of atoms) and simulation timescales (e.g. picoseconds).
[Figure: electron density surrounding water molecules, calculated from first principles]

Quantum Mechanics Is Hard
The properties of a many-body quantum system are given by Schrödinger's equation:

  \left[ -\frac{\hbar^2}{2m}\nabla^2 + V_{ion}(r) + V_{H}(r) + V_{xc}(r) \right] \psi_i(r) = \varepsilon_i \psi_i(r)

An exact numerical solution has exponential complexity in the number of quantum degrees of freedom, making a brute-force approach impractical (e.g. a system with 10,000 electrons would take 2.69 x 10^43 times longer to solve than one with only 100 electrons!). Approximate solutions are needed:
• Quantum Monte Carlo: stochastically sample the high-dimensional phase space.
• Density Functional Theory: use an approximate form for the exchange and correlation of electrons.
Both methods are O(N^3), which is expensive but tractable.

Density Functional Theory
In density functional theory, the total energy of the system can be written as a functional of the electron density:

  E[\{\psi_i\},\{R_I\}] = 2\sum_i^{occ} \int d^3r\, \psi_i^*(r) \left( -\tfrac{1}{2}\nabla^2 \right) \psi_i(r)
    + \int d^3r\, \rho(r)\, V^{ion}(r)
    + \frac{1}{2} \sum_{I \neq J} \frac{Z_I Z_J}{|R_I - R_J|}
    + E^{xc}[\rho]
    + \frac{1}{2} \iint d^3r\, d^3r'\, \frac{\rho(r)\,\rho(r')}{|r - r'|}

Many-body effects are approximated by a density functional (the exchange-correlation term E^{xc}). The Kohn-Sham equations are solved self-consistently to find the ground-state electron density, defined by the minimum of this energy functional (for a given set of ion positions). The total energy allows one to directly compare different structures and configurations.

Static Calculations: Total Energy
Basic idea: for a given configuration of atoms, we find the electronic density which gives the lowest total energy. Bonding is determined by the electronic structure; no bonding information is supplied a priori.

First-Principles Molecular Dynamics (FPMD)
For dynamical properties, both ions and electrons have to move simultaneously and consistently:
1. Solve Schrödinger's equation to find the ground-state electron density.
2. Compute the inter-atomic forces.
3. Move the atoms incrementally forward in time (time t, move atoms, time t + dt).
4. Repeat.
Because step 1 represents the vast majority (>99%) of the computational effort, we want time steps that are as large as possible, yet small enough that the previous solution is close to the current one. (A minimal sketch of this loop appears after the next slide.)

Predictive Simulations of Melting
[Figure: two-phase solid/liquid simulations; freezing at T = 800 K (t = 1.2 ps, t = 2.4 ps) and melting at T = 900 K (t = 0.6 ps, t = 1.7 ps)]
A two-phase simulation approach combined with local order analysis allows one to calculate phase behavior, such as melting lines, with unprecedented accuracy.
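To make the FPMD loop concrete, here is a minimal C++ sketch of the outer iteration. The routine names (solve_ground_state, compute_forces, move_atoms) are hypothetical placeholders rather than Qbox's actual interfaces; only the structure of the loop, and the fact that the electronic-structure solve dominates the cost, is taken from the slides.

```cpp
// Minimal sketch of the FPMD outer loop (hypothetical API, not Qbox's).
// Each iteration: (1) solve for the ground-state electron density,
// (2) compute inter-atomic forces from it, (3) advance the ions by dt.
#include <cstdio>
#include <vector>

struct Atom { double pos[3]{}, vel[3]{}, force[3]{}, mass = 1.0; };

// Step 1: solve the Kohn-Sham equations for the current ion positions.
// In a real FPMD code this self-consistent solve is >99% of the run time.
void solve_ground_state(const std::vector<Atom>& /*atoms*/) { /* placeholder */ }

// Step 2: compute inter-atomic forces from the converged electron density.
void compute_forces(std::vector<Atom>& atoms) {
    for (auto& a : atoms) a.force[0] = a.force[1] = a.force[2] = 0.0; // placeholder
}

// Step 3: move the ions forward by one time step dt (e.g. velocity Verlet).
void move_atoms(std::vector<Atom>& atoms, double dt) {
    for (auto& a : atoms)
        for (int k = 0; k < 3; ++k) {
            a.vel[k] += dt * a.force[k] / a.mass;
            a.pos[k] += dt * a.vel[k];
        }
}

int main() {
    std::vector<Atom> atoms(1000);           // e.g. the size of the Mo1000 benchmark
    const double dt = 1.0;                   // time step, arbitrary units here
    for (int step = 0; step < 10; ++step) {  // step 4: repeat
        solve_ground_state(atoms);           // dominates the cost
        compute_forces(atoms);
        move_atoms(atoms, dt);
    }
    std::printf("done: %zu atoms, 10 steps\n", atoms.size());
    return 0;
}
```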
High-Z Metals from First Principles
• A detailed theoretical understanding of high-Z metals is important for stockpile stewardship (effects of radioactive decay, response to shocks, evolution of aging damage).
• Simulations may require many atoms to capture anisotropy; dynamical properties such as melting require enough atoms to minimize finite-size effects.
• Computational cost depends on the number of electrons, not atoms: a melting simulation of a high-Z metal with 10-20 valence electrons/atom will be 1000-8000 times as expensive as hydrogen!
• The complex electronic structure of high-Z metals requires large basis sets and multiple k-points.
We need a really big computer!

The Platform: BlueGene/L
• System (64 cabinets, 64x32x32): 65,536 nodes (131,072 CPUs), 180/360 TF/s, 16 TB DDR
• Cabinet (32 node boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
• Node board (16 compute cards, 32 chips, 4x4x2): 90/180 GF/s, 8 GB DDR
• Compute card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
• Chip (2 processors): 2.8/5.6 GF/s, 4 MB
512 MB of memory per node; 367 TFlop/s peak performance.
Can an FPMD code use 131,072 CPUs efficiently?

The Code: Qbox
Qbox, written by François Gygi, was designed from the ground up to be massively parallel:
• C++/MPI
• Parallelized over plane waves and electronic states
• Parallel linear algebra handled with the ScaLAPACK/PBLAS libraries
• Communication handled by the BLACS library and direct MPI calls
• Single-node dual-core dgemm/zgemm optimized by John Gunnels, IBM
• Custom distributed 3-D complex Fast Fourier Transforms
Implementation details:
• The wave function is described with a plane-wave basis
• Periodic boundary conditions
• Pseudopotentials are used to represent the electrons in the atomic core, and thereby reduce the total number of electrons in the simulation

Qbox Code Structure
Qbox is layered on ScaLAPACK/PBLAS, the XercesC XML parser, BLAS/MASSV, an FFTW library, and a tuned DGEMM library; ScaLAPACK uses BLACS, and everything ultimately runs on MPI.

Parallel Solution of the Kohn-Sham Equations
How do we efficiently distribute the problem over tens of thousands of processors? The Kohn-Sham equations are

  -\Delta\varphi_i + V(\rho,r)\,\varphi_i = \varepsilon_i \varphi_i, \qquad i = 1 \ldots N_{el}
  V(\rho,r) = V_{ion}(r) + \int \frac{\rho(r')}{|r - r'|}\, dr' + V_{XC}(\rho(r), \nabla\rho(r))
  \rho(r) = \sum_{i=1}^{N_{el}} |\varphi_i(r)|^2
  \int \varphi_i^*(r)\, \varphi_j(r)\, dr = \delta_{ij}

• Solutions represent molecular orbitals (one per electron)
• Molecular orbitals are complex scalar functions in R^3
• Coupled, non-linear PDEs
• Periodic boundary conditions
We use a plane-wave basis for the orbitals, i.e. each function is represented as a Fourier series:

  \varphi_{k,n}(r) = \sum_{|k+q|^2 < E_{cut}} c_{k+q,n}\, e^{i q \cdot r}

Careful distribution of the wave function is key to achieving good parallel efficiency.

Algorithms Used in FPMD
• Solving the KS equations: a constrained optimization problem in the space of the coefficients c_{q,n}
• Poisson equation: 3-D FFTs
• Computation of the electronic charge: 3-D FFTs
• Orthogonality constraints require dense, complex linear algebra (e.g. A = C^H C); a small serial sketch of this product follows below
Overall cost is O(N^3) for N electrons.
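As an illustration of the orthogonality constraint above, the following is a minimal serial sketch (my own, not Qbox code) of forming the Gram (overlap) matrix A = C^H C from the coefficient matrix C. Qbox performs this same dense product in parallel on the block-distributed matrix using ScaLAPACK routines such as PZGEMM/PDSYRK.

```cpp
// Minimal serial sketch: form the Gram (overlap) matrix A = C^H C, where
// C is the (nq x nstates) matrix of plane-wave coefficients c_{q,n}.
// Illustrative only; Qbox does this in parallel on a block-distributed C.
#include <complex>
#include <cstdio>
#include <vector>

using cplx = std::complex<double>;

// C is stored column-major: C[q + nq * n] holds the coefficient c_{q,n}.
std::vector<cplx> gram_matrix(const std::vector<cplx>& C, int nq, int nstates) {
    std::vector<cplx> A((size_t)nstates * nstates, cplx(0.0, 0.0));
    for (int j = 0; j < nstates; ++j)
        for (int i = 0; i < nstates; ++i) {
            cplx aij(0.0, 0.0);
            for (int q = 0; q < nq; ++q)
                aij += std::conj(C[q + (size_t)nq * i]) * C[q + (size_t)nq * j];
            A[i + (size_t)nstates * j] = aij;  // A_ij = sum_q conj(c_{q,i}) c_{q,j}
        }
    return A;  // orthonormal orbitals give A = I
}

int main() {
    const int nq = 4, nstates = 2;  // tiny toy sizes for demonstration
    std::vector<cplx> C((size_t)nq * nstates, cplx(0.5, 0.0));
    std::vector<cplx> A = gram_matrix(C, nq, nstates);
    std::printf("A(0,0) = %g + %gi\n", A[0].real(), A[0].imag());
    return 0;
}
```

Forming A costs on the order of nq * nstates^2 operations; since both the number of plane waves and the number of states grow with system size, this is one of the dense kernels behind the O(N^3) scaling quoted above.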
Qbox Communication Patterns
The matrix of coefficients c_{q,n} is block distributed (ScaLAPACK data layout): rows index the plane waves q, columns index the electronic states n, with far more plane waves than states (q >> n).
[Figure: wave-function matrix c mapped onto a process grid of 32 tasks]
• Computation of the 3-D Fourier transforms: MPI_Alltoallv
• Accumulation of the electronic charge density: MPI_Allreduce
• ScaLAPACK dense linear algebra operations: PDGEMM/PZGEMM, PDSYRK, PDTRSM

Controlling Numerical Errors
• For systems with complex electronic structure like high-Z metals, the desired high accuracy is achievable only through careful control of all numerical errors.
• Numerical errors which must be controlled:
  – Convergence of the Fourier series
      \varphi_n(r) = \sum_{|q|^2 < E_{cut}} c_{q,n}\, e^{i q \cdot r}
  – Convergence of the system size (number of atoms)
  – Convergence of the k-space integration
      \rho(r) = \int_{BZ} |\psi_{k,n}(r)|^2\, d^3k
• We therefore need to systematically increase the plane-wave energy cutoff, the number of atoms, and the number of k-points in the Brillouin-zone integration.
BlueGene/L allows us to ensure convergence of all three approximations.

High-Z Test Problem: Mo1000
• Electronic structure of a 1000-atom molybdenum sample
• 12,000 electrons
• 32 non-local projectors for the pseudopotentials
• 112 Ry plane-wave energy cutoff
• High-accuracy parameters
We wanted a test problem that was representative of the next generation of high-Z materials simulations while at the same time straightforward for others to replicate and compare performance.

Qbox Performance
1000 Mo atoms, 112 Ry cutoff, 12 electrons/atom: 207.3 TFlop/s with 8 k-points (56% of peak) (2006 Gordon Bell Award).
[Figure: performance versus successive optimizations (communication optimizations, complex arithmetic, optimal node mapping, dual-core matrix multiply) for 1, 4, and 8 k-points]

Performance Measurements
• We use the PPC440 hardware performance counters.
• The counters are accessed through the APC library (J. Sexton, IBM), which provides a summary file for each task.
• Not all double-FPU (DFPU) operations can be counted:
  – DFPU fused multiply-add: counted
  – DFPU add/sub: not counted
• FP operations on the second core are not counted.

Performance Measurements: Single-Core DGEMM/ZGEMM Library
• Measurements are done in three steps (see the sketch after these slides):
  1) count the FP ops without using the DFPU
  2) measure the timings using the DFPU
  3) compute the flop rate from 1) and 2)
• Problem: some libraries use the DFPU and are not under our control.
• DGEMM/ZGEMM uses mostly fused multiply-add ops.
• In practice we use 2); some DFPU add/sub operations are not counted.

Performance Measurements: Dual-Core DGEMM/ZGEMM Library
• FP operations on the second core are not counted.
• In this case, we must use a three-step approach:
  1) count the FP ops using the single-core
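Below is a minimal sketch of the three-step flop-rate estimate described in the single-core measurement slide, using hypothetical counter and timing values. The hardware-counter and APC library calls themselves are not reproduced; only the arithmetic that combines a DFPU-free operation count with a DFPU-enabled timing is shown.

```cpp
// Minimal sketch of the three-step flop-rate estimate described above.
// The counter/timing values are hypothetical placeholders; in practice they
// come from the PPC440 hardware counters via the APC library.
#include <cstdio>

int main() {
    // 1) Operation count from a run compiled WITHOUT the double FPU,
    //    so every floating-point op is visible to the counters.
    const double fp_ops_counted = 4.2e12;   // hypothetical value

    // 2) Wall-clock time of the same computation run WITH the double FPU
    //    (fused multiply-adds now go through the DFPU and run faster).
    const double time_with_dfpu = 15.0;     // seconds, hypothetical value

    // 3) Flop rate: ops from the countable run divided by the fast run's time.
    //    DFPU add/sub ops that escape the counters make this a lower bound.
    const double flop_rate = fp_ops_counted / time_with_dfpu;
    std::printf("Estimated rate: %.2f GFlop/s\n", flop_rate / 1e9);
    return 0;
}
```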
