Summer School CEA-EDF-INRIA
Toward petaflop numerical simulation on parallel hybrid architectures
BigDFT and GPU INRIACENTREDE RECHERCHE SOPHIA ANTIPOLIS,FRANCE Atomistic Simulations
DFT Ab initio codes
BigDFT Wavelet-Based DFT calculations on Massively
Properties BigDFT and GPUs Parallel Hybrid Architectures Code details
BigDFT and HPC GPU Luigi Genovese Practical cases
Discussion Messages L_Sim – CEA Grenoble
June 9, 2011 Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Outline
1 Review of Atomistic Simulations Density Functional Theory Ab initio codes
BigDFT and 2 The BigDFT project GPU Formalism and properties Atomistic Simulations The needs for hybrid DFT codes DFT Main operations, parallelisation Ab initio codes
BigDFT 3 Properties Performance evaluation BigDFT and GPUs Code details Evaluating GPU gain
BigDFT and Practical cases HPC
GPU Practical cases 4 Concrete examples Discussion Messages Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Review of Atomistic Simulations
A interdisciplinary domain Theory – Experiment – Simulation Hardware – Computers BigDFT and Algorithms GPU
Atomistic Simulations Different Atomistic Simulations DFT Ab initio codes Force fields (interatomic potentials) BigDFT
Properties Tight Binding Methods BigDFT and GPUs Code details Hartree-Fock BigDFT and HPC Density Functional Theory
GPU Practical cases Configuration interactions Discussion
Messages Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Quantum mechanics for many particle systems
Can we do quantum mechanics on systems of many atoms? Decoupling of the nuclei and electron dynamics Born-Oppenheimer approximation: The position of the nuclei can be considered as fixed,
BigDFT and obtaining the potential “felt” by the electrons GPU n Za Atomistic Vext(r,{R1,··· ,Rn}) = − ∑ Simulations a=1 |r − Ra| DFT Ab initio codes BigDFT Electronic Schrödinger equation Properties BigDFT and GPUs The system properties are described by the ground state Code details
BigDFT and wavefunction ψ(r1,··· ,rN ), which solves Schrödinger HPC equation GPU Practical cases H [{R}]ψ = Eψ Discussion Messages The quantum hamiltonian depends on the set of the atomic positions {R}.
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Atomistic Simulations
Two intrinsic difficulties for numerical atomistic simulations, related to complexity: Interactions The way that atoms interact is known: ∂Ψ BigDFT and i = H Ψ H ψ = E0ψ GPU } ∂t Exploration of the configuration space Atomistic Simulations
DFT Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details
BigDFT and HPC
GPU Practical cases
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Atomistic Simulations
Two intrinsic difficulties for numerical atomistic simulations, related to complexity: Interactions The way that atoms interact is known: ∂Ψ BigDFT and i = H Ψ H ψ = E0ψ GPU } ∂t Exploration of the configuration space Atomistic Simulations E DFT pot Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details BigDFT and R1 HPC
GPU Practical cases
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Atomistic Simulations
Two intrinsic difficulties for numerical atomistic simulations, related to complexity: Interactions The way that atoms interact is known: ∂Ψ BigDFT and i = H Ψ H ψ = E0ψ GPU } ∂t Exploration of the configuration space Atomistic Simulations E DFT pot Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details BigDFT and R1 HPC
GPU Practical cases R2
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Atomistic Simulations
Two intrinsic difficulties for numerical atomistic simulations, related to complexity: Interactions The way that atoms interact is known: ∂Ψ BigDFT and i = H Ψ H ψ = E0ψ GPU } ∂t Exploration of the configuration space Atomistic Simulations E DFT pot Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details BigDFT and R1 HPC
GPU Practical cases R3 R2
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Atomistic Simulations
Two intrinsic difficulties for numerical atomistic simulations, related to complexity: Interactions The way that atoms interact is known: ∂Ψ BigDFT and i = H Ψ H ψ = E0ψ GPU } ∂t Exploration of the configuration space Atomistic Simulations E DFT pot Ab initio codes R1000 BigDFT
Properties BigDFT and GPUs Code details BigDFT and R1 HPC
GPU R41 R3 Practical cases Rn R2
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Choice of Atomistic Methods
3 criteria 1→ Generality (elements, alloys) G P 2→ Precision (∆r, ∆E) 3→ System size (N, ∆t) BigDFT and S GPU
Atomistic Chemistry and Physics Simulations
DFT Ab initio codes Force Fields BigDFT G P G P G P Properties Tight Binding BigDFT and GPUs Code details S S S Hartree-Fock BigDFT and HPC DFT GPU G P G P G P Practical cases Conf. Inter. Discussion Messages S S S Quantum Monte-Carlo
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese The Hohenberg-Kohn theorem
A tremendous numerical problem
N 1 1 1 H = − ∇2 + V (r ,{R}) + ∑ ri ext i ∑ i=1 2 2 i6=j |ri − rj |
BigDFT and The Schrödinger is very difficult to solve for more than two GPU electrons! Another approach is imperative
Atomistic Simulations The fundamental variable of the problem is however not the DFT Ab initio codes wavefunction, but the electronic density BigDFT Z ∗ Properties ρ(r) = N dr2 ···drN ψ (r,r2,··· ,rN )ψ(r,r2,··· ,rN ) BigDFT and GPUs Code details BigDFT and Hohenberg-Kohn theorem (1964) HPC
GPU Practical cases The ground state density ρ(r) of a many-electron system
Discussion uniquely determines (up to a constant) the external potential . Messages The external potential is a functional of the density
Vext = Vext[ρ]
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese The Kohn-Sham approximation
Given the H-K theorem, it turns out that the total electronic energy is an unknown functional of the density E = E[ρ] =⇒ Density Functional Theory DFT (Kohn-Sham approach)
BigDFT and Mapping of a interacting many-electron system into a system GPU with independent particles moving into an effective potential. Atomistic Simulations DFT Find a set of orthonormal orbitals Ψ (r) that minimizes: Ab initio codes i
BigDFT Properties N/2 1 Z 1 Z BigDFT and GPUs E = − Ψ∗(r)∇2Ψ (r)dr + ρ(r)V (r)dr Code details ∑ i i H 2 i=1 2 BigDFT and HPC Z GPU + Exc[ρ(r)]+ Vext (r)ρ(r)dr Practical cases
Discussion N/2 Messages ∗ 2 ρ(r) = 2 ∑ Ψi (r)Ψi (r) ∇ VH (r) = −4πρ(r) i=1
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Ab initio calculations with DFT
Several advantages Main limitations Ab initio: No Approximated approach adjustable parameters Requires high computer DFT: Quantum power, limited to few mechanical BigDFT and hundreds atoms in most GPU (fundamental) cases Atomistic treatment Simulations
DFT Ab initio codes Wide range of applications: nanoscience, biology, materials BigDFT
Properties BigDFT and GPUs Code details
BigDFT and HPC
GPU Practical cases
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Performing a DFT calculation
A self-consistent equation
Then ρ = 2∑i hψi |ψi i, where |ψi i satisfies 1 2 ∇ + VH [ρ] + Vxc[ρ] + Vext + Vpseudo |ψi i = ∑Λi,j |ψj i , 2 j BigDFT and GPU Now in practice: implementing a DFT code Atomistic Simulations (Kohn-Sham) DFT “Ingredients” DFT Ab initio codes An XC potential, functional of the density BigDFT
Properties several approximations exists (LDA,GGA,. . . ) BigDFT and GPUs Code details A choice of the pseudopotential (if not all-electrons) BigDFT and (norm conserving, ultrasoft, PAW,. . . ) HPC GPU An (iterative) algorithm for finding the wavefunctions | i Practical cases ψi
Discussion A basis set for expressing the |ψi i Messages A (good) computer. . .
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese 2 2 2 Poisson Equation: ∆V = ρ (Laplacian: ∆ = ∂ + ∂ + ∂ ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion
Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion ∂2 ∂2 ∂2 Messages Poisson Equation: ∆V = ρ (Laplacian: ∆ = + + ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion ∂2 ∂2 ∂2 Messages Poisson Equation: ∆V = ρ (Laplacian: ∆ = + + ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion ∂2 ∂2 ∂2 Messages Poisson Equation: ∆V = ρ (Laplacian: ∆ = + + ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion ∂2 ∂2 ∂2 Messages Poisson Equation: ∆V = ρ (Laplacian: ∆ = + + ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese KS Equations: Self-Consistent Field
Hamiltonian (H) z }| { Set of self-consistent equations: 2 1 ~ 2 − ∇ + Veff ψi = εi ψi 2 m BigDFT and e GPU with an effective potential:
Atomistic Z 0 Simulations 0 ρ(r ) δExc Veff (r) = Vext (r)+ dr + DFT |r − r 0| δρ(r) Ab initio codes V | {z } | {z } BigDFT Hartree exchange−correlation Properties BigDFT and GPUs Code details 2 BigDFT and and: ρ(r) = ∑i fi |ψi (r)| HPC
GPU Practical cases
Discussion ∂2 ∂2 ∂2 Messages Poisson Equation: ∆V = ρ (Laplacian: ∆ = + + ) Hartree ∂x2 ∂y2 ∂z2 Real Mesh (1003 = 106): 106 × 106 = 1012 evaluations !
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Performing a DFT calculation (KS formalism)
Find a set of orthonormal orbitals Ψi (r) that minimizes:
E = ∑hΨi |H[ρ]|Ψi i i with:
BigDFT and i = 1,··· ,N (one Ψ per electron) GPU ∗ ρ(r) = ∑Ψi (r)Ψi (r) Atomistic i Simulations
DFT Ab initio codes (Kohn-Sham) DFT “Actors” BigDFT Properties A set of wavefunctions |ψi i, one for each electron BigDFT and GPUs Code details A computational approach on a finite basis BigDFT and HPC ⇒ One array for each Ψi GPU Practical cases ⇒ A set of computational operations on these arrays which Discussion
Messages depend on the basis set A (even more) good computer. . .
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Basis sets for electronic structure calculation
Several basis sets exist, with different features:
Plane Waves Gaussians, Slater Orbitals ABINIT, CPMD, VASP,.. . CP2K,Gaussian,AIMPRO,. . . Systematic convergence. Real space localized
BigDFT and GPU Accuracy increases with Small number of basis the number of basis functions for moderate Atomistic Simulations elements accuracy DFT Ab initio codes Non-localised, Well suited for molecules BigDFT
Properties optimal for periodic, and other open BigDFT and GPUs homogeneous systems structures Code details BigDFT and Non adaptive Non systematic HPC
GPU Practical cases Analytic functions Discussion FFT Messages Kinetic and overlap matrices Robust, Easy to parallelise can be calculated analytically
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese List of ab initio Codes
Plane Waves ABINIT — Louvain-la-Neuve — http://www.abinit.org CPMD — Zurich, Lugano — http://www.cpmd.org PWSCF — Italy — http://www.pwscf.org VASP — Vienna — http://cms.mpi.univie.ac.at/vasp BigDFT and Gaussian GPU Gaussian — http://www.gaussian.com Atomistic Simulations DeMon — http://www.demon-software.com DFT CP2K — http://cp2k.berlios.de Ab initio codes
BigDFT Numerical-like basis sets Properties Siesta — Madrid — BigDFT and GPUs Code details http://www.uam.es/departamentos/ciencias/fismateriac/siesta BigDFT and Wien2K — Vienna — http://www.wien2k.at (FPLAPW, HPC
GPU all electrons) Practical cases Real space basis set Discussion
Messages ONETEP — http://www.onetep.soton.ac.uk BigDFT — http://inac.cea.fr/L_Sim/BigDFT
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese A basis for nanosciences: the BigDFT project
STREP European project: BigDFT(2005-2008) Four partners, 15 contributors: CEA-INAC Grenoble (T.Deutsch), U. Basel (S.Goedecker), U. Louvain-la-Neuve (X.Gonze), U. Kiel (R.Schneider)
BigDFT and GPU Aim: To develop an ab-initio DFT code Atomistic Simulations based on Daubechies Wavelets, to be DFT Ab initio codes integrated in ABINIT. BigDFT BigDFT 1.0 −→ January 2008 Properties BigDFT and GPUs Code details
BigDFT and HPC . . . why have we done this? Was it worth it?
GPU Practical cases Test the potential advantages of a new formalism Discussion A lot of outcomes and interesting results Messages A lot can be done starting from present know-how
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese A DFT code based on Daubechies wavelets
BigDFT: a PSP Kohn-Sham code A Daubechies wavelets basis has unique properties for DFT usage
Systematic, Orthogonal BigDFT and GPU Localised, Adaptive Atomistic Simulations Kohn-Sham operators are analytic DFT Ab initio codes Daubechies Wavelets Short, Separable convolutions BigDFT 1.5 φ(x) Properties ˜c = a c ` ∑j j `−j 1 ψ(x) BigDFT and GPUs Code details 0.5 BigDFT and Peculiar numerical properties 0 HPC
GPU -0.5 Practical cases Real space based, highly flexible -1 Discussion Big & inhomogeneous systems Messages -1.5 -6 -4 -2 0 2 4 6 8 x
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Wavelet properties: multi-resolution
Example of two resolution levels: step function (Haar Wavelet) Scaling Functions: Multi-Resolution basis Low and High resolution functions related each other
BigDFT and GPU = + Atomistic Simulations
DFT Ab initio codes Wavelets: complete the low resolution description BigDFT Properties Defined on the same grid as the low resolution functions BigDFT and GPUs Code details
BigDFT and 1 1 HPC = 2 + 2 GPU Practical cases Discussion Scaling Function + Wavelet = High resolution Messages We increase the resolution without modifying the grid spacing
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Wavelet properties: adaptivity
Adaptivity Resolution can be refined following the grid point.
The grid is divided in BigDFT and Low (1 DoF) and High (8 GPU DoF) resolution points. Atomistic Simulations Points of different resolution DFT Ab initio codes belong to the same grid.
BigDFT Empty regions must not be Properties BigDFT and GPUs “filled” with basis functions. Code details
BigDFT and HPC
GPU Practical cases Localization property,real space description Discussion
Messages Optimal for big & inhomogeneous systems, highly flexible
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Basis set features
BigDFT features in a nutshell Arbitrary absolute precision can be achieved Good convergence ratio for real-space approach (O(h14)) Optimal usage of the degrees of freedom (adaptivity) BigDFT and Optimal speed for a systematic approach (less memory) GPU Hartree potential accurate for various boundary Atomistic Simulations conditions DFT Ab initio codes Free and Surfaces BC Poisson Solver
BigDFT (present also in CP2K, ABINIT, OCTOPUS) Properties BigDFT and GPUs Data repartition is suitable for optimal scalability Code details
BigDFT and Simple communications paradigm, multi-level parallelisation HPC possible (and implemented) GPU Practical cases Discussion Improve and develop know-how Messages Optimal for advanced DFT functionalities in HPC framework
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese BigDFT version 1.5.2: (ABINIT-related) capabilities http://inac.cea.fr/L_Sim/BigDFT Isolated, surfaces and 3D-periodic boundary conditions (k-points, symmetries) All XC functionals of the ABINIT package (libXC library)
BigDFT and Hybrid functionals, Fock exchange operator GPU Direct Minimisation and Mixing routines (metals) Atomistic Simulations Local geometry optimizations (with constraints) DFT Ab initio codes External electric fields (surfaces BC) BigDFT Properties Born-Oppenheimer MD, ESTF-IO BigDFT and GPUs Code details Vibrations BigDFT and HPC Unoccupied states
GPU Practical cases Empirical van der Waals interactions Discussion
Messages Saddle point searches (NEB, Granot & Bear) All these functionalities are GPU-compatible
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Operations performed
The SCF cycle Real Space Daub. Wavelets Orbital scheme: Hamiltonian Preconditioner BigDFT and GPU Coefficient Scheme:
Atomistic Simulations Overlap matrices
DFT Ab initio codes Orthogonalisation
BigDFT Properties Comput. operations BigDFT and GPUs Code details Convolutions BigDFT and HPC BLAS routines GPU Practical cases FFT (Poisson Solver) Discussion
Messages Why not GPUs?
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Hybrid Supercomputing nowadays
GPGPU on Supercomputers Traditional architectures are somehow saturating More cores/node, memories (slightly) larger but not faster Architectures of Supercomputers are becoming hybrid
BigDFT and 3 out to 4 Top Supercomputers are hybrid machines GPU Extrapolation: In 2015, No. 500 will become petafloppic Atomistic Simulations Likely it will be a hybrid machine DFT Ab initio codes
BigDFT Codes should be conceived differently Properties BigDFT and GPUs # MPI processes is limited for a fixed problem size Code details
BigDFT and Performances increase only by enhancing parallelism HPC GPU Further parallelisation levels should be added (OpenMP, Practical cases GPU) Discussion
Messages Does electronic structure calculations codes are suitable?
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese How far is petaflop (for DFT)?
At present, with traditional architectures Routinely used DFT calculations are: Few dozens (hundreds) of processors Parallel intensive operations (blocking communications, BigDFT and GPU 60-70 percent efficiency)
Atomistic Not freshly optimised (legacy codes, monster codes) Simulations DFT Optimistic estimation: 5 GFlop/s per core × 2000 cores × Ab initio codes 0.9 = 9 TFlop/s = 200 times less than Top 500’s #1! BigDFT
Properties BigDFT and GPUs Code details It is such as BigDFT and HPC Distance Earth-Moon = 384 Mm
GPU Practical cases Distance Earth-Mars = 78.4 Gm = 200 times more
Discussion Messages Are we able to go to Mars? (. . . in 2015?)
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese The vessel: Ambitions from BigDFT experience
Reliable formalism Systematic convergence properties Explicit environments, analytic operator expressions
BigDFT and GPU State-of-the-art computational technology
Atomistic Data locality optimal for operator applications Simulations DFT Massive parallel environements Ab initio codes BigDFT Material accelerators (GPU) Properties BigDFT and GPUs Code details New physics can be approached BigDFT and HPC Enhanced functionalities can be applied relatively easily GPU Practical cases Limitation of DFT approximations can be evidenced Discussion Messages A formalism of interest for Post-DFT treatments
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Separable convolutions
We must calculate L F(I ,I ,I ) = h h h G(I − j ,I − j ,I − j ) 1 2 3 ∑ j1 j2 j3 1 1 2 2 3 3 j1,j2,j3=0 L L L BigDFT and = hj1 hj2 hj3 G(i1 − j1,i2 − j2,i3 − j3) GPU ∑ ∑ ∑ j1=0 j2=0 j3=0 Atomistic Simulations DFT Application of three successive operations Ab initio codes 1 BigDFT A3(I3,i1,i2) = ∑j hj G(i1,i2,I3 − j) ∀i1,i2; Properties 2 BigDFT and GPUs A2(I2,I3,i1) = ∑j hj A3(I3,i1,I2 − j) ∀I3,i1; Code details 3 BigDFT and F(I1,I2,i3) = ∑j hj A2(I2,I3,I1 − j) ∀I2,I3. HPC
GPU Practical cases Main routine: Convolution + transposition Discussion
Messages F(I,a) = ∑hj G(a,I − j) ∀a ; j
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Basic Input-Output operation
BigDFT and GPU
Atomistic Simulations
DFT Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details
BigDFT and HPC From sequential to GPU GPU Practical cases Same operation schedule for monocore, multithread, GPU Discussion
Messages Can be treated as a library
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese CPU performances of the convolutions Initially, naive FORTRAN routines do j=1,ndat U do i=0,n1 y(j,I) = h x(I + `,j) ∑ ` tt=0.d0 `=L do l=lowfil,lupfil BigDFT and Easy to write and GPU tt=tt+x(i+l,j)*h(l) debug enddo Atomistic Simulations Test the formalism y(j,i)=tt DFT Ab initio codes Define reference enddo BigDFT enddo Properties results BigDFT and GPUs Code details Optimisation can then start (Ex. X5550,2.67 GHz) BigDFT and HPC
GPU Method GFlop/s % of peak SpeedUp Practical cases Naive (FORTRAN) 0.54 5.1 1/(6.25) Discussion Messages Current (FORTRAN) 3.3 31 1 Best (C, SSE) 7.8 73 2.3
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese How to optimize?
A trade-off between benefit and effort
FORTRAN based Relatively accessible (loop unrolling)
BigDFT and Moderate optimisation can be achieved relatively fast GPU Compilers fail to use vector engine efficiently Atomistic Simulations
DFT Ab initio codes Push optimisation at the best BigDFT Only one out of 3 convolution type has been Properties BigDFT and GPUs implemented Code details
BigDFT and About 20 different patterns have been studied for one HPC
GPU 1D convolution Practical cases Tedious work, huge code −→ Maintainability? Discussion
Messages Automatic code generation under study
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese MPI parallelization I: Orbital distribution scheme
Used for the application of the hamiltonian Operator approach: The hamiltonian (convolutions) is applied separately onto each wavefunction
BigDFT and GPU ψ1 Atomistic Simulations MPI 0 DFT Ab initio codes ψ2 BigDFT
Properties BigDFT and GPUs ψ3 Code details
BigDFT and MPI 1 HPC ψ4 GPU Practical cases
Discussion Messages ψ5 MPI 2
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese MPI parallelization II: Coefficient distribution scheme
Used for scalar products & orthonormalisation BLAS routines (level 3) are called, then result is reduced
MPI 0 MPI 1 MPI 2
BigDFT and GPU ψ1
Atomistic Simulations ψ2 DFT Ab initio codes BigDFT ψ Properties 3 BigDFT and GPUs Code details
BigDFT and ψ4 HPC
GPU Practical cases ψ5 Discussion
Messages At present, MPI_ALLTOALL(V) is used to switch
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese OpenMP parallelisation
Innermost parallelisation level (Almost) Any BigDFT operation is parallelised via OpenMP Useful for memory demanding calculations
BigDFT and Allows further increase of speedups GPU Saves MPI processes and intra-node Message Passing Atomistic 4.5 Simulations 2OMP DFT 3OMP Less efficient than 6OMP Ab initio codes 4 MPI BigDFT 3.5 Properties
BigDFT and GPUs Compiler and 3 Code details
system dependent OMP Speedup BigDFT and 2.5 HPC OMP sections GPU 2 Practical cases should be regularly 1.5 Discussion maintained 0 20 40 60 80 100 120 Messages No. of MPI procs
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Task repartition for a small system (ZnO, 128 atoms)
1 Th. OMP per core 100 1000 90 80 70 100 60 BigDFT and GPU 50
Percent 40 Atomistic 10 Efficiency (%) Simulations 30
Seconds (log. scale) Time (sec) DFT Other Ab initio codes 20 CPU Conv BigDFT 10 LinAlg Properties Comms BigDFT and GPUs 0 1 Code details 16 24 32 48 64 96 144 192 288 576 No. of cores BigDFT and HPC GPU What are the ideal conditions for GPU Practical cases
Discussion GPU-ported routines should take the majority of the time Messages What happens to parallel efficiency?
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Parallelisation and architectures
Same code, same runs. Which is the best?
1 Th. OMP per core 1 Th. OMP per core 100 1000 100 1000 90 90 80 80 70 70 100 100 60 60 50 50
Percent 40 Percent 40 10 Efficiency (%) 10 Efficiency (%) 30 30
BigDFT and Seconds (log. scale) Time (sec) Seconds (log. scale) Time (sec) Other Other GPU 20 CPU 20 CPU Conv Conv 10 LinAlg 10 LinAlg 0 1 Comms 0 1 Comms Atomistic 16 24 32 48 64 96 144 192 288 576 16 24 32 48 64 96 144 192 288 576 Simulations No. of cores No. of cores DFT CCRT Titane (Nehalem, Infiniband) CSCS Rosa (Opteron, Cray XT5) Ab initio codes
BigDFT Titane is 2.3 to 1.6 times faster than Rosa! Properties BigDFT and GPUs Code details Degradation of parallel performances: why?
BigDFT and 1 HPC Calculation power has increased more than networking
GPU 2 Practical cases Better libraries (MKL) Discussion Walltime reduced, but lower parallel efficiency Messages
This will always happen while using GPU!
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Architectures, libraries, networking
Same runs, same sources; different user conditions
6 Titane MpiBull2 no MKL 5.5 Jade MPT Titane MpiBull2, MKL 5 Titane OpenMPI, MKL Rosa Cray, istanbul 4.5
BigDFT and 4 GPU Differences 3.5 up to a Atomistic Cpu Hours 3 Simulations 2.5 factor of 3! DFT Ab initio codes 2
BigDFT 1.5 Properties 1 BigDFT and GPUs 0 100 200 300 400 500 600 Code details No. of cores
BigDFT and HPC
GPU A case-by-case study Practical cases Consideration are often system-dependent, a thumb rule not Discussion Messages always exists. Know your code!
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Using GPUs in a Big systematic DFT code
Nature of the operations Operators approach via convolutions Linear Algebra due to orthogonality of the basis Communications and calculations do not interfere BigDFT and GPU A number of operations which can be accelerated
Atomistic Simulations Evaluating GPU convenience DFT Ab initio codes Three levels of evaluation BigDFT Properties 1 Bare speedups: GPU kernels vs. CPU routines BigDFT and GPUs Code details Does the operations are suitable for GPU? BigDFT and 2 HPC Full code speedup on one process
GPU Practical cases Amdahl’s law: are there hot-spot operations? Discussion 3 Speedup in a (massively?) parallel environment Messages The MPI layer adds an extra level of complexity
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Convolutions and Transposition in GPU
BigDFT and GPU
Atomistic Simulations
DFT Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details
BigDFT and HPC Blocking operations GPU Practical cases A work group performs convolutions plus transpositions Discussion
Messages Different boundary conditions can be implemented
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese OpenCL data management
Example of 4 × 4 block with a filter of length 5
BigDFT and GPU
Atomistic Simulations
DFT Ab initio codes
BigDFT
Properties BigDFT and GPUs Code details
BigDFT and HPC
GPU Practical cases SIMD implementation Discussion Each work item is associated to a single output element Messages Convolution buffers → data reuse
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Convolution kernel with OpenCL
Comparison with CPU work (Ex. X5550,2.67 GHz) Method GFlop/s % of peak SpeedUp Naive (FORTRAN) 0.54 5.1 1/(6.25) Current (FORTRAN) 3.3 31 1 BigDFT and GPU Best (C, SSE) 7.8 73 2.3
Atomistic OpenCL (Fermi) 97 20 29 (12.4) Simulations
DFT Ab initio codes Very good and promising results BigDFT Properties No need of data transfer for 3D case (chain of 1D BigDFT and GPUs Code details kernels) BigDFT and HPC Deeper optimisation still to be done (20% of peak) GPU Practical cases Require less manpower than deep CPU optimisation Discussion Messages Automatic generation will be considered
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese GPU-ported operations in BigDFT (double precision)
Convolutions Kernels (OpenCL (re)written) Fully functional (all BC) Based on the former BigDFT and CUDA version GPU A 10 to 50 speedup Atomistic Simulations
DFT Ab initio codes 20
18 BigDFT Properties 16 GPU BigDFT sections BigDFT and GPUs 14 Code details 12 GPU speedups between 5
BigDFT and 10 HPC and 20, depending of: 8 GPU Wavefunction size Practical cases 6 GPU speedup (Double prec.) locden Discussion 4 locham CPU-GPU Architecture precond Messages 2 0 10 20 30 40 50 60 70 Wavefunction size (MB)
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese BigDFT in hybrid codes
Acceleration of the full BigDFT code Considerable gain may be achieved for suitable systems Amdahl’s law should always be considered Resources can be used concurrently (OpenCL queues)
BigDFT and More MPI processes may share the same card! GPU
Badiane, X5550 + Fermi S2070 , ZnO 64 at.: CPU vs. Hybrid Atomistic 100 Simulations 10 DFT Ab initio codes 80 BigDFT 8
Properties BigDFT and GPUs 60 6 Code details Percent Speedup BigDFT and 40 HPC 4
GPU
Practical cases 20 Speedup 2 Other CPU Discussion Conv Messages LinAlg Comms 0 CPU-mkl CPU-mkl-mpiCUDA CUDA-mkl OCL-cublasOCL-mkl CUDA-mpi CUDA-mkl-mpiOCL-cublas-mpiOCL-mkl-mpi 0
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese (Lower time-to-solution for a given architecture) The time-to-solution problem I: Efficiency
Good example: 4 C at, surface BC, 113 Kpts Parallel efficiency of 98%, convolutions largely dominate. Node: # GPU added 2 4 8 2× Fermi + 8 × SpeedUp (SU) 5.3 9.8 11.6 BigDFT and Westmere # MPI equiv. 44 80 96 GPU 8 MPI processes Acceler. Eff. 1 .94 .56 Atomistic Simulations 100
DFT Ab initio codes 80 BigDFT
Properties BigDFT and GPUs 60 Code details
BigDFT and SpeedUp 40 HPC
GPU Practical cases 20 ideal Discussion CPU+MPI Messages GPU 0 0 20 40 60 80 100 No. of MPI proc
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese The time-to-solution problem II:Robustness
Not so good example: A too small system
Titane, ZnO 64 at.: CPU vs. Hybrid 100 7
6 80
5
BigDFT and 60 GPU 4
Percent 3 Atomistic 40 Simulations Speedup with GPU
DFT 2 Speedup Ab initio codes 20 Efficiency (%) 1 Other BigDFT CPU Conv Properties LinAlg 0 0 Comms BigDFT and GPUs 1 2 4 6 8 12 24 32 48 64 96 144 1 2 4 6 8 12 24 32 48 64 96 144 CPU code Hybrid code (rel.) Code details No. of MPI proc BigDFT and CPU efficiency is poor (calculation is too fast) HPC GPU Amdahl’s law not favorable (5x SU at most) Practical cases
Discussion GPU SU is almost independent of the size Messages The hybrid code always goes faster
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Hybrid and Heterogeneous runs with OpenCL
NVidia S2070 Connected each ATI HD 6970 to a Nehalem Workstation
BigDFT may run BigDFT and GPU on both
Atomistic Simulations
DFT Ab initio codes Sample BigDFT run: Graphene, 4 C atoms, 52 kpts BigDFT 12 Properties No. of Flop: 8.053 · 10 BigDFT and GPUs MPI 1 1 4 1 4 8 Code details GPU NO NV NV ATI ATI NV+ ATI BigDFT and HPC Time (s) 6020 300 160 347 197 109 GPU Practical cases Speedup 1 20.07 37.62 17.35 30.55 55.23
Discussion GFlop/s 1.34 26.84 50.33 23.2 40.87 73.87
Messages Next Step: handling of Load (un)balancing
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Concrete examples with BigDFT code
MD simulation, 32 water molecules, 0.5 fs/step Mixed MPI/OpenMP BigDFT parallelisation vs. GPU case MPI•OMP 32•1 128•1 32•6 128•6 128+128 s/SCF 7.2 2.0 1.5 0.44 0.3 MD ps/day 0.461 1.661 2.215 7.552 11.02 BigDFT and GPU
Atomistic An example: challenging DFT for catalysis Simulations DFT Multi-scale study for OR mechanism on PEM fuel cells Ab initio codes
BigDFT Explicit model of H2O/Pt interface Properties BigDFT and GPUs Absorbtion properties, reaction Code details mechanisms BigDFT and HPC
GPU Outcomes from the understanding of Practical cases catalytic mechanism at atomic scale: Discussion Messages Conception of new active and selective materials Fuels cell ageing, more efficient and durable devices
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese The fuel: Scientific Topics
Scientific directions: Energy conversion A scientific domain which embodies a number of challenges: The quantities are “complicated” (many body effects) Study/prediction of fundamental properties: band gaps, band BigDFT and offsets, excited state quantities,. . . GPU
Atomistic Objects are “big” and the environment matters Simulations Systems react with the surroundings DFT Ab initio codes Building new modelisation paradigms BigDFT
Properties BigDFT and GPUs How to achieve these objectives? Code details
BigDFT and Short and medium term objectives HPC GPU State-of-the art functionalities of DFT for complex Practical cases environments Discussion Messages Explore new formalisms for Post DFT treatments
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese A look in near future: science with HPC DFT codes
A concerted set of actions Improve codes functionalities for present-day and next generation supercomputers Test and develop new formalisms BigDFT and Insert ab-initio codes in new scientific workflows GPU (Multiscale Modelling) Atomistic Simulations
DFT Ab initio codes The Mars mission
BigDFT Is Petaflop performance possible? Properties BigDFT and GPUs GPU acceleration → one order of magnitude Code details
BigDFT and Bigger systems, heavier methods → (more than) one HPC
GPU order of magnitude bigger Practical cases
Discussion
Messages BigDFT experience makes this feasible An opportunity to achieve important outcomes and know-how
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese General considerations
Optimisation effort Know the code behaviour and features Careful performance study of the complete algorithm
BigDFT and Identify and make modular critical sections GPU Fundamental for mainainability and architecture evolution
Atomistic Simulations Optimisation cost: consider end-user running conditions DFT Robustness is more important than best performance Ab initio codes
BigDFT
Properties BigDFT and GPUs Performance evaluation know-how Code details No general thumb-rule: what means High Performance? BigDFT and HPC A multi-criterion evaluation process GPU Practical cases Multi-level parallelisation always to be used Discussion Your code will not (anymore) become faster via hardware Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Conclusions
BigDFT code: a modern approach for nanosciences Flexible, reliable formalism (wavelet properties) Easily fit with massively parallel architecture Open a path toward the diffusion of Hybrid architectures BigDFT and GPU Messages from GPU experience with BigDFT Atomistic Simulations GPU allow a significant reduction of the time-to-solution DFT Ab initio codes Require a well-structured underlying code which makes BigDFT
Properties multi-level parallelisation possible BigDFT and GPUs Code details To be taken into account while evaluating performances BigDFT and Parallel efficiency ⇐ dimensioning of system wrt architecture HPC
GPU Practical cases CECAM BigDFT tutorial next October Discussion Messages A tutorial on BigDFT code is scheduled! Grenoble, 19-21 October 2011
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese Acknowledgments
CEA Grenoble – Group of Thierry Deutsch LG, D. Caliste, B. Videau, M. Ospici, I. Duchemin, P. Boulanger, E. Machado-Charry, F. Cinquini, B. Natarajan
BigDFT and GPU Basel University – Group of Stefan Goedecker Atomistic S. A. Ghazemi, A. Willand, M. Amsler, S. Mohr, A. Sadeghi, Simulations DFT N. Dugan, H. Tran Ab initio codes
BigDFT Properties And other groups BigDFT and GPUs Code details Montreal University: BigDFT and HPC L. Béland, N. Mousseau GPU Practical cases European Synchrotron Radiation Facility: Discussion Y. Kvashnin, A. Mirone Messages
Laboratoire de Simulation Atomistique http://inac.cea.fr/L_Sim Luigi Genovese