BigDFT Real-Space approach

Max Webinar

The operators of BigDFT code. From convolutions to Poisson Solver in High Performance Computing

Outline: Wavelets and ISF, GPU, Wavelet Convolutions, Optimization Approach, BOAST, Challenges, Poisson Solver, Implicit Solvents, Summary

Luigi Genovese

Laboratoire de Simulation Atomistique - L_Sim

November 12, 2020

Virtual Room

Wavelet families used in BigDFT code

Daubechies: $f(x) = \sum_\mu c_\mu \phi_\mu(x)$
Orthogonal set: $c_\mu = \int dx\, \phi_\mu(x) f(x) \equiv \langle \phi_\mu | f \rangle$
• No need to calculate the overlap matrix of basis functions
• Used for wavefunctions, scalar products

Interpolating: $f(x) = \sum_j f_j \varphi_j(x)$
Dual to deltas (... is it?): $f_j = f(j)$
• The expansion coefficients are the point values on a grid
• Used for charge density, function products

[Figure: scaling function and wavelet of the least asymmetric Daubechies-16 family (left) and of the interpolating (Deslauriers-Dubuc)-8 family (right)]
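To make the "coefficients are point values" property concrete, the sketch below computes the midpoint weights of an 8-point Lagrange interpolation, which is the refinement rule of a Deslauriers-Dubuc interpolating scheme. It assumes that the "-8" in the panel title refers to the 8-point scheme; the function names are illustrative and are not BigDFT routines.

```python
from fractions import Fraction

def midpoint_weights(order=8):
    """Lagrange weights that predict f(1/2) from f at `order` integer nodes.

    The nodes are the integers -order/2+1, ..., order/2, which are
    symmetric about x = 1/2. `order` is assumed to be even.
    """
    nodes = list(range(-order // 2 + 1, order // 2 + 1))
    x = Fraction(1, 2)
    weights = []
    for k in nodes:
        w = Fraction(1)
        for m in nodes:
            if m != k:
                w *= (x - m) / (k - m)
        weights.append(w)
    return nodes, weights

def refine_midpoint(f, order=8):
    """Interpolate f at x = 1/2 using only its grid (point) values."""
    nodes, weights = midpoint_weights(order)
    return sum(w * f(n) for n, w in zip(nodes, weights))

nodes, weights = midpoint_weights(8)
print([str(w) for w in weights])
# Any polynomial of degree <= 7 is reproduced exactly from its grid values.
print(refine_midpoint(lambda t: t**7) == Fraction(1, 2) ** 7)
```

Exact rational arithmetic is used so that the polynomial reproduction can be checked without rounding noise.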

Magic Filter method (A. Neelov, S. Goedecker)
The passage between the two basis sets can be performed without losing accuracy:
$c_\mu = \sum_k w_{\mu-k} f_k$, $w_k = \langle \phi | L_k \rangle$.

Operations performed

The SCF cycle
Real Space / Daub. Wavelets

Orbital scheme:
• Hamiltonian
• Preconditioner

Coefficient scheme:
• Overlap matrices
• Orthogonalisation

Computational operations
• Convolutions
• BLAS routines
• FFT (Poisson Solver)

Why not GPUs?

HPC approach of BigDFT

A DFT code conceived with an HPC mindset
• DFT calculations up to many thousands of atoms

An award-winning HPC code
• BigDFT has been conceived for massively parallel heterogeneous architectures for more than 10 years (MPI + OpenMP + GPU)

Code able to run routinely on different architectures
• GPGPU since the advent of double-precision (2009)
• Itanium, BG-P and BG-Q (Incite award 2015)
• ARM architectures (Mont Blanc I)
• KNL Accelerators (Marconi)
• K computer (RIKEN), Fugaku (preparatory)
• European Supercomputers (Archer, IRENE-Rome, ...)

Using GPUs in a given (DFT) code

Developer and user dilemmas
• Does my code fit well? For which systems?
• How much does porting cost?
• Should I always use GPUs?
• How can I interpret results?

Evaluating GPU convenience
Three levels of evaluation
1. Bare speedups: GPU kernels vs. CPU routines
   Are the operations suitable for GPU?
2. Full code speedup on one process
   Amdahl's law: are there hot-spot operations?
3. Speedup in a (massively?) parallel environment
   The MPI layer adds an extra level of complexity

Case study: 1D convolutions (BigDFT code)
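For the second evaluation level listed under "Evaluating GPU convenience", Amdahl's law bounds the full-code speedup by the fraction of the runtime spent in the accelerated hot spots. A minimal sketch follows; the fractions and the kernel speedup of 20 are hypothetical values chosen for illustration, not measurements from BigDFT.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)

# Hypothetical fractions of the run spent in GPU-suitable kernels.
for fraction in (0.5, 0.9, 0.99):
    print(fraction, round(amdahl_speedup(fraction, kernel_speedup=20.0), 2))
```

Even an arbitrarily fast kernel cannot deliver more than a factor of 2 when it covers only half of the runtime, which is why the presence of hot-spot operations matters.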

Initially, naive routines (FORTRAN?)

$y(j,I) = \sum_{\ell=L}^{U} h_\ell \, x(I+\ell, j)$

    ! apply the filter h(lowfil:lupfil) along the first index of x
    do j=1,ndat
       do i=0,n1
          tt=0.d0
          do l=lowfil,lupfil
             tt=tt+x(i+l,j)*h(l)
          enddo
          y(j,i)=tt   ! the output is stored transposed
       enddo
    enddo

• Easy to write and debug
• Define reference results

Optimisation can then start (2010: X5550, 2.67 GHz)
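A naive routine of this kind is mainly valuable as a source of reference results for the optimised variants. Below is a minimal Python transcription of the loop nest. The storage convention (rows shifted by `lowfil`, filter held in a dict) and the 3-tap filter are illustrative assumptions, not BigDFT conventions.

```python
def convolve_1d(x, h, lowfil, lupfil, n1, ndat):
    """Reference y[j][i] = sum over l of h[l] * x_j(i + l), l = lowfil..lupfil.

    x is a list of ndat rows; row j holds x_j(i) for i = lowfil .. n1 + lupfil,
    stored with an offset so that x[j][i - lowfil] is x_j(i).
    h is a dict mapping l to the filter coefficient h(l).
    """
    y = [[0.0] * (n1 + 1) for _ in range(ndat)]
    for j in range(ndat):
        row = x[j]
        for i in range(n1 + 1):
            tt = 0.0
            for l in range(lowfil, lupfil + 1):
                tt += row[i + l - lowfil] * h[l]
            y[j][i] = tt
    return y

# Hypothetical 3-tap smoothing filter applied to one line of linear data.
h = {-1: 0.25, 0: 0.5, 1: 0.25}
x = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]  # x(i) = i + 1 for i = -1 .. 4
print(convolve_1d(x, h, lowfil=-1, lupfil=1, n1=3, ndat=1))
```

Because this filter sums to one and is symmetric, it leaves linear data unchanged, which gives a quick sanity check for any faster implementation.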

Method              GFlop/s   % of peak   SpeedUp
Naive (FORTRAN)     0.54      5.1         1/(6.25)
Current (FORTRAN)   3.3       31          1
Best (C, SSE)       7.8       73          2.3
OpenCL (Fermi)      97        20          29 (12.4)

How to optimize?

A trade-off between benefit and effort

FORTRAN based
• Relatively accessible (loop unrolling)
• Moderate optimisation can be achieved relatively fast
• Compilers fail to use the vector engine efficiently

Pushing optimisation to the limit
• Only one out of 3 convolution types has been dissected
• About 20 different patterns have been studied for one 1D convolution
• Tedious work, huge code → Maintainability?

Automatic code generation with BOAST engine

BOAST Workflow

Metaprogramming the kernels
Optimization hand-in-hand with design of the operations

[Figure: workflow diagram with the labels Development, Source Code, Compilation, Binary, Performance Analysis, Performance data, Optimization, Transformation, BOAST, Generative Source Code]

• Generate combination of optimizations
• C, OpenCL, FORTRAN and CUDA are supported
• Compilation and analysis are automated
• Selection of best version can also be automated

Example with a kernel (Wavelet Synthesis)
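The generate, measure and select loop of the BOAST workflow can be illustrated with a small Python sketch. It does not use BOAST, and the kernel is a plain filter loop rather than the wavelet synthesis: source strings with different unrolling factors are generated, "compiled" with `exec`, checked against a reference variant, timed, and the fastest one is selected. All names and the filter data are hypothetical.

```python
import time

def generate_kernel(unroll):
    """Return Python source for a filter kernel whose inner loop is
    unrolled `unroll` times (the filter length must be a multiple of it)."""
    body = " + ".join(f"x[i + l + {u}] * h[l + {u}]" for u in range(unroll))
    return (
        "def kernel(x, h, n):\n"
        "    y = [0.0] * n\n"
        "    for i in range(n):\n"
        "        tt = 0.0\n"
        f"        for l in range(0, len(h), {unroll}):\n"
        f"            tt += {body}\n"
        "        y[i] = tt\n"
        "    return y\n"
    )

def build(source):
    """Turn a generated source string into a callable."""
    namespace = {}
    exec(source, namespace)
    return namespace["kernel"]

def autotune(x, h, n, unroll_factors=(1, 2, 4, 8), repeats=3):
    """Check every variant against the first one, time it, and
    return (best_unroll, timings)."""
    reference = None
    timings = {}
    for unroll in unroll_factors:
        kernel = build(generate_kernel(unroll))
        result = kernel(x, h, n)
        if reference is None:
            reference = result
        elif any(abs(a - b) > 1e-9 for a, b in zip(result, reference)):
            raise RuntimeError(f"variant unroll={unroll} disagrees with reference")
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(x, h, n)
            best = min(best, time.perf_counter() - start)
        timings[unroll] = best
    return min(timings, key=timings.get), timings

h = [1.0 / 16] * 16
x = [float(i % 7) for i in range(2000 + len(h))]
best_unroll, timings = autotune(x, h, 2000)
print("fastest variant: unroll =", best_unroll)
```

Which variant wins depends on the interpreter and the machine, which is the point of automating the selection instead of fixing one hand-written version.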


Take-home messages of the BOAST strategy

• Meta-programming of hot-spot operations
• Optimization can be adapted to the characteristics
• Portability, best effort and maintenance may go together

Towards a BLAS-like library for wavelet convolutions

Autotuning of Libraries (core level)

Not all codes can benefit from BLAS/LAPACK/FFT
• Domain specific libraries need to be optimized for the architecture
• Autotuning is possible through the usage of tools like BOAST
• Generality can be achieved through meta-programming

Example: libconv
• BOAST can generate all wavelet families ⇒ the library has more functionalities than the original code
• BOAST can adapt the routines to the architecture using OpenMP and vector instructions
• BOAST can target FORTRAN and C with OpenMP, CUDA, OpenCL

O(N) BigDFT code is completely different

• Less data → lower number of flops per atom
• Communication scheduling completely different

Different problems, new opportunities
• No hot-spot operations anymore!
  Convolutions are a less important percentage of the run
• Linear algebra becomes sparse
  Extra complexity in handling parallelisation

The time-to-solution is considerably improved
18k atoms → less than 20 minutes on 13k cores
• The computational physicist is happier
• What about the computer scientist?

Interpolating SF Poisson Solver

(Screened) Poisson Equation for any BC in vacuum
$(\nabla^2 - \mu_0^2) V(x,y,z) = -4\pi \rho(x,y,z)$

Non-orthorhombic cells (periodic, surface BC):
Machine-precision accuracy, J. Chem. Phys. 137, 13 (2012)

Extended to implicit solvents (JCP 144, 014103 (2016))
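As a minimal illustration of the screened Poisson equation itself, the sketch below solves its 1D periodic analogue with a plain discrete Fourier transform, where each mode obeys $V_k = 4\pi \rho_k / (k^2 + \mu_0^2)$. This is a plane-wave stand-in chosen for brevity and is not the interpolating scaling function method of the solver; the grid size, cell length and screening value are arbitrary assumptions.

```python
import cmath
import math

def screened_poisson_periodic_1d(rho, length, mu0):
    """Solve (d^2/dx^2 - mu0^2) V = -4*pi*rho on a periodic 1D grid.

    Uses a plain O(N^2) discrete Fourier transform. Requires mu0 > 0 so
    that the k = 0 mode is well defined.
    """
    n = len(rho)
    # Forward DFT of the density.
    rho_k = [
        sum(rho[j] * cmath.exp(-2j * math.pi * j * m / n) for j in range(n))
        for m in range(n)
    ]
    # Divide each mode by (k^2 + mu0^2), using signed wave numbers.
    v_k = []
    for m in range(n):
        m_signed = m if m <= n // 2 else m - n
        k = 2.0 * math.pi * m_signed / length
        v_k.append(4.0 * math.pi * rho_k[m] / (k * k + mu0 * mu0))
    # Inverse DFT back to the grid; the result is real for a real density.
    return [
        (sum(v_k[m] * cmath.exp(2j * math.pi * j * m / n) for m in range(n)) / n).real
        for j in range(n)
    ]

n, length, mu0 = 32, 10.0, 0.5
grid = [length * j / n for j in range(n)]
rho = [math.cos(2.0 * math.pi * x / length) for x in grid]
v = screened_poisson_periodic_1d(rho, length, mu0)

# For a single cosine mode the exact answer is the same cosine, rescaled.
scale = 4.0 * math.pi / ((2.0 * math.pi / length) ** 2 + mu0**2)
print(max(abs(vj - scale * rj) for vj, rj in zip(v, rho)))
```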

Future developments
Range-separated Coulomb operator
$\frac{1}{r}\left[\operatorname{erf}\frac{r}{r_0} + \operatorname{erfc}\frac{r}{r_0}\right]$

Precision and performances (JCC, 2014)

[Figure: journal cover, Volume 35, Issues 5–6, 2014 (www.c-chem.org)]

Hybrid Functionals (JPCM 30 (9), 095901 (2018))

UO2 systems:

Atoms   Orbitals
12      200
96      1432
324     5400
768     12800
1029    17150

Polarizable Continuum Models

Poisson solver for implicit solvents
JCP 144, 014103 (2016)
Allows an efficient and accurate treatment of implicit solvents

The dielectric function determines the cavity where the solute is defined.
The cavity can be
• rigid (PCM-like)
• determined from the Electronic Density (SCCS approach)

Can treat various BC (here TiO2 surface)
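A density-determined cavity can be pictured as a smooth map from the electronic density to a local dielectric constant: 1 where the density is high (inside the solute) and the bulk solvent value where it is negligible. The sketch below uses one smooth switching form found in the self-consistent continuum solvation literature. Both the functional form and the parameter values (`eps0`, `rho_min`, `rho_max`) are assumptions made for illustration and are not taken from the slides.

```python
import math

def dielectric_from_density(rho, eps0=78.3, rho_min=1.0e-4, rho_max=5.0e-3):
    """Map an electronic density value to a local dielectric constant.

    Returns 1 inside the cavity (rho >= rho_max), eps0 in the bulk solvent
    (rho <= rho_min), and a smooth monotonic interpolation in between.
    """
    if rho >= rho_max:
        return 1.0
    if rho <= rho_min:
        return eps0
    # Switching variable: 0 at rho_max, 2*pi at rho_min.
    u = 2.0 * math.pi * (math.log(rho_max) - math.log(rho)) / (
        math.log(rho_max) - math.log(rho_min)
    )
    t = math.log(eps0) / (2.0 * math.pi) * (u - math.sin(u))
    return math.exp(t)

for rho in (1.0e-5, 2.0e-4, 1.0e-3, 1.0e-2):
    print(rho, round(dielectric_from_density(rho), 3))
```

A rigid (PCM-like) cavity would instead fix the dielectric function from atomic positions and radii, independently of the density.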

Can be used in conjunction with O(N) BigDFT

From vacuum to complex wet environments

Neutral or ionic wet environments

Generalized Poisson eq.
$\nabla \cdot \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) = -4\pi \rho(\mathbf{r})$

Poisson-Boltzmann eq.
$\nabla \cdot \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) = -4\pi \left[ \rho(\mathbf{r}) + \rho^{\text{ions}}[\phi](\mathbf{r}) \right]$

Various algorithms implemented
• Polarization Iteration (uses the gradient of the polarization potential)
• Preconditioned Conjugate Gradient (needs only function multiplications)
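The preconditioned conjugate gradient idea can be sketched in one dimension. The example below solves $-\frac{d}{dx}\left(\varepsilon \frac{d\phi}{dx}\right) = 4\pi\rho$ with $\phi = 0$ at both ends, using finite differences, and preconditions each step with a vacuum solve wrapped in multiplications by $1/\sqrt{\varepsilon}$. The tridiagonal solver stands in for the vacuum Poisson solver; the preconditioner form, the dielectric profile and the density are all assumptions made for this illustration.

```python
import math

def apply_generalized(phi, eps_half, hx):
    """Return -d/dx(eps dphi/dx) with phi = 0 at both boundaries.
    eps_half[i] is eps midway between padded nodes i and i + 1."""
    padded = [0.0] + list(phi) + [0.0]
    return [
        (eps_half[i] * (padded[i + 1] - padded[i])
         - eps_half[i + 1] * (padded[i + 2] - padded[i + 1])) / hx**2
        for i in range(len(phi))
    ]

def vacuum_solve(rhs, hx):
    """Solve -d2phi/dx2 = rhs (phi = 0 at the ends) by the Thomas algorithm."""
    n = len(rhs)
    diag, off = 2.0 / hx**2, -1.0 / hx**2
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def solve_generalized_poisson(rho, eps_node, eps_half, hx, tol=1e-8, max_iter=200):
    """Preconditioned CG; returns (phi, number of iterations used)."""
    b = [4.0 * math.pi * r for r in rho]
    b_norm = math.sqrt(sum(v * v for v in b))
    inv_sqrt = [1.0 / math.sqrt(e) for e in eps_node]

    def precondition(r):
        # Multiply by 1/sqrt(eps), vacuum solve, multiply by 1/sqrt(eps).
        z = vacuum_solve([ri * s for ri, s in zip(r, inv_sqrt)], hx)
        return [zi * s for zi, s in zip(z, inv_sqrt)]

    phi = [0.0] * len(rho)
    r = b[:]
    z = precondition(r)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for iteration in range(1, max_iter + 1):
        ap = apply_generalized(p, eps_half, hx)
        alpha = rz / sum(pk * apk for pk, apk in zip(p, ap))
        phi = [x + alpha * pk for x, pk in zip(phi, p)]
        r = [ri - alpha * apk for ri, apk in zip(r, ap)]
        if math.sqrt(sum(ri * ri for ri in r)) < tol * b_norm:
            return phi, iteration
        z = precondition(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pk for zi, pk in zip(z, p)]
        rz = rz_new
    return phi, max_iter

def eps_of(x):
    """Hypothetical profile: about 1 in a central cavity, about 78.3 outside."""
    return 1.0 + 77.3 * 0.5 * (1.0 + math.tanh((abs(x - 0.5) - 0.25) / 0.05))

n = 63
hx = 1.0 / (n + 1)
nodes = [(i + 1) * hx for i in range(n)]
eps_node = [eps_of(x) for x in nodes]
eps_half = [eps_of((i + 0.5) * hx) for i in range(n + 1)]
rho = [math.exp(-((x - 0.5) / 0.05) ** 2) for x in nodes]
phi, iterations = solve_generalized_poisson(rho, eps_node, eps_half, hx)
print("converged in", iterations, "iterations")
```

Apart from the operator application, each iteration costs one vacuum solve plus pointwise multiplications, which is the sense in which the method needs only function multiplications around the vacuum solver.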

Solution found by iterative application of PS in vacuum BC

Modelling of Electrostatic Environment

Advantages of explicit BC

[Figure: difference between the electrostatic solvation energy computed with periodic and free boundary conditions as a function of the periodic cell length]

Dipole norm (D)

Molecule    vacuum   H2O
CH3CONH2    3.88     5.76
H2O         1.81     2.41
CH3OH       1.57     2.14
NH3         1.49     1.98
CH3NH2      1.27     1.78

Explicit BC avoid errors due to supercell aliasing

Performances and timings for full DFT runs

Blackbox-like usage
The Generalized PS only needs a few iterations of the vacuum Poisson solver

Time-to-solution
Timings for the protein PDB ID: 1y49 (122 atoms) in water
• Full SCF convergence: 49 s
• Solvent/vacuum runtime ratio: α = 1.16

ISF: wavelet analysis meets real-space

Features of Systematicity
• ISF provide a rigorous framework to interpret real-space coefficients
• High interpolating power (precise results)
• Scaling relation → computational efficiency, arbitrary precision
• (Quasi) variational
• In conjunction with (Daubechies) wavelets, they offer a good formalism to get rid of uncertainties

Opportunities from new quadratures

• Generalize the concept of compensating charge (multipoles)
• Combine Cartesian and polar meshes