BigDFT Approach to High Performance Computing


Daubechies Wavelets in Electronic Structure Calculation: BigDFT Code Tutorial
ORNL/NICS – Oak Ridge, Tennessee
Luigi Genovese, L_Sim – CEA Grenoble, February 8, 2012
Laboratoire de Simulation Atomistique, http://inac.cea.fr/L_Sim

Outline
1. The code: formalism and properties; the need for hybrid DFT codes; main operations and parallelisation.
2. Code details: performance evaluation; evaluating the GPU gain; practical cases.
3. Discussion.

A DFT code based on Daubechies wavelets
BigDFT is a pseudopotential Kohn-Sham code. A Daubechies wavelet basis has unique properties for DFT usage:
- Systematic and orthogonal.
- Localised and adaptive.
- Kohn-Sham operators are analytic.
- Short, separable convolutions: $\tilde{c}_\ell = \sum_j a_j \, c_{\ell-j}$.
Peculiar numerical properties: real-space based and highly flexible, well suited to big and inhomogeneous systems.
[Figure: the Daubechies scaling function φ(x) and wavelet ψ(x).]

Basis set features
BigDFT features in a nutshell:
- Arbitrary absolute precision can be achieved: good convergence ratio for a real-space approach, O(h^14).
- Optimal usage of the degrees of freedom (adaptivity): optimal speed for a systematic approach, with less memory.
- Hartree potential accurate for various boundary conditions: free and surface BC Poisson solver (present also in CP2K, ABINIT, OCTOPUS).
- Data repartition suitable for optimal scalability: simple communication paradigm, multi-level parallelisation possible (and implemented).
These features improve and develop know-how, and are optimal for advanced DFT functionalities in an HPC framework.

Moore's law
Forty years of improvements: transistor counts double every two years... but how? Power is the limiting factor, and Power ∝ Frequency^3, so the clock rate is limited: multiple slower devices are preferable to one superfast device, giving more performance with less power. The burden thus shifts to a software problem.
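The preference for many slower cores can be made quantitative with a back-of-the-envelope estimate (a sketch added for clarity, not part of the original slides), assuming the cubic power law quoted above and a throughput proportional to cores times frequency:

\[
P_1 \propto f^3
\qquad\longrightarrow\qquad
P_N \propto N \left(\frac{f}{N}\right)^3 = \frac{f^3}{N^2},
\]

that is, replacing one core at frequency f by N cores at frequency f/N keeps the aggregate throughput constant while dividing the power by N^2, provided the software exposes enough parallelism.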
Hybrid supercomputing nowadays
GPGPU on supercomputers: traditional architectures are somewhat saturating, with more cores per node and memories that are (slightly) larger but not faster. Supercomputer architectures are becoming hybrid: 3 out of the top 5 supercomputers are hybrid machines and, extrapolating, by 2015 the No. 500 machine will be at the petaflop level and will likely be a hybrid machine. Codes should therefore be conceived differently:
- The number of MPI processes is limited for a fixed problem size.
- Performance increases only by enhancing parallelism.
- Further parallelisation levels should be added (OpenMP, GPU).
Are electronic structure codes suitable for this?

How far is the petaflop (for DFT)?
At present, with traditional architectures, routinely used DFT calculations involve:
- A few dozen (up to a few hundred) processors.
- Parallel-intensive operations (blocking communications, 60-70 percent efficiency).
- Codes that are not freshly optimised (legacy codes, monster codes).
Optimistic estimate: 5 GFlop/s per core × 2000 cores × 0.9 = 9 TFlop/s, i.e. 200 times less than Top 500's number 3. It is as if the Earth-Moon distance (384 Mm) had been covered while the Earth-Mars distance (78.4 Gm) is 200 times larger: the Moon is reached... can we go to Mars? (... in 2015?)

Operations performed: the SCF cycle
Orbital scheme (real space / Daubechies wavelets): Hamiltonian application and preconditioner.
Coefficient scheme: overlap matrices and orthogonalisation.
Computational operations: convolutions, BLAS routines, FFT (Poisson solver). Why not GPUs?

Separable convolutions
We must calculate
\[
F(I_1, I_2, I_3) = \sum_{j_1, j_2, j_3 = 0}^{L} h_{j_1} h_{j_2} h_{j_3}\, G(I_1 - j_1, I_2 - j_2, I_3 - j_3)
= \sum_{j_1=0}^{L} h_{j_1} \sum_{j_2=0}^{L} h_{j_2} \sum_{j_3=0}^{L} h_{j_3}\, G(I_1 - j_1, I_2 - j_2, I_3 - j_3).
\]
This is obtained as three successive applications of the same one-dimensional operation:
1. $A_3(I_3; i_1, i_2) = \sum_j h_j\, G(i_1, i_2, I_3 - j) \quad \forall i_1, i_2$
2. $A_2(I_2; I_3, i_1) = \sum_j h_j\, A_3(I_3, i_1, I_2 - j) \quad \forall I_3, i_1$
3. $F(I_1; I_2, I_3) = \sum_j h_j\, A_2(I_2, I_3, I_1 - j) \quad \forall I_2, I_3$
The main routine is therefore a convolution plus a transposition:
\[
F(I, a) = \sum_j h_j\, G(a, I - j) \qquad \forall a.
\]
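As an illustration, a minimal sketch of one such convolution-plus-transposition pass is given below (the subroutine name, the array layout and the periodic wrap-around are assumptions made for this example; this is not the BigDFT production kernel):

   ! One pass of F(I,a) = sum_j h(j) G(a, I-j): convolve the last dimension
   ! of the input and write it as the first dimension of the output.
   subroutine conv_transpose(n, ndat, lowfil, lupfil, h, g, f)
     implicit none
     integer, intent(in) :: n, ndat, lowfil, lupfil
     double precision, intent(in)  :: h(lowfil:lupfil)
     double precision, intent(in)  :: g(ndat, 0:n-1)   ! input laid out as (a, i)
     double precision, intent(out) :: f(0:n-1, ndat)   ! output laid out as (I, a)
     integer :: a, i, l, k
     double precision :: tt
     do a = 1, ndat
        do i = 0, n-1
           tt = 0.d0
           do l = lowfil, lupfil
              k = modulo(i - l, n)       ! periodic boundary (assumption)
              tt = tt + h(l) * g(a, k)
           end do
           f(i, a) = tt                  ! transposition: dimension order swapped
        end do
     end do
   end subroutine conv_transpose

Calling such a routine three times, each time with ndat equal to the product of the two non-convolved dimensions, cycles the convolved dimension through all three axes and realises the full separable filter.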
CPU performance of the convolutions
Initially, naive routines were written in FORTRAN, computing
\[
y(j, I) = \sum_{\ell=L}^{U} h_\ell\, x(I + \ell, j):
\]

   do j=1,ndat
      do i=0,n1
         tt=0.d0
         do l=lowfil,lupfil
            tt=tt+x(i+l,j)*h(l)
         enddo
         y(j,i)=tt
      enddo
   enddo

Such routines are easy to write and debug and define the reference results; optimisation can then start. Example on an Intel X5550 at 2.67 GHz:

   Method              GFlop/s   % of peak   SpeedUp
   Naive (FORTRAN)       0.54       5.1       1/(6.25)
   Current (FORTRAN)     3.3       31         1
   Best (C, SSE)         7.8       73         2.3
   OpenCL (Fermi)       97         20        29 (12.4)

How to optimise? A trade-off between benefit and effort
FORTRAN-based optimisation:
+ Relatively accessible (loop unrolling).
+ Moderate optimisation can be achieved relatively fast.
- Compilers fail to use the vector engine efficiently.
Pushing the optimisation to its limit:
- Only one out of the three convolution types has been implemented this way.
- About 20 different patterns have been studied for a single 1D convolution.
- Tedious work and a huge code: maintainability?
=> Automatic code generation is under study.

MPI parallelisation I: orbital distribution scheme
Used for the application of the Hamiltonian. Operator approach: the Hamiltonian (convolutions) is applied separately to each wavefunction.
[Figure: orbitals ψ1-ψ5 distributed over MPI processes 0, 1 and 2.]

MPI parallelisation II: coefficient distribution scheme
Used for scalar products and orthonormalisation: level-3 BLAS routines are called, then the result is reduced.
[Figure: the coefficients of each orbital ψ1-ψ5 split across MPI processes 0, 1 and 2.]
At present, MPI_ALLTOALL(V) is used to switch between the two distribution schemes.

OpenMP parallelisation
This is the innermost parallelisation level: (almost) any BigDFT operation is parallelised via OpenMP (a minimal sketch of an OpenMP-threaded convolution loop is given at the end of this section).
+ Useful for memory-demanding calculations.
+ Allows a further increase of the speedup.
+ Saves MPI processes and intra-node message passing.
- Less efficient than MPI.
- Compiler and system dependent.
- OpenMP sections have to be regularly maintained.
[Figure: OpenMP speedup (roughly 1.5 to 4.5) versus number of MPI processes (up to about 120) for 2, 3 and 6 OpenMP threads.]

Task repartition for a small system (ZnO, 128 atoms)
One OpenMP thread per core.
[Figure: wall time (log scale) broken down into convolutions, linear algebra, communications and other CPU work, together with the parallel efficiency, versus the number of cores from 16 to 576.]
What are the ideal conditions for the GPU? The GPU-ported routines should take the majority of the time. And what happens to the parallel efficiency?

Parallelisation and architectures
Same code, same runs: which machine is the best?
[Figure: the same time and efficiency breakdown on CCRT Titane (Nehalem, InfiniBand) and on CSCS Rosa (Opteron, Cray XT5).]
Titane is 2.3 to 1.6 times faster than Rosa! Why do the parallel performances degrade?
1. Calculation power has increased more than the networking.
2. Better libraries (MKL).
=> The walltime is reduced, but the parallel efficiency is lower. This will always happen when using GPUs!

Architectures, libraries, networking
Same runs, same sources, but different user conditions.
[Figure: CPU hours versus number of cores on Titane (MpiBull2 without MKL, MpiBull2 with MKL, OpenMPI with MKL), Jade (MPT) and Rosa (Cray, Istanbul); differences of up to a factor of 3.]
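To make the innermost level concrete, here is a minimal sketch of the naive convolution loop shown earlier, threaded with OpenMP (illustrative only, not taken from the BigDFT sources; the padded layout of x is an assumption):

   ! OpenMP threading over the ndat "line" index of the 1D convolution.
   subroutine conv_omp(n1, ndat, lowfil, lupfil, h, x, y)
     implicit none
     integer, intent(in) :: n1, ndat, lowfil, lupfil
     double precision, intent(in)  :: h(lowfil:lupfil)
     double precision, intent(in)  :: x(lowfil:n1+lupfil, ndat)  ! padded input (assumption)
     double precision, intent(out) :: y(ndat, 0:n1)
     integer :: i, j, l
     double precision :: tt
     !$omp parallel do default(shared) private(i, j, l, tt)
     do j = 1, ndat
        do i = 0, n1
           tt = 0.d0
           do l = lowfil, lupfil
              tt = tt + x(i+l, j) * h(l)
           end do
           y(j, i) = tt
        end do
     end do
     !$omp end parallel do
   end subroutine conv_omp

Each thread works on an independent set of lines j, so no synchronisation is needed inside the loop; this is the kind of section that, as noted above, has to be kept in sync with the rest of the code as it evolves.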