MPI Profiling


MPI Profiling Best Practices: Application Profiling at the HPCAC High Performance Center
Pak Lui

164 Applications Best Practices Published
• Abaqus • COSMO • HPCC • Nekbone • RADIOSS • ABySS • CP2K • HPCG • NEMO • RFD tNavigator
• AcuSolve • CPMD • HYCOM • NWChem • SNAP • Amber • Dacapo • ICON • Octopus • SPECFEM3D
• AMG • Desmond • Lattice QCD • OpenAtom • STAR-CCM+ • DL-POLY • LAMMPS • STAR-CD • AMR • OpenFOAM
• ANSYS CFX • Eclipse • LS-DYNA • VASP • OpenMX • ANSYS Fluent • FLOW-3D • miniFE • WRF • OptiStruct
• ANSYS Mechanical • GADGET-2 • MILC • PAM-CRASH / VPS • BQCD • Graph500 • MSC Nastran • PARATEC
• BSMBench • GROMACS • MR Bayes • Pretty Fast Analysis • CAM-SE • Himeno • MM5 • PFLOTRAN
• CCSM 4.0 • HIT3D • MPQC • Quantum ESPRESSO • CESM • HOOMD-blue • NAMD
For more information, visit: http://www.hpcadvisorycouncil.com/best_practices.php

38 Applications Installation Best Practices Published
• Adaptive Mesh Refinement (AMR) • ESI PAM-CRASH / VPS 2013.1 • NWChem • Amber (for GPU/CUDA) • GADGET-2 • NWChem 6.5
• Amber (for CPU) • GROMACS 5.1.2 • Octopus • ANSYS Fluent 15.0.7 • GROMACS 4.5.4 • OpenFOAM
• ANSYS Fluent 17.1 • GROMACS 5.0.4 (GPU/CUDA) • OpenMX • BQCD • Himeno • PyFR
• Caffe • HOOMD Blue • Quantum ESPRESSO 4.1.2 • CASTEP 16.1 • LAMMPS • Quantum ESPRESSO 5.1.1
• CESM • LAMMPS-KOKKOS • Quantum ESPRESSO 5.3.0 • CP2K • LS-DYNA • TensorFlow 0.10.0
• CPMD • MrBayes • WRF 3.2.1 • DL-POLY 4 • NAMD • WRF 3.8 • ESI PAM-CRASH 2015.1 • NEMO
For more information, visit: http://www.hpcadvisorycouncil.com/subgroups_hpc_works.php

HPC Advisory Council HPC Center
• HPE Apollo 6000: 10-node cluster
• HPE ProLiant SL230s Gen8: 4-node cluster
• HPE Cluster Platform 3000SL: 16-node cluster
• Dell™ PowerEdge™ C6145: 6-node cluster
• Dell™ PowerEdge™ R815: 11-node cluster
• Dell™ PowerEdge™ R730 GPU: 36-node cluster
• Dell PowerVault MD3420 / Dell PowerVault MD3460: InfiniBand Storage (Lustre)
• Dell™ PowerEdge™ R720xd/R720: 32-node GPU cluster
• Dell™ PowerEdge™ M610: 38-node cluster
• IBM POWER8: 8-node cluster
• Dell™ PowerEdge™ C6100: 4-node cluster
• 4-node GPU cluster
• 4-node GPU cluster

Agenda – Example of HPCAC Applications Activity
• Overview of HPC Applications Performance
• Ways to Inspect, Profile, Optimize HPC Applications
  – CPU, memory, file I/O, network
• System Configurations and Tuning
• Case Studies, Performance Comparisons, Optimizations and Highlights
• Conclusions

Note
• The following research was performed under the HPC Advisory Council activities
  – Compute resource: HPC Advisory Council Cluster Center
• The following was done to provide best practices
  – HPC application performance overview
  – Understanding HPC application communication patterns
  – Ways to increase HPC application productivity

Test Clusters
• HPE ProLiant DL360 Gen9 128-node (4096-core) “Hercules” cluster
  – Dual-Socket 16-Core Intel E5-2697A v4 @ 2.60 GHz CPUs
  – Memory: 256GB memory, DDR4 2400 MHz, Memory Snoop Mode in BIOS set to Home Snoop
  – OS: RHEL 7.2, MLNX_OFED_LINUX-3.4-2.0.0.0 InfiniBand SW stack
  – Mellanox ConnectX-4 EDR 100Gb/s InfiniBand Adapters, Mellanox Switch-IB SB7800 36-port EDR 100Gb/s InfiniBand Switch
  – Intel® Omni-Path Host Fabric Interface (HFI) 100Gb/s Adapter, Intel® Omni-Path Edge Switch 100 Series
  – MPI: Intel MPI 2017, Open MPI 2.0.2
• IBM OpenPOWER 8-node “Telesto” cluster, IBM Power System S822LC (8335-GTA)
  – IBM: Dual-Socket 10-Core @ 3.491 GHz CPUs, Memory: 256GB memory, DDR3 PC3-14900 MHz
  – Wistron OpenPOWER servers (Dual-Socket 8-Core @ 3.867 GHz CPUs, Memory: 224GB memory, DDR3 PC3-14900 MHz)
  – OS: RHEL 7.2, MLNX_OFED_LINUX-3.4-1.0.0.0 InfiniBand SW stack
  – Mellanox ConnectX-4 EDR 100Gb/s InfiniBand Adapters, Mellanox Switch-IB SB7800 36-port EDR 100Gb/s InfiniBand Switch
  – Compilers: GNU compilers 4.8.5, IBM XL Compilers 13.1.3
  – MPI: Open MPI 2.0.2, IBM Spectrum MPI 10.1.0.2, MPI Profiler: IPM
• Dell PowerEdge R730 32-node (1024-core) “Thor” cluster
  – Dual-Socket 16-Core Intel E5-2697A v4 @ 2.60 GHz CPUs (BIOS: Maximum Performance, Home Snoop)
  – Memory: 256GB memory, DDR4 2400 MHz, Memory Snoop Mode in BIOS set to Home Snoop
  – OS: RHEL 7.2, MLNX_OFED_LINUX-3.4-1.0.0.0 InfiniBand SW stack
  – Mellanox ConnectX-4 EDR 100Gb/s InfiniBand Adapters, Mellanox Switch-IB SB7800 36-port EDR 100Gb/s InfiniBand Switch
  – Intel® Omni-Path Host Fabric Interface (HFI) 100Gb/s Adapter, Intel® Omni-Path Edge Switch 100 Series
  – Dell InfiniBand-Based Lustre Storage based on Dell PowerVault MD3460 and Dell PowerVault MD3420
  – Compilers: Intel Compilers 2016.4.258
  – MPI: Intel Parallel Studio XE 2016 Update 4, Mellanox HPC-X MPI Toolkit v1.8, MPI Profiler: IPM (from Mellanox HPC-X)

GROMACS (GROningen MAchine for Chemical Simulation)
• A molecular dynamics simulation package
• Primarily designed for biochemical molecules like proteins, lipids and nucleic acids
  – A lot of algorithmic optimizations have been introduced in the code
  – Extremely fast at calculating the nonbonded interactions
• Ongoing development to extend GROMACS with interfaces both to Quantum Chemistry and Bioinformatics/databases
• Open source software released under the GPL

GROMACS Performance – MPI Libraries
• Small performance gain in the latest GROMACS version
  – About 3% better performance seen on GROMACS version 2016.2 than on 5.1.2
(Chart: 3% gain highlighted; higher is better; optimized parameters used)

GROMACS Performance – Interconnects
• EDR InfiniBand enables higher scalability than Omni-Path for GROMACS
  – InfiniBand delivers 136% better scaling versus Omni-Path at 128 nodes
(Chart: 136% gain highlighted; higher is better; Intel MPI)

GROMACS Performance – MPI
• Intel MPI includes multiple transport providers for running on InfiniBand fabrics
  – Native transport provides up to 31% better scaling than the uDAPL provider at 128 nodes
(Chart: 31% gain highlighted; higher is better; optimized parameters used)

GROMACS Profiling – % of MPI Calls
• The most time-consuming MPI calls (as % of MPI time):
  – MPI_Iprobe (51%), MPI_Allreduce (23%), MPI_Bcast (16%), MPI_Waitall (9%)
(Chart: 32 nodes / 1024 processes)

NAMD Performance – MPI Libraries
• Different MPI options deliver better performance for different benchmarks
(Chart: higher is better; optimized parameters used)

NAMD Profiling – % of MPI Calls
• The most time-consuming MPI calls (as % of MPI time):
  – MPI_Iprobe (90%), MPI_Isend (4%), MPI_Test (3%), MPI_Recv (3%)
(Chart: 64 nodes / 2048 processes)

BSMBench Profiling – % of MPI Calls
• Major MPI calls (as % of wall time):
  – Balance: MPI_Barrier (26%), MPI_Allreduce (6%), MPI_Waitall (5%), MPI_Isend (4%)
  – Communications: MPI_Barrier (14%), MPI_Allreduce (5%), MPI_Waitall (5%), MPI_Isend (2%)
  – Compute: MPI_Barrier (14%), MPI_Allreduce (5%), MPI_Waitall (5%), MPI_Isend (1%)
(Charts: Balance, Communications, Compute; 32 nodes / 1024 processes)

BSMBench Profiling – MPI Msg Distribution
• Similar communication pattern seen across all 3 examples:
  – Balance: MPI_Barrier: 0-byte, 22% wall; MPI_Allreduce: 8-byte, 5% wall
  – Communications: MPI_Barrier: 0-byte, 26% wall; MPI_Allreduce: 8-byte, 5% wall
  – Compute: MPI_Barrier: 0-byte, 13% wall; MPI_Allreduce: 8-byte, 5% wall
(Charts: Balance, Communications, Compute; 32 nodes / 1024 processes)

BSMBench Profiling – Time Spent in MPI
• Communication time across the MPI processes is mostly balanced
  – There do not appear to be any significant load imbalances in the communication layer
(Charts: Balance, Communications, Compute; 32 nodes / 1024 processes)

BSMBench Performance – MPI Libraries
• Comparison between two commercially available MPI libraries
• Intel MPI and HPC-X deliver similar performance
  – HPC-X demonstrates a 5% advantage at 32 nodes
(Chart: higher is better; 32 MPI processes / node)

BSMBench Summary
• Benchmark for BSM Lattice Physics
  – Utilizes both compute and network communications
• MPI Profiling
  – Most MPI time is spent on MPI collective operations and non-blocking communications
    • Heavy use of MPI collective operations (MPI_Allreduce, MPI_Barrier)
  – Similar communication patterns seen across all three examples
    • Balance: MPI_Barrier: 0-byte, 22% wall; MPI_Allreduce: 8-byte, 5% wall
    • Comms: MPI_Barrier: 0-byte, 26% wall; MPI_Allreduce: 8-byte, 5% wall
    • Compute: MPI_Barrier: 0-byte, 13% wall; MPI_Allreduce: 8-byte, 5% wall

HPCG Performance – SMT
• Simultaneous Multithreading (SMT) provides additional hardware threads for compute
• Additional performance gain is seen with SMT enabled
  – Up to 45% performance gain is seen with 5 SMT threads compared to no SMT
  – More MPI ranks can run on the SMT threads, but they need more memory to run
  – Memory bandwidth saturation appears to occur at around 5 SMT threads
(Chart: 45% gain highlighted; higher is better)

HPCG Performance – System Architecture
• Power CPU demonstrates 8% higher performance compared to x86
  – Performance gain on a single node is approximately 8% for POWER8
  – 32 cores per node used for x86, versus 20 physical cores per node used for Power
(Chart: higher is better; SMT=5 for IBM POWER8)

HPCG Performance – Matrix Size
• The sparse matrix size specified determines the amount of memory consumed
  – The amount of memory for sparse matrix computation is bounded by the matrix size specified
  – Using a slightly smaller matrix size appeared to have no effect on performance
• A shorter run duration appeared to have no effect on performance
  – The standard runtime for HPCG is 30 minutes; shorter runs appear to perform the same
(Chart: higher is better; SMT=1 for IBM POWER8)

HPCG Profiling – % of MPI Calls
• The most time-consuming MPI calls (as % of wall time):
  – MPI_Wait (1.5%), MPI_Send (%), MPI_Waitall (0.5%)
• The percentage of time spent on communication is limited

HPCG Summary
• HPCG Project
  – Potential replacement for the
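
The profiling slides above report each MPI call either as a share of MPI time (GROMACS, NAMD) or as a share of wall time (BSMBench, HPCG), and the performance slides quote relative gains such as 45%. The sketch below shows one way such figures could be derived from raw timings. It is only an illustration: the input layout (a dictionary of seconds per MPI call and a list of per-rank MPI seconds) and the function names call_shares, imbalance and percent_gain are assumptions made here, not IPM's actual output format or API, and the numbers are invented for the example rather than taken from the slides.

```python
from typing import Dict, List


def call_shares(call_seconds: Dict[str, float],
                wall_seconds: float) -> Dict[str, Dict[str, float]]:
    """Return each call's share of total MPI time and of wall time, in percent.

    call_seconds is a hypothetical layout: seconds spent per MPI call name.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    mpi_total = sum(call_seconds.values())
    shares = {}
    for name, seconds in call_seconds.items():
        shares[name] = {
            # Share of time spent inside MPI only (as on the GROMACS/NAMD slides).
            "pct_mpi": 100.0 * seconds / mpi_total if mpi_total > 0 else 0.0,
            # Share of total run time (as on the BSMBench/HPCG slides).
            "pct_wall": 100.0 * seconds / wall_seconds,
        }
    return shares


def imbalance(per_rank_mpi_seconds: List[float]) -> float:
    """Max-over-mean ratio of per-rank MPI time; 1.0 means perfectly balanced."""
    if not per_rank_mpi_seconds:
        raise ValueError("need at least one rank")
    mean = sum(per_rank_mpi_seconds) / len(per_rank_mpi_seconds)
    return max(per_rank_mpi_seconds) / mean if mean > 0 else 1.0


def percent_gain(new: float, old: float) -> float:
    """Relative gain of new over old, in percent (higher-is-better metrics)."""
    if old <= 0:
        raise ValueError("old must be positive")
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Invented timings for illustration only.
    timings = {"MPI_Allreduce": 3.0, "MPI_Waitall": 1.0}
    for call, pct in call_shares(timings, wall_seconds=20.0).items():
        print(f"{call}: {pct['pct_mpi']:.1f}% of MPI, {pct['pct_wall']:.1f}% of wall")
    print(f"imbalance: {imbalance([2.0, 2.0, 4.0]):.2f}")
    print(f"gain: {percent_gain(145.0, 100.0):.1f}%")
```

Keeping the two percentages side by side makes it harder to misread a profile: a call can dominate MPI time while still being a small part of wall time, which is the situation the HPCG profiling slide describes.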