ENABLING NEW SCIENCE TESLA™ BIO Workbench

Total pages: 16 | File type: PDF | Size: 1020 KB

The NVIDIA® Tesla™ Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a "computational laboratory" capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, 10-20 times faster through the use of NVIDIA Tesla GPUs. It consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms that enable these applications to run at 1/10th the cost of CPU-only computers.

Applications that are accelerated on GPUs include:

• Molecular Dynamics & Quantum Chemistry: AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (quantum chemistry), VMD
• Bioinformatics: CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU

For more information, visit www.nvidia.com/bio_workbench

GPU SOLUTIONS

The Tesla Bio Workbench applications can be deployed on GPU-based desktop personal supercomputers or in data center solutions. Built on the massively parallel CUDA™ GPU computing architecture, these solutions are designed to accelerate the pace of computational science and are available today. Complex molecular simulations that were once possible only on supercomputing resources can now be run on an individual workstation, streamlining the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based server clusters to simulate large molecules and systems that would otherwise have required a supercomputer. Visit www.nvidia.com/tesla to find out more about:

WORKSTATION SOLUTIONS: Tesla Personal Supercomputer — for personal supercomputing at your desk.

DATA CENTER SOLUTIONS: Tesla GPU Computing Clusters — for computing with large-scale installations.

GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS

AMBER

The explicit solvent and implicit solvent simulations in AMBER have been accelerated using CUDA-enabled GPUs. Paired with a Tesla GPU computing solution built on the CUDA architecture, AMBER achieves more than a 10x speedup compared to a single quad-core CPU. Benchmark data for explicit solvent (PME) and implicit solvent (GB) simulations can be found on the AMBER 11 NVIDIA benchmark page, courtesy of the San Diego Supercomputer Center.

HOOMD

HOOMD-blue is a general-purpose particle dynamics package written from the ground up to take advantage of the CUDA architecture in NVIDIA GPUs. It contains a number of different force fields and integrators, and its object-oriented design makes it easy to add new ones. Detailed benchmarks are available on the HOOMD-blue page: one NVIDIA Tesla GPU running HOOMD-blue can outperform 32 CPU cores.
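Schematically, the inner loop that MD packages like these offload to the GPU is a pairwise force evaluation followed by a time-integration step, and it is the all-pairs force loop that parallelizes so well across GPU threads. A minimal NumPy sketch, using Lennard-Jones interactions and velocity-Verlet integration purely as illustrative assumptions (these are not the production force fields of AMBER or HOOMD-blue):

```python
import numpy as np

def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """All-pairs Lennard-Jones forces: the O(N^2) kernel that GPU MD
    codes parallelize by mapping each particle (or pair) to a thread."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r2 = np.dot(rij, rij)
            inv_r6 = (sigma**2 / r2) ** 3
            # Force magnitude from dU/dr of U = 4*eps*[(s/r)^12 - (s/r)^6]
            f = 24.0 * epsilon * (2.0 * inv_r6**2 - inv_r6) / r2 * rij
            forces[i] += f   # Newton's third law: equal and opposite
            forces[j] -= f
    return forces

def velocity_verlet(pos, vel, dt, mass=1.0):
    """One velocity-Verlet step, the integrator family most MD packages use."""
    f = lj_forces(pos)
    vel_half = vel + 0.5 * dt * f / mass
    pos_new = pos + dt * vel_half
    vel_new = vel_half + 0.5 * dt * lj_forces(pos_new) / mass
    return pos_new, vel_new
```

The total force sums to zero by construction, which is a quick sanity check that any reimplementation of the pair loop should pass.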
[Chart: AMBER ns/day for DHFR NVE (23,558 atoms) and Nucleosome (25,095 atoms) on one and two Tesla GPUs versus 32 CPU cores. CPU: 2x quad-core Intel Xeon E5462, ifort 10.1, MKL 10.1, MPICH2; GPU: Tesla C1060 / Tesla C2050, gfortran 4.1, nvcc v3.0.]

[Chart: HOOMD-blue on one and two Tesla 10-series GPUs versus LAMMPS on 32 x86 CPU cores, in time steps/sec, for Polymer and Lennard-Jones Liquid systems (64K particles each). Data courtesy of the University of Michigan.]

GROMACS

GROMACS is a molecular dynamics package designed primarily for the simulation of biochemical molecules such as proteins, lipids, and nucleic acids, which have many complicated bonded interactions. The CUDA port of GROMACS, which enables GPU acceleration, is now available in beta and supports Particle-Mesh Ewald (PME), arbitrary forms of non-bonded interactions, and implicit solvent Generalized Born methods. The CUDA version of GROMACS currently supports a single GPU and produces the following results:

[Chart: GROMACS simulated femtoseconds/sec with Particle-Mesh Ewald (PME) and reaction-field cutoffs — up to 22x faster on a Tesla C1060 GPU than on a 2x Intel quad-core E5462 system. Data courtesy of the Stockholm Center for Biomembrane Research.]

LAMMPS

LAMMPS is a classical molecular dynamics package written to run well on parallel machines. The CUDA version of LAMMPS is accelerated by moving the force calculations to the GPU. LAMMPS on the GPU scales very well, as seen in the results below: two Tesla GPUs running GPU-LAMMPS outperform 24 CPUs.

[Chart: LAMMPS results on CPU versus GPU clusters, steps/sec for Solvated Spectrin (37.8K atoms) and Spectrin Dimer (75.6K atoms) on 1-24 quad-core CPUs or Tesla GPUs; 2 GPUs = 24 CPUs. Data courtesy of Scott Hampton & Pratul K. Agarwal, Oak Ridge National Laboratory.]

NAMD

The team at the University of Illinois at Urbana-Champaign (UIUC) has been enabling CUDA acceleration in NAMD since 2007. They have performed scaling experiments on the NCSA Tesla-based Lincoln cluster and demonstrated that 4 Tesla GPUs can outperform a cluster with 16 quad-core CPUs. NAMD scales very well on a Tesla GPU cluster, as demonstrated by these results.

[Chart: NAMD results on CPU versus GPU clusters, STMV steps/sec versus number of quad-core CPUs or Tesla GPUs (1-16); 4 GPUs = 16 CPUs. Data courtesy of the Theoretical and Computational Biophysics Group, UIUC.]

VMD

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. Several key kernels and applications in VMD now take advantage of the massively parallel CUDA architecture of NVIDIA GPUs. These applications run 20x to 100x faster on an NVIDIA CUDA GPU than on a CPU alone.

[Chart: molecular orbital computation in VMD, lattice points/sec on an Intel Core 2 Q6600 (quad-core, 2.4 GHz) versus a Tesla C1060 GPU, for C60-a/C60-b (carbon-60) and Thr-a/Thr-b (threonine-17) test cases. Data courtesy of the Theoretical and Computational Biophysics Group, UIUC.]

TERACHEM

TeraChem is the first general-purpose quantum chemistry software package designed from the ground up to take advantage of the massively parallel CUDA architecture of NVIDIA GPUs. TeraChem currently supports density functional theory (DFT) and Hartree-Fock (HF) methods. It supports multiple GPUs using pthreads and will soon be MPI-enabled. A workstation with 4 Tesla GPUs running TeraChem outperforms 256 quad-core CPUs running GAMESS.
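VMD's molecular orbital kernel, mentioned above, evaluates the orbital amplitude independently at every lattice point, which is exactly the kind of embarrassingly parallel work that maps onto thousands of GPU threads. A minimal sketch restricted to uncontracted s-type Gaussian basis functions (a simplifying assumption of this illustration; the real VMD kernel handles contracted basis sets and higher angular momenta):

```python
import numpy as np

def mo_on_grid(centers, coeffs, alphas, grid):
    """Evaluate psi(r) = sum_k c_k * exp(-a_k * |r - R_k|^2) at every
    grid point. Each point is independent of every other, so a GPU can
    assign one thread per lattice point; here NumPy vectorizes over them."""
    psi = np.zeros(len(grid))
    for c, a, R in zip(coeffs, alphas, centers):
        d2 = np.sum((grid - R) ** 2, axis=1)  # squared distance to center R
        psi += c * np.exp(-a * d2)
    return psi
```

Looping over basis functions while vectorizing over grid points mirrors the GPU layout: the per-point work is uniform, with no data dependence between points.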
[Chart: speedup of TeraChem on 4 Tesla C1060 GPUs versus GAMESS on 2x Intel quad-core 5520 CPUs: Olestra 50x, Valinomycin 11x, Taxol 7x, Buckyball 8x. Data courtesy of PetaChem.]

GPU-BASED BIO-INFORMATICS APPLICATIONS

CUDA-BLASTP

CUDA-BLASTP is designed to accelerate NCBI BLAST for scanning protein sequence databases by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. CUDA-BLASTP also includes a utility to convert FASTA-format databases into files readable by CUDA-BLASTP. CUDA-BLASTP running on a workstation with two Tesla C1060 GPUs is 10x faster than NCBI BLAST (2.2.22) running on an Intel i7-920 CPU, cutting compute time from minutes on CPUs to seconds on GPUs.

[Chart: CUDA-BLASTP vs NCBI BLASTP speedups across query sequence lengths 127-2026, comparing NCBI BLASTP with 1 and 4 threads on an Intel i7-920 CPU against CUDA-BLASTP on 1 and 2 Tesla C1060 GPUs. Data courtesy of Nanyang Technological University, Singapore.]

CUDA-MEME

CUDA-MEME is motif discovery software based on MEME (version 3.5.4). It accelerates MEME by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs and supports the OOPS and ZOOPS models in MEME. CUDA-MEME running on one Tesla C1060 GPU is up to 23x faster than MEME running on an x86 CPU, cutting compute time from hours on CPUs to minutes on GPUs. The data in the chart below are for the OOPS (one occurrence per sequence) model on 4 datasets.

[Chart: CUDA-MEME vs MEME speedups, AMD Opteron 2.2 GHz CPU versus Tesla C1060 GPU, for the Mini-drosoph (4 sequences), HS_100 (100 sequences), HS_200 (200 sequences), and HS_400 datasets; e.g., runtimes drop from hours on the CPU to minutes or seconds on the GPU. Data courtesy of Nanyang Technological University, Singapore.]

CUDA-EC

CUDA-EC is a fast parallel sequence error correction tool for short reads. It corrects sequencing errors in high-throughput short-read (HTSR) data by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. Error correction is a preprocessing step for many DNA fragment assembly tools and is very useful for the new high-throughput sequencing machines. CUDA-EC running on one Tesla C1060 GPU is up to 20x faster than Euler-SR running on an x86 CPU.

CUDASW++ (SMITH-WATERMAN)

CUDASW++ is bioinformatics software for Smith-Waterman protein database searches that takes advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs to perform sequence searches 10x-50x faster than NCBI BLAST. CUDASW++ supports query lengths of up to 59K and can take advantage of multiple Tesla GPUs.
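For reference, the Smith-Waterman algorithm that CUDASW++ accelerates fills a dynamic-programming matrix of local alignment scores; GPU implementations exploit the fact that all cells on one anti-diagonal can be computed in parallel. A minimal scoring-only sketch with illustrative linear gap penalties (parameter values here are assumptions, not CUDASW++'s defaults, and real tools use substitution matrices such as BLOSUM62):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    H[i][j] is the best score of any local alignment ending at
    a[i-1], b[j-1]; the max(0, ...) resets negative prefixes, which is
    what makes the alignment local rather than global."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells along the same anti-diagonal are mutually independent — the parallelism that GPU Smith-Waterman codes exploit.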