Automated Code Generation and Optimization for GPU Kernels
Alexey Titov, Ivan Ufimtsev, Nathan Luehr and Todd Martinez

Department of Chemistry, Stanford University
GTC, May 2012

GPU computing ecosystem evolution
G80, GT200, Fermi, Kepler, Cayman, Tahiti, MIC
Timeline: 2007 to 2012

TeraChem
http://petachem.com/
Selected feature list
• Restricted, unrestricted, and restricted open shell Hartree-Fock and grid-based Kohn-Sham energy and gradient calculations
• Full support of s, p and d-type basis functions
• Various DFT functionals, including range-corrected and Coulomb attenuated functionals (BLYP, B3LYP, PBE, PBE0, ωPBE, ωPBEh, ωB97, ωB97x, camB3LYP, etc) and DFT grids (800 - 80,000 grid points per atom)
• Static and dynamical DFT grids
• Empirical dispersion correction (DFT-D3 and DFT-D2)
• Geometry optimization (L-BFGS, Conjugate gradient, Steepest descent) and transition state search
• Reaction path and transition state search (through DL-FIND, Kastner)
• Ab initio molecular dynamics (NVE, NVT ensembles)
• Time reversible Born-Oppenheimer dynamics
• Spherical boundary conditions
• Support of multiple-GPU systems
• Single/Dynamical/Double precision accuracy
• QM/MM treatment of surrounding water molecules using TIP3P force field
• QM/MM with TeraChem/Amber (w/ Ross Walker, UCSD/SDSC)
• Natural bond orbital analysis through integration with NBO6
• Polarizabilities for HF and closed-shell DFT methods

Quantum chemistry with TeraChem
3 journal covers, 8 peer-reviewed papers, 4000+ downloads of free beta

Riding Advances in GPU Hardware
[Figure: molecule size, 2009 vs 2011]
• Is it possible to easily retune codes for new and older architectures for better performance?
• How to simplify transitions between architectures (e.g. Fermi -> Kepler)?
• How to implement complex kernels that perform efficiently on GPUs?
• What about other hardware architectures (Cayman, Tahiti, MIC, etc)?

Increase computational capabilities
+ 26 more elements

Managing d-functions
• Increased number of kernels to calculate electron repulsion integrals over gaussian-type orbitals χ(r):

$$(\mu\nu|\lambda\sigma) = \iint \chi_\mu(r_1)\,\chi_\nu(r_1)\,\frac{1}{|r_1 - r_2|}\,\chi_\lambda(r_2)\,\chi_\sigma(r_2)\,dr_1\,dr_2$$

  J: 9 → 36
  K: 10 → 45
• Increased depth of calculation, J kernel:

  Integrals batch    Lines of code    Flops
  ssss                          63       30
  pppp                         306      387
  dddd                        2094     3584

Our ideal case: automate kernel generation and optimization
Opening ‘combination’ lock for multiple targets

Batch of integrals going through the generation pipeline
Pipeline stages shown on the slide: Maple, sed, C++/CUDA
Intermediate representation:

    voidJSSSPclne = voidJSSSPclne;
    dbllData0 = 0.0e0;
    loopentry = loopentry;
    Rs = Rlev(R0000, R0001);
    gamm = Gamma1(R0000, R0001, T);
    R0001multi = -0.20e1 * alp1;
    fltR[-1][-1][0][-1] = c * R[-1][-1][-1][0];
    fltR[-1][0][-1][-1] = b * R[-1][-1][-1][0];
    fltR[0][-1][-1][-1] = a * R[-1][-1][-1][0];
    tmp = Temp(tmp0);
    blockindex = 0;
    collect0 = R[-1][-1][-1][-1] * P0;
    blockindex = 1;
    collect0minus = R[0][-1][-1][-1] * P1;
    blockindex = 2;
    collect0minus = R[-1][0][-1][-1] * P2;
    blockindex = 3;
    collect0minus = R[-1][-1][0][-1] * P3;
    Lambda = Lambda;
    lData0plus = tmp0 * Lambda;
    gthidXplus = BSIZEX;
    logg = logg;
    lData0contraction1 = clps;
    clse = clse;

Autogenerated J kernel example: dddd batch
    …
    for( [bra • ket] > ε) {
        // load data
        Gamma8(…)
        // calculate a,b,c and auxiliary functions R000j
        float R0010 = c * R0001;
        …
        float R3000 = a * R2001 + 2.0f * R1001;
        …
        float R0080 = c * R0071 + 7.0f * R0061;
        float P0 = fetch_data(preproP, g_thidX);
        tmp0 += R0000 * P0;
        tmp1 += R1000 * P0;
        …
        tmp34 += R0040 * P0;
        float P34 = fetch_data(preproP, g_thidX + ne*34);
        …
        //accumulate tmps in DP
    }
    // collect integrals and upload to global memory

Slide annotations: Bytes per thread: 1880 (reg + lmem); Mops: 47; Flops: 3583. Auxiliary-function block: 486 lines. Each P block: 35 lines, × 35 = 1300 lines. Total 2090 lines.

JDDDD performance
[Bar chart comparing GPU orig. + volatile, GPU orig., i7 (8 cores, SSE) and i7 (1 core); extracted value labels: 3×, ~13.33, ~0.63]

Architecture tuning: Empirically test different pathways

Code variant #1

    float R1000 = a * R0001;
    tmp_0 += R1000 * ((-PBx * (Pxz * PAx + PAz * Pzz + PAy * Pyz) * QDz
                       - PBx * (PAy * Pyx + PAz * Pzx + Pxx * PAx) * QCx
                       - PBx * (PAy * Pyx + PAz * Pzx + Pxx * PAx) * QDx
                       - PBx * (PAz * Pzy + PAy * Pyy + Pxy * PAx) * QDy) * rtaq
                      + ((-Pxz * QDz + PAz * Pzx - Pxy * QDy - Pxx * QDx - Pxx * QCx + Pxx * PBx + Pxx * PAx + PAy * Pyx) * rtaq
                         + ((PAy * Pyx + PAz * Pzx + Pxx * PBx + Pxx * PAx) * QDx
                            + (Pxy * PAx + PAz * Pzy + PAy * Pyy + Pxy * PBx) * QDy
                            + (PAy * Pyz + Pxz * PAx + PAz * Pzz + Pxz * PBx) * QDz) * QCx) * rtap);

Legend (the slide's color and font coding is lost in this text version)
• Red: density matrix elements
• RXXXX: variables containing values of auxiliary functions
• Blue: hermite expansion coefficients (ket pair)
• Green: hermite expansion coefficients (bra pair)
• Bold: density contracted with ket coefficients
• Italic: intermediates

Code variant #2

    float t650 = Pxy * R2100 + Pxz * R2010 + Pxx * R3000;
    float t652 = Pxy * R1100 + Pxz * R1010 + Pxx * R2000;
    float t639 = t650 * rtap + t652 * PAx;
    float t658 = -rtap * R2000 - PAx * R1000;
    float t660 = Pxx * QDx + QDy * Pxy + QDz * Pxz;
    float t659 = -Pxx * R1000 - Pxy * R0100 - Pxz * R0010;
    float t656 = rtap * R1000 + PAx * R0000;
    float t641 = t656 * PBx + (R0000 - t658) * rtap;
    float t640 = -t652 * rtap + t659 * PAx;
    float t624 = t640 * PBx + (t659 - t639) * rtap;
    tmp_0 += ((t639 * PBx + ((Pxy * R3100 + Pxz * R3010 + Pxx * R4000) * rtap + t650 * PAx + t652) * rtap) * rtaq
              + t624 * QCx + t641 * Pxx) * rtaq
           + t660 * ((t658 * PBx + (-rtap * R3000 - PAx * R2000 - R1000) * rtap) * rtaq + t641 * QCx);

Code variant #3

    float PP_0 = Pxx * QDx + Pxy * QDy + Pxz * QDz;
    float PP_1 = Pxx * rtaq;
    float PP_2 = Pxy * rtaq;
    float PP_3 = Pxz * rtaq;
    tmp_0 += PBx * ( PAx * ( QCx * (R0000*PP_0 - R1000*PP_1 - R0100*PP_2 - R0010*PP_3)
                           - rtaq * (R1000*PP_0 - R2000*PP_1 - R1100*PP_2 - R1010*PP_3)
                           + R0000 * PP_1)
                   + rtap * ( QCx * (R1000*PP_0 - R2000*PP_1 - R1100*PP_2 - R1010*PP_3)
                            - rtaq * (R2000*PP_0 - R3000*PP_1 - R2100*PP_2 - R2010*PP_3)
                            + R1000 * PP_1))
           + rtap * ( PAx * ( QCx * (R1000*PP_0 - R2000*PP_1 - R1100*PP_2 - R1010*PP_3)
                            - rtaq * (R2000*PP_0 - R3000*PP_1 - R2100*PP_2 - R2010*PP_3)
                            + R1000 * PP_1)
                    + rtap * ( QCx * (R2000*PP_0 - R3000*PP_1 - R2100*PP_2 - R2010*PP_3)
                             - rtaq * (R3000*PP_0 - R4000*PP_1 - R3100*PP_2 - R3010*PP_3)
                             + R2000 * PP_1))
           + rtap * ( QCx * (R0000*PP_0 - R1000*PP_1 - R0100*PP_2 - R0010*PP_3)
                    - rtaq * (R1000*PP_0 - R2000*PP_1 - R1100*PP_2 - R1010*PP_3)
                    + R0000 * PP_1);

Empirical testing of code variants
[Figure: colors mark different auto-generated kernels]

  Code variant   C1060 timing (ms)   C2050 timing (ms)   Registers(a)   FLOPs(b)
  1                        1025.26              822.57            115       1049
  2                        1042.57              823.99            115       1083
  3                        1112.64              988.97            114       1218
  4                        1117.36             1151.79            120       2124
  5                        2303.17             2511.44            145       1185
  6                        2523.15             2780.31            171       2012
  7                        2077.94             2852.86            141       1931

Development & execution stages
[Diagram: two workflows, each labelled with an algebraic part, a numerical part, data input and data output]
• C/C++, CUDA, OpenCL → assembly language → hardware. Computation flow expressed and computed in C language.
• Computer algebra system → C/C++, CUDA, OpenCL → assembly language → hardware. Computational flow designed algebraically and then expressed & computed in C language.

Conclusions
• Performance is sensitive to architecture-specific optimizations.
• There is no direct and meaningful relationship between performance and FLOPS on GPUs.
• Automatic code generation and performance tuning will provide code portability. It enables performance portability across various architectures: from the same or different vendors.

Acknowledgements
Nathan Luehr, Ivan Ufimtsev, The Boss
Not shown: Jeff Gour, Ed Hohenstein
Funding: STTR - AFOSR
Jason Quenneville (Spectral Sciences), Vlad Kindratenko, Guochun Shi
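
Illustration: selecting a code variant from measured timings
The empirical testing step can be pictured as a small selection routine over the timing table from the slides. The sketch below is not the TeraChem tuning code; the names VariantTiming, Gpu, time_on, fastest_variant and count_flop_time_inversions are hypothetical, and only the numbers are taken from the table. It picks the fastest variant per GPU and counts pairs of variants where the one with fewer FLOPs is nevertheless slower, which is one way to check the second conclusion against the data.

```cpp
#include <array>
#include <cstddef>

// One row of the "Empirical testing of code variants" table.
// The struct and field names are illustrative assumptions.
struct VariantTiming {
    int variant;      // code variant number from the table
    double c1060_ms;  // measured time on C1060, milliseconds
    double c2050_ms;  // measured time on C2050, milliseconds
    int registers;    // register count reported in the table
    int flops;        // FLOP count reported in the table
};

// Values copied from the table in the slides.
constexpr std::array<VariantTiming, 7> kTimings = {{
    {1, 1025.26,  822.57, 115, 1049},
    {2, 1042.57,  823.99, 115, 1083},
    {3, 1112.64,  988.97, 114, 1218},
    {4, 1117.36, 1151.79, 120, 2124},
    {5, 2303.17, 2511.44, 145, 1185},
    {6, 2523.15, 2780.31, 171, 2012},
    {7, 2077.94, 2852.86, 141, 1931},
}};

enum class Gpu { C1060, C2050 };

// Timing of one variant on the chosen GPU.
inline double time_on(const VariantTiming& v, Gpu gpu) {
    return gpu == Gpu::C1060 ? v.c1060_ms : v.c2050_ms;
}

// Returns the variant number with the smallest measured time on `gpu`.
inline int fastest_variant(Gpu gpu) {
    const VariantTiming* best = &kTimings[0];
    for (const VariantTiming& v : kTimings) {
        if (time_on(v, gpu) < time_on(*best, gpu)) {
            best = &v;
        }
    }
    return best->variant;
}

// Counts ordered pairs (a, b) where a has fewer FLOPs than b
// but a is slower than b on `gpu`.
inline int count_flop_time_inversions(Gpu gpu) {
    int inversions = 0;
    for (std::size_t i = 0; i < kTimings.size(); ++i) {
        for (std::size_t j = 0; j < kTimings.size(); ++j) {
            const VariantTiming& a = kTimings[i];
            const VariantTiming& b = kTimings[j];
            if (a.flops < b.flops && time_on(a, gpu) > time_on(b, gpu)) {
                ++inversions;
            }
        }
    }
    return inversions;
}
```

A build system could call fastest_variant(Gpu::C2050) once per target and compile only the winning kernel. In the table, variant 5 has fewer FLOPs than variant 4 yet takes roughly twice as long on both cards, so a FLOP count alone would not have been a reliable selection criterion.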