The Molecular Sciences Institute

… a nexus for science, education, and cooperation for the global computational molecular sciences community. What is the MolSSI? •Launched August 1st, 2016, funded by the National Science Foundation. •Collaborative effort by Virginia Tech (T.D. Crawford), Rice U. (. Clementi), Stony Brook U. (R. Harrison), U.C. Berkeley (T. Head-Gordon), Stanford U. (V. Pande), Rutgers U. (S. Jha), U. Southern California (A. Krylov), and Iowa State U (T. Windus). •Part of the NSF’s commitment to the White House’s National Strategic Computing Initiative (NSCI). •Total budget of $19.42M for five years, potentially renewable to ten years. •Joint support from numerous NSF divisions: Advanced Cyberinfrastructure (ACI), Chemistry (CHE), and Division of Materials Research (DMR) •Designed to serve and enhance the software development efforts of the broad field of computational molecular science. Computational Molecular Sciences (CMS)

• The history of CMS – the sub-fields of , computational materials science, and biomolecular simulation – reaches back decades to the genesis of computational science. • CMS is now a “full partner with experiment”. • For an impressive array of chemical, biochemical, and materials challenges, our community has developed simulations and models that directly impact: • Development of new chiral drugs; • Elucidation of the functionalities of biological macromolecules; • Development of more advanced materials for solar-energy storage, technology for CO2 sequestration, etc. CMS Codes Are Developed and Used Globally

Gaussian

Dalton

Amber

MOLPRO CMS Codes Are Developed and Used Globally

Gaussian

Dalton These codes represent decades of development by thousands of programmers, and are used by hundredsAmber of thousands of scientists worldwide.

MOLPRO Code Complexity and Historical Legacy •CMS programs contain millions of lines of hand-written code and require hundreds of programmers to develop and maintain. •Incredible language diversity: F77, F90, F95, HPF, C, C++, C++11, C++14,C++17, Python, perl, Javascript, etc. •Incredible algorithmic diversity: structured and unstructured grids, dense and sparse linear algebra, graph traversal, fast Fourier transforms, MapReduce, and more. •The packages have evolved in an ad hoc manner over decades because of the intricacy of the scientific problems they are designed to solve. Rapidly Evolving Computing Hardware •Multi- and many-core architectures are the norm, but many CMS codes are developed with limited view to parallel task management. •Reduced-power solutions will also require improved error recovery and checkpointing at the software level – capabilities absent in nearly all CMS codes. •Anticipated architectural innovations will yield even greater hardware complexity – more advanced accelerators, specialized computing cores, reconfigurable logic… •Many CMS codes (especially for quantum chemistry) are limited to shared-memory paradigms and cannot yet take advantage of GPUs or large-scale distributed-memory systems. Inertia in the Scientific Education Culture •Undergraduate programs in chemistry and physics typically require no training in software development or programming. •Graduate programs in these areas require minimal coursework between the bachelor and Ph.D. •Most computer science students lack the underlying knowledge of the scientific domains to help develop creative software solutions. •Due credit for software development is elusive due to a culture that judges productivity based on citations of peer-reviewed papers. •Thus, a “just get the physics working” approach pervades much of CMS software development. MolSSI Goal #1

To Provide Software Expertise and Infrastructure… • MolSSI will work with CMS research groups nationwide and internationally to design, develop, test, deploy, and maintain key code infrastructure and frameworks for the entire community. • MolSSI will interact with partners in industry, NSF supercomputing centers, national laboratories, and international facilities to identify and act on emerging hardware trends, access leading-edge computing architectures, further educational goals, set software priorities, and identify future workforce career paths. MolSSI Goal #2 To Provide Education and Training… • MolSSI will serve as an education and outreach nexus for the worldwide CMS community. • MolSSI will organize summer schools, targeted workshops, high- school and undergraduate training programs, and on-line resources and classes to provide current and future CMS students with a modern and complete set of programming skills. • MolSSI will reach beyond the traditional student cohort to computer scientists and mathematicians seeking interdisciplinary applications. • MolSSI will deploy a Professional Master’s program in Molecular Simulation and Software Engineering. MolSSI Goal #3

To Provide Community Engagement and Leadership… • MolSSI will enable the CMS community to establish its own standards for interoperability, best practices, and curation tools. • Through a “grass roots” approach, MolSSI will engage the community broadly using interoperability workshops and focus groups – and ultimately the formation of a Molecular Sciences Consortium – to catalyze the consensus needed for standardization of data structures, APIs, and frameworks for the entire CMS software ecosystem. The Molecular Sciences Software Institute

Software

Dev Team #1 Dev Team Board of Directors #2

Dev Team #3 Science & Software Advisory Board

Community Software Fellows MolSSI Software Scientists (MSSs) •A team of ~12 software engineering experts, drawn both from newly minted Ph.D.s and established researchers in molecular sciences, computer science, and applied mathematics. •Dedicated to multiple responsibilities: • Developing software infrastructure and frameworks; • Interacting with CMS research groups and community code developers; • Providing forums for standards development and resource curation; • Serving as mentors to MolSSI Software Fellows; • Working with industrial, national laboratory, and international partners; Approximately 50% of the Institute’s budget will directly support the MolSSI Software Scientists. MolSSI Software Scientists

Doaa Altarawy Taylor Barnes Ph.D., Virginia Tech Ph.D., Caltech Computer Science Quantum Chemistry, Machine Learning, High Performance Computing, Software Engineering Code Optimization MolSSI Software Scientists

Eliseo Marin-Rimoldi Jessica Nash Ph.D., Univ. Notre Dame Ph.D., N.C. State Univ. Monte Carlo Methods, Phase Soft Materials, Equilibria, Statistical Thermodynamics MolSSI Software Scientists

Benjamin Pritchard Daniel Smith Ph.D., Univ. Buffalo Ph.D., Auburn University Quantum Chemistry, Quantum Chemistry, Interoperability Frameworks, Workflows, Microarchitecture Optimization Rapid Prototyping Current MolSSI Software Infrastructure Projects

•MolSSI Integral Reference Project (MIRP) – provides both reference data and reference implementations of common integrals found in ; •MolSSI Code Database – convenient and up-to-date information on CMS community codes; •MolSSI Energy Expression Exchange – to allow translations of forcefields between molecular dynamics codes; Current MolSSI Software Infrastructure Projects

•MolSSI Framework – a light- weight Python-based plugin structure for interoperability of CMS codes for new scientific calculations;

•MolSSI/PNNL Basis-Set Exchange – an overhaul and expansion of the well-known basis-set exchange originally developed by Pacific Northwest National Labs in the Environmental Molecular Sciences Laboratory; Current MolSSI Software Infrastructure Projects •MolSSI QM/MM Driver – an API and communication layer for a control code using QM and MM codes as clients for QM/MM and other similar calculations; •MolSSI QM Schema – a JSON-based standard for common data to enable more complex workflows among quantum chemistry codes; •MolSSI QM Database – open, community-driven, multi-use quantum chemistry databases of benchmarks, force-field- related datasets, and other information for data mining and other machine learning initiatives MolSSI Software Fellows (MSFs) •A cohort of ~20 Fellows supported simultaneously – graduate students and postdocs selected by the Science and Software Advisory Board from research groups across the U.S. •Fellows will work directly with both the Software Scientists and the MolSSI Directors, thus providing a conduit between the Institute and the CMS community itself. •Fellows will work on their own projects, as well as contribute to the MolSSI development efforts, and they will engage in outreach and education activities under the Institute guidance. •Funding for MolSSI Software Fellows will follow a flexible, two-phase structure, providing up to two years of support. Approximately 25% of the Institute’s budget will directly support the MolSSI Software Fellows. 2018 Phase-I MolSSI Software Fellows

Adam Abbott (U Georgia) Joshua Rackers (Washington U in St. Louis) Caitlin Banna (UC Irvine) Jocelyn Sunseri (U Pittsburgh) Dr. M. Eric Irrgang (U Virginia) Dr. Tyler Takeshita (UC Berkeley) Aaron Mahler (Duke U) Ruslan Tazhigulov (Boston U) Dr. James McClain (Cal Tech) Jing Yang (U Pennsylvania) Justin Provazza (Boston U) 2018 Phase-I MolSSI Software Fellows

Joshua Rackers (Washington U in St. Louis) Adam Abbott (U Georgia) Jocelyn Sunseri (U Pittsburgh) Caitlin Banna (UC Irvine) Dr. Tyler Takeshita (UC Berkeley) Dr. M. Eric Irrgang (U Virginia) Ruslan Tazhigulov (Boston U) Aaron Mahler (Duke U) Jing Yang (U Pennsylvania) Dr. James McClain (Cal Tech) Justin Provazza (Boston U) 2018 Phase-II MolSSI Software Fellows

Srinivas Mushnoori Rutgers University Adviser: Prof. Meenakshi Dutt “Development and extension of a flexible, scalable Replica Exchange framework”

Kate Lebold Penn State University Adviser: Prof. William Noid “Thermodynamic state point transferability of coarse-grained models through the force-, pressure-, and energy-matching methodologies” 2018 Phase-II MolSSI Software Fellows

Erik Thiede University of Chicago Advisers: Profs. Aaron Dinner and Jonathan Weare “Protein dynamics in the absence of collective variables through dynamics-dependent statistics and the study of partial differential equations”

Marc Riera-Riambau University of California, San Diego Adviser: Prof. Francesco Paesani “Development and application of many-body potential energy functions in a user friendly software interface” 2018 Phase-II MolSSI Software Fellows

David Williams-Young University of Washington Adviser: Prof. Xiaosong Li “High-performance and scalable relativistic electronic structure methods for the ab initio prediction of molecular properties”

Oanh Vu Vanderbilt University Adviser: Prof. Jens Meiler “Machine-learning models used for drug discovery through local similarity-based descriptors with traditional autocorrelation descriptors” 2018 Phase-II MolSSI Software Fellows

Boyi Zhang University of Georgia Adviser: Prof. Fritz Schaefer “A unified software platform that effectively implements various adaptive Quantum Mechanics/ Molecular Mechanics (QM/MM) methods” Education and Training MolSSI 2017 Software Summer School

Focus: Graduate-Student Training in Software Fundamentals Virginia Tech July 24-August 2, 2017 Organizers: Daniel Smith, John Chodera, Lori Burns, and Daniel Crawford MolSSI 2017 Software Summer School • Primary objectives: 56 Graduate Students (16F/40M) 35 Universities • Teamwork 9 Instructors (4 external) • Best Practices 10 Days • Modern tools for modern software • Diverse programming knowledge • Ensure content is relevant to students

• Core Components: • Software Carpentry style courses • Learning groups • Taught by experts in the field MolSSI Coding Workshop at the MERCURY Conference

Focus: Undergraduate Training in Molecular Dynamics Simulations Furman University July 22-23, 2017 Organizers: Paul Nerenberg, Lee-Ping Wang, and Theresa Windus

Participants: 32 Undergraduates (18 Female) & 3 Faculty from 9 PUIs

Second Workshop: July 18-19, 2018 for up to 50 students Community Engagement and Leadership MolSSI Community Code Partners So Far…

•ACESII • •Molpro •PARSEC •ADF •DL_POLY •MPQC •PCMSolver •Amber •ELSI •MRChem •PLUMED •APBS •FHI-aims •NAMD •PSI4 •BOSS •GAMESS •NWChem •Q-Chem •CFOUR •Gaussian •NWChemEx •QBox •CHARMM •Gromacs •ONETEP •QMworks •Columbus •LAMMPS •OpenMM •Quantum ESPRESSO •Dalton •Molcas •Orca •Schrödinger •Tiger-CI •Turbomole •VASP

We encourage all community codes in the computational molecular sciences to work with us! The Molecular Sciences Consortium (MSC) •Community driven – what does the community need/want? •Establish standards or interoperability mechanisms in software and data. •Establish best practices for simulations and development. •Curation – organization and integration. •Coordinate information exchange between groups developing particular software components, including SSE/SSI projects and community code developers. •Accept proposals for focus groups to identify specific areas of interest. •Draft charter and operating procedures are in place. Current Working Groups

•Forcefield Interoperability: •Paul Saxe, Chris Bayly, John Chodera, Erick Lindahl, David Mobley •Data Formats for Quantum Mechanics Information: •Daniel Smith, Wibe deJong, Marcus Hanwell, Aaron Virshup; Doaa Altarawy, Paul Ayers, Nick Bauman, Lori Burns, Matthew Chan, Evgeny Epifanovsky, Sarom Leang, Joseph Montoya, David Sherrill, JR Schmidt, Benjamin Pritchard, Eduard Valeyev, Marat Valiev, William Polik, Toon Verstraelen, Vamsee Voora •Shared Memory Tensor Library Interfaces: •Theresa Windus, Evgeny Epifanovsky, Robert Harrison, Dmitry Liakh, Devin Matthews, Saday Sadayappan, Beverly Sanders, Daniel Smith, Eduard Valeyev Proposed working groups

•Workflows •Basis set description/tool •QM/MM interfaces •Software best practices

Discussions at the MolSSI interoperability workshop, June 2017 Upcoming MolSSI Workshops: Community-Driven

• Science Gateways and Computational Chemistry, New Orleans, Louisiana, TODAY (Organizers: Nancy Wilkins-Diehr, Marlon Pierce, Katherine Lawrence, Sudhakar Pamidighantam) •Python for Quantum Chemistry and Materials Simulation Software, Pasadena, California, June 2018 (Organizer: Qiming Sun)

•Tinker Modeling Software Workshop, Austin, Texas, June 2018 (Organizers: Pengyu Ren and Jay Ponder) MolSSI: Some Highlights So Far •Hired seven Software Scientists in the last ten months. Currently recruiting five more. •Ten software workshops held so far; another six planned. •New software components currently under development including forcefield interoperability frameworks, a reference integral implementation, a new basis set exchange, a molecular simulation framework, an open QM database, and more. •Community-driven working groups established in forcefield interoperability, quantum chemistry data exchange, and tensor algebra interfaces. •Held one software summer school and a bootcamp; another six educational workshops coming in 2018. •18 Software Fellows now funded, and a competition for a third cohort now open. •Proposal window for more community-led Software Workshops will open this summer. Acknowledgments

• Cecilia Clementi, Robert Harrison, Teresa Head-Gordon, Shantenu Jha, Anna Krylov, Vijay Pande, Theresa Windus; • The dozens of members of the CMS community who helped to develop the vision for the Institute over the last five years; • NSF ACI-1547580. Watch molssi.org for the latest information!