AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics
bioRxiv preprint doi: https://doi.org/10.1101/2020.11.19.390187; this version posted November 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Lorenzo Casalino1†, Abigail Dommer1†, Zied Gaieb1†, Emilia P. Barros1, Terra Sztain1, Surl-Hee Ahn1, Anda Trifan2,3, Alexander Brace2, Anthony Bogetti4, Heng Ma2, Hyungro Lee5, Matteo Turilli5, Syma Khalid6, Lillian Chong4, Carlos Simmerling7, David J. Hardy3, Julio D. C. Maia3, James C. Phillips3, Thorsten Kurth8, Abraham Stern8, Lei Huang9, John McCalpin9, Mahidhar Tatineni10, Tom Gibbs8, John E. Stone3, Shantenu Jha5, Arvind Ramanathan2∗, Rommie E. Amaro1∗

1University of California San Diego, 2Argonne National Lab, 3University of Illinois at Urbana-Champaign, 4University of Pittsburgh, 5Rutgers University & Brookhaven National Lab, 6University of Southampton, 7Stony Brook University, 8NVIDIA Corporation, 9Texas Advanced Computing Center, 10San Diego Supercomputing Center
†Joint first authors, ∗Contact authors: [email protected], [email protected]

ABSTRACT
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike’s full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

KEYWORDS
molecular dynamics, deep learning, multiscale simulation, weighted ensemble, computational virology, SARS-CoV-2, COVID19, HPC, GPU, AI

ACM Reference Format:
Lorenzo Casalino1†, Abigail Dommer1†, Zied Gaieb1†, Emilia P. Barros1, Terra Sztain1, Surl-Hee Ahn1, Anda Trifan2,3, Alexander Brace2, Anthony Bogetti4, Heng Ma2, Hyungro Lee5, Matteo Turilli5, Syma Khalid6, Lillian Chong4, Carlos Simmerling7, David J. Hardy3, Julio D. C. Maia3, James C. Phillips3, Thorsten Kurth8, Abraham Stern8, Lei Huang9, John McCalpin9, Mahidhar Tatineni10, Tom Gibbs8, John E. Stone3, Shantenu Jha5, Arvind Ramanathan2∗, Rommie E. Amaro1∗. 2020. AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics. In Supercomputing ’20: International Conference for High Performance Computing, Networking, Storage, and Analysis. ACM, New York, NY, USA, 14 pages. https://doi.org/finalDOI

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Supercomputing ’20, November 16–19, 2020, Virtual
© 2020 Association for Computing Machinery.
ACM ISBN ISBN...$15.00
https://doi.org/finalDOI

1 JUSTIFICATION
We:
• develop an AI-driven multiscale simulation framework to interrogate SARS-CoV-2 spike dynamics,
• reveal the spike’s full glycan shield and discover that glycans play an active role in infection, and
• achieve new high watermarks for classical MD simulation of viruses (305 million atoms) and the weighted ensemble method (600,000 atoms).

2 PERFORMANCE ATTRIBUTES
Performance Attribute: Our Submission
Category of achievement: Scalability, Time-to-solution
Type of method used: Explicit, Deep Learning
Results reported on the basis of: Whole application including I/O
Precision reported: Mixed Precision
System scale: Measured on full system
Measurement mechanism: Hardware performance counters, Application timers, Performance Modeling

3 OVERVIEW OF THE PROBLEM
The SARS-CoV-2 virus is the causative agent of COVID19, a worldwide pandemic that has infected over 35 million people and killed over one million. As such, it is the subject of intense scientific investigation. Researchers are interested in understanding the structure and function of the proteins that constitute the virus, as this knowledge aids in the understanding of transmission, infectivity, and potential therapeutics.

A number of experimental methods, including x-ray crystallography, cryo-electron microscopy (cryo-EM), and cryo-EM tomography, are able to inform on the structure of viral proteins and the other (e.g., host cell) proteins with which the virus interacts. Such structural information is vital to our understanding of these molecular machines; however, there are limits to what experiments can tell us.
For example, achieving high resolution structures typically comes at the expense of dynamics: flexible parts of the proteins (e.g., loops) are often not resolved, or frequently not even included in the experimental construct. Glycans, the sugar-like structures that decorate viral surface proteins, are particularly flexible and thus experimental techniques are currently unable to provide detailed views into their structure and function beyond a few basic units. Additionally, these experiments can resolve static snapshots, perhaps catching different states of the protein, but they are unable to elucidate the thermodynamic and kinetic relationships between such states.

In addition to the rich structural datasets, researchers have used a variety of proteomic, glycomic, and other methods to determine detailed information about particular aspects of the virus. In one example, deep sequencing methods have informed on the functional implications of mutations in a key part of the viral spike protein [57]. In others, mass spectrometry approaches have provided information about the particular composition of the glycans at particular sites on the viral protein [54, 69]. These data are each valuable in their own right but exist as disparate islands of knowledge. Thus there is a need to integrate these datasets into cohesive models, such that the fluctuations of the viral particle and its components that cause its infectivity can be understood.

In this work, we used all-atom molecular dynamics (MD) simulations to combine, augment, and extend available experimental datasets in order to interrogate the structure, dynamics, and function of the SARS-CoV-2 spike protein (Fig. 1). The spike protein is considered the main infection machinery of the virus because it is the only glycoprotein on the surface of the virus and it is the molecular machine that interacts with the human host cell receptor,

[…] vaccine and therapeutic development, inform on basic mechanisms of viral infection, push technological and methodological limits for molecular simulation, and bring supercomputing to the forefront in the fight against COVID19.

3.1 Methods
Full-length, fully-glycosylated spike protein. In this work, we built two full-length glycosylated all-atom models of the SARS-CoV-2 S protein in both closed and open states, fully detailed in Casalino et al. [10]. The two all-atom models were built starting from the cryo-EM structures of the spike in the open state (PDB ID: 6VSB [70]), where one receptor binding domain (RBD) is in the “up” conformation, and in the closed state, bearing instead three RBDs in the “down” conformation (PDB ID: 6VXX [66]). Given that the experimental cryo-EM structures were incomplete, the remaining parts, namely (i) the missing loops within the head (residues 16–1141), (ii) the stalk (residues 1141–1234) and (iii) the cytosolic tail (residues 1235–1273), were modelled using MODELLER [51] and I-TASSER [79]. The resulting full-length all-atom constructs were subsequently N-/O-glycosylated using the Glycan Reader & Modeler tool [24] integrated into Glycan Reader [25] in CHARMM-GUI [38]. Importantly, an asymmetric glycoprofile was generated (e.g., not specular across monomers) taking into account the N-/O-glycans heterogeneity as described in the available glycoanalytic data [54, 69]. The two glycosylated systems were embedded into their physiological environment composed of an ERGIC-like lipid bilayer [11, 65] built using CHARMM-GUI [24, 72], explicit TIP3P water molecules [26], and neutralizing chloride and sodium ions at 150 mM concentration,