Understanding Biomolecular Structure and Dynamics by Overcoming Barriers to Conformational Sampling

BLUE WATERS ANNUAL REPORT 2016 UNDERSTANDING BIOMOLECULAR STRUCTURE AND DYNAMICS BY OVERCOMING BARRIERS TO CONFORMATIONAL SAMPLING Allocation: NSF PRAC/2.00 Mnh PI: Thomas Cheatham1 Co-PI: Adrian Roitberg2, Carlos Simmerling3, and David Case4 Collaborators: Darrin York4, and Shantenu Jha4 FIGURE 1: Relative 1University of Utah performance 2University of Florida 3Stonybrook University of CPPTRAJ on 4Rutgers University different nodes when determining the 965 closest EXECUTIVE SUMMARY parallel scaling since, for a fixed system size, adding solvent molecules additional cores does not increase performance. To out of 15,022 to Large ensembles of independent molecular dynamics, FIGURE 3: scattering (SAXS) analysis has previously shown overcome this, the community has moved toward 4,143 solute atoms NEXT GENERATION WORK running optimized AMBER code on Blue Waters’ Organization that three conformations are observed for linker 10 application of ensemble methods and application from a 2,000 frame GPUs, enable full sampling of the conformational of Clostridium (Fig. 3). GSAFold is capable of predicting these three Our primary goal is to obtain a clear picture of of various enhanced sampling methodologies that MF trajectory ensemble of biomolecules, including DNA helices, thermocellum conformations and all the other conformations for the cellulosome structure at work. For that, long couple together independent molecular dynamics (no imaging) RNA tetranucleotides, and RNA tetraloops. This cellulases and CipA. To perform this analysis, 20,000 conformations molecular dynamics simulations of different (MD) simulation engines. AMBER, a suite of using various allows detailed validation and assessment of hemicellulases were obtained per linker and clustered. Combined, cellulosomes, some of them with hundreds of programs for biomolecular simulation whose latest parallelization enhanced sampling approaches and biomolecular in the SdbA/CipA these linker conformations would give us 1043 CipA millions of atoms, will have to be performed. To version, AMBER 16, was released in April 2016, modalities, force fields and provides detailed insight into cellulosome. The conformations. From clustering, we reduce this investigate the enzymatic mechanism in the context has been highly optimized for use on GPUs. The including CUDA on biomolecular structure, dynamics, interactions, C. thermocellum number to 3888 structures that were obtained and of the cellulosome, hybrid quantum mechanics optimized GPU code, and the ensembles that are the XK nodes. and function. The ensemble simulations currently scaffoldin (CipA) also subjected to a cluster analysis that gave rise to (QM)/molecular mechanics simulations will have being performed are possible only on computational contains one CBM the five most significant structures. to be performed using multiple QM regions that hardware with large numbers of GPUs. While today (Green) and nine Following well-established protocols for large require massive computer power. Such complex our simulations are pushing the state of the art, such type I cohesins macromolecular systems [8,9], and using one of study might only be feasible in a few years, requiring large simulations will become routine within a few (Dark Blue) and the CipA conformations that we obtained using pre-exascale and exascale systems. years. The even larger and more powerful parallel thus organizes GSAFold, we built a first model of an entire resources available in the near future will enable a multiprotein cellulosome structure. MD simulations are now PUBLICATIONS AND DATA SETS molecular dynamics simulations to probe more complex with nine employed to study the quaternary structure stability. relevant biological time scales (milliseconds to enzymes (Red). The Schoeler, C., et al., Ultrastable cellulosome- seconds) and to study larger biomolecular assemblies C-terminal type adhesion complex tightens under load. Nat. more completely. II dockerin (Pink) WHY BLUE WATERS Commun. 5 (2014), p. 5635. domain of CipA Investigating the structure and functional processes Bernardi, R.C., M.C.R. Melo, and K. Schulten, binds specifically Enhanced Sampling Techniques in Molecular type II cohesin of large enzymatic complex machineries, such as the INTRODUCTION cellulosomes, is only possible on petascale computing Dynamics Simulations of Biological Systems. domains (Orange) Biomolecular simulation—although known as a resources, such as Blue Waters. Structures obtained Biochim. Biophys. Acta., 1850 (2015), p. 872. found in cell- powerful tool for probing the structure, dynamics, using enhanced sampling techniques, such as GSA, surface proteins. interactions, and functions of proteins and nucleic are only reliable if thousands of conformations The CipA linkers acids for over 40 years—is really coming of age thanks (models) are predicted. Employing GSA for the already studied to access to large-scale computational resources numerous linkers of the cellulosome is a well-suited using GSAFold/NAMD such as Blue Waters. Not only can simulations be task for the large-scale parallel architecture of Blue integration are applied to larger biomolecular assemblies, but for Waters. numbered. modest sized biomolecules the community has demonstrated the ability to fold proteins de novo and to fully sample the conformational distributions of various nucleic acid motifs. A challenge is 202 203 BLUE WATERS ANNUAL REPORT 2016 FIGURE 2 (LEFT): developing and applying ensemble methods. With WHY BLUE WATERS Bergonzo, C., K.B. Hall, and T.E. Cheatham, a variety of methodological variations and names III. Stem-loop V of Varkud satellite RNA exhibits Overlap of the ranging from replica-exchange MD, metadynamics, Within the NSF ecosystem of computational characteristics of the Mg2+ bound structure in the average structures, swarms, and Markov state modeling to constant pH resources, Blue Waters is the GPU-optimized presence of monovalent ions. J. Phys. Chem. B 119, omitting hydrogens MD and lambda dynamics, all of the approaches resource with a sufficiently large set of GPUs to allow (2015) pp. 12355-12364. and the two promise more efficient means to explore structural ensembles on the 300-3,000 scale (assuming a single Robertson, J.C. and T.E. Cheatham, III. DNA terminal base pairs ensembles, free energy pathways, and kinetics. ensemble instance per GPU). Our team has shown backbone BI/BII distribution and dynamics in on each end from Although these techniques are promising, there is the ability to converge the conformational ensemble E2 protein-bound environment determined by nearly milliseconds no free lunch, since as the size of the biomolecule of an RNA tetraloop with multidimensional replica molecular dynamics simulation. J. Phys. Chem. B of aggregated increases, sampling a complete ensemble takes longer exchange; using 360 GPUs, this requires about 2-3 119 (2015) pp. 14111-14119. MD simulation and longer. Various methods attempt to speed the microseconds of MD simulation per ensemble Galindo-Murillo, R., D.R. Davis, and T.E. data from 100 process through applications of enhanced sampling instance, or approximately five to 10 days of Cheatham, III. Probing the influence of independent 11 continuous MD simulation on those resources. Lys microsecond length in different degrees of freedom. However it is often hypermodified residues within the tRNA3 difficult to verify claims of efficiency, especially with anticodon stem loop interacting with the A-loop MD simulations method variants implemented into vastly different primer sequence from HIV-1. Biochemica Biophys. (omitting the first NEXT GENERATION WORK code bases. Therefore, we have explored making our Acta 1860 (2016) pp. 607-617. 2 microseconds) ensemble data available to the community online Ensemble-based biomolecular simulation methods Dissanayake, T., J. Swails, M. Harris, A.E. Roitberg, with the parmbsc1 (http://amber.utah.edu) to allow other researchers will continue to evolve in terms of their generality and D. York. Interpretation of pH-activity Profiles (blue) and AMBER ff15 to directly compare our results to results obtained and power. With next-generation computational for Acid-Base Catalysis from Molecular Simulations. (or ol15, red) force using different ensemble approaches and to assess resources, the community will be able to not Biochemistry 54 (2015) pp 1307-1313. fields compared to relative convergence and efficiency. only study larger biomolecular systems but also Hopkins, C., S. LeGrand, R.C. Walker, A.E. the PDB average The combination of large ensemble methods with to more fully sample and converge the accessible Roitberg. Long Time Step Molecular Dynamics structure from 1NAJ the availability of many fast GPUs on Blue Waters conformational space of these systems. While through Hydrogen Mass Repartitioning. J. Chem. (gray). leads to an explosion in data. In order to process at present we can aggregate to milliseconds of Theory Comp. 11 (2015) pp. 1864-1874. vast amounts of data efficiently, the AMBER MD effective sampling, to reach biological time scales Alvarez, L., A. Louis Ballester, A.E. Roitberg, D. trajectory analysis code CPPTRAJ has been modified we still need orders of magnitude greater sampling Estrin, S.-R. Yeh, M. Marti, L. Capece. Structural to include multiple levels of parallelism, aided by (to seconds and beyond) in simulations that likely study of a flexible active site loop in human a Petascale Application Improvement Discovery will require multiple GPUs or accelerators to reach indoleamine 2,3-dioxygenase

Understanding Biomolecular Structure and Dynamics by Overcoming Barriers to Conformational Sampling

The Architecture of the Protein Domain Universe

Cmse 520 Biomolecular Structure, Function And

Action of Multiple Rice -Glucosidases on Abscisic Acid Glucose Ester

Evolution of Biomolecular Structure Class II Trna-Synthetases and Trna

The Effect of Dicer Knockout on RNA Interference Using Various Dicer Substrate Interfering RNA Structures

Protein Structure Prediction and Design in a Biologically-Realistic Implicit Membrane

Topic 5: the Folding of Biopolymers – RNA and Protein Overview

Biomolecular Modeling and Simulation: a Prospering Multidisciplinary Field

A Glance Into the Evolution of Template-Free Protein Structure Prediction Methodologies Arxiv:2002.06616V2 [Q-Bio.QM] 24 Apr 2

Biological Information, Molecular Structure, and the Origins Debate Jonathan K

Appendix8 Shape Grammars in Chapter 9 There Is a Reference to Possible Applications of Shape Grammars. One Such Application Conc

Structure and Functions of Biomolecules - Web Course