
BLUE WATERS ANNUAL REPORT 2016

scattering (SAXS) analysis has previously shown that three conformations are observed for linker 10 (Fig. 3). GSAFold is capable of predicting these three conformations and all the other conformations for CipA. To perform this analysis, 20,000 conformations were obtained per linker and clustered. Combined, these linker conformations would give us 10^43 CipA conformations. From clustering, we reduce this number to 3,888 that were obtained and also subjected to a cluster analysis that gave rise to the five most significant structures.

Following well-established protocols for large macromolecular systems [8,9], and using one of the CipA conformations that we obtained using GSAFold, we built a first model of an entire cellulosome structure. MD simulations are now employed to study the quaternary structure stability.

FIGURE 3: Organization of Clostridium thermocellum [...] and hemicellulases in the SdbA/CipA cellulosome. The C. thermocellum scaffoldin (CipA) contains one CBM (Green) and nine type I cohesins (Dark Blue) and thus organizes a multiprotein complex with nine [...] (Red). The C-terminal type II dockerin (Pink) domain of CipA binds specifically type II cohesin domains (Orange) found in cell-surface [...]. The CipA linkers already studied using GSAFold/NAMD integration are numbered.

WHY BLUE WATERS

Investigating the structure and functional processes of large enzymatic complex machineries, such as the cellulosomes, is only possible on petascale computing resources, such as Blue Waters. Structures obtained using enhanced sampling techniques, such as GSA, are only reliable if thousands of conformations (models) are predicted. Employing GSA for the numerous linkers of the cellulosome is a well-suited task for the large-scale parallel architecture of Blue Waters.

NEXT GENERATION WORK

Our primary goal is to obtain a clear picture of the cellulosome structure at work. For that, long molecular dynamics simulations of different cellulosomes, some of them with hundreds of millions of atoms, will have to be performed. To investigate the enzymatic mechanism in the context of the cellulosome, hybrid quantum mechanics (QM)/molecular mechanics simulations will have to be performed using multiple QM regions that require massive computer power. Such a complex study might only be feasible in a few years, requiring pre-exascale and exascale systems.

PUBLICATIONS AND DATA SETS

Schoeler, C., et al., Ultrastable cellulosome-adhesion complex tightens under load. Nat. Commun. 5 (2014), p. 5635.

Bernardi, R.C., M.C.R. Melo, and K. Schulten, Enhanced Sampling Techniques in Molecular Dynamics Simulations of Biological Systems. Biochim. Biophys. Acta, 1850 (2015), p. 872.

UNDERSTANDING BIOMOLECULAR [...] AND DYNAMICS BY OVERCOMING BARRIERS TO CONFORMATIONAL SAMPLING

Allocation: NSF PRAC/2.00 Mnh
PI: Thomas Cheatham^1
Co-PI: Adrian Roitberg^2, Carlos Simmerling^3, and David Case^4
Collaborators: Darrin York^4 and Shantenu Jha^4

^1 University of Utah
^2 University of Florida
^3 Stony Brook University
^4 Rutgers University

EXECUTIVE SUMMARY

Large ensembles of independent molecular dynamics, running optimized AMBER code on Blue Waters' GPUs, enable full sampling of the conformational ensemble of [...], including DNA helices, RNA tetranucleotides, and RNA [...]. This allows detailed validation and assessment of enhanced sampling approaches and biomolecular force fields and provides detailed insight into structure, dynamics, interactions, and function. The ensemble simulations currently being performed are possible only on computational hardware with large numbers of GPUs. While today our simulations are pushing the state of the art, such large simulations will become routine within a few years. The even larger and more powerful parallel resources available in the near future will enable molecular dynamics simulations to probe more relevant biological time scales (milliseconds to seconds) and to study larger biomolecular assemblies more completely.

INTRODUCTION

Biomolecular simulation, although known as a powerful tool for probing the structure, dynamics, interactions, and functions of proteins and nucleic acids for over 40 years, is really coming of age thanks to access to large-scale computational resources such as Blue Waters. Not only can simulations be applied to larger biomolecular assemblies, but for modest sized biomolecules the community has demonstrated the ability to fold proteins de novo and to fully sample the conformational distributions of various motifs. A challenge is parallel scaling since, for a fixed system size, adding additional cores does not increase performance. To overcome this, the community has moved toward application of ensemble methods and application of various enhanced sampling methodologies that couple together independent molecular dynamics (MD) simulation engines. AMBER, a suite of programs for biomolecular simulation whose latest version, AMBER 16, was released in April 2016, has been highly optimized for use on GPUs. The optimized GPU code, and the ensembles that are produced, provide a powerful means to assess and validate the available force fields and to apply these methodologies to give novel biological insight into protein and nucleic acid structure and dynamics. Improving the codes, methods, and force fields while also pushing ensemble methods to their limits are critical research activities since these tools are being used by an ever-increasing pool of researchers throughout the world.

METHODS & RESULTS

The coupling together of independent molecular dynamics simulations into loosely coupled ensembles to enhance conformational sampling is an increasingly used modality for applications in biomolecular simulation. If you peruse any recent journal in the field, including The Journal of Chemical Theory and Computation, The Journal of Computational Chemistry, and The Journal of Physical Chemistry, among others, you will see multiple publications developing and applying ensemble methods. With a variety of methodological variations and names ranging from replica-exchange MD, metadynamics, swarms, and Markov state modeling to constant pH MD and lambda dynamics, all of the approaches promise more efficient means to explore structural ensembles, free energy pathways, and kinetics. Although these techniques are promising, there is no free lunch, since as the size of the system increases, sampling a complete ensemble takes longer and longer. Various methods attempt to speed the process through applications of enhanced sampling in different degrees of freedom. However, it is often difficult to verify claims of efficiency, especially with method variants implemented into vastly different code bases. Therefore, we have explored making our ensemble data available to the community online (http://amber.utah.edu) to allow other researchers to directly compare our results to results obtained using different ensemble approaches and to assess relative convergence and efficiency.

The combination of large ensemble methods with the availability of many fast GPUs on Blue Waters leads to an explosion in data. In order to process vast amounts of data efficiently, the AMBER MD trajectory analysis code CPPTRAJ has been modified to include multiple levels of parallelism, aided by a Petascale Application Improvement Discovery (PAID) collaboration with Blue Waters. Specifically, CPPTRAJ now implements four levels of parallelism: MPI parallelism over ensemble instances (with sorting of data across all ensembles), MPI parallelism over reading/writing of trajectory files in a given ensemble, OpenMP parallelism for time-intensive analyses, and most recently GPU parallelization of time-intensive analyses involving calculations of a large number of distances. Relative performance is shown in Fig. 1.

FIGURE 1: Relative performance of CPPTRAJ on different nodes when determining the 965 closest solvent [...] out of 15,022 to 4,143 solute [...] from a 2,000 frame MF trajectory (no imaging) using various parallelization modalities, including CUDA on the XK nodes.

Key results of our ensemble approaches are described in greater detail in the listed publications and range from accurate modeling of magnesium-dependent conformational changes in RNA, correct modeling of proteins binding and modulating DNA structure, and optimization, assessment, and validation of nucleic acid force fields. This includes fully converging the conformational and dynamic distribution of the Dickerson-Drew dodecamer with average structures over milliseconds of aggregated MD data less than 0.8 Å from the published experimental structures as shown in Fig. 2.

FIGURE 2: Overlap of the average structures, omitting hydrogens and the two terminal base pairs on each end from nearly milliseconds of aggregated MD simulation data from 100 independent 11 microsecond length MD simulations (omitting the first 2 microseconds) with the parmbsc1 (blue) and AMBER ff15 (or ol15, red) force fields compared to the PDB average structure from 1NAJ (gray).

WHY BLUE WATERS

Within the NSF ecosystem of computational resources, Blue Waters is the GPU-optimized resource with a sufficiently large set of GPUs to allow ensembles on the 300-3,000 scale (assuming a single ensemble instance per GPU). Our team has shown the ability to converge the conformational ensemble of an RNA with multidimensional replica exchange; using 360 GPUs, this requires about 2-3 microseconds of MD simulation per ensemble instance, or approximately five to 10 days of continuous MD simulation on those resources.

NEXT GENERATION WORK

Ensemble-based biomolecular simulation methods will continue to evolve in terms of their generality and power. With next-generation computational resources, the community will be able to not only study larger biomolecular systems but also to more fully sample and converge the accessible conformational space of these systems. While at present we can aggregate to milliseconds of effective sampling, to reach biological time scales we still need orders of magnitude greater sampling (to seconds and beyond) in simulations that likely will require multiple GPUs or accelerators to reach system sizes of hundreds of thousands to millions of atoms and beyond. The kind of ensemble analyses we are currently undertaking, made possible by Blue Waters, will be effectively routine by 2019-20 and will enable the larger community while we push the edge of what is possible on future petascale resources.

PUBLICATIONS AND DATA SETS

http://amber.utah.edu

Galindo-Murillo, R., J.C. Garcia-Ramos, L. Ruiz-Azuara, T.E. Cheatham, III, and F. Cortes-Guzman. Intercalation processes of copper complexes in DNA. Nuc. Acids Res. 43 (2015) pp. 5364-5376.

Bergonzo, C., N. Henriksen, D.R. Roe, and T.E. Cheatham, III. Highly sampled tetranucleotide and tetraloop motifs enable evaluation of common RNA force fields. RNA 29 (2015) pp. 1578-1590.

Bergonzo, C. and T.E. Cheatham, III. Improved force field parameters lead to a better description of RNA structure. J. Chem. Theory Comp. 11 (2015) pp. 3969-3972.

Bergonzo, C., K.B. Hall, and T.E. Cheatham, III. Stem-loop V of Varkud satellite RNA exhibits characteristics of the Mg2+ bound structure in the presence of monovalent ions. J. Phys. Chem. B 119 (2015) pp. 12355-12364.

Robertson, J.C. and T.E. Cheatham, III. DNA backbone BI/BII distribution and dynamics in E2 protein-bound environment determined by molecular dynamics simulation. J. Phys. Chem. B 119 (2015) pp. 14111-14119.

Galindo-Murillo, R., D.R. Davis, and T.E. Cheatham, III. Probing the influence of hypermodified residues within the tRNA3Lys anticodon stem loop interacting with the A-loop primer sequence from HIV-1. Biochim. Biophys. Acta 1860 (2016) pp. 607-617.

Dissanayake, T., J. Swails, M. Harris, A.E. Roitberg, and D. York. Interpretation of pH-activity Profiles for Acid-Base Catalysis from Molecular Simulations. Biochemistry 54 (2015) pp. 1307-1313.

Hopkins, C., S. LeGrand, R.C. Walker, A.E. Roitberg. Long Time Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comp. 11 (2015) pp. 1864-1874.

Alvarez, L., A. Louis Ballester, A.E. Roitberg, D. Estrin, S.-R. Yeh, M. Marti, L. Capece. Structural study of a flexible loop in human indoleamine 2,3-dioxygenase and its functional implications. Biochemistry 55 (2016) pp. 2785-2793.

Nguyen, H., A. Perez, S. Bermeo, and C. Simmerling. Refinement of generalized Born implicit solvation parameters for nucleic acids and their complexes with proteins. J. Chem. Theory Comp. 11 (2016) pp. 3714-3728.

Maier, J.A., C. Martinez, K. Kasavajhala, L. Wickstrom, K.E. Hauser, and C. Simmerling. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comp. 11 (2015) pp. 3696-3713.

Lai, C.-T., H.-J. Li, W. Yu, S. Shah, G.R. Bommineni, V. Perrone, M. Garcia-Diaz, P.J. Tonge, and C. Simmerling. Rational Modulation of the Induced-Fit Conformational Change for Slow-Onset Inhibition in Mycobacterium tuberculosis InhA. Biochemistry 54 (2015) pp. 4683-4691.
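
The closest-solvent selection benchmarked in Fig. 1 is the kind of distance-heavy analysis that benefits from the added parallelism. The following is a minimal serial sketch of that selection, not CPPTRAJ's implementation. The function names, the minimum-over-atoms distance definition, and the reading of the caption's counts as solvent molecules and solute atoms are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance with no periodic imaging."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_solvent(solute: Sequence[Point],
                    solvent: Sequence[Sequence[Point]],
                    n_keep: int) -> List[int]:
    """Return indices of the n_keep solvent molecules nearest the solute.

    Assumption: a molecule's distance is the minimum over all of its atoms
    and all solute atoms, so every solute/solvent atom pair is evaluated.
    """
    if n_keep < 0:
        raise ValueError("n_keep must be non-negative")
    scored = []
    for index, molecule in enumerate(solvent):
        nearest = min(squared_distance(atom, s)
                      for atom in molecule for s in solute)
        scored.append((nearest, index))
    scored.sort()  # ties are broken by molecule index
    return [index for _, index in scored[:n_keep]]


def distance_evaluations(n_solute_atoms: int,
                         n_solvent_sites: int,
                         n_frames: int) -> int:
    """Number of pair distances needed for a whole trajectory."""
    return n_solute_atoms * n_solvent_sites * n_frames


# Toy frame: one solute atom at the origin and three two-atom solvent molecules.
toy_solute = [(0.0, 0.0, 0.0)]
toy_solvent = [
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (0.0, 4.0, 0.0)],
]
toy_selection = closest_solvent(toy_solute, toy_solvent, 2)
```

Counting only one site per solvent molecule, the caption's sizes (4,143 by 15,022 over 2,000 frames) already imply on the order of 10^11 distance evaluations. Each pair is independent of the others, so the work divides naturally across threads or GPU cores.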

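
The aggregate sampling figures quoted in the report follow from simple multiplication. The short sketch below reproduces that arithmetic using only the numbers stated in the text. The function name is hypothetical, and treating every ensemble instance as contributing equally is an assumption.

```python
def aggregate_sampling_us(n_runs: int,
                          run_length_us: float,
                          discarded_us: float = 0.0) -> float:
    """Total retained sampling, in microseconds, over independent runs."""
    if discarded_us > run_length_us:
        raise ValueError("cannot discard more than the run length")
    return n_runs * (run_length_us - discarded_us)


# Fig. 2 data set: 100 independent 11-microsecond runs, first 2 microseconds omitted.
dna_total_us = aggregate_sampling_us(100, 11.0, 2.0)
dna_total_ms = dna_total_us / 1000.0

# Replica exchange on 360 GPUs at 2 to 3 microseconds per ensemble instance.
remd_low_us = aggregate_sampling_us(360, 2.0)
remd_high_us = aggregate_sampling_us(360, 3.0)
```

Under these assumptions the Fig. 2 data set retains 900 microseconds, which matches the caption's "nearly milliseconds" of aggregated data, and the 360-GPU replica-exchange runs aggregate between 720 and 1,080 microseconds.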