UC San Diego Previously Published Works
Title Advances in Rosetta protein structure prediction on massively parallel systems
Permalink https://escholarship.org/uc/item/87g6q6bw
Journal IBM Journal of Research and Development, 52(1)
ISSN 0018-8646
Authors Raman, S. Baker, D. Qian, B. et al.
Publication Date 2008
Peer reviewed
Advances in Rosetta protein structure prediction on massively parallel systems
S. Raman, B. Qian, D. Baker, R. C. Walker
One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the "native state" lies at the bottom of a free-energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/L system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (>10^12 floating-point operations per second) machines.
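The strategy of simulating many independent trajectories and keeping the lowest-energy results can be illustrated with a minimal sketch. This is not the Rosetta code or its energy function: `toy_energy`, `run_trajectory`, `lowest_energy_models`, and all parameter values are hypothetical, and a simple Metropolis Monte Carlo search over a handful of angles stands in for a folding trajectory.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged landscape; global minimum (0.0) when every angle is 0."""
    return sum(1.0 - math.cos(a) + 0.3 * (1.0 - math.cos(3.0 * a)) for a in angles)


def run_trajectory(seed, n_dof=10, n_steps=2000, temperature=0.5):
    """One independent Metropolis trajectory; returns (lowest energy seen, its angles)."""
    rng = random.Random(seed)  # private generator, so trajectories share no state
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_dof)]
    energy = toy_energy(angles)
    best_energy, best_angles = energy, list(angles)
    for _ in range(n_steps):
        i = rng.randrange(n_dof)           # pick one degree of freedom
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.3)  # small random perturbation
        new_energy = toy_energy(angles)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_energy, best_angles = energy, list(angles)
        else:
            angles[i] = old                # reject the move
    return best_energy, best_angles


def lowest_energy_models(n_trajectories=50, n_keep=5):
    """Run independent trajectories (one seed each) and keep the lowest-energy results."""
    results = [run_trajectory(seed) for seed in range(n_trajectories)]
    results.sort(key=lambda r: r[0])
    return results[:n_keep]


if __name__ == "__main__":
    for energy, _ in lowest_energy_models():
        print(f"{energy:.4f}")
```

Because each trajectory depends only on its own seed, the loop in `lowest_energy_models` could in principle be distributed across processors with no communication until the final sort, which is the property that makes this style of sampling a natural fit for massively parallel machines.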
Introduction

With genetic sequencing completed for the entire genome of a number of organisms including humans, the next challenge is to functionally characterize the proteins encoded by the genes and to understand their roles and interactions in cellular pathways. High-resolution three-dimensional (3D) protein structures can help in understanding biological functions of proteins and also explain the underlying molecular mechanisms of protein interactions. The availability of high-resolution protein structures should significantly accelerate efforts toward targeted drug development.

It has been known for more than 40 years that the 3D structures of proteins under normal physiological conditions are uniquely determined by the composition of their amino acids, which form a sequence called the primary structure of a protein [1]. However, despite considerable technical advances, the experimental determination of protein structures by nuclear magnetic resonance (NMR) and x-ray diffraction techniques remains slow, expensive, and arduous. In particular, the rate at which protein structures are being experimentally solved is lagging far behind the explosive rate at which protein sequence information is being gathered by high-throughput genome sequencing efforts. Thus, given an amino-acid sequence, a high-throughput methodology to computationally predict protein structures at atomic-level accuracy is one of the long-standing challenges in computational biology.

If the protein of interest shares a reasonable sequence similarity with a protein of known structure, then the latter can be used as a template for modeling. If, however, no detectable templates exist or the similarity is small, then de novo techniques have to be used [2]. In de novo methods, a protein structure is predicted in the absence of any tertiary (i.e., 3D) structure information. In other words, no known proteins can be used as templates for the starting model. The ultimate objective is to make high-resolution protein structures readily available for all proteins of biological interest and then to extend this to enable the design of man-made proteins.