A Novel Statistical Potential for Protein Beta-Sheet Prediction

RICE UNIVERSITY

A Novel Statistical Potential for Protein β-Sheets Prediction

by Linglin Yu

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

Approved, Thesis Committee: of BioEngineering, University wick T. Bolin Professor of College of Medicine Peter Nordlander Professor of Physics and Astronomy, Professor of Electrical & Computer Engineering, Associate Professor of BioEngineering, Rice University

Houston, Texas
November, 2013

Copyright Linglin Yu 2013

ABSTRACT

A Novel Statistical Potential for Protein β-Sheets Prediction

by Linglin Yu

Protein structure prediction, whose goal is to obtain the three-dimensional native structure of a protein from its one-dimensional sequence, is one of the longest-standing challenges in biophysics for both computational scientists and experimentalists. A fundamental part of this problem is the prediction of β-sheet structure. Although more than 85% of experimentally solved proteins contain β-sheet structures, few methods can predict their folded conformations rapidly and accurately. In this study, we propose a novel statistical potential, OPUS-Beta, to predict protein β-sheet structure from sequence information alone. OPUS-Beta includes three kinds of potential terms: the self-packing term, the pair-interacting term, and the lattice term. The number of hydrogen bonds in β-sheets is also included as a component, acting as a global penalty in the potential function. Computational tests show that the new statistical potential performs very well at recognizing native structures among decoys compared with the β-sheet-specific potentials in the literature. We will apply the potential to improve the prediction of β-strand and β-residue arrangement and registration for beta proteins.

Acknowledgments

I would like to thank my advisor, Dr. Jianpeng Ma, for his trust and encouragement in my research.
I also thank the other two members of my thesis committee, Dr. Nordlander and Dr. Raphael, for their helpful support and insightful direction. I am grateful to all the members of Ma's Lab for their company through the difficulties. In particular, I thank Dr. Mingyang Lu for significant suggestions and Tianwu Zang for helpful discussions on my projects. Last but not least, I thank the Applied Physics Program of the Rice Quantum Institute (RQI) for giving me the chance and the resources to pursue real research at Rice University.

Contents

Acknowledgments
Contents
List of Figures
List of Equations
Background
  1.1. Introduction to Protein Structure
    1.1.1. Primary structure: amino acid sequence
    1.1.2. Secondary structure: α helix and β sheet
      1.1.2.1. α helix
      1.1.2.2. β sheet
  1.2. A brief review of free energy in statistical mechanics
Statistical Potential
  2.1. Construction of statistical potential functions
  2.2. Types of statistical potential functions
  2.3. Features of statistical potential functions
Potential for β-Sheets Prediction
  3.1. Motivation
  3.2. Methods
    3.2.1. Data
    3.2.2. Potential function construction
      3.2.2.1. Self-packing term
      3.2.2.2. Pair term
      3.2.2.3. Hydrogen-bond term
      3.2.2.4. Lattice term
      3.2.2.5. 2D-RNN term
  3.3. Results
    3.3.1. Weights Optimization
    3.3.2. Self-packing information
    3.3.3. Native structure recognition from decoys
    3.3.4. Effects of OPUS-Beta and 2D-RNN potential in the combined potential
    3.3.5. Effects of different energy terms in OPUS-Beta
  3.4. Conclusions
References

List of Figures

Figure 1.1 – Overview of protein structure: primary, secondary, tertiary, and quaternary structure
Figure 1.2 – Primary structure: (a) amino acid; (b) backbone
Figure 1.3 – 3D protein backbone structure
Figure 1.4 – A polypeptide chain consists of peptide units
Figure 1.5 – 3D α helix
Figure 1.6 – β sheets: (a) parallel β sheet; (b) antiparallel β sheet; (c) β sheet with parallel strands and antiparallel strands
Figure 3.1 – Five different types for self-packing energy counting
Figure 3.2 – Pair interactions of the residue pairs in β sheets
Figure 3.3 – Different states of the self-packing energy term
Figure 3.4 – Illustration of making decoy set I
Figure 3.5 – The performance of different potentials on native structure recognition from decoy set I
Figure 3.6 – Illustration of making decoy set II
Figure 3.7 – The performance of different potentials on native structure recognition from decoy set II
Figure 3.8 – The performance of different potentials on native structure recognition from decoy sets I and II
Figure 3.9 – Effects of OPUS-Beta and 2D-RNN potential
Figure 3.10 – Effects of different energy terms in OPUS-Beta

List of Equations

Equation 1.1 – Partition function
Equation 1.2 – Probability distribution function
Equation 1.3 – Entropy
Equation 2.1 – Probability distribution function
Equation 2.2 – Partition function
Equation 2.3 – Free energy
Equation 2.4(a) – Free energy difference between the observed state and the reference state
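
The abstract describes OPUS-Beta as a combination of several potential terms, and Equation 2.4(a) names a free energy difference between an observed state and a reference state. The following is a minimal sketch of that general idea, not the thesis's implementation. It assumes the standard inverse-Boltzmann form E(s) = -kT ln(P_obs(s) / P_ref(s)), which may differ from the exact definitions used in the thesis. The function names, the pseudocount, the term names, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann energies E(s) = -kT * ln(P_obs(s) / P_ref(s)).

    Assumption: this is the textbook construction, not necessarily the
    form used in the thesis. `pseudocount` (hypothetical parameter) is
    added to every count so that unseen states do not give log(0).
    """
    states = sorted(set(observed_counts) | set(reference_counts))
    n_obs = sum(observed_counts.get(s, 0) + pseudocount for s in states)
    n_ref = sum(reference_counts.get(s, 0) + pseudocount for s in states)
    energies = {}
    for s in states:
        p_obs = (observed_counts.get(s, 0) + pseudocount) / n_obs
        p_ref = (reference_counts.get(s, 0) + pseudocount) / n_ref
        energies[s] = -kT * math.log(p_obs / p_ref)
    return energies


def combined_score(term_energies, weights):
    """Weighted sum of per-term energies (term names are hypothetical)."""
    return sum(weights[name] * energy for name, energy in term_energies.items())


# Toy counts: state "a" is seen more often than the reference predicts,
# so it gets a negative (favorable) energy; state "b" gets a positive one.
toy = statistical_potential({"a": 30, "b": 10}, {"a": 20, "b": 20}, pseudocount=0.0)
score = combined_score({"self_packing": -1.0, "pair": 2.0},
                       {"self_packing": 0.5, "pair": 0.25})
```

In this sketch a state that is observed more often than the reference distribution predicts receives a negative energy, and the total score is a weighted sum over terms. How the thesis actually defines its states, reference distribution, and weights is given in its Methods and Results chapters, which are not part of this excerpt.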
