MOIRAE: a Computational Strategy to Predict 3-D Structures of Polypeptides

UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL INSTITUTO DE INFORMÁTICA PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO MÁRCIO DORN MOIRAE: A Computational Strategy to Predict 3-D Structures of Polypeptides Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Computer Science Prof. Dr. Luis C. Lamb Advisor Profa.Dra. Luciana S. Buriol Coadvisor Porto Alegre, August 2012 CIP – CATALOGING-IN-PUBLICATION Dorn, Márcio MOIRAE: A Computational Strategy to Predict 3-D Structures of Polypeptides / Márcio Dorn. – Porto Alegre: PPGC da UFRGS, 2012. 217 f.: il. Thesis (Ph.D.) – Universidade Federal do Rio Grande do Sul. Programa de Pós-Graduação em Computação, Porto Alegre, BR–RS, 2012. Advisor: Luis C. Lamb; Coad- visor: Luciana S. Buriol. 1. 3-D protein structure prediction. 2. Artificial neural networks. 3. Genetic algorithms. 4. GA local-search oper- ator. 5. Structural bioinformatics. I. Lamb, Luis C.. II. Bu- riol, Luciana S.. III. Título. UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL Reitor: Prof. Carlos Alexandre Netto Pró-Reitor de Coordenação Acadêmica: Prof. Rui Vicente Oppermann Pró-Reitor de Pós-Graduação: Prof. Aldo Bolten Lucion Diretor do Instituto de Informática: Prof. Luis Cunha Lamb Coordenador do PPGC: Prof. Alvaro Freitas Moreira Bibliotecária-chefe do Instituto de Informática: Beatriz Regina Bastos Haro “Das interessanteste an unseren Universum ist, dass man es verstehen kann.” “O Interessante em nosso universo que podemos entende-lo." — Albert Einstein “In Greek mythology, the Moirae (often known in English as The Fates) were three sisters who determined the fate of both gods and humans. The three women were dismal, responsible for spinning (Clotho), weaving (Lachesis) and cut (Atropos) what would be the thread of life of all individuals: Clotho in Greek means "spinning", held the spindle and weaving the thread of life. Lachesis in Greek means "sort", pulled and wrapped the cord tissue. Atropos in Greek means "away", she cut the thread of life. In this thesis Spinning represents the fact of acquire and combine structural patterns from experimental protein structures, Sort represents the genetic algorithm developed to search the conformation space in order to find the protein native-like structure and Away represents the developed strategy to keep out bad solutions. ” AGRADECIMENTOS Primeiramente gostaria de agradecer aos meus pais Marlise e Valdir Dorn por sempre estarem presentes em todas as etapas da minha vida, me apoiando, me incen- tivando e me inspirando. Sem a ajuda de vocês mais este sonho não teria se tornado realidade. Agradeço também as pessoas que quando do início do meu doutoramento se faziam presentes e que hoje infelizmente já não estão mais entre nós: Vó Helga Kupske e Vô Roberto Kupske (in memoriam). Saudades eternas de vocês. Aos avós paternos Reinert e Berta Dorn, anos já se passaram mas a saudade continua. Vocês sempre foram um exemplo de pessoas. Aos demais familiares Marlene, Carlos, Hildegard, Egon, Claudio, Mariza, Morgane e Mônica. Obrigado pelo apoio, carinho e amizade. Aos meus amigos Leonardo Klein, Everton Becker, Eleonor Mayer, Paula Pavan, Melina, Guaraci, Elaine Costa, Neri Costa e Dione Taschetto, com os quais, sempre pude contar. Aos amigos e colegas de laboratório e da UFRGS, Fabiane Dillenburg, Daniel Farenzena, Diego Noble, Rodrigo Machado, Cristiano Reis dos Santos, Rodrigo Kassick, Michelle Leonhardt, Fernando Stefanello, Arton Dorneles, Tiago Primo, Kelen Bernardi e Rafael Parizi pelas conversas, cafés e momentos de descontração. Aos pais acadêmicos Luis Lamb e Luciana Buriol que me acolheram na UFRGS. Obrigado pela dedicação de vocês, pela paciência, pela confiança, pelo apoio, pelas conversas, pela compreensão e pelo incentivo. Vocês são exemplos de profissionais e de seres humanos. Obrigado por tudo! A todos os técnicos administrativos do INF e do PPGC, em especial Beatriz Haro, Claudia de Quadros, Silvania Vidal, Margareth Schäffer, Leivo Ortiz, Luis Otávio e Elisiane de Almeida. Obrigado também a todos os professores do INF em especial Álvaro Moreira e Lisandro Granville. Ao professor Hugo Verli (CBiot/UFRGS) pelo apoio, incentivo e disponibilidade para conversas e trocas de idéias. Aos professores Mario Inostroza (Universidade do Chile) e Lean- dro Wives (INF/UFRGS) pelos construtivos comentários sobre o conteúdo da tese. Aos colegas e amigos de outras instituições nacionais e internacionais pelo apoio e incentivo ao longo desta caminhada: André Braga (Universidade de Brasília), Carlos Humberto Llanos (Universidade de Brasília), Martin Baumman (KIT, Ale- manha), Florian Wilhelm (KIT, Alemanha), Eva Ketelaer (KIT, Alemanha), Math- ias Krause (KIT, Alemanha), Andrea Nestler (KIT, Alemanha), Mario Torres (Red Hat, Alemanha), Federica Ciacca (Itália), Maristela Brizzi (Unijuí), Priscila Gryn- berg (UFMG), Tálita Sono (UFMG) e Diogo Volpati (UNESP). CONTENTS LIST OF ABBREVIATIONS AND ACRONYMS . 8 LIST OF FIGURES . 10 LIST OF TABLES . 13 ABSTRACT . 14 RESUMO . 15 1INTRODUCTION.............................16 1.1 Motivation ................................. 17 1.2 Contributions ............................... 18 1.3 Thesis organization ............................ 18 2ONPROTEINS..............................19 2.1 Introduction ................................ 19 2.2 Amino acid residues and their structures .............. 20 2.3 The peptide bond ............................. 21 2.3.1 Side-chain conformations . 22 2.4 Description of protein structures ................... 24 2.4.1 Primary structure . 24 2.4.2 Secondary structure . 25 2.4.3 Tertiary structure . 26 2.4.4 Quaternary structure . 27 2.5 Protein taxonomy ............................ 27 2.6 Structural databases and structural parameters .......... 29 2.7 Chapter conclusions ........................... 30 3ONTHEPROTEINKINEMATICS...................31 3.1 Introduction ................................ 31 3.2 Three-dimensional protein structure representation ....... 31 3.3 Protein structure kinematics ...................... 33 3.4 Chapter conclusions ........................... 36 4TERTIARYPROTEINSTRUCTUREPREDICTIONMETHODS..37 4.1 Introduction ................................ 37 4.2 Ab initio methods: first principle methods without database information ................................. 39 4.2.1 Overview of ab initio approaches . 43 4.3 First principle methods with database information ........ 49 4.3.1 Overview of first principle methods with database information . 52 4.4 Fold recognition and threading methods .............. 56 4.4.1 Overview of fold recognition and threading methods . 58 4.5 Comparative modeling methods and sequence alignment strategies ...................................... 60 4.5.1 Overview of comparative modeling methods and sequence alignment strategies.................................. 64 4.6 Chapter conclusions ........................... 65 5MOIRAE:REDUCINGTHECONFORMATIONALSEARCHSPACE 66 5.1 Introduction ................................ 66 5.2 Proposed method ............................. 67 5.2.1 Polypeptide representation . 67 5.2.2 Structural templates . 68 5.3 Chapter conclusions ........................... 83 6MOIRAE:ALOCAL-SEARCH-BASEDGENETICALGORITHMTO SEARCH THE 3-D PROTEIN CONFORMATIONAL SPACE . 84 6.1 Introduction ................................ 84 6.2 Proposed method ............................. 84 6.2.1 Potential energy function and implicit solvation model . 84 6.2.2 Searchstrategy .............................. 85 6.3 Chapter conclusions ........................... 91 7EXPERIMENTALRESULTS.......................92 7.1 Introduction ................................ 92 7.2 Model and target proteins ....................... 92 7.3 Computing main-chain torsion angles intervals .......... 97 7.4 Searching the native-like structures of polypeptides .......104 7.5 Structural analysis ............................113 7.5.1 Secondary structure analysis . 116 7.5.2 Stereo-chemical analysis . 118 7.6 Comparison of protein 3-D structure prediction methods ....124 7.7 Chapter conclusions ...........................126 8CONCLUSIONS..............................127 9PUBLICATIONS..............................129 9.1 Published papers .............................129 9.1.1 Under review . 130 APPENDIX A PROTEIN KINEMATICS: ROTATION OF SIDE-CHAIN TORSION ANGLES . 132 APPENDIX B FIRST PRINCIPLE METHODS WITHOUT DATABASE INFORMATION . 135 APPENDIX C FIRST PRINCIPLE METHODS WITH DATABASE IN- FORMATION ........................ 139 APPENDIX D THREADING METHODS . 141 APPENDIX E COMPARATIVE MODELING METHODS SUMMARY . 143 APPENDIX F TORSION ANGLES OF —-TURNS . 145 APPENDIX G SIDE-CHAIN TORSION ANGLES . 147 APPENDIX H TORSION ANGLES INTERVALS . 149 APPENDIX I EXPERIMENTS: FITNESS GRAPHS . 165 APPENDIX J EXPERIMENTS: RAMACHANDRAN PLOTS . 173 APPENDIX K RESUMO ESTENDIDO . 181 K.1 Desenvolvimento da Pesquisa .....................181 K.1.1 Conclusões . 185 REFERENCES . 187 LIST OF ABBREVIATIONS AND ACRONYMS AMBER Assisted Model Building with Energy Refinement A3N Artificial Neural Network-N-gram based method ANN Artificial Neural Network ASA Accessible Surface Area BB Branch and Bound BOSS Biochemical and Organic Simulation System CATH Class, Architecture, Topology, Homologous super-family CASP Critical Assessment of protein Structure Prediction CCP Contact Capacity Potentials CE Combinatorial Extension CHARMM Chemistry at HARvard Macromolecular Mechanics CM Comparative Modeling CREF Central-REsidue-Fragment-based method CSA Conformation Space Annealing DDD Dali Domain Dictionary DSSP Define Secondary Structure

Load more