Geometric and Statistical Methods for the Analysis and Prediction of Structural Interactions Between Biomolecules
Julie Bernauer
To cite this version:
Julie Bernauer. Geometric and statistical methods for the analysis and prediction of structural interactions between biomolecules. Bioinformatics [q-bio.QM]. Université Paris-Sud XI, 2015. tel-01136261

HAL Id: tel-01136261
https://tel.archives-ouvertes.fr/tel-01136261
Submitted on 26 Mar 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Copyright

Méthodes géométriques et statistiques pour l'analyse et la prédiction des interactions structurales de biomolécules

HABILITATION À DIRIGER DES RECHERCHES (Specialty: Computer Science)
UNIVERSITÉ PARIS-SUD 11
presented and publicly defended on 13 January 2015
Julie Bernauer

President:
Philippe Dague, Professor, Université Paris Sud, LRI/LaDHAC

Reviewers:
Patrice Koehl, Professor, University of California, Davis, Department of Computer Science
Erik Lindahl, Professor, KTH Royal Institute of Technology & Stockholm University, Department of Biochemistry and Biophysics
Dave Ritchie, Research Director, Inria Nancy Grand Est, LORIA/Orpailleur

Examiners:
Johanne Cohen, CNRS Research Scientist, HDR, LRI/Galac
Erik Lindahl, Professor, KTH Royal Institute of Technology & Stockholm University, Department of Biochemistry and Biophysics
Dave Ritchie, Research Director, Inria Nancy Grand Est, LORIA/Orpailleur
Céline Rouveirol, Professor, Université Paris Nord, LIPN
Jean-Marc Steyaert, Professor, École Polytechnique, LIX/BioInfo
Alain Viari, Research Director, Deputy Scientific Director for Health, Biology and Planet, Inria

Sponsor (Marraine):
Christine Froidevaux, Professor, Université Paris Sud, LRI/BioInfo

AMIB project team
Inria Saclay Île-de-France, LIX École Polytechnique - CNRS UMR 7161
1 rue Honoré d'Estienne d'Orves, Bâtiment Alan Turing, Campus de l'École Polytechnique
91120 Palaiseau Cedex, France
Tel.: +33 (0)1 72 92 59 00 - Fax: +33 (0)1 74 85 42 42
http://www.inria.fr/centre/saclay

Acknowledgments

My scientific research career started in 2001 when I began my Ph.D. studies under the supervision of Anne Poupon at LEBS in Gif-sur-Yvette. Anne trusted me with a bioinformatics project and her support was essential to most of my career. I am also thankful to the people who trusted me at the time and made it possible for a chemistry student to pursue biology studies while teaching computer science. In particular, I am grateful to Jean-Marc Steyaert, Hubert Comon and also Joël Janin. Not only did Joël welcome me to his institute but he also introduced me to the docking community. I am also extremely thankful to Herman van Tilbeurgh, the Yeast Structural Genomics group members, and the IBBMC and LEBS people for having created a great work environment for my Ph.D. studies. At this time, being a computer science teaching assistant at Orsay also led me to meet people who would play an essential part in the professional choices I would make in the future. Jérôme Azé introduced me to machine learning and our collaboration has been ongoing ever since. The interactions triggered by the bioinformatics community Christine Froidevaux set up in Orsay were decisive in my pursuing research in computer science for biology. I also had the chance to meet Michael Levitt in 2003. This life-changing encounter got me to move to the US in 2006, where I have spent a lot of time ever since and will certainly continue to do so.
I met very valuable collaborators and friends during that time. I would especially like to thank Dahlia Weiss, Xuhui Huang, Adelene Sim, Peter Minary, Sam Flores, Tanya Raschke and Bick Do for supporting me in doing science and for being great people to work with. After my post-doctoral stay at Stanford, I joined Inria in 2007, where I met incredibly talented people from whom I learned a lot. I am thankful to Frederic Cazals for having welcomed me and also to Sébastien Loriot, Monique Teillaud, Olivier Devillers and Sylvain Pion for being great people to interact with. I then had to move back to the Paris area and I am extremely thankful to Christine Eisenbeis, Serge Steer and Alain Viari for trusting me and for having made that transition possible. My warmest thanks go to Mireille Regnier, who gave me the opportunity to join her group, and to the people at Inria Saclay with whom I really enjoyed working on a daily basis. I also have to thank the people from LIX and DIX who helped make the last four years I spent at Polytechnique a wonderful time. I also thank Henry van den Bedem for having welcomed me at SLAC and made this last year in the Bay Area a great experience. My thanks also go to Johanne Cohen, who was an essential collaborator and source of support in the last two years. I benefited from the help of Amélie Héliou, Alice Héliou, Adelene Sim, Rasmus Fonseca and Kilian Cavalotti for the proofreading of this manuscript and I am especially grateful to them.

All this would not have been possible without all of you. Thanks a million!

Julie Bernauer
November 6th, 2014.

A human must turn information into intelligence or knowledge. We’ve tended to forget that no computer will ever ask a new question.
Grace Hopper in The Wit and Wisdom of Grace Hopper by Philip Schieber, 1987.

It is a mistake to think you can solve any major problems just with potatoes.
Douglas Adams, Life, the Universe and Everything, 1982.
CONTENTS

1 Introduction
  A Context: modeling of biological structures
  B Protein, RNA and complex structures
  C Experimental and computational challenges
  D Objective

2 Geometric models and supervised learning for docking
  A Geometric coarse-grained models and geometric constructions
    1 Coarse-grained models
    2 Geometric constructions: the Voronoi and Laguerre diagrams
      2.1 Use cases in biology
      2.2 The Voronoi diagram
      2.3 The Delaunay triangulation
      2.4 Properties
      2.5 The Laguerre diagram
      2.6 Voronoi and Laguerre cells constructions for amino and nucleic acids
  B Predicting protein-protein complexes: supervised learning for docking
    1 The docking problem
    2 Supervised learning and prediction
      2.1 Principle
      2.2 Evaluation criteria
      2.3 Algorithms
  C Results and perspectives
    1 Ground truth and experimental data
    2 Synthetic data generation
    3 Voronoi construction and coarse-graining
    4 Supervised learning algorithms
    5 Relevance for biophysics and biology
      5.1 The Random Energy Model (REM)
      5.2 Prediction results
    6 Perspectives

3 From biophysics to data: knowledge-based potentials
  A Pioneer work and the RNA prediction challenge
    1 Knowledge-based potentials for proteins
    2 RNA structure prediction
    3 Criteria for structural prediction
      3.1 Energy vs. RMSD
      3.2 Native structure ranking
      3.3 Enrichment score
  B RNA structural data and potential derivation
    1 Dataset extraction and distance collection
    2 Building a potential
      2.1 Formalism
      2.2 Measurements and statistics
      2.3 Low-count regions and distance cutoff
      2.4 The reference state problem
  C Outcome and limitations
    1 Biological results
    2 KB potential derivation and applications

4 A robotics-inspired model: inverse kinematics sampling for RNA
  A Inverse kinematics and RNA dynamics
  B KGSrna: a simple and efficient model
    1 Methodology
      1.1 Overview
      1.2 Construction of the tree
      1.3 Modeling the conformational flexibility of pentameric rings
      1.4 Null-space perturbations
      1.5 Rebuild perturbations
      1.6 Experimental design for validation
    2 Initial validation
      2.1 Broad and accurate atomic-scale sampling of the native ensemble
      2.2 Large scale deformations
      2.3 KGSrna as an alternative to NMA
  C Extending biological results