Leveraging Novel Information Sources for Protein Structure Prediction

Leveraging Novel Information Sources for Protein Structure Prediction

submitted by Dipl.-Biol. Michael Bohlke-Schneider, né Schneider, born in Dortmund

to Faculty IV (Electrical Engineering and Computer Science) of the Technische Universität Berlin in fulfillment of the requirements for the academic degree Doctor of Natural Sciences (Dr. rer. nat.)

Approved dissertation

Doctoral committee:
Chair: Prof. Dr. Klaus Obermayer
1st reviewer: Prof. Dr. Oliver Brock
2nd reviewer: Prof. Dr. Juri Rappsilber
3rd reviewer: Prof. Dr. Jens Meiler

Date of the scientific defense: 22 December 2015

Berlin 2016

I would like to dedicate this thesis to my loving wife Nina and my mother Barbara.

Acknowledgements

Becoming a scientist was clearly the greatest challenge that I had to master in my life. The past five years changed my views and my life in a way that I never thought possible. Many people helped me during this journey. I am blessed with having many good friends and I cannot possibly list all the people that I am grateful for.

First, I want to thank my adviser, Oliver Brock. Oliver is the sharpest mind I have ever met and his ability to think incredibly clearly has been very inspiring. In addition, Oliver has redefined my standard for scientific rigor and what it means to master all skills that a scientist should have. But his most important lesson was to never be satisfied, to constantly challenge the status quo, and to change the world for the better. This lesson will surely shape my further life, no matter where it will take me.

I would also like to thank the members of my committee, Juri Rappsilber and Jens Meiler. Juri’s dedication to our cooperation project contributed a great deal to my scientific success. I also enjoyed the numerous insightful and fun conversations with Juri. I appreciate all the time that he took to advise me during my graduate studies. I would like to thank Jens Meiler for becoming a member of my committee.
I appreciate his time reading my thesis and making it possible to arrange the defense at very short notice.

Lila Gierasch and Anne Gershenson guided me through my first years in graduate school. I would like to thank them for their advice.

The next important block of people is clearly the amazing crowd that I spent so much time with at the Robotics and Biology Laboratory. This includes (in no particular order): Ines Putz, Raphael Deimel, Arne Sieverling, Roberto Martin Martin, Vincent Wall, Tim Werner, Mahmoud Mabrouk, Rico Jonschkowski, Sebastian Höfer, Clemens Eppner, Jessica Abele, Manuel Baum, and Fabian Heinemann. If anything, you guys were the decisive factor for my success as a graduate student. All the amazing people at RBO created a fun and vibrant community with an honest and inspiring atmosphere that I have never experienced before. Thank you for all the critique, help, and warm words along my way. I would also like to thank Alexander Margraf and Wolf Schaarschmidt for all the technical support and chats. Thank you Janika Urig for holding the lab together, being the good spirit, and for being a great listener.

I also thank Adam Belsom and Lutz Fischer for their amazing work and productive cooperation. Furthermore, I thank "the other people" on the fifth floor at Marchstrasse who contributed to the great atmosphere: Robert Lieck, Johannes Kulick, and Marianne Maertens. I would also like to thank the people that I spent only a little time with, but nonetheless appreciated: Dov Katz, Ingo Kossyk, José Antonio Álvarez Ruiz, Nasir Mahmood, Florian Kamm, and Georgios Fagogenis.

Special thanks also go to the members of the Budisa lab that I spent a lot of time with at parties and the Schleusenkrug. In particular, I would like to thank Maxi Marock and Jessica Nickling, who are such a great help in caring for my dog Lilly when I cannot do it.

There have been many people who made my time in Berlin truly enjoyable.
The following became close friends and I hope that we will never lose touch because every one of them is a truly remarkable person: Bastian Henkel, Melanie Henkel, Kim Dohlich, Christian Busse, Claudia Luber, and Felix Luber. Thank you for joining me in good times and advising me at times of distress.

Of course, I have to thank all my important friends at home and those that are scattered throughout the world: Benjamin Tuma, Lukas Börger, Bastian Haumann, Stefan Pennartz, Timo Schulte, Paul Ratka, Pit Wingender, Philip Scheit, and Christian Saßenscheidt. I would also like to thank all the members of my karate club Shirokuma Berlin and the members of the Berlin board gaming community for many hours of fun. In particular: Rico Leifheit, Boris Mahn, and Heiko Möller.

An incredible load of thanks goes to my family. My mother Barbara Schneider always supported me in all my years of study and is the source of the wisest advice I have ever received. I also thank Heino Goldschmidt for bringing a new spark of life to my family. I thank my older sister Andrea Bielski for being incredibly supportive and my younger brothers Daniel Schneider and Markus Schneider. I always look forward to spending time with you. I would also like to thank my mother-in-law, Maria Bohlke-Wohlers, for spreading optimism wherever she goes.

The next person is not really a person but my dog Lilly. I thank her for making me happy whenever she is around.

My last acknowledgement is dedicated to my loving wife Nina Bohlke. Nina approaches anything in her life with incredible charm, wit, and warm-heartedness. Her determination and discipline are second to none. Nina naturally has the calm and wise mind that I hope to develop at some point in my life. I’m incredibly blessed to be able to call her my wife.
Prepublication and Statement of Contribution

This thesis has been in part published in the following publications (listed by date in chronological order, starting with the newest publication):

A  Mabrouk, M., Werner, T., Schneider, M., Putz, I., and Brock, O. (2015). Analysis of Free Modeling Predictions by RBO Aleph in CASP11. Proteins, in press.

B  Belsom, A.*, Schneider, M.*, Fischer, L., Brock, O., and Rappsilber, J. (2015). Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol. Cell. Proteomics, in print: mcp.M115.048504.

C  Mabrouk, M.*, Putz, I.*, Werner, T., Schneider, M., Neeb, M., Bartels, P., and Brock, O. (2015). RBO Aleph: leveraging novel information sources for protein structure prediction. Nucleic Acids Res., 43(W1):W343–W348.

D  Schneider, M. and Brock, O. (2014). Combining Physicochemical and Evolutionary Information for Protein Contact Prediction. PLoS ONE, 9(10):e108438.

* contributed equally

Chapters 1, 2, and 4 are original to this thesis.

Chapter 3 contains a discussion of the related work. Parts of this chapter have been published in the related work discussions of papers [A, B, C, D].

Chapter 5 presents a contact prediction algorithm called EPC-map and is based on the original publication [D].

Own contributions to [D]: I (MS) was the sole first author of this paper. I conceived and designed the experiments, conceived and designed the algorithm, implemented the algorithm, performed experiments, developed analysis tools, analyzed the data, and contributed to paper writing.

Contributions of co-authors to [D]: OB scientifically advised this work and conceived and designed the experiments. OB contributed to paper writing.

Chapter 6 presents the analysis of the structure prediction server RBO Aleph in the 11th community-wide Critical Assessment of Protein Structure Prediction experiment (CASP11). The RBO Aleph server was originally described in [C].
The analysis of the CASP11 results of this server was originally published in [A].

Own contributions to [C]: My particular contribution was the design and implementation of EPC-map and contact-guided model-based search (main developer from 2010 to 2015) and their integration into the ab initio pipeline of RBO Aleph. I also conceived and designed the server. I designed and implemented significant parts of the server, especially on the backend. I maintained the server during CASP11 and contributed to paper writing.

Contributions of co-authors to [C]: MM, IP, TW, and OB conceived and designed the server. MM, IP, and TW designed and implemented significant parts of the server. MM, IP, TW, PB, and MN designed and implemented the frontend of the web server. MM, IP, and TW maintained the server during CASP11. MM, IP, TW, and OB contributed to paper writing.

Own contributions to [A]: I conceived and designed experiments. I performed experiments and developed analysis tools. I analyzed the data, with a focus on the residue-residue contact prediction results of RBO Aleph in CASP11 and the effect of component combinations on the pipeline output. The contact prediction results of RBO Aleph are based on the algorithm developed in chapter 5. I was the main developer of the conformational space search algorithm of RBO Aleph, contact-guided model-based search, from 2010 to 2015. I contributed to paper writing.

Contributions of co-authors to [A]: MM, TW, IP, and OB conceived and designed experiments. MM, TW, and IP performed experiments. MM, TW, and IP conceived and implemented analysis tools. MM, TW, and IP analyzed data. MM, TW, IP, and OB contributed to paper writing.

The following figures and tables were prepared in part or in modified form by co-authors of the original paper [A]:
MM: Figures 6.6 and 6.9.
TW: Figures 6.7, 6.8, 6.11, and 6.12; Table 6.4.
IP: Table 6.1.

Chapter 7 presents a novel hybrid structure determination method based on high-density cross-linking/mass spectrometry and conformational space search.
This chapter is based on paper [B].
