Predicting Protein Folding Pathways Using Ensemble Modeling and Sequence Information

by David Becerra

McGill University
School of Computer Science
Montréal, Québec, Canada

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy

© David Becerra 2017

Dedicated to my GRANDMOM. Thanks for everything. You are such a strong woman.

Acknowledgements

My doctoral program has been an amazing adventure with ups and downs. What I value most is that I am a different person from the one who started the program five years ago. I am not better or worse; I am just different. I would like to express my gratitude to all the people who helped me during my time at McGill. This thesis carries only my name as author, but it should carry the names of all the people who played an important role in its completion (and there are a lot of them). I am certain I would not have reached the end of my thesis without the help, advice and support of all the people around me.

First of all, I want to thank my country, Colombia. Being abroad, I learned to value my culture, my roots and my South American positivity, happiness and resourcefulness. I am really proud of being Colombian and of showing the world a bit of what our wonderful land has given us. I would also like to thank all Colombians, because with your taxes (via Colciencias funding) I was able to complete this dream. I would also like to thank Canada for accepting us in this beautiful land and for embracing our lives and dreams.

I would like to express my deepest gratitude to my wife Luisa and to my son Samuel. You are a constant (and fundamental) source of motivation and encouragement. You are the real reason why this dream became a reality. I will always be deeply indebted to you for walking this journey with me. You have inspired me to be better every day and not to give up during the hard times.
You know that this thesis really belongs to you two.

I would like to thank my parents for supporting my dreams. They have always encouraged us (my brother and me) to take risks and to work for our dreams. I deeply appreciate their help, support and advice. I still have a lot to learn from you. I want to extend my appreciation to my brother Andres. He is an excellent professional and has long been a personal and career model to follow. I only hope that all our efforts will inspire Samuel and Maria Paula to pursue their dreams and happiness.

I would like to express my deepest gratitude to my supervisor, Prof. Jérôme Waldispühl, for giving me the opportunity to carry out the research presented in this thesis. His guidance, motivation and patience were fundamental for succeeding in this project. Working with Professor Waldispühl has been a once-in-a-lifetime opportunity to learn from a very motivated, innovative and supportive person. Besides my supervisor, I would like to thank Professor Mathieu Blanchette for his generous time and goodwill through all the (unscheduled) meetings that we had. I feel very honored to have such a distinguished researcher and person as my advisor.

I am also very grateful to all the current and former members of the Waldispühl and Blanchette research groups. I would particularly like to thank Alex and Chris, who helped me in countless ways, from giving me advice and sharing complaint lunches to fixing my code and reading and editing this manuscript. Your friendship has always been invaluable. I also really appreciate Mathieu Lavallé’s tips about Canadian life. Our shared hobbies of watching cycling and soccer made my integration into the lab (and the country) much easier. Many thanks to Ayrin and Rola for caring so much about Samuel and my family. Your support was fundamental during this whole process. I would like to show my gratitude to Mohamed.
You are the right model to follow for balancing parenthood and graduate studies. I owe sincere and earnest thanks to the students whom I mentored (Zheng, Shu and Ettienne). I learned so much from you guys. I owe sincere and earnest thanks to all the members of the labs with whom I shared invaluable moments during daily life in the group: Olivier, Vladimir, Mohammed I and II, Olivia, Antoine, Dilmi, Chrisostomos, Carlos, Roman, James, Jimmy, Faizy, Javier, Navin, Pouriya, Maia.

There are many people to thank beyond the research environment. Thanks a lot to Juan, Ismenia, Paulina, Felipe, Alejandro, Silvana, Patricia, Pablo and Jessica here in Montreal. Back in Colombia, I would like to thank my buddies (los compas) Barrantes, Fernando, Goyes, Gomez, Juan Carlos and Lisbeth. I would like to show my deepest gratitude to my friend Sergio in the Netherlands. Even across the distance, Sergio has always supported me and given me the motivation needed to succeed in this doctoral process. I really value our friendship and I really hope to keep you as my friend for a long time. I also want to thank Norman for his valuable advice.

I will always be deeply indebted to all the Computer Science staff. I thank Sheryl, Ann, Diti, Tricia and Professor Kemme.

I apologize to everyone I have not mentioned. There may be many names that escape my mind right now, but you know that you play an important role in my life. Thanks a lot for everything.

Abstract - Abrégé

The protein folding problem aims to predict the complete physical and dynamical process that transforms an unfolded sequence into a functional 3D protein structure. This problem consists of two (open) sub-problems: i) the protein structure prediction problem and ii) the protein pathway prediction problem. Computational techniques for tackling these two sub-problems have been based on the theory of evolution and the laws of physics.
To date, classical approaches to obtaining detailed information about protein folding rely on time-consuming methods that are primarily limited to relatively small proteins (i.e., ≤ 50 amino acids). The overall objective of this thesis is to explore algorithms that reconcile: i) the prediction of protein structures and pathways, ii) physics-based predictions (i.e., low free-energy models) and evolution-based predictions (i.e., sequence variation methods), and iii) the computational costs and granularity requirements of protein folding simulations. We propose an algorithmic framework for predicting protein folding that offers a better trade-off between resolution and efficiency. This framework computes accurate coarse-grained representations of the conformational landscape for large proteins by combining ensemble modeling techniques with evolution-based sequence information. The resulting conformational energy landscape is then used to predict dominant folding pathways. Given that the framework proposed in this thesis makes use of sequence information, we also explore a crowdsourcing strategy and a multi-objective evolutionary strategy to investigate the accuracy of the evolutionary information encoded by multiple sequence alignments. Finally, to present our results to the wider biology and computer science communities, we develop an easy-to-use interactive molecular visualizer.

Abstract - Abrégé

Le problème du repliement des protéines a pour objectif de prédire le processus physique et dynamique complet de conversion de la séquence d’une protéine non-repliée en une structure 3D. Ce problème peut être décomposé en deux sous-problèmes (ouverts) : i) le problème de la prédiction de la structure des protéines et ii) le problème de la prédiction du processus de repliement des protéines.
La théorie de l’évolution et les lois de la physique sont les principes sur lesquels les techniques computationnelles se sont basées afin de résoudre ces deux sous-problèmes. À ce jour, les approches classiques qui sont utilisées afin d’obtenir des informations sur le repliement des protéines sont reliées à des méthodes chronophages, en principe limitées à des protéines de petite taille. L’objectif général de cette thèse est d’explorer des algorithmes qui concilient : i) la prédiction des structures et celle du processus de repliement des protéines, ii) les prédictions basées sur les lois physiques (i.e., des modèles à énergie libre minimale) et celles basées sur la théorie de l’évolution (i.e., des méthodes basées sur les variations de séquences) et iii) les coûts computationnels et les niveaux de granularité requis par les simulations. Dans ce but, nous proposons une structure d’algorithme pour le repliement des protéines qui offre un meilleur compromis entre la résolution et l’efficience. Cet algorithme calcule des représentations précises à gros grain du paysage conformationnel des protéines de grande taille, en combinant des techniques de modélisation d’ensemble avec l’information évolutive des séquences. Ce paysage énergétique conformationnel est ensuite utilisé afin de prédire les voies de repliement les plus probables. Puisque la structure proposée dans le cadre de cette thèse utilise des informations de séquences, nous étudions aussi une stratégie participative, et une autre, multi-objectif et évolutive, pour examiner la précision de l’information encodée par les alignements multiples de séquences. Finalement, de manière à présenter nos résultats à la communauté de biologistes et de chercheurs en informatique, nous avons développé un visualiseur moléculaire interactif facile d’utilisation.