Protein-Protein Interaction Network Alignment and Evolution
by Brian Man-Kin Law

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Computer Science
University of Toronto

© Copyright by Brian Law 2019

Abstract

Network alignment is an emerging analysis method enabled by the rapid large-scale collection of protein-protein interaction data for many different species. As sequence alignment did for gene evolution, network alignment will hopefully provide new insights into network evolution and serve as a new bioinformatic tool for making biological inferences across species. Using new SH3 binding data from Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens, I construct new interface-interaction networks and devise a new network alignment method for these networks. With appropriate parameterization, this method is highly successful at generating alignments that reflect known protein orthology information and contain high network topology overlap. However, close examination of the optimal parameterization reveals a heavy reliance on protein sequence similarity and fungibility of other data features, including network topology data, an observation that may also pertain to protein-protein interaction network alignment. Closer examination of interactomic data, along with established orthology data, reveals that protein-protein interaction conservation is quite low across multiple species, suggesting that the high network topology overlap achieved by contemporary network aligners is ill-advised if biological relevance of results is desired.
Further consideration of gene duplication and protein binding sites reveals additional PPI evolution phenomena that further reduce the network topology overlap expected in network alignments, casting doubt on the utility of network alignment metrics based solely on network topology. Instead, I suggest a new framework to think about protein-protein interaction network alignment focused on generating and validating small-scale inferences. I create a prototype alignment visualization and analysis tool to facilitate this approach, which will hopefully aid researchers in learning more about the mechanisms of network evolution and how network alignment can model them.

Acknowledgments

I would like to thank my supervisor Gary Bader, without whom this thesis would never exist. His unfailing confidence and patience were my constant companions on this long journey, whether I was working myself into a frenzy or paralyzed by doubt. Even when I felt it unwarranted, Gary’s belief in me never wavered, and without his encouragement, I would have turned off this path long ago. Thank you.

I would like to thank the members of my committee Derek Corneil, Alan Moses, and Zhaolei Zhang for their insight and support. Their questions and ideas have occupied me for many years, and I regret I still cannot provide the answers that they deserve. The challenges they have laid out for me have driven, inspired, and haunted me over the years, and I will continue striving to meet them. Thank you.

I would like to thank the graduate office staff at the Department of Computer Science, past and present: Julie Weedmark, Linda Chow, Kolden Simmons, Vinita Krishnan, Margaret Meaney, Celeste Francis Esteves, Erin Bedard, Vivian Hwang, and Lynda Barnes, for their assistance in navigating the rocky shoals of academic bureaucracy.
I am unsure any graduate student could finish their degree in our department without their assistance, and yet their response to my incompetence with paperwork was never anything but helpfulness. Thank you.

I would like to thank the many people, too many people to name, who have touched my life positively and kept me on track these past years. My fellow Computer Science graduate students, my CCBR 6th floor-mates, my Bader Lab compatriots, the many undergraduate students I have had the pleasure of teaching, my CUPE 3902 comrades, my teaching faculty mentors, my friends and my family. This has been a long and difficult labour, and it may have entirely consumed me, but you reminded me that there was more to do, more to me than simply my work. Thank you.

Finally, I would like to apologize to my grandmother, (Monica) Tack Ching Soong. I thought I would have more time to make you proud of me. I took it for granted while you raised me, cared for me, nurtured me. I never expected it would dwindle so quickly, and now that I am out of time, I find it supplanted by regret, so many regrets, big and small. I am sorry.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Abbreviations

Chapter 1
Background
  1.1 Introduction
    1.1.1 Comparative Evolution
    1.1.2 Protein-Protein Interactions
    1.1.3 Protein-Protein Interaction Data Quality
    1.1.4 Interface-Interaction Networks
  1.2 Protein-Protein Interaction Network Alignment
    1.2.1 Theoretical Considerations
    1.2.2 Local Network Alignment
    1.2.3 Global Network Alignment
    1.2.4 Assessment and Evaluation of Network Alignments
  1.3 Protein-Protein Interaction Network Evolution
    1.3.1 Fundamentals and Applications
    1.3.2 Models for Protein-Protein Interaction Networks
  1.4 Protein-Protein Interaction Network Alignment Visualization

Chapter 2
Alignment of Interface-Interaction Networks
  2.1 Introduction
    2.1.1 Network Alignment Theory
    2.1.2 Interface-Interaction Networks
    2.1.3 Protein-Protein Interaction Network Alignment
  2.2 Results
    2.2.1 Comparison with PPIN Alignment Algorithms
    2.2.2 Incorporating More Similarity Features
    2.2.3 Parameter Weight Tuning
    2.2.4 A Zoom-in
    2.2.5 Yeast Subspecies Alignments
  2.3 Discussion
  2.4 Conclusions
  2.5 Methods
    2.5.1 Algorithm
    2.5.2 Comparison Algorithms