University of California, San Diego
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF CALIFORNIA, SAN DIEGO Understanding cellular function through the analysis of protein interaction networks A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Bioinformatics by Silpa Suthram Committee in charge: Professor Trey G. Ideker, Chair Professor Vineet Bafna Professor Philip Bourne Professor Jeff Hasty Professor Daniel O’Connor ` 2008 Copyright Silpa Suthram, 2008 All rights reserved. The Dissertation of Silpa Suthram is approved, and it is acceptable in quality and form for publication on microfilm: Chair University of California, San Diego 2008 iii To Sanjit I couldn’t have done it without you. iv Aum Tamaso Ma Jyotir Gamaya || (Brhadaranyaka Upanishad — I.iii.28) Translation: Lead me from darkness (ignorance) to light (knowledge). v Table of Contents Signature Page...........................................................................................................iii Dedication.................................................................................................................. iv Epigraph .......................................................................................................................v Table of Contents ..................................................................................................... vi List of Figures.............................................................................................................ix List of Tables ...............................................................................................................x Acknowledgements...................................................................................................xi Vita, Publications .....................................................................................................xv Abstract..................................................................................................................... xvi 1 Introduction.........................................................................................................1 1.1 Generation and visualization of large‐scale protein interaction data..3 1.1.1 Yeast‐two hybrid assay...........................................................................3 1.1.2 Tandem Affinity Purification (TAP) followed by mass‐spectrometry...4 1.1.3 Visualization of protein interaction networks........................................5 1.2 Overview of the dissertation......................................................................6 1.2.1 Noise in the data.....................................................................................7 1.2.2 Comparative network analysis ...............................................................8 1.2.3 Network integration ...............................................................................9 2 Assigning confidence scores to protein interactions.................................15 2.1 Estimation of interaction probabilities ...................................................18 2.2 Benchmarking analysis.............................................................................20 2.3 Quality assessment....................................................................................27 2.3.1 Presence of conserved interactions in other species..............................28 2.3.2 Expression correlation..........................................................................29 vi 2.3.3 Gene Ontology (GO) similarity ...........................................................30 2.3.4 Signal‐to‐Noise Ratio of protein complexes .........................................32 2.3.5 Conservation rate coherency ................................................................33 2.4 Discussion...................................................................................................34 2.5 Methods ......................................................................................................37 2.5.1 GO databases........................................................................................37 2.5.2 Signal to noise ratio (SNR) ..................................................................38 2.5.3 Weighted average..................................................................................39 3 Comparative network analysis ......................................................................47 3.1 NetworkBLAST Algorithm ......................................................................49 3.1.1 Network alignment graph ....................................................................49 3.1.2 A probabilistic model of protein sub‐networks.....................................51 3.1.3 Search algorithm...................................................................................54 3.2 Complexes conserved across multiple species......................................57 3.2.1 Three‐way versus two‐way network alignment...................................61 3.2.2 Prediction of new protein functions .....................................................62 3.2.3 Prediction of new protein interactions .................................................63 3.3 Discussion...................................................................................................65 3.3.1 Comparison to existing methods ..........................................................65 3.3.2 Best BLAST hits may not imply functional conservation ...................66 3.3.3 Functional links within conserved networks........................................67 3.3.4 Validation of predicted interactions .....................................................69 3.3.5 Conclusion............................................................................................70 3.4 Methods ......................................................................................................71 3.4.1 Scoring functional enrichment.............................................................71 3.4.2 Prediction of protein functions.............................................................72 3.4.3 Prediction of protein interactions.........................................................75 3.4.4 Automatic layout of conserved complexes............................................78 4 Comparative network analysis of the protein network of the malaria parasite Plasmodium falciparum ...........................................................................94 4.1 Cross‐species comparison of the Plasmodium protein network........95 4.1.1 Data sources .........................................................................................95 4.1.2 Comparative analysis of the Plasmodium PPI network.......................96 4.1.3 Analysis of conserved and distinct complexes in Plasmodium..........101 4.1.4 Distribution of GO Cellular Components across species...................104 4.2 Discussion.................................................................................................106 vii 4.3 Methods ....................................................................................................108 4.3.1 Identification of conserved and species only complexes .....................108 4.3.2 Interaction confidence scores..............................................................109 4.3.3 Phylogenetic tree construction...........................................................109 4.3.4 SNR of protein complexes ..................................................................110 4.3.5 Simulating false‐positives and negatives in protein networks...........111 5 Network Integration: An efficient method for interpreting eQTL associations using protein networks ..................................................................124 5.1 “Electric circuit” analysis of eQTLs ......................................................126 5.1.1 Definition of terms .............................................................................126 5.1.2 Open problems motivated by previous method ..................................127 5.1.3 The eQED model ................................................................................128 5.1.4 Application to eQTL associations in yeast.........................................129 5.1.5 Combining multiple loci.....................................................................131 5.1.6 Predicting the direction of signaling along protein interactions .......132 5.1.7 Prediction of regulatory pathways .....................................................134 5.1.8 Application to gene‐association studies in human.............................135 5.2 Materials and Methods ...........................................................................137 5.2.1 Electric circuits and random walks....................................................137 5.2.2 eQTL Associations .............................................................................139 5.2.3 High‐confidence physical interaction network...................................140 5.2.4 eQED Model.......................................................................................143 6 Conclusion .......................................................................................................150 6.1 Future directions......................................................................................151 7 References ........................................................................................................155 viii List of Figures Figure 1.1: Schematic of the yeast two‐hybrid assay...........................................12 Figure