Inferring Modules from Human Protein Interactome Classes
Elisabetta Marras1, Antonella Travaglione1, Gautam Chaurasia2,3, Matthias Futschik2,4, Enrico Capobianco1*
Marras et al. BMC Systems Biology 2010, 4:102
http://www.biomedcentral.com/1752-0509/4/102

RESEARCH ARTICLE Open Access

Abstract

Background: The integration of protein-protein interaction networks derived from high-throughput screening approaches and complementary sources is a key topic in systems biology. Although integration of protein interaction data is conventionally performed, the effects of this procedure on the results of network analyses have not yet been examined. In particular, in order to optimize the fusion of heterogeneous interaction datasets, it is crucial to consider not only their degree of coverage and accuracy, but also their mutual dependencies and additional salient features.

Results: We examined this issue by analyzing modules detected by network clustering methods applied to both integrated and individual (disaggregated) data sources, which we call interactome classes. Due to class diversity, we deal with variable dependencies of data features arising from structural specificities and biases, but also from possible overlaps. Since highly connected regions of the human interactome may point to potential protein complexes, we have focused on the concept of modularity, and elucidated the detection power of module extraction algorithms by independent validations based on GO, MIPS and KEGG. By combining protein interactions with gene expression data, we propose a confidence scoring scheme, and then use GO to further classify modules as permanent or transient.

Conclusions: Disaggregated interactomes are shown to be informative for inferring modularity, thus contributing to an effective integrative analysis.
Validation of the extracted modules by multiple annotations allows for the assessment of confidence measures assigned to the modules in a protein pathway context. Notably, the proposed multilayer confidence scheme can be used for network calibration by enabling a transition from unweighted to weighted interactomes based on biological evidence.

* Correspondence: [email protected]
1CRS4 Bioinformatics Laboratory - Technology Park of Sardinia, Pula (Cagliari), Sardinia - Italy
Full list of author information is available at the end of the article

© 2010 Marras et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Networks are complex structures endowed with both statistical and topological properties ([1,2]). Biological networks, and protein-protein interaction networks (PPIN) in particular, require both theory and algorithms to describe complex mechanisms and relationships. Ideally, these networks can be assumed to represent snapshots depicted by connectivity maps observed at particular times. Through a sequence of such maps, we could verify how the network connectivity changes over time, and thus conceive a topological model for interpreting the dynamics and conducting inference through its built-in predictive power.

When time changes cannot be monitored, the available static pictures limit the potential for global interactome analysis. Despite this limitation, there has been great interest in analyzing topological features of networks in order to cluster proteins into groups, assign functions to uncharacterized proteins, study their similarities, and establish the reliability of the interactions. However, due to the impact of relatively low ratios between true and false positives and negatives, it is hard to accomplish those tasks before filtering interactome signals from noise.

Notably, comparative method evaluations ([3,4]) have been proposed to extract clusters of densely connected proteins which might indicate protein complexes or functional modules. The meaning of these two entities is distinct in biological terms (see [5] and [6] for an extended discussion). We refer to a protein complex (e.g. transcription factors, histones, polymerases, etc.) as a molecular machine consisting of multiple proteins (and possibly nucleic acids and other molecules) that bind at the same place and time. In contrast, a functional module (e.g. signaling pathways) represents a set of proteins (and other molecules) that control or perform a particular cellular function but not necessarily at the same time and place, and thus may not form a macromolecular complex.

However, it is often hard to distinguish between these two structures by relying only on PPIN, as in general the analyzed protein interactions do not have temporal and spatial information. Nevertheless, since PPIN represent undirected binary or weighted graphs, several graph-based inference approaches have been successfully employed to detect modularity. The majority of such approaches evaluate interactome topological features; typical examples are node degree and clustering coefficient, both based on the levels of connectivity of each node.

Both global and local connectivity can be explored by these methods, depending on the kind of interactome analysis to be performed. The results may vary, as methods are based on different principles. For instance, the two main contributions to our work come from the application of two algorithms, CFinder ([7,8]) and MCODE [9]. Interestingly, they deal with network modularity through similar topological instruments, but achieve quite different outcomes; therefore, we based our analysis on them, while also evaluating other methods.

In parallel, a substantial heterogeneity of human interactome datasets has been generated depending on the underlying methods of identifying and characterizing protein interactions. Besides high-throughput approaches ([10-12]), the curation of literature ([13-15]) and the provision of computational predictions ([16-18]) in particular have allowed for the mapping of the human interactome.

Despite the impressive size of PPIN produced by the different approaches, their overall coverage remains limited. A general procedure to increase the coverage level is the integration of different interaction maps. However, recent analyses [19] have revealed that the class of integrated PPIN may display distinct functional characteristics and topological features. This evidence suggests that the analysis of integrated maps could be compromised by the heterogeneity which is fused into them, ending up with diverging modular maps.

To assess this possibility, we have defined integrated and also individual PPIN classes; as the latter have a source-specific characterization, they in turn generate disaggregated [...] the underlying source of interaction data. We have thus retrieved modules by various methods and from each interactome class, including the integrated one, and finally compared them by multiple validations.

A comparative evaluation of human protein interactome classes suggests that a scoring system should be available. We have proposed a confidence scoring method based on several sequential steps. Using gene expression data and gene annotation, we assigned initial confidence scores to PPIN modules. In order to calibrate the initial scores, information from GO (http://www.geneontology.org/), MIPS (http://mips.helmholtz-muenchen.de/proj/ppi/), and KEGG (http://www.genome.jp/kegg/) was subsequently utilized. Overall, the combined use of interactome classes, network clustering methods and additional multiomic sources allows for better characterization of the modularity map, and for an assessment of the influence of integration and disaggregation on the detected and validated modules.

The integrative approach requires specific tools for both analysis and validation, and is based on a qualitative and quantitative representation of a compilation of information from diverse biological sources. Specifically, we have focussed on how disaggregated and integrated interactome classes influence and characterize the detected modules. In addition, we addressed two related questions: Do clustering algorithms determine modularity maps? To what extent is the overlap or separation of modules induced by inherent data complexity or by the capacity of the method to partition the data?

There are limitations in the current practice of interactome modularity detection and representation, and two main factors condition the analysis: the choice of the modularity-finding algorithm, and the choice of the interactome dataset with the related biological sources. In this article we provide evidence that quantifies the existing discrepancies between methods whose performance is comparatively evaluated over different interactome datasets. Simultaneous evaluation of both interactome methods and data may provide valuable guidance on the overall interpretability of modularity and is likely to lead to improved inference methods and models. The paper is organized as follows: module extraction algorithms are first introduced, then the structure of the available datasets is presented, and finally the results of a data integration approach are illustrated.

Methods

Detection of Modularity

Following many Saccharomyces cerevisiae (yeast) PPIN