The Identification of Similarities Between Biological Networks: Application to the Metabolome and Interactome

doi:10.1016/j.jmb.2007.03.013
J. Mol. Biol. (2007) 369, 1126–1139

Adrian P. Cootes1, Stephen H. Muggleton2 and Michael J.E. Sternberg1*

1Division of Molecular Biosciences, Imperial College London, South Kensington, London SW7 2AZ, UK
2Department of Computing, Imperial College London, South Kensington, London SW7 2AZ, UK

*Corresponding author. E-mail address of the corresponding author: [email protected]

The increasing interest in systems biology has resulted in extensive experimental data describing networks of interactions (or associations) between molecules in metabolism, protein–protein interactions and gene regulation. Comparative analysis of these networks is central to understanding biological systems. We report a novel method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence) by which similar subgraphs in a pair of networks can be identified. Like other methods, PHUNKEE explicitly considers the graphical form of the data and allows for gaps. However, it is novel in that it includes information about the context of the subgraph within the adjacent network. We also explore a new approach to quantifying the statistical significance of matching subgraphs. We report similar subgraphs in metabolic pathways and in protein–protein interaction networks. The most similar metabolic subgraphs were generally found to occur in processes central to all life, such as purine, pyrimidine and amino acid metabolism. The most similar pairs of subgraphs found in the protein–protein interaction networks of Drosophila melanogaster and Saccharomyces cerevisiae also include central processes such as cell division but, interestingly, also include protein sub-networks involved in pre-mRNA processing. The inclusion of network context information in the comparison of protein interaction networks increased the number of similar subgraphs found consisting of proteins involved in the same functional process. This could have implications for the prediction of protein function.

© 2007 Elsevier Ltd. All rights reserved.
0022-2836/$ - see front matter

Keywords: network comparison; metabolic networks; protein interactions; systems biology

Introduction

There is an increasing interest in how molecules combine in the cell to form complex biological systems. For many species, we now have an increasingly comprehensive picture of their genes and the molecules for which each of those genes code. Researchers have been building on this success of genome sequencing efforts by identifying interactions or functional associations between pairs of molecules or genes.1 There is now a considerable amount of such data in the literature with more being rapidly generated via high-throughput experiments. The complex network of relationships involved in biological systems such as protein–protein interactions,2 small-molecule metabolism,3 and gene regulation,4 can now be studied in increasing detail. There has been a considerable number of studies into general biological network properties, such as their scale-free nature.5,6 However, the development of methods to compare biological networks has been quite limited.7–14 This is in stark contrast to the variety of methods for the comparison of sequences and the number of ways used to assess the significance of a sequence match. Here, we introduce a novel method by which similar subgraphs in a pair of networks can be identified.

Most studies of biological networks have concentrated on comparing their connectivity properties to theoretical or other types of well-studied graphical systems.5,6 However, the comparison of biological networks to one another will yield more evolutionary information than simply examining each network in isolation.
Furthermore, comparing the network contexts of nodes in different biological networks may reveal information about the entities represented by those nodes. For example, the function of proteins can be predicted from information in the surrounding interaction network.15 Locating statistically significant similar subgraphs in different networks would give greater confidence to assigning function to uncharacterised proteins or interactions between proteins within those subgraphs.

Thus far, there have been a number of approaches to the comparison of biological networks. Several of these methods compare the general topological statistics of subgraphs7 or compare the statistical prevalence of different types of three or four node connection patterns.8 Particular types of small network structures have been shown to be over-represented in the transcription regulation networks of Saccharomyces cerevisiae and Escherichia coli, such as three node "feed forward loops" and four node "bi-fan" patterns (where two transcription factors each regulate the same two genes).16 A more recent approach locates approximately matching small motifs.12 Clearly, it would be desirable to be able to also consider the similarity in higher-order connectivity patterns in networks. The PATHBLAST algorithm was used to search for larger similar subgraphs by first comparing short linear "strings" of nodes within a pair of protein–protein networks,9 and has since been extended to also search for similar "dense clusters" of interacting proteins (NetworkBlast).13 Both approaches allowed for gaps in similar strings or clusters. These strings and clusters were then grouped to search for larger related subgraphs. In the initial study,9 similar subgraphs (based on short strings) in the protein interaction networks of S. cerevisiae and H. pylori were found to be involved with protein synthesis and cell rescue, protein fate and targeting, cell envelope and nuclear transport, proteolytic degradation and rRNA transcription. The latter study13 searched for similar subgraphs (based on both strings and clusters) in multiple species: Drosophila melanogaster, S. cerevisiae and Caenorhabditis elegans. The most prevalent processes found in related subgraphs in this case were protein degradation, RNA polyadenylation and splicing, protein phosphorylation and signal transduction. NetworkBlast bases its initial search on particular types of graphical form modelled on signalling pathways (strings) and protein complexes (clusters). The Græmlin algorithm extended this approach by searching efficiently for similar subgraphs of arbitrary topology across multiple networks.14 Another recent approach searched for general graphical structures by clustering nodes according to their distance from one another in each network.10 This technique was used to identify clusters of genes that code for enzymes that are also clustered in metabolic networks. This type of arrangement often implies that the [...] species. The approach used to produce these clusters finds regions in each network with many similar nodes but does not consider explicitly whether those nodes are connected to one another (and other nodes) in a similar way. For many types of network system, one would like a method to consider the similarity of the edges, as well as the proximity of nodes, in comparing subgraphs. Such a method has been demonstrated for the comparison of metabolic networks.11 Frequently occurring metabolic subgraphs were sought across a large number of species. The most similar subgraphs found participate in pyrimidine, glutamate and alanine and aspartate metabolism. However, this approach did not consider gaps in the network structure. Given that biological systems often have elements inserted or deleted in the course of evolution, it is important for a network comparison method to permit approximate matches between subgraphs.

Here, we have developed a method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence†) for the comparison of biological networks that searches for similar subgraphs of general structure, allowing for gaps, and determines the statistical significance of the match. The similarity of subgraphs was determined by considering explicitly the similarity of the edges as well as the similarity of the nodes. Furthermore, we depart from previous work in considering the set of all edges adjoining nodes belonging to a subgraph (referred to as the network context (Figure 1)), rather than simply those edges connecting nodes within the subgraph. We examined whether considering network context improved subgraph comparison and applied this technique to the comparison of pairs of metabolic and protein–protein interaction networks. We also explored two different methods for calculating the statistical significance of similar subgraphs and the suitability of each for analysing various types of biological networks. In this way, we identified similar subgraphs in pairs of metabolic networks and pairs of protein interaction networks for a number of different species. As expected, similar subgraphs were found in biological processes central to all life, such as amino acid, purine and pyrimidine metabolism, but also in processes not expected to be as strongly conserved, such as pre-mRNA processing.

Our Approach

A brief description of PHUNKEE (Figure 2) is given here and a more detailed description is given in Materials and Methods.

PHUNKEE consists of two basic steps. First, corresponding (shared) nodes and edges were identified. Second, the most similar network regions of a given, user-defined size were sought, allowing
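
The distinction drawn in the Introduction between the edges inside a subgraph and its network context (all edges adjoining the subgraph's nodes) can be illustrated with a minimal Python sketch. This is not the PHUNKEE implementation or its scoring scheme; the function names, the undirected-edge representation and the toy network are assumptions made purely for illustration.

```python
from typing import FrozenSet, Iterable, Set

# Assumption: an undirected edge is an unordered pair of node labels.
Edge = FrozenSet[str]


def internal_edges(edges: Iterable[Edge], subgraph_nodes: Set[str]) -> Set[Edge]:
    """Edges whose endpoints both lie inside the subgraph."""
    return {e for e in edges if e <= subgraph_nodes}


def network_context(edges: Iterable[Edge], subgraph_nodes: Set[str]) -> Set[Edge]:
    """All edges touching at least one subgraph node (internal plus adjoining)."""
    return {e for e in edges if e & subgraph_nodes}


# Hypothetical toy network: a path A-B-C with one further neighbour D attached to C.
toy_edges = [frozenset(pair) for pair in [("A", "B"), ("B", "C"), ("C", "D")]]
sub = {"B", "C"}

inside = internal_edges(toy_edges, sub)
context = network_context(toy_edges, sub)
```

On this toy network, `inside` holds only the B–C edge, whereas `context` also includes A–B and C–D, which is the extra information a context-aware comparison would have available.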
