An Integrated Tool for Network Analysis and Visualization for Multi-Omics Data Yeongjun Jang1,2, Namhee Yu1, Jihae Seo1, Sun Kim2* and Sanghyuk Lee1*
Total Page:16
File Type:pdf, Size:1020Kb
Jang et al. Biology Direct (2016) 11:10 DOI 10.1186/s13062-016-0112-y APPLICATION NOTE Open Access MONGKIE: an integrated tool for network analysis and visualization for multi-omics data Yeongjun Jang1,2, Namhee Yu1, Jihae Seo1, Sun Kim2* and Sanghyuk Lee1* Abstract Background: Network-based integrative analysis is a powerful technique for extracting biological insights from multilayered omics data such as somatic mutations, copy number variations, and gene expression data. However, integrated analysis of multi-omics data is quite complicated and can hardly be done in an automated way. Thus, a powerful interactive visual mining tool supporting diverse analysis algorithms for identification of driver genes and regulatory modules is much needed. Results: Here, we present a software platform that integrates network visualization with omics data analysis tools seamlessly. The visualization unit supports various options for displaying multi-omics data as well as unique network models for describing sophisticated biological networks such as complex biomolecular reactions. In addition, we implemented diverse in-house algorithms for network analysis including network clustering and over-representation analysis. Novel functions include facile definition and optimized visualization of subgroups, comparison of a series of data sets in an identical network by data-to-visual mapping and subsequent overlaying function, and management of custom interaction networks. Utility of MONGKIE for network-based visual data mining of multi-omics data was demonstrated by analysis of the TCGA glioblastoma data. MONGKIE was developed in Java based on the NetBeans plugin architecture, thus being OS-independent with intrinsic support of module extension by third-party developers. Conclusion: We believe that MONGKIE would be a valuable addition to network analysis software by supporting many unique features and visualization options, especially for analysing multi-omics data sets in cancer and other diseases. Reviewers: This article was reviewed by Prof. Limsoon Wong, Prof. Soojin Yi, and Maciej M Kańduła(nominatedby Prof. David P Kreil). Keywords: Network visualization, Network modeling, Graph clustering, Omics data analysis, Over-representation analysis Implementation and effectively used for visualizing biological networks with Introduction ample plugin programs for analysis [3]. However, integra- Given the huge volume and complexity of omics data such tion and utility of third-party plugins are rather limited, not as those from TCGA (The Cancer Genome Atlas) projects, satisfying some important requirements such as dissecting it is a major challenge to gain insights into biological princi- the complex network into multiple small parts of function- ples [1]. A commonly used powerful approach in address- ally or topologically related nodes and subsequent ing such challenge is to use the network-based integrative visualization of each module in distinct styles [4]. Visual analysis which requires insightful visualization in addition comparison of multiple experiments on the network often to network analysis tools [2]. Cytoscape has been widely requires tedious procedure due to the lack of overlaying functions [5]. Another important limitation is that most network programs support binary interactions only and it is * Correspondence: [email protected]; [email protected] 2Interdisciplinary Program in Bioinformatics, College of Natural Science, Seoul difficult to describe complex biochemical reactions or path- National University, Seoul, Republic of Korea way interactions. Many of above features are not easily ac- 1Ewha Research Center for Systems Biology (ERCSB), Ewha Womans complished as plugin applications of Cytoscape without University, Seoul, Republic of Korea © 2016 Jang et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Jang et al. Biology Direct (2016) 11:10 Page 2 of 9 modifying the source code of core engine. Thus, a new plat- data-to-visual mapping function. We further implemented form for visual data mining of biological networks is much serial (manual) or animated visualization of multiple ex- needed. perimental profiles on an identical network with a separ- MONGKIE (Modular Network Generation and ate window for a heat-map view (Fig. 1a). This feature Visualization with Knowledge Integration Environ- facilitates exploring complex data under various condi- ments) was developed as a general platform for inter- tions (e.g. different subtypes of cancer or time series data) active visualization and analysis of complex omics to identify co-regulatory patterns within the network com- data in the context of biological networks. Several al- ponents (5.2 in Additional file 1). Cytoscape has a similar gorithms for network analysis were implemented and function in the VistaClara plugin [6] program, but it is they are tightly coupled with the visualization tools in available in the old legacy version (Ver.2.8) only. a unified platform. Functionalities User case study Like other network tools, MONGKIE supports basic To demonstrate the utility of MONGKIE as general graph representations (4.1 in Additional file 1) and di- network analysis and visualization software, we analyzed verse interactive ways to explore or edit a network in- multi-omics data from the TCGA Glioblastoma Multi- cluding the visual editor, data-to-visual mapping (4.3 in forme (GBM) consortium [7]. Our goal was to identify Additional file 1), zooming, filtering, searching (4.4 in driver gene candidates among mutated genes and core Additional file 1), and various graph layouts (4.5 in regulatory modules with functional interpretation. Additional file 1) etc. A unique feature is the extension From the TCGA GBM data sets (3.1 in Additional file of network model beyond the basic graph representa- 1), we manually selected recurrently altered genes with tion of binary interactions to support multi-modal rela- somatic mutations (patient freq. > 0.02) or copy number tionshipsandhyper-graphs[2].Thisextensionallows aberrations (patient freq. > 0.03) as the starting gene set. us to describe complex, interwoven biochemical reac- The GBM-altered network was established by extracting tions such as formation of protein complexes or path- all shortest paths among each pair of altered genes in way reactions controlled by other components (4.2 in the STRING network (score > 900) [8] with distance Additional file 1). threshold 2 and then retaining significant linkers only Another distinguishing feature is to support creation (p-value < 0.01) [9] as described in 3.2 in Additional file and manipulation of network subgroups. Each subgroup 1. Then, we assigned the Pearson correlation coefficient can be laid out separately from other modules in distinct of expression levels in tumor samples as the edge weight styles (Fig. 1a and Fig. 3.2 in Additional file 1). Such geo- between two genes. This custom network was imported metric separation is critical to focus on particular mod- into MONGKIE as a background network using the ules of interest without being disturbed by unnecessary Interaction Manager function. Users may explore or interactions (5.1 in Additional file 1). For example, users modify the network at this stage. can easily examine pathways affected by mutated genes Next, we applied the MCL clustering algorithm [10] or cross-talks between functional modules in cancer. to identify network modules, representing genes with MONGKIE provides various methods to define sub- correlated expression and topological proximity (3.4 in groups or network modules. Several network clustering Additional file 1). The grouping utility of MONGKIE algorithms are implemented to facilitate identification allows users to illustrate the network modules and of densely interconnected network modules whose their member genes simultaneously on top of back- components are functionally related. Each cluster can ground network as shown in Fig. 1a. This unique fea- be defined as a group, member genes of which are gath- ture helps users examine biological roles of network ered for facile visualization (Fig. 1a). Such modular subgroups by the optimized layout and subsequent organization of complex network is often valuable in GO enrichment analysis. obtaining biological insights and deducing molecular Among the top five largest modules that we discov- mechanisms. To aid functional interpretation of network ered, two were in agreement with previously known modules, we provide a utility to examine statistical enrich- GBM signaling networks [7, 11] (3.5 in Additional file ment of Gene Ontology terms (5.3 in Additional file 1). 1): (i) EGFR-PI3K signaling including EGFR, PDGFRA, All components for visualization, network analysis for de- PIK3CA and PIK3R1 (Fig. 1b), and (ii) DNA damage fining subgroups, and functional interpretation of network response and cell cycle regulation including TP53, CD modules can be easily threaded into a pipeline that allows KN2A/B, CDK4, MDM2/4 and RB1 (Fig.1c).Genesets