Ji et al. BMC Bioinformatics (2016) 17:86
DOI 10.1186/s12859-016-0916-x

METHODOLOGY ARTICLE Open Access

A powerful score-based statistical test for group difference in weighted biological networks

Jiadong Ji†, Zhongshang Yuan†, Xiaoshuai Zhang and Fuzhong Xue*

* Correspondence: [email protected]
†Equal contributors
Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan 250012, Shandong, China

© 2016 Ji et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than by a single biomolecule. A key but inadequately addressed issue is how to test for possible differences between the networks of two groups. Group-level comparison of network properties may shed light on underlying disease mechanisms and benefit the design of drug targets for complex diseases. We therefore proposed a powerful score-based statistic to detect group difference in weighted networks, which simultaneously captures vertex changes and edge changes.

Results: Simulation studies indicated that the proposed network difference measure (NetDifM) was stable and outperformed existing methods under various sample sizes and network topologies. An application to real GWAS data on leprosy successfully identified the specific gene interaction network contributing to leprosy. For additional gene expression data on ovarian cancer, two candidate subnetworks, the PI3K-AKT and Notch signaling pathways, were considered and identified respectively.

Conclusions: The proposed method, which accounts for vertex changes and edge changes simultaneously, is valid and powerful for capturing the group difference of biological networks.

Keywords: Network medicine, Systems epidemiology, Score-based statistical test, Network comparison

Background

From the perspective of network medicine, a disease phenotype is rarely a consequence of an abnormality in a single biomolecule (e.g. RNA, protein, metabolite), but reflects various pathobiological processes that interact in a complex network [1]. A single factor can exert a certain effect on disease when studied alone, while this effect may vanish when the factor is studied within a network or pathway [2], and vice versa. Therefore, biomolecules should be studied in the context of the biological systems they are involved in [3]. Perhaps the natural abstraction for a biological system is a network, such as transcriptional regulatory networks, signal transduction networks, protein interaction networks and metabolic networks [4]. In biological networks, the vertices represent biomolecules, and the edges represent functional, causal or physical interactions between the vertices. Different types of networks are often used to represent diverse types of biological processes, each of which stores information about levels and interactions related to specific biomolecules [5]. In fact, different physiological conditions may manifest as different networks. Moreover, complex diseases are multi-factorial and analyzing the individual components is insufficient, so it is essential to dissect how these components interact with each other and weave into one network, and how these interactions differ with respect to disease status. Statistical comparison of group difference in biological networks or pathways can provide new insight into the underlying disease mechanism, and has extensive biomedical and clinical applications [6–10]. For instance, a better understanding of the effects of molecular interconnectedness on disease progression may lead to superior identification of disease-related biomolecules and pathways, which may further offer more effective targets for drug development in a cost-effective and timely manner.

On the other hand, identifying biological and environmental causes of human diseases has always been one of the central concerns in epidemiology. However, traditional epidemiology has been pejoratively labeled as "black box" epidemiology [11], and has increasingly suffered from criticism, partly because too much attention has been paid to the identification of a single risk factor rather than the network or pathway related to a disease, which has made it difficult to explore disease mechanisms in depth [12]. It is highly desirable to unlock the black box underlying observed associations and to illuminate the biological interaction mechanisms of disease-related components hiding behind it. There are unmet needs to access multi-level omics data at the population level. Thanks to recent technological advances in high-throughput omics platforms, omics data can be acquired at unprecedented speed and volume, and various omics data can be further integrated with traditional epidemiology to promote the development of systems epidemiology [12, 13]. This offers the potential to provide new insight into the underlying disease mechanisms in breadth and depth at the human population level. Under the framework of systems epidemiology, the focus has shifted from identification of a single factor to exploration of specific networks or pathways contributing to disease [14, 15].

In short, statistical comparison of biological networks is greatly needed. So far, several methods have been proposed that utilize network topology information to carry out various biomedical tasks. Langfelder et al. [16] provided several measures for comparing network topologies for weighted correlation networks. Zhang et al. [17] proposed a differential dependency network analysis to detect topological changes in transcriptional networks between subclasses of breast cancer. Valcarcel et al. [18] introduced a formal statistical method for the differential analysis of molecular pair-wise associations via network representation. Recently, Yates et al. [19] developed an additive element-wise-based dissimilarity measure for biological network hypothesis tests. However, most of the above methods mainly focus on the difference of network topology and are unable to account for the changes of vertices. Although in most situations the vertex-wise or edge-wise differences may individually be weak, their aggregated differences can be quite strong. Considering only the topological difference between two networks will undoubtedly lose statistical power. Meanwhile, non-parametric permutation procedures are commonly employed to perform the analysis in most existing methods, which is inevitably time-consuming, especially for big data. [...] strength of connection) can lead to the whole network difference. Reverter et al. [20] presented an analytical procedure to simultaneously identify genes that are differentially expressed (DE) as well as genes that are differentially connected (DC) for unweighted networks. Their method depends heavily on the specific definition of DE and DC, and the assumed two-component mixture of bivariate normal distributions may be violated in other biological networks, though it may be reasonable in gene expression networks. Furthermore, weighted (correlation-based) networks are commonly encountered and increasingly relevant in biological applications [16, 21–23]. Statistical methods for detecting the group difference in weighted biological networks are still in great demand.

In this article, we proposed a new score-based network difference measure (NetDifM) as a powerful test statistic to detect group difference in weighted networks, which simultaneously captures the difference of vertices and edges. Various simulations were conducted to evaluate its type I error and statistical power, compared with existing methods. Two real data sets, on GWAS of leprosy and gene expression of ovarian cancer, were further analyzed to show its performance in practice.

Methods

Statistical model

A weighted biological network can be modeled as an undirected graph $G = (V, E)$, where $V$ is the set of vertices (sometimes referred to as nodes) and $E$ is the set of edges (also called connections). Two vertices, representing biomolecules, are connected by an undirected edge if there is an association between them. Each edge can be assigned a weight representing the strength of evidence for the association.

We denote the two networks in the two groups (cases and controls) by $G^D$ and $G^C$ respectively. Suppose both $G^D$ and $G^C$ have the same number of vertices ($M$) and edges ($K$); the null hypothesis is $H_0: G^D = G^C$. Let $V(G^D)$ and $E(G^D)$ denote the sets of all vertices and edges in $G^D$, let $x_i^D x_j^D$ indicate the edge $x_i^D - x_j^D$ ($i \neq j$; $i, j = 1, 2, \cdots, M$), and let $\beta_{ij}^D$ represent the strength of association between $x_i^D$ and $x_j^D$ if the edge $x_i^D x_j^D$ exists. For individual $l$ ($l = 1, 2, \cdots, N$), the trait value is denoted as $Y_l$, with $Y_l = \begin{cases} 1, & l \in \text{case} \\ 0, & l \in \text{control} \end{cases}$, and the $i$th vertex is denoted as $x_{li}$. Under $H_0$, networks in two groups are identical not only in the average vertices levels but also in the