Journal of Biomedical Informatics 57 (2015) 1–5


Prioritization of candidate disease genes by combining topological similarity and semantic similarity

Bin Liu, Min Jin ⇑, Pan Zeng

College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China

Article info

Article history:
Received 26 December 2014
Revised 1 July 2015
Accepted 6 July 2015
Available online 11 July 2015

Keywords:
Disease genes
Random walk
Topological similarity
Semantic similarity

Abstract

The identification of gene–phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein–protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson’s disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method.

© 2015 Elsevier Inc. All rights reserved.

⇑ Corresponding author. E-mail address: [email protected] (M. Jin).
http://dx.doi.org/10.1016/j.jbi.2015.07.005
1532-0464/© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Because many diseases, such as cancer, diabetes, and cardiovascular diseases, result from gene mutations, exploring the relationships between diseases and their causative genes has become an important topic in contemporary systems biology. These gene mutation-caused diseases are very common in developed countries and are becoming increasingly common in developing countries [1].

Linkage analyses and association studies have been proposed to identify disease genes [2–4]. However, these methods yield genomic intervals of 0.5–10 cM that contain hundreds of genes [5,6]. Whether these genes are disease-causing requires further investigation.

In recent years, with the rapid accumulation of different types of genomic data, many computational methods for prioritizing disease genes have been proposed. One remarkable advantage of these computational methods is the reduction in manpower and material resources. Concretely, most of these methods are based on similarities between the genomic data of known disease genes and the genomic data of the candidate gene. The genomic data include sequence-based features [7,8], Gene Ontology (GO) annotation information [9,10], expression patterns [11–13], and protein interaction data [14,15]. In most cases, multiple sources of genomic data are combined to find causal genes, e.g., the combinations of GO annotation information with protein interaction data [16], GO annotation information with sequence-based features [17], and metabolic pathway data with protein interaction data [18].

Investigation of the interactions between the proteins that are encoded by genes in the human PPI network has become one of the primary and most powerful approaches for elucidating the molecular mechanisms that underlie complex diseases [19–21]. Such exploration has often been performed by comparing the topological similarities of the nodes in the PPI network. There are many methods for measuring topological similarity, including counting the common neighbors of two network nodes and calculating the distance between two network nodes. Because the data on the PPI network are incomplete, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Thus, some candidate genes cannot be well identified. To achieve better prediction results, some studies have combined the PPI network with phenotype similarity information and reported good performances. However, we believe that the results could be further improved. In biological data resources, large amounts of data describe the molecular function of genes or the biological processes in which the genes are involved. These data form the semantic information of a gene. If a candidate gene and a disease gene share a high level of semantic similarity, we can compensate for the weak interaction between the genes in the PPI network by adding the semantic similarity. Some studies have shown that GO annotation information, which is used to predict disease genes [22], is a very effective semantic resource. Based on these two types of data sources, i.e., protein interaction data and GO annotation information, this paper proposes a new method for inferring gene–phenotype relationships. We use the semantic similarity value between the candidate gene and known disease genes to set the initial probability vector of the random walk with restart (RWR) algorithm and apply this algorithm to the PPI network. When the walk reaches a stable state, we predict new disease genes according to the candidate genes’ rankings in the vector. We used leave-one-out cross-validation to demonstrate the effectiveness of our method. Compared with other methods, our method achieved better performance. Additionally, new causative genes of multifactor diseases, including Parkinson’s disease, breast cancer, and obesity, were predicted with our method. The top predictions were good and consistent with the reports in the literature, which further illustrates the validity of our method.

2. Materials and methods

2.1. Data source

Gene ontology data (released in October 2013) and a human gene annotation dataset (released in October 2013) were obtained from the Gene Ontology database [23]. The GO consists of three structured ontologies, i.e., biological process (BP), molecular function (MF), and cellular component (CC). The GO data contained 25,571 BP, 9661 MF, and 3386 CC terms. The gene annotation dataset contained 383,316 annotations of 18,911 genes.

In this paper, PPI data were downloaded from the Human Protein Reference Database (HPRD). All of the information in the HPRD has been manually extracted from the literature by expert biologists who have read, interpreted, and analyzed the published data. Disease–gene association data were obtained from the Online Mendelian Inheritance in Man (OMIM) database [24].

2.2. Summaries of the RWR and HRSS algorithms

The RWR is a ranking algorithm [14] that simulates a random walker that starts from a seed node, or from a set of seed nodes, and moves to one of its direct neighbors at random at each step. Finally, all of the nodes in the graph are ranked by the probability of the random walker reaching them. We use $P_0$ to represent the initial probability vector, and $P_s$ is a vector that holds the probability of the random walker being at each node of the graph at step $s$. The probability vector at step $s + 1$ is given by

$$P_{s+1} = (1 - d)\,M P_s + d\,P_0 \tag{1}$$

Here $M$ is the row-normalized adjacency matrix of the graph, and the parameter $d \in (0, 1)$ is the restart probability. At each step, the random walker can return to the seed nodes with probability $d$.

After some steps, the vector $P_{s+1}$ reaches a steady state. This steady state is obtained by performing the iteration until the absolute value of the difference between $P_s$ and $P_{s+1}$ falls below $10^{-6}$:

$$\lvert P_{s+1} - P_s \rvert < 10^{-6} \tag{2}$$

This paper used the HRSS algorithm (Fig. 1) that was developed by Wu [25] to measure the semantic similarity.

Fig. 1. A schematic illustration of the HRSS algorithm.

The information content (IC) is defined by Eq. (3),

$$\mathrm{IC}(c) = -\log p(c) \tag{3}$$

where $p(c)$ is the probability of the occurrence of the term $c$ in a specific corpus.

The IC-based distance between the two terms $u$ and $v$ is defined in Eq. (4), where $v$ is a descendant of $u$.

$$\mathrm{dist}_{IC}(u, v) = \mathrm{IC}(v) - \mathrm{IC}(u) = \log p(u) - \log p(v) \tag{4}$$

Then, the IC-based specificity of the most informative common ancestor (MICA) of any two terms $term_i$ and $term_j$ is

$$\alpha_{IC} = \mathrm{dist}_{IC}(\mathrm{root}, \mathrm{MICA}) = -\log p(\mathrm{MICA}) \tag{5}$$

The $\mathrm{dist}_{IC}$ between a term and the most informative leaf node (MIL) descending from the term measures the generality of the term. Component $\beta_{IC}$ represents the average of the generality values of $term_i$ and $term_j$.

$$\beta_{IC} = \frac{\mathrm{dist}_{IC}(term_i, \mathrm{MIL}_i) + \mathrm{dist}_{IC}(term_j, \mathrm{MIL}_j)}{2} \tag{6}$$

The most informative leaf nodes of $term_i$ and $term_j$ are represented by $\mathrm{MIL}_i$ and $\mathrm{MIL}_j$, respectively.

$$\mathrm{HRSS}(term_i, term_j) = \frac{1}{1 + \gamma} \cdot \frac{\alpha_{IC}}{\alpha_{IC} + \beta_{IC}} \tag{7}$$

where $\gamma$ is defined as follows:

$$\gamma = \mathrm{dist}(\mathrm{MICA}, term_i) + \mathrm{dist}(\mathrm{MICA}, term_j) \tag{8}$$

Let $g_1$ and $g_2$ be two genes of interest, and let $t_{g1}$ and $t_{g2}$ be the sets of all of the GO terms assigned to genes $g_1$ and $g_2$, respectively.

$$\mathrm{HRSS}^{GO}_{MAX}(g_1, g_2) = \max_{go_i \in t_{g1},\; go_j \in t_{g2}} \mathrm{HRSS}(go_i, go_j) \tag{9}$$
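The iteration of Eqs. (1) and (2) can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a made-up four-node path graph, and an L1 norm for the convergence test; the function name, the toy data and the rescaling of the restart vector are hypothetical choices, not the authors' implementation. The text describes M as row-normalized; the sketch divides each column by its sum so that the product of M and the probability vector conserves probability mass, which for a symmetric adjacency matrix is the transpose of the row-normalized matrix.

```python
import numpy as np


def random_walk_with_restart(adjacency, p0, d=0.7, tol=1e-6, max_iter=10_000):
    """Iterate P_{s+1} = (1 - d) * M @ P_s + d * P_0 until the change is below tol."""
    adjacency = np.asarray(adjacency, dtype=float)
    # Assumption: column normalization, so every step keeps the total mass at 1.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    m = adjacency / col_sums
    # Assumption: the restart vector is rescaled to sum to 1.
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - d) * (m @ p) + d * p0
        # Assumption: the "difference" in Eq. (2) is measured with the L1 norm.
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy undirected path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adjacency, [1, 0, 0, 0], d=0.7)
ranking = np.argsort(-scores)  # node indices from most to least visited
```

Because the update is linear in the restart vector, rescaling that vector changes the steady-state values but not the order of the nodes, so the ranking does not depend on the normalization assumed here.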

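Equations (3)–(9) can be traced on made-up numbers. The sketch below assumes that the occurrence probabilities of the relevant terms are already known, treats the dist in Eq. (8) as the IC-based distance of Eq. (4), and returns 0 when the MICA is the root; these choices and all function names are illustrative assumptions, and the code is not the HRSS program used in the paper.

```python
import math


def ic(p):
    """Eq. (3): information content of a term with occurrence probability p."""
    return -math.log(p)


def dist_ic(p_ancestor, p_descendant):
    """Eq. (4): IC(v) - IC(u), where v is a descendant of u."""
    return ic(p_descendant) - ic(p_ancestor)


def hrss_terms(p_mica, p_i, p_j, p_mil_i, p_mil_j):
    """Eq. (7) for two terms, given the probabilities of the MICA, terms and MILs."""
    alpha = ic(p_mica)  # Eq. (5): dist_IC(root, MICA), since p(root) = 1
    if alpha == 0.0:
        return 0.0  # assumption: terms that only share the root score 0
    beta = (dist_ic(p_i, p_mil_i) + dist_ic(p_j, p_mil_j)) / 2.0  # Eq. (6)
    gamma = dist_ic(p_mica, p_i) + dist_ic(p_mica, p_j)  # Eq. (8), assumed IC-based
    return (1.0 / (1.0 + gamma)) * alpha / (alpha + beta)


def hrss_max(terms_g1, terms_g2, term_score):
    """Eq. (9): the best term-pair score between the annotations of two genes."""
    return max(term_score(a, b) for a in terms_g1 for b in terms_g2)


# Hypothetical probabilities: one leaf term compared with itself, then two different terms.
same_term = hrss_terms(0.01, 0.01, 0.01, 0.01, 0.01)
different_terms = hrss_terms(p_mica=0.1, p_i=0.02, p_j=0.05, p_mil_i=0.01, p_mil_j=0.05)

# Hypothetical term-pair scores for two genes annotated with {"a", "b"} and {"c"}.
pair_scores = {("a", "c"): 0.2, ("b", "c"): 0.6}
gene_score = hrss_max(["a", "b"], ["c"], lambda x, y: pair_scores[(x, y)])
```

With these definitions a leaf term compared with itself has γ = 0 and β = 0, so its score is 1, while any pair whose MICA lies above both terms is penalized through γ.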
2.3. Prediction algorithm based on topological similarity and semantic similarity

The information contents of the genes’ GO terms and the positional relationship between the genes’ GO terms in the directed acyclic graph were used to measure the semantic similarity between two genes. A GO term describes gene information that includes molecular functions, biological processes, and cellular components. Therefore, the greater the semantic similarity shared by genes, the more functional similarity they share.

There is a relationship between two connected nodes in the PPI network. This type of relationship is reflected by factors related to chemical biology, signal transduction, genetic networks, metabolism, etc. Therefore, nodes that share topological similarity share synergy and similarity in the above aspects. Moreover, genes that share semantic similarity are similar to each other in terms of molecular functions, biological processes, and cellular components. Therefore, both the topological similarity and the semantic similarity reflect functional information and complement each other.

Researchers have found that similar phenotypes are essentially caused by functionally related genes [26]. Based on this theory, we used the maximum value and the average value of the semantic similarity between the candidate gene and known disease genes to set the initial probability vector for the RWR. The initial probability vector of a disease was constructed by assigning the value 1 to the position of each causative gene of the disease. The position value of a candidate gene was between 0 and 1 and was set according to the semantic similarity value between the candidate gene and the causative genes. Next, we used the RWR algorithm in the PPI network to measure the topological similarity.

In our method, the position value of the candidate gene in the initial probability vector $P_0$ for a specific disease can be calculated with formula (10) or formula (11). The HRSS algorithm was used to measure the semantic similarity between the candidate gene and known disease genes.

$$\mathrm{MHRSS}(g) = \max_{g_i \in G} \mathrm{HRSS}^{GO}_{MAX}(g, g_i) \tag{10}$$

$$\mathrm{AHRSS}(g) = \frac{\sum_{i=1,\; g_i \in G}^{n} \mathrm{HRSS}^{GO}_{MAX}(g, g_i)}{n} \tag{11}$$

Here $g$ is a candidate gene, $G$ is the set of known disease genes of the disease, and $n$ is the number of genes in $G$.

Because there are two ways to set the initial probability vector $P_0$, our method has two corresponding branches. The first is the combination of the RWR and MHRSS (RWRMHRSS), and the second is the combination of the RWR and AHRSS (RWRAHRSS).

3. Results

In this section, we first compare the performance of our method with other methods. Next, we explore the influence of d on our method. Finally, we use our algorithm to predict new causative genes for some multifactor diseases. The set of test genes was constructed in two ways. The first was the artificial linkage interval (ALI) approach, which assumes that one of the real disease-causing genes is not linked to the disease and constructs the test gene set from this disease-causing gene and its nearest 99 genes in the genome. The second was the validation against random genes (Rand) approach. With the same assumption as in the ALI approach, the Rand approach takes one of the real disease-causing genes and 99 control genes selected randomly from all genes in the interactome as the test gene set. The performance of our method for recovering disease–gene relationships was examined with leave-one-out cross-validation. In each round of validation, we removed a disease–gene link. The remaining causative genes for the disease were used as the seed nodes. Receiver operating characteristic (ROC) curves were used to compare the performance of our algorithm with those of other methods. ROC curves plot the sensitivity against 1 − specificity as the threshold that separates the prediction classes is varied [11]. The area under the ROC curve (AUC) indicates the performance of the algorithm.

3.1. Comparison with other methods

To assess the validity of our method, we compared its performance with those of two other methods, i.e., the RWR method [14] and the DP_LCC method [27]. All methods used the same dataset (i.e., the HPRD and OMIM databases) and the same test method (leave-one-out cross-validation). In total, there were 34,364 interactions among the 8919 genes in the gene network. The phenotype similarity network included phenotype similarities for 5080 diseases. There were 1428 gene–phenotype links between 937 genes and 1216 phenotype entities. As we needed to use the semantic information of the seed genes, only the phenotypes associated with at least two disease genes were considered in our experiment. A total of 168 phenotypes associated with 470 disease genes were obtained. The largest AUC for the DP_LCC algorithm was found when the restart probability was set to 0.7 and the weight was set to 0.5 [27]. Therefore, we set d = 0.7. Table 1 shows the AUC values of the four methods. These values indicate that the performance of our method was better than those of the RWR and DP_LCC methods. The table also shows a comparison between our two methods, i.e., RWRMHRSS and RWRAHRSS, and indicates that the latter performed better than the former.

Table 1
AUC values of our methods compared with the RWR and DP_LCC methods.

Method      ALI      Rand
RWR         0.8084   0.8228
DP_LCC      0.8255   0.8387
RWRAHRSS    0.8615   0.8846
RWRMHRSS    0.8455   0.8701

3.2. Effects of parameters

There is one free parameter in our method, namely, the restart probability d in the RWR algorithm. We tested our algorithm with different values of d (from 0.1 to 0.9 in steps of 0.2) and found that the best performance was generally obtained with d = 0.3; the only exception was RWRMHRSS on the ALI set, which peaked at d = 0.1. Across the tested values of d, the difference between the best AUC value and the worst was about 0.02 at most (e.g., 0.8737 versus 0.8531 for RWRAHRSS on the ALI set). Therefore, d did not have a substantial influence on the results. Table 2 shows our experimental results. The ROC curves for our methods at d = 0.3 are shown in Fig. 2. Fig. 2 shows that our RWRAHRSS method achieved slightly better results than the RWRMHRSS method on the two test gene sets. The results from Rand were always better than those from ALI because the test gene set constructed by ALI was not always included in the gene network. Together, these results demonstrate the following: (1) our algorithm was robust to the selection of the test gene set, and (2) increases in the numbers of known disease genes result in better reflections of all aspects of the disease.

Table 2
AUC values of our method at different values of d.

Method      d = 0.1   d = 0.3   d = 0.5   d = 0.7   d = 0.9
RWRAHRSS
  ALI       0.8722    0.8737    0.8689    0.8615    0.8531
  Rand      0.8929    0.8937    0.8913    0.8846    0.8768
RWRMHRSS
  ALI       0.8587    0.8560    0.8517    0.8455    0.8385
  Rand      0.8780    0.8795    0.8738    0.8701    0.8616

Fig. 2. ROC curves of our methods at d = 0.3.

3.3. Predicting new disease genes for some common diseases

Parkinson’s disease (PD), also known as paralysis agitans, is one of the most common neurodegenerative diseases. The clinical expression of PD is characterized by bradykinesia, muscle rigidity, resting tremor, and gait abnormalities. In the OMIM database, there are 4 genes associated with this disease: GBA, ADH1C, TBP, and MAPT. We used our method (RWRAHRSS) to predict new disease genes for PD (MIM: 168600). The top 10 predictions for this disease were BAG5, TIMM9, TIMM10, S100A5, PSPH, ADIPOQ, ADH5, PCSK9, CDH13, and DNM1L. The first prediction for PD was BAG5, which inhibits parkin. Loss-of-function mutations in the parkin gene, which encodes an E3 ubiquitin ligase, are the major causes of early-onset PD [28]. BAG5 plays a potential role in the promotion of neurodegeneration in sporadic PD [29]. The last prediction was DNM1L, which is also known as DLP1, DRP1, and EMPF. Mutations of this gene have been reported to be associated with several neurological disorders [30]. DNM1L is a key protein in mitochondrial dynamics. Decreases in this protein may interfere with neuron survival in PD [31]. Mitochondrial dysfunction represents an important event in the pathogenesis of PD [32]. Moreover, DNM1L is one of the key regulators of mitochondrial fission [33].

Breast cancer is a common malignancy that often occurs in females. The clinical expression of breast cancer may include a lump in the breast, a change in the breast shape, dimpling of the skin, fluid coming from the nipple, or a red scaly patch of skin. According to the OMIM record, breast cancer (MIM: 114480) has the following 23 validated causative genes: RAD54L, CASP8, BARD1, PIK3CA, HMMR, NQO2, ESR1, RB1CC1, SLC22A1L, TSG101, ATM, KRAS, BRCA2, XRCC3, AKT1, RAD51A, PALB2, CDH1, TP53, PHB, PPM1D, BRIP1, and CHEK2. When we applied our method (RWRAHRSS), the top 10 predictions for breast cancer were PINK1, MSH5, TOP3A, RAD54L, DMC1, MLH3, PCCB, NMNAT1, BLM, and CHEK2. The fourth prediction for breast cancer was RAD54L. Notably, RAD54L is listed as a cause of breast cancer in the latest OMIM database. This finding supports the effectiveness of our approach. BLM was another prediction and is also a suspected causative gene for breast cancer [34,35]. BLM plays key roles in homologous recombination repair and DNA replication. Some mutations in the BLM gene cause Bloom’s syndrome, which leads to multiple cancers including breast cancer [36]. The last prediction for breast cancer was CHEK2. CHEK2 is also listed as a cause of breast cancer in the latest OMIM database. This further illustrates the effectiveness of our approach.

Obesity is a common, ancient metabolic syndrome. When the body absorbs more calories than it consumes, the excess calories are stored as fat. When the stored fat exceeds the normal physiological requirements by a certain amount, obesity results. In the OMIM record, obesity (MIM: 601665) has the following 16 validated causative genes: NR0B2, SDC3, POMC, GHRL, PPARG, UCP1, CART, ADRB2, PPARGC1B, SIM1, ENPP1, ADRB3, UCP3, AGRP, PYY, and MC4R. The top 10 predictions of our algorithm (RWRAHRSS) for obesity were ADIPOQ, UCN, TYMS, NDP, CRH, GHRL, ADIPOR2, CRHBP, NMUR2, and GIP. The first prediction was ADIPOQ, which plays an important role in regulating glucose levels and fatty acid oxidation [37]. The literature indicates that ADIPOQ is closely related to obesity [38,39]. Adiponectin is the protein that is encoded by the ADIPOQ gene. It has been reported that low circulating levels of adiponectin and single nucleotide polymorphisms of ADIPOQ are associated with central obesity [40]. Another prediction was CRH, which suppresses food intake and may be associated with obesity [41]. Some studies have found that elevated placental CRH is associated with childhood outcomes, such as increased adiponectin levels and increased central adiposity [42], and adiponectin is itself related to central obesity, as described above.

4. Conclusion

In the present paper, a method combining the topological similarity in the PPI network and gene semantic similarity was proposed for prioritizing candidate disease genes. The novelty of our method lies in the use of the gene semantic similarity value to set the initial probability vector of the RWR. The results of leave-one-out cross-validation on a benchmark dataset revealed that the RWRAHRSS achieved better performance (as measured by the AUC) than the RWRMHRSS. This result indicates that the use of greater numbers of known disease genes results in better reflections of all aspects of the disease. Our method also outperformed the RWR and DP_LCC methods. Furthermore, we predicted new causative genes for some common diseases, including Parkinson’s disease, breast cancer, and obesity, with our algorithm and found that a portion of the predictions were corroborated by recent experimental reports. These results all support the effectiveness of our approach.

In our experiment, we found that the top (e.g., 20) identified genes varied widely between these methods. We have explored this phenomenon only from the perspective of information science. The biological principle of this phenomenon remains unclear. Our team is applying the new method to the prediction of disease genes, and this phenomenon will be examined and analyzed based on additional experimental results as part of our future research.

The PPI network suffers from both high false positive and high false negative rates. As the PPI network becomes more complete and GO annotation information becomes more abundant, the prediction results of our method should improve. Additionally, we can integrate other genomic information, such as gene expression data and pathway membership data, to further improve our method. Moreover, many studies have suggested that a PPI network can also be used to identify disease-related sub-networks. Thus, further attention should be given to the modularity of disease.

Authors’ contributions

BL conceived the algorithm, performed the programming and drafted the manuscript. MJ participated in the algorithm design and analysis of the data and modified the manuscript. PZ also modified the manuscript. All authors have read and approved the final manuscript.

Conflict of interests

The authors declare that they have no conflict of interests.

Acknowledgements

We would like to thank Dr. Wu from Hangzhou Normal University for providing the HRSS program.

This work is supported by the National Natural Science Foundation of China (Grant No. 61374172).

Appendix A. Supplementary materials

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jbi.2015.07.005.

References

[1] G. Gibson, Decanalization and the origin of complex disease, Nat. Rev. Genet. 10 (2) (2009) 134–140.
[2] D. Altshuler, M.J. Daly, E.S. Lander, Genetic mapping in human disease, Science 322 (5903) (2008) 881–888.
[3] H.J. Cordell, D.G. Clayton, Genetic association studies, Lancet 366 (9491) (2005) 1121–1131.
[4] M.D. Teare, J.H. Barrett, Genetic linkage studies, Lancet 366 (9490) (2005) 1036–1044.
[5] W. Anne, H. van Rensburg, J. Adams, H. Ector, F. Van de Werf, H. Heidbuchel, Ablation of post-surgical intra-atrial reentrant tachycardia. Predilection target sites and mapping approach, Eur. Heart J. 23 (20) (2002) 1609–1616.
[6] D. Botstein, N. Risch, Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease, Nat. Genet. 33 (Suppl) (2003) 228–237.
[7] E.A. Adie, R.R. Adams, K.L. Evans, et al., Speeding disease gene discovery by sequence based candidate prioritization, BMC Bioinformatics 6 (2005) 55.
[8] F.S. Turner, D.R. Clutterbuck, C.A.M. Semple, POCUS: mining genomic sequence annotation to predict disease genes, Genome Biol. 4 (11) (2003).
[9] J. Freudenberg, P. Propping, A similarity-based method for genome-wide prediction of disease-relevant human genes, Bioinformatics 18 (Suppl) (2002) S110–S115.
[10] C. Perez-Iratxeta, P. Bork, M.A. Andrade, Association of genes to genetically inherited diseases using data mining, Nat. Genet. 31 (3) (2002) 316–319.
[11] S. Aerts, D. Lambrechts, S. Maity, P. Van Loo, B. Coessens, F. De Smet, L.C. Tranchevent, B. De Moor, P. Marynen, B. Hassan, et al., Gene prioritization through genomic data fusion, Nat. Biotechnol. 24 (5) (2006) 537–544.
[12] L. Franke, L. Fokkens, H. van Bakel, E.D. de Jong, et al., Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes, Am. J. Hum. Genet. 78 (6) (2006) 1011–1025.
[13] M.A. van Driel, K. Cuelenaere, P.P. Kemmeren, J.A. Leunissen, H.G. Brunner, G. Vriend, GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases, Nucleic Acids Res. 33 (Web Server) (2005) W758–W761.
[14] S. Köhler, S. Bauer, D. Horn, P.N. Robinson, Walking the interactome for prioritization of candidate disease genes, Am. J. Hum. Genet. 82 (4) (2008) 949–958.
[15] J. Xu, Y. Li, Discovering disease–genes by topological features in human protein–protein interaction network, Bioinformatics 22 (22) (2006) 2800–2805.
[16] C. Ortutay, M. Vihinen, Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies, Nucleic Acids Res. 37 (2) (2009) 622–628.
[17] C. Perez-Iratxeta, M. Wjst, P. Bork, M.A. Andrade, G2D: a tool for mining genes associated with disease, BMC Genet. 6 (3) (2005) 45–53.
[18] Lian Chen, Yan Zhao, LiangCai Zheng, et al., CTFMining: a method to predict candidate disease genes based on the combined network topological features mining, in: The 3rd International Conference on Bioinformatics and Biomedical Engineering, Beijing, 11–13 June 2009.
[19] T. Ideker, R. Sharan, Protein networks in disease, Genome Res. 18 (4) (2008) 644–652.
[20] M. Oti, B. Snel, M.A. Huynen, H.G. Brunner, Predicting disease genes using protein–protein interactions, J. Med. Genet. 43 (2006) 691–698.
[21] I. Feldman, A. Rzhetsky, D. Vitkup, Network properties of genes harboring inherited disease mutations, Proc. Natl. Acad. Sci. U.S.A. 105 (11) (2008) 4323–4328.
[22] L. Franke, H. van Bakel, L. Fokkens, E.D. de Jong, et al., Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes, Am. J. Hum. Genet. 78 (6) (2006) 1011–1025.
[23] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, et al., Gene ontology: tool for the unification of biology, Nat. Genet. 25 (1) (2000) 25–29.
[24] A. Hamosh, A.F. Scott, J.S. Amberger, C.A. Bocchini, V.A. McKusick, Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders, Nucleic Acids Res. 33 (Database) (2005) 514–517.
[25] X. Wu, E. Pang, K. Lin, Z.M. Pei, Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method, PLoS ONE 8 (5) (2013) e66745.
[26] J.G. Sanchez, C. Barton, V. David, Human disease genes, Nature 409 (15) (2001) 853–855.
[27] Jie Zhu, Yufang Qin, et al., Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles, in: RECOMB-seq: Third Annual Recomb Satellite Workshop on Massively Parallel Sequencing, Beijing, China, 11–12 April 2013.
[28] K.K. Chung, T.M. Dawson, Parkin and Hsp70 sacked by BAG5 [abstract], Neuron (2004).
[29] S. Lee, P.D. Smith, L. Liu, et al., BAG5 inhibits parkin and enhances dopaminergic neuron degeneration, Neuron 44 (6) (2004) 931–945.
[30] http://www.ncbi.nlm.nih.gov/gene/10059.
[31] J.G. Hoekstra, T.J. Cook, T. Stewart, et al., Astrocytic dynamin-like protein 1 regulated neuronal protection against excitotoxicity in Parkinson disease, Am. J. Pathol. 185 (2) (2015) 536–549.
[32] X. Wang, T.G. Petrie, Y. Liu, et al., Parkinson’s disease-associated DJ-1 mutations impair mitochondrial dynamics and cause mitochondrial dysfunction, J. Neurochem. 121 (5) (2012) 830–839.
[33] J. Grohm, S.W. Kim, U. Mamrak, et al., Inhibition of Drp1 provides neuroprotection in vitro and in vivo, Cell Death Differ. 19 (9) (2012) 1446–1458.
[34] A. Sassi, M. Popielarski, E. Synowiec, et al., BLM and RAD51 genes polymorphism and susceptibility to breast cancer, Pathol. Oncol. Res. 19 (3) (2013) 451–459.
[35] A.P. Sokolenko, A.G. Iyevleva, E.V. Preobrazhenskaya, N.V. Mitiushkina, S.N. Abysheva, E.N. Suspitsin, E.Sh. Kuligina, et al., High prevalence and breast cancer predisposing role of the BLM c.1642 C > T (Q548X) mutation in Russia, Int. J. Cancer 130 (12) (2012) 2867–2873.
[36] A.P. Sokolenko, A.G. Iyevleva, E.V. Preobrazhenskaya, N.V. Mitiushkina, S.N. Abysheva, E.N. Suspitsin, E.Sh. Kuligina, et al., Transcriptomic and protein expression analysis reveals clinicopathological significance of Bloom’s syndrome helicase (BLM) in breast cancer [abstract], Mol. Cancer Ther. (2015).
[37] J. Wu, Z. Liu, K. Meng, L. Zhang, Association of adiponectin gene (ADIPOQ) rs2241766 polymorphism with obesity in adults: a meta-analysis, PLoS ONE 4 (2014) e95.
[38] S. Bag, A. Anbarasu, Revealing the strong functional association of adipor2 and cdh13 with adipoq: a gene network study [abstract], Cell Biochem. Biophys. (2014).
[39] J.F. Lu, Y. Zhou, G.H. Huang, H.X. Jiang, et al., Association of ADIPOQ polymorphisms with obesity risk: a meta-analysis [abstract], Hum. Immunol. (2014).
[40] K. Ramya, K.A. Ayyappa, S. Ghosh, V. Mohan, V. Radha, Genetic association of ADIPOQ gene variants with type 2 diabetes, obesity and serum adiponectin levels in south Indian population, Gene 532 (2) (2013) 253–262.
[41] L. Sominsky, S.J. Spencer, Eating behavior and stress: a pathway to obesity, Front. Psychol. 5 (2014) 434.
[42] S.A. Stout, E.V. Espel, C.A. Sandman, L.M. Glynn, E.P. Davis, Fetal programming of children’s obesity risk, Psychoneuroendocrinology 53 (2015) 29–39.