Biological Network Approaches and Applications in Rare Disease Studies
Total Page:16
File Type:pdf, Size:1020Kb
G C A T T A C G G C A T genes Review Biological Network Approaches and Applications in Rare Disease Studies Peng Zhang 1,* and Yuval Itan 2,3 1 St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065, USA 2 The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; [email protected] 3 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA * Correspondence: [email protected]; Tel.: +1-646-830-6622 Received: 3 September 2019; Accepted: 10 October 2019; Published: 12 October 2019 Abstract: Network biology has the capability to integrate, represent, interpret, and model complex biological systems by collectively accommodating biological omics data, biological interactions and associations, graph theory, statistical measures, and visualizations. Biological networks have recently been shown to be very useful for studies that decipher biological mechanisms and disease etiologies and for studies that predict therapeutic responses, at both the molecular and system levels. In this review, we briefly summarize the general framework of biological network studies, including data resources, network construction methods, statistical measures, network topological properties, and visualization tools. We also introduce several recent biological network applications and methods for the studies of rare diseases. Keywords: biological network; bioinformatics; database; software; application; rare diseases 1. Introduction Network biology provides insights into complex biological systems and can reveal informative patterns within these systems through the integration of biological omics data (e.g., genome, transcriptome, proteome, and metabolome) and biological interactome data (e.g., protein-protein interactions and gene-gene associations). Collectively, through the applications of statistics, graph theory methods, mathematical modeling, and visualization tools, network biology has deepened our understanding of biological mechanisms [1,2] and diseases etiologies [3,4], and has facilitated therapeutics discovery [5,6]. Biological networks can be constructed and applied in a number of ways to address biological questions. Some of the general purposes of biological networks include: (1) identification and prioritization of disease-causing candidate genes [7,8]; (2) identification of disease-associated subnetworks and systematic perturbations of diseases [3,9]; and (3) capturing therapeutic responses to facilitate target identification and drug discovery [5,6]. In this review article, we will briefly introduce the general framework of biological network approaches, followed by several associated applications and methods for studying rare diseases. Figure1 illustrates a flowchart of biological network studies. Genes 2019, 10, 797; doi:10.3390/genes10100797 www.mdpi.com/journal/genes Genes 2019, 10, 797 2 of 15 Genes 2019, 10, x FOR PEER REVIEW 2 of 15 Figure 1. A flowchart illustrating the general framework of biological network studies. The blue Figure 1. A flowchart illustrating the general framework of biological network studies. The blue arrows and text correspond to the construction of biological networks, whereas the green arrows and arrows and text correspond to the construction of biological networks, whereas the green arrows and text correspond to the mapping of biological data onto the network. text correspond to the mapping of biological data onto the network. 2. General Framework of Biological Networks 2. General Framework of Biological Networks 2.1. Construction of Biological Networks 2.1. Construction of Biological Networks 2.1.1. Protein Interaction-Based Network Construction 2.1.1. Protein Interaction-Based Network Construction Protein-protein interactions (PPIs) are widely used in the construction of biological networks. The sourceProtein-protein of PPIs can interactions be obtained (PPIs) from are large-scale widely experiments,used in the construction computational of biological predictions, networks. and the miningThe source of published of PPIs can literature. be obtained Physically from large-scal interactinge experiments, PPIs have often computatio been usednal to predictions, generate the and global the referencemining of biological published network, literature. and Physically have been intera usedcting as a startingPPIs have point often for been further used analyses. to generate Several the widelyglobal reference used databases biological (e.g., networ BioGRIDk, and [10 ]),have IntAct been [11 used], STRING as a starting [12], HIPPE point [13for], further and HPRD analyses. [14]) provideSeveral humanwidely physicalused databases PPIs (Table (e.g.,1 ),BioGRID and one [10]), of them, IntAct STRING, [11], STRING also provides [12], HIPPE gene associations [13], and HPRD and interactions[14]) provide predicted human byphysical text-mining, PPIs with(Table pre-computed 1), and one confidenceof them, STRING, scores. BioGRID, also provides IntAct, gene and STRINGassociations are theand resources interactions updated predicted most frequently.by text-mining, The simplest with pre-computed biological entities confidence can be mapped scores. ontoBioGRID, PPI networks, IntAct, and including STRING lists are ofthe genes resources of interest, updated known most or frequently. candidate The disease-causing simplest biological genes, approvedentities can or be investigational mapped onto therapeutic PPI networks, targets, includin and diagnosticg lists of genes biomarkers. of interest, known or candidate disease-causingTissue-specific genes, PPI approved networks or have investigational been shown therapeutic to provide targets, significantly and diagnostic improved biomarkers. biological enrichmentTissue-specific compared PPI to thenetworks global networkhave been [15 ,shown16], particularly to provide for significantly the prioritization improved of disease biological genes andenrichment for the study compared of disease to the progression global network [7,17,18 [15,16],]. Tissue-specific particularly PPI for networks the prioritization can be constructed of disease by mappinggenes and the for gene the expression study of data disease obtained progression in transcriptome [7,17,18]. or Tissue-specific proteome experiments PPI networks onto the can global be PPIconstructed network, orby theymapping can be obtainedthe gene from expression databases da providingta obtained tissue-specific in transcriptome interactions or /networksproteome (e.g.,experiments TissueNet onto [19 ],the GIANT global [20 PPI], IID network, [21], and or TISPIN they can [22]). be These obtained resources from (Table databases1) o ff erproviding various approachestissue-specific and interactions/networks features for studying (e.g., tissue-specific TissueNet systems [19], GIANT and activities. [20], IID When [21], studyingand TISPIN diseases [22]). aTheseffecting resources multiple (Table tissues 1) (e.g., offer autoimmune various approaches disorders), and additional features for biological studying data tissue-specific are required systems for the derivationand activities. of multiple When tissue-specificstudying diseases networks affecting and formultiple the comparison tissues (e.g., of disease-associated autoimmune disorders), signals betweenadditional tissues. biological Network-based data are required therapeutics for the investigationsderivation of multiple have recently tissue-specific moved into networks the limelight. and for Thesethe comparison studies typically of disease-associated integrate tissue-specific signals protein between interaction tissues. networks Network-based and diff erentialtherapeutics gene investigations have recently moved into the limelight. These studies typically integrate tissue-specific protein interaction networks and differential gene expression data to represent the Genes 2019, 10, 797 3 of 15 expression data to represent the drug perturbation signatures in different molecular environments, and this approach has greatly improved drug target identification and drug response prediction [20,23–25]. Table 1. Databases providing human protein-protein interaction data. Global PPI databases Number of Number of Database Type of Interaction Interactions Genes/Proteins BioGRID [10] Physical 371,513 23,795 IntAct [11] Physical 379,393 25,643 STRING [12] Physical, association, text-mining 11,759,455 19,567 HIPPE [13] Physical 273,900 17,000 HPRD [14] Physical 41,327 30,047 Tissue/cell-type-specific PPI databases Number of Number of Number of Database Type of Interaction Tissues/Cell-Types Interactions Genes/Proteins TissueNet [19] Physical 40 243,706 17,283 GIANT [20] Physical, co-expression 144 n.a. n.a. IID [21] Physical, predicted 26 975,877 19,250 TISPIN [22] Physical 53 128,579 13,123 2.1.2. Gene Correlation/Association-Based Network Construction Genes can also be connected to each other on the basis of the correlation in their levels of expression, or the statistical significance of association with a given phenotype, to construct the gene-gene networks. Such networks are usually context-specific (disease-specific, tissue-specific, cell type-specific, treatment-specific, etc.) based on the condition in which the experiments were designed and performed. Gene-gene correlations/associations can be inferred by various methods, including gene co-expression correlation [26], Bayesian probability [27], Gaussian graphical model [28], or information theory method