Sequence-Based Prediction of Protein Protein Interaction Using a Deep-Learning Algorithm Tanlin Sun1, Bo Zhou1, Luhua Lai1,2,3 and Jianfeng Pei1*

Total Page:16

File Type:pdf, Size:1020Kb

Sequence-Based Prediction of Protein Protein Interaction Using a Deep-Learning Algorithm Tanlin Sun1, Bo Zhou1, Luhua Lai1,2,3 and Jianfeng Pei1* Sun et al. BMC Bioinformatics (2017) 18:277 DOI 10.1186/s12859-017-1700-2 RESEARCHARTICLE Open Access Sequence-based prediction of protein protein interaction using a deep-learning algorithm Tanlin Sun1, Bo Zhou1, Luhua Lai1,2,3 and Jianfeng Pei1* Abstract Background: Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. Results: We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. Conclusions: To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field. Keywords: Deep learning, Protein-protein interaction Background high-throughput computational methods are needed to Protein-protein interactions (PPI) play critical roles in identify PPIs with high quality and accuracy. many cellular biological processes, such as signal trans- Recently, many computational methods have been gen- duction, immune response, and cellular organization. erated to solve this problem. Of these, some have Analysis of PPI is therefore of great importance and may attempted to mine new protein information, whereas shed light on drug target detection and aid in therapy others involved the development of new machine-learning design [1]. Biochemical assays, chromatography, and algorithms. For protein information mining, Shen et.al similar small-scale experimental methods have long been regarded any three continuous amino acids as a unit and used to identify novel PPIs, but these only contribute to calculated the frequencies of those conjoint triads in the a low coverage of the whole PPI database due to their protein sequences. They demonstrated that PPIs could be poor efficacies [2]. High-throughput technologies, such predicted by sequences alone [6]. Several other methods, as yeast two-hybrid screens (Y2H) [3] and mass spec- such as autocovariance (AC) [7] and amino acid index dis- trometric protein complex identification (MS-PCI) [4], tribution [8] were developed to extract features such as have generated copious data, however, they are expen- physical chemical properties, frequencies, and locations of sive and time consuming. In addition, these methods amino acids to represent a protein sequence. Considering may not be applicable to proteins from all organisms the high dimensions of the features, dimension reduction and often produce false-positive results [5]. Therefore, techniques have been used. For machine-learning algo- rithms, support vector machine (SVM) and its derivatives [9, 10], random forest [11] and neural networks [12], have * Correspondence: [email protected] been applied. However, most studies provided only the 1Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China results of cross-validation, and did not test prediction Full list of author information is available at the end of the article results using external datasets [6, 10, 13, 14]. © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Sun et al. BMC Bioinformatics (2017) 18:277 Page 2 of 8 Deep-learning algorithms, which mimic the deep Datasets neural connections and learning processes of the human Benchmark dataset brain, have received considerable attention due to their We obtained the Pan’s PPI dataset from http://www.csbio. successful applications in speech and image recognition sjtu.edu.cn/bioinf/LR_PPI/Data.htm [14]. In this dataset, [15, 16], natural language understanding [17] and deci- the positive samples (PPIs) are from the human protein sion making [18]. Compared to traditional machine references database (HPRD, 2007 version), with removal learning methods, deep-learning algorithms can handle of duplicated interactions (36,630 pairs remained). large-scale raw and complex data and automatically Negative samples (non-interaction pairs) were generated learn useful and more abstract features [19]. In recent by pairing proteins found in different subcellular locations. years, these algorithms have been applied to bioinfor- The protein subcellular location information was from the matics to manage increasing amounts and dimensions of Swiss-Prot database, version 57.3, according to the fol- data generated by high throughput technique [20–24]. lowing criteria. (1) Only human proteins were collected. For genome regulation function prediction, for example, (2) Sequences annotated with ambiguous or uncertain Xiong et al. applied a deep neural network model to subcellular location terms, such as “potential”, “probable”, predict DNA variants causing aberrant splicing. Their “probably”, “maybe”,or“by similarity”, were excluded. (3) method was more accurate than traditional models [25]. Sequences annotated with two or more locations were The DeepBind model constructed by Alipanahi and col- excluded for lack of uniqueness. (4) Sequences annotated leagues using convolutional networks could predict with “fragment” were excluded, and sequences with fewer sequence specificities of DNA- and RNA-binding proteins, than 50 amino acid residues were removed due to the pos- and identify binding motifs [26]. Identifying functional sibility that they may represent fragments. effects of noncoding variants is a major challenge in hu- In total, 2,184 unique proteins from six subcellular man genetics. DeepSEA developed by Zhou et al. could locations (cytoplasm, nucleus, endoplasmic reticulum, directly learn a regulatory sequence code from large-scale Golgi apparatus, lysosome, and mitochondrion) were chromatin-profiling data, enabling prediction of chromatin obtained. By randomly pairing those proteins with others effects of sequence alterations with single-nucleotide sensi- found in different subcellular locations, along with the tivity [27]. After that, the DnaQ model constructed by addition of negative pairs from [33], a total of 36,480 Quang and coworkers achieved more than a 50% relative negative pairs were generated. We removed protein pairs improvement compared to other models for predicting the with unusual amino acids, such as U and X to yield function of non-coding DNA [28]. For protein function 36,545 positive samples and 36,323 negative samples to prediction, Spencer et al. used a deep belief network form the benchmark dataset. The interaction networks (DBN) to predict protein secondary structures and they and the degree distributions of the positive and nega- achieved an accuracy of 80.7% [29]. Sheng and colleagues tive sample sets of the benchmark dataset are shown in improved the prediction accuracy to 84% using deep con- Additionalfile1FigureS1andS2. volutional neural fields [30]. Heffernan et al.’salgorithm We mixed the positive and negative samples in the can not only predict secondary structures, but also can benchmark dataset and randomly selected 7,000 pairs predict backbone angles and solvent accessible surface (3,493 positive samples and 3,507 negative samples) as a areas [31]. A more detailed summary of the application of hold-out test set for model validation, the remainder of the deep learning algorithm in computational biology can which formed the pre-training set (33,052 positive sam- be found in a recent review [32]. ples and 32,816 negative samples). The pre-training set In this study, we applied Stacked autoencoder (SAE) was trained and tested using 10-CV, and the best models to study sequence-based human PPI predictions. Models were selected to predict the hold-out test set. To test the based on protein sequence autocovariance coding robustness of the model, a non-redundant test set (‘NR- achieved the best results on 10-fold cross-validation (10- test set’) was formed by removing pairs in the hold-out CV) and on predicting hold-out test sets. The best test set with a pairwise identity ≥25% to those in the model had an average accuracy of 97.19% for the whole pre-training set. After the network architecture and training benchmark dataset. Various external test sets parameters were selected, we trained with the whole were constructed and predicted using our model and benchmark dataset to construct our final PPI prediction the prediction accuracies for these ranged from 87.99 model and used it to predict the external test sets. to 99.21%. In addition, we trained and tested PPI models on other species,
Recommended publications
  • Systems Biology: Tracking the Protein–Metabolite Interactome
    RESEARCH HIGHLIGHTS SYSTEMS BIOLOGY Tracking the protein–metabolite interactome A method combining limited proteolysis with mass spectrometry systematically detects protein–metabolite interactions. The genome, the transcriptome and the proteome all receive a lot of attention, but there are also multitudes of small molecules in a cell, collectively making up the chemically diverse metabolome. Metabolites wear many hats, serving as enzyme substrates and products, cofac- tors, allosteric regulators, and as mediators of protein-complex assembly. But despite their essential roles in biological processes, their interactions with proteins—which are often tran- sient and low-affinity—remain largely a mystery. “Understanding how these interactions occur on a global scale is essential to understand mechanisms of cellular adaptation and ecosystems’ dynamics,” says Paola Picotti of ETH Zurich in Switzerland. She and her colleagues recently designed a method to systematically discover which proteins bind to a metabolite of interest. The approach may also be useful for drug discovery, as a way to identify druggable sites in proteins and to test for off-target effects. To identify protein–metabolite interactions in Escherichia coli, Picotti’s team treated a whole- cell lysate with a metabolite of interest then added a low amount of the broad-specificity pro- tease proteinase K for a short period of time, all under native conditions. This ‘limited prote- olysis’ approach generates structure-specific protein fragments—metabolite binding can block proteinase K cleavage at locations which would otherwise be severed. Switching to denaturing conditions, the researchers then used the enzyme trypsin to completely digest the metabolite- treated sample and a reference untreated sample, generating peptides for label-free quantitative mass spectrometry analysis, and compared the resulting differential peptide spectral patterns.
    [Show full text]
  • A Metabolic Modeling Approach Reveals Promising Therapeutic Targets and Antiviral Drugs to Combat COVID-19
    www.nature.com/scientificreports OPEN A metabolic modeling approach reveals promising therapeutic targets and antiviral drugs to combat COVID‑19 Fernando Santos‑Beneit 1, Vytautas Raškevičius2, Vytenis A. Skeberdis2 & Sergio Bordel 1,2* In this study we have developed a method based on Flux Balance Analysis to identify human metabolic enzymes which can be targeted for therapeutic intervention against COVID‑19. A literature search was carried out in order to identify suitable inhibitors of these enzymes, which were confrmed by docking calculations. In total, 10 targets and 12 bioactive molecules have been predicted. Among the most promising molecules we identifed Triacsin C, which inhibits ACSL3, and which has been shown to be very efective against diferent viruses, including positive‑sense single‑stranded RNA viruses. Similarly, we also identifed the drug Celgosivir, which has been successfully tested in cells infected with diferent types of viruses such as Dengue, Zika, Hepatitis C and Infuenza. Finally, other drugs targeting enzymes of lipid metabolism, carbohydrate metabolism or protein palmitoylation (such as Propylthiouracil, 2‑Bromopalmitate, Lipofermata, Tunicamycin, Benzyl Isothiocyanate, Tipifarnib and Lonafarnib) are also proposed. Te COVID-19 pandemic, caused by the virus SARS-CoV-2, has resulted in a substantial increase in mortality and serious economic and social disruption worldwide1. In this context, the rapid identifcation of therapeutic molecules against SARS-CoV-2 is essential. To this aim an extensive collaboration and teamwork among research- ers of all academic disciplines is required2. Computational methods and systems biology approaches, as the one presented here, can play a signifcant role in this process of identifcation of suitable drugs.
    [Show full text]
  • Deriving Disease Modules from the Compressed Transcriptional Space Embedded in a Deep Auto-Encoder
    bioRxiv preprint doi: https://doi.org/10.1101/680983; this version posted June 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Title: Deriving Disease Modules from the Compressed Transcriptional Space Embedded in a Deep Auto-encoder Sanjiv K. Dwivedi1, Andreas Tjärnberg1, Jesper Tegnér2,3,4 and Mika Gustafsson1 1Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden. 2Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955–6900, Saudi Arabia. 3Unit of Computational Medicine, Department of Medicine, Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden. 4Science for Life Laboratory, Solna, Sweden. Abstract Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet biological networks, commonly used to define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without assuming the prior knowledge of a biological network. To this end we train a deep auto-encoder on a large transcriptional data-set. Our hypothesis is that such modules could be discovered in the deep representations within the auto-encoder when trained to capture the variance in the input-output map of the transcriptional profiles. Using a three-layer deep auto-encoder we find a statistically significant enrichment of GWAS relevant genes in the third layer, and to a successively lesser degree in the second and first layers respectively.
    [Show full text]
  • MS-Based Interactomics: Computational Resources and Tools for Studying the Physical Interactome
    MS-based Interactomics: Computational resources and tools for studying the physical interactome ASMS Bioinformatics MS Interest Group Wednesday Evening Workshop Isabell Bludau & Bill Noble What is ‘interactomics’ and why do we discuss it? • Many MS-omics studies focus on cataloging and quantifying individual molecules of a particular type • e.g. quantitative protein or metabolite matrix • Most biological molecules don’t operate in isolation but they interact with each other • protein complexes • activity regulation via metabolite/drug binding • ‘Interactome’ = comprehensive set of molecular interactions in biological system • here we focus only on physical (not functional) interactions Current MS-based techniques for large-scale interactomics Protein-protein interaction (PPI) networks: • Affinity-purification MS (AP-MS) ➭ Interaction network • Proximity-dependent labeling: APEX, BioID Protein-protein complexes: • Protein co-fractionation MS (CoFrac-MS) ➭ Protein complexes protein intensity fractions Structural information on PPIs: • Cross-linking MS ➭ Structure: interacting protein residues Current MS-based techniques for large-scale interactomics Protein-metabolite/drug interactions: • Thermal proteome profiling (TPP) / ➭ Protein-ligand Cellular Thermal Shift Assay (CETSA) +/- drug protein interactions non-denatured • Limited proteolysis-coupled MS (LipMS) temperature Protein-RNA interactions: ➭ Protein-RNA • Protein-RNA crosslinking interactions at residue resolution Available resources and databases Databases based on: Databases: •
    [Show full text]
  • Interactome INSIDER: a Structural Interactome Browser for Genomic Studies
    RESOURCE Interactome INSIDER: a structural interactome browser for genomic studies Michael J Meyer1–3,6, Juan Felipe Beltrán1,2,6, Siqi Liang1,2,6, Robert Fragoza2,4, Aaron Rumack1,2, Jin Liang2, Xiaomu Wei1,5 & Haiyuan Yu1,2 We present Interactome INSIDER, a tool to link genomic been demonstrated that mutations tend to localize to interaction variant information with structural protein–protein interfaces, and mutations on the same protein may cause clini- interactomes. Underlying this tool is the application cally distinct diseases by disrupting interactions with different of machine learning to predict protein interaction interfaces partners6,8. However, the binding topologies of interacting pro- for 185,957 protein interactions with previously unresolved teins can only be determined at atomic resolution through X-ray interfaces in human and seven model organisms, including crystallography, NMR, and (more recently) cryo-EM9 experi- the entire experimentally determined human binary ments, which limits the number of interactions with resolved interactome. Predicted interfaces exhibit functional interaction interfaces. properties similar to those of known interfaces, including To study protein function on a genomic scale, especially as it relates enrichment for disease mutations and recurrent cancer to human disease, a large-scale set of protein interaction interfaces mutations. Through 2,164 de novo mutagenesis experiments, is needed. Thus far, computational methods such as docking10 and we show that mutations of predicted and known interface homology modeling11 have been employed to predict the atomic- residues disrupt interactions at a similar rate and much more level bound conformations of interactions whose experimental frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER structures have not yet been determined.
    [Show full text]
  • Interactome of the Hepatitis C Virus: Literature Mining with Andsystem
    G Model VIRUS-96776; No. of Pages 9 ARTICLE IN PRESS Virus Research xxx (2015) xxx–xxx Contents lists available at ScienceDirect Virus Research j ournal homepage: www.elsevier.com/locate/virusres Interactome of the hepatitis C virus: Literature mining with ANDSystem a,b a,b,c a,b Olga V. Saik , Timofey V. Ivanisenko , Pavel S. Demenkov , a,b,∗ Vladimir A. Ivanisenko a The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia b PB-soft, LLC, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia c Novosibirsk State University, Pirogova Str. 2, 630090 Novosibirsk, Russia a r t i c l e i n f o a b s t r a c t Article history: A study of the molecular genetics mechanisms of host–pathogen interactions is of paramount importance Received 15 July 2015 in developing drugs against viral diseases. Currently, the literature contains a huge amount of informa- Received in revised form 22 October 2015 tion that describes interactions between HCV and human proteins. In addition, there are many factual Accepted 3 December 2015 databases that contain experimentally verified data on HCV–host interactions. The sources of such data Available online xxx are the original data along with the data manually extracted from the literature. However, the manual analysis of scientific publications is time consuming and, because of this, databases created with such an Keywords: approach often do not have complete information. One of the most promising methods to provide actu- Hepatitis C virus ANDSystem alisation and completeness of information is text mining.
    [Show full text]
  • Dynamic Rewiring of the Human Interactome by Interferon Signalling
    bioRxiv preprint doi: https://doi.org/10.1101/766808; this version posted September 12, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Dynamic rewiring of the human interactome by interferon signalling 1,2,3,4 1,3 1 2 Craig H. Kerr ,​ Michael A. Skinnider ,​ Angel M. Madero ,​ Daniel D.T. Andrews ,​ R. Greg ​ ​ ​ ​ 1 1 1 2 1,2 Stacey ,​ Queenie W.T. Chan ,​ Nikolay Stoynov ,​ Eric Jan ,​ Leonard J. Foster *​ ​ ​ ​ ​ ​ 1 ​ Michael Smith Laboratories, University of British Columbia, Vancouver BC, V6T 1Z4 2 ​ Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver BC, V6T 1Z3 3 ​ These authors contributed equally 4 ​ Current address: Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA. * Correspondence: [email protected] keywords: interferon, proteomics, interferon stimulated gene, innate immunity, protein correlation profiling, interactome, protein complexes 1 bioRxiv preprint doi: https://doi.org/10.1101/766808; this version posted September 12, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. ABSTRACT Background: The type I interferon (IFN) response is an ancient pathway that protects cells ​ against viral pathogens by inducing the transcription of hundreds of IFN-stimulated genes (ISGs). Transcriptomic and biochemical approaches have established comprehensive catalogues of ISGs across species and cell types, but their antiviral mechanisms remain incompletely characterized.
    [Show full text]
  • Interactions of the Antiviral Factor Interferon Gamma-Inducible Protein 16 (IFI16) Mediate Immune Signaling and Herpes Simplex Virus-1 Immunosuppression*□S
    crossmark Research © 2015 by The American Society for Biochemistry and Molecular Biology, Inc. This paper is available on line at http://www.mcponline.org Interactions of the Antiviral Factor Interferon Gamma-Inducible Protein 16 (IFI16) Mediate Immune Signaling and Herpes Simplex Virus-1 Immunosuppression*□S Benjamin A. Diner, Krystal K. Lum, Aaron Javitt, and Ileana M. Cristea‡ The interferon-inducible protein IFI16 has emerged as a The ability of mammalian cells to distinguish self from non- critical antiviral factor and sensor of viral DNA. IFI16 self is paramount for triggering host immune defenses in binds nuclear viral DNA, triggering expression of antivi- response to viral infection. To this end, cells intrinsically ex- ral cytokines during infection with herpesviruses. The press highly specialized receptors purposed with “sensing” knowledge of the mechanisms and protein interactions viral nucleic acids. Upon binding to their viral cognate ligand, through which IFI16 exerts its antiviral functions re- these cellular receptors initiate intracellular immune signaling mains limited. Here, we provide the first characteriza- cascades, culminating in the production and secretion of tion of endogenous IFI16 interactions following infection cytokines, such as type I interferons (IFNs)1 (1). Subsequently, with the prominent human pathogen herpes simplex vi- rus 1 (HSV-1). By integrating proteomics and virology these cytokines stimulate antiviral gene programs in neigh- approaches, we identified and validated IFI16 interac- boring cells and mobilize effectors of the innate and adaptive tions with both viral and host proteins that are involved arms of the host immune system. These cytokine functions in HSV-1 immunosuppressive mechanisms and host an- are critical to abate viral replication and spread at the site of tiviral responses.
    [Show full text]
  • Viral-Host Interactome Analyzed by Network Based-Approach Model to Study Pathogenesis of SARS-Cov-2 Infection
    bioRxiv preprint doi: https://doi.org/10.1101/2020.05.07.082487; this version posted May 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 COVID-19: Viral-host interactome analyzed by network based-approach model to study pathogenesis 2 of SARS-CoV-2 infection. 3 4 5 Francesco Messina1°, Emanuela Giombini1°, Chiara Agrati1, Francesco Vairo1, Tommaso Ascoli 6 Bartoli1, Samir Al Moghazi1, Mauro Piacentini1,2, Franco Locatelli3, Gary Kobinger4, Markus 7 Maeurer5, Alimuddin Zumla6, Maria R. Capobianchi1*, Francesco Nicola Lauria1°, Giuseppe 8 Ippolito1°, COVID 19 INMI Network Medicine for IDs Study Group. 9 ° These authors contributed equally to this work. 10 11 1 National Institute for Infectious Diseases "Lazzaro Spallanzani" IRCCS, Rome, Italy. 12 2 Department of Biology, University of Rome "Tor Vergata," Rome, Italy. 13 3 Department of Pediatric Hematology and Oncology, IRCCS Ospedale Pediatrico Bambino Gesù, Piazza 14 Sant'Onofrio, 4, 00165 Rome, Italy. 15 4 Département de Microbiologie-Infectiologie et d'Immunologie, Université Laval, Québec City, Québec, 16 Canada. 17 5 Champalimaud Centre for the Unknown, Lisbon, Portugal; I. Medizinische Klinik Johannes Gutenberg- 18 Universität, University of Mainz, 55131 Mainz, Germany. 19 6 Department of Infection, Division of Infection and Immunity, University College London, and National 20 Institutes of Health and Research Biomedical Research Centre, University College London Hospitals NHS 21 Foundation Trust, London, UK. 22 23 24 25 * Correspondence and requests for materials should be addressed to: 26 Maria R. Capobianchi. Electronic address: [email protected] 27 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.07.082487; this version posted May 7, 2020.
    [Show full text]
  • A Massively Parallel Barcoded Sequencing Pipeline Enables Generation of the First Orfeome and Interactome Map for Rice
    A massively parallel barcoded sequencing pipeline enables generation of the first ORFeome and interactome map for rice Shayne D. Wierbowskia,b,1, Tommy V. Vob,1,2, Pascal Falter-Braunc,d, Timothy O. Jobee, Lars H. Krusef, Xiaomu Weia, Jin Liangb, Michael J. Meyera,b, Nurten Akturkb, Christen A. Rivera-Erickb, Nicolas A. Corderob, Mauricio I. Paramob,g, Elnur E. Shayhidinb, Marta Bertolottib, Nathaniel D. Tippensa,b, Kazi Aktherh, Rita Sharmai, Yuichi Katayosej, Kourosh Salehi-Ashtianik,l,m,n, Tong Haol,m, Pamela C. Ronaldo,p,q, Joseph R. Eckerr,s, Peter A. Schweitzert, Shoshi Kikuchiu, Hiroshi Mizunov, David E. Hilll,m, Marc Vidall,m, Gaurav D. Moghef, Susan R. McCouchh,3, and Haiyuan Yua,b,3 aDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853; bWeill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853; cInstitute of Network Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Munich, Germany; dMicrobe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, 80539 Munich, Germany; eBotanical Institute, Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, 50674 Cologne, Germany; fPlant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, NY 14853; gDepartment of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853; hSection of Plant Breeding and Genetics, School of Integrated Plant Sciences, Cornell University, Ithaca, NY 14853-1901; iSchool
    [Show full text]
  • Integration of Molecular Interactome and Targeted Interaction Analysis To
    www.nature.com/scientificreports OPEN Integration of Molecular Interactome and Targeted Interaction Analysis to Identify a Received: 2 November 2017 Accepted: 20 August 2018 COPD Disease Network Module Published: xx xx xxxx Amitabh Sharma1,3,4,5, Maksim Kitsak4, Michael H. Cho 1,2,3, Asher Ameli1,10, Xiaobo Zhou1,3, Zhiqiang Jiang1, James D. Crapo6, Terri H. Beaty 7, Jörg Menche8, Per S. Bakke9, Marc Santolini 1,4,5 & Edwin K. Silverman1,2,3 The polygenic nature of complex diseases ofers potential opportunities to utilize network-based approaches that leverage the comprehensive set of protein-protein interactions (the human interactome) to identify new genes of interest and relevant biological pathways. However, the incompleteness of the current human interactome prevents it from reaching its full potential to extract network-based knowledge from gene discovery eforts, such as genome-wide association studies, for complex diseases like chronic obstructive pulmonary disease (COPD). Here, we provide a framework that integrates the existing human interactome information with experimental protein- protein interaction data for FAM13A, one of the most highly associated genetic loci to COPD, to fnd a more comprehensive disease network module. We identifed an initial disease network neighborhood by applying a random-walk method. Next, we developed a network-based closeness approach (CAB) that revealed 9 out of 96 FAM13A interacting partners identifed by afnity purifcation assays were signifcantly close to the initial network neighborhood. Moreover, compared to a similar method (local radiality), the CAB approach predicts low-degree genes as potential candidates. The candidates identifed by the network-based closeness approach were combined with the initial network neighborhood to build a comprehensive disease network module (163 genes) that was enriched with genes diferentially expressed between controls and COPD subjects in alveolar macrophages, lung tissue, sputum, blood, and bronchial brushing datasets.
    [Show full text]
  • Interactome Networks and Human Disease
    Leading Edge Review Interactome Networks and Human Disease Marc Vidal,1,2,* Michael E. Cusick,1,2 and Albert-La´ szlo´ Baraba´ si1,3,4,* 1Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA 2Department of Genetics, Harvard Medical School, Boston, MA 02115, USA 3Center for Complex Network Research (CCNR) and Departments of Physics, Biology and Computer Science, Northeastern University, Boston, MA 02115, USA 4Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA *Correspondence: [email protected] (M.V.), [email protected] (A.-L.B.) DOI 10.1016/j.cell.2011.02.016 Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease. Introduction phenotypic associations, there would still be major problems Since the advent of molecular biology, considerable progress to fully understand and model human genetic variations and their has been made in the quest to understand the mechanisms impact on diseases. that underlie human disease, particularly for genetically inherited To understand why, consider the ‘‘one-gene/one-enzyme/ disorders. Genotype-phenotype relationships, as summarized in one-function’’ concept originally framed by Beadle and Tatum the Online Mendelian Inheritance in Man (OMIM) database (Am- (Beadle and Tatum, 1941), which holds that simple, linear berger et al., 2009), include mutations in more than 3000 human connections are expected between the genotype of an organism genes known to be associated with one or more of over 2000 and its phenotype.
    [Show full text]