Interactome INSIDER: a Structural Interactome Browser for Genomic Studies

Total Page:16

File Type:pdf, Size:1020Kb

Interactome INSIDER: a Structural Interactome Browser for Genomic Studies RESOURCE Interactome INSIDER: a structural interactome browser for genomic studies Michael J Meyer1–3,6, Juan Felipe Beltrán1,2,6, Siqi Liang1,2,6, Robert Fragoza2,4, Aaron Rumack1,2, Jin Liang2, Xiaomu Wei1,5 & Haiyuan Yu1,2 We present Interactome INSIDER, a tool to link genomic been demonstrated that mutations tend to localize to interaction variant information with structural protein–protein interfaces, and mutations on the same protein may cause clini- interactomes. Underlying this tool is the application cally distinct diseases by disrupting interactions with different of machine learning to predict protein interaction interfaces partners6,8. However, the binding topologies of interacting pro- for 185,957 protein interactions with previously unresolved teins can only be determined at atomic resolution through X-ray interfaces in human and seven model organisms, including crystallography, NMR, and (more recently) cryo-EM9 experi- the entire experimentally determined human binary ments, which limits the number of interactions with resolved interactome. Predicted interfaces exhibit functional interaction interfaces. properties similar to those of known interfaces, including To study protein function on a genomic scale, especially as it relates enrichment for disease mutations and recurrent cancer to human disease, a large-scale set of protein interaction interfaces mutations. Through 2,164 de novo mutagenesis experiments, is needed. Thus far, computational methods such as docking10 and we show that mutations of predicted and known interface homology modeling11 have been employed to predict the atomic- residues disrupt interactions at a similar rate and much more level bound conformations of interactions whose experimental frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER structures have not yet been determined. However, docked models (http://interactomeinsider.yulab.org) enables users to are not yet available on a large scale; and while homology modeling 12 identify whether variants or disease mutations are enriched has been used to produce models at scale , it is only amenable to in known and predicted interaction interfaces at various interactions with structural templates (<5% of known interactions). resolutions. Users may explore known population variants, Together, cocrystal structures and homology models comprise the disease mutations, and somatic cancer mutations, or they may currently available precalculated sources of structural interactomes, upload their own set of mutations for this purpose. covering only ~6% of all known interactions (Fig. 1a,b). Here, we present Interactome INSIDER (integrated structural interactome and genomic data browser), a tool for functional Protein–protein interactions facilitate much of known cellular exploration of human disease on a genomic scale (http://inter- function. Recent efforts to experimentally determine protein actomeinsider.yulab.org). Interactome INSIDER is based on a interactomes in human1 and model organisms2–4, in addition to structurally resolved, proteome-wide human interactome. We literature curation of small-scale interaction assays5, have dramat- assembled this resource by building an interactome-wide set of ically increased the scale of known interactome networks. Studies protein interaction interfaces at the highest resolution possible of these interactomes have allowed researchers to elucidate how for each interaction. We compiled structural interactomes by modes of evolution affect the functional fates of paralogs4 and calculating interfaces in experimental cocrystal structures and to examine, on a genomic scale, network interconnectivities that homology models, when available. For the remaining ~94% of determine cellular functions and disease states6. interactions, we applied a machine-learning framework to predict While simply knowing which proteins interact with each other partner-specific interfaces by applying recent advances in coevo- provides valuable information to spur functional studies, far more lution- and docking-based feature construction13,14. Interactome specific hypotheses can be tested if the spatial contacts of inter- INSIDER combines predicted interaction interfaces for 185,957 acting proteins are known7. In the study of human disease, it has previously unresolved interactions (including the full human 1Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA. 2Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA. 3Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, USA. 4Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA. 5Department of Medicine, Weill Cornell College of Medicine, New York, New York, USA. 6These authors contributed equally to this work. Correspondence should be addressed to H.Y. ([email protected]). RECEIVED 24 JANUARY; ACCEPTED 22 OCTOBER; PUBLISHED oNLINE 1 JANUARY 2018; DOI:10.1038/NMETH.4540 NATURE METHODS | VOL.15 NO.2 | FEBRUARY 2018 | 107 RESOURCE interactome and seven commonly studied model organisms) with a 120,000 disease mutations and functional annotations in an interactive Unresolved toolbox designed to spur functional genomics research. It allows 110,000 Homology modeled Cocrystalized users to find enrichment of disease mutations at different scales: 100,000 in protein interaction domains, in residues, and through atomic 3D clustering in protein interfaces. 90,000 80,000 RESULTS To build Interactome INSIDER, we first constructed an inter- 70,000 actome-wide set of protein interaction interfaces. While there 60,000 are well-established methods for predicting whether two pro- 15,16 50,000 teins interact , we focused on interactions that have been Protein interactions experimentally determined, but whose interfaces are unknown 40,000 (Supplementary Note 1). For this task, there is a rich literature exploring the potential of many structural, evolutionary, and 30,000 docking-based methods to predict protein interaction interfaces. 20,000 However, so far none of these methods have been used to produce a whole-interactome data set of protein interaction interfaces 10,000 (Supplementary Note 2). 0 We used ECLAIR (ensemble classifier learning algorithm to E. coli predict interface residues), a unified machine-learning frame- H. sapiens C. elegansA. thaliana S. pombe S. cerevisiae M. musculus work, to predict the interfaces of protein interactions. ECLAIR D. melanogaster leverages several complementary and proven classification Species features, including sequence-based biophysical features, struc- b S. cerevisiae tural features, and recently proposed features for predicting A. thaliana M. musculus binding partner-specific interfaces, including coevolution- D. melanogaster 736 C. elegans 17,18 14 769 ary and docking-based metrics (Supplementary Note 3; S. pombe Supplementary Figs. 1 and 2). Unfortunately, many protein–pro- tein interactions have missing features (especially structural fea- 931 E. coli tures). In fact, this type of nonrandom missing-feature problem is present in many biological prediction studies and cannot be adequately resolved by commonly used imputation methods. To address this issue, ECLAIR is structured as an ensemble of eight 4,150 independent classifiers, each of which covers a common case of H. sapiens feature availability. This unique structure of ECLAIR enables it to be applied to any interaction while using the most informative Figure 1 | The current size of structural interactomes. (a) The plot shows subset of available features for that interaction (Supplementary the coverage (number of protein interactions) of known high-quality binary interactomes with precomputed cocomplexed protein structures. Notes 4 and 5; Supplementary Figs. 3 and 4). (b) The number of interactions from the eight largest interactomes with We comprehensively optimized hyperparameters for ECLAIR experimentally solved structures. using a recently published Bayesian method, the tree-structured Parzen estimator approach (TPE)19, which allowed us to simulta- neously tune up to eight hyperparameters for each subclassifier. seven model organisms and human (including all 122,647 human We trained and tested each ECLAIR subclassifier using a set of experimentally determined binary interactions reported in major known protein interaction interfaces, and we observed that inter- databases; see Online Methods). faces can be predicted by the single, top-performing subclassifier available for each residue. Subclassifier performance increases with Comprehensive evaluation of predicted interfaces the number of features used. We observe an area under the ROC We established that our predictions are of high quality through curve (AUROC) of 0.64 for our top-sequence-only subclassifier and both machine learning and biological evaluation. We first AUROC of 0.80 for our top subclassifier using both sequence and evaluated the trade-offs between false-positive rate and true- structural features. In total, we used ECLAIR to predict the inter- positive rate and between precision and recall for each of the eight faces of 185,957 interactions with previously unknown interfaces, independent subclassifiers that compose ECLAIR. As expected, including for 115,576 human interactions (Supplementary Fig. 5). we find that as more informative features are added to subse- Specifically, residues classified by ECLAIR with
Recommended publications
  • Deficiency in Protein Tyrosine Phosphatase PTP1B Shortens Lifespan and Leads to Development of Acute Leukemia
    Author Manuscript Published OnlineFirst on November 9, 2017; DOI: 10.1158/0008-5472.CAN-17-0946 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Deficiency in protein tyrosine phosphatase PTP1B shortens lifespan and leads to development of acute leukemia Samantha Le Sommer1, Nicola Morrice1, Martina Pesaresi1, Dawn Thompson1, Mark A. Vickers1, Graeme I. Murray1, Nimesh Mody1, Benjamin G Neel2, Kendra K Bence3,4, Heather M Wilson1*, & Mirela Delibegovic1* 1Institute of Medical Sciences, University of Aberdeen, United Kingdom 2Laura and Isaac Perlmutter Cancer Center, New York University Langone Medical Center, New York University, New York, New York 10016, USA. 3 Dept. of Biomedical Sciences, University of Pennsylvania School of Veterinary Medicine, Philadelphia, USA. 4 Current address: Internal Medicine Research Unit (IMRU), Pfizer Inc, 610 Main St. Cambridge, MA 02139 USA *corresponding authors: Prof. Mirela Delibegovic ([email protected]) and Dr. Heather Wilson ([email protected] ) Running Title: PTP1B deficiency shortens lifespan and leads to leukaemia Key Words: PTP1B, lifespan, leukaemia, myeloid, STAT3 Additional information: This work was performed with the funds from the Wellcome Trust ISSF grant to M. Delibegovic and BHF project grant to M. Delibegovic (PG/11/8/28703). S. Le Sommer is a recipient of the University of Aberdeen Institute of Medical Sciences PhD studentship. Conflict of interest: Authors declare there are no conflicts of interests. Word Count: 5,000 Figure Count: 6 Figures, 1 Table (Supplemental Figures: 6, Supplemental Tables: 4) 1 Downloaded from cancerres.aacrjournals.org on September 29, 2021. © 2017 American Association for Cancer Research.
    [Show full text]
  • Expanding the Genetic Basis of Copy Number Variation in Familial Breast
    Masson et al. Hereditary Cancer in Clinical Practice 2014, 12:15 http://www.hccpjournal.com/content/12/1/15 RESEARCH Open Access Expanding the genetic basis of copy number variation in familial breast cancer Amy L Masson1,4, Bente A Talseth-Palmer1,4, Tiffany-Jane Evans1,4, Desma M Grice1,2, Garry N Hannan2 and Rodney J Scott1,3,4* Abstract Introduction: Familial breast cancer (fBC) is generally associated with an early age of diagnosis and a higher frequency of disease among family members. Over the past two decades a number of genes have been identified that are unequivocally associated with breast cancer (BC) risk but there remain a significant proportion of families that cannot be accounted for by these genes. Copy number variants (CNVs) are a form of genetic variation yet to be fully explored for their contribution to fBC. CNVs exert their effects by either being associated with whole or partial gene deletions or duplications and by interrupting epigenetic patterning thereby contributing to disease development. CNV analysis can also be used to identify new genes and loci which may be associated with disease risk. Methods: The Affymetrix Cytogenetic Whole Genome 2.7 M (Cyto2.7 M) arrays were used to detect regions of genomic re-arrangement in a cohort of 129 fBC BRCA1/BRCA2 mutation negative patients with a young age of diagnosis (<50 years) compared to 40 unaffected healthy controls (>55 years of age). Results: CNV analysis revealed the presence of 275 unique rearrangements that were not present in the control population suggestive of their involvement in BC risk.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • Transcriptome-Wide Profiling of Cerebral Cavernous Malformations
    www.nature.com/scientificreports OPEN Transcriptome-wide Profling of Cerebral Cavernous Malformations Patients Reveal Important Long noncoding RNA molecular signatures Santhilal Subhash 2,8, Norman Kalmbach3, Florian Wegner4, Susanne Petri4, Torsten Glomb5, Oliver Dittrich-Breiholz5, Caiquan Huang1, Kiran Kumar Bali6, Wolfram S. Kunz7, Amir Samii1, Helmut Bertalanfy1, Chandrasekhar Kanduri2* & Souvik Kar1,8* Cerebral cavernous malformations (CCMs) are low-fow vascular malformations in the brain associated with recurrent hemorrhage and seizures. The current treatment of CCMs relies solely on surgical intervention. Henceforth, alternative non-invasive therapies are urgently needed to help prevent subsequent hemorrhagic episodes. Long non-coding RNAs (lncRNAs) belong to the class of non-coding RNAs and are known to regulate gene transcription and involved in chromatin remodeling via various mechanism. Despite accumulating evidence demonstrating the role of lncRNAs in cerebrovascular disorders, their identifcation in CCMs pathology remains unknown. The objective of the current study was to identify lncRNAs associated with CCMs pathogenesis using patient cohorts having 10 CCM patients and 4 controls from brain. Executing next generation sequencing, we performed whole transcriptome sequencing (RNA-seq) analysis and identifed 1,967 lncRNAs and 4,928 protein coding genes (PCGs) to be diferentially expressed in CCMs patients. Among these, we selected top 6 diferentially expressed lncRNAs each having signifcant correlative expression with more than 100 diferentially expressed PCGs. The diferential expression status of the top lncRNAs, SMIM25 and LBX2-AS1 in CCMs was further confrmed by qRT-PCR analysis. Additionally, gene set enrichment analysis of correlated PCGs revealed critical pathways related to vascular signaling and important biological processes relevant to CCMs pathophysiology.
    [Show full text]
  • Systems Biology: Tracking the Protein–Metabolite Interactome
    RESEARCH HIGHLIGHTS SYSTEMS BIOLOGY Tracking the protein–metabolite interactome A method combining limited proteolysis with mass spectrometry systematically detects protein–metabolite interactions. The genome, the transcriptome and the proteome all receive a lot of attention, but there are also multitudes of small molecules in a cell, collectively making up the chemically diverse metabolome. Metabolites wear many hats, serving as enzyme substrates and products, cofac- tors, allosteric regulators, and as mediators of protein-complex assembly. But despite their essential roles in biological processes, their interactions with proteins—which are often tran- sient and low-affinity—remain largely a mystery. “Understanding how these interactions occur on a global scale is essential to understand mechanisms of cellular adaptation and ecosystems’ dynamics,” says Paola Picotti of ETH Zurich in Switzerland. She and her colleagues recently designed a method to systematically discover which proteins bind to a metabolite of interest. The approach may also be useful for drug discovery, as a way to identify druggable sites in proteins and to test for off-target effects. To identify protein–metabolite interactions in Escherichia coli, Picotti’s team treated a whole- cell lysate with a metabolite of interest then added a low amount of the broad-specificity pro- tease proteinase K for a short period of time, all under native conditions. This ‘limited prote- olysis’ approach generates structure-specific protein fragments—metabolite binding can block proteinase K cleavage at locations which would otherwise be severed. Switching to denaturing conditions, the researchers then used the enzyme trypsin to completely digest the metabolite- treated sample and a reference untreated sample, generating peptides for label-free quantitative mass spectrometry analysis, and compared the resulting differential peptide spectral patterns.
    [Show full text]
  • Chromatin Immunoprecipitation and DNA Sequencing Identified A
    CANCER GENOMICS & PROTEOMICS 15 : 165-174 (2018) doi:10.21873/cgp.20074 Chromatin Immunoprecipitation and DNA Sequencing Identified a LIMS1/ILK Pathway Regulated by LMO1 in Neuroblastoma NORIHISA SAEKI 1* , AKIRA SAITO 2, YUKI SUGAYA 2, MITSUHIRO AMEMIYA 2, HIROE ONO 1, RIE KOMATSUZAKI 3, KAZUYOSHI YANAGIHARA 4 and HIROKI SASAKI 3 1Division of Genetics, National Cancer Center Research Institute, Tokyo, Japan; 2Statistical Genetics Analysis Division, StaGen Co. Ltd., Tokyo, Japan; 3Department of Translational Oncology, National Cancer Center Research Institute, Tokyo, Japan; 4Division of Pathology, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center Hospital East, Chiba, Japan Abstract. Background/Aim: Overall survival for the high- malignancies in patients younger than 15, it is a cause for risk group of neuroblastoma (NB) remains at 40-50%. An approximately 15% of all pediatric cancer deaths (1). NB integrative genomics study revealed that LIM domain only 1 cells develop from a sympathicoadrenal linage of the neural (LMO1) encoding a transcriptional regulator to be an NB- crest and form solid tumors in the adrenal medulla or susceptibility gene with a tumor-promoting activity, that paraspinal regions, which cause variable symptoms by needs to be revealed. Materials and Methods: We conducted compressing and/or invading to adjacent organs depending chromatin immunoprecipitation and DNA sequencing on their location (1-3). NB cells are heterogenous, and analyses and cell proliferation assays on two NB cell lines. patient age at the time of diagnosis is one of the major Results: We identified three genes regulated by LMO1 in the factors that contribute to the determination of their biological cells, LIM and senescent cell antigen-like domains 1 characters; patients older than 18 months are considered to (LIMS1), Ras suppressor protein 1 (RSU1) and relaxin 2 have a poorer prognosis when associated with dissemination (RLN2).
    [Show full text]
  • The Transcriptome of Peripheral Blood Mononuclear Cells in Patients With
    Subhi et al. Immunity & Ageing (2019) 16:20 https://doi.org/10.1186/s12979-019-0160-0 RESEARCH Open Access The transcriptome of peripheral blood mononuclear cells in patients with clinical subtypes of late age-related macular degeneration Yousif Subhi1* , Marie Krogh Nielsen1, Christopher Rue Molbech1,2, Charlotte Liisborg1,2, Helle Bach Søndergaard3, Finn Sellebjerg1,3 and Torben Lykke Sørensen1,2 Abstract Background: Peripheral blood mononuclear cells (PBMCs) are implicated in the pathogenesis of age-related macular degeneration (AMD). We here mapped the global gene transcriptome of PBMCs from patients with different clinical subtypes of late AMD. Results: We sampled fresh venous blood from patients with geographic atrophy (GA) secondary to AMD without choroidal neovascularizations (n = 19), patients with neovascular AMD without GA (n =38),patientswithpolypoidal choroidal vasculopathy (PCV) (n = 19), and aged control individuals with healthy retinae (n = 20). We isolated PBMCs, extracted RNA, and used microarray to investigate gene expression. Volcano plots identified statistically significant differentially expressed genes (P < 0.05) at a high magnitude (≥30% higher/lower) for GA (62 genes), neovascular AMD (41 genes), and PCV (41 genes). These clinical subtypes differed substantially across gene expression and the following pathways identified in enrichment analyses. In a subgroup analysis, we investigated presence vs. absence of subretinal fibrosis and found 826 differentially expressed genes (≥30% higher/lower, P < 0.05) with relation to mRNA splicing, endothelial migration, and interleukin-1 signaling. Conclusions: We here map the global gene transcriptome of PBMCs related to clinical subtypes of late AMD and find evidence of subtype-specific immunological involvement. Our findings provide a transcriptomic insight into the systemic immunity associated with AMD.
    [Show full text]
  • A Multi-Scale Structural Interactome Browser for Genomic Studies
    bioRxiv preprint doi: https://doi.org/10.1101/126862; this version posted April 12, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Interactome INSIDER: a multi-scale structural interactome browser for genomic studies Michael J. Meyer1,2,3,†, Juan Felipe Beltrán1,2,†, Siqi Liang1,2,†, Robert Fragoza2,4, Aaron Rumack1,2, Jin Liang2, Xiaomu Wei2,5, and Haiyuan Yu1,2,* 1Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853, USA 2Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853, USA 3Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, 10065, USA 4Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA. 5Department of Medicine, Weill Cornell College of Medicine, New York, New York, 10065, USA. †The authors wish it to be known that, in their opinion, the first 3 authors should be regarded as joint First Authors. *To whom correspondence should be addressed. Tel: 607-255-0259; Fax: 607-255-5961; Email: [email protected] bioRxiv preprint doi: https://doi.org/10.1101/126862; this version posted April 12, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
    [Show full text]
  • A Metabolic Modeling Approach Reveals Promising Therapeutic Targets and Antiviral Drugs to Combat COVID-19
    www.nature.com/scientificreports OPEN A metabolic modeling approach reveals promising therapeutic targets and antiviral drugs to combat COVID‑19 Fernando Santos‑Beneit 1, Vytautas Raškevičius2, Vytenis A. Skeberdis2 & Sergio Bordel 1,2* In this study we have developed a method based on Flux Balance Analysis to identify human metabolic enzymes which can be targeted for therapeutic intervention against COVID‑19. A literature search was carried out in order to identify suitable inhibitors of these enzymes, which were confrmed by docking calculations. In total, 10 targets and 12 bioactive molecules have been predicted. Among the most promising molecules we identifed Triacsin C, which inhibits ACSL3, and which has been shown to be very efective against diferent viruses, including positive‑sense single‑stranded RNA viruses. Similarly, we also identifed the drug Celgosivir, which has been successfully tested in cells infected with diferent types of viruses such as Dengue, Zika, Hepatitis C and Infuenza. Finally, other drugs targeting enzymes of lipid metabolism, carbohydrate metabolism or protein palmitoylation (such as Propylthiouracil, 2‑Bromopalmitate, Lipofermata, Tunicamycin, Benzyl Isothiocyanate, Tipifarnib and Lonafarnib) are also proposed. Te COVID-19 pandemic, caused by the virus SARS-CoV-2, has resulted in a substantial increase in mortality and serious economic and social disruption worldwide1. In this context, the rapid identifcation of therapeutic molecules against SARS-CoV-2 is essential. To this aim an extensive collaboration and teamwork among research- ers of all academic disciplines is required2. Computational methods and systems biology approaches, as the one presented here, can play a signifcant role in this process of identifcation of suitable drugs.
    [Show full text]
  • Deriving Disease Modules from the Compressed Transcriptional Space Embedded in a Deep Auto-Encoder
    bioRxiv preprint doi: https://doi.org/10.1101/680983; this version posted June 24, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Title: Deriving Disease Modules from the Compressed Transcriptional Space Embedded in a Deep Auto-encoder Sanjiv K. Dwivedi1, Andreas Tjärnberg1, Jesper Tegnér2,3,4 and Mika Gustafsson1 1Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden. 2Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955–6900, Saudi Arabia. 3Unit of Computational Medicine, Department of Medicine, Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden. 4Science for Life Laboratory, Solna, Sweden. Abstract Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet biological networks, commonly used to define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without assuming the prior knowledge of a biological network. To this end we train a deep auto-encoder on a large transcriptional data-set. Our hypothesis is that such modules could be discovered in the deep representations within the auto-encoder when trained to capture the variance in the input-output map of the transcriptional profiles. Using a three-layer deep auto-encoder we find a statistically significant enrichment of GWAS relevant genes in the third layer, and to a successively lesser degree in the second and first layers respectively.
    [Show full text]
  • MS-Based Interactomics: Computational Resources and Tools for Studying the Physical Interactome
    MS-based Interactomics: Computational resources and tools for studying the physical interactome ASMS Bioinformatics MS Interest Group Wednesday Evening Workshop Isabell Bludau & Bill Noble What is ‘interactomics’ and why do we discuss it? • Many MS-omics studies focus on cataloging and quantifying individual molecules of a particular type • e.g. quantitative protein or metabolite matrix • Most biological molecules don’t operate in isolation but they interact with each other • protein complexes • activity regulation via metabolite/drug binding • ‘Interactome’ = comprehensive set of molecular interactions in biological system • here we focus only on physical (not functional) interactions Current MS-based techniques for large-scale interactomics Protein-protein interaction (PPI) networks: • Affinity-purification MS (AP-MS) ➭ Interaction network • Proximity-dependent labeling: APEX, BioID Protein-protein complexes: • Protein co-fractionation MS (CoFrac-MS) ➭ Protein complexes protein intensity fractions Structural information on PPIs: • Cross-linking MS ➭ Structure: interacting protein residues Current MS-based techniques for large-scale interactomics Protein-metabolite/drug interactions: • Thermal proteome profiling (TPP) / ➭ Protein-ligand Cellular Thermal Shift Assay (CETSA) +/- drug protein interactions non-denatured • Limited proteolysis-coupled MS (LipMS) temperature Protein-RNA interactions: ➭ Protein-RNA • Protein-RNA crosslinking interactions at residue resolution Available resources and databases Databases based on: Databases: •
    [Show full text]
  • Interactome of the Hepatitis C Virus: Literature Mining with Andsystem
    G Model VIRUS-96776; No. of Pages 9 ARTICLE IN PRESS Virus Research xxx (2015) xxx–xxx Contents lists available at ScienceDirect Virus Research j ournal homepage: www.elsevier.com/locate/virusres Interactome of the hepatitis C virus: Literature mining with ANDSystem a,b a,b,c a,b Olga V. Saik , Timofey V. Ivanisenko , Pavel S. Demenkov , a,b,∗ Vladimir A. Ivanisenko a The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia b PB-soft, LLC, Prospekt Lavrentyeva 10, 630090 Novosibirsk, Russia c Novosibirsk State University, Pirogova Str. 2, 630090 Novosibirsk, Russia a r t i c l e i n f o a b s t r a c t Article history: A study of the molecular genetics mechanisms of host–pathogen interactions is of paramount importance Received 15 July 2015 in developing drugs against viral diseases. Currently, the literature contains a huge amount of informa- Received in revised form 22 October 2015 tion that describes interactions between HCV and human proteins. In addition, there are many factual Accepted 3 December 2015 databases that contain experimentally verified data on HCV–host interactions. The sources of such data Available online xxx are the original data along with the data manually extracted from the literature. However, the manual analysis of scientific publications is time consuming and, because of this, databases created with such an Keywords: approach often do not have complete information. One of the most promising methods to provide actu- Hepatitis C virus ANDSystem alisation and completeness of information is text mining.
    [Show full text]