The Implications of Alternative Splicing in the ENCODE Protein Complement

Total Page:16

File Type:pdf, Size:1020Kb

The Implications of Alternative Splicing in the ENCODE Protein Complement The implications of alternative splicing in the ENCODE protein complement Michael L. Tressa,b, Pier Luigi Martellic, Adam Frankishd, Gabrielle A. Reevese, Jan Jaap Wesselinka, Corin Yeatsf, Pa´ ll Iso´ ´ lfur Olason´ g, Mario Albrechth, Hedi Hegyii, Alejandro Giorgettij, Domenico Raimondoj, Julien Lagardek, Roman A. Laskowskie, Gonzalo Lo´peza, Michael I. Sadowskil, James D. Watsone, Piero Farisellic, Ivan Rossic, Alinda Nagyi, Wang Kaig, Zenia Størlingg, Massimiliano Orsinim, Yassen Assenovh, Hagen Blankenburgh, Carola Huthmacherh, Fidel Ramírezh, Andreas Schlickerh, France Denoeudk, Phil Jonese, Samuel Kerriene, Sandra Orcharde, Stylianos E. Antonarakisn, Alexandre Reymondo, Ewan Birneye, Søren Brunakg, Rita Casadioc, Roderic Guigok,p, Jennifer Harrowd, Henning Hermjakobe, David T. Jonesl, Thomas Lengauerh, Christine A. Orengof, La´ szlo´ Patthyi, Janet M. Thorntone,q, Anna Tramontanor, and Alfonso Valenciaa,r aStructural Computational Biology Programme, Spanish National Cancer Research Centre, E-28029 Madrid, Spain; cDepartment of Biology, University of Bologna, 33-40126 Bologna, Italy; dHAVANA Group, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom; eEuropean Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom; fDepartment of Biochemistry and Molecular Biology and lBioinformatics Unit, University College London, London WC1E 6BT, United Kingdom; gCenter for Biological Sequence Analysis, BioCentrum-DTU, DK-2800 Lyngby, Denmark; hMax Planck Institute for Informatics, 66123 Saarbru¨cken, Germany; iBiological Research Center, Hungarian Academy of Sciences, 1113 Budapest, Hungary; jDepartment of Biochemical Sciences, University of Rome ‘‘La Sapienza,’’ 2-00185 Rome, Italy; kResearch Unit on Biomedical Informatics, Institut Municipal d’Investigacio´Me` dica, E-8003 Barcelona, Spain; mCenter for Advanced Studies, Research and Development in Sardinia (CRS4), 09010 Pula, Italy; nDepartment of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland; oCenter for Integrative Genomics, Genopode building, University of Lausanne, 1015 Lausanne, Switzerland; and pCentre de Regulacio´Geno`mica, Universitat Pompeu Fabra, E-08003 Barcelona, Spain Contributed by Janet M. Thornton, January 31, 2007 (sent for review October 20, 2006) Alternative premessenger RNA splicing enables genes to generate has provided us with the material to make an in-depth assess- more than one gene product. Splicing events that occur within ment of a systematically collected reference set of splice variants. protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein Results functions. Alternative splicing has been suggested as one expla- Alternative Splicing Frequency. The GENCODE set is made up of nation for the discrepancy between the number of human genes 2,608 annotated transcripts for 487 distinct loci. A total of 1,097 and functional complexity. Here, we carry out a detailed study of transcripts from 434 loci are predicted to be protein coding. the alternatively spliced gene products annotated in the ENCODE There are on average 2.53 protein coding variants per locus; 182 pilot project. We find that alternative splicing in human genes is loci have only one variant, whereas one locus, RP1–309K20.2 more frequent than has commonly been suggested, and we dem- (CPNE1) has 17 coding variants. onstrate that many of the potential alternative gene products will A total of 57.8% of the loci are annotated with alternatively have markedly different structure and function from their consti- spliced transcripts, although there are differences between target tutively spliced counterparts. For the vast majority of these alter- regions chosen manually and those chosen according to the native isoforms, little evidence exists to suggest they have a role stratified random-sampling strategy (11). The differences stem as functional proteins, and it seems unlikely that the spectrum of from gene clusters in the manually selected regions, such as the conventional enzymatic or structural functions can be substantially cluster of 31 loci that code for olfactory receptors in manual pick extended through alternative splicing. 9 from chromosome 11 (13). These olfactory receptors are recent in evolutionary origin, have a single large coding exon, and code function ͉ human ͉ isoforms ͉ splice ͉ structure for a single isoform. This means that although the 0.5% of the human genome that was selected for biological interest has 276 loci, just 52.1% of the loci have multiple variants. In contrast, the lternative mRNA splicing, the generation of a diverse range of regions that were selected in the stratified random-sampling mature RNAs, has considerable potential to expand the A process have fewer loci (158), but 68.7% of the loci have multiple cellular protein repertoire (1–3), and recent studies have estimated variants (see Fig. 1a). This number is toward the higher end of that 40–80% of multiexon human genes can produce differently previous estimates but in line with the most recent reports (14). spliced mRNAs (4, 5). The importance of alternative splicing in Analysis of the data suggests that the GENCODE-validated EVOLUTION processes such as development (6) has long been recognized, and transcripts are an underestimate of the real numbers of alter- proteins coded by alternatively spliced transcripts have been impli- cated in a number of cellular pathways (7–9). The extent of alternative splicing in eukaryotic genomes has lead to suggestions Author contributions: M.L.T., P.L.M., G.A.R., C.Y., M.A., E.B., S.B., R.C., R.G., J.H., H. Herm- that alternative splicing is key to understanding how human com- jakob, D.T.J., T.L., C.A.O., L.P., J.M.T., A.T., and A.V. designed research; M.L.T., P.L.M., A.F., plexity can be encoded by so few genes (10). G.A.R., J.J.W., C.Y., P.I´.O´ ., H. Hegyi, A.G., D.R., J.L., R.L., G.L., M.I.S., J.D.W., P.F., I.R., A.N., W.K., Z.S., M.O., Y.A., H.B., C.H., F.R., A.S., F.D., P.J., S.K., and S.O. performed research; The pilot project of the Encyclopedia of DNA Elements M.L.T., P.L.M., A.F., G.R., J.J.W., C.Y., P.I´.O´ ., H. Hegyi, M.A., A.G., D.R., J.L., R.L., M.I.S., S.E.A., (ENCODE) (11), which aims to identify all the functional J.D.W., and A.R. analyzed data; and M.L.T., P.L.M., C.Y., P.I´.O´ ., and A.V. wrote the paper. elements in the human genome, has undertaken a comprehen- The authors declare no conflict of interest. sive analysis of 44 selected regions that make up 1% of the Abbreviation: TMH, transmembrane helices. human genome. One valuable element of the project has been bTo whom correspondence may be addressed at: Spanish National Cancer Research Centres, the detailing of a reference set of manually annotated splice c. Melchor Ferna´ndez Almagro, 3, E-28029 Madrid, Spain. E-mail: [email protected]. variants by the GENCODE consortium (12). The annotation by qTo whom correspondence may be addressed. E-mail: [email protected]. the GENCODE consortium is an extension of the manually rTo whom correspondence may be addressed. E-mail: [email protected]. curated annotation by the Havana team at The Sanger Institute. This article contains supporting information online at www.pnas.org/cgi/content/full/ Although a full understanding of the functional implications 0700800104/DC1. of alternative splicing is still a long way off, the GENCODE set © 2007 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0700800104 PNAS ͉ March 27, 2007 ͉ vol. 104 ͉ no. 13 ͉ 5495–5500 Downloaded by guest on September 30, 2021 sequence sharing 3Ј exons in different reading frames. In this set, 23 separate loci code for variants with overlapping reading frames, suggesting that the frequency of variants with overlap- ping reading frames might be somewhat higher than previously thought (20). Few of the variants coded from overlapping reading frames appeared to be functional. We were able to compare nine transcripts where the human and mouse homologue had con- served exonic structure. If the two alternative reading frames evolve under functional constraints, the mutation rate for all three codon positions should be the same, and both frames should have an identical nonsynonymous substitution rate (Ka, 21). Only one of the nine pairs of transcripts had identical Ka values and equal rates of mutation for each coding position. Signal Peptides and Transmembrane Helices. Of the 1,097 tran- scripts, 219 were predicted to have signal peptides, accounting for 107 of the 434 loci. Unequivocal loss or gain of signal peptides can be seen in 12 loci, and, in these cases, localization will not be conserved between isoforms. Signal peptide loss/gain is caused by substitution of exons at the N terminus in eight of the loci. This is coherent with earlier findings (18) that showed that most signal peptide gain/loss in alternative splicing comes about Fig. 1. Isoform distribution. (a) The number of isoforms per locus in the through N-terminal exon substitution. One example is the al- manually selected regions compared with the number of isoforms per locus in ternative isoform in locus RP1-248E1.1 (MOXD1), which loses the regions selected by random stratified procedure. (b) The effect of splicing 86 N-terminal residues, including the signal peptide, with respect events on the protein sequence of alternative splice isoforms. to the principal isoform. Signal peptide loss in isoform 003 of locus AC010518.2 native splicing at the mRNA level. Although the set does include (LILRA3) is triggered by a 17 residue N-terminal insertion ahead many known isoforms and uncovers numerous previously un- of the signal peptide predicted for the principal sequence. This recognized variants, several known variants were not annotated results in the apparent internalization of the signal peptide and in the initial release.
Recommended publications
  • Deregulated Gene Expression Pathways in Myelodysplastic Syndrome Hematopoietic Stem Cells
    Leukemia (2010) 24, 756–764 & 2010 Macmillan Publishers Limited All rights reserved 0887-6924/10 $32.00 www.nature.com/leu ORIGINAL ARTICLE Deregulated gene expression pathways in myelodysplastic syndrome hematopoietic stem cells A Pellagatti1, M Cazzola2, A Giagounidis3, J Perry1, L Malcovati2, MG Della Porta2,MJa¨dersten4, S Killick5, A Verma6, CJ Norbury7, E Hellstro¨m-Lindberg4, JS Wainscoat1 and J Boultwood1 1LRF Molecular Haematology Unit, NDCLS, John Radcliffe Hospital, Oxford, UK; 2Department of Hematology Oncology, University of Pavia Medical School, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 3Medizinische Klinik II, St Johannes Hospital, Duisburg, Germany; 4Division of Hematology, Department of Medicine, Karolinska Institutet, Stockholm, Sweden; 5Department of Haematology, Royal Bournemouth Hospital, Bournemouth, UK; 6Albert Einstein College of Medicine, Bronx, NY, USA and 7Sir William Dunn School of Pathology, University of Oxford, Oxford, UK To gain insight into the molecular pathogenesis of the the World Health Organization.6,7 Patients with refractory myelodysplastic syndromes (MDS), we performed global gene anemia (RA) with or without ringed sideroblasts, according to expression profiling and pathway analysis on the hemato- poietic stem cells (HSC) of 183 MDS patients as compared with the the French–American–British classification, were subdivided HSC of 17 healthy controls. The most significantly deregulated based on the presence or absence of multilineage dysplasia. In pathways in MDS include interferon signaling, thrombopoietin addition, patients with RA with excess blasts (RAEB) were signaling and the Wnt pathways. Among the most signifi- subdivided into two categories, RAEB1 and RAEB2, based on the cantly deregulated gene pathways in early MDS are immuno- percentage of bone marrow blasts.
    [Show full text]
  • Disruption of the Neuronal PAS3 Gene in a Family Affected with Schizophrenia D Kamnasaran, W J Muir, M a Ferguson-Smith,Dwcox
    325 ORIGINAL ARTICLE J Med Genet: first published as 10.1136/jmg.40.5.325 on 1 May 2003. Downloaded from Disruption of the neuronal PAS3 gene in a family affected with schizophrenia D Kamnasaran, W J Muir, M A Ferguson-Smith,DWCox ............................................................................................................................. J Med Genet 2003;40:325–332 Schizophrenia and its subtypes are part of a complex brain disorder with multiple postulated aetiolo- gies. There is evidence that this common disease is genetically heterogeneous, with many loci involved. See end of article for In this report, we describe a mother and daughter affected with schizophrenia, who are carriers of a authors’ affiliations t(9;14)(q34;q13) chromosome. By mapping on flow sorted aberrant chromosomes isolated from lym- ....................... phoblast cell lines, both subjects were found to have a translocation breakpoint junction between the Correspondence to: markers D14S730 and D14S70, a 683 kb interval on chromosome 14q13. This interval was found to Dr D W Cox, 8-39 Medical contain the neuronal PAS3 gene (NPAS3), by annotating the genomic sequence for ESTs and perform- Sciences Building, ing RACE and cDNA library screenings. The NPAS3 gene was characterised with respect to the University of Alberta, genomic structure, human expression profile, and protein cellular localisation to gain insight into gene Edmonton, Alberta T6G function. The translocation breakpoint junction lies within the third intron of NPAS3, resulting in the dis- 2H7, Canada; [email protected] ruption of the coding potential. The fact that the bHLH and PAS domains are disrupted from the remain- ing parts of the encoded protein suggests that the DNA binding and dimerisation functions of this Revised version received protein are destroyed.
    [Show full text]
  • Plant SET Domain-Containing Proteins: Structure, Function and Regulation
    Biochimica et Biophysica Acta 1769 (2007) 316–329 www.elsevier.com/locate/bbaexp Review Plant SET domain-containing proteins: Structure, function and regulation Danny W-K Ng, Tao Wang, Mahesh B. Chandrasekharan 1, Rodolfo Aramayo, ⁎ Sunee Kertbundit 2, Timothy C. Hall Institute of Developmental and Molecular Biology and Department of Biology, Texas A&M University, College Station, TX 77843-3155, USA Received 27 October 2006; received in revised form 3 April 2007; accepted 4 April 2007 Available online 12 April 2007 Abstract Modification of the histone proteins that form the core around which chromosomal DNA is looped has profound epigenetic effects on the accessibility of the associated DNA for transcription, replication and repair. The SET domain is now recognized as generally having methyltransferase activity targeted to specific lysine residues of histone H3 or H4. There is considerable sequence conservation within the SET domain and within its flanking regions. Previous reviews have shown that SET proteins from Arabidopsis and maize fall into five classes according to their sequence and domain architectures. These classes generally reflect specificity for a particular substrate. SET proteins from rice were found to fall into similar groupings, strengthening the merit of the approach taken. Two additional classes, VI and VII, were established that include proteins with truncated/ interrupted SET domains. Diverse mechanisms are involved in shaping the function and regulation of SET proteins. These include protein–protein interactions through both intra- and inter-molecular associations that are important in plant developmental processes, such as flowering time control and embryogenesis. Alternative splicing that can result in the generation of two to several different transcript isoforms is now known to be widespread.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • Cryptic Inoviruses Revealed As Pervasive in Bacteria and Archaea Across Earth’S Biomes
    ARTICLES https://doi.org/10.1038/s41564-019-0510-x Corrected: Author Correction Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes Simon Roux 1*, Mart Krupovic 2, Rebecca A. Daly3, Adair L. Borges4, Stephen Nayfach1, Frederik Schulz 1, Allison Sharrar5, Paula B. Matheus Carnevali 5, Jan-Fang Cheng1, Natalia N. Ivanova 1, Joseph Bondy-Denomy4,6, Kelly C. Wrighton3, Tanja Woyke 1, Axel Visel 1, Nikos C. Kyrpides1 and Emiley A. Eloe-Fadrosh 1* Bacteriophages from the Inoviridae family (inoviruses) are characterized by their unique morphology, genome content and infection cycle. One of the most striking features of inoviruses is their ability to establish a chronic infection whereby the viral genome resides within the cell in either an exclusively episomal state or integrated into the host chromosome and virions are continuously released without killing the host. To date, a relatively small number of inovirus isolates have been extensively studied, either for biotechnological applications, such as phage display, or because of their effect on the toxicity of known bacterial pathogens including Vibrio cholerae and Neisseria meningitidis. Here, we show that the current 56 members of the Inoviridae family represent a minute fraction of a highly diverse group of inoviruses. Using a machine learning approach lever- aging a combination of marker gene and genome features, we identified 10,295 inovirus-like sequences from microbial genomes and metagenomes. Collectively, our results call for reclassification of the current Inoviridae family into a viral order including six distinct proposed families associated with nearly all bacterial phyla across virtually every ecosystem.
    [Show full text]
  • The Diverse Roles of TIMP-3: Insights Into Degenerative Diseases of the Senescent Retina and Brain
    cells Review The Diverse Roles of TIMP-3: Insights into Degenerative Diseases of the Senescent Retina and Brain Jennifer M. Dewing 1, Roxana O. Carare 1, Andrew J. Lotery 1,2 and J. Arjuna Ratnayaka 1,* 1 Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, MP806, Tremona Road, Southampton SO16 6YD, UK; [email protected] (J.M.D.); [email protected] (R.O.C.); [email protected] (A.J.L.) 2 Eye Unit, University Hospital Southampton NHS Foundation Trust, Southampton SO16 6YD, UK * Correspondence: [email protected]; Tel.: +44-238120-8183 Received: 13 November 2019; Accepted: 19 December 2019; Published: 21 December 2019 Abstract: Tissue inhibitor of metalloproteinase-3 (TIMP-3) is a component of the extracellular environment, where it mediates diverse processes including matrix regulation/turnover, inflammation and angiogenesis. Rare TIMP-3 risk alleles and mutations are directly linked with retinopathies such as age-related macular degeneration (AMD) and Sorsby fundus dystrophy, and potentially, through indirect mechanisms, with Alzheimer’s disease. Insights into TIMP-3 activities may be gleaned from studying Sorsby-linked mutations. However, recent findings do not fully support the prevailing hypothesis that a gain of function through the dimerisation of mutated TIMP-3 is responsible for retinopathy. Findings from Alzheimer’s patients suggest a hitherto poorly studied relationship between TIMP-3 and the Alzheimer’s-linked amyloid-beta (Aβ) proteins that warrant further scrutiny. This may also have implications for understanding AMD as aged/diseased retinae contain high levels of Aβ. Findings from TIMP-3 knockout and mutant knock-in mice have not led to new treatments, particularly as the latter does not satisfactorily recapitulate the Sorsby phenotype.
    [Show full text]
  • Learning Protein Constitutive Motifs from Sequence Data Je´ Roˆ Me Tubiana, Simona Cocco, Re´ Mi Monasson*
    TOOLS AND RESOURCES Learning protein constitutive motifs from sequence data Je´ roˆ me Tubiana, Simona Cocco, Re´ mi Monasson* Laboratory of Physics of the Ecole Normale Supe´rieure, CNRS UMR 8023 & PSL Research, Paris, France Abstract Statistical analysis of evolutionary-related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. We here apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (a-helixes and b-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and ’turning up’ or ’turning down’ the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype–phenotype relationship for protein families. DOI: https://doi.org/10.7554/eLife.39397.001 Introduction In recent years, the sequencing of many organisms’ genomes has led to the collection of a huge number of protein sequences, which are catalogued in databases such as UniProt or PFAM Finn et al., 2014). Sequences that share a common ancestral origin, defining a family (Figure 1A), *For correspondence: are likely to code for proteins with similar functions and structures, providing a unique window into [email protected] the relationship between genotype (sequence content) and phenotype (biological features).
    [Show full text]
  • DECIPHER: Harnessing Local Sequence Context to Improve Protein Multiple Sequence Alignment Erik S
    Wright BMC Bioinformatics (2015) 16:322 DOI 10.1186/s12859-015-0749-z RESEARCH ARTICLE Open Access DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment Erik S. Wright1,2 Abstract Background: Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and then diminish steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments. Results: Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on largesequencesets. Conclusions: Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins.
    [Show full text]
  • Automethylation of PRC2 Promotes H3K27 Methylation and Is Impaired in H3K27M Pediatric Glioma
    Downloaded from genesdev.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Automethylation of PRC2 promotes H3K27 methylation and is impaired in H3K27M pediatric glioma Chul-Hwan Lee,1,2,7 Jia-Ray Yu,1,2,7 Jeffrey Granat,1,2,7 Ricardo Saldaña-Meyer,1,2 Joshua Andrade,3 Gary LeRoy,1,2 Ying Jin,4 Peder Lund,5 James M. Stafford,1,2,6 Benjamin A. Garcia,5 Beatrix Ueberheide,3 and Danny Reinberg1,2 1Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016, USA; 2Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA; 3Proteomics Laboratory, New York University School of Medicine, New York, New York 10016, USA; 4Shared Bioinformatics Core, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA; 5Department of Biochemistry and Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA The histone methyltransferase activity of PRC2 is central to the formation of H3K27me3-decorated facultative heterochromatin and gene silencing. In addition, PRC2 has been shown to automethylate its core subunits, EZH1/ EZH2 and SUZ12. Here, we identify the lysine residues at which EZH1/EZH2 are automethylated with EZH2-K510 and EZH2-K514 being the major such sites in vivo. Automethylated EZH2/PRC2 exhibits a higher level of histone methyltransferase activity and is required for attaining proper cellular levels of H3K27me3. While occurring inde- pendently of PRC2 recruitment to chromatin, automethylation promotes PRC2 accessibility to the histone H3 tail. Intriguingly, EZH2 automethylation is significantly reduced in diffuse intrinsic pontine glioma (DIPG) cells that carry a lysine-to-methionine substitution in histone H3 (H3K27M), but not in cells that carry either EZH2 or EED mutants that abrogate PRC2 allosteric activation, indicating that H3K27M impairs the intrinsic activity of PRC2.
    [Show full text]
  • Suramin Inhibits Osteoarthritic Cartilage Degradation by Increasing Extracellular Levels
    Molecular Pharmacology Fast Forward. Published on August 10, 2017 as DOI: 10.1124/mol.117.109397 This article has not been copyedited and formatted. The final version may differ from this version. MOL #109397 Suramin inhibits osteoarthritic cartilage degradation by increasing extracellular levels of chondroprotective tissue inhibitor of metalloproteinases 3 (TIMP-3). Anastasios Chanalaris, Christine Doherty, Brian D. Marsden, Gabriel Bambridge, Stephen P. Wren, Hideaki Nagase, Linda Troeberg Arthritis Research UK Centre for Osteoarthritis Pathogenesis, Kennedy Institute of Downloaded from Rheumatology, University of Oxford, Roosevelt Drive, Headington, Oxford OX3 7FY, UK (A.C., C.D., G.B., H.N., L.T.); Alzheimer’s Research UK Oxford Drug Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK (S.P.W.); Structural Genomics Consortium, molpharm.aspetjournals.org University of Oxford, Old Road Campus Research Building, Old Road Campus, Roosevelt Drive, Headington, Oxford, OX3 7DQ (BDM). at ASPET Journals on September 29, 2021 1 Molecular Pharmacology Fast Forward. Published on August 10, 2017 as DOI: 10.1124/mol.117.109397 This article has not been copyedited and formatted. The final version may differ from this version. MOL #109397 Running title: Repurposing suramin to inhibit osteoarthritic cartilage loss. Corresponding author: Linda Troeberg Address: Kennedy Institute of Rheumatology, University of Oxford, Roosevelt Drive, Headington, Oxford OX3 7FY, UK Phone number: +44 (0)1865 612600 E-mail: [email protected] Downloaded
    [Show full text]
  • Multi-Class Protein Classification Using Adaptive Codes
    Journal of Machine Learning Research 8 (2007) 1557-1581 Submitted 8/06; Revised 4/07; Published 7/07 Multi-class Protein Classification Using Adaptive Codes Iain Melvin∗ [email protected] NEC Laboratories of America Princeton, NJ 08540, USA Eugene Ie∗ [email protected] Department of Computer Science and Engineering University of California San Diego, CA 92093-0404, USA Jason Weston [email protected] NEC Laboratories of America Princeton, NJ 08540, USA William Stafford Noble [email protected] Department of Genome Sciences Department of Computer Science and Engineering University of Washington Seattle, WA 98195, USA Christina Leslie [email protected] Center for Computational Learning Systems Columbia University New York, NY 10115, USA Editor: Nello Cristianini Abstract Predicting a protein’s structural class from its amino acid sequence is a fundamental problem in computational biology. Recent machine learning work in this domain has focused on develop- ing new input space representations for protein sequences, that is, string kernels, some of which give state-of-the-art performance for the binary prediction task of discriminating between one class and all the others. However, the underlying protein classification problem is in fact a huge multi- class problem, with over 1000 protein folds and even more structural subcategories organized into a hierarchy. To handle this challenging many-class problem while taking advantage of progress on the binary problem, we introduce an adaptive code approach in the output space of one-vs- the-rest prediction scores. Specifically, we use a ranking perceptron algorithm to learn a weight- ing of binary classifiers that improves multi-class prediction with respect to a fixed set of out- put codes.
    [Show full text]
  • Aneuploidy: Using Genetic Instability to Preserve a Haploid Genome?
    Health Science Campus FINAL APPROVAL OF DISSERTATION Doctor of Philosophy in Biomedical Science (Cancer Biology) Aneuploidy: Using genetic instability to preserve a haploid genome? Submitted by: Ramona Ramdath In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biomedical Science Examination Committee Signature/Date Major Advisor: David Allison, M.D., Ph.D. Academic James Trempe, Ph.D. Advisory Committee: David Giovanucci, Ph.D. Randall Ruch, Ph.D. Ronald Mellgren, Ph.D. Senior Associate Dean College of Graduate Studies Michael S. Bisesi, Ph.D. Date of Defense: April 10, 2009 Aneuploidy: Using genetic instability to preserve a haploid genome? Ramona Ramdath University of Toledo, Health Science Campus 2009 Dedication I dedicate this dissertation to my grandfather who died of lung cancer two years ago, but who always instilled in us the value and importance of education. And to my mom and sister, both of whom have been pillars of support and stimulating conversations. To my sister, Rehanna, especially- I hope this inspires you to achieve all that you want to in life, academically and otherwise. ii Acknowledgements As we go through these academic journeys, there are so many along the way that make an impact not only on our work, but on our lives as well, and I would like to say a heartfelt thank you to all of those people: My Committee members- Dr. James Trempe, Dr. David Giovanucchi, Dr. Ronald Mellgren and Dr. Randall Ruch for their guidance, suggestions, support and confidence in me. My major advisor- Dr. David Allison, for his constructive criticism and positive reinforcement.
    [Show full text]