proteins STRUCTURE • FUNCTION • BIOINFORMATICS
Characterizing of functional human coding RNA editing from evolutionary, structural, and dynamic perspectives
Oz Solomon,1,2 Lily Bazak,2 Erez Y. Levanon,2 Ninette Amariglio,1 Ron Unger,2 Gideon Rechavi,1,3 and Eran Eyal1*
1 Cancer Research Center, Chaim Sheba Medical Center, Tel Hashomer 52621, Ramat Gan, Israel
2 The Everard & Mina Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
3 Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
ABSTRACT
A-to-I RNA editing has recently been shown to be a widespread phenomenon, with millions of sites spread across the human transcriptome. However, only a few are known to be located in coding sequences and to modify the amino acid sequence of the protein product. Here, we used high-throughput data, variant prediction tools, and protein structural information to find structural and functional preferences for coding RNA editing. We show that RNA editing has a unique pattern of amino acid changes, characterized by enriched stop-to-tryptophan changes and by positive-to-neutral and neutral-to-positive charge changes. RNA editing tends to have a stronger structural effect than equivalent A-to-G SNPs but a weaker effect than random A-to-G mutagenesis events. Sites edited at low levels tend to be located at conserved positions and to have a stronger predicted deleterious effect on proteins than sites edited at high frequencies. Lowly edited sites tend to destabilize the protein structure and to affect amino acids with a larger number of intra-molecular contacts. Still, some highly edited sites are also predicted to prominently affect structure; these tend to be located at critical positions of the protein matrix and are likely to be functionally important. Using our pipeline, we identify and discuss several novel putative functional coding-changing editing sites in the genes COPA (I164V), GIPC1 (T62A), ZN358 (K382R), and CCNI (R75G).
Proteins 2014; 82:3117–3131. © 2014 Wiley Periodicals, Inc.
Key words: RNA editing; ADAR; thermostability; RNA modification; RNA-seq; protein structure analysis.
Additional Supporting Information may be found in the online version of this article.
Abbreviations: AA, amino acid; ANM, anisotropic network model; CDS, coding sequence; DB, database; dbNSFP, database for non-synonymous SNVs’ functional predictions; MPS, massively parallel sequencing; PDB, protein data bank; PPI, protein–protein interaction; RNA-seq, RNA sequencing; SNP, simple nucleotide polymorphism; WTS, whole transcriptome sequencing.
Grant sponsor: Flight Attendant Medical Research Institute (FAMRI), Israeli Centers of Research Excellence (I-CORE), ISF Gene Regulation in Complex Human Disease Center, ISF Chromatin and RNA Gene Regulation Center. G.R. holds the Djerassi Chair in Oncology, Tel Aviv University, Israel.
O.S. and E.E. performed the bioinformatics and statistical assays, analyzed the data, and wrote the manuscript. L.B. and E.Y.L. helped analyze the human bodyMap data, conceived ideas, and helped write the manuscript. R.U., N.A., and G.R. conceived ideas and helped write the article.
*Correspondence to: Eran Eyal, Cancer Research Center, Sheba Medical Center, 2 Sheba Rd., Ramat Gan, Israel 52621. E-mail: [email protected] Phone: +972-3-5308148; Fax: +972-3-5305942.
Received 15 April 2014; Revised 28 July 2014; Accepted 11 August 2014
Published online 18 August 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.24672
INTRODUCTION
A-to-I RNA editing (de-amination of adenosine to inosine) is a widespread post-transcriptional modification1–10 catalyzed by the ADAR (adenosine deaminase acting on RNA) family of enzymes. In humans, the family includes ADAR, ADARB1, and ADARB2 (synonymous: ADAR1, ADAR2, and ADAR3, respectively).6,11–14 Changes in RNA editing are essential to normal development. Adar knock-out mice died with disseminated apoptosis at the embryonic stage, reflecting Adar’s important role in early hematopoiesis in the embryonic liver.15,16 Adarb1 knock-out mice suffer from epileptic seizures and die at a very early age.17,18 Editing was found to be connected with several human disorders and diseases.19 Abnormal editing at the 5-hydroxytryptamine (serotonin) receptor 2C (HTR2C) was found to be linked with depression, schizophrenia and suicide.19 Defective editing at GRIA2 mRNA was connected with amyotrophic lateral sclerosis (ALS) etiology.20 Mutations in ADAR were associated with up-regulation of the interferon pathway and found to cause Aicardi-Goutières syndrome.21 Altered editing levels were also detected in different types of cancers.22,23 The role of abnormal editing was mostly studied in brain tumors, in particular in glioblastoma multiforme (GBM).24 ADAR was also found to play an important role in chronic myeloid leukemia (CML) tumorigenesis.25 RNA editing in the coding region of AZIN1 was found to contribute to human hepatocellular carcinoma (HCC) pathogenesis.26
Inosine is recognized as guanosine by cellular machineries such as splicing and translation. In addition, current sequencing methods identify inosine as guanosine. This makes the identification of editing sites an appealing bioinformatics problem, and a body of recent studies has been devoted to the detection of additional editing sites.1–5,8–10,27,28 It has recently been estimated that millions of RNA editing sites are found in the human transcriptome.1,27 Most of the RNA editing sites in human are located in Alu repeats at non-coding regions (3'UTRs and introns).8–10
ADAR and its RNA editing activity were shown to be connected with splicing and gene expression regulation.29–35 Recently, we detected changes in splicing pattern on a transcriptome-wide scale following knock-down (KD) of ADAR.29 According to current estimations (RADAR36 database) there are 2411 editing sites located in coding sequences (CDS),36,37 only a few of which are well studied and known to affect the function of the protein products (such as the sites in GRIA2,17 KCNA1,39 or AZIN126). Most of these sites, however, are poorly studied and have not been carefully examined in the context of the protein. The gap between the characterization of the sites at the RNA level and the ability to understand the implications for the protein product is clearly increasing in the massively parallel sequencing (MPS) era.
In the current study, we examined the evolutionary, structural and functional effects of coding RNA editing using MPS data from various sources, including Illumina human bodyMap.40 For the first time, we combined the editing data with available protein structural data in order to investigate common structural and functional preferences of human coding RNA editing sites. We were able to find statistically significant preferences for the types of amino acid changes induced by editing sites within coding regions. Most editing sites are edited at low level and we show correlation between the editing level and the structural significance of the amino acid
experimental work is still needed in order to explore the editing dependent regulation in these examples.
METHODS
Construction of a dataset of coding RNA editing sites in human
A-to-I RNA editing sites were obtained from recent studies which identified editing events based on RNA-seq.1–5 In the current study, RNA editing sites were not identified ab initio; we associate structural and functional features only with previously characterized editing sites, gathered from recent publications and updated RNA editing databases (RADAR36 and DARNED41). These RNA editing sites were intersected using bedtools42 with coding regions of RefSeq transcripts43 downloaded from the UCSC table browser.44 Non-synonymous (NS) coding RNA editing sites were retained for downstream analysis and intersected with the database for nonsynonymous SNVs’ functional predictions (dbNSFP),45 which gathers information about NS amino acid (AA) changes, evolutionary conservation (PhyloP score) and prediction of variant effect based on different tools (SIFT, PolyPhen2, LRT, MutationTaster, MutationAssessor, and FATHMM). For each editing site, the number of methods predicting it to be deleterious was registered. Sites predicted to be deleterious by more methods were considered to have a higher probability of a deleterious editing effect (see Supporting Information Table S7 for CDS editing sites mapped to dbNSFP information). Equivalent dbNSFP data regarding simple nucleotide polymorphisms (SNPs) and random sites were collected. All coding RNA editing sites were manually intersected with dbSNP. Sites reported to be SNPs were excluded, unless the reported source data for the SNP is cDNA. Such SNPs can be RNA editing events.46,47 All NS RNA editing sites reported in this manuscript were also examined in the exome sequencing project data (NHLBI ESP,48 which includes variants based on exome sequencing data of 6500 individuals) to verify that they do not represent genomic polymorphism. For editing sites with more than two different reported editing levels in Illumina’s bodyMap (e.g., when the editing level is reported for more than one tissue sample), the editing level associated with the sample having the maximal coverage (maximal A + G read number) was taken for downstream analysis. We used only high quality reads (edited position having Phred quality score >30) and therefore our calculated editing levels may be