STRUCTURE • FUNCTION • BIOINFORMATICS

Characterizing of functional human coding RNA editing from evolutionary, structural, and dynamic perspectives

Oz Solomon,1,2 Lily Bazak,2 Erez Y. Levanon,2 Ninette Amariglio,1 Ron Unger,2 Gideon Rechavi,1,3 and Eran Eyal1*

1 Cancer Research Center, Chaim Sheba Medical Center, Tel Hashomer 52621, Ramat Gan, Israel
2 The Everard & Mina Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
3 Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel

ABSTRACT

A-to-I RNA editing has recently been shown to be a widespread phenomenon, with millions of sites spread across the human transcriptome. However, only a few are known to be located in coding sequences and to modify the amino acid sequence of the product. Here, we used high-throughput data, variant prediction tools, and protein structural information to find structural and functional preferences for coding RNA editing. We show that RNA editing has a unique pattern of amino acid changes characterized by enriched stop-to-tryptophan changes and by positive-to-neutral and neutral-to-positive charge changes. RNA editing tends to have a stronger structural effect than equivalent A-to-G SNPs but a weaker effect than random A-to-G mutagenesis events. Sites edited at low levels tend to be located at conserved positions and to have a stronger predicted deleterious effect on proteins than sites edited at high frequencies. Lowly edited sites tend to destabilize the protein structure and to affect amino acids with a larger number of intra-molecular contacts. Still, some highly edited sites are also predicted to prominently affect structure; these tend to be located at critical positions of the protein matrix and are likely to be functionally important. Using our pipeline, we identify and discuss several novel putative functional coding-changing editing sites in COPA (I164V), GIPC1 (T62A), ZN358 (K382R), and CCNI (R75G).

Proteins 2014; 82:3117–3131. © 2014 Wiley Periodicals, Inc.

Key words: RNA editing; ADAR; thermostability; RNA modification; RNA-seq; protein structure analysis.

Additional Supporting Information may be found in the online version of this article.

Abbreviations: AA, amino acid; ANM, anisotropic network model; CDS, coding sequence; DB, database; dbNSFP, database for non-synonymous SNVs' functional predictions; MPS, massively parallel sequencing; PDB, protein data bank; PPI, protein–protein interaction; RNA-seq, RNA sequencing; SNP, simple nucleotide polymorphism; WTS, whole transcriptome sequencing.

Grant sponsor: Flight Attendant Medical Research Institute (FAMRI), Israeli Centers of Research Excellence (I-CORE), ISF Regulation in Complex Human Disease Center, ISF Chromatin and RNA Gene Regulation Center. G.R. holds the Djerassi Chair in Oncology, Tel Aviv University, Israel.

O.S. and E.E. performed the bioinformatics and statistical assays, analyzed the data, and wrote the manuscript. L.B. and E.Y.L. helped analyzing the human bodyMap data, conceived ideas and helped writing the manuscript. R.U., N.A., and G.R. conceived ideas and helped writing the article.

*Correspondence to: Eran Eyal, Cancer research Center, Sheba Medical Center, 2 Sheba Rd., Ramat Gan, Israel 52621. E-mail: [email protected] Phone: +972-3-5308148; Fax: +972-3-5305942.

Received 15 April 2014; Revised 28 July 2014; Accepted 11 August 2014
Published online 18 August 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.24672

INTRODUCTION

A-to-I RNA editing (de-amination of adenosine to inosine) is a widespread post transcriptional modification1–10 catalyzed by the ADAR (adenosine deaminase acting on RNA) family of enzymes. In human, the family includes ADAR, ADARB1, and ADARB2 (synonymous: ADAR1, ADAR2, and ADAR3, respectively).6,11–14 Changes in RNA editing are essential to normal development. Adar knock-out mice died with disseminated apoptosis at embryonic stage, reflecting Adar's important role in early hematopoiesis in the embryonic liver.15,16 Adarb1 knock-out mice suffer from epileptic seizures and died at very early age.17,18 Editing was found to be connected with several human disorders and diseases.19 Abnormal editing at the 5-hydroxytryptamine (serotonin) receptor 2C (HTR2C) was found to be linked with depression, schizophrenia and suicide.19 Defective

editing at GRIA2 mRNA was connected with amyotrophic lateral sclerosis (ALS) etiology.20 Mutations in ADAR were associated with up-regulation of the interferon pathway and found to cause Aicardi-Goutières syndrome.21 Altered editing levels were also detected in different types of cancers.22,23 The role of abnormal editing was mostly studied in brain tumors, in particular in glioblastoma multiforme (GBM).24 ADAR was found to play an important role also in chronic myeloid leukemia (CML) tumorigenesis.25 RNA editing in the coding region of AZIN1 was found to contribute to human hepatocellular carcinoma (HCC) pathogenesis.26

Inosine is recognized as guanosine by cellular machineries such as splicing and translation. In addition, current sequencing methods identify inosine as guanosine. This makes the identification of editing sites an appealing bioinformatics problem, and a body of recent studies has been devoted to detection of additional editing sites.1–5,8–10,27,28 It has been recently estimated that millions of RNA editing sites are found in the human transcriptome.1,27 Most of the RNA editing sites in human are located in Alu repeats at non-coding regions (3'UTRs and introns).8–10

ADAR and its RNA editing activity were shown to be connected with splicing and gene expression regulation.29–35 Recently, we detected changes in splicing pattern on a transcriptome-wide scale following knock-down (KD) of ADAR.29 According to current estimations (RADAR36 database) there are 2411 editing sites which are located in coding sequences (CDS),36,37 only a few of which are well studied and known to affect the function of the protein products (such as the sites in GRIA2,17 KCNA1,39 or AZIN126). Most of these sites, however, are poorly studied and have not been carefully examined in the context of the protein. The gap between the characterization of the sites at the RNA level and the ability to understand the implications on the protein product is clearly increasing in the massively parallel sequencing (MPS) era.

In the current study, we examined the evolutionary, structural and functional effects of coding RNA editing using MPS data from various sources including Illumina human bodyMap.40 For the first time, we combined the editing data with available protein structural data in order to investigate common structural and functional preferences of human coding RNA editing sites. We were able to find statistically significant preferences for the types of amino acid changes induced by editing sites within coding regions. Most editing sites are edited at a low level, and we show a correlation between the editing level and the structural significance of the amino acid change as well as the predicted effect on thermostability. Finally, we point to several new cases in which highly edited sites are predicted by computational tools to have a significant effect on the protein structure and function. Further experimental work is still needed in order to explore the editing dependent regulation in these examples.

METHODS

Construction of a dataset of coding RNA editing sites in human

A-to-I RNA editing sites were obtained from recent studies which identified editing events based on RNA-seq.1–5 In the current study RNA editing sites were not identified ab initio, and we associate structural and functional features only with previously characterized editing sites, gathered from recent publications and updated RNA editing databases (RADAR36 and DARNED41). These RNA editing sites were intersected using bedtools42 with coding regions of RefSeq transcripts43 downloaded from UCSC table browser.44 Non-synonymous (NS) coding RNA editing sites were retained for downstream analysis and intersected with the database for nonsynonymous SNVs' functional predictions (dbNSFP)45 which gathers information about NS amino acid (AA) changes, evolutionary conservation (Phylop score) and prediction of variant effect based on different tools (SIFT, polyPhen2, LRT, MutationTaster, MutationAssessor, and FATHMM). For each editing site, the number of methods predicting it to be deleterious was registered. Sites predicted to be deleterious by more methods were considered as having a stronger probability for a deleterious effect of the editing (see Supporting Information Table S7 for CDS editing sites mapped to dbNSFP information). Equivalent dbNSFP data regarding simple nucleotide polymorphisms (SNPs) and random sites were collected.

All coding RNA editing sites were manually intersected with dbSNP. Sites reported to be SNPs were excluded, unless the reported source data for the SNP is cDNA. Such SNPs can be RNA editing events.46,47 All NS RNA editing sites reported in this manuscript were also examined in the exome sequencing project data (NHLBI ESP,48 which includes variants based on exome sequencing data of 6500 individuals) to verify that they do not represent genomic polymorphism.

For editing sites with more than two different reported editing levels in the Illumina's bodyMap (e.g. when the editing level is reported for more than one tissue sample), the editing level associated with the sample having the maximal coverage (maximal A + G reads number) was taken for downstream analysis. We used only high quality reads (edited position having Phred quality score >30) and therefore our calculated editing levels may be somewhat different from the editing levels reported in the RADAR database (DB) based on the same raw data. RNA editing sites with their published editing levels were also downloaded directly from RADAR36 (see Supporting Information Table S5). RADAR considers different

3118 PROTEINS Structural and Functional Preferences of Coding RNA Editing

Table I. The Number of Changes for Each NS Type Resulting from RNA Editing, A-to-G SNPs or Random A-to-G Replacements in the CDS of the Same Genes

Type     Editing    SNP   Random   Editing-ratio   SNP-ratio   Random-ratio(a)
* > W          8      0      125            0.01        0.00   0.00 (1)
D > G         15     80     5481            0.02        0.06   0.08 (100)
E > G         57     95     8546            0.08        0.07   0.13 (100)
H > R         29     69     2866            0.04        0.05   0.04 (66)
I > M         11     23      944            0.02        0.02   0.01 (38)
I > V         33    198     4899            0.05        0.14   0.07 (99)
K > E         66     60     6833            0.09        0.04   0.10 (69)
K > R         85    126     6840            0.12        0.09   0.10 (3)
M > V         16    104     2372            0.02        0.07   0.03 (98)
N > D         27     31     4146            0.04        0.02   0.06 (99)
N > S         36    174     4209            0.05        0.13   0.06 (81)
Q > R         87     94     5604            0.12        0.07   0.08 (1)
R > G         54     40     2647            0.08        0.03   0.04 (1)
S > G         79     66     3647            0.11        0.05   0.05 (1)
T > A         79    141     6161            0.11        0.10   0.09 (5)
Y > C         26     90     2786            0.04        0.06   0.04 (80)
Total        708   1391    68106               1           1   1
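The ratio columns of Table I, and the kind of enrichment comparison that the text reports with Fisher's exact test, can be reproduced in a few lines. The sketch below is illustrative only: the counts are copied from Table I, while the 2×2 layout (one change type versus all other types, editing versus SNP) and the helper names `fisher_two_sided` and `compare` are assumptions made for this example, not the authors' R code.

```python
from math import comb

# (editing, SNP) counts for selected NS change types, copied from Table I
COUNTS = {"Q>R": (87, 94), "R>G": (54, 40), "S>G": (79, 66), "I>V": (33, 198)}
TOTAL_EDITING, TOTAL_SNP = 708, 1391


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare(change):
    """Return (editing ratio, SNP ratio, Fisher P) for one change type."""
    e, s = COUNTS[change]
    # Assumed layout: rows = this change type / all other types,
    # columns = editing sites / A-to-G SNPs.
    p = fisher_two_sided(e, s, TOTAL_EDITING - e, TOTAL_SNP - s)
    return e / TOTAL_EDITING, s / TOTAL_SNP, p


if __name__ == "__main__":
    for change in COUNTS:
        e_ratio, s_ratio, p = compare(change)
        print(f"{change}: editing {e_ratio:.2f}, SNP {s_ratio:.2f}, P = {p:.2g}")
```

The exact test is written with `math.comb` so that the example needs only the standard library; with SciPy available, `scipy.stats.fisher_exact` would be the more usual choice.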

Table I notes: Dark gray background: editing is significantly enriched (P < 0.01); light gray background: editing is significantly depleted (P < 0.01). (a) In parentheses, the ranking of the original type of change resulting from RNA editing compared with 100 random selection trials of adenines, presented as a score between 1 and 100 (1: most frequent; 100: least frequent).

sources for editing sites in addition to the human bodyMap.

Calculating editing level from Illumina human bodyMap

Illumina human bodyMap data were downloaded from SRA49 and aligned to the human genome (hg19) using Bowtie76 (using the following flags: -n 3, -l 20, -k 20, -e 140 -best). All previously identified coding RNA editing sites (as described above) were called from Illumina human bodyMap data40 by counting the number of 'G' found at these sites. A position was considered as edited in the human bodyMap data if it has reads showing 'G' whereas 'A' is the corresponding base in the human reference genome (hg19), and the variant mean quality score is higher than 30 (representing a sequencing error rate of 0.001). Editing levels in human bodyMap were attached to each coding editing site. A site was considered lowly edited if less than 5% of the reads which cover it show G. A site was considered highly edited if more than 5% of its covering reads, and at least two reads, show G. In addition, we required a minimal coverage of two reads in at least one bodyMap sample to consider a site in the analysis. A minimal coverage of at least five reads was required in order to consider a site as highly edited (see Supporting Information 1, Table S6).

For the analysis of correlation between editing levels and thermostability predictions or amino acid intramolecular contacts (see below), we used editing levels based on the above criteria and also from RADAR.36 This allows us to increase the number of editing sites under investigation. For sites with editing levels reported in both sets, the maximal value was taken for the analysis. However, for sites with very small coverage in the human bodyMap we used the editing level reported in RADAR and did not deduce it from the few bodyMap reads. This maximal editing level is reported in the examples of COPA, GIPC1, ZN358, and CCNI.

Random A-to-G substitutions control and SNP control

Adenosines within CDS mRNA of the edited proteins were randomly changed to guanosine. These in silico edited sequences were translated into proteins and their properties were compared to those of the original coding editing sites. A-to-G SNPs in CDS of the same edited proteins were taken from dbSNP13550 and were tested against the original coding editing sites (see Table I for details of SNP and random sites used).

Secondary structure prediction and disordered prediction

Protein secondary structure prediction was done using profPHD51 and SSpro.52 Disordered prediction was done using IUpred53 and disopred.54

Protein-protein interaction (PPI) information

PPI information tables for human were taken from Reactome,55 IBIS,56 MINT,57 and prePPI.58

Structural modeling

All edited proteins were aligned to protein sequences of known structures in the PDB (taken from PISCES59

server) using Blastp60 with default parameters. Structures of proteins with sequence identity >25% and aligned region >80 AA were downloaded from the PDB and used for downstream analysis. In order to carefully locate the editing sites which actually code for amino acids present in the atomic coordinates, we used S2C (http://dunbrack.fccc.edu/Guoli/s2c/). For FoldX analysis and intramolecular contacts analysis (described below) we included only structures having >50% identity.

Structural models for edited proteins presented in the section of novel examples were downloaded from ModBase.61 We used only models which cover the editing sites and have >25% sequence identity with their template in a region longer than 80 AA. These structural models were compared with models in Swiss-model62 using the structural genomics portal (www.sbkb.org). Models having the higher sequence identity with their template were eventually taken for further analysis. These models were also visualized using Jmol (http://jmol.sourceforge.net/). If a model did not include the edited amino acid, or the region was part of a long unstructured fragment, the model was removed. Biological assemblies of the templates were downloaded from PISA.63 In order to model the edited version of the protein (as can be seen in Figs. 6 and 7) we used the side chain modeling of SCCOMP64 and SCWRL4.65 TM-align66 was used for structural alignment of model to template.

Structure-derived statistics were mostly based on known structures in PDB. The thermostability analysis for individual examples (COPA, GIPC1, ZN358 and CCNI), the side-chain prediction (as in Figs. 6 and 7) and the calculation of the theoretical b-factor (as in Fig. 5) were done based on the theoretical models.

In the Supporting Information files of this manuscript we provide tables with the structural details of CDS editing sites (Supporting Information, Tables S1–S8 with detailed explanation in Sup2). We also constructed a simple website (http://www.sheba-cancer.org.il/editingstructure) which provides a visualization of the edited positions on the best template structures using Jmol.

Calculation of intramolecular contacts

Intra-molecular contacts for each edited AA covered by a known protein structure were determined by calculation of common contact surface using the Voronoi procedure as described in McConkey et al.67 Structures having >50% sequence identity with the edited proteins were used for the analysis of correlation between AA intramolecular contacts and the editing level.

Anisotropic network model (ANM) analysis

Theoretical b-factors were calculated for the structures using the inverse of the Hessian matrix of ANM,68 and were plotted in order to identify edited amino acids at a local minimum of the b-factor. These sites are often related to cooperative functional motion of the protein or participate directly in catalytic activity and functional interactions. Profiles of the three low frequency (slow) modes for each protein under investigation were taken from the ANM website. For CCNI R75G we cross-validated the ANM results with analysis of the residue contact network obtained using both SARIG69 and Jamming.70

Thermostability predictions

FoldX71 was used for prediction of thermostability changes using structural information (see Supporting Information Tables S1 and S2). Known structures having >50% sequence identity with the edited proteins were included in the analysis of correlation between FoldX predicted ΔΔG and the editing level. In addition, for selected proteins used in the manuscript (see novel examples below), the theoretical models were optimized and mutated using FoldX.

Statistics

All statistical tests were done using the R statistical programming language.72

RESULTS

Construction of a dataset of human coding RNA editing sites

In order to draw common structural and functional patterns related to human coding RNA editing, we first constructed a dataset of coding RNA editing. Editing sites were taken from recently published studies, mainly whole-transcriptome sequencing (WTS or RNA-seq), and from RNA editing databases36,41 and intersected with coding sequence (CDS) annotation of RefSeq genes.43 These regions were then mapped to RefSeq and Uniprot proteins. The protein sequences were then aligned against the protein data bank (PDB)73 and Modbase.61 These steps resulted in 1066 sites identified in the CDS of genes (well annotated sites according to RefSeq), 708 of which are non-synonymous (NS) (Table I).

The numbers of known RNA editing sites, and in particular of CDS RNA editing sites in human, have greatly increased following recent RNA-seq studies. Until several years ago only dozens of such sites were assumed. Li et al.5 carefully verified 55 sites in CDS using padlock capturing (considered the most stringent set, class 1, in this publication). Peng et al.3 identified 63 in CDS and Bahn et al.4 identified 53. More recently, Ramaswami et al.1 identified 441 editing sites in CDS and in two useful efforts for the editing community, Ramaswami and Li36 and Kiran et al.41 gathered all known RNA editing sites to RADAR and DARNED databases (DB) respectively; 2411 sites in


RADAR and 710 sites in DARNED are annotated as coding RNA editing. Xu et al.74 used 1783 CDS RNA editing sites in their analysis. The overall numbers are heavily dependent on the detection approach and the exact gene annotation used.

In the current study we gathered coding RNA editing sites from several sources (see methods) and used RefSeq43 annotation for coding regions, which is considered conservative and carefully curated. When other, more speculative, gene annotations are used, the number of coding RNA editing sites can be significantly higher, as is the case in the RADAR DB (2411 editing sites annotated as CDS RNA editing36). We used the human bodyMap data (see methods) and the RADAR database in order to get the editing frequencies.

Of the 708 editing sites gathered and detected as NS coding RNA editing, 140 are associated with structural information in the PDB. In 60 of them the coordinates of the amino acid affected by the editing are present in the file (for further details see methods and Supporting Information, Tables S1–S8). We also provide a simple website (http://sheba-cancer.org.il/editingstructure) to assist visualization of the modified amino acid mapped on the best template structures using Jmol.

Out of the 1066 RNA editing sites located in CDS of RefSeq genes (see Supporting Information 1, Table S8), 321 are found in repeat regions (305 in Alu repeats), whereas out of the 708 RNA editing sites which result in NS changes, 226 are located in repeat regions (217 in Alu repeats). The fraction of editing sites within repeats among CDS RNA editing sites (226/708 = 0.32) is significantly lower than the fraction among all editing sites (98% of 975734 sites, that is, all sites with a known editing level according to the RADAR DB; Fisher, P < 0.0001). We cannot rule out that part of this trend originates from false-positive editing sites in CDS, as the false identification rate of non-Alu editing sites is larger, as previously shown.2 The fraction of coding RNA editing in repeats is higher than the fraction of CDS A-to-G SNPs in the same genes located in repeats (20/2046 A-to-G SNPs are found in repeats, only two of them in Alu repeats). When looking only at NS SNPs, 14 out of 1391 (0.01) are located in repeats (and two in Alu repeats).

Figure 1. RNA editing results in a unique pattern of amino acid changes. A. Density plot of the NS AA changes resulting from random A-to-G replacements (100 random trials). The number of original NS changes in RefSeq proteins resulting from RNA editing is marked with an arrow. B. Changes by amino acid charge. POS: positive charge, NEG: negative charge, NEU: neutral charge, STOP: stop codon. ** P < 0.01.

RNA editing results in a unique pattern of amino acid changes

The above set identified editing sites which overlap with CDS of annotated RefSeq genes and cause NS amino acid changes. We found that there are fewer NS editing sites than expected by chance [empirical P < 0.01 as determined by 100 random trials of A-to-G substitutions as described in methods; Fig. 1(A)]. This is explained by the observation that 35% of the coding RNA editing sites are located at the wobble position of the codon. This ratio is similar to the ratio for A-to-G SNPs (33% at the wobble position) but is significantly larger than for random A-to-G replacements, only 26% of which occur at wobble positions (Fisher, P < 0.0001). This observation suggests that coding RNA editing is subjected to evolutionary pressure and it may be that the ADAR motif developed in that manner.

We next examined the number of NS changes of each type. RNA editing resulted in an enriched number of Q-to-R, R-to-G, S-to-G, and stop-to-W substitutions compared to random A-to-G replacements. These substitutions are also enriched compared to A-to-G changes observed in SNP databases within the coding sequences (CDS) of the same genes (Fisher, P < 0.001; Table I). Overall, RNA editing contributes to relatively high numbers of charge changes from neutral-to-positive and positive-to-neutral [Fig. 1(B)] as well as stop-to-neutral. Similar trends for both NS changes and AA charge changes were found for a subset of coding RNA editing sites recorded as frequently edited (edited at high frequency) in human bodyMap data (see below). The trend for this subset is less significant, mainly due to the smaller number of sites (Supporting Information 2, Tables S9–S10). Interestingly, most of the stop-to-tryptophan changes (5/8) were found to be frequent in


the human bodyMap data (see Supporting Information 1, Table S7).

Editing dependent AA changes are depleted in D-to-G and I-to-V (Fisher, P < 0.01; Table I). The distribution of AA changes derived from coding RNA editing is significantly different from that caused by A-to-G SNPs or random A-to-G replacements in CDS of the same genes (χ² P < 0.0001 for both comparisons). These significant differences may be partially explained by the ADAR preference for downstream "G" and upstream "U," and a depletion of upstream "G" relative to the edited "A" (the "ADAR motif").1,4,5,8 Indeed, in the most frequent codon changes "G" is downstream to the edited "A". Still, even after taking these preferences into account, and excluding all codons with upstream "G" or "U," or downstream "G," we observe that the distribution of NS changes derived from RNA editing is different from that of SNPs or random A-to-G replacements (χ² P < 0.05 for both comparisons).

Figure 2. Comparison between RNA editing sites, genomic polymorphic sites and random A-to-G replacements. A. Density plot for the distribution of conservation score (Phylop score). Editing: editing sites. SNP: A-to-G SNPs in the CDS of the same genes. Random: random A-to-G replacements in the CDS of the same genes. B. Ratio of changes predicted to have deleterious effect. x-axis: number of prediction methods that agree on the deleterious effect (for details see methods). White: editing sites. Gray: A-to-G SNPs in the CDS of the same genes. Black: random A-to-G replacements in the CDS of the same genes. **P < 0.01.

RNA editing sites are not enriched in specific protein 2D structure or disordered regions

We next examined whether the AA translated from codons modified by RNA editing has any preference for a specific protein secondary structure or for disordered regions. We found that editing sites do not have clear preferences for specific protein 2D elements (using profPhd75). Similarly, no preference for editing sites to affect protein segments predicted to be disordered (using IUpred53) was found (Supporting Information Fig. S1). Other prediction methods (Disopred and SSpro52,54) show similar results (data not shown). In addition, the edited proteins are not involved in more protein-protein interactions (PPI) than other un-edited proteins (using data from Reactome, IBIS, MINT, and prePPI55–58). Interestingly, we found that coding Alu repeats (exonized Alu) are frequently located in disordered regions of the protein. This observation complements previous analyses that found enrichment of Alu repeats in alternatively spliced (AS) exons30,77,78 and that tissue-specific AS exons are enriched in disordered regions of the protein.79 It is also in agreement with our recent finding that reduction of the ADAR level changes the splicing pattern of genes which are enriched with Alu repeats.29 It is possible that disordered regions better tolerate alterations such as transposition events and editing events. Overall, RNA editing sites are not statistically enriched within particular structural motifs or disordered regions. However, Alu repeats, the preferred substrates for ADAR editing, may play a role in regulation or creation of protein disordered structures,80 so a relation between editing and disorder may still exist.

Infrequent RNA editing sites tend to have stronger predicted deleterious effect

It is of interest to examine the level of evolutionary conservation in genomic sites which undergo RNA editing. We found that RNA editing sites tend in general to be located in positions with lower conservation score than A-to-G SNPs or random A-to-G replacements in CDS of the same genes [Wilcoxon P < 1.4 × 10⁻⁶ for both comparisons; Fig. 2(A)]. Accordingly, coding RNA editing is predicted to have a weaker damaging effect than random A-to-G in CDS of the same genes [P < 10⁻¹⁵; Fig. 2(B)] based on six different methods, which


predict deleterious effect of NS variants (see methods). Random A-to-G replacements have a higher probability to be damaging than A-to-G SNPs (P < 10⁻¹⁵). The fraction of A-to-G SNPs in CDS not predicted to be deleterious by any program is higher than that of coding editing sites or random A-to-G sites in CDS of the same genes (no prediction program indicates the SNP as deleterious; Fisher P < 0.02 for both comparisons). A similar trend was seen when we considered only frequent NS RNA editing sites (see below). Random A-to-G replacements tend to be located at more conserved positions than A-to-G SNPs or frequent RNA editing sites and are predicted to have a stronger deleterious effect (Supporting Information 2, Tables S11–S12).

Taken together, these results imply that the functional effect resulting from coding RNA editing is weaker than that of in silico random mutations, suggesting again that coding editing sites undergo evolutionary selection.

We next compared features of sites edited at high frequency (102 coding RNA editing sites in RefSeq genes with more than 5% of the reads in at least one sample of Illumina human bodyMap,40,81 where the editing levels are based on high quality reads, see Methods) and of sites with lower frequency editing level (less than 5%, see methods). Lower frequency editing sites tend to be located at more conserved positions [Wilcoxon P = 6.24 × 10⁻⁷; Fig. 3(A)]. The distribution of conservation scores for sites with lower frequency editing is not different from that of random A-to-G replacements (Wilcoxon P = 0.15). However, the distribution of conservation scores for the sites with high frequency editing is different (Wilcoxon P < 10⁻¹⁵).

Figure 3. Lower frequency coding RNA editing sites are located at more conserved positions and cause stronger deleterious effect. A. Density plot for the distribution of Phylop score. Gray vertical fill: editing sites with editing level (G/(A + G)) ≥5% (for details see methods). Horizontal fill: editing sites with editing level <5%. B. Ratio of changes predicted to have deleterious effect. X-axis: number of prediction methods that agree on the deleterious effect (for details see methods). Gray bars: editing sites with editing level ≥5%. White bars: editing sites with editing level <5%. **P < 0.01. *P < 0.05.

Lower frequency editing seems to have a significantly stronger deleterious effect than high frequency editing [P = 0.0061, Fig. 3(B)]. By comparing Blosum62 values of AA changes resulting from lowly edited sites to Blosum62 values of AA changes resulting from highly edited sites, we found that the lowly edited sites tend to have significantly lower values (Wilcoxon P = 0.034). In other words, changes in highly edited sites tend to be more conservative.

Using the structural data we examined the correlation between the free energy change resulting from the amino acid substitution and the editing level. We used FoldX71 to model the mutated amino acid due to the editing event and to predict the thermostability changes upon editing. The results suggest that lowly edited sites tend to give rise to less stable proteins and have a stronger deleterious effect (Fig. 4). In general, the vast majority of the editing events tend to destabilize the proteins, but this tendency is more apparent for sites edited at low frequency than for sites edited at high frequency. A statistically significant anti-correlation exists between the editing level and the change in free energy (R = −0.22, P = 0.006; FoldX predicted ΔΔG values for the edited structures are detailed in Supporting Information 1, Tables S1–S2). When we conducted the thermostability analysis with a nonredundant set of structures having more than 90% identity to the edited protein and a single structure per editing site, a similar trend was observed (as can be seen in http://sheba-cancer.org.il/editingstructure and in Supporting Information Fig. S2). The trend was not significant in this analysis due to the much smaller number of structures (R = −0.49, P = 0.18).
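The editing-level definitions given in Methods and the correlation reported above can be illustrated with a short sketch. It is a hedged illustration rather than the authors' pipeline (the paper's statistics were done in R): the function names, the use of Pearson's coefficient for "R", the handling of sites that satisfy neither the "low" nor the "high" rule, and the toy read counts and ΔΔG values are all assumptions made for this example.

```python
from math import sqrt


def editing_level(a_reads, g_reads):
    """Editing level G/(A+G) from high-quality read counts at one site."""
    coverage = a_reads + g_reads
    if coverage == 0:
        raise ValueError("site has no covering reads")
    return g_reads / coverage


def classify_site(a_reads, g_reads):
    """Label a site using the thresholds stated in Methods.

    Assumption: sites matching neither rule are returned as 'unclassified'.
    """
    coverage = a_reads + g_reads
    if coverage < 2:                      # minimal coverage to consider a site
        return "excluded"
    level = g_reads / coverage
    # High: >5% G reads, at least two G reads, coverage of at least five reads
    if level > 0.05 and g_reads >= 2 and coverage >= 5:
        return "high"
    if level < 0.05:
        return "low"
    return "unclassified"


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (A reads, G reads, predicted ddG in kcal/mol) per site
    toy_sites = [(98, 2, 2.1), (90, 10, 1.0), (60, 40, 0.4), (20, 80, -0.1)]
    levels = [editing_level(a, g) for a, g, _ in toy_sites]
    ddg = [d for _, _, d in toy_sites]
    for (a, g, _), lvl in zip(toy_sites, levels):
        print(f"A={a} G={g} level={lvl:.2f} class={classify_site(a, g)}")
    print(f"Pearson r = {pearson_r(levels, ddg):.2f}")
```

A negative coefficient on real data would correspond to the trend described in the text, in which lowly edited sites carry the larger predicted destabilization.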


Figure 4 FoldX prediction of the free energy change resulting from the editing events. Each point represents a structure. The x-axis shows the editing level, while the y-axis shows the free energy change.

A related trend was also observed by analysis of the number of contacts created by the amino acids affected by the editing. We found that the number of contacts that exist in the unedited version of the protein structures is anti-correlated with the editing level at these positions (R = −0.35, P < 0.01). The amino acids (translated from the unedited version) at lowly edited sites tend to have more intramolecular contacts, while amino acids (translated from the unedited version) at highly edited sites tend to have fewer intramolecular contacts.

We also examined how often editing sites are located within protein domains, as defined in the InterPro database, and found that lowly edited sites are located more often at functional domains than expected (P < 0.05).

Examples of frequently edited CDS sites with functional implications

The section above suggests that most of the coding edited sites have modest structural effects. Some of them are simply functionally neutral, while others are deleterious but affect only a tiny fraction of the transcripts. There are, however, some frequently edited sites predicted to be highly deleterious. In fact, most CDS editing sites found to be conserved in mammals82 are also predicted by various methods to have a stronger deleterious effect (Wilcoxon P < 0.0001) than other CDS editing sites. It is likely that this collection of editing sites represents the handful of CDS sites with functional implications, and the deleterious annotation should be considered more broadly as marking sites which likely influence function. The high editing frequency in these cases reflects the real importance of the editing event in cellular regulation.

To address this question we gathered structural information regarding these cases. We mapped the sequences surrounding only frequently edited sites (editing level ≥5%) that are considered deleterious by at least two different prediction methods to available structural data (either from an experimental source or from reliable theoretical models). This step resulted in only 16 coding editing sites that satisfy all conditions (see Methods).

One type of structure-derived information that can hint at the functionality of a particular site in the context of the protein structure is the protein dynamics. Important sites, crucial for stability and catalysis, are usually located at positions that are relatively fixed and rigid,83 which can be distinguished as minima on dynamic profiles. This tendency can be examined, for example, by looking at X-ray-derived temperature factors (b-factors). Using the Anisotropic Network Model (ANM)68 we calculated the theoretical b-factors for the proteins with significant editing. We found that fourteen of the sixteen sites have a relatively low theoretical b-factor compared with their vicinity ([−3, +3] AA). Mobility profiles of the theoretical b-factor for six of these proteins are shown in Figure 5, demonstrating the general tendency of editing sites to be located at theoretical b-factor minima. This highlights the importance of these editing sites and their strategic location.

It is also known that the most functionally important normal modes are the low-frequency modes ("slow modes"), which are the most collective and functionally relevant. We examined the position of the editing sites along the profiles of the slow modes and obtained a trend similar to that obtained with the theoretical b-factors. The editing sites are located almost exclusively at minima of the slow-mode profiles (Supporting Information Fig. S3).

We next present examples of frequently edited sites predicted to bear a strong deleterious effect. Some of these changes, predicted to be deleterious at the protein level, might in fact be beneficial at the organism level. Thus, sites classified as deleterious by programs that base their predictions on single, isolated protein structures might have a functional role that is not necessarily harmful at the system level.

Glutamate receptors (GRIA2/GRIA3 R-to-G, Q-to-R and GRIK2 Q-to-R)

We were reassured to find in our data set of functional editing sites the known examples of the glutamate receptor 2/3 (GRIA2/GRIA3) R-to-G and Q-to-R sites (GRIA3 R775G, chrX:122598962, Uniprot id: P42263; GRIA2 Q607R, chr4:158257875, Uniprot id: P42262; GRIA2 R764G, chr4:158281294; hg19). These sites were shown to have a low theoretical b-factor [Fig. 5(A,B)]. Four out of six prediction programs agree on the "deleterious" effect of R764G and Q607R of GRIA2 and R775G of GRIA3. As was previously shown, the GRIA2 Q607R site is crucial to early postnatal development.17,18 Similarly, the structural model of the glutamate receptor ionotropic,
kainate 2 (GRIK2; Uniprot id: Q13002) reveals a low theoretical b-factor at Q621 [Fig. 5(C)] and a "deleterious" effect of the editing, predicted by four out of six prediction programs. GRIK2 is known to be almost completely edited (90% of the receptors) in the gray matter while less edited in the white matter (10% of the receptors are edited). As in GRIA2, editing of GRIK2 at the Q621R site determines its permeability to Ca2+ ions.84,85

Figure 5 Theoretical b-factor profiles of edited proteins. The amino acids modified by the editing events are marked by arrows. A. GRIA2, Q607R. B. GRIA3, R775G. C. GRIK2, Q621R. D. KCNA1, I400V (in this model I400 corresponds to the 75th amino acid position). E. GIPC1, T62A. F. CCNI, R75G. Note that in all cases the edited residue is located at a local minimum.

Potassium voltage-gated channel (KCNA1) I400V

Another fascinating example is that of the potassium voltage-gated channel subfamily A member 1 (KCNA1 or Kv1.1; Uniprot id: Q09470) I400V site (chr12:5021742/hg19), a known editing site found to be conserved from human to Drosophila.39 Five out of six prediction programs agree upon the "deleterious" effect of this editing site. This editing site indeed has a low theoretical b-factor [Fig. 5(D), seen here as I75V], supporting the importance of the site. KCNA1 I400V allows faster kinetics.86 Moreover, a recent study has shown that I-to-V editing in a related potassium channel of octopus is responsible for adaptation to cold water87 by changing the channel closing speed.

Novel examples, in which editing in the coding sequence is likely to have a functional role

Zinc finger protein 358 (ZN358) K382R

ZN358 (zinc finger protein 358; Uniprot id: Q96SR6; reported editing level in RADAR DB = 10.8%; edited in
thyroid, testes, prostate, lymph nodes, lung, adipose, breast, and colon tissues) is a member of the zinc finger protein family. It is a DNA-binding protein which is involved in transcription regulation. It was found to be a pro-apoptotic tumor suppressor and is commonly silenced in cancer.88 Its editing (chr19:7585273/hg19) results in a K382R change at a conserved position within the DNA-binding domain [ZINC_FINGER_C2H2_1; Prosite id: PS00028, Fig. 6(A)]. The structural model shows that the post-edited arginine is more distant from the DNA than the pre-edited lysine [from 3 Å to 7.7 Å; Fig. 6(B)]. This editing site is considered deleterious by three out of six prediction programs.

Figure 6 Coding changing editing sites at ZN358 K382R and COPA I164V. A. Sequence logo of the ZN358 ZINC_FINGER_C2H2_1 motif (Prosite id: PS00028). K382 is marked with an arrow. B. Structural alignment of the ZN358 model (gray) with the template (pink; pdb id: 1mey). Left: pre-edited structure (K382). Right: post-edited structure (R382). K382 and R382 are colored in green. C. Sequence logo of the COPA WD motif (Prosite id: PS00678); the edited I-to-V is marked with an arrow. D. COPA structural model. Dark green: Ile. Light green: Val.

PDZ domain containing protein (GIPC1) T62A

GIPC1 (PDZ domain containing family, member 1; Uniprot id: O14908; editing level in human bodyMap = 18.75%; edited in prostate tissue) belongs to the GIPC protein family. This protein regulates cell surface receptor expression and trafficking of proteins.89 Its RNA editing at chr19:14593605/hg19 results in a T62A change, and the site shows a low theoretical b-factor [Fig. 5(E)]. This editing site is considered deleterious by two different prediction programs. Interestingly, modeling the edited amino acid using FoldX71 resulted in a slightly more stabilized structure (ΔΔG = −0.11, from the GIPC1 theoretical model).
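The comparison of an edited residue's theoretical b-factor with its sequence vicinity ([−3, +3] residues), applied above to sites such as GIPC1 T62A, can be sketched as follows. The text does not give the exact rule used to call a b-factor "relatively low", so the criterion below (site value below the mean of its neighbours) is an assumption, as are the function name `is_low_in_vicinity` and the invented profile values.

```python
def is_low_in_vicinity(profile, site, half_window=3):
    """Return True if profile[site] is below the mean of its neighbours
    within +/- half_window positions (the site itself is excluded)."""
    if not 0 <= site < len(profile):
        raise IndexError("site outside the profile")
    # Clip the window at the ends of the chain.
    start = max(0, site - half_window)
    end = min(len(profile), site + half_window + 1)
    neighbours = [profile[i] for i in range(start, end) if i != site]
    if not neighbours:
        raise ValueError("profile too short to define a vicinity")
    return profile[site] < sum(neighbours) / len(neighbours)

# Hypothetical per-residue theoretical b-factors (arbitrary units).
b_factors = [0.9, 0.8, 0.6, 0.3, 0.5, 0.7, 0.9, 1.1]
print(is_low_in_vicinity(b_factors, 3))  # True: 0.3 is below its neighbours' mean
print(is_low_in_vicinity(b_factors, 7))  # False: 1.1 is above its neighbours' mean
```

A stricter variant would require the site to be the minimum of the window rather than merely below its mean; which rule is appropriate depends on how noisy the profile is.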

Figure 7 CCNI coding RNA editing. A. Sequence logo of the Cyclin motif (Prosite id: PS00292); the edited R75 is marked with an arrow. B. Structural model of CCNI aligned to its template (pdb id: 1w98). The CCNI model appears in pink, the template CCNE in gray, and CDK2 in off-white. The edited R (R75 of CCNI) is shown in space fill and colored dark gray. C. Left: CCNI pre-editing model. Dark gray: R75. Green: L138. Blue: K137. Right: CCNI post-edited at R75G and K137R. Dark gray: G75. Green: L138. Blue: R137.
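The per-residue contact counts used earlier in this section, and residue-residue contacts of the kind highlighted in Figure 7(C), can be illustrated with a simple distance-cutoff sketch. This is a swapped-in simplification: this part of the text does not describe how contacts were defined, and the cutoff value, the function name `contact_counts`, the residue labels, and the coordinates below are all hypothetical.

```python
import math

def contact_counts(coords, cutoff=7.0):
    """Count, for each residue, how many other residues have their
    representative atom within `cutoff` (same unit as the coordinates)."""
    counts = {name: 0 for name in coords}
    names = list(coords)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Each pair is visited once; a contact counts for both residues.
            if math.dist(coords[a], coords[b]) <= cutoff:
                counts[a] += 1
                counts[b] += 1
    return counts

# Invented coordinates (in angstroms) for four toy residues; these are
# not taken from any structural model.
toy = {
    "res1": (0.0, 0.0, 0.0),
    "res2": (5.0, 0.0, 0.0),
    "res3": (0.0, 6.0, 0.0),
    "res4": (20.0, 20.0, 20.0),
}
print(contact_counts(toy))
# {'res1': 2, 'res2': 1, 'res3': 1, 'res4': 0}
```

Correlating such counts with editing level would then proceed exactly as for any other per-site feature.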

Coatomer subunit alpha (COPA) I164V

COPA (coatomer subunit alpha; Uniprot id: P53621; reported editing level in RADAR DB = 26.4%; edited in lung, lymph nodes, prostate, and ovary tissues) is edited at chr1:160302244/hg19, resulting in an I164V change. COPA is a member of the family of non-clathrin-coated vesicular coat proteins (COPs), which mediate protein transport from the endoplasmic reticulum to the Golgi compartments in eukaryotic cells.90 Its editing site is located in a WD40 repeat-like-containing domain [Prosite id: PS00678, Fig. 6(C,D)]. Despite the conservative amino acid change, it is considered deleterious by four out of six prediction programs. Interestingly, we detected this editing site also in zebrafish (Danio rerio), and this position in general is conserved in evolution,82 supporting its apparent importance, as human and zebrafish diverged 450 million years ago.91 This editing site significantly changes its editing level during zebrafish development (82% of the transcripts are edited at two days compared with 23% at 6 days of development; Wilcoxon P = 0.012; Supporting Information Fig. S4), based on zebrafish RNA-seq data92 downloaded from SRA (SRA id: SRP013987).

Cyclin-I (CCNI) R75G

CCNI (cyclin-I; Uniprot id: Q14094; editing level in human bodyMap = 31.95%; edited in lung, lymph nodes, prostate, skeletal muscles, white blood cells, ovary, testes, and thyroid tissues) is a member of the cyclin protein family. It was shown to be highly expressed in postdifferentiated cells.93 Its editing at the R75G site (chr4:77979680/hg19) is predicted to be deleterious by five out of six prediction methods. According to the CCNI structural model [Fig. 7(B)], the modified amino acid is completely buried, and therefore a destabilizing thermostability effect is likely. Inspection of the structure reveals that R75G appears in the protein interior at a highly conserved position and, importantly, within the known functional Cyclin motif [Prosite id: PS00292, Fig. 7(A)]. The site has a central position in the protein amino acid contact network, as reflected by a high closeness value (upper 4% of all the residues in the
structure, using SARIG69 and Jamming70). Both the theoretical b-factor and the experimental b-factors of the template (pdb id: 1w98) show that this site is less mobile and contributes to the cooperative motion of CCNI [Fig. 5(F)]. A conserved contact of R75 with L138 is abolished upon its replacement with Gly [Fig. 7(C)]. The amino acid coded by a second editing site (K137R, chr4:77977164/hg19, editing level in human bodyMap = 2.8%) is located in spatial proximity to R75G. Both editing sites together are predicted by FoldX71 to destabilize the protein structure (ΔΔG = 0.71, based on the CCNI theoretical model). This energetic change is lower than the cumulative result of modeling these two sites independently (ΔΔG = 0.60 for K137R, ΔΔG = 0.59 for R75G, both based on the CCNI theoretical model), supporting a cooperative function of these two sites. It was shown by high-throughput sequencing and validated using Sanger sequencing that both editing events in CCNI co-occur in the same tissue.94 Both editing sites were also reported in cDNA from the same source (GenBank: CR541783), supporting their probable cooperative function. Interestingly, CCNI is known to have constitutive mRNA expression, unlike most other cyclins, whose RNA levels fluctuate according to the cell cycle stage. Its expression level was reported to be elevated in post-differentiated cells.93 Recently it was shown that although CCNI's mRNA expression does not change with the cell cycle, its protein level does.95 In this regard, it will be interesting to examine if and how RNA editing of the CCNI transcript contributes to its protein expression level along the cell cycle.

DISCUSSION

In the current study we used recent data on A-to-I RNA editing of coding regions in humans in an attempt to draw common protein structural and functional preferences. We suggest that only a few coding editing sites in humans are edited in a significant portion of the transcripts, change an amino acid, and significantly affect protein functionality. In many cases, RNA editing events at conserved positions have deleterious effects and are selected against, so that they are edited in only a tiny fraction of the transcripts. Under normal conditions the modified translated protein products will then be found in marginal amounts. Other editing sites may have a neutral or small effect on the protein function and as such are not subjected to strong negative selection. The editing level at such sites may be significantly higher, and these sites tend to be located at less conserved positions.

Our results complement a recent study74 which showed, mostly using evolutionary analyses, that within CDS there is selection against coding changing editing sites and that NS editing sites occur at lower frequency than synonymous ones within CDS. Our results are, in general, in agreement with this study, but we also show that there is an anti-correlation within the NS editing sites between the deleterious level and the editing level. Most importantly, our study shows this trend based on structural analysis of the modified proteins.

CDS editing events represent a small minority of the editing events in the entire transcriptome. Editing at a low level with a slight negative functional effect may be an inevitable byproduct of necessary editing events elsewhere, or even of an editing-independent function of the ADAR enzymes. Recently, it was shown that ADAR1 has an important editing-independent function in regulating miR expression in melanoma.96 In addition, some editing sites predicted to be deleterious might have a beneficial role under unique conditions or developmental changes. Inferior alleles under a given evolutionary pressure are often kept in the population as they may provide an advantage under a different selective pressure (examples are the sickle cell anemia mutation97 and cystic fibrosis98). The editing level can change quickly in response to environmental cues99 and give rise to "adaptive evolution" and to utilization of the best fitted transcript at each condition or developmental stage.

Still, we show in this study that there is a relatively small set of sites whose editing level is high and which are nevertheless predicted to have a strong deleterious effect. This raises the question: why are these deleterious events common?

While we cannot rule out the possibility of false positive prediction of deleterious sites, we believe it is, by and large, unlikely, as we included only sites for which two or more different tools indicate a functional effect. A more likely explanation is that the deleterious effect on the protein level, apparently, contributes to normal regulation on the system level. The known cases of GRIA2/3 R-to-G, Q-to-R and KCNA1 I-to-V17,18,38,39 were detected by our procedure independently and are predicted to be deleterious, although the editing in these genes is in fact vital for normal development.

For the cases with a high editing level, analysis of protein structural models and protein dynamics enabled us to identify CCNI R75G, ZN358 K382R, GIPC1 T62A, and COPA I164V as new, interesting, and likely important coding editing sites located both in evolutionarily conserved motifs and in structurally critical sites.

Our study highlights the importance of structural and functional information to supplement the transcriptome data usually applied in RNA editing research. More generally, it illustrates the need for the incorporation of structural information for the interpretation and prioritization of specific single nucleotide variants and for directing future experimental studies.

ACKNOWLEDGMENTS

The authors thank Ariel Azia and Limor Ziv-Strasser for helpful advice, Naa'ma Elefant for providing the
script for mapping genomic coordinates into transcripts coordinates, Ami Haviv for providing the script for analysis of the human bodyMap data and Jin Billy Li for sharing with us RADAR data pre-publication. The work of O.S. was done in partial fulfillment with the requirements of the Faculty of Life-Sciences, Bar-Ilan University, Israel.

REFERENCES

1. Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O'Connell MA, Li JB. Identifying RNA editing sites using RNA sequencing data alone. Nat Methods 2013;10:128–132.
2. Ramaswami G, Lin W, Piskol R, Tan MH, Davis C, Li JB. Accurate identification of human Alu and non-Alu RNA editing sites. Nat Methods 2012;9:579–581.
3. Peng Z, Cheng Y, Tan BC, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X, Guo J, Dong Z, Bao L, Wang J. Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol 2012;30:253–260.
4. Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res 2012;22:142–150.
5. Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 2009;324:1210–1213.
6. Maydanovych O, Beal PA. Breaking the central dogma by RNA editing. Chem Rev 2006;106:3397–3411.
7. Levanon K, Eisenberg E, Rechavi G, Levanon Erez Y. Letter from the editor: adenosine-to-inosine RNA editing in Alu repeats in the human genome. EMBO Rep 2005;6:831–835.
8. Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF. Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 2004;22:1001–1005.
9. Kim DDY, Kim Thomas TY, Walsh T, Kobayashi Y, Matise Tara C, Buyske S, Gabriel A. Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res 2004;14:1719–1725.
10. Athanasiadis A, Rich A, Maas S. Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2004;2:e391.
11. Bass BL. RNA editing and hypermutation by adenosine deamination. Trends Biochem Sci 1997;22:157–162.
12. Bass BL. RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 2002;71:817–846.
13. Valente L, Nishikura K. ADAR gene family and A-to-I RNA editing: diverse roles in posttranscriptional gene regulation. Prog Nucleic Acid Res Mol Biol 2005;79:299–338.
14. Nishikura K. Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 2009;79:321–349.
15. Hartner JC, Walkley CR, Lu J, Orkin SH. ADAR1 is essential for the maintenance of hematopoiesis and suppression of interferon signaling. Nat Immunol 2009;10:109–115.
16. Wang Q, Khillan J, Gadue P, Nishikura K. Requirement of the RNA editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 2000;290:1765–1768.
17. Higuchi M, Maas S, Single FN, Hartner J, Rozov A, Burnashev N, Feldmeyer D, Sprengel R, Seeburg PH. Point mutation in an AMPA receptor gene rescues lethality in mice deficient in the RNA-editing enzyme ADAR2. Nature 2000;406:78–81.
18. Brusa R, Zimmermann F, Koh DS, Feldmeyer D, Gass P, Seeburg PH, Sprengel R. Early-onset epilepsy and postnatal lethality associated with an editing-deficient GluR-B allele in mice. Science 1995;270:1677–1680.
19. Maas S, Kawahara Y, Tamburro KM, Nishikura K. A-to-I RNA editing and human disease. RNA Biol 2006;3:1–9.
20. Kawahara Y, Ito K, Sun H, Aizawa H, Kanazawa I, Kwak S. Glutamate receptors: RNA editing and death of motor neurons. Nature 2004;427:801.
21. Rice GI, Kasher PR, Forte GM, Mannion NM, Greenwood SM, Szynkiewicz M, Dickerson JE, Bhaskar SS, Zampini M, Briggs TA, Jenkinson EM, Bacino CA, Battini R, Bertini E, Brogan PA, Brueton LA, Carpanelli M, De Laet C, de Lonlay P, Del Toro M, Desguerre I, Fazzi E, Garcia-Cazorla A, Heiberg A, Kawaguchi M, Kumar R, Lin JP, Lourenco CM, Male AM, Marques W, Jr, Mignot C, Olivieri I, Orcesi S, Prabhakar P, Rasmussen M, Robinson RA, Rozenberg F, Schmidt JL, Steindl K, Tan TY, van der Merwe WG, Vanderver A, Vassallo G, Wakeling EL, Wassmer E, Whittaker E, Livingston JH, Lebon P, Suzuki T, McLaughlin PJ, Keegan LP, O'Connell MA, Lovell SC, Crow YJ. Mutations in ADAR1 cause Aicardi-Goutieres syndrome associated with a type I interferon signature. Nat Genet 2012;44:1243–1248.
22. Paz N, Levanon EY, Amariglio N, Heimberger AB, Ram Z, Constantini S, Barbash ZS, Adamsky K, Safran M, Hirschberg A, Krupsky M, Ben-Dov I, Cazacu S, Mikkelsen T, Brodie C, Eisenberg E, Rechavi G. Altered adenosine-to-inosine RNA editing in human cancer. Genome Res 2007;17:1586–1595.
23. Dominissini D, Moshitch-Moshkovitz S, Amariglio N, Rechavi G. Adenosine-to-inosine RNA editing meets cancer. Carcinogenesis 2011;32:1569–1577.
24. Maas S, Patt S, Schrey M, Rich A. Underediting of glutamate receptor GluR-B mRNA in malignant gliomas. Proc Natl Acad Sci USA 2001;98:14687–14692.
25. Steinman RA, Yang Q, Gasparetto M, Robinson LJ, Liu X, Lenzner DE, Hou J, Smith C, Wang Q. Deletion of the RNA-editing enzyme ADAR1 causes regression of established chronic myelogenous leukemia in mice. Int J Cancer 2013;132:1741–1750.
26. Chen L, Li Y, Lin CH, Chan TH, Chow RK, Song Y, Liu M, Yuan YF, Fu L, Kong KL, Qi L, Zhang N, Tong AH, Kwong DL, Man K, Lo CM, Lok S, Tenen DG, Guan XY. Recoding RNA editing of AZIN1 predisposes to hepatocellular carcinoma. Nat Med 2013;19:209–216.
27. Bazak L, Haviv A, Barak M, Jacob-Hirsch J, Deng P, Zhang R, Isaacs FJ, Rechavi G, Li JB, Eisenberg E, Levanon EY. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res 2013;24:365–376.
28. Alon S, Mor E, Vigneault F, Church GM, Locatelli F, Galeano F, Gallo A, Shomron N, Eisenberg E. Systematic identification of edited microRNAs in the human brain. Genome Res 2012;22:1533–1540.
29. Solomon O, Oren S, Safran M, Deshet-Unger N, Akiva P, Jacob-Hirsch J, Cesarkas K, Kabesa R, Amariglio N, Unger R, Rechavi G, Eyal E. Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR). RNA 2013;19:591–604.
30. Lev-Maor G, Ram O, Kim E, Sela N, Goren A, Levanon Erez Y, Ast G. Intronic Alus influence alternative splicing. PLoS Genet 2008;4:e1000204.
31. Lev-Maor G, Sorek R, Levanon Erez Y, Paz N, Eisenberg E, Ast G. RNA-editing-mediated exon evolution. Genome Biol 2007;8:R29.
32. Beghini A, Ripamonti CB, Peterlongo P, Roversi G, Cairoli R, Morra E, Larizza L. RNA hyperediting and alternative splicing of hematopoietic cell phosphatase (PTPN6) gene in acute myeloid leukemia. Hum Mol Genet 2000;9:2297–2304.
33. Rueter SM, Dawson TR, Emeson RB. Regulation of alternative splicing by RNA editing. Nature 1999;399:75–80.
34. Scadden ADJ. The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nat Struct Mol Biol 2005;12:489–496.

35. Prasanth KV, Prasanth Supriya G, Xuan Z, Hearn S, Freier Susan M, Bennett CF, Zhang Michael Q, Spector David L. Regulating gene expression through RNA nuclear retention. Cell 2005;123:249–263.
36. Ramaswami G, Li JB. RADAR: a rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res 2013;42(Database issue):D109–D113.
37. Kleinman CL, Adoue V, Majewski J. RNA editing of protein sequences: a rare event in human transcriptomes. RNA 2012;18:1586–1596.
38. Schoft VK, Schopoff S, Jantsch Michael F. Regulation of glutamate receptor B pre-mRNA splicing by RNA editing. Nucleic Acids Res 2007;35:3723–3732.
39. Hoopengardner B, Bhalla T, Staber C, Reenan R. Nervous system targets of RNA editing identified by comparative genomics. Science 2003;301:832–836.
40. Illumina Human Body Map. http://www.ncbi.nlm.nih.gov/Traces/sra/?study=ERP000546. Accessed August 24, 2014.
41. Kiran A, Baranov PV. DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics 2010;26:1772–1776.
42. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26:841–842.
43. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007;35(Database issue):D61–D65.
44. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004;32(Database issue):D493–D496.
45. Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat 2013;34:E2393–E2402.
46. Eisenberg E, Adamsky K, Cohen L, Amariglio N, Hirshberg A, Rechavi G, Levanon EY. Identification of RNA editing sites in the SNP database. Nucleic Acids Res 2005;33:4612–4617.
47. Gommans WM, Tatalias NE, Sie CP, Dupuis D, Vendetti N, Smith L, Kaushal R, Maas S. Screening of human SNP database identifies recoding sites of A-to-I RNA editing. RNA 2008;14:2074–2085.
48. Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA, http://evs.gs.washington.edu/EVS/. Accessed August 24, 2014.
49. Sequence Read Archive (SRA). http://www.ncbi.nlm.nih.gov/sra. Accessed August 24, 2014.
50. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–311.
51. Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 1993;232:584–599.
52. Pollastri G, Przybylski D, Rost B, Baldi P. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins 2002;47:228–235.
53. Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005;21:3433–3434.
54. Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. The DISOPRED server for the prediction of protein disorder. Bioinformatics 2004;20:2138–2139.
55. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, Jupe S, Kalatskaya I, Mahajan S, May B, Ndegwa N, Schmidt E, Shamovsky V, Yung C, Birney E, Hermjakob H, D'Eustachio P, Stein L. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 2011;39(Database issue):D691–D697.
56. Shoemaker BA, Zhang D, Tyagi M, Thangudu RR, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR. IBIS (Inferred Biomolecular Interaction Server) reports, predicts and integrates multiple types of conserved interactions for proteins. Nucleic Acids Res 2012;40(Database issue):D834–D840.
57. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, Castagnoli L, Cesareni G. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res 2012;40(Database issue):D857–D861.
58. Zhang QC, Petrey D, Deng L, Qiang L, Shi Y, Thu CA, Bisikirska B, Lefebvre C, Accili D, Hunter T, Maniatis T, Califano A, Honig B. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012;490:556–560.
59. Wang G, Dunbrack RL, Jr. PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 2005;33(Web Server issue):W94–W98.
60. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–410.
61. Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H, Yang Z, Meng EC, Pettersen EF, Huang CC, Datta RS, Sampathkumar P, Madhusudhan MS, Sjolander K, Ferrin TE, Burley SK, Sali A. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 2010;39(Database issue):D465–D474.
62. Kiefer F, Arnold K, Kunzli M, Bordoli L, Schwede T. The SWISS-MODEL repository and associated resources. Nucleic Acids Res 2009;37(Database issue):D387–D392.
63. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol 2007;372:774–797.
64. Eyal E, Najmanovich R, McConkey BJ, Edelman M, Sobolev V. Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. J Comput Chem 2004;25:712–724.
65. Krivov GG, Shapovalov MV, Dunbrack RL. Improved prediction of protein side-chain conformations with SCWRL4. Proteins 2009;77:778–795.
66. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005;33:2302–2309.
67. McConkey BJ, Sobolev V, Edelman M. Quantification of protein surfaces, volumes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics 2002;18:1365–1373.
68. Eyal E, Yang LW, Bahar I. Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics 2006;22:2619–2627.
69. Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, Pietrokovski S. Network analysis of protein structures identifies functional residues. J Mol Biol 2004;344:1135–1146.
70. Cusack MP, Thibert B, Bredesen DE, Del Rio G. Efficient identification of critical residues based only on protein structure by network analysis. PLoS One 2007;2:e421.
71. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res 2005;33(Web Server issue):W382–W388.
72. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing 2011, Vienna, Austria.
73. Protein data bank (PDB). ftp://ftp.wwpdb.org/. Accessed August 24, 2014.
74. Xu G, Zhang J. Human coding RNA editing is generally nonadaptive. Proc Natl Acad Sci USA 2014;111:3769–3774.
75. Rost B. PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 1996;266:525–539.
76. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009;10:R25.
77. Schwartz S, Gal-Mark N, Kfir N, Oren R, Kim E, Ast G. Alu exonization events reveal features required for precise recognition of exons by the splicing machinery. PLoS Comput Biol 2009;5:e1000300.

78. Lev-Maor G, Sorek R, Shomron N, Ast G. The birth of an alternatively spliced exon: 3' splice-site selection in Alu exons. Science 2003;300:1288–1291.
79. Ellis JD, Barrios-Rodiles M, Colak R, Irimia M, Kim T, Calarco JA, Wang X, Pan Q, O'Hanlon D, Kim PM, Wrana JL, Blencowe BJ. Tissue-specific alternative splicing remodels protein-protein interaction networks. Mol Cell 2012;46:884–892.
80. Buljan M, Frankish A, Bateman A. Quantifying the mechanisms of domain gain in animal proteins. Genome Biol 2010;11:R74.
81. Illumina Inc. http://www.illumina.com. Accessed August 24, 2014.
82. Pinto Y, Cohen HY, Levanon EY. Mammalian conserved ADAR targets comprise only a small fragment of the human editosome. Genome Biol 2014;15:R5.
83. Jamroz M, Kolinski A, Kihara D. Structural features that predict real-value fluctuations of globular proteins. Proteins 2012;80:1425–1435.
84. Paschen W, Hedreen JC, Ross CA. RNA editing of the glutamate receptor subunits GluR2 and GluR6 in human brain tissue. J Neurochem 1994;63:1596–1602.
85. Nutt SL, Kamboj RK. RNA editing of human kainate receptor subunits. Neuroreport 1994;5:2625–2629.
86. Gonzalez C, Lopez-Rodriguez A, Srikumar D, Rosenthal JJ, Holmgren M. Editing of human K(V)1.1 channel mRNAs disrupts binding of the N-terminus tip at the intracellular cavity. Nat Commun 2011;2:436.
87. Garrett S, Rosenthal JJ. RNA editing underlies temperature adaptation in K+ channels from polar octopuses. Science 2012;335:848–851.
88. Cheng Y, Geng H, Cheng SH, Liang P, Bai Y, Li J, Srivastava G, Ng MH, Fukagawa T, Wu X, Chan AT, Tao Q. KRAB zinc finger protein ZNF382 is a proapoptotic tumor suppressor that represses multiple oncogenes and is commonly silenced in multiple carcinomas. Cancer Res 2010;70:6516–6526.
89. Lee NY, Ray B, How T, Blobe GC. Endoglin promotes transforming growth factor beta-mediated Smad 1/5/8 signaling and inhibits endothelial cell migration through its association with GIPC. J Biol Chem 2008;283:32527–32533.
90. Quek HH, Chow VT. Molecular and cellular studies of the human homolog of the 160-kD alpha-subunit of the coatomer protein complex. DNA Cell Biol 1997;16:275–280.
91. Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature 1998;392:917–920.
92. Craig TA, Zhang Y, McNulty MS, Middha S, Ketha H, Singh RJ, Magis AT, Funk C, Price ND, Ekker SC, Kumar R. Research resource: whole transcriptome RNA sequencing detects multiple 1alpha,25-dihydroxyvitamin D(3)-sensitive metabolic pathways in developing zebrafish. Mol Endocrinol 2012;26:1630–1642.
93. Brinkkoetter PT, Olivier P, Wu JS, Henderson S, Krofft RD, Pippin JW, Hockenbery D, Roberts JM, Shankland SJ. Cyclin I activates Cdk5 and regulates expression of Bcl-2 and Bcl-XL in postmitotic mouse cells. J Clin Invest 2009;119:3089–3101.
94. Zhu H, Urban DJ, Blashka J, McPheeters MT, Kroeze WK, Mieczkowski P, Overholser JC, Jurjus GJ, Dieter L, Mahajan GJ, Rajkowska G, Wang Z, Sullivan PF, Stockmeier CA, Roth BL. Quantitative analysis of focused a-to-I RNA editing sites by ultra-high-throughput sequencing in psychiatric disorders. PLoS One 2012;7:e43227.
95. Nagano T, Hashimoto T, Nakashima A, Hisanaga S, Kikkawa U, Kamada S. Cyclin I is involved in the regulation of cell cycle progression. Cell Cycle 2013;12:2617–2624.
96. Nemlich Y, Greenberg E, Ortenberg R, Besser MJ, Barshack I, Jacob-Hirsch J, Jacoby E, Eyal E, Rivkin L, Prieto VG, Chakravarti N, Duncan LM, Kallenberg DM, Galun E, Bennett DC, Amariglio N, Bar-Eli M, Schachter J, Rechavi G, Markel G. MicroRNA-mediated loss of ADAR1 in metastatic melanoma promotes tumor growth. J Clin Invest 2013;123:2703–2718.
97. Kwiatkowski DP. How malaria has affected the human genome and what human genetics can teach us about malaria. Am J Hum Genet 2005;77:171–192.
98. Pier GB, Grout M, Zaidi T, Meluleni G, Mueschenborn SS, Banting G, Ratcliff R, Evans MJ, Colledge WH. Salmonella typhi uses CFTR to enter intestinal epithelial cells. Nature 1998;393:79–82.
99. Doria M, Neri F, Gallo A, Farace MG, Michienzi A. Editing of HIV-1 RNA by the double-stranded RNA deaminase ADAR1 stimulates viral infection. Nucleic Acids Res 2009;37:5848–5858.
