1 Limited RNA Editing in Exons of Mouse Liver and Adipose

Genetics: Early Online, published on February 14, 2013 as 10.1534/genetics.112.149054 Limited RNA editing in exons of mouse liver and adipose tissue Sandrine Lagarrigue*,§,†,***, Farhad Hormozdiari‡,**,***, Lisa J. Martin§§,***, Frédéric Lecerf*,§,†, Yehudit Hasin§§, Christoph Rau§§, Raffi Hagopian§§, Yu Xiao§§, Jun Yan§§, Thomas A. Drake††, Anatole Ghazalpour**, Eleazar Eskin‡,**, and Aldons J. Lusis**,§§,‡‡ *Agrocampus Ouest, UMR1348 Pegase, Animal Genetics Laboratory, F-35000 Rennes, France §INRA, UMR1348 Pegase, F-35000 Rennes, France †Université Europeenne de Bretagne, France ‡Department of Computer Sciences, University of California, Los Angeles, CA 90095 **Department of Human Genetics, University of California, Los Angeles, CA 90095 §§Department of Medicine/Division of Cardiology, University of California, Los Angeles, CA 90095 ††Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095 ‡‡Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, CA 90095 ***Equal contributions of the authors Short title: RNA-Seq bias and RNA editing Key words: RNA editing, RNA Seq, inbred mice, RNA DNA Differences 1 Copyright 2013. Corresponding Authors: Aldons J. Lusis UCLA Department of Medicine/Division of Cardiology 3730 MRL 650 Charles E Young Drive South Los Angeles, CA 90095-1679 Phone: (310) 825-1359 FAX (310) 794-7345 [email protected] Sandrine Lagarrigue Agrocampus Ouest Department of Animal Sciences 65 rue de Saint Brieuc Rennes, 35000 France [email protected] 2 ABSTRACT Several studies have investigated RNA-DNA differences (RDD), presumably due to RNA editing, with conflicting results. We report a rigorous analysis of RDD in exonic regions in mice, taking into account critical biases in RNA-Seq analysis. Using deep-sequenced F1 reciprocal inbred mice, we mapped 40 million RNA-Seq reads per liver sample and 180 million reads per adipose sample. We found 7,300 apparent hepatic RDDs using a multiple-site mapping procedure, compared with 293 RDD found using a unique-site mapping procedure. After filtering for repeat sequence, splice junction proximity, undirectional strand and extremity read bias, 63 RDD remained. In adipose tissue unique-site mapping identified 1,667 RDD, and after applying the same four filters, 188 RDDs remained. In both tissues, the filtering procedure increased the proportion of canonical (A-to-I and C-to-U) editing events. The genomic DNA of 12 RDD sites among the potential 63 hepatic RDD was tested by Sanger sequencing, three of which proved to be due to an unreferenced SNP. We validated seven liver RDD with Sequenom technology, including two non-canonical, Gm5424 C-to-I(G) and Pisd I(G)-to-A RDD. Differences in diet, sex, or genetic background had very modest effects on RDD occurrence. Only a small number of apparent RDD sites overlapped between liver and adipose, indicating a high degree of tissue specificity. Our findings underscore the importance of properly filtering for bias in RNA-Seq investigations, including the necessity of confirming the DNA sequence to eliminate unreferenced SNPs. Based on our results, we conclude that RNA editing is likely limited to hundreds of events in exonic RNA in liver and adipose. 3 INTRODUCTION Several recent studies have investigated genome-wide RNA editing using deep sequencing of transcriptomes by RNA-Seq on human transformed cells, cancer cell lines (JU et al. 2011; LI et al. 2011; BAHN et al. 2012; PENG et al. 2012; RAMASWAMI et al. 2012), or tissues of inbred mouse strains (DANECEK et al. 2012; GU et al. 2012). Total reported RDD sites have varied from hundreds to thousands. Over the same period, technical issues, such as mapping of reads in paralogous or repetitive sequence regions, mapping errors at splice sites, and systematic sequencing errors that could produce a large number of false-positive RDDs have been described (KLEINMAN and MAJEWSKI 2012; LIN et al. 2012a; PICKRELL et al. 2012). Another reported source of RDD error is undetected genomic DNA SNPs, arising from insufficient coverage of current DNA sequencing data (SCHRIDER et al. 2011). We have examined genome-wide exonic RDD by using RNA-Seq data obtained from two tissues, liver and adipose, in F1 reciprocal crosses from two inbred strains of mice, DBA/2J (D2) and C57BL/6J (B6). These inbred mouse strains have been subjected to deep genomic sequencing and SNP analyses, with a higher coverage for B6 than for D2. A major aim was to estimate the impact of the major technical issues (paralog mapping, mismapping near splice sites and repeat sequences, and systematic sequencing errors, such as unidirectional strand and extremity biases) to obtain a better sense of the true frequency of RDD in normal mammalian tissues. The RDDs that remained were then characterized by comparison with Expressed Sequence Tags and tested by Sanger and quantitative Sequenom sequencing, showing the importance of controlling the genomic DNA sequence in RDD site analysis. We also examined the effects of sex and diet, and the possibility of allele-specific RNA editing. 4 MATERIALS AND METHODS Ethics statement. All animals were handled in strict accordance with good animal practice as defined by the relevant national and/or local animal welfare bodies, and all animal work was approved by the appropriate committee. All experiments in this paper were carried out with UCLA IACUC approval. Mice and tissues. RNA-Seq was performed on liver and adipose mRNA from F1 male and female D2 and B6 mice, purchased from the Jackson Laboratory (Bar Harbor, ME). Reciprocal F1 male and female mice were generated by breeding the parental strains in the vivarium at UCLA. For six liver RNA libraries, RNA from 3 mice was pooled into 4 independent samples of high fat fed B6xD2 (BXD) and DXB males and females, and 2 samples of chow fed BXD and DXB males. Four adipose RNA libraries were made using pooled RNA from three BXD and DXB males and females fed a chow diet. Males and females of other reciprocal inbred mouse crosses were used for Sequenom validation. Those F1’s were A/JxC3H/HeJ (AXH) and HXA and B6xC3H/HeJ (BxH) and HXB. Liver RNA was isolated from three mice per sex per F1 cross using the RNeasy kit from Qiagen (Valencia, CA, USA). cDNA was made with the High Capacity Reverse Transcription Kit from Applied Biosystems. All mice were fed ad libitum and maintained on a 12-hour light/dark cycle. F1 pups were weaned at 28 days and fed a chow diet (Ralston-Purina Co) until 8 weeks of age, at which time half were placed on a high fat diet (Research Diets D12266B). All F1 mice were euthanized at 16 weeks, with liver and adipose harvested at that time. Library preparation for Illumina sequencing. Library preparation was performed as recommended by the manufacturer (Illumina, Hayward, CA,USA). Briefly, total RNA was 5 extracted using the RNeasy Mini kit with DNase treatment (Qiagen, Valencia, CA, USA). PolyA mRNA was isolated and fragmented, and first strand cDNA was prepared using random hexamers. Following second strand cDNA synthesis, end repair, addition of a single A base, adaptor ligation, and agarose gel isolation of ~200 bp cDNA, PCR amplification of the ~200 bp cDNA was performed. Liver samples were sequenced using the Illumina GAIIX sequencer to a coverage of approximately 40 million single end 75bp reads per sample. Adipose was sequenced with the Illumina HiSeq2000 on paired end 50 bp reads and generated 180 million reads per sample. Read mapping. We first aligned reads 75pb (liver) or 50bp (adipose) to the mouse reference genome version mm9 using mrsFAST (HACH et al. 2010) allowing up to 5 mismatches for liver tissue and 3 for adipose). The reads were divided into two categories. The first category was the the set of mapped reads which align to the genome with e or less mismatches. The second category was reads that failed to map to the reference genome. Many RNA-Seq reads failed to align to a genome because they spanned the exonic junctions. In order to overcome this problem we mapped the unmapped reads with TopHat (TRAPNELL et al. 2009), which is designed to map reads to the genome by splitting the reads into smaller fragments. The reads aligned to the genome in this process were added to the map read set. In our multiple-site mapping procedure, we allowed up to 10 genomic locations for the same read. If a read mapped to more than 10 locations, we picked the top 10 sites with the most reads. In the unique-site mapping procedure, we only consider the reads which mapped to only one position, and reads which map to more than one position are filtered out. RDD criteria. We selected reads with base modifications of the RNA represented by at least 10 reads at the position for the edited base, located in one exon, and not corresponding to a known 6 genomic SNP between D2 and B6 (based on the Mouse Sequencing Consortium (WATERSTON et al. 2002; KEANE et al. 2011). The read was required to have a base quality of 20 or higher. A base modification A (DNA base)àB (edited base) was considered an RDD if 1) B6 and D2 were homozygous for the DNA nucleotide in the genomic DNA, 2) B generated at least 10 RNA-Seq reads, with 3) (A + B) / total reads ≥ 0.9 of all reads for the site and B / (A + B) ≥ 0.1 in RNA- Seq , 4) the RDD was shared by 4 of 4 samples for adipose tissue or 4 of 6 samples for liver (Figure S1). Filter 2 ensured a certain level of expression for the transcript with the edited base. The last filter allowed exclusion of random sequencing errors (MEACHAM et al.

1 Limited RNA Editing in Exons of Mouse Liver and Adipose

Noninvasive Sleep Monitoring in Large-Scale Screening of Knock-Out Mice

Inbred Mouse Strains Expression in Primary Immunocytes Across

Mouse Models of Congenital Cataract

Primepcr™Assay Validation Report

UC San Francisco Electronic Theses and Dissertations

Hepatic Proteome and Toxic Response of Early-Life Stage Rainbow Trout (Oncorhynchus Mykiss) to the Aquatic Herbicide, Reward®

Micrornas for Virus Pathogenicity and Host Responses, Identified In

SUPPLEMENTAL DIGITAL CONTENT (SDC) 1 SDC-METHODS Microarray Processing: Total RNA Was Reverse-Transcribed Using the One-Cycle Ta

Double Maternal-Effect: Duplicated Nucleoplasmin 2 Genes, Npm2a and Npm2b, with Essential but Distinct Functions Are Shared by Fish and Tetrapods Caroline T

Evolutionarily Conserved Regulation of Embryonic Fast-Twitch Skeletal Muscle

Extensive Cargo Identification Reveals Distinct Biological Roles of the 12 Importin Pathways Makoto Kimura1,*, Yuriko Morinaka1

Maternal DNMT3A-Dependent De Novo Methylation of the Paternal Genome Inhibits Gene Expression in the Early Embryo