Validation of Two Ribosomal RNA Removal Methods for Microbial Metatranscriptomics
ANALYSIS

Shaomei He1,2,5, Omri Wurtzel3,5, Kanwar Singh1, Jeff L Froula1, Suzan Yilmaz1, Susannah G Tringe1, Zhong Wang1, Feng Chen1, Erika A Lindquist1, Rotem Sorek3 & Philip Hugenholtz1,2,4

1Department of Energy Joint Genome Institute, Walnut Creek, California, USA. 2Energy Biosciences Institute, University of California-Berkeley, Berkeley, California, USA. 3Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel. 4Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia. 5These authors contributed equally to this work. Correspondence should be addressed to P.H. ([email protected]).

Received 12 April; accepted 30 August; published online 19 September 2010; doi:10.1038/nmeth.1507

Nature Methods | Advance Online Publication
© 2010 Nature America, Inc. All rights reserved.

The predominance of rRNAs in the transcriptome is a major technical challenge in sequence-based analysis of cDNAs from microbial isolates and communities. Several approaches have been applied to deplete rRNAs from (meta)transcriptomes, but no systematic investigation of potential biases introduced by any of these approaches has been reported. Here we validated the effectiveness and fidelity of the two most commonly used approaches, subtractive hybridization and exonuclease digestion, as well as combinations of these treatments, on two synthetic five-microorganism metatranscriptomes using massively parallel sequencing. We found that the effectiveness of rRNA removal was a function of community composition and RNA integrity for these treatments. Subtractive hybridization alone introduced the least bias in relative transcript abundance, whereas exonuclease and in particular combined treatments greatly compromised mRNA abundance fidelity. Illumina sequencing itself also can compromise quantitative data analysis by introducing a G+C bias between runs.

Rapid technological advances in ultra-high-throughput sequencing are making de novo sequencing of transcriptomes (RNA-seq) a viable alternative to microarray analysis of microbial isolates and communities (ref. 1). A major technical challenge for de novo transcriptome sequencing is the low relative abundance of mRNAs in total cellular RNA (1–5%; ref. 2), the bulk of which is rRNAs and tRNAs (ref. 3). Unlike eukaryotic mRNAs, which can be selectively synthesized into cDNA by virtue of their poly(A) tails (ref. 4), bacterial and archaeal cDNAs are predominantly rRNA sequences (refs. 5,6). Therefore, prokaryotic rRNAs are often removed before sequencing to improve mRNA detection sensitivity. Different methods have been used to eliminate prokaryotic rRNA, including subtractive hybridization with rRNA-specific probes (refs. 7,8), digestion with exonuclease that preferentially acts on rRNA, poly(A) tail addition to discriminate against rRNA (refs. 9,10), reverse transcription with rRNA-specific primers followed by RNase H digestion to degrade rRNA:DNA hybrids (ref. 11), and gel electrophoresis size separation and extraction of non-rRNA bands (ref. 12).

Among these methods, subtractive hybridization and exonuclease digestion have become the most popular owing to the availability of commercial kits from Ambion (MICROBExpress Bacterial mRNA Enrichment kit) and Epicentre (mRNA-ONLY Prokaryotic mRNA Isolation kit). The former kit uses a subtractive hybridization with capture oligonucleotides specific to 16S and 23S rRNAs. It has been applied to both bacterial isolates and environmental samples, in one or two rounds (refs. 6,13–18). The Epicentre kit uses exonuclease to preferentially degrade processed RNAs with 5′ monophosphate (the majority of which are believed to be rRNAs; refs. 19,20). In some instances, these methods have been used in combination to improve rRNA removal (refs. 21–23). There is no consensus, however, on the best approach, and existing data are anecdotal.

Here we validated the performance of these kits with two synthetic five-member microbial communities using Illumina sequencing and found that rRNA removal efficiencies were community- and RNA integrity–dependent and that only subtractive hybridization adequately preserved relative transcript abundance for quantitative analyses.

RESULTS

Experimental design

We constructed two five-member synthetic microbial communities by pooling equimolar amounts of total RNAs extracted independently from microbial isolates with sequenced genomes that span a wide phylogenetic, (G+C) content and genome-size range. Two of the five species were common to both communities (Table 1). We tested subtractive hybridization (Hyb), exonuclease digestion (Exo) and combined treatments in two experiments using the two synthetic communities. In experiment 1, we tested Hyb, Exo and Hyb followed by Exo (Hyb + Exo) using synthetic community 1. Then we conducted experiment 2 using synthetic community 2 to confirm the results of experiment 1 and expanded the depletion methods tested to also include two rounds of Hyb (2Hyb) as well as Exo followed by Hyb (Exo + Hyb) (Table 2). As a control, we used total RNA without rRNA removal. We initially assessed rRNA removal using RNA electropherograms (Agilent 2100 Bioanalyzer) (Supplementary Figs. 1 and 2 and Supplementary Note 1). We then sequenced technical replicates (after RNA pooling) distributed within and between four runs (flowcells; runs 1 and 2 in experiment 1, and runs 3 and 4 in experiment 2) on an Illumina Genome Analyzer II sequencer to evaluate intra- and interrun variation (Table 2). We generated ~10–17 million 76-base-pair single-end reads for each sample, and for all but one sample we mapped 99% of reads to a reference, indicating good read quality and negligible contamination (Supplementary Tables 1 and 2 and Supplementary Note 2).

Table 1 | Details of microbial isolates used in the two synthetic communities

Organism | Genome size (Mbp) | G+C content (%) | Phylum | Match Hyb target sites (a) | 23S/16S (b) | RIN (b) | Community
Desulfovibrio vulgaris strain Hildenborough | 3.7 | 63 | Proteobacteria | Yes | 0.9 | 7.4 | 1
Streptomyces sp. strain LCC | 8–10 (c) | 71 | Actinobacteria | Yes | 1.4 | 5.5 | 1
Lactococcus lactis subspecies lactis IL1403 | 2.53 | 35 | Firmicutes | Yes | 1.6 | 9.8 | 1
Spirochaeta aurantia subspecies aurantia M1 | 4.3 | 65 | Spirochaeta | Yes | 2.1 | 9.8 | 1 and 2
Lactobacillus brevis ATCC (d) 367 | 2.3 | 46 | Firmicutes | Yes | 1.4 | 9.9 | 1 and 2
Kangiella koreensis DSM (d) 16069 | 2.9 | 43 | Proteobacteria | Yes | 1.4 | 10 | 2
Catenulispora acidiphila DSM (d) 44928 | 10.5 | 70 | Actinobacteria | Yes | 1.3 | 8.6 | 2
Halorhabdus utahensis DSM (d) 12940 | 3.1 | 63 | Euryarchaeota | No | 1.9 | 10 | 2

(a) Information about 16S and 23S rRNAs matching to the capture oligos in the MICROBExpress mRNA Enrichment Kit was obtained from Ambion.
(b) The ratio of 23S/16S rRNAs and RNA integrity number (RIN) were determined using the Agilent 2100 Bioanalyzer. RIN values range from 1 (most degraded) to 10 (most intact).
(c) Draft assembly; genome size was estimated to be eight to ten mega–base pairs (Mbp), which is in the range of the size of genomes of this organism’s close relatives, such as S. coelicolor A3(2) (9,054,847 bp), S. griseus subspecies griseus NITE Biological Resource Center (NBRC) 13350 (8,545,929 bp) and S. avermitilis MA-468 (9,119,895 bp).
(d) ATCC, American Type Culture Collection; DSM, Deutsche Sammlung von Mikroorganismen.

Technical reproducibility

We first evaluated technical reproducibility by comparing the relative transcript abundance of technical replicates calculated as reads per gene normalized by total mapped mRNA reads. The four intrarun technical replicates in three independent runs (Hyb in run 1, and controls in runs 1, 3 and 4; Table 2) were highly reproducible (Pearson’s product-moment correlation coefficient, r = 0.997 ± 0.001; Fig. 1a), suggesting that both sample preparation–associated variation and intrarun sequencing variation were minor. In addition, the sole interrun technical replicate between runs 1 and 2 (Exo 1 i and Exo 1 ii; Table 2) was also highly reproducible (Fig. 1b), suggesting negligible technical variation between these two runs. By contrast, the interrun technical replicates between [...]

Linear regression slopes for each organism positively correlated with their genomic G+C content (Fig. 1c). This indicates that run 3 systematically underrepresented (G+C)-rich sequences relative to run 4, which we believe was due to variability in run quality (Table 2). Two simple normalization strategies based on the total mRNA counts from the source organism or gene G+C content (Supplementary Note 3) improved the overall correlation between technical replicates (Fig. 1d), confirming that the run-to-run variation was indeed largely associated with G+C content. Therefore, to accurately assess rRNA depletion efficiency and mRNA fidelity, we restricted comparisons of treatment-control pairs to the same run in experiment 2 (Table 2). We took the interrun average after we performed the intrarun comparison.

Efficiency of rRNA depletion

As expected, the relative rRNA content of the controls was 95–97% of total RNA, typical of bacteria and archaea, with each community member being approximately equally represented in the controls (Fig. 2a,b). After the various treatments, the observed rRNA content decreased by as little as 3.6% (Exo) and up to 19.9% (Hyb + Exo). This decrease in rRNA percentage in total RNA reflected the redistribution of rRNA relative to non-rRNA owing to the treatment. For experiment 1, the rank order of rRNA removal efficiencies was Exo < Hyb < Hyb + Exo;