Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Resource The melanogaster transcriptome by paired-end RNA

Bryce Daines,1,2,6 Hui Wang,2,6 Liguo Wang,3,6 Yumei Li,1,2 Yi Han,2 David Emmert,4 William Gelbart,4 Xia Wang,1,2 Wei Li,3,5 Richard Gibbs,1,2,7 and Rui Chen1,2,7 1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; 2Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA; 3Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA; 4Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA; 5Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA

RNA-seq was used to generate an extensive map of the transcriptome by broad sampling of 10 developmental stages. In total, 142.2 million uniquely mapped 64–100-bp paired-end reads were generated on the Illumina GA II yielding 3563 sequencing coverage. More than 95% of FlyBase genes and 90% of splicing junctions were observed. Modifications to 30% of FlyBase gene models were made by extension of untranslated regions, inclusion of novel exons, and identification of novel splicing events. A total of 319 novel transcripts were identified, representing a 2% increase over the current annotation. Alternate splicing was observed in 31% of D. melanogaster genes, a 38% increase over previous estimations, but significantly less than that observed in higher organisms. Much of this splicing is subtle such as tandem alternate splice sites. [Supplemental material is available for this article. The sequencing data from this study have been submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA012173 and to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession no. GSE24324.]

High-throughput sequencing of cDNA libraries (RNA-seq) is an ef- splicing isoforms is enabled without dependence on prior knowl- fective and accurate method for transcriptome profiling. RNA-seq edge. While splicing variants can also be detected with custom has been applied in many organisms including humans (Campbell microarrays, probes must be specifically designed to interrogate et al. 2008; Cloonan and Grimmond 2008; Cloonan et al. 2008; exon–exon junctions and are thus limited in scope to known or Marioni et al. 2008; Mortazavi et al. 2008; Mudge et al. 2008; predicted junctions. Finally, direct annotation of gene models and Nagalakshmi et al. 2008; Pan et al. 2008; Shendure 2008; Sultan novel transcript discovery are possible with RNA-seq (Yassour et al. et al. 2008; Wang et al. 2008; Filichkin et al. 2009; Levin et al. 2009; 2009). This can be accomplished by alignment of reads to a refer- Perkins et al. 2009; Wurtzel et al. 2009). RNA-seq is not dependent ence genome and inference of transcript models from coverage on prior knowledge and requires no design work, thus reducing and splicing data. While genome-tiling arrays can also be used to labor and enabling de novo discovery. Several additional features identify novel transcriptional units, RNA-seq is not subject to the of RNA-seq make it likely to supplant microarrays as the key tech- inherent limitations of hybridization platforms. Due to the several nology for transcriptome profiling, including accurate and sensi- advantages of the RNA-seq approach we have applied this tech- tive measuring of expression level, identification of alternate splic- nology to characterize the transcriptome of D. melanogaster. ing isoforms, and direct discovery of novel transcripts (Graveley D. melanogaster is an important and has 2008; Shendure 2008). contributed significantly to our understanding of gene expression Highly accurate digital measurement of gene expression by and development. The transcriptome of D. melanogaster is well RNA-seq can be obtained by counting the number of sequencing annotated and functionally characterized. Analysis of transcript reads which map to annotated transcripts from appropriately expression using spotted arrays has yielded insight into the de- prepared libraries (Mortazavi et al. 2008; Wang et al. 2009). In this velopmental expression patterns of annotated D. melanogaster way, RNA-seq offers increased sensitivity and larger dynamic range transcripts; however, this platform is limited to a subset of anno- over microarray, effectively detecting rare transcripts with greater tated genes (Arbeitman et al. 2002). Additionally, tilling arrays sensitivity and abundant transcripts with superior discrimination have been used to demonstrate that >29% of unnanotated introns (Marioni et al. 2008). Furthermore, RNA-seq generates reads which and intergenic regions are transcribed within the first 24 h of de- straddle exon–exon junctions (junction reads). When junction velopment at specific time points in development (Manak et al. reads are properly aligned to a reference genome the associated 2006). Tilling arrays, however, are limited in their resolution by the splice event can be inferred. Thus, direct discovery of alternate size of probes used. Expressed sequence tags (ESTs) provide the primary evidence for annotation efforts. Although more than 800,000 D. melanogaster ESTs have been sequenced, not all pre- 6 These authors contributed equally to this work. dicted or annotated genes are supported by ESTs (Stapleton et al. 7Corresponding authors. 2002a,b). Furthermore, these EST efforts have focused primarily E-mail [email protected]. on embryo and adult libraries, specifically adult head and gonad E-mail [email protected]. Article published online before print. Article, supplemental material, and pub- tissues, leaving intermediate stages of development underrep- lication date are at http://www.genome.org/cgi/doi/10.1101/gr.107854.110. resented. Continuation of Sanger-based EST sequencing efforts is

21:315–324 Ó 2011 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/11; www.genome.org Genome Research 315 www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Daines et al. not an ideal solution for improvement of gene models because of set of curated models. Other annotations based on experimental its higher cost in comparison to next-generation sequencing and computational analysis are also available; a prominent ex- platforms. More recently, high throughput sequencing has been ample is the gene models (MB7) produced by the modENCODE recognized as an ideal solution for transcriptome profiling with consortium (http://www.modENCODE.org/). For robustness we base pair resolution. In D. melanogaster the 59-end SAGE method used both of these annotations as references throughout our has been combined with Illumina sequencing to generate a de- analysis of RNA-seq data. tailed profile of transcription start sites throughout development Read alignment with BLAT enabled the direct detection of (Ahsan et al. 2009). RNA-seq will enable interrogation of all tran- junction reads. These reads are essential to the identification of scripts, including potentially unannotated transcripts without the exon–exon junctions and alternate splicing. To maximize sensi- need for further EST characterization (Torres et al. 2008). We tivity for detecting junction reads we used two independent therefore propose that characterization of the D. melanogaster computational approaches (see Supplemental materials). Approx- transcriptome by RNA-seq technology will greatly facilitate on- imately 6% of all uniquely aligned reads were junction reads, of going annotation efforts. which, 94% are consistent with annotated FlyBase events, sug- gesting high concordance between experimental evidence and existing annotations. In total, 90.6% of annotated junctions Results (46,820) were observed with 93.6% consistency between the two approaches (see Supplemental Table 1; Supplemental Fig. 1). Ad- Extensive transcriptome profiling of D. melanogaster by RNA-seq ditionally, junction reads supported ;120,000 candidate novel In order to gain a broad sampling of the D. melanogaster tran- junctions. Analytical methods were used to reduce these to 7776 scriptome, RNA-seq experiments were performed at all stages of well-supported candidates (see Supplemental materials), of which the D. melanogaster life cycle. Poly(A)+ transcripts from 10 distinct 2012 (25.9%) are reported in the modENCODE MB7 gene models. stages during the live cycle of D. melanogaster were isolated to To determine the technical robustness of RNA-seq we con- generated cDNA libraries which were sequenced on the Illumina sidered the positive predictive value (PPV) and sensitivity of the GA II instrument. As shown in Table 1, 12 independent cDNA li- platform. The PPV of RNA-seq for expressed sequences is defined as braries were generated, including embryonic, larval, pupal, and the percent of uniquely aligned reads which overlap with anno- adult. Some libraries were staged as specific windows: 2–4-h em- tated transcripts. Across all samples, 97.6% of uniquely aligned bryo, 14–16-h embryo, third instar larva, 3-d pupa, and 17-d adult. reads overlap with annotated sequences, suggesting that RNA-seq is Additional libraries were derived from broadly staged mixed - highly specific to expressed sequences (Table 1). The sensitivity of ples: embryo, larvae, and pupa. Three-day-old male and female RNA-seq is calculated as the percentage of annotated bases covered adults were sequenced separately for discovery of sex-specific var- by sequenced reads for the entire transcriptome and for each gene iation. Finally, one library of mixed-age pupal RNA was sequenced individually. With all pooled reads considered, 93.4% and 91.9% in replicate as a validation of the technology. These libraries were of FlyBase and modENCODE annotated bases were observed re- selected to represent a broad diversity of RNA molecules (Arbeitman spectively, suggesting a high degree of sensitivity (Fig. 1A). Random et al. 2002). A total of 272 million paired-end reads of 64–100 bp in sampling of pooled reads from all samples was used to simulate length were obtained. transcriptome coverage at various read depths. The curve of these Sequenced reads were aligned to the D. melanogaster genome plots is asymptotic, suggesting that deeper sequencing would offer (UCSC dm3/FlyBase r5.23) using BLAT (Kent 2002). In total, 180.5 low returns at the base-pair level. The majority of annotated genes million reads were aligned to the genome with 142.2 million are well covered by RNA-seq reads. For example, >70% of annotated mapped to unique locations, representing >3563 sequence cov- genes are covered for >95% of their length (Fig. 1B). We conclude erage of the 30Mb D. melanogaster transcriptome. The current that RNA-seq is a robust technology for transcriptome profiling FlyBase annotation (v5.23) was used to calculate digital counts of including the characterization of gene models. gene expression for each of the annotated 14,797 FlyBase genes. We observed, however, that not all annotated genes are well The FlyBase annotation is often considered the ‘‘gold standard’’ of represented by RNA-seq reads. In total, 1133 genes (7.7%) are D. melanogaster gene models because it is the most comprehensive covered at <25% by unique reads. To determine the limitations of

Table 1. RNA-collection and sequencing yields 6903 coverage of the D. melanogaster transcriptome

Read length Reads produced Sequenced bases Aligned reads Reads aligned Unique reads overlapping (bp) (millions) (Mb) (millions) uniquely (millions) exons (%)

Early embryo (E2–4hr) 62 16.0 989.9 10.5 8.9 98.7 Late embryo (E14–16hr) 75 22.9 1715.7 4.8 3.3 97.7 Embryo (E2–16hr) 75 20.3 1520.6 13.0 10.0 98.0 Embryo (E2–16hr100) 100 7.3 730.8 6.9 6.0 98.7 3rd instar larva (L3i) 75 27.3 2044.8 21.1 12.5 96.4 3rd instar larva (L3i100) 100 14.6 1455.8 13.1 7.7 99.0 Larva (Larva) 75 33.2 2489.7 26.3 20.7 98.6 3-d pupa (P3d) 75 25.0 1872.5 18.3 14.4 95.9 Pupa (Pupa) 75 30.5 2286.3 25.8 22.5 96.5 Male adult (MA3d) 75 19.8 1481.9 6.4 5.3 96.6 Female adult (FA3d) 75 24.6 1847.9 11.0 9.6 98.8 17-d adult (A17d) 75 30.7 2301.4 23.4 21.2 97.9 Total 272.0 20,737.2 180.5 142.2 97.6

316 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

The D. melanogaster transcriptome by RNA-seq

Figure 1. Deep RNA-seq covers 93% of the annotated D. melanogaster transcriptome. (A) 93.4% and 91.9% of the FlyBase and modENCODE an- notated transcriptome is observed by pooled RNA-seq reads. Simulated read densities indicate that the current depth of sequencing approaches satu- ration. (B) Annotated transcripts are well covered by RNA-seq reads. More than 70% of annotated genes are covered for >95% of their length. (C ) Most unobserved genes can be categorized into classes which are not expected to be observed due to size, genomic duplication, or lack of polyadenylation. (D) A novel exon is identified in wupA, which is supported by two novel junctions with 487 and 428 reads, respectively. (E) RT-PCR designed to validate the novel junction successfully amplifies an appropriately sized product from cDNA but not from genomic DNA.

RNA-seq it is important to understand the nature of these under- cant modification was the addition of exon sequences, which af- represented genes. The lack of gene coverage may be due to two fected 25% of gene models (3692). For example, a novel exon primary causes: either reads were generated but could not be identified in wupA is depicted (Fig. 1D). RT-PCR experiments were mapped uniquely to the reference sequence, or no reads originated designed to confirm the expression of this novel exon. Subsequent from the gene in question. The majority of underrepresented genes isolation and sequencing of an appropriately sized PCR product fall into classes which are not expected to be observed due to se- confirmed the prediction. Additionally, the untranslated regions quence features and limitations of the chosen methodology (Fig. (UTRs) of >8% of genes (1274) were extended by RNA-seq coverage 1C). For example, tRNAs, histones, 5SrRNAs, and Ste/Su(Ste) are in genes with previously unannotated UTRs. The ability of RNA- highly similar and occur in gene clusters. Reads from these genes seq to accurately define 59 and 39 UTRs is limited due to biases in are not likely to map uniquely to the reference genome. Further- sequencing; therefore, we restricted these analysis to genes whose more, unpolyadenylated transcripts such as miRNA and snoRNA UTRs were previously unannotated. are not likely to be observed because of the poly(A) selection used in the RNA purification strategy (see Methods). Finally, 128 gus- RNA-seq detects 319 novel transcripts tatory, odorant, and ionotropic receptors (small molecules whose expression is tightly restricted to specific neuronal cells) were not In addition to modifying current gene models, RNA-seq enables observed in our data potentially due to their restricted expression the identification of novel transcripts. Evidence from junction pattern although size selection may also play a role (see Methods). reads and read coverage was combined to identify 319 novel We conclude that underrepresented genes are predominantly due transcripts mapping completely outside of FlyBase gene annota- to explainable limitations of the library preparation and read tions (see Methods). These novel genes are distributed across mapping methodologies employed. chromosomes roughly as expected by chromosome size and be- tween strands in nearly equal proportions (Fig. 2A). Several fea- tures stand out for these novel genes compared to annotated genes. RNA-seq data provides evidence for modifying 30% First, these novel genes are on average smaller than annotated of gene annotations genes. The FlyBase median gene size is 1560 bp, while the median The depth of our RNA-seq data enables us to evaluate not only the size of these novel transcripts is 315 bp (Fig. 2B). Second, the ma- accuracy of current gene model annotations but also their com- jority of novel transcripts contain only two exons (Fig. 2C). Third, pleteness. While RNA-seq and current gene models are largely these novel genes are often expressed in stage and/or sex-specific consistent, a significant portion of current models are incomplete. patterns (Fig. 2D). For example, many novel transcripts are Evidence from junction reads and coverage was used to extend expressed in male but not female adults, although they are ob- annotated gene models (see Methods). Even under stringent cri- served in mixed larva and pupa samples (Fig. 2D, ‘‘Male’’). Other teria, 30% of genes (4418) underwent some level of modification clusters express most abundantly in a specific stage (Fig. 2D, during this process (see Supplemental Table 2). The most signifi- ‘‘Larva’’).

Genome Research 317 www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Daines et al.

Figure 2. RNA-seq detects 319 novel transcripts. (A) Novel transcripts detected by RNA-seq are identified on all chromosomes distributed as expected by chromosome size. (B) Novel transcripts (median size 315 bp) are much smaller than annotated FlyBase transcripts (median size 1560 bp). (C )The majority of novel transcripts identified in this study have two exons. (D) Clustering identifies many novel transcripts expressed at specific time points or in sex-specific patterns. For example, many novel transcripts are expressed in male but not female adults, although they are observed in mixed larva and pupa samples (Male). Other clusters express most abundantly in a specific stage (Larva). (E ) A novel transcript is depicted, and the associated junction is supported by 50 sequenced reads. Experimental validation by RT-PCR was obtained in pupae cDNA (see Supplemental Fig. 2).

For validation of these results, novel transcripts derived from Supplemental Fig. 3). The largest differences between the two RNA-seq were compared to two recently described data sets. One platforms occur for genes which were detected at very low levels on study using comparative genomic techniques published a set of the array but were identified in a range of expression by RNA-seq. 146 novel introns confirmed by ESTs or RT-PCR (Hiller et al. 2009). RNA-seq robustness was also observed in technical replicates. Total RNA-seq identified 55% of these introns (80 of 146). Additionally, RNA extracted from pupa-staged flies was split to derive two sepa- RNA-seq identifies 61% predicted coding introns (57 of 94) and rate sequencing libraries. These libraries were sequenced and ana- 18% of predicted introns of messenger-like-noncoding-RNAs lyzed in parallel, and the correlation coefficient of these replicates (mlncRNAs) from the same study (23 of 129). These results are was calculated as R = 0.99 (Fig. 3D). Finally, we found that by ran- consistent with the authors’ own experimental validation rate at dom sub-sampling of RNA-seq reads at various depths, gene ex- ;60%, which was lowest for the mlncRNAs due primarily to low pression levels are highly robust and R = 0.99 correlation could be and stage-specific expression. Thus we conclude that our current obtained with as few as 100,000 reads (see Supplemental Fig. 3). depth and breadth of sequencing is adequate for identification of To determine which genes are expressed in each develop- some novel transcripts. Second, significant overlap has been ob- mental stage, the read counts and normalized expression levels served between our data set and the novel genes identified by were calculated for each annotated feature (see Methods). Libraries modENCODE. Of the novel transcripts identified, 45% (144 out of were grouped according to the major stages as appropriate to con- 319) are supported by the modENCODE annotation. Finally, by sider the total number of genes observed in each. In total, 84.4% of RT-PCR the junctions supporting nine of these novel transcripts FlyBase genes (12,490) are expressed twofold above background, were validated. An example novel transcript is illustrated in Figure calculated as the average coverage of intergenic regions, in at least 2E; this junction, which was supported by only 50 sequenced one sample. Furthermore, 48.8% of FlyBase genes (7214) were reads, was validated by RT-PCR in the pupae cDNA. The experi- expressed twofold above background in all stages, suggesting that mental evidence for these validations is included in the Supple- this group of genes is broadly expressed across the D. melanogaster mental material (see Supplemental Fig. 2). lifecycle (Fig. 3A). On average, 67.5% of genes (9995) are observed within each stage. Since the majority of D. melanogaster genes are developmentally regulated (Arbeitman et al. 2002), we sought to RNA-seq provides an extensive digital expression profile identify genes which are differentially regulated during devel- of D. melanogaster development opment and those which are invariant in their expression. De- RNA-seq provides ‘‘digital’’ reading of gene expression levels velopmentally regulated genes were identified by considering the (Mortazavi et al. 2008). To assess whether RNA-seq is consistent difference between maximum and minimum expression levels with previous technology in measuring gene expression we ana- across all samples. The majority of genes expressed above back- lyzed RNA expression in parallel on the microarray platform. A ground (85.7%; 10,702 of 12,490) exhibit fourfold or greater dif- single RNA library generated previously and analyzed on the Affy- ferences in expression level between their highest and lowest stages metrix microarray was reanalyzed by RNA-seq. Expression levels of expression, suggesting developmental regulation (Fig. 3B). Ex- were evaluated and normalized appropriately for each platform and pressed genes which exhibit consistent expression were identified the correlation coefficient calculated as R = 0.76 (Spearman), in- by calculating the coefficient of variation (Fig. 3C). dicating a strong positive correlation between the platforms (see (GO) analysis was performed on the 3% of genes exhibiting the

318 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

The D. melanogaster transcriptome by RNA-seq

Figure 3. Sex-biased expression occurs in one-third of genes. (A) On average, 9995 genes are observed in each stage, 7214 genes are shared between all stages, and 12,490 genes are expressed in one or more stages. (B) 85.7% of genes exhibit greater than fourfold difference in expression between maximum and minimum expression time points. (C ) The distribution of coefficient of variation calculated for each gene identifies genes whose expression is highly consistent across development. (D) Technical replicates of pupa RNA-seq exhibit nearly perfect correlation (R = 0.99). (E) Females and early embryos exhibit a high degree of correlation (R = 0.80). (F) 5251 exhibit fourfold difference in expression level between males and females: 1088 genes up- regulated in females are consistent with their expression in early embryos (red), suggesting their importance in embryonic development, 3486 genes are up-regulated in males (blue). (G) Genes without a known ortholog are abundantly expressed in males, many of which have known male-specific functions including seminal fluid proteins and male-specific transcripts. (H) Many genes on unassembled heterochromatin (chrU) exhibit male-specific expression. (I) Genomic PCR results suggest CG40968, CG40583, CG40992,andCG41561 are linked to chromosome Y. most consistent expression (Carmona-Saez et al. 2007; Nogales- and female adults (Fig. 3E,F). Of these, 1088 genes are up-regulated Cadenas et al. 2009). Many of these genes are involved with basic in female over male but consistent between female and embryonic cellular processes (246 genes, P = 1.47 3 10À16) and functions, such expression (Fig. 3E,F). GO analysis of these genes identified func- as translation (14 genes, P = 1.39 3 10À6). tions related to embryonic development, including 45 genes in- It has been reported that a significant number of genes are volved in oogenesis (P = 5.99 3 10À13), 25 genes involved with DNA differentially expressed between males and females (Arbeitman replication (P = 4.5 3 10À16), and 17 genes involved with rRNA et al. 2002). We observed 5251 genes (35.5%) that exhibited processing (P = 1.08 3 10À9). In total, 3486 genes are up-regulated in fourfold difference in their expression levels between 3-d-old male males over females (Fig. 3F). GO analysis identified many expected

Genome Research 319 www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Daines et al. functional groups, including 27 genes involved with sensory Table 2. Alternate splicing is observed in 36% of D. melanogaster perception of chemical stimulus (P = 2.297 3 10À6), eight genes genes 3 À6 involved with post-mating behavior (P = 4.64 10 ), 38 genes in- Event Annotated Observed Total Increase volved with microtubule-based movement (P = 8.06 3 10À6), and 32 À5 genes involved with oxidation reduction (P = 1.16 3 10 ). Skipped exon (SE) 3763 90% 6464 72% Another class of genes up-regulated in males consists of genes Retained intron (RI) 1605 80% 2222 38% without an identified ortholog among the 12 Drosophila Genomes Alternate donor splice 2963 78% 4714 59% Project (Fig. 3G; Clark et al. 2007). Of 1043 genes without an an- site (ADSS) Alternate acceptor 1978 73% 3697 87% notated Drosophila ortholog, nearly 200 exhibited male-specific splice site (AASS) expression patterns; in contrast, less than 20 genes exhibited a fe- Alternate first exon (AFE) 1082 70% 1270 17% male-specific expression pattern. It is conceivable that these non- Alternate last exon (ALE) 27 70% 83 207% conserved genes are more rapidly evolving and that their male- Alternate splicing genes 3353 86% 4618 38% specific expression can be related to the forces of male-driven Many species of alternate splicing events were identified and character- evolution. Many known male-specific transcripts and sperm fac- ized in this study, including skipped exons (SE), retained intron (RI), al- tors do not have an identifiable ortholog among Drosophila species, ternate donor splice site (ADSS), alternate acceptor splice site (AASS), supporting this hypothesis; however, several genes do not have alternate first exon (AFE), and alternate last exon (ALE). Of all annotated described molecular functions and it is reasonable to postulate that alternate splicing events 86% were detected with RNA-seq data. Addi- tionally, a 38% increase was observed in the number of genes detected some of these may also be linked to male reproduction (see Sup- with alternate splice isoforms. plemental Table 3).

common annotated event is the SE, of which we observed 90% of Identification of novel D. melanogaster Y–linked genes events (3390). The SE was also the most common novel event We observed that many genes with male-specific patterns of ex- observed, of which we observed 3074 new events. The other classes pression are located on chrU (Fig. 3H). The D. melanogaster chrU were also broadly observed (Table 2). Of 3353 genes with anno- consists of unassembled contigs which are largely heterochromatic tated splicing events profiled in this study (23% of all genes) we and have not been assigned to physical genomic locations due to detected annotated alternate splicing events in 86% (2899 genes). difficulty in assembly. Chromosome Y is wholly heterochromatic; Additionally, we observed many novel alternate splicing events. therefore, much of its sequence is currently unassembled and as- The largest increase over annotation is the ALE event of which we signed as chromosome U. Only eight genes are annotated on chro- detected 64 novel events, a twofold increase. mosome Y according to the current FlyBase annotation. We hy- Alternate splicing is an essential feature of sexual determi- pothesized that genes with male-specific expression annotated on nation and differentiation in D. melanogaster, and it has been esti- chrU might potentially be linked to chromosome Y. To test for mated that as many as 22% of multi-transcript genes exhibit sex- Y-linkage, PCR primers were designed for eight genes exhibiting biased splicing patterns (Telonis-Scott et al. 2009). In our data set male-specific expression. Amplification of genomic DNA in males 3-d-old males and females were sequenced separately to enable but not in females is indicative of Y-linkage. PCR products were assessment of sex-specific alternate splicing. Of alternately spliced obtained only in males for four of the eight tested genes: CG41561, genes, 2308 were expressed in both males and females at a sufficient CG40968, CG40583, CG40992 (Fig. 3I). Evidence linking three of level to enable examination of alternate splicing. Of genes with these genes to the Y chromosome has been reported previously annotated alternate splice isoforms, differential splicing was ob- (Carvalho et al. 2000, 2001, 2003; Vibranovski et al. 2008; Krsticevic served in 23.5%. In a recent report, interrogation of 417 genes with et al. 2009). Assignment of CG41561 to chromosome Y is believed to exon-microarrays led to identification of 33 genes shown to have be a novel discovery as no FlyBase or GenBank record links this gene sex-biased splicing. Among these candidate genes, we confirmed 14 to chromosome Y. (Telonis-Scott et al. 2009), most unconfirmed genes were expressed below our threshold in one or the other sex (see Supplemental Table Alternate splicing is observed in 31% of D. melanogaster genes 4). Therefore, our analysis is quite sensitive, identified hundreds of A key feature of eukaryotic gene expression is alternate splicing. In specific transcripts, and confirms that widespread sex-specific reg- D. melanogaster humans, as many as 95% of transcripts are believed to be alter- ulation of alternate splicing occurs in . nately spliced; in contrast, in the current FlyBase annotation only 26% of genes have multiple splice isoforms. Our analysis estimates Tandem alternate splicing in the D. melanogaster transcriptome that alternate splicing occurs in at least 4618 genes (31%) of the A striking observation regarding alternate splicing is that many of D. melanogaster genome, a 38% increase over FlyBase estimates. The the 120,000 candidate novel junctions occur near annotated junc- detection of junction reads in RNA-seq data enables the discovery tions, exhibiting a 3-bp periodicity in their distance from annotated of the alternate splicing events summarized in Table 2. We profiled junctions (Fig. 4A). In fact, when these are further partitioned into the extent of defined alternate splicing events: skipped exons (SE), coding and noncoding junctions, this periodicity is strongly ob- retained introns (RI), alternate donor splice site (ADSS), alternate served only in coding regions (Fig. 4B). These observations led us to acceptor splice site (AASS), and alternate last exon (ALE) (see Sup- consider mechanisms which might result in the observed spacing. plemental materials). Alternate first exons (AFE), a transcrip- One explanation for this observation is the existence of tandem tionally regulated event detectable by RNA-seq, were also profiled. alternate splice sites (TASSs) in the D. melanogaster genome. Subtly Each of these splicing events has the capacity to generate transcript different alternate splice isoforms can be observed when multiple diversity by modifying the coding and regulatory sequences of a potential splicing sites are located in close physical proximity. We transcript (Wang et al. 2008). profiled two well-described classes of TASS: the GYNGYN donor We observed 80% of annotated alternate splicing events site and the NAGNAG alternate acceptor site (Hiller et al. 2004, (9181) across all categories of splicing events examined. The most 2006, 2007). Though many thousand TASS sites exist within the

320 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

The D. melanogaster transcriptome by RNA-seq

transcriptome of D. melanogaster only a small fraction are known to splice alternately. Furthermore, splicing at NAGNAGs is far more common than at GYNGYNs, with 135 and seven annotated sites, respectively. From the 7776 novel junctions identified, tandem al- ternate splicing was observed at 90% of annotated NAGNAG sites and 29% of GYNGYN sites. In addition to annotated sites, we ob- serve alternate splicing at 242 novel NAGNAG splicing sites and 22 novel GYNGYN sites (see Supplemental Table 5). Consistent with previous reports, we observed that the most important characteristic of determining the dominant splice site at

a NAGNAG are the bases N1 and N2 (Fig. 4E; Sinha et al. 2009). For example, CAG is a strong acceptor site while GAG is a weak ac- ceptor site. When weak sites or strong sites are paired, alternate splicing frequently occurs. Conversely, when a weak and a strong site are paired, constitutive splicing is the norm. The most abun- dant alternate splice pair CAGCAG demonstrated alternate splic-

ing at 53.2% of observed sites, suggesting that while N1N2 are highly informative for splice site strength, other factors (cis or trans) likely contribute to determine alternate splicing. These ob- servations suggest that TASSs are a common mechanism for alter- nate splicing in D. melanogaster.

Discussion

We have generated the first map of Drosophila transcription based on paired-end RNA-seq. The extensive nature of these data is il- lustrated by the broad range of 10 distinct developmental time points sampled. Additionally, from the 142.2 million uniquely mapped sequencing reads, 95% of annotated genes and 90% of annotated junctions were observed in these data. RNA-seq data provides direct experimental evidence for vali- dating and modifying current gene models and identifying novel genes. In our data sets, a total of 97.5% of uniquely mapped reads overlapped with annotated models suggesting a high level of con- cordance between experimental evidence and current annotations. Even using stringent criteria, ;30% of FlyBase genes underwent some level of modification when the coverage and junction reads data were combined with annotated gene models. These changes include extensive modification to coding and noncoding exons in 25% of gene models and the extension of previously unannotated UTRs for 8% of gene models. Furthermore, we identified 319 novel transcripts across the genome. It remains to be determined whether these transcripts have phenotypic consequences or represent noisy transcription. It seems likely that narrower developmental time points and tissue-specific samples may be necessary to identify the full range of transcripts in the D. melanogaster transcriptome. This is based on our observation that novel transcripts are usually small, Figure 4. RNA-seq detects abundant subtle alternate splice isoforms. (A) Distribution of novel splicing junctions near annotated exon–exon of low abundance, and exhibit stage and sex-specific expression junctions approximates the frequency of read indel events. The distance patterns. From these observations we conclude that, while existing between novel splice sites and annotated splice sites is plotted separately models are largely complete, RNA-seq data sets are valuable re- for canonical and noncanonical novel junctions. (B) Canonical novel sources for the continued refinement of gene models and identi- junctions in coding regions conserve frame. The portion of novel junctions which conserve frame is calculated for all candidate junctions across two fication of novel transcripts. partions: canonical/noncanonical and coding/noncoding. Only canonical Our data provide interesting insights into the extent of alter- novel junctions within coding regions conserve frame more than expected nate splicing in D. melanogaster and highlight mechanisms which À16 by chance (Binomial test n = 34,060, P-value = 2.2 3 10 ). (C ) Con- contribute to the subtle variety of splice isoforms. We identified stitutively splicing NAGNAGs can be exonic (E) and intronic (I), named for where the second NAG is incorporated. Alternate splicing NANAGs result in 7776 potential novel splice events with strong experimental sup- both E and I states. (D) Constitutively splicing GYNGYNs can be of two port. Many of these events are well supported by RNA-seq reads but forms: exonic (e) or intronic (i), named for where the first GYN is in- have not been validated by additional experimental methodolo- corporated. Alternate splicing GYNGYNs result in both e and i states. Dia- gies. On a gene-by-gene basis, interested researchers may want to gram adapted from Hiller et al. (2006). (E ) For each possible NAGNAG splice validate alternate splicing events which occur in their particular site N1 (A, T, G, or C) and N2 (A, T, G, or C), the genomic count (circular diameter) and the proportion of exonic (E), intronic (I), and alternative (EI) gene of interest. We estimate that alternate splicing occurs in 5014 splicing (circle color) was calculated. genes (36%) of the D. melanogaster genome, a 61% increase over

Genome Research 321 www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Daines et al.

FlyBase estimates. In comparison to mammals, where alternate cDNA Synthesis kit (Invitrogen) and random hexamer primers splicing is estimated to occur in 95% of genes, this number (Invitrogen, 3 mg/mL). On a 2% agrose gel, 200–400-bp cDNAs were is relatively low. Interestingly, however, we observed extensive selected and used as the DNA template for the Illumina library amounts of subtle alternate splicing, including 264 novel TASS construction. The quality and quantity of the resulting double- sites identified. stranded cDNAs was assessed using the Nanodrop 7500 spectro- We observed that sex-specific differences are extensive in photometer (Nanodrop). D. melanogaster at the gene regulatory level and at the level of al- DNA sequencing libraries were generated using the size- ternate splicing. Of genes with annotated alternate splice isoforms, selected cDNAs according to the manufacturer’s protocol. Cluster generation and sequencing was performed on the Illumina differential splicing was observed in 23.5%. This suggests that ex- cluster station and Illumina GA II following manufacturer’s tensive sex-specific regulation occurs at the level of alternate instructions. splicing. Furthermore, while the majority of genes (85.7%) appear to be developmentally regulated, there is a significant degree of sex-specific regulation in gene expression; more than one-third of Read alignment genes are differentially expressed between males and females. Several alignment programs written specifically for mapping next- From this we conclude that both gene regulation and alternate generation sequencing data were considered for this study, in- splicing contribute significantly to the differences between males cluding SOAP, SOAP2, Bowtie/Tophat, and BWA. BLAT was chosen and females in D. melanogaster. for two important reasons: First, it outperformed the other align- Several directions remain to further utilize these data. For ex- ment algorithms in the number of reads aligned and in our as- ample, it is reasonable that evidence for RNA editing can be extracted sessment of alignment quality. Short-read alignment programs, from our reads (Stapleton et al. 2006). Additionally, recent studies while efficient, offer their substantial increases in speed at the have demonstrated that de novo assembly of transcripts from RNA- expense of sensitivity and accuracy. Second, BLAT natively sup- seq data is a potential avenue for gene annotation (Yassour et al. ports alignments of intron-sized gaps and the direct discovery of 2009). Finally, for simplicity we have ignored all novel noncanonical junction reads. Small indels of two to three base pairs were per- splicing events detected in our data despite some events having very mitted as these might correspond to polymorphism or sequencing high support, and examination of these may shed additional light error; larger indels were filtered out. Alignments with gaps >39 bp on the variation introduced by alternate splicing in the D. melanogaster were additionally processed to determine whether they corre- genome. sponded to introns. While BLAT does not make use of paired-end information, custom scripts were developed to disambiguate read pairs with multiple mapping locations if a unique concordant Methods alignment could be identified. Read pairs mapping to separate chromosomes were discarded, as these are likely artifacts of library Stocks and staging preparation or alignment errors (McManus et al. 2010). CS flies were reared at room temperature (23°C–24°C) for collec- tion. Flies were staged to generate 12 separate collections. For early embryo (E2–4hr), adults were set for 2 h of egg laying on grape juice Modifying gene models plates. Embryos were then collected and allowed to age for an ad- A reference-guided assembly approach which joined overlapping ditional 2 h, resulting in embryos from 2 to 4 h in age. Late embryos reads to form blocks of coverage was used similar to approaches (E14–16hr) were collected as above, but allowed to age for 14 h. The previously described (Denoeud et al. 2008). Novel junctions were broadly staged embryos (E2–16hr and E2–16hr100) were set for 14 h incorporated into annotated gene models if they shared a common of egg laying on grape juice plates, followed by 2 h of aging resulting splice site with the model. Novel junctions were also incorporated in embryos from 2 to 16 h in age. For third instar larva (L3i and if their associated block of read coverage overlapped with the an- L3i100), ;30 wandering larvae were collected. For pupa collections, notated exons of that locus and they were derived from the ap- third instar wandering larvae were collected into new vials and propriate strand. Additionally, 59 and 39 UTRs were extended based approximately 30 pupae were collected at 24, 48, 72, 96, and 108 h. on composite coverage only if the UTR was previously annotated Mixed pupa sample (P) consisted of equimolar amounts of total as 10 bp or smaller. RNA from each of these collections, whereas 3-d pupa (P3d) con- sisted of only the 72-h collection. Three-day-old adults (MA3d and FA3d) were collected when emerging and separated as males and Identifying novel transcripts virgin females into new vials, followed by 48 h of aging. Seventeen- A reference-guided assembly approach which joined overlapping day-old adults were collected separately, allowed to age for 17 d, and reads to form blocks of coverage was used. Reads mapping outside total RNA from each was combined in equimolar amounts. of annotated FlyBase genes which could not be connected to gene models were considered for the detection of novel transcripts. RNA isolation Novel transcripts were required to contain at least one canonical Whole-body samples were homogenized in 1 mL of TRIzol (Invitro- intron and be covered by 20 or more reads. gen) per 50–100 mg of tissue using a glass-Teflon homogenizer. RNA was purified according to the suggested manufacturer’s protocol Gene expression counts (Invitrogen). The number of reads uniquely aligning to transcribed regions of each gene was calculated for all genes in the annotated tran- Library preparation scriptome. The gene read count was calculated as the number of For each library, mRNA was purified from 20 mg of total RNA using unique reads which aligned to the genome within the exons of an mRNA purification kit (Invitrogen). Double-stranded cDNAs each gene. The expression level of each gene was calculated in reads were made from mRNA using the Superscript Double-Stranded per kilobase per million reads (RPKM) according to the formula:

322 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

The D. melanogaster transcriptome by RNA-seq

Cloonan N, Grimmond SM. 2008. Transcriptome content and dynamics at 9 10 3 Ci single-nucleotide resolution. Genome Biol 9: 234. doi: 10.1186/gb-2008- Ri = ; 9-9-234. N 3 Li Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, where C is the read count, N is the sum of aligned reads in the Taylor DF, Steptoe AL, Wani S, Bethel G, et al. 2008. Stem cell experiment, and L is the length of the transcript (Mortazavi et al. transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5: 613–619. 2008). Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, et al. 2008. Annotating genomes with massive-scale RNA sequencing. Genome Biol 9: R175. doi: 10.1186/gb- GO analysis 2008-9-12-r175. Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong WK, GO analysis was performed using the GeneCoDis2.0 web server Mockler TC. 2009. Genome-wide mapping of alternative splicing in (Carmona-Saez et al. 2007; Nogales-Cadenas et al. 2009). The . Genome Res 20: 45–58. D. melanogaster organism is supported on this server, and GO cat- Graveley BR. 2008. Molecular biology: Power sequencing. Nature 453: 1197–1198. egories ‘‘Biological Process,’’ ‘‘Molecular Function,’’ and ‘‘Cellular Hiller M, Huse K, Szafranski K, Jahn N, Hampe J, Schreiber S, Backofen R, Component’’ were selected. Platzer M. 2004. Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet 36: 1255–1257. Sex-specific splicing Hiller M, Huse K, Szafranski K, Rosenstiel P, Schreiber S, Backofen R, Platzer M. 2006. Phylogenetically widespread alternative splicing at unusual We counted the number of reads occurring within each exon of GYNGYN donors. Genome Biol 7: R65. doi: 10.1186/gb-2006-7-7-r65. alternate splicing genes and tested whether the male to female Hiller M, Nikolajewa S, Huse K, Szafranski K, Rosenstiel P, Schuster S, Backofen R, Platzer M. 2007. TassDB: A database of alternative tandem ratio met the expected ratio based on gene expression counts, splice sites. Nucleic Acids Res 35: D188–D192. excluding genes which were not observed in both sexes. Our cri- Hiller M, Findeiss S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, teria required that first an exon expression level exhibit greater Prohaska SJ, Reuter G, et al. 2009. Conserved introns reveal novel than twofold between male and female, and second that the x2 transcripts in Drosophila melanogaster. Genome Res 19: 1289–1300. À4.5 Kent WJ. 2002. BLAT—the BLAST-like alignment tool. Genome Res 12: 656– P-value exceed 1 3 10 . This estimate excludes those genes 664. whose alternate splicing might lead to down-regulation of the Krsticevic FJ, Santos HL, Januario S, Schrago CG, Carvalho AB. 2009. transcripts. Functional copies of the Mst77F gene on the Y chromosome of Drosophila melanogaster. Genetics 184: 295–307. Levin JZ, Berger MF, Adiconis X, Rogov P, Melnikov A, Fennell T, Nusbaum C, Garraway LA, Gnirke A. 2009. Targeted next-generation sequencing Acknowledgments of a cancer transcriptome enhances detection of sequence variants and We thank Graeme Mardon for critical reading of the manuscript. novel fusion transcripts. Genome Biol 10: R115. doi: 10.1186/gb-2009- 10-10-r115. We thank Ming Cao for contributing to formatting of the manu- Manak JR, Dike S, Sementchenko V, Kapranov P, Biemar F, Long J, Cheng J, script and technical support of the analyses. We thank the staff of Bell I, Ghosh S, Piccolboni A, et al. 2006. Biological function of the Human Genome Sequencing Center who performed the se- unannotated transcription during the early development of Drosophila quencing of genomic libraries. B.D. is supported by training grant melanogaster. Nat Genet 38: 1151–1158. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. 2008. RNA-seq: An T32 EYO7102-16. H.W. is supported by postdoctoral fellowship assessment of technical reproducibility and comparison with gene EY19430-01. This work was partially supported by NHGRI/NIH expression arrays. Genome Res 18: 1509–1517. grant 5U54HG003273 (R.G.) and Retinal Research Foundation and McManus CJ, Duff MO, Eipper-Mains J, Graveley BR. 2010. Global analysis the NEI/NIH grant R01EY016853 (R.C.). of trans-splicing in Drosophila. Proc Natl Acad Sci 107: 12975–12979. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628. References Mudge J, Miller NA, Khrebtukova I, Lindquist IE, May GD, Huntley JJ, Luo S, Zhang L, van Velkinburgh JC, Farmer AD, et al. 2008. Genomic Ahsan B, Saito TL, Hashimoto S, Muramatsu K, Tsuda M, Sasaki A, convergence analysis of schizophrenia: mRNA sequencing reveals Matsushima K, Aigaki T, Morishita S. 2009. MachiBase: A Drosophila altered synaptic vesicular transport in post-mortem cerebellum. PLoS melanogaster 59-end mRNA transcription database. Nucleic Acids Res 37: ONE 3: e3625. doi: 10.1371/journal.pone.0003625. D49–D53. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. Arbeitman MN, Furlong EE, Imam F, Johnson E, Null BH, Baker BS, Krasnow 2008. The transcriptional landscape of the yeast genome defined by MA, Scott MP, Davis RW, White KP. 2002. Gene expression during the RNA sequencing. Science 320: 1344–1349. life cycle of Drosophila melanogaster. Science 297: 2270–2275. Nogales-Cadenas R, Carmona-Saez P, Vazquez M, Vicente C, Yang X, Tirado Campbell PJ, Stephens PJ, Pleasance ED, O’Meara S, Li H, Santarius T, F, Carazo JM, Pascual-Montano A. 2009. GeneCodis: Interpreting gene Stebbings LA, Leroy C, Edkins S, Hardy C, et al. 2008. Identification of lists through enrichment analysis and integration of diverse biological somatically acquired rearrangements in cancer using genome-wide information. Nucleic Acids Res 37: W317–322. massively parallel paired-end sequencing. Nat Genet 40: 722–729. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. 2008. Deep surveying of Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A. alternative splicing complexity in the human transcriptome by high- 2007. GENECODIS: A web-based tool for finding significant concurrent throughput sequencing. Nat Genet 40: 1413–1415. annotations in gene lists. Genome Biol 8: R3. doi: 10.1186/gb-2007-8-1-r3. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, Carvalho AB, Lazzaro BP, Clark AG. 2000. Y chromosomal fertility factors He M, Croucher NJ, Pickard DJ, et al. 2009. A strand-specific RNA-Seq kl-2 and kl-3 of Drosophila melanogaster encode dynein heavy chain analysis of the transcriptome of the typhoid bacillus Salmonella typhi. polypeptides. Proc Natl Acad Sci 97: 13239–13244. PLoS Genet 5: e1000569. doi: 10.1371/journal.pgen.1000569. Carvalho AB, Dobo BA, Vibranovski MD, Clark AG. 2001. Identification of Shendure J. 2008. The beginning of the end for microarrays? Nat Methods five new genes on the Y chromosome of Drosophila melanogaster. Proc 5: 585–587. Natl Acad Sci 98: 13225–13230. Sinha R, Nikolajewa S, Szafranski K, Hiller M, Jahn N, Huse K, Platzer M, Carvalho AB, Vibranovski MD, Carlson JW, Celniker SE, Hoskins RA, Rubin Backofen R. 2009. Accurate prediction of NAGNAG alternative splicing. GM, Sutton GG, Adams MD, Myers EW, Clark AG. 2003. Y chromosome Nucleic Acids Res 37: 3569–3579. and other heterochromatic sequences of the Drosophila melanogaster Stapleton M, Carlson J, Brokstein P, Yu C, Champe M, George R, Guarin H, genome: How far can we go? Genetica 117: 227–237. Kronmiller B, Pacleb J, Park S et al. 2002a. A Drosophila full-length cDNA Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, resource. Genome Biol 3: RESEARCH0080. doi: 10.1186/gb-2002-3-12- Kaufman TC, Kellis M, Gelbart W, Iyer VN, et al. 2007. Evolution research0080. of genes and genomes on the Drosophila phylogeny. Nature 450: 203– Stapleton M, Liao G, Brokstein P, Hong L, Carninci P, Shiraki T, Hayashizaki 218. Y, Champe M, Pacleb J, Wan K, et al. 2002b. The Drosophila gene

Genome Research 323 www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

Daines et al.

collection: Identification of putative full-length cDNAs for 70% of Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, D. melanogaster genes. Genome Res 12: 1294–1300. Schroth GP, Burge CB. 2008. Alternative isoform regulation in human Stapleton M, Carlson JW, Celniker SE. 2006. RNA editing in Drosophila tissue transcriptomes. Nature 456: 470–476. melanogaster: New targets and functional consequences. RNA 12: 1922– Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: A revolutionary tool for 1932. transcriptomics. Nat Rev Genet 10: 57–63. Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons BA, Sorek R. 2009. A single- M, Borodina T, Soldatov A, Parkhomchuk D, et al. 2008. A global view of base resolution map of an archaeal transcriptome. Genome Res 20: 133– gene activity and alternative splicing by deep sequencing of the human 141. transcriptome. Science 321: 956–960. Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, Schroth G, Telonis-Scott M, Kopp A, Wayne ML, Nuzhdin SV, McIntyre LM. 2009. Sex- Luo S, Khrebtukova I, Gnirke A, et al. 2009. Ab initio construction of specific splicing in Drosophila: Widespread occurrence, tissue specificity a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc and evolutionary conservation. Genetics 181: 421–434. Natl Acad Sci 106: 3264–3269. Torres TT, Metta M, Ottenwalder B, Schlotterer C. 2008. Gene expression profiling by massively parallel sequencing. Genome Res 18: 172–177. Vibranovski MD, Koerich LB, Carvalho AB. 2008. Two new Y-linked genes in Drosophila melanogaster. Genetics 179: 2325–2327. Received March 15, 2010; accepted in revised form September 30, 2010.

324 Genome Research www.genome.org Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

The Drosophila melanogaster transcriptome by paired-end RNA sequencing

Bryce Daines, Hui Wang, Liguo Wang, et al.

Genome Res. 2011 21: 315-324 originally published online December 22, 2010 Access the most recent version at doi:10.1101/gr.107854.110

Supplemental http://genome.cshlp.org/content/suppl/2010/12/02/gr.107854.110.DC1 Material

Related Content Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans Wei-Jen Chung, Phaedra Agius, Jakub O. Westholm, et al. Genome Res. February , 2011 21: 286-300 Genome-wide analysis of promoter architecture in Drosophila melanogaster Roger A. Hoskins, Jane M. Landolin, James B. Brown, et al. Genome Res. February , 2011 21: 182-192 Developmental control of the DNA replication and transcription programs Jared Nordman, Sharon Li, Thomas Eng, et al. Genome Res. February , 2011 21: 175-181 The transcriptional diversity of 25 Drosophila cell lines Lucy Cherbas, Aarron Willingham, Dayu Zhang, et al. Genome Res. February , 2011 21: 301-314 Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin Nicole C. Riddle, Aki Minoda, Peter V. Kharchenko, et al. Genome Res. February , 2011 21: 147-163 Deep annotation of Drosophila melanogaster microRNAs yields insights into their processing, modification, and emergence Eugene Berezikov, Nicolas Robine, Anastasia Samsonova, et al. Genome Res. February , 2011 21: 203-215 Polycomb preferentially targets stalled promoters of coding and noncoding transcripts Daniel Enderle, Christian Beisel, Michael B. Stadler, et al. Genome Res. February , 2011 21: 216-226 Chromatin signatures of the Drosophila replication program Matthew L. Eaton, Joseph A. Prinz, Heather K. MacAlpine, et al. Genome Res. February , 2011 21: 164-174 Conservation of an RNA regulatory map between Drosophila and mammals Angela N. Brooks, Li Yang, Michael O. Duff, et al. Genome Res. February , 2011 21: 193-202

References This article cites 42 articles, 13 of which can be accessed free at: http://genome.cshlp.org/content/21/2/315.full.html#ref-list-1

Articles cited in: http://genome.cshlp.org/content/21/2/315.full.html#related-urls

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Copyright © 2011 by Cold Spring Harbor Laboratory Press Downloaded from genome.cshlp.org on September 29, 2021 - Published by Cold Spring Harbor Laboratory Press

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Copyright © 2011 by Cold Spring Harbor Laboratory Press