Sandmann et al. Genome Biology 2011, 12:R76 http://genomebiology.com/2011/12/8/R76
RESEARCH Open Access

The head-regeneration transcriptome of the planarian Schmidtea mediterranea

Thomas Sandmann1,2*, Matthias C Vogg3, Suthira Owlarn3, Michael Boutros1,4 and Kerstin Bartscherer3*
Abstract

Background: Planarian flatworms can regenerate their head, including a functional brain, within less than a week. Despite the enormous potential of these animals for medical research and regenerative medicine, the mechanisms of regeneration and the molecules involved remain largely unknown.

Results: To identify genes that are differentially expressed during early stages of planarian head regeneration, we generated a de novo transcriptome assembly from more than 300 million paired-end reads from planarian fragments regenerating the head at 16 different time points. The assembly yielded 26,018 putative transcripts, including very long transcripts spanning multiple genomic supercontigs, and thousands of isoforms. Using short-read data from two platforms, we analyzed dynamic gene regulation during the first three days of head regeneration. We identified at least five different temporal synexpression classes, including genes specifically induced within a few hours after injury. Furthermore, we characterized the role of a conserved Runx transcription factor, smed-runt-like1. RNA interference (RNAi) knockdown and immunofluorescence analysis of the regenerating visual system indicated that smed-runt-like1 encodes a transcriptional regulator of eye morphology and photoreceptor patterning.

Conclusions: Transcriptome sequencing of short reads allowed for the simultaneous de novo assembly and differential expression analysis of transcripts, demonstrating highly dynamic regulation during head regeneration in planarians.
* Correspondence: [email protected]; kerstin.bartscherer@mpi-muenster.mpg.de
1 Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
3 Max Planck Research Group Stem Cells and Regeneration, Max Planck Institute for Molecular Biomedicine, Von-Esmarch-Str. 54, 48149 Münster, Germany
Full list of author information is available at the end of the article

© 2011 Sandmann et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

The limited regenerative capabilities of humans call for therapies that can replace or heal wounded tissues. The treatment of neurodegenerative diseases has been a major focus of regenerative medicine, as these diseases can cause irreversible damage to the central nervous system (CNS). It is crucial, therefore, to understand the molecular mechanisms of regeneration and the intrinsic and extrinsic signals that induce and promote this process.

Planarian flatworms are one of the few animals that can regenerate their CNS. Planarians are free-living Platyhelminthes with a relatively simple CNS consisting of a bilaterally symmetrical brain made from two cephalic ganglia, and two longitudinal ventral nerve cords, which extend along the body axis and send out axonal projections into nearly any micrometer of the body (reviewed in [1]). Despite its relatively simple morphology, the planarian brain is highly complex at the cellular level, and consists of a large number of different neuronal cell types [2-4]. Many genes expressed in the planarian CNS are highly conserved in humans [5].

Planarians are characterized by their large pool of pluripotent adult stem cells that facilitate the regeneration of whole animals from only small pieces of their body (reviewed in [6]). Strikingly, planarians can develop a new head within a week. This process can be classified into several distinct events. First, wounding induces a generic, body-wide proliferation response of stem cells within the first 6 hours. Attracted by as yet unidentified guidance signals possibly released from cells at the site of tissue loss, stem cells accumulate at the wound within 18 hours. This response is regeneration-specific and is not detected in wounded animals that have not experienced any tissue loss [7]. A second, regeneration-specific, localized proliferation response that reaches its peak after 2 days of regeneration generates stem cell progeny that contribute to the growth of the blastema. This progeny starts to differentiate into different cell types between 1 and 2 days after the cut [7]. In decapitated animals, a brain rudiment is detected within 24 hours and then grows continuously, developing into a properly patterned bi-lobed brain. The first clusters of photoreceptor neurons appear between 2 and 3 days, dorsally to the brain. With photoreceptor-to-brain and brain-to-ventral nerve cord connections established, structural and functional recovery is completed between 4 and 7 days of regeneration [8,9].

Their unique regenerative properties in combination with the efficiency of gene knockdown by RNA interference (RNAi) have made planarians an attractive model organism for investigating the molecular processes that underlie regeneration and stem cell biology in vivo (reviewed in [10,11]). One of the most frequently used species in planarian research is Schmidtea mediterranea. These planarians were collected in the Mediterranean area and have been maintained in laboratories worldwide for many years, often as clonal lines originating from a single wild animal. They reproduce sexually or asexually by fission, are 0.1 to 2 cm in size, and have a diploid genome of approximately 850 Mb, arranged into four chromosomes [12].

Based on the S. mediterranea genome sequencing project [13], approximately 30,000 genes have been predicted using the MAKER genome annotation pipeline [12]. However, the repetitiveness and A/T richness of the genome, and the fragmentation of its assembly into approximately 43,000 supercontigs, make genome annotations difficult, resulting in many incomplete, redundant and error-laden predictions.

To overcome these limitations and to discover potential regulators of planarian head regeneration, we constructed an annotated head regeneration transcriptome library by de novo assembly of hundreds of millions of short raw reads generated by next generation sequencing without genomic sequence information. We used this library to map and count expressed sequence reads from different stages of regeneration and identified hundreds of genes showing differential expression at different time points during the first 3 days following decapitation. We show that an early growth response (EGR)-like gene is transcriptionally induced as an early response to injury. In addition, we further characterized the biological function of a putative Runx transcription factor, smed-runt-like1, which controls photoreceptor patterning during the regeneration of the visual system. Our study demonstrates that next generation sequencing is a powerful tool for gene function discovery even in organisms with no or only poorly annotated genomes.

Results

A time course of planarian head regeneration

To study the dynamic changes in gene expression during head regeneration, we collected samples at 16 different time points between 30 minutes and 3 days after head amputation, as well as two control samples frozen immediately after decapitation. To facilitate the detection of genes expressed in or proximal to the blastema, we extracted mRNA specifically from anterior pre-pharyngeal tissue rather than from whole animals (Figure 1a). We prepared seven fragmented cDNA libraries for 2 × 36-bp paired-end sequencing, each including material from two to four pooled samples (Figure 1b). Sequencing on an Illumina Genome Analyzer II yielded more than 336 million raw reads (168 million read pairs), of which 274 million (81.5%) could be mapped to supercontigs of the preliminary S. mediterranea genome assembly using Tophat [14]. While good correspondence between MAKER gene predictions and Illumina read coverage was observed in some cases (Figure 1c), we frequently detected transcription from genomic regions lacking annotation (Figure 1d). To identify differentially expressed loci independent of prior gene annotation, we therefore used our short-read transcriptome sequencing (RNAseq) data to assemble the expressed transcriptome de novo.

Figure 1 Next generation sequencing reveals the planarian head regeneration transcriptome. (a) Schematic overview of the two-step amputation and sample collection approach. (b) Schematic overview of the regeneration time course. Individual samples are indicated as red dots, which were analyzed in five pooled sequencing libraries (black lines, A to E). Two independent control samples were taken immediately after amputation (green dots). (c, d) Examples of Illumina transcriptome sequencing (RNAseq) reads mapped with Tophat to regions on genomic supercontig v31.000002. The short read coverage, calculated from reads from all sequenced samples, is shown in black. Green gene models represent MAKER predictions; blue models exemplify the results of the de novo assembly (Illumina+). Regions shown: (c) v31.000002:438400..475899; (d) v31.000002:54000..133999.

De novo assembly with Velvet and Oases

The assembly of transcripts needs to account for alternative splicing events as well as post-transcriptional sequence modifications, for example, poly-adenylation of RNAs. After filtering out reads with low base-calling quality, we employed a two-step strategy to assemble the remaining 318 million (94.6%) high quality reads: we first generated a preliminary assembly using Velvet [15], incorporating 187 million reads (58.8%), followed by the construction of transcripts by Oases [16]. We obtained 26,018 transcripts, corresponding to 18,780 non-overlapping sequences (Figure 2a; Illumina) with a minimum length of 200 bp.

To assess the quality of this assembly, we first compared it to results obtained with a complementary sequencing technique, Roche 454 sequencing. Recently, two independent studies generated 454 sequence datasets from different stages and tissues of S. mediterranea [17,18]. To generate reference sequences for comparison with the Illumina assembly, we assembled these datasets, separately or combined into a single set of 454 reads, using the isoform-aware assembler Newbler 2.5. As expected, combining reads from both 454 datasets significantly improved both individual assemblies, as judged, for example, by the improved average and maximum lengths of the assembled transcripts (N50 = 1.1 kb, longest sequence = 12.2 kb) (Figure S1a-c in Additional file 1) and an improved orthology hit ratio (Figure S1d in Additional file 1). To evaluate assembly quality, we compared our short-read de novo assembly with the assembly obtained with the combined 454 datasets ('454').

Transcriptome assemblies based on Illumina or 454 data yielded similar numbers of isogroups (non-overlapping sequences, hereafter referred to as genes) and isotigs (isogroups and their putative splice isoforms, hereafter referred to as transcripts) (Figure 2a), as well as comparable mean sequence lengths (454, 946 bp; Illumina, 1,005 bp). Yet, their length distributions differed, with the Illumina assembly being strongly skewed towards longer sequences, reflected in a high weighted median statistic (N50 = 1.5 kb) and greater maximum transcript length (16.7 kb), and the 454 assembly presenting a more symmetrical distribution with a median of 839 bp (Figure 2b), approximately twice the length of the reported raw reads (Figure S1e in Additional file 1).
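The N50 values quoted for the assemblies are length-weighted medians: the N50 is the length L such that sequences of length L or longer contain at least half of all assembled bases. The following minimal Python sketch illustrates why a distribution skewed towards long sequences has an N50 well above its ordinary median. The function name `n50` and the toy lengths are illustrative assumptions, not part of the original analysis or its data.

```python
from statistics import median


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths (bp), chosen only for illustration.
toy_lengths = [200, 250, 300, 400, 800, 1500, 2500]

print("median:", median(toy_lengths))
print("N50:", n50(toy_lengths))
```

In this toy set the two longest sequences hold more than half of all bases, so the N50 is set by them even though most sequences are short.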
Figure 2 (panels a to e)

(a) Assembly workflow and summary statistics for each dataset:

                            Illumina              454              Illumina+
  Raw reads                 2 x 168 Mio (36 bp)   1.3 Mio          2 x 170 Mio (36 bp)
  Trimming & filtering      318 Mio reads         1.2 Mio reads    318 Mio reads
  Assembler                 Velvet & Oases        Newbler 2.5      Velvet & Oases
  Genes                     18780                 18380            17465
  Transcripts               26018                 24962            24669
  Max length                16.7 kb               12.2 kb          17.6 kb
  N50                       1.5 kb                1.1 kb           1.6 kb

(b) Density of gene length (longest transcript, bp).
(c) Density of ortholog hit ratio.
(d, e) Coverage (%).
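The read counts reported in the Results can be cross-checked with simple arithmetic. The short Python sketch below recomputes the quoted percentages from the rounded counts given in the text (in millions of reads). Variable names and the helper `percent` are illustrative assumptions, and because the published counts are rounded, agreement is only expected to one decimal place.

```python
# Rounded read counts as stated in the text.
raw_reads = 336e6       # 168 million read pairs, two reads per pair
mapped_reads = 274e6    # reads mapped to genomic supercontigs with Tophat
filtered_reads = 318e6  # reads remaining after quality filtering
velvet_reads = 187e6    # reads incorporated into the preliminary Velvet assembly


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("mapped / raw:     ", percent(mapped_reads, raw_reads))
print("filtered / raw:   ", percent(filtered_reads, raw_reads))
print("Velvet / filtered:", percent(velvet_reads, filtered_reads))
```

Note the change of denominator in the last line: the 58.8% figure is relative to the filtered reads, not to the raw reads.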