Journal of Biotechnology 136 (2008) 3–10

Contents lists available at ScienceDirect

Journal of Biotechnology

journal homepage: www.elsevier.com/locate/jbiotec

Review The Sequencer FLXTM System—Longer reads, more applications, straight forward bioinformatics and more complete data sets

Marcus Droege a,∗, Brendon Hill b a Roche Applied Science, Global Marketing, 82372 Penzberg, Germany b 454 Life Sciences, Branford, USA article info abstract

Article history: The Genome Sequencer FLX System (GS FLX), powered by 454 , is a next-generation DNA Received 3 January 2008 sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high through- Received in revised form 17 March 2008 put. It has been proven to be the most versatile of all currently available next-generation sequencing Accepted 31 March 2008 technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole and target Keywords: DNA regions, , and RNA analysis. 454 Sequencing is a powerful tool for human genetics 2nd generation sequencing technology research, having recently re-sequenced the genome of an individual human, currently re-sequencing the Human re-sequencing De novo sequencing complete human exome and targeted genomic regions using the NimbleGen sequence capture process, Transcriptome analysis and detected low-frequency somatic mutations linked to cancer. Microbial genome sequencing © 2008 Elsevier B.V. All rights reserved. Metagenomics Plants sequencing Viral variants

Contents

1. Introduction ...... 4 2. The 454 Sequencing technology—an overview...... 4 3. Long reads, high throughput and superior single-read accuracy without filtering against reference sequences ...... 4 3.1. Average read length of 200–300 bases ...... 4 3.2. Single-read accuracy of more than 99.5%, substitution errors are exceedingly rare ...... 4 3.3. High throughput and cost-efficient split up of runs—up to 2304 samples per large run ...... 6 4. Long reads and superior single-read accuracy ensures broadest application versatility ...... 7 4.1. De novo sequencing (e.g. plant BACs, microbes, viruses) ...... 7 4.2. Whole genome re-sequencing (e.g. targeted human genomic regions, structural variations)...... 7 4.3. Amplicon sequencing (e.g. exon re-sequencing, virus variant detection, DNA methylation) ...... 8 4.4. Metagenomics and microbial diversity ...... 8 4.5. EST sequencing (e.g. transcriptome survey of organisms with unknown genomes)...... 8 4.6. Full length/shotgun sequencing of the transcriptome ...... 8 4.7. ncRNAs (all classes of ncRNA, from miRNA to >200 nt transcripts of unknown function) ...... 8 5. Bioinformatics without large investment in enterprise scale infrastructure and people...... 8 5.1. GS Reference Mapper software ...... 8 5.2. GS De Novo Assembler software ...... 8 5.3. The Amplicon Variant Analyzer (AVA) ...... 8 6. Recent breakthroughs achieved using the Genome Sequencer System...... 9 6.1. Re-sequencing of the human genome...... 9 6.2. Re-sequencing of the human exome and targeted gene regions ...... 9

∗ Corresponding author. Tel.: +49 88566 06888. E-mail address: [email protected] (M. Droege).

0168-1656/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jbiotec.2008.03.021 4 M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10

6.3. Analysis of structural variations in the human genome ...... 9 6.4. Studying the molecular basis for eusociality using expression profiling...... 9 6.5. Metagenomics and microbial diversity ...... 9 6.6. Ancient DNA ...... 9 7. Even longer reads, considerably higher throughput and significant reductions in the costs per base in 2008 ...... 9 References ...... 9

1. Introduction of wells that contain a single-amplified library bead (avoiding more than one sstDNA library bead per well). For the past 30 years, the Sanger sequencing process has The loaded PicoTiterPlate device is placed into the Genome been the standard method for sequencing DNA. Despite contin- Sequencer FLXTM Instrument (Fig. 4). The fluidics sub-system flows ued advances such as the introduction of capillary electrophoresis sequencing reagents (containing buffers and ) across systems, and a continuing decrease in costs, this method has the wells of the plate. Nucleotides are flowed sequentially in a been shown to be prohibitively expensive and time consuming fixed order across the PicoTiterPlate device during a sequencing to perform routine high-throughput sequencing such as routine run. During the flow, hundreds of thousands of beads sequencing of the human genomes. Demand for faster, affordable each carrying millions of copies of a unique single-stranded DNA DNA sequencing has led to the development of so-called “next- molecule are sequenced in parallel. If a nucleotide complemen- generation” sequencing technologies, which have the potential to tary to the template strand is flowed into a well, the polymerase sequence the human genome for several thousands of dollars in the extends the existing DNA strand by adding nucleotide(s). Addition coming years. of one (or more) nucleotide(s) results in a reaction that generates In October 2005, 454 Life Science, a member of the Roche a light signal that is recorded by the CCD camera in the instrument group, was the first company to introduce such a next-generation (Fig. 5). In a limited range, the signal strength is proportional to sequencing system into the life science market. The 454 Sequenc- the number of nucleotides incorporated in a single-nucleotide flow ing technology has been enthusiastically adopted by researchers (Fig. 6). worldwide and has already been used to achieve several of the recent breakthrough discoveries in genomic research. This article 3. Long reads, high throughput and superior single-read provides an overview of the 454 Sequencing technology and sum- accuracy without filtering against reference sequences marizes some selected breakthroughs achieved using the system. 3.1. Average read length of 200–300 bases 2. The 454 Sequencing technology—an overview Read length is one of the most important key factors in high- The 454 Sequencing System supports the analysis of samples throughput sequencing. Gaps in the consensus sequence, caused from a wide variety of starting materials including genomic DNA, by repeats, can be covered by longer reads, leading to a more PCR products, BACs, and cDNA. Samples such as genomic DNA comprehensive result. Furthermore, long reads enable haplotyping, and BACs are fractionated into small, 300–800-basepair fragments the identification of low frequency viral quasi species, annotation using a mechanical sheering process (nebulization). For smaller of fragments isolated from the transcriptome, the more accurate samples, such as small non-coding RNA or PCR amplicons, frag- quantification of microbial diversities and so forth. In fact, only the mentation is not required. Using a series of standard molecular read lengths and single-read accuracies provided by conventional biology techniques (Fig. 1), short adaptors (A and B) are added to Sanger and the new 454 Sequencing technology allow high-quality each fragment. The adaptors are used for purification, amplification, de novo sequencing of genomes and transcriptomes. Fig. 7 shows and sequencing steps. Single-stranded fragments (sstDNA) with A the average read length generated using the Genome Sequencer and B adaptors compose the sample library used for subsequent FLX System. workflow steps. During the following emulsion PCR procedure (Fig. 2) the 3.2. Single-read accuracy of more than 99.5%, substitution errors sstDNA library is first mixed with an excess of sepharose beads are exceedingly rare carrying oligonucleotides complementary to, e.g. the B-adaptor sequence of the library fragments. As a result most of these beads Compared to the Genome Sequencer 20 System, significant carry a unique single-stranded DNA library fragment. The bead- enhancements in the single-read accuracy were integrated into bound library is then emulsified with amplification reagents in a the Genome Sequencer FLX System (Fig. 8). Currently, single-read water-in-oil mixture. Each bead is now captured within its own accuracies of >99.5% over the first 200 bases are typically achieved. microreactor where clonal amplification of the single-stranded Most notably, this error rate already includes small insertions and DNA fragments occurs. This results in bead-immobilized, clonally deletions caused by the presence of homopolymers. Substitution amplified DNA fragments (ca. 10 million identical DNA molecules errors are exceedingly rare (down to 10−6). The Genome Sequencer per bead). FLXTM System offers high-throughput sequencing with single-read As preparation for the sequencing reaction, sstDNA library beads accuracies equivalent, or better than traditional Sanger sequenc- are added to the DNA Bead Incubation Mix (containing DNA poly- ing (99.5%). The vast majority of errors that make up the 0.5% merase) and are layered with Beads (containing sulfurylase single-read error rate are overcalling or undercalling the length of and luciferase) onto the 454 PicoTiterPlateTM device (Fig. 3). This homopolymeric stretches. The magnitude of this error mode can be 70 mm × 75 mm plate is an optical device containing 1.6 million put into perspective with an example that comes from researchers wells at a diameter of 44 ␮m per well. Only one library bead (around at the University of Bielefeld (Tauch et al., 2006). A Corynebacterium 30 ␮m) fits into one well. The layer of Enzyme Beads ensures that urealyticum strain containing 451 oligonucleotides in the range of the DNA beads remain positioned in the wells during the sequenc- 6–13 nucleotides was sequenced using the 454 Sequencing tech- ing reaction. The bead-deposition process maximizes the number nology (GS20). The lengths of 6 out of the 461 homopolymers were M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10 5

Fig. 1. Generation of single-stranded fragments (sstDNA) with A and B adaptors compose the sample library used for subsequent workflow steps.

Fig. 2. Clonal amplification of single sstDNA on beads during the so-called emulsion PCR. called incorrectly, demonstrating the high accuracy of the system, sequence limits data comprehensiveness. One simple example is even in homopolymeric stretches. microbial genome sequencing. Bacteria constantly swap genetic Most importantly, 454 Sequencing differentiates itself from material with the environment. Haemophilus influenzae as an exam- all other 2nd generation technologies by using quality filtering ple, has as a species 5000 genes, though only ∼1400 genes are algorithms that are independent from the availability of reference conserved among all strains (Hogg et al., 2007). Hence, due to this sequences. This is of paramount importance in cases when no refer- enormous genome plasticity of microbial genomes only de novo ence sequence is available or when a comparison to the reference sequencing and subsequent comparison to the reference ensures

Fig. 3. sstDNA library beads are added to the DNA Bead Incubation Mix (containing DNA polymerase) and are layered with Enzyme Beads (containing sulfurylase and luciferase) onto the 454 PicoTiterPlateTM device. 6 M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10

Another reason to avoid filtering against the reference is the requirement to detect insertions and deletions in the genome straightforward. The read filtering software of other next- generation technologies excludes reads differing more than 2 or 3 bases from the reference sequence from further analysis. Not con- sidering these reads during data analysis makes the discovery of insertions and deletions challenging, especially in large, complex genomes. To detect those variations using micro-reads, the devel- opment of new bioinformatics tools and availability of enterprise scale computing power (due to specificity issues accompanied with micro-reads) is needed. But even then the detection frequency is generally significantly lower compared to long-read technologies, particularly in complex genomes.

3.3. High throughput and cost-efficient split up of runs—up to Fig. 4. The Genome Sequencer FLXTM Instrument. 2304 samples per large run to get the complete picture about the genome, including infor- A large run on the Genome Sequencer FLX System typically mation on additional or deleted genes, structural variations, etc. yields more than 400,000 reads at an average read length of 250 Even if a reference sequence is available, a complete comparative bases. Since two sequencing runs can be easily started in a standard genomics study requires de novo sequencing first, then the compar- workday, 1 gigabase of sequence information can be generated in 5 ison against the reference sequence. days. For smaller projects, application development, or feasibility

Fig. 5. Graphical representation of the 454 Sequencing technology (advanced technology).

Fig. 6. The 454 Flowgram. The signal strength provided on the y-axis is proportional to the number of nucleotides incorporated in a single-nucleotide flow. M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10 7

Fig. 7. Read length provided by the system. Depending on the organisms, read lengths are in the range of 200–300 high-quality bases. Genomes that are more AT (Campylobacter jejuni)orGCrich(Thermophilus thermophilus) typically yield a longer read length distribution as compared to an AT/GC neutral genome.

testing, a small run yielding 20 megabases is available. A 100-base 4. Long reads and superior single-read accuracy ensures read length kit, generating 400,000 reads, has been optimized for broadest application versatility short-read applications, such as expression profiling. In addition to multiple sequencing kit formats, the GS FLX offers sample-specific With the unique combination of read length and through- multiplex identifiers (MIDs). MIDs are short-nucleotide adaptors put (numbers of reads) per run, the Genome Sequencer FLXTM added to the ends of DNA fragments in a sample which allow for is the most versatile of all 2nd generation sequencing technolo- the efficient pooling of samples that require less total sequence gies, offering applications from whole genome de novo sequencing data, such as BACs or amplicons. Using this approach, between 1 to expression profiling. The following paragraphs summarize why and 2304 samples can be sequenced per large format run. researchers have used the Genome Sequencer System for their applications.

4.1. De novo sequencing (e.g. plant BACs, microbes, viruses)

Researchers have chosen 454 Sequencing because highly accu- rate 250 bp reads yield assemblies that deliver significantly better genome coverage, ∼5–10-fold fewer contigs, and minimal mis- assemblies compared to shorter read lengths of approximately 35 bp (so-called micro-reads). Long paired-end reads align specif- ically throughout the entire genome, allowing for scaffolding of contigs. These benefits produce more comprehensive discoveries and minimize the need for costly follow-up experiments such as gap closure. Selected examples for de novo sequencing on the Genome Sequencer System: Velasco et al. (2007), Poletal. (2007), Pearson et al. (2007), Andries et al. (2005), and Oh et al. (2006). Fig. 8. Enhancements in the single-read accuracy of Genome Sequencer Systems over time. When the technology was made public in 2005, a single-read accuracy of approximately 96% has been reported. Based on several changes in chemistry 4.2. Whole genome re-sequencing (e.g. targeted human genomic and computer algorithms, an accuracy of at least 99% has been achieved in 2006. regions, structural variations) Meanwhile a superior accuracy of >99.5%, including errors caused by the presence of homopolymers, can be reported (colored lines). Excluding homopolymers, single- read accuracy is >99.9% in, e.g. E. coli. For comparison, the single-read accuracy of Researchers have chosen 454 Sequencing because 250 bp reads the traditional Sanger method is in the range of 99.3–99.6%. effectively span many short repeats and 99.5% single-read accuracy 8 M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10 eliminates the micro-read requirement for filtering reads against 4.7. ncRNAs (all classes of ncRNA, from miRNA to >200 nt the reference sequence. The ability to sequence bias-free at high- transcripts of unknown function) single-read accuracy, to resolve many repeats and to eliminate tedious and difficult filtering against a reference sequence results in Several different classes of non-coding RNAs (ncRNAs) in a genome coverage of close to 99%, and considerably more efficient the range of 20–>200 bp have been identified (Gingeras, 2007). identification of more mutations, including insertions and dele- Researchers have chosen 454 clonal sequencing because it is a tions. Long paired-end reads of 100 nt, only available on the GS FLX, cheap, fast and bias-free tool to discover these ncRNA species on a align specifically throughout the entire human genome, yielding a genome wide basis. 250 bp reads cover the broadest range of these comprehensive list of structural variations. Selected examples for known and unknown ncRNA species and the 99.5% single-read re-sequencing using the 454 Sequencing technology: Albert et al. accuracy enables the differentiation of highly similar sequences. (2007) and Korbel et al. (2007). The combination of high accuracy and comprehensive coverage provides a more complete picture of the existence and putative function of ncRNAs in the genome than micro-read technologies. 4.3. Amplicon sequencing (e.g. exon re-sequencing, virus variant Furthermore, the read lengths are long enough to completely read detection, DNA methylation) through ncRNA cloning adaptors (known sequence), providing an ideal quality control of each read. Selected examples of miRNA dis- Researchers have chosen 454 Sequencing because 250 bp reads covery using the 454 Sequencing technology: Ruby et al. (2007), span typical amplicons such as exons (which in humans have an Ruby et al. (2006), Yao et al. (2007), Axtell et al. (2007), Girard et al. average size of 200 bp), and enable real haplotyping over the full (2006), and Berezikov et al. (2006). exon length, preventing the inaccurate assessment of variation pat- terns typical of micro-read technologies. High-single-read accuracy of 99.5% means each read can be used to determine low-frequency 5. Bioinformatics without large investment in enterprise variations such as somatic mutations or rare genotypes in complex scale infrastructure and people samples. Selected examples of amplicon sequencing using the 454 Sequencing technology: Dahl et al. (2007), Pettersson et al. (2007), Data analysis is a rate-limiting step in many labs. To help TM Thomas et al. (2005), Korshunova et al. (2007), and Taylor et al. researchers make discoveries faster, the Genome Sequencer FLX (2007). System comes with a suite of state-of-the-art analysis tools that integrate seamlessly with the instrument and are optimized for 454 Sequence data analysis. Unlike other systems which leave you to 4.4. Metagenomics and microbial diversity fend for yourself, the Genome Sequencer FLXTM provides you with the tools needed to make discoveries. Researchers have chosen the 454 Sequencing because highly accurate 250 bp reads provide the uniqueness required to unam- 5.1. GS Reference Mapper software biguously identify an organism or gene in an unknown complex environmental sample by mapping or assembly of single reads. The The GS Reference Mapper software maps shotgun reads against uniqueness of 250 bp reads ensures an accurate prediction of the a given reference sequence and assembles them into a consensus diversity compared to micro-reads, which usage often results in an sequence. In addition, it generates a list of high-confidence muta- underestimation of the microbial diversity (micro-reads with sev- tions (e.g. SNPs, insertions, deletions) by identification of individual eral hits against the database are not specific and therefore need bases that differ between the generated consensus DNA sequence to be excluded from the analysis). Selected examples for metage- and the reference sequence. Currently, the GS Reference Mapper nomics using the 454 Sequencing technology: Cox-Foster et al. software will map reads against genomes up to 3 gigabase in size (2007), Huber et al. (2007), Leininger et al. (2006), Sogin et al. and is capable of co-assembling reads generated with both the tra- (2006), Turnbaugh et al. (2006), Warnecke et al. (2007), and Wegley ditional Sanger and the 454 technology. et al. (2007).

5.2. GS De Novo Assembler software 4.5. EST sequencing (e.g. transcriptome survey of organisms with unknown genomes) The GS De Novo Assembler software generates a consensus sequence by de novo assembly of the shotgun sequencing reads Researchers have chosen highly accurate 250 bp reads for EST into contigs. Subsequent ordering of these contigs into scaffolds sequencing because they offer better annotations of unknown is achieved with the addition of paired-end reads. Currently, the GS genes and more accurate resolution of different expression levels De Novo Assembler software allows for routine de novo assembly of alleles and gene family members, in both sequenced and non- of genomes up to 500 Mb megabases in size and is capable of co- sequenced genomes. Selected examples for EST sequencing using assembling reads generated with both the traditional Sanger and the Genome Sequencer System: Eveland et al. (2007) and Gowda et the 454 technology. al. (2006). 5.3. The Amplicon Variant Analyzer (AVA) 4.6. Full length/shotgun sequencing of the transcriptome The AVA software application computes the alignment of reads Researchers choose 454 Sequencing because highly accurate from amplicon libraries sequenced using the Genome Sequencer 250 bp reads provide enough contiguous sequence to map and FLX System, and identifies differences between the reads and a assemble known and unknown transcriptomes, identify splice reference sequence. The frequency of variants can be detected variants and annotate transcripts of unknown function. Selected and quantified by examination of the read alignments. This tool examples for full-length sequencing using the Genome Sequencer has been shown to be perfectly suited for the identification and System: Bainbridge et al. (2006), Weber et al. (2007), and Emrich quantification of somatic mutations in cancer samples (Thomas et al. (2007). et al., 2005) or for the detection of mutations conferring resis- M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10 9 tance in HIV quasi species (Hoffmann et al., 2007; Wang et al., genes, suggesting that the evolution of eusociality involved major 2007). nutritional and reproductive pathways.

6. Recent breakthroughs achieved using the Genome 6.5. Metagenomics and microbial diversity Sequencer System Mitchell Sogin and team from Woods Hole Oceanographic Insti- In October of 2006, Roche Applied Science and 454 Life Sci- tution analyzed the microbial diversity of deep sea samples using ence announced the publication of the 100th peer-reviewed study the amplicon sequencing procedure of the Genome Sequencer Sys- using the Genome Sequencer System, including 12 papers in Nature, tem (Sogin et al., 2006). They found that the microbial diversity 11 papers in Science, 10 papers in Nucleic Acids Research,or4 in their samples was 1–2 orders of magnitude more complex than papers in Cell. These first 100 studies span a diverse group of ever published. DNA sequencing applications, including de novo sequencing and A study which has been published by a team at Washington re-sequencing of whole genomes, metagenomics, RNA analysis, and University suggests that the microflora in obese individuals differs targeted sequencing of DNA regions of interest. The following para- significantly from that of lean individuals (Turnbaugh et al., 2006). graphs provides a short overview of some selected breakthroughs Furthermore, the injection of microbiota isolated from obese mice achieved using the system. into the gut of a germ-free lean mouse resulted in symptoms of obesity. 6.1. Re-sequencing of the human genome Using a breakthrough transcriptome metagenomics approach a team from Columbia University found a significant connection The 454 Sequencing technology was the first 2nd generation between an RNA virus identified in honey bees and the honey bee sequencing technology used to decipher the genome of an individ- colony collapse disorder (Cox-Foster et al., 2007). ual human being—the genome of Dr. James D. Watson. The data was initially analyzed by the Human Genome Sequencing Center 6.6. Ancient DNA at the Baylor College of Medcine, Houston, USA, and the publicly available sequence is expected to provide many new insights into Researchers from Pennsylvania State University in the US used human genetics. the Genome Sequencer System to re-sequence the wholly mam- moth genome (Poinar et al., 2006). At the MPI in Leipzig, Germany, 6.2. Re-sequencing of the human exome and targeted gene Svante Päboö are sequencing the Neanderthal genome, with the regions aim to better understand the evolutionary background of variations in the human genome (Green et al., 2006). One of the most recent breakthroughs in human genome re-sequencing also has been achieved by the Human Genome 7. Even longer reads, considerably higher throughput and Sequencing Center at Baylor College of Medicine, in cooperation significant reductions in the costs per base in 2008 with Roche Nimblegen (Albert et al., 2007). Based on the Roche Nimblegen capturing and enrichment procedure, 6726 approx- Based upon the exciting research of customers worldwide, one imately 500-base ‘exon’ segments, and ‘locus-specific’ regions can expect that many more groundbreaking studies are pending. ranging in size from 200 kb to 5 Mb, we extracted from the human Planned improvements to the GS FLX will include an increase in genome and quickly sequenced on a GS FLX System. The direct sequence read length beyond 400 base pairs and throughput in enrichment method avoids the need for the tedious and expensive excess of 1 billion bases per day (>5 GB in 5 working days). These generation of thousands of small and long-range PCR products and system improvements can be used on the same GS FLX hardware can, in combination with the 454 Sequencing technology, be con- configuration that is currently available and in use in labs around sidered as a milestone towards routine sequencing of the human the world today. genome. References 6.3. Analysis of structural variations in the human genome Albert, T.J., Molla, M.N., Muzny, D.M., Nazareth, L., Wheeler, D., Song, X., Richmond, Recently, a group of researchers at Yale University used 454’s T.A., Middle, C.M., Rodesch, M.J., Packard, C.J., Weinstock, G.M., Gibbs, R.A., 2007. Direct selection of human genomic loci by microarray hybridization. Nature new 100 nt paired-end sequencing strategy to determine structural Methods 4 (11), 903–905. differences between human genomes. In the two human genomes Andries, K., Verhasselt, P., Guillemont, J., Göhlmann, H.W.H., Neefs, J.-M., Winkler, analyzed, more than 1000 different structural variations have been H., Van Gestel, J., Timmerman, P., Zhu, M., Lee, E., Williams, P., de Chaffoy, D., Huitric, E., Hoffner, S., Cambau, E., Truffot-Pernot, C., Lounis, N., Jarlier, V., 2005. found, suggesting that structural variation is responsible for a larger A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. number of differences between the genomes of two individuals Science 307 (5707), 223–227. than single-nucleotide polymorphisms (Korbel et al., 2007). Axtell, M.J., Snyder, J.A., Bartel, D.P., 2007. Common functions for diverse small RNAs of land plants. Plant Cell 19 (6), 1750–1769. Bainbridge, M.N., Warren, R.L., Hirst, M., Romanuik, T., Zeng, T., Go, A., Delaney, A., 6.4. Studying the molecular basis for eusociality using expression Griffith, M., Hickenbotham, M., Magrini, V., Mardis, E.R., Sadar, M.D., Siddiqui, profiling A.S., Marra, M.A., Jones, S.J.M., 2006. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-bysynthesis approach. BMC Genomics 7, 246. In order to study the molecular mechanisms underlying euso- Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., ciality in wasp colonies, researchers from the University of Illinois Plasterk, R.H., 2006. Diversity of microRNAs in human and chimpanzee brain. in the US recently analyzed differences in the gene expression pro- Nature Genetics 38 (12), 1375–1377. Cox-Foster, D.L., Conlan, S., Holmes, E.C., Palacios, G., Evans, J.D., Moran, N.A., Quan, P.- files in the brain of wasp queens, gynes, workers, and foundresses L., Briese, T., Hornig, M., Geiser, D.M., Martinson, V., vanEngelsdorp, D., Kalkstein, (Toth et al., 2007). For the first time it could be shown that gene A.L., Drysdale, A., Hui, J., Zhai, J., Cui, L., Hutchison, S.K., Simons, J.F., Egholm, M., expression in worker wasp brains was more similar to foundresses, Pettis, J.S., Lipkin, W.I., 2007. A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318 (5848), 283–287. which show maternal care, than to queens and gynes, which do Dahl, F., Stenberg, J., Fredriksson, S., Welch, K., Zhang, M., Nilsson, M., Bicknell, D., not. Insulin-related genes were among the differentially regulated Bodmer, W.F., Davis, R.W., Ji, H., 2007. Multigene amplification and massively 10 M. Droege, B. Hill / Journal of Biotechnology 136 (2008) 3–10

parallel sequencing for cancer mutation discovery. Proceedings of the National Ruby, J.G., Jan, C.H., Bartel, D.P., 2007. Intronic microRNA precursors that bypass Academy of Sciences U.S.A. 104 (22), 9387–9392. Drosha processing. Nature 448 (7149), 83–86. Emrich, S.J., Barbazuk, W.B., Li, L., Schnable, P.S., 2007. Gene discovery and annotation Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.H., Neal, P.R., Arrieta, J.M., using LCM-454 transcriptome sequencing. Genome Research 17 (1), 69–73. Herndl, G.J., 2006. Microbial diversity in the deep sea and the underexplored Eveland, A.L., McCarty, D.R., Koch, K.E., 2008. Transcript profiling by 3(UTR sequenc- ‘rare biosphere’. Proceedings of the National Academy of Sciences U.S.A. 103 ing resolves expression of gene families. Plant Physiology 146, 32–44. (32), 12115–12120. Gingeras, T.R., 2007. Origin of phenotypes: genes and transcripts. Genome Research Tauch, A., Trost, E., Bekel, T., Goesmann, A., Ludewig, U., Pühler, A. 2006. Ultra- 17, 682–690. fast de novo sequencing of the human pathogen Corynebacterium urealyticum Girard, A., Sachidanandam, R., Hannon, G.J., Carmell, M.A., 2006. A germline- with the Genome Sequencer System. Roche Application note; www.genome- specific class of small RNAs binds mammalian Piwi proteins. Nature 442 (7099), sequencing.com. 199–202. Taylor, K.H., Kramer, R.S., Davis, J.W., Guo, J., Duff, D.J., Xu, D., Caldwell, C.W., Gowda, M., Li, H., Alessi, J., Chen, F., Pratt, R., Wang, G.-L., 2006. Robust analysis of Shi, H., 2007. Ultradeep bisulfite sequencing analysis of DNA methylation pat- 5-transcript ends (5-RATE): a novel technique for transcriptome analysis and terns in multiple gene promoters by 454 Sequencing. Cancer Research 67 (18), genome annotation. Nucleic Acids Research 34 (19), e126. 8511–8518. Green, R.E., Krause, J., Ptak, S.E., Briggs, A.W., Ronan, M.T., Simons, J.F., Du, L., Egholm, Thomas, R.K., Nickerson, E., Simons, J.F., Jänne, P.A., Tengs, T., Yuza, Y., Garraway, L.A., M., Rothberg, J.M., Paunovic, M., Pääbo, S., 2006. Analysis of one million base LaFramboise, T., Lee, J.C., Shah, K., O’Neill, K., Sasaki, H., Lindeman, N., Wong, pairs of Neanderthal DNA. Nature 444 (7117), 330–336. K.K., Borras, A.M., Gutmann, E.J., Dragnev, K.H., DeBiasi, R., Chen, T.H., Glatt, K.A., Hoffmann, C., Minkah, N., Leipzig, J., Wang, G., Arens, M.Q., Tebas, P., Bushman, F.D., Greulich, H., Desany, B., Lubeski, C.K., Brockman, W., Alvarez, P., Hutchison, S.K., 2007. DNA bar coding and pyrosequencing to identify rare HIV drug resistance Leamon, J.H., Ronan, M.T., Turenchalk, G.S., Egholm, M., Sellers, W.R., Rothberg, mutations. Nucleic Acids Research 35 (13), e91. J.M., Meyerson, M., 2005. Sensitive mutation detection in heterogeneous cancer Hogg, J.S., Hu, F.Z., Janto, B., Boissy, R., Hayes, J., Keefe, R., Post, J.C., Ehrlich, G.D., specimens by massively parallel picoliter reactor sequencing. Nature Medicine 2007. Characterization and modeling of the Haemophilus influenzae core and 12 (7), 852–855. supragenomes based on the complete genomic sequences of Rd and 12 clinical Toth, A.L., Varala, K., Newman, T.C., Miguez, F.E., Hutchison, S.K., Willoughby, D.A., nontypeable strains. Genome Biology, doi:10.1186/gb-2007-8-6-r103, 8:R103. Simons, J.F., Egholm, M., Hunt, J.H., Hudson, M.E., Robinson, G.E., 2007. Wasp Huber, J.A., Welch, D.B.M., Morrison, H.G., Huse, S.M., Neal, P.R., Butterfield, D.A., gene expression supports an evolutionary link between maternal behavior and Sogin, M.L., 2007. Microbial population structures in the deep marine biosphere. eusociality. Science 318 (5849), 441–444. Science 318 (5847), 97–100. Turnbaugh, P.J., Ley, R.E., Mahowald, M.A., Magrini, V., Mardis, E.R., Gordon, J.I., 2006. Korbel, J.O., Urban, A.E., Affourtit, J.P., Godwin, B., Grubert, F., Simons, J.F., Kim, P.M., An obesity-associated gut microbiome with increased capacity for energy har- Palejev, D., Carriero, N.J., Du, L., Taillon, B.E., Chen, Z., Tanzer, A., Saunders, A.C.E., vest. Nature 444 (7122), 1027–1031. Chi, J., Yang, F., Carter, N.P., Hurles, M.E., Weissman, S.M., Harkins, T.T., Ger- Velasco, R., Zharkikh, A., Troggio, A., Cartwright, D.A., Cestaro, A., Pruss, D., Pindo, M., stein, M.B., Egholm, M., Snyder, M., 2007. Paired-end mapping reveals extensive FitzGerald, L.M., Vezzulli, S., Reid, J., Malacarne, G., Iliev, D., Coppola, G., Wardell, structural variation in the human genome. Science 318 (5849), 420–426. B., Micheletti, D., Macalma, T., Facci, M., Mitchell, J.T., Perazzolli, M., Eldredge, Korshunova, Y., Maloney, R.K., Lakey, N., Citek, R.W., Bacher, B., Budiman, A., Ordway, G., Gatto, P., Oyzerski, R., Moretto, M., Gutin, N., Stefanini, M., Chen, Y., Segala, J.M., McCombie, W.R., Leon, J., Jeddeloh, J.A., McPherson, J.D., 2008. Mas- C., Davenport, C., Demattè, L., Mraz, A., Battilana, J., Stormo, K., Costa, F., Tao, Q., sively parallel bisulphate pyrosequencing reveals the molecular complexity of Si-Ammour, A., Harkins, T., Lackey, A., Perbost, C., Taillon, B., Stella, A., Solovyev, breast cancer-associated cytosine-methylation patterns obtained from tissue V., Fawcett, J.A., Sterck, L., Vandepoele, K., Grando, S.M., Toppo, S., Moser, C., and serum DNA. Genome Research 18, 19–29. Lanchbury, J., Bogden, R., Skolnick, M., Sgaramella, V., Bhatnagar, S.K., Fontana, Leininger, S., Urich, T., Schloter, M., Schwark, L., Qi, J., Nicol, G.W., Prosser, J.I., Schus- P., Gutin, A., Van de Peer, Y., Salamini, F., Viola, R., 2007. A high quality draft ter, S.C., Schleper, C., 2006. Archaea predominate among ammonia-oxidizing consensus sequence of the genome of a heterozygous grapevine variety. PLoS prokaryotes in soils. Nature 442 (7104), 806–809. One 2 (12), e1326. Oh, J.D., Kling-Bäckhed, H., Giannakis, M., Xu, J., Fulton, R.S., Fulton, L.A., Cordum, Wang, C., Mitsuya, Y., Gharizadeh, B., Ronaghi, M., Shafer, R.W., 2007. Characteriza- H.S., Wang, C., Elliott, G., Edwards, J., Mardis, E.R., Engstrand, L.G., Gordon, J.I., tion of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 2006. The complete genome sequence of a chronic atrophic gastritis Helicobacter drug resistance. Genome Research 17 (8), 1195–1201. pylori strain: evolution during disease progression. Proceedings of the National Warnecke, F., Luginbühl, P., Ivanova, N., Ghassemian, M., Richardson, T.H., Stege, J.T., Academy of Sciences U.S.A. 103 (26), 9999–10004. Cayouette, M., McHardy, A.C., Djordjevic, G., Aboushadi, N., Sorek, R., Tringe, Pearson, B.M., Gaskin, D.J.H., Segers, R.P.A.M., Wells, J.M., Nuijten, P.J.M., van Vliet, S.G., Podar, M., Martin, H.G., Kunin, V., Dalevi, D., Madejska, J., Kirton, E., Platt, A.H.M., 2007. The complete genome sequence of Campylobacter jejuni strain D., Szeto, E., Salamov, A., Barry, K., Mikhailova, N., Kyrpides, N.C., Matson, E.G., 81116 (NCTC11828). Journal of Bacteriology 189 (22), 8402–8403. Ottesen, E.A., Zhang, X., Hernández, M., Murillo, C., Acosta, L.G., Rigoutsos, I., Pettersson, E., Zajac, P., Ståhl, P.L., Jacobsson, J.A., Fredriksson, R., Marcus, C., Schiöth, Tamayo, G., Green, B.D., Chang, C., Rubin, E.M., Mathur, E.J., Robertson, D.E., H.B., Lundeberg, J., Ahmadian, A., 2007. Allelotyping by massively parallel Hugenholtz, P., Leadbetter, J.R., 2007. Metagenomic and functional analysis pyrosequencing of SNP-carrying trinucleotide threads. Human Mutation 29 (2), of hindgut microbiota of a wood-feeding higher termite. Nature 450 (7169), 323–329. 560–565. Poinar, H.N., Schwarz, C., Qi, J., Shapiro, B., Macphee, R.D., Buigues, B., Tikhonov, A., Weber, A.P., Weber, K.L., Carr, K., Wilkerson, C., Ohlrogge, J.B., 2007. Sampling the Huson, D.H., Tomsho, L.P., Auch, A., Rampp, M., Miller, W., Schuster, S.C., 2006. Arabidopsis transcriptome with massively-parallel pyrosequencing. Plant Phys- Metagenomics to paleogenomics: largescale sequencing of mammoth DNA. Sci- iology 144 (1), 32–42. ence 311 (5759), 392–394. Wegley, L., Edwards, R., Rodriguez-Brito, B., Liu, H., Rohwer, F., 2007. Metagenomic Pol, A., Heijmans, K., Harhangi, H.R., Tedesco, D., Jetten, M.S.M., Op den Camp, H.J.M., analysis of the microbial community associated with the coral Porites astreoides. 2007. Methanotrophy below pH 1 by a new Verrucomicrobia species. Nature 450 Environmental Microbiology 9 (11), 2707–2719. (7171), 874–878. Yao, Y., Guo, G., Ni, Z., Sunkar, R., Du, J., Zhu, J.-K., Sun, Q., 2007. Cloning and char- Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., Bartel, D.P., acterization of microRNAs from wheat (Triticum aestivum L.). Genome Biology 8 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and (6), R96. endogenous siRNAs in C. elegans. Cell 127 (6), 1193–1207.