TECHNOLOGY FEATURE NATURE|Vol 449|4 October 2007

are in fact quite different from one another and at various stages of maturity. “The principle that we use when applying these new technologies is that there is a lot of expensive sequencing that we do with Applied Biosystems’ 3730xl system and anything that we can move over to the new technologies, as long as it is effective, is bound to be cheaper.” The systems available now from Roche, Illumina and Applied Biosystems do seem to be effective, as the Broad Institute and other organizations are using them for various sequencing-based applications.

Assembling the future

By the end of last year, 454 Life Sciences, which was founded in 2000 and was recently acquired by Roche, had more than 60 of its sequencing systems placed around the world. “Our technology is in all major US genome centres and some of the international centres,” says Michael Egholm, vice-president of research and development at 454 Life Sciences.

The technology developed by 454 Life Sciences is based on two fundamental principles: emulsion PCR and pyrosequencing. Emulsion PCR side-steps the conventional process of bacterial cloning by attaching fragments of DNA 300 to 500 base pairs long to beads in vitro, then amplifying them with PCR to make millions of identical copies. Pyrosequencing allows for a massively parallel reaction format done in 1.6 million wells on a PicoTiterPlate. “Right now, day in and day out, we can perform 400,000 reads of 250 bases each with an accuracy of 99.5% or better,” says Egholm. Although the 454 Life Sciences system is not as accurate as conventional Sanger sequencing, Egholm notes that it is an order of magnitude more productive (see ‘Truth and accuracy’).

[Figure: The Genome Sequencer FLX, developed jointly by 454 Life Sciences and Roche Applied Sciences, is based on 454 sequencing technology.]

This upgraded Genome Sequencer FLX System allows more sequencing cycles and therefore longer reads than the previous Genome Sequencer 20 System. Longer reads help in whole-genome sequencing and assembly applications. “We believe that shortly there will be many more genomes de novo assembled due to our technology,” says Egholm. He notes that the genomes of several microorganisms have been assembled from scratch by use of 454 sequencing, and the technology has also been used to supplement Sanger sequencing on a few projects involving larger genomes.

At the Broad Institute, where researchers use two FLX systems and one Genome Sequencer 20, Nusbaum appreciates the ease of the 454 sequencing process. “It is nicer than Sanger sequencing because it is a faster and simpler process.” He points out that at the Broad Institute, sequencing a bacterium can take a month with Sanger methodology, whereas with 454 technology it can be done in a week and without the high degree of clone tracking associated with Sanger sequencing.

Still, for de novo sequencing and to assemble larger genomes, such as those of mammals, longer paired reads (that is, two reads that are a known distance apart) will be necessary, an issue that Roche and 454 are trying to address. “Whether [454 sequencing] will work with a mammalian genome is a good question, and it is a little way off,” says Nusbaum. But he optimistically notes that 454 Life Sciences has exceeded his expectations in surmounting several other technical hurdles. Egholm, however, is much

TRUTH AND ACCURACY

Mitch Sogin, director of the Josephine Bay Paul Center for Comparative Molecular Biology and Evolution at Woods Hole Marine Biological Laboratory in Massachusetts, performs environmental sampling of nucleic-acid sequences. “Every sequence has the potential to tell us an important story,” says Sogin, so highly accurate analysis techniques are needed.

But when Sogin’s lab switched over from traditional Sanger-based sequencing to the next-generation sequencing system of 454 Life Sciences in Branford, Connecticut, to study these environmental samples, something strange happened. “The diversity was between ten- and a hundred-fold more divergent than we expected,” recalls Sogin.

[Figure: Mitch Sogin tested the accuracy of 454 sequencing. Credit: M. Sogin]

So Sogin and his colleagues needed to determine whether the unexpected findings represented true biological diversity, or just errors caused by the new sequencing technology. “We had to explore just how good the sequencing technology actually was,” he says.

Sogin and his colleagues did a straightforward experiment in which they resequenced more than 50 templates and cloned sequences on a 454 Life Sciences Genome Sequencer 20 that they had sequenced previously with the Sanger methodology. The work showed that the 454 system was 98% accurate if no culling was used to remove bad bases or reads (ref. 6).

However, by using a very simple set of rules, which caused somewhere between 10% and 20% of the data to be discarded, the accuracy could be pushed up to 99.75%. And discarding up to 20% of the data for this level of accuracy is a trade-off that is fine with Sogin because the latest 454 system, the Genome Sequencer FLX, can produce up to 400,000 reads per run.

Others agree that for some applications the large amount of data generated by the next-generation sequencing systems could trump the accuracy produced through Sanger sequencing. “For applications such as ChIP-sequencing you can use the 454 or Solexa 1G data even though they have lower base accuracy because you do not need it. What you need is the volume for the experiment,” says Chad Nusbaum of the Broad Institute in Cambridge, Massachusetts.

Sogin now thinks that traditional sequencing methods had been underestimating the biological diversity of the environmental nucleic-acid samples. “Turns out that the diversity is coming from low-abundance nucleic-acid populations that you are not likely to encounter if you sequence only a few hundred molecules. You see these low-abundance molecules only if you sequence many tens of thousands of molecules,” he says. N.B.
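
The arithmetic behind the sidebar's two trade-offs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis Sogin's group performed. The 1-in-10,000 abundance is hypothetical, and so are the sampling depths of 300 and 30,000 reads, chosen to stand in for "a few hundred" and "many tens of thousands" of molecules. Reads are treated as independent draws, and the 250-base read length is borrowed from Egholm's figure for the FLX system rather than from the Genome Sequencer 20 used in the experiment.

```python
# Illustrative sketch only; values marked "assumed" are not from the article.

def detection_probability(abundance: float, reads: int) -> float:
    """Chance that at least one of `reads` independent reads comes from a
    population making up the fraction `abundance` of the sample."""
    return 1.0 - (1.0 - abundance) ** reads


def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of miscalled bases in one read at a per-base accuracy."""
    return read_length * (1.0 - accuracy)


def retained_reads(total_reads: int, discarded_fraction: float) -> int:
    """Reads left after culling a given fraction of the data."""
    return round(total_reads * (1.0 - discarded_fraction))


ASSUMED_ABUNDANCE = 1e-4      # assumed: 1 molecule in 10,000
ASSUMED_DEPTHS = (300, 30_000)  # assumed: "a few hundred" vs "tens of thousands"
READ_LENGTH = 250             # Egholm's FLX figure, applied here as an assumption

for depth in ASSUMED_DEPTHS:
    p = detection_probability(ASSUMED_ABUNDANCE, depth)
    print(f"{depth:>6} reads: P(rare population seen at least once) = {p:.3f}")

for accuracy in (0.98, 0.9975):  # unculled vs culled accuracy from the sidebar
    errors = expected_errors(READ_LENGTH, accuracy)
    print(f"accuracy {accuracy:.2%}: about {errors:.2f} expected errors per read")

# Worst case from the sidebar: 20% of a 400,000-read run is discarded.
print("reads kept after culling:", retained_reads(400_000, 0.20))
```

Under these assumptions a rare population is unlikely to show up at a depth of a few hundred reads and very likely to show up at tens of thousands, which is the shape of Sogin's argument; the exact numbers depend entirely on the assumed abundance.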
