A Scalable, Fully Automated Process for Construction of Sequence-Ready Barcoded Libraries for 454 Genome Biology 2010, 11:R15

Lennon et al. Genome Biology 2010, 11:R15 http://genomebiology.com/2010/11/2/R15 METHOD Open Access AMethod scalable, fully automated process for construction of sequence-ready barcoded libraries for 454 Niall J Lennon1, Robert E Lintner1, Scott Anderson1, Pablo Alvarez2, Andrew Barry1, William Brockman3, Riza Daza1, Rachel L Erlich1, Georgia Giannoukos4, Lisa Green1, Andrew Hollinger1, Cindi A Hoover5, David B Jaffe4, Frank Juhn1, Danielle McCarthy1, Danielle Perrin1, Karen Ponchner1, Taryn L Powers1, Kamran Rizzolo1, Dana Robbins1, Elizabeth Ryan1, Carsten Russ4, Todd Sparrow1, John Stalker1, Scott Steelman1, Michael Weiand1, Andrew Zimmer1, Matthew R Henn1, Chad Nusbaum4 and Robert Nicol*1 454Aniesthe forautomated costlibrary 454 and sequencingconstruction time method required. significantly for constructing reduces librar- Abstract We present an automated, high throughput library construction process for 454 technology. Sample handling errors and cross-contamination are minimized via end-to-end barcoding of plasticware, along with molecular DNA barcoding of constructs. Automation-friendly magnetic bead-based size selection and cleanup steps have been devised, eliminating major bottlenecks and significant sources of error. Using this methodology, one technician can create 96 sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method. Background metagenomic profiling and amplicon sequencing com- The emergence of next-generation sequencing technolo- pared with other next-generation sequencing platforms. gies, such as the Roche/454 Genome Sequencer, the Illu- However, these types of applications pose a challenge in mina Genome Analyzer, the Applied Biosystems SOLiD that they require a relatively small number of reads from sequencer and others, has provided the opportunity for large numbers of samples. For example, for viruses such both large genome centers and individual labs to generate as HIV, the small (approximately 10 kb) genome size DNA sequence data at an unprecedented scale [1]. How- means that a single sample on even the smallest scale 454 ever, as sequence output continues to increase dramati- picotiter plate configuration (1 region of a 16 region gas- cally, processes to generate sequence-ready libraries lag ket) would yield over 1,500-fold coverage, vastly more behind in scale. The minimum unit of sequence data (for coverage than required for genome assembly. Further, the example, lane or channel) already exceeds the amount standard 454 library construction protocol is not easily required for small projects, such as viral or bacterial scalable and becomes a major cost driver relative to genomes, and will continue to increase. As a result, proj- sequencing when modest numbers of reads are required ects with large numbers of samples but small sequence from each sample. In addition, when sequencing large per sample requirements become increasingly challeng- numbers of isolates of the same organism, the sequence ing to undertake in a cost-effective manner. identity between samples makes cross-contamination vir- The 454 Genome Sequencer uses bead-in-emulsion tually impossible to detect without a molecular amplification and a pyrosequencing chemistry to gener- (sequence-based) tag. We set out to devise a laboratory ate DNA sequence reads by synthesis [2]. Longer reads process for high-throughput 454 sequencing that is able and shorter sequencing run times make the 454 platform to generate large numbers of sequence-ready libraries at a powerful tool for de novo assembly of small genomes, low cost per sample. Opportunities for sample mix-up errors or cross-contamination must be minimized and * Correspondence: [email protected] the process must also support efficient pooling of sam- 1 Genome Sequencing Platform, Broad Institute of MIT and Harvard, 320 ples to avoid the cost of over-sequencing. Key require- Charles St., Cambridge, MA 02141, USA ments for this process include: plate-based processing of © 2010 Lennon et al.; license BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Lennon et al. Genome Biology 2010, 11:R15 Page 2 of 9 http://genomebiology.com/2010/11/2/R15 samples to enable handling by automation; redesign of each modification provides. This system utilizes a stan- process steps to be amenable to automation, particularly dard 96-well plate format and operates on the Velocity 11 sample cleanup and size-selection steps; end-to-end Bravo, a small-footprint, liquid handling platform (see barcoding, including barcoded input sample tubes and Additional files 1, 2 and 3 for process maps; a link to the microtiter plates to support comprehensive sample track- Bravo automation protocol files can be found in Materials ing; molecular barcodes added to each DNA sample dur- and methods), but can be implemented on many com- ing library construction, which is read out as sequence, to mercially available liquid handlers. The process is fully support pooling before and sorting of reads after scalable and greatly decreases the potential for sample sequencing as well as easy identification of sample cross- swaps and cross-contamination as well as operator-to- contamination; automated construction of both frag- operator variability. We note that this process can also be ment-read and paired (jumping) library types; low input carried out by hand (see Materials and methods). DNA library construction; very limited human labor. Samples are tracked end-to-end through the use of bar- We have addressed each of these specifications in coded plasticware so that each step is captured in a labo- development of a high-throughput library construction ratory information management system (LIMS). Since process to support 454 sequencing. We were motivated individual samples can come from many sources and by two key applications in particular, assembly of bacte- sometimes in small batches, each sample enters the pro- rial genomes and assembly and diversity analysis of small cess in a two-dimensional barcoded microtube. The two- viral genomes, but the process is amenable to virtually dimensional barcoded tubes (Thermo Matrix) are placed any sequencing project with large numbers of samples. on the deck in racks of 96 where they are scanned for tracking in the LIMS. Samples are then transferred by the Results and discussion robot into 96-well plates labeled with standard code 128 High-throughput library construction barcodes for all downstream steps. Each sample also We comprehensively redesigned the standard 454 library receives a unique, molecular barcode that is added at the construction process for large-scale implementation of adapter ligation step that allows for sample multiplexing both fragment and 3-kb paired read library types. Table 1 and for downstream contamination checks (described describes the steps in the process, the scaling challenges below). of each step, the modifications that we have put in place Implementation of automated library construction for the high-throughput process and the benefits that enables a single technician to produce 96 fragment librar- Table 1: Improvements to library construction process Process step DNA Size selection/ Adapter ligation Multiplexing Library fragmentation clean-ups quantification Standard method Nebulization Column-based; Un-tagged or one Up to 12 samples Ribogreen ssDNA agarose gel cuts of 12 multiplex pooled after assay identifiers (MIDs) library in tubes construction process Drawback Low throughput; Not easily Low throughput Limited pool Limited accuracy Reduced yield automated; complexity and sensitivity opportunity for sample mix-up Modified method Acoustic shear in Solid phase 120 barcoded Up to 120 samples qPCR 96-well plate reversible adapters in plate pooled after immobilization in format adapter ligation 96-well plates or enrichment step Benefit Improved yield; Amenable to Cross- Increased Increased increased automation; less contamination flexibility and sensitivity; less throughput; opportunity for checks; high order pool complexity; input DNA automated setup sample mix-up multiplex within decreased usage required single region of of LC reagents PTP LC, Library Construction; PTP, picotiter plate; qPCR, quantitative PCR; ssDNA, single-stranded DNA. Lennon et al. Genome Biology 2010, 11:R15 Page 3 of 9 http://genomebiology.com/2010/11/2/R15 ies in 2 days or 24 3-kb jumping libraries in 3 days. This duce fragments in a tighter fragment length distribution, compares to an average throughput of six fragment the above-described benefits of acoustic shearing make it libraries or four jumping libraries in the same time span an ideal method for a scalable process. Fragments outside using the standard method. The jumping library con- the desired size range can be removed with subsequent struction throughput has been kept lower to make cross- size-selection steps (described below). Although we have contamination even more unlikely, specifically because observed a large fraction of fragments in the desired size there are a large number of steps prior to adapter ligation range with the standard settings for >90% of genomic and consequently more opportunity for sample cross- DNA samples (Figure 1b(i)), under-shearing is occasion- contamination. In this case, 24 samples in the 96-well ally evident (Figure 1b(ii)),

A Scalable, Fully Automated Process for Construction of Sequence-Ready Barcoded Libraries for 454 Genome Biology 2010, 11:R15

Supporting Information

Science Journals — AAAS

A Class I Jumping Clone Places the HLA-G Gene Approximately 100 Kilobases from Hfa-H Within the MA-A Subregion of the Human MHC

Chromosome-Scale Scaffolding of De Novo Genome Assemblies Based on Chromatin Interactions

Reducing Assembly Complexity of Microbial Genomes with Single-Molecule Sequencing

Paired-End Sequencing of Fosmid Libraries by Illumina

(12) United States Patent (10) Patent No.: US 7,932,029 B1 Lok (45) Date of Patent: Apr

Curriculum Vitae

Next-Generation Sequencing Strategies Enable Routine Detection of Balanced Chromosome Rearrangements for Clinical Diagnostics and Genetic Research

Paired-End Sequencing of Fosmid Libraries by Illumina

2010 Meeting Guide