Overview of DNA Sequencing Strategies
UNIT 7.1

Jay A. Shendure,1 Gregory J. Porreca,2 and George M. Church2
1University of Washington, Seattle, Washington
2Harvard Medical School, Boston, Massachusetts

ABSTRACT

Efficient and cost-effective DNA sequencing technologies have been, and may continue to be, critical to the progress of molecular biology. This overview of DNA sequencing strategies provides a high-level review of six distinct approaches to DNA sequencing: (a) dideoxy sequencing; (b) cyclic array sequencing; (c) sequencing-by-hybridization; (d) microelectrophoresis; (e) mass spectrometry; and (f) nanopore sequencing. The primary focus is on dideoxy sequencing, which has been the dominant technology since 1977, and on cyclic array strategies, for which several competitive implementations have been developed since 2005. Because the field of DNA sequencing is changing rapidly, this unit represents a snapshot of this particular moment. Curr. Protoc. Mol. Biol. 81:7.1.1-7.1.11. © 2008 by John Wiley & Sons, Inc.

Keywords: DNA sequencing • Sanger • dideoxy • polony

Current Protocols in Molecular Biology 7.1.1-7.1.11, January 2008. Published online January 2008 in Wiley Interscience (www.interscience.wiley.com). DOI: 10.1002/0471142727.mb0701s81. Supplement 81. Copyright © 2008 John Wiley & Sons, Inc.

INTRODUCTION

In the mid-1960s, the first attempts at DNA sequencing followed the precedent set for protein (Ryle et al., 1955) and RNA (Holley et al., 1965): sequencing by detailed analysis of degradation products. However, the length and consequent complexity of the DNA polymer proved to be significantly problematic (Sanger, 1988).

A key moment came in February 1977, when groups led by Fred Sanger and Walter Gilbert independently published descriptions of methodologies for DNA sequencing, both of which relied on gel electrophoresis to separate DNA fragments with single-base-pair resolution (Maxam and Gilbert, 1977; Sanger et al., 1977). In the years that followed, the rapid dissemination of these technologies and their progression to robust protocols enabled a wide range of critical advances throughout the fields of genetics and molecular biology. The development of commercially available automated sequencing platforms in the mid-1980s represented a second key breakthrough that secured the dominance of the Sanger protocol (also known as “dideoxy sequencing”) over the Maxam-Gilbert protocol (also known as “chemical sequencing”) as the method of choice for the next several decades (Hunkapiller et al., 1991).

In addition to automation, a supporting cast of related technologies was developed to further reduce costs and improve sequencing throughput. These included a broad range of methods for efficient library construction and template preparation, dideoxynucleotides (ddNTPs) bearing fluorescent moieties (Prober et al., 1987), and thermostable polymerases engineered to accept them (Tabor and Richardson, 1995), as well as the implementation of efficient DNA sequence production workflows in core facilities and high-throughput sequencing centers. It is notable that much of this innovation was motivated by the Human Genome Project (HGP), which achieved completion of a draft of the canonical human genome sequence in 2001 (International Human Genome Sequencing Consortium, 2001). Consequent to the technological innovation that enabled the HGP, the per-base cost of dideoxy sequencing has followed an exponential decline (Collins et al., 2003; Shendure et al., 2004). Importantly, the read lengths and accuracy of sequencing traces have steadily improved as well. As community-wide capacity for high-throughput DNA sequence production has been maintained in the wake of the HGP, the number of sequenced nucleotides deposited in GenBank has continued its exponential rise. As of October 2007, genome sequences for 997 bacterial species and 164 eukaryotic species are available in at least draft assembly form.

In recent years, there has been a collective sense in the technology development field that optimization of dideoxy sequencing protocols may be approaching exhaustion, and that the trend of declining sequencing costs is unlikely to continue much further without a radical change in the underlying technology. This has sparked significant academic and commercial investment in alternative technological paradigms (Shendure et al., 2004). Several of these alternatives have quickly progressed to substantial proof-of-concept, demonstrating costs competitive with conventional dideoxy sequencing for certain applications (Margulies et al., 2005; Shendure et al., 2005). Some of these platforms have recently become, or are anticipated to become, widely available in an “open-source” format or as commercial products. Although dideoxy sequencing still accounts for the vast majority of DNA sequencing production, this is unlikely to be the case several years from now.

This unit provides a high-level overview of six distinct approaches to DNA sequencing. These are: (1) dideoxy sequencing, (2) cyclic array sequencing, (3) sequencing by hybridization, (4) microelectrophoresis, (5) mass spectrometry, and (6) nanopore sequencing. Additionally, this unit presents key parameters that should be considered when choosing the DNA sequencing strategy most appropriate for a given application. It should be emphasized that the DNA sequencing field is changing rapidly, so the information in this unit represents a snapshot of this particular moment.

It is worthwhile to note that the research goals that motivate DNA sequencing may be undergoing a substantial shift as well, concurrent with the introduction of new technologies. Given that reference genome sequences for H. sapiens as well as all major model organisms are nearly complete, demand will likely shift away from de novo genome sequencing towards other areas of application, such as resequencing (identifying genetic variation in the genome of an individual for whose species a reference genome is already available) and tag counting (i.e., serial analysis of gene expression or chromatin occupancy by the sequencing of short but identifying DNA tags). The initial generation of new technologies will deliver sequence that is substantially shorter and less accurate than state-of-the-art Sanger sequencing. However, although the utility of such sequence may be limited for de novo sequencing, it will likely be compatible, and often preferable, for other areas of application.

DNA SEQUENCING STRATEGIES

Dideoxy Sequencing

Dideoxy sequencing, also known as Sanger sequencing, proceeds by primer-initiated, polymerase-driven synthesis of DNA strands complementary to the template whose sequence is to be determined (Fig. 7.1.1). Numerous identical copies of the sequencing template undergo the primer extension reaction within a single microliter-scale volume. Generating sufficient quantities of template for a sequencing reaction is typically achieved by either (1) miniprep of a plasmid vector into which the fragment of interest has been cloned, or (2) polymerase chain reaction (PCR) followed by a cleanup step. In the sequencing reaction itself, both the natural deoxynucleotides (dNTPs) and the chain-terminating dideoxynucleotides (ddNTPs) are present at a specific ratio that determines their relative probability of incorporation during the primer extension. Incorporation of a ddNTP instead of a dNTP results in termination of a given strand. Therefore, for any given template molecule, strand elongation will begin at the 3′ end of the primer and terminate upon incorporation of a ddNTP. In older protocols for dideoxy sequencing, four separate primer extension reactions are carried out, each containing only one of the four possible ddNTP species (ddATP, ddGTP, ddCTP, or ddTTP), along with template, polymerase, dNTPs, and a radioactively labeled primer. The result is a collection of many terminated strands of many different lengths within each reaction. As each reaction contains only one ddNTP species, fragments with only a subset of possible lengths will be generated, corresponding to the positions of that nucleotide in the template sequence. The four reactions are then electrophoresed in four lanes of a denaturing polyacrylamide gel to yield size separation with single-nucleotide resolution. The pattern of bands (with each band consisting of terminated fragments of a single length) across the four lanes allows one to directly interpret the primary sequence of the template under analysis.

Current implementations of dideoxy sequencing differ in several key ways from the protocol described above. Only a single primer extension reaction is performed that includes all four ddNTPs. The four species of ddNTP are labeled with fluorescent dyes that have the same excitation wavelength but different emission spectra, allowing for identification by fluorescence resonance energy transfer (FRET). To minimize the required amount of template DNA, a “cycle sequencing” reaction is performed, in which multiple cycles of denaturation, primer annealing, and primer extension are performed to linearly increase the number of terminated strands. This requires

Figure 7.1.1 Schematic of the basic principle involved in dideoxy sequencing. The sequencing template consists of an unknown region whose sequence is to be determined, flanked by known sequence to which a sequencing