
Proc. Natl. Acad. Sci. USA Vol. 92, pp. 10162-10166, October 1995 Biochemistry

A DNA sequencing strategy that requires only five bases of known terminal sequence for priming
(primer extension/stacking interaction/fluorescein/solid state/duplex probe)

DONG-JING FU*, NATALIA E. BROUDE†, HUBERT KÖSTER‡, CASSANDRA L. SMITH†, AND CHARLES R. CANTOR†

*Sequenom, Inc., 101 Arch Street, Boston, MA 02210; †Center for Advanced Biotechnology and Department of Biomedical Engineering, Pharmacology, and Biology, Boston University, 36 Cummington Street, Boston, MA 02215; and ‡Faculty of Chemistry, Department of Biochemistry and Molecular Biology, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany

Contributed by Charles R. Cantor, July 11, 1995

ABSTRACT We have previously reported an enhanced version of sequencing by hybridization (SBH), termed positional SBH (PSBH). PSBH uses partially duplex probes containing single-stranded 3' overhangs, instead of simple single-stranded probes. Stacking interactions between the duplex probe and a single-stranded target allow us to reduce the probe sizes required to 5-base single-stranded overhangs. Here we demonstrate the use of PSBH to capture relatively long single-stranded DNA targets and perform standard solid-state Sanger sequencing on these primer-template complexes without ligation. Our results indicate that only 5 bases of known terminal sequence are required for priming. In addition, the partially duplex probes have the ability to capture their specific target from a mixture of five single-stranded targets with different 3'-terminal sequences. This indicates the potential utility of the PSBH approach to sequence mixtures of DNA targets without prior purification.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: SBH, sequencing by hybridization; PSBH, positional SBH.

The success of the human genome project is largely dependent on improving the efficiency and lowering the cost of DNA sequencing. Current strategies of DNA sequencing include the directed approach such as primer walking (1-6) and the random, shotgun, approach (7, 8). Two very different strategies for using oligonucleotides to increase the efficiency of DNA sequencing have recently been described and tested. The first strategy is a form of primer walking where three short oligonucleotides are stacked together and used as a segmented primer (2, 5, 6). The advantage of this approach is that a complete set of the short oligonucleotides can be made in advance, and no additional DNA synthesis is required as the sequencing project proceeds. This allows a much more parallel sequencing strategy to be employed than in conventional primer walking. The second strategy is a completely different approach, called sequencing by hybridization (SBH) (9-15). In SBH a set of short oligonucleotide probes with known sequences is used to identify the complementary sequences in an unknown long DNA target. The hybridization results are then assembled to generate the complete sequence of the target DNA. The advantages of the SBH method are its potential application for automated large-scale sequencing and its intrinsic power of determining many sequence regions in parallel. In one SBH format, a complete set of all possible short oligonucleotide sequences is immobilized as an ordered array on a surface, and the unknown DNA fragment is hybridized to this array. Alternatively, a set of unknown DNA fragments can be immobilized on a surface and hybridized with one probe at a time.

We have developed an enhanced version of SBH, termed positional SBH (PSBH) (16). PSBH uses duplex probes containing single-stranded 5-base 3' overhangs, instead of simple single-stranded probes. Stacking interactions between the duplex probe and a single-stranded target provide enhanced stringency in discriminating matched and mismatched targets. Two enzyme-catalyzed steps (DNA polymerase and T4 DNA ligase), in addition to pure physical hybridization, are used to enhance the accuracy of sequence information obtained. Here, we report a completely different application of the PSBH approach. Partially double-stranded probes are used to capture DNA targets and form primer-template complexes. These complexes are then used as substrates in conventional solid-state Sanger sequencing. The potential advantages of this approach are much more efficient methods for DNA sample preparation and a smaller set of oligonucleotide primers than needed for segmented primer walking. The PSBH approach appears capable of sequencing mixtures of DNA fragments without prior purification. One example would be DNA fragments generated by restriction enzymes such as BstXI, which leave DNA overhangs that are not part of the recognition site.

MATERIALS AND METHODS

Oligonucleotides were purchased from Operon Technologies (Alameda, CA) in an unpurified form. Sequencing was carried out either by using a Sequenase version 2.0 kit (United States Biochemical) or by using an AutoRead sequencing kit (Pharmacia) employing T7 DNA polymerase.

Immobilized single-stranded DNA targets for solid-phase DNA sequencing were prepared by PCR amplification using procedures similar to those reported before (17, 18). PCR was performed on a Perkin-Elmer/Cetus DNA Thermal Cycler. Vent (exo-) DNA polymerase was purchased from New England Biolabs, and the dNTP solutions were from Promega. Plasmid NB34, a pCR II plasmid (Invitrogen) with a 1-kb anonymous human DNA insert, was used as the DNA template for amplification. PCR was performed using an 18-nt upstream primer and a downstream 5'-end-biotinylated 18-nt primer in a 100-µl or 400-µl volume containing 10 mM KCl, 20 mM Tris·HCl (pH 8.8), 10 mM (NH4)2SO4, 2 mM MgSO4, 0.1% Triton X-100, 250 µM dNTPs, 2.5 µM biotinylated primer, 5 µM nonbiotinylated primer, <100 ng of plasmid DNA, and 6 units of Vent (exo-) DNA polymerase per 100 µl of reaction volume. Thirty PCR temperature cycles were performed; each cycle included a heat denaturation step at 94°C for 1 min, an annealing step for 1 min at 60°C, and DNA chain extension with Vent (exo-) polymerase for 1 min at 72°C. PCR amplification with the tagged primer used a 1-min 45°C incubation for primer annealing. The PCR product was purified by passing through an Ultrafree-MC 30,000 NMWL filter unit (Millipore) or by electrophoresis and extraction from a low-melting-point agarose gel. About 10 pmol of purified PCR fragment was then mixed with 1 mg of prewashed Dynabeads M280 with strepta-


[Figure 1 schematic not reproduced. Labels recoverable from the original: a 23-mer with an NNNNN overhang annealed to an 18-mer; annealing; Sanger sequencing; PCR; biotin; Dynal streptavidin beads; 0.1 M NaOH treatment and then remove the supernatant.]

FIG. 1. Schematic of procedures used. (A) Solid-state Sanger sequencing with partially duplex probe. Outlined letters indicate the overhang sequence from the partially duplex probe. (B) Generating sequencing targets by PCR amplification. (C) Adding known sequence at the 3' end of a target to facilitate subsequent capture and sequencing by procedures in A and B. F, fluorescein; b, biotin.
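The tagging step in panel C rests on a complementarity argument that can be sketched in a few lines of Python. This is an illustrative model only: the function names are hypothetical, the 18-nt primer body is invented for the example, and hybridization is idealized as perfect Watson-Crick pairing. Only the TCAAC tag is taken from the paper.

```python
# Watson-Crick pairing table for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA string, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def target_3prime_end(tagged_primer: str, tag_length: int = 5) -> str:
    """3'-terminal bases (written 5'->3') of the strand copied across the tag.

    The tag sits at the 5' end of the nonbiotinylated primer, so the
    complementary (biotinylated) strand ends with the reverse complement
    of the tag.
    """
    tag = tagged_primer[:tag_length]
    return reverse_complement(tag)


def overhang_captures(overhang: str, target_end: str) -> bool:
    """True when the probe overhang is fully complementary to the target's 3' end."""
    return reverse_complement(overhang) == target_end.upper()


# Hypothetical tagged primer: TCAAC tag (from the text) plus an invented 18-nt body.
primer = "TCAAC" + "ACGTACGTACGTACGTAC"
end = target_3prime_end(primer)
print(end)                               # GTTGA
print(overhang_captures("TCAAC", end))   # True
print(overhang_captures("AATGT", end))   # False
```

Because the tag and the probe overhang share the same sequence, any template amplified with that tagged primer acquires a 3' end that the same duplex probe can capture.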

vidin (Dynal, Oslo) in 100 µl of 1 M NaCl and TE (10 mM Tris·HCl, pH 8.0/1 mM Na2EDTA, pH 8.0) and incubated at 37°C or 45°C for 30 min. The immobilized biotinylated double-stranded DNA fragment was then converted to a single-stranded form by treating with freshly prepared 0.1 M NaOH at room temperature for 5 min. The magnetic beads, with immobilized single-stranded DNA, were washed with 0.1 M NaOH and TE.

Sequencing of the immobilized single-stranded DNA target was performed with reagents from the sequencing kits for Sequenase version 2.0 or T7 DNA polymerase. A DNA duplex probe containing a 5-base 3' overhang was used as a sequencing primer. The duplex has a 5'-fluorescein-labeled 23-mer, containing an 18-base 5'-constant region and a 5-base 3'-variable region (which has the same sequence as the 5' end of the corresponding nonbiotinylated primer for PCR amplification of target DNA as described above; see Fig. 1A) and an 18-mer complementary to the constant region of the 23-mer. The duplex was formed by annealing 20 pmol of each of the two oligonucleotides in a 10-µl volume containing 2 µl of Sequenase 5× buffer (200 mM Tris·HCl, pH 7.5/100 mM MgCl2/250 mM NaCl) from the Sequenase kit or in a 13-µl volume containing 2 µl of the annealing buffer (1 M Tris·HCl, pH 7.6/100 mM MgCl2) from the AutoRead sequencing kit. The annealing mixture was heated to 65°C and allowed to cool slowly to 37°C over a 20- to 30-min time period. The duplex primer was then annealed with 1-5 pmol of the immobilized single-stranded DNA target by adding the annealing mixture to the DNA-containing magnetic beads, and the resulting mixture was further incubated at 37°C for 5 min, room temperature for 10 min, and finally 0°C for at least 5 min. For Sequenase reactions, 1 µl of 0.1 M dithiothreitol solution, 1 µl

primer 1   5'-F-GGCCCTTTGACGGGAGCC

primer 2   5'-F-GATGATCCGACGCATCAAGGCCC
             3'-CTACTAGGCTGCGTAGTT

primer 3   5'-F-GATGATCCGACGCATCACAGCTC
             3'-CTACTAGGCTGCGTAGTG

primer 4   5'-F-GATGATCCGACGCATCAGAATGT
             3'-CTACTAGGCTGCGTAGTC

primer 5   5'-F-GATGATCCGACGCATCACTCAAC
             3'-CTACTAGGCTGCGTAGTG

primer 6   5'-F-GATGATCCGACGCATCAGCCTAG
             3'-CTACTAGGCTGCGTAGTC

FIG. 2. Sequencing primers. The "F" at the 5' end indicates fluorescein.
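The duplex primers of Fig. 2 can be checked mechanically. The following Python sketch is not part of the original methods, and `describe` is a hypothetical helper. It takes the sequences exactly as printed, confirms that each 18-mer pairs base by base with the first 18 bases of its 23-mer, and reports the remaining 5-base overhang together with its G+C count.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Duplex primers 2-6 as printed in Fig. 2: (23-mer written 5'->3', 18-mer written 3'->5').
DUPLEX_PRIMERS = {
    2: ("GATGATCCGACGCATCAAGGCCC", "CTACTAGGCTGCGTAGTT"),
    3: ("GATGATCCGACGCATCACAGCTC", "CTACTAGGCTGCGTAGTG"),
    4: ("GATGATCCGACGCATCAGAATGT", "CTACTAGGCTGCGTAGTC"),
    5: ("GATGATCCGACGCATCACTCAAC", "CTACTAGGCTGCGTAGTG"),
    6: ("GATGATCCGACGCATCAGCCTAG", "CTACTAGGCTGCGTAGTC"),
}


def describe(top: str, bottom_3to5: str):
    """Return (fully_paired, overhang, gc_count) for one duplex primer.

    Because the bottom strand is written 3'->5', position i of the bottom
    strand lies opposite position i of the top strand.
    """
    duplex_len = len(bottom_3to5)
    paired = all(COMPLEMENT[t] == b for t, b in zip(top[:duplex_len], bottom_3to5))
    overhang = top[duplex_len:]            # single-stranded 3' end of the 23-mer
    gc_count = sum(base in "GC" for base in overhang)
    return paired, overhang, gc_count


for number, (top, bottom) in DUPLEX_PRIMERS.items():
    paired, overhang, gc_count = describe(top, bottom)
    print(f"primer {number}: paired={paired} overhang={overhang} G+C={gc_count}/{len(overhang)}")
```

On the printed sequences this gives overhangs GGCCC, AGCTC, AATGT, TCAAC, and CCTAG for primers 2 through 6, with G+C counts of 5, 3, 1, 2, and 3 out of 5.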

[FIG. 3: eight panels (A-H) of fluorescence sequencing traces. The trace images and base calls are not recoverable in this text version.]

of Mn buffer (0.15 M sodium isocitrate/0.1 M MnCl2) for the relatively short target (140-nt), and 2 µl of diluted Sequenase (1.5 units) were added, and the reaction mixture was divided into four ice-cold termination mixtures (each consisted of 3 µl of the appropriate termination mixture: 80 µM dATP, 80 µM dCTP, 80 µM dGTP, 80 µM dTTP, and 8 µM of one of the four ddNTPs, in 50 mM NaCl). For T7 DNA polymerase reactions, 1 µl of extension buffer (40 mM MnCl2, pH 7.5/304 mM citric acid/324 mM dithiothreitol) and 1 µl of T7 DNA polymerase (8 units) were added, and the reaction volume was split into four ice-cold termination mixtures (each consisted of 1 µl dimethyl sulfoxide and 3 µl of the appropriate termination mixture: 1 mM dATP, 1 mM dCTP, 1 mM dGTP, 1 mM dTTP, and 5 µM of one of the four ddNTPs, in 50 mM NaCl/40 mM Tris·HCl, pH 7.4). The reaction mixtures for both enzymes were further incubated at 0°C for 5 min, room temperature for 5 min, and 37°C for 5 min. The magnetic beads were precipitated, and the supernatant was removed. The magnetic beads were resuspended in 10 µl of Pharmacia stop mix (deionized formamide containing dextran blue at 5 mg/ml). Samples were denatured at 90-95°C for 5 min (under this harsh condition, both DNA template and the dideoxy-terminated fragments are released from the beads) and stored on ice prior to loading on a DNA sequencing gel. A control experiment, performed in parallel, used a fluorescein-labeled 18-mer complementary to the 3' end of target DNA as the sequencing primer instead of the duplex probe. The annealing of the 18-mer to its target was carried out in the same manner as the annealing of the duplex probe described above. Sequencing samples were analyzed on an A.L.F. DNA Sequencer (Pharmacia) using a 6% polyacrylamide gel containing 7 M urea and 0.6× TBE (53 mM Tris-borate/1.2 mM Na2EDTA, pH 8.0).

RESULTS AND DISCUSSION

PCR was used to generate single-stranded targets for testing the PSBH strategy. Several known sequences of plasmid NB-34 were amplified by PCR with one 5'-biotinylated and one unbiotinylated primer using the Vent (exo-) polymerase, a thermophilic DNA polymerase lacking 3'-terminal transferase activity. The amplified, biotinylated double-stranded DNA fragments were then immobilized on magnetic beads coated with streptavidin (Fig. 1). Single-stranded DNA target was generated after strand separation by alkaline denaturation. A partially duplex probe was formed by annealing a 5'-fluorescein-labeled 23-mer to a complementary 18-mer to generate an 18-bp duplex and a single-stranded 5-base 3' overhang. The preformed duplex probe was then used to capture the single-stranded target, and conventional solid-state Sanger sequencing was carried out (Fig. 1A). Both Sequenase version 2.0 (United States Biochemical) and T7 DNA polymerase (Pharmacia) sequencing kits were tested. The polymerase extension reaction was first performed at low temperature (0°C) for 5 min to ensure the binding of the short overhang to its target; then the reaction temperature was raised to 25°C and then to 37°C to allow the full extension to occur.

The sequences of the partially duplex probes used are shown in Fig. 2; their 3' overhang is complementary to the 3' end of the corresponding immobilized single-stranded target DNA. The use of a 5'-fluorescein-labeled probe enables the analysis of the sequencing ladder on an automated fluorescence DNA sequencer. The solid-phase approach ensures easy sample handling, good yield, and reproducible amplification and sequencing procedures. Several different overhang sequences containing different base compositions have been tested as sequencing primers. Fig. 3 A and B show part of the sequence data obtained using a conventional 18-mer primer (primer 1 in Fig. 2) or a duplex primer with a G+C-rich overhang (primer 2 in Fig. 2), respectively, to sequence a 530-nt DNA target. In both cases, clean and even sequencing ladders were obtained. The fragment peaks are resolved out to about the end of the target, although the signal intensity obtained from the duplex primer is somewhat less than that from the 18-mer. Fig. 3C gives an example of sequence priming with an overhang containing a similar content of A+T and G+C (primer 3 in Fig. 2). Fig. 3D gives an example of sequence priming with an A+T-rich overhang (primer 4 in Fig. 2). Our results indicate that A+T-rich overhangs work as well as G+C-rich overhangs. It seems that the proposed purine-purine stacking at the annealing junction (6) is not required to generate good Sanger sequencing ladders using these duplex probes.

FIG. 3. Sanger sequence data on magnetic beads using primer 1 as in Fig. 2 to sequence a 530-nt target with T7 DNA polymerase (A), using primer 2 as in Fig. 2 to sequence a 530-nt target with T7 DNA polymerase (B), using primer 3 as in Fig. 2 to sequence a 530-nt target (amplified with a tagged PCR primer) with Sequenase version 2.0 (C), using primer 4 as in Fig. 2 to sequence a 560-nt target with T7 DNA polymerase (D), using primer 5 as in Fig. 2 to sequence a 560-nt target (amplified with a tagged PCR primer) with T7 DNA polymerase (E), using primer 6 as in Fig. 2 to sequence a 530-nt target (amplified with a tagged PCR primer) with T7 DNA polymerase (F), complex 1 in Fig. 4 with Sequenase version 2.0 (G), and complex 2 in Fig. 4 with Sequenase version 2.0 (H).

To demonstrate the generality of our sequencing strategy, several single-stranded targets with preselected 5-base sequences at their 3' end were generated by PCR amplification with tagged primers (Fig. 1C). For example, one PCR primer has a 5-base tag (TCAAC) at its 5' end, which shares the same sequence as the overhang of duplex sequencing primer 5 (Fig. 2), and a 3' region complementary to one of the PCR template strands. This PCR product was treated as in Fig. 1A to generate single-stranded targets and sequenced using the corresponding duplex probe. The resulting sequence data (Fig. 3E) are similar to those obtained from the targets generated by regular PCR primers. Fig. 3F gives another example of sequencing PCR products generated by a tagged primer using probe 6 in Fig. 2. These results suggest that a selected overhang can be used to capture and sequence targets with various sequences when the targets are amplified with a related tagged primer. Thus we are not limited to priming at known sequences at the ends of targets. Any internal sequence can be used as an initiation point, but this requires the custom synthesis of a primer.

To test the abilities of the partially duplex probes to discriminate against mismatched targets, two sequencing reactions were performed as shown in Fig. 4. A mixture of five (complex 1) or four targets (complex 2) with different sequences at their 3' end was annealed to a single probe, and the Sanger sequencing reactions were performed following the general procedures described in Materials and Methods. In both cases, only the target containing the correct 3'-terminal sequence gives readable sequence data. Fig. 3 G and H show part of the sequence data generated from complex 1 and 2, respectively, in Fig. 4.

Complex 1
Primer:    5'-F-GATGATCCGACGCATCAGAATGT
             3'-CTACTAGGCTGCGTAGTC
Targets:   3'-TTACA------b-5'   560 nt
           3'-GGATC------b-5'   530 nt
           3'-GTCGA------b-5'   140 nt
           3'-TCGAG------b-5'   140 nt
           3'-CCGGG------b-5'    90 nt

Complex 2
Primer:    5'-F-GATGATCCGACGCATCAGCCTAG
             3'-CTACTAGGCTGCGTAGTC
Targets:   3'-TTACA------b-5'   560 nt
           3'-GGATC------b-5'   530 nt
           3'-GTCGA------b-5'   140 nt
           3'-TCGAG------b-5'   140 nt

FIG. 4. Sequencing complexes containing multiple targets. The "F" at the 5' end indicates fluorescein; the "b" at the 5' end indicates biotin.

We have previously shown that hybridization of an oligonucleotide next to a preformed duplex can allow shorter probe sequences such as a 5-base single-stranded overhang to be used because of the extra stacking energy (16). Here we further demonstrate that such duplex probes are able to hybridize with relatively long single-stranded targets, and we are able to perform solid-state Sanger sequencing effectively without ligating the targets to the probes. A+T-rich overhangs worked as well as the G+C-rich overhangs, which indicates the generality of stacking hybridization. Furthermore, a single duplex probe can be used to read the sequence of a complementary target when challenged with a mixture of this target and other unrelated sequences.

Elsewhere we report using the partially duplex probe to sequence short targets potentially suitable for analysis by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (19). By using stacking hybridization, the effective probe size can be reduced to 5-base single-stranded overhangs. This means only 1024 (4^5) different probes are needed to cover all possible sequences. In principle, an ordered array of 4^5 different probes could be used as a capture device to purify their complementary targets from restriction digests generated by class IIS enzymes that leave 5-base 3' overhangs that are not part of the recognition site and to perform multiple Sanger reactions simultaneously on each of the captured DNA fragments. An example of such a restriction enzyme is ApaBI (recognition sequence: GCANNNNN/TGC), which can generate 5-base 3' overhangs (20). Such a sequencing strategy has great potential to speed up genome sequencing and lower the cost compared to many other sequencing strategies currently available.

This work was supported by a grant from the Department of Energy (DE-FG02-93ER61609). C. R. Cantor is a consultant to Sequenom.

1. Studier, F. W. (1989) Proc. Natl. Acad. Sci. USA 86, 6917-6921.
2. Szybalski, W. (1990) Gene 90, 177-178.
3. Strauss, E. C., Kobori, J. A., Siu, G. & Hood, L. E. (1986) Anal. Biochem. 154, 353-360.
4. Siemieniak, D. R. & Slightom, J. L. (1990) Gene 96, 121-124.
5. Kieleczawa, J., Dunn, J. J. & Studier, F. W. (1992) Science 258, 1787-1791.
6. Kotler, L. E., Zevin-Sonkin, D., Sobolev, I. A., Beskin, A. D. & Ulanovsky, L. E. (1993) Proc. Natl. Acad. Sci. USA 90, 4241-4245.
7. Messing, J., Crea, R. & Seeburg, P. H. (1981) Nucleic Acids Res. 9, 309-321.
8. Anderson, S. (1981) Nucleic Acids Res. 9, 3015-3027.
9. Bains, W. & Smith, G. C. (1988) J. Theor. Biol. 135, 303-307.
10. Strezoska, Z., Paunesku, T., Radosavlevic, D., Labat, I., Drmanac, R. & Crkvenjakov, R. (1991) Proc. Natl. Acad. Sci. USA 88, 10089-10093.
11. Drmanac, R., Labat, I., Brukner, I. & Crkvenjakov, R. (1989) Genomics 4, 114-128.
12. Drmanac, R., Drmanac, S., Strezoska, Z., Paunesku, T., Labat, I., Zeremski, M., Snoddy, J., Funkhouser, W. F., Koop, B., Hood, L. & Crkvenjakov, R. (1993) Science 260, 1649-1652.
13. Pevzner, P. A., Lysov, Y. P., Khrapko, K. R., Belyavsky, A. V., Florentiev, V. L. & Mirzabekov, A. D. (1991) J. Biomol. Struct. Dyn. 9, 99-110.
14. Bains, W. (1991) Genomics 11, 294-301.
15. Pease, A. C., Solas, D., Sullivan, E. J., Cronin, M. T., Holmes, C. P. & Fodor, S. P. A. (1994) Proc. Natl. Acad. Sci. USA 91, 5022-5026.
16. Broude, N. E., Sano, T., Smith, C. L. & Cantor, C. R. (1994) Proc. Natl. Acad. Sci. USA 91, 3072-3076.
17. Hultman, T., Stahl, S., Hornes, E. & Uhlen, M. (1989) Nucleic Acids Res. 17, 4937-4946.
18. Zimmermann, J., Voss, H., Wiemann, S., Erfle, H., Rupp, T., Hewitt, N. A., Schwager, C., Stegemann, J. & Ansorge, W. (1993) Methods Mol. Cell. Biol. 4, 29-32.
19. Fu, D.-J., Broude, N. E., Koster, H., Smith, C. L. & Cantor, C. R. (1995) Genet. Anal., in press.
20. Roberts, R. & Macelis, D. (1994) Nucleic Acids Res. 22, 3628-3639.
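As a supplementary illustration of the capture-array idea raised in the discussion, the following Python sketch enumerates all 4^5 overhang probes and works out which probe would capture each target of complex 1 in Fig. 4. It is a simplified model under stated assumptions: capture is treated as exact base-by-base complementarity, with no thermodynamics or mismatch tolerance, and the names `PROBE_ARRAY`, `capturing_overhang`, and `captured` are hypothetical. The target ends and lengths are those printed in Fig. 4.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
OVERHANG_LENGTH = 5

# Every possible 5-base overhang, written 5'->3'; the list index serves as array address.
PROBE_ARRAY = ["".join(bases) for bases in product("ACGT", repeat=OVERHANG_LENGTH)]


def capturing_overhang(target_end_3to5: str) -> str:
    """Overhang (5'->3') that pairs base by base with a target 3' end written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in target_end_3to5)


def captured(overhang: str, targets):
    """Targets whose 3' end is exactly complementary to the given overhang."""
    return [(end, length) for end, length in targets
            if capturing_overhang(end) == overhang]


# Complex 1 of Fig. 4: target 3' ends as printed (3'->5') with target lengths in nt.
COMPLEX_1 = [("TTACA", 560), ("GGATC", 530), ("GTCGA", 140),
             ("TCGAG", 140), ("CCGGG", 90)]
COMPLEX_2 = COMPLEX_1[:4]   # the same mixture without the 90-nt target

print(len(PROBE_ARRAY))                 # 1024
print(captured("AATGT", COMPLEX_1))     # [('TTACA', 560)]
print(captured("CCTAG", COMPLEX_2))     # [('GGATC', 530)]

# Array address of the probe that would capture each target in the mixture.
for end, length in COMPLEX_1:
    overhang = capturing_overhang(end)
    print(end, length, overhang, PROBE_ARRAY.index(overhang))
```

Under this idealized model the overhang of primer 4 (AATGT) selects only the 560-nt target and that of primer 6 (CCTAG) only the 530-nt target, which matches the targets sequenced with those primers in Fig. 3 D and F.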