Downloaded from www.genome.org on October 30, 2006

Pyrosequencing Sheds Light on DNA Sequencing

Mostafa Ronaghi

Genome Res. 2001 11: 3-11 Access the most recent version at doi:10.1101/gr.11.1.3

References This article cites 33 articles, 10 of which can be accessed free at: http://www.genome.org/cgi/content/full/11/1/3#References




© 2001 Cold Spring Harbor Laboratory Press

Review Pyrosequencing Sheds Light on DNA Sequencing

Mostafa Ronaghi Genome Technology Center, Stanford University, Palo Alto, California 94304, USA

DNA sequencing is one of the most important platforms for the study of biological systems today. Sequence determination is most commonly performed using dideoxy chain termination technology. Recently, pyrosequencing has emerged as a new sequencing methodology. This technique is a widely applicable, alternative technology for the detailed characterization of nucleic acids. Pyrosequencing has the potential advantages of accuracy, flexibility, parallel processing, and ease of automation. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides, and gel electrophoresis. This article considers key features of pyrosequencing technology, including the general principles, enzyme properties, sequencing modes, instrumentation, and potential applications.

The development of DNA sequence determination techniques with enhanced speed, sensitivity, and throughput is of utmost importance for the study of biological systems. Conventional DNA sequencing relies on the elegant principle of the dideoxy chain termination technique first described more than two decades ago (Sanger et al. 1977). This multi-step principle has gone through major improvements during the years to make it a robust technique that has been used for the sequencing of several different bacterial, archaeal, and eucaryotic genomes (http://www.ncbi.nlm.nih.gov, and http://www.tigr.org). However, this technique faces limitations in both throughput and cost for most future applications. Many research groups around the world have put effort into the development of alternative principles for DNA sequencing. Three methods that hold great promise are sequencing by hybridization (Bains and Smith 1988; Drmanac et al. 1989; Khrapko et al. 1989; Southern 1989), parallel signature sequencing based on ligation and cleavage (Brenner et al. 2000), and pyrosequencing (Ronaghi et al. 1996, 1998b). Pyrosequencing has been successful for both confirmatory sequencing and de novo sequencing. This technique has not been used for genome sequencing due to the limitation in the read length, but it has been employed for applications such as genotyping (Ahmadian et al. 2000a; Alderborn et al. 2000; Ekström et al. 2000; Nordström et al. 2000b), resequencing of disease genes (Garcia et al. 2000), and sequence determination of difficult secondary DNA structure (Ronaghi et al. 1999). This article reviews the historical and technical aspects of the technique with regard to general principles, different strategies, application of the technique to different formats, and instrumentation. The performance of the technique in different applications is also discussed.

E-MAIL [email protected]; FAX (650) 812-1975.
Article and publication are at www.genome.org/cgi/doi/10.1101/gr.150601.
11:3–11 ©2001 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/01 $5.00; www.genome.org

Pyrosequencing

Pyrosequencing is a DNA sequencing technique that is based on the detection of released pyrophosphate (PPi) during DNA synthesis. In a cascade of enzymatic reactions, visible light is generated that is proportional to the number of incorporated nucleotides (Fig. 1). The cascade starts with a nucleic acid polymerization reaction in which inorganic PPi is released as a result of nucleotide incorporation by polymerase. The released PPi is subsequently converted to ATP by ATP sulfurylase, which provides the energy to luciferase to oxidize luciferin and generate light. Because the added nucleotide is known, the sequence of the template can be determined. The nucleic acid molecule can be either RNA or DNA. However, because DNA polymerases show higher catalytic activity than RNA polymerases for limited nucleotide extension, efforts have been focused on the use of a primed DNA template for pyrosequencing.

Standard pyrosequencing uses the Klenow fragment of Escherichia coli DNA Pol I, which is a relatively slow polymerase (Benkovic and Cameron 1995). The ATP sulfurylase used in pyrosequencing is a recombinant version from the yeast Saccharomyces cerevisiae (Karamohamed et al. 1999a) and the luciferase is from the American firefly Photinus pyralis. The overall reaction from polymerization to light detection takes place within 3–4 sec at room temperature. One pmol of DNA in a pyrosequencing reaction yields 6 × 10¹¹ ATP molecules which, in turn, generate more than 6 × 10⁹ photons at a wavelength of 560 nanometers. This amount of light is easily detected by a photodiode, photomultiplier tube, or a charge-coupled device (CCD) camera.

There are two different pyrosequencing strategies that are currently available: solid-phase pyrosequencing (Ronaghi et al. 1996) and liquid-phase pyrosequencing (Ronaghi et al. 1998b). Solid-phase pyrosequencing (Fig. 2) utilizes immobilized DNA in the three-enzyme system described previously. In this system a washing step is performed to remove the excess substrate after each nucleotide addition. In liquid-phase pyrosequencing (Fig. 3), apyrase, a nucleotide-degrading enzyme from potato, is introduced to make a four-enzyme system. Addition of this enzyme has eliminated the need for solid support and intermediate washing, thereby enabling the pyrosequencing reaction to be performed in a single tube. These formats are described in detail in this review.

Figure 1. The general principle behind different pyrosequencing reaction systems. A polymerase catalyzes incorporation of nucleotide(s) into a nucleic acid chain. As a result of the incorporation, a pyrophosphate (PPi) molecule(s) is released and subsequently converted to ATP by ATP sulfurylase. Light is produced in the luciferase reaction during which a luciferin molecule is oxidized.

Figure 2. Schematic representation of the progress of the enzyme reaction in solid-phase pyrosequencing. The four different nucleotides are added stepwise to the immobilized primed DNA template and the incorporation event is followed using the enzymes ATP sulfurylase and luciferase. After each nucleotide addition, a washing step is performed to allow iterative addition.

Figure 3. Schematic representation of the progress of the enzyme reaction in liquid-phase pyrosequencing. Primed DNA template and the four enzymes involved in liquid-phase pyrosequencing are placed in a well of a microtiter plate. The four different nucleotides are added stepwise and incorporation is followed using the enzymes ATP sulfurylase and luciferase. The nucleotides are continuously degraded by the nucleotide-degrading enzyme, allowing addition of the subsequent nucleotide. dXTP indicates one of the four nucleotides.

History

The theory behind sequencing-by-synthesis was described in 1985 (Melamede 1985) and, based on this principle, detection of pyrophosphate was used in DNA sequencing (Hyman 1988). Efforts were also put into the development of this principle for sequence determination using labeled nucleotides (Canard and Sarfati 1994; Cheesman 1994; Metzker et al. 1994; Rosenthal 1989; Tsien et al. 1991). However, Metzker et al. (1994) showed that the incorporation efficiency of labeled nucleotides is low, causing nonsynchronized extension, which made it difficult to sequence more than a few bases. Synchronized extension in sequencing-by-synthesis requires exonuclease-deficient (exo⁻) DNA polymerase and unmodified nucleotides. We used coupled enzymatic reactions, which had been used earlier to assay polymerase activity (Nyren 1987), to monitor stepwise DNA synthesis using exo⁻ polymerase and unlabeled nucleotides (pyrosequencing). However, false signals were always observed when dATP was added into the pyrosequencing solution (Ronaghi et al. 1996).

The first major improvement was substitution of dATPαS for dATP in the polymerization reaction, which enabled the pyrosequencing reaction to be performed in homogeneous phase in real time (Ronaghi et al. 1996). It was later shown that the nonspecific signals were attributable to the fact that dATP is a substrate for luciferase. Conversely, dATPαS was found to be inert for luciferase, yet could be incorporated efficiently by all DNA polymerases tested (Ronaghi et al. 1996). This strategy was used successfully for sequencing of PCR-generated DNA material (Ronaghi et al. 1996).

The second improvement was the introduction of apyrase to the reaction to make a four-enzyme system (Ronaghi et al. 1998b). The addition of apyrase allowed nucleotides to be added sequentially without any intermediate washing step. This enzyme shows high catalytic activity, and low amounts of it in the pyrosequencing reaction system efficiently degrade the unincorporated nucleoside triphosphates to nucleoside diphosphates and subsequently to nucleoside monophosphates. Apyrase is less inhibited by its products than other nucleotide-degrading enzymes.

Most recently, the addition of ssDNA-binding protein to the pyrosequencing reaction system has simplified the optimization of different parameters in pyrosequencing. This protein has proven to be useful for long read sequencing and sequencing of difficult templates, as well as providing flexibility in primer design (Ronaghi 2000).

Template Preparation for Pyrosequencing

Template preparation for pyrosequencing is straightforward. After generation of the template by PCR, the product should be purified prior to pyrosequencing. Unincorporated nucleotides and PCR primers in the PCR reaction perturb the pyrosequencing reaction. The salt in the PCR reaction slightly inhibits the enzyme system and should be removed or diluted. Two strategies currently available for generation of a primed DNA template for pyrosequencing are described below.

Solid-Phase Template Preparation

Streptavidin-coated magnetic beads have been used to prepare primed DNA template for pyrosequencing. This technology enables biotinylated PCR product to be captured onto magnetic beads. After sedimentation, the remaining components of the PCR reaction can be removed by washing to obtain pure double-stranded DNA, followed by alkali denaturation to yield ssDNA. Both the immobilized biotinylated strand and the nonbiotinylated strand in solution can be used as pyrosequencing templates (Ronaghi et al. 1998a, 1999). This template preparation system has given high-quality sequence data with low background signals.

Enzymatic Template Preparation

Recently, enzymatic template preparation was developed for sequencing on double-stranded DNA template (Nordström et al. 2000a,b). This template preparation method employs a nucleotide-degrading enzyme and exonuclease I. The enzymes are added to the PCR product and the mixture is incubated at room temperature or 35°C. During this step, the nucleotide-degrading enzyme removes the nucleotides and exonuclease I degrades the PCR primers remaining from the amplification step. The sequencing primer is dispensed into the treated mixture and the temperature of the solution is increased to heat-inactivate the enzymes. Template/primer complexes are formed by rapid cooling of the solution. Two different enzyme systems can be used. The use of alkaline phosphatase from shrimp or calf intestine together with exonuclease I allows the template to be prepared within 20 min, whereas a combination of a low amount of apyrase, inorganic PPi, and exonuclease I enables the template to be prepared in three min. High quality pyrosequencing data has been obtained by enzymatic template preparation using a prototype pyrosequencing system that employs a very sensitive light detector. However, this template preparation method needs to be further optimized for use with the standard system that uses microtiter plates, because the dilution that is required to compensate for incompatible buffer systems results in low amounts of primed DNA template. Improvements may be obtained by running PCR in a buffer compatible with the pyrosequencing reaction or by using a more sensitive CCD camera in the pyrosequencing machine.

Pyrosequencing Enzyme Systems

Pyrosequencing takes advantage of the cooperativity of several enzymes to monitor DNA synthesis. Parameters such as stability, fidelity, specificity, sensitivity, KM, and kcat (Table 1) are of utmost importance for the optimal performance of the enzymes used in the reaction (Ronaghi 1998). The kinetics of the enzymes can be studied in real time by following the pyrosequencing signals (a pyrogram). The slope of the ascending curve in a pyrogram (Figs. 4 and 6) is determined mainly by the activities of polymerase and ATP sulfurylase; the height of the signal is determined by the activity of luciferase, and the slope of the descending curve by the efficiency of nucleotide removal. In the solid-phase system using microfluidics, which employs the three-enzyme system, the descending curve is determined by the washing efficiency. In the four-enzyme system of liquid-phase pyrosequencing, the accumulation of inhibitory substances decreases the efficiency of luciferase and apyrase. In both systems, the activity of ATP sulfurylase is relatively constant during the sequencing reaction. In pyrosequencing, the most critical reactions are DNA polymerization and nucleotide removal by either washing or enzymatic degradation. Nucleotide removal (descending curve) competes with the polymerization reaction (ascending curve). Therefore, slight changes in the kinetics of these reactions directly influence the performance of the sequencing reaction.

Polymerization Reaction

An excess amount of DNA polymerase relative to DNA template in the pyrosequencing reaction ensures that the primed DNA template is bound efficiently by polymerase and that, at the time of nucleotide addition, polymerization takes place immediately. To obtain rapid polymerization, the nucleotide concentration must be above the KM of the DNA polymerase (Table 1).

Table 1. Kinetic Data of Enzymes Involved in Pyrosequencing

Enzyme                  KM (µM)                  kcat (s⁻¹)
Klenow polymerase (a)   0.18 (dTTP)              0.92
ATP sulfurylase (b)     0.56 (APS); 7.0 (PPi)    38
Firefly luciferase (c)  20 (ATP)                 0.015
Apyrase (d)             120 (ATP); 260 (ADP)     500 (ATP)

(a) Van Draanen et al. 1992. (b) Nyren and Lundin 1985. (c) DeLuca and McElroy 1984. (d) Traverso-Cori et al. 1965.
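The requirement that the nucleotide concentration sit above the KM of the polymerase can be illustrated numerically with the values in Table 1. The sketch below is not taken from the review: it assumes simple Michaelis-Menten kinetics, v = kcat·[S]/(KM + [S]), and the function names and the example concentrations are hypothetical choices made only for illustration. The apyrase constants in Table 1 are given for ATP, so using them for a dNTP is a further assumption.

```python
# Illustrative sketch only. Assumes simple Michaelis-Menten kinetics,
# v = kcat * [S] / (KM + [S]); the review itself does not state this model.

# KM (µM) and kcat (1/s) copied from Table 1.
KLENOW = {"km_uM": 0.18, "kcat_per_s": 0.92}     # substrate: dTTP
APYRASE = {"km_uM": 120.0, "kcat_per_s": 500.0}  # substrate: ATP (assumed similar for dNTPs)


def turnover_rate(conc_uM, km_uM, kcat_per_s):
    """Turnover per enzyme molecule (1/s) under the Michaelis-Menten assumption."""
    return kcat_per_s * conc_uM / (km_uM + conc_uM)


def fraction_of_max(conc_uM, km_uM):
    """Fraction of kcat reached at a given substrate concentration."""
    return conc_uM / (km_uM + conc_uM)


if __name__ == "__main__":
    # Hypothetical nucleotide concentrations (µM), spanning 0.1x to 100x the Klenow KM.
    for conc in (0.018, 0.18, 1.8, 18.0):
        pol = fraction_of_max(conc, KLENOW["km_uM"])
        apy = fraction_of_max(conc, APYRASE["km_uM"])
        print(f"{conc:7.3f} µM  polymerase at {pol:6.1%} of kcat, apyrase at {apy:6.2%} of kcat")
```

Under these assumptions, a nucleotide concentration ten times the Klenow KM (1.8 µM) brings the polymerase to about 91% of its maximal rate while apyrase, with its much higher KM, runs at roughly 1.5% of its own maximum. Once the concentration falls to KM the polymerase is at half its maximal rate, which is why incorporation needs to be essentially complete before the nucleotide is degraded that far.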



Conversely, if the concentration of the nucleotides is too high, lower fidelity of the polymerase is observed (Eckert and Kunkel 1990; Cline et al. 1996), even though the KM for misincorporation is much higher than that of correct incorporation (Gillin and Nossal 1976; Topal et al. 1980; Capson et al. 1992). A high fidelity can be achieved by using polymerases with inherent exonuclease activity; however, this has the disadvantage that primer degradation can occur, causing out-of-phase signals. Although the exonuclease activity of Klenow polymerase is relatively low, it has been found that the 3′ end of the primer was degraded during long incubations in the absence of nucleotides. Even without exonuclease activity, an induced-fit binding mechanism in the polymerization step (Wong et al. 1991) provides a very high selectivity for the correct nucleotide, with a fidelity of 10⁵–10⁶ when the nucleotides are added slightly above the KM. In pyrosequencing, exo⁻ polymerases, such as exo⁻ Klenow or Sequenase, catalyze the incorporation of a nucleotide only in the presence of a complementary nucleotide, confirming the high fidelity of these enzymes even in the absence of proof-reading exonuclease activity. The KM and kcat for one-base incorporation are lower than those for the incorporation of several bases for most polymerases (Van Draanen et al. 1992). However, the KM values for nucleotides are much lower for DNA polymerases than for apyrase (Table 1). Therefore, an increased fidelity in the system can be obtained, as the nucleotide concentration necessary for efficient polymerization is relatively low, and apyrase degrades nucleotides to a concentration far below the KM of the polymerase in less than five sec.

Detection Enzymes

On successful polymerization, a proportional amount of PPi is released. ATP sulfurylase converts PPi to ATP in ∼1.5 sec and the generation of light by luciferase takes place in <0.2 sec. In the four-enzyme system, accumulation of AMP and dAMPαS inhibits the luciferase activity. Kinetics of the enzymes in the detection reaction can be followed by the addition of a known amount of PPi to the pyrosequencing enzyme system.

Nucleotide Removal

To allow iterative addition, nucleotides must be removed from the pyrosequencing reaction. In the four-enzyme system, nucleotides are removed enzymatically. The nucleotide-degrading enzyme must possess the following properties: First, the enzyme must hydrolyze all deoxynucleoside triphosphates at approximately the same rate; second, it must hydrolyze ATP to prevent the accumulation of ATP between cycles; third, nucleotide degradation by the nucleotide-degrading enzyme must be slower than nucleotide incorporation by the polymerase. It is also important that the yield of primer-directed incorporation is as close to 100% as possible before the nucleotide-degrading enzyme has degraded the nucleotide to a concentration below the KM of the polymerase (Table 1). Pyrograms obtained from liquid-phase pyrosequencing (Fig. 4) show that apyrase fulfils the criteria described above. A constant signal intensity for each base incorporation is obtained during the course of a reaction, indicating the high efficiency of this system (Fig. 4). In solid-phase pyrosequencing (three-enzyme system), washing in a controlled manner should show the same advantages offered by apyrase in the four-enzyme system. An additional advantage of the three-enzyme system is that no accumulation of inhibitory substances will be observed, because washing is performed between each nucleotide addition.

Figure 4. Pyrogram of the raw data obtained from liquid-phase pyrosequencing. Proportional signals are obtained for one, two, three, and four base incorporations. Nucleotide addition, according to the order of nucleotides, is indicated below the pyrogram and the obtained sequence is indicated above the pyrogram.

Extending the Read Length of Pyrosequencing

For many applications such as genome sequencing and gene sequencing, a long read is desirable. Several criteria must be met to obtain a long read length in pyrosequencing: (1) The enzyme system must be stable; (2) there must be a low misincorporation; and (3) nucleotide extension must be synchronized. The enzyme system has been shown to be stable in its buffer system during the sequencing reaction, as relatively constant signal intensity is obtained for each individual nucleotide (Fig. 4). In the four-enzyme system, removal of inhibitory substances from the reaction and minimizing the dilution effect give rise to 200 nucleotide reads (Ronaghi et al. 2000). The use of unlabeled nucleotides, addition of nucleotide in a concentration slightly above the KM, and rapid removal of nucleotides from the solution increase the fidelity of DNA synthesis. Although a relatively high signal-to-noise ratio is obtained in pyrosequencing, misincorporation may play an important role in limiting longer reads. Possible misincorporation terminates the primer strands, which results in decreased signal intensity in the course of a reaction.

Nonsynchronized extensions are either a result of minus frame shift (when some of the primer strands get one, or a few, nucleotides behind the other synchronized primer strands during extension) or plus frame shift (when some of the primer strands get one, or a few, nucleotides ahead of other synchronized primer strands during extension). Using exo⁻ DNA polymerase reduces the minus frame shift. Insufficient exposure of nucleotides to polymerase can cause minus frame shift, which is sometimes observed in long homopolymeric regions. In the four-enzyme system of pyrosequencing, apyrase degrades the nucleotides below the KM for polymerase, not allowing enough time for the polymerase to complete the polymerization of these regions. The use of lower amounts of apyrase, or a second addition of the same nucleotide, solves this problem. Improvements can also be obtained by the use of ssDNA-binding protein in the pyrosequencing reaction solution (Ronaghi 2000). In a microfluidic format, complete incorporation can be controlled easily by a delay in washing.

Plus frame shift is mainly a problem for the four-enzyme system of pyrosequencing and normally is caused by enzyme contaminants or inefficient nucleotide degradation. A contaminating enzyme such as nucleoside diphosphate kinase, which normally is found in commercially available ATP sulfurylase and apyrase, converts the nucleoside diphosphate to nucleoside triphosphate (Karamohamed et al. 1999b), a substrate for polymerase. Another parameter causing plus frame shift is inefficient degradation of nucleotides by apyrase, which usually is seen in later cycles of pyrosequencing due to the accumulation of inhibitory substances. Further work is underway to remove the inhibitory substances from the reaction system either by purification or enzymatic degradation.

Another factor reducing the efficiency of the four-enzyme system is the dilution effect. In the four-enzyme system, the nucleotides are iteratively added to the pyrosequencing solution, thereby increasing the reaction volume at each step. Although the volume of nucleotides added is as little as 200 nL/min, dilution can be seen in long-read sequencing. Dilution lowers the enzyme concentrations, thereby decreasing their efficiency. Possible improvements include reducing the volume of nucleotide delivery or running the reaction at higher temperatures to increase evaporation.

Challenges for Pyrosequencing Technology

An inherent problem with the described method is de novo sequencing of polymorphic regions in heterozygous DNA material. In most cases, it will be possible to detect the polymorphism. If the polymorphism is a substitution, it will be possible to obtain a synchronized extension after the substituted nucleotide. If the polymorphism is a deletion or insertion of the same kind as the adjacent nucleotide on the DNA template, the sequence after the polymorphism will be synchronized. However, if the polymorphism is a deletion or insertion of another type, the sequencing reaction can become out of phase, making the interpretation of the subsequent sequence difficult. If the polymorphism is known, it is always possible to use programmed nucleotide delivery to keep the extension of different alleles synchronized after the polymorphic region. It is also possible to use a bidirectional approach (Ronaghi et al. 1999) whereby the complementary strand is sequenced to decipher the sequence flanking the polymorphism.

Another inherent problem is the difficulty in determining the number of incorporated nucleotides in homopolymeric regions, due to the nonlinear light response following incorporation of more than 5–6 identical nucleotides. The polymerization efficiency over homopolymeric regions has been investigated and the results indicate that it is possible to incorporate ≤10 identical adjacent nucleotides in the presence of apyrase (Ronaghi 2000). However, to elucidate the correct number of incorporated nucleotides, it may be necessary to use specific software algorithms that integrate the signals. For resequencing, it is possible to add the nucleotide twice for a homopolymeric region to ensure complete polymerization.

Applications of Pyrosequencing

Pyrosequencing has opened up new possibilities for performing sequence-based DNA analysis. The availability of an automated system for liquid-phase pyrosequencing (PSQ 96 system, http://www.pyrosequencing.com) has allowed the technique to be adapted for high-throughput analyses. This section describes some of the potential applications of pyrosequencing.

Genotyping of Single-Nucleotide Polymorphisms

For analysis of single-nucleotide polymorphisms (SNPs) by pyrosequencing, the 3′-end of a primer is designed to hybridize one or a few bases before the polymorphic position. In a single tube, all the different variations can be determined as the region is sequenced. A striking feature of pyrogram readouts for SNP analysis is the clear distinction between the various genotypes; each allele combination (homozygous or heterozygous) will give a specific pattern compared with the two other variants (Ahmadian et al. 2000a; Alderborn et al. 2000; Ekström et al. 2000; Nordström et al. 2000b). This feature makes typing extremely accurate and easy. Relative standard deviation values for the ratio between key peaks of the respective SNPs and reference counterparts are ≤0.1 (Alderborn et al. 2000). Simple manual comparison of predicted SNP patterns and the raw data obtained from the PSQ 96 system can score an SNP, especially as no editing is needed. Because specific patterns can be readily achieved for the individual SNPs, it will also be possible to automatically score the allelic status by pattern recognition software. In a study based on results from three different laboratories, 26 different SNPs and >1600 DNA samples were analyzed. The algorithm classified the data from 94% of the samples as good or medium quality and 99.4% of these were automatically assigned the expected genotypes (B. Ekström, pers. comm.). The major reason for low quality data was insufficient signal/noise, typically caused by low efficiency in PCR amplification. As pyrosequencing signals are very quantitative, it is possible to use this strategy for studies of allelic frequency in large populations. This system allows >5000 samples to be analyzed in 8 h. Furthermore, pyrosequencing enables determination of the phase of SNPs when they are in the vicinity of each other, allowing the detection of haplotypes (Ahmadian et al. 2000b).

Microbial Typing

DNA markers used for typing normally contain both conserved and variable regions. A DNA primer complementary to the conserved or semiconserved region is usually employed to sequence the variable region. In bacteria, the 16S rRNA gene is commonly used to identify different species and strains. By analyzing a sequence of between 20 and 100 nucleotides on the 16S rRNA gene, it is possible to taxonomically group different bacteria and, in many cases, to get information about the strains. Pyrosequencing is now being applied for rapid typing of a large number of bacteria, yeasts, and viruses (B. Gharizadeh, pers. comm.).

Resequencing

Pyrosequencing is currently the fastest method for sequencing a PCR product. Because pyrosequencing generates an accurate quantification of the mutated nucleotides, the resequencing of PCR-amplified disease genes for mutation scanning will be one of the more interesting applications. Using this technique for resequencing results in longer read length than de novo sequencing because nucleotide delivery can be specified according to the order of the sequence. Programmed dispensing generates a signal for each addition in a pyrogram; therefore, variation in the pattern indicates the appearance of a mutation. This strategy has been used for resequencing of the p53 tumor suppressor gene, where mutations were successfully determined and quantified (Garcia et al. 2000).

Tag Sequencing

The sequence order of nucleotides determines the nature of the DNA. Theoretically, eight or nine nucleotides in a row should define a unique sequence for every gene in the human genome. However, it has been found that to uniquely identify a gene from a complex organism such as human, a longer sequence of DNA is needed. In a pilot study, it was found that 98% of genes in a human cDNA library could be uniquely identified by sequencing a length of 30 nucleotides. Pyrosequencing was used to sequence this length for gene identification from a human cDNA library and the results were in complete agreement with longer sequence data obtained by Sanger DNA sequencing. Pyrosequencing offers high-throughput analysis of cDNA libraries because 96 samples can be analyzed in less than one hour. Like Sanger DNA sequencing, pyrosequencing also has the advantage of library screening, as the original cDNA clone is directly available for further analysis.

Analysis of Difficult Secondary Structures

Hairpin structures are common features in genomic material and have been proposed to have regulatory functions in gene transcription and replication. However, analyzing these sequences by conventional DNA sequencing usually gives rise to DNA sequence ambiguities seen as “run-off” or compressions. These problems have been associated with gel electrophoresis. Pyrosequencing was successfully applied to decipher the sequence of such regions (Ronaghi et al. 1999). Klenow DNA polymerase was used in these studies, in which relatively high strand-displacement activity in reading through these structures was shown.

Instrumentation

Automation Based on Microtiter Plate Format

An automated version of a pyrosequencing machine was recently developed (http://www.pyrosequencing.com). The automated version uses a disposable inkjet cartridge for precise delivery of small volumes (200 nL) of six different reagents into a temperature-controlled microtiter plate (Fig. 5). The microtiter plate is under continuous agitation to increase the rate of the reactions. A lens array is used to efficiently focus the generated luminescence from each individual well of the microtiter plate onto the chip of a CCD camera. Nucleotides are dispensed into alternating wells with a delay to minimize the crosstalk of generated light between different wells. A cooled CCD camera images the plate every second to follow the exact process of the pyrosequencing reaction. Data acquisition modules and an interface for PC connection are used in this instrument. Software running under Windows NT enables individual control of the dispensing order for each well. Prior to pyrosequencing, the reagents and each of the four nucleotides are loaded into the inkjet cartridge that is mounted in the instrument. A microtiter plate containing primed DNA template is placed into the pyrosequencing machine, and after the enzymes and substrate have been delivered by the inkjet, nucleotides are added to the solution according to the specified order. The signals in a pyrogram (Fig. 4) show high quality sequence data with high signal-to-noise ratio, with the height of the peaks proportional to the number of incorporated nucleotides. A high-throughput version of this machine is also under development, which will allow the analysis of ≤50,000 SNPs per day (B. Ekström, pers. comm.).

Figure 5. Schematic drawing of the automated system for liquid-phase pyrosequencing. Four dispensers move on an X-Y robotics arm over the microtiter plate and add four different nucleotides, according to the prespecified order. The microtiter plate is agitated continuously to mix the added nucleotide. Generated light is directed to the CCD camera using a lens array located exactly below the microtiter plate.

Microfluidics Using Solid-Phase Pyrosequencing

Running pyrosequencing on solid-phase in a microfluidic system offers several advantages compared with other described formats: (1) Sequencing can be performed faster because the cycling time for each nucleotide addition can be reduced; (2) there is no accumulation of inhibitory substances because washing is performed after each cycle; (3) it is possible to use lower amounts of enzymes to reduce the cost; and (4) integration of PCR amplification, template preparation, and pyrosequencing analysis in a single flow system can be envisaged. Using this format, DNA templates are immobilized on a solid support that enables iterative washing. Eventually it may be possible to immobilize the detection enzymes (ATP sulfurylase and luciferase) onto a solid support to further reduce enzyme consumption. Pyrosequencing in a microfluidic format has the potential to be used for very long reads because no accumulation of inhibitory substances will occur. Different microfluidic formats are currently being tested for pyrosequencing analysis. The pyrogram in Figure 6 demonstrates promising sequencing data with a relatively high signal-to-noise ratio that was obtained in a centrifugal force-driven compact disc microfluidic device (Eckersten et al. 2000).

Figure 6. Pyrogram obtained from five different chambers using microfluidics in a spinning CD-format device developed at Gyros AB (Sweden). All chambers contained the same DNA template. The addition order of nucleotides was GCTA. Nucleotides G, T, and G were correctly incorporated. Background signals for nucleotides C and A were due to pyrophosphate (PPi) contamination in the pyrosequencing reaction mixture.

Array Pyrosequencing

Developments in microarray technology have opened new possibilities for sequence-based analysis. The major advantages of these formats are low cost and high throughput. Pyrosequencing can be applied on both ordered and random arrays. In the latter array format, pyrosequencing data provides high amounts of information to reveal the sequence of the DNA template to be analyzed, eliminating the decoding step. A similar strategy can be applied to tag sequencing of a cDNA library immobilized on a solid surface using a common sequencing primer. A system has been built that employs a nucleotide delivery module, a DNA array, and a CCD camera (M. Ronaghi, N. Pourmand, M. Jain, T. Willis, and R.W. Davis, in prep.). A piezoelectric ultrasonic sprayer was recently developed to enable homogeneous delivery of nucleotide onto a microarray. A single sprayer is used to deliver all four different nucleotides, with washing of the nozzle between each delivery. Data from single-base extension on an oligonucleotide template attached to a glass slide has been obtained, showing the feasibility of the pyrosequencing enzymatic reactions in this format. It should be noted that in this chemiluminescence assay the energy available for detection is proportional to the amount of template in the reaction, whereas in a fluorescence assay the energy available for detection can be increased through the use of more powerful lasers. Consequently, sensitive detection systems must be employed to allow detection of miniature pyrosequencing reactions. Imaging a pyrosequencing reaction onto currently available CCD technology, we believe that the smallest detectable reaction should contain ≥5000


template molecules. Further optimization needs to be performed in terms of diffusion and variability of the amount of available primed DNA templates before application of such a format for reliable high throughput DNA sequencing.

Software for Pyrogram Analysis

Specialized software for SNP analysis of pyrograms obtained from liquid-phase pyrosequencing has been developed (http://www.pyrosequencing.com). This software enables analyses of selected wells in a microtiter plate and automatically performs genotyping as well as quality assessment of the raw data utilizing a novel SNP genotyping algorithm. Based on pattern recognition, this algorithm automatically scores the genotype and calculates a quality value for each SNP scored. The assignment of quality values is based on a number of different parameters, including difference in match between the best and next best choice of genotypes, agreement between expected and obtained sequence around the SNP, signal-to-noise ratios, variance in peak heights around the SNP, and peak width. This software has also been used for other applications such as EST sequencing, microbial typing, and confirmatory sequencing; however, until now the base-calling has been performed manually. Specialized software for pyrosequencing of longer reads is currently under development with automatic base-calling (B. Ekström, pers. comm.).

Future Trends

Genome sequencing provides tremendous amounts of information that can be used in several different areas of biology. Comparative sequencing will dominate DNA sequencing to identify variations across those genomes that have been sequenced. Technologies with high accuracy for the identification of these variations in genome-wide scanning will have great value. Pyrosequencing has shown excellent accuracy for analysis of polymorphic DNA fragments. This technology has also been used for quantification of allelic frequency in populations. While the variations are characterized, correlation of variation to phenotype can be performed. Pyrosequencing will have a large impact in that area because a large number of samples can be pooled in one pyrosequencing reaction. A high throughput version of this technology can potentially be used for resequencing of genomes.

Pyrosequencing technology is relatively new and there is much room for developments in both chemistry and instrumentation. The technology is already time- and cost-competitive (the cost is currently 69 cents per sample using standard pyrosequencing; http://www.pyrosequencing.com) when compared with the existing sequencing methods. Work is underway to further improve the chemistry, to measure the sequencing efficiency at elevated temperatures, and to run the reaction in miniaturized formats. The advantage of pyrosequencing in miniaturized formats may lie in the ease with which large numbers of high-density arrays can be manufactured and the future integration of sample preparation with these devices. Success in miniaturization of this technique into high density microtiter plates, microarrays, or microfluidics will reduce the cost and increase the throughput by one to two orders of magnitude, a crucial step for large scale genetic testing.

ACKNOWLEDGMENTS

The author is supported by an NIH grant. I thank Ronald Davis for valuable discussions, Guri Giaever, Joakim Lundeberg, Paul Hardenbol, Thomas Willis, Pål Nyrén, and Björn Ekström for valuable comments on this manuscript. I also thank Baback Gharizadeh, Afshin Ahmadian, Nader Pourmand, and Nigel Tooke for sharing their results on pyrosequencing.

REFERENCES

Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyren, P., Uhlen, M., and Lundeberg, J. 2000a. Single-nucleotide polymorphism analysis by Pyrosequencing. Anal. Biochem. 280: 103–110.
Ahmadian, A., Lundeberg, J., Nyren, P., Uhlen, M., and Ronaghi, M. 2000b. Analysis of the p53 tumor suppressor gene by pyrosequencing. BioTechniques 28: 140–144.
Alderborn, A., Kristofferson, A., and Hammerling, U. 2000. Determination of single nucleotide polymorphisms by real-time pyrophosphate DNA sequencing. Genome Res. 10: 1249–1258.
Bains, W. and Smith, G.C. 1988. A novel method for nucleic acid sequence determination. J. Theoret. Biol. 135: 303–307.
Benkovic, S.J. and Cameron, C.E. 1995. Kinetic analysis of nucleotide incorporation and misincorporation by Klenow fragment of Escherichia coli DNA polymerase I. Methods Enzymol. 262: 257–269.
Brenner, S., Williams, S.R., Vermaas, E.H., Storck, T., Moon, K., McCollum, C., Mao, J.I., Luo, S., Kirchner, J.J., Eletr, S., et al. 2000. In vitro cloning of complex mixtures of DNA on microbeads: Physical separation of differentially expressed cDNAs. Proc. Natl. Acad. Sci. 97: 1665–1670.
Canard, B. and Sarfati, R.S. 1994. DNA polymerase fluorescent substrates with reversible 3′-tags. Gene 148: 1–6.
Capson, T.L., Peliska, J.A., Kaboord, B.F., Frey, M.W., Lively, C., Dahlberg, M., and Benkovic, S.J. 1992. Kinetic characterization of the polymerase and exonuclease activities of the gene 43 protein of bacteriophage T4. Biochemistry 31: 10984–10994.
Cheesman, P.C. 1994. Method for sequencing polynucleotides. US Patent no. 5302509.
Cline, J., Braman, J.C., and Hogrefe, H.H. 1996. PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24: 3546–3551.
DeLuca, M. and McElroy, W.D. 1984. Two kinetically distinguishable ATP sites in firefly luciferase. Biochem. Biophys. Res. Commun. 123: 764–770.
Drmanac, R., Labat, I., Brukner, I., and Crkvenjakov, R. 1989. Sequencing of megabase plus DNA by hybridization: Theory of the method. Genomics 4: 114–128.
Eckersten, A., Örlefors, A.E., Ellström, C., Erickson, A., Löfman, E., Eriksson, A., Eriksson, S., Jorsback, A., Tooke, N., Derand, H., et al. 2000. High-throughput SNP scoring in a disposable microfabricated CD device. In Proceedings of the Micro Total Analysis Systems. Kluwer Academic Publishers.
Eckert, K.A. and Kunkel, T.A. 1990. High fidelity DNA synthesis by the Thermus aquaticus DNA polymerase. Nucleic Acids Res. 18: 3739–3744.


Ekstrom, B., Alderborn, A., and Hammerling, U. 2000. Pyrosequencing for SNPs. Progress in biomedical optics 1: 134–139.
Garcia, A.C., Ahmadian, A., Gharizadeh, B., Lundeberg, J., Ronaghi, M., and Nyren, P. 2000. Mutation detection by Pyrosequencing: Sequencing of exons 5 to 8 of the p53 tumour suppressor gene. Gene 253: 249–257.
Gillin, F.D. and Nossal, N.G. 1976. Control of mutation frequency by bacteriophage T4 DNA polymerase. II. Accuracy of nucleotide selection by the L88 mutator, CB120 antimutator, and wild type phage T4 DNA polymerases. J. Biol. Chem. 251: 5225–5232.
Hyman, E.D. 1988. A new method of sequencing DNA. Anal. Biochem. 174: 423–436.
Karamohamed, S., Nilsson, J., Nourizad, K., Ronaghi, M., Pettersson, B., and Nyren, P. 1999a. Production, purification, and luminometric analysis of recombinant Saccharomyces cerevisiae MET3 adenosine triphosphate sulfurylase expressed in Escherichia coli. Prot. Exp. Purif. 15: 381–388.
Karamohamed, S., Nordstrom, T., and Nyren, P. 1999b. Real-time bioluminometric method for detection of nucleoside diphosphate kinase activity. BioTechniques 26: 728–734.
Khrapko, K.R., Lysov Yu, P., Khorlyn, A.A., Shick, V.V., Florentiev, V.L., and Mirzabekov, A.D. 1989. An oligonucleotide hybridization approach to DNA sequencing. FEBS Lett. 256: 118–122.
Melamede, R.J. 1985. Automatable process for sequencing nucleotide. US Patent no. US4863849.
Metzker, M.L., Raghavachari, R., Richards, S., Jacutin, S.E., Civitello, A., Burgess, K., and Gibbs, R.A. 1994. Termination of DNA synthesis by novel 3′-modified-deoxyribonucleoside 5′-triphosphates. Nucleic Acids Res. 22: 4259–4267.
Nordstrom, T., Nourizad, K., Ronaghi, M., and Nyren, P. 2000a. Methods enabling Pyrosequencing on double-stranded DNA. Anal. Biochem. 282: 186–193.
Nordstrom, T., Ronaghi, M., Forsberg, L., de Faire, U., Morgenstern, R., and Nyren, P. 2000b. Direct analysis of single-nucleotide polymorphism on double-stranded DNA by pyrosequencing. Biotechnol. Appl. Biochem. 31: 107–112.
Nyren, P. 1987. Enzymatic method for continuous monitoring of DNA polymerase activity. Anal. Biochem. 167: 235–238.
Nyren, P. and Lundin, A. 1985. Enzymatic method for continuous monitoring of inorganic pyrophosphate synthesis. Anal. Biochem. 151: 504–509.
Ronaghi, M. 1998. “Pyrosequencing: A tool for sequence-based DNA analysis.” Doctoral thesis, The Royal Institute of Technology, Stockholm, Sweden.
Ronaghi, M. 2000. Improved performance of Pyrosequencing using single-stranded DNA-binding protein. Anal. Biochem. 286: 282–288.
Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M., and Nyren, P. 1996. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 242: 84–89.
Ronaghi, M., Pettersson, B., Uhlen, M., and Nyren, P. 1998a. PCR-introduced loop structure as primer in DNA sequencing. BioTechniques 25: 876–884.
Ronaghi, M., Uhlen, M., and Nyren, P. 1998b. A sequencing method based on real-time pyrophosphate. Science 281: 363–365.
Ronaghi, M., Nygren, M., Lundeberg, J., and Nyren, P. 1999. Analyses of secondary structures in DNA by pyrosequencing. Anal. Biochem. 267: 65–71.
Ronaghi, M., Pourmand, N., Jain, M., Willis, T., and Davis, R. 2000. Pyrosequencing for genome resequencing. In 12th International Genome Sequencing and Analysis Conference, Miami, FL.
Rosenthal, A. 1989. Process for solid phase-sequencing of nucleic acid. US Patent no. US1985000761107.
Sanger, F., Nicklen, S., and Coulson, A.R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. 74: 5463–5467.
Southern, E.M. 1989. Analysing polynucleotide sequences. US Patent no. WO/10977.
Topal, M.D., DiGuiseppi, S.R., and Sinha, N.K. 1980. Molecular basis for substitution mutations. Effect of primer terminal and template residues on nucleotide selection by phage T4 DNA polymerase in vitro. J. Biol. Chem. 255: 11717–11724.
Traverso-Cori, A., Chaimovich, H., and Cori, O. 1965. Kinetic studies and properties of potato apyrase. Arch. Biochem. Biophys. 109: 173–181.
Tsien, R.Y., Ross, P., Fahnestock, M., and Johnston, A.J. 1991. Method for DNA sequencing. US Patent no. PCT WO 91/06678.
Van Draanen, N.A., Tucker, S.C., Boyd, F.L., Trotter, B.W., and Reardon, J.E. 1992. Beta-L-thymidine 5′-triphosphate analogs as DNA polymerase substrates. J. Biol. Chem. 267: 25019–25024.
Wong, I., Patel, S.S., and Johnson, K.A. 1991. An induced-fit kinetic mechanism for DNA replication fidelity: Direct measurement by single-turnover kinetics. Biochemistry 30: 526–537.
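
As an illustration of the pattern-recognition genotyping described under Software for Pyrogram Analysis, the following is a minimal Python sketch and not the algorithm of the software cited there, whose details are not given in this article. It relies on two statements from the text: peak height in a pyrogram is proportional to the number of incorporated nucleotides, and one quality parameter is the difference in match between the best and next best genotype. Everything else is an assumption made for the example: the function names, the use of a summed squared error as the match measure, the modeling of a heterozygote as the average of its two allele patterns, and the short example sequences and peak heights, which are invented.

```python
from __future__ import annotations


def predict_peaks(synthesized: str, dispensation: str) -> list[int]:
    """Ideal peak heights for one allele.

    `synthesized` is the strand to be built on the primed template, read in
    the direction of synthesis. Each dispensed nucleotide is incorporated as
    many times as it matches the next positions, so a homopolymer run gives
    one proportionally higher peak and a non-matching nucleotide gives 0.
    """
    peaks = []
    pos = 0
    for nucleotide in dispensation:
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        peaks.append(run)
    return peaks


def predict_genotype(alleles: tuple[str, str], dispensation: str) -> list[float]:
    """Assumed model: each allele contributes half of the template."""
    first = predict_peaks(alleles[0], dispensation)
    second = predict_peaks(alleles[1], dispensation)
    return [(a + b) / 2 for a, b in zip(first, second)]


def score_genotype(
    observed: list[float],
    candidates: dict[str, tuple[str, str]],
    dispensation: str,
) -> tuple[str, float]:
    """Return the best-matching genotype and a simple quality value.

    The quality value is the gap between the next best and the best squared
    error (an assumed stand-in for the 'difference in match' parameter).
    """
    distances = {
        name: sum(
            (o - p) ** 2
            for o, p in zip(observed, predict_genotype(alleles, dispensation))
        )
        for name, alleles in candidates.items()
    }
    ranked = sorted(distances, key=distances.get)
    best, runner_up = ranked[0], ranked[1]
    return best, distances[runner_up] - distances[best]


# Hypothetical A/G SNP at the third position of a five-base read.
CANDIDATES = {
    "A/A": ("TCAGT", "TCAGT"),
    "A/G": ("TCAGT", "TCGGT"),
    "G/G": ("TCGGT", "TCGGT"),
}
DISPENSATION = "TCAGT"
OBSERVED = [1.02, 0.97, 0.48, 1.55, 1.00]  # invented, normalized peak heights

if __name__ == "__main__":
    genotype, quality = score_genotype(OBSERVED, CANDIDATES, DISPENSATION)
    print(genotype, round(quality, 3))
```

With these invented peak heights the heterozygote pattern is the closest match, so the sketch reports "A/G". A real implementation would also have to weigh the other quality parameters listed in the text (signal-to-noise ratio, variance in peak heights, peak width), which this example ignores.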
