SCIENCE CHINA Life Sciences

January 2010 Vol.53 No.1: 44–57 doi: 10.1007/s11427-010-0023-6
Celebrating the 60th Anniversary of Scientia Sinica (SCIENCE CHINA)

· REVIEW ·

The next-generation sequencing technology: A technology review and future perspective

ZHOU XiaoGuang1†*, REN LuFeng1†, LI YunTao2, ZHANG Meng1, YU YuDe2 & YU Jun1*

1 Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China; 2 Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China

Received December 8, 2009; accepted December 16, 2009

As one of the most powerful tools in biomedical research, DNA sequencing has not only improved its productivity at an exponential rate over the past three decades but has also expanded into new technological territory in engineering and the physical sciences. In this technical review, we examine the technical characteristics of next-gen sequencers and offer perspectives on their future development and applications. We envisage that some of the emerging platforms will be capable of supporting the $1000-genome and $100-genome goals, given a few years of technical maturation. We also suggest that scientists from China should play an active role in this campaign, which will have a profound impact on both scientific research and societal healthcare systems.

genomics, DNA sequencing, next generation sequencing technologies, sequencer

Citation: Zhou X G, Ren L F, Li Y T, et al. The next-generation sequencing technology: A technology review and future perspective. Sci China Life Sci, 2010, 53: 44–57, doi: 10.1007/s11427-010-0023-6

1 Introduction propel creation and development of other branches of ge- nomic studies such as comparative genomics and bioinfor- matics as well as closely related fields such as systems bi- DNA sequencing technology has played an essential role in ology and synthetic biology. In a way, technological ad- the advancement of molecular biology ever since its inven- vancement in DNA sequencing has transformed the study of tion [1]. From early manual sequencing operation developed fundamental element of life – from individual, localized by Frederick Sanger, first-generation automated sequencer genes or fragment of genes to whole genomes, which in turn driven by Sanger , to present next-gen sequencing demands more competent sequencing technology. The syn- platforms, we have witnessed tremendous changes in this ergetic relationship between sequencing and its applications field [2]. Some even liken this change in genomic sequencing ensures that the trend will continue in foreseeable future and to the evolution of semiconductor technology [3]. This is not even accelerate due to the promise of and drive for person- totally unfounded – the speed of sequencing has improved alized medicine in disease diagnosis and treatment. Here, exponentially every few years over the last few decades, we provide a review of sequencing technology evolution, similar to what semiconductor industry has experienced summary of generational advancements with their merits under the Moore’s law [4]. This rapid transformation is and drawbacks, and prediction of possible direction of the captured in Figure 1 and has fundamentally changed the field. For ease of discussion, we categorize the progress of way we can examine the blue-print of all life and helps to sequencing technology into three generations with sub-

† Contributed equally to this work *Corresponding author (email:[email protected]; [email protected]) © Science China Press and Springer-Verlag Berlin Heidelberg 2010 life.scichina.com www.springerlink.com Zhou XiaoGuang, et al. Sci China Life Sci January (2010) Vol.53 No.1 45

time, to study in depth the genetic code of life. The original method was primarily a manual endeavor and hard to automate. For one, it utilized isotopic radioac- tive labeling of primer for DNA ladder imaging, making the sequencing process non-user friendly. The requirement of four separate chain-termination reactions with dideoxynu- cleotides (ddNTPs) and subsequent slab-gel based separa- tion of chain-terminated products on four individual elec- trophoretic lanes are both time- and reagent-consuming. All these severely limited the overall throughput of sequencing- hence, the desire to develop non-radioactive based 1st gen- eration sequencing technology.

2.1.1 G1.1 Figure 1 Sequencing technology timeline The initial version of 1st generation sequencer first appeared in mid 80s and developed in Leroy Hood’s laboratory at Cal generations as illustrated in Table 1. The designation of each Tech [6]. It made possible through modifications to the state of advancement is somewhat arbitrary but nevertheless Sanger’s method. The key improvement includes the use of it captures the key delineation of technological advances of color fluorescent dyes to replace radioactive labeling - four each period. dideoxynucleotide terminators are tagged with differently colored fluorophores. Furthermore, the tag is attached to the terminator molecule (ddNTPs) instead of the primer as in 2 A review of the technology and its recent de- the case of original Sanger’s method. The color-coded velopments scheme made it possible to perform all four chain-termina- tion reactions in one tube. Polyacrylamide gel analysis of 2.1 1st generation technology – fluorescently labeled ladder fragments can be performed through computerized sanger method fluorescence detection system. This greatly enhanced the overall sequencing speed and reduced manual intervention Before the appearance of first automated DNA sequencing required by operator during sequencing run. platform, widely accepted DNA sequencing method of In the following year, ABI introduced its first semi- choice had been the Sanger’s chain-termination method automated DNA sequencing platform, e.g. ABI 370 Se- developed in the mid 1970s, for which Sanger was awarded quencer, based on the technology from Leroy Hood’s lab [7]. the Nobel Chemistry Prize in 1980 [5]. His invention In the subsequent two decades, we had experienced a rapid opened a realm of possibility for researchers, for the first change and improvement of its performance. But the under-

Table 1 Roadmap of sequencing technologya)

Generation 1st-G 2nd-G 3rd-G Version 1.1 1.2 2.1 2.2 2.3 3.1 3.2 ABI/GenoME Sanger ABI MS SBS Illumina Complete Ge- SBL ABI/Polonator G.007 nomics SBP Roche FD Helicos Platform SM-SBS Pacific Biosciences/ ? FE VisiGen SM-SBL SM-SBP Pore PoC Nano Nife PoC Graphene PoC a) SBS, sequence-by-synthesis; SBL, sequence-by-ligation; SBP: sequence-by- pyrosequencing; SM: single molecule, FD: fixed DNA; FE: fixed en- zyme, PoC: proof-of-concept; ?: expected technology 46 Zhou XiaoGuang, et al. Sci China Life Sci January (2010) Vol.53 No.1 pinning working principle has not changed until very re- tions of sequencing platform. The reliability, raw accuracy, cently. scalability of the tried-and-true method will continue to play an important role, especially in sequencing PCR products 2.2.2 G1.2 and clone-ends of plasmids and bacterial artificial chromo- Toward the end of last century, the second version of the somes as well as genotyping for STR markers. 1st-generation technology appeared. With it, we see further enhancement in the speed and quality of DNA sequencing. 2.2 2nd Generation technologies – cyclic array sequenc- This was mainly achieved through improvements in two ing by Synthesis areas. First, slab-gel based separation was replaced by cap- illary-electrophoresis, and second, number of concurrent The so-called next generation sequencing methods encom- samples that can be analyzed was increased through higher pass a myriad of approaches based on different technology. parallelism. The use of capillary instead slab-gel eliminated Although utilized quite diverse techniques and biochemistry sample loading, reduced the reagent consumption and sped in each step from template library preparation, fragment up analysis. 
Further, the compact form of capillary device amplification, to sequencing, they all adopted a massive makes it easier to parallelize multiple sequencing runs, re- matrix configuration popularized by microarray analysis – sulting in higher instrument throughput; 96 samples on ABI DNA samples on the array are simultaneously analyzed in 3730 platform and 384 samples on Amersham MegaBACE parallel. Furthermore, sequencing is carried out by observ- could be achieved in one run. This generation of sequencers ing and recording optical events through microscopic played a pivotal role in DNA sequence production at later apparatus during iterative sequencing cycles - a serial ex- stage of the Human Genome Project and helped to accele- tension of primed template by either DNA polymerase [8] rate the project completion. They have been continuously or ligase [9]. used until this day due to key advantages in its raw data Several key characteristics can be easily observed based accuracy and sequence read length. on the general description. First, massive parallelism can be Through decades of gradual improvement, 1st-generation achieved through ordered or disordered array configuration sequencer can be applied to achieve sequencing length up to that offers high degree of information density. Theoretically, 1000 bp, with raw accuracy as high as 99.999%, at a cost as this is only limited by the diffraction limit of light (i.e., half little as $0.50/kilobase and throughput close to 600000 of the wavelength used for detection of independent optical bp/day. However impressive those numbers may seem, the events). This dramatically increases the overall throughput 1st-gen technology has reached its pinnacle both in terms of of the sequencing operation. Second, no electrophoresis is speed and cost. 
Reliance on electrophoretic separation has used, resulting in ease of miniaturization and less sam- rendered this approach difficult to further increase analysis ple/reagent consumption over the 1st-generation technology. speed, to achieve higher degree of parallelism, and to re- duce sequencing cost through miniaturization; hence, the 2.2.1 Next-gen Sequencer need for development of a completely new generation of All next-gen sequencing platforms follow a similar work approaches that overcome those limitations. flow as outlined in Figure 2 and require clonal amplification That said, the 1st generation technology is not going to to enhance optical detection for sequencing. Three widely disappear any time soon. It will co-exist with next-genera- used next-gen commercial platforms are Illumina Genome

Figure 2 Workflow of next-gen sequencing Zhou XiaoGuang, et al. Sci China Life Sci January (2010) Vol.53 No.1 47

Analyzer, Roche 454 Genome Sequencer, and Life Tech- DNA. The 3′ end is then unblocked to allow next cycle of nologies SOLiD System. They were all invented and de- extension to occur. This process repeats multiple times, up veloped towards the end of 1990s and commercialized in to 50 cycles, to yield DNA read length of 50 bp. the mid of first decade of the century. A relatively new Throughput of the platform can be thousand times higher comer in this arena is the Polonator G.007, initially devel- than that of the conventional sequencer platform, i.e. 1st-gen oped in George Church’s lab at Harvard and now manufac- platform. The main drawback is its relative short read length tured by Dover Systems. Complete Genomics has recently contributed by optical signal decay and dephasing. Since introduced its sequencing service platform based on its pro- optical signal is acquired on each DNA cluster, it is critical prietary sequencing technology although it has not indicated that all strands of DNA in an ensemble grow in unison. its intention to market this instrument. All of them are util- However, each step of sequencing chemistry could fail, e.g. ized sequencing-by-synthesis with variation in DNA array failing to cleave fluorescent tag and/or remove blocking formation, cluster amplification, and enzyme-based se- group. This leads to some DNA strands extend out of synch quencing biochemistry. with other strands in the ensemble or fail to extend all to- First, DNA template library is constructed. DNA library gether, causing signal decay or fluorescent signal dephased. fragments are prepared from either randomly sheared ge- Furthermore, the error rate is accumulative, i.e. increasing nomic DNA (10 s to 100s bp in size) or alternatively as DNA strand gets longer, limiting the length of sequen- pair-end fragments with controlled distance distribution. cing read. The double-stranded fragments are ligated with adaptor (2) Roche 454 Genome Sequencer. 
The 454 Sequencer sequences at both ends and denatured. The resulting sin- utilizes emulsion PCR to yield amplicons used for the se- gle-stranded template library is created and immobilized on quencing procedure [14]. Tiny paramagnetic beads coated a solid surface, either a planar surface or supporting beads, with DNA primers are mixed with single-stranded template and clonally amplified by one of several means, e.g. bridge DNA library together with components necessary for PCR PCR [10], emulsion PCR [11], or in situ polonies [12]. DNA reaction. Proportional amount of beads and library frag- clusters or amplified beads form an array of DNA clusters ments are mixed to ensure most beads carry no more than on a slide, which then undergo cyclic manipulation through one ssDNA molecule. The aqueous solution is mixed with enzyme such as polymerase or ligase. Optical events gener- oil to form emulsion where each water compartment forms ated from the cyclic chain extension process are monitored an independent micro-reactor for subsequent PCR chemistry. by microscopic detection system, and images recorded After multiple rounds of thermo-cycling, each bead is through CCD camera. Sequential analysis of array image coated with thousands of copies of DNA of the same se- yields DNA fragment sequences, which are assembled into quence. Beads are further enriched, transferred, and depos- larger sequence contigs by computer algorithm. ited on a picotiter plate fabricated in organized array of tiny (1) Illumina Genome Analyzer. The amplification of sin- wells with each hole occupied by only one bead. The pico- gle-stranded library fragments is carried out through a titer plate is engineered as part of flow cell for sequencing process coined “bridge amplification” [13]. On an oligo- chemistry on one side and bounded with optic fibers as part derived flow-cell surface, consisting eight independent of CCD based optical detection system on the other. 
lanes, single-strand DNAs flanked by asymmetrical adap- The base interrogation operation is sequencing-by- tors form an oligo-bridges from both ends. After multiple synthesis that taps into pyrophosphate chemistry to produce PCR thermal cycles, thousands of copies of DNA, ampli- optical signal for detection [15]. The pyosequencing, as cons, based on one single-strand of DNA fragment are cre- often called, relies on enzymes ATP sulfurylase and ated and clustered on the surface to a single physical loca- luciferase. Release of pyrophosphate, during nucleotide tion. Millions of such clusters can be produced in each one triphosphate incorporation into the DNA chain, triggers a of the eight independent flow cell channels (as such, eight cascade of biochemical reaction via ATP sulfurylase and independent libraries can be analyzed in parallel during one luciferase, resulting in a burst of biochemiluminescent light instrument run). A sequencing primer is then hybridized to a being emitted. Sequencing is achieved by sequentially in- universal sequence in amplicons to start the sequencing run. troducing each of the four dNTPs into the flow cell. Pres- Illumina’s GA utilizes sequencing-by-synthesis with ence or absence of light burst of each picotiter well indi- fluorescently labeled nucleotides and reversible terminators. cates the incorporation or not of corresponding nucleotide In each cycle of sequence interrogation, four distinctly la- and, therefore, reveals the identity of complementary base beled nucleotides are added simultaneously to the flow cell on the template DNA in that well. channel together with DNA polymerase to give rise to DNA Major advantages of pyrosequencing are its speed and chain extension according to base pairing rule. Each nucleo- read length-up to 500 bp. Unlike other next-generation tide is 3′-OH blocked to prevent further addition. A fluores- technologies discussed here, pyrosequencing does not need cence image is acquired. 
The base-unique fluorescence re- to carry out extra chemistries to the extending DNA chain veals the identity of newly incorporated nucleotide for each beyond normal biochemical process by DNA polymerase, cluster and, therefore, sequence of corresponding template e.g . no need of removing label moiety and/or de-block ter- 48 Zhou XiaoGuang, et al. Sci China Life Sci January (2010) Vol.53 No.1 minator. This reduces the chances of mishap in chemical dual-base encoding approach as described above. Sequenc- reaction and, therefore, less chance for premature chain ing assay is carried out through a serial of ligation reactions termination or out of sync extension which is major cause of between a universal primer and a nanomer probe on emul- dephasing. However, this asynchronous processing renders sion PCR amplified DNA cluster [9]. Each pool of nanomer a limitation for pyrosequencing when going through ho- probes, consisting of degenerate oligonucleotides with mopolymer region, e.g. GGGGGG in a row, where no fluorescent label correlating to one query position (i.e. fluo- terminating moiety can stop the extension run. Length of the rescence color corresponds to the base at interrogation posi- homopolymer has to be inferred from optical signal inten- tion), is successively introduced together with DNA ligase sity which is prone to error. As a consequence, dominant to carry out primer-probe ligation. After each ligation cycle, error type on this platform is insertion-deletion rather than a fluorescent image is taken. The extended primer-probe substitution. Another drawback for 454 is its relative high chain is then denatured away to reset the system. Ligation cost for reagents, comparing with other next-gen sequencer, between primer and the second pool of degenerate oligo- due to its reliance on a set of enzymes for pyrophosphate probes for next query position occurs. This reset-ligation- detection chemistry. 
imagining process repeats until all positions are interro- (3) Life Technologies SOLiD System. Like in the case of gated. 454, SOLiD system also employs emulsion PCR as a DNA In this system, since no consecutive ligation is required template amplification scheme with paramagnetic beads. after each reset, sequencing error is not accumulative. This After breaking the emulsion, amplified beads are collected, is one of the advantages of the system. However, this does enriched, and fixed on a flat glass substrate to create a dis- limit the reach of query position from each primer location order array. and, therefore, result in a shorter read length. This short- Its sequencing-by-synthesis is driven by ligation rather coming can be somewhat mitigated by using multiple an- than polymerization as in previous platforms [16]. Further- chor locations in library sequence to extend the reach. The more, it employs a dual-base encoding scheme in the proc- Polonator is substantially lower in instrument price than ess to assist error detection. A universal primer complemen- other commercially available next-generation systems. It is tary to the adaptor region of template library DNA on the also an open-source platform, which means it potentially bead is hybridized. A serial ligation cycles follow - each allows sequencing operation and/or chemistry of the in- ligation occurs between the extending chain and a pool of strument to be altered and enhanced by end users. fluorescently labeled (at the 8th position) degenerate oc- (5) Complete Genomics. Complete Genomics utilizes tamer probes. The octamer pool is structured such that oli- sequencing by ligation approach in much the same way as gos with bases at the probing positions, e.g. position 1 and 2, the Polonator does. However, it incorporates a unique crea- correlate with specific fluorescent color. After ligation, a tion to increase the density of DNA clusters on slide surface fluorescent image is acquired. 
The octamer is, then, chemi- and to reduce reagent consumption [17]. To increase the cally cleaved between position 5 and 6 to remove the last read length, multiple adaptors (four) flanking genomic three bases together with the fluorescent label. Progressive fragments are inserted to form a circular DNA template rounds of ligation enable interrogation of every 1th and 2th through sequence manipulation [18]. The template sequence positions along the extended chain (i.e., 1-2, 6-7, 11-12, is then multiplied through circular PCR to make concate- 16-17, 21-22, 26-27, and 31-32). After 7 rounds of ligation mers containing two hundred copies of the original template cycle, the extended chain is lifted away and system is reset. sequence. The concatemer is folded into a ball structure A second primer, set back by one base, is annealed to the called DNA nano-ball (DNB). Each ball then self-assembles adaptor region. This is followed by another 7 rounds of onto a planar substrate surface patterned with sticky (or ligation/interrogation cycles as described above. This en- activated) spots to form a dense array of DNBs. DNBs do ables interrogation of a new set of positions at 0-1, 5-6 … not stick to areas between activated spot on the slide, leav- etc. This process continues with successive reset with ing an orderly-arranged matrix of DNBs. This provides one-base-shortened primers to and followed by ligation cy- much denser array on the surface than those created through cles until entire sequencing region is covered. This approach clusters or deposited beads because DNA nano-balls utilize sounds complicated on paper. In reality, it’s all driven by more effectively three-dimensional spaces. It also eliminates computerized system and fully automated. Since each base the use of flow cell as the rest of the next-gen sequencing is measured twice, i.e. in two separate ligation cycles, this machines all do [19]. 
approach has an added advantage of identify miscall during The sequencing assay is carried out in a similar fashion sequencing [16]. The major limitation is its relatively short as the Polonator, i.e. sequencing by ligation with single sequence read length, also caused by dephasing in an en- query position – coined as cPAL (combinatorial probe- semble. anchor ligation). cPAL sequencing uses pools of fluores- (4) Polonator G.007. Polonator is another next-genera- cently labeled degenerate oligo-probes with four distinct tion sequencing platform that utilizes sequencing-by-liga- colors correlating to four types of base at a given position. tion. Its implementation uses single base probe instead of There is a separate pool for each reading position. A given Zhou XiaoGuang, et al. Sci China Life Sci January (2010) Vol.53 No.1 49 pool of probes is ligated with anchor according to base ment. pairing rule at the query position with fluorescent color of (1) Helicos HeliScope. The HeliScope Genetic Analysis the probe correlating to the query base. After each read, System by Helicos Biosciences, based on the work from probe-anchor complex is washed away. Another anchor hy- Quake’s group [21), is the first single molecule sequencing bridized and next pool of probes for a different position is system appeared on market very recently. It utilizes se- cycled in. The process repeats until all positions are read. A quencing by synthesis on single molecule. Constructed sin- recent publication from Complete Genomics has shown its gle-stranded DNA library is disorderly arrayed on a flat sequence accuracy and cost-effectiveness based on the se- substrate without any amplification. At each cycle of se- quences of three human genomes [20]. quencing, DNA polymerase and one of four fluorescently There is no chaining of consecutive probes as ABI labeled nucleotides are flowed in, resulting in template- SOLiD System does. This offers a couple of advantages. 
dependent extension of DNA strands. Strands in the array First, there is no memory effect. Any error made in prior that have undergone base extension light up by fluorescent ligation cycle does not carry forward, resulting in better label, which are recorded with CCD camera. After washing, fault tolerance. Furthermore, ligation yield of each cycle fluorescent labels on the extended strands are chemically does not have to be high, reducing the amount of reagents, cleaved and removed. Another cycle of single-base exten- e.g. probe and anchor, needed for assay. However, the read sion, label-cleave and imaging can begin. As in pyrose- length from each anchor position is still short due to limited quencing, each iterative cycle is asynchronous - some length of oligo-probes (9-mers). The overall read length is strands in the array may pull ahead, fall behind, or com- extended by using four separate anchor locations in each pletely fail to extend all together. Since each strand operates library fragment. in independence, there is no dephasing issue to concern. The next-generation sequencing platforms based on se- This does mean, however, homopolymer run could be an quencing by synthesis shown above dramatically increase issue as in the case for pyrosequencing. But, unlike Roche the speed and reduce sequencing cost per base over the 454 platform, single molecule affords us mitigation by 1st-generation platforms. It is common to see these plat- playing trick with enzyme kinetics to slow down the rate of forms to churn out Giga-base of sequence data per day per chain extension so to reduce the chance of two consecutive instrument at a cost only a fraction of a cent per kilobase. base incorporations before dNTP being washed away [22]. However, their short sequence reads is an Archille’s heel for As mentioned above, a key challenge with SMS is its all next-gen platforms except that of Roche 454 Sequencer. detection. 
HeliScope utilizes a fluorescent microscopic This is mainly attributed to the dephasing problem of opti- technique called the Total Internal Reflection cal signal in DNA sequencing cluster. One solution to rem- (TIRM) – where only fluorophores within a very thin layer edy this would be to eliminate the ensemble effect all to- of reaction volume on the surface of a flow cell can be ex- gether – sequencing on single molecule. cited by evanescent wave to produce fluorescence [23]. This helps to reduce fluorescent background. But even with the 2.2.2 Single Molecule Sequencing (SMS) state-of-art optical system, it is still often a challenge to To overcome one of the major drawbacks of next-gen se- capture single-molecule event. Therefore, raw sequencing quencing technology, relatively short read, efforts have been accuracy of the platform suffers in comparing with ensem- made to develop single molecule sequencing platforms – ble-detection-based predecessors with dominant error type where sequencing by synthesis is performed on an array of being deletion. However, a two-pass strategy can substan- single DNA molecule. Single molecule also helps to in- tially improve the accuracy. Single molecule means we can crease the number of DNA fragments that can be independ- now reset the tethered template DNA to its original state by ently analyzed in a given surface area and, therefore, lifting off the extended chain after one run of sequencing. achieves much higher level of throughput. Of course, it also This affords us to carry out another sequencing pass in op- means no costly cluster amplification step is required, fur- posite direction from distal adaptor, yielding a second se- ther reducing sequencing cost. This, however, introduces a quence of the same template. Duplicated sequences can then new set of challenges, mostly in the area of optical signal be used to average out detection errors and, thus, give rise detection of single-molecule event. 
to much higher accuracy than a single pass.

The major issue is to reduce non-assay-specific fluorescent background interference, e.g. free-floating fluorescent molecules that are not participants in the actual chemical reaction. Several approaches have been implemented in an attempt to address this challenge. The principle underlying all of them is to confine the detection volume to the immediate vicinity of the actual sequencing reaction, for example by means of an evanescent electromagnetic wave. In the next few sections, we will take a look at a few platforms that have been developed or are currently under development.

(2) VisiGen. VisiGen Biotechnologies, now part of Life Technologies, has also been working on an implementation of single molecule sequencing by synthesis [24]. In a nutshell, they engineered a protein nano-device to observe and record the DNA synthesis process by DNA polymerase in real time. This is achieved through FRET (Fluorescence Resonance Energy Transfer) between a fluorescent donor and an acceptor. In FRET, the acceptor molecule does not fluoresce under the excitation light unless an excited energy donor is nearby. In their setup, each dNTP carries a fluorescent acceptor moiety of a distinct color at its gamma-phosphate, and the DNA polymerase is engineered to carry a fluorescent donor moiety close to its active site. During DNA base extension, a matching dNTP is grabbed by the DNA polymerase, which brings its fluorescent acceptor close to the donor group. Fluorescence energy transfer occurs, giving off fluorescence of the corresponding color. Once incorporation is complete, the fluorescent moiety, as part of the pyrophosphate, is released by the DNA polymerase. In essence, this creates a short burst of fluorescent light in concert with nucleotide incorporation. By tracking and analyzing the sequence of light bursts, the DNA sequence can be reconstructed. Note that there is no pause in the process to remove a fluorescent label or to cleave a blocking group; it is a true real-time process. This means it can run at tremendous speed, provided that the optical recording apparatus can keep up. To further reduce background interference, they also use TIRM as the fluorescence detection setup. Furthermore, unlike Helicos' platform, this system fixes the DNA polymerase, instead of the DNA strand, on a substrate surface, so the extending DNA strand grows untethered. The benefit of immobilizing the enzyme instead of the DNA is that the nucleotide incorporation event stays close to the detection volume as the DNA extends, i.e. the fluorescence does not move out of sight as the DNA chain grows. Although Life Technologies has not said much about the instrument's performance, we can surmise that it will be fast and give long reads. Theoretically, the read length is limited only by the processivity of the DNA polymerase. In reality, it might be restricted by other factors, such as photo-bleaching of the fluorescent donor molecule attached to the DNA polymerase. It has been reported that Life Technologies is working on a kind of quantum-dot fluorescent label to overcome this problem.

(3) Pacific Biosciences. Pacific Biosciences is another company that has been working to develop a new generation of sequencing technology, the SMRT (Single Molecule Real Time) technology [25]. Its single-molecule sequencing-by-synthesis relies on a nano-structure called the Zero Mode Waveguide (ZMW) for real-time observation of DNA polymerization [26]. A ZMW array consists of thousands upon thousands of sub-wavelength holes, tens of nanometers in diameter, fabricated by perforating a thin metal film supported by a transparent substrate. When the array is illuminated from the glass side, light cannot propagate through the holes because the diameter of each hole is smaller than the wavelength of the illuminating light. This leaves an exponentially decaying evanescent wave at the very bottom of each hole, creating a very small detection volume. During the sequencing assay, double-stranded DNA is synthesized from a single-stranded template by a polymerase immobilized at the bottom of each waveguide, one per hole. Each time a base is added, the polymerase locks onto a fluorescently labeled dNTP (also labeled at the gamma-phosphate position) and brings it into the detection volume, creating a burst of light. This color-coded fluorescence burst reveals the identity of the complementary base on the template DNA. By continuously following the bursts of fluorescence from each waveguide in real time, the sequence of the template DNA in each hole can be rapidly determined.

The Pacific Biosciences technology has great potential for high-speed sequencing with long read lengths and low sequencing cost. However, errors stemming from the challenges of real-time single-molecule detection might put a damper on its raw accuracy, as with other SMS platforms. This can be mitigated through multiple runs on the same samples after reset, as described in the previous section. In addition, current CCD chip technology limits the maximum area of simultaneously observable ZMWs. The low yield (~30%) of polymerase-occupied ZMWs also limits the number of usable waveguides [27]. All these factors have limited the higher-throughput potential of the SMRT technology. Even with these limitations, the first version of the instrument, to be introduced in 2010, is promised to deliver a read length of no less than 1500 bp, at a speed of 15 min per run, and with a reagent cost of no more than $60 per run. It is anticipated that future versions of this platform, once those technical issues are resolved, could churn out 100 gigabases of data per day with read lengths of up to 100000 bp.

(4) Mobious Nexus I. Other than announcing its intention to develop a single molecule sequencing platform, Mobious Biosystems has not said much about the inner workings of its technology, Polykinetic Sequencing [28]. This technology exploits the natural chemical behavior of the polymerase during DNA synthesis. For a polymerase to incorporate a nucleotide into a growing DNA chain based on its template, it needs to test a nucleotide from solution to determine its complementarity. If the nucleotide is not a match, it is released immediately. If it is a match, the polymerase holds on to it and continues the time-consuming steps of adding the nucleotide to the growing chain. It is this difference in holding time that is exploited in Mobious's sequencing-by-synthesis approach. The four nucleotides are introduced into the reaction volume sequentially, one at a time. By measuring how long the DNA polymerase (fixed on the substrate surface) holds on to a nucleotide to complete polymerization, a matching or non-matching nucleotide can be inferred. Detection methods that can capture the polymerase's conformational change, for instance, can be used in this regard. Fluorescence resonance energy transfer (FRET), with a donor and acceptor pair strategically mounted on the DNA polymerase, can certainly be used for this purpose, as VisiGen does. The key problem with fluorescence, however, is photo-bleaching of the fluorophore. To get around this problem, Mobious exploits variations in electromagnetic properties as the enzyme's conformation changes, measured through plasmon resonance spectroscopy, nuclear magnetic resonance, etc. [29]. One added advantage of this approach, besides those shared by all single-molecule sequencing methods, is that no fluorescently labeled nucleotides are used, which could reduce reagent cost.
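The polykinetic principle described above, inferring a match from how long the polymerase holds on to a nucleotide, can be illustrated with a simple dwell-time classifier. This is a minimal sketch, not Mobious's method: it assumes a hypothetical readout in which each cycle corresponds to one template position, the four nucleotides are presented one at a time, and one holding time per nucleotide is reported. The threshold value and the name `call_bases` are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical dwell-time base caller illustrating the polykinetic idea.
# Assumptions (not from Mobious): one template position per cycle, the four
# nucleotides are presented one at a time in each cycle, and the instrument
# reports how long (in ms) the polymerase held each one.
MATCH_THRESHOLD_MS = 5.0  # assumed cut-off between "released" and "held"


def call_bases(cycles: List[Dict[str, float]],
               threshold_ms: float = MATCH_THRESHOLD_MS) -> str:
    """Infer one base per cycle; 'N' marks cycles without a single clear match."""
    calls = []
    for dwell in cycles:
        # A matching nucleotide is held for the slow incorporation steps,
        # so its dwell time exceeds the threshold; mismatches are released fast.
        held = [base for base, t in dwell.items() if t >= threshold_ms]
        calls.append(held[0] if len(held) == 1 else "N")
    return "".join(calls)


if __name__ == "__main__":
    example = [
        {"A": 0.2, "C": 12.0, "G": 0.3, "T": 0.1},  # only C held -> C
        {"A": 9.5, "C": 0.2, "G": 0.4, "T": 0.3},   # only A held -> A
        {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.2},   # nothing held -> N
    ]
    print(call_bases(example))  # prints "CAN"
```

Reporting "N" instead of guessing when zero or several nucleotides exceed the threshold keeps ambiguous cycles visible to downstream analysis. A real implementation would have to learn the dwell-time distributions rather than rely on a fixed cut-off.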

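The remark above that raw single-molecule errors can be mitigated by re-reading the same molecule several times can be quantified with a binomial tail. This is a minimal sketch under simplifying assumptions: passes are already aligned, each pass miscalls a given position independently with probability p, and an odd number of passes is used. In practice, single-molecule errors are often insertions or deletions that must be aligned before any vote. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Upper bound on the error of a per-position vote over n passes.

    Assumes each pass miscalls the position independently with probability p
    and that the passes are already aligned. The vote can only go wrong if at
    least (n + 1) / 2 passes are wrong, so this binomial tail bounds the error
    of plurality voting from above.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number of passes")
    k_min = n // 2 + 1  # wrong passes needed to outvote the true base
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for passes in (1, 3, 5, 9):
        print(passes, round(majority_error(0.15, passes), 4))
```

With an assumed 15% per-pass error rate, the bound falls to about 6% for three passes and below 3% for five. This is why repeated passes over the same template can recover consensus accuracy from error-prone single reads.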
2.3 3rd-Generation Technologies – Direct Sequencing

In all the above-mentioned 2nd-generation technologies, the sequence is determined by indirect interrogation of nucleobase incorporation by either DNA polymerase or DNA ligase, through optical events generated during synthesis, often assisted by fluorescence or chemiluminescence. Besides requiring an expensive optical detection system, these technologies require that large numbers of optical images be recorded, stored, and analyzed, adding to the complexity and cost of sequencing. Reliance on biochemical reactions for base interrogation further adds to the cost of consumables, which account for a substantial fraction of current sequencing expense. Direct sequence interrogation, where no chemistry is required, is therefore highly desirable to further reduce sequencing cost. What is going to happen next is hard to predict with great certainty in a field that has been changing at breakneck speed. However, several areas of intense recent research offer some hints about the technologies from which future generations of sequencing platforms may emerge.

2.3.1 Non-optical Microscopic Imaging

As the saying goes, "a picture is worth a thousand words". One of the most direct ways to determine a DNA sequence is to visualize the linear arrangement of the nucleotides (mostly the bases) in space. If a picture of a DNA strand can be taken with enough resolution to distinguish the four bases along the chain, the sequence can be read off directly. This is precisely what researchers in the microscopy community have been attempting to do. The idea is to tap into powerful non-optical techniques with resolving power down to the atomic level, such as scanning tunneling microscopy (STM) and atomic force microscopy (AFM) [30,31]. Progress has been made, albeit with admittedly limited success so far. Recently, researchers at Osaka University in Japan showed that scanning tunneling imaging can be used to distinguish guanine from the other three bases by its distinct electronic fingerprint along a real stretch of DNA molecule [32]. Another group has been actively working on using an atomic force microscope to read off the distinct force required to pull the tightly fitted ring structure of each base [33]. ZS Genetics, in turn, is working on an electron-microscopy-based approach to sequencing DNA. To address the insufficient contrast of natural DNA molecules under the electron microscope, they attempt to use nucleotides labeled with heavier elements to synthesize a new DNA strand by polymerase from the template DNA molecule [34]. The resulting heavier DNA strand can then be visualized under the electron microscope to determine the nucleobase sequence.

2.3.2 Nanopore

Another area that has seen a flurry of activity is nanopore structures for DNA sequencing. A nanopore, as its name implies, is a tiny hole with a diameter in the nanometer range (1–2 nm), usually made in a membrane of solid-state material or of biological molecules. The idea is that, when a DNA strand is threaded through the pore, driven electrophoretically, one can read off the bases as they pass through the pore opening by some electro-physical means. Various groups and organizations around the globe are exploring this idea: Agilent, DNA Electronics, IBM, NABsys, Oxford Nanopore Technologies, and Sequenom, to name just a few.

Two key challenges [35] facing all nanopore-based approaches are (i) distinguishing the four nucleotides within a time commensurate with the travel rate of the passing DNA strand, and (ii) controlling the speed of DNA translocation through the pore. Initial attempts to measure the fluctuation of the ionic current through the nanopore upon blockage, as single-stranded DNA passes through the hole, have so far yielded little success. Calculation and experiment have shown that measurement of ionic conductivity across the nanopore alone is unlikely to provide the resolution required to discern each nucleotide in a DNA molecule [36]. The channel length of a nanopore is usually more than 5 nm, spanning over a dozen nucleobases, which is too long to offer the single-base current resolution needed for sequencing.

Even though the ionic current measurement cannot distinguish individual bases, it can readily discern single-stranded from double-stranded DNA [37]. NABsys, in collaboration with a research group at Brown University, has exploited this capability and is developing a sequencing-by-hybridization technology [38], Hybridization Assisted Nanopore Sequencing (HANS). Genomic DNA is randomly cut into fragments about 100 kb long, made single-stranded, and hybridized with a 6-mer oligonucleotide probe. The genomic library fragments bound with probes are then driven through an addressable nanopore array device. The ionic current across each pore is measured independently to create a current trace that shows the precise positions of the hybridized probes on each genomic fragment. Overlapping probe regions of the genomic fragments are used to align the fragment library and create a probe map of the genome. This is repeated for every hybridization probe in turn, creating a complete set of probe maps of the genome. Using a computer algorithm, the entire genomic DNA sequence can then be reconstructed. The precision and consistency of nanopore-based measurement of hybridization locations in this approach, however, still need to be demonstrated.

To enhance the sensitivity of nanopores to features of nucleobases, research groups are working on other approaches, including electrical probes embedded inside the nanopore structure [39]. They hope that a pair of tunneling electrodes abutting opposite sides of the nanopore can register a characteristic tunneling current as each base is driven through the pore. Both computer simulation and experience with scanning tunneling microscopy, which has been used successfully to reveal atomic-scale features, give us optimism that this will work. But fabricating a nano-device at this scale is not a walk in the park. Another creative solution to the problem is to develop a chemically functionalized nanopore. Instead of embedding solid-state electrodes, Lindsay and colleagues [40] have proposed using two chemical probes that form hydrogen bonds with the phosphate group (the grabber) and the base moiety (the base reader), respectively, as a nucleobase passes through. Four different reader probes would be required for the four nucleotides in this approach.

Yet other groups are working on an alternative to the solid-state nanopore: the biopore. One group uses an engineered MspA protein to construct a bio-nanopore for the analysis of ssDNA [41]. They demonstrated that ssDNA can be threaded through this biopore, showing its potential for single molecule sequencing. Oxford Nanopore Technologies, collaborating with the University of Oxford, is working on another protein-engineered nanopore. Through genetic engineering, they have been able to construct a biochemical nanopore by covalently attaching an aminocyclodextrin adaptor within the α-hemolysin pore in a lipid-bilayer membrane [42]. Recently, they demonstrated that when the four deoxynucleotide monophosphates (dNMPs) are driven through the pore, the ionic current is reduced to one of four levels, each of which corresponds to one nucleotide [43]. Coupling this discovery with the successive release of nucleotides from the DNA chain, which can be achieved with an exonuclease, offers another potential nanopore-based single molecule sequencing technology. To make it happen, it is important to demonstrate that the exonuclease moiety can be mounted in a way that ensures delivery of the released nucleotide monophosphates into and through the pore in strict single file.

Aside from base detection, controlling the motion and speed of DNA translocation through a nanopore is also important and challenging. The speedy translocation of DNA through a nanopore holds the promise of ultra-fast sequencing. But if the DNA strand passes through the pore too fast, it might leave too little time for each base to be accurately determined. The situation is made worse by the stochastic motion of the DNA molecule and non-specific interactions of bases with the nanopore surface [44]. All these add uncertainty to the rate at which the DNA molecule translocates through the nanopore. Although the travel speed of DNA can be reduced by lowering the temperature, increasing the solvent viscosity, and decreasing the potential bias across the nanopore, variation in velocity is still problematic. There have been various attempts and proposals to overcome this difficulty. One such idea [45] is to use a processive enzyme of some sort that binds the traversing DNA strand; this should substantially reduce the rate of translocation. More recently, a group at IBM announced an idea based on a nanopore device they termed the DNA transistor [46]: a nanopore embedded with metal layers, forming a metal-dielectric structure that can be modulated to trap the DNA molecule inside the nanopore. Their computer simulations show that, by cyclically turning the gate potential on and off, it would be possible to move a DNA molecule through the nanopore one base at a time. This would give ample time to interrogate each passing nucleobase.

2.3.3 Graphene and Carbon Nanotube

Graphene is a single layer of carbon atoms arranged in a sheet. It is very strong and has great electrical conductivity, making it a very good candidate for electrodes used in nucleobase interrogation. One idea [47] that has been proposed is to create a gap, about 1 nm wide, in a graphene sheet. The DNA molecule is guided vertically through the gap. As the DNA passes through the gap, the two edges of the graphene sheet can act as electrodes to interrogate the nucleobases for sequencing. Besides the challenge of making such a small gap in a graphene sheet, controlling the orientation, motion, and speed of DNA translocation through the gap is also an issue.

Carbon nanotubes (CNTs) are considered to have great potential for rapid DNA sequencing because of their unique electro-physical properties and nanometric dimensions, although no working device has been fabricated so far. It has been shown that the surface of a CNT interacts strongly with DNA molecules, even in a sequence-specific way [48]. Long genomic ssDNA can wrap around a single-walled CNT to form a tight, stable DNA-CNT complex [49]. Computer simulation has shown that the four types of nucleotides introduce distinct characteristic features in the local density of states [50], making CNTs a good candidate for developing an electronic sequencing strategy, either on their own or in combination with other technologies. Most of these ideas are still in the proof-of-concept phase.

3 Future Perspectives

Having provided an overview of three decades of sequencing technology development and of the potential new breakthroughs that may follow, we now turn to the question of what is going to happen in the years to come. What is the outlook for the technical parameters of sequencing? Where is the technology for the thousand or hundred dollar genome ($1000 genome or TDG and $100 genome or HDG; both goals refer to the total cost of sequencing a human genome, assuming reference-based analysis) going to come from, and when will it arrive? In this section, we try to answer some of these questions.

3.1 Technology Convergence

We do not know for sure which game-changing idea or disruptive technology will ultimately bring us to the promised land of the $1000 genome or, eventually, the $100 genome. One thing is strikingly clear, however, when we look at the technological progression of three generations of sequencing platforms: the marriage of solid-state technology and biochemistry. Furthermore, technological convergence is shifting from biochemical or chemical approaches toward physical interrogation, as illustrated in Table 2. We believe this trend will continue and that future generations of sequencing technology could eliminate biochemistry altogether; nanotechnology will play a bigger role.

3.2 Sequencing Throughput and Read Length

Throughput and read length have been a trade-off in sequencing platform selection. From the first-generation to the next-gen sequencing technology, we have seen a dramatic improvement in sequencing throughput, but at the expense of read length. As stated in previous sections, this is primarily the result of out-of-sync sequencing chemistry when extending DNA strands in an ensemble, i.e. the dephasing effect. Users are left with a choice between platforms with longer reads but low throughput, i.e. the first-gen, and those with short reads and high throughput, i.e. the next-gen. The situation will change with single molecule sequencing, e.g. PacBio's ZMW technology, which promises super-fast throughput with multi-kilobase read lengths. Assuming unconstrained optical detection capability, throughput and read length are limited only by the synthesis rate and the processivity, respectively, of the DNA polymerase. Future generations of sequencing technology based on nanopores are also expected to achieve long reads, as nucleotides are determined one by one while the DNA threads through the nanopore structure. In principle, the length of DNA that can be read in a single sweep is limited only by the practicalities of threading a very long DNA strand through the pore without shearing. It has been demonstrated that ssDNA up to 5.4 kb long can be threaded through a solid-state nanopore [37]. One can therefore readily anticipate that newer generations of sequencers will be capable of super-high throughput with read lengths surpassing those of Sanger instruments.

Table 2 Technology convergence
Key technology   1st Gen   2.1–2.2th Gen   2.3th Gen   3rd Gen
DNA hybridization   √ √
DNA polymerase   √ √ √
PCR   √ √
Electrophoresis   √
Optoelectronics   √ √ √
Microfluidics   √ √
Micro/Nano-fab   √ √ √
Single molecular detection   √ √

3.3 Sequencing Cost and Productivity – the Experience Curve

Over the last three decades, we have seen the cost of sequencing drop precipitously while sequencing throughput (or productivity) has increased exponentially. Some have likened this dramatic change to that of the IT industry, as mentioned above. In fact, in certain respects the pace of change in DNA sequencing has outstripped Moore's law in the semiconductor industry. Figure 3 plots, on a log scale, the change over the years in sequencing cost (in US dollars per human genome) and productivity (in nucleotides per day per instrument). A couple of observations can be made from these plots. First, as expected, sequencing cost and throughput have followed a roughly exponential decline and rise, respectively, over three decades. Closer examination reveals another interesting point: the slopes of both curves increased at an inflection point right around 2005, meaning that the rates of cost reduction and throughput enhancement have accelerated since then. This is when next-gen sequencing platforms started to appear. The near-linear slopes on both sides of the inflection point reflect two distinct drivers of improvement in sequencing production: one is the productivity increase and cost reduction within a technological generation, and the other is propelled by technological breakthroughs.

Figure 3 Sequencing cost and productivity

The relationship between productivity and cost can be modeled by the experience curve (aka learning curve) effect. This effect states that the more often a task is performed, the lower the cost of doing it. It was first successfully used by Theodore Wright in the mid-1930s to quantify and project decreases in cost as a function of increased airplane production [51]. Since then, the model has been used to study the relationship between productivity and cost in many industries. If we plot sequencing cost since 2005 on a log scale against sequencing throughput (also on a log scale) over the same period, we obtain the experience curve for the 2nd-gen sequencing technology, as shown in Figure 4. The linearity of the plot is a typical experience curve effect: as productivity increases by a certain factor, the cost falls by a certain percentage. Occasionally, the linearity of the experience curve can stop abruptly. Such a discontinuity reflects an obsolete technology or process that has been replaced by a newer one. This is what we observed between the 1st-gen and 2nd-gen technologies around 2005, as discussed previously. In the current generation, i.e. 2nd-gen, we have not yet seen a discontinuity. Since 2005, each order-of-magnitude increase in sequencing throughput has translated into a 1.8-order-of-magnitude reduction in sequencing cost.

3.4 Thousand Dollar Genome (TDG) and Hundred Dollar Genome (HDG)

Extending the trend of our experience curve as shown in Figure 4, we conclude that the TDG and HDG goals might be reached when we can produce genome sequences at rates of 20 Gb per day and 70 Gb per day per instrument, respectively, with 2nd-generation technologies (including SMS currently under development). The TDG could arrive pretty soon, within a year or two, if the current trend holds. The HDG might arrive in two to three years, based on the same model.

One big caveat, of course, is that all these predictions are based on our experience curve model. As we already know, there are other factors that the experience curve cannot predict, such as a disruptive technological breakthrough or the lack of one. If we hit a wall with current technology before reaching the goals, all bets are off. The model we currently use for our projections would become obsolete, and a new one based on the replacing technology would have to be established. If that happens, how long it will take is anybody's guess. But one thing is more certain: we should know the answer in a year or two. Based on the current trajectory of the experience curve, we project that the TDG could be attainable with 2nd-gen technology based on cyclic array sequencing by synthesis. This could come in a year or two. The HDG is harder to project, because the current generation of technology, even with its newer SMS, might not be technically capable of lowering the cost to $100. If that is the case, we may have to wait until a newer generation of disruptive technology comes along, e.g. the 3rd generation.

3.5 Cost of Sequencing to Customer

Are we really getting there this quickly? Are we really going to see a $1000 customer cost for human genome sequencing? Not so fast. The recent sequencing cost data on which we base our analysis are, for the most part, consumable or reagent costs. Even with the same instrument platform, a surprisingly wide range of cost estimates can be generated [52]. One frequently under-appreciated cost is downstream informatics analysis. All too often, people forget to include in their cost estimates or proclamations the time-consuming and labor-intensive analysis needed to turn instrument output into well-annotated genome data. If one factors in all the costs associated with generating quality human genome data, e.g. consumables, machine amortization, maintenance, labor, and computation, the real cost could be many times higher than those numbers.

Besides the real cost of sequencing, we also have to consider market forces and include business operating costs. Even if we can get the sequencing cost down to the thousand-dollar range, initial market demand for such a service will probably keep the price higher than that: simple supply and demand. Adding all these up implies that we may not see the $1000 genome, in the real sense of the word, in the very near future. In the current cost structure, reagent costs are artificially high, since both library construction and sequencing reaction reagents are fully controlled by the companies that provide the instruments. Until serious competitors who challenge the reagent monopoly enter the marketplace, the situation is not going to change in the customers' favor. Therefore, customers should prioritize their sequencing resources, starting from the top of their to-do list, rather than jumping on the bandwagon right away.

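The experience-curve relation in Sections 3.3 and 3.4 is a straight line on log-log axes, i.e. a power law, cost ∝ throughput^(-b). Here b ≈ 1.8 follows from the statement that a tenfold throughput increase brought a 10^1.8-fold cost reduction. The sketch below projects cost from throughput and inverts the relation, assuming that exponent and the TDG anchor point quoted above ($1000 per genome at 20 Gb per day). The constants and function names are hypothetical. It also shows that the two targets are mutually consistent: (70/20)^1.8 ≈ 9.5, close to the tenfold gap between $1000 and $100.

```python
# Power-law experience curve: cost = C0 * (T / T0) ** (-B).
# B = 1.8 encodes "10x more throughput -> 10**1.8-fold lower cost";
# the anchor (T0, C0) is the TDG point quoted in the text.
# Both are assumptions of this sketch.
B = 1.8
T0_GB_PER_DAY = 20.0
C0_USD = 1000.0


def projected_cost(throughput_gb_per_day: float) -> float:
    """Cost per genome (USD) predicted at a given throughput per instrument."""
    return C0_USD * (throughput_gb_per_day / T0_GB_PER_DAY) ** (-B)


def required_throughput(target_cost_usd: float) -> float:
    """Throughput (Gb/day per instrument) at which the curve reaches a target cost."""
    return T0_GB_PER_DAY * (C0_USD / target_cost_usd) ** (1.0 / B)


def cost_fraction_per_doubling() -> float:
    """Share of the cost that remains each time throughput doubles."""
    return 2.0 ** (-B)


if __name__ == "__main__":
    print(f"cost at 70 Gb/day: ${projected_cost(70.0):.0f}")
    print(f"throughput for $100: {required_throughput(100.0):.0f} Gb/day")
    print(f"cost left per doubling: {cost_fraction_per_doubling():.2f}")
```

Under these assumptions, the curve predicts roughly $105 per genome at 70 Gb per day and needs roughly 72 Gb per day for $100, in line with the figures in Section 3.4. Each doubling of throughput leaves about 29% of the previous cost. As the text stresses, such projections hold only while the curve stays linear.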
Figure 4 Experience curve of 2nd-generation

3.6 Sequencing Operation Model

Another factor we need to consider regarding sequencing cost is economies of scale. In a way, this is captured by the experience curve effect mentioned above: the more one does something, the less costly it becomes. This raises another interesting question: in which direction is the sequencing business going to go, toward production-oriented centers or toward distributed sequencing activity? We predict that market needs will diversify the operational models.

There are two operational models for DNA sequencing: as a routine laboratory technique or as a high-throughput operation. The former has clearly been observed for capillary-electrophoresis-based machines, such as the ABI-3730 XL, which remain necessary for sequencing limited numbers of samples and/or for special tasks. Predictably, the current next-gen sequencers will move toward such a niche-based operation as well. For instance, we will soon see scaled-down versions of current machines, such as those of Roche/454 and Illumina GA, on the market. These will be used for small-scale operations to satisfy individual labs' needs, ranging from a few bacterial genomes in a single machine run to a human genome in a few runs. The large-scale machines will be pushed to the capability of acquiring 20x-coverage sequence data for a haploid human genome, equivalent to 100–150 Gb per run.

We believe that most future large-scale DNA sequencing activities will shift to large commercial service centers or providers rather than being done in individual labs or even small institutions. If the current promise holds true and the sequencing cost drops below $1000 per genome, it becomes harder to justify the purchase of a half-million-dollar instrument (assuming the instrument price stays the same as now, a reasonable assumption based on what we have experienced so far). Quickly recouping the instrument cost and reaching the break-even point would require a large enough volume of sequencing jobs. A service-oriented sequencing provider with efficient operations and a large customer base would certainly have the financial advantage. Simply put, it is economies of scale. This is, in a way, similar to the changes we have seen in the IT industry, where large Web hosting service providers have taken over the responsibility of hosting web sites for many organizations, large or small. Genomic sequencing service providers of this kind have already sprouted up, with companies such as Agencourt Bioscience, Cofactor Genomics, Complete Genomics, Knome, and SeqWright, to name just a few. This trend will continue. Eventually, however, many of them will consolidate into a few larger providers as the market matures.

3.7 Technology Coexistence

Generations of sequencing technology are not mutually exclusive: the emergence of a newer generation of technology does not mean that older platforms become completely obsolete. They often coexist because of the complementary functionality of different generations, as illustrated in Table 3. A case in point: the next-gen platforms, which provide much higher throughput, have not completely replaced the 1st-gen technology based on Sanger's method. The read length and raw sequencing accuracy attainable with Sanger's method still secure it a niche in small-scale sequencing projects and/or as a complement to next-gen systems in large-scale projects.

Table 3 Functionality of sequencing technology a)
Technology | Definition | De novo: WGS | De novo: CBC | PCR Seq | Re-Seq | GT | 1000/100
1st-G | 1.1 Slab Gel | + | + | + | + | + | NA
1st-G | 1.2 Cap-4color | +++ | ++ | ++ | ++ | ++ | NA
2nd-G | 2.1 emPCR, LR | NA | +++ | NA | ++ | NA | NA
2nd-G | 2.1 emPCR, SR | NA | ++ | NA | +++ | NA | NA
2nd-G | 2.2 HT/No Flow Cell | NA | +++ | NA | +++ | NA | NA
2nd-G | 2.3 Single Molecule | NA | +++ | NA | +++ | + | 1000
3rd-G | 3.1 Chem/Nanotech | NA | +++ | NA | +++ | + | 1000
3rd-G | 3.2 Nanotech | NA | +++ | NA | +++ | + | 100
3rd-G | 3.3 Nanotech | NA | +++ | NA | +++ | + | 100
a) SR, short reads; LR, long reads; GT, genotyping; 1000/100, per-genome cost (US$); WGS, whole genome shotgun; CBC, clone-by-clone; HT, high throughput; SM, single molecule.

3.8 What China Should Do?

Fierce competition among organizations in highly developed countries and regions, such as the US and EU, to develop newer generations of sequencing technologies is one of many indicators of the field's importance. For China, developing technology of this sophistication is still a great challenge. But it is also a great opportunity.

Traditionally, China has lacked the financial support and technical know-how to develop sophisticated analytical instruments such as newer generations of DNA sequencers. Most analytical equipment developed in China so far has been limited to basic laboratory products such as centrifuges and power supplies. For high-end analytical needs such as high-throughput DNA sequencing, all the equipment is imported. This is understandable given China's economic and technological circumstances in the past.

To develop an advanced analytical platform of this kind, one needs technical and engineering infrastructure, collaboration, and the integration of multiple disciplines, e.g. biological chemistry, semiconductors, electronic engineering, mechanical engineering, computing, etc. Such an endeavor also requires a huge public or private investment, both organizational and financial. But the potential benefit is huge. First, through the development of an analytical system of this kind, Chinese scientists and engineers alike not only can contribute to human efforts in deciphering the secret code of life but also can learn and acquire the technical and organizational skills needed for such an effort. Second, beyond advancing technological, engineering, and scientific know-how, it has huge economic potential as well. Advanced genomic sequencing technology opens the door to personalized medicine. A corollary of this is the potential economic benefit for healthcare delivery. If only ten percent of the population in China opted to have their genomes determined at a price of $1000 apiece, it would create a 130 billion US dollar economy. Therefore, we argue that even though the up-front investment for China to develop this technology seems large, the potential loss from not doing it is even greater.

4 Conclusions

Three decades of innovation and development have ushered in a new era for genomic sequencing. It has evolved from a manual, one-sample-at-a-time operation into a highly automated, massively parallel array-based activity. We have experienced an exponential increase in sequencing throughput while seeing a precipitous reduction in its cost. With current and newer generations of sequencing technologies on the horizon, the goal of the one-thousand-dollar genome becomes more attainable. The ability engendered by rapid and less expensive readout of sequence information opens up a realm of possibilities in comparative genomic analysis, disease diagnosis, and ultimately personalized (or individualized) medicine. Given its huge potential benefit in both scientific and financial terms, China should play a greater role in this key technological invention of the century and in its in-depth application in biological research and medicine. When the genomes of all extant life forms and their meaningful variations have been thoroughly acquired and discovered, the contributors to such an endeavor should all be very proud of themselves.

This work was supported by the Chinese Academy of Sciences Scientific Research Equipments (Grant No. YZ200823).

1 Gilbert W. DNA sequencing and gene structure. Nobel lecture, 1980
2 Sanger F, Coulson A R. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol, 1975, 94: 441–448
3 Shendure J, Mitra R, Varma C, et al. Advanced sequencing technologies: methods and goals. Nat Rev Genet, 2004, 5: 335–344
4 Moore G E. Cramming more components onto integrated circuits. Electronics, 1965, 38: 4
5 Sanger F. Sequences, sequences, and sequences. Annu Rev Biochem, 1988, 57: 1–28
6 Smith L M, Fung S, Hunkapiller M W, et al. The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res, 1985, 13: 2399–2412
7 Applied Biosystems Timeline. www.appliedBiosystems.com
8 Mitra R D, Shendure J, Olejnik J, et al. Fluorescent in situ sequencing on polymerase colonies. Anal Biochem, 2003, 320: 55–65
9 Shendure J, Porreca G J, Reppas N B, et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science, 2005, 309: 1728–1732
10 Adessi C, Matton G, Ayala G, et al. Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms. Nucleic Acids Res, 2000, 28: e87
11 Dressman D, Yan H, Traverso G, et al. Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proc Natl Acad Sci USA, 2003, 100: 8817–8822
12 Mitra R D, Church G M. In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res, 1999, 27: e34
13 Fedurco M, Romieu A, Williams S, et al. BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res, 2006, 34: e22
14 Margulies M, Egholm M, Altman W E, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005, 437: 376–380
15 Ronaghi M, Karamohamed S, Pettersson B, et al. Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem, 1996, 242: 84–89
16 Macevicz S C. DNA sequencing by parallel oligonucleotide extensions. US patent 5750341, 1998
17 Complete Genomics Technology Paper. www.completegenomics.com
18 Dahl F, Drmanac R, Sparks A. Methods and oligonucleotide designs for insertion of multiple adaptors into library constructs. US patent application 20090176652, 2009
19 Holt R A, Jones S J M. The new paradigm of flow cell sequencing. Genome Res, 2008, 18: 839–846
20 Drmanac R, Sparks A B, Callow M J, et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science, 2010, 327: 78–81
21 Braslavsky I, Hebert B, Kartalov E, et al. Sequence information can be obtained from single DNA molecules. Proc Natl Acad Sci USA, 2003, 100: 3960–3964
22 Harris T D, Buzby P R, Babcock H, et al. Single-molecule DNA sequencing of a viral genome. Science, 2008, 320: 106–109
23 Harris T D, Buzby P R, Jarosz M, et al. Optical train and method for TIRF single molecule detection and analysis. US patent application 20070070349, 2007
24 Hardin S, Gao X, Briggs J, et al. Methods for real-time single molecule sequence determination. US patent 7329492, 2008
25 Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science, 2009, 323: 133–138
26 Levene M J, Korlach J, Turner S W, et al. Zero-mode waveguides for single-molecule analysis at high concentrations. Science, 2003, 299: 682–686
27 Korlach J, Marks P J, Cicero R L, et al. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci USA, 2008, 105: 1176–1181
28 Array based sequencing-by-synthesis. www.mobious.com

29 Densham D H. Nucleic acid sequence analysis. EU patent application EP1229133. 2002.
30 Driscoll R J, Youngquist M G, Baldeschwieler J D. Atomic-scale imaging of DNA using scanning tunnelling microscopy. Nature, 1990, 346: 294–296
31 Ikai A. STM and AFM of bio/organic molecules and structures. Surf Sci Rep, 1996, 26: 263–332
32 Tanaka H, Kawai T. Partial sequencing of a single DNA molecule with a scanning tunnelling microscope. Nat Nanotechnol, 2009, 4: 518–522
33 Bension R. Rapid sequencing of polymers. US patent application 20040214177. 2004.
34 Glover R W III. Systems and methods of analyzing nucleic acid polymers and related components. US patent 7291467. 2007.
35 Branton D, Deamer D W, Marziali A, et al. The potential and challenges of nanopore sequencing. Nat Biotechnol, 2008, 26: 1146–1153
36 Meller A, Nivon L, Branton D. Voltage-driven DNA translocations through a nanopore. Phys Rev Lett, 2001, 86: 3435–3438
37 Fologea D, Gershow M, Ledden B, et al. Detecting single stranded DNA with a solid state nanopore. Nano Lett, 2005, 5: 1905–1909
38 Ling X S, Bready B, Pertsinidis A. Hybridization-assisted nanopore sequencing of nucleic acids. US patent application 20070190542. 2007.
39 Lagerqvist J, Zwolak M, Di Ventra M. Fast DNA sequencing via transverse electronic transport. Nano Lett, 2006, 6: 779–782
40 He J, Lin L, Zhang P, et al. Identification of DNA base-pairing via tunnel-current decay. Nano Lett, 2007, 7: 3854–3858
41 Butler T Z, Pavlenok M, Derrington I M, et al. Single-molecule DNA detection with an engineered MspA protein nanopore. Proc Natl Acad Sci USA, 2008, 105: 20647–20652
42 Wu H-C, Astier Y, Maglia G, et al. Protein nanopores with covalently attached molecular adapters. J Am Chem Soc, 2007, 129: 16142–16148
43 Clarke J, Wu H-C, Jayasinghe L, et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol, 2009, 4: 265–270
44 Cheikh C, Koper G. Influence of the stick-slip transition on the electrokinetic behavior of nanoporous material. Physica A, 2007, 373: 21–28
45 Benner S, Chen R J, Wilson N A, et al. Sequence-specific detection of individual DNA polymerase complexes in real time using a nanopore. Nat Nanotechnol, 2007, 2: 718–724
46 IBM press release. Advancing the Science of DNA Sequencing. www.ibm.com. 2009.
47 Postma H W Ch. Rapid sequencing of individual DNA molecules in graphene nanogaps. arXiv:0810.3035v1 [physics.bio-ph]. 2008.
48 Albertorio F, Hughes M E, Golovchenko J A, et al. Base dependent DNA-carbon nanotube interactions: activation enthalpies and assembly-disassembly control. Nanotechnology, 2009, 20: 395101
49 Gigliotti B, Sakizzie B, Bethune D S, et al. Sequence-independent helical wrapping of single-walled carbon nanotubes by long genomic DNA. Nano Lett, 2006, 6: 159–164
50 Meng S, Maragakis P, Papaloukas C, et al. DNA nucleoside interaction and identification with carbon nanotubes. Nano Lett, 2007, 7: 45–50
51 Wright T P. Factors affecting the cost of airplanes. J Aeronautical Sci, 1936, 3: 122–128
52 Karow J. The cost of sequencing a human genome? Answers differ, even for the same platform. In Sequence, 2009