Journal of Entomology and Zoology Studies 2018; 6(3): 1601-1609

E-ISSN: 2320-7078 P-ISSN: 2349-6800 Next generation sequencing and its application in JEZS 2018; 6(3): 1601-1609 © 2018 JEZS livestock Received: 18-03-2018 Accepted: 23-04-2018

Vandana Yadav Vandana Yadav, BL Saini, Narendra Pratap Singh, Neeti Lakhani and Ph.D. Scholar, Animal Genetics, Rahul Sharma ICAR-NDRI, ICAR-National Dairy Research Institute, Karnal, Haryana, India Abstract The ability to determine nucleic acid sequences is one of the most important platforms for the detailed BL Saini study of biological systems. Next generation sequencing (NGS) technology has revolutionized genomic Ph.D Scholar, Animal Genetics, and genetic research. Starting from the basic Sanger sequencing, the next generation sequencing are ICAR-IVRI, ICAR-National providing high throughput and much cheaper alternative. All NGS have a similar base methodology that Dairy Research Institute, includes template preparation, sequencing and imaging, and data analysis. Different NGS platform use Karnal, Haryana, India different technology for the identification of the called base. Different commercially available technologies include 454 FLX Roche (pyrosequecing), Illumina Solexa (CRT), ABI SOLiD (sequencing Narendra Pratap Singh M.V.Sc Scholar, Animal genetics, by ligation) and Ion semiconductor sequencing (Ion Torrent) which require the pre-amplification process ICAR- NDRI, ICAR-National before sequencing. Single molecule approach are the developing technology for sequencing which can Dairy Research Institute, provide the @1000$ per genome. Single molecule technology includes Helicos sequencer, SMRT, Karnal, Haryana, India Nanopore technology that does not require the amplification prior to sequencing. Data analysis can be time-consuming and may require special knowledge of bioinformatics to gather accurate information Neeti Lakhani from sequence data. Resequencing of the human genome is being performed to identify genes and Ph.D Scholar, Nutrition division, regulatory elements involved in pathological processes. NGS has also provided a wealth of knowledge ICAR-NDRI, ICAR-National for comparative biology studies through whole genome sequencing of a wide variety of organisms. Dairy Research Institute, Simply analyzing all DNA from control versus infected (diseased) individuals will reveal the “extra” Karnal, Haryana, India DNA, which most likely derives from the infectious agent. The approach was used successfully to identify e.g. colony collapse disorder killing honeybees but also to identify the cause of diseases that Rahul Sharma killed thousands of humans in the past. Thus NGS can be used for genomics, trancriptomics, epigenetic VEO, DAH, Katni, Madhya study and study. Pradesh, India

Keywords: Sanger sequencing, nucleotide bases, template, adapter

1. Introduction DNA sequencing is the process of determining the precise order of nucleotide bases (adenine, guanine, cytosine, and thymine) within a DNA molecule. The advancement in DNA

sequencing technology has been proved to be a boon for biological and medical research and discovery. Knowledge of DNA sequences is not only required for basic biological research but in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematic. The modern DNA sequencing technology with rapid speed has been

instrumental in the sequencing of the complete DNA sequences, or genomes of numerous types and various life forms, including the human genome and other complete DNA sequences of many animal, plants, and microbial species.

2. History of sequencing

Ray Wu at Cornell University (1970) established the first method for determining DNA sequences, based on a location-specific primer extension strategy. The cohesive ends of lambda phage DNA were sequenced by using DNA polymerase catalysis and specific nucleotide labeling, both of which forms the base for the current sequencing technologies [1]. This primer-extension strategy was then used to develop more rapid DNA sequencing Correspondence Vandana Yadav methods. Frederick Sanger (1977) at the MRC Centre, Cambridge, UK then developed the Ph.D. Scholar, Animal Genetics, method for "DNA sequencing with chain-terminating inhibitors". In the same year Walter ICAR-NDRI, ICAR-National Gilbert and Allan Maxam at Harvard also developed another method for DNA sequencing Dairy Research Institute, known as “DNA sequencing by chemical degradation”. In 1987, gave the Karnal, Haryana, India first full-automated sequencer (ABI370) for the first time. ~ 1601 ~ Journal of Entomology and Zoology Studies

After that various high throughput-sequencing platforms 3. Next-generation sequencing methods (Pyrosequencing-2005, Illumina-2006, Ion torrent-2011, Pac- Generation refers to the chemistry and technology used by the Bio- 2012, illumine HiseqX-2013, Nanopore- 2014) were sequencing process. First generation generally refers to launched, marking the era of sequencing. The advancement in Sanger sequencing. "Next-generation" is generally used to sequencing technologies has now made it possible to refer to any of the high-throughput methods which were sequence the complete genome within a short time and in a developed after Sanger. Third-generation refers to single- cost effective manner. molecule methods (Helicos and Pac-Bio SMRT) and fourth generation includes the Nano-pore sequencing (Fig: 1).

Fig 1: Different NGS platforms

3.1 Sanger method of sequencing invented by Frederick Sanger in 1977 & got Nobel Prize in It was the most common approach used for DNA sequencing 1980.

Fig 2: Sanger sequencing by chain termination

This method is also termed as Chain Termination or Dideoxy polymerase, a short primer DNA to enable the polymerase to method (Fig: 2). It relies on the use of dideoxyribonucleoside start DNA synthesis, and the four deoxyribonucleoside triphosphate, derivative of the normal deoxyribonucleoside triphosphate (dATP, dCTP, dGTP, dTTP) [3]. If a triphosphate that lack the 3′ hydroxyl group [2]. It is based on dideoxyribonucleotide analog of one of these nucleotides is the synthesis of DNA in a controlled way to generate also present in the mixture, it can incorporate into a growing fragments terminating at specific point. Purified DNA is DNA chain. Because this chain now lacks a 3′ OH group, the synthesized in vitro in a mixture that contains single stranded addition of next nucleotide is blocked, and the DNA chain molecules of the DNA to be sequenced, the enzyme DNA terminates at that point. ~ 1602 ~ Journal of Entomology and Zoology Studies

To determine the complete sequence of a DNA fragment, the dramatic increase in cost-effective sequence throughput, double- stranded DNA is first separated into its single strands albeit at the expense of read lengths [5]. The next-generation and one of the strands is used as a template for sequencing. technologies commercially available today include the 454 Four different chain- terminating dideoxyribonucleoside GS20 pyrosequencing based instrument (Roche Applied triphosphates (ddATP, ddCTP, ddGTP, ddTTP) are used in Science), the Solexa 1G analyzer (Illumina, Inc.), the SOLiD four separate DNA synthesis reactions on copies of the same instrument from Applied Biosystems, and the Heliscope from single-stranded DNA template. Each reaction produces a set Helicos, Inc. and Nanopore from Oxford [6]. of DNA copies that terminate at different points in the sequence. The products of these four reactions are separated 5. Workflow of next-generation sequence by electrophoresis in four parallel lanes of a thin slab The workflow to produce next-generation sequence-ready polyacrylamide gel. The newly synthesized fragments are libraries is straightforward and includes sample preparation, detected by a label (either radioactive or fluorescent) that has library construction, clonal amplification, sequencing and data been incorporated either into the primer or into of the analysis (Fig: 3). deoxyribonucleoside triphosphates used to extend the DNA chain [4]. In each lane, the bands represent fragments that have a. Sample preparation been terminated at a given nucleotide but at different  Source of sample (DNA, RNA) positions in the DNA. By reading off the bands in order,  Qualify and quantify samples starting at the bottom of the gel and working across all lanes, the DNA sequence of the newly synthesized strands can be b. Library Construction determined. Thus, a long and labor-intensive process was  Prepare platform specific library. completed, and the sequencing data for the DNA of interest  Fragmentation of target DNA and adapter ligation. were in hand and ready for assembly, translation to amino acid sequence, or other types of analysis. c. Clonal Amplification  Bridge PCR amplification-Illumina sequencing. 4. Advances in DNA sequencing technologies  Emulsion PCR-SOLiD, Ion torrent and pyrosequencig. The landmark publications of the late 1970 s by Sanger's and  Third and fourth generation sequencing technologies Gilbert's groups and notably the development of the chain don’t require amplification. termination method by Sanger and colleagues established the

groundwork for decades of sequence-driven research that d. Sequencing followed. The chain-termination method published in 1977  Perform sequencing run reaction on NGS platform. also commonly referred to as Sanger or dideoxy sequencing,  Sequencing by ligation-SOLiD sequencing. has remained the most commonly used DNA sequencing technique to date and was used to complete human genome  Sequencing by synthesis-Illumina sequencing, sequencing initiatives led by the International Human pyrosequencig, Ion torrent. Genome Sequencing Consortium and Celera Genomics. Very recently, the Sanger method has been partially supplanted by e. Data Analysis several “next-generation” sequencing technologies that offer  Application specific data analysis pipeline.

Fig 3: Workflow of NGS ~ 1603 ~ Journal of Entomology and Zoology Studies

DNA fragments that may originate from a variety of front-end apyrase and can therefore be incorporated after the next processes are prepared for sequencing by ligating specific nucleotide) that eventually will generate high levels of noise. adaptor oligos to both ends of each DNA fragment. Approximately 1.2 million wells will give one unique Importantly, relatively little input DNA (a few micrograms at sequence of 400 bp, on average generating less than 500 most) is needed to produce a library. These platforms also million bases (Mb) in one single run. Whole-genome have the ability to sequence the paired ends of a given sequencing has been performed on bacterial genomes in fragment, using a slightly modified library process. This single runs. An oversampling of 20× permits the identification approach can be used if a de novo genome sequence is to be of PCR-introduced errors and to call homopolymeric errors. assembled from the next generation data. Finally, next- generation sequencers produce shorter read lengths (35-250 bp, depending on the platform) than capillary sequencers (650–800 bp), which also can impact the utility of the data for various applications such as de novo assembly and genome resequencing.

6. Pyrosequencing An inherent limitation of Sanger sequencing is the requirement of in vivo amplification of DNA fragments that are to be sequenced, which is usually achieved by cloning into bacterial hosts. The cloning step is prone to host-related biases, is lengthy, and is quite labor intensive. The 454 technology, the first next-generation sequencing technology is based on a highly efficient in-vitro DNA amplification

(emulsion PCR) rather than in-vivo amplification by cloning. Fig 4: Basic principle behind Pyrosequencing The DNA fragment to be sequenced is fragmented and streptavidin beads are attached to each fragment using 7. Illumina/Solexa sequencing adapters. During amplification by emulsion PCR these The Illumina/Solexa approach achieves cloning-free DNA fragments carrying streptavidin beads are captured into amplification by attaching single-stranded DNA fragments to separate emulsion droplets, acting as individual amplification a solid surface known as a single-molecule array, or flowcell, 7 reactors and producing ~10 clonal copies of a unique DNA and conducting solid-phase bridge amplification of single- template per bead. The DNA templates with beads are then molecule DNA templates. In this process, one end of single transferred to wells of a picotiter plate (one template per well) DNA molecule is attached to a solid surface using an adapter; and analyzed using a pyrosequencing reaction. the molecules subsequently bend over and hybridize to Pyrosequencing, sequencing-by-synthesis method detecting complementary adapters (creating the “bridge”), thereby successful incorporation of nucleotide as emitted photons. forming the template for the synthesis of their complementary Since the single-stranded DNA fragments on the beads have strands. After the amplification step, a flow cell with more been amplified with general tags, an universal primer is than 40 million clusters is produced, wherein each cluster is annealed permitting an elongation towards the bead. The composed of approximately 1000 clonal copies of a single emission of photons upon incorporation depends on a series template molecule. The templates are sequenced in a of enzymatic steps. Incorporation of a nucleotide by a massively parallel fashion using a DNA sequencing-by- polymerase releases a diphosphate group (PPi), which synthesis approach that employs reversible terminators with catalyzed by ATP sulphurylase forms adenosine triphosphate removable fluorescent moieties and special DNA polymerases (ATP) by the use of adenosine phosphosulphate (APS). that can incorporate these terminators into growing Finally, the enzyme luciferase (together with D-luciferin and oligonucleotide chains [7]. The terminators are labeled with oxygen) can use the newly formed ATP to emit light. Another fluors of four different colors to distinguish among the enzyme, apyrase, is used for degradation of unincorporated different bases at the given sequence position and the dNTPs as well as to stop the reaction by degrading ATP (Fig: template sequence of each cluster is deduced by reading off 4). the color at each successive nucleotide addition step. In the 454 systems, the Pyrosequencing technology is adapted Although the Illumina approach is more effective at as follows. The enzymes luciferase and ATP sulphurylase are sequencing homopolymeric stretches than pyrosequencing, it immobilized on smaller beads surrounding the larger produces shorter sequence reads and hence cannot resolve amplicon carrying beads. All other reagents are supplied short sequence repeats. In addition, due to the use of modified through a flow allowing reagents to diffuse to the templates in DNA polymerases and reversible terminators, substitution the Picotiter Plate. Polymerase and one exclusive dNTP per errors have been noted in Illumina sequencing data. Typically, cycle generate one or more incorporation events and the the 1G-genome analyzer from Illumina, Inc. is capable of emitted light is proportional to the number of incorporated generating 35-bp reads and producing at least 1 GB of nucleotides. Photons are detected by a CCD camera and after sequence per run in 2-3 days. The raw accuracy is said to be each round; apyrase is flowed through in order to degrade at 98.5% and the consensus (3×coverage) at 99.99%. The cost excess nucleotides. The washing procedure for the removal of per base is approximately 1% of the cost for Sanger byproducts permits read lengths of over 400 bp (250 bp in the sequencing [8]. GS FLX system and over 400 bp in the recently upgraded instrument, the GS FLX Titanium). This limitation is due to 8. ABI Solid Sequencing negative frame shifts (incorporation of nucleotides in each The ABI Solid sequencing system (2007) is based upon cycle is not 100% complete) and positive frame shifts (the sequencing by ligation [9]. The ABI Solid sequencing starts population of nucleotides that is not fully degraded by the with the basic step i.e. fragmentation of the target DNA

~ 1604 ~ Journal of Entomology and Zoology Studies

generation of library. The fragments are ligated to adapters sphere. During the emPCR steps, individual library molecules and then bound to beads. The emulsion PCR amplifies DNA get amplified to millions of identical copies that are bound to fragments on the beads. A water droplet in oil emulsion the beads to allow ultimate detection of the signal. contains the amplification reagents and only one fragment Although one emPCR reaction can generate billions of bound per bead. During amplification at first, a primer is template spheres, some aspects inherent to the emPCR hybridized to the adapter and then specially designed probe method, in addition to the general biases during PCR (different fluorescently dyed oligonucleotide 8-mer probe) is amplification, prevent optimal output. Due to the double attached to the primer by ligase enzyme along with the Poisson distribution behavior, it is impossible to achieve addition of ligation mixture. In these octamers, the first two optimal loading of one library molecule into all individual are the actual bases, next three are universal bases and last vesicles. In fact 1/3 of the vesicles will have the one molecule three are universal bases with fluorescent dye, measures for to one vesicle ratio, the remaining 2/3 will be either without a fluorescence and cleaved in each cycle, so attached probe is molecule or have more than one. In addition, breaking of the only 5 nucleotide long i.e. fluorescence measurement for emulsion and recovery of the spheres are inefficient even with every fifth base. The ligated octamer oligonucleotides are the latest automated systems. In the final step spheres cleaved off after the fifth base, removing the fluorescent label, containing amplified DNA are selected in an enrichment step then hybridization and ligation cycles are repeated, this time from empty spheres and the loaded spheres are deposited into determining bases 9 and 10 in the sequence; in the subsequent the sequencing chip. cycle bases 14 and 15 are determined, and so on (Fig: 5). The The Ion torrent chip consists of a flow compartment and error rate with this technique is very low along with a great solid-state pH sensor micro-arrayed wells that are output of 3 to 10 Gb per run. manufactured using processes built on standard CMOS technology. The detection of the incorporated bases during sequencing is not based on imaging of fluorescent signals but on the release of an H+ during extension of each nucleotide. The release of H+ is detected as a change in the pH within the sensor wells. Due to the lack of the time consuming imaging a sequencing run can be completed within 4 h. Since there is no detectable difference for H+ released from an A, C, G or T bases, the individual dNTPs are applied in multiple cycles of consecutive order. If upon delivery of a dNTP no change in pH is detected in a specific well, that nucleotide is not present in the template at the next available position. Alternatively, if a change in pH is detected, that base is in the template. In contrast to the Illumina's SBS method the dNTPs used are not blocked and when the template contains a series of a nucleotide after each other (a homopolymer stretch), the entire stretch of identical bases will be extended, leading to an accordingly stronger pH change which is directly proportional to the number of identical bases incorporated. Relative to a single ‘A’ a stretch of ‘AA’ will give a 2 fold increase in the pH, while an ‘AAA’ template will yield a 1.5 fold (3/2) increase in pH relative to ‘AA’ and for 6 vs. 5 identical bases this relative increase is just 1.2 fold. This decrease of the relative increase of the change in pH as the homopolymer length increases reduces the probability by which a homopolymer region is called correctly. The dNTPs are added in a predefined flow order. At the first release of the system, this order was a repetitive T–A–C–G sequence. As with Illumina sequencing, not all of the template molecules on a template sphere get extended in perfect

synchrony. On average 0.5–1% of the molecules deviate from Fig 5: Steps involved in Solid sequencing the flow either because they lag behind due to improper extension or they advance ahead due to carry over of dNTPs 9. Ion torrent technology from a previous cycle (Fig: 6). In order to minimize this de- For the PGM/ Proton sequence platforms the sequence phasing, the flow order was changed to a more sophisticated templates are generated on a bead or a sphere via emulsion sequence that incorporated A–T–A catch-up type flows. This PCR. An oil–water emulsion is created to partition small scheme allows incomplete extension of the A nucleotide to reaction vesicles that each ideally contains one sphere, one catch up after the T base. Although this does come at a cost of library molecule and all the reagents needed for amplification. decreased overall read length, the overall quality of the read Two primers that are complementary to the sequence library does improve. Still, the quality of the reads gradually adapters are present, but one is only present in solution while decreases towards the ends of the reads. By taking into the other is bound to the sphere. This serves to select for the account the flow order, it is possible to make flow-aware base library molecules with both an A and a B adapter while caller algorithms and flow-space aware aligner software and excluding those molecules with two A or B adapters from variant detection tools that take the actual flow order into loading on the beads during emPCR. In addition, this ensures account when processing the data in order to generate higher a uniform orientation of the sequence library molecules on the accuracy data. The present error rate for substitutions is

~ 1605 ~ Journal of Entomology and Zoology Studies

~0.1% which is very similar to that of the Illumina systems. is removed. Fluorescently labeled nucleotides are added, one The main point of criticism the system endures is the homo- in each cycle, followed by imaging. A cleavage step removes polymer errors. Despite many improvements the 5-mer homo- the fluorophore and permits nucleotide incorporation in the polymer error rate is still at 3.5%. next round. A different, although very promising, approach is taken by Pacific Biosciences. The technology, denoted Single- molecule Real Time Sequencing-by-synthesis (SMRT shown read lengths of single DNA fragments of over 1500 bases in 3000 parallel reactions. The heart of the technology is so called zero-mode waveguides (ZMW), which essentially are nanometer scale wells with a diameter of 70 nm [11]. Light bulges inward at the opening, permitting illumination of a detection volume of 20 zl where a single DNA polymerase is immobilized. Nucleotides, fluorescently labeled at the terminal phosphate, are incorporated by the polymerase and thereby exposing its base-specific fluorophore for a few milliseconds which is enough for detection. Benefits are long read lengths of thousands of bases in one stretch and high speed (10 bases per second and molecule). It is still at the

Fig 6: Mechanism involved in Ion torrent sequencing proof-of-concept stage and no commercial instrument is ready. Thousands of ZMWs in parallel may in a future Since the initial release of the Ion torrent platform, this instrument (no sooner than 2010) generate 100 gigabases per technology has evolved at a very rapid pace. Through hour. A second-generation instrument capable of sequencing a increasing the total surface area of the chips and the sensor human genome for $1000 is an additional number of years in well density the average read length from 100 up to 400 bp the future. could be increased. The Proton-I chips currently yield 60-80 million reads per run, reaching 10 GB. This is enough to 11. Nano pore sequencing sequence two human exomes at ~50× coverage. Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and 10. Helicos true single molecule sequencing reliably sequence the entire human genome for less than The concept of sequencing-by-synthesis without a prior $1000, and possibly for even less than $100. The nanopore amplification step i.e. single-molecule sequencing is currently approach will be one option for the fourth-generation low-cost pursued by a number of companies [10]. Helicos Biosciences and rapid DNA sequencing technology. Nanopore-based has an instrument, the HeliScope™, with a claimed technologies originated from the Coulter counter and ion throughput of 1.1 Gpb per day. Single fragments are labeled channels. With the application of an external voltage, particles with Cy3 for localization of template strands on an array and with sizes slightly smaller than the pore size are passed a predefined, Cy5-labeled nucleotide (for instance “A”) are through the nanometer-sized pores are either embedded in a incorporated, detected by a fluorescent microscope and biological membrane or formed in solid-state film, which cleaved off in each cycle. Four cycles, one for each separates the reservoirs containing conductive electrolytes nucleotide, constitutes a “quad” and multiple quad runs are into cis and trans compartments. When the pore is blocked by claimed to produce read lengths of up to ~55 bases. At 20 an analyte, such as a negatively charged DNA molecule added bases or longer, 86% of the strands are available and at 30 into the cis chamber, current flowing through the nano pore bases, around 50%. True single-molecule sequencing would be blocked, interrupting the current signal (Fig: 7). (tSMS™) is achieved by initially adding a poly A sequence to The physical and chemical properties of the target molecules the 3′-end of each fragment, which allows hybridization to can be calculated by statistically analyzing the amplitude and complementary poly T sequences in a flow cell. After duration of transient current blockades from translocation hybridization, the poly T sequence is extended and a events. Nanopores as single-molecule sensing technologies complementary sequence is generated. In addition, the have great potential applications in many areas, such as template is fluorescently labeled at the 3′-end and thus, analysis of ions, DNA, RNA, peptides, proteins, drugs, illumination of the surface reveals the location of each polymers, and macromolecules, as previously reviewed. hybridized template. This process allows generation of a map of the single molecule landscape before the labeled template

~ 1606 ~ Journal of Entomology and Zoology Studies

Fig 7: Mechanism involved in Nanopore sequencing

12. Application of Next generation sequencing in animal thousands of SNPs present in the genome. genetics 12.1 Transcriptome sequencing 12.4 Identification of all mutation in an organism at The sequencing of cDNA rather than genomic DNA focuses genomic level analysis on the transcribed portion of the genome. This focus 12.5 Nucleotide variation profiling reduces the size of the sequencing target space, which can be 12.6 Selection of resistant animal to disease viewed as desirable given the fact that, even with next- 12.7 Metagenomics generation sequencers, sequencing an entire vertebrate Using brute force sequencing, simply reading all DNA genome is still an expensive undertaking. Transcriptome sequences present in a sample, metagenomics is a way to sequencing has been used for Gene expression profiling, make an inventory of what is present in a sample, of what is Genome annotation, Rearrangement detection that causes living where. A simple but effective application of this is aberrant transcription events, noncoding RNA discovery and trying to detect the cause of an infectious disease. Simply quantification [12]. analyzing all DNA from control versus infected (diseased) individuals will reveal the “extra” DNA, which most likely 12.2 Analysis of epigenetic modifications of histones and derives from the infectious agent. The approach was used DNA successfully to identify e.g. colony collapse disorder killing Epigenetics is the study of heritable gene regulation that does honeybees but also to identify the cause of diseases that killed not involve the DNA sequence. The two major types of thousands of humans in the past, inclusive the Black Death. epigenetic modifications regulating gene expression are DNA Metagenomics can be performed by a sequence it all approach methylation by covalent modification of cytosine-5′ and or by focusing on specific uniformly conserved sequences like posttranslational modifications of histone tails there are three e.g. ribosomal RNA genes only [14]. The latter approach has main approaches to detecting DNA methylation on a large two main advantages; the complexity of the data obtained is scale, including restriction endonuclease digestion coupled to much smaller and more sequences can be assigned to a microarray technology, bisulfite sequencing, and specific organism or a group of related organisms. The latter immunoprecipitation of 5′-methylcytosine to separate facilitates some semi-quantitative analysis which is much methylated from unmethylated DNA. Bisulfite sequencing is more difficult when analyzing all sequences mixed from based on the chemical property of bisulfite to induce the many organisms with largely varying genome sizes. conversion of cytosine residues to uracils while leaving 5′ Œ- Metagenomics focussing on rDNA genes has been used to methylcytosines intact. Therefore, sequencing of bisulfite study many different things, incl. e.g. the effect of the 2004 treated DNA will reveal the positions of methylated cytosines tsunami on microbial ecologies in marine, brackish, (those positions that remained cytosines following the freshwater and terrestrial communities in Thailand [15]. treatment) [13]. 12.8 Whole-exome sequencing (WES) 12.3 Genomic selection Only the coding regions of the genome are sequenced. The Advent of all these sequencing techniques of sequencing exome represents less than 2% of the human genome, but forms the basis for the Genomic Selection. Decreased cost of contains ~85% of known disease-causing variants. For this sequencing triggered the sequencing of all the animal’s reason, WES has been extensively used for clinical studies in genomes, which results in the identification of tens of the recent years, and is giving rise to promising novel ~ 1607 ~ Journal of Entomology and Zoology Studies

diagnostic tools that have the potential to transform medical functional genomics application is determining DNA healthcare in the near future. For example, an experimental sequences associated with epigenetic modifications of approach for comprehensive WES of circulating tumor cells histones and DNA. Next-generation sequencing approaches from cancer patients has recently been described [16]. have been used in this field to profile DNA methylations, posttranslational modifications of histones, and nucleosome 12.9 Single cell genomics positions on a genome-wide scale. While Sanger sequencing An exciting new field is a single cell genomics. A major has previously addressed these areas, next-generation objective of this field is to reconstruct cell lineage trees using technologies have improved upon the throughput, the depth of somatic mutations that arise due to DNA replication errors. coverage, and the resolution of Sanger sequencing studies. As a result, each cell in a multicellular organism carries a Despite the recent exciting research advances involving next genomic signature that is probably unique. Cell lineage trees generation sequencers, it should be noted that method provide important information and have applications in development is still in its infancy [19]. Efficient data analysis developmental biology and tumor biology [17]. For example, pipelines are required for many applications before they sequencing the genomic DNA of individual breast cancer become routine, and more studies are needed to address the cells allowed researchers to reconstruct the tumor population robustness of these techniques as well as the correspondence structure and evolutionary history [18]. of results with those obtained by previous methods. Although next-generation sequencers are already being widely used, 12.10 Large-scale identification and development of there are other sequencing methods, such as nanopore molecular markers sequencing, whose scalability is being explored to decrease One of the most important application of NGS within the sequencing cost and enhance throughput even further [20]. ecological and population genetics is the development of molecular markers on a large scale. NGS generally allows for 14. References cost-effective and rapid identification of hundreds of 1. Sanger F, Donelson JE, Coulson AR, Kossel H, Fischer microsatellite loci and thousands of SNPs, even if only a D. Use of DNA polymerase I primed by a synthetic fraction of a sequencing run is used. This will, for example, oligonucleotide to determine a nucleotide sequence in facilitate QTL mapping studies and will increase the quality phage fl DNA. Proc. Natl. Acad. Sci. USA. 1973; of outlier- and structure analyses. Massively increasing the 70:1209-13. number of markers will enable researchers not only to get 2. Sanger F, Nicklen S, Coulson AR. DNA sequencing with better precision in population genetic studies (QTL and chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA. linkage disequilibrium (LD) mapping projects and kinship 1977; 74:5463-67. assignments but also to pursue topics such as historical 3. Sanger F, Coulson AR. A rapid method for determining demographic patterns, introgression and admixture. sequences in DNA by primed synthesis with DNA polymerase. J Mol. Biol. 1975; 94:441-48. 13. Conclusions 4. Sanger F, Brownlee GG, Barrell BG. A two-dimensional The short read structure of next-generation sequencers fractionation procedure for radioactive nucleotides. J provides potential problems for Mol. Biol. 1965; 13:373-98. particularly in areas associated with sequence repeats. 5. Metzker ML. Sequencing technologies-the next However, it has found broad applicability in sequence census generation. Nat. Rev. Genet. 2009; 11:31-46. studies, wherein determining the sequence of the whole DNA 6. Metzker ML. Emerging technologies in DNA molecule is not essential. The short read length also sequencing. Genome Res. 2005; 15:1767-1776. necessitates the development of paired-end sequencing 7. Rogers YH and Venter JC. Genomics: massively parallel approaches for improved mapping efficiency. To date, such sequencing. Nature. 2005; 437:326-327. approaches have been reported for the 454 technology and are 8. Goldberg SM, Johnson J, Busam D, Feldblyum T, being made available for the Illumina and Solid platforms. Ferriera S, Friedman R et al. A Sanger/pyrosequencing The accuracy of next-generation sequencers is improving, but hybrid approach for the generation of high-quality draft users generally rely on relatively high redundancy of assemblies of marine microbial genomes. Proc.Natl sequence coverage to determine reliably the sequence of a Acad. Sci. USA. 2006; 103:11240-11245. region, particularly of that containing a polymorphism. 9. Gait MJ, Matthes HW, Singh M, Sproat BS, Titmas RC. Addressing the accuracy issue by improving the reaction Rapid synthesis of oligodeoxyribonucleotides. Solid chemistry has the potential of further decreasing the current phase synthesis of oligodeoxyribonucleotides by a sequencing cost associated with next-generation sequencers. continuous flow phosphotriester method on a kieselguhr- Next-generation sequencing technologies have found broad polyamide support. Nucleic Acids Res. 1982; 10:6243-54. applicability in functional genomics research. Their 10. Braslavsky I, Hebert B, Kartalov E, Quake SR. Sequence applications in the field have included gene expression information can be obtained from single DNA molecules. profiling, genome annotation, small ncRNA discovery and Proc. Natl Acad. Sci. USA. 2003; 100:3960-3964. profiling, and detection of aberrant transcription, which are 11. Levene MJ, Korlach J, Turner SW, Foquet M, Craighead areas that have been previously dominated by microarrays. HG. Webb WW. Zeromode waveguides for single- Significantly, several studies found that 454 sequencing molecule analysis at high concentrations, Science. 2003; correlated well with the established gene expression profiling 299:682.686. technologies such as microarray results (correlation 12. Wang Z, Gerstein M, Snyder M. RNA-Seq: a coefficients of (0.83-0.91) and moderately with SAGE data revolutionary tool for transcriptomics. Nat Rev Genet. (correlation coefficients of 0.45). While the transcriptome 2009; 10(1):57-63. sequencing studies discussed here predominantly used the 454 13. Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares technology, Illumina and Solid technologies also offer EC, Clark TA et al. Direct detection of DNA methylation significant potential for such applications. Another major during single-molecule, real-time sequencing. Nat

~ 1608 ~ Journal of Entomology and Zoology Studies

Methods. 2010; 7(6):461-465. 14. Abubucker S, Segata N, Goll J, Schubert AM, Izard J. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput. Biol. 2012; 8:e1002358 15. Somboonna N, Wilantho A, Jankaew K, Assawamakin A, Sangsrakru D. Microbial ecology of Thailand tsunami and nontsunami affected terrestrials. Plos One. 2014; 9:e94236. 16. Lohr JG. Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Nat. Biotechnol. 2014; 32:479-484. 17. Reizel Y. Colon stem cell and crypt dynamics exposed by cell lineage reconstruction. Plos Genet. 2011; 7:e1002192. 18. Navin N. Tumour evolution inferred by single-cell sequencing. Nature Pettersson, Erik; Lundeberg, Joakim and Ahmadian, Afshin. Generations of sequencing technologies Genomics. 2009; 93(472): 90-94, 105-111. 19. Chan EY. Advances in sequencing technology. Mutat. Res. 2005; 573:13-40. 20. Buermans HPJ, Dunnen JT. Next generation sequencing technology: Advances and applications. Biochimica ET Biophysica Acta. 2014; 1842:1932-1941.

~ 1609 ~