SUBJECT FORENSIC SCIENCE

Paper No. and Title PAPER No.13: DNA Forensics

Module No. and Title MODULE No.21: DNA Sequencing - I

Module Tag FSC_P13_M21

FORENSIC SCIENCE PAPER No.13: DNA Forensics MODULE No.21: DNA Sequencing - I

TABLE OF CONTENTS

1. Learning Outcomes

2. Introduction

3. First Generation Sequencing Methods

4. Second Generation Sequencing Techniques

5. Third Generation Sequencing – emerging technologies

6. DNA Sequence analysis

7. Summary

1. Learning Outcomes

After studying this module, the reader shall be able to understand:

• DNA sequencing

• Methods of DNA sequencing

• DNA sequence analysis

2. Introduction

In 1953, James Watson and Francis Crick proposed the double-helix model of DNA, based on X-ray diffraction studies of crystallized DNA by Rosalind Franklin. As per this model, DNA comprises two strands of nucleotides coiled around each other, held together by hydrogen bonds and running in opposite directions. Each strand is composed of four nucleotides, adenine (A), cytosine (C), guanine (G) and thymine (T), with A always paired with T through two hydrogen bonds and C always paired with G through three hydrogen bonds.
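The pairing rules above can be illustrated with a minimal Python sketch. The function names `complementary_strand` and `hydrogen_bonds` are illustrative choices, not part of the module, and the sketch assumes an unambiguous sequence containing only A, C, G and T.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds per base pair: two for A-T, three for C-G.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complementary_strand(strand):
    """Return the partner strand read 5'->3'.

    Because the two strands run in opposite directions, the partner is
    the complement of the input read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


if __name__ == "__main__":
    example = "ATGC"
    print(complementary_strand(example))  # GCAT
    print(hydrogen_bonds(example))        # 10
```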

The sequence of the bases (A, T, G, C) along DNA contains the complete set of instructions that make up the genetic inheritance. Determining the order of these nucleotide bases in a DNA strand is a primary step in assessing regulatory sequences and coding and non-coding regions. The term DNA sequencing refers to methods for identifying the sequence of these nucleotide bases in a molecule of DNA. The foundation for sequencing proteins was laid by the work of Fred Sanger, who by 1955 had determined the sequence of all the amino acids in insulin, a small protein produced by the pancreas.

The first technique for determining DNA sequences, involving a location-specific primer extension strategy, was developed by Ray Wu at Cornell University in 1970. Between 1970 and 1973, Wu, R. Padmanabhan and co-workers demonstrated that this technique could be used to determine any DNA sequence by applying synthetic location-specific primers.

Frederick Sanger then adopted this primer-extension strategy to develop more rapid DNA sequencing methods at the MRC Centre, Cambridge, UK, and published a method for "DNA sequencing with chain-terminating inhibitors" in 1977.

Knowledge of the DNA sequences of genes and other parts of the genomes of organisms has become crucial for several applied and research fields, such as:

• Diagnostics

• Biotechnology

• Forensic Biology

• Biological Systematics

• Taxonomy

• Phylogeny

• Ecology

• Genetic studies

Advances in sequencing were aided by the concurrent development of recombinant DNA technology, which allowed DNA samples to be isolated from sources other than viruses. With the development of dye-based sequencing techniques with automated analysis, DNA sequencing has become easier to handle and considerably faster. The rapid sequencing achieved with modern DNA sequencing tools was instrumental in sequencing the human genome in the Human Genome Project.

Developments in DNA sequencing technologies can be grouped into three stages:

1. First generation sequencing

2. Second generation sequencing or next generation sequencing (NGS)

3. Third generation sequencing (TGS) – emerging technologies

The Sanger and Gilbert approaches to sequencing DNA are often called "first-generation" sequencing because they were the first to be developed. In the late 1990s, new methods that were faster and cheaper, called second-generation sequencing methods, began to be developed. The most popular and widely used second-generation sequencing method was pyrosequencing. Today various newer sequencing methods are available and others are in the process of being developed. These are often called third-generation sequencing methods.

3. First Generation Sequencing Methods

3.1 Maxam–Gilbert Sequencing (The Chemical Cleavage) Method

During 1976-1977, Allan Maxam and Walter Gilbert developed a DNA sequencing technique based on chemical modification of DNA and subsequent cleavage at specific bases. The technique requires radioactive labelling at one end and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four separate reactions (G, A+G, C, C+T). Thus a series of labelled fragments is produced, each running from the radiolabelled end to the first 'cut' site in the molecule. The fragments from the four reactions are run side by side in a gel for separation by size. To visualise the fragments, the gel is exposed to X-ray film for autoradiography, producing a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred.
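How a sequence is inferred from the four reactions (G, A+G, C, C+T) can be sketched in a few lines of Python. This is an idealised illustration only: the function names are hypothetical, and each band is assumed to be recorded directly as the position of the cleaved base counted from the labelled end, which ignores real gel resolution and the exact fragment-length offset.

```python
def simulate_lanes(sequence):
    """Build idealised band positions for the four cleavage reactions.

    A band at position i in a lane means that the reaction cut the
    molecule at base i (counted from the labelled end).
    """
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for position, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(position)
            lanes["A+G"].add(position)
        elif base == "A":
            lanes["A+G"].add(position)
        elif base == "C":
            lanes["C"].add(position)
            lanes["C+T"].add(position)
        elif base == "T":
            lanes["C+T"].add(position)
    return lanes


def read_maxam_gilbert(lanes):
    """Infer the sequence by comparing the lanes at every band position."""
    positions = sorted(set().union(*lanes.values()))
    bases = []
    for position in positions:
        if position in lanes["G"]:
            bases.append("G")   # band in both G and A+G
        elif position in lanes["A+G"]:
            bases.append("A")   # band in A+G only
        elif position in lanes["C"]:
            bases.append("C")   # band in both C and C+T
        elif position in lanes["C+T"]:
            bases.append("T")   # band in C+T only
        else:
            raise ValueError(f"band {position} is not in any known lane")
    return "".join(bases)


if __name__ == "__main__":
    lanes = simulate_lanes("GATTACA")
    print(read_maxam_gilbert(lanes))  # GATTACA
```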

It was the first widely adopted technique for DNA sequencing; however, it is no longer in widespread use because of its technical complexity, which prevents its use in standard molecular biology kits, its extensive use of hazardous chemicals, and difficulties with scale-up. Hence, this method has been supplanted by Sanger sequencing and, later, by next generation sequencing methods.

3.2 Sanger Sequencing Methods:

Frederick Sanger developed several faster, more efficient methods to sequence DNA (Sanger et al., 1977). Indeed, Sanger's work in this area was so groundbreaking that he received the Nobel Prize in Chemistry in 1980. The two key methods developed by Sanger are called chain-termination and dye-terminator sequencing.

3.2.1 The chain-termination or dideoxy Method:

The chain-termination technique requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labelled nucleotides, and modified nucleotides that halt DNA strand extension. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP, dTTP) and the DNA polymerase. To each reaction is added one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, ddTTP). These are the chain-terminating nucleotides: they lack the 3'-OH group needed to form a phosphodiester bond between two nucleotides, so they halt DNA strand elongation and yield DNA fragments of varying length. The newly synthesised, labelled DNA fragments are heat denatured and separated by size through gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualised by autoradiography or UV light, and the DNA sequence can be read directly from the X-ray film or gel image. A dark band indicates a DNA fragment that results from chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the bands across the four lanes are then used to read the DNA sequence. Technical variations of chain-termination sequencing include labelling each dideoxynucleotide with a dye.

Dideoxynucleotides are similar to regular nucleotides (deoxynucleotides), with one key difference: they lack a hydroxyl group on the 3' carbon of the sugar ring. In a regular nucleotide, the 3' hydroxyl group acts as a "hook," allowing a new nucleotide to be added to the existing chain.

Once a dideoxynucleotide has been added to the chain, there is no hydroxyl group available and no further nucleotides can be added. The chain terminates with the dideoxynucleotide, which is labelled with a specific colour of dye according to the base (A, T, C or G) that it carries.
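The logic of reading a chain-termination gel can be sketched as follows. This is a simplified model under stated assumptions: every possible terminated fragment is present, the primer length is ignored so that fragment length equals the position of the terminating base in the newly synthesised strand, and the function names are illustrative only.

```python
def sanger_lanes(new_strand):
    """Return, for each ddNTP reaction, the lengths of the fragments it yields.

    In the reaction containing ddATP, for example, synthesis can stop
    wherever the new strand has an A, so that lane holds one fragment
    length per A in the strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest.

    The lane in which each successive band appears names the base at
    that position of the new strand.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("TGCATGCC")
    print(lanes["C"])       # [3, 7, 8]
    print(read_gel(lanes))  # TGCATGCC
```

The same reading order applies to dye-terminator sequencing, where the four lanes collapse into one and the dye colour takes the place of the lane label.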

3.2.2 Dye-terminator sequencing

Dye-terminator sequencing labels the chain-terminating ddNTPs, which permits sequencing in a single reaction rather than the four reactions needed in the labelled-primer method. Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, and each dye fluoresces at a different wavelength.

It produces DNA fragments that terminate at every nucleotide along the template strand. The fragments are separated by size using capillary electrophoresis, and the DNA sequence can be read from their order. The shortest fragments, which terminated earliest, emerge from the column first, so the order in which the different fluorescent labels leave the column gives the sequence of the strand. The resulting sequence is displayed on an electropherogram produced by a scanner. This technique allows four times as many sequencing reactions to be electrophoresed on a gel, because the four dideoxynucleotide reactions from a single template run in a single lane rather than in four separate lanes.

Shotgun Sequencing:

This technique was originally used by Fred Sanger and his co-workers for sequencing the small genomes of viruses and bacteria. The technique is named by analogy with the rapidly expanding, quasi-random firing pattern of a shotgun. In shotgun sequencing, numerous copies of the same chromosome are isolated and fragmented at random locations. All the fragments are sequenced, and the sequences obtained are compared to find the overlapping ends of the fragments. In this way, the complete sequence of the original DNA is assembled from one end to the other.

In the nearly three decades since Sanger was awarded part of the 1980 Nobel Prize in Chemistry (which he shared with Walter Gilbert and Paul Berg) for DNA sequencing, sequencing technologies have evolved dramatically to improve sequencing capabilities, and new innovations in sequencing methods are being developed quickly.

4. Second generation sequencing techniques

The key feature of second generation sequencing methodologies is the parallelization of a large number of sequencing reactions, achieved by miniaturization of the reactions, together with greatly improved detection systems, reduced cost, and sequencing times reduced to hours. Some of the second generation sequencing techniques are:

4.1 Pyrosequencing

Pyrosequencing is an automated technique named for the pyrophosphate molecule that is released when a dNTP is used by DNA polymerase to extend a new DNA strand. It starts in the same manner as dideoxy sequencing, but the pyrosequencing machine detects the incorporation of each nucleotide into the growing strand without chain termination.

The technique amplifies DNA inside water droplets in an oil solution, with every droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.
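The way light signals are turned into a sequence read-out can be sketched with a small simulation. It assumes that nucleotides are dispensed one type at a time in a fixed cyclic order (the order `TACG` used here is an arbitrary illustrative choice) and that the light signal is exactly proportional to the number of bases incorporated in that step, which is an idealisation. The function names are hypothetical.

```python
def flowgram(new_strand, flow_order="TACG", max_cycles=100):
    """Simulate the signal recorded for each nucleotide dispensation.

    Each entry is (base dispensed, number of bases incorporated).
    A run of identical bases is incorporated in a single step and
    gives a proportionally stronger signal; a zero means no light.
    """
    signals = []
    position = 0
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            while position < len(new_strand) and new_strand[position] == base:
                run += 1
                position += 1
            signals.append((base, run))
            if position == len(new_strand):
                return signals
    return signals


def decode(signals):
    """Rebuild the sequence from the recorded signals."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    signals = flowgram("TTAG")
    print(signals)          # [('T', 2), ('A', 1), ('C', 0), ('G', 1)]
    print(decode(signals))  # TTAG
```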

4.2 Illumina (Solexa) sequencing

Solexa developed a sequencing technology based on dye terminators. In this method, DNA molecules are first attached to primers on a slide and amplified; this is known as bridge amplification. The DNA can only be extended one nucleotide at a time. A camera takes images of the fluorescently labelled nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle.

4.3 SOLiD sequencing

It is based on sequencing by oligonucleotide ligation (SBL) and detection. The DNA template is fragmented, two different adapters are attached to the ends of the fragments, and the fragments are then mixed with an excess of beads for PCR. After PCR, the beads are deposited on a glass slide and the bases are read by probing the beads with mixtures of 5' fluorescently labelled probes.

4.4 HeliScope (TM) single molecule sequencing

HeliScope sequencing uses DNA fragments with added poly-A tail adapters, which are attached to the flow cell surface. The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labelled nucleotides.

5. Third generation sequencing – emerging technologies

5.1 Single molecule SMRT(TM) sequencing

SMRT sequencing is based on the sequencing by synthesis approach. The DNA is synthesised in so-called zero-mode waveguides (ZMWs), small well-like containers with the detection apparatus located at the bottom of the well. The sequencing is performed using an unmodified polymerase and fluorescently labelled nucleotides flowing freely in the solution. The fluorescent label is detached from the nucleotide when it is incorporated into the DNA strand, leaving an unmodified DNA strand. Observing the polymerase kinetics also allows nucleotide modifications, such as methylation, to be detected. This approach allows reads of 1000 nucleotides.

5.2 DNA nanoball sequencing

It is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify fragments of genomic DNA into DNA nanoballs. It allows a large number of DNA nanoballs to be sequenced per run, at a low reagent cost compared to other next generation sequencing platforms.

5.3 Nanopore Sequencing

Nanopores are formed by pore-forming proteins. The conductivity of the pore for ion currents changes when the pore is partly blocked by a strand of nucleic acid passing through it. The flow of ion current depends on the shape of the molecule passing through the pore. Since the nucleotides have different shapes, they are recognised by the change in ionic current. This technique can sequence a single molecule.

5.4 Ion torrent sequencing

In this technique, a chip with an ion-sensitive field-effect transistor sensor capable of detecting individual protons is used. Beads containing template DNA are deposited into the wells of the chip. The chip is sequentially flushed with individual dNTPs. Incorporation of a nucleotide releases H+, which changes the pH of the solution. The change in pH is detected by the sensor at the bottom of the well and converted into an electronic signal, which is recorded by the system.

In conclusion, it can be said that Sanger sequencing is still considered to be the “gold standard” of sequencing, due to its accuracy and read lengths. Automated Sanger sequencing still produces the longest reads. However, it is slow and expensive, while the second and third generation technologies are faster and more cost effective. The key problems of these technologies are accuracy and short reads, which make final assembly or alignment difficult and computationally challenging. Therefore, the choice of technology depends on the required application and on the advantages and disadvantages of the particular sequencing approach.

6. DNA Sequence analysis

Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to a wide range of analytical methods in order to understand its features, function, structure, or evolution. The methods used include sequence alignment and searches against biological databases. Sequence analysis can be used to assign function to genes and proteins by studying the similarities between the compared sequences. Currently, there are several tools and systems that perform sequence comparisons (sequence alignment) and analyse the alignment product to understand its biology.

Sequence analysis in molecular biology covers a number of important problems:

1. The comparison of sequences to find similarity, often to infer whether they are related (homologous).

2. Identification of intrinsic features of the sequence such as active sites, post-translational modification sites, gene structures, reading frames, distributions of introns and exons, and regulatory elements.

3. Identification of sequence differences and variations, such as point mutations and single nucleotide polymorphisms (SNPs), in order to obtain genetic markers.

4. Revealing the evolution and genetic diversity of sequences and organisms.
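Items 1 and 3 of the list above can be illustrated with a minimal sketch that compares two sequences position by position. It assumes the two sequences are already aligned and of equal length (no insertions or deletions), which real alignment tools do not require; the function names are illustrative only.

```python
def point_differences(reference, sample):
    """List single-base differences between two aligned sequences.

    Each entry is (1-based position, reference base, sample base).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


def fraction_identical(reference, sample):
    """Fraction of aligned positions at which the two sequences agree."""
    differences = point_differences(reference, sample)
    return 1 - len(differences) / len(reference)


if __name__ == "__main__":
    print(point_differences("GATTACA", "GACTACA"))  # [(3, 'T', 'C')]
    print(round(fraction_identical("GATTACA", "GACTACA"), 3))  # 0.857
```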

Sequence assembly refers to the reconstruction of a DNA sequence by aligning and merging small DNA fragments. It is an integral part of modern DNA sequencing. Since existing DNA sequencing technologies are ill-suited for reading long sequences, large pieces of DNA are usually sequenced by

(1) Cutting the DNA into small pieces,

(2) Reading the small fragments, and

(3) Reconstituting the original DNA by merging the information from the various fragments.
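Step (3) can be sketched as a greedy merge of the fragments that share the longest overlapping ends. This is a toy illustration, not how production assemblers work: it assumes error-free reads from a single strand, no repeated regions, and a hypothetical minimum overlap of three bases (`min_len`).

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        # Discard fragments wholly contained in another fragment.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


if __name__ == "__main__":
    reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

If the fragments do not overlap sufficiently, the function returns several separate pieces rather than one sequence.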

To search a database for a query sequence, one can simply take the first letter of the query sequence, search for its first occurrence in the database sequence (the subject), and then check whether the second letter of the query is the same in the subject. If it is indeed the same, the program can check the third letter, then the fourth, and continue this comparison to the end of the query. If the next letter in the subject is different from the corresponding letter in the query, the program should search for another occurrence of the first letter, and so on. This will identify all the sequences in the database that are identical to the query sequence (or include it).
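The procedure just described can be written out as a short sketch. The function names and the example database are hypothetical; the comparison is done letter by letter on purpose, to mirror the description rather than to be efficient.

```python
def find_exact(query, subject):
    """Return every start position at which query occurs in subject."""
    hits = []
    if not query:
        return hits
    # Look for the first letter of the query in the subject.
    start = subject.find(query[0])
    while start != -1:
        # Compare the following letters one at a time.
        k = 0
        while (
            k < len(query)
            and start + k < len(subject)
            and subject[start + k] == query[k]
        ):
            k += 1
        if k == len(query):
            hits.append(start)  # the whole query matched here
        # Move on to the next occurrence of the first letter.
        start = subject.find(query[0], start + 1)
    return hits


def search_database(query, database):
    """Return the sequences that contain the query, with hit positions."""
    results = {}
    for name, sequence in database.items():
        hits = find_exact(query, sequence)
        if hits:
            results[name] = hits
    return results


if __name__ == "__main__":
    database = {"seq1": "AGATTGATC", "seq2": "CCCC"}
    print(search_database("GAT", database))  # {'seq1': [1, 5]}
```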

In this example, we looked only for sequences that exactly match the query. The algorithm would not even find a sequence that is identical to the query except for the first letter. To find such sequences, the same search should be repeated with the fragment starting from the second letter of the original query, then from the third one, and so on.
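A minimal sketch of this extension is given below. It repeats an exact search with each fragment of the query obtained by dropping letters from its start. Two assumptions are made for illustration: Python's built-in `str.find` stands in for the letter-by-letter comparison, and fragments shorter than a hypothetical `min_len` of four letters are skipped because very short fragments match almost everywhere.

```python
def find_with_suffixes(query, subject, min_len=4):
    """Search for the query and for its fragments starting further in.

    Returns (offset, position) pairs: the fragment query[offset:] was
    found at index position in subject. Offset 0 is an exact match of
    the whole query.
    """
    hits = []
    for offset in range(0, len(query) - min_len + 1):
        fragment = query[offset:]
        position = subject.find(fragment)
        while position != -1:
            hits.append((offset, position))
            position = subject.find(fragment, position + 1)
    return hits


if __name__ == "__main__":
    # The subject contains TATTACA: the query with its first letter changed.
    print(find_with_suffixes("GATTACA", "CCTATTACAGG"))
    # [(1, 3), (2, 4), (3, 5)]
```

The hit with the smallest offset is the most informative, since it corresponds to the longest matching fragment of the query.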

7. Summary

• DNA is composed of two strands of nucleotides coiled around each other, joined by hydrogen bonds and running in opposite directions. Each strand is composed of four nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T).

• DNA sequencing is the identification of the specific order of nucleotides in a sample of DNA. There are two classical methods of sequencing: Sanger sequencing (chain termination) and Maxam-Gilbert sequencing (chemical cleavage).

• In the late 1990s, new methods that were faster and cheaper, called second-generation sequencing methods, began to be developed. The most widely used second-generation sequencing method was pyrosequencing.

• Today, various newer sequencing methods are available and others are in the process of being developed. These are often called third-generation sequencing methods.

• Sanger sequencing is still regarded as the “gold standard” of sequencing because of its accuracy. Automated Sanger sequencing still produces the longest reads. However, it is slow and expensive, while the second and third generation technologies are faster and more cost effective. The key problems of these technologies are accuracy and short reads, which make final assembly or alignment difficult and computationally challenging. Therefore, the selection of the method depends on the required application and on the advantages and disadvantages of the particular sequencing approach.
