Chapter 6 Sequencing Genomes
4/6/2020 Sequencing Genomes - Genomes - NCBI Bookshelf

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002.

Learning outcomes

When you have read Chapter 6, you should be able to:

- Distinguish between the two methods used to sequence DNA
- Give a detailed description of chain termination sequencing and an outline description of the chemical degradation method
- Describe the key features of automated DNA sequencing and evaluate the importance of automated sequencing in genomics research
- State the strengths and limitations of the shotgun, whole-genome shotgun and clone contig methods of genome sequencing
- Describe how a small bacterial genome can be sequenced by the shotgun method, using the Haemophilus influenzae project as an example
- Outline the various ways in which a clone contig can be built up
- Explain the basis of the whole-genome shotgun approach to genome sequencing, with emphasis on the steps taken to ensure that the resulting sequence is accurate
- Give an account of the development of the Human Genome Project up to the publication of the draft sequence in February 2001
- Debate the ethical, legal and social issues raised by the human genome projects

The ultimate objective of a genome project is the complete DNA sequence for the organism being studied, ideally integrated with the genetic and/or physical maps of the genome so that genes and other interesting features can be located within the DNA sequence. This chapter describes the techniques and research strategies that are used during the sequencing phase of a genome project, when this ultimate objective is being directly addressed. Techniques for sequencing DNA are clearly of central importance in this context and we will begin the chapter with a detailed examination of sequencing methodology.
This methodology is of little value, however, unless the short sequences that result from individual sequencing experiments can be linked together in the correct order to give the master sequences of the chromosomes that make up the genome. The middle part of this chapter describes the strategies used to ensure that the master sequences are assembled correctly. Finally, we will review the way in which mapping and sequencing were used to produce the two draft human genome sequences that were published in February 2001.

6.1. The Methodology for DNA Sequencing

Rapid and efficient methods for DNA sequencing were first devised in the mid-1970s. Two different procedures were published at almost the same time:

- The chain termination method (Sanger et al., 1977), in which the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains, these chains terminating at specific nucleotide positions;
- The chemical degradation method (Maxam and Gilbert, 1977), in which the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.

Both methods were equally popular to begin with, but the chain termination procedure has gained ascendancy in recent years, particularly for genome sequencing. This is partly because the chemicals used in the chemical degradation method are toxic and therefore hazardous to the health of the researchers doing the sequencing experiments, but mainly because it has been easier to automate chain termination sequencing. As we will see later in this chapter, a genome project involves a huge number of individual sequencing experiments and it would take many years to perform all these by hand. Automated sequencing techniques are therefore essential if the project is to be completed in a reasonable time-span.

6.1.1. Chain termination DNA sequencing

Chain termination DNA sequencing is based on the principle that single-stranded DNA molecules that differ in length by just a single nucleotide can be separated from one another by polyacrylamide gel electrophoresis (Technical Note 6.1). This means that it is possible to resolve a family of molecules, representing all lengths from 10 to 1500 nucleotides, into a series of bands (Figure 6.1).

Technical Note 6.1 Polyacrylamide gel electrophoresis. Separation of DNA molecules differing in length by just one nucleotide. Polyacrylamide gel electrophoresis is used to examine the families of chain-terminated DNA molecules resulting from a sequencing experiment.

Figure 6.1 Polyacrylamide gel electrophoresis can resolve single-stranded DNA molecules that differ in length by just one nucleotide. The banding pattern is produced after separation of single-stranded DNA molecules by denaturing polyacrylamide gel electrophoresis.

Chain termination sequencing in outline

The starting material for a chain termination sequencing experiment is a preparation of identical single-stranded DNA molecules. The first step is to anneal a short oligonucleotide to the same position on each molecule, this oligonucleotide subsequently acting as the primer for synthesis of a new DNA strand that is complementary to the template (Figure 6.2A). The strand synthesis reaction, which is catalyzed by a DNA polymerase enzyme (Section 4.1.1 and Box 6.1) and requires the four deoxyribonucleotide triphosphates (dNTPs: dATP, dCTP, dGTP and dTTP) as substrates, would normally continue until several thousand nucleotides had been polymerized. This does not occur in a chain termination sequencing experiment because, as well as the four dNTPs, a small amount of a dideoxynucleotide (e.g. ddATP) is added to the reaction.
The polymerase enzyme does not discriminate between dNTPs and ddNTPs, so the dideoxynucleotide can be incorporated into the growing chain, but it then blocks further elongation because it lacks the 3′-hydroxyl group needed to form a connection with the next nucleotide (Figure 6.2B).

Box 6.1 DNA polymerases for chain termination sequencing. Any template-dependent DNA polymerase is capable of extending a primer that has been annealed to a single-stranded DNA molecule, but not all polymerases do this in a way that is useful for DNA sequencing.

If ddATP is present, chain termination occurs at positions opposite thymidines in the template DNA (Figure 6.2C). Because dATP is also present, strand synthesis does not always terminate at the first T in the template; in fact it may continue until several hundred nucleotides have been polymerized before a ddATP is eventually incorporated. The result is therefore a set of new chains, all of different lengths, but each ending in ddATP.

Now the polyacrylamide gel comes into play. The family of molecules generated in the presence of ddATP is loaded into one lane of the gel, and the families generated with ddCTP, ddGTP and ddTTP are loaded into the three adjacent lanes. After electrophoresis, the DNA sequence can be read directly from the positions of the bands in the gel (Figure 6.2D). The band that has moved the furthest represents the smallest piece of DNA, this being the strand that terminated by incorporation of a ddNTP at the first position in the template. In the example shown in Figure 6.2 this band lies in the ‘G’ lane (i.e. the lane containing the molecules terminated with ddGTP), so the first nucleotide in the sequence is ‘G’. The next band, corresponding to the molecule that is one nucleotide longer than the first, is in the ‘A’ lane, so the second nucleotide is ‘A’ and the sequence so far is ‘GA’.
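The four termination reactions and the band-by-band reading of the gel can be mimicked with a small simulation. What follows is a minimal Python sketch, not part of the original text. The template `CTTAGC` is invented so that the new strand begins with the GAAT of Figure 6.2, and the function names `extend_once`, `run_lane` and `read_gel`, the ddNTP fraction of 0.2, the number of chains and the random seed are all illustrative assumptions. The sketch treats each ddNTP incorporation as an independent event and ignores the primer and the resolution limit of the gel.

```python
import random

# Base added to the new strand opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template_3to5, terminator, dd_fraction, rng):
    """Grow one chain; return its length if a ddNTP stopped it, else None.

    template_3to5 is written in the order the polymerase reads it, so the
    i-th new nucleotide pairs with template_3to5[i]. Primer length is ignored.
    """
    for i, base in enumerate(template_3to5):
        # Only positions calling for the terminator base can take the ddNTP,
        # and then only with probability dd_fraction (an assumed value).
        if COMPLEMENT[base] == terminator and rng.random() < dd_fraction:
            return i + 1
    return None  # reached the end of the template without terminating


def run_lane(template_3to5, terminator, dd_fraction=0.2, chains=2000, seed=0):
    """Distinct chain lengths (bands) produced in one ddNTP reaction."""
    rng = random.Random(seed)
    lengths = (
        extend_once(template_3to5, terminator, dd_fraction, rng)
        for _ in range(chains)
    )
    return sorted({n for n in lengths if n is not None})


def read_gel(lanes):
    """Read bands from shortest (furthest migrated) to longest."""
    bands = sorted(
        (length, lane) for lane, lengths in lanes.items() for length in lengths
    )
    return "".join(lane for _, lane in bands)


template = "CTTAGC"  # hypothetical template, written 3'->5'
lanes = {dd: run_lane(template, dd) for dd in "ACGT"}
sequence = read_gel(lanes)
print(lanes)
print(sequence)
```

Because the ordinary dNTP is incorporated far more often than the ddNTP, many chains read past the first matching position, so each possible termination point ends up represented in its lane. Sorting all bands by length then recovers the sequence of the newly synthesized strand, which is the complement of the template.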
Continuing up through the gel we see that the next band also lies in the ‘A’ lane (sequence GAA), then we move to the ‘T’ lane (GAAT), and so on. The sequence reading can be continued up to the region of the gel where individual bands are not separated.

Chain termination sequencing requires a single-stranded DNA template

The template for a chain termination experiment is a single-stranded version of the DNA molecule to be sequenced. There are several ways in which this can be obtained:

- The DNA can be cloned in a plasmid vector (Section 4.2.1). The resulting DNA will be double stranded, so it cannot be used directly in sequencing. Instead, it must be converted into single-stranded DNA by denaturation with alkali or by boiling. This is a common method for obtaining template DNA for DNA sequencing, largely because cloning in a plasmid vector is such a routine technique. A shortcoming is that it can be difficult to prepare plasmid DNA that is not contaminated with small quantities of bacterial DNA and RNA, which can act as spurious templates or primers in the DNA sequencing experiment.
- The DNA can be cloned in a bacteriophage M13 vector. Vectors based on M13 bacteriophage are designed specifically for the production of single-stranded templates for DNA sequencing. M13 bacteriophage has a single-stranded DNA genome which, after infection of Escherichia coli bacteria, is converted into a double-stranded replicative form. The replicative form is copied until over 100 molecules are present in the cell, and when the cell divides the copy number in the new cells is maintained by further replication. At the same time, the infected cells continually secrete new M13 phage particles, approximately 1000 per generation, these phages containing the single-stranded version of the genome (Figure 6.3). Cloning vectors based on M13 are double-stranded DNA molecules equivalent to the replicative form of the M13 genome.
They can be manipulated in exactly the same way as a plasmid cloning vector. The difference is that cells that have been transfected with a recombinant M13 vector secrete phage particles containing single-stranded DNA, this DNA comprising the vector molecule plus any additional DNA that has been ligated into it.