Chapter 5 Mapping Genomes
Total Page:16
File Type:pdf, Size:1020Kb
4/8/2020 Mapping Genomes - Genomes - NCBI Bookshelf NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health. Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002. Chapter 5 Mapping Genomes Learning outcomes When you have read Chapter 5, you should be able to: Explain why a map is an important aid to genome sequencing Distinguish between the terms ‘genetic map’ and ‘physical map’ Describe the different types of marker used to construct genetic maps, and state how each type of marker is scored Summarize the principles of inheritance as discovered by Mendel, and show how subsequent genetic research led to the development of linkage analysis Explain how linkage analysis is used to construct genetic maps, giving details of how the analysis is carried out in various types of organism, including humans and bacteria State the limitations of genetic mapping Evaluate the strengths and weaknesses of the various methods used to construct physical maps of genomes Describe how restriction mapping is carried out Describe how fluorescent in situ hybridization (FISH) is used to construct a physical map, including the modifications used to increase the sensitivity of this technique Explain the basis of sequence tagged site (STS) mapping, and list the various DNA sequences that can be used as STSs Describe how radiation hybrids and clone libraries are used in STS mapping describe the techniques and strategies used to obtain genome sequences. DNA sequencing is obviously paramount among these techniques, but sequencing has one major limitation: even with the most sophisticated technology it is rarely possible to obtain a sequence of more than about 750 bp in a single experiment. This means that the sequence of a long DNA molecule has to be constructed from a series of shorter sequences. This is done by breaking the molecule into fragments, determining the sequence of each one, and using a computer to search for overlaps and build up the master sequence (Figure 5.1). This shotgun method is the standard approach for sequencing small prokaryotic genomes, but is much more difficult with larger genomes because the required data analysis becomes disproportionately more complex as the number of fragments increases (for n fragments the number of possible overlaps is given by 2n 2 - 2n). A second problem with the shotgun method is that it can lead to errors when repetitive regions of a genome are analyzed. When a repetitive sequence is broken into fragments, many of the resulting pieces contain the same, or very similar, sequence motifs. It would be very easy to reassemble these sequences so that a portion of a repetitive region is left out, or even to connect together two quite separate pieces of the same or different chromosomes (Figure 5.2). Figure 5.1 https://www.ncbi.nlm.nih.gov/books/NBK21116/ 1/31 4/8/2020 Mapping Genomes - Genomes - NCBI Bookshelf The shotgun approach to sequence assembly. The DNA molec small fragments, each of which is sequenced. The master sequ by searching for overlaps between the sequences of individual practice, an overlap of (more...) Figure 5.2 Problems with the shotgun approach. (A) The DNA molecule contains a tandemly repeated element made up of many copies of the sequence GATTA. When the sequences are examined, an overlap is identified between two fragments, but these are from either end (more...) The difficulties in applying the shotgun method to a large molecule that has a significant repetitive DNA content means that this approach cannot be used on its own to sequence a eukaryotic genome. Instead, a genome map must first be generated. A genome map provides a guide for the sequencing experiments by showing the positions of genes and other distinctive features. Once a genome map is available, the sequencing phase of the project can proceed in either of two ways (Figure 5.3): Figure 5.3 Alternative approaches to genome sequencing. A genome consisting of a linear DNA molecule of 2.5 Mb has been mapped, and the positions of eight markers (A-H) are known. On the left, the clone contig approach starts with a segment of DNA whose position (more...) By the whole-genome shotgun method (Section 6.2.3), which takes the same approach as the standard shotgun procedure but uses the distinctive features on the genome map as landmarks to aid assembly of the master sequence from the huge numbers of short sequences that are obtained. Reference to the map also ensures that regions containing repetitive DNA are assembled correctly. The whole-genome shotgun approach is a rapid way of obtaining a eukaryotic genome sequence, but there are still doubts about the degree of accuracy that can be achieved. By the clone contig approach (Section 6.2.2). In this method the genome is broken into manageable segments, each a few hundred kb or a few Mb in length, which are short enough to be sequenced accurately by the shotgun method. Once the sequence of a segment has been completed, it is positioned at its correct location on the map. This step- by-step approach takes longer than whole-genome shotgun sequencing, but is thought to produce a more accurate and error-free sequence. With both approaches, the map provides the framework for carrying out the sequencing phase of the project. If the map indicates the positions of genes, then it can also be used to direct the initial part of a clone contig project to the interesting regions of a genome, so that the sequences of important genes are obtained as quickly as possible. 5.1. Genetic and Physical Maps The convention is to divide genome mapping methods into two categories. https://www.ncbi.nlm.nih.gov/books/NBK21116/ 2/31 4/8/2020 Mapping Genomes - Genomes - NCBI Bookshelf Genetic mapping is based on the use of genetic techniques to construct maps showing the positions of genes and other sequence features on a genome. Genetic techniques include cross-breeding experiments or, in the case of humans, the examination of family histories (pedigrees). Genetic mapping is described in Section 5.2. Physical mapping uses molecular biology techniques to examine DNA molecules directly in order to construct maps showing the positions of sequence features, including genes. Physical mapping is described in Section 5.3. 5.2. Genetic Mapping As with any type of map, a genetic map must show the positions of distinctive features. In a geographic map these markers are recognizable components of the landscape, such as rivers, roads and buildings. What markers can we use in a genetic landscape? 5.2.1. Genes were the first markers to be used The first genetic maps, constructed in the early decades of the 20th century for organisms such as the fruit fly, used genes as markers. This was many years before it was understood that genes are segments of DNA molecules. Instead, genes were looked upon as abstract entities responsible for the transmission of heritable characteristics from parent to offspring. To be useful in genetic analysis, a heritable characteristic has to exist in at least two alternative forms or phenotypes, an example being tall or short stems in the pea plants originally studied by Mendel. Each phenotype is specified by a different allele of the corresponding gene. To begin with, the only genes that could be studied were those specifying phenotypes that were distinguishable by visual examination. So, for example, the first fruit-fly maps showed the positions of genes for body color, eye color, wing shape and suchlike, all of these phenotypes being visible simply by looking at the flies with a low-power microscope or the naked eye. This approach was fine in the early days but geneticists soon realized that there were only a limited number of visual phenotypes whose inheritance could be studied, and in many cases their analysis was complicated because a single phenotype could be affected by more than one gene. For example, by 1922 over 50 genes had been mapped onto the four fruit-fly chromosomes, but nine of these were for eye color; in later research, geneticists studying fruit flies had to learn to distinguish between fly eyes that were colored red, light red, vermilion, garnet, carnation, cinnabar, ruby, sepia, scarlet, pink, cardinal, claret, purple or brown. To make gene maps more comprehensive it would be necessary to find characteristics that were more distinctive and less complex than visual ones. The answer was to use biochemistry to distinguish phenotypes. This has been particularly important with two types of organisms - microbes and humans. Microbes, such as bacteria and yeast, have very few visual characteristics so gene mapping with these organisms has to rely on biochemical phenotypes such as those listed in Table 5.1. With humans it is possible to use visual characteristics, but since the 1920s studies of human genetic variation have been based largely on biochemical phenotypes that can be scored by blood typing. These phenotypes include not only the standard blood groups such as the ABO series (Yamamoto et al., 1990), but also variants of blood serum proteins and of immunological proteins such as the human leukocyte antigens (the HLA system). A big advantage of these markers is that many of the relevant genes have multiple alleles. For example, the gene called HLA-DRB1 has at least 290 alleles and HLA-B has over 400. This is relevant because of the way in which gene mapping is carried out with humans (Section 5.2.4). Rather than setting up many breeding experiments, which is the procedure with experimental organisms such as fruit flies or mice, data on inheritance of human genes have to be gleaned by examining the phenotypes displayed by members of a single family.