Genome Mapping: Introduction

Central University of South Bihar Panchanpur, Gaya, India

E-Learning Resources

Department of Biotechnology

NB: These materials are taken, borrowed, modified or compiled from various sources, such as research articles and freely available internet websites. They are meant to be used solely for teaching purposes in a public university and for serving the needs of specified educational programmes.

Programme: M.Sc. Biotechnology; 2nd semester
Molecular Biology and Genomics
Course Instructor: Dr. Krishna Prakash

Topic: “Genome Mapping”

Introduction

• Genome mapping is the process of determining the relative positions of genes on a chromosome (or another piece of DNA) and the distances between them, in linkage units or in physical units.
• The genetic distance between two genes is measured in centimorgans (cM); in humans, 1 cM corresponds on average to about 1108 kb.
• 1 cM corresponds to a 1% chance of recombination between two genes.

Genetic and Physical Maps
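As a rough illustration of how genetic and physical units relate, the sketch below converts a count of recombinant offspring into a map distance in cM, and then into an approximate physical distance using the human average of about 1108 kb per cM quoted in the Introduction. The function names and the offspring counts are invented for the example. The linear rule (1% recombination = 1 cM) is only a reasonable approximation for closely linked genes, and the kb-per-cM ratio varies along real chromosomes.

```python
# Average for humans as quoted in the Introduction (an approximation).
KB_PER_CM_HUMAN = 1108


def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Map distance in cM, taking 1% recombinant offspring as 1 cM.

    Assumption: the genes are closely linked, so double crossovers
    can be ignored and the relationship is treated as linear.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must lie between 0 and total_offspring")
    return 100.0 * recombinants / total_offspring


def approx_physical_kb(distance_cm: float, kb_per_cm: float = KB_PER_CM_HUMAN) -> float:
    """Very rough physical distance in kb for a given genetic distance."""
    return distance_cm * kb_per_cm


# Hypothetical cross: 12 recombinant offspring out of 400 scored.
distance = map_distance_cm(12, 400)
print(f"genetic distance:  {distance:.1f} cM")
print(f"physical estimate: {approx_physical_kb(distance):.0f} kb")
```

The second function is deliberately kept separate from the first: the genetic distance comes from the cross itself, whereas the physical estimate depends on a genome-wide average that may not hold for the region in question.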

• Genome mapping methods can be divided into two categories.
• Genetic mapping uses genetic techniques to construct maps showing the positions of genes and other sequence features on a genome. Genetic techniques include cross-breeding experiments or, in the case of humans, the examination of family histories (pedigrees).
• Physical mapping uses molecular biology techniques to examine DNA molecules directly in order to construct maps showing the positions of sequence features, including genes.

Gene markers for genetic mapping

• The first genetic maps, constructed for organisms such as the fruit fly, used genes as markers.
• The only genes that could be studied were those specifying phenotypes that were distinguishable by visual examination, e.g. eye color or height.
• Some organisms have very few visual characteristics, so gene mapping with these organisms has to rely on biochemical phenotypes.

Biochemical markers for genetic analysis of yeast
(marker: phenotype; how cells carrying the marker are identified)

• ADE2: requires adenine; grows only when adenine is present in the medium
• CAN1: resistant to canavanine; grows in the presence of canavanine
• CUP1: resistant to copper; grows in the presence of copper
• CYH1: resistant to cycloheximide; grows in the presence of cycloheximide
• LEU2: requires leucine; grows only when leucine is present in the medium
• SUC2: able to ferment sucrose; grows if sucrose is the only carbohydrate in the medium
• URA3: requires uracil; grows only when uracil is present in the medium

Biochemical markers in humans

• In humans, the biochemical phenotypes used are those that can be scored by blood typing. These include the standard blood groups such as the ABO series and also the human leukocyte antigens (the HLA system).
• A big advantage of these markers is that many of the relevant genes have multiple alleles. For example, the gene called HLA-DRB1 has at least 290 alleles and HLA-B has over 400. This is relevant because if all the family members have the same allele for the gene being studied, then no useful information can be obtained.

Drawbacks of using genes as markers

• Genes are very useful markers but they are by no means ideal. One problem, especially with larger genomes such as those of vertebrates and flowering plants, is that a map based entirely on genes is not very detailed.

DNA markers

• As with gene markers, a DNA marker must have at least two alleles to be useful. There are three types of DNA sequence feature that satisfy this requirement: restriction fragment length polymorphisms (RFLPs), simple sequence length polymorphisms (SSLPs), and single nucleotide polymorphisms (SNPs).

RFLP

• An RFLP is a DNA fragment whose length is variable because of the presence of a polymorphic restriction site.
• In the figure, the DNA molecule on the left has a polymorphic restriction site (marked with the asterisk) that is not present in the molecule on the right. The RFLP is revealed after treatment with the restriction enzyme because one of the molecules is cut into four fragments whereas the other is cut into three fragments.

RFLP scoring in humans

• Take EcoRI as an example: its target site is the hexanucleotide 5'-GAATTC-3'.
• A six-nucleotide site is expected once every 4^6 = 4096 bp on average, so EcoRI would cut human genomic DNA (3 × 10^9 bp) into roughly 800,000 fragments.
• Gel electrophoresis of such a digest gives a smear of DNA.
• The RFLP cannot be distinguished within this smear.
• A scoring method is therefore needed to detect the RFLP.
• Two methods are used: Southern hybridization and PCR.

Simple sequence length polymorphisms (SSLPs)

• SSLPs are arrays of repeat sequences that display length variations, different alleles containing different numbers of repeat units.
• Unlike RFLPs, which can have only two alleles, SSLPs can be multiallelic, as each SSLP can have a number of different length variants. There are two types of SSLP:
• Minisatellites, also known as variable number of tandem repeats (VNTRs), in which the repeat unit is up to 25 bp in length;
• Microsatellites or simple tandem repeats (STRs), whose repeats are shorter, usually dinucleotide or tetranucleotide units.

Microsatellites are more popular than minisatellites as DNA markers, for two reasons. The first is their distribution:

• Minisatellites are not spread evenly throughout the genome but tend to be found more frequently in the telomeric regions.
• Microsatellites are more evenly distributed throughout the genome.

Why is PCR typing quicker and more accurate for microsatellites than for minisatellites?

• PCR typing is quickest and most accurate for sequences less than 300 bp in length. Most minisatellite alleles are longer than this because the repeat units are relatively large and there tend to be many of them in a single array, so PCR products several kilobases in length would be needed to type them.
• Microsatellites are amenable to PCR typing because their repeat units are no longer than 6 bp.
• There are about 5 × 10^5 microsatellites with repeat units of 6 bp or less in the human genome.

[Figure: SSLP typing by PCR]

Single nucleotide polymorphisms (SNPs)

• These are positions in a genome where some individuals have one nucleotide (e.g. a G) and others have a different nucleotide (e.g. a C).
There are vast numbers of SNPs in every genome, some of which also give rise to RFLPs, but many of which do not because the sequence in which they lie is not recognized by any restriction enzyme. In the human genome there are at least 1.42 million SNPs, only 100,000 of which result in an RFLP.

• Each SNP could, potentially, have four alleles (because there are four nucleotides).
• For example, if the SNP is the first position of the trinucleotide ATG, the four possible alleles would be ATG, TTG, GTG and CTG.
• This is possible in theory, but in practice most SNPs exist as just two variants. The reason lies in the way SNPs arise: a point mutation in the genome converts one nucleotide to another.
• If the point mutation occurs in a reproductive cell, the individual's offspring may inherit it, and after many generations the SNP may eventually become established in the population. At that stage only two alleles exist: the original and the mutated version.

• For a third allele to become established, a new mutation must occur at the same position in the genome of another individual and must be passed on to that individual's offspring. This is very unlikely, and consequently most SNPs are biallelic.
• Huge numbers of SNPs are present in each genome: in most eukaryotes, at least one for every 10 kb of DNA.
• This is why SNPs enable very detailed genome maps to be constructed.
• SNPs are typed by oligonucleotide hybridization analysis.
• An oligonucleotide is a short single-stranded DNA molecule, usually less than 50 nucleotides in length, that is synthesized in the test tube. If the conditions are just right, then an oligonucleotide will hybridize with another DNA molecule only if the oligonucleotide forms a completely base-paired structure with the second molecule. If there is a single mismatch (a single position within the oligonucleotide that does not form a base pair), then hybridization does not occur.

SNP typing: oligonucleotide hybridization analysis
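The all-or-nothing behaviour described above can be mimicked in a few lines of Python. This is an idealized sketch, assuming perfectly stringent conditions under which the oligonucleotide binds only where it is fully complementary to the target. The sequences and function names are hypothetical, and real hybridization also depends on temperature, salt concentration and probe length.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """True if the probe can form a completely base-paired duplex
    somewhere along the single-stranded target.

    Idealization: one mismatch anywhere in the probe prevents binding.
    """
    return reverse_complement(probe) in target.upper()


# Two hypothetical alleles that differ at a single position (G versus C).
allele_g = "TTAGCCGATGCATTGCA"
allele_c = "TTAGCCGATCCATTGCA"

# Probe designed to be fully complementary to the G allele around the SNP.
probe_for_g = reverse_complement("GCCGATGCATTG")

print("probe binds G allele:", hybridizes(probe_for_g, allele_g))
print("probe binds C allele:", hybridizes(probe_for_g, allele_c))
```

Because the probe spans the polymorphic position, it distinguishes the two alleles; a probe placed entirely to one side of the SNP would bind both and give no typing information.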

• An oligonucleotide is a short (usually under 50 nt) single-stranded DNA molecule synthesized in the test tube.

Oligonucleotide hybridization

• Oligonucleotide hybridization can therefore discriminate between the two alleles of an SNP. Various screening strategies have been devised, including DNA chip technology and solution hybridization techniques.

Solution hybridization techniques

These techniques require a microtitre tray in which each well contains a different oligonucleotide.

In the unhybridized probe, the quencher lies close to the fluorescent label at the other end and quenches its signal.

After hybridization, the quencher end of the probe is held away from the fluorescent end, so the label fluoresces.

The fluorescence is measured with a spectrophotometer.

Physical mapping techniques

• Restriction mapping, which locates the relative positions on a DNA molecule of the recognition sequences for restriction enzymes.
• Fluorescent in situ hybridization (FISH), in which marker locations are mapped by hybridizing a probe containing the marker to intact chromosomes.
• Sequence tagged site (STS) mapping, in which the positions of short sequences are mapped by PCR and/or hybridization analysis of collections of genomic DNA fragments.

The basic methodology for restriction mapping

• RFLP analysis locates only the polymorphic restriction sites in a genome.
• Few restriction sites are polymorphic, so many sites are not mapped by RFLP analysis.
• Restriction mapping is used to locate the non-polymorphic restriction sites in the genome.
• The simplest way to construct a restriction map is to compare the fragment sizes produced when a DNA molecule is digested with two different restriction enzymes that recognize different target sequences.
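The comparison of single and double digests can be sketched as a small brute-force search. The example below assumes a linear DNA molecule, complete digestion and exactly measured fragment sizes, none of which holds perfectly at the bench. The fragment sizes, the enzyme labels A and B, and the function names are invented for illustration. The code tries every order of the single-digest fragments and keeps the arrangements whose combined cut sites reproduce the double-digest fragments.

```python
from itertools import permutations


def cut_positions(fragment_order):
    """Cut-site positions implied by laying fragments end to end."""
    positions, pos = [], 0
    for size in fragment_order[:-1]:  # the last fragment ends at the molecule end
        pos += size
        positions.append(pos)
    return tuple(positions)


def fragments_from_sites(sites, length):
    """Sorted fragment sizes produced by cutting a linear molecule at the sites."""
    edges = [0, *sorted(set(sites)), length]
    return sorted(b - a for a, b in zip(edges, edges[1:]))


def consistent_maps(frags_a, frags_b, frags_double):
    """All (sites_a, sites_b) arrangements that explain the double digest."""
    length = sum(frags_a)
    if sum(frags_b) != length or sum(frags_double) != length:
        raise ValueError("all three digests must add up to the same length")
    target = sorted(frags_double)
    maps = set()
    for order_a in set(permutations(frags_a)):
        sites_a = cut_positions(order_a)
        for order_b in set(permutations(frags_b)):
            sites_b = cut_positions(order_b)
            if fragments_from_sites(sites_a + sites_b, length) == target:
                maps.add((sites_a, sites_b))
    return sorted(maps)


# Hypothetical 4900 bp molecule (sizes in bp).
enzyme_a = [1500, 3400]
enzyme_b = [500, 3000, 1400]
double = [500, 1000, 2000, 1400]

for sites_a, sites_b in consistent_maps(enzyme_a, enzyme_b, double):
    print("A sites:", sites_a, "B sites:", sites_b)
```

For these made-up sizes the search returns two arrangements that are mirror images of each other, which is expected because a linear map can be read from either end. With other fragment sets, several genuinely different arrangements may fit the data, in which case further digests would be needed to choose between them.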

Limitations of Restriction mapping

• Fragments larger than about 50 kb cannot be mapped by conventional restriction mapping because fragments of this size are not resolved by gel electrophoresis.
• Two restriction enzymes are required to construct a restriction map.

Is there any possibility of using restriction mapping for the more general mapping of entire genomes larger than 50 kb?

• Yes, because the limitation is eased by choosing restriction enzymes that cut the DNA molecule infrequently. These rare cutters fall into two categories:
• 1. Enzymes with seven- or eight-nucleotide recognition sequences, e.g. SapI (5'-GCTCTTC-3', 7 nt) and SgfI (5'-GCGATCGC-3', 8 nt). A 7 nt recognition sequence would be expected, on average, to cut a DNA molecule with a GC content of 50% once every 4^7 = 16,384 bp.
• Seven- and eight-nucleotide cutters are often used in restriction mapping of large molecules, but the approach is limited because few enzymes of this kind are known.
• 2. Enzymes whose recognition sequences contain motifs that are rare in the target DNA. For example, the sequence 5'-CG-3' is rare in the genomes of vertebrates. Vertebrates possess a methylase enzyme that adds a methyl group to carbon 5 of the C nucleotide in this sequence, and deamination of the methylated C converts it into T.
• NotI has an 8 nt target site, expected once every 4^8 ≈ 65 kb, but it cuts the human genome only about once every 10 Mb.
• SmaI (5'-CCCGGG-3') cuts once every 78 kb on average.
• BssHII (5'-GCGCGC-3') cuts once every 390 kb.

Gel stretching and molecular combing

Air dry → treat with restriction enzyme → stain with DAPI (4',6-diamidino-2-phenylindole) → visualize the DNA with a fluorescence microscope.

Fluorescent in situ hybridization (FISH)

• FISH (Heiskanen et al., 1996) is a version of in situ hybridization in which an intact chromosome is probed with a labeled DNA molecule.
• As in optical mapping, the position of the labeled DNA is visualized directly.
• The chromosomal DNA must be denatured before it can hybridize with the probe.
• Denaturation is carried out by drying the cells onto a glass microscope slide and then treating them with formamide.

The procedure, in outline:

• The sample cells are dried onto a microscope slide.
• The slide is treated with formamide.
• The chromosomes are denatured but do not lose their characteristic metaphase morphology.
• The chromosomes are hybridized with the probe.
• The signal is detected by fluorescence microscopy.

Sequence tagged site (STS)

• STS mapping is used to generate detailed maps of large genomes.
• It is more powerful than FISH for this purpose.
• An STS is a short DNA sequence (100-500 bp) that occurs only once in the chromosome or genome being studied.
• To map STSs, a collection of overlapping DNA fragments is made from a single chromosome.
• Markers that are close together have a high probability of being present on the same fragment; markers that are far apart have less chance of occurring together on one fragment.

What exactly is an STS, and how is the DNA fragment collection obtained?

• To qualify as an STS, a DNA sequence must fulfil two criteria:
• 1. Its sequence must be known, so that a PCR assay can be set up to test for the presence or absence of the STS on different DNA fragments.
• 2. It must have a unique location in the genome. If the STS sequence occurs at more than one position, the mapping data will be ambiguous.

Importance of gene mapping

#A gene map is the anatomy of the human genome. It is a prerequisite for understanding how the human genome functions.

#Helps to understand human genetic diseases.

#Helps to develop methods for gene therapy.

#Localizing genes helps in the improvement of plant varieties (maps have been constructed for Arabidopsis, rice, barley, banana, Brassicaceae etc.).

Reference Book

• Genomes 3, by T.A. Brown