Conserved Sequence-Tagged Sites

Proc. Natl. Acad. Sci. USA Vol. 89, pp. 3681-3685, May 1992
Genetics

Conserved sequence-tagged sites: A phylogenetic approach to genome mapping
(primates/mouse/sequence conservation/evolution/PCR)

RICHARD MAZZARELLA*, VITTORIO MONTANARO*, JUHA KERE*, ROLLAND REINBOLD*, ALFREDO CICCODICOLA†, MICHELE D'URSO†, AND DAVID SCHLESSINGER*‡

*Department of Molecular Microbiology and the Center for Genetics in Medicine, Washington University School of Medicine, St. Louis, MO 63110; and †International Institute of Genetics and Biophysics, Consiglio Nazionale delle Ricerche, Naples, Italy

‡To whom reprint requests should be addressed.

Communicated by James D. Watson, February 3, 1992

ABSTRACT Cognate sites in genomes that diverged 100 million years ago can be detected by PCR assays based on primer pairs from unique sequences. The great majority of such syntenically equivalent sequence-tagged sites (STSs) from human DNA can be used to assemble and format corresponding maps for other primates, and some based on gene sequences are shown to be useful for mouse and rat as well. Universal genomic mapping strategies may be possible by using sets of STSs common to many mammalian species.

Abbreviations: STS, sequence-tagged site; Mb, megabase(s); F9, factor IX; HPRT, hypoxanthine-guanine phosphoribosyltransferase; GRP94, 94-kDa glucose-regulated protein; GLA, α-galactosidase A.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The unity of biochemistry is based on conservation of genes during evolution. With the increasing interest in maps of complex genomes (1), the subclass of genomic sequences that are most tightly conserved takes on a special significance. Such sequences are already reagents to detect homologous genes in different organisms (2, 3). But the order as well as the sequence of genes tends to be conserved across extensive blocks of syntenically equivalent DNA in complex organisms. As a result, such sequences may provide a way to assemble overlapping clones and format the resulting genomic maps in many species, even across branches of the phylogenetic tree.

Relevant syntenically equivalent relationships are well established, for example, between mouse and human genomes. Spanning 100 million years of evolution (4), the content and order of genes are conserved across regions of the order of a cytogenetic band (5) [2-10 megabases (Mb) or more]. On the X chromosome, where the content has been largely fixed during evolution (6), only a few regions have shifted their relative positions (7), and, on autosomes, the gene content of large blocks of chromosomal DNA seems to have been conserved during translocation from one chromosome to another (8).

Clones for different species have been correlated by direct hybridization of cDNA probes. Even in cases in which a cDNA from human is only partially homologous, it can be used as a hybridization probe to screen for the corresponding cDNA in mouse (e.g., see ref. 9). Hybridization provides a proven way to organize overlapping clones into long-range maps (10-12), and it is fast and cheap. However, most current syntenic equivalence studies with cDNA probes involve isolation of a species-specific cDNA before location in the genome is determined; few random DNA probes have been shown to cross-hybridize efficiently and specifically with genomic DNA from a variety of species.

An alternative approach to cross-species physical mapping is conceivable based on use of the PCR. As has been discussed (13), their portability, ease, and potential for automation make PCR methods an attractive alternative for genomic mapping studies wherever they are feasible and, for example, often permit the easy discrimination of members of a gene family, or of true genes from pseudogenes (14), that are difficult by hybridization methods.

In the formulation of Olson et al. (13), a genomic map is formatted with sequence-tagged sites (STSs). Each STS is characterized by a unique primer pair and the corresponding PCR product; different types of STSs may be developed for various purposes. Some STS PCR products detect length polymorphisms, usually based on their content of dinucleotide or other simple sequence repeats (15, 16). Such STSs are highly informative probes for genetic linkage mapping and provide markers for physical maps, but they would usually be relatively species specific. STSs for physical mapping can also be developed indifferently from any fragments of unique sequence (17), but such STSs could well be specific to one or a few species (see below). In contrast, STSs derived from evolutionarily conserved gene sequences, including cDNA sequences (18), could provide relatively universal mapping reagents.

Despite the potential advantages, it is not intuitively obvious that syntenically equivalent STSs are feasible mapping tools. Although PCRs can yield greater specificity, the primers are much shorter than hybridization probes and require greater stringency to maintain that specificity. Syntenically equivalent STS content mapping then depends on the extent to which evolution has conserved pockets of sufficiently high sequence identity between disparate species.

Apart from some information about cDNA sequences, little is known about the conservation of genomic sequence, especially intergenic regions. It has therefore been unclear to what extent STSs from random fragments of human DNA can detect corresponding regions even among closely related species. This study has examined the ability of STS primers made from human DNA sequences (i) to function across species at the genomic level of DNA complexity; (ii) to accommodate a mismatch or degenerate sequence to increase their likelihood of success; and (iii) to provide syntenically equivalent products from cDNA sequences and random DNA fragments. We then attempted to determine the degree to which such "conserved STSs" are possible.

MATERIALS AND METHODS

PCR Analysis. One hundred nanograms of total genomic DNA from each mammalian species and chicken was amplified by PCR in a 15-µl reaction mixture containing 6 pmol of the appropriate primer pair as listed in Table 1 and 0.3 unit of Amplitaq (Perkin-Elmer/Cetus). For each primer pair, the optimal TNK buffer was determined

Table 1. Gene-specific and random genomic STSs used for study of sequence conservation

| STS name | GenBank | Primer 1 (5'→3') | Primer 2 (5'→3') | STS length | Specific PCR product (HUM GOR CHI MAC MOU RAT YEA) |
| F9 | J00137, M23109 | CTTCAGTACCTTAGAGTTCC | CCATATTTGCCTTTCATTGC | 221 | + + + |
| HPRT | M31642, J00423 | AGCTTGCTGGTGAAAAGG | TCATTATAGTCAAGGGCATATC | 278 | |
| GRP94 | X15187, J03297, M14772 | CTGAA(G/A)AAGGGCTATGAAGT | AACCTCTT(C/G)CCATCAAA(C/T)TC | 89 | |
| MIC2 | M22556 | GCTCTATGTTTCCAAGAAG | GTTTACAGCCCTCTGAATG | 84 | + + + + - - - |
| PLP | M15026 | GAGAAGATGGAGCCCTTA | TCCTCTTCTCCTGCAATGAAA | 153 | + + + + + + - |
| HRASP | X00419 | CTGAACCACCAGTGCTTCG | CACACCATCACAGACAGCC | 167 | + + + + - - - |
| AMD | M21154 | CTGTATCTGCCTCTATTTC | GTTACTAAAGTTCAGGTTCC | 139 | + + + + + + - |
| GF1A | M30601 | AGCCCAGGTTAATCCCCAG | TGTGGAGGACACCAGAGCAG | 107 | + + + + - - - |
| POLA | X06745 | CAGGGAGTTTTGTATCTTC | CTTTTTCAGTCTTTCTAGGG | 83 | + + + + - - - |
| GLA | M13571 | CTAGAGCACTGGACAATGG | GTCAAGGTTGCACATGAAG | 80 | + + + + + + - |
| TBG | M14091 | CAGCGTTTTCATAATGTTGC | TAATATGGACAGGGAGTAG | 93 | + + + + - - - |
| L1CAM | M30257 | TGAATACCCTCCCAGGCAC | ATCTTCCCAGGCATTTTAAG | 99 | + + + - - - |
| COL4A5 | M31115 | CAGGAGAAAAAGGTAGTAAAGG | TTTTGAGCCCAGAAGATTTG | 80 | + + + + - - - |
| sWXD93 | | CATAGAACAAGCAGAAGG | CAGAAAGAAGATATTGCTGG | 102 | + + + + |
| sWXD94 | | CAAAACTTTCCTACCTACC | CTGACCATACACATAATCC | 130 | + + + + - - - |
| sWXD95 | | AATTTAGGCAAGAGCAGC | TTCTCCCCAAATAAATCCC | 60 | + + + + - - - |
| sWXD96 | | ATCGTGCTGCTGTACTCC | GGCAGATATGAAACTGAGG | 128 | + + + - - - - |
| sWXD97 | | GGAGGGAAGAAGAGAGGG | CAGCGAGAGTTAGTGAGG | 137 | + + + - - - - |
| sWXD98 | | CAACTGGGATAAGTCACC | GTGATTGAGAATGAATGGG | 106 | + + + + - - - |
| sWXD99 | | CCCTTCACTCACCTTCCC | CAGATAGTTCTTTATAGCAGTGCG | 102 | + + + + - - - |
| sWXD100 | | CGTGCTTAGGCTTAATCCCC | GAACTGACTGTAGAGAAGG | 145 | + + + + - - - |
| sWXD101 | | GAAATTCTTCACTACCTCC | AACACATCTCAGACATCC | 160 | + + + + - - - |
| sWXD102 | | CTTTGATAGTTCAGGTTTGC | GAGAATCTTCTGTCTAGG | 122 | + + + |
| sWXD115 | | GCTGTAGATTCACTTTCG | AAGACCTACCAAAGCTCC | 142 | + - + - - |
| sWXD117 | | CATTTTGTAGCTGAGAAAGG | GCAATTCAAGGAACATAACTGG | 79 | + + + + - - - |
| sWXD118 | | CTCTTTTCCTTAATCCAACCC | CCACTGTGCTATACTGCC | 100 | + + + + - - - |
| sWXD119 | | GATCAACACGGCTCTCGG | CTGGGCTCTTGGCTAAGG | 73 | + + + + - - - |
| sWXD121 | | TCCTTTTATCCCCATATTTC | TTTCTCTCAGCACATTTATCC | 60 | + + + + - - - |

Human Gene Mapping (HGM) (19) symbols are used for gene-specific STSs. GRP94 (20) and GF1A (21) are not HGM designations and refer to the 94-kDa glucose-regulated protein and to the erythroid DNA-binding protein. STSs generated from randomly isolated genomic DNA are named according to laboratory terminology with s representing STS, W representing Washington University, and XD identifying the X chromosome project followed by an acquisition number. STS length (bp) refers to the expected product size from human DNA. Nucleotides in parentheses indicate a degenerate oligonucleotide position. HUM, human; GOR, gorilla; CHI, chimpanzee; MAC, macaque; MOU, mouse; RAT, rat; YEA, yeast (Saccharomyces cerevisiae). GenBank accession numbers for the human sequence are listed, followed by accession numbers for the mouse and chicken sequence, respectively, when applicable. +, Detection of a band of identical or very nearly identical size; -, no strong products of nearly identical size.
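The degenerate positions written in parentheses in the GRP94 primer sequences can be made concrete with a short sketch. The Python below is an illustration only, not part of the authors' protocol: the helper name `expand_degenerate` is hypothetical, and it assumes that each parenthesized group lists alternative single nucleotides separated by slashes, as in the table legend. It enumerates every concrete oligonucleotide that such a primer represents.

```python
from itertools import product
import re


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a primer with (X/Y) positions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Drop stray whitespace that may surround the parenthesized groups.
    primer = "".join(primer.split())
    # Each match is either a parenthesized set of alternatives or a fixed run.
    parts = re.findall(r"\(([ACGT/]+)\)|([ACGT]+)", primer)
    choices = []
    for alternatives, fixed in parts:
        if alternatives:
            choices.append(alternatives.split("/"))
        else:
            choices.append([fixed])
    # The Cartesian product over all positions yields each concrete primer.
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # GRP94 primers as listed in Table 1.
    for primer in ("CTGAA(G/A)AAGGGCTATGAAGT",
                   "AACCTCTT(C/G)CCATCAAA(C/T)TC"):
        for sequence in expand_degenerate(primer):
            print(sequence)
```

With one degenerate position the first GRP94 primer corresponds to two oligonucleotides, and with two positions the second corresponds to four, so the primer pair as a whole covers eight concrete primer combinations.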
