Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 3681-3685, May 1992
Genetics

Conserved sequence-tagged sites: A phylogenetic approach to genome mapping
(primates//sequence conservation/evolution/PCR)

RICHARD MAZZARELLA*, VITTORIO MONTANARO*, JUHA KERE*, ROLLAND REINBOLD*, ALFREDO CICCODICOLA†, MICHELE D'URSO†, AND DAVID SCHLESSINGER*‡

*Department of Molecular Microbiology and the Center for Genetics in Medicine, Washington University School of Medicine, St. Louis, MO 63110; and †International Institute of Genetics and Biophysics, Consiglio Nazionale della Ricerche, Naples, Italy

Communicated by James D. Watson, February 3, 1992

ABSTRACT Cognate sites in genomes that diverged 100 million years ago can be detected by PCR assays based on primer pairs from unique sequences. The great majority of such syntenically equivalent sequence-tagged sites (STSs) from human DNA can be used to assemble and format corresponding maps for other primates, and some based on gene sequences are shown to be useful for mouse and rat as well. Universal genomic mapping strategies may be possible by using sets of STSs common to many mammalian species.

Abbreviations: STS, sequence-tagged site; Mb, megabase(s); F9, factor IX; HPRT, hypoxanthine-guanine phosphoribosyltransferase; GRP94, 94-kDa glucose-regulated protein; GLA, α-galactosidase A.
‡To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The unity of biochemistry is based on conservation of genes during evolution. With the increasing interest in maps of complex genomes (1), the subclass of genomic sequences that are most tightly conserved takes on a special significance. Such sequences are already reagents to detect homologous genes in different organisms (2, 3). But the order as well as the sequence of genes tends to be conserved across extensive blocks of syntenically equivalent DNA in complex organisms. As a result, such sequences may provide a way to assemble overlapping clones and format the resulting genomic maps in many species, even across branches of the phylogenetic tree.

Relevant syntenically equivalent relationships are well established, for example, between mouse and human genomes. Spanning 100 million years of evolution (4), the content and order of genes are conserved across regions of the order of a cytogenetic band (5) [2-10 megabases (Mb) or more]. On the X chromosome, where the content has been largely fixed during evolution (6), only a few regions have shifted their relative positions (7), and, on autosomes, the gene content of large blocks of chromosomal DNA seems to have been conserved during translocation from one chromosome to another (8).

Clones for different species have been correlated by direct hybridization of cDNA probes. Even in cases in which a cDNA from human is only partially homologous, it can be used as a hybridization probe to screen for the corresponding cDNA in mouse (e.g., see ref. 9). Hybridization provides a proven way to organize overlapping clones into long-range maps (10-12), and it is fast and cheap. However, most current syntenic equivalence studies with cDNA probes involve isolation of a species-specific cDNA before location in the genome is determined; few random DNA probes have been shown to cross-hybridize efficiently and specifically with genomic DNA from a variety of species.

An alternative approach to cross-species physical mapping is conceivable based on use of the PCR. As has been discussed (13), their portability, ease, and potential for automation make PCR methods an attractive alternative for genomic mapping studies wherever they are feasible. They often permit, for example, the easy discrimination of members of a gene family, or of true genes from pseudogenes (14), distinctions that are difficult by hybridization methods.

In the formulation of Olson et al. (13), a genomic map is formatted with sequence-tagged sites (STSs). Each STS is characterized by a unique primer pair and the corresponding PCR product; different types of STSs may be developed for various purposes. Some STS PCR products detect length polymorphisms, usually based on their content of dinucleotide or other simple sequence repeats (15, 16). Such STSs are highly informative probes for genetic linkage mapping and provide markers for physical maps, but they would usually be relatively species specific. STSs for physical mapping can also be developed indifferently from any fragments of unique sequence (17), but such STSs could well be specific to one or a few species (see below). In contrast, STSs derived from evolutionarily conserved gene sequences, including cDNA sequences (18), could provide relatively universal mapping reagents.

Despite the potential advantages, it is not intuitively obvious that syntenically equivalent STSs are feasible mapping tools. Although PCRs can yield greater specificity, the primers are much shorter than hybridization probes and require greater stringency to maintain that specificity. Syntenically equivalent STS content mapping then depends on the extent to which evolution has conserved pockets of sufficiently high sequence identity between disparate species.

Apart from some information about cDNA sequences, little is known about the conservation of genomic sequence, especially intergenic regions. It has therefore been unclear to what extent STSs from random fragments of human DNA can detect corresponding regions even among closely related species. This study has examined the ability of STS primers made from human DNA sequences (i) to function across species at the genomic level of DNA complexity; (ii) to accommodate a mismatch or degenerate sequence to increase their likelihood of success; and (iii) to provide syntenically equivalent products from cDNA sequences and random DNA fragments. We then attempted to determine the degree to which such "conserved STSs" are possible.

MATERIALS AND METHODS

PCR Analysis. One hundred nanograms of total genomic DNA from each mammalian species and yeast was amplified by PCR in a 15-μl reaction mixture containing 6 pmol of the appropriate primer pair as listed in Table 1 and 0.3 unit of Amplitaq (Perkin-Elmer/Cetus). For each primer pair, the


Table 1. Gene-specific and random genomic STSs used for study of sequence conservation

| STS name | GenBank | Primer 1 (5'-3') | Primer 2 (5'-3') | STS length (bp) | Specific PCR product (HUM GOR CHI MAC MOU RAT YEA) |
|---|---|---|---|---|---|
| F9 | J00137, M23109 | CTTCAGTACCTTAGAGTTCC | CCATATTTGCCTTTCATTGC | 221 | + + + |
| HPRT | M31642, J00423 | AGCTTGCTGGTGAAAAGG | TCATTATAGTCAAGGGCATATC | 278 | |
| GRP94 | X15187, J03297, M14772 | CTGAA(G/A)AAGGGCTATGAAGT | AACCTCTT(C/G)CCATCAAA(C/T)TC | 89 | |
| MIC2 | M22556 | GCTCTATGTTTCCAAGAAG | GTTTACAGCCCTCTGAATG | 84 | + + + + - - - |
| PLP | M15026 | GAGAAGATGGAGCCCTTA | TCCTCTTCTCCTGCAATGAAA | 153 | + + + + + + - |
| HRASP | X00419 | CTGAACCACCAGTGCTTCG | CACACCATCACAGACAGCC | 167 | + + + + - - - |
| AMD | M21154 | CTGTATCTGCCTCTATTTC | GTTACTAAAGTTCAGGTTCC | 139 | + + + + + + - |
| GF1A | M30601 | AGCCCAGGTTAATCCCCAG | TGTGGAGGACACCAGAGCAG | 107 | + + + + - - - |
| POLA | X06745 | CAGGGAGTTTTGTATCTTC | CTTTTTCAGTCTTTCTAGGG | 83 | + + + + - - - |
| GLA | M13571 | CTAGAGCACTGGACAATGG | GTCAAGGTTGCACATGAAG | 80 | + + + + + + - |
| TBG | M14091 | CAGCGTTTTCATAATGTTGC | TAATATGGACAGGGAGTAG | 93 | + + + + - - - |
| L1CAM | M30257 | TGAATACCCTCCCAGGCAC | ATCTTCCCAGGCATTTTAAG | 99 | + + + - - - |
| COL4A5 | M31115 | CAGGAGAAAAAGGTAGTAAAGG | TTTTGAGCCCAGAAGATTTG | 80 | + + + + - - - |
| sWXD93 | | CATAGAACAAGCAGAAGG | CAGAAAGAAGATATTGCTGG | 102 | + + + + |
| sWXD94 | | CAAAACTTTCCTACCTACC | CTGACCATACACATAATCC | 130 | + + + + - - - |
| sWXD95 | | AATTTAGGCAAGAGCAGC | TTCTCCCCAAATAAATCCC | 60 | + + + + - - - |
| sWXD96 | | ATCGTGCTGCTGTACTCC | GGCAGATATGAAACTGAGG | 128 | + + + - - - - |
| sWXD97 | | GGAGGGAAGAAGAGAGGG | CAGCGAGAGTTAGTGAGG | 137 | + + + - - - - |
| sWXD98 | | CAACTGGGATAAGTCACC | GTGATTGAGAATGAATGGG | 106 | + + + + - - - |
| sWXD99 | | CCCTTCACTCACCTTCCC | CAGATAGTTCTTTATAGCAGTGCG | 102 | + + + + - - - |
| sWXD100 | | CGTGCTTAGGCTTAATCCCC | GAACTGACTGTAGAGAAGG | 145 | + + + + - - - |
| sWXD101 | | GAAATTCTTCACTACCTCC | AACACATCTCAGACATCC | 160 | + + + + - - - |
| sWXD102 | | CTTTGATAGTTCAGGTTTGC | GAGAATCTTCTGTCTAGG | 122 | + + + |
| sWXD115 | | GCTGTAGATTCACTTTCG | AAGACCTACCAAAGCTCC | 142 | + - + - - |
| sWXD117 | | CATTTTGTAGCTGAGAAAGG | GCAATTCAAGGAACATAACTGG | 79 | + + + + - - - |
| sWXD118 | | CTCTTTTCCTTAATCCAACCC | CCACTGTGCTATACTGCC | 100 | + + + + - - - |
| sWXD119 | | GATCAACACGGCTCTCGG | CTGGGCTCTTGGCTAAGG | 73 | + + + + - - - |
| sWXD121 | | TCCTTTTATCCCCATATTTC | TTTCTCTCAGCACATTTATCC | 60 | + + + + - - - |

Human Gene Mapping (HGM) (19) symbols are used for gene-specific STSs. GRP94 (20) and GF1A (21) are not HGM designations and refer to the 94-kDa glucose-regulated protein and to the erythroid DNA-binding protein. STSs generated from randomly isolated genomic DNA are named according to laboratory terminology, with s representing STS, W representing Washington University, and XD identifying the X chromosome project, followed by an acquisition number. STS length (bp) refers to the expected product size from human DNA. Nucleotides in parentheses indicate a degenerate oligonucleotide position. HUM, human; GOR, gorilla; CHI, chimpanzee; MAC, macaque; MOU, mouse; RAT, rat; YEA, yeast (Saccharomyces cerevisiae). GenBank accession numbers for the human sequence are listed, followed by accession numbers for the mouse and chicken sequence, respectively, when applicable. +, Detection of a band of identical or very nearly identical size; -, no strong products of nearly identical size.
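The degenerate notation used for the GRP94 primers in Table 1 can be made concrete with a short sketch. The Python below is only an illustration, not part of the original methods, and the function name `expand_degenerate` is hypothetical. It expands each parenthesized position into every concrete oligonucleotide, assuming that alternatives are written as `(X/Y)`, as in the table.

```python
import re
from itertools import product


def expand_degenerate(primer: str) -> list[str]:
    """Expand '(G/A)'-style degenerate positions into all concrete oligonucleotides."""
    # Tokenize into parenthesized alternatives and fixed stretches; spaces are ignored.
    tokens = re.findall(r"\(([^)]+)\)|([ACGT]+)", primer.replace(" ", ""))
    # Each token contributes either its list of alternatives or a single fixed stretch.
    choices = [alt.split("/") if alt else [fixed] for alt, fixed in tokens]
    return ["".join(combo) for combo in product(*choices)]


# GRP94 primers as listed in Table 1 (parentheses mark degenerate positions).
grp94_primer1 = "CTGAA(G/A)AAGGGCTATGAAGT"
grp94_primer2 = "AACCTCTT(C/G)CCATCAAA(C/T)TC"

forward = expand_degenerate(grp94_primer1)
reverse = expand_degenerate(grp94_primer2)
print(len(forward), len(reverse), len(forward) + len(reverse))
```

Under this reading, primer 1 expands to two oligonucleotides and primer 2 to four. That total of six appears consistent with the mixture of six oligonucleotide primers described for GRP94.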

optimal TNK buffer was determined [from among 10 mM Tris-HCl, pH 8.6/1.5 mM MgCl2/5 mM NH4Cl/25, 50, or 100 mM KCl, a series developed by V. Nowotny (personal communication)]. The optimal reaction buffer for all primer pairs listed in Table 1 was H-TNK (100 mM KCl), except for TBG, which was optimal in M-TNK (50 mM KCl), and MIC2, which was optimal in L-TNK (25 mM KCl). Reactions that still showed nonspecific PCR products were optimized for primer annealing temperature in 5°C intervals. Each reaction mixture was incubated in a DNA thermal cycler 480 (Perkin-Elmer/Cetus) for 35 cycles at 94°C for 1 min, 55°C for 2 min [60°C for vascular cell adhesion molecule 1 (L1CAM)], and 72°C for 2 min. The PCR products were then fractionated on a 10% acrylamide/15% (vol/vol) glycerol/1× TBE (89 mM Tris borate/89 mM boric acid/2 mM EDTA, pH 8.0) gel. The products were visualized by ethidium bromide staining and UV light.

Primer pairs were inferred from the sequence with the aid of a program (ref. 22; see text). For example, in the case of phosphoglycerate kinase and ornithine transcarbamylase, the top-rated primer pairs were, respectively, 5'-TAGTCCTTATGAGCCACC-3' and 5'-TTTTCCCTTCCCTTCTTCC-3', defining a product of 227 base pairs (bp), and 5'-CTAAAGAAGCATCCATCCC-3' and 5'-GATATTGTTCCCATCCCC-3', yielding a product of 146 bp.

DNA Sequencing. Thirty-microliter reaction mixtures were prepared as described above for those PCR products that were to be analyzed by DNA sequencing. The entire reaction mixture was fractionated on a 7% acrylamide/1× TBE gel and the appropriate bands were excised. The DNA was then eluted from the gel slice in 0.5 M ammonium acetate/1 mM EDTA and precipitated by ethanol as described (23). The DNA precipitate was resuspended in 10 μl of TE (10 mM Tris, pH 8.0/1 mM EDTA, pH 8.0) and 2 μl was used for DNA sequence analysis. Dideoxynucleotide sequencing (24) was done on the double-stranded products using deoxyadenosine [α-[35S]thio]triphosphate (Amersham) in the PCR thermal cycler for 25 cycles. The products were then analyzed on a 5% Hydrolink gel (AT Biochemicals, Malvern, PA).

RESULTS

Predicted Likelihood of Conserved STSs from GenBank Searches. One way to test the prospects for cross-species STSs is based on computer comparisons of putative primer pairs and PCR products for homologous genes whose sequences are recorded in GenBank. A number of such tests have been encouraging. In one approach, a simulation of STS development was done for cDNA sequences chosen among those known for both human and mouse. The sequences of the X chromosome-linked genes phosphoglycerate kinase and ornithine transcarbamylase were searched with a program developed to help optimize the choice of STS primer pairs (22). This program evaluates all the possible primer pairs for a given target sequence based on primer-self, primer-primer, and primer-product homologies. All primer pairs conforming to a set of parameters suitable for an STS are scored and ranked.

For each cDNA, essentially the same primer pair was rated most highly for both human and mouse DNA (see Materials and Methods; one primer pair showed a single mismatch between the two species, but PCR products were nevertheless from equivalent sites; R.M. and V.M., unpublished data). Two of the other four best primer pairs for phosphoglycerate kinase and one of the other four primers for ornithine transcarbamylase that were top-rated by the program would also produce cross-species STSs.
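The kind of computer comparison described above can be illustrated with a minimal sketch. It is a simplified stand-in written for this purpose, not the program of ref. 22: it only asks whether both primers of a pair find an ungapped site on a template within a mismatch limit, and what product size would follow. The function names, the `max_mismatches` parameter, and the toy template sequences are all assumptions made for illustration.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_site(primer: str, template: str) -> tuple[int, int]:
    """Return (position, mismatches) of the best ungapped match of primer on template."""
    best = (-1, len(primer) + 1)
    for i in range(len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        mismatches = sum(a != b for a, b in zip(primer, window))
        if mismatches < best[1]:
            best = (i, mismatches)
    return best


def predicted_product(fwd: str, rev: str, template: str, max_mismatches: int = 1):
    """Predicted product length (bp), or None if either primer lacks an acceptable site."""
    f_pos, f_mm = best_site(fwd, template)
    # The reverse primer anneals to the opposite strand, so search for its reverse complement.
    r_pos, r_mm = best_site(revcomp(rev), template)
    if f_mm > max_mismatches or r_mm > max_mismatches or r_pos < f_pos:
        return None
    return r_pos + len(rev) - f_pos


# Hypothetical toy data: two "species" templates differing at one base in the forward site.
fwd, rev = "GATTACAG", "ACTTCAGG"
template_a = "TT" + "GATTACAG" + "A" * 10 + "CCTGAAGT" + "TT"
template_b = "TT" + "GATTACTG" + "A" * 10 + "CCTGAAGT" + "TT"

print(predicted_product(fwd, rev, template_a))
print(predicted_product(fwd, rev, template_b, max_mismatches=0))
print(predicted_product(fwd, rev, template_b, max_mismatches=1))
```

A primer pair that returns the same product size on both templates when one mismatch is tolerated corresponds, in this toy setting, to the case noted above in which a single mismatch between species still gave products from equivalent sites.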

Experimental Test of Conserved STSs. In a second approach, three genes were selected for which sequences are known in human and mouse. The genes for factor IX (F9) and hypoxanthine-guanine phosphoribosyltransferase (HPRT) were chosen because their human exon/intron boundaries were also known (GenBank accession nos. K02402 and M26434, respectively). The chicken sequence for one of the genes, the 94-kDa glucose-regulated protein (GRP94), is also known, and in that case the exon/intron structure is also known for the chicken (GenBank accession no. M31321). Primer pair sequences were sought that overlapped enough to yield potential conserved STSs. Each potential primer was then tested against all the sequences in GenBank, and was found to have at least three mismatches with all sequences outside of the cognate gene.

For two genes (F9 and HPRT), primer pairs were selected that gave homologous PCR products from human, macaque, mouse, and rat (Table 1, entries 1 and 2). The primer pair for F9 also amplified the equivalent PCR product from gorilla and chimpanzee.

For the third gene (GRP94), no single primer pair was found to extend from human to mouse and chicken. In that case, however, a mixture of six oligonucleotide primers was devised that amplified the corresponding product from all seven species tested (Table 1, entry 3). Fig. 1A shows the PCR results with primer pairs for F9 and GRP94.

The PCR products from human and mouse for F9 and HPRT and from human, mouse, and chicken for GRP94 were sequenced to confirm that the products arose from the known genes. Therefore, primer pairs could produce STSs from species that diverged from any common ancestor >100 million years ago, and, for GRP94, a degenerate mixture of six primers produced unique syntenically equivalent products for six mammals and chicken.

PCR Analysis of Candidate Conserved STSs by Using Human STS Primer Pairs. The "zoo PCR" approach initiated with computer searches was next extended to blind experimental tests on primer pairs developed from instances in which only a human cDNA sequence was available. In these cases, there was no information about exon/intron borders. We therefore used only the 5' end of the cDNA sequence, which is largely an untranslated sequence colinear with genomic DNA, for the development of primer pairs.

A series of 10 STSs were developed from published sequences for 10 X chromosome-linked genes. The primer pairs that were selected for these X chromosome-linked genes are listed in Table 1. These primer pairs were optimized for production of strong, unique products from human DNA in

the absence of products from yeast DNA. The strong products increased the likelihood of unequivocal scoring of PCR against DNA templates of other mammalian species; the negative scoring against yeast DNA would facilitate the use of the primer pair in screening for cognate yeast artificial chromosomes to use in the assembly of long-range physical maps.

When they were tested against yeast, rat, mouse, and four primate DNAs, all 10 primer pairs gave strong, unique products from all of the primates (Table 1). Three primer pairs also gave equivalent products from mouse and rat DNA (Table 1, entries 5, 7, and 10). In no case were similar-sized products detected in yeast DNA. Fig. 1B shows PCR data for two of the more inclusive of the 10 primer pairs.

These results probably underestimate the potential fraction of gene-specific cross-species STSs for mouse and human, since the 5' untranslated region of the gene used in generation of the STS diverges as much as 3-fold more rapidly than the coding sequences (25, 26).

We extended the studies of cross-species STSs to include primer pairs developed from 15 random fragments of DNA (Table 1, entries 14-28) isolated from a human 4X cell line and verified to be X chromosome specific by PCR tests with hamster-human hybrid cells containing only human X or single other human chromosomes. Ten of the primer pairs gave strong products in all the primates, three gave similar-sized bands in all the primates except macaque, and one functioned in only human and chimpanzee. No strong products were detected in mouse, rat, or yeast with any of these "random" STSs (see Fig. 1C for sample data).

Therefore, the vast majority of STSs obtained from the human genome should be useful in determining evolutionary relationships to primates that diverged as much as 30 million years ago, but it is predominantly gene sequences that are likely to be conserved enough to provide STSs that extend further in phylogeny.

FIG. 1. Analysis of PCR products from two primer pairs generated from gene sequences known in more than one species (A), from gene sequences known only in human (B), and from randomly isolated genomic sequences (C). The results shown here are products from primer pairs for F9 (A Left), GRP94 (A Right), GLA (B Left), PLP (B Right), sWXD100 (C Left), and sWXD94 (C Right). Positions of marker DNAs (bp) are indicated on the right.

Evolution of Genes and Genomic Sequences. Conventional cloning methods have been used to look at features of the evolution of globin genes (2, 3). Syntenic STSs could provide an entree to analysis of the organization and evolution of corresponding segments of genomes without the necessity for cloning. We initiated a study for a particular case, the X chromosome-linked glucose-6-phosphate dehydrogenase (G6PD), and we have observed that conservation of the sequence is sufficient for the great majority of primer pairs to function in all primate species (V.M. and M.D., unpublished data). Sequence information from both exons and introns of G6PD shows that these primer pairs amplify DNA corresponding to evolutionarily conserved sites.

We have also examined the sequence conservation of one gene-specific STS, α-galactosidase A (GLA; Fig. 1B Left), and two random STSs (sWXD100 and sWXD94; Fig. 1C). The DNA sequence of the PCR products of each of these STSs is shown in Fig. 2.

The gene-specific STS primer pair detects the corresponding site in mouse DNA as well as in the other primates, since all the sequences are ≈75% identical (Fig. 2A). One of the random STSs yields strong products of the same size from all the primates, and these are identical at >90% of base pairs (Fig. 2B). The other random STS also produced strong bands in all primates, but the product from chimpanzee was relatively smaller (Fig. 2C). In that case, the products show only ≈60% identity, but they are again almost certainly from equivalent sites, since the lower degree of similarity results from a 14-bp deletion in chimpanzee and two 3-bp insertions in the other primate species.

DISCUSSION

Conserved STSs for Primates. On the order of 30,000 STSs would yield a physical map with 100-kilobase resolution across the 3 × 10^9 bp of a mammalian genome. For primates, at least to the level of Old World monkeys, a large fraction of the gene-specific STSs are in fact conserved; a pair of primers from a human gene will likely show few if any mismatches with other primates, and the sequence of the amplified products can easily be verified to detect any differences.

Because there are on the order of 100,000 genes and cDNAs are much longer than typical PCR products, there are many sequences to choose from. Syntenic STSs can thus be tailored. For example, in any case in which exon/intron borders are known, primer pairs can be chosen that bracket an intron. One can then discriminate the larger PCR product of the true gene in the genome from the smaller product of many pseudogenes, which typically lack intronic sequences. As another example, specific searches could be undertaken for syntenic STSs that also span simple sequence repeats. They thus could also be potentially useful for studies of polymorphism in at least some species.

How Universally Can Maps Be Based on Conserved STSs? From searches in GenBank and data like those presented here, at least 1 in 10 STSs made starting from a human cDNA sequence will extend not only to primates but also to species much lower on the phylogenetic scale. Such occurrences obviously depend on the degree to which a local sequence has especially stringent requirements for conservation or deviates statistically from an average random distribution of mutational changes. It is encouraging that even 1 in 10 STSs would be sufficient to generate, for example, a corresponding STS map between mouse and human. This is primarily because syntenically equivalent regions span 1 Mb or more (5). Thus, an STS every Mb (or 10% of a cohort of 30,000) would be enough to correlate corresponding segments of mouse and human DNA, placing neighboring clones in context.

A major step toward this goal has already been taken, with the development of the current census of >1500 human-specific cDNA sequences in GenBank and several thousand more currently being processed (27, 28). In addition, improvements in cloning technology are producing libraries of yeast artificial chromosomes with average size approaching 1 Mb (29-31); thus, adjacent cloned units of DNA can be identified by an achievable number of STSs of the order of 1 Mb apart.

Such a number of STSs could be provided by the coincidence of identical primer pair sequences in randomly chosen bits of coding regions (as in the sample here). The development of numbers of STSs conserved for relatively distant species could also be facilitated in at least two ways exemplified here. First, if the sequences of both cDNAs are known, primer pairs can often be chosen that permit the generation of a corresponding PCR product from each species. If this route is considered important for particular genes for which the cDNA sequence is only available from human, the mouse cDNAs can presumably be recovered and systematically sequenced.

Second, if the cDNA sequence is known for mouse and human but has diverged too much to permit the design of a

A
                    10         20         30         40
HUMAN       ttggcaaGGA CgCCTACcAT GGGCTGGCTG CAcTGGGAgC G
GORILLA     ttggcaaGGA CgCCTACcAT GGGCTGGCTG CAcTGGGAgC G
CHIMPANZEE  ttggcaaGGA CgCCTACcAT GGGCTGGCTG CAcTGGGAgC G
MACAQUE     ttggcaaGGA CgCCTACcAT GGGCTGGCTG CAcTGGGAgC G
MOUSE       ..ttggcGGA CtCCTACtAT GGGCTGGCTG CAtTGGGAaC G
Consensus   -------GGA C-CCTAC-AT GGGCTGGCTG CA-TGGGA-C G

B
                    10         20         30         40         50         60
HUMAN       AcGGATTTaT GTGcCAGAAA TAAGATAACT GAGGAAAgaA GTTGTTTGAC ATTCAtGGCA
GORILLA     AcGGATTTaT GTGcCAGAAA TAAGATAACT GAGGAAAatA GTTGTTTGAC ATTCAtGGCA
CHIMPANZEE  AtGGATTTaT GTGcCAGAAA TAAGATAACT GAGGAAAgaA GTTGTTTGAC ATTCAtGGCA
MACAQUE     AtGGATTTgT GTGtCAGAAA TAAGATAACT GAGGAAAgaA GTTGTTTGAC ATTCAcGGCA
Consensus   A-GGATTT-T GTG-CAGAAA TAAGATAACT GAGGAAA--A GTTGTTTGAC ATTCA-GGCA

                    70         80         90
HUMAN       TGAGTCATTA GgTCCACTTG GGTTCGTTGA GTG
GORILLA     TGAGTCATTA GgTCCACTTG GGTTCGTTGA GTG
CHIMPANZEE  TGAGTCATTA GgTCCACTTG GGTTCGTTGA GTG
MACAQUE     TGAGTCATTA GaTCCACTTG GGTTCGTTGA GTG
Consensus   TGAGTCATTA G-TCCACTTG GGTTCGTTGA GTG

C
                    10         20         30         40         50         60
HUMAN       TTACCAtTaa ctatGAACAT TTATGc.TTT t...TCTCTC TGTATGAtgg ttaata...t
GORILLA     TTACCAtTac tatgGAACAT TTATGctTTT t...TCTCTC TGTATGAtgg ttaatacctg
CHIMPANZEE  TTACCAtTac tatgGNNCAT TTATGctTTT ....TCTCTC TGTATGA... ...
MACAQUE     TTACCAcTgc tatgGANCAT TTATGgtTTT tttcTCTCTC TGTATGAtgg ttaata...t
Consensus   TTACCA-T-- ----GA-CAT TTATG--TTT ----TCTCTC TGTATGA--- ---

                    70         80
HUMAN       tgaGTGTCAA CTNNTGATGg GTAT
GORILLA     agaGTGTCAA CTTCNNATGt GTAT
CHIMPANZEE  ...GTGTCAA CTTCNNATGg GTAT
MACAQUE     tgaGTGTCAA CTCNNNATGg GTAT
Consensus   ---GTGTCAA CT----ATG- GTAT

FIG. 2. Cross-species nucleotide conservation of one gene-specific and two random STSs. Nucleotide sequence from the PCR products obtained with primer pairs for GLA (A), sWXD100 (B), and sWXD94 (C). Consensus sequences shared by all species are in capital letters; dot is used to indicate where a gap was introduced to produce optimal alignment; dash indicates a region of nonconsensus.
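The percent identity figures quoted for Fig. 2 can be checked with a short calculation. The Python below is a sketch under stated assumptions rather than the authors' procedure: identity is counted over aligned columns only, columns containing a gap (`.`) or an unread base (`N`) are skipped, and case is ignored. The function name `percent_identity` is hypothetical, and the two sequences are the human and macaque rows of Fig. 2B as printed.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percent of aligned columns (no gap, no N) at which two sequences agree."""
    a = a.replace(" ", "").upper()
    b = b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a called base.
    columns = [(x, y) for x, y in zip(a, b) if gap not in (x, y) and "N" not in (x, y)]
    if not columns:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Human and macaque sWXD100 products, copied from Fig. 2B (spaces are block separators).
human_b = ("AcGGATTTaT GTGcCAGAAA TAAGATAACT GAGGAAAgaA GTTGTTTGAC ATTCAtGGCA "
           "TGAGTCATTA GgTCCACTTG GGTTCGTTGA GTG")
macaque_b = ("AtGGATTTgT GTGtCAGAAA TAAGATAACT GAGGAAAgaA GTTGTTTGAC ATTCAcGGCA "
             "TGAGTCATTA GaTCCACTTG GGTTCGTTGA GTG")

print(f"{percent_identity(human_b, macaque_b):.1f}% identical")
```

For this pair the two rows differ at 5 of 93 aligned positions, which is in line with the statement that the sWXD100 products are identical at >90% of base pairs. Other conventions, such as counting gap columns as mismatches, would give lower values for Fig. 2C.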

cross-species primer pair, degenerate primers or a mixture of primer pairs could be inferred that would function adequately in many cases.

A strategy for the design of such primers could follow procedures currently used for cloning cDNAs with oligonucleotide mixtures based on known sequence. Primer pairs could be generated from the sequence for seven amino acids in two conserved regions. Each primer could be constructed by placing the sequence for four consecutive amino acids with the least number of degenerate codons in the middle of the oligonucleotide and synthesizing a mixture of all the corresponding oligonucleotides. The overall sequence similarity of mouse and human suggests that not more than 1 bp in a primer would mismatch, and nearly all cases could be included without varying the third positions of the amino acid at the 5' end and the penultimate amino acid at the 3' end. The 3' end of the primer would consist of the first 2 bases of the last codon and would thus match both species exactly.

Such an STS strategy could be implemented immediately by using the large number of protein and DNA sequences currently known for mouse or human and would minimize the number of primers in any PCR, thereby presumably increasing the specificity of the product. In fact, based on yeast sequences, highly degenerate primers have detected products corresponding to mouse and human cDNAs in one test case (32).

Current Prospects for Cross-Species Maps. We have focused this discussion on primates and on mouse and human. The former is especially important for assembly of syntenic long-range physical maps and analysis of recent evolution, where, for example, one can determine when chromosomal rearrangements occurred or when repetitive sequences such as Alu begin to appear in homologous regions in the genome (33) and how they develop. The latter is likely to be critical both for map assembly and for subsequent functional analyses. It may be that regions difficult to recover in DNA clones from human, for example, will be more easily cloned as corresponding segments of mouse DNA; studies of pathology, development, and the effects of disrupting specific genes should continue in mouse and be extrapolated to human (34).

The extent to which syntenically equivalent STS maps could be more generally developed depends in part on the course evolution has taken (which could, in fact, be assessed by such an endeavor). A set of primers can be used for a group of related species across ≈30 million years and could be extended to a second group of species as discussed above. However, where considerable branching occurred in phylogeny [as at the "star" divergence during the origination of mammals (35, 36)], a different set of STS primers might be required for each group of evolved organisms. One option would then be to create a minimum number of sets of primer pairs, which would provide a group of STSs capable of extending smoothly through evolution.

The initial investment required to develop large numbers of gene-specific STSs has already been justified on many grounds. To them can be added the capacity of evolutionarily conserved STSs to superimpose the map of one species on another with uniform formatting. The recovery of cognate cloned DNAs then allows for further comparisons in detail, and, cumulatively, the pattern of change in the organization of genomes during evolution can be discerned.

This work was supported by National Institutes of Health Grant HG00247 and funds from the Washington University/Monsanto Biomedical Research agreement.

1. National Research Council (1988) Mapping and Sequencing the Human Genome (National Academy Press, Washington).
2. Fitch, D. H. A., Mainone, C., Goodman, M. & Slightom, J. L. (1990) J. Biol. Chem. 265, 781-793.
3. Fitch, D. H. A., Bailey, W. J., Taglie, D. A., Goodman, M., Sieu, L. & Slightom, J. L. (1991) Proc. Natl. Acad. Sci. USA 88, 7396-7400.
4. Li, W.-H., Gouy, M., Sharp, P., O'hUigin, C. & Yang, Y.-W. (1990) Proc. Natl. Acad. Sci. USA 87, 6703-6707.
5. Nadeau, J. H. (1989) Trends Genet. 5, 82-86.
6. Ohno, S. (1967) Sex Chromosomes and Sex-Linked Genes (Springer, Berlin).
7. Lyon, M. F. (1988) Am. J. Hum. Genet. 42, 8-16.
8. Nadeau, J. H., Davisson, N. P., Doolittle, D. P., Grant, P., Hilliard, A. L., Kosowsky, M. & Roderick, T. H. (1991) Mamm. Genome 1, S1-S534.
9. Borsani, G., Tonlorenzi, R., Simmler, M. C., Dandolo, L., Arnaud, D., Capra, V., Grompe, M., Pizzuti, A., Muzny, D., Lawrence, C., Willard, H. F., Avner, P. & Ballabio, A. (1991) Nature (London) 351, 325-329.
10. Abidi, F. E., Wada, M., Little, R. D. & Schlessinger, D. (1990) Genomics 7, 363-376.
11. Nelson, D. L., Ledbetter, S. A., Corbo, L., Victoria, M. F., Ramirez-Solis, R., Webster, T. D., Ledbetter, D. H. & Caskey, C. T. (1989) Proc. Natl. Acad. Sci. USA 86, 6686-6690.
12. Riley, J., Butler, R., Ogilvie, D., Finniear, R., Jenner, D., Powell, S., Anand, R., Smith, J. C. & Markham, A. F. (1990) Nucleic Acids Res. 18, 2887-2890.
13. Olson, M., Hood, L., Cantor, C. & Botstein, D. (1990) Science 245, 1434-1435.
14. Schlessinger, D., Little, R. D., Freije, D., Abidi, F., Zucchi, I., Porta, G., Pilia, G., Nagaraja, R., Johnson, S. K., Yoon, J., Srivastava, A., Kere, J., Palmieri, G., Ciccodicola, A., Montanaro, V., Romano, G., Casamassimi, A. & D'Urso, M. (1991) Genomics 11, 783-793.
15. Weber, J. L. & May, P. E. (1989) Am. J. Hum. Genet. 46, 95-106.
16. Zoghbi, H. Y., Jodice, C., Sandkuijl, L. A., Kwiatkowski, T. J., Jr., McCall, A. E., Huntoon, S. A., Lulli, P., Spadaro, M., Litt, M., Cann, H. M., Frontali, M. & Terrenato, L. (1991) Am. J. Hum. Genet. 49, 23-30.
17. Green, E. D. & Olson, M. V. (1990) Proc. Natl. Acad. Sci. USA 87, 1213-1217.
18. Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., Kerlavage, A. R., McCombie, W. R. & Venter, J. C. (1991) Science 252, 1651-1656.
19. Tenth International Workshop on Human Gene Mapping (1989) Cytogenet. Cell Genet. 51, 1-1148.
20. Lee, A. S., Bell, J. & Ting, J. (1984) J. Biol. Chem. 259, 4616-4621.
21. Zon, L. I., Tsai, S.-F., Burgess, S., Matsudaira, P., Bruns, G. A. & Orkin, S. H. (1990) Proc. Natl. Acad. Sci. USA 87, 668-672.
22. Hillier, L. & Green, P. (1991) PCR Methods Appl. 1, 124-128.
23. Dybczynski, I. & Plucienniczak, A. (1988) Biotechniques 6, 924-926.
24. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
25. Wu, C.-I. & Li, W.-H. (1985) Proc. Natl. Acad. Sci. USA 82, 1741-1745.
26. Eastel, S. (1990) Genetics 124, 165-173.
27. Eleventh International Workshop on Human Gene Mapping (1991) Cytogenet. Cell Genet. 58, 1-2200.
28. Hochgeschwender, U. & Brennan, M. B. (1991) Bioessays 13, 139-144.
29. Anand, R., Villasante, A. & Tyler-Smith, C. (1989) Nucleic Acids Res. 17, 3425-3433.
30. Albertsen, H., Abderrahim, H., Cann, H., Dausset, J., Le Paslier, D. & Cohen, D. (1990) Proc. Natl. Acad. Sci. USA 87, 4256-4260.
31. Larin, Z., Monaco, A. P. & Lehrach, H. (1991) Proc. Natl. Acad. Sci. USA 88, 4123-4127.
32. Petrini, J. H. J., Huwiler, K. G. & Weaver, D. T. (1991) Proc. Natl. Acad. Sci. USA 88, 7615-7619.
33. Deininger, P. L. (1989) in Mobile DNA, eds. Berg, D. E. & Howe, M. M. (Am. Soc. Microbiol., Washington), pp. 619-636.
34. Capecchi, M. R. (1989) Science 244, 1288-1292.
35. Kimura, M. (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge).
36. Gillespie, J. H. (1984) Proc. Natl. Acad. Sci. USA 81, 8009-8013.