Mapping by Sequence Homology
Total Page:16
File Type:pdf, Size:1020Kb
Commentary 1609812 Europeon Journal Eur J Hum Genet 1996;4:247-249 of Human Genetics StylianosE. Antonarakis Mapping by Sequence Homology Laboratory of Human Molecular Genetics, Department of Genetics and Microbiology, University of Geneva, and Division of Medical Genetics, Cantonal Hospital of Geneva, Switzerland An important current preoccupation of the human investigators in Europe, Japan and the US are also con geneticist is the mapping of disease phenotypes and genes. tributing to this fast-growing and useful resource. Besides Linkage analysis using DNA polymorphic markers in the nucleotide sequence information, an important anno appropriate families, and chromosomal abnormalities as tation of these ESTs is almost certainly their position in sociated with pathologies were the two principal ways of the growing physical maps of the human chromosomes. assigning phenotypes to particular chromosomal regions. The current international mapping efforts for the ESTs The menu for the mapping of genes or other DNA seg include the use of PCR amplification in either YACs or ments of interest contains many more choices: PCR or radiation hybrids [4, 5]. hybridization alternatives using somatic cell or radiation Exon trapping [6] has proven to be an outstanding hybrids, or recombinant clones of physical maps, as well method of identifying partial gene sequences from cloned as fluorescent in situ hybridization and linkage analysis of genomic DNA (or to be more accurate, sequences that polymorphic markers are the most common modalities by start with a splice acceptor and end with a splice donor which a wealth of mapping information has already been sequence). Several disease-causing genes have been iden accumulated. The Genome Database now contains tified with this method using clones that map in critical (search of 15 March 1996) 84,609 mapped DNA ‘objects’; regions (examples include the huntingtin, presenilin and among those 11,121 are DNA polymorphic markers, leptin genes) [7], Furthermore, exon trapping is now 33,276 are STS (sequence-tagged sites), 5,902 are genes extensively used to create transcription (genic) maps for (4,102 of those are cloned). The Online Mendelian inheri several chromosomes using chromosome-specific cos- tance in man (OMIM) now contains 8,002 entries, ap mids, PACs, BACs, YACs as the starting material. After proximately half of which are disease phenotypes. There elimination of the false-positive sequences the over is still, however, a long way to go until the mapping of all whelming majority (nearly all) of the trapped sequences human genes that number probably between 70,000 and map back to their clones of origin. Comparison of the 100,000 [1], sequences of these trapped exons with the nucleotide One of the most exciting recent developments in sequences in the public databases often provides identi genomics research is the determination of approximately ties (or near identities) to one or more ESTs. This identity 400,000 partial sequences of human cDNAs (ESTs; ex is obviously sufficient initial evidence for mapping the pressed sequence tags) from different tissues and the corresponding EST(s), and therefore their cDNA clones, availability of these sequences through the public data to the chromosome or chromosomal region of the trapped bases [2], These ESTs have been accumulated mainly sequence. This mapping by sequence homology/identity from the Merck-Wash.U project (255,163 sequences on will probably become a popular and useful method of 13 February 1996, http://genome.wustl.edu/est) and the mapping in the coming years since sequencing is the ulti TIGR project [about 150,000 sequences, ref 3]. Other mate method of exploration and characterization of the © 1996 S. Karger AG, Basel Prof. Stylianos E. Antonarakis Received: May 20, 1996 KARGER 1018-4813/96/0045-0247$ 10.00/0 Division de Génétique Médicale Revision received: E-Mail [email protected] Centre Médical Universitaire July 15,1996 Fax+ 41 61 306 12 34 This article is also accessible online at: 1, rue Michel Servet Accepted: August 5, 1996 http://www. karger, ch httpV/BioMedNet.com/karger CH-1211 Genève (Switzerland) genomes. Below, we refer to this mode of localizing several disease-related genes. Recently, Lovett et al. [ 13] cDNAs as mapping by sequence homology (MSH). have applied this method to create chromosome-specific Let’s look at some numbers. During the last 18 months, cDNA libraries by using chromosome-specific cosmids as several investigators have used exon trapping from chro a starting reagent. As with the experience with trapped mosome-specific cosmids to identify portions of tran exons, 9.8% of the selected cDNA clones showed identity scription units. In our laboratory, 1,200 randomly picked to ESTs. After elimination of cDNA clones with repetitive chromosome 21-specific cosmids have been used and a elements, 78% (108/139) of the cDNAs map back to the total of 559 different ‘exons’ have been identified [8], To appropriate chromosome [13], This represents an out date we have mapped 133 of them back to chromosome standing enrichment for chromosome-specific cDNAs. It 21; no exon has been mapped elsewhere in the human seems however that MSH after cDNA selection is not spe genome. We are therefore confident that the majority of cific enough since there is one in five mapping missassign- trapped sequences map to the chromosome of origin. ment. This is probably due to capture of cDNAs that Exons from 41 % of the known chromosome 21 genes have belong to sequence-related gene families that map in dif been trapped, and interesting homologies with other ferent chromosomes. Thus, MSH from chromosome-spe genes have been found. Homology searches have also cific cDNA selection needs to be confirmed by another revealed identity or near identity (homology of 100 or mapping method. >95%, respectively) of 49 trapped sequences (8.8%) with How does the MSH compare with other existing map unmapped ESTs (more than 49 ESTs are identified be ping methods? In MSH after exon trapping there is no cause of the redundancy of the EST database). We pro difference with the methods based on hybridization, since pose that the vast majority of their corresponding cDNAs these latter depend on nucleic acid homology. Errors can map to chromosome 21. In the laboratories of Buckler be due to hybridization of related sequences scattered and McCormick [9], a similar experiment was done using throughout the human genome; the same is true for the chromosome 12-specific cosmids. From a total of 936 dif computer ‘hybridization’ in which a 95% homology can ferent exons that recognized 37% of the known chromo be due to sequencing errors or DNA polymorphisms as some 12 genes, 60 (6.4%) showed identity or near identity well as evolutionarily related sequences. There is no dif to ESTs and their corresponding cDNAs presumably map ference either from the PCR amplification based methods to chromosome 12. The experience with chromosome 22 since these also depend on nucleotide matches, and am from the Buckler laboratory is similar [10]. From a total of plification products of related targets may have the same 603 different unique trapped sequences that recognized size. In addition, false-positive amplification products of 35% of the known chromosome 22 genes, 57 (9.5%) contamination of the PCR may result in erroneous map showed identity (or near) to ESTs. All of these results were ping conclusions. Furthermore the mapping methods of obtained by homology searches done around August/Sep linkage analysis and radiation hybrids provide a statistical tember 1995 and do not include searches in the EST data statement for order and distance between DNA ‘objects’ base of TIGR. Thus approximately 8% of the trapped and therefore as such give a best estimate which is always exons from the three different experiments (166 of 2,098) subject to changes with the availability of more data, elim recognized and provisionally mapped cDNAs for which ination of genotyping errors (for linkage) or false-positive ESTs are available in the public databases. The homology and negative results (for radiation hybrids), or alternative computer search therefore provided considerable evi ways of statistical treatment of the data. dence for the localization of many cDNA fragments to a In summary, MSH may become a popular selection in specific human chromosome or chromosomal segment. the menu of mapping possibilities since it provides infor As with every mapping method, confirmation of this ini mation that is not of inferior quality or accuracy from the tial evidence by MSH is needed for the definitive map other existing alternatives. ping. It is well accepted in the genetic mapping communi ty and a common practice over the years that two inde pendent mapping methods should be used to definitely Acknowledgements assign a mapping ‘object’ to a specific chromosomal region. The author’s genome research is supported by grants from the Swiss NSF 31.40500.94, the European Union CT93-0015 and the cDNA selection is an alternative method to isolate University and Cantonal Hospital of Geneva. The author thanks Drs transcribed sequences from selected genomic regions [11, H. Scott, H.M. Chen and M.K. McCormick for comments on the 12], It has also been successfully used in the cloning of manuscript. 248 Eur J Hum Genet 1996;4:247-249 Antonarakis References 1 Fields C, Adams MD, White O, Venter JC: 6 Buckler AJ, Chang DD, Graw SL, Brook JD, io Troffater JA, Long KR, Murrell JR, Stotler CJ, How many genes are in the human genome? Haber DA, Sharp PA, Housman DE: Exon Gusella JF, Buckler AJ: An expression-inde Nature Genet 1994;7:345-347. amplification: A strategy to isolate mammalian pendent catalogue of genes from human chro 2 Boguski MS, Schuler GD: ESTablishing a hu genes based on RNA splicing. Proc Natl Acad mosome 22.