Life with 6000 Genes Author(S): A. Goffeau, B. G. Barrell, H. Bussey, R
Total Page:16
File Type:pdf, Size:1020Kb
Life with 6000 Genes Author(s): A. Goffeau, B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon, H. Feldmann, F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston, E. J. Louis, H. W. Mewes, Y. Murakami, P. Philippsen, H. Tettelin and S. G. Oliver Source: Science, Vol. 274, No. 5287, Genome Issue (Oct. 25, 1996), pp. 546+563-567 Published by: American Association for the Advancement of Science Stable URL: https://www.jstor.org/stable/2899628 Accessed: 28-08-2019 13:31 UTC REFERENCES Linked references are available on JSTOR for this article: https://www.jstor.org/stable/2899628?seq=1&cid=pdf-reference#references_tab_contents You may need to log in to JSTOR to access the linked references. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at https://about.jstor.org/terms American Association for the Advancement of Science is collaborating with JSTOR to digitize, preserve and extend access to Science This content downloaded from 152.3.43.41 on Wed, 28 Aug 2019 13:31:29 UTC All use subject to https://about.jstor.org/terms by FISH across the whole genome. A subset of the 37. G. D. Billingsleyet al., Am. J. Hum. Genet. 52, 343 (1994); L. M. Mulliganet al., Nature 363, 458 (1993); results were selected in which there was no evi- (1993); S. S. Schneider et al., Proc. Natl. Acad. Sci. P. Edery et al., ibid. 367, 378 (1994). dence of the YAC having been chimeric, and the U.S.A. 92, 3147 (1995). 42. The potential effect of a comprehensive gene map is position was resolved to a cytogenetic interval (as 38. R. Sherrington et al., Nature 375, 754 (1995); D. well illustrated by the fact that 82% of genes that opposed to fractional length only). A G6n6thon Levitanand 1.Greenwald, ibid. 377, 351 (1995); S. A. have been positionally cloned to date are marker was given for each of these YACs, which Hahn et al., Science 271, 350 (1996). represented by one or more ESTs in GenBank (http:/ allowed the position in centimorgans to be 39. S. Tugendreich et al., Hum. Mol. Genet. 3, 1509 /www.ncbi.nlm.nih.gov/dbEST/dbESTL genes/). determined. Because this study contained no data (1994); D. E. Bassett Jr., M. S. Boguski, P. Hieter, 43. K. H. Fasman, A. J. Cuticchia, D. T. Kingsbury,Nu- for chromosomes 19, 21, and 22, an alternative cleic Acids Res. 22, 3462 (1994). strategy was used: Genes that are nonrecombinant Nature 379, 589 (1996); P. Hieter, D. E. Bassett Jr., with respect to G6n6thon markers [table 5 in C. Dib D. Valle, Nature Genet. 13, 253 (1996). 44. M. Bonaldo et al., Genome Res. 6, 791 (1996). et al., Nature 380 (suppl.), iii (1996)] and have 40. The scoring systems were amino acid substitution 45. Consensus sequences from at least three overlap- well-known cytogenetic locations [Online Mende- matrices based on the PAM (point accepted muta- ping ESTs were generated from 294 UniGene clus- lian Inheritance in Man (OMIM) at http://www. tion) model of evolutionary distance. Pam matrices ters. Of the 230 (78%) redesigned primer pairs, 188 ncbi.nlm.nih.gov,] were used to establish the may be generated for any number of PAMs by ex- (82%) of these yielded successful PCR assays. cross-reference. trapolation of observed mutation frequencies [M. 0. 46. We thank M. 0. Anderson, A. J. Collymore, D. F. 28. N. E. Morton, Proc. Natl. Acad. Sci. U.S.A. 88, 7474 Dayhoff et al., in Atlas of Protein Sequence and Courtney, R. Devine, D. Gray, L. T. Horton Jr., V. (1991). Structure, M. 0. Dayhoff, Ed. (National Biomedical Kouyoumjian, J. Tam, W. Ye, and 1. S. Zemsteva 29. Detailed instructions and examples are provided on Research Foundation, Washington, DC, 1978), vol. from the Whitehead Institute for technical assist- the Web site. 5, suppl. 3, pp. 345-352; S. F. Altschul, J. Mol. Evol. ance. We thank W. Miller,E. Myers, D. J. Lipman, 36, 290 (1993)]. PAM matrices were customized for 30. G. D. Schuler, J. A. Epstein, H. Ohkawa, J. A. Kans, and A. Schaffer for essential contributions toward scoring matches between sequences in each of the Methods Enzymol. 266,141 (1996). the development of UniGene. Supported by NIH 15 species pairs; for the pool of remaining proteins 31. Five additional assays yielded ambiguous results as awards HG00098 to E.S.L., HG00206 to R.M.M., ("Other organisms" in Table 4), the PAM120 matrix a result of the presence of an interferingmouse band HG00835 to J.M.S., and HGO0151 to T.C.M., and was used because it has been shown to be good for or poor amplification. by the Whitehead Institutefor Biomedical Research general-purpose searching [S. F. Altschul, J. Mol. 32. This could be the result of either a trivialprimer tube and the Wellcome Trust. T.J.H. is a recipient of a Biol. 219, 555 (1991)]. The BLASTX program labeling erroror a sequence clustering errorin which [W. ClinicianScientist Award from the Medical Research Gish and D. J. States, Nature Genet. 3, 266 (1993)] two separate genes were erroneously assigned to Council of Canada. D.C.P. is an assistant investiga- takes a inucleotide sequence query (EST or gene), the same UniGene entry. tor of the Howard Hughes Medical Institute. The translates it into all six conceptual ORFs, and then 33. Typing errors in a framework marker would allow a Stanford Human Genome Center and the Whitehead close EST to map with significant lod scores (loga- compares these with protein sequences in the data- Institute-MITGenome Center are thankful for the rithmof the odds ratio for linkage) to a correct chro- base; TBLASTN performs a similar function but in- oligonucleotides purchased with funds donated by mosome location but would tend to localize the stead searches a protein query sequence against marker in a distant bin, in order to minimize "double- six-frame translations of each entry in a nucleotide Sandoz Pharmaceuticals. G6n6thon is supported by breaks" caused by the erroneous typings of the sequence database. Searches were performed with the Association Francaise contre les Myopathies and framework marker. E = 1e-6 and E2 = 1e-5 as the primaryand second- the Groupement d'Etudes sur le Genome. The 34. J. M. Craig and W. A. Bickmore, Bioessays 15, 349 ary expectation parameters. Sanger Centre, G6n6thon, and Oxford are grateful (1993). 41. M. A. Pericak-Vance et al., Am. J. Hum. Genet. 48, for support from the European Union EVRHESTpro- 35. V. Romano et al., Cytogenet. Cell Genet. 48, 148 1034 (1991); W. J. Strittmatter et al., Proc. Natl. gramme. The Human Genome Organizationand the (1988). Acad. Sc. U.S.A. 90, 1977 (1993); R. Fishel et al., Wellcome Trust sponsored a series of meetings from 36. S. Bahram, M. Bresnahan, D. E. Geraghty, T. Spies, Cell 75, 1027 (1993); N. Papadopoulos et al., Sci- October 1994 to November 1995 without which this Froc. Natl. Acad. Sci. U.S.A. 91, 6259 (1994). ence 263, 1625 (1994); R. Shiang et al., Cell 78, 335 collaboration would not have been possible. physicalenvironment. S. cerevisiaehas a life Life with 6000 Genes cycle that is ideally suited to classical ge- netic analysis,and this has permittedcon- A. Goffeau,*B. G. Barrell,H. Bussey, R. W. Davis, B. Dujon, struction of a detailed genetic map that H. Feldmann, F. Galibert,J. D. Hoheisel, C. Jacq, M. Johnston, defines the haploidset of 16 chromosomes. Moreover,very efficient techniques have E. J. Louis, H. W. Mewes, Y. Murakami,P. Philippsen, been developedthat permitany of the 6000 H. Tettelin, S. G. Oliver genes to be replacedwith a mutantallele, or completelydeleted from the genome, with The genome of the yeast Saccharomyces cerevisiae has been completely sequenced absolute accuracy(17-19). The conmbina- through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 tion of a largenumber of chromosomesand potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, a small genome size meant that it was pos- 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the sible to divide sequencing responsibilities complete sequence provides information about the higher order organization of yeast's conveniently among the differentinterna- 16 chromosomes and allows some insight into their evolutionary history. The genome tional groupsinvolved in the project. shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate Old Questions and New Answers the biological functions of all of these genes. The genome.At the beginning of the se- quencing project, perhaps1000 genes en- codingeither RNA or proteinproducts had The genome of the yeast Saccharomyces nucleotideand proteinsequence data from been defined by genetic analysis(20). The cerevisiaehas been completely sequenced each of the 16 yeast chromosomes(1-16) complete genome sequence defines some through an internationaleffort involving have been established(Table 1). 5885 open readingframes (ORFs) that are some 600 scientistsin Europe,North Amer- The position of S. cerevisiaeas a model likely to specify protein products in the ica, andJapan. It is the largestgenome to be eukaryoteowes much to its intrinsicadvan- yeastcell. This meansthat a protein-encod- completelysequenced so far (a recordthat tages as an experimentalsystem.