The EMBO Journal vol.13 no.24 pp.5795-5809, 1994

H.Feldmann1,2, M.Aigle3, G.Aljinovic4, B.Andre5, M.C.Baclet3, C.Barthe3, A.Baur6, A.-M.Becam7, N.Biteau3, E.Boles6, T.Brandt8, M.Brendel9, M.Brückner10, F.Bussereau11, C.Christiansen8, R.Contreras12, M.Crouzet3, C.Cziepluch13, N.Demolis11, Th.Delaveau14, F.Doignon3, H.Domdey15, S.Düsterhus16, E.Dubois17, B.Dujon18, M.El Bakkoury17, K.-D.Entian16, M.Feuermann19, W.Fiers12, G.M.Fobo20, C.Fritz21, H.Gassenhuber15, N.Glansdorff17, A.Goffeau22,23, L.A.Grivell24, M.de Haan24, C.Hein5, C.J.Herbert7, C.P.Hollenberg21, K.Holmstrøm8, C.Jacq14, M.Jacquet11, J.C.Jauniaux5,13, J.-L.Jonniaux22, T.Kallesøe8, P.Kiesau16, L.Kirchrath21, P.Kötter16, S.Korol16, S.Liebl20, M.Logghe12, A.J.E.Lohan25, E.J.Louis26, Z.Y.Li9, M.J.Maat24, L.Mallet11, G.Mannhaupt1, F.Messenguy17, T.Miosga6, F.Molemans12, S.Müller10, F.Nasr7, B.Obermaier15, J.Perea14, A.Pierard17, E.Piravandi15, F.M.Pohl27, T.M.Pohl4, S.Potier19, M.Proft16, B.Purnelle22, M.Ramezani Rad21, M.Rieger10, M.Rose16, I.Schaaff-Gerstenschläger6, B.Scherens17, C.Schwarzlose1, J.Skala22, P.P.Slonimski, P.H.M.Smits24, J.L.Souciet19, H.Y.Steensma28, R.Stucka1, A.Urrestarazu5, Q.J.M.van der Aart28, L.van Dyck22, A.Vassarotti23, I.Vetter1, F.Vierendeels17, S.Vissers5, G.Wagner10, P.de Wergifosse22, K.H.Wolfe25, M.Zagulski, F.K.Zimmermann6, H.W.Mewes20 and K.Kleine20

1Institut für Physiologische Chemie, Physikalische Biochemie und Zellbiologie der Universität München, Schillerstraße 44, D-80336 München, Germany, 3Université de Bordeaux II, Laboratoire de Biologie Moléculaire et de Séquençage, Rue Léo Saignat, F-33076 Bordeaux Cedex, France, 4GATC-Gesellschaft für Analyse Technik und Consulting, Fritz-Arnold-Strasse 23, D-78467 Konstanz, Germany, 5Laboratoire de Physiologie Cellulaire et de Génétique des Levures, Université Libre de Bruxelles, Campus de la Plaine, CP244, Boulevard du Triomphe, B-1050 Bruxelles, Belgium, 6Institut für Mikrobiologie, TH Darmstadt, Schnittspahnstraße 10, D-64287 Darmstadt, Germany, 7Centre National de la Recherche Scientifique (CNRS), Centre de Génétique Moléculaire, F-91198 Gif-sur-Yvette Cedex, France, 8Research Institute for Food Technology, Agro-Industrial Technology and Molecular Biotechnology, Biotechnological Institute, Lundtoftevej 100, Building 227, PO Box 199, DK-2800 Lyngby, Denmark, 9Johann Wolfgang Goethe-Universität Frankfurt, Institut für Mikrobiologie, Theodor-Stern-Kai 7, Haus 75A, D-60596 Frankfurt/M, Germany, 10Genotype GmbH, Biotechnologische und Molekularbiologische Forschung, Angelhofweg 39, D-69259 Wilhelmsfeld, Germany, 11Université de Paris-Sud, Institut de Génétique et Microbiologie, URA 1354 du CNRS, Laboratoire Information Génétique et

In the framework of the EU genome-sequencing programmes, the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome II (807 188 bp) has been determined. At present, this is the largest eukaryotic chromosome entirely sequenced. A total of 410 open reading frames (ORFs) were identified, covering 72% of the sequence. Similarity searches revealed that 124 ORFs (30%) correspond to genes of known function, 51 ORFs (12.5%) appear to be homologues of genes whose functions are known, 52 others (12.5%) have homologues the functions of which are not well defined and another 33 of the novel putative genes (8%) exhibit a degree of similarity which is insufficient to confidently assign function. Of the genes on chromosome II, 37-45% are thus of unpredicted function. Among the novel putative genes, we found several that are related to genes that perform differentiated functions in multicellular organisms or are involved in malignancy. In addition to a compact arrangement of potential coding sequences, the analysis of this chromosome confirmed general chromosome patterns but also revealed particular novel features of chromosomal organization. Alternating regional variations in average base composition correlate with variations in local gene density along chromosome II, as observed in chromosomes XI and III. We propose that functional ARS elements are preferably located in the AT-rich regions that have a spacing of ~110 kb. Similarly, the 13 tRNA genes and the three Ty elements of chromosome II are found in AT-rich regions. In chromosome II, the distribution of coding sequences between the two strands is biased, with a ratio of 1.3:1. An interesting aspect regarding the evolution of the eukaryotic genome is the finding that chromosome II has a high degree of internal genetic redundancy, amounting to 16% of the coding capacity.

Key words: compositional bias/gene function/gene redundancy/genome organization/putative replication origins

© Oxford University Press

Introduction

The current genome projects endeavour to decipher the genetic information of a number of organisms by establishing detailed maps and finally complete sequences of their genomes. With the present level of sequencing methodology, early efforts at genome sequencing have been concentrated on organisms with less complex genomes. In this context, model organisms like bacteria (Kunst and Devine, 1991; Daniels et al., 1992; Honore et al., 1993) or organisms with genomes of intermediate sizes such as Caenorhabditis elegans (Wilson et al., 1994) or Arabidopsis thaliana (Meyerowitz and Pruitt, 1985) assume great importance as experimental systems. Among all eukaryotic model organisms, Saccharomyces cerevisiae combines several advantages: (i) this yeast has a genome size of only 13.5 Mb, i.e. 220 times smaller than that of the human genome; (ii) the yeast system is tractable to powerful genetic techniques; and (iii) functions in yeast have been studied in great detail biochemically. Based on present data, one can calculate that a repertoire of 6500-7000 genes is sufficient to build this simple eukaryotic cell. Considering recent progress and worldwide studies of yeast genome sequencing (Vassarotti and Goffeau, 1992; Goffeau, 1994), we can be confident of deciphering its genetic potential within a reasonable time period and with relatively limited effort.

Since a large variety of examples provide evidence that substantial cellular functions are highly conserved from yeast to mammals, and that corresponding genes can often complement each other, the wealth of sequence information obtained in yeast will be extremely useful as a reference against which sequences of human, animal or plant genes may be compared. Moreover, the ease of genetic manipulation in yeast opens up the possibility of functionally dissecting gene products from other eukaryotes in the yeast system.

Two years ago a consortium of 35 European laboratories published the first complete sequence of a eukaryotic chromosome: chromosome III of S.cerevisiae (Oliver et al., 1992). For the past 3 years our consortium has turned its efforts to the sequencing of yeast chromosomes XI and II and will continue to contribute to the sequencing of the yeast genome. The sequence of chromosome XI, the second eukaryotic chromosome entirely sequenced, has been published recently (Dujon et al., 1994). We report here the complete sequence of chromosome II (807 188 bp), the largest eukaryotic chromosome sequence ever entirely determined. The sequence of chromosome II, which constitutes ~6% of the yeast genome, adds considerably to the body of information we have gained so far from chromosomes III and XI, which together make up ~7.3% of the genome. Apart from the many novel genes detected in chromosome II, we have also arrived at a more precise description of the organization of the yeast genome. The size of chromosome II is sufficient to reveal specific novel chromosomal organization patterns; combined with the previous data from chromosomes III and XI, its analysis permits us to substantiate general principles of chromosomal organization in yeast.

Results

Assembly and verification of sequence

The sequence was determined from a set of 43 selected partially overlapping cosmid clones of a purpose-built genomic library from S.cerevisiae strain αS288C, supplemented by an overlapping plasmid clone containing the right telomere. By cross-reference with an ordered library from strain C836, established prior to this work (Stucka, 1992), and by chromosomal walking, a set of overlapping cosmid clones for chromosome II from strain αS288C was generated. These cosmids then served to construct the physical map using the restriction enzymes BamHI, SalI, XhoI and XbaI (average resolution ~2 kb). Clones were distributed between the collaborating laboratories according to a scheme to be presented elsewhere (H.Feldmann et al., manuscript in preparation). Assembly and interpretation of the sequence followed the same principles as those applied for chromosome XI (Dujon et al., 1994). Telomeres were physically mapped relative to the terminal-most cosmid inserts using the I-SceI chromosome fragmentation procedure described by Thierry and Dujon (1992). From this analysis it follows that the right telomere is completely contained in the sequence presented here. This sequence was determined from a specific plasmid clone (pEL19B2) obtained by

Fig. 1. Saccharomyces cerevisiae chromosome II map as deduced from the complete sequence. The map is drawn to scale from the sequence and coordinates (top line) are in kb. The genetic elements on the two strands are shown as coloured bars. The top strand (designated 'Watson' strand) is oriented 5' to 3' from left to right. The sequence has been interpreted using the principles detailed in Materials and methods. This procedure identified 410 ORFs (blue and purple boxes), which have been numbered in increasing order from the centromere and designated L for the left arm and R for the right arm (note that the database entries will use a more complex nomenclature, namely YBL for ORFs on the left arm and YBR for ORFs on the right arm, followed by a w/c suffix indicating their location on the Watson-Crick coding strand; see also Table I). ORFs corresponding to known genes are indicated by black bars. Tentative gene names are in brackets. Ty elements (or remnants thereof) are shown as green bars. δ, σ and τ refer to the LTRs of the Ty1/2, Ty3 and Ty4 elements, respectively, or remnants thereof. tRNA genes (red bars) are symbolized by a t and the one-letter code for the amino acid accepted.
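The database nomenclature mentioned in the legend (YBL or YBR, a number counted from the centromere, and a w/c strand suffix) can be unpacked mechanically. The following is a minimal sketch, assuming three-digit ORF numbers as they appear in Table I; the names `OrfName` and `parse_orf_name` are hypothetical helpers introduced here for illustration and are not part of the original work.

```python
import re
from typing import NamedTuple


class OrfName(NamedTuple):
    """Decoded parts of a chromosome II ORF name (hypothetical helper type)."""
    arm: str      # "left" or "right" of the centromere
    number: int   # ordinal counted outwards from the centromere
    strand: str   # "Watson" (w) or "Crick" (c)


# Assumption: names look like YBL099w / YBR020w, i.e. "YB", arm letter,
# three digits, strand suffix. Only chromosome II names are accepted.
_ORF_PATTERN = re.compile(r"^YB([LR])(\d{3})([wc])$", re.IGNORECASE)


def parse_orf_name(name: str) -> OrfName:
    """Split a name such as 'YBR020w' into arm, number and strand."""
    match = _ORF_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a chromosome II ORF name: {name!r}")
    arm, number, strand = match.groups()
    return OrfName(
        arm="left" if arm.upper() == "L" else "right",
        number=int(number),
        strand="Watson" if strand.lower() == "w" else "Crick",
    )


if __name__ == "__main__":
    for example in ("YBL099w", "YBR020w", "YBR019c"):
        print(example, parse_orf_name(example))
```

Under these assumptions, `YBR020w` decodes to the 20th ORF on the right arm, coded on the Watson strand.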


'telomere trap cloning' (Louis et al., 1994) and was shown to overlap the right-most cosmid by some 12 kb. At its left end, the sequence presented here starts with a telomere consensus sequence contained in the left-most cosmid. According to our physical mapping data, chromosome II extends some 7 kb beyond this telomere consensus. A detailed analysis of yeast telomeres indicated that chromosome II carries an additional 5.2 kb Y' element beyond the left telomere consensus sequence (Louis et al., 1994), which is in full agreement with the mapping data.

During sequence assembly, quality controls were performed following the same principles as applied in sequencing yeast chromosome XI (Dujon et al., 1994; see Materials and methods). We are confident, therefore, that a level of 99.97% accuracy has been achieved.

Partial sequences of chromosome II have been published independently by the authors of this work (Delaveau et al., 1992, 1994; Skala et al., 1992, 1994; Van Dyck et al., 1992, 1993, 1994; Baur et al., 1993; Bussereau et al., 1993; Demolis et al., 1993, 1994; Doignon et al., 1993a,b; Miosga and Zimmermann, 1993; Schaaff-Gerstenschläger et al., 1993a,b, 1994; Scherens et al., 1993; Becam et al., 1994; De Wergifosse et al., 1994; Holmstrøm et al., 1994; Logghe et al., 1994; Mallet et al., 1994; Mannhaupt et al., 1994; Nasr et al., 1994a,b; Ramezani Rad et al., 1994; Smits et al., 1994; van der Aart et al., 1994; Wolfe and Lohan, 1994; Zagulski et al., 1994).

Definition of open reading frames (ORFs) and other genetic elements

The map of chromosome II, as deduced from the complete sequence, is shown in Figure 1. A total of 410 ORFs were identified in the entire chromosome using the principles explained in Materials and methods, disregarding the six ORFs contributed by three complete yeast retrotransposons (Ty elements); 17 ORFs, mainly encoding ribosomal proteins, are interrupted by introns. The list includes 30 pairs of partially overlapping ORFs, five representing parallel and 25 representing anti-parallel overlaps; 11 pairs each include a gene whose function is known, whereas 10 other pairs include an ORF whose predicted product has a homologue in the databases, suggesting that it corresponds to a real gene (for details see H.Feldmann et al., manuscript in preparation). Moreover, in all such cases the partially overlapping partner ORF is shorter, suggesting that it may not correspond to a real gene. This leaves uncertainty for nine of the pairs of overlapping ORFs.

In the analysis of chromosome XI (Dujon et al., 1994), each ORF was evaluated using the codon adaptation index (CAI; Sharp and Li, 1987) and ORF sizes as criteria: ORFs that were both <150 codons in length and had a CAI <0.110 were considered as 'questionable'. If the same criteria are applied to chromosome II, 38 ORFs fall into this category; of these, 21 also belong to the set of partially overlapping ORFs. These criteria, however, can be used only as a guideline rather than a strict rule; in chromosome II, three genes of known function are <150 codons in length and have CAI values between 0.091 and 0.067. If we exclude from the list of 410 predicted chromosome II ORFs 21 of the partially overlapping ORFs and the residual 17 out of the 'questionable' ORFs, we arrive at a total number of 372 ORFs that might correspond to real genes. Necessarily, this estimate remains uncertain until the number of expressed genes in chromosome II has been determined by experimental methods.

A total of 13 tRNA genes, one of them containing an intron, have been identified on chromosome II. Among these is one copy of the tandemly arranged pairs of tRNA(Arg)-tRNA(Asp) genes. As anticipated (e.g. Hauber et al., 1988; Ji et al., 1993), most of the tRNA genes are associated with complete Ty elements, with their long terminal repeats (δ, σ and τ) and/or remnants thereof. Two of the three complete Ty elements encountered on chromosome II belong to class 1 and one to class 2 elements.

Analysis of the predicted protein products

Comparison of the present sequence with public databases revealed that 124 of the 410 ORFs (30%) correspond either to previously known protein-encoding genes or to genes whose functions have been determined during this work. In all, 70% of the total ORFs represent novel putative yeast genes; 51 of them (12.5% of the total) have homologues among gene products from yeast or other organisms whose functions are known, whereas 52 others (12.5% of the total) have homologues whose functions are not well defined. A further 33 ORFs of the novel putative genes (8% of the total) show a degree of similarity which is insufficient to confidently assign function (see Materials and methods). The remaining 150 ORFs (37% of the total) have either homologues to ORFs of unknown function on other chromosomes or no homologues in data libraries at all (note that this last set includes 17 'questionable' ORFs). Overall, between 37 and 45% of the genes of chromosome II are thus of unpredicted function. All of the above figures are similar to those obtained for chromosomes III (Oliver et al., 1992) and XI (Dujon et al., 1994). Table I lists the known genes plus all those ORFs which are considered to be homologues of gene products from yeast or other organisms whose functions are known or whose functions can be predicted from similarity scores and protein signatures.

We have analysed the chromosome II ORFs by using the ALOM algorithm (Klein et al., 1985) to predict putative membrane spans. A total of 142 ORFs (disregarding 21 'questionable' ORFs from the partially overlapping pairs) were found to contain from one to 14 potential membrane transversions (one ORF even showed 21 such spans). These results were confirmed by visual inspection of hydropathy plots of the ORFs in question. Thus, we arrive at an estimate that some 38% of the 'real' genes in chromosome II may code for transmembrane proteins. A similarly high figure has already been found with chromosome III (Goffeau et al., 1993a,b). Preliminary data obtained from other systematically sequenced yeast chromosomes suggest that this may apply as a general rule in yeast (A.Goffeau, personal communication). Even though the algorithm may give a somewhat high estimate, possibly a third of the yeast proteins have to be considered to be associated with membrane structures. We also examined chromosome II ORFs for the occurrence of putative mitochondrial target signal sequences. A rough estimate is that 8-10% of the proteins may be designed for mitochondrial import. Details of these analyses will


Table I. Genes of known or predicted function on chromosome II

ORF | Size (aa)^a | Gene^b | Function | CAI
YBL106c | 1010 | | probable G-protein, β-transducin type | 0.123
YBL105c | 1151 | PKC1 | protein kinase C-like protein (Ser/Thr-specific) | 0.171
YBL103c | 316 | | probable cytochrome c subunit, copper binding | 0.125
YBL000 | 438 | TY2A | | 0.180
YBL000 | 1770t | TY2B | | 0.150
YBL099w | 545 | ATP1 | mitochondrial ATPase, α chain precursor | 0.470
YBL093c | 220 | ROX3 | nuclear protein involved in CYC7 expression | 0.104
YBL092w | 130i | [RPL32e] | probable ribosomal protein L32.e | 0.817
YBL091c | 349 | | probable IF2-associated glycoprotein | 0.211
YBL088c | 2787 | | probable PI3 kinase, DRR1 homologue | 0.122
YBL087c | 137i | RPL17A | probable ribosomal protein L23e or YL17a | 0.623
YBL085w | 980 | BOB1 | BEM1 binding protein | 0.132
YBL084c | 758 | CDC27 | cell division control protein CDC27 | 0.131
YBL080c | 541 | PET112 | maintenance of rho+ mitochondrial DNA | 0.132
YBL076c | 1072 | ILS1 | isoleucyl-tRNA synthetase | 0.342
YBL075c | 649 | SSA3 | heat-shock protein, 70 kDa | 0.177
YBL074c | 355 | AAR2 | MATa1-mRNA splicing factor | 0.143
YBL072c | 200i | RPS8B | ribosomal protein S8e | 0.746
YBL068w | 355 | [PRPS3] | probable ribose-phosphate pyrophosphokinase | 0.191
YBL066c | 1057 | | probable regulatory Zn finger protein | 0.137
YBL064c | 261 | | homologue to thiol-specific antioxidant | 0.206
YBL063w | 1111 | KIP1 | kinesin-related protein | 0.143
YBL061c | 696 | SKT5 | probable Ca2+ binding protein (prenylated) | 0.113
YBL056w | 468 | | probable phosphoprotein (Ser/Thr) phosphatase | 0.177
YBL054w | 525 | | homologue to myb transforming proteins | 0.142
YBL050w | 292i | SEC17 | transport vesicle fusion protein | 0.158
YBL047c | 1381 | | cytoskeletal-related transport protein, Ca2+ binding | 0.196
YBL045c | 457 | COR1 | ubiquinol-cytochrome c reductase | 0.293
YBL042c | 639 | | FUR4 homologue, uracil transport protein | 0.168
YBL041w | 241 | PRS3 | proteasome subunit 3 |
YBL040c | 219i | ERD2 | ER lumen protein retaining receptor | 0.181
YBL039c | 579 | URA7 | cytidine triphosphate synthase | 0.308
YBL038w | 232 | [MRPL16] | probable mitochondrial ribosomal protein L16 | 0.154
YBL036c | 257 | | homologue to twitching motility protein | 0.237
YBL033c | 345 | [RIB1] | probable GTP cyclohydrolase II | 0.111
YBL030c | 318 | AAC2 | mitochondrial ATP/ADP carrier | 0.537
YBL027w | 189i | RPL19 | ribosomal protein L19e | 0.707
YBL026w | 95i | | probable snRNP-related protein | 0.168
YBL024w | 684 | | probable proliferating-cell nucleolar antigen (human p120) | 0.270
YBL023c | 868 | MCM2 | transcription factor | 0.175
YBL022c | 1133 | PIM1 | mitochondrial ATP-dependent lon-like serine proteinase | 0.186
YBL021c | 144 | HAP3 | transcription factor | 0.091
YBL017c | 1579 | PEP1 | carboxypeptidase Y sorting precursor | 0.163
YBL016w | 353 | FUS3 | protein kinase (cell cycle and cell fusion) | 0.110
YBL015w | 526 | ACH1 | acetyl-CoA hydrolase | 0.217
YBL013w | 393 | | probable met-tRNA formyltransferase, mitochondrial | 0.073
YBL008w | 840 | HIR1 | regulator of histone gene transcription | 0.128
YBL007c | 1244 | SLA1 | cytoskeleton assembly control protein | 0.189
YBL005w | 976 | PDR3 | pleiotropic drug resistance protein 3 | 0.147
YBL000 | 440 | TY1A | | 0.140
YBL000 | 1755t | TY1B | | 0.140
YBL003c | 132 | HTA2 | histone H2A2 | 0.542
YBL002w | 131 | HTB2 | histone H2B2 | 0.562
YBR001c | 780 | NTH2 | α,α-trehalase | 0.123
YBR003w | 473 | COQ1 | hexaprenyl-pyrophosphate synthase precursor | 0.140
YBR006w | 435 | | probable aldehyde dehydrogenase | 0.187
YBR008c | 548 | | probable benomyl/methotrexate resistance protein | 0.155
YBR009c | 103 | HHF1 | histone H4 | 0.733
YBR010w | 136 | HHT1 | histone H3 | 0.621
YBR011c | 287 | IPP1 | inorganic pyrophosphatase | 0.620
YBR000 | 440 | TY1A | | 0.150
YBR000 | 1756t | TY1B | | 0.140
YBR014c | 203 | | glutaredoxin homologue | 0.154
YBR015c | 597 | TTP1 | type II transmembrane protein | 0.169
YBR018c | 366 | GAL7 | galactose-1-phosphate uridylyltransferase | 0.221
YBR019c | 699 | GAL10 | UDP-glucose-4-epimerase | 0.185
YBR020w | 528 | GAL1 | galactokinase | 0.194
YBR021w | 633 | FUR4 | uracil transport protein | 0.186
YBR023c | 1165 | CHS3 | chitin synthase 3 | 0.166
YBR024w | 301 | [SCO2] | SCO1 protein homologue | 0.155
YBR025c | 394 | | probable purine nucleotide binding protein | 0.566
YBR026c | 380 | [MRF1] | probable (mitochondrial) ssDNA binding protein | 0.165


Table I. Continued

ORF | Size (aa)^a | Gene^b | Function | CAI
YBR028c | 525 | | probable Ser/Thr-specific protein kinase | 0.149
YBR031w | 362 | RPL2 | ribosomal protein L2A | 0.802
YBR033w | 919 | | probable regulatory Zn-finger protein | 0.115
YBR034c | 348 | [ODP1] | ORF adjacent to PDX3 | 0.267
YBR035c | 228 | PDX3 | pyridoxamine-phosphate oxidase | 0.242
YBR036c | 410 | CSG2 | Ca2+-dependent regulatory protein | 0.142
YBR037c | 295 | SCO1 | cytochrome oxidase assembly protein precursor | 0.110
YBR038w | 963 | CHS2 | chitin synthase 2 | 0.172
YBR039w | 311 | | probable H+-transporting ATPase (F(1)-ATPase γ) | 0.337
YBR041w | 623 | | probable AMP binding protein | 0.186
YBR042c | 397 | | probable membrane-bound small GTPase | 0.132
YBR043c | 689 | | probable pleiotropic resistance protein | 0.124
YBR044c | 573 | | homologue to mitochondrial chaperonin hsp60 | 0.126
YBR046c | 334 | | homologue to quinone oxidoreductase (E.coli) | 0.156
YBR048w | 156i | RPS18B | ribosomal protein S11.eB | 0.733
YBR049c | 810 | REB1 | DNA binding regulatory protein | 0.199
YBR052c | 210 | | homologue to Trp repressor binding protein (E.coli) | 0.150
YBR054w | 344 | [YRO2] | homologue to HSP30 heat-shock protein | 0.456
YBR055c | 899 | PRP6 | pre-mRNA splicing factor | 0.126
YBR056w | 501 | | homologue to glucan-1,3-β-glucosidase | 0.202
YBR059c | 1108 | | probable protein kinase | 0.145
YBR060c | 620 | RRR1 | origin recognition complex, 72 kDa subunit | 0.140
YBR061c | 310 | | homologue to ftsJ protein (E.coli) | 0.143
YBR065c | 404 | | probable phosphopantethein binding protein | 0.131
YBR066c | 220 | | probable Zn-finger protein | 0.124
YBR067c | 210 | TIP1 | temperature shock-inducible protein precursor SRP1/TIP1 | 0.449
YBR068c | 609 | | probable amino acid transport protein | 0.157
YBR069c | 619 | | probable amino acid transport protein | 0.151
YBR072w | 214 | HSP26 | heat-shock protein, 30 kDa | 0.337
YBR073w | 958 | | probable RAD protein, DNA repair helicase | 0.131
YBR074w | 413 | | homologue to aminopeptidase Y | 0.123
YBR078w | 226i | | homologue to sporulation-specific protein SPS2 | 0.614
YBR080c | 758 | SEC18 | vesicular fusion protein | 0.192
YBR081c | 1332 | SPT7 | probable transcription factor, suppressor of Ty transcription | 0.154
YBR082c | 148i | UBC4 | ubiquitin conjugating enzyme E2, 16 kDa subunit | 0.313
YBR083w | 486 | TEC1 | Ty transcription activator | 0.120
YBR084w | 975 | MIS1 | C1-tetrahydrofolate synthase precursor, mitochondrial | 0.207
YBR085w | 307 | AAC3 | mitochondrial ATP, ADP carrier | 0.198
YBR086c | 946 | | probable transmembrane protein | 0.175
YBR087w | 354 | | replication factor RFC3 homologue | 0.151
YBR088c | 258 | POL30 | proliferating cell nuclear antigen | 0.256
YBR091c | 109 | MRS5 | nuclear protein involved in mitochondrial intron splicing | 0.067
YBR092c | 467 | PHO3 | acidic phosphatase, constitutive | 0.353
YBR093c | 467 | PHO5 | acidic phosphatase, repressible | 0.460
YBR097w | 1454 | VPS15 | protein kinase, vacuolar transport | 0.134
YBR104w | 329 | [YMC2] | mitochondrial carrier protein | 0.119
YBR108w | 848 | | probable transcription factor | 0.106
YBR109c | 147 | CMD1 | calmodulin | 0.219
YBR110w | 449 | ALG1 | α-mannosyltransferase | 0.140
YBR111c | 231 | [YSA1] | homologue to Drosophila serendipity protein | 0.246
YBR112c | 966 | SSN6 | transcription regulatory protein | 0.161
YBR114w | 790 | RAD16 | radiation repair protein, putative DNA helicase | 0.162
YBR115c | 1392 | LYS2 | α-aminoadipate reductase | 0.212
YBR117c | 681 | TKL2 | transketolase 2 (EC 2.2.1.1) | 0.168
YBR118w | 458 | TEF2 | translational elongation factor α-1 | 0.875
YBR119w | 298i | MUD1 | U1 snRNP-specific A protein | 0.112
YBR120c | 162 | CBP6 | cytochrome b pre-mRNA processing protein 6 | 0.126
YBR121c | 667 | [GRS1] | probable glycyl-tRNA synthase | 0.413
YBR122c | 196 | MRPL36 | mitochondrial ribosomal protein YmL36 | 0.173
YBR123c | 649 | TFC1 | transcription factor TFIIIC, 95 kDa subunit | 0.135
YBR125c | 393 | | probable phosphoprotein phosphatase | 0.128
YBR126c | 495 | TPS1 | α,α-trehalose-phosphate synthase (CIF1) | 0.187
YBR127c | 517 | ATPvs | H+-transporting ATPase, vacuolar | 0.390
YBR132c | 596 | | probable amino acid transport protein | 0.142
YBR135w | 150 | CKS1 | CDC28 kinase complex, regulatory subunit | 0.143
YBR136w | 2368 | | probable phosphatidyl inositol kinase | 0.136
YBR139w | 508 | | probable serine-type carboxypeptidase | 0.150
YBR140c | 3092 | IRA1 | GTPase-activating protein of the RAS-cAMP pathway | 0.139
YBR142w | 773 | | probable DEAD box RNA helicase | 0.182
YBR143c | 437 | SUP1 | omnipotent suppressor protein of nonsense codons | 0.333
YBR145w | 351 | [ADH5] | alcohol dehydrogenase | 0.253
YBR146w | 278 | [MRPS9] | probable mitochondrial ribosomal protein S9 | 0.137


Table I. Continued

ORF | Size (aa)^a | Gene^b | Function | CAI
YBR149w | | | probable aldehyde reductase |
YBR150c | | | probable regulatory Zn-finger protein |
YBR153w | | [RIB7] | riboflavin biosynthetic protein |
YBR154c | | RPB5 | RNA polymerases I, II and III, 27 kDa subunit |
YBR160w | | CDC28 | cell division control protein |
YBR161w | | | SUR1 homologue |
YBR164c | | ARF3 | GTP binding ADP ribosylation factor 3 |
YBR166c | | TYR1 | prephenate dehydrogenase (NADP+) |
YBR169c | | SSE2 | heat-shock protein, 70 kDa |
YBR170c | | NPL4 | suppressor of SEC63, ER translocation component |
YBR171w | | HSS1 | ER translocation complex subunit SEC66 |
YBR172c | | SMY2 | kinesin-related protein suppressing myosin defects |
YBR175w | | | probable GTP binding protein |
YBR176w | | | probable 3-methyl-2-oxobutanoate hydroxymethyltransferase |
YBR177c | | | probable membrane receptor |
YBR179c | | | probable purine nucleotide binding protein |
YBR180w | | | probable drug resistance protein |
YBR181c | | RPS101 | ribosomal protein S6.e |
YBR182c | | | probable DNA binding transcription factor |
YBR186w | | | probable ATP binding protein |
YBR187w | | | probable membrane protein |
YBR189w | | SUP46 | suppressor, ribosomal protein S13 |
YBR191w | | URP1 | ribosomal protein L21e |
YBR192w | | RIM2 | probable carrier protein, mitochondrial |
YBR195c | | MSI1 | multicopy suppressor of IRA1, G-protein |
YBR196c | | PGI1 | phosphoglucose isomerase |
YBR198c | | | probable transcription-associated factor protein |
YBR199w | | KTR4 | α-1,2-mannosyltransferase homologue |
YBR200w | | BEM1 | bud emergence mediator |
YBR202w | | | MCM3 protein homologue |
YBR204c | | | probable serine-active lipase, peroxisomal |
YBR205w | | [KTR3] | KTR3 protein |
YBR207w | | | probable membrane protein |
YBR208c | | DUR1,2 | urea carboxylase |
YBR212w | | RBP1 | RNA binding protein, NGR1 |
YBR213w | | MET8 | effector of PAPS reductase and sulfite reductase |
YBR215w | | HPC2 | cell cycle regulatory protein |
YBR218c | | PYC2 | pyruvate carboxylase 2 |
YBR221c | | PDB1 | pyruvate dehydrogenase (lipoamide), β-chain |
YBR222c | | | probable AMP binding protein |
YBR227c | | | homologue to ATP binding protein clpX (E.coli) |
YBR229c | | | homologue to α-1,4-glucosidase |
YBR233w | | | homologue to human hnRNP complex K protein |
YBR236c | | ABD1 | protein with mutational synergism related to BEM1 |
YBR237w | | PRP5 | pre-mRNA processing protein, RNA helicase |
YBR239c | | | probable Zn-finger protein |
YBR240c | | | probable Zn-finger protein |
YBR241c | | | probable sugar transport protein |
YBR242w | | | probable ATP/GTP binding protein |
YBR243c | | TUR1 | UDP-N-acetylglucosamine-1-phosphate transferase |
YBR244w | | | probable glutathione peroxidase |
YBR245c | | | homologue to SNF2/SWI2 DNA binding regulatory protein |
YBR248c | | HIS7 | glutamine amidotransferase |
YBR249c | | ARO4 | 2-dehydro-3-deoxyphosphoheptanoate aldolase |
YBR251w | | [MRPS5] | probable mitochondrial ribosomal protein S5 |
YBR252w | | DUT1 | mitochondrial dUTP pyrophosphatase |
YBR254c | | | probable membrane protein |
YBR256c | | RIB5 | riboflavin synthase α-chain |
YBR263w | | [SHMT1] | serine hydroxymethyltransferase |
YBR264c | | | probable small GTP binding protein |
YBR265w | | | probable membrane protein |
YBR266c | | | probable membrane protein |
YBR267w | | | probable Zn-finger protein (C2H2 type) |
YBR268w | | MRPL37 | probable mitochondrial ribosomal protein L37 |
YBR270c | | | probable ATP/GTP binding protein |
YBR274w | | | probable protein kinase (cytokine receptor family) |
YBR275c | | RIF1 | RAP1-interacting regulatory protein |
YBR276c | | | probable tyrosine-specific protein phosphatase |
YBR278w | | DPB3 | DNA-directed DNA polymerase, chain C |
YBR281c | | | probable G-protein, β-transducin type |
YBR282w | | MRPL27 | mitochondrial ribosomal protein YmL27 |
YBR283c | | | probable SEC61 homologue |


Table I. Continued

ORF | Size (aa)^a | Gene^b | Function | CAI
YBR286w | 564 | | aminopeptidase Y | 0.331
YBR289w | 905 | SNF5 | general transcriptional activator | 0.119
YBR291c | 299 | | probable mitochondrial carrier protein | 0.148
YBR293w | 474 | | probable multidrug resistance protein | 0.087
YBR294w | 859 | | probable sulfate transport protein | 0.130
YBR295w | 1216 | PCA1 | P-type copper-transporting ATPase | 0.140
YBR296c | 574 | | homologue to phosphate-repressible phosphate permease | 0.254
YBR297w | 468 | MAL3R | maltose fermentation regulatory protein | 0.123
YBR298c | 614 | MAL3T | maltose permease | 0.104
YBR299w | 584 | MAL3S | maltase | 0.227

Detailed lists of all chromosome II ORFs (including GC content and CAI values), intron-containing genes, tRNA genes and proteins with putative membrane spans can be found in tables deposited together with the sequence data (see Acknowledgements).
^a 'i' indicates an intron-containing ORF; 't' indicates TYB protein produced with an internal +1 frameshift.
^b Suggested gene names are in parentheses.

be presented elsewhere (H.Feldmann et al., manuscript in preparation).
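The ORF bookkeeping reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it applies the 'questionable' criteria quoted from the chromosome XI analysis (fewer than 150 codons and CAI below 0.110) to a handful of made-up ORFs, and then recomputes the totals stated in the text. The `Orf` class, the function `is_questionable` and the example ORFs are hypothetical and are not taken from the original analysis pipeline; as the text notes, the criteria are a guideline rather than a strict rule.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Minimal ORF record (hypothetical, for illustration only)."""
    name: str
    codons: int
    cai: float


def is_questionable(orf: Orf, max_codons: int = 150, max_cai: float = 0.110) -> bool:
    """An ORF is flagged only if it is BOTH short and has a low CAI."""
    return orf.codons < max_codons and orf.cai < max_cai


# Invented example ORFs; none of these values come from Table I.
examples = [
    Orf("orfA", 120, 0.095),  # short and low CAI  -> questionable
    Orf("orfB", 120, 0.200),  # short, normal CAI  -> kept
    Orf("orfC", 900, 0.090),  # long, low CAI      -> kept
    Orf("orfD", 149, 0.109),  # just inside both thresholds -> questionable
]
flags = {orf.name: is_questionable(orf) for orf in examples}

# Counts stated in the text.
classes = {
    "known function": 124,
    "homologue of known function": 51,
    "homologue of ill-defined function": 52,
    "weak similarity": 33,
    "no informative homologue": 150,
}
total_orfs = sum(classes.values())            # 410
likely_real = total_orfs - 21 - 17            # overlapping + residual questionable
membrane_share = 142 / likely_real            # ORFs with predicted membrane spans

if __name__ == "__main__":
    print(flags)
    print(total_orfs, likely_real, f"{membrane_share:.1%}")
```

The five classes sum to the 410 predicted ORFs, the exclusions leave 372 candidate genes, and 142 of 372 corresponds to the roughly 38% quoted for putative transmembrane proteins.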

Table II. Related genes from chromosome II

Gene/ORF on chromosome II | Related gene/ORF on other chromosome^a | Functional description
HTA2 | HTA1 (4R) | histones H2A
HTB2 | HTB1 (4R) | histones H2B
HHT1 | HHT2 (4) | histones H3
HHF1 | HHF2 (4) | histones H4
PYC2 | PYC1 (7) | pyruvate carboxylases
TKL2 | TKL1 | transketolases
TEF2 | TEF1 (16R) | translational elongation factors α
YMC2 | YMC1 (16) | mitochondrial carrier proteins
MCM2 | MCM3 (5L) | transcription factors
IRA1 | IRA2 (15L) | regulators in the cAMP-RAS pathway
KIP1 | KIP2 (16L) | kinesin-related proteins
NTH2 | NTH1 (4) | trehalases
YBR078w | SPS2 | sporulation-specific proteins
YBR028c | YKR2 (13R) | protein kinases
YRO2 | YCR20c (3) | seven transmembrane proteins
RPS8B | RPS8A (5) | ribosomal proteins
RPS18B | RPS18A | ribosomal proteins

Gene/ORF on chromosome II | Related gene/ORF on chromosome II | Functional description
AAC2 | AAC3 | mitochondrial ADP/ATP translocators
CHS2 | CHS3 | chitin synthases
SCO1 | SCO2 | cytochrome oxidase assembly factors
MCM2 | YBR202w | probable transcription factors
KTR3 | YBR199w | probable mannosyltransferases
RAD16 | YBR073w | probable radiation repair proteins
YMC2 | YBR291c | probable mitochondrial carrier proteins
YBL088c | YBR136w | probable phosphatidyl inositol kinases
YBR041w | YBR222c | probable AMP binding proteins
YBR068c | YBR069c | probable amino acid transporters
YBR068c | YBR132c | probable amino acid transporters
YBR008c | YBR043c | probable multidrug resistance proteins
YBR008c | YBR293w | probable multidrug resistance proteins
YBL056w | YBR125c | probable phosphoprotein phosphatases

^a Where known, the chromosomal location is indicated in parentheses.

'Redundant' sequences in chromosome II

Several algorithms were used to analyse chromosome II for the occurrence of sequences demonstrating high similarity, both at the nucleotide and the amino acid levels (H.Feldmann et al., manuscript in preparation). The results not only confirm earlier notions (e.g. Dujon et al., 1994) that the degree of internal genetic redundancy in the yeast genome must be high, but also provide a more detailed picture of this phenomenon (Table II). First, in chromosome II we find quite a number of genes that are functionally well characterized and have highly homologous counterparts on other chromosomes. Surprisingly, a second category that we encountered is represented by a number of highly homologous genes on chromosome II itself. Several of these are functionally characterized, while for others only probable functions are predicted. Additionally, 20 of the chromosome II ORFs of unknown function have homologues among ORFs also of unknown function and lying on other systematically sequenced chromosomes or on chromosome II itself.

By applying the program PYTHIA (Milosavljevic and Jurka, 1993) to search for simple repeats, we detected at least 12 sets of regularly repeated trinucleotides along chromosome II (H.Feldmann et al., manuscript in preparation). Concomitant examination of the chromosome II ORFs revealed that these triplets represent repetitious codons for particular amino acids, such as asparagine, glutamine, arginine, aspartic acid, glutamic acid, proline and serine, thus forming homopeptide stretches. Searches in the databases show that there are numerous proteins containing homopeptides built from these amino acids, sometimes of considerable size, in yeast and other organisms. Although the role of such homopeptides is not well defined, it appears that they constitute specific domains enabling the respective proteins to fulfil specific functions.

Organization of the chromosome

The gene density in chromosome II is as high as found previously with chromosomes III and XI: ORFs occupy on average 71.9% of the sequence of chromosome II, excluding the ORFs contributed by the Ty elements. The


average ORF size is 475 codons (1425 bp). The mean sizes of inter-ORF regions are 647 bp for 'divergent promoters' and 414 bp for 'convergent terminators', while 'promoter-terminator combinations' are 662 bp in length on average. These values are similar to those reported for chromosome XI. The average base composition of chromosome II is 38.3% GC, a value close to that of chromosomes III (38.5%) and XI (38.1%). As expected, the coding regions have a higher GC content on average (39.6%) than the non-coding regions (35.1%). In sliding windows, coding regions may be discriminated from intergenic regions because 'transitions' in GC content are rather sharp at their borders (data not shown). An almost symmetrical distribution of dinucleotide frequencies over the entire chromosome is apparent (Figure 2A), whereas the base composition of ORFs shows a significant excess of homopurine pairs on the coding strand (Figure 2B). These data are also similar to those obtained for chromosome XI (Dujon et al., 1994).

Fig. 2. Compositional symmetry/asymmetry of chromosome II and its constituent elements. Relative deviations of dinucleotide frequencies [(observed - expected)/expected] are shown as vertical bars (expected frequencies are calculated from mononucleotide frequencies). Complementary dinucleotide pairs have been arranged in mirror image to help visualize compositional symmetry or asymmetry. Self-complementary dinucleotides are at the centre. (A) Data for the entire chromosome sequence, calculated from the Watson strand. (B) Data for ORFs only, calculated in each case from the coding strand.

Contrary to what has been observed in chromosomes III and XI, chromosome II shows a significant bias of coding capacity between the two strands (Table III). Whereas in the two other chromosomes the coding capacity is nearly symmetrical on the two strands, in chromosome II the coding capacity on the 'Crick' strand exceeds that of the 'Watson' strand by 33%. This bias remains virtually unchanged when the 'questionable' ORFs are excluded from the calculations. At present, the significance of this phenomenon is not known; more detailed analyses, e.g. of biased codon usage in the two strands from chromosome II and others, may give further clues. For the putative membrane proteins, the same asymmetrical distribution of ORFs is observed as for the rest of the ORFs. Remarkably, the 'membrane' ORFs appear to occur in clusters on chromosome II and occupy 46.5% of the total coding capacity.

Regional variations of base composition with similar amplitudes were noted along chromosomes III (Sharp and Lloyd, 1993) and XI (Dujon et al., 1994), with major GC-rich peaks in each arm. The analysis of chromosome XI revealed an almost regular periodicity of the GC content, with a succession of GC-rich and GC-poor segments of ~50 kb each; a further interesting observation was that the compositional periodicity correlated with local gene density. Profiles obtained from a similar analysis of chromosome II again show these phenomena (Figure 3). GC-poor peaks coinciding with relatively low gene densities are located at the centromere (around coordinate 230) and at both sides of the centromere with a periodicity of ~110 kb. These minima are more pronounced around coordinates 120, 340 and 560, while they are less so at coordinates 450 and 670. Remarkably, most of the tRNA genes reside in GC-poor 'valleys' and the Ty elements eventually became integrated into these regions. We have also analysed chromosome II for the occurrence of simple repeats, potential ARS elements and putative regulatory signals. Some of the results will be discussed below and a detailed evaluation will be presented elsewhere (H.Feldmann et al., manuscript in preparation).

Comparison of the physical and genetic maps
The genetic map of S.cerevisiae (Mortimer et al., 1992) assigned 92 genes or markers to chromosome II; 71 were located on a linear array and 21 remained unmapped. Figure 4 shows a comparison of this map with the physical map deduced from the complete sequence. In all, 42 of the mapped genes and 11 of the unmapped genes could be unambiguously assigned to an ORF or a tRNA gene of the present sequence on the basis of previous partial sequence data, use of probes or gene function; the assignment of four genes remains tentative. Thus, a total of 35 genes or markers remains unassigned on the physical map of chromosome II at present. These include several genes [pet9 (= AAC1); pdr7 (= pdr4); RNA14; rpc19] whose sequences are known but which do not appear in chromosome II of strain αS288C. This is also true for the MEL1, SUC3 and MGL2 genes. CDC25 had been mapped to chromosome II erroneously but has been located to chromosome XII (Johnson et al., 1987). Two suppressors, SUP87 and SUP72, may correspond to the tRNA genes found between coordinates ~320 and ~345 on chromosome II. The order of the genes positioned on chromosome II by genetic and physical mapping is largely the same, with some exceptions. No gross translocations or inversions on the genetic map, as found with chromosome XI (Dujon et al., 1994), were observed here.

Discussion
The network approach to systematic sequencing of the yeast genome started with chromosome III and has been


Chromosome          Watson strand                        Crick strand                         Ratio of coding   Notes
(length)            % coding  aa       ORFs  aa/ORF      % coding  aa       ORFs  aa/ORF      capacity (C/W)
II (807 188 bp)     30.3      81 525   177   475.6       40.5      108 929  204   534.0       1.336             overlapping ORFs and Tys excluded
XI (666 448 bp)     36.3      80 742   163   495.3       34.8      77 231         518.3       0.960             overlapping ORFs excluded
III (315 287 bp)    32.4      34 037   79    430.8       35.5      37 162   104   357.3       1.092             overlapping ORFs and Tys excluded
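The strand bias summarized in the table above can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the numbers are copied from the table, while the function and variable names are assumptions introduced here. It assumes that the ratio C/W is the Crick coding capacity divided by the Watson coding capacity (both in amino acids), and that the percentage of coding sequence on a strand is three nucleotides per codon divided by the chromosome length.

```python
# Coding capacity per strand in amino acids (aa) and chromosome length in bp,
# copied from the table above. All names are illustrative.
CAPACITY = {
    "II": {"watson_aa": 81_525, "crick_aa": 108_929, "length_bp": 807_188},
    "XI": {"watson_aa": 80_742, "crick_aa": 77_231, "length_bp": 666_448},
    "III": {"watson_aa": 34_037, "crick_aa": 37_162, "length_bp": 315_287},
}


def strand_ratio(watson_aa: int, crick_aa: int) -> float:
    """Crick/Watson ratio of coding capacity (C/W)."""
    return crick_aa / watson_aa


def percent_coding(capacity_aa: int, length_bp: int) -> float:
    """Percentage of the chromosome covered by ORFs of one strand (3 bp per codon)."""
    return 100.0 * 3 * capacity_aa / length_bp


if __name__ == "__main__":
    for name, row in CAPACITY.items():
        ratio = strand_ratio(row["watson_aa"], row["crick_aa"])
        watson_pct = percent_coding(row["watson_aa"], row["length_bp"])
        crick_pct = percent_coding(row["crick_aa"], row["length_bp"])
        print(f"{name}: C/W = {ratio:.3f}, Watson {watson_pct:.1f}%, Crick {crick_pct:.1f}%")
```

For chromosome II this reproduces the tabulated ratio of 1.336, in line with the statement that the Crick strand exceeds the Watson strand by about one third. Small differences from other printed values may reflect rounding or the exact exclusion rules used for overlapping ORFs.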

continued successfully with chromosomes XI and II. In the two latter cases, cosmid libraries and fine-resolution physical maps of the respective chromosomes from the same unique strain were first constructed to facilitate sequencing and assembly of the sequences. It should be noted that, by convention, in all laboratories engaged in sequencing the yeast genome, the strain αS288C, or isogenic derivatives thereof, were chosen as the source of DNA because they have been fairly well characterized and employed in many genetic analyses. For cosmid cloning of chromosome II DNA, we employed a vector which carries a yeast marker and therefore can be used in direct complementation experiments (Stucka and Feldmann, 1994). Furthermore, these cosmid clones turned out to be stable for many years under usual storage conditions. Like chromosome XI, the physical map of chromosome II has been constructed without reference to the genetic map and has been confirmed by the final sequence.

[Figure 3 plot: panels A and B drawn over chromosome coordinates 100-800; panel B is labelled 'Gene density (window = 30 000)'; CEN marks the centromere and Ty marks the Ty elements.]

Fig. 3. Compositional variation and gene density along chromosome II. (A) Compositional variation along chromosome II calculated as in Dujon et al. (1994). Each point represents the average GC composition calculated from the silent positions only of the codons of 15 consecutive ORFs. Similar slopes were obtained when the GC composition was calculated from the entire ORFs or from the inter-ORF regions, or when averages of 13-30 elements were plotted (results not shown). The location of perfect ARS consensus sequences is indicated by the rectangles; filled boxes, ARS patterns fulfilling criteria attributed to functional replication origins (see text). (B) Gene density along chromosome II. Gene density is expressed as the fraction of nucleotides within ORFs versus the total number of nucleotides in sliding windows of 30 kb (increments are 1 kb). Similar results were obtained for sliding windows of 20 or 50 kb. The arrows indicate the locations of tRNA genes; tRNA genes associated with complete Ty elements are marked by 'Ty'. The vertical lines have been introduced at a regular spacing of 110 kb, starting from the centromere (coordinate 230) and taking the most prominent troughs at coordinates 120 and 560 as references.

The comparison of the physical and genetic maps of chromosome II (Figure 4) shows that most of the linkages have been established to give the correct gene order; however, in many cases the relative distances derived from genetic mapping are rather imprecise. The obvious imprecisions of the genetic map may be due to the fact that different yeast strains have been used to establish the linkages. It is possible that some strains employed in genetic mapping experiments show inversions or translocations which then might contribute to discrepancies between physical and genetic maps, as considered in the case of chromosome XI. However, a more widespread phenomenon that may lead to imprecisions in the genetic maps is strain polymorphism caused by the Ty elements. Detailed information on strain differences resulting from Ty insertions and/or deletions is available for chromosome II, where we can compare the complete Ty patterns from strains αS288C and C836, and local patterns from two other strains, YNN13 and M1417-c (Stucka, 1992). In αS288C, a Ty2 element is associated with the tRNAPhe gene (coordinate ~24), while it is absent in C836 at this position; instead, a Ty2 has been inserted into a 'solo' δ sequence near the tRNALeu4 gene (coordinate ~3.6). The Ty1 element next to IPP1 (coordinate ~251) is missing in C836, whereas a Ty3 element is found at the equivalent position in YNN13. In C836, the tRNACys and tRNAGlu3 genes bracket a Ty1 element, which is absent at this location (coordinate ~638) in αS288C; in M1417-c, the Ty1 element and the τ sequence, the LTR of a Ty4 element, are missing. It may be noted that the sequences around the elements are well conserved among all these strains. Many more examples of this kind can be found in the literature. Altogether, this reveals a substantial plasticity of the yeast genome around tRNA gene loci which appear to be the preferred target sites for Ty transpositions (e.g. Hauber et al., 1988; Feldmann, 1988). Experimentally, this latter phenomenon has been proven for yeast chromosome III (Ji et al., 1993). Since these regions do not


[Figure 4, unmapped genes listed beneath the genetic map: DUT1, NOV1*, pho83*, SNF5, STA2*, tRNA(ser2), tRNA(asp), aar2, cna1 (= cmp1), fus3, mis1, pol30, reb1, rib1?, rib7?, RNA14*, rpb5, rpc19*, tec1, vps15]

Fig. 4. Comparison of the genetic and physical maps of yeast chromosome II. The genetic map (lower part; 71 mapped genes or markers) is redrawn from Genetic and Physical Maps of Saccharomyces cerevisiae (edition 11; Mortimer et al., 1992). The unmapped genes are listed beneath. The physical map (upper part) derived from the complete sequence of chromosome II has been drawn to the same scale. The circle indicates the position of the centromere. Genes or markers for which no ORF or RNA gene has been assigned on the physical map as yet are indicated by an asterisk; the assignation of genes marked by '?' is only tentative. Numerous other genes described in this work were not assigned previously to a chromosome (compare Figure 1 and Table I).
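The kind of comparison shown in Figure 4 (is the gene order on the genetic map the same as on the physical map?) can be illustrated with a small sketch. This is not the procedure used by the authors, which is not described at this level of detail. It is one possible way to flag markers whose relative order differs between two maps, by keeping the longest set of markers that appear in the same order on both and reporting the rest. The marker names and coordinates in the usage example are hypothetical, as are the function and parameter names.

```python
from bisect import bisect_left


def out_of_order_markers(genetic_order, physical_pos):
    """Return markers whose order on the genetic map disagrees with the physical map.

    genetic_order: marker names in the order given by the genetic map.
    physical_pos:  dict mapping marker name to its coordinate on the physical map.
    Markers absent from the physical map (unassigned) are ignored.
    """
    shared = [m for m in genetic_order if m in physical_pos]
    coords = [physical_pos[m] for m in shared]

    # Longest strictly increasing subsequence of physical coordinates
    # (patience sorting with back-pointers): these markers are collinear.
    tail_idx = []   # tail_idx[k]: index of the smallest tail of a run of length k + 1
    tail_val = []   # coordinates matching tail_idx, kept sorted for bisect
    prev = [None] * len(coords)
    for i, c in enumerate(coords):
        k = bisect_left(tail_val, c)
        if k == len(tail_idx):
            tail_idx.append(i)
            tail_val.append(c)
        else:
            tail_idx[k] = i
            tail_val[k] = c
        prev[i] = tail_idx[k - 1] if k > 0 else None

    collinear = set()
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        collinear.add(shared[i])
        i = prev[i]

    return [m for m in shared if m not in collinear]


if __name__ == "__main__":
    # Hypothetical markers and coordinates (kb), for illustration only.
    genetic = ["m1", "m2", "m3", "m4", "m5", "m6"]
    physical = {"m1": 10, "m2": 50, "m3": 30, "m4": 80, "m5": 120}  # m6 unassigned
    print(out_of_order_markers(genetic, physical))
```

In this toy input, markers m2 and m3 are swapped between the two maps, so one of them is reported; a gross inversion or translocation would instead show up as a long run of reported markers. The sketch assumes both maps are read in the same orientation.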

contain any special DNA sequences, the region-specific integration of the Ty elements may be due to specific interactions of the Ty integrase(s) with the transcriptional complexes formed over the intragenic promoter elements of the tRNA genes or triggered by positioned nucleosomes in the 5' flanking regions of the tRNA genes (Feldmann, 1988; Ji et al., 1993). In any case, the Ty integration machinery can detect regions of the genome that may represent 'safe havens' for insertion, thus guaranteeing survival of both the host and the retroelement.

About two thirds of the genes or markers mapped to chromosome II could be assigned to an ORF or an RNA gene on the basis of previous sequence data, the use of probes or gene function. At present, 35 genes or markers remain unassigned. Further assignments must await the correlation of our sequence data and new information that will become available in the literature. Three genes mapped on chromosome II, MEL1, SUC3 and MGL2, are absent from the strain αS288C. MEL and SUC genes, which are involved in carbohydrate metabolism, have been found previously as subtelomeric repeats in several yeast strains. The presence of multiple gene copies could be attributed to selective pressure induced by human domestication, but it appears that they are largely dispensable in laboratory strains (such as αS288C) which are no longer used in fermentation processes. A comparison at the molecular level of αS288C with brewer's yeast strain C836 clearly shows that the SUC genes are present on chromosome II of the latter strain (Stucka, 1992). Non-homologous recombination processes may account for the duplication of these and other genes residing in subtelomeric regions (Michels et al., 1992), reflecting the dynamic structure of yeast telomeres in general (Louis et al., 1994). Altogether, the experience gained from the yeast chromosomes sequenced so far shows that genetic maps provide valuable information but that in some cases they may be misleading. Therefore, independent physical mapping and eventual determination of the complete sequences are needed to unambiguously delineate all genes along chromosomes. At the same time, the differences found between various yeast strains demonstrate the need to use one particular strain as a reference system.

As observed in chromosome XI (Dujon et al., 1994), the compositional periodicity in chromosome II correlates with local gene density, as is the case in more complex genomes in which isochores of composition are, however, much larger (Bernardi, 1993). Although the fairly periodic variation of base composition is now evident for the three sequenced yeast chromosomes, its significance remains unclear. Several explanations for the compositional distribution and the location-dependent organization of individual genes have been offered (Bernardi, 1993; Dujon et al., 1994), some of which could be tested experimentally. For example, transcription mapping of a whole chromosome could give a clue as to whether such rules influence the expression of genes. Furthermore, long-range determination of DNase I-sensitive sites may be used to find a possible correlation between compositional periodicity and chromatin structure along a yeast chromosome. Similarly, knowledge of the sequence provides a basis to search for potential ARS elements, thus enabling functional replication origins to be sorted out experimentally. In Figure 3 we have listed the location of 36 ARS elements which completely conform to the 11 bp degenerate consensus sequence (Newlon, 1988; van Houten and Newlon, 1990). Several of these were found associated at their 3' extensions with imperfect (one to two mismatches) parallel and/or antiparallel ARS sequences or putative ABF1 binding sites, reminiscent of the elements reported to be critical for replication origins (Bell and Stillman, 1992; Marahrens and Stillman, 1992). Remarkably, these patterns are found within the GC valleys, suggesting that functional replication origins might preferably be located in AT-rich regions. A similar correlation is apparent from an analysis of chromosome XI (data not shown) and, more convincingly, when the distribution of functional replication origins mapped in 200 kb of chromosome III (Dershowitz and Newlon, 1993) is compared with the GC profiles of


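[Figure 5 drawing: terminal regions of chromosomes II (CHRII), XI (CHRXI) and III (CHRIII), with Ty5, segment b and the telomeres (tel) marked; insert labels: TG1-3 (~300), STR-A (37), STR-C (43), X-core (460), STR-B (62), STR-D (148); scale bar, 5 kb.]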

Fig. 5. Organization of telomeric regions. The 10-13 kb from each end of the sequences of chromosomes II, XI and III are represented by the mosaic boxes. Repetitious sequences of different types (a, ~800 bp; b, ~1 kb; c, four consecutive regions of ~1.1, 0.8, 3.0 and 2.0 kb, respectively) are indicated by the triangular segments within the boxes. The telomere regions (tel) are shown as black boxes. They conform to the consensus pattern described by Louis et al. (1994), consisting of a variable number of TG1-3 repeats, four types of subtelomeric repeats (STRs) and an X core segment (see insert, not drawn to scale). The locations of ORFs are indicated by arrows above ('Watson' strand) and below ('Crick' strand) each chromosome panel.

this chromosome (Sharp and Lloyd, 1993). The spacing of ~100-110 kb of the AT-rich regions is compelling, because this is also the observed spacing between active origins (for a review see Fangman and Brewer, 1992). Of course, functional ARS elements have yet to be defined for chromosomes II and XI, and also for the remainder of chromosome III. In this context, it would be interesting to see whether the putative origins of replication and the chromosomal centromeres in chromosomes II and XI might maintain specific interactions with the yeast nuclear scaffold (Amati and Gasser, 1988). It is not surprising that ARS elements possibly functioning as replication origins occur next to the histone genes in chromosome II (located at both sides of the centromere), but it is puzzling that the majority of the tRNA genes are flanked by such ARS elements. In all of the yeast chromosomes sequenced thus far, ARS elements located in the subtelomeric regions are closely associated with specific sites for origin binding factors (Eisenberg et al., 1988; Estes et al., 1992).

A comparison of the telomere regions of chromosome II with those of chromosomes III and XI (Figure 5) revealed the characteristic subtelomeric structures ('tel') found in all yeast chromosomes (Louis et al., 1994). As inferred from our mapping data and the detailed analysis of the yeast telomeres (Louis et al., 1994), chromosome II carries an additional 5.2 kb Y' element at its left end; because of its particular structure, this element from chromosome II could not be cloned as yet. There are two Y' classes, 5.2 and 6.7 kb in length, both of which include an ORF for a putative RNA helicase of as yet unknown function. Y's show a high degree of conservation but vary among different strains, as well as within a single strain, with respect to their presence (Louis and Haber, 1992; Louis et al., 1994). Experiments with the est1 (ever shortening telomeres) mutants, in which telomeric repeats are progressively lost, have shown that the senescence of these mutants can be rescued by a dramatic proliferation of Y' elements (Lundblad and Blackburn, 1993). Several additional functions have been suggested for these elements (for a review see Palladino and Gasser, 1994), such as extension of telomere-induced heterochromatin, protection of nearby unique sequences from its effects or a role in the positioning of chromosomes in the nucleus. Chromosome II might then offer an experimental system to address the functional significance of a particular Y' element.

A comparison of the termini of chromosome II with those of chromosomes III and XI revealed that our chromosome II sequence not only extends into genuine telomere regions but that these three chromosomes share extended similarities in their subtelomeric regions by the occurrence of repetitious sequences of different types. While segments b and c (Figure 5) represent interchromosomal subtelomeric duplications (Dujon et al., 1994), an ~800 bp sequence (Figure 5, segment a) is found as an inverted duplication near both termini of chromosome II. These duplicated regions contain ORFs, the putative products of which exhibit high similarity; but their functions remain unclear because no homologues of known function can be found in the databases.

A survey of previous sequence data and sequences obtained in the yeast sequencing programme suggests that there is a considerable degree of internal genetic redundancy in the yeast genome (Dujon et al., 1994). Whereas an estimate of sequence similarity (both at the nucleotide and the amino acid levels) becomes predictive at this stage, it still remains difficult to correlate these values to functional redundancy because only in a limited number of cases have gene functions been defined precisely. Classic examples of redundant genes in yeast are the MEL, SUC and MAL genes that are found in the subtelomeric regions of several chromosomes. There is also a great variety of internal genes that appear to have arisen from duplications, as suggested by the analyses of chromosomes II and XI. In chromosome II, this concerns ~16% of the total ORFs, while this figure is estimated to be only 4% in chromosome XI. However, in these and other cases available from the literature, sequence similarities at the nucleotide level are generally restricted to the coding regions and do not extend into the intergenic regions. Thus, the corresponding gene products share high similarity in terms of amino acid sequence or sometimes are even identical; they may be functionally redundant but their expression will depend on the nature of the regulatory

elements. This has been demonstrated experimentally in numerous examples, prominent cases being the PHO3 and PHO5 genes located next to each other on chromosome II. Biochemical studies also revealed that in particular cases 'redundant' proteins can substitute each other, thus accounting for the fact that a large portion of single gene disruptions in yeast do not impair growth or cause abnormal phenotypes. This does not imply, however, that these 'redundant' genes were a priori dispensable. Rather, they may be designed to help adapt yeast cells to particular environmental conditions. These notions are of practical importance when carrying out and interpreting gene disruption experiments.

The availability of the complete sequence of chromosome II not only provides further insight into genome organization and evolution in yeast, but extends the catalogue of novel genes detected in this organism. Of general interest may be those that are homologues to genes that perform differentiated functions in multicellular organisms (YBL088c and YBR136w, homologues to phosphatidyl inositol kinases; YBL056w and YBR125c, homologues to phosphoprotein phosphatases; YBR274w, homologue to cytokine family protein kinase; YBR108w, probable homologue to Drosophila mastermind) or that might be of relevance to malignancy (YBL024w, homologue to p120, major human antigen associated with malignant tumours; YBR008c, YBR043c and YBR293w, probable multidrug resistance proteins; YBR295w, P-type copper transporting ATPase, homologue to Menkes and Wilson disease gene). Although the role of these genes has still to be clarified, yeast may offer a useful experimental system to identify their function. On the other hand, the wealth of information to be expected when the yeast genome sequencing programme progresses clearly demands that new routes are explored to investigate the functions of novel genes.

Materials and methods

Strains, plasmids, vectors and general methods
The following yeast strains were employed: C836, a diploid brewers yeast; αS288C (YGSC); FY73 (MATα ura3-52 his3Δ200 GAL2) derived from the strain αS288C (Thierry and Dujon, 1992). FY73/a224-pAF101 and FY73/aI0011-pAF101 are transgenic strains derived from FY73 carrying the I-SceI site within the right and left arm telomeric regions of chromosome II, respectively. pYc3030, a cosmid shuttle vector carrying the 2µ plasmid origin of replication and HIS3 as a genetic marker (Hohn and Hinnen, 1980), was used for cosmid cloning throughout. Cosmids were propagated in Escherichia coli strains A490 and HB101. pAF101 is a plasmid carrying the URA3 marker and the I-SceI site (Thierry et al., 1990). pEL61, a vector derived from pGEM-3Zf(-) by the insertion of a (G1-3T)300 repeat sequence and carrying URA3 as a selective marker, was used for telomere cloning. Standard procedures were used in recombinant DNA techniques (Sambrook et al., 1989). Yeast transformation was carried out by the procedure of Ito et al. (1983).

Chromosome II DNA
Construction of cosmid libraries, restriction mapping and cosmid distribution. A set of overlapping cosmid clones containing chromosome II inserts and issued from a genomic library of yeast strain αS288C was used as the DNA material. Similar to procedures described earlier in the construction of a chromosome II-specific cosmid library from strain C836 (Hauber et al., 1988; Nelböck, 1988; Stucka, 1992), total DNA from αS288C was submitted to partial digestion with Sau3A, size-fractionated fragments cloned into the vector pYc3030, DNA samples packaged in vitro into lambda particles and E.coli A490 transfected with these. From a total of 200 000 clones, 3000 (about seven genome equivalents) were individually amplified and kept as an ordered cosmid library. DNA samples prepared from these clones were transferred to gridded filters and used for hybridizations (Stucka, 1992). A set of overlapping cosmid clones containing chromosome II inserts was established by (i) hybridizations of the ordered cosmid clones with chromosome II DNA; (ii) chromosomal walking and (iii) by using a collection of ~100 unique restriction fragments precisely mapped on C836 chromosome II as a reference library of 'sequenced tagged markers'. Restriction profiles were obtained for all clones by using at least the four restriction enzymes BamHI, SalI, XbaI and XhoI.

Right telomere region of chromosome II. pEL19B2, a plasmid containing the right telomere of chromosome II, was constructed following the procedure as described by Louis (1994). In brief, DNA from URA+ transformants of αS288C transformed with pEL61 was prepared for CHEF gel and Southern analysis. Transgenes that had integrated the vector by homologous recombination within the right telomere of chromosome II were identified by probing CHEF blots before and after diagnostic NotI restriction. The DNA from a right telomere integrant was digested with BamHI and ligated at low DNA concentration. This ligation was transformed into E.coli strain HB101 pyrF- using electroporation. One transformant, pEL19B2, carrying an ~14 kb insert from the right arm of chromosome II, was selected by diagnostic Southern hybridizations.

Telomere mapping
Physical mapping of the telomeres was performed using the I-SceI chromosome fragmentation procedure described by Thierry and Dujon (1992). Yeast strain FY73 and the 1.1 kb BamHI fragment from pAF101 (the 'pAF cassette' containing the URA3 gene and the I-SceI site; Thierry et al., 1990) were used. The cassette was engineered to be integrated into defined sites of the left and right terminal-most cosmids, respectively. DNA isolated from the transgenes obtained in this way was then analysed using I-SceI and a number of other appropriate restriction enzymes, resolved by pulsed-field gel electrophoresis and the lengths of the terminal-most restriction fragments determined by hybridization with diagnostic probes (H.Feldmann et al., manuscript in preparation).

Sequence assembly, sequence analysis and quality controls
Sequence assembly in the single contracting laboratories was performed by a variety of software program packages. Completed contigs submitted to the Martinsried Institute for Protein Sequences (MIPS) were stored in a data library and assembled using the GCG software package 7.2 for the VAX (Devereux et al., 1984). Special software developed for the VAX by Dr S.Liebl at MIPS was used to locate and translate ORFs (ORFEX and FINDORF), to retrieve noncoding intergenic sequences (ANTIORFEX) and to display various features of the sequence(s) on graphic devices (XCHROMO; an interactive graphics display program, version 2.0). The sequence has been interpreted using the following principles. (i) All intron splice site/branch-point pairs detected using specially defined patterns (Fondrat and Kalogeropoulos, 1994; K.Kleine and H.Feldmann, unpublished results) were listed. (ii) All ORFs containing at least 100 contiguous sense codons and not contained entirely in a longer ORF on either DNA strand were listed (this includes partially overlapping ORFs, indicated by asterisks in Figure 1). (iii) The two lists were merged and all intron splice site/branch-point pairs occurring inside an ORF but in opposite orientations were disregarded. (iv) Centromere and telomere regions, as well as tRNA genes and Ty elements or remnants thereof, were sought by comparison with a previously characterized dataset of such elements (K.Kleine and H.Feldmann, unpublished results) including the database entries provided in a tRNA/tRNA gene library (Steinberg et al., 1993; retrieved from the EMBL ftp server). All sequences submitted by collaborating laboratories to the MIPS data library were subjected to quality controls similar to those performed in the work on chromosome XI (Dujon et al., 1994). Sequence verifications were obtained from (i) the original overlaps between 33 contiguous segments (total of 40 037 bp); (ii) resequencing of selected segments (209 bp to 14.6 kb long; 2255 bp on average; total of 58 635 bp); and (iii) resequencing of suspected segments from designed oligonucleotide pairs (210-1530 bp long; 511 bp on average; total of 6646 bp).

Searches for similarity of proteins to entries in the databanks were performed by FastA (Pearson and Lipman, 1988), BlastX (Altschul et al., 1990) and FLASH (Califano and Rigoutsos, 1993), in combination with the Protein Sequence Database of PIR International (release 41) and other public databases. Protein signatures were detected using the PROSITE dictionary (release 11.1; Bairoch, 1989). ORFs were considered to be homologues or to have probable functions when the


alignments from FastA searches showed significant similarity and/or protein signatures were apparent; at this stage of analysis, FastA scores < 150 were considered insufficient to confidently assign function. Compositional analyses of the chromosome (base composition; nucleotide pattern frequencies, GC profiles; ORF distribution profiles, etc.) were performed using the X11 program package (C.Marck, unpublished results). For calculations of CAI and GC content of ORFs, the algorithm CODONS (Lloyd and Sharp, 1992) was used. Comparisons of chromosome II sequence with databank entries (EMBL databank, release 39; GenBank, release 83) were based on a new algorithm developed at MIPS by K.Heumann.

Acknowledgements

The Laboratory Consortium operating under contracts with the European Commission was initiated and organised by A.Goffeau. This study is part of the second phase of the European Yeast Genome Sequencing Project carried out under the administrative coordination of A.Vassarotti (DG-XII) and the Universite Catholique de Louvain, and under the scientific responsibility of H.Feldmann, as DNA coordinator, and H.W.Mewes, as Informatics coordinator. We thank P.Mordam for accounting; A.Thierry for supplying pAF101 and yeast strain FY73; E.Louis for preparing pEL19B2; P.Jordan and W.G.Peng for support in computing; K.Heumann for providing a new algorithm for database searches; our colleagues for help and discussions; and J.Svaren for comments on the manuscript. Datasets containing nucleic acid and protein sequences in different standard database formats, including annotations and detailed tables, will be available through anonymous ftp retrieval from the following computer nodes: ftp.ebi.ac.uk (/pub/databases/yeast-chrii); mips.embnet.org (/anonymous.yeast.chrII); genome-ftp.stanford.edu (/pub/yeast/genome-seq/chrII); and ncbi.nlm.nih.gov (/repository/yeast/chrII). This work was supported by the EU under the BRIDGE Programme; the Region Wallone, the Fonds National de la Recherche Scientifique and La Region de Bruxelles Capitale; The Bundesminister für Forschung und Technologie and the Fonds der Chemischen Industrie; The Ministere de l'Education Nationale and the Ministere de la Recherche et de l'Espace; and The Ministere de la Recherche et Technologie; The Jumelage Franco-Polonais du CNRS.

References

Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.
Amati,B.B. and Gasser,S.M. (1988) Cell, 54, 967-978.
Bairoch,A. (1989) In EMBL Biocomputing Technical Document 4. EMBL, Heidelberg, Germany.
Baur,A., Schaaff-Gerstenschläger,I., Boles,E., Miosga,T., Rose,M. and Zimmermann,F.K. (1993) Yeast, 9, 289-293.
Becam,A.-M. et al. (1994) Yeast, 10, S1-11.
Bell,S.P. and Stillman,B. (1992) Nature, 357, 128-134.
Bernardi,G. (1993) Gene, 135, 57-66.
Bussereau,F., Mallet,L., Gaillon,L. and Jacquet,M. (1993) Yeast, 9, 797-806.
Califano,A. and Rigoutsos,I. (1993) In Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology. Bethesda, MD, pp. 56-64.
Daniels,D.L., Plunkett,G., Burland,V. and Blattner,F.R. (1992) Science, 257, 771-778.
Delaveau,T., Jacq,C. and Perea,J. (1992) Yeast, 8, 761-768.
Delaveau,T., Delahodde,A., Carvajal,E., Subik,J. and Jacq,C. (1994) Mol. Gen. Genet., 243, 501-511.
Demolis,N., Mallet,L., Bussereau,F. and Jacquet,M. (1993) Yeast, 9, 645-659.
Demolis,N., Mallet,L. and Jacquet,M. (1994) Yeast, 10, in press.
Dershowitz,A. and Newlon,C.S. (1993) Mol. Cell. Biol., 13, 391-398.
Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12, 387-395.
De Wergifosse,P., Jacques,B., Jonniaux,J.-L., Purnelle,B., Skala,J. and Goffeau,A. (1994) Yeast, 10, in press.
Doignon,F., Biteau,N., Crouzet,M. and Aigle,M. (1993a) Yeast, 9, 189-199.
Doignon,F., Biteau,N., Aigle,M. and Crouzet,M. (1993b) Yeast, 9, 1131-1137.
Dujon,B. et al. (1994) Nature, 369, 371-378.
Eisenberg,S., Civalier,C. and Tye,B.K. (1988) Proc. Natl Acad. Sci. USA, 85, 743-746.
Estes,H.G., Robinson,B.S. and Eisenberg,S. (1992) Proc. Natl Acad. Sci. USA, 89, 11156-11160.
Fangman,W.L. and Brewer,B.J. (1992) Cell, 71, 363-366.
Feldmann,H. (1988) In Grunberg-Manago,M., Clark,B.F.C. and Zachau,H.G. (eds), Evolutionary Tinkering in Gene Expression. NATO ASI Series, Life Sciences A169, Plenum Press, New York, pp. 79-86.
Fondrat,C. and Kalogeropoulos,A. (1994) Curr. Genet., 25, 396-406.
Goffeau,A. (1994) Nature, 369, 101-102.
Goffeau,A., Nakai,K., Slonimski,P.P. and Risler,J.L. (1993a) FEBS Lett., 325, 112-117.
Goffeau,A., Slonimski,P.P., Nakai,K. and Risler,J.L. (1993b) Yeast, 9, 691-702.
Hauber,J., Stucka,R., Krieg,R. and Feldmann,H. (1988) Nucleic Acids Res., 16, 10623-10634.
Hohn,B. and Hinnen,A. (1980) In Setlow,J.K. and Hollaender,A. (eds), Genetic Engineering. Plenum Press, New York, pp. 169-183.
Holmstrøm,K., Brandt,T. and Kallesøe,T. (1994) Yeast, 10, 47-62.
Honore,N. et al. (1993) Mol. Microbiol., 7, 207-214.
Ito,H., Fukuda,Y. and Kimura,A. (1983) J. Bacteriol., 153, 163-168.
Ji,H., Moore,D.P., Blomberg,M.A., Braiterman,L.T., Voytas,D.F., Natsoulis,G. and Boeke,J.D. (1993) Cell, 73, 1007-1018.
Johnson,D.I., Jacobs,C.W., Pringle,J.R., Robinson,L.C., Carle,G.F. and Olson,M.V. (1987) Yeast, 3, 243-253.
Klein,P., Kanehisa,M. and DeLisi,C. (1985) Biochim. Biophys. Acta, 815, 468-476.
Kunst,F. and Devine,K. (1991) Res. Microbiol., 142, 905-912.
Lloyd,A.T. and Sharp,P.M. (1992) J. Hered., 83, 239-240.
Logghe,M., Molemans,F., Fiers,W. and Contreras,R. (1994) Yeast, 10, 1093-1100.
Louis,E.J. (1994) Yeast, 10, 271-274.
Louis,E.J. and Haber,J.E. (1992) Genetics, 119, 303-315.
Louis,E.J., Naumova,E.S., Lee,A., Naumov,G. and Haber,J.E. (1994) Genetics, 136, 789-802.
Lundblad,V. and Blackburn,E.H. (1993) Cell, 73, 347-360.
Marahrens,Y. and Stillman,B. (1992) Science, 255, 817-823.
Mallet,L., Bussereau,F. and Jacquet,M. (1994) Yeast, 10, 819-831.
Mannhaupt,G., Stucka,R., Ehnle,S., Vetter,I. and Feldmann,H. (1994) Yeast, 10, 1363-1381.
Meyerowitz,E.M. and Pruitt,R.E. (1985) Science, 229, 1214-1218.
Michels,C.A., Read,E., Nat,K. and Charron,M.J. (1992) Yeast, 8, 655-665.
Milosavljevic,A. and Jurka,J. (1993) CABIOS, 9, 409-411.
Miosga,T. and Zimmermann,F.K. (1993) Yeast, 9, 1273-1277.
Mortimer,R.K., Contopoulou,R. and King,J.S. (1992) Yeast, 8, 817-902.
Nasr,F., Becam,A.-M., Grzybowska,E., Zagulski,M., Slonimski,P.P. and Herbert,C.J. (1994a) Curr. Genet., 26, 1-7.
Nasr,F., Becam,A.-M., Slonimski,P.P. and Herbert,C.J. (1994b) C.R. Acad. Sci. Paris/Life Sci., 317, 607-613.
Nelböck,P. (1988) Ph.D. Thesis, University of Würzburg, Germany.
Newlon,C.S. (1988) Microbiol. Rev., 52, 568-601.
Oliver,S. et al. (1992) Nature, 357, 38-46.
Palladino,F. and Gasser,S.M. (1994) Curr. Opin. Cell Biol., 6, 373-379.
Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444-2448.
Ramezani Rad,M., Kirchrath,L. and Hollenberg,C.P. (1994) Yeast, 10, 1217-1225.
Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Schaaff-Gerstenschläger,I., Mannhaupt,G., Vetter,I., Zimmermann,F.K. and Feldmann,H. (1993a) Eur. J. Biochem., 217, 487-492.
Schaaff-Gerstenschläger,I., Baur,A., Boles,E. and Zimmermann,F.K. (1993b) Yeast, 9, 915-921.
Schaaff-Gerstenschläger,I., Schindwolf,T., Lehnert,T., Rose,M. and Zimmermann,F.K. (1994) Yeast, 10, in press.
Scherens,B., El Bakkoury,M., Vierendeels,F., Dubois,E. and Messenguy,F. (1993) Yeast, 9, 1355-1371.
Sharp,P.M. and Li,W.H. (1987) Nucleic Acids Res., 15, 1281-1295.
Sharp,P.M. and Lloyd,A.T. (1993) Nucleic Acids Res., 21, 179-183.
Skala,J., van Dyck,L., Purnelle,B. and Goffeau,A. (1992) Yeast, 8, 777-785.
Skala,J., van Dyck,L., Purnelle,B. and Goffeau,A. (1994) Yeast, 10, S13-24.
Smits,P., De Haan,M., Maat,C. and Grivell,L.A. (1994) Yeast, 10, S75-80.
Steinberg,S., Misch,A. and Sprinzl,M. (1993) Nucleic Acids Res., 21, 3011-3015.


Stucka,R. (1992) Ph.D. Thesis, Ludwig-Maximilians-Universität, Munich, Germany.
Stucka,R. and Feldmann,H. (1994) In Johnston,J. (ed.), Molecular Genetics of Yeast: A Practical Approach. Oxford University Press, Oxford, UK, pp. 49-64.
Thierry,A. and Dujon,B. (1992) Nucleic Acids Res., 20, 5625-5631.
Thierry,A., Fairhead,C. and Dujon,B. (1990) Yeast, 6, 521-534.
van der Aart,Q.J.M., Barthe,C., Doignon,F., Aigle,M., Crouzet,M. and Steensma,H.Y. (1994) Yeast, 10, 959-964.
Van Dyck,L., Purnelle,B., Skala,J. and Goffeau,A. (1992) Yeast, 8, 769-776.
Van Dyck,L., Pearce,D. and Sherman,F. (1993) J. Biol. Chem., 269, 238-242.
Van Dyck,L., Jonniaux,J.-L., de Melo Barreiros,T., Kleine,K. and Goffeau,A. (1994) Yeast, 10, in press.
Van Houten,J.V. and Newlon,C.S. (1990) Mol. Cell. Biol., 10, 3917-3925.
Vassarotti,A. and Goffeau,A. (1992) Trends Biotechnol., 10, 15-18.
Wilson,R. et al. (1994) Nature, 368, 32-38.
Wolfe,K.H. and Lohan,A.J.E. (1994) Yeast, 10, S41-46.
Zagulski,M., Becam,A.-M., Grzybowska,E., Lacroute,F., Migdalski,A., Slonimski,P.P., Sokolowska,B. and Herbert,C.J. (1994) Yeast, 10, 1227-1234.

Received on August 16, 1994; revised on September 21, 1994
