YEAST VOL. 12: 1085-1090 (1996)

Yeast Sequencing Reports

Sequencing and Analysis of a 35.4 kb Region on the Left Arm of Chromosome IV from Saccharomyces cerevisiae Reveal 23 Open Reading Frames

LIV GUNN EIDE, CHRIS SANDER AND HANS PRYDZ*
Biotechnology Centre of Oslo, PO Box 1125, Blindern, N-0316 Oslo, Norway

Received 15 March 1996; accepted 22 April 1996

The complete DNA sequence of cosmid clone 31A5, containing a 35 452 bp segment from the left arm of chromosome IV of Saccharomyces cerevisiae, was determined from an ordered set of subclones in combination with primer walking on the cosmid. The sequence contains 23 open reading frames (ORFs) of more than 100 residues and the tRNA-Val2a gene. Five ORFs corresponded to the known yeast genes SNQ2, SES1, GCV1, RPL2B and RPS18A. The DNA sequence for RPS18A is interrupted by an intron. One ORF corresponded to a part of the yeast gene HEX2 at the end of the cosmid insert. Four ORFs encoded putative proteins which showed strong homologies to other previously known proteins, three of yeast origin and one of non-yeast origin. Two ORFs were classified as having borderline homologies: one had similarity to two protein families and another to two protein products of unknown function from other species. The remaining 11 ORFs bore no significant similarity to any published protein. The complete DNA sequence has been submitted to the EMBL data library, Accession Number X95966.

KEY WORDS - Saccharomyces cerevisiae; chromosome IV; SNQ2; SES1; GCV1; RPL2B; HEX2/SRN1; RPS18A; tRNA-Val2a

*Corresponding author.

CCC 0749-503X/96/101085-06 © 1996 by John Wiley & Sons Ltd

INTRODUCTION

Within the framework of the European Union BIOTECH programme of sequencing the Saccharomyces cerevisiae genome, we have sequenced a 35 452 bp fragment (cosmid clone 31A5) on the left arm of chromosome IV. This fragment is located near the centromere, and is flanked by the yeast inserts from cosmid clones 31D12 (Urrestarazu) and 2M15 (Arnold).

MATERIALS AND METHODS

Cosmid, vectors, strain and oligonucleotides

Cosmid 31A5, containing a 35.5 kb insert obtained by a partial MboI digest of yeast chromosome IV DNA and cloned into the BamHI site of vector pWE15, was received from Dr C. Jacq (Laboratoire de Génétique Moléculaire, Ecole Normale Supérieure, Paris). Subcloning was performed in pUC18 or pBluescript vectors. Escherichia coli strain DH5α was used to obtain all subclones and deletion clones. Synthetic oligodeoxynucleotide primers were made at the Biotechnology Centre of Oslo (Dr E. Babaie).

Sequencing strategies and methods

An ordered sequencing strategy was used to determine the sequence of 31A5. The cosmid DNA 31A5 was digested with EcoRI and the fragments were separated on an agarose gel. Eight restriction fragments in the range of 200-7400 bp were isolated and subcloned. Another three smaller fragments (67-147 bp) were detected by sequencing.

Restriction maps of the subclones were generated to construct second-order restriction clones. All clones were sequenced on both strands with universal and/or reverse primers in combination with primer walking.

The sequences of the remaining first-order EcoRI fragments, shorter than 200 bp or longer than 7400 bp, were obtained by using a primer walking strategy on the cosmid. To confirm the order of the EcoRI fragments, primers were designed and sequences obtained with the entire cosmid as a template. DNA for sequence reactions was prepared using Tip20 (Qiagen) or by CsCl-gradient purification. Double-stranded dideoxy DNA sequencing was carried out using the Autoread Kit (Pharmacia) with simplified denaturation (Zimmerman et al., 1990) and T7 DNA polymerase, and with the modifications necessary for using internal labelling with fluorescein-15-dATP (Voss et al., 1992). For the direct sequencing on cosmid DNA, we modified the protocol further by using only 5 µg template, 40 pmol primer, denaturing at 85°C and increasing the amount of T7 DNA polymerase to 10 U per reaction.

Walking primers were designed so that they either ended with a dATP or the first nucleotide to be incorporated in the sequencing reactions was a (fluorescent) dATP (Wiemann et al., 1995).

Hardware and software

DNA sequencing was carried out using a standard automated ALF DNA sequencer (Pharmacia). Raw data collection and evaluation were performed with the ALF Manager software (Pharmacia). Fragment assembly and primer design were done with the Geneskipper program (C. Schwager, EMBL). Open reading frames (ORFs), codon preferences and the codon usage profile were calculated with Geneskipper and the UWGCG program package in parallel.

RESULTS AND DISCUSSION

Sequence determination

The final sequence of 35 452 bp was determined by a total number of 266 overlapping fragments and a redundancy of 3.0. A total of 175 walking primers were designed, and the average reading length was 406 bases.

Sequence analysis

Analysis of the sequence revealed 23 ORFs of more than 100 amino acid residues and a gene for tRNA-Val2a. The ORFs were distributed in all six reading frames and also evenly over the whole cosmid, except for a 2 kb fragment where no ORF was detected. All ORFs were provisionally named by MIPS: pz, followed by a letter for the frame and the number of amino acids deduced from the sequence (Table 1). Sequences of putative genes were then aligned with updated protein databases. The deduced amino acid sequences were also analysed using the GeneQuiz automated sequence search program (Casari et al., 1995). The codon preference values (Table 1) were calculated using a yeast codon usage table made out of 435 genes from S. cerevisiae found in GenBank 63 (produced by J. Michael Cherry with the GCG program CodonFrequency). The lower the value, the greater the probability that the ORF represents a functional gene. Table 1, in addition, shows the best optimized FASTA scores and the corresponding homologous proteins.

Open reading frames

Five of the ORFs encoded genes identical or highly similar to known yeast genes. In addition, one ORF encoded the end of a known yeast gene. Four ORFs encoded putative proteins which showed strong homologies to other known proteins, three of yeast origin and one of non-yeast origin (Table 1). Two ORFs were defined to be borderline cases with FASTA scores of 201 and 169. No significant similarities were observed for the other 11 ORFs. They showed optimized FASTA scores of at most 125 and less than 29% identity over a maximum of 174 amino acid residues to known proteins.

Five ORFs encoded yeast genes already known. ORF pzA462 encoded S. cerevisiae seryl-tRNA synthetase, Ses1p (Weygand-Durasevic et al., 1987). The ATP-dependent permease conferring hyper-resistance to certain chemicals, Snq2p (Servos et al., 1993), was encoded by ORF pzB1501. ORF pzD400 encoded the S. cerevisiae glycine cleavage T protein, Gcv1p (McNeil et al., AC L41522). Our cosmid sequence for the above three genes differed in each gene by one single nucleotide from that already published. These differences also caused differences in the amino acid sequence (Table 2). The calculated codon preferences for ORFs pzB362 and pzA156i gave rather

Table 1. Analysis of ORFs in cosmid 31A5. Provisional names of ORFs assigned by MIPS. Codon preference values calculated from yeast codon usage table based on 435 genes from S. cerevisiae in GenBank 63. The deduced amino acid sequences of the ORFs were compared with available data in relevant libraries. Two cases of borderline similarity are in brackets.

No.  ORF      Codon pref.  FASTA  Identity  Overlap  Homologous protein                                  Organism
     (MIPS)                score  (%)       (aa)
1    pzA520   0.91         2082   74        517      Galactokinase, Gal1p                                S. cerevisiae
2    pzA208   1.69         201    27        190      ['KIAA0186 protein', unknown function]              H. sapiens
3    pzA114   2.88         77     25        80       No homology observed
4    pzA462   0.7          2328   99        462      Seryl-tRNA synthetase, Ses1p                        S. cerevisiae
5    pzA161   2.43         72     26        91       No homology observed
6    pzA109   4.16         83     25        98       No homology observed
7    pzA105   2.8          84     25        94       No homology observed
8    pzB1501  0.33         7593   99        1501     Snq2p protein                                       S. cerevisiae
9    pzB362   4.26         1643   99        361      60S ribosomal protein, Rpl2Bp                       S. cerevisiae
10   pzB647   1.81         93     25        94       No homology observed
11   pzC399   1.13         1313   62        392      Initiation factor 4A-3                              N. plumbaginifolia
12   pzC109   3.43         70     21        44       No homology observed
13   pzF240   2.53         1001   100       240      End of Hex2p protein                                S. cerevisiae
14   pzF889   1.45         102    18        117      No homology observed
15   pzF396   1.19         811    41        362      45.5 kDa protein in ATP3-RPS18b intergenic region   S. cerevisiae
16   pzF1050  1.24         125    21        174      No homology observed
17   pzF129   3.13         95     29        86       No homology observed
18   pzD196   3.7          85     27        82       No homology observed
19   pzD400   1.45         1960   99        400      Glycine cleavage T protein, Gcv1p                   S. cerevisiae
20   pzE570   1.47         650    33        389      DNA binding protein, Reb1p                          S. cerevisiae
21   pzE232   2.74         169    25        171      [Uridine kinase, Urk1p]                             S. cerevisiae
22   pzE110   4.63         84     26        73       No homology observed
23i  pzA156i  4.54         761    100       156      40S ribosomal protein, Rps18Ap                      S. cerevisiae
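The ORF criterion behind Table 1 (more than 100 amino acid residues, searched in all six reading frames) can be illustrated with a minimal sketch. This is not the Geneskipper or UWGCG procedure used in this work, whose details are not given here. The sketch assumes that an ORF runs from the first ATG after a stop codon to the next in-frame stop codon of the standard genetic code, and the function names are hypothetical. Intron-containing genes such as the one encoded by pzA156i would not be recovered by such a scan.

```python
# Illustrative six-frame ORF scan. Assumptions (not taken from the paper):
# ORFs start at ATG and end at the next in-frame stop of the standard code.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_residues):
    """Yield (start, end) half-open spans of ATG...stop ORFs in one frame.

    Only ORFs encoding MORE than min_residues amino acids are reported;
    the stop codon is included in the span but not in the residue count.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 > min_residues:
                yield start, i + 3
            start = None


def six_frame_orfs(seq, min_residues=100):
    """Return (strand, frame_offset, start, end, residues) for each ORF.

    Coordinates on the '-' strand refer to the reverse-complemented string.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_residues):
                residues = (end - start) // 3 - 1  # exclude the stop codon
                hits.append((strand, offset, start, end, residues))
    return hits
```

Applied to the 35 452 bp cosmid sequence (EMBL X95966), such a scan would return candidate ORFs on both strands. The resulting list need not coincide with Table 1, since the start-codon rule and the handling of overlapping ORFs are assumptions of the sketch.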

Table 2. Comparison of present and previously published DNA sequences of highly homologous genes. Numbering in square brackets is based on complete cDNAs and their deduced amino acid sequences.

ORF      Gene  Nucleotide difference  Amino acid difference
               (31A5/published)       (31A5/published)
pzA462   SES1  [671] T/C              [224] Leu/Pro
pzB1501  SNQ2  [233] T/A              [78] Val/Glu
pzD400   GCV1  [366] C/G              [122] Asp/Glu

high values (4.26 and 4.54), which normally might suggest non-functional genes; both these ORFs encoded ribosomal proteins. ORFs pzB362 and pzA156i coded for the S. cerevisiae ribosomal proteins Rpl2Bp (Presutti et al., 1988) and Rps18Ap (Folley and Fox, 1994), respectively. These genes have both been described previously in two versions. The latter was interrupted by an intron as described by Folley and Fox (1994). The ORF pzA156i on chromosome IV is identical to the one designated RPS18A, whereas RPS18B is located on yeast chromosome II. RPS18A differed from RPS18B by 21 silent mutations and by having an intron of 339 bp instead of 511 bp.

The designation of the A- and B-version of gene RPL2 varies among sequences submitted to data libraries. We suggest that the designation proposed by Presutti et al. (1988) should be used. The ORF pzB362 sequence was highly homologous to the partial sequence of RPL2B (Lucioli et al., 1988). The RPL2 gene on yeast chromosome II is then RPL2A (Smits et al., 1994). The RPL2B gene differs from the RPL2A gene by 11 silent mutations and by one (G1066A) which results in an amino acid difference (Ala356Thr).

A further ORF, pzF240, at the end of the cosmid insert corresponded to one-quarter of the S. cerevisiae HEX2/REG1/SRN1 (Niederacher and Entian, 1991), whose product is a negative regulatory element in glucose repression. HEX2 mutants are deficient in glucose repression, and defective in RNA processing (SRN1), two very different phenotypes given by the same gene.

Nucleotides 25690-25763 were identical to those coding for the tRNA-Val2a previously described on chromosome II (Andre et al., AC Z35914), carrying the anticodon sequence TAC. It is interesting to note that this valine codon, apparently rarely used in yeast (Sharp and Li, 1987), has at least two identical genes.

Four ORFs had deduced amino acid sequences which showed clearly significant similarities to the products of other genes. The ORF pzA520 amino acid sequence had 74.2% identity over 517 amino acids with S. cerevisiae galactokinase, Gal1p (Smits et al., 1994) and may encode another galactokinase. The ORF pzC399 had 62.0% identity over 392 amino acids with the eukaryotic initiation factor 4A-3 in Nicotiana plumbaginifolia (Owttrim et al., 1991) and showed high similarity to several other proteins in this eukaryotic initiation factor family. The codon preference value for this ORF is 1.13. The ORF pzE570 amino acid sequence was 39.7% identical over its full length with a DNA binding protein Reb1p from Kluyveromyces lactis (Morrow et al., 1993) and 33.4% identical over 389 amino acids with the corresponding protein from S. cerevisiae (Ju et al., 1990) located on chromosome II. These proteins have two areas with high similarity to Myb, but there is not much Myb similarity in pzE570. The ORF pzF396 amino acid sequence had 41.7% identity over 362 amino acids with the S. cerevisiae putative 45.5 kDa protein in the ATP3-RPS18b intergenic region (Andre et al., AC P38226). The closest similarity to any protein of known function for this ORF was with 1-acyl-glycerol-3-phosphate acyltransferase from Zea mays (Brown et al., 1994), with an identity of 36% over 76 amino acid residues and an optimized FASTA score of 173. The codon preference value was 1.19. The sequence revealed a hydrophobic segment corresponding to residues 27-54, which may indicate a membrane-spanning domain.

Two ORFs with similarities of borderline significance were observed. In spite of relatively low sequence similarity between the deduced amino acid sequence of ORF pzE232 and the S. cerevisiae uridine kinase, Urk1p (Kern, 1990), the pattern conserved between mouse, E. coli and yeast uridine kinases is conserved also in the ORF sequence (Figure 1). Phosphoribulokinases appear to have the same conserved functional pattern as the uridine kinases (Figure 1). These observations may define an enzymatic family, probably of similar enzymatic mechanism and 3D structure, of which pzE232 most likely is a member, although the identity to Synechocystis sp. phosphoribulokinase (Su and Bogorad, 1991) was only 16% over 186 amino acids. In the other borderline case, an interesting homology, suggesting an undescribed protein family, was observed. The deduced sequence from the S. cerevisiae ORF pzA208 showed similar homology to two 'protein products', of different origin, with unknown function. An identity of 27% over 190 amino acid residues is shared with Homo sapiens 'KIAA0186 protein' (Nagase et al., AC D80008) and an identity of 26% over 208 amino acid residues is shared with Caenorhabditis elegans 'R53.6 product' (Wilkinson, AC Z66515). In addition, a fragment of this protein was sequenced by the Institute for Genomic Research as an Expressed Sequence Tag (EST) from S. cerevisiae (Weinstock, AC T38583), showing that pzA208 is indeed expressed. The EST shared an identity of 98% over 51 amino acid residues with the deduced sequence of ORF pzA208. The alignment of the four sequences showed strong conserved pattern homology (Figure 2). The relevance of these borderline cases remains to be confirmed experimentally.

Of the remaining ORFs, pzA114, pzA109, pzA105 and pzC109 overlapped with ORFs on the opposite strand and were very short (105-114 amino acids). None of these four ORFs showed significant similarities with any protein in the searched databases and their codon preference values were generally high. They are most likely not functional genes. Seven ORFs remain for further analysis in the future.

The part of chromosome IV reported here does not deviate significantly from previously described cosmids (Dujon et al., 1994). It most likely contains 18 genes (not counting the HEX2 fragment), i.e. one gene per 1.9 kb. ORF lengths vary between 110 and 1501 amino acids; one gene out of 18 has an intron. The base composition of the cosmid was A 30.5%, C 19.2%, G 18.5% and T 31.8%. Our analysis reveals two ribosomal protein genes on chromosome IV which are close homologues of genes on chromosome II, suggesting a fairly recent duplication. There was no evidence of sequence

Figure 1. Alignment of the deduced amino acid sequence of pzE232 with uridine kinases from yeast (urk1), E. coli (urk) and mouse (mmurki), and the Synechocystis sp. phosphoribulokinase (kppr-syny3).

Figure 2. Alignment of the deduced amino acid sequence of pzA208 with a C. elegans protein product (cer53.6), a human protein (kiaa0186), and a yeast EST (est4e).
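The identity values quoted for such alignments (for example, 27% over 190 amino acid residues) can be illustrated with a minimal sketch. It assumes that identity is counted over aligned columns in which neither sequence has a gap; the FASTA program used in this work may define the overlap differently. The function name and the gap characters are hypothetical.

```python
# Illustrative percent-identity calculation for two rows of an alignment.
# Assumption (not from the paper): columns with a gap in either row are
# excluded from the overlap; '-' and '.' are both treated as gap symbols.
def percent_identity(aligned_a, aligned_b, gap_chars="-."):
    """Return (percent identity, overlap length) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    overlap = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in gap_chars or b in gap_chars:
            continue  # gapped column: not part of the overlap
        overlap += 1
        if a.upper() == b.upper():
            identical += 1
    if overlap == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / overlap, overlap
```

Reporting the overlap length together with the percentage matters for the borderline cases discussed in the text, where a moderate percentage over a long overlap carries more weight than the same percentage over a short one.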

homology beyond the limits of the genes, neither when the flanking sequences of RPL2A and B were compared, nor when RPS18A and B were compared.

ACKNOWLEDGEMENT

The assistance of Dr Marianne Wright, Jorun Solheim and Janne Røe in establishing some subclones is gratefully acknowledged. This work was supported by grants from the European Union (contract no. BIO2-CT94-2071) and the Research Council of Norway to H.P.

REFERENCES

Andre, B., Cziepluch, C., Hein, C., Jauniaux, J. C., Urrestarazu, A. and Vissers, S. EMBL AC Z35914.

Andre, B., Cziepluch, C., Hein, C., Jauniaux, J. C., Urrestarazu, A. and Vissers, S. SwissProt AC P38226.

Brown, A. P., Coleman, J., Tommey, A. M., Watson, M. D. and Slabas, A. R. (1994). Isolation and characterization of a maize cDNA that complements a 1-acyl sn-glycerol-3-phosphate acyltransferase mutant of E. coli and encodes a protein which has similarities to other acyltransferases. Plant Mol. Biol. 26, 211-223.

Casari, G., Andrade, M., Bork, P., et al. (1995). Challenging times for bioinformatics. Nature 376, 647-648.

Dujon, B., Alexandraki, D., Andre, B., et al. (1994). Complete DNA sequence of yeast chromosome XI. Nature 369, 371-378.

Folley, L. S. and Fox, T. D. (1994). Reduced dosage of genes encoding ribosomal protein S18 suppresses a mitochondrial initiation codon mutation in Saccharomyces cerevisiae. Genetics 137, 369-379.

Ju, Q., Morrow, B. E. and Warner, J. R. (1990). REB1, a yeast DNA-binding protein with many targets, is essential for growth and bears some resemblance to the oncogene myb. Mol. Cell. Biol. 10, 5226-5234.

Kern, L. (1990). The URK1 gene of Saccharomyces cerevisiae encoding uridine kinase. Nucl. Acids Res. 18, 5279.

Lucioli, A., Presutti, C., Ciafre, S., Caffarelli, E., Fragapane, P. and Bozzoni, I. (1988). Gene dosage alteration of L2 ribosomal protein genes in Saccharomyces cerevisiae: effects on ribosome synthesis. Mol. Cell. Biol. 8, 4792-4798.

McNeil, J. B., Zhang, F. R., Taylor, B. V., Pearlman, R. E. and Bognar, A. L. EMBL AC L41522.

Morrow, B. E., Ju, Q. and Warner, J. R. (1993). A bipartite DNA-binding domain in yeast Reb1p. Mol. Cell. Biol. 13, 1173-1182.

Nagase, T., Seki, N., Tanaka, A., Ishikawa, K. I. and Nomura, N. Prediction of the coding sequences of unidentified human genes. Translated EMBL DataBank AC D80008.

Niederacher, D. and Entian, K.-D. (1991). Characterization of Hex2 protein, a negative regulatory element necessary for glucose repression in yeast. Eur. J. Biochem. 200, 311-319.

Owttrim, G. W., Hofmann, S. and Kuhlemeier, C. (1991). Divergent genes for translation initiation factor eIF-4A are coordinately expressed in tobacco. Nucl. Acids Res. 19, 5491-5496.

Presutti, C., Lucioli, A. and Bozzoni, I. (1988). Ribosomal protein L2 in Saccharomyces cerevisiae is homologous to ribosomal protein L1 in Xenopus laevis. Isolation and characterization of the genes. J. Biol. Chem. 263, 6188-6192.

Servos, J., Haase, E. and Brendel, M. (1993). Gene SNQ2 of Saccharomyces cerevisiae, which confers resistance to 4-nitroquinoline-N-oxide and other chemicals, encodes a 169 kDa protein homologous to ATP-dependent permeases. Mol. Gen. Genet. 236, 214-218.

Sharp, P. M. and Li, W.-H. (1987). The codon adaptation index - a measure of directional synonymous codon bias, and its potential applications. Nucl. Acids Res. 15, 1281-1295.

Smits, P. H. M., De Haan, M., Maat, C. and Grivell, L. A. (1994). The complete sequence of a 33 kb fragment on the right arm of chromosome II from Saccharomyces cerevisiae reveals 16 open reading frames, including ten new open reading frames, five previously identified genes and a homologue of the SCO1 gene. Yeast 10(Suppl. A), S75-S80.

Su, X. and Bogorad, L. (1991). A residue substitution in phosphoribulokinase of Synechocystis PCC 6803 renders the mutant light-sensitive. J. Biol. Chem. 266, 23698-23705.

Voss, H., Wiemann, S., Wirkner, U., et al. (1992). Automated DNA sequencing system resolving 1,000 bases with fluorescein-15-dATP as internal label. Meth. Mol. Cell. Biol. 3, 153-155.

Weinstock, K. GenBank AC T38583.

Weygand-Durasevic, I., Johnson-Burke, D. and Soell, D. (1987). Cloning and characterization of the gene coding for cytoplasmic seryl-tRNA synthetase from Saccharomyces cerevisiae. Nucl. Acids Res. 15, 1887-1904.

Wiemann, S., Rupp, T., Zimmerman, J., Voss, H., Schwager, C. and Ansorge, W. (1995). Primer design for automated DNA sequencing utilizing T7 DNA polymerase and internal labeling with fluorescein-15-dATP. BioTechniques 18, 688-697.

Wilkinson, J. Translated EMBL DataBank AC Z66515.

Zimmerman, J., Voss, H., Schwager, C., et al. (1990). A simplified protocol for fast plasmid DNA sequencing. Nucl. Acids Res. 18, 1067.