
Proc. Natl. Acad. Sci. USA
Vol. 78, No. 3, pp. 1441-1445, March 1981
Biochemistry

Nucleotide sequence of the thymidine kinase gene of herpes simplex virus type 1
(intervening sequences/transcriptional control elements/codon usage)

MICHAEL J. WAGNER, JANICE A. SHARP, AND WILLIAM C. SUMMERS

Departments of Therapeutic Radiology, Molecular Biophysics and Biochemistry, and Human Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, Connecticut 06510

Communicated by Edward A. Adelberg, November 17, 1980

ABSTRACT We have determined the complete sequence of the thymidine kinase (ATP:thymidine 5' phosphotransferase, EC 2.7.1.21) gene of herpes simplex virus type 1 strain CL101 from a plasmid clone of viral DNA derived by Enquist et al. [Enquist, L. W., Vande Woude, G. F., Wagner, M., Smiley, J. R. & Summers, W. C. (1979) Gene 7, 335-342]. A cDNA copy of the 5' end of thymidine kinase mRNA was also analyzed to locate the transcribed sequences. The transcribed portion of the gene is approximately 1300 nucleotides in length and appears to contain no intervening sequences. There is an untranslated region of 107 nucleotides at the 5' end of the mRNA followed by an open reading frame of 1128 nucleotides. The gene is thus capable of coding for a protein of 376 amino acids. Sequences similar to those thought to be involved in control of transcription and translation of a variety of eukaryotic and viral genes, such as a "Hogness box" and A-A-T-A-A-A polyadenylylation signals, are also present in the herpesvirus thymidine kinase gene.

Cells infected with herpes simplex virus (HSV) type 1 express a novel thymidine kinase (TK; ATP:thymidine 5' phosphotransferase, EC 2.7.1.21) activity that has been shown to be encoded by the virus (1-3). The TK gene is an early gene of HSV. In a lytic infection, expression of at least one immediate-early gene of the virus is necessary before expression of the TK gene can occur, and some product of the late genes of the virus (those expressed after the onset of viral DNA synthesis) results in the shutoff of TK gene expression (4, 5). This regulatory effect is believed to occur at the transcriptional level (4, 5).

Viral TK also can be expressed in TK- cells which have been "biochemically transformed" to a TK+ phenotype by infection with UV-inactivated virus (6, 7) or viral DNA fragments (8). The TK gene is heritably and stably incorporated into the genome of the cell and is expressed in the apparent absence of the immediate-early viral gene products. However, some aspects of the normal viral regulatory circuits are maintained in these cells, as indicated by the ability to obtain higher levels of TK expression upon superinfection of biochemically transformed cells with TK- HSV (9, 10).

The availability of a cloned TK gene (11-13), the ease with which TK mutants and revertants can be selected (3), the ability to perform in vitro mutagenesis on the cloned TK gene and to rescue such mutants into virus (14), and the ability to detect TK activity and the TK polypeptide synthesized either in vivo or in vitro (3, 15) make the TK gene a good subject for the study of the regulation of gene expression at the molecular level, both in the infected cell and in the biochemically transformed cell. As a step toward this goal, we have determined the complete nucleotide sequence of the TK gene of HSV-1 strain CL101.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Abbreviations: TK, thymidine kinase; HSV, herpes simplex virus.

MATERIALS AND METHODS

Plasmid DNA Preparation. Manipulations of recombinant DNA were carried out under conditions prescribed by the National Institutes of Health guidelines. Plasmid pX1, which contains the HSV-1 CL101 TK gene, was maintained in the Escherichia coli strain LE392 (11). Plasmid DNA was prepared from chloramphenicol-treated cells by the cleared lysate technique (16) and purified by centrifugation through a 5-20% sucrose gradient.

Cloning in M13mp2. EcoRI fragments of plasmid pX1 were inserted into the EcoRI site of the single-stranded DNA bacteriophage cloning vector M13mp2 (17). Phage DNA was prepared from 1-liter cultures of phage-infected E. coli cells. Phage were precipitated from the culture supernatant with 4% polyethylene glycol and then banded to equilibrium in CsCl gradients. The purified phage were extracted with phenol, and the DNA was purified by sedimentation through 5-20% sucrose gradients.

Preparation of Restriction Endonuclease Fragments. Restriction endonucleases were purchased from New England BioLabs or Bethesda Research Laboratories (Rockville, MD) or were prepared in our laboratory according to published procedures. Digestion conditions were those suggested by New England BioLabs. Fragments of pX1 DNA were separated by electrophoresis in agarose or acrylamide gels and eluted electrophoretically (18). Fragments from agarose gels were purified on columns of hydroxylapatite (Bio-Rad HTP) and Sephadex G-50 (18).

DNA Sequence Analysis. Two methods were used. The dideoxynucleotide chain termination method of Sanger et al. (19) employed M13mp2 clones of TK DNA as templates and restriction endonuclease fragments of pX1 as primers. The chemical modification and cleavage method of Maxam and Gilbert (20) employed restriction endonuclease fragments of pX1 terminally labeled with phage T4 kinase and [γ-32P]ATP.

Isolation of mRNA. Cytoplasmic extracts from infected cells prepared by the method of Cremer et al. (15) were added to an equal volume of 10 mM Tris-HCl, pH 8/25 mM EDTA/75 mM NaCl/0.5% NaDodSO4 and extracted with an equal volume of phenol saturated with 50 mM Tris-HCl, pH 8/100 mM NaCl/5 mM EDTA. The aqueous phase and interphase were extracted twice more with phenol/chloroform (1:1). The final aqueous phase was digested with 100 µg of proteinase K per ml at 37°C for 30 min and extracted with phenol/chloroform (1:1). RNA was precipitated by the addition of 0.1 vol of 0.5 M Tris-HCl, pH 7.5/1 M NaCl/50 mM EDTA and 2.5 vol of ethanol. After 12 hr at -20°C, RNA was collected by centrif-
Downloaded by guest on September 24, 2021

        10        20        30        40        50        60        70        80
GGGTCCTAGGCTCCATGGGGACCGTATACGTGGACAGGCTCTGGAGCATCGCACGACTGCGTGATATTACCGGAGACCTTCTGCGGGAC

        100       110       120       130       140       150       160       170
GAGCCGGGTCACGCGGCTGACGGAGCGTCCGTTGGGCGACAAACACCAGGACGGGGCACAGGTACACTATCTTGTCACCCGGAGCGCGAG

        190       200       210       220       230       240       250       260
GGACTGCAGGAGCTTCAGGGAGTGGCGCAGCTGCTTCATCCCCGTGGCCCGTTGCTCGCGTTTGCTGGCGGTGTCCCCGGAAGAAATATA
                          Pvu II

        280       290       300       310       320       330       340       350
TTTGCATGTCTTTAGTTCTATGATGACACAAACCCCGCCCAGCGTCTTGTCATTGGCGAATTCGAACACGCAGATGCAGTCGGGGCGGCG
                                                         EcoRI

        370       380       390       400       410       420       430       440
CGGTCCCAGGTCCACTTCGCATATTAAGGTGACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCGACCCGCTTAACAGCGTCAACAG
                                                 * mRNA

        460       470       480       490       500       510       520       530
CGTGCCGCAGATCTTGGTGGCGTGAAACTCCCGCACCTCTTTGGCAAGCGCCTTGTAGAAGCGCGTATGGCTTCGTACCCCTGCCATCAA
        Bgl II                                                    MetAlaSerTyrProCysHisGln

        550       560       570       580       590       600       610       620
CACGCGTCTGCGTTCGACCAGGCTGCGCGTTCTCGCGGCCATAGCAACCGACGTACGGCGTTGCGCCCTCGCCGGCAGCAAGAAGCCACG
HisAlaSerAlaPheAspGlnAlaAlaArgSerArgGlyHisSerAsnArgArgThrAlaLeuArgProArgArgGlnGlnGluAlaThr

        640       650       660       670       680       690       700       710
GAAGTCCGCCTGGAGCAGAAAATGCCCACGCTACTGCGGGTTTATATAGACGGTCCTCACGGGATGGGGAAAACCACCACCACGCAACTG
GluValArgLeuGluGlnLysMetProThrLeuLeuArgValTyrIleAspGlyProHisGlyMetGlyLysThrThrThrThrGlnLeu

        730       740       750       760       770       780       790       800
CTGGTGGCCCTGGGTTCGCGCGACGATATCGTCTACGTACCCGAGCCGATGACTTACTGGCAGGTGCTGGGGGCTTCCGAGACAATCGCG
LeuValAlaLeuGlySerArgAspAspIleValTyrValProGluProMetThrTyrTrpGlnValLeuGlyAlaSerGluThrIleAla

        820       830       840       850       860       870       880       890
AACATCTACACCACACAACACCGCCTCGACCAGGGTGAGATATCGGCCGGGGACGCGGCGGTGGTAATGACAAGCGCCCAGATAACAATG
AsnIleTyrThrThrGlnHisArgLeuAspGlnGlyGluIleSerAlaGlyAspAlaAlaValValMetThrSerAlaGlnIleThrMet

        910       920       930       940       950       960       970       980
GGCATGCCTTATGCCGTGACCGACGCCGTTCTGGCTCCTCATGTCGGGGGGGAGGCTGGGAGTTCACATGCCCCGCCCCCGGCCCTCACC
GlyMetProTyrAlaValThrAspAlaValLeuAlaProHisValGlyGlyGluAlaGlySerSerHisAlaProProProAlaLeuThr

       1000      1010      1020      1030      1040      1050      1060      1070
CTCATCTTCGACCGCCATCCCATCGCCGCCCTCCTGTGCTACCCGGCCGCGCGATACCTTATGGGCAGCATGACCCCCCAGGCCGTGCTG
LeuIlePheAspArgHisProIleAlaAlaLeuLeuCysTyrProAlaAlaArgTyrLeuMetGlySerMetThrProGlnAlaValLeu

       1090      1100      1110      1120      1130      1140      1150      1160
GCGTTCGTGGCCCTCATCCCGCCGACCTTGCCCGGCACAAACATCGTGTTGGGGGCCCTTCCGGAGGACAGACACATCGACCGCCTGGCC
AlaPheValAlaLeuIleProProThrLeuProGlyThrAsnIleValLeuGlyAlaLeuProGluAspArgHisIleAspArgLeuAla

       1180      1190      1200      1210      1220      1230      1240      1250
AAACGCCAGCGCCCCGGCGAGCGGCTTGACCTGGCTATGCTGGCCGCGATTCGCCGCGTTTACGGGCTGCTTGCCAATACGGTGCGGTAT
LysArgGlnArgProGlyGluArgLeuAspLeuAlaMetLeuAlaAlaIleArgArgValTyrGlyLeuLeuAlaAsnThrValArgTyr

       1270      1280      1290      1300      1310      1320      1330      1340
CTGCAGGGCGGCGGGTCGTGGTGGGAGGATTGGGGACAGCTTTCGGGGACGGCCGTGCCGCCCCAGGGTGCCGAGCCCCAGAGCAACGCG
LeuGlnGlyGlyGlySerTrpTrpGluAspTrpGlyGlnLeuSerGlyThrAlaValProProGlnGlyAlaGluProGlnSerAsnAla

       1360      1370      1380      1390      1400      1410      1420      1430
GGCCCACGACCCCATATCGGGGACACGTTATTTACCCTGTTTCGGGCCCCCGAGTTGCTGGCCCCCAACGGCGACCTGTATAACGTGTTT
GlyProArgProHisIleGlyAspThrLeuPheThrLeuPheArgAlaProGluLeuLeuAlaProAsnGlyAspLeuTyrAsnValPhe

       1450      1460      1470      1480      1490      1500      1510      1520
GCCTGGGCCTTGGACGTCTTGGCCAAACGCCTCCGTCCCATGCACGTCTTTATCCTGGATTACGACCAATCGCCCGCCGGCTGCCGGGAC
AlaTrpAlaLeuAspValLeuAlaLysArgLeuArgProMetHisValPheIleLeuAspTyrAspGlnSerProAlaGlyCysArgAsp

       1540      1550      1560      1570      1580      1590      1600      1610
GCCCTGCTGCAACTTACCTCCGGGATGGTCCAGACCCACGTCACCACCCCAGGCTCCATACCGACGATCTGCGACCTGGCGCGCACGTTT
AlaLeuLeuGlnLeuThrSerGlyMetValGlnThrHisValThrThrProGlySerIleProThrIleCysAspLeuAlaArgThrPhe

 Sma I
       1630      1640      1650      1660      1670      1680      1690      1700
GCCCGGGAGATGGGGGAGGCTAACTGAAACACGGAAGGAGACAATACCGGAAGGAACCCGCGCTATGACGGCAATAAAAAGACAGAATAA
AlaArgGluMetGlyGluAlaAsn***

       1720      1730      1740      1750      1760      1770      1780      1790
AACGCACGGGTGTTGGGTCGTTTGTTCATAAACGCGGGGTTCGGTCCCAGGGCTGGCACTCTGTCGATACCCCACCGAGACCCCATTGGG

FIG. 1. Nucleotide sequence of the TK gene and its flanking regions. The complement of the transcribed strand is shown. Locations of cleavage sites for Pvu II, EcoRI, Bgl II, and Sma I are indicated. The putative "Hogness box" and polyadenylylation signals are underlined, and the location of the 5' end of TK mRNA is indicated by an arrow. The predicted amino acid sequence of the TK protein is also given.
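
The reading frame annotated in Fig. 1 can be spot-checked by translating its two ends with the standard genetic code. The sketch below is illustrative only and is not part of the original analysis: the two fragments are copied from the figure (the eight codons beginning at the first ATG, and the last eight codons followed by the TGA terminator), and the helper name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Fragments copied from Fig. 1 (the strand shown in the figure).
ORF_START = "ATGGCTTCGTACCCCTGCCATCAA"      # first eight codons
ORF_END = "GCCCGGGAGATGGGGGAGGCTAACTGA"     # last eight codons plus TGA

if __name__ == "__main__":
    print(translate(ORF_START))  # MASYPCHQ
    print(translate(ORF_END))    # AREMGEAN*
```

In one-letter code, M-A-S-Y-P-C-H-Q and A-R-E-M-G-E-A-N correspond to the Met-Ala-Ser-Tyr-Pro-Cys-His-Gln and Ala-Arg-Glu-Met-Gly-Glu-Ala-Asn residues printed beneath the first and last coding rows of the figure.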

ugation, redissolved in water, and chromatographed on oligo(dT)-cellulose as described by Honjo et al. (21) to isolate polyadenylylated RNA.

RNA Sequence Analysis. A modification of the method of Bina-Stein et al. (22) was used. DNA fragments labeled in the strand complementary to TK mRNA were prepared by primed synthesis (23) using an M13mp2 clone of HSV DNA as template and restriction fragments of pX1 as primers, and the fragments were isolated after restriction enzyme digestion and electrophoresis in an 8% polyacrylamide gel. The labeled DNA fragment was denatured by heating at 100°C for 3 min, mixed with 40-60 µg of RNA in 200 µl of 0.4 M NaCl/20 mM Tris-HCl, pH 7.5/0.4% NaDodSO4/75% (vol/vol) formamide, and allowed to hybridize for 20 hr at 42°C. The DNA-RNA hybrid was precipitated with ethanol, washed 5 times with 70% (vol/vol) ethanol, dried, resuspended in water, and divided into four parts. Each portion was made up to 20 µl containing 50 mM Tris-HCl at pH 8, 50 mM KCl, 10 mM MgCl2, 10 mM dithioerythritol, 50 µM each of dATP, dGTP, dCTP, and dTTP, and 20 µM of one dideoxynucleoside triphosphate (ddATP, ddGTP, ddCTP, or ddTTP; P-L Biochemicals). Eight units of reverse transcriptase (RNA-dependent DNA polymerase) was added (a gift from S. Weissman), and the reaction mixtures were incubated at 42°C for 30 min. Reactions were continued for 10 min at 42°C after addition of 20 µl of the same buffer containing 400 µM each of dATP, dGTP, dCTP, and dTTP. After addition of 10 µl of 1 M NaOH, reaction mixtures were heated at 100°C for 5 min, neutralized with acetic acid, and precipitated with ethanol. Precipitates were dissolved in 7 µl of 7 M urea/50% (vol/vol) formamide and heated at 100°C for 3 min, and half of each sample was loaded on a standard sequencing gel (19).

RESULTS AND DISCUSSION

Fig. 1 shows the nucleotide sequence of an 1800-base pair region of plasmid pX1 that appears to contain the entire TK gene. The strategy used to determine the sequence is shown in Fig. 2. For more than 95% of the gene, the sequence was confirmed by analysis of both complementary DNA strands. Also, more than 92% of the sequence was determined by both the dideoxynucleotide sequencing method of Sanger and the chemical modification and cleavage method of Maxam and Gilbert. Fifty sites for restriction enzyme cleavage had previously been located by standard mapping techniques (ref. 11; unpublished data). All of these sites are also predicted by the sequence (Fig. 1, Table 1). From these data an upper limit for the random error frequency in the sequence of approximately 1 in 200 nucleotides can be obtained.

[Fig. 2 graphic: restriction map with rows for Hae III, Hpa II, and other enzymes.]

FIG. 2. Strategy for DNA sequence determination. Arrows indicate the direction (5' to 3') and extent of sequence determined in individual experiments. Solid arrows indicate sequence determined by the Maxam and Gilbert method (20) and dashed arrows indicate sequence determined by the Sanger method (19). Symbols for "other" restriction enzymes are as follows: +, EcoR:I; i, EcoRI; T, Bgl II; i, Sma I; 6, Alu I; x, Ava I.

Table 1. Cleavage sites for restriction endonucleases predicted from the nucleotide sequence

Endonuclease   Locations of sites
HincII         442
HinfI          1217
Pst I          183, 420, 1260
Alu I          190, 208, 1297
Ava I          759, 1398, 1621
Hha I          173, 204, 357, 497, 510, 564, 602, 737, 883, 1038, 1178, 1608, 1610, 1679
EcoRII         135, 365, 557, 638, 728, 839, 1067, 1163, 1199, 1322, 1441, 1493, 1578, 1604, 1756
Hpa II         70, 93, 168, 256, 611, 856, 978, 1032, 1111, 1140, 1183, 1516, 1523, 1549, 1622, 1666
Hae III        225, 399, 576, 725, 854, 980, 1034, 1070, 1088, 1133, 1166, 1211, 1310, 1350, 1394, 1409, 1445, 1460

Locations of Pvu II, EcoRI, Bgl II, and Sma I sites are indicated in Fig. 1. The Sac I site in the TK gene of HSV-1 strains 17, MP, and F, and the Kpn I site in strain MP are not present in strain CL101 (refs. 12 and 13; S. McKnight, personal communication). In addition, there are no cleavage sites predicted from the nucleotide sequence for the following enzymes: Ava III, Bcl I, Cla I, HgiAI, HindIII, Hpa I, Pvu I, Sac II, Sal I, Xba I, and Xho I.

Because the amino acid sequence of the TK protein is not known, we cannot locate directly the coding sequences for the protein. We therefore have relied on a variety of other evidence to locate the gene. Using DNA fragment-mediated transformation of TK- cells to a TK+ phenotype, Wigler et al. (8) showed that the HSV-1 TK gene resided on a 3.5-kilobase BamHI fragment of HSV-1 DNA [map coordinates 0.29 to 0.31 on the P orientation of the HSV-1 genome (24)], and that digestion of this fragment with EcoRI greatly reduced the transformation frequency. Similarly, Colbere-Garapin et al. (12) showed that a 2-kilobase subfragment of the BamHI fragment generated by Pvu II could also transform cells to TK+, and Wilkie et al. (13) showed that digestion of TK DNA with Bgl II and HincII abolished transformation. The 1800-base pair region for which we have determined the sequence covers most of the transforming Pvu II fragment and contains the Bgl II and HincII sites and one of the EcoRI sites within the BamHI fragment (Fig. 3). The direction of transcription of the TK gene as determined by Smiley et al. (25) is also indicated in Fig. 3.

To locate the TK gene more precisely, we have determined the sequence of the 5' end of TK mRNA. A 32P-labeled Hha I fragment extending from nucleotides 564 to 510 in the DNA sequence (Fig. 1) was used to prime synthesis by reverse transcriptase on total polyadenylylated RNA from the cytoplasm of HSV-infected cells. The labeled DNA fragment used as primer is located near the Bgl II site (nucleotide 458) and should be entirely within the TK gene. Synthesis of cDNA will proceed to the 5' end of the RNA to which the primer hybridizes, and incorporation of dideoxynucleotides into the synthesis reaction mixtures will produce sequence data for the strand complementary to the RNA in a manner analogous to the method of Sanger et al. (19) for DNA sequence determination. Fig. 4 shows the results of such an experiment. The sequence obtained corresponds exactly to the DNA sequence from nucleotides 494 to 409, with the exception of nucleotides 431-433, for which the bands on the gel are compressed and cannot be read accurately. A similar experiment using a primer extending from the Hha I site at nucleotide 497 to the Hae III site at nucleotide 398, i.e., about 10 nucleotides past the site where cDNA synthesis stopped, resulted in no extension of the primer (data not shown). Thus, the 5' end of an mRNA covering the Bgl II site, and therefore presumably covering the TK gene, is located at nucleotide 409 of the DNA sequence. This location has been confirmed by S. McKnight (personal communication), using the S1 nuclease transcript mapping technique.

Comparison of the DNA sequences of a number of eukaryotic and viral genes has revealed the presence of conserved sequences located at specific distances outside of the transcribed regions of the genes. These conserved sequences may serve as signals for initiation of transcription. Inspection of the DNA sequence adjacent to the location of the 5' end of TK mRNA reveals the presence of similar sequences. Thirty-one nucleotides before the beginning of the mRNA is the sequence C-A-T-A-T-T-A-A, which is similar to the "Hogness box" (26).
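
The site predictions discussed above (Table 1, and the Bgl II and Hha I sites used to place the primers) amount to scanning the sequence for each enzyme's recognition sequence. A minimal sketch follows, under assumptions that are not stated in the paper: the 90-nucleotide row of Fig. 1 carrying the labels 460-530 is taken to begin at nucleotide 450, a site is reported as the position of the first base of its recognition sequence, and `find_sites` is a hypothetical helper. The recognition sequences used (Bgl II, AGATCT; Hha I, GCGC) are the standard ones for these enzymes.

```python
# Row of Fig. 1 labelled 460-530; assumed here to start at nucleotide 450.
ROW_START = 450
ROW = (
    "CGTGCCGCAGATCTTGGTGGCGTGAAACTCCCGCACCTCTTTGGCAAGCG"
    "CCTTGTAGAAGCGCGTATGGCTTCGTACCCCTGCCATCAA"
)

# Standard recognition sequences (5' to 3').
RECOGNITION = {"Bgl II": "AGATCT", "Hha I": "GCGC"}


def find_sites(seq: str, motif: str, start: int = 1) -> list[int]:
    """Return the position of every occurrence of motif (overlaps included),
    numbering the first base of seq as `start`."""
    sites = []
    i = seq.find(motif)
    while i != -1:
        sites.append(start + i)
        i = seq.find(motif, i + 1)
    return sites


if __name__ == "__main__":
    for enzyme, motif in RECOGNITION.items():
        print(enzyme, find_sites(ROW, motif, ROW_START))
```

Under these assumptions the scan reports 458 for Bgl II and 497 and 510 for Hha I, which are the positions quoted in the text and listed in Table 1.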

[Fig. 3 graphic: restriction map; the only legible label is "3' mRNA 5'".]

FIG. 3. Location of the TK gene. The top line shows the region for which the sequence was determined. The nucleotide numbers correspond to those in Fig. 1. The second line shows the location of restriction endonuclease sites within the 3.5-kilobase BamHI fragment of HSV-1 DNA contained in plasmid pX1 (11). The third line shows the location and orientation of TK mRNA. The bottom lines are expanded regions of the map, showing locations of the proposed control sequences discussed in the text.

Eighty-five nucleotides before the beginning of the mRNA is the sequence G-G-C-G-A-A-T-T-C, which is similar to one found by Benoist et al. (27) to be conserved in many eukaryotic genes. This sequence in the TK gene contains the EcoRI site, which may explain the low frequency of TK transformation by EcoRI-cleaved HSV-1 DNA.

Kozak (28) has proposed that eukaryotic ribosomes initiate protein synthesis at the AUG closest to the 5' end of an mRNA. The first AUG codon in TK mRNA occurs 107 nucleotides from the 5' end of the message (nucleotide 516 in the DNA sequence). Twenty-nine nucleotides prior to this site is a region of partial complementarity to the 3' end of eukaryotic 18S ribosomal RNA. Such regions have been proposed as possible ribosome binding sites (29). From the ATG at nucleotide 516, there is an open reading frame of 1128 nucleotides terminated by TGA (nucleotide 1644 in the DNA sequence). That TGA is the physiologic terminator of the TK gene is supported by the work of Cremer et al. (30), who have shown that UGA suppressor tRNA from yeast causes read-through of the normal terminator of the TK gene in an in vitro translation system. Forty-eight nucleotides beyond the TGA is the sequence A-A-T-A-A-A, which is repeated again 13 nucleotides later. The sequence A-A-U-A-A-A has been found 14 to 30 nucleotides prior to the poly(A) of all polyadenylylated messenger RNAs examined to date (31). We therefore expect that the 3' end of TK mRNA will be in the neighborhood of nucleotides 1710 to 1730, resulting in a total mRNA length of about 1300 nucleotides, plus the posttranscriptionally added poly(A). This is in good agreement with previously published estimates of the length of the TK mRNA (15). The TK gene can code for a protein containing 376 amino acids with a molecular weight of 40,900. This also compares favorably with the estimated molecular weight of 40,000-44,000 obtained from NaDodSO4/polyacrylamide gel electrophoresis (2-4, 30).

FIG. 4. Autoradiogram of a sequence gel of cDNA synthesized from the 5' end of TK mRNA.

The preceding interpretation of the sequence data assumes that there are no splices in the messenger RNA for TK. Several lines of evidence support this assumption. First, viral TK activity is expressed in an E. coli host carrying a plasmid containing the TK gene linked in the proper orientation to a prokaryotic

promoter (J. Smiley, personal communication; F. Colbere-Garapin, personal communication). E. coli is not known to carry out RNA splicing. Second, S1 nuclease transcript mapping has failed to detect any splices within TK mRNA of HSV-1 strain MP (S. McKnight, personal communication). Because the nucleotide sequences of the TK genes from strains MP and CL101 are nearly identical (see below), it is unlikely that the gene from one strain contains a splice that is absent in the other. Third, while S1 nuclease mapping might miss a splice near the end of an mRNA, direct sequence determination at the 5' end of TK mRNA has shown that the RNA and DNA sequences are colinear over the first 100 base pairs. Thus, barring a splice near the 3' end of the mRNA that is not necessary for TK expression in E. coli, or the unlikely event that the structures of the TK gene in strains MP and CL101 are quite different, it appears that the messenger RNA for TK is not spliced.

The amino acid composition of the TK protein can be predicted from the nucleotide sequence, and Table 2 indicates the number of times each codon is used and the overall amino acid composition. While all but four codons are used at least once, the distribution of codon usage is somewhat unusual. Where an amino acid can be specified by two or more codons differing in the third position, there is a marked preference for codons with a G or a C in the third position over codons with an A or a U in that position. The overall amino acid composition is also somewhat unusual, being high in proline, arginine, glycine, and alanine. These four amino acids can be specified by codons containing only G and C. These codon and amino acid choices may represent an adaptation to the high G+C content of the coding portion of the TK gene (65%) and of the HSV-1 DNA molecule as a whole (67%).

Table 2. Predicted codon usage in the mRNA and amino acid composition of TK protein

Codon usage (nucleotide in codon)

First   Second:  U         C         A          G          Third
U                5 Phe     2 Ser     4 Tyr      0 Cys      U
                 3 Phe     3 Ser     8 Tyr      4 Cys      C
                 1 Leu     1 Ser     0 Stop     1 Stop     A
                 6 Leu     6 Ser     0 Stop     5 Trp      G
C                6 Leu     4 Pro     6 His      3 Arg      U
                 6 Leu     15 Pro    6 His      14 Arg     C
                 1 Leu     2 Pro     6 Gln      3 Arg      A
                 21 Leu    9 Pro     13 Gln     7 Arg      G
A                1 Ile     1 Thr     1 Asn      1 Ser      U
                 11 Ile    13 Thr    7 Asn      4 Ser      C
                 4 Ile     5 Thr     4 Lys      1 Arg      A
                 13 Met    9 Thr     0 Lys      0 Arg      G
G                3 Val     7 Ala     3 Asp      4 Gly      U
                 7 Val     27 Ala    16 Asp     11 Gly     C
                 2 Val     0 Ala     2 Glu      1 Gly      A
                 10 Val    12 Ala    12 Glu     14 Gly     G

Amino acid composition

Residue   No. per molecule
Ala       46
Arg       28
Asn       8
Asp       19
Cys       4
Gln       19
Glu       14
Gly       30
His       12
Ile       16
Leu       41
Lys       4
Met       13
Phe       8
Pro       30
Ser       17
Thr       28
Trp       5
Tyr       12
Val       22

The sequence of the TK gene from the MP strain of HSV-1 has also been determined (S. McKnight, personal communication). Comparison of the two sequences has shown them to be very similar. All of the putative recognition sequences for control of transcription and translation discussed above are identical between the two strains of virus except for the region of partial complementarity to the 3' end of 18S rRNA. Here, there are two nucleotide differences between the two strains, one increasing and one decreasing complementarity with 18S rRNA. There are 19 single base changes between the strains within the coding portion of the gene. Of these only seven result in amino acid differences in the two strains.

Knowledge of the nucleotide sequence of the HSV-1 TK gene should make possible a direct approach toward understanding how the structure of a gene influences its expression. Modification of portions of the gene thought to be involved in regulation of expression and examination of mutants with altered regulatory properties should indicate which nucleotide sequences are necessary for proper control of gene expression. The interaction of proteins with specific sites in the gene can also be examined in detail.

We thank J. Smiley, S. McKnight, and F. Colbere-Garapin for making available unpublished data. We also thank S. Cronin for providing expert technical assistance. This work was supported by grants from the U.S. Public Health Service, CA-13515, CA-16038, CA-06519, and CA-09259.

1. Dubbs, D. R. & Kit, S. (1964) Virology 22, 493-505.
2. Honess, R. W. & Watson, D. H. (1974) J. Gen. Virol. 22, 171-185.
3. Summers, W. P., Wagner, M. & Summers, W. C. (1975) Proc. Natl. Acad. Sci. USA 72, 4081-4084.
4. Preston, C. M. (1979) J. Virol. 29, 275-284.
5. Honess, R. W. & Roizman, B. (1974) J. Virol. 14, 8-19.
6. Munyon, W., Kraiselburd, E., Davis, D. & Mann, J. (1971) J. Virol. 7, 813-820.
7. Munyon, W., Buchsbaum, R., Paoletti, E., Mann, J., Kraiselburd, E. & Davis, D. (1972) Virology 49, 683-689.
8. Wigler, M., Silverstein, S., Lee, L.-S., Pellicer, A., Cheng, Y.-C. & Axel, R. (1977) Cell 11, 223-232.
9. Lin, S. & Munyon, W. (1974) J. Virol. 14, 1199-1208.
10. Leiden, J. M., Buttyan, R. & Spear, P. G. (1976) J. Virol. 20, 413-424.
11. Enquist, L. W., Vande Woude, G. F., Wagner, M., Smiley, J. R. & Summers, W. C. (1979) Gene 7, 335-342.
12. Colbere-Garapin, F., Chousterman, S., Horodniceanu, F., Kourilsky, P. & Garapin, A. (1979) Proc. Natl. Acad. Sci. USA 76, 3755-3759.
13. Wilkie, N. M., Clements, J. B., Boll, W., Mantei, N., Lonsdale, D. & Weissmann, C. (1979) Nucleic Acids Res. 7, 859-877.
14. Smiley, J. R. (1980) Nature (London) 285, 333-335.
15. Cremer, K., Bodemer, M. & Summers, W. C. (1978) Nucleic Acids Res. 5, 2333-2344.
16. Clewell, D. B. & Helinski, D. R. (1969) Proc. Natl. Acad. Sci. USA 62, 1159-1166.
17. Gronenborn, B. & Messing, J. (1978) Nature (London) 272, 375-377.
18. Southern, E. (1979) Methods Enzymol. 68, 152-176.
19. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
20. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
21. Honjo, T., Packman, S., Swan, D., Nau, M. & Leder, P. (1974) Proc. Natl. Acad. Sci. USA 71, 3659-3663.
22. Bina-Stein, M., Thoren, M., Salzman, N. & Thompson, J. A. (1979) Proc. Natl. Acad. Sci. USA 76, 731-735.
23. Fiddes, J. C. & Godson, G. N. (1978) Virology 89, 322-326.
24. Locker, H. & Frenkel, N. (1979) J. Virol. 32, 429-441.
25. Smiley, J. R., Wagner, M. J., Summers, W. P. & Summers, W. C. (1980) Virology 102, 83-93.
26. Corden, J., Wasylyk, B., Buchwalder, A., Sassone-Corsi, P., Kedinger, C. & Chambon, P. (1980) Science 209, 1406-1414.
27. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142.
28. Kozak, M. (1978) Cell 15, 1109-1123.
29. Hagenbuchle, O., Santer, M., Steitz, J. A. & Mans, R. (1978) Cell 13, 551-563.
30. Cremer, K. J., Bodemer, M., Summers, W. P., Summers, W. C. & Gesteland, R. F. (1979) Proc. Natl. Acad. Sci. USA 76, 430-434.
31. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
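
The third-position preference described in the codon-usage discussion can be tallied directly from the counts in Table 2. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the counts are transcribed by hand from Table 2 (including the single UGA terminator), and the names `COUNTS`, `codon_counts`, and `third_position_totals` are hypothetical.

```python
BASES = "UCAG"

# COUNTS[first][i][j] is the count for the codon whose third base is BASES[i]
# and whose second base is BASES[j], transcribed row by row from Table 2.
COUNTS = {
    "U": [[5, 2, 4, 0], [3, 3, 8, 4], [1, 1, 0, 1], [6, 6, 0, 5]],
    "C": [[6, 4, 6, 3], [6, 15, 6, 14], [1, 2, 6, 3], [21, 9, 13, 7]],
    "A": [[1, 1, 1, 1], [11, 13, 7, 4], [4, 5, 4, 1], [13, 9, 0, 0]],
    "G": [[3, 7, 3, 4], [7, 27, 16, 11], [2, 0, 2, 1], [10, 12, 12, 14]],
}


def codon_counts() -> dict[str, int]:
    """Flatten the Table 2 layout into a {codon: count} mapping."""
    table = {}
    for first, rows in COUNTS.items():
        for third, row in zip(BASES, rows):
            for second, n in zip(BASES, row):
                table[first + second + third] = n
    return table


def third_position_totals(table: dict[str, int]) -> dict[str, int]:
    """Sum codon counts by the base in the third position."""
    totals = dict.fromkeys(BASES, 0)
    for codon, n in table.items():
        totals[codon[2]] += n
    return totals


if __name__ == "__main__":
    totals = third_position_totals(codon_counts())
    gc3 = totals["G"] + totals["C"]
    print(totals, gc3 / sum(totals.values()))
```

Summed this way, codons ending in G or C account for 292 of the 377 codons in the table (376 sense codons plus the terminator), or about 77%.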