JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4271-4278 Vol. 169, No. 9
0021-9193/87/094271-08$02.00/0
Copyright © 1987, American Society for Microbiology

Nucleotide Sequence of a Glucosyltransferase Gene from Streptococcus sobrinus MFe28

JOSEPH J. FERRETTI,1* MARTYN L. GILPIN,2 AND ROY R. B. RUSSELL2

Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73190,1 and Dental Research Unit, Royal College of Surgeons of England, Downe, Kent BR6 7JJ, United Kingdom2

Received 3 April 1987/Accepted 2 June 1987

The complete nucleotide sequence was determined for the Streptococcus sobrinus MFe28 gtfI gene, which encodes a glucosyltransferase that produces an insoluble glucan product. A single open reading frame encodes a mature glucosyltransferase protein of 1,559 amino acids (Mr, 172,983) and a signal peptide of 38 amino acids. In the C-terminal one-third of the protein there are six repeating units containing 35 amino acids of partial homology and two repeating units containing 48 amino acids of complete homology. The functional role of these repeating units remains to be determined, although truncated forms of glucosyltransferase containing only the first two repeating units of partial homology maintained glucosyltransferase activity and the ability to bind glucan. Regions of homology with alpha-amylase and glycogen phosphorylase were identified in the glucosyltransferase protein and may represent regions involved in functionally similar domains.

The glucosyltransferases (EC 2.4.1.5) produced by various species of oral streptococci are of considerable interest because of their production of extracellular glucans from sucrose. These glucans are thought to play a key role in the development of dental plaque because of their ability to adhere to smooth surfaces and mediate the aggregation of bacterial cells and food debris (12). It is known that a single strain can produce several distinct glucosyltransferases differing in electrophoretic, antigenic, or enzymatic properties, although some of this apparent variety may be due to the use of different oral streptococcal strains and different purification procedures and activity assays by different laboratories. The properties and characteristics of the glucosyltransferases of the mutans group streptococci have been reviewed by Ciardi (3) and Mukasa (18).

Recently, several glucosyltransferase genes from various strains of streptococci have been cloned by recombinant DNA techniques and have been shown to be expressed in Escherichia coli. Robeson et al. (24) have cloned a glucosyltransferase gene (gtfA) from Streptococcus mutans UAB90 (serotype c) and shown that it produces a protein with a molecular weight of 55,000. A similar gtfA gene has also been cloned by Pucci and Macrina (23) from S. mutans LM7 (serotype e) and by Burne et al. (2) from S. mutans GS5 (serotype c). Aoki et al. (1) reported the cloning of a glucosyltransferase gene (gtfB) from S. mutans GS-5 that produces a protein with a molecular weight of about 150,000. Another glucosyltransferase gene, gtfC, which specifies a 150,000-molecular-weight polypeptide, has been obtained from S. mutans LM7 by Pucci et al. (22). Finally, Gilpin et al. (9) have cloned two glucosyltransferase genes from Streptococcus sobrinus MFe28 (serotype h): gtfS, which encodes a glucosyltransferase that synthesizes a water-soluble glucan, and gtfI, which encodes a glucosyltransferase that synthesizes a water-insoluble glucan.

The availability of these cloned genes allows further characterization of both the genes and gene products, and in this communication, we report the complete nucleotide sequence of the gtfI gene from S. sobrinus MFe28.

MATERIALS AND METHODS

Bacterial strains and media. E. coli MAF1 (containing plasmid pMLG1) was the initial source of the S. sobrinus MFe28 gtfI gene (27a). E. coli MAF5 contains plasmid pMLG5, which has the same 5.0-kilobase (kb) fragment as pMLG1 and an additional 0.5-kb fragment from the bacteriophage lambda recombinant in which the gtfI insert was first cloned (9). E. coli JM109 was used as the recipient for transfection experiments with M13 bacteriophage vectors (35) and was routinely grown in 2× YT broth (19). Soft agar overlays consisted of 2× YT broth supplemented with final concentrations of 0.75% agar, 0.33 mM isopropyl-β-D-thiogalactopyranoside, and 0.02% 5-bromo-4-chloro-3-indolyl-β-galactoside for differentiating recombinant and nonrecombinant phages. For the titration of M13 recombinants carrying all or part of gtfI, phages were plated on E. coli JM109 on B-broth agar (26) to which 1% sucrose was added for detection of activity.

Enzymes and chemicals. Restriction enzymes were purchased from Bethesda Research Laboratories, Inc., Gaithersburg, Md., and were used in accordance with the specifications of the manufacturer. T4 DNA ligase was purchased from Amersham Corp., Arlington Heights, Ill., or Bethesda Research Laboratories. The Klenow fragment of DNA polymerase and the M13 15-base primer were purchased from Bethesda Research Laboratories. The deoxy- and dideoxynucleotide triphosphates were purchased from P-L Biochemicals, Inc., and [α-32P]dATP was purchased from New England Nuclear Corp., Boston, Mass. Isopropyl-β-D-thiogalactopyranoside and 5-bromo-4-chloro-3-indolyl-β-galactoside were purchased from Sigma Chemical Co., St. Louis, Mo.

Subcloning of the gtfI gene and nucleotide sequencing. The gtfI gene was obtained for subcloning experiments by digestion of pMLG1 with HindIII followed by electrophoresis and isolation of the 5.0-kb fragment from 0.8% type VII agarose gels as previously described (16). The fragment was unidirectionally degraded with Bal 31 by a modification of the procedure of Gilmore et al. (8), and all subcloning into M13 phages mp18 and mp19 was done as described by Ferretti et al. (7). A 0.5-kb HindIII fragment was subsequently isolated

* Corresponding author.

from pMLG5, cloned into M13 phages, and sequenced. This fragment contained the remainder of the glucosyltransferase sequence not present in the pMLG1 HindIII fragment. Sequencing reactions were performed by the Sanger dideoxy chain termination method (28) by using the procedures described by Amersham. All sequences were confirmed from at least two overlapping clones, and the entire gene sequence was determined on both strands. The sequence information was analyzed by the James M. Pustell DNA/protein sequencing program obtained from International Biotechnologies, Inc., New Haven, Conn.

Gel electrophoresis. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and detection of glucosyltransferase activity by incubation of gels with sucrose in the presence of Triton X-100 were done as described previously (9), but the sensitivity of the method was enhanced by the use of a periodic acid-Schiff reagent procedure modified from published methods (15, 29). After incubation in sucrose (generally for 40 h at 37°C), the gels were fixed for 30 min in 75% ethanol and treated on a shaker for 30 min with 0.7% periodic acid in 5% acetic acid. They were then shaken for 60 min in several changes of 0.2% sodium metabisulfite in 5% acetic acid and placed in Schiff reagent (Sigma) for several hours. Finally, the gels were washed extensively in 45% methanol-45% acetic acid-10% water. All procedures were carried out at room temperature.

Purification of glucosyltransferase and glucan-binding peptides. Glucosyltransferase was purified from cells of E. coli MAF5 (which carries plasmid pMLG5) by subjecting the bacteria to ultrasonic disruption and using a single-step affinity chromatography procedure: the bacterial extract was passed through a column containing Sepharose 1000 (Pharmacia) and mutan; after the column was washed with buffer, bound glucosyltransferase was eluted with 5 M guanidine hydrochloride (26). The same procedure was used for detection of glucan-binding peptides; i.e., disrupted cell extracts, plus, for phage M13-infected cultures, the culture supernatant, were passed through the affinity column. Eluted peptides were analyzed by SDS-PAGE, and derivation from glucosyltransferase was confirmed by Western blotting (immunoblotting) with antiserum against purified glucosyltransferase (27a).

N-terminal sequence analysis. N-terminal sequence analysis of purified glucosyltransferase was done with an Applied Biosystems 470A protein sequencer with an on-line 120A PTH analyzer in accordance with the instructions of the manufacturer.

RESULTS

Cloning of complete gtfI gene. The gtfI gene from S. sobrinus MFe28 was originally cloned into the BamHI sites of bacteriophage lambda L47.1 and was located in a 7.6-kb DNA insert. Subsequently, mapping experiments showed that HindIII cleaved the insert at three points and the lambda DNA at two points outside the insert to give four fragments of 5.0, 2.6, 2.5, and 0.5 kb which carried S. sobrinus DNA. Plasmid pMLG1, from which a functional glucosyltransferase is expressed in E. coli, carries the 5.0-kb fragment (27a). At an early stage of the nucleotide-sequencing program, it became clear that pMLG1 contained a long open reading frame with no termination codon. These results indicated that the entire gtfI gene was not present in pMLG1, and so a further collection of derivatives of pBR322 carrying HindIII fragments from the bacteriophage lambda recombinants was examined. One of these, pMLG5, encoded a glucosyltransferase of 173 kilodaltons (Fig. 1) and was found to have both the 5.0- and 0.5-kb fragments. Restriction mapping of pMLG5 and subsequent nucleotide sequencing confirmed that the 0.5-kb fragment carried the information for the C-terminal region of glucosyltransferase. A partial restriction site map of the 5.5-kb insert containing the gtfI gene is shown in Fig. 2.

Nucleotide sequence. The complete nucleotide sequence of the 4,995-base-pair (bp) fragment carrying the gtfI gene was determined in both orientations and is shown in Fig. 3. Previous evidence had indicated that 0.5 kb of the insert in pMLG1 and pMLG5 was derived from the bacteriophage lambda vector (27a), and this was confirmed by the finding that the first 494 bp of the 5.5-kb fragment showed 100% homology with the corresponding part of the lambda sequence (lambda sequence not shown). An open reading frame containing 4,791 bp codes for the glucosyltransferase protein. The deduced amino acid sequence, which is coded for starting at the ATG codon beginning at position 160 and extending to the termination codon TAA at position 4951, contains 1,597 amino acids and has a molecular weight of 177,100. A putative ribosome-binding site sequence (AGGAGGA) is located nine nucleotides upstream from the translation initiation codon. Further upstream is a probable -35 region (TTGACG) separated by 18 bp from a proposed -10 region (TTAAAA).

Amino acid composition. The deduced amino acid composition indicated a highly hydrophilic protein, with a hydrophobic N-terminal region. This region displayed the characteristics expected of a signal peptide, i.e., a basic N-terminal region followed by a central hydrophobic region and a more polar C-terminal region. The residues surrounding amino acid 38 conform with the "-3, -1" rule proposed by von Heijne (31) for amino acids found at a cleavage site. To

FIG. 1. SDS-PAGE of glucosyltransferase activity in S. sobrinus MFe28, E. coli MAF5 (carrying pMLG5), and E. coli MAF1 (carrying pMLG1). The right lane contains the following protein standards: RNA polymerase β' subunit (Mr, 165,000), RNA polymerase β subunit (155,000), β-galactosidase (116,000), phosphorylase b (97,400), albumin (66,000), and ovalbumin (45,000).

FIG. 2. Partial restriction map of the 4,995-bp fragment containing the gtfI gene and the 494-bp fragment of bacteriophage lambda (hatched) carried by plasmid pMLG5. The 5' end of the gene is at the left, and the 3' end is at the right.
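The reading-frame figures quoted in the Results (ATG at position 160, TAA at position 4951, 4,791 bp, 1,597 amino acids, 38-residue signal peptide) can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, assuming 1-based numbering in which each quoted position is the first base of its codon; the function and variable names are illustrative and do not come from the paper.

```python
# Cross-check of the open reading frame coordinates quoted in the text.
# Assumption: positions are 1-based and refer to the first base of each codon.
ATG_START = 160          # first base of the initiation codon
STOP_START = 4951        # first base of the TAA termination codon
SIGNAL_PEPTIDE_LEN = 38  # residues removed from the precursor


def orf_length_bp(atg_start, stop_start):
    """Coding bases from the first base of ATG up to, not including, the stop codon."""
    return stop_start - atg_start


def codon_count(length_bp):
    """Number of codons in a coding region; rejects lengths that break the frame."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return length_bp // 3


def codon_start(residue_number, atg_start=ATG_START):
    """First base of the codon for a given residue (residue 1 is the initiator Met)."""
    return atg_start + 3 * (residue_number - 1)


orf_bp = orf_length_bp(ATG_START, STOP_START)
precursor_residues = codon_count(orf_bp)
mature_residues = precursor_residues - SIGNAL_PEPTIDE_LEN
first_mature_codon = codon_start(SIGNAL_PEPTIDE_LEN + 1)

print(orf_bp, precursor_residues, mature_residues)  # 4791 1597 1559
print(first_mature_codon)                           # 274
```

Under these assumptions the quoted numbers agree with one another, and the codon for the first residue of the mature protein would begin at position 274.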

[FIG. 3: nucleotide sequence of the sequenced fragment, numbered in steps of 10, with the deduced amino acid sequence printed beneath.]
investigate whether the gtfI gene product was indeed cleaved at this site, the enzyme expressed in E. coli MAF5 was purified by affinity chromatography and subjected to N-terminal amino acid analysis. The first 16 amino acids were identified as Asp-Thr-Glu-Thr-Val-Ser-Glu-Asp-Ser-Asn-Gln-Ala-Val-Leu-Thr-Ala; this sequence is identical to that of the deduced amino acid sequence directly following the postulated cleavage site. Thus, the signal peptide is 38 amino acids long and contains regions similar to those found in most secretory signal sequences (20). The mature glucosyltransferase contains 1,559 amino acids and has a molecular weight of 172,983. The deduced amino acid composition of glucosyltransferase with and without the signal peptide is presented in Table 1.

Amino acid sequence homology. A series of repeating units is located in the C-terminal one-third of the glucosyltransferase molecule (Fig. 4). One of the repeating units, designated A, is 35 amino acids long and is present six times. Although these repeats are not completely identical, repeating unit A4 was found to have the greatest homology with

FIG. 3. Nucleotide sequence of the gtfI gene and flanking regions. Numbering begins at the 5' end of the sequence. The deduced amino acid sequence of glucosyltransferase is given below the nucleotide sequence; an arrow designates the cleavage site for the removal of the signal peptide. Putative promoter and ribosome-binding site sequences are underlined. The starts of repeat regions A1 to A6, B1, and B2 are marked.
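The residue counts reported in Table 1 lend themselves to a quick internal consistency check: each column should add up to the stated length of the precursor or mature protein, and the column differences should account for the 38 residues of the signal peptide. The sketch below, in Python, transcribes the counts from Table 1; the three-letter keys and the variable names are illustrative choices, not part of the paper.

```python
# Residue counts transcribed from Table 1:
# (with signal peptide, without signal peptide).
TABLE_1 = {
    "Ala": (143, 136), "Arg": (43, 41), "Asn": (109, 108), "Asp": (137, 137),
    "Glu": (65, 63), "Gln": (93, 93), "Gly": (135, 134), "His": (19, 18),
    "Ile": (50, 49), "Leu": (92, 90), "Lys": (122, 117), "Met": (27, 24),
    "Phe": (64, 63), "Pro": (30, 30), "Ser": (106, 101), "Thr": (124, 122),
    "Trp": (25, 24), "Tyr": (98, 98), "Val": (115, 111),
}

# Column totals for the precursor and the mature protein.
total_with = sum(with_sp for with_sp, _ in TABLE_1.values())
total_without = sum(without_sp for _, without_sp in TABLE_1.values())

# Composition of the signal peptide, obtained as the column difference.
signal_peptide = {
    aa: with_sp - without_sp
    for aa, (with_sp, without_sp) in TABLE_1.items()
    if with_sp != without_sp
}

print(total_with, total_without)     # 1597 1559
print(sum(signal_peptide.values()))  # 38
print(signal_peptide)
```

The listed residues sum to the stated totals, and the difference between the columns gives a composition for the signal peptide that can be compared with the N-terminal portion of the deduced sequence.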

each of the other five repeating units. Based on a comparison with A4, the homologies were as follows: A1, 65%; A2, 38%; A3, 72%; A5, 65%; and A6, 50%. The gene segments corresponding to these regions are also highly conserved and, except for repeat A2, which contained only 38% identical bases, the repeats contained 65 to 72% identical bases. Repeating unit B is present twice and contains 48 amino acids, all of which are identical. The corresponding gene regions contain a stretch of 132 identical bases.

Functional regions of glucosyltransferase. Since functional glucosyltransferase was expressed from both pMLG1 and pMLG5, the terminal 0.5 kb of the gtfI gene is clearly not essential for activity. However, we have previously reported that a deletion extending from the end of the gene to the SacI site at position 3085 resulted in expression of a truncated and enzymatically inactive peptide (27, 27a). To further define the length of the gene sequence required for expression of a functional glucosyltransferase, a series of derivatives of phage M13 containing various lengths of gtfI were examined for expression of peptides which had enzyme activity and the ability to bind to glucan (Fig. 5). The M13 derivatives R7-20, R5-2, and R7-3 all formed polymer when plated with E. coli JM109 on sucrose-containing medium (Fig. 6). The two shortest M13 derivatives tested, R7-34 and R3-6, did not form any polymer either on plates or in a tube assay for glucosyltransferase using 14C-labeled sucrose. Nor did they release reducing sugar from sucrose, whereas the longer derivatives did encode enzyme activity for the release of reducing sugars, as indicated by the fact that when they were plated on E. coli JM109 on sucrose indicator plates (Russell et al., in press), acid was produced by E. coli. The same pattern of function was found when glucan-binding ability was examined; derivatives R7-20, R5-2, and R7-3 all expressed peptides which were retained by a mutan-Sepharose column, whereas R7-34 and R3-6 did not.

The results presented above indicated that the genetic information essential for enzyme activity and glucan-binding function was located in the C-terminal one-third of the gene. An in-frame gene fusion was therefore made between pUC8 and the ScaI site of gtfI located at position 3290. The resultant recombinant plasmid pSF86 expressed a 65-kilodalton peptide which reacted with antiserum to glucosyltransferase, indicating that the entire C-terminal one-third of the enzyme was being made. This peptide had no detectable glucosyltransferase activity but did bind to the affinity column.

Protein homology. Comparison of the deduced amino acid sequence of glucosyltransferase with other sequenced proteins revealed partial homologies with three proteins: alpha-amylase from barley, alpha-amylase from Bacillus amyloliquefaciens, and glycogen phosphorylase from rabbits (Fig. 7). The homologies of the glucosyltransferase with the two alpha-amylases overlap in the same general region, suggesting a region of functional homology for the three proteins.

DISCUSSION

Nucleotide sequence analysis of the 5-kb fragment of S. sobrinus MFe28 showed the presence of a single open reading frame which coded for the glucosyltransferase protein. The deduced amino acid sequence of the mature protein had a molecular weight of 172,983, which agreed closely with the value derived by SDS-PAGE. This value varies from

TABLE 1. Amino acid composition of glucosyltransferase deduced from the nucleotide sequence of gtfI

                       No. of residues^a
Amino acid       With signal peptide   Without signal peptide
Alanine                  143                    136
Arginine                  43                     41
Asparagine               109                    108
Aspartic acid            137                    137
Glutamic acid             65                     63
Glutamine                 93                     93
Glycine                  135                    134
Histidine                 19                     18
Isoleucine                50                     49
Leucine                   92                     90
Lysine                   122                    117
Methionine                27                     24
Phenylalanine             64                     63
Proline                   30                     30
Serine                   106                    101
Threonine                124                    122
Tryptophan                25                     24
Tyrosine                  98                     98
Valine                   115                    111
Total                  1,597                  1,559

^a Mr with signal peptide, 177,100; Mr without signal peptide, 172,983.

A
1100 YYFGQDGYMVTGAQNIKGSNYYFLANGAALRNTVY
1163 WRYFKNGVMALGLTTVDGHVQYFDKDGVQAKDKII
1228 YYLGKDGVAVTGAQTVGKQHLYFEANGQQVKGDFV
1293 FYLGKDGAAVTGAQTIKGQKLYFKANGQQVKGDIV
1406 WVYVKSGKVLTGAQTIGNQRVYFKDNGHQVKGQLV
1519 WLYVKDGKVLTGLQTVGNQKVYFDKNGIQAKGKAV

B
1352 VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY
1464 VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY

FIG. 4. Amino acid sequences of A and B repeating units found in the C-terminal region of the glucosyltransferase protein. The numbers at the left indicate the position number of the first amino acid of each repeating unit.

Repeat        A1    A2    A3    A4    B1    A5    B2    A6
Base          3457  3646  3841  4036  4213  4375  4552  4713
Amino acid    1100  1163  1228  1293  1352  1406  1464  1519

[Map of pMLG5, pMLG1, the M13 derivatives R7-20, R5-2, R7-3, R7-34, and R3-6, and pSF86, each scored for GTF and GBP.]

FIG. 5. The 3' end of the gtfI gene showing positions of repeat regions and termination points of phage M13 derivatives and pSF86, a pUC8 vector carrying the terminal ScaI-HindIII fragment of gtfI. Adjacent to each fragment is shown the ability of its product to exhibit glucosyltransferase (GTF) activity or to function as a glucan-binding protein (GBP).

is available to comment further about sequences involved in transcription termination. The presence of a 38-amino-acid signal peptide was con- firmed by N-terminal amino acid analysis of the purified glucosyltransferase protein, in which the sequence of the first 16 amino acids was identical to the deduced sequence. This signal peptide has properties similar to those of other signal peptides (20), i.e., a positively charged N-terminal region followed by a string of 23 hydrophobic amino acids and a more polar C-terminal region. The cleavage site between Ala and Asp and the surrounding residues are in accordance with the -3, -1 rule proposed by von Heijne (31). The 38-amino-acid signal peptide ofglucosyltransferase is in the general size range of other streptococcal signal peptides (6, 14, 17, 32) and that reported for other gram- positive organisms (20). It is apparent that E. coli is capable of recognizing the gtfl gene product and cleaving it at the site expected for removal of a secretion signal peptide. Other evidence suggests that the enzyme passes through the cytoplasmic membrane. For FIG. 6. Accumulation of glucan above points where E. coli example, E. coli strains expressing glucosyltransferase can JM109 was infected with phage M13 derivative R7-20 on sucrose- metabolize sucrose (27a), although sucrose can pass through containing medium. only the outer membrane and not the cytoplasmic membrane (5). Glucosyltransferase would thus be expected to accumu- late in the periplasmic space, but we have been unable, using previous estimates for glucosyltransferase that produce in- conventional osmotic shock methods for release of periplas- soluble glucans (3, 18), although proteolytic degradation and mic proteins (13, 34), to obtain release of the protein. In view problems associated with molecular weight determinations of the observation that much of the C-terminal region of by gel analysis could easily account for the differences. 
The glucosyltransferase is not essential for function, it is tempt- deduced amino acid composition of the glucosyltransferase ing to speculate that the commonly observed heterogeneity indicates that it is a highly hydrophilic protein, containing of molecular sizes in enzyme preparations (18, 25) is due to 11.5% basic amino acids, 12.6% acidic amino acids, and sequential degradation by proteolytic action on this end of 41.6% polar amino acids. the molecule. As yet, however, there is insufficient evidence The restriction map generated from nucleotide sequence to confirm this idea. analysis is in agreement with previous maps established for The C-terminal region of the glucosyltransferase protein this fragment and also supports previous speculations con- contains two sets of repeated sequences, the A repeating cerning the location of probable transcription and translation unit present six times and the B repeating unit present twice. initiation sites (27, 27a). These sites are similar to transcrip- The A repeating units exhibit some variability, whereas the tion and translation sites reported for other streptococcal B repeating units are completely identical. The manner or genes (6, 7, 14, 17). Downstream of the coding region, a sequence in which these duplications and changes occurred single termination codon is present, but insufficient sequence is not obvious. However, duplications in other streptococcal

Alpha-amylase (EC 3.2.1.1)

69a   HSVIQNGYAFTDRYDID---ASKYGNAAELKSL
      *:: *-::, * ::-- :-:-:*
840   FQSFATKEEEYTNVVIANNVDKFVSWGITDFEMAPQYVSSTDGQFLDSVIQNGYAFTDRYDLGMSKANKYGTADQLVKA
40    FEWYTPNDGQHWK-RLQNDAEHLSDIGITAVWIPPAYKGLSQSD-NGYGPYDLYDLGE-FQQKGYVRTKYGTKSELQDA

100a  IGALHGKGVQAIADIVINHRCA
919   IKALHAKGLKVMADWVPDQMYT
116b  IGSLHRRNVQVYGD

Glycogen phosphorylase (EC 2.4.1.1)

1107 YMVYGAQNIKGSNYYFLANGAALRNTVYTD-AQGQNHYYGNDGKRYENGYQQFGNDSWRYFKNGVMALGLTTVDGHVQY

      * --:- --- :--:-- --:-. . . *0::::--: :: : *---::----:.. ..
37c   FTLVKNRNVATPRDYYFAHALTVRDHLVGRWIRTQQHYYEKDPKRI--YYLSLQFYMGRTLQNTMVNLALENACDEADY

FIG. 7. Alignment of predicted amino acid sequences of glucosyltransferase with regions of alpha-amylase from barley (a), alpha-amylase from Bacillus amyloliquefaciens (b), and glycogen phosphorylase from rabbit (c). The sequences were aligned by the PRTALN program of Wilbur and Lipman (33). Identical (:) and conserved ( ) amino acids are indicated. The numbers at the left indicate the position number of the first amino acid of each protein shown.

genes and proteins have been recently reported, e.g., the group A type 6 M protein (14) and the group B immunoglobulin G-binding protein (6, 11).

The functional role of the repeating units is not clear, although the region containing them is essential both for glucosyltransferase activity and for binding of glucans. It is of interest that the part of the glucosyltransferase protein showing homology with glycogen phosphorylase spans the region containing the first two A repeat units. This region of glycogen phosphorylase is thought to be involved in substrate binding or catalysis (21) but is distinct from the region (amino acids 401 to 443) thought to be involved in the storage site which binds heptamylose (10). The regions of homology with the alpha-amylase proteins are found further upstream, not too distant from the repeating unit. The common feature of all three enzymes is the ability to bind to glucans, and it seems likely that the identified regions of glucosyltransferase homology are also involved in or essential for glucan binding.

The relationship of the gtfI gene to other known glucosyltransferase genes is of considerable interest, especially in view of the different restriction maps reported (1, 22, 24, 27a) and the evolutionary distance between S. mutans (G+C content, 36 to 38%) and S. sobrinus (G+C content, 44 to 46%) (4). In the accompanying paper, Shiroza et al. report a sequence analysis of the S. mutans gtfB gene, which specifies a glucosyltransferase that also produces insoluble glucans (30).

ACKNOWLEDGMENT

We thank David R. Lorenz for his many contributions to this work and Tricia Stalker for excellent technical assistance.

This research was supported by Public Health Service grant DE 08191 from the National Institutes of Health and by the Medical Research Council.

LITERATURE CITED

1. Aoki, H., T. Shiroza, M. Hayakawa, S. Sato, and H. K. Kuramitsu. 1986. Cloning of a Streptococcus mutans glucosyltransferase gene coding for insoluble glucan synthesis. Infect. Immun. 53:587-594.
2. Burne, R. A., B. Rubinfeld, W. H. Bowen, and R. E. Yasbin. 1986. Cloning and expression of a Streptococcus mutans glucosyltransferase gene in Bacillus subtilis. Gene 40:201-209.
3. Ciardi, J. 1983. Purification and properties of glucosyltransferases of Streptococcus mutans: a review, p. 51-64. In R. J. Doyle and J. E. Ciardi (ed.), Glucosyltransferases, glucans, sucrose and dental caries (a special supplement to Chemical Senses). Information Retrieval Limited, Washington, D.C.
4. Coykendall, A. L., and K. B. Gustafson. 1986. of Streptococcus mutans, p. 21-28. In S. Hamada, S. M. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
5. Decad, G. M., and H. Nikaido. 1976. Outer membrane of gram-negative bacteria. XII. Molecular-sieving function of cell wall. J. Bacteriol. 128:325-336.
6. Fahnestock, S. R., P. Alexander, J. Nagle, and D. Filpula. 1986. Gene for an immunoglobulin-binding protein from a group G streptococcus. J. Bacteriol. 167:870-880.
7. Ferretti, J. J., K. S. Gilmore, and P. Courvalin. 1986. Nucleotide sequence analysis of the gene specifying the bifunctional 6'-aminoglycoside acetyltransferase 2"-aminoglycoside phosphotransferase enzyme in Streptococcus faecalis and identification and cloning of gene regions specifying the two activities. J. Bacteriol. 167:631-638.
8. Gilmore, M. S., K. S. Gilmore, and W. Goebel. 1985. A new strategy for "ordered" DNA sequencing based on a novel method for the rapid purification of near milligram quantities of a cloned restriction fragment. Gene Anal. Tech. 2:108-114.
9. Gilpin, M. L., R. R. B. Russell, and P. Morrissey. 1985. Cloning and expression of two Streptococcus mutans glucosyltransferases in Escherichia coli K-12. Infect. Immun. 49:414-416.
10. Goldsmith, E., and R. J. Fletterick. 1983. Oligosaccharide conformation and protein saccharide interactions in solution. Pure Appl. Chem. 55:577-588.
11. Guss, B., M. Eliasson, A. Olsson, M. Uhlen, A.-K. Frej, H. Jornvall, J.-I. Flock, and M. Lindberg. 1986. Structure of the IgG-binding regions of streptococcal protein G. EMBO J. 5:1567-1575.
12. Hamada, S., and H. D. Slade. 1980. Biology, immunology, and cariogenicity of Streptococcus mutans. Microbiol. Rev. 44:331-384.
13. Hazelbauer, G. L., and S. Harayama. 1979. Mutants in transmission of chemotactic signals from two independent receptors of E. coli. Cell 16:617-625.
14. Hollingshead, S. K., V. F. Fischetti, and J. R. Scott. 1986. Complete nucleotide sequence of type 6 M protein of the group A streptococcus. J. Biol. Chem. 261:1677-1686.
15. Konat, G., H. Offner, and J. Mellah. 1984. Improved sensitivity for detection and quantitation of glycoproteins on polyacrylamide gels. Experientia 40:303-304.
16. Kuhn, S., H.-J. Fritz, and P. Starlinger. 1979. Close vicinity of IS1 integration sites in the leader sequence of the gal operon of E. coli. Mol. Gen. Genet. 167:235-241.
17. Malke, H., B. Roe, and J. J. Ferretti. 1984. Nucleotide sequence of the streptokinase gene from Streptococcus equisimilis H46A. Gene 34:337-362.
18. Mukasa, H. 1986. Properties of Streptococcus mutans glucosyltransferases, p. 121-132. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
19. Muller-Hill, B., L. Crapo, and W. Gilbert. 1968. Mutants that make more lac repressor. Proc. Natl. Acad. Sci. USA 59:1259-1264.
20. Oliver, D. 1985. Protein secretion in Escherichia coli. Annu. Rev. Microbiol. 39:615-648.
21. Palm, D., R. Goerl, and K. J. Burger. 1985. Evolution of catalytic and regulatory sites in phosphorylases. Nature (London) 313:500-503.
22. Pucci, M. J., K. R. Jones, and F. L. Macrina. 1987. Evidence for a duplicated DNA sequence associated with a glucosyltransferase gene in Streptococcus mutans, p. 205-208. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
23. Pucci, M. J., and F. L. Macrina. 1986. Molecular organization and expression of the gtfA gene of Streptococcus mutans LM7. Infect. Immun. 54:77-84.
24. Robeson, J. P., R. G. Barletta, and R. Curtiss III. 1983. Expression of a Streptococcus mutans glucosyltransferase gene in Escherichia coli. J. Bacteriol. 155:211-221.
25. Russell, R. R. B., E. Abdulla, M. L. Gilpin, and K. Smith. 1986. Characterization of Streptococcus mutans surface antigens, p. 61-70. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
26. Russell, R. R. B., D. Coleman, and G. Dougan. 1985. Expression of a gene for glucan-binding protein from Streptococcus mutans in Escherichia coli. J. Gen. Microbiol. 131:295-299.
27. Russell, R. R. B., and M. L. Gilpin. 1987. Identification of virulence components of mutans streptococci, p. 201-204. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
27a. Russell, R. R. B., M. L. Gilpin, H. Mukasa, and G. Dougan. 1987. Characterization of glucosyltransferase expressed from a Streptococcus sobrinus gene cloned in Escherichia coli. J. Gen. Microbiol. 133:935-944.
28. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
29. Segrest, J. P., and R. L. Jackson. 1972. Molecular weight determinations of glycoproteins by polyacrylamide gel electrophoresis in sodium dodecyl sulphate. Methods Enzymol. 28:54-63.
30. Shiroza, T., S. Ueda, and H. K. Kuramitsu. 1987. Sequence analysis of the gtfB gene from Streptococcus mutans. J. Bacteriol. 169:4263-4270.
31. von Heijne, G. 1983. Patterns of amino acids near signal sequence cleavage sites. Eur. J. Biochem. 133:17-21.
32. Weeks, C. R., and J. J. Ferretti. 1986. Nucleotide sequence of the type A streptococcal exotoxin (erythrogenic toxin) gene from bacteriophage T12. Infect. Immun. 52:144-150.
33. Wilbur, W. J., and D. J. Lipman. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80:726-730.
34. Witholt, B., M. Boekhout, M. Brock, J. Kingma, H. van Heerikhuizen, and L. de Leij. 1976. An efficient and reproducible procedure for the formation of spheroplasts from variously grown Escherichia coli. Anal. Biochem. 74:160-170.
35. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of M13 mp18 and pUC19 vectors. Gene 33:103-119.
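
Supplementary note on the amino acid composition. The Discussion describes the glucosyltransferase as containing 11.5% basic, 12.6% acidic, and 41.6% polar amino acids, but it does not say which residues were placed in each class. The sketch below recomputes class percentages from the residue counts in Table 1. The class memberships (basic = Arg, His, Lys; acidic = Asp, Glu; polar = Asn, Gln, Gly, Ser, Thr, Tyr) are assumptions made for illustration and are not stated in the paper; the names `BASIC`, `ACIDIC`, `POLAR`, `class_percent`, and `summarize` are likewise hypothetical.

```python
# Residue counts copied from Table 1 (three-letter codes used as keys).
COUNTS_WITH_SIGNAL = {
    "Ala": 143, "Arg": 43, "Asn": 109, "Asp": 137, "Glu": 65,
    "Gln": 93, "Gly": 135, "His": 19, "Ile": 50, "Leu": 92,
    "Lys": 122, "Met": 27, "Phe": 64, "Pro": 30, "Ser": 106,
    "Thr": 124, "Trp": 25, "Tyr": 98, "Val": 115,
}
COUNTS_MATURE = {
    "Ala": 136, "Arg": 41, "Asn": 108, "Asp": 137, "Glu": 63,
    "Gln": 93, "Gly": 134, "His": 18, "Ile": 49, "Leu": 90,
    "Lys": 117, "Met": 24, "Phe": 63, "Pro": 30, "Ser": 101,
    "Thr": 122, "Trp": 24, "Tyr": 98, "Val": 111,
}

# ASSUMPTION: these groupings are not given in the paper; they are one
# conventional way of classifying side chains, chosen for illustration.
BASIC = ("Arg", "His", "Lys")
ACIDIC = ("Asp", "Glu")
POLAR = ("Asn", "Gln", "Gly", "Ser", "Thr", "Tyr")


def class_percent(counts, residues):
    """Percentage of all residues in `counts` that belong to `residues`."""
    total = sum(counts.values())
    return 100.0 * sum(counts[r] for r in residues) / total


def summarize(counts):
    """Return the basic/acidic/polar percentages rounded to one decimal."""
    classes = (("basic", BASIC), ("acidic", ACIDIC), ("polar", POLAR))
    return {name: round(class_percent(counts, members), 1)
            for name, members in classes}


if __name__ == "__main__":
    print("with signal peptide:   ", summarize(COUNTS_WITH_SIGNAL))
    print("without signal peptide:", summarize(COUNTS_MATURE))
```

Under these assumed groupings, the counts for the precursor (with signal peptide) give 11.5%, 12.6%, and 41.6%, matching the figures quoted in the Discussion, whereas the counts for the mature protein give slightly different values. This suggests, but does not establish, that the published percentages were calculated from the full 1,597-residue translation product.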