
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Feb. 1997, p. 355-363, Vol. 63, No. 2. 0099-2240/97/$04.00+0. Copyright © 1997, American Society for Microbiology

Cloning of Cellobiose Phosphoenolpyruvate-Dependent Phosphotransferase Genes: Functional Expression in Recombinant Escherichia coli and Identification of a Putative Binding Region for Disaccharides†

XIAOKUANG LAI,¹ F. C. DAVIS,¹ R. B. HESPELL,² AND L. O. INGRAM¹*

Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611,¹ and Fermentation Biochemistry Unit, National Center for Agricultural Utilization Research, USDA Agricultural Research Service, Peoria, Illinois 61604²

Received 9 July 1996/Accepted 4 November 1996

Genomic libraries from nine cellobiose-metabolizing bacteria were screened for cellobiose utilization. Positive clones were recovered from six libraries, all of which encode phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) proteins. Clones from Bacillus subtilis, Butyrivibrio fibrisolvens, and Klebsiella oxytoca allowed the growth of recombinant Escherichia coli in cellobiose-M9 minimal medium. The K. oxytoca clone, pLOI1906, exhibited an unusually broad substrate range (cellobiose, arbutin, salicin, and methylumbelliferyl derivatives of glucose, cellobiose, mannose, and xylose) and was sequenced. The insert in this plasmid encoded the carboxy-terminal region of a putative regulatory protein, cellobiose permease (single polypeptide), and phospho-β-glucosidase, which appear to form an operon (casRAB). Subclones allowed both casA and casB to be expressed independently, as evidenced by in vitro complementation. An analysis of the translated sequences from the EIIC domains of cellobiose, aryl-β-glucoside, and other disaccharide permeases allowed the identification of a 50-amino-acid conserved region. A disaccharide consensus sequence is proposed for the most conserved segment (13 amino acids), which may represent part of the EIIC for binding and phosphorylation.

Cellulose, a β-1,4-linked polymer of glucose, represents approximately half of the dry weight of plant cell walls. Each year 4 × 10¹⁰ metric tons of this polymer is produced by photosynthesis and degraded by microbial cellulases (8). Recent interest in cellulose hydrolysis has focused on animal nutrition (32) and on the potential development of environmentally benign processes for microbial conversion into fuel ethanol (57).

In nature cellulose is solubilized by a combination of endoglucanase and cellobiohydrolase activities with cellobiose as the primary product. Cellobiose is a potent inhibitor of these enzymes and must be continually removed by microbial activity (8, 58). This disaccharide is among the most abundant soluble substrates in nature for microbial growth and is metabolized in preference to glucose by some rumen organisms (17, 53).

The ability to utilize cellobiose is widespread among gram-negative, gram-positive, and Archaeal genera (8). Bacterial cellobiase (β-glucosidase) activity is typically cell associated and may hydrolyze cellobiose to glucose prior to uptake in some cases. Multiple systems for cellobiose utilization within a single organism do not appear uncommon (12, 16, 58). Cellobiose uptake in Clostridium thermocellum is energized directly by ATP hydrolysis (35). Inhibitor studies have established the presence of active transport systems for cellobiose in Ruminococcus flavefaciens (17), R. albus (53), Fibrobacter succinogenes (32), and Streptomyces granaticolor (23) which may be coupled to ion gradients. Cellobiose phosphoenolpyruvate-dependent phosphotransferase systems (PTS) have been reported in Streptococcus bovis (34), Bacillus stearothermophilus (27), and B. subtilis (15a). Cryptic PTS genes for cellobiose utilization are present in Escherichia coli (16). Although the mechanisms of uptake are unknown, Microbispora bispora (58) contains two genes which encode cellobiase activity. R. flavefaciens, R. albus, Cellulomonas uda, and C. flavigena (48) contain intracellular phosphorylases which cleave cellobiose into glucose and glucose-phosphate, conserving energy from the glycosidic bond.

Four PTS operons which encode proteins (EII permease and phospho-β-glucosidase) for cellobiose utilization have been sequenced: B. subtilis celRABCD (15a), B. stearothermophilus celRABCD (27), and two cryptic operons from E. coli, celABCDF (37) and ascFG (arbutin, salicin, and cellobiose) (16). The three permeases encoded by cel operons are quite similar but share little sequence similarity with the protein encoded by ascF. These four operons also encode three different types of phospho-β-glucosidases. Only the E. coli ascB and B. subtilis celC enzymes appear similar (56% identity).

In this study, we have used 4-methylumbelliferyl-β-D-glucoside (MUG) and cellobiose-MacConkey agar to screen nine different genomic libraries for cellobiose utilization genes which are functionally expressed in E. coli. Subsequent sequencing of the most active clone and sequence analysis allowed the identification of a highly conserved region within the EIIC domain which may be involved in disaccharide binding.

MATERIALS AND METHODS

Bacterial strains, growth conditions, and plasmids. Bacteria and plasmids used in this study are described in Table 1. B. coagulans was grown at 55°C in Difco Tryptic Soy medium. K. oxytoca and E. coli were grown in Luria-Bertani (LB) medium (2) at 30 and 37°C, respectively. Prevotella (Bacteroides) ruminicola, Butyrivibrio fibrisolvens, Selenomonas ruminantium, and Streptococcus bovis were grown anaerobically at ~oC in RGM medium (20).

* Corresponding author. Mailing address: Department of Microbiology and Cell Science, Museum Road, Bldg. 981, P.O. Box 110700, University of Florida, Gainesville, FL 32611. Phone: (352) 392-8176. Fax: (352) 392-5922. E-mail address: [email protected].
† Florida Agricultural Experiment Station publication no. R-05241.


TABLE 1. Bacterial strains and plasmids used in this study

Strain or plasmid | Genetic characteristic(s) | Source or reference

Bacterial strains
Bacillus coagulans XL-55-60 | Prototroph |
B. subtilis YB886 | trpC2 xin-1 | 61
Prevotella ruminicola 23 | | 20
Butyrivibrio fibrisolvens H17c | | 20
Klebsiella oxytoca P2 | | 57
Selenomonas ruminantium HD4 | | 20
Streptococcus bovis 26 | | 20
Escherichia coli DH5α | F- lacZΔM15 recA1 endA1 hsdR17 (rK- mK-) supE44 | BRL(a)
E. coli JLT2 | F- mcrB mrr hsdS20 (rB- mB-) recA13 supE44 ptsI | 54
E. coli JLT3 | F- mcrB mrr hsdS20 (rB- mB-) recA13 supE44 ptsH | 54
E. coli JL630 | bglR67 bglB::λlacZ bglA7 ΔlacX74 | 30

Plasmids
pUC18 | bla amp lacI' Z' | BRL(a)
pLOI902 | pUC18 with B. stearothermophilus cel operon | 27
pLOI1901 | pUC18 with B. coagulans β-glucoside PTS genes | This study
pLOI1902 | pUC18 with B. subtilis β-glucoside PTS genes | This study
pLOI1903 | pUC18 with B. subtilis β-glucoside PTS genes | This study
pLOI1904 | pUC18 with S. bovis β-glucoside PTS genes | This study
pLOI1905 | pUC18 with B. fibrisolvens β-glucoside PTS genes | This study
pLOI1906 | pUC18 with K. oxytoca P2 β-glucoside PTS genes | This study
pLOI1907 | pUC18 with K. planticola β-glucoside PTS genes | This study
pLOI1998 | EagI/XbaI deletion of pLOI1906 retaining casR'AB | This study
pLOI1992 | SfiI/XbaI deletion of pLOI1906 (3' end of casA and all of casB) |
pLOI1997 | SalI deletion of pLOI1998 (3' end of casB) | This study
pLOI1974 | Frameshift mutation in casA of pLOI1998 | This study
pLOI1975 | NcoI/BstXI deletion of pLOI1998 (internal deletion in casA) | This study

(a) Life Technologies, Inc., Gaithersburg, Md.

Recombinant E. coli was evaluated for carbohydrate utilization using LB agar containing 10 mg of 4-methylumbelliferyl-glucoside liter⁻¹, M9 agar (2) containing arginine (50 mg liter⁻¹) and 2 g of cellobiose liter⁻¹, and MacConkey agar base containing 10 g of β-glucoside liter⁻¹ (cellobiose, arbutin, or salicin). Ampicillin (50 µg ml⁻¹) was added when appropriate for the selection. Chromogenic substrates, β-glucosides, and ampicillin were purchased from the Sigma Chemical Company (St. Louis, Mo.).

DNA manipulation. Chromosomal DNA was isolated essentially as described by Cutting and Vander Horn (9). Standard procedures were used for the construction, isolation, and analysis of plasmids (46).

Genomic libraries. Genomic libraries of C. thermocellum and K. planticola were generously provided by Arnold Demain (Massachusetts Institute of Technology, Cambridge, Mass.) and J. Doran (Central Michigan University, Mt. Pleasant, Mich.), respectively. Genomic libraries for seven other organisms were constructed in DH5α by ligating Sau3AI partial digestion products (4- to 6-kbp fragments) into the BamHI site of pUC18. From 2,000 to 10,000 recombinant colonies were pooled for each library.

Southern hybridization. K. oxytoca chromosomal DNA was digested with restriction endonuclease (BamHI, ClaI, HincII, HpaI, or PstI), separated by agarose gel electrophoresis, and transferred to Zeta-probe GT membranes (Bio-Rad Laboratories, Richmond, Calif.). A BstEII DNA fragment within the cas operon of pLOI1906 served as the probe after being labelled with digoxigenin (Genius System I; Boehringer Mannheim Biochemicals, Indianapolis, Ind.). Membranes were hybridized at 55°C and developed as recommended by the manufacturer.

DNA sequencing and sequence analysis. Wizard Miniprep columns (Promega, Madison, Wis.) were used for plasmid purification. Dideoxy sequencing was performed by using fluorescent primers (forward, 5'-CACGACGTTGTAAAACGAC-3'; reverse, 5'-ATAACAATTTCACACAGGA-3') (LI-COR, Lincoln, Neb.). Extension reactions were performed with a Perkin Elmer GeneAmp PCR System 9600 (Perkin Elmer-Cetus, Norwalk, Conn.) using a SequiTherm Long-Read Cycle Sequencing Kit-LC (Epicentre Technologies, Madison, Wis.) (30 cycles; denaturation for 30 s at 95°C, annealing for 30 s at 60°C, and extension for 1 min at 70°C). Extension products were separated and read with a LI-COR DNA Sequencer model 4000L. Sequences were analyzed by using the Wisconsin Genetics Computer Group (GCG) software package (10) and the National Center for Biotechnology Information BLAST network service (4). Global alignments were investigated using MACAW (51).

In vitro assay of PTS activity. The combined PTS phosphorylation of β-glucoside and cleavage activity of recombinant E. coli were determined by a modification of the Kricker and Hall (26) procedure. Overnight cultures (15 h) were harvested by centrifugation (5,000 × g, 5 min, 4°C), washed twice, and resuspended in 50 mM NaKHPO₄ buffer (pH 7.2) to a density of approximately 50 optical density units at 550 nm ml⁻¹. Cells were disrupted by two passages through a French pressure cell at 20,000 lb in⁻². Lysates were assayed at 37°C in 50 mM NaKHPO₄ buffer (pH 7.2) containing 5 mM MgCl₂, 2 mM p-nitrophenyl-β-D-1,4-glucopyranoside (PNPG), and 2 mM phosphoenolpyruvate. Reactions were terminated by adding an equal volume of 1 M Na₂CO₃. After centrifugation (5,000 × g, 5 min), p-nitrophenol was measured at 410 nm. Protein was estimated with the Bradford Reagent (Bio-Rad Laboratories, Richmond, Calif.) with bovine serum albumin as a standard.

Nucleotide sequence accession number. The nucleotide sequence data have been submitted to GenBank and assigned accession number U61727.

RESULTS AND DISCUSSION

Isolation of E. coli recombinants expressing β-glucosidase activity. DH5α recombinants harboring plasmids from nine genomic libraries were screened for β-glucosidase activity on LB-MUG plates containing ampicillin and on MacConkey agar containing cellobiose (Table 2). All libraries were prepared from bacterial strains which metabolize cellobiose. Positive colonies were recovered from six libraries: B. coagulans, B. subtilis, B. fibrisolvens, S. bovis, K. oxytoca, and K. planticola. No MUG-positive or MacConkey-positive clones were recovered from the genomic libraries of P. ruminicola, S. ruminantium, and C. thermocellum despite repeated attempts.

With the exception of B. subtilis (two types), positive clones from each genomic library contained related DNA segments based on analyses of restriction fragments and were considered to be siblings. A single representative of each type was retained for further study. The previously cloned PTS cel operon from B. stearothermophilus (27) was also included for comparison. Recombinant DH5α strains harboring plasmid pLOI902 from B. stearothermophilus, pLOI1901 from B. coagulans, pLOI1902 and pLOI1903 from B. subtilis, pLOI1904 from S. bovis, and pLOI1906 from K. oxytoca appeared much more active on

TABLE 2. Hydrolysis of glycosides by recombinant E. coli(a)

Columns: Source organism(b); Plasmid; Result(c) on LB agar(d) with MUG, MUC, MUM, MUA, MUX; on MacConkey agar(e) with Cel, Arb, Sal; in M9 minimal medium with cellobiose

Bst  pLOI902  + +,- -1-:-
Bco  pLOI1901  ~ -/+
Bsu  pLOI1902  +
Bsu  pLOI1903
Sbo  pLOI1904  -i-/- +
Bfi  pLOI1905  +/- +/- +
Kox  pLOI1906  ... +/- ~ +
Kpl  pLOI1907(f)  ... NA NA NA NA NA NA NA NA
Kox(casA+B+)  pLOI1998  +1-
Kox(casB-)  pLOI1997  -/+ -/+ -/+
Kox(casA-)  pLOI1975
Vector  pUC18  +

(a) Plasmids were tested in two different hosts: strain JL630 (bglA-negative mutant) and strain DH5α (wild type for bglA).
(b) Source organisms are listed in Table 1 and are designated here by the first letter of each genus (uppercase) followed by the first two letters of each species (lowercase).
(c) Results are indicated as positive (+) or negative (-) for substrate utilization. Hosts lacking plasmids were negative in all cases. Activities were scored after 24 h of incubation at 37°C. Most results with recombinants of JL630 and DH5α were the same. Where these differed, both results are shown separated by a slash (JL630/DH5α).
(d) All strains were negative when tested with 4-methylumbelliferyl α-L-arabinofuranoside and 4-methylumbelliferyl α-D-mannopyranoside.
(e) Cel, cellobiose; Arb, arbutin; Sal, salicin.
(f) Plasmid pLOI1907 was unstable in E. coli. Results are not available (NA) for other substrates.
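The abbreviation rule in the Table 2 footnote (first letter of the genus in uppercase, followed by the first two letters of the species in lowercase) can be written out as a short sketch. The function name `source_code` is hypothetical and is introduced here only for illustration; the organism names are those listed in Table 1.

```python
def source_code(binomial: str) -> str:
    """Abbreviate 'Genus species' the way the Table 2 footnote describes."""
    genus, species = binomial.split()[:2]
    # First genus letter (uppercase) + first two species letters (lowercase).
    return genus[0].upper() + species[:2].lower()


ORGANISMS = [
    "Bacillus stearothermophilus",
    "Bacillus coagulans",
    "Bacillus subtilis",
    "Streptococcus bovis",
    "Butyrivibrio fibrisolvens",
    "Klebsiella oxytoca",
    "Klebsiella planticola",
]

for name in ORGANISMS:
    print(source_code(name), name)
```

The resulting codes (Bst, Bco, Bsu, Sbo, Bfi, Kox, Kpl) are the row labels used in Table 2.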

MUG indicator plates than recombinants harboring plasmids from B. fibrisolvens (pLOI1905) or K. planticola (pLOI1907). Recombinants harboring pLOI902, pLOI1903, pLOI1905, and pLOI1906 were positive on cellobiose-MacConkey plates. Plasmid pLOI1907 was unstable and not examined further.

All MUG-positive clones recovered express β-glucoside PTS genes. Plasmids from positive clones of DH5α were transformed into E. coli JLT2 and JLT3, which contain mutations in the PTS general proteins, EI (ptsI) and HPr (ptsH), respectively. All resulting transformants were unable to hydrolyze MUG, produce red colonies on cellobiose-MacConkey, or grow on cellobiose-M9 medium, demonstrating an absolute dependence on the PTS general proteins for activity. Although other mechanisms for cellobiose utilization have been described previously (8), such as the ATP-dependent uptake by C. thermocellum (35), only clones which express PTS genes were recovered by our screening procedures. Possible reasons for the absence of clones encoding other cellobiose utilization systems include the localization of transport and cleavage enzymes at different chromosomal sites, the presence of cellobiose phosphorylases which may produce a nonfluorescent product (methylumbelliferyl phosphate), or poor gene expression in E. coli.

Substrate range. All positive clones were further evaluated for β-glucoside uptake and hydrolysis by using chromogenic model substrates in LB, β-glucosides in MacConkey agar, and cellobiose in M9 minimal medium (Table 2). Strain DH5α constitutively expresses the bglA gene encoding a phospho-β-glucosidase. To eliminate potential confusion from this related activity, a bglA mutant (strain JL630) was also used as a host (Table 2). Results obtained with recombinant DH5α and JL630 were similar in most cases. Four JL630 clones (pLOI902 from B. stearothermophilus, pLOI1903 from B. subtilis, pLOI1905 from B. fibrisolvens, and pLOI1906 from K. oxytoca) were able to utilize cellobiose, as evidenced by acid production on cellobiose-MacConkey plates. Three clones utilized cellobiose as a sole carbon and energy source on cellobiose-M9 medium and utilized aryl-β-glucosides (arbutin and salicin). Two of these, pLOI1903 and pLOI1906, grew well in cellobiose-M9 broth (doubling times of 6 and 5 h, respectively). Three clones hydrolyzed 4-methylumbelliferyl-β-D-cellobioside (MUC), a cellotriose analog.

Recombinants were also tested with 4-methylumbelliferyl derivatives of mannose, arabinose, and xylose (Table 2). These substrates are analogs of soluble disaccharide products from the depolymerization of hemicelluloses in plant cell walls. 4-Methylumbelliferyl-β-D-mannopyranoside (MUM) and 4-methylumbelliferyl-β-D-xylopyranoside (MUX) were hydrolyzed by only one clone each, DH5α(pLOI1906) and DH5α(pLOI1904), respectively. All clones were negative with 4-methylumbelliferyl-α-L-arabinofuranoside.

Unexpected results were obtained with 4-methylumbelliferyl-α-L-arabinopyranoside (MUA) (Table 2), a substrate also hydrolyzed by β-galactosidase (24). As expected, MUA was hydrolyzed by lacZ-proficient strains but not by JL630 or DH5α, which contain truncated lacZ genes. Both strains JL630(pUC18) and DH5α(pUC18) exhibited α-complementation of β-galactosidase, and both hydrolyzed MUA. Hydrolysis of MUA by recombinants containing heterologous cellobiose genes also resulted from α-complementation. MUA-positive recombinants were also positive with 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (data not shown).

Plasmids pLOI1903 (B. subtilis) and pLOI1906 (K. oxytoca) were selected for further study. PTS genes on these plasmids were expressed well in E. coli. During the course of this investigation, sequences for the PTS^cellobiose genes from B. subtilis were submitted to GenBank (15a) which were essentially identical to those in our clone. Our further studies have focused on the K. oxytoca PTS genes (denoted cas), which encode enzymes with an unusually broad range of substrates (cellobiose, arbutin, salicin, MUG, MUC, MUM, and MUX) and support the most rapid growth of recombinant E. coli on cellobiose-minimal medium.

Localization of K. oxytoca cas coding regions and characterization of encoded activities. The K. oxytoca genes in pLOI1906 were designated cas for utilization of cellobiose, arbutin, and salicin. Southern hybridization analysis with the internal BstEII

fragment from pLOI1906 as a probe confirmed the origin of this DNA. Probe hybridized to a single band in five different restriction digests of K. oxytoca chromosomal DNA.

Subclones were constructed to localize the coding regions (Fig. 1; Table 2). All activities were retained by DH5α(pLOI1998), allowing the deletion of a 2.5-kbp EagI-to-XbaI fragment. Subclones containing this 3.8-kbp EagI-SacI fragment from pLOI1906 were inactive unless supplied with an upstream promoter in the same orientation as the lac promoter in pLOI1998. Cellobiose and salicin activities were lost after deletion of the two SalI fragments, but the resulting DH5α(pLOI1997) remained positive with MUG, MUM, and arbutin. These activities were absent in the bglA-deficient host, JL630(pLOI1997). Since the native bglA-encoded product (DH5α) hydrolyzes phosphoarbutin but not phosphosalicin (39), the SalI deletion must inactivate the K. oxytoca gene encoding phospho-β-glucosidase. Attempts to delete the region adjacent to the lac promoter resulted in the loss of all activities. All activities were also lost upon insertion of a frameshift mutation into the NcoI site (ligation after Klenow treatment) to produce pLOI1974 and after deletion of the NcoI-BstXI fragment to produce pLOI1975. These results are consistent with the presence of a gene(s) encoding the PTS EII uptake complex in the region between the phospho-β-glucosidase and the lac promoter, as illustrated in Fig. 1.

Further information concerning the substrate range of individual cas-encoded activities can be deduced by a comparison of activities expressed by DH5α and JL630 recombinants harboring native and deleted plasmids. Provided polar effects do not prevent independent expression (addressed below), the elimination of all positive reactions by a frameshift mutation in cas indicates that each substrate (cellobiose, arbutin, salicin, MUG, MUC, MUM, and MUX) is transported and phosphorylated by the K. oxytoca cas-encoded EII prior to hydrolysis by either the cas product or bglA product. Retention of all activities by JL630(pLOI1998), a bglA-deficient host, demonstrates that the cas-encoded cleavage enzyme is active on all substrates which tested positive. Retention of MUG, MUM, and arbutin activities by the bglA-proficient host (DH5α) harboring pLOI1997 (partial deletion of the cas-encoded cleavage enzyme) indicates that E. coli BglA hydrolyzes the phosphorylated products of MUG, MUM, and arbutin but is inactive with the phosphorylated products of MUC, MUX, cellobiose, and salicin.

In vitro activity of the K. oxytoca PTS^cellobiose in recombinant E. coli. The in vitro release of p-nitrophenol from PNPG was used to measure the combined casAB-encoded activities (phosphorylation and cleavage) in JL630, a bglA-deficient host (Fig. 2). Activity in cell extracts of JL630(pLOI1906) remained linear for at least 60 min. No activity was observed with JL630(pUC18) or with JL630(pLOI1997), which contains a deletion in the casB-encoded phospho-β-glucosidase. However, a low

[Figure 1 map: scale 0 to 6 kbp; lacZ promoter; constructs pLOI1906, pLOI1998, pLOI1997, pLOI1992, pLOI1974, and pLOI1975.]

FIG. 1. Restriction map of DNA fragments containing K. oxytoca P2 casRAB genes (solid lines). R' denotes the 3' end of casR encoding a putative regulatory protein. A and B refer to casA and casB, respectively. Polylinker regions of the vector, pUC18, are shown as dashed lines. Arrow on pLOI1974 indicates the site of a frameshift mutation. Activities are indicated on the right. MUG positive (+) indicates hydrolysis of the fluorescent analog. Cellobiose positive (+) indicates growth in cellobiose-minimal medium. Negative (-) indicates absence of hydrolysis or growth.

[Figure 2 plots: panels A and B; x axis, Time (min).]

FIG. 2. In vitro activity of the K. oxytoca casAB-encoded PTS proteins in recombinant E. coli JL630. Activity was measured as the release of p-nitrophenol (PNP) from p-nitrophenyl-β-D-glucopyranoside. (A) Effect of a C-terminal deletion in casB. Symbols: ●, pLOI1906 (original clone containing casR'AB); ▲, pLOI1997 (casR'AB', C-terminal deletion in casB); ○, pUC18 (control). (B) In vitro complementation of casR'A'B with casR'AB'. Symbols: ○, pLOI1975 (casR'A'B, internal deletion in casA); ●, 1:1 mixture of cell extracts from JL630(pLOI1975) and JL630(pLOI1997); ▲, 1:1 mixture of cell extracts from JL630(pLOI1975) and JL630(pUC18). Results represent an average of those from three experiments.
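A time course such as the one in Fig. 2 is usually judged linear by fitting a straight line to p-nitrophenol release versus time. The sketch below shows one way to do that with an ordinary least-squares fit. The readings are hypothetical placeholders chosen for illustration and are not data from this study, and the function name `linear_fit` is likewise an assumption.

```python
def linear_fit(times, values):
    """Ordinary least-squares line through (time, value) points.

    Returns (slope, intercept, r_squared).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((v - mean_v) ** 2 for v in values)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    # r_squared close to 1 means the release is close to linear in time.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


# Hypothetical readings (NOT from the paper): time in minutes, PNP in arbitrary units.
times_min = [0, 15, 30, 45, 60]
pnp_released = [0.0, 0.9, 1.8, 2.7, 3.6]

slope, intercept, r_squared = linear_fit(times_min, pnp_released)
print(f"rate = {slope:.3f} units/min, r^2 = {r_squared:.3f}")
```

Dividing the fitted rate by the protein content of the extract would give a specific activity; that step is omitted here because it depends on calibration values the text does not state.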

[Figure 3 sequence panel: 3,850-nucleotide sequence with the deduced amino acid sequences of CasR' (carboxy-terminal region), CasA, and CasB. Labeled features: -35 and -10 regions, ribosomal binding sites (RBS), EIIB signature sequence, conserved sequence for cellobiose EIIC, EIIA signature sequence, and family 1 signature sequence. The sequence is available from GenBank under accession number U61727.]

FIG. 3. Nucleotide sequence of K. oxytoca P2 DNA fragment in pLOI1998. The deduced amino acid sequences for each ORF are placed below the first nucleotide of the corresponding codon. Both are numbered on the right. Putative Shine-Dalgarno sequences for ribosomal binding (RBS) are underlined and labeled. Proteins encoded are labeled at their respective start codons. Stop codons are indicated by asterisks. Arrows indicate a palindromic sequence in the promoter region, immediately downstream from the proposed antiterminator (marked with a closed circle beneath). The signature sequences for EIIA and EIIB are underlined and labeled. A conserved region for disaccharide EIIC is also underlined and labeled.
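The deduced amino acid sequences in Fig. 3 follow from reading each ORF codon by codon with the standard genetic code. The sketch below illustrates that step; it is not the GCG software used in the study. The 24-nucleotide string is the start of casA as transcribed from the figure and should be treated as illustrative, and the names `translate` and `residues_from_orf_length` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding strand from its first base, stopping at a stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the ORF
            break
        protein.append(residue)
    return "".join(protein)


def residues_from_orf_length(nucleotides: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length."""
    codons = nucleotides // 3
    return codons - 1 if includes_stop else codons


# First eight codons of casA, as read from Fig. 3 (illustrative transcription).
CASA_START = "ATGGAATATAAAGCACTCGCGCAG"
print(translate(CASA_START))            # MEYKALAQ
print(residues_from_orf_length(1395))   # 464
```

The second call applies the same codon arithmetic to the 1,395-nucleotide length reported for casB (stop codon included), which gives 464 residues, the final residue number shown for CasB in the figure.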

but reproducible level of activity was observed with JL630(pLOI1975) containing a deletion in the casA-encoded permease, consistent with cleavage of the unphosphorylated substrate at a reduced rate. Activity was restored to one-quarter that expressed with the native plasmid (pLOI1906) by mixing equal amounts of extract from JL630(pLOI1975) and JL630(pLOI1997), demonstrating that the casAB-encoded PTS EII permease and cleavage enzymes can be expressed separately in E. coli and contain independent ribosomal binding sites.

Sequencing of the K. oxytoca cas operon. The K. oxytoca DNA fragment in pLOI1998 was sequenced in both directions by using a family of overlapping subclones (Fig. 3). This fragment contained 3,850 nucleotides and includes a small 5'-truncated gene fragment followed by two complete open reading frames (ORFs). All ORFs appear to be transcribed in the same direction as the lac promoter. Putative ribosomal-binding sites were identified for each complete ORF. A stem-loop region resembling a rho-independent terminator is located immediately upstream from the first ORF. A nucleotide sequence near this putative terminator resembles the conserved antiterminator-binding site reported in other PTS operons (29). A BLAST search of GenBank allowed identification of these ORFs as the carboxy-terminal region of a PTS antiterminator, PTS EII (permease), and a phosphoglycohydrolase, respectively. These genes have been designated casRAB. Gene spacing

and similarity with other PTS systems are consistent with the expression of casRAB as an operon (Fig. 1 and 3).

The translated sequence for K. oxytoca casA contains 621 amino acids (Mr, 65,492) and is most similar to aryl-β-glucoside-specific PTS EII permeases which comprise single polypeptide chains (Table 3). The cryptic E. coli ascF-encoded cellobiose permease is a single but shorter polypeptide with more limited regions of similarity to the casA product. In contrast, there is little identity between the casA product and the other three PTS permeases (B. stearothermophilus celABC, B. subtilis celABC, and E. coli celABC) reported to transport cellobiose (9 to 23% identity). Each of these cellobiose permeases consists of three separate polypeptide chains rather than the single polypeptide encoded by K. oxytoca casA.

The casB gene begins 18 bp downstream from the stop codon of casA and is 1,395 nucleotides in length (including the stop codon). The translated amino acid sequence contains 464 amino acids (Mr, 52,242) and is most similar to the PTS phospho-β-glucosidases encoded by E. chrysanthemi (arbB) and E. coli (bglB) (Table 3). A high degree of similarity was also observed between the casB product and β-glycohydrolases from procaryotes and plants. As with casA, however, the predicted casB product shared little similarity with the three previously reported phosphocellobiases (16 to 18% identity). The region of K. oxytoca CasB from amino acids 362 to 370 (LFIVENGLG) is highly conserved in phospho-β-glucosidases and β-glucosidases from many organisms and matches the signature sequence for glycosyl hydrolase family 1 (12, 19).

TABLE 3. Comparison of predicted amino acid sequence of CasA and CasB from K. oxytoca P4 to those from other bacteria

                                              % Identity (% similarity) with:
Protein                                       K. oxytoca CasA   K. oxytoca CasB   GenBank accession no.

Enzyme II
  β-Glucoside permease
    Erwinia chrysanthemi ArbF                 57.5 (76.4)                         M81772
    Escherichia coli BglS                     55.5 (75.8)                         M16487
    Clostridium longisporum AbgF              45.3 (64.5)                         L49336
    Bacillus subtilis BglP                    38.7 (64.2)                         Z34526
    Escherichia coli AscF                     33.3 (57.7)                         M73326
  Sucrose permease
    Streptococcus mutans ScrA                 34.9 (59.3)                         M22711
    Pediococcus pentosaceus ScrA              32.8 (58.2)                         Z32771

Hydrolase
  Phospho-β-glucosidase
    Erwinia chrysanthemi ArbB                                   75.7 (84.8)       M81772
    Escherichia coli BglB                                       75.0 (85.3)       M16487
    Clostridium longisporum AbgA                                64.3 (75.7)       L49336
    Bacillus subtilis BglH                                      64.5 (78.4)       Z34526
    Escherichia coli AscB                                       55.1 (72.1)       M73326
  β-Glucosidase
    Bacillus subtilis BglA                                      44.9 (67.4)       L19710
    Thermoanaerobacter brockii BglT                             37.7 (55.1)       Z56279
    Caldocellum saccharolyticum BglA                            35.6 (55.6)       X12575
  Phospho-β-galactosidase
    Staphylococcus aureus LacG                                  38.6 (60.6)       J03479
    Lactococcus lactis LacG                                     38.3 (58.3)       J05748
    Streptococcus lactis LacG                                   38.3 (58.3)       M19454
  Plant hydrolase
    Arabidopsis thaliana Pyk10                                  36.2 (56.0)       X89413
    Brassica napus Bgl                                          33.0 (54.0)       X82577
    Sorghum bicolor Bgl                                         32.9 (56.8)       U33817

Localization of functional domains within K. oxytoca EII by using hydropathy analysis and signature sequences. PTS EII typically consists of two hydrophilic domains (EIIA and EIIB) which contain phosphorylation sites and one hydrophobic transmembrane domain which binds the sugar substrate for transport (EIIC) (39, 45). Highly conserved, short signature sequences have been reported for EIIB and EIIA (39, 43). Corresponding sequences in Cas EII are located in the N terminus and C terminus of CasA, respectively (Fig. 3). By default, the long middle region is presumed to encode the integral membrane component, domain C. No signature sequence has been identified for domain C.

The limited degree of similarity between single polypeptide PTS EII permeases and PTS EII permeases composed of separate polypeptides does not permit a delineation of domain boundaries. These boundaries can be approximated, however, by using hydropathy plots (data not shown). Based on this comparison, CasA EII domain B is proposed to extend from G16 to I109, domain C is proposed to extend from I145 to K527, and domain A is proposed to extend from L540 to K617. A short 16-amino-acid N-terminal segment immediately precedes domain B. Shorter interdomain segments were found between the B-C and C-A boundaries which lack similarity to proteins containing single domains. These segments may represent connecting regions which allow proper folding.

Hydropathy analysis using GCG-ALOM indicates that CasA contains 6 to 10 transmembrane helices. At least one of these is located within domain A. Most are predicted for domain C. One transmembrane helix is predicted for the connecting sequence between domains A and C. Previous studies have reported three-dimensional structures for domain A and domain C (39). The topologies of EIIC(mannitol) and EIICB(glucose) (7) have been mapped by using gene fusions and shown to contain six and eight transmembrane helices, respectively.

Identification of conserved regions in EII domain C. Domain C forms the integral membrane portion of EII which facilitates the binding, phosphorylation (donated from domain B), and transport of sugar substrates. Although the amino acid sequences in EII(cellobiose) C domains appear to share little identity, the MACAW Gibbs Sampler (28) identified six conserved regions of 15 to 30 amino acid residues as being highly significant (>70% similarity). Each region contained seven or more consecutive residues with >90% similarity. Four of these regions reside in hydrophilic regions of domain C. However, two regions are located at the ends of a predicted transmembrane helix and are marked in Fig. 4. BLAST searches (cutoff score adjusted to 32) with GI[N or T]EPAIFGXX recognized virtually all PTS EIIC regions which transport disaccharides (cellobiose, lactose, sucrose, trehalose, and aryl-β-glucosides), two levansucrase synthesis regulatory proteins (SacX), and two unusual EIIC proteins for monosaccharides (glucose and mannose). These BLAST searches did not recognize EIIC regions from PTS permeases for fructose, mannitol, sorbose, maltose, N-acetylglucosamine, or the typical glucose permease.

The 26 sequences for EII(disaccharide) domain C, two regulatory proteins, and domain C from two closely related monosaccharides were analyzed further by using MACAW. Pairwise alignments revealed that domain C sequences formed multiple, distinct groups with shared segments. Only a single large conserved region in domain C was identified as being common to all 26 proteins. Further analysis with the Gibbs Sampler defined a region of 50 amino acid residues with >70% similarity (Fig. 4).
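The lengths quoted above for casB can be cross-checked with simple codon arithmetic. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical. The 1,866-nucleotide figure for casA is only what 621 codons plus one stop codon would imply, since the text does not state the casA ORF length, and the mean residue masses are derived solely from the reported Mr values.

```python
def residues_from_orf(nt_length, includes_stop=True):
    """Number of amino acids encoded by an ORF of nt_length nucleotides."""
    codons, remainder = divmod(nt_length, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


def orf_length_from_residues(residues):
    """ORF length in nucleotides, counting one stop codon (an inference)."""
    return 3 * (residues + 1)


# Values reported in the text.
casb_residues = residues_from_orf(1395)        # casB: 1,395 nt including stop
casa_orf_nt = orf_length_from_residues(621)    # casA: 621 aa (ORF length inferred)

# Mean mass per residue implied by the reported Mr values.
casa_mean_mass = 65492 / 621
casb_mean_mass = 52242 / 464

print(casb_residues, casa_orf_nt)
print(round(casa_mean_mass, 1), round(casb_mean_mass, 1))
```

The casB figure of 464 residues agrees with the reported ORF length, and both mean residue masses fall near the value of roughly 110 Da commonly assumed for proteins.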

Glucose Group (representative sequence)

   Eco-NagE 249-AALAMYFAAPKERRP-MVGGMLLSVAVTAFLTGVTEPLEFLFMFLAPLLYL-298
   ** * * *** ** * * **

Lactose Group

   Smu-LacE 292-ATLVVPFMFMWLCKS-KRNKAIGRASVVPTFFGVNEPILFGAPIVLNPVFF-341
   Sau-LacE 295-ATLVVPFHFMWMTKS-KRNKAIGRASVVPTFFGVNEPILFGAPLVLNPVFF-344
   Lla-LacE 292-ATLIVPFLFMWICKS-DRNRAIGRASVVPTFFGVNEPILFGAPIVLNPIFF-341
   Lca-LacE 288-ATLVVPFIMLFAARS-AQLKAVGKAAFVPCTFGVNEPVLFGMPIIMNPMLF-337
  √Bsu-CelB 303-ATLALVVTMFLRARS-KQMKQLGKLAVGPAIFNINEPIIFGMPIVHNPMLL-352
  √Bst-CelB 305-ATLALALTMMFRARS-RQLKSLGRLAIAPGIFNINEPITFGMPIVHNPLLI-354
  √Eco-CelB 306-ATLGLILAIFIASRR-ADYRQVAKLALPSGIFQINEPILFGLPIIMNPVHF-355
   *** * *** ** * **

Sucrose Group

   ####### ###########
  √Kox-CasA 337-GQVGAALGIFLRTRD-ARQKVLAGSAVSAGIFGVTEPAIYGLNLPLRRPFI-386
   Ech-ArbF 338-GQAGATLGVLLRTQD-LKRKGIAGSAFSAAIFGITEPAVYGVTLPLRRPFI-387
   Eco-BglS 335-AQVGAALGVFLCERD-AQKKVVAGSAALTSLFGITEPAVYGVNLPRKYPFV-384
   Clo-AbgF 331-AQTGVVHAILAKTKD-KKLKSLCIPAIISGFFGVTEPAIYGITLPRKKPFI-380
   Ppe-ScrA 356-GQGAATLAIFFATKS-QKQKALTSSAGVSALLGITEPAIFGVNLKMKFPFV-405
   Smu-ScrA 365-AQGAATFAIYFLTKD-KKMKGLSSSSGVSALLGITEPALFGVNLKYRFPFF-414
   Bsu-TreP 355-AQGSAALALMFIVKD-EKQKGLSLTSGISAYLGITEPAIFGVNLRYRFPFI-404
   Val-ScrA 356-SQGAAALAVGVHSKD-KKMKGIAIPSGVTGLLGITEPAMFGVNLKLRYPFI-405
   Sxy-ScrA 358- QGAAALAAFFIIKENKKLKGVASAAGVSALLGITEPAMFGVNLKLRYPFI-407
   Kpn-ScrA 340-AQGGACFAVWFKTKD-AKIKAITLPSAFSAMLGITEAAIFGINLRFVKPFI-389
   Sty-ScrA 340-AQGGACLAVWFKTKD-AKIKAITLPSAFSAMLGITEAAIFGINLRFVKPFI-389
   Bsu-SacP 338-AQGGAGLAVFFMAKK-AKTKEIALPAAFSAFLGITEPVIFGVNLRYRKPFI-387
   Bsp-SacX 250-AQGGAGLAVFFLTKN-PKVKEIAIPASFSAFLGITEPVIFGVNLRFMRPFI-299
   Bsu-SacX 340-AQGGAGLAVFLKTKQ-SSLKKIALPASLTAFLGIVEPIVFGVNLKLIRPFI-389
   Bsu-BglP 331-GQAGASFAVFLRSRN-KKFKSLALTTSITALMGITEPAMYGVNMRLKKPFA-380
   Eco-TreB 350-AQGSAVIGIIISSRK-HNEREISVPAAISAWLGVTEPAMYGINLKYRFPML-399
   * * * *

ASC Group

   Bla-PtsG 355-GLVTGVFLIALKEKNRAMRQVSLGGMLAGLLGGISEPSLYGVLLRFKKTYF-404
   Cgl-PtsM 355-GLVTGVFLLSIKERNKAMRQVSLGGMLAGLLGGISEPSLYGVLLRFKKTYF-404
  √Eco-AscF 341-SLGGSSLAVAWKTKNPELRQTALAAAASAIMAGISEPALYGVAIRLKRPLI-390
   * * ** ***** **** * *

FIG. 4. Comparison of conserved regions in the translated sequences of PTS EIIC domains. These regions were identified by using MACAW (Gibbs Sampler). The first letter of the genus and first two letters of the species are listed on the left followed by the protein (see Table 4 for locus designations and accession numbers). The positions of the first and last amino acids in each string are noted on the left and right, respectively. Four major groups were identified with GCG-PILEUP (glucose, lactose, sucrose, and ASC groups). Residues which are fully conserved within a group are marked by an asterisk. The five permeases which transport cellobiose are marked with a check. The pound symbol marks the location of two blocks in CasA which were identified by MACAW as being highly conserved (>90%) among the five EIIC(cellobiose) domains.
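As an illustration of how the disaccharide consensus proposed in the text, [Flm][Gn][Iv][Tsn]EP[Aiv][ILmv][Fy]G[Vilma][Npt][Li], can be applied to the segments in Fig. 4, the sketch below reads it as a strict regular expression. This is an assumption made for illustration only: the consensus is stated to represent 90% of the residues at each position, so some members of the disaccharide groups are expected to deviate from it, and a strict pattern match is not the method used in this study. The four segments are transcribed from Fig. 4, and the function name is hypothetical.

```python
import re

# Consensus as printed in the text. Letter case encodes frequency only,
# so the pattern is upper-cased before use as a regular expression.
CONSENSUS = "[Flm][Gn][Iv][Tsn]EP[Aiv][ILmv][Fy]G[Vilma][Npt][Li]"
PATTERN = re.compile(CONSENSUS.upper())

# 50-residue segments transcribed from Fig. 4 ("-" is an alignment gap).
SEGMENTS = {
    "Kox-CasA": "GQVGAALGIFLRTRD-ARQKVLAGSAVSAGIFGVTEPAIYGLNLPLRRPFI",
    "Smu-LacE": "ATLVVPFMFMWLCKS-KRNKAIGRASVVPTFFGVNEPILFGAPIVLNPVFF",
    "Bsu-CelB": "ATLALVVTMFLRARS-KQMKQLGKLAVGPAIFNINEPIIFGMPIVHNPMLL",
    "Eco-NagE": "AALAMYFAAPKERRP-MVGGMLLSVAVTAFLTGVTEPLEFLFMFLAPLLYL",
}


def find_consensus(segment):
    """Return the first stretch matching the consensus, or None."""
    match = PATTERN.search(segment.replace("-", ""))
    return match.group(0) if match else None


hits = {name: find_consensus(seq) for name, seq in SEGMENTS.items()}
for name, hit in hits.items():
    print(name, hit)
```

With these inputs the sucrose-group and lactose-group examples each yield a 13-residue match, whereas the glucose-group representative (Eco-NagE) does not, which is in line with the separation of the glucose group from the three disaccharide groups.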

GCG-PILEUP was used to align the 50-residue regions in C domains and allowed assignment of these to four major groups (Fig. 4). EII permeases for glucose, N-acetylglucosamine-specific, and maltose (E. coli malX) were also included in this comparison. EII permeases capable of cellobiose uptake were scattered among three of these groups, suggesting that the ability to utilize cellobiose may have multiple origins. Permeases encoded by B. subtilis celB, B. stearothermophilus celB, and E. coli celB were clustered with lactose permeases (lactose group). K. oxytoca CasA was grouped with sucrose, aryl-β-glucosides, and trehalose permeases and with the two sac regulatory proteins (sucrose group). The E. coli ascF-encoded permease (arbutin, salicin, and cellobiose) clustered with the two unusual monosaccharide permeases from Brevibacterium lactofermentum (61a) and Corynebacterium glutamicum (30), which transport both mannose and glucose (ASC group). Considering the similarity in domain order and amino acid sequences between these two unusual permeases and the β-glucoside and sucrose permeases, it is tempting to speculate that disaccharides may also be transported. Glucose and N-acetylglucosamine permeases formed the fourth group (glucose group), which also included an E. coli maltose permease (malX).

Postma et al. (39) have compared all three domains in EII permeases and assigned these to four families (glucose, mannitol, lactose, and mannose) based on amino acid identity (greater than 25%), coding order of domains, and functional complementation in some cases. Our results using the 50-residue regions of similarity in C domains are in general agreement. The E. coli CelABC was grouped with gram-positive lactose permeases by both approaches as noted previously (44). The glucose family of EII permeases (39) included sucrose, trehalose, β-glucosides, glucose, and N-acetylglucosamine. Our comparison further divided the glucose family into three groups: glucose, sucrose, and ASC. Also, the 50-residue conserved region in the three disaccharide permease groups (lactose, sucrose, and ASC) appeared more similar to each other than to members of the glucose group.

Consensus sequence for disaccharide binding. Comparisons of the 50-residue conserved region in C domains resulted in groupings which appear to reflect substrate specificity rather than domain order or phylogenetic relationships (Fig. 4; Table 4). Combining the 50-residue regions from the 26 members of the lactose, sucrose, and ASC groups provides the following consensus for disaccharides: [Flm][Gn][Iv][Tsn]EP[Aiv][ILmv][Fy]G[Vilma][Npt][Li]. Conservation of the glutamate residue had been previously noted in C domains (42, 59). The consensus sequence represents 90% of the residues at each position (highly conserved in boldface type, most abundant residue in uppercase, and lowercase residues in order of frequency). The conservation of this sequence in two regulatory proteins (synthesis of levansucrase) that do not function in sugar transport (3) is consistent with a role as a disaccharide binding site. Lengeler et al. (31) have previously proposed that part of this sequence (GITE) is involved in phosphoryl transfer to the

bound sugar. Extensive investigations of receptor proteins for sugars and sugar-phosphates (not PTS) have now identified a common theme for sugar binding, cooperative hydrogen bonding of sugar hydroxyls with two amino acid residues and stacking of aromatic residues against the sugar face (40). In periplasmic receptor proteins, many of the hydrogen-bonding residues are not contiguous but are folded into place. Residues such as T, S, N, E, P, or T in the consensus region of EIIC could serve as hydrogen-bonding partners to form part of the binding site for disaccharides, with additional bonding partners being recruited from adjacent transmembrane helices.

TABLE 4. EII protein sequences analyzed in this study

Group and locus   Accession no.   Protein   Organism                        Reference or source

Glucose
  KPNAGE          X63289          NagE      Klebsiella pneumoniae           55
  ECONAGBE        M19284          NagE      Escherichia coli                38
  ECOPTSG         J02618          PtsG      Escherichia coli                13
  SCPTSG          X80415          PtsG      Staphylococcus carnosus         7a
  BSIIGLCG        Z11744          PtsG      Bacillus subtilis               62
  STPTSG          X74629          PtsG      Salmonella typhimurium          51a
  ECOMALAA        M60722          MalX      Escherichia coli                41
  MGU39687        U39687          PtsG      Mycoplasma genitalium           15

Lactose
  STRE2PBG        L18993          LacE      Streptococcus mutans            21
  STALACS         J03479          LacE      Staphylococcus aureus           6
  LACLACRABC      M60447          LacE      Lactococcus lactis              11
  LBALACE         M60851          LacE      Lactobacillus casei             1
  BSCELABCD       Z49992          CelB      Bacillus subtilis               15a
  BSU07818        U07818          CelB      Bacillus stearothermophilus     27
  ECCELOPE        X52890          CelB      Escherichia coli                37

Sucrose
  KOU61727        U61727          CasA      Klebsiella oxytoca              This study
  ECOBGLO         M16487          BglS      Escherichia coli                50
  CLOABG          L49336          AbgF      Clostridium longisporum         6a
  PPSURFOP        Z32771          ScrA      Pediococcus pentosaceus         30a
  STRSCRA         M22711          ScrA      Streptococcus mutans            47
  BSTREAPR        Z54245          TreP      Bacillus subtilis               18
  VIBSCRAK        M76768          ScrA      Vibrio alginolyticus            5
  SXSCRA          X69800          ScrA      Staphylococcus xylosus          56
  KPSCRYAB        X57401          ScrA      Klebsiella pneumoniae           49
  STSCRCOMP       X67750          ScrA      Salmonella typhimurium          22
  BACSACP         J03006          SacP      Bacillus subtilis               14
  BACISPQ         D37921          SacX      Bacillus sp.                    60
  BACSACXY        M29333          SacX      Bacillus subtilis               63
  BSGBGLUC        Z34526          BglP      Bacillus subtilis               29
  ECOTREBC        U06195          TreB      Escherichia coli                25
  ERWBGPA         M81772          ArbF      Erwinia chrysanthemi            12

ASC
  ECOASCBFG       M73326          AscF      Escherichia coli                16
  BRLPTSG         L18875          PtsG      Brevibacterium lactofermentum   61a
  CORPTSMA        L18874          PtsM      Corynebacterium glutamicum      30

ACKNOWLEDGMENTS

This work was supported in part by the U.S. Department of Energy, Energy Biosciences Program (FG05-86ER3574), and the U.S. Department of Agriculture, National Research Initiative (95-37308-1843), and Agricultural Research Service (58-3620-2-112).

We appreciate the generosity of Jane Lopilato in sharing E. coli JL630.

REFERENCES

1. Alpert, C.-A., and B. M. Chassy. 1990. Molecular cloning and DNA sequence of lacE, the gene encoding the lactose-specific enzyme II of the phosphotransferase system of Lactobacillus casei: evidence that a cysteine residue is essential for sugar phosphorylation. J. Biol. Chem. 265:22561-22568.
2. Atlas, R. M., and L. C. Parks (ed.). 1993. Handbook of microbiological media. CRC Press, Inc., Boca Raton, Fla.
3. Aymerich, S., and M. Steinmetz. 1987. Cloning and preliminary characterization of the sacS locus from Bacillus subtilis which controls the regulation of exoenzyme levansucrase. Mol. Gen. Genet. 208:114-120.
4. Benson, D., D. J. Lipman, and J. Ostell. 1993. GenBank. Nucleic Acids Res. 21:2963-2965.
5. Blatch, G. L., R. R. Scholle, and D. R. Woods. 1990. Nucleotide sequence and analysis of the Vibrio alginolyticus sucrose uptake-encoding region. Gene 95:17-23.
6. Breidt, F., Jr., W. Hengstenberg, U. Finkeldei, and G. C. Stewart. 1987. Identification of the lactose-specific components of the phosphotransferase system in the lac operon of Staphylococcus aureus. J. Biol. Chem. 262:16444-16449.
6a. Brown, G. D. Unpublished data.
7. Buhr, A., and E. Bernhard. 1993. Membrane topology of glucose transporter of Escherichia coli. J. Biol. Chem. 268:11599-11603.
7a. Christiansen, I. Unpublished data.
8. Coughlan, M. P., and F. Mayer. 1992. The cellulose-decomposing bacteria and their enzyme systems, p. 460-516. In A. Balows, H. G. Trüper, M. Dworkin, W. Harder, and K. Schleifer (ed.), The prokaryotes, 2nd ed. A handbook on the biology of bacteria: ecophysiology, isolation, identification, applications, vol. 1. Springer-Verlag, Berlin, Germany.
9. Cutting, S. M., and P. B. Vander Horn. 1990. Genetic analysis, p. 27-74. In C. R. Harwood and S. M. Cutting (ed.), Molecular biological methods for Bacillus. John Wiley & Sons Ltd., Chichester, England.
10. Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.
11. de Vos, W. M., I. Boerrigter, R. J. van Rooyen, B. Reiche, and W. Hengstenberg. 1990. Characterization of the lactose-specific enzymes of the phosphotransferase system in Lactococcus lactis. J. Biol. Chem. 265:22554-22560.
12. El Hassouni, M., B. Henrissat, M. Chippaux, and F. Barras. 1992. Nucleotide sequences of the arb genes, which control β-glucoside utilization in Erwinia chrysanthemi: comparison with the Escherichia coli bgl operon and evidence for a new β-glycohydrolase family including enzymes from eubacteria, archeabacteria, and humans. J. Bacteriol. 174:765-777.
13. Erni, B., and B. Zanolari. 1986. Glucose-permease of the bacterial phosphotransferase system: gene cloning, overproduction, and amino acid sequence of enzyme II(Glc). J. Biol. Chem. 261:16398-16403.
14. Fouet, A., M. Arnaud, A. Klier, and G. Rapoport. 1987. Bacillus subtilis sucrose-specific enzyme II of the phosphotransferase system: expression in Escherichia coli and homology to enzyme II from enteric bacteria. Proc. Natl. Acad. Sci. USA 84:8773-8777.
15. Fraser, C. M., J. D. Gocayne, O. White, M. D. Adams, R. A. Clayton, R. D. Fleischmann, C. J. Bult, A. R. Kerlavage, G. Sutton, J. M. Kelley, J. L. Fritchman, F. J. Weidman, K. V. Small, M. Sandusky, J. L. Fuhrmann, D. T. Nguyen, T. R. Utterback, D. M. Saudek, C. A. Phillips, J. M. Merrick, J.-F. Tomb, B. A. Dougherty, K. F. Bott, P.-C. Hu, T. S. Lucier, S. N. Peterson, H. O. Smith, C. A. Hutchison III, and J. C. Venter. 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397-403.
15a. Glaser, P., B. Lubochinsky, and A. Danchin. 1995. GenBank accession no. Z49992.
16. Hall, B. G., and L. Xu. 1992. Nucleotide sequence, function, activation, and evolution of the cryptic asc operon of Escherichia coli K12. Mol. Biol. Evol. 9:688-706.
17. Helaszek, C. T., and B. A. White. 1991. Cellobiose uptake and metabolism by Ruminococcus flavefaciens. Appl. Environ. Microbiol. 57:64-68.
18. Helfert, C., S. Gotsche, and M. K. Dahl. 1995. Cleavage of trehalose-phosphate in Bacillus subtilis is catalyzed by a phospho-α-(1,1)-glucosidase encoded by the treA gene. Mol. Microbiol. 16:111-120.
19. Henrissat, B. 1991. A classification of glycosyl hydrolases based on amino acid sequence. Biochem. J. 280:309-316.
20. Hespell, R. B., R. Wolf, and R. J. Bothast. 1987. Fermentation of xylans by Butyrivibrio fibrisolvens and other ruminal bacteria. Appl. Environ. Microbiol. 53:2849-2853.
21. Honeyman, A. L., and R. Curtiss. 1993. Isolation, characterization and nucleotide sequence of the Streptococcus mutans lactose-specific enzyme II (lacE) gene of the PTS and the phospho-β-galactoside (lacG) gene. J. Gen. Microbiol. 139:2685-2694.
22. Jahreis, K., and J. W. Lengeler. 1993. Molecular analysis of two ScrR repressors and a ScrR-FruR hybrid repressor for sucrose and D-fructose specific regulons from enteric bacteria. Mol. Microbiol. 9:195-209.
23. Jiresova, M., S. Dobrova, J. Naprstek, P. Rysavy, and J. Janecek. 1987. The cellobiose uptake system in Streptomyces granaticolor. FEMS Microbiol. Lett. 41:279-282.
24. Kaji, A. 1984. L-Arabinosidases. Adv. Carbohydr. Chem. Biochem. 42:383-393.

25. Klein, W., R. Horlacher, and W. Boos. 1995. Molecular analysis of treB encoding the Escherichia coli enzyme II specific for trehalose. J. Bacteriol. 177:4043-4052.
26. Kricker, M., and B. G. Hall. 1987. Biochemical genetics of the cryptic gene system for cellobiose utilization in Escherichia coli. Genetics 115:419-429.
27. Lai, X., and L. O. Ingram. 1993. Cloning and sequencing of a cellobiose phosphotransferase system operon from Bacillus stearothermophilus XL-65-6 and functional expression in Escherichia coli. J. Bacteriol. 175:6441-6450.
28. Lawrence, C. E., S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. 1993. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-214.
29. Le Coq, D., C. Lindner, S. Krüger, M. Steinmetz, and J. Stülke. 1995. New β-glucoside (bgl) genes in Bacillus subtilis: the bglP gene product has both transport and regulatory functions similar to those of BglF, its Escherichia coli homolog. J. Bacteriol. 177:1527-1535.
30. Lee, J. K., M. H. Sung, K. H. Yoon, J. H. Yu, and T. K. Oh. 1994. Nucleotide sequence of the gene encoding the Corynebacterium glutamicum mannose enzyme II and analyses of the deduced protein sequence. FEMS Microbiol. Lett. 119:137-145.
30a. Leenhouts, K. Unpublished data.
31. Lengeler, J. W., F. Titgemeyer, A. P. Vogler, and B. M. Wöhrl. 1990. Structures and homologies of carbohydrate:phosphotransferase system (PTS) proteins. Phil. Trans. R. Soc. Lond. B Biol. Sci. 326:489-504.
32. Maas, L. K., and T. L. Glass. 1991. Cellobiose uptake by the cellulolytic ruminal anaerobe Fibrobacter succinogenes. Can. J. Microbiol. 37:141-147.
33. Martin, S. A. 1994. Nutrient transport by ruminal bacteria: a review. J. Anim. Sci. 72:3019-3031.
34. Martin, S. A., and J. B. Russell. 1987. Transport and phosphorylation of disaccharides by the ruminal bacterium Streptococcus bovis. Appl. Environ. Microbiol. 53:2388-2393.
35. Nochur, S. V., G. R. Jacobson, M. F. Roberts, and A. L. Demain. 1992. Mode of sugar phosphorylation in Clostridium thermocellum. Appl. Biochem. Biotechnol. 33:33-41.
36. O'Day, K., J. Lopilato, and A. Wright. 1991. Physical locations of bglA and serA on the Escherichia coli K-12 chromosome. J. Bacteriol. 173:1571.
37. Parker, L. L., and B. G. Hall. 1990. Characterization and nucleotide sequence of the cryptic cel operon of Escherichia coli K12. Genetics 124:455-471.
38. Peri, K. G., and E. B. Waygood. 1988. Sequence of cloned enzyme II(N-acetylglucosamine) of the phosphoenolpyruvate:N-acetylglucosamine phosphotransferase system of Escherichia coli. Biochemistry 27:6054-6061.
39. Postma, P. W., J. W. Lengeler, and G. R. Jacobson. 1996. Phosphoenolpyruvate:carbohydrate phosphotransferase systems, p. 1149-1174. In F. C. Neidhardt, R. Curtiss III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella: cellular and molecular biology. ASM Press, Washington, D.C.
40. Quiocho, F. A., and P. S. Ledvina. 1996. Atomic structure and specificity of bacterial periplasmic receptors for active transport and chemotaxis: variation of common themes. Mol. Microbiol. 20:17-25.
41. Reidl, J., and W. Boos. 1991. The malX malY operon of Escherichia coli encodes a novel enzyme II of the phosphotransferase system recognizing glucose and maltose and an enzyme abolishing the endogenous induction of the maltose system. J. Bacteriol. 173:4862-4876.
42. Reizer, A., G. M. Pao, and M. H. Saier, Jr. 1991. Evolutionary relationship among the permease proteins of the bacterial phosphoenolpyruvate:sugar phosphotransferase system. Construction of phylogenetic trees and possible relatedness to proteins of eukaryotic mitochondria. J. Mol. Evol. 33:179-193.
43. Reizer, J., V. Michotey, A. Reizer, and M. H. Saier, Jr. 1994. Novel phosphotransferase system genes revealed by bacterial genome analysis: unique, putative fructose- and glucoside-specific systems. Protein Sci. 3:440-450.
44. Reizer, J., A. Reizer, and M. H. Saier, Jr. 1990. The cellobiose permease of Escherichia coli consists of three proteins and is homologous to the lactose permease of Staphylococcus aureus. Res. Microbiol. 141:1061-1067.
45. Saier, M. H., Jr., and J. Reizer. 1992. Proposed uniform nomenclature for the proteins and protein domains of the bacterial phosphoenolpyruvate:sugar phosphotransferase system. J. Bacteriol. 174:1433-1438.
46. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
47. Sato, Y., F. Poy, G. R. Jacobson, and H. K. Kuramitsu. 1989. Characterization and sequence analysis of the scrA gene encoding enzyme II(Scr) of the Streptococcus mutans phosphoenolpyruvate-dependent sucrose phosphotransferase system. J. Bacteriol. 171:263-271.
48. Schimz, K.-L., and B. Broll. 1983. Cellobiose phosphorylase (EC 2.4.1.20) of Cellulomonas: occurrence, induction and its role in cellobiose metabolism. Arch. Microbiol. 135:241-249.
49. Schmid, K., R. Ebner, K. Jahreis, J. W. Lengeler, and F. Titgemeyer. 1991. A sugar-specific porin, ScrY, is involved in sucrose uptake in enteric bacteria. Mol. Microbiol. 5:941-950.
50. Schnetz, K., C. Toloczyki, and B. Rak. 1987. β-Glucoside (bgl) operon of Escherichia coli K-12: nucleotide sequence, genetic organization, and possible evolutionary relationship to regulatory components of two Bacillus subtilis genes. J. Bacteriol. 169:2579-2590.
51. Schuler, G. D., S. F. Altschul, and D. J. Lipman. 1991. A workbench for multiple alignment construction and analysis. Proteins Struct. Funct. Genet. 9:180-190.
51a. Stolz, B. Unpublished data.
52. Sugiyama, J. E., S. Mahmoodian, and G. R. Jacobson. 1991. Membrane topology analysis of Escherichia coli mannitol permease by using a nested-deletion method to create mtlA-phoA fusions. Proc. Natl. Acad. Sci. USA 88:9603-9607.
53. Thurston, B., K. A. Dawson, and H. J. Strobel. 1993. Cellobiose versus glucose utilization by the ruminal bacterium Ruminococcus albus. Appl. Environ. Microbiol. 59:2631-2637.
54. Titgemeyer, F. 1986. Genetische Untersuchungen zum Sucrose-Stoffwechsel bei Klebsiella pneumoniae. Diploma thesis. Universität Osnabrück, Osnabrück, Germany.
55. Vogler, A. P., and J. W. Lengeler. 1991. Comparison of the sequences of nagE operons from Klebsiella pneumoniae and Escherichia coli K12: enhanced variability of the enzyme II(N-acetylglucosamine) in regions connecting functional domains. Mol. Gen. Genet. 230:270-276.
56. Wagner, E., F. Götz, and R. Brückner. 1993. Cloning and characterization of the scrA gene encoding the sucrose-specific enzyme II of the phosphotransferase system from Staphylococcus xylosus. Mol. Gen. Genet. 241:33-41.
57. Wood, B., and L. O. Ingram. 1992. Ethanol production from cellobiose, amorphous cellulose, and crystalline cellulose by recombinant Klebsiella oxytoca containing chromosomally integrated Zymomonas mobilis genes for ethanol production and plasmids expressing thermostable cellulase genes from Clostridium thermocellum. Appl. Environ. Microbiol. 58:2103-2110.
58. Wright, R. M., M. D. Yablonsky, Z. P. Shalita, A. K. Goyal, and D. E. Eveleigh. 1992. Cloning, characterization, and nucleotide sequence of a gene encoding Microbispora bispora BglB, a thermostable β-glucosidase expressed in Escherichia coli. Appl. Environ. Microbiol. 57:3455-3465.
59. Wu, L.-F., and M. H. Saier, Jr. 1990. Nucleotide sequence of the fruA gene, encoding the fructose permease of Rhodobacter capsulatus phosphotransferase system, and analyses of the deduced protein sequence. J. Bacteriol. 172:7167-7178.
60. Yamagata, Y., and E. Ichishima. 1995. A new alkaline serine protease from alkalophilic Bacillus sp.: cloning, sequencing, and characterization of an intracellular protease. Curr. Microbiol. 30:357-366.
61. Yasbin, R. E., and B. J. Anderson. 1980. Properties of Bacillus subtilis 168 derivatives freed of their natural prophages. Gene 12:155-159.
61a. Yoon, K.-H. GenBank accession no. L18875.
62. Zagorec, M., and P. W. Postma. 1992. Cloning and nucleotide sequence of the ptsG gene of Bacillus subtilis. Mol. Gen. Genet. 234:325-328.
63. Zukowski, M. M., L. Miller, P. Cosgwell, K. Chen, S. Aymerich, and M. Steinmetz. 1990. Nucleotide sequence of the sacS locus of Bacillus subtilis reveals the presence of two regulatory genes. Gene 90:153-155.
