
doi:10.1016/S0022-2836(03)00307-3 J. Mol. Biol. (2003) 328, 307–317

COMMUNICATION

An Evolving Hierarchical Family Classification for Glycosyltransferases

Pedro M. Coutinho1, Emeline Deleury1, Gideon J. Davies2 and Bernard Henrissat1*

1Architecture et Fonction des Macromolécules Biologiques, UMR6098, CNRS and Universités d'Aix-Marseille I and II, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France

2Structural Biology Laboratory, Department of Chemistry, The University of York, Heslington, York YO10 5YW, UK

*Corresponding author

Glycosyltransferases are a ubiquitous group of enzymes that catalyse the transfer of a sugar moiety from an activated sugar donor onto saccharide or non-saccharide acceptors. Although many glycosyltransferases catalyse chemically similar reactions, presumably through transition states with substantial oxocarbenium ion character, they display remarkable diversity in their donor, acceptor and product specificity and thereby generate a potentially infinite number of glycoconjugates, oligo- and polysaccharides. We have performed a comprehensive survey of glycosyltransferase-related sequences (over 7200 to date) and present here a classification of these enzymes, akin to that proposed previously for glycoside hydrolases, into a hierarchical system of families, clans, and folds. This evolving classification rationalises structural and mechanistic investigation, harnesses information from a wide variety of related enzymes to inform cell biology and overcomes recurrent problems in the functional prediction of glycosyltransferase-related open-reading frames.

© 2003 Elsevier Science Ltd. All rights reserved

Keywords: glycosyltransferases; protein families; classification; modular structure; genomic annotations

Abbreviations used: CAZy, carbohydrate-active enzymes; CBM, carbohydrate-binding module; EC, Enzyme Commission; IUBMB, International Union of Biochemistry and Molecular Biology; ORF, open reading frame.

E-mail address of the corresponding author: [email protected]

The biosynthesis of complex carbohydrates and glycoconjugates is of remarkable biological importance. These molecules govern a diverse range of cellular functions, including energy storage, cell-wall structure, cell–cell interactions and signalling, host–pathogen interactions, and protein glycosylation.1–3 Because these functions, especially those in which carbohydrate moieties act as a cellular language, rely on precise carbohydrate structures that display an extreme chemical diversity, the biosynthesis of oligosaccharides and polysaccharides may involve the action of hundreds of different and selective glycosyltransferases, the enzymes that transfer sugar moieties from activated donor molecules to specific acceptor molecules (Figure 1).4 The challenge of the post-genomic era is to dissect the myriad of open reading frames (ORFs) whose encoded proteins are involved in glycosyl transfer and thus conquer what Sharon has provocatively described as “the last remaining frontier of molecular and cell biology”.5

The problems of current nomenclature

A vast number of glycosyltransferase sequences are unveiled by the sequencing of genomes. Current estimates suggest that about 1% of the ORFs of each genome is dedicated to the task of glycosidic bond synthesis (P.M.C. & B.H., unpublished results). Furthermore, protein glycosylation, a glycosyltransferase-catalysed process, massively expands the functional proteome of higher organisms. It is a huge drawback, and not merely to glycobiology, that glycosyltransferases have often proved extremely hard to characterise biochemically. This has resulted in a widening chasm of ignorance separating the few enzymes of known biological activity (and even fewer with 3-D structures) from the thousands of putative glycosyltransferase sequences now available in databanks. To utilise the genomic resource to the fullest, it is essential to understand the sequences

0022-2836/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved

308 Family Classification of Glycosyltransferases

of the enzymes themselves, and how these sequences relate to enzyme structure, mechanism and specificity.6

Figure 1. “True” glycosyltransferases use sugar donors in which the activating group is typically a (substituted) phosphate such as a nucleoside diphosphate (R = nucleoside monophosphate), nucleoside monophosphate (R = nucleoside) or lipid phosphate (R = lipid). The acceptor is shown in red as R′OH (R′ is extremely varied and can be a sugar, a lipid, a protein, an antibiotic, a nucleic acid, etc.). Glycosyl transfer may occur with two possible outcomes: inversion or retention of the anomeric configuration of the donor.

Traditionally, glycosyltransferases, as with other enzymes, have been classified on the basis of their donor, acceptor and product specificity, according to the recommendations of the International Union of Biochemistry and Molecular Biology (IUBMB).7 There are, however, severe limitations to the utility of this system for classification of glycosyltransferases. Such a scheme requires the full characterisation of an enzyme’s donor, acceptor(s) and product(s) before an Enzyme Commission (EC) number can be assigned. EC numbers cannot adequately accommodate enzymes that act on several distinct substrates and, furthermore, the EC numbers do not reflect the intrinsic structural and mechanistic features of the enzymes, nor were they ever intended to do so. Unfortunately, a classification scheme that demands demonstration of function is ill-equipped for the post-genomic era.

Sequence families: historical perspectives

In order to overcome the limitations of the IUBMB system and to reflect the likely increase in sequence data, Campbell and colleagues had proposed the classification of glycosyltransferases into families on the basis of similarities in amino acid sequence,8 a scheme inspired by the analogous and widely accepted classification of glycoside hydrolases.9–11 In 1997, 27 families of glycosyltransferases were described based on the analysis of the 600 sequences available at that time.8,12 Since then, the number of glycosyltransferase-related sequences has grown dramatically to over 7200 at present.

The enormous increase in the number of sequences encoding glycosyltransferases reflects several factors. Over 120 complete genome sequences are available and several hundred are at various stages of completion†, compared to the 12 that were available in 1997. The sequencing of the genomes of several higher eukaryotes has further augmented the number of glycosyltransferases, as many of these organisms contain hundreds of glycosyltransferases, reflecting the impact of glycosylation in cellular communication and differentiation.13 Furthermore, the growing impact of glycobiology has driven the functional identification of many novel glycosyltransferases, several of which are unrelated to the families originally described.8,12 Finally, our classification has been extended to integrate glycosyltransferases that utilise dolichol-phospho-sugars, sugar-1-phosphates, nucleoside monophospho-sugars and lipid diphospho-sugars as activated donors in addition to nucleotide sugar-dependent enzymes: all these enzymes perform an O-glycosylation reaction, presumably through a transition state(s) with significant oxocarbenium-ion-like character, and involving departure of a chemically equivalent phospho-containing leaving group from their activated sugar donors.

† See Genome OnLine at URL: http://wit.integratedgenomics.com/GOLD/

The factors described previously allow a substantial expansion of the glycosyltransferase family classification system to encompass presently 65 distinct sequence-derived families. The resultant grouping of enzymes with different donor, acceptor or product specificity into polyspecific families provides powerful insight into the divergent evolution of glycosyltransferase families. Significantly, the discriminatory power of this classification system is such that the molecular mechanism is conserved within a given family; this provides the means for the extrapolation of mechanistic information from the few biochemically-characterised cases and provides a powerful tool to inform genomic annotation and inspire strategies for the utilisation of genomic information.

Enzymes not included in the classification

The IUBMB classification features one class of glycosyltransferase not considered here: the enzymes that utilise disaccharides, oligosaccharides or polysaccharides as sugar donors, such as cyclodextrin glucanotransferases (EC 2.4.1.19), dextransucrase (EC 2.4.1.5), xyloglucan endotransferases (EC 2.4.1.207), etc. Unlike the glycosyltransferases discussed here, these enzymes are transglycosidases which are structurally, mechanistically and evolutionarily related to glycosidases. Transglycosidases are therefore included in the glycosidase classifications.9–11

1-Phosphosugar transferases, such as those that catalyse the conversion of a UDP-sugar and dolichyl phosphate to UMP and a

Table 1. The updated family classification of glycosyltransferases

Family | Number of sequences (12-Feb-2003) | Taxonomic range | Known activities | Mechanism | Clan | Fold | Representative PDB code(s)
GT1 | 708 | A; B; V; Ea; Ep; Ef; Ex | β-; 2-Hydroxyacylsphingosine β-; N-Acylsphingosine β-galactosyltransferase; Flavonol 3-O-β-; Indole-3-acetate β-glucosyltransferase; Sterol β-glucosyltransferase; Ecdysteroid β-glucosyltransferase; Zeaxanthin β-glucosyltransferase; Zeatin O-β-glucosyltransferase; Zeatin O-β-; Limonoid β-glucosyltransferase; Sinapate β-glucosyltransferase | INV | II | GT-B | 1IIR
GT2 | 1946 | A; B; V; Ea; Ep; Ef; Ex | Cellulose synthase; Dolichyl-phosphate β-; Dolichyl-phosphate β-glucosyltransferase; β-N-Acetylglucosaminyltransferase; β-N-Acetylgalactosaminyltransferase; Chitin oligosaccharide synthase; β-1,3-Glucan synthase | INV | I | GT-A | 1QG8
GT3 | 17 | A; Ea; Ef | synthase | RET | - | - | -
GT4 | 1236 | A; B; V; Ea; Ep; Ef; Ex | Sucrose-phosphate synthase; α-Glucosyltransferase; α-N-Acetylglucosaminyltransferase; α-Mannosyltransferase; 1,2-Diacylglycerol α-3-glucosyltransferase; Trehalose | RET | - | - | -
GT5 | 471 | B; Ep; Ef | Glycogen glucosyltransferase; Starch glucosyltransferase | RET | - | - | -
GT6 | 73 | Ea | α-1,3-Galactosyltransferase; α-1,3 N-Acetylgalactosaminyltransferase; α-Galactosyltransferase; Globoside α-N-acetylgalactosaminyltransferase | RET | III | GT-A | 1FG5
GT7 | 44 | Ea | β-1,4-Galactosyltransferase; N-Acetyllactosamine synthase; β-1,4-N-Acetylglucosaminyltransferase | INV | I | GT-A | 1FGX
GT8 | 200 | A; B; V; Ea; Ep; Ef; Ex | Lipopolysaccharide α-galactosyltransferase; Lipopolysaccharide α-glucosyltransferase; α-glucosyltransferase; Inositol 1-α-galactosyltransferase | RET | III | GT-A | 1GA8, 1LL0
GT9 | 147 | B | Lipopolysaccharide β-N-acetylglucosaminyltransferase; Heptosyltransferase | INV | - | - | -
GT10 | 89 | B; Ea; Ep; Ex | Galactoside α-1,3-1,4-fucosyltransferase; Galactoside α-1,3-; Glycoprotein α-1,3-fucosyltransferase | INV | - | - | -
GT11 | 89 | B; Ea; Ex | Galactoside α-1,2-fucosyltransferase | INV | - | - | -
GT12 | 5 | Ea | β-N-Acetylgalactosaminyltransferase | INV | - | - | -


GT13 | 26 | Ea; Ep | β-1,2-N-Acetylglucosaminyltransferase | INV | I | GT-A | 1FO8
GT14 | 81 | V; B; Ea; Ep | β-1,6-N-Acetylglucosaminyltransferase; Protein β-xylosyltransferase | INV | - | - | -
GT15 | 22 | Ef | α-Mannosyltransferase | RET | - | - | -
GT16 | 9 | Ea; Ep | β-1,2-N-Acetylglucosaminyltransferase | INV | - | - | -
GT17 | 14 | Ea; Ep; Ex | β-1,4-N-Acetylglucosaminyltransferase | INV | - | - | -
GT18 | 7 | Ea | β-1,6-N-Acetylglucosaminyltransferase | INV | - | - | -
GT19 | 54 | B; Ep | Lipid-A-disaccharide synthase | INV | - | - | -
GT20 | 77 | A; B; Ea; Ep; Ef | α,α-Trehalose-phosphate synthase | RET | IV | GT-B | 1GZ5
GT21 | 24 | B; Ea; Ep; Ef | Ceramide β-glucosyltransferase | INV | - | - | -
GT22 | 32 | B; Ea; Ep; Ef; Ex | Dolichyl-phosphate-mannose α-mannosyltransferase | INV | - | - | -
GT23 | 17 | B; Ea | α-1,6-Fucosyltransferase | INV | - | - | -
GT24 | 15 | Ea; Ep; Ef; Ex | UDP- glycoprotein α-glucosyltransferase | RET | - | - | -
GT25 | 100 | B; Ea; Ex | Lipopolysaccharide biosynthesis protein; β-1,4-Galactosyltransferase | INV | - | - | -
GT26 | 61 | B | β-N-Acetyl mannosaminuronic acid transferase; β-1,4-Glucosyltransferase | INV | - | - | -
GT27 | 85 | Ea; Ex | α-N-Acetylgalactosaminyltransferase | RET | - | - | -
GT28 | 122 | A; B; Ep | 1,2-Diacylglycerol 3-β-galactosyltransferase; 1,2-Diacylglycerol 3-β-glucosyltransferase; β-N-Acetylglucosaminyltransferase | INV | II | GT-B | 1F0K
GT29 | 94 | V; Ea; Ep | α-2,6-; α-2,3-Sialyltransferase; α-2,8-Sialyltransferase | INV | - | - | -
GT30 | 78 | B; Ep | α-3-Deoxy-D-manno-octulosonic-acid transferase | INV | - | - | -
GT31 | 173 | Ea; Ep; Ef; Ex | β-1,3-N-Acetylglucosaminyltransferase; β-1,3-Galactosyltransferase; Fucose-specific β-1,3-N-acetylglucosaminyltransferase; Globotriosylceramide β-1,3-N-acetylgalactosaminyltransferase; Chondroitin β-1,3-glucuronyltransferase; Chondroitin β-1,4-N-acetylgalactosaminyltransferase | INV | - | - | -
GT32 | 63 | B; Ea; Ep; Ef | α-1,6-Mannosyltransferase; α-1,4-N-Acetylglucosaminyltransferase | RET | - | - | -
GT33 | 11 | Ea; Ep; Ef; Ex | β-Mannosyltransferase | RET | - | - | -
GT34 | 21 | Ep; Ef | α-1,2-Galactosyltransferase; α-1,2-Xylosyltransferase | RET | - | - | -
GT35 | 140 | A; B; Ea; Ep; Ef; Ex | Glycogen and | RET | IV | GT-B | 1ABB, 1YGP


GT36 | 29 | B | Cellodextrin phosphorylase; Chitobiose phosphorylase | INV | - | - | -
GT37 | 15 | Ep | α-1,2-Fucosyltransferase | INV | - | - | -
GT38 | 5 | B | Poly-α-sialyltransferase | INV | - | - | -
GT39 | 40 | B; Ea; Ep; Ef; Ex | Dolichyl-phosphate-mannose-protein mannosyltransferase | INV | - | - | -
GT40 | 4 | Ex | β-1,3-Galactofuranosyltransferase | INV | - | - | -
GT41 | 34 | B; Ea; Ep; Ef | β-N-Acetylglucosaminyltransferase | INV | - | - | -
GT42 | 16 | B | α-2,3-Sialyltransferase | INV | - | - | -
GT43 | 36 | Ea; Ep; Ex | β-Glucuronyltransferase | INV | I | GT-A | 1FGG
GT44 | 22 | B | Glucosyltransferase; N-Acetylglucosaminyltransferase | n.a. | - | - | -
GT45 | 5 | B | α-N-Acetylglucosaminyltransferase | RET | - | - | -
GT46 | 8 | B | Glycosyltransferase | n.a. | - | - | -
GT47 | 82 | Ea; Ep | β-Glucuronyltransferase; Heparan synthase | INV | - | - | -
GT48 | 51 | Ep; Ef | β-1,3-Glucan synthase | INV | - | - | -
GT49 | 17 | Ea; Ex | β-1,3-N-Acetylglucosaminyltransferase | INV | - | - | -
GT50 | 11 | Ea; Ep; Ef; Ex | Dolichyl-phosphate-mannose α-1,4-mannosyltransferase | INV | - | - | -
GT51 | 312 | B | Murein polymerases | INV | - | - | -
GT52 | 28 | B | α-2,3-Sialyltransferase | INV | - | - | -
GT53 | 17 | B | | INV | - | - | -
GT54 | 16 | Ea | β-1,4-N-Acetylglucosaminyltransferase | INV | - | - | -
GT55 | 5 | A | α-Mannosyltransferase | RET | - | - | -
GT56 | 10 | B | 4-Acetamido-4,6-dideoxy-galactosyltransferase (Fuc4NAc transferase) | INV | - | - | -
GT57 | 15 | Ea; Ep; Ef | Dolichyl-phosphate-glucose α-1,3-glucosyltransferase | INV | - | - | -
GT58 | 10 | Ea; Ep; Ef | Dolichyl-phosphate-mannose α-1,3-mannosyltransferase | INV | - | - | -
GT59 | 7 | Ea; Ep; Ef | Dolichyl-phosphate-glucose α-1,2-glucosyltransferase | INV | - | - | -
GT60 | 4 | B; Ex | Hydroxyproline polypeptide N-acetylglucosaminyltransferase | n.a. | - | - | -
GT61 | 2 | Ep | β-1,2-Xylosyltransferase | INV | - | - | -
GT62 | 11 | Ef | α-1,2-Mannosyltransferase; α-1,6-Mannosyltransferase | RET | - | - | -
GT63 | 1 | V | DNA β-glucosyltransferase | INV | II | GT-B | 1BGT
GT64 | 25 | Ea; Ep | α-GlcNAc transferase; heparan synthase | RET | - | GT-A | 1OMX
GT65 | 6 | Ea | α-Fucosyltransferase | INV | - | - | -

Sequences were retrieved from the NCBI and then subjected to BLAST18 in-house against a library containing all sequences of the CAZy database stripped of their known non-catalytic modules. Any sequence which significantly matched (typically with a BLAST e-value < 10⁻²⁵) several known members of a glycosyltransferase family8 was added to that family. Novel glycosyltransferase families were created from characterised glycosyltransferases (and their homologs) displaying no significant sequence similarity to any previously defined glycosyltransferase family. Taxonomic range: A, archaea; B, bacteria; V, virus; Ea, eukaryota (animal); Ep, eukaryota (plant); Ef, eukaryota (fungal); Ex, other eukaryota. Mechanism: INV, inverting mechanism; RET, retaining mechanism; n.a., the stereochemical outcome of the reaction could not be found in the literature.

sugar-diphosphodolichol, are often and incorrectly termed glycosyltransferases. These enzymes catalyse substitution at the phosphorus atom of phosphate (EC 2.7.8.x); they do not catalyse glycosyl transfer and consequently are not included in the glycosyltransferase classification.

An updated and evolving sequence classification for glycosyltransferases

Table 1 shows a summary of the content of the 65 glycosyltransferase families identified so far, including families GT1–GT27 described previously.8,12 These continuously updated families, together with their links to appropriate databases (including GenBank, SwissProt, Enzyme, Taxonomy, Protein DataBank, etc.) are available from the Carbohydrate-Active enZymes (CAZy) database†. Our classification is probably incomplete, as it is likely that some glycosyltransferase families are not yet known. The CAZy database will report novel families if and when these are identified.

As a direct consequence of the deluge of sequence data from the genome sequencing projects, the vast majority of the members of the glycosyltransferase families are uncharacterised ORFs. Almost half of the known glycosyltransferase sequences are from fully sequenced genomes and this proportion will increase in the future. Given the well-documented difficulties in the production and characterisation of these enzymes (many have one or several membrane-spanning domains, others are membrane-associated; the synthesis of all putative acceptor molecules is potentially impossible even for combinatorial approaches14), this situation will worsen. We note that about one-third of the sequence-derived families contain enzymes of different donor/acceptor and product specificity. Indeed, it is likely that the number of such “polyspecific” families will increase when the biochemical properties of more members are established. The classification system represents a significant step towards a better annotation of these ORFs.

Genomic annotation

A feature of the sequence-based classification is that given families contain enzymes that display the same stereochemical outcome (Figure 1). Even in the most general case, instead of annotating an ORF as putative glycosyltransferase (now commonplace), one may annotate it as putative retaining (or inverting) glycosyltransferase from family GTxx. Although this looks like a small improvement, such annotations would massively improve our ability to dissect diverse cellular processes such as cell-surface antigen synthesis. Only on those occasions where there is a very high level of similarity between an ORF and many members of a large and well-characterised monospecific family should one be more confident of the prediction of specificity.

Conversely, an additional facet of the classification is that it highlights the potential for over-prediction. There are well-documented cases where substitution of only very few residues is sufficient to alter the specificity of a glycosyltransferase. In an extreme case, it has been shown that mutation of a single residue changes an α-1,3-N-acetylgalactosaminyltransferase into an α-1,3-galactosyltransferase.15 Thus, whilst one may be able to exclude some possibilities, if an ORF is distantly related to a largely polyspecific family or to a family where few members have been characterised, the precise donor, acceptor and product specificities cannot be predicted reliably.

Glycosyltransferases: fold, clans, families, stereochemistry and specificities

As already noted,8,12,16 distant similarities between some families are revealed with sensitive sequence-similarity detection methods such as hydrophobic cluster analysis17 or PSI-BLAST.18 These distant similarities indicate interfamily relatedness, presumably as a result of evolutionary divergence. 3-D structure comparison is, arguably, the most powerful means to establish relatedness of proteins, even in the absence of detectable sequence similarity, and the recent elucidation of the first ten 3-D structures of glycosyltransferases from different families has allowed us, and others,19–22 to comment on the occurrence of just two different topologies, in marked contrast to the huge diversity of folds utilised by glycoside hydrolases.23,24

That there are, thus far, two folds is perhaps a reflection of both the constraints of a nucleotide-binding motif and the potential evolutionary origin from few precursor sequences. Primitive Archaea possess just two glycosyltransferase families, GT2 and GT4, from which the others may have evolved. Family GT4 is particularly interesting, in that it contains enzymes that harness both nucleotide-sugar and phospho-sugar donors, suggesting an additional evolutionary link between these two classes of enzymes, highlighted further by the recently observed similarities between OtsA, a UDP-glucosyltransferase (GT20), and glycogen phosphorylase (GT35), which uses glucose-1-phosphate as the sugar donor.25,26

The two glycosyltransferase folds have been described as GT-A, as first observed for the Bacillus subtilis SpsA from family GT2, and GT-B, as described for the phage T4 β-glucosyltransferase and the catalytic core of .20

The GT-A fold may be considered as two tightly associated and abutting β/α/β domains that tend to form a continuous central sheet of at least eight β-strands (hence, some authors describe this as a single domain fold). The GT-A enzymes share

† http://afmb.cnrs-mrs.fr/CAZY/
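The family-level annotation suggested under "Genomic annotation" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CAZy procedure itself: the function name `annotate`, the input format (a list of family and e-value pairs for one ORF) and the `MIN_HITS` reading of "several known members" are hypothetical. Only the 10⁻²⁵ e-value threshold and the family data are taken from the Table 1 legend and the clan definitions in the text.

```python
from typing import NamedTuple, Optional


class FamilyInfo(NamedTuple):
    mechanism: str          # "inverting" or "retaining"
    clan: Optional[str]     # None where no clan is assigned yet
    fold: Optional[str]     # None where no 3-D structure is known


# Small excerpt of the classification (Table 1 and the clan definitions).
FAMILIES = {
    "GT1": FamilyInfo("inverting", "II", "GT-B"),
    "GT2": FamilyInfo("inverting", "I", "GT-A"),
    "GT4": FamilyInfo("retaining", None, None),
    "GT6": FamilyInfo("retaining", "III", "GT-A"),
    "GT20": FamilyInfo("retaining", "IV", "GT-B"),
}

E_VALUE_CUTOFF = 1e-25  # threshold quoted in the Table 1 legend
MIN_HITS = 2            # assumption: our reading of "several known members"


def annotate(hits):
    """Return a family-level annotation for one ORF.

    hits: list of (family, e_value) pairs, one per matched family member
    (a hypothetical input format chosen for this sketch).
    """
    counts = {}
    for family, e_value in hits:
        if family in FAMILIES and e_value < E_VALUE_CUTOFF:
            counts[family] = counts.get(family, 0) + 1
    supported = [fam for fam, n in counts.items() if n >= MIN_HITS]
    if len(supported) != 1:
        return "unassigned"
    family = supported[0]
    mechanism = FAMILIES[family].mechanism
    return f"putative {mechanism} glycosyltransferase from family {family}"
```

Note that the sketch deliberately stops at family and mechanism: as the text argues, donor, acceptor and product specificity should not be inferred from family membership alone.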

common, though not sequence-invariant, / and M²⁺ coordinating carboxylates often termed DxD motif (see below). Furthermore, the inverting GT-A members share a structurally equivalent carboxylate, Asp or Glu, which acts as a catalytic base.21 Beyond this limited catalytic similarity, plus the observation of an aspartate residue that coordinates N3 of the uridine moiety (in some GT-A members only), no other conserved feature is obvious. The GT-B fold again displays two Rossmann-like β/α/β domains, but these are associated less tightly; indeed, they “face each other”, with ligand binding being associated with conformational changes in the relative orientation. Wrabl and Grishin have pointed out limited sequence motifs that are characteristic of this fold,16 with a notable ribose-coordinating glutamate and phosphate-coordinating glycine-rich loops, but as described below, further comparison and prediction demands better characterisation of protein–ligand complexes for these enzymes. Thus far, nucleotide binding has been observed on the N-terminal domain of the GT-A enzymes and on the C-terminal domains of the GT-B enzymes with the acceptors binding on the other domain. Given the intrinsic “symmetry” of these folds, it is quite conceivable that some, as yet undescribed, glycosyltransferases may bind the nucleotide-sugar donor and acceptor in the “reversed” orientation.

These two folds do not control the stereochemical outcome of the reaction, since some retaining glycosyltransferases (which make glycosidic bonds with a stereochemistry identical with that of the sugar donor; Figure 1) have folds clearly related to inverting glycosyltransferases (which instead make glycosidic bonds with a stereochemistry opposite to that of the sugar donor;

Figure 2. The hierarchical classification of glycosyltransferases from folds to clans, families and subfamilies, illustrated for those glycosyltransferases with a reported 3-D structure. SpsA, Bacillus subtilis glycosyltransferase;42 βGalT, bovine β-1,4-galactosyltransferase;43 βGlcNAcT, rabbit β-1,2-N-acetylglucosaminyltransferase I;19 βGlcUAT, human β-1,3-glucuronyltransferase;34 αGalT, bovine α-1,3-galactosyltransferase;44 LgtC, Neisseria meningitidis α-galactosyltransferase;45 glycogenin, rabbit glycogenin;46 GtfB, Amycolatopsis orientalis β-glucosyltransferase;47 MurG, Escherichia coli β-N-acetylglucosaminyltransferase;48 T4-BGT, bacteriophage T4 DNA β-glucosyltransferase;49 glycogen phosphorylase, rabbit glycogen phosphorylase;50 OtsA, E. coli trehalose-6-phosphate synthase.25 The stereochemical outcome of these enzymes is indicated for each clan: a → a, retention of configuration (the donor and the glycosidic bond formed are both axial); a → e, inversion of configuration (the donor is axially linked, whilst the glycosidic bond formed is equatorial).
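The DxD motif mentioned in the text is only three residues long with a free middle position, so it helps to see how easily such a pattern arises by chance. The Python sketch below scans sequences for the motif and gives a rough analytic estimate of the chance of finding it in a random sequence. The function names, the uniform amino-acid composition (each residue with probability 1/20) and the treatment of overlapping windows as independent are simplifying assumptions made for illustration; they are not the survey method used by the authors.

```python
import re

# Asp, any residue, Asp: the "DxD" signature discussed in the text.
DXD = re.compile(r"D.D")


def has_dxd(sequence):
    """True if the one-letter protein sequence contains at least one DxD."""
    return DXD.search(sequence.upper()) is not None


def fraction_with_dxd(sequences):
    """Fraction of the given sequences that contain at least one DxD."""
    sequences = list(sequences)
    return sum(has_dxd(s) for s in sequences) / len(sequences)


def chance_estimate(length, p_asp=1 / 20):
    """Approximate probability that a random sequence contains DxD.

    Assumes independent residues with Asp frequency p_asp and treats the
    (length - 2) three-residue windows as independent, which is only an
    approximation because neighbouring windows overlap.
    """
    windows = max(length - 2, 0)
    return 1 - (1 - p_asp ** 2) ** windows
```

Under these assumptions `chance_estimate(400)` comes out at roughly 0.63 for a 400-residue protein, which is of the same order as the proportions the authors report for glycosyltransferases, glycoside hydrolases and SwissProt as a whole.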

Figure 3. The modularity of glycosyltransferases. (A) Examples of glycosyltransferases with appended non-catalytic modules; (B) examples of tandems of two glycosyltransferases on a single polypeptide; (C) a glycosyltransferase with an appended (trans)glycosidase domain. CBM13, carbohydrate-binding module of family 13 (a family classification of CBMs is available from the CAZy database at: http://afmb.cnrs-mrs.fr/CAZY/CBM.html); galectin, module displaying sequence similarity to galectins; SH3, Src homology 3 domain; UNK, module of unknown function; GH13, transglycosidase module from glycoside hydrolase family GH13. A family classification of glycoside hydrolases is available from the CAZy database at: http://afmb.cnrs-mrs.fr/CAZY/GH.html.
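The three architectures in the Figure 3 caption can be expressed as a simple rule over an ordered list of module labels. The Python sketch below is illustrative only: the function name, the label conventions (`GTnn` and `GHnn` for catalytic modules, anything else treated as non-catalytic) and the precedence of the tandem rule over the others are assumptions of this example, not part of the CAZy database.

```python
import re


def classify_architecture(modules):
    """Assign a modular glycosyltransferase to category A, B or C.

    modules: module labels in N- to C-terminal order, e.g. ["GT47", "GT64"].
    """
    gts = [m for m in modules if re.fullmatch(r"GT\d+", m)]
    ghs = [m for m in modules if re.fullmatch(r"GH\d+", m)]
    others = [m for m in modules if m not in gts and m not in ghs]
    if not gts:
        return "no glycosyltransferase module"
    if len(gts) >= 2:
        # Two catalytic GT modules on one polypeptide.
        return "B: tandem glycosyltransferases"
    if ghs:
        return "C: glycosyltransferase with (trans)glycosidase module"
    if others:
        return "A: glycosyltransferase with non-catalytic module(s)"
    return "single-module glycosyltransferase"


# Examples: the first two architectures are described in the text; the
# third pairing is chosen here purely for illustration.
print(classify_architecture(["GT47", "GT64"]))   # human EXT-type heparan synthase
print(classify_architecture(["GH13", "GT5"]))    # S. pombe alpha-glucan synthase
print(classify_architecture(["GT27", "CBM13"]))  # illustrative GT plus CBM13
```

Keeping the module order in the input preserves information, such as which module is N-terminal, that a set-based representation would lose.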

Figure 1). While certainly pointing to common evolutionary origins for retaining and inverting glycosyltransferases, any system that would classify the glycosyltransferases solely on their overall fold would therefore have limited predictive power. As noted before,6,20 fold recognition and threading analyses suffer from their very advantage, in that these methods are so sensitive that they are indeed able to detect existing folding similarities between glycosyltransferases and enzymes that have similar folds but that perform other reactions such as UDP-N-acetylglucosamine 2-epimerase27 or a glucosamine-1-phosphate pyrophosphorylase.28

By analogy to the glycoside hydrolases,23,24 we propose the grouping of families of glycosyltransferases displaying similar fold, analogous catalytic apparatus and identical molecular mechanism into clans. For the inverting enzymes, two clans may be identified: clan I, with the GT-A fold and containing families GT2, GT7, GT13 and GT43;21 and clan II, with the GT-B fold and containing families GT1, GT28 and GT63 (Figure 2 and Table 1), with the caveat in this latter case that there are, as yet, insufficient ligand complexes available for these three families to allow unambiguous characterisation and comparison of their catalytic apparatus. Similarly, two clans can be defined for the retaining glycosyltransferases: clan III, with the GT-A fold and containing families GT6 and GT8; and clan IV, with a GT-B fold and containing families GT20 and GT35 (Figure 2 and Table 1). We and others have used various sensitive fold recognition or threading analyses that suggest strongly that many other glycosyltransferase families are likely to comprise proteins with one of these two folds,16,19,20 and that therefore the number of families in the four clans defined here will increase in the future.

Subfamily, family, clan (or superfamily) and common fold are recurrently used terms to describe different degrees of protein relatedness. Even though subfamily, family, clan and common fold correlate with increasing divergence, there is unfortunately no (and there cannot be any) clear definition of the thresholds between these categories; the definitions are always arbitrary.29 Categorisation at the subfamily level would provide maximal functional prediction but would result in hundreds of different categories (some of them strongly related). At the other end of the spectrum, a grouping into common folds (only two for glycosyltransferases so far) enables the detection of distant evolutionary relationships, but provides little functional predictive power (Figure 2). Because each level has its own advantages and limitations, a classification system must include different hierarchical levels reflecting the variety of the users’ needs. The CAZy database reports, where known, three levels (family, clan and fold; Table 1). We are currently evaluating the utility of including subfamily information in the future.

The CAZy classification highlights sequence “pitfalls” including so-called “conserved” motifs and modularity

One benefit of the sequence family classification is that it allows one to assess other potential diagnostics of glycosyltransferase activity. For example, it has been observed that a number of glycosyltransferases contain a so-called DxD motif,30–33 although, confusingly, none of the elements of this conserved motif is invariant. In the GT-A fold structures, this motif binds one of the ribose hydroxyl groups and a divalent metal ion coordinated to the phosphate groups.19,21,34 A survey of the CAZy database shows that 71% of all glycosyltransferases contain this signature (regardless of the position of the motif in the sequences). This proportion does not exceed significantly that found in the unrelated glycoside hydrolases, since 69% of the latter also have a DxD motif. Indeed, the DxD motif was found in 51% of all sequences from SwissProt (version 40; 119,805 entries). The implication is clear: neither is the occurrence of such a motif alone diagnostic of a potential glycosyltransferase function, nor is the occurrence of more than one copy of a DxD motif necessarily indicative of two active centres.

An additional feature of many glycosyltransferases, in common with many carbohydrate-active enzymes, is their modularity. Dissection of this modularity is an essential pre-requisite both to successful genome annotation and utilisation of genomic data.35 A number of glycosyltransferases display an appended carbohydrate-binding module (CBM) such as CBM-13, galectin-like or other accessory non-catalytic modules that could potentially mediate recognition and/or protein–protein interaction (Figure 3). A fascinating and emerging modularity is displayed by tandem glycosyltransferase modules within a single polypeptide, common in glycosyltransferases involved in the synthesis of alternating polysaccharides such as hyaluronan, chondroitin and heparin. The first dissected and well-characterised example is that of Pasteurella multocida hyaluronan synthase, which is composed of two GT2 family members,36 one using UDP-N-acetylglucosamine, the other UDP-glucuronic acid as a donor. Since then, other examples of tandems of glycosyltransferases have been identified in glycosyltransferases producing alternating polysaccharides. Examples include bacterial heparosan synthase37 and chondroitin synthase38 (Figure 3).

Higher organisms also produce bi-functional modular glycosyltransferases. For instance, the large human heparan synthases of the EXT type can be divided into two unrelated glycosyltransferase modules assigned to families GT47 and GT64 (Figure 3). The first module, found in isolation in plant pectin β-glucuronyltransferases,39 is most likely a β-glucuronyltransferase. The second module, GT64, found in isolation in other plant genes, is an α-N-acetylglucosaminyltransferase.40 These observations demand systematic analyses of the modular structure of glycosyltransferases. An analysis of the sequences in the CAZy database reveals a number of potentially modular glycosyltransferases, which display a single glycosyltransferase domain alongside a module of uncharacterised function or family (P.M.C. & B.H., unpublished results). In many cases these may turn out to be bi-functional glycosyltransferases, as described above. An example is ORF K09C8.4 of Caenorhabditis elegans, which appears to comprise two glycosyltransferase modules from families GT8 and GT49 (Figure 3). The function of this ORF and of its human orthologs is not known.

A third and rare category of modular glycosyltransferases is that where the glycosyltransferase module carries an appended glycoside hydrolase/transglycosidase domain. The best characterised example is cell-wall α-glucan synthase of Schizosaccharomyces pombe, which contains an N-terminal module belonging to glycoside hydrolase family GH13 followed by a GT5 glycosyltransferase (Figure 3).41 It is likely that the role of the glycosidase module is to perform a transglycosylation reaction on the product of the polymerising glycosyltransferase.

Concluding remarks

Structurally, glycosyltransferases could be mistaken for “dull”, as they seem to adopt one of only two folds. Given the large number of nucleotide-sugar donors, the huge variety of acceptors (almost any class of molecule can be glycosylated: proteins, sugars, lipids, steroids, nucleic acids, antibiotics, etc.) and the resulting astronomical number of products, the two structural templates prove to be amongst the most ingenious and versatile scaffolds in nature. The almost infinite variety of glycosyltransferase products makes function prediction for thousands of sequences based on a handful of structures clearly impossible. Only a large enough number of structures and characterisations of glycosyltransferases with different donor, acceptor and product specificities will allow us to accumulate enough knowledge to predict the precise function of the many genes encoding glycosyltransferases.

Acknowledgements

We thank Chris Whitfield (Guelph, Ontario, Canada), Warren Wakarchuk (Ottawa, Ontario, Canada), Chris West (Gainesville, FL, USA) and Rafael Oriol (Villejuif, France) for useful discussions and/or for sharing unpublished observations with us. This work was funded by grant QLK5-CT2001-00443 (EDEN) of the European Commission and by the Wellcome Trust. G.J.D. is a Royal Society University Research Fellow.

References

1. Rudd, P. M., Elliott, T., Cresswell, P., Wilson, I. A. & Dwek, R. A. (2001). Glycosylation and the immune system. Science, 291, 2370–2376.
2. Wells, L., Vosseller, K. & Hart, G. W. (2001). Glycosylation of nucleocytoplasmic proteins: signal transduction and O-GlcNAc. Science, 291, 2376–2378.
3. Bertozzi, C. R. & Kiessling, L. L. (2001). Chemical glycobiology. Science, 291, 2357–2364.
4. Kleene, R. & Berger, E. G. (1993). The molecular and cell biology of glycosyltransferases. Biochim. Biophys. Acta, 1154, 283–325.
5. Sharon, N. (2001). The conquest of the last frontier of molecular and cell biology. Biochimie, 83, 555.
6. Davies, G. J. & Henrissat, B. (2002). Structural enzymology of carbohydrate-active enzymes: implications for the post-genomic era. Biochem. Soc. Trans. 30, 291–297.
7. Enzyme Nomenclature (1992). Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes, Academic Press, London.
8. Campbell, J. A., Davies, G. J., Bulone, V. & Henrissat, B. (1997). A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J. 326, 929–939.
9. Henrissat, B. (1991). A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J. 280, 309–316.
10.
17. Gaboriaud, C., Bissery, V., Benchetrit, T. & Mornon, J.-P. (1987). Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Letters, 224, 149–155.
18. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402.
19. Ünligil, U., Zhou, S., Yuwaraj, S., Sarkar, M., Schachter, H. & Rini, J. (2000). X-ray crystal structure of rabbit N-acetylglucosaminyltransferase I: catalytic mechanism and a new . EMBO J. 19, 5269–5280.
20. Bourne, Y. & Henrissat, B. (2001). Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr. Opin. Struct. Biol. 11, 593–600.
21. Tarbouriech, N., Charnock, S. J. & Davies, G. J. (2001). Three-dimensional structures of the Mn and Mg dTDP complexes of the family GT-2 glycosyltransferase SpsA: a comparison with related NDP-sugar glycosyltransferases. J. Mol. Biol. 314, 655–661.
22. Hu, Y. & Walker, S. (2002). Remarkable structural similarities between diverse glycosyltransferases. Chem. Biol. 9, 1287–1296.
23. Davies, G. & Henrissat, B. (1995). Structures and mechanisms of glycosyl hydrolases. Structure, 3, 853–859.
24. Henrissat, B. & Davies, G. (1997). Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol. 7, 637–644.
25. Gibson, R. P., Turkenburg, J. P., Charnock, S. J., Lloyd, R. & Davies, G. J. (2002). Insights into trehalose synthesis provided by the structure of the retaining glucosyltransferase OtsA. Chem. Biol. 9, 1337–1346.
26. Withers, S. G., Wakarchuk, W. W. & Strynadka, N. C. (2002). One step closer to a sweet conclusion. Chem.
Henrissat, B. & Bairoch, A. (1993). New families in Biol. 9, 1270–1273. the classification of glycosyl hydrolases based on 27. Campbell, R. E., Mosimann, S. C., Tanner, M. E. & amino acid sequence similarities. Biochem. J. 293, Strynadka, N. C. (2000). The structure of UDP-N- 781–788. acetylglucosamine 2-epimerase reveals homology to 11. Henrissat, B. & Bairoch, A. (1996). Updating the phosphoglycosyl transferases. Biochemistry, 39, sequence-based classification of glycosyl hydrolases. 14993–15001. Biochem. J. 316, 695–696. 28. Brown, K., Pompeo, F., Dixon, S., Mengin-Lecreulx, 12. Campbell, J. A., Davies, G. J., Bulone, V. V. & D., Cambillau, C. & Bourne, Y. (1999). Crystal struc- Henrissat, B. (1998). A classification of nucleotide- ture of the bifunctional N-acetylglucosamine 1-phos- diphospho-sugar glycosyltransferases based on phate uridyltransferase from Escherichia coli:a amino acid sequence similarities. Biochem. J. 329, 719. paradigm for the related pyrophosphorylase super- 13. Henrissat, B., Coutinho, P. M. & Davies, G. J. (2001). family. EMBO J. 18, 4096–4107. A census of carbohydrate-active enzymes in the 29. Henrissat, B. & Romeu, A. (1995). Families, super- genome of . Plant Mol. Biol. 47, families and subfamilies of glycosyl hydrolases. 55–72. Biochem. J. 311, 350–351. 14. Laine, R. A. (1994). A calculation of all possible oligo- 30. Costa, A. A., Gomez, F. J., Pereira, M., Felipe, M. S., saccharide isomers both branched and linear yields Jesuino, R. S., Deepe, G. S., Jr & de Almeida Soares, 1.05 £ 1012 structures for a reducing hexasaccharide: C. M. (2002). Characterization of a gene which the isomer barrier to development of single-method encodes a mannosyltransferase homolog of saccharide sequencing or synthesis systems. Paracoccidioides brasiliensis. Microb. Infect. 4, Glycobiology, 4, 759–767. 1027–1034. 15. Seto, N. O., Compston, C. A., Evans, S. V., Bundle, 31. Stolz, J. & Munro, S. (2002). The components of the D. R., Narang, S. A. 
& Palcic, M. M. (1999). Donor Saccharomyces cerevisiae mannosyltransferase substrate specificity of recombinant human blood complex M-Pol I have distinct functions in mannan group A, B and hybrid A/B glycosyltransferases synthesis. J. Biol. Chem. 277, 44801–44808. expressed in Escherichia coli. Eur. J. Biochem. 259, 32. Wiggins, C. A. & Munro, S. (1998). Activity of the 770–775. yeast MNN1 a-1,3-mannosyltransferase requires a 16. Wrabl, J. O. & Grishin, N. V. (2001). Homology motif conserved in many other families of glycosyl- between O-linked GlcNAc transferases and proteins transferases. Proc. Natl Acad. Sci. USA, 95, 7945–7950. of the glycogen phosphorylase superfamily. J. Mol. 33. Breton, C., Bettler, E., Joziasse, D. H., Geremia, R. A. & Biol. 314, 365–374. Imberty, A. (1998). Sequence–function relationships Family Classification of Glycosyltransferases 317

of prokaryotic and eukaryotic . Bacillus subtilis, in native and nucleotide-complexed J. Biochem. (Tokyo), 123, 1000–1009. forms. Biochemistry, 38, 6380–6385. 34. Pedersen, L. C., Tsuchida, K., Kitagawa, H., 43. Gastinel, L. N., Cambillau, C. & Bourne, Y. (1999). Sugahara, K., Darden, T. A. & Negishi, M. (2000). Crystal structures of the bovine b 4-galactosyltrans- Heparan/chondroitin sulfate biosynthesis: structure ferase catalytic domain and its complex with uridine and mechanism of human glucuronyltransferase I. diphosphogalactose. EMBO J. 18, 3546–3557. J. Biol. Chem. 275, 34580–34585. 44. Gastinel, L. N., Bignon, C., Misra, A. K., Hindsgaul, 35. Henrissat, B. & Davies, G. J. (2000). Glycoside hydro- O., Shaper, J. H. & Joziasse, D. H. (2001). Bovine a lases and glycosyltransferases: families, modules and 1,3-galactosyltransferase catalytic domain structure implications for genomics. Plant Physiol. 124, and its relationship with ABO histo-blood group 1515–1519. and glycosphingolipid glycosyltransferases. EMBO 36. Jing, W. & DeAngelis, P. L. (2000). Dissection of the J. 20, 638–649. two transferase activities of the Pasteurella multocida 45. Persson, K., Ly, H. D., Dieckelmann, M., Wakarchuk, hyaluronan synthase: two active sites exist in one W. W., Withers, S. G. & Strynadka, N. C. (2001). Crys- polypeptide. Glycobiology, 10, 883–889. tal structure of the retaining galactosyltransferase 37. DeAngelis, P. L. & White, C. L. (2002). Identification LgtC from Neisseria meningitidis in complex with and molecular cloning of a heparosan synthase from donor and acceptor sugar analogs. Nature Struct. Pasteurella multocida type D. J. Biol. Chem. 277, Biol. 8, 166–175. 7209–7213. 46. Gibbons, B. J., Roach, P. J. & Hurley, T. D. (2002). 38. DeAngelis, P. L. & Padgett-McCue, A. J. (2000). Crystal structure of the autocatalytic initiator of Identification and molecular cloning of a chondroitin glycogen biosynthesis, glycogenin. J. Mol. Biol. 
319, synthase from Pasteurella multocida type F. J. Biol. 463–477. Chem. 275, 24124–24129. 47. Mulichak, A. M., Losey, H. C., Walsh, C. T. & 39. Iwai, H., Masaoka, N., Ishii, T. & Satoh, S. (2002). A Garavito, R. M. (2001). Structure of the UDP- pectin glucuronyltransferase gene is essential for glucosyltransferase GtfB that modifies the hepta- intercellular attachment in the plant meristem. Proc. peptide aglycone in the biosynthesis of vancomycin Natl Acad. Sci. USA, 99, 16319–16324. group antibiotics. Structure, 9, 547–557. 40. Pedersen, L. C., Dong, J., Taniguchi, F., Kitagawa, H., 48. Ha, S., Walker, D., Shi, Y. & Walker, S. (2000). The Krahn, J., Pedersen, L. G. et al. (2003). Crystal struc- 1.9 A˚ crystal structure of Escherichia coli MurG, a ture of an a 1, 4-N-acetylhexosaminyltransferase membrane-associated glycosyltransferase involved (EXTL2), a member of the exostosin gene family in peptidoglycan biosynthesis. Protein Sci. 9, involved in heparan sulfate biosynthesis. J. Biol. 1045–1052. Chem. In press. 49. Vrielink, A., Ruger, W., Driessen, H. P. & Freemont, 41. Hochstenbach, F., Klis, F. M., van den Ende, H., van P. S. (1994). Crystal structure of the DNA modifying Donselaar, E., Peters, P. J. & Klausner, R. D. (1998). enzyme b-glucosyltransferase in the presence and Identification of a putative alpha-glucan synthase absence of the substrate uridine diphosphoglucose. essential for cell wall construction and morpho- EMBO J. 13, 3413–3422. genesis in fission yeast. Proc. Natl Acad. Sci. USA, 95, 50. Goldsmith, E. J., Sprang, S. R., Hamlin, R., Xuong, 9161–9166. N. H. & Fletterick, R. J. (1989). Domain separation in 42. Charnock, S. J. & Davies, G. J. (1999). Structure of the the activation of glycogen phosphorylase a. Science, nucleotide-diphospho-sugar transferase, SpsA from 245, 528–532.

Edited by J. Thornton

(Received 13 January 2003; received in revised form 25 February 2003; accepted 28 February 2003)
