An Evolving Hierarchical Family Classification for Glycosyltransferases
doi:10.1016/S0022-2836(03)00307-3
J. Mol. Biol. (2003) 328, 307–317

COMMUNICATION

Pedro M. Coutinho1, Emeline Deleury1, Gideon J. Davies2 and Bernard Henrissat1*

1 Architecture et Fonction des Macromolécules Biologiques, UMR6098, CNRS and Universités d’Aix-Marseille I and II, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
2 Structural Biology Laboratory, Department of Chemistry, The University of York, Heslington, York YO10 5YW, UK
* Corresponding author. E-mail address of the corresponding author: [email protected]

Glycosyltransferases are a ubiquitous group of enzymes that catalyse the transfer of a sugar moiety from an activated sugar donor onto saccharide or non-saccharide acceptors. Although many glycosyltransferases catalyse chemically similar reactions, presumably through transition states with substantial oxocarbenium ion character, they display remarkable diversity in their donor, acceptor and product specificity and thereby generate a potentially infinite number of glycoconjugates, oligo- and polysaccharides. We have performed a comprehensive survey of glycosyltransferase-related sequences (over 7200 to date) and present here a classification of these enzymes akin to that proposed previously for glycoside hydrolases, into a hierarchical system of families, clans, and folds. This evolving classification rationalises structural and mechanistic investigation, harnesses information from a wide variety of related enzymes to inform cell biology and overcomes recurrent problems in the functional prediction of glycosyltransferase-related open-reading frames.

© 2003 Elsevier Science Ltd. All rights reserved

Keywords: glycosyltransferases; protein families; classification; modular structure; genomic annotations

Abbreviations used: CAZy, carbohydrate-active enzymes; CBM, carbohydrate-binding module; EC, Enzyme Commission; IUBMB, International Union of Biochemistry and Molecular Biology; ORF, open reading frame.

The biosynthesis of complex carbohydrates and polysaccharides is of remarkable biological importance. These molecules govern a diverse range of cellular functions, including energy storage, cell-wall structure, cell–cell interactions and signalling, host–pathogen interactions, and protein glycosylation.1–3 Because these functions, especially those in which carbohydrate moieties act as a cellular language, rely on precise carbohydrate structures that display an extreme chemical diversity, the biosynthesis of oligosaccharides and polysaccharides may involve the action of hundreds of different and selective glycosyltransferases, the enzymes that transfer sugar moieties from activated donor molecules to specific acceptor molecules (Figure 1).4 The challenge of the post-genomic era is to dissect the myriad of open reading frames (ORFs) whose encoded proteins are involved in glycosyl transfer and thus conquer what Sharon has provocatively described as “the last remaining frontier of molecular and cell biology”.5

The problems of current nomenclature

A vast number of glycosyltransferase sequences are unveiled by the sequencing of genomes. Current estimates suggest that about 1% of the ORFs of each genome is dedicated to the task of glycosidic bond synthesis (P.M.C. & B.H., unpublished results). Furthermore, protein glycosylation, a glycosyltransferase-catalysed process, massively expands the functional proteome of higher organisms. It is a huge drawback, and not merely to glycobiology, that glycosyltransferases have often proved extremely hard to characterise biochemically. This has resulted in a widening chasm of ignorance separating the few enzymes of known biological activity (and even fewer with 3-D structures) from the thousands of putative glycosyltransferase sequences now available in databanks. To utilise the genomic resource to the fullest, it is essential to understand the sequences of the enzymes themselves, and how these sequences relate to enzyme structure, mechanism and specificity.6

Figure 1. “True” glycosyltransferases use sugar donors in which the activating group is typically a (substituted) phosphate such as a nucleotide (R = nucleoside monophosphate), nucleoside monophosphate (R = nucleoside) or lipid phosphate (R = lipid). The acceptor is shown in red as R′OH (R′ is extremely varied and can be a sugar, a lipid, a protein, an antibiotic, a nucleic acid, etc.). Catalysis may occur with two possible outcomes: inversion or retention of the anomeric configuration of the donor.

Traditionally, glycosyltransferases, as with other enzymes, have been classified on the basis of their donor, acceptor and product specificity, according to the recommendations of the International Union of Biochemistry and Molecular Biology (IUBMB).7 There are, however, severe limitations to the utility of this system for classification of glycosyltransferases. Such a scheme requires the full characterisation of an enzyme’s donor, acceptor(s) and product(s) before an Enzyme Commission (EC) number can be assigned. EC numbers cannot adequately accommodate enzymes that act on several distinct substrates and, furthermore, the EC numbers do not reflect the intrinsic structural and mechanistic features of the enzymes, nor were they ever intended to do so. Unfortunately, a classification scheme that demands demonstration of function is ill-equipped for the post-genomic era.

Sequence families: historical perspectives

In order to overcome the limitations of the IUBMB system and to reflect the likely increase in sequence data, Campbell and colleagues had proposed the classification of glycosyltransferases into families on the basis of similarities in amino acid sequence,8 a scheme inspired by the analogous and widely accepted classification of glycoside hydrolases.9–11 In 1997, 27 families of glycosyltransferases were described based on the analysis of the 600 sequences available at that time.8,12 Since then, the number of glycosyltransferase-related sequences has grown dramatically to over 7200 at present.

The enormous increase in the number of sequences encoding glycosyltransferases reflects several factors. Over 120 complete genome sequences are available and several hundred are at various stages of completion†, compared to the 12 that were available in 1997. The sequencing of the genomes of several higher eukaryotes has further augmented the number of glycosyltransferases, as many of these organisms contain hundreds of glycosyltransferases, reflecting the impact of glycosylation in cellular communication and differentiation.13 Furthermore, the growing impact of glycobiology has driven the functional identification of many novel glycosyltransferases, several of which are unrelated to the families originally described.8,12 Finally, our classification has been extended to integrate glycosyltransferases that utilise dolichol-phospho-sugars, sugar-1-phosphates, nucleoside monophospho-sugars and lipid diphospho-sugars as activated donors in addition to nucleotide sugar-dependent enzymes: all these enzymes perform an O-glycosylation reaction, presumably through a transition state(s) with significant oxocarbenium-ion-like character, and involving departure of a chemically equivalent phospho-containing leaving group from their activated sugar donors.

† See Genome OnLine at URL: http://wit.integratedgenomics.com/GOLD/

The factors described previously allow a significant and substantial expansion of the glycosyltransferase family classification system to encompass presently 65 distinct sequence-derived families. The resultant grouping of enzymes with different donor, acceptor or product specificity into polyspecific families provides powerful insight into the divergent evolution of glycosyltransferase families. Significantly, the discriminatory power of this classification system is such that the molecular mechanism is conserved within a given family; this provides the means for the extrapolation of mechanistic information from the few biochemically-characterised cases and provides a powerful tool to inform genomic annotation and inspire strategies for the utilisation of genomic information.

Enzymes not included in the classification

The IUBMB classification features one class of glycosyltransferase not considered here: the enzymes that utilise disaccharides, oligosaccharides or polysaccharides as sugar donors, such as cyclodextrin glucanotransferases (EC 2.4.1.19), dextransucrase (EC 2.4.1.5), xyloglucan endotransferases (EC 2.4.1.207), etc. Unlike the glycosyltransferases discussed here, these enzymes are transglycosidases which are structurally, mechanistically and evolutionarily related to glycosidases. Transglycosidases are therefore included in the glycosidase classifications.9–11

1-Phosphosugar transferases, such as those that catalyse the conversion of a UDP-sugar and dolichyl phosphate to UMP and a

Table 1.
The updated family classification of glycosyltransferases

Columns: Family; Number of sequences (12-Feb-2003); Taxonomic range; Known activities; Mechanism; Clan; Fold; Representative PDB code(s)

Family: GT1
Number of sequences (12-Feb-2003): 708
Taxonomic range: A; B; V; Ea; Ep; Ef; Ex
Mechanism: INV
Clan: I
Fold: GT-B
Representative PDB code(s): 1IIR
Known activities:
β-Glucuronosyltransferase
2-Hydroxyacylsphingosine β-galactosyltransferase
N-Acylsphingosine β-galactosyltransferase
Flavonol 3-O-β-glucosyltransferase
Indole-3-acetate β-glucosyltransferase
Sterol β-glucosyltransferase
Ecdysteroid