COMPUTATIONAL STRUCTURAL AND FUNCTIONAL PROTEOMICS (BGRS'2004)

THE ALPHA-GALACTOSIDASE SUPERFAMILY: SEQUENCE-BASED CLASSIFICATION OF ALPHA-GALACTOSIDASES AND RELATED GLYCOSIDASES

Naumoff D.G.
State Institute for Genetics and Selection of Industrial Microorganisms, Moscow, Russia

Keywords: α-galactosidase, melibiase, glycoside hydrolase, GH-D clan, GH31 family, GHX family, COG1649, enzyme classification, protein family, protein phylogeny

Summary

Motivation: About 1% of the genes in a genome encode enzymes with glycosidase activity. On the basis of sequence similarity, all known glycosidases have been classified into 90 families. In many cases, proteins from different families share a common evolutionary origin, which makes it necessary to combine the corresponding families into superfamilies.

Results: Using PSI-BLAST, we found significant sequence similarity among several glycosidase families, two of which include enzymes with α-galactosidase activity. Sequence homology, a common catalytic mechanism, similar folds, and the composition of the active center allowed us to group three of these families (GH27, GH31, and GH36) into the α-galactosidase superfamily. Phylogenetic analysis of this superfamily revealed a polyphyletic origin of family GH36, which could be divided into four families. Glycosidases of the α-galactosidase superfamily are distantly related to proteins of glycosidase families GH13, GH70, and GH77, as well as to two families of predicted glycosidases.

Introduction

Glycoside hydrolases, or glycosidases (EC 3.2.1.-), are a widespread group of enzymes that hydrolyze the glycosidic bonds between two carbohydrates or between a carbohydrate and an aglycone moiety. The great diversity of these enzymes reflects the extensive variety of their natural substrates: di-, oligo-, and polysaccharides.
Comparative analysis of 300 glycosidase amino acid sequences known at the beginning of the 1990s showed that they could be classified into 36 families. Recent progress in genome sequencing has produced a huge number of enzymatically uncharacterized proteins: about 1% of all genes encode enzymes with predicted glycosidase activity. Currently, more than ten thousand sequences of glycosidases and their homologues are known. They are grouped into 91 families, GH1–GH95 (except GH21, GH40, GH41, and GH60). Several glycosidases have no known homologues; they are placed in a group of non-classified glycoside hydrolases.

Glycosidases catalyze hydrolysis of the glycosidic bond via two general mechanisms, leading to either inversion or overall retention of the anomeric configuration at the cleavage point. Related glycosidase families that share the same molecular mechanism of hydrolysis have been combined into clans. Currently, 14 clans (GH-A–GH-L) are described, containing 46 families in total (see the Carbohydrate-Active Enzymes server, http://afmb.cnrs-mrs.fr/CAZY/).

Melibiases, or α-galactosidases [E.C. 3.2.1.22], are glycosidases that cleave, with overall retention of the anomeric configuration, terminal non-reducing α-D-galactose residues in α-D-galactosides, including galactose oligosaccharides, galactomannans, and galactolipids. On the basis of sequence similarity, all α-galactosidases have been classified into four glycosidase families: GH4, GH27, GH36, and GH57. Families GH4 and GH57 consist mostly of other glycosidases. Most known α-galactosidases belong to families GH27 and GH36, which form clan GH-D. Proteins of this clan share distant sequence similarity with members of several other glycosidase families (Naumoff, 2001; Rigden, 2002).
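The family and clan counts above amount to simple bookkeeping. The Python sketch below is illustrative only and is not part of the original analysis: it rebuilds the list of family identifiers in use (GH1 to GH95 minus the four excluded numbers) and records the two clans named in this section. The names `EXCLUDED`, `families`, `CLANS`, and `clan_of` are hypothetical, and the membership data are taken from the text, not from a query of the CAZy server.

```python
from typing import Optional

# Hypothetical bookkeeping with data taken from the text above.
# Family numbers GH1..GH95 are in use, except four excluded numbers.
EXCLUDED = {21, 40, 41, 60}
families = [f"GH{n}" for n in range(1, 96) if n not in EXCLUDED]

# The two clans named in the text, each grouping families that share
# a catalytic mechanism.
CLANS = {
    "GH-D": {"GH27", "GH36"},
    "GH-H": {"GH13", "GH70", "GH77"},
}

# Families reported to contain alpha-galactosidases (EC 3.2.1.22).
ALPHA_GAL_FAMILIES = ["GH4", "GH27", "GH36", "GH57"]


def clan_of(family: str) -> Optional[str]:
    """Return the clan listed for `family` in CLANS, or None if absent."""
    for clan, members in CLANS.items():
        if family in members:
            return clan
    return None


if __name__ == "__main__":
    print(len(families))  # 91
    for fam in ALPHA_GAL_FAMILIES:
        print(fam, clan_of(fam))
```

GH4 and GH57 come back as `None` only because this toy table lists just the two clans mentioned in the section; it says nothing about their actual clan assignment.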
The recently determined tertiary structures of several members of family GH27 are similar to the structure of retaining glycosidases from family GH13 (clan GH-H). Glycosidases of both families consist of an N-terminal catalytic (β/α)8-barrel domain and a C-terminal β-sandwich domain.

Results and Discussion

Sequences of proteins belonging to families GH27 and GH36 according to the Carbohydrate-Active Enzymes classification were used as queries for BLAST searches of the GenPept amino acid sequence database on the NCBI server. The resulting dataset was supplemented with translations of nucleotide sequences found by screening genomic sequences with Genomic BLAST. In total, we analyzed more than 300 proteins.

Family GH27 includes representatives from Eukaryota (Alveolata, Fungi, Metazoa, Mycetozoa, Viridiplantae) and Bacteria (Actinobacteria, Bacteroidetes, Fibrobacteres, Firmicutes, Proteobacteria). Its members possess α-galactosidase, isomalto-dextranase [E.C. 3.2.1.94], α-N-acetylgalactosaminidase [E.C. 3.2.1.49], and galactosyltransferase [E.C. 2.4.1. ] activities. Multiple alignment of the full-length GH27 sequences showed that all but three proteins have both domains characteristic of the family; these three enzymatically uncharacterized proteins contain only the N-terminal catalytic domain. Some (mostly prokaryotic) proteins have additional domains, which we grouped into eight families by sequence homology. Pairwise sequence comparisons showed that the majority of GH27 proteins share more than 30% identity, which meets the criterion for a glycosidase subfamily. All these proteins were grouped into subfamily 27a. Another subfamily, 27b, includes five enzymatically uncharacterized proteins from plants and bacteria. Two fungal proteins, including one α-galactosidase, are the only representatives of subfamily 27c.
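The subfamily criterion mentioned above (more than 30% pairwise identity) can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used in this work, which the text does not specify: percent identity is computed from two sequences already aligned to equal length, counting only columns where neither sequence has a gap (one of several common conventions), and proteins are then grouped by single-linkage clustering at a chosen threshold. The function names `percent_identity` and `group_by_identity`, the single-linkage rule, and the toy sequences are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: only columns where neither sequence has a gap ('-') are
    compared; other conventions divide by alignment or sequence length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def group_by_identity(aligned: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Single-linkage grouping (an assumed rule): two proteins share a group
    if a chain of pairs with identity above `threshold` percent links them."""
    parent = {name: name for name in aligned}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(aligned, 2):
        if percent_identity(aligned[p], aligned[q]) > threshold:
            parent[find(p)] = find(q)

    groups: Dict[str, Set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    # Order groups by their alphabetically sorted member names.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not real GH27 proteins).
    toy = {
        "p1": "MKTAYIAKQR",
        "p2": "MKTAYIAKQK",
        "p3": "GDWLPNVHLE",
        "p4": "GDWLPNVH-Q",
    }
    print(percent_identity(toy["p1"], toy["p2"]))  # 90.0
    print(group_by_identity(toy, threshold=30.0))
```

Rerunning with `threshold=50.0` would approximate the tighter subgroups described for subfamily 27a, except that "no less than 50%" corresponds to `>=` rather than the strict `>` used here; the two differ only for pairs exactly at the boundary.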
A unique isomalto-dextranase from Arthrobacter globiformis and two other bacterial proteins do not belong to any of the subfamilies. The largest subfamily, 27a, includes three subgroups, each containing sequences with at least 50% identity; these subgroups comprise proteins from yeasts, plants, and chordates, respectively.

We used phylogenetic analysis of family GH27 to study the evolutionary relationships of its members. Trees constructed by the neighbor-joining and maximum parsimony methods (PHYLIP package) were topologically similar: all subfamilies (27a–27c) appear to form monophyletic groups with bootstrap values higher than 90%. Eukaryotic proteins form five distinct clusters of branches on the phylogenetic trees.

PSI-BLAST searches (E-value threshold 0.001 or 0.01), with a few randomly selected divergent members of family GH27 as queries, revealed some members of glycosidase families GH31 and GH36 during the first or second iteration. Further iterations yielded members of clan GH-H, which includes families GH13, GH70, and GH77. We also found a number of enzymatically uncharacterized hypothetical bacterial proteins from several genome projects. Sequence analysis allowed us to group them into two distinct families. One of them is known as COG1649. The other includes a unique α-glucosidase [E.C. 3.2.1.20], SusB from Bacteroides thetaiotaomicron, which belongs to the group of non-classified glycoside hydrolases. We identified the latter family for the first time and named it GHX. Statistically significant similarity between GH27 glycosidases and members of the other protein families was detected only within the N-terminal catalytic (β/α)8-barrel domain.

Families GH31 and GH36 include representatives from Archaea, Bacteria, and Eukaryota. In addition to α-galactosidase activity, α-N-acetylgalactosaminidase, stachyose synthase [E.C. 2.4.1.67], and raffinose synthase [E.C.
2.4.1.82] activities have been described for some members of family GH36. Family GH31 includes retaining enzymes with α-glucosidase [E.C. 3.2.1.20], glucoamylase [E.C. 3.2.1.3], sucrase-isomaltase [E.C. 3.2.1.10 and E.C. 3.2.1.48], α-xylosidase [E.C. 3.2.1.-], α-glucan lyase [E.C. 4.2.2.13], and isomaltosyltransferase [E.C. 2.4.1. ] activities.

Multiple protein sequence alignment showed that two key Asp residues, which act as the nucleophile and the proton donor in the active center, are located at homologous sites of the catalytic domain in proteins of families GH27, GH31, and GH36. Based on sequence homology, the composition of the active center, a common catalytic mechanism with overall retention of the anomeric configuration of the α-D-glycopyranoside substrate, and a predicted common (β/α)8 TIM-barrel tertiary structure of the catalytic domain, we combined families GH27, GH31, and GH36 into the α-galactosidase superfamily (Fig.).

Phylogenetic analysis of proteins from the α-galactosidase superfamily showed that GH27 and GH31 appear to be monophyletic families, whereas family GH36 is polyphyletic. Sequence analysis allowed us to distinguish four monophyletic subgroups within family GH36. We suggest considering these subgroups as four different glycosidase families (GH36A–GH36D) belonging to the α-galactosidase superfamily (Fig.). Family GH36A includes proteins from Fungi and several bacterial phyla (Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria, Spirochaetes). Family GH36B contains only bacterial proteins (Actinobacteria, Proteobacteria, Spirochaetes, Thermotogales, Thermus). Among members of families GH36A and GH36B, only α-galactosidase activity has been shown. Family GH36C is composed of proteins from Archaea (Crenarchaeota), Bacteria (Actinobacteria, Bacteroidetes), and Eukaryota