The diversity of dolichol-linked precursors to Asn-linked glycans likely results from secondary loss of sets of

John Samuelson*†, Sulagna Banerjee*, Paula Magnelli*, Jike Cui*, Daniel J. Kelleher‡, Reid Gilmore‡, and Phillips W. Robbins*

*Department of Molecular and Cell Biology, Boston University Goldman School of Dental Medicine, 715 Albany Street, Boston, MA 02118-2932; and ‡Department of Biochemistry and Molecular Biology, University of Massachusetts Medical School, Worcester, MA 01665-0103

Contributed by Phillips W. Robbins, December 17, 2004 The vast majority of eukaryotes (fungi, plants, animals, slime mold, to N-glycans of improperly folded proteins, which are retained in and euglena) synthesize Asn-linked glycans (Alg) by means of a the ER by conserved glucose-binding lectins (calnexin͞calreticulin) lipid-linked precursor dolichol-PP-GlcNAc2Man9Glc3. Knowledge of (13). Although the Alg glycosyltransferases in the lumen of ER this pathway is important because defects in the glycosyltrans- appear to be eukaryote-specific, archaea and Campylobacter sp. ferases (Alg1–Alg12 and others not yet identified), which make glycosylate the sequon Asn and͞or contain glycosyltransferases dolichol-PP-glycans, lead to numerous congenital disorders of with domains like those of Alg1, Alg2, Alg7, and STT3 (1, 14–16). glycosylation. Here we used bioinformatic and experimental Protists, unicellular eukaryotes, suggest three notable exceptions methods to characterize Alg glycosyltransferases and dolichol- to the N-linked glycosylation path described in yeast and animals PP-glycans of diverse protists, including many human patho- (17). First, the kinetoplastid Trypanosoma cruzi (cause of Chagas gens, with the following major conclusions. First, it is demon- myocarditis), fails to glucosylate the dolichol-PP-linked precursor strated that common ancestry is a useful method of predicting and so makes dolichol-PP-GlcNAc2Man9 (18). Another kinetoplas- the Alg inventory of each eukaryote. Second, tid Leishmania mexicana (cause of skin ulcers) lacks the manno- in the vast majority of cases, this inventory accurately predicts the sylating activities of Alg9 and Alg12 and makes dolichol-PP- dolichol-PP-glycans observed. Third, Alg glycosyltransferases are GlcNAc2Man6 (Alg9 adds both the 7th and 9th Man residues) (18). missing in sets from each organism (e.g., all of the glycosyltrans- Second, Tetrahymena pyriformis, which is a free-living ciliate, lacks ferases that add glucose and mannose are absent from Giardia and all of the mannosylating activity in the ER lumen and makes Plasmodium). Fourth, dolichol-PP-GlcNAc2Man5 (present in Entam- dolichol-PP-GlcNAc2Man5Glc3 (19). Third, it has been difficult to oeba and Trichomonas) and dolichol-PP- and N-linked GlcNAc2 identify N-glycans from either Giardia lamblia (cause of diarrhea) (present in Giardia) have not been identified previously in wild- or Plasmodium falicparum (cause of severe malaria) (20–22). type organisms. Finally, the present diversity of protist and fungal Because these observations of unique protist glycans were dolichol-PP-linked glycans appears to result from secondary loss of made before identification of multiple Alg glycosyltransferases glycosyltransferases from a common ancestor that contained the and͞or whole-genome sequencing of these protists, numerous complete set of Alg glycosyltransferases. important questions remain concerning the diversity of N-glycan precursors among eukaryotes. First, are the same Alg glycosyl- evolution ͉ N-glycans ͉ protist conserved across all eukaryotes, and are these Alg glycosyltransferases specific to eukaryotes or are they also he majority of eukaryotes studied to date (fungi, plants, ani- present in prokaryotes? Second, are kinetoplastids [Trypano- Tmals, slime mold, and euglena) synthesize Asn-linked glycans soma cruzi and Trypanosoma brucei (cause of African sleeping (Alg) by means of a lipid-linked precursor dolichol-PP- sickness)] missing the encoding the glucosylating GlcNAc2Man9Glc3 (Fig. 1A) (1–3). Each of the 14 sugars is added or are the genes present but silent, and similarly, is Tetrahymena to the lipid-linked precursor by means of a specific glycosyltrans- missing the set of genes encoding mannosylating enzymes in the ferase (Alg1–Alg12 and others as yet specified), which were num- lumen of the ER? Third, what Alg glycosyltransferases are bered according to the order of their discovery rather than by the missing from Giardia and Plasmodium? Fourth, what is the sequence of enzymatic steps (4–6). Defects in these Alg glycosyl- diversity of predicted Alg glycosyltransferases of other protists transferases lead to numerous congenital disorders of glycosylation, (e.g., Entamoeba histolytica, Trichomonas vaginalis, Toxoplasma which can cause dysmorphic features and mental retardation (7). gondii, and Cryptosporidium parvum, which cause dysentery, We and others used the budding yeast Saccharomyces cerevisiae vaginitis, birth defects, and diarrhea, respectively), and what are mutants to characterize many but not all of the Alg glycosyltrans- the Alg glycosyltransferases of Encephalitozoon cuniculi (an ferases, which are present on the cytosolic aspect of the ER. These opportunistic fungus with a dramatically reduced genome) and glycosyltransferases include Alg7, which adds phospho-GlcNAc to Cryptococcus neoformans (an opportunistic fungus that is dis- dolichol phosphate, and Alg1, Alg2, and Alg11, which add the first, tantly related to Saccharomyces)? Fifth, do the predicted Alg second, and fifth Man residues, respectively (Fig. 1A). Dolichol- glycosyltransferases correlate with the dolichol-PP-linked gly- PP-GlcNAc2Man5 is flipped into the lumen of the ER by a flippase cans of each protist or fungus? Sixth, does the pattern distribu- (Rft1) (8). Within the ER lumen, Alg3, Alg9, tion of Alg glycosyltransferases across protists, fungi, and meta- and Alg12 make dolichol-PP-GlcNAc2Man9 by using dolichol-P- zoa suggest whether these glycosyltransferases have been added Man (made by Dpm1) as the sugar donor (Fig. 1A) (1, 9). Also to or lost from eukaryotes during their evolution (23, 24)? within the ER lumen, Alg6, Alg8, and Alg10 make dolichol-PP-GlcNAc2Man9Glc3 by using dolichol-P-Glc (made by Alg5) as the sugar donor (Fig. 1A). Freely available online through the PNAS open access option. An oligosaccharyltransferase (OST), which contains a catalytic Abbreviations: Alg, Asn-linked glycan; OST; oligosaccharyltransferase; TMH, transmem- peptide, STT3, transfers the dolichol-PP-linked oligosaccharide to brane helix. ‘‘sequon’’ Asn residues (N-X-T͞S) on nascent peptides (Fig. 1A) †To whom correspondence should be addressed: E-mail: [email protected]. (10–12). A UDP-Glc:glycoprotein adds glucose © 2005 by The National Academy of Sciences of the USA

1548–1553 ͉ PNAS ͉ February 1, 2005 ͉ vol. 102 ͉ no. 5 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0409460102 Downloaded by guest on September 26, 2021 Fig. 1. The inventory of Alg glycosyltransferases and predicted dolichol-linked glycans vary dramatically among protists and fungi. Predicted Alg glycosyl- transferases and dolichol-linked glycans of Saccharomyces cerevisiae, Homo sapiens, and Dictyostelium discoideum (A), Trypanosoma cruzi, Trypanosoma brucei, Leishmania major, and Cryptococcus neoformans (B), Tetrahymena thermophilia, Toxoplasma gondii, and Cryptosporidium parvum (C), Entamoeba histolytica and Trichomonas vaginalis (D), Plasmodium falciparum and Giardia lamblia (E), and Encephalitozoon cuniculi (F) (see also Table 1). With the exceptions of Saccharomyces and Homo, sets of Alg glycosyltransferases were identified here. Names of organisms, whose dolichol-PP-linked glycans were previously identified (e.g., Saccharomyces), are indicated in black. Names of organisms, whose dolichol-PP-linked glycans were identified here (e.g., Cryptococcus), are indicated in

red. Names of organisms, whose dolichol-PP-linked glycans have not yet been identified (e.g., Cryptosporidium), are indicated in green. EVOLUTION

14 Materials and Methods mutant incubated with C-Man, GlcNAc2Man9 from a Saccha- ⌬ Use of Bioinformatics to Identify Alg Glycosyltransferases from Di- romyces mutant, and unlabeled GlcNAc and GlcNAc2. verse Protists and Fungi. Twelve Saccharomyces Alg glycosyltrans- For identification of N-glycans, Entamoeba, Trichomonas, and ferases, DPM1, and STT3 were used to search predicted proteins Giardia were labeled with mannose and GlcN for2hinmedium of eukaryotes, which have been sequenced in their entirety or containing 0.1% glucose before washing and lyophilization. The ͞ ͞ near entirety (Plasmodium falciparum, Encephalitozoon cuniculi, dry-cell pellet was delipidated with chloroform methanol water, Cryptosporidium parvum, Giardia lamblia, Homo sapiens, and glycosylphosphatidylinositol precursors were removed with water- Arabidopsis thaliana)intheNR protein database of the National saturated butanol, and glycogen and other free glycans were removed with 50% methanol (35). The clean pellet was finely Center for Biotechnology Information by using PSI-BLAST (25– resuspended by using a manual homogenizer in 500 ␮l Tris⅐HCl 30). BLASTP or TBLASTN searches of Entamoeba histolytica, Trichomonas vaginalis, Trypanosoma brucei, Trypanosoma cruzi, (0.1M, pH 8) and incubated with 50 milliunits of peptide-N- and Tetrhymena thermophilia were performed at web sites man- glycosidase F (PNGaseF) at 37°C for 16 h. Negative controls omitted the PNGaseF, and the peptide:N-glycanase supernatant aged by The Institute for Genomic Research (www.tigr.org). was chromatographed on a P-4 column. Radioactivity was mea- Predicted proteins of the slime mold Dictyostelium discoideum sured by scintillation counting, and peaks were isolated for treat- and kinetoplastid Leishmania major were performed on the ment with glycosidases. The putative GlcNAc2Man5 from Entam- Sanger Institute GENEDB web site (www.genedb.org), and the oeba and Trichomonas and putative GlcNAc2Man7 from apicomplexan Toxoplasma gondii-predicted proteins were ␣ ͞͞ Cryptococcus were treated with -1,2-mannosidase, whereas the searched at TOXODB (http: ToxoDB.org). Transmembrane he- putative GlcNAc of Giardia was cleaved with chitiobiase. lices were predicted by using the Phobius combined transmem- 2 brane topology and signal peptide predictor (31). In Vitro Synthesis of Glycopeptides by Using Intact Protist Membranes Alignments of protein sequences were made by using CLUSTALW as a Source of Dolichol-PP-Glycans. Total cellular membranes were ͞͞ ͞ (http: www.ebi.ac.uk clustalw), and manual adjustments and prepared from cultures of Trichomonas, Entamoeba, and Cryp- trimming of the alignments were performed with JALVIEW (32). tococcus. The membranes were incubated for 2–90 min at 37°C Phylogenetic trees were constructed from the positional variation with the membrane permeable tripeptide acceptor, 5 ␮M with maximum likelihood by using quartet puzzling (33, 34). 125 N␣-Ac-Asn-[ I]-Tyr-Thr-NH2 (NYT), in the presence of deoxynojiromycin to ensure that the glycopeptide products Identification of Dolichol-PP-Linked Precursors and N-Linked Glycans. were not degraded by glucosidases I and II (36). Glycopeptide Genome project strains of Entamoeba histolytica, Trichomonas products were collected by binding to immobilized Con A and vaginalis, Giardia lamblia and Cryptococcus neoformans were separated on HPLC by using standards from Saccharomyces ␮ ϭ grown axenically and labeled with 200 Ci (1 Ci 37 GBq) that included Man5GlcNAc2-NYT, Man9GlcNAc2-NYT, 3 3 3 [2- H]Man, [6- H]GlcN, or [ H]Glc in a Glc-free medium for 10 Glc3Man5GlcNAc2-NYT, and Glc3Man9GlcNAc2-NYT (37). min in a final volume of 250 ␮l (6). Dolichol-PP-linked glycans were extracted with chloroform͞methanol͞water, dried, and Results and Discussion hydrolyzed in 0.1 M HCl for 45 min at 90°C. Glycans were A Common Ancestor of Eukaryotes and Archaea May Have Contained neutralized and resuspended in 0.1 M acetic acid and 1% butanol STT3 and Alg7, but the Remaining Alg Glycosyltransferases Appear to for separation on a 1-m Biogel P-4 superfine column (BioRad). Be Eukaryote-Specific. Similarities between eukaryotic cytosolic Alg Standards were GlcNAc2Man5 from a Saccharomyces ⌬ glycosyltransferases (Alg1, Alg2, and Alg7) and STT3 and their

Samuelson et al. PNAS ͉ February 1, 2005 ͉ vol. 102 ͉ no. 5 ͉ 1549 Downloaded by guest on September 26, 2021 Table 1. Predicted Alg glycosyltransferases of representative eukaryotes Cytosol GlcNAc Cytosol Man ER lumen Man ER lumen Glc OST Genus͞species abbreviation Alg7 Alg1 Alg2 Alg11 Rft1 Alg3 Alg9 Alg12 Dpm1 Alg5 Alg6 Alg8 Alg10 STT3*

Sc͞Hs͞Dd yes yes yes yes yes yes yes yes yes yes yes yes yes 1͞2͞1 Tb͞Tc͞Lm͞Cn yes yes yes yes yes yes yes y͞y͞n͞y yes no no no no 3͞1͞4͞1 Eh͞Tv yes yes yes yes yes no no no y͞nn͞ynonono2͞3 Tt͞Cp͞Tg yes yes yes y͞y͞ny͞y͞n no no no yes yes yes y͞n͞yy͞n͞y1͞1͞1 Pf͞Gl yes no no no no no no no yes no no no no 1͞1 Ec no no no no no no no no yes no no no no 0

Sc, Saccharomyces cerevisiae; Hs, Homo sapiens; Dd, Dictyostelium discoideum; Tb, Trypanosoma brucei; Tc, Trypanosoma cruzi; Lm, Leishmania mexicana; Cn, Cryptococcus neoformans; Eh, Entamoeba histolytica; Tv, Trichomonas vaginalis; Tt, Tetrahymena thermophilia; Tg, Toxoplasma gondii; Cp, Cryptosp parvum; Pf, Plasmodium falciparum; Gl, Giardia lamblia; Ec, Encephalitozoon cuniculi. *No. of STT3 subunits in each organism.

prokaryotic counterparts suggest their common origin (1, 24), but karyotic Alg1, Alg2 and Alg11 to archaeal and bacterial glycosyl- phylogenetic methods have not been used to test this idea. STT3 is transferases, however, is unresolved, suggesting that cytosolic man- present in all eukaryotes examined except Encephalitozoon (Fig. 1, nosylating enzymes are eukaryote-specific and that their precise Table 1, and see below) and is present in multiple copies in some origins are not clear. These results then do not support the recent protists [e.g., two in Entamoeba, three in Trichomonas and Trypano- hypothesis that the set of cytosolic Alg glycosyltransferases were soma brucei, and four in Leishmania (data not shown)]. Homo- present in an archaeal ancestor of eukaryotes (24). logues of STT3 are also present in the bacterium Campylobacter In the case of Alg5 and Dpm1 that make dolichol-P-Glc and jejuni and both divisions of archaea, euryarchaeota and crenarcha- dolichol-P-Man, respectively (9), phylogenetic analyses show Alg5 eota (16). The hydrophobicity plots of the eukaryotic and prokary- and Dpm1 form distinct clades, which are well supported by otic STT3 closely resemble each other, each containing 10 to 15 bootstrap values (data not shown). The Dpm1 clade is itself divided predicted transmembrane helices (data not shown). In addition, into two clades. Dpm1 clade A contains enzymes with a C-terminal eukaryotic STT3 show a 25–45% positional identity with each other transmembrane helix (TMH) (Saccharomyces, Entamoeba, over a Ϸ700-amino acid (90%) overlap and a 19–23% positional Trypanosoma, and Leishmania). Dpm1 clade B contains enzymes identity with prokaryotic STT3 over a Ϸ600-amino acid (80%) with no TMH (plants, animals, and fungi) or an N-terminal TMH overlap. Phylogenetic analyses of STT3 show distinct eukaryotic (Plasmodium). Numerous organisms lacking TMH in their Dpm1 and archael clades, although it was not possible to determine have Dpm2 homologues, which contain two predicted TMH and whether eukaryotic STT3 are more similar to homologues of associate with Dpm1 (data not shown). Conversely, Saccharomyces, euryarchaeota or crenarchaeota (data not shown). The Campy- Plasmodium, and Entamoeba, which contain TMH in their Dpm1, lobacter STT3 appears to have been laterally transferred from are missing Dpm2 homologues. Remarkably Trichomonas contains archaea. These sequence comparisons, as well as the demonstration no Dpm1 but has multiple copies of Alg5. of OST activity of Campylobacter STT3 to an N-X-T͞S sequon (16), Phylogenetic methods also show that Alg glycosyltransferases in are consistent with the idea that a common ancestor to eukaryotes the lumen of the ER, which are unique to eukaryotes but are often and archaea contained STT3. similar to each other (e.g., Alg6 and Alg8 or Alg9 and Alg12), may Alg7, which is a UDP-GlcNAc:dolichol-phosphate GlcNAc- be clearly identified from diverse eukaryotes (data not shown) (15). 1-phosphate , is the first in the synthesis of Alg3 and Alg10, which are present only in eukaryotes, are not dolichol-PP-linked glycans and is present in all eukaryotes similar to other Alg glycosyltransferases. In organisms that contain examined except Encephalitozoon (Fig. 1, Table 1, and see them, luminal Alg glycosyltransferases are single-copy. These re- below). Proteins similar to Alg7 are predicted from whole- sults suggest the possibility (tested in the next four sections) that Alg genome sequences of some but not all archaea and bacteria. The glycosyltransferase repertoire may be used to correctly predict the hydrophobicity plots of the eukaryotic and prokaryotic Alg7 dolichol-PP-linked glycans made by each organism. closely resemble each other, each containing 8 to 12 predicted transmembrane helices (data not shown). Eukaryotic Alg7 show Sets of Alg Glycosyltransferases Correlate Precisely with Known a 28–40% positional identity with each other over an Ϸ310- Dolichol-PP-Linked Glycans. Alg glycosyltransferases were examined amino acid (80%) overlap and show an Ϸ19–23% positional first from organisms from which dolichol-PP-linked precursors identity with prokaryotic sequences over a 240-amino acid (60%) have been characterized. The slime mold Dictyostelium discoideum, overlap. Phylogenetic analyses of Alg7 show distinct eukaryotic, which makes dolichol-PP-GlcNAc2Man9Glc3, contains all 12 Alg archaeal, and bacterial clades (Fig. 2A). Although eukaryote glycosyltransferases that have been molecularly characterized (Fig. Alg7 are much more similar to archaeal than bacterial homo- 1A and Table 1) (1, 3). In contrast, Trypanosoma cruzi, which makes logues, it was again not possible to determine whether eukaryotic dolichol-PP-GlcNAc2Man9, and Trypanosoma brucei are missing Alg7 was more similar to those of euryarchaeota or crenarcha- the set of genes encoding glucosylating enzymes in the ER lumen eota. These results and the presence of dolichol-PP-linked (Fig. 1B) (18). Leishmania major, which causes visceral leishman- glycans in archaea (14) are consistent with the idea that a iasis, is also missing the Alg12 gene and likely makes dolichol-PP- common ancestor to eukaryotes and archaea contained Alg7. GlcNAc2Man7. The ciliate Tetrahymena thermophilia, which makes In the case of Alg1, Alg2, and Alg11, which add the 1st-, 2nd-, and dolichol-PP-GlcNAc2Man5Glc3 as described for Tetrahymena pyri- 5th-mannose residues to the dolichol-PP-linked precursor, respec- formis (ref. 19; unpublished data), is lacking the set of genes tively, Ͻ50% of each eukaryotic protein is alignable with homo- encoding mannosylating enzymes in the ER lumen (Fig. 1C). logues of prokaryotes, and Alg1, Alg2, and Alg11 each have Plasmodium falciparum, from which it has been difficult to identify transmembrane domains that are absent from prokaryotic glyco- N-glycans, is missing all of the Alg glycosyltransferases except Alg7 syltransferases (data not shown). Phylogenetic analyses show eu- and STT3 (Fig. 1E) (21, 22). The absence of the other Alg karyotic Alg1, Alg2, and Alg11 form distinct clades, which are well glycosyltransferases makes it likely that putative mannosylated supported by bootstrap values (Fig. 2B). The relationship of eu- N-glycans of Plasmodium were contaminants from host cells (38).

1550 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0409460102 Samuelson et al. Downloaded by guest on September 26, 2021 Fig. 2. Common ancestry is a useful method of predicting the Alg glycotransferase inventory of each eukaryote. Phylogenetic reconstructions by using the maximum likelihood method of representative eukaryotic and prokaryotic Alg7 (A) and Alg1, Alg2, and Alg11 (B). Branch lengths are proportionate to differences between sequences, and numbers at nodes indicate bootstrap values for 100 replicates. Eukaryotes include Arabidopsis thaliana (At), Cryptococcus neoformans (Cn), Cryptosporidium parvum (Cp), Dictyostelium discoideum (Dd), Entamoeba histolytica (Eh), Giardia lamblia (Gl), Homo sapiens (Hs), Leishmania

major (Lm), Plasmodium falciparum (Pf), Saccharomyces cerevisiae (Sc), Schizosaccharomyces pombe (Sp), Tetrahymena thermophilia (Tt), Toxoplasma gondii EVOLUTION (Tg), Trichomonas vaginalis (Tv), Trypanosoma brucei (Tb), and Trypanosoma cruzi (Tc). Archaea include Euryarchaeota Archaeoglobus fulgidus (Af), Ferroplasma acidarmanus (Fa), Methanococcoides burtonii (Mb), Methanococcus jannaschii (Mj), Methanopyrus kandleri (Mk), Methanosarcina mazei (Mm), Picrophilus torridus (Pt), Pyrococcus abyssi (Pab), Pyrococcus furiosus (Pfu), Pyrococcus horikoshii (Ph), and Crenarchaeota Pyrobaculum aerophilum (Pae), Sulfolobus solfataricus (Ss), and Sulfolobus tokodaii (St). Bacteria include Actinobacillus actinomycetemcomitans (Aa), Bifidobacterium longum (Bl), Borrelia garinii (Bg), Burkholderia cepacia (Bc), Clostridium acetobutylicum (Ca), Enterococcus hirae (Ehi), Listeria innocua (Li), Staphylococcus aureus (Sa) Streptococcus pneumoniae (Spn), Synechococcus elongates (Se), Thermus thermophilus (Tth), and Tropheryma whipplei (Tw). Not all organisms are present in each tree.

These results suggest that the absence of glycosyltransferase activity Entamoeba and Trichomonas Make the Predicted Dolichol-PP- (18, 19) is caused by the absence of the gene encoding the enzyme GlcNAc2Man5, Whereas Cryptococcus Makes Dolichol-PP-GlcNAc2- rather than an inactive gene so that sets of predicted Alg glycosyl- Man7–9. The major dolichol-PP-linked glycan of Entamoeba his- transferases correlate precisely with experimentally determined tolytica and Trichomonas vaginalis contains GlcNAc2Man5, dolichol-PP-linked glycans. which was predicted from the Alg glycosyltransferases of these protists (Fig. 3 A and B). Each peak digests with ␣-1,2- Whole-Genome Sequences of Select Protists and Fungi Predict Addi- mannosidase to GlcNAc2Man3 and mannose (data not shown). tional Diversity in Alg Glycosyltransferases and Subsequently Doli- To show that the labeled dolichol-PP-linked precursor is the chol-PP-Linked Glycans. Like kinetoplastids, the fungus Cryptococcus same as that transferred to nascent peptides by the OST, neoformans is lacking the set of glucosylating enzymes in the ER membranes of Entamoeba and Trichomonas were incubated with lumen and is predicted to make dolichol-PP-GlcNAc2Man9 (Fig. an iodinated tripeptide NYT. As expected, the products of 1B, Table 1, and see below) (18). Like Tetrahymena to which it is Entamoeba and Trichomonas in vitro comigrate with the similar, Toxoplasma gondii and Cryptosporidium parvum are missing GlcNAc2Man5 standard (Fig. 3 D and E). Finally, when Enta- the set of luminal mannosylating enzymes (Fig. 1C) (19). Crypto- moeba and Trichomonas are briefly labeled with 3H-Man in vivo sporidium is also missing Alg8 and Alg10 and likely makes dolichol- and N-glycans are released with PNGaseF, a major product is PP-GlcNAc Man Glc (Table 1). Entamoeba histolytica and 2 5 GlcNAc Man (data not shown). Trichomonas vaginalis are missing sets of luminal Alg glycosyltrans- 2 5 Although we have not isolated each Alg enzyme individually and ferases that add Man and Glc to lipid-linked precursors and likely have not shown its glycosyltransferase activity in vitro, correct make dolichol-PP-GlcNAc2Man5 (Fig. 1D and see below). Like Plasmodium, Giardia lamblia is missing all Alg glycosyl- predictions of previously uncharacterized dolichol-PP-linked gly- transferases except Alg7 and STT3 (Fig. 1E, Table 1, and see cans of Entamoeba and Trichomonas strongly suggest each Alg below). The presence of Rft1 in all eukaryotes except Giardia, glycosyltransferase is functioning as expected. The presence of Plasmodium, and Encephalitozoon (see below) supports the idea UDP-Glc:glycoprotein glucosyltransferase, glucosidase II, calreti- ͞ that Rft1 flips dolichol-PP-GlcNAc Man into the lumen of the culin calnexin, and ERGIC-53 in Entamoeba and Trichomonas 2 5 ͞ ER (8). Why Toxoplasma is also missing Rft1 is not clear. (unpublished data) suggests these enzymes and or lectins function Encephalitozoon cuniculi, whose genome has been sequenced with N-glycans built on GlcNAc2Man5 rather than the usual in its entirety, lacks all Alg glycosyltransferases and is missing GlcNAc2Man9 (13, 24). STT3 (Fig. 1F and Table 1) (28). Consistent with the absence of When the fungus Cryptococcus neoformans is labeled with 3 N-glycans, Encephalitozoon, like Plasmodium and Giardia,is H-Man, the major peak on P-4 runs with GlcNAc2Man7–8, missing UDP-Glc:glycoprotein glucosyltransferase, glucosidases whereas a minor peak runs with GlcNAc2Man9, which was I and II, calreticulin͞calnexin, ERGIC-53, and ␣-1,2- predicted from their complement of Alg genes (Fig. 3C). How- mannosidases, which operate on N-linked glycans in the ER and ever, GlcNAc2Man9 is the most abundant glycan transferred to Golgi apparatus (unpublished data) (13, 24). the iodinated peptide in vitro (Fig. 3F).

Samuelson et al. PNAS ͉ February 1, 2005 ͉ vol. 102 ͉ no. 5 ͉ 1551 Downloaded by guest on September 26, 2021 Fig. 3. Predicted Trichomonas, Entamoeba, and Cryptococcus dolichol-PP-glycans were identified in vivo and in vitro. Dolichol-PP-linked precursors from Trichomonas vaginalis (A), Entamoeba histolytica (B), and Cryptococcus neoformans (C), as well as in vitro OST assays by using membranes from Trichomonas (D), Entamoeba (E), and Cryptococcus (F). In A–C, glycans were labeled in vivo with [3H]Man and separated on a P-4 column, whereas in D–F, glycans were transferred to a radio-iodinated tripeptide NYT in vitro, captured with Con A, and separated by HPLC.

Giardia Dolichol-PP- and N-Linked Glycans Contain GlcNAc1–2. No 5A) or from secondary loss of sets of Alg glycosyltransferases Giardia lamblia dolichol-PP- or N-linked glycans are labeled with (Fig. 5B) (23, 24). Fig. 5A is drawn so that organisms with the 3H-Man, consistent with the absence of cytosolic Alg glycosyl- least complex N-glycans are at the bottom, whereas those with transferases that add Man to dolichol-PP-linked precursors (Fig. the most complex glycans are at the top. Each step in Fig. 5A 1E and Table 1). In contrast, when Giardia is labeled with indicates the addition of a particular set of sugars to the N-glycan 3H-GlcN, dolichol-PP- and N-linked glycans include GlcNAc precursor. For example, an ancestral eukaryote like Encephali- and GlcNAc2 (diacetylchitobiose) (solid lines in Figs. 4A and tozoon lacked all dolichol-PP-linked precursors (step 1) until 4B). As expected, Giardia dolichol-PP- and N-linked GlcNAc2 STT3 and Alg7 were obtained to make organisms like Giardia are cleaved with chitobiase to GlcNAc (dotted lines in Fig. 4 A and Plasmodium (step 2). Subsequently cytosolic mannosylating and B). These results, which suggest there is no further modifi- cation of N-glycans in the Golgi apparatus of Giardia, are consistent with (i) binding of wheat germ agglutinin to the surface of Giardia (39), and (ii) the prediction that Ϸ90 secreted proteins of Giardia have Ն10 predicted sites for N-linked glycosylation (unpublished data) (10). These results suggest that the Alg enzyme that makes dolichol-PP-GlcNAc2, which has not yet been molecularly characterized, is present in Giardia (1).

Origins of Eukaryotic Alg Glycosyltransferase Diversity. The present diversity of fungal and protist precursors for N-glycosylation may have resulted from development of an increasingly complex series of Alg glycosyltransferases with eukaryotic evolution (Fig.

Fig. 4. Giardia dolichol- and N-linked glycans are composed of GlcNAc and Fig. 5. The present diversity of protist and fungal dolichol-PP-lined glycans GlcNAc2. Dolichol-PP-linked glycans (A) and N-linked glycans (B)ofGiardia appears to result from secondary loss glycosyltransferases from a common lamblia, each labeled in vivo with [3H]GlcN and separated on a P-4 column. ancestor that contained the complete set of Alg glycosyltransferases. Models Solid lines indicate labeled products, whereas dotted lines indicate products of suggesting sequential addition (A) or secondary loss (B) of Alg glycosyltrans- digestion of the excised GlcNAc2 peak with chitobiase. Giardia dolichol- and ferases during eukaryotic evolution are shown. Nodes, which are labeled N-linked glycans are composed of GlcNAc and GlcNAc2. numerically in A and alphabetically in B, are explained in the text.

1552 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0409460102 Samuelson et al. Downloaded by guest on September 26, 2021 enzymes were added to make ancestors resembling Entamoeba because of poor resolution at the base of the eukaryotic phylo- and Trichomonas (step 3), followed by the addition of luminal genetic tree, suggests that a common eukaryotic ancestor con- mannosylating enzymes as in Trypanosoma and Cryptococcus tained Alg7 and STT3. In this hybrid model, Giardia branched (step 4) (18) or of luminal glucosylating enzymes as in Tetrahy- off before acquisition of mannosylating and glucosylating en- mena and Cryptosporidium (step 5) (19). The final result was zymes. Still, secondary losses of Alg glycosyltransferases at nodes organisms with an entire set of Alg glycosyltransferases and a a–d remain a major factor in the diversity of N-glycan precursors complete 14-sugar dolichol-PP-linked precursor (step 6) as in among protists and fungi. Similarly, secondary loss explains the Saccharomyces, Euglena, Dictyostelium, animals, and plants (1). absence of most mitochondrial function in microaerophilic pro- The difficulties with the Fig. 5A model are (i) Alg7 and STT3 tists such as Giardia, Entamoeba, and Trichomonas (44, 45). appear to have been present in an ancestor to both eukaryotes and prokaryotes and should be present in Encephalitozoon (16), Significance. The bioinformatic approach here allowed us to quickly (ii) it is impossible to determine whether luminal mannosylation and accurately predict the dolichol-PP-glycans made by each protist (step 4) came before or after luminal glucosylation (step 5), and and fungus, which we confirmed for free-living organisms (Giardia, (iii) the model is in disagreement with rRNA and protein Entamoeba, Trichomonas, and Cryptococcus) and have yet to con- phylogenies, which do not place Encephalitozoon at the base of firm for intracellular pathogens (Plasmodium, Toxoplasma, Cryp- the phylogenetic tree and do not pair Giardia with Plasmodium, tosporidium, and Encephalitozoon). Although it has previously been Entamoeba with Trichomonas, Trypanosoma with Cryptococcus, assumed that the diversity of eukaryotic N-glycans results primarily or Saccharomyces with Euglena (40–42). from differential modification of a common GlcNAc2Man9Glc3 In the Fig. 5B model, which groups organisms according to precursor in the ER and Golgi apparatus (1, 24), these results rRNA and protein phylogenies, the distribution of Alg glyco- suggest that there are major differences in the dolichol-PP-glycans syltransferases is best rationalized by secondary loss (23). At transferred to the nascent peptide. Secondary losses of Alg glyco- node a (fungi), there is loss of luminal glucosylating enzymes in syltransferases and mitochondrial function (44, 45) suggest all Cryptococcus and all Alg glycosyltransferases in Encephalito- extant eukaryotes may derive from a relatively complex last com- zoon, whereas at node b (amebozoa), all luminal Alg glycosyl- mon ancestor, and that simple, deeply branching eukaryotes with a transferases are lost from Entamoeba. At node c (ciliates and primary absence of important biochemical pathways may no longer apicomplexa), some luminal glucosylating enzymes are lost from exist (23). It remains to be determined how the diversity of Cryptosporidium, whereas all luminal glucosylating enzymes and dolichol-PP-glycans effects OST function, protein-folding in the EVOLUTION all cytosolic mannosylating enzymes are lost from Plasmodium. ER, and modification of glycans in the Golgi apparatus, as well as At node d (kinetoplastids and Euglena), there is loss of luminal the antigenicity of glycoproteins on surfaces of these important glucosylating enzymes from Trypanosoma and an additional loss human pathogens. of one mannosylating enzyme from Leishmania major.Ifone assumes that Trypanosoma, Giardia, and Trichomonas all We thank Ann-Marie Surette and Charles Specht for help with protist branched at the same time from the base of the tree (node e), and fungal cultures; Kosuke Hashimoto for help with collection and there are additional secondary losses from Giardia and alignment of Alg sequences; Prashanth Vishwanath for advice on phylogenetic methods; investigators at The Institute for Genomic Re- Trichomonas. Finally, if one assumes the ‘‘big bang’’ hypothesis search and The Sanger Institute for release of preliminary sequence data for eukaryotic origins (43), the common ancestor must have had for numerous protists and fungi; and Temple Smith and Armando Parodi all of the Alg glycosyltransferases, and all of the differences for their comments on this manuscript. This work was supported in part among extant eukaryotes are due to secondary loss. by National Institutes of Health Grants AI44070 and AI48082 (to J.S.), A hybrid of the Fig. 5 models, which we cannot rule out GM43768 (to R.G.), and GM31318 (to P.W.R.).

1. Burda, P. & Aebi, M. (1999) Biochim. Biophys. Acta. 1426, 239–257. 27. Gardner, M. J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R. W., Carlton, 2. de la Canal, L. & Parodi, A. J. (1985) Comp. Biochem. Physiol. B Biochem. Mol. Biol. J. M., Pain, A., Nelson, K. E., Bowman, S. et al. (2002) Nature 419, 498–511. 81, 803–805. 28. Katinka, M. D., Duprat, S., Cornillot, E., Metenier, G., Thomarat, F., Prensier, G., 3. Ivatt, R. L., Das, O. P., Henderson, E. J. & Robbins, P. W. (1984) Cell 38, 561–567. Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P., et al. (2001) Nature 414, 4. Huffaker, T. & Robbins, P. W. (1983) Proc. Natl. Acad. Sci. USA 80, 7466–7470. 450–453. 5. Reiss, G., te Heesen, S., Zimmerman, J., Robbins, P. W. & Aebi, M. (1996) 29. McArthur, A. G., Morrison, H. G., Nixon, J. E. J., Passamaneck, N. Q. E., Kim, U., Glycobiology 6, 493–498. Hinkle, G., Crocker, M. K., Holder, M. E., Farr, R., Reich, C. I., et al. (2000) FEMS 6. Cipollo, J. F., Trimble, R. B., Chi, J. H., Yan, Q. & Dean, N. (2001) J. Biol. Chem. Microbiol. Lett. 189, 271–273. 276, 21828–21840. 30. Mewes, H. W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., 7. Aebi, M. & Hennet, T. (2001) Trends Cell Biol. 11, 136–141. Heumann, K., Kleine, K., Maierl, A., Oliver, S. G., et al. (1997) Nature 387, 7–65. 8. Helenius, J., Ng, D. T., Marolda, C. L., Walter, P., Valvano, M. A. & Aebi, M. (2002) Nature 415, 447–450. 31. Kall, L., Krogh, A. & Sonnhammer, E. L. (2004) J. Mol. Biol. 338, 1027–1036. 9. Orlean, P., Albright, C. & Robbins, P. W. (1988) J. Biol. Chem. 263, 17499–17507. 32. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 10. Kornfeld, R. & Kornfeld, S. (1985) Annu. Rev. Biochem. 54, 631–664. 4673–4680. 11. Silberstein, S. & Gilmore, R. (1996) FASEB J. 10, 849–858. 33. Jones, D. T., Taylor, W. R. & Thornton, J. M. (1992) Comput. Appl. Biosci. 8, 275–282. 12. Yan, Q. & Lennarz, W. J. (2002) J. Biol. Chem. 277, 47692–47700. 34. Strimmer, K. & Von Haeseler, A. (1997) Proc. Natl. Acad. Sci. USA 94, 6815–6819. 13. Parodi, A. J. (2000) Biochem. J. 348, 1–13. 35. McConville, M. J., Thomas-Oates, J. E., Ferguson, M. A. J. & Homans, S. W. J. Biol. 14. Lechner, J. & Wieland, F. (1989) Annu. Rev. Biochem. 58, 173–194. Chem. 265, 19611–19623. 15. Oriol, R., Martinez-Duncker, I., Chantret, I., Mollicone, R. & Codogno, P. (2002) 36. Kelleher, D. J., Kreibich, G. & Gilmore, R. (1992) Cell 69, 55–65. Mol. Biol. Evol. 19, 1451–1463. 37. Kelleher, D. J., Karaoglu, D. & Gilmore, R. (2001) Glycobiology 11, 321–333. 16. Wacker, M., Linton, D., Hitchen, P. G., Nita-Lazar, M., Haslam, S. M., North, S. J., 38. Kimura, E. A., Couto, A. S., Peres, V. J., Casal, O. L. & Katzin, A. M. (1996) J. Biol. Panico, M., Morris, H. R., Dell, A., Wren, B. W. & Aebi, M. (2002) Science 298, Chem. 271, 14452–14461. 1790–1793. 39. Ortega-Barria, E., Ward, H. D., Evans, J. E. & Pereira, M. E. (1990) Mol. Biochem. 17. Guha-Niyogi, A., Sullivan, D. R. & Turco, S. J. (2001) Glycobiology 11, 45R–59R. Parasitol. 43, 151–165. 18. Parodi, A. J. (1993) Glycobiology 3, 193–199. 40. Baldouf, S. L. (2003) Science 300, 1703–1706. 26, 19. Yagodnik, C., de la Canal, L. & Parodi, A. J. (1987) Biochemistry 5937–5943. 41. Bapteste, E., Brinkmann, H., Lee, J. A., Moore, D. V., Sensen, C. W., Gordon, P., 20. Adam, R. D. (2001) Clin. Microbiol. Rev. 14, 447–475. Durufle, L., Gaasterland, T., Lopez, P., Mu¨ller, M. & Philippe, H. (2002) Proc. Natl. 21. Berhe, S., Gerold, P., Kedees, M. H., Holder, A. A. & Schwarz, R. T. (2000) Exp. Acad. Sci. USA 99, 1414–1419. Parasitol. 94, 194–197. 22. Gowda, D. C., Gupta, P. & Davidson, E. A. (1997) J. Biol. Chem. 272, 6428–6439. 42. Sogin, M. L. & Silberman, J. D. (1998) Int. J. Parasitol. 28, 11–20. 23. Dacks, J. B. & Doolittle, W. F. (2001) Cell 107, 419–425. 43. Philippe, H., Germot, A. & Moreira, D. (2000) Curr. Opin. Genet. Dev. 10, 24. Helenius, A. & Aebi, M. (2004) Annu. Rev. Biochem. 73, 1019–1049. 596–601. 25. Abrahamsen, M. S., Templeton, T. J., Enomoto, S., Abrahante, J. E., Zhu, G., Lancto, 44. Embley, T. M., van der Giezen, M., Horner, D. S., Dyal, P. L. & Foster, P. (2003) C. A., Deng, M., Liu, C., Widmer, G., Tzipori, S., et al. (2004) Science 304, 441–445. Philos. Trans. R. Soc. Lond. B. 358, 191–201. 26. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & 45. Mai, Z., Ghosh, S., Frisardi, M., Rosenthal, B., Rogers, R. & Samuelson, J. (1999) Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389–3402. Mol. Cell. Biol. 19, 2198–2205.

Samuelson et al. PNAS ͉ February 1, 2005 ͉ vol. 102 ͉ no. 5 ͉ 1553 Downloaded by guest on September 26, 2021