Supporting Online Material
SUPPLEMENTARY MATERIAL

The glycan alphabet is not universal: a hypothesis

Jaya Srivastava1*, P. Sunthar2 and Petety V. Balaji1

1 Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
2 Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

*Corresponding author
Email: [email protected]

CONTENTS

Data           Description
Figure S1      Number of organisms with different number of strains sequenced
Figure S2      Biosynthesis pathways
Figure S3      Proteome sizes for different number of monosaccharides
Figure S4      Prevalence of monosaccharides in species versus that in genomes
Figure S5      Bit score distribution plots for hits of various pairs of profiles
Table S1       Tools and databases used in this study
References     References cited in Table S1
Table S2       Comparison of the precursor and nucleotide used for the biosynthesis of two enantiomers of a monosaccharide
Flowchart S1   Procedure used to generate HMM profiles
Flowchart S2   Precedence rules for assigning annotation to proteins that are hits to two or more profiles and/or BLASTp queries
References     References to the research articles which describe the pathways (or enzymes of the pathways) of monosaccharide biosynthesis. These formed the basis for generating HMM profiles and choosing BLASTp queries.

MS-EXCEL file provided separately: Supplementary Data.xlsx

Worksheet1     Details of HMM profiles
Worksheet2     Details of BLASTp queries
Worksheet3     Prevalence of monosaccharides in genomes / species
Worksheet4     Abbreviated names of monosaccharides
Worksheet5     Enzyme types, enzymes and monosaccharide groups
Worksheet6     Precursors of various monosaccharides

Figure S1 The number of species for which different number of strains are sequenced. Six or fewer strains are sequenced for most of the species. On the other hand, more than 50 strains are sequenced for 29 species.
Escherichia coli and Salmonella enterica have the highest number of sequenced strains (714 and 602, respectively). Genus and species names are not known for 45 endosymbionts; only their host name is known, e.g., Legionella endosymbiont. Each such case is considered as a distinct species.

[Figure S2a: biosynthesis pathway diagram. Legend: D-enantiomer; L-enantiomer; HMM profile; BLASTp query. D-enantiomer / pyranose form unless mentioned otherwise.
Compounds shown: glucose-1-phosphate; (d)TDP-glucose; (d)TDP-fucose; (d)TDP-fucofuranose; (d)TDP-4-keto-6-deoxy-glucose; (d)TDP-4-amino-6-deoxy-glucose; (d)TDP-4-amino-6-deoxy-galactose; (d)TDP-3-keto-6-deoxy-glucose; (d)TDP-3-keto-6-deoxy-galactose; (d)TDP-3-amino-6-deoxy-glucose; (d)TDP-3-amino-6-deoxy-galactose; (d)TDP-4-keto-L-rhamnose; (d)TDP-L-rhamnose; (d)TDP-6-deoxy-L-talose; (d)TDP-Qui3NAc; (d)TDP-Fuc3NAc; (d)TDP-Qui4NAc; (d)TDP-Qui4NFo; (d)TDP-Fuc4NAc.
Enzyme steps shown include: ThymidylylT; Mutase (Pyr to Fur); 4,6-Dehydratase (R), retention of config @ C5; 4-AminoT; 3,4-Ketoisomerase; 3-AminoT (E); AcetylT (3-amino); AcetylT (4-amino); 4-Epimerase.]

Figure S2a TDP-/dTDP-linked monosaccharides derived from glucose-1-phosphate. Abbreviated names are used for some of the monosaccharides.
Full names of these are given in Supplementary_data.xlsx:Worksheet4.

[Figure S2b: biosynthesis pathway diagram. Legend: D-enantiomer; L-enantiomer; HMM profile; BLASTp query. D-enantiomer / pyranose form unless mentioned otherwise.
Compounds shown: glucose-1-phosphate; CDP-glucose; CDP-4-keto-6-deoxy-glucose; CDP-4-keto-3C-methyl-6-deoxy-gulose; CDP-4-keto-6-deoxy gulose; CDP-6-deoxy gulose; CDP-6-deoxy-Δ3,4-glucocene; CDP-3,6-dideoxy-D-glycero-D-glycero-4-hexulose; CDP-3,6-dideoxy-L-glycero-D-glycero-4-hexulose; CDP-cereose; CDP-cillose; CDP-L-ascarylose; CDP-abequose; CDP-paratose; CDP-tyvelose.
Enzyme steps shown include: CytidylylT; 4,6-Dehydratase (R), retention of config @ C5; C-MeT (E); 3-Epimerase; 4-Reductase; 3-Dehydratase; Δ3,4-glucocene reductase; 5-Epimerase; 2-Epimerase.]

Figure S2b CDP-linked monosaccharides derived from glucose-1-phosphate. Abbreviated names are used for some of the monosaccharides. Full names of these are given in Supplementary_data.xlsx:Worksheet4.
[Figure S2c: biosynthesis pathway diagram. Legend: D-enantiomer; L-enantiomer; HMM profile; BLASTp query. D-enantiomer / pyranose form unless mentioned otherwise.
Compounds shown: glucose-1-phosphate; UDP-glucose; UDP-galactose; UDP-galactofuranose; UDP-4-keto-6-deoxy-glucose; UDP-4-amino-6-deoxy-glucose; UDP-Qui4NAc; UDP-L-rhamnose; UDP-GlcA; UDP-GalA; UDP-4-keto xylose; UDP-xylose; UDP-L-arabinose; UDP-L-Ara4N; UDP-L-Ara4NFo.
Enzyme steps shown include: UridylylT; 4,6-Dehydratase (R), retention of config @ C5; 4-Epimerase; Mutase (pyranose to furanose); 6-Dehydrogenase; Decarboxylase; 4-AminoT (A); AcetylT (4-amino); FormylT (4-amino).]

Figure S2c UDP-linked monosaccharides derived from glucose-1-phosphate. Abbreviated names are used for some of the monosaccharides. Full names of these are given in Supplementary_data.xlsx:Worksheet4.
[Figure S2d: biosynthesis pathway diagram. Legend: D-enantiomer; L-enantiomer; HMM profile; BLASTp query. D-enantiomer / pyranose form unless mentioned otherwise.
Compounds shown: fructofuranose-6-phosphate; glucosamine-6-phosphate; glucosamine-1-phosphate; Glc2NAc-6-phosphate; Glc2NAc-1-phosphate; UDP-Glc2NAc; mannose-6-phosphate; mannose-1-phosphate; GDP-mannose; GDP-ManA; GDP-L-galactose; GDP-4-keto-6-deoxy mannose; GDP-Per; GDP-Per4Ac; GDP-L-fucose; GDP-rhamnose; GDP-6-deoxy-talose; GDP-4-keto-3,6-dideoxy mannose; GDP-L-colitose.
Enzyme steps shown include: 2-AmidoT; Isomerase; Mutase; AcetylT (2-amino) (GlmU-CTD); UridylylT (GlmU-NTD); GuanylylT; 6-dehydrogenase; 4,6-dehydratase (R), retention of config @ C5; 4-AminoT (E); AcetylT (4-amino); 3,5-epimerase; 4-reductase (A); 5-epimerase, 4-reductase (A).]

Figure S2d GDP- and UDP-linked monosaccharides derived from fructofuranose-6-phosphate.