BMC Evolutionary Biology BioMed Central

Research article Open Access Insights into the evolution of sialic acid catabolism among Salvador Almagro-Moreno1,2 and E Fidelma Boyd*1

Address: 1Department of Biological Sciences, University of Delaware, Newark, DE 19716, USA and 2Department of Microbiology, National University of Ireland, University College Cork, Cork, Ireland Email: Salvador Almagro-Moreno - [email protected]; E Fidelma Boyd* - [email protected] * Corresponding author

Published: 26 May 2009 Received: 16 September 2008 Accepted: 26 May 2009 BMC Evolutionary Biology 2009, 9:118 doi:10.1186/1471-2148-9-118 This article is available from: http://www.biomedcentral.com/1471-2148/9/118 © 2009 Almagro-Moreno and Boyd; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Background: Sialic acids comprise a family of nine-carbon amino sugars that are prevalent in mucus rich environments. Sialic acids from the human host are used by a number of pathogens as an energy source. Here we explore the evolution of the genes involved in the catabolism of sialic acid. Results: The cluster of genes encoding the enzymes N-acetylneuraminate lyase (NanA), epimerase (NanE), and kinase (NanK), necessary for the catabolism of sialic acid (the Nan cluster), are confined 46 bacterial species, 42 of which colonize mammals, 33 as pathogens and 9 as gut commensals. We found a putative sialic acid transporter associated with the Nan cluster in most species. We reconstructed the phylogenetic history of the NanA, NanE, and NanK proteins from the 46 species and compared them to the species tree based on 16S rRNA. Within the NanA phylogeny, Gram-negative and Gram-positive bacteria do not form distinct clades. NanA from and Vibrio species was most closely related to the NanA clade from eukaryotes. To examine this further, we reconstructed the phylogeny of all NanA homologues in the databases. In this analysis of 83 NanA sequences, Bacteroidetes, a human commensal group formed a distinct clade with Verrucomicrobia, and branched with the Eukaryotes and the Yersinia/Vibrio clades. We speculate that pathogens such as V. cholerae may have acquired NanA from a commensal aiding their colonization of the human gut. Both the NanE and NanK phylogenies more closely represented the species tree but numerous incidences of incongruence are noted. We confirmed the predicted function of the sialic acid catabolism cluster in members the major intestinal pathogens , , V. vulnificus, and Y. pestis. Conclusion: The Nan cluster among bacteria is confined to human pathogens and commensals conferring them the ability to utilize a ubiquitous carbon source in mucus rich surfaces of the human body. The Nan region shows a mosaic evolution with NanA from Bacteroidetes, Vibrio and Yersinia branching closely together with NanA from eukaryotes.

Background in the Eukaryotes and Prokaryotes, being the only nine- Sialic acid or neuraminic acid, is the designation of a fam- carbon sugar known to date in the latter [1]. Both names, ily that encompasses over 50 naturally occurring and sialic acid and neuraminic acid, indicate the source of the structurally distinct nine-carbon amino sugars found both molecules from which they were first discovered: sialic,

Page 1 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

saliva in Greek, and neuraminic, brain and amine [2]. The the genes that encode NagA and NagB is highly variable, most abundant and widely studied sialic acid is N-acetyl- in some cases being part of the cluster, such as in H. influ- neuraminic acid (2-keto-3-deoxy-5-acetamido-D-glycero- enzae, or in most cases, scattered in the genome [1]. For D-galacto-nonulosonic acid or Neu5Ac), with the rest of this reason, in this work we focused on nanA, nanK, and the sialic acids being derivatives of Neu5Ac (Fig. 1) [1-4]. nanE, which from now on we will refer to as the Nan clus- ter. In eukaryotes, sialic acids are primarily found at terminal positions of numerous glycoconjugates, and are involved Few studies have investigated the ability to utilize sialic in a varied array of cell-cell interactions and cell-molecule acid as a carbon source or its relationship to bacterial recognition, such as stabilizing glycoconjugates and cell pathogenesis, even though the molecule is extensively membranes, or acting as chemical messengers [5,6]. Thus, found in mucus rich environments such as the gut and the presence of sialic acid is crucial for the development of lungs where many pathogens thrive [18,21-23,28,29]. vertebrates, with mutations in the synthesis pathway caus- Sialic acid catabolism has been demonstrated in five spe- ing premature death of mice embryos [7]. Sialic acids are cies: Clostridium perfringens, E. coli, P. multocida, H. influen- widely found in Deuterostomes and recent speculation zae, and Bacteroides fragilis [21-23,28,29]. Chang et al suggests that they might appear in particular life stages or showed in E. coli that the ability to degrade sialic acid was in small quantities in Protostomes [8-10]. Sialic acids are important for the colonization of the mouse colon [18]. also found in Fungi and some protozoa, although the lat- This finding suggests that the ability to utilize sialic acid as ter likely can only scavenge them from the host [11-13]. a carbon source may be important for bacteria to colonize this niche. To date, little is known about the distribution Current studies have shown that several bacterial patho- of the Nan genes in the Bacterial kingdom and their evo- gens such as enterohemorrhagic , Haemo- lutionary history [1,30,31]. de Koning and colleagues in philus influenzae, H. ducreyi, , Neisseria their analysis of inter domain transfer events found that gonorrhoeae, N. meningitidis, , and lateral gene transfer of nanA had occurred between bacte- Streptococcus agalactiae can put sialic acid residues on their ria and the human parasite Trichomonas vaginalis based on outer surfaces (sialylate) masking them from the host the phylogeny of 15 N-acetylneuraminate lyase protein immune system [14-32]. Interestingly, these pathogens sequences [31]. Furthermore, their analysis also indicated have developed different mechanisms for obtaining sialic possible transfer between Gram-negative and Gram-posi- acid that include de novo biosynthesis of sialic acid (E. tive bacteria. coli, N. meningitidis), sialic acid scavenging (N. gonor- rhoeae), or precursor scavenging (H. influenzae) (Fig. 1) In this study, we examined the distribution of the Nan [14-17]. cluster (nanA, nanE and nanK) among the 1,902 bacterial genomes in the database and found that these genes have Bacteria can also utilize sialic acid as a carbon and nitro- an extremely limited distribution. The Nan cluster is con- gen source by scavenging it from the surrounding environ- fined to predominantly pathogenic and commensal bac- ment [1,18-23]. The catabolic pathway of sialic acid in teria. The cluster was present only in members of the bacteria involves five steps (Fig. 1): first N-acetyl- Gamma- and Fusobacteria among Gram- neuraminic lyase (NanA) removes a pyruvate group from negative bacteria, and Bacillales, Clostridia, and Lacto- Neu5Ac yielding N-acetylmannosamine (ManNAc), and bacillales among Gram-positive bacteria as well as Myco- then N-acetylmannosamine kinase (NanK) adds a phos- plasma. We studied the gene order of the cluster, and phate group at C6 position, which yields N-acetylman- uncovered a surprising variability in its organization even nosamine-6-P (ManNAc-6P). Next, N- within members of the same genus. We identified a puta- acetylmannosamine-6-P epimerase (NanE) epimerizes tive sialic acid transporter within the Nan clusters from 40 the ManNAc-6P into N-acetylglucosamine-6-P (GlcNAc- of the 46 species that contained the region. We describe 6P). Then, N-acetylglucosamine-6-P deacetylase (NagA) four novel sialic acid transporter types and found different removes the acetyl group from GlcNAc-6P and yields Glu- transporters associated with Nan within each major phyl- cosamine-6-P (GlcN-6P) whose amine group is removed ogenetic group. We reconstructed the phylogenetic rela- by Glucosamine-6-P deaminase (NagB), which is con- tionships of NanA, NanE, and NanK and demonstrated verted into Fructose-6-P (Fig. 1) [19,24-28]. Interestingly, that NanA evolved independently from NanE and NanK. recent studies have shown an alternative in the catabolic The NanA phylogenetic tree in particular revealed several pathway of the gut commensal Bacteroides fragilis, which putative horizontal gene transfer events, one involving does not encode NanK since the epimerase (NanE) does transfer between domains. To examine further the evolu- not require a phosphorilated substrate to perform its met- tion of N-acetylneuraminate lyase (NanA), the key abolic reaction [23]. The genes encoding NanA, NanK, enzyme in sialic acid catabolism, we determined the dis- and NanE are clustered together, however the location of tribution and phylogeny of all homologues in the data-

Page 2 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Figure 1 (see legend on next page)

Page 3 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

SchematicFigure 1 (seerepresentation previous page) of the metabolism of sialic acid among Bacteria Schematic representation of the metabolism of sialic acid among Bacteria. Summary of varied pathways of sialic acid utilization in Bacteria. The catabolic pathway of sialic acid involves several steps beginning with NanA. Highlighted in orange is the donor-scavenging synthesis of sialic acid. Highlighted in pink is the de novo pathway for the synthesis of sialic acid. For a more comprehensive review of sialic acid utilization see refs. 1 and 29. NanH, Neuraminidase; Neu5Ac, N-acetyl- neuraminic acid, sialic acid; T, sialic acid transporter; NanA, N-acetylneuraminic acid lyase; ManNAc, N-acetylmannosamine; NanK, N-acetylmannosamine kinase; ManNAc-6-P, N-acetylmannosamine-6-phosphate; NanE, N-acetylmannosamine-6-P epi- merase; GlcNAc-6-P, N-acetylglucosamine-6-phosphate; NagA, N-acetylglucosamine-6-phosphate deacteylase; GlcN-6-P, Glu- cosamine-6-phosphate; NagB, Glucosamine-6-phosphate deaminase; Fru-6-P, Fructose-6-phosphate; NeuA, CMP-N- acetylneuraminic acid synthetase; CMP-NeuAc, CMP-N-acetylneuraminic acid; NeuB, N-acetylneuraminic acid synthase; LPS, lipopolysaccharide. base. NanA was present in four additional bacterial (b3224), the DctM protein component from the TRAP groups, α-Proteobacteria, Planctomycetes, Verrucomicro- transporter in V. cholerae (VC1777), and the SatB protein bia and Bacteroidetes. The Bacteroidetes, members of the from the SatABCD transporter from H. ducreyi (HD1670) human gut microbiota, formed a distinct clade with the as seeds in our BLAST searches [32-34]. In addition, we Verrucomicrobia and this clade is most closely related to also examined the distribution of sialic acid biosynthesis the eukaryotes. The Vibrio/Yersinia clade branched next to genes (neuA, neuB, and neuC) among the bacteria encod- the Bacteroidetes clade. We verified that sialic acid can be ing the Nan cluster to determine whether there was a cor- used as a sole carbon source in a number of pathogenic relation between the presence of genes involved in sialic species, a capability that should confer a competitive acid catabolism and the genes required for the synthesis of advantage in the heavily sialylated environments of the sialic acid de novo. human body. Sequence alignment Methods The sequences were aligned using ClustalW [35]. The Sequence retrieval and cluster identification alignments were further checked manually using Gene- BLAST searches were used to identify homologues of Doc [36]. Large gaps and hypervariable sites were NanA, NanE, and NanK from Vibrio cholerae N16961 and removed from the alignments; the same was applied to Staphylococcus aureus N315 in the database. We considered gaps at the beginning and end of the alignment, represent- three genes to form a Nan cluster if: 1) they were the best ing missing sequence data. matches in BLAST for NanA, NanE, and NanK from V. cholerae N16961; 2) reciprocal BLAST searches of the three Phylogenetic analysis genes against the V. cholerae N16961 genome hit on V. We used prottest and modeltest (protein and DNA cholerae N16961 NanA, NanE, and NanK; 3) the same sequences respectively) in order to choose the most applied when performing BLAST search using NanA, appropriate method to calculate the distances [37]. We NanK, and NanE from S. aureus N315; 4) the three genes chose WAG with invariable sites for the Nan protein were encoded and within 10 ORFs of each other. The sequences and GTR with invariable sites for 16S rRNA rationale for this approach was: A) to avoid false positives, sequences [38,39]. Three different tree-building methods such as dihydropicolinate synthase, which shows similar- were used: Maximum Likelihood (ML), Bayesian analysis ity with NanA, and several sugar kinases that were (BY), and Neighbor Joining (NJ) as implemented in retrieved when we searched for NanK homologues; and B) PHYML and MrBayes 3.1.2, and MEGA 4 respectively [40- BLAST search using S. aureus N315 allowed us to avoid 43]. The Bootstrap values for ML and NJ trees were false negatives due to the low similarity of S. aureus Nan obtained after 1000 generations. For the trees constructed genes with the Nan genes from V. cholerae N16961. using BY the Markov chains were run for 1,000,000 gener- ations. The burn-in values were set for 10,000 generations The DNA sequences of the 16S rRNA from species encod- and the trees were sampled every 100 generations. Split- ing the Nan cluster were downloaded from GenBank. stree and MEGA 4 tree viewer were used to visualize the When several strains from the same species encoded the trees and calculate confidence values [43,44]. The topol- same cluster only a representative strain was included, ogy of ML, NJ and BY trees were very similar, with differ- with the strain from the first sequenced genome chosen. ences in branch lengths and confidence values but not in branching pattern. NJ and ML trees are included as addi- Sialic acid transporter identification tional files (see Additional files 1, 2, 3, 4, 5 and 6). In order to identify putative sialic acid transporters within the Nan cluster we used the NanT protein from E. coli K12

Page 4 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

GC content ihominis, Dorea formicigenerans, D. longicatena, We calculated the GC content of the sequences and com- Faecalibacterium prausnitzii, Fusobacterium nucleatum, pared it to the GC content of the whole genome. The for- Ruminococcus gnavus, Lactobacillus sakei, L. plantarum, and mulae used for the calculations can be found in Karlin et L. salivarius. Thus, the majority of the bacteria that encode al., 2001 [45] (see Additional file 7). the Nan cluster colonize mucous regions of the human body, such as the gut, lung, bladder, or oral cavity, where Growth analysis of various species on minimal media sialic acid is highly abundant and it can serve as a source supplemented with sialic acid of energy, carbon, and nitrogen. We inoculated 2% (100 μl) of a 5 ml overnight LB broth culture into 5 ml of M9 minimal media (V. cholerae We also examined the intraspecies distribution of the Nan N16961, V. vulnificus YJO16, V. parahaemolyticus RIMD, V. cluster to determine whether all strains from a species fischeri H905, Salmonella enterica INSP85, MOPS-based encoded the cluster. We found that for most species all minimal media ( KIM D27), or M9 minimal sequenced strains contained the Nan cluster. However, a media supplemented with amino acids (Yersinia enteroco- few exceptions were noted. For example, of the eight fully litica ATCC 27729) [46]. All media were supplemented sequenced genomes of C. botulinum, only strain Eklund with 1 mg/ml of N-acetylneuraminic acid (Sigma) or D- encoded the Nan cluster. Among the six sequenced S. Glucose. The growth of each species was detected by pneumoniae strains, five (TIGR, D39, G54, Hungary 19 A- measuring the absorbance of the cultures at 595 nm using 6, and R6) encoded two Nan clusters whereas strain a Sunrise 96-well plate reader by Tecan. The incubation CGSP14 does not contain the cluster. Similarly, for Salmo- temperature was 30°C for all the species except for Y. pes- nella enterica, two out of the nine sequenced strains do not tis which was incubated at 26°C. The data obtained was encode the Nan cluster (serovar Typhi Ty2 and CT18). exported to an excel sheet, and the growth curves were Among the 16 sequenced V. cholerae isolates, only the 10 made using Sigma plot. sequenced pathogenic isolates encoded the Nan cluster.

Results and Discussion To make the study tractable, we took the approach to Distribution of Nan cluster include in our analysis only the species that encoded the The distribution of the Nan cluster was remarkable in its three genes nanA, nanK, and nanE (the Nan cluster) which limited occurrence among the 730 finished and 1172 were within 10 ORFs from each other. In addition, we unfinished bacterial genomes examined. The Nan cluster investigated the number of species that encoded NanA, was present in 46 bacterial species and confined to six bac- the key enzyme in the first step of sialic acid degradation, terial families of the Gamma-Proteobacteria, one member but did not encode NanK or NanE. Overall, the distribu- of the genus Fusobacterium, and five bacterial families of tion of NanA, from this analysis resembles that of the Nan the Firmicutes, encompassing both low and high GC rep- cluster with only four additional bacterial groups added, resentatives (see Additional file 8). two genera from α-Proteobacteria, and several members of the Planctomycetes, Verrucomicrobia and Bacter- Interestingly, apart from Photobacterium profundum, Pseu- oidetes. Interestingly, members of the Bacteroides are well doalteromonas haloplanktis, Shewanella pealeana, Psychrom- known commensals of the human gut that in some cases onas, and Vibrio, which are all aquatic bacteria, the Nan can become opportunistic pathogens. The mucinolytic cluster is only present in either commensal or pathogenic abilities of the Bacteroides have been documented, and bacteria (see Additional file 8). In fact, 42 species from the sialic acid seems to be an important carbon source for 46 where the Nan cluster was identified are human com- these organisms [23,47]. However, Bacteroides does not mensals or pathogens, 33 are known pathogens of either require the presence of NanK in order to catabolize sialic humans or livestock. A total of 43% of the genomes in the acid [23]. The Verrucomicrobia are a recently described databases belong to pathogens, whereas 72% of the spe- phylum of Bacteria and are recovered from fresh water, cies that contain the Nan clusters are pathogenic. The soil and human feces. Overall, the majority of the addi- pathogens that encode the cluster are causative agents of a tional species identified as containing NanA were com- wide range of diseases; many of them are intestinal path- mensal or pathogens of humans. ogens such as E. coli, Shigella, Salmonella enterica, Yersinia enterocolitica, V. vulnificus and V. cholerae, the etiological Sialic acid transporters within the Nan cluster agent of . Clostridium botulinum is the causative Prior to its catabolism, sialic acid has to be transported agent of botulism, influenzae is a major cause into the cell, unless there is endogenous biosynthesis (Fig. of lower respiratory and meningitis in children, 1). To date there are three functionally characterized sialic Streptococcus pneumoniae causes , and Yersinia acid transporters: NanT, a single component system, pestis is the agent of . The nine human gut commen- which belongs to the major facilitator superfamily, first sals that encode the Nan cluster include Anaerotruncus col- identified in E. coli; a tripartite ATP-independent periplas-

Page 5 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

mic C4-dicarboxilate (TRAP) multicomponent transport All the Gamma-Proteobacteria that contained the Nan system, first identified in H. influenzae and Pasteurella mul- cluster encoded one of three types of transporters within tocida; and an ATP binding cassette (ABC) transporter, first the cluster: homologues of NanT, TRAP, or a novel identified in [1,3,48]. We identified a Sodium-glucose/galactose cotransporter, which belongs putative sialic acid transporter in 40 of the 46 species to the SSS family of transporters (Table 1). The three examined, four of the seven families of transporters were members of the encoded the multicompo- novel types associated with sialic acid (Table 1). nent TRAP transporter system, whereas all the Enterobac-

Table 1: Sialic acid transporters within Nan cluster

Species type locus e-value

Clostridium botulinum C str. Eklund SSS CBC_A1223 4.00E-78 Clostridium perfringens SM101 SSS CPR_0176 6.00E-63 Dorea formicigenerans ATCC 27755 SSS DORFOR_00337 4.00E-60 Dorea longicatena DSM 13814 SSS DORLON_02532 2.00E-59 Escherichia coli K12 NanT b3224 seed Escherichia coli O157:H7 str. Sakai NanT ECs4097 0 Fusobacterium nucleatum ATCC 25586 TRAP FN1473 2.00E-105 Rd KW20 TRAP HI0147 8.00E-103 Haemophilus somnus 129PT TRAP HS_0702 3.00E-104 Lactobacillus plantarum WCFS1 SSS lp_3563 8.00E-96 Lactobacillus sakei subsp. 23K SSS LSA1642 4.00E-96 Mycoplasma capricolum ATCC 27343 SSS MCAP_0416 1.00E-75 Mycoplasma mycoides SC str. PG1 SSS MSC_0558 1.00E-74 Mycoplasma synoviae 53 SSS MS_0191 1.00E-45 Pasteurella multocida subsp.Multocida Pm70 TRAP PM1708 2.00E-101 Photobacterium profundum SS9 TRAP PBPRA2279 5.00E-169 Pseudoalteromonas haloplanktis TAC125 SSS PSHAb0151/0160 1e-75/6e-75 Salmonella enterica ser.Cholerasuis SC-B67 NanT SC3276 0 Salmonella enterica ser.Paratyphi A str. ATCC 9150 NanT SPA3206 0 Salmonella typhimurium LT2 NanT STM3338 0 Shewanella pealeana ATCC 700345 SSS Spea_1518 3.00E-78 Sb227 NanT SBO_3165 0 Sd197 NanT SDY_3399 0 2a str. 301 NanT SF3260 0 Ss046 NanT SSO_3365 0 Staphylococcus aureus subsp. Aureus N315 Sym SA0303 0 Staphylococcus haemolyticus JCSC1435 Sym SH0283 0 Staphylococcus saprophyticus ATCC 15305 Sym SSP0376 seed Streptococcus agalactiae 2603V/R SAT 3 SAG0035 4.00E-121 Streptococcus gordonii str. Challis substr. CH1 SAT 2 SGO_0122 1.00E-130 Streptococcus pneumoniae TIGR4 a SAT 2/3 SP_1688 SP_1682 3.00E-131 Streptococcus pneumoniae TIGR4 b Sym SP_1328 8.00E-59 Streptococcus pyogenes M1 GAS SAT 3 Spy_254 seed Streptococcus sanguinis SK36 SAT 2 SSA_0076 seed Vibrio cholerae N16961 TRAP VC1777 seed Vibrio fischeri ES114 SSS VF0668 seed YJ016 TRAP VVA1200 0 ATCC 43970 NanT YberA_01002486 1.00E-91 Yersinia enterocolitica subsp. enterocolitica 8081 NanT YE1193 0 Yersinia frederiksenii ATCC 33641 NanT YfreA_01001471 0 ATCC 43969 NanT YmolA_01003560 0 Yersinia pestis KIM NanT y1465 1.00E-180 Yersinia pseudotuberculosis IP 32953 NanT YPTB2736 0

NanT (b3224) Major facilitator superfamily sialic acid transporter TRAP (VC1777) Tripartite ATP-independent periplasmic C4-dicarboxylate transport system SAT (HD1670) ATP binding cassette transporter SAT 2 (SSA_0076) ATP binding cassette transporter SAT 3 (Spy_254) ATP binding cassette transporter N-acetylneuraminate transport system permease SSS (VF0668) Sodium-glucose/galactose cotransporter/SSS family Sym (SSP0376) Na+/proline symporter

Page 6 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

teriaceae examined encoded the NanT single component enterica (Fig. 2). Interestingly, in all species examined transporter, similar to that present in E. coli (Table 1). All among the Gamma-Proteobacteria, except for Yersinia, the members of the family , except V. fischeri, nanK and nanE genes are clustered together and encoded encoded a TRAP transporter. Within the Nan cluster on the same strand, whereas the nanA gene is always sep- among the Firmicutes, the predominant transporter asso- arate from nanE and nanK by at least one gene (Fig. 2). ciated with the Nan cluster belonged to SSS, ABC, or Among the 20 Firmicutes examined, there are 10 different Sodium/proline (Sym) family of transporters. None of the gene order combinations, but unlike Gamma-Proteobac- Firmicute representatives examined encoded either the teria, in only two cases did nanE and nanK cluster NanT or the TRAP systems. together. There is no clear canonical gene order within the Firmicutes, since the three genes cluster in almost all pos- Co-occurrence of catabolic and biosynthetic sialic acid sible orders. This higher degree of variation of gene order gene clusters in Firmicutes compared to the Gamma-Proteobacteria From the 46 species that encoded the Nan cluster, only 10 may reflect a long association within this diverse group species also encoded the genes for the biosynthesis of (Fig. 3a). sialic acid (neuAB) or nonulosonic acid (nul): F. nuclea- tum, R. gnavus, C. botulinum, S. pealeana, S. agalactiae, V. As is well known, operons are pervasive in Prokaryotes fischeri, V. vulnificus, P. profundum, and Psychromonas. [50]. However, the mechanisms underlying their evolu- There were only four species, A. pleuropneumoniae, H. tion are not fully understood. Some authors argue that the influenzae, H. somnus, and P. multocida that encoded only fact that some genes are located within a single co-tran- NeuA, which is required for recognition of sialic acid by scribed region selects for a more efficient regulation [51]. sialyltransferases and subsequent sialylation of the bacte- Also, a group of genes encoding co-dependent functions rial cell surface, suggesting that the donor-scavenging when forming an operon increases the likelihood of a method of sialylation is limited. All the species that fully functional horizontal gene transfer event, a major encoded neuA and neuB also encoded neuC, with one evolutionary force within bacterial evolution [51,52]. The exception: F. nucleatum. As shown in other organisms, Nan cluster would fall within what is considered a Fusobacterium might scavenge a precursor of sialic acid "destructed" operon, due to its loosely organized configu- from its environment instead of synthesizing it de novo ration, which might be due to rearrangements within its [49]. It is well known that some E. coli strains can synthe- host genome or during the possible HGT events that led size sialic acid and sialylate their surface [3]. However the to its particular distribution, a scenario that is more wide- strains under study here do not encode neuA and neuB. spread than previously thought [53,54]. Also it might indicate the relatively new acquisition of the genes by the The distribution of the genes for the synthesis of sialic bacterial kingdom, since fragmentation of well-adapted acid/nonulosonic acid in the Bacterial kingdom is very ancient operons will at least require the evolution of reg- different from that of the Nan genes for catabolism. The ulatory elements, which might not confer a selective sialic acid/nonulosonic acid synthesis genes are consider- advantage to the organism [55]. The latter hypothesis ably more widespread both ecologically and taxonomi- could explain why the Nan cluster is so limited in its dis- cally. For example, a high number of marine bacteria tribution. encode the Neu cluster such as species from the genera Synechococcus, Salinibacter, Shewanella, Sphingopyxis, Chro- Signatures of horizontal gene transfer mobacterium, Hahella, Idiomarina, Prochlorococcus, Reinekea, Typically, there are two main methods to detect putative Tenacibaculum, Rhodopseudomonas, Thiomicrospira and horizontal gene transfer events: Phylogenetic methods most members of the family Vibrionaceae, V. cholerae is and surrogate methods based on nucleotide composition. the notable exception. In addition, the presence of transposases and/or inte- grases within a region may suggest a mode of transfer. Colinearity of the Nan cluster Therefore, we located all transposases and integrases The three main groups that contain the Nan cluster, within or near the Nan cluster (see Additional file 9). Gamma-Proteobacteria, Fusobacterium, and Firmicutes, From the 46 species that encode the Nan cluster 12 species encode the nanA, nanE, and nanK genes in a different gene contained transposases close to the region, with a notice- order and on different strands (Fig. 2). In fact, the gene able abundance in C. perfringens SM101, L. salivarius order of the Nan cluster varies among families, and, to a UCC118, S. pneumoniae TIGR B, and Y. pestis KIM (see lesser extent, within families (Fig. 2). Within the Gamma- Additional file 9). In V. cholerae N16961, the Nan region, Proteobacteria, there are seven variants of the Nan cluster, is present on a pathogenicity island named Vibrio Patho- each family having its own gene order with the exception genicity Island-2 (VPI-2) and encodes an integrase [56]. of the . Within this group,Yersinia spe- Next, we compared the average GC content of the whole cies have a different gene order to E. coli, Shigella, and S. genome (GCg) against the GC content of the Nan genes

Page 7 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Gamma-proteobacteria Firmicutes

0-2 1 Pasteurellaceae Staphylococcaceae

C. perfringens 0-2 0-2 Enterobacteriaceae (A) Lachnospiraceae R. gnavus

0-1 Enterobacteriaceae (B) C. botulinum 1 1 F. prausnitzii

1 4 Psychromonadaceae A. colihominis

5 Pseudoalteromonadaceae L. plantarum 0-2 L. salivarius

Shewanellaceae 1 1 L. sakei

1-4 5-8 Vibrionaceae Streptococcaceae (A)

4 Fusobacterium Streptococcaceae (B)

Fusobacterium M. capricolum 2 M. mycoides

1 nanA nanK nanE M. synoviae

StructureFigure 2 of the Nan clusters among bacterial groups Structure of the Nan clusters among bacterial groups. Unless otherwise indicated all the members from a family shown share a common canonical structure of the Nan cluster, with differences only in the number of ORFs between the genes (indi- cated by a number within each cluster). nanA, orange arrows; nanK, green arrows; nanE, blue arrows. Enterobacteriaceae (A): E. coli, S. enterica, S. typhimurium, S. boydii, S. dysenteriae, S. flexneri, S. sonnei; Enterobacteriaceae (B): All species from genus Yers- inia; Streptococcaceae (A) S. agalactiae, S. gordonii, S. pneumoniae (cluster A), (B) S. pyogenes, S. sanguinis, S. pneumoniae (cluster B).

(GCnan) shown as the difference between GCg and GCnan content; suggesting an independent history for the two (see Additional file 7). None of the nanA gene sequences clusters in this strain (see Additional file 7). had a significant aberrant GC content (deviating from the GCg by +/- 5). However, nanE and nanK from Yersinia spp, Phylogenetic analysis of NanA Shigella spp, E. coli, S. enterica, Pasteurella and H. somnus The limited distribution of the Nan cluster, its variable had significant aberrant GC content suggesting its evolu- gene order, and the diversity of transporters within the tionary history differs from nanA. In S. pneumoniae TIGR cluster, indicates mosaic evolution of the region (Fig. 2). nanE and nanK from both clusters had an aberrant GC In order to examine further the evolutionary history of the

Page 8 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Streptococcaceae

e s a

i e

s i i

n i n

o n e

n e

i a

m g i o

u t

u d o

g c

e r y a

n

n l

o

a

p p a

s g S g a) S a

S S S m i u e ar k t Lactobacillaceae a n s la Shewanellaceae p s L L riu va ali Psychromonadaceae Pasteurellaceae Ls us tic

e y

a l

a o

z

d m Staphylococcaceae Pseudoalteromonadaceae i

n ae c e us

p h ure o

u a

o S S

t l l r 74 69

f Ssaprop

u hyt u

s icus n

i s

i e

t S m l

k u

H p P T3 p n n

a e

l A m

a

p o s

l e

o d

e s i

CNP o

l yc Mycoplasmataceae a

a m

P H M

h n Mcapr P ico V P a lum c p ho r le o ra fu Ms V e nd ynovi vu u ae lnifi m Cperf Vibrionaceae cus ringens Cbo Clostridiaceae tuli Vfischeri num

D Dfor Yen Rg lo micige teroc ngi nerans olitic n cat

a av ena Lachnospiraceae

Ybercovieri us

i i

ti l

e i o

ar i c E Ruminococcaceae

oll n i

m se s e Y n

k s i F

ri i t i n A

s e i p

r e s a m o

i

d o s F r

c a

e l e u e c a i d

r i S n

u i o u f p r r n y u

Y c e r l s

r e x c i n t o Y u h

e t l i

n e b b e t n

l om z

u e m a f t e S i o S i tu i s d h S m in u y e p i s y d s t Ruminococcaceae p S Y 0.05 S Enterobacteriaceae

II Enterobacteriacea

e S

S

P Mycoplasmataceae

B

A

O S 8

C 3

b S

3 9 2 T 3 3

S 1 M 1

F 2 0 2 6

3 0 9 7

b) 7

3 2 3

7 5 4 S 2 3 S 9 3 O 6 5 5 I 5 33 1 5 0 1 6 S 4 6 0 ECs4098 C M P 00 S Lactobacillaceae 34 A 8 DY M C 6 S M 5 8 3 3 p 19 l L LS Clostridiaceae A 5 Ruminococcaceae N 17 F A R 0 221 AE C P A1 375 P O 79 C BC SP0 R L 81 81 C S A 0 SH0282 Staphylococcaceae S M 1 G 21 9 62 SA03 O 2 9 F 04 02 4 N14 SS 0 2 75 A 1 19 0 2 T 07 4 vag 8 PM ina SP H 1 lis Streptococcaceae 1676 I0 71 HA1 5 S p 42 64 0 SPy0257 0 2 6 0 9 0 Pasteurellaceae 39 5 1 III G00 7 SA 0 7 29 4 3 13 6 2 70 P 1 9 S A 6 1 6 S 2 3 3 L 0 5 3 2 0 A 0 0 Lactobacillaceae N N G R O M O L

U F

R

R R

u c i g e v r o n R O s

O

Ruminococcaceae D

D s

n

e

i

p

a

s s

H

u

n

i

Lachnospiraceae t Eukaryotes a

n

s a

u

O

l

76 a Yb01002480 65 g

35556 G

0 2

0 7 8 2

1 2 5

6

0 4 8 1 V

5

m 1 8 a

4 e

p

Y 0 5 1

1 S

0 6

7

1 4 b 9 10 7

0 2 A

1 7 f E 9

H 1

Y 1

Y y B S

C 7 P

1

V

T 8

A

2 3

Enterobacteriacea P 2 V

6

Y

V 6

A

0

e 0 R

F

1 P

V 8 B

0 P

0 Vibrionaceae 3

T Shewanellaceae P

N Psychromonadaceae C 0.2 P Pseudoalteromonadaceae IV

PhylogeneticFigure 3 trees of a) 16S rRNA of bacteria containing the Nan cluster b) NanA Phylogenetic trees of a) 16S rRNA of bacteria containing the Nan cluster b) NanA. The trees were obtained using Bayesian analysis as implemented in MrBayes. 1,000,000 generations were used to build the consensus trees. Only confidence values below 85 are shown. Blue Operational Taxonomic Units (OTUs) indicate Gram-negative Bacteria; red OTUs, Gram- positive Bacteria; black OTUs, Eukaryotes. For NanA tree the five main lineages are highlighted with color brackets embracing them. Lineage I, green bracket; lineage II, light blue bracket; lineage III, purple bracket; lineage IV, yellow bracket; lineage V, grey bracket.

Page 9 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Nan region, we performed a phylogenetic analysis of and II, which contain the other Gamma-Proteobacteria NanA, NanE, and NanK and compared the branching pat- species examined. Thus, it appears that the origin of NanA terns of the three proteins with the topology for the spe- in lineage IV is unique. The six Yersinia NanA sequences cies tree based on 16S rRNA sequences (Figs. 3, 4 and 5). examined are also present within lineage IV and are unre- lated to NanA from other species of Enterobacteriaceae The nanA gene encodes the enzyme N-acetylneuraminic suggesting that there is not a single origin for nanA among lyase (NanA). As outgroups we included in our phyloge- enterobacteria (Fig. 3a). Even more surprising is the close netic analysis NanA from four eukaryotes: Gallus gallus, proximity of lineage IV to lineage V representing NanA Ornithorhynchus anatinus, Rattus norvegicus, and Homo sapi- from four eukaryotic species used as outgroups. A close ens. Overall, within the NanA phylogenetic tree, Gram- relationship between NanA from Vibrio/Yersinia was previ- negative and Gram-positive representatives did not form ously indicated by Andersson and colleagues in their anal- two distinct lineages as in the 16S rRNA tree. Indeed, the ysis of NanA from 10 bacterial species [30]. NanA tree can be subdivided in five main lineages (named I, II, III, IV and V) (Fig. 3b), with Lineage I as the most In order to investigate further the relationship between divergent clustering, encompassing Gram-negative gen- lineages IV and V, we reconstructed a phylogenetic tree era, , Haemophilus and Pasteurella, Fusobacte- including all sequences annotated as NanA in the genome ria, Gram-positive genera, Clostridium, Lactobacillus, database (Fig. 4 and Additional files 10 and 11). In total, Staphylococcus, and Mycoplasma (Fig. 3b). Also within this we studied 83 putative NanA protein sequences from bac- lineage is NanA from Trichomonas vaginalis, a protozoan teria and an additional 18 NanA protein sequences from parasite that adheres to the urogenital tract, which eukaryotes. From this analysis, six additional Clostridium branches with members of the family Pasteurellaceae and species contain NanA (Fig. 4). Two human pathogens, C. Fusobacterium nucleatum. The placement of T. vaginalis difficile and C. butyricum, cluster with C. botulinum, C. per- with members of the Pasteurellaceae and not the eukary- fringens and pathogenic species of Staphylococcaceae, Pas- otes indicates interdomain transfer of NanA from an teurellaceae and Mycoplasmataceae (Fig. 4). Four ancestor of Pasteurellaceae to a progenitor of T. vaginalis, Clostridium commensals cluster within three divergent which was noted previously by others [30,31]. T. vaginalis branches, C. leptum forms a single divergent branch, C. ter- also encodes a homologue of NanK but does not encode tium and C. scindens cluster with commensals from the NanE. Within lineage I are three members of the genus family Ruminococcaceae, whereas C. bolteae clusters with Staphylococcus, which cluster together, and within this the γ-Proteobacteria Oceanicola granulosus (Fig. 4). The dis- branch is NanA from C. botulinum and C. perfringens. The tribution of NanA from the different Clostridium species NanA from Lactobacillus salivarius and L. plantarum branch on divergent branches of the tree indicates that in this spe- together within lineage I whereas NanA from L. sakei is cies nanA was acquired multiple times and from different located within lineage III. All members of the genus Myc- sources. Six additional γ-Proteobacteria species contained oplasma are found within lineage I (Fig. 3b). Members of NanA, the majority of which are human commensals and the Mycoplasma are obligate pathogens found in a wide pathogens (Fig. 4). A note of interest is the placement of range of hosts, the primary habitats of human and animal NanA from the human commensal , a mycoplasmas are mucous membranes of the respiratory member of the Enterobacteriaceae, firmly within the Vibri- and urogenital tracts, eyes, mammary glands and the onaceae suggesting a recent acquisition from a Vibrio (Fig. joints [57]. The nine highly related NanA protein 4). sequences from Shigella spp., E. coli, and S. enterica, gas- trointestinal pathogens of humans form the separate line- NanA was present in two genera from α-Proteobacteria, age II (Fig. 3b). and several members of the Bacteroidetes, Verrucomicro- bia, and Planctomycetes. The NanA from the Gram-nega- The six Firmicutes not present within lineage I, cluster tive Bacteroidetes represents a large group of gut together within lineage III, L. sakei, branching firmly with commensal (Bacteroides caccae, B. fragilis, B. ovatus, B. ster- Streptococcus, which suggests a common origin for NanA coris, B. uniformis, B. vulgatus, Parabacteroides distasonis, P. in this group. The five additional commensals of the merdae, Flavobacteriales bacterium) that formed a distinct human gut that contain NanA, D. formicigenerans, D. long- clade with five genera of the Verrucomicrobia and three icatena, R. gnavus, A. colihominis and F. prausnitzii, are also genera of Planctomycetes. This clade was most closely present within this lineage (Fig. 3b). related to the eukaryotes (Fig. 4). Planctomycetes are pep- tidoglycan-less bacteria with a shared compartmentalized The NanA protein from members of the families Vibri- cell structure and divide by a budding process [58]. Verru- onaceae, Shewanellaceae, Psychromonadaceae, and Pseu- comicrobia is a divergent phylum within the domain Bac- doaltermonadaceae are all located within the divergent teria, which also contains a compartmentalized cell lineage IV. Lineage IV branches away from both lineage I structure similar to Planctomycetes [59]. It has been sug-

Page 10 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

N

v i tr i p e

n A T Eukaryotes n NanA g a) c i A s a a

m m s

e t a b ll i n i f a e

e e

r u s e

Clostridia and s

a t a

m s s

t y

n u

i u

t

l l a

d s

i l

e l c

u

o a i u a a i

l t c

Bacillales us l f

e r b p

c s g s

i o

u u a b e

a o

u r c

r

eg a c

a s t m

v m m t E

Lactobacillales P r s o

S S P H P

M B d no S

a M s

s a

R M u e

a s n

g i

i t

n n C

S s a a

s s

S

i n

g g l u u

b n l l t a l

F e

o u s a a o

e i a a O

L p t g r

p c r r

a i u e Verrucomicrobia

d n c G

c t c r r

S o i t u u D to i i s a i e

s a p

b p S n l r m v y l

a u 2 u

o e i e s

c g i u i is V s N v l

i e q s a A lla n u p e ic n T

J a p

c C l e es i S tro e i D XX a a t e e z t c e R a S i t L r p s i tu 2 neu a t i m k i p oni ei u O a m e s C s 14 tu is Bacteroidetes sc vu 5 a cor inde la lin lg er Clostridia ns l vu st onis Rgnavus C bE B Bistas tena Pd lis longica rans gi e D ne fra erda Actinobacteria icige BPm Rho orm cae doc Df Bcac Clo occ Bovatus strid usR A ia HA col Buniformis Flavobacteriaceae 1 ihom Cbo inis FlavobacteralesHTCC2170 Og ltea V ranulo e spinosum Verrucomicr. sus Go A bscurigl α-Proteobacteria Sm mucin obus edic Bm iphila ae R arin Verrucomicr. Smeliloti ba a ltic Ente a Planctomycetaceae roba Es cte ae γ-Proteobacteria akaz r638 Vcholer akii Vvulnific Ckoseri V us γ-Proteobacteria

Enterobacteriaceae i P V sh

icatypTh2 ii M ilonii h r S Sente umL yd li ED Vibrionaceae

i o a

ur bo c p Pm 22

m S l 2

i e o Y

ph E Y YVYb ir

ty a a p e e P m

S p f r b l r c i

l n l e e e o o i p

a t l v s

V d l i

a s e e V r a

n r

o t e r i n r

f e

k v i i o r

f s t s

a i i

a P u k

t c i γ i

c s -Proteobacteria d s n o

s e h

e d l m y i n

olu e t n ric s c u ic ii Enterobacteriaceae

ap e r c s m

M oid h i a i

c r s Mycoplasmataceae my o

M m

us o c Verrucomicrobian

phyti uss m c u a

apro lyti eu lin

Ss r u s

s

mo u t e

ae a o m i a

h b r

S S C tu i o

a v

e t

l o

Bacillales c is a

n

u a s

u g

n

i

n s c y

a l

i i e F a l

t s γ

d r g

i e e -Proteobacteria

l

n

r a a y a a

i t m l i

e M

s p r

f

c d e

o r o u

l a i c H M

i i e

a C r f

c

e m c p Vibrionaceae

s f t i z i i

c C

e s

a l o r

t e d

a n b a l y

o u t h a e

s n u C

i

u i n i

u

F M u

l

g n

y Mycoplasmataceae f m b

a m

e o

n

P C

o r v i

s

s m

T c u

i

H

H u u r a

m

e d v

i

l u

n r

H a

a s p t

L

o n

r

a

l

u

p

e

L

l p Lactobacillaceae

0.2 A Streptococcaceae Clostridia Staphylococcaceae γ-Proteobacteria Lactobacillales Paenibacillaceae Clostridiaceae Pasteurellaceae Lachnospiraceae Ruminococcaceae Fusobacteriaceae Mycoplasmataceae Flavobacteriaceae b) Bacteroidaceae Porphyromonadaceae Victivallaceae Rhodobacteraceae Rhizobiaceae Vibrionaceae Enterobactericeae 16S rRNA Pasteurellaceae Shewanellaceae Pseudoalteromonadaceae Psychromonadaceae Nocardiaceae Planctomycetaceae Opitutaceae Xiphinematobacteriaceae Verrucomicrobiaceae

0.05

PhylogeneticFigure 4 trees of a) All species encoding NanA b) 16S rRNA of all families encoding NanA Phylogenetic trees of a) All species encoding NanA b) 16S rRNA of all families encoding NanA. The trees were obtained using Bayesian analysis as implemented in MrBayes. 1,000,000 generations were used to build the consensus trees as indicated in methods section. Main inclusive taxonomic groups are indicated. b) Grey shading, Lactobacillales; purple shading, Bacillales; dark green shading, Clostridia; red shading, Bacteroidetes; green shading, Alpha-proteobacteria; blue shading, Gamma-proteobacteria; yellow shading, Verrucomicrobia.

Page 11 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Mycoplasmataceae

Staphylococcaceae

8

5

a) 1 b)

9 6 4

1 1 5 0

8 0 5 S

0374 0 2 S

S P

3 P

0 G 5 A

A C S

H O S

C S

S

0

S 0 011

M

M S 0 M Lactobacillaceae S 2 0

A 7 P 2 0

7 G 1 0

3 3 0 1 8 1 2 4 9 6 0 SA305 Ruminococcaceae S 0 A 6 1 9 P 3 8 L

1 y 3 C L 7 9 02 5 O A S 6 9 5 B

S L 5 1 1 C C 0 3 0 A L p L 3 Clostridiaceae l 73 74 N 5

Lachnospiraceae O F A 2

8 A 5

4 C 0 3

4

1 E 01 3

6

7 P 0

2 A

2 Ruminococcaceae R N

9 6 S 7 0

4

Ruminococcaceae 3 N A

1 P

3 1 M 4 O R

1 1

3 A 2

A 2 O 3

1 3 1 F 9 0

1 2 26

P 2 3 RL

9 66 R 0

0 0 0 A CPR C 0 8

6 N

S O GN

FN1 2 O

2 7

7 2 2 D UM R

B F R

0 9 1

1 1 7 D 1

A 0 O

C 3

N

R

F 5

4 M2

G S

P 3 R

5 M M

2 83

C O

0

U RA N R

D

O L P

R s O E

D A u 5

l s 55

0 F l 0

4 u

0 C

0 a n S

G i A t M S g a s

n cu MCAP 041

G a gi 9

SPy0 e 2 V 5 8 60 O rv V

o A

53 HRnsapiens 1

S 68 72 P 2

1

6 75 0 7

Streptococcaceae5 V 4

S C

S

S 82 1781

A G

0 0279 O H

0 Eukaryotes P S

7 lis SH

9 a

0 in Ab

1 vag 01 S

2 T Sp 50 SP 5 ea 15 0372 19 S A 0 30 HS 7 06 99 05 HI0 LS 12 145 L VA 1 V Vibrionaceae 2001775 93 Ap lp 9 1782 711 3 VC M1 5 P 7 Pseudoalteromonadaceae 1 149 Ab0 L S PSH 5 Shewanellaceae 28 S F A2 R A 3 Spea 1520 BP b 2 P 1 32 5 6 S 2 8 00 66 4 SO 2 8 6 1 3363 0 00 0 3167 82 YE119 T3 VF SBO 7 YfreA NP 64 3995 YYb 01001 C 1 3 Yy m erA 474 P Yb0100248

40 1 o 0 56

SDY P 4 lA 100 035 5 79 9

5

T 6 0 2 0 2 3

Cs 4 7 F

1 8 1 S 1 0 4 8

E 4 B 0 2 9 89 6 9

4 0 Ym 1

7 01 5 3

3 2 3 33

2 0 4 4 364 5 6 2

Enterobacteriaceae 3 7 5 1 11 6 Y

1 3 20 83 7 D

0 2 6 7

3 y S

3 2 9

Yf 3 SC3 7 1 O

M 9 5 0 YE b B 3 5 S

Enterobacteriaceae 7 3 T

3 S pa3 4

T 0

S 7 4 3 2 s4

s 4

2

O 7 P

4 6 3

2 M C

5 7 1

3

Y B

8 6 T 1

9 1 C 8 E

A

2 0 S S

7 0 I0 7

9 S

2 0

0 1 P

F H

A 6

02 0 M S

R V e 0

l 0.2

P 3 P

T S p B A

P P H N C Pasteurellaceae 0.2 P Vibrionaceae Psychromonadaceae

PhylogeneticFigure 5 trees of a) NanK and b) NanE Phylogenetic trees of a) NanK and b) NanE. The trees were obtained using Bayesian analysis as implemented in MrBayes. 1,000,000 generations were used to build the consensus trees. Only confidence values below 85 are shown. Blue Operational Taxonomic Units (OTUs) indicate Gram-negative Bacteria; red OTUs, Gram-positive Bacteria; black OTUs, Eukaryotes. Blue branches indicate Gamma-proteobacteria; red branches, Firmicutes; Black branches; Eukaryotes.

gested that these groups, both inhabitants of the aquatic Phylogenetic analysis of NanK and NanE environment, may form some of the most ancient bacte- Unlike the NanA phylogeny where Gram-negative and rial lineages (Fig. 4) [58,59]. Within the 16S rRNA tree the Gram-positive species clustered together, the phylogenetic Bacteroidetes and Verrucomicrobia lineages are not trees of both NanK and NanE did not demonstrate clear related (Fig. 4). The NanA from the Vibrio/Yersinia lineage cases of horizontal transfer between the two groups (Fig. branches close to the Verrucomicrobia/Planctomycetes 5a and 5b). The NanE and NanK protein sequences from lineage and the eukaryote lineage (Fig. 4). A single species the Gamma-Proteobacteria branch separately from NanE from the Verrucomicrobia also branches within the Vibrio/ and NanK from the Firmicutes. In both trees, F. nucleatum Yersinia clade. The data suggests horizontal transfer clusters firmly within the Firmicutes lineage (Fig. 5a and between eukaryotes and bacteria; however, the possible 5b). For the NanK tree we used the same outgroups as for direction of transfer cannot be determined as the prokary- the NanA tree, however, NanE is only present in bacteria. otes and eukaryotes form separate distinct clades (Fig. 4). It is interesting to note that NanK from T. vaginalis does We speculate that human pathogens such as V. cholerae, V. not cluster with the other eukaryotic representatives, vulnificus, Y. pestis and Y. enterocolitica may have acquired which form a tight and closely related group, unrelated to nanA from a commensal species in the human gut. An any member of the bacterial kingdom. alternative evolutionary scenario for this branching pat- tern may be convergent evolution of NanA, for instance, Although in the NanE and NanK trees, Gamma-Proteo- in order to recognize the same variant of sialic acid, which bacteria and Firmicutes are grouped on separate lineages, in the case of the bacterial pathogens within lineage IV similar to the 16S rRNA tree, within each lineage signifi- would allow them to utilize the sialic acid found in the cant differences are found (Fig. 5). Strikingly, P. profun- mucus of their host. dum, V. fischeri, and Psychromonas branch with members of the family Pasteurellaceae in both the NanK and NanE trees (Fig. 5a and 5b). V. vulnificus and V. cholerae are both

Page 12 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

0.6

0.5

0.4

0.3 O.D. 595 nm

0.2

0.1 0 5 10 15 20 25 Time point (hours)

V. cholerae N16961 V. vulnificus YJO16 V. fischeri H905 V. parahaemolyticus RIMD2210633 S. typhimurium INSP85 Y. enterocolitica ATCC 27729 E. coli K12

GrowthFigure 6of species that encode the Nan cluster on minimal media supplemented with sialic acid Growth of species that encode the Nan cluster on minimal media supplemented with sialic acid. Closed circles represent V. cholerae N16961; open circles represent V. vulnificus YJO16; closed triangles represent V. fischeri H905; open trian- gles represent V. parahaemolyticus RIMD2210633; closed squares represent S. enterica serovar Typhimurium INSP85; open squares represent Y. enterocolitica ATCC 27729; closed diamonds represent E. coli K12.

Page 13 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

closely related to each other in both trees but form a diver- Nan region or NanA (Fig. 6). We also studied the growth gent lineage from the other members of the Vibrionaceae of Y. pestis KIM D27 for 48 hours on M9 minimal media and Gamma-Proteobacteria strengthening the possibility supplemented with amino acids, N-minimal media, and of an independent evolutionary origin. MOPS minimal media. Slight growth occurred only in MOPS minimal media when supplemented with sialic Within the Firmicutes lineages for the NanK protein there acid or D-glucose, with both substrates showing similar is significantly more diversity in the branching patterns growth patterns (data not shown). It is worth noting that than in the NanE tree (Fig. 5b). For both the NanE and Y. pestis has a very slow growth rate and is an auxotroph NanK trees, the Staphylococci group branch with mem- for many nutrients. Overall our findings indicate that bers of the genus Lactobacillus. The three members of Myc- three major groups of , Salmonella, oplasma in the NanK tree form a divergent branch from Vibrio and Yersinia, can utilize sialic acid as a sole carbon other Firmicutes, similar to F. nucleatum and C. botulinum. and energy source, a nutrient widespread in the mucous NanK from Streptococcus species branch with D. formici- surfaces colonized by these organisms, increasing their fit- generans, D. longicatena, and R. gnavus, similar to the NanA ness in their host's environment. tree. The NanE and NanK proteins from the two species of the genus Clostridium are located on different branches in Conclusion both trees (Fig. 5a and 5b). Sialobiology is a an emerging field in cellular microbiol- ogy, which is beginning to uncover the significant role of Even though the general topologies of both the NanK and sialic acid metabolism in bacterial interactions with the NanE trees resemble more closely the topology of the16S human host both as commensals and pathogens. In this rRNA than that of NanA, there were many incongruencies work, we studied the distribution, gene order, and molec- found within the main lineages of the trees (Fig. 3, 4 and ular evolution of the cluster involved in sialic acid degra- 5). These differences might be due to stochastic reasons, dation (nanA, nanE, and nanK) among bacteria. We show such as different rates of mutation between the two genes, for the first time that the Nan cluster is limited to patho- or horizontal transfer events post speciation, which is sug- genic and commensal bacteria, encompassing a limited gested by the differences in the gene order of the clusters number of Gamma-Proteobacteria and Firmicutes. We (Fig. 2). Within the majority of the Firmicutes nanK and demonstrated that NanA, the first enzyme in the catabolic nanE genes were not coupled together, whereas in pathway, has a distinct evolutionary history from NanE Gamma-Proteobacteria the nanK and nanE genes were and NanK, with multiple instances of horizontal gene always coupled (one exception was noted within Yers- transfer found. The Nan cluster shows mosaic evolution inia). The coupling of the genes is likely reflected in the with incongruencies in its phylogeny and diversity in its higher congruency between the topology for the Gamma- structure. For the first time, we confirm the predicted abil- Proteobacteria for both NanE and NanK. The evolution- ity to utilize sialic acid as a carbon source in several bacte- ary scenario emerging from the analysis of the phyloge- rial pathogens encompassing three major groups netic trees is a mosaic evolution of the Nan cluster, due to, Salmonella, Vibrio and Yersinia, which can provide them among other possible reasons, horizontal transfer and with a selective advantage in heavily sialylated environ- reshuffling, as suggested by the variability in gene order. ments such as the human gut.

Utilization of sialic acid as a sole carbon source Authors' contributions In order to verify whether the species predicted to encode SAM and EFB designed research, analyzed data, and the Nan cluster have the capability to survive using sialic drafted the manuscript. SAM performed the research. acid as a sole carbon source, we performed in vitro assays Both authors read and approved the final manuscript. studying the growth of some pathogenic and commensal bacteria on minimal media supplemented with sialic acid Additional material (M9+sialic acid) (Fig. 6). Previous studies have shown that some bacteria, such as C. perfringens, B. fragilis, E. coli K12, P. multocida, and H. influenzae can utilize sialic acid Additional file 1 NanA NJ tree. The figure shows a phylogenetic tree of NanA using Neigh- as a carbon source [15,22,23,27,29,60]. We examined bor Joining as a tree building method. growth of V. cholerae N16961, V. vulnificus YJO16, V. Click here for file fischeri H905, S. enterica serovar Typhimurium INSP85, Y. [http://www.biomedcentral.com/content/supplementary/1471- enterocolitica ATCC 27729, E. coli K12 (positive control) 2148-9-118-S1.pdf] and V. parahaemolyticus RIMD2210633 (negative control) on M9+sialic acid as a sole carbon source (Fig. 6). As expected all isolates grew with the exception of V. para- haemolyticus RIMD2210633, which does not contain the

Page 14 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

Additional file 2 Additional file 10 NanK NJ tree. The figure shows a phylogenetic tree of NanK using Neigh- NanA Bayesian All Bootstrap. The figure shows a phylogenetic tree of bor Joining as a tree building method. NanA using Bayesian analysis as a tree building method with the boot- Click here for file strap values indicated. [http://www.biomedcentral.com/content/supplementary/1471- Click here for file 2148-9-118-S2.pdf] [http://www.biomedcentral.com/content/supplementary/1471- 2148-9-118-S10.pdf] Additional file 3 NanE NJ tree. The figure shows a phylogenetic tree of NanE using Neigh- Additional file 11 bor Joining as a tree building method. Additional species included in Figure 4. The table indicates the extra Click here for file additional species that were included in Fig. 4 that did not contain a full [http://www.biomedcentral.com/content/supplementary/1471- Nan cluster but encoded NanA. 2148-9-118-S3.pdf] Click here for file [http://www.biomedcentral.com/content/supplementary/1471- Additional file 4 2148-9-118-S11.xls] NanA ML tree. The figure shows a phylogenetic tree of NanA using Max- imum Likelihood as a tree building method. Click here for file [http://www.biomedcentral.com/content/supplementary/1471- Acknowledgements 2148-9-118-S4.pdf] We thank J.H. McDonald and the reviewers for their critical comments on the manuscript, and Michael Napolitano for his technical help. This work Additional file 5 was supported by a Science Foundation Ireland Research Frontiers grant NanK ML tree. The figure shows a phylogenetic tree of NanK using Max- and a University of Delaware Research Foundation grant to EFB. imum Likelihood as a tree building method. Click here for file References [http://www.biomedcentral.com/content/supplementary/1471- 1. Vimr ER, Kalivoda KA, Deszo EL, Steenbergen SM: Diversity of 2148-9-118-S5.pdf] microbial sialic acid metabolism. Microbiol Mol Biol Rev 2004, 68(1):132-153. Additional file 6 2. Angata T, Varki A: Chemical diversity in the sialic acids and related alpha-keto acids: an evolutionary perspective. Chem NanE ML tree. The figure shows a phylogenetic tree of NanE using Max- Rev 2002, 102(2):439-469. imum Likelihood as a tree building method. 3. Severi E, Hood DW, Thomas GH: Sialic acid utilization by bacte- Click here for file rial pathogens. Microbiology 2007, 153(Pt 9):2817-2822. [http://www.biomedcentral.com/content/supplementary/1471- 4. Warren L: Bound Carbohydrates in Nature. Cambridge Univer- 2148-9-118-S6.pdf] sity press; 1994. 5. Schauer R: Achievements and challenges of sialic acid research. Glycoconj J 2000, 17(7–9):485-499. Additional file 7 6. Varki A: Biological roles of oligosaccharides: all of the theories GC content differences between Nan genes and genome. The table indi- are correct. Glycobiology 1993, 3(2):97-130. cates the GC content differences between the Nan genes and the core 7. Schwarzkopf M, Knobeloch KP, Rohde E, Hinderlich S, Wiechens N, Lucka L, Horak I, Reutter W, Horstkorte R: Sialylation is essential genome of the bacterium where they are encoded. for early development in mice. Proc Natl Acad Sci USA 2002, Click here for file 99(8):5267-5270. [http://www.biomedcentral.com/content/supplementary/1471- 8. Malykh YN, Krisch B, Gerardy-Schahn R, Lapina EB, Shaw L, Schauer 2148-9-118-S7.xls] R: The presence of N-acetylneuraminic acid in Malpighian tubules of larvae of the cicada Philaenus spumarius. Glycoconj J 1999, 16(11):731-739. Additional file 8 9. Roth J, Kempf A, Reuter G, Schauer R, Gehring WJ: Occurrence of Distribution of Nan clusters among Bacteria. The table indicates the sialic acids in Drosophila melanogaster. Science 1992, bacterial species that encode a full version of the Nan cluster (nanA, 256(5057):673-675. nanK, and nanE) 10. Saito M, Kitamura H, Sugiyama K: Occurrence of gangliosides in Click here for file the common squid and pacific octopus among protostomia. Biochim Biophys Acta 2001, 1511(2):271-280. [http://www.biomedcentral.com/content/supplementary/1471- 11. Alviano CS, Travassos LR, Schauer R: Sialic acids in fungi: a mini- 2148-9-118-S8.xls] review. Glycoconj J 1999, 16(9):545-554. 12. Cross GA, Takle GB: The surface trans-sialidase family of Additional file 9 Trypanosoma cruzi. Annu Rev Microbiol 1993, 47:385-411. 13. Schauer R, Reuter G, Muhlpfordt H, Andrade AF, Pereira ME: The Presence of mobile elements and Neuraminidases. The table indicates occurrence of N-acetyl- and N-glycoloylneuraminic acid in which species encode a neuraminidase, integrase and/or mobile elements Trypanosoma cruzi. Hoppe Seylers Z Physiol Chem 1983, in the proximities of the Nan cluster. 364(8):1053-1057. Click here for file 14. Parsons NJ, Patel PV, Tan EL, Andrade JR, Nairn CA, Goldner M, Cole [http://www.biomedcentral.com/content/supplementary/1471- JA, Smith H: Cytidine 5'-monophospho-N-acetyl neuraminic acid and a low molecular weight factor from human blood 2148-9-118-S9.xls] cells induce lipopolysaccharide alteration in gonococci when conferring resistance to killing by human serum. Microb Pathog 1988, 5(4):303-309.

Page 15 of 16 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:118 http://www.biomedcentral.com/1471-2148/9/118

15. Vimr E, Lichtensteiger C, Steenbergen S: Sialic acid metabolism's W and Clustal X version 2.0. Bioinformatics 2007, dual function in Haemophilus influenzae. Mol Microbiol 2000, 23(21):2947-2948. 36(5):1113-1123. 36. Nicholas KBNH, Deerfield DW: GeneDoc: Analysis and Visuali- 16. Vimr E, Steenbergen S, Cieslewicz M: Biosynthesis of the poly- zation of Genetic Variation. EMBNEW 1997. sialic acid capsule in Escherichia coli K1. J Ind Microbiol 1995, 37. Posada D, Crandall KA: MODELTEST: testing the model of 15(4):352-360. DNA substitution. Bioinformatics 1998, 14(9):817-818. 17. Vogel U, Claus H, Heinze G, Frosch M: Role of lipopolysaccharide 38. Lanave C, Preparata G, Saccone C, Serio G: A new method for cal- sialylation in serum resistance of serogroup B and C menin- culating evolutionary substitution rates. J Mol Evol 1984, gococcal disease isolates. Infect Immun 1999, 67(2):954-957. 20(1):86-93. 18. Chang DE, Smalley DJ, Tucker DL, Leatham MP, Norris WE, Steven- 39. Goldman N, Whelan S: Statistical tests of gamma-distributed son SJ, Anderson AB, Grissom JE, Laux DC, Cohen PS, et al.: Carbon rate heterogeneity in models of sequence evolution in phyl- nutrition of Escherichia coli in the mouse intestine. Proc Natl ogenetics. Mol Biol Evol 2000, 17(6):975-978. Acad Sci USA 2004, 101(19):7427-7432. 40. Felsenstein J: Mathematics vs. Evolution: Mathematical Evolu- 19. Martinez J, Steenbergen S, Vimr E: Derived structure of the puta- tionary Theory. Science 1989, 246(4932):941-942. tive sialic acid transporter from Escherichia coli predicts a 41. Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of novel sugar permease domain. J Bacteriol 1995, phylogenetic trees. Bioinformatics 2001, 17(8):754-755. 177(20):6005-6010. 42. Guindon S, Gascuel O: A simple, fast, and accurate algorithm 20. Schauer R, Buscher HP, Casals-Stenzel J: Sialic acids: their analysis to estimate large phylogenies by maximum likelihood. Syst and enzymic modification in relation to the synthesis of sub- Biol 2003, 52(5):696-704. mandibular-gland glycoproteins. Biochem Soc Symp 1974:87-116. 43. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolu- 21. Severi E, Randle G, Kivlin P, Whitfield K, Young R, Moxon R, Kelly D, tionary Genetics Analysis (MEGA) software version 4.0. Mol Hood D, Thomas GH: Sialic acid transport in Haemophilus Biol Evol 2007, 24(8):1596-1599. influenzae is essential for lipopolysaccharide sialylation and 44. Huson DH, Bryant D: Application of phylogenetic networks in serum resistance and is dependent on a novel tripartite evolutionary studies. Mol Biol Evol 2006, 23(2):254-267. ATP-independent periplasmic transporter. Mol Microbiol 2005, 45. Karlin S: Detecting anomalous gene clusters and pathogenic- 58(4):1173-1185. ity islands in diverse bacterial genomes. Trends Microbiol 2001, 22. Steenbergen SM, Lichtensteiger CA, Caughlan R, Garfinkle J, Fuller 9(7):335-343. TE, Vimr ER: Sialic Acid metabolism and systemic pasteurello- 46. Brubaker RR: Interconversion of Purine Mononucleotides in sis. Infect Immun 2005, 73(3):1284-1294. Pasteurella pestis. Infect Immun 1970, 1(5):446-454. 23. Brigham C, Caughlan R, Gallegos R, Dallas MB, Godoy VG, Malamy 47. Kuwahara T, Yamashita A, Hirakawa H, Nakayama H, Toh H, Okada MH: Sialic Acid (N-acetyl Neuraminic Acid or NANA) Utili- N, Kuhara S, Hattori M, Hayashi T, Ohnishi Y: Genomic analysis of zation by Bacteroides fragilis Requires A Novel N-acetyl Bacteroides fragilis reveals extensive DNA inversions regu- Mannosamine (manNAc) Epimerase. J Bacteriol 2009, lating cell surface adaptation. Proc Natl Acad Sci USA 2004, 191(11):3629-38. 101(41):14919-14924. 24. Comb DG, Roseman S: The sialic acids. I. The structure and 48. Post DM, Mungur R, Gibson BW, Munson RS Jr: Identification of a enzymatic synthesis of N-acetylneuraminic acid. J Biol Chem novel sialic acid transporter in Haemophilus ducreyi. Infect 1960, 235:2529-2537. Immun 2005, 73(10):6727-6735. 25. Plumbridge J, Vimr E: Convergent pathways for utilization of 49. Vimr E, Lichtensteiger C: To sialylate, or not to sialylate: that is the amino sugars N-acetylglucosamine, N-acetylman- the question. Trends Microbiol 2002, 10(6):254-257. nosamine, and N-acetylneuraminic acid by Escherichia coli. 50. Fani R, Brilli M, Lio P: The origin and evolution of operons: the J Bacteriol 1999, 181(1):47-54. piecewise building of the proteobacterial histidine operon. J 26. Ringenberg MA, Steenbergen SM, Vimr ER: The first committed Mol Evol 2005, 60(3):378-390. step in the biosynthesis of sialic acid by Escherichia coli K1 51. Lawrence JG, Roth JR: Selfish operons: horizontal transfer may does not involve a phosphorylated N-acetylmannosamine drive the evolution of gene clusters. Genetics 1996, intermediate. Mol Microbiol 2003, 50(3):961-975. 143(4):1843-1860. 27. Vimr ER, Troy FA: Regulation of sialic acid metabolism in 52. Woese C: The universal ancestor. Proc Natl Acad Sci USA 1998, Escherichia coli: role of N-acylneuraminate pyruvate-lyase. J 95(12):6854-6859. Bacteriol 1985, 164(2):854-860. 53. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene 28. Vimr ER, Troy FA: Identification of an inducible catabolic sys- order: a fingerprint of proteins that physically interact. tem for sialic acids (nan) in Escherichia coli. J Bacteriol 1985, Trends Biochem Sci 1998, 23(9):324-328. 164(2):845-853. 54. Itoh T, Takemoto K, Mori H, Gojobori T: Evolutionary instability 29. Nees S, Schauer R, Mayer F: Purification and characterization of of operon structures disclosed by sequence comparisons of N-acetylneuraminate lyase from Clostridium perfringens. complete microbial genomes. Mol Biol Evol 1999, 16(3):332-346. Hoppe Seylers Z Physiol Chem 1976, 357(6):839-853. 55. Omelchenko MV, Makarova KS, Wolf YI, Rogozin IB, Koonin EV: 30. Andersson JO, Doolittle WF, Nesbo CL: Genomics. Are there Evolution of mosaic operons by horizontal gene transfer and bugs in our genome? Science 2001, 292(5523):1848-1850. gene displacement in situ. Genome Biol 2003, 4(9):R55. 31. de Koning AP, Brinkman FS, Jones SJ, Keeling PJ: Lateral gene 56. Jermyn WS, Boyd EF: Characterization of a novel Vibrio patho- transfer and metabolic adaptation in the human parasite genicity island (VPI-2) encoding neuraminidase (nanH) Trichomonas vaginalis. Mol Biol Evol 2000, 17(11):1769-1773. among toxigenic Vibrio cholerae isolates. Microbiology 2002, 32. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, 148(Pt 11):3681-3693. Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al.: The com- 57. Vasconcelos AT, Ferreira HB, Bizarro CV, Bonatto SL, Carvalho MO, plete genome sequence of Escherichia coli K-12. Science 1997, Pinto PM, Almeida DF, Almeida LG, Almeida R, Alves-Filho L, et al.: 277(5331):1453-1474. Swine and poultry pathogens: the complete genome 33. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson sequences of two strains of Mycoplasma hyopneumoniae and RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, et al.: DNA a strain of Mycoplasma synoviae. J Bacteriol 2005, sequence of both chromosomes of the cholera pathogen 187(16):5568-5577. Vibrio cholerae. Nature 2000, 406(6795):477-483. 58. Lee KC, Webb RI, Fuerst JA: The cell cycle of the planctomycete 34. Munson RS Jr, Zhong H, Mungur R, Ray WC, Shea RJ, Mahairas GG, Gemmata obscuriglobus with respect to cell compartmen- Mulks MH: Haemophilus ducreyi strain ATCC 27722 contains talization. BMC Cell Biol 2009, 10(1):4. a genetic element with homology to the vibrio RS1 element 59. Lee KC, Webb RI, Janssen PH, Sangwan P, Romeo T, Staley JT, Fuerst that can replicate as a plasmid and confer NAD independ- JA: Phylum Verrucomicrobia representatives share a com- ence on haemophilus influenzae. Infect Immun 2004, partmentalized cell plan with members of bacterial phylum 72(2):1143-1146. Planctomycetes. BMC Microbiol 2009, 9(1):5. 35. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, 60. Allen S, Zaleski A, Johnston JW, Gibson BW, Apicella MA: Novel McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al.: Clustal sialic acid transporter of Haemophilus influenzae. Infect Immun 2005, 73(9):5291-5300.

Page 16 of 16 (page number not for citation purposes)