Phylogenetic-Derived Insights into the Evolution of Sialylation in Eukaryotes: Comprehensive Analysis of Vertebrate β-Galactoside α2,3/6-Sialyltransferases (ST3Gal and ST6Gal). Roxana E Teppa, Daniel Petit, Olga Plechakova, Virginie Cogez, Anne Harduin-Lepers

To cite this version:

Roxana E Teppa, Daniel Petit, Olga Plechakova, Virginie Cogez, Anne Harduin-Lepers. Phylogenetic- Derived Insights into the Evolution of Sialylation in Eukaryotes: Comprehensive Analysis of Vertebrate β-Galactoside α2,3/6-Sialyltransferases (ST3Gal and ST6Gal).. International Journal of Molecular Sciences, MDPI, 2015, 17 (8), ￿10.3390/ijms17081286￿. ￿hal-01358174￿

HAL Id: hal-01358174 https://hal.archives-ouvertes.fr/hal-01358174 Submitted on 28 May 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License International Journal of Molecular Sciences

Review Phylogenetic-Derived Insights into the Evolution of Sialylation in Eukaryotes: Comprehensive Analysis of Vertebrate β-Galactoside α2,3/6-Sialyltransferases (ST3Gal and ST6Gal)

Roxana E. Teppa 1, Daniel Petit 2, Olga Plechakova 3, Virginie Cogez 4 and Anne Harduin-Lepers 4,5,* 1 Bioinformatics Unit, Fundación Instituto Leloir, Av. Patricias Argentinas 435, C1405BWE Buenos Aires, Argentina; [email protected] 2 Laboratoire de Génétique Moléculaire Animale, UMR 1061 INRA, Université de Limoges Faculté des Sciences et Techniques, 123 avenue Albert Thomas, 87060 Limoges, France; [email protected] 3 FRABio-FR3688 CNRS, Univ. Lille, bât. C9, 59655 Villeneuve d’Ascq cedex, France; [email protected] 4 Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, 59000 Lille, France; [email protected] 5 UGSF, Bât. C9, Université de Lille-Sciences et Technologies, 59655 Villeneuve d’Ascq, France * Correspondence: [email protected]; Tel.: +33-320-336-246; Fax: +33-320-436-555

Academic Editor: Cheorl-Ho Kim Received: 28 June 2016; Accepted: 28 July 2016; Published: 9 August 2016 Abstract: Cell surface of eukaryotic cells is covered with a wide variety of sialylated molecules involved in diverse biological processes and taking part in cell–cell interactions. Although the physiological relevance of these sialylated glycoconjugates in vertebrates begins to be deciphered, the origin and evolution of the genetic machinery implicated in their biosynthetic pathway are poorly understood. Among the variety of actors involved in the sialylation machinery, sialyltransferases are key enzymes for the biosynthesis of sialylated molecules. This review focus on β-galactoside α2,3/6-sialyltransferases belonging to the ST3Gal and ST6Gal families. We propose here an outline of the evolutionary history of these two major ST families. Comparative genomics, molecular phylogeny and structural bioinformatics provided insights into the functional innovations in sialic acid metabolism and enabled to explore how ST- function evolved in vertebrates.

Keywords: evolution; sialyltransferases; sialic acid; molecular phylogeny; functional genomics

1. Introduction Sialic acids (SA) represent a broad family of nine-carbon electro-negatively charged monosaccharides commonly described in the deuterostomes and some microorganisms [1–5]. Interestingly, SA show a discontinuous distribution across evolutionary metazoan lineages. Outside the deuterostome lineage (vertebrates, urochordates, echinoderms), SA are rarely described in some ecdysozoa and lophotrochozoa protostomes like in the Drosophila melanogaster nervous system during embryogenesis [6–9] or in larvae of the cicada Philaenus spumarius [10], or on glycolipids of the common squid and pacific octopus [11]. They are notably absent from plants, archaebacteria or the ecdysozoan Caenorhabditis elegans [12]. SA exhibit a huge structural diversity and species specific modifications. This family of compounds encompasses N-acetylneuraminic acid (Neu5Ac) and over 50 derivatives showing various substituents on carbon 4, 5, 7, 8 or 9, like Neu5Gc and Kdn, with Neu5Ac being the most prominent SA found in higher vertebrates (Figure1A). In vertebrates, the SA hydroxyl group at position 2 is most frequently glycosidically-linked to either the 3- or 6-hydroxyl group of galactose (Gal) residues (Figure1B) or the 6-hydroxyl group of N-acetylgalactosamine (GalNAc) residues and

Int. J. Mol. Sci. 2016, 17, 1286; doi:10.3390/ijms17081286 www.mdpi.com/journal/ijms Int. J. Mol.Int. J. Sci. Mol.2016 Sci. ,201617,, 1286 17, 1286 2 of 202 of 20

hydroxyl group at position 2 is most frequently glycosidically-linked to either the 3- or 6-hydroxyl can formgroup to of a lessergalactose extent (Gal) di-, residues oligo- or(Fig poly-SAure 1B) or chains the 6-hydroxyl via their 8-hydroxyl group of N group.-acetylgalactosamine In deuterostome lineages,(GalNAc) sialoglycans residues and are foundcan form in to cellular a lesser secretionsextent di-, oligo- and onor poly-SA the outer chains cell via surface, their 8-hydroxyl essentially as group. In deuterostome lineages, sialoglycans are found in cellular secretions and on the outer cell terminal residues of the glycan chains of glycoproteins and glycolipids [13,14] constituting the so-called surface, essentially as terminal residues of the glycan chains of glycoproteins and glycolipids [13,14] siaLome [15], which varies according to animal species. constituting the so-called siaLome [15], which varies according to animal species.

Figure 1. Sialic acids and sialylated molecules. (A) N-acetylneuraminic acid (Neu5Ac) is the major Figure 1. Sialic acids and sialylated molecules. (A) N-acetylneuraminic acid (Neu5Ac) is the major sialic acid molecule found in human tissues. Other commonly described sialic acids in vertebrates are sialicN acid-glycolylneuraminic molecule found acid in human (Neu5Gc) tissues. and 2-keto-3-deoxy-nonulosonic Other commonly described acid sialic (Kdn); acids (B) In in vertebrates, vertebrates are N-glycolylneuraminicthe sialic acid hydroxyl acid group (Neu5Gc) at position and 2-keto-3-deoxy-nonulosonic 2 is most frequently glycosidically-linked acid (Kdn); to (B either) In vertebrates, the 3- the sialicor 6-hydroxyl acid hydroxyl group group of galactose at position (Gal) 2 isresidues. most frequently These glycosidic glycosidically-linked linkages are formed to either by thethe 3- or 6-hydroxylβ-galactoside group ofα2,3/6-sialyltransferases galactose (Gal) residues. described These in glycosidic this review. linkages are formed by the β-galactoside α2,3/6-sialyltransferases described in this review. Owing to their anionic charge and their peripheral position in glycans, SA play major roles in the various vertebrate biological systems ranging from protecting from proteolysis, Owing to their anionic charge and their peripheral position in glycans, SA play major roles modulating cell functions to regulating intracellular communication [16]. For instance, the in theα2,3-linked various vertebrateSA contribute biological to the high systems viscosity ranging of the mucin-type from protecting O-glycosylproteins proteins fromfound proteolysis, on the modulatingintestine cell endothelia functions orto on regulating the surface intracellular of fish or frog communication eggs [17]. Besides, [16 ].some For instance,endogenous the proteinsα2,3-linked SA contributespecificallyto recognize the high sialylated viscosity molecules of the at mucin-type the cell surfaceO-glycosylproteins that act as receptors. found Examples on the include intestine endotheliaselectin or on on endothelial the surface cells of mediating fish or frog leucocytes eggs [17 and]. Besides, platelets sometrafficking, endogenous and siglecs proteins playing specifically a role recognizein immune sialylated cell regulation molecules [18,19]. at the cellLikewise, surface a number that act of as pathogenic receptors. Examplesagents like includetoxins (cholera selectin on endothelialtoxin), protozoa cells mediating (Plasmodium), leucocytes viruses and (influenza platelets virus), trafficking, bacteria and (Helicobacter siglecs playing pylori) use a rolecell surface in immune cell regulationSA as ligands [18 ,for19]. cell Likewise, adhesion a [20] number and have of pathogenic evolved this agents ability liketo distinguish toxins (cholera a specific toxin), sialylated protozoa (Plasmodium),sugar code viruses[21,22] distinguishing (influenza virus), α2,3- bacteriaor α2,6-linked (Helicobacter SA in vertebrate pylori) use tissues cell surface[23]. One SA of asthe ligands most for notable examples is the flu virus tropism: human strains of influenza A virus bind selectively to cell adhesion [20] and have evolved this ability to distinguish a specific sialylated sugar code [21,22] SAα2,6-Gal epitopes that prevail in the human tracheal mucosal epithelium, whereas chimpanzee α α distinguishingstrains bind 2,3-selectively or 2,6-linked to SAα2,3-Gal SA in vertebrateepitopes primarily tissues [expressed23]. One ofin thetheir most tracheal notable mucosal examples is theepithelium flu virus tropism:[24–26], suggesting human strains that the of influenzaswitch to α A2,6-linked virus bind SA selectivelycould give tothe SA humanα2,6-Gal ancestor epitopes that prevailsome resistance in the human towards tracheal influenza mucosal viruses, epithelium, which later whereas on could chimpanzee have evolved strains and adapted bind selectively to the to SAα2,3-Galmodern epitopes humans. primarily expressed in their tracheal mucosal epithelium [24–26], suggesting that the switchThe to αSA2,6-linked metabolism SA is could complex give and the requires human ancestora large panel some of resistanceenzymes with towards various influenza subcellular viruses, whichlocalization later on could including have evolvedthe nuclear and CMP-Neu5Ac adapted to the synthase modern (CMAS), humans. the cytosolic UDP-GlcNAc The2-epimerase/ SA metabolismN-acetylmannosamine is complex andkinase requires (GNE), a large the panelcytosolic of enzymes cytidine with monophosphate- various subcellularN- acetylneuraminic acid hydroxylase (CMAH), the Golgi CMP-Neu5Ac transporter (SLC35A1), the localization including the nuclear CMP-Neu5Ac synthase (CMAS), the cytosolic UDP-GlcNAc Golgi sialyltransferases (ST) and sialidases (Neu) (Figure 2A) [27]. The distribution of SA in the 2-epimerase/N-acetylmannosamine kinase (GNE), the cytosolic cytidine monophosphate-N- metazoans further suggests that this sialylation machinery has evolved at least in the last common acetylneuraminic acid hydroxylase (CMAH), the Golgi CMP-Neu5Ac transporter (SLC35A1), the Golgi sialyltransferases (ST) and sialidases (Neu) (Figure2A) [ 27]. The distribution of SA in the metazoans further suggests that this sialylation machinery has evolved at least in the last common ancestor (LCA) of the metazoans, well before the divergence of protostomes (Ecdysozoa and Lophotrochozoa) and deuterostomes. Very little is known pertaining to the evolutionary history of each orthologous Int. J. Mol. Sci. 2016, 17, 1286 3 of 20 Int. J. Mol. Sci. 2016, 17, 1286 3 of 20 ancestor (LCA) of the metazoans, well before the divergence of protostomes (Ecdysozoa and Lophotrochozoa) and deuterostomes. Very little is known pertaining to the evolutionary history of gene.each However,orthologous these gene. However, show also these an unusualgenes sh andow patchyalso an phylogenetic unusual and distribution patchy phylogenetic with a huge genedistribution families’ with expansion a huge observedgene families’ in the expansion deuterostome observed lineages in the indicative deuterostome of the lineages prominent indicative role of sialoglycoconjugatesof the prominent role in of the sialoglycoconjugates deuterostome ancestor in th [e28 deuterostome–31] and selective ancestor loss in[28–31] most and non-deuterostome selective loss lineagesin most asnon-deuterostome well as in some lineages vertebrate as lineageswell as in (e.g., someCMAH vertebrategene). lineages The humans (e.g., CMAH cannot gene). synthesize The CMP-Neu5Gchumans cannot from synthesize CMP-Neu5Ac CMP-Neu5Gc because from the humanCMP-NeCMAHu5Ac becausegene was the inactivated human CMAH 2 million gene yearswas agoinactivated [32,33], an 2 activitymillion thatyears was ago independently [32,33], an activity lost in that the ferretswas independently [34], birds and lost reptiles in the [35 ferrets] (Figure [34],2B) . Interestingly,birds and reptiles a cmas [35]gene (Figure was identified2B). Interestingly, and characterized a cmas gene in was the D.identified melanogaster and characterizedgenome [36,37 in] the and moreover,D. melanogaster1 gne ,genome2 st and [36,37]2 neu genes and moreover, were identified 1 gne, in2 thest and porifera 2 neu Oscarellagenes were carmella identified[28,29 in,38 the,39 ], andporifera a SLC35A1 Oscarella-related carmella gene [28,29,38,39], was identified and ina SLC35A1 the tunicate-relatedCiona gene intestinalis was identifiedand C. elegansin the genomestunicate (personalCiona intestinalis data) suggesting and C. elegans the genomes ancient occurrence (personal data) and subsequentsuggesting the divergent ancient evolutionoccurrence of and the sialylationsubsequent machinery. divergent evolution of the sialylation machinery.

A NANS syhnthase GNE GNE Phosphatase CMAS SLC35A1 ST Neu UDP-GlcNAc ManNAc ManNAc-6-P NeuAc-9-P NeuAc CMP-Neu5Ac GP/GL-sialic acid GP/GL

CMAH Sialic acid CMP-Neu5Gc

B ST FAMILIES OTHER ACTORS

Neu1 GNE porifera 2 ST6Gal (Oca) ST3Gal Neu5

cnidaria 1 ST6GalNAc (Nve) SLC35A1 nematoda 0 (Cel) CMAS SLC35A1 artropoda 1 ST6Gal (Dme, Dya, Dps, Aga, Aae) Metazoa ST6Gal Neu1 GNE CMAS SLC35A1 ST6GalNAc Neu5 hemichordata 4 ST3Gal a (Sko) ST8Sia om st ~936 mya to ST6GalNAc Neu1 GNE CMAS SLC35A1 ro echinodermata 3 ST3Gal P (Spu) Neu5 Bilatera ns ST8Sia ria ra lac 1 ST3Gal CMAS SLC35A1 ~848 mya bu urochordata Am (Cin, Csa) Deuterostoma ST6Gal ST6GalNAc Neu1 GNE CMAS SLC35A1 ~748 mya cephalochordata 4 (Bfl) ST3Gal Neu5 Chordata ST8Sia ~735 mya agnatha R1 (Pma) ~608 mya R3 bony fishes (Tru, Tni, Dre, Ola, Omy, Gac) ST6Gal Neu1 GNE CMAS SLC35A1 CMAH Vertebrata ST6GalNAc R2 amphibians 4 Neu2, Neu3, Neu4 ~430 mya ST3Gal (Xtr, Xla) ST8Sia Tetrapoda birds ~320 mya (Gga, Tgu) mammals (Hsa, Ptr, Ssc, Bta, Rno, Mmu) Figure 2. Evolution of the biosynthetic pathway of sialic acids in Metazoa. (A) Schematic Figure 2. Evolution of the biosynthetic pathway of sialic acids in Metazoa. (A) Schematic representation representation of the vertebrate biosynthetic pathway of sialylated molecules. Key enzymes of the vertebrate biosynthetic pathway of sialylated molecules. Key enzymes implicated in the implicated in the biosynthetic pathway of sialic acids are indicated as follows: GNE: biosynthetic pathway of sialic acids are indicated as follows: GNE: UDP-GlcNAc2epimerase/ManNAc UDP-GlcNAc2epimerase/ManNAc kinase; NANS: Neu5Ac9-phosphate synthase; CMAS: kinase; NANS: Neu5Ac9-phosphate synthase; CMAS: CMP-Neu5Ac synthase: CMAH: CMP-Neu5Ac CMP-Neu5Ac synthase: CMAH: CMP-Neu5Ac hydroxylase; SLC35A1: CMP-Neu5Ac transporter; hydroxylase; SLC35A1: CMP-Neu5Ac transporter; ST: sialyltransferases, Neu: neuraminidase; ST: sialyltransferases, Neu: neuraminidase; (B) Illustration of the evolutionary history of the sialic (B) Illustration of the evolutionary history of the sialic acid biosynthetic pathway in the metazoans. acid biosynthetic pathway in the metazoans. Evidences of the occurrence of the biosynthetic Evidences of the occurrence of the biosynthetic pathway of sialylated molecules across the metazoans pathway of sialylated molecules across the metazoans have been obtained based on BLAST search have been obtained based on BLAST search analysis of the various actors in genomic databases. analysis of the various actors in genomic databases. Yellow stars indicate the two whole genome Yellow stars indicate the two whole genome duplication events (WGD R1–R2) that took place at the duplication events (WGD R1–R2) that took place at the base of vertebrates and the teleostean whole base of vertebrates and the teleostean whole genome duplication event (WGD R3) that occurred in the genome duplication event (WGD R3) that occurred in the stem of bony fish. stem of bony fish. The structural diversity of sialylated glycoconjugates is further ensured by a diverse set of STs consistingThe structural of 20 members diversity describe of sialylatedd in the glycoconjugateshuman tissues [40,41]. is further The ensuredSTs reside by and a diverse are strictly set of STsorganized consisting in the of trans 20 members-Golgi network described of eukaryotic in the human cells as tissues type [II40 transmembrane,41]. The STs resideproteins and with are strictlya similar organized topology in showing the trans a-Golgi short networkN-terminal of eukaryoticcytoplasmic cells tail, asa typesingle II transmembrane transmembrane domain, proteins with a similar topology showing a short N-terminal cytoplasmic tail, a single transmembrane Int. J. Mol. Sci. 2016, 17, 1286 4 of 20 domain, a stem domain and a large C-terminal catalytic domain oriented in the Golgi lumen [42]. The STs use CMP-β-Neu5Ac, CMP-β-Neu5Gc or CMP-β-Kdn as activated sugar donors for the sialylation at terminal positions of oligosaccharide chains of glycoconjugates. These STs are categorized into 4 families (ST6Gal, ST3Gal, ST6GalNAc and ST8Sia) [41,43] found in the GT-29 of the Carbohydrate-Active enZYme (CAZy) database [44] and named according to the glycosidic linkage formed and the monosaccharide acceptor [45]. Each family catalyzes the formation of different glycosidic linkages, α2–3, or α2–6 to the terminal Gal residue in N- or O-glycans, α2–6 to the terminal GalNAc residue in O-glycans and glycolipids and α2–8 to terminal SA residues in N- or O-glycans or glycolipids). The ST enzymatic activities have been documented mainly in mouse and human tissues and more recently in chicken [46,47], and to a lesser extent in the invertebrates like the fly D. melanogaster [48], the silkworm Bombyx mori [49], the amphioxus Branchiostoma floridae [1] and the tunicate C. intestinalis [5]. Each member of the mammalian ST3Gal and ST6Gal families shows exquisite acceptor specificities (for reviews, see [41,50,51]. However, as most of the STs have not been experimentally characterized, it remains unclear how these diverse biochemical functions evolved and what were the biological consequences of the functional diversification of STs. In the post-genomic era, a major biological question remains to elucidate the multi-level function of STs (i.e., biochemical, cellular or developmental functions) which can be achieved through the simultaneous study of different levels of biological organization and the use of computational means. The most represented vertebrate β-galactoside α2,3/6-sialyltransferases (ST3Gal and ST6Gal) offer the unique opportunity to understand deuterostome innovations and the STs functional evolution. The ST3Gal and ST6Gal are well studied enzymes catalyzing the transfer of sialic acid residues to the terminal galactose residues of either the type-I, type-II or type-III disaccharides (Galβ1,3GlcNAc; Galβ1,4GlcNAc or Galβ1,3GalNAc, respectively) resulting in the formation of α2–3 or α2–6 glycosidic linkages on terminal galactose (Gal) residues. In previous reports, we deciphered key genetic events, which led to the various ST3Gal and ST6Gal subfamilies described in the vertebrates, we established the evolutionary relationships of newly described STs and provided insights into the structure-function relationships of STs [39,52] and into their various biological functions [38,53]. Focusing on β-galactoside α2,3/6-sialyltransferases (ST3Gal and ST6Gal), we explore in this review the molecular evolution of β-galactoside α2,3/6 sialyltransferases with the goal of bringing an evolutionary perspective to the study of SA-based interactions and contributing a powerful approach for a better understanding of sialophenotype in vertebrates.

2. Genome-Wide Search of STs Genes Decline or Expansion? A general strategy using conventional BLAST search approaches [54] was adopted for homologous ST sequences identification in the transcriptomic and genomic databases like NCBI or ENSEMBL to reconstruct the animal ST genes repertoire and assign orthologies [39]. Although the vertebrate ST amino acid sequences show very limited overall sequence identity (around 20%), conserved peptide motifs have been described within their catalytic domain, which are very useful hallmark for ST identification. Different sets of protein regions considering three levels of amino acid sequence conservation have been described in the past that are retrieved from multiple sequence alignments (MSA) of (1) all animal ST called sialylmotifs L (large), S (small), III and VS (very small); (2) each family of ST, called family motifs a, b, c, d and e; (3) in each vertebrate subfamily [55]. This strategy led to the identification of a total number of 750 st3gal- and st6gal-related sequences in the genome of 127 metazoan species that represent a significant sampling of metazoan diversity illustrated in Figure2B. The st6gal and st3gal gene families show a broad phylogenetic distribution in Metazoan from sponges to mammals. The mRNA fragments identified from the Homosclerophore sponge O. carmella in the Porifera phylum suggested that ancestral st6gal1/2 and st3gal1/2/8 genes were already present in the earliest metazoans [38,53] and could represent the most ancient ST described in animals. This observation pointed also the early divergence of st3gal groups GR1 (st3gal1/2/8), GR2 (st3gal4/6/9), GR3 (st3gal3/3-r/5/7) and GRx (st3gal4/6/9/3/3-r/5/7) that far predates the divergence of Int. J. Mol. Sci. 2016, 17, 1286 5 of 20 protostomes and deuterostomes [53]. Interestingly, ST-related sequences possessing a conserved GT-29 protein domain Pfam00777 with sialylmotifs L, S, III and VS and no family motif could be identified in plants, in the green marine microalga Bathycoccus prasinos [56], in the haptophyte Emiliana huxleyi (XM_005778044) [57], in the cryptophyte alga Guillarda theta and in the red tide dinoflagelate Alexandrium minutum [29] suggesting the presence of an ancestral protist ST gene set. However, the evolutionary relationships of these more distantly related ST sequences are not yet clearly established and the origin of ST-related sequences in Metazoan remains enigmatic [13,31,43]. Since no ST-related sequence was identified in Choanoflagellates, the closest known relatives of metazoans [58], nor in fungi, it can be deduced the ancient origin of ST sequences and their subsequent disappearance in some metazoan branches like in the Nematoda C. elegans for both ST families, in protostome for the st3gal family, and echinoderms and tunicates for the st6gal family [38,53]. A large data set of β-galactoside α2,3/6-sialyltransferase related sequences was identified in vertebrate genomes and orthologs of the 8 known mammalian β-galactoside α2,3/6-sialyltransferase genes could be identified in fish and amphibian genomes with the notable exception of the st3gal6 gene that disappeared from fish genome (Table1). An important indication of innovation was obtained in the genome of vertebrates suggesting the occurrence of as-yet not described ST-related homologs in fish and tetrapods. The ST sequence identification in vertebrate genomes with key phylogenetic position like the sea lamprey Petromyzon marinus [59] at the stem of vertebrates or the spotted gar Lepisosteus oculatus at the base of teleosts [60] was helpful to propose an evolutionary scenario. These novel ST-related sequences could originate from gene duplication events like the two whole-genome duplications (WGD R1–R2) that occurred deep in the ancestry of the vertebrate lineage (2R hypothesis) [61].

Table 1. Vertebrate β-galactoside α2,3/6-sialyltransferases ohnologs: Vertebrate β-galactoside α2,3/6-sialyltransferase sequences belonging to the ST3Gal and ST6Gal families are grouped in 4 clades with distinct evolutionary origins (GR1, GR2 and GR3 for the ST3Gal and a unique group for the ST6Gal) encompassing 9 st3gal and 2 st6gal ohnologs. Acceptor substrate preferences of the mammalian enzymes and predicted acceptor substrate preference (in blue) of the novel vertebrate enzymes lost in mammals are indicated.

Ancestral Ohnologs Fish Tetrapods Group (Before 2nd (After 2nd (After 3rd Acc. Substrate Amphibians Birds Mammals WGDR) WGDR) WGDR) st3gal1 st3gal1 st3gal1 st3gal1 st3gal1 Galβ1,3GalNAc-Ser GR1 st3gal1/2/8 st3gal2 st3gal2 st3gal2 st3gal2 st3gal2 Galβ1,3GalNAc-Ser st3gal8 st3gal8 st3gal8 st3gal8 lost Galβ1,3GalNAc-Ser st3gal3 st3gal3 st3gal3 st3gal3 st3gal3 Galβ1,3GalNAc-R st3gal3-r GR2 st3gal3/5/7 st3gal5 st3gal5 st3gal5 st3gal5 st3gal5 GM3 synthase st3gal7 st3gal7 lost lost lost GM4 synthase st3gal4 st3gal4 st3gal4 st3gal4 st3gal4 Galβ1,4GlcNAc-R st3gal6 lost st3gal6 st3gal6 st3gal6 Galβ1,4GlcNAc-R GR3 st3gal4/6/9 lost (except st3gal9 lost lost st3gal9 Galβ1,4GlcNAc-R in platypus) st6gal1 st6gal1 st6gal1 st6gal1 st6gal1 Galβ1,4GlcNAc-R – st6gal1/2 st6gal2 st6gal2 st6gal2 st6gal2 st6gal2 GalNAcβ1,4GlcNAc-R st6gal2-r

3. Molecular Phylogeny of β-Galactoside α2,3/6-Sialyltransferases Molecular phylogeny, with the construction of phylogenetic trees has been used to get further insight into the orthology and structure/function relationships of the identified β-galactoside α2,3/6-sialyltransferase sequences. As a first step, multiple sequence alignments (MSA) using predicted protein sequences, clustal Omega or MUSCLE algorithms evidenced several informative amino acid sites in the catalytic domain of ST to construct phylogenetic trees [39]. Among these conserved motifs, sialylmotifs and family motifs detection helped establishing the global evolutionary Int. J. Mol. Sci. 2016, 17, 1286 6 of 20 relationships between the identified ST sequences and enabled sequence-based prediction of their molecular function. Phylogeny of β-galactoside α2,3/6-sialyltransferase sequences was reconstructed using various methods implemented in the Molecular Evolutionary Genetics Analysis (MEGA) software and the reliability of the branching pattern was assessed by the bootstrap method [39,62]. The topology of the trees indicated that the ST6Gal and ST3Gal sequences identified in invertebrates are orthologous to the common ancestor of vertebrate subfamily members as they branch out from the tree before the split into vertebrate ST subfamilies with the exception of the ST3Gal members of the GRx group, which disappeared during vertebrate evolution [38,53]. The molecular phylogeny of β-galactoside α2,3/6-sialyltransferases is displayed in Figure3 using iTOL [63]. Int. J. Mol. Sci. 2016, 17, 1286 7 of 20 Int. J. Mol. Sci. 2016, 17, x 7 of 20

Figure 3. Maximum Likelihood phylogenetic trees of protein sequences of β-galactoside α2,3/6-sialyltransferases (ST3Gal and ST6Gal). In both cases, the phylogenetic β α treesFigure were 3.inferredMaximum using Likelihood the Maximum phylogenetic Likelihood trees method of protein based sequences on the of Whelan-galactoside and Goldman2,3/6-sialyltransferases method with options (ST3Gal G and (gamma ST6Gal). distributi In bothon) cases, and the I phylogenetic(Invariant sites present).trees wereAlignments inferred were using performed the Maximum using Likelihood Clustal Omega, method basedavailable on theat the Whelan site andhttp://www.ebi.ac.UK/Tools/msa/clustalo/ Goldman method with options G (gamma (Data distribution) S1). Evolutionary and I (Invariant analyses sites were conductedpresent). in AlignmentsMEGA6. The were trees performed were re-drawn using Clustal using Omega,iTOL 3.2 available [63] availabl at thee site at http://www.ebi.ac.UK/Tools/msa/clustalo/the URL http://itol.embl.de/. (A) ST3Gal tree(Data was obtained S1). Evolutionary from 124 analyses sequences were and conducted in MEGA6. The trees were re-drawn using iTOL 3.2 [63] available at the URL http://itol.embl.de/. (A) ST3Gal tree was obtained from 124 sequences and 228 positions in the final data set; (B) ST6Gal tree was inferred from 50 sequences and 256 positions and 50 sequences in the final data set. 228 positions in the final data set; (B) ST6Gal tree was inferred from 50 sequences and 256 positions and 50 sequences in the final data set.

Int. J. Mol. Sci. 2016, 17, 1286 8 of 20

4. When β-Galactoside α2,3/6-Sialyltransferase Evolutionary Studies Meet Genome Reconstruction Gene organization and gene localization studies were used to assign the newly described st3gal and st6gal orthologs and to reconstruct the genetic events that have led to ST functional diversification in vertebrates. At the gene level, β-galactoside α2,3/6-sialyltransferase genes are polyexonic with an overall conserved exon/intron organization in each family from fish to mammals, which support the model of the common ancestral origin of each family (ST3Gal and ST6Gal) [55]. Interestingly, analysis of exon/intron organization and composition in the st6gal gene family showed that the st6gal1 genes encoded by frogs and fish have independently undergone different insertion events inside the first exon. These genetic events have led to an extended stem region of fish and frog ST6Gal I with potential impact on their enzyme activities [38]. It is speculated that during metazoan evolution, β-galactoside α2,3/6-sialyltransferase genes were subject to several duplication events affecting single genes or or whole genomes. As far as the st3gal genes are concerned, a first series of tandem duplication of an ancestral st3gal gene in proto-Metazoa stem led to the GR1 and GR2/GR3/GRx groups of α2,3-sialyltransferases before the Porifera emergence. As previously reported for α2,8-sialyltransferases [30], a second series of tandem duplication took place after the Porifera radiation that gave rise to the full diversity of α2,3-sialyltransferase groups, as confirmed using ancestral genome reconstruction data from Putnam et al. [53,64]. This further indicates that the functional diversity of st3gal groups was acquired well before vertebrate divergence. In addition, gene copy number variants (CNV) were described within various animal genomes that might have contributed to evolutionary novelties, although not much is known about the functional impact of CNVs [65]. For instance, 2 and 3 copies of the ancestral st6gal1/2 gene were identified in the amphioxus (B. floridae) and in the sea lamprey (P. marinus) genome, respectively. Similarly, 4 copies of the st3gal1 gene named st3gal1A, st3gal1B, st3gal1C and st3gal1D, which are not shared among other fish species could be identified in close chromosomal location in the zebrafish genome. In the vertebrate genomes, detection of conserved synteny (i.e., set of orthologous genes born by a chromosomal segment in different genomes) and of large sets of paralogons (i.e., pair of chromosomes bearing a set of paralogous genes in a given genome resulting from WGD) provided strong evidence of the 2 rounds of WGD, which likely occurred about 500 and 555 million years ago (MYA). These large scale genetic events generated a class of paralogs known as ohnologs [66] and the various st3gal and st6gal gene subfamilies described in Table1. Interestingly, 5 out of the 16 β-galactoside α2,3/6-sialyltransferase genes subfamilies generated after the 2 WGD events were immediately lost in the early vertebrate genome, while 4 other st3gal subfamilies were independently lost later on, in various vertebrate genomes like st3gal6 in teleosts or st3gal7 in tetrapods and st3gal8 in mammals (Table1). Similarly, almost all the ST duplicates generated after the teleost specific third WGD event at the base of Actinopterygii were lost with the exception of st6gal2-r and st3gal3-r genes conserved in the zebrafish genome. Finally, chromosomal localization and genome reconstruction studies [67] of the ST gene loci in the various vertebrate genomes indicated several major chromosomal rearrangement and translocations of ST genes like ST3GAL5 in the or st6gal1 in the zebrafish genome, which likely have undergone chromosomal translocation from Hsa2 to Hsa4 and from Dre15 to Dre21, respectively [38,53] (Figure4). Int. J. Mol. Sci. 2016, 17, 1286 9 of 20 Int. J. Mol. Sci. 2016, 17, 1286 9 of 20

Figure 4. Hypothetical scenario of thethe evolutionaryevolutionary historyhistory ofofβ β-galactoside-galactosideα α2,3/6-sialyltransferase2,3/6-sialyltransferase genes in in the the WGR WGR context. context. This This drawing drawing illustrates illustrates the theproposed proposed evolutionary evolutionary scenario scenario of st3gal of st3gal and st6galand st6gal genesgenes drawn drawn in line in with line the with 2R the hypothesis 2R hypothesis [61]. The [61 ].arrows The arrows indicate indicate the two the vertebrate two vertebrate whole genomewhole genome duplication duplication events (WGD-R1: events (WGD-R1: ~555 MYA ~555 an MYAd WGD-R2: and WGD-R2: ~500 MYA) ~500 and MYA) the andteleosts the specific teleosts wholespecific genome whole genome duplication duplication events events (WGD-R3: (WGD-R3: ~350 ~350MYA). MYA). A single A single st3gal1/2/8st3gal1/2/8 (GR1),(GR1), st3gal3/5/7st3gal3/5/7 (GR3), st3gal4/6/9 (GR2)(GR2) and and st6gal1/2st6gal1/2 gene in the stem bilaterian was duplicated twice before and after the emergence of agnathans, agnathans, raising raising 11 11 st3galst3gal andand st6galst6gal subfamiliessubfamilies at at the the base base of of gnathostomes. gnathostomes. 3 st3gal gene subfamilies ( st3gal7st3gal7,, st3gal8st3gal8 andand st3gal9st3gal9)) were were further further lost lost in in the the mammalian mammalian lineage. lineage. In Actinopterygii, Actinopterygii, after after WGD-R3 WGD-R3 the the two two duplicated duplicated genes genes st3gal3st3gal3-r -andr and st6gal2st6gal2-r are-r are maintained maintained in thein the zebrafish zebrafish genome, genome, whereas whereas the the st3gal6st3gal6 andand st3gal9st3gal9 genesgenes are are secondarily secondarily lost. lost. S. S.purpuratus purpuratus = Strongylocentrotus= Strongylocentrotus purpuratus purpuratus; C.; C. intestinalis intestinalis = =CionaCiona intestinalis intestinalis; ;B.B. floridae floridae == BranchiostomaBranchiostoma floridae floridae;; P. marinus == Petromyzon marinus;; C. milii = Calorhinchus milii; D. rerio = Danio rerio .

To account for ST gene novelties found in vertebrates, a refined nomenclature was proposed To account for ST gene novelties found in vertebrates, a refined nomenclature was proposed in Petit et al [39] based on the gene symbols and names assigned by the HUGO Gene in Petit et al [39] based on the gene symbols and names assigned by the HUGO Gene Nomenclature Committee (HGNC; http://www.genenames.org/cgi-bin/genefamilies/set/438) and Nomenclature Committee (HGNC; http://www.genenames.org/cgi-bin/genefamilies/set/438) and the ST nomenclature initially established by Tsuji et al. [45]. As described above, the vertebrate the ST nomenclature initially established by Tsuji et al. [45]. As described above, the vertebrate genomes genomes contain numerous ST-related genes that result from various duplication events. The newly contain numerous ST-related genes that result from various duplication events. The newly identified identified ST subfamilies were named according to their phylogenetic relationship with previously ST subfamilies were named according to their phylogenetic relationship with previously described ST described ST subfamilies as follows: (1) A genome-wide duplication event known as WGD-R3 took subfamilies as follows: (1) A genome-wide duplication event known as WGD-R3 took place in the ray place in the ray fin fish lineage leading to two copies of a gene that is otherwise found as a single fin fish lineage leading to two copies of a gene that is otherwise found as a single copy in tetrapods. copy in tetrapods. The symbols used for these specific fish duplicated genes are identical to those The symbols used for these specific fish duplicated genes are identical to those used for the mouse used for the mouse ST orthologs followed by “-r” meaning “-related” (Table 1); (2) Genes resulting ST orthologs followed by “-r” meaning “-related” (Table1); (2) Genes resulting from lineage-specific from lineage-specific small scale duplications are named according to the mouse ST orthologs small scale duplications are named according to the mouse ST orthologs symbol followed by A, B, C, D symbol followed by A, B, C, D (Table 1); (3) Finally, duplicates that resulted from whole genome (Table1); (3) Finally, duplicates that resulted from whole genome duplication events WGD-R1 and R2 duplication events WGD-R1 and R2 before the emergence of the teleosts branch are given a new ST before the emergence of the teleosts branch are given a new ST subfamily number and no additional subfamily number and no additional suffix is attributed. For instance, see in Table 1 the newly suffix is attributed. For instance, see in Table1 the newly described vertebrate st3gal7, st3gal8 and described vertebrate st3gal7, st3gal8 and st3gal9 gene subfamilies. The invertebrate ST genes are st3gal9 gene subfamilies. The invertebrate ST genes are orthologous to the common ancestor of the orthologous to the common ancestor of the vertebrate subfamilies and are named accordingly. For Int. J. Mol. Sci. 2016, 17, 1286 10 of 20 vertebrate subfamilies and are named accordingly. For instance, the D. melanogaster st6gal1/2 gene (also known as DSiaT) described in [48] and the C. intestinalis st3gal1/2 gene [5].

5. Conservation versus Changes in the β-Galactoside α2,3/6-Sialyltransferase Sequences Even though a phylogenetic tree might not be adequate to reflect relatedness between all sequences and may not provide sufficient resolution, the branch lengths are indicative of the sequence changes. To deduce the evolutionary rates, the branch lengths have to be divided by the elapsed corresponding time, calculated from the calibrations available in Hedges et al. [68]. As illustrated in Figure5, ST3Gal I, ST3Gal II, ST3Gal III and ST6Gal II have the most conserved sequences across the vertebrate lineages, whereas ST6Gal I and to a lesser extent ST3Gal VI and ST3Gal IV show a particularly high evolutionary rate in their catalytic domain during Amniotes differentiation [38,53]. Int. J. Mol. Sci. 2016, 17, 1286 11 of 20 Int. J. Mol. Sci. 2016, 17, x 11 of 20

FigureFigure 5. Evolutionary 5. Evolutionary rates ratesof β-galactoside of β-galactoside α2,3/6-sialyltransferaseα2,3/6-sialyltransferase subfamilies subfamilies in Vertebrates. in Vertebrates. (A) Trees (A obtained) Trees obtained from Minimum from Minimum Evolution Evolution and JTT model and JTT implemented model in MEGAimplemented 6.0 [69] using in MEGA 91 ST3Gal 6.0 [69 and] using 27 ST6Gal 91 ST3Gal sequen andces 27 allowed ST6Gal sequencescalculation allowedof the evolutionary calculation rates of the of evolutionary β-galactoside rates α2,3/6-sialyltransferase of β-galactoside α2,3/6-sialyltransferase subfamilies for the major divisionssubfamilies in Vertebrates for the major (Data divisions S2). For inAmniotes, Vertebrates 2 or (Data 3 sequence S2). Fors Amniotes,were taken, 2 orincluding 3 sequences at least were Man taken, and including a Marsupial at least in Mammal Man ands, a Chicken Marsupial and in Ostrich Mammals, in Avians, Chicken and Carolineand Ostrich Anole inand Avians, Burmese and CarolinePython in Anole Lepidosaurians. and Burmese PythonOrange inbac Lepidosaurians.kground denotes Orange the highest background values; denotes (B) Mean the highest evolutionary values; ( Bra)tes Mean in evolutionarythe different ratesβ-galactoside in the α2,3/6-sialyltransferasedifferent β-galactoside subfamiliesα2,3/6-sialyltransferase (±standard error). subfamilies The mean (˘ standardevolutionary error). rates The of mean ST3Gal evolutionary and ST6Gal rates subfamilies of ST3Gal andwere ST6Gal calculated subfamilies from Teleosteans were calculated to Amniotes. from TheTeleosteans standard errors to Amniotes. show variatio Thens standard in the different errors show subfamilies, variations from inthe the differenthighest in subfamilies, ST3Gal VIII from and theST6Gal highest I, to inthe ST3Gal lowest VIII in ST3Gal and ST6Gal II, ST3Gal I, to theI, ST3Gal lowest III in and ST3Gal ST6Gal II; (CII,) Highest ST3Gal I,evolution ST3Gal III rates and of ST6Gal β-galactoside II; (C) Highest α2,3/6-sialyltransferase evolution rates ofinβ vertebrate-galactoside evolutionaryα2,3/6-sialyltransferase tree. They correspond in vertebrate to the evolutionary cases where tree.an elevated They correspond value is observed to the in one casesor two where lineages. an elevated During valuethe differentiation is observed in of one Amniotes, or two lineages. we record During three thesubfamilies differentiation particularly of Amniotes, evolving we their record catalytic three subfamiliessequences, ST6Gal particularly I and evolvingat a lesser their extent ST3Galcatalytic III and sequences, ST3Gal IV. ST6Gal In the I lineage and at aof lesser Tetrapods, extent ST3Gal IV III and ST3Gal V IV. presentIn the lineagehigh evolutionary of Tetrapods, rates. ST3Gal In the IV ancestors and ST3Gal of birds V present and Lepidosaurians high evolutionary (snakes rates. and lizards,In the i.e.,ancestors Sauropsides), of birds there and is Lepidosaurians only one subfamily (snakes where and lizards, numerous i.e., changes Sauropsides), occur there in the is catalytic only one domain, subfamily the where ST3Gal numerous VIII, as changesmentioned occur later in on. the catalytic domain, the ST3Gal VIII, as mentioned later on.

Int. J. Mol. Sci. 2016, 17, 1286 12 of 20

Int. J. Mol. Sci. 2016, 17, x 12 of 20 It is useful to substantiate the proximity/divergence of sequences between the different It is useful to substantiate the proximity/divergence of sequences between the different subfamilies using other approaches like similarity network. Orthology inference and evolutionary subfamilies using other approaches like similarity network. Orthology inference and evolutionary relationships were analyzed using protein sequences and the approach of similarity network relationships were analyzed using protein sequences and the approach of similarity network visualization in which the nodes represent proteins and the edges indicate similarity in amino acid visualization in which the nodes represent proteins and the edges indicate similarity in amino acid sequence [70]. The generated network can be visualized in Cytoscape [71]. The similarity network of sequence [70]. The generated network can be visualized in Cytoscape [71]. The similarity network of a larger set of β-galactoside α2,3/6-sialyltransferase protein sequences demonstrated a high degree of a larger set of β-galactoside α2,3/6-sialyltransferase protein sequences demonstrated a high degree similarity between ST3Gal sequences belonging to the GR1 group (ST3Gal I/II/VIII) with the notable of similarity between ST3Gal sequences belonging to the GR1 group (ST3Gal I/II/VIII) with the exception of the fish ST3Gal sequences and a lower degree of similarity for the sequences belonging notable exception of the fish ST3Gal sequences and a lower degree of similarity for the sequences to the GR2 and GR3 groups [53]. Similar analysis conducted for ST6Gal sequences illustrated in belonging to the GR2 and GR3 groups [53]. Similar analysis conducted for ST6Gal sequences Figure6 highlighted a higher degree of similarity between the invertebrate ST6Gal I/II and vertebrate illustrated in Figure 6 highlighted a higher degree of similarity between the invertebrate ST6Gal I/II ST6Gal II sequences and pointed to a stronger conservation of ST6Gal II sequence at a stringent and vertebrate ST6Gal II sequences and pointed to a stronger conservation of ST6Gal II sequence at a threshold (E-value). stringent threshold (E-value).

Figure 6. Sequence similarity network of ST6Gal sequences. The Figure represents ST sequences as Figure 6. Sequence similarity network of ST6Gal sequences. The Figure represents ST sequences nodes (circles) and all pairwise sequence relationships (alignments) better than a BLAST E-value as nodes (circles) and all pairwise sequence relationships (alignments) better than a BLAST E-value threshold of 1E-83 as edges (lines). The network is composed by 126 ST6Gal sequences and 9 ST3Gal threshold of 1E-83 as edges (lines). The network is composed by 126 ST6Gal sequences and 9 ST3Gal sequences as control group (red circles). (A) The network is visualized using a Force Direct layout, sequences as control group (red circles). (A) The network is visualized using a Force Direct layout, where the length of the edges is inversely proportional to the sequence similarity. Sequences where the length of the edges is inversely proportional to the sequence similarity. Sequences belonging belonging to Invertebrates form a separate group from all ST6Gal sequences, pointing out the to Invertebrates form a separate group from all ST6Gal sequences, pointing out the dissimilarity with dissimilarity with ST6Gal I and ST6Gal II sequences. To better visualize the relationships of ST6Gal I and ST6Gal II sequences. To better visualize the relationships of invertebrates sequences, invertebrates sequences, we show only the edges that involve the Invertebrate sequences; In panel we show only the edges that involve the Invertebrate sequences; In panel (B) sequences are clustered (B) sequences are clustered by groups without using edges information (the edges are not by groups without using edges information (the edges are not proportional to sequence similarity). proportional to sequence similarity). The network shows that seven invertebrate sequences are The network shows that seven invertebrate sequences are related with sequences of ST6Gal2 and related with sequences of ST6Gal2 and ST6Gal1 of the Bird and Fish groups. The names of the ST6Gal1 of the Bird and Fish groups. The names of the invertebrate sequences related to other groups invertebrate sequences related to other groups and the number of edges are shown in the table. It is and the number of edges are shown in the table. It is important to note that the Invertebrate sequences important to note that the Invertebrate sequences do not show relationships with the mammalian do not show relationships with the mammalian ST6Gal1 sequences at this threshold. ST6Gal1 sequences at this threshold.

6.5. FateFate ofof VertebrateVertebrate DuplicatedDuplicated ST Genes AfterAfter a a gene gene duplication duplication event,event, thethe twotwo paralogousparalogous genesgenes areare identical.identical. Non-functionalization andand loss loss of of one one ofof thethe duplicatesduplicates byby accumulationaccumulation ofof deleteriousdeleterious mutationsmutations is is the the most most frequent frequent outcomeoutcome [66 [66,72],72] while while the the parental parental gene gene is maintainedis maintained active active (Figure (Figure7). As 7). mentioned As mentioned previously, previously, 5 out of5 theout 16ofβ the-galactoside 16 β-galactosideα2,3/6-sialyltransferase α2,3/6-sialyltransferase genes genes subfamilies subfamilies generated generated after theafter 2 WGDthe 2 eventsWGD thatevents took that place took at the place root ofat thethe vertebrate root of the lineage verteb wererate immediately lineage were lost immediately in the early vertebratelost in the genome. early Similarly,vertebrate only genome. 2 duplicated Similarly, ST genes, only namely2 duplicatedst3gal3 -STr and genes,st6gal2 namely-r were st3gal3 maintained-r and in st6gal2 the ray-finned-r were fishmaintained genome afterin the the ray-finned teleost-specific fish roundgenome of WGDafter the that teleost-specific occurred about roun 350 MYA.d of WGD A pseudogenization that occurred processabout 350 can occurMYA. atA largerpseudogenizati evolutionaryon process scales by can the occur accumulation at larger ofevolutionary loss-of-function scales mutations by the inaccumulation previously established of loss-of-function genes and mutations might alsoin prev influenceiously established the fate of genes the surviving and might paralogs also influence [73,74]. the fate of the surviving paralogs [73,74]. Interestingly, the inactivation of 4 st3gal subfamilies

Int. J. Mol. Sci. 2016, 17, 1286 13 of 20

Interestingly,Int. J. Mol. Sci. 2016 the, inactivation17, 1286 of 4 st3gal subfamilies occurred independently, in various vertebrate13 of 20 genomes like st3gal6 in teleosts or st3gal7 in tetrapods and st3gal8 in mammals, while st3gal9 was occurred independently, in various vertebrate genomes like st3gal6 in teleosts or st3gal7 in tetrapods maintained mainly in birds. Substitution rate analysis in each st3gal gene subfamily indicated and st3gal8 in mammals, while st3gal9 was maintained mainly in birds. Substitution rate analysis in st3gal7 st3gal8 st3gal9 a weakereach st3gal selective gene pressure subfamily on theindicated ,a weakerand selective genespressure and on acquisition the st3gal7 of, mutationsst3gal8 and that compromisedst3gal9 genes their and function acquisition in mammals of mutations [53]. Indeed,that compromised several st3gal theirpseudogenes function in could mammals be identified [53]. in theIndeed, human several genome st3gal that pseudogenes likely result could from pseudogenizationbe identified in the of ahuman once active genome gene that like likelyST3GAL8P resulton humanfrom chromosomepseudogenization 20 (ENSG00000242507). of a once active It gene is suggested like ST3GAL8P that inactivation on human of the chromosomest3gal8 gene in20 the mammalian(ENSG00000242507). ancestor became It is suggested possible that after inactivation alternative of or the more st3gal8 beneficial gene in the mammalian ancestor activity evolvedbecame in possible the stem after lineage alternative of mammals, or more beneficia which couldl glycosyltransferase have resultedin activity major evolved adaptive in changesthe stem in SAlineage metabolism. of mammals, which could have resulted in major adaptive changes in SA metabolism.

Figure 7. Evolutionary fate of ST gene duplicates after WGD events. On the left side, schematic Figure 7. Evolutionary fate of ST gene duplicates after WGD events. On the left side, schematic representation of ancestral polyexonic ST genes duplicates (exons are represented by colored boxes representation of ancestral polyexonic ST genes duplicates (exons are represented by colored boxes and genomic regulatory elements are represented by black and white circles). On the right side, and genomic regulatory elements are represented by black and white circles). On the right side, the three major evolutionary fates of the various newly created ST gene subfamilies are indicated the three major evolutionary fates of the various newly created ST gene subfamilies are indicated (1) pseudogenization: gene loss; (2) subfunctionalization: coding sequences and regulatory elements (1)evolve pseudogenization: and are partitioned gene loss;according (2) subfunctionalization: to specific molecular codingfunctions; sequences (3) neofunctionalization: and regulatory elementsone of evolvethe newly and are duplicated partitioned gene according accumulates to specific mutations molecular in its functions; coding region (3) neofunctionalization: and/or in its regulatory one of thegenomic newly duplicated elements giving gene accumulatesrise to new molecular mutations function. in its coding region and/or in its regulatory genomic elements giving rise to new molecular function. As illustrated in Figure 7, the function of the duplicated genes may diverge either because one or Asboth illustrated evolved innew Figure function7, the (neofunctionalization) function of the duplicated [75] or genes because may both diverge duplicates either becausepartition one the or bothancestral evolved gene new function (neofunctionalization)(subfunctionalization) [and75] orseveral because models both duplicateshave been partitionproposed the [76]. ancestral To geneunderstand function the (subfunctionalization) evolutionary forces that and have several infl modelsuenced the have ST beengene proposednumber and [76 ].their To functional understand thefate, evolutionary the expression forces profile that of have the various influenced β-galactoside the ST gene α2,3/6-sialyltransferase number and their genes functional was studied fate, the expressionacross vertebrates. profile of As the a variousfirst step,β -galactosidescreening of vaα2,3/6-sialyltransferaserious tissue EST libraries genes and statistical was studied analysis across vertebrates.using principalAs a component first step, screeninganalysis (PCA) of various of the expression tissue EST profile libraries accessible and statistical from the Unigene analysis data using base were undertaken [39]. The data pointed to a wider expression of st3gal and st6gal1 genes principal component analysis (PCA) of the expression profile accessible from the Unigene data base in mammals and birds, whereas teleost and amphibian st6gal1 genes showed a restricted profile were undertaken [39]. The data pointed to a wider expression of st3gal and st6gal1 genes in mammals of expression comparable to the one of st6gal2 genes suggesting a change in the expression and birds, whereas teleost and amphibian st6gal1 genes showed a restricted profile of expression profile of st6gal1 genes in amniotes. The expression pattern of the various β-galactoside comparable to the one of st6gal2 genes suggesting a change in the expression profile of st6gal1 genes in α2,3/6-sialyltransferase genes analyzed by means of RT-PCR in adult vertebrate tissues or using β α amniotes.whole mount The expression in situ hybridization pattern of the (ISH) various in th-galactosidee developing 2,3/6-sialyltransferasezebrafish embryo and genescomparative analyzed bygenomics means of approaches RT-PCR in adultconfirmed vertebrate a relative tissues conservation or using whole of the mount st6gal2 in gene situ expression hybridization in vertebrate (ISH) in the developingtissues, in zebrafish particular embryo in the central and comparative nervous system, genomics and approachesthe expansion confirmed of st6gal1 a relativegene expression conservation in of themammalianst6gal2 gene tissues expression [38]. In addition, in vertebrate rapid tissues, amplification in particular of cDNA in theends central (5’-RACE) nervous conducted system, in and fish the expansionand frog oftissuesst6gal1 demonstratedgene expression the occurrence in mammalian of a unique tissues st6gal1 [38]. transcript In addition, [38], rapid whereas amplification numerous of cDNAstudies ends highlighted (5’-RACE) the conducted 5’-untranslated in fish region and frogheterogeneity tissues demonstrated of the mammalian the occurrence st6gal1 genes of leading a unique st6gal1to severaltranscript mRNA [38 isoforms], whereas [55,77–79]. numerous These studies data highlighted confirmed the the increasing 5’-untranslated complexity region in heterogeneity the st6gal1 of thegene mammalian expression st6gal1profile genesin higher leading vertebrates to several an mRNAd suggested isoforms that [55phenotypic,77–79]. These differences data confirmed in the thesiaLome increasing between complexity organisms in the couldst6gal1 havegene arisen expression from changes profile in gene in higher regulation vertebrates and from and alterations suggested thatin phenotypicthe protein differencescoding region in theof st6gal1 siaLome gene between (e.g., a organisms neofunctionalization could have of arisen the st6gal1 from changesgene in birds in gene and mammals) [38]. As far as the st3gal genes are concerned, their functional fate could not be Int. J. Mol. Sci. 2016, 17, 1286 14 of 20

Int. J. Mol. Sci. 2016, 17, 1286 14 of 20 regulation and from alterations in the protein coding region of st6gal1 gene (e.g., a neofunctionalization predictedof the st6gal1 on thegene basis in birds of gene and expr mammals)ession [profile38]. As alone. far as theHowever,st3gal genes st3gal are gene concerned, losses were their tentatively functional linkedfate could with not relaxed be predicted gene evolution on the basis and of reduced gene expression gene expression. profile alone. These However, studies indicatedst3gal gene that losses the mostwere tentativelywidely expressed linked withst3gal relaxed genes genelike evolutionst3gal2 and and st3gal3 reduced were gene also expression. the most Theseevolutionary studies conserved,indicated that whereas the most st3gal widely genes losses expressed werest3gal linkedgenes to high like substitutionst3gal2 and ratesst3gal3 andwere to restricted also the tissue most expressionevolutionary [53]. conserved, whereas st3gal genes losses were linked to high substitution rates and to restricted tissue expression [53]. 6. Functional Divergence and Molecular Evolution of STs 7. FunctionalTo better understand Divergence the and molecular Molecular basis Evolution of STs fu ofnctional STs divergence after WGD, evolution of the functionTo better of understand the various the β-galactoside molecular basis α2,3/6-sialyltransferases of STs functional divergence was analyzed after WGD, from evolutiona structural of perspective.the function ofDespite the various sharingβ-galactoside primary andα2,3/6-sialyltransferases secondary structural wassimilarities, analyzed ST from have a structuraldifferent acceptorperspective. substrate Despite specificities sharing primary that can and be secondaryascribed to structural amino acid similarities, sites. A general ST have method different to acceptorpredict functionallysubstrate specificities important that sites can or be structural ascribed torole amino of amino acid sites.acid positions A general in method a protein to predict is to analyze functionally their conservationimportant sites level or structuralbased on the role assumption of amino acid that positions highly conserved in a protein positions is to analyze among their member conservation of the samelevel basedfamily. on the assumption that highly conserved positions among member of the same family. ToTo analyze the conservation level level in in the the ST6Gal ST6Gal family family sequences sequences of of ST6Gal ST6Gal I I and and ST6Gal ST6Gal II, II, proteinprotein sequencessequences were were aligned aligned separately separately to build to build a profile a profile and a MSA and comprisinga MSA comprising the two subfamilies the two subfamilieswas obtained was using obtained profile-profile using profile-profile mode with mode ClustalW. with TheClustalW. sequence The conservation sequence conservation was calculated was calculatedusing ConSurf using server ConSurf [80] andserver mapped [80] and into mapped the crystal into structure the crystal of humanstructure ST6Gal of human I in complex ST6Gal withI in complexcytidine with and phosphatecytidine and [81 phosphate]. As shown [81]. in As Figure shown8, the in Figure highest 8, conserved the highest residues conserved of ST6Galresidues are of ST6Galmostly locatedare mostly in the located active in site. the active site.

FigureFigure 8. High sequence conservation conservation near near the the binding binding site site in in ST6Gal. ST6Gal. ST6Gal ST6Gal I I and and ST6Gal ST6Gal II II show show aa high high level of conservation in the region surro surroundingunding the ligand binding site. MSA MSA were obtained usingusing using using profile-profile profile-profile mode with with clustalX clustalX an andd 101 vertebrate ST6Gal ST6Gal I I and and ST6Gal II sequences (Data(Data S3). Sequence conservationconservation waswas calculated calculated with with ConSurf ConSurf server, server, the the result result was was mapped mapped into into the thePDB PDB structure structure 4JS1, 4JS1, a crystal a crystal structure structure of humanof humanβ-galactoside β-galactosideα2,6-sialyltransferase α2,6-sialyltransferase I (ST6Gal I (ST6Gal I) inI) in a complex with cytidine (CTN) and phosphate (PO4). The structure is depicted in cartoon, colored a complex with cytidine (CTN) and phosphate (PO4). The structure is depicted in cartoon, colored by bythe the conservation conservation score. score. Ligand Ligand molecules molecules are shown are shown in ball in and ball stick and representation, stick representation, residues residues at contact at contactdistance distance (<5 Å) from(<5 Å) ligands from ligands are shown are inshown sphere. in sphere.

On the other hand, it is useful to decipher the specificity-determining positions (SDPs) of a family protein, i.e., the critical amino acids determining their functional specificity. These positions often play critical roles as they are involved in the molecular mechanisms ensuring functional diversity. Int. J. Mol. Sci. 2016, 17, 1286 15 of 20

On the other hand, it is useful to decipher the specificity-determining positions (SDPs) of a family protein, i.e., the critical amino acids determining their functional specificity. These positions often play criticalInt. rolesJ. Mol. Sci. as they2016, 17 are, 1286 involved in the molecular mechanisms ensuring functional diversity. 15 of 20

FigureFigure 9. Sequence 9. Sequence and and structural structural mapping mapping of of SDPs SDPs between between ST6GalI ST6GalI and and ST6GalII. ST6GalII. ((AA)) TheThe human proteinprotein sequence sequence of ST6Gal of ST6Gal I corresponding I corresponding to PDB to PDB 4js2 4js2 is shown is shown as aas reference a reference sequence, sequence, at at the the top top of the panel.of the Thepanel. functionally The functionally important important regions regions are indicated are indicated with colored with colored boxes andboxes 14 and SDPs 14 ofSDPs type of II are highlightedtype II are highlighted in red. SDP in prediction red. SDP prediction between ST6Galbetween I ST6Gal and ST6Gal I and IIST6Gal was carried II was outcarried using out SPEER using serverSPEER [82] usingserver the [82] MSA using described the MSA previously described that prev wasiously composed that was by composed 48 sequences by 48 belonging sequences to ST6Galbelonging I and 53 to of ST6Gal ST6Gal I IIand (Data 53 of S3). ST6Gal A total II (Data of 78 andS3). 69A total SDPs of of 78 Type andI 69 and SDPs Type ofII Type respectively I and Type were II respectively were predicted (Data S4). At the bottom of the panel, a table is shown with the 14 Type II predicted (Data S4). At the bottom of the panel, a table is shown with the 14 Type II SDPs with a reliable SDPs with a reliable score. The first column indicates the position in the multiple sequence alignment score. The first column indicates the position in the multiple sequence alignment (MSA), the second (MSA), the second column the position in the reference sequence, the 3rd and 4th columns indicate column the position in the reference sequence, the 3rd and 4th columns indicate the conserved amino the conserved amino acid in the ST6Gal I and ST6Gal II sequences, respectively. The SDPs are sorted acid in the ST6Gal I and ST6Gal II sequences, respectively. The SDPs are sorted according to the according to the predictive score, from higher to lower; from top to bottom and from left to right; predictive score, from higher to lower; from top to bottom and from left to right; (B) The functionally (B) The functionally important regions and SDPs are shown in the structure of the reference human important regions and SDPs are shown in the structure of the reference human ST6Gal I sequence. ST6Gal I sequence. In addition, the surfaces of the substrates are shown; (C) Close-up of the glycan In addition, the surfaces of the substrates are shown; (C) Close-up of the glycan binding site of ST6Gal binding site of ST6Gal I, hydrogen bonds are denoted by dashed lines; (D) Close-up of the glycan I, hydrogen bonds are denoted by dashed lines; (D) Close-up of the glycan binding site, where the Gal binding site, where the Gal was modeled to GalNAc to represent ST6Gal II binding site. was modeled to GalNAc to represent ST6Gal II binding site. Within a MSA, SDPs are amino acid positions that show a pattern of conservation in agreement withWithin subfamily a MSA, divergence. SDPs are amino Two type acids of positions SDPs can that be showdistinguished: a pattern a ofType conservation I corresponds in agreementto diverse withamino subfamily acids divergence.in one group Two and typesa conserved of SDPs one can in bethe distinguished: other(s) reflecting a Type different I corresponds levels of functional to diverse aminoconstraints acids in onebetween group duplicated and a conserved genes, onewhereas in the Ty other(s)pe II positions reflecting are different characterized levels ofby functional different constraintsconserved between amino duplicatedacids among genes, groups whereas associated Type to divergent II positions constraints are characterized [82,83]. SDP by prediction different conservedbetween amino the three acids vertebrate among groupsST3Gal groups associated (GR1–GR3 to divergent) led to the constraints identification [82,83 of]. five SDP SDP, prediction namely betweenS197, theY233, three V234, vertebrate W304 and ST3Gal N307 groupsin the reference (GR1–GR3) porcine led toST3Gal the identification I structure (PDB: of five 2WNB) SDP, that namely are located in the active site [53]. These SDPs in close contact to the ST3Gal substrates are indicative of S197, Y233, V234, W304 and N307 in the reference porcine ST3Gal I structure (PDB: 2WNB) that are the functional divergence of each group of ST3Gal sequences in early vertebrates. As illustrated in located in the active site [53]. These SDPs in close contact to the ST3Gal substrates are indicative of the Figure 9, SDP prediction between ST6Gal I and ST6Gal II was carried out using SPEER server [82,84]. functional divergence of each group of ST3Gal sequences in early vertebrates. As illustrated in Figure9, Six SDPs corresponding to positions 95, 122, 169, 357, 359 and 380 in the reference sequence human SDP prediction between ST6Gal I and ST6Gal II was carried out using SPEER server [82,84]. Six SDPs ST6Gal I (PDB: 4js2) [81] localized at the protein surface and could reflect protein-protein interaction evolution. Two type II SDPs corresponding to L326 and F346 were found in the sialylmotif S, which is involved in acceptor and donor binding [85] and three others V352, Q357 and F359 were found in the mobile loop nearby sialylmotif III (Figure 9A,B). Interestingly, the highest scored SDP prediction Int. J. Mol. Sci. 2016, 17, 1286 16 of 20 corresponding to positions 95, 122, 169, 357, 359 and 380 in the reference sequence human ST6Gal I (PDB: 4js2) [81] localized at the protein surface and could reflect protein-protein interaction evolution. Two type II SDPs corresponding to L326 and F346 were found in the sialylmotif S, which is involved in acceptor and donor binding [85] and three others V352, Q357 and F359 were found in the mobile loop nearby sialylmotif III (Figure9A,B). Interestingly, the highest scored SDP prediction corresponds to position Y122 in the reference structure that is located near the N-glycan binding site. A Tyr residue is conserved in ST6Gal I, whereas a His is conserved in ST6Gal II sequences. The hydrogen bonds involving the residues Y122, D274 and Y369 and the α1,3Man branch of the N-glycan, place the galactose (Gal) in the vicinity of CMP and the catalytic residue H370 [81] (Figure9C). To visualize the impact of the amino acid change at this position, we performed in silico mutation Y122H and the Gal residue was modified to N-acetylgalactosamine (GalNAc), the monosaccharide acceptor for the ST6Gal II activity [86]. Point mutation was performed using the Dunbrack backbone-dependent rotamer library [87], the most probable rotamer was chosen and changes in the structure were followed by structure minimization (Figure9D). The ST6Gal II H122 residue can participate in hydrogen bond with the Y369, whereas the D274 can interact with K121 and GalNAc. The model also shows that the N atom of GalNAc can be involved in a hydrogen bond with Y275. In summary they are no dramatic changes at the substrate binding site between ST6Gal I and ST6Gal II, however a semi-conservative mutation, such as Y122H, can impact in the substrate stabilization.

8. Conclusion This review has reported an overview of the evolutionary history of the β-galactoside α2,3/6-sialyltransferases. The human/mouse ST3Gal and ST6Gal families are comprised of eight members with low overall sequence similarities except for the conserved sialylmotif and family motifs in the catalytic domain that are hallmarks for homologs identification in databases. Genetics, molecular phylogeny and functional genomics approaches have been used to decipher their evolutionary relationships in the context of the dynamic remodeling of genome content (gene loss/gain, segmental and whole genome duplication events). Interestingly, the st3gal and st6gal genes could be identified in the sponge O. carmela suggesting their ancient occurrence in the metazoans and their expansion in deuterostome lineages. The 8 human st3gal and st6gal orthologs were also identified in all vertebrate genomes with the notable exception of the st3gal6 gene, which has been lost in bony fish. In addition, several novel and less conserved st3gal subfamilies have been described in non-mammalian vertebrates, some of which are restricted to birds and duck-billed platypus (e.g., st3gal9) or to fish (e.g., st3gal7) that could be associated with specialized or species-specific tasks. Finally, protein sequence and structural analyses shed light into the functional evolution of ST3Gal and ST6Gal, their enzymatic specificities and their role in cell-cell interactions and diseases.

Supplementary Materials: Supplementary materials can be found at www.mdpi.com/1422-0067/17/8/1286/s1. Acknowledgments: This work has been funded by the Centre National de La Recherche Scientifique (CNRS), by the University of Lille Nord de France (PPF bioinformatique of the Lille1 University), by the ANR-2010-BLAN-120401 grant from the Agence Nationale de la Recherche, by the Ligue régionale de la recherche contre le cancer (2016). Author Contributions: Roxana E. Teppa, Daniel Petit and Anne Harduin-Lepers conceived the study, performed the experiments and analyzed the data; Olga Plechakova, Virginie Cogez and Anne Harduin-Lepers handled the sequences and sequence files; Roxana E. Teppa, Daniel Petit and Anne Harduin-Lepers wrote the manuscript. All authors read and approved the final manuscript. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Guerardel, Y.; Chang, L.Y.; Fujita, A.; Coddeville, B.; Maes, E.; Sato, C.; Harduin-Lepers, A.; Kubokawa, K.; Kitajima, K. SiaLome analysis of the cephalochordate branchiostoma belcheri, a key organism for vertebrate evolution. Glycobiology 2012, 22, 479–491. [CrossRef][PubMed] Int. J. Mol. Sci. 2016, 17, 1286 17 of 20

2. Schauer, R. Sialic acids: Fascinating sugars in higher animals and man. Zoology (Jena) 2004, 107, 49–64. [CrossRef][PubMed] 3. Schauer, R.; Kamerling, J. Chemistry, biochemistry and biology of sialic acids. In Glycoproteins II; Montreuil, J., Vliegenthart, J., Schachter, H., Eds.; Elsevier: Amsterdam, The Netherlands, 1997; Volume 29b, pp. 243–402. 4. Varki, A. Diversity in the sialic acids. Glycobiology 1992, 2, 25–40. [CrossRef][PubMed] 5. Lehmann, F.; Kelm, S.; Dietz, F.; von Itzstein, M.; Tiralongo, J. The evolution of galactose α2,3-sialyltransferase: Ciona intestinalis ST3GAL I/II and takifugu rubripes ST3GAL II sialylate Galbeta1, 3GalNAc structures on glycoproteins but not glycolipids. Glycoconj. J. 2008, 25, 323–334. [CrossRef][PubMed] 6. Corfield, A.P.; Berry, M. Glycan variation and evolution in the eukaryotes. Trends Biochem. Sci. 2015, 40, 351–359. [CrossRef][PubMed] 7. Gagneux, P.; Varki, A. Evolutionary considerations in relating oligosaccharide diversity to biological function. Glycobiology 1999, 9, 747–755. [CrossRef][PubMed] 8. Koles, K.; Repnikova, E.; Pavlova, G.; Korochkin, L.I.; Panin, V.M. Sialylation in protostomes: A perspective from drosophila genetics and biochemistry. Glycoconj. J. 2009, 26, 313–324. [CrossRef][PubMed] 9. Roth, J.; Kempf, A.; Reuter, G.; Schauer, R.; Gehring, W.J. Occurrence of sialic acids in drosophila melanogaster. Science 1992, 256, 673–675. [CrossRef][PubMed] 10. Malykh, Y.N.; Krisch, B.; Gerardy-Schahn, R.; Lapina, E.B.; Shaw, L.; Schauer, R. The presence of N-acetylneuraminic acid in malpighian tubules of larvae of the cicada philaenus spumarius. Glycoconj. J. 1999, 16, 731–739. [CrossRef][PubMed] 11. Saito, M.; Kitamura, H.; Sugiyama, K. Occurrence of gangliosides in the common squid and pacific octopus among protostomia. Biochim. Biophys. Acta 2001, 1511, 271–280. [CrossRef] 12. Kim, C. Sialic acid (N-acetylneuraminic acid) as the functional molecule for differentiation between animal and plant kingdom. J. Glycom. Lipidom. 2014, 4, e116. 13. Angata, T.; Varki, A. Chemical diversity in the sialic acids and related aketo acids: An evolutionary perspective. Chem. Rev. 2002, 102, 439–469. [CrossRef][PubMed] 14. Corfield, A.; Schauer, R. Sialic acids: Chemistry, metabolism and function. In Cell Biology Monograph; Springer: Berlin, Germany, 1982; Volume 10, pp. 5–50. 15. Cohen, M.; Varki, A. The siaLome—Far more than the sum of its parts. J. Integr. Biol. 2010, 14, 455–464. [CrossRef][PubMed] 16. Schauer, R. Sialic acids as regulators of molecular and cellular interactions. Curr. Opin. Struct. Biol. 2009, 19, 507–514. [CrossRef][PubMed] 17. Corfield, A.P. Mucins: A biologically relevant glycan barrier in mucosal protection. Biochim. Biophys. Acta 2015, 1850, 236–252. [CrossRef][PubMed] 18. Angata, T.; Brinkman-Van der Linden, E. I-type lectins. Biochim. Biophys. Acta 2002, 1572, 294–316. [CrossRef] 19. Kelm, S.; Schauer, R. Sialic acids in molecular and cellular interactions. Int. Rev. Cytol. 1997, 175, 137–240. [PubMed] 20. Schnaar, R.L.; Gerardy-Schahn, R.; Hildebrandt, H. Sialic acids in the brain: Gangliosides and polysialic acid in nervous system development, stability, disease, and regeneration. Physiol. Rev. 2014, 94, 461–518. [CrossRef][PubMed] 21. Gabius, H.J. Biological information transfer beyond the genetic code: The sugar code. Naturwissenschaften 2000, 87, 108–121. [CrossRef][PubMed] 22. Gabius, H.J.; Andre, S.; Kaltner, H.; Siebert, H.C. The sugar code: Functional lectinomics. Biochim. Biophys. Acta 2002, 1572, 165–177. [CrossRef] 23. Lehmann, F.; Tiralongo, E.; Tiralongo, J. Sialic acid-specific lectins: Occurrence, specificity and function. Cell. Mol. Life Sci. 2006, 63, 1331–1354. [CrossRef][PubMed] 24. Gagneux, P.; Cheriyan, M.; Hurtado-Ziola, N.; van der Linden, E.C.; Anderson, D.; McClure, H.; Varki, A.; Varki, N.M. Human-specific regulation of α2–6-linked sialic acids. J. Biol. Chem. 2003, 278, 48245–48250. [CrossRef][PubMed] 25. Hayakawa, T.; Varki, A. Human-specific changes in sialic acid biology. In Post-Genome Biology of Primates; Hirai, H., Imai, H., Go, Y., Eds.; Springer Tokyo: Tokyo, Japan, 2012; pp. 123–148. 26. Varki, N.M.; Strobert, E.; Dick, E.J., Jr.; Benirschke, K.; Varki, A. Biomedical differences between human and nonhuman hominids: Potential roles for uniquely human aspects of sialic acid biology. Annu. Rev. Pathol. 2011, 6, 365–393. [CrossRef][PubMed] Int. J. Mol. Sci. 2016, 17, 1286 18 of 20

27. Li, Y.; Chen, X. Sialic acid metabolism and sialyltransferases: Natural functions and applications. Appl. Microbiol. Biotechnol. 2012, 94, 887–905. [CrossRef][PubMed] 28. De Mendoza, A.; Ruiz-Trillo, I. The mysterious evolutionary origin for the GNE gene and the root of bilateria. Mol. Biol. Evol. 2011, 28, 2987–2991. [CrossRef][PubMed] 29. Giacopuzzi, E.; Bresciani, R.; Schauer, R.; Monti, E.; Borsani, G. New insights on the sialidase protein family revealed by a phylogenetic analysis in metazoa. PLoS ONE 2012, 7, e44193. [CrossRef][PubMed] 30. Harduin-Lepers, A.; Petit, D.; Mollicone, R.; Delannoy, P.; Petit, J.M.; Oriol, R. Evolutionary history of the α2,8-sialyltransferase (ST8Sia) gene family: Tandem duplications in early deuterostomes explain most of the diversity found in the vertebrate ST8Sia genes. BMC Evol. Biol. 2008, 8, 258. [CrossRef][PubMed] 31. Simakov, O.; Kawashima, T.; Marletaz, F.; Jenkins, J.; Koyanagi, R.; Mitros, T.; Hisata, K.; Bredeson, J.; Shoguchi, E.; Gyoja, F.; et al. Hemichordate genomes and deuterostome origins. Nature 2015, 527, 459–465. [CrossRef][PubMed] 32. Chou, H.H.; Hayakawa, T.; Diaz, S.; Krings, M.; Indriati, E.; Leakey, M.; Paabo, S.; Satta, Y.; Takahata, N.; Varki, A. Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc. Natl. Acad. Sci. USA 2002, 99, 11736–11741. [CrossRef][PubMed] 33. Varki, A. Loss of N-glycolylneuraminic acid in humans: Mechanisms, consequences, and implications for hominid evolution. Am. J. Phys. Anthropol. 2001, 116 (Suppl. 33), S54–S69. [CrossRef][PubMed] 34. Ng, P.S.; Bohm, R.; Hartley-Tassell, L.E.; Steen, J.A.; Wang, H.; LUKowski, S.W.; Hawthorne, P.L.; Trezise, A.E.; Coloe, P.J.; Grimmond, S.M.; et al. Ferrets exclusively synthesize Neu5Ac and express naturally humanized influenza a virus receptors. Nat. Commun. 2014, 5, 5750. [CrossRef][PubMed] 35. Schauer, R.; Srinivasan, G.V.; Coddeville, B.; Zanetta, J.P.; Guerardel, Y. Low incidence of N-glycolylneuraminic acid in birds and reptiles and its absence in the platypus. Carbohydr. Res. 2009, 344, 1494–1500. [CrossRef][PubMed] 36. Sellmeier, M.; Weinhold, B.; Münster-Kühnel, A. Cmp-sialic acid synthetase: The point of constriction in the sialylation pathway. In Sialoglyco Chemistry and Biology I; Springer: Berlin, Germany, 2013; Volume 366, pp. 139–167. 37. Viswanathan, K.; Tomiya, N.; Park, J.; Singh, S.; Lee, Y.C.; Palter, K.; Betenbaugh, M.J. Expression of a functional drosophila melanogaster CMP-sialic acid synthetase: Differential localization of the drosophila and human enzymes. J. Biol. Chem. 2006, 281, 15929–15940. [CrossRef][PubMed] 38. Petit, D.; Mir, A.M.; Petit, J.M.; Thisse, C.; Delannoy, P.; Oriol, R.; Thisse, B.; Harduin-Lepers, A. Molecular phylogeny and functional genomics of β-galactoside α2,6-sialyltransferases that explain ubiQuitous expression of st6gal1 gene in amniotes. J. Biol. Chem. 2010, 285, 38399–38414. [CrossRef][PubMed] 39. Petit, D.; Teppa, R.E.; Petit, J.M.; Harduin-Lepers, A. A practical approach to reconstruct evolutionary history of animal sialyltransferases and gain insights into the sequence-function relationships of golgi-. In Glycosyltransferases: Methods and Protocols; Brockausen, I., Ed.; Springer, Humana Press: New York, NY, USA, 2013; Volume 1022, pp. 73–97. 40. Harduin-Lepers, A. Vertebrate sialyltransferases. In Sialobiology: Structure, Biosynthesis and Function; Sialic Acid Glycoconjugates in Health and Diseases; Tiralongo, J., Martinez-Duncker, I., Eds.; Bentham Science: Schiphol, The Netherlands, 2013; Volume 5, pp. 139–187. 41. Harduin-Lepers, A.; Vallejo-Ruiz, V.; Krzewinski-Recchi, M.A.; Samyn-Petit, B.; Julien, S.; Delannoy, P. The human sialyltransferase family. Biochimie 2001, 83, 727–737. [CrossRef] 42. Paulson, J.C.; Colley, K.J. Glycosyltransferases. Structure, localization, and control of cell type-specific glycosylation. J. Biol. Chem. 1989, 264, 17615–17618. [PubMed] 43. Harduin-Lepers, A.; Mollicone, R.; Delannoy, P.; Oriol, R. The animal sialyltransferases and sialyltransferase-related genes: A phylogenetic approach. Glycobiology 2005, 15, 805–817. [CrossRef] [PubMed] 44. Cantarel, B.L.; Coutinho, P.M.; Rancurel, C.; Bernard, T.; Lombard, V.; Henrissat, B. The carbohydrate-active enzymes database (CAZy): An expert resource for glycogenomics. Nucleic Acids Res. 2009, 37, 233–238. [CrossRef][PubMed] 45. Tsuji, S.; Datta, A.K.; Paulson, J.C. Systematic nomenclature for sialyltransferases. Glycobiology 1996, 6, 5–7. 46. Kidani, S.; Kaneoka, H.; Okuzaki, Y.; Asai, S.; Kojima, Y.; Nishijima, K.I.; Iijima, S. Analyses of chicken sialyltransferases related to O-glycosylation. J. Biosci. Bioeng. 2016.[CrossRef][PubMed] Int. J. Mol. Sci. 2016, 17, 1286 19 of 20

47. Kojima, Y.; Mizutani, A.; Okuzaki, Y.; Nishijima, K.; Kaneoka, H.; Sasamoto, T.; Miyake, K.; Iijima, S. Analyses of chicken sialyltransferases related to N-glycosylation. J. Biosci. Bioeng. 2015, 119, 623–628. [CrossRef] [PubMed] 48. Koles, K.; Irvine, K.D.; Panin, V.M. Functional characterization of drosophila sialyltransferase. J. Biol. Chem. 2004, 279, 4346–4357. [CrossRef][PubMed] 49. Kajiura, H.; Hamaguchi, Y.; Mizushima, H.; Misaki, R.; Fujiyama, K. Sialylation potentials of the silkworm, Bombyx mori; B. Mori possesses an active α2,6-sialyltransferase. Glycobiology 2015, 25, 1441–1453. [CrossRef] [PubMed] 50. Harduin-Lepers, A.; Recchi, M.A.; Delannoy, P. 1994, the year of sialyltransferases. Glycobiology 1995, 5, 741–758. [CrossRef][PubMed] 51. Takashima, S. Characterization of mouse sialyltransferase genes: Their evolution and diversity. Biosci. Biotechnol. Biochem. 2008, 72, 1155–1167. [CrossRef][PubMed] 52. Audry, M.; Jeanneau, C.; Imberty, A.; Harduin-Lepers, A.; Delannoy, P.; Breton, C. Current trends in the structure-activity relationships of sialyltransferases. Glycobiology 2011, 21, 716–726. [CrossRef][PubMed] 53. Petit, D.; Teppa, E.; Mir, A.-M.; Vicogne, D.; Thisse, C.; Thisse, B.; Filloux, C.; Harduin-Lepers, A. Integrative view of α2,3-sialyltransferases (ST3Gal) molecular and functional evolution in deuterostomes: Significance of lineage-specific losses. Mol. Biol. Evol. 2015, 32, 906–927. [CrossRef][PubMed] 54. Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped blast and PSI-blast: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [CrossRef][PubMed] 55. Harduin-Lepers, A. Comprehensive analysis of sialyltransferases in vertebrate genomes. Glycobiol. Insights 2010, 2, 29–61. [CrossRef] 56. Moreau, H.; Verhelst, B.; Couloux, A.; Derelle, E.; Rombauts, S.; Grimsley, N.; van Bel, M.; Poulain, J.; Katinka, M.; Hohmann-Marriott, M.F.; et al. Gene functionalities and genome structure in bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 2012, 13, 74–90. [CrossRef][PubMed] 57. Read, B.A.; Kegel, J.; Klute, M.J.; Kuo, A.; Lefebvre, S.C.; Maumus, F.; Mayer, C.; Miller, J.; Monier, A.; Salamov, A.; et al. Pan genome of the phytoplankton emiliania underpins its global distribution. Nature 2013, 499, 209–213. [CrossRef][PubMed] 58. King, N.; Westbrook, M.J.; Young, S.L.; Kuo, A.; Abedin, M.; Chapman, J.; Fairclough, S.; Hellsten, U.; Isogai, Y.; Letunic, I.; et al. The genome of the choanoflagellate monosiga brevicollis and the origin of metazoans. Nature 2008, 451, 783–788. [CrossRef][PubMed] 59. Smith, J.J.; Kuraku, S.; Holt, C.; SaUKa-Spengler, T.; Jiang, N.; Campbell, M.S.; Yandell, M.D.; ManoUSAki, T.; Meyer, A.; Bloom, O.E.; et al. Sequencing of the sea lamprey (petromyzon marinus) genome provides insights into vertebrate evolution. Nat. Genet. 2013, 45, 415–421. [CrossRef][PubMed] 60. Braasch, I.; Postlethwait, J.H. The teleost agouti-related protein 2 gene is an ohnolog gone missing from the tetrapod genome. Proc. Natl. Acad. Sci. USA 2011, 108, 47–48. [CrossRef][PubMed] 61. Ohno, S. Evolution by Gene Duplication; Springer-Verlag: Heidelberg, Germany, 1970; pp. 1–151. 62. Kumar, S.; Stecher, G.; Tamura, K. Mega7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 2016, 33, 1870–1874. [CrossRef][PubMed] 63. Letunic, I.; Bork, P. Interactive tree of life (iTOL) v3: An online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016, 44, 242–245. [CrossRef][PubMed] 64. Putnam, N.H.; Butts, T.; Ferrier, D.E.; Furlong, R.F.; Hellsten, U.; Kawashima, T.; Robinson-Rechavi, M.; Shoguchi, E.; Terry, A.; Yu, J.K.; et al. The amphioxus genome and the evolution of the chordate karyotype. Nature 2008, 453, 1064–1071. [CrossRef][PubMed] 65. Henrichsen, C.N.; Chaignat, E.; Reymond, A. Copy number variants, diseases and gene expression. Hum. Mol. Genet. 2009, 18, 1–8. [CrossRef][PubMed] 66. Wolfe, K.H. Yesterday’s polyploids and the mystery of diploidization. Nat. Rev. Genet. 2001, 2, 333–341. [CrossRef][PubMed] 67. Yegorov, S.; Good, S. Using paleogenomics to study the evolution of gene families: Origin and duplication history of the relaxin family hormones and their receptors. PLoS ONE 2012, 7, e32923. [CrossRef][PubMed] 68. Hedges, S.B.; Marin, J.; Suleski, M.; Paymer, M.; Kumar, S. Tree of life reveals clock-like speciation and diversification. Mol. Biol. Evol. 2015, 32, 835–845. [CrossRef][PubMed] Int. J. Mol. Sci. 2016, 17, 1286 20 of 20

69. Tamura, K.; Stecher, G.; Peterson, D.; Filipski, A.; Kumar, S. Mega6: Molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 2013, 30, 2725–2729. [CrossRef][PubMed] 70. Atkinson, H.J.; Morris, J.H.; Ferrin, T.E.; Babbitt, P.C. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS ONE 2009, 4, e4345. [CrossRef][PubMed] 71. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504. [CrossRef][PubMed] 72. Lynch, M.; Conery, J.S. The evolutionary fate and consequences of duplicate genes. Science 2000, 290, 1151–1155. [CrossRef][PubMed] 73. Albalat, R.; Canestro, C. Evolution by gene loss. Nat. Rev. Genet. 2016, 17, 379–391. [CrossRef][PubMed] 74. Canestro, C.; Albalat, R.; Irimia, M.; Garcia-Fernandez, J. Impact of gene gains, losses and duplication modes on the origin and diversification of vertebrates. Semin. Cell Dev. Biol. 2013, 24, 83–94. [CrossRef][PubMed] 75. Force, A.; Lynch, M.; Pickett, F.B.; Amores, A.; Yan, Y.L.; Postlethwait, J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 1999, 151, 1531–1545. [PubMed] 76. Innan, H.; Kondrashov, F. The evolution of gene duplications: Classifying and distinguishing between models. Nat. Rev. Genet. 2010, 11, 97–108. [CrossRef][PubMed] 77. Lo, N.W.; Lau, J.T. Transcription of the β-galactoside α2,6-sialyltransferase gene in B lymphocytes is directed by a separate and distinct promoter. Glycobiology 1996, 6, 271–279. [CrossRef][PubMed] 78. Svensson, E.C.; Soreghan, B.; Paulson, J.C. Organization of the β-galactoside α2,6-sialyltransferase gene. Evidence for the transcriptional regulation of terminal glycosylation. J. Biol. Chem. 1990, 265, 20863–20868. [PubMed] 79. Wang, X.; O'Hanlon, T.P.; Young, R.F.; Lau, J.T. Rat β-galactoside α2,6-sialyltransferase genomic organization: Alternate promoters direct the synthesis of liver and kidney transcripts. Glycobiology 1990, 1, 25–31. [CrossRef][PubMed] 80. Ashkenazy, H.; Abadi, S.; Martz, E.; Chay, O.; Mayrose, I.; Pupko, T.; Ben-Tal, N. ConSurf 2016: An improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016, 44, 344–350. [CrossRef][PubMed] 81. Kuhn, B.; Benz, J.; Greif, M.; Engel, A.M.; Sobek, H.; Rudolph, M.G. The structure of human α2,6-sialyltransferase reveals the binding mode of complex glycans. Acta. Crystallogr. D Biol. Crystallogr. 2013, 69, 1826–1838. [CrossRef][PubMed] 82. Chakrabarti, S.; Bryant, S.H.; Panchenko, A.R. Functional specificity lies within the properties and evolutionary changes of amino acids. J. Mol. Biol. 2007, 373, 801–810. [CrossRef][PubMed] 83. Gu, X. Maximum-likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 2001, 18, 453–464. [CrossRef][PubMed] 84. Chakraborty, A.; Mandloi, S.; Lanczycki, C.J.; Panchenko, A.R.; Chakrabarti, S. Speer-server: A web server for prediction of protein specificity determining sites. Nucleic Acids Res. 2012, 40, W242–W248. [CrossRef] [PubMed] 85. Datta, A.K. Comparative sequence analysis in the sialyltransferase protein family: Analysis of motifs. Curr. Drug Targets 2009, 10, 483–498. [CrossRef][PubMed] 86. Rohfritsch, P.F.; Joosten, J.A.; Krzewinski-Recchi, M.A.; Harduin-Lepers, A.; Laporte, B.; Juliant, S.; Cerutti, M.; Delannoy, P.; Vliegenthart, J.F.; Kamerling, J.P. Probing the substrate specificity of four different

sialyltransferases using synthetic β-D-Galp-(1 Ñ 4)-β-D-GlcpNAc-(1 Ñ 2)-α-D-Manp-(1 Ñ O) (CH2)7CH3 analogues general activating effect of replacing N-acetylglucosamine by N-propionylglucosamine. Biochim. Biophys. Acta 2006, 1760, 685–692. [CrossRef][PubMed] 87. Dunbrack, R.L., Jr. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 2002, 12, 431–440. [CrossRef]

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).