Unravelling the Complex Duplication History of Deuterostome Glycerol Transporters
Total Page:16
File Type:pdf, Size:1020Kb
cells Article Unravelling the Complex Duplication History of Deuterostome Glycerol Transporters Ozlem Yilmaz 1,2 , François Chauvigné 3, Alba Ferré 3, Frank Nilsen 1, Per Gunnar Fjelldal 2, Joan Cerdà 3 and Roderick Nigel Finn 1,3,* 1 Department of Biological Sciences, Bergen High Technology Centre, University of Bergen, 5020 Bergen, Norway; [email protected] (O.Y.); [email protected] (F.N.) 2 Institute of Marine Research, NO-5817 Bergen, Norway; [email protected] 3 IRTA-Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona, 08193 Bellaterra (Cerdanyola del Vallès), Spain; [email protected] (F.C.); [email protected] (A.F.); [email protected] (J.C.) * Correspondence: nigel.fi[email protected] Received: 18 June 2020; Accepted: 8 July 2020; Published: 10 July 2020 Abstract: Transmembrane glycerol transport is an ancient biophysical property that evolved in selected subfamilies of water channel (aquaporin) proteins. Here, we conducted broad level genome (>550) and transcriptome (>300) analyses to unravel the duplication history of the glycerol-transporting channels (glps) in Deuterostomia. We found that tandem duplication (TD) was the major mechanism of gene expansion in echinoderms and hemichordates, which, together with whole genome duplications (WGD) in the chordate lineage, continued to shape the genomic repertoires in craniates. Molecular phylogenies indicated that aqp3-like and aqp13-like channels were the probable stem subfamilies in craniates, with WGD generating aqp9 and aqp10 in gnathostomes but aqp7 arising through TD in Osteichthyes. We uncovered separate examples of gene translocations, gene conversion, and concerted evolution in humans, teleosts, and starfishes, with DNA transposons the likely drivers of gene rearrangements in paleotetraploid salmonids. Currently, gene copy numbers and BLAST are poor predictors of orthologous relationships due to asymmetric glp gene evolution in the different lineages. Such asymmetries can impact estimations of divergence times by millions of years. Experimental investigations of the salmonid channels demonstrated that approximately half of the 20 ancestral paralogs are functional, with neofunctionalization occurring at the transcriptional level rather than the protein transport properties. The combined findings resolve the origins and diversification of glps over >800 million years old and thus form the novel basis for proposing a pandeuterostome glp gene nomenclature. Keywords: aquaporin; glycerol; evolution; gene duplication; pseudogene 1. Introduction Prokaryotic and eukaryotic cells utilize glycerol, a three-carbon polyhydric alcohol, as a metabolic intermediate for anaerobic fermentation or aerobic glycolysis, gluconeogenesis, and the biosynthesis of triacylglycerols and phospholipids [1,2]. In addition, due to its colligative properties, organisms as diverse as algae, fungi, insects, and fishes can accumulate glycerol to alleviate osmotic stress or as an antifreeze metabolite [3–8]. Though glycerol may passively diffuse across cell membranes [9], its transport is greatly facilitated by a group of integral membrane proteins termed the aquaglyceroporins (Glps) [10–15]. Glps were first identified in bacteria as the Escherichia coli glycerol facilitator (GlpF), and they have been phylogenetically and functionally shown to belong to a superfamily of transmembrane water-conducting channels, the aquaporins [16–19]. Cells 2020, 9, 1663; doi:10.3390/cells9071663 www.mdpi.com/journal/cells Cells 2020, 9, 1663 2 of 29 Subsequent research has revealed a complex evolutionary history of channels capable of facilitating the transmembrane conduction of non-polar glycerol. For example, other members of the aquaporin (AQP) superfamily, including Archaean AqpM, plant GIPs and NIPs, insect Eglps, and vertebrate AQP6, -8, -11, and -14, also evolved this biophysical property [20–27]. In the cases of land plants, which acquired GIPs and NIPs via the horizontal gene transfer of bacterial channels, and hemipteran and holometabolous insects, which evolved more efficient Eglps from mutated AQP4-related channels, the genomic expansions of these new forms of glycerol transporters have been found to correlate to the supplantation of their glp genes [13,25,26,28,29]. In nearly all other organisms studied to date, however, the phylogenetically conserved glps remain the most ubiquitously selected form of glycerol transporter. Following the discovery of the bacterial GlpF, multiple Glps were identified in placental mammals (AQP3,-7,-9, and -10), which were named according to the chronology of aquaporin gene discovery [30–33]. In line with this nomenclature, a fifth mammalian Glp, termed AQP13, which is phylogenetically related to a frog Glp [34], was identified in the genome of the platypus (Ornithorhynchus anatinus), an extant member of the egg-laying prototherian mammals [35]. Conversely, in non-mammalian model organisms, genome-wide studies revealed seven glp genes in zebrafish (Danio rerio)(aqp3a,-3b,-7,-9a,-9b,-10a, and -10b)[36] and thirteen related orthologs in the paleotetraploid 1 Atlantic salmon (Salmo salar)[37]. Such gene copy numbers represent ~ /3 of their genomic aquaporin complements, highlighting their importance for cellular homeostasis. In accordance with the zebrafish information network (ZFIN) nomenclature recommendations, the teleost glps were named with respect to their phylogenetic relationships to the mammalian orthologs, although a unified nomenclature for piscine aquaporins has yet to be established. As for the high aquaporin gene copy numbers in land plants, with up to 120 paralogs in species such as the rapeseed (Brassica napus)[38,39], the multiplicity of the teleost and mammalian repertoires has been associated with serial rounds of whole genome duplication (WGD) [40]. For example, in chordates, two rounds (R1 and R2) are considered to have occurred >500 million years ago (Ma), while a third round (R3) occurred ~300 Ma prior to the diversification of Teleostei [41,42]. Subsequently, within the teleost lineage, a fourth round (R4) of autotetraploidization occurred ~80–100 Ma in the common ancestor of Salmoniformes, and allotetraploidization ~12.4 Ma in specific cyprinid lineages [41–47]. It is nevertheless thought that genome reduction has been the dominant mode of evolution, with duplicated paralogs typically silenced within a few million years [48,49]. As a result, many genes are likely to have been lost, which can obscure interpretations of the origin and modes of diversification of a gene superfamily. Asymmetric gene loss can also have undesired consequences for estimations of divergence times and interpretations of the adaptive functions of the encoded proteins, since it is usually assumed that single gene families represent the same ortholog in each species included in the data set [50,51]. This might be particularly problematic for distantly related piscine lineages that experienced both common and independent WGD and/or gene duplication events, along with the subsequent asymmetric loss of paralogs long after the duplication events. Conversely, the asymmetric acquisition of new genes or the absence of gene differentiation due to concerted evolution may further compound interpretations of historical duplication events. Amongst more basal lineages of deuterostomes, preliminary studies have indicated that echinoderms such as the purple sea urchin (Strongylocentrotus purpuratus) can harbor similar aquaporin and glp gene copy numbers to birds, reptiles, and mammals; however, its genome did not undergo WGD [35,52]. How such copy numbers arose and whether they typify the glp repertoires of other non-chordate deuterostomes remains to be established. With the increased availability of public data from large-scale genome sequencing initiatives, it is becoming possible to address such questions in broad groups of distantly related organisms. In the present study, we leveraged >550 genomes and >300 transcriptomes to re-evaluate the evolutionary history of the Glp grade of aquaporins in Deuterostomia using Bayesian and maximum likelihood inference coupled with syntenic analyses. We found that tandem (intrachromosomal) duplication played an unexpectedly important role for glp gene evolution and retention in basal lineages, including Cells 2020, 9, 1663 3 of 29 starfishes, sea urchins, and sea cucumbers (Echinodermata), acorn worms (Hemichordata), lampreys (Hyperoartia), chimaeras and elasmobranchs (Chondrichthyes), as well as spiny ray-finned fishes (Actinopterygii). We assembled and phylogenetically analyzed >100 pseudogenes to validate the proposed evolutionary history, and we experimentally tested the functionality of the channels in a paleotetraploid teleost, the Atlantic salmon, which experienced the most complicated history of gene evolution. Based upon the collated findings, we propose a new pandeuterostome gene nomenclature for the Glp grade of glycerol transporters. 2. Material and Methods 2.1. Coding Sequence Assembly, Phylogenetic and Syntenic Analyses Coding sequence (CDS) assembly was conducted as described previously [25,27,35]). Using either full-length proteins or exon-deduced peptides as tblastn queries, open-source whole genome shotgun (WGS), transcriptome shotgun (TSA) and nucleotide databases (NCBI (blast.ncbi.nlm.nih.gov), GenomeArk (vertebrategenomesproject.org), and Ensembl (ensembl.org)) were searched for Glp-related sequences. Full-length proteins were aligned to generate initial multiple sequence alignments using the L-INS-I algorithm of MAFFT