Divergent Genes in Gerbils: Prevalence, Relation to GC-Biased Substitution, and Phenotypic Relevance Yichen Dai, Rodrigo Pracana and Peter W

Divergent Genes in Gerbils: Prevalence, Relation to GC-Biased Substitution, and Phenotypic Relevance Yichen Dai, Rodrigo Pracana and Peter W

Dai et al. BMC Evolutionary Biology (2020) 20:134 https://doi.org/10.1186/s12862-020-01696-3 RESEARCH ARTICLE Open Access Divergent genes in gerbils: prevalence, relation to GC-biased substitution, and phenotypic relevance Yichen Dai, Rodrigo Pracana and Peter W. H. Holland* Abstract Background: Two gerbil species, sand rat (Psammomys obesus) and Mongolian jird (Meriones unguiculatus), can become obese and show signs of metabolic dysregulation when maintained on standard laboratory diets. The genetic basis of this phenotype is unknown. Recently, genome sequencing has uncovered very unusual regions of high guanine and cytosine (GC) content scattered across the sand rat genome, most likely generated by extreme and localized biased gene conversion. A key pancreatic transcription factor PDX1 is encoded by a gene in the most extreme GC-rich region, is remarkably divergent and exhibits altered biochemical properties. Here, we ask if gerbils have proteins in addition to PDX1 that are aberrantly divergent in amino acid sequence, whether they have also become divergent due to GC-biased nucleotide changes, and whether these proteins could plausibly be connected to metabolic dysfunction exhibited by gerbils. Results: We analyzed ~ 10,000 proteins with 1-to-1 orthologues in human and rodents and identified 50 proteins that accumulated unusually high levels of amino acid change in the sand rat and 41 in Mongolian jird. We show that more than half of the aberrantly divergent proteins are associated with GC biased nucleotide change and many are in previously defined high GC regions. We highlight four aberrantly divergent gerbil proteins, PDX1, INSR, MEDAG and SPP1, that may plausibly be associated with dietary metabolism. Conclusions: We show that through the course of gerbil evolution, many aberrantly divergent proteins have accumulated in the gerbil lineage, and GC-biased nucleotide substitution rather than positive selection is the likely cause of extreme divergence in more than half of these. Some proteins carry putatively deleterious changes that could be associated with metabolic and physiological phenotypes observed in some gerbil species. We propose that these animals provide a useful model to study the ‘tug-of-war’ between natural selection and the excessive accumulation of deleterious substitutions mutations through biased gene conversion. Keywords: gBGC, GC bias, Genome evolution, Insulin receptor, Medag, Metabolism, Osteopontin, Pancreatic duodenal homeobox 1, Protein evolution * Correspondence: [email protected] Department of Zoology, University of Oxford, 11a Mansfield Road, Oxford OX1 3SZ, UK © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Dai et al. BMC Evolutionary Biology (2020) 20:134 Page 2 of 15 Background substitutions. Importantly, in genomic regions where the As animal genomes are sequenced, unexpected features effect of gBGC is particularly strong, deleterious ‘weak to are often discovered, shaping our understanding of mo- strong’ mutations may become fixed in the population lecular evolution and its relation to phenotypic evolu- despite the action of purifying selection [12]. Suggested tion. Amongst mammals, the genomes of gerbils (order examples include coding sequence changes in the LEP Rodentia, subfamily Gerbillinae) have emerged as some gene in the avian lineage [13], the Fxy gene in mice [14], of the most unusual discovered to date. Several puzzling and the Pdx1 gene in sand rats and other gerbils [3]. The features of gerbil genomes are still not fully understood, radical changes to the Pdx1 gene associated with GC bias and these raise important questions about mechanisms in the gerbil lineage could have physiological implications. underlying molecular evolution and possible constraints Pdx1 is a homeobox gene encoding a highly conserved or challenges to natural selection. transcription factor essential for pancreatic development The first indications of unusual genome structure in and function [15–18]. In humans, PDX1 variants are this group came from analyses of the sand rat (Psamm- linked to pancreatic dysfunction of varying severity [19, omys obesus), a desert-living gerbil from the Middle East 20], while experiments in mice have shown that Pdx1 is and North Africa [1]. After initial attempts to clone indi- essential for pancreatic development [15, 16]. Consistent vidual genes gave misleading results [2], genome sequen- with evidence from mutations, comparisons between spe- cing revealed presence of an ‘island’ or ‘islands’ of cies reveal extreme evolutionary conservation, especially remarkably high GC content [3]. In the case of at least in the homeodomain motif, indicating that each amino one gene located within a high GC island, Pdx1, these acid site is under purifying selection and that few changes nucleotide changes are associated with unusual amino are tolerated. It is therefore striking that the 60 amino acid acid substitutions with potential to adversely impact PDX1 homeodomain, normally 100% conserved across physiology or development [3, 4]. Analysis of transcrip- mammals, has 15 amino acid changes in the sand rat and tomes from additional species reveal that the ‘GC islands’ 14 amino acid changes in the Mongolian jird [3]. phenomenon is shared between different gerbil species Despite the occurrence of these otherwise ‘disallowed’ and occurred in the gerbil evolutionary lineage [5]. mutations, sand rats and other gerbils do develop a pan- Variation in GC content within a genome is not in itself creas and do secrete insulin from pancreatic β-cells [21, unusual. With the increased availability of published ge- 22]. There are indications, however, that sand rats and nomes, many studies have reported differences between nu- possibly other gerbils are prone to disorders associated cleotide composition in regions of the genome and with pancreatic dysfunction under some conditions. For between species [6, 7]. What marks gerbil genomes as un- example, at least some sand rats develop diet-induced usual is the extreme nature of the nucleotide compositional type 2 diabetes (T2D) when fed standard laboratory ro- bias, including changes within protein-coding genes. Recent dent food, and physiological studies have shown a pre- work has revealed that gerbil genomes likely have one large disposition towards insulin resistance and stress-induced island (~ 10 Mb) of extreme GC bias, where GC content at β-cell apoptosis in T2D-prone individuals [23, 24]. In synonymous sites reaches almost 100%, and several other addition, some Mongolian jirds raised on a laboratory less extreme islands on other chromosomes [5]. In eukary- diet have also been observed to spontaneously become otes, the location of GC-rich domains is often correlated obese and develop poor glucose tolerance [25, 26]. with chromosomal regions of high recombination, such as We must be cautious before concluding that changes to subtelomeric regions and small chromosomes [8, 9]. The Pdx1, driven by GC-biased evolution, are the cause of correlation is thought to be driven by GC-biased gene physiological disorders in sand rats and other gerbils. In conversion (gBGC) [10, 11], a phenomenon that oc- humans, T2D is a complex metabolic disease caused by a curs at meiosis when homologous chromosomes con- mixture of genetic and environmental factors, with the ma- tain GC/AT heterozygous sites [8]. During strand jority of T2D cases associated with coding/regulatory se- invasion after meiotic pairing, the mismatch between quence mutations in more than one gene [27–29]. In strands at these sites tends to be repaired using the addition, some T2D-related genes are not clearly as- strand containing a G or C nucleotide in preference sociated with adult β-cell function; for example, mu- to the strand containing A or T [8]. tations in the LEP gene can lead to T2D by altering This gBGC process can profoundly affect genome GC appetite and body weight, while mutations altering composition over generations through fixation of AT to the insulin receptor gene INSR can cause T2D due to GC (weak to strong) mutations in genomic regions with insulin resistance in tissues responding to insulin [30, a high recombination rate [8]. This underlying process 31]. generates a pattern of GC biased evolution in these re- In this study, we take a different approach to previous gions, with ‘weak to strong’ nucleotide substitutions oc- studies that have focused on only one protein, PDX1, or an- curring more often than ‘strong to weak’ nucleotide alyzed GC evolution

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    15 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us