Patterns of Nucleotide Deletion and Insertion Inferred from Bacterial Pseudogenes
Bram Danneels, Marta Pinto-Carbó, Aurélien Carlier
To cite this version: Bram Danneels, Marta Pinto-Carbó, Aurélien Carlier. Patterns of Nucleotide Deletion and Insertion Inferred from Bacterial Pseudogenes. Genome Biology and Evolution, Society for Molecular Biology and Evolution, 2018, 10 (7), pp.1792-1802. 10.1093/gbe/evy140. hal-02962215

HAL Id: hal-02962215
https://hal.inrae.fr/hal-02962215
Submitted on 14 Oct 2020

Distributed under a Creative Commons Attribution - NonCommercial 4.0 International License

Downloaded from https://academic.oup.com/gbe/article/10/7/1792/5047777 by INRAE Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement user on 14 October 2020

Bram Danneels1,†, Marta Pinto-Carbó2, and Aurélien Carlier1,*,†
1Department of Biochemistry and Microbiology, Ghent University, Belgium
2Department of Plant and Microbial Biology, University of Zurich, Switzerland
†These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Accepted: June 29, 2018
Data deposition: All accession numbers are listed in supplementary table S1, Supplementary Material online.

Abstract
Pseudogenes are a paradigm of neutral evolution and their study has the potential to reveal intrinsic mutational biases.
However, this potential is mitigated by the fact that pseudogenes are quickly purged from bacterial genomes. Here, we assembled a large set of pseudogenes from genomes experiencing reductive evolution as well as functional references for which we could establish reliable phylogenetic relationships. Using this unique dataset, we identified 857 independent insertion and deletion mutations and discover a pervasive bias towards deletions, but not insertions, with sizes multiples of 3 nt. We further show that selective constraints for the preservation of gene frame are unlikely to account for the observed mutational bias and propose that a mechanistic bias in alternative end-joining repair, a recombination-independent double strand break DNA repair mechanism, is responsible for the accumulation of 3n deletions.

Key words: insertion–deletion, pseudogenes, mutations, indel.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Genome Biol. Evol. 10(7):1792–1802. doi:10.1093/gbe/evy140 Advance Access publication July 2, 2018

Introduction

Pseudogenes are commonly viewed as the paradigm of neutral evolution (Li et al. 1981) and they have been used extensively to infer intrinsic mutational bias in various eukaryotic and prokaryotic organisms (Li et al. 1981; Andersson and Andersson 2001; Zhang and Gerstein 2003; Lerat and Ochman 2004; Williams and Wernegreen 2013). However, pseudogenes are quickly purged in the genomes of free-living bacteria, making them scarce in high-coding density genomes (Kuo and Ochman 2010). Despite their value as neutral ground for the accumulation of mutations and their potential for revealing background mutation rates in organisms, their short retention time makes the use of pseudogenes generally impractical for the survey of background mutation processes. Highly accurate measures of mutation rates and spectra may instead be obtained through mutation accumulation experiments, where repeated single-cell bottlenecks ensure the fixation of all but the most deleterious mutations (Lynch 2008; Lee et al. 2012; Dillon et al. 2015). Possibly because insertion–deletion mutations (indels) have the highest potential to disrupt the production of proteins by introducing frameshift mutations in protein-coding genes, observed rates of indels per generation are generally an order of magnitude lower than base-substitution rates (Sung et al. 2016; Senra et al. 2018), hindering statistically robust inference of indel mutation spectra.

In the course of our studies of reductive genome evolution of Candidatus Burkholderia (Ca. Burkholderia) symbionts of Rubiaceae and Primulaceae plants, we noticed that the genomes of these obligate symbionts contained an inordinate amount of pseudogenes (Carlier and Eberl 2012; Carlier et al. 2015; Pinto-Carbo et al. 2016), and as a direct consequence some of the genomes studied have the smallest estimated coding capacity (i.e., cumulative size of functional ORFs to total genome size) of known prokaryotic genomes, ranging from 41.7% to 67.3% (Pinto-Carbo et al. 2016). Leaf nodule Ca. Burkholderia symbionts are uncultured, obligate for host development and are vertically transmitted through seeds. We have previously shown that the genomes of these symbionts are reduced compared with free-living Burkholderia and that deleterious mutations such as frameshift or null mutations have accumulated in all functional gene categories indiscriminately with the exception of housekeeping genes and genes related to the production of host-protective secondary metabolites (Carlier and Eberl 2012; Carlier et al. 2015; Pinto-Carbo et al. 2016).

Relaxation of purifying selection, a consequence of a small effective population size, may be the main driving force behind the slow purging of deleterious mutations in reduced genomes (Kuo et al. 2009). Accordingly, a vast majority of pseudogenes are not translated in these symbionts, making these an ideal test bed to study background mutation rates and profiles in prokaryotes (Carlier et al. 2013). Rather unusually, genome reduction is not a monophyletic trait in Burkholderia (Pinto-Carbo et al. 2018). A consequence of this is that many pseudogenes have functional orthologs in closely related free-living Burkholderia species. The relative long-term maintenance of pseudogenes in leaf nodule Burkholderia sp. and the sequence availability of homologs from closely related genomes affords the assembly of a unique dataset to analyze background indel mutation bias.

In this study, we take advantage of the abundance of pseudogenes in Ca. Burkholderia leaf nodule symbionts to assemble a large collection of pseudogenes with functional orthologs in related species. Using this dataset, we investigate the nature and frequency of insertions and deletions in genomic regions evolving neutrally. We observe an overrepresentation of small deletions of sizes multiples of 3 nt in pseudogenes, and show that this pattern is not due to selective constraints for the preservation of gene reading frame. We further demonstrate a 3n nt size bias in unrelated bacterial taxa, suggesting that the underlying mechanism may be uni-

candidate pseudogenes predicted only on unmasked data, none passed the criteria for downstream analysis as outlined below. For other genomes, intergenic regions (including flanking 100 bp) between functional CDS features were searched for homology with predicted functional proteins in ad-hoc databases consisting of predicted proteins of free-living relatives using the NCBI blastx program (Altschul et al. 1990). Only blastx hits with e-value < 10⁻⁶ and average identity > 50% were considered. Finally, candidate pseudogenes were aligned to the top blastx hit using the tfasty program of the FASTA software suite (Pearson 2000), and bona fide pseudogenes identified when they contained frameshifts or premature stop codons that affected >20% of gene length. Finally, in order to exclude candidate pseudogenes corresponding to spurious hypothetical genes, pseudogene sequences were searched against the Conserved Domain Database (Marchler-Bauer et al. 2015). Pseudogene sequences without CDD Hit were subsequently searched with Blastx against the NCBI Swissprot database and retained if they had at least one match with e-value <0.001 outside of the Burkholderia genus. The results of the CDD search are gathered in supplementary table S3, Supplementary Material online.

Pseudogene Data Set Selection

In order to select pseudogenes and unambiguously infer the direction of mutations (deletion or insertion) in a target genome, we identified two suitable reference genomes which were: 1) closely related (Average Nucleotide Identity (ANI) values > 85%), and 2) for which whole genome phylogeny indicated that one reference genome could act as an outgroup in phylogenetic analysis, that is, the rooted phylogenetic tree followed the pattern (A,(B, C)); where