bioRxiv preprint doi: https://doi.org/10.1101/2020.08.13.249359; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Evolution of Chi motifs in Proteobacteria

Angélique Buton & Louis-Marie Bobay*

Department of Biology, University of North Carolina Greensboro, 321 McIver Street, PO Box 26170, Greensboro, NC 27402, USA

* To whom correspondence should be addressed

Contact information
Louis-Marie Bobay
321 McIver Street
Greensboro, NC 27402, USA
Email address: [email protected]
Phone: 336-256-2590

Keywords: homologous recombination, bacteria, genome evolution, DNA motifs.

Abstract
Homologous recombination is a key pathway found in nearly all bacterial taxa. The recombination complex allows bacteria to repair DNA double strand breaks but also promotes adaption through the exchange of DNA between cells. In Proteobacteria, this process is mediated by the RecBCD complex, which relies on the recognition of a DNA motif named Chi to initiate recombination. The Chi motif has been characterized in Escherichia coli and analogous sequences have been found in several other species from diverse families, suggesting that this mode of action is widespread across bacteria. However, the sequences of Chi-like motifs are known for only five bacterial species: E.
coli, Haemophilus influenzae, Bacillus subtilis, Lactococcus lactis and Staphylococcus aureus. In this study we detected putative Chi motifs in a large dataset of Proteobacteria and we identified four additional motifs sharing high sequence similarity and similar properties to the Chi motif of E. coli in 85 species of Proteobacteria. Most Chi motifs were detected in Enterobacteriaceae and this motif appears well conserved in this family. However, we did not detect Chi motifs for the majority of Proteobacteria, suggesting that different motifs are used in these species. Altogether these results substantially expand our knowledge on the evolution of Chi motifs and on the recombination process in bacteria.

Introduction
Bacteria frequently suffer DNA double-strand breaks, and this damage needs to be efficiently repaired to avoid cell death. One of the primary repair mechanisms is mediated by homologous recombination, which allows the exchange of homologous sequences and thus the repair of damaged DNA [1]. Homologous recombination is also a key mechanism contributing to the rapid evolution of bacteria and their viruses [2]. In Proteobacteria, the main enzymatic complex involved in this process is the RecBCD helicase-nuclease complex [3,4]. Following a double-strand break, RecBCD binds to the broken end of the DNA and unwinds it [5,6] until it encounters a Chi (Crossover Hotspot Instigator) site. The recognition of the Chi motif triggers an endonucleolytic nick near the 3’ end of the Chi motif by RecBCD [7] and initiates the loading of RecA proteins on the DNA strand [8].
The DNA-RecA complex can then initiate the search and exchange of genetic material with a homologous sequence [7]. The recognition of Chi motifs, mediated by the RecC subunit [9], thus plays a key function in the initiation of homologous recombination.

Chi sites have been well defined in Escherichia coli, where the sequence of this 8-nt-long motif is 5’-GCTGGTGG-3’ [10,11]. This motif is over-represented in the genome, with one motif every five kilobases on average [12]. Chi sites are also more frequent in the conserved regions of the bacterial genome (i.e. the core genome) than in recently acquired genes [13,14] and in genomic repeats [15], suggesting the importance of repairing the regions that encode the most essential cellular functions. This motif is also polarized on the chromosome, which is an important feature for the recognition of Chi sites by RecBCD [16]: in E. coli, Chi motifs are found almost exclusively on the leading strand of replication. Chi sites have an important role in the RecBCD pathway, as their recognition by this enzymatic complex is a key step in initiating this mechanism. However, most of the knowledge on Chi sites has been derived from E. coli and little is known about this mechanism in other species.

Chi motifs and functionally related motifs (i.e. Chi-like motifs) have only been identified in four other bacteria from diverse taxonomic groups: Haemophilus influenzae (5’-GNTGGTGG-3’/5’-G(G/C)TGGAGG-3’), Bacillus subtilis (5’-AGCGG-3’), Lactococcus lactis (5’-GCGCGT-3’) [17] and Staphylococcus aureus (5’-GAAGCGG-3’) [13].
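To illustrate how motifs written in this notation can be located in a sequence, the following minimal Python sketch translates degenerate IUPAC symbols (such as the N in GNTGGTGG, or S for the G/C alternative in G(G/C)TGGAGG) into a regular expression and counts occurrences separately on each strand. The function names, the reduced IUPAC table and the toy sequence are assumptions made for this example; it is not the pipeline used in this study.

```python
import re

# Subset of the IUPAC nucleotide code (assumption: only these symbols are needed here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "S": "[CG]", "W": "[AT]", "R": "[AG]", "Y": "[CT]",
}

# Translation table used to complement unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead also finds overlapping hits."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + pattern + "))")


def count_motif(seq, motif):
    """Count motif occurrences on the given strand and on its reverse complement."""
    regex = motif_regex(motif)
    forward = sum(1 for _ in regex.finditer(seq.upper()))
    reverse = sum(1 for _ in regex.finditer(reverse_complement(seq)))
    return forward, reverse


# Hypothetical toy sequence: one E. coli Chi site on the given strand and one on
# the opposite strand (CCACCAGC is the reverse complement of GCTGGTGG).
toy = "AAGCTGGTGGTTCCACCAGCAA"
print(count_motif(toy, "GCTGGTGG"))  # prints (1, 1)
print(count_motif(toy, "GNTGGTGG"))  # prints (1, 1)
```

Reporting the two strands separately matters because of the polarization described above. Relating these counts to the leading strand would additionally require the positions of the replication origin and terminus, which this sketch does not attempt to determine.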
Note that it remains unclear whether the 5’-AGCGG-3’ motif stimulates homologous recombination in B. subtilis and whether it can therefore be considered a true Chi motif. For this reason, we will use “Chi motifs” to designate the motifs found in E. coli and related bacteria with a similar sequence. In contrast, we will use “Chi-like motifs” to designate all the motifs that are thought to be associated with homologous recombination in prokaryotes. For most of the species in which Chi-like sequences have been identified, these sequences are statistically over-represented in the core genome [17]. The presence of Chi-like motifs across highly divergent bacteria suggests that this mechanism of action is widespread in bacteria and possibly universal. Moreover, H. influenzae and E. coli are both Gammaproteobacteria and present related Chi sequences. Although Chi-like sequences have likely evolved toward different sequences in most species, this last case suggests that they are evolving relatively slowly and indicates that it might be possible to identify them with computational approaches.

In this study we searched for Chi motifs related to E. coli’s Chi motif across hundreds of species of Proteobacteria. By using sequence similarity, oligonucleotide frequency statistics and the known properties of Chi motifs (i.e. polarization on the leading strand and over-representation in the core genome), we detected candidate Chi motifs across 87 species of Proteobacteria. The identified motifs appear restricted to five sequences: GCTGGTGG, GCTGGCGG, GCTGCTGG, GGTGGTGG and GCTGGAGG. We further used phylogenomic approaches to reconstruct the evolution of these sequences in Proteobacteria.
Our results underline that these sequences are well conserved in Enterobacteriaceae, suggesting a common mode of action of homologous recombination in these species. However, we did not find related sequences across most Proteobacteria, indicating that Chi motifs are either absent in these lineages or possess highly divergent sequences.

Materials and Methods

Data collection
Genomic sequences were downloaded from GenBank in May 2018. One genome per species was selected. If multiple genomes were available, we selected the one with the most complete assembly (the one with the smallest number of contigs), and if a species presented multiple assemblies of the same quality, we randomly selected one of the genomes. Following this procedure, we obtained 8,924 genomes from distinct species. From this list, we extracted the species belonging to the Proteobacteria and the Terrabacteria, which yielded a total of 2603 species, composed of 1,071 and 992 species of Proteobacteria and Terrabacteria, respectively. We then selected the genomes with complete assemblies (i.e. assemblies at the scaffold or the contig level were discarded), which yielded a total of 495 Proteobacteria and 363 Terrabacteria. Dataset information such as species name, GenBank accession number, taxonomy, genome size and nucleotide composition (GC-content) is detailed in Dataset S1.

Definition of the core genome
It has been previously observed that Chi sites are statistically over-represented only on the core genome [17], likely because these regions need to be repaired efficiently.