A non-canonical mismatch repair pathway in prokaryotes

Article (Accepted Version)

Castañeda-García, A, Prieto, A I, Rodríguez-Beltrán, J, Alonso, N, Cantillon, D, Costas, C, Pérez-Lago, L, Zegeye, E D, Herranz, M, Plociński, P, Tonjum, T, García de Viedma, D, Paget, M, Waddell, S J, Rojas, A M et al. (2017) A non-canonical mismatch repair pathway in prokaryotes. Nature Communications, 8. a14246. ISSN 2041-1723

This version is available from Sussex Research Online: http://sro.sussex.ac.uk/66997/

A non-canonical mismatch repair pathway in prokaryotes

A. Castañeda-García1,2, A. I. Prieto1, J. Rodríguez-Beltrán1, N. Alonso3, D. Cantillon4, C. Costas1, L. Pérez5, E. D. Zegeye6, M. Herranz5, P. Plociński2,#, T. Tonjum6, D. García de Viedma5, M. Paget7, S.J. Waddell4, A. M. Rojas8*, A. J. Doherty2* and J. Blázquez1,3,9*

1 Instituto de Biomedicina de Sevilla (IBIS)-CSIC. Stress and bacterial evolution. Sevilla, Spain. 2 Genome Damage and Stability Centre, School of Life Sciences, University of Sussex, Brighton, BN1 9RQ, UK. 3 Centro Nacional de Biotecnología-CSIC. Madrid, Spain. 4 Brighton and Sussex Medical School, University of Sussex, Brighton, BN1 9PX, UK. 5 Servicio de Microbiología Clínica y Enfermedades Infecciosas, Hospital Gregorio Marañón, Madrid, Spain. 6 Department of Microbiology, Oslo University Hospital, Rikshospitalet, Oslo, Norway and Department of Microbiology, University of Oslo, Oslo, Norway. 7 School of Life Sciences, University of Sussex, Brighton BN1 9QG, UK. 8 Computational Biology and Bioinformatics, Instituto de Biomedicina de Sevilla (IBIS)-CSIC. 9 Unit of Infectious Diseases, Microbiology, and Preventive Medicine. University Hospital Virgen del Rocio, Sevilla, Spain. # Present address: Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland.

*Correspondence to: Jesús Blázquez ([email protected]), Aidan J. Doherty ([email protected]) and Ana M. Rojas for computational Biology analyses ([email protected]).

Abstract

Mismatch Repair (MMR) is a near ubiquitous pathway, essential for the maintenance of genome stability. Members of the MutS and MutL protein families perform key steps in mismatch correction. Despite the major importance of this repair pathway, MutS-MutL are absent in almost all Actinobacteria and many Archaea. However, these organisms exhibit rates and spectra of spontaneous mutations similar to MMR-bearing species, suggesting the existence of an alternative to the canonical MutS-MutL-based MMR. We report that Mycobacterium smegmatis NucS/EndoMS, a putative endonuclease with no structural homology to known MMR factors, is required for mutation avoidance and anti-recombination, hallmarks of the canonical MMR. Furthermore, phenotypic analysis of naturally occurring polymorphic NucS in a M. smegmatis surrogate model suggests the existence of M. tuberculosis mutator strains. The phylogenetic analysis of NucS indicates a complex evolutionary process leading to a disperse distribution pattern in prokaryotes. Together, these findings indicate that distinct pathways for MMR have evolved at least twice in nature.

Cells ensure the maintenance of genome stability and a low mutation rate using a plethora of DNA surveillance and correction processes. These include base selection, proofreading, mismatch repair (MMR), base/nucleotide excision repair, recombination repair and non-homologous end-joining 1. The recognized MMR pathway is highly conserved among the three domains of life 2. In all cases, members of the MutS and MutL protein families perform key steps in mismatch correction. This MutS-MutL-based MMR system (canonical MMR) is a sophisticated DNA repair pathway that detects and removes incorrect mismatched nucleobases. Mismatched base pairs in DNA arise mainly as a result of DNA replication errors due to the incorporation of wrong nucleobases by DNA polymerases. To correct them, the system detects incorrect although chemically normal bases that are mismatched with the complementary strand, discriminating between the parental template and the newly synthesized strand 3. Loss of this activity has very important consequences, such as high rates of mutation (hypermutability) and increased recombination between not perfectly identical (homeologous) DNA sequences 4, both hallmarks of MMR inactivation. Nevertheless, despite the quasi-ubiquitous nature of this pathway, there are some important exceptions. The genomes of many Archaea, including Crenarchaeota and a few groups of Euryarchaeota, and almost all members of the bacterial phylum Actinobacteria, including Mycobacterium, have been shown to possess no identifiable MutS or MutL homologs 5-7. However, these prokaryotes exhibit rates and spectra of spontaneous mutations similar to canonical MMR-bearing bacterial species 8-10, suggesting the existence of unidentified mechanisms responsible for mismatch repair.
To identify novel mutation avoidance genes in mycobacteria, we performed a genetic screen and discovered that inactivation of the MSMEG_4923 gene in Mycobacterium smegmatis, encoding a homologue of an archaeal endonuclease dubbed NucS (for nuclease specific for ssDNA) 11, produced a hypermutable phenotype. NucS from Pyrococcus abyssi was initially identified as a member of a new family of structure-specific DNA endonucleases in Archaea 11. Notably, a recent report revealed that the protein NucS from the archaeal species Thermococcus kodakarensis (renamed as EndoMS) is a mismatch-specific endonuclease, acting specifically on dsDNA substrates containing mismatched bases 12. The possibility that a non-canonical mismatch repair could be triggered by a specific double-strand break (DSB) endonuclease, able to act at the site of a mispair, followed by DSB repair was hypothesized 30 years ago 13. The archaeal NucS binds and cleaves both strands at dsDNA mismatched substrates, with the mispaired bases in the central position, leaving 5-nucleotide long 5′-cohesive ends 12. Consequently, it has been suggested that these specific double-strand breaks may promote the repair of mismatches, acting in a novel MMR process 12. Moreover, the structure of the T. kodakarensis NucS-mismatched dsDNA complex has recently been determined, strongly supporting the idea that NucS acts in a mismatch repair pathway 14. Although these biochemical and structural studies have shed some light on the role of NucS at the molecular level, the cellular and biological function of NucS and its impact on genome stability remain unknown.

Hypermutable bacterial pathogens, very often associated with defects in MMR components, are frequently isolated and pose a serious risk in many clinical infections 15-21. The major pathogen Mycobacterium tuberculosis appears to be genetically isolated, acquires antibiotic resistance exclusively through chromosomal mutations 22 and presents variability in mutation rates between strains 23. These factors should contribute to the selection of hypermutable M. tuberculosis strains under antibiotic pressure, as occurs with other chronic pathogens 18,23. However, hypermutable strains have not yet been detected in this pathogen. To investigate the possible existence of M. tuberculosis hypermutable strains affected in NucS activity, we analysed the effect of naturally occurring polymorphisms in nucS from M. tuberculosis clinical isolates on mutation rates, using M. smegmatis as a surrogate model.

In this study, we first establish that NucS is essential for maintaining DNA stability, by preventing the acquisition of DNA mutations in Mycobacterium and Streptomyces. Inactivation of nucS in M. smegmatis produces specific phenotypes that mimic those of canonical MMR-null mutants (increased mutation rate, biased mutational spectrum and increased homeologous recombination). Finally, we conduct distribution and phylogenetic analyses of NucS across the prokaryotic domains to understand its evolutionary origins.

Results

Identification and characterization of M. smegmatis nucS

To identify novel mutation avoidance genes in mycobacteria, a M. smegmatis mc² 155 library of ~11,000 independent transposon insertion mutants was generated and screened for spontaneous mutations that confer rifampicin resistance (Rif-R), used as a hypermutator hallmark (Fig. 1A). One transposon insertion, which inactivated the MSMEG_4923 (nucS) gene, conferred a strong hypermutable phenotype. The NucS/EndoMS translated protein sequence is 27% identical to NucS from the hyperthermophilic archaeon Pyrococcus abyssi and 87% identical to that of M. tuberculosis (Fig. 1B). P. abyssi NucS was defined as the first member of a new family of structure-specific endonucleases with a mismatch-specific endonuclease activity 11,12 containing an N-terminal DNA-binding and a C-terminal RecB-like nuclease domain 11. All the catalytic residues required for nuclease activity in P. abyssi are conserved in the mycobacterial NucS (see Fig. 1B).

Recombinant M. smegmatis NucS was expressed and purified. The apparent molecular mass of the purified native protein fits with the expected size (25 kDa) (Supplementary Fig. 1A). The biochemical activity was analysed by DNA electrophoretic mobility shift assays (EMSAs) and nuclease enzymatic assays. M. smegmatis NucS was capable of binding to single-stranded DNA (ssDNA), as seen in the archaeal P. abyssi NucS 11, but not to double-stranded DNA (dsDNA) (Fig. 1C). Regarding cleavage of mismatched substrates, no significant specific cleavage activity was observed on different substrates containing a single nucleotide mismatch in the central region of the strand (Supplementary Fig. 1B), in contrast with previous results obtained with the archaeal T. kodakarensis NucS 12. The binding to ssDNA indicates that the protein is correctly folded, as it can bind to its substrate. This suggests that additional factors, such as the binding of additional partners, protein modifications or different DNA substrates, may be required to activate the mismatch-specific cleavage activity of the bacterial NucS.
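The identity values quoted above (27% and 87%) are the kind of figure obtained by counting matching positions in a pairwise alignment. The following is a minimal sketch of that calculation, not the procedure used in the study: the function name `percent_identity`, the choice to ignore gapped columns, and the toy sequences are all assumptions made for illustration, and other identity definitions (for example, dividing by the full alignment length) give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one possible convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real NucS sequence
toy_a = "MKV-LDAGT"
toy_b = "MRVALD-GS"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

In this toy case, seven columns are ungapped and five of them match.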

NucS is essential to maintain low levels of spontaneous mutation

To verify that NucS has a key role in maintaining DNA fidelity, we constructed an in-frame deletion of nucS (ΔnucS, nucS-null derivative) in M. smegmatis and measured the rate at which drug resistance was acquired. Notably, the nucS-null strain displayed a hypermutable phenotype, increasing the rate of spontaneously emerging rifampicin and streptomycin (Str-R) resistances by 150-fold and 86-fold, respectively, above the wild-type strain (3.1×10⁻⁷ vs 2.1×10⁻⁹ and 4.8×10⁻⁸ vs 5.6×10⁻¹⁰, ΔnucS vs wild-type, respectively). These results are equivalent to those observed for mutS- or mutL-deficient E. coli (10² to 10³-fold increases) 24. Significantly, basal mutation rates were recovered when the nucS deletion was chromosomally complemented with the wild-type gene MSMEG_4923 (nucSSm) (Fig. 2A and Supplementary Table 1), confirming that inactivation of this gene was responsible for the high mutation rates observed.

Notably, M. smegmatis mc² 155 and its ΔnucS derivative produced a different mutational signature. While spontaneous Rif-R mutations detected in the wild-type strain comprise different base substitutions, including transitions, but also transversions and even an in-frame deletion, all mutations detected in the ΔnucS strain were specifically transitions (A:T→G:C or G:C→A:T) (Fig. 2B and Supplementary Tables 2A-B). Transitions occur much more frequently than transversions or indels in any cell, due to spontaneous or induced errors in the DNA, and are the preferred substrates for the MMR mechanism. The ΔnucS strain accumulates a high number of uncorrected transitions, masking transversions and indels to undetectable levels under our experimental conditions. This transition-biased mutational spectrum is a hallmark signature of the canonical MMR pathway 25,26.

To demonstrate the generality of the mutation-avoidance activity in species encoding NucS, the nucS gene was deleted in Streptomyces coelicolor A3(2), a different actinobacterial species. The precise deletion of nucS in this species increased the rate of spontaneous mutations conferring Rif-R and Str-R by 108-fold and 197-fold, respectively. As with M. smegmatis, low mutation rates were recovered by complementation with the wild-type S. coelicolor nucS gene (Fig. 2C and Supplementary Table 3). Together, these results highlight the essential role of NucS in mutation avoidance.
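As a quick arithmetic check, the fold increases quoted above for M. smegmatis can be recomputed from the reported rates. The sketch below is illustrative only; the dictionary layout and the function name `fold_increase` are assumptions and not part of the study's analysis. Because the published rates are rounded, the recomputed ratios agree with the stated 150-fold and 86-fold only approximately.

```python
# Mutation rates (mutations per cell per generation) as quoted in the text.
reported_rates = {
    "Rif-R": {"delta_nucS": 3.1e-7, "wild_type": 2.1e-9},
    "Str-R": {"delta_nucS": 4.8e-8, "wild_type": 5.6e-10},
}


def fold_increase(mutant_rate, wild_type_rate):
    """Ratio of the mutant rate to the wild-type rate."""
    if wild_type_rate <= 0:
        raise ValueError("wild-type rate must be positive")
    return mutant_rate / wild_type_rate


for marker, rates in reported_rates.items():
    fold = fold_increase(rates["delta_nucS"], rates["wild_type"])
    print(f"{marker}: {fold:.0f}-fold increase in the nucS-null strain")
```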

NucS inhibits homeologous but not homologous recombination

These results prompted us to investigate whether NucS is involved in reduction of recombination between non-identical (homeologous) DNA sequences, but not between 100% identical sequences, as described for the canonical MMR-null mutants in other bacterial species 4,27. The rates of recombination between homologous and homeologous sequences were measured using specific engineered tools (Fig. 3A and Supplementary Fig. 2A-B). Recombination rates between 100% identical sequences were similar in the wild-type and its ΔnucS derivative (2.42×10⁻⁶ and 2.86×10⁻⁶, respectively). However, when the identity of DNA sequences decreased, the recombination rate was comparatively higher in the nucS-null derivative: 95%, 4.68×10⁻⁷ vs 1.41×10⁻⁶ (3-fold difference); 90%, 1.71×10⁻⁸ vs 1.82×10⁻⁷ (10-fold); and 85%, 6.47×10⁻⁹ vs 3.36×10⁻⁸ (5-fold) for wild-type versus ΔnucS, respectively (Fig. 3B). We verified that the recombinants detected were exclusively due to recombination events and not to spontaneous mutation (see Methods). The inhibitory effect of NucS on homeologous but not homologous recombination again resembles an important tell-tale signature of a canonical MMR pathway 28,29.

NucS polymorphisms in M. tuberculosis clinical strains

Once the possibility of hypermutability was demonstrated in M. smegmatis, we searched for the existence of M. tuberculosis hypermutable clinical isolates by nucS inactivation. Analysis of the nucSTB sequences from ∼1,600 clinical M. tuberculosis strains available in the databases revealed a total of 9 missense SNPs (Supplementary Table 4 and Supplementary Fig. 3). The effects of these polymorphisms on NucS activity were experimentally analysed by checking their impact on mutation rates, using a M. smegmatis heterologous system, as previously reported for other mutagenesis studies 30. Ten nucSTB alleles (9 polymorphic plus the wild-type) were integrated into the chromosome of M. smegmatis ΔnucS. Complementation with wild-type nucSTB restored low mutation rates to M. smegmatis ΔnucS. However, five alleles increased mutation rate significantly (Fig. 4, Supplementary Table 5), with allele S39R presenting the strongest observed mutator phenotype (83-fold increase). Alleles A135S, R144S, T168A and K184E produced increases in mutation rates close to one order of magnitude, whereas alleles S54I, A67S, V69A, and D162H produced low increases (∼2-fold) or no changes. These results suggest the existence of hypermutable M. tuberculosis clinical strains affected by modulation of NucS activity.

NucS taxonomic distribution

Taken together, our results compellingly suggest a functional connection between NucS and known MMR pathways. We analysed the species distribution of NucS, also taking into account the canonical MMR proteins, MutS and MutL (for MutS and MutL analyses see Supplementary Methods). A total of 3,942 reference proteomes were scanned for presence/absence of NucS (Supplementary Data 1 and Supplementary Fig. 4). Bacteria are represented by 2,709 proteomes (68%) and Archaea by 132 (4%), the remaining 1,101 (28%) belonging to Eukaryota and viruses. NucS is present in 370 organisms, from the domains Archaea (60 species) and Bacteria (310 species) (Supplementary Data 1). It is totally absent from eukaryotes and viruses.

To highlight the distribution of NucS in all organisms, we built a phylogenetic profile of NucS. Figure 5 shows this profile mapped onto the NCBI tree, containing 2,186 bacterial and archaeal species (see also Supplementary Data 2). The NucS distribution pattern indicates that this protein exhibits a disperse distribution (Fig. 5 and Supplementary Data 1). In Bacteria, the phylum Actinobacteria is the one containing the majority of NucS with 300 species in the class Actinobacteria (only 2 exceptions) and 3 species in other classes of the phylum (Supplementary Data 1), woesei (class ), Patulibacter medicamentivorans (class Thermoleophilia) and Ilumatobacter coccineus (class ). All the analysed members of the class , from the Actinobacteria phylum, lack NucS and MutS-MutL proteins, while the two species from the class lack NucS, but present MutS-MutL.

The pattern is even more disperse in Archaea. From 132 species, 60 have NucS (21 also contain MutS-MutL). NucS, but not MutS-MutL, is present in 17 out of 24 species of the phylum Crenarchaeota and in 18 out of 88 of the phylum Euryarchaeota. By contrast, MutS-MutL (but not NucS) is completely absent in crenarchaeotal species while it is restricted to 29 euryarchaeotal species (Supplementary Data 1). Interestingly, only 28 organisms, 21 halobacterial species (domain Archaea) and 7 species of the phylum Deinococcus-Thermus (domain Bacteria), have both NucS and MutS-MutL sequences. Notably, when we focused our analysis on the two main NucS-containing groups (Actinobacteria and Archaea), some species still lack both NucS and MutS-MutL (confirmed by protein translated searches -tblastn- in Actinobacteria and Archaea, Supplementary Data 1).
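The archaeal counts quoted above can be partitioned into four presence/absence classes, which makes explicit how many species appear to lack both systems. The sketch below is a bookkeeping exercise under one stated assumption: the 29 euryarchaeotal species are taken to be the only archaea carrying MutS-MutL without NucS, as the text implies. The function name `partition_presence` and the dictionary keys are hypothetical, and the derived "neither" count is an inference from the quoted numbers, not a figure reported in the study.

```python
def partition_presence(total, with_nucs, with_both, muts_mutl_only):
    """Split a set of species into four presence/absence classes.

    total          -- number of species analysed
    with_nucs      -- species carrying NucS (with or without MutS-MutL)
    with_both      -- species carrying both NucS and MutS-MutL
    muts_mutl_only -- species carrying MutS-MutL but not NucS
    """
    nucs_only = with_nucs - with_both
    neither = total - nucs_only - with_both - muts_mutl_only
    if nucs_only < 0 or neither < 0:
        raise ValueError("inconsistent counts")
    return {
        "nucs_only": nucs_only,
        "both": with_both,
        "muts_mutl_only": muts_mutl_only,
        "neither": neither,
    }


# Counts quoted in the text for the domain Archaea.
# Assumption: MutS-MutL without NucS occurs only in the 29 euryarchaeotal species.
archaea = partition_presence(total=132, with_nucs=60, with_both=21, muts_mutl_only=29)
for label, count in archaea.items():
    print(f"{label}: {count}")
```

Under that assumption, 39 archaeal species would carry NucS alone and 43 would carry neither system, consistent with the observation that some species lack both.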

A model for the origin and evolution of NucS

Our results support a complex evolutionary ancestry for NucS. Comprehensive protein sequence analyses of NucS indicate that the protein contains two distinct regions, a DNA-binding N-terminal domain (NucS-NT) and an endonuclease C-terminal domain (NucS-CT) (Supplementary Figs. 3-4), in agreement with previous structural studies 11. At the sequence level, using profile hidden Markov models (HMMs), we also detected these regions in alternative proteins outside the context of NucS, supporting the original independence of these domains, which have been subsequently fused during evolution.

To understand the particular distribution observed in the phylogenetic profile, we conducted independent sequence and phylogenetic analyses of full NucS and also of the N-terminal and C-terminal regions (Supplementary Fig. 5-7). The actinobacterial NucS representatives are well separated from those of Archaea. On the other hand, the NucS from Deinococcus-Thermus species group together with the archaeal proteins, instead of the actinobacterial orthologues, suggesting that NucS has recently been transferred from Archaea to these Deinococcus-Thermus species (Supplementary Figs. 5-7). The sequence analyses provide different distributions for the two regions of NucS. While the N-terminal region is limited exclusively to Archaea and some Bacteria (Supplementary Fig. 7), the C-terminal region was found in many archaeal species, some Bacteria, and in a few eukaryotes (Supplementary Fig. 6) and, importantly, also in different domain architectures. These observations, in the context of our phylogenetic analyses, suggest that both regions may have emerged in Archaea. Subsequently, the C-terminal region may have been transferred to some bacterial species and to a few eukaryotes, and became fixed in different protein contexts. The full protein and the individual domains were likely lost in certain groups during the evolution of Archaea. However, the reconstruction of the precise evolution of these individual domains requires deeper analyses in the context of related protein domains. Phylogenetic analysis of the full protein suggests that NucS emerged after the archaeal divergence by a rearrangement of these domains and was then transferred from Archaea to certain Deinococcus-Thermus species in at least one event. For Actinobacteria, the most parsimonious explanation suggests a horizontal transfer event of NucS from Archaea coupled with a likely MutS-MutL loss event in the last common ancestor of Actinobacteria (as most of the contemporary NucS-encoding species lack MutS and MutL). This is consistent with the fact that we did not find any species from the entire Bacteria group, except those from Actinobacteria and the Deinococcus-Thermus group, having NucS or NucS-like proteins. Our model (Fig. 6) favours the simplest scenario to explain the complex distribution pattern of the full NucS protein and its absence in all eukaryotes, the vast majority of bacteria, and some archaeal organisms without invoking unlikely massive losses. This is in agreement with our phylogenetic observations.

Discussion

This study provides genetic and biological evidence that establishes the existence of a DNA repair system that mimics the canonical MutS-MutL-based MMR pathway. To date, the correction of mismatched nucleobases has been defined as a highly conserved mechanism whose key steps are performed, in all cases, by members of the MutS and MutL protein families. Although the genomes of Crenarchaeota, a few groups of Euryarchaeota and almost all members of the phylum Actinobacteria lack identifiable MutS or MutL homologues 5-7, these organisms exhibit rates and spectra of spontaneous mutations similar to canonical MMR-bearing bacterial species 8-10, suggesting the existence of a still undetected pathway responsible for this type of correction. Although recent biochemical and structural reports suggested the existence of a novel mismatch-specific endonuclease, NucS, in the archaeal species T. kodakarensis 12,14 that is able to recognize and cleave mismatched bases in double-strand DNA in vitro, no genetic and/or biological evidence had been reported to date on the activity of a novel mismatch repair pathway. The screening of a large library of M. smegmatis mutants revealed that NucS is a key mutation avoidance component in Actinobacteria. Through genetic and biological analysis, we demonstrate that the nucS-null phenotypes in M. smegmatis are almost identical to those produced by MMR deficiency in other bacteria (very high mutation rates, transition-biased mutational spectrum and increased homeologous recombination rates). The anti-mutator nature of NucS is also demonstrated in Streptomyces coelicolor, a different species of the class Actinobacteria. Therefore, NucS appears to be an important DNA repair factor which, together with the high-fidelity DNA-polymerase DnaE1 and its PHP domain-proofreader 30, maintains low mutation rates (~10⁻¹⁰ mutations per base per generation 9,10), ensuring genome stability and DNA fidelity in mycobacteria and in other Actinobacteria.
Although single nucleotide polymorphisms (SNPs) in DNA repair/replication genes have been suggested as a source of hypermutation in M. tuberculosis 31,32, only small increases in mutation rates have been observed due to these polymorphic genes (e.g. polymorphisms in the PHP exonuclease domain of the DNA-polymerase DnaE1 30). Our results suggest the existence of naturally occurring hypermutable M. tuberculosis variants with diminished NucS activity. Whether or not this is relevant for virulence and adaptation to antibiotic treatments remains to be deciphered.

Although we observed that purified mycobacterial NucS binds to single-stranded but not to double-stranded DNA, as described for archaeal NucS proteins 11,12, no significant specific cleavage activity was observed on mismatched substrates, suggesting that activation by other partners and/or modifications (e.g. post-translational modifications) is required in M. smegmatis. Also, the possibility of a functional difference between bacterial and archaeal NucS cannot be ruled out. Therefore, additional studies are needed to assess whether mycobacterial and archaeal NucS proteins have the same functional requirements.

Our computational studies support an archaeal origin for NucS, as previously suggested 12, built upon two distinct domains that underwent a complex evolutionary history of transfers and/or losses, including at least two HGT events to Actinobacteria and Deinococcus-Thermus. Indeed, the contribution of horizontal gene transfer in bacterial and archaeal evolution is well-established 33,34. Interestingly, NucS and MutS-MutL systems seem to be present alternatively in different species, with only a few exceptions. In Actinobacteria, it is possible that the acquisition of NucS may have facilitated the subsequent loss of MutS-MutL, as these canonical proteins are so widely conserved across Bacteria.

On the other hand, there are two groups (Halobacteria and some Deinococcus-Thermus) where NucS and MutS-MutL coexist. Notably, in one member of the Halobacteria, Halobacterium salinarum, inactivation of mutS or mutL produced no hypermutability 35, suggesting that MutS and MutL are redundant to an alternative system that controls spontaneous mutation. This indicates that evolution of these particular species may have opted to keep both systems. The possible interplay between both pathways remains to be elucidated. Surprisingly, NucS and MutS-MutL are apparently absent in some species. Therefore, the possible existence of additional alternative MMR pathways, yet to be identified, cannot be discarded.

In conclusion, we propose that MMR is a mechanism that can be accomplished by either the MutS-L or the NucS pathway. Understanding the mechanisms and pathways that influence genome stability may unveil new strategies to predict and combat the development of drug resistance. In addition, engineered Mycobacterium strains lacking this mutation-avoidance pathway may be valuable tools for evaluating anti-tuberculosis treatments, including new drugs, drug combinations and non-antibiotic based regimes.

Methods

Bacterial strains, media and growth conditions

M. smegmatis wild-type strain mc² 155 and its mutant derivatives were grown at 37°C in Middlebrook 7H9 broth or Middlebrook 7H10 agar (Difco) with 0.5% glycerol and 0.05% Tween 80, and enriched with 10% albumin-dextrose-catalase (Difco). Escherichia coli strains were cultured at 37°C in LB medium. Streptomyces coelicolor A3(2) M145 and its derivatives were grown at 30°C on mannitol-soya (MS) agar. All primers used in this work are listed in Supplementary Table 6.

Generation and screening of a M. smegmatis insertion library

Transposon ΦMycoMarT7 36 was used to obtain a M. smegmatis mutant library of 11,000 independent clones. For transduction, M. smegmatis mc² 155 cultures were washed, mixed with the phage stock at a multiplicity of infection of 1:10 (37°C, 3 h) and plated on kanamycin (25 µg ml⁻¹). The insertion mutants were isolated and inoculated into 96-well microtiter plates containing Middlebrook 7H9 medium. To identify hypermutators, each mutant was inoculated onto 7H10 agar plates with rifampicin (20 µg ml⁻¹). One transposon mutant, producing a high number of rifampicin resistant (Rif-R) colonies, was selected for further characterization. The transposon insertion site in the chromosome was determined by sequencing.

Generation of a ΔnucS knockout mutant in M. smegmatis

M. smegmatis ΔnucS (MSMEG_4923, nucSSm) in-frame deletion mutant was generated by allelic replacement 37. Briefly, p2NIL-ΔnucSSm, harbouring an in-frame deletion of the target gene, was electroporated into M. smegmatis mc² 155. Single-crossover merodiploid clones were isolated and then counter-selected to generate the unmarked deletion by a second crossover event. Finally, putative M. smegmatis ΔnucS colonies were checked by PCR and sequencing. For additional details see Supplementary Methods.

Complementation of the M. smegmatis ΔnucS mutant

For complementation, a wild-type copy of the MSMEG_4923 gene (nucSSm) and the wild-type full-length MT_1321 gene from the CDC1551 control strain (nucSTB), including their own promoter regions, were cloned into the vector pMV361 38 and introduced into M. smegmatis ΔnucS by electroporation. pMV361 is an integrative vector that carries a site-specific integration system derived from the mycobacteriophage L5. It integrates at a single site in the M. smegmatis chromosome, the attB, which overlaps the 3′ end of the tRNA-glycine gene 39. Integration at the appropriate site was verified for all constructs by PCR. For additional details see Supplementary Methods.

Generation of ΔnucS S. coelicolor knockout mutant and complementation

S. coelicolor ΔnucS (gene SCO5388, nucSSco) in-frame deletion mutant was constructed by double crossover recombination using the plasmid pIJ6650 40. pIJ-ΔnucSSco (containing the in-frame deletion of the nucSSco gene) was introduced in S. coelicolor A3(2) M145 to generate the unmarked deletion of the nucSSco gene. Finally, S. coelicolor ΔnucS was complemented with a wild-type copy of nucSSco inserted in the attB site of the S. coelicolor ΔnucS via the integrative vector pSET152 (Supplementary Methods).

Estimation of mutation rates

Fluctuation analyses were used to experimentally address mutation rates. For each experiment, 20 independent cultures of M. smegmatis mc² 155 and its derivatives were grown and diluted in order to inoculate 1,000-10,000 cells per ml into fresh medium. All the cultures were incubated until they reached stationary phase (about 10⁹ cells per ml). Appropriate dilutions were plated on Middlebrook 7H10 medium with or without rifampicin (100 µg ml⁻¹) or streptomycin (50 µg ml⁻¹). For S. coelicolor, 20 independent concentrated spore suspensions (containing ∼10⁹ spores per ml) from each analysed strain were generated on MS agar and then plated on MS agar with or without rifampicin (100 µg ml⁻¹) or streptomycin (50 µg ml⁻¹). At least 3 different experiments were performed for each fluctuation analysis in both cases. The expected number of mutations per culture (m) and 95% confidence intervals were calculated using the maximum likelihood estimator applying the newton.LD.plating and confint.LD.plating functions that account for differences in plating efficiency implemented in the package rSalvador (http://eeeeeric.com/rSalvador/) (Qi Zheng. Rsalvador: an assay. R package version 1.3) for R (www.R-project.org/). Mutation rates (mutations per cell per generation) were then calculated by dividing m by the total number of generations, assumed to be roughly equal to the average final number of cells. Statistical comparisons were carried out by the use of the LRT.LD.plating function of the same package, which accounts for the differences in final number of cells and plating efficiency. Finally, in case of multiple comparisons, p-values were corrected by the Bonferroni method.
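To make the estimation step concrete, the following is a simplified Python sketch of a maximum likelihood estimate of m under the Luria-Delbrück model, followed by the division of m by the final cell number described above. It is not the rSalvador implementation: it uses the Ma-Sandri-Sarkar recursion for the mutant distribution, assumes 100% plating efficiency (the rSalvador functions named above correct for partial plating), and locates the maximum with a golden-section search that assumes a single peak in the search interval. The function names, the mutant counts and the final cell number are all hypothetical.

```python
import math


def ld_probabilities(m, r_max):
    """P(0..r_max mutants per culture) under the Luria-Delbrück model,
    computed with the Ma-Sandri-Sarkar recursion."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        s = sum(p[i] / (r - i + 1) for i in range(r))
        p.append(m / r * s)
    return p


def log_likelihood(m, counts):
    """Log-likelihood of the observed mutant counts for a given m."""
    p = ld_probabilities(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def mle_m(counts, lo=1e-6, hi=50.0, tol=1e-8):
    """Golden-section search for the m that maximises the likelihood.
    Assumes the likelihood has a single peak on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Hypothetical resistant-colony counts from 20 parallel cultures
mutant_counts = [0, 1, 0, 3, 2, 0, 1, 12, 0, 4, 1, 0, 2, 0, 5, 1, 0, 2, 30, 1]
final_cells = 1e9  # hypothetical average final number of cells per culture

m_hat = mle_m(mutant_counts)
mutation_rate = m_hat / final_cells  # mutations per cell per generation
print(f"m = {m_hat:.2f}, mutation rate = {mutation_rate:.2e}")
```

A dedicated package remains preferable for real data, since confidence intervals, plating-efficiency corrections and likelihood ratio tests are not covered by this sketch.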

Characterization of the mutational spectrum

The mutational spectra conferred by M. smegmatis mc² 155 and its ΔnucS derivative were characterised by selecting for resistance to rifampicin and sequencing the rifampicin resistance-determining region (RRDR) in the rpoB gene, from independent Rif-R isolates.

Recombination assays. The recombination assay vectors (pRhomyco series) were generated by cloning two overlapping fragments of the hygromycin-resistance gene (hyg), from pRAM 41, flanking a functional kanamycin-resistance (kan) gene in the pMV361 plasmid (Supplementary Fig. 2). Neither of the hyg fragments conferred resistance to hygromycin on its own. The fragments share a duplicated 517 bp overlapping region, with different degrees of sequence identity (100%, 95%, 90% or 85%). M. smegmatis mc² 155 and its ΔnucS derivative were transformed by electroporation with the integrative plasmids pRhomyco 100%, 95%, 90% and 85% designed for the recombination assays and plated on Middlebrook 7H10 containing kanamycin (25 µg/µl). Integration of pRhomyco vectors in the appropriate site (attB) of the M. smegmatis mc² 155 (wild-type and ΔnucS) chromosome was verified by PCR. To measure recombination rates, we analysed the restoration of a functional hyg gene, which confers Hyg-R, by a recombination event between the two overlapping fragments of the hyg gene. Sixteen independent overnight cultures from each strain were grown in Middlebrook 7H9 broth plus kanamycin (to prevent the selection of Hyg-R bacteria arising from early recombination of pRhomyco). Approximately 10⁴ bacteria were inoculated in plain 7H9 broth (without kanamycin) to allow recombination events and cultured for 24 h at 37ºC with shaking. Cultures were plated on 7H10 plus 50 µg ml⁻¹ hygromycin (for recombinant cell counts) and, in addition, appropriate dilutions were plated on plain 7H10 (for viable cell counts) and incubated for 3-5 days. The total number of Hyg-R colonies was used to calculate recombination rates. Recombination rates were calculated and analysed as described for the estimation of mutation rates. To validate the recombination assay, we first verified that no Hyg-R colonies were generated by spontaneous mutation, even in the ΔnucS derivative.
The mutation rate was ≤1×10⁻¹⁰ for the hypermutable derivative, far below the lowest recombination value (6×10⁻⁹ for WT, 15% non-identical sequences). These results indicate that the Hyg-R colonies from our recombination assay were not generated by spontaneous mutations. Moreover, we verified that recombinant clones express hygromycin resistance and kanamycin susceptibility by picking 20-30 Hyg-R colonies from each strain onto kanamycin plates. In all cases they were Hyg-R and Kan-S. Finally, we randomly isolated 10 independent Hyg-R presumptive recombinant colonies from each construct (80 colonies total) and verified by PCR that a single fragment of the size of the reconstituted hyg gene was amplified in all cases.
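To make the numbers in the recombination assay concrete, the sketch below computes the percent identity of two aligned fragments, the approximate number of mismatches implied by each nominal identity level over the 517 bp overlap, and the margin between the lowest recombination rate and the upper bound on the spontaneous Hyg-R mutation rate. The function names are hypothetical, and the mismatch counts are only what the nominal percentages imply; the actual changes are those shown in Supplementary Fig. 2.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def expected_mismatches(length_bp, identity_pct):
    """Approximate number of mismatches implied by a nominal percent identity."""
    return round(length_bp * (100 - identity_pct) / 100)

OVERLAP_BP = 517  # length of the duplicated hyg region, from the text
for identity in (100, 95, 90, 85):
    print(identity, expected_mismatches(OVERLAP_BP, identity))

# Lowest recombination rate (WT, 85% identity) versus the upper bound on the
# spontaneous Hyg-R mutation rate, both taken from the text.
fold_margin = 6e-9 / 1e-10
print(f"recombination exceeds mutation by at least {fold_margin:.0f}-fold")
```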

NucS purification and biochemical activity. M. smegmatis nucS was cloned as a SUMO fusion and expressed in E. coli BL21 cells overnight at 16ºC. NucS protein was purified by affinity chromatography on a Ni²⁺–NTA agarose column, followed by an additional anion exchange purification step on a HiTrap Q HP column. The NucS-SUMO fusion was cleaved by Ulp protease to release the native protein. Purified protein was collected and concentrated to perform the activity assays (Supplementary Fig. 1A). In addition, His-tagged M. smegmatis nucS was cloned in pET28b (for E. coli expression) and pYUB28b (for M. smegmatis expression). His-tagged NucS was also purified by affinity chromatography and anion exchange purification as described above. For DNA EMSA assays, purified NucS protein (1 µM to 16 µM) was incubated with fluorescein-labelled single-stranded DNA (45-mer, 50 nM) or double-stranded DNA (45-bp, 50 nM) in 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 10% glycerol and 10 µg ml⁻¹ BSA for 30 minutes at 37ºC. Protein-DNA complexes were resolved on a 5% polyacrylamide gel in 0.25xTBE plus 1% glycerol, run under refrigerated conditions (5ºC-10ºC). For nuclease activity, fluorescein-labelled 36-mer DNA was annealed to either a fully complementary opposite strand (control) or a strand carrying a single nucleotide mismatch in its central region (mismatch substrates). dsDNA substrates (30 nM) were incubated with 300 nM of recombinant, tag-free NucS in 20 mM Tris-HCl pH 7.5, 6 mM ammonium sulfate, 2 mM MgCl2, 100 mM NaCl, 0.1 mg ml⁻¹ BSA and 0.1% Triton X-100. The reactions were carried out at 37°C for 30 minutes, after which 0.5 U Proteinase K was added to each reaction to digest the NucS nuclease. Digestion was continued for 30 minutes at 50°C. Stop solution (95% formamide, 0.09% xylene cyanol) was added, and samples were boiled at 95°C for 10 minutes and resolved on a 7 M urea, 15% polyacrylamide gel in 1xTBE for 2 hours.
Mg, Mn and Zn ions, and combinations of these three metals, were tested as cofactors with no cleavage observed.

Analysis of nucS SNPs in M. tuberculosis clinical strains. Sequences of the MT_1321 gene product were downloaded from the Ensembl database (http://bacteria.ensembl.org/index.html), the M. tuberculosis variome resource 42 and Comas et al. data 43. A total of 1,600 proteins were aligned using MAFFT and the polymorphisms were visualized with Jalview.
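The polymorphisms in this analysis were inspected visually in Jalview. For readers who prefer a scripted check, the following sketch tallies amino acid substitutions relative to a reference sequence in an existing multiple sequence alignment. It is only an illustration: the function name is hypothetical, the toy sequences are invented, and residues are numbered along the ungapped reference.

```python
from collections import Counter

def find_polymorphisms(reference, aligned_seqs):
    """Count substitutions relative to the reference in aligned sequences.

    Substitutions are named like 'S2T' (reference residue, 1-based position
    in the ungapped reference, variant residue). Gaps and 'X' are ignored.
    """
    counts = Counter()
    for seq in aligned_seqs:
        if len(seq) != len(reference):
            raise ValueError("sequences must come from the same alignment")
        pos = 0  # residue index in the ungapped reference
        for ref_aa, aa in zip(reference, seq):
            if ref_aa == "-":
                continue  # insertion relative to the reference; not numbered
            pos += 1
            if aa not in ("-", "X", ref_aa):
                counts[f"{ref_aa}{pos}{aa}"] += 1
    return counts

# Toy alignment (hypothetical sequences, not real NucS data)
reference = "MSR-VRL"
strains = ["MSR-VRL", "MTR-VRL", "MSRAVKL", "MTR-VRL"]
snps = find_polymorphisms(reference, strains)
for name, n in sorted(snps.items()):
    print(name, n)
```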

Construction of nucS alleles by site-directed mutagenesis.

To construct nucS alleles, the wild-type nucSTB gene was PCR amplified and cloned into pUC19 to be used as a template for mutagenic PCRs. Single specific mutations were introduced into nucSTB by PCR with suitable pairs of mutagenic primers. Following the PCR reaction, a DpnI digestion was carried out for 4 hours at 37ºC. Mutagenized plasmids obtained after transformation into E. coli DH5α were verified by DNA sequencing. All the nucS alleles were re-cloned into the integrative vector pMV361 to generate a set of complementation plasmids, which were introduced by transformation into M. smegmatis mc² 155 ΔnucS. These plasmids integrate at a single site, attB, in the M. smegmatis chromosome. In order to test the efficiency of each nucS allele in restoring normal mutation rates, the pMV361 plasmids carrying the mutated alleles were integrated in the M. smegmatis ΔnucS chromosome. Integration of each nucSTB allele in the appropriate site (attB) of the chromosome was verified by PCR.

Computational analyses of the NucS protein. A summary of all the computational approaches conducted is depicted in Supplementary Fig. 4. To establish how widespread NucS is across the domains of life, we conducted sequence analyses to initially identify NucS proteins (Supplementary Fig. 4). NUCS_MYCTU (M. tuberculosis NucS) and NUCS_PYRAB (P. abyssi NucS) sequences were used to find homologues. Each full sequence was used as an independent query in pHMMER searches (HMMER3; http://hmmer.org/) against the large database of the concatenated reference proteomes (2016_02 release of 17th February 2016). We next collected the sequences, excluding the original sequences used as queries. The remaining sequences were filtered by e-value (those >0.0001 were removed), bit score (>50) and length (only those >75% length were kept), and then aligned with MAFFT [http://mafft.cbrc.jp/alignment/software/] 44. The alignments were visualized with Belvu [http://sonnhammer.sbc.su.se/Belvu.html] 45 to check their quality. We removed redundancy from the alignment by discarding sequences with more than 63% sequence identity. We next generated Markov profiles trained with the non-redundant multiple sequence alignment (using HMMER3). For each protein, we obtained approximately the same 400 sequences, which also gave partial hits to different species, suggesting the potential existence of different domains in the protein. Further checking of the sequences yielded the final 370 sequences. As NucS was not identified in eukaryotes or viruses, these lineages were excluded from further analyses. We constructed a phylogenetic profile of NucS (Supplementary Data 1) and mapped it onto the NCBI taxonomic tree (Figure 5; the tree in Newick format is available in Supplementary Data 2). The tree was annotated using ggtree (http://www.bioconductor.org/packages/ggtree).
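The filtering and redundancy-removal steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: the `Hit` record and function names are hypothetical, the length criterion is interpreted as hit length relative to the query length, and redundancy is removed greedily in input order, which may differ from the tool actually used.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    evalue: float
    bitscore: float
    length: int

def passes_filters(hit, query_length, max_evalue=1e-4, min_bits=50.0, min_cov=0.75):
    """Thresholds from the text: e-value <= 0.0001, bit score > 50, length > 75%."""
    return (hit.evalue <= max_evalue
            and hit.bitscore > min_bits
            and hit.length > min_cov * query_length)

def identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def remove_redundancy(aligned, max_identity=0.63):
    """Greedy filter: keep a sequence only if it is at most 63% identical
    to every sequence kept so far."""
    kept = {}
    for name, seq in aligned.items():
        if all(identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept

# Hypothetical hits against a 226-residue query
hits = [Hit("a", 1e-10, 120.0, 220), Hit("b", 1e-3, 80.0, 220),
        Hit("c", 1e-8, 40.0, 220), Hit("d", 1e-8, 90.0, 100)]
selected = [h.name for h in hits if passes_filters(h, query_length=226)]

# Hypothetical aligned sequences
aligned = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKV", "s3": "ACDWWWWWWW"}
non_redundant = remove_redundancy(aligned)
print(selected, list(non_redundant))
```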

Characterization of the two NucS regions. To identify potential domains in NucS, we focused on the archaeal P. abyssi NucS, whose 3D structure was previously resolved (PDB 2VLD chain B) 11. This structure is composed of two distinct regions: the N-terminal DNA binding region (amino acids 1-114) and the C-terminal catalytic region (amino acids 126-233) 11. Given the absence of structural data for the bacterial NucS protein, the M. tuberculosis NucS structure was modelled with I-TASSER 16 using the archaeal P. abyssi NucS as a template, which generated a reliable model (Supplementary Fig. 3). Template and model were structurally aligned using the Combinatorial Extension (CE) algorithm 17, built into PyMOL, to identify relevant residues. We used the structure-based alignment between this template and M. tuberculosis NucS (NucS_MYCTU) (Supplementary Fig. 3) to determine the different regions for further sequence analyses (N-terminal and C-terminal regions). These analyses were conducted using profile searches as above and confirmed by independent analyses of the PF01939 domain trained with NucS (for details see Supplementary Methods and Supplementary Fig. 4). We conducted sequence searches of NucS_MYCTU in the PFAM database, where the PF01939 domain (DUF91, automatic) was identified. From the PF01939 domain (539 sequences in 464 species), only 368 entries in 363 species (archaeal and bacterial) gave a match with the full protein (Supplementary data file 1). The results obtained from the PFAM database searches for the PF01939 domain are in agreement with the structure-based profile analyses (see above), confirming the existence of two distinct regions. Taking into account the domain boundaries established above, we next aligned the 368 full NucS proteins (from PFAM PF01939) to profiles built from the structural alignment corresponding to i) the whole protein, ii) the N-terminal region (NT) and iii) the C-terminal region (CT).
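Transferring domain boundaries from the template to the modelled protein amounts to mapping residue ranges through a pairwise alignment. The sketch below shows one way to do this in Python. The function name is hypothetical and the toy alignment is invented; in practice the aligned strings would come from the structure-based alignment, and the template ranges would be the P. abyssi boundaries quoted above (1-114 and 126-233).

```python
def map_region(template_aln, target_aln, start, end):
    """Map template residues [start, end] (1-based, ungapped numbering)
    to target residue numbers through a pairwise alignment.

    Returns (first, last) target residue numbers aligned to the region,
    or None if no target residue is aligned to it.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned strings must have the same length")
    t_pos = q_pos = 0
    mapped = []
    for t, q in zip(template_aln, target_aln):
        if t != "-":
            t_pos += 1
        if q != "-":
            q_pos += 1
        # keep only columns where both sequences have a residue
        if t != "-" and q != "-" and start <= t_pos <= end:
            mapped.append(q_pos)
    return (mapped[0], mapped[-1]) if mapped else None

# Toy pairwise alignment (hypothetical, not the real template/model alignment)
template = "AB-CDEF"
target = "-BXCD-F"
print(map_region(template, target, 2, 5))
```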

Sequence analysis of NucS domains. Using the regions defined by structural bioinformatics, we first conducted sequence searches against the large database. We made non-redundant alignments of the N-terminal region (containing 63 sequences) and the C-terminal region (containing 39 sequences) and built profile hidden Markov models (HMMs). These profiles were searched using pHMMER 46 against the large Reference Proteomes file. Our analyses identified the C-terminal region in all the original 368 NucS sequences (for details see Supplementary Methods and Supplementary Fig. 4). In addition, the C-terminal region was also found in proteins containing a variety of additional protein domains in other prokaryotic species and in a few additional eukaryotic sequences. We checked the alignments to confirm that the catalytic residues were conserved, as described in the structure of P. abyssi 11. When a profile was generated with the eukaryotic sequences (excluding any bacterial or archaeal sequences), we again retrieved the C-terminal regions of NucS above threshold at significant values, but we did not retrieve any other eukaryotic sequences at reliable values. On the other hand, the N-terminal region was found mainly in bacterial and archaeal NucS, but also in two archaeal proteins annotated as topoisomerases. A pHMMER search with the corresponding regions of these sequences yielded the N-terminal region of NucS.

Phylogenetic analyses. We conducted Maximum Likelihood (ML) phylogenetic analyses using RAxML 47 in order to construct phylogenetic trees that could explain the phylogenetic profile (Supplementary Fig. 4). To run phylogenetic analyses, NucS sequences from the PF01939 domain were used, but aligned to their corresponding structure-based regions. Sequences found in our searches with independent regions were also included. Three sets were run: 368 unique sequences (out of 539) that aligned with the full NucS structure-based profile (Supplementary Fig. 5, Supplementary Data 3), 422 sequences containing the C-terminal region (NucS-CT) (Supplementary Fig. 6) and 370 sequences containing the N-terminal region (NucS-NT) (Supplementary Fig. 7). Sequences were aligned to the structure-based profiles, and the alignments were subjected to ML search and 1,500 bootstrap replicates using RAxML 47. All free model parameters were estimated by RAxML, using a GAMMA model of rate heterogeneity and an ML estimate of the alpha parameter. The most likely selected model was LG 48. For accuracy and focus on our alignment, we ran the sequential version of RAxML, creating various sets of starting trees (10 by manually inferring different starting trees and 10 automatically inferred by the program). The best setting (the likelihood closest to zero) was used for further calculations. To run phylogenetic analyses, sequences from the PFAM PF01939 domain were used, but aligned to their corresponding region. For details of the particular regions for each domain see Supplementary Data 4. Sequences found in our searches with NT or CT regions outside NucS were also included. Phylogenetic trees were visualized with iTOL (http://itol.embl.de) 49. The raw trees corresponding to both CT and NT regions (Supplementary Figs. 6 and 7) are available as Supplementary Data 5 and 6, respectively.
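Two steps in this procedure are simple to express in code: choosing the search whose log-likelihood is closest to zero, and summarising bootstrap replicates as support values. The Python sketch below illustrates both on toy data. It does not reproduce RAxML's internals; the function names, run labels, log-likelihood values and clades are all hypothetical.

```python
def best_run(runs):
    """Pick the ML search with the highest log-likelihood.

    Log-likelihoods are negative, so the highest value is the one closest
    to zero. `runs` is a list of (label, log_likelihood) pairs.
    """
    return max(runs, key=lambda run: run[1])

def bootstrap_support(best_tree_clades, replicate_clades):
    """Fraction of bootstrap replicates that contain each clade of the best tree.

    Each clade is a frozenset of taxon names; each replicate is a set of clades.
    """
    n = len(replicate_clades)
    return {clade: sum(clade in rep for rep in replicate_clades) / n
            for clade in best_tree_clades}

# Hypothetical log-likelihoods from searches started on different trees
runs = [("manual_01", -48213.7), ("manual_02", -48190.2), ("auto_01", -48201.5)]
label, loglik = best_run(runs)

# Hypothetical clades from four bootstrap replicates
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
replicates = [{ab, cd}, {ab}, {cd, ab}, {ac}]
support = bootstrap_support([ab, cd], replicates)
print(label, loglik, support[ab], support[cd])
```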

Data Availability The authors declare that all data supporting the findings of this study are available within the paper and its supplementary information files.

References
1 Friedberg, E. et al. DNA Repair and Mutagenesis. Washington DC, USA: American Society of Microbiology. 2nd Edition edn, (2006).
2 Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins, and processes. Mutation research 435, 171-213 (1999).
3 Iyer, R. R., Pluciennik, A., Burdett, V. & Modrich, P. L. DNA mismatch repair: functions and mechanisms. Chem Rev 106, 302-323, doi:10.1021/cr0404794 (2006).
4 Matic, I., Rayssiguier, C. & Radman, M. Interspecies gene exchange in bacteria: the role of SOS and mismatch repair systems in evolution of species. Cell 80, 507-515, doi:0092-8674(95)90501-4 [pii] (1995).
5 Sachadyn, P. Conservation and diversity of MutS proteins. Mutation research 694, 20-30, doi:10.1016/j.mrfmmm.2010.08.009 (2010).
6 Mizrahi, V. & Andersen, S. J. DNA repair in Mycobacterium tuberculosis. What have we learnt from the genome sequence? Molecular microbiology 29, 1331-1339 (1998).
7 Banasik, M. & Sachadyn, P. Conserved motifs of MutL proteins. Mutation research 769, 69-79, doi:10.1016/j.mrfmmm.2014.07.006 (2014).
8 Springer, B. et al. Lack of mismatch correction facilitates genome evolution in mycobacteria. Molecular microbiology 53, 1601-1609, doi:10.1111/j.1365-2958.2004.04231.x (2004).
9 Ford, C. B. et al. Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nature genetics 43, 482-486, doi:10.1038/ng.811 (2011).
10 Kucukyildirim, S. et al. The Rate and Spectrum of Spontaneous Mutations in Mycobacterium smegmatis, a Bacterium Naturally Devoid of the Post-replicative Mismatch Repair Pathway. G3 (Bethesda), doi:10.1534/g3.116.030130 (2016).
11 Ren, B. et al. Structure and function of a novel endonuclease acting on branched DNA substrates. The EMBO journal 28, 2479-2489, doi:emboj2009192 [pii] 10.1038/emboj.2009.192 (2009).
12 Ishino, S. et al. Identification of a mismatch-specific endonuclease in hyperthermophilic Archaea. Nucleic acids research, doi:10.1093/nar/gkw153 (2016).
13 Wagner, R. et al. Involvement of Escherichia coli mismatch repair in DNA replication and recombination. Cold Spring Harb Symp Quant Biol 49, 611-615 (1984).
14 Nakae, S. et al. Structure of the EndoMS-DNA Complex as Mismatch Restriction Endonuclease. Structure 24, 1960-1971, doi:10.1016/j.str.2016.09.005 (2016).
15 Gross, M. D. & Siegel, E. C. Incidence of mutator strains in Escherichia coli and coliforms in nature. Mutation research 91, 107-110 (1981).
16 LeClerc, J. E., Li, B., Payne, W. L. & Cebula, T. A. High mutation frequencies among Escherichia coli and pathogens. Science 274, 1208-1211 (1996).
17 Matic, I. et al. Highly variable mutation rates in commensal and pathogenic Escherichia coli. Science 277, 1833-1834 (1997).
18 Oliver, A., Canton, R., Campo, P., Baquero, F. & Blazquez, J. High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288, 1251-1254, doi:8507 [pii] (2000).
19 Picard, B. et al. Mutator natural Escherichia coli isolates have an unusual virulence phenotype. Infect Immun 69, 9-14, doi:10.1128/IAI.69.1.9-14.2001 (2001).
20 Giraud, A., Matic, I., Radman, M., Fons, M. & Taddei, F. Mutator bacteria as a risk factor in treatment of infectious diseases. Antimicrobial agents and chemotherapy 46, 863-865 (2002).
21 Macia, M. D. et al. Hypermutation is a key factor in development of multiple-antimicrobial resistance in Pseudomonas aeruginosa strains causing chronic lung infections. Antimicrobial agents and chemotherapy 49, 3382-3386, doi:49/8/3382 [pii] 10.1128/AAC.49.8.3382-3386.2005 (2005).
22 Muller, B., Borrell, S., Rose, G. & Gagneux, S. The heterogeneous evolution of multidrug-resistant Mycobacterium tuberculosis. Trends in genetics : TIG 29, 160-169, doi:10.1016/j.tig.2012.11.005 (2013).
23 Ford, C. B. et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nature genetics 45, 784-790, doi:10.1038/ng.2656 (2013).
24 Cox, E. C. Bacterial mutator genes and the control of spontaneous mutation. Annu Rev Genet 10, 135-156, doi:10.1146/annurev.ge.10.120176.001031 (1976).
25 Lee, H., Popodi, E., Tang, H. & Foster, P. L. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proceedings of the National Academy of Sciences of the United States of America 109, E2774-2783, doi:10.1073/pnas.1210309109 (2012).
26 Garibyan, L. et al. Use of the rpoB gene to determine the specificity of base substitution mutations on the Escherichia coli chromosome. DNA Repair (Amst) 2, 593-608 (2003).
27 Tham, K. C. et al. Mismatch repair inhibits homeologous recombination via coordinated directional unwinding of trapped DNA structures. Mol Cell 51, 326-337, doi:10.1016/j.molcel.2013.07.008 (2013).
28 Feinstein, S. I. & Low, K. B. Hyper-recombining recipient strains in bacterial conjugation. Genetics 113, 13-33 (1986).
29 Spies, M. & Fishel, R. Mismatch repair during homologous and homeologous recombination. Cold Spring Harbor perspectives in biology 7, a022657, doi:10.1101/cshperspect.a022657 (2015).
30 Rock, J. M. et al. DNA replication fidelity in Mycobacterium tuberculosis is mediated by an ancestral prokaryotic proofreader. Nature genetics, doi:10.1038/ng.3269 (2015).
31 Ebrahimi-Rad, M. et al. Mutations in putative mutator genes of Mycobacterium tuberculosis strains of the W-Beijing family. Emerging infectious diseases 9, 838-845, doi:10.3201/eid0907.020589 (2003).
32 Dos Vultos, T., Blazquez, J., Rauzier, J., Matic, I. & Gicquel, B. Identification of Nudix hydrolase family members with an antimutator role in Mycobacterium tuberculosis and Mycobacterium smegmatis. Journal of bacteriology 188, 3159-3161, doi:188/8/3159 [pii] 10.1128/JB.188.8.3159-3161.2006 (2006).
33 Dagan, T., Artzy-Randrup, Y. & Martin, W. Modular networks and cumulative impact of lateral transfer in genome evolution. Proceedings of the National Academy of Sciences of the United States of America 105, 10039-10044, doi:10.1073/pnas.0800679105 (2008).
34 Gophna, U., Charlebois, R. L. & Doolittle, W. F. Have archaeal genes contributed to bacterial virulence? Trends in microbiology 12, 213-219, doi:10.1016/j.tim.2004.03.002 (2004).
35 Busch, C. R. & DiRuggiero, J. MutS and MutL are dispensable for maintenance of the genomic mutation rate in the halophilic archaeon Halobacterium salinarum NRC-1. PloS one 5, e9045, doi:10.1371/journal.pone.0009045 (2010).
36 Sassetti, C. M., Boyd, D. H. & Rubin, E. J. Comprehensive identification of conditionally essential genes in mycobacteria. Proceedings of the National Academy of Sciences of the United States of America 98, 12712-12717, doi:10.1073/pnas.231275498 (2001).
37 Parish, T. & Stoker, N. G. Use of a flexible cassette method to generate a double unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene replacement. Microbiology 146 ( Pt 8), 1969-1975, doi:10.1099/00221287-146-8-1969 (2000).
38 Stover, C. K. et al. New use of BCG for recombinant vaccines. Nature 351, 456-460, doi:10.1038/351456a0 (1991).
39 Lee, M. H., Pascopella, L., Jacobs, W. R., Jr. & Hatfull, G. F. Site-specific integration of mycobacteriophage L5: integration-proficient vectors for Mycobacterium smegmatis, Mycobacterium tuberculosis, and bacille Calmette-Guerin. Proceedings of the National Academy of Sciences of the United States of America 88, 3111-3115 (1991).
40 Paget, M. S. et al. Mutational analysis of RsrA, a zinc-binding anti-sigma factor with a thiol-disulphide redox switch. Molecular microbiology 39, 1036-1047 (2001).
41 Hinds, J. et al. Enhanced gene replacement in mycobacteria. Microbiology 145 ( Pt 3), 519-527, doi:10.1099/13500872-145-3-519 (1999).
42 Joshi, K. R., Dhiman, H. & Scaria, V. tbvar: A comprehensive genome variation resource for Mycobacterium tuberculosis. Database (Oxford) 2014, bat083, doi:10.1093/database/bat083 (2014).
43 Comas, I. et al. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nature genetics 45, 1176-1182, doi:10.1038/ng.2744 (2013).
44 Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research 30, 3059-3066 (2002).
45 Sonnhammer, E. L. & Hollich, V. Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics 6, 108, doi:10.1186/1471-2105-6-108 (2005).
46 Eddy, S. R. A new generation of homology search tools based on probabilistic inference. Genome Inform 23, 205-211 (2009).
47 Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690, doi:10.1093/bioinformatics/btl446 (2006).
48 Le, S. Q. & Gascuel, O. An improved general amino acid replacement matrix. Molecular biology and evolution 25, 1307-1320, doi:10.1093/molbev/msn067 (2008).
49 Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research 44, W242-245, doi:10.1093/nar/gkw290 (2016).
50 Hug, L. A. et al. A new view of the tree of life. Nat Microbiol 1, 16048, doi:10.1038/nmicrobiol.2016.48 (2016).

End Notes

Acknowledgments. J.B. was supported by Plan Nacional de I+D+i and Instituto de Salud Carlos III, Subdirección General de Redes y Centros de Investigación Cooperativa, Ministerio de Economía y Competitividad, Spanish Network for Research in Infectious Diseases (REIPI RD12/0015) - co-financed by European Development Regional Fund "A way to achieve Europe" ERDF and SAF2015-72793-EXP and BFU2016-78250-P from Spanish Ministry of Science and Competitiveness (MINECO-FEDER). A.J.D. was supported by a grant from the Biotechnology and Biological Sciences Research Council (BB/J018643/1). A. C-G. and J. R-B. were supported by contracts from REIPI RD15/0012, and A. C-G. also by a Ramon Areces fellowship for Life Sciences. A. P. was supported by a “Sara Borrell” contract, Instituto de Salud Carlos III. T.T. was supported by the Research Council of Norway, GLOBVAC project 234506. We are grateful to I. Cases for his help customizing the R libraries for tree visualization, Federico Abascal for critical input regarding models of evolution, I. Comas for critical reading of the manuscript and J. Glynn and S. Hoffner for providing additional data of M. tuberculosis strains.

Author contributions: J.B. designed the project, directed the experimental work, constructed S. coelicolor strains and wrote the manuscript; A.C-G. participated in the project design, performed and designed experiments, made strains in M. smegmatis and S. coelicolor, developed biochemical assays together with P.P. and wrote the manuscript; A.I.P. made strains, measured recombination rates, performed experiments with M. smegmatis and the bioinformatics search of NucS polymorphisms; A. M. R. performed all the computational analyses (evolutionary, structural bioinformatics and sequence) and wrote the manuscript; J.R-B. measured mutation rates and performed statistical analysis; N.A-R, D.C., C.C., L.P., E. D. Z., M. Herranz, T.T., D.G-V, S.W, M. P. and A.J.D. contributed with strain construction and/or measured mutation rates. A.J.D. performed database exploration, helped with NucS structure and participated in writing the manuscript.

Conflict of interest. The authors declare no conflict of interest.

Figure legends.

Figure 1. Identification and characterization of M. smegmatis NucS. A) Schematic representation of the process for identifying the nucS transposon mutant. ~11,000 clones from the M. smegmatis insertion mutant library were replicated onto plates with (Rif) or without (No Rif) rifampicin (step 1). One single clone (circled) produced a high number of Rif-R colonies. After isolation and purification (step 2), the frequency of spontaneous Rif-R mutants (bottom plate) was checked and compared with that of the wild-type (upper plate), demonstrating its hypermutable phenotype. B) Multiple sequence alignment of NucS sequences. M. tuberculosis (NucS_Mtu), M. smegmatis (NucS_Msm) and P. abyssi (NucS_Pab) sequences are from Uniprot (identifiers are P9WIY4, A0R1Z0 and Q9V2E8, respectively). Solid lines over the alignment indicate protein domains as defined previously for P. abyssi NucS 11 (black, DNA-binding; grey, nuclease). Identical amino acid residues are shown in black. Catalytic residues required for nuclease activity in P. abyssi NucS 11 are labelled with asterisks. Nuclease motifs of P. abyssi NucS are also indicated 11. C) DNA binding activity of NucS. In a gel-based electrophoretic mobility shift assay (EMSA), purified NucS protein (1 µM to 16 µM) is capable of binding to 45-mer ssDNA (50 nM) (left side) but not to 45-bp dsDNA (50 nM) (right side). The arrow indicates the position of the DNA-NucS complex.

Figure 2. Mutational effects of nucS deletion. A) Rates of spontaneous mutations conferring rifampicin resistance, Rif-R (red), and streptomycin resistance, Str-R (grey), of M. smegmatis mc² 155 (WT), its ΔnucS derivative and the ΔnucS strain complemented with nucS from M. smegmatis mc² 155 (nucSSm). B) Mutational spectrum of M. smegmatis mc² 155 (green) and its ∆nucS derivative (red). Bars represent the frequency of the types of change found in rpoB. C) Rates of spontaneous mutations conferring Rif-R (red) and Str-R (grey) of S. coelicolor A3(2) M145 (WT), its ΔnucS derivative and the ΔnucS strain complemented with the wild-type nucS from S. coelicolor (nucSSco). Error bars represent 95% confidence intervals (n=20). Asterisks denote statistical significance (Likelihood ratio test under Luria-Delbruck model, Bonferroni corrected, p-value <10⁻⁴ in all cases). Mutation rate: mutations per cell per generation.


Figure 3. Effect of nucS deletion on recombination. A) Chromosomal construct used to measure recombination between homologous or homeologous DNA sequences. The hyg gene is reconstituted by a single recombination event between two 517-bp overlapping fragments (striped), sharing different degrees of sequence identity (100%, 95%, 90% and 85%) and separated by a kanamycin resistance (Kan-R) gene. Recombinant clones express hygromycin resistance and kanamycin susceptibility. B) Rates of recombination between homologous and homeologous DNA sequences with different degrees of identity (%) in M. smegmatis mc² 155 (WT, green squares) and its ΔnucS derivative (red diamonds). Error bars represent 95% confidence intervals (n=16). Asterisks denote statistical significance (Likelihood ratio test under Luria-Delbruck model, Bonferroni corrected, p-value <10⁻⁴ in all cases).

Figure 4. Effects of NucS polymorphisms on mutation rates in the M. smegmatis ΔnucS surrogate model. Rates of spontaneous mutations conferring Rif-R of the M. smegmatis ΔnucS complemented with wild-type nucSTB (ΔnucS/nucSTB; red) or containing each of the 9 polymorphisms indicated (blue). Relative increases in mutation rates with respect to the control strain (ΔnucS/nucSTB; set to 1) are shown inside the column. Error bars represent 95% confidence intervals (n=20). Asterisks denote statistical significance (Likelihood ratio test under Luria-Delbruck model, Bonferroni corrected; ***p < 0.001; ** p < 0.005). Mutation rate: mutations per cell per generation.

Figure 5. Phylogenetic profiling of NucS. The NCBI taxonomic tree from 2,186 species from Bacteria (black outer label) and Archaea (blue outer label). Orange branches: NucS only; green branches: NucS and MutS-MutL. Bacteria includes Actinobacteria, , , FCB (, Chlorobi and ), Other (, , , Deinococcus-Thermus, and unclassified Terrabacteria) and other groups (, , Caldiserica, Chrysiogenetes, Deferribacteres, Dictyoglomi, , , , PVC group, , , , and unclassified bacteria). Archaea includes Euryarchaeota, TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota) and unclassified archaeal species (*). As NucS is absent in eukaryotes and viruses, these lineages were removed for clarity purposes. The tree was annotated using ggtree (http://www.bioconductor.org/packages/ggtree).

Figure 6. A model for NucS protein emergence and evolution. The unrooted Tree of Life (available and based on reference 50) was used to depict the proposed evolutionary history of NucS according to our data. The groups relevant to our model are highlighted. Coloured squares depict the NucS-NT (blue) and NucS-CT (red) terminal regions. This model proposes that NucS has an archaeal origin and emerged as a combination of two independent protein domains with complex evolutionary history. Numbers indicate the steps of the model: Both N-terminal and C-terminal regions likely emerged in the archaeal lineage (1). The CT region was transferred via HGT to very few Eukaryotes and to some Bacteria (main groups with any species having the NucS-CT region are labelled with red circles), where the CT domain combined with other regions outside the context of NucS. In the archaeal lineage, NT and CT regions fused to produce the full NucS (2). NucS expanded in many archaeal groups but was also lost in some others. The full NucS protein was transferred to Bacteria by at least two independent HGT events, one to some Deinococcus-Thermus species (3) and another to Actinobacteria (4).

[Figure 1, panels A to C. Panel B: multiple sequence alignment of NucS_Mtu, NucS_Msm and NucS_Pab, with nuclease Motifs I, II, III and QxxxY marked. Panel C: EMSA of NucS with 45-mer ssDNA and 45-bp dsDNA (50 nM); the NucS-ssDNA complex is indicated.]

[Figure 5. Taxonomic tree; outer labels: Proteobacteria, FCB group, Other groups, Euryarchaeota, TACK group (*), Firmicutes, Other Terrabacteria, Actinobacteria. Branch legend: NucS; NucS+MutS/L.]

[Figure 6. Tree of life with steps (1) to (4); legend: NucS-NT, NucS-CT, No NucS, horizontal gene transfer. Tree scale: 1.]

Supplementary Information

Supplementary Figures.

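[Supplementary Figure 1 image not reproduced. Panel A: gel with lanes 1-3 and molecular weight markers at 200, 150, 100, 75, 50, 37, 25, 20, 15 and 10 kDa; the NucS band is labelled. Panel B: dsDNA substrates labelled C-A, C-T, C-C, T-T, T-G, A-A, A-G, G-G and NO.]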

Supplementary Figure 1. Purified mycobacterial NucS does not cleave mismatch DNA substrates in vitro. A) Native M. smegmatis NucS protein purification. Recombinant NucS protein was expressed in E. coli, purified and concentrated as described (see Methods). Native NucS protein was analysed on an SDS-PAGE gel and stained with Coomassie Brilliant Blue. Lane 1: molecular marker; Lane 2: purified native M. smegmatis NucS; Lane 3: concentrated M. smegmatis NucS protein (used for EMSA and nuclease assays). B) Double-stranded DNA (30 nM), with or without the indicated internal mismatches, was incubated with 300 nM NucS. The consensus sequence is shown at the bottom and mismatch positions are indicated in red. No significant cleavage activity was observed with the 36-mers shown or with longer 75-mer mismatch substrates (not displayed here).


Supplementary Figure 2. Tools for measuring recombination rates. A) Alignment of the 517 bp overlapping sequences, with 100%, 95%, 90% and 85% identities, shared by the hyg 5′ and hyg 3′ regions and used to construct the different pRhomyco versions. Changes introduced in the original sequence are highlighted. B) Cartoon of plasmid pRhomyco before and after recombination. hyg 5′ and hyg 3′ are truncated alleles of the hyg gene carrying a 3′-terminal or 5′-terminal deletion, respectively. Both alleles overlap by 517 bp (striped regions) and are separated by a 1,200 bp region containing the aph3 gene (Kan-R). Four versions of pRhomyco were generated, carrying hyg 5′ alleles that are 100%, 95%, 90% or 85% identical to hyg 3′ in the overlapping region. Plasmids integrate into the chromosome of M. smegmatis mc²155 and its ΔnucS derivative. Site-specific recombination between the attP from the plasmid and the unique bacterial attB is promoted by the plasmid-encoded integrase.

[Supplementary Figure 3 image not reproduced. Panel A: domain schematic (N-terminal and C-terminal domains) and two views of the structural superimposition related by a 180° rotation. Residues labelled in the image: S39, R42, S54, A67, V69, R70, W75, E127, A135, R144, G157, D160, D162, T168, E174, K176, K184, Q187, Y191. Panel B: alignment shown below; the annotation row under each block is kept as extracted, but its column positions were lost.]

2VLD_Paby    25 HGGVVTIFARCKVHYEGRAKSELGEGDRIIIIKPDGSFLIHQN-KKREPVNWQPPGSKVT
NucS_Mycsme   1 M---RLVIAQCTVDYVGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWVT
NucS_Myctu    1 MSRVRLVIAQCTVDYIGRLTAHLPSARRLLLFKADGSVSVHADDRAYKPLNWMSPPCWLT
**** * * * * * ***R * * ****I

2VLD_Paby    84 FK--E--NSmISIRRRPYERLEVEIIEPYSLVVFLAEDYEELaltgSEAEmANLIFENPR
NucS_Mycsme  58 EQDTETGVALWVVENKTGEQLRITVEDIEHDSHHELGVDPGLVKDGVEAHLQALLAEHVE
NucS_Myctu   61 EESGGQ-APVWVVENKAGEQLRITIEGIEHDSSHELGVDPGLVKDGVEAHLQALLAEHIQ
S A **** * *** *

2VLD_Paby   140 VIEEGFKPIYREKPIRHGIVDVMGVDKDGNIVVLELKRRKADLHAVSQLKRYVDSLKEEY
NucS_Mycsme 118 LLGAGYTLVRREYPTPIGPVDLLCRDELGRSVAVEIKR-RGEIDGVEQLTRYLELLNRDS
NucS_Myctu  120 LLGEGYTLVRREYMTAIGPVDLLCRDERGGSVAVEIKR-RGEIDGVEQLTRYLELLNRDS
****S ****S*** * H * ****A *

2VLD_Paby   200 -GENVRGILVAPSLTEGAKKLLEKEGLEFRKLEPP 233
NucS_Mycsme 177 LLAPVAGVFAAQQIKPQARTLATDRGIRCVTLDYD 211
NucS_Myctu  179 VLAPVKGVFAAQQIKPQARILATDRGIRCLTLDYD 213
***E * ***

Supplementary Figure 3. Domain characterization of M. tuberculosis NucS and polymorphic residues. A) The upper part is a schematic representation of the homodimeric structure of P. abyssi NucS (2VLD)¹. The two distinct domains of NucS, N-terminal and C-terminal, are in blue and pink, respectively. The first residues (1-25) of the P. abyssi NucS N-terminal domain, missing in the M. tuberculosis model, are depicted as a green cartoon. The putative β-clamp binding motif in P. abyssi NucS is shown in orange. The important catalytic and DNA binding sites are depicted over the P. abyssi structure as vertical red bars with the residues numbered above. Back and front views of the structural superimposition of the resolved homodimeric structure of NucS from P. abyssi¹ and the M. tuberculosis model (purple) are presented below the scheme (colour code as described for the upper scheme). The important catalytic and DNA binding sites are depicted as red sticks, and the residues where polymorphisms described in this work have been detected are shown as green sticks over the structural superimposition. B) Multiple structure-based alignment of NucS from P. abyssi (2VLD), M. smegmatis mc²155 and M. tuberculosis CDC155. Colour code is the same as in panel A. The amino acid change of each naturally occurring M. tuberculosis polymorphism is depicted in green. Important catalytic and DNA binding residues are in red. Grey lowercase indicates regions without structural information; magenta “m” indicates seleno-Met modifications in the structure of P. abyssi. Residues of the putative β-clamp binding motif in P. abyssi NucS are shown in orange. Only regions that align among the three proteins are shown. Asterisks indicate identical residues in the three aligned sequences.
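The asterisk convention used in panel B (one asterisk per column in which all three sequences carry the same residue) can be illustrated with a minimal Python sketch. The function name `identity_marks` is hypothetical and is not part of the authors' pipeline; the three strings are only the first 12 alignment columns as they appear in the extracted panel B text, so the result describes that excerpt and not the full alignment.

```python
def identity_marks(aligned: list[str], gap: str = "-") -> str:
    """Return '*' for each column where all sequences share one residue.

    Columns containing a gap, or differing residues, get a space.
    Illustrative helper only; not taken from the paper's methods.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        same = len(set(column)) == 1 and column[0] != gap
        marks.append("*" if same else " ")
    return "".join(marks)


# First 12 alignment columns from the extracted panel B text.
block = [
    "HGGVVTIFARCK",  # 2VLD_Paby
    "M---RLVIAQCT",  # NucS_Mycsme
    "MSRVRLVIAQCT",  # NucS_Myctu
]
marks = identity_marks(block)
identical = marks.count("*")
print(marks)
print(f"{identical} of {len(marks)} columns identical")
```

Applied to a complete alignment block, the same function would return a full-width annotation row whose asterisks sit in their correct columns.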

[Supplementary Figure 4 image not reproduced. Text recovered from the flowchart:
Initial sequence searches. Structure-based definition of full-length, N-terminal and C-terminal NucS using NucS_pyrab; pHMMER and hmmsearch against a large database including 3,942 reference proteomes; hits retained at e-value < 0.0001, alignment length > 75% and bit score > 50; ranked positions of hits compared by e-value and bit score; ~370 unique archaeal/bacterial proteins; eukaryotic and viral hits discarded as lacking NucS (A).
Domain analyses I. Sequences aligned with MAFFT, profiles built with hmmbuild for the N-terminal and C-terminal regions and searched with hmmsearch under the same thresholds; NucS is built on two distinct domains (B) (Supp. Fig. 3).
Domain analyses II. PFAM PF01939 sequences extracted and aligned (MAFFT, hmmalign); RAxML phylogenetic trees for full-length, N-terminal and C-terminal NucS (Supp. Figs 5-7); NucS in some bacteria has been transferred from Archaea (C).
Phylogenetic profiling. MutS and MutL profiles (3D-based identification, hmmbuild) searched with hmmsearch against 2,841 prokaryotic genomes (bit score > 50, alignment length > 75%). Genome labels: NucS, MutS/L, NucS-MutS/L, NucS-only (confident absence of MutS/L), LikelyNot, Undef NucS and Undef MutS/L (unassembled WGS). Translated searches against the genome were run only for complete genomes of the phylum Actinobacteria and of Archaea. NucS shows a disperse distribution pattern (D) (Figure 5, Supplementary data file 1).
Proposed evolution of NucS (Figure 6): combines (A) to (D) with the Tree of Life.]

Supplementary Figure 4. Computational procedures to analyse NucS distribution and evolution. Each panel depicts a different protocol. Yellow fonts indicate figures and/or tables. Yellow uppercase letters are pieces of evidence used in the model presented in Fig. 6 (main text).
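The hit thresholds and genome labels in the Supplementary Figure 4 flowchart can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Hit` record, the function names, the reading of "alignment length > 75%" as the aligned fraction of the profile length, and the exact branch order of the profiling procedure are all assumptions, since the flowchart layout is not recoverable from the text alone. Only the numeric thresholds and the label strings are taken from the figure.

```python
from dataclasses import dataclass

# Thresholds quoted in the Supplementary Figure 4 flowchart.
MAX_EVALUE = 1e-4
MIN_COVERAGE = 0.75   # assumption: aligned fraction of the profile length
MIN_BITSCORE = 50.0


@dataclass
class Hit:
    """One profile-search hit (hypothetical record, not an HMMER type)."""
    evalue: float
    bitscore: float
    aligned_len: int
    profile_len: int


def passes_filter(hit: Hit) -> bool:
    """Apply the three cut-offs together."""
    coverage = hit.aligned_len / hit.profile_len
    return (hit.evalue < MAX_EVALUE
            and coverage > MIN_COVERAGE
            and hit.bitscore > MIN_BITSCORE)


def call_presence(protein_hit: bool, assembled: bool,
                  translated_hit: bool) -> str:
    """Call one protein family as present, likely absent or undefined."""
    if protein_hit:
        return "present"
    if not assembled:
        # Unassembled WGS: a missing protein hit is not trusted.
        return "undefined"
    # Assembled genome: fall back on a translated search of the genome.
    return "present" if translated_hit else "likely_absent"


def combine(nucs: str, mutsl: str) -> str:
    """Map the two calls onto the labels used in the flowchart."""
    if nucs == "undefined":
        return "Undef NucS"
    if mutsl == "undefined":
        return "Undef MutS/L"
    if nucs == "present":
        return "NucS-MutS/L" if mutsl == "present" else "NucS-only"
    return "MutS/L" if mutsl == "present" else "LikelyNot"


example = combine(call_presence(True, True, False),
                  call_presence(False, True, False))
print(example)
```

The sketch collapses the flowchart's footnote (translated searches only for complete genomes of Actinobacteria and Archaea) into a single `assembled` flag, which is a simplification.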

[Supplementary Figure 5 image not reproduced: tree with leaf labels giving species names; clade labels ARCHAEA, Deinococcus and ACTINOBACTERIA; tree scale: 0.1.]

Supplementary Figure 5. Phylogenetic analyses of NucS. Unrooted ML tree of full NucS sequences from the PFAM PF01939. Black branches are Actinobacteria; blue branches are Archaea; red branches are species from the Deinococcus-Thermus group. Labels indicate species name. Red circles represent >80% bootstrap in 1,500 replicates.

[Tree figure image not reproduced: tree with numbered leaf labels (species names); the extracted label text is too fragmented to reconstruct.]

-Streptomyce e s gr rae bsp. yces sp.st.SPB074 ec TCC.2988 AT sp.NRR s sp.N 4728 144 12 s ro 135-Str myc NRRL yces sp.CNQ-5 .NRRL F-4428 secym su teria bacte s 131 omyc ggensis.st. CC C. omyces 124 om a CC 132- ceusniger 122-Strept ise B t s repto ei a.s pt pto treptomy elt u TC 9 -Streptomyc otpert 0 1 tomyces xiam sirp r -1 .st. . 89 163 omir t .B 51 ptomyces vi hen us s t romogenus subsp.osci -S ptom setae.st.A Streptomy RR t.A s i statinicussecymotpertS- ac -59 C. ta S-4 6 h 35 A 121 18-Streptomyces sp.M rep A 144 -Stre a 30 reptomyces an 5 1 17-St 113-Streptomy hromogenes.st.DSM 40 L TC 19 8 tinobac su A-2 72 TCC.2 0 copicu 119 115-Streptomyces sp.WM6386 94-07 O 1 e L F ubsp.gri

3 S 9 109- -Stre s F-6602 -69 44 99 1 TCC.21838 C.19995 127-Strept os - V450 125-Stre 6-St us -Ac 8 p 165 .s 649 112-Strept 103-Saccha

b i s him 3 ptomyces avermitilis.st. t 10 10 s 14 ar 120-St 105-Stre 2-Streptomyces rubellomu .A 1-Actinobacteria bacterium 1 3218 -E hygr 1 p TCC 7-Streptomyces l

t.A 1 1891 s viridoc . TCC.10712 11 yce 10

104-Streptomyces sp.NRRL S-495 seu 10 -Stre itasatospora T

1000RGA su

yce TA si CC.27064 .3

omir -K

C

s.s m s myces 116 3331 -Streptomyces viola eptomyces bingc

u

to r

s 79 101 .C t.JCM 4626

pto 99 2

St re trep

St 98- 128-Streptomyces roseoc 6845 CCTA 0-S 100-Streptom

.

1 123- 11

90

7

Tree scale 0

Supplementary Figure 6. Phylogenetic analyses of the NucS C-terminal (CT) region. Unrooted ML tree of CT sequences from the PFAM PF01939 domain. Grey font indicates that the domain is in NucS and blue font that it is found outside NucS. Black branches, Actinobacteria; red, Deinococcus-Thermus; grey, other Bacteria; light blue, Archaea; green, eukaryotic sequences. Bold font indicates the main species used in this study. Labels indicate species names. The circles represent >80% bootstrap support in 1,500 replicates.


[Supplementary Figure 7: tree image. Leaves are numbered 1-368 and labelled with species and strain names; clade labels in the figure: ARCHAEA, Deinococcus, ACTINOBACTERIA. Tree scale: 1.]

Supplementary Figure 7. Phylogenetic analyses of the NucS N-terminal (NT) region. Unrooted ML tree of NT sequences from the PFAM PF01939 domain. Black branches, Actinobacteria; red, Deinococcus-Thermus; light blue, Archaea. Labels indicate species names. The circles represent >80% bootstrap support in 1,500 replicates.

Supplementary Tables

Supplementary Table 1. Mutation rates of M. smegmatis and its ∆nucS derivatives. Rates of spontaneous mutations conferring rifampicin (Rif-R) and streptomycin resistance (Str-R) of M. smegmatis mc² 155 (WT), M. smegmatis ∆nucS and M. smegmatis ∆nucS complemented with the wild-type nucS from M. smegmatis mc² 155 (nucSSm). Fold change indicates the increase in mutation rate with respect to the strain M. smegmatis mc² 155 (set to 1). Mut Rate: mutation rate (mutations per cell per generation).

Strain          Genotype      Mut Rate Rif  95% CI           Fold   Mut Rate Str  95% CI            Fold
mc² 155         WT            2.07×10⁻⁹     1.47-2.74×10⁻⁹   1      5.58×10⁻¹⁰    2.25-11.1×10⁻¹⁰   1
∆nucS           ∆nucS         3.11×10⁻⁷     2.81-3.39×10⁻⁷   150.2  4.82×10⁻⁸     3.53-6.17×10⁻⁸    86.3
∆nucS/nucSSm    Complemented  3.02×10⁻⁹     2.2-3.97×10⁻⁹    1.46   1.43×10⁻⁹     0.77-2.33×10⁻⁹    2.56

Supplementary Table 2. Mutational spectrum of M. smegmatis mc² 155 and its ∆nucS derivative. A) Spontaneous mutations conferring rifampicin resistance found in 41 independent Rif-R mutants in the WT and ∆nucS strains. The columns show the position of the mutations in the rpoB gene sequence, the codon change (modified bases are in bold), the amino acid change caused by each mutation and the number of independent Rif-R mutants isolated from mc² 155 (WT) or its ΔnucS derivative. B) Specificity of base substitutions, summarized by class, produced in the WT and ∆nucS strains.

A)

Position                  Codon change  Amino acid change                   WT  ΔnucS
A 1295 G                  GAC→GGC       Asp 432 Gly                         0   1
C 1324 T                  CAC→TAC       His 442 Tyr                         6   6
C 1324 G                  CAC→GAC       His 442 Asp                         2   0
A 1325 G                  CAC→CGC       His 442 Arg                         13  20
A 1325 C                  CAC→CCC       His 442 Pro                         4   0
G 1334 A                  CGT→CAT       Arg 445 His                         0   6
G 1334 C                  CGT→CCT       Arg 445 Pro                         1   0
G 1334 T                  CGT→CTT       Arg 445 Leu                         2   0
C 1340 T                  TCG→TTG       Ser 447 Leu                         9   6
C 1340 G                  TCG→TGG       Ser 447 Trp                         2   0
T 1346 C                  CTG→CCG       Leu 449 Pro                         0   2
G 1348 A                  GGC→AGC       Gly 450 Ser                         1   0
Δ CCAGCTGTC (1275-1283)   CCAGCTGTC     Ser 425 Arg + ΔGlnLeuSer (426-428)  1   0
TOTAL                                                                       41  41


B)

Mutation   WT (%), n=41  ∆nucS (%), n=41
G:C→A:T    16 (39.0)     18 (43.9)
A:T→G:C    13 (31.7)     23 (56.1)
G:C→T:A    2 (4.9)       0
A:T→T:A    0             0
G:C→C:G    5 (12.2)      0
A:T→C:G    4 (9.7)       0
Deletions  1 (2.4)       0
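The class summary in panel B can be derived from the base changes listed in panel A. The following is a minimal sketch of that bookkeeping, not the procedure used by the authors: the function and variable names are illustrative, and only the counts are taken from panel A (the single 9-bp deletion is left out, so the WT substitutions sum to 40).

```python
from collections import Counter

# Watson-Crick partner of each base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The six strand-symmetric substitution classes listed in panel B.
CLASSES = ["G:C→A:T", "A:T→G:C", "G:C→T:A", "A:T→T:A", "G:C→C:G", "A:T→C:G"]

# (reference base, rpoB position, mutant base, WT count, ΔnucS count), from panel A.
SUBSTITUTIONS = [
    ("A", 1295, "G", 0, 1),
    ("C", 1324, "T", 6, 6),
    ("C", 1324, "G", 2, 0),
    ("A", 1325, "G", 13, 20),
    ("A", 1325, "C", 4, 0),
    ("G", 1334, "A", 0, 6),
    ("G", 1334, "C", 1, 0),
    ("G", 1334, "T", 2, 0),
    ("C", 1340, "T", 9, 6),
    ("C", 1340, "G", 2, 0),
    ("T", 1346, "C", 0, 2),
    ("G", 1348, "A", 1, 0),
]


def substitution_class(ref, alt):
    """Return the class of a ref→alt change, merging the two DNA strands."""
    # Describe pyrimidine changes from the complementary (purine) strand,
    # so that C→T and G→A both fall into G:C→A:T.
    if ref in ("C", "T"):
        ref, alt = PAIR[ref], PAIR[alt]
    return f"{ref}:{PAIR[ref]}→{alt}:{PAIR[alt]}"


def tally(column):
    """Sum the counts in the given tuple column (3 = WT, 4 = ΔnucS) per class."""
    counts = Counter()
    for row in SUBSTITUTIONS:
        counts[substitution_class(row[0], row[2])] += row[column]
    return counts


wt_counts = tally(3)
dnucs_counts = tally(4)
for name in CLASSES:
    print(name, wt_counts[name], dnucs_counts[name])
```

In this tally all 41 ΔnucS mutations fall into the two transition classes (G:C→A:T and A:T→G:C), in agreement with panel B.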

Supplementary Table 3. Mutation rates of S. coelicolor A3(2) M145 and its ∆nucS derivatives. Rates of spontaneous mutations conferring rifampicin (Rif-R) and streptomycin resistance (Str-R) of S. coelicolor A3(2) M145, its ∆nucS derivative and S. coelicolor ∆nucS complemented with the wild-type nucS from S. coelicolor (nucSSco). 95% confidence intervals (CI) are indicated. Mut Rate: mutation rate (mutations per cell per generation).

Strain                    Genotype      Mut Rate Rif-R  95% CI           Fold   Mut Rate Str-R  95% CI            Fold
S. coelicolor A3(2) M145  WT            5.11×10⁻⁹       2.87-8.01×10⁻⁹   1      4.30×10⁻¹⁰      0.24-18.9×10⁻¹⁰   1
∆nucS                     ∆nucS         5.54×10⁻⁷       4.20-6.97×10⁻⁷   108.4  8.49×10⁻⁸       5.78-11.4×10⁻⁸    197.4
∆nucS/nucSSco             Complemented  2.01×10⁻⁹       0.83-3.88×10⁻⁹   0.5    4.30×10⁻¹⁰      0.24-18.9×10⁻¹⁰   1

Supplementary Table 4. Characteristics of the M. tuberculosis representative strains containing NucS polymorphisms. Genome ID/common name, NucS polymorphism, resistance profile, lineage and origin of the strains are shown. Resistance profile indicates not detected/unknown antibiotic resistances (Susceptible) or MDR, multidrug-resistant strain (expressing at least rifampicin and isoniazid resistance).

GenomeID/name  Polymorphism  Resistance profile  Lineage  Origin
CDC1551        WT            Susceptible         4        North America
TKK_02_0079    S39R          MDR                 4        South Africa
MTB_N1057      S54I          Susceptible         4        South Asia
KT-0040        A67S          Susceptible         2        S. Korea (Broad Inst)
ERR036236      V69A          Susceptible         1        Unknown
BTB 04-388     A135S         MDR                 3        Sweden (Broad Inst)
BTB 07-246     R144S         MDR                 4        Sweden (Broad Inst)
TKK_03_0044    D162H         Susceptible         4        South Africa
HN2738         T168A         Unknown             Unknown  Unknown (Broad Inst)
MTB_X632       K184E         MDR                 4        Central America


Supplementary Table 5. Effect of M. tuberculosis NucS naturally occurring polymorphisms on mutation rates. Rates of spontaneous mutations conferring rifampicin resistance (Mut rate, mutations per cell per generation) of M. smegmatis ∆nucS complemented with the wild-type nucS from M. tuberculosis (nucSTB) or the 9 polymorphic alleles. 95% confidence intervals (CI) are shown. Fold change indicates the increase in mutation rate with respect to the strain M. smegmatis ∆nucS complemented with wild-type nucSTB (∆nucS/nucSTB), set to 1.

Amino acid change          Codon change          Mut rate   95% CI            Fold change
M. smegmatis ∆nucS/nucSTB  nucS from TB CDC1551  4.17×10⁻⁹  3.29-5.12×10⁻⁹    1
S39R                       AGC→AGG               3.49×10⁻⁷  2.93-4.0×10⁻⁷     83.7
S54I                       AGT→ATT               7.94×10⁻⁹  5.49-11.07×10⁻⁹   1.9
A67S                       GCG→TCG               2.78×10⁻⁹  1.87-3.84×10⁻⁹    0.7
V69A                       GTG→GCG               8.77×10⁻⁹  6.50-11.2×10⁻⁹    2.1
A135S                      GCG→TCG               3.57×10⁻⁸  2.77-4.38×10⁻⁸    8.6
R144S                      CGC→AGC               3.16×10⁻⁸  2.40-4.0×10⁻⁸     7.6
D162H                      GAC→CAC               8.98×10⁻⁹  6.42-11.80×10⁻⁹   2.2
T168A                      ACC→GCC               3.96×10⁻⁸  3.09-4.81×10⁻⁸    9.5
K184E                      AAG→GAG               3.06×10⁻⁸  2.37-3.75×10⁻⁸    7.3
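As the caption states, the fold change is the ratio of each allele's mutation rate to that of the strain complemented with wild-type nucSTB. The following is a minimal sketch of that arithmetic using the point estimates from the table; the dictionary and function names are illustrative, and rounding to one decimal place is an assumption made here to match the table.

```python
# Rif-R mutation rates (mutations per cell per generation), copied from
# Supplementary Table 5. "WT" is M. smegmatis ΔnucS/nucSTB (CDC1551 allele).
RATES = {
    "WT": 4.17e-9,
    "S39R": 3.49e-7,
    "S54I": 7.94e-9,
    "A67S": 2.78e-9,
    "V69A": 8.77e-9,
    "A135S": 3.57e-8,
    "R144S": 3.16e-8,
    "D162H": 8.98e-9,
    "T168A": 3.96e-8,
    "K184E": 3.06e-8,
}


def fold_change(rate, reference):
    """Ratio of a mutation rate to the reference rate (reference is set to 1)."""
    return rate / reference


# One decimal place, as reported in the table.
folds = {
    allele: round(fold_change(rate, RATES["WT"]), 1)
    for allele, rate in RATES.items()
}

for allele, fold in folds.items():
    print(f"{allele:6s} {fold}")
```

Because the published fold changes were presumably computed from unrounded rates, the same calculation applied to other tables can differ from the reported value in the last digit.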


Supplementary Table 6. Sequence of oligonucleotides used in this work.

Oligonucleotide Sequence (5’-3’)                             Purpose
SecMycomar      CCCGAAAAGTGCCACCTAAATTGTAAGCG                Localization of Tn insertion site
DelnucSm5F      CCCGCTGCAGCTGGCCGAGTTCGG                     nucSSm M. smegmatis deletion
DelnucSm5R      CGGTAAGCTTGGCTATCACGAGGCGCACCC               nucSSm M. smegmatis deletion
DelnucSm3F      CGGAAAGCTTAGCGACGAGTACCGGCTCTT               nucSSm M. smegmatis deletion
DelnucSm3R      AATCGTCGACCGAACCCATCAACTTACCGA               nucSSm M. smegmatis deletion
CompnucTBF      ACTGGAATTCTCGAGTGGTGGCCTTCTCGGATGGCAT        nucSTB for ΔnucSSm complementation
CompnucTBR      ACTGAAGCTTTCAGAACAGCCGGTACTCGCCGCT           nucSTB for ΔnucSSm complementation
CompnucSmF      ACTGGAATTCCCGCGCCAGCGAATTGTCGGCGTTCAT        nucSSm for ΔnucSSm complementation
CompnucSmR      CGGCAAGCTTTCAGAAGAGCCGGTACTCGTCGCT           nucSSm for ΔnucSSm complementation
RifRRDRrpoBF    GTGGCGGCGATCAAGGAGTTCTTC                     RRDR-rpoB amplification Sm
RifRRDRrpoBR    GGCGACCGACACCATCTGGCGCGG                     RRDR-rpoB amplification Sm
DelnucSco5F     GTGTAAGCTTCGGCACCGCGGTGAGTGTGC               nucSSco S. coelicolor deletion
DelnucSco5R     CGACGGATCCGCGGGCAATGACGAGACGCA               nucSSco S. coelicolor deletion
DelnucSco3F     GCTGGGATCCTTCTGAGGGCGCGACGCGTC               nucSSco S. coelicolor deletion
DelnucSco3R     CAGGGATATCTGCCCGCCCTGGTCGGCGAG               nucSSco S. coelicolor deletion
CompnucScoF     TACGGAATTCGGGTTCTCCTCTCGCACCCCCGACCAGCAGGGG  nucSSco for ΔnucSSco complementation
CompnucScoR     TACGGGATCCTCAGAACAGCCGCAGCTTGTCGTCCTCGATGCC  nucSSco for ΔnucSSco complementation
Rechyg3F        CAGTTTCATTTGATGCTCGATGAG                     Recombination. pRhomyco 100%
Rechyg3R        GACTAACTAGTCAGGCGCCGGGGGCGGTGT               Recombination. pRhomyco 100%
Rechyg5F        GATGCAGCTGGGAGTGCGCTGTGACACAAGAATCC          Recombination. pRhomyco 100%
Rechyg5R        CGATAAGCTTCGAATTCTGCAGCTCG                   Recombination. pRhomyco 100%, 95%, 90% and 85%
Rechyg5intR     GATGTGATCACCGGGTCGGGCTCG                     Recombination. pRhomyco 95%, 90% and 85%
Rec95-BclIF     GATGTGATCAAGCTGTTCGGGGAGCACTG                Recombination. pRhomyco 95%
Rec95-NheIR     GATGGCTAGCGAAGTCGACGATCCCGGTGA               Recombination. pRhomyco 95%
Rec90-85-BclIF  GATGTGATCAAGCTCTTCGGGGAACACTG                Recombination. pRhomyco 90% and 85%
Rec90-85-PciIR  GATGACATGTGAAGTCGACGATCCCGGTGA               Recombination. pRhomyco 90% and 85%
S39RF           GGCCGACGGATCGGTCAGGGTACATGCTGACGACCG         Site-directed mutagenesis nucSTB
S39RR           CGGTCGTCAGCATGTACCCTGACCGATCCGTCGGCC         Site-directed mutagenesis nucSTB
S54IF           CCGTTGAACTGGATGATTCCGCCGTGCTGGTTG            Site-directed mutagenesis nucSTB
S54IR           CAACCAGCACGGCGGAATCATCCAGTTCAACGG            Site-directed mutagenesis nucSTB
A67SF           AGAGTCCGGCGGCCAGTCGCCAGTGTGGGTGGTCG          Site-directed mutagenesis nucSTB
A67SR           CGACCACCCACACTGGCGACTGGCCGCCGGACTCTTC        Site-directed mutagenesis nucSTB
V69AF           CCGGCGGCCAGGCGCCAGCGTGGGTGGTCGAGAAC          Site-directed mutagenesis nucSTB
V69AR           GTTCTCGACCACCCACGCTGGCGCCTGGCCGCCGG          Site-directed mutagenesis nucSTB
A135SF          GCCGCGAGTACATGACCTCGATCGGACCCGTCGAC          Site-directed mutagenesis nucSTB
A135SR          GTCGACGGGTCCGATCGAGGTCATGTACTCGCGGC          Site-directed mutagenesis nucSTB
A162HF          GCGGCGTGGCGAGATCCACGGCGTGGAGCAGCTGAC         Site-directed mutagenesis nucSTB
A162HR          GTCAGCTGCTCCACGCCGTGGATCTCGCCACGCCGC         Site-directed mutagenesis nucSTB
T168AF          GCGTGGAGCAGCTGGCCCGCTACCTCGAGTTGC            Site-directed mutagenesis nucSTB
T168AR          GCAACTCGAGGTAGCGGGCCAGCTGCTCCACGC            Site-directed mutagenesis nucSTB
K184EF          GTGCTCGCGCCGGTCGAGGGGGTGTTTGCCG              Site-directed mutagenesis nucSTB
K184ER          CGGCAAACACCCCCTCGACCGGCGCGAGCAC              Site-directed mutagenesis nucSTB
pvv16-seq-fwd   CGGTGAGTCGTAGGTCGGGACGG                      PCR amplification and sequencing
pvv16-seq-rev   TGCCTGGCAGTCGATCGTACGCTAG                    PCR amplification and sequencing
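A quick consistency check on transcribed mutagenesis primers is to confirm that the forward and reverse oligonucleotides of a pair are reverse complements of each other. The sketch below does this for two pairs from the table (S39R and K184E); the helper name is illustrative. It is not assumed that every pair in the table is an exact reverse complement, since some pairs differ in length.

```python
# Translation table for the complement of an unambiguous DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer pairs copied from Supplementary Table 6.
PAIRS = {
    "S39R": (
        "GGCCGACGGATCGGTCAGGGTACATGCTGACGACCG",
        "CGGTCGTCAGCATGTACCCTGACCGATCCGTCGGCC",
    ),
    "K184E": (
        "GTGCTCGCGCCGGTCGAGGGGGTGTTTGCCG",
        "CGGCAAACACCCCCTCGACCGGCGCGAGCAC",
    ),
}

for name, (forward, reverse) in PAIRS.items():
    is_complementary = reverse_complement(forward) == reverse
    print(name, "fully complementary pair:", is_complementary)
```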

Supplementary Data.

Supplementary Data 1. Taxonomic distribution of NucS in reference proteomes (large Excel data file).

Supplementary Data 2. Taxonomic tree for phylogenetic profiling in newick format. Raw NCBI profiling tree from archaeal and bacterial species extracted from NCBI in newick format, corresponding to Figure 5.

Supplementary Data 3. Phylogenetic tree of full NucS in newick format. Raw Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 5 (full NucS) in newick format.

Supplementary Data 4. Detailed information for phylogenetic trees of NucS regions. Sequence details of NucS-CT and NucS-NT labels on Supplementary Figures 6-7 trees (Excel file).

Supplementary Data 5. Phylogenetic tree of C-terminal NucS in newick format. Raw Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 6 (NucS-CT region) in newick format.

Supplementary Data 6. Phylogenetic tree of N-terminal NucS in newick format. Raw Maximum Likelihood and bootstrap phylogenetic tree corresponding to Supplementary Figure 7 (NucS-NT region) in newick format.
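The trees in Supplementary Data 2, 3, 5 and 6 are distributed in newick format, in which leaves are the labels that directly follow an opening parenthesis or a comma. As a minimal sketch of how leaf labels could be listed from such a file, the example below uses a regular expression on a small hypothetical tree: the three labels follow the numbered style of the Supplementary Figure 7 leaves, but the topology, branch lengths and support value are invented for illustration. Quoted labels and comments, which the newick format also allows, are not handled.

```python
import re

# A leaf label follows "(" or "," and runs until the next structural character.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;]+)")


def leaf_labels(newick):
    """Return the leaf labels of a newick string, in order of appearance."""
    return [label.strip() for label in LEAF_PATTERN.findall(newick)]


def split_label(label):
    """Split a 'number-Species_name' label into (number, name)."""
    number, name = label.split("-", 1)
    return int(number), name


# Hypothetical tree: branch lengths and the support value 95 are invented.
example = (
    "((278-Mycobacterium_tuberculosis.st.ATCC.25618:0.1,"
    "279-Mycobacterium_bovis:0.1)95:0.2,"
    "271-Mycobacterium_smegmatis.st.ATCC.700084:0.3);"
)

for label in leaf_labels(example):
    print(split_label(label))
```

For real analyses a dedicated phylogenetics library is a safer choice than a regular expression, since it also handles quoting and nested annotations.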


Supplementary Methods.

Generation of a ΔnucS knockout mutant in M. smegmatis. To generate a ΔnucS knockout mutant in M. smegmatis, 1.0-kb PstI-HindIII upstream and HindIII-SalI downstream fragments of nucS were amplified by PCR, using the delnucSm 5F, 5R, 3F and 3R primers, and cloned in-frame into the p2NIL vector. Then, a pGOAL19 PacI cassette, carrying a β-galactosidase lacZ gene, a hygromycin-resistance hyg gene and a sacB gene that confers sucrose sensitivity, was also inserted into the p2NIL vector. The resulting plasmid, p2NIL-ΔnucSSm, harbouring an in-frame deletion of the target gene (lacking ≈ 95% of the gene sequence), was electroporated into M. smegmatis mc² 155. Cells were plated on Middlebrook 7H10 agar-X-gal (100 µg ml⁻¹) plus kanamycin (25 µg ml⁻¹) and hygromycin (50 µg ml⁻¹) and incubated for 4-5 days at 37 °C. Single-crossover merodiploid clones were grown in 7H9 broth without antibiotics to allow a second crossover event. Cultures were diluted and counter-selected on Middlebrook 7H10-X-gal (100 µg ml⁻¹) plates containing 2% sucrose and incubated for 4-5 days at 37 °C. Double-crossover clones were tested for kanamycin and hygromycin susceptibility to confirm the loss of the plasmid. Finally, M. smegmatis mc² 155 ΔnucS colonies were tested by PCR and sequencing to verify the unmarked nucS deletion.
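The restriction sites named for the upstream (PstI-HindIII) and downstream (HindIII-SalI) fragments can be located in the four deletion primers listed in Supplementary Table 6. The following is a minimal sketch of that check; the recognition sequences are the standard ones for these enzymes, and the function and dictionary names are illustrative.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {"PstI": "CTGCAG", "HindIII": "AAGCTT", "SalI": "GTCGAC"}

# Deletion primers copied from Supplementary Table 6 (5'-3').
PRIMERS = {
    "DelnucSm5F": "CCCGCTGCAGCTGGCCGAGTTCGG",
    "DelnucSm5R": "CGGTAAGCTTGGCTATCACGAGGCGCACCC",
    "DelnucSm3F": "CGGAAAGCTTAGCGACGAGTACCGGCTCTT",
    "DelnucSm3R": "AATCGTCGACCGAACCCATCAACTTACCGA",
}


def sites_in(primer):
    """Names of the enzymes whose recognition sequence occurs in the primer."""
    # All three sites are palindromic, so scanning one strand is sufficient.
    return [name for name, site in SITES.items() if site in primer]


for name, sequence in PRIMERS.items():
    print(name, sites_in(sequence))
```

The 5F/5R primers carry the PstI and HindIII sites and the 3F/3R primers carry the HindIII and SalI sites, which is consistent with the two fragments being joined at the shared HindIII site.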

Complementation of the M. smegmatis ΔnucS mutant. For complementation with nucS from M. smegmatis mc² 155, the MSMEG_4923 gene (nucSSm), including its own 46-bp promoter region, was amplified by PCR with primers compnucSmF and compnucSmR, digested with EcoRI and HindIII and cloned into the integrative vector pMV361, rendering the complementation vector pMV-nucSSm. Similarly, for complementation with nucS from M. tuberculosis, the wild-type full-length MT_1321 gene from the CDC1551 control strain (nucSTB), with its own 61-bp upstream promoter region, was amplified by PCR using the compnucTBF and compnucTBR primers and cloned into the pMV361 vector to generate the complementation vector pMV-nucSTB. Putative complemented mutants were obtained upon electroporation of the plasmids into M. smegmatis mc² 155 ΔnucS and incubation of the plated samples on Middlebrook 7H10 agar plus kanamycin (25 µg ml⁻¹) for 3-5 days at 37 °C. Finally, M. smegmatis mc² 155 ΔnucS complemented mutants were analysed by PCR and sequencing to verify the proper insertion of the genes.

Generation of ΔnucS S. coelicolor knockout mutant and complementation. The pIJ6650 vector was used to clone in-frame two 1.5-kb DNA fragments, one HindIII-BamHI nucS upstream fragment plus one BamHI-EcoRV nucS downstream fragment, previously amplified by PCR (primers delnucSco 5F, 5R, 3F and 3R). E. coli ET12567 (pUZ8002) was transformed with pIJ-ΔnucSSco (containing the in-frame deletion of the nucSSco gene) and conjugated with S. coelicolor A3(2) M145. Following the isolation of apramycin-resistant single-crossover clones, putative double-crossover mutants were isolated and the unmarked deletion of the nucSSco gene was verified by PCR. S. coelicolor ΔnucS was complemented with a wild-type copy of nucSSco cloned in pSET152. pSET152 is a non-replicative plasmid in Streptomyces that carries the attP site and the integrase gene of the ϕC31 phage and consequently can integrate into the attB site² in the chromosome of Streptomyces. The nucSSco gene with its own promoter was amplified by PCR with primers compnucScoF and compnucScoR and cloned into pSET152 using the EcoRI and BamHI sites to give the pSET-nucSSco plasmid, which was introduced into E. coli ET12567 (pUZ8002) by transformation. Finally, pSET-nucSSco from E. coli ET12567 (pUZ8002, pSET-nucSSco) was introduced into S. coelicolor ΔnucS by conjugation in order to integrate the construct into the chromosome.

Construction of pRhomyco plasmids. To create a template for measuring homologous and homeologous recombination in M. smegmatis, we designed a recombination assay based on the hygromycin-resistance (Hyg-R) gene hyg, using the integrative plasmid pMV361. For pRhomyco 100%, a fragment of the hyg gene designated hyg 3′ (from nucleotide 195 of the coding sequence to the stop codon) was PCR-amplified from the pRAM vector with primers rechyg3F and rechyg3R, digested with EcoRV/SpeI and cloned into the StuI/SpeI sites of pMV361. In addition, a 711-bp fragment designated hyg 5′ (from the start codon of the hyg gene to nucleotide 711) was PCR-amplified with primers rechyg5F and rechyg5R and cloned using PvuII/HindIII. The two fragments share an overlapping hyg region of 517 bp (nucleotides 195 to 711). The homeologous recombination vectors, named pRhomyco 95%, 90% and 85%, were constructed by replacing the overlapping 517-bp region of hyg 5′ with synthetic fragments that have 95%, 90% or 85% sequence similarity to the original hyg fragment (Supplementary Fig. 2). First, a fragment common to all three vectors (nucleotides 1 to 193 of hyg) was PCR-amplified with primers rechyg5F and rechyg5intR and cloned using PvuII/BclI. Then, each synthetic fragment was PCR-amplified and cloned: for pRhomyco 95%, using BclI/NheI with primers rec95-BclIF and rec95-NheIR; for pRhomyco 90% and 85%, using BclI/PciI-BspHI with primers rec90-85-BclIF and rec90-85-PciIR.
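The fragment coordinates given above fix the size of the shared region, and the nominal similarity levels suggest roughly how many substitutions each synthetic fragment carries. The following is a minimal sketch of that arithmetic, not part of the original methods. The function names are illustrative, the mismatch counts are rounded estimates only (the actual synthetic sequences are shown in Supplementary Fig. 2), and the short test strings are invented.

```python
def overlap_length(start_3prime: int, end_5prime: int) -> int:
    """Length of the region shared by hyg 5' (nt 1..end_5prime) and
    hyg 3' (nt start_3prime..stop), using 1-based inclusive coordinates."""
    return max(0, end_5prime - start_3prime + 1)


def estimated_mismatches(length: int, similarity_percent: int) -> int:
    """Approximate number of substitutions implied by a nominal similarity."""
    return round(length * (100 - similarity_percent) / 100)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# hyg 3' starts at nucleotide 195; hyg 5' ends at nucleotide 711.
overlap = overlap_length(195, 711)
print(f"overlap: {overlap} bp")

for similarity in (95, 90, 85):
    n = estimated_mismatches(overlap, similarity)
    print(f"pRhomyco {similarity}%: about {n} substitutions in {overlap} bp")

# Toy check of the identity calculation on invented 10-nt strings.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

The overlap evaluates to 517 bp, matching the figure stated in the text.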

Computational analyses. A summary of all the computational approaches is shown in Supplementary Fig. 4. For NucS, we used the structure-based alignment of bacterial and archaeal NucS (Supplementary Fig. 3) and then followed the procedure depicted in Supplementary Fig. 4 to conduct sequence-based domain analyses. Using the regions defined by structural bioinformatics, we first conducted sequence searches against the large database. We made non-redundant alignments of the N-terminal (NT) region (63 sequences) and the C-terminal (CT) region (39 sequences) and built profile hidden Markov models (HMMs). These profiles were searched using pHMMER 3 against the large Reference Proteomes file, where additional proteins containing the CT and NT regions were found. While the NT region was found in alternative archaeal proteins (after checking the alignment we selected two, F7PKA0_9EURY and L0I7R5_HALRX; DUF91, PF01939), the CT region was found not only in additional prokaryotic groups but also in eukaryotic sequences (I1F1Q3_AMPQE, I1F1Q5_AMPQE, B3SFT8_TRIAD, B9T981_RICCO and A0A015J6U4_9GLOM). With the exception of A0A015J6U4 and B9T981, the matching regions are located within an uncharacterized DUF1016 domain. We checked the alignments to confirm that the catalytic residues, as described for the P. abyssi structure (ref. 1), were conserved. We next searched the PFAM database with NUCS_MYCTU and found a domain, PF01939, which was trained on three full-length NucS sequences. Of the PF01939 domain entries (539 sequences in 464 species), only 368 entries in 363 archaeal and bacterial species matched the full protein (Supplementary data file 1). To identify MutS and MutL in bacterial and archaeal species, we generated HMMs from different proteins chosen to capture most of the bacterial and archaeal diversity (Supplementary Fig. 4).
The MutS profile was trained with the following sequences: MUTS_ECOLI, C1F256_ACIC5, MUTS_AQUAE, MUTS_BACTN, MUTS_CHLPN, MUTS_CHLTE, MUTS_CHLAA, WP_027388865.1, MUTS_PROM3, B5YE41_DICT6, MUTS_BACSU, MUTS_STAAM, MUTS_CLOTE, MUTS_FUSNN, C1AEM6_GEMAT, A6DUF8_9BACT, MUTS_RHOBA, MUTS_BRUME, MUTS_NEIMA, MUTS_SALTY, MUTS_VIBC3, MUTS_PSEAE, MUTS_DESVH, MUTS_TREPA, A0A075WU36_9BACT, MUTS_THEMA and WP_009958798.1. The MutL profile was trained with the following sequences: MUTL_ACIC5, MUTL_AQUAE, MUTL_BACTN, MUTL_CHLPN, MUTL_CHLTE, A9WJ86_CHLAA, MUTL_DEIRA, MUTL_DICT6, D9S9H6_FIBSS, MUTL_BACSU, MUTL_STAAM, MUTL_CLOTE, Q8RG56_FUSNN, MUTL_GEMAT, A6DHB3_9BACT, Q7UMZ3_RHOBA, MUTL_BRUME, MUTL_NEIMA, A0A0C5TQ33_SALTM, MUTL_VIBCH, MUTL_PSEAE, Q72ET5_DESVH, MUTL_TREPA, A0A075WSF3_9BACT, MUTL_THEMA and MUTL_ECOLI. The models were searched against each reference proteome, and hits with bit scores < 50 were discarded before the results were analyzed. Only full proteins were retrieved by requiring a length > 75%, so that partial hits were excluded. For actinobacterial and archaeal species not showing NucS, translated searches were run with tblastn against their genomes (complete genomes only), using NucS_MYCTU, MutS_ECOLI and MUTL_ECOLI as queries for actinobacterial species, and NucS_PYRAB, MutS_HALMA and MUTL_HALSA as queries for archaeal species.
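The two filters applied to the profile searches (bit score and length) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the `Hit` record, its field names and the example values are hypothetical, and the "length > 75%" criterion is interpreted here as the aligned span covering more than 75% of the profile length, which the text does not specify.

```python
from dataclasses import dataclass

MIN_BIT_SCORE = 50.0   # hits with bit scores below 50 are discarded
MIN_COVERAGE = 0.75    # assumed meaning of "length > 75%"


@dataclass
class Hit:
    target: str          # protein identifier (invented examples below)
    bit_score: float     # bit score reported by the profile search
    aligned_length: int  # residues of the profile covered by the hit
    model_length: int    # total length of the profile HMM


def keep_hit(hit: Hit) -> bool:
    """Return True if the hit passes both the score and the length filter."""
    coverage = hit.aligned_length / hit.model_length
    return hit.bit_score >= MIN_BIT_SCORE and coverage > MIN_COVERAGE


# Invented hits: one full-length match, one low-scoring, one partial.
hits = [
    Hit("protA", 312.4, 840, 850),
    Hit("protB", 42.0, 830, 850),
    Hit("protC", 120.5, 300, 850),
]

kept = [h.target for h in hits if keep_hit(h)]
print(kept)
```

With these invented values only `protA` is retained: `protB` fails the score filter and `protC` fails the length filter.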

References for Supplementary Information

1 Ren, B. et al. Structure and function of a novel endonuclease acting on branched DNA substrates. The EMBO journal 28, 2479-2489, doi:10.1038/emboj.2009.192 (2009).
2 Bierman, M. et al. Plasmid cloning vectors for the conjugal transfer of DNA from Escherichia coli to Streptomyces spp. Gene 116, 43-49 (1992).
3 Eddy, S. R. A new generation of homology search tools based on probabilistic inference. Genome Inform 23, 205-211 (2009).
