biblio.ugent.be

The UGent Institutional Repository is the electronic archiving and dissemination platform for all UGent research publications. Ghent University has implemented a mandate stipulating that all academic publications of UGent researchers should be deposited and archived in this repository. Except for items where current copyright restrictions apply, these papers are available in Open Access.

This item is the archived peer‐reviewed author‐version of:

Multi Locus Sequence Analysis of root nodule isolates from Lotus arabicus (Senegal), Lotus creticus, Argyrolobium uniflorum and Medicago sativa (Tunisia) and description of Ensifer numidicus sp. nov. and Ensifer garamanticus sp. nov.

C. Merabet, M. Martens, M. Mahdhi, F. Zakhia, A. Sy, C. Le Roux, O. Domergue, R. Coopman, A. Bekki, M. Mars, A.Willems and P. de Lajudie.

In: International Journal of Systematic and Evolutionary Microbiology 60 (3), 664-674, 2010. http://ijs.sgmjournals.org/cgi/content/full/60/3/664

To refer to or to cite this work, please use the citation to the published version:

Merabet C., M. Martens, M. Mahdhi, F. Zakhia, A. Sy, C. Le Roux, O. Domergue, R. Coopman, A. Bekki, M. Mars, A.Willems and P. de Lajudie. 2010. Int. J. Syst. Evol. Microbiol. 60:664-674. DOI 10.1099/ijs.0.012088-0

1 IJS/2009/012088 REVISED MS 2 3 Multi Locus Sequence Analysis of root nodule isolates from Lotus arabicus 4 (Senegal), Lotus creticus, Argyrolobium uniflorum and Medicago sativa 5 (Tunisia) and description of Ensifer numidicus sp. nov. and Ensifer 6 garamanticus sp. nov. 7 8 C. Merabet1,2, M. Martens3, M. Mahdhi2,4, F. Zakhia2, A. Sy5, C. Le Roux2, O. Domergue2, R. 9 Coopman3, A. Bekki1, M. Mars4, A.Willems3 and P. de Lajudie2*. 10 11 1Laboratoire de Rhizobiologie, Univ. Oran, Es-Senia, Algeria. 12 2IRD, UMR 113 Symbioses Tropicales et Méditerranéennes F-34398 Montpellier, France. 13 3Laboratorium voor Microbiologie (WE10), Universiteit Gent, Belgium. 14 4Faculté des Sciences, Gabès Tunisia. 15 5 IRD-UCAD-ISRA, Laboratoire Commun de Microbiologie, Dakar, Senegal, West Africa. 16 17 18 19 *Corresponding author: [email protected] 20 L.S.T.M., UMR 113 IRD/CIRAD/AGRO-M/UM-II, Campus International de Baillarguet, TA A-82/J, 34398 21 MONTPELLIER Cedex 5, France. 22 Tel. : (33) (0) 4 67 59 38 51 23 Fax : (33) (0) 4 67 59 38 02 24 25 Running title: Ensifer numidicus and Ensifer garamanticus spp. nov. 26 27 Subject category: New Taxa (subsection ). 28 29 The GenBank/EMBL/DDBJ accession numbers of new sequences are provided in Table 1. 30 31

1 32 Abstract: 33 34 Nine isolates from Argyrolobium uniflorum, Lotus creticus, Medicago sativa (Tunisia) and 35 Lotus arabicus (Senegal) were analysed by Multilocus Sequence Analysis (MLSA) of five 36 housekeeping genes (recA, atpD, glnA, gltA and thrC), the 16S rRNA gene and the nodulation 37 gene nodA. Analysis of the individual gene sequences and concatenated gene sequences 38 demonstrated that the nine strains constitute three stable, well-supported (bootstrap and 39 sequence similarity values) monophyletic clusters A, B and C, all belonging to the Ensifer 40 genus branch, whatever the phylogenetic reconstruction method (Maximum likelyhood, 41 Maximum Parsimony, Neighbour Joining). The three groups were further characterized by 42 API 100 auxanographic tests, host specificity and nodA gene sequence analysis. In light of all 43 data we describe clusters A and C as two novel species within the genus Ensifer, Ensifer 44 numidicus sp. nov., with ORS 1407T (=LMG 24690T =CIP 109850T) as the type strain, and 45 Ensifer garamanticus sp. nov., with ORS 1400T (=LMG 24692T =CIP 109916T) as the type 46 strain. Cluster B strains are assigned to Ensifer adhaerens genomovar A. 47 48 49 Keywords: , Ensifer numidicus, Ensifer garamanticus, MLSA. 50 51 52 53 54 55 56 57 58

2 59 INTRODUCTION 60 61 During the past three decades, bacterial was based on a polyphasic approach 62 including both phenotypic and genotypic data with 16S rRNA gene sequence analysis and 63 DNA-DNA hybridisation being considered as the cornerstones of genotypic characterisation 64 for new species description (Vandamme et al., 1996; Stackebrandt et al., 2002; Coenye et al., 65 2005). However many authors have reported drawbacks of the methods (Stackebrandt et al., 66 2002; Stackebrandt, 2003; Eardly et al., 2005; van Berkum et al., 2003; Ueda et al., 1999; 67 Schouls et al., 2003). The ad hoc committee for the re-evaluation of the species definition in 68 bacteriology (Stackebrandt et al., 2002; Gevers et al., 2005) suggested the sequence analysis 69 of a set of protein-encoding genes as alternative phylogenetic markers, so-called MLSA 70 (Multilocus Sequence Analysis). Several recent studies confirmed that sequences from 71 housekeeping genes can be used for identification at the species level (Zeigler, 2003; Wertz et 72 al., 2003), evolutionary population genetics and in taxonomy (Bailly et al., 2006; Stepkowski 73 et al., 2003; Vinuesa et al., 2005a,b). The analysis of a small number of carefully selected 74 gene sequences may equal or even surpass the precision of DNA-DNA hybridisations for 75 quantification of genome relatedness and this approach thus has the potential to replace the 76 cumbersome DNA-DNA hybridisations (Zeigler, 2003, Martens et al., 2008). Mantelin et al. 77 (2006) included MLSA in the description of four new Phyllobacterium species and Vinuesa et 78 al. (2005c) described new Bradyrhizobium species using the phylogenies of three 79 housekeeping genes atpD, glnII, recA combined with other classical genotypic and 80 phenotypic analyses. Ensifer mexicanus and S. chiapanecum were described using phenotypic 81 analysis, phylogenies of recA, gyrA, nolR, rpoB, rrs and symbiotic genes and confidence 82 intervals of sequence identity to estimate both inter and intra-species ; all were in correlation 83 to DNA-DNA hybridization data (Lloret et al., 2007; Rincón-Rosales et al., 2009). Also, 84 MLSA was applied in diversity studies of Enterococcus, Aeromonas and Ensifer 85 (Sinorhizobium) (Naser et al., 2005; Soler et al., 2004, Bailly et al., 2006) in which it was 86 reported as a highly reproducible and economical method for rapid and reliable species 87 identification. 88 The genera Sinorhizobium and Ensifer were recently recognized to form a single phylogenetic 89 clade (Balkwill, 2003; Willems et al., 2003), and are now united and all Sinorhizobium 90 species were transfered to the genus Ensifer, in line with rule 38 of the Bacteriological Code 91 (Young, 2003; Judicial Commission, 2008). Ensifer currently includes 11 species (Wang et 92 al., 2002; Wei et al., 2002; Young, 2003; Toledo et al. 2003). Two further species, Ensifer

3 93 mexicanum and Sinorhizobium chiapanecum, have been published but not yet validated 94 (Lloret et al. 2007; Rincon-Rosales et al., 2009). Martens et al. (2007, 2008) recently 95 demonstrated that the discriminative power of MLSA for species identification and 96 delineation is higher than 16S rDNA sequence analysis and DNA-DNA hybridization within 97 the genus Ensifer. 98 Here we performed MLSA to complete the characterisation of nine root nodule strains 99 isolated from several legumes in Tunisia (Argyrolobium uniflorum, Lotus creticus, Medicago 100 sativa) and Senegal (Lotus arabicus). Argyrolobium uniflorum (Tribe Genisteae, Family 101 Fabaceae) is an indigenous herb legume in the Mediterranean bassin, and is a pastoral and 102 forage legume broadly distributed in arid and semi-arid regions of Tunisia. This plant plays an 103 important role in soil fertility maintenance, coverage and dune stability (Ferchichi, 1996). 104 Lotus arabicus L. (tribe Loteae, family Fabaceae), syn. Lotus mossamedensis Welw. ex 105 Baker, is a natural annual herb legume of coastal dunes in Senegal. Medicago sativa is the 106 most widely cultivated species of lucerne in the world, and its symbiotic rhizobial strains have 107 been extensively studied. Six of the isolates have previously been analyzed by ARDRA using 108 seven enzymes, total cell protein SDS-PAGE analysis and 16S rRNA gene sequencing 109 (Zakhia et al., 2004) and results suggested their separate positions in Ensifer (syn. 110 Sinorhizobium). According to the conclusions of Martens et al. (2007, 2008) the strains were 111 subjected to MLSA using five housekeeping genes, most discriminative among 10 tested 112 housekeeping genes for taxonomic purposes in Ensifer. Genes were analysed individually and 113 as a concatenation using only the congruent genes estimated by ILD test (incongruence-length 114 difference (Farris et al., 1995). Trees were constructed using three algorithmic methods for 115 comparison. The characterisation was completed by auxanographic tests, DNA-DNA 116 hybridizations, host specificity and sequence analysis of the symbiotic nodA gene for biovar 117 description. 118 119 120 121 MATERIALS AND METHODS: 122 123 Bacterial strains and culture media 124 The strains investigated are listed in Table 1. All strains were grown on yeast mannitol agar 125 YMA (Zakhia et al., 2004) at 28 °C. Isolates were stored at –80 °C in 50 % v/v glycerol. 126

4 127 Host specificity 128 The isolates were tested for nodulation on their original host plant. The seeds from Medicago 129 sativa and Lotus arabicus were sterilised in 3 % calcium hypochloride and scarified by 130 immersion in 96 % Sulphuric acid for 20 min, washed five times with sterile water and placed 131 in agar-water 1 % w/v at 24-25 °C for germination. After 3 to 4 days, the seedlings were 132 transferred to agar slant tubes (Bertrand, 1997) for root nodulation trials (5 plants per strain). 133 Plants were grown under continuous light (20 Wm-2) at 24 °C and inoculated with 1 ml of an 134 exponential growth phase YM bacterial suspension. Roots were observed for nodule 135 formation during the first 4 weeks after inoculation. Infectivity of strains isolated from 136 Argyrolobium uniflorum was previously reported by Zakhia (2004). 137 138 Genomic DNA isolation: 139 Bacterial DNA was prepared as described by Zakhia et al. (2004) or alternatively by the 140 alkaline lysis method (Baele et al., 2000). 141 142 143 DNA amplification 144 The 16S rRNA gene, internal fragments of five housekeeping genes (atpD, recA, gltA, thrC 145 and glnA) and the nodA gene were amplified using primers listed in Supplementary Table S1 146 (available in IJSEM Online). 147 PCR amplification of atpD, recA, gltA, thrC and glnA genes was performed as described by 148 Martens et al. (2007) except in a total volume of 25 μl reaction mixture. PCR amplification of 149 16S rRNA gene was performed according to Zakhia et al. (2004) and nodA gene according to 150 Zakhia et al. (2006). 151 152 Gene sequencing and phylogenetic data analysis 153 PCR products of different genes were purified and sequenced as previously described 154 (Martens et al., 2007) using primers detailed in Supplementary Table S1 (available in IJSEM 155 Online). Consensus sequences were constructed using the Autoassembler software (Applied 156 Biosystems). Interstrain sequence similarity (S) values were determined applying pairwise 157 sequence alignments using BioNumerics 4.6. software. The nucleotide sequences of partial 158 atpD, recA, gltA, glnA, thrC and SSU were independently aligned using the Clustal X 159 program, version 1.8 (Thompson et al., 1997). Neighbour-joining (NJ), Maximum-Parsimony 160 (MP) and Maximum Likelihood (ML) trees were constructed using PAUP* version 4.0b10

5 161 (Swoford, 2002). Reference sequences of Ensifer species were included for comparison 162 (accession numbers are listed in Martens et al., 2008). NJ analyses (Saitou & Nei, 1987) were 163 performed using the Kimura-2 correction (Kimura, 1980) and 1000 bootstrap replications; MP 164 analyses were performed using the heuristic search option. For ML analyses, the optimal 165 models of nucleotide substitution were estimated using the program MODELTEST 3.7 166 (Posada & Crandall, 1998), using the Akaike Information Criterion (AIC) (Posada & 167 Buckley, 2004). The MP trees were used as starting trees for the heuristic search procedure. 168 Bootstrap analyses were performed using 1000 replications of heuristic searches for MP and 169 100 replications for ML. The ILD test (Farris et al., 1995) implemented in PAUP* and using 170 1250 replicates was used to assess incongruence between datasets. A phylogenetic tree was 171 built with the concatenated sequences of the congruent housekeeping genes. 172 173 DNA-DNA Hybridisations 174 DNA was prepared according to a slightly modified procedure of Marmur (1961) as described 175 previously (Willems et al., 2001). Hybridization were carried out using the microplate method 176 in which unlabelled DNA, non covalently bound to the microplate, is hybridized with 177 biotinylated probe DNA (Ezaki et al., 1989; Willems et al., 2001; Goris et al., 1998). 178 Hybridizations were performed at 45 °C in 2xSSC, 50 % formamide. 179 180 Phenotypic tests 181 Microscopic observations, tolerance to pH, NaCl and temperature were performed as 182 previously described (de Lajudie et al., 1994; Mahdhi et al., 2008). 183 API100 galleries (BioMérieux, Montalieu-Vercieu, France) were used to test the utilisation of 184 carbon sources (carbohydrates, amino acids, organic acids) for bacterial growth. Inoculants 185 were obtained from 36-h YMA slant cultures. After inoculation, the galleries were incubated 186 at 30 °C, and results were determined after 1, 2, 4 and 7 days. The results of the 187 auxanographic tests were scored as described previously (Kersters & De Ley, 1984). Jaccard 188 similarity coefficients were calculated and an UPGMA analysis was performed using 189 BioNumerics 4.6.

6 190 RESULTS 191 192 Nodulation tests: 193 All studied strains induce efficient nodulation on their hosts of isolation. This was already 194 reported by Zakhia (2004) for ORS 1400, ORS 1401, ORS 1407, ORS 1444 (Argyrolobium 195 uniflorum), for ORS 1410 (Medicago sativa). Here strains ORS 2154, ORS 2133 and ORS 196 529, isolated from Lotus arabicus, were tested and are effective on their plant of isolation. 197 198 16S rRNA gene sequence analysis: 199 Nearly full-length 16S rRNA gene sequences (1340bp) were determined for strains ORS 200 1410, ORS 2133, ORS 529, ORS 2154, ORS 1401 and STM 354. The 16S rRNA gene 201 phylogenetic trees constructed using three methods (ML, MP, NJ) resulted in the same 202 groupings. Therefore, only the resulting ML tree is shown in Fig. 1. All studied isolates were 203 placed unambiguously within the and within the Ensifer reference 204 species clade. They were subdivided into three clusters (clusters A, B and C) supported by 205 high bootstrap values and displaying high intra-cluster sequence similarity. 206 Cluster A (Bootstrap [BT] value 97 %) includes the isolates ORS 1444, ORS 1410 and 207 ORS1407 and formed a homogeneous group (sequence similarity values 100 %) equally 208 distant to E. arboris, E. medicae and E. meliloti (99.7 % sequence similarity). 209 Cluster B strains (ORS 529, ORS 2154 and ORS 2133) also shared identical 16S rRNA gene 210 sequences and formed a significant cluster. This clade grouped together with the different E. 211 adhaerens genomovars (more than 99.6-99.8 % sequence similarity) and “S. morelense” 212 (99.0-99.2 % sequence similarity) reference strains, although low BT values were displayed 213 for this group (BT value 35 %). 214 The strains ORS 1400, ORS 1401 and STM 354 formed a homogeneous group (Cluster C, 215 100 % interstrain sequence similarity values) supported by a bootstrap value of 85 %, and 216 clustered in the vicinity of E. terangae (48 % BT value and 98,8 % sequence homology). The 217 type strain of E. mexicanum LMG 23932T which was described recently (Lloret at al., 2007) 218 is related to these taxa. It was not yet included in our analyses, but forms a distinct subclade, 219 sharing sequence similarity values of 97.8 to 98.8 % with clusters A, B and C and 99.5 % 220 with E. terangae strains. 221 222 Sequence analysis of individual genes.

7 223 The five housekeeping gene fragments examined in this study had different lengths: glnA 224 (977bp), gltA (681bp), recA (550bp), atpD (461bp), thrC (636bp). Within each group A, B or 225 C, the three strains of the group had identical sequences for all genes tested (except for ORS 226 1410 of group A for which we could not amplify thrC). Sequence analysis of the five 227 individual housekeeping genes resulted in different tree topologies (Fig. 2). However, the 228 same three separate clusters A, B and C were revealed with high bootstrap support for all 229 clusters in all gene analyses, including 16S rRNA gene. These clusters occupied different 230 positions relative to the reference species depending on the gene considered. 231 In the atpD tree topology, A. tumefaciens grouped with “S. morelense”, the 3 genomovars of 232 E. adhaerens, and all strains from cluster B (100 % BT) although this cluster had low BT 233 support (BT value 32 %). A second cluster (BT value 43 %) contained all other Ensifer 234 species and clusters A and C (both clusters 100 % BT value). 235 In the case of gltA, all Ensifer strains together with clusters A, B and C, formed a single 236 separate but poorly supported clade (BT value 38 %). Within this clade, all strains from 237 cluster C formed a significant subclade with E. terangae (100 % BT). Another well-supported 238 subgroup (BT value 98 %) was formed by all strains from cluster B and E. adhaerens gv. A. 239 In the glnA analysis, all Ensifer strains (and clusters A, B and C) were grouped together with 240 high BT support (75 %) and the same two significant subclusters as in the gltA tree topology 241 were found (BT values 100 and 95 % for cluster C/E. terangae and cluster B/E. adhaerens gv. 242 A respectively). 243 Also in the thrC gene tree, all Ensifer strains together with clusters A, B and C, formed a 244 single, separate but less supported clade (BT value 48 %), in which the same two subclades 245 were again distinguished (BT values 100 and 98 % for cluster C/E. terangae and cluster B/E. 246 adhaerens gv. A respectively). In the final single gene tree, the recA tree, again all Ensifer 247 strains and clusters A, B and C did form a separate but poorly supported clade (BT value 27 248 %). A well-supported subgroup (BT value 95 %) was formed by all strains from cluster B and 249 E. adhaerens gv. A. 250 Thus, cluster A formed a separate, well-supported group (BT value of 100 %) in all single 251 gene phylogenies with identical sequences for the three strains (100 %). Cluster A was 252 located at different positions in comparison to the Ensifer reference species and no significant 253 clusterings were apparent. The highest observed sequence similarities between cluster A 254 strains and the reference species ranged between 86.9 % with E. saheli and E. kostiensis for 255 the thrC gene and 94.3 % with reference strains E. meliloti and E. medicae for the atpD gene.

8 256 Cluster B grouped with Ensifer adhaerens, more specifically E. adhaerens gv. A with high 257 BT values in all housekeeping gene analyses, except for atpD. Sequence similarities with E. 258 adhaerens gv. A strains ranged between 96.4 % for thrC to 98.4 % for recA gene. With both 259 other genomovars the range was lower at 90.5% (gltA) to 95.3% (glnA). 260 The strains ORS 1400, ORS 1401 and STM 354 formed the monophyletic Cluster C in all 261 gene phylogenies (BT values ranging from 98 to 100 %; sequence similarity values 100 % for 262 all genes). This cluster displays sequence similarities with its closest neighbour E. terangae in 263 the range of 92,2 % (for recA) to 96,4 % (for gltA). Cluster C together with E. terangae was 264 also related to the E. mexicanum type strain LMG 23932T in the analysis of three 265 housekeeping genes (glnA, gltA and recA) (not included in Fig. 2). The sequence similarities 266 between Cluster C strains and E. mexicanum ranged from 92.2 % for recA, over 93.7 % for 267 gltA, to 95.8 % for glnA. Between E. terangae and E. mexicanum somewhat comparable 268 interspecies sequence similarity values amounted to 96.6 % for recA, 93.5 % for gltA and 96.2 269 % for glnA. This indicates that E. mexicanum, E. terangae and the isolates from Cluster C all 270 represent distinct genomic species. 271 272 Sequence analysis of concatenated housekeeping genes. 273 The atpD gene was found not to be congruent with the other genes (p-value<0, 01) while all 274 other housekeeping genes showed compatible phylogenetic signals (data not shown). A tree 275 was constructed from the concatenated sequences of the four congruent genes recA, glnA, 276 gltA, and thrC (p-value>0, 01) estimated by ILD test (Farris et al., 1995). The ILD test values 277 ranged from 0.0120 for thrC and recA to 0.6288 for thrC and glnA. In view of the ILD test 278 results, we concatenated the aligned sequences for recA, thrC, gltA and glnA and obtained an 279 alignment of 2704 nucleotides (comprising 1486 invariable sites, 237 variable but parsimony 280 uninformative sites and 981 parsimony informative sites). 281 The combined analysis showed a significant cluster (BT value 97 %), comprising all Ensifer 282 strains together with clusters A, B and C, which was subdivided into two closely related 283 subclusters. One well-supported subcluster (BT-value 100 %) includes “S. morelense”, the 284 three E. adhaerens genomovars and cluster B. As in most single gene analysis, cluster B 285 formed a significant subclade together with E. adhaerens gv. A. The second major subcluster 286 (BT value 99 %) comprised all other Ensifer strains and clusters A and C. In analogy to most 287 single gene trees, Cluster C formed a significant sub-branch with E. terangae (BT value 100 288 %). Cluster A strains ORS 1407 and ORS 1444 also formed a reliable cluster (BT value 100 289 %), which was well separated from all other genomic species. Since we did not obtain a thrC

9 290 sequence for ORS 1410, this strain could not be included in the concatenated housekeeping 291 gene analysis. As indicated, all mentioned clusters were supported by higher BT values in the 292 concatenated tree than in the single gene trees and are therefore more robust. 293 294 DNA-DNA Hybridization 295 To determine to which E. adhaerens genomovar Cluster B could be designated, we performed 296 DNA-DNA hybridisations between two strains (ORS 529 and ORS 2133) of this cluster and 297 three E. adhaerens reference strains, representing the three different genomovars (Willems et 298 al., 2003) (Table 2). In agreement with the gene sequence analyses, ORS 529 and ORS 2133 299 display high DNA-DNA hybridisation values (99 %), proving that the strains represent the 300 same genomic species. The hybridisation values of strains ORS 529 and ORS 2133 with E. 301 adhaerens gv. B and C strains are below 70 %, indicating that the two strains from Cluster B 302 are not E. adhaerens gv. B nor E. adhaerens gv. C. However, DNA-DNA hybridisation 303 results between cluster B strains and the E. adhaerens gv. A reference strain, exceed the 70 % 304 value which confirms the assumption drawn from MLSA that cluster B strains are E. 305 adhaerens gv. A strains. 306 307 nodA gene sequencing 308 The nodA gene was sequenced in strains representing both the different clusters and host 309 plants of isolation: two cluster C strains (ORS 1400/ Argyrolobium uniflorum and STM 354/ 310 Medicago sativa), one cluster B strain ORS529 (isolated from Lotus arabicus, like the two 311 other strains of the cluster) and two Cluster A strains (ORS 1444/ Argyrolobium uniflorum 312 and ORS1410/Lotus creticus). The nodA gene could not be amplified in strain ORS 1407. The 313 analysis of nodA sequences by neighbour-joining method (Fig. 3), shows that strains isolated 314 from Argyrolobium uniflorum, Lotus arabicus and Medicago sativa form a separate cluster. 315 This cluster is grouped with the nodA of E. meliloti and E. medicae branch with high 316 bootstrap support. In this study we report for the first time the position of nodA sequences of 317 isolates from Argyrolobium uniflorum. 318 319 Numerical analysis of auxanographic tests 320 All strains were tested for use of 99 substrates as sole carbon source for growth, using API 321 Biotype100 galleries (Supplementary Table S2). Some discriminative features could be 322 observed.

10 323 In contrast to its closest phylogenetic neighbours E. medicae LMG 19920T, E. meliloti LMG 324 6133T and E. arboris LMG 14919T, cluster A strains do not grow on xylitol and DL- 325 glycerate. Results for other substrates were variable between members of this cluster. About 326 28 substrates were differently utilized by strains ORS 1444, ORS 1410 and ORS 1407 and 13 327 substrates were only metabolised by ORS 1407 (dulcitol, D-lyxose, 1-O-methyl-α-D- 328 glucopyranoside, 3-O-Methyl-D-glucopyranose, D-gluconate, L-histidine, succinate, 329 fumarate, ethanolamine, DL-β-hydroxybutyrate, L-aspartate, L-alanine, propionate). 330 331 Cluster B strains used putrescine in contrast to the three genomovars (A, B, C) of E. 332 adhaerens. Seven substrates (L (+) sorbose, dulcitol, D-tagatose, citrate, 5-keto-D-gluconate, 333 D-gluconate, glutarate) were not metabolised by Cluster B members and diversely used 334 among the E. adhaerens three genomovars. 335 336 Cluster C, close to E. terangae in phylogenetic analysis, differed from this species in ten 337 substrates assimilated by E. terangae LMG 7834T and not by any member of Cluster C. 338 However, malonate is metabolised by all Cluster C strains and not by E. terangae LMG 339 7834T. 340 341 Auxanographic data were analysed numerically (data not shown). Clusters A and C formed 342 two related clusters but separate from the other Ensifer species and Cluster B was included in 343 E. adhaerens as observed in our genotypic analysis, with strain ORS 2154 close to Ensifer 344 adhaerens gv B strain R-7457. 345 346 347 348 DISCUSSION 349 350 In this study we performed MLSA to characterise nine strains from Tunisia and Senegal 351 isolated from root nodules of several legumes. Previously, six of them had been partially 352 characterised by ARDRA, SDS-PAGE and 16S rRNA gene sequencing (Zakhia et al. 2004). 353 Genes for MLSA were chosen according to previous taxonomic and phylogenetic studies: 354 recA (Recombinase A) and atpD (ATP synthase beta subunit) were used in phylogenetic 355 study of Agrobacterium and (Gaunt et al., 2001); glnA (Glutamine synthase) was 356 used by Turner & Young (2000) and gltA (Citrate synthase) was used by Hernandez-Lucas et

11 357 al. (2004). In a study evaluating the taxonomic potential of 10 housekeeping genes, Martens 358 et al. (2007, 2008) found these genes and additionally thrC (threonine synthase) usefull for 359 identification and inference of phylogenetic relationships of Ensifer species. Phylogenetic 360 analyses of the five housekeeping genes were compared with the 16SrRNA gene tree using 361 three algorithmic methods (ML, MP and NJ). We completed the characterisation with 362 auxanographic tests and nodA gene sequences for biovar determination. The nodA gene is a 363 key gene for establishment of symbiosis; it is present as a single copy and has a size of 590 to 364 660 bp (Moulin et al., 2004). Except for some photosynthetic bradyrhizobia (Giraud et al., 365 2007) all described thus far capable of establishing a symbiosis with legumes, 366 harbour nodulation (nod) genes and nitrogen fixation (nif, fix) genes, opening the prospect of 367 screening these genes as an alternative to nodulation tests. The encoded αβ-unsaturated Nod 368 factor protein is a key enzyme for Nod Factor synthesis, transferring an acyl chain on a chitin 369 oligomer, resulting in a biologically active NodA molecule. nodA gene sequences provide 370 information on the symbiotic characteristics of the rhizobium and may be predictive for the 371 type of Nod factor produced and the host specificity (Lortet et al., 1996; Debellé et al., 2001). 372 Because of its frequent plasmid-borne nature and its propension to lateral gene transfer, its 373 sequence is usually indicative of biovar rather than species affiliation (Haukka et al., 1998). 374 All phylogenies applying any of the three algorithmic methods unambiguously placed the 375 nine strains within the Alphaproteobacteria, in the Ensifer clade. The nine strains were 376 subdivided into three monophyletic clusters with high intra-species sequence similarity of 377 100 % for all gene sequence analyses. 378 379 Cluster A formed a separate cluster with variable positions relative to the Ensifer reference 380 species in the phylogenies of the 16S rRNA and all housekeeping genes. In the combined 381 gene sequence analysis Cluster A formed a single, seperate cluster which was clearly 382 distinguished from all other clusters. The sequence homologies between Cluster A strains and 383 strains of related reference species supported this segregation. The sequence similarity values 384 between cluster A and the other genotypic clusters ranged between 82-94.3 % in the different 385 gene analyses (Fig. 2). S. chiapanecum sp. nov., related to S. terangae, was reported during 386 the process of revision of the present work (Rincón-Rosales et al., 2009). We thus retrieved 387 the available recA and 16S rRNA gene sequences of this species and observed that both genes 388 place S. chiapanecum, S. terangae and S. mexicanus in a cluster (96.3-96.9% (recA) and 99.3- 389 99.7% (16S RNA gene) internal similarity values), away from cluster A, with interspecies 390 similarity values ranging from 92.2 to 92.6% (recA) and 98.4 to 98.9% (16S RNA).

12 391 These values illustrate a clear gap for sequence similarity levels within cluster A strains and 392 between cluster A and other reference strains, corresponding to the inter- and intra-species 393 sequence similarity value gap observed by Martens et al. (2007, 2008) in their comparison of 394 MLSA data and DNA-DNA hybridizations. This indicates that cluster A strains are well 395 distinguished from the Ensifer reference species and therefore represent a new genomic 396 species. 397 The auxanographic tests confirmed that Cluster A clearly formed a novel Ensifer species. We 398 also analysed the nodA sequences of the Cluster A representative strain ORS 1444 to get some 399 information on the Nod factor of Argyrolobium uniflorum microsymbionts. The results 400 showed that it grouped in a separate cluster close to E. meliloti/E. medicae nodA group. In 401 view of our phenotypic and genotypic results, and those obtained in previous work (Zakhia et 402 al., 2004), we propose to create a novel Ensifer species for Cluster A strains with the name 403 Ensifer numidicus sp. nov., with ORS 1407 as the type strain. In a parallel and independent 404 work, Mahdhi et al. (2008) described root nodule isolates from Argyrolobium uniflorum in the 405 same region of Tunisia. Among the most effective strains (potential candidates for inoculant 406 production), 13 formed a homogeneous and separate group on the Ensifer branch by 16S 407 rDNA PCR-RFLP. Three representative strains of this group were sequenced, and they shared 408 identical sequence with strain ORS 1444. This group may thus be considered as E. 409 numidicus. 410 Cluster B formed a monophyletic group in all phylogenies and grouped seperately with E. 411 adhaerens gv. A in the majority of the gene phylogenies. In the 16S rRNA gene tree, Cluster 412 B strains were grouped together with the E. adhaerens genomovars and “S. morelense”. In the 413 single and combined housekeeping gene analyses, Cluster B grouped with Ensifer adhaerens 414 gv. A, supported by high bootstrap values. The analysis of the sequence similarities placed 415 Cluster B in the gap between intra and interspecies sequence similarities values, confirming 416 that cluster B belongs to E. adhaerens species but with uncertain genomovar position. 417 Following the recommendation of Willems et al. (2003) for genomovar discrimination, we 418 performed DNA/DNA hybridisations. The results confirmed that Cluster B is different from 419 genomovars B and C but belongs to E. adhaerens genomovar A. Phenotypic and genotypic 420 analyses clearly indicate that cluster B groups with E. adhaerens gv. A. All strains from 421 Cluster B nodulate their plant of isolation Lotus arabicus. This is in contrast to all known 422 members of the E. adhaerens species (Casida, 1982; Willems et al., 2003), even though Rogel 423 et al. (2001) demonstrated that they may acquire nodulation capacity upon introduction of 424 symbiotic plasmids.

13 425 426 Cluster C formed a separate monophyletic group in all phylogenies, essentially in the E. 427 terangae / E. mexicanum clade. A similar grouping was obtained using the four congruent 428 genes (glnA, gltA, thrC and recA), in individual gene and in the concatenated gene trees. 429 Sequence similarity values indicate a clear gap between similarity levels within cluster C 430 strains and between cluster C and other reference strains. Values between cluster C strains and 431 the reference strains were comparable with those at the interspecies level found by Martens et 432 al. (2008) in their study comparing MLSA data and DNA-DNA hybridizations. This indicates 433 that Cluster C is distinct and forms a separate genospecies. Also the numerical analysis of 434 auxanographic results confirmed that cluster C is separate from all Ensifer species. Cluster C 435 strains can use malonate, but not p-hydroxybenzoate in contrast to its closest neighbour E. 436 terangae LMG 7834T. Two representative strains of cluster C, ORS 1400 and STM 354, 437 isolated from two different legumes, exhibited nodA sequences which grouped together (99 % 438 bootstrap value) with those from Cluster A and B, in E. meliloti and E. medicae branch. The 439 combination of the results obtained by Zakhia et al. (2004) by SDS-PAGE of whole cell 440 protein and 16S rRNA ARDRA and in this study by MLSA and additional 16S rRNA and 441 nodA gene sequence analyses, auxanographic test indicate that cluster C represents a novel 442 genospecies in the E. terangae, E. mexicanum clade, for which we propose the the name: 443 Ensifer garamanticus sp. nov., with ORS 1400 as the type strain. 444 445 We did not perform additional DNA-DNA hybridizations to establish the separate species 446 status of clusters A and C because of the clear MLSA evidence. In this study the highest 447 sequence similarity level with other Ensifer species we obtained for cluster A was 94.3% 448 (atpD) and for cluster C this was 96.4% (gltA). In view of the finding that strains of cluster 449 B which had housekeeping gene similarity levels of 96.4 to 98.4% with E. adhaerens 450 genomovars B and C, yield DNA-binding levels below the species threshold (Table 2, 47 to 451 60%), there will be even less DNA-binding between strains that have lower sequence 452 similarity levels in the housekeeping genes. This information is also in line with a previous 453 elaborate comparison of housekeeping gene sequences and DNA-DNA hybridizations 454 (Martens et al., 2008) where it was shown that MLSA of selected housekeeping genes can 455 accurately predict relations between closely related organims. In view of all these findings we 456 conclude from our data that cluster A and C represent two new Ensifer species, for which we 457 propose the names of E. numidicus and E. garamanticus respectively. 458

14 459 Description of Ensifer numidicus sp. nov. (Cluster A) 460 Ensifer numidicus (nu.mi'di.cus. L. masc. adj. numidicus, pertaining to the country of 461 Numidia, Numidiannumidicusnumidicus, the roman denomination of the region in North- 462 West Africa from which the majority of the organisms were isolated). 463 Short, aerobic, Gram-negative, non-spore-forming rods. Strains grow on Yeast Mannitol 464 medium, on which they form white slightly mucuous colonies after a growth period of three 465 days at 28 °C. 466 At the molecular level, this species can be differentiated by phylogenetic analysis based on 467 several housekeeping gene (recA, glnA, gltA, thrC, atpD) and 16S rRNA gene sequencing. Its 468 phylogenetic neighbors are E. medicae, E. meliloti and E. arboris. Detailed phenotypic 469 features for all strains are given in Supplementary Table S2. Ensifer numidicus can be 470 distinguished from E. medicae, E. meliloti and E. arboris by differential use of a combination 471 of growth substrates. Ensifer numidicus strains do not grow on xylitol and DL-glycerate. 472 Results for other substrates were variable between members of this cluster. About 28 473 substrates were differently used by strains ORS 1444, ORS 1410 and ORS 1407 and 13 474 substrates were only metabolised by ORS 1407 (dulcitol, D-lyxose, 1-O-methyl-α-D- 475 glucopyranoside, 3-O-Methyl-D-glucopyranose, D-gluconate, L-histidine, succinate, 476 fumarate, ethanolamine, DL-β-hydroxybutyrate, L-aspartate, L-alanine, propionate). Thus 477 Ensifer numidicus strains cannot be identified by biochemical and physiological characters 478 alone. Strains can nodulate Argyrolobium uniflorum (ORS 1444 and ORS 1407) and Lotus 479 creticus (ORS 1410). The nodulation gene nodA differs in sequence compared to those of 480 other Ensifer species but is most closely related to those of the E. meliloti and E. medicae 481 clade. 482 The type strain of this species is ORS 1407T (LMG 24690T, CIP 109850T). The %GC content 483 of its DNA is 62.8 % (HPLC). 484 A group of 13 Sinorhizobium sp. strains reported by Mahdhi et al. (2008), AB1, AB3, AB5, 485 AB6, AB10, AM1, AM2, AM3, AS1, AS2, AS3, AS4, AS5, may be considered as Ensifer 486 numidicus and included as members of this species. These strains were independantly 487 isolated from Argyrolobium uniflorum in South Tunisia. These strains form a homogeneous 488 group by PCR-RFLP of 16S rRNA gene. Three representative strains of this group STM 489 4034 (AB1), STM 4036 (AB3), STM 4039 (AS1), share identical 16S rRNA gene sequence 490 with E. numidicus strains. The three latter strains form the most effective symbiosis with 491 Argyrlobium uniflorum with STM 4036 as the most efficient one. They tolerate pH 6-9, 1-2 492 % NaCl and 40°C for growth. They are sensitive to ampicillin (100 μg ml-1) and

15 493 Streptomycin (100 μg ml-1) but resistant to Kanamycin (100 μg ml-1) and nalixidic Acid (100 494 μg ml-1). 495 496 497 Description of Ensifer garamanticus sp. nov. (Cluster C) 498 Ensifer garamanticus (ga.ra.man.'ti.cus. L. masc. adj. garamanticus, pertaining to Garamante, 499 Garamantian, garamanticus, "the country of Garamantes". In Roman times, Garamantes were 500 inhabitants living in the region South Numidia (Tunisia). Garamantis was their eponym hero. 501 Strains were isolated in this region, which is semi-arid and sunny. 502 Short, aerobic, Gram-negative, non-spore-forming rods. The strains form white and slightly 503 mucous colonies on YMA medium after 48 to 72 h incubation at 28 °C. 504 Strains efficiently nodulate Argyrolobium uniflorum (ORS 1400, ORS 1401) and Medicago 505 sativa (STM 354). 506 Ensifer garamanticus can be distinguished from other species by phylogenetic analysis based 507 on several housekeeping (recA, glnA, gltA, thrC, atpD) and 16S rRNA gene sequencing. 508 Malonate is used by all strains for growth but not by its closest phylogenetic neighbor E. 509 terangae. Detailed phenotypic features for all strains are given in Supplementary Table S2. 510 Ensifer garamanticus, close to E. terangae in phylogenetic analysis, differed in ten substrates 511 used for growth by E. terangae LMG 7834T and not by any member of Ensifer garamanticus. 512 However, malonate is metabolised by all Ensifer garamanticus strains and not by E. terangae 513 LMG 7834T. 514 The type strain of this species is ORS 1400T (LMG 24692T, CIP 109916T). The %GC content 515 of its DNA is 62.4 % (HPLC). 516 The group formed by Sinorhizobium sp. strains STM 4015, STM 4016, STM 4027, STM 517 4031, STM 4032, isolated from Genista saharae in South Tunisia by Mahdhi et al. (2007), 518 share identical 16S rRNA gene sequence with E. garamanticus type strain. They may be thus 519 be considered as Ensifer garamanticus members. These strains were described also 520 phenotypically and generally tolerate high temperature (40°C), high pHs (7-12) and high 521 NaCl concentrations (1% up to 4%) for growth. They nodulate their plant of isolation, so that 522 Genista saharae may be considered as potential host plant of Ensifer garamanticus. 523 524 525 Abbreviations 526 CIP: Institut Pasteur (Paris) Collection

16 527 Acknowledgements 528 The authors are grateful to F. Boukhatem, B. Dreyfus, A., A. Galiana, L. Moulin, G. Béna, B. 529 Hamed Bey, H. Ameziane and L. Bouchentouf for helpful discussions. C. M., F. Z., A. B. and 530 A. S. are indebted to the Département Soutien-Formation (DSF) of the Institut de Recherche 531 pour le Développement (IRD), France for PhD fellowship and ESCD grants and M. Mosbah 532 for AUF grant. AW is grateful to the Fund for Scientific Research – Flanders for a 533 postdoctoral fellowship. This work was supported by the Commission of the European 534 Communities RT-program contract BACDIVERS QLRT-2001-02097. 535 536 537 538 539 540

541 REFERENCES 542 Baele, M., Baele, P., Vaneechoutte, M., Storms, V., Butaye, P., Devriese, L. A., Verschraegen, G., Gillis, M. & 543 Haesebrouck, F. (2000). Application of tRNA intergenic spacer PCR for identification of Enterococcus species. J 544 Clin Microbiol 38, 4201-4207. 545 Bailly, X., Olivieri, I., de Mita, S., Cleyet-Marel, J-C. & Béna, G. (2006). Recombination and selection shape the 546 molecular diversity pattern of nitrogen-fixing Sinorhizobium sp. associated to Medicago. Mol Ecol 15, 2719-34. 547 Bertrand, H. (1997). Stimulation de l'absorption minérale et de la croissance du Colza inoculé avec des bactéries 548 isolées de son rhizoplan. Thèse de l’Université Claude Bernard-Lyon I. 549 Casida, L. E., Jr. (1982). Ensifer adhaerens, gen. nov., sp. nov. a bacterial predator of bacteria in soil. Int J Syst 550 Bacteriol 32, 339–345. 551 Coenye, T., Gevers, D., Van de Peer, Y., Vandamme, P. & Swings, J. (2005). Towards a prokaryotic genomic 552 taxonomy. FEMS Microbiol Rev 29, 147-167. 553 Debellé, F., Moulin, L., Mangin, B., Dénarié, J. & Boivin, C. (2001). nod genes and Nod signals and the 554 evolution of the rhizobium legume symbiosis. Acta Biochem Pol 48, 359-365. 555 de Lajudie, P., A. Willems, B. Pot, D. Dewettinck, G. Maestrojuan, M. Neyra, M. D. Collins, B. L. 556 Dreyfus, K. Kersters & Gillis, M. (1994). Polyphasic taxonomy of rhizobia. Emendation of the genus 557 Sinorhizobium and description of Sinorhizobium meliloti comb. nov., Sinorhizobium saheli sp. nov. and 558 Sinorhizobium teranga sp. nov. Int. J. Syst. Bacteriol. 44, 715-733. 559 Eardly, B. D., Nour, S. M., van Berkum, P. & Selander, R. K. (2005). Rhizobial 16S rRNA and dnaK genes: 560 Mosaicim and the uncertain phylogenetic placement of Rhizobium galegae. Appl Env Microbiol 27,1328- 1335. 561 Ezaki, T., Hashimoto, Y. & Yabuuchi, E. (1989). Fluorometric deoxyribonucleic acid-deoxyribonucleic acid 562 hybridization in microdilution wells as an alternative to membrane filter hybridization in which radioisotopes are 563 used to determine genetic relatedness among bacterial strains. Int J Syst Bacteriol 39, 224-229.

17 564 Farris, J. S., Källersjö, M., Kluge, A. G. & Bult, C. (1995). Testing significance of incongruence. Cladistics 10, 565 315-319. 566 Ferchichi (1996). La lutte contre l’ensablement et pour la stabilisation des dunes : essai de la fixation biologique des 567 dunes en Tunisie presaharienne. Recherche sur la désertification dans la Jeffara. Rev. Tunis. Geogr. 12, 49-102. 568 Gaunt, M. W, Turner, S. L, Rigottier-Gois, L., Lloyd-Macgilp, S. A. & Young, J. P. (2001). Phylogenies of atpD 569 and recA support the small subunit rRNA-based classification of rhizobia. Int J Syst Evol Microbiol 51, 2037-2048. 570 Gevers, D., Cohan, F. M., Lawrence, J. G., Spratt, B. G., Coenye, T., Feil, E. J., Stackebrandt, E., Van de Peer, 571 Y., Vandamme, P., Thompson, F.L. & Swings, J. (2005). Opinion: Re-evaluating prokaryotic species. Nature Rev 572 Microbiol 3, 733-739. 573 Giraud E., Moulin L., Vallenet D., & 31 other authors (2007). Legume symbioses: absence of nod genes in 574 photosynthetic bradyrhizobia. Science 316, 1296-1297. 575 Goris, J., Suzuki, K-I., De Vos, P., Nakase, T. & Kersters, K. (1998). Evaluation of microplate DNA-DNA 576 hybridization method compared with the initial renaturation method. Can J Microbiol 44, 1148-1153. 577 Haukka, K., Lindström, K. & Young, J. P. (1998). Three phylogenetic groups of nodA and nifH genes in 578 Sinorhizobium and isolates from leguminous trees growing in Africa and Latin America. Appl 579 Environ Microbiol 64, 419-26. 580 Hernandez-Lucas, I., Rogel-Hernandez, M. A., Segovia, L., Rojas-Jiménez, K. & Martinez-Romero, E. (2004). 581 Phylogenetic relationships of rhizobia based on citrate synthase gene sequences. Syst Appl Microbiol 27, 703-706. 582 Judicial Commission of the International Committee on Systematics of Prokaryotes (2008). The genus name 583 Sinorhizobium Chen et al. 1988 is a later synonym of Ensifer Casida 1982 and is not conserved over the latter genus 584 name, and the species name ‘Sinorhizobium adhaerens’ is not validly published. Opinion 84. Int J Syst Evol 585 Microbiol 58, 1973. 586 Kersters, K. & De Ley, J. (1984). Genus III. Agrobacterium Conn 1942. In Bergey’s manual of systematic 587 bacteriology, vol. 1 pp. 244-254. Edited by N. R. Krieg & J. G. Holt. The Williams & Wilkins Co., Baltimore. 588 Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative 589 studies of nucleotide sequences. J Mol Evol 16, 111-120. 590 Lloret, L., Ormeno-Orrillo, E., Rincon, R., Martinez-Romero, J., Rogel-Hernandez, M. A. & Martinez- 591 Romero, E. (2007). Ensifer mexicanus sp. nov. a new species nodulating Acacia angustissima (Mill.) Kuntze in 592 Mexico. Syst Appl Microbiol 30, 280-290. 593 Lortet, G., Mear, N., Lorquin, J., Dreyfus, B., de Lajudie, P., Rosenberg, C. & Boivin, C. (1996). Nod 594 factor thin-layer chromatography profiling as a tool to characterize symbiotic specificity of rhizobial strains : 595 application to Sinorhizobium saheli, S. teranga and Rhizobium sp. strains isolated from Acacia and Sesbania. 596 Mol. Plant-Microbe Interact 9:736-747. 597 Mahdhi, M., de Lajudie, P. & Mars, M. (2008). Phylogenetic and symbiotic characterization of rhizobial bacteria 598 nodulating Argylorobium uniflorum in Tunisian arid soils. Can J Microbiol 54, 209-217. 599 Mahdhi, M., Nzoué, A., Gueye, F., Merabet, C., de Lajudie, P. & Mars, M. (2007). Phenotypic and genotypic 600 diversity of Genista saharae microsymbionts from the infra-arid region of Tunisia. Lett Appl Microbiol ISSN 0266- 601 8254. 602 Mantelin, S., Fischer-Le Saux, M., Zakhia, F., Béna, G., Jeder, H., de Lajudie, P. & Cleyet-Marel J. C. (2006). 603 Emended description of the genus Phyllobacterium and description of four novel species associated to plant roots

18 604 Phyllobacterium bourgognense sp. nov. Phyllobacterium ifriqiyense sp. nov. Phyllobacterium leguminum sp. nov. 605 and Phyllobacterium brassicacearum sp. nov. Int J Syst Evol Microbiol 56, 827-839. 606 Martens, M., Delaere, M., Coopman, R., De Vos, P., Gillis, M. & Willems, A. (2007). Multilocus sequence 607 analysis of Ensifer and related taxa. Int J Syst Evol Microbiol 57, 489-503. 608 Martens, M., Dawyndt, P., Coopman, R., Gillis, M., De Vos, P. & Willems, A. (2008). Advantages of MultiLocus 609 Sequence Analysis for taxonomic studies: A case study using 10 housekeeping genes in the genus Ensifer. Int J Syst 610 Evol Microbiol 58, 200-214. 611 Moulin, L., Béna, G., Boivin-Masson, C. & Stepkowski T. (2004). Phylogenetic analyses of symbiotic nodulation 612 genes support vertical and lateral gene co-transfer within the Bradyrhizobium genus. Mol Phyl Evol 30(3):720-32. 613 Naser, S. M., Thompson, F. L., Hoste, B., Gevers, D., Dawyndt, P., Vancanneyt, M. & Swings, J. (2005). 614 Application of Multilocus sequence analysis (MLSA) for rapid identification of Enterococcus species based on rpoA 615 and pheS genes. Microbiol. 151, 2141-2150. 616 Normand, P., Orso, S., Cournoyer, B., Jeannin, P., Chapelon, C., Dawson, J., Evtushendro, L. & Misra, A. K. 617 (1996). Molecular phylogeny of the genus Frankia and related genera and emendation of the Family Frankiaceae. Int 618 J Syst Bact 46, 1-9. 619 Normand, P., Cournoyer, B., Simonet, P. & Nazaret, S. (1992). Analysis of a ribosomal RNA operon in the 620 actinomycete Frankia. Gene 111, 119-124. 621 Posada, D. & Buckley, T. R. (2004). Model selection and model averaging in phylogenetics: advantages of the 622 Akaike Information Criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53, 793–808. 623 Posada, D. & Crandall, K. A. (1998). MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 624 817–818. 625 Rincon-Rosales, R., Lloret, L., Ponce, E. & Martínez-Romero, E. (2009). Rhizobia with different symbiotic 626 efficiencies nodulate Acaciella angustissima in Mexico, including Sinorhizobium chiapanecum sp. nov. which has 627 common symbiotic genes with Sinorhizobium mexicanum. FEMS Microbiol. Ecol. 67, 103 – 117. 628 Rogel, M. A., Hernandez-Lucas, I., Kuykendall, L. D., Balkwill, D. L. & Martinez-Romero, E. (2001). 629 Nitrogen-fixing nodules with Ensifer adhaerens harbouring Rhizobium tropici symbiotic plasmids. Appl Environ 630 Microbiol 67, 3264-3268. 631 Saitou, N. & Nei, M. (1987). The neighbour joining method: a new method for reconstructing phylogenetic trees. 632 Mol Biol Evol 4, 406-425. 633 Schouls, L. M., Schot, C. S. & Jacobs, J. A. (2003). Horizontal transfer of segments of the 16S rRNA genes 634 between species of the anginosus group. J Bacteriol 185, 7241-7246. 635 Stackebrandt, E. (2003). The richness of prokaryotic diversity: there must be a species somewhere. Food Technol 636 Biotechnol 41, 17-22. 637 Stackebrandt, E., Frederiksen, W., Garrity, G. M., Grimont, P. A. Kampfer, P., Maiden, M. C., Nesme, X., 638 Rossello-Mora, R., Swings, J., Trupper, H. G., Vauterin, L., Ward, A. C. & Whitman, W. B. (2002). Report of 639 the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int J Syst Evol Microbiol 52, 640 1043-1047. 641 Stepkowski, T., Czaplinska, M., Miedzinska, K. & Moulin, L. (2003): The variable part of the dnaK gene as an 642 alternative marker for phylogenetic studies of rhizobia and related alpha Proteobacteria. Syst. Appl. Microbiol. 26, 643 483-494.

19 644 Soler, L., Yanez, M. A., Chacon, M. R., Aguilera-Arreola, M. G., Catalan, V., Figueras, M. J. & Martinez- 645 Murcia, A. J. (2004). Phylogenetic analysis of the genus Aeromonas based on two housekeeping genes. Int J Syst 646 Bacteriol 54, 1511-1519. 647 Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The ClustalX windows 648 interface: flexible strategies for multiple sequence alignement aided by quality analysis tools. Nucleic Acids Res 24, 649 4876-4882. 650 Toledo, I., LIoret, L. & Martinez-Romero, E. (2003). Sinorhizobium americanum sp. nov., a new Sinorhizobium 651 species nodulating native Acacia spp. in Mexico. Syst Appl Microbiol 26, 54-64. 652 Turner, S. L. & Young, J. P. W. (2000). The glutamine synthetases of rhizobia: phylogenetics and evolutionary 653 implications. Mol Biol Evol 17, 309-319. 654 Ueda, K., Seki, T., Kudo, T., Yoshida, T. & Kataoka, M. (1999). Two distinct mechanisms cause heterogeneity of 655 16S rRNA. J Bacteriol 181, 78-82. 656 van Berkum, P., Terefework, Z., Paulin, L., Suomalainen, S. Lindström, K. & Eardly, B. D. (2003). Discordant 657 phylogenies within the rrn loci of rhizobia. J Bacteriol 185, 2988-2998. 658 Vandamme, P., Pot, B., Gillis, M., De Vos, P., Kersters, K. & Swings, J. (1996). Polyphasic taxonomy, a 659 consensus approach to bacterial systematics. Microbiol Rev 60, 407-438. 660 Vinuesa, P., Silva, C., Lorite, M. J., Izaguirre-Mayoral, M. L., J.Bedmar, E. & Martinez-Romero, E. (2005a). 661 Molecular systematics of rhizobia based on maximum likelihood and Baysian phylogenies inferres from rrs, atpD, 662 recA and nifH sequences, and their use in the classification of Sesbania microsymbionts from Venezuelan wetlands. 663 Syst Appl Micobiol 28, 702-716. 664 Vinuesa, P., Silva, C., Werner, D. & Martinez-Romero, E. (2005b). Population genetics and phylogenetic 665 inference in bacterial molecular systematics: the roles of migration and recombination in Bradyrhizobium species 666 cohesion and delineation. Mol Phylogenet Evol 34, 29-54. 667 Vinuesa, P., Leon-Barrios, M., Silva, C., Willems, A., Jarabo-Lorenzo, A., Pérez-Galdona, R. & Martinez- 668 Romero, E. (2005c). Bradyrhizobium canariense sp. nov., an acid-tolerant endosymbiont that nodulates endemic 669 genistoid legumes (Papilionoideae :Genisteae) growing in the canary Islands, along with B. japonicum bv. 670 genistearum, Bradyrhizobium genospecies α and Bradyrhizobium genospecies β. Int J Syst Evol Microbiol 55, 569- 671 575. 672 Wang, E. T., Tan, Z. Y., Willems, A., Fernandez-Lopez, M., Reinhold-Hurek, B. & Martinez-Romero, E. 673 (2002). Sinorhizobium morelense sp. nov., isolated from Leucaena leucocephala that is highly resistant to multiple 674 antibiotics. Int J Syst Microbiol 52, 1687-1693. 675 Wei, G. H., Wang, E. T., Tan, Z. Y., Zhu, M. E. & Chen, W. X. (2002). Rhizobium indigoferae sp. nov. and 676 Sinorhizobium kummerowiae sp. nov., respectively isolated from Indigofera spp ; and Kummerowia stipulacea. Int J 677 Syst Evol Microbiol 52, 2231-2239. 678 Wertz, J. E., Goldstone, C., Gordon, D. M. & Riley, M. A. (2003). A molecular phylogeny of enteric bacteria and 679 implications for a bacterial species concept. J Evol Biol 16, 1236-1248. 680 Willems, A., Fernandez-Lopez, M., Munoz-Adelantado, E., Goris, J., De Vos, P., Martinez-Romero, E., Toro, 681 N. & Gillis, M. (2003). Description of new Ensifer strains from nodules and proposal to transfer Ensifer adhaerens 682 Casida 1982 to Sinorhizobium as Sinorhizobium adhaerens comb. nov. Request for an Opinion. Int J Syst Evol 683 Microbiol 53, 1207-1217.

20 684 Willems, A., Doignon-Bourcier, F., Goris, J., Coopman, R., de Lajudie, P., De Vos, P. & Gillis, M. (2001). 685 DNA-DNA hybridization study of Bradyrhizobium strains. Int J Syst Microbiol 51, 1315-1322. 686 Young, J. M. (2003). The genus name Ensifer Casida 1982 takes priority over Sinorhizobium Chen et al. 1988, and 687 Sinorhizobium morelense Wang et al. 2002 is a later synonym of Ensifer Casida 1982. Is the combination 688 Sinorhizobium adhaerens (Casida 1982) Willems et al., 2003 legitimate? Request for an Opinion. Int J Syst Evol 689 Microbiol 53, 2107-2110. 690 Zakhia, F., Jeder, H., Domergue, O., Willems, A., Cleyet-Marel, J.-C. , Gillis, M., Dreyfus, B. & de Lajudie, P. 691 (2004). Characterisation of Legume-Nodulating Bacteria (LNB) in arid regions of Tunisia. Syst Appl Microbiol 27, 692 380-395. 693 Zakhia, F. (2004). Diversité des bactéries hôtes de légumineuses méditerranéennes en Tunisie et au Liban. Thèse 694 d’université. Université de Montpellier, France. 200 p. 695 Zakhia, F., Jeder,, H., Willems, A., Dreyfus, B. & de Lajudie, P. (2006). Diverse bacteria associated with root 696 nodules of spontaneous legumes in Tunisia and first report for nifH-like gene within the genera Microbacterium and 697 Starkeya.. Microbiol. Ecol. 51(3), 375-393. 698 Zeigler, D. R. (2003). Gene sequences useful for predicting relatedness of whole genomes in bacteria. Int J Syst Evol 699 Microbiol 53, 1893-1900.

21 Table 1. Accession numbers of new sequence data ND, not detected.

Name Strain Other number SSU recA glnA atpD thrC gltA nodA Source E. numidicus ORS 1444 LMG 24691 AY500253 AM946578 AM946562 AM946549 AM946586 AM946596 Argyrolobium uniflorum, Tunisia (Zakhia et al., 2004) E. numidicus ORS 1410 LMG 24736, AM946566 AM946577 AM946561 AM946550 ND AM946595 Lotus creticus, Tunisia (Zakhia et al., 2004) CIP 109858 E. numidicus ORS 1407T LMG24690T, AY500254 AM946576 AM946560 AM946551 AM946585 AM946594 Argyrolobium uniflorum, Tunisia (Zakhia et al., CIP 109850T 2004) E. garamanticus ORS 1400T LMG 24692T AY500255 AM946573 AM946557 AM946546 AM946582 AM946591 Argyrolobium uniflorum, Tunisia (Zakhia et al., 2004) E. garamanticus ORS 1401 LMG 24693, AM946567 AM946574 AM946558 AM946547 AM946583 AM946592 Argyrolobium uniflorum, Tunisia (Zakhia et al., CIP 109848 2004) STM 354 LMG 24694, Medicago sativa, Tunisia (Zakhia et al., 2004) E. garamanticus AM946568 AM946575 AM946559 AM946548 AM946584 AM946593 CIP 109849 E. adhaerens gv. A ORS 2154 AM946569 AM946579 AM946565 AM946553 AM946587 AM946598 Lotus arabicus, Senegal (Sy et al., 2001) E. adhaerens gv. A ORS 2133 AM946570 AM946580 AM946563 AM946554 AM946588 AM946597 Lotus arabicus, Senegal (Sy et al., 2001) E. adhaerens gv. A ORS 529 AM946571 AM946581 AM946564 AM946555 AM946589 AM946599 Lotus arabicus, Senegal (Sy et al., 2001)

22 Table 2. DNA-DNA hybridisation values among Ensifer species and strains of cluster B isolated from Lotus arabicus in Senegal.

ORS 529 ORS 2133 ORS 529 100 99 ORS 2133 99 100 Ensifer adhaerens gv. A LMG 10007 74 75 Ensifer adhaerens gv. B R-7457 60 59 Ensifer adhaerens gv. C LMG 20216T 47 47

23 Supplementary Table S1. Primers used for DNA amplification and sequencing. Primers given in bold were used for initial amplification. Gene Primer Direction length (bases) Primer sequence (5’-3’) Reference 16S rRNA FGPS 6 forward 22 GGA GAG TTA GAT CTT GGC TCA G Normand et al., 1992 16S rRNA FGPS 1509 reverse 20 AAG GAG GGG ATC CAG CCG CA Normand et al., 1992 16S rRNA FGPS 484-292 forward 15 CAG CAG CCG CGG TAA Normand et al., 1996 16S rRNA 16S-1080r reverse 19 GGG ACT TAA CCC AAC ATC T Sy et al., 2001 16S rRNA 16S-870f forward 21 CCT GGG GAG TAC GGT CGC AAG Sy et al., 2001 16S rRNA 16S-1924 reverse 20 GGC ACG AAG TTA GCC GGG GC Sy et al., 2001 recA A555 reverse 26 CGR ATC TGG TTG ATG AAG ATC ACC AT Gaunt et al., 2001 recA A36 forward 23 ATC GAG CGG TCG TTC GGC AAG GG Gaunt et al., 2001 atpD atpD273F forward 21 SCT GGG SCG YAT CMT GAA CGT Gaunt et al., 2001 atpD atpD771R reverse 23 GCC GAC ACT TCC GAA CCN GCC TG Gaunt et al., 2001 glnA glnA144F forward 21 GTC ATG TTC GAC GGY TCY TCG Martens et al., 2007 glnA glnA1142R reverse 21 TGG AKC TTG TTC TTG ATG CCG Martens et al., 2007 glnA glnA572F forward 20 GGA CAT GCG YTC YGA RAT GC Martens et al., 2007 glnA glnA572R reverse 20 GCA TYT CRG ARC GCA TGT CC Martens et al., 2007 thrC thrC577F forward 21 GGC AMK TTC GAC GAY TGC CAG Martens et al., 2007 thrC thrC1231R reverse 20 GGR AAT TTD GCC GGR TGS GC Martens et al., 2007 thrC thrC766F forward 17 GGC AAT TTC GGC GAY AT Martens et al., 2007 thrC thrC766R reverse 17 ATR TCG CCG AAA TTG CC Martens et al., 2007 thrC thrC925R reverse 20 GAS GAR AYC TGG ATR TCC AT Martens et al., 2007 gltA gltA428F forward 19 CSG CCT TCT AYC AYG ACT C Martens et al., 2007 gltA gltA1111R reverse 20 GGG AAG CCS AKC GCC TTC AG Martens et al., 2007 nodA NodA1F forward 24 TGC RGT GGA ARN TRB VYT GGG AAA Haukka et al., 1998 nodA NodAB1R reverse 23 GGN CCG TCR TCR AAW GTC ARG TA Haukka et al., 1998

24 Supplementary Table S2: Carbon assimilation tests. Cluster A is compared to reference strains of E. meliloti, E. medicae and E. arboris ; Cluster B is compared to E. adhaerens genomovars A, B, C ; Cluster C is compared to E. terangae reference strain. The following substrate of the biotype 100 strip (bioMérieux) are assimilated by all strains: D(+) Galactose, β-D (+) Fructose , D(+) Trehalose, D(+) Mannose, Sucrose (Saccharose), Maltose, α-Lactose, Lactulose , 1-0-Methyl-β-galactopyranoside, 1-0-Methyl-α- galactopyranoside, D(+)Xylose, Palatinose, α-L-Rhamnose, D(+) Arabitol, Glycerol, myo-Inositol, D-Mannitol, Maltitol, D(+)Turanose , D- Sorbitol, L-Glutamate, L-Proline, Betain, . The following are not assimilated by any strain: D-Saccharate, Mucate, L(+) Tartrate, D(-) Tartrate, meso-Tartrate, Tricarballylate, Gentisate, m-Hydroxybenzoate , 3-Phenylpropionate, m-Coumarate, Histamine, Caprylate, Tryptamine +: Growth ; +/-: Weak Growth ; - No growth *: Positive reaction is estimated by a colorated reaction (Esculin “dark brown/ Black”, Hydroxyquinoline β-glucuronide “black point”, L- Tryptophan “Dark orange”, L-Histidine “pink coloration »). b: Black coloration; j: Yellow coloration; i: not colored; c: colored

25 gvB gvA gvC

adhaerens medicae arboris meliloti adhaerens terangae adhaerens ...... R 7457 E 20216 LMG Cluster B: 2133ORS Cluster B 2154ORS Cluster B: LMG 9954 LMG E 7834T LMG C: Cluster 1400 ORS C: Cluster 1401 ORS C: Cluster LMG LMG 1419T A: Cluster 1444ORS A: Cluster 1410ORS A:Cluster LMG 6133T LMG E 19920T LMG E Substrate E 1407 ORS E 529 ORS E STM 354 control ------α-D (+) Glucose + + + + - + + + + + + + + + + + L(+) Sorbose + +/- - - + + - - +/------α-D Melibiose + - + + + + + + + + + + + + + +/- D(+) Raffinose + - + - + + + + + + + + +/- + + + Maltotriose + + + + + + + + + + + + + - - +/- D(+) Cellobiose + + - + + + + + + + + + + + + + β-Gentiobiose + +/------+ +/- +/- +/- +/- - - +/- +/-

1-0-Methyl-β-D- + + + + - +/- +/- + + + + + + + + + glucopyranoside Esculin* -b -b -b -b -b -b -b -b - b - b - b - b - b - b - b - b D(-)Ribose + + + + - +/- + + +/- + + + + +/- + - L(+) Arabinose + + + + + + + + + + + + + + +/- - α-L(-)Fucose + - + + + + + + + + + + + + + +/- D(+) Melezitose + - + + + + + + + + + + +/- - - - L(-) Arabitol + + + + + + + + + + + + - - - - Xylitol + + + - - - + + + + + + + - - - Dulcitol + + + - - +/- +/- - +/------D-Tagatose + + + - + + + - + ------Adonitol + +/- + + + -i + + + + + + + + + + Hydroxyquinoline-β- + - -i -i - - -i - -i ------i glucuronide* D-Lyxose -i + +/- - - + + + + + + + - - - - i-Erythritol + + + + + + + + + + + + - - - - 1-O-Methyl-α-D- + - + - - + - + + +/- +/- +/- +/- - - - glucopyranoside 3-O-Methyl-D- + +/- + - - + + ------Glucopyranose D(+) Malate - - + - - - + +/- +/- + + +/- +/- - - - L(-) Malate + + + +/- - +/- + + + +/- + +/- + +/- +/- - cis-Aconitate - - + - - - + + + - + +/- +/- - - - trans-Aconitate ------+ + - + + +/- + +/- +/- - Citrate - - + - - - - - + - - - +/- - - - D-Glucoronate ------+ + + +/- + +/- +/- - - - D-Galacturonate - - + - - - + + + + + + + - - - 2-Keto-D-Gluconate + + +/- + + + ------+ + +/- + 5-Keto-D-Gluconate - - +/- - - - + ------L-Tryptophan* -i - -i -c - -c +/-c +c +/- +/-c +c +c -i -i - - N-Acetyl-D- + + + +/- + +/- + + + + + + + + +/- +/- Glucosamine D- Gluconate - - + - - +/- + - + - - - +/- - - - Phenylacetate ------Protocatechuate + -j + +/- - + + + + + + + + + + + p-Hydroxybenzoate - - + - - - - + - + + + + - - - (-) Quinate - + + - + + + + + - - + + + + + Trigonelline + + - - + ------Putrescine ------+ +/- + - - - - DL-α-Amino-n- + + + +/- - + + + + + + + + + + + Butyrate DL-Lactate + - + + + + + + + + +/- + +/- + +/- - Caprate ------+ + + + + - - - - L.Histidine* + + + - - +/- - + + +i +i +i +/-i +i + +/- Succinate + + + - - +/- +i + +/- + + + - - - +/- Fumarate + + + - - +/- + + +/- + + + +/- +/- +/- +/- Glutarate - - + - - - +/- + ------DL-Glycerate +/- +/- + - - - +/- - +/- +/- + +/- - - +/- - DL-α-Amino-n- +/- - + - - - +/- - +/- +/- + + - +/- +/- - Valerate Ethanolamine - +/- + - - +/- +/- - + +/- + +/- - - - - D-Glucosamine + + + + - +/- + + +/- + + + + + +/- - Itaconate - + ------DL-β- - + + - - +/- + + + + + + + + + + Hydroxybutyrate L-Aspartate + + + - - +/- + + + + + + +/- +/- - - D-Alanine ------L-Alanine + + + - - + + + + + + + + + + +/- L-Serine - - + - - - + + + + + + +/- + +/- +/- Malonate - +/- + +/- - +/------+/- +/- +/- Propionate + - + - - +/- + + +/- + + +/- - - - - L-Tyrosine ------+/- - +/- +/- - - - +/- α-Ketoglutarate - - +/------26 27

28 Figure 1. Phylogenetic reconstruction based on individual analysis of the 16S rRNA gene. Analysis were conducted using the ML method. BT values of 75 or more (using 100 replicates) are indicated at branching points. Bars, 0.1% estimated substitutions.

Bradyrhizobium japonicum USDA 110 Agrobacterium tumefaciens C58 Ensifer adhaerens gv. A R-9451 Ensifer adhaerens gv. A LMG 10007 Ensifer adhaerens gv. A LMG 9954 Ensifer adhaerens gv. B R-7457 Ensifer sp. LMG 20571 94 "Sinorhizobium morelense" LMG 21331T Ensifer adhaerens gv. C LMG 20216 Ensifer adhaerens gv. C LMG 20582 Ensifer adhaerens gv. C R-14067 Ensifer sp. ORS 2133 77 Ensifer sp. ORS 529 Cluster B Ensifer sp. ORS 2154 Sinorhizobium americanum LMG 22684T Ensifer fredii LMG 6216 Ensifer fredii LMG 6217T Ensifer fredii LMG 6218 Ensifer fredii LMG 8317 Ensifer xinjiangensis LMG 17930T Ensifer xinjiangensis R-16438 Ensifer arboris LMG 14917 Ensifer arboris LMG 14919T Ensifer arboris LMG 19223 Ensifer medicae LMG 18864 Ensifer medicae LMG 19920T Ensifer medicae LMG 19921 Ensifer meliloti 1021 Ensifer meliloti LMG 4289 Ensifer meliloti LMG 6130 Ensifer meliloti LMG 6133T Ensifer sp. ORS 1407 Ensifer sp. ORS 1410 Cluster A 97 Ensifer sp. ORS 1444 Ensifer kostiensis LMG 14911 94 Ensifer kostiensis LMG 19225 Ensifer kostiensis LMG 19227T T 99 Mesorhizobium loti LMG 6125 Mesorhizobium huakuii MAFF 303099 Ensifer saheli LMG 7837T 78 Ensifer saheli LMG 11864 Ensifer saheli LMG 8310 Ensifer terangae LMG 11859 Ensifer terangae LMG 6464 Ensifer terangae LMG 7834T Ensifer sp. ORS 1400 Ensifer sp. ORS 1401 85 Cluster C Ensifer sp. STM 354 Rhizobium leguminosarum 3841 82 Rhizobium etli USDA9032T 0.1

Figure 2: Phylogenetic trees for five housekeeping genes (atpD, gltA, glnA, recA, thrcC) and the concatenated tree of Tunisian and Senegalese strains compared with Ensifer species reference strains. Trees were calculated using the maximum likelihood method (ML). Bootstrap values of 75 or more derived from 100 replicates are indicated in branching points. Bars, 0.1 % estimated substitutions

atpDML gltAML glnAML B. japonicumIa USDA 110 B. japonicumIa USDA 110 B. ja po ni cumIa USDA 110 A. tumefaciensC58 R. etliCFN42 A. tumefaciensC58 T R. leguminosarum 3841 S. americanumLMG 22684T S. americanumLMG 22684 T T S. americanumLMG 22684 E. xinjiangensisR-16438 99 E. xinjiangensisLMG 17930 T T E. xinjiangensisR-16438 E. xinjiangensisLMG 17930 100 E. xinjiangensisLMG 17930 E. xinjiangensisR-16438 E. frediiLMG 8317 E. frediiLMG 8317 T T 100 E. frediiLMG 6217 100 E. frediiLMG 8317 E. frediiLMG 6217 E. frediiLMG 6217T E. frediiLMG 6218 E. frediiLMG 6218 E. frediiLMG 6218 E. frediiLMG 6216 E. frediiLMG 6216 E. frediiLMG 6216 E. sp. ORS 1400 E. kostiensisLMG 19225 100 T E. terangaeLMG 6464 E. sp. ORS 1401 Cluster C E. kostiensisLMG 19227 T E. kostiensisLMG 14911 E. terangaeLMG 7834 100 E. sp. STM 354 97 E. terangaeLMG 11859 E. terangaeLMG 6464 E. meliloti1021 100 T E. melilotiLM G 61 30 E. sp.ORS 1400 E. terangaeLMG 7834 100 T 100 E. sp.ORS 1401 E. terangaeLMG 11859 E. melilotiLM G 61 33 Cluster C E. melilotiLM G 42 89 E. sp.STM 354 E. sp. ORS 1407 100 E. sp. ORS 1407 E. sp.ORS 1410 E. arborisLMG 14917 100 Cluster A Cluster A 100 E. arborisLMG 14919T E. sp.ORS 1410 E. sp.ORS 1444 E. sp.ORS 1444 E. medicaeLMG 19920T E. arborisLMG 19223 100 E. medicaeLMG 19920T E. meliloti1021 E. medicaeLMG 18864 83 E. melilotiLMG 6130 E. medicaeLMG 19921 E. medicaeLMG 18864 E. medicaeLMG 19921 E. melilotiLMG 6133T E. meliloti1021 100 99 E. saheliLMG 7837T E. melilotiLMG 4289 100 E. melilotiLMG 6130 T E. saheliLMG 8310 E. medicaeLMG 19921 E. melilotiLMG 6133 100 E. saheliLMG 11864 E. medicaeLMG 19920T E. melilotiLMG 4289 100 E. medicaeLMG 18864 78 E. arborisLMG 14917 E. sp.ORS 1400 100 T T E. sp.ORS 1401 Cluster C E. saheliLMG 7837 E. arborisLMG 14919 100 88 97 E. sp.STM 354 E. saheliLMG 8310 E. arborisLMG 19223 T 100 E. terangaeLMG 6464 E. saheliLMG 11864 E. saheliLM G 7837 T T 100 E. terangaeLMG 7834 97 E. arborisLMG 14919 E. saheliLM G 8310 E. terangaeLMG 11859 E. arborisLMG 19223 E. saheliLM G 11864 E. sp.ORS 1407 E. arborisLMG 14917 E. kostiensisLMG 19225 E. kostiensisLMG 19225 100 E. kostiensisLMG 19227T E. sp.ORS 1410 Cluster A 100 E. kostiensisLMG 19227T E. kostiensisLMG 14911 100 E. sp.ORS 1444 E. kostiensisLMG 14911 E. adhaerensgv. B R-7457 75 E. adhaerensgv. B R-7457 “S. morelense”LMG 21331T E. adhaerensgv. C R-14067 100E. adhaerensgv. C R-14067 E. adhaerensgv. B R-7457 E. adhaerensgv. C LMG 20582 E. adhaerensgv. C LMG 20582 T T E. adhaerensgv. C LMG 20216 100 E. adhaerensgv. C LMG 20216T E. adhaerensgv. C LMG 20216 “S. morelense”LMG 21 33 1T 91 E. adhaerensgv. C R-14067 E. sp.ORS 2133 86 E. adhaerensgv. C LMG 20582 E. sp.LM G 205 71 E. sp.ORS 529 Cluster B 100 E. adhaerensgv. A R-9451 E. sp.ORS 2154 E. sp.ORS 2133 E. sp.ORS 529 Cluster B 93 E. adhaerensgv. A LMG 10007 98 E. adhaerensgv. A R-9451 E. sp.ORS 2154 E. adhaerensgv. A LMG 9954 E. adhaerensgv. A LMG 10007 97 98 E. adhaerensgv. A R-9451 A. tumefaciensC58 E. adhaerensgv. A LMG 9954 80 E. adhaerensgv. A LMG 10007 E. sp.LMG 20571 “S. morelense”LMG 21331T 97 0.1 E. adhaerensgv. A LMG 9954 E. sp.ORS 2133 0.1 Cluster B E. sp.LMG 20571 R. etliCFN42 E. sp.ORS 529 99 100 0.1 R. etliCFN42 95 R. leguminosarum3841 E. sp.ORS 2154 R. leguminosarum3841 98M. huakuiiMAFF303099 99 M. huakii MAFF303099 M. huakuiiMAFF303099 M. lotiLMG 6125T T M. lotiLMG 6125 M. lotiLMG 6125T 96

88 recAML thrC ML congruent genes ML B. japoni cumIa USDA 110 B. japoni cum Ia USDA 110 T E. adhaerens gv. C R-14067 S. americanumLMG 22684 99 E. adhaerens gv. C LMG 20582 A. tumefaciensC58 E. xinjiangensisR-16438 E. adhaerens gv. C LMG 20216T E. adhaerensgv. B R-7457 E. xinjiangensisLMG 17930T 98 E. adhaerens gv. B R-7457 E. adhaerensgv. C R-14067 100 E. frediiLMG 8317 T E. adhaerensgv. C LMG 20582 E. frediiLMG 6217 E. sp. ORS 2133 100 100 E. adhaerensC LMG 20216T E. frediiLMG 6218 E. sp. ORS 529 Cluster B 100 E. frediiLMG 6216 92 E. sp. ORS 2154 E. sp.ORS 2133 98 E. adhaerens gv. A R-9451 100 E. sp.ORS 1407 E. sp.ORS 529 Cluster B 100 E. sp.ORS 1410 E. adhaerens gv. A LMG 10007 E. sp.ORS 2154 Cluster A 100 E. sp.ORS 1444 89 E. adhaerens gv. A LMG 9954 100 E. adhaerensgv. A R-9451 T E. meliloti1021 “S. morelense”LMG 21331 E. adhaerensgv. A LMG 10007 E. melilotiLMG 6130 80 E. sp. LMG 20571 100 98 E. adhaerensgv. A LMG 9954 E. melilotiLMG 6133T E. sp. ORS 1400 98 “S. morelense”LMG 21331T E. melilotiLMG 4289 E. sp. ORS 1401 Cluster C E. arborisLMG 19223 100 E. sp. STM 354 100 E. sp. LMG 20571 99 E. arborisLMG 14919T E. terangae LMG 6464 E. sp.ORS 1400 100 87 E. terangae LMG 7834 T E. sp.ORS 1401 Cluster C E. arborisLMG 14917 100 79 T E. terangae LMG 11859 100 E. sp.STM 354 E. medicaeLMG 19920 T 100 E. xinjiangensis LMG 17930 97 E. terangaeLMG 6464 E. medicaeLMG 18864 T E. medicaeLMG 19921 E. xinjiangensis R-16438 E. terangaeLMG 7834 100 E. fredii LMG 8317 100 Cluster A E. kostiensisLMG 19225 E. fredii LMG 6217T E. terangaeLMG 11859 E. kostiensisLMG 19227T E. sp.ORS 1407 100 E. fredii LMG 6218 E. kostiensisLMG 14911 E. fredii LMG 6216 E. sp.ORS 1444 E. saheliLMG 11864 94 100 E. sp. ORS 1407 E. xinjiangensisR-16438 T T E saheliLMG 7837 E. sp. ORS 1444 Cluster A 100 E. xinjiangensisLMG 17930 E. saheliLMG 8310 E. frediiLMG 8317 100 E. kostiensis LMG 19225 E. adhaerensgv. B R-7457 E. kostiensis LMG 19227T E. frediiLMG 6217T E. adhaerensgv. C R-14067 99 E. kostiensis LMG 14911 99 E. frediiLMG 6218 E. adhaerensgv. C LMG 20582 E. meliloti 1021 E. frediiLMG 6216 E. adhaerensgv. C LMG 20216T T 100 E. meliloti LMG 6130 E. kostiensisLMG 19225 “S. morelense”LMG 21331 T 95 E. meliloti LMG 6133 100 T E. sp.LMG 20571 E. meliloti LMG 4289 E. kostiensisLMG 19227 E. sp. ORS 2133 98 E. medicae LMG 19920T E. kostiensisLMG 14911 E. sp.ORS 529 Cluster B 100 E. medicae LMG 18864 E. meliloti1021 E. sp.ORS 2154 E. medicae LMG 19921 100 E. melilotiLMG 6130 E. adhaerensgv. A R-9451 T 95 E. arboris LMG 14917 E. melilotiLMG 6133 E. adhaerensA LMG 10007 E. arboris LMG 14919T E. melilotiLMG 4289 97 0.1 99 E. adhaerensgv. A LMG 9954 E. arboris LMG 19223 E. arborisLMG 14917 Cluster C T 100 E. sp. ORS 1400 E. saheli LMG 7837 E. arborisLMG 14919T 100 E. sp.ORS 1401 100 E. saheli LMG 8310 E. arborisLMG 19223 E. sp.STM 354 100 E. saheli LMG 11864 E. medicaeLMG 19920T E. terangaeLMG 6464 0.1 100 A. tumefaciens C58 E. medicaeLMG 18864 100 E. terangaeLMG 7834T R. etli CFN42 E. medicaeLMG 19921 E. terangaeLMG 11859 97 R. leguminosarum 3841 E. saheliLMG 11864 100 A. tumefaciensC58 M. huakuii MAFF303099 T 0.1 R. etliCFN42 100 M. loti LMG 6125T E. saheliLMG 7837 R. leguminosarum3841 E. saheliLMG 8310 95 M. huakuiiMAFF303099 100 R. etliCFN42 M. lotiLMG 6125T R. leguminosarum3841 100 M. huakii MAFF303099 M. lotiLMG 6125T

100 Rhizobium giardinii 93 bv giardinii H152T . 78 Rhizobium galegaebv galegae R602sp . Allorhizobium undicola ORS995 . Rhizobium tropici CFN299 . 100 70 Mesorhizobium plurifarum ORS1001 . 100 Mesorhizobium plurifarum ORS1096 . Ensifer terangae bv acacia ORS1073 . 100 Ensifer terangae ORS1009 . 99 98 Ensifer kostiense HAMBII1489 . Ensifer saheli baHAMBI1496 . Ensifer arboris HAMBI1552 . 100 Burkholderia tuberum STM678 . Methyllobacterium nodulans ORS2060 . 100 Mesorhizobium loti NZP2037 . 66 Mesorhizobium loti CIAM1801 . 100 Mesorhizobium mediterraneum UPMCa36 . 99 Me so rh izo bi um ci ce r i UPMCa7 . MesorhizobiumhuakuiiMesorhizobium Ra5 . 90 Mesorhizobium septentrionale SDW037 . 100 Mesorhizobium septentrionale SDW012 . Ra5 . 100 Mesorhizobium temperatum SDW018 . Mesorhizobium tianshanense USDA3592 . 100 100 Bradyrhizobium elkanii USDA61 . Bradyrhizobium elkanii USDA94 . Bradyrhizobium japonicum 100 USDA110 . Ensifer fredii HH103. 83 Ensifer fredii USDA191 . 100 Rhizobium galegae bv phaseoli PhD12 . Rhizobium giardinii bv phaseoli Ro84 . Rhizobium etli CFN42 . 97 Rhizobium taiwanense LMG19424 . B. phymatumSTM815 . 100 Rhizobium huautlense USDA4900 . 100 82 Ensifer saheli ORS609 . EnsiferEnsif er terangae bv sesbaniae ORS604 . Ensifer sahel i bv sesbaniae ORS611 . 100 92 ORS529/ORS529 Lotus Arabicus / Sénégal / Cluster B ORS1444/ Argyrolobium uniflorum / Tunisie. / Cluster A 99 99 ORS1400/ Argyrolobium uniflorum / Tunisie/Cluster C STM35 4/ Medicago sativa / Tuni si e/. Cl uster C 99 80 Ensif er medicae M3 . 100 Ensif er medicae A3 21 . Ensif er mel iloti 1021 . Ensif er mel iloti USDA1170 . Rhizobium galegae HAMBI1207 . Azorhizobium caulinodans ORS571 .

0.05

Figure 3: Neighbor-joining phylogenetic tree of nodA gene sequences of Tunisian end Senegalease strains. The new strains are indicated in blot and bootstrap value> 60 % resulting from 1000 replicates are indicated in branching points.