bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Plasmids encode niche-specific traits in

2

3 Dimple Davray, Dipti Deo and Ram Kulkarni*

4

5 Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Lavale,

6 Pune 412115, India

7

8 *Corresponding Author

9 Dr. Ram Kulkarni

10 [email protected] or [email protected];

11 Tel.: +91 2028116477

12

13 Keywords: Lactobacillaceae, plasmid-encoded trait, ecological niche, comparative genomics,

14 stress resistance genes, Clusters of Orthologous Groups

15

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

17 Abstract

18 The species of family Lactobacillaceae are found in highly diverse environments and play an

19 important role in fermented foods and probiotic products. Many of these species have been

20 individually reported to harbor plasmids that encode important genes. In this study, we

21 performed comparative genomic analysis of the publically available data of 512 plasmids from

22 282 strains represented by 51 species of this family and correlated the genomic features of

23 plasmids with the ecological niches in which these species are found. Two-third of the species

24 had at least one plasmid-harboring strain. Plasmid abundance and GC content were significantly

25 lower in the vertebrate-adapted species as compared to the nomadic and free-living species.

26 Hierarchical clustering (HCL) highlighted the distinct nature of plasmids from the nomadic and

27 free-living species than those from the vertebrate-adapted species. EggNOG assisted functional

28 annotation revealed that genes associated with transposition, conjugation, DNA repair and

29 recombination, exopolysaccharide production, metal ion transport, toxin-antitoxin system, and

30 stress tolerance were significantly enriched on the plasmids of the nomadic and in some cases

31 nomadic and free-living species. On the other hand, genes related to anaerobic metabolism, ABC

32 transporters, and major facilitator superfamily were found to be overrepresented on the plasmids

33 of the vertebrate-adapted species. These genomic signatures are correlated to the comparatively

34 nutrient-depleted, stressful and dynamic environments of nomadic and free-living species and

35 nutrient-rich and anaerobic environments of the vertebrate-adapted species. Thus, these results

36 indicate the contribution of the plasmids in the adaptation of lactobacilli to the respective

37 habitats. This study also underlines the potential application of these plasmids in improving the

38 technological and probiotic properties of lactic acid . bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

39 Impact statement

40 The bacteria of the family Lactobacillaceae are present in the wide range of habitats and play an

41 important role in human health, fermented foods and chemical industries. A few studies have

42 demonstrated the presence of plasmids in the individual strains of Lactobacillaceae species

43 encoding various traits. Extensive data of genome sequences of the lactobacilli are becoming

44 available; however, no comprehensive analysis of the plasmid-encoded genes and determining

45 their biological relevance across lactobacilli has been undertaken at a larger scale. In this study,

46 we explored the genomic content of 512 plasmids of Lactobacillaceae species and correlated it

47 to the three types of these species according to their ecological niches – vertebrate-adapted, free-

48 living and nomadic. Comparatively lower plasmid abundance and GC content in the vertebrate-

49 adapted species could be correlated to the presence of these species in the nutrient-rich

50 environment. The genomic content of the plasmids was consistent with the respective lifestyle

51 adopted by lactobacilli suggesting that the plasmids might enhance the niche-specific fitness of

52 the strains. The plethora of important genes present on the plasmids can also make them a highly

53 useful tool in improving the probiotic, technological and food-related properties of lactobacilli.

54 Data summary

55 Nucleotide sequences of plasmids of strains for which complete genome sequences

56 were available were retrieved from the NCBI genome [https://www.ncbi.nlm.nih.gov/genome]

57 and PATRIC 3.5.41 databases on 31st March 2019. The dataset includes 512 nucleotide

58 sequences of plasmids of 282 strains belonging to genus Lactobacillus before its reclassification

59 into several genera (1). Details of the plasmids have been given in Table S1. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

60 1. Introduction

61 Lactobacilli are gram-positive, aerotolerant, and non-sporulating bacteria which belong to the

62 (LAB) group wherein lactic acid is the major metabolic end product during

63 glucose fermentation (2). Lactobacillus is a major genus of the Lactobacillaceae family and the

64 highly diverse nature of its species has recently resulted in reclassification of this genus into 23

65 genera (1). Many lactobacilli have been associated with humans in a variety of ways. Their

66 presence in the gastrointestinal tract has earned extraordinary attention due to the health-

67 promoting properties of few of the genera. These bacteria are also important in the food industry

68 for the production of fermented dairy products wherein these bacteria provide the taste, texture,

69 and antibacterial activity (3). Furthermore, lactobacilli have applications in the production of

70 industrially important chemicals such as lactic acid (4).

71 Lactobacilli have been isolated from a wide range of habitats and have been found to

72 have highly diverse traits. Based on the properties such as phylogenetic analysis, source of

73 isolation, the prevalence of detection, optimal growth temperature, substrate utilization, and

74 environmental stress tolerance, the species of the earlier Lactobacillus genus have been classified

75 as host-adapted, nomadic and free-living (5). Some of the genomic features of these species are

76 correlated with these habitats. For example, the vertebrate-adapted lactobacilli (VAL) have

77 undergone genome reduction by losing a substantial number of genes involved in carbohydrate

78 metabolism and amino acid and cofactor biosynthesis because they thrive in a nutrient-rich

79 environment (5), (6), (7). On the other hand, free-living lactobacilli (FLL) and nomadic

80 lactobacilli (NL) have larger genome sizes because they encode various enzymes which utilize a

81 broad spectrum of carbohydrates (8). Furthermore, Lactobacillus helvticus DPC4571 and

82 Lactobacillus acidophilus NCFM were found to have genes important for the adaptation to their bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

83 habitats, viz., dairy and gut, respectively (9). Lastly, Lactiplantibacillus plantarum, a hallmark

84 NL species found in most diverse habitats was also observed to have a highly diverse genome

85 with no unique environmental signature (10). Another classification of lactobacilli is based on

86 the carbohydrate fermentation pathways, as homofermentative and heterofermentative. In the

87 homofermentative species, the phosphotransferase system (PTS) is preferred to transport the

88 sugars which are fermented via the Emden-Meyerhoff pathway. On the other hand, in the

89 heterofermentative species, PTS are not functional and the phosphoketolase pathway is the

90 preferred catabolic pathway (11), (12).

91 Plasmids have been extensively found in lactobacilli and reported to carry the genes that

92 give them favorable traits and permit them to survive in competitive surroundings (13). Many

93 plasmids have been sequenced from the individual strains and found to encode important

94 functions such as oxidative stress response, antibiotic resistance, bacteriophage resistance,

95 chloride and potassium transport, and bacteriocin production (14), (15). Only a handful of reports

96 are available on the importance of plasmids in the niche adaptation of lactobacilli. In L.

97 plantarum P-8 isolated from fermented dairy products, loss of the plasmids important for dairy-

98 adaptation was found when the strain was administered in rats and humans (16). A plasmid from

99 Lacticaseibacillus paracasei NFBC338 isolated from the human gut encoded a gene involved in

100 adherence to the human intestinal epithelial cells (17). In Levilactobacillus brevis, the brewery

101 isolates were found to have the unique genes on the plasmids supporting the growth under the

102 harsh conditions. At the same time, a large proportion of brewery plasmidome was also found to

103 be shared with the insect isolates (18). Similarly, in Lactococcus lactis, another important

104 bacterium of the LAB group, plant cell wall modifying enzymes were encoded by the plasmids

105 in the plant-derived isolates (19). Apart from these scarce strain and species-specific reports, the bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

106 contribution of plasmids in niche-adaptation has not been holistically studied in the

107 Lactobacillaceae family. In the last decade, Next Generation Sequencing (NGS) has created

108 extensive data on the genome sequences of many strains of lactobacilli. In the current study, we

109 report an analysis of the plasmid sequence data of the species of genus defined as Lactobacillus

110 before its division into several genera (1), available in the public domains to explore their

111 genomic content and their possible involvement in the niche adaptation.

112 2. Material and Methods

113 2.1 General characterization of the plasmids

114 The coding sequences (CDSs) of the 512 plasmids were extracted from the GenBank database,

115 along with their NCBI annotation (https://www.ncbi.nlm.nih.gov/nucleotide) (20). The

116 plasmids for which the encoded genes were not annotated in NCBI were subjected to the analysis

117 using Prodigal software (version 2.6) for predicting the CDSs.

118 2.2 Functional annotation

119 The functional annotation and classification of the CDSs were performed using eggNOG 4.5

120 databases (http://eggnogdb.embl.de/). For subgrouping of the genes under each COG category,

121 their annotations in eggNOG4.5, NCBI and KEGG databases were considered.

122 2.3 Markov clustering analysis

123 The all-against-all bi-directional blast alignment was performed on the obtained CDSs with at

124 least 50% amino acid identity and 50% query coverage (20). The blast output was analyzed using

125 markov clustering (MCL) in the mclblastine v12-0678 pipeline to classify proteins into families

126 (21). Further, Hierarchical clustering was computed in TM4 MeV Suite (22) based on the

127 presence and absence of the protein families in plasmids by using average linkage algorithm and bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

128 Manhattan distance metric parameters The HCL analysis was visualized in the Interactive Tree

129 of life (ITOL) (23) by importing Newick tree from TM4 MeV Suite. For identification of the

130 plasmids which are possibly shared between the species from different habitats, a similarity

131 matrix was created for the plasmids based on the presence and absence of the MCL families

132 using the Sørensen–Dice coefficient. The Genbank files of the plasmids with Sørensen–Dice

133 similarity coefficient of 0.8 or more were compared to each other using EasyFig program (24).

134 2.4 Identification of other important genes

135 To identify antibiotic resistance gene, initially, amino acid sequences of 2,404 antibiotic

136 resistance (ARGs) genes were downloaded from the CARD database

137 (http://arpcard.mcmaster.ca). Plasmids CDSs were aligned against retrieved ARGs sequences

138 through standalone blast (BLASTP,ver.2.9.0,

139 https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/) with a threshold of query coverage of

140 >70%, amino acid sequence identity of >30%, and e-value <10-5 The exopolysaccharide (EPS)

141 gene clusters were identified as described earlier (25). Bacteriocin operons were identified using

142 the web version of BAGEL4 (http://bagel4.molgenrug.nl) using default parameters.

143 2.5 Statistical analysis

144 GraphPad Prism 8 was used to perform statistical analysis.All the comparisons between the

145 habitats was performed by Kruskal-Wallis test with Dunn’s post hoc test. Wilcoxon Signed Rank

146 Test was used to compare GC content of the plasmids with that of the chromosome of the same

147 strain and species.

148 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

149 3. Result and Discussion

150 3.1 Data collection and general characterization

151 Till March 2019, NCBI prokaryotic genome database was found to have complete genome

152 sequences for 282 strains under the genus defined as Lactobacillus before its reclassification into

153 several genera (1). With the current classification, this data includes 24 of the 31 genera under

154 the Lactobacillaceae family. Collectively, nucleotide sequences of a total of 512 plasmids were

155 found in the database for all these strains (Table 1, Table S1). Based on the classification by

156 Duar et al. (5), in our dataset, the maximum number of strains (110) belonged to NL group

157 followed by VAL (86),FLL (43) and insect adapted (6) groups. Forty strains belonged to the

158 species for which no clear information on their possible native habitat was available; hence they

159 were not considered for further analysis. Plasmids from insect adapted strains were excluded

160 from the subsequent analyses as only three plasmids were represented by this category of the

161 strains. Similarly, based on the glycolytic pathway, 216 strains belonged to homofermentative

162 and 51 to heterofermentative species (11), (12).

163 The proportion of strains having plasmids was the smallest for VAL (32.2%) and the

164 largest for FLL (76.7%) (Fig. 1a). The average number of plasmids per strain was significantly

165 lower for VAL (1) than NL (2.3) and FLL (2.7) (Fig. 1b, Table S1). On the other hand, the

166 average number of plasmids per strain for each species was significantly higher only in FLL as

167 compared to VAL, though this average was three-fold higher for NL than VAL (Fig. 1c). This

168 suggests that some of these differences could be driven by strains of only a few species; whereas

169 some others are possibly common to majority of the species of that habitat. Overall, the lower

170 plasmid abundance in VAL as compared to NL and FLL is in agreement with the similar

171 observation for the chromosomal genome size (5). Such reduction is considered to be due to the bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

172 loss of several biosynthetic pathway genes as the bacteria are living in a nutrient-rich

173 environment in the hosts. Furthermore, loss of plasmids in a dairy isolate of Lactobacillus when

174 administered in the human and rat has also been reported. Such loss has been postulated to be

175 because of the superfluous nature of the genes encoded by the lost plasmid in the gut

176 environment as well as stress response of the bacteria during the passage through the harsh

177 gastric environment (16). There was no difference in the average plasmid size per strain across

178 various habitats (Fig. 1d), suggesting that the low number of plasmids in VAL is not

179 compensated by their larger size in these strains.

180 The average GC content of the plasmids per strains was significantly lower for VAL

181 (37.7%) than NL (39.7%) and FLL (39.7 %) (Fig. 1e). At the species level, this difference was

182 significant only between VAL and NL (Fig. 1f). This observation is similar with the lower

183 chromosomal GC content of the host-associated strains, which is thought to be because of non-

184 adaptive loss of DNA repair system leading to the mutational bias toward A and T (5). The

185 plasmid GC contents per strain and per species for NL and FLL were lower than the respective

186 chromosomal GC contents; whereas, in VAL, plasmids and chromosomes had similar GC

187 content (Fig. 1e and 1f). This observation was also valid for similar comparison within the

188 individual strains wherein the average plasmids GC content was found lower than the respective

189 chromosomal GC content in all the NL and in 27 of 33 FLL. It has been reported that the

190 maintenance of plasmid with higher GC content is metabolically expensive for the bacterial cells

191 with lower chromosomal GC content and is thus evolutionary unfavourable (26). It has also been

192 proposed that plasmids can be transferred from chromosomally GC-poor hosts or environments

193 to relatively GC-rich hosts (27). Thus, the lower GC content of plasmids from NL and FLL than

194 their chromosomes is not surprising and is indeed an indication of their possible acquisition by bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

195 horizontal gene transfer. On the other hand, it is suggested that with time the nucleotide

196 composition of plasmids might become closer to that of the host genome because of the

197 dependency of plasmid replication on the host cell machinery (28), (29). Thus, the comparable

198 GC content of VAL chromosomes and plasmids probably suggests their long association with

199 each other. No correlation was observed between GC content and size of the plasmids (data not

200 shown) in contrast to the earlier observation (30).

201 3.2 Hierarchical clustering (HCL) analysis

202 To understand the extent of similarity between the plasmids based on the genomic content, their

203 functional grouping was performed by HCL analysis. Based on markov clustering analysis

204 (MCL), the coding sequences (total 19,058) could be classified into a total of 3,380 protein

205 families, with about half of them (1,560) having a single-member. This observation is similar to

206 that on Lc. lactis plasmids, where also approximately half (413 out of 885) of the protein families

207 were singleton (31). Plasmids from VAL had the highest proportion (73%) of unique families,

208 followed by NL (55%) and FLL (44%) (Fig. 2a). This probably suggests the uniqueness of the

209 plasmids in VAL, which might have arisen during the long course of association of plasmids

210 with these species as speculated from the GC content. The highest sharing of families of NL with

211 VAL and FLL could be due to the fact that NL is found in a wide range of environments,

212 including those associated with the hosts as well as those similar to FLL (5).

213 The data obtained on protein families from MCL analysis was used to perform HCL to

214 cluster the plasmids based on the presence and absence of protein families (Fig. S1). The

215 analysis depicted three broad clusters (I, II and III) of the plasmids (Fig. 2b). Although there was

216 no strict distinct clustering based on the habitats, certain habitat-wise observations were made.

217 For example, half of the plasmids from VAL were present in the cluster I whereas the majority of bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

218 the plasmids from NL and FLL (57.9 % and 55.5 %, respectively) were present in cluster III. On

219 the other hand, almost equal proportions of plasmids from VAL (31.7 %), NL (33.1%) and FLL

220 (31.1 %) were present in cluster II (Fig. 2c). Within the three main clusters, 38 subclusters were

221 observed (Fig. 2b, Table S2). Out of 100 strains having two or more plasmids, 63 strains were

222 such that none of the plasmids from the same strain belonged to the same subcluster. This

223 observation probably indicates the non-redundant genomic content of multiple plasmids within a

224 strain in the majority of lactobacilli. Similar observation was reported in Lc. lactis IL594 where

225 multiple plasmids in the same strain were found to be diverse and proposed to function in the

226 synergistic manner (32).

227 Based on the Sørensen–Dice similarity coefficient which was a measure of extent of

228 sharing of the protein families between plasmids, 15 plasmid pairs with the Sørensen–Dice

229 coefficient of higher than 0.8 were identified (Table S3). Of all these plasmid pairs, four having

230 Sørensen–Dice coefficient of one had only one protein family and the sequence identity of <

231 85% with the partner plasmid with query coverage of < 60%, hence they were considered

232 dissimilar. Majority of the remaining pairs (6) had plasmids from the species belonging to

233 dissimilar habitats. BLAST analysis of the plasmids found to be similar to each other in this way

234 using EasyFig program suggested their highly similar gene content (Fig. S2). This is a suggestive

235 of the inter-species as well as inter-habitat sharing of some of the plasmids in lactobacilli.

236 Interestingly, none of the plasmids from these pairs belonged to VAL underlining the uniqueness

237 plasmids from VAL. This further supports the evolutionary model proposing that VAL have

238 evolved independently in the confined vertebrate-associated environment (5). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

239 3.3 Functional annotation

240 To predict the putative functions associated with the plasmid-encoded genes, they were classified

241 based on orthology using eggNOG. Based on this, 63.2% (12,120) of CDSs were annotated into

242 20 Clusters of Orthologous Groups (COGs) functional categories. Replication, recombination,

243 and repair was the largest category with 32.5% of the CDSs followed by Function unknown

244 (20.7% CDSs), Carbohydrate transport and metabolism (6.2% CDSs) and Transcription (5.3%

245 CDSs) (Fig. 3, Table S4). Furthermore, the genes were subgrouped under each eggNOG

246 category based on their putative functions based on eggNOG, NCBI, and KEGG annotations.

247 The genes under Function unknown category were manually evaluated for their annotations in

248 these databases and based on this some of the genes were binned into the individual COG

249 categories. For example, genes annotated as LPXTG-motif cell wall anchor domain protein that

250 were found under the Function Unknown category were manually put under Cell

251 wall/membrane/envelope biogenesis.

252 To test the hypothesis that plasmids play an important role in the environmental

253 adaptations in Lactobacillaceae, we investigated whether the plasmids encode genes with

254 varying putative functions across the lifestyles. The fraction of genes involved in Replication,

255 recombination, and repair (L), Cell wall/membrane/envelope biogenesis (M), and Inorganic ion

256 transport and metabolism (P) was significantly higher (p < 0.05) in the plasmids from NL as

257 compared to VAL (Fig. 3). Furthermore, the genes associated with Defense mechanisms (V) and

258 Energy production and conservation (C) were significantly overrepresented (p < 0.001, and p <

259 0.05 respectively) in VAL in comparison with NL (Fig. 3). Similarly, the average proportion of

260 genes involved in Transcription (K) was significantly higher (p < 0.05) in the plasmids from FLL

261 as compared to NL. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

262 3.3.1 Replication, recombination, and repair

263 Various subcategories under Replication, recombination, and repair (L) were transposase (1,928

264 CDSs), DNA replication (614 CDSs), resolvase (323 CDSs), integrase (256 CDSs),

265 mobilization/conjugation (221 CDSs), DNA repair (191 CDSs) and DNA repair and

266 recombination (99 CDSs) (Fig. 4a, Table S5). Within these, the genes encoding transposases,

267 resolvases, DNA repair and recombination proteins, and mobilization/conjugation proteins were

268 significantly overrepresented on the plasmids from NL than those from VAL considering the

269 strain-level average. Of these genes, resolvase and transposase were significantly

270 overrepresented on the plasmids from FLL as compared to VAL at species level also (Fig. 4a,

271 Table S6). Previous studies have demonstrated that transposases are more abundant in the

272 bacteria living in the extreme environment and they can help in the evolution of fitter phenotypes

273 by accelerating the rate of mutations (33), (34). In the context of lactobacilli, NL and FLL can be

274 considered relatively extreme as these species occur in a wider range of harsh environmental

275 conditions (5). Thus, higher proportion of mobile genetic elements in the plasmids of NL and

276 FLL may give them genomic plasticity for adaptation to the dynamic environments. Since NL

277 and FLL live are under more extreme conditions, they are likely to face more DNA damage,

278 which can justify the higher proportion of DNA repair genes in their plasmids. This speculation

279 is also supported by earlier observations reporting a higher proportion of DNA repair genes in

280 the thermophilic microorganisms (35). Furthermore, the low proportion of DNA repair genes in

281 VAL could also be correlated to the reduced genome size of these bacteria. The correlation

282 between smaller proteome and loss of the DNA repair genes has already been established in

283 several other bacteria (36). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

284 3.3.2 Carbohydrate transport and metabolism

285 In the current study total of 673 genes (6.2% of the total plasmid-encoded genes) associated with

286 Carbohydrate transport and metabolism (G) were found. Further, carbohydrates metabolism (332

287 CDSs), phosphotransferase system (PTS) (177 CDSs) and major facilitator superfamily

288 transporter (131 CDSs) were identified as the major subcategories (Fig. 4c, Table S5). Fourty-

289 nine complete PTS transporters were found in the 35 plasmids across 30 strains (Table S7).

290 Although there was no significant difference between the habitats for the average proportion of

291 PTS genes, this proportion per species was significantly higher for the NL as compared to HAL

292 and FLL. This observation was also reflected in the fact that the proportion of species having

293 complete PTS on plasmids was 22.2%, 9.1%, and 83.3% for FLL, HAL, and NL, respectively

294 (Table S7). NL are metabolically versatile and able to utilize a wide range of sugars. Thus, the

295 broader distribution of complete PTS on the plasmids from NL underlines the contribution of

296 plasmids in adaptation to diverse environments. All the 30 strains having complete PTS were

297 homofermentative which is consistent with the reported loss of PTS genes from the

298 chromosomes of heterofermentative species (11). These results suggest that plasmids can

299 significantly contribute to the physiology of the homofermentative species by allowing sugar

300 uptake via PTS systems.

301 The abundance of the genes from the KEGG category ‘starch and sucrose metabolism’ in

302 L. brevis, L. sakei, Secundilactobacillus paracollinoides, L. paracasei, which are FLL can be

303 clearly attributed to the presence of maltose phosphorylase and β-phospho-glucomutase in their

304 plasmids which allow utilization of maltose. Furthermore, a large number of

305 maltose/moltooligosaccharide transporters were encoded on the plasmids of FLL (Table S5).

306 Plasmids having such maltose utilization genes had lacI on their upstream region in the six bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

307 strains of L. sakei which is one of the FLL (Fig. 4b, Table S8). LacI transcriptional regulator was

308 shown to be involved in the maltose utilization in L. acidophilus, Enterococcus, and

309 Bifidobacterium (37). One of the major habitats for FLL is fermented cereals (5), which are

310 likely to be rich in maltose released by starch hydrolysis. This suggests that the plasmids might

311 contribute to the efficient utilization of maltose in FLL.

312 The complete lactose operon (lacTEFG) was found on the plasmids of six strains of L.

313 paracasei and one strain of Lacticaseibacillus rhamnosus (Table S9). A previous study has

314 shown that this operon also helps Lacticaseibacillus casei in the uptake of N-acetyllactosamine

315 in human milk (38). AraC was present upstream of the rhamnose utilization genes, viz., MFS

316 transporter, rhamnulokinase, L-rhamnose mutarotase, L-rhamnose isomerase, rhamnulose-1-

317 phosphate aldolase in five plasmids from Ligilactobacillus salivarius (Fig. 4b, Table S10). AraC

318 transcriptional regulator was shown to be involved in the rhamnose metabolism in L. acidophilus

319 and L. plantarum (39). These results suggest that plasmids might play an important role in the

320 growth and physiology of Lactobacillaceae species by allowing the uptake of various sugars.

321 Genes encoding major facilitator superfamily (MFS) proteins were found in significantly

322 higher proportion on the plasmids of VAL as compared to NL and FLL (Fig. 4c). MFS proteins

323 are involved in the transport of a large number of molecules across the membrane in response to

324 chemiosmotic gradient. These include sugars, vitamins, antibiotics, amino acids, nucleosides,

325 ions, etc. In general, majority of the MFS transporters in LAB are multidrug transporters (40).

326 Thus, their overrepresentation in the plasmids of VAL might be related to the antibiotic usage in

327 humans and other animals. Additionally, some of the MFS transporters are also involved in the

328 bile salt resistance in Lactobacillus (41) which is relevant for VAL. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

329 3.3.3 Inorganic ion transport and metabolism

330 Within the Inorganic ion transport and metabolism (P) category metal/ion transporters (435

331 CDSs), other transporters (87 CDSs) and DNA protection during starvation protein (dps) (46

332 CDSs) were identified as the dominant genes (Table S5). Metal/ion transporters genes were

333 significantly more abundant in the plasmids from NL and FLL than VAL considering strains-

334 level average (Fig. 4d, Table S6). This difference was also reflected in species-level average

335 between FLL and VAL. Under this category, a large number of genes putatively involved in

336 heavy metal resistance (HMRGs) were found including those ascribing resistance to arsenic (98

337 genes), cadmium (49 genes), copper (19 genes) and cobalt (13 genes) on the plasmids of 69

338 strains (Table S5). Such heavy metal resistance genes, especially those involved in arsenic

339 resistance were highly enriched on the plasmids of NL such as L. plantarum, L. paracasei, L.

340 paraplantarum as compared to VAL (65 and 4 genes, respectively). No comparative phenotypic

341 data on the heavy metal resistance of the Lactobacillaceae species belonging to various habitats

342 is available. However, the above observation is correlated to a finding in the case of

343 Acinetobacter and Ornithilibacillus where in the environmental isolates were more resistant to

344 heavy metals than the clinical isolates (42), (43). Further, the gene clusters involved in arsenate

345 resistance were detected in 15 plasmids from NAL and FLL (Fig. 5, Table S11). Eight plasmids

346 had complete clusters encoding all five proteins, viz., a regulator (arsR), ATPase component

347 (arsA), secondary transporter (arsB), transcriptional regulator (arsD), and arsenate reductase

348 (arsC). The other seven plasmids lacked arsC; however, chromosomal arsC can complement

349 such incomplete cluster as shown earlier in the case of L. plantarum WCFS1 (44).

350 Several genes related to uptake of the potassium were found in the plasmids of L.

351 plantarum and Lactiplantibacillus pentosus which are NL (35 genes) and L. brevis which is a bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

352 FLL (10 genes); whereas, none of the plasmids from VAL had this gene (Table S5). This

353 observation is consistent with the earlier report suggesting the potassium channels are not

354 essential in the host-dependent bacteria and are found mostly in the free-living and metabolically

355 versatile bacteria (45). Genes encoding manganese transport protein were found to be enriched in

356 plasmids of NL and mostly had dtxR family transcriptional regulator on the upstream side.

357 (Table S12). Other genes enriched on the plasmids from NL mainly include chloride channel

358 protein, calcium-transporting ATPase, magnesium transporter, and Na+/H+ antiporter (Table

359 S5). These proteins are involved in functions such as acid resistance, salt tolerance, metal ion

360 homeostasis, and oxidative stress tolerance (46), (47), (48), probably suggesting their

361 contribution to adaptation of NL to the dynamic environments.

362 The dps gene per strain was 10 and 3 fold more abundant on the plasmids from FLL

363 (found in four species) and NL (two species), respectively, as compared to VAL (one species)

364 (Fig. 4d). Dps is involved in tolerance towards oxidative, pH, heat, and irradiation stresses as

365 well as metal toxicity and attachment to abiotic surfaces during the stationary phase in E. coli

366 (49) and has been thought to protect L. plantarum GB-LP2 from the oxidative damage (50).

367 Higher abundance of this gene in the plasmids from FLL and NL reflects the association of these

368 strains with the open and harsh environment where exposure of such stresses is comparatively

369 higher.

370 3.3.4 Cell wall/membrane/envelope biogenesis

371 Within this category, heteropolysaccharides (HePS) biosynthesis (246 CDSs), adherence (89

372 CDSs), and cell wall biosynthesis and degradation (77 CDSs) were the major subcategories.

373 HePS biosynthesis was represented by 124 CDSs of glycosyltransferases (GTs), 21 CDSs of

374 tyrosine kinase modulator (epsB), 53 genes of polysaccharide synthesis proteins putatively bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

375 involved in HePS biosynthesis, and 48 genes involved in the biosynthesis of activated sugar

376 precursors (Fig. 4e, Table S5). The proportion of these CDS per strain involved in HePS

377 biosynthesis was 34 and 35 fold and significantly higher in the plasmids of NL and FLL,

378 respectively, as compared to VAL. The significant difference between VAL and NL was also

379 reflected in the species-level comparison (Fig, 4d, Table S6). As some of the GTs were also

380 represented in other COG categories such as Carbohydrate transport and metabolism and

381 Function unknown, their comprehensive identification in the Lactobacillus plasmids was

382 performed using dbCAN2 meta server (Table S13). A total of 99 GTs belonging to nine families

383 were found on 47 plasmids with 33 plasmids having two or more GTs with the dominance of

384 GT2 family (52% of the CDSs).

385 We further analyzed the plasmids for the presence of the HePS biosynthesis gene

386 clusters. Out of 512 plasmids, HePS genes clusters were found on 16 plasmids of 16 strains

387 represented by six species (Fig. 6, Table S14). Of these, four clusters had all the genes required

388 for HePS biosynthesis (25). Although a large number (12) of clusters analyzed in this way were

389 incomplete, they can contribute to the HePS production in collaboration with other EPS clusters

390 present in the chromosomes of the same strain. Such contribution of multiple HePS clusters

391 within a strain to various properties of HePS has been demonstrated in L. plantarum WCFS1

392 (51). The absence of epsA in the HePS clusters from L. plantarum plasmids is consistent with our

393 earlier similar observation in case of L. plantarum chromosomes (25). HePS is important in

394 many VAL for colonization in the gut and in few cases withstanding low pH and high bile

395 concentration in the stomach (52). The higher proportion of HePS related genes in the plasmids

396 from NL and FLL than VAL probably suggests that HePS might be even more important for NL

397 and FLL in adapting to the more diverse and harsh environments. This hypothesis is also bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

398 supported by a recent genome analysis showing that the capsular exopolysaccharide encoding

399 genes occur more frequently in the environmental bacteria than the pathogens (53).

400 A large number (143) of glycosyl hydrolases (GH) were also detected under in the

401 plasmids and were classified in 19 CAZy families (Table S15) according to the dbCAN2 meta

402 server. The contributions of GH1 (16.7%) and GH13 (15.3%) families were maximum followed

403 by GH25 (13.9%). The proportion of the GH genes on the plasmids was significantly higher in

404 FLL and NL as compare to VAL. Almost all the GH68 (fructansucrases) and GH70

405 (glucansucrase) which are involved in the production of homopolysaccharides were found only

406 on the plasmids of L. plantarum. Moreover, some of the L. plantarum strains had multiple of

407 such genes. For example, L. plantarum 16 had GH68 as well as GH70 on the same plasmid and a

408 complete heteropolysaccharide biosynthesis gene cluster on another plasmid. Similarly, L.

409 plantarum subsp. plantarum TS12 had two GH68, one on each of the two plasmids and also had

410 a GH70 on another plasmid. This is consistent with the higher occurrence of the genes involved

411 in the HePS biosynthesis in the plasmids of NL as described above. It is interesting to note the

412 absence of glucansucrase on the plasmids VAL species but their presence in the chromosome of

413 Limosilactobacillus reuteri which is a VAL (54).On the contrary, we found absence of

414 glucansucrase in the chromosomes of L. plantarum 16 and L. plantarum subsp. plantarum TS12,

415 which had these genes on the respective plasmids. This observation probably suggests the

416 obligatory requirement of glucansucrase for the lifestyle adapted by limosilactobacilli and its

417 conditional requirement in case of L. plantarum. The higher abundance of the lysozyme family

418 (GH23 and GH25) and N-acetylglucosaminidase (GH3 and GH73) genes in the plasmids of FLL

419 and NL might help them in hydrolyzing the cells walls of the competing bacteria in the open

420 environment. Such a higher abundance of lysozymes encoding genes in the environmental bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

421 isolates than the host-associated isolates has also been reported in E. coli (55). Similarly, higher

422 metabolic versatility of NL and FLL is highlighted by the abundance in their plasmids of GH1

423 and GH2 which enable utilization of various oligosaccharides and GH13 and GH65 which

424 enable utilization of starch.

425 We analyzed whether Lactobacillaceae plasmids encode any of the proteins required for

426 adhesion to the intestinal epithelial (56). Based on annotated data and manual curation, 91 genes

427 putatively involved in adhesion were observed in the lactobacilli plasmids. These genes mainly

428 encoded cell wall anchor domain protein (39 CDSs), sortase (34 CDSs) and collagen-binding

429 surface protein (16 CDSs) (Table S4). There was no significant difference between the habitats

430 for the proportion of the plasmidome encoding these genes collectively. However, a higher

431 number of the genes for collagen-binding surface protein were found on the plasmids of VAL (9

432 CDSs) as compare to NL (5 CDSs) and FLL (2 CDSs). This protein is involved in adhesion of

433 the lactobacilli to the mammalian extracellular matrix (57). The proportion of sortase and cell-

434 wall anchor domain proteins was higher in the plasmids from NL (37 CDSs) than VAL (5

435 CDSs). Sortases and sortase dependent proteins have been mostly studied for their role in

436 interactions with the host cells. However, large numbers of these genes have already been

437 reported in NL, such as L. plantarum and have been correlated to the interactions with the

438 dynamic environments (58).

439 3.3.5 Defense systems

440 The proportion of Defense systems (V) genes was significantly higher in the plasmids of VAL

441 species as compared to NL. The major subcategories observed were restriction-modification (R-

442 M) system (275 CDSs), toxin-antitoxin (TA) (263 CDSs), ABC transporters (123 CDSs) and

443 CRISPR-Cas associated protein (14 CDSs) (Fig. 4f, Table S5). Of the four major types (I-IV) of bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

444 R-M systems, the type I R-M system was found to be the most abundant with 175 CDSs.

445 Although the fraction of type I R-M system genes was overrepresented on the plasmids of VAL,

446 all the three essential subunits (R, M, S) were present in the highest numbers in FLL (9) and NL

447 (8) as compared to the VAL (4). The average proportion of TA genes on the plasmids was

448 significantly higher in FLL (found in seven species) and NL (found in four species) as compared

449 to VAL (found in four species). TA loci are involved in regulating the gene expression under

450 stress conditions and are thought to be more important to the free-living bacteria since they are

451 more frequently under nutritional stress than the host-associated bacteria (59) justifying the

452 above observation.

453 The proportion of genes encoding ABC transporters was 4.2 and 3 fold higher in the

454 plasmids of VAL than those from FLL and NL, respectively (Fig. 4f). Most of the ABC

455 transporters under this COG category appear to be involved in the transport of the antimicrobial

456 peptides and antibiotics outside the cell (Table S5). Higher proportion of such genes in the

457 plasmids of VAL is similar to the overrepresentation of the MFS transporters as discussed above

458 and bacteriocin-encoding genes (see section antibacterial activity) in the similar plasmids. This

459 observation also suggests the contribution of plasmids in defense against other bacteria in the

460 more competitive environment and effluxing the antibiotics.

461 3.3.6 Amino acid transport and metabolism

462 A total of 387 CDSs were found under Amino acid transport and metabolism (E) with no

463 significant difference across the habitats. Amino acids metabolism (161 CDSs), osmoprotectant

464 transport system (72 CDSs) and peptidase (16 CDSs) were the major subcategories (Fig. 4g,

465 Table S5). L. salivarius plasmids uniquely had several genes involved in the amino acid

466 metabolism such as 3-dehydroquinate dehydratase, serine dehydratase, and cysteine desulfurase bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

467 (Table S5). The genes required for the conversion of methionine to cysteine, including cysteine

468 synthase, cystathionine lyase, and serine acetyltransferase were found only in a few L. paracasei,

469 L. casei, L. rhamonosus and Limosilactobacillus fermentum plasmids and this is consistent with

470 this well-characterized pathway in L. parcasei (60).

471 Some plasmids encoded peptidases and their fraction was significantly higher in the

472 plasmids from VAL as compared to NL and FLL. However, these genes were confined to the

473 plasmids from L. salivarius in VAL group. A total of 68 glycine betaine transporter genes were

474 found on the plasmids, of which 64 belonged to NL (L. plantarum and L. pentosus, Table S4).

475 Glycine betaine transporters are involved in providing osmoprotection to the bacteria (61)

476 suggesting that plasmids may play a role in the similar function in NL.

477 3.3.7 Energy production and conservation

478 The category Energy production and conservation (C) was significantly over-represented in the

479 plasmids of VAL as compared to NL. We further classified the genes under this category as

480 those associated with aerobic metabolism (98 CDSs), anaerobic metabolism (71 CDSs), and

481 others (97 CDSs) (Fig. 4h, Table S5). Plasmids from VAL had a significantly higher proportion

482 of the genes associated with anaerobic metabolism than NL and none of FLL had such genes

483 present on their plasmids (Fig. 4h). Although two genes, viz., pyruvate formate lyase and

484 acetaldehyde dehydrogenase were present only in L. salivarius, other genes such as lactate

485 dehydrogenase, alcohol dehydrogenase and fumarate reductase were each present in two or more

486 VAL species such as L. salivarius, L. amylolyticus, L. amylovorus, L. reuteri and Lactobacillus

487 gasseri. There was no significant difference between the habitats for the proportion in the

488 plasmids of the genes involved in the aerobic metabolism. However, genes encoding the oxygen

489 utilizing enzymes viz., NADH oxidase and pyruvate oxidase were present only in NL (L. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

490 plantarum and L. paraplantarum) and FLL (Paucilactobacillus hokkaidonensis, L. brevis, S.

491 paracollinoides, Lentilactobacillus parabuchneri). Although Lactobacillus species are generally

492 considered to be aerotolerant, because of the anaerobic gut environment, most of VAL such as L.

493 johnsonii and L. gasseri are considered strictly anaerobic (62), (63). On the other hand, NL and

494 FLL exhibit respiratory growth (64). Thus, NADH oxidase can help FLL and NL in the

495 regeneration of NAD+ using oxygen as an electron acceptor. Furthermore, Lc. lactis NADH

496 oxidase was also shown to be inactivated under anaerobic conditions or low pH (65) justifying

497 its dispensability in VAL. Similarly, the anaerobic metabolism genes which were dominant in

498 the plasmids from VAL can assist in the maintaining the redox balance under the anaerobic

499 conditions in these species.

500 3.3.8 Stress resistance

501 To confront a variety of harsh conditions, lactobacilli are equipped with an arsenal of stress

502 signaling pathways (66). To get insights into the possible contribution of plasmids in the stress

503 tolerance in Lactobacillaceae, we explored whether any of the various stress tolerance-associated

504 genes described earlier in Lactobacillus (67), (68), (69), (70), (71) were present on the plasmids

505 by evaluating the eggNOG, NCBI, and KEGG annotations. In total 228 stress resistance genes

506 related to oxidative stress resistance (104 CDSs), multiple stress resistance (clp; 39 CDSs),

507 universal stress resistance protein (uspA; 26 CDSs), stress response regulator protein (gls24; 15

508 CDSs), stress response membrane protein (GlsB/YeaQ/YmgE family 11 CDSs), low pH

509 tolerance genes (13 CDSs) and bile tolerance (10 CDSs) were detected on 103 plasmids across

510 62 different strains (Table S16).

511 Within oxidative stress tolerance, genes encoding thioredoxin (TrxA, 33 CDSs),

512 thioredoxin reductase (TrxB, 20 CDSs), glutathione reductase (GshR, 28 CDSs), and disulfide bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

513 isomerase (FrnE, 20 CDSs) were detected in the Lactobacillaceae plasmids (Table S17). The

514 functions of gshR, trxA, and trxB have already been established in NL and FLL (14), (72). FrnE

515 encodes for a disulfide bond formation protein (DsbA) belonging to the thioredoxin family

516 which plays a role in oxidative stress tolerance and has been characterized in Deinococcus

517 radiodurans (73). Interestingly, 17 plasmids which were solely from NL and FLL had all the

518 four genes, trxA, trxB, gshR, and frnE present in the same order and had arsR upstream of trxA

519 (Fig. 7b, Table S18) suggesting the possible involvement of this transcriptional regulator in the

520 expression of all the downstream genes associated with the oxidative stress response. Such a

521 contribution of an ArsR family transcriptional regulator in the expression of trxA2 upon the

522 exposer to the oxidative stress has been reported in Cyanobacterium Anabaena sp. PCC 7120

523 (74). This observation along with the collective overrepresentation of these genes in the plasmids

524 of FLL as compared to HAL indicates the contribution of these plasmids in relieving the

525 oxidative stress in FLL and NL which is more exposed to the aerobic conditions.

526 A total of 39 Clp proteinase genes (clpL, clpV, clpX, clpC) were detected with their

527 fraction significantly higher in the plasmids of FLL as compared to those from HAL (Table S16).

528 The primary function of the Clp complex is to degrade the abnormal or misfolded proteins under

529 stressful conditions (75), (76) with its increased expression observed in L. plantarum WCFS1

530 under heat stress condition (77). The total of 52 genes encoding other stress response proteins

531 viz., UspA, Gls24 and GlsB/YeaQ/YmgE family protein were found on the Lactobacillaceae

532 plasmids of which only two ORFs were encoded by the plasmids of VAL. The gene uspA is

533 considered as universal stress resistance gene and was shown to respond to various stresses such

534 as heat shock, oxidants, DNA damage, and nutrient starvation in L. plantarum (78). Similarly, bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

535 gene gls24, encodes a stress response regulator protein and is induced under copper stress in

536 Enterococcus hirae (79).

537 Bsh encoding choloylglycine hydrolase involved in the deconjugation of bile salts in the

538 mammalian gut disabling their inhibitory effect was present only in the plasmids from VAL (8

539 genes) and FLL (2 genes), though confined only to one species in each habitat (Table S4).

540 Interestingly, all these plasmids also had a two-component system encoded by histidine kinase

541 and response regulator genes, and ABC transporter genes downstream of bsh though encoded by

542 the opposite strand (Fig. 7a, Table S19). An operon containing bsh and the two-component

543 system genes was shown to be involved in bile salt resistance in L. acidophilus (80). In addition

544 to these, an ABC transporter locus was shown to be present next to such bile resistance operon in

545 L. gasseri (81). Thus, in light of these facts and few other previous studies showing strong

546 correlation between the bile salt resistance genotype, phenotype, and the habitat of the LAB (82),

547 it appears that plasmids play an important role in survival of at least in some lactobacilli during

548 the passage through GI tract.

549 3.3.9 Antibiotic resistance

550 Since antibiotic resistance is one of the safety concerns associated with LAB used for human

551 consumption (83), we examined the extent of the occurrence of ARGs on the Lactobacillaceae

552 plasmids using Comprehensive Antibiotic Resistance Database (CARD). A total of 58 ARGs

553 were found on 48 plasmids in 42 Lactobacillus strains (Table S20). These ARGs encoded

554 resistance for lincosamides (26 lmrB, two lnuA, and one lsaA), tetracycline (15), multiple

555 antibiotics (6), aminoglycoside (2), chloramphenicol (2), erythromycin (1), pleuromutilin (2),

556 streptogramin (1), and vancomycin (1). The presence of lmrB has been previously shown in L.

557 gasseri UFVCC 1091 (84) and was also found to be involved in bacteriocin secretion and bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

558 immunity in Lc. lactis (85). Campedelli et al. (86) reported the presence of 18 tetracycline

559 resistance genes in the raw data of the genomes of 182 type strains of Lactobacillus. Thus, the

560 presence of 15 tetracycline resistance genes in 512 plasmids in the current set appears to be a

561 much higher proportion. Of these, eight genes encoded for the efflux pumps (Tet (L) and Tcr3);

562 whereas seven encoded for ribosomal protection proteins (TetM, TetW, and TetB(P)). Majority

563 of the tetracycline resistance genes (eight) were found on the plasmids of VAL and this

564 phenomenon could be related to the usage of the antibiotics as tetracycline is one of the most

565 commonly used antibiotics in humans. This hypothesis is also supported by an earlier report

566 suggesting that the tetracycline resistance genes in the human intestinal bifidobacteria might

567 have been acquired from the intestinal pathogens (87). Although it is important that LAB for the

568 food and probiotic applications should be free of antibiotic resistance and the associated genes,

569 some of the previous reports have shown the absence of correlation between the antibiotic

570 resistance phenotype and genotype (86), (88). Thus, the mechanism of antibiotic resistance in the

571 strains lacking the related ARGs and that of the antibiotic susceptibility in spite of having the

572 ARGs needs to be deeply studied in Lactobacillaceae.

573 ARGs are often reported to co-occur with HMRGs in many bacteria especially under the

574 high concentration of heavy metals in the environment (89). Thus, we analyzed the co-

575 occurrence of ARGs with HMRGs. No significant difference was found in the proportion of

576 plasmids having ARGs with or without HMRGs (logistic regression, p < 0.05, data not shown).

577 Such a low extent of co-occurrence of these genes has also been reported earlier in Lactobacillus

578 and Lactococcus (90). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

579 3.3.10 Antibacterial activity

580 Lactobacilli are known as one of the important producers of bacteriocins which show

581 antimicrobial activity against closely related bacteria. To identify bacteriocin encoded ORFs in

582 lactobacilli plasmids, the plasmid genomes were screened with the help of BAGEL4 database

583 (91). A total of 18 plasmids of 18 strains were found to encode genes for bacteriocin operons.

584 Twelve plasmids contained complete operons collectively for class IIa (4 plasmids), IIb (5

585 plasmids), and IIc (3 plasmids) bacteriocins and encoded genes for structural peptides and

586 immunity and transport proteins (Table S21a). In total, 27 structural genes were found on the

587 plasmids, and nine of them appear to encode novel bacteriocin (identity < 85%) (Table S21b).

588 The highest number of the structural genes (24) encoded class II bacteriocins, whereas only one

589 plasmid (Lactobacillus gallinarum HFD4) encoded three bacteriolysin genes (class III) (Table

590 S8b). This is consistent with the earlier observation in Lactobacillus (92). Although the presence

591 of structural genes alone in a bacterium is not sufficient for the production and secretion of the

592 bacteriocins, complementation of such genes with the accessory genes from other strains was

593 shown to support the secretion of the novel bacteriocins (93). This suggests the potential

594 application of Lactobacillaceae plasmids in the production of novel bacteriocins. In the context

595 of habitats, the proportion of strains having complete bacteriocin operons as well as those having

596 structural genes in the plasmids was highest in VAL (18.5% and 22.2%, respectively) than NL

597 (9.5% and 13.6%, respectively) and none of the FLL had the complete bacteriocin operons or

598 structural genes. A similar observation was reported earlier with a high prevalence of bacteriocin

599 encoding genes in the Lactobacillus strains isolated from animal and human gut than those

600 isolated from other sources (92). This observation also suggests that plasmids might offer a

601 competitive advantage to the lactobacilli, mostly in the host-adapted environment. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

602 4. Conclusion

603 Plasmids have been shown to be important for the niche-adaptation in many individual bacteria.

604 We demonstrated this phenomenon in Lactobacillaceae by large-scale comparative genomic

605 analysis of the publically available plasmid sequence data. The genomic content of the plasmids

606 was consistent with the respective lifestyle adopted by lactobacilli suggesting that the plasmids

607 might enhance the niche-specific fitness of the strains. Whether the plasmid-encoded genes are

608 redundant as the chromosomal ones remains to be determined. Provided that numerous bacterial

609 genes have no proved function assigned to them, at least some of the smaller plasmids with a

610 very few genes offer a readymade tool to understand the function of these genes. Specifically,

611 after incorporating suitable selection marker into the plasmids and their transformation into a

612 model LAB, the altered phenotype can give readout of the function of such genes. Additionally,

613 some of the genes present on the plasmids such as those conferring resistance to antibiotics,

614 heavy metals, oxidative stress, bacteriocins, etc. as well as those ascribing certain biochemical

615 traits such as utilization of specific sugars can be developed as the selection markers.

616 Considering the arsenal of the genes present on the plasmids and the commercial importance of

617 the Lactobacillaceae species, plasmids have huge potential in improving the properties of other

618 strains of these bacteria. For example, the plasmids with PTS systems can enable utilization of

619 broad range of sugars, those with the stress resistance genes can enable bacterial sustenance

620 under industrial as well as gastrointestinal settings, those with EPS biosynthesis related genes

621 can improve the food-related properties, those with the genes involved in redox balancing can

622 assist in the production of value-added chemicals, and so on. Taken together, this study not only

623 provides a bird’s eye view of the plasmidome of Lactobacillaceae and fundamental insights into bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

624 their ecological importance but also opens new avenues to understand the biological functions

625 and invent novel biotechnological applications of the plasmids.

626 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

627 Authors and contributors

628 Conceptualization, R.K.; Methodology, D.D. (Dimple Davray); Formal analysis, D.D. (Dimple

629 Davray), D.D. (Dipti Deo); Investigation, D.D. (Dimple Davray); Resources, R.K.; Writing –

630 original draft preparation, D.D. (Dimple Davray); Writing – review and editing, R.K.

631 Conflicts of interest

632 The authors declare that there are no conflicts of interest.

633 Funding information

634 We are grateful to the Symbiosis Centre for Research and Innovation (SCRI), Symbiosis

635 International (Deemed University) for financial support. The academic activities related to

636 antimicrobial resistance at the host institute are supported through ERASMUS+ grant 598515-

637 EPP-1-2018-1-IN-EPPKA2-CBHE-JP.

638

639 References

640 1. Zheng J, Wittouck S, Salvetti E, Franz CMAP, Harris HMB, Mattarelli P, et al. A 641 taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended 642 description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and 643 Leuconostocaceae. Int J Syst Evol Microbiol. 2020;70(4):2782–858.

644 2. Slover CM. Lactobacillus: a Review. Clin Microbiol Newsl. 2008 Feb 15;30(4):23–7.

645 3. Evivie SE, Huo G-C, Igene JO, Bian X. Some current applications, limitations and future 646 perspectives of lactic acid bacteria as probiotics. Food Nutr Res. 2017 Jan 647 3;61(1):1318034.

648 4. Hatti-Kaul R, Chen L, Dishisha T, Enshasy H El. Lactic acid bacteria: from starter 649 cultures to producers of chemicals. FEMS Microbiol Lett. 2018 Oct 1;365(20). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

650 5. Duar RM, Lin XB, Zheng J, Martino ME, Grenier T, Pérez-Muñoz ME, et al. Lifestyles in 651 transition: evolution and natural history of the genus Lactobacillus. FEMS Microbiol Rev. 652 2017;41(1):S27–48.

653 6. Makarova K, Slesarev A, Wolf Y, Sorokin A, Mirkin B, Koonin E, et al. Comparative 654 genomics of the lactic acid bacteria. Proc Natl Acad Sci U S A. 2006 Oct 655 17;103(42):15611–6.

656 7. Zheng J, Guan Z, Cao S, Peng D, Ruan L, Jiang D, et al. Plasmids are vectors for 657 redundant chromosomal genes in the group. BMC Genomics. 2015 Jan 22;16(1):6.

658 8. Danner H, Holzer M, Mayrhuber E, Braun R. Acetic acid increases stability of silage 659 under aerobic conditions. Appl Environ Microbiol. 2003 Jan 1;69(1):562–7.

660 9. O’Sullivan O, O’Callaghan J, Sangrador-Vegas A, McAuliffe O, Slattery L, Kaleta P, et 661 al. Comparative genomics of lactic acid bacteria reveals a niche-specific gene set. BMC 662 Microbiol. 2009;9.

663 10. Martino ME, Bayjanov JR, Caffrey BE, Wels M, Joncour P, Hughes S, et al. Nomadic 664 lifestyle of Lactobacillus plantarum revealed by comparative genomics of 54 strains 665 isolated from different habitats. Environ Microbiol. 2016 Dec 1;18(12):4974–89.

666 11. Gänzle MG. Lactic metabolism revisited: Metabolism of lactic acid bacteria in food 667 fermentations and food spoilage. Vol. 2, Current Opinion in Food Science. Elsevier Ltd; 668 2015. p. 106–17.

669 12. Zheng J, Ruan L, Sun M, Gänzle M. A genomic view of lactobacilli and pediococci 670 demonstrates that phylogeny matches ecology and physiology. Appl Environ Microbiol. 671 2015;81(20):7233–43.

672 13. Folli C, Levante A, Percudani R, Amidani D, Bottazzi S, Ferrari A, et al. Toward the 673 identification of a type I toxin-antitoxin system in the plasmid DNA of dairy Lactobacillus 674 rhamnosus. Sci Rep. 2017 Dec 1;7(1):1–13.

675 14. Zhai Z, Yang Y, Wang J, Wang G, Ren F, Hao Y. Complete genome sequencing of 676 Lactobacillus plantarum CAUH2 reveals a novel plasmid pCAUH203 associated with 677 oxidative stress tolerance. 3 Biotech. 2019 Apr 1;9(4). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

678 15. Cui Y, Hu T, Qu X, Zhang L, Ding Z, Dong A. Plasmids from Food Lactic Acid Bacteria: 679 Diversity, Similarity, and New Developments. Int J Mol Sci. 2015 Jun 10;16(12):13172– 680 202.

681 16. Song Y, He Q, Zhang J, Qiao J, Xu H, Zhong Z, et al. Genomic Variations in Probiotic 682 Lactobacillus plantarum P-8 in the Human and Rat Gut. Front Microbiol. 2018 May 683 8;9(MAY):893.

684 17. C D, RP R, G F, C S. Sequence Analysis of the Plasmid Genome of the Probiotic Strain 685 Lactobacillus paracasei NFBC338 Which Includes the Plasmids pCD01 and pCD02. 686 Plasmid. 2005;54(2).

687 18. Fraunhofer ME, Geißler AJ, Behr J, Vogel RF. Comparative Genomics of Lactobacillus 688 brevis Reveals a Significant Plasmidome Overlap of Brewery and Insect Isolates. Curr 689 Microbiol. 2019 Jan 15;76(1):37–47.

690 19. Fallico V, McAuliffe O, Fitzgerald GF, Ross RP. Plasmids of raw milk cheese isolate 691 Lactococcus lactis subsp. lactis biovar diacetylactis DPC3901 suggest a plant-based origin 692 for the strain. Appl Environ Microbiol. 2011 Sep 15;77(18):6451–62.

693 20. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search 694 tool. J Mol Biol. 1990;215(3):403–10.

695 21. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection 696 of protein families. Nucleic Acids Res. 2002;30(7):1575–84.

697 22. Howe E, Holton K, Nair S, Schlauch D, Sinha R, Quackenbush J. MeV: MultiExperiment 698 Viewer. In: Biomedical Informatics for Cancer Research. Boston, MA: Springer US; 699 2010. p. 267–77.

700 23. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and 701 annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44(W1):W242–5.

702 24. Sullivan MJ, Petty NK, Beatson SA. Easyfig: A genome comparison visualizer. 703 Bioinformatics. 2011 Apr;27(7):1009–10.

704 25. Deo D, Davray D, Kulkarni R. A diverse repertoire of exopolysaccharide biosynthesis 705 gene clusters in Lactobacillus revealed by comparative analysis in 106 sequenced bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

706 genomes. Microorganisms. 2019;7(10).

707 26. Dietel A-K, Merker H, Kaltenpoth M, Kost C. Selective advantages favour high genomic 708 AT-contents in intracellular elements. Blokesch M, editor. PLOS Genet. 2019 Apr 709 29;15(4):e1007778.

710 27. Nishida H. Comparative Analyses of Base Compositions, DNA Sizes, and Dinucleotide 711 Frequency Profiles in Archaeal and Bacterial Chromosomes and Plasmids. Int J Evol Biol. 712 2012;2012:1–5.

713 28. Lawrence JG, Ochman H. Molecular archaeology of the Escherichia coli genome. Proc 714 Natl Acad Sci U S A. 1998 Aug 4;95(16):9413–7.

715 29. Rocha EPC, Danchin A. Base composition bias might result from competition for 716 metabolic resources. Vol. 18, Trends in Genetics. Trends Genet; 2002. p. 291–4.

717 30. Almpanis A, Swain M, Gatherer D, McEwan N. Correlation between bacterial G+C 718 content, genome size and the G+C content of associated plasmids and bacteriophages. 719 Microb genomics. 2018 Apr 1;4(4).

720 31. Kelleher P, Mahony J, Bottacini F, Lugli GA, Ventura M, Van Sinderen D. The 721 Lactococcus lactis pan-Plasmidome. Front Microbiol. 2019 Apr 4;10(APR):707.

722 32. Górecki RK, Koryszewska-Bagińska A, Gołębiewski M, Żylińska J, Grynberg M, 723 Bardowski JK. Adaptative Potential of the Lactococcus lactis IL594 Strain Encoded in Its 724 7 Plasmids. PLoS One. 2011 Jul 18;6(7):e22238.

725 33. Li SJ, Hua ZS, Huang LN, Li J, Shi SH, Chen LX, et al. Microbial communities evolve 726 faster in extreme environments. Sci Rep. 2014 Aug 27;4.

727 34. Nelson WC, Wollerman L, Bhaya D, Heidelberg JF. Analysis of insertion sequences in 728 thermophilic cyanobacteria: Exploring the mechanisms of establishing, maintaining, and 729 withstanding high insertion sequence abundance. Appl Environ Microbiol. 2011 Aug 730 1;77(15):5458–66.

731 35. Cobo-Simón M, Tamames J. Relating genomic characteristics to environmental 732 preferences and ubiquity in different microbial taxa. BMC Genomics. 2017 Jun 733 29;18(1):499. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

734 36. Acosta S, Carela M, Garcia-Gonzalez A, Gines M, Vicens L, Cruet R, et al. DNA Repair 735 Is Associated with Information Content in Bacteria, Archaea, and DNA Viruses. J Hered. 736 2015;106(5):644–59.

737 37. Zhang X, Rogers M, Bierschenk D, Bonten MJM, Willems RJL, van Schaik W. A LacI- 738 Family Regulator Activates Maltodextrin Metabolism of Enterococcus faecium. 739 Manganelli R, editor. PLoS One. 2013 Aug 7;8(8):e72285.

740 38. Bidart GN, Rodríguez-Díaz J, Pérez-Martínez G, Yebra MJ. The lactose operon from 741 Lactobacillus casei is involved in the transport and metabolism of the human milk 742 oligosaccharide core-2 N-acetyllactosamine. Sci Rep. 2018 Dec 1;8(1):1–12.

743 39. Beekwilder J, Marcozzi D, Vecchi S, De Vos R, Janssen P, Francke C, et al. 744 Characterization of Rhamnosidases from Lactobacillus plantarum and Lactobacillus 745 acidophilus. Appl Environ Microbiol. 2009 Jun;75(11):3447–54.

746 40. Lorca G, Reddy L, Nguyen A, Sun EI, Tseng J, Yen MR, et al. Lactic Acid Bacteria: 747 Comparative Genomic Analyses of Transport Systems. In: Biotechnology of Lactic Acid 748 Bacteria Novel Applications. 2007. p. 73–87.

749 41. Elkins CA, Moser SA, Savage DC. Genes encoding bile salt hydrolases and conjugated 750 bile salt transporters in Lactobacillus johnsonii 100-100 and other Lactobacillus species. 751 Microbiology. 2001 Dec 1;147(12):3403–12.

752 42. Dhakephalkar PK, Chopade BA. High levels of multiple metal resistance and its 753 correlation to antibiotic resistance in environmental isolates of Acinetobacter. Biometals. 754 1994 Jan;7(1):67–74.

755 43. Jiang XW, Cheng H, Zheng BW, Li A, Lv LX, Ling ZX, et al. Comparative genomic 756 study of three species within the genus Ornithinibacillus, reflecting the adaption to 757 different habitats. Gene. 2016 Mar 1;578(1):25–31.

758 44. Van Kranenburg R, Golic N, Bongers R, Leer RJ, De Vos WM, Siezen RJ, et al. 759 Functional analysis of three plasmids from Lactobacillus plantarum. Appl Environ 760 Microbiol. 2005 Mar 1;71(3):1223–30.

761 45. Loukin SH, Kuo MMC, Zhou XL, Haynes WJ, Kung C, Saimi Y. Microbial K+ channels. 762 Vol. 125, Journal of General Physiology. The Rockefeller University Press; 2005. p. 521– bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

763 7.

764 46. Anjali T. Merchant and Grace. A Spatafora. A role for the DtxR family of 765 metalloregulators in gram-positive pathogenesis. Mol Oral Microbiol. 2014;29(1):1–7.

766 47. Yao W, Yang L, Shao Z, Xie L, Chen L. Identification of salt tolerance-related genes of 767 Lactobacillus plantarum D31 and T9 strains by genomic analysis. Ann Microbiol. 2020 768 Mar 3;70(1):1–14.

769 48. Böhmer N, König S, Fischer L. A novel manganese starvation-inducible expression 770 system for Lactobacillus plantarum. FEMS Microbiol Lett. 2013 May;342(1):37–44.

771 49. Goulter-Thorsen RM, Taran E, Gentle IR, Gobius KS, Dykes GA. CsgA production by 772 Escherichia coli O157:H7 alters attachment to abiotic surfaces in some growth 773 environments. Appl Environ Microbiol. 2011 Oct;77(20):7339–44.

774 50. Kwak W, Kim K, Lee C, Lee C, Kang J, Cho K, et al. Comparative analysis of the 775 complete genome of Lactobacillus plantarum GB-LP2 and potential candidate genes for 776 host immune system enhancement. J Microbiol Biotechnol. 2016 Apr 28;26(4):684–92.

777 51. Remus DM, van Kranenburg R, van Swam II, Taverne N, Bongers RS, Wels M, et al. 778 Impact of 4 Lactobacillus plantarum capsular polysaccharide clusters on surface glycan 779 composition and host cell signaling. Microb Cell Fact. 2012 Nov 21;11(1):149.

780 52. Khalil ES, Manap MYA, Mustafa S, Alhelli AM, Shokryazdan P. Probiotic properties of 781 exopolysaccharide-producing Lactobacillus strains isolated from tempoyak. Molecules. 782 2018;23(2).

783 53. Rendueles O, Garcia-Garcerà M, Néron B, Touchon M, Rocha EPC. Abundance and co- 784 occurrence of extracellular capsules increase environmental breadth: Implications for the 785 emergence of pathogens. PLOS Pathog. 2017;13(7):e1006525.

786 54. Kralj S, van Geel-Schutten GH, Dondorff MMG, Kirsanovs S, van der Maarel MJEC, 787 Dijkhuizen L. Glucan synthesis in the genus Lactobacillus: Isolation and characterization 788 of glucansucrase genes, enzymes and glucan products from six different strains. 789 Microbiology. 2004 Nov 1;150(11):3681–90.

790 55. Cohan FM, Kopac SM. Microbial genomics: E. coli relatives out of doors and out of body. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

791 Vol. 21, Current Biology. Cell Press; 2011. p. R587–9.

792 56. Sengupta R, Altermann E, Anderson RC, McNabb WC, Moughan PJ, Roy NC. The role 793 of cell surface architecture of lactobacilli in host-microbe interactions in the 794 gastrointestinal tract. Mediators Inflamm. 2013;2013.

795 57. Vélez MP, De Keersmaecker SCJ, Vanderleyden J. Adherence factors of Lactobacillus in 796 the human gastrointestinal tract. Vol. 276, FEMS Microbiology Letters. FEMS Microbiol 797 Lett; 2007. p. 140–8.

798 58. Kleerebezem M, Boekhorst J, Van Kranenburg R, Molenaar D, Kuipers OP, Leer R, et al. 799 Complete genome sequence of Lactobacillus plantarum WCFS1. Proc Natl Acad Sci U S 800 A. 2003 Feb 18;100(4):1990–5.

801 59. Deo Prakash Pandey and Kenn Gerdes. Toxin–antitoxin loci are highly abundant in free- 802 living but lost from host-associated prokaryotes. Nucleic Acids Research. 2005. p. 966– 803 976.

804 60. Wüthrich D, Irmler S, Berthoud H, Guggenbühl B, Eugster E, Bruggmann R. Conversion 805 of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile 806 cysK-ctl-cysE Gene Cluster. Front Microbiol. 2018 Oct 17;9(OCT):2415.

807 61. Zhao S, Zhang Q, Hao G, Liu X, Zhao J, Chen Y, et al. The protective role of glycine 808 betaine in Lactobacillus plantarum ST-III against salt stress. Food Control. 2014 Oct 809 1;44:208–13.

810 62. Maresca D, Zotta T, Mauriello G. Adaptation to Aerobic Environment of Lactobacillus 811 johnsonii/gasseri Strains. Front Microbiol. 2018 Feb 9;9(FEB):157.

812 63. Ruth A. Riedl, Samantha N. Atkinson, Colin M.L. Burnett, Justin L. Grobe and JRK. The 813 Gut Microbiome, Energy Homeostasis, and Implications for Hypertension. Curr 814 Hypertens Rep. 2017;19(4).

815 64. Zotta T, Parente E, Ricciardi A. Aerobic metabolism in the genus Lactobacillus: impact 816 on stress response and potential applications in the food industry. Vol. 122, Journal of 817 Applied Microbiology. Blackwell Publishing Ltd; 2017. p. 857–69.

818 65. Chambellon E, Rijnen L, Lorquet F, Gitton C, Van Hylckama Vlieg JET, Wouters JA, et bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

819 al. The D-2-hydroxyacid dehydrogenase incorrectly annotated PanE is the sole reduction 820 system for branched-chain 2-keto acids in Lactococcus lactis. J Bacteriol. 2009 Feb 821 1;191(3):873–81.

822 66. Papadimitriou K, Alegría Á, Bron PA, de Angelis M, Gobbetti M, Kleerebezem M, et al. 823 Stress Physiology of Lactic Acid Bacteria. Microbiol Mol Biol Rev. 2016 Sep;80(3):837– 824 90.

825 67. Biswas S, Keightley A, Biswas I. Characterization of a stress tolerance-defective mutant 826 of Lactobacillus rhamnosus LRB. Mol Oral Microbiol. 2019 Aug 1;34(4):153–67.

827 68. Fontana A, Falasconi I, Molinari P, Treu L, Basile A, Vezzi A, et al. Genomic comparison 828 of Lactobacillus helveticus strains highlights probiotic potential. Front Microbiol. 2019 829 Jun 26;10:1380.

830 69. Daranas N, Badosa E, Francés J, Montesinos E, Bonaterra A. Enhancing water stress 831 tolerance improves fitness in biological control strains of Lactobacillus plantarum in plant 832 environments. Jogaiah S, editor. PLoS One. 2018 Jan 5;13(1):e0190931.

833 70. Jia FF, Zheng HQ, Sun SR, Pang XH, Liang Y, Shang JC, et al. Role of luxS in Stress 834 Tolerance and Adhesion Ability in Lactobacillus plantarum KLDS1.0391. Biomed Res 835 Int. 2018;2018.

836 71. Lebeer S, Vanderleyden J, De Keersmaecker SCJ. Genes and Molecules of Lactobacilli 837 Supporting Probiotic Action. Microbiol Mol Biol Rev. 2008;72(4):728–64.

838 72. Jänsch A, Korakli M, Vogel RF, Gänzle MG. Glutathione reductase from Lactobacillus 839 sanfranciscensis DSM20451 T: Contribution to oxygen tolerance and thiol exchange 840 reactions in wheat sourdoughs. Appl Environ Microbiol. 2007 Jul;73(14):4469–76.

841 73. Khairnar NP, Joe MH, Misra HS, Lim SY, Kim DH. Frne, a cadmium-inducible protein in 842 Deinococcus radiodurans, is characterized as a disulfide isomerase chaperone in vitro and 843 for its role in oxidative stress tolerance in vivo. J Bacteriol. 2013;195(12):2880–6.

844 74. Ehira S, Ohmori M. The redox-sensing transcriptional regulator RexT controls expression 845 of thioredoxin A2 in the cyanobacterium Anabaena sp. strain PCC 7120. J Biol Chem. 846 2012 Nov 23;287(48):40433–40. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

847 75. Krüger E, Witt E, Ohlmeier S, Hanschke R, Hecker M. The Clp proteases of Bacillus 848 subtilis are directly involved in degradation of misfolded proteins. J Bacteriol. 2000 849 Jun;182(11):3259–65.

850 76. Michel A, Agerer F, Hauck CR, Herrmann M, Ullrich J, Hacker J, et al. Global regulatory 851 impact of ClpP protease of Staphylococcus aureus on regulons involved in virulence, 852 oxidative stress response, autolysis, and DNA repair. J Bacteriol. 2006 Aug 853 15;188(16):5783–96.

854 77. Russo P, Mohedano M de la L, Capozzi V, de Palencia PF, López P, Spano G, et al. 855 Comparative proteomic analysis of Lactobacillus plantarum WCFS1 and ΔctsR mutant 856 strains under physiological and heat stress conditions. Int J Mol Sci. 2012 857 Sep;13(9):10680–96.

858 78. Gury J, Seraut H, Tran NP, Barthelmebs L, Weidmann S, Gervais P, et al. Inactivation of 859 padR, the repressor of the phenolic acid stress response, by molecular interaction with 860 usp1, a universal stress protein from Lactobacillus plantarum, in Escherichia coli. Appl 861 Environ Microbiol. 2009 Aug 15;75(16):5273–83.

862 79. Stoyanov J V., Mancini S, Lu ZH, Mourlane F, Poulsen KR, Wimmer R, et al. The stress 863 response protein Gls24 is induced by copper and interacts with the CopZ copper 864 chaperone of Enterococcus hirae. FEMS Microbiol Lett. 2010 Jan 1;302(1):69–75.

865 80. Pfeiler EA, Azcarate-Peril MA, Klaenhammer TR. Characterization of a novel bile- 866 inducible operon encoding a two-component regulatory system in Lactobacillus 867 acidophilus. J Bacteriol. 2007 Jul 1;189(13):4624–34.

868 81. Azcarate-Peril MA, Altermann E, Goh YJ, Tallon R, Sanozky-Dawes RB, Pfeiler EA, et 869 al. Analysis of the Genome Sequence of Lactobacillus gasseri ATCC 33323 Reveals the 870 Molecular Basis of an Autochthonous Intestinal Organism. Appl Environ Microbiol. 2008 871 Aug 1;74(15):4610–25.

872 82. O’Flaherty S, Briner Crawley A, Theriot CM, Barrangou R. The Lactobacillus Bile Salt 873 Hydrolase Repertoire Reveals Niche-Specific Adaptation. Ellermeier CD, editor. 874 mSphere. 2018 May 30;3(3).

875 83. Barlow M. What antimicrobial resistance has taught us about horizontal gene transfer. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

876 Vol. 532, Methods in molecular biology. Methods Mol Biol; 2009. p. 397–411.

877 84. da Cunha LR, Ferreira CLLF, Durmaz E, Goh YJ, Sanozky-Dawes RB, Klaenhammer 878 TR. Characterization of Lactobacillus gasseri isolates from a breast-fed infant. Gut 879 Microbes. 2012;3(1).

880 85. Kaatz GW, McAleese F, Seo SM. Multidrug resistance in Staphylococcus aureus due to 881 overexpression of a novel multidrug and toxin extrusion (MATE) transport protein. 882 Antimicrob Agents Chemother. 2005 May 1;49(5):1857–64.

883 86. Campedelli I, Mathur H, Salvetti E, Clarke S, Rea MC, Torriani S, et al. Genus-wide 884 assessment of antibiotic resistance in Lactobacillus spp. Appl Environ Microbiol. 2019 885 Jan 1;85(1).

886 87. Wang N, Hang X, Zhang M, Liu X, Yang H. Analysis of newly detected tetracycline 887 resistance genes and their flanking sequences in human intestinal bifidobacteria. Sci Rep. 888 2017 Dec 1;7(1).

889 88. Rozman V, Mohar Lorbeg P, Accetto T, Bogovič Matijašić B. Characterization of 890 antimicrobial resistance in lactobacilli and bifidobacteria used as probiotics or starter 891 cultures based on integration of phenotypic and in silico data. Int J Food Microbiol. 2020 892 Feb 2;314:108388.

893 89. Chen J, Li J, Zhang H, Shi W, Liu Y. Bacterial heavy-metal and antibiotic resistance 894 genes in a copper tailing dam area in northern China. Front Microbiol. 2019;10:1–12.

895 90. Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ. Co-occurrence of resistance 896 genes to antibiotics, biocides and metals reveals novel insights into their co-selection 897 potential. BMC Genomics. 2015;16(1):1–14.

898 91. Anne de Jong, Sacha A. F. T. van Hijum*, Jetta J. E. Bijlsma JK and OPK. BAGEL: a 899 web-based bacteriocin genome mining tool. Nucleic Acids Research. 2006.

900 92. Collins FWJ, O’Connor PM, O’Sullivan O, Gómez-Sala B, Rea MC, Hill C, et al. 901 Bacteriocin Gene-Trait matching across the complete Lactobacillus Pan-genome. Sci Rep. 902 2017 Dec 1;7(1):1–14.

903 93. Collins FWJ, Mesa-Pereira B, O’Connor PM, Rea MC, Hill C, Ross RP. Reincarnation of bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

904 Bacteriocins From the Lactobacillus Pangenomic Graveyard. Front Microbiol. 2018 Jul 905 2;9:1298.

906

907 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

908 Table 1: General features of the plasmids from completely sequenced Lactobacillaceae species.

Basic features of plasmids Values Number and proportion of the strains having plasmids 155 (54.7%) Number and proportion of the species having plasmids 38 (66.6%) Total number of plasmids 512 GC Content range (%) 28.9 - 76.5 Size range (Kb) 0.83 - 759.6 Total number of coding sequences (CDSs) 19,064 Number range of coding sequences per plasmid 1-809 909

910

911 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

912 Figure legends

913 Figure 1: General features of the plasmids in the Lactobacillaceae species from various habitats 914 indicating the abundance of plasmids across habitats (a), the number of plasmids per strain (b), 915 average number of plasmids per strain for each species (c), average size of the plasmids per 916 strain (d), GC content per strain (e) and average GC content per strain for each species (f). The 917 statistical analyses were performed to compare the data across various habitats using Kruskal- 918 Wallis test with Dunn’s post hoc test. The GC contents of plasmids and chromosome of the same 919 strain (e) and of the same species (f) were compared using the Wilcoxon signed rank test (***, p 920 < 0.001; **, p < 0.01; *, p < 0.05). 921 922 Figure 2: Hierarchical clustering (HCL) analysis of the Lactobacillaceae plasmids. (a) Sharing of 923 protein families encoded by plasmids as identified by markov clustering (MCL) across 924 lactobacilli from various habitats. (b) HCL tree displaying the clustering of the plasmids based 925 on their protein family composition. Black bars on the outermost periphery represent plasmid 926 genome size. The plasmids having mobilization/conjugation genes were identified based the 927 eggNOG annotations detailed in section 2.2.1 (c) Pie chart representing the proportion of the 928 plasmids from each habitat under clusters I, II, and III of the tree shown in panel (b). 929 930 Figure 3: Average proportion of each of the top-ten COG categories in the Lactobacillaceae 931 plasmids per strain from various habitats. L, Replication, repair and recombination; S, Function 932 unknown; G, Carbohydrate transport and metabolism; K, Transcription; P, Inorganic ion 933 transport and metabolism; M, Cellwall/membrane/envelope biogenesis; V, Defense mechanisms; 934 E, Amino acid transport and metabolism; D, Cell cycle control, cell division, chromosome 935 partitioning; and C, Energy production and conservation. The statistical analysis was performed 936 to compare the data across various habitats using the Kruskal-Wallis test with Dunn’s post hoc 937 test (***, p < 0.001; **, p < 0.01; *, p < 0.05). The error bars represent standard error of 938 measurement. 939 940 Figure 4: Average proportion of each of the COG subcategories in the Lactobacillaceae plasmids 941 per strain from various habitats. The statistical analyses were performed to compare the data 942 across various habitat using the Kruskal-Wallis test with Dunn’s post hoc test (***, p < 0.001; bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

943 **, p < 0.01; * p < 0.05). The error bars represent standard error of measurement. The solid black 944 triangles indicate the significant difference by the same test between the habitats for the species- ▲▲▲ ▲▲ ▲ 945 level average which is mentioned in the Table S6 ( , p < 0.001; , p < 0.01; , p < 0.05). 946 947 Figure 5: Arsenic resistance gene clusters identified in 15 Lactobacillaceae plasmids. Numbers 948 on the right side of the clusters indicate the number of plasmids showing that types of 949 arrangement of the cluster. arsR, a regulator; arsA, ATPase component; arsB, secondary 950 transporter; arsD, transcriptional regulator; and arsC, arsenate reductase. The plasmids 951 represented in each group were from, group 1: L. sakei WiKim0074 (CP025207.1), L. curvatus 952 ZJUNIT8 (CP029967.1); group 2: L. hokkaidonensis LOOC 260 (AP014681.1), L. buchneri 953 NRRL B-30929 (CP002653.1), L. plantarum CLP0611 (CP019723.1), L. brevis ZLB004 (-) 954 (CP021460.1), L. plantarum HAC01 (-) (CP029350.1), L. plantarum DR7 (-) (CP031319.1), L. 955 plantarum ATG-K6 (CP032465.1), L. plantarum ATG-K8 (-) (CP032467.1); group 3: L. 956 plantarum TMW 1.277 (CP017364.1), L. paraplantarum DSM 10667 (CP032747.1), L. 957 plantarum WCFS1 (CR377166.1); group 4: L. plantarum MF1298 (-) (CP013150.1 and 958 CP013151.1). Details are available in the supplementary Table S9. 959 960 Figure 6: EPS gene clusters identified in 16 Lactobacillaceae plasmids. The clusters encoded by 961 the negative strand are represented by the negative sign in the parentheses after the strain name. 962 The Genbank accession number of the plasmid has been mentioned in the parentheses after the 963 strain name. 964 965 Figure 7: Organization of the (a) bile and (b) oxidative stress resistance genes in 966 Lactobacillaceae plasmids. Numbers on the right side of the clusters indicate the number of 967 plasmids showing that types of arrangement of the cluster. The bile resistance genes (a) were 968 bsh, choloylglycine hydrolase; phosphatase, protein tyrosine/serine phosphatase; RR, two- 969 component response regulator; HPK, signal transduction histidine kinase; permease, ABC 970 transporter permease protein; and transporter, ABC-type multidrug transport system ATPase 971 component. The bile resistance genes were from L. salivarius UCC118 (CP000234.1), L. 972 salivarius CECT 5713 (CP002037.1), L. salivarius JCM1046 (CP007647.1), L. salivarius str. 973 Ren (CP011404.1), L. salivarius CICC 23174 (CP017108.1), L. salivarius ZLS006 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

974 (CP020859.1), L. salivarius BCRC 12574 (CP024065.1), L. salivarius BCRC 14759 975 (CP024068.1). The oxidative stress resistance genes (b) were arsR, transcriptional regulator; 976 trxA, thioredoxin; trxB, thioredoxin reductase; frnE, disulfide isomerase; and gshR, glutathione 977 reductase. The oxidative stress resistance genes were from, group 1: L. plantarum LP3 978 (CP017068.1), L. plantarum CAUH2 (-) (CP015129.2), L. brevis LMT1-73 (CP033887.1), L. 979 plantarum LMT1-48 (CP033892.1), L. brevis SRCM101174 (CP021481.1), L. brevis 100D8 980 (CP015340.1), L. brevis KB290 (-) (AP012172.1), L. brevis SRCM101106 (CP021676.1), L. 981 plantarum ATG-K2 (-) (CP032463.1), L. plantarum DSR M2 (CP022293.1), L. sakei 982 WiKim0063 (CP022710.1), L. sakei WiKim0072 (-) (CP025138.1), L. sakei WiKim0074 983 (CP025207.1); group 2: L. plantarum ZFM55 (CP032360.1), L. plantarum ZFM9 984 (CP032643.1), L. plantarum SRCM102022 (CP021502.1), L. plantarum KACC 92189 985 (CP024060.1). 986 987 Figure S1: Heatmap based on the presence (black areas) and absence (white areas) of protein 988 families encoded by the Lactobaccilaceae plasmids as identified by MCL analysis. 989 990 Figure S2: BLAST comparisons using EasyFig program of the highly similar plasmids identified 991 in different strains of Lactobacillaceae based on the Sørensen–Dice similarity coefficient. PF 992 numbers in the parentheses after the gene names indicate MCL protein families.

993 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. (a) (b) (c) ✱✱✱ 150 Total no. of strains 15 8

No. of strains having plasmids ✱ 110 ✱✱✱ 6 100 86 10 73 4

50 43 5

Number of plasmids 33 27 2 strain for each species Average no. of plasmids per Average Number of plasmids per strain 0 0 0

Free-living Nomadic Nomadic Nomadic Free-living Free-living rate-adapted

Vertebrate-adapted Verteb Vertebrate-adapted

(d) (e) C: chromosomes P: plasmids (f) C: chromosomes P: plasmids

✱✱✱ ✱✱✱ 500 100 100 ✱✱✱ ✱✱ 400 80 80 ✱

✱✱✱ 300 ✱✱✱ ✱ 60 60 ✱ ✱✱ per strain 200

40 strain for each species 40 100 Plasmid genome size (Kb) GC content (%) per strain Average GC content (%) per Average

0 20 20 CPCPCP CPCPCP

Nomadic Free-living Nomadic Nomadic Free-living Free-living Vertebrate-adapted Vertebrate-adapted Vertebrate-adapted

Figure 1: General features of the plasmids in the Lactobacillaceae species from various habitats indicating the abundance of plasmids across habitats (a), the number of plasmids per strain (b), average number of plasmids per strain for each species (c), average size of the plasmids per strain (d), GC content per strain (e) and average GC content per strain for each species (f). The statistical analyses were performed to compare the data across various habitats using Kruskal-Wallis test with Dunn’s post hoc test. The GC contents of plasmids and chromosome of the same strain (e) and of the same species (f) were compared using the Wilcoxon signed rank test (***, p < 0.001; **, p < 0.01; *, p < 0.05).

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(a) (c)

Free-living 433 Vertebrate- 76 Free-living Nomadic 304 adapted 176 Cluster in tree Vertebrate- I adapted 126 Nomadic II 1019 745 III

(b)

CP032646.1 CP012124.1 CP018214.1 CP025418.1 CP025414.1 CP006016.1 CP025732.1 CP019749.1 CP019741.1 AP012176.1 AP012173.1 CP012893.1 CP029549.1 AP012543.1 CP032639.1 CP012654.1 CP005984.1 CP015127.2 CP021504.1 CP004408.1 CP013168.2 JN084215.1 CP013167.2

CR377164.1 CP024062.1 CP013158.2 CP017959.1 CP010527.1 CP032657.1 CP032661.1 CP028984.1 CP006043.1 CP015968.1 CP015128.2 CP020099.1 JN084214.1 CP017360.1 CP002560.1 CP017362.1 CP017367.1 CR377165.1 CP028982.1 CP032362.1 CP013170.2 CP032647.1 CP020094.1 CP029969.1 CP023178.1 CP006042.1 CP013162.2 CP023495.1 CP031007.1 CP020458.1 CP025992.1 CP024066.1 CP002611.1 CP007650.1 CP002612.1 CP015969.1 CP017110.1 CP006015.1 CP020860.1 CP014923.1 CP007649.1 CP014913.1 CP014925.1 CP017369.1 CP000417.1 AP017934.1 CP017960.1 CP002037.1 CP017370.1 CP000234.1 CP013154.2 CP006247.1 CP017108.1 CP017068.1 CP024065.1 CP022293.1 CP024068.1 AP012542.1 CP007647.1 CP017264.1 CP033891.1 CP011404.1 CP009908.1 CP020859.1 CP012658.1 CP017109.1 CP019742.1 CP024415.1 CP032638.1 CP018210.1 CP025733.1 CP006014.1 CP025994.1 CP005983.1 CP015341.1 CP029970.1 CP005981.1 CP017372.1 CP019739.1 CP022131.1 CP016798.1 CP032758.1 CP014917.1 CP032655.1 CP012653.1 CP032660.1 CP032651.1 CP020461.1 CP018613.1 CP021998.2 CP022476.1 CP018612.1 CP005947.2 CP007123.1 CP028981.1 CP029251.1 CP023175.1 CP002342.1 CP018325.1 CP014788.1 CP018330.1 CP024064.1 CP021502.1 CP017371.1 CP032360.1 CP007648.1 CP032643.1 CP007126.1 CP006012.1 CP012656.1 CP006013.1 Free-living AP012546.1 AY862141.1 CP012896.1 CP006017.1 CP031017.1 CP012892.1 CP031019.1 CP012891.1 I CP023494.1 CP014789.1 CP032749.1 CP032935.1 CP016630.1 Plasmid size FN357112.1 CP021702.1 CP021705.1 CP002561.1 100 kb CP002613.1 Vertebrate-adapted CP021701.1 CP021706.1 AP012175.1 CP006034.1 10 CP006035.1 CP004407.1 CP024416.1 CP014920.1 CP032644.1 CP016631.1 CP032361.1 9 AP017933.1 8 CP017378.1 CP017382.1 Nomadic CP022292.1 7 11 II CP032750.1 CP012123.1 CP029968.1 CP031006.1 CP023774.1 CP033887.1 CP002223.2 CP033892.1 CP017067.1 CP032463.1 CP032649.1 12 CP025138.1 CP006249.1 13 CP015129.2 CP018211.1 6 CP021676.1 CP024414.1 14 CP032658.1 CP015859.1 Potentially mobilizable/ CP019349.1 CP024059.1 5 CP019740.1 CP006039.1 15 CP019748.1 CP032747.1 CP005948.1 CP013151.1 4 CP024417.1 CP017361.1 conjugative CP017958.1 CP030882.1 AP012174.1 CP006038.1 38 3 CP017373.1 CP021675.1 CP005978.1 CP017383.1 CP014922.1 37 CP000418.1 CP008839.1 2 CP002655.1 CP032636.1 16 CP021460.1 CP032634.1 36 1 Non-mobilizable/ CP005979.1 CP032641.1 CP014930.1 CP032653.1 CP021482.1 CP025137.1 35 CP021458.1 CP031004.1 CP005982.1 CP025207.1 non-conjugative CP013166.2 CP018327.1 17 CP014921.1 CP017359.1 CP014931.1 CP005980.1 18 CP020862.1 CP017409.1 34 CP017411.1 CP015861.1 AP017930.1 CP005945.2 19 CP021483.1 CP021461.1 CP022711.1 CP032934.1 CP028983.1 CP013153.2 20 CP032656.1 CP032461.1 CP032662.1 CP012660.1 CP023773.1 CP018328.1 CP015860.1 33 III CP016801.1 CP020095.1 21 CP019754.1 CP015340.1 32 CP013160.2 AP012172.1 CP021503.1 CP021481.1 CP002610.1 CP025415.1 31 CP020460.1 CP018326.1 CP031005.1 CP033889.1 22 CP017377.1 CP032754.1 CP019350.1 CP013152.2 CP019737.1 CP032462.1 CP019746.1 CP012655.1 CP019351.1 AP014681.1 30 CP012895.1 CP021478.1 23 CP031008.1 AP012169.1 CP021459.1 CP021457.1 CP021672.1 CP019736.1 CP033886.1 CP019745.1 CP032755.1 29 24 CP021480.1 CP017380.1 CP032933.1 CP003044.1 28 27 CP016799.1 CP028980.1 CP019753.1 CP020096.1 CP017355.1 CP032645.1 CP017365.1 CP015339.1 25 CP019735.1 CP002653.1 26 CP019744.1 CP006248.1 CP028978.1 CP032753.1 CP013750.1 CP013754.1 CP018212.1 CP006040.1 CP032756.1 CP023491.1 CP017364.1 CP025993.1 CP017357.1 CP012657.1 CP023492.1 CP006041.1 CP017412.1 CP025417.1 CP018797.1 CP017955.1 CP012652.1 CP023176.1 CP013155.2 CP013751.1 CP002035.1 CP013755.1 AF488832.1 CP017956.1 CP015858.1 CP011405.1 CP032745.1 CP008840.1 CP017375.1 CP013756.1 CP013156.2 CP013752.1 CP017957.1 CP023177.1 CP032748.1 CP031020.1 CP016800.1 CP002430.1 CP019752.1 CP014926.1 CP018213.1 CP014929.1 AP012170.1 CP030883.1 CP002036.1 CP013150.1 AF488831.1 CP005946.2 CP032465.1 CP006037.1 CP000424.1 CP031319.1 CP012651.1 CP032467.1 CP017408.1 CP029350.1 CP005943.2 CP019723.1 CP012659.1 CP002654.1 CP018329.1 CP025413.1 CP021673.1 CP005944.2 CP014919.1 CP017407.1 CP014928.1 CP018798.1 AP012171.1 CP014916.1 CP005985.1 CP014914.1 CP014932.1 CP014927.1 CP029538.1 CP005487.1 CP014918.1 CP007125.1 CP019747.1 CP002617.1 CP002619.1 CP019738.1 CP014988.1 CP017265.1 CP029537.1 CP029547.1 CP006036.1 CP007124.1 CP012188.1 CP017410.1 CP017262.1 CP014986.1 CP032932.1 CP014987.1 CP029548.1 CP017368.1 CP014202.1 FM179324.1 CP012190.1 CP017358.1 CP012191.1 AP012545.1 CP017263.1 CP012189.1 CP000935.1 CP017266.1 CP020097.1 CP024061.1 CP020098.1

CP024060.1 AP012168.1 CP032650.1 CP015967.1 AP014682.1 CP023493.1 CP017381.1 CP017376.1 CP032746.1 CR377166.1 CP029967.1 CP025208.1 CP033890.1 CP022710.1 AP017932.1 CP019751.1 CP016802.1 CP015862.1 Tree scale:1 CP025416.1 CP028979.1 CP021933.1 CP032752.1 CP017366.1 CP017356.1

Figure 2: Hierarchical clustering (HCL) analysis of the Lactobacillaceae plasmids. (a) Sharing of protein families encoded by plasmids as identified by markov clustering (MCL) across lactobacilli from various habitats. (b) HCL tree displaying the clustering of the plasmids based on their protein family composition. Black bars on the outermost periphery represent plasmid genome size. The plasmids having mobilization/conjugation genes were identified based the eggNOG annotations detailed in section 2.2.1 (c) Pie chart representing the proportion of the plasmids from each habitat under clusters I, I I, and III of the tree shown in panel (b).

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

50

✱✱ 40

30 %

20 ✱✱✱

✱✱✱ ✱ 10 ✱ ✱ Free-living Vertebrate-adapted Nomadic 0 LSGKPMVEDC COG categories

Figure 3: Average proportion of each of the top -ten COG categories in the Lactobacillaceae plasmids per strain from various habitats. L, Replication, repair and recombination; S, Function unknown; G, Carbohydrate transport and metabolism; K,

Transcription; P, Inorganic ion transport and metabolism; M, Cellwall/membrane/envelope biogenesis; V, Defense mechanisms; E, Amino acid transport and metabolism; Cell cycle control, cell division, chromosome partitioning; and C, Energy production and conservation. The statistical analysis was performed to compare the data across various habitats using the Kruskal-Wallis test with Dunn’s post hoc test (***, p < 0.001; **, p < 0.01; *, p < 0.05). The error bars represent standard error of measurement.

(a) Replication, recombination, and repair (b) Transcription

25 ✱✱✱ 2.5 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which✱✱ was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 20 2.0

15 1.5 ✱ % % ✱✱ ✱✱✱ 10 ✱✱✱ 1.0 ✱ ✱✱✱ ✱✱✱ ✱ 5 ✱✱ ✱ 0.5 ✱✱✱ ✱

0 0.0

IntegraseResolvase XRE familyLacI family BglG familyAbrB familyAraC family LysR familyGntR family Transposase DNA Repair DeoR family DNA replication TetR/AcrR family

Mobile genetic elements Sugar transporter regulator DNA repair and recombination (d) Inorganic ion transport (e) Cell wall/membrane/ (c) Carbohydrate transport and metabolism and metabolism envelope biogenesis ✱ 4 8 4 ✱✱✱ ✱✱ ✱✱ ✱✱✱ ✱ ✱✱ 3 6 3 ✱ % % % 2 4 2

✱✱ ✱ 2 ✱ 1 1 ✱✱

0 0 0

Adherence

EPS biosynthesis Bile acid resistance Metal/Ion transporter Cell wall metabolism Galactose metabolism

Major facilitator superfamily Pentose phosphate pathway Starch and sucrose metabolism Inositol phosphate metabolism CarbohydratesOther carbohydratesmultiple pathways metabolismDNA protection during starvation PhosphotransferaseFructose system and (PTS)mannose metabolism Glycolysis/gluconeogenesis pathway Amino and nucleotide sugar metabolism (g) Amino acid transport and (h) Energy production (f) Defense systems metabolism and conservation 15 8 2.0 ✱✱

✱ 6 1.5 10

✱✱ ✱ % %

% 4 1.0

✱✱✱ ✱ 5 2 ✱✱ ✱✱✱ 0.5 Free-living ✱✱✱ ✱ Vertebrate-adapted 0 0 0.0 Nomadic

Peptidase RM system Toxin-antitoxin Osmoprotection ABC transporters Aerobic metabolism CRISPR-Cas system Amino acidAmino metabolism acid transporter Anaerobic metabolism Abortive infection systems

Figure 4: Average proportion of each of the COG subcategories in the Lactobacillaceae plasmids per strain from various habitats. The statistical analyses were performed to compare the data across various habitat using the Kruskal-Wallis test with Dunn’s post hoc test (***, p < 0.001; **, p < 0.01; * p < 0.05). The error bars represent standard error of measurement. The solid black triangles indicate the significant difference by the same test between the habitats for the species-level average which is mentioned in the Table S6 (▲▲▲, p < 0.001; ▲▲, p < 0.01; ▲, p < 0.05). bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

group 1 × 2 group 2 × 8 group 3 × 3 group 4 × 2

0.5 kb

arsR arsD arsA arsC arsB Unknown function Unrelated function

Figure 5: Arsenic resistance gene clusters identified in 15 Lactobacillaceae plasmids. Numbers on the right side of the clusters indicate the number of plasmids showing that types of arrangement of the cluster. arsR, a regulator; arsA, ATPase component; arsB, secondary transporter; arsD, transcriptional regulator; and arsC, arsenate reductase. The plasmids represented in each group were from, group 1: L. sakei WiKim0074 (CP025207.1), L. curvatus ZJUNIT8 (CP029967.1); group 2: L. hokkaidonensis LOOC 260 (AP014681.1), L. buchneri NRRL B-30929 (CP002653.1), L. plantarum CLP0611 (CP019723.1), L. brevis ZLB004 (-) (CP021460.1), L. plantarum HAC01 (-) (CP029350.1), L. plantarum DR7 (-) (CP031319.1), L. plantarum ATG-K6 (CP032465.1), L. plantarum ATG-K8 (-) (CP032467.1); group 3: L. plantarum TMW 1.277 (CP017364.1), L. paraplantarum DSM 10667 (CP032747.1), L. plantarum WCFS1 (CR377166.1); group 4: L. plantarum MF1298 (-) (CP013150.1 and CP013151.1). Details are available in the supplementary Table S9. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

L.buchneri NRRL B-30929 (-) (CP002653.1) * * L. buchneri CD034 (CP003044.1) * L. brevis 100D8 (CP015339.1) * C. farciminis CNCM-L3699-S (-) (CP011954.1) * L. paraplantarum DSM 10667 (CP032745.1) * L. plantarum DSM 16365 (CP032755.1) * * * * L. plantarum LZ227 (CP015858.1) * * ** * ** L. plantarum ZFM9 (CP032645.1) L. pentosus ZFM222 (-) (CP032655.1) L. plantarum HFC8 (CP012657.1) * * * L. plantarum 16 (CP006041.1) L. plantarum X7021 (-) (CP025417.1) L. plantarum LQ80 (CP028980.1) ** ** L. plantarum TMW 1.1623 (CP017380.1) L. plantarum KACC92189 (CP024061.1) * * L. plantarum C410L1 (-) (CP017955.1) * * * * * 1 kb

epsA epsE epsX Other, EPS related Other, cell surface related Precursor Biosynthesis Phosphoregulatory GT epsY Unrelated function Transposase Decoration module Unknown function * pseudogene

Figure 6: EPS gene clusters identified in 16 Lactobacillaceae plasmids. The clusters encoded by the negative strand are represented by the negative sign in the parentheses after the strain name. The Genbank accession number of the plasmid has been mentioned in the parentheses after the strain name. bioRxiv preprint doi: https://doi.org/10.1101/2020.08.20.258673; this version posted October 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(a) tyrosine/serine bsh phosphatase RR HPK permease transporter × 8 (b)

group 1 × 13

group 2 × 4 300 bp

arsR frnE gshR trxA trxB

Figure 7: Organization of the (a) bile and (b) oxidative stress resistance genes in Lactobacillaceae plasmids. Numbers on the right side of the clusters indicate the number of plasmids showing that types of arrangement of the cluster. The bile resistance genes (a) were bsh, choloylglycine hydrolase; phosphatase, protein tyrosine/serine phosphatase; RR, two- component response regulator; HPK, signal transduction histidine kinase; permease, ABC transporter permease protein; and transporter, ABC-type multidrug transport system ATPase component. The bile resistance genes were from L. salivarius UCC118 (CP000234.1), L. salivarius CECT 5713 (CP002037.1), L. salivarius JCM1046 (CP007647.1), L. salivarius str. Ren (CP011404.1), L. salivarius CICC 23174 (CP017108.1), L. salivarius ZLS006 (CP020859.1), L. salivarius BCRC 12574 (CP024065.1), L. salivarius BCRC 14759 (CP024068.1). The oxidative stress resistance genes (b) were arsR, transcriptional regulator; trxA, thioredoxin; trxB, thioredoxin reductase; frnE, disulfide isomerase; and gshR, glutathione reductase. The oxidative stress resistance genes were from, group 1: L. plantarum LP3 (CP017068.1), L. plantarum CAUH2 (-) (CP015129.2), L. brevis LMT1-73 (CP033887.1), L. plantarum LMT1-48 (CP033892.1), L. brevis SRCM101174 (CP021481.1), L. brevis 100D8 (CP015340.1), L. brevis KB290 (-) (AP012172.1), L. brevis SRCM101106 (CP021676.1), L. plantarum ATG-K2 (-) (CP032463.1), L. plantarum DSR M2 (CP022293.1), L. sakei WiKim0063 (CP022710.1), L. sakei WiKim0072 (-) (CP025138.1), L. sakei WiKim0074 (CP025207.1); group 2: L. plantarum ZFM55 (CP032360.1), L. plantarum ZFM9 (CP032643.1), L. plantarum SRCM102022 (CP021502.1), L. plantarum KACC 92189 (CP024060.1).