bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 The gill-associated symbiont microbiome is a main source of woody-plant

2 polysaccharide hydrolase genes and secondary metabolite gene clusters in

3 Neoteredo reynei, a unique from south Atlantic mangroves.

4

5 Thais L. Brito1, Amanda B. Campos2, F.A. Bastiaan von Meijenfeldt3, Julio P. Daniel1, Gabriella

6 B. Ribeiro1, Genivaldo G. Z. Silva4; Diego V. Wilke1; Daniela Morais5, Bas E. Dutilh3,6, Pedro M.

7 Meirelles2, Amaro E. Tridade-Silva1.

8

9 1 Drug Research and Development Center, Department of Physiology and Pharmacology, Federal

10 University of Ceara, Fortaleza, CE, Brazil;

11 2 Institute of Biology, Federal University of Bahia, Salvador, BA, Brazil

12 3 Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands;

13 4 Computational Science Research Center, San Diego State University, San Diego,

14 CA, USA;

15 5 National Institute of Metrology, Quality and Technology, Rio de Janeiro, Brazil;

16 6 Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre,

17 Nijmegen, The Netherlands.

18

19 * Corresponding author

20 E-mail: [email protected] (AETS)

21

22

23

1 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

24 Abstract

25

26 Teredinidae is a family of highly adapted wood-feeding and wood-boring bivalves,

27 commonly known as , whose evolution is linked to the acquisition of cellulolytic

28 gammaproteobacterial symbionts harbored in bacteriocytes within the gills. In the present work

29 we applied metagenomics to characterize microbiomes of the gills and digestive tract of Neoteredo

30 reynei, a mangrove-adapted shipworm species found over a large range of the Brazilian coast.

31 Comparative metagenomics grouped the symbiotic gammaproteobacterial community of gills of

32 different N. reynei specimens, indicating closely related bacterial types are shared, while intestine

33 and digestive glands presented related, and more diverse microbiomes that did not overlap with

34 gills. Annotation of assembled metagenomic contigs revealed that the symbiotic community of N.

35 reynei gills was a hotspot of woody-polysaccharides degrading hydrolase genes, and Biosynthetic

36 Gene Clusters (BGCs), while in contrast, the digestive tract microbiomes seems to play little role

37 in wood digestion and secondary metabolites biosynthesis. Metagenome binning recovered the

38 nearly complete genome sequences of two symbiotic Teredinibacter strains from the gills, a

39 representative of Teredinibacter turnerae “clade I” strain, and a yet to be cultivated Teredinibacter

40 sp. type. These Teredinibacter genomes, as well as unbinned gill-derived gammaproteobacteria

41 contigs, code for novelty including an endo-β-1,4-xylanase/acetylxylan esterase multi-catalytic

42 carbohydrate-active enzyme, and a trans-acyltransferase polyketide synthase (trans-AT PKS) gene

43 cluster with the gene cassette for generating -branching on complex polyketides. Multivariate

44 analyzes have shown that the secondary metabolome encoded on the genomes of Teredinibacter

45 representatives, including the genomes binned from N. reynei gills’ metagenomes, stand out within

46 the Cellvibrionaceae family by size, and enrichments for polyketide, nonribosomal peptide and

2 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

47 hybrid BGCs. Results grouped here add to the growing characterization of shipworm symbiotic

48 microbiomes and indicate that the N. reynei gill gammaproteobacterial community is a prolific

49 source of biotechnologically relevant enzymes for wood-digestion and bioactive compounds

50 production.

51

52 Introduction

53

54 Teredinidae, a family of obligate xylotrepetic (wood-boring) and xylotrophic (wood

55 feeding) mollusks, evolved from an ancestor lineage within the eulamellibranch order Myoida,

56 and succeeded in exploiting wood as new food source, most probably by concomitantly acquiring

57 cellulolytic bacterial symbionts [1]. These suffered dramatic body shape adaptations to

58 thrive as wood feeders including greatly reduced and burrowing-specialized valves (shells), and a

59 wood-sheltered worm-like body plan protruding from their shells, which led to their common name

60 of shipworms[2].

61 Shipworm bacterial symbiotic communities are formed by cultivated and as yet

62 uncultivated closely related gammaproteobacteria, located intracellularly in bacteriocytes within

63 the gills (ctenidia) tissue [3–6]. The prototypical shipworm symbiont is the cellulolytic and

64 nitrogen fixing bacterium T. turnerae, isolated from a great variety of Teredinidae species from all

65 over the globe [7,8]. Sequencing of the complete genome of one strain of T. turnerae, T7901,

66 revealed that the genome of this bacterium codes for an arsenal of enzymes specialized in breaking

67 down woody material, reinforcing the hypothesis that shipworm gammaproteobacterial symbionts

68 play a critical role on supporting host nutrition[9]. Indeed, two seminal works provided strong

69 evidence for this hypothesis: firstly multi isotopic mass spectroscopy (MIMS) combined with

3 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

70 transmission electron microscopy (TEM) showed, in Lyrodus pedicellatus, that T. turnerae cells

71 could fix atmospheric N2 in symbio, and then transfer nitrogenized compounds to their host [10].

72 Secondly, a combination of metagenomics and proteomics of Bankia setacea, showed that wood-

73 specialized hydrolytic enzymes secreted by the symbiotic community are, somehow, selectively

74 conducted from the gill symbiotic site up to the host’s cecum to reach their substrates, the ingested

75 wood particles [11].

76 Characterization of the T. turnerae T7901 genome also revealed that this endosymbiotic

77 bacterium contains a diversity of Biosynthetic Gene Clusters (BGCs) for secondary metabolites

78 comparable to those observed in free-living Streptomyces species known for their proficiency at

79 secondary metabolite production[9]. Indeed, compounds of the tartrolon family of boronated

80 antibiotic [12], and a novel triscatecholate siderophore called turnerbactin [13], were purified from

81 chemical extracts of T. turnerae cultures and had their respective biosynthetic routes identified

82 and characterized by retrobiosynthetic and phylogenetic approaches. Moreover, both BGCs were

83 proven to be expressed in symbio and signatures of these compounds were detected by mass

84 spectroscopy of shipworms whole extracts [12,13]. Such results fueled a hypothesis that

85 bioactive secondary metabolites could also play a role in supporting the symbiosis by providing

86 chemical defense and so, improving holobiont fitness [14], and/or mediate

87 competition/specialization to shape the distribution of bacterial types within the gills. Fluorescence

88 in situ hybridization (FISH) and confocal microscopy surveys of tissues from several shipworm

89 species provided evidence supporting both these hypothesis, showing that i) symbiotic

90 gammaproteobacterial types are structured within the gills, in separated niches of bacteriocytes,

91 and ii) tissues of the shipworm digestive system are virtually sterile (cecum) or containing a

92 discrete microbial community (intestine) [11,15].

4 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

93 Recently, the gill symbiotic community of the giant shipworm Kuphus polythalamia, a rare

94 species that burrows into mud and sediment instead of wood, was shown to be composed of sulfur

95 oxidizing chemoautotrophic gammaproteobacteria instead of Teredinibacter-related cellulolytic

96 types [16]. These findings suggested this bivalve is a chemoautotrophic descendent of a

97 xylotrophic ancestor, thus adding complexity to the biology of symbiosis in the family. N. reynei

98 is another singular shipworm that has been reported on mangroves of the south Atlantic, including

99 records of a large range of the 7.367 km long Brazilian coastline, from the States of Pará, at the

100 north of the country, to Santa Catarina, at the very south. N. reynei is the only species of the genus,

101 and can be rapidly identified by a unique feature: the presence of large and highly vascularized

102 dorsal lappets at the posterior end of the animal’s body, whose function is yet to be clarified [2,17].

103 N. reynei is a basal member of the family and has, characteristically, unsegmented pallets, a

104 globular (type II) stomach and an intestine that makes a loose loop forward embracing the

105 crystalline style-sac before running backwards toward the cecum [1]. The N. reynei digestive tract

106 includes also two digestive glands, one previously attributed to wood-digestion, another attributed

107 to filter feeding digestion, and both connected to the style-sac through ducts. In addition, N. reynei

108 possesses a large cecum, which can reach up to 44% of the animal’s body size, and an enormous

109 and cylindrically shaped anal canal whose aperture is controlled by a strong muscular sphincter

110 [18]. N. reynei gills extend from the base of the siphons up to the posterior end of the visceral mass

111 as demibranchs and then reduce to a food groove at the anterior end of the animal’s body, without

112 presenting an anterior gill, like in most other shipworm species. Therefore, it comprises only 20%

113 of N. reynei body length, a much shorter proportion when compared to other shipworm species.

114 Such reduced gills combined with a large cecum and anal canal, both continuously filled with

115 wood particles, were taken as anatomical evidences that N. reynei diet is primarily wood [2,17,18].

5 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

116 Previous culturing efforts lead to isolation of T. turnerae strains from N. reynei gill homogenate,

117 including CS30, a strain that displayed antimicrobial activity against a spectrum of gram positive

118 and negative indicator bacteria [19]. However, to date, the diversity and biotechnological potential

119 of N. reynei gill symbiotic microbiome was still to be explored by culture independent methods.

120 In the present work we applied metagenomic analysis to explore the N. reynei gill

121 symbiotic bacterial community, as well as the microbiome of the digestive glands and intestine, as

122 representative tissues of the shipworm digestive tract. Two symbiotic bacterial genome bins were

123 recovered and analyzed for their capability for producing hydrolytic enzymes and bioactive

124 secondary metabolites. Our findings reinforced the importance of the shipworm symbiotic system

125 as a subject of study and discovery.

126

127 Material and Methods

128

129 Specimens

130

131 Three whole adult specimens of N. reynei shipworm were collected, using care to avoid

132 injury, from decaying wood chunks in the Coroa Grande mangrove area at Sepetiba Bay, Rio de

133 Janeiro State, Brazil (22.91°S, 43.87°W) on December 5, 2014, and transported to the Laboratory

134 of Biotechnology at the National Institute of Metrology, Quality and Technology (Inmetro, RJ) in

135 sterilized glass jars.

136

6 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

137 Metagenomic DNA extraction, sequencing, annotation, and statistics.

138

139 Digestive glands, intestines and gill tissues were dissected from freshly collected animals

140 (out of wood for ~2hs at 24oC) snap frozen and ground in liquid nitrogen, and processed for

141 metagenomic DNA extraction using 2% CTAB lysis buffer, phenol-chloroform-isoamyl alcohol

142 (25:24:1) deproteinization, and the columns from PowerSoil® DNA Isolation Kit (MO BIO

143 Laboratories, Carlsbad, CA,USA) for final purification, as previously described [20]. For each

144 specimen, the two digestive glands were combined prior to grounding for DNA purification.

145 Additionally, two independent DNA purifications were performed from pulverized intestine and

146 gills tissues (replicates), totalizing 5 metagenomic DNA samples per shipworm specimen.

147 Metagenomic DNA samples purity and integrity were respectively evaluated by quantification

148 with a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc.), and by 1% agarose gel

149 electrophoresis. A Qubit fluorimeter (Thermo Fisher Scientific Inc.) was used for accurate double-

150 stranded DNA quantification. Metagenomic DNA libraries were prepared for the fifteen high-

151 quality metagenomic DNA samples, using a Nextera XT DNA Sample Preparation Kit (Illumina),

152 as recommended by the manufacturer. Metagenomic libraries were sequenced using the 600-cycle

153 (300 bp paired-end runs) MiSeq Reagent Kits v3 chemistry (Illumina) with a MiSeq Desktop

154 Sequencer (Illumina) at the Center for Genomics and Bioinformatics (CeGenBio) of the Drug

155 Research and Development Center (NPDM), at the Federal University of Ceara, Brazil. Raw

156 sequence data were submitted to the MG-RAST server (version 4.03) [21] for sequence paired-

157 end joining, quality control, and automated annotation pipeline. Taxonomic annotations were

158 performed using RefSeq, and functional signatures were retrieved with both Clusters of

159 Orthologous Groups (COG) and Subsystems technology databases using the default cutoffs

7 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

160 settings (Supplementary Table S1). Taxonomical and functional signatures were submitted to

161 comparative metagenomics using R packages and the Statistical Analysis of Metagenomic Profiles

162 (STAMP) software package, version 2.1.3[22]. Multivariate statistical analyzes were performed

163 with R using the following unsupervised learning techniques: i) hierarchical clustering, with Ward

164 grouping method on a Euclidean distance matrix, ii) PCA biplot, and iii) supervised/unsupervised

165 random forest at R, as previously reported [23]. For hierarchical clustering, cluster significance

166 was accessed by bootstrap resampling using pvclust algorithm[24]. At STAMP, “two groups”

167 comparisons were conducted considering the two main clades obtained by unsupervised

168 hierarchical clustering. For that, the prokaryotic taxonomic (Class level of RefSeq) and functional

169 (Level 2 of the Subsystems technology) features of each group were retrieved from MG-RAST

170 and compared using two-sided Welch’s t-Test with confidence interval of 0.95 and considering

171 Benjamini-Hochberg False Discovery Rate corrected P-values (q-values) < 10-5 as significantly

172 relevant.

173

174 Metagenome assembly, cross-comparison and contig annotation.

175

176 For annotation-independent cross-assembly comparison between the 15 metagenomic

177 samples generated in this study we used crAss tool [25]. First, all paired-end joined quality

178 controlled metagenome reads of all tissues and specimens were assembled together using SPAdes

179 3.8 in --meta mode, and k-mer sizes of 21, 33, 55, and 77 [26]. Reads were mapped back to contigs

180 using Bowtie2 [27] and the SAM file was used on crAss, which computes the abundance of reads

181 from each metagenome contributing to form each contig in the cross assembly. The coverage of

182 each of the fifteen metagenomes in contigs was used to build a distance matrix that was displayed

8 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

183 in the form of a cladogram [25]. The tissue-specific contigs retrieved from gills, intestine and

184 digestive glands datasets were taxonomically annotated with the contig annotation tool (CAT) [28]

185 which predicts protein-coding genes on a contig, maps these to NCBI's non-redundant protein

186 database (NR), and subsequently employs a last common ancestor algorithm to give a conservative

187 estimate of taxonomy for the contig. CAT was run with the b1 parameter set to 10, and the b2

188 parameter set to 0.5. Prodigal 2.6.3 [29] and Diamond 0.9.14 [30] were employed for protein

189 calling and database mapping, respectively. NR was downloaded on November 23, 2017.

190

191 Symbiotic genome binning

192

193 Metagenomic sequencing reads were cross-assembled for each tissue with SPAdes 3.8.0 in

194 --meta mode [26]. Tissue-specific contig sets were each binned into draft genome sequences using

195 MetaBAT 0.26.3 [31] that exploits signatures that are typical of contigs belonging to the same

196 genome, including the nucleotide usage and the abundance of the contigs in the different samples.

197 In order to get contig abundances, reads of all the samples were mapped to the tissue-specific

198 assemblies with Burrows-Wheeler Aligner (BWA) 0.7.12 with the BWA-MEM algorithm[32].

199 Mappings were converted to the binary BAM format with SAMtools 0.1.19 [33] and depth files

200 were created with the script jgi_bam_summarize_contig_depths that is supplied with MetaBAT.

201 Binning with MetaBAT was performed with the sensitive setting, after tests with other presets

202 showed this to give the best results in terms of bin completeness and contamination. Bin

203 completeness, contamination, and strain heterogeneity were assessed using CheckM 1.0.7 [34]

204 based on the presence and absence of sets of single-copy marker genes. CheckM was run in the

205 lineage specific work flow, which places the bins in a reference tree and based on that placement

9 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

206 searches for lineage-specific marker genes. Completeness, contamination, and strain heterogeneity

207 were assessed on the lowest level of placement (node) in the CheckM backbone tree. Two high

208 quality genome bins designated gills.bin.1 and gills.bin.4 (>50% completeness and <5%

209 contamination, see Table 1) were generated from the gill assembly, as discussed in the Results

210 section. No high-quality bins could be generated from the digestive gland or intestine assemblies.

211

212 Symbiotic genome functional annotation and multivariate analysis

213

214 Nearly complete genomes gills.bin.1 and gills.bin.4, as well as other related

215 gammaproteobacterial genomes, including representatives of the Cellvibrionaceae family, and

216 more specifically, the Teredinibacter clade (Suppl. Table 7) were automatically annotated at the

217 RAST server [35] under default settings. Carbohydrate-active enzymes (CAZymes) were

218 annotated using the dbCAN web server [36]. Results were filtered for E-value of 1e-5 and domain

219 coverage fraction > 0.7. Using such parameters, reannotation of T. turnerae T7901 genome leads

220 to recover of 99% of the CAZy domains, with false positive and false negative rates of ~8 and

221 ~7% respectively. Natural products Biosynthetic Gene Clusters (BGCs) were annotated using

222 antiSMASH bacterial version [37] with RAST GenBank generated files as input and default

223 settings. The genes composing the resistome (i.e. genes related to antibiotic resistance) were

224 annotated and quantified with RGI (Resistome Genes Identificator) tool of the Comprehensive

225 Antibiotic Resistance Database (CARD) [38], under the discovery criteria for detection of perfect,

226 strict and loose hits. We then performed Principal Component Analysis (PCA) to cluster the

227 selected genomes according their diversities of BGCs and resistance genes, and supervised

228 Random Forest (RF) [39] to highlight the fifteen most influential vectors in each case, using

10 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

229 ggbiplot R library (Vu, 2011) to plot the results. The number of variables in PCA was reduced by

230 supervised RF to, with higher mean accuracy using the R library [40]. To test if the relative

231 abundance of resistome and metabolome (specifically Nrps, PKS and their hybrid paths) related

232 genes were significantly different between free-living and host-associated bacteria we performed

233 Kruskal-Wallis non-parametric test using the kruskal.test() function in R and considered p-value

234 lower than 0.05 significant. We tested each category independently and adjusted the p-value using

235 the Bonferroni adjust method with the p.adjust()function in R.

236

237 Phylogenetic analysis

238

239 The two high quality genome bins were placed in a phylogeny with closely related

240 Gammaproteobacteria based on the 43 phylogenetically informative marker genes that CheckM

241 uses to assess bin placement in its own backbone tree (Supplemental Table S6 in [34]). All

242 genomes from the family Cellvibrionaceae were downloaded from the PATRIC (Pathosystems

243 Resource Integration Center, [41] genome database, to our knowledge the largest publicly

244 available prokaryotic genome database. One genome per family of Halieaceae, Porticoccaceae,

245 and Spongiibacteraceae was added to together serve as outgroups. It was recently shown that the

246 monophyly of Cellvibrionaceae could only be clearly resolved when using a genome-scale

247 supermatrix [42] while a tree based on the RpoB gene showed Cellvibrionaceae to be paraphyletic

248 with Microbulbiferaceae. Thus, no Microbulbiferaceae genomes were included here. Indeed, when

249 Microbulbiferaceae were included in the tree, this led to paraphyly of the Cellvibrionaceae, but the

250 placement of the novel genome bins within the Cellvibrionaceae remained unchanged (not shown).

251 Genomes were downloaded based on the genome_lineage file on the PATRIC server that was

11 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

252 downloaded on December 20, 2017. One Cellvibrionaceae genome present in the genome_lineage

253 file, Cellvibrio mixtus strain PSBB022, could not be retrieved from the PATRIC server. CheckM

254 was run on all the genomes to identify the 43 marker genes, translated sequences were individually

255 realigned after dealignment with Clustal Omega 1.2.3 [43] and subsequently concatenated, filling

256 gaps if a gene was not found in a genome. A maximum-likelihood tree was inferred with RAxML

257 8.2.9 [44] using the PROTCATLG model and 100 rapid bootstraps (random seeds -p and -x were

258 both set to 12345), and the best scoring tree was visualized with iToL [45]. Additionally, the partial

259 nucleotide sequences of the protein-coding gene rpoB (1,020 bp) and gryB (1,053 bp) were

260 retrieved from a previously determined set of 25 T. turnerae isolates [8], and aligned with

261 respective gene fragments from binned genomes gills.bin.1 and gills.bin.4 with Molecular

262 Evolutionary Genetics Analysis (MEGA) software version 6.0[46], using the built-in ClustalW

263 program with default settings for “codons” (protein-coding sequences). Maximum likelihood fits

264 for 24 different nucleotide models were evaluated for each gene and best-fit model of evolution

265 selected according to the lowest Bayesian Information Criteria (BIC) scores. Neighbor-joining

266 phylogenies were reconstructed employing Kimura 2-parameter nucleotide substitution model

267 improved with 5 discrete gamma categories for a total of 1000 bootstraps.

268

269 Results

270

12 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

271 The N. reynei gill symbiotic microbiome is dense and highly dominated

272 by -proteobacteria, while digestive glands and intestine present similar

273 diverse microbial communities that do not overlap with the gill

274 community.

275

276 To extend knowledge regarding the role of shipworm symbiotic bacteria in wood digestion

277 and host defense, we performed functional metagenomics of N. reynei digestive gland, intestine,

278 and gill tissues (see methods). Shotgun sequencing of fifteen metagenomes samples (Table S1),

279 generated 21,548,783 reads, from which approximately 92% (3.94 Gb) passed the quality control

280 and annotation pipeline of the MG-RAST server. Approximately 1.2, 55.7 and 3.8% of the quality

281 proofed data presented matching hits for ribosomal RNA, protein, and protein with known

282 functional coding genes respectively (Table S1).

283 Unsupervised multivariate analysis showed that the N. reynei gill microbial community has

284 unique taxonomic and functional profiles (Figure 1). Hierarchical clustering of the microbial

285 annotations formed consistent cladogram topologies, with gills samples grouping apart from

286 digestive gland and intestine samples in two main clades (Figure 1A and 1B). Indeed, gill samples

287 exhibited a dense but low diversity microbiome, with Bacteria dominating the hits against the

288 RefSeq database (> 98%), and the vast majority of these hits being tracked to -proteobacteria (>

289 92%), and ultimately, to the genus Teredinibacter (Figure 1A, Figure S1). Accordingly, 16S rRNA

290 annotation against the RDP database confirmed the dominance of signatures of Teredinibacter in

291 the gill microbiomes (~ 88%) (Suppl. Fig. 1B). Intestine and digestive gland samples, on the other

13 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

292 hand, presented more diverse microbiomes distinct from the gills, responding respectively to 23,8

293 and 28,7% of the taxonomical annotations (Figure 1A, Figure S1).

294

295 Figure 01 – Multivariate analysis of metagenomes microbial taxonomical and functional

296 features. A) Unsupervised hierarchical clustering of taxonomical (TAX) and functional (FUNC)

297 features. Clades formed by gill samples and by intestine plus digestive gland samples are

298 highlighted respectively in purple and dark green. The pvclust AU (Approximately Unbiased) and

299 BP (Bootstrap Probability) p-values are shown at clades nodes in red and light green respectively.

300 B) Relative abundances of metagenome bacterial classes and functions respectively signed with

301 RefSeq and COG (Level 2) databases. C) Multiple dimensional scale (MDS) plot of N. reynei

302 metagenomics samples using distances and the fifteen most important bacteria classes (RefSeq

303 database) and functions (COG database at Level 2) vectors as calculated from unsupervised

304 random forest. Digestive gland, intestine and gill samples are shown as green, brown and purple

305 dots respectively. Clusters of Orthologous Groups are classified into functional categories as

306 follow: A - RNA processing and modification; E - Amino acid transport and metabolism; F -

307 Nucleotide transport and metabolism; H - Coenzyme transport and metabolism; I - Lipid transport

308 and metabolism; K – Transcription; L - Replication, recombination and repair; N - Cell motility;

309 O - Post-translational modification, protein turnover, and chaperones; Q - Secondary metabolites

310 biosynthesis, transport, and catabolism; S - Function Unknown; T - Signal transduction

311 mechanisms; U - Intracellular trafficking, secretion, and vesicular transport; W - Extracellular

312 structures; Z - Cytoskeleton

313 Non-parametric multidimensional scaling (nMDS) combined with random forest biplot

314 (Figure 1C) showed that enrichments to -proteobacteria is a major vector driving the observed

14 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

315 grouping of gill samples, while gram-positive (Actinobacteria, Flavobacteriia, Clostridia), gram-

316 negative (-proteobacteria, Cloroflexia, Bacteroidia, Sphingobacteriia), and unclassified

317 Cyanobacteria influence the grouping of digestive glands and/or intestine samples. Additionally,

318 functions under the COG subcategories Cytoskeleton; Cell motility; Coenzyme transport and

319 metabolism; RNA processing and modification; Signal transduction mechanisms;

320 Posttranslational modification, protein turnover, chaperones; Transcription; and Secondary

321 metabolites biosynthesis, transport and catabolism influenced gill samples grouping, while

322 digestive glands and intestine microbiomes were driven by Lipid; Amino acid, and Nucleotide

323 transport and metabolism besides Extracellular structures, and Intracellular trafficking secretion

324 and vesicular transport (Figure 1D). The two group comparisons performed with reference to the

325 two main clades obtained by the hierarchical clustering analysis and using taxonomical and

326 functional microbial sequence annotations corroborate the nMDS analysis, with biologically

327 significant enrichments detected in either gills or digestive tract (digestive glands and intestine)

328 microbiomes closely matching the vectors driving samples grouping on nMDS plot (Figure S2).

329 Although highly informative, comparing taxonomic and functional annotations confine the

330 sequence data to the reference databases, therefore biasing the results according to the database

331 representativeness. Considering that, we further performed annotation-independent comparative

332 metagenomics with crAss [25], a tool that accesses (and score) similarities between metagenomes

333 according to their shared entities (cross-contigs) after cross-assembling all samples’ reads.

334 Interestingly, metagenomes were grouped into one of two different cladograms topologies

335 reconstituted from four distance matrices (Figure 2). When qualitative comparison was applied

336 using "minimum" and "shot" presence/absence-based distance formulas, the samples grouped

337 significantly by the animal source (p<0.001) (Figure 2A). Although, when more quantitative

15 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

338 distance formulas "reads" and "wootters" were used, in which the contig number of reads is

339 considered as reflecting its abundance in the environment, samples grouped significantly

340 (p<0.001) by the tissue, driven by gill samples, which clustered together regardless of the animal

341 source (Figure 2B).

342 Figure 02 – Comparison of N. reynei metagenomic samples by cross-assembling. A)

343 Cladogram representing cross-contigs (i.e., shared contigs containing reads from at least two

344 metagenomes) grouping when using presence-absence-based formulas (distance formulas ‘shot’

345 and ‘minimum’). Samples grouped according the specimen of origin, being clades formed of

346 samples of N. reynei specimens #1, 2 and 3 highlighted in red, green and brown respectively. B)

347 Cladogram representing cross-contigs grouping when using more qualitative distance measures

348 (distance formulas ‘Wootters’ and ‘reads’). Gill samples grouped together regardless the specimen

349 of origin, as highlighted in dark-blue.

350 Further, we applied the contigs annotation tool (CAT) algorithm to get taxonomical

351 classification of contigs assembled for each tissue-type dataset (see Methods). Such analysis

352 showed that gill samples contain 10 times more bacterial-derived contigs (~25% of the signatures)

353 than intestine and digestive glands (~2-3% of the signatures) (Figure S3). represented 30

354 – 38% of the obtained classifications being the most prevailing annotation on the three tissues

355 (Figure S3). Corroborating raw reads annotations, gammaproteobacteria dominated the gill

356 microbial community (~87% of bacterial contigs), with Teredinibacter as the most abundant

357 classified bacterial genus (~81% of gammaproteobacterial reads). Additionally, the discrete

358 microbiomes of digestive glands and intestine were reaffirmed as more diverse with Proteobacteria

359 (~0.6 – 0.7%), Actinobacteria (~0.4 – 0.5%), Bacteroidetes (~0.3 – 0.6%) and Firmicutes (~0.3%)

360 contigs leading the obtained microbial hits (Figure S3).

16 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

361

362 T. turnerae and Teredinibacter sp. compose the N. reynei gill symbiotic

363 community

364

365 Genome binning was then performed in an effort to recovering microbial genomes

366 representing each investigated tissue microbiome. Two -proteobacterial genomes bins presenting

367 the quality required for further analysis (> 50% completeness, < 5% contamination), as assessed

368 by CheckM tool [34], were recovered, both from the gill metagenome dataset (Table 1). The

369 genome bins, labeled as “gills.bin.1” and “gills bin.4”, were automated assigned as T. turnerae

370 and Teredinibacter sp. respectively (Table 1).

371

372 Table 1 – N. reynei gill symbiotic genome bins metrics and features.

Genome bin/Features Gills.bin.1 Gills.bin.4

Marker lineage s_turnerae (UID4667) g_Teredinibacter (UID4659)

# genomes 2 13

# markers 1792 1055

# marker set 305 271

Completeness 87.12 96.45

Contamination 2.8 4.88

Strain heterogeneity 0 55

Genome size (bp) 4418468 5127066

# contigs 355 207

17 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

N50 (contigs) 16757 39993

Mean contig length (bp) 12446 24768

Longest contig (bp) 97957 134446

GC 51.59 53.41

373 Genome bins quality as accessed by the CheckM tool.

374 # - Number. Abbreviation: bp – base pairs.

375

376 Firstly, the taxonomic affiliations of gills.bin.1 and gills.bin.4 genome bins were checked

377 by genome-scaled supermatrix (Figure 3) and one-protein-coding taxonomic marker phylogenies

378 (Figure S4). Corroborating with the CheckM taxonomical annotation, which is based on lineage-

379 specific marker genes, both phylogenies provided highly similar tree topologies where both

380 genome bins fell in a subclade of Cellvibrionaceae family, with gills.bin.1 grouping within the

381 previously defined “clade 1” of T. turnerae [8] and gills.bin.4 appearing as outgroup of this species

382 main clade (Figure 3; Figure S4). The detection of a N. reynei symbiotic “clade I” T. turnerae

383 strain is in agreement with previous affiliation of a N. reynei T. turnerae isolate, T8508, also falling

384 within this clade [8]. Moreover, as observed for other Teredinidae species, the present results

385 showed that N. reynei gill symbiotic microbiota contains species in the Teredinibacter clade

386 including, but not limited to, T. turnerae.

387

388 Figure 03 – Phylogeny of two high quality Teredinibacter bins based on a concatenated

389 alignment of 43 proteins. RAxML maximum likelihood tree based on the CAT model of rate

390 heterogeneity and LG amino acid substitution matrix, with 100 rapid bootstraps. To improve

391 readability, bootstraps for the Teredinibacter turnerae sub-clades I and II (with bootstrap values

18 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

392 of 87 and 97 respectively) are omitted. The tree is rooted in between the Cellvibrionaceae branch

393 and representatives of Halieaceae, Porticoccaceae, and Spongiibacteraceae. Color coding of the

394 families is similar to the one used by Spring and co-authors (2015). The scale bar represents 0.05

395 nucleotide substitutions per site.

396

397 N. reynei symbiotic Teredinibacter genome bins code for wood

398 specialized repertoire of hydrolases including novel multicatalytic

399 enzymes

400

401 The T. turnerae T7901 genome codifies an arsenal of enzymes devoted to breaking down

402 the lignocellulosic plant material [9]. Based on this knowledge, we annotated the recovered

403 genome bins against the carbohydrate-acting enzymes (CAZy) database [47] and compared their

404 cazymes profile with the one for T7901, disposed at the CAZy server (Table S2). In the three

405 compared genomes, cazymes represented between 3.5 – 4% of the protein coding genes,

406 encompassing a diversity of predicted protein domains including 32 – 37% of glycoside hydrolases

407 (GH), 9 – 12% of glycoside transferases (GT), 7 – 13% of carbohydrate esterases (CE), 1 – 2% of

408 polysaccharide lyases (PL), and 42 to 44% of carbohydrate binding modules (CBM) (Table S2).

409 Additionally, when GH families were grouped according their substrate specificities, it was shown

410 that the binned genomes also presented the “wood-specialized” profile described for T7901 [9],

411 where a major portion (43 – 51%) of the GHs are devoted to the digestion of wood-composing

412 polysaccharides (Figure 4A). Furthermore, gills.bin.1 contains identical or near identical (E value

413 = 0.0, amino acids identities ≥ 99%) homologs to the multidomain and multi-catalytic cazymes

19 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

414 detected in the T7901 genome, the exception being TERTU_RS21420, coding for the

415 biochemically characterized multifunctional cellulase CelAB [48] (Table S3). In comparison,

416 gills.bin.4 codes less conserved homologs (E value = 0.0, amino acids identities ≥ 75%) to only

417 three of T7901 multicatalytic enzymes (Table S3). However, this genome bin codes for other novel

418 multidomain/multicatalytic enzyme configurations, including a putative acetyl xylan esterase/

419 endo-1,4-β-xylanase (peg.397) and two putative dual pectin acetylesterases (peg.2052 and

420 peg.3260) (Figure 4B).

421

422 Figure 04 – Gills genome bins and unbinned contigs carbohydrate active enzymes

423 (CAZymes). A) Proportion of GH domains according to the substrate specificity at gills.bin1 and

424 gills.bin 4 binned genomes and at other unbinned contigs from gills metagenome dataset. Substrate

425 specificities as previously defined [9]: dark green = cellulose/xylan GH families (5, 6, 8, 9, 10, 11,

426 12, 44, 45, 51, 52, 62, and 74); dark blue = GH families with other or not-unique specificities (1,

427 2, 3, 13, 15, 23, 27, 30, 31, 35, 39, 43, 73, 77, 78, 79, 94, 95, 103, 108, 109, 115, 128, 130); light

428 grey = laminarin GH families (16 and 81); yellow = other plant cell wall polysaccharides GH

429 families (26, 53, and 67); light blue = chitin GH families (18, 19, and 20); light green =

430 peptidoglycan GH families (28 and 105). B) Novel multicatalytic CAZymes configurations

431 detected on Teredinibacter sp. gills.bin.4 genome bin (labeled in black), and on unbinned contigs

432 from gill-symbiotic gammaproteobacterial community (labeled in red). Abbreviations key:

433 glycoside hydrolases (GH), carbohydrate esterases (CE), carbohydrate binding modules (CBM).

434 Catalytic domains and binding modules are color coded as black squares and gray circles.

435 When the entire set of contigs assembled from the gill dataset was mined, additional 291

436 putative cazymes not mapped to the genome bins were found. In total, N. reynei gills contained

20 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

437 882 carbohydrate active domains, and comparing to the cazymes profile of the symbiotic genome

438 bins, the gill dataset presented an ~2.3 X enrichment for GTs, and 1.3 X reduction on GHs (Suppl.

439 Table 2). However, the wood-specialized profile is maintained for the entire gill symbiotic

440 community CAZymes (Fig. 4A). In fact, other unique multicatalytic woody plant material

441 digesting CAZymes that have not binned to Teredinibacter genomes were detected, such as a

442 putative endo-β-1,4-xylanase/ endo-β-1,4-galactanase (NODE.622) and two endo-β-1,4-

443 xylanase/acetylxylan esterases (NODE.790a and b) (Fig. 4B).

444 Finally, we evaluate genomes CAZymes in light of the recent genome-enabled-proteomic

445 study performed with shipworm specimens of B. setacea species, which revealed that an array of

446 carbohydrate-active-domains derived from the gill symbiotic gammaproteobacterial community

447 are found in the animals’ cecum contents [11]. We verified that, in the compared genomes and in

448 the gill microbiome dataset, the related domains families are enriched, representing 82 – 89% of

449 the cellulase/xylanases (GH families 5, 6, 9,10, 11, 45 and 53), 50 – 75% of the carbohydrate

450 esterases (CE families 1, 3, 4, 6 and 15) and 44 – 62% of the carbohydrate binding motifs detected

451 (CBM2, 10, 22, 57, 60, 61) (data not shown).

452 Contrasting with the rich and wood devoted repertoire of hydrolyses present in the gill

453 symbiotic site, contigs from digestive glands and intestine datasets retained only 29 and 101

454 CAZymes related domains respectively, where GHs are absent in recovered enzymes of digestive

455 glands and account for only 4% of intestine-proteins CAZymes, all of which have substrate

456 specificity other than cellulose/xylan (Table S2, data not shown).

457

21 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

458 The N. reynei symbiotic genome bins code for a diverse repertoire of

459 biosynthetic gene clusters (BGCs) including Teredinibacter related and

460 novel putative pathways

461

462 Further, we evaluated the potential of the N. reynei associated microbiome to produce

463 bioactive secondary metabolites. Firstly, the gill, digestive gland and intestine assembled sequence

464 dataset was mined against the robust shell of algorithms of the antiSMASH server [37]. The gill

465 symbiotic microbiome was shown to be a hotspot of BGCs, with a total of 119 contigs encoding

466 enzymes involved in biosynthesis of complex peptides, polyketides and hybrid compounds,

467 besides poly-unsaturated fatty acids (PUFAs), terpenes, arylpolyenes, homoserine lactones,

468 ectoine and bacteriocins (Figure 5A). On the contrary, no secondary metabolite clusters were

469 detected in the digestive gland dataset, and only two contigs for modular type I PKS were found

470 in intestine samples (data not shown). Intriguingly, the intestine-derived type I PKS contigs best

471 BLASTp hits were to a fusarin C synthetase-like (XP_021367770) and a erythronolide synthase

472 (OWF54747.1) genes linked to the genome of the bivalve Mizuhopecten yessoensis (Yesso

473 scallop) [49]. Since modular type I PKS occur almost exclusively in microbes, it is unclear if these

474 homolog genes are truly encoded in the M. yessoensis genome, which would indicate a rare inter-

475 kingdom horizontal gene transfer event, or are from a sequenced contaminating DNA source, as

476 an associated bacterium.

477 We then reanalyzed the secondary metabolome of the reference T. turnerae T7901 genome

478 via the antiSMASH server. Previously, T7901 was shown to devote an unexpectedly high

479 percentage (~7%) of its genetic content to natural product biosynthetic pathways [9]. The new

22 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

480 genome mining reveals a total of 13 BGCs, adding new putative pathways for bacteriocins (4),

481 terpene (1) and ectoine (1), and condensing firstly delimitated “regions” 3, 4, and 5 [9], into |a

482 putative giant Bacteriocin-T1pks-Nrps BGC of ~270.6 kb (Table S4). Screening of gills.bin.1 and

483 gills.bin.4 genome bins lead, respectively, to the detection of 19 and 14 contigs, representing

484 putative BGCs, or fragments thereof, for production of compounds from a diversity of classes such

485 as polyketides, non-ribosomal peptides, terpenes, and bacteriocins (Figure 5A). To get this analysis

486 at higher perspective, we also annotated the secondary metabolome of all genomes in the

487 Cellvibrionaceae family included in our genome-wide phylogeny (Figure 3) and resolved the

488 detected BGCs diversity by supervisioned random forest combined with multidimensional scaling.

489 The N. reynei gill symbiotic genome bins contain secondary metabolomes highly related to the

490 ones from representative T. turnerae strains, which grouped tightly mostly because of enrichments

491 for modular PKS, NRPS and hybrid pathways (Figure S5).

492

493 Figure 05 – Gills genome bins Biosynthetic Gene Clusters (BGCs). A) Secondary

494 metabolomes. BGCs/contigs detected in Teredinibacter turnerae (T7901 and gills.bin.1),

495 Teredinibacter sp.(gills.bin.4) and on contigs derived from N. reynei gill assembled dataset.

496 Clusters are grouped, and color coded according to the compound class. B) Novel putative trans-

497 AT PKS biosynthetic gene clusters detected on Teredinibacter sp. gills.bin.4 genome and the main

498 predicted catalytic domains of their open reading frames (ORFs). Biosynthetic, transport-related,

499 regulatory, other -branching-related genes are color coded in dark-blue, light-blue, green, gray

500 and orange respectively. Domains keys: thioesterase (TE), aminotransferase (AMT), adenylation

501 (A), ketosynthase (KS), ketoreductase (KR), dehydrogenase (DH), thiolation (T), enoyl-CoA

502 dehydratases (ECH), 2-Nitropropane dioxygenase (2NPD), polyketide synthase cyclase (PKS

23 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

503 CY), 4'-phosphopantetheinyl transferase (4PPT). Putative inter-proteins bimodules are present in

504 yellow, PKS-encoded ECH catalytic domains are present in orange.

505 Eighteen of the putative BGCs from gills.bin.1 and additional fourth contig undetected by

506 the antiSMASH server could be mapped to all 13 BGCs from T. turnerae T7901 secondary

507 metabolome with high pairwise identity (~99%) and BGCs coverage (~88%) (Table S4). BLASTp

508 inspection showed that the type 1 PKS route from gills.bin.1 genome bin lacking in T7901 genome

509 is actually conserved in other T. turnerae genomes, as the ones from strains T8412, T8413 and

510 T8415 (E value = 0 ,100% query coverage, ~99% sequence identity) (Table S5). Therefore, such

511 results corroborate the grouping of gills.1 genome bin as a T. turnerae strain.

512 Conversely, only a putative bacteriocin BGC encoded in gills.bin.4 could be mapped to the

513 T. turnerae T7901 secondary metabolome (Suppl. Table 4). BLASTp analysis showed that eight

514 others putative BGCs from this Teredinibacter sp. genome bin were also conserved (~97% gene

515 coverage and ~73% identity) in other Teredinibacter genomes (Table S5). The 5 remaining contigs

516 from gills.bin.4 are novel among the secondary metabolomes of shipworms symbionts.

517 The core biosynthetic proteins encoded on these BGCs presented best BLASTp hits to

518 putative proteins from Bacteroidetes (~ 40%), gammaproteobacteria (33%), and

519 betaproteobacteria (~15%), besides alphaproteobacteria and Firmicutes (Table S6). Three of these

520 hypothetical novel BGCs (NODE_76, NODE_26 and NODE_85) code for typical multi-modular

521 trans-AT PKS enzymes, with a catalytic domain inventory including two inter-proteins bimodules,

522 commonly involved in formation of ,-olefinic moieties, and a gene cassette typically involved

523 in the formation of -branching in bioactive compounds such as the highly cytotoxic pederins and

524 the antibiotic bacillaene [50] (Figure 5B).

525

24 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

526 The Teredinibacter resistome singularity is driven by putative resistance

527 genes to PKs-NRPs derived antibiotics

528

529 The Resistome is defined as the collection of all antibiotic resistance genes present in a

530 microorganism’s genome, including precursor genes, which encode proteins retaining weak

531 antibiotic resistance or binding activity, and that under selective pressure can evolve into a new

532 resistance marker [51]. Here we also investigated the resistome coded in Teredinibacter genomes,

533 including gills.bin.1 and gills.bin.4 genome bins, and other representative genomes of the

534 Cellvibrionaceae family. The vast majority of resistance signatures obtained with the Resistance

535 Gene Identifier (RGI) tool from the Comprehensive Antibiotics Resistance Database (CARD) [52]

536 were under the lower stringency cutoffs, which includes putative resistance genes precursors

537 (Table S7). Multivariate comparisons showed that Teredinibacter resistomes grouped tightly and

538 are particularly driven by putative resistance markers to fluoroquinolone (total of 2110 hits) and

539 to antibiotics derived from PKS, NRPS and hybrid compounds, including macrolides (2071),

540 monobactam (2071), tetracycline (1963), penam (1343), peptides (816) and glycopeptide (739)

541 antibiotics (Figure 6A). The relative abundance of genes encoding resistance related proteins was

542 not significantly different (adjusted p-value of 0.28) across the genera inside the Cellvibrionaceae

543 family, suggesting that the resistome size is conserved (Fig 6B, yellow bars). Indeed, resistomes

544 of both Teredinibacter (13) and Cellvibrio (6) representative genomes were of similar size

545 (representing respectively ~5.3 and ~5.6% of the protein coding genes) (Table S7), despite the

546 intracellularly symbiotic life-style of the former and, mostly free-living nature of the latter. In

547 contrast, BGCs and, more specifically, Nrps-PKS-hybrid metabolic paths relative abundances

548 differed significantly between host-associated and free-living Cellvibrionaceae (adjusted p-values

25 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

549 <0.001 and <0.0001, respectively), being larger in the first group (Figure 6B, and 6C). Clearly this

550 pattern is driven by symbiotic Teredinibacter and, mostly free-living Cellvibrio representative

551 genomes (Figure 6B).

552

553 Figure 06 –Teredinibacter resistome. A) resistome PCA using the fifteen most important genes

554 based on mean decrease accuracy of Random Forest analysis. Teredinibacter forms a major group

555 influenced by polyamine, isoniazid, and PK-NRP-derived antibiotics. B) Relative abundance of

556 resistance gene markers, BGCs, and NRPS, PKS and hybrid BGCs. C) Host associated bacteria

557 present a significantly larger metabolome and abundance of NRPS, PKS and their hybrid

558 metabolic paths than free living bacteria. One asterisk represents Kruskal-Wallis test p-values

559 <0.001 and two represent <0.0001 both adjusted with Bonferroni method.

560 Discussion

561

562 In the present work, metagenomics was applied to characterize, compare and investigate

563 the microbiomes associated with the digestive glands, intestines and gills of the mangrove

564 shipworm N. reynei. Taxonomic annotations showed that, as seen for other shipworm systems

565 [3,4,11], the gills of N. reynei are a unique symbiotic site, filled with cellulolytic -proteobacteria,

566 including bacterial types of the Teredinibacter clade (Figure 1, Figure S1 and S3).

567 In contrast, digestive gland and intestine samples differ from the gill community and are more

568 diverse (Figure 1, Figure S1 and S3). Such results corroborate previous culture-independent

569 analysis of five shipworms species microbiomes, where digestive tract tissues were reported as

570 virtually sterile (cecum) or containing very low loads of bacteria (intestine) [15].

26 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

571 Therefore, our results indicate that digestive tract microbiomes do not contribute significantly to

572 the digestion of wood particles and consequently the nutrition of the shipworm host. Remarkably,

573 shipworms that are basal in the Teredininae subfamily (including N. reynei, Dicyathifer manni and

574 Bactronophorus thoracites) have the anus opening into the anal canal structure [1,2], which

575 morphology indicates a role in animal’s food absorption [18], but microbial constitution is yet to

576 be explored.

577 N. reynei gill samples are enriched for functions under categories of cellular process and

578 signaling (e.g., Cell motility; Signal transduction mechanisms), information storage and processing

579 (e.g., RNA processing and modification, Transcription) and metabolism (e.g., Inorganic ion

580 transport and metabolism, Iron acquisition and metabolism, Nitrogen metabolism, and Secondary

581 metabolites biosynthesis, transport, and catabolism) (Figure 1D, Figure S2).

582 Cross-assemblage based comparative metagenomics showed that such enrichments were

583 related to gammaproteobacterial-derived genes which drive gill samples to group together

584 regardless of the shipworm species of origin (Figure 2B). In fact, gill-enriched functionalities are

585 consistent with the genetic repertoire described for the prototypical shipworm symbiont T.

586 turnerae, a flagellar gammaproteobacterial species able to fix dinitrogen, sequestering iron and

587 producing secondary metabolites in symbio [9,10,12].

588 Specific mining of the contigs assembled from each tissue dataset showed that the N. reynei

589 gill microbiome is a hot spot of genes for: i) CAZymes, and particularly, wood-degrading

590 hydrolases (Figure 4), and ii) biosynthetic pathways for a variety of bioactive compounds classes

591 (Figure 5). On the other hand, such a genetic repertoire was shown to be negligible in digestive

592 gland and intestine datasets. However, is important to note that the digestive tract microbiomes

593 have proven to be more diverse than the gill symbiotic gammaproteobacterial community, and so,

27 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

594 our sequence strategy loses resolution and may not reach the necessary sequencing depth required

595 to fully cover these microbes’ functionalities.

596 Two high-quality representative symbiotic genomes could be recovered from the gill

597 datasets by our binning strategies (Table 1), and were assigned as T. turnerae and Teredinibacter

598 sp. by CheckM tool sets of single-copy genes that are signatures within a phylogenetic lineage

599 [34]. A phylogeny with several closely related reference taxa confirmed these taxonomical

600 affiliations grouping gills.bin.1 and gills.bin.4 genome bins respectively inside and as a sister

601 group of the T. turnerae species clade (Figure 3). The tree topology reliably reproduced the one

602 originally obtained by Spring and co-authors (2015), which supported the proposal of new

603 gammaproteobacterial families, including the Cellvibrionaceae family embracing Teredinibacter

604 and Ca. Endobugula genera of marine invertebrate symbionts [42].

605 Detection of T. turnerae and Teredinibacter sp. as prevalent bacterial types within N.

606 reynei gill microbiome is in agreement with previous culture efforts made with this species

607 specimens from the very same sampling site, and that resulted in isolation of several T. turnerae

608 strains [19]. These results also agree with previous culture-independent approaches applied on L.

609 pedicellatus and B. setacea shipworm species, showing that gill gammaproteobacterial

610 communities were composed of at least four distinct bacterial types, all grouping with T. turnerae

611 in a main clade of shipworm symbionts [3,4,14].

612 We also reproduced a systematic phylogenetic analysis previously performed with T.

613 turnerae isolates from a wide variety of shipworms species and using protein coding gene markers

614 [8] (Figure S4). In this work, Altamia and collaborators showed that T. turnerae population

615 actually encompasses two distinct lineages that, although they experienced horizontal gene transfer

616 events, can be consistently grouped in two main clades: clade I and clade II. Our phylogeny

28 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

617 reconstitutions using rpoB and gyrB markers replicate this topology, also agreeing with the

618 genome-wide phylogeny. In both analyses, gills.bin.1 genome bin fell within T. turnerae “clade

619 I”, just as the N. reynei isolated T. turnerae strain T8508 [8,19] (Figure 3, Figure S4).

620 Specific mining revealed that N. reynei’s symbiotic Teredinibacter genome bins contain a

621 repertoire of carbohydrate-active enzymes (CAZymes) that is closely related to the “wood-

622 specialized” profile described for T. turnerae T7901 [9], and is characterized by a vast repertoire

623 of cellulases/xylanases and other catalytic and binding domains involved in a network for breaking

624 down woody complex polysaccharides (Figure 4). Of particular interest was the detection of multi-

625 catalytic CAZymes with novel domain configurations in gills.bin.4 and unassembled

626 gammaproteobacterial contigs (Figure 4B). Also, within the cabal of CAZymes characterized was

627 an enrichment of catalytic domains that, in the B. setacea, were proven to travel from the gill

628 symbiotic site up to the animal cecum, where the wood digestion takes place [11]. Considering

629 that reduced gills, and well-developed cecum and anal canal organs, both associated with the

630 digestion of wood particles, support the hypothesis that N. reynei nutritional depends heavily on

631 wood [17], a symbiotic role on shipping key hydrolases from the gills site to the cecum

632 “biodigester” is quite reasonable and very exciting. Therefore, additional efforts focusing on

633 isolation and discovery of new symbiotic bacterial types from N. reynei gills is worthwhile and

634 can lead to the discovery of new and biotechnologically relevant wood-digesting CAZymes.

635 N. reynei gill’s symbiotic microbiome was also shown to be a fruitful source of BGCs.

636 Particularly, the T. turnerae representative genome bin (gills.bin.1) was shown to contain all the

637 BGCs previously described for T. turnerae strain T7901 [9], including the characterized routes for

638 production of the turnerbactin triscatecholate siderophore [13] and for the tartrolon family of

639 antibiotics [12]. Indeed, these results corroborate the antimicrobial activity previously reported for

29 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

640 T. turnerae strain CS30, isolated from N. reynei [19], and the results of a PCR survey for the

641 tartrolon producing trans-AT PKSs (trtDEF) that returned expected amplicons when CS30 and

642 other T. turnerae isolates genomic DNA were used as template [12]. Additionally, the in symbio

643 expression of the trt pathway was also shown by reverse transcriptase PCR with N. reynei gill bulk

644 RNA (unpublished data). However, the capacity of N. reynei symbiotic T. turnerae to produce

645 tartrolons remains to be clarified, since compounds of this family could not be detected on a survey

646 including CS30 strain’s culture chemical extracts [12]. Such clarification is of great interest, since

647 it was suggested that tartrolon antibiotics could play a role on shipworm host chemical defense,

648 and/or gill symbiotic community structuring. Finally, growths of CS30 under iron starvation leads

649 to secretion of siderophores, as revealed by the universal chrome azurol S (CAS) assay

650 (unpublished data). Such results support also the production of turnerbactin by this T. turnerae

651 strain.

652 Further, the genome bin gills.bin.4 encoded five contigs representing novel clusters among

653 shipworm microbiomes secondary metabolomes (Figure 4, Table S5). BLASTp searches showed

654 that these clusters putative core biosynthetic proteins present higher similarities to proteins from

655 putative BGCs from Bacteroidetes (as Flavobacteria), betaproteobacteria (as Nesseriales) and

656 firmicutes (as Bacillus) (Table S5 and data not shown). Even knowing that blast searches are biased

657 by the database, and that the modular enzymology commonly involved on biosynthesis of

658 secondary metabolites, such as polyketides, contain many conserved catalytic domains and so, are

659 not good taxonomic markers, these results may be taken as a suggestion of horizontal gene transfer

660 (HGT) events that led to the acquisition of such pathways by the Teredinibacter symbiotic

661 community.

30 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

662 Independently of the acquisition history, the discovery of these novel BGCs reinforces the

663 potential of N. reynei’s symbiotic system for biotechnological exploration aiming for discovery of

664 new drug leads. In fact, our comparative analyses with representative genomes composing the

665 Cellvibrionaceae family showed that shipworm gammaproteobacterial symbionts, and

666 Teredinibacter in particular, share large secondary metabolomes, enriched for PKS-NRPS routes

667 (Figure 6B, Table S7). Such a result reinforces the hypothesis that production of bioactive

668 metabolites, as polyketides and complex peptides or nonribosomally derived siderophores increase

669 Teredinibacter fitness in symbio. Furthermore, considering that each shipworm species has a

670 singular morpho-physiology and each symbiotic gammaproteobacterial community has a unique

671 diversity, specific BGCs might play specific roles within each host-bacteria interaction.

672 Consequently, the trans-AT PSK enzymology detected in the Teredinibacter sp. genome

673 bin is of great interest, since its includes a cassette containing catalytic activities for formation of

674 -branching chemical structures, which is also present in important drug leads as bryostatins -

675 potent protein kinases modulators, the pederin family of antitumor compounds and myxovirescin

676 antibiotics [50]. Interestingly, macrolide lactones of the bryostatin family, with promising

677 applications for treatment of Alzheimer's disease, had their biosynthesis assigned to a trans-AT

678 PKSs cluster (bry) from Ca. Endobugula sertula, an as yet not cultivated symbiont of the bryozoan

679 Bugula neritina [53], which also groups within the proposed Cellvibrionaceae family [42].

680 Therefore, efforts towards a better characterization of such putative novel trans-AT-PKS systems

681 are encouraged and can go from attempts to isolate strains of N. reynei’s symbiotic Teredinibacter

682 sp. carrying these clusters, genome sequencing and retrobiosynthetic compound(s) chemical

683 structure prediction, passing through PCR/probe-hybridization screening of such pathways from

684 N. reynei gill metagenomic DNA clone libraries, up to attempts to heterologously express this

31 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

685 pathway in a domesticated bacterium or engineered T. turnerae strain. Certainly, other interesting

686 BGCs could be retrieved from the 119 putative BGCs or gene clusters fragments recovered from

687 N. reynei gill dataset.

688 The detailed characterization of a BGC is a complex task and usually is made

689 retrobiosynthetically, based on the chemical structure of the cluster-associated purified compound,

690 therefore, beyond the scope of this work. However, the obtained results reaffirm shipworm-

691 gammaproteobacterial symbiotic interactions as a prolific source for detection of interesting

692 bioactive compounds, including antibiotics. Indeed, our group and collaborators are working on a

693 deeper and full characterization of the BGC catalog of shipworm -proteobacteria symbiotic

694 communities, including a pool of DNA sequence from isolates genome DNA and gill metagenome

695 samples from shipworms from several species and geographical locations.

696 The representative Teredinibacter genomes also encoded more putative resistance markers

697 for PKS and NRPS derived antibiotics (Figure 6A). The matching of this hypothetical resistome

698 with the profile of Teredinibacter BGCs indicate that at least some of these putative markers might

699 truly be resistance to self-produced antibiotics, and so, further characterizations of these genes are

700 of great interest.

701

702 Conclusion

703

704 In this study we applied metagenomics to characterize the microbial communities

705 associated with key organs of shipworm digestive systems, using the mangrove-specialized species

706 N. reynei as a research subject. We provided strong evidences that the N. reynei gill symbiont

32 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

707 gammaproteobacterial community is a hot spot for biotechnologically relevant enzymes, such as

708 hydrolases involved on breaking down wood-derived complex polysaccharides, and biosynthetic

709 gene clusters for production of potentially bioactive secondary metabolites such as complex

710 polyketides, peptides and lipopeptides. Further, the discrete bacterial communities forming the

711 microbiome of the digestive tract tissues seems to have little or no involvement in host nutritional

712 support or chemical defense. Digestive gland and intestine datasets presented a simple repertoire

713 of CAZymes, lacking glycosyl hydrolases for digestion of woody material, and only one

714 fragmented BGC, from intestine samples, with hits to a locus of the Yesso scallop bivalve mollusk

715 genome. Two representative symbiotic genomes were recovered from gill symbiotic community

716 and taxonomically assigned as a T. turnerae strain and a yet to be cultivated Teredinibacter sp.

717 bacterium. The latter encoded novel multi-catalytic CAZymes, and a trans-AT PKS BGC

718 containing the catalytic domains implicated in β-branching in related polyketides. Finally, we

719 show strong evidence that Teredinibacter is enriched in secondary metabolite pathways when

720 compared to relatives of the Cellvibrionaceae family particularly by an excess of BGCs for

721 complex polyketides and nonribosomal peptides, reinforcing the possible role of these compounds

722 on supporting the symbiosis.

723 Acknowledgments

724

725 Metagenomic sequencing has been performed at the Genomics and Bioinformatics facility

726 (CeGenBio) of the Drug Research and Development Center (NPDM) of the Federal University of

727 Ceara. Sampling and Genetic Resources Assessment were performed under license

728 IBAMA/MMA/SISBIO numbers 48388-1 and 48388-2. Authors thank the laboratories of

729 microbiology and of microscopy from the National Institute of Metrology, Quality and Technology

33 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

730 (http://www.inmetro.gov.br) at Xerém, RJ, Brazil for the support on preliminary samples

731 processing. We also thank professor Dr. Margo Haygood for the critical reading and revision of

732 this manuscript.

733 References

734

735 1. Distel DL, Amin M, Burgoyne A, Linton E, Mamangkey G, Morrill W, et al. Molecular

736 phylogeny of Pholadoidea Lamarck, 1809 supports a single origin for xylotrophy (wood

737 feeding) and xylotrophic bacterial endosymbiosis in . Mol Phylogenet Evol.

738 2011;61: 245–254. doi:10.1016/j.ympev.2011.05.019 S1055-7903(11)00261-2

739 2. Turner RD. A survey and illustrated catalogue of the Teredinidae (Mollusca: Bivalvia).

740 Museum of Comparative Zoology. Cambridge: Harvard University Press; 1966.

741 3. Distel DL, DeLong EF, Waterbury JB. Phylogenetic characterization and in situ localization

742 of the bacterial symbiont of shipworms (Teredinidae: Bivalvia) by using 16S rRNA

743 sequence analysis and oligodeoxynucleotide probe hybridization. Appl Env Microbiol.

744 1991;57: 2376–2382.

745 4. Distel DL, Beaudoin DJ, Morrill W. Coexistence of Multiple Proteobacterial

746 Endosymbionts in the Gills of the Wood-Boring Bivalve Lyrodus pedicellatus ( Bivalvia :

747 Teredinidae ). Society. 2002;68: 6292–6299. doi:10.1128/AEM.68.12.6292

748 5. Greene R V, Freer SN. Growth Characteristics of a Novel Nitrogen-Fixing Cellulolytic

749 Bacterium. Science. 1986;52: 982–986.

750 6. Waterbury JB, Calloway CB, Turner RD. A Cellulolytic Nitrogen-Fixing Bacterium

751 Cultured from the Gland of Deshayes in Shipworms (Bivalvia: Teredinidae). Science.

752 1983;221: 1401–1403. doi:10.1126/science.221.4618.1401

34 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

753 7. Distel DL, Morrill W, MacLaren-Toussaint N, Franks D, Waterbury J. Teredinibacter

754 turnerae gen. nov., sp. nov., a dinitrogen-fixing, cellulolytic, endosymbiotic gamma-

755 proteobacterium isolated from the gills of wood-boring molluscs (Bivalvia: Teredinidae).

756 Int J Syst Evol Microbiol. 2002;52: 2261–2269.

757 8. Altamia MA, Wood N, Fung JM, Dedrick S, W E. Genetic differentiation among isolates

758 of Teredinibacter turnerae , a widely occurring intracellular endosymbiont of shipworms.

759 Molecular Ecology. 2014; 1418–1432. doi:10.1111/mec.12667

760 9. Yang JC, Madupu R, Durkin a S, Ekborg N a, Pedamallu CS, Hostetler JB, et al. The

761 complete genome of Teredinibacter turnerae T7901: an intracellular endosymbiont of

762 marine wood-boring bivalves (shipworms). PLoS One. 2009;4: e6085.

763 doi:10.1371/journal.pone.0006085

764 10. Lechene CP, Luyten Y, McMahon G, Distel DL. Quantitative imaging of nitrogen fixation

765 by individual bacteria within animal cells. Science. 2007;317: 1563–1566.

766 doi:10.1126/science.1145557

767 11. O’Connor RM, Fung JM, Sharp KH, Benner JS, McClung C, Cushing S, et al. Gill bacteria

768 enable a novel digestive strategy in a wood-feeding mollusk. Proc Natl Acad Sci U S A.

769 2014;111: E5096-104. doi:10.1073/pnas.1413110111

770 12. Elshahawi SI, Trindade-Silva AE, Hanora A, Han AW, Flores MS, Vizzoni V, et al.

771 Boronated tartrolon antibiotic produced by symbiotic cellulose-degrading bacteria in

772 shipworm gills. Proc Natl Acad Sci U S A. 2013;110: E295-304.

773 doi:10.1073/pnas.1213892110

774 13. Han AW, Sandy M, Fishman B, Trindade-Silva AE, Soares C a G, Distel DL, et al.

775 Turnerbactin, a novel triscatecholate siderophore from the shipworm endosymbiont

35 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

776 Teredinibacter turnerae T7901. PLoS One. 2013;8: e76151.

777 doi:10.1371/journal.pone.0076151

778 14. Schmidt EW. Trading molecules and tracking targets in symbiotic interactions. Nature

779 Chemical Biology. 2008;4: 466–473. doi:10.1038/nchembio.101

780 15. Betcher MA, Fung JM, Han AW, Connor RO, Seronay R, Concepcion GP, et al. Microbial

781 Distribution and Abundance in the Digestive System of Five Shipworm Species ( Bivalvia :

782 Teredinidae ). PLoS One . 2012;7: 3–10. doi:10.1371/journal.pone.0045309

783 16. Distel DL, Altamia MA, Lin Z, Shipway JR, Han A, Forteza I, et al. Discovery of

784 chemoautotrophic symbiosis in the giant shipworm Kuphus polythalamia (Bivalvia:

785 Teredinidae) extends wooden-steps theory. Proc Natl Acad Sci U S A. 2017;114: E3652–

786 E3658. doi:10.1073/pnas.1620470114

787 17. Moraes DT de, Lopes SGB. The functional morphology of Neoteredo reynei (Bartsch,

788 1920) (Bivalvia, Teredinidae). J Molluscan Stud. 2003;69: 311–318.

789 18. Lopes SGBC, Domaneschi O, De Moraes DT, Morita M, Meserani GDLC. Functional

790 anatomy of the digestive system of Neoteredo reynei (Bartsch, 1920) and Psiloteredo healdi

791 (Bartsch, 1931) (Bivalvia: Teredinidae). Geol Soc London, Spec Publ. 2000;177: 257–271.

792 19. Trindade-Silva AE, Machado-Ferreira E, Senra MVX, Vizzoni VF, Yparraguirre L a,

793 Leoncini O, et al. Physiological traits of the symbiotic bacterium Teredinibacter turnerae

794 isolated from the mangrove shipworm Neoteredo reynei. Genet Mol Biol. 2009;32: 572–

795 81. doi:10.1590/S1415-47572009005000061

796 20. Garcia GD, Gregoracci GB, de OSE, Meirelles PM, Silva GG, Edwards R, et al.

797 Metagenomic Analysis of Healthy and White Plague-Affected Mussismilia braziliensis

798 Corals. Microb Ecol. 2013;65: 1076–1086. doi:10.1007/s00248-012-0161-4

36 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

799 21. F Meyer M D’Souza, R Olson, EM Glass, M Kubal, DP, T Paczian R Stevens, A Wilke, J

800 Wilkening and AR, Edwards RA. The metagenomics RAST server – a public resource for

801 the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics.

802 2008;9: 386. doi:10.1186/1471-2105-9-386

803 22. Parks DH, Beiko RG. Identifying biologically relevant differences between metagenomic

804 communities. Bioinformatics. 2010;26: 715–721. doi:10.1093/bioinformatics/btq041

805 23. Dinsdale EA, Edwards RA, Bailey BA, Tuba I, Akhter S, McNair K, et al. Multivariate

806 analysis of functional metagenomes. Front Genet. 2013;4: 41.

807 doi:10.3389/fgene.2013.00041

808 24. Suzuki R, Shimodaira H. Pvclust: an R package for assessing the uncertainty in hierarchical

809 clustering. Bioinformatics. 2006;22: 1540–1542. doi:10.1093/bioinformatics/btl117

810 25. Dutilh BE, Schmieder R, Nulton J, Felts B, Salamon P, Edwards RA, et al. Reference-

811 independent comparative metagenomics using cross-assembly: crAss. Bioinformatics.

812 2012;28: 3225–3231. doi:10.1093/bioinformatics/bts613

813 26. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. MetaSPAdes: A new versatile

814 metagenomic assembler. Genome Res. 2017;27: 824–834. doi:10.1101/gr.213959.116

815 27. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods.

816 2012;9: 357–359. doi:10.1038/nmeth.1923

817 28. Cambuy DD, Coutinho FH, Dutilh BE. Contig annotation tool CAT robustly classifies

818 assembled metagenomic contigs and long sequences. bioRxiv. 2016. doi:10.1101/072868

819 29. Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic

820 gene recognition and translation initiation site identification. BMC Bioinformatics.

821 2010;11: 119. doi:10.1186/1471-2105-11-119

37 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

822 30. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat

823 Methods. Nature Publishing Group; 2015;12: 59–60. doi:10.1038/nmeth.3176

824 31. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately

825 reconstructing single genomes from complex microbial communities. PeerJ. 2015;3: e1165.

826 doi:10.7717/peerj.1165

827 32. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.

828 2013; Available: http://arxiv.org/abs/1303.3997

829 33. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence

830 Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079.

831 doi:10.1093/bioinformatics/btp352

832 34. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the

833 quality of microbial genomes recovered from isolates, single cells, and metagenomes.

834 Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114

835 35. Aziz RK, Bartels D, Best AA, Dejongh M, Disz T, Edwards RA, et al. The RAST Server :

836 Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;15: 1–15.

837 doi:10.1186/1471-2164-9-75

838 36. Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: a web resource for automated

839 carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40: W445-51.

840 doi:10.1093/nar/gks479

841 37. Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, et al. antiSMASH 3.0-a

842 comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids

843 Res. 2015;43: W237-43. doi:10.1093/nar/gkv437

844 38. Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, et al. CARD 2017:

38 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

845 Expansion and model-centric curation of the comprehensive antibiotic resistance database.

846 Nucleic Acids Res. 2017;45: D566–D573. doi:10.1093/nar/gkw1004

847 39. Breiman L. Random Forrest. Mach Learn. 2001; 5–32. doi:10.1023/A:1010933404324

848 40. Liaw A, Wiener M. Package ‘ randomForest ’. Breiman and Cutler’s random forests for

849 classification and regression. CRAN Reference manual. 2015.

850 41. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. PATRIC, the

851 bacterial bioinformatics database and analysis resource. Nucleic Acids Res. Oxford

852 University Press; 2014;42: D581-91. doi:10.1093/nar/gkt1099

853 42. Spring S, Scheuner C, Göker M, Klenk HP. A taxonomic framework for emerging groups

854 of ecologically important marine gammaproteobacteria based on the reconstruction of

855 evolutionary relationships using genome-scale data. Front Microbiol. 2015;6: 281.

856 doi:10.3389/fmicb.2015.00281

857 43. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation

858 of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol.

859 2014;7: 539–539. doi:10.1038/msb.2011.75

860 44. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large

861 phylogenies. Bioinformatics. 2014;30: 1312–1313. doi:10.1093/bioinformatics/btu033

862 45. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and

863 annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44: W242–W245.

864 doi:10.1093/nar/gkw290

865 46. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary

866 Genetics Analysis version 6.0. Mol Biol Evol. 2013;30: 2725–2729.

867 doi:10.1093/molbev/mst197

39 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

868 47. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-

869 active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42: D490-5.

870 doi:10.1093/nar/gkt1178

871 48. Ekborg NA, Morrill W, Burgoyne AM, Li L, Distel DL. CelAB, a multifunctional cellulase

872 encoded by Teredinibacter turnerae T7902T, a culturable symbiont isolated from the wood-

873 boring marine bivalve Lyrodus pedicellatus. Appl Env Microbiol. 2007;73: 7785–7788.

874 doi:10.1128/AEM.00876-07

875 49. Wang S, Zhang J, Jiao W, Li J, Xun X, Sun Y, et al. Scallop genome provides insights into

876 evolution of bilaterian karyotype and development. Nat Ecol Evol. 2017;1: 120.

877 doi:10.1038/s41559-017-0120

878 50. Piel J. Biosynthesis of polyketides by trans-AT polyketide synthases. Nat Prod Rep. 2010;

879 1–77. doi:10.1039/b816430b

880 51. Wright GD. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev

881 Microbiol. 2007;5: 175–186. doi:10.1038/nrmicro1614

882 52. McArthur AG, Waglechner N, Nizam F, Yan A, Azad MA, Baylay AJ, et al. The

883 comprehensive antibiotic resistance database. Antimicrob Agents Chemother. 2013;57:

884 3348–3357. doi:10.1128/AAC.00419-13

885 53. Trindade-Silva AE, Lim-Fong GE, Sharp KH, Haygood MG. Bryostatins: biological

886 context and biotechnological prospects. Curr Opin Biotechnol. 2010;21: 834–42.

887 doi:10.1016/j.copbio.2010.09.018

888

889 Supporting Information 890

40 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

891 Figure S1 – Metagenomes taxonomic diversity. A) Relative abundance of metagenomes

892 taxonomical signatures under Domain hierarchical level (RefSeq database). B) Relative

893 abundances of metagenomes bacterial genera when considering 16S rRNA reads (RDP database).

894 C) Box-plot of metagenomes Shannon–Weaver index according to the tissue source. Digestive

895 glands (light-blue), and Intestine (green) samples present higher Shannon diversity number,

896 evenness and entropy when compared with Gills (dark-blue) samples. Annotations were performed

897 under MG-RAST server under default stringency parameters (e-value 1e-5, % of identity = 60%,

898 minimal length of 15 and minimal abundance of 1, considering the representative hit).

899

900 Figure S2 – Two groups comparisons between gills and digestive-tract (digestive glands +

901 intestine) metagenomes annotations. A) Extended error bar plot showing bacterial classes

902 enriched at gills (blue) or digestive tract (orange) metagenomic groups. B) and C) Extended error

903 bar plot showing gills (blue) or digestive tract (orange) enriched bacterial functions when using

904 Cluster of Orthologous (COG) database (B); or Subsystem Technology database (C), at their

905 respective hierarchical level 2. Two groups comparisons were performed at the STAMP software

906 version v2.1.3, as explained at Methods.

907

908 Figure S3 – Contigs based taxonomical annotation of N. reynei gills, digestive glands and

909 intestine datasets. A) Pizza chart showing relative abundances of taxonomical hits per

910 superkingdom. B) Distribution of hits to the most abundant taxon under the phyla taxonomical

911 hierarchy. Inset on gills bar chart brings the distribution of proteobacterial-derived hits (orange

912 bar) as a pizza chart to show the prevalence (98,2%) of hits to the -proteobacteria class.

913

41 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

914 Figure S4 – Neighbor-joining phylogenies inferred for the genome bins protein coding gene

915 markers gyrB (A) and rpoB (B) against a referential dataset of genes from T. turnerae isolates.

916 Bootstrap probability values > 0.7 are shown at the nodes.

917

918 Figure S5 –Secondary metabolome PCA of Cellvibrionaceae representatives. N. reynei binned

919 gills-symbiotic genomes gills.bin.1 and gills.bin.4 secondary metabolome grouped closely to a

920 major group formed by Teredinibacter and influenced by BGCs for polyketide, non-ribosomal

921 peptide, and hybrid compounds BGCs, besides putative routes for bacteriocins and terpene-

922 arylpolyene.

923

924 Table S1 – Metagenomes overall features. * Numbers (1,2,3): N. reynei specimens. Letters (A,

925 B, C): metagenomic samples. 1 - QC – MG-RAST 4.0 applied quality control of the metagenomic

926 paired end reads. 2 - Values obtained after metadata quality control pipeline filtering.

927 Abbreviations: Int – Intestine; DG: Digestive Glands.

928

929 Table S2 – CAZy domains detected on the binned genomes.

930

931 Table S3 – BLASTp analysis of T. turnerae T7901 multi-catalytic CAZymes using N. reynei

932 symbiotic binned genome as database. * T7901 CAZymes for which no highly conserved

933 homolog could be detected on one or both genome bins. Stringency cutoff for highly conserved

934 homologs are E value = 0.0, identity > 75% over the entire query protein.

935

42 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

936 Table S4 – Genome bins contigs mapped to Teredinibacter turnerae T7901 putative

937 Biosynthetic Gene Cluster. * Characterized biosynthetic gene clusters for tartrolon antibiotics

938 and turnerbactin siderophore production. Contigs highlighted in bold were detected by antismash

939 server.

940

941 Table S5: BLASTp analysis showing gills.bin.1 and gills.bin.4 BGCs conserved on T. turnerae

942 species.

943

944 Table S6: BLASTp analysis of the genome bin gills.bin.4 unique BGCs core biosynthetic

945 genes.

946

947 Table S7: Cellvibrionaceae representative genomes information and resistome and BGCs

948 annotations. * Coding sequence automated annotated by RAST server. † Resistance genes

949 annotated according the CARD server and considering the default stringency cutoffs for

950 perfect/strict and loosely annotations. ‡ Biosynthetic Gene Clusters (BGCs) annotated according

951 to the antiSMASH server, under the bacterial version and using default settings.

43 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.