bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
1 The gill-associated symbiont microbiome is a main source of woody-plant
2 polysaccharide hydrolase genes and secondary metabolite gene clusters in
3 Neoteredo reynei, a unique shipworm from south Atlantic mangroves.
4
5 Thais L. Brito1, Amanda B. Campos2, F.A. Bastiaan von Meijenfeldt3, Julio P. Daniel1, Gabriella
6 B. Ribeiro1, Genivaldo G. Z. Silva4; Diego V. Wilke1; Daniela Morais5, Bas E. Dutilh3,6, Pedro M.
7 Meirelles2, Amaro E. Tridade-Silva1.
8
9 1 Drug Research and Development Center, Department of Physiology and Pharmacology, Federal
10 University of Ceara, Fortaleza, CE, Brazil;
11 2 Institute of Biology, Federal University of Bahia, Salvador, BA, Brazil
12 3 Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands;
13 4 Computational Science Research Center, San Diego State University, San Diego,
14 CA, USA;
15 5 National Institute of Metrology, Quality and Technology, Rio de Janeiro, Brazil;
16 6 Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre,
17 Nijmegen, The Netherlands.
18
19 * Corresponding author
20 E-mail: [email protected] (AETS)
21
22
23
1 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
24 Abstract
25
26 Teredinidae is a family of highly adapted wood-feeding and wood-boring bivalves,
27 commonly known as shipworms, whose evolution is linked to the acquisition of cellulolytic
28 gammaproteobacterial symbionts harbored in bacteriocytes within the gills. In the present work
29 we applied metagenomics to characterize microbiomes of the gills and digestive tract of Neoteredo
30 reynei, a mangrove-adapted shipworm species found over a large range of the Brazilian coast.
31 Comparative metagenomics grouped the symbiotic gammaproteobacterial community of gills of
32 different N. reynei specimens, indicating closely related bacterial types are shared, while intestine
33 and digestive glands presented related, and more diverse microbiomes that did not overlap with
34 gills. Annotation of assembled metagenomic contigs revealed that the symbiotic community of N.
35 reynei gills was a hotspot of woody-polysaccharides degrading hydrolase genes, and Biosynthetic
36 Gene Clusters (BGCs), while in contrast, the digestive tract microbiomes seems to play little role
37 in wood digestion and secondary metabolites biosynthesis. Metagenome binning recovered the
38 nearly complete genome sequences of two symbiotic Teredinibacter strains from the gills, a
39 representative of Teredinibacter turnerae “clade I” strain, and a yet to be cultivated Teredinibacter
40 sp. type. These Teredinibacter genomes, as well as unbinned gill-derived gammaproteobacteria
41 contigs, code for novelty including an endo-β-1,4-xylanase/acetylxylan esterase multi-catalytic
42 carbohydrate-active enzyme, and a trans-acyltransferase polyketide synthase (trans-AT PKS) gene
43 cluster with the gene cassette for generating -branching on complex polyketides. Multivariate
44 analyzes have shown that the secondary metabolome encoded on the genomes of Teredinibacter
45 representatives, including the genomes binned from N. reynei gills’ metagenomes, stand out within
46 the Cellvibrionaceae family by size, and enrichments for polyketide, nonribosomal peptide and
2 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
47 hybrid BGCs. Results grouped here add to the growing characterization of shipworm symbiotic
48 microbiomes and indicate that the N. reynei gill gammaproteobacterial community is a prolific
49 source of biotechnologically relevant enzymes for wood-digestion and bioactive compounds
50 production.
51
52 Introduction
53
54 Teredinidae, a family of obligate xylotrepetic (wood-boring) and xylotrophic (wood
55 feeding) mollusks, evolved from an ancestor lineage within the eulamellibranch order Myoida,
56 and succeeded in exploiting wood as new food source, most probably by concomitantly acquiring
57 cellulolytic bacterial symbionts [1]. These animals suffered dramatic body shape adaptations to
58 thrive as wood feeders including greatly reduced and burrowing-specialized valves (shells), and a
59 wood-sheltered worm-like body plan protruding from their shells, which led to their common name
60 of shipworms[2].
61 Shipworm bacterial symbiotic communities are formed by cultivated and as yet
62 uncultivated closely related gammaproteobacteria, located intracellularly in bacteriocytes within
63 the gills (ctenidia) tissue [3–6]. The prototypical shipworm symbiont is the cellulolytic and
64 nitrogen fixing bacterium T. turnerae, isolated from a great variety of Teredinidae species from all
65 over the globe [7,8]. Sequencing of the complete genome of one strain of T. turnerae, T7901,
66 revealed that the genome of this bacterium codes for an arsenal of enzymes specialized in breaking
67 down woody material, reinforcing the hypothesis that shipworm gammaproteobacterial symbionts
68 play a critical role on supporting host nutrition[9]. Indeed, two seminal works provided strong
69 evidence for this hypothesis: firstly multi isotopic mass spectroscopy (MIMS) combined with
3 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
70 transmission electron microscopy (TEM) showed, in Lyrodus pedicellatus, that T. turnerae cells
71 could fix atmospheric N2 in symbio, and then transfer nitrogenized compounds to their host [10].
72 Secondly, a combination of metagenomics and proteomics of Bankia setacea, showed that wood-
73 specialized hydrolytic enzymes secreted by the symbiotic community are, somehow, selectively
74 conducted from the gill symbiotic site up to the host’s cecum to reach their substrates, the ingested
75 wood particles [11].
76 Characterization of the T. turnerae T7901 genome also revealed that this endosymbiotic
77 bacterium contains a diversity of Biosynthetic Gene Clusters (BGCs) for secondary metabolites
78 comparable to those observed in free-living Streptomyces species known for their proficiency at
79 secondary metabolite production[9]. Indeed, compounds of the tartrolon family of boronated
80 antibiotic [12], and a novel triscatecholate siderophore called turnerbactin [13], were purified from
81 chemical extracts of T. turnerae cultures and had their respective biosynthetic routes identified
82 and characterized by retrobiosynthetic and phylogenetic approaches. Moreover, both BGCs were
83 proven to be expressed in symbio and signatures of these compounds were detected by mass
84 spectroscopy of shipworms whole animal extracts [12,13]. Such results fueled a hypothesis that
85 bioactive secondary metabolites could also play a role in supporting the symbiosis by providing
86 chemical defense and so, improving holobiont fitness [14], and/or mediate
87 competition/specialization to shape the distribution of bacterial types within the gills. Fluorescence
88 in situ hybridization (FISH) and confocal microscopy surveys of tissues from several shipworm
89 species provided evidence supporting both these hypothesis, showing that i) symbiotic
90 gammaproteobacterial types are structured within the gills, in separated niches of bacteriocytes,
91 and ii) tissues of the shipworm digestive system are virtually sterile (cecum) or containing a
92 discrete microbial community (intestine) [11,15].
4 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
93 Recently, the gill symbiotic community of the giant shipworm Kuphus polythalamia, a rare
94 species that burrows into mud and sediment instead of wood, was shown to be composed of sulfur
95 oxidizing chemoautotrophic gammaproteobacteria instead of Teredinibacter-related cellulolytic
96 types [16]. These findings suggested this bivalve is a chemoautotrophic descendent of a
97 xylotrophic ancestor, thus adding complexity to the biology of symbiosis in the family. N. reynei
98 is another singular shipworm that has been reported on mangroves of the south Atlantic, including
99 records of a large range of the 7.367 km long Brazilian coastline, from the States of Pará, at the
100 north of the country, to Santa Catarina, at the very south. N. reynei is the only species of the genus,
101 and can be rapidly identified by a unique feature: the presence of large and highly vascularized
102 dorsal lappets at the posterior end of the animal’s body, whose function is yet to be clarified [2,17].
103 N. reynei is a basal member of the family and has, characteristically, unsegmented pallets, a
104 globular (type II) stomach and an intestine that makes a loose loop forward embracing the
105 crystalline style-sac before running backwards toward the cecum [1]. The N. reynei digestive tract
106 includes also two digestive glands, one previously attributed to wood-digestion, another attributed
107 to filter feeding digestion, and both connected to the style-sac through ducts. In addition, N. reynei
108 possesses a large cecum, which can reach up to 44% of the animal’s body size, and an enormous
109 and cylindrically shaped anal canal whose aperture is controlled by a strong muscular sphincter
110 [18]. N. reynei gills extend from the base of the siphons up to the posterior end of the visceral mass
111 as demibranchs and then reduce to a food groove at the anterior end of the animal’s body, without
112 presenting an anterior gill, like in most other shipworm species. Therefore, it comprises only 20%
113 of N. reynei body length, a much shorter proportion when compared to other shipworm species.
114 Such reduced gills combined with a large cecum and anal canal, both continuously filled with
115 wood particles, were taken as anatomical evidences that N. reynei diet is primarily wood [2,17,18].
5 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
116 Previous culturing efforts lead to isolation of T. turnerae strains from N. reynei gill homogenate,
117 including CS30, a strain that displayed antimicrobial activity against a spectrum of gram positive
118 and negative indicator bacteria [19]. However, to date, the diversity and biotechnological potential
119 of N. reynei gill symbiotic microbiome was still to be explored by culture independent methods.
120 In the present work we applied metagenomic analysis to explore the N. reynei gill
121 symbiotic bacterial community, as well as the microbiome of the digestive glands and intestine, as
122 representative tissues of the shipworm digestive tract. Two symbiotic bacterial genome bins were
123 recovered and analyzed for their capability for producing hydrolytic enzymes and bioactive
124 secondary metabolites. Our findings reinforced the importance of the shipworm symbiotic system
125 as a subject of study and discovery.
126
127 Material and Methods
128
129 Specimens
130
131 Three whole adult specimens of N. reynei shipworm were collected, using care to avoid
132 injury, from decaying wood chunks in the Coroa Grande mangrove area at Sepetiba Bay, Rio de
133 Janeiro State, Brazil (22.91°S, 43.87°W) on December 5, 2014, and transported to the Laboratory
134 of Biotechnology at the National Institute of Metrology, Quality and Technology (Inmetro, RJ) in
135 sterilized glass jars.
136
6 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
137 Metagenomic DNA extraction, sequencing, annotation, and statistics.
138
139 Digestive glands, intestines and gill tissues were dissected from freshly collected animals
140 (out of wood for ~2hs at 24oC) snap frozen and ground in liquid nitrogen, and processed for
141 metagenomic DNA extraction using 2% CTAB lysis buffer, phenol-chloroform-isoamyl alcohol
142 (25:24:1) deproteinization, and the columns from PowerSoil® DNA Isolation Kit (MO BIO
143 Laboratories, Carlsbad, CA,USA) for final purification, as previously described [20]. For each
144 specimen, the two digestive glands were combined prior to grounding for DNA purification.
145 Additionally, two independent DNA purifications were performed from pulverized intestine and
146 gills tissues (replicates), totalizing 5 metagenomic DNA samples per shipworm specimen.
147 Metagenomic DNA samples purity and integrity were respectively evaluated by quantification
148 with a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc.), and by 1% agarose gel
149 electrophoresis. A Qubit fluorimeter (Thermo Fisher Scientific Inc.) was used for accurate double-
150 stranded DNA quantification. Metagenomic DNA libraries were prepared for the fifteen high-
151 quality metagenomic DNA samples, using a Nextera XT DNA Sample Preparation Kit (Illumina),
152 as recommended by the manufacturer. Metagenomic libraries were sequenced using the 600-cycle
153 (300 bp paired-end runs) MiSeq Reagent Kits v3 chemistry (Illumina) with a MiSeq Desktop
154 Sequencer (Illumina) at the Center for Genomics and Bioinformatics (CeGenBio) of the Drug
155 Research and Development Center (NPDM), at the Federal University of Ceara, Brazil. Raw
156 sequence data were submitted to the MG-RAST server (version 4.03) [21] for sequence paired-
157 end joining, quality control, and automated annotation pipeline. Taxonomic annotations were
158 performed using RefSeq, and functional signatures were retrieved with both Clusters of
159 Orthologous Groups (COG) and Subsystems technology databases using the default cutoffs
7 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
160 settings (Supplementary Table S1). Taxonomical and functional signatures were submitted to
161 comparative metagenomics using R packages and the Statistical Analysis of Metagenomic Profiles
162 (STAMP) software package, version 2.1.3[22]. Multivariate statistical analyzes were performed
163 with R using the following unsupervised learning techniques: i) hierarchical clustering, with Ward
164 grouping method on a Euclidean distance matrix, ii) PCA biplot, and iii) supervised/unsupervised
165 random forest at R, as previously reported [23]. For hierarchical clustering, cluster significance
166 was accessed by bootstrap resampling using pvclust algorithm[24]. At STAMP, “two groups”
167 comparisons were conducted considering the two main clades obtained by unsupervised
168 hierarchical clustering. For that, the prokaryotic taxonomic (Class level of RefSeq) and functional
169 (Level 2 of the Subsystems technology) features of each group were retrieved from MG-RAST
170 and compared using two-sided Welch’s t-Test with confidence interval of 0.95 and considering
171 Benjamini-Hochberg False Discovery Rate corrected P-values (q-values) < 10-5 as significantly
172 relevant.
173
174 Metagenome assembly, cross-comparison and contig annotation.
175
176 For annotation-independent cross-assembly comparison between the 15 metagenomic
177 samples generated in this study we used crAss tool [25]. First, all paired-end joined quality
178 controlled metagenome reads of all tissues and specimens were assembled together using SPAdes
179 3.8 in --meta mode, and k-mer sizes of 21, 33, 55, and 77 [26]. Reads were mapped back to contigs
180 using Bowtie2 [27] and the SAM file was used on crAss, which computes the abundance of reads
181 from each metagenome contributing to form each contig in the cross assembly. The coverage of
182 each of the fifteen metagenomes in contigs was used to build a distance matrix that was displayed
8 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
183 in the form of a cladogram [25]. The tissue-specific contigs retrieved from gills, intestine and
184 digestive glands datasets were taxonomically annotated with the contig annotation tool (CAT) [28]
185 which predicts protein-coding genes on a contig, maps these to NCBI's non-redundant protein
186 database (NR), and subsequently employs a last common ancestor algorithm to give a conservative
187 estimate of taxonomy for the contig. CAT was run with the b1 parameter set to 10, and the b2
188 parameter set to 0.5. Prodigal 2.6.3 [29] and Diamond 0.9.14 [30] were employed for protein
189 calling and database mapping, respectively. NR was downloaded on November 23, 2017.
190
191 Symbiotic genome binning
192
193 Metagenomic sequencing reads were cross-assembled for each tissue with SPAdes 3.8.0 in
194 --meta mode [26]. Tissue-specific contig sets were each binned into draft genome sequences using
195 MetaBAT 0.26.3 [31] that exploits signatures that are typical of contigs belonging to the same
196 genome, including the nucleotide usage and the abundance of the contigs in the different samples.
197 In order to get contig abundances, reads of all the samples were mapped to the tissue-specific
198 assemblies with Burrows-Wheeler Aligner (BWA) 0.7.12 with the BWA-MEM algorithm[32].
199 Mappings were converted to the binary BAM format with SAMtools 0.1.19 [33] and depth files
200 were created with the script jgi_bam_summarize_contig_depths that is supplied with MetaBAT.
201 Binning with MetaBAT was performed with the sensitive setting, after tests with other presets
202 showed this to give the best results in terms of bin completeness and contamination. Bin
203 completeness, contamination, and strain heterogeneity were assessed using CheckM 1.0.7 [34]
204 based on the presence and absence of sets of single-copy marker genes. CheckM was run in the
205 lineage specific work flow, which places the bins in a reference tree and based on that placement
9 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
206 searches for lineage-specific marker genes. Completeness, contamination, and strain heterogeneity
207 were assessed on the lowest level of placement (node) in the CheckM backbone tree. Two high
208 quality genome bins designated gills.bin.1 and gills.bin.4 (>50% completeness and <5%
209 contamination, see Table 1) were generated from the gill assembly, as discussed in the Results
210 section. No high-quality bins could be generated from the digestive gland or intestine assemblies.
211
212 Symbiotic genome functional annotation and multivariate analysis
213
214 Nearly complete genomes gills.bin.1 and gills.bin.4, as well as other related
215 gammaproteobacterial genomes, including representatives of the Cellvibrionaceae family, and
216 more specifically, the Teredinibacter clade (Suppl. Table 7) were automatically annotated at the
217 RAST server [35] under default settings. Carbohydrate-active enzymes (CAZymes) were
218 annotated using the dbCAN web server [36]. Results were filtered for E-value of 1e-5 and domain
219 coverage fraction > 0.7. Using such parameters, reannotation of T. turnerae T7901 genome leads
220 to recover of 99% of the CAZy domains, with false positive and false negative rates of ~8 and
221 ~7% respectively. Natural products Biosynthetic Gene Clusters (BGCs) were annotated using
222 antiSMASH bacterial version [37] with RAST GenBank generated files as input and default
223 settings. The genes composing the resistome (i.e. genes related to antibiotic resistance) were
224 annotated and quantified with RGI (Resistome Genes Identificator) tool of the Comprehensive
225 Antibiotic Resistance Database (CARD) [38], under the discovery criteria for detection of perfect,
226 strict and loose hits. We then performed Principal Component Analysis (PCA) to cluster the
227 selected genomes according their diversities of BGCs and resistance genes, and supervised
228 Random Forest (RF) [39] to highlight the fifteen most influential vectors in each case, using
10 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
229 ggbiplot R library (Vu, 2011) to plot the results. The number of variables in PCA was reduced by
230 supervised RF to, with higher mean accuracy using the R library [40]. To test if the relative
231 abundance of resistome and metabolome (specifically Nrps, PKS and their hybrid paths) related
232 genes were significantly different between free-living and host-associated bacteria we performed
233 Kruskal-Wallis non-parametric test using the kruskal.test() function in R and considered p-value
234 lower than 0.05 significant. We tested each category independently and adjusted the p-value using
235 the Bonferroni adjust method with the p.adjust()function in R.
236
237 Phylogenetic analysis
238
239 The two high quality genome bins were placed in a phylogeny with closely related
240 Gammaproteobacteria based on the 43 phylogenetically informative marker genes that CheckM
241 uses to assess bin placement in its own backbone tree (Supplemental Table S6 in [34]). All
242 genomes from the family Cellvibrionaceae were downloaded from the PATRIC (Pathosystems
243 Resource Integration Center, [41] genome database, to our knowledge the largest publicly
244 available prokaryotic genome database. One genome per family of Halieaceae, Porticoccaceae,
245 and Spongiibacteraceae was added to together serve as outgroups. It was recently shown that the
246 monophyly of Cellvibrionaceae could only be clearly resolved when using a genome-scale
247 supermatrix [42] while a tree based on the RpoB gene showed Cellvibrionaceae to be paraphyletic
248 with Microbulbiferaceae. Thus, no Microbulbiferaceae genomes were included here. Indeed, when
249 Microbulbiferaceae were included in the tree, this led to paraphyly of the Cellvibrionaceae, but the
250 placement of the novel genome bins within the Cellvibrionaceae remained unchanged (not shown).
251 Genomes were downloaded based on the genome_lineage file on the PATRIC server that was
11 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
252 downloaded on December 20, 2017. One Cellvibrionaceae genome present in the genome_lineage
253 file, Cellvibrio mixtus strain PSBB022, could not be retrieved from the PATRIC server. CheckM
254 was run on all the genomes to identify the 43 marker genes, translated sequences were individually
255 realigned after dealignment with Clustal Omega 1.2.3 [43] and subsequently concatenated, filling
256 gaps if a gene was not found in a genome. A maximum-likelihood tree was inferred with RAxML
257 8.2.9 [44] using the PROTCATLG model and 100 rapid bootstraps (random seeds -p and -x were
258 both set to 12345), and the best scoring tree was visualized with iToL [45]. Additionally, the partial
259 nucleotide sequences of the protein-coding gene rpoB (1,020 bp) and gryB (1,053 bp) were
260 retrieved from a previously determined set of 25 T. turnerae isolates [8], and aligned with
261 respective gene fragments from binned genomes gills.bin.1 and gills.bin.4 with Molecular
262 Evolutionary Genetics Analysis (MEGA) software version 6.0[46], using the built-in ClustalW
263 program with default settings for “codons” (protein-coding sequences). Maximum likelihood fits
264 for 24 different nucleotide models were evaluated for each gene and best-fit model of evolution
265 selected according to the lowest Bayesian Information Criteria (BIC) scores. Neighbor-joining
266 phylogenies were reconstructed employing Kimura 2-parameter nucleotide substitution model
267 improved with 5 discrete gamma categories for a total of 1000 bootstraps.
268
269 Results
270
12 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
271 The N. reynei gill symbiotic microbiome is dense and highly dominated
272 by -proteobacteria, while digestive glands and intestine present similar
273 diverse microbial communities that do not overlap with the gill
274 community.
275
276 To extend knowledge regarding the role of shipworm symbiotic bacteria in wood digestion
277 and host defense, we performed functional metagenomics of N. reynei digestive gland, intestine,
278 and gill tissues (see methods). Shotgun sequencing of fifteen metagenomes samples (Table S1),
279 generated 21,548,783 reads, from which approximately 92% (3.94 Gb) passed the quality control
280 and annotation pipeline of the MG-RAST server. Approximately 1.2, 55.7 and 3.8% of the quality
281 proofed data presented matching hits for ribosomal RNA, protein, and protein with known
282 functional coding genes respectively (Table S1).
283 Unsupervised multivariate analysis showed that the N. reynei gill microbial community has
284 unique taxonomic and functional profiles (Figure 1). Hierarchical clustering of the microbial
285 annotations formed consistent cladogram topologies, with gills samples grouping apart from
286 digestive gland and intestine samples in two main clades (Figure 1A and 1B). Indeed, gill samples
287 exhibited a dense but low diversity microbiome, with Bacteria dominating the hits against the
288 RefSeq database (> 98%), and the vast majority of these hits being tracked to -proteobacteria (>
289 92%), and ultimately, to the genus Teredinibacter (Figure 1A, Figure S1). Accordingly, 16S rRNA
290 annotation against the RDP database confirmed the dominance of signatures of Teredinibacter in
291 the gill microbiomes (~ 88%) (Suppl. Fig. 1B). Intestine and digestive gland samples, on the other
13 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
292 hand, presented more diverse microbiomes distinct from the gills, responding respectively to 23,8
293 and 28,7% of the taxonomical annotations (Figure 1A, Figure S1).
294
295 Figure 01 – Multivariate analysis of metagenomes microbial taxonomical and functional
296 features. A) Unsupervised hierarchical clustering of taxonomical (TAX) and functional (FUNC)
297 features. Clades formed by gill samples and by intestine plus digestive gland samples are
298 highlighted respectively in purple and dark green. The pvclust AU (Approximately Unbiased) and
299 BP (Bootstrap Probability) p-values are shown at clades nodes in red and light green respectively.
300 B) Relative abundances of metagenome bacterial classes and functions respectively signed with
301 RefSeq and COG (Level 2) databases. C) Multiple dimensional scale (MDS) plot of N. reynei
302 metagenomics samples using distances and the fifteen most important bacteria classes (RefSeq
303 database) and functions (COG database at Level 2) vectors as calculated from unsupervised
304 random forest. Digestive gland, intestine and gill samples are shown as green, brown and purple
305 dots respectively. Clusters of Orthologous Groups are classified into functional categories as
306 follow: A - RNA processing and modification; E - Amino acid transport and metabolism; F -
307 Nucleotide transport and metabolism; H - Coenzyme transport and metabolism; I - Lipid transport
308 and metabolism; K – Transcription; L - Replication, recombination and repair; N - Cell motility;
309 O - Post-translational modification, protein turnover, and chaperones; Q - Secondary metabolites
310 biosynthesis, transport, and catabolism; S - Function Unknown; T - Signal transduction
311 mechanisms; U - Intracellular trafficking, secretion, and vesicular transport; W - Extracellular
312 structures; Z - Cytoskeleton
313 Non-parametric multidimensional scaling (nMDS) combined with random forest biplot
314 (Figure 1C) showed that enrichments to -proteobacteria is a major vector driving the observed
14 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
315 grouping of gill samples, while gram-positive (Actinobacteria, Flavobacteriia, Clostridia), gram-
316 negative (-proteobacteria, Cloroflexia, Bacteroidia, Sphingobacteriia), and unclassified
317 Cyanobacteria influence the grouping of digestive glands and/or intestine samples. Additionally,
318 functions under the COG subcategories Cytoskeleton; Cell motility; Coenzyme transport and
319 metabolism; RNA processing and modification; Signal transduction mechanisms;
320 Posttranslational modification, protein turnover, chaperones; Transcription; and Secondary
321 metabolites biosynthesis, transport and catabolism influenced gill samples grouping, while
322 digestive glands and intestine microbiomes were driven by Lipid; Amino acid, and Nucleotide
323 transport and metabolism besides Extracellular structures, and Intracellular trafficking secretion
324 and vesicular transport (Figure 1D). The two group comparisons performed with reference to the
325 two main clades obtained by the hierarchical clustering analysis and using taxonomical and
326 functional microbial sequence annotations corroborate the nMDS analysis, with biologically
327 significant enrichments detected in either gills or digestive tract (digestive glands and intestine)
328 microbiomes closely matching the vectors driving samples grouping on nMDS plot (Figure S2).
329 Although highly informative, comparing taxonomic and functional annotations confine the
330 sequence data to the reference databases, therefore biasing the results according to the database
331 representativeness. Considering that, we further performed annotation-independent comparative
332 metagenomics with crAss [25], a tool that accesses (and score) similarities between metagenomes
333 according to their shared entities (cross-contigs) after cross-assembling all samples’ reads.
334 Interestingly, metagenomes were grouped into one of two different cladograms topologies
335 reconstituted from four distance matrices (Figure 2). When qualitative comparison was applied
336 using "minimum" and "shot" presence/absence-based distance formulas, the samples grouped
337 significantly by the animal source (p<0.001) (Figure 2A). Although, when more quantitative
15 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
338 distance formulas "reads" and "wootters" were used, in which the contig number of reads is
339 considered as reflecting its abundance in the environment, samples grouped significantly
340 (p<0.001) by the tissue, driven by gill samples, which clustered together regardless of the animal
341 source (Figure 2B).
342 Figure 02 – Comparison of N. reynei metagenomic samples by cross-assembling. A)
343 Cladogram representing cross-contigs (i.e., shared contigs containing reads from at least two
344 metagenomes) grouping when using presence-absence-based formulas (distance formulas ‘shot’
345 and ‘minimum’). Samples grouped according the specimen of origin, being clades formed of
346 samples of N. reynei specimens #1, 2 and 3 highlighted in red, green and brown respectively. B)
347 Cladogram representing cross-contigs grouping when using more qualitative distance measures
348 (distance formulas ‘Wootters’ and ‘reads’). Gill samples grouped together regardless the specimen
349 of origin, as highlighted in dark-blue.
350 Further, we applied the contigs annotation tool (CAT) algorithm to get taxonomical
351 classification of contigs assembled for each tissue-type dataset (see Methods). Such analysis
352 showed that gill samples contain 10 times more bacterial-derived contigs (~25% of the signatures)
353 than intestine and digestive glands (~2-3% of the signatures) (Figure S3). Mollusca represented 30
354 – 38% of the obtained classifications being the most prevailing annotation on the three tissues
355 (Figure S3). Corroborating raw reads annotations, gammaproteobacteria dominated the gill
356 microbial community (~87% of bacterial contigs), with Teredinibacter as the most abundant
357 classified bacterial genus (~81% of gammaproteobacterial reads). Additionally, the discrete
358 microbiomes of digestive glands and intestine were reaffirmed as more diverse with Proteobacteria
359 (~0.6 – 0.7%), Actinobacteria (~0.4 – 0.5%), Bacteroidetes (~0.3 – 0.6%) and Firmicutes (~0.3%)
360 contigs leading the obtained microbial hits (Figure S3).
16 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
361
362 T. turnerae and Teredinibacter sp. compose the N. reynei gill symbiotic
363 community
364
365 Genome binning was then performed in an effort to recovering microbial genomes
366 representing each investigated tissue microbiome. Two -proteobacterial genomes bins presenting
367 the quality required for further analysis (> 50% completeness, < 5% contamination), as assessed
368 by CheckM tool [34], were recovered, both from the gill metagenome dataset (Table 1). The
369 genome bins, labeled as “gills.bin.1” and “gills bin.4”, were automated assigned as T. turnerae
370 and Teredinibacter sp. respectively (Table 1).
371
372 Table 1 – N. reynei gill symbiotic genome bins metrics and features.
Genome bin/Features Gills.bin.1 Gills.bin.4
Marker lineage s_turnerae (UID4667) g_Teredinibacter (UID4659)
# genomes 2 13
# markers 1792 1055
# marker set 305 271
Completeness 87.12 96.45
Contamination 2.8 4.88
Strain heterogeneity 0 55
Genome size (bp) 4418468 5127066
# contigs 355 207
17 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
N50 (contigs) 16757 39993
Mean contig length (bp) 12446 24768
Longest contig (bp) 97957 134446
GC 51.59 53.41
373 Genome bins quality as accessed by the CheckM tool.
374 # - Number. Abbreviation: bp – base pairs.
375
376 Firstly, the taxonomic affiliations of gills.bin.1 and gills.bin.4 genome bins were checked
377 by genome-scaled supermatrix (Figure 3) and one-protein-coding taxonomic marker phylogenies
378 (Figure S4). Corroborating with the CheckM taxonomical annotation, which is based on lineage-
379 specific marker genes, both phylogenies provided highly similar tree topologies where both
380 genome bins fell in a subclade of Cellvibrionaceae family, with gills.bin.1 grouping within the
381 previously defined “clade 1” of T. turnerae [8] and gills.bin.4 appearing as outgroup of this species
382 main clade (Figure 3; Figure S4). The detection of a N. reynei symbiotic “clade I” T. turnerae
383 strain is in agreement with previous affiliation of a N. reynei T. turnerae isolate, T8508, also falling
384 within this clade [8]. Moreover, as observed for other Teredinidae species, the present results
385 showed that N. reynei gill symbiotic microbiota contains species in the Teredinibacter clade
386 including, but not limited to, T. turnerae.
387
388 Figure 03 – Phylogeny of two high quality Teredinibacter bins based on a concatenated
389 alignment of 43 proteins. RAxML maximum likelihood tree based on the CAT model of rate
390 heterogeneity and LG amino acid substitution matrix, with 100 rapid bootstraps. To improve
391 readability, bootstraps for the Teredinibacter turnerae sub-clades I and II (with bootstrap values
18 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
392 of 87 and 97 respectively) are omitted. The tree is rooted in between the Cellvibrionaceae branch
393 and representatives of Halieaceae, Porticoccaceae, and Spongiibacteraceae. Color coding of the
394 families is similar to the one used by Spring and co-authors (2015). The scale bar represents 0.05
395 nucleotide substitutions per site.
396
397 N. reynei symbiotic Teredinibacter genome bins code for wood
398 specialized repertoire of hydrolases including novel multicatalytic
399 enzymes
400
401 The T. turnerae T7901 genome codifies an arsenal of enzymes devoted to breaking down
402 the lignocellulosic plant material [9]. Based on this knowledge, we annotated the recovered
403 genome bins against the carbohydrate-acting enzymes (CAZy) database [47] and compared their
404 cazymes profile with the one for T7901, disposed at the CAZy server (Table S2). In the three
405 compared genomes, cazymes represented between 3.5 – 4% of the protein coding genes,
406 encompassing a diversity of predicted protein domains including 32 – 37% of glycoside hydrolases
407 (GH), 9 – 12% of glycoside transferases (GT), 7 – 13% of carbohydrate esterases (CE), 1 – 2% of
408 polysaccharide lyases (PL), and 42 to 44% of carbohydrate binding modules (CBM) (Table S2).
409 Additionally, when GH families were grouped according their substrate specificities, it was shown
410 that the binned genomes also presented the “wood-specialized” profile described for T7901 [9],
411 where a major portion (43 – 51%) of the GHs are devoted to the digestion of wood-composing
412 polysaccharides (Figure 4A). Furthermore, gills.bin.1 contains identical or near identical (E value
413 = 0.0, amino acids identities ≥ 99%) homologs to the multidomain and multi-catalytic cazymes
19 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
414 detected in the T7901 genome, the exception being TERTU_RS21420, coding for the
415 biochemically characterized multifunctional cellulase CelAB [48] (Table S3). In comparison,
416 gills.bin.4 codes less conserved homologs (E value = 0.0, amino acids identities ≥ 75%) to only
417 three of T7901 multicatalytic enzymes (Table S3). However, this genome bin codes for other novel
418 multidomain/multicatalytic enzyme configurations, including a putative acetyl xylan esterase/
419 endo-1,4-β-xylanase (peg.397) and two putative dual pectin acetylesterases (peg.2052 and
420 peg.3260) (Figure 4B).
421
422 Figure 04 – Gills genome bins and unbinned contigs carbohydrate active enzymes
423 (CAZymes). A) Proportion of GH domains according to the substrate specificity at gills.bin1 and
424 gills.bin 4 binned genomes and at other unbinned contigs from gills metagenome dataset. Substrate
425 specificities as previously defined [9]: dark green = cellulose/xylan GH families (5, 6, 8, 9, 10, 11,
426 12, 44, 45, 51, 52, 62, and 74); dark blue = GH families with other or not-unique specificities (1,
427 2, 3, 13, 15, 23, 27, 30, 31, 35, 39, 43, 73, 77, 78, 79, 94, 95, 103, 108, 109, 115, 128, 130); light
428 grey = laminarin GH families (16 and 81); yellow = other plant cell wall polysaccharides GH
429 families (26, 53, and 67); light blue = chitin GH families (18, 19, and 20); light green =
430 peptidoglycan GH families (28 and 105). B) Novel multicatalytic CAZymes configurations
431 detected on Teredinibacter sp. gills.bin.4 genome bin (labeled in black), and on unbinned contigs
432 from gill-symbiotic gammaproteobacterial community (labeled in red). Abbreviations key:
433 glycoside hydrolases (GH), carbohydrate esterases (CE), carbohydrate binding modules (CBM).
434 Catalytic domains and binding modules are color coded as black squares and gray circles.
435 When the entire set of contigs assembled from the gill dataset was mined, additional 291
436 putative cazymes not mapped to the genome bins were found. In total, N. reynei gills contained
20 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
437 882 carbohydrate active domains, and comparing to the cazymes profile of the symbiotic genome
438 bins, the gill dataset presented an ~2.3 X enrichment for GTs, and 1.3 X reduction on GHs (Suppl.
439 Table 2). However, the wood-specialized profile is maintained for the entire gill symbiotic
440 community CAZymes (Fig. 4A). In fact, other unique multicatalytic woody plant material
441 digesting CAZymes that have not binned to Teredinibacter genomes were detected, such as a
442 putative endo-β-1,4-xylanase/ endo-β-1,4-galactanase (NODE.622) and two endo-β-1,4-
443 xylanase/acetylxylan esterases (NODE.790a and b) (Fig. 4B).
444 Finally, we evaluate genomes CAZymes in light of the recent genome-enabled-proteomic
445 study performed with shipworm specimens of B. setacea species, which revealed that an array of
446 carbohydrate-active-domains derived from the gill symbiotic gammaproteobacterial community
447 are found in the animals’ cecum contents [11]. We verified that, in the compared genomes and in
448 the gill microbiome dataset, the related domains families are enriched, representing 82 – 89% of
449 the cellulase/xylanases (GH families 5, 6, 9,10, 11, 45 and 53), 50 – 75% of the carbohydrate
450 esterases (CE families 1, 3, 4, 6 and 15) and 44 – 62% of the carbohydrate binding motifs detected
451 (CBM2, 10, 22, 57, 60, 61) (data not shown).
452 Contrasting with the rich and wood devoted repertoire of hydrolyses present in the gill
453 symbiotic site, contigs from digestive glands and intestine datasets retained only 29 and 101
454 CAZymes related domains respectively, where GHs are absent in recovered enzymes of digestive
455 glands and account for only 4% of intestine-proteins CAZymes, all of which have substrate
456 specificity other than cellulose/xylan (Table S2, data not shown).
457
21 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
458 The N. reynei symbiotic genome bins code for a diverse repertoire of
459 biosynthetic gene clusters (BGCs) including Teredinibacter related and
460 novel putative pathways
461
462 Further, we evaluated the potential of the N. reynei associated microbiome to produce
463 bioactive secondary metabolites. Firstly, the gill, digestive gland and intestine assembled sequence
464 dataset was mined against the robust shell of algorithms of the antiSMASH server [37]. The gill
465 symbiotic microbiome was shown to be a hotspot of BGCs, with a total of 119 contigs encoding
466 enzymes involved in biosynthesis of complex peptides, polyketides and hybrid compounds,
467 besides poly-unsaturated fatty acids (PUFAs), terpenes, arylpolyenes, homoserine lactones,
468 ectoine and bacteriocins (Figure 5A). On the contrary, no secondary metabolite clusters were
469 detected in the digestive gland dataset, and only two contigs for modular type I PKS were found
470 in intestine samples (data not shown). Intriguingly, the intestine-derived type I PKS contigs best
471 BLASTp hits were to a fusarin C synthetase-like (XP_021367770) and a erythronolide synthase
472 (OWF54747.1) genes linked to the genome of the bivalve Mizuhopecten yessoensis (Yesso
473 scallop) [49]. Since modular type I PKS occur almost exclusively in microbes, it is unclear if these
474 homolog genes are truly encoded in the M. yessoensis genome, which would indicate a rare inter-
475 kingdom horizontal gene transfer event, or are from a sequenced contaminating DNA source, as
476 an associated bacterium.
477 We then reanalyzed the secondary metabolome of the reference T. turnerae T7901 genome
478 via the antiSMASH server. Previously, T7901 was shown to devote an unexpectedly high
479 percentage (~7%) of its genetic content to natural product biosynthetic pathways [9]. The new
22 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
480 genome mining reveals a total of 13 BGCs, adding new putative pathways for bacteriocins (4),
481 terpene (1) and ectoine (1), and condensing firstly delimitated “regions” 3, 4, and 5 [9], into |a
482 putative giant Bacteriocin-T1pks-Nrps BGC of ~270.6 kb (Table S4). Screening of gills.bin.1 and
483 gills.bin.4 genome bins lead, respectively, to the detection of 19 and 14 contigs, representing
484 putative BGCs, or fragments thereof, for production of compounds from a diversity of classes such
485 as polyketides, non-ribosomal peptides, terpenes, and bacteriocins (Figure 5A). To get this analysis
486 at higher perspective, we also annotated the secondary metabolome of all genomes in the
487 Cellvibrionaceae family included in our genome-wide phylogeny (Figure 3) and resolved the
488 detected BGCs diversity by supervisioned random forest combined with multidimensional scaling.
489 The N. reynei gill symbiotic genome bins contain secondary metabolomes highly related to the
490 ones from representative T. turnerae strains, which grouped tightly mostly because of enrichments
491 for modular PKS, NRPS and hybrid pathways (Figure S5).
492
493 Figure 05 – Gills genome bins Biosynthetic Gene Clusters (BGCs). A) Secondary
494 metabolomes. BGCs/contigs detected in Teredinibacter turnerae (T7901 and gills.bin.1),
495 Teredinibacter sp.(gills.bin.4) and on contigs derived from N. reynei gill assembled dataset.
496 Clusters are grouped, and color coded according to the compound class. B) Novel putative trans-
497 AT PKS biosynthetic gene clusters detected on Teredinibacter sp. gills.bin.4 genome and the main
498 predicted catalytic domains of their open reading frames (ORFs). Biosynthetic, transport-related,
499 regulatory, other -branching-related genes are color coded in dark-blue, light-blue, green, gray
500 and orange respectively. Domains keys: thioesterase (TE), aminotransferase (AMT), adenylation
501 (A), ketosynthase (KS), ketoreductase (KR), dehydrogenase (DH), thiolation (T), enoyl-CoA
502 dehydratases (ECH), 2-Nitropropane dioxygenase (2NPD), polyketide synthase cyclase (PKS
23 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
503 CY), 4'-phosphopantetheinyl transferase (4PPT). Putative inter-proteins bimodules are present in
504 yellow, PKS-encoded ECH catalytic domains are present in orange.
505 Eighteen of the putative BGCs from gills.bin.1 and additional fourth contig undetected by
506 the antiSMASH server could be mapped to all 13 BGCs from T. turnerae T7901 secondary
507 metabolome with high pairwise identity (~99%) and BGCs coverage (~88%) (Table S4). BLASTp
508 inspection showed that the type 1 PKS route from gills.bin.1 genome bin lacking in T7901 genome
509 is actually conserved in other T. turnerae genomes, as the ones from strains T8412, T8413 and
510 T8415 (E value = 0 ,100% query coverage, ~99% sequence identity) (Table S5). Therefore, such
511 results corroborate the grouping of gills.1 genome bin as a T. turnerae strain.
512 Conversely, only a putative bacteriocin BGC encoded in gills.bin.4 could be mapped to the
513 T. turnerae T7901 secondary metabolome (Suppl. Table 4). BLASTp analysis showed that eight
514 others putative BGCs from this Teredinibacter sp. genome bin were also conserved (~97% gene
515 coverage and ~73% identity) in other Teredinibacter genomes (Table S5). The 5 remaining contigs
516 from gills.bin.4 are novel among the secondary metabolomes of shipworms symbionts.
517 The core biosynthetic proteins encoded on these BGCs presented best BLASTp hits to
518 putative proteins from Bacteroidetes (~ 40%), gammaproteobacteria (33%), and
519 betaproteobacteria (~15%), besides alphaproteobacteria and Firmicutes (Table S6). Three of these
520 hypothetical novel BGCs (NODE_76, NODE_26 and NODE_85) code for typical multi-modular
521 trans-AT PKS enzymes, with a catalytic domain inventory including two inter-proteins bimodules,
522 commonly involved in formation of ,-olefinic moieties, and a gene cassette typically involved
523 in the formation of -branching in bioactive compounds such as the highly cytotoxic pederins and
524 the antibiotic bacillaene [50] (Figure 5B).
525
24 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
526 The Teredinibacter resistome singularity is driven by putative resistance
527 genes to PKs-NRPs derived antibiotics
528
529 The Resistome is defined as the collection of all antibiotic resistance genes present in a
530 microorganism’s genome, including precursor genes, which encode proteins retaining weak
531 antibiotic resistance or binding activity, and that under selective pressure can evolve into a new
532 resistance marker [51]. Here we also investigated the resistome coded in Teredinibacter genomes,
533 including gills.bin.1 and gills.bin.4 genome bins, and other representative genomes of the
534 Cellvibrionaceae family. The vast majority of resistance signatures obtained with the Resistance
535 Gene Identifier (RGI) tool from the Comprehensive Antibiotics Resistance Database (CARD) [52]
536 were under the lower stringency cutoffs, which includes putative resistance genes precursors
537 (Table S7). Multivariate comparisons showed that Teredinibacter resistomes grouped tightly and
538 are particularly driven by putative resistance markers to fluoroquinolone (total of 2110 hits) and
539 to antibiotics derived from PKS, NRPS and hybrid compounds, including macrolides (2071),
540 monobactam (2071), tetracycline (1963), penam (1343), peptides (816) and glycopeptide (739)
541 antibiotics (Figure 6A). The relative abundance of genes encoding resistance related proteins was
542 not significantly different (adjusted p-value of 0.28) across the genera inside the Cellvibrionaceae
543 family, suggesting that the resistome size is conserved (Fig 6B, yellow bars). Indeed, resistomes
544 of both Teredinibacter (13) and Cellvibrio (6) representative genomes were of similar size
545 (representing respectively ~5.3 and ~5.6% of the protein coding genes) (Table S7), despite the
546 intracellularly symbiotic life-style of the former and, mostly free-living nature of the latter. In
547 contrast, BGCs and, more specifically, Nrps-PKS-hybrid metabolic paths relative abundances
548 differed significantly between host-associated and free-living Cellvibrionaceae (adjusted p-values
25 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
549 <0.001 and <0.0001, respectively), being larger in the first group (Figure 6B, and 6C). Clearly this
550 pattern is driven by symbiotic Teredinibacter and, mostly free-living Cellvibrio representative
551 genomes (Figure 6B).
552
553 Figure 06 –Teredinibacter resistome. A) resistome PCA using the fifteen most important genes
554 based on mean decrease accuracy of Random Forest analysis. Teredinibacter forms a major group
555 influenced by polyamine, isoniazid, and PK-NRP-derived antibiotics. B) Relative abundance of
556 resistance gene markers, BGCs, and NRPS, PKS and hybrid BGCs. C) Host associated bacteria
557 present a significantly larger metabolome and abundance of NRPS, PKS and their hybrid
558 metabolic paths than free living bacteria. One asterisk represents Kruskal-Wallis test p-values
559 <0.001 and two represent <0.0001 both adjusted with Bonferroni method.
560 Discussion
561
562 In the present work, metagenomics was applied to characterize, compare and investigate
563 the microbiomes associated with the digestive glands, intestines and gills of the mangrove
564 shipworm N. reynei. Taxonomic annotations showed that, as seen for other shipworm systems
565 [3,4,11], the gills of N. reynei are a unique symbiotic site, filled with cellulolytic -proteobacteria,
566 including bacterial types of the Teredinibacter clade (Figure 1, Figure S1 and S3).
567 In contrast, digestive gland and intestine samples differ from the gill community and are more
568 diverse (Figure 1, Figure S1 and S3). Such results corroborate previous culture-independent
569 analysis of five shipworms species microbiomes, where digestive tract tissues were reported as
570 virtually sterile (cecum) or containing very low loads of bacteria (intestine) [15].
26 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
571 Therefore, our results indicate that digestive tract microbiomes do not contribute significantly to
572 the digestion of wood particles and consequently the nutrition of the shipworm host. Remarkably,
573 shipworms that are basal in the Teredininae subfamily (including N. reynei, Dicyathifer manni and
574 Bactronophorus thoracites) have the anus opening into the anal canal structure [1,2], which
575 morphology indicates a role in animal’s food absorption [18], but microbial constitution is yet to
576 be explored.
577 N. reynei gill samples are enriched for functions under categories of cellular process and
578 signaling (e.g., Cell motility; Signal transduction mechanisms), information storage and processing
579 (e.g., RNA processing and modification, Transcription) and metabolism (e.g., Inorganic ion
580 transport and metabolism, Iron acquisition and metabolism, Nitrogen metabolism, and Secondary
581 metabolites biosynthesis, transport, and catabolism) (Figure 1D, Figure S2).
582 Cross-assemblage based comparative metagenomics showed that such enrichments were
583 related to gammaproteobacterial-derived genes which drive gill samples to group together
584 regardless of the shipworm species of origin (Figure 2B). In fact, gill-enriched functionalities are
585 consistent with the genetic repertoire described for the prototypical shipworm symbiont T.
586 turnerae, a flagellar gammaproteobacterial species able to fix dinitrogen, sequestering iron and
587 producing secondary metabolites in symbio [9,10,12].
588 Specific mining of the contigs assembled from each tissue dataset showed that the N. reynei
589 gill microbiome is a hot spot of genes for: i) CAZymes, and particularly, wood-degrading
590 hydrolases (Figure 4), and ii) biosynthetic pathways for a variety of bioactive compounds classes
591 (Figure 5). On the other hand, such a genetic repertoire was shown to be negligible in digestive
592 gland and intestine datasets. However, is important to note that the digestive tract microbiomes
593 have proven to be more diverse than the gill symbiotic gammaproteobacterial community, and so,
27 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
594 our sequence strategy loses resolution and may not reach the necessary sequencing depth required
595 to fully cover these microbes’ functionalities.
596 Two high-quality representative symbiotic genomes could be recovered from the gill
597 datasets by our binning strategies (Table 1), and were assigned as T. turnerae and Teredinibacter
598 sp. by CheckM tool sets of single-copy genes that are signatures within a phylogenetic lineage
599 [34]. A phylogeny with several closely related reference taxa confirmed these taxonomical
600 affiliations grouping gills.bin.1 and gills.bin.4 genome bins respectively inside and as a sister
601 group of the T. turnerae species clade (Figure 3). The tree topology reliably reproduced the one
602 originally obtained by Spring and co-authors (2015), which supported the proposal of new
603 gammaproteobacterial families, including the Cellvibrionaceae family embracing Teredinibacter
604 and Ca. Endobugula genera of marine invertebrate symbionts [42].
605 Detection of T. turnerae and Teredinibacter sp. as prevalent bacterial types within N.
606 reynei gill microbiome is in agreement with previous culture efforts made with this species
607 specimens from the very same sampling site, and that resulted in isolation of several T. turnerae
608 strains [19]. These results also agree with previous culture-independent approaches applied on L.
609 pedicellatus and B. setacea shipworm species, showing that gill gammaproteobacterial
610 communities were composed of at least four distinct bacterial types, all grouping with T. turnerae
611 in a main clade of shipworm symbionts [3,4,14].
612 We also reproduced a systematic phylogenetic analysis previously performed with T.
613 turnerae isolates from a wide variety of shipworms species and using protein coding gene markers
614 [8] (Figure S4). In this work, Altamia and collaborators showed that T. turnerae population
615 actually encompasses two distinct lineages that, although they experienced horizontal gene transfer
616 events, can be consistently grouped in two main clades: clade I and clade II. Our phylogeny
28 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
617 reconstitutions using rpoB and gyrB markers replicate this topology, also agreeing with the
618 genome-wide phylogeny. In both analyses, gills.bin.1 genome bin fell within T. turnerae “clade
619 I”, just as the N. reynei isolated T. turnerae strain T8508 [8,19] (Figure 3, Figure S4).
620 Specific mining revealed that N. reynei’s symbiotic Teredinibacter genome bins contain a
621 repertoire of carbohydrate-active enzymes (CAZymes) that is closely related to the “wood-
622 specialized” profile described for T. turnerae T7901 [9], and is characterized by a vast repertoire
623 of cellulases/xylanases and other catalytic and binding domains involved in a network for breaking
624 down woody complex polysaccharides (Figure 4). Of particular interest was the detection of multi-
625 catalytic CAZymes with novel domain configurations in gills.bin.4 and unassembled
626 gammaproteobacterial contigs (Figure 4B). Also, within the cabal of CAZymes characterized was
627 an enrichment of catalytic domains that, in the B. setacea, were proven to travel from the gill
628 symbiotic site up to the animal cecum, where the wood digestion takes place [11]. Considering
629 that reduced gills, and well-developed cecum and anal canal organs, both associated with the
630 digestion of wood particles, support the hypothesis that N. reynei nutritional depends heavily on
631 wood [17], a symbiotic role on shipping key hydrolases from the gills site to the cecum
632 “biodigester” is quite reasonable and very exciting. Therefore, additional efforts focusing on
633 isolation and discovery of new symbiotic bacterial types from N. reynei gills is worthwhile and
634 can lead to the discovery of new and biotechnologically relevant wood-digesting CAZymes.
635 N. reynei gill’s symbiotic microbiome was also shown to be a fruitful source of BGCs.
636 Particularly, the T. turnerae representative genome bin (gills.bin.1) was shown to contain all the
637 BGCs previously described for T. turnerae strain T7901 [9], including the characterized routes for
638 production of the turnerbactin triscatecholate siderophore [13] and for the tartrolon family of
639 antibiotics [12]. Indeed, these results corroborate the antimicrobial activity previously reported for
29 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
640 T. turnerae strain CS30, isolated from N. reynei [19], and the results of a PCR survey for the
641 tartrolon producing trans-AT PKSs (trtDEF) that returned expected amplicons when CS30 and
642 other T. turnerae isolates genomic DNA were used as template [12]. Additionally, the in symbio
643 expression of the trt pathway was also shown by reverse transcriptase PCR with N. reynei gill bulk
644 RNA (unpublished data). However, the capacity of N. reynei symbiotic T. turnerae to produce
645 tartrolons remains to be clarified, since compounds of this family could not be detected on a survey
646 including CS30 strain’s culture chemical extracts [12]. Such clarification is of great interest, since
647 it was suggested that tartrolon antibiotics could play a role on shipworm host chemical defense,
648 and/or gill symbiotic community structuring. Finally, growths of CS30 under iron starvation leads
649 to secretion of siderophores, as revealed by the universal chrome azurol S (CAS) assay
650 (unpublished data). Such results support also the production of turnerbactin by this T. turnerae
651 strain.
652 Further, the genome bin gills.bin.4 encoded five contigs representing novel clusters among
653 shipworm microbiomes secondary metabolomes (Figure 4, Table S5). BLASTp searches showed
654 that these clusters putative core biosynthetic proteins present higher similarities to proteins from
655 putative BGCs from Bacteroidetes (as Flavobacteria), betaproteobacteria (as Nesseriales) and
656 firmicutes (as Bacillus) (Table S5 and data not shown). Even knowing that blast searches are biased
657 by the database, and that the modular enzymology commonly involved on biosynthesis of
658 secondary metabolites, such as polyketides, contain many conserved catalytic domains and so, are
659 not good taxonomic markers, these results may be taken as a suggestion of horizontal gene transfer
660 (HGT) events that led to the acquisition of such pathways by the Teredinibacter symbiotic
661 community.
30 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
662 Independently of the acquisition history, the discovery of these novel BGCs reinforces the
663 potential of N. reynei’s symbiotic system for biotechnological exploration aiming for discovery of
664 new drug leads. In fact, our comparative analyses with representative genomes composing the
665 Cellvibrionaceae family showed that shipworm gammaproteobacterial symbionts, and
666 Teredinibacter in particular, share large secondary metabolomes, enriched for PKS-NRPS routes
667 (Figure 6B, Table S7). Such a result reinforces the hypothesis that production of bioactive
668 metabolites, as polyketides and complex peptides or nonribosomally derived siderophores increase
669 Teredinibacter fitness in symbio. Furthermore, considering that each shipworm species has a
670 singular morpho-physiology and each symbiotic gammaproteobacterial community has a unique
671 diversity, specific BGCs might play specific roles within each host-bacteria interaction.
672 Consequently, the trans-AT PSK enzymology detected in the Teredinibacter sp. genome
673 bin is of great interest, since its includes a cassette containing catalytic activities for formation of
674 -branching chemical structures, which is also present in important drug leads as bryostatins -
675 potent protein kinases modulators, the pederin family of antitumor compounds and myxovirescin
676 antibiotics [50]. Interestingly, macrolide lactones of the bryostatin family, with promising
677 applications for treatment of Alzheimer's disease, had their biosynthesis assigned to a trans-AT
678 PKSs cluster (bry) from Ca. Endobugula sertula, an as yet not cultivated symbiont of the bryozoan
679 Bugula neritina [53], which also groups within the proposed Cellvibrionaceae family [42].
680 Therefore, efforts towards a better characterization of such putative novel trans-AT-PKS systems
681 are encouraged and can go from attempts to isolate strains of N. reynei’s symbiotic Teredinibacter
682 sp. carrying these clusters, genome sequencing and retrobiosynthetic compound(s) chemical
683 structure prediction, passing through PCR/probe-hybridization screening of such pathways from
684 N. reynei gill metagenomic DNA clone libraries, up to attempts to heterologously express this
31 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
685 pathway in a domesticated bacterium or engineered T. turnerae strain. Certainly, other interesting
686 BGCs could be retrieved from the 119 putative BGCs or gene clusters fragments recovered from
687 N. reynei gill dataset.
688 The detailed characterization of a BGC is a complex task and usually is made
689 retrobiosynthetically, based on the chemical structure of the cluster-associated purified compound,
690 therefore, beyond the scope of this work. However, the obtained results reaffirm shipworm-
691 gammaproteobacterial symbiotic interactions as a prolific source for detection of interesting
692 bioactive compounds, including antibiotics. Indeed, our group and collaborators are working on a
693 deeper and full characterization of the BGC catalog of shipworm -proteobacteria symbiotic
694 communities, including a pool of DNA sequence from isolates genome DNA and gill metagenome
695 samples from shipworms from several species and geographical locations.
696 The representative Teredinibacter genomes also encoded more putative resistance markers
697 for PKS and NRPS derived antibiotics (Figure 6A). The matching of this hypothetical resistome
698 with the profile of Teredinibacter BGCs indicate that at least some of these putative markers might
699 truly be resistance to self-produced antibiotics, and so, further characterizations of these genes are
700 of great interest.
701
702 Conclusion
703
704 In this study we applied metagenomics to characterize the microbial communities
705 associated with key organs of shipworm digestive systems, using the mangrove-specialized species
706 N. reynei as a research subject. We provided strong evidences that the N. reynei gill symbiont
32 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
707 gammaproteobacterial community is a hot spot for biotechnologically relevant enzymes, such as
708 hydrolases involved on breaking down wood-derived complex polysaccharides, and biosynthetic
709 gene clusters for production of potentially bioactive secondary metabolites such as complex
710 polyketides, peptides and lipopeptides. Further, the discrete bacterial communities forming the
711 microbiome of the digestive tract tissues seems to have little or no involvement in host nutritional
712 support or chemical defense. Digestive gland and intestine datasets presented a simple repertoire
713 of CAZymes, lacking glycosyl hydrolases for digestion of woody material, and only one
714 fragmented BGC, from intestine samples, with hits to a locus of the Yesso scallop bivalve mollusk
715 genome. Two representative symbiotic genomes were recovered from gill symbiotic community
716 and taxonomically assigned as a T. turnerae strain and a yet to be cultivated Teredinibacter sp.
717 bacterium. The latter encoded novel multi-catalytic CAZymes, and a trans-AT PKS BGC
718 containing the catalytic domains implicated in β-branching in related polyketides. Finally, we
719 show strong evidence that Teredinibacter is enriched in secondary metabolite pathways when
720 compared to relatives of the Cellvibrionaceae family particularly by an excess of BGCs for
721 complex polyketides and nonribosomal peptides, reinforcing the possible role of these compounds
722 on supporting the symbiosis.
723 Acknowledgments
724
725 Metagenomic sequencing has been performed at the Genomics and Bioinformatics facility
726 (CeGenBio) of the Drug Research and Development Center (NPDM) of the Federal University of
727 Ceara. Sampling and Genetic Resources Assessment were performed under license
728 IBAMA/MMA/SISBIO numbers 48388-1 and 48388-2. Authors thank the laboratories of
729 microbiology and of microscopy from the National Institute of Metrology, Quality and Technology
33 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
730 (http://www.inmetro.gov.br) at Xerém, RJ, Brazil for the support on preliminary samples
731 processing. We also thank professor Dr. Margo Haygood for the critical reading and revision of
732 this manuscript.
733 References
734
735 1. Distel DL, Amin M, Burgoyne A, Linton E, Mamangkey G, Morrill W, et al. Molecular
736 phylogeny of Pholadoidea Lamarck, 1809 supports a single origin for xylotrophy (wood
737 feeding) and xylotrophic bacterial endosymbiosis in Bivalvia. Mol Phylogenet Evol.
738 2011;61: 245–254. doi:10.1016/j.ympev.2011.05.019 S1055-7903(11)00261-2
739 2. Turner RD. A survey and illustrated catalogue of the Teredinidae (Mollusca: Bivalvia).
740 Museum of Comparative Zoology. Cambridge: Harvard University Press; 1966.
741 3. Distel DL, DeLong EF, Waterbury JB. Phylogenetic characterization and in situ localization
742 of the bacterial symbiont of shipworms (Teredinidae: Bivalvia) by using 16S rRNA
743 sequence analysis and oligodeoxynucleotide probe hybridization. Appl Env Microbiol.
744 1991;57: 2376–2382.
745 4. Distel DL, Beaudoin DJ, Morrill W. Coexistence of Multiple Proteobacterial
746 Endosymbionts in the Gills of the Wood-Boring Bivalve Lyrodus pedicellatus ( Bivalvia :
747 Teredinidae ). Society. 2002;68: 6292–6299. doi:10.1128/AEM.68.12.6292
748 5. Greene R V, Freer SN. Growth Characteristics of a Novel Nitrogen-Fixing Cellulolytic
749 Bacterium. Science. 1986;52: 982–986.
750 6. Waterbury JB, Calloway CB, Turner RD. A Cellulolytic Nitrogen-Fixing Bacterium
751 Cultured from the Gland of Deshayes in Shipworms (Bivalvia: Teredinidae). Science.
752 1983;221: 1401–1403. doi:10.1126/science.221.4618.1401
34 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
753 7. Distel DL, Morrill W, MacLaren-Toussaint N, Franks D, Waterbury J. Teredinibacter
754 turnerae gen. nov., sp. nov., a dinitrogen-fixing, cellulolytic, endosymbiotic gamma-
755 proteobacterium isolated from the gills of wood-boring molluscs (Bivalvia: Teredinidae).
756 Int J Syst Evol Microbiol. 2002;52: 2261–2269.
757 8. Altamia MA, Wood N, Fung JM, Dedrick S, W E. Genetic differentiation among isolates
758 of Teredinibacter turnerae , a widely occurring intracellular endosymbiont of shipworms.
759 Molecular Ecology. 2014; 1418–1432. doi:10.1111/mec.12667
760 9. Yang JC, Madupu R, Durkin a S, Ekborg N a, Pedamallu CS, Hostetler JB, et al. The
761 complete genome of Teredinibacter turnerae T7901: an intracellular endosymbiont of
762 marine wood-boring bivalves (shipworms). PLoS One. 2009;4: e6085.
763 doi:10.1371/journal.pone.0006085
764 10. Lechene CP, Luyten Y, McMahon G, Distel DL. Quantitative imaging of nitrogen fixation
765 by individual bacteria within animal cells. Science. 2007;317: 1563–1566.
766 doi:10.1126/science.1145557
767 11. O’Connor RM, Fung JM, Sharp KH, Benner JS, McClung C, Cushing S, et al. Gill bacteria
768 enable a novel digestive strategy in a wood-feeding mollusk. Proc Natl Acad Sci U S A.
769 2014;111: E5096-104. doi:10.1073/pnas.1413110111
770 12. Elshahawi SI, Trindade-Silva AE, Hanora A, Han AW, Flores MS, Vizzoni V, et al.
771 Boronated tartrolon antibiotic produced by symbiotic cellulose-degrading bacteria in
772 shipworm gills. Proc Natl Acad Sci U S A. 2013;110: E295-304.
773 doi:10.1073/pnas.1213892110
774 13. Han AW, Sandy M, Fishman B, Trindade-Silva AE, Soares C a G, Distel DL, et al.
775 Turnerbactin, a novel triscatecholate siderophore from the shipworm endosymbiont
35 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
776 Teredinibacter turnerae T7901. PLoS One. 2013;8: e76151.
777 doi:10.1371/journal.pone.0076151
778 14. Schmidt EW. Trading molecules and tracking targets in symbiotic interactions. Nature
779 Chemical Biology. 2008;4: 466–473. doi:10.1038/nchembio.101
780 15. Betcher MA, Fung JM, Han AW, Connor RO, Seronay R, Concepcion GP, et al. Microbial
781 Distribution and Abundance in the Digestive System of Five Shipworm Species ( Bivalvia :
782 Teredinidae ). PLoS One . 2012;7: 3–10. doi:10.1371/journal.pone.0045309
783 16. Distel DL, Altamia MA, Lin Z, Shipway JR, Han A, Forteza I, et al. Discovery of
784 chemoautotrophic symbiosis in the giant shipworm Kuphus polythalamia (Bivalvia:
785 Teredinidae) extends wooden-steps theory. Proc Natl Acad Sci U S A. 2017;114: E3652–
786 E3658. doi:10.1073/pnas.1620470114
787 17. Moraes DT de, Lopes SGB. The functional morphology of Neoteredo reynei (Bartsch,
788 1920) (Bivalvia, Teredinidae). J Molluscan Stud. 2003;69: 311–318.
789 18. Lopes SGBC, Domaneschi O, De Moraes DT, Morita M, Meserani GDLC. Functional
790 anatomy of the digestive system of Neoteredo reynei (Bartsch, 1920) and Psiloteredo healdi
791 (Bartsch, 1931) (Bivalvia: Teredinidae). Geol Soc London, Spec Publ. 2000;177: 257–271.
792 19. Trindade-Silva AE, Machado-Ferreira E, Senra MVX, Vizzoni VF, Yparraguirre L a,
793 Leoncini O, et al. Physiological traits of the symbiotic bacterium Teredinibacter turnerae
794 isolated from the mangrove shipworm Neoteredo reynei. Genet Mol Biol. 2009;32: 572–
795 81. doi:10.1590/S1415-47572009005000061
796 20. Garcia GD, Gregoracci GB, de OSE, Meirelles PM, Silva GG, Edwards R, et al.
797 Metagenomic Analysis of Healthy and White Plague-Affected Mussismilia braziliensis
798 Corals. Microb Ecol. 2013;65: 1076–1086. doi:10.1007/s00248-012-0161-4
36 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
799 21. F Meyer M D’Souza, R Olson, EM Glass, M Kubal, DP, T Paczian R Stevens, A Wilke, J
800 Wilkening and AR, Edwards RA. The metagenomics RAST server – a public resource for
801 the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics.
802 2008;9: 386. doi:10.1186/1471-2105-9-386
803 22. Parks DH, Beiko RG. Identifying biologically relevant differences between metagenomic
804 communities. Bioinformatics. 2010;26: 715–721. doi:10.1093/bioinformatics/btq041
805 23. Dinsdale EA, Edwards RA, Bailey BA, Tuba I, Akhter S, McNair K, et al. Multivariate
806 analysis of functional metagenomes. Front Genet. 2013;4: 41.
807 doi:10.3389/fgene.2013.00041
808 24. Suzuki R, Shimodaira H. Pvclust: an R package for assessing the uncertainty in hierarchical
809 clustering. Bioinformatics. 2006;22: 1540–1542. doi:10.1093/bioinformatics/btl117
810 25. Dutilh BE, Schmieder R, Nulton J, Felts B, Salamon P, Edwards RA, et al. Reference-
811 independent comparative metagenomics using cross-assembly: crAss. Bioinformatics.
812 2012;28: 3225–3231. doi:10.1093/bioinformatics/bts613
813 26. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. MetaSPAdes: A new versatile
814 metagenomic assembler. Genome Res. 2017;27: 824–834. doi:10.1101/gr.213959.116
815 27. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods.
816 2012;9: 357–359. doi:10.1038/nmeth.1923
817 28. Cambuy DD, Coutinho FH, Dutilh BE. Contig annotation tool CAT robustly classifies
818 assembled metagenomic contigs and long sequences. bioRxiv. 2016. doi:10.1101/072868
819 29. Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic
820 gene recognition and translation initiation site identification. BMC Bioinformatics.
821 2010;11: 119. doi:10.1186/1471-2105-11-119
37 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
822 30. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat
823 Methods. Nature Publishing Group; 2015;12: 59–60. doi:10.1038/nmeth.3176
824 31. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately
825 reconstructing single genomes from complex microbial communities. PeerJ. 2015;3: e1165.
826 doi:10.7717/peerj.1165
827 32. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
828 2013; Available: http://arxiv.org/abs/1303.3997
829 33. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence
830 Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079.
831 doi:10.1093/bioinformatics/btp352
832 34. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the
833 quality of microbial genomes recovered from isolates, single cells, and metagenomes.
834 Genome Res. 2015;25: 1043–1055. doi:10.1101/gr.186072.114
835 35. Aziz RK, Bartels D, Best AA, Dejongh M, Disz T, Edwards RA, et al. The RAST Server :
836 Rapid Annotations using Subsystems Technology. BMC Genomics. 2008;15: 1–15.
837 doi:10.1186/1471-2164-9-75
838 36. Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: a web resource for automated
839 carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012;40: W445-51.
840 doi:10.1093/nar/gks479
841 37. Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, et al. antiSMASH 3.0-a
842 comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids
843 Res. 2015;43: W237-43. doi:10.1093/nar/gkv437
844 38. Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, et al. CARD 2017:
38 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
845 Expansion and model-centric curation of the comprehensive antibiotic resistance database.
846 Nucleic Acids Res. 2017;45: D566–D573. doi:10.1093/nar/gkw1004
847 39. Breiman L. Random Forrest. Mach Learn. 2001; 5–32. doi:10.1023/A:1010933404324
848 40. Liaw A, Wiener M. Package ‘ randomForest ’. Breiman and Cutler’s random forests for
849 classification and regression. CRAN Reference manual. 2015.
850 41. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. PATRIC, the
851 bacterial bioinformatics database and analysis resource. Nucleic Acids Res. Oxford
852 University Press; 2014;42: D581-91. doi:10.1093/nar/gkt1099
853 42. Spring S, Scheuner C, Göker M, Klenk HP. A taxonomic framework for emerging groups
854 of ecologically important marine gammaproteobacteria based on the reconstruction of
855 evolutionary relationships using genome-scale data. Front Microbiol. 2015;6: 281.
856 doi:10.3389/fmicb.2015.00281
857 43. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation
858 of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol.
859 2014;7: 539–539. doi:10.1038/msb.2011.75
860 44. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large
861 phylogenies. Bioinformatics. 2014;30: 1312–1313. doi:10.1093/bioinformatics/btu033
862 45. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and
863 annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44: W242–W245.
864 doi:10.1093/nar/gkw290
865 46. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary
866 Genetics Analysis version 6.0. Mol Biol Evol. 2013;30: 2725–2729.
867 doi:10.1093/molbev/mst197
39 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
868 47. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-
869 active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42: D490-5.
870 doi:10.1093/nar/gkt1178
871 48. Ekborg NA, Morrill W, Burgoyne AM, Li L, Distel DL. CelAB, a multifunctional cellulase
872 encoded by Teredinibacter turnerae T7902T, a culturable symbiont isolated from the wood-
873 boring marine bivalve Lyrodus pedicellatus. Appl Env Microbiol. 2007;73: 7785–7788.
874 doi:10.1128/AEM.00876-07
875 49. Wang S, Zhang J, Jiao W, Li J, Xun X, Sun Y, et al. Scallop genome provides insights into
876 evolution of bilaterian karyotype and development. Nat Ecol Evol. 2017;1: 120.
877 doi:10.1038/s41559-017-0120
878 50. Piel J. Biosynthesis of polyketides by trans-AT polyketide synthases. Nat Prod Rep. 2010;
879 1–77. doi:10.1039/b816430b
880 51. Wright GD. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev
881 Microbiol. 2007;5: 175–186. doi:10.1038/nrmicro1614
882 52. McArthur AG, Waglechner N, Nizam F, Yan A, Azad MA, Baylay AJ, et al. The
883 comprehensive antibiotic resistance database. Antimicrob Agents Chemother. 2013;57:
884 3348–3357. doi:10.1128/AAC.00419-13
885 53. Trindade-Silva AE, Lim-Fong GE, Sharp KH, Haygood MG. Bryostatins: biological
886 context and biotechnological prospects. Curr Opin Biotechnol. 2010;21: 834–42.
887 doi:10.1016/j.copbio.2010.09.018
888
889 Supporting Information 890
40 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
891 Figure S1 – Metagenomes taxonomic diversity. A) Relative abundance of metagenomes
892 taxonomical signatures under Domain hierarchical level (RefSeq database). B) Relative
893 abundances of metagenomes bacterial genera when considering 16S rRNA reads (RDP database).
894 C) Box-plot of metagenomes Shannon–Weaver index according to the tissue source. Digestive
895 glands (light-blue), and Intestine (green) samples present higher Shannon diversity number,
896 evenness and entropy when compared with Gills (dark-blue) samples. Annotations were performed
897 under MG-RAST server under default stringency parameters (e-value 1e-5, % of identity = 60%,
898 minimal length of 15 and minimal abundance of 1, considering the representative hit).
899
900 Figure S2 – Two groups comparisons between gills and digestive-tract (digestive glands +
901 intestine) metagenomes annotations. A) Extended error bar plot showing bacterial classes
902 enriched at gills (blue) or digestive tract (orange) metagenomic groups. B) and C) Extended error
903 bar plot showing gills (blue) or digestive tract (orange) enriched bacterial functions when using
904 Cluster of Orthologous (COG) database (B); or Subsystem Technology database (C), at their
905 respective hierarchical level 2. Two groups comparisons were performed at the STAMP software
906 version v2.1.3, as explained at Methods.
907
908 Figure S3 – Contigs based taxonomical annotation of N. reynei gills, digestive glands and
909 intestine datasets. A) Pizza chart showing relative abundances of taxonomical hits per
910 superkingdom. B) Distribution of hits to the most abundant taxon under the phyla taxonomical
911 hierarchy. Inset on gills bar chart brings the distribution of proteobacterial-derived hits (orange
912 bar) as a pizza chart to show the prevalence (98,2%) of hits to the -proteobacteria class.
913
41 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
914 Figure S4 – Neighbor-joining phylogenies inferred for the genome bins protein coding gene
915 markers gyrB (A) and rpoB (B) against a referential dataset of genes from T. turnerae isolates.
916 Bootstrap probability values > 0.7 are shown at the nodes.
917
918 Figure S5 –Secondary metabolome PCA of Cellvibrionaceae representatives. N. reynei binned
919 gills-symbiotic genomes gills.bin.1 and gills.bin.4 secondary metabolome grouped closely to a
920 major group formed by Teredinibacter and influenced by BGCs for polyketide, non-ribosomal
921 peptide, and hybrid compounds BGCs, besides putative routes for bacteriocins and terpene-
922 arylpolyene.
923
924 Table S1 – Metagenomes overall features. * Numbers (1,2,3): N. reynei specimens. Letters (A,
925 B, C): metagenomic samples. 1 - QC – MG-RAST 4.0 applied quality control of the metagenomic
926 paired end reads. 2 - Values obtained after metadata quality control pipeline filtering.
927 Abbreviations: Int – Intestine; DG: Digestive Glands.
928
929 Table S2 – CAZy domains detected on the binned genomes.
930
931 Table S3 – BLASTp analysis of T. turnerae T7901 multi-catalytic CAZymes using N. reynei
932 symbiotic binned genome as database. * T7901 CAZymes for which no highly conserved
933 homolog could be detected on one or both genome bins. Stringency cutoff for highly conserved
934 homologs are E value = 0.0, identity > 75% over the entire query protein.
935
42 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
936 Table S4 – Genome bins contigs mapped to Teredinibacter turnerae T7901 putative
937 Biosynthetic Gene Cluster. * Characterized biosynthetic gene clusters for tartrolon antibiotics
938 and turnerbactin siderophore production. Contigs highlighted in bold were detected by antismash
939 server.
940
941 Table S5: BLASTp analysis showing gills.bin.1 and gills.bin.4 BGCs conserved on T. turnerae
942 species.
943
944 Table S6: BLASTp analysis of the genome bin gills.bin.4 unique BGCs core biosynthetic
945 genes.
946
947 Table S7: Cellvibrionaceae representative genomes information and resistome and BGCs
948 annotations. * Coding sequence automated annotated by RAST server. † Resistance genes
949 annotated according the CARD server and considering the default stringency cutoffs for
950 perfect/strict and loosely annotations. ‡ Biosynthetic Gene Clusters (BGCs) annotated according
951 to the antiSMASH server, under the bacterial version and using default settings.
43 bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/357616; this version posted June 27, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.