1 Supplementary Information (SI) Appendix
2
3 Nitrogen conservation, conserved: 46 million years of N-recycling by
4 the core symbionts of turtle ants
5
6 Yi Hu, Jon G. Sanders, Piotr Łukasik, Catherine L. D'Amelio, John S. Millar, David
7 R. Vann, Yemin Lan, Justin A. Newton, Mark Schotanus, John T. Wertz, Daniel J. C.
8 Kronauer, Naomi E. Pierce, Corrie S. Moreau, Philipp Engel, Jacob A. Russell
9
10 Table of Contents
11 Supplementary Figure legends
12 Supplementary Table legends
13 Supplementary Materials and Methods
14 Assessing N-fixation
15 Feeding experiments with 15N-labeled urea and 13C/15N-labeled glutamate
16 qPCR and amplicon 16S rRNA sequencing to estimate antibiotic efficacy
17 Amino acid analysis from ant hemolymph by gas-chromatography-mass
18 spectrometry (GC-MS)
19 DNA preparation for C. varians metagenomics, non-C. varians ants for
20 metagenomics and cultured bacteria
21 Genome and metagenome sequencing, assembly and annotation
22 Genome binning using Anvi’o in conjunction with the CONCOCT
23 Visualization of taxonomic composition of metagenomes based on coverage
24 and %GC
25 Fluorescence in situ hybridization
26 Stable isotope data
27 Assays to measure urea production (via allantoin) and urea degradation (into
28 ammonia)
29 Supplementary Results
30 Colony fragment nutritional experiments—antibiotic treatments
31 Fine-scale metagenome binning from C. varians colony PL010: Why did N-
32 recycling genes appear absent from Cephaloticoccus and the predicted uric acid
33 degrading Burkholderiales with relatedness to isolate Cv33a?
34 A summary of sequenced genomes from cultured isolates
35 Our cultured isolates are highly similar to previously sampled core symbionts.
36 References
37
38
39 Supplementary Figure legends
40
41 Figure S1. Relative bacterial abundance in the ant groups under different
42 dietary treatments in the 15N labeled glutamate (A), 13C labeled glutamate (B)
43 and 15N labeled urea feeding experiment (C). The relative bacterial abundance was
44 determined by dividing bacterial 16S rRNA copy number estimates by one tenth of
45 the total bacterial 16S rRNA copy number estimate for the pooled DNA sample from ten
46 guts that was used to construct standard curves. 16S rRNA amplicon sequencing
47 was performed only for ants in 15/14N glutamate. NA=16S amplicon sequencing not
48 performed for these ants.
49
50 Figure S2. Survival of Cephalotes varians workers under different dietary
51 treatments with isotope labeling of dietary urea (A) and dietary glutamate (B)
52 with symbiont removal or maintenance. (A) Cox regression analysis for the
53 workers fed on antibiotics (green lines) shows that disruption of gut microbiota
54 significantly reduces survival (Wald statistic = 6.89, df = 1, P = 0.0087 for colony
55 PL215A; Wald statistic = 22.67, df = 1, P = 1.924e-06 for colony PL217; Wald statistic
56 = 3.67, df = 1, P = 0.0553 for colony PL231). (B) Cox regression analysis for the
57 workers fed on antibiotics (green lines) shows that disruption of gut microbiota had no
58 significant effect on survival of C. varians in this experiment (Wald statistic = 2.4, df = 1,
59 P = 0.1214 for colony PL207; Wald statistic = 0.29, df = 1, P = 0.5888 for colony PL210;
60 Wald statistic = 0, df = 1, P = 0.9882 for colony PL231).
61
62 Figure S3. Percentage of 13C-labeling of free essential amino acids (A) and non-
63 essential amino acids (B) in hemolymph of Cephalotes varians fed with 13C-
64 labeled glutamate. Asterisks indicate that 13C in amino acids from 13C-treated ants
65 (blue) was significantly higher than in ants feeding on unlabeled glutamate (red) and
66 in aposymbiotic ants feeding on 13C-labeled glutamate (green) across three
67 investigated colonies.
68
69 Figure S4. Percentage of 15N-labeling of free essential amino acids (A) and non-
70 essential amino acids (B) in hemolymph of Cephalotes varians fed with 15N-
71 labeled glutamate. Asterisks indicate that 15N in amino acids from 15N-treated ants
72 (blue) was significantly higher than in ants feeding on unlabeled glutamate (red) and
73 in aposymbiotic ants feeding on 15N-labeled glutamate (green) across three
74 investigated colonies.
75
76 Figure S5. Phylogenetic analyses of symbiont 16S rRNA genes reveal strong
77 taxonomic conservation among worker-associated gut bacteria. Phylogenies of
78 16S rRNA nucleic acid sequences based on sequences extracted from 18 Cephalotes
79 metagenomes and top BLAST hits. Rooted maximum likelihood phylogeny reveals
80 nearly all Cephalotes-associates come from Cephalotes-specific clades. N-recycling
81 bacteria identified through in vitro assays are emphasized with cyan or green lines
82 connecting their branches to their strain names. Outer circle and branch colors:
83 bacterial taxonomy. Middle circle colors: Cephalotes species groups. Inner circle: all
84 red shading of taxon names reveals sequences coming from our metagenomic
85 datasets, bright red shading of taxon names reveals cultured isolates and gray shading
86 of taxon names represents non-Cephalotine ant associated bacteria.
87
88 Figure S6. The conserved operons of genes involved in uric acid degradation and
89 urea degradation pathways across 17 Cephalotes ant species. A cladogram based
90 on reported relationships12 is shown on the left. Names and functions of genes in the
91 uric acid degradation and urea gene operons are given at the top of the figure. The
92 arrow with dashed lines represents ant host derived metabolic steps. The gene
93 structure of each operon is shown for all 18 metagenomes, with the left panel
94 indicating xanthine/uric acid degradation gene operons and the right panel indicating
95 urea degradation gene operons. Each gene operon is labeled with the corresponding
96 scaffold ID and highlighted by a box colored according to the bacterial order to which
97 it was binned. For some hosts (C. varians and C. rohweri) we present data from
98 cultured isolate genome sequencing; such findings are indicated with labeling at right,
99 while distinctions between the two metagenomes from C. varians are indicated at the
100 right as well.
101
102 Figure S7. Presence or absence of genes involved in pathways of xanthine/uric
103 acid degradation, urea degradation, ammonia assimilation and amino acid
104 synthesis for eight bacterial bins in each of the 18 Cephalotes gut metagenomes.
105 Symbionts hail from the orders Burkholderiales (A), Rhizobiales (B), Opitutales (C),
106 Pseudomonadales (D), Xanthomonadales (E), Campylobacterales (F) and
107 Flavobacteriales (G). White and blue in each heatmap respectively represent the
108 absence and presence of genes associated with the focal metabolic pathways. Gray bars
109 denote a lack of pathway information for a core bacterial bin of Cephalotes ants. They are
110 used when the total length of scaffolds assigned to a bacterial taxon in one metagenomic
111 dataset is less than 50% of the total length of the draft genome assembled for that taxon
112 from the C. varians PL010 metagenome. A cladogram based on previously published
113 relationships of 18 Cephalotes ants12 is shown to the left of each panel. Common
114 ancestry traces back roughly 46 million years.
115
116 Figure S8. Phylogenetic analyses of symbiont UreC proteins reveal patterns of
117 convergent functional evolution among worker-associated gut bacteria.
118 Phylogenies of UreC proteins based on sequences extracted from 18 Cephalotes
119 metagenomes and top BLAST hits. Rooted maximum likelihood phylogeny reveals
120 nearly all Cephalotes-associates come from Cephalotes-specific clades. Outer circle
121 and branch colors: bacterial taxonomy. A lack of shading in the outer circle, for
122 Cephalotes-derived sequences, indicates that ureC genes fell on contigs that could not
123 be assigned to bacterial phyla or any lower taxa. Middle circle colors: Cephalotes
124 species groups. Inner circle: all red shading of taxon names reveals sequences coming
125 from our metagenomic datasets, bright red shading of taxon names reveals cultured
126 isolates, gray shading of taxon names represents non-Cephalotine ant associated
127 bacteria, and green shading of taxon names reveals sequences from Bartonella apis
128 isolated from honeybees.
129
130 Figure S9. Predicted essential amino acid biosynthetic pathways in the gut
131 metagenome of Cephalotes varians. Names of genes not found in bacterial genomes
132 are in red font. Asterisks indicate that genes were identified in the ant genome. Data
133 are compiled from the metagenomes from colonies PL005 and PL010. 3PG, 3-
134 phosphoglycerate; E4P, erythrose-4-phosphate; PEP, phosphoenolpyruvate; PRPP,
135 phosphoribosyl pyrophosphate; OA, oxaloacetate; Cit, citrulline.
136 Figure S10. Distribution of scaffolds containing genes in the N-metabolic
137 pathways in taxon-annotated GC-coverage (TAGC) plots for the metagenomes
138 of Cephalotes varians. Individual scaffolds are plotted based on their GC content (x-
139 axis) and their read coverage (y-axis, logarithmic scale). Scaffolds are colored based
140 on the taxonomic order they were assigned to as described in the text. (A) and (D)
141 Scaffolds containing genes in uric acid degradation pathways were highlighted in the
142 TAGC plots of colony PL005 (top) and PL010 (bottom). (B) and (E) Scaffolds
143 containing genes in urea degradation pathways were similarly highlighted, as were
144 those containing genes involved in ammonia assimilation (C) and (F).
145
146 Figure S11. Phylogeny of uricase amino acid sequences from metagenomes of C.
147 varians. Maximum likelihood phylogeny reveals that uricase amino acid sequences in
148 our metagenomic surveys form a Cephalotes-specific clade. The tree was rooted using
149 Actinosynnema mirum as the outgroup. Clade colors represent the source from which
150 the uricase coding sequences were derived.
151
152 Figure S12. Phylogenetic analyses of symbiont PuuD and UraH proteins reveal
153 patterns of convergent functional evolution among worker-associated gut
154 bacteria. Phylogenies of PuuD proteins (A) and UraH proteins (B) based on
155 sequences extracted from 18 Cephalotes metagenomes and top BLAST hits. Rooted
156 maximum likelihood phylogeny reveals nearly all Cephalotes-associates come from
157 Cephalotes-specific clades. Outer circle and branch colors: bacterial taxonomy.
158 Middle circle colors: Cephalotes species groups. Inner circle: all red shading of taxon
159 names reveals sequences coming from our metagenomic datasets, bright red shading
160 of taxon names reveals cultured isolates and gray shading of taxon names represents
161 non-Cephalotine ant associated bacteria.
162
163 Figure S13. Pathways from the purines Guanine and Adenine to urea, via
164 xanthine/uric acid degradation. Shown with blue highlighted boxes are enzymes
165 encoded by the Burkholderiales CV33a strain, which can make urea from Guanine but
166 potentially not Adenine.
167
168 Figure S14. Alternative mechanisms for urea production in Cephalotes ants via a
169 separate, two-step pathway converting arginine to urea.
170
171 Figure S15. Alignment of isolate genome assemblies with metagenome
172 assemblies. Genome alignment against metagenome contigs shows the similarity of
173 cultured isolates to genomes present in the in vivo gut community. Each circular
174 genome visualization represents an isolate genome. The outermost ring shows GC%,
175 while the innermost shows coding density of the isolate genome. Each of the two
176 middle rings indicates alignment of scaffolds from C. varians metagenomes, from
177 samples C. varians PL010 (inner) and C. varians PL005 (outer). For each sample,
178 contigs aligning contiguously to the isolate genome reference are indicated by green
179 blocks; contigs that align successfully but that are misassembled with respect to the
180 reference isolate genome are indicated in red. Nucleotide mismatches between the
181 reference and metagenome contigs are summarized by column charts within each
182 band, with higher columns indicating more mismatches in that window.
183
184 Supplementary Table legends
185
186 Table S1. Collection information for the ant colonies utilized in this study
187
188 Table S2. Acetylene-reduction activity detected for in vivo bacterial communities
189 of C. varians and information on colonies used in this study. Nitrogenase can
190 reduce acetylene (C2H2) to ethylene (C2H4). No ethylene was detected in three ant
191 colonies investigated in this study.
192
193 Table S3. Statistical results for heavy isotopic signal in hemolymph amino acids
194 in three feeding experiments.
195
196 Table S4. Assembly statistics of metagenomic data.
197
198 Table S5. Genes from N-metabolic pathways in 18 Cephalotes ant gut microbiota
199 and their distribution in different bacterial bins. B, R, O, P, X, C, F, S and H refer
200 to Burkholderiales, Rhizobiales, Opitutales, Pseudomonadales, Xanthomonadales,
201 Campylobacterales, Flavobacteriales, Sphingobacteriales and Hymenoptera bins,
202 respectively.
203
204 Table S6. Assembly statistics of genomes and cultivation conditions of cultured
205 bacteria.
206
207 Table S7. The distribution of genes from N-metabolic pathways in the 14
208 genomes of bacteria isolated from C. varians and C. rohweri.
209
210 Table S8. Summary of scaffolds assigned to 11 bins in PL010 C. varians
211 metagenome
212
213 Table S9. Summary of strain-level binning for gut metagenomes from C. varians
214 workers in colony PL010.
215
216 Table S10. The distribution of genes from N-metabolic pathways in the 11 bins
217 generated based on the metagenome of C. varians colony PL010.
218
219 Table S11. Results of in vitro urea production assays
220
221 Table S12. Sample information and fraction of the first isotopic peak
222 abundance (M+1 abundance (fraction %)) of amino acids in the feeding
223 experiments with 15N-labeled urea and 13C/15N-labeled glutamate. The first
224 isotopic peak represents the abundance of naturally occurring amino acids containing
225 heavy isotopes.
226
227 Table S13. OTU table from C. varians gut community samples used in feeding
228 experiments with 15N-labeled glutamate. The columns correspond to samples and
229 rows correspond to OTUs. Numbers represent read abundance for each OTU within
230 each library. Also indicated are taxonomic classifications for each OTU.
231
232
233
234 Supplementary Materials and Methods
235 Assessing N-fixation
236 To measure N-fixation capacities for Cephalotes-associated microbes, we performed
237 acetylene reduction assays, in which conversion of acetylene (C2H2) to ethylene (C2H4)
238 is used as evidence for active nitrogenase enzymes 1, 2. Three colonies of C. varians
239 were collected (by CSM or YH) from mangrove trees in the Florida Keys (Table S1).
240 After excavation in the field, we immediately placed all available workers (and larvae,
241 pupae or queens, when present: Table S2) into 10 ml gas tight syringes (Vici Precision
242 Sampling Inc, Baton Rouge, LA, USA). An empty syringe was used as a control. Two
243 milliliters of air were removed from each of these four syringes and replaced with two
244 milliliters of acetylene, resulting in a final atmosphere of 20% acetylene. A 1 ml sample of the
245 gas mixture from each syringe was injected into a 3 ml Exetainer tube at 0, 1, 2, 4, 8, and
246 16 hours. Acetylene and ethylene concentrations were then quantified using a gas
247 chromatography-flame ionization detector (GC-FID, HP6890 series, Agilent
248 Technologies, Inc., E.&E.S. Analytical Instrumentation of University of Pennsylvania).
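
The 20% acetylene figure quoted above follows from simple volume arithmetic. The snippet below is only an illustrative check: it assumes each syringe held 10 ml of gas at ambient pressure and ignores the volume displaced by the ants, and the function name is hypothetical.

```python
def acetylene_fraction(syringe_volume_ml: float, replaced_volume_ml: float) -> float:
    """Fraction of the headspace that is acetylene after an equal-volume swap.

    Assumes the syringe is full of air, that `replaced_volume_ml` of air is
    withdrawn, and that the same volume of pure acetylene is added back.
    """
    if not 0 <= replaced_volume_ml <= syringe_volume_ml:
        raise ValueError("replaced volume must lie between 0 and the syringe volume")
    return replaced_volume_ml / syringe_volume_ml


# 2 ml of acetylene in a 10 ml gas-tight syringe, as described in the text.
fraction = acetylene_fraction(10.0, 2.0)
print(f"{fraction:.0%}")  # prints 20%
```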
249
250 Feeding experiments with 15N-labeled urea and 13C/15N-labeled glutamate
251 Nine colonies of C. varians collected from the Florida Keys (Table S1) were reared on
252 a holidic artificial diet 3 and 50% honey water at 25°C under a daily light:dark cycle of
253 14:10 until use in the feeding experiment. Fresh diet was provided roughly every two
254 days. All adult workers were subjected to a water-only starvation period of three days
255 prior to the start of experiments.
256
257 In the feeding experiment with 15N-labeled urea, workers from each of three colonies
258 were split into two treatment groups. In the first treatment, workers were subjected to
259 antibiotic feeding to remove their gut bacteria through rearing on 30% (weight/volume)
260 sucrose water containing 0.01% each of tetracycline, rifampicin, and kanamycin.
261 Untreated workers from the second treatment group consumed only 30% sucrose water.
262 After the three-week pre-trial period, the antibiotic-treatment groups were provided
263 with 30% sucrose water with the same antibiotic mixture, in addition to 1%
264 (weight/volume) 15N-labeled urea (Sigma-Aldrich, St Louis, MO). Untreated ants were
265 further split into subgroupings, with half being reared upon 30% sucrose water with 1%
266 (w/v) 15N-labeled urea, and the other half consuming 30% sucrose water containing 1%
267 (w/v) unlabeled (i.e. mostly 14N) urea.
268
269 The same experimental design was applied to three colonies in an experiment with 13C-
270 labeled glutamate (and unlabeled control) and to three additional colonies under a 15N-
271 labeled glutamate treatment (with an unlabeled control). Workers from each colony
272 were divided into two groups, with the first group of workers being fed a holidic
273 artificial diet3 containing 0.01% each of tetracycline, rifampicin, and kanamycin, and
274 the second group consuming a holidic artificial diet for three weeks (the pre-trial
275 period). At that point, antibiotic-treatment groups were switched to a modified
276 holidic diet (trial period) containing only non-essential amino acids, at the same total amino
277 acid concentration as the complete holidic diet, plus glutamate carrying either
278 isotope label, in addition to the same antibiotic mixture. At the same time, those from
279 the treatment without antibiotics were split into two groups, with one set consuming
280 the same diet as their antibiotic-treated counterparts, with either a standard isotope ratio
281 (treatment 2) or with glutamate containing the heavy label (treatment 3). No antibiotics
282 were added to these latter diets.
283
284 After four to five weeks of feeding during the trial period, ant hemolymph was extracted
285 from surviving workers. Hemolymph was harvested from decapitated ants using
286 borosilicate glass needles pulled from microcapillary tubes (1/0.58 OD/ID mm, World
287 Precision Instruments, Sarasota, FL) to capture droplets exuding from the posterior
288 opening of the head capsule and from the anterior opening of the mesosoma. Depending
289 on the number of ants available, there were two or three replicates for each colony and
290 treatment, each consisting of pooled hemolymph from 3-10 workers (see Table S12 for
291 details). Hemolymph was added to 10 µl of molecular grade water, and samples were frozen
292 at -80°C immediately after collection.
293
294 Worker survival curves for the 13C and 15N experiments (trial periods) were plotted and
295 data were analyzed using Cox regression analysis.
296
297 qPCR and amplicon 16S rRNA sequencing to estimate antibiotic efficacy
298 Quantitative PCR with universal 16S rRNA primers was used to confirm that the
299 antibiotic treatments drastically reduced bacterial loads in gut communities of C.
300 varians (Fig. S1). Gasters of those ants were used for DNA extraction, and all other
301 DNA isolation procedures were the same as described below. 16S rRNA gene copy
302 concentration was estimated using qPCR with PerfeCTa SYBR Green FastMix (Quanta
303 Biosciences, Gaithersburg, MD, USA) and eubacterial primers 515F (5’-
304 GTGCCAGCMGCCGCGGTAA-3’) and 806R (5’-GGACTACHVGGGTWTCTAAT-
305 3’) at 200 nM each, on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad,
306 Hercules, CA, USA). The PCR program consisted of initial denaturation at 94°C for 3
307 minutes; 40 cycles of 94°C for 45s, 50°C for 60s, 72°C for 60s, and plate read at the
308 end of the extension step. Melting curve analysis was applied at the end of these 40
309 cycles, with temperatures rising from 55°C to 95°C with 0.5°C increments and plate
310 reads after 5s incubation at each temperature. Ten-fold dilution series of a DNA sample
311 extracted from ten pooled gut samples of C. varians were used to build standard curves
312 for estimation of relative bacterial abundance in ants under different dietary treatments.
313 Four biological replicates per dietary treatment in each colony were chosen for
314 quantitative PCR. Three technical replicates per standard curve sample and two
315 technical replicates per biological replicate were performed for each dietary
316 experiment. The relative bacterial abundance was determined by dividing bacterial 16S
317 rRNA copy number estimates by one tenth of the total bacterial 16S rRNA copy number
318 estimate for the pooled DNA sample from ten guts that was used to construct the
319 standard curves.
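
The standard-curve and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the Cq values, and the choice to express quantities relative to the undiluted pooled standard are all assumptions made for the example.

```python
def fit_standard_curve(log10_quantities, cq_values):
    """Ordinary least-squares fit of Cq = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate a quantity from a Cq value."""
    return 10 ** ((cq - intercept) / slope)


def relative_abundance(sample_quantity, pooled_quantity, n_pooled_guts=10):
    """Sample estimate divided by one tenth of the ten-gut pooled estimate."""
    return sample_quantity / (pooled_quantity / n_pooled_guts)


# Hypothetical ten-fold dilution series of the pooled standard (undiluted = 1.0).
dilutions_log10 = [0, -1, -2, -3]
standard_cq = [15.0, 18.3, 21.6, 24.9]  # invented Cq values for illustration
slope, intercept = fit_standard_curve(dilutions_log10, standard_cq)

# A hypothetical sample whose Cq matches the 1:10 dilution of the pooled standard
# carries roughly the load of one average gut, i.e. a relative abundance near 1.
sample_quantity = quantity_from_cq(18.3, slope, intercept)
sample_relative = relative_abundance(sample_quantity, pooled_quantity=1.0)
```

Expressing quantities relative to the pooled standard means that a value near 1 corresponds to the bacterial load of an average untreated gut in the pooled sample.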
320
321 DNA samples from individual workers in the 15N glutamate experiment were sent to
322 Argonne National Laboratory for Illumina amplicon sequencing of the V4 region of
323 bacterial 16S rRNA. Analyses of sequences proceeded using previously published
324 quality control and filtering protocols4. A 97% OTU table was generated (Table S13),
325 and the average relative abundance of each OTU was obtained across ants from the
326 same treatment. Averages were then plotted in conjunction with qPCR data from the
327 same treatment (Fig. S1), showing how antibiotics had altered the composition in
328 addition to the quantities of gut microbiota.
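
As a rough illustration of the averaging step, the sketch below converts per-ant OTU read counts to relative abundances and averages them within each treatment. The data layout, sample names, and function name are hypothetical, and the filtering steps of the published protocol are not reproduced.

```python
from collections import defaultdict


def mean_relative_abundance(otu_counts, treatment_of):
    """Average per-sample OTU relative abundances within each treatment.

    otu_counts:   {sample: {otu: read_count}}
    treatment_of: {sample: treatment_label}
    Returns {treatment: {otu: mean relative abundance}}; an OTU missing from
    a sample counts as zero for that sample.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, counts in otu_counts.items():
        total_reads = sum(counts.values())
        if total_reads == 0:
            continue  # skip empty libraries rather than divide by zero
        treatment = treatment_of[sample]
        n_samples[treatment] += 1
        for otu, reads in counts.items():
            sums[treatment][otu] += reads / total_reads
    return {
        treatment: {otu: total / n_samples[treatment] for otu, total in otus.items()}
        for treatment, otus in sums.items()
    }


# Invented read counts for two untreated ants.
counts = {
    "ant1": {"OTU1": 80, "OTU2": 20},
    "ant2": {"OTU1": 60, "OTU2": 40},
}
treatments = {"ant1": "untreated", "ant2": "untreated"}
averages = mean_relative_abundance(counts, treatments)
```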
329
330 Amino acid analysis from ant hemolymph by gas-chromatography-mass
331 spectrometry (GC-MS)
332 Enrichment of amino acids in ant hemolymph was measured at the Metabolic Tracer
333 Resource at the University of Pennsylvania. Approximately 5 µl of each hemolymph-
334 water mixture was acidified with 1 ml of 1N acetic acid and run over AG 50W-X8
335 cation exchange resin. Resin was washed three times with milli-Q water and free amino
336 acids eluted using 3N ammonium hydroxide. Samples were dried in a rotary vacuum
337 evaporator and amino acids converted to their heptafluorobutyryl isobutyl ester
338 derivatives5. Derivatized amino acids were injected onto an Agilent 7890A/5975C
339 Series gas chromatograph/mass spectrometer (GC/MS) (Agilent Technologies, Santa
340 Clara, CA) operated in the negative chemical ionization mode and separated using a
341 DB5-MS column. The injection port temperature was 250°C. The GC column
342 temperature was maintained at 80°C for 1 minute, increased to 150°C (10°C /min) and
343 then to 300°C (20°C /min). It was then held at 300°C for 1 minute. Amino acid peaks
344 were identified by retention time, which was confirmed using purified standards. Peaks
345 that could not be definitively identified were not measured.
346
347 Abundance data of 15N/13C-labeled essential amino acids in ant hemolymph samples
348 were transformed with a logit transformation (ln(p/(1-p))) before statistical analysis.
349 All logit transformed data were checked for normality by Shapiro-Wilk W-test. Normal
350 data were compared using one-way ANOVA with dietary treatment as a factor and
351 levels of 15N or 13C-labeled amino acids as dependent variables, followed by Tukey’s
352 post-hoc tests. Non-normal data were analyzed by Kruskal-Wallis tests followed by
353 multiple pairwise comparisons using the Wilcoxon rank sum test (see Table S3). All
354 statistical tests were performed using R version 3.3.2.
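
The logit transformation ln(p/(1-p)) named above can be written out as a short sketch. The study's tests were run in R, so this Python version is only an illustration of the transform itself: the function names are hypothetical, and the Shapiro-Wilk, ANOVA, and Kruskal-Wallis steps are not reproduced here.

```python
import math


def logit(p: float) -> float:
    """Logit transform ln(p / (1 - p)) for a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for proportions in (0, 1)")
    return math.log(p / (1.0 - p))


def logit_from_percent(percent: float) -> float:
    """Convenience wrapper for values reported as percentages (e.g. M+1 fraction %)."""
    return logit(percent / 100.0)


def inverse_logit(x: float) -> float:
    """Map a logit-scale value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Invented labeling fractions, in percent, for three hemolymph samples.
fractions_percent = [10.0, 25.0, 50.0]
transformed = [logit_from_percent(f) for f in fractions_percent]
```

The transform stretches proportions near 0 or 1, which is why it is commonly applied before tests that assume normality.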
355
356 DNA preparation for C. varians metagenomics
357 Ten adult C. varians workers from each of two colonies in the Florida Keys were used
358 to create two DNA pools for metagenome sequencing. Adult workers were washed in
359 70% ethanol and sterile water before dissection. Ant guts were dissected with sterile
360 forceps under a compound light microscope. Between each individual dissection,
361 forceps were rinsed with a 6% bleach solution and then with sterile water. The dissected
362 mid- and hind- guts were individually immersed in 180 μL enzymatic lysis buffer
363 containing lysozyme (20 mg/ml). After grinding with sterile pestles, samples were
364 incubated for 30 min at 37°C. Extractions then proceeded according to the protocol for
365 gram-positive bacteria with the Qiagen DNeasy Kit (Qiagen, Valencia, CA). Pooled
366 genomic DNA from the guts of ten workers per colony was used as source material for
367 the two Illumina HiSeq metagenome libraries (colony PL005; colony PL010).
368
369 DNA extraction from non-C. varians ants for metagenomics
370 DNA from dissected guts of Cephalotes ants other than C. varians was extracted
371 according to the protocol of Sanders et al. 2014 6, using pools of 10 dissected guts per
372 colony (as opposed to single guts, as for C. varians). Briefly, dissected guts preserved
373 in RNAlater were diluted ~1:1 in sterile water (to decrease solution density and
374 dissolve any precipitated salts), and spun to pellet the biological material. The
375 supernatant was removed and replaced with lysis buffer TLS-C (MPBio, inc), then
376 vortexed to resuspend. Resuspended material was lysed with
377 Phenol:Chloroform:Isoamyl alcohol (pH 8) and sterile beads (Lysis Matrix A,
378 MPBio) on a MPBio FastPrep-20 bead beater. The aqueous phase was then column-
379 purified through Qiagen DNeasy Blood and Tissue extraction columns, and
380 concentrated by isopropanol precipitation. Full methodological details are published
381 elsewhere6.
382 DNA extraction from cultured bacteria
383 High molecular weight DNA from cultured bacteria isolated from C. varians and C.
384 rohweri was extracted using Qiagen Genomic Tip 20/G columns, following the
385 manufacturer's recommendations for bacterial cultures.
386
387 Genome and metagenome sequencing, assembly and annotation
388 For shotgun sequencing of metagenomes of C. varians and isolates derived from C.
389 varians and C. rohweri, DNA was sheared to 400bp using a Covaris S220 sonicator.
390 Sheared DNA was end-repaired and ligated to indexed Illumina-compatible sequencing
391 adapters (Bioo Scientific, Inc) using the KAPA low-throughput Illumina-compatible
392 library preparation kit (KAPA biosystems, Inc). Fragments of the two prepared libraries
393 were size selected using double-ended SPRI bead-based size selection following the
394 KAPA protocol. After this selection, libraries were amplified for six cycles using KAPA
395 high-fidelity polymerase and then checked for quality using an Agilent Bioanalyzer.
396 The two prepared libraries as well as two from Cephalotes larval samples were pooled
397 with other indexed samples, combining for an estimated 40% of the total molar fraction
398 in the Illumina sequencing lane, and then sequenced at the Harvard Biopolymers
399 Facility using paired-end 150 bp reads on an Illumina HiSeq 2500 instrument.
400
401 Sequence libraries for non-C. varians-derived metagenomes and isolates were prepared
402 using the same Covaris shearing step as above, but on an Apollo 324 automated library
403 preparation robot using the PrepX ILM DNA kit (IntegenX, Inc) following
404 manufacturer’s recommendations. These libraries were PCR-amplified using the same
405 protocol as above, and amplified libraries were size-selected using the double-ended
406 SPRI bead-based size selection protocol on the Apollo 324 instrument. The resulting
407 libraries were sequenced using paired-end 100bp chemistry on an Illumina HiSeq 2000
408 instrument.
409
410 Metagenome sequences were trimmed for quality and adapters using Trimmomatic7.
411 The quality-trimmed reads were then combined and assembled with IDBA-UD 1.1.1
412 using k values of 20, 40, 60, 80, and 100 (ref. 8). The assembled data were run through
413 QUAST9 to calculate assembly statistics (Table S4).
414
415 Scaffolds of 18 metagenomes and 14 isolates, and coverage information of
416 metagenomic scaffolds were uploaded to the Integrated Microbial Genomes with
417 Microbiome Samples Expert Review (IMG/M-ER)10. Assignment of phylogenetic
418 lineages was initially attempted in IMG/M-ER based on USEARCH similarity against
419 all public reference genomes in IMG and the KEGG database. However, some scaffolds
420 could not be assigned to bins while others were classified into bins not matching taxa
421 known to be prevalent amongst the Cephalotes gut microbiota. To obtain more accurate
422 information of phylogenetic binning, all scaffolds with length longer than 1000 bp from
423 18 metagenomes were compared to eight reference genomes of isolated gut bacteria
424 from C. varians (GOLD Analysis Project ID: Ga0064586, Ga0064593, Ga0064594,
425 Ga0064595, Ga0064585, Ga0064596, Ga0105007) and a Rhizobiales bacterium
426 genome (accession number CP015625) using BLASTX with an e-value of 1e-15,
427 identity of 70% and maxhits of 1. A scaffold was assigned to a bacterial bin if over 50%
428 of all best BLASTX hits belonged to a single reference bacterial genome and at least
429 50% of the scaffold sequence was covered by the aforementioned BLASTX hits. If
430 the phylogenetic assignment of a scaffold by IMG/M-ER did not match the reference-
431 genome-based result, the scaffold was assigned to the phylogenetic group of
432 the appropriate cultured isolate, as ascertained through this BLASTX approach.
433 Annotation of gene content was also performed by IMG/M-ER. N-metabolic pathways
434 of gut microbiota (Fig. 6; Figs. S6 & S9; Figs. S13-S14) were built manually, using
435 KEGG and MetaCyc11 as guides (Tables S5, S7, S10). All genes involved in the N-
436 metabolic pathways of C. varians gut microbiota were added into a functional cart in
437 IMG, and the “Profile & Alignment” tool in the IMG function cart was used to search
438 those genes in non-C. varians-derived metagenomes. We present the nitrogen recycling
439 and nitrogen provisioning gene presence/absence data in different bacterial taxa along
440 with the Cephalotes host phylogeny12 in Figure S7. Gray bars were used in this figure
441 to obscure cells likely affected by insufficient coverage for that taxon in the given
442 metagenome (i.e. when the total scaffold length within one metagenome was less than 50% of
443 the total length for this taxon in the draft genome assembled for C. varians PL010;
444 see below).
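
The two 50% rules in the paragraph above (bin assignment from best BLASTX hits, and gray-bar masking for poorly covered taxa) can be sketched as follows. This is an interpretation, not the authors' script: the hit tuple layout, the use of 1-based inclusive scaffold coordinates, and the choice to compute coverage only from hits to the winning reference genome are assumptions, and all names are hypothetical.

```python
from collections import Counter


def covered_length(intervals):
    """Length of the union of 1-based inclusive (start, end) intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def assign_bin(scaffold_length, hits):
    """Return the reference genome a scaffold is binned to, or None.

    hits: list of (reference_genome, start, end), one best hit per predicted
    gene, with start/end given as positions on the scaffold.
    """
    if not hits:
        return None
    counts = Counter(reference for reference, _, _ in hits)
    top_reference, top_count = counts.most_common(1)[0]
    if top_count / len(hits) <= 0.5:  # needs more than half of all best hits
        return None
    intervals = [(s, e) for reference, s, e in hits if reference == top_reference]
    if covered_length(intervals) / scaffold_length < 0.5:  # needs >= 50% covered
        return None
    return top_reference


def insufficient_coverage(total_scaffold_length, draft_genome_length):
    """Gray-bar rule: True when a taxon's scaffolds total < 50% of the draft genome."""
    return total_scaffold_length < 0.5 * draft_genome_length


# Invented example: two overlapping hits to one reference cover 600 of 1000 bp.
example_hits = [
    ("Burkholderiales", 1, 300),
    ("Burkholderiales", 250, 600),
    ("Rhizobiales", 700, 800),
]
example_bin = assign_bin(1000, example_hits)
```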
445
446 Sequence fragments of 16S rRNA genes with length longer than 200 bp were extracted
447 from all 18 metagenomic libraries and 14 cultured isolate genomes. Closest relatives of
448 each 16S rRNA sequence were identified in BLASTn searches and the top one to three
449 BLAST hits were taken for each sequence. If the top hit was from a non-ant source, this
450 sequence alone was selected. Up to two ant-associated sequences were selected, and
451 the top non-ant BLAST hit was always selected. Beyond the 16S rRNA sequences from
452 metagenomes and cultured isolates, and those from BLASTn hits, we also selected one
453 to five sequences with close relatedness to each of the major Cephalotes-specific clades
454 (based on phylogenetic placement in prior studies), along with two Mollicutes
455 sequences used as outgroups. Finally, we included a partial 16S rRNA sequence from
456 an allantoin-dependent, urea-producing Burkholderiales derived from the sister ant
457 genus of Cephalotes, Procryptocerus. Sequences were checked for chimeras through
458 DECIPHER14 and chimera-filtered sequences were uploaded to the Ribosomal
459 Database Project website for sequence alignment15. The alignment was then uploaded
460 to the CIPRES web portal for maximum likelihood phylogenetic analysis using the
461 RAxML-HPC2 on XSEDE (version 8.2.4)16.
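
One reading of the BLASTn hit-selection rule described earlier in this paragraph is sketched below. The text leaves some room for interpretation, so this version assumes that hits are walked in rank order, that at most two ant-associated hits are kept, and that selection stops at the first non-ant hit, which is always kept. The function name, accession labels, and the boolean source flag are hypothetical.

```python
def select_reference_hits(ranked_hits, max_ant_hits=2):
    """Pick reference sequences from BLAST hits ranked best-first.

    ranked_hits: list of (accession, is_ant_associated) tuples.
    Keeps up to `max_ant_hits` ant-associated hits ranked above the first
    non-ant hit, then the top non-ant hit itself. If the best hit is from a
    non-ant source, it is the only sequence returned.
    """
    selected = []
    ant_hits_kept = 0
    for accession, is_ant_associated in ranked_hits:
        if is_ant_associated:
            if ant_hits_kept < max_ant_hits:
                selected.append(accession)
                ant_hits_kept += 1
        else:
            selected.append(accession)
            break  # the top non-ant hit closes the selection
    return selected


# Invented ranked hits for one metagenome-derived 16S rRNA fragment.
hits = [("antA", True), ("antB", True), ("antC", True), ("soil1", False), ("soil2", False)]
chosen = select_reference_hits(hits)
```

Under these assumptions each query contributes between one and three reference sequences, matching the "top one to three BLAST hits" stated in the text.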
462
463 Amino acid sequence fragments encoded by ureC, uraH, and puuD from N-recycling
464 pathways were extracted from each metagenomic and genomic dataset. Related
465 homologs were identified in BLASTp searches and the top one to two hits for each
466 sequence were selected. Sequences were aligned by ClustalW17. The alignment was uploaded to
467 the CIPRES web portal for maximum likelihood phylogenetic analysis with
468 bootstrapping using the RAxML-HPC BlackBox 16.
469
470 Genome binning using Anvi’o in conjunction with the CONCOCT
471 We used the Anvi’o metagenome visualization and annotation pipeline (version 1.2.3)18
472 in conjunction with the CONCOCT differential coverage-based binning program19 to
473 bin assembled contigs into putative microbial genomes. These putative genomes were
474 then manually refined to maximize completeness and minimize redundancy according
475 to panels of single-copy marker genes as reported by Anvi’o. Briefly, reads from each
476 of the four pooled Cephalotes varians metagenomic libraries were mapped against the
477 assembled contigs using Bowtie2 (ref. 20). These read profiles were then loaded into an Anvi’o
478 database and used for differential coverage binning with CONCOCT. All steps in this
479 process (with the exception of manual bin refinement) were automated using the
480 Snakemake workflow management software21; pipeline rules and configuration
481 information sufficient to reproduce this analysis are available upon request.
482
483 Amino acid sequence fragments encoded by seven protein-coding genes (rplB, rplA,
484 rplC, rpsB, rpsC, rpsE and tsf) were extracted from each isolate genomic and draft
485 genomic dataset. The concatenated alignment was uploaded to the CIPRES web portal
486 for maximum likelihood phylogenetic analysis with bootstrapping using the RAxML-
487 HPC BlackBox 16.
488
489 Visualization of taxonomic composition of metagenomes based on coverage
490 and %GC
491 Quality and adapter-trimmed reads were mapped back to metagenome scaffolds using
492 BWA 0.7.12 (ref. 22) with default parameters. A Perl script sam_len_cov_gc_insert.pl
493 (https://github.com/sujaikumar/assemblage) was used to estimate length, %GC content
494 and average depth for each scaffold from the samfile generated by BWA. This GC-
495 coverage file was combined with a customized file containing the information of
496 taxonomic assignment for each scaffold using a python script make_blobology_file.py
497 (http://static.xbase.ac.uk/files/results/nick/make_blobology_file.py). Taxon-annotated
498 GC-coverage (TAGC) plots were then generated from these scaffolds using a customized
499 python script to visualize the contributions of different bacterial bins to the metagenome
500 assemblies.
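The three per-scaffold quantities behind a TAGC plot (length, %GC, and average depth) can be illustrated as follows. This is a minimal sketch, not the cited Perl script: it assumes depth is approximated as total aligned bases divided by scaffold length, and the function and argument names are hypothetical.

```python
def scaffold_stats(sequence, aligned_read_lengths):
    """Return (length, gc_percent, mean_depth) for one scaffold.

    sequence: scaffold nucleotide sequence (ambiguous bases such as N are
        counted in the length but excluded from the %GC denominator).
    aligned_read_lengths: number of aligned bases contributed by each read
        mapped to this scaffold (assumed to be pre-extracted from the SAM file).
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("scaffold sequence is empty")
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    mean_depth = sum(aligned_read_lengths) / length
    return length, gc_percent, mean_depth


# Hypothetical toy scaffold with three mapped reads.
length, gc_percent, mean_depth = scaffold_stats("ATGCGCNN", [4, 4, 8])
print(length, round(gc_percent, 1), mean_depth)
```

Plotting %GC against mean depth for each scaffold, colored by taxonomic assignment, then gives the kind of view described above.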
501
502 Fluorescence in situ hybridization
503 We investigated the localization of bacteria within the digestive tract of ants using
504 fluorescence microscopy. Guts dissected from workers of Cephalotes sp. JGS2370 were
505 fixed in 4% formaldehyde in PBS buffer for 2h at room temperature, then dehydrated
506 using an ethanol gradient, and stored in 95% ethanol. After rehydration using PBS
507 buffer with 0.03% TritonX-100, they were washed three times for 10 minutes with
508 hybridization solution containing 30% formamide, 0.01% SDS, 0.9 M NaCl and 0.02
509 M Tris-HCl (pH 8.0). Hybridization was performed overnight at 37°C in hybridization
510 solution with the addition of the universal eubacterial probe EUB338 (5’-
511 GCTGCCTCCCGTAGGAGT-3’) labeled with Cy3 at 100 nM, as well as DAPI as a
512 counterstain. After washing with PBS, the specimens were imaged using a Leica
513 M165FC fluorescent stereo microscope. Fluorescent microphotographs taken using the
514 blue and green excitation filters were merged with a photograph taken under the white
515 light. The detailed protocol is provided in ref. 23.
516
517 Stable isotope data
518 Data were extracted from a prior N-isotope profiling study24 using graphical tools, as
519 described and summarized previously. We also used data from supplementary files of
520 another study that profiled Cephalotes N-isotopes25. For each locale where isotope data
521 had been generated previously, we plotted delta 15N values for Cephalotes next to those
522 for other Myrmicinae ants (the sub-family containing Cephalotes) and ants from
523 Camponotus (from the subfamily Formicinae; this genus harbors N-recycling
524 symbionts and feeds somewhat low on the food chain). Also plotted, separately for each
525 locale, were delta 15N values for sympatric plants, sap-feeding herbivores, leaf-chewing
526 herbivores, and predators.
527
528 Assays to measure urea production (via allantoin) and urea degradation (into
529 ammonia)
530 Bacteria isolated from the guts of Cephalotes or Procryptocerus ants were grown in
531 trypticase soy broth (TSB) or TSB supplemented with 250 µM allantoin (Sigma). They
532 were prioritized for genome sequencing based on their similarity at 16S rRNA to
533 previously sampled bacteria. A similar rationale was used to prioritize them for in vitro
534 assays.
535
536 As a proxy for the uric acid degradation pathway, we measured whether selected
537 isolates from the Burkholderiales (Fig. 6; Table S11) could produce urea in vitro and
538 whether this production was increased by allantoin (suggesting the presence of at least
539 part of the uric acid-to-urea pathway). Tubes of TSB and TSB with allantoin were
540 inoculated from a liquid culture of the chosen isolates to an initial OD(A600) of
541 approximately 0.05. Uninoculated control TSB and TSB + allantoin tubes were also
542 incubated along with the inoculated samples. Sample aliquots (500 µL) were collected
543 from each tube at various time points. The bacteria in the inoculated samples were
544 pelleted by centrifugation at 4,500 × g for 10 minutes, and the liquid portion of inoculated
545 and uninoculated samples was stored at -20°C until analysis.
546
547 Urea concentrations were measured using a modified Jung assay 26. Briefly, a solution
548 of equal parts o-phthalaldehyde and primaquine bisphosphate (Sigma) was prepared,
549 and 200 µL of this working solution was combined with 50 µL of samples in a 96-well
550 assay plate. Standard concentrations of urea in TSB and TSB + 250 µM allantoin were
551 also tested. The reaction of o-phthalaldehyde and primaquine bisphosphate with urea
552 caused a color change, which was measured at 430 nm using a BioTek® Synergy H1
553 spectrophotometer. The absorbance value of the uninoculated TSB or TSB + 250 µM
554 allantoin blank was subtracted from each standard and sample, then concentration was
555 calculated from the standard curve, with concentrations for corrected values below that
556 of the lowest standard (0 µM) being treated as 0 µM. The concentration of the
557 uninoculated samples at each time point was subtracted from the corresponding concentrations
558 of inoculated samples to calculate the amount of urea produced by the isolate in each medium
559 type. Average urea production at each time point was calculated and normalized by
560 subtraction of the 0 hour average. Data were analyzed with SigmaPlot software (Systat,
561 San Jose, CA) using a two-way repeated-measures ANOVA and Holm-Sidak test.
562 Comparisons were considered significantly different if p ≤ 0.05.
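The arithmetic of the urea quantification described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: a linear standard curve fitted by ordinary least squares (the curve form is not specified in the text), and hypothetical function names, standard concentrations, and absorbance values.

```python
def fit_standard_curve(concs_um, corrected_abs):
    """Least-squares line: corrected absorbance = slope * concentration + intercept."""
    n = len(concs_um)
    mean_x = sum(concs_um) / n
    mean_y = sum(corrected_abs) / n
    sxx = sum((x - mean_x) ** 2 for x in concs_um)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs_um, corrected_abs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_um(abs_430, blank_abs, slope, intercept):
    """Blank-correct an A430 reading and convert it to µM urea, floored at 0 µM."""
    corrected = abs_430 - blank_abs
    return max((corrected - intercept) / slope, 0.0)


def net_production_um(inoculated_um, uninoculated_um, time0_mean_um=0.0):
    """Urea attributable to the isolate, normalized to the 0 hour average."""
    return (inoculated_um - uninoculated_um) - time0_mean_um


# Hypothetical standards (µM) and their blank-corrected absorbances.
slope, intercept = fit_standard_curve([0, 50, 100, 200], [0.00, 0.10, 0.20, 0.40])
inoculated = concentration_um(0.35, 0.05, slope, intercept)
control = concentration_um(0.07, 0.05, slope, intercept)
print(net_production_um(inoculated, control))
```

Flooring at 0 µM corresponds to treating corrected values below the lowest standard as 0 µM; the per-time-point averaging across replicates is left out for brevity.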
563
564 Bacteria from a range of taxa spanning multiple Cephalotes hosts (Fig. 6) were used in
565 assays measuring ammonia production from urea. We used a qualitative method,
566 where bacteria were inoculated into Rapid Urea Broth (BD, Sparks, MD) containing
567 the pH indicator phenol red. Isolates were considered positive for urea degradation if
568 the color of the medium changed from red to bright purple.
569
570 For isolates used in these assays, we generated full-length 16S rRNA sequences with
571 Sanger sequencing. Top BLAST hits and representative sequences from particular
572 clades were downloaded from NCBI. These represented bacteria that had been
573 previously found through culture-independent means (i.e. in vivo), typically through
574 shallow sampling of clone libraries. Their identity or near identity to our isolates from
575 C. varians and C. rohweri indicates that our cultured isolates are abundant core
576 microbes. Maximum likelihood phylogenies, with bootstrapping, were inferred in the
577 software package SeaView after sequence alignment (through the Muscle algorithm) in
578 this same program.
579
580 Supplementary Results
581 Colony fragment nutritional experiments—antibiotic treatments
582 Subsets of ants for the below-described N-upgrading and N-recycling experiments
583 were treated with antibiotics to remove or suppress gut bacteria. Treatment efficacy
584 was validated through significant reductions in 16S rRNA copy number compared to
585 unexposed ants reared on the same diets (Fig. S1). While the magnitude of this
586 suppression varied across treatments, bacterial titers were always significantly lower
587 under antibiotic exposure. Amplicon sequencing of the V4 region of 16S rRNA
588 revealed that bacteria remaining after treatment were almost entirely core symbionts
589 from the Rhizobiales. The absence of other core taxa and dominance by this one
590 group drew a strong contrast with untreated gut communities, which showed greater
591 richness and evenness, and an overall composition of core symbionts resembling that
592 seen in prior studies.
593
594 The effects of antibiotics on worker ant survival differed across the 13C-glutamate
595 labeling and 15N-urea labeling experiments (Fig. S2). In the latter case, Cox
596 regression statistics revealed harmful effects of antibiotic treatment on C. varians
597 survival for two of three colonies (Wald statistic = 6.89, df = 1, P = 0.0087 for colony
598 PL215A; Wald statistic = 22.67, df = 1, P = 1.924e-06 for colony PL217; Wald statistic
599 = 3.67, df = 1, P = 0.0553 for colony PL231). In contrast, antibiotic treatment had no
600 significant impact on survival in the glutamate feeding experiments for any of three
601 colonies (Wald statistic = 2.4, df = 1, P = 0.1214 for colony PL207; Wald statistic =
602 0.29, df = 1, P = 0.5888 for colony PL210; Wald statistic = 0, df = 1, P = 0.9882 for
603 colony PL231).
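For readers checking the reported statistics, a Wald statistic with one degree of freedom can be converted to a P value using the chi-square distribution. The sketch below assumes the standard chi-square reference distribution for the Cox regression Wald test; the function name is hypothetical, and small differences from the reported P values are expected because the Wald statistics are rounded.

```python
import math


def wald_p_value_df1(wald_statistic):
    """Upper-tail P value of a chi-square distribution with 1 degree of freedom.

    For df = 1, P(X > w) equals P(|Z| > sqrt(w)) for a standard normal Z,
    which is erfc(sqrt(w / 2)).
    """
    if wald_statistic < 0:
        raise ValueError("Wald statistic must be non-negative")
    return math.erfc(math.sqrt(wald_statistic / 2.0))


# Wald statistics reported above for the 15N-urea labeling experiment.
for colony, wald in [("PL215A", 6.89), ("PL217", 22.67), ("PL231", 3.67)]:
    print(colony, wald_p_value_df1(wald))
```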
604
605 Fine-scale metagenome binning from C. varians colony PL010: Why did N-
606 recycling genes appear absent from Cephaloticoccus and the predicted uric acid
607 degrading Burkholderiales with relatedness to isolate Cv33a?
608 Incomplete sequencing may explain the apparent lack of urease genes in the
609 Opitutales (Cephaloticoccus) bin. Urease genes were indeed present in the PL010
610 metagenome, classifying to the Opitutales order (Table S5; Fig. S6). Furthermore,
611 most genes involved in N-metabolism were found on just a single Opitutales-binned
612 scaffold from this metagenome (Table S5), suggesting the presence of just one
613 symbiont strain from this order within this colony. The simplest resulting explanation
614 is that orphaned urease genes (i.e. left over after draft genome assembly) belong to the
615 Opitutales strain with the assembled draft genome and that their exclusion is a
616 methodological artifact. Similarly, while the draft genome from one Rhizobiales
617 strain encoded all necessary urease genes, a Burkholderiales strain close to the
618 cultured uric acid recycling Cv33a isolate lacked some genes in the uric acid pathway,
619 in spite of all pathway genes having been found in the Burkholderiales bin within the
620 PL010 metagenome (Table S10; Table S5; Fig. S6). This too may reflect difficulties
621 in assembling complete genomes from complex metagenomic datasets. Regardless,
622 findings of universal N-recycling in all in vitro assayed isolates from both
623 Cephaloticoccus (urea degradation) and the Burkholderiales Cv33a clade (converting allantoin to
624 urea, a proxy for the end of the uric acid pathway) suggest role conservation in these
625 groups.
626
627 A summary of sequenced genomes from cultured isolates
628 Bacteria from several major taxa, including Pseudomonadales, Opitutales,
629 Rhizobiales, Xanthomonadales, and Burkholderiales were successfully cultured from
630 macerated worker guts. Strains were prioritized for sequencing based on preliminary
631 assessments of 16S rRNA gene identity or near identity to known core
632 symbionts (Fig. S5). In total, we sequenced fourteen bacterial genomes using Illumina
633 HiSeq (n=13) or PacBio (n=1) technology. Genome sizes ranged from 1.9-3.4 Mb
634 with %GC ranging from 53.4-62.4% (Table S6). On average, genomes encoded 2615
635 protein-coding genes. Full-length 16S rRNA genes from these genomes were nested
636 within known clades of cephalotine core gut symbionts generated from 16S rRNA in
637 our shotgun metagenomic analyses (Fig. S5).
638
639 Details on gene composition are found in the main text. But we note here the unique
640 nature of the JR021-5 Rhizobiales genome. Found in a disparate Rhizobiales clade
641 (i.e. not the primary grouping; Fig. S5), this bacterium shows resemblance to those
642 found in Cephalotes worker crops and in larvae. Its genome lacked some of the key
643 N-metabolism genes found in most others. For example, it was the only one of 14
644 cultured isolates to lack the glutamate dehydrogenase gene (gdhA), whose product converts ammonia
645 into glutamate. It was also a strong outlier in its capacities to make amino acids,
646 making only six out of twenty. This genome also did not encode N-recycling genes.
647
648 Our cultured isolates are highly similar to previously sampled core symbionts.
649 In testing whether genetic signatures reflect actual N-recycling capacities, we
650 performed a series of in vitro assays. For symbionts of host species studied
651 extensively through prior work (i.e. C. varians and C. rohweri), 16S rRNA sequences
652 of the focal isolates were highly similar if not identical to those of core symbionts
653 obtained through culture-independent efforts. Isolates from other cephalotine host
654 species, subjected to little or no prior symbiont sequencing, had top BLAST hits to
655 cephalotine-specific bacteria (Fig. 6). These findings combine to illustrate the natural
656 relevance of our in vitro work, i.e. we have assayed dominant core gut symbionts or
657 their very close relatives.
658
659 660
661 References
1. Hardy R, Burns R, Holsten RD. Applications of the acetylene-ethylene assay for measurement of nitrogen fixation. Soil Biology and Biochemistry 5, 47-81 (1973).

2. Bentley BL. Nitrogen-fixation in termites - fate of newly fixed nitrogen. Journal of Insect Physiology 30, 653-655 (1984).

3. Straka J, Feldhaar H. Development of a chemically defined diet for ants. Insectes Sociaux 54, 100-104 (2007).

4. Hu Y, et al. By their own devices: invasive Argentine ants have shifted diet without clear aid from symbiotic microbes. Molecular Ecology 26, 1608-1630 (2017).

5. MacKenzie SL, Tenaschuk D. Gas-liquid chromatography of N-heptafluorobutyryl isobutyl esters of amino acids. Journal of Chromatography A 97, 19-24 (1974).

6. Sanders JG, Powell S, Kronauer DJC, Vasconcelos HL, Frederickson ME, Pierce NE. Stability and phylogenetic correlation in gut microbiota: lessons from ants and apes. Molecular Ecology 23, 1268-1283 (2014).

7. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120 (2014).

8. Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420-1428 (2012).

9. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072-1075 (2013).

10. Markowitz VM, et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Research 42, D568-D573 (2014).

11. Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research 42, D459-D471 (2014).

12. Price SL, Powell S, Kronauer DJC, Tran LAP, Pierce NE, Wayne RK. Renewed diversification is associated with new ecological opportunity in the Neotropical turtle ants. Journal of Evolutionary Biology 27, 242-258 (2014).

13. Hu Y, Lukasik P, Moreau CS, Russell JA. Correlates of gut community composition across an ant species (Cephalotes varians) elucidate causes and consequences of symbiotic variability. Molecular Ecology 23, 1284-1300 (2014).

14. Wright ES, Yilmaz LS, Noguera DR. DECIPHER, a Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Appl Environ Microb 78, 717-725 (2012).

15. Cole JR, et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research 37, D141-D145 (2009).

16. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313 (2014).

17. Thompson JD, Gibson TJ, Higgins DG. Multiple Sequence Alignment Using ClustalW and ClustalX. In: Current Protocols in Bioinformatics. John Wiley & Sons, Inc. (2002).

18. Eren AM, et al. Anvi'o: an advanced analysis and visualization platform for 'omics data. PeerJ 3 (2015).

19. Alneberg J, et al. Binning metagenomic contigs by coverage and composition. Nat Methods 11, 1144-1146 (2014).

20. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359 (2012).

21. Köster J, Rahmann S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 28, 2520-2522 (2012).

22. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595 (2010).

23. Łukasik P, et al. The structured diversity of specialized gut symbionts of the New World army ants. bioRxiv (2016).

24. Davidson DW, Cook SC, Snelling RR, Chua TH. Explaining the abundance of ants in lowland tropical rainforest canopies. Science 300, 969-972 (2003).

25. Tillberg CV, Holway DA, LeBrun EG, Suarez AV. Trophic ecology of invasive Argentine ants in their native and introduced ranges. Proceedings of the National Academy of Sciences of the United States of America 104, 20856-20861 (2007).

26. Zawada RJX, Kwan P, Olszewski KL, Llinas M, Huang SG. Quantitative determination of urea concentrations in cell culture medium. Biochem Cell Biol 87, 541-544 (2009).