1 Supplementary Information (SI) Appendix

2

3 Nitrogen conservation, conserved: 46 million years of N-recycling by

4 the core symbionts of turtle ants

5

6 Yi Hu, Jon G. Sanders, Piotr Łukasik, Catherine L. D'Amelio, John S. Millar, David

7 R. Vann, Yemin Lan, Justin A. Newton, Mark Schotanus, John T. Wertz, Daniel J. C.

8 Kronauer, Naomi E. Pierce, Corrie S. Moreau, Philipp Engel, Jacob A. Russell

9

10 Table of Contents

11 Supplementary Figure legends

12 Supplementary Table legends

13 Supplementary Materials and Methods

14 Assessing N-fixation

15 Feeding experiments with 15N-labeled urea and 13C/15N-labeled glutamate

16 qPCR and amplicon 16S rRNA sequencing to estimate antibiotic efficacy

17 Amino acid analysis from hemolymph by gas-chromatography-mass

18 spectrometry (GC-MS)

19 DNA preparation for C. varians metagenomics, non-C. varians ants for

20 metagenomics and cultured bacteria

21 Genome and metagenome sequencing, assembly and annotation

22 Genome binning using Anvi’o in conjunction with the CONCOCT

23 Visualization of taxonomic composition of metagenomes based on coverage

24 and %GC

25 Fluorescence in situ hybridization

26 Stable isotope data

27 Assays to measure urea production (via allantoin) and urea degradation (into

28 ammonia)

29 Supplementary Results

30 Colony fragment nutritional experiments—antibiotic treatments

31 Fine-scale metagenome binning from C. varians colony PL010: Why did N-

32 recycling genes appear absent from Cephaloticoccus and the predicted uric acid

33 degrading Burkholderiales with relatedness to isolate Cv33a?

34 A summary of sequenced genomes from cultured isolates

35 Our cultured isolates are highly similar to previously sampled core symbionts.

36 References

37

38

39 Supplementary Figure legends

40

41 Figure S1. Relative bacterial abundance in the ant groups under different

42 dietary treatments in the 15N labeled glutamate (A), 13C labeled glutamate (B)

43 and 15N labeled urea feeding experiment (C). The relative bacterial abundance was

44 determined by dividing bacterial 16S rRNA copy number estimates by one tenth of

45 the total bacterial 16S rRNA copy number estimated for the pooled DNA sample from

46 ten guts that was used to construct standard curves. 16S rRNA amplicon sequencing

47 was performed only for ants in 15/14N glutamate. NA=16S amplicon sequencing not

48 performed for these ants.

49

50 Figure S2. Survival of Cephalotes varians workers under different dietary

51 treatments with isotope labeling of dietary urea (A) and dietary glutamate (B)

52 with symbiont removal or maintenance. (A) Cox regression analysis for the

53 workers fed on antibiotics (green lines) shows that disruption of gut microbiota

54 significantly reduces survival (Wald statistic = 6.89, df = 1, P = 0.0087 for colony

55 PL215A; Wald statistic = 22.67, df = 1, P = 1.924e-06 for colony PL217; Wald statistic

56 = 3.67, df = 1, P = 0.0553 for colony PL231). (B) Cox regression analysis for the

57 workers fed on antibiotics (green lines) shows that disruption of gut microbiota has no

58 effect on survival of C. varians in this experiment (Wald statistic = 2.4, df = 1,

59 P = 0.1214 for colony PL207; Wald statistic = 0.29, df = 1, P = 0.5888 for colony PL210;

60 Wald statistic = 0, df = 1, P = 0.9882 for colony PL231).

61

62 Figure S3. Percentage of 13C-labeling of free essential amino acids (A) and non-

63 essential amino acids (B) in hemolymph of Cephalotes varians fed with 13C-

64 labeled glutamate. Asterisks indicate that 13C in amino acids from 13C-treated ants

65 (blue) was significantly higher than in ants feeding on unlabeled glutamate (red) and

66 in aposymbiotic ants feeding on 13C-labeled glutamate (green) across three

67 investigated colonies.

68

69 Figure S4. Percentage of 15N-labeling of free essential amino acids (A) and non-

70 essential amino acids (B) in hemolymph of Cephalotes varians fed with 15N-

71 labeled glutamate. Asterisks indicate that 15N in amino acids from 15N-treated ants

72 (blue) was significantly higher than in ants feeding on unlabeled glutamate (red) and

73 in aposymbiotic ants feeding on 15N-labeled glutamate (green) across three

74 investigated colonies.

75

76 Figure S5. Phylogenetic analyses of symbiont 16S rRNA genes reveal strong

77 taxonomic conservation among worker-associated gut bacteria. Phylogenies of

78 16S rRNA nucleic acid sequences based on sequences extracted from 18 Cephalotes

79 metagenomes and top BLAST hits. Rooted maximum likelihood phylogeny reveals

80 nearly all Cephalotes-associates come from Cephalotes-specific clades. N-recycling

81 bacteria identified through in vitro assays are emphasized with cyan or green lines

82 connecting their branches to their strain names. Outer circle and branch colors:

83 bacterial taxonomy. Middle circle colors: Cephalotes species groups. Inner circle: all

84 red shading of taxon names reveals sequences coming from our metagenomic

85 datasets, bright red shading of taxon names reveals cultured isolates and gray shading

86 of taxon names represents non-Cephalotine ant associated bacteria.

87

88 Figure S6. The conserved operons of genes involved in uric acid degradation and

89 urea degradation pathways across 17 Cephalotes ant species. A cladogram based

90 on reported relationships12 is shown on the left. Names and functions of genes in the

91 uric acid degradation and urea gene operons are given at the top of the figure. The

92 arrow with dashed lines represents ant host derived metabolic steps. The gene

93 structure of each operon was shown in all 18 metagenomes, with the left panel

94 indicating Xanthine/Uric acid degradation gene operons and the right panel indicating

95 Urea degradation gene operons. Each gene operon was labelled by the corresponding

96 scaffold ID and was highlighted by a box colored by the bacterial orders to which

97 they were binned. For some hosts (C. varians and C. rohweri) we present data from

98 cultured isolate genome sequencing; such findings are indicated with labeling at right,

99 while distinctions between the two metagenomes from C. varians are indicated at the

100 right as well.

101

102 Figure S7. Presence or absence of genes involved in pathways of xanthine/uric

103 acid degradation, urea degradation, ammonia assimilation and amino acid

104 synthesis for eight bacterial bins in each of the gut metagenomes of 18 Cephalotes.

105 Symbionts hail from the orders Burkholderiales (A), Rhizobiales (B), Opitutales (C),

106 Pseudomonadales (D), Xanthomonadales (E), Campylobacterales (F) and

107 Flavobacteriales (G). White and blue in each heatmap respectively represent the

108 absence and presence of genes associated with the focal metabolic pathways. Gray

109 bars denote a lack of pathway information for a core bacterial bin of Cephalotes ants;

110 they are used when the total length of scaffolds assigned to a bacterial taxon in one

111 metagenomic dataset is less than 50% of the total length of the draft genome assembled

112 for that taxon from the C. varians PL010 metagenome. A cladogram based on previously published

113 relationships of 18 Cephalotes ants12 is shown to the left of each panel. Common

114 ancestry traces back roughly 46 million years.

115

116 Figure S8. Phylogenetic analyses of symbiont UreC proteins reveal patterns of

117 convergent functional evolution among worker-associated gut bacteria.

118 Phylogenies of UreC proteins based on sequences extracted from 18 Cephalotes

119 metagenomes and top BLAST hits. Rooted maximum likelihood phylogeny reveals

120 nearly all Cephalotes-associates come from Cephalotes-specific clades. Outer circle

121 and branch colors: bacterial taxonomy. A lack of shading in the outer circle, for

122 Cephalotes-derived sequences, revealed that ureC genes fell on contigs that could not

123 be assigned to bacterial phyla or any lower taxa. Middle circle colors: Cephalotes

124 species groups. Inner circle: all red shading of taxon names reveals sequences coming

125 from our metagenomic datasets, bright red shading of taxon names reveals cultured

126 isolates, gray shading of taxon names represents non-Cephalotine ant associated

127 bacteria, and green shadings of taxon names reveals sequences from Bartonella apis

128 isolated from honeybees.

129

130 Figure S9. Predicted essential amino acid biosynthetic pathways in the gut

131 metagenome of Cephalotes varians. Names of genes not found in bacterial genomes

132 are in red font. Asterisks indicate that genes were identified in the ant genome. Data

133 are compiled from the metagenomes from colonies PL005 and PL010. 3PG, 3-

134 phosphoglycerate; E4P, erythrose-4-phosphate; PEP, phosphoenolpyruvate; PRPP,

135 phosphoribosyl pyrophosphate; OA, oxaloacetate; Cit, citrulline.

136 Figure S10. Distribution of scaffolds containing genes in the N-metabolic

137 pathways in taxon-annotated GC-coverage (TAGC) plots for the metagenomes

138 of Cephalotes varians. Individual scaffolds are plotted based on their GC content (x-

139 axis) and their read coverage (y-axis, logarithmic scale). Scaffolds are colored based

140 on the taxonomic order they were assigned to as described in the text. (A) and (D)

141 Scaffolds containing genes in uric acid degradation pathways were highlighted in the

142 TAGC plots of colony PL005 (top) and PL010 (bottom). (B) and (E) Scaffolds

143 containing genes in urea degradation pathways were similarly highlighted, as were

144 those containing genes involved in ammonia assimilation (C) and (F).

145

146 Figure S11. Phylogeny of uricase amino acid sequences from metagenomes of C.

147 varians. Maximum likelihood phylogeny reveals that uricase amino acid sequences in

148 our metagenomic surveys form a Cephalotes-specific clade. The tree was rooted using

149 Actinosynnema mirum as the outgroup. Clade colors represent the source from which

150 the uricase coding sequences were derived.

151

152 Figure S12. Phylogenetic analyses of symbiont PuuD and UraH proteins reveal

153 patterns of convergent functional evolution among worker-associated gut

154 bacteria. Phylogenies of PuuD proteins (A) and UraH proteins (B) based on

155 sequences extracted from 18 Cephalotes metagenomes and top BLAST hits. Rooted

156 maximum likelihood phylogeny reveals nearly all Cephalotes-associates come from

157 Cephalotes-specific clades. Outer circle and branch colors: bacterial taxonomy.

158 Middle circle colors: Cephalotes species groups. Inner circle: all red shading of taxon

159 names reveals sequences coming from our metagenomic datasets, bright red shading

160 of taxon names reveals cultured isolates and gray shading of taxon names represents

161 non-Cephalotine ant associated bacteria.

162

163 Figure S13. Pathways from the purines guanine and adenine to urea, via

164 xanthine/uric acid degradation. Shown with blue highlighted boxes are enzymes

165 encoded by the Burkholderiales Cv33a strain, which can make urea from guanine but

166 potentially not adenine.

167

168 Figure S14. Alternative mechanisms for urea production in Cephalotes ants via a

169 separate, two-step pathway converting arginine to urea.

170

171 Figure S15. Alignment of isolate genome assemblies with metagenome

172 assemblies. Genome alignment against metagenome contigs shows the similarity of

173 cultured isolates to genomes present in the in vivo gut community. Each circular

174 genome visualization represents an isolate genome. The outermost ring shows GC%,

175 while the innermost shows coding density of the isolate genome. Each of the two

176 middle rings indicates alignment of scaffolds from C. varians metagenomes, from

177 samples C. varians PL010 (inner) and C. varians PL005 (outer). For each sample,

178 contigs aligning contiguously to the isolate genome reference are indicated by green

179 blocks; contigs that align successfully but that are misassembled with respect to the

180 reference isolate genome are indicated in red. Nucleotide mismatches between the

181 reference and metagenome contigs are summarized by column charts within each

182 band, with higher columns indicating more mismatches in that window.

183

184 Supplementary Table legends

185

186 Table S1. Collection information for the ant colonies utilized in this study

187

188 Table S2. Acetylene-reduction activity detected for in vivo bacterial communities

189 of C. varians and information on colonies used in this study. Nitrogenase can

190 reduce acetylene (C2H2) to ethylene (C2H4). No ethylene was detected in three ant

191 colonies investigated in this study.

192

193 Table S3. Statistical results for heavy isotopic signal in hemolymph amino acids

194 in three feeding experiments.

195

196 Table S4. Assembly statistics of metagenomic data.

197

198 Table S5. Genes from N-metabolic pathways in 18 Cephalotes ant gut microbiota

199 and their distribution in different bacterial bins. B, R, O, P, X, C, F, S and H refer

200 to Burkholderiales, Rhizobiales, Opitutales, Pseudomonadales, Xanthomonadales,

201 Campylobacterales, Flavobacteriales, Sphingobacteriales and Hymenoptera bins,

202 respectively.

203

204 Table S6. Assembly statistics of genomes and cultivation conditions of cultured

205 bacteria.

206

207 Table S7. The distribution of genes from N-metabolic pathways in the 14

208 genomes of bacteria isolated from C. varians and C. rohweri.

209

210 Table S8. Summary of scaffolds assigned to 11 bins in PL010 C. varians

211 metagenome

212

213 Table S9. Summary of strain-level binning for gut metagenomes from C. varians

214 workers in colony PL010.

215

216 Table S10. The distribution of genes from N-metabolic pathways in the 11 bins

217 generated based on the metagenome of C. varians colony PL010.

218

219 Table S11. Results of in vitro urea production assays

220

221 Table S12. Sample information and fractional abundance of the first isotopic peak

222 (M+1 abundance, fraction %) of amino acids in the feeding

223 experiments with 15N-labeled urea and 13C/15N-labeled glutamate. The first

224 isotopic peak represents the abundance of naturally occurring amino acids containing

225 heavy isotopes.

226

227 Table S13. OTU table from C. varians gut community samples used in feeding

228 experiments with 15N-labeled glutamate. The columns correspond to samples and

229 rows correspond to OTUs. Numbers represent read abundance for each OTU within

230 each library. Also indicated are taxonomic classification for each OTU.

231

232

233

234 Supplementary Materials and Methods

235 Assessing N-fixation

236 To measure the N-fixation capacity of Cephalotes-associated microbes, we performed

237 acetylene reduction assays, in which conversion of acetylene (C2H2) to ethylene (C2H4)

238 is used as evidence for active nitrogenase enzymes 1, 2. Three colonies of C. varians

239 were collected (by CSM or YH) from mangrove trees in the Florida Keys (Table S1).

240 After excavation in the field, we immediately placed all available workers (and larvae,

241 pupae or queens, when present: Table S2) into 10 ml gas tight syringes (Vici Precision

242 Sampling Inc, Baton Rouge, LA, USA). An empty syringe was used as a control. Two

243 milliliters of air were removed from each of these four syringes and replaced with two

244 milliliters of acetylene, resulting in a final atmosphere of 20% acetylene. A 1 ml sample

245 of the air mixture from each syringe was injected into a 3 ml Exetainer tube at 0, 1, 2,

246 4, 8, and 16 hours. Acetylene and ethylene concentrations were then quantified using a gas

247 chromatography-flame ionization detector (GC-FID, HP6890 series, Agilent

248 Technologies, Inc., E.&E.S. Analytical Instrumentation of University of Pennsylvania).

249

250 Feeding experiments with 15N-labeled urea and 13C/15N-labeled glutamate

251 Nine colonies of C. varians collected from the Florida Keys (Table S1) were reared on

252 a holidic artificial diet 3 and 50% honey water at 25°C under a daily light:dark cycle of

253 14:10 until use in the feeding experiment. Fresh diet was provided roughly every two

254 days. All adult workers were subjected to a water-only starvation period of three days

255 prior to the start of experiments.

256

257 In the feeding experiment with 15N-labeled urea, workers from each of three colonies

258 were split into two treatment groups. In the first treatment, workers were subjected to

259 antibiotic feeding to remove their gut bacteria through rearing on 30% (weight/volume)

260 sucrose water containing 0.01% each of tetracycline, rifampicin, and kanamycin.

261 Untreated workers from the second treatment group consumed only 30% sucrose water.

262 After the three-week pre-trial period, the antibiotic-treatment groups were provided

263 with 30% sucrose water with the same antibiotic mixture, in addition to 1%

264 (weight/volume) 15N-labeled urea (Sigma-Aldrich, St Louis, MO). Untreated ants were

265 further split into subgroupings, with half being reared upon 30% sucrose water with 1%

266 (w/v) 15N-labeled urea, and the other half consuming 30% sucrose water containing 1%

267 (w/v) unlabeled (i.e. mostly 14N) urea.

268

269 The same experimental design was applied to three colonies in an experiment with 13C-

270 labeled glutamate (and unlabeled control) and to three additional colonies under a 15N-

271 labeled glutamate treatment (with an unlabeled control). Workers from each colony

272 were divided into two groups, with the first group of workers being fed a holidic

273 artificial diet3 containing 0.01% each of tetracycline, rifampicin, and kanamycin, and

274 the second group consuming a holidic artificial diet for three weeks (the pre-trial

275 period). At that point, antibiotic-treatment groups were then switched to the modified

276 holidic diet (trial period), with only non-essential amino acids and the total amino acid

277 concentration the same as the complete holidic diet, plus glutamate containing either

278 isotope label, in addition to the same antibiotic mixture. At the same time, workers from

279 the treatment without antibiotics were split into two groups, both consuming the same

280 modified diet as their antibiotic-treated counterparts, with glutamate at either a standard

281 isotope ratio (treatment 2) or carrying the heavy label (treatment 3). No antibiotics

282 were added to these latter diets.

283

284 After four to five weeks of feeding during the trial period, ant hemolymph was extracted

285 from surviving workers. Hemolymph was harvested from decapitated ants using

286 borosilicate glass needles pulled from microcapillary tubes (1/0.58 OD/ID mm, World

287 Precision Instruments, Sarasota, FL) to capture droplets exuding from the posterior

288 opening of the head capsule and from the anterior opening of the mesosoma. Depending

289 on the number of ants available, there were two or three replicates for each colony and

290 treatment, each consisting of pooled hemolymph from 3-10 workers (see Table S12 for

291 details). Hemolymph was added to 10 μL of molecular grade water, and samples were frozen

292 at -80°C immediately after collection.

293

294 Worker survival curves for the 13C and 15N experiments (trial periods) were plotted and

295 data were analyzed using Cox regression analysis.

296

297 qPCR and amplicon 16S rRNA sequencing to estimate antibiotic efficacy

298 Quantitative PCR with universal 16S rRNA primers was used to confirm that the

299 antibiotic treatments drastically reduced bacterial loads in gut communities of C.

300 varians (Fig. S1). Gasters of those ants were used for DNA extraction and all other

301 DNA isolation procedures were the same as mentioned below. 16S rRNA gene copy

302 concentration was estimated using qPCR with PerfeCTa SYBR Green FastMix (Quanta

303 Biosciences, Gaithersburg, MD, USA) and eubacterial primers 515F (5’-

304 GTGCCAGCMGCCGCGGTAA-3’) and 806R (5’-GGACTACHVGGGTWTCTAAT-

305 3’) at 200 nM each, on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad,

306 Hercules, CA, USA). The PCR program consisted of initial denaturation at 94°C for 3

307 minutes; 40 cycles of 94°C for 45s, 50°C for 60s, 72°C for 60s, and plate read at the

308 end of the extension step. Melting curve analysis was applied at the end of these 40

309 cycles, with temperatures rising from 55°C to 95°C with 0.5°C increments and plate

310 reads after 5s incubation at each temperature. Ten-fold dilution series of a DNA sample

311 extracted from ten pooled gut samples of C. varians were used to build standard curves

312 for estimation of relative bacterial abundance in ants under different dietary treatments.

313 Four biological replicates per dietary treatment in each colony were chosen for

314 quantitative PCR. Three technical replicates per standard curve sample and two

315 technical replicates per biological replicate were performed for each dietary

316 experiment. The relative bacterial abundance was determined by dividing bacterial 16S

317 rRNA copy number estimates by one tenth of the total bacterial 16S rRNA copy

318 number estimated for the pooled DNA sample from ten guts that was used to construct

319 standard curves.
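
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names, the least-squares standard-curve fit, and the example numbers are assumptions introduced here. Only the final division by one tenth of the pooled-sample estimate is taken from the text.

```python
def fit_standard_curve(log10_quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept.

    Hypothetical helper: the text states that ten-fold dilutions were used
    to build standard curves but does not describe the fitting procedure.
    """
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate a quantity from a Cq value."""
    return 10 ** ((cq - intercept) / slope)


def relative_abundance(sample_copies, pooled_copies, n_pooled=10):
    """Divide a sample estimate by 1/n_pooled of the pooled-standard estimate."""
    return sample_copies / (pooled_copies / n_pooled)


# Illustrative numbers only (not data from the study).
slope, intercept = fit_standard_curve([0, -1, -2, -3], [15.0, 18.3, 21.6, 24.9])
example_ratio = relative_abundance(5.0e4, 1.0e6)
```

Under this convention, a value of 1.0 would correspond to the estimated bacterial load of an average gut in the pooled standard.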

320

321 DNA samples from individual workers in the 15N glutamate experiment were sent to

322 Argonne National Laboratory for Illumina amplicon sequencing of the V4 region of

323 bacterial 16S rRNA. Analyses of sequences proceeded using previously published

324 quality control and filtering protocols4. A 97% OTU table was generated (Table S13),

325 and the average relative abundance of each OTU was obtained across ants from the

326 same treatment. Averages were then plotted in conjunction with qPCR data from the

327 same treatment (Fig. S1), showing how antibiotics had altered the composition in

328 addition to the quantities of gut microbiota.
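
The per-treatment averaging of OTU relative abundances can be illustrated as follows. This is a sketch under stated assumptions: the dictionary-based input layout and the function names are hypothetical, and the read counts are invented for illustration rather than taken from Table S13.

```python
def relative_abundances(read_counts):
    """Convert one library's OTU read counts into fractions summing to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {otu: count / total for otu, count in read_counts.items()}


def mean_relative_abundance(libraries):
    """Average each OTU's relative abundance across libraries from one treatment.

    OTUs missing from a library are treated as having zero abundance there.
    """
    per_library = [relative_abundances(lib) for lib in libraries]
    otus = sorted({otu for lib in per_library for otu in lib})
    return {
        otu: sum(lib.get(otu, 0.0) for lib in per_library) / len(per_library)
        for otu in otus
    }


# Two hypothetical ants from the same treatment (illustrative counts only).
treatment = [{"OTU1": 80, "OTU2": 20}, {"OTU1": 60, "OTU2": 40}]
averages = mean_relative_abundance(treatment)
```

Normalizing within each library before averaging keeps a deeply sequenced ant from dominating the treatment mean.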

329

330 Amino acid analysis from ant hemolymph by gas-chromatography-mass

331 spectrometry (GC-MS)

332 Enrichment of amino acids in ant hemolymph was measured at the Metabolic Tracer

333 Resource at the University of Pennsylvania. Approximately 5 μL of each hemolymph-

334 water mixture was acidified with 1 ml of 1N acetic acid and run over AG 50W-X8

335 cation exchange resin. Resin was washed three times with milli-Q water and free amino

336 acids eluted using 3N ammonium hydroxide. Samples were dried in a rotary vacuum

337 evaporator and amino acids converted to their heptafluorobutyryl isobutyl ester

338 derivatives5. Derivatized amino acids were injected onto an Agilent 7890A/5975C

339 Series gas chromatograph/mass spectrometer (GC/MS) (Agilent Technologies, Santa

340 Clara, CA) operated in the negative chemical ionization mode and separated using a

341 DB5-MS column. The injection port temperature was 250°C. The GC column

342 temperature was maintained at 80°C for 1 minute, increased to 150°C (10°C /min) and

343 then to 300°C (20°C /min). It was then held at 300°C for 1 minute. Amino acid peaks

344 were identified by retention time, which was confirmed using purified standards. Peaks

345 that could not be definitively identified were not measured.
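
As a quick check on the oven program described above, its nominal duration can be tallied step by step. The sketch below assumes linear ramps and no extra equilibration or post-run time, neither of which is stated in the text; the function name and tuple layout are hypothetical.

```python
def oven_program_minutes(start_temp_c, initial_hold_min, ramps):
    """Nominal duration of a GC oven program.

    ramps is a list of (target_temp_c, rate_c_per_min, hold_min) tuples,
    applied in order starting from start_temp_c.
    """
    total = initial_hold_min
    current = start_temp_c
    for target, rate, hold in ramps:
        total += abs(target - current) / rate + hold  # ramp time plus hold
        current = target
    return total


# 80°C for 1 min, then 10°C/min to 150°C, then 20°C/min to 300°C, hold 1 min.
run_time = oven_program_minutes(80.0, 1.0, [(150.0, 10.0, 0.0), (300.0, 20.0, 1.0)])
```

With these settings the nominal program length comes to 16.5 minutes (1 + 7 + 7.5 + 1).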

346

347 Abundance data of 15N/13C-labeled essential amino acids in ant hemolymph samples

348 were transformed with a logit transformation (ln(p/(1-p))) before statistical analysis.

349 All logit transformed data were checked for normality by Shapiro-Wilk W-test. Normal

350 data were compared using one-way ANOVA with dietary treatment as a factor and

351 levels of 15N or 13C-labeled amino acids as dependent variables, followed by Tukey’s

352 post-hoc tests. Non-normal data were analyzed by Kruskal-Wallis tests followed by

353 multiple pairwise comparisons using the Wilcoxon rank sum test (see Table S3). All

354 statistical tests were performed using R version 3.3.2.
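
The logit transformation named above, ln(p/(1-p)), can be written out as a short function. This is an illustrative sketch in Python rather than the R code used for the analyses; the function names are hypothetical, and the percent-to-proportion helper assumes that abundances are recorded as percentages.

```python
import math


def logit(p):
    """Return ln(p / (1 - p)) for a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    return math.log(p / (1.0 - p))


def logit_from_percent(percent):
    """Assumed convenience wrapper for values recorded as percentages."""
    return logit(percent / 100.0)


midpoint = logit(0.5)            # equal odds map to zero
low_label = logit_from_percent(20.0)
```

The transformation stretches proportions near 0 and 1 onto an unbounded scale, which is why it is often applied before tests that assume normality.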

355

356 DNA preparation for C. varians metagenomics

357 Ten adult C. varians workers from each of two colonies in the Florida Keys were used

358 to create two DNA pools for metagenome sequencing. Adult workers were washed in

359 70% ethanol and sterile water before dissection. Ant guts were dissected with sterile

360 forceps under a compound light microscope. Between each individual dissection,

361 forceps were rinsed with a 6% bleach solution and then with sterile water. The dissected

362 mid- and hind- guts were individually immersed in 180 μL enzymatic lysis buffer

363 containing lysozyme (20 mg/ml). After grinding with sterile pestles, samples were

364 incubated for 30 min at 37°C. Extractions then proceeded according to the protocol for

365 gram-positive bacteria with the Qiagen DNeasy Kit (Qiagen, Valencia, CA). Pooled

366 genomic DNA from the guts of ten workers per colony was used as source material for

367 the two Illumina HiSeq metagenome libraries (colony PL005; colony PL010).

368

369 DNA extraction from non-C. varians ants for metagenomics

370 DNA from dissected guts of Cephalotes ants other than C. varians was extracted

371 according to the protocol of Sanders et al 2014 6, using pools of 10 dissected guts per

372 colony (as opposed to single guts, as for C. varians). Briefly, dissected guts preserved

373 in RNAlater were diluted ~1:1 in sterile water (to decrease solution density and

374 dissolve any precipitated salts), and spun to pellet the biological material. The

375 supernatant was removed and replaced with lysis buffer TLS-C (MPBio, inc), then

376 vortexed to resuspend. Resuspended material was lysed with

377 Phenol:Chloroform:Isoamyl alcohol (pH 8) and sterile beads (Lysis Matrix A,

378 MPBio) on a MPBio FastPrep-20 bead beater. The aqueous phase was then column-

379 purified through Qiagen DNeasy Blood and Tissue extraction columns, and

380 concentrated by isopropanol precipitation. Full methodological details are published

381 elsewhere6.

382 DNA extraction from cultured bacteria

383 High molecular weight DNA from cultured bacteria isolated from C. varians and C.

384 rohweri was extracted using Qiagen Genomic Tip 20/G columns, following the

385 manufacturer's recommendations for bacterial cultures.

386

387 Genome and metagenome sequencing, assembly and annotation

388 For shotgun sequencing of metagenomes of C. varians and isolates derived from C.

389 varians and C. rohweri, DNA was sheared to 400bp using a Covaris S220 sonicator.

390 Sheared DNA was end-repaired and ligated to indexed Illumina-compatible sequencing

391 adapters (Bioo Scientific, Inc) using the KAPA low-throughput Illumina-compatible

392 library preparation kit (KAPA biosystems, Inc). Fragments of the two prepared libraries

393 were size selected using double-ended SPRI bead-based size selection following the

394 KAPA protocol. After this selection, libraries were amplified for six cycles using KAPA

395 high-fidelity polymerase and then checked for quality using an Agilent Bioanalyzer.

396 The two prepared libraries as well as two from Cephalotes larval samples were pooled

397 with other indexed samples, combining for an estimated 40% of the total molar fraction

398 in the Illumina sequencing lane, and then sequenced at the Harvard Biopolymers

399 Facility using paired-end 150 bp reads on an Illumina HiSeq 2500 instrument.

400

401 Sequence libraries for non-C. varians-derived metagenomes and isolates were prepared

402 using the same Covaris shearing step as above, but on an Apollo 324 automated library

403 preparation robot using the PrepX ILM DNA kit (IntegenX, Inc) following

404 manufacturer’s recommendations. These libraries were PCR-amplified using the same

405 protocol as above, and amplified libraries were size-selected using the double-ended

406 SPRI bead-based size selection protocol on the Apollo 324 instrument. The resulting

407 libraries were sequenced using paired-end 100bp chemistry on an Illumina HiSeq 2000

408 instrument.

409

410 Metagenome sequences were trimmed for quality and adapters using Trimmomatic7.

411 The quality trimmed reads were then combined and assembled with IDBA-UD 1.1.1

412 using k values of 20, 40, 60, 80, and 100 (ref. 8). The assembled data were run through

413 QUAST9 to calculate assembly statistics (Table S4).

414

415 Scaffolds of 18 metagenomes and 14 isolates, and coverage information of

416 metagenomic scaffolds were uploaded to the Integrated Microbial Genomes with

417 Microbiome Samples Expert Review (IMG/M-ER)10. Assignment of phylogenetic

418 lineages was initially attempted in IMG/M-ER based on USEARCH similarity against

419 all public reference genomes in IMG and the KEGG database. However, some scaffolds

420 could not be assigned to bins while others were classified into bins not matching taxa

421 known to be prevalent amongst the Cephalotes gut microbiota. To obtain more accurate

422 information of phylogenetic binning, all scaffolds with length longer than 1000 bp from

423 18 metagenomes were compared to eight reference genomes of isolated gut bacteria

424 from C. varians (GOLD Analysis Project ID: Ga0064586, Ga0064593, Ga0064594,

425 Ga0064595, Ga0064585, Ga0064596, Ga0105007) and a Rhizobiales bacterium

426 genome (accession number CP015625) using BLASTX with an e-value of 1e-15,

427 identity of 70% and maxhits of 1. A scaffold was assigned to a bacterial bin if over 50%

428 of all best BLASTX hits belonged to a single reference bacterial genome and at least

429 50% of the scaffold sequence was covered by the aforementioned BLASTX hits. If

430 phylogenetic assignment by IMG/M-ER of a certain scaffold did not match reference

431 genome based results, this specific scaffold was assigned to the phylogenetic group of

432 the appropriate cultured isolate as ascertained through this BLASTX approach.

433 Annotation of gene content was also performed by IMG/M-ER. N-metabolic pathways

434 of gut microbiota (Fig. 6; Figs. S6 & S9; Fig. S13-S14) were built manually, using

435 KEGG and Metacyc11 as guides (Tables S5, S7, S10). All genes involved in the N-

436 metabolic pathways of C. varians gut microbiota were added into a functional cart in

437 IMG, and the “Profile & Alignment” tool in the IMG function cart was used to search

438 those genes in non-C. varians-derived metagenomes. We present the nitrogen recycling

439 and nitrogen provisioning gene presence/absence data in different bacterial taxa along

440 with the Cephalotes host phylogeny12 in Figure S7. Gray bars were used in this figure

441 to obscure cells likely affected by insufficient coverage for that taxon in the given

442 metagenome (i.e. when the total scaffold length within one metagenome was less than 50% of

443 the total length for this taxon in the draft genome assembled for C. varians PL010;

444 see below).
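
The reference-based scaffold assignment rule in this paragraph (over 50% of best BLASTX hits to a single reference genome, and at least 50% of the scaffold covered by those hits) can be sketched as below. This is one reading of the rule, not the authors' pipeline: the input layout, the function names, and the choice to compute coverage from hits to the winning reference only are assumptions made for illustration.

```python
from collections import Counter


def covered_length(intervals):
    """Total bases covered by 1-based inclusive (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def assign_bin(best_hits, scaffold_length, min_hit_fraction=0.5, min_coverage=0.5):
    """Return the reference name a scaffold is assigned to, or None.

    best_hits: list of (reference_name, start, end) tuples, one per best
    BLASTX hit, with coordinates on the scaffold (hypothetical layout).
    """
    if not best_hits or scaffold_length <= 0:
        return None
    counts = Counter(ref for ref, _, _ in best_hits)
    top_ref, top_count = counts.most_common(1)[0]
    # "over 50% of all best BLASTX hits": strict inequality.
    if top_count / len(best_hits) <= min_hit_fraction:
        return None
    # Assumption: coverage is measured using hits to the winning reference.
    spans = [(s, e) for ref, s, e in best_hits if ref == top_ref]
    # "at least 50% of the scaffold sequence": inclusive threshold.
    if covered_length(spans) / scaffold_length < min_coverage:
        return None
    return top_ref


# Illustrative scaffold with hypothetical hits (not data from the study).
hits = [("Burkholderiales", 1, 400), ("Burkholderiales", 350, 700), ("Rhizobiales", 800, 900)]
assigned = assign_bin(hits, scaffold_length=1000)
```

Scaffolds that fail either threshold return None here, which corresponds to leaving them unassigned by this approach.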

445

446 Sequence fragments of 16S rRNA genes with length longer than 200 bp were extracted

447 from all 18 metagenomic libraries and 14 cultured isolate genomes. Closest relatives of

448 each 16S rRNA sequence were identified in BLASTn searches and the top one to three

449 BLAST hits were taken for each sequence. If the top hit was from a non-ant source, this

450 sequence alone was selected. Up to two ant-associated sequences were selected, and

451 the top non-ant BLAST hit was always selected. Beyond the 16S rRNA sequences from

452 metagenomes and cultured isolates, and those from BLASTn hits, we also selected one

453 to five sequences with close relatedness to each of the major Cephalotes-specific clades

454 (based on phylogenetic placement in prior studies), along with two Mollicutes

455 sequences used as outgroups. Finally, we included a partial 16S rRNA sequence from

456 an allantoin-dependent, urea-producing Burkholderiales derived from the sister ant

457 genus of Cephalotes, Procryptocerus. Sequences were checked for chimeras through

458 DECIPHER14 and chimera-filtered sequences were uploaded to the Ribosomal

459 Database Project website for sequence alignment15. The alignment was then uploaded

460 to the CIPRES web portal for maximum likelihood phylogenetic analysis using the

461 RAxML-HPC2 on XSEDE (version 8.2.4)16.

462

463 Amino acid sequence fragments encoded by ureC, uraH, and puuD from N-recycling

464 pathways were extracted from each metagenomic and genomic dataset. Related

465 homologs were identified in BLASTp searches and the top one to two hits for each sequence

466 were selected. Sequences were aligned by ClustalW17. The alignment was uploaded to

467 the CIPRES web portal for maximum likelihood phylogenetic analysis with

468 bootstrapping using the RAxML-HPC BlackBox 16.

469

470 Genome binning using Anvi’o in conjunction with CONCOCT

471 We used the Anvi’o metagenome visualization and annotation pipeline (version 1.2.3)18

472 in conjunction with the CONCOCT differential coverage-based binning program19 to

473 bin assembled contigs into putative microbial genomes. These putative genomes were

474 then manually refined to maximize completeness and minimize redundancy according

475 to panels of single-copy marker genes as reported by Anvi’o. Briefly, reads from each

476 of the four pooled Cephalotes varians metagenomic libraries were mapped against the

477 assembled contigs using Bowtie220. These read profiles were then loaded into an Anvi’o

478 database and used for differential coverage binning with CONCOCT. All steps in this

479 process (with the exception of manual bin refinement) were automated using the

480 Snakemake workflow management software21; pipeline rules and configuration

481 information sufficient to reproduce this analysis are available upon request.

482

483 Amino acid sequence fragments encoded by seven protein-coding genes (rplB, rplA,

484 rplC, rpsB, rpsC, rpsE and tsf) were extracted from each isolate genomic and draft

485 genomic dataset. The concatenated alignment was uploaded to the CIPRES web portal

486 for maximum likelihood phylogenetic analysis with bootstrapping using the RAxML-

487 HPC BlackBox 16.
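
Building a concatenated alignment from per-gene alignments can be sketched as follows. This is an illustrative sketch only: the text does not say how the concatenation was done, so the function, the gap-padding of missing genes, and the toy sequences are all assumptions; only the seven gene names come from the text.

```python
# Gene names from the text; the toy example below uses only two of them.
GENES = ["rplB", "rplA", "rplC", "rpsB", "rpsC", "rpsE", "tsf"]


def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments ({gene: {taxon: aligned_seq}}) into one
    sequence per taxon. Taxa missing a gene are padded with gaps
    (an assumption made here for illustration)."""
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments, for illustration only.
toy = {
    "rplB": {"isolate_A": "MK-L", "isolate_B": "MKTL"},
    "tsf": {"isolate_A": "AV"},
}
concatenated = concatenate_alignments(toy, ["rplB", "tsf"])
```

In this toy case `isolate_B` lacks `tsf`, so its concatenated sequence ends in two gap characters.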

488

489 Visualization of taxonomic composition of metagenomes based on coverage

490 and %GC

491 Quality and adapter-trimmed reads were mapped back to metagenome scaffolds using

492 BWA 0.7.1222 with default parameters. A Perl script sam_len_cov_gc_insert.pl

493 (https://github.com/sujaikumar/assemblage) was used to estimate length, %GC content

494 and average depth for each scaffold from the samfile generated by BWA. This GC-

495 coverage file was combined with a customized file containing the information of

496 taxonomic assignment for each scaffold using a python script make_blobology_file.py

497 (http://static.xbase.ac.uk/files/results/nick/make_blobology_file.py). Taxon-annotated

498 GC-coverage (TAGC) plots were then generated from the scaffolds using a customized

499 python script to visualize the contributions of different bacterial bins to the metagenome

500 assemblies.
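
The per-scaffold quantities behind a TAGC plot (length, %GC, average depth, taxon) can be sketched as below. This does not reproduce the cited Perl or Python scripts: the function names, the input dictionaries, and the example values are hypothetical, and depth is simplified here to total mapped bases divided by scaffold length.

```python
def gc_percent(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T) of a scaffold."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    called = sum(counts.values())
    if called == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / called


def mean_depth(mapped_bases, scaffold_length):
    """Average read depth: total mapped bases divided by scaffold length."""
    return mapped_bases / scaffold_length


def tagc_table(scaffolds, mapped_bases, taxonomy):
    """Combine length, %GC, depth and taxon into one row per scaffold."""
    rows = []
    for name, seq in scaffolds.items():
        rows.append({
            "scaffold": name,
            "length": len(seq),
            "gc": gc_percent(seq),
            "depth": mean_depth(mapped_bases[name], len(seq)),
            "taxon": taxonomy.get(name, "unassigned"),
        })
    return rows


# Hypothetical inputs, for illustration only.
scaffolds = {"scaffold_1": "GGCCAATT", "scaffold_2": "GCGCNNNN"}
mapped_bases = {"scaffold_1": 80, "scaffold_2": 16}
taxonomy = {"scaffold_1": "Rhizobiales"}
rows = tagc_table(scaffolds, mapped_bases, taxonomy)
```

Plotting `gc` against `depth` with points colored by `taxon` would then give the kind of view described above.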

501

502 Fluorescence in situ hybridization

503 We investigated the localization of bacteria within the digestive tract of ants using

504 fluorescence microscopy. Guts dissected from workers of Cephalotes sp. JGS2370 were

505 fixed in 4% formaldehyde in PBS buffer for 2h at room temperature, then dehydrated

506 using an ethanol gradient, and stored in 95% ethanol. After rehydration using PBS

507 buffer with 0.03% TritonX-100, they were washed three times for 10 minutes with

508 hybridization solution containing 30% formamide, 0.01% SDS, 0.9 M NaCl and 0.02

509 M Tris-HCl (pH 8.0). Hybridization was performed overnight at 37°C in hybridization

510 solution with the addition of the universal eubacterial probe EUB338 (5’-

511 GCTGCCTCCCGTAGGAGT-3’) labeled with Cy3 at 100 nM, as well as DAPI as a

512 counterstain. After washing with PBS, the specimens were imaged using a Leica

513 M165FC fluorescent stereo microscope. Fluorescent microphotographs taken using the

514 blue and green excitation filters were merged with a photograph taken under white

515 light. The detailed protocol is provided elsewhere23.

516

517 Stable isotope data

518 Data were extracted from a prior N-isotope profiling study24 using graphical tools, as

519 described and summarized previously. We also used data from supplementary files of

520 another study that profiled Cephalotes N-isotopes25. For each locale where isotope data

521 had been generated previously, we plotted delta 15N values for Cephalotes next to those

522 for other Myrmicinae ants (the subfamily containing Cephalotes) and ants from

523 Camponotus (from the subfamily Formicinae; this genus harbors N-recycling

524 symbionts and feeds somewhat low on the food chain). Also plotted, separately for each

525 locale, were delta 15N values for sympatric plants, sap-feeding herbivores, leaf-chewing

526 herbivores, and predators.

527

528 Assays to measure urea production (via allantoin) and urea degradation (into

529 ammonia)

530 Bacteria isolated from the guts of Cephalotes or Procryptocerus ants were grown in

531 trypticase soy broth (TSB) or TSB supplemented with 250 µM allantoin (Sigma). They

532 were prioritized for genome sequencing based on their similarity at 16S rRNA to

533 previously sampled bacteria. A similar rationale was used to prioritize them for in vitro

534 assays.

535

536 As a proxy for the uric acid degradation pathway, we measured whether selected

537 isolates from the Burkholderiales (Fig. 6; Table S11) could produce urea in vitro and

538 whether this production was increased by allantoin (suggesting the presence of at least

539 part of the uric acid→urea pathway). Tubes of TSB and TSB with allantoin were

540 inoculated from a liquid culture of the chosen isolates to an initial OD(A600) of

541 approximately 0.05. Uninoculated control TSB and TSB + allantoin tubes were also

542 incubated along with the inoculated samples. Sample aliquots (500 µL) were collected

543 from each tube at various time points. The bacteria in the inoculated samples were

544 pelleted by centrifugation at 4500 × g for 10 minutes, and the liquid portion of inoculated

545 and uninoculated samples was stored at -20°C until analysis.

546

547 Urea concentrations were measured using a modified Jung assay 26. Briefly, a solution

548 of equal parts o-phthalaldehyde and primaquine bisphosphate (Sigma) was prepared,

549 and 200 µL of this working solution was combined with 50 µL of samples in a 96-well

550 assay plate. Standard concentrations of urea in TSB and TSB + 250 µM allantoin were

551 also tested. The reaction of o-phthalaldehyde and primaquine bisphosphate with urea

552 caused a color change, which was measured at 430 nm using a BioTek® Synergy H1

553 spectrophotometer. The absorbance value of the uninoculated TSB or TSB + 250 µM

554 allantoin blank was subtracted from each standard and sample, then concentration was

555 calculated from the standard curve, with concentrations for corrected values below that

556 of the lowest standard (0 µM) being treated as 0 µM. The concentration of the un-

557 inoculated samples at each time was subtracted from the corresponding concentrations

558 of inoculated samples to calculate the amount of urea produced by the isolate in each medium

559 type. Average urea production at each time point was calculated and normalized by

560 subtraction of the 0 hour average. Data were analyzed with SigmaPlot software (Systat,

561 San Jose, CA) using a two-way repeated-measures ANOVA and Holm-Sidak test.

562 Comparisons were considered statistically different if p ≤ 0.05.
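
The urea calculation described above can be sketched step by step. This is a minimal illustration under stated assumptions: a linear standard curve fitted by least squares is assumed (the text does not specify the curve form), and all function names and numeric values are hypothetical.

```python
def fit_standard_curve(concentrations, corrected_absorbances):
    """Least-squares line through blank-corrected standards
    (linearity is an assumption made for this sketch)."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(corrected_absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, corrected_absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absorbance_to_urea(a430, blank, slope, intercept):
    """Subtract the medium blank, convert to µM, and treat values
    below the 0 µM standard as 0 µM."""
    corrected = a430 - blank
    return max((corrected - intercept) / slope, 0.0)


def urea_produced(inoculated, uninoculated):
    """Subtract the uninoculated control at each time point, average the
    replicates, then normalize by subtracting the 0 hour average."""
    means = {}
    for hours, replicates in inoculated.items():
        net = [conc - uninoculated[hours] for conc in replicates]
        means[hours] = sum(net) / len(net)
    return {hours: mean - means[0] for hours, mean in means.items()}


# Hypothetical standards and readings, for illustration only.
slope, intercept = fit_standard_curve([0, 50, 100, 200], [0.0, 0.1, 0.2, 0.4])
sample_um = absorbance_to_urea(0.35, blank=0.05, slope=slope, intercept=intercept)
below_range_um = absorbance_to_urea(0.04, blank=0.05, slope=slope, intercept=intercept)
production = urea_produced({0: [10.0, 12.0], 24: [60.0, 64.0]}, {0: 1.0, 24: 2.0})
```

Keeping each correction in its own function mirrors the order of operations in the text: blank subtraction, standard-curve conversion, control subtraction, and normalization to the 0 hour average.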

563

564 Bacteria from a range of taxa spanning multiple Cephalotes hosts (Fig. 6) were used in

565 assays measuring ammonia production from urea. We used a qualitative method,

566 where bacteria were inoculated into Rapid Urea Broth (BD, Sparks, MD) containing

567 the pH indicator phenol red. Isolates were considered positive for urea degradation if

568 the color of the medium changed from red to bright purple.

569

570 For isolates used in these assays, we generated full-length 16S rRNA sequences with

571 Sanger sequencing. Top BLAST hits and representative sequences from particular

572 clades were downloaded from NCBI. These represented bacteria that had been

573 previously found through culture-independent means (i.e. in vivo), typically through

574 shallow sampling of clone libraries. Their identity or near identity to our isolates from

575 C. varians and C. rohweri indicates that our cultured isolates are abundant core

576 microbes. Maximum likelihood phylogenies, with bootstrapping, were conducted in the

577 software package SeaView after sequence alignment (through the Muscle algorithm) in

578 this same program.

579

580 Supplementary Results

581 Colony fragment nutritional experiments—antibiotic treatments

582 Subsets of ants for the below-described N-upgrading and N-recycling experiments

583 were treated with antibiotics to remove or suppress gut bacteria. Treatment efficacy

584 was validated through significant reductions in 16S rRNA copy number compared to

585 unexposed ants reared on the same diets (Fig. S1). While the magnitude of this

586 suppression varied across treatments, bacterial titers were always significantly lower

587 under antibiotic exposure. Amplicon sequencing of the V4 region of 16S rRNA

588 revealed that bacteria remaining after treatment were almost entirely core symbionts

589 from the Rhizobiales. The absence of other core taxa and dominance by this one

590 group drew a strong contrast with untreated gut communities, which showed greater

591 richness and evenness, and an overall composition of core symbionts resembling that

592 seen in prior studies.

593

594 The effects of antibiotics on worker ant survival differed across the 13C-glutamate

595 labeling and 15N-urea labeling experiments (Fig. S2). In the latter case, Cox

596 regression statistics revealed harmful effects of antibiotic treatment on C. varians

597 survival for two of three colonies (Wald statistic = 6.89, df = 1, P = 0.0087 for colony

598 PL215A; Wald statistic = 22.67, df = 1, P = 1.924e-06 for colony PL217; Wald statistic

599 = 3.67, df = 1, P = 0.0553 for colony PL231). In contrast, antibiotic treatment had no

600 significant impact on survival in the glutamate feeding experiments for any of three

601 colonies (Wald statistic = 2.4, df = 1, P = 0.1214 for colony PL207; Wald statistic =

602 0.29, df = 1, P = 0.5888 for colony PL210; Wald statistic = 0, df = 1, P = 0.9882 for

603 colony PL231).
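
As a consistency check on the statistics above, a Wald statistic with df = 1 can be converted to a P value from the upper tail of a chi-square distribution with one degree of freedom, which equals erfc(sqrt(W/2)). The sketch below is not the authors' analysis code (the Cox models themselves are not refit here), and the function name is hypothetical.

```python
import math


def wald_p_value_df1(wald_statistic):
    """Upper-tail P value of a chi-square distribution with 1 degree of
    freedom, evaluated at the given Wald statistic."""
    return math.erfc(math.sqrt(wald_statistic / 2.0))


# Wald statistics reported for the 15N-urea experiment.
urea_wald = {"PL215A": 6.89, "PL217": 22.67, "PL231": 3.67}
urea_p = {colony: wald_p_value_df1(w) for colony, w in urea_wald.items()}
```

The resulting values agree with the reported P values to within the rounding of the Wald statistics.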

604

605 Fine-scale metagenome binning from C. varians colony PL010: Why did N-

606 recycling genes appear absent from Cephaloticoccus and the predicted uric acid

607 degrading Burkholderiales with relatedness to isolate Cv33a?

608 Incomplete sequencing may explain the apparent lack of urease genes in the

609 Opitutales (Cephaloticoccus) bin. Urease genes were indeed present in the PL010

610 metagenome, classifying to the Opitutales order (Table S5; Fig. S6). Furthermore,

611 most genes involved in N-metabolism were found on just a single Opitutales-binned

612 scaffold from this metagenome (Table S5), suggesting the presence of just one

613 symbiont strain from this order within this colony. The simplest resulting explanation

614 is that orphaned urease genes (i.e. left over after draft genome assembly) belong to the

615 Opitutales strain with the assembled draft genome and that their exclusion is a

616 methodological artifact. Similarly, while the draft genome from one Rhizobiales

617 strain encoded all necessary urease genes, a Burkholderiales strain close to the

618 cultured uric acid recycling Cv33a isolate lacked some genes in the uric acid pathway,

619 in spite of all pathway genes having been found in the Burkholderiales bin within the

620 PL010 metagenome (Table S10; Table S5; Fig. S6). This too may reflect difficulties

621 in assembling complete genomes from complex metagenomic datasets. Regardless,

622 findings of universal N-recycling in all in vitro assayed isolates from both

623 Cephaloticoccus (degrading urea) and the Burkholderiales Cv33a clade (converting allantoin to

624 urea, a proxy for the end of the uric acid pathway) suggest role conservation in these

625 groups.

626

627 A summary of sequenced genomes from cultured isolates

628 Bacteria from several major taxa, including Pseudomonadales, Opitutales,

629 Rhizobiales, Xanthomonadales, and Burkholderiales were successfully cultured from

630 macerated worker guts. Strains were prioritized for sequencing based on preliminary

631 assessments of 16S rRNA gene identity or near identity to known core

632 symbionts (Fig. S5). In total, we sequenced fourteen bacterial genomes using Illumina

633 HiSeq (n=13) or PacBio (n=1) technology. Genome sizes ranged from 1.9-3.4 Mb

634 with %GC ranging from 53.4-62.4% (Table S6). On average, genomes encoded 2615

635 protein-coding genes. Full-length 16S rRNA genes from these genomes were nested

636 within known clades of cephalotine core gut symbionts generated from 16S rRNA in

637 our shotgun metagenomic analyses (Fig. S5).

638

639 Details on gene composition are found in the main text. But we note here the unique

640 nature of the JR021-5 Rhizobiales genome. Found in a disparate Rhizobiales clade

641 (i.e. not the primary grouping; Fig. S5), this bacterium shows resemblance to those

642 found in Cephalotes worker crops and in larvae. Its genome lacked some of the key

643 N-metabolism genes found in most others. For example, it was the only one of 14

644 cultured isolates to lack the glutamate dehydrogenase gene (gdhA), which converts ammonia

645 into glutamate. It was also a strong outlier in its capacities to make amino acids,

646 making only six out of twenty. This genome also did not encode N-recycling genes.

647

648 Our cultured isolates are highly similar to previously sampled core symbionts.

649 In testing whether genetic signatures reflect actual N-recycling capacities, we

650 performed a series of in vitro assays. For symbionts of host species studied

651 extensively through prior work (i.e. C. varians and C. rohweri), 16S rRNA sequences

652 of the focal isolates were highly similar if not identical to those of core symbionts

653 obtained through culture-independent efforts. Isolates from other cephalotine host

654 species, subjected to little or no prior symbiont sequencing, had top BLAST hits to

655 cephalotine-specific bacteria (Fig. 6). These findings combine to illustrate the natural

656 relevance of our in vitro work, i.e. we have assayed dominant core gut symbionts or

657 their very close relatives.

658

661 References

1. Hardy R, Burns R, Holsten RD. Applications of the acetylene-ethylene assay for measurement of nitrogen fixation. Soil Biology and Biochemistry 5, 47-81 (1973).

2. Bentley BL. Nitrogen-fixation in termites - fate of newly fixed nitrogen. Journal of Physiology 30, 653-655 (1984).

3. Straka J, Feldhaar H. Development of a chemically defined diet for ants. Insectes Sociaux 54, 100-104 (2007).

4. Hu Y, et al. By their own devices: invasive Argentine ants have shifted diet without clear aid from symbiotic microbes. Molecular Ecology 26, 1608-1630 (2017).

5. MacKenzie SL, Tenaschuk D. Gas-liquid chromatography of N-heptafluorobutyryl isobutyl esters of amino acids. Journal of Chromatography A 97, 19-24 (1974).

6. Sanders JG, Powell S, Kronauer DJC, Vasconcelos HL, Frederickson ME, Pierce NE. Stability and phylogenetic correlation in gut microbiota: lessons from ants and apes. Molecular Ecology 23, 1268-1283 (2014).

7. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120 (2014).

8. Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420-1428 (2012).

9. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072-1075 (2013).

10. Markowitz VM, et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Research 42, D568-D573 (2014).

11. Caspi R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research 42, D459-D471 (2014).

12. Price SL, Powell S, Kronauer DJC, Tran LAP, Pierce NE, Wayne RK. Renewed diversification is associated with new ecological opportunity in the Neotropical turtle ants. Journal of Evolutionary Biology 27, 242-258 (2014).

13. Hu Y, Lukasik P, Moreau CS, Russell JA. Correlates of gut community composition across an ant species (Cephalotes varians) elucidate causes and consequences of symbiotic variability. Molecular Ecology 23, 1284-1300 (2014).

14. Wright ES, Yilmaz LS, Noguera DR. DECIPHER, a Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Appl Environ Microb 78, 717-725 (2012).

15. Cole JR, et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Research 37, D141-D145 (2009).

16. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313 (2014).

17. Thompson JD, Gibson TJ, Higgins DG. Multiple Sequence Alignment Using ClustalW and ClustalX. In: Current Protocols in Bioinformatics. John Wiley & Sons, Inc. (2002).

18. Eren AM, et al. Anvi'o: an advanced analysis and visualization platform for 'omics data. PeerJ 3, (2015).

19. Alneberg J, et al. Binning metagenomic contigs by coverage and composition. Nat Methods 11, 1144-1146 (2014).

20. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359 (2012).

21. Koster J, Rahmann S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 28, 2520-2522 (2012).

22. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595 (2010).

23. Łukasik P, et al. The structured diversity of specialized gut symbionts of the New World army ants. bioRxiv, (2016).

24. Davidson DW, Cook SC, Snelling RR, Chua TH. Explaining the abundance of ants in lowland tropical rainforest canopies. Science 300, 969-972 (2003).

25. Tillberg CV, Holway DA, LeBrun EG, Suarez AV. Trophic ecology of invasive Argentine ants in their native and introduced ranges. Proceedings of the National Academy of Sciences of the United States of America 104, 20856-20861 (2007).

26. Zawada RJX, Kwan P, Olszewski KL, Llinas M, Huang SG. Quantitative determination of urea concentrations in cell culture medium. Biochem Cell Biol 87, 541-544 (2009).
