Metagenomes of native and electrode-enriched microbial communities from the Soudan Iron Mine
Jonathan P. Badalamenti and Daniel R. Bond
Department of Microbiology and BioTechnology Institute, University of Minnesota - Twin Cities, Saint Paul, Minnesota, USA
Twitter: @JonBadalamenti @wanderingbond

Summary

Despite apparent carbon limitation, anoxic deep subsurface brines at the Soudan Underground Iron Mine harbor active microbial communities. To characterize these assemblages, we performed shotgun metagenomics of native and enriched samples. Following enrichment on poised electrodes and long read sequencing, we recovered from the metagenome the closed, circular genome of a novel Desulfuromonas sp. with remarkable genomic features that were not fully resolved by short read assembly alone. This organism was essentially absent in unenriched Soudan communities, indicating that electrodes are highly selective for putative metal reducers. Native community metagenomes suggest that carbon cycling is driven by methyl-C metabolism, in particular methylotrophic methanogenesis. Our results highlight the promising potential for long reads in metagenomic surveys of low-diversity environments.

Approach - compare metagenomes from native and electrode-enriched deep subsurface microbial communities

[Figure: sampling and sequencing workflow]
enriched: collect Soudan brine; inoculate electrode bioreactors (+0.24 V, 20 °C); enrich; harvest cells from electrodes; extract DNA; PacBio RS II long reads and Illumina HiSeq short reads; long read de novo assembly (HGAP), short read assembly (IDBA_UD), hybrid assembly; N4 binning; reconstruct complete genome(s)
unenriched: borehole brine; collection bottle; 3 µm prefilter; TFF (0.1 µm; filtrate, retentate); extract DNA; Illumina HiSeq short reads; read trimming and filtering; de novo assembly (IDBA_UD); N4 binning; assembled metagenomes
[Inset plot: current (µA/cm²) vs. time (d), 0-40 d]

Introduction

• The Soudan Iron Mine in northern Minnesota provides a readily accessible portal into carbon cycling in the deep terrestrial biosphere
• The deep subsurface harbors low-complexity microbial communities which drive fundamental biogeochemical cycles, including redox transformations of metals
• Previous 16S rRNA gene surveys of anoxic brines at Soudan show low relative abundance of putative metal reducers
• Electrodes operated at fixed potential eliminate disadvantages of enriching metal reducers on insoluble metal oxides as electron acceptors, such as variability in crystallinity, redox potential, and adsorption of other compounds
• Short read assembly-based shotgun metagenomics accesses even low abundance microbes in natural communities, but at the expense of assembly contiguity
• There is tremendous potential for long reads in improving metagenomic assemblies and downstream phylogenetic, bioinformatic, and biochemical predictions

Soudan Underground Iron Mine

• Tunnels carved at 7[illegible] m depth in Minnesota's oldest and deepest underground mine transect massive veins of hematite embedded in an Archaean (2.7 Gya) banded iron formation
• Anoxic calcium chloride brines seep from exploratory drill holes in an otherwise dry mine
• Despite carbon limitation, brines are rich in reduced metals, suggesting active microbial metabolism
• Soudan is actively maintained as a Minnesota state park, allowing year-round access to the deep terrestrial biosphere

[Map: northern Minnesota showing Int'l Falls, Bemidji, Duluth, Lake Superior, and Ontario, Canada; scale bar 40 km]

Results - long reads recover a closed genome from mine enrichments on electrodes

datasets
unenriched
• 25 M 2x1[illegible] bp Illumina reads
enriched on electrodes
• 24 M 2x150 bp Illumina reads
• 246 Mbp PacBio long reads (2 SMRT cells, avg. read length 7,514 bp)

[Figure: community composition, unenriched vs. enriched on electrodes. Legend: Alphaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Firmicutes, Euryarchaeota, Unclassified, other]

‘Ca. Desulfuromonas biwabikus DDH964’ complete genome
3,924,648 bp circular; 62.23% G+C; 3,633 CDS; 54 tRNA

[Figure: circular genome map, scale in Mb. Rings: genes on + strand, genes on − strand, c-type cytochromes, phage DNA/transposons, repeat sequences >500 bp, repeat sequences >2,000 bp, short read assembled contigs, mapped long read coverage (mapped reads > 6.5 kbp). Labeled loci: imcH, cbcL, hgcAB, rRNA 1, rRNA 2, CRISPR1, CRISPR2, DBIWA_3000, DBIWA_3008, DBIWA_3009, DBIWA_3010]

genome features
• glyoxylate shunt: malate synthase aceB and isocitrate lyase icl (DBIWA_3278, DBIWA_3279; map positions 3536800-3540000)
• nitrate respiration
• degradation of aromatics
• phosphonate transport
• evidence for conjugative mobile element transfer (tra genes) and chromosomal integration
• heavy metal resistance; mercury resistance and methylation

[Figure: PacBio subread filtering. Axes: read length (kbp) vs. # of subreads and Mbp > read length]
[Figure: assembly comparison - electrode enrichment. Cumulative length (Mbp) vs. contigs for IDBA_UD, SPAdes with Illumina only, SPAdes hybrid, and PBcR]

adding PacBio long reads improves metagenomic assembly

                    short reads only           short and long reads         long reads only
                    IDBA_UD      SPAdes        SPAdes hybrid  PBcR hybrid   HGAP
contigs             2,820        3,816         3,338          581           132
total length (bp)   16,849,449   15,149,004    15,417,681     5,660,852     4,451,391
N50 (bp)            38,339       16,604        25,567         58,773        3,932,815
L50                 107          187           80             27            1
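The N50 and L50 rows of the assembly comparison follow the usual definitions: sort contigs from longest to shortest, accumulate their lengths, and stop once the running total reaches half of the assembled bases. The sketch below is a minimal illustration of that calculation and is not the script used for the poster; the function name `n50_l50` and the toy contig lengths are assumptions made for the example.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths in bp.

    N50 is the length of the contig at which the running total of
    length-sorted contigs first reaches half of the assembly size;
    L50 is how many contigs were needed to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


if __name__ == "__main__":
    # Toy assembly (hypothetical lengths, not data from the poster).
    toy = [80, 70, 50, 40, 30, 20]
    print(n50_l50(toy))
```

When a single contig holds at least half of the assembled bases, L50 is 1 and N50 equals that contig's length, which is the pattern reported in the HGAP column.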

Results - native Soudan metagenomes

datasets
natural, unenriched Soudan Mine communities
• 24 M 2x1[illegible] bp Illumina reads per borehole
• assembled with IDBA_UD
• reconstruction of full-length 16S rRNA genes
• tetranucleotide binning

[Figure: community composition of boreholes DDH 932, DDH 944, and DDH 951; samples collected along redox gradient, from shallower/more oxidized to deeper/more reduced. Inner rings: Firmicutes, Bacteroidetes, Euryarchaeota, Alphaproteobacteria, Gammaproteobacteria, Deltaproteobacteria. Novel lineages are marked at Class, Order, and Family level. Labeled taxa include Methanolobus, Clostridiales, Halocella, Halanaerobium, Halanaerobiaceae, Dehalobacter, Desulfosporosinus, Desulfitibacter, Peptococcaceae, Rhodobacterales, Rhodobacteraceae, Roseovarius, Rhodovulum, Marinobacter, Alteromonadales, Halothiobacillus, Desulfopila, Sphaerochaeta, Bacteroidales, Demequina, and unclassified groups]

Genome comparison of the metal-reducing Geobacter/Desulfuromonas clade

[Figure: outgroup-rooted tree based on alignment of concatenated set of 40 conserved single-copy genes generated using Phylosift (v. 1.0.1); scale bar 0.08. Tips are annotated with multiheme* c-type cytochrome counts (*3 or more heme-binding motifs). Taxa: Geopsychrobacter electrodophilus, Desulfuromusa kysingii, Desulfuromonas thiophila, Desulfuromonas acetoxidans, Geoalkalibacter subterraneus Red1, Pelobacter carbinolicus, Desulfuromonas subbituminosa, Geoalkalibacter ferrihydriticus Z-0531, ‘Ca. Desulfuromonas biwabikus DDH964’, ‘Ca. Desulfuromonas soudanensis’, Desulfuromonas sp. TF, Geobacter daltonii FRC-32, Geobacter lovleyi, Geobacter uraniireducens Rf4, Geobacter argillaceus, Geobacter sp. OR-1, Geobacter sp. GSS01, Geobacter sulfurreducens PCA, Geobacter bemidjiensis, Geobacter metallireducens, Geobacter bremensis, Geobacter sp. M21, Geobacter sp. M18, Geobacter pickeringii G13, Desulfovibrio vulgaris Hildenborough]

pan- and core genome analysis of available Geobacter/Desulfuromonas clade genomes, with cytochrome plots reported using default clustering parameters in get_homologues (3-6-2015 release)

[Figure: pangenome and core genome curves, gene clusters vs. genomes (1-25), plotted for all genes and for multiheme* c-type cytochromes; one panel carries the annotation "568 genes"]

Conclusions

• Shotgun metagenomics of Soudan communities revealed novel Order- and Family-level lineages, particularly among Firmicutes
• Obligately methylotrophic Methanolobus was the only archaeon observed in unenriched metagenomes, suggesting active methyl-C metabolism in situ
• Poised electrodes effectively enrich novel metal-reducing bacteria from environments where they exist at extremely low relative abundance
• Low complexity communities are amenable to long read sequencing
• However, for unenriched metagenomic datasets, additional long read coverage is required to address low abundance (<5%) community members

Acknowledgments

We thank the Minnesota Supercomputing Institute (MSI) for developing, implementing, and maintaining the PacBio SMRT Analysis suite, and the Marine Biological Laboratory (MBL) for Illumina sequencing under a seed grant from the Deep Carbon Observatory Census of Deep Life. Long read data was generously provided by Pacific Biosciences and we thank Karl Oles (Mayo Clinic Bioinformatics Core) for sample preparation. We also thank Chris O’Brien (Pall Life Sciences) for guidance with TFF and Soudan Park Manager Jim Essig for coordinating field sampling. This work was supported by the Minnesota Environment and Natural Resources Trust Fund.
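The point about long read coverage for low abundance (<5%) community members can be illustrated with a rough calculation. The sketch below assumes that reads are drawn in proportion to each organism's share of the sequenced DNA, so expected coverage is total bases times relative abundance divided by genome size. The function names and the example values (250 Mbp of reads, a 4 Mbp genome, 5% abundance, a 30x target) are illustrative assumptions, not measurements from this study.

```python
def expected_coverage(total_bases: float, relative_abundance: float,
                      genome_size: float) -> float:
    """Expected fold coverage of one genome in a metagenome.

    Assumes relative_abundance is the fraction of sequenced bases that
    originate from the organism (a simplification; it ignores read
    length, GC bias, and mappability).
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases * relative_abundance / genome_size


def bases_needed(target_coverage: float, relative_abundance: float,
                 genome_size: float) -> float:
    """Total sequenced bases required to reach target_coverage."""
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return target_coverage * genome_size / relative_abundance


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical, see lead-in).
    reads_bp = 250e6      # total long read yield
    genome_bp = 4e6       # genome size of the organism of interest
    abundance = 0.05      # 5% of sequenced DNA

    print(f"expected coverage: {expected_coverage(reads_bp, abundance, genome_bp):.2f}x")
    print(f"bases for 30x: {bases_needed(30, abundance, genome_bp):.2e}")
```

Under these assumptions, a member at 5% abundance receives only a few-fold coverage from a yield of that size, while reaching a 30x target would take roughly ten times more sequence. The required yield scales inversely with abundance.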