Supplementary Information for:

Meanders as a scaling motif for understanding of floodplain soil microbiome and biogeochemical potential at the watershed scale

Paula B. Matheus Carnevali1, Adi Lavy1, Alex D. Thomas2, Alexander Crits-Christoph3, Spencer Diamond1, Raphaël Méheust1,4, Matthew R. Olm3,^, Allison Sharrar1, Shufei Lei1, Wenming Dong5, Nicola Falco5, Nicholas Bouskill5, Michelle Newcomer5, Peter Nico5, Haruko Wainwright5, Dipankar Dwivedi5, Kenneth H. Williams5, Susan Hubbard5, Jillian F. Banfield1,2,3,4,5,6,*.

1Department of Earth and Planetary Science, University of California, Berkeley, CA, USA. 2Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA. 3Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA. 4Innovative Genomics Institute, Berkeley, CA, USA. 5Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. 6Chan Zuckerberg Biohub, San Francisco, CA, USA.

Current affiliation: ^ Department of Microbiology and Immunology, Stanford University, Palo Alto, CA, USA

*Corresponding author: [email protected]

File contents: Supplementary figures 1-7; list of supplementary tables 1-14; list of supplementary data 1-8.

Supplementary Figure 1. Percent of samples within each floodplain where a genome was detected at the sub-species level (98% ANI). Presence or absence was determined based on Hellinger-transformed abundance (average coverage ≥ 0.01). (a) Detection regardless of where the genome was reconstructed from. (b) Detection according to floodplain of origin. Genomes were detected in a higher number of samples from a given floodplain if they were reconstructed from a sample within that floodplain.
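The detection summary described in this caption can be illustrated with a minimal sketch. It assumes that the 0.01 cut-off is applied to the Hellinger-transformed value (the square root of each genome's share of a sample's total coverage); the caption does not state this explicitly. The sample names, genome names, coverage values, and function names below are hypothetical and are not taken from the study.

```python
import math

def hellinger(coverage):
    """Hellinger-transform a table given as {sample: {genome: coverage}}."""
    transformed = {}
    for sample, row in coverage.items():
        total = sum(row.values())
        transformed[sample] = {
            genome: math.sqrt(value / total) if total > 0 else 0.0
            for genome, value in row.items()
        }
    return transformed

def percent_detected(coverage, floodplain_of, threshold=0.01):
    """Percent of samples per floodplain in which each genome is detected.

    Assumption: a genome counts as present in a sample when its
    Hellinger-transformed abundance is >= threshold.
    """
    detections = {}  # floodplain -> genome -> number of samples with detection
    n_samples = {}   # floodplain -> number of samples
    for sample, row in hellinger(coverage).items():
        floodplain = floodplain_of[sample]
        n_samples[floodplain] = n_samples.get(floodplain, 0) + 1
        per_floodplain = detections.setdefault(floodplain, {})
        for genome, value in row.items():
            hit = 1 if value >= threshold else 0
            per_floodplain[genome] = per_floodplain.get(genome, 0) + hit
    return {
        floodplain: {g: 100.0 * k / n_samples[floodplain] for g, k in genomes.items()}
        for floodplain, genomes in detections.items()
    }

# Hypothetical coverage values for three samples from two floodplains.
coverage = {
    "G_sample1": {"genome_a": 4.0, "genome_b": 0.0},
    "G_sample2": {"genome_a": 1.0, "genome_b": 3.0},
    "L_sample1": {"genome_a": 0.0, "genome_b": 2.0},
}
floodplain_of = {"G_sample1": "G", "G_sample2": "G", "L_sample1": "L"}
result = percent_detected(coverage, floodplain_of)
```

In this toy input, `genome_b` is absent from one of the two floodplain G samples, so `result["G"]["genome_b"]` is 50.0.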

[Figure: phylogenetic tree; tip labels and branch support values omitted. Tree scale: 0.1. Labeled clades: Alphaproteobacteria (outgroup); Family Nitrosomonadaceae; Family Gallionellaceae; Family Methylophilaceae; Family Rhodocyclaceae; Family Burkholderiaceae; Environmental strains clade 1 (GTDB Family SG8-41); Environmental strains clade 2 (GTDB Family SG8-39). Environments annotated next to reference accessions: drinking water treatment plant in Ann Arbor; Lake Baikal, Russia; Amazon River downstream section; hot spring sediment; hydrothermal vent, Mid-Atlantic Ridge; Rifle, CO; Powell Lake water; Angelo temperate grassland soil; Iowa RefSoil; Schlöppnerbrunnen fens.]

Supplementary Figure 2. Concatenated ribosomal protein tree (IQ-TREE) of Betaproteobacteria at the sub-species level (98% ANI) and ~1,540 reference genomes from NCBI. East River Betaproteobacteria are shown in bold magenta font (this study) and violet font (ref. 1). Some environmental sequences related to East River Betaproteobacteria are highlighted in orange, with the environment of origin given next to the accession number. Family-level clades follow GTDB (ref. 2) for additional reference.

Supplementary Figure 3. Diversity indices calculated for a set of representative genomes at the sub-species level (98% ANI).

Supplementary Figure 4. Environmental variables used in fourth-corner analysis: solid phase chemistry from soil samples collected in 2015, including total carbon (TC; %), inorganic carbon (IC; %), organic carbon (OC; %), total nitrogen (TN; %), and total carbon to total nitrogen ratio (OC:N); and measures associated with sample site locations: distance to river, elevation, easting, northing, topographic position index (TPI), distance to the inner bank edge (toe distance), and distance to the middle of the meander-bound floodplain.

[Figure: panels for Floodplain G, Floodplain L, and Floodplain Z; line labels: toe distance, distance to middle.]

Supplementary Figure 5. Diagram of the imaginary lines used to determine distance to the middle of the meander-bound floodplain and distance to the inner bank edge (toe distance) as alternative measures of sample position on the floodplains.

Supplementary Figure 6. Spearman’s correlation among environmental variables (2015).
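As a minimal sketch of how a Spearman correlation matrix of this kind can be computed, the example below ranks each variable (ties receive their average rank) and takes the Pearson correlation of the ranks. The variable values are hypothetical placeholders, not measurements from the study, and the function names are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_matrix(table):
    """Pairwise Spearman's rho for a table given as {variable: [values]}."""
    names = list(table)
    return {a: {b: spearman(table[a], table[b]) for b in names} for a in names}

# Hypothetical values for four samples (percent by weight).
variables = {
    "TC": [1.2, 2.5, 3.1, 4.8],
    "OC": [1.0, 2.0, 2.9, 4.5],
    "IC": [0.9, 0.5, 0.6, 0.1],
}
rho = spearman_matrix(variables)
```

With these placeholder values, TC and OC increase together (`rho["TC"]["OC"]` is 1), while IC tends to decrease as TC increases (`rho["TC"]["IC"]` is -0.8 up to floating-point rounding).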

        IC      C.N     TN      TC      OC
IC      1      −0.15   −0.86   −0.51   −0.58
C.N             1       0.4     0.77    0.74
TN                      1       0.81    0.85
TC                              1       0.97
OC                                      1

Color scale: −1 to 1.

Supplementary Figure 7. Spearman’s correlation among environmental variables (2016).

List of Supplementary Tables (separate files)

Supplementary Table 1: Sample sequencing and assembly information, and NCBI accession numbers.

Supplementary Table 2: Information on the representative genomes of 248 sub-species clusters (98% ANI). This set of genomes was reconstructed from soil samples collected in September 2015.

Supplementary Table 3: Information on the representative genomes of 215 species clusters (95% ANI). This set includes genomes reconstructed from samples collected in September 2015 and September 2016.

Supplementary Table 4: Presence/absence of gene homologs identified among the 248 representative genomes using selected HMMs. Includes the rules used to define functions. Only homologs scored “Present” were considered in downstream analyses.

Supplementary Table 5: Results of the indicator species analysis.

Supplementary Table 6: Presence/Absence of functions among 248 representative genomes.

Supplementary Table 7: Solid phase chemistry for the subset of samples collected in September 2016 with paired metatranscriptomes.

Supplementary Table 8: Carbohydrate-active enzymes (CAZymes) in the glycosyl hydrolase (GH), carbohydrate esterase (CE), polysaccharide lyase (PL), and auxiliary activity (AA) classes detected among the 215 representative genomes (e-value cut-off 1e-20).

Supplementary Table 9: Unique sequences from both genome sets confirmed to be homologous to NiFe hydrogenases groups 1, 2 or 3 based on a phylogenetic analysis.

Supplementary Table 10: Unique sequences from both genome sets confirmed to be homologous to FeFe hydrogenases groups A, B or C based on a phylogenetic analysis. Sequences identified among the 248 representative genomes were used in analyses based on presence/absence. Sequences identified among the 215 representative genomes were used in transcript analyses.

Supplementary Table 11: Unique sequences from both genome sets confirmed to be homologous to FdhA/FdoG/FdhF/FdwA, NxrA, NapA, and NarG based on a phylogenetic analysis.

Supplementary Table 12: Unique sequences from both genome sets confirmed to be homologous to CoxL based on a phylogenetic analysis. Subtype is indicated next to the sequences.

Supplementary Table 13: Unique sequences from both genome sets confirmed to be homologous to methanol dehydrogenases.

Supplementary Table 14: Unique sequences from both genome sets homologous to DsrAB. Subtype is indicated next to the sequences.

List of Supplementary Data (separate files)

Supplementary Data 1: Concatenated ribosomal proteins phylogenetic tree of Betaproteobacteria among 248 representative genomes of sub-species level clusters at 98% ANI.

Supplementary Data 2: Concatenated ribosomal proteins phylogenetic tree of 215 representative genomes of species level clusters at 95% ANI.

Supplementary Data 3: Phylogenetic tree of NiFe hydrogenases groups 1, 2 and 3 reference sequences and homologs identified among unique sequences from both genome sets.

Supplementary Data 4: Phylogenetic tree of FeFe hydrogenases groups A, B and C reference sequences and homologs identified among unique sequences from both genome sets.

Supplementary Data 5: Phylogenetic tree of DMSOR superfamily reference sequences and homologs identified among unique sequences from both genome sets.

Supplementary Data 6: Phylogenetic tree of CoxL reference sequences and homologs identified among unique sequences from both genome sets.

Supplementary Data 7: Phylogenetic tree of PQQ-containing alcohol dehydrogenases reference sequences and homologs identified among unique sequences from both genome sets.

Supplementary Data 8: Phylogenetic tree of concatenated DsrAB reference sequences and homologs identified among unique sequences from both genome sets.

References

1. Lavy A, et al. Taxonomically and metabolically distinct microbial communities with depth and across a hillslope to riparian zone transect. bioRxiv, 768572 (2019).

2. Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol, (2020).