SI Figure 1: Rarefaction curves of mock communities, after subsampling to 6,800 sequences but prior to removing rare OTUs (see Results and Methods for complete details). Solid line represents the mean and dashed lines indicate the range of all eight replicate mock communities; rarefaction curves of Even and Bloom communities were very similar and therefore are combined here. Note that the color scheme here is altered to distinguish the overlapping V4-5 and V3-4 lines. A: Full range of data. B: Subset of panel A, with restricted axes for detail. Dotted black line indicates 22 OTUs, the number of cloned 16S genes added to the mock communities.

SI Figure 2: Log2-fold ratios of observed OTU relative abundance to expected, in Even and Bloom communities. Solid line is a 1:1 ratio. A: V4-5. B: V4. C: V3-4. D: V1-2.

SI Figure 3: Mock community results – log2-fold ratio to expected values in Bloom mock community (mean of four replicates; error bars indicate range). “Other” columns indicate sequences that could not be assigned to an expected clone. As log2-fold ratios cannot accommodate zero values, samples were adjusted as follows: for samples with mean observed relative abundance = 0 and expected > 0, the log2-fold ratio was set to -10, without error bars, which was an arbitrary value more negative than any observed; for samples with observed > 0 and expected = 0, the expected value was set to 0.0001, which is less than 1 sequence per 6800, for calculations; for individual replicates with observed = 0 and expected > 0, where other replicates had observed > 0, relative abundance was set to 0.0001 for calculations. Dotted lines delineate +/- 1.58, or a three-fold ratio to expected.

SI Figure 4: Expected vs. observed relative abundances of taxa in the Bloom mock communities, on a log-10 scale. Filled circles represent those taxa within a three-fold difference of expected relative abundance. Empty circles represent those taxa differing from expected by more than three-fold in either direction; only these taxa are labeled. Circles on the horizontal axis represent taxa that were not detected by a primer set; taxa on the vertical axis represent sequences that could not be classified to an expected clone. Error bars indicate the maximum and minimum relative abundances observed in four replicates. A: V4-5. B: V4. C: V3-4. D: V1-2.

SI Figure 5: Comparison of relative abundances of taxa within the mock communities using the V1-2 primers, clustered at 95% sequence similarity and at 97% sequence similarity. A: Even community. B: Bloom community.

SI Figure 6: Rarefaction curves of field samples from each of the primer sets. In all cases, the colored lines represent samples collected between 0 and 30 m, while the black lines represent samples collected between 75 and 300 m. Solid lines indicate the mean of all samples within the depth range, and dashed lines represent the minimum and maximum values of all samples. Note that all figures are plotted on an identical Y-axis. A: V4-5. B: V4. C: V3-4. D: V1-2.

SI Figure 7: Bacterioplankton community composition (BCC) of select taxa in the metagenomes and the corresponding amplicon datasets. Taxa are arranged by hierarchical clustering; samples are grouped by the date on which they were collected (month-year) and depth. All taxa are plotted on the same scale. “Other ” includes both less common taxa and sequences that could be classified to that domain but not to a useful subsequent level (generally those unclassified beyond the phylum or family levels).

SI Figure 8: Relative abundance of select clades of interest in metagenomes and amplicon libraries. To accommodate the log scale, zeroes were plotted as 0.01%.

SI Figure 9: NMS plots of surface samples only, with and without archaea. All figures have been rotated to have similar orientation, with the seasonal cycle in a roughly clockwise pattern from the top left. Legend for all as in Panel C. A: V4-5, all taxa, 2D stress = 0.12. B. V4, all taxa, 2D stress = 0.1. C: V3-4, all taxa, 2D stress = 0.09. D: V1-2, bacteria only as V1-2 primers do not sample archaea, 2D stress = 0.09. E: V4-5, no archaea, 2D stress = 0.1. F: V4, no archaea, 2D stress = 0.09. G: V3-4, no archaea, 2D stress = 0.1. H. V1-2, no archaea, identical to panel D but presented here for comparison.

SI Figure 10: NMS plots of field samples, including samples from all depths, with archaeal sequences removed in silico. All figures have been rotated for similar orientations. Symbol color indicates sampling month and shape indicates sampling depth, as in Panel B. A: V4-5 primers, 2D stress = 0.09. B: V4 primers, 2D stress = 0.1. C: V3-4 primers, 2D stress = 0.1. D: V1-2 primers, 2D stress = 0.09. Panel D is identical to Figure 4D, as the V1-2 primers do not amplify archaea, and is presented here for comparison.

SI Figure 11: Distinctions between surface (0 m) and sub-euphotic zone (75 m) samples, by primer set. Center lines indicate the median of 15 pairs of samples from the same sampling casts, the box indicates 25th and 75th percentiles, and the error bars indicate minimum and maximum values. Letters indicate significant differences, based on a one-way ANOVA with Ryan-Einot- Gabriel-Welsch Range posthoc. The V1-2 primer set does not amplify archaea; the other samples labeled “Without archaea” had all archaeal sequences removed in silico. A: Weighted UniFrac distances between paired samples. B: Percent change in Shannon diversity between the same paired samples, from surface to 75 m.

SI Figure 12: Mock community data from Parada et al. (2016) Table 1, plotted following the log-2 fold ratio from expected as used in Figure 1. Note that Parada et al. used the older 806R primer rather than the newer 806R-B designed by Apprill et al. (2015). Bars indicate means of four replicates; as Parada et al. reported standard error rather than range, we have opted not to include error bars. Dashed lines indicate a log-2 fold ratio of +/- 1.58, the equivalent of a 3-fold difference from expected. SI Table 1: The primer sequences used to generate the full-length 16S rRNA gene clone library

Primer Domain targeted Sequence (5’ – 3’) Reference 8F Bacteria AGRGTTYGATYMTGGCTCAG Morris et al. 2004 1492bact Bacteria GGYTACCTTGTTACGACTT Morris et al. 2004 9F Archaea CCGGTTGATCCYGSCRG Klindworth et al. 2013 1492arch Archaea GGCTACCTTGTTACGACTT Klindworth et al. 2013

Additional references:

Morris, R.M., Rappé, M.S., Urbach, E., Connon, S.A., and Giovannoni, S.J. (2004) Prevalence of the Chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and deep ocean. Applied and Environmental Microbiology 70: 2836-2842.

SI Table 2: Mock community composition and results

Even Target V4-5 V4 V3-4 V1-2 Mean SD Mean SD Mean SD Mean SD Archaea Euryarchaeota: Marine Group II 4.5 12.6 0.5 16.3 0.8 18.3 0.5 NDC Thaumarchaeota: Nitrosopumillus 4.5 11.5 0.6 17.9 0.7 0.5 0.0 ND Actinobacteria: OCS155 4.5 2.9 0.2 2.5 0.2 7.0 0.3 3.7 0.3 Bacteroidetes FlavobacteriaA: Polaribacter 4.5 6.1 0.02 4.1 0.2 5.4 0.2 2.1 0.2 Flavobacteria: NS4 marine group 4.5 2.4 0.2 1.8 0.2 2.3 0.2 2.2 0.1 Flavobacteria: NS5 marine group 4.5 9.2 0.04 5.6 0.1 9.1 0.5 14.0 0.7 Deferribacteres: SAR406 4.5 6.4 0.2 3.8 0.4 6.8 0.4 11.4 0.8 SAR11 Surface 1 4.5 2.5 0.3 3.9 0.3 1.0 0.1 8.3 0.4 SAR11 Surface 2 4.5 3.7 0.4 4.9 0.5 1.7 0.2 8.5 0.3 SAR11 Deep 1 4.5 1.1 0.1 2.0 0.1 0.4 0.0 0.8 0.1 Roseobacter NAC11-7 4.5 3.7 0.3 2.4 0.4 3.4 0.3 ND Roseobacter DC5-80-3 4.5 3.1 0.4 2.3 0.5 3.3 0.5 1.4 0.2 Roseobacter OCT 4.5 ND 1.8 0.2 2.2 0.2 0.03 0.0 *Other RhodobacteraceaeB 0 2.5 0.3 0.02 0.0 5.8 0.2 OCS116 4.5 6.3 0.7 4.2 0.3 6.4 0.4 7.1 0.5 SAR116 4.5 3.8 0.1 ND 0.3 0.0 ND Deltaproteobacteria Nitrospina 4.5 6.5 0.5 7.1 0.2 7.8 0.3 7.3 0.2 SAR324: Marine Group B 4.5 2.9 0.1 2.3 0.2 3.3 0.4 5.2 0.4 Gammaproteobacteria SAR92 4.5 2.8 0.5 3.1 0.1 3.5 0.1 4.3 0.1 OM60(NOR5) 4.5 3.4 0.3 5.4 0.2 4.8 0.3 4.3 0.1 Pseudospirillum 4.5 0.01 0.0 2.2 0.3 2.0 0.2 2.8 0.2 SAR86 4.5 3.2 0.2 4.3 0.2 8.8 0.3 6.5 0.3 Verrucomicrobia: Opitutae: MB11C04 4.5 3.5 0.3 1.9 0.1 1.8 0.2 4.3 0.3 *Other: < 0.1% 0 0.02 0.0 0.02 0.0

Bloom Target V4-5 V4 V3-4 V1-2 Mean SD Mean SD Mean SD Mean SD Archaea Euryarchaeota: Marine Group II 1 5.5 1.2 8.1 0.8 10.1 1.4 ND Thaumarchaeota: Nitrosopumillus 1 3.8 0.4 8.1 0.7 0.1 0.0 ND Actinobacteria: OCS155 2 1.8 0.0 1.4 0.3 4.9 0.3 1.7 0.1 Bacteroidetes Flavobacteria: Polaribacter 10 14.3 0.3 10.4 0.4 12.2 1.0 5.1 0.3 Flavobacteria: NS4 marine group 2 1.3 0.2 1.1 0.2 1.0 0.1 0.8 0.2 Flavobacteria: NS5 marine group 5 11.4 0.5 8.0 0.7 10.9 0.9 14.1 0.4 Deferribacteres: SAR406 2 3.4 0.2 2.1 0.1 4.0 0.3 5.8 0.1 Alphaproteobacteria SAR11 Surface 1 25 13.7 0.8 17.2 0.7 6.5 0.4 25.3 0.5 SAR11 Surface 2 10 7.9 0.3 9.4 0.4 4.6 0.2 13.0 0.2 SAR11 Deep 1 1 0.2 0.0 0.3 0.1 0.09 0.0 0.1 0.1 Roseobacter NAC11-7 10 10.4 1.4 7.2 0.7 9.2 0.4 ND Roseobacter DC5-80-3 5 3.5 0.1 3.4 0.5 4.7 0.3 1.6 0.3 Roseobacter OCT 5 ND 2.6 0.4 3.0 0.2 0.03 0.0 *Other 0 3.3 0.3 0.02 0.0 10.5 0.3 OCS116 2 3.6 1.0 2.4 0.3 4.4 0.4 3.1 0.2 SAR116 2 2.3 0.1 ND 0.2 0.0 ND Deltaproteobacteria Nitrospina 1 1.6 0.2 2.1 0.1 2.6 0.4 1.8 0.1 SAR324: Marine Group B 1 0.8 0.1 0.7 0.1 0.8 0.1 1.2 0.1 Gammaproteobacteria SAR92 3 2.6 0.1 2.6 0.1 3.2 0.3 2.9 0.0 OM60(NOR5) 2 1.8 0.2 3.3 0.1 3.0 0.2 1.9 0.2 Pseudospirillum 3 0.03 0.0 2.0 0.1 1.5 0.1 1.9 0.2 SAR86 5 4.7 0.2 6.7 0.2 12.2 0.3 7.4 0.3 Verrucomicrobia: Opitutae: MB11C04 2 2.2 0.2 1.0 0.1 0.9 0.2 1.9 0.1 *Other: < 0.1% 0 0.02 0.0 0.01 0.0

Values are mean percent of BCC and standard deviation of four technical replicates for each community type. For full clone identities, see SI Table 3. A: To maintain consistency with the SILVA taxonomy, the Flavobacteriia are referred to as Flavobacteria throughout. B: Clades with asterisks indicate sequences that could not be assigned to their source clone with complete taxonomic resolution. C: ND = not detected; only specified for clades that were expected to be present. SI Table 3: Identity of cloned 16S rRNA genes included in the mock communities, to the lowest classified taxonomic level

SILVA v115 custom database SILVA v132 via SINA Aligner GenBank (Goldberg et al., 2017) accession number Archaea; Euryarchaeota; Thermoplasmata; Archaea; Euryarchaeota; Thermoplasmata; Marine MG012919 Thermoplasmatales; Marine Group II; unidentified Group II archaeon Archaea; Thaumarchaeota; Marine Group I; Order Archaea; Thaumarchaeota; Nitrososphaeria; MG012910 Incertae Sedis; Family Incertae Sedis; Candidatus Nitrosopumilales; Nitrosopumilaceae; Candidatus Nitrosopumilus Nitrosopumilus Bacteria; Actinobacteria; Acidimicrobiia; Bacteria; Actinobacteria; Acidimicrobiia; MG013047 Acidimicrobiales; OCS155 marine group; uncultured Actinomarinales; Actinomarinaceae; Candidatus Actinobacterium OCS155 Actinomarina Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Bacteria; Bacteroidetes; Bacteroidia; MG013081 Flavobacteriaceae; Polaribacter Flavobacteriales; Flavobacteriaceae; Polaribacter 3 Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Bacteria; Bacteroidetes; Bacteroidia; MG012979 Flavobacteriaceae; NS4 marine group Flavobacteriales; Flavobacteriaceae; NS4 marine group Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Bacteria; Bacteroidetes; Bacteroidia; MG013000 Flavobacteriaceae; NS5 marine group Flavobacteriales; Flavobacteriaceae; NS5 marine group Bacteria; Deferribacteres; Deferribacteres; Bacteria; Marinimicrobia (SAR406 clade) MG012948 Deferribacterales; SAR406 clade (Marine group A); uncultured Eubacterium OCS307 Bacteria; ; Alphaproteobacteria; SAR11 Bacteria; Proteobacteria; Alphaproteobacteria; MG012997 clade; Surface 1; uncultured bacterium SAR11 clade; Clade I; Clade Ia Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 Bacteria; Proteobacteria; Alphaproteobacteria; MG012988 clade; Surface 2; uncultured bacterium SAR11 clade; Clade II Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 Bacteria; Proteobacteria; Alphaproteobacteria; MG012961 clade; Deep 1; uncultured Alphaproteobacterium SAR11 clade; Clade II Bacteria; Proteobacteria; Alphaproteobacteria; Bacteria; Proteobacteria; Alphaproteobacteria; MG013082 ; Rhodobacteraceae; Roseobacter clade Rhodobacterales; Rhodobacteraceae; Amylibacter NAC11-7 lineage Bacteria; Proteobacteria; Alphaproteobacteria; Bacteria; Proteobacteria; Alphaproteobacteria; MG013066 Rhodobacterales; Rhodobacteraceae; Roseobacter clade Rhodobacterales; Rhodobacteraceae; DC5-80-3 lineage Planktomarina Bacteria; Proteobacteria; Alphaproteobacteria; Bacteria; Proteobacteria; Alphaproteobacteria; MG013062 Rhodobacterales; Rhodobacteraceae; Roseobacter clade Rhodobacterales; Rhodobacteraceae; OCT lineage Ascidiaceihabitans Bacteria; Proteobacteria; Alphaproteobacteria; OCS116 Bacteria; Proteobacteria; Alphaproteobacteria; MG013063 clade; uncultured bacterium; unclassified Parvibaculales; PS1 clade Bacteria; Proteobacteria; Alphaproteobacteria; Bacteria; Proteobacteria; Alphaproteobacteria; MG013049 Rickettsiales; SAR116 clade; Candidatus Puniceispirillales; SAR116 clade Puniceispirillum Bacteria; Proteobacteria; Deltaproteobacteria; Bacteria; Nitrospinae; Nitrospinia; Nitrospinales; MG012976 Desulfobacterales; Nitrospinaceae; Nitrospina Nitrospinaceae; LS-NOB Bacteria; Proteobacteria; Deltaproteobacteria; SAR324 Bacteria; Proteobacteria; Deltaproteobacteria; MG012978 clade (Marine group B); uncultured bacterium; SAR324 clade (Marine group B) unclassified Bacteria; Proteobacteria; Gammaproteobacteria; Bacteria; Proteobacteria; Gammaproteobacteria; MG013105 Alteromonadales; Alteromonadaceae; SAR92 clade Cellvibrionales; Porticoccaceae; SAR92 clade Bacteria; Proteobacteria; Gammaproteobacteria; Bacteria; Proteobacteria; Gammaproteobacteria; MG013076; Alteromonadales; Alteromonadaceae; OM60(NOR5) Cellvibrionales; Halieaceae; OM60(NOR5) clade MG013077 clade Bacteria; Proteobacteria; Gammaproteobacteria; Bacteria; Proteobacteria; Gammaproteobacteria; MG013051 Oceanospirillales; Oceanospirillaceae; Pseudospirillum Oceanospirillales; Pseudohongiellaceae; Pseudohongiella Bacteria; Proteobacteria; Gammaproteobacteria; Bacteria; Proteobacteria; Gammaproteobacteria; MG013126 Oceanospirillales; SAR86 clade; uncultured SAR86 clade Gammaproteobacterium Bacteria; Verrucomicrobia; Opitutae; MB11C04 marine Bacteria; Verrucomicrobia; Verrucomicrobiae; MG012971 group; uncultured marine bacterium; unclassified Opitutales; Puniceicoccaceae; MB11C04 marine group

The identities used in the text are based on SILVA v115, custom curated as described in Goldberg et al. (2017). As the bacterial and archaeal taxonomies are in flux, the most recent taxonomic identity at the time of publication is also provided, based in SILVA v132. When two accession numbers are listed, the forward and reverse sequences could not be assembled into a contig.

SI Table 4: Updated taxonomies (SILVA v132) for OTUs from field samples discussed in the text

Primer set Taxonomy SILVA v115 Nitrosopumilus sp. V4-5 Archaea; Thaumarchaeota; Nitrososphaeria; Nitrosopumilales; Nitrosopumilaceae; Candidatus Nitrosopumilus V4 Archaea; Thaumarchaeota; Nitrososphaeria; Nitrosopumilales; Nitrosopumilaceae; Candidatus Nitrosopumilus V3-4 N/A V1-2 N/A SILVA v115 OCS155 V4-5 Bacteria; Actinobacteria; Acidimicrobiia; Actinomarinales; Actinomarinaceae; Candidatus Actinomarina V4 Bacteria; Actinobacteria; Acidimicrobiia; Actinomarinales; Actinomarinaceae; Candidatus Actinomarina V3-4 Bacteria; Actinobacteria; Acidimicrobiia; Actinomarinales; Actinomarinaceae; Candidatus Actinomarina V1-2 Bacteria; Actinobacteria; Acidimicrobiia; Actinomarinales; Actinomarinaceae; Candidatus Actinomarina SILVA v115 Polaribacter V4-5 Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; Flavobacteriaceae V4 Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; Flavobacteriaceae V3-4 Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; Flavobacteriaceae V1-2 Bacteria; Bacteroidetes; Bacteroidia; Flavobacteriales; Flavobacteriaceae; Polaribacter 3 SILVA v115 Roseobacter NAC11-7 (or most common Rhodobacteraceae, V1-2) V4-5 Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; Amylibacter V4 Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; Amylibacter V3-4 Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae V1-2 Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae SILVA v115 SAR11 Surface 1 V4-5 Bacteria: Proteobacteria; Alphaproteobacteria; SAR11 Clade; Clade I; Clade Ia V4 Bacteria: Proteobacteria; Alphaproteobacteria; SAR11 Clade; Clade I; Clade Ia V3-4 Bacteria: Proteobacteria; Alphaproteobacteria; SAR11 Clade; Clade I; Clade Ia V1-2 Bacteria: Proteobacteria; Alphaproteobacteria; SAR11 Clade; Clade I; Clade Ia SILVA v115 SAR11 Surface 2 V4-5 Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 clade V4 Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 clade; Clade II V3-4 Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 clade; Clade II V1-2 Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 clade; Clade II SILVA v115 SAR86 V4-5 Bacteria; Proteobacteria; Gammaproteobacteria; SAR86 clade V4 Bacteria; Proteobacteria; Gammaproteobacteria; SAR86 clade V3-4 Bacteria; Proteobacteria; Gammaproteobacteria; SAR86 clade V1-2 Bacteria; Proteobacteria; Gammaproteobacteria; SAR86 clade SILVA v115 ZD0405 (or most common Oceanospirillales, V4) V4-5 Bacteria; Proteobacteria; Gammaproteobacteria; Thiomicrospirales; Thioglobaceae; SUP05 cluster V4 Bacteria; Proteobacteria; Gammaproteobacteria; Thiomicrospirales; Thioglobaceae; SUP05 cluster V3-4 Bacteria; Proteobacteria; Gammaproteobacteria; Thiomicrospirales; Thioglobaceae; SUP05 cluster V1-2 Bacteria; Proteobacteria

For each OTU, the SILVA v115 name used in the text is presented in bold, followed by the SILVA v132 classification via SINA Aligner of the representative sequence for each primer set. SI Table 5: Slopes of orthogonal (Model II) regressions of specific OTUs between primer sets

V4-5 V4 V3-4 Slope 95% CI Slope 95% CI Slope 95% CI NitrosopumilusA V4 1.42 1.34-1.50 V3-4 N/A N/A V1-2 N/A N/A N/A OCS155 V4 0.62 0.58-0.67 V3-4 2.25 2.11-2.41 3.70 3.37-4.10 V1-2 1.06 0.98-1.15 1.75 1.57-1.95 0.47 0.45-0.50 Roseobacter NAC11-7, or Rhodobacteraceae unclassified in V1-2 V4 0.84 0.81-0.87 V3-4 0.85 0.79-0.91 1.02 0.94-1.11 V1-2 0.74 0.69-0.78 0.88 0.82-0.94 0.86 0.82-0.91 Polaribacter V4 0.92 0.90-0.93 V3-4 0.79 0.78-0.81 0.86 0.84-0.88 V1-2 0.88 0.85-0.90 0.95 0.93-0.98 1.10 1.08-1.13 SAR11 Surface 1 V4 1.12 1.05-1.20 V3-4 0.32 0.28-0.36 0.28 0.24-0.32 V1-2 1.06 0.95-1.18 0.94 0.85-1.05 3.28 2.96-3.67 SAR11 Surface 2 V4 1.26 1.20-1.31 V3-4 0.41 0.39-0.43 0.33 0.31-0.34 V1-2 0.85 0.81-0.90 0.68 0.65-0.71 2.08 1.97-2.19 SAR86 V4 1.85 1.73-1.98 V3-4 2.40 2.20-2.64 1.28 1.18-1.38 V1-2 1.91 1.77-2.07 1.02 0.95-1.11 0.80 0.74-0.87 ZD0405, or unclassified Oceanospirillales in V4 V4 0.60 0.57-0.63 V3-4 1.43 1.32-1.56 2.50 2.25-2.81 V1-2 1.39 1.30-1.49 2.39 2.19-2.62 0.98 0.92-1.04

Slopes that are not significantly different from 1 (1 is within the 95% confidence interval) are indicated in bold. For each comparison, the primer set listed at the top of the table was on the X- axis, and the primer set listed to the left was on the Y-axis. For all comparisons, N = 76. A: Regressions were not calculated for Nitrosopumilus with the V3-4 and V1-2 primer sets. SI Table 6: Percent of community of taxa in the metagenomes and corresponding field samples Metagenome V4-5 V4 V3-4 V1-2 MeanA Range Mean Range Mean Range Mean Range Mean Range Phyla (classes within Proteobacteria) Euryarchaeota 0.6 0 - 1.8 0.9 0.1 - 2.5 1.3 0.3 - 2.5 0 0 - 0.1 0 0 - 0 Thaumarchaeota 3.2 0 - 15.3 6.1 0.1 - 28.1 12.1 0 - 54.3 0 0 - 0.1 0 0 - 0 Actinobacteria 2.0 0 - 6.8 2.6 0.1 - 9.1 1.8 0 - 5.4 6.4 0.3 - 19.8 2.5 0.1 - 8.1 Bacteroidetes 10.6 1.9 - 38.0 20.4 5.1 - 63.5 14.4 2.0 - 59.6 17.0 4.6 - 50.1 20.9 5.4 - 66.2 Cyanobacteria 0.4 0 - 1.5 0.4 0 - 1.2 0.2 0 - 0.8 1.7 0.1 - 7.3 0 0 - 0.1 Deferribacteres 1.0 0.1 - 3.4 2.4 0.1 - 7.5 2.1 0.1 - 5.2 3.9 0.4 - 10.4 2.2 0.2 - 6.6 Alphaproteobacteria 28.7 14.9 - 46.1 17.3 - 44.2 12.6 - 31.7 13.3 - 43.2 23.5 - 39.6 67.7 66.3 41.9 58.4 Betaproteobacteria 0.6 0 - 1.8 1.2 0.3 - 2.7 1.1 0.1 - 3.1 1.4 0.1 - 4.3 1.4 0.1 - 3.9 Deltaproteobacteria 0.8 0 - 4.1 2.6 0 - 13.6 1.5 0 - 8.8 5.3 0 - 25.4 0.9 0 - 4.5 Gammaproteobacteria 16.6 5.8 - 22.6 15.9 6.2 - 22.3 18.2 12.9 - 29.9 18.9 - 17.9 9.6 - 27.8 25.8 35.5 Proteobacteria: unclassified 1.9 0.9 - 4.6 0.3 0 - 0.9 1.2 0.1 - 2.8 1.2 0 - 2.7 4.6 0 - 18.4 Bacteria: rare, unclassified 33.5 29.3 - 1.1 0.3 - 3.3 2.0 0.3 - 5.8 1.7 0.1 - 5.5 6.5 0.3 - 18.1 41.3 Taxa of interest, genus to order level Genus Nitrosopumilus 1.0 0 - 4.7 2.6 0 - 11.1 4.6 0 - 18.1 0 0 - 0 0 0 - 0 Family OCS155 marine 1.7 0 - 6.6 2.4 0.1 - 9.1 1.6 0 - 5.3 5.8 0.3 - 19.7 2.4 0.1 - 8.1 group Family Flavobacteriaceae 8.4 1.2 - 31.6 16.3 3.4 - 51.4 11.9 1.1 - 50.3 13.8 3.3 - 41.2 18.0 4.8 - 55.9 Family Rhodobacteraceae 6.8 0.2 - 15.2 14.9 0.2 - 36.6 10.6 0.4 - 30.0 12.7 0.4 - 26.2 8.3 0 - 17.9 Genus Roseobacter 3.6 0 - 8.7 11.2 0 - 29.8 8.1 0.1 - 24.6 9.2 0.1 - 20.3 1.0 0 - 2.8 Order SAR11 clade 18.1 0.7 - 35.7 26.7 1.1 - 64.3 28.3 2.6 - 57.5 9.0 0.2 - 17.8 32.2 2.4 - 52.3 Family SAR11 Surface 1 14.1 0.4 - 27.2 21.1 0.9 - 44.6 22.1 1.6 - 43.9 6.2 0.2 - 12.6 26.2 1.7 - 45.7 Family SAR116 0.8 0 - 1.7 1.6 0 - 3.1 1.2 0 - 2.3 2.4 0.1 - 4.9 0.7 0 - 1.6 Genus SAR92 1.0 0.1 - 2.9 1.0 0 - 2.4 1.2 0 - 3.3 2.2 0.2 - 5.6 1.3 0.1 - 3.1 Family SAR86 7.1 2.1 - 15.7 6.3 1.4 - 11.1 9.6 3.9 - 16.3 13.9 3.0 - 22.9 7.0 1.0 - 12.3 Family ZD0405 2.2 0 - 7.2 4.2 0.1 - 11.5 0 0 - 0.1 5.5 0.1 - 18.9 5.7 0 - 18.7 A: Values are mean and range of ten samples.

SI Table 7: SAR116 sequences corresponding to the primer regions from the mock community clone and abundant field OTUs Sequence % Sim. Forward (5’ – 3’) Reverse (5’ – 3’; RC of primers) % BCC % BCC (median) (range) V4-5 Primers GTGYCAGCMGCCGCGGTAA AAACTYAAAKRAATTGRCGG Clone GTGCCAGCAGCCGCGGTAA AAACTCAAAGGAATTGACGG OTU 88 GTGTCAGCCGCCGCGGTAA AAACTCAAATAAATTGACGG 0.7 0 - 3.1 OTU 99 GTGTCAGCCGCCGCGGTAA AAACTCAAATAAATTGACGG 0.5 0 - 3.4 OTU 93 GTGTCAGCCGCCGCGGTAA AAACTCAAATAAATTGACGG 0.07 0 - 2.5 OTU 95 GTGTCAGCAGCCGCGGTAA AAACTTAAAGGAATTGACGG 0.1 0 - 1.5 OTU 97 GTGTCAGCAGCCGCGGTAA AAACTTAAAGGAATTGACGG 0.01 0 - 0.2 OTU 93 GTGTCAGCCGCCGCGGTAA AAACTTAAAGAAATTGACGG 0.03 0 - 0.5 Parada GTGCCAGCAGCCGCGGTAA AAACTCAAAGGAATTGACGG V4 Primers GTGYCAGCMGCCGCGGTAA ATTAGAWACCCBNGTAGTCC Clone GTGCCAGCAGCCGCGGTAA ATCAGATACCCTGGTAGTCC OTU 88 GTGTCAGCCGCCGCGGTAA ATTAGATACCCGGGTAGTCC 0.4 0 - 2.1 OTU 98 GTGTCAGCCGCCGCGGTAA ATTAGATACCCCCGTAGTCC 0.4 0 - 2.6 OTU 94 GTGTCAGCAGCCGCGGTAA ATTAGAAACCCGGGTAGTCC 0.06 0 - 1.3 OTU 96 GTGTCAGCCGCCGCGGTAA ATTAGATACCCGAGTAGTCC 0.04 0 - 0.3 Parada GTGCCAGCAGCCGCGGTAA GTTAGATACCCTGGTAGTCC V3-4 Primers CCTACGGGNGGCWGCAG GGATTAGATACCCBDGTAGTC Clone CCTACGGGAGGCAGCAG GGATCAGATACCCTGGTAGTC OTU 90 CCTACGGGGGGCTGCAG GGATTAGATACCCCAGTAGTC 0.7 0 - 3.8 OTU 99 CCTACGGGGGGCTGCAG GGATTAGATACCCCAGTAGTC 0.7 0 - 5.5 OTU 96 CCTACGGGGGGCTGCAG GGATTAGATACCCCAGTAGTC 0.2 0 - 4.5 OTU 93 CCTACGGGGGGCTGCAG GGATTAGATACCCCTGTAGTC 0 0 - 0.6 V1-2 Primers AGRGTTTGATYMTGGCTCAG ACWCCTACGGGWGGCWGCA Clone AGGGTTTGATTATGGCTCAG ACTCCTACGGGAGGCAGCA OTU 91 AGGGTTTGATTATGGCTCAG ACACCTACGGGTGGCTGCA 0.5 0 - 2.4 OTU 88 missing – Sanger quality screen ACTCCTACGGGTGGCTGCA 0.02 0 - 0.4

% Sim. = percent similarity between full length of amplicon and clone per BLAST+. OTU = representative sequence identified for each SAR116 OTU in the 200 most abundant OTUs from each primer set. RC = the reverse complement of reverse primers are reported, to match the 5’ – 3’ sequence on the sense strand. Locations with variability between the clone and OTU sequences are indicated in bold. The mismatch between our SAR116 clone sequence and the 806RB and 785R primers is in red, and the location of the similar mismatch between clone and primer in Parada et al. (2016) is in blue. % BCC = median and range of percent of community of each OTU in field samples. SI Table 8: Mantel tests – correlations between UniFrac distance matrices

All taxa, all depths V4 V4-5 V3-4 V4-5 rho = 0.944 V3-4 0.840 0.867 V1-2 0.852 0.860 0.938

No archaea, all depths V4 V4-5 V3-4 V4-5 0.949 V3-4 0.887 0.911 V1-2 0.882 0.869 0.951

All taxa, surface only V4 V4-5 V3-4 V4-5 0.890 V3-4 0.821 0.837 V1-2 0.839 0.827 0.918

No archaea, surface only V4 V4-5 V3-4 V4-5 0.944 V3-4 0.864 0.870 V1-2 0.871 0.831 0.921

In all cases, N = 76 (all taxa) or 52 (surface only) and p = 0.001 (based on 999 permutations). Mantel tests were conducted using Primer RELATE algorithm and Spearman’s non-parametric correlations. SI Table 9: BIO-ENV correlations of surface weighted UniFrac matrices with bottom-up environmental parameters

Primer set ParametersA Spearman’s p (from 99 rho permutations) V4-5 Temperature, BP 0.376 0.01 Temperature, BP, Chl a 0.355 BP, Chl a 0.301 V4 Temperature, BP 0.441 0.01 Temperature, BP, Chl a 0.381 Temperature, N+N, BP 0.377 V3-4 BP, Chl a 0.329 0.01 Temperature, BP 0.325 BP 0.321 V1-2 BP, Chl a 0.331 0.01 Temperature, BP, Chl a 0.324 Temperature, BP 0.317 V4-5, no archaea Temperature, BP 0.403 0.01 Temperature, BP, Chl a 0.370 Temperature, N+N, BP 0.329 V4, no archaea Temperature, BP 0.409 0.01 Temperature, BP, Chl a 0.378 Temperature, N+N, BP 0.342 V3-4, no archaea Temperature, BP 0.333 0.01 BP 0.321 BP, Chl a 0.316

The top three models for each primer set are presented for comparison; a significance value was only calculated for the best-fit model. A: Available parameters were temperature, nitrate+nitrite (N+N), bacterial production (BP), chlorophyll a (Chl a), and CDOM spectral slope coefficient.

SI Table 10: Correlations between bottom-up environmental factors

Temperature Nitrate+ BP Chl a CDOM nitrite slope Nitrate+nitrite R = -0.922 N = 52 p < 0.0005 BP 0.002 -0.113 52 52 0.988 0.423 Chl a -0.489 0.323 0.628 52 52 52 <0.0005 0.020 <0.0005 CDOM slope 0.055 0.003 -0.347 -0.480 52 52 52 52 0.696 0.981 0.012 <0.0005 POC -0.307 0.122 0.638 0.852 -0.500 49 49 49 49 49 0.032 0.403 <0.0005 <0.0005 <0.0005

Correlations are Spearman’s rho; significant correlations are in bold. BP = bacterial production, Chl a = chlorophyll a, and CDOM slope = CDOM spectral slope coefficient. Supplemental Experimental Procedures

Sample collection

Water for bacterioplankton DNA samples was pre-filtered through a 1.2 µm filter in a polycarbonate filter holder (to remove microphytoplankton and attached bacteria) then collected on 0.2 µm polyethersulfone filter cartridges (Sterivex-GP, Millipore) under gentle peristaltic pressure and stored frozen at -40° C. For samples collected from 75 m and above, 1 L was filtered; from 150 m and below, 2 L were filtered. Samples were lysed in sucrose lysis buffer (40 mM EDTA, 50 mM Tris-HCl, 750 mM sucrose, 400 mM NaCl, pH adjusted to 8.0) with 1% w/v sodium dodecyl sulfate and 0.2 mg mL-1 proteinase-K at 55° C. Genomic DNA was extracted a commercial silica centrifugation kit (DNEasy, Qiagen, Valencia, CA).

Mock community construction: cloning conditions

Full-length bacterial 16S rRNA gene amplicons were generated primers 8F-1492Rbact

(SI Table 1); primers 9F-1492Rarch targeting archaea were also used with the deep-water sample. PCR reactions consisted of 12.5 µL Kapa Robust Hotstart ReadyMix, 1.25 µL each of the respective forward and reverse primers (10 mM), 6 µL PCR water (5 Prime), and 4 µL genomic DNA, cycled for 3 min at 95° C; 35 cycles of 15 sec at 95° C, 15 sec at 65° C, and 45 sec at 72° C; and 10 min at 72° C. PCR product was cleaned with a Qiagen Qiaquick kit.

Amplicons were ligated into plasmids using the pGEM T Easy Vector System (Promega;

Madison, Wisc.). Plasmids were transformed into JM109 competent cells (Promega) following the manufacturer’s standard protocol. Transformed cells were incubated with shaking in 950 µL

SOC medium (SOB from MP Biomedicals) in polypropylene tubes (Falcon) for 90 minutes at

37° C, then cells were pelleted and resuspended in SOC. Resuspended cells (100 µL) were plated on duplicate plates [LB agar, ampicillin (100 ng mL-1), X-Gal (80 µg mL-1), and IPTG (0.5 mM); KD Medical, Columbia, MD] and incubated overnight at 37° C. Colonies (41-42 per sample, for a total of 250) were picked and transferred to polypropylene tubes with 3 mL SOC medium; isolates were incubated with shaking overnight at 37° C. Plasmids were extracted using a Qiagen

Qiaprep plasmid kit. The 16S rRNA gene region was amplified out of the plasmids using pUC/M13 primers flanking the insert region (pUC/M13F:

CGCCAGGGTTTTCCCAGTCACGAC; pUC/M13R: TCACACAGGAAACAGCTATGAC).

PCR reactions consisted of 12.5 µL Kapa Robust Hotstart ReadyMix, 1.25 µL each of the respective forward and reverse primers (10 mM), 8 µL PCR water (5 Prime), and 2 µL plasmid

DNA, cycled for 3 min at 95° C; 30 cycles of 15 sec at 95° C, 15 sec at 65° C, and 45 sec at 72°

C; and 10 min at 72° C. Inserts were isolated by gel extraction using a Qiagen Qiaquick kit.

Inserts were Sanger sequenced from each end using the same pUC/M primers at the University of California, Berkeley DNA Sequencing Facility.

Sequences were trimmed and assembled in Sequencher (v 5.4.1, Gene Codes

Corporation). Sequence ends were quality-controlled such that: the first 25 base pairs in each sequence had < 10 ambiguous bases; the first 10 base pairs had < 2 bases with < 25% confidence; any leading or trailing ambiguities after the previous quality control were removed; and vector sequences (the pGem-T Easy plasmid insertion sites) were removed. Forward and reverse sequences were assembled with a minimum 20 base pair overlap and a minimum 85% match in the overlapping region. Unassembled sequences were retained unless they were shorter than 100 base pairs or longer than 1400 base pairs, or had overall quality scores < 90%.

Taxonomies were then assigned to both contigs and unassembled sequences as described in the

Experimental Procedures.

Amplicon library construction and sequencing: PCR conditions Samples were amplified using each of four 16S rRNA gene primer sets (Table 1), with clone-based mock communities amplified separately from genomic DNA field samples to avoid cross-contamination. Initial PCR reactions consisted of: 12.5 µL Kapa Robust Hotstart

ReadyMix, 1.25 µL each of the respective forward and reverse primers (10 mM), 8 µL PCR water (5 Prime), and 2 µL genomic DNA (~0.5-12 ng gDNA). Reactions were cycled for 3 min at 95° C; 25 cycles of 30 sec at 95° C, 30 sec at 60° C, and 1 min at 72° C; and 5 min at 72° C.

Amplicons were cleaned with Promega Wizard SV Binding plates (field samples) or Agencourt

Ampure XP beads (mock communities). Nextera XT index primers were attached with PCR reactions consisting of 12.5 µL Kapa Robust Hotstart ReadyMix, 2.5 µL each Nextera XT N700 and S500 series index primers, 2.5 µL PCR water, and 5 µL cleaned product from the first PCR reaction, cycled for 3 min at 95° C; 8 cycles of 30 sec at 95° C, 30 sec at 55° C, and 1 min at 72°

C; and 5 min at 72° C. Samples were then normalized, pooled, and cleaned as in the

Experimental Procedures.

Amplicon library bioinformatics

Bioinformatic analyses were conducted in mothur, modified from the pipeline described in Nelson and Carlson (2012), with samples subdivided by both primer set and type (field sample or mock community) for analysis. Forward and reverse reads were assembled, and low quality sequences were removed (those with: >2 ambiguous base calls, defined in the overlapping region as a disagreement between the two reads with a difference in quality scores of <6; homopolymers >10 nucleotides; and sequence length shorter or longer than 95% of the sequences). Quality-controlled sequences were dereplicated to aggregate unique sequences and aligned to the custom SILVA v115 taxonomy as above. Aligned sequences were screened to remove sequences that started or ended at positions outside 99% of total sequences. To reduce the influence of sequencing errors on community structure, samples from primer sets V4 and V1-

2 were preclustered allowing for 2 base pair mismatches, and samples from primer sets V3-4 and

V4-5 were preclustered with 3 mismatches (Huse et al., 2010); the differing settings made for a similar ratio of mismatches to total amplicon length. Potentially chimeric sequences were identified with UCHIME (Edgar et al., 2011) then removed. Sequences were then classified and grouped into operational taxonomic units as described in the Experimental Procedures.

Supplemental references

Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194-2200.

Huse, S.M., Welch, D.M., Morrison, H.G., and Sogin, M.L. (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889-1898.