University of Massachusetts Amherst ScholarWorks@UMass Amherst

Doctoral Dissertations Dissertations and Theses

October 2019

ECOLOGY OF THE ELUSIVE: GENOME-INFORMED INVESTIGATION OF SOIL MICROBIAL DIVERSITY

Lauren Alteio University of Massachusetts Amherst

Follow this and additional works at: https://scholarworks.umass.edu/dissertations_2

Part of the Terrestrial and Aquatic Ecology Commons

Recommended Citation Alteio, Lauren, "ECOLOGY OF THE ELUSIVE: GENOME-INFORMED INVESTIGATION OF SOIL MICROBIAL DIVERSITY" (2019). Doctoral Dissertations. 1680. https://doi.org/10.7275/hsq9-ph87 https://scholarworks.umass.edu/dissertations_2/1680

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. ECOLOGY OF THE ELUSIVE: GENOME-INFORMED INVESTIGATION OF SOIL MICROBIAL DIVERSITY

A Dissertation Presented

by

LAUREN VICTORIA ALTEIO

Submitted to the Graduate School of the

University of Massachusetts Amherst in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

September 2019

Graduate Program in Organismic and Evolutionary Biology

© Copyright by Lauren Victoria Alteio 2019

All Rights Reserved

ECOLOGY OF THE ELUSIVE: GENOME-INFORMED INVESTIGATION OF SOIL MICROBIAL DIVERSITY

A Dissertation Presented

by

LAUREN VICTORIA ALTEIO

Approved as to style and content by:

______

Laura A. Katz, Chair

______

Courtney C. Babbitt, Member

______

Kristina Stinson, Member

______

Tanja Woyke, Member

______

Paige S. Warren, Graduate Program Director

Graduate Program in Organismic and Evolutionary Biology

DEDICATION

To Jeremy Hayward for your unrelenting passion for science and fungi, and for your belief in me from the start. You were a dear friend and mentor. Not a day goes by that you are not on my mind.

ACKNOWLEDGMENTS

Thank you Laura for supporting me through the roughest period of my graduate career. You advocated for me so strongly, I cannot express how grateful I am to have you as a mentor. I am so proud to be a part of the Katz lab. Thank you for accepting me as your own when you had very little to gain.

Thank you Tanja for your support, enthusiasm and advocacy, and for hosting me at the JGI during my SCGSR internship. I am so glad I had the opportunity to work with you. I admire your passion for science and ability to navigate challenging situations with ease. I hope we have many more years of collaboration to come!

Thank you Courtney and Kristina for your support and helpful discussions about my dissertation research, and your helpful career advice.

Thank you Paige for your diplomacy and advocacy. I would not have made it to the finish line without you. You are doing an incredible job as GPD!

Thank you Stephanie and Dagmar for your support and enthusiastic discussions about soils and ! I appreciate you taking me in and hosting me during my visits to DoME.

I am grateful to Jeff Blanchard for setting me up for a successful scientific career. Thank you to the Blanchard lab for all of their support.

Thank you to William Rodriguez for being a lab-mate and brother to me and a fierce supporter and advocate for my science. You are amazing! Best of luck in everything you do in your future.

I am grateful to Alex Truchon for your dedication to our projects and interesting conversations about our results.

Thank you to the Harvard Forest community. Since 2012 you have become a family for me. You helped cultivate my love of science, and provided a supportive environment for me to grow, learn and lead.

Thank you to all of my friends at UMass, the JGI, Walnut Creek run club, DoME and beyond who supported me both near and far through the troughs and peaks of my graduate experience. I truly would not be here today if it were not for all of you. I am grateful to Trisha Zintel for your support, heart-to-hearts and painting sessions. Thank you Kit Straley for your encouragement and creativity in the face of great challenges. Thank you Emily Fusco for your wise words and for being my hiking and running partner. I am grateful to Carolyn Anderson and Elsa Cousins for all of your cooking, adventuring, fermenting and rallying over the last few years. You are great friends and roommates!

Thank you Kim De Paepe, Nora Grossschmidt and Dimitri Meier for making me feel right at home in Vienna. I look forward to seeing you all soon!

v

Thank you to the SUNY-ESF community, particularly the Horton Lab, who supported me as an undergraduate and encouraged me to pursue a PhD. I am grateful to Tom Horton for accepting me into your lab and allowing me to chart my own course. Thank you Becka for being a mentor, friend and confidant while I was an undergraduate and beyond.

Thank you, Benni, for your relentless support and positivity through the last part of my PhD candidacy. Du bist afoch leiwand, mein Freund.

Lastly, (but definitely not least) thank you to my family. Thank you Aunt Andrea and Uncle Bruce for always leaving your doors and hearts open for me. Thank you Morgan for standing by me through it all. Thank you Mom and Dad for believing in me no matter what. Words cannot express how much your love and support has meant to me throughout this journey.

vi

ABSTRACT

ECOLOGY OF THE ELUSIVE: GENOME-INFORMED INVESTIGATION OF SOIL MICROBIAL ECOLOGY

SEPTEMBER 2019

LAUREN VICTORIA ALTEIO

B.S., STATE UNIVERSITY OF NEW YORK COLLEGE OF ENVIRONMENTAL SCIENCE AND FORESTRY

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Laura A. Katz

Soil is considered one of the most diverse ecosystems on Earth, harboring diversity of organisms across the three domains of life. It is spatially and chemically heterogeneous: properties that intertwine in a complex matrix to support organismal diversity and function across different scales. Soil microorganisms both respond to and drive changes in ecosystems through metabolic activities. A single gram of soil is teeming with millions of cells comprised of thousands of .

Much of this diversity remains uncharacterized due to technical and methodological challenges faced by soil ecologists. Due to the complex physicochemical properties of soil and cross-feeding interactions between organisms, it is difficult to culture microorganisms in isolation. The immense biological diversity of soils also reduces bioinformatic genome assembly efficiency, therefore obscuring the scope of diversity. As one of Earth's main reservoirs of stored carbon, containing roughly two-thirds of carbon globally, terrestrial ecosystems may serve as a carbon source under future climate scenarios and drive further climate change. Despite challenges associated with the study of soil microorganisms, it remains critical to discover and describe diversity of microbial communities in soils if we are to understand resilience of our ecosystems to climate change.

Surveys of microbial diversity and function in soil have been conducted using amplicon sequencing, metagenomics, and metatranscriptomics, however a large knowledge gap persists in

vii

the characterization of diversity and ecological niches of elusive microorganisms. These are organisms that are typically recalcitrant to laboratory culture, and may appear in relatively low abudance in soil communities or exhibit a high degree of population microheterogeneity, thereby resulting in poor representation in genome assemblies. The focus of my dissertation research is the application of complementary genomic techniques in order to uncover more of the previously unknown microbial diversity contained in forest soils, and link this diversity to higher-level ecosystem function. Much of what is known about soil diversity has been contributed through cultivation-independent investigations, however diversity estimates indicate that we are only beginning to scratch the surface of bacterial, archaeal, and viral diversity in forest soils. We are therefore vastly underestimating the roles these organisms play in biogeochemical processes, such as the release of CO2 to the atmosphere through respiration. However, the scope of microbial diversity and their suite of metabolic functions remain challenging to link to ecosystem level processes due to methodological limitations.

For chapter 1 of my dissertation, I worked in collaboration with researchers at the

University of Vienna using extensive literature searches to explore the different spatial scales at which we study microbial diversity and function with the goal of linking microorganisms and their role as drivers of higher level processes. This work suggests that the level at which microorganisms interact, termed the 'microbial consortium', is a key scale which provides insights into microbial diversity, function, and enables scaling up from the single cell to the ecosystem. In chapter 2, I applied complementary metagenomic techniques to the discovery of soil biological diversity, including bulk metagenomics and a pooled, cell-sorting approach coupled to high-throughput sequencing, termed mini-metagenomics. In combination, these approaches uncover the genetic diversity of elusive microorganisms at the Harvard Forest Long-Term Ecological Research (LTER) site. Together, these approaches have generated some of the highest quality metagenome

viii

assembled genomes (MAGs) to date from this LTER experimental site, and have revealed a swath of diversity beyond the organisms typically found in high abundance in the soil. I demonstrate how complementary metagenomic techniques facilitate the discovery of biological diversity by highlighting the expanded knowledge of potential intracellular in the

Bacteroidetes. In chapter 3, I characterize the metabolism of representatives in the phylum

Acidobacteria subdivision 2, which are abundant in forest soils but have yet to be described as there are no available genome sequences in this taxonomic group. Finally, chapter 4 describes sixteen novel giant viruses which have been discovered in Harvard Forest soil for the first time in collaboration with researchers at the Joint Genome Institute. These expand knowledge of phylogenetic diversity of the nucelocytoplasmic large DNA viruses (NCLDV) by 21%, and further demonstrate the utility of complementary metagenomic approaches in uncovering diversity of elusive viral entities in addition to microbial life.

Observed changes at Prospect Hill, the longest-running soil warming experimental site at

Harvard Forest, reveal increases in soil microbial respiration, increases in nitrogen mineralization, decreases in soil organic matter and decreases in the overall microbial biomass of these soils in response to warming. Based on these findings, we can expect similar changes to occur at the Barre

Woods warming experiment, which was established at the Harvard Forest LTER site in 2002.

Additionally, we may anticipate similar changes in temperate forest soils as the Earth's climate changes and surface temperatures continue to rise. With these changes, the microbial community must change and adapt to shifting nutrient and substrate availability, moisture conditions and changing soil structure. This dissertation work supports our understanding of the expansion of niches for soil microorganisms with oligotrophic growth strategies and flexible metabolism. These traits will enable soil organisms to cope with a nutrient-limited environment that is predicted to occur in response to long-term climate change.

ix

TABLE OF CONTENTS

Page

ACKNOWLEDGMENTS ...... v

ABSTRACT ...... vii

LIST OF TABLES ...... xvi

LIST OF FIGURES ...... xvii

1. THE 'MICROBIAL CONSORTIUM': THE KEY SCALE FOR INVESTIGATING MICROBIALLY- MEDIATED PROCESSES ACROSS SOIL SCALES ...... 1

1.1. Abstract ...... 1

1.2 Introduction ...... 2

1.3 Advances in soil ecology at higher-level scales ...... 4

1.3.1 ‘Soil core/profile scale' - controls presented by plant roots, mycorrhizal fungi and soil fauna ...... 5

1.3.2 Methodological approaches and challenges at the ‘soil core/profile’ scale ...... 7

1.3.3 The millimeter-scale – including soil micro- and macroaggregates ...... 9

1.3.4 Methodological approaches and challenges at the aggregate scale ...... 12

1.4 The 'microbial consortium': The key scale for investigating microbially- mediated processes in soil ...... 13

1.4.1 Simplifying soils in the lab with microfluidics and ‘soil-on-a-chip’ ...... 16

1.4.2 Mini-metagenomics for capturing diversity at the 'microbial consortium' scale ...... 17

1.4.3 Single-cell investigations of the ‘microbial consortium’ ...... 19

1.4.4 Computational approaches to the ‘microbial consortium’ ...... 21

1.5 Scaling up towards the ecosystem scale ...... 22

1.6 Synthesis ...... 25 x

2. COMPLEMENTARY METAGENOMIC APPROACHES REVEAL BACTERIAL AND ARCHAEAL DIVERSITY IN FOREST SOIL ...... 29

2.1 Abstract ...... 29

2.2 Introduction ...... 30

2.3 Results and Discussion ...... 34

2.3.1 Improved assembly and binning from mini-metagenomes ...... 34

2.3.2 Expansion of phylogenetic diversity ...... 36

2.3.3 Caveats of mini-metagenomics ...... 39

2.3.4 Representation of sorted-MAGs and MAGs Across Terrestrial Soil Metagenomes ...... 40

2.3.5 Biological Insights into Carbon Metabolism in Soil ...... 42

2.3.6 Reduced Carbon Metabolism in an Unclassified Bacteroidetes Clade ...... 45

2.4 Conclusions ...... 47

2.5 Methods ...... 47

2.5.1 Sample Collection and Incubation ...... 47

2.4.2 Sample Preparation and Cell Sorting ...... 48

2.4.3 Mini-Metagenomes ...... 49

2.4.4 Bulk Metagenomes ...... 49

2.4.6 Phylogenetic Tree Construction and Phylogenetic Diversity………..50

2.4.5 Genome Binning and Quality Assessment ...... 50

2.4.7 Gene Recruitment ...... 52

2.4.8 Metabolic Insights ...... 53

3. GENOME-INFORMED ECOPHYSIOLOGICAL CHARACTERIZATION OF NOVEL, UNCULTIVATED SUBDIVISION 2 ACIDOBACTERIA ...... 60

3.1 Abstract ...... 60

3.2 Introduction ...... 61 xi

3.3 Results and Discussion ...... 63

3.3.1 Taxonomic distribution across bulk- and mini-metagenomes reveals abundant acidobacteria in Harvard Forest soils ...... 63

3.3.2 Phylogenomic distribution of acidobacterial MAGs in forest soil expands known diversity of subdivision 2 acidobacteria ...... 65

3.3.3 Comparison of genomic similarity across acidobacteria supports discovery of novel species ...... 66

3.3.4 Genomic potential reflects functional similarities driven by acidobacterial subdivision ...... 68

3.3.5 Carbohydrate degradation potential indicates overlap in metabolism and niche specialization across Acidobacteria ...... 70

3.3.6 Gene expression analysis suggests metabolic flexibility of subdivision 2 acidobacteria ...... 74

3.3.6.1 Evidence of assimilatory sulfur metabolism ...... 75

3.3.6.2 Assimilatory nitrate reduction in subdivision 2 ...... 76

3.3.6.3 Terminal oxidases enable respiration in low oxygen environments ...... 77

3.3.6.4 Hydrogenases enable hydrogen scavenging and utilization ...... 78

3.3.7 Metabolic strategies across Acidobacteria reflect environmental selection and niche specialization ...... 79

3.8 Synthesis ...... 81

3.9 Methods ...... 82

3.9.1 Sample Collection and Processing ...... 82

3.9.2 Mini-Metagenomics ...... 82

3.9.3 Bulk Metagenomics ...... 83

3.9.4 Metatranscriptomics ...... 83

3.9.5 Genome Binning and Quality Assessment ...... 84

3.9.6 Distribution of taxa in bulk and sorted-MAGs ...... 85

xii

3.9.7 Taxonomic distribution of extracted 16S rRNA gene sequences ... 85

3.9.8 Phylogenomic analysis of MAGs and Sorted-MAGs ...... 86

3.9.9 Average Nucleotide Identity and Amino Acid Identity ...... 86

3.9.10 Metabolic Insights from MAGs ...... 86

3.9.11 Estimation of gene expression in Acidobacteria ...... 87

3.9.12 Detecting hydrogenases in acidobacterial MAGs ...... 88

4. HIDDEN DIVERSITY OF SOIL GIANT VIRUSES ...... 101

4.1 Abstract ...... 101

4.2 Introduction ...... 101

4.3 Results ...... 103

4.3.1 Mini-metagenomics facilitated the discovery of giant viruses .... 103

4.3.2 Cell-sorted viral particles expand known diversity of NCLDV ...... 105

4.3.3 Genomic features of soil giant viruses ...... 106

4.3.4 Genome novelty of soil giant viruses ...... 107

4.3.5 True diversity of giant viruses in forest soil ...... 108

4.4 Discussion ...... 109

4.5 Methods ...... 111

4.5.1 Sampling and sample preparation ...... 111

4.5.2 Mini-Metagenomes ...... 112

4.5.3 Bulk Metagenomes ...... 112

4.5.4 Metatranscriptomes ...... 113

4.5.5 Metagenome Binning ...... 113

4.5.6 Screening for giant viruses ...... 113

4.5.7 Annotation and quality control of viral genome bins ...... 114

4.5.8 Experimental benchmarking of the mini-metagenomics approach ...... 114

xiii

4.5.9 Computational benchmarking of giant virus metagenomic binning ...... 115

4.5.10 Phylogenomics ...... 115

4.5.11 Major capsid protein analysis ...... 117

4.5.12 Gene sharing network ...... 117

APPENDICES

A. SUPPLEMENTARY TABLES ...... 123

B. SUPPLEMENTARY FIGURES ...... 125

BIBLIOGRAPHY ...... 138

xiv

LIST OF TABLES

Table Page

Table 3.1. Number of reads assigned to Acidobacteria from the mini- and bulk metagenome datasets using RDP Classifier...... 87

Table 3.2. CheckM quality assessment of bulk and sorted-MAGs classified to the phylum Acidobacteria...... 88

Table 3.3. Number of expressed KEGG genes in subdivision 2 acidobacteria draft genomes...... 89

Table 3.4. Number of expressed CAZy genes in each subdivision 2 acidobacteria genome...... 90

Table S1. RDP Classifier results for full-length 16S sequences extracted from acidobacteria MAGs...... 122

xv

LIST OF FIGURES

Figure Page

1.1. Conceptual image depicting the different scales at which processes occur in soil and exemplifying major process controls at each scale ...... 27

1.2. Diversity contributes to the complexity of soil ecosystems ...... 28

2.1. Overview of mini-metagenome and bulk metagenome approaches used in this study ...... 54

2.2. Assessment of sorted-MAG and MAG quality ...... 55

2.3. Phylogenetic diversity of soil taxa identified in this study ...... 56

2.4. Comparison of MAGs from this study with published data from terrestrial metagenomes ...... 57

2.5. Insights into carbon metabolism within the phylum Bacteroidetes ...... 58

3.1. Taxonomic distribution in mini- and bulk metagenomes ...... 91

3.2. Distribution of acidobacterial clades across the bulk metagenome and mini-metagenome samples ...... 92

3.3. Concatenated marker gene tree of acidobacterial MAGs and references ...... 93

3.4. Quality assessment of MAGs and reference MAGs from GTDB ...... 94

3.5 Maximum likelihood tree of acidobacterial subdivisions 1,2,3,4,6 and 8 based on full-length 16S rRNA gene...... 95

3.6. PCoA of COG/NOG genes across Acidobacteria subdivisions ...... 96

3.7. PCoA of CAZy gene families across Acidobacteria subdivisions ...... 97

3.8. Metabolic diagram of the 5 subdivision 2 acidobacteria draft genomes ...... 98

4.1. Discovery pipeline for soil giant viruses ...... 117

4.2. Expansion of NCLDV diversity by soil giant viruses ...... 118

4.3. Genome novelty of soil giant viruses ...... 119

4.4. Hidden diversity of giant viruses in bulk metagenomes ...... 120

xvi

S1. Full concatenated marker gene tree of acidobacterial MAGs and references...... 124

S2. Genomic similarity across the 5 subdivision 2 acidobacteria characterized in this study ...... 125

S3. Genomic similarity across subdivision 1 acidobacteria ...... 126

S4. PcoA clustering COG/NOG categories across acidobacteria MAGs and reference genomes...... 127

S5. Heatmap of EggNOG genes across Acidobacteria subdivisions ...... 128

S6. PcoA clustering CAZy categories across acidobacteria MAGs and reference genomes ...... 129

S7. Heatmap of CAZy gene families across acidobacterial subdivisions ...... 130

S8. Relative abundance of CAZy gene families across acidobacteria MAGs and reference genome sequences ...... 131

S9. Phylogenetic tree of group 1h/5 [NiFe] hydrogenases ...... 132

S10. Glycoside hydrolases expressed across subdivision 2 acidobacteria MAGs ...... 133

S11. Relative abundance of CAZy gene families expressed across subdivision 2 MAGs ...... 134

S12. Distribution of acidobacterial classes in the mineral and organic soil horizons ...... 135

xvii

CHAPTER 1

1

THE 'MICROBIAL CONSORTIUM': THE KEY SCALE FOR INVESTIGATING MICROBIALLY-MEDIATED

PROCESSES ACROSS SOIL SCALES1

1.1. Abstract

Microorganisms are abundant and diverse in soils, and constitute the main drivers of biogeochemical processes. They govern processes across different scales – from the single-cell

(µm) to the ecosystem (Km). Yet, across these scales, processes are controlled by different factors

- from the genetic-physiological capabilities at the single-cell level to land-use and vegetation type at the ecosystem scale. To fully appreciate ecosystem function, we need to understand the composition of microbial assemblages, the activities and interactions of microorganisms, and the controls governing them across these scales. In this perspective, we discuss different spatial scales that should be considered when investigating soil microbial processes along with their controls.

These scales include (1) the ‘soil core/profile’ scale at which plant roots and mycorrhizal fungi interact with bacteria; (2) soil macroaggregates and microaggregates; and (4) the scale at which single cells interact within ‘microbial consortia’. Until now, most investigations have been made at discrete scales, from the ‘soil core/profile’ to the ecosystem-scale, with more recent advances at the single-cell and soil aggregate levels. We propose that the ‘microbial consortium’ is a key scale that influences processes across scales, due to emergent properties captured neither at the single cell nor ecosystem scale. Due to technical challenges this scale remains underexplored, and we propose more refined methods and tools for these investigations.

1 Alteio LV, Eichorst SA, Kaiser C, Katz LA, Richter A, Wanek W, Woebken D. and Woyke, T. Toward the understanding of spatial scale: Microbial consortia as the key scale for investigating soil processes. In prep for submission to FEMS. Authors listed in alphabetical order.

2

1.2 Introduction

Soils are among the most diverse and complex habitats for microbial life on the planet with estimates of 109 to 1010 cells per gram of soil, harboring 1,000 to 1,000,000 species within this gram (e.g. R. Amann & Rosselló-Móra, 2016; Gans, Wolinsky, & Dunbar, 2005; Jay T Lennon &

Locey, 2016; Locey & Lennon, 2016) As regulators of ecosystem processes, understanding taxonomic diversity and functional redundancy within soil microbial communities is critical for predicting habitat resilience under conditions of environmental change (Allison & Martiny, 2008;

Delgado-Baquerizo et al., 2016).⁠ However, the extensive microbial diversity and the function carried out by members of the microbial community remains largely uncharacterized due to methodological challenges. The complex physicochemical properties of soil in addition to high levels of spatial heterogeneity make accurate estimates of diversity and functional potential difficult to attain (Nemergut et al., 2013).⁠

The main driver of microbial diversity in soil communities is presumed to be spatial heterogeneity of soil as this provides ecological opportunities for various microbial species, populations and communities (Nagy, Ábrahám, Keymer, & Galajda, 2018; Nemergut et al., 2013).⁠

However, studies of the spatial properties of soil are limited (as reviewed in Baveye et al., 2018),⁠ thereby restricting scientific understanding of this key property in shaping microbial community assemblages, and driving higher-level processes. Soil spatial heterogeneity plays a role in the diversity and function of microorganisms across scales - from the µm scale at which single cells interact, to the meter scale where large-scale ecosystem processes occur (Štursová, Bárta,

Šantrůčková, & Baldrian, 2016; Vos, Wolf, Jennings, & Kowalchuk, 2013). Yet the controls across these scales vary considerably. For example, the edaphic properties stemming from soil

3

heterogeneity result in patchy distribution of nutrients, generating ‘hot spots’ of microbial activity and diversification (Rillig, Muller, & Lehmann, 2017).

Prior studies have implicated microorganisms as drivers of biogeochemical cycles, and cultivation and community omics have further expanded knowledge of taxonomic and functional diversity. Numerous field and laboratory incubation experiments have been established that encompass soils across biomes (e.g. Bond-Lamberty et al., 2016; Hultman et al., 2015; Y. Luo, Wan,

Hui, & Wallace, 2001; Melillo et al., 2017; Noh et al., 2016; Tveit, Urich, & Svenning, 2014).⁠ These studies use a broad range of technical approaches to characterize microbial diversity and link this to ecosystem level process measurements. Research at the ecosystem scale includes broad measurements of nutrient and gaseous fluxes (e.g. Carey et al., 2016; Frey, Lee, Melillo, & Six,

2013; Jerry M Melillo et al., 2011; Tucker, Bell, Pendall, & Ogle, 2013),⁠ as well as biomass (e.g. S.

D. Frey, Drijber, Smith, & Melillo, 2008; Spohn et al., 2016) ⁠ and phospholipid fatty acid (PLFA) estimates of microbial community structure (Watzinger, 2015).⁠

Recently, technical challenges that result from the expansive microbial diversity and spatial heterogeneity have been addressed using more refined approaches to studying the diversity of soil microorganisms. These approaches include cell-sorting coupled to high- throughput sequencing, microfluidics, and bioinformatic investigations of microorganisms in co- occurrence networks (Berry & Widder, 2014; Ofaim et al., 2017).⁠ Although studying microbial communities using these approaches elucidates diversity and activity at discrete scales, it does not reveal the emergent properties of soil ecosystems which become apparent through measurements of interactions within and across scales. Investigations of microbial assemblages across scales remain key in order to link microbial diversity and function to higher-level ecosystem processes.

4

In this perspective, we address different soil scales that should be considered when investigating microbial processes along with their controls, namely (1) soil cores/profile scale; (2) the soil aggregate scale including micro- and macroaggregates; and (3) the ‘microbial consortium’ scale at which single cells interact (Figure 1.1). We further discuss potential methods to investigate the controls of microbial community structure and functions across the different scales. We propose that the underexplored ‘microbial consortium’ is a key scale that can have significant influences on many larger scale biogeochemical processes. It is the aim of this perspective to encourage investigators to take more refined approaches in their studies of soil communities, particularly with a focus on the ‘microbial consortium’.

1.3 Advances in soil ecology at higher-level scales

1.3.1 ‘Soil core/profile scale' - controls presented by plant roots, mycorrhizal fungi and soil fauna

The ‘soil core/profile’ scale has been extensively investigated across soil environments.

The classical approach of collecting and homogenizing soil cores is found in countless studies

(reviewed in Fierer, 2017); however the ‘soil core’ represents a rather artificial scale, reflecting the needs of the experimenters and available analytical methods. Nonetheless, this is the scale from which estimates of 109 to 1010 microbial cells and 1,000 to 1,000,000 species per gram of soil have been derived (R. Amann & Rosselló-Móra, 2016; Gans et al., 2005; Jay T Lennon & Locey, 2016;

Locey & Lennon, 2016a).⁠ This microbial diversity is hypothesized to be a control of ecosystem level processes even though the details of these processes occur at much smaller scales that are not accounted for due to homogenization of soil within cores (Bach, Williams, Hargreaves, Yang, &

Hofmockel, 2018).⁠ Perhaps this is one of the reasons that the exact influence of microbial diversity and community structure on these processes has generally remained elusive (Prosser et al., 2007).⁠

5

Studies at the ‘soil core/profile’ scale indicate that the mineral properties of soils, as well as plant roots and mycorrhizal associations are the main controls at this level. Plants and mycorrhizal fungi physically modify soil structure via fragmentation and formation of aggregates

(Angers & Caron, 1998) ⁠ which stabilize nutrients and generate microhabitats for bacteria (Bach et al., 2018; Six & Paustian, 2014).⁠ Plant roots introduce pulses of nutrients to the soil in the form of labile exudates, a process termed the ‘rhizosphere priming effect’ (Bengtson, Barker, & Grayston,

2012; Cheng, 2009).⁠ In addition to contributing to soil aggregation and stimulating microbial decomposition of recalcitrant soil organic matter, this ‘priming effect’ also promotes root colonization of mycorrhizal fungi through the release of strigolactones and labile carbon exudates

(Bengtson et al., 2012; Bonfante & Anca, 2009; Bouwmeester, Roux, Antonio Lopez-Raez, & Card,

2007; Cheng, 2009; Kuzyakov, 2010).⁠ Mycorrhizal fungi uptake, release, and transport distant nutrient sources (Gorka et al., 2019),⁠ and as such they also serve as ‘hyphal highways’ for bacterial dispersal and access to nutrients throughout the soil, as well as promoting microenvironment connectivity (Roux et al., 2017; Simon et al., 2015; Warmink, Nazir, Corten, & van Elsas, 2011).⁠

Other soil biota, including micro-, meso- and macrofauna, act as biological controls on the

‘soil core/profile’ scale by modifying soil structure and maintaining bacterial population sizes through trophic interactions (Geisen et al., 2018; Siddiky, Kohler, Cosme, & Rillig, 2012).

Microfauna include protists and nematodes (Bardgett & Van Der Putten, 2014; Stork & Eggleton,

1992) ⁠ which are found in cell densities of 104 to 107 individuals per m2 of soil (Bardgett & Van Der

Putten, 2014; Vinciguerra, 2009).⁠ These microfauna interact with bacteria and fungi through predation, thereby controlling population densities and making nutrients available to other trophic levels (Bardgett & Van Der Putten, 2014; Geisen et al., 2018; Vinciguerra, 2009).⁠

Mesofauna include invertebrate species such as mites (Acari), springtails (Collembola) and

6

millipedes, that strongly influence the fungal populations through mycophagy (Stork & Eggleton,

1992) ⁠ and facilitate litter fragmentation (Lavelle, 1997).⁠ Macrofauna, including earthworms, termites and ants, are commonly referred to as ‘ecosystem engineers’ as they mechanically alter the physical structure and chemical composition of their soil surroundings (Stork & Eggleton,

1992).⁠ Together, soil fauna control bacterial population dynamics, re-distribute and prime substrates for further decomposition, and generate ‘hot-spots’ of microbial activity across soil ecosystems (Kuzyakov & Blagodatskaya, 2015).⁠

1.3.2 Methodological approaches and challenges at the ‘soil core/profile’ scale

Across the ‘soil core/profile’ scale, numerous techniques have been applied to understand the microbial community. Historically studied through laboratory cultivation, only a minority (0.5% to 1% of the total diversity) of soil bacteria can grow in the lab (e.g. Fierer, Bradford, & Jackson,

2007; Lloyd, Steen, Ladau, Yin, & Crosby, 2018; Rappé & Giovannoni, 2003; Torsvik & Øvreås, 2002)

Alternative approaches to strain isolation aim to preserve some of the cross-feeding interactions between bacteria by focusing on communities (D’Souza et al., 2018) ⁠ These include the isolation chip (ichip; Nichols et al., 2010) and microfluidic-based cultivation (Burmeister et al., 2019),⁠ both of which maintain conditions more similar to in situ soil environments.

Challenges associated with laboratory cultivation have been circumvented using cultivation-independent sequencing approaches. Amplicon surveys have been applied in numerous studies at the ‘soil core scale’, revealing shifts in soil community structure under changing environmental conditions (e.g. Bond-Lamberty et al., 2016; Lladó, Žifčáková, Větrovský,

Eichlerová, & Baldrian, 2016). However, PCR-based approaches are dependent on primers, which may not adequately amplify all organisms in a community, resulting in bias regarding organismal

7

abundance (Eloe-Fadrosh, Ivanova, Woyke, & Kyrpides, 2016).⁠ High-throughput sequencing technologies and improved nucleic acid extraction kits have made community sequencing more accessible and efficient (Van Dijk, Lè Ne Auger, Jaszczyszyn, & Thermes, 2014).⁠ Metagenomic and metatranscriptomic approaches target nucleic acids (DNA or RNA) from the total community, unraveling the microbial community structure and gene expression without primer amplification bias (Eloe-Fadrosh et al., 2016).⁠ Insights into soil community structure and metabolic potential have been achieved using these techniques; however, significant challenges persist in high- throughput sequence data at the ‘soil core/profile’ scale including lack of available bacterial reference genome sequences and poor bioinformatic sequence assembly (Figure 1.2).

Stable isotope probing (SIP) experiments are more targeted approaches to accessing a subset of the microbial community (e.g. Angel et al., 2018; Dumont & Murrell, 2005; Eichorst &

Kuske, 2012; Radajewski, McDonald, & Murrell, 2003).⁠ In SIP approaches, grams of soil are incubated with an isotopically labeled substrate (e.g. using 15N-, or 13C-labeled compounds) or

18 water (e.g. H2 O, D2O). These isotopes become incorporated into microbial biomarkers (e.g. DNA,

RNA, PLFA or proteins), allowing investigators to distinguish active from inactive microorganisms by analyzing stable isotope enrichment of the respective biomarker (Murrell & Whiteley, 2011).⁠

When coupled with genomics-based methods, investigators can target specific functional groups within the soil community and elucidate the influence of microorganisms on a given process (e.g.

Dumont & Murrell, 2005; Murrell & Whiteley, 2011; Radajewski, Ineson, Parekh, & Murrell, 2000).⁠

Recently, a higher-throughput method has been developed, that combines high-density phylogenetic microarrays (‘chips’) with SIP (CHIP-SIP; Mayali et al., 2012).⁠ This highly sensitive approach enables quantification of stable isotope incorporation (such as 13C or 15N) using secondary ion mass spectrometry (NanoSIMS), coupled to taxonomic identification of ribosomal

8

RNA using microarrays (Mayali et al., 2012).⁠ This approach has been applied to estuarine (Mayali et al., 2012) ⁠ and marine samples (Mayali, Stewart, Mabery, & Weber, 2016; Mayali, Weber,

Mabery, & Pett-Ridge, 2014),⁠ but still awaits its application on soil samples.

1.3.3 The millimeter-scale – including soil micro- and macroaggregates

Soil aggregate surfaces and the pore spaces between aggregates are considered ‘hot spots’ of microbial activity (Kuzyakov & Blagodatskaya, 2015),⁠ and therefore comprise a highly relevant scale for investigating microbial processes. Soil aggregates are hierarchical constructs containing building blocks of different sizes: microaggregates are defined as <250 µm in diameter and comprise organo-mineral complexes <53 µm (also termed “silt and clay-sized aggregate fraction”), whereas macroaggregates are defined as >250 µm aggregates (also termed “sand-sized aggregate fraction”) (Six & Paustian, 2014; Totsche et al., 2018).⁠ Microbial process controls at the microaggregate scale include organo-mineral interactions, aggregate stability and pore size distribution (Jastrow, Miller, & Lussenhop, 1998).⁠ Process controls at the macroaggregate scale include soil micro-architecture, as well as fungal-bacterial and plant-microbe interactions (Poirier,

Angers, & Whalen, 2014) ⁠ (Figure 1.1).

Progress in understanding the biological component of soil microenvironments at the aggregate scale, has remained slow relative to our understanding of the physical and biogeochemical properties of soils (Baveye et al., 2018). Studies at the soil aggregate scale have revealed that the abundance of specific microbial groups differs across aggregate size classes: for example, fungi are predominantly associated with coarse-scale fractions or macroaggregates, whereas bacteria and archaea are found to a greater extent in the smallest aggregate size fractions

(Helfrich, Ludwig, Thoms, Gleixner, & Flessa, 2015; Hemkemeyer, Christensen, Martens, & Tebbe,

9

2015; Poll, Thiede, Wermbter, Sessitsch, & Kandeler, 2003).⁠ This distribution of microorganisms across aggregate size fractions can be attributed to the higher surface area of clay-sized particles in microaggregates compared to macroaggregates, which facilitate the colonization and growth of bacteria through extracellular polysaccharide matrix formation (Sessitsch, Weilharter, Gerzabek,

Kirchmann, & Kandeler, 2001).⁠ In contrast, larger aggregates comprised of coarser particles provide habitat for fungal hyphae (Gupta, Bhandari, & Naushad, 2012; Kögel-Knabner et al., 2008).⁠

Smaller aggregate size fractions may provide refuge for bacteria from grazers, reduce competition for substrates, and serve as a mechanism for the physical protection of carbon in soils (Bach &

Hofmockel, 2016).⁠

Soil aggregates are considered hotspots of microbial activity. Some studies found that carbon-degrading enzymes showed higher activity in microaggregates compared to macroaggregates, whereas nitrogen-degrading enzymes (N-acetylglucosaminidase) were higher in macroaggregates (Bach & Hofmockel, 2016; Nie, Pendall, Bell, & Wallenstein, 2014).⁠ Recent investigations illustrated that the ratio of bacteria to fungi and of Gram-positive to Gram-negative bacteria strongly correlated with aggregate size and turnover rates of soil organic carbon (SOC) in microaggregates, whereas the presence of bacterivorous nematodes promoted the turnover of

SOC in macroaggregates (Jiang et al., 2018).⁠ Yet in other investigations, no significant differences in diversity were observed between the microbial communities of aggregates grouped into classes of high and low beta-glucosidase activity, suggesting no relationship between microbial community structure and ecosystem function (Bailey, Fansler, Stegen, & McCue, 2013).

More recently soil aggregates have been proposed to be ‘incubators’ for microbial evolution (Rillig et al., 2017).⁠ At the time of formation of an aggregate, a select collection of microorganisms is present, and during the subsequent steps of aggregate stabilization these

10

communities can evolve and are later released into the soil upon disintegration of the aggregates

(Rillig et al., 2017).⁠ This suggests microorganisms inhabiting soil aggregates are responding not only to changes in the surrounding soil aggregate environment, but also to interactions with their microbial neighbors. Given the varying effect(s) of aggregate class size on processes, perhaps the variation is due to the composition or genetic potential of the evolving soil aggregate consortium.

Taken together, it is plausible that this consortium could not only influence processes within soil aggregates, but also bulk scale processes upon aggregate disintegration illustrating the power of microbial interactions on function across scales.

1.3.4 Methodological approaches and challenges at the aggregate scale

There have been recent advances in individual-based modeling of microorganisms in soil aggregates to predict biogeochemical fluxes based on anaerobic versus aerobic conditions

(Ebrahimi & Or, 2016).⁠ However, this approach is facing challenges in incorporating additional factors (e.g. the influence of soil depth on soil organic content and quality, microbial community composition, preferential flow and root distribution and activity) into these models, and in scaling up to the ecosystem scale. Other methods are available to explore the interactions and/or structure of soil aggregates. For example, the pore structure of microaggregates can be characterized down to 1 µm resolution using micro-computed tomography (µCT) (Vos et al., 2013).⁠

This approach can be coupled to microfluidics, targeted investigations at the micrometer-scale via the manipulation and exchange of fluids, (Aleklett et al., 2018; Stanley, Grossmann, Casadevall i

Solvas, & deMello, 2016; Wessel, Hmelo, Parsek, & Whiteley, 2013)(Aleklett⁠ et al., 2018; Stanley,

Grossmann, Casadevall i Solvas, & deMello, 2016; Wessel, Hmelo, Parsek, & Whiteley, 2013) for downstream single-cell analysis (discussed further in later section).

11

The use of fluorogenic (e.g. methylumbelliferol, MUF) or fluorescence-resonance energy transfer (FRET) substrates may help finding the locations of extracellular enzyme activity in soil aggregates. This approach combines fluorescence and confocal microscopy at microscales (Kovarik

& Allbritton, 2011).⁠ Although challenging, the three-dimensional structure of soil aggregates can be preserved with resin embedding and sectioning for individual cell visualization at the micrometer scale (Vos et al., 2013).⁠ When this approach is combined with the use of stable isotope-labeled substrates, single-cell visualization and isotope composition analysis (such as FISH and NanoSIMS – more detailed discussion in subsequent section) for targeted process investigations, it holds tremendous potential for understanding the distribution of microorganisms, their function and interaction at this scale.

1.4 The 'microbial consortium': The key scale for investigating microbially-mediated processes in soil

In the environment, microorganisms live within social networks with other microorganisms, termed ‘microbial consortia’. Microorganisms in consortia are distributed within direct contact distance, promoting interaction with organisms of the same species, as well as other species and the surrounding environment. Examples of ‘microbial consortia’ include a population of a bacterium in a batch culture or a community of interacting microorganisms in a soil aggregate.

‘Microbial consortia’ can be comprised of any number of combinations of bacteria, archaea and single-cell eukaryotes all of which interact through the production and exchange of metabolites, competition for substrates and/or suppression by antagonistic compounds (D’Souza et al., 2018;

Pande & Kost, 2017).⁠

There are two main mechanisms of microbial interactions in soil: (i) interspecies metabolic cross-feeding that occurs ubiquitously in microbial environments (D’Souza et al., 2018; Hug & Co,

12

2018; Zelezniak et al., 2015),⁠ and (ii) chemical signaling or microbial cell-to-cell communication

(DeAngelis, 2016). Metabolic cross-feeding, i.e., if one microorganism uses metabolites produced by another as energy or nutrient sources, allows for diverse ecological interactions including competition, mutualism and cheating (D’Souza et al., 2018; Estrela, Trisos, & Brown, 2012; Morris,

2015).⁠ Interspecific bacterial communication can stem from the release of quorum sensing compounds, such as N-acyl-homoserine lactones in Gram-negative microorganisms or ϒ- butyrolactones in Gram-positive microorganisms (Teplitski, Robinson, & Bauer, 2000).⁠ The number of interactions between individuals as well as the distance of the interactions is limited by the

‘calling distance’ of these compounds. Bacteria cells are estimated to interact within 12.5 to 20

µm distances in the soil, enabling interactions with an average of 120 individuals and <100 other species (Raynaud & Nunan, 2014).⁠

Although soil microorganisms are frequently referred to as being functionally redundant or generalists (Allison & Martiny, 2008),⁠ we hypothesize that members of a microbial consortium are instead functional specialists, investing energy in a specific task for economical use of substrates. In support of this working hypothesis, studies have found that the type and amount of nutrients are heterogeneously distributed at the nanoscale (Lehmann et al., 2008),⁠ giving rise to resource specialization potential even at the smallest scale (Vos et al., 2013).⁠ Instead of microorganisms competing for the same resource, the presence of nutrients in different forms generates additional niches leading to a diversification of resource specialists (MacLean, 2005).⁠ As such, this strategic resource allocation reduces or abolishes competition for extracellular resources and can minimize the accumulation of intermediates, which could generate negative feedback effects (Lindemann et al., 2016).⁠

13

The idea of specialists within a microbial consortium reflects the emergent properties of an ecological system as described by (Nielsen & Müller, 2000),⁠ where subunits (here the microbial cells) cannot exist in isolation, and the interactions of the subunits emerge at a higher level as new properties (here functions). Microbial consortia for a given process have been documented previously in other systems to increase the efficiency and/or rate of a targeted process (Brenner,

You, & Arnold, 2008).⁠ As such, we propose that the consortium of interacting microorganisms is critical and thus the key scale, having significant influences on community metabolism, and thereby on soil processes and ecosystem functions. Our proposed ‘microbial consortium’ is the basis of microbial assemblages at higher spatial scales, including soil aggregates, and communities associated with plant roots and mycorrhizal fungi.

Although investigations at the ‘microbial consortium’ scale are still in their infancy due to technical limitations, the results could be groundbreaking for a better understanding of these microbial interactions and their subsequent effect on processes. We propose that these micro- scale microbial interactions can be explored by (1) microfluidics, such as the recently described

‘soil-on-a-chip’, (2) sequencing mini-metagenomes, (3) single-cell level investigations, and (4) computational investigations, such as co-occurrence networks and modeling. Taken together, these approaches provide insight into the genetic and metabolic diversity of microbial communities with finer resolution than soil community studies at the soil core and aggregate scales.

1.4.1 Simplifying soils in the lab with microfluidics and ‘soil-on-a-chip’

The complexity of microbial interactions within ‘microbial consortia’ warrants the application of more refined approaches to studying the microbial members and their activity in the native environment. Much of what is known about microbial diversity has come from high- 14

throughput sequencing approaches at the ‘soil core/profile’ scale, which ignore potential interactions between organisms and their environment. Microfluidic systems including ‘soil-on-a- chip’ were developed to investigate soil microorganisms and their interactions in situ (Aleklett et al., 2018; Stanley et al., 2016).⁠ Microfluidics permits precise spatiotemporal control and high- resolution imaging to examine interactions from consortium-scale down to the single-cell level

(Stanley et al., 2016).⁠

The use of microfluidics in soils has recently been proposed to address fundamental questions in the field of microbial ecology, namely in the understanding of physical heterogeneity, chemical gradients, microbial interactions, rhizosphere interactions and cultivating the ‘yet-to-be- cultivated’ (Aleklett et al., 2018; Massalha, Korenblum, Malitsky, Shapiro, & Aharoni, 2017; Nagy et al., 2018).⁠ The construction of artificial soil habitats enables investigation of ecological and evolutionary processes on a chip (Nagy et al., 2018). For example, the spatial heterogeneity of soil and the resulting patchy distribution of nutrients can be simulated using this approach (Aleklett et al., 2018; Nagy et al., 2018).⁠

1.4.2 Mini-metagenomics for capturing diversity at the 'microbial consortium' scale

The combination of genomics and cell sorting provides excellent resolution for investigating the activity of cells in microbial consortia, and may facilitate investigation of microbial intra- and interspecies interactions that play important roles at higher-level scales. Cell-sorting and sequencing has given rise to single-cell genomics, a more recent addition to the toolkit for studying the genetic makeup of uncultivated microbial cells from environmental samples (Blainey, 2013a;

Stepanauskas, 2012; Woyke, Doud, & Schulz, 2017).⁠ In this approach individual cells are isolated from consortia using fluorescence-activated cell sorting (FACS) or microfluidic sorting, and their

15

genomes are subjected to whole genome amplification after cell lysis, followed by sequencing to generate single amplified genomes (SAGs) (Stepanauskas et al., 2017; Woyke, Doud, & Schulz,

2017b).⁠ While this approach provides genomic information with strain-level resolution, the resulting SAG assemblies are often highly fragmented and incomplete, and the overall process is prone to biases and contamination. These challenges are further conflated by the high genetic diversity in soils in conjunction with the low-throughput of single cell workflows. Furthermore, these single cell approaches may not be adequate to capture the entire collection of organisms at the ‘microbial consortium’ scale.

One alternative approach, termed ‘mini-metagenomics’, sequences a small pool of sorted cells (<100 cells), thus allowing an investigator to characterize the genetic potential and associated of microorganisms from heterogeneous environments. Mini-metagenomics was first used in combination with shotgun sequencing to study hospital sink biofilm communities, which are comprised of highly diverse and potentially pathogenic organisms (McLean et al., 2013a).⁠

More recently, this approach was used in combination with microfluidics technology to explore microbial consortia in hot spring samples (Berghuis et al., 2019; Yu et al., 2017).⁠ For the first time, this mini-metagenomic approach was applied to soil from the Harvard Forest Long-Term Ecological

Research (LTER) site and facilitated the expansion of bacterial, archaeal, and giant virus diversity

(Alteio⁠ et al., in revision; Schulz et al., 2018).⁠ Sequencing a smaller pool of organisms within the highly diverse soil microbial community may result in better coverage of genomes, and the potential for improved genome sequence assembly (Figure 1.2; Yu et al., 2017).⁠ However, this mini-metagenomic approach does not preserve the overall structure and chemical composition of soil environments, thereby still representing a selective view of the microbial consortium.

16

The mini-metagenome approach is particularly attractive for studying microbial “dark matter” (Rinke et al., 2013),⁠ organisms which have not been isolated using culture. Improved sequence assembly capabilities may provide opportunities to capture lower abundance community members that are otherwise overlooked in bulk metagenome studies. In addition to contributing potentially novel diversity to genome sequence databases, reduction of soil community diversity and complexity by sampling subpopulations could refine genome-level resolution to the species- or even strain-levels (Yu et al., 2017).⁠ Accessing microbial diversity at this high resolution may enable quantification of selection in microbial populations using comparative genomics, resulting in improved understanding of how microorganisms change in response to environmental pressures.

Although mini-metagenomics does not preserve the spatial distribution or in situ soil conditions as in a microfluidic system, pooled cell sorting may enable investigation of close cell- cell interactions and symbioses. For example, a recent study using fluorescence activated cell sorting resulted in the co-sorting of Nanoarchaeota and their putative hosts, and enabled the identification of genes involved in maintaining this symbiotic relationship (Jarett et al., 2018).⁠

Similarly, cell sorting approaches of single- or pooled cells could decrease the complexity of entire

‘microbial consortia’ in soils, and presents opportunities for a closer look into how cells metabolize, interact and diversify in the soil environment.

1.4.3 Single-cell investigations of the ‘microbial consortium’

‘Microbial consortia’ and interactions therein can also be investigated in situ at the single- cell level with the use of stable isotopes (such as 13C, 15N- or 18O -labeled substrates) coupled with high-resolution techniques, such as secondary ion mass spectrometry (NanoSIMS) and Raman microspectroscopy. Similar to SIP approaches applied at the ‘soil core/profile’ scale, single-cell SIP 17

studies enable tracing of isotopes from labelled substrates into biomarkers within active cells.

Secondary Ion Mass Spectrometry (SIMS) is a surface analysis technique (Benninghoven,

Rüdenauer, & Werner, 1987) ⁠ that can be used to detect and quantify the abundance of rare isotopes (13C, 15N or 18O) in metabolically active cells relative to that of the common isotope (12C,

14N or 16O). Embedding samples in a resin and sectioning prior to SIMS can preserve information about the soil sample and associated ‘microbial consortia’ (Vidal et al., 2018).⁠ Another approach to SIP studies is Raman microspectroscopy, a rapid, non-destructive vibrational spectroscopic method which provides information on the molecular composition of a sample (S. Eichorst &

Woebken, 2014; W. E. Huang et al., 2007; Jarvis & Goodacre, 2004).⁠ The incorporation of stable isotopes in the cellular components gives rise to changes in Raman spectral signatures, allowing investigators to detect and quantify isotope tracer incorporation (S. Eichorst & Woebken, 2014),⁠ as shown for naphthalene degradation (W. E. Huang et al., 2007), carbon dioxide fixation (M. Li et al., 2012) ⁠ and general microbal activity (Berry et al., 2015; S. A. Eichorst et al., 2015).⁠

Fluorescence approaches also allow investigation of microbial consortia. Translationally active microorganisms have been identified in aquatic systems using an approach called bioorthogonal non-canonical amino acid tagging (BONCAT). This method involves the incubation of samples with a methionine amino acid analog. After performing click-chemistry, these non- canonical amino acids become fluorescently labeled and can be sorted using FACS. Although this approach has not yet been applied in soils, it will enable detection of metabolically active cells

(Hatzenpichler et al., 2016).⁠ The combination of NanoSIMS with cell identification via fluorescence in situ hybridization (FISH) (Rudolf Amann & Fuchs, 2008) ⁠ has proven very useful in linking the identity of a microorganism with its function in environmental samples (Clode et al., 2009; Musat

18

et al., 2008; Orphan, House, Hinrichs, McKeegan, & DeLong, 2001; Ploug et al., 2011; Woebken et al., 2012).⁠

FISH-NanoSIMS has been increasingly applied to aquatic samples, and with recently developed sample processing steps can also be applied to soil samples (S. A. Eichorst et al., 2015).⁠

The combined application of these single-cell techniques to soil samples allows one to not only maintain the sample’s structural information, but also reveal the identity of microorganisms, their activities and interactions within a ‘microbial consortium’. These approaches, although applied at the scale of the single cell within a consortium, can provide valuable information about the taxonomic identity and metabolic activity of organisms within the more complex ‘microbial consortium’. Deconstructing these complex interaction networks will enable improved understanding of microbial diversity, metabolism and interactions with higher resolution than approaches at the ‘soil core’ or aggregate scales, and presents opportunities for scaling from the single cell to higher-level ecosystem process.

1.4.4 Computational approaches to the ‘microbial consortium’

Exploration of microbial interactions have been conducted using bioinformatic analyses of amplicon and metagenomic datasets (Berry & Widder, 2014),⁠ coined co-occurrence networks, although rarely at the ‘microbial consortium’ level. These computational analyses provide a comprehensive, albeit indirect, assessment of potential microbial interactions within a community

(Barberán, Bates, Casamayor, & Fierer, 2012).⁠ Furthermore, microbial co-occurrence patterns can also be applied at higher scales of biological organization, for instance within and across ecosystems, thereby capturing emergent properties of ecological systems (Baveye et al., 2018;

Williams, Howe, & Hofmockel, 2014).⁠

19

More recently, the use of spatially-explicit, individual-based computer models has emerged to better elucidate spatial interactions between individuals (O’Donnell, Young, Rushton,

Shirley, & Crawford, 2007).⁠ These models incorporate aspects of soil architecture, chemistry and physics to dynamically determine the distribution and interactions between microorganisms and the environment (Baveye et al., 2018; O’Donnell et al., 2007).⁠ In turn, models can couple these interactions to metabolic functions, thereby describing the biogeochemical functioning of communities under different environmental conditions. For example, these computer-based models have shown that self-organizing and self-regulating features can emerge from these first- principle interactions in simulations of microbial decomposer communities (Allison, 2005; Kaiser,

Franklin, Dieckmann, & Richter, 2014; Kaiser et al., 2015; Momeni, Waite, & Shou, 2013).⁠

Moreover, changing environmental conditions has been shown to be a potential control for ecosystem-scale processes (Allison, 2005; Kaiser et al., 2014, 2015). The ability of computer models to accurately predict community function and dynamics is ultimately dependent on their parameterization. Given this, improved model-experiment integration is needed in the field of microbial ecology when applying these models to ensure a more robust view on an investigated system (Baveye et al., 2018; Widder et al., 2016).

1.5 Scaling up towards the ecosystem scale

Integrating across scales to the ecosystem level is challenging, as ecosystem scale processes are based on a complex network of metabolic processes and involve emerging scale- dependent controls. To facilitate the link between the microbial scale and ecosystem functions, processes have been classified as ‘broad processes’ and ‘narrow processes’. Broad processes encompass activities carried out by diverse microbial communities, with many species contributing to the same process (e.g. microbial respiration or organic nitrogen mineralization)

20

(Schimel & Schaeffer, 2012; Trivedi, Anderson, & Singh, 2013).⁠ In contrast, narrow processes are more defined in their genetic and physiological specificity (e.g. nitrification or CH4 oxidation). Yet it remains unclear as to whether the controls of ecosystem level processes - for both broad and narrow processes - are the same at the micro- and macroscale.

Measurements of ‘broad processes’ are relatively diffuse in regards to genetic and enzymatic targets, making high-resolution investigations challenging. Broad process measurements of microbial activity have been recorded using soil respiration chambers (Bowden,

Davidson, Savage, Arabia, & Steudler, 2004),⁠ organic nitrogen mineralization (S. D. Frey et al., 2008;

Pisani, Frey, Simpson, & Simpson, 2015; Schmidt et al., 2011) ⁠ and enzymatic assays (Widmer,

Flieûbach, Laczko Â, Schulze-Aurich, & Zeyer, 2000) ⁠ in complement to broad characterization of microbial community structure, including biomass and phospholipid fatty acid analysis (PLFAs;

(Watzinger, 2015).⁠ These approaches provide insight into processes performed by many if not most microorganisms, but are unable to assign function to specific organisms (Nesme et al., 2016).⁠

In contrast, narrow processes (such as nitrification or CH4 oxidation) are more defined in regards of their genetic and/or enzymatic targets and associated physiology (Schimel & Schaeffer,

2012).⁠ Therefore, understanding the influence of controls (such as labile C, Mo and V availability for N2 fixation) is hypothesized to be more direct, scalable and strongly linked to the active microbial consortia mediating these processes (Bakken, Bergaust, Liu, & Frostegård, 2012; Isobe,

Koba, Otsuka, & Senoo, 2011; Schimel & Gulledge, 1998).⁠

Challenges exist in integrating across scales primarily due to microbe-microbe interactions. Although processes such as microbial respiration is based on the activity of individual microorganisms, microbial community metabolism is not simply the sum of these single microbial

21

activities. For example, chemical interactions among cells and populations can have differing outcomes (Ponomarova & Patil, 2015; Stubbendieck, Vargas-Bautista, & Straight, 2016) and in addition several unrelated microbial groups can (simultaneously) influence the same process

(Röling, van Breukelen, Bruggeman, & Westerhoff, 2007).⁠ Given these non-predictable variables, determining the controls at one scale and then up or down scaling remains a major challenge and may not be possible.

To better understand the controls and determine if they are similar at both the micro- and macroscale, multi-disciplinary investigations with techniques and tools stemming from both the

+ macroscale (such as enzymatic assays and/or measurements of CO2 or NH4 production), combined with microbial community structure (via amplicon sequencing) and functional diversity

(via metagenome sequencing) investigations, and microscale approaches (such as microfluidics,

FISH, NanoSIMS, Raman microspectrosopy) are warranted. Typically, specific processes are investigated at one scale with its commonly applied set of techniques, which makes it challenging to extrapolate to the opposing scale where other processes are measured with other techniques.

This is further exacerbated by challenges the -omics field is facing, since much of the genomic, proteomic and metabolic data are only available for select model organisms across the archaea, bacteria and microbial eukaryotes (e.g. (Choi et al., 2017),⁠ and typically >50% of their genomic information lacks functional annotation (Delmont et al., 2015).⁠ More refined methodologies across scales are necessary to improve understanding of microbial community composition, metabolism, interactions, as well as how these factors are controlled and drive processes across scales in soil ecosystems.

22

1.6 Synthesis

Scales are an important aspect of soil microbial processes, which ultimately affect overall ecosystem function. In this perspective we sought to raise awareness about spatial scales in soils and reinforce the need to make more refined soil investigations across scales - with particular emphasis on the soil ‘microbial consortium’ scale. We propose that the ‘microbial consortium’ is the key scale that drives processes at higher spatial scales (from aggregate to ecosystem), demanding further consideration and investigation. Mini-metagenomics, microfluidics, stable isotope single-cell level investigations along with computational modeling and co-occurrence networks have the potential to provide the resolution to investigate the active participants in the microbial consortium. In light of our new view of the tree of life (Castelle & Banfield, 2018; Hug et al., 2016) ⁠ our knowledge regarding the extent of diversity is expanding. The complex properties of soil in addition to the expanse of unexplored microbial diversity contribute to the lack of understanding of microbial communities, and how processes at the single cell scale are related to processes at the ecosystem level. Study of soil communities and processes at the scale of the microbial consortium allow for more targeted measurements of microbial metabolism and interactions that result in feedbacks at higher levels. We postulate that the ‘microbial consortium’ scale could change our understanding of microbial community structure and function in soil(s), resulting in stronger linkages between microorganisms and their roles as ecosystem process drivers.

23

Figure 1.1. Conceptual image depicting the different scales at which processes occur in soil and exemplifying major process controls at each scale.

24

Figure 1.2. Diversity contributes to the complexity of soil ecosystems. Soil provides habitats for organisms including virophages, giant viruses, bacteria, archaea, protists, fungi and arthropods. However, this diversity presents challenges in the study of soil ecology. Approaches have been taken at higher level scales (including the soil core and millimeter scales) to capture the diversity and physiology to decrease complexity, including microbial biomass, phospholipid fatty acid analysis, and laboratory culture of microorganisms. To further dissect the diversity and potential function of organisms within microbial consortia, molecular approaches have been applied across scales including bulk metagenomics, mini-metagenomics, and single-cell genomics. These approaches reduce complexity of consortia in situ and enable improved genome assembly and binning.

25

CHAPTER 2

COMPLEMENTARY METAGENOMIC APPROACHES REVEAL BACTERIAL AND ARCHAEAL

DIVERSITY IN FOREST SOIL2

2.1 Abstract

Soil ecosystems harbor diverse microorganisms yet remain poorly characterized as neither single- cell sequencing nor whole community sequencing offers a complete picture of these spatially complex communities. Thus, the genetic and metabolic potential of this ‘uncultivated majority’ remains underexplored. To circumvent these challenges, we applied a cell sorting based mini- metagenomics approach and compared the results to bulk metagenomics. Informatic binning of these data produced 200 mini-metagenome assembled genomes (sorted-MAGs) and 29 bulk metagenome assembled genomes (MAGs). These sorted and bulk MAGs increased known phylogenetic diversity of soil taxa and increased total soil tree branch length by 7.2%. Additionally, sorted-MAGs expanded the rare biosphere not captured through MAGs from bulk sequences, exemplified through phylogenetic and functional analyses of the phylum Bacteroidetes. Analysis of 66 Bacteroidetes sorted-MAGs showed conserved patterns of carbon metabolism across four clades. These results indicate that mini-metagenomics enables genome-resolved investigation of predicted metabolism and demonstrates the utility of combining metagenomics methods to tap into the diversity of heterogeneous microbial assemblages.

2 Alteio LV, Schulz F, Seshadri R, Varghese N, Rodriguez-Reillo W, Ryan E, Goudeau D, Eichorst SA, Malmstrom RR, Katz LA, Blanchard JL, and Woyke T. Complementary metagenomic approaches reveal bacterial and archaeal diversity in forest soil. In revision at Microbiome.

26

2.2 Introduction

Soil is considered one of the most biologically diverse ecosystem types, yet much of its microbial diversity remains poorly characterized (e.g. (O’Donnell et al., 2007).⁠ Each gram of soil is estimated to harbor 1,000 to 1,000,000 different bacterial species (e.g. McLean et al., 2013).⁠⁠

Investigating soil microorganisms in situ is challenging due to the heterogeneous nature of the soil environment (e.g. Yu et al., 2017).⁠⁠ As a result, terrestrial habitats remain immense reservoirs of untapped genetic and metabolic diversity (Nesme et al., 2014; Torsvik & Øvreås, 2002) ⁠⁠ encoded within microbial communities that drive important ecosystem-level processes, including nitrogen cycling and carbon dioxide flux (Delgado-Baquerizo et al., 2016; Graham et al., 2016; Lladó, López-

Mondéjar, & Baldrian, 2017).⁠⁠⁠ Soils are regarded as critical for global health, as they contain 3000

Pg of carbon, with the potential to act as a positive feedback to further climatic shift (Nesme et al., 2014; Torsvik & Øvreås, 2002).⁠ It is therefore essential to characterize soil microbial diversity to better understand ecosystem function and resilience in the face of rapid environmental change.

Historically, microbial diversity has been studied using laboratory cultivation techniques

(Nesme et al., 2014; Torsvik & Øvreås, 2002) ⁠ with only a minute fraction of estimated bacterial diversity being successfully cultivated (Delgado-Baquerizo et al., 2016; Graham et al., 2016; Lladó,

López-Mondéjar, & Baldrian, 2017).⁠ Substantial efforts are being made to develop innovative cultivation techniques, including the ichip and droplet-based sorting coupled with laboratory cultivation (Hicks Pries, Castanha, Porras, & Torn, 2017).⁠⁠ Yet, these approaches have only contributed to the expansion of novel families, whereas microbial taxa are predicted to be more phylogenetically divergent, belonging to potentially novel phyla (Overmann, Abt, & Sikorski, 2017;

Pham & Kim, 2012).⁠ Thus, challenges associated with direct study of soil microorganisms have yielded a large knowledge gap regarding terrestrial microbial diversity. In addition to cultivation 27

limitations, there is a lack of representative reference genomes for soil microbes (Staley &

Konopka, 1985; Urich et al., 2008).⁠ From the publicly available Integrated Microbial Genomics

(IMG/M) database, we were able to curate a collection of 3,024 isolate genomes, single amplified genomes (SAGs) and metagenome assembled genomes (MAGs) from previous soil studies (Nichols et al., 2010; Overmann et al., 2017).⁠ However, with estimates of diversity on the order of millions of species per gram (Lloyd et al., 2018),⁠⁠ these references represent only a small percentage of soil microbes.

High-throughput sequencing technologies combined with novel metagenome binning algorithms (Choi et al., 2017) ⁠ enable genome-resolved metagenomics and have greatly expanded the availability of reference genomes from uncultured taxa by circumventing challenges associated with cultivation (Chen et al., 2016).⁠⁠⁠ The more recent applications of directly sequencing

DNA from soil microbial communities allows one to obtain a broader perspective on the taxonomic and functional potential of soil microorganisms. However, metagenomics in highly diverse environments may capture only the most abundant, and therefore best-assembling representatives from the total community (Urich et al., 2008),⁠ ⁠ and population heterogeneity can hamper the assembly efficiency, even of abundant microorganisms (Albertsen et al., 2013;

Wrighton et al., 2012).⁠⁠⁠

Traditional MAGs combine genomes from similar organisms within populations (Castelle

& Banfield, 2018; Hug et al., 2016; Nesme et al., 2016).⁠⁠ Depending on the binning parameters used, MAGs may collapse contigs from a highly diverse sample into a single genome, further complicating data interpretation. Soils are typically dominated by a small set of highly abundant taxa (Delmont et al., 2015; Nayfach & Pollard, 2016; Parks et al., 2017),⁠⁠⁠ and therefore the rare biosphere may be overlooked in metagenomic studies despite playing an important role in soil 28

biogeochemical processes (Sczyrba et al., 2017).⁠ Lastly, bulk metagenomics can also include extracellular DNA from dead microorganisms, which may be abundant in the environment. This exogenous DNA has the potential to inflate estimates of diversity and genomic potential (Sczyrba et al., 2017),⁠⁠ and further reduce our ability to assemble sequences from rare taxa. Decoupling intracellular and exogenous DNA during sequencing may provide a more accurate estimate of microbial diversity (Delgado-Baquerizo et al., 2016).⁠

Challenges associated with bulk metagenomics may be mitigated by reducing community complexity. The most extreme example involves the application of fluorescence-activated cell sorting (FACS) for separating communities into single cells for single-cell genomics, which provides genomic information with strain-level resolution (Jousset et al., 2017).⁠ However, the resulting SAG assemblies are often highly fragmented and incomplete, and the overall process is prone to biases and contamination.

In order to circumvent some of the challenges associated with bulk metagenomics and single-cell genomics, we applied a pooled cell sorting approach, termed mini-metagenomics, on forest soils collected from the Barre Woods soil warming experiment at the Harvard Forest Long-

Term Ecological Research (LTER) site. Prior to this study, mini-metagenomics of microorganisms had been applied only in aqueous environments, including hot springs, hospital sink biofilms and activated sludge (Carini et al., 2016; Jay T Lennon, Muscarella, Placella, & Lehmkuhl, 2017; Nagler,

Insam, Pietramellara, & Ascher-Jenull, 2018).⁠⁠⁠ Mini-metagenomics has higher throughput than single-cell genomics, providing the opportunity to capture more diversity than is possible with single-cell sequencing. Mini-metagenomics may enable investigation of different components of the soil community in comparison to bulk metagenomics, including cells that can be dissociated from particles, and cells with varying lysis susceptibility. The use of two overlapping metagenomic

29

methods may allow us to capture a broader taxonomic diversity than if only one approach were applied on its own. Additionally, cell sorting using FACS requires cells to be intact in order to be sorted, thereby minimizing challenges introduced by extracellular DNA in bulk soil samples. Using mini-metagenomics to reduce the number of cells relative to bulk metagenomics may decrease the number of genomes collapsed into a single MAG (J T Lennon, Muscarella, Placella, Lehmkuhl,

& Kellogg, 2018).⁠ Hence, we evaluated this method as a tool to complement bulk metagenomics in uncovering the ‘microbial dark matter’ in soil.

Here we demonstrate the utility of combining mini-metagenomics and bulk metagenomics for biological discovery in a highly diverse forest soil microbial community.

Separation of intact cells from soil via FACS enabled mini-metagenomic sequencing, while bulk metagenomics provided total community context for benchmarking. Our approach generated 200 sorted-MAGs and 29 bulk metagenome MAGs of medium quality, expanding phylogenetic diversity of known soil clades. Our data suggest that the sorted-MAGs represent some of the diversity of previously un-sequenced organisms that are challenging to access using bulk approaches, offering insights into the functional potential of soil dark matter.

2.3 Results and Discussion

2.3.1 Improved assembly and binning from mini-metagenomes

Our application of mini-metagenomics combines microbial cell sorting and metagenome sequencing in order to divide a complex soil community into many smaller, less complex subsets.

We performed FACS on pools of cells from four soil samples collected from the Barre Woods experimental warming plots at the Harvard Forest Long-Term Ecological Research (LTER) site. From each of the four samples we sequenced 90 replicate pools of 100 cells for a total of 359 mini-

30

metagenomes (one mini-metagenome failed quality control standards). In conjunction with mini- metagenomic sequencing, we performed bulk metagenomics on these four soils generating a total of 1.2 Gbp and 1.3 Gbp, respectively (Figure 2.1).

Binning of assembled contigs produced 1793 mini-metagenome assembled genomes

(sorted-MAGs) and 275 bulk metagenome MAGs (Figure 2.2 and 2.3). Following CheckM quality assessment (Blainey, 2013b; Stepanauskas, 2012; Woyke et al., 2017b),⁠ 200 sorted-MAGs and 29 bulk MAGs surpassed a completeness threshold of ≥50% complete, ≤10% contamination and

≤10% strain heterogeneity. We considered MAGs with less than 50% completeness as ‘low quality’ based on MIMAG standards (Berghuis et al., 2019; McLean et al., 2013a; Yu et al., 2017),⁠⁠ and excluded them from additional analyses (Figure 2.2 and Figure 2.3). Overall, quality filtering removed lower quality sorted-MAGs on the basis of completeness, whereas bulk MAGs were removed due to a higher degree of contamination. Assessment of MAG quality using CheckM showed an average percentage completeness of 81.5% in medium quality bulk metagenome

MAGs (n=29) relative to 61.9% in the medium quality sorted-MAGs (n=200; p=3.29X10-7; Figure

2.2 and 2.4). When assessed for marker gene contamination, bulk metagenome MAGs revealed an average estimated level of contamination of 1.92%, similar to the 0.98% average contamination in the sorted-MAGs (p=0.01117; Figure 2.2 and Figure 2.4).

As one measure to compare mini-metagenomics and bulk metagenomics methods, we assessed GC content and found an average of 49.2% GC and 60.5% GC in sorted-MAGs and MAGs, respectively (Figure 2.2, Figure 2.5). Variation in GC content can be attributed to known biases in the single cell workflow such as susceptibility of cells to sorting and lysis (Yu et al., 2017),⁠ as well as amplification bias introduced during MDA (Parks, Imelfort, Skennerton, Hugenholtz, & Tyson,

2015).⁠ The cell isolation method used in mini-metagenomics reduces inflation of community

31

diversity as a result of exogenous DNA and enables sequencing of organisms that are typically attached to soil particles through the application of a mild surfactant. Additionally, the difference in DNA extraction procedures between mini-metagenomics and bulk metagenomics represents an opportunity to capture an expanded diversity of microorganisms, as each approach may access a different component of the community. Taken together, mini-metagenomics and bulk metagenomics generated a large number of quality MAGs that can be used as complementary datasets in genome-resolved studies to investigate broad microbial diversity.

2.3.2 Expansion of phylogenetic diversity

As one aim of our study was to provide reference genomes that represent soil microbiome diversity, we evaluated the contribution of both sorted-MAGs and bulk MAGs to phylogenetic diversity in the context of previously published genomes of soil bacteria and archaea. We inferred the phylogenetic relationships using concatenated marker genes from the 200 sorted-MAGs, the

29 bulk MAGs and 3,024 soil microbe reference genomes from the IMG/M (Figure 2.6) (Bowers et al., 2017).⁠⁠ For this analysis, we clustered sequences at 95% average nucleotide identity (ANI) to estimate distinct species-level lineages, resulting in 170 sorted-MAGs, 25 bulk MAGs and 2,341 reference sequences from IMG/M (Figure 2.6 and Figure 2.7). This small decrease in the number of MAGs as a result of clustering indicates very little redundancy between previous MAGs and available reference sequences. Sorted and bulk MAGs from this study contributed genome diversity across numerous soil clades, including Alphaproteobacteria (16 sorted-MAGs, 2 bulk

MAGs), Acidobacteria (11 sorted-MAGs, 14 bulk MAGs) and Planctomycetes (2 sorted-MAGs, 1 bulk MAG). Sorted and bulk MAGs also contributed diversity to less abundant soil taxa including

TM6 (6 sorted-MAGs, 1 bulk MAG) and Betaproteobacteria (3 sorted-MAGs, 1 bulk MAG).

32

Comparison of MAGs recovered through mini and bulk metagenomics revealed a broad diversity of soil bacteria and archaea, and demonstrated the complementarity of these approaches for biological discovery. The sorted-MAGs expanded taxonomic diversity of abundant groups including Bacteroidetes (48), Verrucomicrobia (8), and taxa with lower abundances including Thaumarchaeota (4), Omnitrophica (3), Ignavibacteria (2), Melainabacteria (1) and

Firestonebacteria (1). Interestingly, numerous sorted-MAGs belonged to phyla typically comprised of pathogens and endosymbionts such as the Chlamydiae (31) and Gammaproteobacteria (30), specifically within Legionellales, as well as TM6 (7) (Blainey, 2013b; Clingenpeel, Clum,

Schwientek, Rinke, & Woyke, 2014) ⁠ (Figure 2.3).⁠ The phyla identified by sorted-MAGs represented abundant taxa found in previous soil community studies (Stepanauskas et al., 2017) ⁠ in addition to the rare biosphere, demonstrating the utility of mini-metagenomics for expanding diversity beyond abundant soil taxa (Figure 2.3). As for the bulk MAGs, some of these belonged to rare taxa not recovered through mini-metagenomics, including WPS-2 (3), Euryarchaeota (1) and

Saccharibacteria (1).

We assessed the phylogenetic diversity (PD), the total amount of branch length contributed by sequences of interest within a phylogenetic tree, from the sorted-MAGs to determine the contribution of this single study to known microbial diversity. Calculation of phylogenetic diversity revealed a 7.2% increase in total branch length contributed by the sorted-

MAGs in relation to the soil reference sequences from IMG/M (Figure 2.3). Mini-metagenomes not only expanded phylogenetic diversity within clades of known soil bacteria and archaea, but also candidate phyla and low abundance taxa typically found in forest soils. More specifically, the sorted-MAGs increased the branch lengths of well-studied bacterial groups, including

Bacteroidetes (33.6%) and Alphaproteobacteria (19.4%), along with groups notoriously recalcitrant to laboratory cultivation, such as Verrucomicrobia (62.1%), Acidobacteria (28.0%) and 33

TM6 (11.3%) (Chen et al., 2016).⁠ Most notable was the PD increase in the Chlamydiae (72.5%), a taxonomic group which is typically overlooked in soil metagenomic studies due to their low abundance and likely dependence on eukaryotic host cells (Deeg, Zimmer, et al., 2018; Horn, 2008;

Pagnier et al., 2015; Schulz et al., 2015). We hypothesize that the application of mild detergent and syringe filtration during sample processing may have lysed the microbial eukaryotes that serve as hosts for bacterial endosymbionts, making these bacteria more accessible for FACS. A similar phenomenon was suggested for the detection of 16 novel giant viruses from these same samples

(DeAngelis et al., 2015; Delgado-Baquerizo et al., 2018; Fierer, 2017),⁠ as these viruses are most often associated with eukaryotic host cells (S. A. Eichorst et al., 2018; McLean et al., 2013a).⁠⁠

The sorted-MAGs demonstrated the potential for mini-metagenomics to increase our knowledge of diversity beyond what can be achieved using MAGs from bulk metagenome studies alone. The bulk MAGs contributed to the phylogenetic diversity of many of the same clades of soil bacteria as the sorted-MAGs, including Acidobacteria (10.5%), TM6 (7.3%) and

Alphaproteobacteria (2.6%). However, even in clades where more bulk-derived genomes were added relative to sorted-MAGs, such as in Acidobacteria, the sorted-MAGs were phylogenetically more diverse.

2.3.3 Caveats of mini-metagenomics

Although the mini-metagenomics approach produced a greater number of medium quality genome bins relative to bulk metagenomics, this approach is not without challenges. In comparison to bulk metagenomics, mini-metagenomics may be prohibitive as it involves equipment and expertise that may not be easily accessible. In addition to logistical obstacles, methodological challenges including cell isolation, differential membrane lysis efficiency, and GC- based genome amplification skew likely introduce bias during sample processing. This may be 34

reflected in our data, where organisms which are typically abundant in forest soils such as

Actinobacteria, Chloroflexi and Firmicutes (Lagkouvardos et al., 2014),⁠⁠ were present in low numbers using mini-metagenomics as compared to traditional bulk metagenomics (Figure 2.3).

Though these taxa might have been missed due to the aforementioned biases, it is also possible that sequences from these organisms were not binned or were placed in a lower quality bin based on our filtering threshold. For example, bacteria in the phylum Spirochaetes were represented by

47 distinct sorted-MAGs; however, none of these passed quality filtering standards and were therefore excluded (Figure 2.3). An alternative DNA amplification method has been developed, termed WGA-X, which improves cell lysis and amplification of high GC-content organisms over

MDA (Schulz et al., 2018).⁠ With this improved method of DNA amplification, more representative mini-metagenomic sampling might be possible. Additionally, the methods for sample processing prior to FACS may be modified to achieve a more targeted cell or particle fraction, thereby further expanding the utility of mini-metagenomics to detect dark matter microorganisms.

2.3.4 Representation of sorted-MAGs and MAGs Across Terrestrial Soil Metagenomes

To assess the representation of our newly generated soil reference genomes across other terrestrial ecosystems, we searched for protein coding sequences from our collection of sorted-

MAGs and MAGs across publicly available soil metagenomes from 80 terrestrial metagenome studies. For this analysis, we dereplicated the 200 sorted-MAGs and 29 bulk MAGs from this study by clustering at 95% average nucleotide identity without reference sequences, resulting in 173 sorted-MAGs and 28 bulk MAGs as cluster representatives (Figure 2.4). We assessed these mini- and bulk MAGs in the context of broader terrestrial community studies by comparing them against

2,210 metagenomes from the 80 terrestrial studies using LAST (Aherfi, Colson, La Scola, & Raoult,

2016) ⁠ (Figure 2.4).⁠ We defined highly represented sorted-MAGs and MAGs as those with at least

35

200 protein coding sequences with hits to metagenome samples at ≥95% amino acid identity over

70% alignment length (Delgado-Baquerizo et al., 2018).⁠⁠⁠

Some of our sorted-MAGs and MAGs detected in previous metagenomic soil investigations were members of the phylum Acidobacteria (10 sorted-MAGs, and 15 MAGs; Figure

2.4). Five bulk MAGs in the phylum Proteobacteria were detected in metagenomes from forest, agricultural, arctic, grassland, and vadose zone soils, whereas two bulk MAGs in candidate division

WPS-2 were detected in metagenomes from Harvard Forest and other forest soil metagenomes, as well as arctic and surface soils. Interestingly, one MAG in the Planctomycetes was detected only in metagenome sequences from the Harvard Forest, indicating that this may represent a unique

MAG which has not been found in previous terrestrial metagenome studies.

The phylum Bacteroidetes was highly represented by the sorted-MAGs (55.5%) as compared to the bulk metagenome MAGs (0.1%) and unbinned metagenome data (3.8%). In contrast, Acidobacteria was the most abundant phylum in the bulk MAGs (77%) and unbinned metagenome data (32%), as compared to the sorted-MAGs (8.5%). Taxonomic annotation of un- assembled reads from Harvard Forest bulk metagenome revealed that Acidobacteria was among the most abundant phyla, comprising 32.0% relative abundance of annotated reads from the total community.

The sorted-MAGs in the phylum Bacteroidetes appeared to be novel as they had relatively poor matches to protein coding sequences from publicly available soil metagenomes. This presumed novelty could also contribute to computation challenges associated with sequence assembly, as only the most abundant taxa are over-represented in public databases (Stepanauskas et al., 2017).⁠ Yet, many of these sorted and bulk MAGs were not represented in previous Harvard

Forest metagenomes. Taken together, the low representation of our Bacteroidetes sorted-MAGs

36

across previously published metagenome samples illustrates the expanded biodiversity gained through the use of mini-metagenomes, demonstrating the utility of this approach for accessing the rare taxa within diverse samples.

2.3.5 Biological Insights into Carbon Metabolism in Soil Bacteroidetes

Bacteroidetes make up ~10% to the total microbial community in soils (Kiełbasa, Wan,

Sato, Horton, & Frith, 2011),⁠⁠ yet most of our knowledge about members of this phylum stems from sequenced isolates from vertebrate guts and aquatic habitats (C. Luo, Rodriguez-R, &

Konstantinidis, 2014; Seshadri et al., 2018).⁠ Furthermore, they are poorly represented in metagenome samples from terrestrial environments (Nayfach & Pollard, 2016).⁠ Given the relatively small body of work on soil Bacteroidetes and the substantial contribution of 66 putatively novel sorted-MAGs from this study (Figure 2.5 and Figure 2.3), we further explored these sorted-MAGs from Bacteroidetes to gain insight into their genomic potential in soils with a particular emphasis on carbon metabolism.

The genome sizes of the sorted-MAGs ranged from 1.6 to 5 Mb, which is somewhat consistent with previously reported Bacteroidetes genome sizes that range from 0.9 Mb

(Cardinium endosymbiont) (Fierer, 2017) ⁠ to 9.1 Mb (Chitinophaga pinensis) (Fernández-Gómez et al., 2013; Kabisch et al., 2014; Thomas, Hehemann, Rebuffet, Czjzek, & Michel, 2011).⁠ The smaller genome sizes of the sorted-MAGs was likely due to genome completeness estimates, which ranged from 50% to 80.5% based on analysis of CheckM marker genes (Figure 2.5) (Thomas et al.,

2011).⁠⁠ The sorted-MAGs were distributed across three distinct families, including Cytophagaceae,

Chitinophagaceae, and , as well as a clade of unclassified sorted-MAGs

(Figure 2.5). Although not much is known about the function of soil Bacteroidetes, they are known

37

to have a large set of genes that encode enzymes for carbohydrate degradation, including a broad array of glycoside hydrolases, which are phylogenetically conserved across taxa (Penz, Schmitz-

Esser, Kelly, Cass, & Mü Ller, 2012).⁠ We focused on the glycoside hydrolase, glycosyl transferase and carbohydrate binding module genes, which contribute to the degradation, synthesis and transport of polymeric carbon substrates (Glavina et al., 2010).⁠ We hypothesized that the distribution of these genes across the Bacteroidetes mini-MAGs would allow us to infer the ecological roles of these organisms in their native soil environment.

The distribution of CAZy families across the Bacteroidetes exhibited clade-specific abundance patterns of glycoside hydrolases, glycosyl transferases, and carbohydrate binding modules (Figure 2.5) in three families, including the Cytophagaceae, Chitinophagaceae and

Sphingobacteriaceae. Members of the Cytophagaceae family appeared to be specialized for polymeric carbon degradation, namely cellulose. Three sorted-MAGs assigned to the

Cytophagaceae family contained proteins in glycoside hydrolase family 5, a gene family comprised almost entirely of endocellulases with the potential to degrade polymeric cellulose structures into polysaccharides (Parks et al., 2015),⁠ which is consistent with previously sequenced Cytophagaceae genomes (C. Berlemont & Martiny, 2015).⁠⁠ In contrast, members of the Chitinophagaceae and

Sphingobacteriaceae families appeared to be generalists in carbon utilization. More specifically, the Chitinophagaceae sorted-MAGs harbored the potential to utilize cellulose, hemicellulose and chitin. Seventeen of the twenty-seven sorted-MAGs in the Chitinophagaceae family contained at least one chitinase in glycoside hydrolase family 18 or 19 (Cantarel et al., 2009) ⁠⁠ along with cellulases in glycoside hydrolase families 5, 8, and 9 and glycoside hydrolases in family 43 that may degrade hemicellulose and pectin (R. Berlemont & Martiny, 2013) ⁠ (Figure 2.5). In support of this conjecture, the sequenced genome of Chitinophaga pinensis (member of the Chitinophagaceae

38

family) contains genes to degrade leaf matter and fungal structures, suggesting its ability to degrade both cellulose and chitin (Taillefer, Arntzen, Henrissat, Pope, & Larsbrink, 2018).⁠⁠ Twenty sorted-MAGs belonged to the family Sphingobacteriaceae and typically harbored the potential to degrade cellulose, xylan, and chitin, with GH families 2, 3, 5, 13, 18 and 20 being the most abundant across sorted-MAGs in this group. The class Spingobacteria are typically found in soils, with an average abundance of ~4.6% (Hoell, Vaaje-Kolstad, & Eijsink, 2010).⁠⁠ Interestingly, one sorted-MAG (Q3300020668_2) had the highest number of glycoside hydrolase genes within the

Sphingobacteriaceae (125 annotated glycoside hydrolases), representing a diverse array of carbohydrate degradation capabilities and potential metabolic flexibility. This is consistent with previous investigations describing the family Sphingobacteriaceae as capable of degrading diverse polysaccharides (Mewis, Lenfant, Lombard, & Henrissat, 2016).⁠

Putatively novel Bacteroidetes sorted-MAGs stemming from experimental warming plots at the Harvard Forest Long-Term Ecological Research spanned three different families and harbored an extensive diversity of CAZymes, such as chitin, cellulose and hemicellulose. The genomic potential to utilize these labile carbon compounds is consistent with previous metagenomic investigations in soils of warmed plots (Mckee, Martínez-Abad, Ruthes, Vilaplana,

& Brumer, 2019).⁠ Furthermore, previously sequenced genomes of Bacteroidetes reveal an extensive enzymatic repertoire to degrade carbon (Janssen, 2006),⁠ presumably making them well- suited for survival in soils with a diverse collection of carbon compounds from plant material.

Interestingly, the number of identified carbohydrate active enzyme genes increased with genome size for each of the six CAZy categories (Figure 2.5). This suggests that members of the

Bacteroidetes accumulate the capacity to degrade various carbohydrates, presumably expanding their niche for carbohydrate utilization in the soil.

39

2.3.6 Reduced Carbon Metabolism in an Unclassified Bacteroidetes Clade

Nineteen sorted-MAGs belonged to an unclassified clade of Bacteroidetes, which were highly depleted in glycoside hydrolases and carbohydrate binding modules but retained a high number of glycosyl transferases (Figure 2.5). The low abundance of CAZy genes associated with substrate access and degradation may indicate that these organisms are not involved in substrate decomposition. Rather, the relatively higher abundance of glycosyl transferase genes involved in formation of glycoside bonds may indicate that these organisms are responsible for synthesis of higher molecular weight compounds and may depend on living in close association with other organisms.

Bacteroidetes such as Amoebophilus asiaticus (Shen et al., 2017),⁠ Cardinium sp. (Y. Luo et al., 2001; Zhou et al., 2011),⁠⁠ Sulcia muelleri (Thomas et al., 2011) ⁠ and Blattabacterium sp.

(Schmitz-Esser et al., 2010) ⁠ were identified as endosymbionts and symbionts (Figure 2.5). Similar to these known symbionts, the estimated GC contents of unclassified sorted-MAGs in this study were low relative to other Bacteroidetes sequences, with an average of 39.97% GC (Zchori-Fein,

Perlman, Kelly, Katzir, & Hunter, 2004).⁠⁠⁠ These unclassified Bacteroidetes may have limited metabolic capabilities while retaining comparable genome sizes to Bacteroidetes previously identified as host-associated. For example, the genome size of A. asiaticus was 1.89 Mb, and average assembly size for the unclassified Bacteroidetes sorted-MAGs was 2.4 Mb, which are smaller relative to free-living Bacteroidetes genomes. Symbionts may undergo the process of reduction in genome size when in contact with the host organism, resulting in a linear relationship between the number of protein-coding genes contained and the size of the genome (Chang et al.,

2015).⁠ Similarities in the genome structure and relatively low composition of CAZy genes may indicate a symbiotic or host-associated life strategy for some of these sorted-MAGs in the clade of

40

unclassified Bacteroidetes. Completeness estimates of sorted-MAGs within this clade ranged from

51.13% to 80.52%, with an average completeness of 60.5%. Estimated completeness may therefore play a role in the decreased number of annotated CAZy genes.

The abundance of unclassified Bacteroidetes within this study may be further evidence of the liberation of symbionts from host cells and vacuoles prior to FACS. Alternatively, the relatively low abundance of glycoside hydrolases in sorted-MAGs within the unclassified clade may be indicative of an opportunistic life strategy (Ló Pez-Sánchez, Neef, Peretó, Patiñ O-Navarrete, &

Pignatelli, 2009).⁠ The inability to degrade compounds through hydrolysis makes this clade well- suited to living in close association with active degraders.

2.4 Conclusions

This application of mini- and bulk metagenomics has demonstrated the utility of these complementary techniques for biological discovery within the complex soil ecosystem. Using mini- metagenomics to reduce the number of cells prior to sequencing, we have uncovered bacterial and archaeal soil diversity that could not be accessed using bulk metagenomics alone. Mini- metagenomics is a powerful tool for the discovery of rare biosphere organisms and potential endosymbionts, revealing biodiversity in dominant soil groups as well as low abundance taxa.

Taken together, mini- and bulk metagenomics allow us to probe deeper into microbial diversity and function within heterogeneous environments beyond soil.

2.5 Methods

2.5.1 Sample Collection and Incubation

Soils were collected on 24 May 2017 from the Barre Woods long-term experimental warming plots located at the Harvard Forest Long Term Ecological Research (LTER) site in 41

Petersham, MA. Cores were taken from subplots within the larger 30x30 meter plots. Soils were separated into organic (approximately top 5 cm of soil core) and mineral (lower 5 cm of soil core) horizons by visual inspection and were sieved with a 2mm mesh. Approximately 5g of soil was immediately frozen in a dry ice/ethanol bath for DNA extraction, then was transported to the

University of Massachusetts Amherst for storage at -80°C. Approximately 15g of soil was transferred to a 50 mL falcon tube for transportation on ice to the Joint Genome Institute (JGI) in

Walnut Creek, CA. Samples were further processed as described in (Moran, Mclaughlin, & Sorek,

2009).

2.4.2 Sample Preparation and Cell Sorting

Cells were separated from four incubated soils (heated organic, heated mineral, control organic, and control mineral samples) for FACS through the addition of 0.02% Tween 20 followed by vortexing for 5 minutes. Samples were centrifuged for 5 minutes at 500xg to pellet large soil particles. Following centrifugation, the supernatant was filtered through a 5µm syringe filter to remove the remaining soil particulates. Samples were diluted 1:100 in PBS and stained with SYBR- green. For each of the four soil samples, ninety pools of 100 SYBR+ cells were sorted into microwell plates using a BD Influx cell sorter to perform FACS. Sorted pools underwent cell lysis and whole genome amplification using the Qiagen RepliG Single Cell kit for Multiple Displacement

Amplification (MDA). A total of 360 libraries were generated for sequencing with Nextera XT v2 kit

(Illumina) with 9 rounds of PCR amplification.

2.4.3 Mini-Metagenomes

Following library preparation, the 360 mini-metagenome libraries were sequenced on the

Illumina NextSeq platform at the DOE Joint Genome Institute (JGI, Walnut Creek, CA). Pools of 90

42

libraries were processed in four sequencing runs with 2x150bp read lengths. Raw Illumina reads were quality filtered to remove contamination and low-quality reads using BBTools (v37.38) (Ló

Pez-Sánchez et al., 2009; McCutcheon & Moran, 2012; Moran et al., 2009),⁠ resulting in 359 mini- metagenomes for downstream analysis, as one mini-metagenome did not pass quality filtering standards. Read normalization was performed using BBNorm (R. Berlemont & Martiny, 2013) ⁠ and error correction was conducted using Tadpole (Bushnell, n.d.).⁠ Assembly of filtered, normalized

Illumina reads was completed using SPAdes (v3.10.1) (Bushnell, n.d.) ⁠ with the following options: -

-phred-offset 33 -t 16 -m 115 --sc -k 25,55,95. All contig ends were trimmed of 200bp and contigs were discarded if the length was <2kb or read coverage was less than 2 using BBMap (Bushnell, n.d.) ⁠ with the following options: nodisk ambig, filterbycoverage.sh: mincov.

2.4.4 Bulk Metagenomes

Total DNA was extracted from ~0.25g of soil using the DNeasy PowerSoil DNA extraction kit (QIAGEN). Extracted DNA was assessed using the Bioanalyzer and Qubit. Unamplified TruSeq libraries were prepared for 4 DNA samples prior to sequencing on the Illumina HiSeq-2000 platform at the DOE JGI. Raw Illumina reads were trimmed, quality filtered, and corrected using bfc (version r181) with the following options: -1 -s 10g -k 21 -t 10. Following quality filtering, reads were assembled using SPAdes (v3.11.1) (Bushnell, n.d.) ⁠ with the following options:-m 2000 --only- assembler -k 33,55,77,99,127 --meta -t 32. The entire filtered read set was mapped to the final assembly and coverage information was generated using BBMap (v37.62) (Bankevich et al., 2012) ⁠ with default parameters except ambiguous=random. The version of the processing pipeline was jgi_mga_meta_rqc.py, 2.1.0. Of the twenty-eight metagenome samples sequenced, only four were selected for inclusion in analysis for this study because they corresponded to those samples sorted using FACS. 43

2.4.5 Genome Binning and Quality Assessment

Assembled contigs from the mini-metagenomes and bulk metagenomes were binned based on tetranucleotide frequency using MetaBat2 (Bushnell, n.d.).⁠ Genome bins were generated for mini-metagenomes without contig coverage patterns due to MDA bias. Binning of mini-metagenomes resulted in 1793 sorted-MAGs, and binning of bulk metagenomes from all samples resulted in 275 bulk metagenome MAGs. Genome bins were assessed for completeness and contamination using a set of marker genes implemented in CheckM (Bankevich et al., 2012).⁠⁠⁠

Mini- and bulk metagenome MAGs were filtered to ≥50% completeness, ≤10% contamination, and

≤10% strain heterogeneity, to retain medium quality sorted-MAGs and bulk metagenome MAGs for downstream analysis (Bushnell, n.d.).⁠ Following quality filtering, 200 medium quality sorted-

MAGs, and 29 medium quality bulk metagenome MAGs remained.

2.4.6 Phylogenetic Tree Construction and Phylogenetic Diversity

A concatenated marker gene phylogenetic tree was constructed for two-hundred medium quality sorted-MAGs, 29 bulk MAGs, and 3,024 reference genomes from soil bacteria and archaea available in the IMG/M database. A set of 56 universal single copy marker proteins (Kang, Froula,

Egan, & Wang, 2015) ⁠ was identified with hmmsearch (v3.1b2) (Parks et al., 2015) ⁠ and specific

HMMs for each of the markers. For every marker protein, alignments were built with MAFFT

(v7.294b) (Bowers et al., 2017) ⁠⁠ and subsequently trimmed with BMGE using BLOSUM30

(Bankevich et al., 2012; Yu et al., 2017).⁠⁠ MAGs and reference sequences were clustered at 95% average nucleotide identity with FastANI v1.0 (Eddy, 2011),⁠ resulting in 170 sorted-MAGs, 25 bulk

MAGs, and 2,341 reference sequences with distinct taxonomic classification. Single protein alignments were then concatenated and a phylogenetic tree inferred with FastTree2 using the

44

options: -spr 4 -mlacc 2 -slownni -lg (Katoh, Misawa, Kuma, & Miyata, 2002)[89]⁠ ⁠ and was visualized using iTol (Criscuolo & Gribaldo, 2010).⁠

The contribution of sorted-MAGs and bulk MAGs to phylogenetic diversity was determined by calculating the sum of total branch length of contributed genomes relative to reference genomes (Jain, Rodriguez-R, Phillippy, Konstantinidis, & Aluru, 2018).⁠ Total Branch length was calculated for a phylogenetic tree containing only 2,341 bacterial and archaeal reference sequences from IMG/M (Price, Dehal, & Arkin, 2010).⁠ We then calculated the additional total branch length contributed by sorted-MAGs and bulk MAGs. The percentage increase in total branch length was determined for the complete phylogenetic tree, as well as for clades that included sorted-MAGs.

Taxonomy was assigned to sorted-MAGs, bulk MAGs, and metagenome reads by searching sequences against the NCBI-NR database using DIAMOND (Letunic & Bork, 2016).⁠ Blast results were imported into MEGAN6 (Wu et al., 2009) ⁠ for taxonomic assignment. The relative abundance of each phylum was computed and visualized in R using ggplot2 (Chen et al., 2016).⁠

2.4.7 Gene Recruitment

Two-hundred sorted-MAGs and 29 bulk MAGs were de-replicated by clustering based on

95% average nucleotide identity. Protein coding sequences from the resulting 199 representative sorted-MAGs and MAGs were compared against coding sequences predicted from 2,210 soil metagenome samples from 80 terrestrial metagenome studies stored in the IMG/M database using LAST (Buchfink, Xie, & Huson, 2014)(⁠ Figure 2.4). Individual sorted-MAGs and MAGs were designated as a match to metagenome samples if the following criteria were met: a minimum of

200 CDS with hits at ≥ 95% amino acid identity over 70% alignment lengths to CDS of an individual

45

metagenome. The rationale for choosing the minimum 200 hit count was to ensure that the evidence included more than merely housekeeping genes, which may be more highly conserved.

The 90% amino acid identity cutoff was chosen based on Luo et al. 2014 (Huson et al., 2018),⁠ who assert that organisms grouped at the 'species' level typically show >85% AAI among themselves.

Since our dataset includes divergent sub-lineages, the more conservative threshold of 95% amino acid identity was adopted. The average percentage of CDS with a metagenome hit was calculated for each mini-metagenome (Figure 2.4), and the results were plotted as a multi-bar chart in iTol

(Wickham, 2016).⁠

2.4.8 Metabolic Insights

A maximum likelihood tree for Bacteroidetes was constructed using IQTree (Kiełbasa et al., 2011) ⁠ for the 66 sorted-MAGs and soil Bacteroidetes references from IMG/M. The tree was rooted with

Pedosphaera parvula in the phylum Verrucomicrobia. Family level taxonomic classification, genome size, and genome size based on CheckM marker gene assessment (C. Luo et al., 2014) ⁠ were visualized using iTol (Letunic & Bork, 2016).⁠⁠⁠

Functional annotation for sorted-MAGs was assigned using the Carbohydrate Active

Enzyme database (CAZy) (Nguyen, Schmidt, von Haeseler, & Minh, 2015) ⁠ implemented in dbCAN2

(Parks et al., 2015).⁠ The percentage of total annotated genes assigned to each gene family was calculated and is displayed as a multibar chart in iTol (Letunic & Bork, 2016).⁠

46

Figure 2.1. Overview of mini-metagenome and bulk metagenome approaches used in this study. A Mini-Metagenomics performed on four soil samples, including one heated sample from the top organic soil, one heated sample from the lower mineral soil, one control organic sample, and one control mineral sample (n=4). Cells were separated from soil particles using a mild detergent, followed by vortexing, centrifugation and filtration through a 5μm syringe filter. Suspended cells were stained with SYBR-green and sorted into 90 pools of 100 cells each, generating 359 mini- metagenomes. B Bulk metagenomic sequencing conducted on the four soils that were used in mini-metagenomics. C Following nucleic acid extraction, libraries were prepared, and shotgun sequencing was performed. Sequence data underwent assembly and quality control. Data were binned and assessed for bin quality. Only medium quality genome bins with estimates of 50% completeness, 10% contamination and 10% strain heterogeneity were used in downstream phylogenomic and functional analyses. Further details are provided in the Methods.

47

Figure 2.2. Assessment of sorted-MAG and MAG quality. Sorted-MAGs (orange, n=1793) and bulk MAGs from the four samples corresponding to those sorted with FACS (blue, n=275). Medium quality sorted-MAGs (dark orange, n=200) and MAGs (dark blue, n=29) are those with ≥50% completeness, ≤10% contamination and ≤10% strain heterogeneity based on analysis of CheckM marker genes (V. Lombard, Golaconda Ramulu, Drula, Coutinho, & Henrissat, 2014).⁠ Size of the circle represents the number of 16S rRNA gene copies within each MAG.

48

Figure 2.3. Phylogenetic diversity of soil taxa identified in this study. A Maximum-likelihood tree of the phylogenetic distribution of medium quality sorted-MAGs and bulk MAGs in the context of previously sequenced soil taxa. Colored branches represent clades that include sorted-MAGs and/or bulk MAGs. Orange branches include only sorted-MAGs, blue include only bulk MAGs, and green include both mini and bulk MAGs. Numbers in orange represent number of contributed sorted-MAGs, blue numbers represent bulk MAGs, and gray numbers represent number of reference sequences in each clade. B Phylogenetic diversity expansion through mini- and bulk MAGs. Gray represents the total branch length contributed by soil reference sequences from the IMG database. Orange bars represent total branch length from sorted-MAGs and blue represents branch length from bulk MAGs. The percentage increase in phylogenetic diversity from this study is shown next to each bar.

49

Figure 2.4. Comparison of MAGs from this study with published data from terrestrial metagenomes. Innermost is a maximum likelihood tree based on a concatenated alignment of 56 conserved marker proteins from medium quality sorted-MAGs and bulk MAGs recovered in this study. Mini and bulk MAGs were dereplicated by clustering at 95% average nucleotide identity, resulting in 173 sorted-MAGs and 28 bulk MAGs. The clade names are color-coded according to phylum. Individual tracks around the tree depict hits of individual mini- and bulk MAGs by metagenome samples arising from each terrestrial habitat type as specified in the legend. Height of the bar chart indicates the total number of mini- and bulk MAG coding sequences that match metagenome samples. The MAGs were considered matches if they had a minimum of 200 coding sequences with hits at ≥ 95% amino acid identity over 70% alignment lengths to CDS of an individual metagenome. The figure was rendered using iTOL (H. Zhang et al., 2018).⁠⁠

50

Figure 2.5. Insights into carbon metabolism within the phylum Bacteroidetes. A Concatenated marker gene tree of 66 Bacteroidetes sorted-MAGs and 70 Bacteroidetes reference sequences from the IMG/M database. Sorted-MAGs are distributed across three families of Bacteroidetes, including Cytophagaceae, Chitinophagaceae , Sphingobacteriaceae, and a clade of unclassified sorted-MAGs. B Genome size is shown with darkest color representing the largest genome of 9.1Mb and lightest representing a genome size of 0.6Mb. C Genome completeness is shown as a color gradient, ranging from 50% to 80.5%. D Genome GC content is presented as a gradient is presented as a gradient from 21.13% to 61.24% . E,F, G The percentages of genes annotated as glycoside hydrolases (E), glycosyl transferases (F), and carbohydrate binding modules (G) are illustrated as bars with vertical lines denoting 0% and 50% of genes. Bacteroidetes with known symbiotic relationships are indicated with * .

51

CHAPTER 3

GENOME-INFORMED ECOPHYSIOLOGICAL CHARACTERIZATION OF NOVEL, UNCULTIVATED

SUBDIVISION 2 ACIDOBACTERIA3

3.1 Abstract

The bacterial phylum Acidobacteria is found ubiquitously in the environment, representing nearly 20% of global bacterial diversity. In soil, acidobacteria are particularly abundant and are thought to be drivers of carbohydrate degradation as they are slow-growing and contain a large array of genes for carbon metabolism. To date, most of what is known about

Acidobacteria has come from bacterial culture or amplicon sequencing studies. These approaches have led to the proposal of 26 subdivisions in this phylum, and have captured a broad scope of diversity in acidobacterial subdivisions 1, 3, 4, and 8. Among the less-well known clades, subdivision 2 acidobacteria are highly abundant in soils, however characterization of this taxonomic clade remains limited and there are no reference genome sequences available. We here analyze Acidobacteria from combined mini-metagenomic and bulk metagenomic characterization of communities in forest soils collected at the Barre Woods warming experiment at the Harvard

Forest Long-Term Ecological Research (LTER) site. We generated 26 medium- and high-quality acidobacterial metagenome assembled genomes (MAGs), 13 of which are members of subdivision 2 acidobacteria. This study represents one of the first analyses of the predicted metabolism of subdivision 2 acidobacteria from soils, and describes the ecological niches for these

3 Alteio LV, Ryan E, Goudeau D, Malmstrom RR, Blanchard JL, Katz LA, Woyke T, and Eichorst SA. Genome-informed ecophysiological characterization of novel, uncultivated subdivision 2 Acidobacteria. In prep.

52

bacteria. As a major carbon sink, soils are anticipated to become a carbon source under conditions of continued climate change. Acidobacteria play a large role in terrestrial carbon cyling, making discovery of acidobacterial diversity and analyses such as these critical to understanding microbial diversity, function and niche specialization as a result of environmental selection under a changing climate.

3.2 Introduction

Members of the phylum Acidobacteria are found in various terrestrial habitats (Dunbar,

Barns, Ticknor, & Kuske, 2002; Hausmann et al., 2018; Kuske, Barns, & Busch, 1997; Männistö,

Rawat, Starovoytov, & Häggblom, 2012; Navarrete et al., 2013; Pankratov, 2012; Štursová et al.,

2016). While representatives have been identified in freshwater ponds (Zimmermann, Portillo,

Serrano, Ludwig, & Gonzalez, 2012),⁠ acid mine lakes (Wegner & Liesack, 2017a) ⁠ and hot springs

(Losey et al., 2013),⁠ this phylum is of particular interest in soil ecosystems as acidobacteria can comprise around 25 to 40% of the bacterial community (Fierer,⁠ 2017).⁠ There are 26 recognized subdivisions or clades of the phylum Acidobacteria (Barns, Cain, Sommerville, & Kuske, 2007),⁠ which can be assigned to 15 class-level units (Dedysh & Yilmaz, 2019).⁠ However, this tremendous scope of diversity remains largely underexplored due to methodological challenges (i.e. lack of cultivable lineages, challenges with assembly of low abundance taxa from metagenomes).

Members of taxonomic subdivisions 1, 2, 3, 4 and 6 are typically the most highly abundant in soils (Janssen, 2006; Jones et al., 2009),⁠ and data are now emerging on their ecophysiology in the soil environment. Insights have come from whole genome and metagenomic sequencing, yet we currently only have representative genomes available for members of subdivisions 1, 3, 4 and

6 isolated from soils (S. A. Eichorst et al., 2018). Acidobacteria harbor a large array of metabolic genes involved in substrate acquisition and degradation, which may make them well-adapted to 53

fluctuations in nutrient concentration and quality (S. A. Eichorst et al., 2018; Kielak, Matheus,

Cipriano, Eiko, & Kuramae, 2016; Naether et al., 2012).⁠ The ability to occupy diverse niches in a soil ecosystem, and to adjust to low nutrient conditions may confer an advantage to Acidobacteria under conditions of rapid climatic change.

Although members of subdivision 2 acidobacteria are fairly ubiquitous in soils (Catão et al., 2014; Janssen, 2006),⁠ there are currently no published genomes and few metabolic data.

Previous studies have revealed the elusive subdivision 2 acidobacteria in high abundance in soils across different biomes using analysis of 16S rRNA (e.g. Catão et al., 2014; DeAngelis et al., 2015;

Naether et al., 2012; Navarrete et al., 2013). However, few studies have been able to describe the functional potential of this group of bacteria. Of 11,230 16S rRNA sequences in the SILVA database assigned to subdivision 2, 89% of these were sequenced from soil environments (Quast et al.,

2012),⁠ which reinforces the importance of studying this group in soils. To date, the metabolism of only one metagenome assembled genome (MAG) classified as a subdivision 2 acidobacterium has been described (Wegner & Liesack, 2017a).⁠

To explore the diversity and genomic potential of Acidobacteria, we applied complementary metagenomic approaches to soil collected at the Barre Woods long-term experimental warming plots, located at the Harvard Forest Long-Term Ecological Research (LTER) site (Alteio et al., in revision; Schulz et al., 2018). Previous studies have shown that these soils harbor an extensive diversity and abundance of Acidobacteria (Alteio et al., in revision; DeAngelis et al., 2015).⁠ At these long-term warming sites, Acidobacteria have increased in abundance with warming treatment relative to the control soils, suggesting an important role for these bacteria under a changing climate (DeAngelis et al., 2015).⁠ In this study, we uncover a broader diversity of elusive subdivision 2 acidobacteria using metagenome assembled genomes (MAGs) from forest soil and characterize the metabolic potential of these acidobacterial lineages, to investigate 54

potential ecological niches for these bacteria. Additionally, simulated climate change has resulted in decreases in available soil organic matter and an overall decrease in microbial biomass. Due to these changes, we investigated the metabolism of subdivision 2 acidobacteria to determine if they express traits that make them particularly well-adapted to low-nutrient soil conditions, like those observed in the heated plots at the Barre Woods experimental site.

3.3 Results and Discussion

3.3.1 Taxonomic distribution across bulk- and mini-metagenomes reveals abundant acidobacteria in Harvard Forest soils

The bulk metagenomes from forest soil collected at the Barre Woods long-term warming experiment at the Harvard Forest selected primarily acidobacteria while mini-metagenomic (i.e. pooled cell-sorting) approach yielded a more even distribution of bacterial taxa (Figure 3.1; Alteio et al., in revision).⁠ Acidobacteria were the most prevalent in the bulk metagenomes comprising ca. 75% of all reads, in comparison to the mini-metagenomes where they comprise 5% of the total reads (Figure 3.1). The acidobacterial reads represent the major acidobacterial classes detected in previous studies of soil diversity (e.g. (Dedysh & Yilmaz, 2019) ⁠ and are distributed across the following classes: Acidobacteriia, Blastoctocatella, Holophagae, Solibacteres, subdivision 6 and unclassified (Figure 3.1).

We extracted the 16S rRNA genes from mini-metagenome and metagenome reads using

PhyloFlash (Gruber-Vodicka, Seah, & Pruesse, 2019) ⁠ to further explore taxonomy.⁠ Hierarchical classification of these extracted 16S rRNA sequences using the RDP Classifier (Q. Wang, Garrity,

Tiedje, & Cole, 2007) resulted in 155 and 152 reads identified as acidobacteria from the total set of 1,374 metagenome and 759 mini-metagenome reads, respectively. We found members of subdivisions 1-7, 10 and 13 in this forest soil, with members of subdivisions 1, 2 and 3 being the

55

most prevalent (Figure 3.2). More specifically, members of subdivision 1 represented ~37% of the acidobacterial reads (26 metagenomic, 88 mini-metagenomic reads), subdivision 2 represented ca. 30% of the acidobacterial reads (48 metagenomic, 46 mini-metagenomic reads) and subdivision 3 represented ca. 6% of the acidobacterial reads (11 metagenomic, 8 mini- metagenomic reads) (Table 3.1).

3.3.2 Phylogenomic distribution of acidobacterial MAGs in forest soil expands known diversity of subdivision 2 acidobacteria

Genome binning from these combined approaches resulted in a total of 26 metagenome assembled genomes (MAGs) assigned to the phylum Acidobacteria. The estimated completeness of these MAGs ranged from 50.02% to 99.95%, while the estimated contamination (i.e. the presence of lineage-specific marker genes from multiple lineages in a single genome bin as assessed by CheckM) ranged from 0% to 7.22% (Table 3.2; Parks et al., 2015).⁠ The estimated genome size ranged from 2.1 to 6.8 Mb, and GC content ranged from 54.6% to 60.9% (Table 3.2), which are smaller in size and lower in GC content when compared to sequenced isolates and MAGs generated from peatland soils (Hausmann et al., 2018).⁠

Using a set of 120 bacterial marker genes from in the Genome Taxonomy Database (GTDB;

Parks et al., 2018),⁠ we assessed the phylogenomic placement of these 26 MAGs. Twelve MAGs clustered within subdivision 1, and one MAG clustered within subdivision 3 (Figures 3.3, S1).

Additionally, 13 MAGs clustered within a clade of acidobacteria reference MAGs that were previously identified as subdivision 2 acidobacteria (Figure 3.3). Compared to the nine available subdivision 2 MAGs in the GTDB database, the unclassified MAGs from this study are higher in completeness and lower in contamination, making them some of the highest quality subdivision

2 MAGs to-date (Figure 3.4).

56

We screened the 13 unclassified acidobacterial MAGs for the presence of a 16S rRNA gene for better taxonomic assignment using CMsearch (Cui, Lu, Wang, Jing-Yan Wang, & Gao,

2016).⁠ Four contained a nearly full-length 16S rRNA gene (1427 to 1534 bp; Figure 3.3). Based on

RDP classification, these sequences belong to subdivision 2 with 100% confidence (Table S1). This taxonomic assignment was further confirmed through the generation of a maximum likelihood phylogenetic tree with sequence representatives of subdivision 2 (Figure 3.5).

The generation of numerous high-quality subdivision 2 MAGs in this study presents the opportunity to expand our existing knowledge of diversity within this abundant clade of

Acidobacteria. Previously only mentioned in amplicon sequencing studies due to challenges in isolation and assembly using high-throughput sequencing (Jones et al., 2009),⁠ MAGs enable exploration of the metabolic diversity of this clade that was previously not possible. The investigation of this group using high-quality draft genomes allows us to elucidate ecological roles and adaptations of subdivision 2 acidobacteria that may explain their abundance and ubiquity across soil ecosystems.

3.3.3 Comparison of genomic similarity across acidobacteria supports discovery of novel species

We compared the average nucleotide identity (ANI) across the MAGs and acidobacterial genomes stemming from isolates. To do this, we first refined our acidobacterial dataset using a more stringent quality filtering threshold of ≥80% completeness and ≤2.5% contamination as the isolated genomes have nearly 100% completeness and <5% contamination (S. A. Eichorst et al.,

2018).⁠ Applying this quality filtering threshold, the resulting dataset included 5 MAGs in subdivision 1, and 5 MAGs in subdivision 2. The ANI values ranged from 74.53% to 80.79% across members of subdivision 2 (Figure S2), suggesting that they could represent unique species based

57

on the proposed species threshold of 95% ANI (e.g. (Goris et al., 2007; Konstantinidis & Tiedje,

2005; L. Rodriguez-R & Konstantinidis, 2016).⁠ The ANI values ranged from 73.57% to 78.39% across members of subdivision 1 (Figure S3), again suggesting unique species. We compared these subdivision 1 MAGs to their closest, cultured relative based on the phylogenomic tree and found that subdivision 1 MAG HF_1_M5 had the highest percent identity (73.53%) to “Candidatus

Koribacter versatilis Ellin345” and MAG HF_1_M2 (70.19%) to Acidobacterium capsulatum (Figure

S3). This low percent identity based on nucleotide BLAST analyses indicates the substantial diversity we captured within the well-studied subdivision 1 clade of acidobacteria. Clearly, even with extensive available sequence data and cultured isolates, we are only beginning to scratch the surface of this understudied yet ubiquitous group of bacteria.

As the ANI values were relatively low (e.g. 70-78%), we compared average amino acid identity (AAI) across the subdivision 2 acidobacterial genomes to further assess genomic similarity.

The AAI values across subdivision 2 acidobacteria ranged from 55.54% to 80.99% (Figure 3.2). The

AAI values across subdivision 1 ranged from 50.46% to 71.08% (Figure S3). The low AAI percentages across these acidobacterial subdivisions again suggest that these are distinct species, and perhaps potentially novel genera (Goris et al., 2007; L. M. Rodriguez-R & Konstantinidis, n.d.).⁠

Using these combined approaches – ANI, AAI, as well as phylogenomic and 16S rRNA gene trees – presents opportunities to taxonomically classify uncultivated acidobacteria with high confidence. We recognize that taxonomic identification can only be assigned to bacteria if a cultured isolate is submitted to culture collections, and a host of physiological characteristics are described (Konstantinidis, Rosselló-Móra, & Amann, 2017).⁠ However, recent efforts in the community of microbial researchers has opened the possibility for taxonomic classification of uncultivated and ‘yet-to-be cultivated’ microorganisms. According to standards proposed in

Konstantinidis et al 2017, the MAGs analyzed in this study are of high enough quality for ANI/AAI

58

comparisons of genomic discreteness (≥80% completeness, ≤2.5% contamination), and we provide a thorough description of the ecological data surrounding the collection of samples.

Additionally, a few of the high quality MAGs contain complete or nearly complete 16S rRNA genes

(Konstantinidis et al., 2017).⁠ Based on these characteristics and the description of key metabolic functions and gene expression of these MAGs (discussed in later sections), we can confidently define taxonomy for these acidobacterial MAGs.

3.3.4 Genomic potential reflects functional similarities driven by acidobacterial subdivision

The high quality of the draft genomes generated through this study presents the opportunity to characterize some of the potential metabolic diversity of uncultured Acidobacteria through analyses of gene content. We annotated functional genes in the high-quality subdivision

1 and 2 MAGs from this study, as well as select reference genomes, using EggNOG (Huerta-Cepas et al., 2019) ⁠ in order to describe the functional potential of these organisms and estimate potential niches in the diverse soil ecosystem.

Previous acidobacterial comparative genomics revealed a clear grouping by subdivisions and environments based on the analysis of the COG/NOG categories (Eichorst et al., 2018). We sought to expand upon these observations with our high-quality MAGs (≥80% completeness,

≤2.5% contamination) and performed a principal coordinate analysis of the COG/NOG categories.

The PCo1 axis explains 22% of the variability and appears to separate largely based on taxonomic subdivision, while PCo2 axis explains 13% of the variability (Figures 3.6, S4, S5). Acidobacteria in subdivision 1 appear in two clusters, with one appearing more similar to subdivision 3. The second cluster of subdivision 1 acidobacteria falls closer to “Candidatus Koribacter versatilis Ellin345”

(referred to as Ellin345 from here forward) and the subdivision 2 acidobacteria, indicating potential similarity in metabolic function. 59

The clustering of subdivision 1 acidobacteria into two distinct groups along with Ellin345 aligns with the taxonomic features of this particular genome. Phylogenomic analyses place the genome of Ellin345 in a distinct clade from the rest of the subdivision 1 acidobacteria (Figure 3.3), illustrating the taxonomic distinctiveness of this particular genome. The high quality subdivision 1

MAGs from this study including HF_1_M3, HF_1_M4, HF_1_M5 and HF_1_M6 group closely with

Ellin345 on the phylogenomic tree (Figure 3.3), and also cluster closely with Ellin345 in the PCoA analyses (Figures 3.6, S4).

The previously-noted unusual metabolic genes found in Ellin345 may be driving the clustering pattern of subdivision 1 and 2 MAGs (Figure 3.6), indicating that subdivision 2 acidobacterial MAGs may have functions in common with this cultivated soil bacterium. A comparative study of three acidobacteria genome sequences in subdivisions 1 and 3 revealed that approximately 25% of coding sequences in Ellin345 were unique to that genome sequence, with the remainder of genes being mainly housekeeping genes (Ward et al., 2009).⁠ The clustering pattern including Ellin345 and our additional MAGs demonstrates some functional similarity between these sequences, indicating the potential for some overlap in ecological niches for these acidobacteria.

3.3.5 Carbohydrate degradation potential indicates overlap in metabolism and niche specialization across Acidobacteria

In addition to the broad functional categories assessed using EggNOG (Figure 3.6), we detected a diverse array of genes involved in the transport and metabolism of carbohydrate substrates in subdivision 2 MAGs. This is consistent with previous descriptions of acidobacteria as versatile heterotrophs with relatively slow growth and oligotrophic metabolic strategies (Kielak,

Barreto, Kowalchuk, van Veen, & Kuramae, 2016; Lladó et al., 2016; Männistö, Kurhela, Tiirola, &

Häggblom, 2013).⁠ We explored the carbohydrate degradation potential of the 5 high-quality 60

subdivision 2 MAGs using the CAZy database (V. Lombard et al., 2014; H. Zhang et al., 2018).⁠ To determine if these subdivision 2 MAGs are functionally distinct from reference genomes from other subdivisions (i.e. 1, 3, 4, 6, 8 and 23), we assessed these genomes and draft genomes for patterns of clustering using principal coordinate analysis.

The clustering pattern of the carbohydrate degradation genes are distinct from the

COG/NOG pattern (Figures 3.6, S4, S5). The PCo1 axis explained over a third of the variability

(37%) and separates based on subdivisions (Figure 3.7). In general, members of subdivisions 1 and 3 are distinct from members of subdivision 2, 4, 8 and 23. This demonstrates differences in the genomic potential for carbohydrate metabolism in members of subdivision 2 in comparison to members of subdivisions 1 and 3 (Figure 3.7, S6, S7). Yet there does appear to be much overlap amongst our MAGs. Select subdivision 1 and 2 MAGs from our forest soil appear to be more similar in terms of carbohydrate degradation potential as compared to previously published isolate genomes (Figure 3.7). Specifically, subdivision 1 MAGs including HF_1_M3, HF_1_M4 and

HF_1_M6 cluster within a larger grouping of subdivision 2 MAGs (Figure 3.7).

To determine the genes that might be driving the overlap of subdivisions 1, 2 and 3 we compared the array of genes related to carbohydrate metabolism in MAGs and reference genomes across these subdivisions (Figures S7, S8). Glycoside hydrolase genes comprise 48% of annotated genes in subdivision 1 on average, an average of 33.8% in subdivision 2, and an average of 39.5% in subdivision 3 indicating that hydrolysis of glycosidic bonds is an important metabolic strategy in soil Acidobacteria (Figure S8). Across the three subdivisions, the acidobacteria contain genes in glycoside hydrolase family 13, a broad group containing numerous subfamilies involved in starch and glycogen degradation (C. Berlemont & Martiny, 2015; R. Berlemont & Martiny, 2016).⁠

This family was further divided into subfamilies that show improved substrate specificity than

61

when observed at the broader gene family level (V. Lombard et al., 2014).⁠ The genomes in all three subdivisions contain subfamilies 9, 11 and 13, that are involved in the conversion of amylose to amylopectin through the transfer of a glucan group (Labes et al., 2008),⁠ isoamylase activity, as well as pullalanase and dextrinase activity, respectively (C. Berlemont & Martiny, 2015).⁠ These findings demonstrate that the ability to degrade starch and glycogen compounds is ubiquitous throughout phyla of soil Acidobacteria and could be driving the clustering pattern between these genomes.

The pattern of clustering, where subdivision 2 MAGs overlap with subdivisions 1 and 3, indicates that environmental selection may play a role in determining ecological niches for these acidobacteria, rather than function being driven solely by taxonomic clade. The close grouping of subdivision 2 and 3 acidobacteria, as well as some of the subdivision 1 acidobacteria, indicates some functional overlap of these taxa in terms of carbohydrate degradation potential. This demonstrates that the acidobacteria share close ecological niches in the soil environment, and is consistent with the pattern observed in Catao et al (2014) whereby increases in the abundance of subdivision 2 acidobacteria resulted in a corresponding decrease in the abundance of subdivision 1 acidobacteria. The overlap in carbohydrate degradation potential exhibited here indicates similar metabolic strategies which drives direct competition for substrates, and results in environmental selection for particular Acidobacteria taxa.

In addition to starch degradation genes, acidobacteria in subdivisions 1 and 3 contain a broader array of glycoside hydrolases across different gene families. These acidobacteria contain glycoside hydrolases in families 36 and 42 with galactose and arabinopyranoside hydrolytic activity

(Figure S7). Additionally, genomes in subdivision 1 and contain genes in GH families 79 with glucuronidase and heparanase activity, families 92 and 125 with mannose hydrolytic activity, as well as families 95 and 141 which break down fucose and xylan (Figure S7). However, these genes

62

are absent across the subdivision 2 MAGs, demonstrating that genomes in subdivisions 1 and 3 have broader array of genes with the potential to degrade a wider range of carbohydrate substrates. Overall, carbohydrate metabolism may be reduced in subdivision 2 when compared to acidobacteria in subdivisions 1 and 3.

Although the relatively small number of glycoside hydrolases found in subdivision 2 sequences could be attributed to the incomplete nature of these draft genomes relative to isolate genomes, the absence of these genes may also reflect an alternative strategy of these subdivision

2 acidobacteria for carbohydrate metabolism. Many of the CAZy genes found in subdivision 2 draft genomes are associated with hydrolysis of relatively simple carbohydrate molecules, the abundance of which suggests that the putative subdivision 2 acidobacteria are not involved in degradation of complex carbon substrates, but rather, metabolize more readily oxidizable substrates. The ability of subdivision 2 acidobacteria to degrade simple compounds including glucose, xylose and trehalose in addition to starch, is distinct from the more diverse array of carbohydrate genes in subdivisions 1 and 3.

3.3.6 Gene expression analysis suggests metabolic flexibility of subdivision 2 acidobacteria

In addition to functional potential of coding sequences in the five subdivision 2 MAGs, we mapped assembled contigs from four Barre Woods soil metatranscriptomes to the MAGs to estimate gene expression. We assigned functional classifications to these mapped contigs using the EggNOG and KEGG databases (Huerta-Cepas et al., 2019; M. Kanehisa & Goto, 2000).⁠ For central carbon metabolism, we find evidence of expression of the complete Embden-Meyerhof, tricarboxylic acid cycle, and the pentose phosphate pathway, as well as evidence of the partial

Entner-Doudoroff pathway (Figure 3.8). This is consistent with the obligate aerobic strategy of most acidobacteria genera, with the exception of Acidobacterium and Telmatobacter sp. that are

63

facultatively anaerobic (Campbell, 2014).⁠

In addition to central carbon metabolism, we evaluated the expression of genes involved in accessing nutrients and substrates as these are critical for bacterial survival. The five subdivision

2 acidobacteria MAGs express diverse ABC transporters including transporters for uptake of phosphate, D-xylose, glucose, mannose, sorbitol and mannitol (Figure 3.8). Additionally, the genomes express two-component systems including CusS and CusR for the uptake of copper ions, as well as PhoBRP, SenX3, RegX3, and PstS genes that are indicative of phosphate uptake under limiting conditions (Figure 3.8). Expression of these genes involved in substrate uptake are indicative of versatile heterotrophy that is characteristic of bacteria in the phylum Acidobacteria

(S. A. Eichorst et al., 2018; Kielak, Barreto, et al., 2016).⁠

We also evaluated the expression of genes involved in motility. Functional annotations of the MAGs and their respective transcripts reveal evidence of a non-motile life strategy with genes for motility and flagellar assembly not expressed across all five genomes (Table 3.3). This lack of expression of flagellar assembly is in contrast to genomes from cultured isolates in subdivision 1, particularly Ellin345 and Acidobacterium capsulatum (Ward et al., 2009);⁠ however, this trait is variable across the phylum Acidobacteria (Campbell, 2014).⁠ The absence of expression of these genes indicates that these bacteria were likely sessile at time of collection.

3.3.6.1 Evidence of assimilatory sulfur metabolism

In previous studies of Acidobacteria from peatland soils and acid mine drainage, sulfur cycling is an essential process for survival (Hausmann et al., 2018; Wegner & Liesack, 2017a).⁠ We assessed the expression for sulfur metabolism in our subdivision 2 acidobacteria MAGs.

Transcripts mapped to the five subdivision 2 acidobacteria MAGs and provide evidence of

64

assimilatory sulfate reduction in forest soils. Additionally, the five MAGs express Sox genes involved in thiosulfate oxidation, including SoxC, SoxD, and SoxY (Figure 3.8). In contrast to previously studied acidobacteria from peatland soils in subdivisions 1 and 3, we detect no evidence of dissimilatory sulfate reduction (including dsrABC or aprBA) in the subdivision 2 MAGs.

Environments such as acid mine drainage and peatlands contain sulfur compounds in high to even toxic concentrations, compared to temperate forest soils from which these acidobacteria MAGs were generated that have relatively low sulfur. Therefore, cycling of sulfur compounds may be important but not crucial for survival in temperate forest soils in comparison to highly sulfuric environments.

3.3.6.2 Assimilatory nitrate reduction in subdivision 2

Nitrogen is an essential nutrient, the cycling of which is mediated largely by microorganisms in terrestrial environments (Lladó et al., 2017; Jerry M Melillo et al., 2011; C. Wang et al., 2018).⁠ All five subdivision 2 acidobacteria MAGs have evidence of assimilatory nitrate reduction, particularly the expression of NarB, NR, and NasAB (Figure 3.8). Additionally, the five genomes express most of the genes involved in dissimilatory nitrate reduction to ammonia (Figure

3.8). The reduction of nitrate to nitrite is encoded for by NarGHI and NapAB genes across four of the five subdivision 2 MAGs. Further reduction of nitrite to ammonia is encoded for by the expression of NirBD and NrfAH, both of which are present in four of the five subdivision 2 genomes, whereas only NirBD is expressed in HF_2_M10 (Figure 3.8).

In studies of nitrogen cycling dynamics in combination with simulated climate change, warming has induced increases in nitrogen mineralization which is then used by vegetation (Butler et al., 2012; Rustad et al., 2001).⁠ At the Harvard Forest LTER site, increased nitrogen mineralization has been observed in the warmed soil plots as compared to the control (Butler et al., 2012; Serita 65

D Frey, Knorr, Parrent, & Simpson, 2004).⁠ The subdivision 2 acidobacteria MAGs in this study express genes involved in assimilatory and dissimilatory nitrogen reduction, as well as the conversion of nitrite to ammonia. This indicates that acidobacteria contribute to nitrogen mineralization, making inorganic nitrogen more available to plants, which in turn is consistent with measured increases in nitrogen mineralization in response to warming (Butler et al., 2012).⁠

3.3.6.3 Terminal oxidases enable respiration in low oxygen environments

Using the results from EggNOG and KEGG (Huerta-Cepas et al., 2019; M. Kanehisa & Goto,

2000),⁠ we further explored the metabolic potential of subdivision 2 acidobacteria to determine strategies that make them well-adapted to soil ecosystems. Evidence of obligately aerobic respiration is supported by the lack of alternative electron donors within the subdivision 2 acidobacteria MAGs (Campbell, 2014; Wegner & Liesack, 2017b).⁠ However, these genomes have transcripts for high- and low-affinity terminal oxidases: all five genomes contain evidence of Cox gene expression, which is part of heme-cytochrome oxidase class A (aa3-type)(S. A. Eichorst et al., 2018).⁠ Additionally, the five genomes express high oxygen affinity terminal oxidases (bd-type) in heme-cytochrome oxidase class C (S. A. Eichorst et al., 2018).⁠

The presence of both low- and high-affinity terminal oxidases contributes to the metabolic flexibility of these acidobacteria by enabling respiration across oxygen gradients (Pereira, Santana,

& Teixeira, 2001; Sousa et al., 2012),⁠ even in microaerobic conditions. Given the heterogeneity of nutrients and oxygen in soils, (Hansel, Fendorf, Jardine, & Francis, 2008; Rillig et al., 2017),⁠ nutrient gradients within the heterogeneous soil structure may also lead to ‘hot-spots’ of microbial activity, and additionally result in anoxic microenvironments (Borer, Tecon, & Or, 2018; Kuzyakov &

Blagodatskaya, 2015; Rillig et al., 2017).⁠ Soil structure as well as biological processes that mediate

66

the distribution of nutrients and oxygen necessitate flexible metabolic strategies, such as the expression of low- and high-affinity terminal oxidases that support the survival of subdivision 2 acidobacteria.

3.3.6.4 Hydrogenases enable hydrogen scavenging and utilization

We assessed expression of 1h/5 [Ni-Fe] hydrogenases that are involved in hydrogen scavenging and uptake from the environment. Transcripts mapped to the subdivision 2 MAGs reveal that all five genomes express [NiFe] hydrogenases, including the genes HupS and HupL

(Figure 3.8). These genes code for the expression of the small and large catalytic subunits of [NiFe] hydrogenases, respectively. The distribution of hydrogenases is scattered throughout the bacterial domain, and is consistent with frequent horizontal gene transfer (Figure S9; S. A. Eichorst et al.,

2018; Greening et al., 2015).⁠

The expression of large and small subunit [NiFe] hydrogenases among our MAGs indicates that hydrogen scavenging and uptake is a metabolic strategy across acidobacteria in soils.

Hydrogen presents a valuable growth substrate in environments where carbon substrates have become more limiting (Conrad, 1996; Greening et al., 2015).⁠ In particular, for acidobacteria that exhibit a slow-growing, oligotrophic strategy, hydrogen scavenging supports dormancy and metabolism in nutrient poor environments (Greening et al., 2015).⁠ The ability to use hydrogen as a substrate is consistent with observed decreases in readily oxidizable organic matter present in the warming plots at the Harvard Forest long-term warming simulation experiments (Pold, Grandy,

Melillo, & Deangelis, 2017).⁠ With reduced availability of carbohydrate substrates which acidobacteria are known to readily degrade, hydrogen utilization offers a survival strategy in low nutrient environments (Greening et al., 2016, 2015),⁠ such as soil under long-term warming exposure.

67

3.3.7 Metabolic strategies across Acidobacteria reflect environmental selection and niche specialization

In addition to the presence of carbohydrate metabolism genes in subdivision 2 acidobacteria, we assessed the expression of genes involved in carbohydrate degradation by assigning functional classification to mapped metatranscriptomes using the carbohydrate active enzyme database (CAZy; (V. Lombard et al., 2014)(Figure⁠ S10, Table 3.4). Subdivision 2 acidobacteria express the starch degradation genes from the GH13 subfamilies 9, 11 and 13 (C.

Berlemont & Martiny, 2015).⁠ Additionally, all five of the subdivision 2 MAGs express genes in glycoside hydrolase families GH3, 23 and 109. Genes in family GH3 act broadly on monomers containing glucosides, arabinofuranosides, xylopyranosides and N-acetyl-glucosamines ((López-

Mondéjar, Zühlke, Becher, Riedel, & Baldrian, 2016)).⁠ Families 23 and 109 degrade peptidoglycan, and N-galactosamine, respectively (López-Mondéjar et al., 2016).⁠ Additionally, genes in glycoside hydrolase families GH15, 16, 29 and 128 are expressed across at least 3 subdivision 2 MAGs. These genes break down glucans, fucose, galactose, xylose, dextrin, and trehalose (López-Mondéjar et al., 2016).⁠ Glycoside hydrolase family 18 is expressed as well, and exhibits chitinase and N- acetylglucosaminidase activity (Belova et al., 2018).⁠

Taken together, the expression of these gene families provides further support for flexible carbon substrate metabolism exhibited by subdivision 2 acidobacteria. The abundance of genes in the glycoside hydrolase gene family, representing an average of 40.8% of expressed genes annotated with CAZy across the five subdivision 2 genomes (Figure S11), reveals the versatile heterotrophic strategy of subdivision 2 acidobacteria. This metabolic flexibility could support the abundance of subdivision 2 acidobacteria in nutrient-depleted soils, and enable them to adapt to changing environmental conditions.

68

Climate change is expected to induce shifts in microbially-mediated processes in soil by changing the quality and quantity of carbon substrates (Bond-Lamberty et al., 2016; Jerry M

Melillo et al., 2011; Pold et al., 2017).⁠ In addition to containing a small suite of glycoside hydrolase genes, the reduced ability for subdivision 2 acidobacteria to degrade complex carbon substrates is further supported by changes in the available substrate pool measured at the Harvard Forest experimental warming plots after extensive warming (Pold et al., 2017).⁠⁠ At the longest-running

Harvard Forest warming experiment at Prospect Hill, warming has induced a decrease in recalcitrant carbon substrates, providing more readily oxidizable carbon for microorganisms (Pold et al., 2017).⁠ This reduction in structurally complex substrates and subsequent increase in more easily degraded compounds supports an expanded niche for subdivision 2 acidobacteria under soil warming conditions.

In accordance with this, subdivision 2 acidobacteria from samples collected at the sister warming site at Barre Woods are found in higher abundance in the lower mineral horizon soils compared to the upper organic soils (Figure S12). Mineral soils contain fewer inputs of vegetation and complex organic matter when compared to organic soils, which may select for organisms that can survive in low-nutrient conditions. The reduced expression of carbon degradation genes in subdivision 2 acidobacteria, and their increased abundance in low nutrient habitats suggests that they are well-equipped to survive in these conditions. These findings are consistent with a microbial community with more specific and relatively reduced carbon metabolism as measured in previous studies at the Harvard Forest (Bradford et al., 2008; S. D. Frey et al., 2008; Pold et al.,

2017),⁠ indicating an important role for subdivision 2 acidobacteria under a changing climate.

3.8 Synthesis

Acidobacteria are ubiquitous across environments, but have traits that may make them

69

particularly well-adapted to survival in soils. The generation of the highest-quality subdivision 2

MAGs to-date in this study has enabled genome-informed investigation of this previously undescribed group of bacteria. In combination with metatranscriptomics, we were able to characterize the taxonomic and metabolic diversity of subdivision 2 acidobacteria. Additionally, we were able to hypothesize niche specialization of this subdivision based on expression of genes that make them well-adapted to low-nutrient soils.

The abundance of genes and transcripts allocated to carbohydrate metabolism in these genomes, as well as the expression of terminal oxidases and hydrogenases indicate that the subdivision 2 acidobacteria, in particular, are versatile heterotrophs capable of survival in rapidly changing conditions present in soil microenvironments. Climate change in the form of soil warming is expected to alter the composition of available carbon in soils through microbially- mediated activity. Changes in soil moisture, as well as substrate and nutrient availability induced by climate change may select for more slow-growing, oligotrophic taxa such as acidobacteria. The high abundance of subdivision 2 acidobacteria in soil ecosystems make them particularly interesting for understanding how microbially-mediated processes might shift in response to climate change.

3.9 Methods

3.9.1 Sample Collection and Processing

A set of fourteen soil cores were collected on 24 May 2017 from the Barre Woods plot of the Harvard Forest Long-Term Ecological Research (LTER) site (Petersham, MA, United States). Soil samples were collected using a tulip bulk corer and cores were separated into organic (top layer) and mineral horizon (lower layer) based on visual inspection, resulting in 28 soil samples. Samples were flash-frozen in the field using an ethanol and dry ice bath. Frozen samples were transported

70

to the University of Massachusetts Amherst (Amherst, MA, United States) for long-term storage, and to the Joint Genome Institute (JGI; Walnut Creek, CA, United States) for further processing and nucleic acid extraction.

3.9.2 Mini-Metagenomics

From the original set of 28 soil samples, one representative sample from each treatment and soil horizon (heated organic, heated mineral, control organic and control mineral) was selected for downstream processing using cell sorting coupled to high-throughput sequencing.

Cells and large soil particles were separated through the application of tween 20 and vortexing, followed by centrifugation to pellet smaller soil particles. Intact cells were stained using SYBR- green, and 100 SBYR+ cells were sorted into 90 pools using fluorescence activated cells sorting for each of the four selected soils. These pools of cells underwent multiple displacement amplification

(MDA) and downstream nextera library preparation prior to sequencing on the Illumina NextSeq platform. See Schulz et al 2018 and Alteio et al in revision for further details on sample preparation

(Alteio et al., in revision; Schulz et al., 2018).⁠

3.9.3 Bulk Metagenomics

The complete set of 28 soil samples was extracted for total DNA using the DNeasy Soil

DNA extraction kit (QIAGEN). The 28 unamplified TruSeq libraries were sequenced on the Illumina

HiSeq platform at the Joint Genome Institute (JGI; Walnut Creek, CA, United States). Following standard quality filtering procedures, reads were assembled using SpAdes (Bankevich et al., 2012).⁠

From the original 28 samples, the 4 samples corresponding to those that were processed via fluorescence activated cell sorting were selected for analysis in this study.

71

3.9.4 Metatranscriptomics

Total RNA was extracted from the complete set of soil samples collected at the Barre

Woods site using the RNeasy Soil RNA extraction kit (QIAGEN). Libraries were prepared and were sequenced on the Illumina NextSeq platform at the Joint Genome Institute (JGI; Walnut Creek, CA,

United States). Quality cleaned reads were assembled using the MEGAHIT assembler (D. Li, Liu,

Luo, Sadakane, & Lam, 2015).⁠

3.9.5 Genome Binning and Quality Assessment

Genome bins were generated from the mini-metagenome and bulk metagenome datasets using MetaBat2 (Kang et al., 2015).⁠ Using this approach 1793 mini-MAGs and 1776 bulk MAGs were generated. Of these, 11 sorted-MAGs and 14 bulk MAGs were classified to the phylum

Acidobacteria. Following genome binning, bins were assessed for quality using marker genes implemented in CheckM (Parks et al., 2015).⁠⁠ Sorted-MAGs and MAGs were retained for downstream analyses if they surpassed a quality filtering threshold of ≥50% estimated completeness, ≤10% estimated marker gene contamination, and ≤10% strain heterogeneity.

Following quality filtering 11 mini-MAGs and 14 bulk MAGs within the phylum Acidobacteria remained. This dataset was further refined to contain draft genomes with ≥80% estimated completeness and ≤2.5% estimated marker gene contamination in order to better compare with sequenced genomes from isolates in the phylum Acidobacteria. Following this refined quality threshold, 5 MAGs from Subdivision 1 and 5 MAGs from Subdivision 2 were retained, respectively.

3.9.6 Distribution of taxa in bulk and sorted-MAGs

Taxonomy was assigned to contigs from the sorted-MAGs and bulk MAGs by comparing sequence identity against the NCBI-NR database using DIAMOND (Buchfink et al., 2014).⁠⁠ Blast

72

results were imported into MEGAN6 (Huson et al., 2018) ⁠ for taxonomic classification . The relative abundance of each phylum was computed and visualized in R using ggplot2 (Wickham, 2016).⁠⁠⁠

3.9.7 Taxonomic distribution of extracted 16S rRNA gene sequences

To assess the distribution of taxa across the mini-metagenome and bulk metagenome samples, 16S rRNA gene sequences were extracted using PhyloFlash (Gruber-Vodicka et al., 2019) ⁠ from the set of 759 reads and 1,374 reads from the mini- and bulk metagenome sequences, respectively. These reads were classified to taxonomy using the RDP classifier (Q. Wang et al.,

2007),⁠ and relative abundance of taxonomic distribution was visualized using the R package ggplot2 (Wickham, 2016).⁠

In addition to extracting 16S rRNA sequences from the unassembled read data, 16S rRNA gene sequences were identified in the metagenome assembled genomes (MAGs) and sorted-

MAGs using CMsearch (Cui et al., 2016).⁠ This approach revealed 4 of the 13 putative subdivision 2 acidobacterial MAGs contained a full-length 16S rRNA gene (between 1427 to 1534 bp). These extracted 16S rRNA gene sequences were used to generate a maximum-likelihood tree of subdivisions 1, 2, 3, 4, 6 and 8, including 16S rRNA genes from SILVA as reference sequences. The species Acanthopleuribacter pedis (AB303221) was used as an outgroup for the tree. The tree was bootstrapped 1000 times, and bootstrap values are displayed on the tree as ≥95% (●) and ≥70%

(●).

3.9.8 Phylogenomic analysis of MAGs and Sorted-MAGs

The GTDB-Toolkit (GTDB-TK) was used to generate a phylogenomic tree in order to determine taxonomic distribution for MAGs derived from mini-metagenomes and bulk

73

metagenomes (Parks et al., 2018).⁠ This tool assessed the MAGs for the presence of 120 marker genes, and generated a concatenated marker gene tree using the GTDB as a reference dataset.

The resulting tree was visualized using ARB (Ludwig et al., 2004).⁠ A subtree containing reference sequences and MAGs in the phylum Acidobacteria was extracted and visualized.

3.9.9 Average Nucleotide Identity and Amino Acid Identity

To assess genome similarity across acidobacterial subdivision 2 MAGs (≥80% completeness and ≤2.56% contamination), we used the ANI/AAI matrix tool to calculate average nucleotide identity (ANI) and amino acid identity (AAI) (L. Rodriguez-R & Konstantinidis, 2016).⁠

This was repeated for subdivision 1 MAGs and selected subdivision 1 isolate genomes using the

ANI/AAI matrix tool. The resulting ANI and AAI values were visualized as heatmaps using the ggplot2 package in R (Wickham, 2016).⁠

3.9.10 Metabolic Insights from MAGs

Functional classifications from the KEGG database were assigned to MAGs derived from bulk and mini-metagenome samples using GhostKoala (Minoru Kanehisa, Sato, Morishima, &

Sternberg, 2016).⁠ Additionally EggNOGmapper (Huerta-Cepas et al., 2019) ⁠ was used to assign

COG/NOG categories to MAGs and sorted-MAGs (Huerta-Cepas et al., 2019).⁠ Gene families involved in carbon substrate degradation were identified in MAGs and sorted-MAGs using the

Carbohydrate Active Enzyme database (CAZy; (V. Lombard et al., 2014) ⁠ implemented in DBcan2

(H. Zhang et al., 2018).⁠

We assessed the functional similarity between Acidobacteria MAGs from this study, published MAGs and genomes from Acidobacteria isolates in a principal coordinates analysis by

74

calculating Bray-Curtis Similarity between EggNOG annotated genes across MAGs and isolate genomes. Clustering patterns were visualized using a PCoA plot generated in ggplot2 (Wickham,

2016).⁠ Genes that may play a role in the visualized patterns of clustering were identified in a heatmap generated using the superheat package in R (Barter & Yu, 2017).⁠ Similarly, patterns of clustering and characterization of carbohydrate metabolism were identified using PCoA plot and heatmap of the CAZy categories.

3.9.11 Estimation of gene expression in Acidobacteria

Assembled contigs from the four metatranscriptomes were aligned to high quality MAGs classified to Acidobacteria subdivision 2 using bbsplit, part of the bbtools suite (Bushnell, n.d.).⁠

Reads that mapped to acidobacterial MAGs were assigned functional annotation using GhostKoala to assign KEGG categories (M. Kanehisa & Goto, 2000; Minoru Kanehisa et al., 2016),⁠

EggNOGmapper to assign COG/NOGs from the EggNOG database (Huerta-Cepas et al., 2019) ⁠ and the Carbohydrate Active Enzyme database (CAZy) implemented in DBcan2 (V. Lombard et al.,

2014; H. Zhang et al., 2018).⁠

3.9.12 Detecting hydrogenases in acidobacterial MAGs

Sequences from genomes known to contain group 1h/5 [NiFe]-hydrogenases were aligned using MUSCLE (Edgar, 2004).⁠ A maximum likelihood tree of these reference sequences and MAGs was generated in IQTree, ModelFinder (Nguyen et al., 2015) ⁠ and was bootstrapped 1000 times.

The tree is rooted with Desulfovibrio vulgaris (WP_010939204.1).

75

Table 3.1. Number of reads assigned to Acidobacteria from the mini- and bulk metagenome datasets using RDP Classifier.

76

Table 3.2 CheckM quality assessment of bulk and sorted-MAGs classified to the phylum Acidobacteria.

77

Table 3.3. Number of expressed KEGG genes in subdivision 2 acidobacteria draft genomes.

78

Table 3.4. Number of expressed CAZy genes in each subdivision 2 acidobacteria genome.

79

Figure 3.1. Taxonomic distribution in mini- and bulk metagenomes. Taxonomy was assigned to contigs by comparing to the NCBI-nr database using DIAMOND blastp (Buchfink et al., 2014).⁠ Taxa counts were determined using MEGAN6 (Huson et al., 2018).⁠ The most abundant phyla are displayed, with taxa representing less than 1% of the total community grouped into “Other”.

80

Figure 3.2. Distribution of acidobacterial clades across the bulk metagenome and mini- metagenome samples. Taxonomic classification was completed using the RDP Classifier on 16S rRNA genes extracted from metagenome and mini-metagenome reads using PhyloFlash. The relative abundance of acidobacterial classes is plotted for each of the four metagenome samples and the four mini-metagenome samples. Each sequencing type (metagenome and mini- metagenome) represents one of the soil horizons (organic or mineral) and one of the temperature treatments (control or heated).

81

Figure 3.3. Concatenated marker gene tree of acidobacterial MAGs and references. We assessed the MAGs and references from the Gene Taxonomy Database (GTDB) for the presence of 120 marker genes using the GTDB-Toolkit (Parks et al 2018). Acidobacterial subdivisions are displayed next to each clade. The MAGs from this study are colored by soil treatment, with heated samples are in orange text and MAGs from control temperature soils are in blue. The clade of subdivision 2 acidobacteria is highlighted in green. The MAGs from which a full-length 16S rRNA gene could be extracted are marked with an asterisk (*). Genomes from cultured acidobacteria are in bold. Bootstrap support values ≥95% (●) and ≥70% (●).

82

Figure 3.4. Quality assessment of MAGs and reference MAGs from GTDB. Completeness and contamination was assessed for each MAG and GTDB reference genome (Parks et al 2018) using CheckM marker genes (Parks et al., 2015).⁠ All MAGs displayed have a contamination of ≤10%. Sample type is represented by shape where bulk metagenome MAGs are circles, GTDB reference MAGs are triangles, and sorted-MAGs are squares. Points are colored by acidobacterial subdivision, including subdivision 1 in yellow, subdivision 2 in green and subdivision 3 in purple. Names are displayed for the ten high-quality MAGs in subdivisions 1 and 3.

83

Figure 3.5. Maximum likelihood tree of acidobacterial subdivisions 1,2,3,4,6 and 8 based on full- length 16S rRNA gene. Subdivision is indicated next to grouping. Accession numbers of the 16S rRNA genes in the SILVA database are given in parentheses. Acanthopleuribacter pedis (AB303221) was used as the outgroup. The 16S rRNA genes extracted from Subdivision 2 MAGs are depicted in dark blue. The subdivision 2 clade, including previously sequenced SD2 Acidobacteria, is highlighted in green. Internal nodes with bootstrap values (1000 interactions) of ≥95% (●) and ≥70% (●) are displayed. The scale bar indicates 0.10 changes per nucleotide.

84

Figure 3.6. PCoA of COG/NOG genes across Acidobacteria subdivisions. The points are colored by acidobacterial subdivision (SDs) where subdivision is presented in yellow, subdivision 2 is green, subdivision 3 is purple and all other subdivisions (4,6,8 and 23) are shown in blue. PCoA axis 1 explains 22% of variability and PCoA axis 2 explains 13% of variability.

85

Figure 3.7 PCoA of CAZy gene families across Acidobacteria subdivisions. Points are colored by acidobacterial subdivisions (SDs) where subdivision 1 is presented in yellow, subdivision 2 is green, subdivision 3 is purple, and sequences from all other subdivisions (4,6,8 and 23) are shown in blue. PCoA axis 1 explains 37% of variability, and PCoA axis 2 explains 11% of the variability.

86

Figure 3.8 Metabolic diagram of the 5 subdivision 2 acidobacteria draft genomes. Metatranscriptome reads mapped to these genomes revealed expression of the Embden- Meyerhof pathway, the pentose phosphate pathway and the tricarboxylic acid cycle of central carbon metabolism. Additionally, MAGs demonstrate expression of dissimilatory and assimilatory nitrate reduction, assimilatory sulfate reduction, GalNAc degradation. The five draft genomes express group 1h/5 [NiFe]-hydrogenases and both high- and low-affinity terminal oxidases involved in oxidative phosphorylation.

87

CHAPTER 4

HIDDEN DIVERSITY OF SOIL GIANT VIRUSES4

4.1 Abstract

Known giant virus diversity is heavily skewed towards viruses isolated from aquatic environments and cultivated in the laboratory. Here, we employed cultivation-independent metagenomics and mini-metagenomics on soils from the Harvard Forest. This lead to the discovery of 16 novel giant viruses, almost exclusively recovered by mini-metagenomics. The new viruses expanded phylogenetic diversity of all giant viruses by more than 20% and represented completely novel lineages or were affiliated with Klosneuviruses, Cafeteria roenbergensis virus or

Tupanviruses. One virus had a genome size of 2.4 Mb, the largest ever recorded in the Mimiviridae, while others had genomes encoding up to 80% orphan genes. In addition, more than 240 major capsid proteins were found encoded on unbinned metagenome fragments, further indicating that giant viruses are greatly underexplored in soil ecosystems. The fact that most of these novel viruses evaded detection in the bulk metagenomes suggests mini-metagenomics could be a potent tool to unearth viral giants.

4.2 Introduction

Viruses larger than some cellular organisms and with genomes up to several megabases in size have been discovered in diverse environmental niches across the globe, primarily from aquatic systems, such as fresh-, sea- and wastewater (Letunic & Bork, 2016),⁠ with only few from

4 Schulz F*, Alteio LV*, Goudeau D, Ryan EM, Yu FB, Malmstrom RR, Blanchard JL, and Woyke T. Hidden diversity of soil giant viruses. November 2018. Nature Communications. *Indicates authors contributed equally.

88

terrestrial environments (Parks et al., 2015) ⁠ including permafrost (Letunic & Bork, 2016).⁠ These viruses are nucleocytoplasmic large DNA viruses (NCDLV), and they infect a wide range of eukaryotes, in particular protists and algae (Aherfi et al., 2016; Andrade et al., 2018).⁠ Besides the algae-infecting Phycodnaviridae (Pagnier et al., 2013; Yoosuf et al., 2014),⁠⁠ only a few NCDLV have been recovered with their native hosts, such as Cafeteria roenbergensis virus (CroV) in the marina flagellate Cafeteria roenbergensis (Abergel, Legendre, & Claverie, 2015; Boughalmi et al., 2013) ⁠ and the Bodo saltans virus (BsV) (Abergel et al., 2015; Abrahão et al., 2018; Fischer, 2016; Legendre et al., 2014) ⁠ . Many of the NCDLV are referred to as “giant viruses” based on their large physical size (Fischer, 2016),⁠ although the term has also been applied to viruses with a genome size of at least 200 kb (Wilson, Van Etten, & Allen, 2009).⁠ Importantly, for many of these NCDLV genome size and particle diameter do no correlate (Fischer, Allen, Wilson, & Suttle, 2010).⁠ In the following, the term “giant virus” refers to any member of the monophyletic group of NCLDV, the proposed order

Megavirales (Deeg, Chow, & Suttle, 2018) ⁠

Most of our current understanding of giant viruses comes from isolates retrieved in co- cultivation with laboratory strains of Acanthamoeba (Claverie & Abergel, 2016).⁠ Only recently have the genomes of giant viruses been recovered by approaches such as bulk shotgun metagenomics

(Legendre et al., 2014) ⁠ , flow-cytometric sorting (Aherfi et al., 2016; Pagnier et al., 2013) ⁠ and isolation using a wider range of protist hosts (Aherfi et al., 2016; Pagnier et al., 2013) ⁠ . Recent large-scale marker gene-based environmental surveys (Andreani et al., 2018; Roux et al., 2017;

Schulz et al., 2017; Verneau, Levasseur, Raoult, La Scola, & Colson, 2016; W. Zhang et al., 2015) ⁠ hinted an immense phylogenetic breadth of giant viruses of which, however, only a small fraction has been isolated to date. Possible reasons are challenges in providing a suitable host during co-

89

cultivation and the inability to recover the viruses together with their native hosts (Khalil et al.,

2016; Martínez, Swan, & Wilson, 2014; Wilson et al., 2009).⁠ In addition, a systematic recovery of giant virus genomes from metagenomic datasets is lacking and thus, the true genetic diversity of giant viruses remains underexplored. Here we describe, for the first time, giant virus genomes from a forest soil ecosystem that were recovered using a cultivation-independent approach.

4.3 Results

4.3.1 Mini-metagenomics facilitated the discovery of giant viruses

Soil samples from the Harvard Forest were subjected to standard shotgun sequencing of microbial communities. Four of the twenty-eight samples were also analyzed using a ‘mini- metagenomics’ (Bajrai et al., 2016; Khalil et al., 2016; Reteno et al., 2015) ⁠ approach, where multiple sets of 100 DNA-stained particles were flow sorted and subjected to whole genome amplification and sequencing (Figure 4.1). Metagenomic binning of assembled contigs produced

15 metagenome assembled genomes (MAGs) from the mini-metagenomes and 1 MAG from the bulk metagenomes that displayed features typically found in most NCLDV genomes (Colson,

Aherfi, & La Scola, 2017; Hingamp et al., 2013; Mihara et al., 2018),⁠ such as hallmark genes encoding for major capsid protein(s) (MCP), factors for maturation of the viral capsid, and packaging ATPases. Furthermore, we observed on most contigs a uniform distribution of genes of viral, bacterial or eukaryotic origin and many without matches in public databases. In addition these new viruses encoded numerous paralogous genes, a feature common to many NCLDV

(Halary, Temmam, Raoult, & Desnues, 2016).⁠ Many of the duplicated genes were located on different contigs and often unique to the respective genomes, providing additional evidence that these contigs belong to a single viral MAG. Moreover, presence, absence and copy number of nucleocytoplasmic virus orthologous genes (NCVOGs) (Berghuis et al., 2019; McLean et al., 2013b; 90

Yu et al., 2017) were comparable to previously described giant viruses, suggesting that the MAGs are made up by single viral genomes and several of them being nearly complete. An independently conducted benchmarking experiment of the mini-metagenomics approach revealed that no chimeric contigs are being created during this workflow which further supports the quality of the genomes derived here. Despite the bulk metagenome approach generating five-fold more reads, it only yielded in a single giant virus genome, whereas mini-metagenomics lead to the recovery of

15 additional bins attributable to NCLDV (Figure 4.1). Bulk metagenome reads only mapped to the

MAG recovered from bulk metagenomes (at ~9x coverage) and not to any mini-metagenome

MAGs, suggesting most of the discovered viruses were of low abundance in the sampled forest soil (Figure 4.1). This was also reflected in the soil metatranscriptomes in which no or only low transcriptional activity of the giant viruses could be detected (Figure 4.1).

4.3.2 Cell-sorted viral particles expand known diversity of NCLDV

The phylogenetic relationships inferred from the tree built from a concatenated alignment of five core NCVOGs (Iyer, Aravind, & Koonin, 2001; Yutin, Wolf, Raoult, & Koonin, 2009) ⁠ (Figure

4.2) and the consensus of single protein phylogenies showed that newly discovered viruses from forest soil were affiliated with diverse lineages in the Megavirales. Two of the new viruses, Ca.

Solivirus and Ca. Solumvirus, were in sister-position to the Pithoviruses, Cedratviruses and the recently isolated Orpheovirus (Filée, 2013; Suhre, 2005).⁠ Ca. Sylvanvirus represented a long branch on its own. Most novel soil NCLDV were positioned within the Mimiviridae, a viral family in the

Megavirales comprising the Megamimivirinae, the Klosneuvirinae, the algae-infecting

Mesomimivirinae and the genus Cafeteriavirus (Yutin, Wolf, & Koonin, 2014; Yutin et al., 2009) ⁠

(Figure 4.2). One of the new viruses, Ca. Faunusvirus, grouped with Cafeteria roenbergensis virus and represents the second viral genome sampled in this clade (Figure 4.2). Another novel virus,

91

Ca. Satyrvirus, branched as sister lineage to the two recently isolated Tupanviruses, which were derived from deep sea and a soda lake samples (Yutin, Wolf, & Koonin, 2014; Yutin et al., 2009),⁠ together forming a monophyletic clade in the Megamimivirinae (Figure 4.2). Notably, none of the new lineages were directly affiliated with any of the three other subgroups of well-studied

Megamimivirinae (Andreani et al., 2018).⁠ Eight of the new viruses branched within the

Klosneuvirinae, currently the largest subfamily in the Mimiviridae based on phylogenetic diversity

(PD) (Gallot-Lavallée, Blanc, & Claverie, 2017; Schulz et al., 2017) ⁠ (Figure 4.2).

Strikingly, the addition of the novel giant viruses to the NCLDV tree lead to a 21% increase of the total PD in the Megavirales (Figure 4.2), expanded the diversity of the Mimiviridae by 77% and nearly tripled the PD of the Klosneuvirinae (Figure 4.2). It is important to note that this expansion of PD was from a single study using cultivation-independent techniques, thereby building upon decades of previous giant virus discovery work (Abergel et al., 2015).⁠ The fact that all these newly discovered viruses represent distinct lineages in the NCLDV hints that additional sampling is expected to lead to a further substantial increase in giant virus PD.

4.3.3 Genomic features of soil giant viruses

The novel viral genomes assigned to the Klosneuviruses were among the largest ever found (Figure 4.2). With a genome size of up to 2.4 Mb the Ca. Hyperionvirus holds the new record for genome size in the Mimiviridae, dwarfing Klosneuvirus and Tupanvirus with their ~ 1.5 Mb genomes (Colson, La Scola, Levasseur, Caetano-Anollés, & Raoult, 2017; Rodrigues, Mougari,

Colson, La Scola, & Abrahão, 2019).⁠ Considering that several of the forest soil MAGs are potentially only partially complete, the true genome size of the new viruses might be even larger. Similar to recently discovered Klosneuviruses and Tupanviruses (La Scola et al., 2003),⁠ several of the new

92

viruses affiliated with the Klosneuvirinae encode for expanded sets of aminoacyl tRNA synthetases

(aaRS), e.g. Ca. Terrestrivirus with up to 19 different aaRS and up to 50 tRNAs with specificity for all 20 different amino acids, a feature only very recently described in the Tupanviruses (Abrahão et al., 2018; Aherfi et al., 2016; Legendre et al., 2015; Rodrigues et al., 2019).⁠ In concert with other viral components of the eukaryotic translation system, such viruses likely override host protein biosynthesis using their own enzymes to ensure efficient production of viral proteins. Being less dependent on the host cell machinery makes it conceivable that Klosneuviruses might be able to infect multiple hosts, i.e. fewer proteins are necessary to target and interact with alternative hosts.

A broader host range has been experimentally verified for Tupanviruses (Abergel et al., 2015;

Schulz et al., 2017).⁠ While Tupanviruses were able to infect different protists, viral titer did not necessarily increase in all the cases (Abergel et al., 2015; Schulz et al., 2017),⁠ suggesting the importance of not yet understood factors necessary for successful host exploitation

4.3.4 Genome novelty of soil giant viruses

Complementary to the phylogenetic analysis (Figure 4.2), we inferred a gene sharing network to provide further insights into the relationship of the novel viral genomes to known

NCLDV lineages based on shared gene content. In agreement with the species tree, viral lineages such as the Mimiviridae, the Marseilleviridae, the Pitho- and Cedratviruses, the Faustoviruses and the Molli- and Pandoraviruses remained well connected (Figure 4.3). Among the novel viruses with the lowest percentage of genes shared with other NCLDV were Ca. Solumvirus and Ca. Solivirus, with Ca. Solivirus being only connected to Orpheovirus and Marseilleviridae and Solumvirus to the

Cedratviruses. In contrast to the phylogenetic tree in which Soli- and Solumvirus are affiliated to each other, there was no particular linkage between them in the network. This suggests limited

93

taxon sampling and we would expect that with discovery of additional giant virus genomes, the phylogenetic position of these viruses will be better resolved.

Another of the soil giant viruses denoted as Ca. Sylvanvirus featured a genome completely disconnected from all other NCLDV (Figure 4.3). With a size of almost 1 Mb it represents one of the largest viral genomes outside Pandoraviruses and the Mimiviridae (Figure 4.3) (Abergel et al.,

2015).⁠ With the presence of 11 ancestral NCLDV genes, a number similar to several other NCLDV, the Ca. Sylvanvirus genome can be considered near complete. Intriguingly, the vast majority

(~80%) of its proteins had neither matches in the NCBI non-redundant (nr) database (Figure 4.3).

From the proteins with database hits, 57% had matches to eukaryotes and 27% to bacteria but only 13% to other viruses (Figure 4.3). Importantly, there was no trend in taxonomic affiliation of the hits (Figure 4.3), again emphasizing the lack of any affiliation to known viruses and organisms.

Among the identifiable genes were 18 potential kinases, 5 ubiquitin ligases and a histone, all potentially playing important roles in interaction with a currently unknown host.

4.3.5 True diversity of giant viruses in forest soil

The MCPs in the soil bulk metagenomes revealed that the 16 novel viruses represent just the tip of the iceberg in terms of soil giant virus diversity (Figure 4.3). In total, 245 different MCP genes were detected, of which 99% were part of the unbinned metagenome fraction. Most of these MCPs were located on short contigs with a read coverage of below 2, indicating an extremely low abundance of corresponding NCLDV in the respective samples (Figure 4.3). Importantly, none of the bulk-metagenome MCPs matched MCPs from the mini-metagenome-derived MAGs, further underlining the much greater diversity of giant viruses in these samples. MCPs can be heavily duplicated but usually branch together in lineage specific clades enabling taxonomic classification based on their nearest neighbors in the tree (Abergel et al., 2015).⁠ Based on identified

94

phylogenetic relationships it was possible to assign taxonomy to several of the bulk metagenome

MCPs, of which most could be attributed to the Klosneuviruses (Figure 4.3). A hint of the true dimension of the NCLDV diversity is revealed when considering that the total number of nearly

300 MCPs discovered in this study, which includes MCPs from all the MAGs, is far greater than the

226 MCPs identified in previously published NCLDV genomes.

4.4 Discussion

Our results illustrate that employing cultivation independent methods on a minute sample from forest soil, a habitat in which giant viruses have rarely been found previously (Abergel et al.,

2015),⁠ can lead to major discoveries. Recovery of Ca. Solum-, Soli- and Sylvanvirus, three potentially genus, subfamily or even family level NCLDV lineages together with 13 other novel giant virus genomes vastly expands the phylogenetic diversity of the NCLDV and provides novel insights into their genetic makeup. The fact that only a single giant virus MAG was recovered in the bulk metagenomes suggests extremely low abundance of these viruses compared to microbial community members in forest soil. However, mini-metagenomics has proven most effective in recovering these viruses, yet without any detectable traces of host sequences. It is noteworthy that oftentimes the average read coverage of the giant virus MAGs was the highest or among the highest compared to other microbial MAGs derived from the same mini-metagenomes pool of

100 DNA-stained particles. The high coverage and completeness of giant virus genomes is consistent with having several copies of the same viral genome in the same mini-metagenome pool, but the overall low abundance of giant viruses in the system makes it unlikely that several identical viral particles were sorted by chance. A plausible scenario could be that host vacuoles already filled with giant viruses may have been recovered during sorting, thereby delivering several clonal copies of a giant virus genome into a single mini-metagenome pool. This would

95

enable genome assembly of higher quality and completeness, as previously shown for polyploid bacterial symbionts (Legendre et al., 2014; Wu et al., 2009).⁠

Of the few available studies that have used this mini-metagenomes method, one describes the discovery of a novel intracellular bacterium 30 and another a new group of giant viruses (Legendre et al., 2018),⁠ suggesting mini-metagenomics is a promising method for elucidating the hidden diversity of intracellular entities such as giant viruses. As shown by the MCP diversity in the unbinned metagenome fraction many novel giant viruses are readily awaiting discovery. Importantly, the mini-metagenomics approach has not been exhaustively performed in soil or any other ecosystem and thus represents a promising toolkit for exploring the untapped diversity in the giant virus universe.

4.5 Methods

4.5.1 Sampling and sample preparation

Fourteen forest soil cores from the Barre Woods warming experiment located at the

Harvard Forest Long-Term Ecological Research site (Petersham, MA) were collected and sub- sampled into organic horizon and mineral zone, resulting in twenty-eight total samples. Mineral zone samples were flash-frozen while organic horizons were incubated with deuterium oxide for

2 weeks prior to freezing to label the active microbial communities. This incubation was carried out as part of a different experiment that will be addressed in a later manuscript. Total DNA and

RNA were extracted from twenty-eight soil samples for bulk metagenomics and metatranscriptomics using the MoBio PowerSoil DNA and RNA kits, respectively. Bacterial and

Plant rRNA depletion was performed on the RNA samples prior to sequencing. Of these 28 soil samples, a subset of four encompassing two organic and two mineral layers were selected for

96

mini-metagenomics. Cells, and presumably viral particles and/or eukaryote vacuoles containing them, were separated from soil particles using a mild detergent, followed by vortexing, centrifugation, and filtration through a 5 μm syringe filter. The filtrates were stained with SYBR

Green nucleic acid stain. For each of the four samples, ninety pools containing 100 SYBR+ particles were sorted into microwell plates using Fluorescence Activated Cell Sorting (FACS). Sorted pools underwent lysis and whole genome amplification through Multiple Displacement Amplification

(MDA) following methods outlined previously 46 . A total of 360 sequencing libraries were generated with the Nextra XT v2 kit (Illumina) with 9 rounds of PCR amplification.

4.5.2 Mini-Metagenomes

The 360 libraries derived from sorted particles were sequenced at the DOE Joint Genome

Institute (JGI, Walnut Creek, CA) using the Illumina NextSeq platform. Pools of 90 libraries were processed in four sequencing runs that generated 2X150bp read lengths. Raw Illumina reads were quality filtered to remove contamination and low quality reads using BBTools

(http://bbtools.jgi.doe.gov, version 37.38). Read normalization was performed using BBNorm

(http://bbtools.jgi.doe.gov) and error correction with Tadpole (http://bbtools.jgi.doe.gov).

Assembly of filtered, normalized Illumina reads was performed using SPAdes (v3.10.1) (Legendre et al., 2018; Pagnier et al., 2013) ⁠ with the following options: --phred-offset 33 -t 16 -m 115 --sc -k

25,55,95. All contig ends were then trimmed of 200bp and contigs were discarded if the length was <2kb or read coverage less than 2 using BBMap (http://bbtools.jgi.doe.gov) with the following options: nodisk ambig, filterbycoverage.sh: mincov.

97

4.5.3 Bulk Metagenomes

Unamplified TruSeq libraries were prepared for the 28 DNA samples for metagenomic sequencing on the Illumina HiSeq-2000 platform at the DOE JGI. Raw Illumina reads were trimmed, quality filtered, and corrected using bfc (version r181) (S. W. Wilhelm, Coy, Gann,

Moniruzzaman, & Stough, 2016) ⁠ with the following options: -1 -s 10g -k 21 -t 10. Following quality filtering, reads were assembled using SPAdes (v3.11.1) (Schulz et al., 2017) ⁠ with the following options:-m 2000 --only-assembler -k 33,55,77,99,127 --meta -t 32. The entire filtered read set was mapped to the final assembly and coverage information generated using bbmap

(http://bbtools.jgi.doe.gov, version 37.62) with default parameters except ambiguous=random.

The version of the processing pipeline was jgi_mga_meta_rqc.py, 2.1.0.

4.5.4 Metatranscriptomes

Libraries were prepared and sequenced on the Illumina NextSeq platform at the DOE JGI.

Following sequencing, metatranscriptome reads were quality cleaned and a combined assembly was generated using the MEGAHIT assembler (v1.1.2) (Bankevich et al., 2012) ⁠ using the following options: -m 0.2 --k-list 23,43,63,83,103,123 --continue -o out.megahit --12. These cleaned reads were aligned to metagenome reference sequences using BBMap (http://bbtools.jgi.doe.gov, version 37.38) with the following options: nodisk=true interleaved=true ambiguous=random.

4.5.5 Metagenome Binning

Contigs were organized into genome bins based on tetranucleotide sequence composition with MetaBat2 (Heng Li, 2015).⁠ Genome bins were generated for mini-metagenomes without contig coverage patterns due to MDA bias (Bankevich et al., 2012; Nurk, Meleshko, Korobeynikov,

98

& Pevzner, n.d.).⁠ Coverage was determined for the bulk metagenomes by mapping reads to the completed assemblies using the Burrows-Wheeler aligner (D. Li et al., 2015).⁠

4.5.6 Screening for giant viruses

Metagenomic bins were screened for presence of the 20 ancestral NCVOGs (Kang et al.,

2015) ⁠ with hmmsearch (version 3.1b2 , hmmer.org). Bins with more than 5 different hits and/or that contained the NCLDV MCP gene (NCVOG0022) were selected and further evaluated (see below).

4.5.7 Annotation and quality control of viral genome bins

Gene calling was performed with GeneMarkS using the virus model (Woyke et al., 2009).⁠

For functional annotation proteins were blasted against previously established NCVOGs (H. Li &

Durbin, 2009) ⁠ and the NCBI non redundant database (nr) using Diamond blastp (Yutin et al., 2009) ⁠ with an e-value cutoff of 1.0e-5. In addition, protein domains were identified by hmmsearch

(version 3.1b2 , hmmer.org) against Pfam-A (version 29.0) (Borodovsky & Lomsadze, 2011),⁠ and tRNAs and introns were identified using tRNAscan-SE (Yutin et al., 2009) ⁠ and cmsearch from the

Infernal package (Buchfink et al., 2014) ⁠ against the Rfam database (version 13.0) (Finn et al.,

2016).⁠ Nearly identical sequences within genome bins (>100 bp, identity > 94%) were detected using the MUMmer repeat-match algorithm (Lowe & Chan, 2016) ⁠ and visualized with Circos

(Nawrocki & Eddy, 2013) ⁠ together with the respective genome bins. For all MAGs, paralogs and best diamond blastp vs NCBI nr hits were visualized with Circos (Kalvari et al., 2018).⁠ Furthermore, distribution of read depth across contigs was evaluated and regions with low average coverage were identified.

99

4.5.8 Experimental benchmarking of the mini-metagenomics approach

Benchmarking of the mini-metagenomics approach to assess potential chimera formation during MDA was performed by randomly sorting 10 cells from a bacterial mock community consisting of five different bacterial isolates; Escherichia coli K12, vietnamensis DSM

17526, Shewanella oneidensis MR-1, Pseudomonas putida F1 and Meiothermus ruber. In total 59 of these 10-cell sorts were subject to MDA and sequencing. Resulting reads were filtered, assembled and analyzed with the same bioinformatics pipeline used for the mini-metagenomes generated in this study. Assembly statistics of recovered MAGs were generated with MetaQUAST

(Delcher, Salzberg, & Phillippy, 2003).⁠

4.5.9 Computational benchmarking of giant virus metagenomic binning

In addition, benchmarking of the binning workflow was performed to assess its applicability to giant virus data. First, binning of a simulated mock community consisting of 12 giant viruses was tested, each a representative of a subfamily or family in the NCLDV. In addition, the herein newly discovered giant viruses were used as template for a second simulated mock community. In brief, MDA was simulated on the genomes of the mock communities with MDAsim

(Krzywinski et al., 2009) ⁠ (https://github.com/hzi-bifo/mdasim/releases/v2.1.1). In the following,

Illumina reads were generated with ART (Krzywinski et al., 2009) ⁠ and the same bioinformatics pipeline used for the mini-metagenomes in this study employed for read error-correction, normalization, assembly and binning.

4.5.10 Phylogenomics

To remove redundancy , the set of 186 published NCLDV genomes and 16 novel soil giant viruses were clustered at an average nucleotide identity (ANI) of 95% with at least 100 kb aligned

100

fraction using fastANI (Mikheenko, Saveliev, & Gurevich, 2016) resulting⁠ in 132 clusters and singletons. None of the newly discovered viruses clustered with any other virus. The three most incomplete novel giant virus genomes were removed from the data set. To infer the positions of novel soil giant viruses in the NCLDV, five core NCLDV proteins (Tagliavi & Draghici, 2012) ⁠ were selected: DNA polymerase elongation subunit family B (NCVOG0038), D5-like helicase-primase

(NCVOG0023), packaging ATPase (NCVOG0249) and DNA or RNA helicases of superfamily II

(NCVOG0076) and Poxvirus Late Transcription Factor VLTF3-like (NCVOG0262), and identified with hmmsearch (version 3.1b2 , hmmer.org). Three of the MAGs derived from mini-metagenomes were excluded from the analysis as they had less than three conserved NCLDV proteins. Protein sequences were aligned using mafft (W. Huang, Li, Myers, & Marth, 2012) ⁠ . Gapped columns in alignments (less than 10% sequence information) and columns with low information content were removed from the alignment with trimal (Jain et al., 2018).⁠ Phylogenetic trees for each protein and for a concatenated alignment of all five proteins were constructed using IQ-tree with LG+F+R6 as suggested by model test as best-fit substitution model (Yutin et al., 2009).⁠ PD was calculated as described previously (Katoh et al., 2002);⁠ briefly, the total branch lengths of phylogenetic trees with and without the novel soil NCLDV genomes were calculated and compared to determine the gain in PD.

4.5.11 Major capsid protein analysis

Bulk metagenome assemblies and 186 published NCLDV genomes and 16 soil MAGs were screened for presence of the NCLDV MCP gene (NCVOG0022) (Capella-Gutierrez, Silla-Martinez,

& Gabaldon, 2009) ⁠ with hmmsearch (version 3.1b2 , hmmer.org) and applying a cutoff of 1e-6.

This cutoff has been evaluated against ~60,000 available microbial, eukaryotic and non-NCLDV genomes in the Integrated Microbial Genomes database (Nguyen et al., 2015) ⁠ yielding in only few 101

false positives. Resulting protein hits were extracted from the metagenome and to reduce redundancy clustered with cd-hit at a sequence similarity of 95% (Wu et al., 2009).⁠ Cluster representatives were then subject to diamond blastp (Schulz et al., 2017; Yutin et al., 2009) ⁠ against nr database (June 2018) and proteins which had hits but no NCLDV MCP in the top 10 were excluded from further analysis as potentially false positives. For tree construction, MCPs were extracted and aligned with mafft-ginsi (--unalignlevel 0.8, --allowshift) (Chen et al., 2016).⁠ Gapped columns in the alignment (less than 10% sequence information) were removed with trimal (W. Li

& Godzik, 2006) ⁠ and proteins with less than 50 aligned amino acids were removed. A phylogenetic tree was constructed with IQ-tree and the LG+F+R8 as suggested by model test as the best-fit substitution model (Buchfink et al., 2014).⁠

4.5.12 Gene sharing network

Protein families were inferred with OrthoFinder 1.03 71 on a representative dataset of

93 NCLDV genomes for comparative analysis (after de-replication using 95% ANI clustering

(Katoh et al., 2002),⁠ details described above, and removal of 36 Poxviruses). For each pair of

NCLDV genomes (ANI 95% cluster representatives) the average percentage of proteins in shared orthogroups in relation to the total number of proteins in the respective genome was calculated and used as edge weight in the network.The network was created in Gephi (Capella-Gutierrez et al., 2009) ⁠ using a force layout and filtered at an edge weight of 18%.

102

103

Figure 4.1. Discovery pipeline for soil giant viruses. a Overall workflow. Fourteen forest soil cores from Barre Woods long-term experimental warming site were sub-sampled into organic horizon and mineral zone resulting in twenty-eight total samples. Total DNA and RNA were extracted from twenty-eight soil samples for bulk metagenomics and metatranscriptomics. Of these samples, a subset of four encompassing two organic and two mineral layers were selected for flow-sorted mini-metagenomics. Cells, and presumably viral particles, were separated from soil, stained with SYBR Green nucleic acid stain and sorted using Fluorescence Activated Cell Sorting (FACS). Ninety sorted pools of 100 SYBR+ particles underwent lysis, whole genome amplification, library preparation and sequencing on the Illumina NextSeq platform. Phylogenomic analysis of metagenome assembled genomes (MAGs) facilitated the identification of novel giant viruses. There was no correlation of presence or absence of giant viruses and sample treatment. b Data analysis summary. Fifteen giant virus MAGs (orange circles) were recovered from sorted samples, while only one giant virus MAG (turquoise circle) was recovered from the bulk metagenomes. The other 1778 MAGs from the mini-metagenomes (gray circles) and 1772 MAGs from the bulk metagenomes (gray circles) were of microbial origin and not analyzed further in this study. Mapping of bulk metagenome reads to MAGs revealed ~9X coverage of the bulk-metagenome derived MAG and <1X coverage of MAGs derived from mini-metagenomes, confirming the inability to recover these novel giant virus genomes using bulk metagenomics despite deep sequencing efforts. Assembly and mapping of metatranscriptome data indicated expression of only few of the novel giant virus genes of MAGs derived from mini-metagenomes.

104

Figure 4.2. Expansion of NCLDV diversity by soil giant viruses. a Phylogenetic tree (IQ-tree LG+F+R6) of NCLDV inferred from a concatenated protein alignment of five core nucleocytoplasmic virus orthologous genes (NCVOGs). The tree was built from a representative set of NCDLV genomes after de-replication by ANI clustering (95% id). Novel soil NCLDV lineages and existing major NCLDV lineages grouping together with soil NCLDV are highlighted in black. The scale bar represents substitutions per site. Branches are collapsed if support was low (<50), filled circles indicate moderate support (50-80, white) or high support (80-97, black), branches without circles are fully supported (>97). b Detailed phylogenetic tree of the Mimiviridae. Diameter of filled circles correlates with assembly size and shades of gray with GC% ranging from 20% (light gray) to 60% (dark gray). Bar plots summarize total number of encoded aminoacyl-tRNA synthetases (aaRS) and tRNAs. In addition, completeness was estimated based on number of identified marker genes out of 20 ancestral NCVOGs. c Increase of phylogenetic diversity (PD) after adding the soil NCLDV MAGs (black) to representative sets of NCLDV reference genomes (gray).

105

Figure 4.3. Genome novelty of soil giant viruses. A Nucleocytoplasmic large DNA virus (NCLDV) gene sharing network, with nodes representing genomes, node diameter correlating with genome size, edge diameter and color intensity with normalized percentage of genes in shared gene families between node pairs above a threshold of 18%. B Circular representation of the Ca. Sylvanvirus genome. From outside to inside: Blue filled circles depict location of encoded tRNAs. The second ring displays positions of genes (gray) either on the minus or the plus strand. The next track illustrates GC content in gray ranging from 20% (white) to 60% (dark gray). The fourth track shows color-coded origin of proteins with best blastp hits to cellular homologs. Best hits against viral proteins are indicated in white and further broken down based on their taxonomic origin color-coded on the most inner track. Lines in the middle of the plot connect paralogs (gray) and nearly identical repeats (orange). The pie chart in the center of the plot summaries the percentage of genes with and without cellular homologs, which are further broken down based on best blastp hits in the adjacent bar plot. C Percentage of genes in NCLDV genes with bacterial or eukaryotic homologs with no blastp hits (in the NCBI NR database, highlighting the unique position of Ca. Sylvanvirus).

106

Figure 4.4. Hidden diversity of giant viruses in bulk metagenomes. a Total number of major capsid proteins (MCPs) found in reference NCLDV genomes, MAGs or recovered from bulk metagenomes on contigs > 1 kb and contigs < 1 kb (dark gray), colored by taxonomy. b Size and cover of bulk metagenome contigs containing MCP genes, either from the unbinned fraction (filled blue circles) or the MAGs (filled pink circles) c Phylogenetic tree of the MCPs of nucleocytoplasmic large DNA virus (NCLDV). Branches are color-coded based on taxonomic origin of MCPs inferred by relationship in the tree to MCPs of known reference NCLDV. MCPs of novel giant viruses from this study which are not members of the Mimiviridae are indicated in red. Branches labelled with a circle represent novel MCP from MAGs generated in this study while stars indicate MCPs recovered from the unassembled fraction (contigs > 1 kb) of bulk-metagenomes. Circles and stars are filled in color if taxonomy could be assigned based on the tree and in black if it was not possible to assign taxonomy.

107

APPENDIX A

SUPPLEMENTARY TABLES

108

Table S1. RDP Classifier results for full-length 16S sequences extracted from acidobacteria MAGs.

109

APPENDIX B

SUPPLEMENTARY FIGURES

110

Figure S1. Full concatenated marker gene tree of acidobacterial MAGs and references. Genome sequences and MAGs were assessed for the presence of 120 marker genes in the Genome Taxonomy Database (GTDB;Parks et al., 2018).⁠ Acidobacterial subdivisions are displayed next to each clade, with subdivision 2 highlighted in green. The MAGs from which a full-length 16S rRNA gene could be extracted are marked with an asterisk (*). Genomes from cultured acidobacteria are in bold. Bootstrap support values ≥95% (●) and ≥70% (●).

111

Figure S2. Genomic similarity across the 5 subdivision 2 acidobacteria characterized in this study. Average nucleotide identity (ANI, in blue) and average amino acid identity (AAI, in green) of subdivision 2 MAGs from this study. ANI and AAI were calculated using the ANI/AAI matrix calculator tool by. (L. Rodriguez-R & Konstantinidis, 2016) ⁠ and visualized using the ggplot2 package in R.

112

Figure S3. Genomic similarity across subdivision 1 acidobacteria. Average nucleotide identity (ANI, in blue) and average amino acid identity (AAI, in green) across subdivision 1 acidobacteria MAGs from this study, and reference genomes from isolates. ANI and AAI were calculated using the ANI/AAI matrix calculator tool by Kostantinidis et al. and visualized using the ggplot2 package in R.

113

Figure S4. PcoA clustering COG/NOG categories across acidobacteria MAGs and reference genomes. All points are labeled with the genome name. Acidobacterial subdivisions (Sds) are represented by color, with subdivsion 1 genomes and MAGs in yellow, subdivision 2 MAGs in green, subdivision 2 genomes in purple and all other subdivisions in blue.

114

Figure S5. Heatmap of EggNOG genes across Acidobacteria subdivisions. Acidobacterial subdivisions (SDs) are labeled on the x-axis of the heatmap. Black represents the presence of the gene in the genome, and white represent the absence of the gene.

115

Figure S6. PcoA clustering CAZy categories across acidobacteria MAGs and reference genomes. All points are labeled with the genome name. Acidobacterial subdivsions (Sds) are represented by colors with subdivision 1 MAGs and reference genomes in yellow, subdivision 2 MAGs in green, subdivison 3 genomes in purple, and all other subdivisions in blue.

116

Figure S7. Heatmap of CAZy gene families across acidobacterial subdivisions. Acidobacterial subdivisions (SDs) are labeled on the x-axis. Names on the y-axis disply the annotated gene families from the CAZy database. The clustering in the second y-axis is based on hierarchical clustering of the genomes. Black color in the heatmap represents the presence of the gene, and white represents the absence of the gene.

117

Figure S8. Relative abundance of CAZy gene families across acidobacteria MAGs and reference genome sequences. Reference genomes and MAGs are displayed on the x-axis and the relative abundance of each gene family in each genome. The legend displays each of the CAZy gene families, including Auxiliary Activities (AA in dark green), Carbohydrate Binding Modules (CBM in light green), Carbohydrate Esterases (CE in orange), Glycoside Hydrolases (GH in light blue), Glycosyl Transferases (GT in purple) and Polysaccharide Lyases (PL in yellow).

118

Figure S9. Phylogenetic tree of group 1h/5 [NiFe] hydrogenases. Maximum likelihood tree based on the deduced amino acid sequence (~600 amino acid positions) of the group 1h/5 [NiFe]- hydrogenase large subunit. The tree was bootstrapped 1,000 times and the consensus support is displayed ≥99% (●) and ≥90% (●). The outgroup (not shown) was the group 1 [NiFe]-hydrogenase from Desulfovibrio vulgaris (WP_010939204.1). The scale bar depicts 0.01 changes per amino acid. The yellow box indicates the clade of Acidobacteria. Sequences in red represent those genes from acidobacterial MAGs generated in this study.

119

Figure S10. Glycoside hydrolases expressed across subdivision 2 acidobacteria MAGs. The x-axis includes the five, high-quality acidobacterial MAGs classified as subdivision 2. The y-axis contains each of the CAZy gene families expressed in the genomes of these subdivision 2 acidobacteria, and the second y-axis shows the pattern of hierarchical clustering based on the genes within the genomes. Purple color represents that the gene family is expressed in the subdivision 2 acidobacterial genome, and white represents the lack of expression of that gene family.

120

Figure S11. Relative abundance of CAZy gene families expressed across subdivision 2 MAGs. The x-axis displays the five high-quality subdivision 2 acidobacterial MAGs generated in this study. The y-axis displays the relative abundance of each CAZy gene family in each genome. CAZy gene families are represented in the legend as Auxiliary Activities (AA in dark green), Carbohydrate Binding Modules (CBM in light green), Carbohydrate Esterases (CE in orange), Glycoside Hydrolases (GH in light blue), Glycosyl Transferases (GT in purple) and Polysaccharide Lyases (PL in yellow).

121

Figure S12. Distribution of acidobacterial classes in the mineral and organic soil horizons. Taxonomic classification was completed using the RDP Classifier on 16S rRNA genes extracted from metagenome and mini-metagenome reads using PhyloFlash. The x-axis describes the soil horizon (Mineral or Organic) from which the soil samples were taken. The y-axis displays the relative abundance of each acidobacterial class.

122

BIBLIOGRAPHY

Abergel, C., Legendre, M., & Claverie, J.-M. (2015). The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiology Reviews, 39(6), 779–796. https://doi.org/10.1093/femsre/fuv037

Abrahão, J., Silva, L., Silva, L. S., Khalil, J. Y. B., Rodrigues, R., Arantes, T., … La Scola, B. (2018). Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nature Communications, 9(1), 749. https://doi.org/10.1038/s41467-018- 03168-1

Aherfi, S., Colson, P., La Scola, B., & Raoult, D. (2016). Giant Viruses of Amoebas: An Update. Frontiers in Microbiology, 7, 349. https://doi.org/10.3389/fmicb.2016.00349

Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K. L., Tyson, G. W., & Nielsen, P. H. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature Biotechnology, 31(6). https://doi.org/10.1038/nbt.2579

Aleklett, K., Kiers, E. T., Ohlsson, P., Shimizu, T. S., Caldas, V. E., & Hammer, E. C. (2018). Build your own soil: Exploring microfluidics to create microbial habitat structures. ISME Journal. https://doi.org/10.1038/ismej.2017.184

Allison, S. D. (2005). Cheaters, diffusion and nutrients constrain decomposition by microbial enzymes in spatially structured environments. Ecology Letters, 8(6), 626–635. https://doi.org/10.1111/j.1461-0248.2005.00756.x

Allison, S. D., & Martiny, J. B. (2008). Colloquium paper: resistance, resilience, and redundancy in microbial communities. Proc Natl Acad Sci U S A, 105 Suppl, 11512–11519. https://doi.org/10.1073/pnas.0801925105

Alteio, L. V., Schulz, F., Seshadri, R., Varghese, N. J., Rodriguez, W., Goudeau, D., … Woyke, T. (n.d.). Complementary metagenomics approaches reveal diversity of bacteria and archaea in forest soil. Microbiome.

Amann, R., & Rosselló-Móra, R. (2016). After All, Only Millions? MBio, 7(3), 999–1015. https://doi.org/10.1128/mBio.00201-16

Amann, Rudolf, & Fuchs, B. M. (2008). Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nature Reviews Microbiology, 6(5), 339–348. https://doi.org/10.1038/nrmicro1888

Andrade, A. C. dos S. P., Arantes, T. S., Rodrigues, R. A. L., Machado, T. B., Dornas, F. P., Landell, M. F., … Abrahão, J. S. (2018). Ubiquitous giants: a plethora of giant viruses found in Brazil and Antarctica. Virology Journal, 15(1), 22. https://doi.org/10.1186/s12985-018-0930-x

123

Andreani, J., Khalil, J. Y. B., Baptiste, E., Hasni, I., Michelle, C., Raoult, D., … La Scola, B. (2018). Orpheovirus IHUMI-LCC2: A New Virus among the Giant Viruses. Frontiers in Microbiology, 8, 2643. https://doi.org/10.3389/fmicb.2017.02643

Angel, R., Panhölzl, C., Gabriel, R., Herbold, C., Wanek, W., Richter, A., … Woebken, D. (2018). Application of stable-isotope labelling techniques for the detection of active diazotrophs. Environmental Microbiology, 20(1), 44–61. https://doi.org/10.1111/1462-2920.13954

Angers, D. A., & Caron, J. (1998). Plant-induced Changes in Soil Structure: Processes and Feedbacks. Biogeochemistry, 42(1/2), 55–72. https://doi.org/10.1023/A:1005944025343

Bach, E. M., & Hofmockel, K. S. (2016). A time for every season: soil aggregate turnover stimulates decomposition and reduces carbon loss in grasslands managed for bioenergy. GCB Bioenergy, 8(3), 588–599. https://doi.org/10.1111/gcbb.12267

Bach, E. M., Williams, R. J., Hargreaves, S. K., Yang, F., & Hofmockel, K. S. (2018). Greatest soil microbial diversity found in micro-habitats. Soil Biology and Biochemistry, 118, 217–226. https://doi.org/10.1016/j.soilbio.2017.12.018

Bailey, V. L., Fansler, S. J., Stegen, J. C., & McCue, L. A. (2013). Linking microbial community structure to β-glucosidic function in soil aggregates. The ISME Journal, 7(10), 2044–2053. https://doi.org/10.1038/ismej.2013.87

Bajrai, L., Benamar, S., Azhar, E., Robert, C., Levasseur, A., Raoult, D., … La Scola, B. (2016). Kaumoebavirus, a New Virus That Clusters with Faustoviruses and Asfarviridae. Viruses, 8(11), 278. https://doi.org/10.3390/v8110278

Bakken, L. R., Bergaust, L., Liu, B., & Frostegård, A. (2012). Regulation of denitrification at the cellular level: a clue to the understanding of N2O emissions from soils. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 367(1593), 1226– 1234. https://doi.org/10.1098/rstb.2011.0321

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., … Pevzner, P. A. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology : A Journal of Computational Molecular Cell Biology, 19(5), 455–477. https://doi.org/10.1089/cmb.2012.0021

Barberán, A., Bates, S. T., Casamayor, E. O., & Fierer, N. (2012). Using network analysis to explore co-occurrence patterns in soil microbial communities. The ISME Journal, 6(2), 343–351. https://doi.org/10.1038/ismej.2011.119

Bardgett, R. D., & Van Der Putten, W. H. (2014). Belowground biodiversity and ecosystem functioning. https://doi.org/10.1038/nature13855

Barns, S. M., Cain, E. C., Sommerville, L., & Kuske, C. R. (2007). Acidobacteria phylum sequences in uranium-contaminated subsurface sediments greatly expand the known diversity within

124

the phylum. Applied and Environmental Microbiology, 73(9), 3113–3116. https://doi.org/10.1128/AEM.02012-06

Barter, R., & Yu, B. (2017). Superheat: A graphical took for exploring complex datasets using heatmaps.

Baveye, P. C., Otten, W., Kravchenko, A., Balseiro-Romero, M., Beckers, É., Chalhoub, M., … Vogel, H.-J. (2018). Emergent Properties of Microbial Activity in Heterogeneous Soil Microenvironments: Different Research Approaches Are Slowly Converging, Yet Major Challenges Remain. Frontiers in Microbiology, 9, 1929. https://doi.org/10.3389/fmicb.2018.01929

Belova, S. E., Ravin, N. V., Pankratov, T. A., Rakitin, A. L., Ivanova, A. A., Beletsky, A. V., … Dedysh, S. N. (2018). Hydrolytic Capabilities as a Key to Environmental Success: Chitinolytic and Cellulolytic Acidobacteria From Acidic Sub-arctic Soils and Boreal Peatlands. Frontiers in Microbiology, 9, 2775. https://doi.org/10.3389/fmicb.2018.02775

Bengtson, P., Barker, J., & Grayston, S. J. (2012). Evidence of a strong coupling between root exudation, C and N availability, and stimulated SOM decomposition caused by rhizosphere priming effects. Ecology and Evolution, 2(8), 1843–1852. https://doi.org/10.1002/ece3.311

Benninghoven, A., Rüdenauer, F. G., & Werner, H. W. (1987). Secondary ion mass spectrometry : basic concepts, instrumental aspects, applications, and trends. New York : J. Wiley. Retrieved from https://searchworks.stanford.edu/view/1288747

Berghuis, B. A., Yu, B., Schulz, F., Blainey, P. C., Woyke, T., & Quake, S. R. (2019). Hydrogenotrophic methanogenesis in archaeal phylum Verstraetearchaeota reveals the shared ancestry of all methanogens. PNAS, 116(11), 5037–5044. https://doi.org/10.1073/pnas.1815631116

Berlemont, C., & Martiny, R. C. (2015). Genomic potential for polysaccharide deconstruction in bacteria. Appl Environ Microbiol, 81, 1513–1519. https://doi.org/10.1128/AEM.03718-14

Berlemont, R., & Martiny, A. C. (2013). Phylogenetic distribution of potential cellulases in bacteria. Applied and Environmental Microbiology, 79(5), 1545–1554. https://doi.org/10.1128/AEM.03305-12

Berlemont, R., & Martiny, A. C. (2016). Glycoside Hydrolases across Environmental Microbial Communities. PLoS Computational Biology, 12(12), e1005300. https://doi.org/10.1371/journal.pcbi.1005300

Berry, D., Mader, E., Lee, T. K., Woebken, D., Wang, Y., Zhu, D., … Wagner, M. (2015). Tracking heavy water (D2O) incorporation for identifying and sorting active microbial cells. Proceedings of the National Academy of Sciences of the United States of America, 112(2), E194-203. https://doi.org/10.1073/pnas.1420406112

125

Berry, D., & Widder, S. (2014). Deciphering microbial interactions and detecting keystone species with co-occurrence networks. Frontiers in Microbiology, 5, 219. https://doi.org/10.3389/fmicb.2014.00219

Blainey, P. C. (2013a). The future is now: single-cell genomics of bacteria and archaea. FEMS Microbiology Reviews, 37(3), 407–427. https://doi.org/10.1111/1574-6976.12015

Blainey, P. C. (2013b). The future is now: single-cell genomics of bacteria and archaea. FEMS Microbiology Reviews, 37(3), 407–427. https://doi.org/10.1111/1574-6976.12015

Bond-Lamberty, B., Bolton, H., Fansler, S., Heredia-Langner, A., Liu, C., McCue, L. A., … Bailey, V. (2016). Soil respiration and bacterial structure and function after 17 years of a reciprocal soil transplant experiment. PLoS ONE, 11(3), 171736. https://doi.org/10.1371/journal.pone.0150599

Bonfante, P., & Anca, I.-A. (2009). Plants, Mycorrhizal Fungi, and Bacteria: A Network of Interactions. https://doi.org/10.1146/annurev.micro.091208.073504

Borer, B., Tecon, R., & Or, D. (2018). Spatial organization of bacterial populations in response to oxygen and carbon counter-gradients in pore networks. Nature Communications, 9(1), 769. https://doi.org/10.1038/s41467-018-03187-y

Borodovsky, M., & Lomsadze, A. (2011). Gene Identification in Prokaryotic Genomes, Phages, Metagenomes, and EST Sequences with GeneMarkS Suite. In Current Protocols in Bioinformatics (Vol. 35, pp. 4.5.1-4.5.17). Hoboken, NJ, USA: John Wiley & Sons, Inc. https://doi.org/10.1002/0471250953.bi0405s35

Boughalmi, M., Saadi, H., Pagnier, I., Colson, P., Fournous, G., Raoult, D., & La Scola, B. (2013). High-throughput isolation of giant viruses of the Mimiviridae and Marseilleviridae families in the Tunisian environment. Environmental Microbiology, 15(7), 2000–2007. https://doi.org/10.1111/1462-2920.12068

Bouwmeester, H. J., Roux, C., Antonio Lopez-Raez, J., & Card, G. B. (2007). Rhizosphere communication of plants, parasitic plants and AM fungi. Review TRENDS in Plant Science, 12(5). https://doi.org/10.1016/j.tplants.2007.03.009

Bowden, R. D., Davidson, E., Savage, K., Arabia, C., & Steudler, P. (2004). Chronic nitrogen additions reduce total soil respiration and microbial respiration in temperate forest soils at the Harvard Forest. Forest Ecology and Management, 196(1), 43–56. https://doi.org/10.1016/j.foreco.2004.03.011

Bowers, R. M., Kyrpides, N. C., Stepanauskas, R., Harmon-Smith, M., Doud, D., Reddy, T. B. K., … Woyke, T. (2017). Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology, 35(8), 725–731. https://doi.org/10.1038/nbt.3893

126

Bradford, M. A., Davies, C. A., Frey, S. D., Maddox, T. R., Melillo, J. M., Mohan, J. E., … Wallenstein, M. D. (2008). Thermal adaptation of soil microbial respiration to elevated temperature. Ecology Letters, 11(12), 1316–1327. https://doi.org/10.1111/j.1461- 0248.2008.01251.x

Brenner, K., You, L., & Arnold, F. H. (2008). Engineering microbial consortia: a new frontier in synthetic biology. Trends in Biotechnology, 26(9), 483–489. https://doi.org/10.1016/J.TIBTECH.2008.05.004

Buchfink, B., Xie, C., & Huson, D. H. (2014). Fast and sensitive protein alignment using DIAMOND. Nature Methods. https://doi.org/10.1038/nmeth.3176

Burmeister, A., Hilgers, F., Langner, A., Westerwalbesloh, C., Kerkhoff, Y., Tenhaef, N., … Grünberger, A. (2019). A microfluidic co-cultivation platform to investigate microbial interactions at defined microenvironments. Lab on a Chip, 19(1), 98–110. https://doi.org/10.1039/C8LC00977E

Bushnell, B. (n.d.). BBTools. Retrieved from http://bbtools.jgi.doe.gov

Butler, S. M., Melillo, J. M., Johnson, J. E., Mohan, J., Steudler, P. A., Lux, H., … Bowles, F. (2012). Soil warming alters nitrogen cycling in a New England forest: Implications for ecosystem function and structure. Oecologia, 168(3), 819–828. https://doi.org/10.1007/s00442-011- 2133-7

Campbell, B. J. (2014). The family . In The Prokaryotes: Other Major Lineages of Bacteria and The Archaea (Vol. 9783642389, pp. 405–415). https://doi.org/10.1007/978- 3-642-38954-2_160

Cantarel, B. L., Coutinho, P. M., Rancurel, C., Bernard, T., Lombard, V., & Henrissat, B. (2009). The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Research, 37(Database), D233–D238. https://doi.org/10.1093/nar/gkn663

Capella-Gutierrez, S., Silla-Martinez, J. M., & Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972– 1973. https://doi.org/10.1093/bioinformatics/btp348

Carey, J. C., Tang, J., Templer, P. H., Kroeger, K. D., Crowther, T. W., Burton, A. J., … Tietema, A. (2016). Temperature response of soil respiration largely unaltered with experimental warming. Proceedings of the National Academy of Sciences, 113(48), 13797–13802. https://doi.org/10.1073/pnas.1605365113

Carini, P., Marsden, P. J., Leff, J. W., Morgan, E. E., Strickland, M. S., Fierer, N., … Storey, J. D. (2016). Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nature Microbiology, 2, 16242. https://doi.org/10.1038/nmicrobiol.2016.242

127

Castelle, C. J., & Banfield, J. F. (2018). Leading Edge Perspective Major New Microbial Groups Expand Diversity and Alter our Understanding of the Tree of Life. https://doi.org/10.1016/j.cell.2018.02.016

Catão, E. C. P., Lopes, F. A. C., Araújo, J. F., De Castro, A. P., Barreto, C. C., Bustamante, M. M. C., … Krüger, R. H. (2014). Soil acidobacterial 16s rrna gene sequences reveal subgroup level differences between savanna-like cerrado and atlantic forest brazilian biomes. International Journal of Microbiology, 2014, 1–12. https://doi.org/10.1155/2014/156341

Chang, H.-H., Cho, S.-T., Canale, M. C., Mugford, S. T., Lopes, J. R. S., Hogenhout, S. A., & Kuo, C.- H. (2015). Complete Genome Sequence of "Candidatus Sulcia muelleri" ML, an Obligate Nutritional Symbiont of Maize Leafhopper (Dalbulus maidis). https://doi.org/10.1128/genomeA.01483-14

Chen, I.-M. A., Markowitz, V. M., Chu, K., Palaniappan, K., Szeto, E., Pillay, M., … Kyrpides, N. C. (2016). IMG/M: integrated genome and metagenome comparative data analysis system. Nucleic Acids Research, 45, 507–516. https://doi.org/10.1093/nar/gkw929

Cheng, W. (2009). Rhizosphere priming effect: Its functional relationships with microbial turnover, evapotranspiration, and C–N budgets. Soil Biology and Biochemistry, 41(9), 1795– 1801. https://doi.org/10.1016/J.SOILBIO.2008.04.018

Choi, J., Yang, F., Stepanauskas, R., Cardenas, E., Garoutte, A., Williams, R., … Howe, A. (2017). Strategies to improve reference databases for soil microbiomes. The ISME Journal, 11(4). https://doi.org/10.1038/ismej.2016.168

Claverie, J. M., & Abergel, C. (2016). Giant viruses: The difficult breaking of multiple epistemological barriers. Studies in History and Philosophy of Science Part C :Studies in History and Philosophy of Biological and Biomedical Sciences, 59, 89–99. https://doi.org/10.1016/j.shpsc.2016.02.015

Clingenpeel, S., Clum, A., Schwientek, P., Rinke, C., & Woyke, T. (2014). Reconstructing each cell’s genome within complex microbial communities - dream or reality? Frontiers in Microbiology, 5(DEC). https://doi.org/10.3389/fmicb.2014.00771

Clode, P. L., Kilburn, M. R., Jones, D. L., Stockdale, E. A., Cliff, J. B., Herrmann, A. M., & Murphy, D. V. (2009). In situ mapping of nutrient uptake in the rhizosphere using nanoscale secondary ion mass spectrometry. Plant Physiology, 151(4), 1751–1757. https://doi.org/10.1104/pp.109.141499

Colson, P., Aherfi, S., & La Scola, B. (2017, December 1). Evidence of giant viruses of amoebae in the human gut. Human Microbiome Journal. Elsevier. https://doi.org/10.1016/j.humic.2017.11.001

128

Colson, P., La Scola, B., Levasseur, A., Caetano-Anollés, G., & Raoult, D. (2017). Mimivirus: leading the way in the discovery of giant viruses of amoebae. Nature Reviews Microbiology, 15(4), 243–254. https://doi.org/10.1038/nrmicro.2016.197

Conrad, R. (1996). Soil Microorganisms as Controllers of Atmospheric Trace Gases (H 2 , CO, CH 4 , OCS, N 2 O, and NO). MICROBIOLOGICAL REVIEWS (Vol. 60). Retrieved from http://mmbr.asm.org/

Criscuolo, A., & Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with Entropy): A new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology, 10(1). https://doi.org/10.1186/1471-2148-10-210

Cui, X., Lu, Z., Wang, S., Jing-Yan Wang, J., & Gao, X. (2016). CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics, 32(12), i332–i340. https://doi.org/10.1093/bioinformatics/btw271

D’Souza, G., Shitut, S., Preussger, D., Yousif, G., Waschina, S., & Kost, C. (2018). Ecology and evolution of metabolic cross-feeding interactions in bacteria. Natural Product Reports. https://doi.org/10.1039/c8np00009c

DeAngelis, K. M., Pold, G., Topçuoğlu, B. D., van Diepen, L. T. A., Varney, R. M., Blanchard, J. L., … Frey, S. D. (2015). Long-term forest soil warming alters microbial communities in temperate forest soils. Frontiers in Microbiology, 6(FEB), 1–13. https://doi.org/10.3389/fmicb.2015.00104

Dedysh, S. N., & Yilmaz, P. (2019). Refining the taxonomic structure of the phylum Acidobacteria. https://doi.org/10.1099/ijsem.0.003062

Deeg, C. M., Chow, C.-E. T., & Suttle, C. A. (2018). The kinetoplastid-infecting Bodo saltans virus (BsV), a window into the most abundant giant viruses in the sea. ELife, 7. https://doi.org/10.7554/eLife.33014

Deeg, C. M., Zimmer, M. M., George, E., Husnik, F., Keeling, P. J., & Suttle, C. A. (2018). Chromulinavorax destructans, a pathogenic TM6 bacterium with an unusual replication strategy targeting protist mitochondrion. BioRxiv, 379388. https://doi.org/10.1101/379388

Delcher, A. L., Salzberg, S. L., & Phillippy, A. M. (2003). Using MUMmer to Identify Similar Regions in Large Sequence Sets. Current Protocols in Bioinformatics, 00(1), 10.3.1-10.3.18. https://doi.org/10.1002/0471250953.bi1003s00

Delgado-Baquerizo, M., Maestre, F. T., Reich, P. B., Jeffries, T. C., Gaitan, J. J., Encinar, D., … Singh, B. K. (2016). Microbial diversity drives multifunctionality in terrestrial ecosystems. Nature Communications, 7, 10541. https://doi.org/10.1038/ncomms10541

129

Delgado-Baquerizo, M., Oliverio, A. M., Brewer, T. E., Benavent-González, A., Eldridge, D. J., Bardgett, R. D., … Fierer, N. (2018). A global atlas of the dominant bacteria found in soil. Science, 359(6373), 320–325. https://doi.org/10.1126/science.aap9516

Delmont, T. O., Eren, A. M., Maccario, L., Prestat, E., Esen, Ö. C., Pelletier, E., … Vogel, T. M. (2015). Reconstructing rare soil microbial genomes using in situ enrichments and metagenomics. Frontiers in Microbiology, 6, 358. https://doi.org/10.3389/fmicb.2015.00358

Dumont, M. G., & Murrell, J. C. (2005). Stable isotope probing - Linking microbial identify to function. Nature Reviews Microbiology, 3(6), 499–504. https://doi.org/10.1038/nrmicro1162

Dunbar, J., Barns, S. M., Ticknor, L. O., & Kuske, C. R. (2002). Empirical and Theoretical Bacterial Diversity in Four Arizona Soils. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 68(6), 3035–3045. https://doi.org/10.1128/AEM.68.6.3035-3045.2002

Ebrahimi, A., & Or, D. (2016). Microbial community dynamics in soil aggregates shape biogeochemical gas fluxes from soil profiles - upscaling an aggregate biophysical model. Global Change Biology, 22(9), 3141–3156. https://doi.org/10.1111/gcb.13345

Eddy, S. R. (2011). Accelerated Profile HMM Searches. PLoS Computational Biology, 7(10), e1002195. https://doi.org/10.1371/journal.pcbi.1002195

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340

Eichorst, S. A., & Kuske, C. R. (2012). Identification of cellulose-responsive bacterial and fungal communities in geographically and edaphically different soils by using stable isotope probing. Applied and Environmental Microbiology, 78(7), 2316–2327. https://doi.org/10.1128/AEM.07313-11

Eichorst, S. A., Strasser, F., Woyke, T., Schintlmeister, A., Wagner, M., & Woebken, D. (2015). Advancements in the application of NanoSIMS and Raman microspectroscopy to investigate the activity of microbial cells in soils. FEMS Microbiology Ecology, 91(10), fiv106. https://doi.org/10.1093/femsec/fiv106

Eichorst, S. A., Trojan, D., Roux, S., Herbold, C., Rattei, T., & Woebken, D. (2018). Genomic insights into the Acidobacteria reveal strategies for their success in terrestrial environments. Environmental Microbiology, 20(3), 1041–1063. https://doi.org/10.1111/1462-2920.14043

Eichorst, S., & Woebken, D. (2014). Investigation of microorganisms at the single-cell level using Raman Microspectroscopy and Nanometer-scale Secondary Ion Mass Spectrometry (pp. 203–211). 130

Eloe-Fadrosh, E. A., Ivanova, N. N., Woyke, T., & Kyrpides, N. C. (2016). Metagenomics uncovers gaps in amplicon-based detection of microbial diversity. Nature Microbiology, 1(4), 15032. https://doi.org/10.1038/nmicrobiol.2015.32

Estrela, S., Trisos, C. H., & Brown, S. P. (2012). From metabolism to ecology: cross-feeding interactions shape the balance between polymicrobial conflict and mutualism. The American Naturalist, 180(5), 566–576. https://doi.org/10.1086/667887

Fernández-Gómez, B., Richter, M., Schüler, M., Pinhassi, J., Acinas, S. G., González, J. M., & Pedrós-Alió, C. (2013). Ecology of marine Bacteroidetes: a comparative genomics approach. The ISME Journal, 7(5), 1026–1037. https://doi.org/10.1038/ismej.2012.169

Fierer, N. (2017). Embracing the unknown: Disentangling the complexities of the soil microbiome. Nature Reviews Microbiology. https://doi.org/10.1038/nrmicro.2017.87

Fierer, N., Bradford, M. A., & Jackson, R. B. (2007). TOWARD AN ECOLOGICAL CLASSIFICATION OF SOIL BACTERIA. Ecology, 88(6), 1354–1364. https://doi.org/10.1890/05-1839

Filée, J. (2013, October 1). Route of NCLDV evolution: The genomic accordion. Current Opinion in Virology. Elsevier. https://doi.org/10.1016/j.coviro.2013.07.003

Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., … Bateman, A. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research, 44(D1), D279–D285. https://doi.org/10.1093/nar/gkv1344

Fischer, M. G. (2016, June 1). Giant viruses come of age. Current Opinion in Microbiology. Elsevier Current Trends. https://doi.org/10.1016/j.mib.2016.03.001

Fischer, M. G., Allen, M. J., Wilson, W. H., & Suttle, C. A. (2010). Giant virus with a remarkable complement of genes infects marine zooplankton. Proceedings of the National Academy of Sciences of the United States of America, 107(45), 19508–19513. https://doi.org/10.1073/pnas.1007615107

Frey, S. D., Drijber, R., Smith, H., & Melillo, J. (2008). Microbial biomass, functional capacity, and community structure after 12 years of soil warming. Soil Biology and Biochemistry, 40(11), 2904–2907. https://doi.org/10.1016/j.soilbio.2008.07.020

Frey, Serita D., Lee, J., Melillo, J. M., & Six, J. (2013). The temperature response of soil microbial efficiency and its feedback to climate. Nature Climate Change, 3(4), 395–398. https://doi.org/10.1038/nclimate1796

Frey, Serita D, Knorr, M., Parrent, J. L., & Simpson, R. T. (2004). Chronic nitrogen enrichment affects the structure and function of the soil microbial community in temperate hardwood and pine forests. Forest Ecology and Management, 196(1), 159–171. https://doi.org/10.1016/J.FORECO.2004.03.018

131

Gallot-Lavallée, L., Blanc, G., & Claverie, J.-M. (2017). Comparative Genomics of Chrysochromulina Ericina Virus and Other Microalga-Infecting Large DNA Viruses Highlights Their Intricate Evolutionary Relationship with the Established Mimiviridae Family. Journal of Virology, 91(14). https://doi.org/10.1128/JVI.00230-17

Gans, J., Wolinsky, M., & Dunbar, J. (2005). Microbiology: Computational improvements reveal great bacterial diversity and high toxicity in soil. Science, 309(5739), 1387–1390. https://doi.org/10.1126/science.1112665

Geisen, S., Mitchell, E. A. D., Adl, S., Bonkowski, M., Dunthorn, M., Ekelund, F., … Lara, E. (2018). Soil protists: a fertile frontier in soil biology research. https://doi.org/10.1093/femsre/fuy006/4855940

Glavina, T., Rio, D., Abt, B., Spring, S., Lapidus, A., Nolan, M., … Lucas, S. (2010). Complete genome sequence of Chitinophaga pinensis type strain (UQM 2034 T ). Standards in Genomic Sciences, 2, 87–95. https://doi.org/10.4056/sigs.661199

Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T., Vandamme, P., & Tiedje, J. M. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology, 57(1), 81– 91. https://doi.org/10.1099/ijs.0.64483-0

Gorka, S., Dietrich, M., Mayerhofer, W., Gabriel, R., Wiesenbauer, J., Martin, V., … Kaiser, C. (2019). Rapid Transfer of Plant Photosynthates to Soil Bacteria via Ectomycorrhizal Hyphae and Its Interaction With Nitrogen Availability. Frontiers in Microbiology, 10, 168. https://doi.org/10.3389/fmicb.2019.00168

Graham, E. B., Knelman, J. E., Schindlbacher, A., Siciliano, S., Breulmann, M., Yannarell, A., … Nemergut, D. R. (2016). Microbes as Engines of Ecosystem Function: When Does Community Structure Enhance Predictions of Ecosystem Processes? Frontiers in Microbiology, 7, 214. https://doi.org/10.3389/fmicb.2016.00214

Greening, C., Biswas, A., Carere, C. R., Jackson, C. J., Taylor, M. C., Stott, M. B., … Morales, S. E. (2016). Genomic and metagenomic surveys of hydrogenase distribution indicate H 2 is a widely utilised energy source for microbial growth and survival. ISME Journal, 10(3), 761– 777. https://doi.org/10.1038/ismej.2015.153

Greening, C., Constant, P., Hards, K., Morales, S. E., Oakeshott, J. G., Russell, R. J., … Cookb, G. M. (2015). Atmospheric hydrogen scavenging: From enzymes to ecosystems. Applied and Environmental Microbiology. https://doi.org/10.1128/AEM.03364-14

Gruber-Vodicka, H. R., Seah, B. K. B., & Pruesse, E. (2019). phyloFlash-Rapid SSU rRNA profiling and targeted assembly from metagenomes. https://doi.org/10.1101/521922

Gupta, R. S., Bhandari, V., & Naushad, H. S. (2012). Molecular signatures for the PVC clade (Planctomycetes, Verrucomicrobia, Chlamydiae, and Lentisphaerae) of bacteria provide 132

insights into their evolutionary relationships. Frontiers in Microbiology, 3, 327. https://doi.org/10.3389/fmicb.2012.00327

Halary, S., Temmam, S., Raoult, D., & Desnues, C. (2016, June 1). Viral metagenomics: Are we missing the giants? Current Opinion in Microbiology. Elsevier Current Trends. https://doi.org/10.1016/j.mib.2016.01.005

Hansel, C. M., Fendorf, S., Jardine, P. M., & Francis, C. A. (2008). Changes in bacterial and archaeal community structure and functional diversity along a geochemically variable soil profile. Applied and Environmental Microbiology, 74(5), 1620–1633. https://doi.org/10.1128/AEM.01787-07

Hatzenpichler, R., Connon, S. A., Goudeau, D., Malmstrom, R. R., Woyke, T., & Orphan, V. J. (2016). Visualizing in situ translational activity for identifying and sorting slow-growing archaeal−bacterial consortia. Proceedings of the National Academy of Sciences, 113(28), E4069–E4078. https://doi.org/10.1073/pnas.1603757113

Hausmann, B., Pelikan, C., Herbold, C. W., Köstlbacher, S., Albertsen, M., Eichorst, S. A., … Loy, A. (2018). Peatland Acidobacteria with a dissimilatory sulfur metabolism. The ISME Journal, 12, 1729–1742. https://doi.org/10.1038/s41396-018-0077-1

Helfrich, M., Ludwig, B., Thoms, C., Gleixner, G., & Flessa, H. (2015). The role of soil fungi and bacteria in plant litter decomposition and macroaggregate formation determined using phospholipid fatty acids. Applied Soil Ecology, 96, 261–264. https://doi.org/10.1016/J.APSOIL.2015.08.023

Hemkemeyer, M., Christensen, B. T., Martens, R., & Tebbe, C. C. (2015). Soil particle size fractions harbour distinct microbial communities and differ in potential for microbial mineralisation of organic pollutants. Soil Biology and Biochemistry, 90, 255–265. https://doi.org/10.1016/J.SOILBIO.2015.08.018

Hicks Pries, C. E., Castanha, C., Porras, R. C., & Torn, M. S. (2017). The whole-soil carbon flux in response to warming. Science (New York, N.Y.), 355(6332), 1420–1423. https://doi.org/10.1126/science.aal1319

Hingamp, P., Grimsley, N., Acinas, S. G., Clerissi, C., Subirana, L., Poulain, J., … Ogata, H. (2013). Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. The ISME Journal, 7(9), 1678–1695. https://doi.org/10.1038/ismej.2013.59

Hoell, I. A., Vaaje-Kolstad, G., & Eijsink, V. G. H. (2010). Structure and function of enzymes acting on chitin and chitosan. Biotechnology and Genetic Engineering Reviews, 27(1), 331–366. https://doi.org/10.1080/02648725.2010.10648156

Horn, M. (2008). Chlamydiae as Symbionts in Eukaryotes. Annual Review of Microbiology, 62(1), 113–131. https://doi.org/10.1146/annurev.micro.62.081307.162818

133

Huang, W. E., Stoecker, K., Griffiths, R., Newbold, L., Daims, H., Whiteley, A. S., & Wagner, M. (2007a). Raman-FISH: combining stable-isotope Raman spectroscopy and fluorescence in situ hybridization for the single cell analysis of identity and function. Environmental Microbiology, 9(8), 1878–1889. https://doi.org/10.1111/j.1462-2920.2007.01352.x

Huang, W. E., Stoecker, K., Griffiths, R., Newbold, L., Daims, H., Whiteley, A. S., & Wagner, M. (2007b). Raman-FISH: Combining stable-isotope Raman spectroscopy and fluorescence in situ hybridization for the single cell analysis of identity and function. Environmental Microbiology, 9(8), 1878–1889. https://doi.org/10.1111/j.1462-2920.2007.01352.x

Huang, W., Li, L., Myers, J. R., & Marth, G. T. (2012). ART: a next-generation sequencing read simulator. Bioinformatics, 28(4), 593–594. https://doi.org/10.1093/bioinformatics/btr708

Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., … Bork, P. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309–D314. https://doi.org/10.1093/nar/gky1085

Hug, L. A., Baker, B. J., Anantharaman, K., Brown, C. T., Probst, A. J., Castelle, C. J., … Banfield, J. F. (2016). A new view of the tree of life. Nature Microbiology, 1(5). https://doi.org/10.1038/nmicrobiol.2016.48

Hug, L. A., & Co, R. (2018). It Takes a Village: Microbial Communities Thrive through Interactions and Metabolic Handoffs. MSystems, 3(2). https://doi.org/10.1128/mSystems.00152-17

Hultman, J., Waldrop, M. P., Mackelprang, R., David, M. M., McFarland, J., Blazewicz, S. J., … Jansson, J. K. (2015). Multi-omics of permafrost, active layer and thermokarst bog soil microbiomes. Nature, 521(7551), 208–212. https://doi.org/10.1038/nature14238

Huson, D. H., Albrecht, B., Bağcı, C., Bessarab, I., Górska, A., Jolic, D., & Williams, R. B. H. (2018). MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs. Biology Direct, 13(1), 6. https://doi.org/10.1186/s13062-018-0208-7

Isobe, K., Koba, K., Otsuka, S., & Senoo, K. (2011). Nitrification and nitrifying microbial communities in forest soils. Journal of Forest Research, 16(5), 351–362. https://doi.org/10.1007/s10310-011-0266-5

Iyer, L. M., Aravind, L., & Koonin, E. V. (2001). Common origin of four diverse families of large eukaryotic DNA viruses. Journal of Virology, 75(23), 11720–11734. https://doi.org/10.1128/JVI.75.23.11720-11734.2001

Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T., & Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications, 9(1), 5114. https://doi.org/10.1038/s41467-018-07641-9

134

Janssen, P. H. (2006). Identifying the Dominant Soil Bacterial Taxa in Libraries of 16S rRNA and 16S rRNA Genes. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 72(3), 1719–1728. https://doi.org/10.1128/AEM.72.3.1719-1728.2006

Jarett, J. K., Nayfach, S., Podar, M., Inskeep, W., Ivanova, N. N., Munson-McGee, J., … Woyke, T. (2018). Single-cell genomics of co-sorted Nanoarchaeota suggests novel putative host associations and diversification of proteins involved in symbiosis. Microbiome, 6(1), 161. https://doi.org/10.1186/s40168-018-0539-8

Jarvis, R. M., & Goodacre, R. (2004). Discrimination of Bacteria Using Surface-Enhanced Raman Spectroscopy. Analytical Chemistry, 76(1), 40–47. https://doi.org/10.1021/ac034689c

Jastrow, J. D., Miller, R. M., & Lussenhop, J. (1998). Contributions of interacting biological mechanisms to soil aggregate stabilization in restored prairie. Soil Biology and Biochemistry, 30(7), 905–916. https://doi.org/10.1016/S0038-0717(97)00207-1

Jiang, Y., Qian, H., Wang, X., Chen, L., Liu, M., Li, H., & Sun, B. (2018). Nematodes and microbial community affect the sizes and turnover rates of organic carbon pools in soil aggregates. Soil Biology and Biochemistry, 119, 22–31. https://doi.org/10.1016/J.SOILBIO.2018.01.001

Jones, R. T., Robeson, M. S., Lauber, C. L., Hamady, M., Knight, R., & Fierer, N. (2009). A comprehensive survey of soil acidobacterial diversity using pyrosequencing and clone library analyses. ISME Journal, 3(4), 442–453. https://doi.org/10.1038/ismej.2008.127

Jousset, A., Bienhold, C., Chatzinotas, A., Gallien, L., Gobet, A., Kurm, V., … Hol, W. H. G. (2017). Where less may be more: how the rare biosphere pulls ecosystems strings. The ISME Journal, 11(4), 853–862. https://doi.org/10.1038/ismej.2016.174

Kabisch, A., Otto, A., König, S., Becher, D., Albrecht, D., Schüler, M., … Schweder, T. (2014). Functional characterization of polysaccharide utilization loci in the marine Bacteroidetes ‘Gramella forsetii’ KT0803. The ISME Journal, 8(7), 1492–1502. https://doi.org/10.1038/ismej.2014.4

Kaiser, C., Franklin, O., Dieckmann, U., & Richter, A. (2014). Microbial community dynamics alleviate stoichiometric constraints during litter decay. Ecology Letters, 17(6), 680–690. https://doi.org/10.1111/ele.12269

Kaiser, C., Kilburn, M. R., Clode, P. L., Fuchslueger, L., Koranda, M., Cliff, J. B., … Murphy, D. V. (2015). Exploring the transfer of recent plant photosynthates to soil microbes: mycorrhizal pathway vs direct root exudation. New Phytologist, 205(4), 1537–1551. https://doi.org/10.1111/nph.13138

Kalvari, I., Argasinska, J., Quinones-Olvera, N., Nawrocki, E. P., Rivas, E., Eddy, S. R., … Petrov, A. I. (2018). Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Research, 46(D1), D335–D342. https://doi.org/10.1093/nar/gkx1038

135

Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1), 27–30. https://doi.org/10.1093/nar/28.1.27

Kanehisa, Minoru, Sato, Y., Morishima, K., & Sternberg, M. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. https://doi.org/10.1016/j.jmb.2015.11.006

Kang, D. D., Froula, J., Egan, R., & Wang, Z. (2015). MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ, 3, e1165. https://doi.org/10.7717/peerj.1165

Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059– 3066. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12136088

Khalil, J. Y. B., Robert, S., Reteno, D. G., Andreani, J., Raoult, D., & La Scola, B. (2016). High- Throughput Isolation of Giant Viruses in Liquid Medium Using Automated Flow Cytometry and Fluorescence Staining. Frontiers in Microbiology, 7, 26. https://doi.org/10.3389/fmicb.2016.00026

Kielak, A. M., Barreto, C. C., Kowalchuk, G. A., van Veen, J. A., & Kuramae, E. E. (2016). The ecology of Acidobacteria: Moving beyond genes and genomes. Frontiers in Microbiology. https://doi.org/10.3389/fmicb.2016.00744

Kielak, A. M., Matheus, ·, Cipriano, A. P., Eiko, ·, & Kuramae, E. (2016). Acidobacteria strains from subdivision 1 act as plant growth-promoting bacteria. Archives of Microbiology, 198, 987– 993. https://doi.org/10.1007/s00203-016-1260-2

Kiełbasa, S. M., Wan, R., Sato, K., Horton, P., & Frith, M. C. (2011). Adaptive seeds tame genomic sequence comparison. Genome Research, 21(3), 487–493. https://doi.org/10.1101/gr.113985.110

Kögel-Knabner, I., Guggenberger, G., Kleber, M., Kandeler, E., Kalbitz, K., Scheu, S., … Leinweber, P. (2008). Organo-mineral associations in temperate soils: Integrating biology, mineralogy, and organic matter chemistry §. J. Plant Nutr. Soil Sci, 171, 61–82. https://doi.org/10.1002/jpln.200700048

Konstantinidis, K. T., Rosselló-Móra, R., & Amann, R. (2017). Uncultivated microbes in need of their own taxonomy. ISME Journal. https://doi.org/10.1038/ismej.2017.113

Konstantinidis, K. T., & Tiedje, J. M. (2005). Genomic insights that advance the species definition for prokaryotes. PNAS February (Vol. 15). Retrieved from www.tigr.org.

Kovarik, M. L., & Allbritton, N. L. (2011). Measuring enzyme activity in single cells. Trends in Biotechnology, 29(5), 222–230. https://doi.org/10.1016/J.TIBTECH.2011.01.003

136

Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., … Marra, M. A. (2009). Circos: An information aesthetic for comparative genomics. Genome Research, 19(9), 1639– 1645. https://doi.org/10.1101/gr.092759.109

Kuske, C. R., Barns, S. M., & Busch, J. D. (1997). Diverse Uncultivated Bacterial Groups from Soils of the Arid Southwestern United States That Are Present in Many Geographic Regions. APPLIED AND ENVIRONMENTAL MICROBIOLOGY (Vol. 63). Retrieved from http://aem.asm.org/

Kuzyakov, Y. (2010). Citation Classics Priming effects: Interactions between living and dead organic matter. https://doi.org/10.1016/j.soilbio.2010.04.003

Kuzyakov, Y., & Blagodatskaya, E. (2015). Microbial hotspots and hot moments in soil: Concept & review. Soil Biology and Biochemistry, 83, 184–199. https://doi.org/10.1016/J.SOILBIO.2015.01.025

La Scola, B., Audic, S., Robert, C., Jungang, L., De Lamballerie, X., Drancourt, M., … Raoult, D. (2003). A giant virus in amoebae. Science. https://doi.org/10.1126/science.1081867

Labes, A., Karlsson, E. N., Fridjonsson, O. H., Turner, P., Hreggvidson, G. O., Kristjansson, J. K., … Schönheit, P. (2008). Novel members of glycoside hydrolase family 13 derived from environmental DNA. Applied and Environmental Microbiology, 74(6), 1914–1921. https://doi.org/10.1128/AEM.02102-07

Lagkouvardos, I., Weinmaier, T., Lauro, F. M., Cavicchioli, R., Rattei, T., & Horn, M. (2014). Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. The ISME Journal, 8(1), 115–125. https://doi.org/10.1038/ismej.2013.142

Lavelle, P. (1997). Faunal Activities and Soil Processes: Adaptive Strategies That Determine Ecosystem Function. Advances in Ecological Research, 27(C), 93–132. https://doi.org/10.1016/S0065-2504(08)60007-0

Legendre, M., Bartoli, J., Shmakova, L., Jeudy, S., Labadie, K., Adrait, A., … Claverie, J.-M. (2014). Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology. Proceedings of the National Academy of Sciences of the United States of America, 111(11), 4274–4279. https://doi.org/10.1073/pnas.1320670111

Legendre, M., Fabre, E., Poirot, O., Jeudy, S., Lartigue, A., Alempic, J.-M., … Claverie, J.-M. (2018). Diversity and evolution of the emerging Pandoraviridae family. Nature Communications, 9(1), 2285. https://doi.org/10.1038/s41467-018-04698-4

Legendre, M., Lartigue, A., Bertaux, L., Jeudy, S., Bartoli, J., Lescot, M., … Claverie, J.-M. (2015). In-depth study of Mollivirus sibericum, a new 30,000-y-old giant virus infecting Acanthamoeba. Proceedings of the National Academy of Sciences of the United States of America, 112(38), E5327-35. https://doi.org/10.1073/pnas.1510795112 137

Lehmann, J., Solomon, D., Kinyangi, J., Dathe, L., Wirick, S., & Jacobsen, C. (2008). Spatial complexity of soil organic matter forms at nanometre scales. Nature Geoscience, 1(4), 238– 242. https://doi.org/10.1038/ngeo155

Lennon, J T, Muscarella, M. E., Placella, S. A., Lehmkuhl, B. K., & Kellogg, W. K. (2018). How, When, and Where Relic DNA Affects Microbial Diversity. https://doi.org/10.1128/mBio

Lennon, Jay T, & Locey, K. J. (2016). The Underestimation of Global Microbial Diversity. https://doi.org/10.1128/mBio.01623-16

Lennon, Jay T, Muscarella, M. E., Placella, S. A., & Lehmkuhl, B. K. (2017). How, when, and where relic DNA biases estimates of microbial diversity. BioRxiv, 131284. https://doi.org/10.1101/131284

Letunic, I., & Bork, P. (2016). Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Research, 44(W1), W242–W245. https://doi.org/10.1093/nar/gkw290

Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10). https://doi.org/10.1093/bioinformatics/btv033

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. https://doi.org/10.1093/bioinformatics/btp324

Li, Heng. (2015). BFC: correcting Illumina sequencing errors. Bioinformatics, 31(17), 2885–2887. https://doi.org/10.1093/bioinformatics/btv290

Li, M., Canniffe, D. P., Jackson, P. J., Davison, P. A., FitzGerald, S., Dickman, M. J., … Huang, W. E. (2012). Rapid resonance Raman microspectroscopy to probe carbon dioxide fixation by single cells in microbial communities. The ISME Journal, 6(4), 875–885. https://doi.org/10.1038/ismej.2011.150

Li, W., & Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158

Lindemann, S. R., Bernstein, H. C., Song, H.-S., Fredrickson, J. K., Fields, M. W., Shou, W., … Beliaev, A. S. (2016). Engineering microbial consortia for controllable outputs. The ISME Journal, 10(9), 2077–2084. https://doi.org/10.1038/ismej.2016.26

Lladó, S., López-Mondéjar, R., & Baldrian, P. (2017). Forest Soil Bacteria: Diversity, Involvement in Ecosystem Processes, and Response to Global Change. Microbiology and Molecular Biology Reviews : MMBR, 81(2), e00063-16. https://doi.org/10.1128/MMBR.00063-16

138

Lladó, S., Žifčáková, L., Větrovský, T., Eichlerová, I., & Baldrian, P. (2016). Functional screening of abundant bacteria from acidic forest soil indicates the metabolic potential of Acidobacteria subdivision 1 for polysaccharide decomposition. Biology and Fertility of Soils, 52(2), 251– 260. https://doi.org/10.1007/s00374-015-1072-6

Lloyd, K. G., Steen, A. D., Ladau, J., Yin, J., & Crosby, L. (2018). Phylogenetically Novel Uncultured Microbial Cells Dominate Earth Microbiomes. MSystems, 3(5), e00055-18. https://doi.org/10.1128/mSystems.00055-18

Ló Pez-Sánchez, M. J., Neef, A., Peretó, J., Patiñ O-Navarrete, R., & Pignatelli, M. (2009). Evolutionary Convergence and Nitrogen Metabolism in Blattabacterium strain Bge, Primary Endosymbiont of the Cockroach Blattella germanica. PLoS Genet, 5(11), 1000721. https://doi.org/10.1371/journal.pgen.1000721

Locey, K. J., & Lennon, J. T. (2016a). Scaling laws predict global microbial diversity, 113(21). https://doi.org/10.1073/pnas.1521291113

Locey, K. J., & Lennon, J. T. (2016b). Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences, 113(21), 5970–5975. https://doi.org/10.1073/pnas.1521291113

Lombard, N., Prestat, E., van Elsas, J. D., & Simonet, P. (2011). Soil-specific limitations for access and analysis of soil microbial communities by metagenomics. FEMS Microbiology Ecology, 78(1), 31–49. https://doi.org/10.1111/j.1574-6941.2011.01140.x

Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M., & Henrissat, B. (2014). The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, 42(Database issue), D490-5. https://doi.org/10.1093/nar/gkt1178

López-Mondéjar, R., Zühlke, D., Becher, D., Riedel, K., & Baldrian, P. (2016). Cellulose and hemicellulose decomposition by forest soil bacteria proceeds by the action of structurally variable enzymatic systems. Scientific Reports, 6, 25279. https://doi.org/10.1038/srep25279

Losey, N. A., Stevenson, B. S., Busse, H. J., Damsté, J. S. S., Rijpstra, W. I. C., Rudd, S., & Lawson, P. A. (2013). Thermoanaerobaculum aquaticum gen. nov., sp. nov., the first cultivated member of acidobacteria subdivision 23, isolated from a hot spring. International Journal of Systematic and Evolutionary Microbiology, 63(PART 11), 4149–4157. https://doi.org/10.1099/ijs.0.051425-0

Lowe, T. M., & Chan, P. P. (2016). tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Research, 44(W1), W54–W57. https://doi.org/10.1093/nar/gkw413

139

Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Yadhukumar, … Schleifer, K. (2004). ARB: a software environment for sequence data. Nucleic Acids Research, 32(4), 1363–1371. https://doi.org/10.1093/nar/gkh293

Luo, C., Rodriguez-R, L. M., & Konstantinidis, K. T. (2014). MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucleic Acids Research, 42(8), e73– e73. https://doi.org/10.1093/nar/gku169

Luo, Y., Wan, S., Hui, D., & Wallace, L. L. (2001). Acclimatization of soil respiration to warming in a tall grass prairie. Nature, 413(6856), 622–625. https://doi.org/10.1038/35098065

MacLean, R. C. (2005, October 12). Adaptive radiation in microbial microcosms. Journal of Evolutionary Biology. John Wiley & Sons, Ltd (10.1111). https://doi.org/10.1111/j.1420- 9101.2005.00931.x

Männistö, M. K., Kurhela, E., Tiirola, M., & Häggblom, M. M. (2013). Acidobacteria dominate the active bacterial communities of Arctic tundra with widely divergent winter-time snow accumulation and soil temperatures. FEMS Microbiology Ecology, 84(1), 47–59. https://doi.org/10.1111/1574-6941.12035

Männistö, M. K., Rawat, S., Starovoytov, V., & Häggblom, M. M. (2012). Granulicella arctica sp. nov., granulicella mallensis sp. nov., granulicella tundricola sp. nov. and granulicella sapmiensis sp. nov., novel acidobacteria from tundra soil. International Journal of Systematic and Evolutionary Microbiology, 62(9), 2097–2106. https://doi.org/10.1099/ijs.0.031864-0

Martínez, J. M., Swan, B. K., & Wilson, W. H. (2014). Marine viruses, a genetic reservoir revealed by targeted viromics. The ISME Journal, 8(5), 1079–1088. https://doi.org/10.1038/ismej.2013.214

Massalha, H., Korenblum, E., Malitsky, S., Shapiro, O. H., & Aharoni, A. (2017). Live imaging of root-bacteria interactions in a microfluidics setup. PLANT BIOLOGY SEE COMMENTARY, 114(17), 4549–4554. https://doi.org/10.1073/pnas.1618584114

Mayali, X., Stewart, B., Mabery, S., & Weber, P. K. (2016). Temporal succession in carbon incorporation from macromolecules by particle-attached bacteria in marine microcosms. Environmental Microbiology Reports, 8(1), 68–75. https://doi.org/10.1111/1758- 2229.12352

Mayali, X., Weber, P. K., Brodie, E. L., Mabery, S., Hoeprich, P. D., & Pett-Ridge, J. (2012). High- throughput isotopic analysis of RNA microarrays to quantify microbial resource use. The ISME Journal, 6(6), 1210–1221. https://doi.org/10.1038/ismej.2011.175

Mayali, X., Weber, P. K., Mabery, S., & Pett-Ridge, J. (2014). Phylogenetic Patterns in the Microbial Response to Resource Availability: Amino Acid Incorporation in San Francisco Bay. PLoS ONE, 9(4), e95842. https://doi.org/10.1371/journal.pone.0095842 140

McCutcheon, J. P., & Moran, N. A. (2012). Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology. https://doi.org/10.1038/nrmicro2670

Mckee, L. S., Martínez-Abad, A., Ruthes, A. C., Vilaplana, F., & Brumer, H. (2019). Focused Metabolism of β-Glucans by the Soil Bacteroidetes Species Chitinophaga pinensis. https://doi.org/10.1128/AEM.02231-18

McLean, J. S., Lombardo, M.-J., Badger, J. H., Edlund, A., Novotny, M., Yee-Greenbaum, J., … Lasken, R. S. (2013a). Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum. Proceedings of the National Academy of Sciences of the United States of America, 110(26), E2390-9. https://doi.org/10.1073/pnas.1219809110

McLean, J. S., Lombardo, M. J., Badger, J. H., Edlund, A., Novotny, M., Yee-Greenbaum, J., … Lasken, R. S. (2013b). Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum. Proc Natl Acad Sci U S A, 110(26), E2390-9. https://doi.org/10.1073/pnas.1219809110

Melillo, J M, Frey, S. D., Deangelis, K. M., Werner, W. J., Bernard, M. J., Bowles, F. P., … Grandy, A. S. (2017). Long-term pattern and magnitude of soil carbon feedback to the climate system in a warming world. Science, 358(6359), 101–105. https://doi.org/10.1126/science.aan2874

Melillo, Jerry M, Butler, S., Johnson, J., Mohan, J., Steudler, P., Lux, H., … Tang, J. (2011). Soil warming, carbon-nitrogen interactions, and forest carbon budgets. Proceedings of the National Academy of Sciences of the United States of America, 108(23), 9508–9512. https://doi.org/10.1073/pnas.1018189108

Mewis, K., Lenfant, N., Lombard, V., & Henrissat, B. (2016). Dividing the Large Glycoside Hydrolase Family 43 into Subfamilies: a Motivation for Detailed Enzyme Characterization. Appl. Environ. Microbiol., 82(6), 1686–1692. https://doi.org/10.1128/AEM.03453-15

Mihara, T., Koyano, H., Hingamp, P., Grimsley, N., Goto, S., & Ogata, H. (2018). Taxon Richness of “Megaviridae” Exceeds those of Bacteria and Archaea in the Ocean. Microbes and Environments, 33(2), 162–171. https://doi.org/10.1264/jsme2.ME17203

Mikheenko, A., Saveliev, V., & Gurevich, A. (2016). MetaQUAST: evaluation of metagenome assemblies. Bioinformatics, 32(7), 1088–1090. https://doi.org/10.1093/bioinformatics/btv697

Momeni, B., Waite, A. J., & Shou, W. (2013). Spatial self-organization favors heterotypic cooperation over cheating. ELife, 2. https://doi.org/10.7554/eLife.00960

Moran, N. A., Mclaughlin, H. J., & Sorek, R. (2009). The Dynamics and Time Scale of Ongoing Genomic Erosion in Symbiotic Bacteria. Source: Science, New Series, 323(5912), 379–382. https://doi.org/10.1126/science.1163934 141

Morris, J. J. (2015). Black Queen evolution: the role of leakiness in structuring microbial communities. Trends in Genetics, 31(8), 475–482. https://doi.org/10.1016/J.TIG.2015.05.004

Murrell, Y. J. C., & Whiteley, A. S. (2011). Stable Isotope Probing and Related Technologies. Retrieved from www.asmscience.org

Musat, N., Halm, H., Rbel Winterholler, B., Hoppe, P., Peduzzi, S., Hillion, F., … Kuypers, M. M. M. (2008). A single-cell view on the ecophysiology of anaerobic phototrophic bacteria. Retrieved from www.pnas.org/cgi/content/full/

Naether, A., Foesel, B. U., Naegele, V., Wüst, P. K., Weinert, J., Bonkowski, M., … Friedrich, M. W. (2012). Environmental Factors Affect Acidobacterial Communities below the Subgroup Level in Grassland and Forest Soils. https://doi.org/10.1128/AEM.01325-12

Nagler, M., Insam, H., Pietramellara, G., & Ascher-Jenull, J. (2018). Extracellular DNA in natural environments: features, relevance and applications. Applied Microbiology and Biotechnology. https://doi.org/10.1007/s00253-018-9120-4

Nagy, K., Ábrahám, Á., Keymer, J. E., & Galajda, P. (2018). Application of Microfluidics in Experimental Ecology: The Importance of Being Spatial. Frontiers in Microbiology, 9, 496. https://doi.org/10.3389/fmicb.2018.00496

Navarrete, A. A., Kuramae, E. E., de Hollander, M., Pijl, A. S., van Veen, J. A., & Tsai, S. M. (2013). Acidobacterial community responses to agricultural management of soybean in Amazon forest soils. FEMS Microbiology Ecology, 83(3), 607–621. https://doi.org/10.1111/1574- 6941.12018

Nawrocki, E. P., & Eddy, S. R. (2013). Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics, 29(22), 2933–2935. https://doi.org/10.1093/bioinformatics/btt509

Nayfach, S., & Pollard, K. S. (2016). Toward Accurate and Quantitative Comparative Metagenomics. Cell, 166(5), 1103–1116. https://doi.org/10.1016/j.cell.2016.08.007

Nemergut, D. R., Schmidt, S. K., Fukami, T., O’Neill, S. P., Bilinski, T. M., Stanish, L. F., … Ferrenberg, S. (2013). Patterns and processes of microbial community assembly. Microbiology and Molecular Biology Reviews : MMBR, 77(3), 342–356. https://doi.org/10.1128/MMBR.00051-12

Nesme, J., Achouak, W., Agathos, S. N., Bailey, M., Baldrian, P., Brunel, D., … Simonet, P. (2016). Back to the Future of Soil Metagenomics. Frontiers in Microbiology, 7. https://doi.org/10.3389/fmicb.2016.00073

Nesme, J., Cécillon, S., Delmont, T. O., Monier, J.-M., Vogel, T. M., & Simonet, P. (2014). Large- Scale Metagenomic-Based Study of Antibiotic Resistance in the Environment. Current Biology, 24(10), 1096–1100. https://doi.org/10.1016/J.CUB.2014.03.036

142

Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution, 32(1), 268–274. https://doi.org/10.1093/molbev/msu300

Nichols, D., Cahoon, N., Trakhtenberg, E. M., Pham, L., Mehta, A., Belanger, A., … Epstein, S. S. (2010). Use of Ichip for High-Throughput In Situ Cultivation of "Uncultivable" Microbial Species. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 76(8), 2445–2450. https://doi.org/10.1128/AEM.01754-09

Nie, M., Pendall, E., Bell, C., & Wallenstein, M. D. (2014). Soil aggregate size distribution mediates microbial climate change feedbacks. Soil Biology and Biochemistry, 68, 357–365. https://doi.org/10.1016/J.SOILBIO.2013.10.012

Nielsen, S., & Müller, F. (2000). Emergent properties of ecosystems. Handbook of Ecosystem Theories and Management, 195–216.

Noh, N.-J., Kuribayashi, M., Saitoh, T. M., Nakaji, T., Nakamura, M., Hiura, T., & Muraoka, H. (2016). Responses of Soil, Heterotrophic, and Autotrophic Respiration to Experimental Open-Field Soil Warming in a Cool-Temperate Deciduous Forest. Ecosystems, 19(3), 504– 520. https://doi.org/10.1007/s10021-015-9948-8

Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (n.d.). metaSPAdes: a new versatile metagenomic assembler. https://doi.org/10.1101/gr.213959.116

O’Donnell, A. G., Young, I. M., Rushton, S. P., Shirley, M. D., & Crawford, J. W. (2007). Visualization, modelling and prediction in soil microbiology. Nature Reviews Microbiology, 5(9), 689–699. https://doi.org/10.1038/nrmicro1714

Ofaim, S., Ofek-Lalzar, M., Sela, N., Jinag, J., Kashi, Y., Minz, D., & Freilich, S. (2017). Analysis of Microbial Functions in the Rhizosphere Using a Metabolic-Network Based Framework for Metagenomics Interpretation. Frontiers in Microbiology, 8, 1606. https://doi.org/10.3389/fmicb.2017.01606

Orphan, V. J., House, C. H., Hinrichs, K. U., McKeegan, K. D., & DeLong, E. F. (2001). Methane- consuming archaea revealed by directly coupled isotopic and phylogenetic analysis. Science (New York, N.Y.), 293(5529), 484–487. https://doi.org/10.1126/science.1061338

Overmann, J. ¨, Abt, B., & Sikorski, J. (2017). Present and Future of Culturing Bacteria. Annu. Rev. Microbiol, 71, 711–730. https://doi.org/10.1146/annurev-micro-090816

Pagnier, I., Reteno, D.-G. I., Saadi, H., Boughalmi, M., Gaia, M., Slimani, M., … La Scola, B. (2013). A Decade of Improvements in Mimiviridae and Marseilleviridae Isolation from Amoeba. Intervirology, 56(6), 354–363. https://doi.org/10.1159/000354556

Pagnier, I., Yutin, N., Croce, O., Makarova, K. S., Wolf, Y. I., Benamar, S., … La Scola, B. (2015). Babela massiliensis, a representative of a widespread bacterial phylum with unusual

143

adaptations to parasitism in amoebae. Biology Direct, 10(1), 13. https://doi.org/10.1186/s13062-015-0043-z

Pande, S., & Kost, C. (2017). Bacterial Unculturability and the Formation of Intercellular Metabolic Networks. Trends in Microbiology, 25(5), 349–361. https://doi.org/10.1016/j.tim.2017.02.015

Pankratov, T. A. (2012). Acidobacteria in microbial communities of the bog and tundra lichens. Microbiology, 81(1), 51–58. https://doi.org/10.1134/s0026261711060166

Parks, D. H., Chuvochina, M., Waite, D. W., Rinke, C., Skarshewski, A., Chaumeil, P.-A., & Hugenholtz, P. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36(10), 996–1004. https://doi.org/10.1038/nbt.4229

Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043–1055. https://doi.org/10.1101/gr.186072.114

Parks, D. H., Rinke, C., Chuvochina, M., Chaumeil, P. A., Woodcroft, B. J., Evans, P. N., … Tyson, G. W. (2017). Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology, 2(11), 1533–1542. https://doi.org/10.1038/s41564-017-0012-7

Penz, T., Schmitz-Esser, S., Kelly, S. E., Cass, B. N., & Mü Ller, A. (2012). Comparative Genomics Suggests an Independent Origin of Cytoplasmic Incompatibility in Cardinium hertigii. PLoS Genet, 8(10), 1003012. https://doi.org/10.1371/journal.pgen.1003012

Pereira, M. M., Santana, M., & Teixeira, M. (2001, June 1). A novel scenario for the evolution of haem-copper oxygen reductases. Biochimica et Biophysica Acta - Bioenergetics. Elsevier. https://doi.org/10.1016/S0005-2728(01)00169-4

Pham, V. H. T., & Kim, J. (2012). Cultivation of unculturable soil bacteria. Trends in Biotechnology, 30(9), 475–484. https://doi.org/10.1016/J.TIBTECH.2012.05.007

Pisani, O., Frey, S. D., Simpson, A. J., & Simpson, M. J. (2015). Soil warming and nitrogen deposition alter soil organic matter composition at the molecular-level. Biogeochemistry, 123(3), 391–409. https://doi.org/10.1007/s10533-015-0073-8

Ploug, H., Adam, B., Musat, N., Kalvelage, T., Lavik, G., Wolf-Gladrow, D., & Kuypers, M. M. M. (2011). Carbon, nitrogen and O2 fluxes associated with the cyanobacterium Nodularia spumigena in the Baltic Sea. The ISME Journal, 5(9), 1549–1558. https://doi.org/10.1038/ismej.2011.20

144

Poirier, V., Angers, D. A., & Whalen, J. K. (2014). Formation of millimetric-scale aggregates and associated retention of 13C–15N-labelled residues are greater in subsoil than topsoil. Soil Biology and Biochemistry, 75, 45–53. https://doi.org/10.1016/j.soilbio.2014.03.020

Pold, G., Grandy, A. S., Melillo, J. M., & Deangelis, K. M. (2017). Changes in substrate availability drive carbon cycle response to chronic warming. Soil Biology and Biochemistry, 110, 68–78. https://doi.org/10.1016/j.soilbio.2017.03.002

Poll, C., Thiede, A., Wermbter, N., Sessitsch, A., & Kandeler, E. (2003). Micro-scale distribution of microorganisms and microbial enzyme activities in a soil with long-term organic amendment. European Journal of Soil Science, 54(4), 715–724. https://doi.org/10.1046/j.1351-0754.2003.0569.x

Ponomarova, O., & Patil, K. R. (2015). Metabolic interactions in microbial communities: untangling the Gordian knot. Current Opinion in Microbiology, 27, 37–44. https://doi.org/10.1016/J.MIB.2015.06.014

Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE, 5(3), e9490. https://doi.org/10.1371/journal.pone.0009490

Prosser, J. I., Bohannan, B. J. M., Curtis, T. P., Ellis, R. J., Firestone, M. K., Freckleton, R. P., … Young, J. P. W. (2007). The role of ecological theory in microbial ecology. Nature Reviews Microbiology, 5(5), 384–392. https://doi.org/10.1038/nrmicro1643

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., … Glöckner, F. O. (2012). The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research, 41(D1), D590–D596. https://doi.org/10.1093/nar/gks1219

Radajewski, S., Ineson, P., Parekh, N. R., & Murrell, J. C. (2000). Stable-isotope probing as a tool in microbial ecology. Nature, 403(6770), 646–649. https://doi.org/10.1038/35001054

Radajewski, S., McDonald, I. R., & Murrell, J. C. (2003, June 1). Stable-isotope probing of nucleic acids: A window to the function of uncultured microorganisms. Current Opinion in Biotechnology. Elsevier Current Trends. https://doi.org/10.1016/S0958-1669(03)00064-8

Rappé, M. S., & Giovannoni, S. J. (2003). THE UNCULTURED MICROBIAL MAJORITY. Annu. Rev. Microbiol, 57, 369–394. https://doi.org/10.1146/annurev.micro.57.030502.090759

Raynaud, X., & Nunan, N. (2014). Spatial Ecology of Bacteria at the Microscale in Soil. PLoS ONE, 9(1), e87217. https://doi.org/10.1371/journal.pone.0087217

Reteno, D. G., Benamar, S., Khalil, J. B., Andreani, J., Armstrong, N., Klose, T., … La Scola, B. (2015). Faustovirus, an asfarvirus-related new lineage of giant viruses infecting amoebae. Journal of Virology, 89(13), 6585–6594. https://doi.org/10.1128/JVI.00115-15

145

Rillig, M. C., Muller, L. A., & Lehmann, A. (2017). Soil aggregates as massively concurrent evolutionary incubators. Nature Publishing Group. https://doi.org/10.1038/ismej.2017.56

Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J.-F., … Woyke, T. (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature, 499(7459), 431–437. https://doi.org/10.1038/nature12352

Rodrigues, R. A. L., Mougari, S., Colson, P., La Scola, B., & Abrahão, J. S. (2019). “Tupanvirus”, a new genus in the family Mimiviridae. Archives of Virology, 164(1), 325–331. https://doi.org/10.1007/s00705-018-4067-4

Rodriguez-R, L., & Konstantinidis, K. (2016). The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.1900

Rodriguez-R, L. M., & Konstantinidis, K. T. (n.d.). Bypassing Cultivation To Identify Bacterial Species Culture-independent genomic approaches identify credibly distinct clusters, avoid cultivation bias, and provide true insights into microbial species. Retrieved from http://enve-omics.gatech.edu/sites/default/files/2014-Rodriguez_R- Konstantinidis_Microbe_Magazine.pdf

Röling, W. F. M., van Breukelen, B. M., Bruggeman, F. J., & Westerhoff, H. V. (2007). Ecological control analysis: being(s) in control of mass flux and metabolite concentrations in anaerobic degradation processes. Environmental Microbiology, 9(2), 500–511. https://doi.org/10.1111/j.1462-2920.2006.01167.x

Roux, S., Chan, L.-K., Egan, R., Malmstrom, R. R., McMahon, K. D., & Sullivan, M. B. (2017). Ecogenomics of virophages and their giant virus hosts assessed through time series metagenomics. Nature Communications, 8(1), 858. https://doi.org/10.1038/s41467-017- 01086-2

Rustad, L. E., Campbell, J. L., Marion, G. M., Norby, R. J., Mitchell, M. J., Hartley, A. E., … Wright, R. (2001). A meta-analysis of the response of soil respiration, net nitrogen mineralization, and aboveground plant growth to experimental ecosystem warming. Oecologia, 126(4), 543–562. https://doi.org/10.1007/s004420000544

Schimel, J. P., & Gulledge, J. (1998). Microbial community structure and global trace gases. Global Change Biology, 4(7), 745–758. https://doi.org/10.1046/j.1365-2486.1998.00195.x

Schimel, J. P., & Schaeffer, S. M. (2012). Microbial control over carbon cycling in soil. Frontiers in Microbiology, 3, 348. https://doi.org/10.3389/fmicb.2012.00348

Schmidt, M. W. I., Torn, M. S., Abiven, S., Dittmar, T., Guggenberger, G., Janssens, I. A., … Trumbore, S. E. (2011). Persistence of soil organic matter as an ecosystem property. Nature, 478(7367), 49–56. https://doi.org/10.1038/nature10386

146

Schmitz-Esser, S., Tischler, P., Arnold, R., Montanaro, J., Wagner, M., Rattei, T., & Horn, M. (2010). The genome of the amoeba symbiont "Candidatus Amoebophilus asiaticus" reveals common mechanisms for host cell interaction among amoeba-associated bacteria. Journal of Bacteriology, 192(4), 1045–1057. https://doi.org/10.1128/JB.01379-09

Schulz, F., Alteio, L., Goudeau, D., Ryan, E. M., Yu, F. B., Malmstrom, R. R., … Woyke, T. (2018). Hidden diversity of soil giant viruses. Nature Communications, 9(1), 4881. https://doi.org/10.1038/s41467-018-07335-2

Schulz, F., Tyml, T., Pizzetti, I., Dyková, I., Fazi, S., Kostka, M., & Horn, M. (2015). Marine amoebae with cytoplasmic and perinuclear symbionts deeply branching in the Gammaproteobacteria. Scientific Reports, 5, 13381. https://doi.org/10.1038/srep13381

Schulz, F., Yutin, N., Ivanova, N. N., Ortega, D. R., Lee, T. K., Vierheilig, J., … Woyke, T. (2017). Giant viruses with an expanded complement of translation system components. Science, 356(6333), 82–85. https://doi.org/10.1126/science.aal4657

Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Dröge, J., … McHardy, A. C. (2017). Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nature Methods, 14(11), 1063–1071. https://doi.org/10.1038/nmeth.4458

Seshadri, R., Leahy, S. C., Attwood, G. T., Teh, K. H., Lambie, S. C., Cookson, A. L., … Kelly, W. J. (2018). Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nature Biotechnology, 36(4), 359–367. https://doi.org/10.1038/nbt.4110

Sessitsch, A., Weilharter, A., Gerzabek, M. H., Kirchmann, H., & Kandeler, E. (2001). Microbial population structures in soil particle size fractions of a long-term fertilizer field experiment. Applied and Environmental Microbiology, 67(9), 4215–4224. https://doi.org/10.1128/aem.67.9.4215-4224.2001

Shen, L., Liu, Y., Xu, B., Wang, N., Zhao, H., Liu, X., & Liu, F. (2017). Comparative genomic analysis reveals the environmental impacts on two Arcticibacter strains including sixteen Sphingobacteriaceae species. Scientific Reports, 7(1). https://doi.org/10.1038/s41598-017- 02191-4

Siddiky, M. R. K., Kohler, J., Cosme, M., & Rillig, M. C. (2012). Soil biota effects on soil structure: Interactions between arbuscular mycorrhizal fungal mycelium and collembola. Soil Biology and Biochemistry, 50, 33–39. https://doi.org/10.1016/j.soilbio.2012.03.001

Simon, A., Bindschedler, S., Job, D., Wick, L. Y., Filippidou, S., Kooli, W. M., … Junier, P. (2015). Exploiting the fungal highway: development of a novel tool for the in situ isolation of bacteria migrating along fungal mycelium. FEMS Microbiology Ecology, 91, 116. https://doi.org/10.1093/femsec/fiv116

147

Six, J., & Paustian, K. (2014). Aggregate-associated soil organic matter as an ecosystem property and a measurement tool. Soil Biology and Biochemistry, 68, A4–A9. https://doi.org/10.1016/J.SOILBIO.2013.06.014

Solden, L., Lloyd, K., & Wrighton, K. (2016). The bright side of microbial dark matter: lessons learned from the uncultivated majority. Current Opinion in Microbiology, 31, 217–226. https://doi.org/10.1016/j.mib.2016.04.020

Sousa, F. L., Alves, R. J., Ribeiro, M. A., Pereira-Leal, J. B., Teixeira, M., & Pereira, M. M. (2012). The superfamily of heme-copper oxygen reductases: Types and evolutionary considerations. Biochimica et Biophysica Acta - Bioenergetics, 1817(4), 629–637. https://doi.org/10.1016/j.bbabio.2011.09.020

Spohn, M., Pötsch, E. M., Eichorst, S. A., Woebken, D., Wanek, W., & Richter, A. (2016). Soil microbial carbon use efficiency and biomass turnover in a long-term fertilization experiment in a temperate grassland. Soil Biology and Biochemistry, 97, 168–175. https://doi.org/10.1016/j.soilbio.2016.03.008

Staley, J. T., & Konopka, A. (1985). Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats, 39, 321–346. Retrieved from https://www.annualreviews.org/doi/pdf/10.1146/annurev.mi.39.100185.001541

Stanley, C. E., Grossmann, G., Casadevall i Solvas, X., & deMello, A. J. (2016). Soil-on-a-Chip: microfluidic platforms for environmental organismal studies. Lab on a Chip, 16(2), 228–241. https://doi.org/10.1039/C5LC01285F

Stepanauskas, R. (2012). Single cell genomics: an individual look at microbes. Current Opinion in Microbiology, 15(5), 613–620. https://doi.org/10.1016/J.MIB.2012.09.001

Stepanauskas, R., Fergusson, E. A., Brown, J., Poulton, N. J., Tupper, B., Labonté, J. M., … Lubys, A. (2017). Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles. Nature Communications, 8(1). https://doi.org/10.1038/s41467-017-00128-z

Stork, N. E., & Eggleton, P. (1992). Invertebrates as determinants and indicators of soil quality. American Journal of Alternative Agriculture, 7(May), 38–47. Retrieved from https://pdfs.semanticscholar.org/d878/271084789d4a4867947aa5047b3c8a097ec8.pdf

Stubbendieck, R. M., Vargas-Bautista, C., & Straight, P. D. (2016). Bacterial Communities: Interactions to Scale. Frontiers in Microbiology, 7, 1234. https://doi.org/10.3389/fmicb.2016.01234

Štursová, M., Bárta, J., Šantrůčková, H., & Baldrian, P. (2016). Small-scale spatial heterogeneity of ecosystem properties, microbial community composition and microbial activities in a temperate mountain forest soil. FEMS Microbiology Ecology, 92(12), fiw185. https://doi.org/10.1093/femsec/fiw185 148

Suhre, K. (2005). Gene and Genome Duplication in Acanthamoeba polyphaga Mimivirus. Journal of Virology, 79(24), 15591–15591. https://doi.org/10.1128/JVI.79.24.15591.2005

Tagliavi, Z., & Draghici, S. (2012). MDAsim: A multiple displacement amplification simulator. In 2012 IEEE International Conference on Bioinformatics and Biomedicine (pp. 1–4). IEEE. https://doi.org/10.1109/BIBM.2012.6392622

Taillefer, M., Arntzen, M. Ø., Henrissat, B., Pope, P. B., & Larsbrink, J. (2018). Proteomic Dissection of the Cellulolytic Machineries Used by Soil-Dwelling Bacteroidetes. MSystems, 3(6), e00240-18. https://doi.org/10.1128/mSystems.00240-18

Teplitski, M., Robinson, J. B., & Bauer, W. D. (2000). Plants Secrete Substances That Mimic Bacterial N -Acyl Homoserine Lactone Signal Activities and Affect Population Density- Dependent Behaviors in Associated Bacteria. Molecular Plant-Microbe Interactions, 13(6), 637–648. https://doi.org/10.1094/MPMI.2000.13.6.637

Thomas, F., Hehemann, J. H., Rebuffet, E., Czjzek, M., & Michel, G. (2011). Environmental and gut Bacteroidetes: The food connection. Frontiers in Microbiology, 2(MAY). https://doi.org/10.3389/fmicb.2011.00093

Torsvik, V., & Øvreås, L. (2002). Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology, 5(3), 240–245. https://doi.org/10.1016/S1369-5274(02)00324-7

Totsche, K. U., Amelung, W., Gerzabek, M. H., Guggenberger, G., Klumpp, E., Knief, C., … Kögel- Knabner, I. (2018). Microaggregates in soils. Journal of Plant Nutrition and Soil Science, 181(1), 104–136. https://doi.org/10.1002/jpln.201600451

Trivedi, P., Anderson, I. C., & Singh, B. K. (2013). Microbial modulators of soil carbon storage: integrating genomic and metabolic knowledge for global prediction. Trends in Microbiology, 21(12), 641–651. https://doi.org/10.1016/J.TIM.2013.09.005

Tucker, C. L., Bell, J., Pendall, E., & Ogle, K. (2013). Does declining carbon-use efficiency explain thermal acclimation of soil respiration with warming? Global Change Biology, 19(1), 252– 263. https://doi.org/10.1111/gcb.12036

Tveit, A. T., Urich, T., & Svenning, M. M. (2014). Metatranscriptomic analysis of arctic peat soil microbiota. Applied and Environmental Microbiology, 80(18), 5761–5772. https://doi.org/10.1128/AEM.01030-14

Urich, T., Lanzén, A., Qi, J., Huson, D. H., Schleper, C., & Schuster, S. C. (2008). Simultaneous Assessment of Soil Microbial Community Structure and Function through Analysis of the Meta-Transcriptome. PLoS ONE, 3(6), e2527. https://doi.org/10.1371/journal.pone.0002527

149

Van Dijk, E. L., Lè Ne Auger, H., Jaszczyszyn, Y., & Thermes, C. (2014). Ten years of next-generation sequencing technology. Trends in Genetics, 30, 418–426. https://doi.org/10.1016/j.tig.2014.07.001

Verneau, J., Levasseur, A., Raoult, D., La Scola, B., & Colson, P. (2016). MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes. Frontiers in Microbiology, 7, 428. https://doi.org/10.3389/fmicb.2016.00428

Vidal, A., Hirte, J., Bender, S. F., Mayer, J., Gattinger, A., Höschen, C., … Mueller, C. W. (2018). Linking 3D Soil Structure and Plant-Microbe-Soil Carbon Transfer in the Rhizosphere. Frontiers in Environmental Science, 6, 9. https://doi.org/10.3389/fenvs.2018.00009

Vinciguerra, M. T. (2009). To cite this article: Maria Teresa Vinciguerra (1979) Role of nematodes in the biological processes of the soil. Italian Journal of Zoology, 46(4), 363–374. https://doi.org/10.1080/11250007909440312

Vos, M., Wolf, A. B., Jennings, S. J., & Kowalchuk, G. A. (2013). Micro-scale determinants of bacterial diversity in soil. FEMS Microbiology Reviews, 37(6), 936–954. https://doi.org/10.1111/1574-6976.12023

Wang, C., Lu, X., Mori, T., Mao, Q., Zhou, K., Zhou, G., … Mo, J. (2018). Responses of soil microbial community to continuous experimental nitrogen additions for 13 years in a nitrogen-rich tropical forest. https://doi.org/10.1016/j.soilbio.2018.03.009

Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73(16), 5261–5267. https://doi.org/10.1128/AEM.00062-07

Ward, N. L., Challacombe, J. F., Janssen, P. H., Henrissat, B., Coutinho, P. M., Wu, M., … Kuske, C. R. (2009). Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils. Applied and Environmental Microbiology, 75(7), 2046– 2056. https://doi.org/10.1128/AEM.02294-08

Warmink, J. A., Nazir, R., Corten, B., & van Elsas, J. D. (2011). Hitchhikers on the fungal highway: The helper effect for bacterial migration via fungal hyphae. Soil Biology and Biochemistry, 43(4), 760–765. https://doi.org/10.1016/j.soilbio.2010.12.009

Watzinger, A. (2015). Microbial phospholipid biomarkers and stable isotope methods help reveal soil functions. Soil Biology and Biochemistry, 86, 98–107. https://doi.org/10.1016/j.soilbio.2015.03.019

Wegner, C.-E., & Liesack, W. (2017a). Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology, 8, 1023. https://doi.org/10.3389/fmicb.2017.01023

150

Wegner, C.-E., & Liesack, W. (2017b). Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology, 8, 1023. https://doi.org/10.3389/fmicb.2017.01023

Wessel, A. K., Hmelo, L., Parsek, M. R., & Whiteley, M. (2013). Going local: technologies for exploring bacterial microenvironments. Nature Reviews Microbiology, 11(5), 337–348. https://doi.org/10.1038/nrmicro3010

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Retrieved from https://cran.r-project.org/web/packages/ggplot2/citation.html

Widder, S., Allen, R. J., Pfeiffer, T., Curtis, T. P., Wiuf, C., Sloan, W. T., … Soyer, O. S. (2016). Challenges in microbial ecology: building predictive understanding of community function and dynamics. The ISME Journal, 10(11), 2557–2568. https://doi.org/10.1038/ismej.2016.45

Widmer, F., Flieûbach, A., Laczko Â, E., Schulze-Aurich, J., & Zeyer, J. (2000). Assessing soil biological characteristics: a comparison of bulk soil community DNA-, PLFA-, and Biologe- analyses. Soil Biology and Biochemistry, 33, 1029–1036. Retrieved from http://ac.els- cdn.com/S0038071701000062/1-s2.0-S0038071701000062-main.pdf?_tid=c9cc9b1c-65bd- 11e7-a4d0-00000aab0f6c&acdnat=1499725327_0806f6f26c4c34634e4e8041f10afc43

Wilhelm, S., Bird, J., Bonifer, K., Calfee, B., Chen, T., Coy, S., … LeCleir, G. R. (2017). A Student’s Guide to Giant Viruses Infecting Small Eukaryotes: From Acanthamoeba to Zooxanthellae. Viruses, 9(3), 46. https://doi.org/10.3390/v9030046

Wilhelm, S. W., Coy, S. R., Gann, E. R., Moniruzzaman, M., & Stough, J. M. A. (2016). Standing on the Shoulders of Giant Viruses: Five Lessons Learned about Large Viruses Infecting Small Eukaryotes and the Opportunities They Create. PLOS Pathogens, 12(8), e1005752. https://doi.org/10.1371/journal.ppat.1005752

Williams, R. J., Howe, A., & Hofmockel, K. S. (2014). Demonstrating microbial co-occurrence pattern analyses within and between ecosystems. Frontiers in Microbiology, 5, 358. https://doi.org/10.3389/fmicb.2014.00358

Wilson, W. H., Van Etten, J. L., & Allen, M. J. (2009). The Phycodnaviridae: the story of how tiny giants rule the world. Current Topics in Microbiology and Immunology, 328, 1–42. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/19216434

Woebken, D., Burow, L. C., Prufert-Bebout, L., Bebout, B. M., Hoehler, T. M., Pett-Ridge, J., … Singer, S. W. (2012). Identification of a novel cyanobacterial group as active diazotrophs in a coastal microbial mat using NanoSIMS analysis. The ISME Journal, 6(7), 1427–1439. https://doi.org/10.1038/ismej.2011.200

Woyke, T., Doud, D. F. R., & Schulz, F. (2017a). The trajectory of microbial single-cell sequencing. Nature Methods. https://doi.org/10.1038/nmeth.4469 151

Woyke, T., Doud, D. F. R., & Schulz, F. (2017b, October 31). The trajectory of microbial single-cell sequencing. Nature Methods. Nature Publishing Group. https://doi.org/10.1038/nmeth.4469

Woyke, T., Xie, G., Copeland, A., González, J. M., Han, C., Kiss, H., … Stepanauskas, R. (2009). Assembling the Marine Metagenome, One Cell at a Time. PLoS ONE, 4(4), e5299. https://doi.org/10.1371/journal.pone.0005299

Wrighton, K. C., Thomas, B. C., Sharon, I., Miller, C. S., Castelle, C. J., VerBerkmoes, N. C., … Banfield, J. F. (2012). Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science, 337(6102), 1661–1665. https://doi.org/10.1126/science.1224041

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N. N., … Eisen, J. A. (2009). LETTERS A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462. https://doi.org/10.1038/nature08656

Yoosuf, N., Pagnier, I., Fournous, G., Robert, C., Raoult, D., La Scola, B., & Colson, P. (2014). Draft genome sequences of Terra1 and Terra2 viruses, new members of the family Mimiviridae isolated from soil. Virology, 452–453, 125–132. https://doi.org/10.1016/j.virol.2013.12.032

Yu, F. B., Blainey, P. C., Schulz, F., Woyke, T., Horowitz, M. A., & Quake, S. R. (2017). Microfluidic- based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples. ELife, 6. https://doi.org/10.7554/eLife.26580

Yutin, N., Wolf, Y. I., & Koonin, E. V. (2014). Origin of giant viruses from smaller DNA viruses not from a fourth domain of cellular life. Virology, 466–467, 38–52. https://doi.org/10.1016/j.virol.2014.06.032

Yutin, N., Wolf, Y. I., Raoult, D., & Koonin, E. V. (2009). Eukaryotic large nucleo-cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome evolution. Virology Journal, 6(1), 223. https://doi.org/10.1186/1743-422X-6-223

Zchori-Fein, E., Perlman, S. J., Kelly, S. E., Katzir, N., & Hunter, M. S. (2004). Characterization of a “Bacteroidetes” symbiont in Encarsia wasps (Hymenoptera: Aphelinidae): Proposal of “Candidatus Cardinium hertigii.” International Journal of Systematic and Evolutionary Microbiology, 54(3), 961–968. https://doi.org/10.1099/ijs.0.02957-0

Zelezniak, A., Andrejev, S., Ponomarova, O., Mende, D. R., Bork, P., & Patil, K. R. (2015). Metabolic dependencies drive species co-occurrence in diverse microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 112(20), 6449–6454. https://doi.org/10.1073/pnas.1421834112

Zhang, H., Yohe, T., Huang, L., Entwistle, S., Wu, P., Yang, Z., … Yin, Y. (2018). dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Research, 46, 95–101. https://doi.org/10.1093/nar/gky418 152

Zhang, W., Zhou, J., Liu, T., Yu, Y., Pan, Y., Yan, S., & Wang, Y. (2015). Four novel algal virus genomes discovered from Yellowstone Lake metagenomes. Scientific Reports, 5(1), 15131. https://doi.org/10.1038/srep15131

Zhou, J., Xue, K., Xie, J., Deng, Y., Wu, L., Cheng, X., … Luo, Y. (2011). Microbial mediation of carbon-cycle feedbacks to climate warming. Nature Climate Change, 2. https://doi.org/10.1038/NCLIMATE1331

Zimmermann, J., Portillo, M. C., Serrano, L., Ludwig, W., & Gonzalez, J. M. (2012). Acidobacteria in Freshwater Ponds at Doñana National Park, Spain. Microbial Ecology, 63(4), 844–855. https://doi.org/10.1007/s00248-011-9988-3

153