
METAGENOMIC AND METATRANSCRIPTOMIC ANALYSES OF ACCRETION ICE

Yury M. Shtarkman

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

December 2015

Committee:

Scott O. Rogers, Advisor

W. Robert Midden, Graduate Faculty Representative

Vipaporn Phuntumart

Paul F. Morris

Robert Michael McKay

© 2015

Yury M. Shtarkman

All Rights Reserved

ABSTRACT

Scott O. Rogers, Advisor

Lake Vostok (Antarctica) is the 4th deepest lake on Earth, the 6th largest by volume, and the 16th largest by area, being similar in area to Lake Ladoga (Russia) and Lake Ontario (North America). However, it is a subglacial lake, constantly covered by more than 3,800 m of glacial ice, and has been covered for at least 15 million years. As the glacier slowly traverses the lake, water from the lake freezes (i.e., accretes) to the bottom of the glacier, such that on the far side of the lake a 230 m thick layer of accretion ice collects. This essentially samples various parts of the lake surface water as the glacier moves across the lake. As the glacier enters the lake, it passes over a shallow embayment. The embayment accretion ice is characterized by its silty inclusions and relatively high concentrations of several ions. The glacier then passes over a peninsula (or island) and into the main basin. The main basin accretion ice is clear, with almost no inclusions and low ion content.

Metagenomic/metatranscriptomic analysis has been performed on two accretion ice samples: one from the shallow embayment and the other from part of the main lake basin. Ice from the shallow embayment contains a variety of Bacteria, as well as a few Archaea and several types of Eukarya. Most are related to organisms that are psychrophilic, marine, aquatic, or live in lake/ocean sediments, or a combination of these. However, sequences identified as originating from many different thermophiles were found, suggesting the presence of hydrothermal activity in the lake.

In contrast to the embayment ice, the ice from the main basin yielded only about 5-6% of the number of sequences. Here again, molecular signatures of psychrophilic, marine, aquatic, and a few sediment-dwelling organisms, and a few thermophiles were found. Because of the extreme conditions, it has been hypothesized that Lake Vostok is sterile, or that very few types of organisms inhabit the lake. Our results indicate that it contains a diverse set of organisms, and the number and taxonomic composition vary with position in the lake.

This dissertation is dedicated to Irina Vasilkova and Mikhail Shtarkman - my wonderful parents, my amazing sister Alexandra and my sweet grandma.

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to Dr. Scott O. Rogers, my advisor, for giving me this opportunity. He was always the one encouraging me to move forward in my project, learn more and gain a broader scientific view. I am very thankful for the knowledge I gained and the skills I learned from him. I am grateful for his supervision, guidance, constructive criticism, and the amazing patience he had for my work, especially writing. He helped me to become a better scientist and without him this work would not have been completed and published.

This work was partially funded by the National Science Foundation. I would also like to thank Bowling Green State University and especially the Department of Biological Sciences for providing financial support and a great research environment.

I would also like to thank my incredible committee: Dr. Paul Morris for his support, guidance and suggestions on this project; Dr. Vipaporn Phuntumart for her comments and suggestions; Dr. R. Michael L. McKay for his support and cheers; and Dr. W. Robert Midden for his interest in our research area. I would like to thank a former member of my committee, Dr. Carmen Fioravanti, for his support and constructive criticism. I am also thankful to all my committee members for the passion and interest they devoted to my preliminary examination and proposal defense, and while serving on my dissertation committee. I appreciate all their valuable ideas and feedback on my dissertation defense examination.

This work would not have been possible without the support and advice of many people. I would like to acknowledge former members from our laboratory: Tom D’Elia, Ram Satish Veerapaneni and Zeynep Koçer for their work on the design of this project and on the ice cores, and Caitlin Knowlton for her work on the control samples. I am also thankful to Robyn Edgar, Dr. Morris’s former student, for her help with bioinformatics software and database. I am also grateful to all of them for their comments and suggestions in preparation of the manuscripts.

I would also like to thank my former lab mates, Farida Sidiq and Lorena Harris, for their help and dedication during my first months in the lab. Special thanks go to Chen Xing, Seiguk Shin, Caitlin Knowlton and Sammy Juma for their support and the fun times we had together. They helped me during rough times and we had a lot of fun working together. Thank you guys big time.

I would like to mention Dr. George Bullerjahn, Dr. Neocles Leontis and former vice provost for research and Dean of the Graduate College Heinz Bulmahn for letting me be a part of the BGSU graduate school and believing in my capabilities.

My heart is full of friends back home in Russia and those who support me here at Bowling Green. Among hundreds, I especially want to thank Dr. Yury Ivanov, Dr. Vasiliy Morosov, Pavel Moroz and Natalia Kholmicheva; we shared a lot together as scientists and as roommates. Thank you for putting up with me for such a long time. I would also like to express my sincere thanks to my great friends Astha Malik and Jigar Patel for always being there for me when I needed support.

Lastly, I would like to thank my family for supporting me 100% and believing in me. I would not have been able to do this without their love, care and encouragement.

TABLE OF CONTENTS

CHAPTER I
Introduction to methods
Ribosomal genes as metagenomic targets
Culturing and phylogenetic analysis
Cold environments
Genomic sequencing overview
Lake Vostok accretion ice
Lake Vostok samples preparation
Software overview
Summary
References for the Chapter I

CHAPTER II
Introduction
Lake Vostok origin
Antarctic drilling project
Microbial composition of the lake
Accretion ice I
Accretion ice type II
Cold stress mechanisms
Other planets
Research statement
Materials and methods
Molecular analysis
Sequence analyses
Taxonomy identification and database construction
Metabolic map reconstruction
Water control samples
Results
Psychrophiles
Thermophiles
Halophiles
Other
Symbiotic and parasitic species
Opisthokonta
Archaeplastida, Chromalveolata, Bikonta,
Unknown
Metabolic classifications
Nucleotide metabolism
Pyruvate metabolism
Carbohydrates
Energy systems
Discussion
Marine vs freshwater environments
Thermophiles and thermotolerant
Psychrophiles and psychrotolerant
Mean unique sequence concentrations
Metabolic analysis
Carbon fixation pathways
Nitrogen metabolism
Conclusions
References for the Chapter II

APPENDIX A
SUPPLEMENTARY TABLES
SUPPLEMENTARY FIGURES
Abbreviations for supplementary figures
References for the supplementary information

LIST OF FIGURES

Figure

1 Metagenomic analysis of the German Alps glacier ice samples
2 Various strategies for the metagenomic/metatranscriptomic analyses
3 Position and location of the subglacial Lake Vostok
4 Graphical representation of the southern part of Lake Vostok
5 Airborne ice penetrating radar results from 1993-2000 expeditions
6 A schematic representation of the Lake Vostok ice core
7 NASA satellite images of the Jovian moon Europa and Mars surfaces
8 Global metabolic map reconstructed from the results of the KAAS-KEGG analysis of the V5 sequences
9 Reconstructed oxidative phosphorylation pathway connected to the TCA cycle
10 Carbon fixation processes via reductive pentose phosphate pathway
11 Reconstructed tricarboxylic acid (TCA, citric acid, citrate, or Krebs cycle) cycle
12 Additional reactions indicated in the V5 sample based on KAAS-KEGG sequence analyses
13 Sequence distribution based on conditions
14 Sequence distribution based on growth conditions
S1 Alanine, aspartate and glutamate metabolism (1)
S2 Alanine, aspartate and glutamate metabolism (2)
S3 Pyruvate metabolic processes with supporting pathways
S4 Pyruvate metabolism supplying other metabolic processes
S5 Purine metabolism
S6 Amino sugar and nucleotide sugar metabolism
S7 biosynthesis
S8 , and biosynthesis and supporting pathways
S9 , , threonine metabolism
S10 , and biosynthesis
S11 Glutathione and and methionine metabolism
S12 Part of supplying terpenoid backbone pathways
S13 Chlorophyrin and chlorophyll metabolism
S14 Degradation of aromatic compounds
S15 One carbon pool by folate cycle
S16 β-Alanine metabolism

LIST OF TABLES

Table

1 Overall sequences distribution within two Lake Vostok samples
2 Numbers of the unique sequences recovered from the V5 sample
3 Numbers of the unique sequences recovered from the V6 sample
4 The taxonomic summary of results for the V5 sequences
5 The taxonomic summary of results for the V6 sequences
S1 Small subunit rRNA genes of Bacteria and Eukarya from V5
S2 Large subunit rRNA genes of Bacteria and Eukarya from V5
S3 Ribosomal RNA sequences less than 200 nt in length from V5
S4 Bacteria mRNA (and other non-rRNA) gene sequences from V5
S5 Eukarya mRNA gene sequences (and several genomic records that have rRNA genes) from the V5 sample
S6 Archaea and Viruses from V5
S7 Small subunit rRNA genes of Bacteria and Eukarya from V6
S8 Large subunit rRNA genes of Bacteria and Eukarya from V6
S9 Ribosomal RNA gene sequences less than 200 nt in length from V6
S10 Bacteria mRNA (and other non-rRNA) gene sequences from the V6
S11 BLASTn and BLASTx results from analysis of V5 sequences on the KAAS KEGG site
S12 BLASTn and BLASTx results from analysis of V6 sequences on the KAAS KEGG site
S13 Sequences removed from the V5 data set
S14 Sequences removed from the V6 data set

CHAPTER I

Introduction to methods

This study presents a comprehensive analysis of metagenomic/metatranscriptomic data from Lake Vostok ice core meltwater. Metagenomic/metatranscriptomic analysis is an area of science that combines various techniques for sequencing isolated nucleic acids and determining their taxonomic associations. The term metagenomics was first proposed by Jo Handelsman and her colleagues in 1998 and was defined as the molecular analysis of the collective genomes of the total microbiota from a specific environment (Handelsman analyzed soil microflora) (Handelsman et al. 1998). Metagenomics primarily focuses on the analysis of environmental DNA in order to determine genetic and taxonomic sequence diversity. The protocols include DNA extraction, PCR amplification, and sequencing from environmental samples to construct sequence libraries of the various nucleic acids present in the sample (Handelsman et al. 1998). This provides a taxonomic representation of microbial community composition and can lead to an understanding of the metabolic activity that exists in the microbiota. The analysis can also include ribosomal RNA (rRNA) genes, which can reveal the phylogenetic diversity of species.

One strategy is to clone segments of extracted environmental DNA into a vector and sequence them. Most often these sequences are the ribosomal DNA (rDNA) regions, either full or partial sequences. Small subunit ribosomal DNA (SSU rDNA) sequences have highly conserved regions, which facilitates primer construction. The resulting sequences can give information about various taxa present in sampled environments (Rodrigues-Valera 2004). Thus, library construction and clone screening can be powerful techniques that can be applied for taxonomic analysis as well as for analysis of specific genes, their activities and functions. It is important to mention that PCR amplification was developed during 1983-1990 (the 1993 Nobel Prize was awarded to Kary Mullis) (Bilsker 1998). Prior to that time, cloning was done using restriction enzymes and ligation.

One of the major directions for metagenomic analysis is to determine species diversity based on genetic distances. Strains of bacteria can include similar genes, but the overall gene content and genomic size can be different. This can lead to physiological and ecological diversity.

Extracted environmental RNAs can be used for metatranscriptomic analysis, which is the analysis of the collective transcriptome of various species inhabiting the same environment. Presumably, this method can be used to predict metabolic processes within a microbial community, but the challenge here is to distinguish between functional (expressed) genes/enzymes and those that represent nucleic acids extracted from non-viable cells (Simon and Daniel 2012). The main concept of meta-molecular studies is based on environmental nucleic acid analyses; thus, the extract can include DNA and/or RNA from viable cells, as well as environmental DNA from nonviable ones. While DNA can be preserved for millions of years, recovery of high quality mRNA could be compromised due to its rapid degradation.

Additionally, metagenomic/metatranscriptomic samples will contain rRNAs, internal transcribed spacer (ITS) regions, tandem repeats, and various other genomic regions, all of which complicate sequence classification, annotation, and species affiliation. Determining the nature of the pooled RNA is also challenging for transcriptomic analysis. Messenger RNA sequences are in low abundance, whereas rRNAs and transfer RNAs (tRNA) are highly represented in metagenomic/metatranscriptomic extracts. Thus, constructing clone libraries for metatranscriptomic analysis becomes a very time-consuming procedure, which might not even detect some low-abundance mRNA sequences (Dupré and O’Malley 2007). This Lake Vostok study was designed to overcome those complications by utilizing random hexamer primer cDNA synthesis coupled with subsequent next-generation pyrosequencing instead of cloning sequences into vectors.

Ribosomal genes as metagenomic targets

Several omics protocols have been developed, and the main directions of analysis can be distinguished by the cellular process being analyzed. Genomics investigates modifications, sequence organization, gene transfer, loss of gene functions due to mutations, and natural selection. Transcriptomics analyzes different components of the gene expression mechanism, while proteomics looks into details of protein/enzyme structure and function, and metabolomics analyzes chemical reactions among metabolites based on their end products and intermediate compounds. Chromosomics and epigenomics analyze chromosomal rearrangements and modifications of genetic material that subsequently affect gene expression and/or inhibition.

Metagenomic analysis is a complex study of sequences recovered/extracted from organisms that occupy the same habitat. The primary goal of metagenomics is to describe microbial diversity in a specific environment. Although almost 99% of microbes cannot be grown under laboratory conditions, different primer sets can be used to amplify different regions of extracted environmental nucleic acids, which can then be cloned and sequenced. Depending on the target genomic region, amplicons can represent overall microbial diversity with particular phylotypes (universal primers) and/or can determine exact species, tissue, organelles, and specific genes (specific primers) (Rogers 2012). In the past decade, the development of various complete genome sequencing machines has enhanced the use of oligo dT primers for complementary DNA synthesis from mRNA, as well as random primer amplification. Amplification with random primers will produce complete and partial segments of mRNA, rRNA, and tRNA genes, as well as non-coding genomic regions. However, analysis of ribosomal RNA sequences still remains universal for determining microbial community composition.

A bacterial cell contains approximately 20,000 ribosomes, whereas a eukaryotic cell can have several million or more (Rogers 2012). One of the largest numbers of ribosomes was found in the amphibian oocyte (female reproductive cell), which contains up to 10^12 ribosomes (almost 200,000 times more than in average somatic cells) (Gilbert 2010). Such large numbers of ribosomes are needed to synthesize all types of proteins in each cell. Assemblies of ribosomal proteins with rRNAs build ribosomal subunits, which are the central components in mRNA translation. Bacterial and archaeal 70S ribosomes consist of two subunits: the 30S subunit includes the SSU rRNA (1.5-1.6 kb) and 21 proteins, and the 50S subunit includes the LSU rRNA (2.2-2.5 kb) and 5S rRNA (~120 bases) with 34 proteins. In eukaryotes, 80S ribosomes also contain two subunits: the 40S subunit with the SSU rRNA (1.7-1.9 kb) and ~33 proteins, and the 60S subunit with the LSU rRNA (up to 2.9 kb), 5.8S rRNA (~165 bases), 5S rRNA and ~49 proteins. Most often the rRNA genes are used as target sequences for metagenomic analysis.

Bacterial and archaeal rRNA genes are organized in operons, where one promoter regulates transcription of all rRNA genes. Each operon includes two external transcribed spacer (ETS) regions, on the 5’ and 3’ ends, and one internal transcribed spacer (ITS) region. The 5’ ETS is followed by the small subunit (SSU) rRNA gene, separated from the large subunit (LSU) rRNA gene by the ITS region. The 5S gene in Bacteria and Archaea is located after the LSU rRNA gene but before the 3’ ETS region. In Bacteria and Archaea, one or several tRNA genes can interrupt the ITS as well as the 3’ ETS regions. Eukaryotic rRNA genes are organized similarly; however, there are several differences. The SSU rRNA gene and LSU rRNA gene in eukaryotes are separated by two ITS regions with the 5.8S gene in the middle. While the eukaryotic ITS1 region shares similarity with the bacterial ITS region, the 5.8S gene was found to be homologous to a portion of the 5’ end of the bacterial LSU rRNA gene and was separated from the ancestral bacterial LSU rRNA by an insertion of the ITS2 sequence (Rogers 2012). Additionally, the eukaryotic 5S gene is not a part of the rRNA operon. It is located outside of the 3’ ETS and is transcribed by RNA polymerase III, like tRNA genes. Bacterial and archaeal organisms can have up to 15 copies of rRNA operons located in different parts of the chromosome, while eukaryotic rRNA genes are organized in tandem repeats, sometimes up to tens of thousands of repeats within one locus, and there can even be multiple such loci, tightly packed in the secondary constriction of the chromosome (Rogers 2012).

Ribosomal gene organization provides information for a broad range of studies at different taxonomic levels. Different portions of rRNA genes exhibit sequence similarities among all species (from single-celled to complex multicellular), which points towards the same ancestor(s). Due to their high conservation, SSU rRNA gene sequences are used to determine the phylogenetic relationships among species, which in turn shows the distribution of phyla and species richness in environmental samples. On the other hand, the partial conservation of the LSU rRNA genes can be used in the analysis of families and orders, whereas ITS regions are more variable and can be used to determine finer taxonomic levels, from families to strains. Depending on the conservation of the sequences, evolutionary events from ancient to the most recent may be determined. Different genomic regions have varying degrees of sequence conservation among organisms (the lower the level of conservation, the higher the sequence variability) and can be used to determine the diversity of microorganisms within taxonomic groups and their evolutionary relationships with other taxonomic groups (Rogers 2012).
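As an illustrative aid only (it is not part of the analyses reported in this dissertation), the gene orders described above can be written out as a small Python sketch. The list and function names are hypothetical, and the layouts simply restate the text: the bacterial and archaeal operon has a single ITS and carries the 5S gene before the 3' ETS, whereas the eukaryotic repeat has ITS1, the 5.8S gene, and ITS2 between the SSU and LSU genes, with the 5S gene located elsewhere. Interrupting tRNA genes are omitted for simplicity.

```python
# Illustrative sketch: element order restates the description in the text.
# All names here are hypothetical and chosen for readability.
BACTERIAL_ARCHAEAL_OPERON = [
    "5' ETS", "SSU rRNA", "ITS", "LSU rRNA", "5S rRNA", "3' ETS",
]
EUKARYOTIC_REPEAT = [
    "5' ETS", "SSU rRNA", "ITS1", "5.8S rRNA", "ITS2", "LSU rRNA", "3' ETS",
]

# Approximate taxonomic resolution of each region, as summarized in the text.
RESOLUTION = {
    "SSU rRNA": "phyla and broad phylogenetic relationships",
    "LSU rRNA": "families and orders",
    "ITS": "families down to strains",
}


def between(layout, left, right):
    """Return the elements located strictly between two named elements."""
    i, j = layout.index(left), layout.index(right)
    return layout[i + 1:j]


if __name__ == "__main__":
    print(between(BACTERIAL_ARCHAEAL_OPERON, "SSU rRNA", "LSU rRNA"))
    print(between(EUKARYOTIC_REPEAT, "SSU rRNA", "LSU rRNA"))
```

Comparing the two calls makes the main structural difference explicit: one spacer separates the SSU and LSU genes in Bacteria and Archaea, whereas two spacers flanking the 5.8S gene separate them in eukaryotes.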

Culturing and phylogenetic analysis

Various studies have shown that about 99% of the microbial community cannot currently be cultivated in the laboratory (Hugenholtz 2002; Rappe and Giovannoni 2003). For this reason, microbial diversity cannot be assessed using cultivation methods alone. Phylogenetic analysis is one of the most common techniques used in metagenomics. This method uses sequence comparisons to determine relationships among taxa. Based on the results, changes in gene sequence among the taxonomic groups can be monitored and their evolution can be reconstructed.

A unique metagenomic study was performed by Noah Fierer and colleagues on soil samples (Fierer et al. 2007). They examined SSU rRNA sequences retrieved from soil samples obtained from three different environments: prairie, desert, and rainforest (Fierer et al. 2007). This project had several important impacts. First, by analyzing SSU rRNA sequences, they determined the diversity of three major taxonomic groups (Bacteria, Archaea, and Fungi) in each set of soil samples. Second, by constructing clone libraries, they estimated the number of clones needed to reach the point where most of the unique sequences in the sample have been determined (assessed with a rarefaction curve). It is important to understand whether an adequate sampling of clones has been achieved. Different environments have unique levels of species richness; thus, the required number of sampled clones can vary significantly depending on the environment. While Fierer’s rarefaction curves did not reach an asymptote, they managed to estimate microbial diversity based on the observed abundance distribution of the taxonomic units and the estimated richness of each community. Fierer noted that richness estimates should be used for comparing richness levels between taxonomic groups rather than for estimating the exact number of taxonomic units in each tested habitat. Overall richness levels of all taxonomic units were predicted to be very high in all sampling areas (Fierer et al. 2007), and the estimated richness of fungi, archaea and viruses was higher than that of soil bacteria in all three sampling locations. However, the main point of their study was that all analyzed taxonomic units are not just diverse within their own environment but also have a very low overlap between the sampling sites. Phylogenetic analysis confirmed that none of the bacterial, fungal or archaeal taxonomic units were found at more than one site. Overall diversity estimation of bacterial sequences in all three soils was shown to be close to other studies, whereas fungal richness estimates were significantly higher than those obtained from other studies (3000 species per 400-ha site). Only one overlapping 20 bp region was found for all viral sequences between the sampling sites. Further comparison of viral sequences with 510 phage genomes revealed that similar types of phages (but distant from marine viruses) were present in all three soil samples, of which the most abundant were bacterial pathogens. Fierer concluded that the analyzed soil biota contained much higher diversity than had previously been reported; thus, soil microbial communities (as well as any other environments) are far from being completely understood.
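The idea behind a rarefaction curve can be illustrated with a short Python sketch. This is not the procedure used by Fierer et al. (2007); it is the standard hypergeometric expectation of the number of taxonomic units observed in a random subsample of n clones, and the clone counts below are invented purely for illustration. The function name is likewise hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of taxonomic units seen when n clones are drawn
    at random, without replacement, from a library with these counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the library size")
    denom = comb(total, n)
    # A taxon with c clones is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in counts)


if __name__ == "__main__":
    # Hypothetical clone library: five taxonomic units, 100 clones in total.
    library = [50, 30, 15, 4, 1]
    for n in (0, 10, 25, 50, 100):
        print(n, round(expected_richness(library, n), 2))
```

Plotting the expected richness against n gives the rarefaction curve. A curve that is still rising at the full library size, as reported for the soil libraries above, indicates that additional clones would continue to reveal new taxonomic units.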

Cold environments

Microbial community structure in freshwater differs from that in seas and oceans, and varies from season to season. More recent metagenomic projects on ice samples have revealed new information about the changes in community structure. Cold environments (such as ice and permafrost) are the biggest reservoirs of preserved microbial biomass (Simon et al. 2009). Analysis of accretion and glacial ice cores from Antarctic Lake Vostok showed the presence of a complex microbial ecosystem (D’Elia et al. 2008, 2009). Ice is a unique matrix, which entraps huge numbers of viable microorganisms and releases them when the ice melts. Two types of ice cores were used for that study: accretion ice that was approximately 5,000 years old, and glacial ice that was up to 2 million years old. Antarctic ice holds a long history of entrapped atmospheric contents (including microbes) that can possibly yield information about ancestral microorganisms as indicators of past conditions on Earth. Using cultivation, PCR, sequencing, and phylogenetic analysis, D’Elia and colleagues identified Antarctic bacteria sequences closely related to species of , , and (D’Elia et al. 2008).

In 2009, 16S rRNA cloning and pyrosequencing analysis of DNA from the German Alps glacial ice by Carola Simon and colleagues showed the presence of almost all known phyla of bacteria (Figure 1) (Simon et al. 2009). They found that species are highly represented in Alps glacial ice as well as Alphaproteobacteria and species. Based on the number of sequences in the created 16S rRNA gene libraries and the number of identified taxonomic units, they created rarefaction curves, showing that a relatively small clone library is needed to reach a saturation point when determining microbial diversity in glacial ice. At the species level, a saturation point can be reached after sequencing 300 clones. On the other hand, they also showed that pyrosequencing of DNA from glacier ice was a much more powerful tool than cloning and resulted in almost 150,000 sequences. While the same taxonomic groups (Alphaproteobacteria, Betaproteobacteria, Bacteroidetes) were found to be the most abundant in both molecular analyses, sequences related to , , Chlorobi and Eukaryotes were detected only by pyrosequencing.

Figure 1. Metagenomic analysis of the German Alps glacier ice samples. Three different approaches for analyzing microorganism distribution in the glacial ice from the German Alps, Northern Schneeferner region. Distribution of amplified 16S rRNA gene sequences: column A - with best hits to the Ribosomal Database Project II database (Cole et al. 2004), column B - from the pyrosequencing assay, and column C - after using a specific Molecular Dynamic Analysis CARMA program (Glykos, 2006) for assembling pyrosequencing results to match lengths of greater than 200 bases with more than 60% confidence estimation. This figure was adapted from Simon et al. 2009.

Within the pyrosequencing data, 142,487 sequences matched orthologous genes for metabolic enzymes, of which the majority (41,723) matched various genes related to carbohydrate metabolism and 28,859 matched genes similar to those responsible for energy synthesis. Interestingly, most of the sequences (~9000) from glacial ice were closely related to genes responsible for oxidative phosphorylation; only a few sequences were similar to components of photosystems and rhodopsin-like proteins.

Glacial ice is covered with snow almost all year round, and because of the high reflectivity of snow almost no light can pass through, which means that microbes entrapped in the ice would not be able to perform photosynthesis. Approximately 5540 sequences were closely related to genes for nitrogen metabolism, showing an abundance of dissimilatory and assimilatory nitrate/nitrite reduction and ammonia oxidation genes (Simon et al. 2009). Analysis of the metabolic pathways together with taxonomic representation showed that the upper layers of the German Alps ice are primarily inhabited by aerobic (or facultatively aerobic) and non-phototrophic species that prefer low organic carbon concentrations and have a primarily psychrophilic lifestyle. The presence of denitrification genes (nitrate/nitrite reduction) was assigned to facultative aerobic species of , which were found to be the most abundant in the glacial ice.

Genomic sequencing overview

In collaboration with his colleagues, J. Craig Venter made one of the biggest breakthroughs in metagenomics research while working on the microbial metagenomics project in the Sargasso Sea (Venter et al. 2004). After using size-selective filtration to remove the eukaryotic cell population, they performed whole-genome shotgun sequencing. As a result, a total of 1.045 billion base pairs of non-redundant sequence were generated and examined for gene content and species diversity. Approximately 1800 genomic sequences were phylogenetically analyzed. This number also includes 148 novel bacterial phylotypes that were found during their research. Analysis of the gene content of the total sequence revealed 1.2 million previously unknown genes, 782 of which were identified as new rhodopsin-like photoreceptors (Venter et al. 2004).

In the rapidly developing area of sequencing, many methods are now available. The basis for the shotgun sequencing technique that was used by the Venter lab was first proposed in 1979 (Staden 1979). With this method, cloned DNA is passed through several steps of fragmentation and sequencing. This method has proven to be very useful for environmental metagenomic analysis. First of all, reads obtained from this type of sequencing are easy to assemble into larger contiguous pieces (called contigs), as they are sequenced at several-fold coverage. During contig assembly, fragments of different sizes overlap with one another and a consensus sequence is deduced. Because it increases the accuracy of the sequence assembly, this method has been used for whole genome sequencing of individual species (Venter et al. 2004). However, bacterial and eukaryotic sequences should not both be present in the sample, because the conservation of some genes creates the possibility of producing inaccurate chimeric contigs. For instance, due to the high conservation in the ribosomal regions, it is difficult for the assembly program to distinguish the source of the small fragments within the sample. For this reason, Venter and colleagues performed bacterial sequencing separately from eukaryotic sequencing (Venter et al. 2004).
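The overlap-and-merge principle behind contig assembly can be sketched in a few lines of Python. This is a toy greedy merger written for illustration, not the assembler used by Venter et al. (2004); the function names, the minimum overlap of three bases, and the example reads are all assumptions. Real assemblers also handle sequencing errors, reverse complements, and consensus calling across many overlapping reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if no overlap of at least min_len bases exists)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Hypothetical reads that tile one short sequence.
    print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))
```

Because the merger relies only on shared bases, two reads from different organisms that share a conserved stretch (for example, within an rRNA gene) would be joined just as readily, which is the chimeric contig problem noted above.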

Lake Vostok accretion ice

For the past 15 years, Lake Vostok ice core biological research was mostly based on fluorescence and electron microscopy, culturing, 16S rRNA sequencing and limited phylogenetic analyses. Prior to our study, there were no reports of successful metagenomic analysis of Vostok accretion ice. The cell and nucleic acid concentrations in the accretion ice vary depending on ice core depth, and thus accretion ice differs by region of the lake. The two types of accretion ice differ in their numbers of inclusions (minerals and cells). Ice that refreezes to the bottom of the glacier in the vicinity of the shallow embayment (the region between the glacier entry point and a peninsula), as well as near the shores of the main basin, is termed type I accretion ice and contains high concentrations of inclusions (Bell et al. 2005; Siegert et al. 2001). Type II accretion ice is formed over the deeper portions of the embayment and the main basin. Most of this type of ice is extremely clear (almost no inclusions) and was shown to have low concentrations of ions (Bell et al. 2005; Siegert et al. 2001). The microbial population structure and quantity of organisms vary between these two types of ice. Type I accretion ice has higher ion concentrations (Siegert et al. 2001), biomass, and nucleic acids (Siegert et al. 2001; Priscu et al. 1999; Karl et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Rogers et al. 2013; Shtarkman et al. 2013), while type II contains low concentrations of ions, biomass, and nucleic acids (Priscu et al. 1999; Karl et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Rogers et al. 2013; Shtarkman et al. 2013; Lipenkov et al. 2002). Previous studies based on epifluorescence microscopy, electron microscopy, DNA analysis and culturing have shown that the mean numbers of cells in the ice cores range from 0-3 and up to 12 cells/ml, whereas other studies have reported concentrations of 100 to 700 cells/ml (Karl et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Rogers et al. 2013; Shtarkman et al. 2013). While means for total (12-13 cells/ml) and viable cell (6-7 cells/ml) counts were previously shown to be highest for accretion ice from 3584/85 m (D’Elia et al. 2008), individual counts were as high as 35 cells per ml (S.O. Rogers, personal communication).

Lake Vostok sample preparation

In order to maximize the recovery of nucleic acids from the Lake Vostok samples and to preserve them, we used a MinElute Virus Spin Kit (QIAGEN, Valencia, CA). This kit was specifically developed to extract DNA and RNA from viruses. Even though the kit was designed for viral nucleic acid extraction, the protocol makes it possible to extract RNA and DNA from many types of organisms and from environmental samples. The Vostok accretion ice meltwater samples were filtered (MF-Millipore™ 0.22 μm Membrane Filters, EMD Millipore Corporation, Billerica, MA, USA) and the filters were stored at -80°C, whereas the flow-through was ultracentrifuged (100,000 × g, or 32,500 rpm, for 16 hours) and the pellets were used for nucleic acid extraction. The pellets would contain organisms smaller than the 0.22 μm filter pore size, viruses, and DNA and RNA from broken cells. The MinElute Virus Spin Kit therefore served as a convenient tool for environmental DNA and RNA extraction, while also protecting against cross-sample contamination (QIAamp® MinElute® Virus Spin Handbook, 2010).

Although metagenomic protocols primarily target sequences for specific genes, in this study the nucleic acids were amplified with random hexamer primers in order to construct cDNA libraries from ribosomal RNA gene regions, other non-coding RNAs and mRNAs present in the sample. Short random primer amplification is a common method for producing cDNA from RNAs of unknown sequence composition, and it is beneficial for several reasons. First, a mix of random hexamer primers contains 4096 different sequences, increasing the chances of binding to any segment of almost any nucleotide sequence. Second, during cDNA synthesis random primers can create multiple fragments of different sizes from each gene. Thus, the multiple cDNA copies of different sizes increase the sequencing coverage of each RNA sequence by several-fold. More importantly for metatranscriptomic studies, random hexamer primers are much more beneficial than oligo dT primers, which work only on mRNAs that have poly-A tails. As the size of an mRNA sequence can vary from 1-1.5 kb (in bacteria) up to 38 kb (the human titin gene contains 363 exons), reverse transcriptase is less likely to convert the 5’ end of an mRNA into cDNA with oligo dT primers than with random primers.
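The figure of 4096 hexamer sequences follows from four possible bases at each of six positions (4 to the power of 6). A minimal sketch that enumerates them is given here; the function name and the DNA alphabet constant are illustrative assumptions, not part of the laboratory protocol.

```python
from itertools import product

BASES = "ACGT"  # DNA alphabet assumed for the primer mix


def all_primers(length=6):
    """Return every possible primer sequence of the given length."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


hexamers = all_primers(6)
print(len(hexamers))   # number of distinct hexamers
print(hexamers[:3])    # first few sequences in lexicographic order
```

Because every possible six-base sequence is present in the mix, some primer can anneal at almost any position along a template, which is what allows priming without prior knowledge of the sequence.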

Software overview

Computational programs for analysis of the sequence reads were selected based on the software available on the Ohio Supercomputer Center (OSC) server. Most of the available clipping, assembly, and Basic Local Alignment Search Tool (BLAST) programs can be run on a personal computer as well as from command-line scripts. Due to the large size of the sequence files (more than 1Mb of sequences), it is faster and easier to process them on dedicated servers with preinstalled software packages. In addition, for downloaded and installed programs to function properly, a personal computer must have enough storage space and a powerful processor. Therefore, running the programs on the OSC server was more efficient. Moreover, the scripting system and execution commands on the OSC server are common to different computational languages (BioPerl, Matlab and others; https://www.osc.edu/supercomputing/software-list), which are provided as separate modules. Examples of batch files with descriptions can also be found on the OSC web site (https://www.osc.edu/supercomputing/batch-processing-at-osc). In this study we used a Python script (http://bioinf.comav.upv.es/sff_extract/download.html) to clip off the 454 primer sequences, the MIRA 3.0.5 assembly program (Chevreux et al. 1999), and the BLAST search tool, which is part of the Biosoftw module on the OSC (https://www.osc.edu/supercomputing/software/BLAST). Two additional software programs were used: FileMaker Pro 11 (http://www.filemaker.com/support/product/documentation.html; FileMaker, Inc. Santa Clara, CA) and PAUP (Swofford, 2003). The former was used for database analysis, as it has a convenient interface and an easy scripting method, described in the FileMaker Pro 11 Script Steps Reference under the documentation link. The latter is a phylogenetic package that includes several methods for phylogenetic analyses. PAUP is also compatible with several operating systems, and files can be output in several formats.
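The clipping step can be pictured with a toy sketch that removes a known primer from the start of each read. This is only an illustration on plain strings: the primer sequence, the reads, and the function name are hypothetical, and the sketch does not reproduce how the cited script processes 454 output files.

```python
def clip_primer(read, primer):
    """Remove `primer` from the 5' end of `read` if it is present there."""
    if read.startswith(primer):
        return read[len(primer):]
    return read


PRIMER = "TCAG"  # hypothetical placeholder, not the primer used in this study

# Hypothetical reads: two carry the primer, one does not.
reads = ["TCAGATGGCGTAC", "ATGGCGTAC", "TCAGTTAGGCA"]

clipped = [clip_primer(read, PRIMER) for read in reads]
print(clipped)
```

Reads that do not begin with the primer are returned unchanged, so the step is safe to apply to a mixed set of reads before assembly.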

Summary

Current metagenomic protocols consist of two major steps for analysis of environmental nucleic acids. The first step combines microbiological techniques and molecular genetics approaches for sequence determination. The second step applies bioinformatics software to sequenced environmental nucleic acids to estimate taxonomic affinities to known organisms and to determine metabolic pathways within microbial communities. The combination of taxonomic analysis and metabolic reconstruction makes it possible to understand the overall species richness and physiology of these organisms in specific environments. Moreover, for isolated environments like Lake Vostok, where organisms have undergone long-term evolutionary processes (about 15 million years to adapt and modify for survival), the taxonomic, metabolic and physiological characteristics can be used to predict some of the geological and hydrological aspects of the lake. From a practical standpoint, the computational analysis was retested, and the estimated time was 10 days from obtaining the 454 reads to the full retrieval and database analysis for the dataset of 150,000 reads. Figure 2 represents the analytical steps for the metagenomic analysis, which, combined together, can be used to describe microbial populations and predict their metabolic dynamics (Deutschbauer et al. 2006).

Figure 2. Various strategies for the metagenomic/metatranscriptomic analyses. Summary of the ways to analyze environmental data, starting clockwise (middle left) from collecting samples and clone library construction, followed by a series of genotyping procedures, sequencing processes and phylogenetic analysis, and closing the cycle with functional genomics microarray analysis, proteomics, and community modeling. Red arrows and circles represent the directions used in the current study. Note that we constructed cDNA libraries for portions of the sequence analysis. The cloning technique was used for another metagenomic project, which is described in Chapter II. This figure was adapted from Deutschbauer et al. 2006.

References for Chapter I

Bell R, Studinger M, Tikku A, Castello JD (2005) Comparative biological analyses of accretion ice from subglacial Lake Vostok. In: Life in Ancient Ice; Castello JD, Rogers SO, Eds.; Princeton University Press: Princeton, NJ, U.S.A, pp. 251-267.

Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP,

Evers DJ, Barnes CL, Bignell HR, et al. (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456: 53-59.

Bilsker R (1998) Ethnography of a Nobel Prize. International Journal for Philosophy of

Chemistry, Vol. 4(2): 167-169. Review of: Rabinow P (1996) Making PCR: A Story of

Biotechnology. The University of Chicago Press, Chicago -vii, 190 pp. (ISBN: 0-226-70147-6).

Chevreux B, Wetter T and Suhai S (1999) Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proc German Conf on

Bioinf (GCB) 99: 45-56.

Christner BC, Mosley-Thompson E, Thompson LG, and Reeve JN (2001) Isolation of bacteria and 16S rDNAs from Lake Vostok accretion ice. Environm Microbiol 3(9): 570-577.

Christner BC, Royston-Bishop G, Foreman CM, Arnold BR, Tranter M, Welch KA, Lyons

WB, Tsapin AI, Studinger M, Priscu JC (2006) Limnological conditions in Subglacial Lake

Vostok, Antarctica. Limnol Oceanogr 51(6): 2485-2501.

Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, and Tiedje JM

(2004) The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 33: D294–D296.

D’Elia T, Veerapaneni R and Rogers SO (2008) Isolations of microbes from the Lake Vostok accretion ice. Appl Environ Microbiol 74(15): 4962-4965.

D’Elia T, Veerapaneni R, Theraisnathan V and Rogers SO (2009) Isolations of fungi from the Lake Vostok accretion ice. Mycologia 101(6): 751-763.

Deutschbauer AM, Chivian D, Arkin AP (2006) Genomics for environmental microbiology. Current Opinion in Biotechnology. Elsevier Ltd. 17(3): 229–235.

Dupré J, O’Malley MA (2007) Metagenomics and biological ontology. Elsevier Ltd 38(4):

834–846.

Fierer N, Breitbart M, Nulton J, Salamon P, Lozupone C, Jones R, Robeson M, Edwards R

A, Felts B, Rayhawk S et al. (2007) Metagenomic and Small-Subunit rRNA Analyses Reveal the

Genetic Diversity of Bacteria, Archaea, Fungi, and Viruses in Soil. Appl Environ Microbiol

73(21): 7059–7066.

Gilbert SF (2010) Developmental biology. Ninth Edition. Sinauer Associated. Sunderland,

MA.

Glykos NM (2006) Software news and updates. Carma: a molecular dynamics analysis program. J Comput Chem. 27(14): 1765-1768.

Guo J, Yu L, Turro NJ, Ju J (2010) An integrated system for DNA sequencing by synthesis using novel nucleotide analogues. Acc Chem Res. 43(4): 551-563.

Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM (1998) Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products. Chem

Biol 5(10): R245-249.

Hawkins ME (2003) Fluorescent Nucleoside Analogues as DNA Probes. Pharmacology and Experimental Therapeutics Section, National Cancer Institute, Bethesda, Maryland. In: Topics in Fluorescence Spectroscopy 7: 151-175. Lakowicz JR (2003) DNA Technology. Kluwer Academic / Plenum Publishers, New York.

Hugenholtz P (2002) Exploring prokaryotic diversity in the genomic era. Genome Biology

3(2): 1–8.

Ju J, Kim DH, Bi L, Meng Q, Bai X, Li Z, Li X, Marma MS, Shi S, Wu J, et al. (2006) Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators. PNAS. 103(52): 19635-19640.

Karl DM, Bird DF, Björkman K, Houlihan T, Shackelford R, Tupas L (1999)

Microorganisms in the Accreted ice of Lake Vostok, Antarctica. Science 286(5447): 2144-2147.

Lipenkov V, Istomin V, Bulat S, Raynaud D, Petit JR (2002) An Estimate of the dissolved concentration in subglacial Lake Vostok. AGU, Spring Meeting, abstract #B21A-06.

Mardis EL (2009) Next-Generation DNA Sequencing Methods. Annu Rev Genomics Hum

Genet 9: 387–402.

Metzker ML, Raghavachari R, Richards S, Jacutin SE, Civitello A, Burgess K, and Gibbs RA (1994) Termination of DNA synthesis by novel 3'-modified-deoxyribonucleoside 5'-triphosphates. Nucleic Acids Res. 22(20): 4259–4267.

MinElute Virus Spin Kits. QIAamp® MinElute® Virus Spin Handbook. QIAGEN, all rights reserved (2010).

Priscu JC, Adams EE, Lyons WB, Voytek MA, Mogk DW, Brown RL, McKay CP, Takacs

CD, Welch KA, Wolf CF, et al. (1999) Geomicrobiology of subglacial ice above Lake Vostok,

Antarctica. Science 286(5447): 2141-2144.

Rappe M, Giovannoni S (2003) The uncultured microbial majority. Annu Rev Microbiol 57:

369–394.

Rodrigues-Valera F (2004) Environmental genomics, the big picture? FEMS Microbiology Letters 231(2): 153-158.

Rogers SO (2012) Integrated Molecular Evolution. CRC Press, Boca Raton, FL.

Rogers SO, Shtarkman YM, Koçer ZA, Edgar R, Veerapaneni R, and D’Elia T (2013)

Ecology of subglacial Lake Vostok (Antarctica) based on metagenomic/metatranscriptomic analyses of accretion ice. Biology 2(2): 629-650.

Rothberg JM & Leamon JH (2008) The development and impact of 454 sequencing. Nature

Biotechnology 26(10): 1117-1124.

Shtarkman YM, Koçer ZA, Edgar R, Veerapaneni RS, D’Elia T, Morris PF, Rogers SO

(2013) Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya. PLoS ONE 8(7): 1-13.

Siegert MJ, Ellis-Evans JC, Tranter M, Mayer C, Petit J, et al. (2001) Physical, chemical and biological processes in Lake Vostok and other Antarctic subglacial lakes. Nature 414: 603-609.

Simon C, Wiezer A, Strittmatter AW, and Daniel R (2009) Phylogenetic Diversity and Metabolic Potential Revealed in a Glacier Ice Metagenome. Appl Environ Microbiol 75(23): 7519–7526.

Simon C. and Daniel R (2012) Metagenomic Analysis: Past and Future Trends. Appl Environ

Microbiol 77(4): 1153–1161.

Staden R (1979) A strategy of DNA sequencing employing computer programs. Nucleic

Acids Res 6(7): 2601-2610.

Swofford DL (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other

Methods). Version 4. Sinauer Associates. Sunderland, Massachusetts.

Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, et al. (2004) Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304: 66-74.

CHAPTER II

Introduction

Lake Vostok is the largest and deepest of 400 subglacial lakes in Antarctica. It is of great interest to scientists because of its unique extreme conditions, nutrient levels and long-term isolation from the atmosphere (MacGregor et al. 2009; Wright and Siegert, 2011; Kapitsa et al. 1996; Studinger et al. 2003). Lake Vostok covers an area of 15,690 km² (Figure 3), which is close to the size of Lake Ontario (18,960 km²). It is the sixth largest lake (by volume) on Earth. Ice cover formation started approximately 34 million years (Myr) ago, whereas complete isolation of the lake from the atmosphere began more than 15 Myr ago (Barrett 2003; Kapitsa et al. 1996; Siegert et al. 2001; D’Elia et al. 2009; Bulat et al. 2004). Previous studies described some of the unique environmental conditions of the lake (Ekaykin et al. 2010; Siegert et al. 2001; Salamatin et al. 2009; Priscu et al. 1999, 2005; Christner et al. 2006). Lake Vostok is located below about 4000 m of glacial ice (3700 m thick over some portions of the lake and 4200 m in other locations) (Christner et al. 2006; Abyzov et al. 2001; Bulat et al. 2009; Lipenkov et al. 2002). This results in a pressure of 350–400 atmospheres (up to 5878 psi) and the complete absence of sunlight. The coldest temperature ever recorded on Earth (-89°C) was measured on 21 July 1983 (Turner et al. 2009) at the surface of the glacier covering Lake Vostok, whereas the average temperature of the lake itself could be as low as -3°C (Siegert et al. 2001; Bulat et al. 2004, 2009; Abyzov et al. 2001; Karl et al. 1999; D’Elia et al. 2008; Jean-Baptiste et al. 2001). Because the lake is not frozen, and because of possible hydrothermal/geothermal activity and ablation, portions of the bottom of the glacier melt into the lake. As the glacier moves across the lake to the drill site, the lake water refreezes (accretes) to the bottom of the glacier, forming a 230 m-thick layer of ice. Two types of accretion ice have been described.
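As a quick check of the pressure units quoted above, the conversion from atmospheres to pounds per square inch can be sketched as follows. The conversion factor (1 atm is about 14.696 psi) is a standard value rather than something taken from the cited sources, and the function name is only illustrative.

```python
PSI_PER_ATM = 14.696  # standard conversion factor (assumption: not from the cited sources)


def atm_to_psi(pressure_atm):
    """Convert a pressure in atmospheres to pounds per square inch."""
    return pressure_atm * PSI_PER_ATM


for atm in (350, 400):
    print(f"{atm} atm is about {atm_to_psi(atm):.0f} psi")
```

With this factor, the quoted value of 5878 psi corresponds to the upper end of the range (400 atmospheres), while the lower end (350 atmospheres) is roughly 5100 psi.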

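[Figure 3 image. Panel a: Antarctica, with the location of Lake Vostok. Panel b: satellite image of Lake Vostok with cardinal directions (N, S, E, W), a 50 km scale bar, and numbered arrows 1-3. Side panel: Subglacial Lake Vostok, below 3700-4200 m of glacier ice; length 250 km; width 50 km; mean depth 400 m; max. depth 1200 m; size 14000 km²; volume 5400 km³; pressure 400 atm; temperature -2 to -3 °C; nutrients low; oxygen high; nitrogen high.]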

Figure 3. Position and location of subglacial Lake Vostok. a. Antarctic continent with the location of Lake Vostok. b. Enlarged satellite image of Lake Vostok. The black dashed line is an approximate outline of the lake. White letters S, N, E, W represent cardinal directions. Dashed white arrows represent the glacier entry point into the lake. Solid white arrows and numbers correspond to: 1. Vostok Station; 2. peninsula or ridge; 3. shallow embayment area. Figures a. and b. are courtesy of NASA Visualization Studio; images were made using the RADARSAT dataset of Antarctica by Stuart A. Snodgrass on November 8, 1999. The geological and chemical characteristics of the lake are shown to the left of figures a. and b. This figure was edited from Rogers et al. 2013.

Type I accretion ice is formed in the vicinity of the shallow embayment and in the shallow portions of the main basin, in the proximity of a peninsula (Figure 4). This type of ice contains fine particulate matter (Bell et al. 2005), as well as higher concentrations of ions (Siegert et al. 2001), biomass, and nucleic acids (Siegert et al. 2001; Christner et al. 2006; D’Elia et al. 2008, 2009; Rogers et al. 2013; Shtarkman et al. 2013). Type II accretion ice is formed over deeper portions of the shallow embayment and in the southern main basin (Figure 4). It is extremely clear (almost no inclusions) and has low concentrations of ions, organisms and nucleic acids (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Lipenkov et al. 2002; Karl et al. 1999). It has been estimated that the mean accretion rate (if the melting areas are the same size as the refreezing areas) in the lake is close to 2.3 cm/year, which means that 230 m of accretion ice at the bottom of the glacier would form in approximately 10,000 years. However, the accretion rate changes depending on the location along the glacier flow line; for example, the shallow embayment accretion rate was estimated to be lower than that of the main basin (Siegert et al. 2001; Lipenkov et al. 2002; Bell et al. 2002, 2005). Bell suggested that if none of the embayment accretion ice is transferred over the peninsula, then the rate along the western shore would be close to 2.5 cm/year, whereas close to the eastern shore the accretion rate could be up to 2.9 cm/year (Bell et al. 2002). This correlates with the possible presence of hydrothermal activity leading to more melting than refreezing in the embayment region of the lake (Priscu et al. 2005; Bulat et al. 2004; Karl et al. 1999; Bell and Karl, 1999; Bell et al. 2002; Bulat et al. 2011). Even though the nutrient levels are very low, it has been reported that the level of oxygen is high, because tiny bubbles of atmospheric gases are constantly delivered to the lake from melting glacial ice (Siegert et al. 2001; Priscu et al. 1999; Lipenkov et al. 2002; Karl et al. 1999).
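The estimate of roughly 10,000 years for 230 m of accretion ice follows from dividing the thickness by the mean accretion rate. A minimal sketch of that arithmetic is given here; the function name is illustrative, and a constant rate over the whole period is assumed, as in the estimate itself.

```python
def accretion_time_years(thickness_m, rate_cm_per_year):
    """Years needed to accrete `thickness_m` of ice at a constant rate."""
    thickness_cm = thickness_m * 100.0
    return thickness_cm / rate_cm_per_year


years = accretion_time_years(230, 2.3)
print(f"about {years:,.0f} years")
```

The same function can be applied to any other thickness and rate, provided the rate is treated as constant over the interval in question.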

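[Figure 4 image: map of the southern part of Lake Vostok with compass directions (N, S, E, W), a 10 km scale bar, 78° and 105° coordinate marks, the labeled regions Embayment, Peninsula, and Main basin, and the sample locations V5 and V6.]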

Figure 4. Graphical representation of the southern part of Lake Vostok, redrawn to scale from Salamatin et al. 2009. Black dashed lines represent the elevation above sea level of the ice above the lake. Black dashed arrows on the western shore of the lake represent the glacier entry to the lake. The red arrow represents the glacier flow line to the drill site, with accretion ice accumulating along the flow line. The 78° and 105° marks represent latitude and longitude coordinates, respectively; the Vostok station coordinates are 78° 27’ 51.92” S and 106° 50’ 14.38” E. Possible shallow regions of the lake along the coast are shown in dark gray. Ice core sections at 3563 and 3584/3585 m accreted in the shallow embayment and represent sample V5, and core sections at 3606 and 3621 m accreted in the southern main basin of Lake Vostok and represent sample V6.

In 2006 it was discovered that a network of rivers exists under the ice, connecting Antarctica's subglacial lakes. This network might be another source of oxygen in the lake, but so far it is not certain whether Lake Vostok is a part of this network (Duncan et al. 2006; Priscu et al. 2008). However, the surface of Lake Vostok is more than 200 m below sea level, and therefore it may only receive inputs from other lakes and/or the ocean, but might not have an outlet.

Lake Vostok origin

During the late Mesozoic era, approximately 184 Myr ago, Gondwana began to break apart. Antarctica, together with Madagascar, India, and Australia, began to separate from the African continent (Chatterjee and Scotese 1999). Subsequently, Antarctica started to separate from Australia about 80 Myr ago and became completely separate about 40 Myr ago, moving towards the South Pole. Lake Vostok was probably already forming by that time. The first explanation of its origin was based on the shape of the lake. By comparing the elongated, narrow shape of Lake Vostok with similarly shaped lakes such as Lake Malawi (Africa), Robin Bell deduced its geological origin (Bell and Karl 1999). During the breakup of Gondwana, tectonic/lithospheric plate movement caused the formation of the African Rift Valley, parts of which have become lakes (Bell and Karl 1999). Bell proposed that Lake Vostok originated in the same way. Based on the crustal architecture and possible uplift mechanisms of the Gamburtsev subglacial mountains (Antarctica) ~60 Myr ago, it was determined that Lake Vostok originated in a graben at the base of those mountains, as the land subsided in that region (Bell and Karl 1999; Ferraccioli et al. 2011). On the other hand, considering that 50 Myr ago sea level was much higher than now, it is possible that the Antarctic continent, while moving to its current location about 34 Myr ago, was at least partly submerged in what we now know as the Southern Ocean (Rogers et al. 2013; Pross et al. 2012).

Presently, the origin of the lake and the possible presence of any life forms are still debated. In 2012 Jorg Pross and colleagues published a paper discussing possible climate conditions of the Antarctic continent during the late Paleocene (~56 Myr) and early Eocene (~48 Myr). Analyzing the Wilkes Land coast of East Antarctica, they concluded that during these periods environmental conditions could support the growth of near-tropical (mesothermal) forests (Pross et al. 2012). This conclusion came from microscopy analysis of the sediment, sporomorph-based climate reconstruction, organic geochemistry by high-performance liquid chromatography (HPLC), and atmospheric pressure chemical ionization mass spectrometry (APCI-MS). Additionally, in the course of their study they defined two biomes of the Wilkes Land region. A paratropical rainforest biome, representing the Early Eocene (~54-52 Myr), consisted primarily of palms (Arecaceae), Bombacoideae (Malvaceae), and some Olacaceae species. This changed to a temperate rainforest biome at the beginning of the Middle Eocene (~49-46 Myr), which completely replaced the paratropical biome. Pross and colleagues also mentioned that the past Antarctic paratropical biome is still represented in Australia, New Guinea, and New Caledonia (Pross et al. 2012). The presence of Nothofagus fusca pointed towards cooling climatic conditions. Based on the predicted greenhouse gas concentrations, they produced a continental temperature reconstruction, suggesting coldest-month mean temperatures of ≥10°C and mean winter temperatures of 11±5°C during the Early Eocene (which is consistent with the deep water temperatures for that time period). As Antarctica moved close to its current location (~46-34 Myr), the temperature decreased. In the Middle and Late Eocene, the presence of the rainforests and the warm surface water temperatures of the surrounding Australo-Antarctic gulf created a high-moisture environment, leading to cold rains and possibly snow (Barrett 2003; Pross et al. 2012).

The waters surrounding the Australo-Antarctic continent continued to cool until approximately 35 Myr ago (first half of the Oligocene), when West Antarctica separated from South America. This event caused the disconnection of the warm currents that previously passed over the northern side of Antarctica, subsequently cooling the continent until, 15 Myr ago, it became completely covered with ice (Priscu et al. 1999; Abyzov et al. 2001, 2004; Karl et al. 1999). As the Antarctic Gamburtsev Mountains continued to rise, the ice sheet (Barrett 2003; Bell and Karl 1999; Pross et al. 2012; Bell et al. 2002) started to spread, covering East Antarctica and subsequently Lake Vostok and other neighboring lakes. Today, the nearest high point to Lake Vostok is Ridge B, which reaches 3800 m, whereas Vostok Station is at 3480 m, an elevation difference of 320 m. The glacial ice sheet travels 310 km east from Ridge B towards the lake, and then another 60 km from the point where it enters the lake towards the drill site, in a southeasterly direction (Salamatin et al. 2009) (Figure 4).

Antarctic drilling project

The primary goal of the Russian expeditions in the 1950s was to determine the thickness of the ice cover in Antarctica. During the first expeditions in 1955 and 1957, the Russian geographer Andrej Kapitsa, using directed explosions and seismic equipment, tried to determine the presence of the lake, but discovered only that the ice thickness was about 4 kilometers (http://lenta.ru/news/2011/08/03/kapitsa/) (Kapitsa et al. 1996). No liquid water was detected at that time. For about 20 years, scientists refused to believe that a lake existed below the ice, until a 1973 radio echo sounding survey by Oswald and Robin (Oswald and Robin 1993) detected reflections indicating a lake. Later, in 1993, G.P. Ridley (UK), using ERS-1 radar altimetry, confirmed the presence of the subglacial lake (Ridley et al. 1993). Since then, Lake Vostok, named after the surface station Vostok (which in Russian means “East”), has been of great interest to scientists (Polar Research Board 2007). The primary research goals from that time were to determine the dynamics and formation of the ice sheet as well as the origin of the lake itself. A complete airborne study of the lake was initially performed during a 1999-2000 expedition by an Italian group (Figure 5) (Tabacco et al. 2002).

Since 1967, several boreholes have been drilled with various drilling equipment. The latest (February 2012) and deepest (3768 m) borehole was labeled 5G-3, indicating that it was the fifth drilling cycle (Soviet Antarctic Expeditions #35-38 and 40-43, corresponding to 1990-1992 and 1995-1998). The borehole label “G” comes from the Russian alphabet letter “Г”, which is the 4th letter, meaning that it was the fourth successfully drilled borehole, and the number “3” indicates that there were three deviations from the main borehole (Talalay 2004). These deviations were necessary because the drills became stuck, and diagonal drilling with another drill had to be used to proceed around and past the abandoned drills. Over the past 15 years, scientists from Russia, the USA, and France have been striving to predict the microbial, geological and biochemical structure of the lake.

Ever since the existence of the lake was proposed in the 1950s (Polar Research Board 2007) and up to the end of the 20th century, the primary focus was on the development of ice core dating systems and methods capable of measuring global climate changes (Bell et al. 2005; Bell and Karl 1999; Petit et al. 1999). When ice forms, it entraps atmospheric gases, volcanic ash, microbes and biomolecules present in the air and precipitation. Therefore, everything that has been released into the air due to certain climatic events could be preserved in ice.

[Figure 5 image: airborne radar profile from west (W) to east (E). Horizontal axis: Distance (km), 0 to 100. Vertical axis: Ice thickness (m), 0 to 5000. Labeled features: subglacial mountains, glacier entry to the lake, and the Lake Vostok surface.]

Figure 5. Airborne ice-penetrating radar results from 1993-2000 expeditions. This diagram shows a portion of the radar survey with the western part of the bedrock surrounding Lake Vostok. The glacier entry point to the lake is the approximate point where the glacier contacts the lakebed. Currently, the water surface of Lake Vostok is more than 200 m below sea level, and 3538/3539 m of glacial ice with 231 m of accretion ice cover the lake. The wavy red lines in the upper part of the radar picture represent the glacial layers over the subglacial mountains and the lake. Dotted black vertical lines represent approximate lake water borders. This picture was adapted from Tabacco et al. 2002.

Much as tree rings are read, scientists examine segments of the ice cores meter by meter, where each analyzed segment corresponds to an age of the ice, to determine climate changes from the entrapped gases and mineral inclusions. Previous ice core studies suggest that the shallow embayment is more saline (Siegert et al. 2001; Christner et al. 2006; D’Elia et al. 2008, 2009; Rogers et al. 2013; Shtarkman et al. 2013) and is similar to a marine environment, while the accretion ice from the main basin is freshwater and contains almost no mineral inclusions or biomass (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Lipenkov et al. 2002; Karl et al. 1999).

Microbial composition of the lake

The ice sheet over Lake Vostok travels approximately 60 km across the lake. This distance represents a time line of glacial movement over the lake during which ice has accreted to the bottom of the glacier while entrapping microorganisms and nucleic acids. The accretion and movement of the glacier lead to a temporal and spatial record of these biological inclusions preserved in the accretion ice. Robin Bell estimated that the glacier moves across the lake at a speed of 3 m/year, with an average accretion rate along the glacier flow line close to 2.8 cm/year. She suggested that such an accretion rate would be sufficient to create the first 60 m of the 231 m of accretion ice. However, as she found that the first 60 m of ice along the flow line accreted over the first 5 km of the embayment region of the lake, it became clear that the accretion rate differs between the two basins and also depends on whether accreted shallow embayment ice remains in the embayment or is transported over the peninsula (or ridge). Bell estimated the accretion rate of the embayment ice to be between 1.4 and 2.5 cm/year, and as the glacier passes the peninsula this rate decreases to 0.7 cm/year, thus holding an accretion time record of ~11,000 years at the bottom of the 5G ice core. Also, it takes approximately 16,000 years (assuming that the ice sheet velocity remains the same at all times) for the glacier to travel about 60 km from the western entry point of the lake to the drill site at the southeastern end of the lake (Bell et al. 2002). During this movement, water from the surface of the lake accretes onto the bottom of the glacier, forming a 231 m-thick layer of ice. This ice is formed over two regions of the lake (shallow embayment and main basin) and is divided into two types of accretion ice (Figures 4 and 6). In the western portions of the shallow embayment region, at the glacier entry point and near the peninsula, higher numbers of mineral inclusions and higher ion concentrations are found, as the glacier literally scrapes and passes over part of the lakebed. This type of accretion ice is termed type I (or type 1). Studies have shown that some sections of the ice core have low concentrations of microbes, whereas much of the type I accretion ice has higher numbers and diversities of microorganisms (D’Elia et al. 2008, 2009; Bulat et al. 2004; Abyzov et al. 2001; Karl et al. 1999). Ice collected from the eastern, deeper portion of the embayment and the open water regions of the lake contains very few inclusions and has a low diversity and quantity of microorganisms. This type of ice is termed type II (or type 2) accretion ice (Figures 4 and 6) (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Karl et al. 1999).

Accretion ice type I

Figure 6. A schematic representation of the Lake Vostok ice core. The top 38 m represent the glacier ice, followed by less than 2 m of the transition line (grey) between meteoric and accretion ice and the top 80 m (of 231 m) of accretion ice. Blue regions indicate type I accretion ice, while red areas indicate type II accretion ice. The right side of the ice core shows depths and cell concentrations (in the middle) for specific ice cores that were analyzed by the corresponding authors. Blue dashed lines also separate the two types of ice and show the previously estimated numbers of inclusions per meter. On the left are the cell concentrations and the ice core segments analyzed by S.O. Rogers and colleagues (D’Elia et al. 2008, 2009). The red stars indicate the ice core segments analyzed in this study (Rogers et al. 2013; Shtarkman et al. 2013). Ice core sections at 3563 and 3584/3585 m (corresponding to the shallow embayment) represent the V5 sample, and core sections 3606 and 3621 m (corresponding to the southern main basin of Lake Vostok) represent the V6 sample.

Abyzov and colleagues analyzed ice cores from a variety of type I accretion ice sections as well as several from type II ice core sections (Abyzov et al. 2001). Using primarily epifluorescence microscopy, they described many prokaryotic microbes within ten ice core sections. Some of the larger cells were morphologically described as Caulobacter-like (Alphaproteobacteria) organisms and budding forms. The dominant smaller microorganisms were Micrococci-like and rod-shaped organisms. The highest cell concentration of <100 cells/ml was recorded for the ice cores from 3534 m depth, which is located at the bottom of the glacier only a few meters above the transition line between glacier and accretion ice (Abyzov et al.

2001). Also, ice cores from 3555 m (type I accretion ice from the shallow embayment), and from 3592 and 3606 m (type II accretion ice, from the shallow embayment open water and main basin) showed high cell concentrations of ~50-70 cells/ml (Abyzov et al. 2001) (Figure 6). The shallow embayment grounding line accretion ice had the largest number of inclusions. Single yeast cells and partially lysed fragments of fungal mycelium were detected by Abyzov as well (between 3541 m and 3585 m) (Abyzov et al. 2001). In addition to Abyzov’s results, Karl also described small coccoid and thin rod-shaped bacteria in the shallow open lake grounding line/shoreline region (3603 m; type I ice) (Karl et al. 1999). One of the interesting points raised by Karl was the energy source for the microbial life. He speculated that if the energy is coming from above, then the lake is probably oligotrophic, but, at the same time, he mentioned the possibility that metabolic processes might be supplied from geothermal energy sources in the lake (i.e. ), because Lake Vostok originated from a rift (Priscu et al. 1999; Bulat et al. 2004, 2011; Bell and Karl 1999; Bell et al. 2002). Two years later, Christner published a study of the ice core from the 3593 m depth, which is from a transition zone representing the border between the deep part of the main basin and the shallower portion of the main basin, near a peninsula (Christner et al. 2001). Christner’s group isolated and phylogenetically analyzed bacterial clones that were similar to members of the Bacteroidetes (Sphingobacterium heparinum), Firmicutes (Alkalibacterium olivoapovliticus), and Actinobacteria (Rubrobacter xylanophilus) (Christner et al. 2001). In 2004, Sergey Bulat found sequences related to Sphingomonas (Alphaproteobacteria), Acinetobacter johnsonii (Gammaproteobacteria), and Chondromyces apiculatus (Deltaproteobacteria) (Bulat et al. 2004). Deltaproteobacterial species are of primary interest to us, because our metagenomic data include species of this class related to bacteria inhabiting deep-sea hydrothermal areas.

Using fluorescent staining and scanning electron microscopy, Rogers’ group identified possible viable cells as well as dead cells in the ice cores from 3540, 3563, 3582, and 3584/5 m depth that were morphologically similar to coccoid bacteria, filamentous bacteria, and fungi (D’Elia et al. 2008, 2009). Analysis of the 3606 m depth ice core indicated the presence of small Caulobacter-type microbes. Several images showed the presence of -type cells, as well as spiral-shaped, linear, and angular cells. Further phylogenetic analysis of the SSU rRNA gene sequences confirmed that the sequences were closely related to those from Alphaproteobacteria, Actinobacteria, and Firmicutes (D’Elia et al. 2008). Phylogenetic analysis of eukaryotic internal transcribed spacer (ITS) regions and the 5.8S gene from cultured isolates from four ice cores at the same depths and one from 3591 m confirmed the presence of fungi. The majority were related to basidiomycete sequences (species of Rhodotorula and Cryptococcus). A few sequences were closely related to sequences of various ascomycete species (the genera Phoma, Cladosporium, Aureobasidium, and Penicillium) (D’Elia et al. 2009). One cell was morphologically similar to a type of green algae (D’Elia et al. 2008).

Accretion ice type II

Ice that has accreted from the main basin water region (peninsula shallow embayment open lake with the main basin; Figure 6) has very few inclusions, and the diversity and quantity of microorganisms are very low. The deeper portion of the embayment region was shown to contain fungi, rod-shaped bacteria, marine and freshwater diatoms, and pollen grains (Sambrotto and Burckle 2005). Cells were encountered within the Alphaproteobacteria, Betaproteobacteria, and Actinobacteria (Afipia, , Comamonas, and Actinomyces) (Priscu et al. 1999), while single-cell algae and diatoms were detected by Abyzov (Abyzov et al. 2001).

A Sphingomonas sp. (Alphaproteobacteria), several Betaproteobacteria (phototrophic Rhodocyclus group), rhodesiae and parainfluenzae (Gammaproteobacteria), Chondromyces apiculatus (Deltaproteobacteria, probably a contaminant), Micrococcus luteus (Actinobacteria), and Abiotrophia defectiva (Firmicutes) sequences were found in the accretion ice from 3607 m by Sergey Bulat (Bulat et al. 2004). He also found two sequences closely related to thermophilic bacteria, which were phylogenetically close to Hydrogenophilus (Bulat et al. 2004), again suggesting the presence of geothermal activity. Sequence LV3607bR-40A was 100% identical to Hydrogenophilus thermoluteolus strain TH-1, whereas clone LV3607bR-1G differed by one nucleotide substitution (Maximum Parsimony method with 100 bootstrap replicates) (Bulat et al. 2004; Abyzov et al. 2004).

Analysis of the ice core segment from 3622 m by Christner in 2006 showed the presence of a variety of Proteobacteria species (Christner et al. 2006). Sequences were closely related to Methylobacillus (Betaproteobacteria), Acidithiobacillus ferrooxidans (Gammaproteobacteria), and members of the Deltaproteobacteria. The chemolithoautotrophic Acidithiobacillus sp. are strict anaerobes that respire via reduction of SO4, S0, Fe(III), and Mn(IV) (Christner et al. 2001).

Rogers’ group, studying ice core sections from 3591 and 3610 m depths, showed the presence of small Caulobacter-type cells and several bacterial cells that looked similar to those they had found in the ice cores from 3582, 3584/5, and 3606 m (discussed above) (D’Elia et al. 2008). Also, analysis of 10 accretion ice core sections from the shallow embayment at the glacier entry point and from the main basin area revealed the presence of eukaryotic organisms. The sequences were closely related to Basidiomycetes species (closest to species of Pseudozyma and Ustilago) (D’Elia et al. 2009), some of which were previously found in glacial ice by Abyzov (Rhodotorula mucilaginosa, Cryptococcus) (Abyzov et al. 2004). In addition, several Rhodotorula isolates were previously recovered from ancient Greenland glacier ice by Rogers’ group, as well as many Ascomycetes (such as Penicillium and Cladosporium) and a few Basidiomycetes (Ma et al. 1999; Ma et al. 2000; Stamer et al. 2005). Also, several isolates were phylogenetically related to various Ascomycetes, i.e., Penicillium, Aspergillus, and Cladosporium species, as well as Aureobasidium pullulans, Phoma spp., and Cerebella andropogonis (D’Elia et al. 2009). In addition, main basin type I ice was shown to contain diatoms and species at the depth of 3611 m (Abyzov et al. 2001).

This study presents a full metagenomic/metatranscriptomic analysis of four Lake Vostok accretion ice core sections (V5 included two core sections, from 3563 and 3584/3585 m; V6 included two core sections, from 3606 and 3621 m) in order to address several different questions. Before 1999, the scientific community considered Lake Vostok to be sterile, because it was assumed that no living organisms could survive the cold temperatures and high pressures. Secondly, it was assumed that the lake originated from a rift, which was initially ice-covered, and that the ice then melted from the bottom due to geothermal activity. Therefore, the ice melted and formed a sterile lake under the glacier sheet (Bell and Karl 1999). However, in the past 15 years research results have indicated that the lake is far from sterile, and that it contains some regions with higher ion concentrations. Previous studies based on fluorescence electron microscopy, some phylogenetic analysis, metabolic analysis of the meltwater, and cell culturing pointed to the presence of microbes in the ice samples, thus indicating their possible presence in the lake, but at the same time causing some speculation regarding the geology, biochemistry, and biota of this subglacial environment (Siegert et al. 2001; Priscu et al. 1998, 1999, 2005; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Bulat et al. 2004, 2009, 2011; Abyzov et al. 2001, 2004; Karl et al. 1999; Rogers et al. 2013; Shtarkman et al. 2013; Pearson et al. 2001; Sambrotto and Burckle 2005). Lake Vostok ice was first penetrated on the 5th of February 2012, giving the scientific community new possibilities for analysis of actual water from one of the most isolated environments on Earth. That said, the water samples retrieved from Lake Vostok in 2012 were heavily contaminated with drilling fluid.

Cold stress mechanisms

In 2002, Thomas and Dieckmann analyzed ice samples collected from the Southern Ocean (Thomas and Dieckmann 2002). The importance of their work was in describing the metabolic capabilities of organisms entrapped in the sea ice. As the temperature dropped, they noticed a transition from a diverse bacterial population to psychrophilic and psychrotolerant species (Thomas and Dieckmann 2002). These organisms were found to be capable of regulating their membrane fluidity by increasing the concentration of polyunsaturated fatty acids. This important property protects the cells from freezing. An analysis of coastal Antarctic ice samples (Dronning Maud Land) by Runa Antony and colleagues (Antony et al. 2012) reported that almost 90% of the total membrane lipids in isolated Methylobacterium species were composed of unsaturated fatty acids. Bacillus (Firmicutes) and Micrococcus cells had membranes built of branched fatty acids and anteiso-C15:0 (up to 70% of total membrane lipids) (Antony et al. 2012; Pearson et al. 2001). Also, in diatoms, regulation of membrane fluidity was shown to be essential for function of the electron transport system at low temperatures, low light irradiation, and nitrogen limitation (Thomas and Dieckmann 2002). An inability to regulate the concentration of unsaturated fatty acids under nitrogen limitation affects protein biosynthesis, which can cause significant damage to the membrane-stabilizing proteins and pigments. Psychrophilic diatoms increase the production of galactosyl- and phosphatidylglycerol molecules to stabilize the membrane. Sugar metabolism also plays a key role in cell protection during ice formation.

Synthesized produce complex soluble organic compounds on the cell surface and can be considered another defense system to prevent ice crystal damage (Thomas and Dieckmann 2002). As the temperature declines, ice-trapped bacteria and algae change their nitrogen utilization from nitrate to ammonia. The reason is the slightly elevated pH in sea ice brines. An increase in pH changes ammonium to ammonia, which accumulates in the cell, where it is converted to , which serves as an inhibitor of nitrate reductase. Also, psychrophilic bacteria in sea ice were shown to regulate their lipid packaging and the proportion of fatty acids with salt-tolerant enzymes as part of the salinity stress response (Thomas and Dieckmann 2002). In 2006 some Antarctic subglacial lakes were shown to have networks of subglacial rivers (Bell et al. 2002; Bulat et al. 2011), and as the Southern Ocean surrounds the Antarctic continent, it may contribute to the microbial diversity and biochemistry of subglacial lakes by being connected via those rivers. The metabolic processes described by Thomas and Dieckmann, associated with synthesis of extracellular polymeric substances in sea-ice algae and bacteria, suggest a novel cold stress defense mechanism for ice-entrapped organisms.

Other planets

For centuries people considered our solar system to be unique, and our planet to be the only one in it that can sustain life, owing to its favorable distance from the sun, the presence of abundant water, an atmosphere, and favorable environmental conditions in general. Life as we know it requires liquid water. From this standpoint, in the past 10 years Jupiter’s moon Europa has become one of the primary space research targets (Figure 7). This moon was proposed to have 100 km-deep oceans covered completely with kilometers-thick ice. Much of the ice is cracked and ridged, which indicates possible convection in the subtending oceans (Geissler et al. 1998; Figueredo and Greeley 2004; Goodman et al. 2007). This may indicate that geothermal activity exists in these regions. As ice melts and constantly refreezes, it forms fractures in the ice that build a surface layer called brittle crust (Figure 7a). This crust is basically a “fragile” layer of ice that can bend and crack under stress (or its own weight). Over time, fractures crack and form marginal ridges of different lengths and widths (some 1-5 km wide) (Figure 7). From images of surface crust fractures and their ridges, it was concluded that the formation of Europa’s ice crust appears to involve the same or similar processes as the frozen Southern Ocean (Geissler et al. 1998; Figueredo and Greeley 2004; Goodman et al. 2007), while the melting and refreezing of the bottom of that crust is similar to the process of ice accretion in Lake Vostok.

This means that if Europa has geothermal energy in the form of heat and light, then the liquid water beneath the ice may be capable of sustaining microscopic life. At the same time, the environment of Mars, one of the closest bodies to Earth, has been the topic of much discussion and debate regarding its biotic and abiotic composition (Bibring et al. 2007; Yung et al. 2010; Gendrin et al. 2005; McKay et al. 2005). Based on the atmosphere/surface mineral composition (various sulfates, ferric oxides, and methanogenesis), Mars was shown to have glaciers (Figure 7) and to be geochemically active, and it probably had an environment close to that of Earth in the past (Bibring et al. 2007; McKay et al. 2005). If true, it is possible that life appeared and then disappeared from Mars several billion years ago. Even though the Martian surface is still considered to be inhospitable to any known organism, the presence of CO2 glaciers and water beneath them might permit methanogenesis, and the ion content might be a source of life for those microscopic species that adapted to these conditions (Yung et al. 2010; Gendrin et al. 2005; McKay et al. 2005).

Figure 7. NASA satellite images of the surfaces of the Jovian moon Europa and of Mars (pictures were colored by NASA). a. Close-up of the ice crust in the Conamara region of Europa taken with the Solid State Imaging (CCD) system of NASA's Galileo spacecraft while orbiting through the Jovian system (http://photojournal.jpl.nasa.gov/jpeg/PIA01127.jpg). b. Europa from a distance, as photographed by the robotic spacecraft Galileo. Visible are plains of bright ice, cracks that run to the horizon, and dark patches that likely contain both ice and particulate matter (1995-1998) (http://solarsystem.nasa.gov/planets/profile.cfm?Object=Europa). c. Computer-generated view of Mars reported at Mars Daily Global Image from April 1999, with blue-white water clouds over volcanoes. At the top is the ice cap, composed primarily of frozen carbon dioxide (http://www.jpl.nasa.gov/spaceimages/details.php?id=PIA02653).

Research statement

In this study, we present the first metagenomic and metatranscriptomic analysis of Lake Vostok accretion ice accumulated from two different regions of the lake. The estimated glacier velocity is 3 m/year, which means that the duration needed to traverse the lake to the drill site (~48 km) is approximately 10,000 years. During that time, ice accreted to the bottom of the glacier represents the surface water from different parts of the lake along the glacier flow line. Life forms in the lake, as well as non-viable cells and environmental nucleic acids, could therefore be trapped in that ice. Two ice cores from the depths of 3563 and 3585 m (V5 sample) represent ice accreted from the surface water in the vicinity of the embayment region of the lake approximately 7,000 to 8,000 years ago, while the 3606 m and 3621 m ice cores (V6 sample) represent water that accreted from the main basin between 5,500 and 4,000 years ago. Previous cell count results based on electron microscopy, culturing, and phylogenetic analysis suggest that the cell concentration in the top 60 m of the accretion ice (5,000-10,000 years old) differs among regions of the lake and ranges from 0 up to 13 cells/ml, while other studies have reported from zero to a few hundred cells/ml. In addition, multiple studies indicate that shallow embayment type I accretion ice traps more mineral inclusions than type II ice, and the ion concentrations differ greatly. According to previous results from Rogers and colleagues (D’Elia et al. 2008, 2009) for the corresponding ice core sections, the mean cell concentrations range from as little as 3-4 cells/ml in the middle of the embayment up to a maximum of 12-13 cells/ml approaching the peninsula (a few samples had 20-35 cells/ml). This range of detected viable cells between the 3563 and 3585 m ice cores indicates higher biological activity in the shallow embayment. Thus, combined into one sample, both ice cores contained an average representation of the surface water from the vicinity of the shallow embayment. The main basin accretion ice is much clearer (low ion concentrations and biomass), and the further the glacier moves from the peninsula, the fewer inclusions were previously found, down to almost none. Also, the cell counts for the ice cores from 3606 and 3621 m ranged from 4-5 cells/ml down to almost 2 cells/ml. The higher number of cells in the 3606 m ice core also suggests that biological activity is more likely to be present in the shallow embayment, as this ice accreted close to the eastern part of the peninsula, while the ice from the 3621 m depth accreted over deep water.

Several previous studies indicated the possibility of geothermal activity in the vicinity of the shallow embayment (Priscu et al. 2005; Bulat et al. 2004, 2011; Karl et al. 2008; Bell and Karl 1999; Bell et al. 2002). This could explain the high cell counts and species diversity in the ice cores from the shallow embayment region, and the low concentration of nucleic acids and biomass in the main basin.

Lake Vostok, being buried under 4 km of ice, not only represents one of the most extreme environments on the planet, but can also be used as a model for further investigations of planets with similar environments. If life forms are present in Lake Vostok, it might be possible to find life forms under the ice crust of Europa or underneath Martian glaciers, as all share similar geochemical conditions with Lake Vostok (i.e., geothermal activity, anaerobic conditions near vents, liquid water, and absence of light; the pressure probably differs because of different gravity). Taxonomic and metabolic analyses in this study tie together previous results of microbial analyses, the estimated ecology and evolution of the lake itself, and the proposed possible energy sources that have been driving the evolution of those microbes for more than 15 Myr.

Materials and methods

Molecular analysis

Briefly, four ice core sections from the depths of 3563, 3585, 3606, and 3621 m were obtained from the National Ice Core Laboratory (NICL, Denver, CO) and surface decontaminated using an appropriate decontamination protocol (Rogers et al. 2004, 2005). The core sections were transferred into sterile funnels and melted at room temperature, with collection of 30-50 ml aliquots. A total of 250 ml of meltwater was used for each sample. The V5 sample included core sections at 3563 and 3585 m (corresponding to the shallow embayment), and V6 included core sections 3606 and 3621 m (corresponding to the southern main basin of Lake Vostok). The meltwater from the Vostok ice cores was filtered through 0.22 μm polycarbonate filters. The filters were placed into Petri plates, sealed with Parafilm, and stored in a -80°C freezer, while the filtrate was ultracentrifuged at 100,000 rcf (speed = 32,500 rpm, rotor type Beckman 60Ti) for 16 h. The two Vostok samples, as well as the two control water samples, were prepared on different days to reduce the chance of contamination. The autoclaved Nanopure water control sample ( MΩ, ppb TOC) and Nanopure water control sample ( MΩ, ppb TOC) were prepared and processed by Caitlin Knowlton (Knowlton et al. 2013). Rehydrated pellets were subjected to nucleic acid extraction with MinElute Virus Spin Kits (QIAGEN, Valencia, CA). Extracted nucleic acids contained primarily rRNA sequences; however, the DNA present provided mRNA genes and other genomic sequences. cDNA libraries were constructed using random hexamer primers from the SuperScript cDNA kit (Invitrogen, Carlsbad, CA, USA). Both DNA and cDNA were used for adapter ligation (EcoRI/NotI adapters: AATTCGCGGCCGCGTCGAC, dsDNA) and size fractionation by column chromatography. Selected fractions were PCR amplified with EcoRI/NotI adapter primers, and a 1 μl aliquot of each PCR product was quantified with gel electrophoresis on agarose gel by comparing to pGEM4Z (Promega, Madison, WI). Lastly, PCR products were reamplified with the 454-specific primers. Each primer contained one of the 454-specific sequences, A (CGTATCGCCTCCCTCGCGCCA) or B (CTATGCGCCTTGCCAGCCCGC), on the 5’ end, followed by a 4-nucleotide tag (TCAG) and the EcoRI/NotI adapter sequence on the 3’ end. All PCR products were cleaned with a PCR purification kit (Qiagen, Valencia, CA), and their nucleic acid concentrations were estimated on an agarose gel. After adjusting concentrations to approximately 1 μg/μl, 20 μg of each sample was sent to Roche 454 Life Sciences (Roche, Branford, CT) for 454 pyrosequencing using a 454 GS Junior System.
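The relationship between the rotor speed and the relative centrifugal force quoted above can be checked with the standard conversion RCF = 1.118 x 10^-5 x r x rpm^2 (r in cm). The following is a minimal sketch only; the function names are illustrative, and the radius of 8.47 cm is an assumption back-calculated from the stated speed and force, not a rotor specification taken from this study.

```python
# Standard rpm <-> RCF conversion; the radius is in centimetres.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_for_rcf(rcf: float, rpm: float) -> float:
    """Radius (cm) at which a given rpm produces the requested RCF."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Hypothetical radius, chosen so that 32,500 rpm gives roughly 100,000 x g.
ASSUMED_RADIUS_CM = 8.47

if __name__ == "__main__":
    print(round(rcf_from_rpm(32_500, ASSUMED_RADIUS_CM)))
    print(round(radius_for_rcf(100_000, 32_500), 2))
```

The same two functions can be used to recompute the force for any other rotor once its actual radius is known.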

All molecular biological procedures (DNA and RNA extractions, cDNA synthesis, adapter ligations, PCR, and gel electrophoresis) were performed by Zeynep Koçer and Scott O. Rogers (Rogers et al. 2013; Shtarkman et al. 2013).

Sequence analyses

Three types of files from 454 GS Junior sequencing were uploaded to the Ohio Supercomputer Center (OSC, Columbus, OH; http://www.osc.edu/supercomputing/software/) server for further analysis. The fna files contained fasta formatted reads with dimensional characteristics. The qual files indicate the quality of the sequence information for each read, giving consistency scores for each nucleotide at each position. Finally, the sff files contained the sequences, as well as additional information about them. The data section for each read in the sff file consists of the universal accession number (h), sequence information (s), quality scores of basecalls (q), clipping positions (c), flowgram values (f), and flowgram indices (i). By using any of these features, one can select the combination that is relevant to the analysis at hand. The sff files were unpacked with extract.py, which is a Python script for sequence analyses and manipulations. Python (Python Software Foundation; http://python.org/psf/) (Rossum and de Boer 1991) was used to extract sequences and quality files. The 454 primers were identified with the command line:

$ sff_extract -s seq.fasta -q qual.fasta -x anci.xml file_name.sff

where -s, -q, and -x give the names of the sequence file, the quality file, and the ancillary xml file that are exported from the “sff” file. Once the number of primer nucleotides had been determined, they were clipped off with the next command line:

$ sff_extract --min_left_clip=nucleotide_number file_name.sff

where --min_left_clip stands for the minimal number of nucleotides to be clipped off from the left side of the sequence. After the primers were clipped off, the 454 reads were assembled (Appendices 1-2).
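The effect of left clipping can be illustrated on reads that are already in fasta format. The sketch below is not the sff_extract implementation; it is a minimal stand-in, with hypothetical function names, that removes a fixed number of leading nucleotides from every read and drops reads that would become empty.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clip_left(records, n_clip):
    """Remove the first n_clip bases of each read; skip reads left empty."""
    for header, seq in records:
        trimmed = seq[n_clip:]
        if trimmed:
            yield header, trimmed


if __name__ == "__main__":
    example = ">read1\nTCAGACGT\nAA\n>read2\nTCAG\n"
    for name, seq in clip_left(parse_fasta(example), 4):
        print(name, seq)
```

In practice the clip length would be the number of primer nucleotides determined in the first sff_extract step.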

MIRA 3.0.5 (Whole Genome Shotgun and EST Sequence Assembler; http://mira-assembler.sourceforge.net/) (Chevreux et al. 1999) was used for assembly. Driven by a batch file, the program performed a multiple sequence alignment and searched for overlaps. Subsequently, it automatically assembled the reads into longer contiguous sequences. All three files from clipping were renamed (Filename_in.454.fasta, Filename_in.454.fasta.qual, Filename_traceinfo_in.454.xml) and placed directly into the MIRA 3.0.5 folder, inside the working bin directory. We used the BBEdit text editor (Bedford, MA, Copyright ©1992-2012 Bare Bones Software, Inc.: http://www.barebones.com/) to create a batch pbs file; any text editor, such as WordPad, TextEdit, or Notepad, can be used. The sample names were used in the name of the MIRA batch file for convenience (Appendix 3). Initial taxonomic analyses were performed on the MG-RAST server (http://metagenomics.anl.gov/) (Meyer et al. 2008).

Although both the V5 and V6 454 reads were originally assembled, both samples were also analyzed as unassembled 454 reads. After evaluating the quantity and quality of the assembled versus unassembled results, we decided to use assembled contigs for the V6 sample and reads (after clipping off the primer sequences) for the V5 sample. This decision was based on several factors. The total nucleotide count for the V5 sample was more than 36.70 Mb of sequence, whereas the V6 sample had 1.17 Mb. The quality of the V5 454 reads was much higher (mean length = 388 bp) than that of the V6 reads (mean length = 225 bp). Additionally, phylogenetic analyses and examination of aligned sequences indicated that the assembly was inaccurate for many of the contigs. Lastly, taxonomic analyses revealed that 25-30% of unique records were lost from the V5 sample after assembly. During the assembly process, reads with low assembly alignment scores and small quantities were discarded by the program as low complexity regions. The 454 sequencing of the V6 sample yielded lower sequence quantity and quality than that of V5; thus, the V6 assembled sequences were used in order to preserve valuable unique records. For the V5 sample analysis, reads were used instead of contigs.
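Summary values of the kind compared here (total nucleotide count and mean read length) can be computed directly from a collection of read sequences. The following is a minimal sketch with a hypothetical function name; it is not the tool that produced the figures reported above.

```python
def read_stats(sequences):
    """Return read count, total bases, and mean length for a set of reads."""
    lengths = [len(seq) for seq in sequences]
    total = sum(lengths)
    # Guard against an empty input so the mean is always defined.
    mean = total / len(lengths) if lengths else 0.0
    return {"reads": len(lengths), "total_bp": total, "mean_bp": mean}


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "ACGT", "AC"]
    print(read_stats(toy_reads))
```

Running the same function on the unassembled reads and on the assembled contigs gives a quick side-by-side view of how much sequence survives assembly.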

Taxonomy identification and database construction

Unpadded fasta files with the assembled V6 contigs (a padded file contains assembled sequences with the gaps introduced by the assembly alignments; an unpadded file contains no gaps) and the V5 454 reads were used for BLAST sequence similarity searches through the OSC server. Batch nucleotide BLAST searches (using the Mega BLAST algorithm) were performed to determine fine taxonomic level and gene identities. An E-value cut-off of less than 1e-10 was used to select potential matches. The top 10 highly similar sequences were retrieved (Appendix 4). This generated more than 40,000 records. The default output format (outfmt 0) was selected in order to retrieve complete pairwise alignment information, BLAST scores, and complete subject descriptions (Camacho et al. 2008). The output files from the nucleotide BLAST results were converted into tabular format with the Blast2Table.pl perl script on a local PC, which was loaded with the biosoftw module (Appendix 6). When the number of contigs is greater than 40,000, it is best to split them into smaller files for BLAST analysis. The results can be concatenated later. This lessens the chance of the search being cut off before completion because it exceeded the maximum computation time allowed by OSC.
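Splitting a large fasta file into smaller batches, as recommended above, can be done in a few lines. The sketch below is illustrative only (the function name and the batch size are assumptions); it works on fasta text held in memory and returns the batches as strings, which can then be written to separate files and searched independently.

```python
def split_fasta(text, max_records):
    """Split FASTA text into chunks of at most max_records records each."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    records, current = [], []
    for line in text.splitlines():
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + max_records]) + "\n"
        for i in range(0, len(records), max_records)
    ]


if __name__ == "__main__":
    example = ">c1\nACGT\n>c2\nGGCC\n>c3\nTTAA\n"
    for number, chunk in enumerate(split_fasta(example, 2), start=1):
        print(f"--- batch {number} ---")
        print(chunk, end="")
```

Because every batch keeps whole records, concatenating the per-batch BLAST outputs afterwards gives the same set of hits as a single run.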

The BLAST results files were opened with FileMaker Pro 11 (Copyright © 1994-2012, FileMaker, Inc. Santa Clara, CA; http://www.filemaker.com/) and formatted for the database. Prior to the taxonomy analysis, duplicate records were eliminated from the database based on repeated GenInfo Identifier numbers (GI numbers) retrieved from the National Center for Biotechnology Information (NCBI, Bethesda, MD; http://www.ncbi.nlm.nih.gov/). Custom FileMaker scripts sorted the records by GI number (bringing the duplicate hits together), then by highest High-scoring Segment Pair (HSP) score, lowest e-value, and highest percent identity (Appendix 7). Once all the data were sorted, the script preserved the first record of each duplicate set and deleted the rest of the duplicate records (Appendix 7). Then, the same exported tabular database was uploaded to the Galaxy web server (http://main.g2.bx.psu.edu/) (Goecks et al. 2010; Blankenberg et al. 2010; Giardine et al. 2005) in order to retrieve the taxonomic representations (Fetch taxonomic representation tool; version 1.1.0) based on GI numbers that matched V5 and V6 sequences (Appendix 8). Files containing only sequences were also converted to tabular format using the Galaxy web server and imported into the FileMaker databases. In the database, they were then linked with the imported BLAST results by sequence name. The taxonomy results were imported into the databases by linking the GI numbers of the hit records. Query sequences were manually evaluated for misassembled segments based on the nucleotide Mega BLAST scores and alignment positions. The reevaluated and corrected sequences were then added to the database.
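The duplicate-removal logic described above (sort by GI number, then by highest HSP score, lowest e-value, and highest percent identity, and keep the first record of each set) was carried out with FileMaker scripts. The same idea can be sketched in Python as follows; the field names are hypothetical and the records are toy values, so this is an illustration of the sorting rule rather than a reproduction of the scripts in Appendix 7.

```python
from itertools import groupby
from operator import itemgetter


def deduplicate_hits(hits):
    """Keep the best record per GI number.

    Records are ordered by GI, then by descending score, ascending
    e-value, and descending percent identity; the first record of
    each GI group is retained.
    """
    ordered = sorted(
        hits,
        key=lambda h: (h["gi"], -h["score"], h["evalue"], -h["identity"]),
    )
    return [next(group) for _, group in groupby(ordered, key=itemgetter("gi"))]


if __name__ == "__main__":
    toy_hits = [
        {"gi": "100", "score": 50, "evalue": 1e-12, "identity": 95.0, "query": "q1"},
        {"gi": "100", "score": 80, "evalue": 1e-20, "identity": 99.0, "query": "q2"},
        {"gi": "200", "score": 60, "evalue": 1e-15, "identity": 97.0, "query": "q3"},
    ]
    for hit in deduplicate_hits(toy_hits):
        print(hit["gi"], hit["query"], hit["score"])
```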

Primary taxonomic analysis was performed by screening the database and calculating the number of different species in each phylum. All data were structured so that records fell into two categories: rRNA gene sequences and those matching complete genome records at NCBI. Complete genome records were examined manually in order to determine the exact DNA regions of homology (or the product of the gene, in the case of mRNAs). Ribosomal RNA genes were prepared first for submission to the NCBI GenBank database. GI numbers from the BLAST searches were used to retrieve hit sequences (NCBI Batch Entrez; http://www.ncbi.nlm.nih.gov/sites/batchentrez). They were combined into one file with the assembled reads and aligned with MAFFT (Multiple Alignment using Fast Fourier Transform; v6.833b) (Katoh et al. 2002, 2005). The rRNA sequences were used for Neighbor-Joining (NJ) analyses in PAUP (Phylogenetic Analysis Using Parsimony; portable version 4.0b10 for Unix) (Swofford 2000) to confirm taxonomic determinations, and to assess phylogenetic affinities for sequences that were determined to be closest to sequences from unknown and uncultured environmental organisms (further investigation is still needed) (Appendices 9-10). The full metagenomic and metatranscriptomic dataset was also submitted to the NCBI Sequence Read Archive (SRA) for high-throughput DNA and RNA sequences under BioProject accession number PRJNA212629 (id number: 212629).

Metabolic map reconstruction

The 454 reads from both samples were used for the Bi-directional Best Hit (BBH) analysis with the Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server (KEGG KAAS, Kyoto, Japan; http://www.genome.jp/tools/kaas/) (Moriya et al. 2007). The sequences were compared with known genomes from 40 taxa. The default set of 23 taxa was provided by the KAAS-KEGG site: hsa – Homo sapiens; dme – Drosophila melanogaster; cel – Caenorhabditis elegans; ath – Arabidopsis thaliana; sce – Saccharomyces cerevisiae; cho – Cryptosporidium hominis; eco – Escherichia coli K-12 MG1655; nme – Neisseria meningitidis MC58; hpy – Helicobacter pylori 26695; rpr – Rickettsia prowazekii Madrid E; bsu – Bacillus subtilis; lla – Lactococcus lactis subsp. lactis IL1403; cac – Clostridium acetobutylicum ATCC 824; mge – Mycoplasma genitalium; mtu – Mycobacterium tuberculosis H37Rv; ctr – Chlamydia trachomatis D/UW-3/CX; bbu – Borrelia burgdorferi B31; syn – Synechocystis sp. PCC 6803; bth – Bacteroides thetaiotaomicron; dra – Deinococcus radiodurans; aae – Aquifex aeolicus; mja – Methanocaldococcus jannaschii; ape – Aeropyrum pernix. An additional 17 taxa were added manually: cne – Cryptococcus neoformans JEC21; tps – Thalassiosira pseudonana; ddi – Dictyostelium discoideum; bma – Burkholderia mallei ATCC 23344; cje – Campylobacter jejuni NCTC11168; dvl – Desulfovibrio vulgaris DP4; ccr – Caulobacter crescentus CB15; mlu – Micrococcus luteus; aca – Acidobacterium capsulatum; fjo – Flavobacterium johnsoniae; fsu – Fibrobacter succinogenes; fnu – Fusobacterium nucleatum; ote – Opitutus terrae; gau – Gemmatimonas aurantiaca; rba – Rhodopirellula baltica; cli – Chlorobium limicola; cau – Chloroflexus aurantiacus. The KAAS KEGG genes for enzymes and proteins (which were similar to V5 sequences) were used to reconstruct possible metabolic pathways present in the Lake Vostok accretion ice samples. The resulting global metabolic maps contained only the names of the reconstructed pathways, but each mapped pathway had its own enzymatic characterization rebuilt based on the BBH alignment results from the KAAS KEGG server.

Basically, the KEGG annotation server performs BLAST and translated nucleotide BLAST (tBLASTx) searches for the query sequence(s), treated as one genome, in the forward and reverse directions. This identifies the best matches (score > 60 bits) from each selected genome on KEGG (no fewer than two: forward and reverse matches). Scores for a pair of genes from the given genomes can vary between forward and reverse alignments; their best score would not always reflect the Smith-Waterman standards (Moriya et al. 2007). Therefore, for the same query sequence, KEGG divides the forward alignment bit score by the reverse for each subject genome (Rf), then vice versa (Rr), automatically saving the highest bit score and orthology number for the gene of each selected genome. Then, by multiplying Rf × Rr, the program calculates the Bi-directional hit rate (BHR), which never exceeds a value of 1 (100% identity). The highest BHR (more than 0.95) defines the reference genome, and the highest bit score for the forward or reverse gene direction defines the orthology gene number of that genome (Moriya et al. 2007).
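The BHR arithmetic can be illustrated with a short Python sketch. It rests on an assumption that is not spelled out above: Rf and Rr are read here as the ratio of each alignment's bit score to the best bit score obtained in that direction, so that each ratio (and their product) is at most 1. The function names and example scores are hypothetical; Moriya et al. (2007) should be consulted for the exact definition used by KAAS.

```python
MIN_BIT_SCORE = 60.0   # bit-score threshold quoted in the text
MIN_BHR = 0.95         # BHR threshold quoted in the text


def bidirectional_hit_rate(forward, best_forward, reverse, best_reverse):
    """BHR = Rf x Rr, with each ratio taken against the best score
    in its own direction (assumed reading of the description)."""
    rf = forward / best_forward
    rr = reverse / best_reverse
    return rf * rr


def is_bidirectional_best_hit(forward, best_forward, reverse, best_reverse):
    """Apply both thresholds: score > 60 bits and BHR > 0.95."""
    if min(forward, reverse) <= MIN_BIT_SCORE:
        return False
    bhr = bidirectional_hit_rate(forward, best_forward, reverse, best_reverse)
    return bhr > MIN_BHR


# Invented bit scores, for illustration only.
bhr_example = bidirectional_hit_rate(180.0, 180.0, 175.0, 180.0)
accepted = is_bidirectional_best_hit(180.0, 180.0, 175.0, 180.0)
rejected = is_bidirectional_best_hit(180.0, 200.0, 150.0, 180.0)
```

Under this reading, a pair that is the top hit in both directions has a BHR of exactly 1, and the 0.95 threshold tolerates only small asymmetries between the forward and reverse scores.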

Metabolic reconstructions were performed in several steps. First, 454 nucleotide reads were compared (using BLAST searches) to genomes in the KEGG gene orthology database. The detected enzymes for the various metabolic pathways were highlighted on the pathway maps. Then, the maps were downloaded with the tables of the orthologous genes for the selected enzymes. Each metabolic pathway was manually screened and evaluated to determine the direct function of the highlighted enzymes. Then, these evaluated pathways were redrawn using PowerPoint (Office 2010, Microsoft, Redmond, WA), based on the number and physiology of the enzymes within each pathway, thus predicting the characteristics of the metabolic pathways. Finally, the pathways were linked together based on the reactants, products, and shared enzymes. Final metabolic pathways were redrawn from PowerPoint into CellDesigner (The Systems Biology Institute, Tokyo, Japan; http://www.celldesigner.org/) using only graphical properties (functional properties were discarded) (Kitano et al. 2005). At the same time, each V5 sequence that exhibited similarity to a gene on the KAAS-KEGG server was manually queried through NCBI BLASTn and tBLASTx with a cut-off threshold of 1e-10. The NCBI BLASTn and tBLASTx results were retrieved and compared to the records previously defined by NCBI as complete genome records during the first BLASTn search with OSC, to ensure that the same sequences were reported only once. All sequences similar to portions of rRNA genes with a percent identity of more than 99%, and those similar to mRNA sequences with a stringency of more than 70% positive substitutions, were selected. In order to differentiate BLASTn results from tBLASTx results, percent identities corresponding to the BLASTn results are shown in square brackets “[]”, and translated nucleotide BLAST results are shown with an additional asterisk “[*]”.
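The selection thresholds and the bracket notation can be made concrete with a small Python sketch. The function names are hypothetical, and the thresholds are simply the ones quoted above (more than 99% identity for rRNA matches from BLASTn, more than 70% positive substitutions for translated matches).

```python
RRNA_MIN_IDENTITY = 99.0    # percent identity, BLASTn rRNA matches
MRNA_MIN_POSITIVES = 70.0   # percent positive substitutions, tBLASTx


def passes_stringency(value, translated):
    """Apply the threshold that belongs to the search type."""
    threshold = MRNA_MIN_POSITIVES if translated else RRNA_MIN_IDENTITY
    return value > threshold


def format_match(name, value, translated):
    """Render a match in the notation of the tables: '[98%]' for BLASTn
    identity, '[*65%]' for translated BLAST positives."""
    marker = "*" if translated else ""
    return f"{name} [{marker}{value:g}%]"


labels = [
    format_match("Calothrix sp.", 53, translated=False),
    format_match("Pedobacter sp.", 53, translated=True),
]
```

Keeping the search type as an explicit flag makes it hard to compare a nucleotide identity against an amino acid positives threshold by mistake, which is the distinction the asterisk is meant to preserve.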

Comparisons of taxonomic diversity were based on the ribosomal RNA genes and complete genomes. Together with the derived global metabolic map, these comparisons were used to deduce the ecology of the lake.

One of the future directions will be to compare our metagenome with other metagenomic databases. For this purpose, BLAST searches will be used to compare sequences from pairs of metagenomic databases (Appendix 11).

Water control samples

All metagenomic sequences were screened for potential contamination. Two water control samples (processed using the same protocols described above) were used to track possible contaminants (Knowlton et al. 2013). Both V5 and V6 query data were first screened for subject GI numbers that also occurred in the water control samples. V5 and V6 sequences matching those found in the water controls were removed, and a second screening was then performed: the remaining sequences were screened for low-quality sequences, and those that matched control samples at lower e-values and percent similarity were also removed from the analysis. Lastly, all records containing species descriptions identical to those in the water control samples were removed. The contaminant sequences removed from the analysis can be found in Tables S13 and S14. Based on the bioinformatic screening tests performed, the sterility of the laboratory procedures, and the taxonomic diversity and ecological characteristics of the sequences, the remaining sequences are unlikely to be contaminants.
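The first and last screening steps (shared GI numbers, then identical species descriptions) amount to set membership tests. The following Python sketch shows that logic under stated assumptions: records are plain dictionaries with hypothetical `gi` and `species` keys, the example values are invented, and the intermediate low-quality and e-value screening is not modelled.

```python
def screen_against_controls(sample_hits, control_hits):
    """Split sample hits into (kept, removed) using the water controls.

    A hit is removed if its subject GI number occurs in a control, or if
    its species description is identical to one seen in a control.
    """
    control_gis = {hit["gi"] for hit in control_hits}
    control_species = {hit["species"].strip().lower() for hit in control_hits}

    kept, removed = [], []
    for hit in sample_hits:
        same_gi = hit["gi"] in control_gis
        same_species = hit["species"].strip().lower() in control_species
        (removed if same_gi or same_species else kept).append(hit)
    return kept, removed


# Invented records, for illustration only.
controls = [{"gi": 111, "species": "Example bacterium A"}]
sample = [
    {"gi": 111, "species": "Example bacterium A"},   # same GI as control
    {"gi": 222, "species": "example bacterium a"},   # same description
    {"gi": 333, "species": "Example bacterium B"},   # retained
]
kept_hits, removed_hits = screen_against_controls(sample, controls)
```

Returning the removed records alongside the retained ones keeps an audit trail comparable to the contaminant lists in the supplementary tables.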

Results

A total of 36,754,464 bp of sequence data was obtained from sample V5, comprising 94,728 high-quality 454 sequence reads with a mean length of 388 bp. For the V6 sample, a total of 1,170,900 bp of sequence data was obtained, comprising 5,204 high-quality reads with a mean length of 225 bp (Table 1). Overall, approximately 15% of the sequences were unique, while the remaining 85% were additional copies from the unique set of sequences. The V5 pyrosequencing reads and assembled V6 contigs were subjected to BLASTn searches of the NCBI nucleotide database through the Ohio Supercomputer Center (OSC), which yielded over 30,000 sequences similar to the Lake Vostok sample V5 and 270 sequences similar to the V6 sequences (e-value cut-off 1e-10). Also, the V5 pyrosequencing reads were submitted to the KAAS KEGG server, and the results from these searches were manually re-analyzed with the NCBI BLASTn and tBLASTx tools. The retrieved NCBI gene identification numbers (GI numbers) were used for the taxonomy determinations and duplicate record removal.

Table 1. Overall distribution of sequences within the two Lake Vostok samples. Numbers next to percentages of rRNA gene sequences and unknown species indicate the count of sequences in each sample. Sequences that were similar to bacterial species are abbreviated bac., eukaryotic euk., archaeal arch., and viral vir.

                              V5 sample                                  V6 sample
Total base pair count         36,754,464 bp of sequences                 1,170,900 bp of sequences
High quality reads            94,728 (mean length of 388 bp)             5,204 (mean length of 225 bp)
Number of unique sequences    3620 (bac.) + 207 (euk.) + 2 (arch.)       139 (bac.) + 26 (euk.) = 165
                              + 2 (vir.) = 3831
Bacteria                      94% of the unique sequences                84% of the unique sequences
Unknown Bacteria              50% (1823)                                 34% (48)
Unknown Eukarya               14% (29)                                   18% (5)
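The mean read lengths reported for the two samples follow directly from the total base pair counts and read counts. A minimal Python check, using only the figures stated above (the dictionary layout is an illustrative choice):

```python
# Totals as reported for the two accretion ice samples.
samples = {
    "V5": {"total_bp": 36_754_464, "reads": 94_728},
    "V6": {"total_bp": 1_170_900, "reads": 5_204},
}

# Mean read length = total base pairs / number of high-quality reads.
mean_lengths = {
    name: stats["total_bp"] / stats["reads"] for name, stats in samples.items()
}

# How many times more reads V5 yielded than V6.
read_ratio = samples["V5"]["reads"] / samples["V6"]["reads"]
```

Both quotients come out as whole numbers, matching the reported mean lengths of 388 bp and 225 bp.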

Approximately 94% of the unique sequences in V5 and 84% of the unique assembled sequences in V6 were from Bacteria (Table 1). Only two unique Archaea sequences were found in the V5 sample, and none were found in the V6 sample. The remaining 207 sequences from V5 and 27 sequences from V6 were closest to various eukaryotic organisms, most of which were Fungi. Based on the rRNAs, complete genome records, and mRNA sequences, the taxonomic analysis yielded a total of 3827 sequences from V5 and 165 sequences from the V6 sample (Table 2, 3). Approximately 78% (2843) of the V5 sequences that matched bacteria and 68% (141) of the eukaryotic sequences were portions of rRNAs (Table 2). In the V6 sample, approximately 60% (83) of the bacterial sequences and 80% (21) of the eukaryotic sequences were portions of rRNA genes (Table 3). In both samples, sequences closest to bacteria were most similar to those from members of the Actinobacteria, Firmicutes, Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria. However, the total number of sequences for the V6 sample was from 10 to 30 times smaller than that for V5 (Table 2, 3). Only one sequence closely related to Bacteroidetes and seven closely related to Alphaproteobacteria species were found in the V6 sample, while more than 120 sequences were found for each of these groups in V5 (Table 2, 3). Interestingly, sequences closely

Table 2. Numbers of the unique sequences recovered from the V5 sample. Total [1+2+3+4] is a summary of all unique sequences for the BLASTn searches with stringencies <97% and ≥97% identity and BLASTx searches with stringencies <70% and ≥70% positive. Total [1] shows numbers of sequences closely related to various organisms based on the rRNA matches, total [2+3] is a summary of sequences from the BLASTn search matching complete genome NCBI records and confirmed with KAAS KEGG analysis, and total [4] shows KAAS KEGG results that were confirmed with manual BLASTx searches. Metagenomic sequences matching NCBI records with BLASTn identity ≥99% and BLASTx positive stringencies ≥90% were used to estimate the geomicrobiology and ecology of the lake.

Column groups: [1] V5 sequence count based on all ribosomal RNA sequences from BLASTn searches; [2+3] V5 sequence count based on complete genome and mRNA sequences from KAAS KEGG BLASTn searches; [4] V5 sequence count based on KAAS KEGG BLASTx results (% positive); ∑ total[1+2+3+4].

Taxon                                 <97%  ≥97%  ≥99%  total[1]    <97%  ≥97%  ≥99%  total[2+3]    <70%  ≥70%  ≥90%  total[4]  total[1+2+3+4]
BACTERIA
Acidobacteria                            1     0     0         1       0     1     0           1       1     -     -         1               3
Actinobacteria                          53   106    73       159      31     3     1          34       -     9     4         9             202
Bacteroidetes                           27    42    27        69      29     1     1          30       1    32    19        33             132
Chloroflexi                              0     0     0         0       1     0     0           1       -     -     -         0               1
Cyanobacteria                           83    77    51       160     113    28     6         141       1     4     2         5             306
Deferribacteres                          0     1     0         1       0     0     0           0       -     -     -         0               1
Deinococcus-Thermus                      0     2     0         2       2     4     2           6       -     -     -         0               8
Fibrobacteres                            1     0     0         1       0     0     0           0       -     -     -         0               1
Firmicutes                             184   221   158       405      80    93    66         173       2    33    13        35             613
[…]                                      5     3     3         8       2     1     0           3       -     -     -         0              11
Uncultured Bacteria                    720  1011   626      1731      58     9     5          67       0     0     0         0            1798
Uncultured Gram+ bacterium               3     0     0         3       0     0     0           0       -     -     -         0               3
Uncultured Marine bacterium              0     5     5         5       1     0     0           1       -     -     -         0               6
Uncultured Rumen bacterium               1     2     1         3       0     0     0           0       -     -     -         0               3
Uncultured Soil bacterium                4     8     6        12       0     1     0           1       -     -     -         0              13
Planctomycetes                           4     0     0         4       1     1     1           2       -     1     1         1               7
Proteobacteria (alpha)                  20    32    18        52      47     5     3          52       1    17     7        18             122
Proteobacteria (beta)                   11    26    19        37      44    49    35          93       2    12     6        14             144
Proteobacteria (delta)                   3     2     0         5       5     0     0           5       1     1     -         2              12
Proteobacteria (epsilon)                 2     3     1         5       1     0     0           1       -     -     -         0               6
Proteobacteria (gamma)                  64   101    75       165      20     9     4          29       -    14    12        14             208
Proteobacteria (symbionts)               0     1     0         1       0     0     0           0       -     -     -         0               1
Proteobacteria (uncultured)              3     3     2         6       0     0     0           0       -     -     -         0               6
[…]                                      0     3     2         3       0     1     1           1       -     0     0         0               4
Tenericutes                              4     0     0         4       0     0     0           0       -     -     -         0               4
[…]                                      0     1     0         1       2     0     0           2       -     1     -         1               4
[…] (Verrucomicrobia)                    0     0     0         0       0     0     0           0       1     -     -         1               1
BACTERIA total values                 1193  1650  1067      2843     437   206   125         643      10   124    64       134            3620

EUKARYOTA
AMOEBOZOA
  […] (soil limax amoebae)               1     0     0         1       0     0     0           0       -     -     -         0               1
ARCHAEPLASTIDA (Viridiplantae)
  […] (green alga)                       7     2     0         9       0     1     1           1       -     -     -         0              10
  Rhodophyta ([…])                       0     0     0         0       0     1     0           1       -     -     -         0               1
  Streptophyta (land plants)            12    13     6        25      19    13     9          32       -     -     -         0              57
CHROMALVEOLATA
  Bacillariophyta (diatom)               1     2     2         3       0     0     0           0       -     -     -         0               3
  Ciliophora (protozoan)                 1     1     1         2       0     0     0           0       -     -     -         0               2
  Cryptophyta (algae)                    0     0     0         0       1     0     0           1       -     -     -         0               1
  Heterokontophyta (water molds)         0     3     2         3       0     2     1           2       -     -     -         0               5
  Perkinsea (Alveolata)                  0     0     0         0       0     1     1           1       -     -     -         0               1
  Kinetoplastida ([…])                   0     0     0         0       0     1     1           1       -     -     -         0               1
  Percolozoa (Protozoa)                  0     0     0         0       0     1     0           1       -     -     -         0               1
OPISTHOKONTA (Animalia)
  Arthropoda                             5     3     1         8       4     2     0           6       -     -     -         0              14
  […] (sea anemone)                      0     0     0         0       1     0     0           1       -     -     -         0               1
  Bilateria environmental sample         1     0     0         1       0     0     0           0       -     -     -         0               1
  […]                                    0     1     1         1       0     0     0           0       -     -     -         0               1
  Rotifera (microscopic)                 0     1     0         1       0     0     0           0       -     -     -         0               1
  Tardigrada (water-bear)                1     0     0         1       0     0     0           0       -     -     -         0               1
OPISTHOKONTA (Dikarya)
  Ascomycota                             7    26    18        33       5     4     3           9       1     -     -         1              43
  Basidiomycota                         14     9     5        23       4     0     0           4       1     3     2         4              31
  […]                                    0     1     1         1       0     0     0           0       -     -     -         0               1
  uncultured fungus                      2     9     6        11       0     0     0           0       -     -     -         0              11
  uncultured soil fungus                 1     0     0         1       0     0     0           0       -     -     -         0               1
  uncultured marine fungus               0     1     1         1       0     0     0           0       -     -     -         0               1
RHIZARIA ([…])                           0     0     0         0       1     0     0           1       -     -     -         0               1
uncultured […]                           7     6     3        13       0     0     0           0       -     -     -         0              13
uncultured phototrophic eukaryote        1     1     1         2       0     0     0           0       -     -     -         0               2
uncultured rumen protozoa                0     1     1         1       0     0     0           0       -     -     -         0               1
EUKARYOTA total values                  61    79    49       141      35    26    16          61       2     3     2         5             207

TOTAL VALUES                          1254  1730  1116      2984     472   232   141         704      12   127    66       139            3827

Table 3. Numbers of the unique sequences recovered from the V6 sample. Total [1+2+3+4] is a summary of all unique sequences for the BLASTn searches with stringencies <97% and ≥97% identity and BLASTx searches with stringencies <70% and ≥70% positive. Total [1] shows numbers of sequences closely related to various organisms based on the rRNA matches, total [2+3] is a summary of sequences from the BLASTn search matching complete genome NCBI records and confirmed with KAAS KEGG analysis, and total [4] shows KAAS KEGG results that were confirmed with manual BLASTx searches. Metagenomic sequences matching NCBI records with BLASTn identity ≥99% and BLASTx positive stringencies ≥90% were used to estimate the geomicrobiology and ecology of the lake.

Column groups: [1] V6 sequence count based on all ribosomal RNA sequences from BLASTn searches; [2+3] V6 sequence count based on complete genome and mRNA sequences from KAAS KEGG BLASTn searches; [4] V6 sequence count based on KAAS KEGG BLASTx results (% positive); ∑ total[1+2+3+4].

Taxon                                 <97%  ≥97%  ≥99%  total[1]    <97%  ≥97%  ≥99%  total[2+3]    <70%  ≥70%  ≥90%  total[4]  total[1+2+3+4]
BACTERIA
Actinobacteria                           1     4     3         5       3     2     0           5       0     1     1         1              11
Bacteroidetes                            0     0     0         0       0     1     1           1       0     0     0         0               1
Chloroflexi                              0     1     1         1       0     0     0           0       0     0     0         0               1
Deinococcus-Thermus                      0     0     0         0       1     0     0           1       0     0     0         0               1
Firmicutes                               2     3     3         5       4     2     0           6       0     1     1         1              12
Fusobacteria                             0     0     0         0       1     0     0           1       0     0     0         0               1
uncultured bacterium                    17    25    12        42       3     1     1           4       0     0     0         0              46
uncultured compost bacterium             0     1     0         1       0     0     0           0       0     0     0         0               1
uncultured Gram+ bacterium               0     1     1         1       0     0     0           0       0     0     0         0               1
Proteobacteria (alpha)                   1     3     2         4       2     1     1           3       0     0     0         0               7
Proteobacteria (beta)                    6     3     2         9      11     8     2          19       0     6     2         6              34
Proteobacteria (gamma)                   4     9     7        13       3     4     2           7       0     1     0         1              21
Proteobacteria (uncultured)              1     1     0         2       0     0     0           0       0     0     0         0               2
BACTERIA total values                   32    51    31        83      28    19     7          47       0     9     4         9             139

EUKARYOTA
ARCHAEPLASTIDA (Viridiplantae)
  Streptophyta (land plants)             0     2     2         2       0     0     0           0       0     0     0         0               2
OPISTHOKONTA (Animalia)
  Arthropoda                             0     0     0         0       1     4     1           5       0     0     0         0               5
OPISTHOKONTA (Dikarya)
  Ascomycota                             3     7     3        10       0     1     1           0       0     0     0         0              10
  Basidiomycota                          0     4     3         4       0     0     0           0       0     0     0         0               4
  uncultured fungus                      1     3     1         4       0     0     0           0       0     0     0         0               4
  uncultured soil fungus                 0     1     1         1       0     0     0           0       0     0     0         0               1
EUKARYOTA total values                   4    17    10        21       1     4     1           5       0     0     0         0              26

TOTAL VALUES                            36    68    41       104      29    24     9          52       0     9     4         9             165

related to more than 300 cyanobacterial sequences, as well as photosynthetic Chromalveolata and Archaeplastida sequences, were found in the V5 sample, while none were present in V6 (Table 2, 3). In both samples, sequences closest to species of Ascomycota and Basidiomycota contributed the highest quantities of eukaryotic sequences, along with […] and Streptophytes (Table 2, 3). Microbial taxonomy and physiological characteristics were estimated based on the results from nucleotide and translated BLAST searches. In order to be confident in the genus and species level determinations, a stringency of 99-100% nucleotide identity was selected for the rRNA matches. For the protein alignments, the stringency cut-off was ≥[…]% positive substitutions. This includes residues (amino acids) that are identical and those that have similar physico-chemical properties, thus increasing similarity. Any two amino acid sequences are considered homologous if their alignment identity value is higher than 35%.

The first bi-directional best hit search was performed with the KAAS KEGG server, and the metagenomic sequences that matched the annotated KAAS KEGG genes were selected for the second, manual NCBI BLASTn and tBLASTx searches using a cut-off value of 1e-10. The second set of searches resulted in almost all sequences being ≥[…]% identical to NCBI sequences of organisms with gene descriptions that matched those from the first search. Only six records, for sequences matching ≤[…]% in identities but ≥[…] percent positive substitutions for the amino acid sequences, were excluded from the analysis. They were V5 sequences that matched Bacteroides salanitronis [*48%], Lentisphaera araneosa [*49%], Clostridium saccharolyticum [*50%], Clavispora lusitaniae [*43%], Desulfovibrio piger [*50%], and Ralstonia eutropha [*50%] (marked violet in Table S11). Two records were just above the 50% identity level, closely related to Calothrix sp. [53%] and Pedobacter sp. [*53%] (marked green in Table S11). For the V5 sample, 35% (1256) of all bacterial and 32% (67) of all eukaryotic sequences were ≥99% identical to ribosomal RNA sequences and/or had ≥90% positive substitutions in the mRNA sequences. Of these, 51% were similar to sequences from uncultured bacteria and 18% were similar to sequences from uncultured eukaryotes (Table 2, 4, S1-S5, S11). Within the V6 sample, 30% (42) of all bacterial and 42% (11) of all eukaryotic sequences were found at the same levels of identity and positive substitutions. At high stringency, 33% (14) were closest to sequences from uncultured bacteria, and 18% (2) were closest to sequences from uncultured eukaryotes (Table 3, 5, S7-S10, S12). Among bacterial sequences, 50% (1823) of V5 and 34% (48) of V6 sequences were closest to sequences from bacteria that could not be taxonomically classified. For the sequences that were closest to those from eukaryotic organisms, 14% (30) from V5 could not be taxonomically classified. Among these were sequences similar to those of unknown marine and soil fungi and various eukaryotes, including 2 phototrophs, one rumen protozoan, and an unknown bilaterian (Table 2, 4, S1-S5, S11). Within the V6 dataset, only 5 sequences (23%) lacked NCBI taxonomic affiliations, although they were closest to unknown fungal sequences (Table 3, 5, S7-S10, and S12). Those sequences that most closely matched sequences from uncultured and unidentified bacterial and eukaryotic NCBI records will require phylogenetic analyses in order to establish precise taxonomic classifications; these analyses were not performed for this part of the study.
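The high-stringency shares quoted for each sample can be recomputed from the "total values" rows of Tables 2 and 3 by summing the ≥99% rRNA, ≥99% genome/mRNA, and ≥90% BLASTx columns and dividing by the total[1+2+3+4] column. The Python sketch below does this; the dictionary keys are hypothetical names, and rounding to the nearest whole percent is an assumption about how the percentages were reported.

```python
# "Total values" rows of Tables 2 and 3: counts at >=99% identity (rRNA and
# genome/mRNA BLASTn) and >=90% positives (BLASTx), plus the overall total.
TOTAL_ROWS = {
    ("V5", "Bacteria"):  {"rrna_99": 1067, "genome_99": 125, "blastx_90": 64, "total": 3620},
    ("V5", "Eukaryota"): {"rrna_99": 49,   "genome_99": 16,  "blastx_90": 2,  "total": 207},
    ("V6", "Bacteria"):  {"rrna_99": 31,   "genome_99": 7,   "blastx_90": 4,  "total": 139},
    ("V6", "Eukaryota"): {"rrna_99": 10,   "genome_99": 1,   "blastx_90": 0,  "total": 26},
}


def high_stringency_share(row):
    """Return (count, whole-number percent) of high-stringency sequences."""
    count = row["rrna_99"] + row["genome_99"] + row["blastx_90"]
    return count, round(100 * count / row["total"])


shares = {key: high_stringency_share(row) for key, row in TOTAL_ROWS.items()}
```

For V6 this reproduces the 30% (42) bacterial and 42% (11) eukaryotic figures given in the text.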

Only two archaeal sequences and two virus sequences were found in the V5 sample (Table 4, S6). While the archaeal sequences were similar to those of methanotrophs from cold deep-ocean sediments, one metagenomic sequence was closely related to the thermophilic/thermotolerant Enterobacteria phage phiX174 [98%] (Cox et al. 2010).

Table 4. Taxonomic summary of results for the V5 sequences. All unique sequences listed in Tables S1-S6 and S11 were manually screened against the NCBI records to determine their origin. The NCBI BLAST searches for those metagenomic sequences that matched at a stringency of 99 and 100% sequence identity (for mRNA sequences, translated BLAST > 90% positive) were used to describe ecological and metabolic characteristics, drawing on information from NCBI descriptions, their reported publications (if present), and other web sources. The BLAST results with stringencies lower than 99% (for BLASTx, < 90% positive substitutions) were used if no results at > 99% were available, and are shown in the table in square brackets “[]”; their accession numbers can be found in Tables S1-S6 and S11. The translated BLAST (BLASTx) results present in this table are marked with an additional asterisk “[*]” next to the % positive substitutions. Taxonomic groups for which the number of NCBI matches was too large to include all of them in the ecological characteristics section are marked “**”.

Columns: Taxon; № of unique seq; № of seq ≥ 99% identity*; Ecological characteristics for metagenomic sequences that were closely related to sequences of organisms reported on NCBI gene bank; Metabolic characteristics; Matching NCBI GI numbers for sequences 99 and 100% identity*.

Bacteria
  № of unique seq: 3620; № of seq ≥ 99% identity*: 1256

Acidobacteria
  № of unique seq: 3; № of seq ≥ 99% identity*: 0
  Ecological characteristics: soil, acidophilic, psychrophilic: uncultured Acidobacteria [97%] [96%]; Granulicella tundricola [*65%]
  Metabolic characteristics: heterotrophic, chemoorganotrophic

Actinobacteria**
  № of unique seq: 202; № of seq ≥ 99% identity*: 78
  Ecological characteristics: Actinomyces georgia [98%]; Blastococcus sp. (previously identified from lake and deep sea sediments); Arsenicicoccus bolidensis (facultative anaerobe found in lake sediment enriched with arsenic); Arsenicicoccus piscis [95%] - a […] previously isolated from the intestinal tract of a […]; Agrococcus sp. [98%] from carbon rich soil; 2 sequences matching Janibacter sp. (one from foreshore soil, one from novel marine organisms isolated from the seas near the Bermuda islands); Leifsonia spp. (aerobic pathogen); […] spp. primarily found in soil and obligate aerobes, some of which were preserved in permafrost ice wedges for 25,000 years; Micrococcus spp. saprotrophic bacterium found in sediment, soil, water and dust; Mycobacteriaceae species are fast adaptive aerobic pathogens; Nocardioides sp., one aerobic soil bacterium previously isolated from Bigeum Island, Korea and one from alpine glacier cryoconite; 6 Nesterenkonia spp., some of which were previously isolated from mud and water samples of volcanoes in Xinjiang (China), others were halophilic or halotolerant, alkaliphilic or alkalitolerant, like Nesterenkonia halotolerans [93%], Nesterenkonia sandarakina [98%]; uncultured Mycobacterium sp. (endolithic community of Yellowstone geothermal environment); Nocardioides sp. and Streptomyces sp. like Streptomyces rimosus [97%] and Streptomyces sp. 175(2010) [94%] previously isolated from marine sediment; Leifsonia sp. (aquatic); Yaniella soli [98%] slightly halophilic, soil; Atopobium parvulum [98%] marine bacterium; Phycicola gilvus aerobic, non-mycelium-forming actinomycete isolated from a living seaweed; Subtercola frigoramans psychrophilic from permanently cold groundwater in Finland; uncultured actinobacteria from soils of Marble Point and Wright Valley, Victoria Land [94%], Antarctica; hyper-arid polar desert [100%]; integrated lake epilimnion Wisconsin, Cox Hollow Lake [97%]; Atacama Desert soil ammonia-oxidizing bacteria.
  Metabolic characteristics: heterotrophic, organic decomposition, some reduce arsenic, capable of nitrogen fixation, anammox, nitrite and ammonia oxidizing
  Matching NCBI GI numbers: 285797411, 284451155, 285802785, 285802883, 285201485, 12641602, 281485357, 290759807, 269979854, 239924942, 285178409, 256826526, 282598458, 282598461, 291290503, 255926777, 194680069, 282934985, 197114172, 293629580, 134105934, 134105935, 289594400, 111146878, 117644155, 119951388, 270341314, 292596387, 292596389, 145293728, 145293734, 158562718, 158562707, 228007468, 295809764, 289598546, 295809753, 260100813, 158702950, 219809025, 240129659, 116119394, 294992046, 294992048, 158551986, 157703991, 265678815, 295393241, 471965147, 471964577, 289551862, 260871474, 62736086, 34329838, 146166675, 105990462, 154186968, 154199122, 218533798, 284387472, 290565057, 293618152, 269113439, 295410271, 215983420, 293629578, 110238724, 270048059

Bacteroidetes**
  № of unique seq: 132; № of seq ≥ 99% identity*: 44
  Ecological characteristics: Almost all sequences were closely related to sequences of freshwater or wastewater species: uncultured Bacteroides sp. (19 sequences) freshwater, either residing at Lake Michigan beaches, various water sources, river waters, anaerobic digestion of sludge, farm soil adjacent to a silage storage bunker. Some of the hit sequences were from oil sands tailings ponds, as well as from Eastern Mediterranean atmosphere. Several sequences were similar to those from Bacteroidetes species isolated from soil in the rhizosphere of faba beans; Sphingobacterium shayense previously identified from forest soil and Sphingobacterium sp. from Lake Michigan water; psychrophile Gillisia limnaea previously found in microbial mats in Lake Fryxell, Antarctica; all sequences similar to the Bacteroidaceae family were previously found in association with […] intestinal tract or excrement; Flavobacterium johnsoniae animal and plant pathogen; Prevotella melaninogenica obligate anaerobes; Pedobacter steynii rhizobacteria; Flavobacterium sp. [98%] from soil environment; Sphingobacterium shayense [98%] aerobic soil environment; Zunongwangia profunda [96%] is a low-oxygen deep sea bacterium
  Metabolic characteristics: Some of these species are capable of fixing carbon via the reductive TCA cycle (rTCA) with electron donors from sulfide, hydrogen and sometimes ferrous ions, pentose phosphate pathway
  Matching NCBI GI numbers: 60266503, 126131306, 146298219, 149280567, 150004775, 154193781, 154193791, 175940971, 184132972, 186926082, 189431537, 189431608, 189460866, 189461995, 197360255, 203289069, 208689585, 213399744, 217337429, 237709863, 239619880, 253683837, 261265014, 281420760, 283443628, 284451151, 290759830, 290759845, 291329690, 291329981, 291330802, 291330808, 291330883, 291330884, 291332865, 294613743, 300770671, 325298068, 325298624, 332876550, 332877849, 357043085, 374595657, 392495344

Chloroflexi
  № of unique seq: 1; № of seq ≥ 99% identity*: 0
  Ecological characteristics: Sphaerobacter thermophilus [83%] green non-sulfur bacteria, thermophilic, obligate aerobe
  Metabolic characteristics: chemoheterotrophic, 3-hydroxypropionate cycle

Cyanobacteria
  № of unique seq: 306; № of seq ≥ 99% identity*: 56
  Ecological characteristics: psychrophilic/psychrotolerant uncultured Antarctic cyanobacterium from Lake Fryxell, Taylor Valley [96%], Orange Pond and Fresh Pond, McMurdo Ice Shelf [94%], Forlidas Pond, Transantarctic mountains [93%], and one uncultured marine bacterium isolated from Prydz Bay, Antarctic sea water; 3 uncultured cyanobacterium sequences matched microbes from soil of Canyonlands National Park, Utah; Nostoc flagelliforme terrestrial photosynthetic, Inner Mongolia, Sunitezuoqi; Nostoc muscorum nitrogen-fixing cyanobacteria in the Brazilian Amazon floodplain; Nostoc sp. Mollenhauer associated with species of Peltigera fungus; Nostoc sp. SKJF2 symbiotic organism of hepatic liverwort from the genus Blasia, northern Norway; Leptolyngbya sp. 0BB32S02 from water and sediment samples of saline wetland of Salar de Huasco, Chilean Altiplano; Lyngbya birgei CCC 333 from paddy field; 2 environmental Microcoleus sp. sequences; Oscillatoria amoena freshwater algae culture Collection of the Institute of Hydrobiology of China; Oscillatoria prolifera freshwater; Oscillatoria sp. PCC 7112 marine; Oscillatoriales cyanobacterium 2Dp86E associated with marine hydroid cnidarian Dynamena pumila L. [98%]; Oscillatoriales cyanobacterium IL-1.4 aquatic, possible freshwater; Phormidium cf. autumnale CCALA 757 algae based (marine, aquatic, freshwater) [98%]; Phormidium sp. 1668785; uncultured Oscillatoriales cyanobacterium clone QB53 isolated from quartz in a Tibet desert; Phormidium autumnale [98%] Cyanobacteria previously found in freshwater river, New Zealand; thermophilic uncultured Thermosynechococcus sp. [95%]
  Metabolic characteristics: reductive pentose phosphate pathway
  Matching NCBI GI numbers: 1668785, 18182395, 18182418, 18182490, 18182491, 23978201, 29124940, 33327320, 37782175, 45643565, 46409897, 67482522, 82470879, 82697090, 89242016, 105990262, 111610610, 119657303, 121308615, 143635043, 146740644, 149364155, 93359876, 149364160, 154361765, 157384116, 163962573, 166997748, 172050761, 189484695, 189484726, 213053926, 213053927, 213053932, 219883489, 225031951, 225382352, 225696183, 225696274, 225696281, 225696290, 226844847, 229563895, 229563905, 253749514, 258547396, 260595431, 285015322, 285015409, 291360379, 291603790, 295149401, 358680771, 428244862, 428318681, 428319688

Deferribacteres
  № of unique seq: 1; № of seq ≥ 99% identity*: 0
  Ecological characteristics: previously isolated from animal intestinal tract, anaerobic, also include thermophilic and mesophilic aquatic environment; […] schaedleri [97%]
  Metabolic characteristics: heterotrophic, chemoorganotrophic

Deinococcus-Thermus
  № of unique seq: 8; № of seq ≥ 99% identity*: 2
  Ecological characteristics: D. marmoris [97%] UV- and drought-tolerant bacteria isolated from Antarctic soil and rock samples; D. radiodurans [97%] radioresistant polyextremophile - cold, dehydration, vacuum, acidic environment; M. ruber [98%], T. thermophilus [95%], and M. silvanus [99% and 98%] extreme thermophiles previously isolated from hot springs and thermal vents; Deinococcus deserti [86%]
  Metabolic characteristics: heterotrophic, chemoorganotrophic
  Matching NCBI GI numbers: 37665094, 284055538, 290469363, 294979666, 226319394, 296848933

Fibrobacteres
  № of unique seq: 1; № of seq ≥ 99% identity*: 0
  Ecological characteristics: previously isolated from animal intestinal tract, anaerobic: uncultured Fibrobacter sp. [96%]
  Metabolic characteristics: heterotrophic, chemoorganotrophic
  Matching NCBI GI numbers: 145567431

Firmicutes** (total numbers)
  № of unique seq: 613; № of seq ≥ 99% identity*: 71
  Ecological characteristics: 2 sequences 98-99% Bacillus agaradhaerens are alkaliphilic, halophilic and halotolerant organisms most commonly found in salt lake sediments (like Bange salt lake); 3 sequences Bacillus cereus soil, cold adaptive; Bacillus circulans most commonly found in soil; Bacillus decisifrondis soil underlying decaying leaf foliage; Bacillus halmapalus freshwater Tarim River, China; Bacillus megaterium soil; Bacillus spp. previously found in deep sea sediment, soil and soda ponds and lakes of Hungary, air, marine bacteria in sediments of Yellow sea South Korea, salt mine deposit China; aerobic heterotrophic halophilic and halotolerant bacteria Marinococcus sp. [98% and 99%] from Bange salt lake and from the Great Salt Plains of Oklahoma; thermophilic uncultured Bacillus sp. previously isolated from Icelandic geothermal waters and one soil; Alkaliphilus oremlandii [98%] previously isolated from sediments of the Ohio River (Pittsburgh PA); Clostridium sp. F-02 metal-reducing previously isolated from thermophilic methanogenic sludge; Eubacterium tenue spore-forming bacterium; sequences closely related to uncultured Firmicutes previously isolated from completely different environments: mesophilic anaerobic microbes that have been used in municipal wastewater sludge, thermophiles from volcanic environments, closely related to a sequence from a marine […] (Prostylyssa foetida) symbiont. Other uncultured Firmicutes sequences were related to planktonic bacteria from the northern Bering Sea, ammonia-oxidizing soil bacteria from the Atacama Desert; Antarctic terrestrial habitats.
  Metabolic characteristics: heterotrophic
  Matching NCBI GI numbers: 323142324, 15042017, 24415973, 30908820, 45934548, 78038860, 86279623, 91093763, 94962022, 110673209, 110673209, 134290402, 145902672, 171336058, 209552631, 222350117, 223959348, 229264291, 238769132, 253579944, 253683955, 256797400, 257751815, 261368112, 269994025, 270297792, 282765781, 283131629, 283486716, 283945409, 284428514, 289065368, 294337906, 294769167, 294799804, 294999187, 295315546, 295317476, 295646952, 295790202, 295815415, 358065153, 407955691, 20975393, 85542635, 146430648, 154186415, 154189328, 154189837, 154193192, 154193211, 260072815, 290565065, 291328355, 291329218, 291329502, 291329590, 291329778, 291329796, 291330079, 291330349, 291330933, 291331108, 291331129, 291331435, 291332873, 291332909, 294799805, 294799812, 295656276, 295656572

Table 4. Cont.

Firmicutes 80 39 : Lactobacillus curvatus lactic acid heterotrophic 354806737, 15146028, Lactobacillaceae bacteria in various fermented foods; Lactobacillus 15146030, 50080123, delbrueckii animal intestinal lactobacilli; 57864919, 90820184, Lactobacillus fermentum; Lactobacillus rhamnosus 103422338, 116103724, gut of healthy juvenile flounder, animal blood 116106497, 121581899, cultures; Lactobacillus salivarius gastrointestinal 163954906, 167046809, tract; Lactococcus lactis associated with paddy rice 189345361, 190711126, silage. 195537732, 237512288, 257149867, 257152781, 285171057, 285201718, 285201754, 285802968, 285802972, 288225761, 288812699, 290760129, 292673284, 292673285, 292673288, 294438970, 294938080, 295149327, 312279338, 325124855, 325332286, 333956887, 355393429, 406356677, 413973243 Firmicutes 38 19 Primarily consisted of species found in sea water heterotrophic 323490869, 389820589, and sediment environments; however several 458759422, 389819483, representatives were isolated form volcanoes mud 323487712, 293633232, samples as well as alpine permafrost from China. 262410342, 255689464, Planococcus maitriensis from Lonar Crater 291419708, 89257980, sediment water; Planococcus psychrotoleratus from 240248421, 209917046, mud volcanoes in Xinjiang; Planococcus sp. 294992062, 198250506, Norway seawater and frozen soil (permafrost) in the 270282461, 257043974, Tianshan Mountains; Planomicrobium koreense soil 187711733, 292485796, from Lop Nur region, Xinjiang; 292485814 Planomicrobium psychrophilum was previously detected in the mud volcanoes in Xinjiang (China) [98%]; Planomicrobium sp. air samples from Xinjiang Province; Planomicrobium flavidum isolated from a marine solar saltern; Sporosarcina sp. from freshly deposited granite sand of Damma glacier (Central Alps) and layered soil South Shetland Islands, Antarctica; 3 sequences [98%, and 70

two 100%] uncultured Planococcus sp. soda lake on salt-secreting desert tree, glacial snow from glacier in Tibetan Plateau, sandy loam permafrost soil Laptev Sea coast Siberia Firmicutes 29 9 [98%] Jeotgalicoccus halotolerans associated with heterotrophic 289551402, 443424428, marine environment; Jeotgalicoccus sp. YIM 86279588, 219809027, KMY9-1 ancient deposits in the Kunming salt mine 295443962, 242027442, of China; 3 sequences [95%, 96%, 97%] 75753540, 238835868, sp. were from sea water bacterial 259120992 (marine seaweeds, seawater and mussels Perna canaliculus) from the coast of the North Island of New Zealand; Staphylococcus sp. isolated from tobacco whitefly Bemisia tabaci, diseased citrus leaves from Istanbul, and soil; uncultured Staphylococcus sp. animal/human associated, anaerobic microbes in spacecraft assembly clean rooms, air sample collected 25 meters above sea level of Baltic Sea coast Firmicutes 8 5 psychrophilic uncultured Carnobacterium sp. was heterotrophic 257167997, 49617305, Carnobacteriaceae previously detected from a Northeast Siberian 284022001, 292485813, seacoast permafrost sample, associated to Octopus 295147519 maya; 2 sequences were close to Carnobacterium sp. animal associated and fungus associated; Trichococcus sp. Cheon-ho reservoir South Korea Firmicutes 142 64 16 sequences were closely related to Lactococcus heterotrophic 3582195, 11991762, Streptococcaceae lactis primarily plant associated, 5 sequences were 15593133, 21591599, Veillonellaceae related to Streptococcus australis, Streptococcus 24474984, 28274377, salivarius plant pathogen; Streptococcus suis and 60501133, 60501147, other uncultured Streptococcus sp. 
animal pathogen, 60501679, 60501751, as well as from oral, amniotic and intestinal fluids; 77819579, 110613571, Veillonella parvula strict anaerobe, oral and gut 113120160, 116108977, bacteria; Selenomonas sputigena anaerobic oral and 124244807, 124491690, upper respiratory tract bacteria 159032974, 171191150, 189305923, 206600409, 208657445, 223470134, 225029381, 225724295, 254971972, 254972501, 259221069, 269911992, 71

281374316, 284176962, 285159329, 284451152, 285159297, 285177736, 285178071, 285178094, 285194533, 285194556, 285206259, 285803105, 290759892, 290759899, 292557464, 294828947, 295002587, 295002588, 295002592, 295322329, 295646981, 295815580, 302024578, 326682110, 335369081, 392602377, 406368402, 413973243, 269093698, 294795056, 285166114, 285166457, 162846335, 62910916, 285166317, 285802763 Fusobacteria 11 3 anaerobic, parasitic on : uncultured heterotrophic, 253684045, 265678963, Leptotrichia sp. [99%]; Clostridium rectum [99%]; chemoorganotrophic 213536832, 257048753 Fusobacterium necrophorum [99%]; Leptotrichia buccalis [*98%] Uncultured bacterium 1823 643 iron-reducing bacterium [97%]; halophilic unknown 290782562, 295018012, bacterium [100%]; uncultured sponge symbiont 295018051, 295018133, PAUC32f [98%]; 5 uncultured compost bacterium; 295026878, 295027011, uncultured Gram+ bacterium; 6 uncultured marine 270305386, 284467687, bacterium - 1 Antarctic coastal site and 4 sequences 284467695, 284467703, close to microbial community of the coral Acropora 284468021, 283981089, eurystoma; 3 uncultured Rumen bacterium 16517886, 194475432, 194475452, 260750894, 16517864, Planctomycetes 7 2 Pirellula staleyipreviously isolated from fresh and Chemoautolitho- 91199943, 283778000 brackish water, as well as from hypersaline lakes, trophic, anammox, anaerobe: Kuenenia stuttgartiensis [100%]; oxidation of nitrite Pirellula staleyi [*94%] to nitrate with ammonium as e- donors for carbon fixation 72

Proteobacteria (alpha) 122 29 heterotrophic, halophilic Brevundimonas sp. nitrate and nitrite 32263857, 32328294, associated with cyanobacterial blooms also land reducing, nitrifying 55975756, 71844062, plants associated; deep sea water (1000m depth) and denitrifying, 83592425, 111606883, uncultured Agrobacterium sp.; Mesorhizobium ammonia 125656032, 146189981, australicum associated with Biserrula pelecinus L. assimilation, 186923312, 197114179, (Australia); uncultured alpha proteobacterium methylotrophic, 197360272, 206581410, [96%] was previously found in Mid-Atlantic Ridge aerobic and 209423311, 220914710, hydrothermal sediment, another [97%] from un anaerobic, carbon 224027504, 237638489, vegetated soil environments at Coal Nunatak, fixation with rTCA, 242963821, 285200728, Antarctica; psychrophilic Paracoccus sp. previously reductive pentose 288926859, 289064903, isolated from East Antarctica inland, phosphate pathway, 294662650, 294992056, halophilic/halotolerant Paracoccus sp. from saltern capable of carbon 295147952, 295809779, water; associated with deep-water marine monoxide fixation 337270029, 359790610, invertebrates; marine Paracoccus yeei sequence was 393719433, 433773923, similar to those isolated from the Caribbean coast, 440227685 Costa Rica, and marine Paracoccus sp. [98%] from South China Sea; psychrotolerant, phototrophic Rhodobacter changlensis from Himalayas of India; thermotolerant non sulfur purple bacterium Rhodospirillum centenum; anoxygenic Rhodospirillum rubrum grows on carbon monoxide; extremotolerant Sphingomonas dokdonensis; North Sea Sphingomonas sp.; Sphingopyxis alaskensis [97%] Proteobacteria (beta) 144 36 betaproteobacterium enrichment culture from water nitrogen fixation, 407894523, 295656273, sample collected from a natural gas storage (800 m diazotrophic, nitrate 402569635, 77965403, depth); Burkholderia cepacia soil; Burkholderia sp. reduction, oxidation 387903695, 387575654, endophytic bacterium of rice; Burkholderia sp. 
of ammonia, iron, 387578572, 76665718, previously found in roots of endemic trees of arsenic, and 416909337, 390570583, Madagascar; thermal aerobic Caldimonas manganese, 109659435, 189047087, hydrothermale and Caldimonas manganoxidans; reductive pentose 290350907, 213536827, psychrophilic bacterium from phosphate pathway, 85002019, 300072131, glacier surface of Gulkana Glacier, Alaska; Delftia electron donors are 62183809, 133737197, acidovorans animal pathogen; nitrogen-fixing, reduced sulfur 290759918, 299070035, plant-growth promoting Herbaspirillum compounds and 469772332, 30407127, seropedicae; Herminiimonas arsenicoxydans from As[III] 334194119, 206593202, aquatic environments contaminated with heavy 404395813, 309778701, metals; Ralstonia solanacearum plant pathogen; 109140177, 154192572, 73

anaerobic intestinal tract as well as sewage 284793547, 254972620, treatment plant uncultured betaproteobacterium; 269113442, 295656441, psychrophilic soil uncultured Burkholderia sp. from 184189965, 238915008, Pindari Glacier; freshwater uncultured 294828897, 189305112 Comamonadaceae bacterium (from Lake Michigan) Proteobacteria (delta) 12 0 aerobic sp. [98%] soil, sea water and rTCA cycle, fresh water environment, mesophilic anaerobic sulphate reducing, uncultured deltaproteobacteria bacterium from iron reducing wastewater treatment [95%]; halophilic Gram- bacteria negative Haliangium ochraceum; uncultured deltaproteobacterium collected at 500m depth in mesopelagic Antarctic waters [84%]; sulfate reducing moderately halophilic, mesophilic Desulfovibrio alaskensis [84%] Proteobacteria 6 1 animal associated (pathogens), aerobic and rTCA cycle 112012332 (epsilon) anaerobic; thermophilic: Helicobacter sp. [100%]; Campylobacter concisus [97%]; uncultured Helicobacter sp. [97%] Proteobacteria 208 90 psychrophilic/psychrotolerant piezophilic hydrocarbon 399543417, 1907095, (gamma)** Marinobacter sp. from Arctic Ocean water depth of degrading, organic 13276758, 22135588, 1568 m; paddy soil Klebsiella sp.; 3 carbon oxidation, 22217940, 28932767, sequences one of which was sulfur oxidation, use 32127599, 33334429, close to primary endosymbiont of Sitophilus oxidized manganese 34525843, 38601962, zeamais; 6 different Halomonas from ether salt and iron as electron 46250582, 46253495, lake, deep-sea or deep-ocean sediment or halophilic acceptors, 63253978, 71065631, bacteria from soil samples; one sequences for heterotrophic , 89093021, 89257986, Halomonas sp. high tolerance to arsenite; 20 chemolithoauto- 108733343, 117551063, sequences matched different Psychrobacter sp. 
trophic, nitrogen 129561834, 151564445, sequences from various environments: ether cold- fixing, nitrate 151564597, 152061209, adapted soil bacteria or air sample collected 25 reduction, nitrite 154103703, 157367007, meters above sea level (Sweden, Baltic Sea) or respiration, 158562692, 158699333, seawater from the Northern Bering Sea, as well as reductive pentose 163676404, 164506999, from deep-sea sediment of the Pacific Ocean phosphate pathway 164707704, 168812014, (responsible for metal cycling); also Psychrobacter 186702557, 187427012, sp. from cold saline (7.5% salt) sulfidic spring of 192757978, 195364246, Canadian High Arctic, South China seawater, 195969270, 209421463, Antarctic soil, or manganese bacteria previously 209422607, 218203793, isolated from Arctic deep-sea sediment; 220936495, 222142510, 74

Psychrobacter maritimus from production 225055455, 226815632, wastewater treatment plant (China); Psychrobacter 226944628, 228007453, pulmonis associated with plants; Psychrobacter 238543855, 238623540, arcticus psychroactive Siberian permafrost 239835504, 242124429, bacterium; Psychrobacter faecalis animal 254621816, 259090477, excrement; psychrotrophic halophilic Psychrobacter 259090496, 259121295, immobilis bacteria isolated from marine 262374579, 263040682, environments; 20 different Pseudomonas sp. were 283486725, 283580095, originally found ether in sea coastal ecosystems, 283979647, 284813477, like coastal waters near the station GDN064 at 284999744, 285027202, South China Sea (20 m depth), or deep-sea sediment 285192491, 289186780, (East Pacific region), one of which was polycyclic 289655714, 290759922, aromatic hydrocarbon (PAH)-degrading bacteria 291072775, 291072776, from deep sea sediment; Pseudomonas sp. were also 291195506, 291293761, previously isolated from wastewater and soil 291293769, 294861179, treatment samples, arsenic-contaminated 294992072, 294997060, environment (Environmental Sites in China), and 294999830, 295003953, from activated sludge in a high strength ammonia 295345518, 295345523, wastewater treatment facility; 295393594, 295647398, psychrophilic Pseudomonas sp. were previously 295651552, 295687302, described as free-living heterotrophic nitrogen- 295791645, 295815419, fixing bacteria isolated from fuel-contaminated 295815425, 352106048, Antarctic soils or from oil soil samples capable of 384478111, 400287034, degrading 4-aminobenzaldehyde, or cold-adapted 400287352, 400287901, bacteria from Arctic sea ice (Canada Basin) and 400288130, 400288736 deep-sea extremophiles from Antarctica; 10 uncultured Pseudomonas sp. 
were related to sequences found in: deep subsurface microbial communities of Nankai Trough sediments, hydrocarbon-contaminated Antarctic soil from around Scott Base or metal-containing water fluid associated with biofilms (Ann Arbor, MI); halophilic Pseudomonas sp. from saltern of South Korea; Pseudomonas xanthomarina alkaliaphilic bacteria isolated form soda ash sludge; an obligate aerobe Azotobacter vinelandii; slightly halophilic, strictly aerobic, motile chemoorganotrophic Neptuniibacter caesariensis 75

Proteobacteria 1 0 proteobacteria symbiont of Nilaparvata lugens (symbiont) [97%] a rice pest, transmitting rice viruses Proteobacteria 6 2 freshwater or near shore bacterioplankton unknown 184190210, 262409981 (uncultured) communities of Lake Michigan Spirochaetes 4 3 animal pathogen, anaerobic pilosicoli heterotrophic 288225748, 288225749, [*99%] 430780086 Tenericutes 4 0 pathogens and symbiotic bacteria associated with heterotrophic insects and some saprotrophic fungi, anaerobic: Acholeplasma axanthum [91%]; Mycoplasma feliminutum [90%]; Spiroplasma diabroticae [88%]; Acholeplasma equifetale [81%] Verrucomicrobia 5 0 aerobic soil (rhizosphere) and aquatic (freshwater) heterotrophic bacteria: Pedosphaera parvula [98%]; Uncultured Verrucomicrobia [90%]; Akkermansia muciniphila [88%]; Pedosphaera parvula [*73%]; Lentisphaerae [*62%] marine found in surface and mesopelagic waters of the Pacific and Atlantic oceans ARCHAEA 2 2 Uncultured archaea from hydrate-bearing deep carbon fixation with 207366080, 262527001 marine sediments, anaerobic, psychrotolerant reductive acetyl- CoA pathway, methanotrophic EUKARYOTA 207 67 AMOEBOZOA 1 0 Nolandella sp. [88%] aquatic, unicellular, heterotrophic bacteriovores, also feed on diatoms and ARCHAEPLASTIDA (Viridiplantae) Chlorophyta (green alga) 10 1 all green alga: Pseudendoclonium akinetum and reductive pentose 56159573 Pyramimonas tetrarhynchus [98%] ether deep-sea phosphate pathway sediment or freshwater environment; stagnorum [97%] terrestrial from Dolomite mountains, boggy water Rhodophyta (red algae) 1 0 Gracilaria tenuistipitata var. liui Antarctic red alga, reductive pentose similar to Plocamium cartilagineum previously phosphate pathway found in Antarctica 76

Streptophyta (land 57 15 The origin of land plants sequences in the lake is reductive pentose 219819090, 367479280, plants) undefined. They are ether coming from pollen phosphate pathway 259526188, 166084404, accumulated in the sediment of the lake, or are 170522360, 62149314, being deposited into the lake from the overriding 161621688, 290782507, glacier, or both, either way these are probably non- 148910867, 255099160, viable 209361311, 284506657, 45386022, 329124647, 239764707

CHROMALVEOLATA Ciliophora (protozoan) 2 1 Uroleptus pisces freshwater; Sterkiella heterotrophic 5566332 histriomuscorum [93%] Cryptophyta (algae) 1 0 Cryptomonas paramecium [96%] non- reductive pentose photosynthetic deep freshwater algae phosphate pathway Heterokontophyta 5 3 Aphanomyces euteiches [100%, 98%] plant and reductive pentose 166044520, 46019696, (stramenopiles) animal pathogen; Botrydiopsis constricta soil Signy phosphate pathway 225545967 Island, Antarctica; uncultured labyrinthulid seawater and sediment water mold; Halosiphon tomentosus [97%] brown algae Myzozoa (alveolata) 1 1 Perkinsus marinus parasite of bivalve mollusks (i.e. heterotrophic 294935818 clams, oysters), marine Bacillariophyta (diatom) 3 2 Stephanodiscus sp. marine and freshwater diatom; reductive pentose 98990721, 268633305 Hantzschia amphioxys [95%] freshwater algae from phosphate pathway Australia EXCAVATA (protozoa) 1 0 Naegleria gruberi [98%] free living freshwater non heterotrophic parasitic and soil species Euglenozoa 1 1 Trypanosoma cruzi freshwater unicellular parasitic heterotrophic 336170404 (rinetoplastida)

OPISTHOKONTA (Animalia) Arthropoda 14 1 Lepidocyrtus sp. Yan Gao 06126; heterotrophic 148372021 dorsosignata [98%]; aquatic Daphnia pulex [98%] possibly psychrophilic Bilaterian 1 0 psychrophilic halotolerant Bilateria environmental - sample [92%] from Alps 77

Chordata 4 1 The origin of the Gallus gallus [99%] sequence in - 218047175 lake is unknown. It is ether a contaminant or from meteoric ice Cnidaria 1 0 halotolerant Nematostella vectensis [78%] small heterotrophic sea anemone found on the Atlantic and Pacific coasts of North America, as well as the coast of southeast England; pH values from <7 to >9, tolerate extremely wide range of temperatures -1 to +28°C Mollusca 1 1 Nutricola tantilla [100%] marine sediment bivalve heterotrophic 159031144

Rotifera 1 0 Adineta vaga [98%] microscopic animal hunting heterotrophic various organisms (protist, bacteria and algae), previously found in Antarctica Tardigrada 1 0 Milnesium tardigradum [93%] waterbear, heterotrophic radiotolerant extremophile, the same species were previously found in Antarctic ocean OPISTHOKONTA (Dikarya) Ascomycota 43 21 Davidiella tassiana, doliolum and heterotrophic 284158823, 294987020, Verticillium dahliae plant pathogens; 6 sequences 169893764, 31415568, matched Didymellaceae species associated with 282160302, 294987114, land plants, however, among those could be animal 294987118, 6537145, pathogens as well as marine Phoma species; 1888314, 254028318, Nigrospora sp. associated with Taxus globosa; 171673225, 90577143, Candida ontarioensis previously isolated from 288557599, 284192847, wood-boring beetle larvae; 5 sequences of 154563033, 208879720, Phaeosphaeriaceae species: Phaeosphaeria sp. soil, 290760004, 156099766, Phaeodothis winteri aquatic, Phaeosphaeria 259145041, 259147625, avenaria and Phaeosphaeria nodorum plant 238033210 pathogens, Phaeosphaeria spartinicola marine fungus; Pichia pastoris methylotriphic fungus; Articulospora tetracladia [98%] aquatic, sedimentary, organic decomposition Basidiomycota 31 7 Cryptococcus sp. Insect (ants) associated; heterotrophic 224979500, 109452380, Rhodotorula lamellibrachiae and Rhodotorula 117168475, 124377860, glutinis [95%] marine deep sea psychrophiles, other 260279026, 164657686, Rhodotorula sp. were previously found in Antarctic 164662831 78

sea ice; Sistotrema brinkmannii on decaying wet wood; psychrophilic/mesophilic Cryptococcus neoformans; Dioszegia rishiriensis previously isolated from soil, plant associated, glacial melt and Antarctica; Malassezia globosa animal associated; Sakaguchia dacryoidea [98%] marine Zygomycota 1 1 Gongronella sp. xt-2009 [99%] unknown 227452751 uncultured fungus 12 6 uncultured fungus from Lakshadweep coral reef unknown 262358083, 268637040, sand, Arabian Sea; Mojave Desert soil; Lizonia 219563746, 234195395, sexangularis (originally was uncultured fungus 289470189, 85700735 record) is a necrotrophic plant parasite; animal gut associated, plant-pathogen resistance to herbivores uncultured compost fungus uncultured marine 1 1 uncultured marine fungus [99%] deep-sea/deep- unknown 157955968 fungus ocean hydrothermal vent RHIZARIA 1 0 Paulinella chromatophora [94%] freshwater heterotrophic Cercozoa phototroph, its closest marine relatives feed on cyanobacteria uncultured rumen 1 1 uncultured rumen protozoa [100%] unknown 39547204 protozoa uncultured eukaryote 13 3 uncultured eukaryote unknown 198444245, 291263834, 222089870 uncultured phototrophic 2 1 2 x uncultured phototrophic eukaryote [93%] unknown 223674440 eukaryote VIRUS 2 0 thermotolerant Enterobacteria phage phiX174 unknown [98%] and Propionibacterium phage PA6 [79%]

Table 5. The taxonomic summary of results for the V6 sequences. All unique sequences listed in Tables S7-S10 and S12 were manually screened through the NCBI records to determine their origin. The NCBI BLAST searches for those metagenomic sequences that matched at a stringency of 99 and 100% sequence identity (for mRNA sequences, translated BLAST > 90% positive substitutions) were used to describe ecological and metabolic characteristics, specifically, information from NCBI descriptions, their reported publications (if present), and other web sources. The BLAST results with stringencies lower than 99% (for BLASTx, < 90% positive substitutions) were used only if no results at ≥ 99% were available, and are shown in the table in square brackets “[]”; their accession numbers can be found in Tables S1-S6 and S11. The translated BLAST (BLASTx) results present in this table are marked with an additional asterisk “[*]” next to the % positive substitutions. Taxonomic groups for which the number of NCBI matches was too large to include all of them in the ecological characteristics section are marked “**”.

| Taxon | № of unique seq | № of seq ≥ 99% identity* | Ecological/physiological characteristics for metagenomic sequences that were closely related to those of organisms reported on NCBI gene bank | Metabolic characteristics | Matching NCBI GI numbers for sequences 99 and 100% identity* |
|---|---|---|---|---|---|
| Bacteria | 139 | 51 | | | |
| Actinobacteria | 11 | 5 | uncultured Frankineae bacterium from acidic high-Arctic wetland permafrost soil; methylotrophic Micrococcus sp.; Mycobacterium marinum striped bass (Morone saxatilis) pathogen; Renibacterium salmoninarum fish pathogen; Corynebacterium glutamicum [*91%] soil | heterotrophic, chemoorganotrophic | 262063836, 268032028, 196174919, 162952245, 418245854 |
| Bacteroidetes/Chlorobi | 1 | 1 | Dyadobacter fermentans species plants associated, previously isolated from soil and deep sea sediment, aerobic, alkaliphilic, psychrophilic - is able to grow at 4°C | heterotrophic | 254946573 |
| Chloroflexi | 1 | 1 | uncultured Chloroflexi bacterium from bauxite soil (aluminum ore, bauxite) | | 290794322 |
| Deinococcus-Thermus | 1 | 0 | [79%] thermophilic (previously isolated from hot springs and thermal vents), radiophilic, some associated with cyanobacteria | heterotrophic, chemoorganotrophic | |
| Firmicutes | 12 | 5 | thermophilic, mesophilic, and psychrophilic; Staphylococcus sp. citrus plant pathogen; uncultured Lachnospiraceae bacterium animal intestinal tract parasite; Bacillus clausii soil; Staphylococcus haemolyticus and Staphylococcus capitis [*100%] animal associated; previously isolated alkaliphilic Bacillus and Staphylococcus from lake sediments | | 295443962, 154184543, 294337929, 68445725, 223043362 |
| Fusobacteria | 1 | 0 | Leptotrichia buccalis [92%] free-living species are anaerobic, mesophilic, animal parasite | heterotrophic | |
| Uncultured Bacteria | 48 | 14 | sequences similar to those from different metagenomics studies from uncultured bacteria (animal parasite/symbionts, from freshwater wetland soils, wastewater treatment, deep-ocean (Pacific Ocean); freshwater rivers, mould-colonized water, arsenic rich soil, uncultured rape rhizosphere); lobster gut bacterium ABHa3 [98%] aerobic; uncultured bacterium [95%] from deep sea sediment of western Pacific warm pool | unknown | 70959311, 169283493, 291507684, 9187671, 126402247, 295687301, 291260719, 193084619, 223675496, 292596344, 238341248, 261262250, 295814818, 291258986 |
| Proteobacteria (alpha) | 7 | 3 | Brevundimonas sp. freshwater associated with cyanobacterial water blooms; psychrophilic and mesophilic Brevundimonas sp. were also previously isolated from Arctic and Antarctic soil and glaciers; Bradyrhizobium sp. aquatic photosynthetic; citrus rhizosphere Subaequorebacter tamlense | heterotrophic, nitrogen-fixing, reductive pentose phosphate pathway | 224027508, 146403799, 134084827 |
| Proteobacteria (beta) | 34 | 13 | Herbaspirillum sp. animal and plant associated; Burkholderia sp. soil, plant (i.e. sugarcane) roots [rRNA and mRNA sequences]; Polaromonas naphthalenivorans [98%]; Acidovorax sp. [*98%] soil; Ralstonia solanacearum [*91, 92, 97%] plant pathogen; Herminiimonas arsenicoxydans [*96%] aquatic arsenic contaminated environment; free-living Acidovorax radicis [*96%] is associated with ; Limnobacter sp. [*95%] thiosulfate-oxidizing bacteria previously isolated from freshwater sediment; Burkholderia sp. rhizosphere nitrogen-fixing bacteria; Burkholderia phytofirmans [*92%] endophyte; thermophilic uncultured sp. [97%] aquatic highly adaptable from hot springs (close relative of Lake Vostok Hydrogenophilus thermoluteolus, previously reported by Bulat and colleagues in 2004); thermophilic uncultured betaproteobacterium [96%] from high-temperature volcanic environment | nitrogen-fixation and recycling, diazotrophic, heterotrophic, chemoorganotrophic, arsenic oxidation and reduction, sulfur and hydrogen oxidizing, chemolithotrophic, reductive pentose phosphate pathway | 285200309, 284810302, 120591888, 407894523, 299065054, 387575654, 133737197, 351728124, 149926016, 77964193, 187713229, 334194119, 30407127 |
| Proteobacteria (gamma) | 21 | 9 | aerobic and anaerobic; Escherichia sp. from freshwater low level radioactive waste; uncultured Pseudomonas sp. insect larva associated; animal associated uncultured Escherichia sp.; ammonia producing Shigella sp.; uncultured Citrobacter sp. most common in contaminated soils and waters, also found in animal intestine; uncultured Enterobacter sp. animal pathogens; animal intestine ; Shigella species can live for 5 days in freshwater and up to 30 hours in salt water; Pantoea ananatis plant pathogen; uncultured gammaproteobacteria [98%] associated with the marine sponge; [98%] previously found in Antarctica (ref3); Rheinheimera sp. [97%] from mesotrophic lake; rainbow trout intestinal bacterium T1 [93%] | nitrate reduction, nitrogen fixation, heterotrophic, hydrogen oxidation coupled with nitrite reduction, reductive pentose phosphate pathway | 295394130, 295687302, 269931076, 269911743, 294799818, 257073647, 257074351, 281599365, 291150583 |
| Proteobacteria (uncultured) | 2 | 0 | unknown | unknown | |
| EUKARYOTA | 27 | 12 | | | |
| ARCHAEPLASTIDA (Viridiplantae) | | | | | |
| Streptophyta (land plants) | 2 | 2 | The origin of land plants sequences in the lake is undefined. They are either coming from pollen accumulated in the sediment of the lake, or are being deposited into the lake from the overriding glacier, or both; either way, these are probably non-viable | reductive pentose phosphate pathway | 256373711, 290782478 |
| OPISTHOKONTA (Animalia) | | | | | |
| Arthropoda | 5 | 1 | Dermacentor variabilis animal parasite, Acari; free-living Daphnia pulex [98%], freshwater | heterotrophic | 294910885 |
| OPISTHOKONTA (Dikarya, fungi) | | | | | |
| Ascomycota | 11 | 4 | thermophilic Ogataea thermomethanolica methylotrophic yeast; Cladosporium cladosporioides plant pathogen; Davidiella tassiana (Cladosporium herbarum) dead plant tissue; osmotolerant halotolerant Pichia sorbitophila [*99%]; Cyathicula microspora [98%] bryophyte symbiont; Millerozyma farinosa decaying plants [98%]; Candida tropicalis [97%] yeast for ethanol production; 3 sequences [92%, 94%] closely related to those of methanol assimilating and rock-inhabiting (biopitting of marble) fungi | heterotrophic | 284159233, 291482367, 284158823, 356871503 |
| Basidiomycota | 4 | 3 | uncultured Agaricomycotina (some novel soil psychrophilic species found in Antarctica, can be aquatic); thermotolerant Bullera taiwanensis; uncultured Basidiomycota from roots in chernozem soil; Geastrum sessile [98%] associated with orchid from tropical mountain rainforest | heterotrophic | 234195550, 295393258, 292660457 |
| uncultured fungus | 4 | 1 | unknown | unknown | 157925543 |
| uncultured eukaryote | 1 | 1 | uncultured organism | unknown | 256006248 |
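The selection and annotation rule stated in the caption of Table 5 can be illustrated with a short sketch. This is not the procedure used in this study, which was a manual screening of NCBI records; it only restates the rule in code. The names `Hit`, `select_hits`, and `label` are hypothetical, the cut-offs are treated as inclusive (≥ 99% identity for BLASTn, ≥ 90% positive substitutions for BLASTx), and it is assumed, based on how the table entries appear, that every BLASTx result is shown with its percentage.

```python
from dataclasses import dataclass
from typing import List

# Cut-offs taken from the Table 5 caption; treating them as inclusive is an assumption.
NUCLEOTIDE_MIN = 99.0   # % sequence identity (BLASTn)
TRANSLATED_MIN = 90.0   # % positive substitutions (BLASTx)


@dataclass(frozen=True)
class Hit:
    """One BLAST match (hypothetical container, not a real BLAST output format)."""
    organism: str
    percent: float            # % identity, or % positives for translated searches
    translated: bool = False  # True for BLASTx results


def is_high_stringency(hit: Hit) -> bool:
    cutoff = TRANSLATED_MIN if hit.translated else NUCLEOTIDE_MIN
    return hit.percent >= cutoff


def select_hits(hits: List[Hit]) -> List[Hit]:
    """Use high-stringency hits when any exist; otherwise fall back to the rest."""
    high = [h for h in hits if is_high_stringency(h)]
    if high:
        return high
    # Fallback: lower-stringency hits, best first (the ordering is an assumption).
    return sorted(hits, key=lambda h: h.percent, reverse=True)


def label(hit: Hit) -> str:
    """Annotate a hit the way entries appear in the table."""
    if hit.translated:
        return f"{hit.organism} [*{hit.percent:g}%]"
    if is_high_stringency(hit):
        return hit.organism
    return f"{hit.organism} [{hit.percent:g}%]"


if __name__ == "__main__":
    print(label(Hit("Polaromonas naphthalenivorans", 98.0)))
    print(label(Hit("Corynebacterium glutamicum", 91.0, translated=True)))
```

With these assumptions, a BLASTn hit at 98% identity is rendered with its percentage in square brackets, and a BLASTx hit at 91% positive substitutions carries the additional asterisk.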

Psychrophiles

The majority of sequences related to those of psychrophilic organisms were closely related to Gammaproteobacteria species. Among those were sequences closely related to psychrophilic microbes in the NCBI database from species inhabiting deep-sea and ocean sediments, permafrost, and glacial ice. One was a piezophilic Marinobacter sp. [*90%] isolated from more than 1.5 km deep in the Arctic Ocean. At least 24 sequences found in the V5 sample were closest to those from psychrophilic Psychrobacter spp. from marine environments, cold saline springs, deep-sea sediment, and Antarctic soil (Table 4, S1, S3, S4, and S11). Interestingly, the V6 Gammaproteobacteria sequences were primarily similar to those from freshwater and soil environments or those that are animal associated (Table 5). One sequence was similar to that from an uncultured gammaproteobacterium associated with a marine sponge [98%] (Table 5). Within both samples, sequences closely related to those from psychrophilic Actinobacteria were found. They were primarily psychrophilic Nocardioides sp. (V5) and permafrost/soil Micrococcus spp. (V6).

The majority of sequences similar to those from Alphaproteobacteria species in the V5 sample were from psychrophilic and psychrotolerant organisms. Several sequences were also related to those previously described as either halophilic or thermotolerant, for example the non-sulfur purple bacterium Rhodospirillum centenum [99%]. In addition, sequences related to those from halophilic Brevundimonas sp. [94%] and Paracoccus sp. [99%], and from uncultured Alphaproteobacteria found in hydrothermal sediment, were present (Table 5, S3, S4). Only seven unique sequences were similar to those from Alphaproteobacteria in the V6 sample, of which one sequence was closest to a freshwater Brevundimonas sp. [100%] (Table S1). Several V5 and V6 sequences were similar to those from aquatic and soil psychrophilic Brevundimonas sp.

Among the most abundant taxonomic groups, the majority of the V5 sequences (613) were closely related to members of Firmicutes. Approximately one-third (207) were at the highest stringency levels (BLASTn results ≥ 99% identity; tBLASTx results ≥ 90% positive substitutions). Only twelve sequences related to Firmicutes were found in the V6 sample. At least five sequences were >99% identical to sequences from animal and plant associated Staphylococcus and Bacillus species. These BLAST results included psychrophilic, thermophilic, and alkaliphilic species, some of which were previously isolated from lake sediments. The 207 sequences in the V5 sample were closely related to a variety of Firmicutes species from diverse environments. Some were psychrophilic (permafrost soil uncultured Carnobacterium sp. [99%]), thermophilic freshwater (uncultured Bacillus sp. from geothermal waters [100%]), and marine species (Arctic sea ice Planococcus sp. [90%] and Planococcus sp. from Kongsfjorden seawater, Norway [99%]). Also present were alkaliphilic Bacillus agaradhaerens [98% and 99%], halotolerant aquatic Marinococcus sp. [98%], and soil Sporosarcina sp. [100%] species. One sequence was 99% identical to that of an alpine permafrost Planococcus sp. (Table 4 and S1).
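As a consistency check on the counts quoted above, the high-stringency Firmicutes total can be recomputed from the per-row values in Table 4. The sketch below assumes that the second number in each of the six Firmicutes rows of Table 4 (71, 39, 19, 9, 5, and 64) is the count of sequences at ≥ 99% identity for that row; the variable names are illustrative only.

```python
# High-stringency sequence counts for the six Firmicutes rows of Table 4 (V5),
# in the order the rows appear. Reading these values as per-row counts is an assumption.
firmicutes_high_stringency = [71, 39, 19, 9, 5, 64]

# Total number of V5 sequences related to Firmicutes, as stated in the text.
total_firmicutes = 613

high = sum(firmicutes_high_stringency)
fraction = high / total_firmicutes

print(f"high-stringency Firmicutes sequences: {high}")
print(f"fraction of all Firmicutes sequences: {fraction:.3f}")
```

Under that reading, the rows sum to 207, which is about 0.338 of 613 and agrees with the "approximately one-third" stated in the text.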

Thermophiles

Both samples contained sequences similar to those of species of Chloroflexi: one [83%] to the aerobic Sphaerobacter thermophilus (V5), and one to an uncultured Chloroflexi bacterium [99%] growing on bauxite (V6) (Table 4, 5, S4, S9). Some members of Chloroflexi use the 3-hydroxypropionic acid cycle for carbon fixation. Only one out of six V5 sequences was found to be highly similar to that of a Helicobacter sp. [100%]. The Helicobacter spp. are thermophilic, and these chemoautotrophs are often associated with sulfidic deep-sea hydrothermal vent environments (Yamamoto and Takai 2011) (Table 4, S1-S4). The V5 sample also contained sequences related to aerobic thermophilic Betaproteobacteria from hot springs: Caldimonas hydrothermale [100%] and Caldimonas manganoxidans [100%] (Table S1). One V6 rRNA sequence was 96% identical to that of an uncultured betaproteobacterium from a volcanic environment (Table S7) and another was 97% identical to that of an uncultured Thiobacillus sp. (Table S9), which is in the same family (Hydrogenophilaceae) as Hydrogenophilus thermoluteolus, which was previously isolated from Lake Vostok (Bulat et al. 2004). In addition, both samples contained sequences closely related to various Betaproteobacteria sequences previously isolated from soil and freshwater environments, as well as some that were animal and plant associated.

At least eight sequences similar to those from Deinococcus-Thermus were found in the V5 sample and one in V6 (Table 4, 5, S1, S3, S4, S10, and S11). They were closely related to the extremophile , a radioresistant/polyextremophile that can survive cold, dehydration, a range of pressures, and low pH environments. Sequences closest to those from another member of Deinococcus-Thermus, ruber, were also found in the V5 sample.

Two sequences with lower percent identity [79% and 95%] to Thermus thermophilus were found across both Vostok samples (Table 4, 5, S4, S10). This chemoorganoheterotroph is an extreme thermophile originally isolated from a thermal vent (Oshima and Kazutomo 1974).

Halophiles

Metagenomic and metatranscriptomic data also suggest the presence of halophilic Pirellula staleyi (Planctomycetes) in the V5 sample (Table 4, S2, S4, and S11). Sequences similar to those from Deltaproteobacteria were also present in the V5 sample (Table 4, S2, S4, and S11). All 12 sequences were related to marine and freshwater mesophilic and halophilic organisms, most of them below 95% identity; however, one rRNA sequence was 98% identical to Bacteriovorax sp. A sequence closest to that of an uncultured member of the Deltaproteobacteria, previously found in mesopelagic Antarctic waters [84%], was found in V5 (Table 4, S4).

Other Extremophiles

Sequences related to those from psychrophilic Bacteroidetes species were found in both Vostok samples, among which were Dyadobacter fermentans (V6), an alkaliphilic deep-sea sediment bacterium, and Gillisia limnaea (V5), previously isolated from Lake Fryxell in Antarctica (Trappen et al. 2004) (Table 4 and 5). However, at lower stringencies some of the V5 sequences were also related to extremophilic Actinomyces georgiae [98%], halophilic Nesterenkonia sp. [93% and >99%], alkaliphilic and acidophilic Nesterenkonia halotolerans [93%], and arsenic-reducing Actinobacteria species (Arsenicicoccus bolidensis and Arsenicicoccus piscis [95%]). Only three sequences, all in the V5 sample, were close to sequences from soil acidophilic Acidobacteria (Table 4, S1, S4, and S11).

Symbiotic and parasitic species

BLAST searches detected sequence similarities to eleven anaerobic animal parasites and anoxic marine sediment Fusobacteria, among which were Propionigenium maris, previously isolated from marine mud, and the strict anaerobe Propionigenium modestum, although percent identities for both sequences were lower than 93% (V5) (Table S2). Also, sequences related to Leptotrichia buccalis were found in both Vostok samples (V5 [94% and 98%]; V6 [92%]) (Table 4, 5, S4, S10, S11). This anaerobic organism is primarily an animal parasite; as a free-living species, however, it is mesophilic.

Two sequences were related to those from animal intestinal symbionts. Sequences with low percent identity to chemoorganoheterotrophic Mucispirillum schaedleri from the Deferribacteres [97%] and uncultured Fibrobacter sp. from Fibrobacteres [96%] were present in V5 (Table 4, S1, and S3). Mucispirillum schaedleri is related to Flexistipes, some of which were first isolated from the brines of the Red Sea and were described as thermophilic, halotolerant, and tolerant to heavy metals (Lapidus et al. 2011). Within the Gammaproteobacteria, sequences closely related to those from various symbiotic and parasitic/pathogenic species were found. Sequences similar to those from rainbow trout intestinal bacterium T1 (V6 [93%]) (Table S7), several Pseudomonas sp. associated with plants (Table 4, 5, S1, S3, S4, S10), one Pantoea ananatis (Table S10), a plant pathogen of Eucalyptus sp., and the fire blight pathogen Erwinia amylovora [94% and 97%] (Table S4) were found in the V5 and V6 samples.

Four sequences in the V5 sample were closely related to the same genus of Spirochaetes [97%, 99% and 100%]. These Brachyspira species are obligate anaerobic animal pathogens (Table 4, S2, and S11). Four sequences were similar to Tenericutes species sequences, although percent identities were less than 91% for all of them (Table 4, S1, and S2). Acholeplasma equifetale and Acholeplasma exanthum are considered saprotrophic organisms found on some animals and plants (Chen et al. 2011). Spiroplasma diabroticae is a symbiont of arthropods, in most cases of insects (Anbutsu et al. 2008), and Mycoplasma feliminutum is an pathogen. Four sequences showed low [88%, 90%, 98%, *60%] levels of identity for nucleotide and translated sequences to aerobic and anaerobic Verrucomicrobia species (Table 4, S2, S4, and S11). The highest identity score was to Pedosphaera parvula [98%], a heterotrophic obligate aerobic bacterium previously isolated from soil, whereas Akkermansia muciniphila [88%] was previously identified as a strictly anaerobic animal symbiont/pathogen. Also, sequences closely related to various plant and animal symbionts and parasites/pathogens from the Firmicutes were present, including an uncultured species associated with a marine sponge, several Lactobacillaceae animal intestinal species, and the plant pathogen Streptococcus salivarius (at least 6 sequences ≥[ %]) (Table 4, 5, S1 and S11).

Opisthokonta

In general, eukaryotic sequences were closely related to organisms specific to freshwater lakes, marine environments, soil, lake sediments, deep-sea sediments, deep-sea thermal vents, as well as those that can form associations with animals and/or plants. Sequences from and heterotrophs were present. However, the quantities and taxa differed in V5 and V6. In both samples, sequences related to fungi were the most numerous. Among the 21 Ascomycota sequences in V5 were those closest to sequences from species associated with plants (Verticillium dahliae [100%]), aquatic sediments (Articulospora tetracladia [98%]), marine systems (Phoma sp. [98%]), and insect larvae (Candida ontarioensis [97%]) (Table 4, S1). The V6 Ascomycota sequences were mostly related to those from plant pathogens and plant-decaying species (Table 5). One sequence from the V6 sample was highly similar to thermophilic Ogataea thermomethanolica [100%] (Table S9), which is a methanotrophic yeast. Many sequences from both samples were highly similar to psychrophilic Basidiomycota. Two V5 sequences were similar to sequences from Rhodotorula species [95% and 99%] (Table S1), which were also previously found in Antarctic sea ice and Lake Vostok accretion ice cores (D’Elia et al. 2009; Ma et al. 1999, 2000; Stamer et al. 2005; Raymond et al. 2008). One of three V6 sequences was identical to a sequence from thermophilic Bullera taiwanensis [100%] (Table S9). The remaining two were similar to sequences from aquatic psychrophilic fungi. One sequence in the V5 sample was 99% identical to a sequence from an uncultured marine fungus (Table 4, S1), which was previously isolated from a deep-sea/ocean hydrothermal vent.

Both samples contained sequences similar to those from Arthropods. At lower stringencies [98%], some of these sequences were related to freshwater psychrophilic Daphnia pulex (Table 4, 5, S5, S10). While sequences from Arthropods were present in both Vostok samples, the V5 sample also contained nine sequences similar to other members of the Animalia group. At lower stringencies, sequences similar to Cnidaria (a small psychrophilic sea anemone) [78%], Tardigrada (extremophilic water bear) [93%], Rotifera (psychrophilic predator of , bacteria and algal species) [98%], and a halotolerant bilaterian [92%] were present (Table 4, S1, S3, S5). One sequence was 100% identical to Nutricola tantilla, a small piezotolerant marine sediment bivalve (Table 4, S3).

Archaeplastida, Chromalveolata, Bikonta, Amoebozoa

Both Vostok samples also contained molecular traces of plant species (Streptophyta). The origin of these in the lake is still unknown. These sequences might have been introduced into the lake through deposition by melting of meteoric ice, or they might have originated in the lake when it was still open to the atmosphere. In addition, the V5 sample contained eleven sequences related to other species of the Archaeplastida. One sequence was similar [99%] to one isolated from a deep-sea sediment green alga, Pseudendoclonium akinetum (Chlorophyta) (Table 4, S5). However, species of this genus can also live in freshwater. Another sequence was 97% identical to a sequence from an Antarctic red alga, Gracilaria tenuistipitata (Rhodophyta) (Table 4, S5).

Some of the V5 sequences were similar to those from sixteen single-celled eukaryotes. Twelve sequences were related to several Chromalveolata species (seven sequences with highest stringencies). Three were similar to sequences from diatoms (Stephanodiscus sp. [99% and 100%] and Hantzschia sp. [95%]), three (Aphanomyces euteiches [98% and 100%], Botrydiopsis constricta [99%], Halosiphon tomentosus [97%]), one non-photosynthetic deep-water cryptophyte (Cryptomonas paramecium [96%]), and a member of Perkinsea (Perkinsus marinus, a mollusk parasite [100%]) (Table 4, S1-S3, S5). One sequence was 94% identical to a sequence from the freshwater phototroph Paulinella chromatophora (Rhizaria), a marine species that feeds on cyanobacteria (Table 4, S5). Two sequences related to sequences from single-celled eukaryotes were from members of Excavata. The first one was parasitic Trypanosoma cruzi [100%] and the other was the non-parasitic Naegleria gruberi [98%]; both were previously described as freshwater species (Table 4, S5). At lower stringencies one sequence was closest [88%] to a sequence from an aquatic Amoebozoan similar to the bacterivore Nolandella sp., which also feeds on diatoms and nematodes (Table 4, S1).

Unknown

Almost 50% of V5 and 34% of V6 sequences from bacteria and 14% of V5 and 18% of V6 sequences from eukaryotic organisms were most similar to sequences from uncultured and unidentified organisms. However, while detailed NCBI taxonomy information was not available for these, at least five sequences from V5 were highly similar [99% and 100%] to those from marine bacteria, one of which was previously isolated from the Antarctic coast, one was similar to a sequence of an iron-reducing bacterium [96%], and another was closely related to the sequence from a halophilic bacterium [100%] (Table S1). Two sequences were >98% identical to two rRNA gene sequences from marine symbiotic bacteria (coral Acropora eurystoma symbiont, and sponge Theonella swinhoei PAUC32f symbiont) (Table S1). Almost half of the V5 uncultured eukaryotic sequences were most similar to those from fungi. One of these was 99% identical to an rRNA sequence from a deep-sea hydrothermal vent fungus (Table 4). Most of the sequences in the V6 sample were similar to rRNA sequences from aquatic and freshwater uncultured bacteria. Among those, one sequence was 98% identical to a lobster gut bacterium sequence, and one was 95% identical to a deep-sea sediment bacterium sequence (Table 5 and S7). Only five sequences in the V5 sample were similar to those from uncultured eukaryotes, and four were fungi (Table 5). While both V5 and V6 sequences were similar to those listed in the NCBI database as uncultured and unidentified organisms from other environmental studies, they require further detailed phylogenetic analysis. Some or most of them may be novel species that have yet to be scientifically described.

Metabolic classifications

Only 23 sequences from the V6 sample were similar to sequences for enzymes based on the BLASTn and tBLASTx searches. As all represent different cellular processes (mostly one enzyme per pathway), a comprehensive metabolic analysis was impossible for the V6 sample.

However, KAAS-KEGG results and NCBI BLAST searches of the V5 sequences resulted in 402 sequences matching complete genome records (Table S4, S5) and 441 sequences that matched sequences for known enzymes (Table S11). Although 14% of complete genome records were from ribosomal DNA gene sequences (Table S5), the remaining sequences were used for metabolic determinations and pathway reconstructions, as they were similar to different gene sequences (Figure 8). Prior to global metabolic map construction, sixteen supplementary figures were created from more than 200 KAAS-KEGG metabolic pathways. Each metabolic pathway was screened for enzymes and proteins that were detected by the KAAS-KEGG search process.


Figure 8. Global metabolic map reconstructed from the results of the KAAS-KEGG analysis of the V5 sequences. Color key for the metabolic pathways is at the lower left. Solid black lines indicate pathways within a metabolic process. Dashed red lines indicate connections between the pathways. Dashed black lines indicate pathways not represented in the sequence database. Grey background represents produced intermediate compounds. The global metabolic map was built from reconstructed pathways indicated in the supplementary section (Figures S1-S16).

Then, the supplementary figures were constructed based on the Lake Vostok accretion ice sequences matching sequences for those enzymes (Table S4, S5, and S11; Figures S1-S16).
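The pathway screening step described above can be sketched as a set comparison between each pathway's enzyme list and the enzymes detected in the sample. This is a minimal illustration under stated assumptions: the function name `screen_pathways`, the pathway names, and the enzyme identifiers are all hypothetical toy values, and the sketch does not reproduce the KAAS-KEGG procedure itself.

```python
def screen_pathways(pathways, detected):
    """Report, for each pathway, which of its enzymes were detected.

    pathways: dict mapping pathway name -> set of enzyme identifiers
    detected: set of enzyme identifiers matched by sample sequences
    """
    report = {}
    for name, enzymes in pathways.items():
        found = enzymes & detected
        report[name] = {
            "found": sorted(found),
            "missing": sorted(enzymes - detected),
            # Fraction of the pathway's enzymes with at least one match.
            "coverage": len(found) / len(enzymes) if enzymes else 0.0,
        }
    return report


# Toy identifiers for illustration only (not real KEGG entries).
toy_pathways = {
    "pathway_A": {"enzA1", "enzA2", "enzA3", "enzA4"},
    "pathway_B": {"enzB1", "enzA2"},  # shares enzA2 with pathway_A
}
toy_detected = {"enzA1", "enzA2", "enzB1"}

report = screen_pathways(toy_pathways, toy_detected)
```

Because an enzyme can belong to several pathways, as with the shared toy enzyme here, a single matching sequence can raise the coverage of more than one pathway.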

Nucleotide metabolism

Forty-four sequences similar to 33 enzymes of two metabolic pathways for nucleotide synthesis were recovered from the V5 sample. Nine sequences were closest to those of genes common to both purine and pyrimidine metabolic processes, while thirteen were unique to purine processes and eleven to pyrimidine synthesis. In addition to nucleotide synthesis, more than 30 sequences were similar to genes encoding DNA replication, mismatch repair, nucleotide excision repair, and recombination proteins, as well as RNA-degrading enzymes (Table S11). The metagenomic results suggest that pyrimidine metabolism is supplied from two amino acid metabolic processes. It starts either from L-glutamine (arginine and proline biosynthesis), which in the presence of carbamoyl-phosphate synthase [89%] (Table S11) in alanine, aspartate and glutamate metabolism is converted into carbamoyl-phosphate, thus entering the pyrimidine pathway, or from aspartate (Table S11, Figure S1, S2). Only seven sequences matched different genes from aspartate synthesis (three of them were the same oxidases [97%, *97%, *96%]). However, the enzymes produced from these genes are probably creating precursors for other pathways.
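The counts reported for nucleotide metabolism can be checked with simple arithmetic. The sketch below assumes, as an interpretation rather than a statement from the text, that the three counts (nine shared, thirteen purine-only, eleven pyrimidine-only) partition the 33 enzymes; the variable names are illustrative.

```python
# Counts taken from the text above.
shared = 9            # common to purine and pyrimidine metabolism
purine_only = 13      # unique to purine processes
pyrimidine_only = 11  # unique to pyrimidine synthesis
reported_total = 33   # enzymes across the two pathways

# Assumption: the three categories do not overlap, so they sum to the total.
computed_total = shared + purine_only + pyrimidine_only

# Per-pathway totals implied by that assumption.
purine_total = shared + purine_only
pyrimidine_total = shared + pyrimidine_only

print(computed_total == reported_total, purine_total, pyrimidine_total)
```

Under this reading, the purine pathway is represented by 22 enzymes and the pyrimidine pathway by 20, with nine counted in both.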

Four sequences were closely related to the genes for the enzymes of oxaloacetate synthesis; oxaloacetate is an important intermediate in both oxidative and reductive TCA (o/rTCA) cycles. One was similar to the aspartate aminotransferase gene [98%] and three were similar to L-aspartate oxidases [97%], [*97%], [*96%]. A sequence closest to the gene of the aspartate-semialdehyde dehydrogenase enzyme [88%], which produces a substrate for peptidoglycan biosynthesis, was also present (Table S11, Figure 8, S1, S2, S7). Fourteen sequences in the V5 sample were closest to sequences for genes involved in arginine and proline metabolism, of which only six were specific to that pathway. Those were sequences similar to genes for amino-acid N-acetyltransferase [*60%], agmatine deiminase [*97%], argininosuccinate lyase (also produces fumarate for o/rTCA cycles) [99%], ornithine carbamoyltransferase [*93%] and acetylornithine deacetylase [*78%] in the urea cycle, and proline iminopeptidase [99%] (Table S11). The remaining eight sequences were similar to genes encoding enzymes also common to other pathways. Among those eight were two sequences similar to the same aldehyde dehydrogenase gene [90% and 96%] from Glycolysis- pathway and one similar to the cytosine deaminase gene [87%] from pyrimidine metabolism. Another two sequences were similar to the genes for enzymes that are essential for almost half of amino acid syntheses (monoamine oxidase [*93%] and aspartate aminotransferase [98%]). The last three sequences were closely related to glutamate metabolic genes, one of which (glutamine synthetase [87%]) produces glutamine by fixing ammonia (Table S11). None of the sequences were similar to genes encoding enzymes for 5-aminoimidazole-4-carboxamide ribonucleotide (AICAR), which is an important precursor for inosine monophosphate (IMP) production in purine metabolism.

Sequences matching genes of the pentose phosphate pathway suggested that phosphoribosyl pyrophosphate (PRPP) could be used as a substrate for enzymes of the pathways, eventually producing AICAR for IMP synthesis needed in purine metabolic processes. Only seven sequences matched genes involved in histidine metabolism, and they were primarily supporting a bypass mechanism for IMP synthesis (Table S11, Figure S5).

Pyruvate metabolism

The metagenomic analysis indicated the presence of enzymes involved in the production of pyruvate and acetyl-CoA. Sequences similar to those of two genes responsible for the production of pyruvate from phosphoenolpyruvate (PEP) via oxaloacetate were found in the dataset. Phosphoenolpyruvate carboxykinase (ATP) [87%] is involved in the first part of the process and oxaloacetate decarboxylase beta subunit [83%] converts oxaloacetate into pyruvate in the second. Nucleotide BLAST results also suggest the presence of a sequence that was closely related to the gene for pyruvate water dikinase [99%], which converts pyruvate back to phosphoenolpyruvate (Table S11). This pyruvate-oxaloacetate system is also an important part of the prokaryotic carbon fixation process.

Three sequences were similar to the genes that encode enzymes for the direct synthesis of acetyl-CoA from pyruvate [97% and 99%] (Table S11, Figure 8 and S3). Pyruvate metabolism also supports several other processes. An example is a sequence similar to the gene encoding formate C-acetyltransferase [97%], which produces formate and acetyl-CoA from pyruvate for glyoxylate and dicarboxylate metabolic processes. Later, in the presence of NAD+ (oxidized), formate is then broken down to CO2 by formate dehydrogenase [92%, 97% and 99%] (Table S11, Figure 8, S3 and S4). Synthesized acetyl-CoA can be shuttled to propanoate metabolism for propanoyl-CoA biosynthesis. Sequences related to these enzymes were not found in the V5 sample; however, the presence of sequences similar to those related to the cycling processes of propanoyl-CoA and 2-oxobutanoate (formate C-acetyltransferase [97%] and acetoacetyl-CoA synthetase [73% and 88%]) indicates that propanoyl-CoA might be produced in other amino acid pathways (Table S11, Figure 8 and S3). In addition, acetoacetyl-CoA synthetase participates in prokaryotic carbon fixation by converting to acetyl-CoA in the presence of ATP.

Carbohydrates

Fourteen sequences from V5 were closest to gene sequences encoding enzymes in the amino sugar and nucleotide sugar pathways, of which three sequences were similar to the gene for UDP -6-dehydrogenase [83%, *62% and *93%], which is an important component of ascorbate synthesis, glucuronate interconversions, and metabolic processes (Table S11). The enzymes from amino and nucleotide sugar metabolic processes convert -6-phosphate into glucose-6-phosphate for ADP-glucose production, which, in turn, is used for starch and metabolism. By sharing several steps with the glycolysis pathway, the phosphotransferase system enzymes with -specific IIB component [98%] and glucose-6-phosphate isomerase [98%] can convert extra- and intracellular glucose into fructose-6-phosphate, which can either reenter the glycolysis pathway or be converted to uridine diphosphate N-acetylglucosamine for peptidoglycan biosynthesis. Enzymes such as N-acetylglucosamine-6-phosphate deacetylase [99%], bifunctional UDP-N-acetylglucosamine pyrophosphorylase/glucosamine-1-phosphate N-acetyltransferase [99%], and UDP-N-acetylmuramate dehydrogenase [*83%] were found in our metagenome (Table S11; Figure 8, S6). At least 15 sequences similar to ten genes encoding enzymes in glycolysis were present in the V5 sample, of which five are shared with other pathways: four involved in pyruvate metabolism and oxidative TCA cycle (three different components of [97% and 99%] and phosphoenolpyruvate carboxykinase [87%]) and two sequences similar to the aldehyde dehydrogenase (NAD+) gene [90% and 96%], which is involved in pyruvate and ascorbate and glucuronate interconversions cycles. Also, among five metatranscriptomic sequences related to the glyceraldehyde 3-phosphate dehydrogenase gene, one was at the 99% identity level (Table S11).

The amino and nucleotide sugar pathway has exactly the same sequence of enzymes as in fructose and mannose metabolism. Those enzymes (PTS system, mannose-specific IIB component [98%] and mannose-1-phosphate guanylyltransferase [96%]) convert either D-fructose or D-mannose into GDP-D-mannose, which is required for N-glycan biosynthesis. One of the intermediate products of glycolysis is β-D-fructose-6-phosphate that can also be used for glycan biosynthesis. The UDP-N-acetyl-D-glucosamine produced from glucose-6-phosphate in sugar metabolism is used as a substrate for the enzyme UDP-N-acetylglucosamine acyltransferase [96%] for lipid-A-disaccharide synthesis. Only four sequences were found similar to the genes of lipopolysaccharide biosynthesis, those responsible for complex sugar compounds syntheses, which are required for membrane synthesis (Table S11; Figure 8, S7).

Energy systems

Metatranscriptomic data included sequences closely related to many genes of proteins necessary for the oxidative phosphorylation process. An almost complete and fully functional system was reconstructed from the BLASTn and tBLASTx results. Among those, partial gene sequences for seven different NADH dehydrogenase subunits [≥92% and ≥ %], iron-sulfur protein [96%], one cytochrome bd-I oxidase subunit I [99%], three different ATPase subunits [96%, 97% and *98%], two ubiquinol-cytochrome c reductase cytochrome b subunits [98%] and [*96%], and one for cytochrome o ubiquinol oxidase subunit I [91%] were present in the database (Table S11; Figure 9). The electron donors, like NADPH and ferredoxins, are needed for different carbon fixation processes.

Figure 9. Reconstructed oxidative phosphorylation pathway connected to the TCA cycle. The electron transport chain (ETC) (below) connected to the o/rTCA cycle (above), reconstructed based on the results from the KAAS-KEGG analysis and manual BLASTn and BLASTx of the metagenomic/metatranscriptomic sequences (Table S11). Dotted-dashed red arrows represent proton transport through the ETC complexes. Subunits that were found by KAAS-KEGG are stacked together in boxes, which represent different ETC complexes (labeled at the bottom of each box). The color coding and connections are the same as in Figure 8.

Sequences for only seven enzymes were identified from the ascorbate and aldarate pathways, but the majority of them are located within other processes. However, two sequences were unique to this pathway and were closest to the gene sequences for L-ribulose-5-phosphate 3-epimerase [*82%], which functions in oxidative or reductive pentose phosphate pathways (o/rPP), and L-ascorbate 6-phosphate lactonase [99%] (Table S11; Figure 10).

The presence of the reductive pentose phosphate cycle (rPP) was also confirmed by 13 sequences. These were similar to the sequences for the ascorbate metabolic enzymes mentioned above; two aldehyde dehydrogenase (NAD+) [90% and 96%] and one glucose-6-phosphate isomerase [98%] sequences also supported glycolysis pathways. Also, sequences for the transketolase gene [95%], as well as the -phosphate pyrophosphokinase gene [99%], were found in the V5 sequences. Notably, a sequence closely related to the gene for phosphoribulokinase [95%] was also present (Figure 10, Table S11). Phosphoribulokinase is necessary for the rPP cycle as it synthesizes D-ribulose-1,5-bisphosphate, which in turn acts as a substrate for CO2 fixation, producing D-glycerate 3-phosphate (Figure 10, Table S11).

Figure 10. Carbon fixation processes via the reductive pentose phosphate pathway (on the right, clockwise) and adjoining metabolic processes, reconstructed based on the results from the KAAS-KEGG analysis and manual BLASTn and BLASTx of the metagenomic/metatranscriptomic sequences (Table S11). The color coding and connections are the same as in Figure 8.

Figure 11. Reconstructed tricarboxylic acid (TCA, citric acid, citrate, or Krebs) cycle (center) with other connecting pathways found in the sequence data set. The TCA cycle can proceed as an oxidative process (clockwise; oTCA), which leads to production of ATP and NADH, or it can proceed as a reductive process (counterclockwise; rTCA) for carbon fixation. Metagenomic and metatranscriptomic results suggest that both directions are present in the Lake Vostok samples. The color coding and connections are the same as in Figure 8.

Sequences related to nine genes for bacterial carbon fixation enzymes were found in the Lake Vostok V5 sample. Two sequences were similar to the gene for a putative pyruvate-flavodoxin oxidoreductase [84% and 87%] and another two to the isocitrate dehydrogenase gene [76% and 99%]; these enzymes fix CO2 into pyruvate and isocitrate, respectively, for the rTCA cycle (Table S11; Figure 11). One sequence was similar to the gene for aconitate hydratase 1 [*96%], which converts isocitrate to citrate. Four other sequences were similar to three genes for the enzymes involved in the pyruvate-oxaloacetate conversion system (described above under pyruvate metabolism) for both o/rTCA cycles. However, sequences matching the genes for ATP citrate lyase and malate dehydrogenase, which are unique to the rTCA cycle, were not found. The first one converts citrate to oxaloacetate, while the second is needed for the production of malate from oxaloacetate. The products of both missing steps of the rTCA cycle are synthesized in pyruvate metabolism (Figure 8, 11, S3), and therefore they might be supplied from that metabolic process. The third missing enzyme of the rTCA cycle was 2-oxoglutarate-decarboxylase, which converts succinyl-CoA into 2-oxoglutarate by fixation of CO2. Metatranscriptomic data suggest the presence of 2-oxoglutarate as one of the by-products from glutamate and glutamine metabolism (glutamate dehydrogenase (NADP+) [*98%]) (Table S11; Figure 8, 11, S2). As mentioned above, two sequences were related to the gene for isocitrate dehydrogenase, which means that 2-oxoglutarate might be present in the rTCA cycle. While only six sequences were similar to genes of the o/rTCA cycle, other intermediate products were found to be supplied from other metabolic pathways; for instance, argininosuccinate lyase [99%] in arginine metabolism and adenylosuccinate synthase [*61%] in alanine and aspartate synthesis both participate in the synthesis of fumarate (Table S11; Figure 8, 11).
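The reasoning about missing rTCA steps can be expressed as a simple check: a step counts as an unresolved gap only if its enzyme was not detected and its product is not supplied by another pathway. The sketch below is illustrative only. The enzyme and product names are taken from the text, but the step list is a partial, simplified view of the cycle, and the function name `unresolved_steps` is hypothetical.

```python
def unresolved_steps(steps, detected_enzymes, supplied_elsewhere):
    """Return enzymes for steps that are neither detected nor compensated.

    steps: list of dicts with 'enzyme' and 'product' keys
    detected_enzymes: enzymes with matching sequences in the sample
    supplied_elsewhere: products available from other metabolic pathways
    """
    return [
        s["enzyme"]
        for s in steps
        if s["enzyme"] not in detected_enzymes
        and s["product"] not in supplied_elsewhere
    ]


# Partial, simplified list of rTCA steps as described in the text.
rtca_steps = [
    {"enzyme": "ATP citrate lyase", "product": "oxaloacetate"},
    {"enzyme": "malate dehydrogenase", "product": "malate"},
    {"enzyme": "2-oxoglutarate-decarboxylase", "product": "2-oxoglutarate"},
    {"enzyme": "isocitrate dehydrogenase", "product": "isocitrate"},
    {"enzyme": "aconitate hydratase 1", "product": "citrate"},
]
detected = {"isocitrate dehydrogenase", "aconitate hydratase 1"}

# Products the text suggests may come from pyruvate and glutamate metabolism.
supplied = {"oxaloacetate", "malate", "2-oxoglutarate"}

gaps_ignoring_supply = unresolved_steps(rtca_steps, detected, set())
gaps_with_supply = unresolved_steps(rtca_steps, detected, supplied)
```

With no external supply, the three undetected enzymes remain as gaps; once the products attributed to other pathways are taken into account, no unresolved step is left in this simplified list.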

Four sequences were similar to genes of the glutamate-glutamine cycle. Ammonia is released and assimilated as a by-product of these processes. Two sequences were similar to the genes for amino acid N-acetyltransferase [*60%] and glutamate dehydrogenase [*98%]; the latter is known to use NADP+ as a cofactor for 2-oxoglutarate and ammonia production. The two other sequences were similar to genes for glutaminase [94%] and glutamine synthetase [87%]. The former converts glutamine into glutamate and releases ammonia, whereas the latter uses ammonia and ATP for ammonia assimilation. All four also participate in nitrogen cycles (Table S11; Figure S1, S2). In addition, three other sequences were similar to sequences encoding nitrate reductase 1 alpha subunit [*79% and *91%], indicating the possibility of a denitrification process, and ferredoxin-nitrate reductase for nitrification [97%] (Table S11). Additionally, a sequence similar to a sensor histidine kinase HydH from the NtrC family [90%] was also present. This histidine kinase is a nitrogen regulatory protein responsible for the activation of glutamine synthetase transcription, subsequently facilitating production of glutamine from glutamate and ammonia (Table S11; Figure S2). Metagenomic data also suggest the presence of sequences similar to potentially anammox Planctomycetes species in the V5 sample (i.e., Kuenenia stuttgartiensis [100%]) (Table 4, S4). Such species use available ammonium and nitrite to produce dinitrogen gas.

Two sequences similar to sequences from methanotrophic Archaea were found in the V5 samples [99% and 100%] (Table S6). Some Archaea are capable of fixing carbon via a reductive acetyl-CoA pathway. However, no sequences encoding enzymes for this pathway were found, suggesting that they may use one of the other pathways for carbon fixation. Some Chloroflexi species (V5 [83%], V6 [99%]) are known to fix carbon using 3-hydroxypropionic cycles, although others utilize the rPP pathway. Only two sequences were similar to rRNA gene sequences from Chloroflexi organisms and two others were similar to propionyl-CoA carboxylase beta chain [*97%] and 3-hydroxyacyl-CoA dehydrogenase [*70%] (Table S4, S9 and S11). Those sequences that were similar to the genes of the enzymes for the 3-hydroxypropionic and hydroxypropionate cycles are not responsible for direct synthesis of 3-hydroxypropionyl-CoA; thus, existence of this pathway is uncertain.

As mentioned above, the metagenomic and metatranscriptomic data suggest the presence of various phototrophic organisms, including bacterial and eukaryotic species. At least 53 sequences closest to those from Cyanobacteria rRNA genes were indicated at ≥ %, as well as eleven algal species (three rRNA sequences [97-99%]), and eleven Chromalveolata species (five rRNA and three mRNA sequences ≥ %) (Table S1-S5, S11). Besides the taxonomic presence, metatranscriptomic analysis indicated gene sequences highly similar to genes for parts of both photosystems I [95-97%] and II [96-97%], including genes encoding phycocyanin [98%] and apocytochrome f [96%]. At least nine sequences were similar to genes for enzymes for porphyrin and chlorophyll metabolism; among these were genes for two light-independent protochlorophyllide reductase subunits [96%, 97%], coproporphyrinogen III oxidase [88%], uroporphyrinogen decarboxylase [94%], one heme oxygenase [97%], and one precorrin-8X methylmutase [97%] (Table S11, and Figure S9).

At least eighteen sequences were closest to genes responsible for tRNA synthesis and 84 matched genes encoding two-component phosphotransferases and bacterial secretion system proteins, as well as ABC transporter and pore export proteins (Table S11). Because the V5 sequences did not match all genes responsible for most metabolic processes, all pathways were incomplete. However, some partial pathways may simply supply others.

Metagenomic/metatranscriptomic sequences were also similar to three different genes responsible for cyanoamino acid metabolism, seven genes for the selenocompound metabolism, three genes for phosphonate and phosphinate production, and seventeen genes from various glycan and lipid metabolic processes. Because the quantities of V5 sequences similar to these genes were small, it was impossible to establish the connections for these pathways with the global map (Table S11; Figure 12).



Figure 12. Additional reactions indicated in the V5 sample based on KAAS-KEGG sequence analyses. These pathways could not be connected to the global metabolic map (Figure 8) due to one or more missing enzymes. Grey background indicates substrates and products of reactions. Dashed arrows indicate missing enzyme sequences. Solid lines represent possible reactions with enzymes indicated in the Lake Vostok accretion ice sequence data.


Discussion

While Lake Vostok has been hypothesized to be sterile to oligotrophic, this Lake Vostok accretion ice sequence analysis indicates that the lake may contain a biological community that is far more complex than those of other subglacial lakes. Based on the metagenomic and metatranscriptomic sequences, there are thousands of species in the lake, ranging from chemolithoautotrophs to multicellular heterotrophs (Rogers et al. 2013; Shtarkman et al. 2013).

However, the concentrations of biomass in the lake water, as indicated from the accretion ice, are low, from a few cells to several hundred cells per milliliter (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Karl et al. 2008; Bulat et al. 2011). Therefore, it appears that the lake contains a complex web of organisms, but these organisms are at low concentrations due to the extreme conditions that exist in the lake. In addition, the lake is under high pressure and is in complete darkness, thus creating one of the most demanding environments on Earth. Nonetheless, the sequence analyses indicate that the lake might contain cold and hot regions, marine and freshwater regions, as well as regions with extremes in pH. The V5 sample included 414 sequences similar to those from organisms from various habitats, of which 278 sequences exhibited BLASTn results at ≥99% identity. In the V6 sample, 36 sequences were at the ≥99% identity level to those from organisms previously isolated from various environments (Figure 13).

Metagenomic/metatranscriptomic sequence analyses suggest the presence of not only aquatic, marine and sedimentary organisms, but also sequences similar to those previously isolated from soil and Arctic and Antarctic ice, while the largest share of V5 (>40%) and V6 (>60%) sequences (at both stringency levels) were related to pathogens, parasites, and commensals of animals and plants. Of those, sequences similar to animal associated organisms were primarily from

[Figure 13 image: pie charts of sequence distribution by environment (ice, soil, aquatic, marine, animal associated, plant associated). Upper charts, 97-100% identity: V5 (N=414) and V6 (N=57). Lower charts, 99-100% identity: V5 (N=278) and V6 (N=36).]

Figure 13. Sequence distribution based on habitat conditions. Proportions of metagenomic sequences that were related to sequences annotated and published on NCBI, whose species could be characterized by habitat. Sequences presented in Tables S1-12 were manually rescreened to construct Tables 4 and 5. Ecological characteristics listed in those tables were used for this figure. BLASTn sequence results at ≥99% (lower two) and ≥97% (upper two) identity were selected. Charts represent two samples, V5 on the left (blue background) and V6 on the right (grey background). The NCBI records that had species descriptions and/or related publications were used to reconstruct these charts. Numbers of descriptions used in each pie chart are marked with N.

Gammaproteobacteria. In addition, eleven sequences in the V5 sample and one sequence in the V6 sample were similar to those from parasitic Fusobacteria (Table 4, 5). Besides sequences similar to those from animal associated bacteria (Flavobacterium johnsoniae, Firmicutes marine sponge symbiont, lobster gut bacterium, and Renibacterium salmoninarum), the V5 sample contained sequences closest to those from multicellular organisms. Among these were sequences similar to those from animals, including fourteen Arthropoda sequences (two were [98%] identical to a psychrophilic water flea, Daphnia pulex, in both Vostok samples), three sequences that were similar to those from microscopic freshwater predatory rotifers (Adineta vaga [98%]), a sequence closest to that of the marine bivalve Nutricola tantilla [100%], and sequences closest to those from the radiotolerant extremophilic waterbear Milnesium tardigradum [93%] (Table S1, S3, S5, S10). These Opisthokonta species were mostly marine deep-sea and sediment species and may inhabit areas in the vicinity of the hydrothermal vent, where there would be higher concentrations of energy, nutrients, and sources of usable biomolecules. The presence of sequences similar to those from Opisthokonta indicates that not only unicellular life might exist in Lake Vostok, but that complex, albeit microscopic, multicellular organisms may also live in the lake.
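The habitat proportions discussed here (and summarized in Figure 13) come from selecting BLASTn hits at two identity thresholds and tallying them by habitat label. The following is a minimal Python sketch of that kind of tally, not the procedure actually used in this study: the hit list, the habitat labels attached to each hit, and the function name `habitat_proportions` are all hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

# Hypothetical BLASTn hits as (habitat label, percent identity).
# These values are illustrative only and are NOT data from this study.
hits = [
    ("animal associated", 99.2),
    ("soil", 97.5),
    ("aquatic", 100.0),
    ("marine", 98.1),
    ("animal associated", 97.0),
    ("plant associated", 99.6),
    ("ice", 96.4),  # below both thresholds, so never counted
]


def habitat_proportions(hits, min_identity):
    """Return {habitat: percent of retained hits} for hits at or above min_identity."""
    kept = [habitat for habitat, identity in hits if identity >= min_identity]
    total = len(kept)
    if total == 0:
        return {}
    counts = Counter(kept)
    return {habitat: 100.0 * n / total for habitat, n in counts.items()}


# Lower stringency (>=97%) and higher stringency (>=99%), as in the figure.
low = habitat_proportions(hits, 97.0)
high = habitat_proportions(hits, 99.0)

for label, result in (("97-100%", low), ("99-100%", high)):
    print(label, {h: round(p, 1) for h, p in sorted(result.items())})
```

Raising the threshold changes both the number of retained hits (N) and the proportions, which is why each chart in the figure reports its own N.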

Approximately 11-14% of the V5 and 17-19% of the V6 sequences were similar to those from bacterial and eukaryotic plant associated species as well as photosynthetic and non-photosynthetic Chromalveolata and Archaeplastida organisms (Figure 13). While most plant associated species do not depend on the presence of their mutualistic partner and could possibly survive in the Lake Vostok environment, some plant pathogens require the presence of their host. For instance, soil nitrogen-fixing Alphaproteobacteria Rhizobium spp. are dependent on the plant host for supplying carbohydrates, proteins and oxygen; thus, they cannot survive without them.

At least six V5 sequences were similar to the rRNA sequences from different members of Rhizobiaceae; however, only one sequence was at the [99%] identity level to an uncultured Agrobacterium sp. (Table S1-S3). Additionally, one V5 sequence was similar to one from a Firmicutes species found on decaying leaf foliage (Bacillus decisifrondis [99%]). The majority of eukaryotic sequences in both Vostok samples were similar to those from different fungi. In the V5 sample, 28 sequences at ≥99% identity were similar to those from Ascomycetes and Basidiomycetes, while at the same stringency, four and three sequences (respectively) were found in the V6 sample (Table 3, 4). One sequence from the V5 sample was closely related to the sequence of an Ascomycota species from wet decaying wood (Phaeosphaeria spartinicola [99%]) (Table S2). Among the V5 eukaryotic sequences were also those closest to sequences from photosynthetic (Pseudendoclonium akinetum [99%]) and non-photosynthetic (Cryptomonas paramecium [96%]) organisms, as well as diatoms (Stephanodiscus sp. [≥99%]), none of which were present in the V6 sample.

While it is possible that deep-sea and deep-ocean mollusks, waterbears, anemones, non-photosynthetic Cryptomonas sp., and vent-surrounding diatoms exist in the lake (as indicated by the data), the source of sequences related to Archaeplastida species (V5: 1 red and 10 green algae and 57 higher plants; V6: 2 higher plants) is still unknown (Table 4 and 5). Among all sequences related to Archaeplastida species, only two sequences in the V5 sample were similar to portions of the rRNA genes of Gracilaria tenuistipitata [97%] and Pseudendoclonium akinetum [99%]. However, molecular traces of Streptophyta species were present in both Vostok samples. Algae and diatom symbiotic cyanobacteria species could possibly adapt and survive in the Lake Vostok environment, although it is hard to imagine higher plants surviving such harsh conditions. The V5 and V6 sequences similar to those from higher plants and associated pathogens probably originate either from the melting glacier or from material that was deposited into the lake when the lake was still exposed to the atmosphere and then preserved in the sediments. Either way, these sequences would be from DNA of non-viable cells, as RNA (indicative of living organisms) is less stable and could not be preserved for millions of years. However, the true origin of the higher plant sequences is still speculative.

Marine vs freshwater environments

Previous analyses of the ice cores that correlate with the V5 sample (3563 and 3585 m) have reported high concentrations of Na⁺, Ca²⁺, Mg²⁺, Cl⁻, HCO₃⁻, and SO₄²⁻, whereas the levels of the same ions are significantly reduced in the ice core sections corresponding to the V6 sample (Siegert et al. 2001; Priscu et al. 1999; Christner et al. 2006). This indicates that the V5 shallow embayment region is more saline (close to a marine or brackish water environment), which is also supported by the large number of V5 sequences related to halophilic, halotolerant and marine bacterial and eukaryotic organisms (marine mollusk, thermal vent fungus, sea anemone), and deep-ocean sediment Archaea (Figure 13, 14, Table 4). Almost 18% of sequences (annotated on NCBI) similar to the V5 sequences were from halophilic or halotolerant species, whereas only two such sequences, similar to a halotolerant uncultured Frankineae bacterium [99%] and the aquatic halotolerant plant associated fungus Pichia farinosa [98%], were present in the V6 sample (Figure 13, 14, Table S7, S9). The accretion ice from the main basin is freshwater and contains almost no mineral inclusions or biomass, which explains the almost complete absence of sequences related to halophilic species in the V6 sample (Figure 14) (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Lipenkov et al. 2002; Karl et al. 1999).

Taxonomic analysis indicates that almost 20% of sequences in both samples originated from freshwater aquatic bacterial and eukaryotic organisms (Figure 13). This is not surprising, as

Figure 14. Sequence distribution based on growth conditions. Proportions of metagenomic sequences that were related to sequences annotated and published on NCBI, whose species could be characterized by growing conditions. Sequences presented in Tables S1-12 were manually rescreened to construct Tables 4 and 5. Those sequences matching NCBI sequences with identity levels ≥99% (lower two) and ≥97% (upper two) were selected. The pie charts represent two samples, V5 on the left (blue background) and V6 on the right (grey background). Metagenomic sequences were categorized based on growth characteristics of NCBI sequences. Only NCBI records that had species descriptions and/or related publications were used to reconstruct these charts. Numbers of sequences used for each comparison are indicated by N. Growth conditions on the charts are shortened: Psychro- are psychrophilic or psychrotolerant; Thermo- are thermophilic or thermotolerant; Acido- are acidophilic or acid tolerant; Alkali- are alkaliphilic or alkalitolerant; Halo- are halophilic or halotolerant; Desiccation means desiccation resistant.


Lake Vostok is primarily an aquatic environment. At the lower stringency level [97%-100%], and even when the stringency was increased to 99%-100% identity, the same percentage of the V5 sequences (14%) were closely related to those previously described as being from marine organisms (Figure 13). The large number of sequences in the V5 sample that are most similar to those of marine, halophilic and/or halotolerant species is consistent with the marine environment predicted in the shallow embayment. Only 3% of the V6 sequences were similar [99-100%] to portions of rRNA genes from marine species (Figure 13). These might represent species that adapted to the freshwater environment over the tens of millions of years since the lake might have been continuous with the ocean (Rogers et al. 2013), or the organisms may have simply floated into the main basin from the embayment region. Such a microbial composition could also be explained by the presence of a saltwater or brine layer in the hypolimnion.

Although Lake Vostok has been completely and constantly covered with ice for more than 15 million years (Kapitsa et al. 1996; Studinger et al. 2003; Ferraccioli et al. 2011; Pross et al. 2012; Young et al. 2011; Zachos et al. 2001), during the early and middle Eocene (49-46 Myr ago), Antarctica was ice free and covered with paratropical rain forests with rivers, lakes, microbes, plants, fungi and animals (Young et al. 2011; Zachos et al. 2001; Pross et al. 2012). During that time, sea level was much higher (by 50-100 m), and it is possible that the lake itself and large portions of East Antarctica were part of what we now call the Southern Ocean (Rogers et al. 2013). The Gamburtsev Mountains on the western side of the lake were formed during part of the process that also created the depression now occupied by Lake Vostok. Ferraccioli and colleagues estimated that the rifting process of those mountains, specifically root formation, actually started almost 1 Gyr ago in the Proterozoic eon (2.5 billion to 540 million years ago). Two other rifting cycles triggered by tectonic movement continued the uplift of the Gamburtsev Mountains in the Permian (250 Myr ago) and Cretaceous (100 Myr ago) periods (Ferraccioli et al. 2011). Additional rifting occurred approximately 65 Myr ago (with uplift of the mountains on the east side of the rift).

From 35 to 34 Myr ago, in the late Eocene, the temperature suddenly decreased, and as the Antarctic ice sheet grew, sea levels also decreased. As this occurred, the lake may have been isolated from the ocean due to relatively higher regions to the east of the lake (Rogers et al. 2013; Young et al. 2011). By this time glaciers partially covered the lake (Rogers et al. 2013; Pross et al. 2012; Young et al. 2011; Zachos et al. 2001). During the end of the Oligocene (27-25 Myr ago) and early Miocene (25-15 Myr ago), Lake Vostok was intermittently covered with ice due to fluctuating temperatures. Approximately 15 Myr ago, there was another decrease in temperatures, which caused complete ice coverage of Lake Vostok (Kapitsa et al. 1996; Studinger et al. 2003; Ferraccioli et al. 2011; Pross et al. 2012; Young et al. 2011; Zachos et al. 2001). Once the lake became isolated from the atmosphere, surviving aerobic species would have remained in the epilimnion waters to utilize O₂, SO₄²⁻, NO₃⁻, Fe²⁺ and other nutrients and gases deposited from the glacier, while anaerobes would primarily occupy the hypolimnion. Considering the rifting processes of the surrounding mountains (at least 60 Myr ago), and the fact that the lake is a result of Earth crustal movements and originated in a graben, it is likely that any hydrothermal or geothermal regions would occur in the fault regions of the rift (Rogers et al. 2013; Ferraccioli et al. 2011).

Thermophiles and thermotolerant

Several studies have speculated about possible hydrothermal activity in the shallow embayment (Priscu et al. 2005; Bulat et al. 2004, 2011) based on deuterium, helium, oxygen-18, and other ion concentrations in the accreted ice, as well as the presence of sequences from thermophiles in the accretion ice (Bulat et al. 2004; Rogers et al. 2013; Shtarkman et al. 2013). Our results clearly indicate the presence of sequences of thermophilic and thermotolerant organisms in the shallow embayment as well as a few sequences from the main basin, some of which are similar to thermophiles previously detected in the accretion ice samples (Bulat et al. 2004, 2009, 2011; Rogers et al. 2013; Shtarkman et al. 2013). At the lower stringency, only four of the V6 sequences were similar to those from thermophilic organisms, whereas in the V5 sample, twenty-nine sequences most similar to those from thermophilic or thermotolerant species were present (Figure 14). The larger number of sequences from thermophilic and thermotolerant organisms in the V5 sample also suggests the presence of hydrothermal activity in the vicinity of the shallow embayment and possibly on the western side of the peninsula.

These findings are consistent with previous speculations and support the hypothesized presence of hydrothermal or geothermal activity primarily in the embayment and peninsula regions (Priscu et al. 2005; Bulat et al. 2004, 2009, 2011; Abyzov et al. 2004; Rogers et al. 2013; Shtarkman et al. 2013). The hydrothermal system might cause mixing of ions and gases, thus preventing stratification of the water in the vicinity of the hydrothermal vents. It might also disrupt some parts of the sediment. At the same time, it may provide an energy and nutrient source for organisms in the lake. In fact, it could be the major source of energy and nutrients in the lake. The hydrothermal activity and the attendant mixing might be the reason for the large diversity of environmental DNA found in the V5 sample. The marine environment in the shallow embayment is probably slightly acidic due to mineral deposition from the overriding glacier, which could explain the presence of several sequences similar to those from acidophilic organisms in both samples (Figure 14).

Molecular traces of higher plants and sequences similar to those from microbes involved in organic decomposition (i.e., wood decay fungi, decaying leaf foliage bacteria) found in both Vostok samples suggest that denitrification occurs in the sediment layer at the bottom of the shallow embayment. Anaerobic degradation of the sediment would probably lead to the release of basic functional groups, which can absorb the excess hydrogen released from the hydrothermal vent, probably due to water radiolysis (Bulat et al. 2011). Also, hydrogenotrophic microbes are capable of extracting energy by breaking dihydrogen molecules with either quinone or highly reactive molecules, such as NAD+ (Lavire et al. 2006). Among such microbes are species from Betaproteobacteria (Ralstonia sp.) and Deltaproteobacteria (Desulfovibrio sp.). Sequences similar to those from these organisms were found in this study (Table S4 and S11). While speculative, the described scenario might lead to a slightly alkaline environment close to the hydrothermal vent at the bottom of the shallow embayment and could explain the presence of eleven sequences similar to rRNA genes from alkaliphilic and/or alkalitolerant organisms [>99%] only in the V5 sample (Figure 14) (Martin and Russell 2007; Lavire et al. 2006; Bulat et al. 2011).

As the glacier moves into the lake, melted freshwater probably creates an ion gradient. The southern main basin area might be isolated from the hydrothermal activity, and this stable condition might have allowed stratification of the lake, which in turn could result in the formation of a brine layer at the bottom of the lake, leaving glacial meltwater in the epilimnion layer. At both stringency levels, several sequences related to those from thermophilic bacteria were found in the V6 sample (Figure 14, Table S9 and S10). Ice from that depth accretes in the vicinity of the hydrothermal/geothermal source, which might be trapping thermophilic organisms and/or their nucleic acids. One of the sequences was similar to a sequence from an uncultured Thiobacillus sp. [97%], which is a Hydrogenophilaceae bacterium. Within the same family of organisms, Bulat and colleagues recovered sequences from Hydrogenophilus thermoluteus from an ice core section (3607 m) less than one meter below one of the V6 ice core segments (3606 m) used in this study (Bulat et al. 2004; Lavire et al. 2006).

Several sequences were also related to sequences from arsenic oxidizing bacteria. Two sequences in both samples were similar to rRNA sequences from Thiomonas spp. [80% and 95%], and one sequence in the V5 sample was 81% identical to the putative thioredoxin gene from Thiomonas sp. 3As (Table S4 and S10). Another four sequences were closely related to sequences from Arsenicicoccus bolidensis (Actinobacteria) [rRNA, 98%], Herminiimonas arsenicoxydans (Betaproteobacteria) [mRNA, 90%], the arsenic tolerant Halomonas sp. HAL1 [*99%], and Pseudomonas sp. SY6 (Gammaproteobacteria), the last being 100% identical to the rRNA gene sequence of a strain previously isolated from an arsenic contaminated environment (Table 4, S1, S4, S11). Because volcanic emissions are one of the major sources of arsenic, and because sequences from thermophilic microbes were present in the metagenomic data, it is possible that the arsenic compounds originate from some sort of geothermal or hydrothermal activity. In addition, the existence of a hydrothermal vent could also be inferred from the presence of sequences similar to those from deep-sea and deep-ocean bacteria, hydrothermal vent fungi, and organisms from deep sediments, hot acidic springs, and volcanic mud (Table 4, 5).

Hydrothermal or geothermal activity in the vicinity of the shallow embayment could cause water turbidity, bringing gases, ions, sediment particles, cells, and biological molecules (e.g., nucleic acids) to the lake surface. Thus, constant mixing would keep the salinity level similar to that of a brackish water environment (essentially preventing the lake from stratifying), which might characterize the shallow embayment (Rogers et al. 2013; Shtarkman et al. 2013). At the same time, the number of sequences from the V6 sample similar to those from freshwater species might indicate that little or no mixing occurs in the main basin. Previous studies suggest that the lake was once connected to the ocean (Rogers et al. 2013; Shtarkman et al. 2013; Young et al. 2011), and therefore it might have a brine layer at the bottom. The freshwater at the surface of the main basin likely originates from glacial meltwater and/or river systems surrounding the lake.

Psychrophiles and psychrotolerant

At the 97% identity level and higher, 156 sequences in the V5 sample and 18 sequences in the V6 sample were similar to NCBI sequences from species that were previously characterized by different growth conditions. Of these, BLASTn results for 96 sequences in the V5 and 11 in the V6 samples were at the ≥99% identity level. At both identity levels in both samples, the majority of sequences (41 to 65%) were similar to species previously described as psychrophilic or psychrotolerant (Figure 14). The constant accretion process suggests that the surface water should be near the crystallization temperature; thus, such large numbers of sequences similar to those from psychrophilic and psychrotolerant organisms previously found in similar environments and in the same ice core samples are expected. Taxonomic analysis suggests the presence of sequences similar to those previously recovered from other Antarctic lakes and the Southern Ocean as well as from Arctic ice. In most cases these sequences were closely related to those from

Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Firmicutes, Actinobacteria, and Cyanobacteria. This matches the general composition of sequences found in this study of Lake Vostok accretion ice. For instance, in this study, four 16S rRNA sequences were related to analogous sequences from uncultured Antarctic cyanobacteria from Lake Fryxell, Taylor Valley [96%], Orange Pond and Fresh Pond, McMurdo Ice Shelf [94%], Forlidas Pond, the Transantarctic Mountains [93%], and one exhibited [100%] identity to an uncultured marine bacterium isolated from Antarctic sea water from Prydz Bay (Table 4, S1). Sequences were similar to those from Antarctic Firmicutes and Actinobacteria species, which were previously isolated and sequenced from the ice core section 3582 m. Specifically, Carnobacterium (two sequences [100%]) and Kocuria spp. (two sequences [100%]) from section 3585 m and Micrococcus sp. (seven sequences [99%-100%]) from section 3606 m found by Rogers’ group were also present in the V5 sample (D’Elia et al. 2008). Sequences related to these bacterial species were present in the corresponding and neighboring core sections of the V5 (3563 m, 3585 m) and V6 (3606 m, 3621 m) samples (Table 4, 5, S1, S3, S4, S9, S10). One sequence was 97% identical to the rRNA sequence of an uncultured alphaproteobacterium previously isolated from soil samples collected at Coal Nunatak, Antarctica (Yergeau et al. 2007) (Table S3). More than 30 other sequences were similar to portions of rRNA genes from different Psychrobacter spp., and at least half of those sequences were at the 99% and 100% identity levels (Table 4, S1, and S3). Some of the V5 sequences were similar to those from psychrophilic Microbacterium spp.; these are in the same family as Frigoribacterium sp., which was previously isolated from Siberian permafrost (Gilichinsky et al. 2005; Bakermans et al. 2006). Two sequences [95% and 97%] similar to those from Frigoribacterium sp. were also present in the V5 sample (Table S1).

Within Eukaryota, two phyla of fungi (Ascomycota and Basidiomycota) were previously identified in glacial and accretion ice core sections, and represented the largest fraction of eukaryotic sequences identified in this study. Among these were sequences related to Phoma sp. (3563 m, V5) (Table S1, S2), also previously isolated and reported by Rogers’ group from the same ice core section (D’Elia et al. 2008, 2009). Sequences related to sp., Davidellaceae sp., Rhodotorula sp., and Cryptococcus sp. were found in the V5 sample (Table 3, S1, S2, S5), and they were also isolated from the Lake Vostok accretion ice core 3582 m and Greenland glacial ice (D’Elia et al. 2009; Ma et al. 1999, 2000; Starmer et al. 2005; Raymond et al. 2008; Knowlton et al. 2013). The V6 sequences were also similar to those from species previously found in Lake Vostok ice core sections. One sequence was most similar to one from Cladosporium sp. [99%], which had been previously isolated from three Lake Vostok accretion ice core sections, at 3610 m, 3613 m, and 3582 m depths. Another was 100% identical to a sequence from Bullera taiwanensis, which is in the same family as Cryptococcus sp., which was previously isolated from the Lake Vostok ice core section at 3619 m depth (Table 4, S7, S9) (D’Elia et al. 2009).

Mean unique sequence concentrations

Overall, at all stringency levels, more than 90% of all sequences (among both samples) were from the V5 sample and were related to a variety of bacterial species. The remaining 10% was divided between sequences related to eukaryotic organisms in the V5 sample (207 sequences) and all sequences from the V6 sample (165 sequences) (Table 1-5). Such a difference (more than 20 times) in the quantities of sequences between the two Vostok samples was expected. Based on previous fluorescence microscopy, cultivation and sequence results from the ice cores corresponding to the V5 and V6 samples and from similar depths, cell concentrations range from less than 1 up to 35 cells per ml of meltwater (D’Elia et al. 2008, 2009, personal communication). From this metagenomic analysis we could estimate the mean unique sequence concentration for the V5 sample to be close to 14 unique sequences per ml. Thus, considering that the V6 sample contained at least 20 times fewer organisms, its mean unique sequence concentration would be less than 0.7 unique sequences per ml. Presumably the Lake Vostok surface water harbors at least the same numbers of organisms per ml as the estimated mean unique sequence concentration. The difference between the numbers of sequences correlates directly with previous cell counts from microscopy experiments, culturing, sequencing, and phylogenetic analysis (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Karl et al. 1999; Bulat et al. 2011).
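The V6 estimate above is a simple division of the V5 value by the fold difference between the samples. A minimal Python sketch of that arithmetic follows; it uses only the two figures quoted in the text (about 14 unique sequences per ml and an at least 20-fold difference), and the variable names are illustrative rather than taken from the study.

```python
# Figures quoted in the text; variable names are illustrative.
v5_unique_per_ml = 14.0  # estimated mean unique sequences per ml (V5)
fold_difference = 20.0   # V6 contained at least 20 times fewer sequences

# The fold difference is a lower bound, so the result is an upper bound for V6.
v6_upper_bound_per_ml = v5_unique_per_ml / fold_difference

print(f"V6 mean unique sequence concentration: < {v6_upper_bound_per_ml:.1f} per ml")
```

Because the 20-fold figure is a minimum, 0.7 unique sequences per ml should be read as a ceiling for the V6 sample rather than a point estimate.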

Metabolic analysis

A total of 441 mRNA sequences for many enzymes of different metabolic pathways were found during several BLAST searches and KAAS KEGG analysis. For the 302 sequences analyzed using BLASTn, the mean percent identity was 94%, and percent identity was ≥…% for … sequences (… of those were ≥…%). The remaining mRNA sequences were analyzed with tBLASTx searches; … of those exhibited ≥…% positive substitutions (≥…% identity level)

(Table S11). While KAAS KEGG searches suggested that the metabolic pathways were incomplete, the number of sequences and their similarity to those of a variety of known enzymes from GenBank indicated metabolic functionality of the reconstructed pathways presented in Figures 11-14. The V5 sample sequence analysis suggests the presence of carbohydrate pathways (glycolysis, pyruvate metabolism, glyoxylate synthesis, and others), energy cycles (oxidative phosphorylation, o/rTCA cycles, the rPP pathway, and the reductive acetyl-CoA pathway), and nucleotide synthesis. Sequences most closely related to those that encode enzymes from various nitrogen metabolic processes (nitrogen fixation, nitrification and denitrification) as well as sulfur metabolism were present. The majority of sequences were related to sequences for enzymes from various amino acid biosynthesis and degradation processes (of 119 sequences, 73 were unique) (Table S11). In addition to those, nineteen sequences similar to those of different aa-tRNA synthetases were present in our metagenomic data, of which 4 sequences from BLASTn were at [99%] identity, and 4 sequences were at ≥[…%] identity from tBLASTx searches. Nineteen other sequences related to enzymes from glycan biosynthetic pathways and 31 sequences related to enzymes from lipid biosynthetic pathways (nineteen were unique) were present. An additional forty sequences were closely related to genes of different membrane transport proteins, of which … sequences exhibited ≥[97%] identity in BLASTn and 7 were at ≥[*90%] identity in tBLASTx searches (Table S11). Also, twelve sequences were similar to those for secondary metabolite metabolism and 38 to those for co-factor metabolism (of which 30 were unique). Five sequences were closely related to portions of different oxidoreductase genes, of which three were highly similar to NADH oxidase [99%], cytochrome c peroxidase [*79%] and chloride peroxidase [*98%]. Thirty-two sequences exhibited similarity to segments of different genes essential for DNA replication (e.g., DNA helicase [100%], DNA gyrase subunit A [85%], single-strand DNA-binding protein [94%], DNA polymerase I [96%]), homologous recombination (e.g., recombination protein RecA [96%] and Holliday junction DNA helicase RuvB [*78%]), DNA repair (e.g., repressor LexA [99%]) and RNA degradation (several ATP-dependent RNA helicases: helicase DeaD [96%], DOB1 [97%], and DDX6/DHH1 [99%]) (Table S11). The reconstructed metabolic pathways indicate that the microorganisms not only have more than one pathway for energy synthesis, but also have pathways for major cellular processes that could help sustain life in Lake Vostok waters.

Carbon fixation pathways

The taxonomic analysis with metabolic reconstruction suggests the presence of a complex heterotrophic biota, but also indicates the presence of many autotrophic species (Table 3, 4).

Based on the metagenomic and metatranscriptomic analysis, three different carbon fixation pathways were deduced. The rPP pathway (Calvin-Benson Cycle) was the most abundant based on the number of rRNA and mRNA sequences. Sequences related to Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Cyanobacteria (Anabaena azotica [97%]) as well as Archaeplastida and Chromalveolata (Table 3, S1, S4, S5, S11) were present in the Lake Vostok accretion ice samples, although some species of Alphaproteobacteria (Williams et al. 2006) and Bacteroidetes are also known to fix carbon via the rTCA cycle (Bar-Even et al. 2012). In addition, some of the species found within the Alphaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, and Chlorobi utilize the rTCA cycle for carbon fixation. The V5 sequences also suggest the presence of enzymes for the rTCA cycle (Table S4, S5, and S11). The Chlorobium species and a few members of the Deltaproteobacteria (e.g., Desulfobacter hydrogenophilus) utilize the rTCA cycle (Campbell and Cary 2004). Some thermophilic bacteria from the Aquificales and the archaeal Thermoproteaceae also utilize the rTCA cycle (Campbell and Cary 2004). Several types of nitrite-oxidizing bacteria have been shown to perform carbon fixation via the rTCA cycle. Members of Alphaproteobacteria (Nitrobacter) and Nitrococcus mobilis from the Gammaproteobacteria use 2H+ and 2e- from converting nitrite into nitrate in their electron transport chains. They also have been reported to perform carbon fixation using an rTCA cycle (Lücker et al. 2010). Sequence evidence for the rTCA cycle was also reported within members of the Epsilonproteobacteria from deep-sea and deep-ocean hydrothermal vents, as well as from marine sediments (Yamamoto and Takai 2011; Tarasov et al. 2005). Our phylogenetic and taxonomic analyses indicated the presence of several deep-sea vent Epsilonproteobacteria (Helicobacter and Campylobacter spp.). The fact that these mesophilic and thermophilic bacteria use the rTCA cycle for carbon fixation also suggests the presence of hydrothermal activity in Lake Vostok (Nakagawa et al. 2007).

Considering that there is no light present in the lake, phototrophic species are probably functioning as heterotrophs. Based on the metagenomic data, two sequences were closely related to those from methanotrophic deep-sediment Archaea. These might be utilizing the reductive acetyl-CoA pathway for carbon fixation; however, sequences related to the enzymes of that pathway were not present. At the same time, one V5 sequence was related to a sequence from a Chloroflexi species, a group that often performs carbon fixation using the 3-hydroxypropionate cycle (Bar-Even et al. 2012). However, because the sequence was related to a portion of the ribosomal gene, and some Chloroflexi species can also fix carbon with the rPP pathway, it is unclear whether this cycle is present in the lake.
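The taxon-to-pathway associations discussed in this section can be tabulated as a small lookup, which may make the overlap between groups easier to see. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, the rTCA and 3-hydroxypropionate entries follow the literature cited above, and the rPP entries are an interpretation of the text rather than a result taken from the tables.

```python
from collections import defaultdict

# Taxon -> candidate carbon fixation pathways, transcribed from the
# discussion above. The rPP entries are an interpretation of the text,
# not a result reported in the tables.
PATHWAYS_BY_TAXON = {
    "Alphaproteobacteria": {"rPP", "rTCA"},
    "Betaproteobacteria": {"rPP"},
    "Gammaproteobacteria": {"rPP", "rTCA"},
    "Cyanobacteria": {"rPP"},
    "Archaeplastida": {"rPP"},
    "Chromalveolata": {"rPP"},
    "Bacteroidetes": {"rTCA"},
    "Deltaproteobacteria": {"rTCA"},
    "Epsilonproteobacteria": {"rTCA"},
    "Chlorobi": {"rTCA"},
    "Chloroflexi": {"3-hydroxypropionate", "rPP"},
}


def taxa_by_pathway(mapping):
    """Invert a taxon -> pathways mapping into pathway -> sorted taxa."""
    inverted = defaultdict(list)
    for taxon, pathways in mapping.items():
        for pathway in pathways:
            inverted[pathway].append(taxon)
    return {pathway: sorted(taxa) for pathway, taxa in inverted.items()}


# Print one line per pathway with the taxa that may use it.
for pathway, taxa in sorted(taxa_by_pathway(PATHWAYS_BY_TAXON).items()):
    print(f"{pathway}: {', '.join(taxa)}")
```

A taxon listed under more than one pathway, such as Chloroflexi, is exactly the case in which a ribosomal sequence alone cannot show which cycle is present.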

Nitrogen metabolism

Taxonomic and metabolic sequence analyses indicated the presence of species and genes that participate in nitrogen fixation as well as nitrogen conversion processes. The source of nitrogen in the lake is probably the glacial ice above. When the glacier moves across the lake, some of the lower portions of the glacier break off and melt into the lake, and thus release trapped atmospheric gases into the lake water. Sequences similar to those encoding nitrogen-fixing enzymes, and to those from species capable of nitrogen fixation, such as Cyanobacteria and some members of the Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and Firmicutes, were present. Sequences related to several nitrifying bacteria, such as Betaproteobacteria and Gammaproteobacteria, were present in the V5 sample (Table S1). The V5 metagenomic analysis showed sequences from anammox Planctomycetes species (Table 3, S1-S4, and S11) (Speth et al. 2012). Sequences similar to genes encoding nine different enzymes for nitrogen metabolism and several enzymes from supporting pathways suggest the presence of nitrification and denitrification processes as well as nitrogen fixation. Taxonomic analysis confirmed the presence of sequences similar to those from nitrogen-fixing species of Cyanobacteria, members of Alphaproteobacteria (Mesorhizobium, Rhodobacter), Betaproteobacteria (Burkholderia), Gammaproteobacteria (Azotobacter, Klebsiella, Halomonas spp.), and Firmicutes (Bacillus). Sequences related to nitrifying bacteria, such as Denitrobacter and Nitrosomonas (Betaproteobacteria) and relatives of Nitrosococcus species from the same family of Chromatiaceae (Gammaproteobacteria), were found in the V5 sample (Table S1). Sequences related to species of nitrate-reducing Firmicutes (e.g., Bacillus and Clostridium), Actinobacteria (Micrococcus and Streptomyces), Alphaproteobacteria (Paracoccus), Betaproteobacteria (nitrifying Thiomonas and nitrate-reducing Denitrobacter spp.), and Gammaproteobacteria (Pseudomonas), which are also involved in denitrification processes, were found in the Lake Vostok metagenomic data (Table S1-S4, S11).

One sequence was [100%] identical to a portion of the rRNA gene from an anammox Planctomycetes species (Kuenenia stuttgartiensis), and another was similar [98%] to the rRNA gene from an organic-decomposition fungus (Articulospora tetracladia), suggesting the possibility of an ammonification process. Both samples also contained sequences similar to those from nitrogen-assimilating fungi (which incorporate ammonia into amino acids with glutamine-glutamate synthase enzymes) as well as wood-decay and organic-decomposition fungi (the ammonification process converts organic nitrogen to ammonium). The presence of sequences similar to genes for enzymes of nitrogen metabolism, together with the taxonomic analysis, indicates that a complex nitrogen cycle exists among Lake Vostok microorganisms; thus, nitrogen concentrations in the lake are probably sufficient to support life, as suggested by others (Priscu et al. 1999; Christner et al. 2006; Karl et al. 1999).

Conclusions

Lake Vostok is the largest of almost 400 subglacial lakes and the 4th deepest lake on Earth. Extreme conditions, such as cold and possible heat from hydrothermal activity, complete absence of sunlight, pressure from almost 4,000 m of the overriding glacier, and limited nutrients, create one of the most hostile environments on the planet. Two types of accretion ice are formed over the lake. Type I accretion ice represents surface water frozen to the bottom of the glacier in the western part of the shallow embayment and western shallow portion of the main basin (near a peninsula), whereas type II originates from water in the eastern part of the embayment and also over the main southern lake basin (over the glacier flow line). Although some have hypothesized that the lake is sterile, this accretion ice metagenomic and metatranscriptomic study indicates that it is far from sterile. Sequence-based analysis indicates that a complex ecosystem appears to be present in the lake. Four ice core sections combined into two samples represent two different parts of the lake. The V5 sample (two ice core sections from 3563 and 3584/3585 m depths) represents ice that accreted in the vicinity of the shallow embayment, and the V6 sample (two ice core sections from 3606 and 3621 m depths) is accretion ice from the western part of the main basin.

Previous microscopy, cell culturing, sequencing, and phylogenetic analysis of the rRNA gene sequences from these ice cores indicated the presence of a taxonomic diversity of Bacteria and Eukarya (primarily fungi). This study confirms and expands the known complexity of the bacterial community and eukaryotic diversity in Lake Vostok. Sequences similar to those from thermophilic, psychrophilic, marine, aquatic, deep-sea, and sedimentary Bacteria and Eukarya were present in both Vostok samples. While large numbers of sequences most similar to those from psychrophilic organisms are consistent with the cold water environment of Lake Vostok, sequences that were most similar to those from thermophilic organisms were also present. The majority of sequences that were most similar to those from thermophilic organisms were found in the V5 sample, indicating the possible presence of hydrothermal/geothermal activity in the vicinity of the shallow embayment. Such a possibility has been hypothesized by several researchers (Karl et al. 1999; Bell and Karl 1999; Bell et al. 2002; Bulat et al. 2004, 2011; Priscu et al. 2005; Rogers et al. 2013; Shtarkman et al. 2013). In addition, sequences similar to those from deep-ocean hydrothermal vent bacteria and eukaryotes were found. Among these were large numbers of thermophilic bacteria previously isolated from deep-sea hydrothermal vents, hot springs, and mud volcanoes, as well as hydrothermal vent fungi. These sequences were mostly found in the V5 sample, but a few sequences characteristic of those from thermophilic bacteria were present in the V6 sample. This also corresponds to the previously suggested location of the hydrothermal vent, in close proximity to the western side of the peninsula region.

Among eukaryotes, sequences from different fungal species were the most abundant. Many of those were closely related to sequences from Ascomycota that had previously been characterized as animal and plant pathogens, plant-decaying fungi, and organic decomposers. At the same time, sequences similar to those from Basidiomycota were mostly classified as psychrophilic taxa that had previously been isolated from Antarctic glaciers, Antarctic sea ice, and soils. The V5 sample also contains sequences similar to those from several psychrophilic deep-sea animals (a small deep-sea anemone and a marine sediment mollusk), non-photosynthetic and photosynthetic Chromalveolata, two sequences from members of the Excavata, and one freshwater phototrophic Rhizaria.

Both samples also contained sequences similar to those from members of the Archaeplastida. Eleven of the sequences were most closely related to those from deep-sea sediment Chlorophyta and one Rhodophyta (similar to one previously isolated from Antarctica). Also, 27% of all V5 eukaryotic sequences were similar to those from Streptophyta, while only two were found in the V6 sample (Rogers et al. 2013; Shtarkman et al. 2013). The origin of these sequences in the lake is probably either from the sediment that formed while the lake was still open to the atmosphere more than 15 Myr ago or from meltwater from the overriding glacier. Sequences that were most similar to those from higher plants were found in both samples, but might represent nucleic acids from the remains of dead plants.

Viable bacteria as well as fungi were previously recovered from Antarctic ice cores (those correspond to ice core sections described in this study), as well as from Siberian permafrost and Greenland glaciers (D’Elia et al. 2008, 2009; Gilichinsky et al. 2005; Ma et al. 1999, 2000; Starmer et al. 2005). Nucleic acids can be detected even millions of years after an organism has perished (D’Elia et al. 2008, 2009; Rogers et al. 2013). Together with the large number of Streptophyta sequences (in comparison with V6), this also correlates with hydrothermal activity in the vicinity of the shallow embayment. The hydrothermal vent might disrupt parts of the sediment in the embayment region; thus, once sediment biomass and nucleic acids reach the water surface, they can be entrapped in the ice accreted onto the bottom of the glacier. Few sequences similar to those from plants were found in the V6 sample, which also suggests that the plant sequences are from plant remains, and that mixing is limited in the main basin. Also, the sequences most similar to those from plant pathogenic fungi, found in both samples, might have been deposited into the lake with those land plants millions of years ago. The cold anaerobic conditions in the sediments at the bottom of the lake would favor preservation of the DNA in the dead plant remains.

While many gene sequences for enzymes in the metabolic pathways were not found, those that were present in this study (for the V5 sample) suggest many functional metabolic pathways, including pathways for several carbohydrate metabolic processes, various amino acid synthesis pathways, and several degradative pathways. Sequences similar to gene sequences for enzymes of the purine and pyrimidine metabolic processes were present in the dataset, along with sequences similar to many genes involved in DNA replication, repair, and RNA degradation, as well as genes for aminoacyl-tRNA synthetases and prokaryotic and eukaryotic ribosome subunits.

Sequences similar to genes encoding various sugar metabolic processes were present, all leading towards utilizing various sugars for pyruvate production. More importantly, two major carbon fixation systems were deduced. Based on the taxonomic representation and gene sequences, the most common was the rPP cycle, but the rTCA cycle also appeared to be present to a high degree. Both pathways fix CO2 utilizing NADH and NADPH. There were weak indications of two other methods of carbon fixation, the reductive acetyl-CoA pathway and the 3-hydroxypropionate bicycle, but results were equivocal. The number of sequences similar to portions of the genes for proteins involved in oxidative phosphorylation in the V5 sample was sufficient to infer a fully functional electron transport chain, with genes similar to those encoding subunits of all three proton pumps and ATP synthase. Both carbon fixation pathways require large quantities of ATP, especially the rPP cycle, which needs approximately 3.5 times more than the rTCA cycle. Sequences similar to genes for proteins involved in several nitrogen cycle processes were also found in the dataset. Those were responsible for nitrification and denitrification, as well as nitrate reduction and nitrogen fixation. All of the processes are supported by the taxonomic analyses. The presence of sequences similar to those from heterotrophic organic-decomposing fungi and to genes encoding enzymes for glutamine and glutamate conversions indicates possible ammonia assimilation processes.

In the course of this study, a complex ecosystem was discovered that might indicate the biological conditions in Lake Vostok. Taxonomic analysis and metabolic classification suggest that Lake Vostok harbors not only simple unicellular organisms but also multicellular eukaryotes. More than 50% of all sequences were similar to those from uncultured and unidentified organisms. The underrepresentation of environmental sequences in the NCBI GenBank database and the rudimentary annotation of known sequenced environmental nucleic acids complicate a fully detailed taxonomic analysis of the Lake Vostok metagenome presented here. Many sequences are probably from novel species that possibly represent various known taxonomic groups. This means that intensive phylogenetic analysis would be required in order to predict detailed taxonomic levels. However, the current metagenomic study is the most detailed and comprehensive to date. The number of unique sequences and their taxonomy in this study is far greater than those from previous cell culture and fluorescence microscopy studies (Priscu et al. 1999; Christner et al. 2001, 2006; D’Elia et al. 2008, 2009; Karl et al. 1999; Bulat et al. 2011), and estimated mean numbers of unique sequences (V5: fourteen unique sequences per ml; and V6: 0.7 unique sequences per ml) are consistent with those from corresponding ice core sections from previous studies (D’Elia et al. 2008, 2009). Considering mean numbers of unique sequences and sample volumes (250 ml), there could be at least 3500 sequences in the V5 sample and 175 in the V6 sample. These estimated values are very close to the numbers of sequences reported in Tables 1, 2, and 3. If each unique sequence is a representation of a single organism, this means that the same number of (single and even complex multicellular) organisms might exist in the Lake Vostok surface waters of the shallow embayment, while at least 165 organisms would be present in the main basin surface waters (Tables 1, 2 and 3).
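The per-sample estimates quoted above follow from multiplying the mean number of unique sequences per ml by the sample volume. A minimal Python sketch of that arithmetic is given here for illustration; the variable and function names are hypothetical, and the input values are the ones stated in the text (14 and 0.7 unique sequences per ml, 250 ml per sample).

```python
# Mean unique sequences per ml of meltwater, as stated in the text.
MEAN_UNIQUE_PER_ML = {"V5": 14.0, "V6": 0.7}
SAMPLE_VOLUME_ML = 250  # volume of each sample, as stated in the text


def estimated_unique_sequences(mean_per_ml: float, volume_ml: float) -> int:
    """Scale a per-ml mean up to the whole sample volume."""
    return round(mean_per_ml * volume_ml)


# Estimate for each sample: per-ml mean times sample volume.
estimates = {
    sample: estimated_unique_sequences(mean, SAMPLE_VOLUME_ML)
    for sample, mean in MEAN_UNIQUE_PER_ML.items()
}
print(estimates)
```

Under these inputs the calculation gives 3500 for V5 and 175 for V6, matching the estimates quoted in the paragraph above.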

While the geological and ecological characteristics of Lake Vostok vary from those described on Mars or on Europa (a Jovian moon), some similarities have been proposed. Both Mars and Europa are geochemically active and also have glaciers with subglacial liquid water (Geissler et al. 1998; Figueredo and Greeley 2004; McKay et al. 2005; Goodman et al. 2007; Bibring et al. 2007; Yung et al. 2010). Several studies suggest that at one time Mars had an environment close to that on Earth, but due to unknown events, it changed several billion years ago (Bibring et al. 2007; McKay et al. 2005). The ice crust on Europa indicates that some geothermal activity is occurring beneath tens of kilometers of glacier ice (Geissler et al. 1998; Figueredo and Greeley 2004; Goodman et al. 2007). This means that if life forms can withstand the high pressures exerted by the glacier, low nutrient levels, complete absence of light, water temperature close to freezing, and heat from the hydrothermal activity, then there is a chance that life forms exist on planets and moons with similar environments.

References for Chapter II

Abyzov SS, Mitskevich IN, Poglazova MN, Barkov NI, Lipenkov VY, Bobin NE, Koudryashov BB, Pashkevich VM, Ivanov MV (2001) Microflora in the basal strata at Antarctic ice core above the Vostok Lake. Advanced Space Res 28(4):701-6.

Abyzov SS, Hoover RB, Imura S, Mitskevich IN, Naganuma T, Poglazova MN, Ivanov MV (2004) Use of different methods for discovery of ice-entrapped microorganisms in ancient layers of the Antarctic glacier. Elsevier, Advances in Space Research 33(8): 1222-1230.

Anbutsu H, Goto S, and Fukatsu T (2008) High and low temperatures differently affect infection density and vertical transmission of male-killing Spiroplasma symbionts in Drosophila hosts. Appl. Environ. Microbiol. 74(19):6053-6059.

Antony R, Krishnan KP, Laluraj CM, Thamban M, Dhakephalkar PK, Engineer AS, Shivaji S (2012) Diversity and physiology of culturable bacteria associated with coastal Antarctic ice core. Elsevier. Microbiological Res, 167: 372-380.

Bakermans C, Ayala-del-Río HL, Ponder MA, Vishnivetskaya T, Gilichinsky D, Thomashow MF, Tiedje JM (2006) Psychrobacter cryohalolentis sp. nov. and Psychrobacter arcticus sp. nov., isolated from Siberian permafrost. Int J Syst Evol Microbiol 56(6): 1285-1291.

Bar-Even A, Noor E, and Milo R (2012) A survey of carbon fixation pathways through a quantitative lens. Journal of Experimental Botany 63(6): 2325–2342.

Barrett P (2003) Paleoclimatology: Cooling a continent. Nature 421(6920): 221-223.

Beatty JT, Overmann J, Lince MT, Manske AK, Lang AS, Blankenship RE, Van Dover CL, Martinson TA, and Plumley FJ (2005) An obligately photosynthetic bacterial anaerobe from a deep-sea hydrothermal vent. Proc Natl Acad Sci 102(26): 9306-9310.

Bell R, Studinger M, Tikku A, Castello JD (2005) Comparative biological analyses of accretion ice from subglacial Lake Vostok. In Life in Ancient Ice; Castello JD, Rogers SO, Eds.; Princeton University Press: Princeton, NJ, U.S.A: 251-267.

Bell RE and Karl DM (1999) Evolutionary processes a focus of decade-long ecosystem study of Antarctic's Lake Vostok. Eos Trans AGU 80(48), 573.

Bell RE, Studinger M, Shuman CA, Fahnestock MA, Joughin I (2007) Large subglacial lakes in East Antarctica at the onset of fast-flowing ice streams. Nature 445: 904-907.

Bell RE, Studinger M, Tikku AA, Clarke GKC, Gutner MM, and Meertens C (2002) Origin and fate of Lake Vostok water frozen to the base of the East Antarctic ice sheet. Nature 416: 307-310.

Bibring JP, Arvidson RE, Gendrin A, Gondet B, Langevin Y, Le Mouelic S, Mangold N, Morris RV, Mustard JF, Poulet F, Quantin C, Sotin C (2007) Coupled Ferric Oxides and Sulfates on the Martian Surface. Science 317: 1206-1210.

Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Unit 19.10 Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. Chapter 19: 1-21.

Bulat SA, Alekhina IA, Marie D, Martins J, Petit JR (2011) Searching for Life in the Extreme Environments Relevant to Jovian’s Europa: Lessons from Subglacial Ice Studies at Lake Vostok (East Antarctica). Elsevier, Advances in Space Res 48: 697-701.

Bulat SA, Alekhina IA, Lipenkov VYa, Lukin VV, Marie D, and Petit JR (2009) Cell concentrations of microorganisms in glacial and lake ice of the Vostok ice core, East Antarctica. Microbiology 78(6): 808-810.

Bulat SA, Alekhina IA, Blot M, Petit JR, de Angelis M, Wagenbach D, Lipenkov VYa, Vasilyeva LP, Wloch DM, Raynaud D, et al. (2004) DNA signatures of thermophilic bacteria from the aged accretion ice of Lake Vostok, Antarctica: implications for searching for life in extreme icy environments. International Journal of Astrobiology 3(01): 1-12.

Camacho C, Madden T, Coulouris G, Ma N, Tao T, Agarwala R, Morgulis A (2013) BLAST Command Line Applications User Manual. NCBI (2008), updated March 25th.

Campbell BJ and Cary SC (2004) Abundance of reverse Tricarboxylic Acid Cycle genes in free-living microorganisms at deep-sea hydrothermal vents. Applied and Environmental Microbiology. 6282–6289.

Chatterjee S and Scotese CR (1999) The breakup of Gondwana and the evolution and biogeography of the Indian plate. PINSA 65: 379-425.

Chen J-G, Lou D, and Yang J-F (2011) Isolation and identification of Acholeplasma sp. from the mud crab, Scylla serrata. Hindawi Publishing Corporation. Evidence-Based Complementary and Alternative Medicine. 2011: 1-5.

Chevreux B, Wetter T and Suhai S (1999) Genome sequence assembly using trace signals and additional sequence information. Computer Science and Biology In: Proceedings of the German Conference on Bioinformatics 99: 45-56.

Christner BC, Mosley-Thompson E, Thompson LG, and Reeve JN (2001) Isolation of bacteria and 16S rDNAs from Lake Vostok accretion ice. Env Micro 3(9): 570-577.

Christner BC, Royston-Bishop G, Foreman CM, Arnold BR, Tranter M, Welch KA, Lyons WB, Tsapin AI, Studinger M, Priscu JC (2006) Limnological conditions in Subglacial Lake Vostok, Antarctica. Limnol Oceanogr 51(6): 2485-2501.

Cox J, Schubert AM, Travisano M, Putonti C (2010) Adaptive evolution and inherent tolerance to extreme thermal environments. BMC Evol Biol. 10(75): 1-11.

D’Elia T, Veerapaneni R and Rogers SO (2008) Isolations of microbes from the Lake Vostok Accretion Ice. Applied Env Micro 4962-4965.

D’Elia T, Veerapaneni R, Theraisnathan V and Rogers SO (2009) Isolations of fungi from the Lake Vostok Accretion Ice. Mycologia 101(6).

Duncan J, Wingham DJ, Siegert MJ, Shepherd A and Muir AS (2006) Rapid discharge connects Antarctic subglacial lakes. Nature Letters 440: 1033-1036.

Ekaykin AA, Lipenkov VY, Petit JR, Johnsen S, Jouzel J, Masson-Delmotte V (2010) Insights into hydrological regime of the Lake Vostok from differential behavior of deuterium and oxygen-18 in accreted ice. Journal of Geophysical Res 115: 1-14.

Ferraccioli F, Finn CA, Jordan TA, Bell RE, Anderson LM, Damaske D (2011) East Antarctic rifting triggers uplift of the Gamburtsev Mountains. Nature 479: 388-392.

Figueredo PH and Greeley R (2004) Resurfacing history of Europa from pole-to-pole geological mapping. Elsevier, Icarus 167: 287–312.

Geissler PE, Greenberg R, Hoppa G, McEwen A, Tufts R, Phillips C, Clark B, Ockert-Bell M, Helfenstein P, Burns J, et al. (1998) Evolution of lineaments on Europa: clues from Galileo multispectral imaging observations. Elsevier, Icarus 135(1): 107–126.

Gendrin A, Mangold N, Bibring JP, Langevin Y, Gondet B, Poulet F, Bonello G, Quantin C, Mustard J, Arvidson R, et al. (2005) Sulfates in Martian layered terrains: The OMEGA/Mars Express View. Science 307:1587-1591.

Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Research 15(10): 1451-1455.

Gilichinsky D, Rivkina E, Bakermans C, Shcherbakova V, Petrovskaya L, Ozerskaya S, Ivanushkina N, Kochkina G, Laurinavichuis K, Pecheritsina S, et al. (2005) Biodiversity of cryopegs in permafrost. FEMS Microbiology Ecology 53(1): 117-128.

Goecks J, Nekrutenko A, Taylor J and the Galaxy Team (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Bio 11(8): R86.

Goodman JC, Collins GC, Marshall J, Pierrehumbert RT (2007) Hydrothermal plume dynamics on Europa: Implications for formation. Journal of Geophysical Res 109: 1-19.

Guidetti R and Bertolani R (2005) Tardigrade taxonomy: an updated check list of the taxa and a list of characters for their identification. Zootaxa 845: 1–46.

Jean-Baptiste P, Petit JR, Lipenkov VY, Raynaud D, Barkov NI (2001) Constraints on hydrothermal processes and water exchange in Lake Vostok from helium isotopes. Nature 411(6836): 460-462.

Kapitsa A, Ridley JK, Robin G de Q, Siegert MJ, Zotikov I (1996) Large deep freshwater lake beneath the ice of central Antarctica. Nature 381: 684-686.

Karl DM, Bird DF, Björkman K, Houlihan T, Shackelford R, Tupas L (1999) Microorganisms in the accreted ice of Lake Vostok, Antarctica. Science 286(5447): 2144-7.

Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33: 511-518.

Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059-3066.

Knowlton CN, Veerapaneni R, D’Elia T and Rogers SO (2013) Microbial Analyses of Ancient Ice Core Sections from Greenland and Antarctica. Biology 2(1): 206–232.

Kitano H, Funahashi A, Matsuoka Y, Oda K (2005) Using process diagram for the graphical representation of biological networks. Nat Biotechnol 23(8): 961-966.

Lapidus A, Chertkov O, Nolan M, Lucas S, Hammon N, Deshpande S, Cheng J-F, Tapia R, Han C, Goodwin L, et al. (2011) Genome sequence of the moderately thermophilic Flexistipes sinusarabici strain MAS10T. Stand Genomic Sci. 5(1): 86-96.

Lavire C, Normand P, Alekhina I, Bulat S, Prieur D, Birrien JL, Fournier P, Hänni C, Petit JR (2006) Presence of Hydrogenophilus thermoluteolus DNA in accretion ice in the subglacial Lake Vostok, Antarctica, assessed using rrs, cbb and hox. Environ Microbiol 8: 2106-2114.

Lipenkov V, Istomin V, Bulat S, Raynaud D, Petit J (2002) An estimate of the dissolved oxygen concentration in subglacial Lake Vostok. AGU, Spring Meeting, abstract #B21A-06.

Lücker S, Wagner M, Maixner F, Pelletier E, Koch H, Vacherie B, Rattei T, Damsté JSS, Spieck E, Le Paslier D, and Daims H (2010) A Nitrospira metagenome illuminates the physiology and evolution of globally important nitrite-oxidizing bacteria. PNAS 1-6.

Ma LJ, Catranis CM, Starmer WT, and Rogers SO (1999) Revival and characterization of fungi from ancient polar ice. Microbiology. 70-73.

Ma LJ, Rogers SO, Catranis CM (2000) Detection and characterization of ancient fungi entrapped in glacier ice. Mycologia 92(2): 286-295.

MacGregor JA, Matsuoka K, Studinger M (2009) Radar detection of accreted ice over Lake Vostok, Antarctica. Earth Planet Sci Lett 282: 222-233.

Martin W and Russell MJ (2007) On the origin of biochemistry at an alkaline hydrothermal vent. Phil. Trans. R. Soc. B 362: 1887-1925.

McKay CP, Andersen DT, Pollard WH, Heldmann JL, Doran PT, Fritsen CH, Priscu JC (2005) Polar lakes, streams, and springs as analogs for the hydrological cycle on Mars. In Water on Mars and Life. Adv in Astrobiol and Biogeophys 4: 219-233.

Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA (2008) The Metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386.

Moriya Y, Itoh M, Okuda S, Yoshizawa AC and Kanehisa M (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan. Nucleic Acids Res 35: W182-W185.

Nakagawa S, Takaki Y, Shimamura S, Reysenbach A-L, Takai K, and Horikoshi K (2007) Deep-sea vent ε-proteobacterial genomes provide insights into emergence of pathogens. PNAS 104(29): 12146-12150.

Oshima T and Kazutomo I (1974) Description of Thermus thermophilus (Yoshida and Oshima) comb. nov., a nonsporulating thermophilic bacterium from a Japanese thermal spa. Int J Syst Bacteriol. 102-112.

Oswald GKA and Robin G de Q (1993) Lakes beneath the Antarctic ice sheet. Nature 245: 251-254.

Pearson A, Mcnichol AP, Benitez-Nelson BC, Hayes JM, and Eglinton TI (2001) Origins of lipid biomarkers in Santa Monica Basin surface sediment: A case study using compound-specific Δ14C analysis. Geochimica et Cosmochimica Acta 65(18): 3123–3137.

Petit JR, Jouzel J, Raynaud D, Barkov NI, Barnola JM, Basile I, Bender M, Chappellaz J, Davis M, Delaygue G et al. (1999) Climate and atmospheric history of the past 420,000 years from the Vostok ice core Antarctica. Nature 399: 429-436.

Petz W (1997) Ecology of the active soil microfauna (Protozoa, Metazoa) of Wilkes Land, East Antarctica. Polar Biol 18: 33-44.

Polar Research Board. Division of Earth and Life Studies (2007) Exploration of Antarctic Subglacial Aquatic Environments: Environmental and Scientific Stewardship. The National Academies Press. Washington, D.C. ISBN: 0-309-10636-2, 162.

Priscu JC, Adams EE, Lyons WB, Voytek MA, Mogk DW, Brown RL, McKay CP, Takacs CD, Welch KA, Wolf CF, et al. (1999) Geomicrobiology of subglacial ice above Lake Vostok, Antarctica. Science 286(5447): 2141-2144.

Priscu JC, Fritsen CH, Adams EE, Giovannoni SL, Paerl HW, McKay CP, Doran PT, Gordon DA, Lanoil BD, and Pinckney JL (1998) Perennial Antarctic Lake Ice: an oasis for life in a polar desert. Science 280(5372), 2095-2098.

Priscu JC, Kennicutt II MC, Bell RE, Bulat SA, Ellis-Evans JC, Lukin VV, Petit JR, Powell RD, Siegert MJ, Tabacco I (2005) Exploring subglacial Antarctic Lake Environment. EOS, Transactions, American Geophysical Union 86(20): 193-200.

Priscu JC, Tulaczyk S, Studinger M, Kennicutt MCII, Christner BC, and Foreman CM (2008) Antarctic subglacial water: origin, evolution, and ecology–In: Polar Lakes and Rivers. Oxford University Press 119-135.

Pross J, Contreras L, Bijl PK, Greenwood DR, Bohaty SM, Schouten S, Bendle JA, Röhl U, Tauxe L, Raine JI, et al. (2012) Persistent near-tropical warmth on the Antarctic continent during the Early Eocene Epoch. Nature Research Letters 488:73-77.

Ridley JK, Cudlip W, and Laxon SW (1993) Identification of subglacial lakes using ERS-1 radar altimeter. Journal of Glaciology 39: 625-634.

Rogers SO, Ma L, Zhao Y, Catranis CM, Starmer WT, Castello JD (2005) Recommendations for elimination of contaminants and authentication of isolates in ancient ice cores. In: Castello JD, Rogers SO, (eds): Life in ancient ice. Princeton Univ Press. Princeton, New Jersey 5–21.

Rogers SO, Shtarkman YM, Koçer ZA, Edgar R, Veerapaneni R, and D’Elia T (2013) Ecology of subglacial Lake Vostok (Antarctica) based on metagenomic/metatranscriptomic analyses of accretion ice. Biology ISSN 2079-7737,

Rogers SO, Theraisnathan V, Ma LJ, Zhao Y, Zhang G, Shin SG, Castello JD, Starmer WT (2004) Comparisons of protocols to decontaminate environmental ice samples for biological and molecular examinations. Appl Environ Microbiol 70: 2540-2544.

Rossum G and de Boer J (1991) Interactively testing remote servers using the python programming language. CWI Quarterly 4(4): 283-303.

Salamatin AN, Tsyganova EA, Popov SV, Lipenkov VY (2009) Ice flow line modeling and ice core data interpretation: Vostok Station (East Antarctica) In: Physics of Ice Core Records – II, Low Temperature Science, Supplementary Issue, ILTS, Hokkaido University, Sapporo, 68: 167-194.

Sambrotto R and Burckle L (2005) The nature and likely sources of biogenic particles found in ancient ice cores from Antarctica. In: Castello JD, Rogers SO, (eds): Life in ancient ice. Princeton University Press 94-105.

Shtarkman YM, Koçer ZA, Edgar R, Veerapaneni RS, D’Elia T, Morris PF, Rogers SO (2013) Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse Set of Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya. PLoS ONE 8(7): 1-13.

Siegert MJ, Ellis-Evans JC, Tranter M, Mayer C, Petit J, et al. (2001) Physical, chemical and biological processes in Lake Vostok and other Antarctic subglacial lakes. Nature 414: 603-609.

Siegert MJ, Tranter M, Ellis-Evans JC, Priscu JC, and Lyons WB (2003) The hydrochemistry of the Lake Vostok and potential for life in Antarctic subglacial lakes. Hydrol. Process 17: 795-814.

Speth DR, Hu B, Bosch N, Keltjens JT, Stunnenberg HG, and Jetten MSM (2012) Comparative Genomics of Two Independently Enriched Candidatus Kuenenia Stuttgartiensis Anammox Bacteria. Front Microbiol 3(307): 1-7.

Starmer WT, Fell JW, Catranis CM, Aberdeen V, Ma LJ, Zhou, and Rogers SO (2005) Yeasts in the genus Rhodotorula recovered from the Greenland ice sheet. In: Castello JD, Rogers SO, (eds): Life in ancient ice. Princeton Univ Press. Princeton, New Jersey. 181-195.

Studinger M, Karner GD, Bell RE, Levin V, Raymond CA, Tikku A (2003) Geophysical models for the tectonic framework of the Lake Vostok region East Antarctica. Earth Planet Sci Lett 216: 663-677.

Swofford DL (2000) PAUP Phylogenetic Analysis Using Parsimony (and Other Methods) Version 4. Sinauer Associates, Sunderland, Massachusetts.

Tabacco IE, Bianchi C, Zirizzotti A, Zuccheretti E, Forieri A, and Della Vedova A (2002) Airborne Radar Survey above Vostok Region, East-Central Antarctica: Ice Thickness and Lake Vostok Geometry. In: Annals of Glaciology 62-69.

Talalay PG (2004) Ice drilling bibliography. Part I: Russian Drilling Antarctic Bibliography. St. Petersburg State Mining Institute. 1-24.

Tarasov VG, Gebruk AV, Mironov AN, Moskalev LI (2005) Deep-sea and shallow-water hydrothermal vent communities: Two different phenomena? Chemical Geology 224(1–3): 5-39.

Thomas DN and Dieckmann GS (2002) Antarctic Sea Ice – a Habitat for Extremophiles. Science 295(5555): 641-644.

Trappen SV, Vandecandelaere I, Mergaert J and Swings J (2004) Gillisia limnaea gen. nov., sp. nov., a new member of the family Flavobacteriaceae isolated from a microbial mat in Lake Fryxell, Antarctica. Int J Syst Evol Microbiol. 54: 445–448.

Williams TJ, Zhang CL, Scott JH, and Bazylinski DA (2006) Evidence for autotrophy via the reverse tricarboxylic acid cycle in the marine magnetotactic strain MC-1. Appl Environ Microbiol 72(2): 1322–1329.

Wright A, Siegert MJ (2011) The identification and physiographical setting of Antarctic subglacial lakes: An update based on recent discoveries. Geophys Monogr Ser 192: 9-26.

Yamamoto M and Takai K (2011) Sulfur metabolisms in epsilon- and gamma-Proteobacteria in deep-sea hydrothermal fields. Front Microbiol. 2(129): 1-8.

Yergeau E, Newsham KK, Pearce DA, and Kowalchuk GA (2007) Patterns of bacterial diversity across a range of Antarctic terrestrial habitats. Env. Microbiol. 9(11): 2670–2682.

Young DA, Wright AP, Roberts JL, Warner RC, Young NW, Greenbaum JS, Schroeder DM, Holt JW, Sugden DE, Blankenship DD et al. (2011) A dynamic early East Antarctic Ice Sheet suggested by ice-covered fjord landscapes. Nature 474(7349): 72-75.

Yung YL, Russell MJ, and Parkinson CD (2010) The search for life on Mars. Journal of Cosmology 5: 1121-1130.

Zachos J, Pagani M, Sloan L, Thomas E, Billups K (2001) Trends, rhythms, and aberrations in global climate 65 Ma to present. Science 292(5517): 686-693.

APPENDIX A

Computational analysis

1. Obtain an Ohio Supercomputer Center (OSC) account and use PuTTY to access it.

a) Open a terminal window and connect to the host glenn.osc.edu, then enter your user name and password. Next, type:

-bash-3.2$ mkdir directory_name

This command creates a new directory (directory_name is a placeholder for a name of your choice). Then type:

-bash-3.2$ cd directory_name

to enter the directory that was just created.

b) Open WinSCP (on a PC; Macintosh users can work from the Terminal application, and the Fetch program can be used to copy files to and from the OSC server), using the same host name, user name, and password to access your account. This program is required in order to upload the original files from the sequencing service to the OSC account.

c) Be consistent with file names; do not combine periods and underscores in the same name. For some programs this does not affect the process, but for others it results in an error. Most of the programs are run through a batch execution file, in which the walltime parameter is the estimated maximum run time. Any unnecessary symbol in the name of a file, or accidentally added to a script, will cause the program to fail. Macintosh and Windows interfaces usually do not display spaces, tabs, newline characters, and some other characters, but each of them is stored as its own symbol. By adding a space, for instance, you also add the corresponding symbol to the script or program, which can affect the result or performance.
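The points in step 1c can be checked before a job is submitted. The following is a minimal Python sketch and is not part of the original workflow; the naming rule encoded in SAFE_NAME and the function names are assumptions made for illustration only.

```python
import re

# Hypothetical rule for illustration: letters, digits, underscores and
# hyphens only, followed by at most one extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def is_safe_name(name):
    """Return True if the file name has no spaces or unusual symbols."""
    return bool(SAFE_NAME.match(name))


def hidden_characters(text):
    """Count characters that text editors often do not display."""
    return {
        "carriage_returns": text.count("\r"),
        "tabs": text.count("\t"),
        "non_ascii": sum(1 for ch in text if ord(ch) > 127),
    }
```

For example, is_safe_name("sample V5.fna") is False because of the space, and hidden_characters applied to the text of a batch file saved with Windows line endings reports one carriage return per line.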

2. Use the original file to clip the primers off. The 454 primers that are necessary for pyrosequencing must be removed from the sequences. The program (sff_extract.py) searches through each sequence and removes the primer sequences from the beginning of the sequences. This Python (‘py’ extension) program extracts the sequences (seq), sequencing quality scores (qual), and ancillary information from sff files.

a) Create a file named ‘sff_extract.py’ or download it.

b) Use the terminal command line to run the sff_extract.py program on the sequence file. The sequences are first exported, so that the 454 primers can be identified, with the command line:

-bash-3.2$ ./sff_extract -s seq.fasta -q qual.fasta -x anci.xml file_name.sff

where -s, -q, and -x give the names of the sequence file, quality file, and ancillary xml file that will be created from the sff file and will contain the exported data. Once the number of primer nucleotides has been determined (commonly used EcoRI/NotI adaptor: GAATTCGCGGCCGCGTCGAC), they are clipped off with the next command line:

-bash-3.2$ ./sff_extract --min_left_clip=nucleotide_number file_name.sff

where --min_left_clip is the minimum number of nucleotides to be clipped from the left side of each sequence.

c) List the contents of the directory by typing ls; this shows all files, including those just created.
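To illustrate what clipping a known adaptor from the left side of a read involves, a minimal Python sketch is given below. It is not the sff_extract.py program and does not read sff files; the function name clip_left and the exact-match rule are assumptions for illustration.

```python
# EcoRI/NotI adaptor quoted in the text (20 nucleotides).
ADAPTOR = "GAATTCGCGGCCGCGTCGAC"


def clip_left(seq, adaptor=ADAPTOR):
    """Remove the adaptor if the read starts with it (case-insensitive, exact match)."""
    if seq.upper().startswith(adaptor.upper()):
        return seq[len(adaptor):]
    return seq
```

A read that does not begin with the adaptor is returned unchanged, so the function can be applied to every sequence in a file. Reads with sequencing errors inside the adaptor would not be clipped by this exact-match rule.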

3. MIRA 3.0.5 (Whole Genome Shotgun and EST Sequence Assembler; http://mira-assembler.sourceforge.net/; [4]) is used for sequence assembly. Copy or download the MIRA 3.0.5 program from the web or from a previous account (mira_3.0.5_prod_linux-gnu_x86_64_static). Place all three renamed files directly into the MIRA 3.0.5 folder, inside the bin directory. Using a batch file, the program performs a multiple sequence alignment and searches for overlaps. A batch file is a script in plain-text format that is required for correct program execution. Prepare it in a text editor and save it in the Portable Batch System (pbs) format. Any text editor can be used (WordPad, TextEdit, Notepad; BBEdit works best) to create a batch *.pbs file. It is best to include the sample name in the name of the MIRA batch file. An example of the batch file content is shown below:

Example:

#PBS -l walltime=40:00:00
#PBS -l nodes=1:ppn=1
#PBS -N mira-test
#PBS -S /bin/bash
export PATH=$PATH:/nfs/15/bgs0279/mira_3.0.5_prod_linux-gnu_x86_64_static/
cd $PBS_O_WORKDIR
mira --project=name --job=denovo,genome,accurate,454 >&log_assembly

The path in the export line of the batch script is the location of the MIRA folder on the OSC.

To submit the script to the batch system, use the following command:

-bash-3.2$ qsub mira-test.pbs

It should return a one-line output with the job ID. To check the status of your job, use:

-bash-3.2$ qstat -u login_name
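Because stray characters in a batch file cause a job to fail (see step 1c), a batch file of this kind can also be generated by a short program instead of being typed by hand. The following is a minimal Python sketch that uses the same PBS directives as the example; the function name pbs_script and its parameters are hypothetical.

```python
def pbs_script(job_name, walltime, commands, nodes=1, ppn=1):
    """Return the text of a PBS batch file with Unix line endings."""
    lines = [
        f"#PBS -l walltime={walltime}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -N {job_name}",
        "#PBS -S /bin/bash",
        "cd $PBS_O_WORKDIR",
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


def write_pbs(path, text):
    """Save the batch file; newline='\\n' prevents Windows line endings."""
    with open(path, "w", newline="\n") as handle:
        handle.write(text)
```

The commands argument is a list of the lines to run, for instance the export PATH line and the mira command line from the example above.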

4. Retrieve information from the dataset on the OSC by running the nucleotide BLAST tool with the megablast specification. Again, the batch file content is shown below and has to be saved in a text editor with the ‘pbs’ extension. Descriptions of the command lines are listed under the script (do not include them in the batch file):

Example:

#PBS -l walltime=10:00:00
#PBS -l nodes=1:ppn=1
#PBS -N blast
#PBS -S /bin/bash
export PATH=$PATH:/nfs/15/bgs0279/mira_3.0.5_prod_linux-gnu_x86_64_static/
cd $PBS_O_WORKDIR
module load biosoftw
module load blast
blastn -query File_name.fna -task megablast -db nt -num_descriptions 10 -num_alignments 10 -show_gis -outfmt 0 -out File_name_out

The first line requests a walltime of 10 hours; this is the time requested from the OSC for this specific job. The next three lines give the number of nodes requested with the number of processors per node (ppn), the name of the job (blast), and the shell used to run the script (/bin/bash). The next line adds a path in the Network File System (nfs), under one's user name, to the PATH variable. The path is important so that OSC tools such as BLAST can find and execute the files of interest. The path shown here points to branch number 15 of the nfs catalog; on branch 15, under the account number bgs0279, there is a mira_3.0.5_prod_linux-gnu_x86_64_static folder. The “/” symbol separates the directories that lead to the files to be processed. Because BLAST is a tool (program) located within the package of biological software on the server, it has to be loaded with separate command lines. Thus, the next lines load the biosoftw module and then specify that, within the biosoftw package, only the blast tool will be used.

The last command line is read as:
- run nucleotide BLAST on a query file named File_name.fna (blastn -query)
- type of BLAST specification: megablast (-task)
- type of search: against the nucleotide database (-db nt)
- retrieve the top 10 alignments with descriptions for each query sequence, and include GI numbers (-num_descriptions 10 -num_alignments 10 -show_gis)
- save the output in the standard pairwise text format (-outfmt 0) under the name File_name_out (-out)

-bash-3.2$ qsub Batch_file_name.pbs

This command executes the batch file. If the requested walltime is exceeded before the job is done, the server will end the process. The job will then have to be repeated with smaller files.

5. Split the file by number of sequences (e.g., 100 sequences per file). This step is required only when the biosoftw module on the OSC fails to process large sets of sequences with megablast. With a large number of sequences (>200,000), the job would not finish within the maximum walltime of 40 hours. Therefore, the sequences must be divided into smaller files, whose results can be concatenated later.

a) First, the files must be converted into UNIX format, with each sequence and its ID on two lines (SeaView4 can be used for this purpose). This automatically removes the fasta format sign “>”, but this can be reversed in BBEdit by substituting the empty space in front of each ID with “>”.

b) Below is a script for splitting the file into files with a smaller number of sequences. The script goes through the sequence file, copies the first 150 lines, and saves them in a separate file, then repeats this for each following group of 150 lines. In this example, 150 lines represent 75 sequences, where the first line of each pair contains the sequence name and the second contains the sequence (line descriptions are located under the script):

Example:

#!/usr/bin/perl
use lib '/usr/local/biosoftw/bioperl-1.5.1';
# File_name.fna is a placeholder for the name of the input sequence file
open (FILE, '<', 'File_name.fna');
$file = "0001";
open (MYFILE, '>', $file);
$i = 0;
while (<FILE>) {
    if ($i < 150) {
        chomp;
        ($ID, $seq) = split("\n");
        print MYFILE "$ID";
        print MYFILE "\n$seq";
        $pos = tell FILE;
        ++$i;
    }
    else {
        close (MYFILE);
        ++$file;
        open (MYFILE, '>', $file);
        seek (FILE, $pos, 0);
        $i = 0;
    }
}
close (FILE);

Description of the script lines, starting from the first (#!/usr/bin/perl):

The first line tells the OSC server to execute the file as a Perl script. The ‘use lib’ line adds the location of the BioPerl library (version 1.5.1) to the places where Perl looks for modules. The next lines are read as: open the original file (with its file name) under the filehandle FILE, and create a new file with the name 0001 under the filehandle MYFILE. The filehandle MYFILE directs the program to write into the newly created file. Next, a counter variable is set ($i = 0, where zero means that no lines have been written yet). The ‘while’ command reads the original file one line at a time and places each line in a temporary variable. The “{}” signs enclose a block, within which the program performs a set of steps separately from the rest of the script. In this case, the ‘if’ condition keeps the counter from exceeding 149 (when $i = 0 the first line is processed; for the second line $i = 1). Within each group of 150 lines, the ‘chomp’ function removes the newline character from the end of the current line, and the line is assigned to the variables $ID (sequence name) and $seq (sequence). The ‘print’ function then writes the line, followed by a newline character, to the newly created file 0001 through the filehandle MYFILE. The ‘$pos’ variable records the current position in the original file (obtained with ‘tell’). The ‘if’ block is repeated until the variable $i reaches 149 (++$i is an automatic increment, which increases the variable by one on every pass). At that point the newly created file 0001 contains 150 printed lines. When $i = 150, the ‘else’ block closes the file 0001, automatically increments the file name (in this case to 0002), opens the new file for writing, returns to the recorded position in the original file (‘seek’) so that no line is skipped, and resets ‘$i’ to zero for the next set of sequence names and sequences. After that, the ‘else’ block ends, the program returns to the ‘if’ condition, and the cycle begins again. Once all the lines have been exported into new files, the loop ends and the original file is closed.
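The same splitting can be sketched in Python. This is not the Perl script described above: it counts sequences rather than lines, so it also accepts sequences that span several lines. The function names and the default of 75 sequences per file are choices made for illustration.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line, []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path, per_file=75, out_dir="."):
    """Write files named 0001, 0002, ... with at most per_file records each."""
    out_dir = Path(out_dir)
    written = []
    out = None
    for index, (header, seq) in enumerate(read_fasta(path)):
        if index % per_file == 0:
            # Start a new numbered output file for every per_file records.
            if out is not None:
                out.close()
            name = out_dir / f"{index // per_file + 1:04d}"
            out = open(name, "w", newline="\n")
            written.append(name)
        out.write(f"{header}\n{seq}\n")
    if out is not None:
        out.close()
    return written
```

Each output file keeps the “>” sign in front of the sequence name, so the files can be used as BLAST queries without further editing.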

-bash-3.2$ perl File_name.pl

This command executes the Perl file.

c) Concatenate the results

When working with large datasets that are broken into small files prior to the BLAST analysis, concatenation of the output BLAST files is performed with the following line:

-bash-3.2$ cat file1 file2 file3 > selected.tab

Alternatively, one can first split the sequence file into multiple smaller files, run the BLAST analysis, convert the resulting output files into tabular format, and only then concatenate the BLAST results into one tabular file. After this file is imported into the database program and duplicates are removed, one can use the Galaxy genome web site to retrieve the full taxonomic representation for the NCBI hit sequences.

Another option is to concatenate the files with the help of the Galaxy genome web site or with a Perl script. For the latter, create a text file Cat_Files.pl with the script shown below:

#!/usr/bin/perl
use strict;
use warnings;
local $|=1; #turn off buffering
for ( map { glob $_ } @ARGV ) {
    open my $FH, '<', $_ or do {
        warn "Can't open '$_' for read, skipping: $!";
        next;
    };
    while (<$FH>) {
        print $_
    }
    close $FH;
}

Line descriptions: the second and third lines ask the program to apply strict checking for mistyped names within the script and to show warnings. The next line sets the variable ‘$|’ to 1, which turns off output buffering, so that each printed line is written out immediately. The ‘for’ line builds the list of files for the concatenation process, where ‘$_’ is the default variable holding the current file name and ‘@ARGV’ is the list of file names or patterns given on the command line. Thus, when the script is executed as written below, the program will concatenate all files with the ‘tab’ extension, no matter what their names are. The next lines open a filehandle (‘$FH’) for reading from the current file. If the file cannot be opened, the program prints a warning line on the screen and moves to the next file. Otherwise, each line read through the filehandle is printed, and once all lines have been printed the program closes the filehandle. The filehandle is a connection between the Perl script and the data file, and it is named by the author of the script. Before the script is run, the script file must be copied into the folder with the multiple Blast_out results. The script shown above can be used to concatenate different types of files. In this case, specify the extension (either *.txt or *.html). The asterisk matches all files with the same extension:

-bash-3.2$ perl Cat_Files.pl *.tab > Filename.tab

In the execution line, the perl command runs the script to concatenate all tabular files in the current directory, and the output is redirected into ‘Filename.tab’, which will be the generated file with all of the records.
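For comparison, the concatenation can be sketched in Python as follows. This is an illustration rather than a replacement for the Perl script; the function name concatenate is hypothetical, and skipping the output file when it matches the pattern is an added safeguard that the Perl script does not have.

```python
import glob
import os
import sys


def concatenate(pattern, out_path):
    """Write every file matching pattern into out_path, in sorted order."""
    target = os.path.abspath(out_path)
    # The list is built before the output file is opened, and the output
    # file itself is excluded so that it is never appended to itself.
    names = sorted(n for n in glob.glob(pattern) if os.path.abspath(n) != target)
    with open(out_path, "w", newline="\n") as out:
        for name in names:
            try:
                with open(name) as handle:
                    for line in handle:
                        out.write(line)
            except OSError as err:
                # Same behavior as the warning in the Perl script: report and skip.
                print(f"Can't open '{name}' for read, skipping: {err}", file=sys.stderr)
    return names
```

A call such as concatenate("*.tab", "Filename.tab") corresponds to the execution line shown above.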

6. Blast to table:

a) Option 1 - convert BLAST results to tabular format using BioPerl. Download or create a text file with the script for the ‘Blast2Table’ program. This is a large Perl script and is executed as:

-bash-3.2$ perl Blast2Table File_name_out > File_name_out.txt

Because the Blast2Table script has to be run from a batch file, create a batch *.pbs file to execute the Perl script. Create Batch_Blast.pbs with these commands:

Example:

#PBS -l walltime=10:00:00
#PBS -l nodes=1:ppn=8
#PBS -N Blast2table
#PBS -S /bin/bash
export PATH=$PATH:/nfs/18/bgs0294/mira_3.0.5_prod_linux-gnu_x86_64_static/bin
cd $PBS_O_WORKDIR
module load bioperl
perl Blast2Table Filename_blast_out > Filename_table_out

b) Option 2 - Use the output format option -outfmt 10 or -outfmt 6; the resulting output can then be converted into database format using FileMaker Pro 11. With -outfmt 6 the output is tab-delimited. With -outfmt 10 the output is the same but comma-separated, and it can be converted into tabular form with BBEdit by substituting all ‘,’ and ‘|’ symbols with tabs. If no format is specified in the command line, BLAST runs in the default mode, which retrieves all alignment information.
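The substitution described for the -outfmt 10 output can be sketched in Python instead of being done by hand in BBEdit. The function name csv_to_tab is hypothetical.

```python
def csv_to_tab(line):
    """Replace every ',' and '|' in one output line with a tab."""
    return line.rstrip("\r\n").replace(",", "\t").replace("|", "\t")


def convert_file(in_path, out_path):
    """Convert a whole comma-separated BLAST output file to tab-delimited form."""
    with open(in_path) as src, open(out_path, "w", newline="\n") as dst:
        for line in src:
            dst.write(csv_to_tab(line) + "\n")
```

For a made-up line such as q1,gi|12345|gb|XX000000.1|,96.00 the result has the GI number in its own column; note that a ‘|’ directly followed by a ‘,’ leaves one empty column.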

7. Use FileMaker Pro to edit and add tables, and to remove duplicates (by gene ID). Open the FileMaker Pro database with your results. Go to ‘script’, and in the new window press ‘new’. All commands are listed on the left side of the window and the typing area is on the right. Most parts of the script are straightforward, and the commands mean exactly what they say. If the script says ‘Sort []’ or ‘Go to Record/Request/Page [First]’, this means that the records will be sorted based on the selected criteria and, in the second case, that the program will select the first record. The ‘[]’ brackets hold a variable and its specifications. For instance, the first line below says ‘Sort [YourTable::YourField, ascending, no dialog]’. This means that the command is to sort the records by the ‘YourField’ variable (which is a column), located in the table named ‘YourTable’; the type of sort is ascending, and ‘no dialog’ means that no other criteria need to be changed or confirmed for this command. Example:

Script 1

Sort [YourTable::YourField, ascending, no dialog]
Go to Record/Request/Page [ First ]
Set Variable [ $$DupID; Value:Table::Field_dup ]
Omit Record
Comment ["the loop is set to preserve ONE TOP record of each set of duplicates"]
Loop
  Exit Loop If [ Get ( FoundCount ) = 0 ]
  If [ Table::Field_dup = $$DupID ]
    Delete Record/Request [ No dialog ]
  Else
    Set Variable [ $$DupID; Value:Table::Field_dup ]
    Omit Record
  End If
End Loop
Show All Records

Script 2

Sort [YourTable::YourField, ascending, no dialog]
Go to Record/Request/Page [ First ]
Set Variable [ $$DupID; Value:Table::Field_dup ]
Go to Record/Request/Page [ Next ]
Loop
  If [ Table::Field_dup = $$DupID ]
    Delete Record/Request [ No dialog ]
  Else
    Set Variable [ $$DupID; Value:Table::Field_dup ]
    Go to Record/Request/Page [ Next; Exit after last ]
  End If
End Loop
Show All Records

FileMaker uses the same computational principles as Perl and C++; the main difference is that it is much easier to use. In both programming languages, the functions used above are written as sequences of symbols in the script, which makes a programming language more complicated than the FileMaker database software. For instance, the FileMaker function ‘Omit Record’ can be compared to a line such as print "$DupID\n"; in a Perl script, where ‘\n’ means a newline character; in FileMaker the equivalent is moving to the next record.

The difference between the two scripts listed above is as follows. The first script sorts the records, stores the ID of the first record, omits (saves) that record, and then starts a circular process or ‘loop’. Within the loop, if the current record has the same ID as the stored one, it is deleted as a duplicate; otherwise its ID is stored and the record is omitted from the found set, which preserves it. Thus, if only one record has a given ID, it is kept and the script moves on; if more than one record has the same ID, the script saves the first and deletes the rest of the records with the same GI. The loop ends when no records remain in the found set, and all preserved records are then shown.

The second script can be used for large databases (more than 30,000 records). It saves the first record and steps to the next record, deleting it if it is a duplicate; thus, the second script uses fewer operations and should run faster.
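The logic shared by both scripts, keeping the first record of each set of duplicates and deleting the rest, can be sketched in Python as follows. This is an illustration only and not a FileMaker script; the function name remove_duplicates and the representation of records as dictionaries are assumptions.

```python
def remove_duplicates(records, key):
    """Keep the first record for each value of key; drop later duplicates."""
    seen = set()
    kept = []
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            kept.append(record)
    return kept
```

Because the IDs already seen are stored in a set, the records do not have to be sorted first, unlike in the FileMaker scripts, where duplicates must be adjacent.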

8. Assembled sequences should be converted into tab-delimited format using the Galaxy genome browser; under the section ‘Fasta manipulation’ one can convert tabular-to-fasta and vice versa. The BLAST output file can be uploaded to the Galaxy genome website and used for the metagenomic analysis to retrieve the taxonomy for the gene identification numbers. Under the section ‘Metagenomic analysis’ one can select ‘Fetch taxonomic representation’ and select the ‘gene identification’ columns as sources of information.

9. Multiple sequence alignment for large datasets using MAFFT. Prior to running MAFFT, visit the link shown below for a description of the multiple sequence alignment algorithms:

http://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html

a) Download and install MAFFT in the 'bin' directory, and place the sequence file(s) into the 'bin' directory by simply dragging them. Open the Terminal window ('cd ..' returns to the parent directory; 'cd' followed by a directory name enters the specified directory; 'ls' shows the files inside the directory). In the Terminal, type 'ls' to see your directories and 'cd bin' to enter the bin directory.

Type 'which mafft' - to see the pathway for the program

Example:

-bash-3.2: bin$ which mafft

/usr/local/bin/mafft

-bash-3.2: bin$

b) Copy the path and paste it into the command line, followed by the name of the sequence data file. Note: the format of the file may be either 'txt' or 'fasta'. After the file name, put the sign '>' and type the name of the file you want to create and save the results in.

Example:

-bash-3.2: bin$ /usr/local/bin/mafft All16S.fasta > 16Sout.txt

Example (the beginning of the run):

-bash-3.2: bin$ /usr/local/bin/mafft All23S.fasta > 23Sout.txt
nthread = 0
generating 200PAM scoring matrix for nucleotides ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00…

Example (the ending of the run):

Progressive alignment...
STEP 583 /583 d
done.
tbfast (nuc) Version 6.833b
alg=A, model=DNA200 (2), 1.530 ( 4.590), -0.000 (-0.000)
0 thread(s)
Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
If the possibility of long gaps can be excluded, add '--ep 0.123'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
Making a distance matrix… 581 / 584nknown character y done.

Note that the strategy for the multiple sequence alignment can be selected interactively by pasting the path for MAFFT without any file names, pressing Enter, and answering the MAFFT prompts.

Example:

-bash-3.2: bin$ /usr/local/bin/mafft

------

MAFFT v6.833b (2010/10/20)

Copyright (c) 2010 Kazutaka Katoh

NAR 30:3059-3066, NAR 33:511-518

http://mafft.cbrc.jp/alignment/software/

------

Input file? (fasta format)
@ ‘File name’
OK. infile = 28SAll.fasta

Output file?
@ ‘File name for the results’
OK. outfile = 28Sout.txt

Output format?
  1. Clustal format / Sorted
  2. Clustal format / Input order
  3. Fasta format / Sorted
  4. Fasta format / Input order
@...
OK. arguments = --reorder

Strategy?
  1. --auto
  2. FFT-NS-1 (fast)
  3. FFT-NS-2 (default)
  4. G-INS-i (accurate)
  5. L-INS-i (accurate)
  6. E-INS-i (accurate)
@...
OK. arguments = --localpair --maxiterate 16 --reorder

Additional arguments? (--ep #, --op #, --kappa #, etc)
@
command=
"/usr/local/bin/mafft" --localpair --maxiterate 16 --reorder ‘Filename’ > ‘New_file_name’
OK?
@ [Y]…

Enter the number to select one of the four formats and one of the six algorithms listed above. If the exact parameters for the gap penalties and scoring matrix are known, they can be entered after selecting an algorithm. Otherwise, press “Enter”, confirm the procedure with “Y”, and then press “Enter” again. FFT stands for “Fast Fourier Transform”.
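The non-interactive command that MAFFT displays at the end of the prompts can also be assembled and run from a script. The following is a minimal Python sketch; the function names are hypothetical, the default path is the one returned by 'which mafft' in the example above, and only flags that appear in the MAFFT output shown here are used.

```python
import subprocess


def mafft_command(infile, flags=("--auto",), mafft="/usr/local/bin/mafft"):
    """Build the argument list for a MAFFT run (nothing is executed here)."""
    return [mafft, *flags, infile]


def run_mafft(infile, outfile, flags=("--auto",), mafft="/usr/local/bin/mafft"):
    """Run MAFFT and save the alignment, like 'mafft ... infile > outfile'."""
    with open(outfile, "w") as out:
        subprocess.run(mafft_command(infile, flags, mafft), stdout=out, check=True)
```

For instance, run_mafft("28SAll.fasta", "28Sout.txt", flags=("--localpair", "--maxiterate", "16", "--reorder")) would correspond to the command shown in the prompt; it requires MAFFT to be installed at the given path.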

10. Prepare the “compute.txt” file for Neighbor-Joining phylogenetic analysis (using PAUP on

Macintosh system):

Example:

Begin paup;
set autoclose=yes warntree=no warnreset=no;
log start file=data.log replace=yes;
execute 'FilenameNEXUS.txt';
NJ brlens=yes treefile=nexnj.tre replace=yes;
bootstrap nreps=1000 treefile=boot_align_dist.tre search=NJ;
condense collapse=maxbrlen;
contree;
describetrees all/plot=phylogram;
savetrees file=newicknj.tre brlens=yes maxDecimals=0 saveBootP=nodeLabels maxDecimals=0 from=1 to=1 replace=yes;
log stop;
end;

Open Terminal and go to the PAUP directory; type './paup4b10-ppc-macosx'

Example:

-bash-3.2: PAUP$ ./paup4b10-ppc-macosx

This command initiates PAUP. The interface will look like this:

Example:

-bash-3.2: PAUP$ ./paup4b10-ppc-macosx

P A U P *

Portable version 4.0b10 for Unix

Sat Feb 12 14:17:00 2011

------NOTICE------

This is a beta-test version. Please report any crashes, apparent calculation errors, or other

anomalous results.

There are no restrictions on publication of results obtained with this version, but you should

check the WWW site frequently for bug announcements and/or updated versions. See the

README file on the distribution media for details.

------

paup>

At the paup> prompt, type 'execute' followed by the name of the compute file (compute.txt). This reads in the aligned sequence matrix named within the file and sets up the default parameters, or other parameters that have been set manually.

11. Compare databases using BLAST. If you want to search one dataset against another database or another set of sequences, follow these steps:

a) Convert an ‘*.fna’ file into a nucleotide database:

-bash-3.2$ ./makeblastdb -in Filename_1.fna -dbtype nucl

b) BLAST the query file against the created database and write the output to a new file, defining the path to the BLAST programs:

-bash-3.2$ perl legacy_blast.pl blastall -p blastn -i Filename_1.fna -d Filename_2.fna -o File1_COMP_File2.out --path /Users/pmorris/Desktop/ncbi-blast-2.2.23+/bin

The ‘legacy_blast.pl’ script is a program within the BLAST+ package that accepts command lines written for the older blastall program and runs the corresponding BLAST+ program, so the older BLAST toolkit does not have to be downloaded and installed. The script is part of the BLAST tool package and is present on the OSC, so anyone can use it at any time. The command line above executes the *.pl script, which runs the nucleotide BLAST program (-p) for the query file (-i) Filename_1.fna against the database (-d) Filename_2.fna, with the output file (-o) File1_COMP_File2.out. The path (--path) gives the location of the BLAST+ programs.

SUPPLEMENTARY TABLES

Table S1. Small subunit rRNA genes of Bacteria and Eukarya from V5. Taxonomic affiliations that were not found in NCBI GenBank are marked as "n".

Accession Q Q Q %- %- e-value GI number Phylum Family Genus / Species number length start end ident sim JQ997163 424 2 424 0 96% 96% 185178423 Bacteria Acidobacteria n uncultured Acidobacteriales bacterium JQ997164 329 16 293 3E-133 98% 98% 290759818 Bacteria Actinobacteria Actinomycetaceae Actinomyces georgiae JQ997166 490 18 458 0 93% 93% 12641589 Bacteria Actinobacteria Actinomycetaceae Actinomyces odontolyticus JQ997165 320 17 275 2E-110 95% 95% 290759817 Bacteria Actinobacteria Actinomycetaceae Actinomyces odontolyticus JQ997170 575 4 574 0 100% 100% 288558711 Bacteria Actinobacteria Actinomycetaceae Actinomyces oris JQ997167 334 5 290 4E-147 100% 100% 290759821 Bacteria Actinobacteria Actinomycetaceae Actinomyces oris JQ997168 434 17 369 0 100% 100% 290759822 Bacteria Actinobacteria Actinomycetaceae Actinomyces oris JQ997169 520 4 309 2E-147 98% 98% 290759823 Bacteria Actinobacteria Actinomycetaceae Actinomyces oris JQ997171 554 5 194 9E-57 88% 88% 12641598 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. JQ997172 533 1 530 0 100% 100% 10946537 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral clone EP011 JQ997173 550 10 538 0 99% 99% 33860321 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral clone IO077 JQ997174 567 18 566 0 99% 99% 9837443 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral strain B27SC JQ997175 507 16 429 0 99% 99% 285797117 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 169 JQ997177 540 23 532 0 100% 100% 284451154 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 171 JQ997176 504 15 365 5E-178 99% 99% 285797225 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 171 JQ997178 408 3 305 1E-138 97% 97% 285797322 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 175 JQ997180 415 5 374 0 100% 100% 285797410 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. 
oral taxon 177 JQ997179 307 5 244 3E-119 100% 100% 285797411 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 177 JQ997181 568 1 565 0 99% 99% 284451155 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 178 JQ997183 409 18 317 3E-149 99% 99% 285802785 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 448 JQ997182 261 5 211 1E-101 100% 100% 285802883 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon 448 JQ997184 267 4 211 4E-102 100% 100% 285201485 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. oral taxon B78 JQ997185 585 17 580 0 93% 93% 290759814 Bacteria Actinobacteria Actinomycetaceae Actinomyces sp. TeJ5 JQ997188 472 5 427 0 99% 99% 12641602 Bacteria Actinobacteria Actinomycetaceae Actinomyces viscosus JQ997187 414 13 382 0 99% 99% 281485357 Bacteria Actinobacteria Actinomycetaceae Actinomyces viscosus JQ997186 365 5 264 6E-131 100% 100% 290759807 Bacteria Actinobacteria Actinomycetaceae Actinomyces viscosus JQ997189 556 23 548 0 97% 97% 162846486 Bacteria Actinobacteria Actinomycetaceae uncultured Actinomyces sp. JQ997190 270 4 212 3E-98 98% 98% 290759824 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium durum JQ997191 544 5 542 0 99% 99% 282598458 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium sp. NML00-0156 JQ997192 337 5 274 3E-138 100% 100% 282598461 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium sp. NML09-0341 JQ997193 441 17 349 2E-156 97% 97% 9837446 Bacteria Actinobacteria Dermabacteraceae Dermabacter sp. oral strain B46KS JQ997194 463 4 414 0 96% 96% 291290506 Bacteria Actinobacteria Dermatophilus chelonae JQ997195 537 4 534 0 99% 99% 291290503 Bacteria Actinobacteria Dermatophilaceae pelagius JQ997199 289 4 222 2E-104 99% 99% 255926777 Bacteria Actinobacteria Geodermatophilaceae uncultured Blastococcus sp. 
JQ997200 491 20 437 0 98% 98% 194680069 Bacteria Actinobacteria Arsenicicoccus bolidensis JQ997201 567 19 563 0 95% 95% 258678886 Bacteria Actinobacteria Intrasporangiaceae Arsenicicoccus piscis JQ997202 479 3 420 0 97% 97% 254692454 Bacteria Actinobacteria Intrasporangiaceae Janibacter anophelis JQ997203 304 18 238 2E-100 97% 97% 294992068 Bacteria Actinobacteria Intrasporangiaceae Janibacter sp. MJ436 JQ997205 550 18 548 0 99% 99% 282934985 Bacteria Actinobacteria Intrasporangiaceae Janibacter sp. RC5-101 JQ997206 548 18 509 0 99% 99% 197114172 Bacteria Actinobacteria Intrasporangiaceae Janibacter terrae JQ997207 420 20 373 4E-158 96% 96% 198404150 Bacteria Actinobacteria Intrasporangiaceae Terracoccus sp. WPCB166 JQ997208 410 4 361 9E-175 98% 98% 169643232 Bacteria Actinobacteria Agrococcus sp. 1038/2 JQ997211 393 17 353 1E-142 95% 95% 270282501 Bacteria Actinobacteria Microbacteriaceae Frigoribacterium sp. 19 JQ997212 525 18 518 0 97% 97% 37812156 Bacteria Actinobacteria Microbacteriaceae Frigoribacterium sp. GWS-SE-H243 JQ997213 530 18 466 0 99% 99% 134105934 Bacteria Actinobacteria Microbacteriaceae Leifsonia kribbensis JQ997214 528 15 508 0 99% 99% 134105935 Bacteria Actinobacteria Microbacteriaceae Leifsonia sp. MSL 07 JQ997215 536 17 533 0 98% 98% 283979980 Bacteria Actinobacteria Microbacteriaceae Microbacteriaceae bacterium MIDF13 JQ997216 404 5 350 1E-153 95% 95% 284156650 Bacteria Actinobacteria Microbacteriaceae Microbacterium sp. JDM-3-08 JQ997217 534 17 532 0 99% 99% 289594400 Bacteria Actinobacteria Microbacteriaceae Microbacterium sp. KT 820 JQ997218 280 17 248 4E-107 97% 97% 254682001 Bacteria Actinobacteria Microbacteriaceae Microbacterium sp. 
THWCSN36 JQ997219 514 17 462 0 100% 100% 111146878 Bacteria Actinobacteria Microbacteriaceae Phycicola gilvus JQ997220 533 5 473 0 99% 99% 117644155 Bacteria Actinobacteria Microbacteriaceae Subtercola frigoramans JQ997221 331 24 283 2E-106 94% 94% 195971977 Bacteria Actinobacteria Microbacteriaceae uncultured Cryobacterium sp. JQ997222 341 19 236 8E-65 89% 89% 219898423 Bacteria Actinobacteria Microbacteriaceae uncultured Leifsonia sp. JQ997223 406 17 372 0 100% 100% 270341314 Bacteria Actinobacteria Arthrobacter flavus JQ997224 414 16 341 2E-167 100% 100% 292596387 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. 01-Au-006/3 JQ997225 423 4 73 7E-27 100% 100% 292596389 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. 01-Je-001 JQ997226 237 5 193 7E-79 96% 96% 292596388 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. 01-St-006-Luft JQ997227 387 18 124 1E-23 88% 88% 107593744 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. AE05102002_1 JQ997228 497 17 450 9E-161 91% 91% 239703779 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. AMV8 JQ997229 424 3 379 0 99% 99% 145293728 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. g7 JQ997230 350 12 311 5E-152 99% 99% 145293734 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. h42 JQ997231 542 4 538 0 99% 99% 158562718 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. NP2 JQ997232 537 5 531 0 99% 99% 158562707 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. NP3 JQ997233 301 17 254 9E-119 100% 100% 228007468 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. SH-43B JQ997234 251 17 183 5E-81 100% 100% 289598546 Bacteria Actinobacteria Micrococcaceae Kocuria palustris JQ997235 296 4 251 4E-77 90% 90% 37785787 Bacteria Actinobacteria Micrococcaceae Kocuria rosea JQ997237 475 5 413 0 100% 100% 295809753 Bacteria Actinobacteria Micrococcaceae Kocuria sp. DNG32 JQ999505 545 3 533 0 97% 97% 60544888 Bacteria Actinobacteria Micrococcaceae Kocuria sp. 
T213BO3
JQ997238 520 6 461 0 97% 97% 272760971 Bacteria Actinobacteria Micrococcaceae Micrococcaceae bacterium IVw_V
JQ997239 542 20 538 0 99% 99% 260100813 Bacteria Actinobacteria Micrococcaceae Micrococcaceae bacterium M1S6-12
JQ997240 305 19 260 2E-115 98% 98% 154243273 Bacteria Actinobacteria Micrococcaceae Micrococcaceae bacterium NASA2-30
JQ997251 549 5 520 0 95% 95% 209943871 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. 3S3
JQ997252 314 8 224 3E-104 99% 99% 158702950 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. 66H20-1
JQ997253 309 5 268 7E-125 98% 98% 219809684 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. BQAB-06d

Table S1 Cont.

JQ997254 557 5 555 0 99% 99% 219809025 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. BQN1N-03d JQ997255 563 18 561 0 95% 95% 241897494 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. C-09 JQ997256 453 25 410 0 97% 97% 215983448 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. CCGE3063 JQ997257 559 19 522 0 99% 99% 240129659 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. CTDB2 JQ997258 277 4 226 4E-112 100% 100% 116119394 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. DY-1 JQ997259 259 82 201 3E-53 99% 99% 294992046 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. MJ314 JQ997260 545 5 543 0 99% 99% 294992048 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. MJ425 JQ997261 371 25 340 7E-111 90% 90% 294992049 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. MJ524 JQ997262 566 5 560 0 97% 97% 187319412 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. MOLA 73 JQ997263 531 18 254 2E-98 95% 95% 154243246 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. NASA2-3 JQ997264 554 5 554 0 99% 99% 157703991 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. SY-13 JQ997265 380 18 328 9E-125 93% 93% 265678768 Bacteria Actinobacteria Micrococcaceae Nesterenkonia halotolerans JQ997266 551 5 547 0 99% 99% 265678815 Bacteria Actinobacteria Micrococcaceae Nesterenkonia lutea JQ997267 537 18 536 0 98% 98% 283486727 Bacteria Actinobacteria Micrococcaceae Nesterenkonia sandarakina JQ997268 542 5 530 0 95% 95% 256592563 Bacteria Actinobacteria Micrococcaceae Nesterenkonia sp. 110-7 JQ997269 543 4 538 0 91% 91% 256592564 Bacteria Actinobacteria Micrococcaceae Nesterenkonia sp. 110-8 JQ997270 537 24 522 0 93% 93% 283486721 Bacteria Actinobacteria Micrococcaceae Nesterenkonia sp. 2019 JQ997272 407 19 314 1E-133 96% 96% 219878171 Bacteria Actinobacteria Micrococcaceae Rothia nasimurium JQ997273 417 5 302 1E-153 100% 100% 295393241 Bacteria Actinobacteria Micrococcaceae uncultured Rothia sp. 
JQ997275 285 5 232 6E-115 100% 100% 260871474 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium sp. GN-10803 JQ997276 263 18 170 3E-73 100% 100% 62736086 Bacteria Actinobacteria Mycobacteriaceae uncultured Mycobacterium sp. JQ997277 548 19 548 0 99% 99% 146166675 Bacteria Actinobacteria n Micrococcineae bacterium 4_C16_66 JQ997278 410 17 375 0 99% 99% 269113439 Bacteria Actinobacteria n uncultured bacterium JQ997279 575 5 235 2E-107 98% 98% 11127808 Bacteria Actinobacteria n uncultured sheep mite bacterium Llangefni 35 JQ997280 542 17 536 0 92% 92% 111146975 Bacteria Actinobacteria Marmoricola aequoreus JQ997281 335 18 277 1E-132 100% 100% 215983420 Bacteria Actinobacteria Nocardioidaceae Nocardioides sp. CCGE2239 JQ997282 455 18 318 3E-150 99% 99% 293629578 Bacteria Actinobacteria Nocardioidaceae Nocardioides sp. Cr7-14 JQ997283 457 5 407 0 98% 98% 291464959 Bacteria Actinobacteria Nocardioidaceae uncultured Nocardioides sp. JQ997285 561 5 555 0 97% 97% 8574099 Bacteria Actinobacteria Streptomycetaceae Streptomyces rimosus JQ997286 461 3 55 8E-12 94% 94% 295702212 Bacteria Actinobacteria Streptomycetaceae Streptomyces sp. 
175_2010_ JQ997288 498 17 376 3E-175 98% 98% 219816071 Bacteria Actinobacteria Yaniellaceae Yaniella soli JQ997289 550 57 545 0 92% 92% 284930174 Bacteria Actinobacteria Bifidobacteriaceae Bifidobacterium pullorum JQ997290 473 5 419 8E-171 94% 94% 284930176 Bacteria Actinobacteria Bifidobacteriaceae Bifidobacterium saeculare JQ997292 404 18 357 5E-177 100% 100% 239924942 Bacteria Actinobacteria Bifidobacteriaceae Parascardovia denticolens JQ997291 303 5 230 9E-114 100% 100% 285178409 Bacteria Actinobacteria Bifidobacteriaceae Parascardovia denticolens JQ997293 547 46 546 0 98% 98% 295147946 Bacteria Actinobacteria Coriobacteriaceae Atopobium parvulum JQ997294 370 18 283 3E-134 100% 100% 34329838 Bacteria Actinobacteria n actinobacterium iEI7 JQ997306 590 18 537 0 94% 94% 87042306 Bacteria Actinobacteria n uncultured actinobacterium JQ997296 257 70 225 1E-71 99% 99% 105990462 Bacteria Actinobacteria n uncultured actinobacterium JQ997304 573 234 566 4E-165 99% 99% 154186968 Bacteria Actinobacteria n uncultured actinobacterium JQ997300 474 11 415 0 99% 99% 154199122 Bacteria Actinobacteria n uncultured actinobacterium JQ997301 516 5 480 0 100% 100% 218533798 Bacteria Actinobacteria n uncultured actinobacterium JQ997302 538 43 422 0 98% 98% 220682447 Bacteria Actinobacteria n uncultured actinobacterium JQ997303 559 5 493 3E-171 90% 90% 226918723 Bacteria Actinobacteria n uncultured actinobacterium JQ997299 392 5 359 2E-175 98% 98% 237637651 Bacteria Actinobacteria n uncultured actinobacterium JQ997305 574 19 487 0 97% 97% 237637665 Bacteria Actinobacteria n uncultured actinobacterium JQ997298 277 4 221 2E-109 100% 100% 284387472 Bacteria Actinobacteria n uncultured actinobacterium JQ997295 251 5 144 5E-66 100% 100% 290565057 Bacteria Actinobacteria n uncultured actinobacterium JQ997297 266 18 174 7E-55 92% 92% 290794332 Bacteria Actinobacteria n uncultured actinobacterium JQ998479 465 18 340 5E-148 97% 97% 237934379 Bacteria Actinobacteria n uncultured bacterium 
JQ997307 527 18 464 0 98% 98% 159159332 Bacteria Bacteroidetes Bacteroidaceae Bacteroides coprocola JQ997314 394 5 349 8E-180 100% 100% 175940971 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997312 315 4 271 2E-130 99% 99% 208689585 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997310 284 18 252 4E-92 93% 93% 208690692 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997317 455 2 398 8E-176 95% 95% 208690861 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997318 464 18 411 1E-178 96% 96% 281324569 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997319 576 5 564 0 92% 92% 281324578 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997315 401 4 357 2E-151 94% 94% 281324586 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997311 312 25 225 3E-93 98% 98% 281324587 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997316 404 2 349 2E-156 96% 96% 281324588 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997313 379 5 310 1E-143 97% 97% 281324591 Bacteria Bacteroidetes Bacteroidaceae uncultured Bacteroides sp. JQ997324 564 24 525 0 99% 99% 154193781 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ997322 557 5 531 0 99% 99% 154193791 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ997320 439 5 359 1E-163 96% 96% 217337425 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ997321 498 23 340 5E-163 100% 100% 261265014 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ997325 574 22 327 2E-67 83% 83% 159159377 Bacteria Bacteroidetes Porphyromonadaceae Parabacteroides goldsteinii JQ997327 474 23 427 6E-147 90% 90% 86371916 Bacteria Bacteroidetes Porphyromonadaceae uncultured Porphyromonas sp. JQ997326 276 9 114 2E-45 99% 99% 294613743 Bacteria Bacteroidetes Porphyromonadaceae uncultured Porphyromonas sp. 
JQ997328 329 18 297 2E-125 96% 96% 215273719 Bacteria Bacteroidetes Prevotellaceae Paraprevotella xylaniphila
JQ997329 556 18 552 0 99% 99% 284451151 Bacteria Bacteroidetes Prevotellaceae Prevotella denticola
JQ997330 372 5 335 3E-109 89% 89% 189406708 Bacteria Bacteroidetes Prevotellaceae Prevotella falsenii
JQ997333 550 20 505 0 97% 97% 32492917 Bacteria Bacteroidetes Prevotellaceae Prevotella melaninogenica
JQ997332 437 18 386 0 100% 100% 290759845 Bacteria Bacteroidetes Prevotellaceae Prevotella melaninogenica
JQ997331 269 4 216 5E-96 97% 97% 290759846 Bacteria Bacteroidetes Prevotellaceae Prevotella melaninogenica
JQ997334 294 18 276 4E-132 100% 100% 213399744 Bacteria Bacteroidetes Prevotellaceae Prevotella sp. 8400706
JQ997335 421 18 382 7E-176 98% 98% 14161353 Bacteria Bacteroidetes Prevotellaceae uncultured Prevotella sp.
JQ997336 461 4 424 0 99% 99% 253683837 Bacteria Bacteroidetes Prevotellaceae uncultured Prevotella sp.
JQ997337 538 18 537 0 99% 99% 290759830 Bacteria Bacteroidetes Flavobacteriaceae Capnocytophaga granulosa
JQ997338 316 18 268 1E-127 100% 100% 283443628 Bacteria Bacteroidetes Flavobacteriaceae Flavobacterium johnsoniae

Table S1 Cont.

JQ997339 352 17 298 4E-133 98% 98% 157170669 Bacteria Bacteroidetes Flavobacteriaceae Flavobacterium sp. P-131 JQ999022 552 17 310 5E-119 94% 94% 237931356 Bacteria Bacteroidetes n uncultured bacterium JQ997350 521 16 465 0 99% 99% 60266503 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997356 577 5 571 0 97% 97% 118772949 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997349 515 18 469 0 98% 98% 118772961 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997340 261 162 223 1E-22 100% 100% 126131306 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997353 546 17 542 0 97% 97% 151936618 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997348 500 24 438 0 100% 100% 197360255 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997351 538 18 537 2E-163 87% 87% 222079937 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997346 370 17 260 1E-123 100% 100% 239619880 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997352 544 18 510 0 98% 98% 291329669 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997344 364 5 306 6E-156 100% 100% 291329690 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997342 317 18 169 1E-72 100% 100% 291329981 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997354 546 5 510 0 100% 100% 291330802 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997355 565 23 565 0 99% 99% 291330808 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997343 362 4 312 8E-155 99% 99% 291330883 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997341 277 3 232 5E-116 100% 100% 291330884 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997347 437 5 380 0 99% 99% 291332865 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ997345 369 24 304 4E-118 94% 94% 291332904 Bacteria Bacteroidetes n uncultured Bacteroidetes bacterium JQ999501 547 18 507 0 98% 98% 219961820 Bacteria 
Bacteroidetes n uncultured Bacteroidetes bacterium JQ997357 239 5 195 4E-71 93% 93% 246367044 Bacteria Bacteroidetes Sphingobacteriaceae Pedobacter sp. BZ42 JQ997358 545 17 539 0 96% 96% 211908640 Bacteria Bacteroidetes Sphingobacteriaceae Pedobacter sp. L2b-1 JQ997359 496 4 358 2E-171 98% 98% 225382587 Bacteria Bacteroidetes Sphingobacteriaceae Sphingobacterium shayense JQ997360 373 17 327 3E-154 99% 99% 203289069 Bacteria Bacteroidetes Sphingobacteriaceae Sphingobacterium sp. MOL-1 JQ997362 256 15 224 3E-103 100% 100% 295149401 Bacteria Cyanobacteria n uncultured Cyanobacterium sp. JQ997363 332 5 282 1E-118 95% 95% 148299079 Bacteria Cyanobacteria n cyanobacterium OSC JQ997478 202 1 202 8E-59 88% 88% 372197969 Bacteria Cyanobacteria n n JQ997364 402 10 359 1E-157 96% 96% 34808714 Bacteria Cyanobacteria n uncultured Antarctic cyanobacterium JQ997367 759 5 136 1E-46 94% 94% 46948073 Bacteria Cyanobacteria n uncultured Antarctic cyanobacterium JQ997366 546 18 543 0 94% 94% 46948088 Bacteria Cyanobacteria n uncultured Antarctic cyanobacterium JQ997365 460 5 299 1E-114 93% 93% 220683477 Bacteria Cyanobacteria n uncultured Antarctic cyanobacterium JQ997368 283 18 253 1E-101 96% 96% 15212605 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997380 466 5 383 6E-177 97% 97% 76096831 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997377 401 18 323 1E-152 99% 99% 93359876 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997376 384 27 315 1E-113 93% 93% 105990288 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997379 434 13 386 2E-177 97% 97% 105990324 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997383 504 17 456 0 95% 95% 129563748 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997386 535 18 529 0 92% 92% 129563749 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997387 536 5 535 0 92% 92% 146141933 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997372 339 7 282 8E-125 96% 96% 146141966 Bacteria Cyanobacteria n uncultured 
cyanobacterium JQ997381 477 69 442 4E-164 95% 95% 149350798 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997389 547 15 543 0 91% 91% 162289063 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997391 556 5 525 0 95% 95% 192337841 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997373 354 18 189 2E-60 92% 92% 192804239 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997382 480 5 350 4E-159 97% 97% 213053960 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997370 318 2 273 6E-101 92% 92% 219883478 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997378 412 259 367 2E-31 92% 92% 227072228 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997390 551 17 531 0 95% 95% 227072230 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997385 517 18 240 8E-52 85% 85% 229562671 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997369 290 5 195 2E-79 95% 95% 229563859 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997392 556 4 554 0 97% 97% 238632326 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997395 678 336 641 8E-63 83% 83% 261290524 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997388 540 1 533 0 92% 92% 282765201 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997371 328 228 255 0.001 100% 100% 285015409 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997397 325 5 294 3E-129 96% 96% 18182396 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997402 577 212 489 8E-102 92% 92% 18182408 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997398 327 5 236 5E-112 99% 99% 18182418 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997404 690 18 222 4E-76 93% 93% 18182460 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997396 317 5 280 2E-126 97% 97% 18182464 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997401 392 3 276 5E-127 97% 97% 18182489 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium JQ997399 329 22 284 3E-134 100% 100% 
18182490 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium
JQ997400 389 18 358 6E-176 100% 100% 18182491 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium
JQ997405 445 5 412 0 97% 97% 21388238 Bacteria Cyanobacteria Nostocaceae Anabaena azotica
JQ997406 357 18 326 4E-88 87% 87% 19343359 Bacteria Cyanobacteria Nostocaceae Nodularia spumigena
JQ997407 533 17 528 0 99% 99% 291360379 Bacteria Cyanobacteria Nostocaceae Nostoc flagelliforme
JQ997408 308 13 262 4E-122 99% 99% 29124940 Bacteria Cyanobacteria Nostocaceae Nostoc muscorum
JQ997409 514 17 514 0 99% 99% 82470879 Bacteria Cyanobacteria Nostocaceae Nostoc sp. _Mollenhauer 1:1-115_
JQ997410 297 18 266 7E-100 93% 93% 124108938 Bacteria Cyanobacteria Nostocaceae Nostoc sp. _Pannaria durietzii cyanobiont_ 1 NZ
JQ997411 549 4 543 0 95% 95% 82470910 Bacteria Cyanobacteria Nostocaceae Nostoc sp. PCC 7423
JQ997412 263 5 168 2E-79 100% 100% 154361765 Bacteria Cyanobacteria Nostocaceae Nostoc sp. SKJF2
JQ997413 707 23 75 3E-13 96% 96% 225922035 Bacteria Cyanobacteria Nostocaceae uncultured Nostoc sp.
JQ997414 403 5 357 0 100% 100% 82697090 Bacteria Cyanobacteria n Leptolyngbya sp. 0BB32S02
JQ997415 328 17 44 0.001 100% 100% 172050761 Bacteria Cyanobacteria n Lyngbya birgei
JQ997416 561 3 554 0 95% 95% 12004672 Bacteria Cyanobacteria n Microcoleus acremanii
JQ997417 497 17 456 0 99% 99% 149364155 Bacteria Cyanobacteria n Microcoleus sp. HTT-U-KK5
JQ997418 416 5 343 2E-176 100% 100% 149364160 Bacteria Cyanobacteria n Microcoleus sp. SAG 2212
JQ997419 538 5 505 0 91% 91% 19879913 Bacteria Cyanobacteria n Microcoleus steenstrupii
JQ997453 442 3 234 6E-117 100% 100% 33327320 Bacteria Cyanobacteria n Oscillatoria amoena
JQ997454 532 5 475 0 94% 94% 291603789 Bacteria Cyanobacteria n Oscillatoria margaritifera

Table S1 Cont.

JQ997455 540 2 452 0 99% 99% 89242016 Bacteria Cyanobacteria n Oscillatoria prolifera JQ997456 489 15 377 6E-172 97% 97% 161723071 Bacteria Cyanobacteria n Oscillatoria sp. 195-A20 JQ997457 479 24 445 0 96% 96% 222876471 Bacteria Cyanobacteria n Oscillatoria sp. 327/2 JQ997458 238 2 184 7E-79 96% 96% 15428333 Bacteria Cyanobacteria n Oscillatoria sp. Ant-G16 JQ997459 285 15 205 2E-94 100% 100% 23978201 Bacteria Cyanobacteria n Oscillatoria sp. PCC 7112 JQ997460 558 17 525 0 98% 98% 281308415 Bacteria Cyanobacteria n Oscillatoriales cyanobacterium 2Dp86E JQ997461 351 18 317 8E-155 100% 100% 37782175 Bacteria Cyanobacteria n Oscillatoriales cyanobacterium IL-1.4 JQ997462 517 7 478 1E-178 91% 91% 149364150 Bacteria Cyanobacteria n Phormidiaceae cyanobacterium CPER-KK1 JQ997469 453 17 334 4E-34 77% 77% 124491646 Bacteria Cyanobacteria n Phormidium autumnale JQ997474 564 5 558 0 99% 99% 166997748 Bacteria Cyanobacteria n Phormidium autumnale JQ997476 585 5 583 0 92% 92% 166997749 Bacteria Cyanobacteria n Phormidium autumnale JQ997472 525 15 481 0 95% 95% 167508106 Bacteria Cyanobacteria n Phormidium autumnale JQ997475 571 10 568 0 97% 97% 167508107 Bacteria Cyanobacteria n Phormidium autumnale JQ997468 447 5 402 2E-177 95% 95% 167508108 Bacteria Cyanobacteria n Phormidium autumnale JQ997466 354 5 303 2E-135 96% 96% 167508118 Bacteria Cyanobacteria n Phormidium autumnale JQ997467 378 22 344 1E-158 98% 98% 167508119 Bacteria Cyanobacteria n Phormidium autumnale JQ997473 555 18 553 0 97% 97% 167508120 Bacteria Cyanobacteria n Phormidium autumnale JQ997471 490 18 434 1E-169 93% 93% 258547383 Bacteria Cyanobacteria n Phormidium autumnale JQ997470 478 18 429 1E-154 91% 91% 258547389 Bacteria Cyanobacteria n Phormidium autumnale JQ997465 328 16 236 6E-91 95% 95% 289976368 Bacteria Cyanobacteria n Phormidium autumnale JQ997477 317 31 285 2E-106 95% 95% 158452035 Bacteria Cyanobacteria n Phormidium corium JQ997479 548 1 545 0 93% 93% 149166808 Bacteria Cyanobacteria n Phormidium 
sp. KU003 JQ997480 269 5 223 3E-108 100% 100% 1668785 Bacteria Cyanobacteria n Phormidium sp. NIVA-CYA 203 JQ997481 527 18 515 0 95% 95% 167508121 Bacteria Cyanobacteria n Phormidium subfuscum JQ997482 245 7 113 1E-37 94% 94% 284159158 Bacteria Cyanobacteria n uncultured Hydrocoleum sp. JQ997483 443 29 397 3E-165 95% 95% 225382331 Bacteria Cyanobacteria n uncultured Oscillatoriales cyanobacterium JQ997484 551 4 546 0 100% 100% 225382352 Bacteria Cyanobacteria n uncultured Oscillatoriales cyanobacterium JQ997485 565 5 554 0 88% 88% 225382365 Bacteria Cyanobacteria n uncultured Oscillatoriales cyanobacterium JQ997486 246 18 214 4E-86 96% 96% 220683485 Bacteria Cyanobacteria n Wilmottia murrayi JQ997489 357 3 280 2E-125 96% 96% 225696175 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997497 571 18 565 0 92% 92% 225696186 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997487 307 15 118 1E-17 84% 84% 225696195 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997494 535 5 402 3E-156 92% 92% 225696229 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997493 519 14 482 1E-115 84% 84% 225696231 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997491 386 19 279 3E-89 90% 90% 225696240 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997488 331 4 286 4E-127 96% 96% 225696241 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997492 427 1 106 9E-31 91% 91% 225696253 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997495 546 5 546 0 93% 93% 225696255 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997490 377 5 92 6E-32 97% 97% 225696261 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. JQ997496 552 5 542 0 94% 94% 225696263 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. 
JQ997498 531 19 467 0 97% 97% 3093968 Bacteria Deferribacteres Mucispirillum schaedleri JQ997499 495 25 419 0 97% 97% 37665094 Bacteria Deinococcus-Thermus Deinococcaceae Deinococcus marmoris JQ997500 332 18 128 9E-25 87% 87% 214091025 Bacteria Firmicutes Anoxybacillus sp. F81 JQ997532 347 1 347 1E-173 99% 99% 15042017 Bacteria Firmicutes Bacillaceae Bacillus JQ997502 568 7 541 0 98% 98% 254826546 Bacteria Firmicutes Bacillaceae Bacillus agaradhaerens JQ997501 546 5 397 0 99% 99% 283486716 Bacteria Firmicutes Bacillaceae Bacillus agaradhaerens JQ997504 535 19 532 0 100% 100% 269994025 Bacteria Firmicutes Bacillaceae Bacillus cereus JQ997503 524 5 521 0 99% 99% 270297792 Bacteria Firmicutes Bacillaceae Bacillus cereus JQ997505 543 5 541 0 99% 99% 294999187 Bacteria Firmicutes Bacillaceae Bacillus cereus JQ997506 341 23 290 6E-131 99% 99% 294769167 Bacteria Firmicutes Bacillaceae Bacillus circulans JQ997507 457 18 381 0 99% 99% 223959348 Bacteria Firmicutes Bacillaceae Bacillus cohnii JQ997508 534 17 498 0 99% 99% 94962022 Bacteria Firmicutes Bacillaceae Bacillus decisifrondis JQ997509 463 15 393 0 99% 99% 209552631 Bacteria Firmicutes Bacillaceae Bacillus halmapalus JQ997511 535 18 522 0 97% 97% 13899044 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997510 248 20 216 4E-81 95% 95% 239812465 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997514 572 4 572 0 96% 96% 254675492 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997512 538 5 533 0 97% 97% 262410334 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997513 556 11 554 0 89% 89% 264667843 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997515 581 18 576 0 94% 94% 264667850 Bacteria Firmicutes Bacillaceae Bacillus horikoshii JQ997516 551 12 547 0 92% 92% 47847537 Bacteria Firmicutes Bacillaceae Bacillus horti JQ997518 356 5 305 2E-90 87% 87% 134290370 Bacteria Firmicutes Bacillaceae Bacillus megaterium JQ997517 270 3 54 4E-17 100% 100% 295790202 Bacteria Firmicutes Bacillaceae 
Bacillus megaterium
JQ997519 431 21 323 1E-148 99% 99% 78038860 Bacteria Firmicutes Bacillaceae Bacillus sp. 7327
JQ997520 556 16 527 0 92% 92% 148357814 Bacteria Firmicutes Bacillaceae Bacillus sp. 8SB
JQ997521 539 4 538 0 91% 91% 56417368 Bacteria Firmicutes Bacillaceae Bacillus sp. BA299
JQ997522 278 5 231 1E-112 100% 100% 289065368 Bacteria Firmicutes Bacillaceae Bacillus sp. CCBAU 05776
JQ997523 540 5 534 0 99% 99% 24415973 Bacteria Firmicutes Bacillaceae Bacillus sp. CPB 7
JQ997524 523 19 523 0 97% 97% 225031498 Bacteria Firmicutes Bacillaceae Bacillus sp. E-163
JQ997525 692 2 253 2E-103 94% 94% 195548081 Bacteria Firmicutes Bacillaceae Bacillus sp. EK-1
JQ997526 379 17 334 2E-161 99% 99% 222350117 Bacteria Firmicutes Bacillaceae Bacillus sp. F2-1
JQ997527 552 19 548 0 97% 97% 281487256 Bacteria Firmicutes Bacillaceae Bacillus sp. H3B7
JQ997528 257 5 203 8E-99 100% 100% 295815415 Bacteria Firmicutes Bacillaceae Bacillus sp. I_B8
JQ997529 377 5 344 1E-148 95% 95% 193795485 Bacteria Firmicutes Bacillaceae Bacillus sp. ISO_02_Chiprana
JQ997530 473 5 242 2E-87 93% 93% 116089632 Bacteria Firmicutes Bacillaceae Bacillus sp. m3-13
JQ997531 541 5 540 0 99% 99% 257751815 Bacteria Firmicutes Bacillaceae Bacillus sp. MB63
JQ997533 551 5 523 0 96% 96% 158562722 Bacteria Firmicutes Bacillaceae Bacillus sp. NP16
JQ997534 337 4 305 5E-132 95% 95% 291196850 Bacteria Firmicutes Bacillaceae Bacillus sp. OU-A7
JQ997535 408 18 273 5E-73 88% 88% 289186779 Bacteria Firmicutes Bacillaceae Bacillus sp. QT14
JQ997536 647 24 205 6E-34 84% 84% 21541801 Bacteria Firmicutes Bacillaceae Bacillus sp. RiMSX30

Table S1 Cont.

JQ997537 531 13 495 0 95% 95% 189307029 Bacteria Firmicutes Bacillaceae Bacillus sp. SL177 JQ997538 569 26 565 0 94% 94% 223048105 Bacteria Firmicutes Bacillaceae Bacillus sp. T2830 JQ997539 307 3 276 7E-120 96% 96% 295322911 Bacteria Firmicutes Bacillaceae Bacillus sp. T47_2010_ JQ997540 553 16 553 0 99% 99% 86279623 Bacteria Firmicutes Bacillaceae Bacillus sp. YIM DKMY117-2 JQ997541 538 16 485 0 93% 93% 143425 Bacteria Firmicutes Bacillaceae Bacillus subtilis JQ997542 542 17 539 0 93% 93% 223972589 Bacteria Firmicutes Bacillaceae Bacillus trypoxylicola JQ997543 585 22 516 0 94% 94% 283486718 Bacteria Firmicutes Bacillaceae Marinococcus sp. 2009 JQ997544 526 5 523 0 98% 98% 283486730 Bacteria Firmicutes Bacillaceae Marinococcus sp. 2046 JQ997545 556 34 554 0 99% 99% 45934548 Bacteria Firmicutes Bacillaceae Marinococcus sp. GSP32 JQ997550 508 4 474 7E-152 88% 88% 157644545 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997552 560 18 555 0 96% 96% 189514178 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997551 546 17 543 0 94% 94% 195971876 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997547 445 17 400 2E-167 95% 95% 238836028 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997549 490 17 211 3E-96 100% 100% 282765781 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997546 401 5 291 7E-146 100% 100% 284428514 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. JQ997553 391 5 358 9E-140 92% 92% 291289395 Bacteria Firmicutes n Alkalilactibacillus ikkense JQ997554 296 24 264 2E-105 96% 96% 220897617 Bacteria Firmicutes n uncultured bacterium JQ997555 412 5 380 1E-168 95% 95% 219857497 Bacteria Firmicutes Paenibacillaceae Paenibacillus granivorans JQ997556 537 18 536 0 95% 95% 154818670 Bacteria Firmicutes Paenibacillaceae Saccharibacillus kuerlensis JQ997557 384 18 327 1E-133 95% 95% 254972685 Bacteria Firmicutes Paenibacillaceae uncultured Paenibacillus sp. 
JQ997558 394 20 360 3E-174 99% 99% 293633232 Bacteria Firmicutes Planococcaceae Planococcus maitriensis JQ997559 546 3 540 0 95% 95% 292386047 Bacteria Firmicutes Planococcaceae Planococcus maritimus JQ997560 337 31 292 1E-128 99% 99% 262410342 Bacteria Firmicutes Planococcaceae Planococcus psychrotoleratus JQ997561 311 4 281 1E-87 90% 90% 24431214 Bacteria Firmicutes Planococcaceae Planococcus sp. 1-1 JQ997562 524 5 413 0 96% 96% 78033696 Bacteria Firmicutes Planococcaceae Planococcus sp. 3059 JQ997563 521 5 454 3E-161 90% 90% 219944562 Bacteria Firmicutes Planococcaceae Planococcus sp. B-2 JQ997564 547 5 532 0 99% 99% 255689464 Bacteria Firmicutes Planococcaceae Planococcus sp. BSw21500 JQ997565 495 17 387 3E-160 95% 95% 291293772 Bacteria Firmicutes Planococcaceae Planococcus sp. enrichment culture clone B2-1 JQ997566 544 18 544 0 99% 99% 291419708 Bacteria Firmicutes Planococcaceae Planococcus sp. JDN JQ997567 602 18 576 0 89% 89% 270048079 Bacteria Firmicutes Planococcaceae Planococcus sp. ljh-25 JQ997568 570 17 546 0 94% 94% 54303743 Bacteria Firmicutes Planococcaceae Planococcus sp. NPO-JL-69 JQ997569 348 5 302 1E-128 95% 95% 289629721 Bacteria Firmicutes Planococcaceae Planococcus sp. S118 JQ997570 552 5 552 0 99% 99% 89257980 Bacteria Firmicutes Planococcaceae Planococcus sp. TSBY-25 JQ997571 547 1 522 0 91% 91% 209981399 Bacteria Firmicutes Planococcaceae Planococcus sp. Zao-A JQ997572 491 12 435 0 99% 99% 240248421 Bacteria Firmicutes Planococcaceae Planomicrobium koreense JQ997573 288 5 244 1E-97 94% 94% 262410321 Bacteria Firmicutes Planococcaceae Planomicrobium okeanokoites JQ997574 558 5 558 0 98% 98% 260100820 Bacteria Firmicutes Planococcaceae Planomicrobium psychrophilum JQ997575 573 21 569 0 95% 95% 256274955 Bacteria Firmicutes Planococcaceae Planomicrobium sp. G-5 JQ997576 565 17 509 0 99% 99% 209917046 Bacteria Firmicutes Planococcaceae Planomicrobium sp. 
ISL-41 JQ997577 295 23 239 1E-102 99% 99% 294992062 Bacteria Firmicutes Planococcaceae Planomicrobium sp. MJ426 JQ997578 559 18 527 0 99% 99% 198250506 Bacteria Firmicutes Planococcaceae Planomicrobium sp. RCML-41 JQ997579 437 4 361 0 99% 99% 270282461 Bacteria Firmicutes Planococcaceae Sporosarcina sp. 4-76 JQ997580 251 5 206 2E-100 100% 100% 257043974 Bacteria Firmicutes Planococcaceae Sporosarcina sp. LI4 JQ997581 571 24 567 0 96% 96% 154757245 Bacteria Firmicutes Planococcaceae uncultured Jeotgalibacillus sp. JQ997582 433 5 372 2E-147 93% 93% 255976610 Bacteria Firmicutes Planococcaceae uncultured Planococcaceae bacterium JQ997583 440 17 282 1E-68 86% 86% 260874923 Bacteria Firmicutes Planococcaceae uncultured Planococcaceae bacterium JQ997588 572 18 569 0 96% 96% 154757011 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. JQ997585 456 5 410 0 98% 98% 161702580 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. JQ997587 538 4 538 0 100% 100% 187711733 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. JQ997586 466 4 411 0 100% 100% 292485796 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. JQ997584 377 17 328 2E-146 97% 97% 292485817 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. JQ997590 550 5 533 0 96% 96% 161702576 Bacteria Firmicutes Planococcaceae uncultured Planomicrobium sp. JQ997589 482 26 436 1E-159 93% 93% 161702610 Bacteria Firmicutes Planococcaceae uncultured Planomicrobium sp. 
JQ997591 348 24 261 6E-96 94% 94% 74052526 Bacteria Firmicutes Sporolactobacillaceae Sinobaca qinghaiensis JQ997592 573 23 518 0 98% 98% 219846053 Bacteria Firmicutes Staphylococcaceae Jeotgalicoccus halotolerans JQ997593 561 21 528 0 96% 96% 290783602 Bacteria Firmicutes Staphylococcaceae Jeotgalicoccus nanhaiensis JQ997594 561 29 558 0 92% 92% 219846054 Bacteria Firmicutes Staphylococcaceae Jeotgalicoccus psychrophilus JQ997595 557 21 539 0 91% 91% 219373967 Bacteria Firmicutes Staphylococcaceae Jeotgalicoccus sp. YD2-57 JQ997596 547 5 543 0 99% 99% 86279588 Bacteria Firmicutes Staphylococcaceae Jeotgalicoccus sp. YIM KMY9-1 JQ997597 536 19 514 0 96% 96% 21586494 Bacteria Firmicutes Staphylococcaceae Macrococcus brunensis JQ997598 374 26 326 1E-127 95% 95% 294769193 Bacteria Firmicutes Staphylococcaceae Macrococcus caseolyticus JQ997599 751 5 291 2E-134 97% 97% 283858007 Bacteria Firmicutes Staphylococcaceae Macrococcus sp. AMGM1 JQ997600 563 14 366 2E-162 96% 96% 194368442 Bacteria Firmicutes Staphylococcaceae Salinicoccus sp. B-WPyS1 JQ997601 487 18 416 0 96% 96% 269313994 Bacteria Firmicutes Staphylococcaceae Staphylococcus arlettae JQ997603 347 5 275 1E-132 99% 99% 219809027 Bacteria Firmicutes Staphylococcaceae Staphylococcus sp. BQN1P-02d JQ997604 553 18 546 0 99% 99% 295443962 Bacteria Firmicutes Staphylococcaceae Staphylococcus sp. NCCP-163 JQ997605 316 5 271 2E-131 99% 99% 242027442 Bacteria Firmicutes Staphylococcaceae Staphylococcus sp. NII-116 JQ997612 614 5 86 4E-31 99% 99% 75753540 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp. JQ997609 503 15 449 0 97% 97% 238835780 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp. JQ997608 471 166 410 2E-122 100% 100% 238835868 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp. JQ997610 540 5 533 0 95% 95% 238836019 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp. 
JQ997611 545 18 541 0 96% 96% 238836032 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp.
JQ997607 349 21 314 4E-128 96% 96% 238836130 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp.
JQ997606 272 18 190 2E-84 100% 100% 259120992 Bacteria Firmicutes Staphylococcaceae uncultured Staphylococcus sp.
JQ997613 707 30 134 3E-37 95% 95% 20385618 Bacteria Firmicutes Carnobacteriaceae Carnobacterium mobile
JQ997614 500 18 441 0 100% 100% 257167997 Bacteria Firmicutes Carnobacteriaceae Carnobacterium sp. 12266/2009
JQ997615 295 14 250 7E-120 100% 100% 49617305 Bacteria Firmicutes Carnobacteriaceae Carnobacterium sp. BM-8
JQ997616 514 24 399 0 99% 99% 284022001 Bacteria Firmicutes Carnobacteriaceae Trichococcus sp. EX-07
JQ997617 283 16 224 2E-89 96% 96% 225031786 Bacteria Firmicutes Carnobacteriaceae uncultured Alkalibacterium sp.

Table S1 Cont.

JQ997619 551 5 477 0 99% 99% 292485813 Bacteria Firmicutes Carnobacteriaceae uncultured Carnobacterium sp. JQ997618 249 17 217 4E-96 99% 99% 295147519 Bacteria Firmicutes Carnobacteriaceae uncultured Carnobacterium sp. JQ997620 483 19 387 2E-88 84% 84% 223048104 Bacteria Firmicutes Vagococcus sp. T4130 JQ997621 508 5 475 0 96% 96% 294438969 Bacteria Firmicutes Lactobacillaceae Lactobacillus acidophilus JQ997622 520 17 518 0 94% 94% 265679044 Bacteria Firmicutes Lactobacillaceae Lactobacillus amylolyticus JQ997625 343 5 195 5E-67 91% 91% 226377546 Bacteria Firmicutes Lactobacillaceae Lactobacillus casei JQ997627 482 5 434 0 100% 100% 292673285 Bacteria Firmicutes Lactobacillaceae Lactobacillus curvatus JQ997634 520 18 475 0 100% 100% 121581899 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997629 383 5 344 5E-177 100% 100% 163954906 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997631 494 24 492 0 93% 93% 237512278 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997635 529 5 527 0 100% 100% 237512288 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997628 350 5 306 5E-152 99% 99% 292673284 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997632 513 5 510 0 100% 100% 292673288 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997633 518 18 515 0 99% 99% 294438970 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997630 480 18 438 0 99% 99% 294938080 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii JQ997636 536 17 530 0 96% 96% 218775050 Bacteria Firmicutes Lactobacillaceae Lactobacillus equicursoris JQ997637 491 23 432 0 100% 100% 295149327 Bacteria Firmicutes Lactobacillaceae Lactobacillus fermentum JQ997652 425 5 392 1E-133 91% 91% 254305415 Bacteria Firmicutes Lactobacillaceae Lactobacillus paracasei JQ997664 557 18 534 0 96% 96% 37496510 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997653 248 4 205 7E-99 100% 100% 
57864919 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997656 487 5 425 0 96% 96% 127905849 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997661 542 20 497 1E-174 90% 90% 285201574 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997657 502 18 442 2E-167 92% 92% 285201674 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997654 359 4 309 8E-140 96% 96% 285201707 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997655 407 18 297 2E-126 96% 96% 285201711 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997662 553 5 505 0 99% 99% 285201754 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997658 538 4 492 0 100% 100% 288812699 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997659 538 23 534 0 100% 100% 290760129 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997660 539 5 539 0 93% 93% 290784161 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997663 555 171 555 0 98% 98% 294714412 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus JQ997665 284 4 240 6E-120 100% 100% 288225761 Bacteria Firmicutes Lactobacillaceae Lactobacillus salivarius JQ997666 541 5 515 0 94% 94% 292385841 Bacteria Firmicutes Lactobacillaceae Lactobacillus salivarius JQ997667 436 5 404 7E-166 94% 94% 15408520 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. B5406 JQ997668 542 18 437 0 98% 98% 55418398 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. BCRC16000 JQ997669 399 3 190 2E-92 100% 100% 285171057 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. oral taxon 461 JQ997670 470 18 397 0 99% 99% 285802972 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. oral taxon 461 JQ997671 371 5 324 2E-150 98% 98% 38327309 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. RA2062 JQ997672 542 18 542 0 92% 92% 38373968 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. 
rennanqilfy2 JQ997673 493 5 447 0 94% 94% 295394120 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. TAB-26 JQ997674 272 5 228 1E-87 93% 93% 188572057 Bacteria Firmicutes Lactobacillaceae Lactobacillus vaginalis JQ997675 525 154 474 1E-80 85% 85% 151384635 Bacteria Firmicutes Lactobacillaceae uncultured Lactobacillus sp. JQ997676 532 18 528 0 94% 94% 295646945 Bacteria Firmicutes Lactobacillaceae uncultured Lactobacillus sp. JQ997678 296 4 239 2E-119 100% 100% 21591599 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997677 276 4 227 5E-111 100% 100% 124244807 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997682 534 18 499 0 99% 99% 206600409 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997683 540 16 537 0 99% 99% 225029381 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997680 400 8 237 2E-77 91% 91% 225029671 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997681 508 18 428 0 100% 100% 254971972 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997679 313 4 271 4E-137 100% 100% 285803105 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997684 541 17 537 0 100% 100% 294828947 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997685 544 5 526 0 100% 100% 295815580 Bacteria Firmicutes Streptococcaceae Lactococcus lactis JQ997696 286 1 286 5E-147 100% 100% 11991762 Bacteria Firmicutes Streptococcaceae Streptococcus mutans JQ997686 342 5 293 1E-88 88% 88% 285177464 Bacteria Firmicutes Streptococcaceae Streptococcus constellatus JQ997691 555 18 550 0 97% 97% 55163273 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997690 543 5 477 0 100% 100% 284451152 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997689 441 4 289 3E-145 100% 100% 285177736 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997688 375 18 354 2E-170 99% 99% 285178071 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997687 283 4 221 1E-107 100% 100% 285178094 
Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997692 556 3 551 0 100% 100% 295002588 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus JQ997693 529 5 256 3E-121 98% 98% 295002589 Bacteria Firmicutes Streptococcaceae Streptococcus gordonii JQ997697 547 19 547 0 100% 100% 295002592 Bacteria Firmicutes Streptococcaceae Streptococcus mutans JQ997699 525 18 480 0 98% 98% 290759894 Bacteria Firmicutes Streptococcaceae Streptococcus oralis JQ997700 531 17 527 0 98% 98% 295002593 Bacteria Firmicutes Streptococcaceae Streptococcus oralis JQ997701 484 24 432 0 98% 98% 285801983 Bacteria Firmicutes Streptococcaceae Streptococcus parasanguinis JQ997702 491 5 453 0 98% 98% 285802105 Bacteria Firmicutes Streptococcaceae Streptococcus parasanguinis JQ997703 530 22 524 0 100% 100% 290759892 Bacteria Firmicutes Streptococcaceae Streptococcus parasanguinis JQ997704 526 17 470 0 100% 100% 290759899 Bacteria Firmicutes Streptococcaceae Streptococcus pneumoniae JQ997707 300 18 260 1E-121 100% 100% 24474984 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997713 538 21 535 0 99% 99% 171191150 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997714 539 15 537 0 99% 99% 208657445 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997711 389 18 344 4E-168 100% 100% 284176962 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997708 325 3 176 8E-85 100% 100% 285194533 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997715 550 5 545 0 98% 98% 285194543 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997705 275 18 244 1E-112 100% 100% 285194556 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997709 326 3 269 3E-128 98% 98% 285194587 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997706 284 51 251 9E-84 95% 95% 285194597 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997712 401 17 351 5E-88 86% 86% 285194615 Bacteria Firmicutes 
Streptococcaceae Streptococcus salivarius JQ997710 337 18 266 8E-105 95% 95% 285194695 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius JQ997716 565 18 563 0 94% 94% 295002596 Bacteria Firmicutes Streptococcaceae Streptococcus salivarius

JQ997719 439 5 120 2E-28 89% 89% 11526815 Bacteria Firmicutes Streptococcaceae Streptococcus sp. ES11 JQ997721 434 15 317 5E-133 95% 95% 285203360 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997725 538 18 421 0 96% 96% 285203400 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997724 506 2 418 0 97% 97% 285203424 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997722 472 4 423 8E-141 89% 89% 285203520 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997720 322 4 270 6E-126 98% 98% 285203550 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997726 564 5 502 6E-123 85% 85% 285203608 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997723 501 4 447 0 96% 96% 285203671 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon C65 JQ997727 333 21 288 2E-130 99% 99% 285206259 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon G59 JQ997740 513 5 440 0 99% 99% 28274377 Bacteria Firmicutes Streptococcaceae Streptococcus vestibularis JQ997739 337 4 271 5E-137 100% 100% 223470134 Bacteria Firmicutes Streptococcaceae Streptococcus vestibularis JQ997738 286 2 247 1E-121 99% 99% 285159297 Bacteria Firmicutes Streptococcaceae Streptococcus vestibularis JQ997741 530 5 402 0 100% 100% 285159329 Bacteria Firmicutes Streptococcaceae Streptococcus vestibularis JQ999504 340 23 280 1E-127 99% 99% 269911992 Bacteria Firmicutes Streptococcaceae uncultured Streptococcaceae bacterium JQ997757 548 17 542 0 97% 97% 15593129 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997753 540 5 539 0 99% 99% 15593133 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997746 400 23 351 3E-159 98% 98% 60501121 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997754 544 61 539 0 100% 100% 60501133 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. 
JQ997743 337 5 243 4E-103 96% 96% 60501533 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997758 550 18 518 0 100% 100% 60501679 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997759 551 4 548 0 99% 99% 60501751 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997742 241 5 187 9E-88 99% 99% 77819579 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997749 482 5 410 0 98% 98% 85813098 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997747 410 17 354 4E-168 99% 99% 110613571 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997761 560 9 560 0 88% 88% 164453546 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997756 547 18 509 0 98% 98% 171467463 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997748 414 5 349 2E-161 97% 97% 189305804 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997745 379 4 328 1E-168 100% 100% 189305923 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997744 341 5 294 3E-139 98% 98% 238914954 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997752 539 17 502 0 100% 100% 254972501 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997750 532 18 530 0 100% 100% 259221069 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997751 534 16 351 2E-157 97% 97% 281332796 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997760 554 4 548 0 96% 96% 295646938 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. JQ997755 544 18 544 0 99% 99% 295646981 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. 
JQ997770 495 19 439 6E-177 94% 94% 148717062 Bacteria Firmicutes n uncultured bacterium JQ997775 545 370 487 1E-49 98% 98% 154185054 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997768 473 15 404 0 98% 98% 154186406 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997762 283 5 251 8E-124 100% 100% 154186415 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997763 310 7 277 2E-110 94% 94% 154187391 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997769 478 62 397 5E-153 96% 96% 154187409 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997767 468 4 407 0 97% 97% 154187417 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997771 532 5 531 0 97% 97% 154188162 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997766 437 6 391 6E-152 93% 93% 154188901 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997774 542 5 538 0 100% 100% 154189837 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997772 540 15 531 2E-172 88% 88% 154191244 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997777 547 5 547 0 99% 99% 154193192 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997764 403 5 352 9E-180 100% 100% 154193211 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997773 540 18 487 0 94% 94% 154195887 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997765 432 34 369 4E-158 97% 97% 154198794 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997776 546 18 540 0 98% 98% 154198888 Bacteria Firmicutes n uncultured Bacilli bacterium JQ997778 385 18 287 2E-136 100% 100% 171336058 Bacteria Firmicutes Clostridiaceae Butyricicoccus pullicaecorum JQ997779 527 15 497 0 95% 95% 166063935 Bacteria Firmicutes Clostridiaceae Clostridiaceae bacterium SK082 JQ997780 583 18 399 1E-159 94% 94% 265678940 Bacteria Firmicutes Clostridiaceae Clostridium nexile JQ997781 338 17 306 3E-104 91% 91% 254841672 Bacteria Firmicutes Clostridiaceae Clostridium perfringens JQ997783 533 19 470 0 100% 100% 283945409 Bacteria Firmicutes Clostridiaceae Clostridium 
perfringens JQ997782 383 13 328 2E-156 99% 99% 294799804 Bacteria Firmicutes Clostridiaceae Clostridium perfringens JQ997784 348 3 260 4E-113 96% 96% 47558861 Bacteria Firmicutes Clostridiaceae Clostridium saccharolyticum JQ997785 465 5 431 0 96% 96% 269854738 Bacteria Firmicutes Clostridiaceae Clostridium sp. 4-2a JQ997786 650 4 76 2E-28 100% 100% 238769132 Bacteria Firmicutes Clostridiaceae Clostridium sp. F-02 JQ997787 258 1 217 1E-106 100% 100% 295646952 Bacteria Firmicutes Clostridiaceae uncultured Clostridium sp. JQ997788 433 24 285 1E-133 100% 100% 91093763 Bacteria Firmicutes Eubacteriaceae Eubacterium tenue JQ997789 491 16 421 2E-162 93% 93% 285162551 Bacteria Firmicutes Lachnospiraceae Lachnospiraceae bacterium oral taxon 107 JQ997790 299 18 256 2E-115 99% 99% 30908820 Bacteria Firmicutes Lachnospiraceae Lachnospiraceae genomosp. C1 JQ997791 322 18 278 9E-119 97% 97% 110555124 Bacteria Firmicutes Lachnospiraceae Robinsoniella peoriensis JQ997793 484 4 412 0 98% 98% 154190788 Bacteria Firmicutes Lachnospiraceae uncultured Lachnospiraceae bacterium JQ997795 547 29 434 3E-165 93% 93% 154191505 Bacteria Firmicutes Lachnospiraceae uncultured Lachnospiraceae bacterium JQ997794 527 6 508 0 95% 95% 154194307 Bacteria Firmicutes Lachnospiraceae uncultured Lachnospiraceae bacterium JQ997792 332 4 254 1E-122 99% 99% 253683955 Bacteria Firmicutes Lachnospiraceae uncultured Lachnospiraceae bacterium JQ997798 422 25 368 7E-156 96% 96% 262223663 Bacteria Firmicutes n Flavonifractor plautii JQ997796 248 16 216 6E-100 100% 100% 294799805 Bacteria Firmicutes n Flavonifractor plautii JQ997797 313 18 265 2E-121 99% 99% 294799812 Bacteria Firmicutes n Flavonifractor plautii JQ997799 380 18 348 1E-157 98% 98% 186915004 Bacteria Firmicutes n uncultured Clostridiales bacterium JQ997800 552 5 548 0 98% 98% 215981601 Bacteria Firmicutes n uncultured Clostridiales bacterium JQ997802 471 15 415 2E-171 94% 94% 293509177 Bacteria Firmicutes Ruminococcaceae Ruminococcus sp. 
316498/08 JQ997803 575 3 96 2E-38 99% 99% 85542635 Bacteria Firmicutes n uncultured Clostridia bacterium JQ997805 462 4 417 0 99% 99% 295315546 Bacteria Firmicutes Erysipelotrichaceae Eubacterium cylindroides JQ997806 543 17 508 0 97% 97% 119371525 Bacteria Firmicutes n Firmicutes bacterium BL80 JQ997807 556 18 444 0 95% 95% 119371526 Bacteria Firmicutes n Firmicutes bacterium EG14 JQ997868 472 22 315 1E-144 99% 99% 20975393 Bacteria Firmicutes n Firmicutes str. C29

JQ998191 364 5 306 4E-118 92% 92% 237970917 Bacteria Firmicutes n uncultured bacterium JQ998874 542 24 541 0 92% 92% 237987786 Bacteria Firmicutes n uncultured bacterium JQ997826 515 5 463 0 97% 97% 118135996 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997821 422 17 378 6E-132 91% 91% 151936484 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997822 428 24 363 2E-146 95% 95% 156121565 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997823 430 5 325 4E-158 98% 98% 197131304 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997835 573 20 550 0 91% 91% 217038553 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997830 538 6 536 0 93% 93% 239835509 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997833 545 18 509 0 99% 99% 260072815 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997831 538 5 239 3E-116 100% 100% 290565065 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997813 327 1 280 2E-95 90% 90% 290565070 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997815 369 18 323 4E-158 100% 100% 291328355 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997816 371 18 331 1E-162 100% 100% 291329218 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997812 326 130 294 4E-68 96% 96% 291329291 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997819 401 18 343 6E-127 93% 93% 291329413 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997808 274 5 212 8E-104 100% 100% 291329502 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997834 547 10 542 0 96% 96% 291329543 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997824 464 18 413 0 100% 100% 291329590 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997829 537 5 537 0 97% 97% 291329777 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997814 341 7 265 8E-130 100% 100% 291329778 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997817 384 18 337 3E-164 100% 100% 291329796 Bacteria Firmicutes n uncultured 
Firmicutes bacterium JQ997825 481 23 305 7E-142 99% 99% 291330079 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997832 538 17 534 0 100% 100% 291330349 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997818 400 5 357 0 100% 100% 291330933 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997810 293 5 248 2E-120 99% 99% 291331108 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997809 286 18 182 7E-80 100% 100% 291331129 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997811 323 5 283 3E-143 100% 100% 291331435 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997827 524 14 506 0 92% 92% 291331838 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997820 416 116 371 1E-128 100% 100% 291332873 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997828 531 14 177 8E-62 95% 95% 291332981 Bacteria Firmicutes n uncultured Firmicutes bacterium JQ997836 420 5 372 4E-153 94% 94% 55418240 Bacteria Firmicutes n uncultured low G+C Gram-positive bacterium JQ997837 514 4 465 0 92% 92% 257480629 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena JQ997840 546 16 404 0 100% 100% 285166114 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena JQ997839 543 5 444 0 99% 99% 285166457 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena JQ997838 528 18 477 0 98% 98% 285166459 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena JQ997841 433 5 374 0 99% 99% 162846335 Bacteria Firmicutes Veillonellaceae uncultured Selenomonas sp. JQ997842 531 29 529 0 98% 98% 209781609 Bacteria Firmicutes Veillonellaceae uncultured Veillonella sp. JQ997843 412 15 357 3E-164 98% 98% 290759910 Bacteria Firmicutes Veillonellaceae Veillonella dispar JQ997844 522 5 494 0 97% 97% 291220264 Bacteria Firmicutes Veillonellaceae Veillonella parvula JQ997845 414 3 365 1E-128 91% 91% 295315569 Bacteria Firmicutes Veillonellaceae Veillonella sp. 
oral clone 13-17 JQ997846 549 5 544 0 99% 99% 62910916 Bacteria Firmicutes Veillonellaceae Veillonella sp. oral clone VeillG4 JQ997849 534 153 533 0 99% 99% 253684045 Bacteria Fusobacteria Fusobacteriaceae uncultured Leptotrichia sp. JQ997848 526 4 522 0 94% 94% 269979883 Bacteria Fusobacteria Fusobacteriaceae uncultured Leptotrichia sp. JQ997850 450 119 418 1E-149 99% 99% 265678963 Bacteria Fusobacteria n Clostridium rectum JQ997851 430 18 358 1E-148 95% 95% 198387333 Bacteria n n bacterium 071021-ONK-SLIME-CHAB2 JQ997852 353 258 315 3E-20 100% 100% 295639957 Bacteria n n bacterium EK-I90 JQ997853 511 9 57 7E-13 98% 98% 294883925 Bacteria n n bacterium enrichment culture clone heteroA1_4W JQ997854 336 24 285 1E-133 100% 100% 294883955 Bacteria n n bacterium enrichment culture clone heteroA75_4W

JQ997855 539 4 248 4E-115 98% 98% 294883970 Bacteria n n bacterium enrichment culture clone heteroB99_4W

JQ997856 301 86 210 2E-40 92% 92% 291419661 Bacteria n n bacterium enrichment culture clone NAP-24 JQ997857 542 5 527 0 97% 97% 291419658 Bacteria n n bacterium enrichment culture clone NAP-40 JQ997858 404 9 366 9E-120 89% 89% 289188106 Bacteria n n bacterium enrichment culture clone SRC_DSC19 JQ997859 511 18 455 0 98% 98% 256665427 Bacteria n n bacterium F3_2009_ JQ997860 578 22 523 0 92% 92% 66932766 Bacteria n n bacterium ic1311 JQ997861 565 42 542 0 98% 98% 66932769 Bacteria n n bacterium ic1337 JQ997862 541 5 540 0 99% 99% 254547235 Bacteria n n bacterium Mlm3 JQ997863 313 25 281 6E-106 94% 94% 219660860 Bacteria n n bacterium N159B.200 JQ997864 491 5 427 5E-178 94% 94% 219660887 Bacteria n n bacterium N159G.614 JQ997865 537 19 467 0 97% 97% 39652444 Bacteria n n bacterium PE03-7A27 JQ997866 485 5 239 3E-115 99% 99% 56790880 Bacteria n n bacterium SN12-19 JQ997867 738 12 302 4E-141 98% 98% 37789225 Bacteria n n extreme arid zone bacterium HX-IE13 JQ997869 332 18 265 3E-124 100% 100% 290782562 Bacteria n n halophilic bacterium NAHalo1 JQ997870 537 23 532 0 100% 100% 258618269 Bacteria n n intestinal bacterium CPA-20A JQ997871 325 49 294 1E-112 97% 97% 209865500 Bacteria n n iron-reducing bacterium enrichment culture clone HN31 JQ997872 639 5 364 9E-162 96% 96% 224796376 Bacteria n n swine fecal bacterium RF2B-Pec19 JQ997873 543 18 540 0 99% 99% 26225064 Bacteria n n swine manure bacterium RT-3A JQ998078 324 18 294 3E-129 97% 97% 2117330 Bacteria n n uncultured bacterium JQ999214 698 9 267 3E-122 98% 98% 3901210 Bacteria n n uncultured bacterium JQ998451 456 4 413 0 95% 95% 14289547 Bacteria n n uncultured bacterium JQ999183 582 168 442 9E-77 88% 88% 14916017 Bacteria n n uncultured bacterium JQ998187 364 92 331 2E-106 96% 96% 18141101 Bacteria n n uncultured bacterium JQ998575 497 5 440 0 94% 94% 18644181 Bacteria n n uncultured bacterium JQ997900 250 5 195 2E-89 98% 98% 18644258 Bacteria n n uncultured bacterium JQ998694 526 5 521 0 99% 99% 18644580 Bacteria n 
n uncultured bacterium JQ998361 419 12 350 6E-152 96% 96% 19170730 Bacteria n n uncultured bacterium JQ997979 288 2 257 7E-95 92% 92% 19170737 Bacteria n n uncultured bacterium JQ999187 595 5 368 2E-178 98% 98% 19908568 Bacteria n n uncultured bacterium

JQ999210 680 5 256 2E-118 98% 98% 21213945 Bacteria n n uncultured bacterium JQ998841 540 23 435 9E-161 92% 92% 22296508 Bacteria n n uncultured bacterium JQ998081 325 18 280 1E-87 90% 90% 25188103 Bacteria n n uncultured bacterium JQ998033 309 20 264 6E-106 96% 96% 32187181 Bacteria n n uncultured bacterium JQ997914 258 3 222 6E-95 96% 96% 38455463 Bacteria n n uncultured bacterium JQ998000 298 24 248 7E-105 98% 98% 40806477 Bacteria n n uncultured bacterium JQ997935 268 51 224 7E-55 90% 90% 45738714 Bacteria n n uncultured bacterium JQ998421 445 4 357 1E-144 94% 94% 50059448 Bacteria n n uncultured bacterium JQ998238 380 5 320 1E-163 100% 100% 50080902 Bacteria n n uncultured bacterium JQ998603 506 24 461 0 97% 97% 50404665 Bacteria n n uncultured bacterium JQ998196 366 5 259 7E-121 98% 98% 50982376 Bacteria n n uncultured bacterium JQ999042 555 18 508 0 93% 93% 54695040 Bacteria n n uncultured bacterium JQ998859 542 23 536 0 98% 98% 54695044 Bacteria n n uncultured bacterium JQ999145 567 18 564 0 89% 89% 55845936 Bacteria n n uncultured bacterium JQ998366 421 20 371 3E-180 99% 99% 56044224 Bacteria n n uncultured bacterium JQ998212 370 5 330 1E-147 96% 96% 57434350 Bacteria n n uncultured bacterium JQ998097 332 18 279 1E-128 99% 99% 60327442 Bacteria n n uncultured bacterium JQ998446 454 131 410 1E-129 97% 97% 60657399 Bacteria n n uncultured bacterium JQ997942 270 4 214 5E-96 97% 97% 60657422 Bacteria n n uncultured bacterium JQ999002 551 18 550 0 97% 97% 61620047 Bacteria n n uncultured bacterium JQ998533 484 5 434 3E-170 92% 92% 61620155 Bacteria n n uncultured bacterium JQ998608 507 18 443 0 97% 97% 62753119 Bacteria n n uncultured bacterium JQ999082 558 18 553 0 99% 99% 62753204 Bacteria n n uncultured bacterium JQ999139 566 5 560 0 99% 99% 62755107 Bacteria n n uncultured bacterium JQ998816 538 17 505 0 95% 95% 66736353 Bacteria n n uncultured bacterium JQ999043 555 17 553 0 96% 96% 66736372 Bacteria n n uncultured bacterium JQ998221 375 5 330 4E-118 92% 
92% 66736401 Bacteria n n uncultured bacterium JQ998651 518 149 511 9E-161 95% 95% 66878800 Bacteria n n uncultured bacterium JQ998761 534 5 521 0 100% 100% 66878873 Bacteria n n uncultured bacterium JQ998960 548 11 543 0 93% 93% 70959309 Bacteria n n uncultured bacterium JQ998087 328 18 281 2E-131 99% 99% 70959311 Bacteria n n uncultured bacterium JQ998842 540 23 495 0 94% 94% 70959336 Bacteria n n uncultured bacterium JQ998875 543 5 540 0 95% 95% 70959358 Bacteria n n uncultured bacterium JQ998646 517 18 473 0 96% 96% 71089293 Bacteria n n uncultured bacterium JQ998429 448 5 381 4E-144 92% 92% 71739182 Bacteria n n uncultured bacterium JQ998860 542 5 532 0 95% 95% 73536369 Bacteria n n uncultured bacterium JQ998391 432 4 328 2E-147 97% 97% 74038672 Bacteria n n uncultured bacterium JQ998207 369 6 317 2E-111 91% 91% 74038684 Bacteria n n uncultured bacterium JQ997899 249 11 169 1E-76 100% 100% 75993096 Bacteria n n uncultured bacterium JQ998422 445 9 378 2E-176 97% 97% 77379338 Bacteria n n uncultured bacterium JQ997984 292 5 260 9E-119 97% 97% 78058230 Bacteria n n uncultured bacterium JQ997997 297 16 260 5E-111 97% 97% 81238650 Bacteria n n uncultured bacterium JQ998433 449 18 392 0 99% 99% 82400218 Bacteria n n uncultured bacterium JQ998183 363 5 331 2E-146 96% 96% 82400224 Bacteria n n uncultured bacterium JQ998248 382 7 333 4E-143 95% 95% 82400229 Bacteria n n uncultured bacterium JQ998395 434 17 311 2E-122 94% 94% 82548202 Bacteria n n uncultured bacterium JQ998239 380 24 348 1E-128 93% 93% 83940142 Bacteria n n uncultured bacterium JQ998664 521 5 498 0 96% 96% 84315971 Bacteria n n uncultured bacterium JQ998801 537 7 487 0 92% 92% 85001804 Bacteria n n uncultured bacterium JQ998004 299 4 254 3E-94 92% 92% 85001806 Bacteria n n uncultured bacterium JQ998762 534 2 533 0 99% 99% 85062530 Bacteria n n uncultured bacterium JQ998109 338 5 308 4E-152 99% 99% 85682833 Bacteria n n uncultured bacterium JQ998583 501 5 436 0 95% 95% 85718236 Bacteria n n uncultured 
bacterium JQ998973 549 5 548 0 95% 95% 85718244 Bacteria n n uncultured bacterium JQ998111 339 18 288 5E-137 100% 100% 87042319 Bacteria n n uncultured bacterium JQ998827 539 18 536 0 98% 98% 90297218 Bacteria n n uncultured bacterium JQ999070 557 1 544 0 92% 92% 90822834 Bacteria n n uncultured bacterium JQ999207 664 5 351 4E-170 98% 98% 90904047 Bacteria n n uncultured bacterium JQ999106 560 5 488 0 97% 97% 92087243 Bacteria n n uncultured bacterium JQ998447 454 5 404 0 99% 99% 92087329 Bacteria n n uncultured bacterium JQ998376 426 88 363 8E-136 99% 99% 92087349 Bacteria n n uncultured bacterium JQ999083 558 5 558 0 95% 95% 92087371 Bacteria n n uncultured bacterium JQ998679 524 24 522 0 96% 96% 92087379 Bacteria n n uncultured bacterium JQ998069 323 18 286 6E-121 96% 96% 92087412 Bacteria n n uncultured bacterium JQ999060 556 18 550 0 99% 99% 92087442 Bacteria n n uncultured bacterium JQ998680 524 16 522 0 98% 98% 92087477 Bacteria n n uncultured bacterium JQ998208 370 5 325 1E-148 97% 97% 92087504 Bacteria n n uncultured bacterium JQ998545 487 5 450 0 99% 99% 92087513 Bacteria n n uncultured bacterium JQ998665 521 18 486 0 99% 99% 92087807 Bacteria n n uncultured bacterium JQ998623 510 18 446 0 100% 100% 98975473 Bacteria n n uncultured bacterium JQ999195 609 107 523 3E-72 81% 81% 99643491 Bacteria n n uncultured bacterium JQ998848 541 19 531 0 94% 94% 102415912 Bacteria n n uncultured bacterium JQ998719 529 18 523 0 94% 94% 102415948 Bacteria n n uncultured bacterium JQ998355 417 18 367 5E-177 99% 99% 108946447 Bacteria n n uncultured bacterium JQ998177 361 18 305 4E-148 100% 100% 108946553 Bacteria n n uncultured bacterium JQ997965 282 4 224 5E-111 100% 100% 109141309 Bacteria n n uncultured bacterium JQ998828 539 4 535 0 96% 96% 109143113 Bacteria n n uncultured bacterium JQ998829 539 18 534 0 91% 91% 109143656 Bacteria n n uncultured bacterium JQ998202 368 3 325 1E-167 100% 100% 109144240 Bacteria n n uncultured bacterium

JQ998489 468 3 435 0 100% 100% 109144608 Bacteria n n uncultured bacterium JQ998050 316 5 268 7E-135 100% 100% 109144972 Bacteria n n uncultured bacterium JQ998326 407 5 374 6E-142 92% 92% 109145027 Bacteria n n uncultured bacterium JQ998913 545 5 528 0 98% 98% 109145060 Bacteria n n uncultured bacterium JQ999084 558 19 558 0 100% 100% 109145272 Bacteria n n uncultured bacterium JQ998334 410 18 378 0 100% 100% 109145452 Bacteria n n uncultured bacterium JQ998472 463 5 430 0 100% 100% 109145521 Bacteria n n uncultured bacterium JQ998104 336 18 268 1E-127 100% 100% 109145993 Bacteria n n uncultured bacterium JQ998327 407 5 301 4E-153 100% 100% 109146142 Bacteria n n uncultured bacterium JQ998861 542 5 513 0 100% 100% 109146225 Bacteria n n uncultured bacterium JQ998894 544 263 542 2E-138 99% 99% 109146231 Bacteria n n uncultured bacterium JQ998476 464 6 401 0 99% 99% 109146267 Bacteria n n uncultured bacterium JQ998222 375 4 340 2E-175 100% 100% 109146295 Bacteria n n uncultured bacterium JQ998802 537 18 491 0 100% 100% 109146793 Bacteria n n uncultured bacterium JQ999044 555 43 553 0 100% 100% 109146801 Bacteria n n uncultured bacterium JQ998107 337 81 298 3E-109 100% 100% 109147422 Bacteria n n uncultured bacterium JQ998961 548 11 548 0 99% 99% 109147514 Bacteria n n uncultured bacterium JQ998522 480 40 434 0 100% 100% 109147647 Bacteria n n uncultured bacterium JQ998803 537 5 517 9E-171 89% 89% 109147713 Bacteria n n uncultured bacterium JQ997906 253 18 223 4E-101 100% 100% 109147753 Bacteria n n uncultured bacterium JQ998079 325 33 292 3E-129 99% 99% 109147859 Bacteria n n uncultured bacterium JQ998502 475 5 121 5E-34 91% 91% 110188400 Bacteria n n uncultured bacterium JQ998137 349 5 312 3E-144 97% 97% 110434048 Bacteria n n uncultured bacterium JQ999085 558 1 553 0 98% 98% 110434224 Bacteria n n uncultured bacterium JQ998876 543 5 458 0 96% 96% 110434340 Bacteria n n uncultured bacterium JQ997923 263 5 228 2E-94 95% 95% 110434402 Bacteria n n uncultured 
bacterium JQ998928 546 23 538 0 95% 95% 110434494 Bacteria n n uncultured bacterium JQ998425 446 18 388 1E-169 96% 96% 110434584 Bacteria n n uncultured bacterium JQ998944 547 18 542 0 97% 97% 110434604 Bacteria n n uncultured bacterium JQ998405 438 5 364 1E-129 90% 90% 110434708 Bacteria n n uncultured bacterium JQ998184 363 5 297 1E-112 92% 92% 110435398 Bacteria n n uncultured bacterium JQ998152 352 5 300 1E-152 100% 100% 110435586 Bacteria n n uncultured bacterium JQ998070 323 5 237 6E-111 98% 98% 110435722 Bacteria n n uncultured bacterium JQ998204 369 18 296 9E-120 95% 95% 110436204 Bacteria n n uncultured bacterium JQ998720 529 19 463 0 96% 96% 110436291 Bacteria n n uncultured bacterium JQ998217 373 5 330 7E-126 92% 92% 110437056 Bacteria n n uncultured bacterium JQ999199 617 17 503 0 92% 92% 110438165 Bacteria n n uncultured bacterium JQ997924 263 23 215 2E-90 98% 98% 110438192 Bacteria n n uncultured bacterium JQ998763 534 7 527 0 89% 89% 110438225 Bacteria n n uncultured bacterium JQ998037 311 15 269 3E-83 89% 89% 110438326 Bacteria n n uncultured bacterium JQ998038 311 12 275 7E-115 95% 95% 110438379 Bacteria n n uncultured bacterium JQ998205 369 23 317 2E-121 94% 94% 110438381 Bacteria n n uncultured bacterium JQ998551 490 14 295 1E-124 96% 96% 110438438 Bacteria n n uncultured bacterium JQ998316 405 5 360 0 100% 100% 110438638 Bacteria n n uncultured bacterium JQ998467 461 72 413 1E-149 95% 95% 110438856 Bacteria n n uncultured bacterium JQ997947 272 18 228 2E-95 97% 97% 110439795 Bacteria n n uncultured bacterium JQ998688 525 5 478 0 94% 94% 110439901 Bacteria n n uncultured bacterium JQ998295 398 5 366 9E-145 93% 93% 110440019 Bacteria n n uncultured bacterium JQ999112 561 18 550 0 94% 94% 110440098 Bacteria n n uncultured bacterium JQ998275 392 18 347 2E-166 99% 99% 110440160 Bacteria n n uncultured bacterium JQ998473 463 16 413 1E-163 93% 93% 110440247 Bacteria n n uncultured bacterium JQ998601 505 18 468 0 96% 96% 110440335 Bacteria n n 
uncultured bacterium JQ998673 523 18 523 0 95% 95% 110440402 Bacteria n n uncultured bacterium JQ998359 419 18 364 7E-161 97% 97% 110440797 Bacteria n n uncultured bacterium JQ998661 520 18 470 4E-169 91% 91% 110440804 Bacteria n n uncultured bacterium JQ997976 287 5 223 7E-110 100% 100% 110441643 Bacteria n n uncultured bacterium JQ998143 350 5 308 1E-123 93% 93% 110442289 Bacteria n n uncultured bacterium JQ998558 493 18 446 0 97% 97% 110443809 Bacteria n n uncultured bacterium JQ999173 576 18 569 0 96% 96% 110444022 Bacteria n n uncultured bacterium JQ998468 461 24 403 3E-170 96% 96% 110444135 Bacteria n n uncultured bacterium JQ998945 547 22 544 0 99% 99% 110444775 Bacteria n n uncultured bacterium JQ998576 497 21 440 0 96% 96% 110444791 Bacteria n n uncultured bacterium JQ998849 541 1 527 0 95% 95% 110444825 Bacteria n n uncultured bacterium JQ998509 477 4 416 0 100% 100% 110445101 Bacteria n n uncultured bacterium JQ997888 244 18 200 1E-71 94% 94% 110445140 Bacteria n n uncultured bacterium JQ998298 399 18 357 1E-148 95% 95% 110445294 Bacteria n n uncultured bacterium JQ998946 547 5 543 0 91% 91% 110445618 Bacteria n n uncultured bacterium JQ997939 269 14 232 2E-90 95% 95% 110447114 Bacteria n n uncultured bacterium JQ998516 479 16 447 0 94% 94% 110447803 Bacteria n n uncultured bacterium JQ998564 494 18 437 0 94% 94% 110448781 Bacteria n n uncultured bacterium JQ997940 269 5 209 4E-82 94% 94% 110449223 Bacteria n n uncultured bacterium JQ998830 539 19 535 0 94% 94% 110449534 Bacteria n n uncultured bacterium JQ998993 550 18 529 0 93% 93% 110449796 Bacteria n n uncultured bacterium JQ998469 461 24 390 6E-157 94% 94% 110450561 Bacteria n n uncultured bacterium JQ998843 540 7 537 0 90% 90% 110450644 Bacteria n n uncultured bacterium JQ998974 549 5 538 0 96% 96% 110450647 Bacteria n n uncultured bacterium JQ998460 458 4 424 3E-170 93% 93% 110450683 Bacteria n n uncultured bacterium JQ998652 518 26 476 0 98% 98% 110450754 Bacteria n n uncultured bacterium 
JQ998130 347 4 301 6E-126 94% 94% 110451179 Bacteria n n uncultured bacterium

JQ998831 539 17 507 0 92% 92% 110832477 Bacteria n n uncultured bacterium JQ998779 535 17 490 0 99% 99% 110832479 Bacteria n n uncultured bacterium JQ999137 565 20 562 0 96% 96% 110832499 Bacteria n n uncultured bacterium JQ997886 242 5 219 9E-103 99% 99% 110832569 Bacteria n n uncultured bacterium JQ998462 459 17 325 2E-147 98% 98% 112819327 Bacteria n n uncultured bacterium JQ997909 255 1 194 5E-96 100% 100% 113870203 Bacteria n n uncultured bacterium JQ998877 543 14 540 0 100% 100% 113870743 Bacteria n n uncultured bacterium JQ998947 547 5 541 0 95% 95% 116876085 Bacteria n n uncultured bacterium JQ998080 325 1 274 3E-119 96% 96% 116876101 Bacteria n n uncultured bacterium JQ997938 268 1 230 5E-86 92% 92% 117969615 Bacteria n n uncultured bacterium JQ998862 542 18 538 0 95% 95% 118406565 Bacteria n n uncultured bacterium JQ998709 528 74 525 7E-167 91% 91% 119436067 Bacteria n n uncultured bacterium JQ999045 555 16 549 0 91% 91% 119436142 Bacteria n n uncultured bacterium JQ998356 417 24 375 1E-128 91% 91% 119436366 Bacteria n n uncultured bacterium JQ998863 542 5 538 0 92% 92% 119437478 Bacteria n n uncultured bacterium JQ999046 555 5 551 0 93% 93% 119437735 Bacteria n n uncultured bacterium JQ998523 480 4 426 0 98% 98% 124303494 Bacteria n n uncultured bacterium JQ999146 567 5 565 0 100% 100% 124303511 Bacteria n n uncultured bacterium JQ998406 438 5 378 2E-167 95% 95% 125660699 Bacteria n n uncultured bacterium JQ999057 555 18 520 0 98% 98% 125715968 Bacteria n n uncultured bacterium JQ997980 289 18 243 1E-92 94% 94% 125742383 Bacteria n n uncultured bacterium JQ999061 556 3 552 0 99% 99% 126111464 Bacteria n n uncultured bacterium JQ998131 347 25 295 2E-115 95% 95% 126113031 Bacteria n n uncultured bacterium JQ998895 544 3 541 0 100% 100% 126113813 Bacteria n n uncultured bacterium JQ998483 467 18 438 0 97% 97% 126113939 Bacteria n n uncultured bacterium JQ999138 565 18 565 0 97% 97% 126114009 Bacteria n n uncultured bacterium JQ998218 373 93 327 2E-105 97% 
97% 126114126 Bacteria n n uncultured bacterium JQ998764 534 17 532 0 98% 98% 126115468 Bacteria n n uncultured bacterium JQ998994 550 4 540 0 99% 99% 126115792 Bacteria n n uncultured bacterium JQ998962 548 1 528 0 92% 92% 126362330 Bacteria n n uncultured bacterium JQ998765 534 5 529 0 96% 96% 126674407 Bacteria n n uncultured bacterium JQ998463 459 112 405 8E-146 99% 99% 134021526 Bacteria n n uncultured bacterium JQ999113 561 18 548 0 97% 97% 134140995 Bacteria n n uncultured bacterium JQ998025 307 73 262 4E-92 99% 99% 138239976 Bacteria n n uncultured bacterium JQ998056 318 10 162 8E-70 99% 99% 138240043 Bacteria n n uncultured bacterium JQ997896 248 5 211 2E-99 99% 99% 138240346 Bacteria n n uncultured bacterium JQ998701 527 18 513 0 96% 96% 138240383 Bacteria n n uncultured bacterium JQ998172 359 4 324 2E-166 100% 100% 140326239 Bacteria n n uncultured bacterium JQ998185 363 5 330 1E-153 97% 97% 145284746 Bacteria n n uncultured bacterium JQ998039 311 17 259 6E-116 98% 98% 145285090 Bacteria n n uncultured bacterium JQ999071 557 34 555 0 98% 98% 145286222 Bacteria n n uncultured bacterium JQ998873 542 5 538 0 100% 100% 145582288 Bacteria n n uncultured bacterium JQ999105 559 17 532 0 97% 97% 145582336 Bacteria n n uncultured bacterium JQ998579 497 24 447 3E-145 89% 89% 145583005 Bacteria n n uncultured bacterium JQ998959 547 17 466 0 99% 99% 145583249 Bacteria n n uncultured bacterium JQ998195 365 23 314 4E-83 88% 88% 145584302 Bacteria n n uncultured bacterium JQ998541 485 3 418 6E-177 94% 94% 145584307 Bacteria n n uncultured bacterium JQ997901 250 5 205 3E-83 95% 95% 146165027 Bacteria n n uncultured bacterium JQ999072 557 5 551 0 99% 99% 146575767 Bacteria n n uncultured bacterium JQ999163 573 18 566 0 93% 93% 148248671 Bacteria n n uncultured bacterium JQ998347 414 20 368 7E-166 97% 97% 148472131 Bacteria n n uncultured bacterium JQ998100 334 5 302 1E-123 94% 94% 148730991 Bacteria n n uncultured bacterium JQ998243 381 62 335 4E-133 99% 99% 148731710 
Bacteria n n uncultured bacterium JQ999073 557 5 554 0 99% 99% 148828955 Bacteria n n uncultured bacterium JQ998695 526 18 523 0 95% 95% 149350589 Bacteria n n uncultured bacterium JQ999179 579 17 559 0 92% 92% 152926424 Bacteria n n uncultured bacterium JQ998736 531 20 450 0 94% 94% 152926426 Bacteria n n uncultured bacterium JQ998721 529 18 523 0 99% 99% 155968453 Bacteria n n uncultured bacterium JQ998817 538 5 538 0 97% 97% 156505941 Bacteria n n uncultured bacterium JQ998790 535 16 527 0 98% 98% 156522978 Bacteria n n uncultured bacterium JQ997910 255 4 188 6E-80 96% 96% 157498485 Bacteria n n uncultured bacterium JQ999023 553 6 533 0 89% 89% 157499269 Bacteria n n uncultured bacterium JQ998053 317 5 263 4E-132 100% 100% 157500126 Bacteria n n uncultured bacterium JQ998088 328 5 251 2E-110 96% 96% 157649100 Bacteria n n uncultured bacterium JQ999030 553 5 514 0 92% 92% 157690469 Bacteria n n uncultured bacterium JQ998538 484 5 436 0 96% 96% 157690565 Bacteria n n uncultured bacterium JQ998975 549 18 531 0 98% 98% 157740581 Bacteria n n uncultured bacterium JQ998573 496 5 442 0 95% 95% 157926675 Bacteria n n uncultured bacterium JQ997876 228 5 165 2E-60 94% 94% 157926821 Bacteria n n uncultured bacterium JQ998127 346 19 318 8E-155 100% 100% 157927262 Bacteria n n uncultured bacterium JQ998174 360 16 85 4E-23 97% 97% 157927557 Bacteria n n uncultured bacterium JQ997933 267 164 222 5E-21 100% 100% 157927616 Bacteria n n uncultured bacterium JQ998427 446 33 400 5E-103 86% 86% 158148402 Bacteria n n uncultured bacterium JQ998367 422 14 391 2E-176 97% 97% 158148421 Bacteria n n uncultured bacterium JQ998430 448 15 359 5E-163 97% 97% 158442466 Bacteria n n uncultured bacterium JQ998201 367 24 324 2E-135 96% 96% 159885105 Bacteria n n uncultured bacterium JQ998464 459 17 377 0 99% 99% 160280259 Bacteria n n uncultured bacterium JQ998687 524 5 472 0 94% 94% 160332417 Bacteria n n uncultured bacterium JQ997892 246 5 200 3E-97 100% 100% 160714403 Bacteria n n uncultured 
bacterium

JQ998737 531 5 387 1E-154 93% 93% 160922872 Bacteria n n uncultured bacterium JQ998271 390 5 354 4E-64 82% 82% 160922944 Bacteria n n uncultured bacterium JQ998609 507 5 444 0 98% 98% 160922945 Bacteria n n uncultured bacterium JQ999160 572 17 545 0 98% 98% 161876636 Bacteria n n uncultured bacterium JQ998386 431 19 386 3E-150 93% 93% 162950760 Bacteria n n uncultured bacterium JQ998272 390 17 305 9E-145 99% 99% 164460317 Bacteria n n uncultured bacterium JQ998110 338 5 306 3E-139 97% 97% 164509981 Bacteria n n uncultured bacterium JQ998635 515 16 73 3E-16 97% 97% 164521760 Bacteria n n uncultured bacterium JQ998584 501 136 435 3E-150 99% 99% 164523483 Bacteria n n uncultured bacterium JQ998896 544 52 527 0 96% 96% 164609939 Bacteria n n uncultured bacterium JQ998165 357 5 315 6E-151 98% 98% 164610216 Bacteria n n uncultured bacterium JQ998878 543 11 533 0 94% 94% 165968461 Bacteria n n uncultured bacterium JQ998963 548 4 544 0 99% 99% 166202170 Bacteria n n uncultured bacterium JQ998596 504 18 442 0 100% 100% 168997732 Bacteria n n uncultured bacterium JQ998491 469 5 424 0 95% 95% 168997829 Bacteria n n uncultured bacterium JQ998092 329 5 297 6E-146 99% 99% 169127868 Bacteria n n uncultured bacterium JQ999097 559 5 533 0 93% 93% 169129792 Bacteria n n uncultured bacterium JQ998181 362 5 326 4E-148 97% 97% 169130297 Bacteria n n uncultured bacterium JQ998546 487 4 441 1E-154 90% 90% 169130400 Bacteria n n uncultured bacterium JQ998392 433 8 381 7E-156 94% 94% 169131607 Bacteria n n uncultured bacterium JQ999086 558 18 555 0 100% 100% 169132460 Bacteria n n uncultured bacterium JQ998610 507 18 470 0 97% 97% 169132729 Bacteria n n uncultured bacterium JQ998702 527 15 516 0 93% 93% 169132832 Bacteria n n uncultured bacterium JQ998034 309 3 266 3E-119 97% 97% 169132904 Bacteria n n uncultured bacterium JQ998223 375 5 311 5E-152 99% 99% 169133475 Bacteria n n uncultured bacterium JQ998436 450 200 413 6E-107 100% 100% 169134448 Bacteria n n uncultured bacterium JQ998428 
447 5 384 5E-173 96% 96% 169144791 Bacteria n n uncultured bacterium JQ997949 273 46 221 2E-79 98% 98% 169265423 Bacteria n n uncultured bacterium JQ998662 520 4 478 0 91% 91% 169266227 Bacteria n n uncultured bacterium JQ998879 543 5 519 0 97% 97% 169267043 Bacteria n n uncultured bacterium JQ998995 550 61 547 2E-177 90% 90% 169268927 Bacteria n n uncultured bacterium JQ997919 261 18 122 1E-16 84% 84% 169269241 Bacteria n n uncultured bacterium JQ998071 323 24 271 4E-117 98% 98% 169270538 Bacteria n n uncultured bacterium JQ999003 551 20 542 0 94% 94% 169273928 Bacteria n n uncultured bacterium JQ998703 527 2 490 5E-178 90% 90% 169273933 Bacteria n n uncultured bacterium JQ998383 429 21 386 0 98% 98% 169273949 Bacteria n n uncultured bacterium JQ999114 561 27 561 0 98% 98% 169273981 Bacteria n n uncultured bacterium JQ998559 493 5 434 0 94% 94% 169274045 Bacteria n n uncultured bacterium JQ998897 544 18 540 0 97% 97% 169274573 Bacteria n n uncultured bacterium JQ999134 564 5 559 0 99% 99% 169274878 Bacteria n n uncultured bacterium JQ998722 529 16 519 5E-178 90% 90% 169275081 Bacteria n n uncultured bacterium JQ998780 535 15 432 0 96% 96% 169275203 Bacteria n n uncultured bacterium JQ998319 406 1 343 5E-167 98% 98% 169275222 Bacteria n n uncultured bacterium JQ998471 462 16 425 0 98% 98% 169275300 Bacteria n n uncultured bacterium JQ999196 612 18 602 0 90% 90% 169275355 Bacteria n n uncultured bacterium JQ998898 544 4 535 0 95% 95% 169275415 Bacteria n n uncultured bacterium JQ998128 346 5 296 1E-132 97% 97% 169275467 Bacteria n n uncultured bacterium JQ999125 563 26 510 0 94% 94% 169275696 Bacteria n n uncultured bacterium JQ998118 341 24 310 1E-78 87% 87% 169275865 Bacteria n n uncultured bacterium JQ998880 543 5 540 0 94% 94% 169276279 Bacteria n n uncultured bacterium JQ998738 531 5 501 0 95% 95% 169276317 Bacteria n n uncultured bacterium JQ999909 532 243 528 2E-138 98% 98% 169277275 Bacteria n n uncultured bacterium JQ998095 331 16 299 2E-121 95% 95% 
169278085 Bacteria n n uncultured bacterium JQ998470 461 18 402 1E-159 94% 94% 169278764 Bacteria n n uncultured bacterium JQ999216 730 5 237 4E-51 84% 84% 169278944 Bacteria n n uncultured bacterium JQ998266 388 87 352 7E-116 96% 96% 169280166 Bacteria n n uncultured bacterium JQ999062 556 12 551 0 90% 90% 169280905 Bacteria n n uncultured bacterium JQ997911 255 7 223 4E-97 97% 97% 169280992 Bacteria n n uncultured bacterium JQ998490 468 18 419 0 99% 99% 169281198 Bacteria n n uncultured bacterium JQ998257 385 17 348 1E-74 84% 84% 169281437 Bacteria n n uncultured bacterium JQ998006 300 5 247 1E-121 100% 100% 169281727 Bacteria n n uncultured bacterium JQ998517 479 17 376 4E-164 96% 96% 169281966 Bacteria n n uncultured bacterium JQ998340 412 17 298 2E-131 97% 97% 169282044 Bacteria n n uncultured bacterium JQ999126 563 6 534 0 91% 91% 169282669 Bacteria n n uncultured bacterium JQ998948 547 17 544 0 96% 96% 169283420 Bacteria n n uncultured bacterium JQ998351 416 16 348 2E-166 99% 99% 169283464 Bacteria n n uncultured bacterium JQ998527 482 5 445 5E-143 88% 88% 169283814 Bacteria n n uncultured bacterium JQ999087 558 16 529 0 95% 95% 169283893 Bacteria n n uncultured bacterium JQ998227 377 5 336 3E-139 94% 94% 169289574 Bacteria n n uncultured bacterium JQ998273 390 5 337 9E-130 92% 92% 169290095 Bacteria n n uncultured bacterium JQ998914 545 15 543 0 90% 90% 169290564 Bacteria n n uncultured bacterium JQ998102 335 5 282 6E-121 95% 95% 169290664 Bacteria n n uncultured bacterium JQ998539 485 18 434 0 95% 95% 169290715 Bacteria n n uncultured bacterium JQ998320 406 15 376 0 99% 99% 169291056 Bacteria n n uncultured bacterium JQ998437 450 18 426 0 97% 97% 169291284 Bacteria n n uncultured bacterium JQ998138 349 18 283 4E-113 95% 95% 169797979 Bacteria n n uncultured bacterium JQ999032 554 17 553 0 99% 99% 170652676 Bacteria n n uncultured bacterium JQ998804 537 4 528 0 97% 97% 170652677 Bacteria n n uncultured bacterium JQ998084 327 18 296 3E-113 94% 94% 171262582 
Bacteria n n uncultured bacterium

JQ997903 252 30 215 3E-83 97% 97% 171262599 Bacteria n n uncultured bacterium JQ998704 527 5 470 0 94% 94% 171467457 Bacteria n n uncultured bacterium JQ998147 351 6 212 9E-70 91% 91% 187424332 Bacteria n n uncultured bacterium JQ998625 511 4 395 0 98% 98% 187438950 Bacteria n n uncultured bacterium JQ998317 405 18 318 1E-153 100% 100% 187736691 Bacteria n n uncultured bacterium JQ998480 466 18 405 0 99% 99% 187963663 Bacteria n n uncultured bacterium JQ999140 566 20 546 0 93% 93% 187964336 Bacteria n n uncultured bacterium JQ998766 534 5 475 0 96% 96% 187965718 Bacteria n n uncultured bacterium JQ998554 491 14 363 3E-180 100% 100% 187967718 Bacteria n n uncultured bacterium JQ998035 310 18 276 7E-120 97% 97% 187968607 Bacteria n n uncultured bacterium JQ997934 267 18 198 8E-79 96% 96% 187968619 Bacteria n n uncultured bacterium JQ998234 379 5 322 8E-165 100% 100% 188530112 Bacteria n n uncultured bacterium JQ998276 392 24 347 7E-131 93% 93% 189309776 Bacteria n n uncultured bacterium JQ997993 296 5 263 9E-119 97% 97% 190364260 Bacteria n n uncultured bacterium JQ998007 300 15 253 3E-94 93% 93% 190705415 Bacteria n n uncultured bacterium JQ998134 348 5 316 2E-131 94% 94% 190707473 Bacteria n n uncultured bacterium JQ998616 508 5 432 4E-159 91% 91% 192966107 Bacteria n n uncultured bacterium JQ999088 558 16 552 3E-161 87% 87% 192966414 Bacteria n n uncultured bacterium JQ998950 547 5 542 0 98% 98% 192967750 Bacteria n n uncultured bacterium JQ998899 544 21 529 0 97% 97% 192968449 Bacteria n n uncultured bacterium JQ998767 534 23 503 0 100% 100% 192970868 Bacteria n n uncultured bacterium JQ998418 444 15 377 1E-164 96% 96% 192972691 Bacteria n n uncultured bacterium JQ998074 324 6 231 3E-84 92% 92% 192974165 Bacteria n n uncultured bacterium JQ998577 497 22 227 9E-101 100% 100% 192975291 Bacteria n n uncultured bacterium JQ998412 440 5 329 1E-138 95% 95% 192975946 Bacteria n n uncultured bacterium JQ999089 558 22 487 0 92% 92% 192975965 Bacteria n n uncultured 
bacterium JQ998915 545 18 541 0 91% 91% 192976013 Bacteria n n uncultured bacterium JQ998313 404 23 108 5E-28 95% 95% 192976059 Bacteria n n uncultured bacterium JQ998832 539 54 497 7E-172 92% 92% 192976124 Bacteria n n uncultured bacterium JQ998368 423 102 315 3E-100 98% 98% 192976149 Bacteria n n uncultured bacterium JQ998754 533 5 485 0 93% 93% 192976156 Bacteria n n uncultured bacterium JQ998162 356 4 324 3E-119 91% 91% 192976166 Bacteria n n uncultured bacterium JQ998108 337 18 302 3E-143 99% 99% 192976207 Bacteria n n uncultured bacterium JQ998377 427 5 357 3E-154 95% 95% 192976228 Bacteria n n uncultured bacterium JQ998647 517 8 445 3E-151 90% 90% 192976293 Bacteria n n uncultured bacterium JQ997981 289 18 246 5E-106 97% 97% 192976431 Bacteria n n uncultured bacterium JQ998755 533 24 530 0 95% 95% 192976582 Bacteria n n uncultured bacterium JQ998929 546 5 519 0 94% 94% 192976655 Bacteria n n uncultured bacterium JQ997982 290 9 239 3E-108 98% 98% 192976674 Bacteria n n uncultured bacterium JQ998739 531 15 526 0 99% 99% 192976754 Bacteria n n uncultured bacterium JQ999024 553 18 437 0 96% 96% 192976789 Bacteria n n uncultured bacterium JQ998674 523 15 523 0 97% 97% 192976806 Bacteria n n uncultured bacterium JQ998976 549 15 546 0 100% 100% 192976819 Bacteria n n uncultured bacterium JQ998534 484 18 444 0 94% 94% 192977859 Bacteria n n uncultured bacterium JQ998360 419 18 376 0 99% 99% 192978967 Bacteria n n uncultured bacterium JQ998314 404 1 347 9E-160 96% 96% 192978977 Bacteria n n uncultured bacterium JQ999033 554 18 548 0 96% 96% 192979708 Bacteria n n uncultured bacterium JQ998114 340 23 294 4E-128 98% 98% 192979745 Bacteria n n uncultured bacterium JQ998619 509 18 487 0 92% 92% 192979771 Bacteria n n uncultured bacterium JQ998964 548 5 544 0 91% 91% 192979804 Bacteria n n uncultured bacterium JQ999176 578 5 534 0 97% 97% 192979942 Bacteria n n uncultured bacterium JQ998792 536 18 470 0 96% 96% 192979948 Bacteria n n uncultured bacterium JQ998371 424 5 
375 3E-169 96% 96% 192979964 Bacteria n n uncultured bacterium JQ999063 556 4 535 0 95% 95% 192979974 Bacteria n n uncultured bacterium JQ998793 536 18 532 2E-157 87% 87% 192979984 Bacteria n n uncultured bacterium JQ998175 360 19 309 1E-147 100% 100% 192980004 Bacteria n n uncultured bacterium JQ998139 349 23 293 2E-115 95% 95% 192980549 Bacteria n n uncultured bacterium JQ997881 240 28 196 6E-60 92% 92% 192980604 Bacteria n n uncultured bacterium JQ998727 530 5 492 0 97% 97% 192980815 Bacteria n n uncultured bacterium JQ998781 535 1 524 0 94% 94% 192980843 Bacteria n n uncultured bacterium JQ998407 438 9 383 2E-176 97% 97% 192980848 Bacteria n n uncultured bacterium JQ998157 353 5 322 1E-143 96% 96% 192980858 Bacteria n n uncultured bacterium JQ998085 327 17 296 1E-113 94% 94% 192980871 Bacteria n n uncultured bacterium JQ998115 340 5 256 4E-108 95% 95% 192980926 Bacteria n n uncultured bacterium JQ997944 271 23 239 4E-77 91% 91% 192981069 Bacteria n n uncultured bacterium JQ998597 504 18 446 9E-131 87% 87% 192981149 Bacteria n n uncultured bacterium JQ998666 521 15 515 0 94% 94% 192981198 Bacteria n n uncultured bacterium JQ998178 361 3 243 3E-114 98% 98% 192981346 Bacteria n n uncultured bacterium JQ998481 466 5 385 3E-160 94% 94% 192983964 Bacteria n n uncultured bacterium JQ998794 536 17 496 0 99% 99% 192983994 Bacteria n n uncultured bacterium JQ998747 532 4 473 0 94% 94% 192984056 Bacteria n n uncultured bacterium JQ997893 246 18 201 3E-77 96% 96% 192984107 Bacteria n n uncultured bacterium JQ998818 538 18 480 0 97% 97% 192984128 Bacteria n n uncultured bacterium JQ998549 488 5 427 0 96% 96% 192984156 Bacteria n n uncultured bacterium JQ999047 555 24 545 0 98% 98% 192984161 Bacteria n n uncultured bacterium JQ998833 539 17 536 0 99% 99% 192984206 Bacteria n n uncultured bacterium JQ998626 511 17 472 3E-175 91% 91% 192984284 Bacteria n n uncultured bacterium JQ998179 361 1 296 3E-139 97% 97% 192984388 Bacteria n n uncultured bacterium JQ998086 327 5 280 
2E-136 99% 99% 192984568 Bacteria n n uncultured bacterium

JQ999157 571 4 473 0 95% 95% 192984599 Bacteria n n uncultured bacterium JQ998008 300 22 268 9E-109 96% 96% 192984616 Bacteria n n uncultured bacterium JQ998352 416 17 367 0 100% 100% 192984640 Bacteria n n uncultured bacterium JQ998518 479 18 412 0 99% 99% 192984646 Bacteria n n uncultured bacterium JQ998413 440 5 406 4E-168 94% 94% 192984658 Bacteria n n uncultured bacterium JQ998040 311 5 255 2E-115 97% 97% 192984672 Bacteria n n uncultured bacterium JQ998036 310 5 265 3E-133 100% 100% 192984677 Bacteria n n uncultured bacterium JQ998611 507 42 466 1E-168 92% 92% 192984692 Bacteria n n uncultured bacterium JQ998452 456 7 422 4E-164 92% 92% 192984702 Bacteria n n uncultured bacterium JQ998267 388 5 344 2E-165 98% 98% 192984705 Bacteria n n uncultured bacterium JQ998321 406 24 374 3E-130 91% 91% 192984743 Bacteria n n uncultured bacterium JQ998410 439 5 396 3E-174 95% 95% 192984840 Bacteria n n uncultured bacterium JQ998434 449 18 403 8E-146 91% 91% 192985210 Bacteria n n uncultured bacterium JQ998417 443 4 390 3E-90 84% 84% 192985646 Bacteria n n uncultured bacterium JQ999004 551 4 549 0 97% 97% 192988880 Bacteria n n uncultured bacterium JQ999064 556 3 511 0 92% 92% 192989027 Bacteria n n uncultured bacterium JQ997897 248 19 204 6E-85 98% 98% 192989092 Bacteria n n uncultured bacterium JQ998667 522 24 474 0 92% 92% 192989236 Bacteria n n uncultured bacterium JQ998850 541 4 504 0 93% 93% 192989243 Bacteria n n uncultured bacterium JQ998805 537 5 537 0 91% 91% 192989244 Bacteria n n uncultured bacterium JQ998728 530 5 490 0 92% 92% 192989281 Bacteria n n uncultured bacterium JQ998659 519 5 472 0 98% 98% 192989317 Bacteria n n uncultured bacterium JQ998484 467 18 351 1E-173 100% 100% 192989331 Bacteria n n uncultured bacterium JQ998063 320 8 288 1E-122 95% 95% 192989685 Bacteria n n uncultured bacterium JQ998500 474 5 416 0 97% 97% 192989734 Bacteria n n uncultured bacterium JQ998041 312 5 226 3E-108 99% 99% 193849299 Bacteria n n uncultured bacterium JQ998213 371 
24 327 2E-115 92% 92% 194136692 Bacteria n n uncultured bacterium JQ998017 304 18 272 3E-123 98% 98% 194137835 Bacteria n n uncultured bacterium JQ998641 516 5 484 0 100% 100% 194138815 Bacteria n n uncultured bacterium JQ997958 277 5 237 1E-97 95% 95% 194139439 Bacteria n n uncultured bacterium JQ998881 543 18 540 0 91% 91% 194139558 Bacteria n n uncultured bacterium JQ998585 501 17 456 0 99% 99% 194139664 Bacteria n n uncultured bacterium JQ998627 511 13 458 0 99% 99% 194139828 Bacteria n n uncultured bacterium JQ998681 524 5 521 0 99% 99% 194139970 Bacteria n n uncultured bacterium JQ999025 553 18 550 0 99% 99% 194140001 Bacteria n n uncultured bacterium JQ998441 451 18 404 3E-175 96% 96% 194140055 Bacteria n n uncultured bacterium JQ998166 358 5 313 2E-141 96% 96% 194140159 Bacteria n n uncultured bacterium JQ998353 416 18 370 3E-179 99% 99% 194140237 Bacteria n n uncultured bacterium JQ997975 286 14 241 7E-115 100% 100% 194140255 Bacteria n n uncultured bacterium JQ997953 275 17 231 1E-107 100% 100% 194140257 Bacteria n n uncultured bacterium JQ998336 411 18 364 0 100% 100% 194293700 Bacteria n n uncultured bacterium JQ998132 347 5 296 1E-143 99% 99% 194592166 Bacteria n n uncultured bacterium JQ998018 304 5 259 7E-130 100% 100% 194597963 Bacteria n n uncultured bacterium JQ998209 370 18 319 3E-154 100% 100% 194718448 Bacteria n n uncultured bacterium JQ998328 407 5 386 1E-114 88% 88% 194719015 Bacteria n n uncultured bacterium JQ997986 293 3 258 3E-73 87% 87% 195542932 Bacteria n n uncultured bacterium JQ999192 604 18 283 7E-103 93% 93% 197108620 Bacteria n n uncultured bacterium JQ997956 276 16 208 2E-95 100% 100% 197258193 Bacteria n n uncultured bacterium JQ999014 552 18 512 0 92% 92% 197342931 Bacteria n n uncultured bacterium JQ999115 561 17 368 3E-180 99% 99% 197342941 Bacteria n n uncultured bacterium JQ998064 320 17 273 1E-127 99% 99% 197342951 Bacteria n n uncultured bacterium JQ997950 273 100 241 4E-57 96% 96% 197342960 Bacteria n n uncultured 
bacterium JQ998348 415 16 354 1E-163 98% 98% 197342998 Bacteria n n uncultured bacterium JQ998335 410 5 356 3E-135 92% 92% 197343098 Bacteria n n uncultured bacterium JQ999005 551 15 484 0 95% 95% 197345311 Bacteria n n uncultured bacterium JQ998448 454 18 409 2E-167 94% 94% 197346844 Bacteria n n uncultured bacterium JQ998485 467 5 418 0 95% 95% 197346995 Bacteria n n uncultured bacterium JQ998026 307 5 273 1E-137 100% 100% 197347462 Bacteria n n uncultured bacterium JQ998261 386 18 291 5E-137 99% 99% 197349532 Bacteria n n uncultured bacterium JQ998851 541 5 455 0 97% 97% 197350011 Bacteria n n uncultured bacterium JQ998296 398 7 365 1E-152 94% 94% 197350154 Bacteria n n uncultured bacterium JQ997925 263 5 214 6E-65 89% 89% 197350162 Bacteria n n uncultured bacterium JQ998235 379 5 347 2E-130 91% 91% 197350304 Bacteria n n uncultured bacterium JQ998379 428 5 383 0 98% 98% 197350861 Bacteria n n uncultured bacterium JQ998144 350 5 296 8E-125 95% 95% 197351132 Bacteria n n uncultured bacterium JQ998145 350 5 297 6E-136 97% 97% 197351240 Bacteria n n uncultured bacterium JQ998806 537 18 534 0 93% 93% 197351291 Bacteria n n uncultured bacterium JQ998580 500 35 447 1E-169 93% 93% 197351616 Bacteria n n uncultured bacterium JQ998019 304 5 270 2E-134 100% 100% 197352517 Bacteria n n uncultured bacterium JQ998996 550 29 516 0 98% 98% 197352847 Bacteria n n uncultured bacterium JQ998586 501 23 358 5E-153 96% 96% 197353381 Bacteria n n uncultured bacterium JQ997887 242 5 211 1E-96 98% 98% 197355895 Bacteria n n uncultured bacterium JQ998363 420 17 355 4E-148 95% 95% 197357969 Bacteria n n uncultured bacterium JQ998400 435 19 395 8E-151 93% 93% 197358234 Bacteria n n uncultured bacterium JQ998390 432 14 373 0 99% 99% 197358911 Bacteria n n uncultured bacterium JQ998900 544 4 500 0 94% 94% 197365599 Bacteria n n uncultured bacterium JQ998864 542 4 541 0 94% 94% 197365624 Bacteria n n uncultured bacterium JQ998453 456 71 397 5E-168 100% 100% 197724312 Bacteria n n uncultured 
bacterium JQ997991 295 18 260 2E-85 92% 92% 198386168 Bacteria n n uncultured bacterium

JQ998408 438 18 381 0 100% 100% 198386169 Bacteria n n uncultured bacterium JQ998882 543 4 540 0 95% 95% 198403335 Bacteria n n uncultured bacterium JQ998182 362 5 326 2E-165 100% 100% 198427020 Bacteria n n uncultured bacterium JQ998514 478 4 404 0 100% 100% 198447831 Bacteria n n uncultured bacterium JQ997987 293 4 240 7E-120 100% 100% 199582799 Bacteria n n uncultured bacterium JQ998768 534 18 533 0 92% 92% 202073332 Bacteria n n uncultured bacterium JQ998158 353 18 307 5E-147 100% 100% 206583457 Bacteria n n uncultured bacterium JQ998883 543 18 543 0 99% 99% 207083801 Bacteria n n uncultured bacterium JQ998380 428 5 210 9E-51 86% 86% 207299102 Bacteria n n uncultured bacterium JQ997889 244 23 214 3E-93 99% 99% 209165384 Bacteria n n uncultured bacterium JQ997874 215 5 103 1E-36 96% 96% 209171353 Bacteria n n uncultured bacterium JQ998930 546 18 544 0 89% 89% 209365353 Bacteria n n uncultured bacterium JQ999208 669 18 358 2E-164 98% 98% 209915822 Bacteria n n uncultured bacterium JQ998449 455 16 374 4E-174 98% 98% 209915834 Bacteria n n uncultured bacterium JQ998255 384 5 338 8E-160 98% 98% 209915940 Bacteria n n uncultured bacterium JQ998769 534 18 446 0 94% 94% 209915946 Bacteria n n uncultured bacterium JQ997989 294 3 150 2E-70 100% 100% 209973735 Bacteria n n uncultured bacterium JQ998901 544 5 488 0 92% 92% 209973762 Bacteria n n uncultured bacterium JQ998528 482 23 436 0 97% 97% 209973787 Bacteria n n uncultured bacterium JQ998252 383 4 255 4E-123 99% 99% 209973825 Bacteria n n uncultured bacterium JQ998965 548 18 539 0 98% 98% 209973833 Bacteria n n uncultured bacterium JQ998159 353 27 284 2E-96 92% 92% 209973849 Bacteria n n uncultured bacterium JQ999147 567 17 529 0 97% 97% 209973893 Bacteria n n uncultured bacterium JQ998135 348 18 260 2E-115 98% 98% 209973941 Bacteria n n uncultured bacterium JQ998163 356 5 304 8E-150 99% 99% 209974024 Bacteria n n uncultured bacterium JQ998258 385 5 343 4E-173 99% 99% 209974036 Bacteria n n uncultured bacterium 
JQ997917 260 18 74 7E-20 100% 100% 212293829 Bacteria n n uncultured bacterium JQ998931 546 18 517 0 98% 98% 212725698 Bacteria n n uncultured bacterium JQ997974 285 5 226 2E-110 100% 100% 213519754 Bacteria n n uncultured bacterium JQ998498 473 5 413 0 99% 99% 214017338 Bacteria n n uncultured bacterium JQ998884 543 24 266 6E-123 100% 100% 214018279 Bacteria n n uncultured bacterium JQ998602 505 18 372 0 100% 100% 214018628 Bacteria n n uncultured bacterium JQ999090 558 5 547 0 97% 97% 214019092 Bacteria n n uncultured bacterium JQ997904 252 21 204 6E-65 92% 92% 214019655 Bacteria n n uncultured bacterium JQ998210 370 12 325 3E-159 99% 99% 214021245 Bacteria n n uncultured bacterium JQ998098 333 5 272 3E-114 95% 95% 214023606 Bacteria n n uncultured bacterium JQ997880 238 5 195 9E-93 99% 99% 214024174 Bacteria n n uncultured bacterium JQ997926 264 5 232 1E-111 99% 99% 214025032 Bacteria n n uncultured bacterium JQ998387 431 51 387 1E-168 99% 99% 214025113 Bacteria n n uncultured bacterium JQ998629 513 18 458 0 100% 100% 214025124 Bacteria n n uncultured bacterium JQ998535 484 5 435 0 95% 95% 214025498 Bacteria n n uncultured bacterium JQ998594 503 57 456 3E-110 87% 87% 214026621 Bacteria n n uncultured bacterium JQ997878 233 5 219 4E-106 100% 100% 215267990 Bacteria n n uncultured bacterium JQ998778 534 18 529 0 98% 98% 215268007 Bacteria n n uncultured bacterium JQ998440 450 5 350 3E-170 98% 98% 215268914 Bacteria n n uncultured bacterium JQ998550 489 5 425 0 100% 100% 215268983 Bacteria n n uncultured bacterium JQ999058 555 17 542 0 93% 93% 215269445 Bacteria n n uncultured bacterium JQ998358 418 5 107 1E-43 99% 99% 215269470 Bacteria n n uncultured bacterium JQ998826 538 15 533 3E-175 88% 88% 215269500 Bacteria n n uncultured bacterium JQ998645 516 23 461 0 99% 99% 215269508 Bacteria n n uncultured bacterium JQ998814 537 5 535 0 100% 100% 215269603 Bacteria n n uncultured bacterium JQ998658 518 18 514 0 100% 100% 215269622 Bacteria n n uncultured bacterium 
JQ998350 415 5 378 2E-151 93% 93% 215269784 Bacteria n n uncultured bacterium JQ998013 301 15 257 3E-118 99% 99% 215270156 Bacteria n n uncultured bacterium JQ998892 543 10 543 0 98% 98% 215270289 Bacteria n n uncultured bacterium JQ998190 364 5 313 6E-126 94% 94% 215270482 Bacteria n n uncultured bacterium JQ998226 376 5 317 1E-158 99% 99% 215271112 Bacteria n n uncultured bacterium JQ998593 502 15 401 0 98% 98% 215271225 Bacteria n n uncultured bacterium JQ998142 349 18 315 2E-150 99% 99% 215271342 Bacteria n n uncultured bacterium JQ998478 465 5 414 0 100% 100% 215271352 Bacteria n n uncultured bacterium JQ998997 550 18 322 1E-114 92% 92% 215480383 Bacteria n n uncultured bacterium JQ998362 419 5 362 5E-177 98% 98% 217272751 Bacteria n n uncultured bacterium JQ998620 509 429 495 2E-23 99% 99% 217323685 Bacteria n n uncultured bacterium JQ998604 506 18 460 0 99% 99% 217416996 Bacteria n n uncultured bacterium JQ999177 578 18 573 0 92% 92% 217417011 Bacteria n n uncultured bacterium JQ998547 487 50 450 4E-149 91% 91% 217417036 Bacteria n n uncultured bacterium JQ998902 544 18 540 0 95% 95% 217417113 Bacteria n n uncultured bacterium JQ998277 392 3 334 7E-171 100% 100% 217417131 Bacteria n n uncultured bacterium JQ998998 550 3 539 0 92% 92% 217417141 Bacteria n n uncultured bacterium JQ998492 469 4 419 2E-152 91% 91% 217417149 Bacteria n n uncultured bacterium JQ997928 265 4 248 1E-117 98% 98% 217417286 Bacteria n n uncultured bacterium JQ998807 537 5 536 0 97% 97% 217417293 Bacteria n n uncultured bacterium JQ998315 404 5 337 4E-163 98% 98% 217417294 Bacteria n n uncultured bacterium JQ998136 348 18 292 6E-106 92% 92% 217417308 Bacteria n n uncultured bacterium JQ998009 300 13 267 3E-78 88% 88% 217417319 Bacteria n n uncultured bacterium JQ997961 280 5 230 2E-110 99% 99% 217417401 Bacteria n n uncultured bacterium JQ998553 490 8 418 0 96% 96% 218100559 Bacteria n n uncultured bacterium JQ998858 541 51 540 0 98% 98% 218411140 Bacteria n n uncultured bacterium 
JQ998770 534 17 518 0 96% 96% 218686562 Bacteria n n uncultured bacterium
Table S1 Cont.

JQ998048 315 20 270 1E-117 98% 98% 219893613 Bacteria n n uncultured bacterium JQ999001 550 18 546 0 97% 97% 219906426 Bacteria n n uncultured bacterium JQ998341 412 18 88 4E-24 97% 97% 220937850 Bacteria n n uncultured bacterium JQ999059 555 18 550 0 99% 99% 220980864 Bacteria n n uncultured bacterium JQ998852 541 6 500 0 94% 94% 222090156 Bacteria n n uncultured bacterium JQ997898 248 23 215 2E-70 92% 92% 222090357 Bacteria n n uncultured bacterium JQ998020 304 15 253 3E-114 98% 98% 222101776 Bacteria n n uncultured bacterium JQ998146 350 17 305 8E-120 94% 94% 222101803 Bacteria n n uncultured bacterium JQ998815 537 3 532 0 98% 98% 222427076 Bacteria n n uncultured bacterium JQ998256 384 17 347 2E-155 97% 97% 222432210 Bacteria n n uncultured bacterium JQ997929 265 18 198 4E-77 96% 96% 223675333 Bacteria n n uncultured bacterium JQ998240 380 33 323 9E-140 98% 98% 223675605 Bacteria n n uncultured bacterium JQ998224 375 24 330 5E-142 97% 97% 223676588 Bacteria n n uncultured bacterium JQ999074 557 3 556 0 97% 97% 223676678 Bacteria n n uncultured bacterium JQ998401 435 5 372 3E-179 98% 98% 223676782 Bacteria n n uncultured bacterium JQ998705 527 24 481 0 98% 98% 223677097 Bacteria n n uncultured bacterium JQ998834 539 5 539 0 96% 96% 223677414 Bacteria n n uncultured bacterium JQ998259 385 15 257 1E-98 94% 94% 223677752 Bacteria n n uncultured bacterium JQ998782 535 5 531 0 95% 95% 223677840 Bacteria n n uncultured bacterium JQ999048 555 5 552 0 99% 99% 223678213 Bacteria n n uncultured bacterium JQ998278 392 5 326 5E-157 98% 98% 223678327 Bacteria n n uncultured bacterium JQ998337 411 4 235 1E-33 80% 80% 223678817 Bacteria n n uncultured bacterium JQ998306 402 23 369 4E-163 97% 97% 223678949 Bacteria n n uncultured bacterium JQ998660 519 18 519 0 100% 100% 223678952 Bacteria n n uncultured bacterium JQ998419 444 5 375 0 99% 99% 223679056 Bacteria n n uncultured bacterium JQ998403 437 18 392 0 100% 100% 223679058 Bacteria n n uncultured bacterium JQ999217 762 5 
281 2E-129 97% 97% 223679096 Bacteria n n uncultured bacterium JQ998706 527 17 526 0 100% 100% 223679113 Bacteria n n uncultured bacterium JQ998653 518 181 516 1E-169 99% 99% 223679141 Bacteria n n uncultured bacterium JQ998555 491 5 435 0 96% 96% 223679337 Bacteria n n uncultured bacterium JQ998060 319 18 276 6E-116 96% 96% 223679587 Bacteria n n uncultured bacterium JQ998414 441 4 105 2E-27 90% 90% 223679739 Bacteria n n uncultured bacterium JQ998292 397 18 357 2E-136 92% 92% 223679771 Bacteria n n uncultured bacterium JQ998819 538 5 535 0 97% 97% 223679788 Bacteria n n uncultured bacterium JQ998795 536 20 532 0 97% 97% 223679898 Bacteria n n uncultured bacterium JQ997957 276 3 229 1E-112 100% 100% 223679919 Bacteria n n uncultured bacterium JQ998253 383 18 298 3E-144 100% 100% 223680075 Bacteria n n uncultured bacterium JQ999161 572 6 572 0 89% 89% 223680296 Bacteria n n uncultured bacterium JQ998228 377 18 321 2E-150 99% 99% 223680400 Bacteria n n uncultured bacterium JQ999006 551 18 547 0 99% 99% 223680555 Bacteria n n uncultured bacterium JQ998966 548 4 545 0 100% 100% 223680624 Bacteria n n uncultured bacterium JQ999107 560 5 473 0 95% 95% 223680699 Bacteria n n uncultured bacterium JQ997907 254 5 208 6E-100 100% 100% 223680762 Bacteria n n uncultured bacterium JQ999015 552 20 496 0 94% 94% 223680902 Bacteria n n uncultured bacterium JQ998510 477 12 406 0 99% 99% 223680920 Bacteria n n uncultured bacterium JQ998588 502 23 454 0 98% 98% 223680969 Bacteria n n uncultured bacterium JQ998199 367 5 320 1E-158 99% 99% 223680981 Bacteria n n uncultured bacterium JQ999121 562 16 558 0 100% 100% 223681376 Bacteria n n uncultured bacterium JQ998042 312 25 277 6E-106 95% 95% 223681393 Bacteria n n uncultured bacterium JQ998438 450 5 384 6E-157 94% 94% 223681492 Bacteria n n uncultured bacterium JQ998723 529 5 529 0 98% 98% 223681557 Bacteria n n uncultured bacterium JQ998345 414 18 369 1E-163 97% 97% 223681668 Bacteria n n uncultured bacterium JQ998569 495 18 176 1E-55 
93% 93% 223681735 Bacteria n n uncultured bacterium JQ998507 476 2 304 8E-151 99% 99% 223681892 Bacteria n n uncultured bacterium JQ998123 343 31 298 5E-137 100% 100% 223682235 Bacteria n n uncultured bacterium JQ999049 555 18 551 0 98% 98% 223682240 Bacteria n n uncultured bacterium JQ997960 279 5 227 2E-110 100% 100% 223682363 Bacteria n n uncultured bacterium JQ998338 411 5 378 0 99% 99% 223682414 Bacteria n n uncultured bacterium JQ998642 516 13 457 0 99% 99% 223682419 Bacteria n n uncultured bacterium JQ998560 493 5 108 8E-37 95% 95% 223682843 Bacteria n n uncultured bacterium JQ998444 453 97 413 8E-156 98% 98% 223683254 Bacteria n n uncultured bacterium JQ998140 349 18 304 1E-147 100% 100% 223683434 Bacteria n n uncultured bacterium JQ999127 563 4 560 0 98% 98% 223683523 Bacteria n n uncultured bacterium JQ997915 258 4 209 5E-101 100% 100% 223683690 Bacteria n n uncultured bacterium JQ998710 528 17 525 0 98% 98% 223684138 Bacteria n n uncultured bacterium JQ997945 271 3 222 2E-100 97% 97% 223685588 Bacteria n n uncultured bacterium JQ998342 412 18 359 5E-177 100% 100% 223685657 Bacteria n n uncultured bacterium JQ999034 554 17 550 0 93% 93% 223685961 Bacteria n n uncultured bacterium JQ998711 528 5 528 0 99% 99% 223686127 Bacteria n n uncultured bacterium JQ997913 256 5 222 5E-76 91% 91% 223686201 Bacteria n n uncultured bacterium JQ998668 522 19 522 0 98% 98% 223688814 Bacteria n n uncultured bacterium JQ997992 295 4 229 4E-112 100% 100% 223689241 Bacteria n n uncultured bacterium JQ998771 534 6 527 0 98% 98% 223689728 Bacteria n n uncultured bacterium JQ998045 315 3 226 1E-77 91% 91% 223689798 Bacteria n n uncultured bacterium JQ999116 561 71 547 0 98% 98% 223694958 Bacteria n n uncultured bacterium JQ998783 535 36 524 0 99% 99% 223695178 Bacteria n n uncultured bacterium JQ998442 451 18 411 8E-161 93% 93% 223695266 Bacteria n n uncultured bacterium JQ998031 308 15 263 7E-125 100% 100% 223695358 Bacteria n n uncultured bacterium JQ999050 555 18 552 0 96% 
96% 223695555 Bacteria n n uncultured bacterium

JQ998445 453 18 397 0 100% 100% 223695721 Bacteria n n uncultured bacterium JQ998561 493 5 436 0 95% 95% 223695747 Bacteria n n uncultured bacterium JQ998772 534 14 531 0 93% 93% 223695950 Bacteria n n uncultured bacterium JQ998977 549 18 526 0 93% 93% 223696074 Bacteria n n uncultured bacterium JQ998027 307 25 272 5E-111 96% 96% 223696182 Bacteria n n uncultured bacterium JQ998105 336 5 280 2E-141 100% 100% 223696209 Bacteria n n uncultured bacterium JQ999165 574 8 572 0 91% 91% 223696223 Bacteria n n uncultured bacterium JQ998420 444 5 380 1E-164 95% 95% 223696268 Bacteria n n uncultured bacterium JQ998307 402 18 351 1E-168 99% 99% 223696324 Bacteria n n uncultured bacterium JQ998978 549 5 544 0 96% 96% 223696346 Bacteria n n uncultured bacterium JQ998707 527 3 514 0 100% 100% 223696671 Bacteria n n uncultured bacterium JQ999051 555 18 554 0 98% 98% 223696683 Bacteria n n uncultured bacterium JQ998197 366 18 335 4E-163 100% 100% 223696698 Bacteria n n uncultured bacterium JQ999016 552 4 548 0 99% 99% 223696706 Bacteria n n uncultured bacterium JQ998932 546 24 540 0 91% 91% 223696744 Bacteria n n uncultured bacterium JQ999166 574 5 549 0 94% 94% 223696748 Bacteria n n uncultured bacterium JQ998225 376 18 332 4E-163 100% 100% 223696749 Bacteria n n uncultured bacterium JQ998241 380 18 327 2E-160 100% 100% 223696760 Bacteria n n uncultured bacterium JQ998244 381 18 327 1E-157 99% 99% 223696843 Bacteria n n uncultured bacterium JQ999211 681 20 115 4E-41 100% 100% 223696866 Bacteria n n uncultured bacterium JQ998129 346 5 281 5E-142 100% 100% 223954980 Bacteria n n uncultured bacterium JQ998784 535 5 531 0 91% 91% 223955431 Bacteria n n uncultured bacterium JQ998916 545 15 495 0 100% 100% 224549126 Bacteria n n uncultured bacterium JQ998748 532 18 478 0 100% 100% 224555168 Bacteria n n uncultured bacterium JQ999180 579 96 577 5E-154 88% 88% 224566517 Bacteria n n uncultured bacterium JQ998503 475 24 422 0 98% 98% 224569127 Bacteria n n uncultured bacterium JQ998917 
545 16 524 0 100% 100% 224569165 Bacteria n n uncultured bacterium JQ998167 358 26 304 2E-126 96% 96% 224569174 Bacteria n n uncultured bacterium JQ998301 400 13 360 2E-141 93% 93% 224569195 Bacteria n n uncultured bacterium JQ998773 534 18 291 2E-138 100% 100% 224569580 Bacteria n n uncultured bacterium JQ999099 559 5 521 0 94% 94% 224569637 Bacteria n n uncultured bacterium JQ998676 523 16 468 0 100% 100% 224569653 Bacteria n n uncultured bacterium JQ998112 339 160 307 5E-67 99% 99% 224569711 Bacteria n n uncultured bacterium JQ998305 401 5 325 2E-166 100% 100% 224569824 Bacteria n n uncultured bacterium JQ998617 508 26 353 3E-170 100% 100% 224569839 Bacteria n n uncultured bacterium JQ998329 407 35 329 3E-150 100% 100% 224569899 Bacteria n n uncultured bacterium JQ998903 544 2 526 0 95% 95% 224569901 Bacteria n n uncultured bacterium JQ998308 402 5 303 3E-154 100% 100% 224569914 Bacteria n n uncultured bacterium JQ998587 501 17 444 0 98% 98% 224569941 Bacteria n n uncultured bacterium JQ998385 430 19 373 0 100% 100% 224569990 Bacteria n n uncultured bacterium JQ998219 373 19 341 6E-166 100% 100% 224570013 Bacteria n n uncultured bacterium JQ998885 543 5 534 0 99% 99% 224570029 Bacteria n n uncultured bacterium JQ999201 620 5 154 1E-40 88% 88% 224570038 Bacteria n n uncultured bacterium JQ998933 546 5 468 0 93% 93% 224570044 Bacteria n n uncultured bacterium JQ998835 539 21 538 0 93% 93% 224570046 Bacteria n n uncultured bacterium JQ999007 551 18 550 0 97% 97% 224570047 Bacteria n n uncultured bacterium JQ999218 762 5 181 7E-64 93% 93% 224570055 Bacteria n n uncultured bacterium JQ998712 528 17 528 0 99% 99% 224570061 Bacteria n n uncultured bacterium JQ998046 315 5 275 2E-135 99% 99% 224570079 Bacteria n n uncultured bacterium JQ998106 336 4 252 6E-116 98% 98% 224570085 Bacteria n n uncultured bacterium JQ998740 531 17 526 0 94% 94% 224570113 Bacteria n n uncultured bacterium JQ999065 556 3 554 0 98% 98% 224570119 Bacteria n n uncultured bacterium JQ999100 559 
259 556 1E-149 99% 99% 224570132 Bacteria n n uncultured bacterium JQ998669 522 5 516 0 97% 97% 224570156 Bacteria n n uncultured bacterium JQ997990 294 5 261 5E-131 100% 100% 224570180 Bacteria n n uncultured bacterium JQ998774 534 5 530 0 100% 100% 224570188 Bacteria n n uncultured bacterium JQ998309 402 5 334 1E-163 99% 99% 224570202 Bacteria n n uncultured bacterium JQ998322 406 125 351 2E-102 97% 97% 224570223 Bacteria n n uncultured bacterium JQ997962 280 18 238 5E-111 100% 100% 224570225 Bacteria n n uncultured bacterium JQ998264 387 18 337 2E-125 93% 93% 224570227 Bacteria n n uncultured bacterium JQ998168 358 5 286 1E-137 99% 99% 224611899 Bacteria n n uncultured bacterium JQ998552 490 5 423 0 100% 100% 224712074 Bacteria n n uncultured bacterium JQ999008 551 21 545 0 98% 98% 224714764 Bacteria n n uncultured bacterium JQ998654 518 19 516 0 97% 97% 225302504 Bacteria n n uncultured bacterium JQ998612 507 5 445 0 100% 100% 225302521 Bacteria n n uncultured bacterium JQ998570 495 5 452 0 94% 94% 225302626 Bacteria n n uncultured bacterium JQ998904 544 21 532 0 99% 99% 225302678 Bacteria n n uncultured bacterium JQ998393 433 18 376 7E-161 95% 95% 225302713 Bacteria n n uncultured bacterium JQ998075 324 157 293 1E-48 94% 94% 225337162 Bacteria n n uncultured bacterium JQ997977 287 24 236 7E-105 100% 100% 225338363 Bacteria n n uncultured bacterium JQ997994 296 5 251 2E-110 96% 96% 225338711 Bacteria n n uncultured bacterium JQ997882 240 14 106 6E-30 94% 94% 225382292 Bacteria n n uncultured bacterium JQ998504 475 1 403 5E-168 94% 94% 225382550 Bacteria n n uncultured bacterium JQ998893 543 23 519 0 97% 97% 225618928 Bacteria n n uncultured bacterium JQ998820 538 15 529 0 94% 94% 225936324 Bacteria n n uncultured bacterium JQ998749 532 5 185 4E-70 94% 94% 226350595 Bacteria n n uncultured bacterium JQ998262 386 18 340 6E-166 100% 100% 226351268 Bacteria n n uncultured bacterium JQ998369 423 4 378 3E-150 93% 93% 226429024 Bacteria n n uncultured bacterium 
JQ998967 548 18 516 0 99% 99% 226430279 Bacteria n n uncultured bacterium

JQ998431 448 4 392 1E-124 88% 88% 226447054 Bacteria n n uncultured bacterium JQ997998 297 18 57 2E-10 100% 100% 227437797 Bacteria n n uncultured bacterium JQ998198 366 18 321 5E-157 100% 100% 227937314 Bacteria n n uncultured bacterium JQ998153 352 21 199 1E-83 99% 99% 228480902 Bacteria n n uncultured bacterium JQ999172 575 5 272 4E-125 97% 97% 229428720 Bacteria n n uncultured bacterium JQ998346 414 5 381 3E-160 94% 94% 229428732 Bacteria n n uncultured bacterium JQ998865 542 5 540 0 96% 96% 229428738 Bacteria n n uncultured bacterium JQ998065 320 5 268 2E-126 98% 98% 229428747 Bacteria n n uncultured bacterium JQ997936 268 18 232 6E-100 98% 98% 229428758 Bacteria n n uncultured bacterium JQ999091 558 3 381 2E-123 89% 89% 229428778 Bacteria n n uncultured bacterium JQ998160 353 4 295 3E-144 99% 99% 229428780 Bacteria n n uncultured bacterium JQ998670 522 18 488 0 95% 95% 229428782 Bacteria n n uncultured bacterium JQ998632 514 20 513 0 97% 97% 229428785 Bacteria n n uncultured bacterium JQ998750 532 21 528 0 94% 94% 229428786 Bacteria n n uncultured bacterium JQ998016 303 18 253 1E-87 92% 92% 229428847 Bacteria n n uncultured bacterium JQ999035 554 279 514 5E-119 100% 100% 229428882 Bacteria n n uncultured bacterium JQ998249 382 5 339 7E-141 94% 94% 237687615 Bacteria n n uncultured bacterium JQ998302 400 19 350 1E-152 96% 96% 237687664 Bacteria n n uncultured bacterium JQ998493 469 17 423 0 97% 97% 237687668 Bacteria n n uncultured bacterium JQ998148 351 5 308 5E-147 98% 98% 237687669 Bacteria n n uncultured bacterium JQ998886 543 5 540 0 100% 100% 237774863 Bacteria n n uncultured bacterium JQ998791 535 4 483 0 96% 96% 237934376 Bacteria n n uncultured bacterium JQ998061 319 15 280 3E-114 95% 95% 238068056 Bacteria n n uncultured bacterium JQ998066 320 5 251 4E-117 98% 98% 238068457 Bacteria n n uncultured bacterium JQ998775 534 5 529 0 93% 93% 238068520 Bacteria n n uncultured bacterium JQ998057 318 18 270 7E-115 97% 97% 238068847 Bacteria n n uncultured 
bacterium JQ998729 530 230 530 2E-118 93% 93% 238068889 Bacteria n n uncultured bacterium JQ999206 659 5 91 2E-23 92% 92% 238068929 Bacteria n n uncultured bacterium JQ999203 630 17 163 2E-69 100% 100% 238254033 Bacteria n n uncultured bacterium JQ998648 517 2 477 0 100% 100% 238254594 Bacteria n n uncultured bacterium JQ998149 351 18 319 6E-136 96% 96% 238256387 Bacteria n n uncultured bacterium JQ998887 543 18 543 0 100% 100% 238258794 Bacteria n n uncultured bacterium JQ998388 431 5 375 3E-159 94% 94% 238260512 Bacteria n n uncultured bacterium JQ997966 283 10 233 2E-105 98% 98% 238260516 Bacteria n n uncultured bacterium JQ998979 549 18 544 0 100% 100% 238262069 Bacteria n n uncultured bacterium JQ999122 562 5 558 0 97% 97% 238262159 Bacteria n n uncultured bacterium JQ998980 549 283 544 6E-108 95% 95% 238262473 Bacteria n n uncultured bacterium JQ998519 479 26 397 0 98% 98% 238262864 Bacteria n n uncultured bacterium JQ998290 396 5 363 0 100% 100% 238262966 Bacteria n n uncultured bacterium JQ999117 561 4 556 0 99% 99% 238263260 Bacteria n n uncultured bacterium JQ998043 313 5 230 2E-100 96% 96% 238263285 Bacteria n n uncultured bacterium JQ998934 546 18 541 0 99% 99% 238263468 Bacteria n n uncultured bacterium JQ998982 549 5 537 0 99% 99% 238263771 Bacteria n n uncultured bacterium JQ998808 537 18 533 0 97% 97% 238263814 Bacteria n n uncultured bacterium JQ998951 547 24 544 0 97% 97% 238263848 Bacteria n n uncultured bacterium JQ998124 343 5 317 2E-155 99% 99% 238263874 Bacteria n n uncultured bacterium JQ998370 423 1 352 3E-169 97% 97% 238265232 Bacteria n n uncultured bacterium JQ998028 307 17 232 6E-106 100% 100% 238265477 Bacteria n n uncultured bacterium JQ998119 342 5 294 3E-149 100% 100% 238266272 Bacteria n n uncultured bacterium JQ997967 283 19 238 2E-100 97% 97% 238268651 Bacteria n n uncultured bacterium JQ998260 385 17 340 1E-162 99% 99% 238269172 Bacteria n n uncultured bacterium JQ998785 535 5 501 0 100% 100% 238269688 Bacteria n n uncultured 
bacterium JQ997968 283 3 245 3E-108 96% 96% 238270075 Bacteria n n uncultured bacterium JQ998311 403 4 281 6E-122 96% 96% 238270979 Bacteria n n uncultured bacterium JQ998193 365 18 138 1E-53 99% 99% 238271851 Bacteria n n uncultured bacterium JQ998054 317 18 280 9E-129 99% 99% 238273315 Bacteria n n uncultured bacterium JQ998776 534 5 493 0 98% 98% 238274455 Bacteria n n uncultured bacterium JQ998450 455 24 371 4E-179 100% 100% 238274745 Bacteria n n uncultured bacterium JQ998001 298 14 249 4E-117 100% 100% 238275219 Bacteria n n uncultured bacterium JQ998101 334 17 301 7E-140 99% 99% 238275250 Bacteria n n uncultured bacterium JQ998540 485 5 137 7E-62 100% 100% 238275715 Bacteria n n uncultured bacterium JQ997985 292 5 259 9E-109 95% 95% 238275802 Bacteria n n uncultured bacterium JQ998357 417 4 366 0 100% 100% 238276735 Bacteria n n uncultured bacterium JQ998531 483 5 328 1E-164 99% 99% 238276739 Bacteria n n uncultured bacterium JQ998844 540 24 537 0 98% 98% 238276766 Bacteria n n uncultured bacterium JQ998853 541 5 500 0 98% 98% 238276795 Bacteria n n uncultured bacterium JQ999128 563 18 561 0 100% 100% 238277365 Bacteria n n uncultured bacterium JQ999209 679 4 375 0 100% 100% 238277647 Bacteria n n uncultured bacterium JQ998524 480 17 386 8E-161 95% 95% 238277655 Bacteria n n uncultured bacterium JQ997951 273 5 231 2E-95 95% 95% 238277784 Bacteria n n uncultured bacterium JQ998002 298 5 241 4E-102 96% 96% 238278012 Bacteria n n uncultured bacterium JQ998396 434 18 393 1E-163 95% 95% 238278031 Bacteria n n uncultured bacterium JQ998378 427 17 385 9E-170 96% 96% 238278034 Bacteria n n uncultured bacterium JQ998643 516 3 422 0 96% 96% 238278039 Bacteria n n uncultured bacterium JQ998082 326 5 282 1E-142 100% 100% 238278081 Bacteria n n uncultured bacterium JQ998349 415 5 257 6E-122 98% 98% 238278110 Bacteria n n uncultured bacterium JQ998141 349 5 300 3E-149 99% 99% 238278174 Bacteria n n uncultured bacterium JQ998935 546 18 542 0 100% 100% 238279128 Bacteria n 
n uncultured bacterium JQ998671 522 5 481 0 99% 99% 238280160 Bacteria n n uncultured bacterium

JQ998845 540 18 538 0 98% 98% 238280720 Bacteria n n uncultured bacterium JQ998263 386 21 342 2E-135 94% 94% 238281878 Bacteria n n uncultured bacterium JQ998854 541 7 538 0 99% 99% 238286296 Bacteria n n uncultured bacterium JQ997970 284 5 246 1E-107 96% 96% 238286743 Bacteria n n uncultured bacterium JQ998200 367 18 323 4E-158 100% 100% 238286788 Bacteria n n uncultured bacterium JQ998918 545 17 516 0 95% 95% 238286790 Bacteria n n uncultured bacterium JQ998229 377 28 326 1E-137 97% 97% 238286826 Bacteria n n uncultured bacterium JQ999026 553 18 551 0 100% 100% 238286863 Bacteria n n uncultured bacterium JQ997927 264 5 198 4E-92 99% 99% 238289810 Bacteria n n uncultured bacterium JQ998866 542 5 540 0 96% 96% 238292394 Bacteria n n uncultured bacterium JQ998630 513 3 457 0 100% 100% 238292646 Bacteria n n uncultured bacterium JQ998093 330 23 285 1E-127 98% 98% 238292712 Bacteria n n uncultured bacterium JQ997959 277 19 246 6E-110 99% 99% 238293955 Bacteria n n uncultured bacterium JQ999193 604 5 79 1E-20 93% 93% 238295305 Bacteria n n uncultured bacterium JQ998809 537 13 497 0 100% 100% 238295567 Bacteria n n uncultured bacterium JQ999118 561 114 554 0 94% 94% 238296060 Bacteria n n uncultured bacterium JQ999123 562 5 559 0 99% 99% 238296317 Bacteria n n uncultured bacterium JQ998397 434 17 389 1E-158 94% 94% 238297072 Bacteria n n uncultured bacterium JQ998291 396 18 325 3E-159 100% 100% 238297462 Bacteria n n uncultured bacterium JQ998613 507 4 461 0 100% 100% 238297600 Bacteria n n uncultured bacterium JQ998133 347 4 315 8E-160 100% 100% 238297688 Bacteria n n uncultured bacterium JQ998952 547 17 545 0 98% 98% 238299775 Bacteria n n uncultured bacterium JQ998274 390 144 339 3E-95 99% 99% 238299976 Bacteria n n uncultured bacterium JQ998936 546 5 542 0 98% 98% 238300686 Bacteria n n uncultured bacterium JQ998303 400 5 284 4E-123 96% 96% 238301381 Bacteria n n uncultured bacterium JQ998636 515 17 467 0 100% 100% 238301499 Bacteria n n uncultured bacterium 
JQ997995 296 18 249 4E-112 99% 99% 238301575 Bacteria n n uncultured bacterium JQ998983 549 17 536 0 94% 94% 238301699 Bacteria n n uncultured bacterium JQ998282 393 17 246 3E-84 92% 92% 238302245 Bacteria n n uncultured bacterium JQ998919 545 32 279 2E-122 99% 99% 238302248 Bacteria n n uncultured bacterium JQ998394 433 18 396 0 99% 99% 238302270 Bacteria n n uncultured bacterium JQ999017 552 5 552 0 95% 95% 238302336 Bacteria n n uncultured bacterium JQ998796 536 5 534 0 99% 99% 238302651 Bacteria n n uncultured bacterium JQ998741 531 4 527 0 100% 100% 238302818 Bacteria n n uncultured bacterium JQ998663 520 18 452 0 100% 100% 238302994 Bacteria n n uncultured bacterium JQ998937 546 5 543 0 95% 95% 238303284 Bacteria n n uncultured bacterium JQ998542 486 1 454 0 99% 99% 238303306 Bacteria n n uncultured bacterium JQ998477 465 23 378 0 99% 99% 238303310 Bacteria n n uncultured bacterium JQ998005 299 18 252 4E-102 96% 96% 238303622 Bacteria n n uncultured bacterium JQ998022 305 17 259 1E-112 98% 98% 238303655 Bacteria n n uncultured bacterium JQ998173 359 18 320 2E-156 100% 100% 238304075 Bacteria n n uncultured bacterium JQ999213 689 24 356 2E-128 93% 93% 238305275 Bacteria n n uncultured bacterium JQ999200 617 5 256 1E-85 91% 91% 238305904 Bacteria n n uncultured bacterium JQ998999 550 18 374 0 100% 100% 238306208 Bacteria n n uncultured bacterium JQ999092 558 18 552 0 94% 94% 238306931 Bacteria n n uncultured bacterium JQ998032 308 3 261 4E-132 100% 100% 238307378 Bacteria n n uncultured bacterium JQ998051 316 5 270 6E-121 97% 97% 238308732 Bacteria n n uncultured bacterium JQ998076 324 5 286 2E-101 91% 91% 238308912 Bacteria n n uncultured bacterium JQ999194 607 18 517 0 94% 94% 238309010 Bacteria n n uncultured bacterium JQ998344 413 17 386 0 99% 99% 238309257 Bacteria n n uncultured bacterium JQ998855 541 5 539 0 99% 99% 238309294 Bacteria n n uncultured bacterium JQ998562 493 19 305 9E-136 98% 98% 238309323 Bacteria n n uncultured bacterium JQ998230 378 3 
329 8E-170 100% 100% 238309341 Bacteria n n uncultured bacterium JQ998548 487 17 449 6E-172 92% 92% 238309363 Bacteria n n uncultured bacterium JQ997890 244 5 193 3E-93 100% 100% 238309370 Bacteria n n uncultured bacterium JQ998920 545 18 545 0 99% 99% 238309389 Bacteria n n uncultured bacterium JQ998364 421 5 272 2E-126 98% 98% 238309416 Bacteria n n uncultured bacterium JQ998751 532 4 514 0 100% 100% 238309418 Bacteria n n uncultured bacterium JQ998867 542 15 535 0 98% 98% 238309460 Bacteria n n uncultured bacterium JQ998810 537 7 533 0 99% 99% 238309470 Bacteria n n uncultured bacterium JQ999066 556 5 551 0 94% 94% 238309471 Bacteria n n uncultured bacterium JQ997937 268 5 232 6E-110 99% 99% 238309482 Bacteria n n uncultured bacterium JQ998905 544 5 179 2E-62 92% 92% 238309502 Bacteria n n uncultured bacterium JQ998682 524 4 485 0 100% 100% 238309538 Bacteria n n uncultured bacterium JQ997891 245 5 213 9E-103 100% 100% 238309566 Bacteria n n uncultured bacterium JQ997946 271 5 224 2E-110 100% 100% 238309915 Bacteria n n uncultured bacterium JQ998633 514 18 510 0 99% 99% 238310085 Bacteria n n uncultured bacterium JQ998953 547 27 546 0 99% 99% 238310341 Bacteria n n uncultured bacterium JQ998689 525 18 525 0 99% 99% 238310578 Bacteria n n uncultured bacterium JQ998811 537 16 511 0 99% 99% 238311386 Bacteria n n uncultured bacterium JQ999036 554 5 548 0 97% 97% 238312124 Bacteria n n uncultured bacterium JQ999164 573 5 398 8E-107 85% 85% 238312201 Bacteria n n uncultured bacterium JQ998968 548 17 543 0 98% 98% 238312949 Bacteria n n uncultured bacterium JQ998724 529 4 477 0 99% 99% 238313108 Bacteria n n uncultured bacterium JQ998713 528 18 490 0 99% 99% 238313116 Bacteria n n uncultured bacterium JQ997877 232 1 198 2E-83 96% 96% 238313383 Bacteria n n uncultured bacterium JQ998888 543 17 538 0 98% 98% 238313396 Bacteria n n uncultured bacterium JQ998595 503 18 451 0 100% 100% 238313410 Bacteria n n uncultured bacterium JQ998868 542 5 475 0 100% 100% 238313425 
Bacteria n n uncultured bacterium

JQ998398 434 35 346 6E-127 94% 94% 238314296 Bacteria n n uncultured bacterium JQ998605 506 13 446 0 98% 98% 238314984 Bacteria n n uncultured bacterium JQ998984 549 5 548 0 99% 99% 238315154 Bacteria n n uncultured bacterium JQ998821 538 5 535 0 90% 90% 238315752 Bacteria n n uncultured bacterium JQ998023 305 5 261 5E-131 100% 100% 238316074 Bacteria n n uncultured bacterium JQ998499 473 17 415 7E-122 88% 88% 238316219 Bacteria n n uncultured bacterium JQ999018 552 18 547 0 98% 98% 238316220 Bacteria n n uncultured bacterium JQ999186 591 6 589 0 92% 92% 238317490 Bacteria n n uncultured bacterium JQ998288 395 24 346 3E-149 97% 97% 238318817 Bacteria n n uncultured bacterium JQ998283 393 14 360 3E-179 100% 100% 238320101 Bacteria n n uncultured bacterium JQ998154 352 5 299 6E-101 91% 91% 238320376 Bacteria n n uncultured bacterium JQ998426 446 4 420 0 100% 100% 238321524 Bacteria n n uncultured bacterium JQ999067 556 16 497 0 97% 97% 238321974 Bacteria n n uncultured bacterium JQ999219 865 17 197 2E-65 93% 93% 238321986 Bacteria n n uncultured bacterium JQ999152 569 83 564 0 100% 100% 238321993 Bacteria n n uncultured bacterium JQ998889 543 2 509 0 97% 97% 238322379 Bacteria n n uncultured bacterium JQ998696 526 5 459 0 95% 95% 238322714 Bacteria n n uncultured bacterium JQ999019 552 5 544 0 100% 100% 238323226 Bacteria n n uncultured bacterium JQ998284 394 5 358 0 100% 100% 238324411 Bacteria n n uncultured bacterium JQ998690 525 4 474 0 100% 100% 238324967 Bacteria n n uncultured bacterium JQ999000 550 18 546 0 99% 99% 238325192 Bacteria n n uncultured bacterium JQ999108 560 5 557 0 97% 97% 238325493 Bacteria n n uncultured bacterium JQ998631 513 12 373 2E-177 98% 98% 238326766 Bacteria n n uncultured bacterium JQ998697 526 18 446 0 94% 94% 238327054 Bacteria n n uncultured bacterium JQ998466 460 5 384 0 99% 99% 238327290 Bacteria n n uncultured bacterium JQ999052 555 5 550 0 97% 97% 238327333 Bacteria n n uncultured bacterium JQ998293 397 17 245 1E-113 100% 100% 
238327796 Bacteria n n uncultured bacterium JQ997999 297 5 253 7E-110 96% 96% 238327953 Bacteria n n uncultured bacterium JQ998836 539 24 537 0 97% 97% 238329691 Bacteria n n uncultured bacterium JQ998113 339 18 274 4E-108 95% 95% 238329799 Bacteria n n uncultured bacterium JQ999197 614 18 608 0 93% 93% 238330195 Bacteria n n uncultured bacterium JQ997883 240 5 193 3E-93 100% 100% 238330556 Bacteria n n uncultured bacterium JQ998454 456 1 407 2E-176 95% 95% 238331284 Bacteria n n uncultured bacterium JQ998822 538 23 511 0 99% 99% 238333276 Bacteria n n uncultured bacterium JQ998206 369 18 323 2E-156 100% 100% 238333610 Bacteria n n uncultured bacterium JQ998458 457 5 408 0 96% 96% 238333962 Bacteria n n uncultured bacterium JQ999129 563 17 546 0 100% 100% 238333963 Bacteria n n uncultured bacterium JQ999053 555 5 553 0 99% 99% 238334040 Bacteria n n uncultured bacterium JQ999167 574 18 378 0 99% 99% 238334087 Bacteria n n uncultured bacterium JQ998265 387 18 347 4E-163 98% 98% 238334867 Bacteria n n uncultured bacterium JQ998077 324 5 279 2E-121 96% 96% 238335483 Bacteria n n uncultured bacterium JQ999101 559 17 492 0 100% 100% 238335992 Bacteria n n uncultured bacterium JQ998285 394 5 344 2E-175 100% 100% 238336141 Bacteria n n uncultured bacterium JQ997963 280 5 235 6E-115 100% 100% 238336150 Bacteria n n uncultured bacterium JQ998052 316 5 262 2E-96 92% 92% 238336615 Bacteria n n uncultured bacterium JQ999141 566 24 563 0 97% 97% 238337104 Bacteria n n uncultured bacterium JQ998236 379 5 332 2E-170 100% 100% 238337355 Bacteria n n uncultured bacterium JQ998012 301 28 235 4E-102 100% 100% 238337361 Bacteria n n uncultured bacterium JQ998921 545 5 540 0 99% 99% 238337387 Bacteria n n uncultured bacterium JQ999184 582 5 577 0 94% 94% 238337402 Bacteria n n uncultured bacterium JQ998938 546 5 270 1E-135 100% 100% 238337649 Bacteria n n uncultured bacterium JQ998589 502 5 451 0 99% 99% 238338805 Bacteria n n uncultured bacterium JQ999009 551 31 546 0 98% 98% 
238340189 Bacteria n n uncultured bacterium JQ998089 328 5 296 2E-126 95% 95% 238340895 Bacteria n n uncultured bacterium JQ998330 408 18 375 1E-148 94% 94% 238341047 Bacteria n n uncultured bacterium JQ998120 342 18 297 2E-125 96% 96% 238341169 Bacteria n n uncultured bacterium JQ999124 562 18 415 0 99% 99% 238341208 Bacteria n n uncultured bacterium JQ998985 549 16 545 0 99% 99% 238341728 Bacteria n n uncultured bacterium JQ998969 548 5 546 0 99% 99% 238341804 Bacteria n n uncultured bacterium JQ998869 542 17 542 0 99% 99% 238342154 Bacteria n n uncultured bacterium JQ999130 563 5 562 0 98% 98% 238342263 Bacteria n n uncultured bacterium JQ998494 469 4 419 0 99% 99% 238342484 Bacteria n n uncultured bacterium JQ998649 517 12 472 0 92% 92% 238342531 Bacteria n n uncultured bacterium JQ998072 323 17 279 3E-134 100% 100% 238342541 Bacteria n n uncultured bacterium JQ998486 467 4 417 0 98% 98% 238342679 Bacteria n n uncultured bacterium JQ999093 558 18 554 0 97% 97% 238342752 Bacteria n n uncultured bacterium JQ998245 381 17 338 5E-167 100% 100% 238343654 Bacteria n n uncultured bacterium JQ998323 406 13 155 8E-66 99% 99% 238343750 Bacteria n n uncultured bacterium JQ999020 552 5 524 0 93% 93% 238344032 Bacteria n n uncultured bacterium JQ998014 302 5 270 3E-113 95% 95% 238344125 Bacteria n n uncultured bacterium JQ998939 546 5 541 0 96% 96% 238344396 Bacteria n n uncultured bacterium JQ998373 425 5 395 0 99% 99% 238344958 Bacteria n n uncultured bacterium JQ998044 314 17 284 2E-130 99% 99% 238345059 Bacteria n n uncultured bacterium JQ998155 352 5 304 8E-150 99% 99% 238346677 Bacteria n n uncultured bacterium JQ998188 364 5 280 6E-136 99% 99% 238346725 Bacteria n n uncultured bacterium JQ998574 496 5 435 0 99% 99% 238346876 Bacteria n n uncultured bacterium JQ998250 382 5 280 5E-132 98% 98% 238347038 Bacteria n n uncultured bacterium JQ999109 560 5 291 2E-147 100% 100% 238347095 Bacteria n n uncultured bacterium JQ998598 504 5 442 0 93% 93% 238347314 Bacteria n n 
uncultured bacterium

JQ998906 544 5 526 0 99% 99% 238347468 Bacteria n n uncultured bacterium JQ998907 544 18 310 1E-100 91% 91% 238347897 Bacteria n n uncultured bacterium JQ998970 548 5 548 0 99% 99% 238348588 Bacteria n n uncultured bacterium JQ998374 425 25 382 0 99% 99% 238348791 Bacteria n n uncultured bacterium JQ999153 569 5 569 0 98% 98% 238348794 Bacteria n n uncultured bacterium JQ998176 360 5 304 4E-133 96% 96% 238348865 Bacteria n n uncultured bacterium JQ999054 555 18 553 0 97% 97% 238349220 Bacteria n n uncultured bacterium JQ999168 574 18 573 0 99% 99% 238349987 Bacteria n n uncultured bacterium JQ998094 330 132 289 6E-76 100% 100% 238349990 Bacteria n n uncultured bacterium JQ998714 528 330 493 3E-71 97% 97% 238350004 Bacteria n n uncultured bacterium JQ997971 284 4 222 1E-92 96% 96% 238350221 Bacteria n n uncultured bacterium JQ998021 304 83 273 1E-92 99% 99% 238350299 Bacteria n n uncultured bacterium JQ998846 540 18 538 0 98% 98% 238350584 Bacteria n n uncultured bacterium JQ998742 531 19 530 0 96% 96% 238351114 Bacteria n n uncultured bacterium JQ998954 547 5 368 0 100% 100% 238351604 Bacteria n n uncultured bacterium JQ997884 241 34 197 2E-64 95% 95% 238351640 Bacteria n n uncultured bacterium JQ999181 579 289 575 5E-139 98% 98% 238351692 Bacteria n n uncultured bacterium JQ998312 403 1 357 0 100% 100% 238352483 Bacteria n n uncultured bacterium JQ997954 275 5 219 2E-105 100% 100% 238352567 Bacteria n n uncultured bacterium JQ999155 570 5 570 0 96% 96% 238400367 Bacteria n n uncultured bacterium JQ997930 266 18 221 5E-81 94% 94% 238400424 Bacteria n n uncultured bacterium JQ998922 545 5 526 0 100% 100% 238401117 Bacteria n n uncultured bacterium JQ999075 557 17 555 0 99% 99% 238404569 Bacteria n n uncultured bacterium JQ998459 457 3 409 0 99% 99% 238404723 Bacteria n n uncultured bacterium JQ997918 260 5 212 1E-101 100% 100% 238406617 Bacteria n n uncultured bacterium JQ999148 567 5 566 0 99% 99% 238407074 Bacteria n n uncultured bacterium JQ998870 542 17 539 0 
98% 98% 238412051 Bacteria n n uncultured bacterium JQ998572 495 5 440 0 95% 95% 238412351 Bacteria n n uncultured bacterium JQ998443 452 2 386 4E-134 90% 90% 238415068 Bacteria n n uncultured bacterium JQ998847 540 17 535 0 100% 100% 238415407 Bacteria n n uncultured bacterium JQ998618 508 3 427 0 100% 100% 238415774 Bacteria n n uncultured bacterium JQ998526 481 16 424 0 98% 98% 238415993 Bacteria n n uncultured bacterium JQ997983 290 18 245 1E-106 98% 98% 238416037 Bacteria n n uncultured bacterium JQ998231 378 3 232 4E-108 98% 98% 238416695 Bacteria n n uncultured bacterium JQ997996 296 17 265 7E-120 98% 98% 238417704 Bacteria n n uncultured bacterium JQ998743 531 4 459 0 95% 95% 238419619 Bacteria n n uncultured bacterium JQ998683 524 20 228 6E-98 98% 98% 238421978 Bacteria n n uncultured bacterium JQ998435 449 15 255 1E-114 98% 98% 238422186 Bacteria n n uncultured bacterium JQ999135 564 5 563 0 96% 96% 238422435 Bacteria n n uncultured bacterium JQ998986 549 1 537 0 96% 96% 238423057 Bacteria n n uncultured bacterium JQ999068 556 29 556 0 97% 97% 238423665 Bacteria n n uncultured bacterium JQ998268 388 17 357 7E-166 98% 98% 238423676 Bacteria n n uncultured bacterium JQ998090 328 5 273 3E-129 98% 98% 238426672 Bacteria n n uncultured bacterium JQ998715 528 18 473 0 100% 100% 238426679 Bacteria n n uncultured bacterium JQ999154 569 9 567 0 97% 97% 238426699 Bacteria n n uncultured bacterium JQ998103 335 16 303 2E-90 89% 89% 238426702 Bacteria n n uncultured bacterium JQ998987 549 5 549 0 100% 100% 238426783 Bacteria n n uncultured bacterium JQ998908 544 15 543 0 98% 98% 238426795 Bacteria n n uncultured bacterium JQ999110 560 3 559 0 99% 99% 238426815 Bacteria n n uncultured bacterium JQ999156 570 5 563 0 93% 93% 238426833 Bacteria n n uncultured bacterium JQ999037 554 5 551 0 98% 98% 238426835 Bacteria n n uncultured bacterium JQ999038 554 4 540 0 97% 97% 238426859 Bacteria n n uncultured bacterium JQ998563 493 4 433 0 95% 95% 238560560 Bacteria n n 
uncultured bacterium JQ998725 529 18 482 0 98% 98% 238560573 Bacteria n n uncultured bacterium JQ998375 425 5 296 8E-101 90% 90% 238770193 Bacteria n n uncultured bacterium JQ998621 509 5 300 2E-147 99% 99% 238774505 Bacteria n n uncultured bacterium JQ998475 463 5 436 0 97% 97% 238836153 Bacteria n n uncultured bacterium JQ998465 459 17 400 6E-122 88% 88% 238836154 Bacteria n n uncultured bacterium JQ998565 494 18 404 3E-180 97% 97% 238836158 Bacteria n n uncultured bacterium JQ999178 578 20 571 0 90% 90% 238836167 Bacteria n n uncultured bacterium JQ997964 281 4 229 2E-94 95% 95% 238836172 Bacteria n n uncultured bacterium JQ999131 563 393 556 2E-67 96% 96% 238914983 Bacteria n n uncultured bacterium JQ998121 342 18 294 5E-142 100% 100% 239837090 Bacteria n n uncultured bacterium JQ997931 266 18 153 8E-64 100% 100% 239837186 Bacteria n n uncultured bacterium JQ998286 394 1 288 2E-146 100% 100% 239837190 Bacteria n n uncultured bacterium JQ998024 306 5 243 3E-119 100% 100% 239837210 Bacteria n n uncultured bacterium JQ998655 518 5 464 0 95% 95% 239837238 Bacteria n n uncultured bacterium JQ997943 270 4 209 5E-101 100% 100% 239837254 Bacteria n n uncultured bacterium JQ998992 549 1 546 0 92% 92% 239913644 Bacteria n n uncultured bacterium JQ998389 431 17 91 1E-29 100% 100% 239923289 Bacteria n n uncultured bacterium JQ999174 577 23 577 0 90% 90% 240000256 Bacteria n n uncultured bacterium JQ998684 524 28 523 0 95% 95% 240000985 Bacteria n n uncultured bacterium JQ998691 525 4 525 0 93% 93% 240001010 Bacteria n n uncultured bacterium JQ999076 557 5 557 0 95% 95% 240001265 Bacteria n n uncultured bacterium JQ998251 382 7 229 1E-108 99% 99% 240001267 Bacteria n n uncultured bacterium JQ999055 555 6 548 0 95% 95% 240001303 Bacteria n n uncultured bacterium JQ998073 323 5 271 2E-131 99% 99% 240001638 Bacteria n n uncultured bacterium JQ998246 381 5 339 3E-159 97% 97% 240001755 Bacteria n n uncultured bacterium JQ998971 548 18 545 0 94% 94% 240001795 Bacteria n n 
uncultured bacterium
JQ998923 545 5 541 3E-180 88% 88% 240001834 Bacteria n n uncultured bacterium JQ998940 546 5 539 0 96% 96% 240001842 Bacteria n n uncultured bacterium JQ998029 307 41 98 2E-20 100% 100% 240001855 Bacteria n n uncultured bacterium JQ998941 546 17 499 0 95% 95% 240002020 Bacteria n n uncultured bacterium JQ998432 448 9 412 2E-126 88% 88% 240002108 Bacteria n n uncultured bacterium JQ998279 392 15 360 2E-170 98% 98% 240126815 Bacteria n n uncultured bacterium JQ998590 502 18 435 0 99% 99% 240126870 Bacteria n n uncultured bacterium JQ998581 500 18 422 0 100% 100% 240126924 Bacteria n n uncultured bacterium JQ999094 558 24 549 0 98% 98% 240126929 Bacteria n n uncultured bacterium JQ999132 563 25 562 0 99% 99% 240126955 Bacteria n n uncultured bacterium JQ998300 399 5 233 2E-91 94% 94% 240127013 Bacteria n n uncultured bacterium JQ998339 411 11 197 3E-85 98% 98% 242247783 Bacteria n n uncultured bacterium JQ998214 371 17 327 3E-159 100% 100% 242758918 Bacteria n n uncultured bacterium JQ998384 429 4 304 1E-148 99% 99% 253769623 Bacteria n n uncultured bacterium JQ998730 530 20 527 0 94% 94% 253770307 Bacteria n n uncultured bacterium JQ998716 528 5 528 0 99% 99% 254210154 Bacteria n n uncultured bacterium JQ998415 441 18 380 2E-176 98% 98% 254210156 Bacteria n n uncultured bacterium JQ998606 506 8 506 0 94% 94% 254210165 Bacteria n n uncultured bacterium JQ998708 527 1 501 0 93% 93% 254210171 Bacteria n n uncultured bacterium JQ998786 535 18 535 0 97% 97% 254210176 Bacteria n n uncultured bacterium JQ998637 515 18 476 0 98% 98% 254210178 Bacteria n n uncultured bacterium JQ998423 445 5 374 1E-148 93% 93% 254210199 Bacteria n n uncultured bacterium JQ998634 514 5 512 0 99% 99% 254210200 Bacteria n n uncultured bacterium JQ997932 266 5 240 6E-110 97% 97% 254210211 Bacteria n n uncultured bacterium JQ997978 288 3 251 2E-115 98% 98% 254210213 Bacteria n n uncultured bacterium JQ998318 405 74 371 1E-133 96% 96% 254210221 Bacteria n n uncultured bacterium JQ998797 536 5 488 0 
94% 94% 254210229 Bacteria n n uncultured bacterium JQ998837 539 8 526 0 97% 97% 254210230 Bacteria n n uncultured bacterium JQ998582 500 18 291 5E-133 99% 99% 254210233 Bacteria n n uncultured bacterium JQ998726 529 4 523 0 99% 99% 254210234 Bacteria n n uncultured bacterium JQ998756 533 2 530 0 99% 99% 254210245 Bacteria n n uncultured bacterium JQ998838 539 18 533 0 96% 96% 254210246 Bacteria n n uncultured bacterium JQ999010 551 18 547 0 98% 98% 254210252 Bacteria n n uncultured bacterium JQ998304 400 5 384 0 98% 98% 254210253 Bacteria n n uncultured bacterium JQ998685 524 5 487 0 97% 97% 254210256 Bacteria n n uncultured bacterium JQ999102 559 5 552 0 99% 99% 254210257 Bacteria n n uncultured bacterium JQ998955 547 5 540 0 97% 97% 254210258 Bacteria n n uncultured bacterium JQ998731 530 4 527 0 96% 96% 254210260 Bacteria n n uncultured bacterium JQ998180 361 5 329 8E-165 99% 99% 254210263 Bacteria n n uncultured bacterium JQ998216 372 5 323 1E-158 99% 99% 254210269 Bacteria n n uncultured bacterium JQ998566 494 5 436 0 98% 98% 254210270 Bacteria n n uncultured bacterium JQ998732 530 5 527 0 100% 100% 254210272 Bacteria n n uncultured bacterium JQ998823 538 15 532 0 94% 94% 254210274 Bacteria n n uncultured bacterium JQ998956 547 5 538 0 100% 100% 254210275 Bacteria n n uncultured bacterium JQ998717 528 17 486 0 99% 99% 254210279 Bacteria n n uncultured bacterium JQ998372 424 18 391 7E-161 94% 94% 254527898 Bacteria n n uncultured bacterium JQ997916 258 10 214 4E-77 93% 93% 254771243 Bacteria n n uncultured bacterium JQ998505 475 18 426 0 97% 97% 254771273 Bacteria n n uncultured bacterium JQ998677 523 5 485 0 98% 98% 254971490 Bacteria n n uncultured bacterium JQ998232 378 5 261 5E-122 98% 98% 255040912 Bacteria n n uncultured bacterium JQ998203 368 5 264 1E-132 100% 100% 255041408 Bacteria n n uncultured bacterium JQ998289 395 17 350 1E-173 100% 100% 255041571 Bacteria n n uncultured bacterium JQ998310 402 16 360 9E-180 100% 100% 255042010 Bacteria n n 
uncultured bacterium JQ998496 470 3 424 2E-166 93% 93% 255042296 Bacteria n n uncultured bacterium JQ998055 317 8 273 5E-136 100% 100% 255043198 Bacteria n n uncultured bacterium JQ998591 502 5 216 2E-82 93% 93% 255044557 Bacteria n n uncultured bacterium JQ998404 437 5 365 0 100% 100% 255045388 Bacteria n n uncultured bacterium JQ998116 340 5 287 2E-110 93% 93% 255045393 Bacteria n n uncultured bacterium JQ998638 515 5 439 0 97% 97% 255339762 Bacteria n n uncultured bacterium JQ999039 554 8 547 0 95% 95% 255339765 Bacteria n n uncultured bacterium JQ998599 504 17 463 2E-166 91% 91% 255689714 Bacteria n n uncultured bacterium JQ999021 552 3 552 0 99% 99% 255762955 Bacteria n n uncultured bacterium JQ998556 492 5 432 0 97% 97% 255976649 Bacteria n n uncultured bacterium JQ998237 379 18 326 7E-151 98% 98% 256352262 Bacteria n n uncultured bacterium JQ999028 553 5 553 0 95% 95% 256355274 Bacteria n n uncultured bacterium JQ998324 406 14 362 2E-180 100% 100% 256355277 Bacteria n n uncultured bacterium JQ998062 319 18 282 2E-106 94% 94% 256355286 Bacteria n n uncultured bacterium JQ999095 558 20 553 0 99% 99% 256592693 Bacteria n n uncultured bacterium JQ998381 428 26 373 3E-169 98% 98% 256592782 Bacteria n n uncultured bacterium JQ998068 322 9 77 5E-12 89% 89% 257130685 Bacteria n n uncultured bacterium JQ998164 356 15 309 1E-113 92% 92% 257130762 Bacteria n n uncultured bacterium JQ998010 300 5 260 4E-127 99% 99% 257131100 Bacteria n n uncultured bacterium JQ997885 241 37 199 1E-62 94% 94% 257131333 Bacteria n n uncultured bacterium JQ999212 681 5 157 2E-28 84% 84% 257131399 Bacteria n n uncultured bacterium JQ999077 557 17 548 0 99% 99% 257131579 Bacteria n n uncultured bacterium JQ998543 486 5 426 0 97% 97% 257131616 Bacteria n n uncultured bacterium JQ999056 555 18 512 0 94% 94% 257131817 Bacteria n n uncultured bacterium JQ998988 549 5 501 0 93% 93% 257132045 Bacteria n n uncultured bacterium JQ998942 546 17 500 0 95% 95% 257132142 Bacteria n n uncultured 
bacterium
JQ999119 561 18 557 0 100% 100% 257132354 Bacteria n n uncultured bacterium JQ999189 601 1 70 2E-23 97% 97% 257132566 Bacteria n n uncultured bacterium JQ997879 234 34 199 7E-69 96% 96% 257143803 Bacteria n n uncultured bacterium JQ998411 439 5 383 0 99% 99% 257144096 Bacteria n n uncultured bacterium JQ998615 507 15 480 0 99% 99% 257144395 Bacteria n n uncultured bacterium JQ998607 506 18 419 0 100% 100% 257358096 Bacteria n n uncultured bacterium JQ999158 571 5 563 0 97% 97% 258547754 Bacteria n n uncultured bacterium JQ998520 479 18 433 0 99% 99% 258547879 Bacteria n n uncultured bacterium JQ999175 577 17 470 0 97% 97% 258547880 Bacteria n n uncultured bacterium JQ998798 536 19 531 0 90% 90% 258547923 Bacteria n n uncultured bacterium JQ998839 539 5 499 6E-158 88% 88% 258547969 Bacteria n n uncultured bacterium JQ998656 518 17 507 0 98% 98% 258547993 Bacteria n n uncultured bacterium JQ999111 560 18 555 0 99% 99% 258548105 Bacteria n n uncultured bacterium JQ998718 528 5 526 0 99% 99% 258548557 Bacteria n n uncultured bacterium JQ998628 511 17 480 0 100% 100% 258548759 Bacteria n n uncultured bacterium JQ998890 543 5 541 0 90% 90% 258548879 Bacteria n n uncultured bacterium JQ998247 381 3 335 2E-151 96% 96% 258550027 Bacteria n n uncultured bacterium JQ998924 545 5 543 0 100% 100% 258550101 Bacteria n n uncultured bacterium JQ998536 484 1 422 0 94% 94% 258550255 Bacteria n n uncultured bacterium JQ997972 284 18 240 1E-102 98% 98% 258550277 Bacteria n n uncultured bacterium JQ998030 307 5 254 2E-105 95% 95% 258550675 Bacteria n n uncultured bacterium JQ998058 318 13 268 2E-130 100% 100% 258551222 Bacteria n n uncultured bacterium JQ998644 516 15 515 0 93% 93% 259027961 Bacteria n n uncultured bacterium JQ998799 536 282 536 6E-123 98% 98% 259880027 Bacteria n n uncultured bacterium JQ999103 559 17 536 0 96% 96% 259880124 Bacteria n n uncultured bacterium JQ998269 388 5 353 5E-167 98% 98% 259880130 Bacteria n n uncultured bacterium JQ998824 538 25 535 0 97% 97% 
259880134 Bacteria n n uncultured bacterium JQ999162 572 18 506 0 98% 98% 259880150 Bacteria n n uncultured bacterium JQ997912 255 18 219 8E-59 88% 88% 259880207 Bacteria n n uncultured bacterium JQ998399 434 18 386 0 98% 98% 259880216 Bacteria n n uncultured bacterium JQ999120 561 22 556 0 99% 99% 259880659 Bacteria n n uncultured bacterium JQ998698 526 65 496 0 94% 94% 260080805 Bacteria n n uncultured bacterium JQ998800 536 4 536 0 94% 94% 260103508 Bacteria n n uncultured bacterium JQ998787 535 18 529 0 98% 98% 260103512 Bacteria n n uncultured bacterium JQ998856 541 18 485 0 99% 99% 260103513 Bacteria n n uncultured bacterium JQ997875 219 5 161 5E-25 83% 83% 260103519 Bacteria n n uncultured bacterium JQ998487 467 43 402 7E-122 89% 89% 260103520 Bacteria n n uncultured bacterium JQ998567 494 5 411 0 99% 99% 260103525 Bacteria n n uncultured bacterium JQ998614 507 18 433 0 98% 98% 260103526 Bacteria n n uncultured bacterium JQ998989 549 19 498 0 95% 95% 260103537 Bacteria n n uncultured bacterium JQ998699 526 17 522 0 97% 97% 260451472 Bacteria n n uncultured bacterium JQ998757 533 18 529 0 100% 100% 260600086 Bacteria n n uncultured bacterium JQ998482 466 5 428 1E-158 91% 91% 260609733 Bacteria n n uncultured bacterium JQ998825 538 18 533 0 98% 98% 260609736 Bacteria n n uncultured bacterium JQ998891 543 17 540 0 93% 93% 260609741 Bacteria n n uncultured bacterium JQ998909 544 19 543 0 99% 99% 260609766 Bacteria n n uncultured bacterium JQ998752 532 18 530 0 98% 98% 260609798 Bacteria n n uncultured bacterium JQ997905 252 5 204 2E-99 100% 100% 260609860 Bacteria n n uncultured bacterium JQ998957 547 5 341 3E-170 99% 99% 260609871 Bacteria n n uncultured bacterium JQ998910 544 18 541 0 96% 96% 260609902 Bacteria n n uncultured bacterium JQ998233 378 5 331 2E-151 97% 97% 260609911 Bacteria n n uncultured bacterium JQ998211 370 4 335 4E-108 89% 89% 260609943 Bacteria n n uncultured bacterium JQ999142 566 19 541 0 94% 94% 260610053 Bacteria n n uncultured 
bacterium JQ999011 551 5 546 0 98% 98% 260610111 Bacteria n n uncultured bacterium JQ998280 392 18 151 2E-62 100% 100% 260610172 Bacteria n n uncultured bacterium JQ998911 544 15 542 0 97% 97% 260610180 Bacteria n n uncultured bacterium JQ998059 318 18 272 7E-130 100% 100% 260610209 Bacteria n n uncultured bacterium JQ998532 483 18 304 2E-147 100% 100% 260610223 Bacteria n n uncultured bacterium JQ999159 571 17 415 4E-150 92% 92% 260610275 Bacteria n n uncultured bacterium JQ998871 542 17 540 0 98% 98% 260610340 Bacteria n n uncultured bacterium JQ998354 416 24 364 3E-169 99% 99% 261261807 Bacteria n n uncultured bacterium JQ998812 537 17 533 0 100% 100% 261261894 Bacteria n n uncultured bacterium JQ998592 502 18 438 0 99% 99% 261262014 Bacteria n n uncultured bacterium JQ999143 566 23 531 0 100% 100% 261262551 Bacteria n n uncultured bacterium JQ999150 568 5 488 0 100% 100% 261262602 Bacteria n n uncultured bacterium JQ999096 558 24 540 0 97% 97% 261262638 Bacteria n n uncultured bacterium JQ998753 532 18 532 0 99% 99% 261262880 Bacteria n n uncultured bacterium JQ998624 510 39 452 0 96% 96% 261262931 Bacteria n n uncultured bacterium JQ998521 479 18 434 0 100% 100% 261262996 Bacteria n n uncultured bacterium JQ999012 551 4 547 0 91% 91% 261263024 Bacteria n n uncultured bacterium JQ998501 474 5 408 0 100% 100% 261499577 Bacteria n n uncultured bacterium JQ998461 458 20 406 2E-161 94% 94% 261748993 Bacteria n n uncultured bacterium JQ999204 639 5 286 4E-105 92% 92% 262213280 Bacteria n n uncultured bacterium JQ998150 351 70 322 3E-104 94% 94% 262344293 Bacteria n n uncultured bacterium JQ998126 343 18 301 6E-141 99% 99% 262399014 Bacteria n n uncultured bacterium JQ998067 321 17 277 2E-131 100% 100% 264666385 Bacteria n n uncultured bacterium JQ998416 441 16 49 0.0000007 100% 100% 269175391 Bacteria n n uncultured bacterium JQ999136 564 5 561 0 98% 98% 269855868 Bacteria n n uncultured bacterium JQ998508 476 3 322 7E-127 93% 93% 269971070 Bacteria n n uncultured 
bacterium
JQ998530 482 5 432 2E-131 88% 88% 269971816 Bacteria n n uncultured bacterium JQ998047 315 5 245 1E-82 91% 91% 270094636 Bacteria n n uncultured bacterium JQ998497 472 18 87 1E-25 99% 99% 270097664 Bacteria n n uncultured bacterium JQ998622 509 18 464 0 99% 99% 274138243 Bacteria n n uncultured bacterium JQ998639 515 3 478 7E-132 86% 86% 281187385 Bacteria n n uncultured bacterium JQ998872 542 20 57 0.0000003 97% 97% 281187515 Bacteria n n uncultured bacterium JQ998011 300 21 251 3E-113 99% 99% 281308637 Bacteria n n uncultured bacterium JQ999104 559 4 559 0 88% 88% 281333182 Bacteria n n uncultured bacterium JQ998733 530 5 528 0 98% 98% 281413467 Bacteria n n uncultured bacterium JQ999078 557 16 549 0 93% 93% 281484434 Bacteria n n uncultured bacterium JQ998194 365 52 263 5E-97 98% 98% 281487069 Bacteria n n uncultured bacterium JQ998215 371 5 298 1E-87 88% 88% 281487151 Bacteria n n uncultured bacterium JQ998456 456 4 375 1E-133 91% 91% 281487241 Bacteria n n uncultured bacterium JQ998744 531 18 528 0 93% 93% 281488370 Bacteria n n uncultured bacterium JQ998091 328 18 125 8E-45 98% 98% 281488374 Bacteria n n uncultured bacterium JQ997921 262 5 210 2E-99 99% 99% 281488574 Bacteria n n uncultured bacterium JQ998488 467 3 69 2E-22 97% 97% 281489388 Bacteria n n uncultured bacterium JQ998294 397 2 341 1E-173 99% 99% 281489463 Bacteria n n uncultured bacterium JQ998003 298 5 106 1E-23 89% 89% 283130988 Bacteria n n uncultured bacterium JQ999079 557 18 542 0 94% 94% 283765019 Bacteria n n uncultured bacterium JQ998734 530 5 481 2E-173 90% 90% 283765052 Bacteria n n uncultured bacterium JQ998678 523 18 480 0 100% 100% 283776532 Bacteria n n uncultured bacterium JQ999069 556 24 554 0 97% 97% 284025684 Bacteria n n uncultured bacterium JQ998840 539 15 374 3E-180 99% 99% 284025703 Bacteria n n uncultured bacterium JQ999080 557 25 516 0 95% 95% 284158321 Bacteria n n uncultured bacterium JQ998331 408 5 369 3E-169 96% 96% 284158446 Bacteria n n uncultured bacterium JQ997988 
293 18 251 1E-111 98% 98% 284158467 Bacteria n n uncultured bacterium JQ998958 547 21 546 0 99% 99% 284158493 Bacteria n n uncultured bacterium JQ998990 549 17 543 0 99% 99% 284158495 Bacteria n n uncultured bacterium JQ998220 374 5 302 1E-143 98% 98% 284466811 Bacteria n n uncultured bacterium JQ998242 380 5 301 5E-147 99% 99% 284944606 Bacteria n n uncultured bacterium JQ998777 534 18 500 0 93% 93% 285016444 Bacteria n n uncultured bacterium JQ999191 603 1 599 0 90% 90% 285960213 Bacteria n n uncultured bacterium JQ998365 421 17 354 2E-172 99% 99% 285960215 Bacteria n n uncultured bacterium JQ998672 522 5 516 0 96% 96% 285960268 Bacteria n n uncultured bacterium JQ999133 563 9 558 0 97% 97% 285960443 Bacteria n n uncultured bacterium JQ998912 544 8 388 0 99% 99% 285960644 Bacteria n n uncultured bacterium JQ998287 394 5 102 6E-37 97% 97% 285960840 Bacteria n n uncultured bacterium JQ998511 477 5 430 2E-156 91% 91% 285960899 Bacteria n n uncultured bacterium JQ998325 406 18 357 5E-177 100% 100% 288551167 Bacteria n n uncultured bacterium JQ999013 551 5 546 0 99% 99% 288551183 Bacteria n n uncultured bacterium JQ998700 526 5 221 2E-108 100% 100% 289185876 Bacteria n n uncultured bacterium JQ998083 326 19 262 4E-122 100% 100% 289186586 Bacteria n n uncultured bacterium JQ998122 342 17 287 1E-138 100% 100% 289429664 Bacteria n n uncultured bacterium JQ998333 409 21 359 4E-128 92% 92% 289576451 Bacteria n n uncultured bacterium JQ998270 389 5 331 8E-165 99% 99% 289594430 Bacteria n n uncultured bacterium JQ999081 557 5 557 0 93% 93% 289594438 Bacteria n n uncultured bacterium JQ998972 548 7 473 6E-108 82% 82% 290564637 Bacteria n n uncultured bacterium JQ998529 482 18 414 0 99% 99% 290586931 Bacteria n n uncultured bacterium JQ998424 445 10 381 4E-134 90% 90% 290599218 Bacteria n n uncultured bacterium JQ998409 438 111 145 0.0000002 100% 100% 290603331 Bacteria n n uncultured bacterium JQ998281 392 5 320 6E-87 87% 87% 290607302 Bacteria n n uncultured bacterium 
JQ999198 615 136 312 6E-44 87% 87% 290608049 Bacteria n n uncultured bacterium JQ997908 254 14 207 5E-96 100% 100% 290609113 Bacteria n n uncultured bacterium JQ998745 531 40 483 1E-179 93% 93% 290610030 Bacteria n n uncultured bacterium JQ998925 545 41 447 0 95% 95% 290610134 Bacteria n n uncultured bacterium JQ999221 961 5 45 2E-10 100% 100% 290615238 Bacteria n n uncultured bacterium JQ998686 524 69 475 0 96% 96% 290615266 Bacteria n n uncultured bacterium JQ998857 541 34 502 0 96% 96% 290616525 Bacteria n n uncultured bacterium JQ998537 484 3 430 0 100% 100% 290616865 Bacteria n n uncultured bacterium JQ998099 333 144 301 6E-71 98% 98% 290617111 Bacteria n n uncultured bacterium JQ998788 535 229 508 1E-140 99% 99% 290617937 Bacteria n n uncultured bacterium JQ999040 554 19 454 3E-175 92% 92% 290618217 Bacteria n n uncultured bacterium JQ998297 398 16 322 5E-127 94% 94% 290619377 Bacteria n n uncultured bacterium JQ998926 545 24 500 0 93% 93% 290619838 Bacteria n n uncultured bacterium JQ999029 553 29 514 0 94% 94% 290619849 Bacteria n n uncultured bacterium JQ998506 475 18 404 0 99% 99% 290619868 Bacteria n n uncultured bacterium JQ998151 351 5 296 1E-123 95% 95% 290619952 Bacteria n n uncultured bacterium JQ998578 497 11 440 1E-129 87% 87% 290620581 Bacteria n n uncultured bacterium JQ998169 358 24 293 4E-123 97% 97% 290621172 Bacteria n n uncultured bacterium JQ998657 518 109 457 7E-152 95% 95% 290621918 Bacteria n n uncultured bacterium JQ997973 284 18 240 4E-97 96% 96% 290621930 Bacteria n n uncultured bacterium JQ998170 358 5 274 4E-128 98% 98% 290624086 Bacteria n n uncultured bacterium JQ998254 383 20 339 7E-151 97% 97% 290624805 Bacteria n n uncultured bacterium JQ999220 901 18 92 2E-25 97% 97% 290626899 Bacteria n n uncultured bacterium JQ997969 283 18 181 9E-64 94% 94% 290628890 Bacteria n n uncultured bacterium JQ999170 574 17 335 2E-158 99% 99% 290629721 Bacteria n n uncultured bacterium JQ997894 246 18 214 1E-87 97% 97% 290770329 Bacteria n n 
uncultured bacterium JQ999205 655 1 68 6E-24 99% 99% 291060461 Bacteria n n uncultured bacterium
JQ998171 358 18 243 2E-100 96% 96% 291060512 Bacteria n n uncultured bacterium JQ998692 525 17 482 0 100% 100% 291192718 Bacteria n n uncultured bacterium JQ997955 275 5 243 3E-88 92% 92% 291192723 Bacteria n n uncultured bacterium JQ999144 566 5 564 0 97% 97% 291192725 Bacteria n n uncultured bacterium JQ998927 545 7 542 0 99% 99% 291192726 Bacteria n n uncultured bacterium JQ998457 456 18 411 0 97% 97% 291192727 Bacteria n n uncultured bacterium JQ998758 533 18 497 0 94% 94% 291192731 Bacteria n n uncultured bacterium JQ998991 549 17 541 0 97% 97% 291192733 Bacteria n n uncultured bacterium JQ998813 537 18 531 0 99% 99% 291192736 Bacteria n n uncultured bacterium JQ999151 568 17 550 0 94% 94% 291192737 Bacteria n n uncultured bacterium JQ998759 533 17 522 0 98% 98% 291192739 Bacteria n n uncultured bacterium JQ999041 554 8 545 0 94% 94% 291192741 Bacteria n n uncultured bacterium JQ998332 408 9 361 2E-132 91% 91% 291192751 Bacteria n n uncultured bacterium JQ998402 436 5 405 3E-179 95% 95% 291247631 Bacteria n n uncultured bacterium JQ998117 340 18 298 1E-142 100% 100% 291251358 Bacteria n n uncultured bacterium JQ998640 515 5 179 4E-85 100% 100% 291277719 Bacteria n n uncultured bacterium JQ998544 486 5 132 4E-59 100% 100% 291277728 Bacteria n n uncultured bacterium JQ997952 274 2 229 1E-111 99% 99% 291507687 Bacteria n n uncultured bacterium JQ998600 504 17 298 4E-139 99% 99% 291507688 Bacteria n n uncultured bacterium JQ998161 353 4 311 2E-150 98% 98% 294478352 Bacteria n n uncultured bacterium JQ999188 597 17 593 0 92% 92% 294478540 Bacteria n n uncultured bacterium JQ997922 262 5 218 2E-70 90% 90% 294478555 Bacteria n n uncultured bacterium JQ999215 700 16 265 5E-35 80% 80% 294478594 Bacteria n n uncultured bacterium JQ998189 364 5 169 4E-63 94% 94% 294478603 Bacteria n n uncultured bacterium JQ998156 352 17 301 1E-128 96% 96% 294478693 Bacteria n n uncultured bacterium JQ997920 261 5 160 4E-67 97% 97% 294514723 Bacteria n n uncultured bacterium JQ998525 480 
5 423 0 100% 100% 294514757 Bacteria n n uncultured bacterium JQ998760 533 28 421 4E-144 91% 91% 294652611 Bacteria n n uncultured bacterium JQ998512 477 10 370 3E-120 90% 90% 294663720 Bacteria n n uncultured bacterium JQ998343 412 25 367 5E-177 100% 100% 294719638 Bacteria n n uncultured bacterium JQ998186 363 5 289 1E-133 98% 98% 294998181 Bacteria n n uncultured bacterium JQ997902 250 5 147 1E-67 100% 100% 295005427 Bacteria n n uncultured bacterium JQ997895 246 18 102 4E-27 94% 94% 295027798 Bacteria n n uncultured bacterium JQ997941 269 19 223 4E-82 94% 94% 295029932 Bacteria n n uncultured bacterium JQ997948 272 3 220 5E-106 99% 99% 295322327 Bacteria n n uncultured bacterium JQ998125 343 5 299 8E-150 100% 100% 295394073 Bacteria n n uncultured bacterium JQ998735 530 22 487 0 93% 93% 295809979 Bacteria n n uncultured bacterium JQ998049 315 5 283 2E-96 90% 90% 295810133 Bacteria n n uncultured bacterium JQ998557 492 5 423 0 100% 100% 295810494 Bacteria n n uncultured bacterium JQ998192 364 5 300 6E-136 97% 97% 295810605 Bacteria n n uncultured bacterium JQ998568 494 5 437 0 100% 100% 295814828 Bacteria n n uncultured bacterium JQ999500 399 49 354 4E-153 99% 99% 62997524 Bacteria n n uncultured bacterium JQ999496 429 5 392 1E-158 93% 93% 219962362 Bacteria n n uncultured bacterium JQ999495 370 5 337 2E-151 96% 96% 223030862 Bacteria n n uncultured bacterium JQ999498 426 18 394 7E-176 97% 97% 223033601 Bacteria n n uncultured bacterium JQ999499 521 27 473 0 96% 96% 223034843 Bacteria n n uncultured bacterium JQ999497 564 18 527 0 98% 98% 226903461 Bacteria n n uncultured bacterium JQ998382 207 1 194 8E-94 99% 99% 219529249 Bacteria n n uncultured bacterium JQ998439 242 1 242 1E-112 98% 98% 225337346 Bacteria n n uncultured bacterium JQ998474 236 1 236 1E-107 97% 97% 322226064 Bacteria n n uncultured bacterium JQ998513 231 1 218 1E-101 98% 98% 169288388 Bacteria n n uncultured bacterium JQ998789 224 1 224 5E-111 99% 99% 377550288 Bacteria n n uncultured 
bacterium JQ998981 268 1 262 4E-108 95% 95% 322177659 Bacteria n n uncultured bacterium JQ999169 428 1 428 3E-160 91% 91% 322206429 Bacteria n n uncultured bacterium JQ999171 243 1 240 1E-57 86% 86% 325960808 Bacteria n n uncultured bacterium JQ999202 246 1 246 3E-98 94% 94% 296951465 Bacteria n n uncultured bacterium JQ999222 362 18 317 2E-151 99% 99% 105990434 Bacteria n n uncultured candidate division WYO bacterium JQ999223 446 24 381 3E-135 91% 91% 20975333 Bacteria n n uncultured chicken cecal bacterium JQ999224 332 24 287 7E-135 100% 100% 295018012 Bacteria n n uncultured compost bacterium JQ999231 856 5 90 2E-20 90% 90% 295018042 Bacteria n n uncultured compost bacterium JQ999225 471 20 420 0 100% 100% 295018051 Bacteria n n uncultured compost bacterium JQ999227 531 5 529 0 98% 98% 295018127 Bacteria n n uncultured compost bacterium JQ999230 541 155 530 0 98% 98% 295018129 Bacteria n n uncultured compost bacterium JQ999229 539 5 537 0 100% 100% 295018133 Bacteria n n uncultured compost bacterium JQ999228 535 18 517 0 99% 99% 295026878 Bacteria n n uncultured compost bacterium JQ999226 511 17 442 0 100% 100% 295027011 Bacteria n n uncultured compost bacterium JQ999233 528 18 513 0 96% 96% 34525928 Bacteria n n uncultured Gram-positive bacterium JQ999234 540 110 513 2E-143 90% 90% 56541536 Bacteria n n uncultured Gram-positive bacterium JQ999232 501 7 434 1E-159 91% 91% 253721751 Bacteria n n uncultured Gram-positive bacterium JQ999235 247 5 212 3E-102 100% 100% 270305386 Bacteria n n uncultured marine bacterium JQ999239 460 4 381 0 99% 99% 284467687 Bacteria n n uncultured marine bacterium JQ999236 300 16 268 4E-127 100% 100% 284467695 Bacteria n n uncultured marine bacterium JQ999237 398 17 346 9E-170 100% 100% 284467703 Bacteria n n uncultured marine bacterium JQ999238 432 5 385 0 99% 99% 284468021 Bacteria n n uncultured marine bacterium symbiont Acropora eurystoma exposed to pH 8.2 JQ999240 421 18 329 6E-152 98% 98% 150022096 Bacteria n n uncultured rumen 
bacterium JQ999242 484 26 169 4E-64 99% 99% 283981089 Bacteria n n uncultured rumen bacterium JQ999241 469 22 437 6E-177 94% 94% 283982045 Bacteria n n uncultured rumen bacterium JQ999245 382 24 332 4E-158 100% 100% 16517886 Bacteria n n uncultured soil bacterium
JQ999247 474 16 402 7E-117 88% 88% 81022835 Bacteria n n uncultured soil bacterium JQ999250 543 5 538 0 92% 92% 83285238 Bacteria n n uncultured soil bacterium JQ999249 541 5 541 0 96% 96% 87243108 Bacteria n n uncultured soil bacterium JQ999244 331 4 289 7E-135 97% 97% 109391696 Bacteria n n uncultured soil bacterium JQ999246 452 18 412 0 100% 100% 194475432 Bacteria n n uncultured soil bacterium JQ999248 504 5 454 0 96% 96% 194475447 Bacteria n n uncultured soil bacterium JQ999252 555 1 326 7E-167 100% 100% 194475452 Bacteria n n uncultured soil bacterium JQ999243 329 17 293 4E-132 98% 98% 239737157 Bacteria n n uncultured soil bacterium JQ999251 554 18 535 0 99% 99% 260750894 Bacteria n n uncultured soil bacterium JQ999253 324 15 107 5E-37 98% 98% 6018249 Bacteria n n uncultured sponge symbiont PAUC32f JQ999254 352 95 315 7E-61 87% 87% 6453690 Bacteria n n unidentified eubacterium clone BSV70 JQ999255 535 4 516 0 91% 91% 218533752 Bacteria Planctomycetes n uncultured planctomycete JQ999256 543 18 493 0 100% 100% 224027504 Bacteria Proteobacteria (alpha) Caulobacteraceae Brevundimonas sp. AKB-2008-JO46 JQ999257 532 5 424 0 98% 98% 239056019 Bacteria Proteobacteria (alpha) Caulobacteraceae Brevundimonas sp. MCS 35 JQ999258 525 19 471 0 99% 99% 295809779 Bacteria Proteobacteria (alpha) Caulobacteraceae Brevundimonas sp. V3M6 JQ999259 367 16 329 5E-152 98% 98% 288908581 Bacteria Proteobacteria (alpha) Caulobacteraceae Caulobacter sp. cau1 JQ999260 283 5 164 9E-69 97% 97% 254933869 Bacteria Proteobacteria (alpha) n alpha proteobacterium EX129 JQ999261 555 16 550 0 99% 99% 197360272 Bacteria Proteobacteria (alpha) n uncultured alpha proteobacterium JQ998943 546 5 545 0 95% 95% 237973923 Bacteria Proteobacteria (alpha) n uncultured bacterium JQ999262 548 20 529 0 96% 96% 285026366 Bacteria Proteobacteria (alpha) vulgare JQ999263 564 17 554 0 97% 97% 73622359 Bacteria Proteobacteria (alpha) Methylobacteriaceae Methylobacterium sp. 
iRIV1 JQ999264 275 5 243 1E-117 99% 99% 186923312 Bacteria Proteobacteria (alpha) Methylobacteriaceae uncultured Methylobacterium sp. JQ999265 550 18 548 0 99% 99% 242963821 Bacteria Proteobacteria (alpha) uncultured Agrobacterium sp. JQ999266 548 5 543 0 90% 90% 164598042 Bacteria Proteobacteria (alpha) Rhizobiaceae uncultured Rhizobium sp. JQ999267 538 24 331 7E-147 98% 98% 295651547 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. HMD3141 JQ999268 313 10 267 4E-117 97% 97% 38425236 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. J364 JQ999269 571 18 527 0 98% 98% 194395238 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. JLT1284 JQ999270 529 16 528 0 100% 100% 237638489 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. MC5-8 JQ999271 293 5 237 2E-115 100% 100% 294662650 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. PS31_2010_ JQ999272 341 30 293 8E-135 100% 100% 289064903 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. sptzw33 JQ999273 556 3 302 2E-138 97% 97% 139001937 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. SSRW9-1 JQ999274 334 18 190 3E-79 98% 98% 158392748 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus sp. YT0095 JQ999275 386 18 353 1E-158 97% 97% 56266597 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus versutus JQ999276 503 18 457 0 100% 100% 206581410 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus yeei JQ999277 519 13 469 0 100% 100% 125656032 Bacteria Proteobacteria (alpha) Rhodobacteraceae Rhodobacter changlensis JQ999278 531 18 529 0 96% 96% 282934988 Bacteria Proteobacteria (alpha) Rhodobacteraceae Rhodobacter sp. RC5-103 JQ999279 558 17 410 0 99% 99% 71844062 Bacteria Proteobacteria (alpha) Rhodobacteraceae uncultured Amaricoccus sp. JQ999280 558 16 530 0 93% 93% 50364319 Bacteria Proteobacteria (alpha) Rhodobacteraceae uncultured Sulfitobacter sp. 
JQ999281 352 9 286 1E-103 92% 92% 197359977 Bacteria Proteobacteria (alpha) Erythrobacteraceae uncultured Porphyrobacter sp. JQ999282 519 5 474 0 100% 100% 197114179 Bacteria Proteobacteria (alpha) Sphingomonadaceae Sphingomonas dokdonensis JQ999283 271 5 238 5E-116 100% 100% 294992056 Bacteria Proteobacteria (alpha) Sphingomonadaceae Sphingomonas sp. MJ528 JQ999284 388 21 352 2E-135 94% 94% 284434472 Bacteria Proteobacteria (alpha) Sphingomonadaceae Sphingomonas sp. NMC17 JQ999285 574 18 566 0 96% 96% 158935501 Bacteria Proteobacteria (alpha) Sphingomonadaceae Sphingomonas sp. PA218 JQ999286 591 118 263 3E-26 84% 84% 186925070 Bacteria Proteobacteria (alpha) Sphingomonadaceae uncultured Sphingomonadaceae bacterium JQ999287 342 18 295 1E-132 98% 98% 192803975 Bacteria Proteobacteria (alpha) Sphingomonadaceae uncultured Sphingomonas sp. JQ999288 499 24 423 0 98% 98% 209420935 Bacteria Proteobacteria (alpha) Sphingomonadaceae uncultured Sphingomonas sp. JQ999290 552 3 453 3E-161 90% 90% 209421510 Bacteria Proteobacteria (alpha) Sphingomonadaceae uncultured Sphingomonas sp. JQ999289 531 136 525 0 99% 99% 209423311 Bacteria Proteobacteria (alpha) Sphingomonadaceae uncultured Sphingomonas sp. JQ999291 411 5 246 2E-111 98% 98% 44194288 Bacteria Proteobacteria (beta) Burkholderia sp. JQ999292 562 18 558 0 97% 97% 118139431 Bacteria Proteobacteria (beta) Burkholderiaceae Burkholderia sp. Brij35 JQ999293 556 5 357 1E-104 88% 88% 246771317 Bacteria Proteobacteria (beta) Burkholderiaceae Burkholderia sp. LD-11 JQ999296 465 4 295 3E-145 99% 99% 254972620 Bacteria Proteobacteria (beta) Burkholderiaceae uncultured Burkholderia sp. JQ999295 430 5 190 1E-89 99% 99% 295656441 Bacteria Proteobacteria (beta) Burkholderiaceae uncultured Burkholderia sp. 
JQ999297 542 18 475 9E-161 91% 91% 168812099 Bacteria Proteobacteria (beta) Comamonadaceae Acidovorax defluvii JQ999298 456 1 261 2E-131 100% 100% 109659435 Bacteria Proteobacteria (beta) Comamonadaceae Caldimonas hydrothermale JQ999299 539 5 534 0 100% 100% 189047087 Bacteria Proteobacteria (beta) Comamonadaceae Caldimonas manganoxidans JQ999300 538 18 535 0 99% 99% 290350907 Bacteria Proteobacteria (beta) Comamonadaceae Comamonadaceae bacterium Gu-R-8 JQ999302 532 18 532 0 98% 98% 5007060 Bacteria Proteobacteria (beta) Comamonadaceae Delftia acidovorans JQ999301 264 5 219 5E-91 95% 95% 15529695 Bacteria Proteobacteria (beta) Comamonadaceae Delftia acidovorans JQ999303 569 17 568 0 100% 100% 213536827 Bacteria Proteobacteria (beta) Comamonadaceae Delftia acidovorans JQ999304 345 69 301 6E-111 98% 98% 282892628 Bacteria Proteobacteria (beta) Comamonadaceae Diaphorobacter sp. DNB7 JQ999305 252 4 206 4E-101 100% 100% 294828897 Bacteria Proteobacteria (beta) Comamonadaceae uncultured Polaromonas sp. JQ999306 502 15 468 0 96% 96% 153869451 Bacteria Proteobacteria (beta) Oxalobacteraceae Herbaspirillum huttiense JQ999307 412 25 366 2E-176 100% 100% 62183809 Bacteria Proteobacteria (beta) Oxalobacteraceae Herbaspirillum sp. B601 JQ999308 546 117 521 1E-159 92% 92% 121488021 Bacteria Proteobacteria (beta) Sutterellaceae Sutterella morbirenis JQ999309 551 6 548 0 98% 98% 215981578 Bacteria Proteobacteria (beta) Sutterellaceae uncultured Sutterella sp. JQ999310 443 18 373 0 99% 99% 295656273 Bacteria Proteobacteria (beta) n beta proteobacterium enrichment culture clone VNAB098 JQ999311 527 15 522 0 99% 99% 85002019 Bacteria Proteobacteria (beta) n Denitrobacter sp. 
BBTR53 JQ999313 540 19 534 0 100% 100% 154192572 Bacteria Proteobacteria (beta) n uncultured beta proteobacterium JQ999312 426 5 367 6E-177 98% 98% 219880888 Bacteria Proteobacteria (beta) n uncultured beta proteobacterium JQ999314 526 18 524 0 99% 99% 290759918 Bacteria Proteobacteria (beta) Neisseria flava JQ999315 544 3 544 0 96% 96% 60500789 Bacteria Proteobacteria (beta) Neisseriaceae uncultured Neisseria sp. JQ999316 547 22 547 0 99% 99% 238915008 Bacteria Proteobacteria (beta) Neisseriaceae uncultured Neisseria sp. JQ999317 535 18 529 0 90% 90% 223036385 Bacteria Proteobacteria (beta) Nitrosomonadaceae uncultured Nitrosomonas sp. JQ999494 222 1 222 8E-104 98% 98% 34604519 Bacteria Proteobacteria (delta) Bacteriovorax sp. EPC3 JQ999318 550 5 508 0 94% 94% 34604519 Bacteria Proteobacteria (delta) Bacteriovoracaceae Bacteriovorax sp. EPC3 JQ999319 464 18 431 1E-144 89% 89% 284428338 Bacteria Proteobacteria (delta) Pelobacteraceae uncultured Pelobacter sp. JQ999320 556 18 548 0 98% 98% 61201821 Bacteria Proteobacteria (delta) n uncultured Myxococcales bacterium

JQ998693 525 18 502 0 95% 95% 237948946 Bacteria Proteobacteria (delta) n uncultured bacterium JQ999321 647 4 333 5E-154 97% 97% 290759912 Bacteria Proteobacteria (epsilon) Campylobacteraceae Campylobacter concisus JQ999322 336 4 287 2E-131 97% 97% 41400321 Bacteria Proteobacteria (epsilon) Helicobacteraceae uncultured Helicobacter sp. JQ999324 394 24 346 2E-112 90% 90% 254772196 Bacteria Proteobacteria (gamma) Aestuariibacter sp. PaD1.07 JQ999325 334 8 261 2E-106 95% 95% 219846182 Bacteria Proteobacteria (gamma) Alteromonadaceae Microbulbifer maritimus JQ999326 463 18 390 0 98% 98% 239775314 Bacteria Proteobacteria (gamma) Pseudoalteromonadaceae Pseudoalteromonas sp. Ld19 JQ999328 423 5 349 2E-147 95% 95% 257123155 Bacteria Proteobacteria (gamma) Chromatiaceae Rheinheimera sp. HMD2012 JQ999329 244 18 202 4E-91 100% 100% 220936495 Bacteria Proteobacteria (gamma) Enterobacteriaceae Klebsiella sp. C611 JQ999330 278 5 225 5E-111 100% 100% 33334429 Bacteria Proteobacteria (gamma) Enterobacteriaceae primary endosymbiont of Sitophilus zeamais JQ999331 244 6 182 3E-82 98% 98% 215981570 Bacteria Proteobacteria (gamma) Enterobacteriaceae uncultured Shigella sp. 
JQ999332 612 306 601 2E-84 87% 87% 194368434 Bacteria Proteobacteria (gamma) n gamma proteobacterium B-WPhS5 JQ999333 597 12 61 1E-15 100% 100% 238623540 Bacteria Proteobacteria (gamma) n gamma proteobacterium C48 UNDR-2009 JQ999334 386 5 258 3E-100 93% 93% 291195498 Bacteria Proteobacteria (gamma) n gamma proteobacterium enrichment culture clone BF35-3 JQ999335 354 18 271 1E-122 98% 98% 291195511 Bacteria Proteobacteria (gamma) n gamma proteobacterium enrichment culture clone JF9 3 JQ999343 510 18 463 0 100% 100% 22135588 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999341 505 14 446 3E-165 91% 91% 90995220 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999342 507 18 449 0 95% 95% 148615128 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999344 524 5 521 0 96% 96% 148723984 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999346 569 18 565 0 98% 98% 152003521 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999340 478 18 425 5E-168 93% 93% 218533821 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999347 721 1 65 3E-22 98% 98% 225031789 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999337 320 5 276 6E-116 95% 95% 238953195 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999338 383 18 363 4E-173 99% 99% 239835504 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999345 526 5 474 0 100% 100% 263040682 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999336 286 18 248 1E-116 100% 100% 295791645 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium JQ999348 565 1 559 0 91% 91% 14334262 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium MB11B08 JQ999349 541 5 538 0 99% 99% 283486725 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. 
2029 JQ999350 351 18 307 5E-132 96% 96% 283486726 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. 2034 JQ999351 272 18 232 1E-107 100% 100% 295393594 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. AS-11 JQ999352 335 5 164 2E-75 99% 99% 285027202 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. G5 1-2 JQ999353 527 89 523 5E-173 92% 92% 290457130 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. JW2.4a JQ999354 252 5 197 1E-91 99% 99% 186702557 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. NY93B JQ999355 307 16 258 2E-81 90% 90% 158933844 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. VB93 JQ999356 272 5 220 1E-106 100% 100% 32127599 Bacteria Proteobacteria (gamma) Halomonadaceae uncultured Halomonas sp. JQ999357 446 16 407 3E-179 96% 96% 52424026 Bacteria Proteobacteria (gamma) Halomonadaceae uncultured Halomonas sp. JQ999358 579 19 572 0 99% 99% 290759922 Bacteria Proteobacteria (gamma) Haemophilus haemolyticus JQ999359 538 170 534 7E-167 96% 96% 290759923 Bacteria Proteobacteria (gamma) Pasteurellaceae Haemophilus parahaemolyticus JQ999362 360 18 315 6E-151 99% 99% 289186780 Bacteria Proteobacteria (gamma) Acinetobacter sp. QT15 JQ999363 391 18 346 3E-164 99% 99% 228007453 Bacteria Proteobacteria (gamma) Moraxellaceae Enhydrobacter sp. KB3-12 JQ999364 573 15 572 0 90% 90% 284434468 Bacteria Proteobacteria (gamma) Moraxellaceae Enhydrobacter sp. 
NMC13 JQ999365 659 51 211 3E-52 91% 91% 158343584 Bacteria Proteobacteria (gamma) Moraxellaceae atlantae JQ999366 562 17 561 1E-179 88% 88% 102620805 Bacteria Proteobacteria (gamma) Moraxellaceae JQ999367 584 18 555 0 98% 98% 98975329 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella bovoculi JQ999368 587 18 586 0 98% 98% 98975330 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella bovoculi JQ999369 561 18 555 0 94% 94% 213536823 Bacteria Proteobacteria (gamma) Moraxellaceae JQ999370 262 18 186 2E-80 99% 99% 285192491 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella osloensis JQ999371 711 14 390 3E-147 92% 92% 102620806 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella ovis JQ999372 531 24 531 0 100% 100% 294992072 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella sp. MJ616 JQ999373 382 14 335 3E-149 97% 97% 198404097 Bacteria Proteobacteria (gamma) Moraxellaceae Moraxella sp. WPCB001 JQ999374 362 17 305 5E-147 100% 100% 164506999 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter faecalis JQ999375 535 42 534 0 100% 100% 38601962 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter frigidicola JQ999376 508 18 457 0 99% 99% 168812014 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter maritimus JQ999377 543 17 540 0 98% 98% 240129723 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter maritimus JQ999378 298 5 198 1E-27 80% 80% 6691640 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter pacificensis JQ999379 474 17 408 8E-166 94% 94% 85001939 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter psychrophilus JQ999381 531 7 465 0 97% 97% 121483518 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter pulmonis JQ999380 415 4 367 0 100% 100% 284813477 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter pulmonis JQ999382 537 5 415 0 99% 99% 259121295 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. 
Air226 JQ999383 531 5 531 0 94% 94% 82547996 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. B-3151 JQ999384 496 23 462 1E-178 93% 93% 116688012 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. B6 JQ999385 485 5 432 0 100% 100% 291072775 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. BSw20995 JQ999386 472 17 409 0 99% 99% 291072776 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. BSw21055 JQ999387 523 18 522 0 93% 93% 291072779 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. BSw21072 JQ999388 540 3 460 0 100% 100% 291293761 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. enrichment culture clone B1-3 JQ999389 548 22 548 0 96% 96% 291293763 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. enrichment culture clone B2-3 JQ999390 523 17 520 0 100% 100% 291293769 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. enrichment culture clone B5-2 JQ999391 406 19 358 2E-175 100% 100% 34525843 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. es9 JQ999392 361 10 310 1E-152 99% 99% 294997060 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. JT05 JQ999393 542 18 523 0 90% 90% 133740735 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. Nj-36 JQ999394 536 5 532 0 100% 100% 158562692 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. NP43 JQ999395 509 18 476 0 100% 100% 154103703 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. P11-B-9 JQ999396 312 2 249 3E-124 100% 100% 218203793 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. SCSWD17 JQ999397 442 3 302 2E-137 97% 97% 196050561 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. ThV-A JQ999398 477 8 414 0 99% 99% 89257986 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. TSBY-37 JQ999399 301 18 268 5E-126 100% 100% 225055455 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. 
UST050418-045 JQ999400 555 5 540 0 95% 95% 209421415 Bacteria Proteobacteria (gamma) Moraxellaceae uncultured Enhydrobacter sp. JQ999401 419 4 364 7E-166 96% 96% 186926363 Bacteria Proteobacteria (gamma) Moraxellaceae uncultured Moraxellaceae bacterium

JQ999402 320 36 161 2E-51 97% 97% 78354988 Bacteria Proteobacteria (gamma) Moraxellaceae uncultured Psychrobacter sp. JQ999403 468 2 360 1E-114 89% 89% 284010071 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonadaceae bacterium IZ2 JQ999406 270 157 239 2E-34 100% 100% 1907095 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas asplenii JQ999407 550 5 525 0 99% 99% 22217940 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas cf. stutzeri V4.MO.16 JQ999409 526 18 505 0 93% 93% 293628582 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas lutea JQ999410 552 18 445 0 100% 100% 294999830 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas panipatensis JQ999418 463 5 404 7E-117 88% 88% 52139970 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 14III/A01/008 JQ999419 538 17 510 0 100% 100% 195969270 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 47 JQ999420 408 36 375 2E-157 96% 96% 15778356 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 5.1 JQ999421 468 5 371 0 99% 99% 289655714 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 64B-38 JQ999422 539 17 481 0 97% 97% 78038858 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 7325 JQ999423 432 12 288 8E-136 99% 99% 28932767 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. A_wp02211 JQ999424 524 15 484 0 95% 95% 295027170 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. ADR45 JQ999425 503 20 431 7E-172 94% 94% 295027184 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. ADR62 JQ999426 509 4 460 0 100% 100% 195364246 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. B6_2008_ JQ999427 562 186 528 2E-142 94% 94% 116248064 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. BL4 JQ999428 306 18 267 4E-117 98% 98% 164419540 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 
BSi20432 JQ999429 551 17 489 6E-163 90% 90% 239829379 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. c246 JQ999430 271 15 182 6E-80 99% 99% 151564445 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. EGU641 JQ999431 539 24 535 0 99% 99% 295345518 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. enrichment culture clone 13.1 JQ999432 373 5 325 2E-166 100% 100% 117551063 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. G1016 JQ999433 538 3 535 0 100% 100% 294861179 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. GN33-1 JQ999434 504 15 453 0 94% 94% 117582528 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. Lin 2-2 JQ999435 555 5 532 0 97% 97% 270048071 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. ljh-6 JQ999436 472 5 398 2E-176 96% 96% 241911410 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. PUT JQ999437 541 7 449 1E-164 91% 91% 290751426 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. R-27204 JQ999438 520 18 480 0 100% 100% 152061209 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. SCT JQ999439 350 5 303 1E-83 86% 86% 21327145 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. SoO1 JQ999440 556 32 555 0 97% 97% 283837557 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. StFLB155 JQ999441 528 15 514 0 99% 99% 284999744 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. SU19 JQ999442 563 18 560 0 100% 100% 158699333 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. SY6 JQ999443 307 4 269 5E-136 100% 100% 259090477 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. VS05_10 JQ999444 550 8 530 0 97% 97% 259090490 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. VS05_24 JQ999445 537 18 529 0 94% 94% 34525812 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 
wp17 JQ999446 301 18 261 1E-121 100% 100% 295003953 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. WP6 JQ999447 504 17 467 0 93% 93% 213493549 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. WR8-43 JQ999448 464 5 408 4E-114 87% 87% 218158071 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. X31 JQ999449 446 23 423 0 100% 100% 226815632 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. ZR1-10 JQ999471 542 5 539 0 99% 99% 254621816 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas xanthomarina JQ999473 557 5 554 0 98% 98% 254587030 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonadaceae bacterium JQ999472 541 5 340 6E-163 98% 98% 254587124 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonadaceae bacterium JQ999478 452 39 401 2E-166 96% 96% 32346517 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999475 351 5 305 5E-152 99% 99% 46253495 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999477 431 18 386 0 100% 100% 151564597 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999480 530 5 504 2E-177 90% 90% 154757022 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999474 331 15 158 6E-66 99% 99% 209421463 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999479 502 18 466 0 96% 96% 257072692 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999476 411 5 377 0 100% 100% 283580095 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999481 554 17 550 0 100% 100% 283979647 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. JQ999503 439 18 174 3E-75 100% 100% 157367007 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. 
JQ999482 451 17 376 5E-168 97% 97% 162317487 Bacteria Proteobacteria (gamma) Piscirickettsiaceae uncultured Methylophaga sp. JQ999483 496 5 447 0 100% 100% 295815419 Bacteria Proteobacteria (gamma) Stenotrophomonas sp. I_B14 JQ999484 670 29 104 4E-26 97% 97% 254039381 Bacteria Proteobacteria (gamma) Xanthomonadaceae uncultured Frateuria sp. JQ999485 441 18 386 8E-156 94% 94% 290873614 Bacteria Proteobacteria (gamma) Xanthomonadaceae uncultured Stenotrophomonas sp. JQ999486 406 1 371 3E-125 89% 89% 295148942 Bacteria Proteobacteria (gamma) Xanthomonadaceae Xanthomonas axonopodis JQ999487 535 1 531 0 97% 97% 255339862 Bacteria Proteobacteria (symbiont) n proteobacterium symbiont of Nilaparvata lugens JQ999491 465 28 431 2E-172 94% 94% 89889088 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium JQ999490 418 18 374 0 99% 99% 184190210 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium JQ999488 282 7 223 8E-109 100% 100% 262409981 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium JQ999489 371 5 326 6E-141 95% 95% 262410007 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium JQ999492 284 146 197 1E-08 91% 91% 111283638 Bacteria Tenericutes Acholeplasmataceae Acholeplasma exanthum JQ999605 564 15 559 4E-174 88% 88% 167859741 Eukaryota Amoebozoa Nolandella sp. ATCC 50913 JQ999598 541 18 538 0 98% 98% 52082731 Eukaryota Arthropoda Entomobrya dorsosignata JQ999599 532 180 523 3E-111 89% 89% 68144261 Eukaryota Arthropoda Entomobryidae Sinella curviseta JQ999600 313 80 250 5E-57 91% 91% 224831607 Eukaryota Arthropoda Mycetophilidae Boletina plana JQ999601 537 18 536 0 89% 89% 532978 Eukaryota Arthropoda Diaspididae Aonidiella aurantii JQ999568 445 13 312 8E-146 98% 98% 284159116 Eukaryota Ascomycota Mycosphaerellaceae Mycosphaerellaceae sp. CPC 12304 JQ999575 532 5 208 7E-43 85% 85% 293630602 Eukaryota Ascomycota n Dothideomycetes sp. 
TRN 153 JQ999570 572 5 572 0 99% 99% 169893764 Eukaryota Ascomycota Didymellaceae Boeremia exigua JQ999571 568 17 552 0 99% 99% 31415568 Eukaryota Ascomycota Didymellaceae Peyronellaea glomerata JQ999572 580 17 550 0 98% 98% 294987147 Eukaryota Ascomycota Didymellaceae Phoma vasinfecta JQ999573 302 5 166 2E-76 99% 99% 1888314 Eukaryota Ascomycota Leptosphaeriaceae Leptosphaeria doliolum JQ999574 502 17 440 0 100% 100% 288557599 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria sp. UZK JQ999576 386 5 340 9E-150 96% 96% 225134682 Eukaryota Ascomycota Trichocomaceae Byssochlamys spectabilis JQ999578 591 18 589 0 98% 98% 198250453 Eukaryota Ascomycota Helotiaceae Articulospora tetracladia JQ999579 573 17 557 0 99% 99% 171673225 Eukaryota Ascomycota n uncultured Ascomycota JQ999910 437 166 390 8E-101 97% 97% 295419269 Eukaryota Ascomycota n Candida orthopsilosis JQ999580 330 3 280 6E-141 100% 100% 6537145 Eukaryota Ascomycota Plectosphaerellaceae Verticillium dahliae JQ999581 573 18 568 0 100% 100% 254028318 Eukaryota Ascomycota n Nigrospora sp. SGSGf13

JQ999604 335 1 290 3E-144 99% 99% 98990721 Eukaryota Bacillariophyta Thalassiosiraceae Stephanodiscus sp. FHTC11 JQ999582 567 24 564 0 99% 99% 117168475 Eukaryota Basidiomycota Corticiaceae Sistotrema brinkmannii JQ999583 446 17 399 0 99% 99% 109452380 Eukaryota Basidiomycota n Rhodotorula lamellibrachiae JQ999584 579 24 506 0 98% 98% 111283858 Eukaryota Basidiomycota n Sakaguchia dacryoidea JQ999585 551 49 532 0 95% 95% 254927417 Eukaryota Basidiomycota n Leucosporidium sp. AY30 JQ999586 289 18 244 2E-94 95% 95% 225134685 Eukaryota Basidiomycota n Rhodotorula glutinis JQ999587 558 24 553 0 99% 99% 260279026 Eukaryota Basidiomycota n Dioszegia rishiriensis JQ999588 428 5 354 1E-173 99% 99% 124377860 Eukaryota Basidiomycota Tremellaceae Cryptococcus neoformans JQ999612 549 21 540 0 97% 97% 183206236 Eukaryota Chlorophyta Microsporaceae Microspora stagnorum JQ999610 544 18 544 0 99% 99% 5566332 Eukaryota Ciliophora Urostylidae Uroleptus pisces JQ999611 537 14 518 0 99% 99% 46019696 Eukaryota Heterokontophyta Botryidiopsidaceae Botrydiopsis constricta JQ999589 542 23 488 0 100% 100% 219563746 Eukaryota n n fungal sp. 
M222 JQ999608 556 24 531 0 98% 98% 194031911 Eukaryota n n uncultured eukaryote JQ999609 556 18 62 3E-11 98% 98% 291258337 Eukaryota n n uncultured eukaryote JQ999606 385 5 335 2E-170 100% 100% 198444245 Eukaryota n n uncultured eukaryote JQ999607 581 14 578 0 98% 98% 198444246 Eukaryota n n uncultured eukaryote JQ999592 550 18 162 3E-36 88% 88% 151413658 Eukaryota n n uncultured fungus JQ999593 550 23 550 0 98% 98% 189418598 Eukaryota n n uncultured fungus JQ999590 330 5 285 3E-144 100% 100% 234195395 Eukaryota n n uncultured fungus JQ999594 581 3 558 0 99% 99% 262358083 Eukaryota n n uncultured fungus JQ999591 385 5 328 4E-168 100% 100% 289470189 Eukaryota n n uncultured fungus JQ999595 581 17 575 0 98% 98% 291551855 Eukaryota n n uncultured fungus JQ999596 339 5 278 2E-136 99% 99% 157955968 Eukaryota n n uncultured marine fungus JQ999597 538 14 525 0 95% 95% 291172885 Eukaryota n n uncultured soil fungus JQ999603 527 17 523 0 98% 98% 259129963 Eukaryota Rotifera Adinetidae Adineta vaga JQ999613 489 5 126 1E-54 99% 99% 161621688 Eukaryota Streptophyta Cupressaceae Cupressus tonkinensis JQ999614 537 17 537 0 97% 97% 170516205 Eukaryota Streptophyta Taxaceae Taxus wallichiana JQ999615 395 19 294 4E-59 82% 82% 6752453 Eukaryota Streptophyta Orchidaceae Isotria verticillata JQ999616 267 3 212 3E-103 100% 100% 45386022 Eukaryota Streptophyta Poaceae Hordeum vulgare JQ999617 339 5 273 3E-134 99% 99% 166084404 Eukaryota Streptophyta Caprifoliaceae Caprifoliaceae environmental sample JQ999618 325 5 277 4E-137 99% 99% 290782507 Eukaryota Streptophyta Calpurnia aurea JQ999619 419 23 369 3E-154 96% 96% 22033 Eukaryota Streptophyta Fabaceae Vicia faba JQ999621 540 6 530 0 98% 98% 6688957 Eukaryota Streptophyta Plantaginaceae Plantago lanceolata JQ999623 551 5 545 0 98% 98% 1777739 Eukaryota Streptophyta Ulmaceae Zelkova serrata JQ999624 544 24 529 0 95% 95% 7595575 Eukaryota Streptophyta Xanthoceraceae Xanthoceras sorbifolium

Table S2. Large subunit rRNA genes of Bacteria and Eukarya from V5. Taxonomic affiliations not found in NCBI GenBank are marked "n".

Columns: Accession num, Q length, Q start, Q end, e-value, %-ident, %-sim, GI number, Domain, Phylum, Family, Genus / Species
JQ997197 529 30 519 3E-165 89% 89% 48728139 Bacteria Actinobacteria Frankiaceae uncultured Frankia sp. JQ997198 732 17 128 2E-35 93% 93% 48728167 Bacteria Actinobacteria Frankiaceae uncultured Frankia sp. JQ997196 521 26 506 4E-95 81% 81% 48728178 Bacteria Actinobacteria Frankiaceae uncultured Frankia sp. JQ997274 369 8 54 4E-14 100% 100% 289551862 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium abscessus JQ999637 486 5 321 7E-62 82% 82% 269314044 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium immunogenum JQ999638 554 17 509 0 92% 92% 44368 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium kansasii JQ999639 552 18 455 0 93% 93% 196174916 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium shottsii JQ997284 598 5 598 0 90% 90% 2414571 Bacteria Actinobacteria Propionibacteriaceae Propionibacterium freudenreichii JQ999640 567 14 560 8E-152 85% 85% 6714990 Bacteria Actinobacteria Actinoallomurus spadix JQ997287 501 8 306 4E-119 93% 93% 5901576 Bacteria Actinobacteria Thermomonosporaceae Thermomonospora chromogena JQ999641 332 26 295 8E-115 95% 95% 291045144 Bacteria Actinobacteria Bifidobacteriaceae Bifidobacterium bifidum JQ999642 349 19 255 5E-82 90% 90% 30313593 Bacteria Bacteroidetes Bacteroidaceae Bacteroides caccae JQ997308 588 20 582 0 90% 90% 213536826 Bacteria Bacteroidetes Bacteroidaceae Bacteroides ovatus JQ999643 565 19 563 0 89% 89% 30313596 Bacteria Bacteroidetes Bacteroidaceae Bacteroides stercoris JQ997309 589 17 587 0 88% 88% 213536825 Bacteria Bacteroidetes Bacteroidaceae Bacteroides vulgatus JQ999644 301 83 269 2E-55 88% 88% 56744982 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ999645 347 55 295 2E-90 92% 92% 56744983 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium JQ999646 552 40 550 0 90% 90% 30313598 Bacteria Bacteroidetes Porphyromonadaceae Parabacteroides merdae JQ999829 266 24 226 1E-77 
93% 93% 38374131 Bacteria Bacteroidetes Cytophagaceae Flexibacter flexilis JQ999647 666 129 385 3E-92 91% 91% 46409892 Bacteria Cyanobacteria n Gloeobacter violaceus JQ999648 500 65 449 2E-146 91% 91% 46409882 Bacteria Cyanobacteria n Euhalothece sp. BAA001 JQ997361 360 18 327 9E-85 86% 86% 222138150 Bacteria Cyanobacteria n Gloeocapsopsis crepidinum JQ999649 291 20 259 7E-110 97% 97% 90186509 Bacteria Cyanobacteria n Synechococcus sp. C9 JQ997393 625 18 621 0 90% 90% 225696243 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997384 515 4 438 9E-176 93% 93% 225696245 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997375 383 5 246 1E-88 93% 93% 227072227 Bacteria Cyanobacteria n uncultured cyanobacterium JQ997374 361 18 307 8E-135 97% 97% 256692872 Bacteria Cyanobacteria n uncultured cyanobacterium JQ999650 558 26 555 0 97% 97% 46409901 Bacteria Cyanobacteria n Leptolyngbya boryana JQ999651 581 15 530 0 97% 97% 46409896 Bacteria Cyanobacteria n Leptolyngbya sp. PCC 7104 JQ997443 551 5 545 0 93% 93% 149364162 Bacteria Cyanobacteria n Microcoleus vaginatus JQ999652 588 2 583 0 96% 96% 46409893 Bacteria Cyanobacteria n Oscillatoria sp. PCC 6506 JQ999653 516 5 474 1E-168 90% 90% 46409900 Bacteria Cyanobacteria n Plectonema terebrans JQ999654 356 53 154 3E-29 91% 91% 281376717 Bacteria Firmicutes Alicyclobacillaceae sp. 
Z27 JQ999745 223 4 209 2E-69 91% 91% 347582959 Bacteria Firmicutes Bacillaceae Bacillus JQ999655 442 10 327 1E-113 91% 91% 20126656 Bacteria Firmicutes Bacillaceae Bacillus cereus JQ999656 587 14 587 0 94% 94% 241896768 Bacteria Firmicutes Bacillaceae Bacillus licheniformis JQ999657 525 13 466 0 95% 95% 152211809 Bacteria Firmicutes Listeriaceae grayi JQ999658 548 50 548 8E-97 81% 81% 152211814 Bacteria Firmicutes Listeriaceae JQ999659 540 4 539 2E-138 84% 84% 296465 Bacteria Firmicutes Listeriaceae JQ999660 571 6 561 0 89% 89% 296416 Bacteria Firmicutes Planococcaceae Sporosarcina globispora JQ999665 579 18 575 0 97% 97% 93463980 Bacteria Firmicutes Staphylococcaceae Staphylococcus xylosus JQ999666 506 5 450 2E-157 90% 90% 11342510 Bacteria Firmicutes Enterococcaceae asini JQ999667 564 5 518 0 94% 94% 11342519 Bacteria Firmicutes Enterococcaceae Enterococcus casseliflavus JQ999668 549 18 496 0 94% 94% 11342513 Bacteria Firmicutes Enterococcaceae Enterococcus cecorum JQ999669 550 8 541 0 94% 94% 11342514 Bacteria Firmicutes Enterococcaceae Enterococcus columbae JQ999670 577 5 574 0 93% 93% 11342515 Bacteria Firmicutes Enterococcaceae Enterococcus dispar JQ999671 435 18 379 2E-146 93% 93% 183673657 Bacteria Firmicutes Enterococcaceae Enterococcus faecium JQ999672 506 7 439 0 94% 94% 183673673 Bacteria Firmicutes Enterococcaceae Enterococcus faecium JQ999674 616 13 193 2E-33 83% 83% 11078579 Bacteria Firmicutes Enterococcaceae Enterococcus gallinarum JQ999673 564 1 561 0 90% 90% 11342520 Bacteria Firmicutes Enterococcaceae Enterococcus gallinarum JQ999675 563 41 561 7E-177 89% 89% 11342522 Bacteria Firmicutes Enterococcaceae Enterococcus malodoratus JQ999676 570 5 505 0 91% 91% 11342523 Bacteria Firmicutes Enterococcaceae Enterococcus mundtii JQ999677 361 139 329 1E-77 95% 95% 11342524 Bacteria Firmicutes Enterococcaceae Enterococcus pseudoavium JQ999678 552 18 551 0 95% 95% 11342525 Bacteria Firmicutes Enterococcaceae Enterococcus raffinosus JQ999679 462 5 425 
6E-167 92% 92% 11342526 Bacteria Firmicutes Enterococcaceae Enterococcus saccharolyticus JQ999680 561 21 378 2E-138 92% 92% 11342528 Bacteria Firmicutes Enterococcaceae Enterococcus sulfureus JQ999681 380 19 285 9E-95 91% 91% 11342596 Bacteria Firmicutes Enterococcaceae Melissococcus plutonius JQ999682 447 153 398 1E-114 98% 98% 11342527 Bacteria Firmicutes Enterococcaceae Tetragenococcus solitarius JQ999683 243 80 219 8E-54 94% 94% 164664890 Bacteria Firmicutes Enterococcaceae uncultured Enterococcus sp. JQ999684 362 17 120 3E-39 96% 96% 164664896 Bacteria Firmicutes Enterococcaceae uncultured Enterococcus sp. JQ997623 582 17 538 0 99% 99% 50080123 Bacteria Firmicutes Lactobacillaceae Lactobacillus animalis JQ997624 539 18 277 8E-77 87% 87% 190360905 Bacteria Firmicutes Lactobacillaceae Lactobacillus brevis JQ997651 537 23 534 0 94% 94% 50080122 Bacteria Firmicutes Lactobacillaceae Lactobacillus murinus JQ999685 375 18 238 1E-107 99% 99% 167046809 Bacteria Firmicutes Lactobacillaceae claussenii JQ999686 557 18 490 8E-97 81% 81% 167046804 Bacteria Firmicutes Lactobacillaceae Pediococcus parvulus JQ999687 552 18 547 1E-109 81% 81% 167046808 Bacteria Firmicutes Lactobacillaceae Pediococcus pentosaceus JQ999688 386 24 337 5E-127 93% 93% 167047104 Bacteria Firmicutes Lactobacillaceae Pediococcus stilesii JQ999689 557 5 535 8E-152 85% 85% 45597358 Bacteria Firmicutes Streptococcaceae Streptococcus canis JQ999690 586 4 231 2E-83 91% 91% 2897684 Bacteria Firmicutes Streptococcaceae Streptococcus constellatus JQ999692 513 18 449 0 94% 94% 25396587 Bacteria Firmicutes Streptococcaceae Streptococcus dysgalactiae JQ999691 328 54 190 1E-58 98% 98% 281331096 Bacteria Firmicutes Streptococcaceae Streptococcus dysgalactiae JQ999693 545 5 539 0 91% 91% 281331108 Bacteria Firmicutes Streptococcaceae Streptococcus dysgalactiae JQ999694 527 5 341 2E-148 95% 95% 45597360 Bacteria Firmicutes Streptococcaceae Streptococcus equi JQ999695 546 18 505 0 98% 98% 45597357 Bacteria Firmicutes 
Streptococcaceae Streptococcus equinus

JQ999696 570 5 558 0 93% 93% 45597361 Bacteria Firmicutes Streptococcaceae Streptococcus equinus JQ999697 499 2 467 0 93% 93% 45597366 Bacteria Firmicutes Streptococcaceae Streptococcus hyointestinalis JQ997695 526 201 486 7E-147 100% 100% 11991762 Bacteria Firmicutes Streptococcaceae Streptococcus mutans JQ997698 576 4 570 0 92% 92% 213536840 Bacteria Firmicutes Streptococcaceae Streptococcus mutans JQ999698 594 5 556 0 89% 89% 288522 Bacteria Firmicutes Streptococcaceae Streptococcus oralis JQ999699 434 16 395 7E-176 96% 96% 433515 Bacteria Firmicutes Streptococcaceae Streptococcus parauberis JQ999880 262 1 262 2E-101 93% 93% 160426828 Bacteria Firmicutes Clostridiaceae Clostridium JQ997801 291 5 225 5E-91 95% 95% 213536838 Bacteria Firmicutes Peptostreptococcaceae Peptostreptococcus anaerobius JQ997804 456 12 391 3E-125 89% 89% 3821805 Bacteria Firmicutes Erysipelotrichaceae Erysipelothrix rhusiopathiae JQ999702 554 59 550 0 90% 90% 288510 Bacteria Firmicutes Veillonellaceae Pectinatus frisingensis JQ997847 422 18 268 2E-122 99% 99% 213536832 Bacteria Fusobacteria Fusobacteriaceae Fusobacterium necrophorum JQ999703 562 9 308 1E-95 89% 89% 15028908 Bacteria Fusobacteria Fusobacteriaceae Fusobacterium nucleatum JQ999704 570 27 561 2E-152 86% 86% 15029008 Bacteria Fusobacteria Fusobacteriaceae Ilyobacter polytropus JQ999705 565 13 551 6E-163 87% 87% 15029012 Bacteria Fusobacteria Fusobacteriaceae Propionigenium maris JQ999706 321 4 270 3E-104 93% 93% 15029011 Bacteria Fusobacteria Fusobacteriaceae Propionigenium modestum JQ999716 304 18 273 9E-114 96% 96% 41350813 Bacteria n n uncultured bacterium JQ999721 327 24 283 6E-116 96% 96% 41350814 Bacteria n n uncultured bacterium JQ999734 385 18 337 7E-151 97% 97% 224814922 Bacteria n n uncultured bacterium JQ999730 365 5 295 3E-109 92% 92% 240002803 Bacteria n n uncultured bacterium JQ999777 552 17 433 0 96% 96% 291258506 Bacteria n n uncultured bacterium JQ999760 500 17 457 0 94% 94% 291258521 Bacteria n n uncultured 
bacterium JQ999736 398 11 346 4E-153 96% 96% 291258526 Bacteria n n uncultured bacterium JQ999772 544 17 487 0 92% 92% 291258532 Bacteria n n uncultured bacterium JQ999773 544 17 543 6E-138 84% 84% 291258599 Bacteria n n uncultured bacterium JQ999743 423 15 304 5E-93 88% 88% 291258613 Bacteria n n uncultured bacterium JQ999788 572 14 564 0 94% 94% 291258642 Bacteria n n uncultured bacterium JQ999793 583 19 515 0 91% 91% 291258644 Bacteria n n uncultured bacterium JQ999749 441 154 390 1E-89 92% 92% 291258653 Bacteria n n uncultured bacterium JQ999785 566 5 268 5E-104 93% 93% 291258669 Bacteria n n uncultured bacterium JQ999761 500 5 468 3E-150 88% 88% 291258684 Bacteria n n uncultured bacterium JQ999797 696 5 233 3E-87 93% 93% 291258703 Bacteria n n uncultured bacterium JQ999746 431 5 81 7E-22 94% 94% 291258731 Bacteria n n uncultured bacterium JQ999714 299 18 240 2E-105 98% 98% 291258817 Bacteria n n uncultured bacterium JQ999758 490 3 444 1E-153 89% 89% 291258864 Bacteria n n uncultured bacterium JQ999751 448 18 437 3E-175 94% 94% 291258875 Bacteria n n uncultured bacterium JQ999715 300 4 242 7E-100 95% 95% 291258896 Bacteria n n uncultured bacterium JQ999712 270 26 229 9E-44 83% 83% 291258991 Bacteria n n uncultured bacterium JQ999774 545 137 529 6E-158 93% 93% 291259015 Bacteria n n uncultured bacterium JQ999724 337 5 251 5E-112 97% 97% 291259090 Bacteria n n uncultured bacterium JQ999713 279 5 223 3E-63 88% 88% 291259092 Bacteria n n uncultured bacterium JQ999748 433 20 363 2E-157 96% 96% 291259094 Bacteria n n uncultured bacterium JQ999757 486 18 422 2E-151 91% 91% 291259109 Bacteria n n uncultured bacterium JQ999784 565 29 535 1E-165 88% 88% 291259178 Bacteria n n uncultured bacterium JQ999769 536 47 532 0 91% 91% 291259189 Bacteria n n uncultured bacterium JQ999770 536 193 474 1E-109 93% 93% 291259240 Bacteria n n uncultured bacterium JQ999764 518 24 461 0 94% 94% 291259266 Bacteria n n uncultured bacterium JQ999756 483 5 426 0 95% 95% 291259291 Bacteria n n 
uncultured bacterium JQ999709 266 5 216 2E-60 88% 88% 291259334 Bacteria n n uncultured bacterium JQ999741 421 5 376 7E-151 93% 93% 291259383 Bacteria n n uncultured bacterium JQ999794 583 82 580 0 93% 93% 291259408 Bacteria n n uncultured bacterium JQ999755 456 76 431 4E-84 83% 83% 291259424 Bacteria n n uncultured bacterium JQ999778 557 5 524 0 89% 89% 291259440 Bacteria n n uncultured bacterium JQ999720 318 18 260 8E-70 87% 87% 291259455 Bacteria n n uncultured bacterium JQ999795 614 29 564 1E-145 86% 86% 291259482 Bacteria n n uncultured bacterium JQ999754 455 149 358 3E-25 78% 78% 291259493 Bacteria n n uncultured bacterium JQ999725 342 5 258 5E-107 94% 94% 291259502 Bacteria n n uncultured bacterium JQ999780 559 63 313 9E-72 87% 87% 291259509 Bacteria n n uncultured bacterium JQ999738 404 28 148 1E-38 92% 92% 291259601 Bacteria n n uncultured bacterium JQ999753 452 175 411 2E-71 88% 88% 291259787 Bacteria n n uncultured bacterium JQ999787 569 429 551 3E-32 89% 89% 291259805 Bacteria n n uncultured bacterium JQ999781 559 21 525 0 98% 98% 291259832 Bacteria n n uncultured bacterium JQ999775 548 26 543 2E-177 89% 89% 291259953 Bacteria n n uncultured bacterium JQ999732 368 5 333 1E-132 93% 93% 291259969 Bacteria n n uncultured bacterium JQ999708 261 6 211 3E-63 89% 89% 291260077 Bacteria n n uncultured bacterium JQ999711 269 5 227 4E-77 91% 91% 291260092 Bacteria n n uncultured bacterium JQ999768 535 9 410 3E-155 92% 92% 291260192 Bacteria n n uncultured bacterium JQ999783 564 18 463 1E-169 91% 91% 291260193 Bacteria n n uncultured bacterium JQ999786 566 17 553 2E-163 87% 87% 291260258 Bacteria n n uncultured bacterium JQ999796 666 5 311 8E-93 88% 88% 291260323 Bacteria n n uncultured bacterium JQ999707 248 5 190 1E-71 94% 94% 291260330 Bacteria n n uncultured bacterium JQ999752 451 15 392 3E-175 96% 96% 291260331 Bacteria n n uncultured bacterium JQ999767 533 18 480 1E-144 87% 87% 291260367 Bacteria n n uncultured bacterium JQ999719 314 185 268 2E-31 98% 98% 
291260384 Bacteria n n uncultured bacterium
JQ999766 532 15 516 3E-146 86% 86% 291260462 Bacteria n n uncultured bacterium
JQ999710 268 18 223 1E-97 99% 99% 291260506 Bacteria n n uncultured bacterium
JQ999733 383 5 289 9E-115 93% 93% 291260539 Bacteria n n uncultured bacterium
JQ999765 520 22 452 0 96% 96% 291260594 Bacteria n n uncultured bacterium
JQ999717 306 4 258 1E-102 94% 94% 291260683 Bacteria n n uncultured bacterium
JQ999737 399 5 341 7E-131 92% 92% 291260701 Bacteria n n uncultured bacterium

JQ999729 362 18 261 2E-90 92% 92% 291260769 Bacteria n n uncultured bacterium JQ999740 413 5 368 4E-158 95% 95% 291260837 Bacteria n n uncultured bacterium JQ999779 558 5 382 1E-159 94% 94% 291260876 Bacteria n n uncultured bacterium JQ999718 312 5 253 2E-76 88% 88% 291260895 Bacteria n n uncultured bacterium JQ999762 501 17 431 2E-176 94% 94% 291260953 Bacteria n n uncultured bacterium JQ999791 576 28 533 3E-91 80% 80% 291261001 Bacteria n n uncultured bacterium JQ999735 388 24 322 1E-108 91% 91% 291261015 Bacteria n n uncultured bacterium JQ999744 426 5 282 5E-53 82% 82% 291261049 Bacteria n n uncultured bacterium JQ999782 562 20 537 0 92% 92% 291261180 Bacteria n n uncultured bacterium JQ999776 550 45 485 1E-149 89% 89% 291261284 Bacteria n n uncultured bacterium JQ999763 517 27 473 0 96% 96% 291261295 Bacteria n n uncultured bacterium JQ999731 365 29 238 7E-81 93% 93% 291261310 Bacteria n n uncultured bacterium JQ999750 441 18 404 1E-179 96% 96% 291261311 Bacteria n n uncultured bacterium JQ999726 345 5 296 8E-140 98% 98% 291261346 Bacteria n n uncultured bacterium JQ999792 576 16 507 1E-70 78% 78% 291261350 Bacteria n n uncultured bacterium JQ999771 537 18 436 0 95% 95% 291261377 Bacteria n n uncultured bacterium JQ999759 490 2 160 5E-59 94% 94% 291261380 Bacteria n n uncultured bacterium JQ999723 336 23 306 3E-104 91% 91% 291261400 Bacteria n n uncultured bacterium JQ999722 335 18 202 3E-79 96% 96% 291261412 Bacteria n n uncultured bacterium JQ999747 432 19 382 3E-175 98% 98% 291261484 Bacteria n n uncultured bacterium JQ999789 575 33 523 0 92% 92% 291261504 Bacteria n n uncultured bacterium JQ999790 575 25 537 0 96% 96% 291261577 Bacteria n n uncultured bacterium JQ999739 412 5 177 2E-82 99% 99% 291261679 Bacteria n n uncultured bacterium JQ999728 355 17 264 3E-104 95% 95% 291261696 Bacteria n n uncultured bacterium JQ999742 422 135 308 5E-33 83% 83% 291261700 Bacteria n n uncultured bacterium JQ999727 347 5 302 2E-140 97% 97% 291261750 Bacteria n n 
uncultured bacterium JQ999798 586 18 582 0 91% 91% 12583964 Bacteria Planctomycetes Planctomycetaceae Pirellula staleyi JQ999799 581 18 577 0 90% 90% 2244633 Bacteria Proteobacteria (alpha) Caulobacteraceae Brevundimonas diminuta JQ999800 338 5 291 4E-68 84% 84% 32328286 Bacteria Proteobacteria (alpha) n unknown marine alpha proteobacterium JP57 JQ999801 557 5 554 0 98% 98% 29725918 Bacteria Proteobacteria (alpha) Mesorhizobium loti JQ999802 585 4 584 2E-178 87% 87% 197734889 Bacteria Proteobacteria (alpha) Rhizobiaceae Rhizobium gallicum JQ999803 559 24 278 2E-87 90% 90% 138754396 Bacteria Proteobacteria (alpha) Rhizobiaceae Rhizobium giardinii JQ999804 577 16 569 0 91% 91% 138754393 Bacteria Proteobacteria (alpha) Rhizobiaceae Sinorhizobium arboris JQ999805 412 5 338 2E-137 94% 94% 2244670 Bacteria Proteobacteria (alpha) Rhodobacteraceae Paracoccus denitrificans JQ999806 413 24 362 2E-116 90% 90% 89277242 Bacteria Proteobacteria (alpha) n Caedibacter caryophilus JQ999807 526 5 137 4E-60 99% 99% 32328294 Bacteria Proteobacteria (alpha) Sphingomonadaceae Sphingomonas sp. 
KT0216 JQ999809 445 4 368 2E-162 95% 95% 290796637 Bacteria Proteobacteria (beta) n uncultured bacterium JQ999808 359 16 349 2E-160 98% 98% 290796638 Bacteria Proteobacteria (beta) n uncultured Burkholderiales bacterium JQ999810 240 96 206 1E-32 91% 91% 73533084 Bacteria Proteobacteria (beta) Oxalobacteraceae Herbaspirillum autotrophicum JQ999811 523 83 522 4E-174 92% 92% 73533083 Bacteria Proteobacteria (beta) Oxalobacteraceae Herbaspirillum huttiense JQ999812 535 18 524 0 96% 96% 50957144 Bacteria Proteobacteria (epsilon) Helicobacteraceae JQ999323 584 18 540 0 90% 90% 213536822 Bacteria Proteobacteria (gamma) Succinivibrionaceae Anaerobiospirillum succiniciproducens JQ999327 311 19 268 2E-80 89% 89% 213536828 Bacteria Proteobacteria (gamma) hominis JQ999813 485 6 420 3E-175 94% 94% 225182735 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas axialensis JQ999814 574 5 517 0 98% 98% 225182736 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas boliviensis JQ999815 555 18 551 0 95% 95% 225182737 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas neptunia JQ999816 560 13 558 0 90% 90% 168148844 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sulfidaeris JQ999817 354 51 308 4E-53 83% 83% 17976821 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas variabilis JQ999818 522 18 472 2E-137 87% 87% 225735345 Bacteria Proteobacteria (gamma) Halomonadaceae Salinicola halophilus JQ999819 562 5 557 0 92% 92% 164451918 Bacteria Proteobacteria (gamma) Pasteurellaceae suis JQ999820 574 21 559 0 95% 95% 2244630 Bacteria Proteobacteria (gamma) Moraxellaceae Acinetobacter calcoaceticus JQ999361 576 5 576 0 97% 97% 127463238 Bacteria Proteobacteria (gamma) Moraxellaceae Acinetobacter calcoaceticus JQ999822 310 5 272 5E-126 98% 98% 1913845 Bacteria Proteobacteria (gamma) Xanthomonadaceae Xanthomonas fragariae JQ999823 339 25 308 6E-146 100% 100% 288225748 Bacteria Spirochaetes Brachyspiraceae JQ999824 460 5 412 0 100% 100% 288225749 Bacteria Spirochaetes 
Brachyspiraceae Brachyspira pilosicoli JQ999825 468 181 404 2E-102 97% 97% 288225746 Bacteria Spirochaetes Brachyspiraceae Brachyspira sp. PT.C JQ999826 414 18 382 2E-67 81% 81% 294768449 Bacteria Tenericutes Acholeplasmataceae Acholeplasma equifetale JQ999493 559 2 458 4E-145 88% 88% 294660656 Bacteria Tenericutes Spiroplasmataceae Spiroplasma diabroticae JQ999827 458 5 359 2E-126 90% 90% 294768451 Bacteria Tenericutes Mycoplasmataceae Mycoplasma feliminutum JQ999828 465 23 390 4E-179 98% 98% 183579829 Bacteria Verrucomicrobia Verrucomicrobia subdivision 3 Pedosphaera parvula JQ999877 538 18 519 0 99% 99% 148372021 Eukaryota Arthropoda Entomobryidae Lepidocyrtus sp. Yan Gao 06126 JQ999878 542 25 541 0 97% 97% 145308394 Eukaryota Arthropoda Entomobryidae Sinella curviseta JQ999876 445 33 83 0.000000001 92% 92% 59890651 Eukaryota Arthropoda Silphidae Silpha obscura JQ999875 258 18 214 1E-82 96% 96% 202070905 Eukaryota Arthropoda Trichoceridae Trichocera brevicornis JQ999837 553 5 550 0 100% 100% 284158823 Eukaryota Ascomycota Davidiellaceae Davidiella tassiana JQ999838 553 2 516 0 97% 97% 284158872 Eukaryota Ascomycota Mycosphaerellaceae Passalora perplexa JQ999839 413 136 362 2E-97 96% 96% 283827928 Eukaryota Ascomycota Mycosphaerellaceae Verrucisporota daviesiae JQ999862 348 5 301 8E-150 99% 99% 282160302 Eukaryota Ascomycota Didymellaceae Phoma infossa JQ999865 503 5 457 0 100% 100% 294987020 Eukaryota Ascomycota Didymellaceae Phoma macrostoma JQ999872 552 89 548 0 97% 97% 294987076 Eukaryota Ascomycota Didymellaceae Phoma viburnicola JQ999863 419 5 364 0 99% 99% 294987114 Eukaryota Ascomycota Didymellaceae Stagonosporopsis rudbeckiae JQ999861 268 4 215 5E-101 99% 99% 294987118 Eukaryota Ascomycota Didymellaceae Stagonosporopsis valerianellae JQ999874 868 17 128 1E-42 96% 96% 290790471 Eukaryota Ascomycota Lentitheciaceae Lentitheciaceae sp. 
A369-1
JQ999866 522 24 457 0 100% 100% 284192847 Eukaryota Ascomycota Phaeosphaeriaceae Phaeodothis winteri
JQ999841 561 24 559 0 100% 100% 154563033 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria avenaria
JQ999840 558 18 525 0 96% 96% 154563036 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria avenaria
JQ999842 569 5 554 0 97% 97% 159171560 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria avenaria
JQ999869 542 5 541 0 100% 100% 208879720 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria nodorum
JQ999843 437 24 384 0 99% 99% 290760004 Eukaryota Ascomycota Phaeosphaeriaceae Phaeosphaeria spartinicola

JQ999873 555 24 555 0 96% 96% 289449251 Eukaryota Ascomycota Pleosporaceae Alternaria tenuissima JQ999855 521 233 457 6E-108 99% 99% 90577143 Eukaryota Ascomycota n Candida ontarioensis JQ999856 212 11 225 1E-55 86% 86% 90577143 Eukaryota Ascomycota n mitosporic Saccharomycetales JQ999858 564 18 560 0 99% 99% 156099766 Eukaryota Ascomycota Saccharomycetaceae Cyberlindnera jadinii JQ999879 557 270 548 3E-121 95% 95% 18698476 Eukaryota Bacillariophyta Bacillariaceae Hantzschia amphioxys JQ999844 336 18 305 4E-108 92% 92% 148358050 Eukaryota Basidiomycota Clavariaceae Macrotyphula fistulosa JQ999845 477 15 410 2E-166 94% 94% 148358060 Eukaryota Basidiomycota Cortinariaceae Leucocortinarius bulbiger JQ999846 339 13 294 2E-120 95% 95% 148358057 Eukaryota Basidiomycota Nidulariaceae Cyathus striatus JQ999847 455 21 396 1E-144 92% 92% 84873683 Eukaryota Basidiomycota Physalacriaceae Armillaria hinnulea JQ999848 507 29 271 2E-112 98% 98% 54695062 Eukaryota Basidiomycota Tulasnellaceae uncultured Tulasnellaceae JQ999860 268 16 210 9E-44 84% 84% 256859920 Eukaryota Basidiomycota Geastraceae Geastrum sessile JQ999871 551 4 541 2E-153 86% 86% 34915795 Eukaryota Basidiomycota Gloeophyllaceae Gloeophyllum sepiarium JQ999864 429 27 253 1E-68 88% 88% 34915791 Eukaryota Basidiomycota Coriolaceae Donkioporia expansa JQ999849 547 5 543 7E-162 86% 86% 77379307 Eukaryota Basidiomycota Ganodermataceae Ganoderma lucidum JQ999868 537 19 516 0 96% 96% 117164085 Eukaryota Basidiomycota vaillantii JQ999850 554 18 538 0 95% 95% 110704320 Eukaryota Basidiomycota Malasseziaceae Malassezia cf. restricta HN3127 JQ999851 545 20 542 0 98% 98% 52631074 Eukaryota Basidiomycota Malasseziaceae Malassezia pachydermatis JQ999867 536 20 536 0 90% 90% 256859939 Eukaryota Basidiomycota Tilletiaceae Tilletia olida JQ999852 346 17 300 4E-138 98% 98% 111283857 Eukaryota Basidiomycota n Sakaguchia dacryoidea JQ999853 546 17 546 0 100% 100% 224979500 Eukaryota Basidiomycota n Cryptococcus sp. 
ATT123 JQ999859 468 4 385 2E-111 87% 87% 304472 Eukaryota Basidiomycota Tremellaceae Cryptococcus gattii JQ999889 383 21 334 7E-81 85% 85% 17028321 Eukaryota Chlorophyta n Chlorosarcina brevispinosa JQ999905 550 18 546 0 92% 92% 237648547 Eukaryota Chlorophyta Chlorellaceae Chlorella variabilis JQ999890 561 8 557 0 93% 93% 535783 Eukaryota Chlorophyta Chlorellaceae Pseudochlorella pringsheimii JQ999891 596 141 595 1E-145 89% 89% 12667510 Eukaryota Chlorophyta n Trichosarcina mucosa JQ999886 531 18 528 0 93% 93% 220966760 Eukaryota Ciliophora Oxytrichidae Sterkiella histriomuscorum JQ999881 416 5 221 1E-98 97% 97% 157411062 Eukaryota Heterokontophyta Halosiphonaceae Halosiphon tomentosus JQ999857 542 18 539 0 99% 99% 85700735 Eukaryota n n uncultured compost fungus JQ999887 575 18 553 0 90% 90% 291261822 Eukaryota n n uncultured eukaryote JQ999888 580 25 557 0 89% 89% 291261826 Eukaryota n n uncultured eukaryote JQ999883 348 18 311 3E-119 93% 93% 291263034 Eukaryota n n uncultured eukaryote JQ999885 465 24 408 4E-159 94% 94% 291263118 Eukaryota n n uncultured eukaryote JQ999884 401 86 357 3E-109 93% 93% 291263535 Eukaryota n n uncultured eukaryote JQ999882 281 4 233 1E-112 99% 99% 291263834 Eukaryota n n uncultured eukaryote JQ999854 528 26 525 0 92% 92% 239819331 Eukaryota n n uncultured fungus JQ999904 581 36 577 0 97% 97% 170516193 Eukaryota Streptophyta Taxaceae Taxus wallichiana JQ999892 392 5 113 1E-23 87% 87% 109138712 Eukaryota Streptophyta Ptilidiaceae Ptilidium pulcherrimum JQ999893 538 33 538 0 97% 97% 66969404 Eukaryota Streptophyta Tofieldiaceae Tofieldia calyculata JQ999894 547 106 523 2E-178 94% 94% 109138849 Eukaryota Streptophyta Amborellaceae Amborella trichopoda JQ999895 307 5 275 3E-118 96% 96% 109138856 Eukaryota Streptophyta Chloranthaceae Hedyosmum arborescens JQ999897 355 102 321 2E-100 97% 97% 22595025 Eukaryota Streptophyta Hydrangeaceae Philadelphus lewisii JQ999899 225 1 214 2E-94 96% 96% 57340766 Eukaryota Streptophyta Lecythidaceae 
Napoleona
JQ999903 572 15 559 0 95% 95% 62902790 Eukaryota Streptophyta Fabaceae Glycine max
JQ999902 549 18 491 0 98% 98% 19655 Eukaryota Streptophyta Fabaceae Medicago sativa
JQ999901 542 5 349 1E-139 93% 93% 267850659 Eukaryota Streptophyta Fabaceae Spatholobus suberectus

Table S3. Ribosomal RNA gene sequences less than 200 nt in length from V5 (these could not be submitted to NCBI). Taxonomic affiliations not found in NCBI GenBank are marked "n".

454 Sequence number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Family | Genus / Species
GKJWQY101AUK82 258 5 168 4E-77 99% 99% 269979854 Bacteria Actinobacteria Actinomycetaceae uncultured Actinomyces sp.
GKJWQY101AC1PI 228 5 174 2E-79 99% 99% 293629580 Bacteria Actinobacteria Microbacteriaceae Klugiella sp. Cr8-25
GKJWQY101AKK09 64 5 58 1E-15 96% 96% 295345516 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. enrichment culture clone ENR1 9.1
GKJWQY101ALS2O 204 23 158 6E-64 100% 100% 295809764 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. V2M1
GKJWQY101ATYDL 575 217 434 4E-75 91% 91% 241995735 Bacteria Actinobacteria Micrococcaceae Kocuria sp. 10-4DEP
GKJWQY101AL2DJ 225 17 168 9E-73 100% 100% 158551986 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. SAWW3
GKJWQY101AGL3S 97 23 83 5E-21 98% 98% 170674451 Bacteria Actinobacteria n Micrococcineae bacterium HM06-11
GKJWQY101A0PVQ 184 17 139 9E-57 100% 100% 295410271 Bacteria Actinobacteria Nocardiaceae Rhodococcus sp. WTZ-R2
GKJWQY101BB1KL 302 110 220 8E-40 95% 95% 289470718 Bacteria Actinobacteria Nocardioidaceae Kribbella sp. PIP 158
GKJWQY101APNNM 180 17 159 7E-68 100% 100% 110238724 Bacteria Actinobacteria Propionibacteriaceae Propionibacterium sp. SV442
GKJWQY101A02B3 147 50 103 7E-17 98% 98% 2226289 Bacteria Actinobacteria Pseudonocardiaceae Thermobispora bispora
GKJWQY101AUK0Q 186 18 141 1E-55 99% 99% 270048059 Bacteria Actinobacteria Sanguibacteraceae Sanguibacter sp. ljh-8
GKJWQY101BSJK8 127 5 57 3E-15 96% 96% 219846414 Bacteria Actinobacteria Thermomonosporaceae Actinocorallia libanotica
GKJWQY101ALSG0 260 5 196 8E-89 98% 98% 284930165 Bacteria Actinobacteria Bifidobacteriaceae Bifidobacterium gallinarum
GKJWQY101A70F2 221 5 190 2E-78 96% 96% 237930451 Bacteria Actinobacteria Bifidobacteriaceae uncultured Bifidobacterium sp.
GKJWQY101AHOH5 180 4 127 2E-23 85% 85% 151384647 Bacteria Actinobacteria Coriobacteriaceae uncultured Olsenella sp.
GKJWQY101BJRK5 197 11 152 1E-56 96% 96% 1418399 Bacteria Actinobacteria n bacterium WE2 GKJWQY101AJU56 147 1 101 6E-38 96% 96% 24210873 Bacteria Actinobacteria n uncultured actinobacterium GKJWQY101BXZWQ 126 21 79 1E-19 98% 98% 157837222 Bacteria Actinobacteria n uncultured actinobacterium GKJWQY101BUECR 185 5 150 1E-60 96% 96% 293618134 Bacteria Actinobacteria n uncultured actinobacterium GKJWQY101A8CYD 240 43 178 1E-61 99% 99% 293618152 Bacteria Actinobacteria n uncultured actinobacterium GKJWQY101AOYOC 152 4 94 2E-32 96% 96% 48728183 Bacteria Actinobacteria Frankiaceae uncultured Frankia sp. GKJWQY101APUX2 555 16 549 3E-170 88% 88% 37545001 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium GKJWQY101A4BGJ 176 21 103 3E-22 92% 92% 7208098 Bacteria Actinobacteria Thermomonosporaceae echinospora GKJWQY101ATSX1 563 139 555 0 96% 96% 208690660 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium GKJWQY101BNJE7 147 4 85 2E-27 95% 95% 217337428 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium GKJWQY101BW1AP 192 2 121 2E-53 99% 99% 217337429 Bacteria Bacteroidetes n uncultured Bacteroidales bacterium GKJWQY101A82OQ 179 5 122 2E-47 97% 97% 162849118 Bacteria Bacteroidetes Porphyromonadaceae uncultured Porphyromonas sp. GKJWQY101AAUNK 212 18 168 6E-69 99% 99% 186926082 Bacteria Bacteroidetes Flavobacteriaceae uncultured Flavobacterium sp. GKJWQY101AHUQL 210 5 158 3E-72 99% 99% 184132972 Bacteria Bacteroidetes Sphingobacteriaceae Pedobacter steynii GKJWQY101BAP0G 236 4 198 7E-89 97% 97% 284431811 Bacteria Bacteroidetes Sphingobacteriaceae Sphingobacterium sp. 0-1 GKJWQY101BHQ5F 188 17 153 7E-48 93% 93% 154759415 Bacteria Bacteroidetes Flavobacteriaceae Flavobacterium columnare GKJWQY101AULNM 159 18 132 3E-46 97% 97% 154845376 Bacteria Cyanobacteria n Gloeocapsa sp. CR_L16 GKJWQY101BSMW8 63 25 55 0.000003 100% 100% 119657303 Bacteria Cyanobacteria n uncultured Microcystis sp. 
GKJWQY101AY8U6 476 372 410 0.00001 95% 95% 124361561 Bacteria Cyanobacteria n uncultured Thermosynecococcus sp. GKJWQY101BYVWY 294 82 258 2E-71 95% 95% 46948081 Bacteria Cyanobacteria n uncultured Antarctic cyanobacterium GKJWQY101BPXL9 157 5 104 6E-43 99% 99% 67482522 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A5GVE 188 21 156 3E-61 99% 99% 105990262 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A349D 280 18 166 2E-64 97% 97% 105990476 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AGISR 560 5 559 0 97% 97% 146429655 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AJCIH 146 5 65 7E-17 95% 95% 156187252 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101ACXP4 94 1 80 1E-31 99% 99% 163962573 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AV5XT 196 5 178 2E-83 99% 99% 189484695 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A9ERV 113 18 47 0.00002 100% 100% 189484726 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BVZ69 231 2 180 1E-76 97% 97% 213053918 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AJE27 232 5 164 7E-74 99% 99% 213053926 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BUG7V 198 2 155 3E-72 99% 99% 213053927 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AGMF1 194 3 150 1E-70 100% 100% 213053932 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BD8H5 190 5 146 3E-67 100% 100% 219883489 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AR8AI 150 8 134 2E-57 99% 99% 225031951 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A3SVV 213 17 169 1E-61 95% 95% 227072232 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101AT7RZ 160 9 127 2E-52 99% 99% 229563895 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A5DQZ 136 17 106 1E-38 100% 100% 229563905 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BHNRF 165 25 112 2E-37 100% 100% 253749514 Bacteria 
Cyanobacteria n uncultured cyanobacterium GKJWQY101BA64N 229 27 99 2E-24 96% 96% 257782341 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BKQNB 294 25 131 2E-31 91% 91% 261290032 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BM0KB 120 5 63 2E-21 100% 100% 285015322 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101BQSM2 47 1 46 6E-15 100% 100% 225696274 Bacteria Cyanobacteria n Uncultured Chroococcidiopsis sp. GKJWQY101AGMV1 233 5 185 7E-89 100% 100% 18182395 Bacteria Cyanobacteria n uncultured soil crust cyanobacterium GKJWQY101AR5CJ 134 3 87 8E-36 100% 100% 45643565 Bacteria Cyanobacteria Microchaetaceae Petalonema sp. ANT.LG2.8 GKJWQY101ALPMM 141 4 79 1E-19 92% 92% 183673428 Bacteria Cyanobacteria Nostocaceae Trichormus naviculoides GKJWQY101AB8TO 202 7 148 3E-62 98% 98% 24850290 Bacteria Cyanobacteria Rivulariaceae Calothrix sp. CCMEE 5085 GKJWQY101BPQ7H 169 3 123 1E-55 100% 100% 143635043 Bacteria Cyanobacteria Scytonemataceae Brasilonema terrestre GKJWQY101BEX4H 218 22 197 4E-81 98% 98% 167508113 Bacteria Cyanobacteria n Geitlerinema sp. CCALA 138 GKJWQY101BA9JR 241 5 169 6E-75 98% 98% 262477864 Bacteria Cyanobacteria n Leptolyngbya sp. 22.A GKJWQY101BVSZA 179 23 110 2E-18 88% 88% 57996802 Bacteria Cyanobacteria n Microcoleus steenstrupii GKJWQY101BCKFY 192 5 160 2E-73 99% 99% 121308615 Bacteria Cyanobacteria n Oscillatoria lutea GKJWQY101ALVLN 199 1 143 4E-66 99% 99% 291603790 Bacteria Cyanobacteria n Oscillatoria margaritifera GKJWQY101BYF48 148 4 40 5E-09 100% 100% 226844847 Bacteria Cyanobacteria n Oscillatoria sp. 
MMG-2
GKJWQY101BBGQI 149 8 132 3E-55 98% 98% 258547392 Bacteria Cyanobacteria n Phormidium autumnale
GKJWQY101BE15A 212 60 161 5E-45 100% 100% 258547396 Bacteria Cyanobacteria n Phormidium autumnale
GKJWQY101BAT30 217 1 145 1E-66 99% 99% 260595431 Bacteria Cyanobacteria n Phormidium autumnale
GKJWQY101BI23D 490 265 459 1E-34 82% 82% 45643541 Bacteria Cyanobacteria n Phormidium pseudopristleyi
GKJWQY101BQZC6 125 23 93 5E-28 100% 100% 157384116 Bacteria Cyanobacteria n uncultured Planktothricoides sp.
GKJWQY101AY7ZD 107 5 50 0.000002 90% 90% 225696182 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp.
GKJWQY101ARSH3 168 5 80 1E-30 100% 100% 225696183 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp.
GKJWQY101A09D3 176 18 142 2E-47 95% 95% 225696254 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp.

GKJWQY101A4KSE 138 49 108 7E-22 100% 100% 225696281 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. GKJWQY101BTRUZ 140 17 80 9E-21 97% 97% 225696289 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. GKJWQY101BUHI6 294 5 84 1E-32 100% 100% 225696290 Bacteria Cyanobacteria n uncultured Chroococcidiopsis sp. GKJWQY101AEUTS_2 40 5 40 2E-10 100% 100% 358680771 Bacteria Cyanobacteria Arthrospira Arthrospira platensis GKJWQY101AZK7G 178 1 137 2E-43 90% 90% 46409885 Bacteria Cyanobacteria Nostocaceae Anabaena sp. PCC 9109 GKJWQY101A6FV9 248 5 197 3E-68 91% 91% 46409888 Bacteria Cyanobacteria Nostocaceae Nodularia harveyana GKJWQY101BA9Y4 154 18 123 2E-47 100% 100% 46409897 Bacteria Cyanobacteria n Lyngbya aestuarii GKJWQY101BDPUJ 291 5 241 5E-106 97% 97% 284055538 Bacteria Deinococcus-Thermus Deinococcaceae Deinococcus radiodurans GKJWQY101AXBQG 175 17 128 3E-41 96% 96% 145567431 Bacteria Fibrobacteres Fibrobacteraceae uncultured Fibrobacter sp. GKJWQY101BM1GN 71 6 55 3E-11 94% 94% 294337929 Bacteria Firmicutes Bacillaceae Bacillus clausii GKJWQY101ARL9J 221 5 189 2E-89 99% 99% 294337906 Bacteria Firmicutes Bacillaceae Bacillus sp. 3LF 24T GKJWQY101BWSJU 549 201 547 2E-173 99% 99% 15042017 Bacteria Firmicutes Bacillaceae Bacillus sp. NCIB 12289 GKJWQY101BURPU 269 15 99 2E-35 100% 100% 134290402 Bacteria Firmicutes Bacillaceae Geobacillus kaustophilus GKJWQY101BPODQ 120 18 79 4E-23 100% 100% 283131629 Bacteria Firmicutes Bacillaceae uncultured Amphibacillus sp. GKJWQY101BQ5E9 450 86 414 3E-135 94% 94% 284428972 Bacteria Firmicutes Bacillaceae uncultured Bacillus sp. 
GKJWQY101BZOS5 330 50 211 2E-51 90% 90% 295083276 Bacteria Firmicutes Bacillaceae Bacillus licheniformis GKJWQY101BNMTI 183 5 114 7E-33 91% 91% 254841686 Bacteria Firmicutes Clostridiaceae Clostridium perfringens GKJWQY101AY4WN_2 69 1 69 3E-17 92% 92% 373943374 Bacteria Firmicutes Clostridiaceae Clostridium GKJWQY101B0MXN 222 5 190 2E-83 97% 97% 254952546 Bacteria Firmicutes Enterococcaceae Enterococcus columbae GKJWQY101AAXDV 137 11 93 2E-31 98% 98% 50080727 Bacteria Firmicutes Enterococcaceae Tetragenococcus doogicus GKJWQY101BDM0N 186 5 140 4E-60 99% 99% 189345361 Bacteria Firmicutes Lactobacillaceae Lactobacillus delbrueckii GKJWQY101BBYPN 167 5 114 1E-49 100% 100% 285201718 Bacteria Firmicutes Lactobacillaceae Lactobacillus rhamnosus GKJWQY101AAULG 351 18 112 3E-39 99% 99% 285802968 Bacteria Firmicutes Lactobacillaceae Lactobacillus sp. oral taxon 461 GKJWQY101BCYFO_2 175 1 175 4E-52 89% 89% 345283481 Bacteria Firmicutes Lactobacillaceae Lactobacillus GKJWQY101BYGNW 193 5 119 1E-50 99% 99% 154189328 Bacteria Firmicutes n uncultured Bacilli bacterium GKJWQY101BGC3W 113 23 82 5E-22 100% 100% 295656276 Bacteria Firmicutes n Firmicutes bacterium enrichment culture clone VNBB003 GKJWQY101BPAK0 220 15 175 2E-59 93% 93% 118135758 Bacteria Firmicutes n uncultured Firmicutes bacterium GKJWQY101B1XT2 490 5 353 3E-180 100% 100% 146430648 Bacteria Firmicutes n uncultured Firmicutes bacterium GKJWQY101ADWDT 111 18 78 3E-19 97% 97% 291330751 Bacteria Firmicutes n uncultured Firmicutes bacterium GKJWQY101BAJPW 199 4 146 8E-68 100% 100% 291332909 Bacteria Firmicutes n uncultured Firmicutes bacterium GKJWQY101BQAJ9 118 18 87 2E-27 100% 100% 295656572 Bacteria Firmicutes n uncultured Firmicutes bacterium GKJWQY101ANJ0H 213 3 164 4E-71 98% 98% 27819245 Bacteria Firmicutes n uncultured low G+C Gram-positive bacterium GKJWQY101BR4I0 145 2 100 2E-43 100% 100% 292485814 Bacteria Firmicutes Planococcaceae uncultured Planococcus sp. 
GKJWQY101AWLH0 224 81 143 1E-21 98% 98% 292485818 Bacteria Firmicutes Planococcaceae uncultured Planomicrobium sp. GKJWQY101A74IZ 194 5 136 1E-61 100% 100% 159032974 Bacteria Firmicutes Streptococcaceae Lactococcus lactis GKJWQY101ARYR6 239 24 187 2E-79 100% 100% 295002587 Bacteria Firmicutes Streptococcaceae Streptococcus australis GKJWQY101ADMQF 130 30 88 1E-19 98% 98% 24474990 Bacteria Firmicutes Streptococcaceae Streptococcus constellatus GKJWQY101BRH69 166 5 134 1E-54 97% 97% 290759882 Bacteria Firmicutes Streptococcaceae Streptococcus cristatus GKJWQY101AD5O6 203 15 158 3E-62 97% 97% 285184215 Bacteria Firmicutes Streptococcaceae Streptococcus intermedius GKJWQY101BRELC 206 5 107 2E-39 96% 96% 110432033 Bacteria Firmicutes Streptococcaceae Streptococcus mutans GKJWQY101A0A4K 169 24 137 1E-45 97% 97% 285205054 Bacteria Firmicutes Streptococcaceae Streptococcus sp. oral taxon E07 GKJWQY101BCLHF 176 3 128 2E-58 100% 100% 60501147 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. GKJWQY101BJ018 220 5 164 3E-77 100% 100% 295322329 Bacteria Firmicutes Streptococcaceae uncultured Streptococcus sp. GKJWQY101BXNXM_2 209 1 209 1E-86 96% 96% 339277069 Bacteria Firmicutes Streptococcaceae Streptococcus GKJWQY101BGFER 163 4 155 3E-71 99% 99% 285802763 Bacteria Firmicutes Veillonellaceae Selenomonas sp. 
oral taxon 442 GKJWQY101AKQLO 205 18 148 6E-54 96% 96% 285166090 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena GKJWQY101ACH6F 209 18 153 3E-62 99% 99% 285166317 Bacteria Firmicutes Veillonellaceae Selenomonas sputigena GKJWQY101A4X2Z 215 5 184 1E-85 99% 99% 16508084 Bacteria n n uncultured bacterium GKJWQY101AVTAG 302 5 176 9E-84 100% 100% 19908565 Bacteria n n uncultured bacterium GKJWQY101A7KKU 167 5 105 2E-42 99% 99% 37654631 Bacteria n n uncultured bacterium GKJWQY101A6OXT 139 15 102 1E-24 91% 91% 54695037 Bacteria n n uncultured bacterium GKJWQY101AW45M 226 5 194 2E-64 91% 91% 63087409 Bacteria n n uncultured bacterium GKJWQY101B2Z7C 207 5 174 3E-72 96% 96% 68004476 Bacteria n n uncultured bacterium GKJWQY101BN5ON 301 109 219 2E-41 95% 95% 71089366 Bacteria n n uncultured bacterium GKJWQY101AQAPU 232 5 185 2E-44 86% 86% 71089410 Bacteria n n uncultured bacterium GKJWQY101AGG0L 404 193 371 2E-82 98% 98% 71739177 Bacteria n n uncultured bacterium GKJWQY101ALMD0 274 17 162 2E-69 100% 100% 74038724 Bacteria n n uncultured bacterium GKJWQY101BRO0X 211 5 63 5E-15 93% 93% 80978525 Bacteria n n uncultured bacterium GKJWQY101A114T 131 3 84 4E-34 100% 100% 109143142 Bacteria n n uncultured bacterium GKJWQY101B2BO0 186 6 138 1E-60 99% 99% 109145063 Bacteria n n uncultured bacterium GKJWQY101AD5PI 150 3 97 3E-41 100% 100% 109146140 Bacteria n n uncultured bacterium GKJWQY101A3O5F 275 100 259 1E-67 96% 96% 110435279 Bacteria n n uncultured bacterium GKJWQY101BK3NT 188 3 134 9E-62 100% 100% 110441026 Bacteria n n uncultured bacterium GKJWQY101A4T8R 171 15 113 2E-38 97% 97% 110442172 Bacteria n n uncultured bacterium GKJWQY101BZXHF 161 19 98 3E-31 99% 99% 110445958 Bacteria n n uncultured bacterium GKJWQY101ALSLZ 399 129 352 7E-111 100% 100% 110449562 Bacteria n n uncultured bacterium GKJWQY101AAHX7 170 62 142 9E-32 99% 99% 110450750 Bacteria n n uncultured bacterium GKJWQY101AVQKA 241 14 196 2E-54 89% 89% 117572568 Bacteria n n uncultured bacterium 
GKJWQY101BGPB7 172 7 170 5E-69 96% 96% 126115235 Bacteria n n uncultured bacterium GKJWQY101A4OEB 530 375 522 2E-37 87% 87% 126674327 Bacteria n n uncultured bacterium GKJWQY101AKX7M 176 5 126 7E-53 98% 98% 148249503 Bacteria n n uncultured bacterium GKJWQY101BYO28 149 24 51 0.0005 100% 100% 151610746 Bacteria n n uncultured bacterium GKJWQY101BL1PV 87 4 42 0.0000004 95% 95% 156522979 Bacteria n n uncultured bacterium GKJWQY101AVV30 252 5 177 1E-82 99% 99% 157926754 Bacteria n n uncultured bacterium GKJWQY101BIS50 108 5 51 4E-13 98% 98% 158148312 Bacteria n n uncultured bacterium GKJWQY101ACR1Y 111 17 79 5E-22 98% 98% 160922959 Bacteria n n uncultured bacterium GKJWQY101BHT31 160 3 129 1E-55 98% 98% 160922967 Bacteria n n uncultured bacterium GKJWQY101BDAHB 214 28 161 8E-63 100% 100% 160922973 Bacteria n n uncultured bacterium GKJWQY101B0MIJ 173 5 131 1E-55 98% 98% 161085546 Bacteria n n uncultured bacterium GKJWQY101ADTLC 352 18 113 4E-38 98% 98% 164460342 Bacteria n n uncultured bacterium GKJWQY101BV86J 228 5 169 5E-65 95% 95% 169130236 Bacteria n n uncultured bacterium
Table S3 Cont.

GKJWQY101BMOCO 135 5 84 5E-33 100% 100% 169132834 Bacteria n n uncultured bacterium GKJWQY101AQHB3 251 24 193 5E-81 99% 99% 169266077 Bacteria n n uncultured bacterium GKJWQY101BDMF0 229 19 194 2E-45 87% 87% 169270486 Bacteria n n uncultured bacterium GKJWQY101A7UQ6 95 6 62 2E-14 93% 93% 169270865 Bacteria n n uncultured bacterium GKJWQY101A7X5W 190 4 125 3E-46 95% 95% 169276777 Bacteria n n uncultured bacterium GKJWQY101BJ2CQ 133 42 101 1E-18 97% 97% 169281343 Bacteria n n uncultured bacterium GKJWQY101BYP36 218 5 173 4E-66 94% 94% 169285764 Bacteria n n uncultured bacterium GKJWQY101A0A9F 202 24 177 1E-55 93% 93% 169286886 Bacteria n n uncultured bacterium GKJWQY101AYP07 478 200 417 2E-101 98% 98% 169288388 Bacteria n n uncultured bacterium GKJWQY101ACHNZ 216 5 147 8E-68 100% 100% 169908324 Bacteria n n uncultured bacterium GKJWQY101AVWFQ 148 5 92 9E-36 99% 99% 186703460 Bacteria n n uncultured bacterium GKJWQY101AWIA1 153 24 97 3E-26 97% 97% 187424100 Bacteria n n uncultured bacterium GKJWQY101BZBZK 255 16 94 8E-29 97% 97% 187472598 Bacteria n n uncultured bacterium GKJWQY101BO4CQ 188 2 132 4E-55 97% 97% 187967876 Bacteria n n uncultured bacterium GKJWQY101BQFTF 133 17 85 6E-27 100% 100% 189017027 Bacteria n n uncultured bacterium GKJWQY101AW4ZG 229 5 100 6E-40 99% 99% 189305490 Bacteria n n uncultured bacterium GKJWQY101BTFQ8 263 7 131 3E-53 98% 98% 190703341 Bacteria n n uncultured bacterium GKJWQY101AT5HI 193 18 149 1E-46 93% 93% 192787190 Bacteria n n uncultured bacterium GKJWQY101BYLLM 216 5 162 2E-74 99% 99% 192975671 Bacteria n n uncultured bacterium GKJWQY101APW1M 234 6 174 1E-76 98% 98% 192976646 Bacteria n n uncultured bacterium GKJWQY101BCCB7 96 18 65 2E-15 100% 100% 192979799 Bacteria n n uncultured bacterium GKJWQY101BLGRG 233 14 189 2E-74 96% 96% 192984058 Bacteria n n uncultured bacterium GKJWQY101A68XQ 161 2 113 1E-50 100% 100% 192985634 Bacteria n n uncultured bacterium GKJWQY101AYCI9 190 5 133 2E-54 98% 98% 192988614 Bacteria n n uncultured 
bacterium GKJWQY101A0EAO 184 4 142 7E-58 96% 96% 192989009 Bacteria n n uncultured bacterium GKJWQY101AEMB4 196 10 102 1E-30 94% 94% 192989690 Bacteria n n uncultured bacterium GKJWQY101BEIXQ 337 22 203 6E-86 99% 99% 194139977 Bacteria n n uncultured bacterium GKJWQY101AFAG6 220 42 165 1E-55 99% 99% 196050989 Bacteria n n uncultured bacterium GKJWQY101BAP6K 228 3 180 3E-87 100% 100% 197346063 Bacteria n n uncultured bacterium GKJWQY101AEOVS 227 3 147 7E-69 100% 100% 197346834 Bacteria n n uncultured bacterium GKJWQY101B2WLT 208 21 179 6E-69 97% 97% 197347839 Bacteria n n uncultured bacterium GKJWQY101AC3X6 193 5 125 2E-48 97% 97% 197350746 Bacteria n n uncultured bacterium GKJWQY101BEOTH 220 4 168 1E-56 92% 92% 197358937 Bacteria n n uncultured bacterium GKJWQY101BTB05 73 18 50 0.0000003 100% 100% 198403818 Bacteria n n uncultured bacterium GKJWQY101BP7BQ 394 168 334 5E-63 95% 95% 198403887 Bacteria n n uncultured bacterium GKJWQY101A3PZM 120 25 85 1E-14 93% 93% 206598970 Bacteria n n uncultured bacterium GKJWQY101BBE60 189 24 149 4E-55 98% 98% 210076514 Bacteria n n uncultured bacterium GKJWQY101AP30O 219 5 174 2E-79 99% 99% 214022832 Bacteria n n uncultured bacterium GKJWQY101ATCER 216 18 202 6E-89 99% 99% 214025732 Bacteria n n uncultured bacterium GKJWQY101AZILO 235 18 149 1E-61 100% 100% 215262245 Bacteria n n uncultured bacterium GKJWQY101ATF4E 246 18 147 2E-60 100% 100% 215269474 Bacteria n n uncultured bacterium GKJWQY101AISXB 219 16 174 5E-75 99% 99% 215270163 Bacteria n n uncultured bacterium GKJWQY101A1VO9 144 16 66 7E-17 100% 100% 215271334 Bacteria n n uncultured bacterium GKJWQY101AQV1K 224 17 180 9E-58 92% 92% 217323656 Bacteria n n uncultured bacterium GKJWQY101A2QDJ 112 4 68 4E-23 98% 98% 217417638 Bacteria n n uncultured bacterium GKJWQY101BZW8R 188 6 90 2E-29 96% 96% 218411184 Bacteria n n uncultured bacterium GKJWQY101BAP4Z 144 1 102 3E-45 100% 100% 220980230 Bacteria n n uncultured bacterium GKJWQY101AF6N4 160 17 133 2E-33 91% 91% 222101801 
Bacteria n n uncultured bacterium GKJWQY101BU7MO 133 18 80 2E-21 98% 98% 223675948 Bacteria n n uncultured bacterium GKJWQY101AVNTG 205 5 152 6E-69 99% 99% 223676026 Bacteria n n uncultured bacterium GKJWQY101AARCP 213 90 178 9E-38 100% 100% 223676405 Bacteria n n uncultured bacterium GKJWQY101AZYPW 181 21 150 3E-47 94% 94% 223677000 Bacteria n n uncultured bacterium GKJWQY101ACQYN 229 5 185 7E-89 100% 100% 223677158 Bacteria n n uncultured bacterium GKJWQY101A1FP8 170 5 118 8E-52 100% 100% 223677225 Bacteria n n uncultured bacterium GKJWQY101AXZSR 623 341 586 7E-98 94% 94% 223679670 Bacteria n n uncultured bacterium GKJWQY101AG521 203 5 159 8E-73 99% 99% 223679768 Bacteria n n uncultured bacterium GKJWQY101BPNVT 86 4 41 6E-10 100% 100% 223680429 Bacteria n n uncultured bacterium GKJWQY101BPZTP 212 24 167 2E-64 99% 99% 223681084 Bacteria n n uncultured bacterium GKJWQY101BSD63 168 17 130 2E-32 91% 91% 223681760 Bacteria n n uncultured bacterium GKJWQY101AF98O 498 293 451 1E-65 96% 96% 223684843 Bacteria n n uncultured bacterium GKJWQY101AQ46D 201 18 102 1E-35 100% 100% 223689665 Bacteria n n uncultured bacterium GKJWQY101BSWK8 103 18 97 6E-26 95% 95% 223696363 Bacteria n n uncultured bacterium GKJWQY101BWN63 212 5 179 1E-85 100% 100% 223696670 Bacteria n n uncultured bacterium GKJWQY101BF2X2 209 4 156 2E-58 94% 94% 223696681 Bacteria n n uncultured bacterium GKJWQY101BJIPN 129 5 58 1E-18 100% 100% 223696712 Bacteria n n uncultured bacterium GKJWQY101BINK2 216 5 173 3E-82 100% 100% 223987560 Bacteria n n uncultured bacterium GKJWQY101B1686 179 5 136 9E-62 100% 100% 224569286 Bacteria n n uncultured bacterium GKJWQY101BBI23 224 5 186 2E-89 100% 100% 224569396 Bacteria n n uncultured bacterium GKJWQY101AIJSJ 280 18 170 2E-69 99% 99% 224569450 Bacteria n n uncultured bacterium GKJWQY101ACUTJ 176 1 144 3E-66 99% 99% 224570010 Bacteria n n uncultured bacterium GKJWQY101BMIOK 206 5 161 6E-74 99% 99% 225302498 Bacteria n n uncultured bacterium GKJWQY101A8GVM 110 4 65 4E-23 
100% 100% 225332552 Bacteria n n uncultured bacterium GKJWQY101AJOGY 450 154 395 2E-112 98% 98% 225337346 Bacteria n n uncultured bacterium GKJWQY101A94MM 132 17 100 5E-33 99% 99% 225382275 Bacteria n n uncultured bacterium GKJWQY101A97N8 114 5 65 7E-21 98% 98% 226446886 Bacteria n n uncultured bacterium GKJWQY101AQQE2 184 69 137 4E-25 99% 99% 227937910 Bacteria n n uncultured bacterium GKJWQY101A47QD 214 24 169 8E-58 95% 95% 229428878 Bacteria n n uncultured bacterium GKJWQY101B08E3 181 16 149 1E-50 95% 95% 229428912 Bacteria n n uncultured bacterium GKJWQY101AJVH3 191 18 174 6E-74 99% 99% 237774919 Bacteria n n uncultured bacterium GKJWQY101BKWJX 102 2 52 5E-17 100% 100% 238274022 Bacteria n n uncultured bacterium GKJWQY101BB15S 206 5 152 6E-69 99% 99% 238275953 Bacteria n n uncultured bacterium GKJWQY101BPQIT 176 3 118 7E-53 100% 100% 238286775 Bacteria n n uncultured bacterium GKJWQY101AX4CO 219 24 192 3E-82 100% 100% 238296004 Bacteria n n uncultured bacterium

GKJWQY101AFUHT 181 5 120 7E-53 100% 100% 238302943 Bacteria n n uncultured bacterium GKJWQY101B0FZ0 182 5 137 2E-62 100% 100% 238303545 Bacteria n n uncultured bacterium GKJWQY101ANDAP 251 24 190 6E-75 98% 98% 238306185 Bacteria n n uncultured bacterium GKJWQY101AG7ZP 222 3 172 4E-81 99% 99% 238306254 Bacteria n n uncultured bacterium GKJWQY101ARRBE 234 4 202 7E-99 100% 100% 238309466 Bacteria n n uncultured bacterium GKJWQY101BPA56 228 3 174 1E-81 99% 99% 238309474 Bacteria n n uncultured bacterium GKJWQY101BNTDY 263 5 202 1E-96 99% 99% 238310689 Bacteria n n uncultured bacterium GKJWQY101AELLV 368 5 120 2E-47 97% 97% 238313072 Bacteria n n uncultured bacterium GKJWQY101BME63 269 5 189 4E-82 97% 97% 238313611 Bacteria n n uncultured bacterium GKJWQY101BQWYM 200 1 140 2E-64 99% 99% 238315007 Bacteria n n uncultured bacterium GKJWQY101AFK5L 220 5 180 4E-86 100% 100% 238319366 Bacteria n n uncultured bacterium GKJWQY101BVCL6 201 23 156 1E-60 99% 99% 238324477 Bacteria n n uncultured bacterium GKJWQY101ABFSX 463 196 431 2E-107 97% 97% 238327948 Bacteria n n uncultured bacterium GKJWQY101B0QT6 171 5 89 4E-20 91% 91% 238331528 Bacteria n n uncultured bacterium GKJWQY101AL5T9 227 14 202 1E-86 98% 98% 238335486 Bacteria n n uncultured bacterium GKJWQY101BIA0Y 180 5 129 4E-55 98% 98% 238336486 Bacteria n n uncultured bacterium GKJWQY101BND1Y 186 4 154 3E-72 100% 100% 238340166 Bacteria n n uncultured bacterium GKJWQY101A9YVS 229 45 183 3E-62 99% 99% 238341273 Bacteria n n uncultured bacterium GKJWQY101BFJXK 114 3 84 3E-34 100% 100% 238347940 Bacteria n n uncultured bacterium GKJWQY101A9BQW 162 5 116 1E-50 100% 100% 238349878 Bacteria n n uncultured bacterium GKJWQY101AG5ZO 231 5 182 3E-48 88% 88% 238351695 Bacteria n n uncultured bacterium GKJWQY101A2NQY 210 5 164 6E-74 99% 99% 238352312 Bacteria n n uncultured bacterium GKJWQY101AELIY 202 10 103 6E-39 99% 99% 238352426 Bacteria n n uncultured bacterium GKJWQY101A6YZN 180 2 123 1E-54 99% 99% 238352568 Bacteria n n 
uncultured bacterium GKJWQY101BSXH9 574 147 574 4E-160 91% 91% 238415763 Bacteria n n uncultured bacterium GKJWQY101A6U0N 178 21 133 3E-51 100% 100% 238587419 Bacteria n n uncultured bacterium GKJWQY101BURBQ 202 4 157 5E-70 99% 99% 239837088 Bacteria n n uncultured bacterium GKJWQY101AUZ7K 232 18 185 5E-80 99% 99% 239837341 Bacteria n n uncultured bacterium GKJWQY101BFQ9T 171 2 113 1E-50 100% 100% 240001746 Bacteria n n uncultured bacterium GKJWQY101AZUYL 129 17 89 4E-29 100% 100% 247891912 Bacteria n n uncultured bacterium GKJWQY101B09JV 123 17 91 6E-27 97% 97% 253825750 Bacteria n n uncultured bacterium GKJWQY101BUXPV 140 18 109 1E-39 100% 100% 254547427 Bacteria n n uncultured bacterium GKJWQY101BE5CL 181 7 131 5E-49 96% 96% 254771233 Bacteria n n uncultured bacterium GKJWQY101BWU6P 603 36 63 0.002 100% 100% 254971702 Bacteria n n uncultured bacterium GKJWQY101BHUKI 229 5 184 5E-80 97% 97% 255043921 Bacteria n n uncultured bacterium GKJWQY101BRBK4 185 2 119 3E-52 99% 99% 255044082 Bacteria n n uncultured bacterium GKJWQY101A55O7 145 23 105 5E-33 99% 99% 255339758 Bacteria n n uncultured bacterium GKJWQY101AD8U0 274 3 185 6E-90 100% 100% 255339764 Bacteria n n uncultured bacterium GKJWQY101ANPVF 190 24 158 9E-57 97% 97% 256592689 Bacteria n n uncultured bacterium GKJWQY101BBIYS 87 18 60 2E-10 98% 98% 256681321 Bacteria n n uncultured bacterium GKJWQY101AYIQ0 173 5 122 1E-49 98% 98% 257131081 Bacteria n n uncultured bacterium GKJWQY101B1L3S 133 3 65 1E-14 92% 92% 258548056 Bacteria n n uncultured bacterium GKJWQY101AWV2U 233 5 201 9E-93 98% 98% 258550960 Bacteria n n uncultured bacterium GKJWQY101APOW0 55 2 43 2E-12 100% 100% 258551205 Bacteria n n uncultured bacterium GKJWQY101BX81T 688 5 457 0 96% 96% 258680101 Bacteria n n uncultured bacterium GKJWQY101AVW0Q 378 18 335 1E-147 97% 97% 258680407 Bacteria n n uncultured bacterium GKJWQY101AOP20 284 5 210 1E-92 97% 97% 258680950 Bacteria n n uncultured bacterium GKJWQY101AEFC7 538 5 476 0 93% 93% 258681941 Bacteria 
n n uncultured bacterium GKJWQY101ADZOY 346 17 312 6E-146 99% 99% 258681993 Bacteria n n uncultured bacterium GKJWQY101ACRBS 347 7 260 2E-115 97% 97% 258682090 Bacteria n n uncultured bacterium GKJWQY101BPZYR 222 22 188 4E-51 89% 89% 258682100 Bacteria n n uncultured bacterium GKJWQY101AI27W 451 8 418 6E-142 89% 89% 258682119 Bacteria n n uncultured bacterium GKJWQY101BXXJY 285 17 241 7E-110 99% 99% 258682645 Bacteria n n uncultured bacterium GKJWQY101A3VNN 207 5 159 8E-73 99% 99% 258682725 Bacteria n n uncultured bacterium GKJWQY101AJO34 386 6 339 1E-148 96% 96% 258683264 Bacteria n n uncultured bacterium GKJWQY101B089A 543 26 541 0 97% 97% 258684587 Bacteria n n uncultured bacterium GKJWQY101B1U3Z 549 19 543 0 98% 98% 258684800 Bacteria n n uncultured bacterium GKJWQY101BMENF 559 20 547 0 96% 96% 258684963 Bacteria n n uncultured bacterium GKJWQY101ADSPO 151 24 106 2E-27 95% 95% 258685315 Bacteria n n uncultured bacterium GKJWQY101AWUL7 137 5 91 4E-34 98% 98% 258686152 Bacteria n n uncultured bacterium GKJWQY101BF5YU 522 21 425 0 96% 96% 258687396 Bacteria n n uncultured bacterium GKJWQY101A4682 528 107 472 9E-171 97% 97% 258687549 Bacteria n n uncultured bacterium GKJWQY101AJVQS 546 5 447 0 98% 98% 258687641 Bacteria n n uncultured bacterium GKJWQY101BNW25 426 5 245 8E-61 87% 87% 258687840 Bacteria n n uncultured bacterium GKJWQY101ADWNC 328 17 282 1E-132 99% 99% 258687963 Bacteria n n uncultured bacterium GKJWQY101AC6T3 447 25 368 4E-164 97% 97% 258688401 Bacteria n n uncultured bacterium GKJWQY101BAGMC 219 6 152 5E-70 100% 100% 258688531 Bacteria n n uncultured bacterium GKJWQY101BYR0S 412 5 378 1E-167 95% 95% 258688670 Bacteria n n uncultured bacterium GKJWQY101AIZ1T 226 18 192 1E-75 97% 97% 258689065 Bacteria n n uncultured bacterium GKJWQY101APBNE 557 10 538 0 89% 89% 258689086 Bacteria n n uncultured bacterium GKJWQY101BKH69 253 24 220 2E-89 97% 97% 258689185 Bacteria n n uncultured bacterium GKJWQY101ADEAR 164 4 65 6E-23 100% 100% 259604289 Bacteria n n 
uncultured bacterium GKJWQY101BXWG8 197 5 150 2E-58 96% 96% 259880044 Bacteria n n uncultured bacterium GKJWQY101BEEWO 134 17 102 2E-36 100% 100% 259880137 Bacteria n n uncultured bacterium GKJWQY101ACH03 164 4 119 6E-53 100% 100% 260108166 Bacteria n n uncultured bacterium GKJWQY101A1RXA 132 1 87 3E-35 99% 99% 260609497 Bacteria n n uncultured bacterium GKJWQY101BLGDQ 231 18 160 9E-68 100% 100% 260609898 Bacteria n n uncultured bacterium GKJWQY101AERMJ 210 14 157 1E-66 99% 99% 260609992 Bacteria n n uncultured bacterium GKJWQY101BY15Y 137 5 70 2E-21 97% 97% 260667055 Bacteria n n uncultured bacterium GKJWQY101BPHVX 143 5 99 4E-34 96% 96% 260667085 Bacteria n n uncultured bacterium GKJWQY101BKT0U 137 5 125 4E-49 97% 97% 261261872 Bacteria n n uncultured bacterium GKJWQY101AQVUZ 224 5 167 1E-75 99% 99% 261261913 Bacteria n n uncultured bacterium GKJWQY101AQWBK 90 24 63 2E-09 98% 98% 262174158 Bacteria n n uncultured bacterium

GKJWQY101AD88J 193 1 134 3E-61 99% 99% 262174241 Bacteria n n uncultured bacterium GKJWQY101BRBKI 141 4 98 1E-39 99% 99% 262528725 Bacteria n n uncultured bacterium GKJWQY101BFOCG 165 18 87 5E-24 97% 97% 269152240 Bacteria n n uncultured bacterium GKJWQY101ADP9P 317 7 160 8E-55 93% 93% 269162065 Bacteria n n uncultured bacterium GKJWQY101AA3KS 167 5 125 1E-55 100% 100% 269162927 Bacteria n n uncultured bacterium GKJWQY101A0UAE 86 1 79 4E-31 99% 99% 269855408 Bacteria n n uncultured bacterium GKJWQY101AVZ1L 428 182 375 1E-93 99% 99% 269973185 Bacteria n n uncultured bacterium GKJWQY101BV8M2 198 18 165 1E-70 100% 100% 270104615 Bacteria n n uncultured bacterium GKJWQY101A2DWK 155 5 102 6E-43 100% 100% 270267828 Bacteria n n uncultured bacterium GKJWQY101BOFMM 163 5 111 6E-48 100% 100% 281485327 Bacteria n n uncultured bacterium GKJWQY101A7BNU 123 5 80 7E-31 100% 100% 281488397 Bacteria n n uncultured bacterium GKJWQY101BLS2Q 356 5 201 8E-95 99% 99% 284158463 Bacteria n n uncultured bacterium GKJWQY101AU3MF 184 5 132 2E-59 100% 100% 284944629 Bacteria n n uncultured bacterium GKJWQY101A4RQ9 107 5 55 5E-17 100% 100% 285016583 Bacteria n n uncultured bacterium GKJWQY101ANHEG 160 9 57 8E-12 96% 96% 285960274 Bacteria n n uncultured bacterium GKJWQY101BVG3R 205 5 160 2E-63 96% 96% 285960440 Bacteria n n uncultured bacterium GKJWQY101ALVLH 206 22 156 2E-63 100% 100% 285960578 Bacteria n n uncultured bacterium GKJWQY101AIGIT 131 3 99 4E-39 98% 98% 288551175 Bacteria n n uncultured bacterium GKJWQY101AX0X2 121 22 75 1E-18 100% 100% 289185720 Bacteria n n uncultured bacterium GKJWQY101AX88J 191 6 152 6E-49 91% 91% 289185786 Bacteria n n uncultured bacterium GKJWQY101AOFRT 245 17 194 7E-84 99% 99% 289656603 Bacteria n n uncultured bacterium GKJWQY101AC3OO 104 4 48 1E-13 100% 100% 290564556 Bacteria n n uncultured bacterium GKJWQY101AO7X1 208 17 128 1E-41 96% 96% 290591144 Bacteria n n uncultured bacterium GKJWQY101BSD8Q 473 143 178 0.00001 97% 97% 290596837 Bacteria n n 
uncultured bacterium GKJWQY101AU3TD 88 19 56 6E-10 100% 100% 290604206 Bacteria n n uncultured bacterium GKJWQY101BRKP7 177 5 160 7E-53 92% 92% 290611134 Bacteria n n uncultured bacterium GKJWQY101B3CO0 211 5 163 4E-71 98% 98% 290618299 Bacteria n n uncultured bacterium GKJWQY101AWPF6 225 24 181 4E-66 96% 96% 290618303 Bacteria n n uncultured bacterium GKJWQY101BI53V 248 18 138 2E-55 100% 100% 290620788 Bacteria n n uncultured bacterium GKJWQY101BTFT8 225 67 141 4E-26 97% 97% 290621759 Bacteria n n uncultured bacterium GKJWQY101BER09 186 5 153 3E-71 100% 100% 290625301 Bacteria n n uncultured bacterium GKJWQY101A3SBZ 181 5 137 2E-62 100% 100% 290770388 Bacteria n n uncultured bacterium GKJWQY101BSAL8 98 4 66 9E-24 100% 100% 290783722 Bacteria n n uncultured bacterium GKJWQY101AR8CU 184 23 182 2E-77 100% 100% 291067147 Bacteria n n uncultured bacterium GKJWQY101A3CG5 230 5 182 2E-65 93% 93% 291192724 Bacteria n n uncultured bacterium GKJWQY101BYJU2 164 5 105 1E-39 97% 97% 291246515 Bacteria n n uncultured bacterium GKJWQY101ASA6U 149 18 100 1E-34 100% 100% 291249052 Bacteria n n uncultured bacterium GKJWQY101BF0DI 67 18 62 5E-09 93% 93% 291249353 Bacteria n n uncultured bacterium GKJWQY101BTIQM 177 1 132 3E-56 98% 98% 291252246 Bacteria n n uncultured bacterium GKJWQY101BAAER 185 18 86 1E-26 100% 100% 291510658 Bacteria n n uncultured bacterium GKJWQY101A02NJ 200 18 168 6E-69 99% 99% 295008960 Bacteria n n uncultured bacterium GKJWQY101AVMBQ 106 24 61 8E-10 100% 100% 295012247 Bacteria n n uncultured bacterium GKJWQY101A3CHU 218 17 187 2E-63 93% 93% 295013607 Bacteria n n uncultured bacterium GKJWQY101ADQPK 186 5 137 1E-60 99% 99% 295013630 Bacteria n n uncultured bacterium GKJWQY101BR4RJ 260 5 193 3E-93 100% 100% 295027769 Bacteria n n uncultured bacterium GKJWQY101AIPYT 186 5 137 2E-54 97% 97% 295045179 Bacteria n n uncultured bacterium GKJWQY101AQDVB 104 28 73 3E-14 100% 100% 295083163 Bacteria n n uncultured bacterium GKJWQY101BM4L2 95 5 45 1E-11 100% 100% 
295638929 Bacteria n n uncultured bacterium GKJWQY101BH7TJ 266 60 235 5E-86 100% 100% 295810063 Bacteria n n uncultured bacterium GKJWQY101AWOPR 185 5 130 2E-58 100% 100% 295810065 Bacteria n n uncultured bacterium GKJWQY101BD5HX 204 5 156 2E-69 99% 99% 295810143 Bacteria n n uncultured bacterium GKJWQY101AL26W 207 19 150 2E-54 97% 97% 295810432 Bacteria n n uncultured bacterium GKJWQY101AU1LV 120 5 92 2E-37 100% 100% 295810565 Bacteria n n uncultured bacterium GKJWQY101BBPAI 146 5 95 4E-39 100% 100% 62997521 Bacteria n n uncultured bacterium GKJWQY101BI2RK 467 190 413 1E-104 98% 98% 223030240 Bacteria n n uncultured bacterium GKJWQY101BRKQJ 268 24 216 4E-92 99% 99% 223033480 Bacteria n n uncultured bacterium GKJWQY101BPZ30 248 27 203 3E-73 96% 96% 295016296 Bacteria n n uncultured compost bacterium GKJWQY101B2BOY 143 21 98 7E-32 100% 100% 16517864 Bacteria n n uncultured soil bacterium GKJWQY101BK6K3 245 13 194 2E-84 98% 98% 283444151 Bacteria n n uncultured soil bacterium GKJWQY101A6O01 167 1 167 8E-73 97% 97% 193084603 Bacteria n n Uncultured bacterium GKJWQY101ARH2A 177 6 125 2E-53 99% 99% 291258594 Bacteria n n uncultured bacterium GKJWQY101BHTXI 135 17 105 8E-36 99% 99% 291259108 Bacteria n n uncultured bacterium GKJWQY101B1CM3 234 9 121 4E-17 83% 83% 291259661 Bacteria n n uncultured bacterium GKJWQY101B0GC4 232 5 182 1E-80 98% 98% 291259901 Bacteria n n uncultured bacterium GKJWQY101BK2V0 220 5 126 4E-46 95% 95% 291260044 Bacteria n n uncultured bacterium GKJWQY101A3QAF 430 5 174 8E-71 96% 96% 291260145 Bacteria n n uncultured bacterium GKJWQY101A4I2H 200 23 145 1E-51 98% 98% 291260173 Bacteria n n uncultured bacterium GKJWQY101A52X5 151 5 72 2E-22 97% 97% 291260287 Bacteria n n uncultured bacterium GKJWQY101AC0EY 132 5 59 5E-13 93% 93% 291260377 Bacteria n n uncultured bacterium GKJWQY101BH0G8 159 5 101 2E-42 100% 100% 291260391 Bacteria n n uncultured bacterium GKJWQY101B1PKG 118 5 74 3E-19 93% 93% 291260525 Bacteria n n uncultured bacterium 
GKJWQY101BEX01 215 52 140 2E-24 91% 91% 291260593 Bacteria n n uncultured bacterium GKJWQY101A0SSA 243 67 180 5E-21 84% 84% 291260615 Bacteria n n uncultured bacterium GKJWQY101BNG1Z 167 5 115 2E-48 99% 99% 291260854 Bacteria n n uncultured bacterium GKJWQY101AZHPN 175 15 119 2E-18 84% 84% 291261156 Bacteria n n uncultured bacterium GKJWQY101AI0QB 126 17 91 1E-28 99% 99% 291261438 Bacteria n n uncultured bacterium GKJWQY101AX367 102 5 46 0.0000005 93% 93% 291261727 Bacteria n n uncultured bacterium GKJWQY101BAGJP 139 4 84 7E-32 99% 99% 291261757 Bacteria n n uncultured bacterium GKJWQY101AH42X 188 31 108 1E-16 89% 89% 12583967 Bacteria Planctomycetes Planctomycetaceae planctomycete str. 139 GKJWQY101BQI14 154 51 119 8E-22 96% 96% 12583972 Bacteria Planctomycetes Planctomycetaceae planctomycete str. 670 GKJWQY101A0ZU2 198 5 144 4E-66 100% 100% 285200728 Bacteria Proteobacteria (alpha) n alpha proteobacterium oral taxon A67 GKJWQY101A6E43 105 8 62 2E-15 96% 96% 35464139 Bacteria Proteobacteria (alpha) n uncultured alpha proteobacterium GKJWQY101BQS3G 101 2 60 2E-21 100% 100% 55975756 Bacteria Proteobacteria (alpha) n uncultured alpha proteobacterium

GKJWQY101BHUK1 433 18 367 6E-167 97% 97% 146429045 Bacteria Proteobacteria (alpha) n uncultured alpha proteobacterium GKJWQY101AYR9D 217 5 149 7E-69 100% 100% 295147952 Bacteria Proteobacteria (alpha) Methylobacteriaceae Methylobacterium fujisawaense GKJWQY101BTLIZ 177 17 143 3E-51 96% 96% 289185500 Bacteria Proteobacteria (alpha) Rhodobiaceae Afifella marina GKJWQY101BFJ8J 221 3 160 9E-48 90% 90% 197734890 Bacteria Proteobacteria (alpha) Rhizobiaceae Sinorhizobium meliloti GKJWQY101AL2H8 139 4 82 9E-31 99% 99% 109140177 Bacteria Proteobacteria (beta) Alcaligenaceae uncultured Alcaligenes sp. GKJWQY101ANXLQ 190 5 119 3E-52 100% 100% 76665718 Bacteria Proteobacteria (beta) Burkholderiaceae Burkholderia sp. STM1424 GKJWQY101AUNYR 226 5 180 7E-84 99% 99% 269113442 Bacteria Proteobacteria (beta) Burkholderiaceae uncultured Burkholderia sp. GKJWQY101AGFKW 130 5 77 4E-29 100% 100% 189305112 Bacteria Proteobacteria (beta) Burkholderiaceae uncultured Ralstonia sp. GKJWQY101BA25Y 151 4 110 6E-48 100% 100% 184189965 Bacteria Proteobacteria (beta) Comamonadaceae uncultured Comamonadaceae bacterium GKJWQY101A9UVV 302 18 154 2E-60 99% 99% 284793547 Bacteria Proteobacteria (beta) n uncultured beta proteobacterium GKJWQY101AY5N9_2 141 1 141 2E-53 94% 94% 116651960 Bacteria Proteobacteria (beta) Burkholderiaceae Burkholderia cepacia complex GKJWQY101BME2Z 235 5 179 2E-85 100% 100% 112012332 Bacteria Proteobacteria (epsilon) Helicobacteraceae Helicobacter sp. MIT 01-3238 GKJWQY101B09E4_2 227 1 227 5E-30 74% 74% 345468266 Bacteria Proteobacteria (epsilon) Campylobacteraceae Arcobacter butzleri GKJWQY101APNF5 164 56 130 3E-21 94% 94% 260986235 Bacteria Proteobacteria (gamma) Enterobacteriaceae Erwinia sp. 
AaMG18 GKJWQY101AG17Z 148 5 95 3E-35 98% 98% 156618579 Bacteria Proteobacteria (gamma) n gamma proteobacterium A11 GKJWQY101AX2T2 204 32 158 3E-57 99% 99% 291195506 Bacteria Proteobacteria (gamma) n gamma proteobacterium enrichment culture clone BP44-5 GKJWQY101AG5GL 101 4 62 3E-19 98% 98% 8547190 Bacteria Proteobacteria (gamma) n uncultured gamma proteobacterium KEppiB7 GKJWQY101AGSVT 182 18 129 3E-21 85% 85% 159576434 Bacteria Proteobacteria (gamma) Halomonadaceae Halomonas sp. S8-1 GKJWQY101APW63 205 16 169 6E-74 100% 100% 192757978 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter frigidicola GKJWQY101BCXNO 231 5 175 2E-79 99% 99% 13276758 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter immobilis GKJWQY101AGV77 240 5 136 5E-31 87% 87% 168812040 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter maritimus GKJWQY101ATVI7 196 5 107 1E-45 100% 100% 222142510 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. 24 GKJWQY101AO5SO 172 18 90 5E-29 100% 100% 46250582 Bacteria Proteobacteria (gamma) Moraxellaceae Psychrobacter sp. CH61 GKJWQY101BVGIC 225 73 176 4E-46 100% 100% 209422607 Bacteria Proteobacteria (gamma) Moraxellaceae uncultured Acinetobacter sp. GKJWQY101BNT3R 127 5 77 4E-24 96% 96% 10719523 Bacteria Proteobacteria (gamma) Moraxellaceae uncultured Psychrobacter SIC.10360 GKJWQY101B0ZPN 207 62 146 2E-33 99% 99% 295345523 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. enrichment culture clone 23.2 GKJWQY101AD6DO 175 21 123 1E-45 100% 100% 295651552 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. HMD3178 GKJWQY101AJLCW 334 17 133 4E-53 100% 100% 295815425 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. III_B28 GKJWQY101BPHU6 70 18 59 3E-12 100% 100% 164707704 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. PGO22 GKJWQY101BQTSN 196 4 147 2E-68 100% 100% 259090496 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. 
VS05_30 GKJWQY101BZ7PS 172 4 138 4E-60 99% 99% 129561834 Bacteria Proteobacteria (gamma) Pseudomonadaceae Pseudomonas sp. WW6 GKJWQY101AHQ2U 206 93 184 2E-29 94% 94% 283979678 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonadaceae bacterium GKJWQY101BGAKE 423 17 168 2E-72 100% 100% 163676404 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. GKJWQY101BJYWX 357 80 324 1E-122 100% 100% 238543855 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. GKJWQY101BZD8Z 89 1 44 3E-13 100% 100% 242124429 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. GKJWQY101AYZXN 265 25 197 2E-84 100% 100% 295687302 Bacteria Proteobacteria (gamma) Pseudomonadaceae uncultured Pseudomonas sp. GKJWQY101BV8TV 70 1 70 9E-28 100% 100% 384478111 Bacteria Proteobacteria (gamma) Enterobacteriaceae Providencia GKJWQY101BI5B7 133 5 75 5E-18 92% 92% 151936519 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium GKJWQY101BHXF9 221 18 190 2E-79 98% 98% 154189153 Bacteria Proteobacteria (uncultured) n uncultured proteobacterium GKJWQY101ASNWF 122 16 76 1E-18 97% 97% 9828144 Eukaryota Ascomycota Lecanoraceae Lecanora intumescens GKJWQY101A5TK8 127 19 64 4E-14 100% 100% 268633305 Eukaryota Bacillariophyta Thalassiosiraceae Stephanodiscus sp. 
KHR001 GKJWQY101BTMVW 234 18 202 1E-71 94% 94% 290770772 Eukaryota Chlorophyta Pycnococcaceae Nephroselmis astigmatica GKJWQY101AGMQF 507 4 251 9E-106 95% 95% 290770792 Eukaryota Chlorophyta n Pyramimonas parkeae GKJWQY101ARIDL 299 4 233 1E-107 98% 98% 290770790 Eukaryota Chlorophyta n Pyramimonas tetrarhynchus GKJWQY101AQVXH 245 32 200 2E-70 96% 96% 290770770 Eukaryota Chlorophyta Pycnococcaceae Pseudoscourfieldia marina GKJWQY101A9ES5 143 21 89 3E-25 99% 99% 225545967 Eukaryota Heterokontophyta n uncultured labyrinthulid GKJWQY101BX53J 135 37 80 5E-13 100% 100% 159031144 Eukaryota Mollusca Veneridae Nutricola tantilla GKJWQY101ASWK5 233 85 154 4E-17 92% 92% 220942102 Eukaryota n n Bilateria environmental sample GKJWQY101A02BY 209 5 164 6E-74 99% 99% 222089870 Eukaryota n n uncultured eukaryote GKJWQY101AJBZL 123 16 71 2E-17 98% 98% 218684719 Eukaryota n n uncultured fungus GKJWQY101BEOOB 167 1 114 4E-50 99% 99% 268637040 Eukaryota n n uncultured fungus GKJWQY101A1CUV 546 5 518 0 93% 93% 56713123 Eukaryota n n uncultured phototrophic eukaryote GKJWQY101BQ79B 114 2 63 4E-23 100% 100% 223674440 Eukaryota n n uncultured phototrophic eukaryote GKJWQY101BANY8 110 18 58 2E-11 100% 100% 39547204 Eukaryota n n uncultured rumen protozoa GKJWQY101BVC5X 141 19 96 2E-26 96% 96% 291262345 Eukaryota n n uncultured eukaryote GKJWQY101B0QOY_2 113 1 113 1E-44 96% 96% 291261826 Eukaryota n n Uncultured eukaryote GKJWQY101BZIGB 493 212 425 4E-94 96% 96% 57340766 Eukaryota Streptophyta n Napoleona sp. JS-2005 GKJWQY101AF93I 149 9 107 7E-42 99% 99% 284506657 Eukaryota Streptophyta n Zygnematales sp. M3006 GKJWQY101AM47G 217 5 93 5E-20 89% 89% 284506685 Eukaryota Streptophyta Zygnemataceae Spirogyra sp. M1843 GKJWQY101AL9IH 486 19 99 3E-21 91% 91% 284506686 Eukaryota Streptophyta Zygnemataceae Zygnema sp. 
M-1156 GKJWQY101AWRLV 167 1 167 2E-78 99% 99% 367479280 Eukaryota Streptophyta Gentianales Asclepias GKJWQY101AQ6JO 415 69 228 6E-57 93% 93% 216963381 Eukaryota Tardigrada Milnesiidae Milnesium tardigradum GKJWQY101A9BVY 144 20 101 2E-32 99% 99% 227452751 Eukaryota Zygomycota Mucoraceae Gongronella sp. xt-2009

Table S4. Bacterial mRNA (and other non-rRNA) gene sequences from V5. Taxonomic affiliations that were not found in NCBI GenBank are marked "n". Records in green are from Table 11 and were moved back (16 records were moved back because the old BLASTN result was better than the new BLASTN and BLASTX results from KAAS KEGG).

454 Sequence number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class/Order | Genus/species
GKJWQY101AGGHX 545 5 506 0 97% 97% 31789410 Bacteria Acidobacteria n uncultured Acidobacteria bacterium GKJWQY101BLAHS 571 66 567 2E-172 89% 89% 117647674 Bacteria Actinobacteria Actinobacteria Acidothermus cellulolyticus GKJWQY101BM4QM 251 3 199 2E-75 93% 93% 140843962 Bacteria Actinobacteria Actinobacteria Corynebacterium glutamicum GKJWQY101BHEK2 433 5 376 0 98% 98% 111147037 Bacteria Actinobacteria Actinobacteria Frankia alni GKJWQY101ALVRX 500 5 174 2E-82 100% 100% 119951388 Bacteria Actinobacteria Actinobacteria Arthrobacter aurescens GKJWQY101B2Q7Z 448 36 408 3E-140 91% 91% 41019279 Bacteria Actinobacteria Actinobacteria Micromonospora echinospora GKJWQY101BJIYE 542 38 541 3E-180 90% 90% 145301903 Bacteria Actinobacteria Actinobacteria Salinispora tropica GKJWQY101BP0TD 550 202 550 1E-80 83% 83% 118168627 Bacteria Actinobacteria Actinobacteria Mycobacterium smegmatis GKJWQY101B2BHA 504 342 439 1E-39 98% 98% 76556220 Bacteria Actinobacteria Actinobacteria Propionibacterium freudenreichii GKJWQY101AE43F 573 18 571 0 91% 91% 133909243 Bacteria Actinobacteria Actinobacteria Saccharopolyspora erythraea GKJWQY101A68QX 413 315 358 2E-07 93% 93% 24413764 Bacteria Actinobacteria Actinobacteria Streptomyces coelicolor GKJWQY101AH1E9 507 161 482 2E-83 85% 85% 260644157 Bacteria Actinobacteria Actinobacteria Streptomyces scabiei GKJWQY101BPEIQ 456 17 410 0 96% 96% 118764602 Bacteria Actinobacteria Actinobacteria Bifidobacterium adolescentis GKJWQY101AFA5U 502 380 494 8E-27 87% 87% 295793053 Bacteria Actinobacteria Actinobacteria Bifidobacterium animalis GKJWQY101A4K6T 615 4 148 2E-29 85% 85% 291516109 Bacteria Actinobacteria Actinobacteria Bifidobacterium longum GKJWQY101AAOCE 567 19 512 3E-136 85% 85% 257473675 Bacteria Actinobacteria Actinobacteria Eggerthella lenta GKJWQY101B0JWH 576 227 572 4E-115 89% 89% 295105686 Bacteria Actinobacteria
Actinobacteria Gordonibacter pamelaeae GKJWQY101BKG7S 547 17 512 6E-148 86% 86% 108764099 Bacteria Actinobacteria Actinobacteria Rubrobacter xylanophilus GKJWQY101A5PQZ 510 4 427 4E-124 86% 86% 256007408 Bacteria Actinobacteria Actinobacteria Acidimicrobium ferrooxidans GKJWQY101A65QS 554 23 549 0 95% 95% 38200856 Bacteria Actinobacteria Actinobacteria Corynebacterium diphtheriae GKJWQY101BB5IP 574 5 569 0 94% 94% 68262661 Bacteria Actinobacteria Actinobacteria Corynebacterium jeikeium GKJWQY101BC0IC 557 16 556 3E-175 88% 88% 237757549 Bacteria Actinobacteria Actinobacteria Corynebacterium kroppenstedtii GKJWQY101AQ51G 597 5 571 0 88% 88% 256687298 Bacteria Actinobacteria Actinobacteria Kytococcus sedentarius GKJWQY101AV3CT 587 19 581 0 93% 93% 119947346 Bacteria Actinobacteria Actinobacteria Arthrobacter aurescens GKJWQY101ASGX9 573 17 503 2E-173 90% 90% 183579876 Bacteria Actinobacteria Actinobacteria Kocuria rhizophila GKJWQY101A6R2G 571 5 570 0 94% 94% 162952245 Bacteria Actinobacteria Actinobacteria Renibacterium salmoninarum GKJWQY101BURAB 566 21 523 2E-92 80% 80% 119953846 Bacteria Actinobacteria Actinobacteria Mycobacterium vanbaalenii GKJWQY101BLJAK 582 5 398 1E-144 91% 91% 54013472 Bacteria Actinobacteria Actinobacteria Nocardia farcinica GKJWQY101APUX9 573 5 569 4E-170 87% 87% 110816552 Bacteria Actinobacteria Actinobacteria Rhodococcus jostii GKJWQY101A2KCQ 591 17 589 0 94% 94% 269095543 Bacteria Actinobacteria Actinobacteria Sanguibacter keddieii GKJWQY101ANZOF 303 4 68 2E-15 92% 92% 212548595 Bacteria Bacteroidetes Bacteroidia Candidatus Azobacteroides pseudotrichonymphae GKJWQY101BUUW5 565 17 562 0 93% 93% 188593544 Bacteria Bacteroidetes Bacteroidia Porphyromonas gingivalis GKJWQY101BQQMF 393 18 342 4E-78 84% 84% 291513545 Bacteria Bacteroidetes Bacteroidia Alistipes shahii GKJWQY101BGIIN 310 18 247 3E-89 93% 93% 254946573 Bacteria Bacteroidetes Cytophagia Dyadobacter fermentans GKJWQY101B2ENG 246 18 200 4E-62 91% 91% 283814236 Bacteria 
Bacteroidetes Cytophagia Spirosoma linguale GKJWQY101BHBKI 544 18 518 0 96% 96% 294979899 Bacteria Bacteroidetes Flavobacteria Zunongwangia profunda GKJWQY101AJ2KZ 595 18 592 0 89% 89% 52214156 Bacteria Bacteroidetes Bacteroidia Bacteroides fragilis GKJWQY101A4I0Z 576 19 574 0 88% 88% 60491031 Bacteria Bacteroidetes Bacteroidia Bacteroides fragilis GKJWQY101BS0NP 584 64 582 0 91% 91% 29342101 Bacteria Bacteroidetes Bacteroidia Bacteroides thetaiotaomicron GKJWQY101AN552 561 4 559 0 88% 88% 149931032 Bacteria Bacteroidetes Bacteroidia Bacteroides vulgatus GKJWQY101AWOSS 589 4 587 0 87% 87% 149935098 Bacteria Bacteroidetes Bacteroidia Parabacteroides distasonis GKJWQY101AP6Y7 476 18 421 8E-156 92% 92% 34398108 Bacteria Bacteroidetes Bacteroidia Porphyromonas gingivalis GKJWQY101BYSCO 548 7 537 7E-167 87% 87% 294471613 Bacteria Bacteroidetes Bacteroidia Prevotella ruminicola GKJWQY101BJIPK 572 1 569 0 92% 92% 255340365 Bacteria Bacteroidetes Flavobacteria Flavobacteriaceae bacterium 3519-10

GKJWQY101BZXZT 586 18 571 0 94% 94% 146152184 Bacteria Bacteroidetes Flavobacteria Flavobacterium johnsoniae GKJWQY101BEOT5 555 20 329 8E-132 95% 95% 149770655 Bacteria Bacteroidetes Flavobacteria Flavobacterium psychrophilum GKJWQY101ABFUA 587 17 582 0 89% 89% 255342900 Bacteria Bacteroidetes Pedobacter heparinus GKJWQY101A91OF 548 15 506 0 91% 91% 295083795 Bacteria Bacteroidetes Bacteroidia Bacteroides xylanisolvens GKJWQY101A8VQ6 567 32 251 4E-45 83% 83% 269787736 Bacteria Chloroflexi Sphaerobacter thermophilus GKJWQY101BB7VS 233 67 184 2E-40 93% 93% 171696371 Bacteria Cyanobacteria Chroococcales Cyanothece sp. ATCC 51142 GKJWQY101ADTE9 305 19 271 2E-90 91% 91% 218169741 Bacteria Cyanobacteria Chroococcales Cyanothece sp. PCC 7424 GKJWQY101AIKHU 277 4 217 4E-92 96% 96% 256588085 Bacteria Cyanobacteria Chroococcales Cyanothece sp. PCC 8802 GKJWQY101A4H47 553 333 381 4E-10 94% 94% 256592529 Bacteria Cyanobacteria Chroococcales Cyanothece sp. PCC 8802 GKJWQY101AXWLH 352 18 320 3E-104 90% 90% 146740642 Bacteria Cyanobacteria Chroococcales Synechococcus rubescens GKJWQY101BYVV6 360 16 277 7E-41 79% 79% 147849409 Bacteria Cyanobacteria Chroococcales Synechococcus sp. RCC307 GKJWQY101AGPF8 340 16 279 4E-43 80% 80% 33633126 Bacteria Cyanobacteria Chroococcales Synechococcus sp. WH 8102 GKJWQY101ASMWQ 163 29 136 1E-40 95% 95% 37508091 Bacteria Cyanobacteria Gloeobacteria Gloeobacter violaceus GKJWQY101BYL6X 559 4 543 0 90% 90% 158303474 Bacteria Cyanobacteria n Acaryochloris marina GKJWQY101AWOW9 290 5 221 2E-80 92% 92% 284809060 Bacteria Cyanobacteria n cyanobacterium UCYN-A GKJWQY101BQ48U 481 5 412 1E-169 93% 93% 219862254 Bacteria Cyanobacteria n Cyanothece sp. 
PCC 7425 GKJWQY101BJ44C 613 18 609 0 88% 88% 56684969 Bacteria Cyanobacteria n Synechococcus elongatus GKJWQY101BQPKI 556 19 546 2E-88 79% 79% 75699950 Bacteria Cyanobacteria n Anabaena variabilis GKJWQY101AKAL4 539 17 497 6E-158 88% 88% 110164990 Bacteria Cyanobacteria n Trichodesmium erythraeum GKJWQY101BXWGA 568 157 184 0.002 100% 100% 111610610 Bacteria Cyanobacteria Nostocales Tolypothrix sp. PCC 7601 = UTEX LB 481 GKJWQY101A5ZEU 261 8 214 4E-92 97% 97% 186463002 Bacteria Cyanobacteria Nostocales Nostoc punctiforme

GKJWQY101APB60 447 5 401 1E-158 92% 92% 47118302 Bacteria Cyanobacteria Nostocales Nostoc sp. PCC 7120 GKJWQY101BC039 546 335 540 3E-17 76% 76% 17134864 Bacteria Cyanobacteria Nostocales Nostoc sp. PCC 7120 GKJWQY101ATFBR 558 5 556 0 90% 90% 146740643 Bacteria Cyanobacteria Oscillatoriales Microcoleus chthonoplastes GKJWQY101AKQUQ 222 18 177 7E-74 99% 99% 146740644 Bacteria Cyanobacteria Oscillatoriales Spirulina sp. PCC 6313 GKJWQY101BRVHX 334 19 280 8E-85 89% 89% 146740638 Bacteria Cyanobacteria Prochlorales Prochlorothrix hollandica GKJWQY101AQDCE 479 17 414 0 98% 98% 290469363 Bacteria Deinococcus-Thermus Deinococci Meiothermus ruber GKJWQY101A2UHA 548 5 238 6E-98 95% 95% 294979666 Bacteria Deinococcus-Thermus Deinococci Thermus thermophilus GKJWQY101AWMCZ 545 26 543 3E-156 86% 86% 226319394 Bacteria Deinococcus-Thermus Deinococci Deinococcus deserti GKJWQY101AQZAT 611 79 551 4E-140 87% 87% 154350369 Bacteria Firmicutes Bacilli Bacillus amyloliquefaciens GKJWQY101ACEUC 378 6 334 1E-103 88% 88% 225785631 Bacteria Firmicutes Bacilli Bacillus cereus GKJWQY101AA6R0 423 5 386 4E-163 94% 94% 218540569 Bacteria Firmicutes Bacilli Bacillus cereus GKJWQY101B2DJZ 416 18 364 6E-147 94% 94% 221237819 Bacteria Firmicutes Bacilli Bacillus cereus GKJWQY101ANNC9 607 25 557 4E-95 80% 80% 37903989 Bacteria Firmicutes Bacilli Bacillus cereus GKJWQY101A6VHS 591 5 575 0 93% 93% 157679556 Bacteria Firmicutes Bacilli Bacillus pumilus GKJWQY101BLJ1O 463 5 368 6E-157 95% 95% 226092535 Bacteria Firmicutes Bacilli Brevibacillus brevis GKJWQY101AHEDV 181 30 142 2E-43 96% 96% 111182492 Bacteria Firmicutes Bacilli Enterococcus avium GKJWQY101BC6PS 383 13 79 3E-20 96% 96% 283481151 Bacteria Firmicutes Bacilli Enterococcus faecium GKJWQY101AMO9E 334 134 282 1E-58 95% 95% 111182490 Bacteria Firmicutes Bacilli Enterococcus gallinarum GKJWQY101A9OD0 479 5 477 0 100% 100% 15146028 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii GKJWQY101ADA2E 517 5 472 0 99% 99% 15146030 Bacteria Firmicutes 
Bacilli Lactobacillus delbrueckii GKJWQY101BI8YA 641 193 226 0.000001 100% 100% 195537732 Bacteria Firmicutes Bacilli Lactobacillus plantarum GKJWQY101A2WZI 582 14 578 0 96% 96% 183223999 Bacteria Firmicutes Bacilli Lactobacillus reuteri GKJWQY101BLMTA 376 5 330 7E-86 85% 85% 120400324 Bacteria Firmicutes Bacilli Lactobacillus johnsonii GKJWQY101AF275 567 23 362 6E-173 99% 99% 116106497 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101BFXEC 525 18 522 0 99% 99% 116103724 Bacteria Firmicutes Bacilli Lactobacillus casei GKJWQY101ACCFZ 548 22 545 0 99% 99% 103422338 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii GKJWQY101BPUIN 555 18 555 0 100% 100% 257152781 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus GKJWQY101AKANV 345 100 142 0.00003 91% 91% 116090851 Bacteria Firmicutes Bacilli Oenococcus oeni GKJWQY101ALDHC 261 17 182 8E-79 99% 99% 3582195 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101A6FQL 507 18 504 0 99% 99% 116108977 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101A5HHI 374 5 271 9E-105 93% 93% 23094784 Bacteria Firmicutes Bacilli Streptococcus agalactiae GKJWQY101A7KUH 203 5 159 4E-71 99% 99% 113120160 Bacteria Firmicutes Bacilli Streptococcus macedonicus GKJWQY101BXGNS 493 5 378 0 99% 99% 225724295 Bacteria Firmicutes Bacilli Streptococcus pneumoniae GKJWQY101AFNBN 518 18 511 1E-169 89% 89% 25307955 Bacteria Firmicutes Bacilli Streptococcus pneumoniae GKJWQY101A2ND4 505 5 355 5E-143 93% 93% 168994879 Bacteria Firmicutes Bacilli Streptococcus pneumoniae GKJWQY101BVGTG 546 4 543 0 97% 97% 182628304 Bacteria Firmicutes Bacilli Streptococcus pneumoniae GKJWQY101A2QNZ 477 17 422 0 98% 98% 209539788 Bacteria Firmicutes Bacilli Streptococcus pyogenes GKJWQY101AIT3S 370 24 324 1E-93 88% 88% 21905618 Bacteria Firmicutes Bacilli Streptococcus pyogenes GKJWQY101BBVTI 484 5 441 0 99% 99% 229264291 Bacteria Firmicutes Bacilli Bacillus anthracis GKJWQY101AXQ8R 580 4 576 0 92% 92% 56908016 Bacteria Firmicutes Bacilli Bacillus 
clausii GKJWQY101ASAUF 575 5 528 0 95% 95% 152022606 Bacteria Firmicutes Bacilli Bacillus cytotoxicus GKJWQY101BZHXT 548 5 548 0 93% 93% 47118318 Bacteria Firmicutes Bacilli Bacillus halodurans GKJWQY101BN6EO 577 18 573 0 90% 90% 145902672 Bacteria Firmicutes Bacilli Bacillus licheniformis GKJWQY101A440I 454 3 188 5E-83 97% 97% 294346812 Bacteria Firmicutes Bacilli Bacillus megaterium GKJWQY101A2M8L 595 5 588 0 93% 93% 294799901 Bacteria Firmicutes Bacilli Bacillus megaterium GKJWQY101AM35J 538 5 537 0 96% 96% 225184640 Bacteria Firmicutes Bacilli Bacillus subtilis GKJWQY101BMIJM 565 1 560 0 95% 95% 168990106 Bacteria Firmicutes Bacilli Lysinibacillus sphaericus GKJWQY101BMULU 441 5 385 0 98% 98% 42632302 Bacteria Firmicutes Bacilli Oceanobacillus iheyensis GKJWQY101BL46W 622 12 498 8E-167 89% 89% 289169617 Bacteria Firmicutes Bacilli GKJWQY101BIMJV 561 5 558 6E-168 86% 86% 171988566 Bacteria Firmicutes Bacilli Exiguobacterium sibiricum GKJWQY101AZ6PA 461 109 412 7E-107 90% 90% 229467163 Bacteria Firmicutes Bacilli Exiguobacterium sp. AT1b GKJWQY101AKUVY 246 18 204 2E-75 95% 95% 261280339 Bacteria Firmicutes Bacilli Paenibacillus sp. 
Y412MC10 GKJWQY101AL2EY 563 5 556 0 91% 91% 222119372 Bacteria Firmicutes Bacilli Macrococcus caseolyticus GKJWQY101BOYUA 408 5 365 1E-178 98% 98% 222420101 Bacteria Firmicutes Bacilli Staphylococcus carnosus GKJWQY101AKXQT 538 18 531 0 97% 97% 68445725 Bacteria Firmicutes Bacilli Staphylococcus haemolyticus GKJWQY101BWB1P 559 18 552 0 97% 97% 289178903 Bacteria Firmicutes Bacilli Staphylococcus lugdunensis GKJWQY101AIN1W 569 5 564 0 94% 94% 72493824 Bacteria Firmicutes Bacilli Staphylococcus saprophyticus GKJWQY101A2H1I 561 29 521 6E-173 89% 89% 158967071 Bacteria Firmicutes Bacilli Lactobacillus acidophilus GKJWQY101BFH4F 563 60 395 2E-167 99% 99% 190711126 Bacteria Firmicutes Bacilli Lactobacillus casei GKJWQY101AIPY1 586 18 585 0 98% 98% 116092543 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii GKJWQY101BXUBT 538 1 516 0 93% 93% 262396937 Bacteria Firmicutes Bacilli Lactobacillus johnsonii GKJWQY101AKKWT 451 18 307 7E-92 88% 88% 254044096 Bacteria Firmicutes Bacilli Lactobacillus plantarum GKJWQY101A9FF7 586 5 583 0 97% 97% 259648365 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus GKJWQY101BF89U 570 5 564 0 99% 99% 90820184 Bacteria Firmicutes Bacilli Lactobacillus salivarius GKJWQY101A6GDN 151 13 119 4E-39 95% 95% 295831662 Bacteria Firmicutes Bacilli Leuconostoc kimchii GKJWQY101AZ303 492 16 431 0 96% 96% 13400022 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101AUKS3 528 3 525 0 99% 99% 124491690 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101A3SPW 574 5 574 0 100% 100% 281374316 Bacteria Firmicutes Bacilli Lactococcus lactis GKJWQY101AB8GR 544 5 516 0 93% 93% 22535226 Bacteria Firmicutes Bacilli Streptococcus agalactiae GKJWQY101BZH5I 545 5 410 3E-170 94% 94% 225700893 Bacteria Firmicutes Bacilli Streptococcus equi GKJWQY101AH1ID 551 5 544 0 97% 97% 288730948 Bacteria Firmicutes Bacilli Streptococcus gallolyticus GKJWQY101BI2WE 560 5 557 0 98% 98% 157074445 Bacteria Firmicutes Bacilli Streptococcus gordonii GKJWQY101AYWCN 564 5 559 
0 97% 97% 254996425 Bacteria Firmicutes Bacilli Streptococcus mutans

GKJWQY101BOVEU 558 21 556 0 97% 97% 225726369 Bacteria Firmicutes Bacilli Streptococcus pneumoniae GKJWQY101A1CSL 395 4 362 1E-172 98% 98% 19913450 Bacteria Firmicutes Bacilli Streptococcus pyogenes GKJWQY101BNM0N 430 107 383 6E-142 100% 100% 292557464 Bacteria Firmicutes Bacilli Streptococcus suis GKJWQY101AYWMX 285 5 253 7E-105 95% 95% 222113012 Bacteria Firmicutes Bacilli Streptococcus uberis GKJWQY101BUKAI 560 18 560 0 94% 94% 295112306 Bacteria Firmicutes Bacilli Enterococcus sp. 7L76 GKJWQY101AW8QY 590 23 587 0 94% 94% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus GKJWQY101ALG3R 326 18 259 5E-112 98% 98% 15487824 Bacteria Firmicutes Bacilli Lactobacillus sakei GKJWQY101BVSNC 236 18 199 9E-83 98% 98% 158139258 Bacteria Firmicutes Clostridia Alkaliphilus oremlandii GKJWQY101BIJGR 305 24 146 4E-48 96% 96% 149901357 Bacteria Firmicutes Clostridia Clostridium beijerinckii GKJWQY101AVTLD 316 127 276 1E-67 99% 99% 295317476 Bacteria Firmicutes Clostridia Clostridium botulinum GKJWQY101AR4X1 537 18 537 0 100% 100% 110673209 Bacteria Firmicutes Clostridia Clostridium perfringens GKJWQY101AX0O9 546 25 207 1E-74 95% 95% 295089810 Bacteria Firmicutes Clostridia Clostridium saccharolyticum GKJWQY101APBWG 500 18 322 6E-118 93% 93% 223928079 Bacteria Firmicutes Clostridia Clostridium sp. 
enrichment culture clone 7-14 GKJWQY101BXM9C 573 12 555 1E-115 82% 82% 125712750 Bacteria Firmicutes Clostridia Clostridium thermocellum GKJWQY101B17VM 233 16 192 1E-86 100% 100% 256797400 Bacteria Firmicutes Clostridia Anaerococcus prevotii GKJWQY101AF4EN 544 18 542 0 98% 98% 167830502 Bacteria Firmicutes Clostridia Finegoldia magna GKJWQY101AIZHR 472 23 409 3E-135 90% 90% 291556121 Bacteria Firmicutes Clostridia Eubacterium siraeum GKJWQY101B2AHH 421 14 374 1E-138 92% 92% 291529795 Bacteria Firmicutes Clostridia Eubacterium siraeum GKJWQY101BNDMO 405 57 351 1E-59 82% 82% 114336511 Bacteria Firmicutes Clostridia Syntrophomonas wolfei GKJWQY101AOC8C 305 17 260 2E-115 98% 98% 188497817 Bacteria Firmicutes Clostridia Clostridium botulinum GKJWQY101BE4ZG 596 18 590 0 95% 95% 110681940 Bacteria Firmicutes Clostridia Clostridium perfringens GKJWQY101BX0WB 552 1 504 1E-174 89% 89% 160426828 Bacteria Firmicutes Clostridia Clostridium phytofermentans GKJWQY101AUKJZ 542 17 488 6E-173 90% 90% 238871336 Bacteria Firmicutes Clostridia Eubacterium eligens GKJWQY101BDY1Q 551 5 485 0 95% 95% 238874104 Bacteria Firmicutes Clostridia Eubacterium rectale GKJWQY101ARORW 494 18 447 4E-79 80% 80% 219536331 Bacteria Firmicutes Clostridia Desulfitobacterium hafniense GKJWQY101ATSIV 593 7 583 0 87% 87% 223928104 Bacteria Firmicutes Clostridia Clostridium sp. enrichment culture clone 7-25 GKJWQY101BNDWM 572 5 571 0 88% 88% 291523683 Bacteria Firmicutes Clostridia Eubacterium rectale GKJWQY101A40TK 418 4 370 7E-141 92% 92% 291526582 Bacteria Firmicutes Clostridia Eubacterium rectale GKJWQY101AO1P2 622 6 618 0 86% 86% 291520697 Bacteria Firmicutes Clostridia Coprococcus catus GKJWQY101A44JY 561 7 541 0 94% 94% 295092884 Bacteria Firmicutes Clostridia Coprococcus sp. ART55/1 GKJWQY101BQCZD 544 9 543 0 92% 92% 295114602 Bacteria Firmicutes Clostridia butyrate-producing bacterium SM4/1

GKJWQY101BQTA7 480 18 427 0 95% 95% 291558333 Bacteria Firmicutes Clostridia butyrate-producing bacterium SSC/2

GKJWQY101AYGQY 563 24 559 2E-163 87% 87% 291541372 Bacteria Firmicutes Clostridia Ruminococcus bromii GKJWQY101AKXDF 588 18 579 0 91% 91% 295107714 Bacteria Firmicutes Clostridia Ruminococcus obeum GKJWQY101BCTZL 587 5 534 0 89% 89% 291543184 Bacteria Firmicutes Clostridia Ruminococcus sp. 18P13 GKJWQY101BBYCY 569 17 526 0 96% 96% 291545299 Bacteria Firmicutes Clostridia Ruminococcus sp. SR1/5 GKJWQY101BCX1D 600 15 598 0 90% 90% 291548560 Bacteria Firmicutes Clostridia Ruminococcus torques GKJWQY101BSTMC 564 27 564 0 91% 91% 295098739 Bacteria Firmicutes Erysipelotrichi Eubacterium cylindroides GKJWQY101AHKOV 564 5 555 0 91% 91% 283951607 Bacteria Firmicutes Negativicutes Acidaminococcus fermentans GKJWQY101BGGHX 576 17 568 0 99% 99% 269093698 Bacteria Firmicutes Negativicutes Veillonella parvula GKJWQY101BWHUV 584 5 572 0 89% 89% 291532143 Bacteria Firmicutes Negativicutes Megamonas hypermegale GKJWQY101AEIGL 570 18 559 0 91% 91% 20095250 Bacteria Fusobacteria Fusobacteria Fusobacterium nucleatum GKJWQY101BJ7Q1 563 17 560 0 94% 94% 257048753 Bacteria Fusobacteria Fusobacteria Leptotrichia buccalis GKJWQY101BTYHD 557 13 343 3E-96 87% 87% 156770867 Bacteria n n uncultured bacterium GKJWQY101AANQS 765 8 284 2E-108 93% 93% 156773414 Bacteria n n uncultured bacterium GKJWQY101AP35G 369 18 305 3E-104 91% 91% 156774261 Bacteria n n uncultured bacterium GKJWQY101BYYCA 271 5 223 1E-106 99% 99% 156770305 Bacteria n n uncultured bacterium GKJWQY101BA89J 207 18 127 8E-43 96% 96% 156776473 Bacteria n n uncultured bacterium GKJWQY101BFWYL 285 18 240 2E-105 98% 98% 156769021 Bacteria n n uncultured bacterium GKJWQY101AR683 223 3 155 2E-69 99% 99% 156770979 Bacteria n n uncultured bacterium GKJWQY101AMU20 257 18 106 8E-24 91% 91% 156768878 Bacteria n n uncultured bacterium GKJWQY101A8WXN 563 18 515 0 91% 91% 156773397 Bacteria n n uncultured bacterium GKJWQY101B0JOO 229 65 198 9E-58 98% 98% 156773640 Bacteria n n uncultured bacterium GKJWQY101BM98Q 303 4 217 3E-78 92% 92% 
156773646 Bacteria n n uncultured bacterium GKJWQY101BQAGF 532 4 471 3E-151 88% 88% 156770997 Bacteria n n uncultured bacterium GKJWQY101A2KY0 541 18 484 2E-138 86% 86% 156775997 Bacteria n n uncultured bacterium GKJWQY101BXWSP 521 15 449 5E-94 82% 82% 156770975 Bacteria n n uncultured bacterium GKJWQY101A06LP 569 18 569 0 94% 94% 156768872 Bacteria n n uncultured bacterium GKJWQY101ACT2K 476 108 435 5E-98 87% 87% 156775471 Bacteria n n uncultured bacterium GKJWQY101A2BN5 538 10 424 2E-163 92% 92% 156768589 Bacteria n n uncultured bacterium GKJWQY101BL1SN 555 18 465 0 98% 98% 156772425 Bacteria n n uncultured bacterium GKJWQY101A24D1 418 3 362 6E-92 84% 84% 156773609 Bacteria n n uncultured bacterium GKJWQY101BNAXW 568 15 560 0 88% 88% 156774731 Bacteria n n uncultured bacterium GKJWQY101A2W2H 537 34 489 3E-161 90% 90% 156769665 Bacteria n n uncultured bacterium GKJWQY101A5D79 603 129 555 1E-160 91% 91% 156768004 Bacteria n n uncultured bacterium GKJWQY101BU6EL 555 14 548 0 93% 93% 156767585 Bacteria n n uncultured bacterium GKJWQY101A4300 358 5 306 1E-108 91% 91% 193083711 Bacteria n n uncultured bacterium AD12-C5 GKJWQY101A91EV 205 20 160 5E-40 89% 89% 193084530 Bacteria n n uncultured bacterium ARCTIC38 F 05 GKJWQY101ADWXG 416 27 347 7E-131 94% 94% 193084460 Bacteria n n uncultured bacterium ARCTIC47 D 06

GKJWQY101AU0ML 283 7 239 4E-102 96% 96% 193084613 Bacteria n n uncultured bacterium HF0070_34E11

GKJWQY101ANMYP 93 23 60 7E-10 100% 100% 193084646 Bacteria n n uncultured bacterium HF0200_39L23

GKJWQY101AHOJC 335 1 273 1E-102 92% 92% 193084638 Bacteria n n uncultured bacterium HF4000_48A13

GKJWQY101BI2JO 400 5 337 1E-104 88% 88% 193083728 Bacteria n n uncultured bacterium KM3-47-A6 GKJWQY101ANG5G 545 17 544 0 89% 89% 62945638 Bacteria n n uncultured bacterium zdt-25h14 GKJWQY101A9FHD 225 5 141 2E-24 83% 83% 83595892 Bacteria n n uncultured marine bacterium Ant4E12

GKJWQY101BLR88 506 5 501 0 97% 97% 146260146 Bacteria n n uncultured soil bacterium GKJWQY101AH7PE 431 9 388 3E-164 94% 94% 62860402 Bacteria n n uncultured bacterium zdt-44a23 GKJWQY101B18KD 574 14 492 8E-172 90% 90% 156777063 Bacteria n n uncultured bacterium GKJWQY101AM37K 544 86 508 8E-117 85% 85% 156769510 Bacteria n n uncultured bacterium GKJWQY101A17ME 572 18 570 0 90% 90% 156767597 Bacteria n n uncultured bacterium GKJWQY101AECX4 530 4 522 9E-176 88% 88% 156769699 Bacteria n n uncultured bacterium GKJWQY101BZ6F8 569 18 566 3E-176 88% 88% 156767618 Bacteria n n uncultured bacterium GKJWQY101A2QNX 544 7 423 6E-163 92% 92% 156767628 Bacteria n n uncultured bacterium GKJWQY101BM4AZ 443 18 341 2E-121 91% 91% 156768405 Bacteria n n uncultured bacterium GKJWQY101BNW14 408 20 353 6E-92 86% 86% 156770664 Bacteria n n uncultured bacterium GKJWQY101AXA5Z 445 53 386 8E-126 91% 91% 156770556 Bacteria n n uncultured bacterium GKJWQY101BWIJY 548 18 457 6E-118 85% 85% 156768500 Bacteria n n uncultured bacterium GKJWQY101AOB4M 528 193 432 6E-93 93% 93% 156770596 Bacteria n n uncultured bacterium GKJWQY101A8S8N 551 5 547 0 92% 92% 156770864 Bacteria n n uncultured bacterium GKJWQY101BSJWR 566 11 564 0 94% 94% 156768827 Bacteria n n uncultured bacterium GKJWQY101BVQQU 585 18 580 3E-166 86% 86% 156770902 Bacteria n n uncultured bacterium GKJWQY101BKAYD 576 4 571 0 93% 93% 156771060 Bacteria n n uncultured bacterium GKJWQY101BJH0Q 560 16 557 0 95% 95% 156769048 Bacteria n n uncultured bacterium GKJWQY101BD56T 570 62 565 0 95% 95% 156773064 Bacteria n n uncultured bacterium GKJWQY101BY5FD 536 3 531 0 92% 92% 156775970 Bacteria n n uncultured bacterium GKJWQY101AKOGB 575 18 572 2E-152 85% 85% 156776145 Bacteria n n uncultured bacterium GKJWQY101B1UIB 569 17 562 0 90% 90% 156776214 Bacteria n n uncultured bacterium GKJWQY101AXTWR 530 18 455 6E-128 86% 86% 156776216 Bacteria n n uncultured bacterium GKJWQY101AZQX2 535 18 529 0 92% 92% 156773440 Bacteria n n uncultured bacterium 
GKJWQY101BH4I5 552 2 550 7E-177 87% 87% 156773475 Bacteria n n uncultured bacterium GKJWQY101BYYVG 588 16 586 0 91% 91% 156773480 Bacteria n n uncultured bacterium GKJWQY101A407X 528 4 523 0 93% 93% 156776331 Bacteria n n uncultured bacterium GKJWQY101B1OHM 297 18 247 1E-112 99% 99% 193084596 Bacteria n n uncultured bacterium 2.6_D6 GKJWQY101A41PE 569 7 567 0 89% 89% 193083707 Bacteria n n uncultured bacterium AD12-A11 GKJWQY101BT4ZY 567 24 531 0 96% 96% 193083717 Bacteria n n uncultured bacterium AD84-H10 GKJWQY101AWPDP 568 17 566 0 96% 96% 193084603 Bacteria n n uncultured bacterium HF0010_04H24

GKJWQY101BKKTZ 579 21 575 0 89% 89% 193084608 Bacteria n n uncultured bacterium HF0010_10C01

GKJWQY101BD5KR 585 18 585 0 99% 99% 193084619 Bacteria n n uncultured bacterium HF0500_12O04

GKJWQY101BMECY 519 17 477 0 96% 96% 193084621 Bacteria n n uncultured bacterium HF0500_24B12

GKJWQY101AH347 608 20 87 1E-20 96% 96% 193084601 Bacteria n n uncultured bacterium JM9_G5 GKJWQY101BE4R0 362 215 242 0.001 100% 100% 91199943 Bacteria Planctomycetes Planctomycetacia Candidatus Kuenenia stuttgartiensis

GKJWQY101BDY6H 568 20 550 1E-135 84% 84% 283436255 Bacteria Planctomycetes Planctomycetacia Pirellula staleyi GKJWQY101AYMDQ 317 20 270 1E-102 94% 94% 67003493 Bacteria Proteobacteria (alpha) Alphaproteobacteria Brevundimonas sp. SD212 GKJWQY101A2HLQ 209 31 165 5E-55 96% 96% 240266805 Bacteria Proteobacteria (alpha) Alphaproteobacteria grahamii GKJWQY101B0GZZ 540 171 203 0.000003 100% 100% 146189981 Bacteria Proteobacteria (alpha) Alphaproteobacteria Bradyrhizobium sp. ORS 278 GKJWQY101BQDY9 536 18 476 0 92% 92% 219944660 Bacteria Proteobacteria (alpha) Alphaproteobacteria Methylobacterium nodulans GKJWQY101BV2AQ 541 123 216 2E-23 89% 89% 221739013 Bacteria Proteobacteria (alpha) Alphaproteobacteria Agrobacterium vitis GKJWQY101A4OMR 527 191 527 6E-118 90% 90% 111606883 Bacteria Proteobacteria (alpha) Alphaproteobacteria aminovorans GKJWQY101BU6BN 504 38 404 7E-112 87% 87% 86284380 Bacteria Proteobacteria (alpha) Alphaproteobacteria Rhizobium etli GKJWQY101AX6T2 494 18 391 1E-75 81% 81% 190694918 Bacteria Proteobacteria (alpha) Alphaproteobacteria Rhizobium etli GKJWQY101BHVBO 505 5 505 1E-99 81% 81% 227337257 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sinorhizobium fredii GKJWQY101BERYH 547 24 479 6E-78 79% 79% 227339586 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sinorhizobium fredii GKJWQY101BRU7O 505 21 489 1E-84 80% 80% 30407155 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sinorhizobium meliloti GKJWQY101BRQ7P 560 134 394 5E-64 85% 85% 25168258 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sinorhizobium meliloti GKJWQY101AJPHZ 518 63 472 4E-50 77% 77% 119372524 Bacteria Proteobacteria (alpha) Alphaproteobacteria Paracoccus denitrificans GKJWQY101AGSJB 496 5 429 0 99% 99% 32263857 Bacteria Proteobacteria (alpha) Alphaproteobacteria Paracoccus sp. 
Ol18 GKJWQY101BYMZF 548 17 68 4E-10 93% 93% 221161990 Bacteria Proteobacteria (alpha) Alphaproteobacteria Rhodobacter sphaeroides GKJWQY101ATYDP 385 15 328 7E-136 95% 95% 56676665 Bacteria Proteobacteria (alpha) Alphaproteobacteria Ruegeria pomeroyi GKJWQY101BSXCF 501 54 453 4E-134 89% 89% 42414566 Bacteria Proteobacteria (alpha) Alphaproteobacteria Acetobacter pasteurianus GKJWQY101AL6DG 481 5 436 3E-130 87% 87% 82943940 Bacteria Proteobacteria (alpha) Alphaproteobacteria Magnetospirillum magneticum GKJWQY101ATZNT 193 44 158 1E-50 99% 99% 288926859 Bacteria Proteobacteria (alpha) Alphaproteobacteria Rhodospirillum centenum GKJWQY101BUXBD 461 372 459 4E-10 82% 82% 145322317 Bacteria Proteobacteria (alpha) Alphaproteobacteria Novosphingobium aromaticivorans GKJWQY101BRVHF 354 21 316 1E-142 98% 98% 188532098 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sphingobium chungbukense GKJWQY101A14H0 579 23 576 0 91% 91% 170658659 Bacteria Proteobacteria (alpha) Alphaproteobacteria Methylobacterium radiotolerans GKJWQY101AHOIZ 503 10 497 9E-176 90% 90% 47118328 Bacteria Proteobacteria (alpha) Alphaproteobacteria Mesorhizobium loti GKJWQY101BSA8U 613 19 610 0 91% 91% 119376152 Bacteria Proteobacteria (alpha) Alphaproteobacteria Paracoccus denitrificans

GKJWQY101BM00X 551 21 550 0 91% 91% 83574254 Bacteria Proteobacteria (alpha) Alphaproteobacteria Rhodospirillum rubrum GKJWQY101ASBL2 563 18 545 0 96% 96% 292676846 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sphingobium japonicum GKJWQY101BATJM 570 18 563 0 97% 97% 98975575 Bacteria Proteobacteria (alpha) Alphaproteobacteria Sphingopyxis alaskensis GKJWQY101BQWCO 376 66 220 8E-11 76% 76% 115421100 Bacteria Proteobacteria (beta) Betaproteobacteria GKJWQY101ASXGJ 515 371 468 2E-17 85% 85% 38707361 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia anthina GKJWQY101BJCLG 511 155 511 0 100% 100% 13625778 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia cepacia GKJWQY101BS6QI 486 17 453 0 95% 95% 237502667 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia pseudomallei GKJWQY101BN6XJ 105 5 67 5E-17 94% 94% 295438061 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia sp. CCGE1002 GKJWQY101BXETD 510 18 215 1E-39 82% 82% 83649860 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia thailandensis GKJWQY101AHRMM 534 5 308 9E-32 77% 77% 83652219 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia thailandensis GKJWQY101BS6LH 508 5 505 1E-144 85% 85% 91685338 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia xenovorans GKJWQY101BIKNE 401 18 348 1E-153 97% 97% 120587178 Bacteria Proteobacteria (beta) Betaproteobacteria Acidovorax citrulli GKJWQY101AJROG 413 25 376 1E-133 92% 92% 121551644 Bacteria Proteobacteria (beta) Betaproteobacteria Verminephrobacter eiseniae GKJWQY101B2KTE 243 4 129 8E-49 95% 95% 295794626 Bacteria Proteobacteria (beta) Betaproteobacteria Thiomonas intermedia GKJWQY101AW7FS 510 61 278 2E-37 81% 81% 294338440 Bacteria Proteobacteria (beta) Betaproteobacteria Thiomonas sp. 
3As GKJWQY101BKTPD 537 17 533 0 93% 93% 48428765 Bacteria Proteobacteria (beta) Betaproteobacteria Collimonas fungivorans GKJWQY101A7HUP 554 10 508 0 93% 93% 34105712 Bacteria Proteobacteria (beta) Betaproteobacteria Chromobacterium violaceum GKJWQY101BOC0B 511 18 391 4E-179 98% 98% 66731897 Bacteria Proteobacteria (beta) Betaproteobacteria GKJWQY101AXXBZ 498 5 442 0 98% 98% 161594571 Bacteria Proteobacteria (beta) Betaproteobacteria Neisseria meningitidis GKJWQY101BYX8F 547 24 547 0 96% 96% 254667570 Bacteria Proteobacteria (beta) Betaproteobacteria Neisseria meningitidis GKJWQY101B17TY 497 18 341 2E-151 97% 97% 77965403 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia sp. 383 GKJWQY101A4LWM 503 18 500 0 92% 92% 77964193 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia sp. 383 GKJWQY101AGP17 539 17 500 0 97% 97% 77968738 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia sp. 383 GKJWQY101ATCIB 517 18 514 0 96% 96% 295441430 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia sp. CCGE1002 GKJWQY101AD239 350 18 201 2E-90 100% 100% 288237308 Bacteria Proteobacteria (beta) Betaproteobacteria Cupriavidus metallidurans GKJWQY101AX6PV 517 19 513 3E-151 87% 87% 30407128 Bacteria Proteobacteria (beta) Betaproteobacteria Ralstonia solanacearum GKJWQY101AG8TI 459 5 405 2E-102 84% 84% 30407127 Bacteria Proteobacteria (beta) Betaproteobacteria Ralstonia solanacearum GKJWQY101BLFSC 544 18 528 0 95% 95% 221728669 Bacteria Proteobacteria (beta) Betaproteobacteria Acidovorax ebreus GKJWQY101AQ5RN 548 18 545 0 91% 91% 120604516 Bacteria Proteobacteria (beta) Betaproteobacteria Acidovorax sp. JS42 GKJWQY101BQ51J 519 141 502 7E-18 73% 73% 171192370 Bacteria Proteobacteria (beta) Betaproteobacteria Polynucleobacter necessarius GKJWQY101AEFDT 566 24 563 0 96% 96% 91695138 Bacteria Proteobacteria (beta) Betaproteobacteria Polaromonas sp. 
JS666 GKJWQY101BLFW7 540 18 534 0 90% 90% 133737197 Bacteria Proteobacteria (beta) Betaproteobacteria Herminiimonas arsenicoxydans GKJWQY101AVDT3 547 51 545 1E-159 88% 88% 71845263 Bacteria Proteobacteria (beta) Betaproteobacteria Dechloromonas aromatica GKJWQY101B3I7Y 506 18 506 0 98% 98% 186660182 Bacteria Proteobacteria (beta) Betaproteobacteria Burkholderia vietnamiensis GKJWQY101BISXY 575 18 571 0 99% 99% 260222220 Bacteria Proteobacteria (beta) Betaproteobacteria Curvibacter putative symbiont of Hydra magnipapillata GKJWQY101A2RQ4 553 5 551 1E-150 85% 85% 90823168 Bacteria Proteobacteria (delta) Deltaproteobacteria Pelobacter carbinolicus GKJWQY101AVQQS 550 11 275 8E-112 95% 95% 262076673 Bacteria Proteobacteria (delta) Deltaproteobacteria Haliangium ochraceum GKJWQY101A7KK6 487 18 272 9E-91 91% 91% 219952977 Bacteria Proteobacteria (delta) Deltaproteobacteria Anaeromyxobacter dehalogenans GKJWQY101AN6A1 562 23 555 2E-133 84% 84% 82617838 Bacteria Proteobacteria (delta) Deltaproteobacteria uncultured delta proteobacterium DeepAnt-32C6 GKJWQY101ATI2M 440 5 407 5E-103 84% 84% 78217452 Bacteria Proteobacteria (delta) Deltaproteobacteria Desulfovibrio alaskensis GKJWQY101A2EXF 323 32 265 1E-72 88% 88% 34483186 Bacteria Proteobacteria (epsilon) Epsilonproteobacteria Wolinella succinogenes GKJWQY101AR8JF 379 18 332 1E-137 95% 95% 237499037 Bacteria Proteobacteria (gamma) Gammaproteobacteria Tolumonas auensis GKJWQY101ACR3I 481 18 416 5E-158 92% 92% 120322793 Bacteria Proteobacteria (gamma) Gammaproteobacteria Marinobacter hydrocarbonoclasticus

GKJWQY101AML8K 373 18 335 8E-135 94% 94% 167351963 Bacteria Proteobacteria (gamma) Gammaproteobacteria Shewanella halifaxensis GKJWQY101A8SDV 547 23 460 0 98% 98% 117610791 Bacteria Proteobacteria (gamma) Gammaproteobacteria Shewanella sp. ANA-3 GKJWQY101BJ0XY 460 5 399 6E-127 88% 88% 219994503 Bacteria Proteobacteria (gamma) Gammaproteobacteria Thioalkalivibrio sulfidophilus GKJWQY101BMVH7 413 12 299 2E-122 95% 95% 261835099 Bacteria Proteobacteria (gamma) Gammaproteobacteria Halothiobacillus neapolitanus GKJWQY101ASOB7 495 119 292 1E-29 82% 82% 83630956 Bacteria Proteobacteria (gamma) Gammaproteobacteria Hahella chejuensis GKJWQY101ADZWX 379 7 356 2E-115 89% 89% 145316543 Bacteria Proteobacteria (gamma) Gammaproteobacteria Enterobacter sp. 638 GKJWQY101A5AS0 229 5 197 3E-77 94% 94% 291551905 Bacteria Proteobacteria (gamma) Gammaproteobacteria Erwinia amylovora GKJWQY101AF69O 551 4 548 0 97% 97% 291197582 Bacteria Proteobacteria (gamma) Gammaproteobacteria Erwinia amylovora GKJWQY101AYZFU 278 24 176 2E-41 88% 88% 150953431 Bacteria Proteobacteria (gamma) Gammaproteobacteria GKJWQY101AULKC 321 5 36 0.000007 100% 100% 295647398 Bacteria Proteobacteria (gamma) Gammaproteobacteria GKJWQY101AOYQY 489 4 456 0 98% 98% 53749768 Bacteria Proteobacteria (gamma) Gammaproteobacteria Legionella pneumophila GKJWQY101BJB7H 400 5 271 4E-103 93% 93% 256794767 Bacteria Proteobacteria (gamma) Gammaproteobacteria Kangiella koreensis GKJWQY101A55T9 550 25 496 5E-149 87% 87% 261412053 Bacteria Proteobacteria (gamma) Gammaproteobacteria Aggregatibacter actinomycetemcomitans GKJWQY101AQVXX 253 21 216 5E-61 89% 89% 247533203 Bacteria Proteobacteria (gamma) Gammaproteobacteria Aggregatibacter aphrophilus GKJWQY101BJFYS 507 5 462 0 99% 99% 108733343 Bacteria Proteobacteria (gamma) Gammaproteobacteria Pseudomonas syringae GKJWQY101AZO3V 560 18 452 0 95% 95% 49529273 Bacteria Proteobacteria (gamma) Gammaproteobacteria Acinetobacter sp. 
ADP1 GKJWQY101AX65L 589 5 588 0 93% 93% 71037566 Bacteria Proteobacteria (gamma) Gammaproteobacteria Psychrobacter arcticus GKJWQY101ALTT0 576 18 572 0 97% 97% 148570901 Bacteria Proteobacteria (gamma) Gammaproteobacteria Psychrobacter sp. PRwf-1 GKJWQY101ASWG9 515 17 430 0 97% 97% 226717097 Bacteria Proteobacteria (gamma) Gammaproteobacteria Azotobacter vinelandii GKJWQY101ACB0K 278 18 227 1E-77 93% 93% 190684944 Bacteria Proteobacteria (gamma) Gammaproteobacteria Cellvibrio japonicus GKJWQY101ARN9L 358 17 310 4E-148 99% 99% 63253978 Bacteria Proteobacteria (gamma) Gammaproteobacteria Pseudomonas syringae GKJWQY101A9KUB 568 5 556 0 91% 91% 283472039 Bacteria Proteobacteria (gamma) Gammaproteobacteria Xanthomonas albilineans GKJWQY101AGMLE 576 20 556 1E-155 86% 86% 117607074 Bacteria Proteobacteria (alpha) Alphaproteobacteria Magnetococcus sp. MC-1 GKJWQY101ALGOA 548 18 548 0 90% 90% 226525288 Bacteria Verrucomicrobia n uncultured Verrucomicrobia bacterium GKJWQY101BWYXQ 338 8 275 5E-82 88% 88% 187424568 Bacteria Verrucomicrobia Verrucomicrobiae Akkermansia muciniphila
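
The records of Table S4 follow a fixed field order (454 sequence number, query length, query start, query end, e-value, % identity, % similarity, GI number, domain, then taxonomy), so they can be read programmatically. The following Python sketch is illustrative only and is not part of the analysis pipeline used in this work. The names `Hit`, `split_rows`, and `parse_row` are hypothetical, and the sketch assumes that every record begins with a read identifier carrying the prefix `GKJWQY101`. Because phylum, class/order, and genus/species names contain varying numbers of words, they are kept together as one unsplit taxonomy string.

```python
import re
from typing import List, NamedTuple

# Assumption: every record starts with a read identifier that has the
# run prefix "GKJWQY101" followed by five alphanumeric characters.
ROW_START = re.compile(r"(?=GKJWQY101[A-Z0-9]{5})")


class Hit(NamedTuple):
    read_id: str
    q_length: int
    q_start: int
    q_end: int
    e_value: float
    pct_ident: int
    pct_sim: int
    gi: str
    domain: str
    taxonomy: str  # phylum, class/order, genus/species, left unsplit


def split_rows(flat: str) -> List[str]:
    """Split a run of concatenated records into one string per record."""
    return [chunk.strip() for chunk in ROW_START.split(flat) if chunk.strip()]


def parse_row(row: str) -> Hit:
    """Parse one record; raise ValueError if the fixed fields are incomplete."""
    parts = row.split()
    if len(parts) < 9:
        raise ValueError(f"incomplete record: {row!r}")
    read_id, length, start, end, e_value, ident, sim, gi, domain = parts[:9]
    return Hit(
        read_id,
        int(length),
        int(start),
        int(end),
        float(e_value),
        int(ident.rstrip("%")),
        int(sim.rstrip("%")),
        gi,
        domain,
        " ".join(parts[9:]),
    )


if __name__ == "__main__":
    sample = (
        "GKJWQY101AGGHX 545 5 506 0 97% 97% 31789410 Bacteria Acidobacteria n "
        "uncultured Acidobacteria bacterium GKJWQY101BLAHS 571 66 567 2E-172 "
        "89% 89% 117647674 Bacteria Actinobacteria Actinobacteria "
        "Acidothermus cellulolyticus"
    )
    for record in split_rows(sample):
        print(parse_row(record))
```

Records parsed this way could then be filtered, for example by e-value or % identity, before any taxonomic summary is made.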

Table S5. Eukarya mRNA gene sequences (and several genomic records that contain rRNA genes) from the V5 sample. Taxonomic affiliations that were not found in NCBI GenBank are marked "n". Records in green were moved back from Table 11 (two records were moved back because the old Blastn result was better than the new Blastn and Blastx results from KAAS KEGG).

454 Sequence number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class / Order | Genus / Species
GKJWQY101B20DL | 80 | 1 | 80 | 3E-33 | 100% | 100% | 294935818 | Eukaryota | Perkinsea |  | Perkinsus marinus
GKJWQY101AEUTS | 128 | 1 | 128 | 6E-53 | 97% | 97% | 50657597 | Eukaryota | Rhodophyta | Florideophyceae | Gracilaria tenuistipitata
GKJWQY101AV8WV | 332 | 1 | 332 | 2E-161 | 98% | 98% | 294661761 | Eukaryota | Arthropoda | Branchiopoda | Daphnia pulex
GKJWQY101ACXJK | 411 | 1 | 411 | 0 | 95% | 95% | 195094333 | Eukaryota | Arthropoda | Insecta | Drosophila grimshawi
GKJWQY101BDI9Y | 393 | 1 | 393 | 0 | 96% | 96% | 195102030 | Eukaryota | Arthropoda | Insecta | Drosophila grimshawi
GKJWQY101BCYFO | 306 | 1 | 306 | 3E-141 | 97% | 97% | 295393587 | Eukaryota | Arthropoda | Insecta | Euphydryas editha
GKJWQY101B0QOY | 266 | 1 | 266 | 3E-84 | 88% | 88% | 153902485 | Eukaryota | Arthropoda | Insecta | Gryllus bimaculatus
GKJWQY101B09E4 | 370 | 1 | 112 | 1E-25 | 88% | 88% | 153919430 | Eukaryota | Arthropoda | Insecta | Gryllus bimaculatus
GKJWQY101BA6T8 | 556 | 1 | 556 | 0 | 95% | 95% | 242805358 | Eukaryota | Ascomycota | Eurotiomycetes | Talaromyces stipitatus
GKJWQY101AKNUG | 492 | 1 | 492 | 6E-104 | 82% | 82% | 67540319 | Eukaryota | Ascomycota | Eurotiomycetes | Emericella nidulans
GKJWQY101A5AN6 | 72 | 1 | 72 | 1E-18 | 92% | 92% | 50542891 | Eukaryota | Ascomycota | Saccharomycetes | Yarrowia lipolytica
GKJWQY101BK6ZF | 47 | 1 | 47 | 1E-09 | 94% | 94% | 49343295 | Eukaryota | Ascomycota | Saccharomycetes | Kluyveromyces lactis
GKJWQY101ATCHW | 494 | 1 | 494 | 0 | 99% | 99% | 238033210 | Eukaryota | Ascomycota | Saccharomycetes | Pichia pastoris
GKJWQY101AFQK1 | 28 | 1 | 28 | 0.000005 | 100% | 100% | 259145041 | Eukaryota | Ascomycota | Saccharomycetes | Saccharomyces cerevisiae
GKJWQY101A475X | 79 | 1 | 79 | 1E-32 | 100% | 100% | 259147625 | Eukaryota | Ascomycota | Saccharomycetes | Saccharomyces cerevisiae
GKJWQY101A88RH | 115 | 1 | 115 | 2E-47 | 97% | 97% | 259147931 | Eukaryota | Ascomycota | Saccharomycetes | Saccharomyces cerevisiae
GKJWQY101AQT0M | 135 | 1 | 135 | 4E-51 | 94% | 94% | 259149327 | Eukaryota | Ascomycota | Saccharomycetes | Saccharomyces cerevisiae
GKJWQY101BK6ZF_2 | 140 | 1 | 140 | 2E-19 | 78% | 78% | 23630288 | Eukaryota | Basidiomycota | Tremellomycetes | Cryptococcus neoformans
GKJWQY101BK6ZF_3 | 151 | 1 | 151 | 5E-15 | 78% | 78% | 315465309 | Eukaryota | Basidiomycota | Ustilaginomycetes | Sporisorium
GKJWQY101ANHKA | 513 | 1 | 513 | 0 | 91% | 91% | 71483398 | Eukaryota | Basidiomycota | Microbotryales | Microbotryum violaceum
GKJWQY101B0NNM | 284 | 1 | 284 | 7E-116 | 94% | 94% | 171191502 | Eukaryota | Cercozoa | Imbricatea | Paulinella chromatophora
GKJWQY101BSD9R | 103 | 1 | 100 | 2E-42 | 99% | 99% | 56159573 | Eukaryota | Chlorophyta | Ulvophyceae | Pseudendoclonium akinetum
GKJWQY101A15DF | 332 | 1 | 332 | 2E-47 | 78% | 78% | 156307499 | Eukaryota | Cnidaria | Anthozoa | Nematostella vectensis
GKJWQY101AR687 | 53 | 1 | 53 | 3E-15 | 96% | 96% | 253981836 | Eukaryota | Cryptophyta |  | Cryptomonas paramecium
GKJWQY101AY5N9 | 185 | 1 | 185 | 4E-91 | 100% | 100% | 166044520 | Eukaryota | Heterokontophyta | Oomycota | Aphanomyces euteiches
GKJWQY101BRT6Z | 395 | 1 | 395 | 0 | 98% | 98% | 166044537 | Eukaryota | Heterokontophyta | Oomycota | Aphanomyces euteiches
GKJWQY101AJXX3 | 28 | 1 | 28 | 0.000005 | 100% | 100% | 336170404 | Eukaryota | Kinetoplastida | Trypanosomatidae | Schizotrypanum
GKJWQY101A8PWD | 362 | 1 | 362 | 2E-177 | 98% | 98% | 154347362 | Eukaryota | Percolozoa | Heterolobosea | Naegleria gruberi
GKJWQY101ASKP6 | 96 | 1 | 96 | 1E-38 | 98% | 98% | 124012151 | Eukaryota | Streptophyta | Chlorokybophyceae | Chlorokybus atmophyticus
GKJWQY101AZ7HH | 105 | 1 | 105 | 2E-37 | 94% | 94% | 210148086 | Eukaryota | Streptophyta | Cornales | Cornus kousa
GKJWQY101BH3Q0 | 218 | 1 | 218 | 7E-45 | 83% | 83% | 13445170 | Eukaryota | Streptophyta | Liliopsida | Festuca arundinacea
GKJWQY101ABSK7 | 442 | 1 | 442 | 0 | 93% | 93% | 21779916 | Eukaryota | Streptophyta | Liliopsida | Aegilops tauschii
GKJWQY101BPH2R | 480 | 1 | 480 | 2E-109 | 83% | 83% | 32307244 | Eukaryota | Streptophyta | Liliopsida | Aegilops tauschii
GKJWQY101AJYHQ | 243 | 1 | 243 | 3E-108 | 97% | 97% | 40849982 | Eukaryota | Streptophyta | Liliopsida | Triticum turgidum
GKJWQY101AXHYK | 81 | 1 | 81 | 2E-26 | 95% | 95% | 72256311 | Eukaryota | Streptophyta | Liliopsida | Triticum aestivum
GKJWQY101BH96M | 334 | 1 | 334 | 6E-63 | 83% | 83% | 102567891 | Eukaryota | Streptophyta | Liliopsida | Zea perennis
GKJWQY101AQJ6W | 300 | 1 | 299 | 2E-113 | 92% | 92% | 115392331 | Eukaryota | Streptophyta | Liliopsida | Triticum monococcum
GKJWQY101AQJ6W_2 | 181 | 1 | 181 | 5E-70 | 94% | 94% | 115392331 | Eukaryota | Streptophyta | Liliopsida | Triticum monococcum
GKJWQY101BYWC5 | 492 | 1 | 492 | 0 | 91% | 91% | 124007144 | Eukaryota | Streptophyta | Liliopsida | Triticum urartu
GKJWQY101ANTJW | 285 | 1 | 285 | 6E-131 | 97% | 97% | 148372275 | Eukaryota | Streptophyta | Liliopsida | Triticum monococcum
GKJWQY101BYJS2 | 492 | 1 | 492 | 0 | 99% | 99% | 148910867 | Eukaryota | Streptophyta | Liliopsida | Triticum turgidum
GKJWQY101AQ547 | 348 | 1 | 348 | 3E-100 | 87% | 87% | 194131647 | Eukaryota | Streptophyta | Liliopsida | Triticum turgidum
GKJWQY101BU9TF | 262 | 1 | 262 | 2E-106 | 94% | 94% | 194239068 | Eukaryota | Streptophyta | Liliopsida | Triticum aestivum
GKJWQY101BF6SX | 432 | 1 | 432 | 0 | 100% | 100% | 209361311 | Eukaryota | Streptophyta | Liliopsida | Coix lacryma-jobi
GKJWQY101AOCWM | 154 | 1 | 154 | 2E-13 | 77% | 77% | 212007811 | Eukaryota | Streptophyta | Liliopsida | Triticum aestivum
GKJWQY101BW1XB | 383 | 1 | 383 | 0 | 99% | 99% | 255099160 | Eukaryota | Streptophyta | Liliopsida | Dendrocalamus latiflorus
GKJWQY101AN3LX | 127 | 1 | 127 | 2E-57 | 99% | 99% | 219819090 | Eukaryota | Streptophyta | Apiales | Daucus carota
GKJWQY101BVC5O | 578 | 1 | 578 | 2E-123 | 83% | 83% | 88656961 | Eukaryota | Streptophyta | Asterales | Lactuca sativa
GKJWQY101BNTYY | 300 | 273 | 300 | 0.001 | 100% | 100% | 259526188 | Eukaryota | Streptophyta | Asterales | Artemisia annua
GKJWQY101AZBHB | 232 | 1 | 232 | 1E-107 | 98% | 98% | 194132090 | Eukaryota | Streptophyta | Malvales | Gonystylus
GKJWQY101AYF7P | 275 | 1 | 275 | 3E-139 | 99% | 99% | 170522360 | Eukaryota | Streptophyta | Brassicales | Carica papaya
GKJWQY101ADA3T | 198 | 1 | 198 | 1E-96 | 99% | 99% | 62149314 | Eukaryota | Streptophyta | Caryophyllales | Silene latifolia
GKJWQY101APA8N | 428 | 1 | 428 | 0 | 94% | 94% | 193788921 | Eukaryota | Streptophyta |  | Trifolium subterraneum
GKJWQY101A4746 | 335 | 1 | 335 | 2E-152 | 96% | 96% | 293338622 | Eukaryota | Streptophyta | Fabales | Lathyrus sativus
GKJWQY101ATFGY | 514 | 1 | 514 | 0 | 93% | 93% | 210143279 | Eukaryota | Streptophyta | Fabales | Glycine max
GKJWQY101BGGKO | 394 | 1 | 394 | 1E-159 | 93% | 93% | 218135405 | Eukaryota | Streptophyta | Fabales | Medicago truncatula
GKJWQY101A1C94 | 345 | 1 | 345 | 4E-178 | 99% | 99% | 329124647 | Eukaryota | Streptophyta | Solanales | Solanum
GKJWQY101BOE3Q | 467 | 1 | 467 | 0 | 96% | 96% | 147776538 | Eukaryota | Streptophyta | Vitales | Vitis vinifera
GKJWQY101BYF2M | 52 | 1 | 52 | 5E-13 | 94% | 94% | 147820696 | Eukaryota | Streptophyta | Vitales | Vitis vinifera
GKJWQY101BTYU0 | 537 | 1 | 537 | 0 | 99% | 99% | 239764707 | Eukaryota | Streptophyta | Vitales | Vitis vinifera

Table S6. Archaea and Viruses from V5. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

454 Sequence number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Order | Family | Genus | Description
GKJWQY101A0CZO | 303 | 20 | 258 | 1E-117 | 99% | 99% | 207366080 | Archaea | n | n | n | Uncultured archaeon partial 16S rRNA gene, clone ODP204 30 Bac263
GKJWQY101A62OT | 539 | 33 | 61 | 0.0005 | 100% | 100% | 262527001 | Archaea | n | n | n | Uncultured archaeon ANME-1, unordered contigs
GKJWQY101A8DGR | 536 | 17 | 533 | 0 | 98% | 98% | 288872851 | Viruses | n | Microviridae | Microvirus | Enterobacteria phage phiX174 isolate JACSK, complete genome
GKJWQY101AB5CE | 552 | 18 | 547 | 1E-84 | 79% | 79% | 91982906 | Viruses | Caudovirales | Siphoviridae | n | Propionibacterium phage PA6, complete genome

Table S7. Small subunit rRNA genes of Bacteria and Eukarya from V6. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

Accession number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class | Genus / Species
JQ999507 | 239 | 1 | 239 | 1E-110 | 97% | 97% | 38195134 | Bacteria | Actinobacteria | Actinobacteria | uncultured actinobacterium
JQ999509 | 312 | 1 | 312 | 1E-157 | 99% | 99% | 154184543 | Bacteria | Firmicutes | Clostridia | uncultured Lachnospiraceae bacterium
JQ999560 | 228 | 1 | 228 | 8E-113 | 100% | 100% | 295443962 | Bacteria | Firmicutes | Bacilli | Staphylococcus sp. NCCP-163
JQ999508 | 274 | 1 | 272 | 3E-118 | 93% | 93% | 295815580 | Bacteria | Firmicutes | Bacilli | Lactococcus lactis
JQ999559 | 247 | 1 | 247 | 2E-120 | 99% | 99% | 294337929 | Bacteria | Firmicutes | Bacilli | Bacillus clausii
JQ999534 | 593 | 3 | 514 | 0 | 98% | 98% | 110448355 | Bacteria | n | n | uncultured bacterium
JQ999535 | 666 | 1 | 666 | 0 | 94% | 94% | 237774881 | Bacteria | n | n | uncultured bacterium
JQ999516 | 245 | 1 | 245 | 3E-122 | 100% | 100% | 70959311 | Bacteria | n | n | uncultured bacterium
JQ999537 | 871 | 1 | 871 | 0 | 98% | 98% | 238303249 | Bacteria | n | n | uncultured bacterium
JQ999531 | 445 | 28 | 445 | 0 | 99% | 99% | 223675496 | Bacteria | n | n | uncultured bacterium
JQ999561 | 246 | 1 | 239 | 1E-91 | 93% | 93% | 257143813 | Bacteria | n | n | uncultured bacterium
JQ999521 | 284 | 50 | 284 | 1E-106 | 97% | 97% | 220682601 | Bacteria | n | n | uncultured bacterium
JQ999515 | 243 | 1 | 243 | 7E-119 | 99% | 99% | 292596344 | Bacteria | n | n | uncultured bacterium
JQ999524 | 313 | 1 | 313 | 1E-151 | 98% | 98% | 71089597 | Bacteria | n | n | uncultured bacterium
JQ999518 | 251 | 60 | 248 | 2E-85 | 97% | 97% | 169286658 | Bacteria | n | n | uncultured bacterium
JQ999525 | 339 | 1 | 302 | 7E-155 | 100% | 100% | 169283493 | Bacteria | n | n | uncultured bacterium
JQ999530 | 430 | 29 | 430 | 1E-144 | 90% | 90% | 238409922 | Bacteria | n | n | uncultured bacterium
JQ999562 | 286 | 1 | 286 | 6E-125 | 95% | 95% | 76057884 | Bacteria | n | n | uncultured bacterium
JQ999523 | 311 | 1 | 311 | 2E-159 | 100% | 100% | 291507684 | Bacteria | n | n | uncultured bacterium
JQ999563 | 398 | 1 | 363 | 1E-158 | 95% | 95% | 295810008 | Bacteria | n | n | uncultured bacterium
JQ999529 | 416 | 1 | 416 | 0 | 96% | 96% | 290616790 | Bacteria | n | n | uncultured bacterium
JQ999519 | 261 | 1 | 261 | 7E-129 | 99% | 99% | 238341248 | Bacteria | n | n | uncultured bacterium
JQ999533 | 544 | 1 | 529 | 0 | 93% | 93% | 192966167 | Bacteria | n | n | uncultured bacterium
JQ999520 | 270 | 80 | 270 | 4E-67 | 92% | 92% | 291506987 | Bacteria | n | n | uncultured bacterium
JQ999526 | 349 | 1 | 349 | 3E-178 | 99% | 99% | 261262250 | Bacteria | n | n | uncultured bacterium
JQ999512 | 239 | 1 | 239 | 1E-110 | 97% | 97% | 284158279 | Bacteria | n | n | uncultured bacterium
JQ999528 | 387 | 1 | 381 | 0 | 98% | 98% | 192988657 | Bacteria | n | n | uncultured bacterium
JQ999536 | 720 | 1 | 720 | 0 | 92% | 92% | 82393901 | Bacteria | n | n | uncultured bacterium
JQ999564 | 516 | 1 | 516 | 0 | 98% | 98% | 257144483 | Bacteria | n | n | uncultured bacterium
JQ999527 | 355 | 4 | 355 | 5E-142 | 93% | 93% | 151548166 | Bacteria | n | n | uncultured bacterium
JQ999532 | 482 | 1 | 482 | 0 | 92% | 92% | 110440003 | Bacteria | n | n | uncultured bacterium
JQ999517 | 249 | 1 | 249 | 7E-119 | 98% | 98% | 217417015 | Bacteria | n | n | uncultured bacterium
JQ999511 | 219 | 1 | 219 | 4E-96 | 96% | 96% | 240000804 | Bacteria | n | n | uncultured bacterium
JQ999513 | 240 | 2 | 240 | 2E-99 | 95% | 95% | 109676490 | Bacteria | n | n | uncultured bacterium
JQ999510 | 212 | 1 | 212 | 6E-99 | 98% | 98% | 158998775 | Bacteria | n | n | lobster gut bacterium ABHa3
JQ999514 | 242 | 1 | 218 | 4E-106 | 99% | 99% | 295814818 | Bacteria | n | n | uncultured bacterium
JQ999522 | 284 | 1 | 284 | 2E-120 | 95% | 95% | 192979871 | Bacteria | n | n | uncultured bacterium
JQ999565 | 401 | 1 | 319 | 1E-163 | 100% | 100% | 224027508 | Bacteria | Proteobacteria (alpha) | Alphaproteobacteria | Brevundimonas sp. AKB-2008-KU11
JQ999539 | 226 | 1 | 226 | 1E-105 | 98% | 98% | 189306205 | Bacteria | Proteobacteria (alpha) | Alphaproteobacteria | uncultured Mycoplana sp.
JQ999538 | 233 | 3 | 233 | 1E-90 | 93% | 93% | 148615349 | Bacteria | Proteobacteria (alpha) | Alphaproteobacteria | uncultured alpha proteobacterium
JQ999566 | 270 | 1 | 245 | 2E-119 | 99% | 99% | 134084827 | Bacteria | Proteobacteria (alpha) | Alphaproteobacteria | Subaequorebacter tamlense
JQ999542 | 515 | 3 | 515 | 0 | 91% | 91% | 213536827 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | Delftia acidovorans
JQ999541 | 655 | 2 | 655 | 0 | 96% | 96% | 255348346 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | Comamonas sp. BF-3
JQ999540 | 373 | 1 | 373 | 5E-172 | 95% | 95% | 295322914 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | Burkholderia cepacia
JQ999567 | 397 | 10 | 397 | 7E-146 | 91% | 91% | 291482199 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | uncultured beta proteobacterium
JQ999544 | 450 | 8 | 450 | 0 | 96% | 96% | 149900449 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | Uncultured betaproteobacterium
JQ999543 | 234 | 1 | 234 | 2E-113 | 99% | 99% | 285200309 | Bacteria | Proteobacteria (beta) | Betaproteobacteria | Herbaspirillum sp. oral taxon A32
JQ999548 | 959 | 20 | 959 | 0 | 100% | 100% | 295394130 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | Escherichia sp. enrichment culture clone
JQ999545 | 634 | 1 | 634 | 0 | 97% | 97% | 255763066 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | Rheinheimera sp. HMD2012
JQ999554 | 314 | 1 | 314 | 5E-161 | 100% | 100% | 295687302 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | uncultured Pseudomonas sp.
JQ999552 | 275 | 1 | 275 | 8E-134 | 98% | 98% | 162951385 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | uncultured gamma proteobacterium
JQ999506 | 346 | 73 | 346 | 5E-137 | 99% | 99% | 269931076 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | Escherichia coli
JQ999549 | 309 | 1 | 309 | 5E-156 | 99% | 99% | 269911743 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | uncultured Enterobacteriaceae bacterium
JQ999556 | 416 | 32 | 416 | 7E-161 | 94% | 94% | 57918745 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | Vibrio sp. U32
JQ999551 | 756 | 1 | 756 | 0 | 99% | 99% | 294799818 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | Shigella sp. 29_2010_
JQ999550 | 454 | 16 | 454 | 1E-174 | 93% | 93% | 39546462 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | rainbow trout intestinal bacterium T1
JQ999546 | 229 | 1 | 229 | 3E-112 | 99% | 99% | 257073647 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | uncultured Citrobacter sp.
JQ999547 | 228 | 1 | 228 | 1E-111 | 99% | 99% | 257074351 | Bacteria | Proteobacteria (gamma) | Gammaproteobacteria | uncultured Enterobacter sp.
JQ999557 | 279 | 1 | 279 | 3E-123 | 96% | 96% | 154194068 | Bacteria | Proteobacteria (uncult) | n | uncultured proteobacterium
JQ999558 | 297 | 49 | 297 | 5E-116 | 98% | 98% | 154190433 | Bacteria | Proteobacteria (uncult) | n | uncultured proteobacterium
JQ999625 | 338 | 1 | 338 | 7E-170 | 99% | 99% | 291482367 | Eukaryota | Ascomycota | Dothideomycetes | Cladosporium cladosporioides
JQ999627 | 701 | 1 | 676 | 0 | 92% | 92% | 27447881 | Eukaryota | Ascomycota | n | Medeolaria farlowii
JQ999626 | 368 | 1 | 368 | 6E-176 | 98% | 98% | 219563700 | Eukaryota | Ascomycota | Leotiomycetes | Cyathicula microspora
JQ999629 | 264 | 1 | 264 | 2E-125 | 98% | 98% | 156637429 | Eukaryota | Ascomycota | Saccharomycetes | Millerozyma farinosa
JQ999633 | 840 | 1 | 840 | 0 | 95% | 95% | 283131270 | Eukaryota | n | n | uncultured fungus
JQ999630 | 226 | 1 | 226 | 1E-111 | 100% | 100% | 157925543 | Eukaryota | n | n | uncultured fungus
JQ999632 | 273 | 1 | 254 | 1E-126 | 99% | 99% | 256006248 | n | n | n | uncultured organism
JQ999631 | 303 | 1 | 303 | 2E-154 | 100% | 100% | 290782478 | Eukaryota | Streptophyta | n | Melicope cf. crassiramis SW-2006

Table S8. Large subunit rRNA genes of Bacteria and Eukarya from V6. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

Accession number | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class | Genus / Species
JQ999835 | 250 | 1 | 250 | 2E-105 | 95% | 95% | 296416 | Bacteria | Firmicutes | Bacilli | Sporosarcina globispora
JQ999833 | 308 | 68 | 308 | 7E-75 | 86% | 86% | 291259210 | Bacteria | n | n | uncultured bacterium
JQ999831 | 226 | 1 | 205 | 5E-90 | 97% | 97% | 291260098 | Bacteria | n | n | uncultured bacterium
JQ999830 | 222 | 1 | 222 | 2E-108 | 99% | 99% | 291258986 | Bacteria | n | n | uncultured bacterium
JQ999907 | 588 | 178 | 588 | 0 | 97% | 97% | 159171560 | Eukaryota | Ascomycota | Dothideomycetes | Phaeosphaeria avenaria
JQ999906 | 500 | 1 | 500 | 0 | 99% | 99% | 284158823 | Eukaryota | Ascomycota | Dothideomycetes | Davidiella tassiana
JQ999628 | 749 | 4 | 749 | 0 | 94% | 94% | 291170394 | Eukaryota | Ascomycota | n | Coniosporium apollinis
JQ999908 | 231 | 1 | 231 | 1E-106 | 97% | 97% | 38154526 | Eukaryota | n | n | uncultured fungus

Table S9. Ribosomal RNA gene sequences less than 200 nt in length (could not be submitted to NCBI) from V6. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

Contiguous sequence | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class | Genus / Species
VostokV6_rep_c226 | 179 | 1 | 179 | 1E-85 | 99% | 99% | 262063836 | Bacteria | Actinobacteria | Actinobacteria | uncultured Frankineae bacterium
VostokV6_s130 | 192 | 1 | 124 | 1E-55 | 99% | 99% | 268032028 | Bacteria | Actinobacteria | Actinobacteria | Micrococcus sp. M-B-1
VostokV6_rep_c96 | 197 | 1 | 156 | 1E-60 | 94% | 94% | 95117795 | Bacteria | Actinobacteria | Actinobacteria | Clavibacter michiganensis
VostokV6_s134 | 134 | 1 | 134 | 1E-59 | 99% | 99% | 196174919 | Bacteria | Actinobacteria | Actinobacteria | Mycobacterium marinum
VostokV6_s244 | 108 | 1 | 108 | 5E-47 | 99% | 99% | 290794322 | Bacteria | Chloroflexi | n | uncultured Chloroflexi bacterium
VostokV6_rep_c194 | 175 | 1 | 175 | 1E-85 | 100% | 100% | 9187671 | Bacteria | n | n | uncultured rape rhizosphere
VostokV6_rep_c199 | 200 | 23 | 200 | 8E-68 | 94% | 94% | 119352337 | Bacteria | n | n | uncultured bacterium
VostokV6_rep_c274 | 203 | 1 | 203 | 6E-59 | 85% | 85% | 28848718 | Bacteria | n | n | uncultured bacterium
VostokV6_s184 | 137 | 1 | 137 | 1E-64 | 100% | 100% | 126402247 | Bacteria | n | n | uncultured bacterium
VostokV6_s243 | 78 | 1 | 78 | 3E-32 | 100% | 100% | 295687301 | Bacteria | n | n | uncultured bacterium
VostokV6_s265 | 188 | 1 | 188 | 3E-86 | 98% | 98% | 217416971 | Bacteria | n | n | uncultured bacterium
VostokV6_rep_c157 | 210 | 1 | 210 | 1E-96 | 97% | 97% | 291260192 | Bacteria | n | n | uncultured bacterium
VostokV6_rep_c293 | 195 | 1 | 195 | 9E-92 | 98% | 98% | 291261679 | Bacteria | n | n | uncultured bacterium
VostokV6_s95 | 99 | 1 | 76 | 6E-31 | 100% | 100% | 291260719 | Bacteria | n | n | uncultured bacterium
VostokV6_c165 | 203 | 1 | 203 | 2E-84 | 95% | 95% | 285159491 | Bacteria | Proteobacteria | Betaproteobacteria | mirabilis
VostokV6_s253 | 156 | 1 | 156 | 7E-72 | 99% | 99% | 284810302 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. CV4.4.3R1
VostokV6_s62 | 105 | 43 | 105 | 6E-21 | 97% | 97% | 238057381 | Bacteria | Proteobacteria | Betaproteobacteria | Uncultured Thiobacillus sp.
VostokV6_rep_c148 | 163 | 1 | 163 | 4E-65 | 95% | 95% | 98975330 | Bacteria | Proteobacteria | Gammaproteobacteria | Moraxella bovoculi
VostokV6_s113 | 198 | 1 | 162 | 8E-53 | 90% | 90% | 1240033 | Bacteria | Proteobacteria | Gammaproteobacteria | Escherichia
VostokV6_c270 | 148 | 1 | 148 | 9E-71 | 100% | 100% | 284159233 | Eukaryota | Ascomycota | Saccharomycetes | Ogataea thermomethanolica
VostokV6_c110 | 94 | 25 | 94 | 7E-25 | 97% | 97% | 166947935 | Eukaryota | Ascomycota | Saccharomycetes | Candida tropicalis
VostokV6_c187 | 111 | 1 | 111 | 5E-42 | 94% | 94% | 158819348 | Eukaryota | Ascomycota | Saccharomycetes | Babjeviella inositovora
VostokV6_c100 | 177 | 4 | 177 | 4E-85 | 100% | 100% | 234195550 | Eukaryota | Basidiomycota | n | uncultured Agaricomycotina
VostokV6_c223 | 77 | 1 | 77 | 1E-31 | 100% | 100% | 295393258 | Eukaryota | Basidiomycota | Tremellomycetes | Bullera taiwanensis
VostokV6_rep_c131 | 185 | 1 | 184 | 1E-90 | 100% | 100% | 292660457 | Eukaryota | Basidiomycota | n | uncultured Basidiomycota
VostokV6_c273 | 47 | 3 | 42 | 7E-10 | 98% | 98% | 256859920 | Eukaryota | Basidiomycota |  | Geastrum sessile
VostokV6_c233 | 218 | 35 | 218 | 5E-85 | 98% | 98% | 50845139 | Eukaryota | n | n | uncultured fungus
VostokV6_rep_c141 | 114 | 1 | 114 | 5E-52 | 100% | 100% | 256373711 | Eukaryota | Streptophyta | n | Thalictrum simplex

Table S10. Bacteria and Eukarya mRNA (and other non-rRNA) gene sequences from V6. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

Contiguous sequence | Q length | Q start | Q end | e-value | %-ident | %-sim | GI number | Domain | Phylum | Class | Genus / Species
VostokV6 rep c158 | 182 | 1 | 182 | 3E-61 | 91% | 91% | 237757549 | Bacteria | Actinobacteria | Actinobacteria | Corynebacterium kroppenstedtii
VostokV6 rep c202 | 205 | 1 | 205 | 3E-92 | 97% | 97% | 119947346 | Bacteria | Actinobacteria | Actinobacteria | Arthrobacter aurescens
VostokV6 rep c217 | 321 | 1 | 321 | 9E-124 | 92% | 92% | 38200856 | Bacteria | Actinobacteria | Actinobacteria | Corynebacterium diphtheriae
VostokV6 rep c246 | 175 | 4 | 175 | 2E-52 | 89% | 89% | 140843962 | Bacteria | Actinobacteria | Actinobacteria | Corynebacterium glutamicum
VostokV6 s74 | 113 | 1 | 113 | 1E-47 | 98% | 98% | 162952245 | Bacteria | Actinobacteria | Actinobacteria | Renibacterium salmoninarum
VostokV6 c235 | 95 | 1 | 95 | 1E-41 | 100% | 100% | 254946573 | Bacteria | Bacteroidetes | Cytophagia | Dyadobacter fermentans
VostokV6 c112 | 258 | 5 | 258 | 6E-35 | 79% | 79% | 218766851 | Bacteria | Deinococcus-Thermus | Deinococci | Thermus thermophilus
VostokV6 c161 | 147 | 1 | 142 | 2E-53 | 94% | 94% | 289178903 | Bacteria | Firmicutes | Bacilli | Staphylococcus lugdunensis
VostokV6 c212 | 189 | 1 | 189 | 3E-72 | 94% | 94% | 256797400 | Bacteria | Firmicutes | Clostridia | Anaerococcus prevotii
VostokV6 c256 | 587 | 31 | 587 | 0 | 95% | 95% | 295107714 | Bacteria | Firmicutes | Clostridia | Ruminococcus obeum
VostokV6 c97 | 217 | 1 | 217 | 1E-81 | 93% | 93% | 262396937 | Bacteria | Firmicutes | Bacilli | Lactobacillus johnsonii
VostokV6 rep c149 | 186 | 1 | 186 | 9E-87 | 98% | 98% | 225726369 | Bacteria | Firmicutes | Bacilli | Streptococcus pneumoniae
VostokV6 rep c204 | 316 | 53 | 316 | 7E-130 | 99% | 99% | 68445725 | Bacteria | Firmicutes | Bacilli | Staphylococcus haemolyticus
VostokV6 rep c139 | 317 | 1 | 317 | 3E-119 | 92% | 92% | 257048753 | Bacteria | Fusobacteria | Fusobacteria | Leptotrichia buccalis
VostokV6 c185 | 164 | 1 | 164 | 1E-79 | 100% | 100% | 193084619 | Bacteria | n | n | uncultured bacterium HF0500 12O04
VostokV6 c248 | 245 | 3 | 209 | 1E-22 | 77% | 77% | 193084063 | Bacteria | n | n | uncultured bacterium KM3-23-D4
VostokV6 s291 | 200 | 1 | 200 | 2E-68 | 92% | 92% | 62860402 | Bacteria | n | n | uncultured bacterium zdt-44a23
VostokV6 rep c123 | 498 | 48 | 498 | 0 | 94% | 94% | 193084637 | Bacteria | n | n | uncultured bacterium HF4000 16C08
VostokV6 c15 | 584 | 1 | 584 | 0 | 100% | 100% | 146403799 | Bacteria | Proteobacteria | Alphaproteobacteria | Bradyrhizobium sp. BTAi1
VostokV6 rep c230 | 265 | 2 | 265 | 2E-114 | 96% | 96% | 188532098 | Bacteria | Proteobacteria | Alphaproteobacteria | Sphingobium chungbukense
VostokV6 s272 | 222 | 2 | 222 | 3E-97 | 96% | 96% | 288914861 | Bacteria | Proteobacteria | Alphaproteobacteria | Azospirillum sp. B510
VostokV6 c16 | 689 | 25 | 689 | 0 | 98% | 98% | 120591888 | Bacteria | Proteobacteria | Betaproteobacteria | Polaromonas naphthalenivorans
VostokV6 c245 | 332 | 1 | 332 | 8E-90 | 86% | 86% | 160361034 | Bacteria | Proteobacteria | Betaproteobacteria | Delftia acidovorans
VostokV6 c268 | 191 | 71 | 191 | 6E-14 | 80% | 80% | 294338440 | Bacteria | Proteobacteria | Betaproteobacteria | Thiomonas sp. 3As
VostokV6 c72 | 111 | 1 | 96 | 2E-25 | 90% | 90% | 121551644 | Bacteria | Proteobacteria | Betaproteobacteria | Verminephrobacter eiseniae
VostokV6 rep c124 | 358 | 1 | 358 | 2E-155 | 95% | 95% | 221728669 | Bacteria | Proteobacteria | Betaproteobacteria | Acidovorax ebreus
VostokV6 rep c77 | 251 | 1 | 251 | 2E-95 | 92% | 92% | 237502667 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia pseudomallei
VostokV6 c247 | 203 | 1 | 203 | 6E-94 | 98% | 98% | 145692985 | Bacteria | Proteobacteria | Gammaproteobacteria | Pseudomonas aeruginosa
VostokV6 rep c104 | 605 | 1 | 563 | 0 | 99% | 99% | 281599365 | Bacteria | Proteobacteria | Gammaproteobacteria | Shigella flexneri
VostokV6 rep c117 | 194 | 1 | 194 | 2E-73 | 93% | 93% | 161361677 | Bacteria | Proteobacteria | Gammaproteobacteria | 
VostokV6 rep c155 | 257 | 1 | 257 | 2E-125 | 99% | 99% | 291150583 | Bacteria | Proteobacteria | Gammaproteobacteria | Pantoea ananatis
VostokV6 rep c200 | 325 | 1 | 325 | 9E-139 | 94% | 94% | 206564770 | Bacteria | Proteobacteria | Gammaproteobacteria | Klebsiella pneumoniae
VostokV6 rep c218 | 285 | 62 | 285 | 2E-85 | 93% | 93% | 291551905 | Bacteria | Proteobacteria | Gammaproteobacteria | Erwinia amylovora
VostokV6 s87 | 79 | 1 | 79 | 5E-30 | 98% | 98% | 253778933 | Bacteria | Proteobacteria | Gammaproteobacteria | Photorhabdus asymbiotica
VostokV6 rep c192 | 247 | 1 | 247 | 4.00E-121 | 98% | 98% | 294661761 | Eukaryota | Arthropoda | Daphniidae | Daphnia pulex
VostokV6 rep c171 | 564 | 28 | 536 | 0 | 99% | 99% | 294910885 | Eukaryota | Arthropoda | Ixodidae | Dermacentor variabilis
VostokV6 s73 | 107 | 1 | 65 | 5.00E-22 | 97% | 97% | 294910977 | Eukaryota | Arthropoda | Ixodidae | Dermacentor variabilis
VostokV6 s69 | 206 | 1 | 206 | 1.00E-91 | 96% | 96% | 294922085 | Eukaryota | Arthropoda | Ixodidae | Dermacentor variabilis
VostokV6 s122 | 220 | 1 | 220 | 5.00E-105 | 98% | 98% | 294922086 | Eukaryota | Arthropoda | Ixodidae | Dermacentor variabilis

Table S11. Blastn and Blastx results from analysis of V5 sequences on the KAAS KEGG site (Moriya et al. 2007). The searches were for highly similar sequences (megablast); Max. target sequences = 100; Expected threshold = 1e-10 (unless no results were found, then 0); filter low complexity regions; and translated nucleotide search over the Reference sequence protein database; Matrix = BLOSUM62; Scoring parameters (existence 11; extension 1); filter low complexity regions. Taxonomic affiliations that were not found in NCBI GenBank are marked "n". For the Blastx results, the values given are percent positives rather than percent similarity; these are marked with "*". Sequences with identities” 50% are labeled with violet background.
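
As a reading aid for the Expected threshold quoted in the caption, the following is a minimal Python sketch of how tabulated hits could be screened against an e-value cutoff of 1e-10. It is an illustration only and not the procedure used to build the table: the `Hit` record, its field names, the helper functions, and the `hypothetical_read` entry are all assumptions introduced here, and the comparison `e_value <= threshold` is assumed to match the way the cutoff was applied. The two real rows reuse sequence numbers, e-values, and percent identities that appear in Table S11.

```python
from dataclasses import dataclass

# Expected threshold quoted in the Table S11 caption.
E_VALUE_THRESHOLD = 1e-10


@dataclass
class Hit:
    """One tabulated hit; the field names are hypothetical."""
    sequence_id: str
    e_value: float
    percent_identity: float


def parse_percent(text: str) -> float:
    """Convert a table cell such as '92%' or '95%*' to a float."""
    return float(text.rstrip("*").rstrip("%"))


def passes_threshold(hit: Hit, threshold: float = E_VALUE_THRESHOLD) -> bool:
    """Assumed rule: keep a hit when its e-value does not exceed the cutoff."""
    return hit.e_value <= threshold


hits = [
    Hit("GKJWQY101AMLFW", float("2.0E-62"), parse_percent("81%")),
    Hit("GKJWQY101B20C7", float("2.0E-93"), parse_percent("92%")),
    # Invented record, included only to show a hit being rejected.
    Hit("hypothetical_read", 0.5, parse_percent("100%")),
]

kept = [h.sequence_id for h in hits if passes_threshold(h)]
print(kept)
```

The percent parser strips the trailing "*" so that Blastx percent-positive cells can be read with the same helper as Blastn cells.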

%- %- 454 Sequence number Orthology # KAAS KEGG Enzyme name Pathways Names Q length Q start Q end e-value GI number Domain Phyla Class / Order Description ident simil* GKJWQY101AMLFW K09687 antibiotic transport system ATP-binding protein ABC transporters 308 1 308 2.0E-62 81% 81% 451902131 Bacteria Actinobacteria Actinobacteria Corynebacterium halotolerans YIM 70093 = DSM 44683, complete genome, product = BC-type multidrug transporter ATPase GKJWQY101AYS8L K02006 cobalt/nickel transport system ATP-binding protein ABC transporters 288 1 288 1.0E-129 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = cobalt ABC transporter, ATPase Oscillatoriales subunit GKJWQY101BNHJH K02010 iron(III) transport system ATP-binding protein [EC:3.6.3.30] ABC transporters 513 1 513 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = Fe(3+)-transporting ATPase GKJWQY101BAA0P K02045 sulfate transport system ATP-binding protein [EC:3.6.3.25] ABC transporters 468 1 468 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = sulfate ABC transporter, ATPase Oscillatoriales subunit GKJWQY101AGPVU K09690 lipopolysaccharide transport system permease protein ABC transporters 463 1 463 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = ABC-2 type transporter GKJWQY101AD59C K11070 spermidine/putrescine transport system permease protein ABC transporters 414 1 414 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = binding-protein-dependent transport Oscillatoriales systems inner membrane component GKJWQY101AYDKO K11085 ATP-binding cassette, subfamily B, bacterial MsbA [EC:3.6.3.-] ABC transporters 290 1 290 2.0E-123 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales 
Oscillatoria nigro-viridis PCC 7112, complete genome, product = xenobiotic-transporting ATPase GKJWQY101AIM5Y K01995 branched-chain amino acid transport system ATP-binding protein ABC transporters 584 1 584 0.0E+00 99% 99% 296848933 Bacteria Deinococcus-Thermus Deinococci Meiothermus silvanus DSM 9946, complete genome, product = ABC transporter related protein GKJWQY101A0EHG K02035 peptide/nickel transport system substrate-binding protein ABC transporters 495 1 495 0.0E+00 99% 99% 296848933 Bacteria Deinococcus-Thermus Deinococci Meiothermus silvanus DSM 9946, complete genome, product = ABC-type dipeptide/oligopeptide/nickel transport system, permease component GKJWQY101ASUCA K10036 glutamine transport system substrate-binding protein ABC transporters 311 1 311 8.0E-161 100% 100% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = amino ABC transporter, permease, 3-TM region, His/Glu/Gln/Arg/opine family domain protein GKJWQY101AMX6D K10118 multiple sugar transport system permease protein ABC transporters 457 1 457 0.0E+00 98% 98% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = binding--dependent transport system inner membrane component family protein (<1..51) (48..>456) GKJWQY101AP0B9 K10440 ribose transport system permease protein ABC transporters 239 1 239 1.0E-70 87% 87% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = permease component of ribose///galactoside ABC-type transporters GKJWQY101BDGCG K10441 ribose transport system ATP-binding protein [EC:3.6.3.17] ABC transporters 450 1 450 2.0E-118 84% 84% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = ABC-type sugar transport system, ATPase component <1..252); ABC-type xylose transport system, permease component (249..>446) GKJWQY101BVWN6 K11073 putrescine transport system 
substrate-binding protein ABC transporters 489 1 489 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = putrescine ABC transporter putrescine-binding protein PotF GKJWQY101B17TY_2 K10228 / transport system permease protein ABC transporters 161 1 161 7.0E-64 95% 95% 77965403 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 383 chromosome 1, complete sequence, product = sorbitol ABC transporter membrane protein/mannitol ABC transporter membrane protein GKJWQY101BLVUA K11710 manganese/zinc/iron transport system ATP- binding protein ABC transporters 521 4 521 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = phosphonate-transporting ATPase Oscillatoriales GKJWQY101AZ1BP K11952 bicarbonate transport system ATP-binding protein [EC:3.6.3.-] ABC transporters 401 1 401 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = nitrate ABC transporter, ATPase Oscillatoriales subunits C and D GKJWQY101BA0N7 K11959 urea transport system substrate-binding protein ABC transporters 457 1 457 0.0E+00 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = integral membrane sensor hybrid Oscillatoriales histidine kinase GKJWQY101BB18D K12368 dipeptide transport system substrate-binding protein ABC transporters 471 1 471 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = dipeptide-binding ABC transporter, periplasmic substrate-binding component GKJWQY101ACKKD K12369 dipeptide transport system permease protein ABC transporters 474 1 474 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 1, complete sequence, product = dipeptide transport system permease protein DppB (<1..211); dipeptide-binding ABC transporter, periplasmic substrate-binding component (331..>474) GKJWQY101BC35N K01533 Acting on acid anhydrides to catalyse transmembrane movement of substances 331 1 331 4.0E-164 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = copper-translocating P-type Cu2+-exporting ATPase [EC:3.6.3.4] ATPase GKJWQY101AY87X K01265 Acting on peptide bonds (peptidases) 359 1 359 2.0E-147 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = methionine aminopeptidase, type I methionyl aminopeptidase [EC:3.4.11.18] Oscillatoriales GKJWQY101AVMJN K01258 tripeptide aminopeptidase [EC:3.4.11.4] Acting on peptide bonds (peptidases) 498 1 498 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = peptidase T GKJWQY101B0Z6C K00676 Acyltransferases [Transfer groups other than aminoacyl groups] 340 1 340 3.0E-135 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = GCN5-related N-acetyltransferase ribosomal-protein-alanine N-acetyltransferase [EC:2.3.1.128] Oscillatoriales GKJWQY101A7UDN K00684 Acyltransferases [tRNA modification factors] 423 1 423 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = leucyl/phenylalanyl-tRNA--protein leucyl/phenylalanyl-tRNA--protein transferase [EC:2.3.2.6] Oscillatoriales transferase (<1..283); hypothetical protein (330..>422) GKJWQY101B20C7 K13821 proline dehydrogenase / delta 1-pyrroline-5-carboxylate dehydrogenase Alanine, aspartate and glutamate metabolism 480 2 478 2.0E-93 92% 95%* 400288736 Bacteria Proteobacteria Gammaproteobacteria bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase [Psychrobacter sp. 
PAMC [EC:1.5.1.12 1.5.99.8] 21119] GKJWQY101AKE4Q K01425 glutaminase [EC:3.5.1.2] Alanine, aspartate and glutamate metabolism/Arginine and proline metabolism/D-Glutamine and D- 407 1 407 1.0E-170 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = L-glutaminase glutamate metabolism/Nitrogen metabolism Oscillatoriales GKJWQY101AKLKY K00812 aspartate aminotransferase [EC:2.6.1.1] Alanine, aspartate and glutamate metabolism/Cysteine and methionine metabolism/Arginine and 473 1 473 0.0E+00 98% 98% 325124855 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. bulgaricus 2038, complete genome, product = aspartate proline metabolism/Tyrosine metabolism/Phenylalanine metabolism/Phenylalanine, tyrosine and aminotransferase tryptophan biosynthesis/Novobiocin biosynthesis/Isoquinoline alkaloid biosynthesis/Tropane, piperidine and pyridine alkaloid biosynthesis

GKJWQY101B0F5K K00278 L-aspartate oxidase [EC:1.4.3.16] Alanine, aspartate and glutamate metabolism/Nicotinate and nicotinamide metabolism 481 1 481 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = L-aspartate oxidase GKJWQY101BLU93 K00259 Alanine, aspartate and glutamate metabolism/Taurine and hypotaurine metabolism 465 1 465 0.0E+00 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = alanine dehydrogenase/pyridine alanine dehydrogenase [EC:1.4.1.1] Oscillatoriales nucleotide transhydrogenase GKJWQY101BPBGL K04042 bifunctional UDP-N-acetylglucosamine pyrophosphorylase / Glucosamine-1- Amino sugar and nucleotide sugar metabolism 407 1 407 0.0E+00 99% 99% 110673209 Bacteria Firmicutes Clostridia Clostridium perfringens ATCC 13124, complete genome, product = UDP-N-acetylglucosamine phosphate N-acetyltransferase [EC:2.3.1.157 2.7.7.23] diphosphorylase/glucosamine-1-phosphate N-acetyltransferase GKJWQY101AGM2N K01443 N-acetylglucosamine-6-phosphate deacetylase [EC:3.5.1.25] Amino sugar and nucleotide sugar metabolism 458 1 458 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = N-acetylglucosamine-6-phosphate deacetylase GKJWQY101AM94T K01791 UDP-N-acetylglucosamine 2-epimerase [EC:5.1.3.14] Amino sugar and nucleotide sugar metabolism 449 1 449 0.0E+00 96% 96% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 2, complete sequence, product = UDP-N-acetylglucosamine 2- epimerase GKJWQY101ARLO8 K01876 aspartyl-tRNA synthetase [EC:6.1.1.12] Aminoacyl-tRNA biosynthesis 357 1 357 3.0E-161 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = aspartyl-tRNA synthetase GKJWQY101BS0OD K01887 arginyl-tRNA synthetase [EC:6.1.1.19] Aminoacyl-tRNA biosynthesis 233 1 233 7.0E-76 89% 89% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = arginyl-tRNA synthetase GKJWQY101AXKT6 K01890 phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20] Aminoacyl-tRNA biosynthesis 352 1 352 2.0E-148 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = phenylalanyl-tRNA synthetase beta Oscillatoriales subunit GKJWQY101BY4M2 K01890 phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20] Aminoacyl-tRNA biosynthesis 508 1 508 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = phenylalanyl-tRNA synthetase beta Oscillatoriales subunit GKJWQY101BC61Q K01873 valyl-tRNA synthetase [EC:6.1.1.9] Aminoacyl-tRNA biosynthesis 267 1 267 9.0E-135 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = valine--tRNA ligase GKJWQY101B0MN7 K01879 glycyl-tRNA synthetase beta chain [EC:6.1.1.14] Aminoacyl-tRNA biosynthesis 488 1 488 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = glycine--tRNA ligase, beta subunit (<1..286); glycine--tRNA ligase, alpha subunit (288..>484) GKJWQY101ABVRO K01883 cysteinyl-tRNA synthetase [EC:6.1.1.16] Aminoacyl-tRNA biosynthesis 442 1 442 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = cysteine--tRNA ligase GKJWQY101BCECM K01866 tyrosyl-tRNA synthetase 
[EC:6.1.1.1] Aminoacyl-tRNA biosynthesis 311 1 311 7.0E-37 76% 76% 156768434 Bacteria n n Uncultured bacterium clone LM0ABA39ZD07FM1 genomic sequence GKJWQY101AATC2 K01874 methionyl-tRNA synthetase [EC:6.1.1.10] Aminoacyl-tRNA biosynthesis 498 1 498 2.0E-139 85% 85% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = methionyl-tRNA synthetase GKJWQY101A4YI5 K04566 lysyl-tRNA synthetase, class I [EC:6.1.1.6] Aminoacyl-tRNA biosynthesis 466 1 466 5.0E-115 83% 83% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = lysyl-tRNA synthetase class I (K), archaeal and spirochete GKJWQY101A5G4B K04566 lysyl-tRNA synthetase, class I [EC:6.1.1.6] Aminoacyl-tRNA biosynthesis 478 1 478 5.0E-120 83% 83% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = lysyl-tRNA synthetase class I (K), archaeal and spirochete GKJWQY101AVMQG K04567 lysyl-tRNA synthetase, class II [EC:6.1.1.6] Aminoacyl-tRNA biosynthesis 411 1 411 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = lysyl-tRNA synthetase (class II) GKJWQY101A1JG9 K00461 arachidonate 5-lipoxygenase [EC:1.13.11.34] Arachidonic acid metabolism 485 1 485 3.0E-172 89% 89% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = arachidonate 15-lipoxygenase GKJWQY101BMOU7 K01259 proline iminopeptidase [EC:3.4.11.5] Arginine and proline metabolism 367 1 367 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = proline-specific peptidases family protein GKJWQY101ALQO4 K01755 argininosuccinate lyase [EC:4.3.2.1] Arginine and proline metabolism 441 1 441 0.0E+00 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 2, complete sequence, product = argininosuccinate lyase GKJWQY101BHQ01 K01572 oxaloacetate decarboxylase, beta subunit [EC:4.1.1.3] Pyruvate metabolism 242 1 242 5.0E-52 83% 83% 149931032 Bacteria Bacteroidetes Bacteroidia Bacteroides vulgatus ATCC 8482, complete genome, product = oxaloacetate decarboxylase beta chain

GKJWQY101A2NZC K03476 L-ascorbate 6-phosphate lactonase [EC:3.1.1.-] Ascorbate and aldarate metabolism 315 1 315 4.0E-159 99% 99% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc705, product = L-ascorbate-6- phosphate lactonase GKJWQY101BTC2H K03204 type IV secretion system protein VirB9 Bacterial secretion system 483 1 483 2.0E-109 82% 82% 474421396 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium loti DNA, island, strain: NZP2037, product = conjugal transfer protein, TrbG GKJWQY101AFHLP K03199 type IV secretion system protein VirB4 Bacterial secretion system 466 1 466 0.0E+00 93% 93% 27817680 Bacteria Proteobacteria Betaproteobacteria Ralstonia oxalatica transposon Tn4371, product = putative mating pair formation protein GKJWQY101ASAUQ K03076 preprotein translocase subunit SecY Bacterial secretion system/Protein export 239 1 239 7.0E-56 84% 84% 149770655 Bacteria Bacteroidetes Flavobacterium psychrophilum JIP02/86 complete genome, product = preprotein translocase SecY subunit GKJWQY101BIP1B K03073 preprotein translocase subunit SecE Bacterial secretion system/Protein export 350 1 350 3.0E-161 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = protein translocase subunit Oscillatoriales secE/sec61 gamma (<1..175); transcription antitermination protein nusG (172..>347) GKJWQY101BCK7L K01142 exodeoxyribonuclease III [EC:3.1.11.2] Base excision repair 472 1 472 0.0E+00 98% 98% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = exodeoxyribonuclease III GKJWQY101BJH43 K10255 omega-6 fatty acid desaturase (delta-12 desaturase) [EC:1.14.19.-] Biosynthesis of unsaturated fatty acids 486 1 486 0.0E+00 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 2, complete sequence, product = fatty acid desaturase GKJWQY101BGYFQ K00656 formate C-acetyltransferase [EC:2.3.1.54] Butanoate metabolism 292 1 292 1.0E-133 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = formate acetyltransferase GKJWQY101A3577 K01907 acetoacetyl-CoA synthetase [EC:6.2.1.16] Butanoate metabolism 470 1 470 1.0E-150 87% 87% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = acetoacetyl-CoA synthase GKJWQY101AI2W0 K01907 acetoacetyl-CoA synthetase [EC:6.2.1.16] Butanoate metabolism 479 1 479 3.0E-157 88% 88% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = acetoacetyl-CoA synthase GKJWQY101BJZAX K07190 phosphorylase kinase alpha/beta subunit Calcium signaling pathway 258 1 258 2.0E-126 99% 99% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = phosphorylase kinase alphabeta GKJWQY101AVJYK K01537 Calcium signaling pathway 482 1 482 0.0E+00 100% 100% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = HAD ATPase, P-type, IC family Ca2+-transporting ATPase [EC:3.6.3.8] protein GKJWQY101AU07B K00855 phosphoribulokinase [EC:2.7.1.19] Carbon fixation in photosynthetic organisms 501 1 501 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = phosphoribulokinase GKJWQY101BZ4H8 K03737 putative pyruvate-flavodoxin oxidoreductase [EC:1.2.7.-] Carbon fixation pathways in 442 1 442 4.0E-140 87% 87% 29342101 Bacteria Bacteroidetes Bacteroidia Bacteroides thetaiotaomicron VPI-5482, complete genome, product = pyruvate-flavodoxin oxidoreductase GKJWQY101A6S1N K03737 putative pyruvate-flavodoxin oxidoreductase [EC:1.2.7.-] Carbon fixation pathways in prokaryotes 390 1 
390 1.0E-99 84% 84% 295083795 Bacteria Bacteroidetes Bacteroidia Bacteroides xylanisolvens XB1A draft genome, product = pyruvate:ferredoxin (flavodoxin) oxidoreductase, homodimeric
Table S11 Cont.

GKJWQY101AR4JZ K01358 ATP-dependent Clp protease, protease subunit [EC:3.4.21.92] Cell cycle - Caulobacter 484 1 484 0.0E+00 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ATP-dependent Clp protease Oscillatoriales proteolytic subunit ClpP GKJWQY101BDWER K01338 ATP-dependent Lon protease [EC:3.4.21.53] Cell cycle - Caulobacter 514 1 514 3.0E-162 87% 87% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = ATP-dependent protease La GKJWQY101BFHGL K03544 ATP-dependent Clp protease ATP-binding subunit ClpX Cell cycle - Caulobacter 461 1 461 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = ATP-dependent Clp protease ATP-binding subunit ClpX GKJWQY101BK9HC K06666 glucose repression regulatory protein TUP1 Cell cycle - yeast 480 1 480 1.0E-41 53% 63%* 427716641 Bacteria Cyanobacteria n WD40 repeat-containing protein [Calothrix sp. 
PCC 7507] GKJWQY101AKTZ2 K00148 Chloroalkane and chloroalkene degradation/Methane metabolism 506 1 506 0.0E+00 98% 98% 133737197 Bacteria Proteobacteria Betaproteobacteria Herminiimonas arsenicoxydans chromosome, complete sequence, product = glutathione-independent glutathione-independent formaldehyde dehydrogenase [EC:1.2.1.46] formaldehyde dehydrogenase (FDH) (FALDH) GKJWQY101AVMAD K00148 Chloroalkane and chloroalkene degradation/Methane metabolism 483 1 483 0.0E+00 98% 98% 133737197 Bacteria Proteobacteria Betaproteobacteria Herminiimonas arsenicoxydans chromosome, complete sequence, product = glutathione-independent glutathione-independent formaldehyde dehydrogenase [EC:1.2.1.46] formaldehyde dehydrogenase (FDH) (FALDH) GKJWQY101BLVZX K00148 Chloroalkane and chloroalkene degradation/Methane metabolism 427 1 427 0.0E+00 99% 99% 133737197 Bacteria Proteobacteria Betaproteobacteria Herminiimonas arsenicoxydans chromosome, complete sequence, product = glutathione-independent glutathione-independent formaldehyde dehydrogenase [EC:1.2.1.46] formaldehyde dehydrogenase (FDH) (FALDH) GKJWQY101AO712_2 K01563 haloalkane dehalogenase Chlorocyclohexane and chlorobenzene degradation/Chloroalkane and chloroalkene degradation 178 1 178 8.0E-69 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = alpha/beta hydrolase fold protein GKJWQY101AJE3X K01802 cis-trans-Isomerases 480 1 480 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = peptidyl-prolyl cis-trans isomerase peptidylprolyl isomerase [EC:5.2.1.8] Oscillatoriales cyclophilin type GKJWQY101ADVWE K00031 Citrate cycle (TCA cycle)/Glutathione metabolism/Carbon fixation pathways in prokaryotes/ 507 15 507 2.0E-64 76% 76% 72493824 Bacteria Firmicutes Bacilli Staphylococcus saprophyticus subsp. 
saprophyticus ATCC 15305 DNA, complete genome, product = isocitrate dehydrogenase [EC:1.1.1.42] isocitrate dehydrogenase GKJWQY101AM1FT K00031 isocitrate dehydrogenase [EC:1.1.1.42] Citrate cycle (TCA cycle)/Glutathione metabolism/Carbon fixation pathways in prokaryotes/ 474 14 474 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = isocitrate dehydrogenase GKJWQY101A6F8L K00658 Citrate cycle (TCA cycle)/ degradation 481 1 481 0.0E+00 91% 91% 469772332 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum FQY_4, complete genome, product = dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex(<1..325); dihydrolipoamide dehydrogenase 2-oxoglutarate dehydrogenase E2 component (dihydrolipoamide succinyltransf of 2-oxoglutarate dehydrogenase (396..>479) GKJWQY101AHUTJ K00164 Citrate cycle (TCA cycle)/Lysine degradation/Tryptophan metabolism 348 1 348 6.0E-108 87% 87% 71037566 Bacteria Proteobacteria Gammaproteobacteria Psychrobacter arcticus 273-4, complete genome, product = 2-oxoglutarate dehydrogenase E1 2-oxoglutarate dehydrogenase E1 component [EC:1.2.4.2] component GKJWQY101BIWNZ K00240 Citrate cycle (TCA cycle)/Oxidative phosphorylation 394 1 394 0.0E+00 96% 96% 299070035 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum CFBP2957 chromosome complete genome, product = succinate succinate dehydrogenase iron-sulfur protein [EC:1.3.99.1] dehydrogenase, Fe-S protein GKJWQY101A05N9 K00558 Cysteine and methionine metabolism 344 1 344 3.0E-141 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = DNA-cytosine methyltransferase DNA (cytosine-5-)-methyltransferase [EC:2.1.1.37] Oscillatoriales GKJWQY101BICX6 K01776 glutamate racemase [EC:5.1.1.3] D-Glutamine and D-glutamate metabolism 465 1 465 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria 
nigro-viridis PCC 7112, complete genome, product = glutamate racemase GKJWQY101AUR1Y K01925 UDP-N-acetylmuramoylalanine--D-glutamate ligase [EC:6.3.2.9] D-Glutamine and D-glutamate metabolism/Peptidoglycan biosynthesis 495 1 495 0.0E+00 99% 99% 325124855 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. bulgaricus 2038, complete genome, product = UDP-N- acetylmuramoylalanine--D-glutamate ligase GKJWQY101ALDHC K03502 DNA polymerase V DNA repair and recombination proteins 165 1 165 3.0E-78 99% 99% 3582195 Bacteria Firmicutes Bacilli Lactococcus lactis strain DPC3147 plasmid pMRC01, complete sequence (conserved hypothetical protein, ORFU) GKJWQY101ADMQY K01356 repressor LexA [EC:3.4.21.88] DNA repair and recombination proteins 503 1 503 0.0E+00 99% 99% 312279338 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. bulgaricus ND02, complete genome, product = LexA repressor GKJWQY101BKUOE K02314 replicative DNA helicase [EC:3.6.4.12] DNA replication/Cell cycle - Caulobacter 433 1 433 0.0E+00 100% 100% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = replicative DNA helicase GKJWQY101AJLH0 K02314 replicative DNA helicase [EC:3.6.4.12] DNA replication/Cell cycle - Caulobacter 337 1 337 3.0E-136 93% 93% 300072131 Bacteria Proteobacteria Betaproteobacteria Herbaspirillum seropedicae SmR1, complete genome, product = replicative DNA helicase protein GKJWQY101APN55 K02469 DNA gyrase subunit A [EC:5.99.1.3] DNA replication/DNA repair and recombination 518 1 518 1.0E-140 85% 85% 78609255 Bacteria Firmicutes Bacilli Lactobacillus sakei strain 23K complete genome, product = DNA gyrase, A subunit GKJWQY101A1C2A K03111 single-strand DNA-binding protein DNA replication/Mismatch repair/Homologous recombination 437 1 437 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = single-strand binding protein GKJWQY101AESM7 K01422 E3.4.99.- (found 
through discontiguous megablast) Endopeptidases of unknown catalytic mechanism 504 1 504 1.0E-42 70% 70% 149935098 Bacteria Bacteroidetes Bacteroidia Parabacteroides distasonis ATCC 8503, complete genome, product = putative zinc protease YmxG GKJWQY101BQ1WY K01147 Exoribonucleases producing 5'-phosphomonoesters 381 1 381 5.0E-109 86% 86% 300072131 Bacteria Proteobacteria Betaproteobacteria Herbaspirillum seropedicae SmR1, complete genome, product = exoribonuclease R protein (function = exoribonuclease II [EC:3.1.13.1] K - Transcription) GKJWQY101BPW6B K00648 Fatty acid biosynthesis 243 1 243 7.0E-96 93% 93% 145902672 Bacteria Firmicutes Bacilli Bacillus licheniformis ATCC 14580, complete genome, product = beta-ketoacyl-acyl carrier protein 3-oxoacyl-[acyl-carrier-protein] synthase III [EC:2.3.1.180] synthase III GKJWQY101BGY07 K02371 enoyl-[acyl carrier protein] reductase II [EC:1.3.1.-] Fatty acid biosynthesis 335 1 335 2.0E-172 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = enoyl-(Acyl-carrier-protein) reductase (<1..253); acyl carrier protein 2 (276..>335) GKJWQY101A3SQJ K01716 3-hydroxydecanoyl-[acyl-carrier-protein] dehydratase [EC:4.2.1.60] Fatty acid biosynthesis 258 1 258 1.0E-69 86% 86% 365177649 Bacteria Proteobacteria Alphaproteobacteria Sinorhizobium fredii HH103 main chromosome, complete sequence, product = 3-hydroxydecanoyl- (acyl carrier protein) dehydratase GKJWQY101BM1CY K00208 Fatty acid biosynthesis/Biotin metabolism 478 1 478 2.00E-148 87% 87% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = enoyl-(acyl-carrier-protein) enoyl-[acyl-carrier protein] reductase I [EC:1.3.1.9] reductase (NADH) GKJWQY101BMNR7 K00208 Fatty acid biosynthesis/Biotin metabolism 435 1 435 6.00E-129 86% 86% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = 
enoyl-(acyl-carrier-protein) enoyl-[acyl-carrier protein] reductase I [EC:1.3.1.9] reductase (NADH) GKJWQY101BVV9P K02372 3R-hydroxymyristoyl ACP dehydrase [EC:4.2.1.-] Fatty acid biosynthesis/Biotin metabolism 227 1 227 9.0E-30 78% 78% 326411376 Bacteria Proteobacteria Alphaproteobacteria Polymorphum gilvum SL003B-26A1, complete genome, product = 3R)-hydroxymyristoyl-[acyl-carrier- protein] dehydratase GKJWQY101ALM6K K00059 Fatty acid biosynthesis/Biotin metabolism/Biosynthesis of unsaturated fatty acids 464 1 464 8.0E-78 79% 79% 186463002 Bacteria Cyanobacteria Nostoc punctiforme PCC 73102, complete genome, product = short-chain dehydrogenase/reductase 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100] Nostocales SDR GKJWQY101BFXP6 K00059 Fatty acid biosynthesis/Biotin metabolism/Biosynthesis of unsaturated fatty acids 490 1 490 4.0E-146 86% 86% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = 3-oxoacyl-(acyl-carrier-protein) 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100] Oscillatoriales reductase GKJWQY101AOPRD K00059 Fatty acid biosynthesis/Biotin metabolism/Biosynthesis of unsaturated fatty acids 404 1 404 1.0E-90 82% 82% 170937689 Bacteria Proteobacteria Betaproteobacteria Cupriavidus taiwanensis str. LMG19424 chromosome 1, complete genome, product = putative 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100] dehydrogenase/reductase GKJWQY101BA0SO K07511 enoyl-CoA hydratase [EC:4.2.1.17] Fatty acid elongation 430 1 430 0.0E+00 97% 97% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 2, complete sequence, product = 3-hydroxybutyryl-CoA dehydratase GKJWQY101BXDHE K01897 long-chain acyl-CoA synthetase [EC:6.2.1.3] Fatty acid metabolism/Peroxisome /PPAR signaling pathway 295 1 295 7.0E-22 74% 74% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = acyl-CoA synthetase (AMP- forming)/AMP-acid ligase II GKJWQY101AQNM9 K01897 long-chain acyl-CoA synthetase [EC:6.2.1.3] Fatty acid metabolism/Peroxisome /PPAR signaling pathway 511 1 511 0.0E+00 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 2, complete sequence, product = long-chain-fatty-acid--CoA ligase GKJWQY101BLPEJ K01897 long-chain acyl-CoA synthetase [EC:6.2.1.3] Fatty acid metabolism/Peroxisome /PPAR signaling pathway 522 1 522 0.0E+00 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 2, complete sequence, product = long-chain-fatty-acid--CoA ligase GKJWQY101BIW4Y K00754 [EC:2.4.1.-] Fructose and mannose metabolism 484 1 484 0.0E+00 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = glycosyl transferase, Oscillatoriales WecB/TagA/CpsF family (teichoic acid biosynthesis) GKJWQY101A090Z K00971 mannose-1-phosphate guanylyltransferase [EC:2.7.7.22] Fructose and mannose metabolism/Amino sugar and nucleotide sugar metabolism 409 1 409 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = mannose-1-phosphate Oscillatoriales guanylyltransferase GKJWQY101B1CPM K00965 UDPglucose--hexose-1-phosphate uridylyltransferase [EC:2.7.7.12] metabolism/Amino sugar and nucleotide sugar metabolism 392 1 392 0.0E+00 98% 98% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = galactose-1-phosphate uridylyltransferase GKJWQY101BS0KQ K01193 beta-fructofuranosidase [EC:3.2.1.26] Galactose 
metabolism/Starch and sucrose metabolism 346 1 346 2.0E-178 99% 99% 443424428 Bacteria Firmicutes Bacilli Staphylococcus warneri SG1, complete genome, product = sucrose-6-phosphate hydrolase (note= COG1621 Beta-fructosidases (levanase/invertase) GKJWQY101B1CAN K00257 [EC:1.3.99.-] Geraniol degradation 467 1 467 0.0E+00 100% 100% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = acyl-CoA dehydrogenase GKJWQY101BV1WT K00257 [EC:1.3.99.-] Geraniol degradation 462 1 462 0.0E+00 100% 100% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = acyl-CoA dehydrogenase GKJWQY101BPRRL K00257 [EC:1.3.99.-] Geraniol degradation 479 1 479 1.0E-180 91% 91% 206593202 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum strain IPO1609 Genome Draft, product = acyl-coa dehydrogenase protein GKJWQY101BYCTV K01255 leucyl aminopeptidase [EC:3.4.11.1] Glutathione metabolism 494 1 494 0.0E+00 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = leucyl aminopeptidase GKJWQY101AK7G8 K00799 Glutathione metabolism/Metabolism of xenobiotics by cytochrome P450 452 1 452 3.0E-107 83% 83% 299065054 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum str. 
CMR15 chromosome, complete genome, product = glutathione S- glutathione S-transferase [EC:2.5.1.18] transferase GKJWQY101BP3UP K00681 Glutathione metabolism/Taurine and hypotaurine metabolism/Cyanoamino acid 192 1 192 4.0E-93 99% 99% 145902672 Bacteria Firmicutes Bacilli Bacillus licheniformis ATCC 14580, complete genome, product = peptidase T3, gamma- gamma-glutamyltranspeptidase [EC:2.3.2.2] metabolism/Arachidonic acid metabolism glutamyltranspeptidase GKJWQY101AX7BR K07407 alpha-galactosidase [EC:3.2.1.22] Glycerolipid metabolism 386 1 386 0.0E+00 98% 98% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc 705, product = alpha- galactosidase (GH36) GKJWQY101A5XCC K00005 glycerol dehydrogenase [EC:1.1.1.6] Glycerolipid metabolism 547 14 547 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = glycerol dehydrogenase GKJWQY101BK048 K00057 Glycerophospholipid metabolism 326 1 326 4.0E-124 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = glycerol-3-phosphate glycerol-3-phosphate dehydrogenase (NAD(P)+) [EC:1.1.1.94] Oscillatoriales dehydrogenase (NAD(P)+) GKJWQY101BWIVS K00872 homoserine kinase [EC:2.7.1.39] Glycine, serine and threonine metabolism 483 1 483 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = homoserine kinase GKJWQY101A43VY K06001 tryptophan synthase beta chain [EC:4.2.1.20] Glycine, serine and threonine metabolism 439 1 439 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = tryptophan synthase, beta subunit

GKJWQY101AY8NU K00302 Glycine, serine and threonine metabolism 490 1 490 5.0E-165 88% 88% 317165637 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium ciceri biovar biserrulae WSM1271, complete genome, product = sarcosine oxidase, sarcosine oxidase, subunit alpha [EC:1.5.3.1] alpha subunit family protein GKJWQY101AMFXC K00108 Glycine, serine and threonine metabolism 455 13 455 2.0E-124 85% 85% 433663430 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium australicum WSM2073, complete genome, product = choline dehydrogenase-like choline dehydrogenase [EC:1.1.99.1] flavoprotein GKJWQY101BBP73 K00058 Glycine, serine and threonine metabolism/ 528 1 525 0.0E+00 96% 96% 103422338 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 complete genome, product = D-3-phosphoglycerate dehydrogenase [EC:1.1.1.95] Methane metabolism Phosphoglycerate dehydrogenase (<1..413); Phosphoserine aminotransferase (432..>524) GKJWQY101ATR7M K00133 Glycine, serine and threonine metabolism/Cysteine and methionine metabolism/Lysine biosynthesis 381 1 381 1.0E-124 88% 88% 336024847 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium opportunistum WSM2075, complete genome, product = aspartate-semialdehyde aspartate-semialdehyde dehydrogenase [EC:1.2.1.11] dehydrogenase GKJWQY101A7G4R K00134 Glycolysis-Gluconeogenesis 518 1 517 0.0E+00 97% 97% 413973243 Bacteria Firmicutes Bacilli Lactococcus lactis subsp. cremoris UC509.9, complete genome, product = putative alkylphosphonate uptake protein (<1..42); glyceraldehyde 3-phosphate dehydrogenase (110..>516) glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] GKJWQY101BNEJ6 K00134 Glycolysis-Gluconeogenesis 413 1 413 0.0E+00 99% 99% 413973243 Bacteria Firmicutes Bacilli Lactococcus lactis subsp. 
cremoris UC509.9, complete genome, product = glyceraldehyde 3- glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] phosphate dehydrogenase (8..414) GKJWQY101BTPQT K00131 Glycolysis-Gluconeogenesis 471 1 471 0.0E+00 94% 94% 406368402 Bacteria Firmicutes Bacilli Streptococcus pneumoniae gamPNI0373, complete genome, product = glyceraldehyde-3-phosphate glyceraldehyde-3-phosphate dehydrogenase (NADP) [EC:1.2.1.9] dehydrogenase (NADP+) GKJWQY101ASDW8 K00134 Glycolysis-Gluconeogenesis 458 1 458 7.0E-168 90% 90% 111606883 Bacteria Proteobacteria Alphaproteobacteria Aminobacter aminovorans partial gapA gene for glyceraldehyde-3-phosphate dehydrogenase, strain glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] DSM 10368 GKJWQY101B0A0U K00134 Glycolysis-Gluconeogenesis 477 1 477 3.0E-172 90% 90% 111606883 Bacteria Proteobacteria Alphaproteobacteria Aminobacter aminovorans partial gapA gene for glyceraldehyde-3-phosphate dehydrogenase, strain glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] DSM 10368 GKJWQY101A4IEA K00627 Glycolysis-Gluconeogenesis/Citrate cycle (TCA cycle)/Pyruvate metabolism 433 1 433 0.0E+00 97% 97% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = dihydrolipoamide pyruvate dehydrogenase E2 component (dihydrolipoamide acetyltransferase) [E acetyltransferase component of pyruvate dehydrogenase complex GKJWQY101AYI7F K00163 Glycolysis-Gluconeogenesis/Citrate cycle (TCA cycle)/Pyruvate metabolism/Butanoate metabolism 479 1 479 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 1, complete sequence, product = pyruvate dehydrogenase E1 pyruvate dehydrogenase E1 component [EC:1.2.4.1] component GKJWQY101BVMGW K00162 Glycolysis-Gluconeogenesis/Citrate cycle (TCA cycle)/Pyruvate metabolism/Butanoate 318 1 318 4.00E-159 99% 99% 392602377 Bacteria Firmicutes Bacilli Streptococcus mutans GS-5, complete genome, product = branched-chain alpha-keto acid metabolism/HIF-1 signaling pathway dehydrogenase subunit E2 (<1..105); pyruvate dehydrogenase E1 component beta subunit (118..>316) pyruvate dehydrogenase E1 component subunit beta [EC:1.2.4.1] GKJWQY101BGJYW K01610 phosphoenolpyruvate carboxykinase (ATP) [EC:4.1.1.49] Glycolysis-Gluconeogenesis/Citrate cycle (TCA cycle)/Pyruvate metabolism/Carbon fixation in 510 1 510 8.0E-163 87% 87% 301161079 Bacteria Bacteroidetes Bacteroidia Bacteroides fragilis 638R genome, product = putative phosphoenolpyruvate carboxykinase photosynthetic organisms GKJWQY101A0Q2Y K01689 [EC:4.2.1.11] Glycolysis-Gluconeogenesis/Methane metabolism/RNA degradation/HIF-1 signaling pathway 235 1 235 1.0E-118 100% 100% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc705, product = enolase
Table S11 Cont.

GKJWQY101BGU7F K00128 Glycolysis-Gluconeogenesis/Pentose and glucuronate interconversions/Ascorbate and aldarate 338 1 338 9.0E-156 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = aldehyde dehydrogenase metabolism/Fatty acid metabolism/Valine, leucine and isoleucinedegradation/Lysine degradation/Arginine and proline metabolism/Histidine metabolism/Tryptophan metabolism/beta- Alanine metabolism/Glycerolipid metabolism/Pyruvate metabolism/Chloroalkane and chloroalkene degradation/Propanoate metabolism/Limonene and pinene degradation aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] Oscillatoriales GKJWQY101BV866 K00128 Glycolysis-Gluconeogenesis/Pentose and glucuronate interconversions/Ascorbate and aldarate 245 3 245 4.0E-83 90% 90% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = aldehyde dehydrogenase metabolism/Fatty acid metabolism/Valine, leucine and isoleucinedegradation/Lysine degradation/Arginine and proline metabolism/Histidine metabolism/Tryptophan metabolism/beta- Alanine metabolism/Glycerolipid metabolism/Pyruvate metabolism/Chloroalkane and chloroalkene degradation/Propanoate metabolism/Limonene and pinene degradation aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] Oscillatoriales GKJWQY101BQ494 K01810 glucose-6-phosphate isomerase [EC:5.3.1.9] Glycolysis-Gluconeogenesis/Pentose phosphate pathway/Starch and sucrose metabolism/Amino sugar 357 1 357 1.0E-175 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = glucose-6-phosphate isomerase and nucleotide sugar metabolism Oscillatoriales GKJWQY101BAZHJ K01895 acetyl-CoA synthetase [EC:6.2.1.1] Glycolysis-Gluconeogenesis/Pyruvate metabolism/Propanoate metabolism/Carbon fixation pathways 296 1 296 9.0E-91 88% 88% 336024847 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium opportunistum WSM2075, complete genome, product = acetate--CoA ligase in 
prokaryotes/Methane metabolism GKJWQY101BG1OI K01895 acetyl-CoA synthetase [EC:6.2.1.1] Glycolysis-Gluconeogenesis/Pyruvate metabolism/Propanoate metabolism/Carbon fixation pathways 363 1 363 5.0E-19 73% 73% 385869619 Bacteria Proteobacteria Gammaproteobacteria Pectobacterium sp. SCC3193, complete genome, product = acetyl-coenzyme A synthetase in prokaryotes/Methane metabolism GKJWQY101ALV77 K01235 alpha-glucuronidase [EC:3.2.1.139] Glycosidases[hydrolyse O- and S-glycosyl] 403 1 403 5.0E-20 72% 72% 433303004 Bacteria Bacteroidetes Bacteroidia Prevotella dentalis DSM 3688 chromosome 2, complete sequence, product = alpha-glucuronidase GKJWQY101AFEH7 K01200 Glycosidases[hydrolyse O- and S-glycosyl] 566 1 566 0.0E+00 98% 98% 296848933 Bacteria Deinococcus-Thermus Deinococci Meiothermus silvanus DSM 9946, complete genome, product = alpha-1,6-glucosidase, pullulanase- pullulanase [EC:3.2.1.41] type GKJWQY101B1465 K01608 tartronate-semialdehyde synthase [EC:4.1.1.47] Glyoxylate and dicarboxylate metabolism 287 1 287 8.0E-61 82% 82% 327367349 Bacteria Proteobacteria Betaproteobacteria Burkholderia gladioli BSR3 chromosome 1, complete sequence, product = glyoxylate carboligase GKJWQY101AIKIJ K11472 glycolate oxidase FAD binding subunit Glyoxylate and dicarboxylate metabolism 391 1 391 4.0E-180 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = FAD linked oxidase domain protein Oscillatoriales GKJWQY101BRYKR K01715 3-hydroxybutyryl-CoA dehydratase [EC:4.2.1.55] Glyoxylate and dicarboxylate metabolism/Butanoate metabolism/Carbon fixation pathways in 432 1 432 0.0E+00 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 2, complete sequence, product = 3-hydroxybutyryl-CoA prokaryotes dehydratase GKJWQY101AH3TL K00123 Glyoxylate and dicarboxylate metabolism/Methane metabolism 348 1 348 2.0E-176 99% 99% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 2, complete sequence, product = formate dehydrogenase O beta formate dehydrogenase, alpha subunit [EC:1.2.1.2] subunit (1..25); formate dehydrogenase O alpha subunit (36..349) GKJWQY101AH3TL_2 K00123 Glyoxylate and dicarboxylate metabolism/Methane metabolism 116 1 116 3.0E-46 97% 97% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 2, complete sequence, product = formate dehydrogenase O formate dehydrogenase, alpha subunit [EC:1.2.1.2] alpha subunit GKJWQY101AZSAO K00124 Glyoxylate and dicarboxylate metabolism/Methane metabolism 267 2 267 9.0E-100 92% 92% 334194119 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum Po82, complete genome, product = formate dehydrogenase iron-sulfur subunit formate dehydrogenase, beta subunit [EC:1.2.1.2] GKJWQY101AIW11 K01915 glutamine synthetase [EC:6.3.1.2] Glyoxylate and dicarboxylate metabolism/Nitrogen metabolism/Alanine, aspartate and glutamate 486 1 486 2.0E-149 87% 87% 269095543 Bacteria Actinobacteria Actinobacteria Sanguibacter keddieii DSM 10542, complete genome, product = L-glutamine synthetase metabolism/Arginine and proline metabolism/Two-component system GKJWQY101BRVH2 K00013 histidinol dehydrogenase [EC:1.1.1.23] Histidine metabolism 494 1 494 6.0E-164 89% 89% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = histidinol dehydrogenase GKJWQY101BLV08 K02500 cyclase [EC:4.1.3.-] Histidine metabolism 499 1 499 0.0E+00 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = imidazole glycerol phosphate Oscillatoriales synthase subunit hisF GKJWQY101AQTYB K00765 Histidine metabolism 352 1 352 3.0E-170 98% 98% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = ATP phosphoribosyltransferase ATP phosphoribosyltransferase [EC:2.4.2.17] GKJWQY101AD3KT K02501 glutamine amidotransferase [EC:2.4.2.-] 
Histidine metabolism 508 1 508 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = imidazole glycerol phosphate synthase amidotransferase su... GKJWQY101BZKZO K00817 histidinol-phosphate aminotransferase [EC:2.6.1.9] Histidine metabolism/Tyrosine metabolism/Phenylalanine metabolism/Phenylalanine, tyrosine and 481 1 481 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = histidinol-phosphate transaminase tryptophan biosynthesis/Novobiocin biosynthesis/Tropane, piperidine and pyridine alkaloid biosynthesis GKJWQY101B1I99 K03553 recombination protein RecA Homologous recombination 343 1 343 2.0E-152 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = protein recA GKJWQY101BWEV4 K04066 primosomal protein N' (replication factor Y) (superfamily II helicase) [EC:3.6.4.-] Homologous recombination 473 1 473 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = replication restart DNA helicase PriA Oscillatoriales GKJWQY101BZIGR K04066 primosomal protein N' (replication factor Y) (superfamily II helicase) [EC:3.6.4.-] Homologous recombination 498 1 498 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = replication restart DNA helicase PriA Oscillatoriales GKJWQY101A6L2S K03655 ATP-dependent DNA helicase RecG [EC:3.6.4.12] Homologous recombination 344 1 344 2.0E-177 99% 99% 413973243 Bacteria Firmicutes Bacilli Lactococcus lactis subsp. 
cremoris UC509.9, complete genome, product = ATP-dependent DNA helicase RecG GKJWQY101AM4OJ K03336 3D-(3,5/4)-trihydroxycyclohexane-1,2-dione hydrolase [EC:3.7.1.-] Inositol phosphate metabolism 201 1 201 2.0E-96 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = 3D-(3, 5/4)- trihydroxycyclohexane-1, 2-dione hydrolase GKJWQY101BRHOO K00677 Lipopolysaccharide biosynthesis 293 1 293 3.0E-130 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = acyl-(acyl-carrier-protein)--UDP-N- UDP-N-acetylglucosamine acyltransferase [EC:2.3.1.129] Oscillatoriales acetylglucosamine O-acyltransferase GKJWQY101B1OUF K00748 Lipopolysaccharide biosynthesis 413 1 413 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = putative lipid-A-disaccharide lipid-A-disaccharide synthase [EC:2.4.1.182] Oscillatoriales synthase GKJWQY101AZRJ4 K02535 UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase [EC:3.5.1.108] Lipopolysaccharide biosynthesis 498 1 498 8.0E-138 85% 85% 407894523 Bacteria Proteobacteria Betaproteobacteria Acidovorax sp. KKS102, complete genome, product = UDP-3-O-[3-hydroxymyristoyl] N- acetylglucosamine deacetylase GKJWQY101AUUBE K02517 lipid A biosynthesis lauroyl acyltransferase [EC:2.3.1.-] HtrB Lipopolysaccharide biosynthesis 312 1 312 6.0E-128 94% 94% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 1, complete sequence, product = lipid A biosynthesis lauroyl acyltransferase GKJWQY101A7HV4 K01929 UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D- Lysine biosynthesis/Peptidoglycan biosynthesis 480 1 480 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = UDP-N-acetylmuramoyl- alanine ligase [EC:6.3.2.10] tripeptide--D-alanyl-D-alanine ligase family protein GKJWQY101BMITR K01929 UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D- Lysine biosynthesis/Peptidoglycan biosynthesis 479 1 479 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = UDP-N-acetylmuramoyl- alanine ligase [EC:6.3.2.10] tripeptide--D-alanyl-D-alanine ligase family protein GKJWQY101BHYEF K01423 [EC:3.4.-.-] Lysine degradation/Biotin metabolism 314 1 314 2.0E-136 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = processing peptidase GKJWQY101BW6ZG K01070 S-formylglutathione hydrolase [EC:3.1.2.12] Methane metabolism 237 1 237 4.0E-98 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = S-formylglutathione hydrolase GKJWQY101A44JV K00598 trans-aconitate 2-methyltransferase [EC:2.1.1.144] Methyltransferases [Transferring one-carbon groups] 477 1 477 0.0E+00 99% 99% 187427012 Bacteria Proteobacteria Gammaproteobacteria CDC 3083-94, complete genome, product = trans-aconitate 2-methyltransferase GKJWQY101AIZ3R K03572 DNA mismatch repair protein MutL Mismatch repair 490 1 490 0.0E+00 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = DNA mismatch repair protein MutL Oscillatoriales GKJWQY101BYVK9 K03601 exodeoxyribonuclease VII large subunit [EC:3.1.11.6] Mismatch repair 511 1 511 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria 
Oscillatoria nigro-viridis PCC 7112, complete genome, product = exodeoxyribonuclease VII small Oscillatoriales subunit (<1..20) & VII large subunit(17..>511) GKJWQY101A5EC7 K07496 putative transposase n/a 517 1 517 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = transposase, IS605 OrfB family GKJWQY101BOJM6 K07492 putative transposase n/a 227 1 227 1.0E-112 100% 100% 407894523 Bacteria Proteobacteria Betaproteobacteria Acidovorax sp. KKS102, complete genome, product = IS4 family transposase (<1..213); hypothetical protein (210..>227) GKJWQY101ASQTP K00721 dolichol-phosphate mannosyltransferase [EC:2.4.1.83] N-Glycan biosynthesis 418 1 418 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = glycosyl transferase family 2 GKJWQY101AV5PN K01950 NAD+ synthase (glutamine-hydrolysing) [EC:6.3.5.1] Nicotinate and nicotinamide metabolism 235 1 235 2.0E-115 99% 99% 430780086 Bacteria Spirochaetes Spirochaetes Brachyspira pilosicoli P43/6/78, complete genome, product = NAD+ synthetase GKJWQY101AIJQR K00367 Nitrogen metabolism 430 1 430 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = assimilatory nitrate reductase ferredoxin-nitrate reductase [EC:1.7.7.2] Oscillatoriales (ferredoxin) precursor GKJWQY101BN57R K02050 sulfonate/nitrate/taurine transport system permease protein NitT/TauT family transport system 458 1 458 6.0E-114 83% 83% 469772332 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum FQY_4, complete genome, product = ABC-type nitrate/sulfonate/bicarbonate transport system, ATPase component (<1..75); Alkanesulfonates transport system permease protein (99..>458) GKJWQY101BFZXQ K03723 transcription-repair coupling factor (superfamily II helicase) [EC:3.6.4.-] Nucleotide excision repair 476 1 476 1.0E-100 81% 81% 295083795 Bacteria 
Bacteroidetes Bacteroidia Bacteroides xylanisolvens XB1A draft genome, product = genomic DNA GKJWQY101AEXUL K03703 excinuclease ABC subunit C Nucleotide excision repair 461 1 461 1.0E-131 86% 86% 336024847 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium opportunistum WSM2075, complete genome, product = excinuclease ABC, C subunit

GKJWQY101AJ43P K03702 excinuclease ABC subunit B Nucleotide excision repair 476 1 476 0.0E+00 93% 93% 133737197 Bacteria Proteobacteria Betaproteobacteria Herminiimonas arsenicoxydans chromosome, complete sequence, product = UvrABC system protein B (Protein uvrB) (Excinuclease ABC subunit B) GKJWQY101ACX1U K02335 DNA polymerase I [EC:2.7.7.7] Nucleotide excision repair/Base excision repair/Purine metabolism/Pyrimidine metabolism/DNA 421 1 421 0.0E+00 96% 96% 325332286 Bacteria Firmicutes Bacilli Lactobacillus acidophilus 30SC, complete genome, product = DNA polymerase I replication/Mismatch repair/Homologous recombination GKJWQY101AHB7R K03657 DNA helicase II / ATP-dependent DNA helicase PcrA [EC:3.6.4.12] Nucleotide excision repair/Mismatch repair 533 1 533 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ATP-dependent DNA helicase PcrA Oscillatoriales GKJWQY101A0KMK K00986 Nucleotidyltransferases 516 1 516 0.0E+00 91% 91% 428244862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112 plasmid pOSC7112.02, complete sequence, product = RNA- RNA-directed DNA polymerase [EC:2.7.7.49] Oscillatoriales directed DNA polymerase (Reverse transcriptase) GKJWQY101BNDE7 K13403 methylenetetrahydrofolate dehydrogenase(NAD+) / 5,10-methenyltetrahydrofolate One carbon pool by folate 532 1 532 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = bifunctional protein folD (<1..293); cyclohydrolase [EC:3.5.4.9 1.5.1.15] Oscillatoriales hypothetical protein (370..>535) GKJWQY101AO712 K08045 adenylate cyclase 5 [EC:4.6.1.1] Oocyte meiosis 244 1 244 1.0E-92 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = adenylate/guanylate cyclase with Oscillatoriales GAF and PAS/PAC sensors GKJWQY101BVTHW K01507 inorganic pyrophosphatase [EC:3.6.1.1] Oxidative phosphorylation 234 1 234 1.0E-97 95% 95% 428238862 
Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = inorganic pyrophosphatase GKJWQY101A6GJD K01535 H+-transporting ATPase [EC:3.6.3.6] Oxidative phosphorylation 290 1 290 6.0E-132 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ATPase, P-type (transporting), Oscillatoriales HAD superfamily, subfamily IC GKJWQY101A2N5U K05577 NADH dehydrogenase I subunit 5 [EC:1.6.5.3] Oxidative phosphorylation 251 1 251 1.0E-108 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = NADH dehydrogenase subunit L GKJWQY101BK6YD K03885 NADH dehydrogenase [EC:1.6.99.3] Oxidative phosphorylation 330 1 330 1.0E-169 99% 99% 133737197 Bacteria Proteobacteria Betaproteobacteria Herminiimonas arsenicoxydans chromosome, complete sequence, product = NADH dehydrogenase

GKJWQY101B1IG8 K02298 cytochrome o ubiquinol oxidase subunit I [EC:1.10.3.-] Oxidative phosphorylation 465 1 465 4.0E-175 91% 91% 30407127 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum GMI1000 chromosome complete sequence, product = probable transmembrane cytochrome o ubiquinol oxidase (subunit I) oxidoreductase protein GKJWQY101APEM7 K00342 Oxidative phosphorylation/Nitrogen metabolism 476 1 476 0.0E+00 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = proton-translocating NADH-quinone NADH dehydrogenase I subunit M [EC:1.6.5.3] Oscillatoriales oxidoreductase, chain M GKJWQY101BAGJL K00334 Oxidative phosphorylation/Nitrogen metabolism 447 1 447 0.0E+00 98% 98% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = NADH-ubiquinone NADH dehydrogenase I subunit E [EC:1.6.5.3] oxidoreductase chain E GKJWQY101BREYN K00412 Oxidative phosphorylation/Nitrogen metabolism/Two-component system 509 1 509 0.0E+00 98% 98% 77965403 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 383 chromosome 1, complete sequence, product = cytochrome b/b6-like ubiquinol-cytochrome c reductase cytochrome b subunit [EC:1.10.2.2] protein (<1..392); cytochrome c1 (414..>509) GKJWQY101BY1LH K02115 F-type H+-transporting ATPase subunit gamma [EC:3.6.3.14] Oxidative phosphorylation/Photosynthesis 465 1 465 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ATP synthase F1 subcomplex Oscillatoriales gamma subunit GKJWQY101B2HJ8 K00425 cytochrome bd-I oxidase subunit I [EC:1.10.3.-] Oxidative phosphorylation/Two-component system 471 1 471 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 1, complete sequence, product = cytochrome d ubiquinol oxidase subunit I (<1..399) and subunit II (431..>471) GKJWQY101BPM80 K00425 cytochrome bd-I oxidase subunit I [EC:1.10.3.-] Oxidative phosphorylation/Two-component system 471 1 471 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = cytochrome d ubiquinol oxidase subunit I (<1..397); cytochrome d ubiquinol oxidase subunit II (429..>469) GKJWQY101BIGXG K00184 molybdopterin oxidoreductase, iron-sulfur binding subunit [EC:1.2.7.-] Oxidoreductases [iron-sulfur protein as acceptor] 357 1 357 1.0E-89 84% 84% 255342900 Bacteria Bacteroidetes Sphingobacteriia Pedobacter heparinus DSM 2366, complete genome, product = putative iron-sulfur binding oxidoreductase GKJWQY101BIWTP K00359 NADH oxidase [EC:1.6.-.-] Oxidoreductases/Acting on NADH or NADPH 525 1 525 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = NADH peroxidase

GKJWQY101BRRK3 K01118 FMN-dependent NADH-azoreductase [EC:1.7.-.-] Oxidoreductases[Acting on nitrogenous compounds as donors] 503 1 503 6.0E-159 87% 87% 299075079 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum PSI07 megaplasmid mpPSI07, complete sequence, product = FMN-dependent NADH-azoreductase (FMN-dependent NADH-azo compound oxidoreductase) (Azo-dye reductase)

GKJWQY101BYRYA K01918 pantoate--beta-alanine ligase [EC:6.3.2.1] Pantothenate and CoA biosynthesis/beta-Alanine metabolism 412 1 412 8.0E-162 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = pantothenate synthetase GKJWQY101AQW1B K01467 beta-lactamase [EC:3.5.2.6] and biosynthesis/beta-Lactam resistance/Two-component system 448 1 448 0.0E+00 99% 99% 407955691 Bacteria Firmicutes Bacilli Bacillus subtilis BEST7613 DNA, complete genome, product = beta-lactamase GKJWQY101BOVOO K00012 Pentose and glucuronate interconversions/Ascorbate and aldarate metabolism/Amino sugar and 290 12 290 5.0E-63 83% 83% 363404872 Bacteria Proteobacteria Alphaproteobacteria Brucella canis HSK A52141 chromosome 2, complete sequence, product = UDP glucose 6- nucleotide sugar metabolism/ dehydrogenase UDPglucose 6-dehydrogenase [EC:1.1.1.22] Starch and sucrose metabolism GKJWQY101BH0SD K00616 transaldolase [EC:2.2.1.2] Pentose phosphate pathway 322 1 322 2.0E-141 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = transaldolase GKJWQY101AV9UD K00874 2-dehydro-3-deoxygluconokinase [EC:2.7.1.45] Pentose phosphate pathway/Pentose and glucuronate interconversions 496 1 496 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = 2-keto-3-deoxygluconate kinase

GKJWQY101AOSDF K00948 ribose-phosphate pyrophosphokinase [EC:2.7.6.1] Pentose phosphate pathway/Purine metabolism 396 1 396 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = ribose-phosphate pyrophosphokinase GKJWQY101A52UG K03587 cell division protein FtsI (penicillin-binding protein 3) [EC:2.4.1.129] Peptidoglycan biosynthesis 391 1 391 2.0E-134 89% 89% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = penicillin-binding protein Oscillatoriales transpeptidase GKJWQY101AIWNW K03693 penicillin-binding protein Peptidoglycan biosynthesis 414 1 414 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = transglycosylase family protein

GKJWQY101AUBWF K06153 undecaprenyl-diphosphatase [EC:3.6.1.27] Peptidoglycan biosynthesis 469 1 469 0.0E+00 98% 98% 257048753 Bacteria Fusobacteria Fusobacteriales Leptotrichia buccalis DSM 1135, complete genome, product = undecaprenol kinase (PFAM: Bacitracin resistance protein BacA) GKJWQY101AYAGY K05712 3-(3-hydroxy-phenyl)propionate hydroxylase [EC:1.14.13.-] Phenylalanine metabolism 476 1 476 0.0E+00 96% 96% 387578572 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 2, complete sequence, product = 3-(3-hydroxy- phenyl)propionate hydroxylase GKJWQY101BTCIC K01451 hippurate hydrolase [EC:3.5.1.32] Phenylalanine metabolism 479 1 479 6.0E-154 87% 87% 299076774 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum str. PSI07 chromosome, complete genome, product = putative Hippurate hydrolase (hipO) GKJWQY101A7AXB K01735 3-dehydroquinate synthase [EC:4.2.3.4] Phenylalanine, tyrosine and tryptophan biosynthesis 501 1 501 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 3-dehydroquinate synthase GKJWQY101AP61F K01090 Phosphoric-monoester hydrolases [serine-threonine] 417 1 417 0.0E+00 99% 99% 312279338 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. 
bulgaricus ND02, complete genome, product = Serine/threonine protein phosphatase [EC:3.1.3.16] protein phosphatase Stp1 GKJWQY101A6VDP K02794 PTS system, mannose-specific IIB component [EC:2.7.1.69] Phosphotransferase system (PTS)/Fructose and mannose metabolism/Amino sugar and nucleotide 253 1 253 3.0E-119 98% 98% 335369081 Bacteria Firmicutes Bacilli Streptococcus parasanguinis ATCC 15912, complete genome, product = PTS system, mannose- sugar metabolism specific IIB component GKJWQY101BWBP4 K02786 PTS system, lactose-specific IIA component [EC:2.7.1.69] Phosphotransferase system (PTS)/Galactose metabolism 472 1 472 0.0E+00 100% 100% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = lactose-specific phosphotransferase enzyme IIA component GKJWQY101BMIJH K02787 PTS system, lactose-specific IIB component [EC:2.7.1.69] Phosphotransferase system (PTS)/Galactose metabolism 315 1 315 5.0E-163 100% 100% 406356677 Bacteria Firmicutes Bacilli Lactobacillus casei W56 complete genome, product = PTS system lactose-specific EIICB component

GKJWQY101BVTHW_2 T02375 S-layer domain-containing protein Photosynthesis 180 1 180 2.0E-74 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = S-layer domain-containing protein Oscillatoriales GKJWQY101BQ2XO K02634 apocytochrome f Photosynthesis 506 1 506 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = apocytochrome f GKJWQY101BIZEW K02689 photosystem I P700 chlorophyll a apoprotein A1 Photosynthesis 543 1 543 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = photosystem I P700 chlorophyll a Oscillatoriales apoprotein A2 (<1..244); photosystem I P700 chlorophyll a apoprotein A1 (355..>541) GKJWQY101AWFUY K02690 photosystem I P700 chlorophyll a apoprotein A2 Photosynthesis 429 1 429 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = photosystem I P700 chlorophyll a Oscillatoriales apoprotein A2 GKJWQY101BJE6N K02699 photosystem I subunit XI Photosynthesis 504 1 504 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = photosystem I reaction centre Oscillatoriales subunit XI PsaL (<1..262); glycosyl transferase family 2 (376..>506) GKJWQY101A5ZXB K02703 photosystem II P680 reaction center D1 protein Photosynthesis 336 1 336 7.0E-152 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = photosystem Q(B) protein Oscillatoriales (photosystem II PsbA/D1, reaction centre) GKJWQY101A0MVM K02705 photosystem II CP43 chlorophyll apoprotein Photosynthesis 430 1 430 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = photosystem II 44 kDa subunit Oscillatoriales reaction center protein GKJWQY101BY2F1 K02285 phycocyanin beta chain Photosynthesis - antenna 
proteins 442 1 442 0.0E+00 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = Phycocyanin GKJWQY101BDPWW K02358 elongation factor Tu Plant-pathogen interaction 487 1 487 3.0E-147 86% 86% 384071898 Bacteria Firmicutes Bacilli Halobacillus halophilus DSM 2266 complete genome, product = translation elongation factor Tu GKJWQY101BVGC1 K02358 elongation factor Tu Plant-pathogen interaction 455 1 455 2.0E-128 85% 85% 307830710 Bacteria Firmicutes Bacilli Staphylococcus kloosii strain DSM 20676 elongation factor Tu (tuf) gene, partial cds GKJWQY101BRE7O K01790 dTDP-4-dehydrorhamnose 3,5-epimerase [EC:5.1.3.13] Polyketide sugar unit biosynthesis/Streptomycin biosynthesis 132 1 132 1.3E+02 79% 79% 451908558 Bacteria Proteobacteria Gammaproteobacteria Salmonella enterica subsp. enterica serovar Javiana str. CFSAN001992, complete genome, product = dTDP-4-dehydrorhamnose 3,5-epimerase GKJWQY101ASQXM_2 K02014 mechanosensitive ion channel Pores ion channels 93 1 93 3.0E-20 87% 87% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = MscS Mechanosensitive ion channel Oscillatoriales GKJWQY101BLISX K04744 LPS-assembly protein Pores ion channels 515 1 515 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. 
KJ006 chromosome 1, complete sequence, product = outer membrane protein Imp, required for envelope biogenesis/Organic solvent tolerance protein precursor GKJWQY101AH1BL K01599 uroporphyrinogen decarboxylase [EC:4.1.1.37] Porphyrin and chlorophyll metabolism 494 1 494 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = uroporphyrinogen decarboxylase Oscillatoriales GKJWQY101AU6TO K01845 glutamate-1-semialdehyde 2,1-aminomutase [EC:5.4.3.8] Porphyrin and chlorophyll metabolism 392 1 392 2.0E-153 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = glutamate-1-semialdehyde 2,1- Oscillatoriales aminomutase GKJWQY101BUDW0 K03403 magnesium chelatase subunit H [EC:6.6.1.1] Porphyrin and chlorophyll metabolism 479 1 479 0.0E+00 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = cobaltochelatase CobN subunit (note: Oscillatoriales CobN/Magnesium Chelatase) GKJWQY101AB54X K06042 precorrin-8X methylmutase [EC:5.4.1.2] Porphyrin and chlorophyll metabolism 321 1 321 8.0E-151 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = precorrin-8X methylmutase GKJWQY101BVYRA K00228 Porphyrin and chlorophyll metabolism 339 1 339 3.0E-106 88% 88% 336024847 Bacteria Proteobacteria Alphaproteobacteria Mesorhizobium opportunistum WSM2075, complete genome, product = coproporphyrinogen III coproporphyrinogen III oxidase [EC:1.3.3.3] oxidase GKJWQY101AQPZB K04037 light-independent protochlorophyllide reductase subunit L [EC:1.18.-.-] Porphyrin and chlorophyll metabolism 376 1 376 1.0E-176 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ferredoxin protochlorophyllide Oscillatoriales reductase subunit L GKJWQY101ACHK0 K04039 light-independent protochlorophyllide reductase subunit B [EC:1.18.-.-] Porphyrin and 
chlorophyll metabolism 355 1 355 2.0E-167 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = ferredoxin protochlorophyllide Oscillatoriales reductase subunit B GKJWQY101ASQXM K00510 heme oxygenase [EC:1.14.99.3] Porphyrin and chlorophyll metabolism/Mineral absorption 263 1 263 2.0E-122 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = heme oxygenase GKJWQY101BRN0I K07250 4-aminobutyrate aminotransferase / (S)-3-amino-2-methylpropionate transaminase Propanoate metabolism 480 1 480 3.0E-57 76% 76% 344313278 Bacteria Actinobacteria Actinobacteria Streptomyces sp. SirexAA-E, complete genome, product = 4-aminobutyrate aminotransferase [EC:2.6.1.22 2.6.1.19] GKJWQY101APEQF K03101 signal peptidase II [EC:3.4.23.36] Protein export 430 1 430 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = lipoprotein signal peptidase GKJWQY101BUW9I K01524 exopolyphosphatase / guanosine-5'-triphosphate,3'-diphosphate pyrophosphatase Purine metabolism 484 1 484 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = Ppx/GppA phosphatase [EC:3.6.1.40 3.6.1.11] Oscillatoriales GKJWQY101BSGTQ K01952 phosphoribosylformylglycinamidine synthase [EC:6.3.5.3] Purine metabolism 461 1 461 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoria nigro-viridis PCC 7112, complete genome, product = phosphoribosylformylglycinamidine Oscillatoriales synthase subunit II GKJWQY101ANN8E K00088 Purine metabolism 368 1 368 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = inosine-5'-monophosphate IMP dehydrogenase [EC:1.1.1.205] dehydrogenase GKJWQY101ASHPM K01139 guanosine-3',5'-bis(diphosphate) 3'-pyrophosphohydrolase [EC:3.1.7.2] Purine metabolism 481 1 481 0.0E+00 99% 99% 387575654 
Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = GTP pyrophosphokinase, (p)ppGpp synthetase II / Guanosine-3',5'-bis(diphosphate) 3'-pyrophosphohydrolase
GKJWQY101BS894 K01933 phosphoribosylformylglycinamidine cyclo-ligase [EC:6.3.3.1] Purine metabolism 355 1 355 3.0E-180 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = phosphoribosylformylglycinamidine cyclo-ligas
GKJWQY101BI3EE K01951 GMP synthase (glutamine-hydrolysing) [EC:6.3.5.2] Purine metabolism/Drug metabolism - cytochrome P450 341 1 341 1.0E-154 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = GMP synthase (glutamine-hydrolyzing)
GKJWQY101A7UX8 K01951 GMP synthase (glutamine-hydrolysing) [EC:6.3.5.2] Purine metabolism/Drug metabolism - cytochrome P450 469 1 469 4.0E-96 81% 81% 110283346 Bacteria Proteobacteria Alphaproteobacteria Chelativorans sp. BNC1, complete genome, product = GMP synthase (glutamine-hydrolyzing)
GKJWQY101AF62K K01768 adenylate cyclase [EC:4.6.1.1] Purine metabolism/Meiosis-yeast 488 1 488 0.0E+00 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = adenylate/guanylate cyclase with GAF sensor(s)
GKJWQY101BHHIQ K00525 ribonucleoside-diphosphate reductase alpha chain [EC:1.17.4.1] Purine metabolism/Pyrimidine metabolism 484 1 484 2.0E-108 82% 82% 78609255 Bacteria Firmicutes Bacilli Lactobacillus sakei strain 23K complete genome, product = ribonucleoside-diphosphate reductase, alpha chain
GKJWQY101AQ3AJ K02340 DNA polymerase III subunit delta [EC:2.7.7.7] Purine metabolism/Pyrimidine metabolism/DNA replication/Mismatch repair/Homologous recombination 434 1 434 3.0E-176 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = DNA polymerase III, delta subunit
GKJWQY101A1FIZ K02343 DNA polymerase III subunit gamma/tau [EC:2.7.7.7] Purine metabolism/Pyrimidine metabolism/DNA replication/Mismatch repair/Homologous recombination 436 1 436 0.0E+00 99% 99% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = DNA polymerase III, subunit gamma and tau
GKJWQY101AW79B K00962 polyribonucleotide nucleotidyltransferase [EC:2.7.7.8] Purine metabolism/Pyrimidine metabolism/RNA degradation 502 1 502 9.0E-123 83% 83% 319414919 Bacteria Bacteroidetes Bacteroidia Bacteroides helcogenes P 36-108, complete genome, product = polyribonucleotide nucleotidyltransferase
GKJWQY101BZRZ2 K03046 DNA-directed RNA polymerase subunit beta' [EC:2.7.7.6] Purine metabolism/Pyrimidine metabolism/RNA polymerase 430 1 430 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = DNA-directed RNA polymerase beta' subunit
GKJWQY101BUUVY K00226 dihydroorotate dehydrogenase (fumarate) [EC:1.3.98.1] Pyrimidine metabolism 433 1 433 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = dihydroorotate oxidase A
GKJWQY101ANJN3 K00756 pyrimidine-nucleoside phosphorylase [EC:2.4.2.2] Pyrimidine metabolism 355 1 355 5.0E-173 98% 98% 333956887 Bacteria Firmicutes Bacilli Lactobacillus kefiranofaciens ZW3, complete genome, product = pyrimidine-nucleoside phosphorylase
GKJWQY101BOOIO K00857 thymidine kinase [EC:2.7.1.21] Pyrimidine metabolism 352 1 352 5.0E-168 98% 98% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = thymidine kinase family protein

GKJWQY101A7UM0 K01494 dCTP deaminase [EC:3.5.4.13] Pyrimidine metabolism 444 1 444 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = arginine decarboxylase, ornithine decarboxylase, lysine decarboxylase (<1..70); deoxycytidine triphosphate deaminase (152..>444)
GKJWQY101A9VNV K01955 carbamoyl-phosphate synthase large subunit [EC:6.3.5.5] Pyrimidine metabolism/Alanine, aspartate and glutamate metabolism 469 1 469 7.0E-163 89% 89% 71037566 Bacteria Proteobacteria Gammaproteobacteria Psychrobacter arcticus 273-4, complete genome, product = carbamoyl-phosphate synthase large subunit
GKJWQY101BUBHU K01485 cytosine deaminase [EC:3.5.4.1] Pyrimidine metabolism/Arginine and proline metabolism 483 1 483 1.0E-150 87% 87% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = cytosine deaminase
GKJWQY101BNDRC K00560 thymidylate synthase [EC:2.1.1.45] Pyrimidine metabolism/One carbon pool by folate 302 1 302 5.0E-148 99% 99% 413973243 Bacteria Firmicutes Bacilli Lactococcus lactis subsp. cremoris UC509.9, complete genome, product = thymidylate synthase
GKJWQY101AH75P K00158 pyruvate oxidase [EC:1.2.3.3] Pyruvate metabolism 464 1 464 0.0E+00 99% 99% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc705, product = pyruvate oxidase
GKJWQY101A94GH K01759 lactoylglutathione lyase [EC:4.4.1.5] Pyruvate metabolism/MAPK signaling pathway-yeast 289 1 289 3.0E-115 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = lactoylglutathione lyase
GKJWQY101AZOWP K01007 pyruvate, water dikinase [EC:2.7.9.2] Pyruvate metabolism/Methane metabolism/Carbon fixation pathways in prokaryotes 482 1 482 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = phosphoenolpyruvate synthase

GKJWQY101ARU6X_2 K01977 16S ribosomal RNA gene Ribosome 179 1 179 2.0E-79 97% 97% 209915940 Bacteria n n Uncultured bacterium clone PBM_b9 16S ribosomal RNA gene, partial sequence
GKJWQY101BO1HI_2 K01980 23S ribosomal RNA Ribosomes 271 29 271 2.0E-117 99% 99% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 23S rRNA gene
GKJWQY101BVVMM_2 K01980 23S ribosomal RNA Ribosomes 249 8 248 2.0E-120 99% 99% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 23S rRNA gene
GKJWQY101BV866_2 K01980 23S ribosomal RNA Ribosomes 171 1 171 3.0E-78 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 23S rRNA gene
GKJWQY101A4443_2 K01980 23S ribosomal RNA Ribosomes 334 1 334 2.0E-152 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = signal peptidase I (<1..241); exonuclease V subunit alpha (303..>444)

GKJWQY101BK048_2 K08311 putative (di)nucleoside polyphosphate hydrolase [EC:3.6.1.-] RNA degradation 122 1 120 5.0E-39 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = NUDIX hydrolase (Nucleoside Diphosphate linked to X)
GKJWQY101ABJOM K05592 ATP-dependent RNA helicase DeaD [EC:3.6.4.13] RNA degradation 512 1 512 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = DEAD/DEAH box helicase domain protein
GKJWQY101BRBEO K05592 ATP-dependent RNA helicase DeaD [EC:3.6.4.13] RNA degradation 517 1 517 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = DEAD/DEAH box helicase domain protein
GKJWQY101AOWCF K03654 ATP-dependent DNA helicase RecQ [EC:3.6.4.12] RNA degradation 413 1 413 0.0E+00 93% 93% 312279338 Bacteria Firmicutes Bacilli Lactobacillus delbrueckii subsp. bulgaricus ND02, complete genome, product = ATP-dependent DNA helicase RecQ
GKJWQY101A17VU K03628 transcription termination factor Rho RNA degradation 411 1 411 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = transcription termination factor Rho
GKJWQY101AWCAE K12598 ATP-dependent RNA helicase DOB1 [EC:3.6.4.13] RNA degradation [Proteasome] 411 1 411 0.0E+00 97% 97% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = DSH domain protein (superfamily II RNA helicase)
GKJWQY101A7U37 K12614 ATP-dependent RNA helicase DDX6/DHH1 [EC:3.6.4.13] RNA degradation [Proteasome] 475 1 475 0.0E+00 99% 99% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = DEAD/DEAH box helicase domain protein
GKJWQY101BSNY0 K03231 elongation factor 1-alpha RNA transport 510 1 510 0.0E+00 91% 91% 71483398 Eukaryota Basidiomycota Microbotryomycetes Microbotryum violaceum isolate Dsylvestris_9119 elongation factor 1alpha (EF1) gene, partial cds

GKJWQY101ASUL5 K00690 sucrose phosphorylase [EC:2.4.1.7] Starch and sucrose metabolism 405 1 405 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = synthase
GKJWQY101AFXGY K05343 alpha-D-glucosyltransferase [EC:5.4.99.16] Starch and sucrose metabolism 394 1 394 0.0E+00 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = trehalose synthase (PRIAM: Maltose alpha-D-glucosyltransferase)
GKJWQY101BPKKT K00703 starch synthase [EC:2.4.1.21] Starch and sucrose metabolism 495 1 495 0.0E+00 95% 95% 326682110 Bacteria Firmicutes Bacilli Streptococcus oralis Uo5 complete genome sequence, product = glycogen synthase
GKJWQY101BG8L6 K01710 dTDP-glucose 4,6-dehydratase [EC:4.2.1.46] Streptomycin biosynthesis/Polyketide sugar unit biosynthesis/Biosynthesis of vancomycin group 502 1 502 3.0E-172 89% 89% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = UDP-glucuronate decarboxylase (<1..263); nucleotide sugar dehydrogenase (384..>487)
GKJWQY101AFENF K00381 sulfite reductase (NADPH) hemoprotein beta-component [EC:1.8.1.2] Sulfur metabolism 417 1 417 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = Sulfite reductase [NADPH] hemoprotein beta-component
GKJWQY101BCE0I K01738 cysteine synthase A [EC:2.5.1.47] Sulfur metabolism/Cysteine and methionine metabolism 518 1 518 0.0E+00 99% 99% 257152781 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 plasmid sequence, strain Lc 705, product = cystathionine beta-synthase
GKJWQY101BWI6R K00566 tRNA-specific 2-thiouridylase [EC:2.8.1.-] Sulfur relay system 526 1 526 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase
GKJWQY101ACULM K03639 molybdenum cofactor biosynthesis protein Sulfur relay system; Folate biosynthesis 546 1 546 0.0E+00 96% 96% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = cyclic pyranopterin monophosphate synthase subunit MoaA
GKJWQY101BFZWI K00806 undecaprenyl diphosphate synthase [EC:2.5.1.31] Terpenoid backbone biosynthesis 493 1 493 0.0E+00 98% 98% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = undecaprenyl pyrophosphate synthetase
GKJWQY101BZA8P K00805 heptaprenyl diphosphate synthase [EC:2.5.1.30] Terpenoid backbone biosynthesis 256 1 256 2.0E-130 100% 100% 355393429 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus ATCC 8530, complete genome, product = polyprenyl synthetase family protein
GKJWQY101BCEGP K00919 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase [EC:2.7.1.148] Terpenoid backbone biosynthesis 408 1 408 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase
GKJWQY101AKUTV K03527 4-hydroxy-3-methylbut-2-enyl diphosphate reductase [EC:1.17.1.2] Terpenoid backbone biosynthesis 478 1 478 0.0E+00 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (<1..263); FKBP-type peptidyl-prolyl cis-trans isomerase slpA (266..>479)

GKJWQY101ARLFL K00788 thiamine-phosphate pyrophosphorylase [EC:2.5.1.3] Thiamine metabolism 450 1 450 2.0E-178 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = thiamine-phosphate diphosphorylase
GKJWQY101BCW7I K03147 thiamine biosynthesis protein ThiC Thiamine metabolism 406 1 406 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = hydroxymethylpyrimidine synthase (ThiC family)
GKJWQY101ASDW0 K00897 aminoglycoside 3'-phosphotransferase [EC:2.7.1.95] Transferases [Phosphotransferases/ acceptor - alcohol group] 456 1 456 0.0E+00 99% 99% 392495344 Bacteria Bacteroidetes Flavobacteriia Riemerella anatipestifer strain W9 aminoglycoside-3'-O-phosphotransferase (aph) gene, partial cds
GKJWQY101BVJ36 K00936 E2.7.3.- Transferases [Phosphotransferases/ acceptor - nitrogenous group] 433 1 433 4.0E-180 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = PAS/PAC sensor signal transduction histidine kinase
GKJWQY101AU6TO_2 K07769 Cyan7425_1370 Signal transduction histidine kinase Two-component system 143 1 143 1.0E-21 82% 82% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 844 bp at 5' side: putative GAF sensor protein, 178 bp at 3' side: phosphoesterase RecJ domain protein
GKJWQY101BN289 K07778 two-component system, NarL family, sensor histidine kinase DesK [EC:2.7.13.3] Two-component system 470 1 470 0.0E+00 93% 93% 339291081 Bacteria Firmicutes Bacilli Streptococcus salivarius 57.I, complete genome, product = signal transduction histidine kinase
GKJWQY101AG52Q K07709 two-component system, NtrC family, sensor histidine kinase HydH [EC:2.7.13.3] Two-component system 208 1 280 1.0E-68 90% 90% 469772332 Bacteria Proteobacteria Betaproteobacteria Ralstonia solanacearum FQY_4, complete genome, product = signal transduction histidine kinase regulating C4-dicarboxylate transport system
GKJWQY101AHQ0Q K13924 two-component system, chemotaxis family, CheB/CheR fusion protein [EC:3.1.1.61 2.1.1.80] Two-component system 197 1 197 5.0E-71 92% 92% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = signal transduction histidine kinase with CheB and CheR activity
GKJWQY101A0TE3 K02488 two-component system, cell cycle response regulator Two-component system/Cell cycle - Caulobacter 447 1 447 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = response regulator receiver modulated diguanylate cyclase
GKJWQY101BDHFA K02313 chromosomal replication initiator protein Two-component system/Cell cycle - Caulobacter 246 1 246 9.0E-125 100% 100% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc 705 product = chromosomal replication initiator protein dnaA
GKJWQY101BVI18 K03407 two-component system, chemotaxis family, sensor kinase CheA [EC:2.7.13.3] Two-component system; Bacterial chemotaxis 511 1 511 0.0E+00 95% 95% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = CheA signal transduction histidine kinase
GKJWQY101A6MA3 K00680 [EC:2.3.1.-] Tyrosine metabolism/Benzoate degradation/Aminobenzoate degradation/Ethylbenzene degradation/Limonene and pinene degradation 522 3 522 1.0E-111 81% 81% 78609255 Bacteria Firmicutes Bacilli Lactobacillus sakei strain 23K complete genome, product = putative N-Acetyltransferase, GNAT family (<1..420); hypothetical protein (483..>521)
GKJWQY101ADM9M K02551 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase [EC:2.2.1.9] Ubiquinone and other terpenoid-quinone biosynthesis 346 1 346 6.0E-143 93% 93% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase
GKJWQY101AAEO3 K03183 ubiquinone/menaquinone biosynthesis methyltransferase [EC:2.1.1.- 2.1.1.163] Ubiquinone and other terpenoid-quinone biosynthesis 402 1 402 9.0E-172 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = demethylmenaquinone methyltransferase
GKJWQY101AOVNZ K00053 ketol-acid reductoisomerase [EC:1.1.1.86] Valine, leucine and isoleucine biosynthesis/Pantothenate and CoA biosynthesis 326 13 312 7.0E-67 83% 83% 323467537 Bacteria Actinobacteria Actinobacteria Arthrobacter phenanthrenivorans Sphe3, complete genome, product = ketol-acid reductoisomerase
GKJWQY101BVJE8 K00053 ketol-acid reductoisomerase [EC:1.1.1.86] Valine, leucine and isoleucine biosynthesis/Pantothenate and CoA biosynthesis 301 1 301 7.0E-67 83% 83% 323467537 Bacteria Actinobacteria Actinobacteria Arthrobacter phenanthrenivorans Sphe3, complete genome, product = ketol-acid reductoisomerase
GKJWQY101ASA5O K00053 ketol-acid reductoisomerase [EC:1.1.1.86] Valine, leucine and isoleucine biosynthesis/Pantothenate and CoA biosynthesis 214 1 211 2.0E-25 78% 78% 29342101 Bacteria Bacteroidetes Bacteroidia Bacteroides thetaiotaomicron VPI-5482, complete genome, product = ketol-acid reductoisomerase
GKJWQY101AJPN3 K01652 acetolactate synthase I/II/III large subunit [EC:2.2.1.6] Valine, leucine and isoleucine biosynthesis/Butanoate metabolism/C5-Branched dibasic acid metabolism/Pantothenate and CoA biosynthesis 498 1 498 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = acetolactate synthase
GKJWQY101BF7DE K01703 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.35 4.2.1.33] Valine, leucine and isoleucine biosynthesis/C5-Branched dibasic acid metabolism 437 1 437 0.0E+00 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = homoaconitate hydratase family protein (3-isopropylmalate dehydratase, large subunit)
GKJWQY101APBSQ K00020 3-hydroxyisobutyrate dehydrogenase [EC:1.1.1.31] Valine, leucine and isoleucine degradation 444 20 444 0.0E+00 99% 99% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc 705, product = 3-hydroxyisobutyrate dehydrogenase
GKJWQY101BTV1D K00020 3-hydroxyisobutyrate dehydrogenase [EC:1.1.1.31] Valine, leucine and isoleucine degradation 502 20 502 0.0E+00 100% 100% 257149867 Bacteria Firmicutes Bacilli Lactobacillus rhamnosus Lc 705 whole genome sequence, strain Lc 705, product = 3-hydroxyisobutyrate dehydrogenase
GKJWQY101BN3ES K00826 branched-chain amino acid aminotransferase [EC:2.6.1.42] Valine, leucine and isoleucine degradation/Valine, leucine and isoleucine biosynthesis/Pantothenate and CoA biosynthesis 158 1 158 9.0E-58 94% 94% 428238862 Bacteria Cyanobacteria Oscillatoriales Oscillatoria nigro-viridis PCC 7112, complete genome, product = branched-chain amino acid aminotransferase
GKJWQY101A4443 K03087 RNA polymerase nonessential primary-like sigma factor pathogenic cycle 162 1 162 1.0E-76 99% 99% 387575654 Bacteria Proteobacteria Betaproteobacteria Burkholderia sp. KJ006 chromosome 1, complete sequence, product = RNA polymerase sigma factor RpoS
GKJWQY101ACXQT_2 K01939 adenylosuccinate synthase [EC:6.3.4.4] Purine metabolism/Alanine, aspartate and glutamate metabolism 155 140 3 3.0E-06 61% 65%* 322437074 Bacteria Acidobacteria Acidobacteriales adenylosuccinate synthetase [Granulicella tundricola MP5ACTX9]
GKJWQY101A5QPE K01945 phosphoribosylamine--glycine ligase [EC:6.3.4.13] Purine metabolism 546 526 17 1.0E-73 72% 81%* 308178295 Bacteria Actinobacteria Actinobacteria phosphoribosylamine--glycine ligase [Arthrobacter arilaitensis Re117]
GKJWQY101BXMTR K09686 antibiotic transport system permease protein ABC transporters 377 3 377 2.0E-78 99% 99%* 471965147 Bacteria Actinobacteria Actinobacteria putative ABC transporter, permease protein [Arthrobacter gangotriensis Lz1y]
GKJWQY101AOY6A K07823 3-oxoadipyl-CoA thiolase [EC:2.3.1.174] Benzoate degradation 185 168 1 2.0E-24 86% 92%* 471966247 Bacteria Actinobacteria Actinobacteria beta-ketoadipyl CoA thiolase [Arthrobacter gangotriensis Lz1y]
GKJWQY101A5BE1 K01918 pantoate--beta-alanine ligase [EC:6.3.2.1] Pantothenate and CoA biosynthesis/beta-Alanine metabolism 409 404 3 2.0E-90 100% 100%* 471964577 Bacteria Actinobacteria Actinobacteria pantoate/beta-alanine ligase [Arthrobacter gangotriensis Lz1y]
GKJWQY101BEK18 K00666 fatty-acyl-CoA synthase [EC:6.2.1.-] Lipid biosynthesis [Ligases forming carbon-sulfur bonds] 385 381 10 4.0E-59 75% 84%* 359771954 Bacteria Actinobacteria Actinobacteria putative fatty-acid--CoA ligase [Gordonia effusa NBRC 100432]
GKJWQY101ACH7I K00859 dephospho-CoA kinase [EC:2.7.1.24] Pantothenate and CoA biosynthesis 345 1 333 1.0E-31 65% 81%* 308177711 Bacteria Actinobacteria dephospho-CoA kinase [Arthrobacter arilaitensis Re117]
GKJWQY101BECUD K01493 dCMP deaminase [EC:3.5.4.12] Pyrimidine metabolism 384 329 9 7.0E-62 87% 93%* 256826526 Bacteria Actinobacteria Coriobacteridae deoxycytidylate deaminase [Cryptobacterium curtum DSM 15641]
GKJWQY101BNN0W K00982 glutamate-ammonia-ligase adenylyltransferase [EC:2.7.7.42] Transferases [Adenylyltransferase/ glutamate---ammonia-ligase] 419 419 3 6.0E-61 75% 76%* 88854956 Bacteria Actinobacteria n glutamine synthetase adenylyltransferase [marine actinobacterium PHSC20C1]
GKJWQY101BZOWH K01921 D-alanine-D-alanine ligase [EC:6.3.2.4] D-Alanine metabolism/Peptidoglycan biosynthesis 361 1 360 2.0E-43 60% 80%* 88854490 Bacteria Actinobacteria n D-alanine--D-alanine ligase [marine actinobacterium PHSC20C1]
GKJWQY101BPNKE K02015 iron complex transport system permease protein ABC transporters 486 485 3 7.0E-57 100% 100%* 189460866 Bacteria Bacteroidetes Bacteroidia hypothetical protein BACCOP_01513 [Bacteroides coprocola DSM 17136] note = ABC-type enterobactin transport system, permease component
GKJWQY101BPUSX K00278 L-aspartate oxidase [EC:1.4.3.16] Alanine, aspartate and glutamate metabolism/Nicotinate and nicotinamide metabolism 398 4 387 1.0E-82 96% 96%* 189431608 Bacteria Bacteroidetes Bacteroidia L-aspartate oxidase [Bacteroides coprocola DSM 17136]
GKJWQY101AMLGF K00262 glutamate dehydrogenase (NADP+) [EC:1.4.1.4] Alanine, aspartate and glutamate metabolism/Arginine and proline metabolism/Nitrogen metabolism 310 309 13 7.0E-62 98% 100%* 189431537 Bacteria Bacteroidetes Bacteroidia glutamate dehydrogenase, NAD-specific [Bacteroides coprocola DSM 17136]
GKJWQY101BWE86 K00278 L-aspartate oxidase [EC:1.4.3.16] Alanine, aspartate and glutamate metabolism/Nicotinate and nicotinamide metabolism 369 4 366 3.0E-78 97% 98%* 189431608 Bacteria Bacteroidetes Bacteroidia L-aspartate oxidase [Bacteroides coprocola DSM 17136]
GKJWQY101BNDWQ K00764 amidophosphoribosyltransferase [EC:2.4.2.14] Purine metabolism/Alanine, aspartate and glutamate metabolism 520 520 14 5.0E-102 88% 95%* 189461995 Bacteria Bacteroidetes Bacteroidia hypothetical protein BACCOP_02664 [Bacteroides coprocola DSM17136] note = Glutamine phosphoribosylpyrophosphate amidotransferase
GKJWQY101BF26C K09808 lipoprotein-releasing system permease protein ABC transporters 387 386 3 6.0E-66 78% 88%* 224025277 Bacteria Bacteroidetes Bacteroidia hypothetical protein BACCOPRO_02016 [Bacteroides coprophilus DSM 18228]
GKJWQY101BUDXS K02563 UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase [EC:2.4.1.227] Peptidoglycan biosynthesis/Cell cycle - Caulobacter 347 338 3 1.0E-41 69% 77%* 198274308 Bacteria Bacteroidetes Bacteroidia hypothetical protein BACPLE_00452 [Bacteroides plebeius DSM 17135]
GKJWQY101BDF2G K01685 altronate hydrolase [EC:4.2.1.7] Pentose and glucuronate interconversions 397 3 392 5.0E-34 48% 67%* 325300639 Bacteria Bacteroidetes Bacteroidia altronate dehydratase [Bacteroides salanitronis DSM 18170]
GKJWQY101AWYJI K00849 galactokinase [EC:2.7.1.6] Galactose metabolism/Amino sugar and nucleotide sugar metabolism 414 2 409 2.0E-82 90% 97%* 325298068 Bacteria Bacteroidetes Bacteroidia galactokinase [Bacteroides salanitronis DSM 18170]
GKJWQY101BHOCV K00951 GTP pyrophosphokinase [EC:2.7.6.5] Purine metabolism 495 494 15 9.0E-87 85% 87%* 325298624 Bacteria Bacteroidetes Bacteroidia (p)ppGpp synthetase I SpoT/RelA [Bacteroides salanitronis DSM 18170]
GKJWQY101BN3QR K00428 cytochrome c peroxidase [EC:1.11.1.5] Oxidoreductases [Acting on a peroxide as acceptor] 513 512 9 7.0E-93 79% 88%* 265763882 Bacteria Bacteroidetes Bacteroidia cytochrome-C peroxidase [Bacteroides sp. 2_1_16]
GKJWQY101BHNZ6 K05349 beta-glucosidase [EC:3.2.1.21] Phenylpropanoid biosynthesis 405 1 405 9.0E-62 75% 83%* 317480750 Bacteria Bacteroidetes Bacteroidia glycosyl hydrolase family 3 C terminal domain-containing protein [Bacteroides sp. 4_1_36]
GKJWQY101BNPQZ K01284 peptidyl-dipeptidase Dcp [EC:3.4.15.5] Acting on peptide bonds (peptidases) 382 379 14 4.0E-69 88% 92%* 237709863 Bacteria Bacteroidetes Bacteroidia peptidyl-dipeptidase [Bacteroides sp. 9_1_42FAA]
GKJWQY101AV49L K01889 phenylalanyl-tRNA synthetase alpha chain [EC:6.1.1.20] Aminoacyl-tRNA biosynthesis 161 5 160 6.0E-18 77% 90%* 150004775 Bacteria Bacteroidetes Bacteroidia phenylalanyl-tRNA synthetase subunit alpha [Bacteroides vulgatus ATCC 8482]
GKJWQY101AZOQD K01662 1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7] Thiamine metabolism/Terpenoid backbone biosynthesis 246 6 245 1.0E-22 65% 70%* 150004317 Bacteria Bacteroidetes Bacteroidia 1-deoxy-D-xylulose-5-phosphate synthase [Bacteroides vulgatus ATCC 8482]
GKJWQY101BPTLK K00975 glucose-1-phosphate adenylyltransferase [EC:2.7.7.27] Starch and sucrose metabolism/Amino sugar and nucleotide sugar metabolism 338 8 337 1.0E-51 73% 80%* 375145467 Bacteria Bacteroidetes Bacteroidia glucose-1-phosphate adenylyltransferase [Niastella koreensis GR20-10]
GKJWQY101ANZNS K01206 alpha-L-fucosidase [EC:3.2.1.51] Other glycan degradation 417 2 415 1.0E-64 75% 82%* 374385611 Bacteria Bacteroidetes Bacteroidia hypothetical protein HMPREF9449_01500 [Odoribacter laneus YIT 12061]
GKJWQY101ADMIV K01417 putative zinc metalloprotease [EC:3.4.24.-] Metalloendopeptidases 235 2 223 9.0E-31 76% 85%* 154490034 Bacteria Bacteroidetes Bacteroidia hypothetical protein PARMER_00263 [Parabacteroides merdae ATCC 43184]
GKJWQY101AV49L_2 K01889 phenylalanyl-tRNA synthetase alpha chain [EC:6.1.1.20] Aminoacyl-tRNA biosynthesis 269 2 268 2.0E-40 75% 82%* 330996958 Bacteria Bacteroidetes Bacteroidia phenylalanine--tRNA ligase, alpha subunit [Paraprevotella xylaniphila YIT 11841]
GKJWQY101AD9QP K10536 agmatine deiminase [EC:3.5.3.12] Arginine and proline metabolism 463 461 3 3.0E-92 97% 98%* 281420760 Bacteria Bacteroidetes Bacteroidia peptidylarginine deiminase-related protein [Prevotella copri DSM 18205]

GKJWQY101BM66A K01892 histidyl-tRNA synthetase [EC:6.1.1.21] Aminoacyl-tRNA biosynthesis 474 474 1 2.0E-101 100% 100%* 357043085 Bacteria Bacteroidetes Bacteroidia histidyl-tRNA synthetase [Prevotella histicola F0411]
GKJWQY101A8MT2 K01892 histidyl-tRNA synthetase [EC:6.1.1.21] Aminoacyl-tRNA biosynthesis 497 3 497 2.0E-109 96% 97%* 357043085 Bacteria Bacteroidetes Bacteroidia histidyl-tRNA synthetase [Prevotella histicola F0411]
GKJWQY101AGV5Y K01892 histidyl-tRNA synthetase [EC:6.1.1.21] Aminoacyl-tRNA biosynthesis 474 474 1 2.0E-101 100% 100%* 357043085 Bacteria Bacteroidetes Bacteroidia histidyl-tRNA synthetase [Prevotella histicola F0411]
GKJWQY101BO1HI K00012 UDPglucose 6-dehydrogenase [EC:1.1.1.22] Pentose and glucuronate interconversions/Ascorbate and aldarate metabolism/Amino sugar and nucleotide sugar metabolism/Starch and sucrose metabolism 256 9 254 5.0E-32 62% 85%* 298209197 Bacteria Bacteroidetes Flavobacteria UDP-glucose 6-dehydrogenase [Croceibacter atlanticus HTCC2559]
GKJWQY101AJ8K7 K00249 acyl-CoA dehydrogenase [EC:1.3.99.3] Fatty acid metabolism/Valine, leucine and isoleucine degradation/beta-Alanine metabolism/Propanoate metabolism/PPAR signaling pathway 399 3 398 6.0E-59 88% 92%* 146298219 Bacteria Bacteroidetes Flavobacteria acyl-CoA dehydrogenase [Flavobacterium johnsoniae UW101]
GKJWQY101AGQIA K00257 [EC:1.3.99.-] Geraniol degradation 313 311 12 5.0E-52 86% 93%* 374595657 Bacteria Bacteroidetes Flavobacteria acyl-CoA dehydrogenase domain-containing protein [Gillisia limnaea DSM 15749]
GKJWQY101A3F2S K01966 propionyl-CoA carboxylase beta chain [EC:6.4.1.3] Glyoxylate and dicarboxylate metabolism/Propanoate metabolism/Carbon fixation pathways in prokaryotes/Valine, leucine and isoleucine degradation 340 338 3 6.0E-68 95% 97%* 332877849 Bacteria Bacteroidetes Flavobacteriia carboxyl transferase domain protein [Capnocytophaga sp. oral taxon 329 str. F0087]
GKJWQY101AYM89 K00791 tRNA dimethylallyltransferase [EC:2.5.1.75] Zeatin biosynthesis 167 161 6 3.0E-19 77% 90%* 332876550 Bacteria Bacteroidetes Flavobacteriia tRNA dimethylallyltransferase [Capnocytophaga sp. oral taxon 329 str. F0087]
GKJWQY101BROII K02342 DNA polymerase III subunit epsilon [EC:2.7.7.7] Purine metabolism/Pyrimidine metabolism/DNA replication/Mismatch repair/Homologous recombination 210 2 208 9.0E-19 59% 73%* 375146288 Bacteria Bacteroidetes Sphingobacteriia DNA polymerase III subunit epsilon [Niastella koreensis GR20-10]
GKJWQY101BRUMM K05367 penicillin-binding protein 1C [EC:2.4.1.-] Peptidoglycan biosynthesis 485 478 2 5.0E-66 62% 74%* 149276515 Bacteria Bacteroidetes Sphingobacteriia putative penicillin-binding protein [Pedobacter sp. BAL39]
GKJWQY101BQYQT K00333 NADH dehydrogenase I subunit D [EC:1.6.5.3] Oxidative phosphorylation/Nitrogen metabolism 252 245 6 9.0E-47 91% 98%* 149280567 Bacteria Bacteroidetes Sphingobacteriia NADH dehydrogenase I chain D [Pedobacter sp. BAL39]
GKJWQY101BXW5Y K01190 beta-galactosidase [EC:3.2.1.23] Galactose metabolism/Other glycan degradation/Sphingolipid metabolism 401 1 396 5.0E-38 53% 71%* 149280049 Bacteria Bacteroidetes Sphingobacteriia beta-galactosidase [Pedobacter sp. BAL39]
GKJWQY101AG8YC K00281 glycine dehydrogenase [EC:1.4.4.2] Glycine, serine and threonine metabolism 326 318 1 7.0E-58 87% 93%* 300770671 Bacteria Bacteroidetes Sphingobacteriia glycine dehydrogenase (decarboxylating) [Sphingobacterium spiritivorum ATCC 33861]
GKJWQY101ASHYA K00517 [EC:1.14.-.-] Bisphenol degradation/Polycyclic aromatic hydrocarbon degradation/Aminobenzoate degradation 400 397 2 1.0E-50 64% 78%* 434407557 Bacteria Cyanobacteria Oscillatoriales cytochrome P450 [Cylindrospermum stagnale PCC 7417]
GKJWQY101BCW87 K00611 ornithine carbamoyltransferase [EC:2.1.3.3] Arginine and proline metabolism 236 1 225 4.0E-44 93% 100%* 428319688 Bacteria Cyanobacteria Oscillatoriales ornithine carbamoyltransferase [Oscillatoria nigro-viridis PCC 7112]
GKJWQY101AUUMV K11069 spermidine/putrescine transport system substrate-binding protein ABC transporters 414 3 413 4.0E-63 77% 98%* 428318681 Bacteria Cyanobacteria Oscillatoriales binding-protein-dependent transport systems inner membrane component [Oscillatoria nigro-viridis PCC 7112]
GKJWQY101ASBJ0 K06148 ATP-binding cassette, subfamily C, bacterial ABC transporters 408 402 1 1.0E-69 79% 88%* 428210401 Bacteria Cyanobacteria Pleurocapsales bacteriocin-processing peptidase [Chroococcidiopsis thermalis PCC 7203] (note = ABC-type bacteriocin/lantibiotic exporters)
GKJWQY101BRT8J K01937 CTP synthase [EC:6.3.4.2] Pyrimidine metabolism 306 1 249 2.0E-33 73% 80%* 308175443 Bacteria Firmicutes Bacilli CTP synthetase [Bacillus amyloliquefaciens DSM 7]
GKJWQY101BVCXB K00014 shikimate dehydrogenase [EC:1.1.1.25] Phenylalanine, tyrosine and tryptophan biosynthesis 292 2 292 2.0E-34 60% 77%* 225866315 Bacteria Firmicutes Bacilli shikimate 5-dehydrogenase [Bacillus cereus 03BB102]
GKJWQY101AD9H9 K04564 superoxide dismutase, Fe-Mn family [EC:1.15.1.1] Peroxisome 223 222 1 2.0E-21 68% 74%* 336114384 Bacteria Firmicutes Bacilli Superoxide dismutase [Bacillus coagulans 2-6]
GKJWQY101BXEDH_2 K00259 alanine dehydrogenase [EC:1.4.1.1] Alanine, aspartate and glutamate metabolism/Taurine and hypotaurine metabolism 222 5 220 8.0E-33 81% 88%* 415884092 Bacteria Firmicutes Bacilli alanine dehydrogenase [Bacillus methanolicus MGA3]
GKJWQY101BHN6U K00681 gamma-glutamyltranspeptidase [EC:2.3.2.2] Glutathione metabolism/Taurine and hypotaurine metabolism/Cyanoamino acid metabolism/Arachidonic acid metabolism 367 366 16 1.0E-57 78% 89%* 398310944 Bacteria Firmicutes Bacilli gamma-glutamyltranspeptidase [Bacillus mojavensis RO-H-1]
GKJWQY101BQAM3 K00073 ureidoglycolate dehydrogenase [EC:1.1.1.154] Purine metabolism 468 2 457 6.0E-74 71% 73%* 458798528 Bacteria Firmicutes Bacilli ureidoglycolate dehydrogenase [Bacillus sonorensis L12]
GKJWQY101BJ19A K01953 synthase (glutamine-hydrolysing) [EC:6.3.5.4] Nitrogen metabolism/Alanine, aspartate and glutamate metabolism 420 1 420 3.0E-45 58% 72%* 403236697 Bacteria Firmicutes Bacilli asparagine synthetase [Bacillus sp. 10403023]
GKJWQY101AHUTZ K01438 acetylornithine deacetylase [EC:3.5.1.16] Arginine and proline metabolism 382 382 2 2.0E-64 78% 86%* 228983691 Bacteria Firmicutes Bacilli Acetylornitine deacetylase [Bacillus thuringiensis serovar tochigiensis BGSC 4Y1]
GKJWQY101BNQIG K11632 bacitracin transport system permease protein ABC transporters 340 12 338 5.0E-35 60% 77%* 471974841 Bacteria Firmicutes Bacilli FtsX-like permease family protein [Bhargavaea cecembensis DSE10]
GKJWQY101BMA7B K00609 aspartate carbamoyltransferase catalytic subunit [EC:2.1.3.2] Pyrimidine metabolism/Alanine, aspartate and glutamate metabolism 182 178 2 6.0E-21 68% 81%* 375089143 Bacteria Firmicutes Bacilli aspartate carbamoyltransferase [Dolosigranulum pigrum ATCC 51524]
GKJWQY101AOU9D K13038 phosphopantothenoylcysteine decarboxylase / phosphopantothenate--cysteine ligase [EC:6.3.2.5 4.1.1.36] Pantothenate and CoA biosynthesis 329 5 319 3.0E-27 52% 64%* 403667677 Bacteria Firmicutes Bacilli phosphopantothenoylcysteine synthetase/decarboxylase [Kurthia sp. JC8E]
GKJWQY101BAQ3Y K00680 [EC:2.3.1.-] Tyrosine metabolism/Benzoate degradation/Aminobenzoate degradation/Ethylbenzene degradation/Limonene and pinene degradation 407 3 407 2.0E-91 99% 99%* 354806737 Bacteria Firmicutes Bacilli acetyltransferase, GNAT family [Lactobacillus curvatus CRL 705]
GKJWQY101ARCYY K01551 arsenite-transporting ATPase [EC:3.6.3.16] Acting on acid anhydrides 398 387 1 5.0E-58 74% 87%* 403071345 Bacteria Firmicutes Bacilli Arsenical pump-driving ATPase [Oceanobacillus sp. Ndiop]
GKJWQY101BSHTT K02428 nucleoside-triphosphate pyrophosphatase [EC:3.6.1.19] Purine metabolism/Pyrimidine metabolism 250 1 249 4.0E-43 81% 90%* 389819483 Bacteria Firmicutes Bacilli HAM1-like protein [Planococcus antarcticus DSM 14505] and pyrophosphate-releasing xanthosine/inosine triphosphatase
GKJWQY101BSTX6 K11749 regulator of sigma E protease [EC:3.4.24.-] Cell cycle - Caulobacter 496 494 3 9.0E-82 74% 87%* 389815285 Bacteria Firmicutes Bacilli zinc metalloprotease Lmo1318 [Planococcus antarcticus DSM 14505]
GKJWQY101AYPYY K01681 aconitate hydratase 1 [EC:4.2.1.3] Citrate cycle (TCA cycle)/Lysine degradation/Glyoxylate and dicarboxylate metabolism/Carbon fixation pathways in prokaryotes 393 392 3 3.0E-81 96% 99%* 389820589 Bacteria Firmicutes Bacilli aconitate hydratase [Planococcus antarcticus DSM 14505]
GKJWQY101BV8AW K07668 two-component system, OmpR family, response regulator VicR Two-component system 469 1 468 5.0E-104 96% 97%* 323487712 Bacteria Firmicutes Bacilli response regulator [Planococcus donghaensis MPA1U2] region_name = REC (<1..49); region_name = trans_reg_C (67..>156)
GKJWQY101AF6LQ K02013 iron complex transport system ATP-binding protein [EC:3.6.3.34] ABC transporters 411 4 411 5.0E-79 88% 93%* 323490869 Bacteria Firmicutes Bacilli iron transport system ATP-binding protein [Planococcus donghaensis MPA1U2]
GKJWQY101BN853 K04518 prephenate dehydratase [EC:4.2.1.51] Phenylalanine, tyrosine and tryptophan biosynthesis 355 1 354 1.0E-68 86% 95%* 458759422 Bacteria Firmicutes Bacilli Prephenate dehydratase [Planococcus halocryophilus Or1]
GKJWQY101BH9UZ K02314 replicative DNA helicase [EC:3.6.4.12] DNA replication/Cell cycle - Caulobacter 341 3 341 7.0E-35 62% 74%* 224477942 Bacteria Firmicutes Bacilli Replicative DNA helicase [Staphylococcus carnosus subsp. carnosus TM300]
GKJWQY101B2KLB K00075 UDP-N-acetylmuramate dehydrogenase [EC:1.3.1.98] Amino sugar and nucleotide sugar metabolism/Peptidoglycan biosynthesis 315 1 306 4.0E-55 83% 93%* 289551402 Bacteria Firmicutes Bacilli UDP-N-acetylenolpyruvoylglucosamine reductase [Staphylococcus lugdunensis HKU09-01]
GKJWQY101AS6OK K03781 [EC:1.11.1.6] Peroxisome 485 2 484 3.0E-81 76% 85%* 319893610 Bacteria Firmicutes Bacilli catalase [Staphylococcus pseudintermedius HKU10-03]
GKJWQY101B09E4 K01533 Cu2+-exporting ATPase [EC:3.6.3.4] Acting on acid anhydrides to catalyse transmembrane movement of substances 208 208 2 2.0E-26 70% 88%* 76799990 Bacteria Firmicutes Bacilli copper-translocating P-type ATPase, partial [Streptococcus agalactiae 18RS21]
GKJWQY101A2KTL K00674 2,3,4,5-tetrahydropyridine-2-carboxylate N-succinyltransferase [EC:2.3.1.117] Lysine biosynthesis 234 230 6 1.0E-38 96% 97%* 302024578 Bacteria Firmicutes Bacilli 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-acetyltransferase [Streptococcus suis 05HAS68]
GKJWQY101BZEK4 K03079 L-ribulose-5-phosphate 3-epimerase [EC:5.1.3.22] Ascorbate and aldarate metabolism 303 303 4 6.0E-50 82% 86%* 225386566 Bacteria Firmicutes Clostridia hypothetical protein CLOSTASPAR_00313 [Clostridium asparagiforme DSM 15981] note = putative L xylulose 5-phosphate 3-epimerase; reviewed
GKJWQY101AJY6W K10017 histidine transport system ATP-binding protein [EC:3.6.3.21] ABC transporters 236 235 8 9.0E-34 83% 92%* 358065153 Bacteria Firmicutes Clostridia hypothetical protein HMPREF9473_03766 [Clostridium hathewayi WAL-18680] region_name = ABC_HisP_GlnQ_permeases
GKJWQY101ADGTJ K00567 methylated-DNA-[protein]-cysteine S-methyltransferase [EC:2.1.1.63] DNA repair and recombination [Single Strand Breaks Repair] 346 338 3 1.0E-25 50% 66%* 302387653 Bacteria Firmicutes Clostridia methylated-DNA--protein-cysteine methyltransferase [Clostridium saccharolyticum WM1]
GKJWQY101ARUMF K00768 nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase [EC:2.4.2.21] Porphyrin and chlorophyll metabolism 250 240 4 5.0E-24 63% 81%* 325262503 Bacteria Firmicutes Clostridia nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase [Clostridium sp. D5]
GKJWQY101BVCND K00854 xylulokinase [EC:2.7.1.17] Pentose and glucuronate interconversions 520 519 4 2.0E-73 78% 84%* 332655340 Bacteria Firmicutes Clostridia xylulokinase [Ruminococcaceae bacterium D16]
GKJWQY101BSAEK K02006 cobalt/nickel transport system ATP-binding protein ABC transporters 410 6 407 6.0E-50 63% 78%* 332654627 Bacteria Firmicutes Clostridia putative ABC transporter ATP-binding protein [Ruminococcaceae bacterium D16]
GKJWQY101AMELW K00335 NADH dehydrogenase I subunit F [EC:1.6.5.3] Oxidative phosphorylation/Nitrogen metabolism 290 7 282 1.0E-44 84% 90%* 253579944 Bacteria Firmicutes Clostridia NADH dehydrogenase I subunit F [Ruminococcus sp.
5_1_39BFAA] putative Ech , subunit EchA [Subdoligranulum variabile DSM 15176] (note = NADH- GKJWQY101BHU3U K05903 NADH dehydrogenase (quinone) [EC:1.6.99.5] Oxidative phosphorylation 444 444 1 1.0E-59 85% 90%* 261368112 Bacteria Firmicutes Clostridia iquinone/plastoquinone (complex I)) GKJWQY101A2NZL K03076 preprotein translocase subunit SecY Bacterial secretion system/Protein export 446 1 444 2.0E-42 59% 78%* 261367783 Bacteria Firmicutes Clostridia preprotein translocase, SecY subunit [Subdoligranulum variabile DSM 15176] GKJWQY101AJC8H K00981 phosphatidate cytidylyltransferase [EC:2.7.7.41] Glycerophospholipid metabolism/Phosphatidylinositol signaling system 154 1 144 5.0E-20 85% 93%* 323142324 Bacteria Firmicutes Negativicutes phosphatidate cytidylyltransferase [Phascolarctobacterium succinatutens YIT 12067] GKJWQY101AATB5 K01362 E3.4.21.- Serine endopeptidases 422 417 1 4.0E-80 94% 94%* 294795056 Bacteria Firmicutes Negativicutes putative serine protease HtrA [Veillonella sp. 3_1_44] GKJWQY101BFG0O K01134 arylsulfatase A [EC:3.1.6.8] Sphingolipid metabolism 202 201 1 2.0E-16 49% 55%* 149198675 Bacteria Lentisphaerae Lentisphaeria arylsulphatase A [Lentisphaera araneosa HTCC2155] GKJWQY101BMOQ0 K00265 glutamate synthase (NADPH/NADH) large chain [EC:1.4.1.14 1.4.1.13] Alanine, aspartate and glutamate metabolism/Nitrogen metabolism 496 458 3 9.0E-88 88% 94%* 283778000 Bacteria Planctomycetes Planctomycetia glutamate synthase (NADH) [Pirellula staleyi DSM 6068] GKJWQY101BX3GV K01654 N-acetylneuraminate synthase [EC:2.5.1.56] Amino sugar and nucleotide sugar metabolism 373 21 371 2.0E-27 53% 66%* 163758286 Bacteria Proteobacteria Alphaproteobacteria N-acetylneuraminic acid synthetase [Hoeflea phototrophica DFL-43] imidazole glycerol phosphate synthase, glutamine amidotransferase subunit [Hoeflea phototrophica GKJWQY101BB17C K02501 glutamine amidotransferase [EC:2.4.2.-] Histidine metabolism 336 335 3 2.0E-44 70% 76%* 163758289 Bacteria Proteobacteria 
Alphaproteobacteria DFL-43] GKJWQY101ARU2K K07516 3-hydroxyacyl-CoA dehydrogenase [EC:1.1.1.35] Carbon fixation pathways in prokaryotes 425 1 396 5.0E-41 70% 79%* 359790820 Bacteria Proteobacteria Alphaproteobacteria 3-hydroxyacyl-CoA dehydrogenase NAD-binding protein [Mesorhizobium alhagi CCNWXJ12-2] bifunctional 2',3'-cyclic nucleotide 2'-phosphodiesterase/3'-nucleotidase periplasmic precursor protein GKJWQY101BQ2U_2 K01119 2',3'-cyclic-nucleotide 2'-phosphodiesterase [EC:3.1.4.16] Purine metabolism/Pyrimidine metabolism 249 242 3 1.0E-35 86% 90%* 359790610 Bacteria Proteobacteria Alphaproteobacteria [Mesorhizobium alhagi CCNWXJ12-2] GKJWQY101AL5GX K07154 serine/threonine-protein kinase HipA [EC:2.7.11.1] Transferring phosphorus-containing groups 406 405 1 2.0E-70 79% 88%* 357027749 Bacteria Proteobacteria Alphaproteobacteria HipA N-terminal domain-containing protein [Mesorhizobium amorphae CCNWGS0123] GKJWQY101BFOCB K00108 choline dehydrogenase [EC:1.1.99.1] Glycine, serine and threonine metabolism 347 1 345 3.0E-60 82% 90%* 433773923 Bacteria Proteobacteria Alphaproteobacteria choline dehydrogenase-like flavoprotein [Mesorhizobium australicum WSM2073] GKJWQY101ANA9N K11004 ATP-binding cassette, subfamily B, bacterial HlyB/CyaB ABC transporters 491 1 489 3.0E-74 91% 95%* 337270029 Bacteria Proteobacteria Alphaproteobacteria type I secretion system ATPase [Mesorhizobium opportunistum WSM2075] GKJWQY101BQJK1 K11003 hemolysin D Bacterial secretion system 464 3 464 2.0E-72 73% 84%* 337270028 Bacteria Proteobacteria Alphaproteobacteria HlyD family type I secretion membrane fusion protein [Mesorhizobium opportunistum WSM2075] GKJWQY101BEEY9 K14977 ureidoglycine aminohydrolase [EC:3.5.3.-] Purine metabolism 374 3 356 2.0E-69 89% 92%* 220914710 Bacteria Proteobacteria Alphaproteobacteria hypothetical protein [Methylobacterium nodulans ORS 2060] GKJWQY101BXGRA K09461 anthraniloyl-CoA monooxygenase [EC:1.14.13.40] Aminobenzoate degradation 242 240 1 1.0E-41 85% 93%* 
440227685 Bacteria Proteobacteria Alphaproteobacteria 2-amninobenzoyl-CoA monooxygenase/reductase [Rhizobium tropici CIAT 899] GKJWQY101AXQRA K11074 putrescine transport system permease protein ABC transporters 469 22 468 9.0E-43 54% 71%* 294676929 Bacteria Proteobacteria Alphaproteobacteria polyamine ABC transporter permease PotC [Rhodobacter capsulatus SB 1003] GKJWQY101AZ7ML K03551 holliday junction DNA helicase RuvB [EC:3.6.4.12] Homologous recombination 336 325 2 2.0E-56 78% 90%* 83592425 Bacteria Proteobacteria Alphaproteobacteria Holliday junction DNA helicase RuvB [Rhodospirillum rubrum ATCC 11170] GKJWQY101A9B69 K00067 dTDP-4-dehydrorhamnose reductase [EC:1.1.1.133] Polyketide sugar unit biosynthesis/Streptomycin biosynthesis 457 5 457 5.0E-60 58% 76%* 15892379 Bacteria Proteobacteria Alphaproteobacteria dTDP-4-dehydrorhamnose reductase. [ str. Malish 7] GKJWQY101AHO0M K00370 nitrate reductase 1, alpha subunit [EC:1.7.99.4] Nitrogen metabolism/Two-component system 130 1 129 2.0E-13 79% 83%* 86137746 Bacteria Proteobacteria Alphaproteobacteria respiratory nitrate reductase, alpha subunit [Roseobacter sp. MED193] GKJWQY101BUN1T K01919 glutamate--cysteine ligase [EC:6.3.2.2] Glutathione metabolism 150 148 8 2.0E-23 98% 100%* 393719433 Bacteria Proteobacteria Alphaproteobacteria glutamate--cysteine ligase [Sphingomonas echinoides ATCC 14820] GKJWQY101A4KSO K01181 endo-1,4-beta-xylanase [EC:3.2.1.8] Glycosidases[hydrolyse O- and S-glycosyl] 379 3 275 6.0E-33 65% 75%* 393718930 Bacteria Proteobacteria Alphaproteobacteria endo-1,4-beta-xylanase [Sphingomonas echinoides ATCC 14820] GKJWQY101B0W4V K02065 putative ABC transport system ATP-binding protein Putative ABC transport system 341 341 3 3.0E-49 74% 76%* 94498321 Bacteria Proteobacteria Alphaproteobacteria ABC transporter, ATP-binding protein [Sphingomonas sp. 
SKA58] GKJWQY101BKTPU K11070 spermidine/putrescine transport system permease protein ABC transporters 416 404 3 2.0E-53 100% 100%* 116628226 Bacteria Proteobacteria Alphaproteobacteria spermidine/putrescine ABC transporter, permease protein [Loktanella vestfoldensis SKA53] GKJWQY101BF22D K01243 S-adenosylhomocysteine/5'-methylthioadenosine nucleosidase [EC:3.2.2.9] Cysteine and methionine metabolism 249 249 7 2.0E-29 68% 81%* 351728658 Bacteria Proteobacteria Betaproteobacteria mta/sah nucleosidase [Acidovorax radicis N35] GKJWQY101A11NA K02111 F-type H+-transporting ATPase subunit alpha [EC:3.6.3.14] Oxidative phosphorylation/Photosynthesis 184 184 2 4.0E-34 98% 100%* 402569635 Bacteria Proteobacteria Betaproteobacteria F0F1 ATP synthase subunit alpha [Burkholderia cepacia GG4] GKJWQY101AEBTJ K00619 amino-acid N-acetyltransferase [EC:2.3.1.1] Arginine and proline metabolism 447 447 1 6.0E-48 60% 73%* 407712609 Bacteria Proteobacteria Betaproteobacteria amino-acid N-acetyltransferase [Burkholderia phenoliruptrix BR3459a] two-component system, OmpR family, heavy metal sensor histidine kinase CusS GKJWQY101BRBAH K07644 Two-component system 418 413 3 8.0E-41 57% 67%* 78063616 Bacteria Proteobacteria Betaproteobacteria heavy metal sensor signal transduction histidine kinase [Burkholderia sp. 383] [EC:2.7.13.3] GKJWQY101BPZWT K09758 aspartate 4-decarboxylase [EC:4.1.1.12] Alanine, aspartate and glutamate metabolism 373 372 1 2.0E-50 70% 80%* 385206715 Bacteria Proteobacteria Betaproteobacteria aspartate 4-decarboxylase [Burkholderia sp. Ch1-1] GKJWQY101ABF0Q K00433 chloride peroxidase [EC:1.11.1.10] Oxidoreductases [Acting on a peroxide as acceptor] 374 374 3 2.0E-81 98% 99%* 387903695 Bacteria Proteobacteria Betaproteobacteria Non-heme chloroperoxidase [Burkholderia sp. 
KJ006] GKJWQY101BP6MS K07798 Cu(I)/Ag(I) efflux system membrane protein CusB Two-component system 384 378 1 1.0E-52 78% 82%* 413959210 Bacteria Proteobacteria Betaproteobacteria copper efflux system membrane protein [Burkholderia sp. SJ98] GKJWQY101B1X48 K07787 Cu(I)/Ag(I) efflux system membrane protein CusA Two-component system 484 1 483 1.0E-100 98% 97%* 416909337 Bacteria Proteobacteria Betaproteobacteria CzcA family heavy metal efflux protein, partial [Burkholderia sp. TJI49] GKJWQY101BN6IV_2 polar amino acid ABC transporter inner membrane subunit ABC transporters 189 187 2 1.0E-19 65% 83%* 390570584 Bacteria Proteobacteria Betaproteobacteria polar amino acid ABC transporter inner membrane subunit [Burkholderia terrae BS001] GKJWQY101BN6IV K02041 phosphonate transport system ATP-binding protein ABC transporters 210 209 3 2.0E-33 84% 94%* 390570583 Bacteria Proteobacteria Betaproteobacteria ABC transporter [Burkholderia terrae BS001] GKJWQY101BGCC7 K01173 endonuclease G, mitochondrial Apoptosis 359 358 2 9.0E-45 61% 75%* 445497248 Bacteria Proteobacteria Betaproteobacteria DNA/RNA non-specific endonuclease [Janthinobacterium sp. HH01] GKJWQY101AQ218 K00666 fatty-acyl-CoA synthase [EC:6.2.1.-] Lipid biosynthesis [Ligases forming carbon-sulfur bonds] 388 2 338 2.0E-31 50% 59%* 73541092 Bacteria Proteobacteria Betaproteobacteria long-chain-fatty-acid--CoA ligase [Ralstonia eutropha JMP134] hypothetical protein HMPREF0989_00609 [Ralstonia sp. 
5_2_56FAA] note = ABC-type GKJWQY101BS60U K02051 sulfonate/nitrate/taurine transport system substrate-binding protein NitT/TauT family transport system 356 3 356 1.0E-72 97% 99%* 404395813 Bacteria Proteobacteria Betaproteobacteria nitrate/sulfonate/bicarbonate transport systems, periplasmic components two-component system, OmpR family, phosphate regulon sensor histidine kinase GKJWQY101BGPT5 K07636 Two-component system 386 1 384 3.0E-82 98% 99%* 309778701 Bacteria Proteobacteria Betaproteobacteria sensor protein CzcS [Ralstonia sp. 5_7_47FAA] PhoR [EC:2.7.13.3] hypothetical protein DESPIG_01487 [Desulfovibrio piger ATCC 29098] (note: D-cysteine GKJWQY101ABL6A K05396 D-cysteine desulfhydrase [EC:4.4.1.15] Cysteine and methionine metabolism 372 1 372 2.0E-28 50% 64%* 212703443 Bacteria Proteobacteria Deltaproteobacteria desulfhydrase; Validated) GKJWQY101BSHB9 K01153 type I , R subunit [EC:3.1.21.3] Endodeoxyribonucleases producing 5'-phosphomonoesters 399 394 2 3.0E-54 68% 81%* 404494602 Bacteria Proteobacteria Deltaproteobacteria type I restriction-modification system, restriction subunit [Pelobacter carbinolicus DSM 2380]

GKJWQY101BOU8W_2 K02046 sulfate transport system permease protein ABC transporters 264 263 3 2.0E-51 100% 100%* 71065631 Bacteria Proteobacteria Gammaproteobacteria ABC sulfate/thiosulfate transporter inner membrane protein CysT [Psychrobacter arcticus 273-4] GKJWQY101AHO0M_2 K00370 nitrate reductase 1, alpha subunit [EC:1.7.99.4] Nitrogen metabolism/Two-component system 304 7 294 1.0E-54 91% 98%* 262374579 Bacteria Proteobacteria Gammaproteobacteria nitrate reductase, alpha subunit [Acinetobacter junii SH205] GKJWQY101BWY0I K01669 deoxyribodipyrimidine photo-lyase [EC:4.1.99.3] DNA repair and recombination proteins 139 2 124 9.0E-11 81% 88%* 410861788 Bacteria Proteobacteria Gammaproteobacteria deoxyribodipyrimidine photo-lyase [ macleodii AltDE1] GKJWQY101AF0JM K01273 membrane dipeptidase [EC:3.4.13.19] Family M19: membrane dipeptidase family 486 486 1 7.0E-98 66% 84%* 226944628 Bacteria Proteobacteria Gammaproteobacteria peptidase M19 [Azotobacter vinelandii DJ] GKJWQY101A4OKZ K00097 4-hydroxythreonine-4-phosphate dehydrogenase [EC:1.1.1.262] Vitamin B6 metabolism 234 6 233 8.0E-44 99% 98%* 352106048 Bacteria Proteobacteria Gammaproteobacteria 4-hydroxythreonine-4-phosphate dehydrogenase [Halomonas sp. HAL1] alanine-glyoxylate transaminase / serine-glyoxylate transaminase / serine-pyruvate Alanine, aspartate and glutamate metabolism/Glycine, serine and threonine metabolism/Glyoxylate Serine--glyoxylate aminotransferase [Marinobacter sp. 
BSs20148], alanine-glyoxylate GKJWQY101A4RPZ K00830 319 2 316 8.0E-60 90% 92%* 399543417 Bacteria Proteobacteria Gammaproteobacteria transaminase [EC:2.6.1.51 2.6.1.45 2.6.1.44] and dicarboxylate metabolism/Methane metabolism aminotransferase (AGAT) family GKJWQY101ASOB_2 K00281 glycine dehydrogenase [EC:1.4.4.2] Glycine, serine and threonine metabolism 96 1 96 2.0E-09 81% 93%* 89093021 Bacteria Proteobacteria Gammaproteobacteria glycine dehydrogenase [Neptuniibacter caesariensis] two-component system, OmpR family, osmolarity sensor histidine kinase EnvZ GKJWQY101AOVFW K07638 Two-component system 244 243 1 5.0E-47 93% 97%* 400287352 Bacteria Proteobacteria Gammaproteobacteria histidine kinase [Psychrobacter sp. PAMC 21119] note = osmolarity sensor protein [EC:2.7.13.3] proline dehydrogenase / delta 1-pyrroline-5-carboxylate dehydrogenase bifunctional proline dehydrogenase/pyrroline-5-carboxylate dehydrogenase [Psychrobacter sp. PAMC GKJWQY101ALPLG K13821 Alanine, aspartate and glutamate metabolism 480 2 478 2.0E-93 92% 95%* 400288736 Bacteria Proteobacteria Gammaproteobacteria [EC:1.5.1.12 1.5.99.8] 21119] GKJWQY101APEXF K02013 iron complex transport system ATP-binding protein [EC:3.6.3.34] ABC transporters 258 2 256 3.0E-34 76% 84%* 400286648 Bacteria Proteobacteria Gammaproteobacteria ABC transporter-like protein [Psychrobacter sp. PAMC 21119] Glycine, serine and threonine metabolism/Arginine and proline metabolism/Histidine GKJWQY101AVAMV K00274 monoamine oxidase [EC:1.4.3.4] metabolism/Tyrosine metabolism/Phenylalanine metabolism/Tryptophan metabolism/Isoquinoline 300 2 295 9.0E-43 93% 96%* 400287901 Bacteria Proteobacteria Gammaproteobacteria L-amino acid oxidase [Psychrobacter sp. 
PAMC 21119] alkaloid biosynthesis GKJWQY101AOGKN K00412 ubiquinol-cytochrome c reductase cytochrome b subunit [EC:1.10.2.2] Oxidative phosphorylation/Nitrogen metabolism/Two-component system 335 333 22 8.0E-38 95% 96%* 400288130 Bacteria Proteobacteria Gammaproteobacteria cytochrome b/b6-like protein [Psychrobacter sp. PAMC 21119] GKJWQY101BP7PB K00604 methionyl-tRNA formyltransferase [EC:2.1.2.9] One carbon pool by folate/Aminoacyl-tRNA biosynthesis 265 209 3 2.0E-34 90% 97%* 400287034 Bacteria Proteobacteria Gammaproteobacteria methionyl-tRNA formyltransferase [Psychrobacter sp. PAMC 21119] pyrimidine operon attenuation protein / uracil phosphoribosyltransferase GKJWQY101AOB8M K02825 Pyrimidine metabolism 234 6 233 1.0E-20 60% 69%* 223936078 Bacteria Verrucomicrobia Verrucomicrobiae Uracil phosphoribosyltransferase [Pedosphaera parvula Ellin514] [EC:2.4.2.9] GKJWQY101A5QWZ K01881 prolyl-tRNA synthetase [EC:6.1.1.15] Aminoacyl-tRNA biosynthesis 362 2 352 2.0E-17 43% 62%* 260947818 Eukaryota Ascomycota Saccharomycotina hypothetical protein CLUG_01665 [Clavispora lusitaniae ATCC 42720] hypothetical protein MGL_0809 [Malassezia globosa CBS 7966] note = oligosaccharyltransferase 48 GKJWQY101A3PTQ K12670 oligosaccharyltransferase complex subunit beta Protein processing in endoplasmic reticulum 383 382 2 7.0E-59 72% 86%* 164662188 Eukaryota Basidiomycota Ustilaginomycotina kDa subunit beta; pfam03345 GKJWQY101BPKHW K14018 phospholipase A-2-activating protein Protein processing in endoplasmic reticulum 439 437 3 5.0E-48 62% 72%* 164657618 Eukaryota Basidiomycota Ustilaginomycotina hypothetical protein MGL_2921 [Malassezia globosa CBS 7966] hypothetical protein MGL_0312 [Malassezia globosa CBS 7966] note = 1,3-beta-glucan synthase GKJWQY101AXJYB K00706 1,3-beta-glucan synthase [EC:2.4.1.34] Starch and sucrose metabolism 397 384 16 3.0E-72 91% 92%* 164662831 Eukaryote Basidiomycota Exobasidiomycetes component Pentose and glucuronate interconversions/Ascorbate and aldarate 
metabolism/Amino sugar and GKJWQY101BVVMM K00012 UDPglucose 6-dehydrogenase [EC:1.1.1.22] 278 15 269 1.0E-47 93% 96%* 164657686 Eukaryote Basidiomycota Ustilaginomycotina hypothetical protein MGL_2955 [Malassezia globosa CBS 7966] nucleotide sugar metabolism/Starch and sucrose metabolism

Table S12. Blastn and Blastx results from analysis of V6 sequences on the KAAS KEGG site (Moriya et al. 2007). The searches were for highly similar sequences (megablast); Max. target sequences = 100; Expected threshold = 1e-10 (unless no results were found, then 0); filter low complexity regions. Translated nucleotide searches were run against the Reference sequence protein database; Matrix = BLOSUM62; Scoring parameters (existence 11; extension 1); filter low complexity regions. Taxonomic affiliations that were not found in NCBI GenBank are marked "n". For the Blastx results, the values given are percent positives rather than percent similarity; these are marked with "*".

454 Sequence number | KAAS KEGG Orthology # | Enzyme name | Pathways Names | Q length | Q start | Q end | e-value | %-ident | %-simil | GI number | Domain | Phyla | Class / Order | Description
GJDB4OT01BSP4O | K01903 | succinyl-CoA synthetase beta subunit [EC:6.2.1.5] | Citrate cycle (TCA cycle)/Propanoate metabolism/C5-Branched dibasic acid metabolism/Carbon fixation pathways in prokaryotes | 176 | 1 | 174 | 1.00E-76 | 97% | 97% | 299065054 | Bacteria | Proteobacteria | Betaproteobacteria | Ralstonia solanacearum str. CMR15 chromosome, complete genome, product = succinate-CoA ligase (ADP-forming) beta subunit
GJDB4OT01BKJYT | K01154 | type I restriction enzyme, S subunit [EC:3.1.21.3] | Endodeoxyribonucleases | 122 | 1 | 122 | 3.00E-51 | 98% | 98% | 407894523 | Bacteria | Proteobacteria | Betaproteobacteria | Acidovorax sp. KKS102, complete genome, product = restriction modification system DNA specificity subunit
GJDB4OT01ARL6R | K00928 | aspartate kinase [EC:2.7.2.4] | Glycine, serine and threonine metabolism/Cysteine and methionine metabolism/Lysine biosynthesis | 280 | 1 | 280 | 4.00E-129 | 97% | 97% | 387575654 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 1, complete sequence, product = aspartokinase
GJDB4OT01A4BOH | K03933 | chitin-binding protein | n/a | 214 | 1 | 214 | 5.00E-82 | 93% | 93% | 77964193 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. 383 chromosome 3, complete sequence, product = chitin-binding protein
GJDB4OT01AZQW8 | K06891 | ATP-dependent Clp protease adaptor protein ClpS | n/a | 106 | 1 | 106 | 2.00E-33 | 92% | 92% | 334194119 | Bacteria | Proteobacteria | Betaproteobacteria | Ralstonia solanacearum Po82, complete genome, product = atp-dependent clp protease adaptor protein clps
GJDB4OT01BVZ09 | K07486 | Transposase and inactivated derivatives | n/a | 202 | 1 | 202 | 8.00E-31 | 79% | 79% | 387580705 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 3, complete sequence, product = transposase
GJDB4OT01BOSSW | K03702 | excinuclease ABC subunit B | Nucleotide excision repair | 252 | 1 | 252 | 3.00E-109 | 96% | 96% | 133737197 | Bacteria | Proteobacteria | Betaproteobacteria | Herminiimonas arsenicoxydans chromosome, complete sequence, product = UvrABC system protein B (Protein uvrB)(Excinuclease ABC subunit B)
GJDB4OT01BUG3G | K02275 | cytochrome c oxidase subunit II [EC:1.9.3.1] | Oxidative phosphorylation | 165 | 1 | 165 | 1.00E-69 | 97% | 97% | 387575654 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 1, complete sequence, product = cytochrome c oxidase polypeptide II
GJDB4OT01BUG3G_2 | K02275 | cytochrome c oxidase subunit II [EC:1.9.3.1] | Oxidative phosphorylation | 106 | 1 | 106 | 2.00E-47 | 100% | 100% | 387575654 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 1, complete sequence, product = cytochrome c oxidase polypeptide II
GJDB4OT01AOX6C | K02298 | cytochrome o ubiquinol oxidase subunit I [EC:1.10.3.-] | Oxidative phosphorylation | 181 | 1 | 181 | 1.00E-83 | 98% | 98% | 387575654 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 1, complete sequence, product = cytochrome O ubiquinol oxidase subunit I
GJDB4OT01BBC3K | K00615 | transketolase [EC:2.2.1.1] | Pentose phosphate pathway/Carbon fixation in photosynthetic organisms/Biosynthesis of ansamycins | 238 | 1 | 238 | 1.00E-117 | 99% | 99% | 356871503 | Eukaryota | Ascomycota | Saccharomycetes | Pichia sorbitophila strain CBS 7064 chromosome J complete sequence, product = Piso0_002788 (Transketolase similar to Tkl2p)
GJDB4OT01B2XJX | K01485 | cytosine deaminase [EC:3.5.4.1] | Pyrimidine metabolism/Arginine and proline metabolism | 138 | 1 | 138 | 7.00E-45 | 91% | 91% | 30407127 | Bacteria | Proteobacteria | Betaproteobacteria | Ralstonia solanacearum GMI1000 chromosome complete sequence, product = probable cytosine deaminase (cytosine aminohydrolase) protein
GJDB4OT01BHOMH | K02892 | large subunit ribosomal protein L23 | Ribosome | 271 | 1 | 271 | 2.00E-131 | 99% | 99% | 387575654 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia sp. KJ006 chromosome 1, complete sequence, product = LSU ribosomal protein L4p (L1e) (<1..28); LSU ribosomal protein L23p (L23Ae) (25..>269)
GJDB4OT01AZ705 | K03566 | LysR family transcriptional regulator, glycine cleavage system transcriptional activator | Transcription factors | 215 | 1 | 215 | 1.00E-77 | 92% | 92% | 187713229 | Bacteria | Proteobacteria | Betaproteobacteria | Burkholderia phytofirmans PsJN chromosome 1, complete sequence, product = transcriptional regulator, LysR family
GJDB4OT01AOVN9 | K01533 | Cu2+-exporting ATPase [EC:3.6.3.4] | Acting on acid anhydrides to catalyse transmembrane movement of substances | 174 | 1 | 174 | 1.00E-16 | 72% | 79%* | 209517887 | Bacteria | Proteobacteria | Betaproteobacteria | heavy metal translocating P-type ATPase [Burkholderia sp. H160]
GJDB4OT01AHOQ9 | K03818 | putative colanic acid biosynthesis acetyltransferase WcaF [EC:2.3.1.-] | Acyltransferases | 228 | 226 | 2 | 5.00E-26 | 67% | 75%* | 311107671 | Bacteria | Proteobacteria | Betaproteobacteria | colanic acid biosynthesis acetyltransferase WcaF [Achromobacter xylosoxidans A8]
GJDB4OT01A77FL | K01791 | UDP-N-acetylglucosamine 2-epimerase [EC:5.1.3.14] | Amino sugar and nucleotide sugar metabolism | 270 | 259 | 8 | 6.00E-32 | 74% | 77%* | 300313505 | Bacteria | Proteobacteria | Betaproteobacteria | UDP-N-acetylglucosamine 2-epimerase [Herbaspirillum seropedicae SmR1]
GJDB4OT01ALI0S | K01919 | glutamate--cysteine ligase [EC:6.3.2.2] | Glutathione metabolism | 174 | 1 | 174 | 7.00E-19 | 74% | 81%* | 319764755 | Bacteria | Proteobacteria | Betaproteobacteria | glutamate/cysteine ligase [Alicycliphilus denitrificans BC]
GJDB4OT01BVG95 | K07008 | Predicted glutamine amidotransferase | Histidine metabolism | 190 | 1 | 189 | 4.00E-32 | 81% | 96%* | 351728124 | Bacteria | Proteobacteria | Betaproteobacteria | glutamine amidotransferase class-II, partial [Acidovorax radicis N35]
GJDB4OT01A179C | K06942 | Predicted GTPase, probable translation factor | n/a | 122 | 121 | 2 | 2.00E-19 | 100% | 100%* | 223043362 | Bacteria | Firmicutes | Bacilli | GTP-binding protein YchF [Staphylococcus capitis SK14]
GJDB4OT01AO7LF | K02259 | cytochrome c oxidase assembly protein subunit 15 | Oxidative phosphorylation/Porphyrin and chlorophyll metabolism/Two-component system | 171 | 1 | 171 | 1.00E-23 | 81% | 92%* | 418245854 | Bacteria | Actinobacteria | Actinobacteria | cytochrome c oxidase subunit XV assembly protein [Corynebacterium glutamicum ATCC 14067]
GJDB4OT01ACOWR | K08483 | phosphotransferase system, enzyme I, PtsI [EC:2.7.3.9] | Phosphotransferase system | 186 | 1 | 186 | 3.00E-31 | 93% | 95%* | 149926016 | Bacteria | Proteobacteria | Betaproteobacteria | Phosphoenolpyruvate-protein phosphotransferase [Limnobacter sp. MED105]
GJDB4OT01BQTBQ | K07799 | putative multidrug efflux transporter MdtA | Two-component system | 259 | 3 | 254 | 1.00E-21 | 70% | 74%* | 417708018 | Bacteria | Proteobacteria | Gammaproteobacteria | efflux transporter, RND family, MFP subunit [Shigella flexneri VA-6]
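
The conventions in the Table S12 caption (an expected threshold of 1e-10, and a trailing "*" marking Blastx percent-positive values) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Hit`, `parse_percent`, `parse_hit`, and `passes_threshold` are hypothetical and are not part of the analysis pipeline used in this work, and the two example rows are copied from Table S12.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Hit:
    """One table row, reduced to the fields needed here (hypothetical type)."""
    seq_id: str
    evalue: float
    pct_ident: float
    pct_simil: float
    is_blastx: bool  # True when the similarity value carried a trailing "*"


def parse_percent(field: str) -> Tuple[float, bool]:
    """Return (value, starred) for fields such as '97%', '72' or '79%*'."""
    starred = field.endswith("*")
    value = float(field.rstrip("*").rstrip("%"))
    return value, starred


def parse_hit(seq_id: str, evalue: str, ident: str, simil: str) -> Hit:
    """Build a Hit from the raw text fields of one row."""
    ident_value, _ = parse_percent(ident)
    simil_value, starred = parse_percent(simil)
    return Hit(seq_id, float(evalue), ident_value, simil_value, starred)


def passes_threshold(hit: Hit, max_evalue: float = 1e-10) -> bool:
    """Keep a hit only if its e-value does not exceed the expected threshold."""
    return hit.evalue <= max_evalue


# Two rows taken from Table S12: one Blastn hit and one Blastx hit.
rows = [
    ("GJDB4OT01BSP4O", "1.00E-76", "97%", "97%"),
    ("GJDB4OT01AOVN9", "1.00E-16", "72", "79%*"),
]
hits = [parse_hit(*row) for row in rows]
kept = [h for h in hits if passes_threshold(h)]
```

Treating the "*" as a flag on the parsed record, rather than discarding it, keeps the distinction between percent similarity and percent positives available to later steps.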

Table S13. Sequences removed from the V5 data set because they were identical or similar to sequences from controls. Taxonomic affiliations that were not found in NCBI GenBank are marked "n".

Accession number / Q Q %- Q end e-value GI number Domain Phylum Family Genus / Species sequence number length start ident JQ997204 550 17 547 0 90% 262527504 Bacteria Actinobacteria Intrasporangiaceae Janibacter sp. N2M JQ997210 558 14 446 0 100% 14252975 Bacteria Actinobacteria Microbacteriaceae Clavibacter michiganensis JQ999635 554 18 508 0 95% 95117795 Bacteria Actinobacteria Microbacteriaceae Clavibacter michiganensis JQ997209 542 5 535 0 94% 254728761 Bacteria Actinobacteria Microbacteriaceae Clavibacter michiganensis GKJWQY101ABFJ2 584 18 583 0 95% 147829108 Bacteria Actinobacteria Microbacteriaceae Clavibacter michiganensis GKJWQY101BBWIZ 602 7 594 0 92% 169155030 Bacteria Actinobacteria Microbacteriaceae Clavibacter michiganensis GKJWQY101BOGGY 580 18 576 0 94% 227452846 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium aurimucosum GKJWQY101BNKEW 560 16 500 5.00E-164 89% 47118314 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium efficiens GKJWQY101AS6C1 579 17 578 0 89% 219857661 Bacteria Actinobacteria Micrococcaceae Arthrobacter chlorophenolicus GKJWQY101AGV6J 570 5 567 0 89% 116608677 Bacteria Actinobacteria Micrococcaceae Arthrobacter sp. FB24 JQ997264 554 5 554 0 99% 157703991 Bacteria Actinobacteria Micrococcaceae Micrococcus sp. 
SY-13 JQ997244 536 5 376 0 99% 53986985 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997241 378 5 324 1.00E-158 99% 108745470 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997249 561 5 558 0 91% 121078653 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997246 552 15 532 0 95% 209875195 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997247 558 156 491 5.00E-124 92% 219523974 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997248 559 5 554 0 100% 219809053 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997242 395 14 362 1.00E-178 99% 255683811 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997250 581 18 562 0 92% 284010036 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997243 533 45 531 0 95% 291170385 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus JQ997245 549 18 548 0 97% 295027175 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus GKJWQY101ANDPA 579 7 578 0 96% 239837778 Bacteria Actinobacteria Micrococcaceae Micrococcus luteus GKJWQY101BB4KB 552 2 534 0 92% 283133067 Bacteria Actinobacteria Micrococcaceae Rothia mucilaginosa JQ997271 545 21 542 0 100% 290759827 Bacteria Actinobacteria Micrococcaceae Rothia mucilaginosa GKJWQY101AYY7N 514 17 463 1.00E-149 89% 145213092 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium gilvum GKJWQY101AHIKR 502 14 449 3.00E-81 80% 126232413 Bacteria Actinobacteria Mycobacteriaceae Mycobacterium sp. 
JLS GKJWQY101BW78O 526 17 230 6.00E-83 93% 258553496 Bacteria Actinobacteria Nakamurellaceae Nakamurella multipartita GKJWQY101BNTM3 401 18 370 9.00E-145 93% 255918463 Bacteria Actinobacteria Actinosynnemataceae Actinosynnema mirum GKJWQY101B0WYX 571 17 571 0 88% 229564415 Bacteria Actinobacteria Beutenbergiaceae Beutenbergia cavernae GKJWQY101BWYS7 570 16 565 0 94% 256558041 Bacteria Actinobacteria Dermabacteraceae Brachybacterium faecium GKJWQY101BID96 589 24 584 0 93% 284061874 Bacteria Actinobacteria Geodermatophilaceae Geodermatophilus obscurus GKJWQY101B0W1Z 560 18 553 0 89% 262083393 Bacteria Actinobacteria Gordoniaceae Gordonia bronchialis GKJWQY101AS57B 556 5 548 0 89% 196121877 Bacteria Actinobacteria Kineosporiaceae Kineococcus radiotolerans GKJWQY101BURT9 573 18 561 0 90% 226237899 Bacteria Actinobacteria Nocardiaceae Rhodococcus opacus GKJWQY101AL5D7 537 17 483 0 97% 119534933 Bacteria Actinobacteria Nocardiaceae Nocardioides sp. JS614 GKJWQY101BJOST 547 8 546 0 98% 50839098 Bacteria Actinobacteria Propionibacteriaceae Propionibacterium acnes GKJWQY101BQS08 598 4 584 0 95% 291375051 Bacteria Actinobacteria Propionibacteriaceae Propionibacterium acnes GKJWQY101BYVMH 599 5 363 1.00E-114 88% 270504784 Bacteria Actinobacteria Streptosporangiaceae Streptosporangium roseum GKJWQY101A0TH4 586 17 583 0 95% 257472321 Bacteria Actinobacteria Coriobacteriaceae Atopobium parvulum GKJWQY101A3ZEU 584 18 577 1.00E-160 86% 256033965 Bacteria Bacteroidetes Chitinophagaceae Chitinophaga pinensis JQ997450 586 5 346 1.00E-164 98% 13774313 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997424 341 16 294 1.00E-127 97% 13774314 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997437 542 5 487 0 100% 19070784 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997442 549 18 543 0 93% 19070787 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997445 555 17 552 0 100% 19070788 Bacteria Cyanobacteria Phormidiaceae Microcoleus 
vaginatus JQ997444 553 1 500 0 96% 19070791 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997421 275 5 242 4.00E-87 92% 19070794 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997433 482 18 434 0 95% 19070795 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997425 361 19 292 1.00E-138 100% 19070798 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997438 542 12 539 0 94% 19070799 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997428 429 3 386 0 99% 19070802 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997431 445 18 410 0 99% 19070804 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997452 693 17 349 3.00E-152 97% 19070806 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997430 430 5 388 1.00E-124 89% 19070807 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997439 544 20 522 0 92% 19070809 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997432 464 18 417 0 96% 19070810 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997420 267 5 230 5.00E-96 95% 19070813 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997447 560 5 196 5.00E-89 98% 19070814 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997429 429 3 324 7.00E-146 96% 19070815 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997422 313 3 266 1.00E-132 100% 19070816 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997440 546 5 543 0 96% 19070817 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997423 330 18 306 2.00E-145 99% 19070818 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997427 413 7 364 2.00E-141 93% 19070819 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997441 548 5 543 0 94% 19070820 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus

JQ997426 368 18 331 8.00E-160 99% 57996791 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997435 536 5 508 0 99% 57996792 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997451 610 24 398 2.00E-163 95% 149364148 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997446 556 5 553 0 98% 149364157 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997436 538 23 537 0 93% 149364161 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997448 560 5 560 0 94% 149364212 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997449 564 4 561 0 94% 171474880 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997434 524 5 385 0 98% 260534371 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101AJSLD 166 5 151 8.00E-67 99% 19070792 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101ALAE0 352 84 244 2.00E-71 98% 19070793 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101BAG81 118 3 106 2.00E-46 100% 19070796 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101BPKV7 194 20 163 1.00E-66 99% 19070797 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101BSRBZ 199 18 169 4.00E-71 99% 19070801 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101BYSFG 120 14 88 1.00E-28 99% 19070808 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus GKJWQY101A7XVF 188 5 144 3.00E-66 100% 149364147 Bacteria Cyanobacteria Phormidiaceae Microcoleus vaginatus JQ997463 552 21 549 0 93% 167508130 Bacteria Cyanobacteria Phormidiaceae Phormidium amoenum JQ998746 238 13 238 2.00E-105 98% 46409893 Bacteria Cyanobacteria Oscillatoriaceae Oscillatoria JQ997394 642 5 635 0 91% 225696244 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A8PWD_2 97 1 97 1.00E-42 100% 225696243 Bacteria Cyanobacteria n uncultured cyanobacterium GKJWQY101A8PV9 556 5 555 0 91% 212559329 Bacteria Firmicutes Bacillaceae Anoxybacillus flavithermus 
GKJWQY101AEF0G 232 5 135 4.00E-61 100% 294987211 Bacteria Firmicutes Bacillaceae Anoxybacillus flavithermus GKJWQY101A52X2 566 10 522 0 93% 239805877 Bacteria Firmicutes Bacillaceae Geobacillus sp. WCH70 GKJWQY101BMIJM 565 1 560 0 95% 168990106 Bacteria Firmicutes Bacillaceae Lysinibacillus sphaericus GKJWQY101A1C8O 551 23 498 0 92% 291482254 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101BEI1M 547 24 521 0 93% 291482099 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101BG2WT 586 5 580 0 92% 291482100 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101BAWHO 552 4 531 0 91% 291482251 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101B14ZL 587 5 585 0 90% 291482250 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101A7XRP 573 18 570 0 92% 115249003 Bacteria Firmicutes Clostridiaceae Clostridium difficile GKJWQY101AQWZZ 574 18 571 0 89% 295102939 Bacteria Firmicutes Clostridiaceae Faecalibacterium prausnitzii GKJWQY101AGCFL 593 5 590 0 91% 291537741 Bacteria Firmicutes Lachnospiraceae Roseburia intestinalis GKJWQY101AKNEB 590 18 104 7.00E-28 94% 260066140 Bacteria Firmicutes Staphylococcaceae Staphylococcus aureus GKJWQY101A4B8O 578 5 565 0 88% 149944932 Bacteria Firmicutes Staphylococcaceae Staphylococcus aureus GKJWQY101BM94T 530 5 527 0 89% 47118312 Bacteria Firmicutes Staphylococcaceae Staphylococcus aureus GKJWQY101BOSAK 552 5 552 0 97% 9664721 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101BO8W7 564 17 560 0 99% 9664737 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101AO1AU 556 18 534 0 99% 9664799 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101A7HX3 540 18 538 3.00E-180 89% 9623643 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101ALP91 553 18 553 0 100% 9624258 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101BUHRL 539 33 440 6.00E-168 93% 27316888 
Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101BOC3N 562 24 533 0 97% 9664635 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101BML37 564 17 530 0 99% 9664791 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis JQ997602 218 5 149 1.00E-45 91% 223016892 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis JQ999663 489 3 350 5.00E-178 99% 31044171 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis JQ999664 507 18 431 0 100% 31044172 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis JQ999662 437 5 393 0 99% 31044173 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis JQ999661 344 5 267 4.00E-128 99% 213688819 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis GKJWQY101BLWUI 529 18 474 0 97% 295029968 Bacteria Firmicutes Lactobacillaceae Lactobacillus crispatus JQ997626 290 17 199 3.00E-53 89% 285198791 Bacteria Firmicutes Lactobacillaceae Lactobacillus crispatus GKJWQY101BFEY2 607 18 605 0 99% 160347623 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101A9I15 579 5 571 0 95% 111610219 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AMEIU 538 18 536 0 95% 112148580 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101ANUFG 478 5 472 0 100% 13398532 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AWVPF 572 18 569 0 94% 157272205 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AJJAJ 390 18 357 5.00E-177 100% 111610285 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101BYWHX 641 17 355 4.00E-140 94% 111610140 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101ATCEK 553 1 549 0 100% 3282340 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101B1CMN 543 4 542 0 98% 112148551 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101A41CM 532 18 505 0 94% 157272234 
Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101ACW54 527 4 524 0 100% 111610264 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101ARFS8 557 18 357 7.00E-167 98% 133917173 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101BSMTE 558 18 554 0 96% 111610178 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AE8IV 527 18 523 0 99% 157272211 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AGVG5 509 19 503 0 97% 2687734 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101AD5HW 559 78 517 0 98% 111610125 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101A4URK 517 18 516 0 97% 157272233 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101BJRWH 560 16 553 0 98% 157272216 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101BJ0YC 532 5 460 0 100% 111610270 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997639 332 17 288 4.00E-132 99% 6537242 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus

JQ997638 246 28 198 3.00E-73 97% 30060365 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997643 428 17 390 0 100% 53766373 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997650 554 15 538 0 91% 57231848 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997641 391 18 339 2.00E-165 100% 77681103 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997648 533 5 531 0 99% 121581962 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997642 407 16 346 5.00E-157 97% 168208498 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997645 526 23 522 0 99% 224924260 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997646 528 21 522 0 99% 225029237 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997640 370 18 342 2.00E-150 97% 226815166 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997649 535 18 531 0 100% 239586136 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997647 532 17 529 0 99% 270513917 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus JQ997644 446 17 407 0 100% 290784156 Bacteria Firmicutes Lactobacillaceae Lactobacillus helveticus GKJWQY101B1IA6 116 3 44 5.00E-12 100% 285801734 Bacteria Firmicutes Streptococcaceae Streptococcus mitis GKJWQY101AKWUI 575 16 574 0 91% 288906474 Bacteria Firmicutes Streptococcaceae Streptococcus mitis JQ997694 532 18 527 0 98% 262286142 Bacteria Firmicutes Streptococcaceae Streptococcus mitis JQ997717 387 17 353 3.00E-99 88% 285195837 Bacteria Firmicutes Streptococcaceae Streptococcus sanguinis GKJWQY101AODH5 587 5 587 0 95% 125496804 Bacteria Firmicutes Streptococcaceae Streptococcus sanguinis JQ997718 740 18 370 0 100% 295002597 Bacteria Firmicutes Streptococcaceae Streptococcus sanguinis JQ999700 546 5 546 0 99% 45597365 Bacteria Firmicutes Streptococcaceae Streptococcus sanguinis GKJWQY101BCX33 567 18 560 0 99% 24473733 Bacteria Firmicutes Streptococcaceae Streptococcus 
thermophilus GKJWQY101BGY7W 543 24 521 0 94% 154424882 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101BXHH4 536 5 483 0 100% 46019822 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101ANZOM 524 18 517 0 96% 221047219 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101AG25G 267 3 204 2.00E-100 100% 284080586 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101BB72C 419 4 354 1.00E-173 98% 90655828 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101AWY8E 592 27 306 2.00E-69 84% 15485427 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101AY5Z9 543 18 540 0 99% 6708106 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101AXEED 594 18 590 0 98% 55737978 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101A4EQD 585 5 579 0 97% 55736088 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus GKJWQY101BO375 620 18 609 0 92% 116100249 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ999701 524 17 440 0 100% 288525 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997733 530 18 492 0 95% 152002890 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997732 508 16 455 0 99% 157400510 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997730 434 18 309 3.00E-150 100% 162945243 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997728 368 21 313 6.00E-151 100% 187475307 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997734 542 17 541 0 100% 225029151 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997736 549 5 520 0 99% 253720698 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997731 491 17 430 0 100% 254305414 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997737 551 4 547 0 100% 268619092 Bacteria Firmicutes Streptococcaceae Streptococcus 
thermophilus JQ997729 425 9 376 0 99% 285803102 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ997735 545 4 545 0 100% 292673279 Bacteria Firmicutes Streptococcaceae Streptococcus thermophilus JQ999660 571 6 561 0 89% 296416 Bacteria Firmicutes Planococcaceae Sporosarcina globispora GKJWQY101B0W4V 543 341 488 9.00E-12 77% 90103542 Bacteria Proteobacteria Alphaproteobacteria Rhodopseudomonas palustris JQ999360 560 19 554 0 97% 67527215 Bacteria Proteobacteria Moraxellaceae GKJWQY101AXN46 232 59 183 9.00E-53 98% 168192641 Bacteria Proteobacteria Methylobacteriaceae Methylobacterium sp. 4-46 GKJWQY101A3454 419 55 374 8.00E-41 77% 159140696 Bacteria Proteobacteria Rhizobiaceae Agrobacterium tumefaciens GKJWQY101BEXW8 522 288 501 8.00E-27 78% 221721649 Bacteria Proteobacteria Rhizobiaceae Agrobacterium tumefaciens GKJWQY101A44CM 474 6 420 2.00E-88 81% 115259848 Bacteria Proteobacteria Rhizobiaceae Rhizobium leguminosarum GKJWQY101AA9UB 528 33 448 2.00E-83 81% 115254414 Bacteria Proteobacteria Rhizobiaceae Rhizobium leguminosarum GKJWQY101BSXFZ 528 349 470 5.00E-24 85% 209533368 Bacteria Proteobacteria Rhizobiaceae Rhizobium leguminosarum GKJWQY101A1E9L 514 6 509 2.00E-148 86% 240856645 Bacteria Proteobacteria Rhizobiaceae Rhizobium leguminosarum JQ999799 581 18 577 0 90% 2244633 Bacteria Proteobacteria Caulobacteraceae Brevundimonas diminuta GKJWQY101A9F0H 544 336 471 7.00E-23 82% 295429362 Bacteria Proteobacteria Caulobacteraceae Caulobacter segnis GKJWQY101A9VAR 532 208 508 9.00E-37 77% 84785911 Bacteria Proteobacteria Erythrobacteraceae Erythrobacter litoralis GKJWQY101B2TF4 563 1 515 0 95% 148498119 Bacteria Proteobacteria Sphingomonadaceae Sphingomonas wittichii GKJWQY101BOJM6 364 106 331 1.00E-83 92% 163258032 Bacteria Proteobacteria Alcaligenaceae GKJWQY101BQ54I 256 18 211 1.00E-92 99% 171994659 Bacteria Proteobacteria Burkholderiaceae Burkholderia ambifaria GKJWQY101BYI13 513 26 510 8.00E-97 81% 115283258 Bacteria Proteobacteria 
Burkholderiaceae Burkholderia ambifaria GKJWQY101BEI1O 536 18 534 0 95% 171998010 Bacteria Proteobacteria Burkholderiaceae Burkholderia ambifaria GKJWQY101BRVHV 506 3 491 0 91% 115280044 Bacteria Proteobacteria Burkholderiaceae Burkholderia ambifaria GKJWQY101A440F 528 9 525 1.00E-159 87% 190714216 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101BPELL 514 42 514 0 97% 116652879 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101BZA3A 566 5 562 0 99% 169820555 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101BWFJ6 528 156 524 4.00E-159 95% 116649273 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101ALL88 537 32 498 1.00E-154 89% 190714214 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101A2LFA 523 18 523 0 92% 169814598 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101AO2I9 584 18 580 0 93% 190714218 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101BL5T8 548 24 545 0 98% 169817759 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia GKJWQY101BP3WG 502 5 499 0 100% 190714220 Bacteria Proteobacteria Burkholderiaceae Burkholderia cenocepacia

GKJWQY101BB8D5 506 22 462 4.00E-99 82% 189336000 Bacteria Proteobacteria Burkholderiaceae Burkholderia multivorans GKJWQY101BJXXU 557 260 555 2.00E-137 97% 189338899 Bacteria Proteobacteria Burkholderiaceae Burkholderia multivorans GKJWQY101A2NI2 512 214 474 1.00E-34 78% 189332915 Bacteria Proteobacteria Burkholderiaceae Burkholderia multivorans GKJWQY101AY842 526 5 519 0 98% 189338131 Bacteria Proteobacteria Burkholderiaceae Burkholderia multivorans JQ999294 563 18 556 0 94% 290457127 Bacteria Proteobacteria Burkholderiaceae Burkholderia vietnamiensis GKJWQY101A1LOF 527 6 520 0 97% 134134073 Bacteria Proteobacteria Burkholderiaceae Burkholderia vietnamiensis GKJWQY101ACN2B 554 9 552 0 99% 134137285 Bacteria Proteobacteria Burkholderiaceae Burkholderia vietnamiensis GKJWQY101A02CY 512 5 511 0 99% 134132180 Bacteria Proteobacteria Burkholderiaceae Burkholderia vietnamiensis GKJWQY101B056W 577 4 572 0 93% 134135188 Bacteria Proteobacteria Burkholderiaceae Burkholderia vietnamiensis GKJWQY101AEMEY 513 16 513 0 96% 28974940 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101ARIRG 539 39 414 3.00E-146 92% 601939 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101ALG7X 519 7 518 0 92% 240863652 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101APKXX 485 3 480 0 92% 240867064 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101BDQBV 545 20 544 0 99% 187724002 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101AAZQK 550 5 541 0 91% 240868245 Bacteria Proteobacteria Ralstoniaceae Ralstonia pickettii GKJWQY101A3210 489 17 431 0 96% 124257968 Bacteria Proteobacteria Comamonadaceae Methylibium petroleiphilum GKJWQY101BPNEG 528 24 468 0 100% 262206648 Bacteria Proteobacteria Comamonadaceae Comamonas testosteroni GKJWQY101BJ06O 538 17 534 0 98% 239799596 Bacteria Proteobacteria Comamonadaceae Variovorax paradoxus GKJWQY101BNV97 515 55 512 1.00E-178 92% 170774137 Bacteria Proteobacteria Comamonadaceae 
Leptothrix cholodnii GKJWQY101AP1J4 648 9 306 4.00E-61 84% 190010013 Bacteria Proteobacteria Xanthomonadaceae Stenotrophomonas maltophilia GKJWQY101AXBCM 557 18 557 0 95% 288887617 Bacteria Proteobacteria Enterobacteriaceae Klebsiella variicola GKJWQY101BYDKG 535 19 533 0 97% 260447279 Bacteria Proteobacteria Enterobacteriaceae Escherichia coli GKJWQY101BGYZ8 410 5 361 0 100% 294489418 Bacteria Proteobacteria Enterobacteriaceae Escherichia coli GKJWQY101AVWVM 539 5 532 0 90% 290760697 Bacteria Proteobacteria Enterobacteriaceae Escherichia coli GKJWQY101BUK33 498 5 237 1.00E-79 90% 295054830 Bacteria Proteobacteria Enterobacteriaceae GKJWQY101BJ7OH 517 18 516 0 100% 295059951 Bacteria Proteobacteria Enterobacteriaceae Enterobacter cloacae JQ999405 609 13 64 5.00E-15 98% 215435094 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas aeruginosa GKJWQY101ADACY 576 7 567 0 90% 218768969 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas aeruginosa JQ999404 428 47 364 5.00E-133 94% 285028782 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas aeruginosa GKJWQY101AGCOW 420 5 240 3.00E-100 95% 150958624 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas aeruginosa GKJWQY101A6JM9 537 23 508 0 99% 95101722 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas entomophila GKJWQY101A34YN 233 25 195 6.00E-65 93% 7546742 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101BV9BO 477 5 414 0 98% 68342549 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101AYR98 500 18 492 2.00E-127 85% 171705315 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101A1SM7 594 17 593 0 98% 229359445 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101B1U2N 566 6 562 0 97% 253992019 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens JQ999408 560 5 557 0 99% 118026408 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101AZRPU 177 5 130 2.00E-58 100% 295646750 Bacteria 
Proteobacteria Pseudomonadaceae Pseudomonas fluorescens GKJWQY101A1CE6 212 3 167 5.00E-80 100% 295149356 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas mendocina GKJWQY101BF9AJ 569 17 532 0 99% 145573243 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas mendocina GKJWQY101AZLOG 580 5 579 0 98% 169757190 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida GKJWQY101BZK7N 571 17 567 0 98% 148509317 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida GKJWQY101BA5SM 560 5 558 0 98% 166857509 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida GKJWQY101BQFXA 514 24 510 0 99% 24987239 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999412 397 18 350 2.00E-171 100% 4928221 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999416 540 18 539 0 100% 183585700 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999413 398 18 363 2.00E-180 100% 227433755 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999414 539 18 403 0 99% 254621800 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999417 544 5 484 0 100% 295083378 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999415 539 81 537 0 98% 295646754 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida JQ999411 335 5 289 2.00E-146 100% 295814491 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida GKJWQY101AY10W 527 18 55 0.00004 95% 13310118 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas putida GKJWQY101A3Z0H 573 4 514 0 97% 71553748 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas savastanoi GKJWQY101AU6HQ 497 14 493 0 98% 60115908 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri GKJWQY101BQTGJ 598 5 595 0 96% 145568602 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri GKJWQY101APOFH 134 5 88 2.00E-32 98% 1718243 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri GKJWQY101BSMOV 275 106 249 5.00E-61 97% 227452753 Bacteria Proteobacteria Pseudomonadaceae 
Pseudomonas stutzeri JQ999821 579 5 576 0 99% 2244673 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999452 325 5 226 1.00E-98 97% 7321259 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999451 284 17 179 2.00E-55 92% 12832002 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999453 342 5 302 4.00E-152 100% 15282431 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999450 264 5 216 1.00E-56 88% 19338604 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999463 539 5 536 0 99% 22474444 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999468 567 5 564 0 99% 77456204 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999470 575 5 575 0 99% 86211364 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999467 559 5 559 0 99% 102231497 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999461 510 5 288 4.00E-144 100% 112820874 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999459 476 15 431 0 99% 194399053 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999469 568 67 541 0 94% 209981671 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri

JQ999464 544 16 529 0 100% 256861793 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999466 556 18 556 0 99% 257043242 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999458 433 5 371 3.00E-175 97% 281191476 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999462 513 142 439 3.00E-121 94% 285206748 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999456 395 18 360 5.00E-177 100% 289547136 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999455 382 18 203 4.00E-88 99% 294662661 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999454 355 18 307 1.00E-147 100% 295810392 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999460 498 4 443 0 98% 295810394 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999457 399 18 344 2.00E-161 98% 295810395 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri JQ999465 554 5 549 0 99% 295810397 Bacteria Proteobacteria Pseudomonadaceae Pseudomonas stutzeri GKJWQY101A8AMI 547 20 307 6.00E-123 95% 213985689 Bacteria Proteobacteria Moraxellaceae Acinetobacter baumannii JQ999816 560 13 558 0 90% 168148844 Bacteria Proteobacteria Halomonadaceae Halomonas sulfidaeris JQ999822 310 5 272 5.00E-126 98% 1913845 Bacteria Proteobacteria Xanthomonadaceae Xanthomonas fragariae GKJWQY101BEX3C 583 5 583 0 98% 92392509 Bacteria Proteobacteria Moraxellaceae Psychrobacter cryohalolentis JQ999502 324 5 284 2.00E-110 93% 157367002 Bacteria Proteobacteria Pseudomonadaceae uncultured Pseudomonas sp. 
JQ999339 437 18 391 3.00E-179 97% 60266657 Bacteria Proteobacteria n uncultured gamma proteobacterium GKJWQY101B2XIV 586 5 580 0 91% 156769729 Bacteria n n uncultured bacterium GKJWQY101BLIPI 556 2 555 0 95% 156768841 Bacteria n n uncultured bacterium JQ999031 554 20 554 0 92% 92087350 Bacteria n n uncultured bacterium JQ998949 547 24 544 0 96% 189182700 Bacteria n n uncultured bacterium JQ998015 303 18 259 1.00E-122 100% 192787447 Bacteria n n uncultured bacterium JQ999098 559 17 418 0 97% 209170703 Bacteria n n uncultured bacterium JQ998675 523 18 520 0 98% 224569115 Bacteria n n uncultured bacterium JQ998571 495 15 379 0 99% 229429018 Bacteria n n uncultured bacterium JQ999027 553 19 520 0 100% 238330523 Bacteria n n uncultured bacterium JQ998515 478 5 438 0 99% 238415793 Bacteria n n uncultured bacterium JQ998455 456 8 400 0 98% 256355282 Bacteria n n uncultured bacterium JQ999190 602 4 595 0 91% 256592892 Bacteria n n uncultured bacterium JQ999149 567 18 565 0 94% 285960330 Bacteria n n uncultured bacterium JQ999185 588 3 584 0 93% 285960363 Bacteria n n uncultured bacterium JQ998650 517 69 463 0 100% 289185872 Bacteria n n uncultured bacterium JQ999182 581 1 577 0 91% 289656597 Bacteria n n uncultured bacterium JQ998096 331 5 270 1.00E-127 98% 291192742 Bacteria n n uncultured bacterium JQ998495 469 5 419 0 96% 291192757 Bacteria n n uncultured bacterium JQ999774 545 137 529 6.00E-158 93% 291259015 Bacteria n n uncultured bacterium JQ999729 362 18 261 2.00E-90 92% 291260769 Bacteria n n uncultured bacterium JQ999740 413 5 368 4.00E-158 95% 291260837 Bacteria n n uncultured bacterium GKJWQY101BR4RJ 260 5 193 3.00E-93 100% 295027769 Bacteria n n uncultured bacterium GKJWQY101AZEVC 445 1 445 0 96% 83281396 Eukaryota Arthropoda Culicidae Culex quinquefasciatus GKJWQY101A08VK 28 1 28 0.002 100% 113193577 Eukaryota Arthropoda Drosophilidae Drosophila melanogaster GKJWQY101BXQ6C 491 1 491 0 95% 172190 Eukaryota Ascomycota Saccharomycetaceae Saccharomyces cerevisiae 
GKJWQY101AUREB 124 23 53 0.000008 100% 295393257 Eukaryota Ascomycota Saccharomycetaceae Saccharomyces cerevisiae GKJWQY101BBYO0 43 1 43 2.00E-10 98% 269944715 Eukaryota Ascomycota Saccharomycetaceae Saccharomyces cerevisiae GKJWQY101BBSPQ 105 1 105 1.00E-46 100% 294929468 Eukaryota Ascomycota Saccharomycetaceae Saccharomyces cerevisiae JQ999577 553 17 543 0 93% 225134683 Eukaryota Ascomycota Trichocomaceae Penicillium chrysogenum JQ999569 517 103 511 5.00E-173 94% 283827965 Eukaryota Ascomycota Teratosphaeriaceae Teratosphaeria suttonii GKJWQY101AFQZR 257 1 257 5.00E-62 85% 72256214 Eukaryota Basidiomycota Ustilaginaceae Ustilago maydis GKJWQY101BSALF 484 1 484 1.00E-56 76% 71022004 Eukaryota Basidiomycota Ustilaginaceae Ustilago maydis JQ999870 548 18 544 0 95% 228547090 Eukaryota Basidiomycota Polyporaceae Lignosus rhinocerus GKJWQY101BNMZJ 534 18 124 1.00E-40 96% 148888555 Eukaryota Chordata Bovidae Bos indicus GKJWQY101BV8EI 508 288 478 6.00E-73 93% 1002428 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BURCN 486 54 188 3.00E-46 93% 32364476 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BF0CZ 495 170 483 7.00E-107 90% 36988715 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AVJ8U 535 7 501 3.00E-161 88% 50363274 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BSNZS 474 204 452 1.00E-118 98% 59858218 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AVMZK 474 153 237 2.00E-28 95% 67944510 Eukaryota Chordata Bovidae Bos taurus GKJWQY101B05KZ 538 5 85 7.00E-33 100% 83638662 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AO1F2 531 5 528 0 91% 119216318 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A8YVU 538 61 192 7.00E-33 87% 126010708 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AC3Q3 671 18 109 1.00E-26 92% 126033171 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BG5W1 523 215 475 3.00E-91 91% 148743975 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AM38L 551 210 536 1.00E-125 92% 151554659 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A2KRZ 508 218 483 
6.00E-53 82% 151556917 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A35JD 485 231 479 3.00E-81 89% 154425576 Eukaryota Chordata Bovidae Bos taurus

GKJWQY101BW7C1 558 44 152 9.00E-37 94% 188485453 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AECTS 487 197 416 1.00E-94 96% 188485455 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A6Y0M 586 514 564 4.00E-11 94% 214010996 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BWLQC 551 5 488 6.00E-168 90% 270310991 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AV25T 516 18 516 0 95% 270310993 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BKUSG 547 70 369 2.00E-82 87% 211998866 Eukaryota Chordata Bovidae Bos taurus GKJWQY101ABYCR 519 179 449 2.00E-112 94% 163565 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BL45R 458 101 189 3.00E-16 86% 3873616 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BDGMI 512 18 509 0 96% 13569587 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BMRAG 546 40 439 0 100% 14594798 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AA90W 514 19 503 5.00E-143 86% 21425595 Eukaryota Chordata Bovidae Bos taurus GKJWQY101ARRLV 257 27 212 1.00E-81 97% 27227458 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BWE5U 523 146 347 8.00E-57 87% 29692104 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BYYPR 520 152 356 2.00E-53 86% 46850518 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AAK5N 520 91 294 2.00E-62 89% 52839263 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BXKWL 518 19 516 0 91% 56411964 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BCK3R 513 17 512 0 94% 63169154 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BCG59 533 16 532 0 95% 66734170 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BKG3M 533 20 530 4.00E-129 84% 83286786 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BSEK5 618 139 166 0.002 100% 129561996 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BQ1W1 560 10 507 1.00E-155 88% 134244145 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BHOYS 528 223 494 6.00E-133 99%
31341883 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BFBG8 547 216 468 1.00E-109 96% 31342962 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BBE6M 231 5 183 9.00E-88 100% 47564057 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BD9DT 552 18 95 3.00E-31 100% 74267625 Eukaryota Chordata Bovidae Bos taurus GKJWQY101ACOHL 558 135 274 2.00E-34 87% 77735986 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AWRQM 511 76 486 6.00E-48 76% 94574056 Eukaryota Chordata Bovidae Bos taurus GKJWQY101ALMP9 585 100 236 1.00E-60 99% 115495638 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BZ030 560 20 533 0 98% 115496399 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BW03O 515 70 162 6.00E-38 99% 115496837 Eukaryota Chordata Bovidae Bos taurus GKJWQY101B2T6S 554 481 549 9.00E-22 96% 115497327 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BUUKG 515 123 512 0 99% 118150885 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AB70E 532 43 147 5.00E-44 99% 122692336 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BSN1F 579 133 174 2.00E-09 98% 125991941 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BBORT 108 1 62 4.00E-23 100% 148227015 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AHX0L 436 20 194 7.00E-57 90% 149642896 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BA3QT 520 403 514 1.00E-30 91% 154152074 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BRRAJ 517 194 413 4.00E-55 85% 156120486 Eukaryota Chordata Bovidae Bos taurus GKJWQY101AZUNM 522 5 198 6.00E-63 90% 156120792 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A32BJ 526 5 77 9.00E-27 99% 156523067 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BIASS 446 23 184 7.00E-47 88% 157073991 Eukaryota Chordata Bovidae Bos taurus GKJWQY101BWCRK 599 18 55 6.00E-09 100% 157074093 Eukaryota Chordata Bovidae Bos taurus GKJWQY101APUR7 513 348 500 2.00E-33 85% 157074095 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A8WC2 557 387 555 8.00E-82 100% 157428077 Eukaryota Chordata Bovidae Bos taurus GKJWQY101A3AP7 332 42 236 2.00E-90 98% 158937292 
Eukaryota Chordata Bovidae Bos taurus GKJWQY101BNNNZ 541 18 510 0 99% 14336700 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AI583 524 5 75 2.00E-22 96% 262331521 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BIW6Y 573 5 570 0 96% 197245396 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101ALYO2 519 18 516 0 98% 224922786 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101A1CZ5 606 5 259 1.00E-119 98% 163954924 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AD8KV 559 19 106 2.00E-18 88% 193220939 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BSGL8 510 20 358 6.00E-63 81% 291290993 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BSMSU 558 18 438 0 98% 194385695 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BNWXC 522 329 513 8.00E-32 82% 255652918 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AYL0B 524 18 498 0 99% 3289998 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BZ34P 542 24 534 0 95% 124302213 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BGPYK 537 18 356 1.00E-84 84% 30023945 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AOCTV 445 5 389 8.00E-131 89% 134152716 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BXZZB 632 23 488 1.00E-159 89% 195927052 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AVDNO 528 5 526 0 95% 219520697 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BITIR 518 18 515 0 98% 282396079 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BQ8KU 231 5 163 7.00E-64 95% 237874182 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BHE6J 306 197 240 1.00E-07 93% 219521881 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101A3FOT 565 5 475 5.00E-64 77% 261278320 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101ARFPP 536 17 489 0 95% 291084840 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101ACOO1 537 4 518 0 95% 73697497 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101A09S8 500 18 433 0 97% 254039630 Eukaryota Chordata Hominidae Homo sapiens

GKJWQY101AVI61 525 18 521 0 99% 197333801 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BJBYT 555 18 545 1.00E-135 85% 114306774 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AT79U 575 75 198 3.00E-11 79% 237820692 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BN89C 516 18 512 0 97% 150170720 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101B0JO4 588 17 588 0 98% 167830470 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BVZGM 545 5 541 0 97% 19718557 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101ATS4V 522 18 469 0 99% 47940445 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AJ4OL 558 192 556 0 98% 62087205 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101ABSQH 491 18 485 0 95% 160948584 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BUHAT 214 5 121 3.00E-42 94% 160948585 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101A2QDN 617 18 600 0 89% 270048017 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BCNPF 549 80 473 3.00E-126 88% 293597499 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AKRP1 595 7 589 0 94% 224809264 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BZNKX 481 5 442 0 99% 261823972 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AR7W1 555 23 553 0 99% 237874188 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BTVAZ 574 18 572 0 98% 281182732 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BSHC9 515 5 512 0 98% 33341735 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BJPFL 363 7 222 5.00E-97 97% 291190796 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BERG1 149 20 86 4.00E-24 99% 13625541 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BXHYS 569 135 533 3.00E-41 76% 227330591 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BOL5J 512 16 506 0 98% 293597503 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BQ9JT 502 18 496 0 98% 157426899 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BGJ0B 82 18 75 1.00E-16 95% 215983102 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101B17JJ 558 18 
552 0 99% 226510210 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101AQCZO 442 18 47 1.00E-04 100% 261598996 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101BX9QE 573 3 552 4.00E-85 78% 386434 Eukaryota Chordata Hominidae Homo sapiens GKJWQY101A2A1E 473 224 252 5.00E-04 100% 37537319 Eukaryota Chordata Hominidae Pan troglodytes GKJWQY101ATLTF 519 7 475 1.00E-115 84% 37537322 Eukaryota Chordata Hominidae Pan troglodytes GKJWQY101BSUBR 551 31 432 5.00E-139 90% 37537465 Eukaryota Chordata Hominidae Pan troglodytes GKJWQY101AAQYP 521 69 126 4.00E-10 90% 172072623 Eukaryota Chordata Muridae Mus musculus GKJWQY101AHH9C 194 1 194 2.00E-40 83% 45423887 Eukaryota Chordata Phasianidae Gallus gallus GKJWQY101A1FPO 439 1 439 2.00E-149 89% 118085917 Eukaryota Chordata Phasianidae Gallus gallus GKJWQY101BXNXM 328 1 328 1.00E-124 92% 154937557 Eukaryota Chordata Phasianidae Gallus gallus JQ999602 295 5 245 9.00E-119 99% 218047175 Eukaryota Chordata Phasianidae Gallus gallus GKJWQY101A1FPO 576 18 456 1.00E-149 89% 118085917 Eukaryota Chordata Phasianidae Gallus gallus JQ999622 517 24 468 0 100% 155573953 Eukaryota Streptophyta Euphorbiaceae Euphorbia atoto JQ999620 540 5 537 0 99% 290782471 Eukaryota Streptophyta Fagaceae Quercus suber JQ999896 558 5 557 0 98% 37993790 Eukaryota Streptophyta Fagaceae Quercus suber GKJWQY101BOE6X 501 1 501 0 93% 32994295 Eukaryota Streptophyta Poaceae Oryza sativa GKJWQY101B08PD 536 1 536 0 92% 32995694 Eukaryota Streptophyta Poaceae Oryza sativa GKJWQY101AI87M 526 1 526 0 98% 99651997 Eukaryota Streptophyta Poaceae Oryza sativa 227

Table S14. Sequences removed from the V6 data set that were identical or similar to sequences from controls. A taxonomic affiliation that was not found in NCBI GenBank is marked as "n".

Accession number / sequence number | Q length | Q start | Q end | e-value | %-ident | GI number | Domain | Phylum | Family | Genus / Species
VostokV6_c1 | 2175 | 1 | 2175 | 0 | 100% | 291375051 | Bacteria | Actinobacteria | Propionibacteriaceae | Propionibacterium acnes
VostokV6_c21 | 1016 | 1 | 1016 | 0 | 99% | 239837778 | Bacteria | Actinobacteria | Micrococcaceae | Micrococcus luteus
VostokV6_rep_c162 | 209 | 12 | 209 | 2.00E-63 | 90% | 171850984 | Bacteria | Actinobacteria | Corynebacteriaceae | Corynebacterium urealyticum
VostokV6_rep_c241 | 71 | 1 | 71 | 9.00E-27 | 99% | 227452846 | Bacteria | Actinobacteria | Corynebacteriaceae | Corynebacterium aurimucosum
VostokV6_c269 | 191 | 1 | 191 | 1.00E-85 | 97% | 134265192 | Bacteria | Firmicutes | Bacillaceae | Geobacillus thermodenitrificans
VostokV6_c263 | 154 | 1 | 154 | 2.00E-68 | 98% | 9664721 | Bacteria | Firmicutes | Staphylococcaceae | Staphylococcus epidermidis
VostokV6_c4 | 896 | 1 | 896 | 0 | 99% | 9624251 | Bacteria | Firmicutes | Staphylococcaceae | Staphylococcus epidermidis
VostokV6_rep_c178 | 168 | 1 | 168 | 8.00E-82 | 100% | 9664799 | Bacteria | Firmicutes | Staphylococcaceae | Staphylococcus epidermidis
VostokV6_c66 | 57 | 1 | 57 | 4.00E-19 | 98% | 288906474 | Bacteria | Firmicutes | Staphylococcaceae | Streptococcus mitis
JQ999835 | 250 | 1 | 250 | 2.00E-105 | 95% | 296416 | Bacteria | Firmicutes | Planococcaceae | Sporosarcina globispora
JQ999836 | 1187 | 5 | 1175 | 0 | 97% | 2244633 | Bacteria | Proteobacteria | Caulobacteraceae | Brevundimonas diminuta
VostokV6_c144 | 253 | 68 | 253 | 2.00E-80 | 96% | 115280044 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia ambifaria
VostokV6_c11 | 1091 | 23 | 1091 | 0 | 95% | 190714218 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia cenocepacia
VostokV6_c20 | 548 | 8 | 548 | 0 | 93% | 105891751 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia cenocepacia
VostokV6_c118 | 120 | 1 | 120 | 2.00E-55 | 100% | 189338131 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia multivorans
VostokV6_c209 | 101 | 1 | 101 | 7.00E-45 | 100% | 134134073 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia vietnamiensis
VostokV6_c255 | 153 | 1 | 153 | 3.00E-71 | 99% | 134137285 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia vietnamiensis
VostokV6_c262 | 138 | 1 | 138 | 3.00E-65 | 100% | 134135188 | Bacteria | Proteobacteria | Burkholderiaceae | Burkholderia vietnamiensis
VostokV6_c247 | 203 | 1 | 203 | 6.00E-94 | 98% | 145692985 | Bacteria | Proteobacteria | Pseudomonadaceae | Pseudomonas aeruginosa
JQ999555 | 554 | 28 | 554 | 0 | 100% | 292386075 | Bacteria | Proteobacteria | Pseudomonadaceae | Pseudomonas putida
JQ999553 | 215 | 1 | 212 | 2.00E-104 | 100% | 116294371 | Bacteria | Proteobacteria | Pseudomonadaceae | Pseudomonas putida
VostokV6_c98 | 102 | 1 | 102 | 1.00E-38 | 96% | 229359445 | Bacteria | Proteobacteria | Pseudomonadaceae | Pseudomonas fluorescens
VostokV6_s215 | 125 | 1 | 125 | 6.00E-37 | 91% | 239829322 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c146 | 924 | 1 | 924 | 0 | 91% | 157065147 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c17 | 2018 | 56 | 2018 | 0 | 100% | 606010 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c33 | 1524 | 29 | 1524 | 0 | 99% | 218350208 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c34 | 1654 | 1 | 1421 | 0 | 95% | 5801827 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c7 | 712 | 1 | 712 | 0 | 100% | 284919779 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c70 | 113 | 42 | 113 | 1.00E-22 | 95% | 294489418 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c125 | 151 | 1 | 85 | 4.00E-34 | 99% | 290760697 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c136 | 351 | 58 | 351 | 4.00E-138 | 98% | 257762509 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c168 | 556 | 34 | 556 | 0 | 99% | 238859724 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c198 | 1034 | 1 | 1034 | 0 | 100% | 260447279 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c43 | 708 | 1 | 637 | 0 | 98% | 218425442 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_rep_c52 | 454 | 1 | 453 | 0 | 98% | 291220687 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_s116 | 215 | 1 | 215 | 2.00E-104 | 99% | 281177210 | Bacteria | Proteobacteria | Enterobacteriaceae | Escherichia coli
VostokV6_c264 | 137 | 4 | 64 | 1.00E-14 | 93% | 288887617 | Bacteria | Proteobacteria | Enterobacteriaceae | Klebsiella variicola
JQ999834 | 323 | 6 | 323 | 4.00E-142 | 96% | 1913845 | Bacteria | Proteobacteria | Xanthomonadaceae | Xanthomonas fragariae
VostokV6_s265 | 188 | 1 | 188 | 3.00E-86 | 98% | 217416971 | Bacteria | n | n | uncultured bacterium
VostokV6_c167 | 43 | 6 | 43 | 7.00E-09 | 97% | 285811954 | Eukaryota | Ascomycota | Saccharomycetaceae | Saccharomyces cerevisiae
VostokV6_c257 | 278 | 1 | 278 | 2.00E-125 | 96% | 285813361 | Eukaryota | Ascomycota | Saccharomycetaceae | Saccharomyces cerevisiae
VostokV6_c71 | 43 | 1 | 43 | 3.00E-13 | 100% | 285813870 | Eukaryota | Ascomycota | Saccharomycetaceae | Saccharomyces cerevisiae
VostokV6_c120 | 67 | 1 | 46 | 3.00E-11 | 96% | 35978 | Eukaryota | Chordata | Hominidae | Homo sapiens
VostokV6_c79 | 61 | 1 | 61 | 6.00E-23 | 100% | 18653449 | Eukaryota | Chordata | Hominidae | Homo sapiens
VostokV6_rep_c49 | 165 | 33 | 165 | 5.00E-59 | 99% | 118636082 | Eukaryota | Chordata | Hominidae | Homo sapiens
VostokV6_c163 | 141 | 1 | 141 | 3.00E-65 | 99% | 189027236 | Eukaryota | Chordata | Hominidae | Homo sapiens
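Because every row of Table S14 follows the same field order, the table can be tallied programmatically. The following is a minimal Python sketch and not the procedure used in this study: the `Hit` type and the `parse_row` and `query_coverage` helpers are hypothetical names, the coverage definition is an assumption, and the five sample rows are copied from Table S14.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    q_length: int
    q_start: int
    q_end: int
    e_value: float
    pct_identity: float
    gi: str
    domain: str
    phylum: str
    family: str
    species: str


def parse_row(line: str) -> Hit:
    """Split one whitespace-delimited row; fields after the family form the species."""
    f = line.split()
    return Hit(f[0], int(f[1]), int(f[2]), int(f[3]), float(f[4]),
               float(f[5].rstrip("%")), f[6], f[7], f[8], f[9], " ".join(f[10:]))


def query_coverage(hit: Hit) -> float:
    """Fraction of the query covered by the alignment (assumed definition)."""
    return (hit.q_end - hit.q_start + 1) / hit.q_length


# Sample rows copied from Table S14.
ROWS = """
VostokV6_c1 2175 1 2175 0 100% 291375051 Bacteria Actinobacteria Propionibacteriaceae Propionibacterium acnes
VostokV6_rep_c162 209 12 209 2.00E-63 90% 171850984 Bacteria Actinobacteria Corynebacteriaceae Corynebacterium urealyticum
VostokV6_c263 154 1 154 2.00E-68 98% 9664721 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis
VostokV6_c4 896 1 896 0 99% 9624251 Bacteria Firmicutes Staphylococcaceae Staphylococcus epidermidis
VostokV6_s265 188 1 188 3.00E-86 98% 217416971 Bacteria n n uncultured bacterium
"""

hits = [parse_row(row) for row in ROWS.strip().splitlines()]
species_counts = Counter(hit.species for hit in hits)

for species, count in species_counts.most_common():
    print(f"{species}: {count}")
```

Rows whose family or phylum is "n" parse like any other row, so unclassified hits can be separated afterwards by filtering on those fields.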

SUPPLEMENTARY FIGURES

The supplementary figures show pathways that were manually reconstructed from the results of the KAAS KEGG server searches. Metagenomic and metatranscriptomic sequences were compared to 40 genomes from different taxonomic groups (see Materials and Methods).

Once the searches were complete, 208 mapped pathways were retrieved from the KAAS KEGG server. On those 208 metabolic pathway maps, the server highlighted the enzymes whose sequences were similar to the metagenomic and metatranscriptomic data.
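The highlighting step can be pictured as a set intersection between the orthologs assigned to the query sequences and the orthologs drawn on each pathway map. The sketch below is illustrative only: the pathway names and ortholog identifiers (`KO_A`, `KO_B`, and so on) are made-up placeholders rather than real KEGG entries, and `highlight` is a hypothetical helper, not part of KAAS.

```python
def highlight(pathway_maps, assigned):
    """For each pathway map, list the orthologs that were also found in the data."""
    return {name: sorted(members & assigned)
            for name, members in pathway_maps.items()}


# Hypothetical pathway maps: pathway name -> orthologs drawn on the map.
pathway_maps = {
    "pathway_1": {"KO_A", "KO_B", "KO_C"},
    "pathway_2": {"KO_C", "KO_D"},
    "pathway_3": {"KO_E"},
}

# Hypothetical orthologs assigned to the query sequences.
assigned = {"KO_A", "KO_C", "KO_D"}

highlighted = highlight(pathway_maps, assigned)

# Share of each map that is supported by the data.
fraction_found = {name: len(highlighted[name]) / len(members)
                  for name, members in pathway_maps.items()}

for name in pathway_maps:
    print(name, highlighted[name], f"{fraction_found[name]:.2f}")
```

In this toy input `KO_C` sits on two maps, so a single assignment highlights both of them.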

Based on the highlighted enzymes, the metabolic maps were redrawn and connected. This analysis produced 16 supplementary figures, shown below and labeled Figures S1-S16. Blue arrows indicate unidirectional or reversible reaction(s) for which the enzyme(s) were found. Note that an arrow within a pathway does not necessarily indicate a single reaction; it can also stand for multiple reactions. Blue dashed arrows represent reactions for which the enzyme(s) were not found in our data. Red dashed arrows represent existing reactions, with their enzymes, that cross several pathways on large, complicated maps. Red arrows with a red “x” mark in the middle show pathways for which the enzyme(s) were not found, although the intermediate substrates and final product exist in the pathway(s). Yellow arrows indicate that a synthesized product feeds a certain pathway, but the origin of the substrate is unknown. Green arrows represent spontaneous or non-enzymatic reactions. Blunt blue lines connect occurrences of the same pathway; they do not represent reactions or enzymes. Names of the pathways are shown in boxes. It is important to remember that, while the same products can be synthesized from the same substrates by different enzymes in different pathways, some pathways were found to share enzymes.

[Figure S1 diagram. Pathway boxes shown: Pyrimidine metabolism; Nitrogen assimilation; RNA and DNA synthesis; Arginine and Proline metabolism; Propanoate met.; Pantothenate and CoA; Urea cycle; o/r TCA; Glycolysis; Tetracycline biosynthesis; D-Glutamine and D-Glutamate; C5-Branched dibasic acid synthesis pathway; Nicotine and Nicotinamide metabolism; Butanoate met.; Pyruvate met.; Leucine degradation.]

Figure S1. Alanine, aspartate and glutamate metabolic processes and supporting pathways (1). Here alanine, aspartate and glutamate metabolic reactions are connected with pyrimidine, butanoate, nicotine and nicotinamide, and arginine and proline biosynthetic processes, as well as leucine degradation. The enzymes to which 454 sequences were similar form a major network of reactions that, starting from these three amino acids, supplies the major metabolic pathways, such as glycolysis, TCA, coenzyme synthesis, pyrimidine metabolism, and nicotinamide production.

[Figure S2 diagram. Pathway boxes shown: Cyanoamino acid metabolism; Propanoate met.; Pyrimidine metabolism; Peptidoglycan biosynthesis; Pantothenate and CoA; Glycolysis; Tetracycline biosynthesis; Arginine and Proline metabolism; β-Alanine metabolism; Glutathione metabolism; Amino acids: Glycine, Serine, Threonine; Nicotine and Nicotinamide metabolism: Quinolinate; o/r TCA; Porphyrin metabolism; Nitrogen metabolism (nitrogenous compounds); Purine metabolism; Glutamate metabolism.]

Figure S2. Alanine, aspartate and glutamate metabolic processes and supporting pathways (2). Here alanine, aspartate and glutamate metabolic reactions are connected with pantothenate and CoA biosynthesis, glycolysis, arginine and proline metabolism, glutathione metabolism, nitrogen metabolism, porphyrin metabolism, and possibly cyanoamino acid metabolism. Sequences similar to those from glutamine and glutamate metabolic processes, as well as from the nitrogen cycle, were present. The lysine pathway lacked most of the enzymes for lysine production (sequences were not found in our data); however, lysine synthesis was determined to proceed via a bypass from aspartate metabolism.

[Figure S3 diagram. Pathway boxes shown: N-Glycan biosynthesis; Glycolysis; Methane metabolism; o/r TCA; Sulfur metabolism; Glycine, Serine, Threonine metabolism; Glyoxylate and Dicarboxylate met.; Chloroalkane and Alkene degradation; Carbon fixation; Fatty acid biosynthesis; Propanoate met.; Synthesis and degradation of ketone bodies; Butanoate met.; Cysteine and Methionine met.; Valine, Leucine and Isoleucine degradation; Valine, Leucine and Isoleucine biosynthesis; Valine degradation.]

Figure S3. Pyruvate metabolic processes with supporting pathways. Metatranscriptomic sequences suggest the presence of photosynthetic genes, one of which involves 3-phospho-D-glycerate. This metabolic intermediate participates in the reductive Calvin-Benson cycle for carbon fixation, as well as in methane metabolism, glycolysis, and the synthesis of several amino acids. Also, sequences similar to two genes, for phosphoenolpyruvate carboxykinase and phosphoenolpyruvate carboxylase, were found in the data set. These enzymes function in pyruvate metabolism but can also supply the rTCA cycle with oxaloacetate.

[Figure S4 diagram. Pathway boxes shown: Methane metabolism; Sulfur metabolism; Fatty acid biosynthesis; o/r TCA; Pyruvate metabolism; One carbon pool by folate; Glyoxylate and Dicarboxylate met.; Chloroalkane and Alkene degradation; Fatty acid metabolism; Synthesis and degradation of ketone bodies; Butanoate met.]

Figure S4. Pyruvate metabolic processes supplying the methane and the glyoxylate and dicarboxylate metabolic pathways. These are in addition to the pyruvate metabolic processes shown in Figure S3. Pyruvate also supplies fatty acid biosynthesis with malonyl-CoA, which is used for recycling acetyl-CoA via the TCA cycle.

[Figure S5 diagram. Pathway boxes shown: Purine metabolism; Histidine metabolism; Pentose Phosphate pathway; Thiamine metabolism; DNA and RNA synthesis; Pantothenate and Co-A biosynthesis; Glycolysis; Alanine, Aspartate and Glutamate metabolism; Cysteine metabolism.]

Figure S5. Histidine and thiamine metabolic processes and their connection with purine metabolism. Sequences similar to those coding for the enzymes responsible for aminoimidazole carboxamide ribonucleotide (AICAR) synthesis were not found in purine metabolism but were present in histidine metabolism. AICAR is an important precursor for the synthesis of inosine monophosphate, which is used for the synthesis of adenine and guanine nucleotides. Although sequences similar to the enzymes producing phosphoribosyl pyrophosphate (PRPP) in the histidine pathways were not found in our data, PRPP was determined to be supplied from the pentose phosphate pathway instead. The histidine metabolic pathway was also incomplete for aspartate synthesis.

[Figure S6 diagram. Pathway boxes shown: Amino sugar and nucleotide sugar metabolism; Galactose metabolism; Glycolysis; Alanine, Aspartate and Glutamate metabolism; Starch and Sucrose metabolism; N-Glycan biosynthesis; Peptidoglycan biosynthesis; Fructose and Mannose metabolism; GPI-anchor biosynthesis; Various types of N-glycan biosynthesis.]

Figure S6. Amino sugar and nucleotide sugar metabolism and glycolysis and their connections with glycan biosynthesis. This figure shows that most of the carbohydrate processes from galactose metabolism are directed towards GDP-D-mannose synthesis, which in turn supplies various glycan biosynthetic processes.

[Figure S7 diagram. Pathway boxes shown: Peptidoglycan biosynthesis; Lipopolysaccharide biosynthesis; D-Glutamine and D-Glutamate metabolism.]

Figure S7. Peptidoglycan biosynthesis and supporting metabolic pathways.

[Figure S8 diagram. Pathway boxes shown: Glycine, Serine, Threonine metabolism; Valine, Leucine and Isoleucine degradation; Propanoate met.; Pyrimidine metabolism; o/r TCA; Butanoate met.; Pyruvate met.]

Figure S8. Valine, leucine and isoleucine biosynthesis and degradation processes and the pathways that support them. Sequences similar to those encoding the enzymes responsible for the synthesis of these amino acids were present in the 454 data. Degradation of these amino acids was found to supply carbohydrate pathways (propanoate, butanoate, and pyruvate) and the energy-producing TCA cycle.

[Figure S9 diagram. Pathway boxes shown: Glycolysis; Glyoxylate met.; Cysteine metabolism; Pyruvate metabolism; Glutathione metabolism; o/r TCA; Methane metabolism; One carbon pool by folate; Tryptophan metabolism; Glycerophospholipid metabolism; Aspartate metabolism; Valine, Leucine and Isoleucine biosynthesis.]

Figure S9. Glycine, serine and threonine metabolism and the pathways that support their synthesis. The production of glycine from serine creates the intermediate 5,10-methylene-THF, which feeds methane metabolism, while conversion of glycine to the same 5,10-methylene-THF occurs with the release of carbon dioxide and ammonia.

[Figure S10 diagram. Pathway boxes shown: Pentose Phosphate pathway; Glycolysis; Acridone alkaloid biosynthesis; o/r TCA; Novobiocin biosynthesis; Shikimate pathway; Folate biosynthesis; Ubiquinone and other terpenoid-quinone biosynthesis; Phenylalanine metabolism.]

Figure S10. Phenylalanine, tyrosine and tryptophan biosynthetic processes. Production of the three aromatic amino acids was found to be incomplete: only sequences matching the enzymes responsible for tryptophan synthesis were fully present in our data. Although sequences similar to 3 enzymes from the shikimate pathway were absent, the presence of chorismate was confirmed from other pathways, indicating possible synthesis of phenylalanine. The KAAS KEGG search also indicated the presence of 4-hydroxyphenylpyruvate and pretyrosine (both are substrates for tyrosine production), but sequences similar to those of enzymes capable of producing tyrosine were not found.

[Figure S11 diagram. Pathway boxes shown: Glutathione metabolism; Cysteine and Methionine metabolism; Propanoate met.; Serine metabolism; Aspartate metabolism; Arginine and Proline metabolism; D-Glutamine and D-Glutamate metabolism; Amino sugar and nucleotide sugar met.; Glutamate metabolism; Peptidoglycan biosynthesis; D-Alanine metabolism.]

Figure S11. Glutathione, cysteine and methionine biosynthetic processes. Although sequences for enzymes producing O-succinyl- and O-acetyl-L-homoserine were not found among the 454 data, other sequences were similar to those of enzymes capable of converting these substrates into L-homocysteine and subsequently into L-methionine. Also, sequences related to those encoding glutathione-producing enzymes were not found; however, those responsible for the production of glutamate, glycine, cysteinyl-glycine and cysteine from glutathione were found by the KAAS KEGG search.

[Figure S12 diagram. Pathway boxes shown: Glycolysis; Carotenoid biosynthesis; Terpenoids backbone biosynthesis; Phenylalanine, Tyrosine and Tryptophan biosynthesis; Ubiquinone and other terpenoid-quinone biosynthesis.]

Figure S12. Terpenoid backbone pathway for carotenoid biosynthesis, supplied with acetyl-CoA and glyceraldehyde 3-phosphate from glycolysis. These two products of glycolysis could enter isopentenyl diphosphate synthesis. Isopentenyl diphosphate is an intermediate for the production of geranyl diphosphate (a precursor for the synthesis of different carotenoids) or of octaprenyl diphosphate, which is used for menaquinone synthesis in the ubiquinone and other terpenoid-quinone biosynthesis processes.

[Figure S13 diagram. Pathway boxes shown: Glutamate metabolism; Porphyrin and Chlorophyll metabolism.]

Figure S13. Pigment synthesis via the porphyrin and chlorophyll pathways. The figure represents anaerobic synthesis only; none of the sequences for aerobic pigment production were detected. The sequence analysis also suggests the presence of sequences similar to those encoding iron membrane transport proteins and heme-carrying enzymes.

[Figure S14 diagram. Pathway boxes shown: Fluorobenzoate degradation; Chlorocyclohexane and Chlorobenzene degradation; Benzoate degradation; Lysine degradation; o/r TCA; Aminobenzoate degradation.]

Figure S14. Synthesis and degradation of different benzoate compounds. While benzoic (fluorobenzoate/benzoate) compounds are commonly used as pesticides, they occur naturally in plant essential oils. Aminobenzoate is produced by Bacteria degrading tryptophan. Under anaerobic conditions, denitrifying and methanogenic species are capable of degrading such aromatic compounds (benzoic acid is converted into CO2 and CH4) and recycling organic carbon (Vargas et al 2000; Eby et al 2001; Mouttaki et al 2009).

[Figure S15 diagram. Pathway boxes shown: Phenylalanine biosynthesis; Folate biosynthesis; Purine metabolism; One carbon pool by folate.]

Figure S15. One carbon pool by folate cycle and the synthesis of folate from THF. Folate can be produced directly from DHF, which is synthesized from 4-aminobenzoate in the phenylalanine synthesis pathways (sequences for some enzymes were not detected), or from THF-L-glutamate.

[Figure S16 diagram. Pathway boxes shown: β-Alanine metabolism; Aspartate metabolism; Pantothenate and CoA biosynthesis; Cysteine metabolism; Valine biosynthesis.]

Figure S16. Schematic diagram of β-alanine metabolism. Together with the valine and cysteine metabolic pathways, β-alanine is required for 4’-phosphopantetheine synthesis, which is an intermediate compound in coenzyme A synthesis. Although enzymes producing 4’-phosphopantothenate were not found in the KAAS KEGG results, the rest of the pathway was complete, with β-alanine present.
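Several captions above reason about pathways in which one enzyme was not detected while downstream steps were. That reasoning can be sketched as a reachability check over reactions. The Python example below is an illustration under stated assumptions and not the analysis performed in this study: the linear chain is a simplified reading of the metabolite labels in Figure S16, the found/not-found flags follow the Figure S16 caption, and `reachable` is a hypothetical helper.

```python
def reachable(reactions, seeds):
    """Return every metabolite that can be produced from `seeds`
    using only reactions whose enzyme was found."""
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrate, product, enzyme_found in reactions:
            if enzyme_found and substrate in known and product not in known:
                known.add(product)
                changed = True
    return known


# Simplified chain (assumption): (substrate, product, enzyme found in the data?)
REACTIONS = [
    ("(R)-pantothenate", "4'-phosphopantothenate", False),  # enzyme not found
    ("4'-phosphopantothenate", "4'-phosphopantetheine", True),
    ("4'-phosphopantetheine", "dephospho-CoA", True),
    ("dephospho-CoA", "CoA", True),
]

# Starting from pantothenate alone, the missing first step blocks the chain.
from_pantothenate = reachable(REACTIONS, {"(R)-pantothenate"})

# If the intermediate is available from elsewhere, the rest of the chain is complete.
from_intermediate = reachable(REACTIONS, {"4'-phosphopantothenate"})

print("CoA" in from_pantothenate, "CoA" in from_intermediate)
```

The same check applies to any case where an intermediate is supplied by a neighboring pathway: adding that intermediate to the seed set shows which products become reachable.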

Abbreviations for supplementary figures

(2S) or (R) in front of a molecule name indicates that the molecule has either the S (sinister) or the R (rectus) configuration at the chiral center (a carbon atom whose attached groups can be arranged in two ways, creating stereoisomers). Of the four groups attached to the carbon atom, hydrogen always has the lowest priority (rank 4) and is not used, and a hydroxyl group (if present) has the highest rank, 1. The path from the group with rank 1 to that with rank 3 via rank 2 runs either clockwise (R) or counterclockwise (S).

(Glc)2 (GlcNAc)2-(Man)9 (Asn)1 – glycoprotein of the endoplasmic reticulum lumen, which consists of 2 molecules of glucose linked (α-glycosidic bond) to 9 mannose molecules, which are linked (β-glycosidic bond) to 2 molecules of N-acetyl-D-glucosamine, with 1 molecule of asparagine attached.

[ThiI]-SSH contains the ThiI enzyme with a persulfide attached to its cysteine residue. In Bacteria and Archaea, the 4-thiouridine modification of tRNAs (s4U, close to the D-loop) creates crosslinked tRNAs, basically protecting them from aminoacylation (maturation). The ThiI complex, in the presence of ATP, performs tRNA modifications by transferring sulfur to s4U at position 8 of the tRNA (Tanaka et al 2009).

[ThiS]-COSH is a ThiS protein that was post-translationally modified into a thiocarboxylate. This intermediate sulfur carrier protein is involved in thiamine biosynthesis and ubiquitin-like protein-protein interactions (Xi et al 2001).

2-hydroxyethyl-ThPP is the same as 2-(alpha-hydroxyethyl)-thiamine diphosphate.

2-propyn-1-al is a highly reactive alpha,beta-unsaturated aldehyde that originates from the oxidation or bioactivation of 2-propyn-1-ol, a propargyl alcohol (HC≡CCH2OH) (DeMaster et al 1994).

3P-D-glycerate – 3-phospho-D-glycerate.

2-, 3- or 4-fluorocyclohexadiene-cis,cis-1,2-diol-1-carboxylate: this molecule (C7H7FO4) contains fluorine (at the second, third or fourth carbon) and a hydroxyl group at the fourth carbon atom of the cyclohexadiene (a six-carbon ring with two double bonds), with two hydroxyl groups on the same side (cis isomer) at the 1st and 2nd carbon positions (1,2-diol) and a carboxyl group at the 1st carbon atom.

4-methylenebut-2-en-4-olide, also called protoanemonin, a lactone molecule: a 4-carbon ring with a double bond to oxygen on the 1st carbon atom. Also, this protoanemonin (C5H4O2, Ranunculaceae toxin) has a CH2 group on the 4th carbon and a double bond between the 2nd and 3rd carbon atoms.

Acetyl-P - acetyl phosphate.

ACP - acyl carrier protein (fatty acid biosynthesis).

AICAR - 5-aminoimidazole-4-carboxamide ribonucleotide is an intermediate in the generation of inosine monophosphate.

AMP, ATP – adenosine monophosphate, adenosine triphosphate.

cis-3-chloro-2-propene-1-ol – cis isomer of a three-carbon alcohol (C3H5ClO); chlorine is on the 3rd carbon, with a double bond between the 2nd and 3rd carbons. The hydroxyl group is at the 1st carbon, and the molecule is linear.

CoA - coenzyme A.

Cob(II)yrinate a,c-diamide - C45H56N6O12Co is NADH-dependent flavoenzyme exhibiting reductase activity.

Coenzyme F430 – nickel porphinoid with an absorption maximum at 430 nm; the most reduced tetrapyrrole. Found in anaerobic Bacteria that are capable of reverse methanogenesis and in methanogenic Archaea (Diakun et al 1985).

Co-precorrin-3B – complex of cobalt (Co3+) and a pigment from the porphyrin and chlorophyll pathways. Accumulates when NADPH concentrations drop. Aerobic conditions only.

cPMP - cyclic pyranopterin phosphate.

Creatine-P – phosphocreatine, reserve of high-energy phosphates.

CTP – cytidine triphosphate.

D-Ala-D-Ala – D-alanine dimer ligated in the presence of ATP and a ligase enzyme.

Fe3+ / Fe2+ - iron ions with oxidation states +3 and +2.

GDP, GTP – guanosine diphosphate, guanosine triphosphate.

GlcNAcα1-2Glcα1-2Galα1- lipopolysaccharide with N-acetyl-alpha-D-glucosamine, alpha-1,2-glucose followed by 1,2-alpha-D-galactose.

Glycerone-P – glycerone phosphate.

GPI-anchor – glycosylphosphatidylinositol anchor, a glycolipid that attaches proteins to the cell membrane and plays an important role in membrane organization.

IMP – inosine monophosphate.

lauroyl-KDO2-lipid IV (A) – other name (KDO)2-(lauroyl)-lipid IVA. A lipid A comprising lipid IVA glycosylated with two 3-deoxy-D-manno-octulosonic acid (KDO) residues and carrying an additional dodecanoyl group (ChEBI:27422).

meso-2,6-diamino-pimelate – an isomer of the pimelate molecule containing amino groups at the 2nd and 6th carbons (on the same side) (C7H12N2O4).

Mg – magnesium.

NAD / NADH - nicotinamide adenine dinucleotide / its reduced form.

NADP / NADPH - nicotinamide adenine dinucleotide phosphate / its reduced form.

N-carbamoyl-L-aspartate – product of the first steps of pyrimidine biosynthesis.

o/rTCA – oxidative/reductive tricarboxylic acid cycle (or Krebs cycle).

O-acetyl-L-homoserine - CH3C(O)OCH2CH2CH(NH2)COOH is a product of the alpha amino acid homoserine and an acetyl functional group.

PEP – phosphoenolpyruvate.

Pi – inorganic phosphate.

PRPP – phosphoribosyl pyrophosphate.

THF – tetrahydrofolic acid.

TMP - thymidine monophosphate.

UDP-GlcNAc - uridine diphosphate N-acetylglucosamine.

UDP-MurNAc-L-Ala-D-Glu - uridine diphosphate N-acetylmuramic acid with attached alanine-glutamate dipeptide (molecular formula C28H43N5O23P2).
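The molecular formula in this entry can be cross-checked by simple atom bookkeeping. The sketch below is illustrative only and assumes that D-Glu denotes D-glutamate, that each of the two amide bonds releases one water molecule, and that the building-block formulas supplied here (UDP-MurNAc as C20H31N3O19P2, alanine as C3H7NO2, glutamic acid as C5H9NO4) are the neutral forms. None of these inputs, and neither helper name (parse_formula, condense), comes from the original text.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C28H43N5O23P2'.

    Handles only element symbols followed by optional counts
    (no parentheses, charges, or hydrates).
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def condense(*formulas):
    """Sum the building blocks and remove one H2O per bond formed.

    Assumes a linear chain, so n blocks are joined by n - 1 condensations.
    """
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    waters = len(formulas) - 1
    total.subtract({"H": 2 * waters, "O": waters})
    return total


# Assumed neutral building blocks (not taken from the glossary itself).
product = condense("C20H31N3O19P2", "C3H7NO2", "C5H9NO4")
print(product == parse_formula("C28H43N5O23P2"))  # True
```

Under these assumptions the atom counts agree with the listed formula, which is consistent with the dipeptide being alanine joined to glutamate.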

UMP, UTP – uridine monophosphate, uridine triphosphate.

References for the supplementary information

DeMaster EG, Dahlseid T, Redfern B (1994) Comparative oxidation of 2-propyn-1-ol with other low molecular weight unsaturated and saturated primary alcohols by bovine liver catalase in vitro. Chem Res Toxicol 7(3): 414-419.

Tanaka Y, Yamagata S, Kitago Y, Yamada Y, Chimnaronk S, Yao M, and Tanaka I (2009) Deduced RNA binding mechanism of ThiI based on structural and binding analyses of a minimal RNA ligand. RNA 15(8): 1498–1506.

Xi J, Ge Y, Kinsland C, McLafferty FW, and Begley TP (2001) Biosynthesis of the thiazole moiety of thiamin in Escherichia coli: Identification of an acyldisulfide-linked protein–protein conjugate that is functionally analogous to the ubiquitin/E1 complex. PNAS 98(15): 8513-8518.

Diakun GP, Piggott B, Tinton HJ, Ankel-Fuchs D, Thauer RK (1985) An extended-X-ray-absorption-fine-structure (e.x.a.f.s.) study of coenzyme F430 from Methanobacterium thermoautotrophicum. Biochem. J. 232: 281-284.

Vargas C, Song B, Camps M, Haggblom MM (2000) Anaerobic degradation of fluorinated aromatic compounds. Appl Microbiol Biotechnol 53: 342-347.

Mouttaki H, Nanny MA, McInerney MJ (2009) Metabolism of hydroxylated and fluorinated benzoates by Syntrophus aciditrophicus and detection of a fluorodiene metabolite. Appl Environ Microbiol 75(4): 998-1004.

Eby DM, Beharry ZM, Coulter ED, Kurtz DM Jr, Neidle EL (2001) Characterization and evolution of anthranilate 1,2-dioxygenase from Acinetobacter sp. strain ADP1. Journal of Bacteriology 183(1): 109–118.