*Manuscript Click here to download Manuscript: JBEI_BioEnergyResearch_11march2010.doc

1 2 3 4 1 5 6 7 2 Strategies for Enhancing the Effectiveness of Metagenomic-based Enzyme 8 9 10 3 Discovery in Lignocellulolytic Microbial Communities 11 12 4 13 14 1,2,* 1,3,* 1,4 1,3 15 5 Kristen M. DeAngelis , John M.Gladden , Martin Allgaier , Patrik D’haeseleer , Julian 16 17 6 L. Fortney1,2, Amitha Reddy1,5, Philip Hugenholtz1,4, Steven W. Singer1,2, Jean S. Vander 18 19 1,5 2,6 1,7 1,2,6,+ 20 7 Gheynst , Whendee L. Silver , Blake A. Simmons , and Terry C. Hazen 21 22 8 23 24 1 25 9 Affiliations: Microbial Communities Group, Deconstruction Division, Joint BioEnergy Institute, 26 27 10 Emeryville CA; 2Earth Sciences Division, Lawrence Berkeley National Lab; 3Physical and 28 29 11 Sciences Directorate, Lawrence Livermore National Laboratory; 4Joint Genome Institute, Walnut 30 31 5 32 12 Creek, CA; Department of Biological and Agricultural Engineering, University of California, 33 34 13 Davis; 6Ecosystem Sciences, Policy and Management, University of California, Berkeley; 35 36 7 37 14 Biomass Science and Conversion Technology Department, Sandia National Laboratory, 38 39 15 Livermore, CA 40 41 * 42 16 These authors contributed equally to this manuscript. 43 44 17 +Corresponding author: Ecology Department, Earth Sciences Division, Lawrence Berkeley 45 46 47 18 National Lab, One Cyclotron Road MS 70A-3317, Berkeley CA 94720. Tel. 510 486 6223; fax 48 49 19 510 486 7152; email [email protected] 50 51 20 52 53 54 55 22 56 57 58 59 23 60 61 62 Enrichment Cultures for Enzyme Discovery, page 1 of 31 63 64 65 DISCLAIMER

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California.

Ernest Orlando Lawrence Berkeley National Laboratory is an equal opportunity employer.

1 2 3 4 24 Abstract 5 6 7 25 8 9 26 Producing cellulosic biofuels from material has recently emerged as a key U.S. Department 10 11 12 27 of Energy goal. For this technology to be commercially viable on a large scale, it is critical to 13 14 28 make production cost efficient by streamlining both the deconstruction of lignocellulosic 15 16 29 biomass and fuel production. Many natural ecosystems efficiently degrade lignocellulosic 17 18 19 30 biomass and harbor enzymes that, when identified, could be used to increase the efficiency of 20 21 31 commercial biomass deconstruction. However, ecosystems most likely to yield relevant 22 23 24 32 enzymes, such as tropical rain forest soil in Puerto Rico, are often too complex for enzyme 25 26 33 discovery using current metagenomic sequencing technologies. One potential strategy to 27 28 29 34 overcome this problem is to selectively cultivate the microbial communities from these complex 30 31 35 ecosystems on biomass under defined conditions, generating less complex biomass-degrading 32 33 34 36 microbial populations. To test this premise, we cultivated microbes from Puerto Rican soil or 35 36 37 green waste compost under precisely defined conditions in the presence dried ground switchgrass 37 38 38 (Panicum virgatum L.) or lignin, respectively, as the sole carbon source. Phylogenetic profiling 39 40 41 39 of the two feedstock-adapted communities using SSU rRNA gene amplicon pyrosequencing or 42 43 40 phylogenetic microarray analysis revealed that the adapted communities were significantly 44 45 46 41 simplified compared to the natural communities from which they were derived. Several 47 48 42 members of the lignin-adapted and switchgrass-adapted consortia are related to organisms 49 50 51 43 previously characterized as biomass degraders, while others were from less well-characterized 52 53 44 phyla. The decrease in complexity of these communities make them good candidates for 54 55 45 metagenomic sequencing and will likely enable the reconstruction of a greater number of full- 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 2 of 31 63 64 65 1 2 3 4 46 length genes, leading to the discovery of novel lignocellulose-degrading enzymes adapted to 5 6 7 47 feedstocks and conditions of interest. 8 9 48 10 11 12 49 Keywords Lignocellulolytic; enzymes; metagenome; community; rain forest; compost; 13 14 50 PhyloChip; pyrotag 15 16 51 17 18 19 52 20 21 53 Introduction 22 23 24 54 25 26 55 The US Department of Energy has recently made alternative liquid fuel production from 27 28 29 56 lignocellulosic biomass a primary goal. Establishing such renewable, low-carbon liquid fuel 30 31 57 alternatives is a critical short- and long-term solution to the environmental problems and national 32 33 34 58 security risks associated with petroleum consumption. Cellulosic biofuels are one such 35 36 59 alternative that are receiving unprecedented international attention, owing to the large, 37 38 60 underutilized reservoir of renewable energy in plant biomass [30, 10]. Currently, one of the 39 40 41 61 major barriers to the large-scale production of inexpensive cellulosic biofuels is the ability to 42 43 62 efficiently deconstruct biomass into fermentable carbon sources, such as and xylose. 44 45 46 63 Enzymatic saccharification of the plant cell polymers cellulose and hemicellulose is an efficient 47 48 64 method to obtain these sugars from biomass, but this process is costly using present-day fungal 49 50 51 65 commercial enzyme cocktails. Discovery of more efficient and robust biomass-degrading 52 53 66 enzymes will drive down costs and increase the economic viability of this technology. 54 55 67 Many natural ecosystems, such as soils and compost, almost completely mineralize plant 56 57 58 68 biomass. The indigenous microbes in these ecosystems may provide a rich reservoir of genes 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 3 of 31 63 64 65 1 2 3 4 69 relevant to the development of cellulosic biofuels. Target genes include glycosyl hydrolases, 5 6 7 70 enzymes that convert simple sugar intermediates into biofuels [30], and lignolytic enzymes that 8 9 71 can either release cellulose from the plant polymer lignin to increase sugar yields from biomass, 10 11 12 72 or facilitate lignin transformation to biobased products. Lignin is of special interest, since 13 14 73 currently it is a waste stream in cellulosic biofuels production that is burned to recover heat [24]. 15 16 74 Our research, within the Microbial Communities Group, Deconstruction Division, U.S. DOE 17 18 19 75 Joint BioEnergy Institute (JBEI), focuses on two natural biomass-degrading ecosystems: the 20 21 76 tropical forest soils of Puerto Rico and municipal green waste compost. Wet tropical forest soils 22 23 24 77 are some of the most productive and diverse terrestrial ecosystems on earth. A recent study 25 26 78 identified tropical forest soils as the fastest decomposing soils of plant material compared to all 27 28 29 79 other biomes globally [39]. Green waste compost is another ecosystem where microorganisms 30 31 80 rapidly break down lignocellulosic biomass into carbon dioxide, water, and humus. This 32 33 34 81 degradation is so fast, in fact, that the compost heap can heat to 60–70°C, due to the metabolic 35 36 82 activity of the microbial community. We are using metagenomics, proteomics, and 37 38 83 transcriptomics to investigate these communities, both in their native state and after cultivation 39 40 41 84 on candidate bioenergy feedstocks (Fig. 1). 42 43 85 Identifying specific genes from these ecosystems, which have a high degree of microbial 44 45 46 86 diversity, is challenging. Fortunately, next-generation sequencing technologies such as 454 47 48 87 pyrosequencing can facilitate the discovery of relevant genes [30]. Recent metagenome studies 49 50 51 88 have demonstrated that it is possible to assign functional annotations to partial gene sequences 52 53 89 from shotgun sequence reads with a reasonable degree of accuracy, based on BLASTX hits 54 55 90 against reference databases [38, 45]. Such annotation can provide a useful functional profile of a 56 57 58 91 community and help identify gene categories of interest. However, this study and others [1] 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 4 of 31 63 64 65 1 2 3 4 92 indicate that shotgun metagenome sequence data from highly complex natural microbial 5 6 7 93 communities is of limited use for targeted enzyme discovery, because of the lack of contiguous 8 9 94 sequences (contigs) large enough to contain a complete open reading frames (ORFs); for 10 11 12 95 cellulases, this is at least 1 kb [33]. For example, Allgaier et al. [1] found only 25 potentially 13 14 96 full-length lignocellulose degrading enzymes from a switchgrass-compost microbial community. 15 16 97 To discover and extract a greater number of novel functional enzymes for bioenergy 17 18 19 98 applications, lower complexity metagenomic data sets that are more amenable to assembly are 20 21 99 required. One possible method for generating less complex microbial communities is to adapt 22 23 24 100 environmental communities to feedstocks (e.g., switchgrass, or lignin) under fixed conditions, 25 26 101 such as temperature, pH, etc. that are more directly relevant to the feedstocks and process 27 28 29 102 conditions expected for large-scale industrial biomass deconstruction, with the expectation that 30 31 103 the resulting communities will be enriched for microbes expressing desired enzymes. 32 33 34 104 In this manuscript, we present the results of a shotgun sequencing by 454 Titanium 35 36 105 technology of tropical forest soils, which were sufficiently complex to resist assembly of full- 37 38 106 length genes. While we present analysis of known lignocellulolytic enzymes and the prospects 39 40 41 107 for enzyme discovery with this data set, the lack of assembly demonstrates the need for less 42 43 108 complex communities. To test the premise that feedstock-adapted communities reduce diversity 44 45 46 109 while preserving function, Puerto Rican tropical forest soil and municipal green waste compost 47 48 110 microbial communities were adapted to switchgrass anaerobically and lignin aerobically, 49 50 51 111 respectively. Small subunit (SSU) rRNA profiling of both communities demonstrated that they 52 53 112 were less complex than the starting inocula and enriched for organisms that we predict are more 54 55 113 suited to degrading the target feedstock. 56 57 58 114 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 5 of 31 63 64 65 1 2 3 4 115 5 6 7 116 Materials and Methods 8 9 117 10 11 12 118 Environmental Samples. Tropical forest soil samples were collected in a subtropical lower 13 14 119 montane wet forest in the Luquillo Experimental Forest, which is part of the NSF-sponsored 15 16 120 Long-Term Ecological Research program in Puerto Rico (18°18’N, 65°50’W). The dominant 17 18 -1 19 121 plant cover is Dacryodes excelsa, with plant richness of 50 ha made up mostly of early 20 21 122 sucessional trees, herbs, and tree ferns. Climate is relatively aseasonal, with mean annual rainfall 22 23 24 123 of 4500 mm and mean annual temperatures of 22 to 24C [49, 36]. Soils are acidic (pH 5.5), 25 26 124 clayey ultisols with high iron and aluminum content and characterized by redox fluctuating from 27 28 29 125 oxic to anoxic on a scale of weeks [42, 36]. Soils were collected from the Bisley watershed ~250 30 31 126 meters above sea level (masl) from the 0–10 cm depth, using a 2.5 cm diameter soil corer. Cores 32 33 34 127 were stored intact in Ziploc bags at ambient temperature and immediately transported to the lab, 35 36 128 where they were used for growth inocula or frozen in liquid nitrogen and stored at -80°C for 37 38 39 129 molecular analysis. Green waste compost inoculum was obtained from a Grover Soil Solutions 40 41 130 compost facility located in Zamora, CA, on August 6, 2007. Details of sampling and storage are 42 43 131 described in Allgaier et al [1]. 44 45 46 132 47 48 133 DNA Extraction. DNA from the rain forest soil was used for metagenomic sequencing and SSU 49 50 51 134 rRNA profiling; DNA from the green waste compost and the switchgrass- or lignin-adapted 52 53 135 cultures was used only for SSU rRNA profiling. DNA was extracted using a modified CTAB 54 55 56 136 extraction method with bead beating and phenol-chloroform extraction, as previously 57 58 137 described[6], with the following exceptions: Phosphate buffer concentration was 250 mM; 50 µL 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 6 of 31 63 64 65 1 2 3 4 138 100 mM aluminum ammonium sulfate (Sigma Aldrich, St. Louis, MO) was added to the soils as 5 6 7 139 a flocculant for excess humics [5]; DNA was precipitated in Peg6000 solution (30% (w/v) 8 9 140 polyethylene glycol 6000 (Sigma Aldrich, St. Louis, MO) in 1.6M NaCl); soils were extracted a 10 11 12 141 second time as above. These crude extractions were treated further using the Qiagen DNA RNA 13 14 142 AllPrep kit (Qiagen, Valencia, CA). 15 16 143 17 18 19 144 20 21 145 Metagenome Sequencing and Analysis. The native Puerto Rican rainforest soil microbial 22 23 24 146 community was prepared for metagenomic sequencing. Genomic DNA was extracted from the 25 26 147 rain forest soil sample and used for sequencing library construction following the DOE Joint 27 28 29 148 Genome Institute standard operating procedure for shotgun sequencing using the Roche 454 GS 30 31 149 FLX Titanium technology (Branford, CT). Obtained sequencing reads were quality trimmed 32 33 34 150 using the software tool ―lucy‖ [11] to an accuracy of 99.3%. The unassembled metagenomic 35 36 151 data set was loaded into MG-RAST [37] for functional analysis, and subsystem-based metabolic 37 38 152 profiles were computed at a maximum E-value of 1e-10. We used the November 2008 release of 39 40 41 153 PRIAM [12], modified for nucleotide queries using the ―-p F‖ flag of rpsblast, to assign four- 42 43 154 digit EC (Enzyme Commission) numbers, at E<1e-10. Finally, we used BLASTX to search for 44 45 46 155 homologs (again, at E<1e-10) against 87,000 enzyme sequences in a local copy of the CAZy [9] 47 48 156 and FOLy [31] databases. We used the best BLASTX hit for each sequence read to assign 49 50 51 157 protein family memberships, and also used the best BLASTX hit against the 6,367 CAZy and 52 53 158 FOLy enzymes that have independently validated EC numbers to assign a putative EC number to 54 55 159 the sequence read. (Note that many of the protein families in CAZy are highly multifunctional, 56 57 58 160 so in general it is difficult to assign enzyme function based merely on family membership.) 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 7 of 31 63 64 65 1 2 3 4 161 5 6 7 162 Cultivation Conditions. For adaptation to growth on feedstocks as sole carbon source, tropical 8 9 163 forest soil cores were homogenized and inoculated in basal salts minimal medium (BMM) [43] 10 11 12 164 containing trace minerals [50, 44], vitamins [25], and buffered to pH 5.5 to match the measured 13 14 165 soil pH using MES. This medium was used for all enrichments (Table 1). Soils were added at 15 16 166 0.5 g per 200 mL BMM, and the resulting mixture was incubated anaerobically at ambient 17 18 -1 19 167 temperatures for eight weeks with 10 g L dried, ground switchgrass as the sole carbon source. 20 21 168 Samples of switchgrass (MPV 2 cultivar) were kindly provided by the laboratory of Dr. Ken 22 23 24 169 Vogel (USDA, ARS, Lincoln, NE). 25 26 170 For the lignin-adapted compost consortia, 100 mg of compost was added to 50 mL of 27 28 29 171 M9TE medium [34] amended with 0.5 g/L lignin in a 250 mL shaker flask and shaken at 200 30 31 172 rpm for two weeks at 37°C. Three types of lignin were used in the enrichment cultures: (1) AL- 32 33 34 173 Alkali lignin with low sulfonate content (Sigma-Aldrich cat. No. 471003, St. Louis, MO), (2) 35 36 174 OL- Organosolv lignin, propionate (Sigma-Aldrich cat. No. 371033, St. Louis, MO), (3) IL- 37 38 175 Indulin AT (Meadwestvaco, Richmond, VA) that had been washed with water to remove soluble 39 40 41 176 lignin by Soxhlet extraction, following the protocol outlined in Giroux et. al.[21]. The water- 42 43 177 soluble AL is sulfonated with a molecular weight (MW) range between 10,000 and 60,000; the 44 45 46 178 OL is a mixture of soluble and insoluble lignin; and the IL is unsulfonated, completely insoluble, 47 48 179 and likely contains only high MW lignin. 49 50 51 180 Two milliliters of this culture were then seeded into a fresh flask containing 50 mL of 52 53 181 M9TE amended with 0.5 g/L lignin in a 250 mL shaker flask shaken at 200 rpm for two weeks at 54 55 182 37°C. This step was repeated three times. After the fifth enrichment culture, cells were prepared 56 57 58 183 for DNA extraction, culture supernatant (10 mL) was placed into five × 2 ml microcentrifuge 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 8 of 31 63 64 65 1 2 3 4 184 tubes and spun at maximum speed for five min. The pellets from each tube were combined and 5 6 7 185 transferred to a two mL lysing matrix E tube (MP Biomedicals) and frozen at -80°C. 8 9 186 The M9TE medium was prepared using the following recipe per liter: 200 mL 5 M9 10 11 12 187 minimal salts, two mL 1M MgCl2, 100 µL 1M CaCl2, and 60 mL 17x trace elements (final 13 14 188 concentration: 470 µM nitrilotriacetic acid, 730 µM MgSO4.7H2O, 180 µM MnSO4.H2O, 1 mM 15 16 17 189 NaCl 1 g, 33 µM FeCl2, 39 µM CoSO4, 41 µM CaCl2.2H2O, 21 µM ZnSO4.7H2O, 2.4 µM 18 19 190 CuSO4.5H2O, 1.3 µM AlK(SO4)2.12H2O, 97 µM H3BO3, 2.7 µM Na2MoO4.H2O). The pH was 20 21 22 191 then adjusted 6.5 with KOH. M9TE medium was filter sterilized through a 0.2 µm filter. 23 24 192 25 26 193 SSU ribosomal RNA profiles using PhyloChip Gene Microarray and Amplicon Pyrosequencing. 27 28 29 194 PhyloChip and amplicon pyrosequencing were used to generate SSU rRNA profiles of native 30 31 195 and adapted soil communities and native and adapted compost communities, respectively. For 32 33 34 196 PhyloChip analysis, purified genomic DNA was quantified and 30 ng was added to a PCR 35 36 197 reaction for amplification of the SSU ribosomal RNA genes. Primers used were 8F for , 37 38 39 198 4Fa for , and the same reverse primer for both, 1492R [51, 23]: PCR amplification was 40 41 199 performed as previously described [14]. For application onto the high-density SSU rDNA 42 43 200 microarray (PhyloChip), PCR products were concentrated to 500 ng (bacteria) or 200 ng 44 45 46 201 (archaea), then pooled, fragmented, biotin-labeled and hybridized as previously described [6]. 47 48 202 For amplicon pyrosequencing, small subunit (SSU) rRNA gene sequences were amplified using 49 50 51 203 the primer pair 926f/1392r, as described in Kunin et al. [28]. The reverse primer included a five 52 53 204 bp barcode for multiplexing of samples during sequencing. Emulsion PCR and sequencing of the 54 55 56 205 PCR amplicons were performed following manufacturer’s instructions for the Roche 454 GS 57 58 206 FLX Titanium technology (Branford, CT), with the exception that the final dilution was 1e-8. 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 9 of 31 63 64 65 1 2 3 4 207 Sequencing tags were analyzed using the software tool PyroTagger (http://pyrotagger.jgi- 5 6 7 208 psf.org/) using a 395 bp sequence length threshold. 8 9 209 10 11 12 210 13 14 211 Results 15 16 212 17 18 19 213 Native Lignocellulolytic Microbial Community Metagenomics. In this study, we used 20 21 214 metagenomics to identify enzymes from the native microbial community present in a tropical 22 23 24 215 rain forest soil sample from Puerto Rico (Allgaier et al. analyzed the metagenome of a 25 26 216 switchgrass compost in a previously published study [1]). We performed shotgun sequencing 27 28 29 217 using the Roche 454 GS FLX Titanium technology to obtain metagenomic data for the native 30 31 218 soil community, resulting in 863,759 reads, with an average read length of 417 bp, for a total of 32 33 34 219 350Mbp of sequence data. After trimming and quality control, the final data set resulted in 35 36 220 780,588 reads equaling 321 Mbp. Assembly of the tropical forests soil metagenome was 37 38 221 attempted using the Newbler assembler software by 454 Life Sciences, but the species 39 40 41 222 composition of the sample was too complex to yield any significant assembly of the metagenome 42 43 223 sequence reads. MG-RAST was able to assign various degrees of functional annotation to 29.7% 44 45 46 224 of the sequences (232,025) at E<1e-10. PRIAM enzyme-specific sequence profiles assigned 4- 47 48 225 digit EC numbers to 110,411 sequence reads at E<1e-10. BLASTX of the rainforest metagenome 49 50 51 226 to the CAZy and FOLy databases resulted in 29,051 protein family assignments, and 9,041 EC 52 53 227 number assignments (Fig. 2), also at E<1e-10. 54 55 228 The most abundant carbohydrate and lignin active enzyme families, as inferred by best 56 57 58 229 BLASTX hits against CAZy and FOLy, are glycoside hydrolases (GH, 12,193 BLASTX hits) 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 10 of 31 63 64 65 1 2 3 4 230 and glycosyl transferases (GT, 11,562 hits), followed by carbohydrate esterases (CE, 3133 hits), 5 6 7 231 lignin oxidases (LO, 413 hits), polysaccharide lyases (PL, 282 hits), and lignin-degrading 8 9 232 auxiliary oxidases (LDA, 240 hits) (Fig. 3). GT2 and GT4 are large, predominantly bacterial, 10 11 12 233 multifunctional enzyme families, and most of the sequences present here seem to be involved in 13 14 234 aspects of bacterial cell-wall biogenesis. GH13 contains mostly starch- and glycogen-degrading 15 16 235 enzymes, as well as trehalose synthases. Lignolytic enzymes are relatively low in abundance 17 18 19 236 (likely due to the smaller number of known reference enzymes, and the lack of bacterial 20 21 237 sequences in FOLy), consisting mainly of putative cellobiose dehydrogenases (LO3, 361 hits), 22 23 24 238 and a small number of aryl-alcohol oxidases (LDA1, 124 hits), glucose oxidases (LDA6, 82 hits) 25 26 239 and laccases (LO1, 47 hits). Not shown are 3645 hits against enzymes with carbohydrate 27 28 29 240 binding modules (CBM), the two most abundant of which are CBM13 (previously known as 30 31 241 cellulose-binding family XIII, 1366 hits) and the glycogen-binding CBM48 (826 hits). 32 33 34 242 Supplementary table S1 shows similar abundance patterns for selected lignocellulose 35 36 243 degrading GH families across rain forest soil (this study), compost [1], cow rumen [7], and 37 38 244 termite metagenome [48] datasets. GH family counts in the latter metagenomes were based on 39 40 41 245 pfam hits to open reading frames on assembled metagenome contigs; however, the good 42 43 246 correlation between BLASTX hits on unassembled reads and pfam hits on assembled ORFs 44 45 46 247 suggests that BLASTX hits provide a reasonable proxy for enzyme family assignment. Beta- 47 48 248 glucosidases and other oligosaccharide degrading families are the most abundant in the rain 49 50 51 249 forest reads. The main cellulase families represented are GH5 (243 hits) and GH9 (86 hits), 52 53 250 whereas other traditional cellulase families (GH7, GH45, GH48) are only present in very low 54 55 251 abundance (if at all) in all four microbiomes examined. 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 11 of 31 63 64 65 1 2 3 4 252 Overall, the native soil community has a variety of potentially interesting genes for use in 5 6 7 253 industrial biofuels production. However, the lack of assembly of full-length genes in this data 8 9 254 set and the rather short list of full-length genes identified from compost in an earlier study by 10 11 12 255 Allgaier et al. [1] prompted us to investigate whether selective cultivation on bioenergy 13 14 256 feedstocks could reduce the community complexity of these native communities, thereby 15 16 257 facilitating identification of a greater number of full-length genes in future metagenomic 17 18 19 258 sequencing efforts. 20 21 259 22 23 24 260 Switchgrass-adapted Puerto Rican Rainforest Soil Cultures. We chose to adapt the tropical soil 25 26 261 sample to switchgrass under anaerobic conditions to select for anaerobic biomass-degrading 27 28 29 262 microbes, since many of these organisms produce cellulosomes, multi-enzyme complexes 30 31 263 capable of depolymerizing both cellulose and hemicellulose. Anaerobic switchgrass-adapted 32 33 34 264 consortia were enriched from tropical forest soils by passing the communities two times for six 35 36 265 weeks, with switchgrass as the sole carbon source, under anaerobic conditions with and without 37 38 266 supplemental iron (Table 1). The richness of the original soil sample was 1339 taxa as 39 40 41 267 determined by PhyloChip (Fig. 4a), and growth on switchgrass as the sole carbon source reduced 42 43 268 the richness to 84 taxa, while inclusion of iron in the consortia growth media resulted in a 44 45 46 269 richness of 336 taxa. There were archaea present in the soils that were not present in either 47 48 270 feedstock-adapted community, along with taxa from 20 phyla (Table 2), 49 50 51 271 Taxa in the switchgrass-adapted communities lacking iron were heavily dominated by 52 53 272 , , and (Fig. 4a), with members of the 54 55 273 Proteobacteria making up 83% of the richness of the switchgrass-adapted communities. All of 56 57 58 274 the taxa enriched in the switchgrass-amended cultures lacking iron were also present in the iron- 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 12 of 31 63 64 65 1 2 3 4 275 amended cultures (Fig. 4a). Taxa that were specifically enriched in the presence of iron and not 5 6 7 276 found in the non-iron consortia were mostly dominated by the Bacteroidetes, 8 9 277 Desulfovibrionaceae (class Deltaproteobacteria), Caulobacterales (class ), 10 11 12 278 and Enterococcales (class Bacilli). There were also representatives from many phyla that are 13 14 279 rare, uncultivated, and otherwise of cryptic function (Table 2). Of the 41 phyla originally 15 16 280 represented in the soil, only nine phyla remained when communities were adapted to switchgrass 17 18 19 281 only, while iron addition resulted in the growth of taxa from 28 different phyla on switchgrass as 20 21 282 the sole carbon source. These results clearly show that selective growth does indeed reduce the 22 23 24 283 complexity of a native soil community. 25 26 284 27 28 29 285 Lignin-adapted Municipal Green Waste Compost Cultures. To test a different set of selective 30 31 286 conditions, we adapted the municipal green waste compost community to various types of 32 33 34 287 purified lignin as the sole carbon source under aerobic conditions, to select for microbes 35 36 288 specialized in the deconstruction and/or modification of lignin (Table 1). The community was 37 38 289 grown under aerobic conditions since many lignin-degrading enzymes use oxidative chemistry to 39 40 41 290 depolymerize lignin. Three types of lignin were chosen as the carbon source: alkali lignin (AL), 42 43 291 organosolv lignin (OL), and Indulin AT (IL). The AL is sulfonated, and completely water- 44 45 46 292 soluble with a MW range reported by Sigma Aldrich (St. Louis, MO) between 10,000 and 47 48 293 60,000. The OL is a mixture of soluble and insoluble lignin, indicating that it may have higher 49 50 51 294 MW fragments of lignin than the AL. The IL is unsulfonated and completely insoluble (the 52 53 295 soluble fraction was removed by water extraction), and therefore is likely to contain only high 54 55 296 MW lignin. The range of water solubility and MW for these types of lignin is likely to facilitate 56 57 58 297 the enrichment of microbes with a broad range of lignin-degrading attributes. 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 13 of 31 63 64 65 1 2 3 4 298 After five two-week enrichments, the composition of each lignin-adapted microbial 5 6 7 299 community was determined by SSU rRNA amplicon pyrosequencing. In the compost inoculum, 8 9 300 a total of 391 different taxa were identified representing 24 bacterial and eukaryotic phyla. The 10 11 12 301 diversity of the compost inoculum microbial community was reduced to 30, 98, and 136 taxa in 13 14 302 the lignin-enriched cultures belonging to 15 , with each of the three lignin-adapted 15 16 303 cultures dominated by Alphaproteobacteria (Fig. 4b, Table 2). In general, were of 17 18 19 304 minor abundance in both the compost inoculum and the lignin-adapted communities, with the 20 21 305 exception of one taxon belonging to the Alveolata, which accounted for two percent of the 22 23 24 306 microbial community in the OL enrichment. 25 26 307 Identifying the organisms shared between the three cultures may indicate which 27 28 29 308 organisms are playing an active role in lignin modification and depolymerization. Eight 30 31 309 phylotypes are common in all three lignin-amended cultures. Five of these phylotypes have 32 33 34 310 cultured representatives: Paracoccus sp. Str. WB1, Mesorhizobium sp. Str. CCBAU 41182, 35 36 311 Brackish water isolate str. HINUF007, Rhizobium sp. str. RM1-2001, Hyphomicrobium aestuarii 37 38 312 str. DSM 1564. The other three phylotypes are most closely related to SSU rDNA clones 39 40 41 313 recovered from environmental samples (Water manure clone, Aspen rhizosphere clone, and 42 43 314 Solid waste clone). 44 45 46 315 Again, the lignin-adapted communities showed a substantial decrease in microbial 47 48 316 diversity compared to the native compost inoculum. The selective conditions were completely 49 50 51 317 different from the switchgrass-adapted soil communities, yet both types of selection reduced the 52 53 318 number of taxa compared to the respective native community, ranging from 3 to 15 fold. 54 55 319 56 57 58 320 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 14 of 31 63 64 65 1 2 3 4 321 Discussion 5 6 7 322 8 9 323 Sequencing of individual genomes and metagenomes has advanced explosively in the last 5 10 11 12 324 years (e.g., Roche 454, Illumina, and Solexa). We can now sequence an organism’s entire 13 14 325 genome or an environmental metagenome in a few hours. Unfortunately, the resulting avalanche 15 16 326 of bioinformatic data in terms of assembly and annotation has become a major bottleneck for 17 18 19 327 utilizing this data. This is especially true for metagenomes from complex communities, where 20 21 328 many previously unsequenced taxa are present [46]. Indeed, by most estimates, so far less than 22 23 24 329 1% of all microbial taxa have been identified, and less than 1% of those identified have been 25 26 330 sequenced. Clearly, there is a vast, untapped genomic potential in many habitats waiting for the 27 28 29 331 bioinformatic databases to catch up. Until that time, we need to devise ways to limit the 30 31 332 complexity found in these untapped resources, so that we can start to realize discovery of 32 33 34 333 enzymes needed for enabling bioenergy breakthroughs. Examination of metagenomic datasets 35 36 334 obtained from Puerto Rican rainforest soil indicates that this microbial community possesses 37 38 335 many genes of interest to JBEI—such as cellulases, hemicellulases, and lignases—for 39 40 41 336 deconstruction of feedstock biomass material, as well as enzymes and pathways for synthesis of 42 43 337 new biofuels. Similar results were found earlier with municipal green waste compost [1]. The 44 45 46 338 wide variety of enzyme families found in the rainforest soil (close to half of the 300 enzyme 47 48 339 families described in CAZy and FOLy are present at five hits or more) is illustrative of the 49 50 51 340 incredible diversity found in natural soil communities. Interestingly, about one sixth of the 16% 52 53 341 of the sequence reads assigned by BLASTX to glycoside hydrolases match genes not yet 54 55 342 assigned to a specific family (GH0 and GT0 in Fig. 3). This is a significantly higher fraction 56 57 58 343 than was found in the compost sample (data not shown), suggesting that the rainforest 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 15 of 31 63 64 65 1 2 3 4 344 environment may provide a rich source for as-yet uncharacterized families of glycoside 5 6 7 345 hydrolases and glycosyl tranferases. However, despite the abundant sequence information, the 8 9 346 shotgun sequence reads cannot be assembled into full-length enzymes because of the enormous 10 11 12 347 complexity of soil microbial communities. Partial sequences of the length obtained in this study 13 14 348 (averaging 417 bp) are too short to encapsulate many typical lignocellulolytic domains, let alone 15 16 349 full-length genes. To obtain greater sequence coverage of the metagenome and genes of interest, 17 18 19 350 our strategy is to simplify the microbial communities by generating specific feedstock-adapted 20 21 351 consortia, which will possess the functional characteristics desirable for development of next- 22 23 24 352 generation biofuels. 25 26 353 Since the tropical rain forest soil is rich in anaerobic microorganisms, we reduced the 27 28 29 354 complexity of microbial community by growing them anaerobically on switchgrass as a sole 30 31 355 carbon and energy source, and by enriching with iron as a terminal electron acceptor. By 32 33 34 356 performing this selective enrichment we were able to cultivate and characterize mixed microbial 35 36 357 populations that are proficient in lignocellulosic degradation. The anaerobic enrichment yielded 37 38 358 mostly Gammaproteobacteria from the family Enterobacteraceae. The Enterobacteriaceae 39 40 13 41 359 were determined previously to play a strong role in anaerobic degradation of C-glucose, likely 42 43 360 by mixed acid fermentation [15]. This group of facultative aerobes is most commonly identified 44 45 46 361 with intestinal gut microbiota, though their role in anaerobic processes in soils is becoming more 47 48 362 apparent [26]. Taxa from the groups Helicobacterales (Epsilon-proteobacteria) and 49 50 51 363 Pepttococcales (class ) are also generally facultative aerobes commonly associated 52 53 364 with the gut microbial community, as pathogens as well as commensals [32]. Since the 54 55 365 microbial community of ruminant is relatively well characterized, and an active case 56 57 58 366 study site for biofuels [48], it is encouraging to find classic cellulolytic taxa enriched in these 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 16 of 31 63 64 65 1 2 3 4 367 communities. The anaerobic switchgrass-adapted tropical forest soil cultures also turned up 5 6 7 368 in the genus Beutenbergia sp., Arthrobacter sp., and Streptomyces. The 8 9 369 Beutenbergia spp. were initially discovered as anaerobes isolated from a cave [22]. Arthrobacter 10 11 12 370 spp. have so far only coincidentally been associated with lignocellulolytic environments, but 13 14 371 enriched taxa are closely related to the taxa Streptomyces viridosporus, known to depolymerize 15 16 372 lignin as it degrades lignocellulose via an extracellular lignin peroxidase [40]. 17 18 19 373 Iron addition caused a large increase in the diversity of feedstock-adapted communities, 20 21 374 likely because these cultures were limited for electron donors. These taxa were from many 22 23 24 375 diverse phyla, but sulfate-reducing bacteria dominated, including members of the 25 26 376 Deltaproteobacteria and Desulfotomaculum ( Firmicutes). There are myriad examples in 27 28 29 377 the literature of sulfate-reducing bacteria (SRB) engaged in aromatic hydrocarbon degradation 30 31 378 [20]. The role of SRBs in anaerobic decomposition of lignocellulose in natural environments has 32 33 34 379 been well characterized for industrial applications, where leachate from paper pulp mills or 35 36 380 municipal solid waste treatment seek to enrich SRBs for the purpose of bioremediation 37 38 381 [16, 27]. Members of the family Desulfobacteriaceae have been shown to use cellulose 39 40 41 382 fermentation products coupled with sulfidogenesis in co-cultures derived from a soda lake [52]. 42 43 383 While it has not been shown directly in soils, it seems likely that the sulfate-reducing bacteria 44 45 46 384 possess the ability to depolymerize and assimilate carbon from lignocellulose in the soil and 47 48 385 couple it to sulfate reduction [41, 47]. The heretofore, poor characterization of sulfate-reducing 49 50 51 386 bacteria in anaerobic lignocellulose degradation suggests opportunity for discovery of novel 52 53 387 mechanisms of deconstruction. 54 55 388 The green waste compost community, which is dominated by aerobic microorganisms, 56 57 58 389 was reduced in complexity by growing the community on lignin as a sole carbon source under 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 17 of 31 63 64 65 1 2 3 4 390 aerobic conditions. Lignin-adapted cultures were also enriched for a variety of organisms 5 6 7 391 represented by different phyla. The most abundant organism in the AL-amended culture was a 8 9 392 member of the phylum Actinobacteria, related to Rhodococcus sp. Isolates of the genus 10 11 12 393 Rhodococcous have been shown to degrade a wide variety of aromatic compounds, hydrophobic 13 14 394 natural compounds, and xenobiotics, including lignin-like compounds, and may be responsible 15 16 395 for lignin degradation in the AL culture [2, 3, 17, 13]. One organism present in all three lignin- 17 18 19 396 amended cultures is related to the brackish water isolate str. HINUF007. Its SSU rDNA 20 21 397 sequence is related to the family Sphingomonadaceae (98.9% identity to Sphingomonadaceae 22 23 24 398 bacterium clone IWENVB15). Sphingomonas paucimobilis has been extensively characterized 25 26 399 for its role in lignin degradation and can catabolize several lignin-derived biaryls and monoaryls 27 28 29 400 [35]. Therefore, the Sphingomonas-related organism in the lignin-amended cultures may have an 30 31 401 important role in lignin degradation. Other phyla represented in these lignin-adapted cultures 32 33 34 402 include and . Little is known about these phyla, yet both are 35 36 403 prevalent in soil and are proposed to deconstruct complex organic matter [38, 29]. Isolation of 37 38 404 lignin-degrading members of these phyla or genome reconstruction of these members from 39 40 41 405 metagenomic sequencing data will provide important insights into their ability to degrade lignin. 42 43 406 The SSU rRNA community profiles also indicate that a second unintentional selection 44 45 46 407 force may be at work in the lignin-amended cultures. Four of the phylotypes shared between the 47 48 408 three cultures (Paracoccus, Mesorhizobium, Rhizobium, and Hyphomicrobium) have cultured 49 50 51 409 representatives with the ability to perform denitrification under low or fluctuating oxygen 52 53 410 concentrations [19, 4, 18]. The primary nitrogen source in the lignin-amended cultures is NH4+; 54 55 411 however, the trace elements solution used to supply essential metals contain the chelator 56 57 58 412 nitrilotriacetate (NTA). Nitrilotriacetate is a biodegradable carbon source, and many NTA- 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 18 of 31 63 64 65 1 2 3 4 413 utilizing bacteria have been characterized [8]. Aerobic microorganisms initiate NTA degradation 5 6 7 414 with the enzyme NTA monooxygenase, a gene present in both Mesorhizobium and Paraccocus 8 9 415 genomes as determined from a search of NCBI [8]. Although the concentration of NTA in the 10 11 12 416 cultures was low (about 0.5 µM), it is unclear whether these organisms were consuming NTA or 13 14 417 lignin. Removal of NTA from future enrichment cultures should clarify this issue. 15 16 418 The exponential advances that have occurred in pyrosequencing in just the last 5 years 17 18 19 419 have made us appreciate even more ―The Doctrine of Infallibility‖— i.e., there is no compound, 20 21 420 man-made or natural, that microorganisms cannot degrade. However, until we sequence more of 22 23 24 421 the extant microbial community and develop better and higher throughput techniques for genome 25 26 422 assembly and annotation, we need to reduce community complexity in ways that are appropriate 27 28 29 423 for discovering new, essential enzymes. This is especially true for discovering lignocellulosic 30 31 424 enzymes that can be used for production of alternative liquid fuels. Choosing the appropriate 32 33 34 425 environmental community and enrichment conditions, (e.g., anaerobic or aerobic) is necessary in 35 36 426 selecting for microbes that can deconstruct lignocellulosic biomass under industrial pretreatment- 37 38 427 relevant conditions. We demonstrated that metagenome and SSU rRNA genotyping of soil and 39 40 41 428 compost microbial communities show they are inhabited by lignocellulose-degrading microbes, 42 43 429 and are thus good candidates for starting inocula. The decrease in complexity of these 44 45 46 430 communities will likely increase sequence coverage of the metagenome using 454 47 48 431 pyrosequencing, increasing the number of full-length genes identified, and facilitating the 49 50 51 432 discovery of lignocelluloses-degrading enzymes. Selective community enrichments like these 52 53 433 should greatly facilitate our ability to use the latest advances in genomic sequencing as a 54 55 434 platform for high throughput discovery of unique new enzymes, which will advance the 56 57 58 435 development of lignocellulosic liquid biofuels. 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 19 of 31 63 64 65 1 2 3 4 436 5 6 7 437 8 9 438 Acknowledgements: The authors would like to especially thank Dr. Ken Vogel of the USDA for 10 11 12 439 the Switchgrass samples used in this study. This work was part of the DOE Joint BioEnergy 13 14 440 Institute (http://www.jbei.org) supported by the U. S. Department of Energy, Office of Science, 15 16 441 Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 17 18 19 442 between Lawrence Berkeley National Laboratory and the U. S. Department of Energy. 20 21 443 22 23 24 444 References 25 26 445 27 28 29 446 [1] Allgaier M, Reddy A, Park JI, Ivanova N, D'Haeseleer P, Lowry S, et al. (2010) Targeted 30 447 Discovery of Glycoside Hydrolases from a Switchgrass-Adapted Compost Community. Plos 31 448 One 5: 9. 32 449 [2] Andreoni V, Bernasconi S, Bestetti P & Villa M (1991) Metabolism of lignin-related 33 34 450 compounds by Rhodococcus rhodochorous- bioconversion of anisoin. Applied Microbiology and 35 451 Biotechnology 36: 410-415. 36 452 [3] Barnes MR, Duetz WA & Williams PA (1997) A 3-(3-hydroxyphenyl)propionic acid 37 453 catabolic pathway in Rhodococcus globerulus PWD1: Cloning and characterization of the hpp 38 454 operon. Journal of Bacteriology 179: 6145-6153. 39 40 455 [4] Baumann B, Snozzi M, Zehnder AJ & Van Der Meer JR (1996) Dynamics of denitrification 41 456 activity of Paracoccus denitrificans in continuous culture during aerobic-anaerobic changes. J 42 457 Bacteriol 178: 4367-4374. 43 458 [5] Braid MD, Daniels LM & Kitts CL (2003) Removal of PCR inhibitors from soil DNA by 44 459 chemical flocculation. Journal of Microbiological Methods 52: 389-393. 45 46 460 [6] Brodie EL, DeSantis TZ, Joyner DC, Baek SM, Larsen JT, Andersen GL, et al. (2006) 47 461 Application of a high-density oligonucleotide microarray approach to study bacterial population 48 462 dynamics during uranium reduction and reoxidation. Applied and Environmental Microbiology 49 463 72: 6288-6298. 50 51 464 [7] Brulc JM, Antonopoulos DA, Miller MEB, Wilson MK, Yannarell AC, Dinsdale EA, et al. 52 465 (2009) Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals 53 466 forage specific glycoside hydrolases. Proceedings of the National Academy of Sciences of the 54 467 United States of America 106: 1948-1953. 55 468 [8] Bucheli-Witschel M & Egli T (2001) Environmental fate and microbial degradation of 56 57 469 aminopolycarboxylic acids. FEMS Microbiology Reviews 25: 69-106. 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 20 of 31 63 64 65 1 2 3 4 470 [9] Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V & Henrissat B (2009) The 5 6 471 Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic 7 472 Acids Research 37: D233-D238. 8 473 [10] Charles D (2009) Corn-Based Ethanol Flunks Key Test. Science 324: 587-587. 9 474 [11] Chou HH & Holmes MH (2001) DNA sequence quality trimming and vector removal. 10 475 Bioinformatics 17: 1093-1104. 11 12 476 [12] Claudel-Renard C, Chevalet C, Faraut T & Kahn D (2003) Enzyme-specific profiles for 13 477 genome annotation: PRIAM. Nucleic Acids Research 31: 6633-6639. 14 478 [13] de Carvalho C & da Fonseca MMR (2005) The remarkable Rhodococcus erythropolis. 15 479 Applied Microbiology and Biotechnology 67: 715-726. 16 480 [14] DeAngelis KM, Silver WL, Thompson AW & Firestone MK (2009) Adaptation and 17 18 481 acclimation of Puerto Rico soil microbial communities to fluctuating redox. Ecology Letters 19 482 submitted. 20 483 [15] Degelmann DM, Kolb S, Dumont M, Murrell JC & Drake HL (2009) Enterobacteriaceae 21 484 facilitate the anaerobic degradation of glucose by a forest soil. FEMS Microbiology Ecology 68: 22 23 485 312-319. 24 486 [16] Detmers J, Schulte U, Strauss H & Kuever J (2001) Sulfate reduction at a lignite seam: 25 487 Microbial abundance and activity. Microbial Ecology 42: 238-247. 26 488 [17] Eulberg D, Golovleva LA & Schlomann M (1997) Characterization of catechol catabolic 27 489 genes from Rhodococcus erythropolis 1CP. Journal of Bacteriology 179: 370-381. 28 29 490 [18] Fesefeldt A, Kloos K, Bothe H, Lemmer H & Gliesche CG (1998) Distribution of 30 491 denitrification and nitrogen fixation genes in Hyphomicrobium spp. and other budding bacteria. 31 492 Canadian Journal of Microbiology 44: 181-186. 32 493 [19] Garcia-Plazaola JI, Becerril JM, Arreseigor C, Hernandez A, Gonzalezmurua C & 33 34 494 Apariciotejo PM (1993) Denitrifying ability of 13 Rhizobium meliloti strains. Plant and Soil 35 495 149: 43-50. 36 496 [20] Gibson J & Harwood CS (2002) Metabolic diversity in aromatic compound utilization by 37 497 anaerobic microbes. Annual Review of Microbiology 56: 345-369. 38 498 [21] Giroux H, Vidal P, Bouchard J & Lamy F (1988) Degradation of kraft indulin by 39 40 499 Streptomyces viridosporus and Streptomyces badius. Applied and Environmental Microbiology 41 500 54: 3064-3070. 42 501 [22] Groth I, Schumann P, Schuetze B, Augsten K, Kramer I & Stackebrandt E (1999) 43 502 Beutenbergia cavernae gen. nov., sp, nov., an L-lysine-containing actinomycete isolated from a 44 503 cave. International Journal of Systematic Bacteriology 49: 1733-1740. 45 46 504 [23] Hershberger KL, Barns SM, Reysenbach AL, Dawson SC & Pace NR (1996) Wide 47 505 diversity of . Nature 384: 420-420. 48 506 [24] Jaeger KE & Eggert T (2002) Lipases for biotechnology. Current Opinion in Biotechnology 49 507 13: 390-397. 50 51 508 [25] Janssen PH, Schuhmann A, Morschel E & Rainey FA (1997) Novel anaerobic 52 509 ultramicrobacteria belonging to the Verrucomicrobiales lineage of bacterial descent isolated by 53 510 dilution culture from anoxic rice paddy soil. Applied and Environmental Microbiology 63: 1382- 54 511 1388. 55 512 [26] Kersters K (2006) The Genus Deleya. : A Handbook on the Biology of Bacteria, 56 57 513 Vol 6, Third Edition: Proteobacteria: Gamma Subclass 836-843. 58 514 [27] Kim J, Kim M & Bae W (2009) Effect of oxidized leachate on degradation of lignin by 59 515 sulfate-reducing bacteria. Waste Management & Research 27: 520-526. 60 61 62 Enrichment Cultures for Enzyme Discovery, page 21 of 31 63 64 65 1 2 3 4 516 [28] Kunin WE, Vergeer P, Kenta T, Davey MP, Burke T, Woodward FI, et al. (2009) Variation 5 6 517 at range margins across multiple spatial scales: environmental temperature, population genetics 7 518 and metabolomic phenotype. Proceedings of the Royal Society B-Biological Sciences 276: 1495- 8 519 1506. 9 520 [29] Lee KC, Webb RI, Janssen PH, Sangwan P, Romeo T, Staley JT, et al. (2009) Phylum 10 521 Verrucomicrobia representatives share a compartmentalized cell plan with members of bacterial 11 12 522 phylum . Bmc Microbiology 9. 13 523 [30] Lee SK, Chou H, Ham TS, Lee TS & Keasling JD (2008) Metabolic engineering of 14 524 microorganisms for biofuels production: from bugs to synthetic biology to fuels. Current 15 525 Opinion in Biotechnology 19: 556-563. 16 526 [31] Levasseur A, Plumi F, Coutinho PM, Rancurel C, Asther M, Delattre M, et al. (2008) 17 18 527 FOLy: An integrated database for the classification and functional annotation of fungal 19 528 oxidoreductases potentially involved in the degradation of lignin and related aromatic 20 529 compounds. Fungal Genetics and Biology 45: 638-645. 21 530 [32] Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, et al. (2008) 22 23 531 Evolution of mammals and their gut microbes. Science 320: 1647-1651. 24 532 [33] Li LL, McCorkle SR, Monchy S, Taghavi S & van der Lelie D (2009) Bioprospecting 25 533 metagenomes: glycosyl hydrolases for converting biomass. Biotechnology for Biofuels 2: 11. 26 534 [34] Lopez MJ, Vargas-GarcÌa MdC, Su·rez-Estrella F, Nichols NN, Dien BS & Moreno J 27 535 (2007) Lignocellulose-degrading enzymes produced by the ascomycete Coniochaeta ligniaria 28 29 536 and related species: Application for a lignocellulosic substrate treatment. Enzyme and Microbial 30 537 Technology 40: 794-800. 31 538 [35] Masai E, Katayama Y & Fukuda M (2007) Genetic and biochemical investigations on 32 539 bacterial catabolic pathways for lignin-derived aromatic compounds. Bioscience Biotechnology 33 34 540 and Biochemistry 71: 1-15. 35 541 [36] McGroddy M & Silver WL (2000) Variations in belowground carbon storage and soil CO2 36 542 flux rates along a wet tropical climate gradient. Vol. 32 ed.^eds.), p.^pp. 614-624. 37 543 [37] Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, et al. (2008) The 38 544 metagenomics RAST server - a public resource for the automatic phylogenetic and functional 39 40 545 analysis of metagenomes. Bmc Bioinformatics 9: 8. 41 546 [38] Mou XZ, Sun SL, Edwards RA, Hodson RE & Moran MA (2008) Bacterial carbon 42 547 processing by generalist species in the coastal ocean. Nature 451: 708-U704. 43 548 [39] Parton W, Silver WL, Burke IC & Grassens L (2007) Global-Scale Similarities in Nitrogen 44 549 Release Patterns During Long-Term Decomposition. Science. 45 46 550 [40] Pasti MB, Pometto AL, Nuti MP & Crawford DL (1990) Lignin-solubilizing ability of 47 551 Actinomycetes isolated from termite (Termitidae) hindgut. Applied and Environmental 48 552 Microbiology 56: 2213-2218. 49 553 [41] Seeliger S, Cord-Ruwisch R & Schink B (1998) A periplasmic and extracellular c-type 50 51 554 cytochrome of Geobacter sulfurreducens acts as a ferric iron reductase and as an electron carrier 52 555 to other acceptors or to partner bacteria. Journal of Bacteriology 180: 3686-3691. 53 556 [42] Silver WL, Lugo AE & Keller M (1999) Soil oxygen availability and biogeochemistry 54 557 along rainfall and topographic gradients in upland wet tropical forest soils. Biogeochemistry 44: 55 558 301-328. 56 57 559 [43] Tanner RS (2007) Cultivation of bacteria and fungi. Manual of environmental microbiology 58 560 69-78. 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 22 of 31 63 64 65 1 2 3 4 561 [44] Tschech A & Pfennig N (1984) Growth-Yield Increase Linked to Caffeate Reduction in 5 6 562 Acetobacterium-Woodii. Archives of Microbiology 137: 163-167. 7 563 [45] Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, et al. (2009) A 8 564 core gut microbiome in obese and lean twins. Nature 457: 480-U487. 9 565 [46] Vogel TM, Simonet P, Jansson JK, Hirsch PR, Tiedje JM, van Elsas JD, et al. (2009) 10 566 TerraGenome: a consortium for the sequencing of a soil metagenome. Nature Reviews 11 12 567 Microbiology 7: 252-252. 13 568 [47] Walker CB, He ZL, Yang ZK, Ringbauer JA, He Q, Zhou JH, et al. (2009) The Electron 14 569 Transfer System of Syntrophically Grown Desulfovibrio vulgaris. Journal of Bacteriology 191: 15 570 5793-5801. 16 571 [48] Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT, et al. 17 18 572 (2007) Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher 19 573 termite. Nature 450: 560-U517. 20 574 [49] Weaver PL & Murphy PG (1990) Forest Structure and Productivity in Puerto-Rico Luquillo 21 575 Mountains. Biotropica 22: 69-82. 22 23 576 [50] Widdel F, Kohring GW & Mayer F (1983) Studies on Dissimilatory Sulfate-Reducing 24 577 Bacteria That Decompose Fatty-Acids .3. Characterization of the Filamentous Gliding 25 578 Desulfonema-Limicola Gen-Nov Sp-Nov, and Desulfonema-Magnum Sp-Nov. Archives of 26 579 Microbiology 134: 286-294. 27 580 [51] Wilson KH, Blitchington RB & Greene RC (1990) Amplification of Bacterial-16s 28 29 581 Ribosomal DNA with Polymerase Chain-Reaction. Journal of Clinical Microbiology 28: 1942- 30 582 1946. 31 583 [52] Zavarzin GA, Zhilina TN & Dulov LE (2008) Alkaliphilic sulfidogenesis on cellulose by 32 584 combined cultures. Microbiology 77: 419-429. 33 34 585 35 586 36 587 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 23 of 31 63 64 65 1 2 3 4 588 5 6 7 589 Table 1. Culture conditions of feedstock-adapted consortia 8 9 590 10 11 12 Sample Temp 13 14 Name Inoculaa Feedstocka Mediaa (°C) Headspace pH 15 16 17 OL Compost Organosolv lignin M9TE 37 Aerobic 7.0 18 19 AL Compost Alkali lignin M9TE 37 Aerobic 7.0 20 21 IL Compost Indulin AT M9TE 37 Aerobic 7.0 22 23 24 25 26 SG PR soil switchgrass BMM 25 N2 5.5 27 28 29 SG-Fe PR soil switchgrass BMM + Fe 25 N2 5.5 30 31 591 32 33 a 34 592 See Methods for description of Inocula, feedstock, and media. 35 36 593 37 38 39 594 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 24 of 31 63 64 65 1 2 3 4 595 Table 2. Table of dominant phylotypes in the feedstock-adapted communities cross-referenced to 5 6 7 596 literature reports of exoenzyme activity relevant to biofuels. 8 9 597 10 11 12 Sample Abundanc Phylum Class Closest relative taxon 13 Name e (%) 14 OL 48.3 Proteobacteria Alphaproteobacteria Paracoccus sp. 15 3.5 Proteobacteria Gammaproteobacteria Water manure clone 16 17 3.0 Proteobacteria Alphaproteobacteria Brackish water isolate 18 2.8 Proteobacteria Alphaproteobacteria Mesorhizobium sp. 19 2.8 Proteobacteria Alphaproteobacteria Azospirillum rugosum str. 20 2.8 Proteobacteria Alphaproteobacteria Hyphomicrobium aestuarii str. 21 2.2 Proteobacteria Betaproteobacteria Pigmentiphaga sp. 22 23 2.0 Alveolata Voromonas Voromonas pontica str. 24 2.0 Bacteroidetes Flavobacteriales Wastewater treatment reactor clone 25 1.7 Verrucomicrob Opitutae Biofilm reed bed reactor clone 26 1.5 ia Flexibacterales Prairie soil clone 27 28 1.5 Bacteroidetes Alphaproteobacteria Wastewater plant clone 29 1.5 Proteobacteria Alphaproteobacteria Chromium contaminated soil clone 30 1.4 Proteobacteria Alphaproteobacteria Rhizobium sp. 31 1.2 Proteobacteria Alphaproteobacteria Solid waste clone 32 1.2 Proteobacteria Gammaproteobacteria Soil clone 33 34 1.2 Proteobacteria Alphaproteobacteria Phaeospirillum fulvum str. 35 1.2 Proteobacteria Solibacteres Aspen rhizosphere clone 36 1.0 Acidobacteria Gammaproteobacteria Activated sludge maturation clone 37 Proteobacteria 38 39 AL 17.6 Actinobacteria Actinobacteridae Rhodococcus sp. 40 8.0 Firmicutes Bacillales Oxalophagus oxalicus str. 41 7.5 Proteobacteria Alphaproteobacteria Brackish water isolate 42 7.2 Proteobacteria Alphaproteobacteria Paracoccus sp. 43 6.1 Proteobacteria Alphaproteobacteria Chromium contaminated soil clone 44 45 4.9 Proteobacteria Alphaproteobacteria Mesorhizobium sp. 46 4.3 Bacteroidetes Sphingobacterium sp. 47 4.1 Proteobacteria Gammaproteobacteria Water manure clone 48 3.6 Actinobacteria Actinobacteridae Microbacterium sp. 49 2.8 Proteobacteria Alphaproteobacteria Hyphomicrobium aestuarii str. 50 51 2.6 Acidobacteria Solibacteres Aspen rhizosphere clone 52 2.1 Proteobacteria Alphaproteobacteria Rhizobium sp. 53 1.6 Proteobacteria Alphaproteobacteria Sphingomonas sp. 54 1.4 Proteobacteria Betaproteobacteria Bordetella sp. 55 56 1.4 Bacteroidetes Sphingobacteria Pedobacter koreensis str. 57 1.2 Proteobacteria Alphaproteobacteria Solid waste clone 58 IL 13.6 Proteobacteria Alphaproteobacteria Rhizobium sp. 59 12.7 Proteobacteria Alphaproteobacteria Brackish water isolate 60 61 62 Enrichment Cultures for Enzyme Discovery, page 25 of 31 63 64 65 1 2 3 4 9.4 Proteobacteria Alphaproteobacteria Mesorhizobium sp. 5 6 7.5 Proteobacteria Betaproteobacteria Pigmentiphaga sp. 7 6.9 Proteobacteria Alphaproteobacteria Paracoccus sp. 8 5.7 Proteobacteria Alphaproteobacteria Wastewater plant clone 9 4.2 Acidobacteria Solibacteres Aspen rhizosphere clone 10 4.2 Proteobacteria Alphaproteobacteria Solid waste clone 11 12 3.6 Proteobacteria Alphaproteobacteria Sphingomonadaceae str. 13 2.7 Proteobacteria Alphaproteobacteria Chelatovorus multitrophus str. 14 2.1 Bacteroidetes Mature mushroom compost isolate 15 1.8 Actinobacteria Rubrobacteridae Solirubrobacter sp. 16 17 1.8 Bacteroidetes Flavobacteriales Wastewater treatment reactor clone 18 1.8 Proteobacteria Gammaproteobacteria Oceanospirillum sp. 19 1.5 Proteobacteria Alphaproteobacteria Freshwater lake isolate 20 1.5 Proteobacteria Gammaproteobacteria Oil-contaminated soil clone 21 22 1.2 Proteobacteria Gammaproteobacteria Water manure clone 23 1.2 Proteobacteria Alphaproteobacteria Hyphomicrobium aestuarii str. 24 1.2 Proteobacteria Gammaproteobacteria Pseudomonas sp. 25 1.2 Proteobacteria Alphaproteobacteria Mesorhizobium sp. 26 598 27 28 599 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 26 of 31 63 64 65 1 2 3 4 600 Table 3. Richness of switchgrass-adapted consortia derived from tropical forest soils as 5 6 7 601 determined by SSU ribosomal RNA gene microarray PhyloChip. 8 9 Sample Richness Phylum Class Nearest taxon 10 SG only 61 Proteobacteria Gammaproteobacteria Enterobacterales 11 12 6 Proteobacteria Epsilonproteobacteria Helicobacterales 13 3 Actinobacteria Actinobacteria Actinomycetales 14 3 Unclassified Unclassified 15 2 Proteobacteria Betaproteobacteria Rhodocyclales 16 17 2 Acidobacteria Acidobacteria Acidobacterales 18 1 Firmicutes Clostridia Peptococcales 19 1 Bacteroidetes Sphingobacteria Flammeovirgaceae 20 1 Cyanobacteria Chloroplasts 21 1 Anaerolineae 22 23 1 Firmicutes gut clone group 24 1 Bacteroidetes KSA1 25 1 Thermodesulfobacteria 26 SG+Fe 27 Bacteroidetes Bacteroidetes Prevotellaceae 27 28 & 19 Proteobacteria Deltaproteobacteria Desulfovibrionaceae 29 NOT SG 13 Proteobacteria Alphaproteobacteria Caulobacterales 30 11 Firmicutes Bacilli Enterococcales 31 8 Verrucomicrobia Verrucomicrobiae Verrucomicrobia 32 4 Chloroflexi Dehalococcoidetes 33 34 3 Firmicutes Desulfotomaculum Desulfovibrionaceae 35 2 Planctomycetes Planctomycetacia Prellulae 36 2 Bacteroidetes Flavobacteria 37 2 Firmicutes Catabacter 38 39 2 Firmicutes Symbiobacteria 40 2 OP9/JS1 OP9 41 2 NC10 NC10-1 42 1 Spirochaetes Spirochaetaceae 43 1 TM7 TM7-3 44 45 1 Aquificae 46 1 OP10 CH21 cluster 47 1 Chlamydiae Parachlamydiaceae 48 1 Chloroflexi Thermomicrobia 49 1 Acidobacteria Acidobacteria-5 50 51 1 marine group A mgA-2 52 53 602 54 603 55 56 604 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 27 of 31 63 64 65 1 2 3 4 605 Table S1: Inventory of selected lignocellulose degrading glycoside hydrolase families (GHs) 5 606 identified in the rain forest soil microbiome. 6 CAZy known activity Rain forest Compost Compost Cow rumen Termite hindgut 7 family (blastx) (blastx) (pfam) (pfam) (pfam) 8 (Allgaier et (Brulc et al., (Warnecke et al., 9 al., 2010) 2009) 2007) 10 Cellulases GH5 cellulase 5.8 4.0 3.2 1 16.3 11 GH6 endoglucanase 0.5 1.2 2.1 0 0 12 GH7 endoglucanase 0 0 0.1 0 0 13 GH9 endoglucanase 2.0 2.5 4.3 0.9 3.9 14 GH44 endoglucanase 1.2 0.3 0.4 0 1 GH45 endoglucanase 0 0 0 0 0.6 15 GH48 endo-processive cellulases 0.4 1.1 0.5 0 0 16 Total 10 9.1 10.6 1.9 21.8 17 18 Endohemicellulases 19 GH8 endo-xylanases 2.9 1.2 0.5 0.6 2.7 GH10 endo-1,4--xylanase 1.6 6.1 8.9 1 12.1 20 GH11 xylanase 0.3 1.2 1.4 0.1 2.5 21 GH12 endoglucanase & 0.4 0.9 0.6 0 0 22 xyloglucan hydrolysis 23 GH26 -mannanase & xylanase 0.4 1.2 1.5 0.7 2.8 24 GH28 galacturonases 8.6 6.2 0.9 0.7 1.7 GH53 endo-1,4--galactanase 0.6 0.2 0.2 2.5 2.7 25 Total 14.7 17 14 5.6 24.5 26 27 Cell wall elongation 28 GH16 xyloglucanases & 3.3 2.5 2 0 0.3 29 xyloglycosyltransferases 30 GH17 1,3--glucosidases 2.0 3.0 0.1 0 0 GH74 endoglucanases & 0.3 0.2 1.6 0 1 31 xyloglucanases 32 GH81 1,3--glucanase 0 0.4 0.3 0 0 33 Total 5.6 6.2 4 0 1.3 34 35 Debranching enzymes GH51 -L-arabinofuranosidase 4.3 5.9 7.8 9.9 3.5 36 GH54 -L-arabinofuranosidase 0.4 0 0 0.2 0 37 GH62 -L-arabinofuranosidase 0.2 0.6 1.7 0 0 38 GH67 -glucuronidase 0.8 3.0 3.6 0 2.3 39 GH78 -L-rhamnosidase 3.1 4.5 8.1 5.1 0.9 40 Total 8.8 14 21.2 15.2 6.7 41 Oligosaccharide-degrading enzymes 42 GH1 -glucosidase and many 6.5 5.0 9.2 1.5 3.1 43 other -linked dimers 44 GH2 -galactosidases and other 13.6 9.9 8.6 28.6 8.8 45 -linked dimers 46 GH3 mainly -glucosidases 17.9 19.7 12.2 27 12.8 47 GH29 -L-fucosidase 4.5 1.8 2.1 4.2 0 48 GH35 -galactosidase 2.5 0.4 0.6 1.8 0.7 GH38 -mannosidase 2.3 1.7 2.6 2.6 5.6 49 GH39 -xylosidase 2.2 3.5 1 0.3 1.3 50 GH42 -galactosidase 1.9 1.0 2.5 1.7 4.8 51 GH43 arabinases & xylosidases 9.5 10.7 11.3 9.4 8.3 52 GH52 -xylosidase 0.2 0 0 0 0.4 53 Total 60.9 53.7 50.1 77.1 45.8

54 Total GHs 4206 4708 801 651 653 55 607 GHs are grouped according to major functional role and compared to other lignocellulosic systems. The indicated values are 56 608 percentages of blastx hits of unassembled reads vs. CAZy protein families, or pfam hits of assembled sequences, weighted 57 609 according to species abundance distribution. Note that the blastx-based and pfam-based abundance values for the compost dataset 58 610 are well correlated (r=0.84), indicating that blastx hits on unassembled reads are a reasonable proxy for pfam hits on assembled 611 ORFs. 59 60 612 61 62 Enrichment Cultures for Enzyme Discovery, page 28 of 31 63 64 65 1 2 3 4 613 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 29 of 31 63 64 65 1 2 3 4 614 FIGURE LEGENDS 5 6 7 615 8 9 616 Figure 1. Schematic structure of our approach to metabolic analysis of complex microbial 10 11 12 617 communities. Starting with natural samples at the bottom, a series of analytical, biochemical, 13 14 618 and bioinformatics tools are applied so that eventually we are left with the most active, relevant 15 16 619 consortia for novel enzyme and pathway discovery. 17 18 19 620 20 21 621 Figure 2. Metabolic profiling of the metagenomic sequence data revealed a wide diversity of 22 23 24 622 protein sequences relevant to bioenergy, including lignocellulolytic enzymes (cellulases, 25 26 623 hemicellulases, lignases) as well as potential biofuel synthesis pathways (e.g., mevalonate and 27 28 29 624 non-mevalonate (DOXP) isoprenoid biosynthesis, lycopene biosynthesis, and a proposed butanol 30 31 625 biosynthesis pathway). Note that our BLASTX results cover only the lignocellulolytic enzymes 32 33 34 626 present in CAZy and FOLy, whereas PRIAM EC number predictions cover virtually all known 35 36 627 metabolic pathways. 37 38 628 39 40 41 629 Figure 3. A pie chart demonstrating the relative abundance of CAZy and FOLy enzyme families 42 43 630 found in the tropical forest metagenome, inferred from BLASTX hits. GH: glycoside hydrolases; 44 45 46 631 GT: glycosyl transferrases; PL: polysaccharide lyases; CE: carbohydrate esterases; LO: lignin 47 48 632 oxidases; LDA: lignin degrading auxiliary oxidases. Note that GH0 and GT0, denoted by grey 49 50 51 633 with dotted lines; are not recognized enzyme families, but a bin for as-yet unclassified enzymes. 52 53 634 Some of the smaller subfamilies are grouped together for ease of display. 54 55 635 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 30 of 31 63 64 65 1 2 3 4 636 Figure 4. Microbial community profiles of native versus feedstock adapted communities from 5 6 7 637 (A) compost and (B) tropical forest soils. Profiles are based on the sequences of SSU rRNA 8 9 638 genes using high-density PhyloChip microarray or SSU rRNA amplicon pyrosequencing. 10 11 12 639 Richness of native communities is much higher compared to feedstock-adapted communities 13 14 640 after growth on various feedstocks. 15 16 641 17 18 19 642 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Enrichment Cultures for Enzyme Discovery, page 31 of 31 63 64 65 Figure 1.

Novel Deconstruction Enzymes & Pathways Whole Genome Sequencing

Proteomics, Metagenomics, Metatranscriptomics

Ribosomal Panel (16S/18S clone libraries)

Community Screening (PhyloChip, GeoChip, Pyrotag, MycoChip)

ontrol reactors (using e.g. cellulose, lignin)

C Enrichment/Cultivation Activity Assays

Good performing(e.g. reactors azocellulose, laccase, respiration, CO2, CH4, qPCR)

Fermentation Reactors & Rain Forest Soil Respirometers online real-time sensors (e.g. respiration, ethanol)

Tropical Forest Soils, Compost

Figure 1. Schematic structure of our approach to metabolic analysis of complex microbial communities. Starting with natural samples at the bottom, a series of analytical, biochemical and bioinformatics tools are applied so that eventually we are left with the most active, relevant consortia for novel enzyme and pathway discovery. Figure 2

FiHVSF

Endoglucanase Exoglucanase/cellobiohydrolase PRIAM Exoglucanase/glucanohydrolase

cellulose b-Glucosidase BLASTX Endoxylanase b-Xylosidase a-L-Arabinofuranosidase a-Glucuronidase Acetyl xylan esterase Feruloyl esterase Xyloglucanase

hemicellulose b-Galactosidase a-L-fucosidase Laccase Lignin peroxidase Cellobiose dehydrogenase aryl-alcohol oxidase vanillyl-alcohol oxidase glyoxal oxidase lignin pyranose oxidase galactose oxidase glucose oxidase benzoquinone reductase Feruloyl esterase 1-Deoxy-D-xylulose 5-phosphate synthase 1-Deoxy-D-xylulose 5-phosphate reductoisomerase 4-Diphosphocytidyl-2C-methyl-D-erythritol synthase 4-Diphosphocytidyl-2C-methyl-D-erythritol kinase 2C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase DOXP 1-Hydroxy-2-methyl-2(E)-butenyl 4-diphosphate synthase 1-Hydroxy-2-methyl-2(E)-butenyl 4-diphosphate reductase Isopentenyl-diphosphate delta-isomerase Acetyl-CoA-acetyltransferase Hydroxymethylglutaryl-CoA synthase Hydroxymethylglutaryl-CoA reductase Mevalonate kinase Phosphomevalonate kinase

mevalonate Diphosphomevalonate decarboxylase Isopentenyl-diphosphate delta-isomerase Acetyl-CoA C-acetyltransferase 3-hydroxyacyl-CoA dehydrogenase. Acetoacetyl-CoA reductase 3-hydroxybutyryl-CoA dehydrogenase 3-hydroxybutyryl-CoA dehydratase Enoyl-CoA hydratase

butanol Butyryl-CoA dehydrogenase Trans-2-enoyl-CoA reductase (NAD+) Acetaldehyde dehydrogenase (acetylating) Butanal dehydrogenase Butanol dehydrogenase Dimethylallyltransferase Geranyltranstransferase Geranylgeranyl-diphosphate synthase Geranylgeranyl-diphosphate geranylgeranyltransferase Phytoene desaturase lycopene Carotene 7,8-desaturase 0 50 100 150 200 250 300 350 400 450 number of hits from tropical forest metagenome A , LD LO L, P E, C G ly c o s y l t r a n ) s

H f e G r (

r

s a s

e

e s

s a

l

(

o G

r

T

d

)

y

h

e

d

i

s

o

c

y

l

G

Figure 3. This pie chart demonstrates the relative abundance found in the tropical forest metagenome for all the CAZy and FOLy enzyme families, inferred from BLASTX hits. GH: glycoside hydrolases; GT: glycosyl transferrases; PL: polysaccharide lyases; CE: carbohydrate esterases; LO: lignin oxidases; LDA: lignin degrading auxiliary oxidases. Note that GH0 and GT0, denoted by grey with dotted lines; are not proper enzyme families, but a bin for as-yet unclassied enzymes. Some of the smaller subfamilies are grouped together for ease of display. Figure 4(B Verrucomicrobia 1400 Unclassified TM7 TM6 Thermodesulfobacteria Termite group 1 1200 Synergistes Spirochaetes SPAM Proteobacteria Planctomycetes 1000 OP9/JS1 OP8 OP3 OP10 OD1 Nitrospira 800 NC10 Natronoanaerobium marine group A Lentisphaerae LD1PA group 600 Firmicutes richness (number of taxa) DSS1 Deinococcus-Thermus Cyanobacteria 400 Crenarchaeota Coprothermobacteria Chloroflexi Chlorobi Chlamydiae Caldithrix 200 BRC1 Bacteroidetes Aquificae AD3 Actinobacteria Acidobacteria 0 WS3 soil inoculum FAC, SG + Fe FAC, SG only Figure 4b

400 Unclassified Eukaryota Fungi WS6 VHS-B5-50 Verrucomicrobia 300 Thermoanaerobacteria Thermoacetogenium Thermi Spirochaetes Deltaproteobacteria Gammaproteobacteria 200 Betaproteobacteria Alphaproteobacteria Planctomycetes OP10 richness (number of taxa) NKB19 100 Gemmatimonadetes Firmicutes Fervidomicrobium Chloroflexi Chlorobi Bacteroidetes Actinobacteria 0 Acidobacteria compost AL IL OL Archaea inoculum

fromribosomalpyrosequencing.adaped (A) communitiescompost RNA genes Richness and after using(B) tropical ofgrowth high-density native forest on communities various soils. PhyloChip Proles feedstocks. are microarray aremuch based higher onor SSUcomparedthe sequencesamplicon to feedstock- of 16S

Figure 4. Microbial community proles of native versus feedstock adapted communities Supplemental Table S1

Table S1: Inventory of selected lignocellulose degrading glycoside hydrolase families (GHs) identified in the rain forest soil microbiome. CAZy known activity Rain forest Compost Compost Cow rumen Termite hindgut family (blastx) (blastx) (pfam) [1] (pfam) [1, 2] (pfam) [3] Cellulases GH5 cellulase 5.8 4.0 3.2 1 16.3 GH6 endoglucanase 0.5 1.2 2.1 0 0 GH7 endoglucanase 0 0 0.1 0 0 GH9 endoglucanase 2.0 2.5 4.3 0.9 3.9 GH44 endoglucanase 1.2 0.3 0.4 0 1 GH45 endoglucanase 0 0 0 0 0.6 GH48 endo-processive cellulases 0.4 1.1 0.5 0 0 Total 10 9.1 10.6 1.9 21.8 Endohemicellulases GH8 endo-xylanases 2.9 1.2 0.5 0.6 2.7 GH10 endo-1,4--xylanase 1.6 6.1 8.9 1 12.1 GH11 xylanase 0.3 1.2 1.4 0.1 2.5 GH12 endoglucanase & 0.4 0.9 0.6 0 0 xyloglucan hydrolysis GH26 -mannanase & xylanase 0.4 1.2 1.5 0.7 2.8 GH28 galacturonases 8.6 6.2 0.9 0.7 1.7 GH53 endo-1,4--galactanase 0.6 0.2 0.2 2.5 2.7 Total 14.7 17 14 5.6 24.5 Cell wall elongation GH16 xyloglucanases & 3.3 2.5 2 0 0.3 xyloglycosyltransferases GH17 1,3--glucosidases 2.0 3.0 0.1 0 0 GH74 endoglucanases & 0.3 0.2 1.6 0 1 xyloglucanases GH81 1,3--glucanase 0 0.4 0.3 0 0 Total 5.6 6.2 4 0 1.3 Debranching enzymes GH51 -L-arabinofuranosidase 4.3 5.9 7.8 9.9 3.5 GH54 -L-arabinofuranosidase 0.4 0 0 0.2 0 GH62 -L-arabinofuranosidase 0.2 0.6 1.7 0 0 GH67 -glucuronidase 0.8 3.0 3.6 0 2.3 GH78 -L-rhamnosidase 3.1 4.5 8.1 5.1 0.9 Total 8.8 14 21.2 15.2 6.7 Oligosaccharide-degrading enzymes GH1 -glucosidase and many 6.5 5.0 9.2 1.5 3.1 other -linked dimers GH2 -galactosidases and other 13.6 9.9 8.6 28.6 8.8 -linked dimers GH3 mainly -glucosidases 17.9 19.7 12.2 27 12.8 GH29 -L-fucosidase 4.5 1.8 2.1 4.2 0 GH35 -galactosidase 2.5 0.4 0.6 1.8 0.7 GH38 -mannosidase 2.3 1.7 2.6 2.6 5.6 GH39 -xylosidase 2.2 3.5 1 0.3 1.3 GH42 -galactosidase 1.9 1.0 2.5 1.7 4.8 GH43 arabinases & xylosidases 9.5 10.7 11.3 9.4 8.3 GH52 -xylosidase 0.2 0 0 0 0.4 Total 60.9 53.7 50.1 77.1 45.8 Total GHs 4206 4708 801 651 653 GHs are grouped according to major functional role and compared to other lignocellulosic systems. The indicated values are percentages of blastx hits of unassembled reads vs. CAZy protein families, or pfam hits of assembled sequences, weighted according to species abundance distribution. Note that the blastx-based and pfam-based abundance values for the compost dataset are well correlated (r=0.84), indicating that blastx hits on unassembled reads are a reasonable proxy for pfam hits on assembled ORFs.

1. Allgaier, M., et al., Targeted discovery of glycoside hydrolases from a switchgrass- adapted compost community. PLoS One, 2010. (In press).

1 2. Brulc, J.M., et al., Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proc Natl Acad Sci U S A, 2009. 106(6): p. 1948-53. 3. Warnecke, F., et al., Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature, 2007. 450(7169): p. 560-5.

2