Photosynthetic : Are they the only?

Alex Greenspan UC Davis Microbial Diversity 2012

Abstract:

The advent of oxygenic photosynthesis marks one of the most important events in life's history. However, the origins of oxygenic photosynthesis are poorly understood. All known oxygenic photosynthesis is performed by members of the bacterial phylum cyanobacteria, and all cultivated cyanobacteria are capable of oxygenic photosynthesis. Since the advent of 16S rRNA sequencing for in situ identification of microbes in their environments, various environmental surveys have yielded 16S sequences which group on the cyanobacterial branch of bacterial phylogeny, basal to any known photosynthetic cyanobacteria. Based on the environmental distribution of the different clusters of these basal cyanobacterial sequences, many of these speculative groups seem to grow obligately in the dark, suggesting that some of these sequences may belong to extant, ancestral, aphotic cyanobacteria. The discovery and isolation of such organisms would likely yield invaluable information about the origins of oxygenic photosynthesis. This project was directed at further characterizing these organisms based on their phylogeny and observed environmental distribution, to contribute towards further attempts at isolating ancestral, aphotic cyanobacteria. In addition to phylogenetic analysis, I also attempted to design PCR primers specific for the environmental groups, which would greatly aid any later attempts at enriching and isolating these organisms.

Background: The ability to fix carbon from light energy is distributed through a diverse selection of bacterial groups, and involves a similarly wide variety of chemistries (Overmann and Garcia-Pichel, 2006). Different organisms have evolved to utilize different wavelengths of light to transfer electrons from a variety of electron donors ultimately to carbon dioxide. Of those capable of photosynthesis, only one group of related organisms has evolved to utilize water as an electron donor, resulting in the production of molecular oxygen: the cyanobacteria. The advent of this process almost certainly marked the shift in the Earth from an anoxic to an oxic atmosphere. The consequences of this event for the trajectory of life's evolution are incomparable. All known cyanobacteria today are capable of oxygenic photosynthesis. However, given the patchy distribution of photosynthesis in the groups most closely related to cyanobacteria, it is unlikely that their last common ancestor was photosynthetic (for more on this claim, and for greater detail on the biochemistry and phylogeny of photosynthesis which I'm somewhat neglecting here, check out James Hemp's 2011 report from the course). Several studies have used comparative genomics to search for clues to the origins of photosynthesis in this group. Mulkidjanian et al. (2006) infer somehow that photosynthesis itself may have originated with the cyanobacteria. Beck et al. (2012) manage to infer core metabolic pathways for all cyanobacteria. In any case, if we're interested in how photosynthesis originated in the group, ultimately what we need is to find out what they were doing before photosynthesis entered the lineage. A diversity of fermentative pathways is widely conserved among a broad range of cyanobacteria (Stal, 1997), so this seems like a plausible ancestral way of life.
What we would want to have, in order to truly learn anything about the origins of oxygenic photosynthesis in cyanobacteria, is an isolate representing aphotic, ancestral cyanobacteria. We have reasons to believe such an organism might exist. Numerous 16S rRNA gene sequencing surveys have identified sequences which group at the base of the cyanobacterial branch of the bacterial tree. Many of these sequences seem to come from conserved clades that are consistently found in analogous environments, and many of these environments are consistently dark (for example, many sequences from a seemingly monophyletic group have been found in the guts of a surprising array of animals; more on this in a moment). For a good review of these sequences, see Ruth Ley's paper on mouse (2006). This report is an attempt both to better characterize these groups' phylogenetic and environmental distribution, and to develop methods for rapidly quantifying these potential basal cyanobacteria in enrichments.

Results and Discussion: Phylogenetic Analysis: The most plausible candidates for ancestral, aphotic cyanobacteria are those organisms, detected by environmental 16S rRNA gene sequencing surveys, whose 16S rRNA gene sequences fall low (basal to photosynthetic cyanobacteria) on the cyanobacterial branch of bacterial phylogenies. The most extensive and interactive source for exploring environmental 16S sequences in the context of overall bacterial diversity is the Silva Arb database. In the most recent non-redundant Silva Arb database, release 108, there are 7 groups of sequences which fall on the cyanobacterial branch of the bacterial tree, basal to all known photosynthetic cyanobacteria but above the root of the branch where cyanobacteria split from their nearest neighbors. If organisms related to ancestral cyanobacteria exist today, it is likely those that possess these sequences. The Silva Arb database is constructed by using the parsimony insertion tool in Arb to add curated, aligned sequences (above a certain alignment quality) to a guide tree inferred de novo from sequences in the previous release. The parsimony insertion algorithm, while powerful and invaluable for identifying previously unknown sources of microbial diversity, is not especially sensitive in resolving deeply divergent branchings and gives no estimate of the confidence in any given grouping. In order to resolve the relation of putative ancestral cyanobacterial lineages to known cyanobacteria, I constructed a tree of 16S rRNA sequences representing known, photosynthetic cyanobacteria, all the putative basal clades, as well as representatives of cyanobacteria's nearest neighbors in the bacterial tree (Figure 1). Phylogenetic analysis served to answer three primary questions about the putative basal cyanobacterial groups.
Figure 1: Maximum likelihood phylogenetic tree of representatives of photosynthetic cyanobacteria, putative basal cyanobacteria, and cyanobacteria nearest neighbors, using alba as an outgroup. Branches are colored by phylogeny. Leaves are colored by environment the sequences were derived from.

A cursory examination of the basal cyanobacterial clades in the Silva Arb tree reveals a relationship between environment and phylogeny among the groups; most sequences that cluster together into basal clades in the cyanobacteria branch in Arb were extracted from the same environment, and most sequences extracted from certain environments in the cyanobacteria tree will fall into the same group. For example, sequences that fall into the group labeled 4c0d in the cyanobacteria branch in Arb were found predominantly in 16S surveys of guts or animal feces, and most of the cyanobacterial sequences derived from guts are found in the 4c0d branch (with the exception of the SHA-109 group, which contains many sequences from a single survey of sheep rumen). Furthermore, the gut-associated sequences in 4c0d were derived from numerous independent investigations of different guts and a surprisingly wide diversity of feces (orangutan, elephant, penguin, zebra, mouse, etc). Similarly, the group MLE1-12 has been observed primarily in drinking water and drinking water associated habitats (e.g. drinking-water treatment plants, aquifers, etc). Surprisingly, MLE1-12 does not contain many sequences from other fresh-water habitats (e.g. lakes, rivers), but seems almost solely to have been observed in below-ground, human associated water sources. The first question I wanted to answer about these basal cyanobacterial sequences was whether these environmental associations hold up to more rigorous phylogenetic analysis.
This question can be addressed somewhat through bootstrapping: a technique whereby positions (columns) in the sequence alignment used to build a phylogenetic tree are randomly resampled with replacement, trees are reconstructed from the random resamples, and clades are scored based on whether all of the members of that group in the consensus tree are present together in a given bootstrap tree. A bootstrap value of 70 on a given node means that all leaves above that node were present above that node in 70 percent of the randomly generated trees, but it does not tell you which sequences moved nor where they moved in the other 30 percent of the bootstrap trees. This gives you some measure of confidence that there is sufficient phylogenetic signal in the sequences to infer accurate relationships. Anyway, getting back to the tree. Three of the original basal cyanobacterial groups in the Silva tree end up on the cyanobacterial branch (above the split with the neighboring groups). Two of these groups (the gut group 4c0d and the drinking water group MLE1-12) have very high bootstrap support (97 and 98% respectively). The remaining group (ML635J21) splits into two groups which cluster by environment (soil and marine sediment), of which the sediment group is well supported (bootstrap=99).
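The bootstrap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what RAxML does internally: `bootstrap_alignment` and `clade_support` are hypothetical helper names, the toy alignment is made up, and the tree-building step between the two functions (RAxML in this report) is left out entirely.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict of sequence name -> aligned sequence (all the same length).
    Returns a pseudo-replicate alignment of the same size.
    """
    n_cols = len(next(iter(alignment.values())))
    # Draw n_cols column indices with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def clade_support(clade, replicate_clades):
    """Percent of replicate trees that contain exactly this set of leaves as a clade.

    replicate_clades: one set of frozensets per bootstrap tree, each frozenset
    holding the leaf names under one node of that tree (assumed to be extracted
    from the replicate trees by some other tool).
    """
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Toy usage: one pseudo-replicate of a made-up three-sequence alignment.
toy = {"a": "ACGTAC", "b": "ACGTTC", "c": "TCGAAC"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

Note that `clade_support` only reports how often a group stayed together; like the bootstrap values on the tree, it says nothing about where the members went in the replicates where the group broke up.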

Figure 2: Maximum likelihood tree of cyanobacteria and nearest neighbors after removing low quality sequences and clustering basal groups

Likewise, the remaining putative basal cyanobacterial groups have reasonably high bootstraps but fall on the neighboring branch, not the cyanobacteria. At the very least this seems to answer the first question: at least some of the environmental groups are well supported phylogenetic groups as well. Some statistics may be required to test this hypothesis more rigorously, but I don't at the moment know what they would be. The remaining issues are whether or not the basal groups of interest are actually cyanobacteria to the exclusion of other neighboring groups, and whether or not the groups of interest are indeed basal (i.e. related to the ancestral state) to photosynthetic cyanobacteria. We can again address both of these questions somewhat with bootstrapping. In the Figure 1 tree, the bootstrap value for the node between the branch containing all putative and known cyanobacteria and their nearest neighbors is 11. As you move up the cyanobacterial branch, the values increase somewhat. The value at the node between the soil group of basal cyanobacteria and the branch containing all known cyanobacteria (along with the gut and drinking-water groups) is 65. This indicates that the grouping of the gut and drinking water basal groups with the cyanobacteria is better supported than grouping the soil and sediment clades with cyanobacteria. The node where photosynthetic cyanobacteria branch off has lower support, at 40. This pattern indicates that the low value for the known cyanobacteria is likely a result of interchange in the bootstrap trees between the gut/drinking water groups and the known group. These ambiguities could be resolved by visually inspecting the bootstrap trees.
All in all, these values indicate that there is low support in the sequences for any given branch order deep in the cyanobacterial tree, but that the gut and drinking-water groups are most likely cyanobacteria and not more closely related to any of their neighbors. One potential way to further resolve these questions is to remove low quality sequences from the analysis, which may cause ambiguities in the bootstraps. Figure 2 shows a tree constructed using the same methods as above after removing sequences with low alignment scores, clustering the basal cyanobacterial groups at 90% sequence identity, and choosing only one representative sequence from each cluster. Bootstrap support increases for photosynthetic cyanobacteria to 75 (the node below ) but does not change for grouping the gut and drinking water groups with cyanobacteria. The bootstrap value for the split between the cyanobacterial branch and the chloroflexi branch rises slightly to 31, still too low to conclude anything.

Basal Cyanobacteria-specific PCR Primers: I attempted to design PCR primers for the 16S rRNA gene that were specific for cyanobacteria but inclusive of basal groups, as well as primers that were specific for several of the basal groups that were inferred to be cyanobacterial based on my preliminary phylogeny (in particular, the gut group, the drinking-water group, the apparently soil-associated group, and the sediment-associated group). In order to do this I manually modified existing cyanobacterial 16S primers to match each position to the bases present in my sequences of interest. Where there was substantial disagreement about the base at a given position I added a degenerate base to accommodate all possibilities. As it turned out, this significantly reduced the specificity of the primers. I tested each primer set against all 9 of my samples (termite guts, first-catch drinking water, flushed drinking water, waste water influent, waste water effluent, SBR 1 sludge, SBR 2 sludge, SBR 1 liquid, and SBR 2 liquid) (PCR results: Figure 3). The primers designed for the sediment-associated group (ML635J21) amplified in all of my samples except the two drinking water samples (in which none of the primers amplified well). All six primer sets amplified to some extent in the waste-water effluent. I attempted to clone from all seven successful amplifications using the ML635J21 primers, as well as Cya359-781, MLE1-12, and SMD from the waste-water effluent sample. Of these, all but ML635J21 from SBR-2 liquid successfully cloned.

Figure 3: Gel image of all cyanobacteria primers on all samples. Primer key: 1=Cya106-781, 2=Cya359-781, 3=MLE1-12, 4=ML635J21, 5=4c0D, 6=SMD.

About 32 clones were selected from each library for sequencing. The two plates consisting of “ML635J21” sequences each yielded usable sequences in approximately half of their wells. Judging from the trace files of the sequencing runs, it is unclear why sequencing quality was low; it does not appear to be the result of mixed amplicons. The plate containing clones from the three other primer sets used against the waste-water effluent sample yielded a higher percentage of usable sequences. Figure 4 summarizes the taxonomic classification of the sequences from each of the four primer pairs tested.

Figure 4: Relative abundance of bacterial classes observed in clone libraries made from "cyanobacteria-specific" PCR primer amplicons.

As you can see, very few cyanobacterial sequences were picked up in any of the clone libraries. There were no cyanobacterial sequences in the library made from the modified cyanobacteria specific primers designed by Garcia-Pichel et al, nor from the primers I designed for ML635J21. Of the 3 sequences from MLE1-12 that came back as cyanobacterial based on the RDP classifier, two were classified as chloroplast and the remaining one had no classification more specific than cyanobacteria. That same unclassified cyanobacterial sequence, when inserted into the Silva 108 NR Arb database using Arb's parsimony insertion tool, clusters with Planktothrix. Interestingly, of the two sequences from the MLE1-12 library that RDP classified as chloroplast, only one clusters with chloroplast in Arb. Additionally, there is one sequence from the MLE1-12 library which clusters with chloroplast in Arb but was not identified as such by RDP. Likewise, RDP classified only one sequence from the SMD library as cyanobacteria, and specifically chloroplast. That sequence, along with one other, clusters with chloroplast in Arb. Presumably there are other discrepancies which I did not detect. This is something to be aware of if you need really detailed classification. In any case, the take home message is that none of my clone library sequences from any of the primer sets I thought I had designed to be cyanobacteria specific actually came back as the basal cyanobacterial groups I was looking for, regardless of the sequence-classification method I used. This may be explicable based on the predicted taxonomic coverage of my primers as well as the observed diversity of bacteria in the samples I collected.

Figure 5: Relative abundance of bacterial phyla observed in all three samples submitted for 454.

454 Pyrosequencing of 16S rRNA genes: Concurrent with primer design and testing, I submitted three of my samples for 454 pyrosequencing of 16S rRNA genes to verify whether my target groups were actually present in my samples. Of the 9 samples from which I had DNA, I chose three for deep sequencing: termite-gut microbiota, flushed drinking water, and waste-water influent. The observed relative abundance of bacterial phyla from each sample is summarized in Figure 5 (phyla). All three samples did contain cyanobacterial sequences. The breakdown of what cyanobacteria were present in each sample is summarized in Figure 5 (cyanobacterial groups). Of the five groups of cyanobacterial sequences, two belong to the putative basal cyanobacterial groups. First, in the termite gut microbiota sample I observed Wd272 sequences, but this is one of the groups of ambiguous relationship to cyanobacteria in my phylogeny and one for which I did not even really try to design primers. Second, I did observe 4c0d sequences in the waste-water influent sample. This is the well-supported gut group of basal cyanobacteria. I did design primers for this group, and they did not efficiently amplify in any of my samples. The 454 data indicate that at least one of the groups of interest for this report was present in low abundance in at least one of my samples. The fact that the primers for that group did not amplify, and that none of its sequences were detected in any of the clone libraries for other primers, indicates flaws in the primer design.

Figure 5: Yet another pie chart, this time depicting the identity and relative abundance of cyanobacterial sequences observed in three samples submitted for 454 sequencing

Theoretical Taxonomic Coverage of Primers: I tested the theoretical taxonomic coverage of each of my cyanobacteria-specific 16S rRNA gene PCR primers against a reduced version of the Silva 108 Arb database, using the taxa_coverage.py program included in Primer Prospector. Figure 6 shows the distribution of total hits of each primer across the database. Modified Cya106, 359, and 781 (from Garcia-Pichel et al) as well as those I re-designed for the SMD groups have a very wide taxonomic range of theoretical amplification. What's more, both Cya359-781 and the SMD primers should theoretically amplify most bacterial 16S sequences. It is therefore somewhat surprising that neither pair showed substantial amplification in initial PCR tests of the primers. Primers designed for the 4c0d, MLE1-12, and ML635J21 groups are theoretically more specific for cyanobacteria. As it turns out, it is not trivial to summarize the predicted taxonomic coverage of a set of primers below the phylum level based on the output of taxa_coverage.py, so I was unable to do so here. I can tell you, however, that for MLE1-12 96% of all predicted hits are from cyanobacteria but only 2% of all predicted hits are from MLE1-12. Likewise, for 4c0d 98% of hits are cyanobacteria but only 11% are actually 4c0d.

Figure 6: Last batch of pie charts, I swear. Anyway, for these ones I took the predicted hits of the primers I designed and then searched a reduced version of Silva NR 108. The total of each pie is any hit in the database with a "weighted score" lower than 1 (basically, allows up to 2 mismatches). The pies show the relative representation of each phylum in those theoretical hits. Total hits that pass threshold are shown below pies.
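The below-phylum summary that the coverage section says was hard to pull out can be sketched roughly as follows. This is a minimal sketch under assumptions: it supposes the predicted hits have already been reduced to one semicolon-delimited taxonomy string per hit (a hypothetical intermediate format, not the actual taxa_coverage.py output), `summarize_hits` is a made-up helper name, and the counts in the usage example are toy numbers chosen only to mirror the 96% and 2% figures quoted above.

```python
from collections import Counter

def summarize_hits(taxonomies, depth):
    """Percent of predicted primer hits per taxon at a given taxonomic depth.

    taxonomies: iterable of strings such as "Bacteria;Cyanobacteria;MLE1-12".
    depth: number of ranks to keep (2 = phylum level in these toy strings,
           3 = the group below phylum).
    """
    counts = Counter()
    for tax in taxonomies:
        ranks = [r.strip() for r in tax.split(";") if r.strip()]
        label = ";".join(ranks[:depth]) if ranks else "Unclassified"
        counts[label] += 1
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Toy input (not real data): 100 hypothetical hits for one primer pair.
toy_hits = (
    ["Bacteria;Cyanobacteria;MLE1-12"] * 2
    + ["Bacteria;Cyanobacteria;SubsectionI"] * 94
    + ["Bacteria;Proteobacteria;Gammaproteobacteria"] * 4
)
by_phylum = summarize_hits(toy_hits, depth=2)
by_group = summarize_hits(toy_hits, depth=3)
```

Comparing `by_phylum` with `by_group` makes the problem with these primers easy to see: a pair can look phylum-specific while the target group is only a sliver of what it is predicted to amplify.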
Conclusions: If organisms exist today that are the direct descendants of ancestral, aphotic cyanobacteria, their discovery and characterization would prove invaluable for understanding the origins of oxygenic photosynthesis. Being able to isolate such organisms would first require resolving their relationship to extant, photosynthetic cyanobacteria and being able to assay readily for their enrichment, given a lack of a priori physiological knowledge. This mini-project was an attempt to make initial steps in both of these directions. Based on my phylogenetic analysis, there is reasonably strong support for the gut-associated and drinking-water associated putative basal cyanobacteria grouping with photosynthetic cyanobacteria. In addition, there is strong support for these two groups falling basal to all known photosynthetic cyanobacteria, making them ideal candidates for the descendants of ancestral aphotic cyanobacteria. There is less strong support for other groups of basal cyanobacteria, and the divergences may be too ancient to accurately resolve (for example reversions from a mutation back to the original sequence reduce bootstrap values and the likelihood of such reversions increases with the age of the divergence). Further directions in terms of phylogenetic analysis include inferring and comparing trees from different tree-building methods and using different models of substitution. It also may be possible to further resolve the tree by removing particularly poorly supported branches from analysis (e.g. some of the putative basal groups that jump over to the chloroflexi) or adding more groups of neighboring sequences. My attempts at designing primers failed pretty conclusively. 
I think what happened is that I added sufficient degeneracy that certain primers in the degenerate set were specific for a wide array of non-cyanobacterial groups, and because cyanobacteria are at such low abundance (even when present) in my samples, the other groups were more readily amplified and detected in my clone libraries. So, even though some of my primers were somewhat specific, they needed to be really stringent in order to avoid amplifying more abundant groups. The better approach in designing primers might be to use a method that explicitly excludes groups you don't want to amplify. This is possible using Primer Prospector (including the -e flag in the generate_primers_denovo.py program along with a fasta of sequences you want to exclude) but this outputs a lot of primers which are difficult to pair and then optimize. Arb might actually be a better option, but if you want to use it to design primers for this course you need to make sure you can set up the pt-server early on (if you check out the primer design tool in Arb, you'll understand what I'm talking about). All in all, my most important conclusion is this: it's really difficult to target a phylogenetically recognizable but uncharacterized group of organisms for cultivation. And it's nigh impossible to do this in any way that's gratifying for your mini-project. Historically, novel groups of microbes have been cultured by picking a thermodynamically possible but previously unexplored chemistry and creating the conditions in which a bug capable of that chemistry would flourish. With the advent of environmental 16S rRNA gene sequencing, we've come to understand that the groups of organisms we've been able to culture are an astonishingly small part of the diversity that exists. There are dozens of bacterial phyla that we can infer to be as phylogenetically distinct from one another as any that we've cultured, and yet we have no idea what such organisms are doing.
It's not clear from the phylogenies what we can do to learn much more. Success stories in this emerging realm of strike me as so far few and far between. The most conspicuous example is the SAR11 group, which required somewhat fortuitous breakthroughs in understanding its physiology in order to allow cultivation (Rappe et al., 2002). My impression is that pursuing uncultivated microbes, at this point, based purely on recognition of their distinct phylogeny, requires more persistent dedication and luck than the time for this course really allows. That being said I learned a hell of a lot.

Acknowledgements: Given the exhaustive pessimism perhaps evident throughout this report I may seem ungrateful and unenthusiastic by the end of this course. But I can say with complete confidence that this has been the single most worthwhile thing I've done with my life so far. I can't even begin to express my gratitude to the long list of people who made this the case. In particular, though, I want to thank Steven Zinder and Dan Buckley. I can only imagine the tremendous amount of work that must go into coordinating and teaching this course. But more than anything else, what made this course so wonderful was the dedication and commitment that Dan and Steve had to ensuring that we as students got as much out of this as possible.

None of the work I put into my project would have been remotely possible for me without Chuck Pepe-Raney. I also need to thank Suzanna Brauer and Suiquan Tang. And Ashley Campbell, for innumerable favors and acts of support but especially for driving me out to the waste-water treatment plant one hectic afternoon. And Adam Zinder (as well as again Steve) for help with the termite hunt. As well as all of the TAs and all of the students for an incredible amount of support.

As well, this would not have been possible for me without the support and patience of my advisor, Doug Cook, who hopefully will never find out that his name is associated with such a work as this. And the support of the MBL Pioneers Scholarship Fund and the Pfizer Inc. Endowed Scholarship.

Thank you all so much!

Methods: Phylogenetic analysis: All 16S rRNA gene sequences used in this analysis were selected from the Silva Non-redundant Arb Database, release 108. Initially, sequences were selected to represent a broad diversity of cyanobacteria, the bacterial groups inferred to be most closely related to cyanobacteria, as well as all of the sequences in the database from the putative basal cyanobacterial groups. First, all sequences from “type strains” of cultivated organisms (the sequences that appear blue in the Silva Arb tree) were selected from cyanobacteria, as well as their nearest neighboring groups (, , , and chloroflexi) in the Silva Arb tree. In addition, I hand selected sequences representing well supported but uncultured groups from within each of cyanobacteria and its nearest neighbors. All sequences from each of the groups appearing in the Silva Arb tree on the cyanobacterial branch basal to Gloeobacter were also included in the initial analysis. Sequences were aligned using the program SSU-align with default settings, and alignments were masked using the SSU-mask program included in the SSU-align distribution. Large phylogenetic trees (i.e. those including all of the above-described groups) were constructed using the phylogenetic inference program RAxML (version 7.2.8), using a general time-reversible substitution model with gamma-distributed rate heterogeneity (GTRGAMMA). Bootstrapping was performed using RAxML's rapid bootstrapping algorithm with 100 replicates. 10 randomized stepwise addition parsimony trees were used in construction of the consensus tree. To resolve ambiguities in the clustering of the drinking water-associated and gut-associated basal cyanobacterial groups with photosynthetic cyanobacteria, 1 sequence from each group in the initial tree with a bootstrap value above 70 was selected. In addition, I excluded the putative basal cyanobacterial group SHA-109 from further analysis because of its low bootstrap support.
Trees were inferred from this reduced sequence set using RAxML in Arb, with a GTRGAMMA substitution-rate model, rapid bootstrapping with 1000 replicates, and 10 randomized trees for the consensus tree.

Primer “Design”: I attempted to design primers that would be specific to cyanobacterial 16S rRNA genes and inclusive of the basal groups of interest for this report, as well as primers specific for each of the groups. My approach involved using the software package Primer Prospector to modify cyanobacterial 16S rRNA gene primers designed by Garcia-Pichel et al. To do this I used the analyze_primers.py program with the -t flag to compare the published CYA106f, CYA359f, and CYA781R primers from Garcia-Pichel et al to fasta files including all sequences on the cyanobacterial branch of the Silva Arb tree, only sequences from the group MLE1-12 in the Silva Arb tree, only sequences from the 4c0d group, only sequences from ML635J21, and last, sequences from the two groups starting with SM on the cyanobacterial branch. I then manually compared the bases at each position in the published primers with the base frequencies in each group, changing the primer sequences accordingly to add specificity to each group and inserting degenerate bases where the minority base at a position for a given group represented more than 5% of the sequences. I retroactively explored the predicted taxonomic coverage of each primer pair by running analyze_primers.py on each of my primers against representative sequences from all of the 97% identity clusters in the Silva Arb 108 database (taken from http://www.arb-silva.de/no_cache/download/archive/qiime/). The hits file was then used to assess taxonomic coverage with the taxa_coverage.py program (included in Primer Prospector), using the Silva_108_taxa_mapping file (again, from the qiime-clustered Silva database) as the taxon mapper.
I later attempted to redesign basal cyanobacteria-specific 16S primers by using the generate_primers_denovo.py program with sequences from the cyanobacterial group of interest as input, then using sort_primers.py to pick primer pairs, and assessing taxonomic coverage of potential primer pairs using the above-described steps (all of this and more will make sense if you follow the Primer Prospector tutorial at http://pprospector.sourceforge.net/tutorial/tutorial.html, I promise). That being said, as it turns out, Arb (odious though it may be) might make for a more useful and powerful tool in designing taxa-specific 16S primers. Best of luck if you decide to do this sort of thing.
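The manual step described above (looking at base frequencies at each primer position and adding a degenerate base wherever a minority base exceeds 5% of the target sequences) can be sketched like this. It is an illustration only, not the Primer Prospector code: `degenerate_base`, `degenerate_primer`, and `degeneracy` are hypothetical helper names, and the input is assumed to be the primer-binding site already cut out of each aligned target sequence. The IUPAC codes themselves are standard.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one covers.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def degenerate_base(column, min_fraction=0.05):
    """Pick the IUPAC code covering every base seen in more than min_fraction of sequences."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"  # only gaps or ambiguous bases at this position
    keep = frozenset(b for b, n in counts.items() if n / total > min_fraction)
    return IUPAC[keep]

def degenerate_primer(sites, min_fraction=0.05):
    """Build a degenerate primer from equal-length binding-site sequences."""
    return "".join(degenerate_base("".join(col), min_fraction) for col in zip(*sites))

def degeneracy(primer):
    """Number of distinct oligos in the pool that a degenerate primer represents."""
    sizes = {code: len(bases) for bases, code in IUPAC.items()}
    n = 1
    for ch in primer.upper():
        n *= sizes[ch]
    return n

# Toy usage with made-up binding sites: 18 copies of ACGT and 2 of ATGT.
toy_sites = ["ACGT"] * 18 + ["ATGT"] * 2
toy_primer = degenerate_primer(toy_sites)
```

`degeneracy` is worth checking after every change, since each added degenerate position multiplies the number of oligos in the pool, and any one of them can match an abundant non-target group.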

Sample collection and DNA extraction: Drinking water samples were collected from the tap in the 2nd floor break room in autoclaved 1 L glass bottles. One first-catch sample was collected as soon as the tap was turned on. In addition, a flushed sample was collected after running the tap for 5 minutes (ideally you would check the residuals of whatever disinfectant your municipality uses to make sure that you're actually drawing water from the distribution line, rather than water that's been sitting in the pipes for some time). For each sample, 250 mL of water was filtered through a 47 mm diameter, 0.2 μm pore polycarbonate filter. For DNA extractions, filters were wound cell-side inward around the inside of the bead-beating tubes from MOBio's PowerBiofilm DNA Isolation kits, and bead-beaten for 1 minute with 350 μL of buffer BF1. The manufacturer protocol was followed from there on. If you're interested in looking at drinking-water microbiology though, I would highly recommend getting 0.2 μm pore nitrocellulose filters (rather than polycarbonate). You can filter a lot more water, more quickly, and use the whole filter in a phenol-chloroform extraction, getting higher DNA yields. Waste-water samples, as well as sequencing batch reactor (SBR) sludge samples, were provided by the Falmouth Wastewater Department. SBR samples were collected by dipping a bucket in the reactors during their cycles. SBR 1 was running its mix-fill cycle during collection; SBR 2 was running the react cycle during collection (see David Vuono's report from 2011 to understand what these things mean). The SBR 1, SBR 2, and effluent samples were collected into autoclaved 1 L graduated cylinders. SBR 1 and SBR 2 samples sat at 4°C for several hours to allow sludge to separate from the liquid. DNA from the SBR liquid phase as well as waste-water effluent was extracted by filtering 200 mL and following the DNA extraction procedure outlined for drinking water.
DNA from waste-water influent, as well as the SBR sludge samples, was extracted by centrifuging 1.5 mL of each and extracting from the pellet as per the MOBio PowerBiofilm DNA Isolation kit manufacturer protocol.

Termites: I spent several hours a day over the course of three days scouring Cape Cod for termites (flipping over damp logs in the woods and digging, asking home-owners if they'd seen any around their house, etc) in both Woods Hole and Falmouth and never saw a single goddamn one. Upon visiting Ed Leadbetter's yard with Steve and Adam Zinder we found 3 colonies within 5 minutes. If you're looking for termites, just go there. As a result of the difficulty in acquiring samples, the termites I originally used for 454 and clone libraries had been collected three weeks previously and had been living in a tupperware in the lab off of cardboard for the duration. The termite gut microbiota was extracted by clamping the head of each termite with forceps, grasping the very tip of the abdomen and pulling to extrude the hind-gut (it is highly recommended that you put the termites on ice prior to doing this; it is a rather more disturbing process otherwise) (detailed instructions and video can be found in Matson et al). The hindgut was pierced with a needle and drained into anoxic basal salts solution (as per Warneke et al). The gut contents of ~25 termites were pooled and DNA was extracted using the MOBio PowerSoil DNA Isolation kit following the manufacturer protocol.

454 library preparation: PCR was performed on DNA extractions from termite guts, flushed drinking water, and waste-water influent using barcoded 515F and 907R bacterial 16S rRNA gene primers (see Appendix 1 for details) and Phusion high-fidelity DNA polymerase (New England Biolabs (probably), Ipswich, MA). The reaction was conducted using a “touch-down” cycle (see Appendix 2). After the PCR product was “quantified” using a gel image, Chuck (Pepe-Raney) pretty much took over the preparation. Once the sequences came back, they were analyzed using QIIME (Caporaso et al., 2010). This involved splitting my libraries out by barcode (see Appendix 1 again for primer details) using split_libraries.py, removing sequences shorter than 400 bp or longer than 450 bp and using the flag -e 0. Sequences were then clustered at 97 percent identity using uclust (through QIIME), representative sequences were chosen, and these were classified using assign_taxonomy.py. For taxonomic assignments, we used the RDP classifier with a training set consisting of a representative sequence for each of the 97% identity clusters in the Silva 108 NR database (training set available at http://www.arb-silva.de/no_cache/download/archive/qiime/; for an explanation as to why to bother with this, see Werner et al., 2012).
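
To make the library-splitting step concrete, here is a minimal Python sketch of what the barcode sorting and length filter amount to. It is not QIIME's implementation. It assumes that -e 0 means no barcode mismatches are tolerated, that the barcode sits at the very start of each read, and that length is measured on the full read before the barcode is trimmed. The function name demultiplex is made up for illustration; the barcodes and sample names are the ones listed in Appendix 1.

```python
from typing import Dict, Iterable, List, Tuple

# Barcode -> SampleID, copied from Appendix 1.
BARCODES: Dict[str, str] = {
    "TTGTCGGTT": "AlexFL.AG",
    "ATAACCGTT": "AlexGnt.AG",
    "GTACCGGTT": "AlexWW.AG",
}

MIN_LEN = 400  # reads shorter than this are discarded
MAX_LEN = 450  # reads longer than this are discarded


def demultiplex(
    reads: Iterable[Tuple[str, str]],
    barcodes: Dict[str, str] = BARCODES,
    min_len: int = MIN_LEN,
    max_len: int = MAX_LEN,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assign (read_id, sequence) pairs to samples by exact barcode match.

    Assumption: length is checked on the untrimmed read, and a read whose
    first bases do not match any barcode exactly is dropped.
    """
    kept: Dict[str, List[Tuple[str, str]]] = {s: [] for s in barcodes.values()}
    for read_id, seq in reads:
        seq = seq.upper()
        if not (min_len <= len(seq) <= max_len):
            continue  # fails the length filter
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Strip the barcode before storing the read.
                kept[sample].append((read_id, seq[len(barcode):]))
                break
    return kept
```

Calling demultiplex on a list of (id, sequence) pairs returns one list of trimmed reads per sample; reads that fail either check simply do not appear in the output.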

PCR with “Cyanobacteria specific” primers: See Appendix 3 for the PCR mix recipe, Appendix 4 for the primer sequences, and Appendix 5 for the reaction cycle.

Cloning: I ran the PCR products on a 1% agarose TAE gel, cut out the bands, and spun them down (following the Millipore gel purification kit protocol). Then I cloned them, following the Invitrogen TOPO cloning kit protocol with a ligation time of roughly an hour.

Appendices!:

Appendix 1:
SampleID    BarcodeSequence  LinkerPrimerSequence   ReversePrimer           StudentName  BarcodePlateWell  Description
AlexFL.AG   TTGTCGGTT        GAGTGYCAGCMGCCGCGGTAA  GGCCGYCAATTCMTTTRAGTTT  Alex         E06               AlexFL.AG
AlexGnt.AG  ATAACCGTT        GAGTGYCAGCMGCCGCGGTAA  GGCCGYCAATTCMTTTRAGTTT  Alex         F01               AlexGnt.AG
AlexWW.AG   GTACCGGTT        GAGTGYCAGCMGCCGCGGTAA  GGCCGYCAATTCMTTTRAGTTT  Alex         F02               AlexWW.AG
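
The primer sequences in Appendix 1 contain IUPAC degenerate bases (Y, M and R), so each one is really a small pool of distinct oligos. The Python sketch below enumerates that pool using the standard IUPAC nucleotide codes. The helper name expand_primer is hypothetical and only there for illustration.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Sequences copied from the LinkerPrimerSequence and ReversePrimer columns.
FORWARD = "GAGTGYCAGCMGCCGCGGTAA"
REVERSE = "GGCCGYCAATTCMTTTRAGTTT"


def expand_primer(primer: str) -> List[str]:
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

With two degenerate positions the forward sequence expands to 4 oligos, and with three the reverse expands to 8, which is handy when checking candidate primers against reference sequences by exact string match.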

Appendix 2:
Stage 1: 1x   98 C 60 s
Stage 2: 10x  98 C 5 s, 68 C 10 s, 72 C 7 s
Stage 3: 12x  98 C 5 s, 58 C 10 s, 72 C 7 s
Stage 4:      72 C 21

Appendix 3:
Ingredient                 per sample (uL)
Phusion 2X HF MasterMix    15
DMSO, 100%                 2.4
Reverse Primer, 25 uM      0.6
Forward Primer, 6.25 uM    2.4
Template DNA, 40-100 ng    8
Water                      1.6
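
The recipe in Appendix 3 adds up to a 30 uL reaction, which is what a 2X master mix at 15 uL calls for. For anyone setting up more than a couple of reactions, the Python sketch below scales the non-template ingredients into a shared mix and back-calculates final primer concentrations. The 10% pipetting overage and the function names are assumptions for illustration, not part of the original protocol.

```python
from typing import Dict

# Per-reaction volumes in uL, copied from Appendix 3 (template kept separate
# because it differs per sample and is added to each tube individually).
RECIPE_UL: Dict[str, float] = {
    "Phusion 2X HF MasterMix": 15.0,
    "DMSO, 100%": 2.4,
    "Reverse Primer, 25 uM": 0.6,
    "Forward Primer, 6.25 uM": 2.4,
    "Water": 1.6,
}
TEMPLATE_UL = 8.0


def reaction_volume() -> float:
    """Total volume of one reaction, template included."""
    return sum(RECIPE_UL.values()) + TEMPLATE_UL


def master_mix(n_reactions: int, overage: float = 0.1) -> Dict[str, float]:
    """Volumes (uL) of each shared ingredient for n reactions.

    overage is an assumed fractional excess to cover pipetting loss.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in RECIPE_UL.items()}


def final_conc_uM(stock_uM: float, volume_ul: float) -> float:
    """Final concentration after dilution into one full reaction."""
    return stock_uM * volume_ul / reaction_volume()
```

By this arithmetic both primers end up at 0.5 uM in the reaction (25 uM x 0.6 uL and 6.25 uM x 2.4 uL, each in 30 uL), and DMSO is at 8% by volume.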

Appendix 4: They didn't work anyway. It's really late. I don't want to deal with it. If you're really curious about my primers, just email me.

Appendix 5:
Step                  Temperature  Time
Initial Denaturation  95°C         5 min
Denaturation          95°C         30 sec   (repeat 30x)
Annealing             52°C         30 sec   (repeat 30x)
Extension             72°C         30 sec   (repeat 30x)
Final Extension       72°C         5 min
Hold                  4°C          Forever!

References:

Beck, C., Knoop, H., Axmann, I. M., & Steuer, R. (2012). The diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms. BMC Genomics, 13(1), 56. doi:10.1186/1471-2164-13-56

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5), 335-336. doi:10.1038/nmeth.f.303

Garcia-Pichel, F., Nübel, U., & Muyzer, G. (1997). PCR primers to amplify 16S rRNA genes from cyanobacteria. Applied and Environmental Microbiology, 63(8), 3327-3332.

Ley, R. E., Bäckhed, F., Turnbaugh, P., Lozupone, C. A., Knight, R. D., & Gordon, J. I. (2005). Obesity alters gut microbial ecology. Proceedings of the National Academy of Sciences of the United States of America, 102(31), 11070-5. doi:10.1073/pnas.0504978102

Matson, E., Ottesen, E., & Leadbetter, J. (2007). Extracting DNA from the gut microbes of the termite (Zootermopsis nevadensis). Journal of Visualized Experiments: JoVE, 22(4), 195. doi:10.3791/195

Mulkidjanian, A. Y., Koonin, E. V., Makarova, K. S., Mekhedov, S. L., Sorokin, A., Wolf, Y. I., Dufresne, A., et al. (2006). The cyanobacterial genome core and the origin of photosynthesis. Proceedings of the National Academy of Sciences of the United States of America, 103(35), 13126-31. doi:10.1073/pnas.0605709103

Overmann, J., & Garcia-Pichel, F. (2006). The phototrophic way of life. In M. Dworkin, S. Falkow, E. Rosenberg, K.-H. Schleifer, & E. Stackebrandt (Eds.), The Prokaryotes (pp. 32-85). Springer New York. doi:10.1007/0-387-30742-7

Stal, L. (1997). Fermentation in cyanobacteria. FEMS Microbiology Reviews, 21, 179-211. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1574-6976.1997.tb00350.x/full

Warnecke, F., Luginbühl, P., Ivanova, N., Ghassemian, M., Richardson, T. H., Stege, J. T., Cayouette, M., et al. (2007). Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature, 450(7169), 560-5. doi:10.1038/nature06269